annotations.SpkLexRep package
Submodules
annotations.SpkLexRep.sppaslexrep module
- filename
sppas.src.annotations.SpkLexRep.sppaslexrep.py
- author
Brigitte Bigi, Laurent Vouriot
- contact
- summary
Speaker Lexical Reprise automatic annotation
- class annotations.SpkLexRep.sppaslexrep.LexReprise(win_idx, end_idx)
Bases:
object
Data structure to store a lexical reprise.
- class annotations.SpkLexRep.sppaslexrep.sppasLexRep(log=None)
Bases:
annotations.SelfRepet.sppasbaserepet.sppasBaseRepet
SPPAS integration of the speaker lexical reprise annotation.
Main differences compared to repetitions: the span option sets the maximum number of consecutive tokens to analyze, and the span window also has a duration limit.
- __init__(log=None)
Create a new sppasLexRep instance.
The log is used to report on the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- static create_tier(sources, locations)
Create a tier from content and localization lists.
- Parameters
sources – (dict) dict of sources (in fact, the indexes).
locations – (list) list of locations corresponding to the tokens
- Returns
(sppasTier)
- get_inputs(input_files)
Return 2 tiers with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- static get_longest(speaker1, speaker2)
Return the index of the last token of the longest repeated sequence.
Non-speech events occurring in the middle of either the repeated sequence or the source sequence are ignored, and the tokens do not need to be repeated in the same order.
- Parameters
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
(int) Index or -1
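The behaviour described for get_longest can be illustrated with a minimal, self-contained sketch. It works on plain lists of strings rather than DataSpeaker objects, and both the function name `longest_repeated_end` and the `NON_SPEECH` marker set are hypothetical names chosen for this illustration. It is not the SPPAS implementation.

```python
# Hypothetical markers for non-speech events (silence, pause, noise, laughter).
NON_SPEECH = {"#", "+", "*", "@"}


def longest_repeated_end(source_tokens, echo_tokens):
    """Return the index of the last source token of the longest repeated run.

    The run starts at index 0 of source_tokens. A token counts as repeated
    if it occurs anywhere in echo_tokens, so order is not taken into account.
    Non-speech events neither extend nor interrupt the run.
    Return -1 if the first word of the source is not repeated.
    """
    echo_words = {tok for tok in echo_tokens if tok not in NON_SPEECH}
    last = -1
    for i, tok in enumerate(source_tokens):
        if tok in NON_SPEECH:
            # Skip the event: it does not break the sequence.
            continue
        if tok in echo_words:
            last = i
        else:
            # First non-repeated word: the sequence ends here.
            break
    return last
```

With the source `["the", "#", "cat", "sat"]` and the echo `["cat", "+", "the", "dog"]`, the sketch stops at "sat" and reports the index of "cat".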
- lexical_variation_detect(tier1, tier2)
Detect the lexical variations between 2 tiers.
- Parameters
tier1 – (sppasTier)
tier2 – (sppasTier)
- run(input_files, output=None)
Run the automatic annotation process on an input.
- Parameters
input_files – (list of list of str) time-aligned tokens of 2 files
output – (str) the output file name
- Returns
(sppasTranscription)
- select(index1, speaker1, speaker2)
Append (or not) a repetition.
- Parameters
index1 – (int) end index of the entry of the source (speaker1)
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
(bool)
- set_alpha(value)
Set the alpha option.
- Parameters
value – (float) Coefficient used to estimate stopwords
- set_span(value)
Set the max span, in number of words.
- Parameters
value – (int) Max nb of tokens in a span window.
- set_span_duration(value)
Set the spandur option.
- Parameters
value – (float, int) Max duration of a span window.
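The two limits on the span window (a maximum number of tokens and a maximum duration) can be combined as in the sketch below. It is an illustration under stated assumptions, not the SPPAS code: the function name `span_window_end` is hypothetical, each token is represented by a hypothetical `(begin, end)` pair of times, and the window is assumed to always contain at least its first token.

```python
def span_window_end(intervals, start, span, spandur):
    """Return the index of the last token of the window beginning at start.

    intervals: list of (begin, end) times, one pair per token.
    span: maximum number of tokens in the window.
    spandur: maximum duration of the window, in the same unit as the times.
    """
    if not 0 <= start < len(intervals):
        raise IndexError("start is out of range")
    window_begin = intervals[start][0]
    end = start
    # Never look further than `span` tokens or past the end of the list.
    stop = min(start + span, len(intervals))
    for i in range(start + 1, stop):
        # Stop as soon as adding token i would exceed the duration limit.
        if intervals[i][1] - window_begin > spandur:
            break
        end = i
    return end
```

Whichever limit is reached first closes the window, so a long token can end a window well before the token count is exhausted.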
- set_stopwords(value)
Set the stopwords option.
- Parameters
value – (bool) Whether to add estimated stopwords
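This page does not say how stopwords are estimated or how alpha enters the estimation. As one plausible reading, the sketch below treats a token as a stopword when its relative frequency exceeds 1 / (alpha × V), where V is the vocabulary size. The threshold formula, the function name `estimate_stopwords` and the default value of `alpha` are all assumptions made for illustration. They are not taken from this module.

```python
from collections import Counter


def estimate_stopwords(tokens, alpha=0.5):
    """Return the set of tokens whose relative frequency is above a threshold.

    Assumed threshold: 1 / (alpha * V), with V the number of distinct tokens.
    A larger alpha lowers the threshold, so more tokens become stopwords.
    """
    if alpha <= 0:
        raise ValueError("alpha must be strictly positive")
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return set()
    threshold = 1.0 / (alpha * len(counts))
    return {tok for tok, n in counts.items() if n / total > threshold}
```

In this sketch, a flag such as the stopwords option would only decide whether the returned set is merged into a predefined stopword list.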
Module contents
- filename
sppas.src.annotations.SpkLexRep.__init__.py
- author
Brigitte Bigi
- contact
- summary
Speaker Lexical Reprises detection.
Search for lexical repetitions of sequences of tokens or lemmas in one file relative to another.