annotations.OtherRepet package
Submodules
annotations.OtherRepet.detectrepet module
- filename
sppas.src.annotations.OtherRepet.detectrepet.py
- author
Brigitte Bigi
- contact
- summary
Detection of other-repetitions.
- class annotations.OtherRepet.detectrepet.OtherRepetition(stop_list=None)
Bases:
annotations.SelfRepet.datastructs.DataRepetition
Other-Repetition automatic detection.
Search for the sources, then find where the echoes are.
- __init__(stop_list=None)
Create a new OtherRepetition instance.
- Parameters
stop_list – (sppasVocabulary) List of irrelevant tokens.
- detect(speaker1, speaker2, limit=10)
Search for the first other-repetition in tokens.
- Parameters
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
limit – (int) Search no further than ‘limit’ entries of speaker 1
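The bounded search performed by detect can be sketched as follows. This is a minimal sketch, not the SPPAS implementation: it assumes plain lists of token strings in place of DataSpeaker, a set of strings as the stop-list, and the hypothetical helpers longest_shared_end and detect_first. The relevance test is a simplified reading of Rule 1 from the rules module.

```python
from typing import List, Optional, Set, Tuple


def longest_shared_end(current1: int, speaker1: List[str], speaker2: List[str]) -> int:
    """Hypothetical helper: index of the last token of the longest run of
    speaker1 starting at current1 that also occurs contiguously in speaker2,
    or -1 if speaker1[current1] never occurs in speaker2."""
    last = -1
    for end in range(current1, len(speaker1)):
        run = speaker1[current1:end + 1]
        n = len(run)
        if any(speaker2[j:j + n] == run for j in range(len(speaker2) - n + 1)):
            last = end
        else:
            break  # any longer run contains this one, so it cannot occur either
    return last


def detect_first(speaker1: List[str], speaker2: List[str],
                 stop_list: Set[str], limit: int = 10) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch of detect(): return (start, end) of the first
    source found within the first `limit` entries of speaker1, or None."""
    for current1 in range(min(limit, len(speaker1))):
        end = longest_shared_end(current1, speaker1, speaker2)
        if end == -1:
            continue  # this token is never repeated by speaker 2
        source = speaker1[current1:end + 1]
        # Simplified Rule 1: keep the source if one token is not a stop word.
        if any(token not in stop_list for token in source):
            return current1, end
    return None


speaker1 = ["euh", "the", "red", "car"]
speaker2 = ["yes", "the", "red", "one"]
print(detect_first(speaker1, speaker2, {"euh", "the"}))  # (1, 2)
```

With limit=1 only "euh" is examined, so the same call returns None.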
- find_echos(start, end, speaker1, speaker2)
Find all echos of a source.
- Parameters
start – (int) start index of the entry of the source (speaker1)
end – (int) end index of the entry of the source (speaker1)
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
DataRepetition()
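A simplified picture of the echo search: for each token of the source, list the positions where speaker 2 produces the same token. This is an assumption for illustration only, not the SPPAS algorithm; the function find_echos_sketch is hypothetical, uses plain token lists instead of DataSpeaker, and returns a dictionary instead of a DataRepetition.

```python
from typing import Dict, List


def find_echos_sketch(start: int, end: int,
                      speaker1: List[str], speaker2: List[str]) -> Dict[int, List[int]]:
    """Map each source index of speaker1 (start..end, inclusive) to the
    indices of speaker2 holding the same token. Source tokens that
    speaker 2 never produces are left out."""
    echos: Dict[int, List[int]] = {}
    for i in range(start, end + 1):
        positions = [j for j, token in enumerate(speaker2) if token == speaker1[i]]
        if positions:
            echos[i] = positions
    return echos


speaker1 = ["the", "red", "car", "is", "fast"]
speaker2 = ["a", "red", "car", "indeed", "red"]
print(find_echos_sketch(1, 3, speaker1, speaker2))  # {1: [1, 4], 2: [2]}
```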
- static get_longest(current1, speaker1, speaker2)
Return the index of the last token of the longest repeated string.
- Parameters
current1 – (int) Current index in entries of speaker 1
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2 (or None)
- Returns
(int) Index or -1
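The contract of get_longest can be illustrated on plain lists of token strings. This is a hedged sketch, not the SPPAS code: the function get_longest_sketch is hypothetical, it assumes that "repeated string" means a contiguous run of speaker-1 tokens that also occurs contiguously in speaker 2, and it does not model the speaker2=None case.

```python
from typing import List


def get_longest_sketch(current1: int, speaker1: List[str], speaker2: List[str]) -> int:
    """Return the index of the last token of the longest run
    speaker1[current1..i] found contiguously in speaker2, or -1."""
    last = -1
    for end in range(current1, len(speaker1)):
        run = speaker1[current1:end + 1]
        n = len(run)
        found = any(speaker2[j:j + n] == run for j in range(len(speaker2) - n + 1))
        if not found:
            break  # a longer run would contain this one, so stop here
        last = end
    return last


speaker1 = ["the", "red", "car", "is", "fast"]
speaker2 = ["a", "red", "car", "indeed"]
print(get_longest_sketch(1, speaker1, speaker2))  # 2
print(get_longest_sketch(0, speaker1, speaker2))  # -1
```

Stopping at the first run that is not found is safe: if a run does not occur in speaker 2, no longer run starting at the same index can occur either.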
- select(start, end, speaker1, speaker2)
Decide whether to append an other-repetition.
- Parameters
start – (int) start index of the entry of the source (speaker1)
end – (int) end index of the entry of the source (speaker1)
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
annotations.OtherRepet.rules module
- filename
sppas.src.annotations.OtherRepet.rules.py
- author
Brigitte Bigi
- contact
- summary
Other-Repetitions rules to accept/reject a repetition.
- class annotations.OtherRepet.rules.OtherRules(stop_list=None)
Bases:
annotations.SelfRepet.rules.SelfRules
Rules to select other-repetitions.
The proposed rules deal with the number of words and the word frequencies, and distinguish whether the repetition is strict or not. The following rules are proposed for other-repetitions:
- Rule 1: A source is accepted if it contains one or more relevant tokens. Relevance depends on the speaker producing the echo;
- Rule 2: A source which contains at least K tokens is accepted if the repetition is strict.
Rule 1 requires a clear definition of the relevance of a token. Irrelevant tokens are stored in a stop-list, which should also contain very frequent tokens of the given language, such as adjectives, pronouns, etc.
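Rule 1 essentially reduces to a membership test against the stop-list. The sketch below is illustrative only: the function rule_relevance is hypothetical, the stop-list is a plain iterable of strings rather than a sppasVocabulary, and the speaker-dependent part of relevance is not modelled.

```python
from typing import Iterable, List


def rule_relevance(source: List[str], stop_list: Iterable[str]) -> bool:
    """Accept the source if at least one of its tokens is not in the stop-list."""
    stop = set(stop_list)
    return any(token not in stop for token in source)


stop_list = ["the", "a", "yes", "euh"]
print(rule_relevance(["the", "car"], stop_list))  # True
print(rule_relevance(["yes", "euh"], stop_list))  # False
```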
- __init__(stop_list=None)
Create an OtherRules instance.
- Parameters
stop_list – (sppasVocabulary or list) Irrelevant tokens.
- rule_strict(start, end, speaker1, speaker2)
Apply rule 2 to decide whether the selection is a repetition.
Rule 2: The selection is a repetition if it meets at least one of the following criteria:
- the selection contains at least 3 tokens;
- the repetition is strict (the source is strictly included in the echo).
- Parameters
start – (int) Index to start the selection
end – (int) Index to stop the selection
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
(bool)
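Rule 2, as stated for rule_strict, can be sketched as below. The function rule_strict_sketch and its signature are hypothetical: it takes the source and echo token lists directly instead of indices into DataSpeaker entries, and it reads "strictly included" as the source occurring as a contiguous, same-order run inside the echo.

```python
from typing import List


def rule_strict_sketch(source: List[str], echo: List[str], min_tokens: int = 3) -> bool:
    """Accept if the source has at least `min_tokens` tokens, or if it
    occurs as a contiguous run inside the echo."""
    if len(source) >= min_tokens:
        return True
    n = len(source)
    return any(echo[i:i + n] == source for i in range(len(echo) - n + 1))


print(rule_strict_sketch(["red", "car"], ["the", "red", "car"]))  # True
print(rule_strict_sketch(["red", "car"], ["car", "red"]))         # False
```

Either criterion is sufficient, which is why the length test short-circuits before the inclusion test.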
annotations.OtherRepet.sppasrepet module
- filename
sppas.src.annotations.OtherRepet.sppasrepet.py
- author
Brigitte Bigi
- contact
- summary
SPPAS integration of the detection of other-repetitions.
- class annotations.OtherRepet.sppasrepet.sppasOtherRepet(log=None)
Bases:
annotations.SelfRepet.sppasbaserepet.sppasBaseRepet
SPPAS Automatic Other-Repetition Detection.
Automatically detects other-repetitions. The results must then be filtered by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output consists of 2 tiers: sources and echoes.
- __init__(log=None)
Create a new sppasOtherRepet instance.
The log is used to report the annotation process and its results in a human-readable way. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- get_inputs(input_files)
Return 2 tiers with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- other_detection(inputtier1, inputtier2)
Other-Repetition detection.
- Parameters
inputtier1 – (Tier)
inputtier2 – (Tier)
Module contents
- filename
sppas.src.annotations.OtherRepet.__init__.py
- author
Brigitte Bigi
- contact
- summary
Other-Repetitions detection.
This package is the implementation of the following reference:
Brigitte Bigi, Roxane Bertrand, Mathilde Guardiola (2014). Automatic detection of other-repetition occurrences: application to French conversational speech, 9th International conference on Language Resources and Evaluation (LREC), Reykjavik (Iceland), pp. 2648-2652. ISBN: 978-2-9517408-8-4.
- class annotations.OtherRepet.OtherRules(stop_list=None)
Same documentation as annotations.OtherRepet.rules.OtherRules above.
- class annotations.OtherRepet.sppasOtherRepet(log=None)
Same documentation as annotations.OtherRepet.sppasrepet.sppasOtherRepet above.