annotations.OtherRepet package

Submodules

annotations.OtherRepet.detectrepet module

filename

sppas.src.annotations.OtherRepet.detectrepet.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Detection of other-repetitions.

class annotations.OtherRepet.detectrepet.OtherRepetition(stop_list=None)

Bases: annotations.SelfRepet.datastructs.DataRepetition

Other-Repetition automatic detection.

Search for the sources, then find where the echoes are.

__init__(stop_list=None)

Create a new OtherRepetition instance.

Parameters

stop_list – (sppasVocabulary) List of irrelevant tokens.

detect(speaker1, speaker2, limit=10)

Search for the first other-repetition in tokens.

Parameters
  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

  • limit – (int) Search no further than ‘limit’ entries of speaker 1

find_echos(start, end, speaker1, speaker2)

Find all echoes of a source.

Parameters
  • start – (int) start index of the entry of the source (speaker1)

  • end – (int) end index of the entry of the source (speaker1)

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

DataRepetition()
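As an illustration of what find_echos() is described as doing, here is a minimal sketch on plain token lists. It is not the SPPAS implementation: it assumes an echo is an exact, contiguous copy of the source tokens, and it returns (start, end) index pairs instead of a DataRepetition.

```python
from typing import List, Sequence, Tuple


def find_echos(start: int, end: int,
               speaker1: Sequence[str],
               speaker2: Sequence[str]) -> List[Tuple[int, int]]:
    """Return every (start2, end2) span of speaker2 that repeats speaker1[start:end+1].

    Hypothetical sketch: an echo is an exact, contiguous copy of the source.
    """
    source = list(speaker1[start:end + 1])
    size = len(source)
    if size == 0:
        return []
    echos = []
    # Slide a window of the source length over the tokens of speaker 2.
    for i in range(len(speaker2) - size + 1):
        if list(speaker2[i:i + size]) == source:
            echos.append((i, i + size - 1))
    return echos


spk1 = ["le", "petit", "chat", "dort"]
spk2 = ["ah", "le", "petit", "chat", "oui", "le", "petit", "chat"]
print(find_echos(0, 2, spk1, spk2))  # [(1, 3), (5, 7)]
```

A real echo may differ from its source (insertions, partial copies), which this exact-match window does not capture.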

get_all_echos()

Return a list of indexes of all the repeated echoes in speaker2.

static get_longest(current1, speaker1, speaker2)

Return the index of the last token of the longest repeated string.

Parameters
  • current1 – (int) Current index in entries of speaker 1

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2 (or None)

Returns

(int) Index or -1
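The following sketch shows one plausible reading of get_longest() and detect(): starting from an entry of speaker 1, the source is extended token by token as long as the whole run still occurs contiguously in speaker 2, and detect() returns the first such source found within the first `limit` entries. It works on plain lists of strings rather than DataSpeaker objects, ignores the stop-list and the rules applied by select(), and does not model the `speaker2=None` case. The helper `_contains` and the (start, end) return value of detect() are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def _contains(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """True if needle occurs as a contiguous run of tokens in haystack."""
    n = len(needle)
    return any(list(haystack[i:i + n]) == list(needle)
               for i in range(len(haystack) - n + 1))


def get_longest(current1: int, speaker1: Sequence[str],
                speaker2: Sequence[str]) -> int:
    """Index of the last token of the longest run of speaker1 starting at
    current1 that also occurs in speaker2, or -1 if there is none."""
    last = -1
    for end in range(current1, len(speaker1)):
        if not _contains(speaker2, speaker1[current1:end + 1]):
            break  # a longer run cannot match if this one does not
        last = end
    return last


def detect(speaker1: Sequence[str], speaker2: Sequence[str],
           limit: int = 10) -> Optional[Tuple[int, int]]:
    """First (start, end) source found in the first `limit` entries of speaker1."""
    for current1 in range(min(limit, len(speaker1))):
        end = get_longest(current1, speaker1, speaker2)
        if end != -1:
            return current1, end
    return None


spk1 = ["euh", "le", "petit", "chat", "dort"]
spk2 = ["ah", "le", "petit", "chat"]
print(get_longest(1, spk1, spk2))  # 3
print(detect(spk1, spk2))          # (1, 3)
```

The greedy extension can stop at the first failure because any longer run contains the failing run as a prefix.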

select(start, end, speaker1, speaker2)

Decide whether to append an other-repetition.

Parameters
  • start – (int) start index of the entry of the source (speaker1)

  • end – (int) end index of the entry of the source (speaker1)

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

annotations.OtherRepet.rules module

filename

sppas.src.annotations.OtherRepet.rules.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Other-Repetitions rules to accept/reject a repetition.

class annotations.OtherRepet.rules.OtherRules(stop_list=None)

Bases: annotations.SelfRepet.rules.SelfRules

Rules to select other-repetitions.

The proposed rules deal with the number of words and the word frequencies, and distinguish whether the repetition is strict or not. The following rules are proposed for other-repetitions:

  • Rule 1: A source is accepted if it contains one or more relevant tokens. Relevance depends on the speaker producing the echo;

  • Rule 2: A source which contains at least K tokens is accepted if the repetition is strict.

Rule 1 requires a clear definition of the relevance of a token. Irrelevant tokens are stored in a stop-list. The stop-list should also contain very frequent tokens of the given language, such as adjectives, pronouns, etc.
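A minimal sketch of Rule 1, assuming tokens are plain strings and the stop-list is any iterable of strings (the real class also accepts a sppasVocabulary). The speaker-dependent notion of relevance mentioned above is not modeled, and `rule_relevance` is a hypothetical name.

```python
from typing import Iterable, Sequence


def rule_relevance(source: Sequence[str], stop_list: Iterable[str]) -> bool:
    """Rule 1 (sketch): accept the source if at least one of its tokens
    is relevant, i.e. is not in the stop-list."""
    stops = set(stop_list)
    return any(token not in stops for token in source)


stop_list = ["le", "la", "de", "euh", "oui"]
print(rule_relevance(["le", "chat"], stop_list))  # True
print(rule_relevance(["euh", "oui"], stop_list))  # False
```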

__init__(stop_list=None)

Create an OtherRules instance.

Parameters

stop_list – (sppasVocabulary or list) Irrelevant tokens.

rule_strict(start, end, speaker1, speaker2)

Apply Rule 2 to decide whether the selection is a repetition.

Rule 2: The selection is a repetition if it meets at least one of the following criteria:

  • the selection contains at least 3 tokens;

  • the repetition is strict (the source is strictly included in the echo).

Parameters
  • start – (int) Index to start the selection

  • end – (int) Index to stop the selection

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

(bool)
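A sketch of the Rule 2 decision described above, with the 3-token threshold exposed as a parameter. It assumes plain token lists and assumes that "strict" means the source occurs unchanged and contiguous in the tokens of speaker 2; the real method may define strictness on the echo span differently. `is_strict` and `min_tokens` are hypothetical names.

```python
from typing import Sequence


def is_strict(source: Sequence[str], speaker2: Sequence[str]) -> bool:
    """Assumption: 'strict' means the source occurs unchanged and contiguous in speaker2."""
    n = len(source)
    return any(list(speaker2[i:i + n]) == list(source)
               for i in range(len(speaker2) - n + 1))


def rule_strict(start: int, end: int, speaker1: Sequence[str],
                speaker2: Sequence[str], min_tokens: int = 3) -> bool:
    """Rule 2 (sketch): accept if the selection is long enough OR the repetition is strict."""
    source = speaker1[start:end + 1]
    return len(source) >= min_tokens or is_strict(source, speaker2)


spk1 = ["le", "petit", "chat"]
spk2 = ["le", "joli", "petit", "chat"]
print(rule_strict(0, 2, spk1, spk2))  # True: 3 tokens
print(rule_strict(0, 1, spk1, spk2))  # False: "le petit" is interrupted by "joli"
print(rule_strict(1, 2, spk1, spk2))  # True: "petit chat" occurs unchanged
```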

annotations.OtherRepet.sppasrepet module

filename

sppas.src.annotations.OtherRepet.sppasrepet.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

SPPAS integration of the detection of other-repetitions.

class annotations.OtherRepet.sppasrepet.sppasOtherRepet(log=None)

Bases: annotations.SelfRepet.sppasbaserepet.sppasBaseRepet

SPPAS Automatic Other-Repetition Detection.

Automatically detect other-repetitions. The result must be re-filtered by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output consists of 2 tiers: sources and echoes.

__init__(log=None)

Create a new sppasOtherRepet instance.

The log is used to report on the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

fix_options(options)

Fix all options.

Parameters

options – list of sppasOption instances

get_input_patterns()

Pattern this annotation expects for its input filename.

get_inputs(input_files)

Return 2 tiers with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()

Pattern this annotation uses in an output filename.

other_detection(inputtier1, inputtier2)

Other-Repetition detection.

Parameters
  • inputtier1 – (Tier)

  • inputtier2 – (Tier)

run(input_files, output=None)

Run the automatic annotation process on an input.

The input is a pair of files: the main speaker and the echoing speaker.

Parameters
  • input_files – (list of str) Files with time-aligned tokens

  • output – (str) the output name

Returns

(sppasTranscription)

set_all_echos_tier(all_echos)

Create a tier with all tokens that are echo-candidates.

Parameters

all_echos – (bool)

Module contents

filename

sppas.src.annotations.OtherRepet.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Other-Repetitions detection.

This package is the implementation of the following reference:

Brigitte Bigi, Roxane Bertrand, Mathilde Guardiola (2014).
Automatic detection of other-repetition occurrences:
application to French conversational speech,
9th International conference on Language Resources and
Evaluation (LREC), Reykjavik (Iceland), pp. 2648-2652.
ISBN: 978-2-9517408-8-4.

class annotations.OtherRepet.OtherRules(stop_list=None)

Re-export of annotations.OtherRepet.rules.OtherRules, documented above.

class annotations.OtherRepet.sppasOtherRepet(log=None)

Re-export of annotations.OtherRepet.sppasrepet.sppasOtherRepet, documented above.