annotations.SpkLexRep package

Submodules

annotations.SpkLexRep.sppaslexrep module

filename

sppas.src.annotations.SpkLexRep.sppaslexrep.py

author

Brigitte Bigi, Laurent Vouriot

contact

develop@sppas.org

summary

Speaker Lexical Reprise automatic annotation

class annotations.SpkLexRep.sppaslexrep.LexReprise(win_idx, end_idx)[source]

Bases: object

Data structure to store a lexical reprise.

__init__(win_idx, end_idx)[source]
get_end()[source]
get_labels()[source]
get_start()[source]
set_content(dataspk)[source]

Set the labels from the content.

Parameters

dataspk – (DataSpk) The data content of the window that win_idx and end_idx refer to
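The accessors listed above suggest a small record holding a window index, an end index and the labels copied from the window. The following is a minimal stand-in written for illustration only, not the SPPAS implementation: the class name RepriseSketch is hypothetical, a plain list of tokens replaces the DataSpk object, and end_idx is assumed to be an inclusive index relative to the start of the window.

```python
class RepriseSketch:
    """Hypothetical stand-in for a lexical reprise record (not SPPAS code)."""

    def __init__(self, win_idx, end_idx):
        self._win_idx = win_idx   # index of the window in the token stream
        self._end_idx = end_idx   # assumed: inclusive index inside the window
        self._labels = []

    def get_start(self):
        return self._win_idx

    def get_end(self):
        return self._end_idx

    def get_labels(self):
        # Return a copy so callers cannot modify the stored labels.
        return list(self._labels)

    def set_content(self, window_tokens):
        # Assumption: the labels are the window tokens up to end_idx included.
        self._labels = list(window_tokens[: self._end_idx + 1])


reprise = RepriseSketch(win_idx=4, end_idx=1)
reprise.set_content(["the", "cat", "sat"])
```

Under these assumptions, the record keeps only the first two tokens of the window as labels.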

class annotations.SpkLexRep.sppaslexrep.sppasLexRep(log=None)[source]

Bases: annotations.SelfRepet.sppasbaserepet.sppasBaseRepet

SPPAS integration of the speaker lexical reprise annotation.

Main differences compared to repetitions: the span option fixes the maximum number of consecutive tokens to analyze, and the span window has a duration limit.

__init__(log=None)[source]

Create a new sppasLexRep instance.

The log is used to report the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

static create_tier(sources, locations)[source]

Create a tier from content and localization lists.

Parameters
  • sources – (dict) dict of sources – in fact, the indexes.

  • locations – (list) list of locations corresponding to the tokens

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Parameters

options – list of sppasOption instances

get_inputs(input_files)[source]

Return 2 tiers with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

static get_longest(speaker1, speaker2)[source]

Return the index of the last token of the longest repeated sequence.

A non-speech event may occur in the middle of the repeated sequence or in the middle of the source sequence, and the tokens do not have to be repeated in the same order.

Parameters
  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

(int) Index or -1
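The behaviour described for get_longest can be illustrated with plain token lists. This is a minimal sketch under stated assumptions, not the SPPAS code: the function name longest_repeated_end is hypothetical, the DataSpeaker objects are replaced by lists of strings, and the set of non-speech markers is an assumption chosen for the example.

```python
# Assumed markers for non-speech events (silence, laughter, noise, pause).
NON_SPEECH = {"#", "@", "*", "+"}


def longest_repeated_end(source_tokens, echo_tokens):
    """Return the index of the last source token of the longest leading
    sequence whose speech tokens all occur in echo_tokens, or -1."""
    echo = set(echo_tokens) - NON_SPEECH   # order of the echo is ignored
    last = -1
    for i, token in enumerate(source_tokens):
        if token in NON_SPEECH:
            continue                       # does not break the sequence
        if token not in echo:
            break                          # first token that is not repeated
        last = i
    return last


end = longest_repeated_end(["the", "cat", "#", "sat", "down"],
                           ["sat", "the", "cat"])
```

Here the silence marker in the middle of the source is skipped and the different token order in the second list does not matter, so the sequence ends at "sat".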

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

lexical_variation_detect(tier1, tier2)[source]

Detect the lexical variations between 2 tiers.

Parameters
  • tier1 – (sppasTier)

  • tier2 – (sppasTier)

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of list of str) time-aligned tokens of 2 files

  • output – (str) the output file name

Returns

(sppasTranscription)

select(index1, speaker1, speaker2)[source]

Append (or not) a repetition.

Parameters
  • index1 – (int) end index of the entry of the source (speaker1)

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

(bool)

set_alpha(value)[source]

Set the alpha option.

Parameters

value – (float) Coefficient used to estimate stopwords

set_span(value)[source]

Set the max span, in number of words.

Parameters

value – (int) Max number of tokens in a span window.

set_span_duration(value)[source]

Set the spandur option.

Parameters

value – (float, int) Max duration of a span window.

set_stopwords(value)[source]

Set the stopwords option.

Parameters

value – (bool) Whether to add estimated stopwords

static tier_to_list(tier, loc=False)[source]

Create a list with the tokens contained in a tier.

Parameters
  • tier – (sppasTier)

  • loc – (bool) If True, create the corresponding list of sppasLocation()

Returns

(list, list) list of unicode content and list of location
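The conversion from a tier to two parallel lists can be sketched with plain tuples. The example below is an illustration only: intervals_to_lists is a hypothetical name, each annotation is modelled as a (begin, end, text) tuple instead of an sppasTier entry, and returning an empty location list when loc is False is an assumption.

```python
def intervals_to_lists(annotations, loc=False):
    """Split (begin, end, text) tuples into a content list and,
    if loc is True, a parallel list of (begin, end) locations."""
    contents = [text for (_begin, _end, text) in annotations]
    # Assumption: no locations are collected unless explicitly requested.
    locations = [(begin, end) for (begin, end, _text) in annotations] if loc else []
    return contents, locations


tokens, places = intervals_to_lists(
    [(0.0, 0.4, "hello"), (0.4, 0.9, "world")], loc=True
)
```

Keeping the two lists parallel means the location of a detected token can be recovered from its index in the content list.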

windowing(content, location=None)[source]

Return the list of DataSpeaker matching the given content.

Parameters
  • content – (list) List of entries

  • location – (list) List of locations of the entries

Returns

list of DataSpeaker
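How the span and span duration options could jointly bound a window is sketched below. This is not the SPPAS windowing code: span_windows is a hypothetical helper, tokens and (begin, end) intervals are plain lists, and building one window per start index is an assumption made for the example.

```python
def span_windows(tokens, intervals, max_tokens, max_duration):
    """Return one (start, end) pair per token, end inclusive.

    A window grows while it holds fewer than max_tokens tokens and
    the time from its first begin to its last end is within max_duration.
    """
    windows = []
    for start in range(len(tokens)):
        end = start
        while (end + 1 < len(tokens)
               and end + 1 - start < max_tokens
               and intervals[end + 1][1] - intervals[start][0] <= max_duration):
            end += 1
        windows.append((start, end))
    return windows


spans = span_windows(
    ["a", "b", "c", "d"],
    [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 10.0)],
    max_tokens=3,
    max_duration=4.0,
)
```

In this run the first window stops at three tokens because of the token limit, while the later ones stop before the long last token because of the duration limit.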

Module contents

filename

sppas.src.annotations.SpkLexRep.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Speaker Lexical Reprises detection.

Search for lexical repetition of sequences of tokens/lemmas in one file compared to another one.