annotations package

Subpackages

Submodules

annotations.annotationsexc module

filename

sppas.src.annotations.annotationsexc.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Exceptions for annotations package.

exception annotations.annotationsexc.AnnotationOptionError(key)[source]

Bases: KeyError

:ERROR 1010:.

Unknown option with key {key}.

__init__(key)[source]
exception annotations.annotationsexc.AnnotationSectionConfigFileError(section_name)[source]

Bases: ValueError

:ERROR 4014:.

Missing section {section_name} in the configuration file.

__init__(section_name)[source]
exception annotations.annotationsexc.AudioChannelError(nb)[source]

Bases: OSError

:ERROR 1070:.

An audio file with only one channel is expected. Got {nb} channels.

__init__(nb)[source]
exception annotations.annotationsexc.BadInputError[source]

Bases: TypeError

:ERROR 1040:.

Bad input tier type. Note: this exception should be renamed BadTierInputError and take the expected type as a parameter.

__init__()[source]
exception annotations.annotationsexc.EmptyDirectoryError(dirname)[source]

Bases: OSError

:ERROR 1220:.

The directory {dirname} does not contain relevant data.

__init__(dirname)[source]
exception annotations.annotationsexc.EmptyInputError(name)[source]

Bases: OSError

:ERROR 1020:.

Empty input tier {name}.

__init__(name)[source]
exception annotations.annotationsexc.EmptyOutputError(name)[source]

Bases: OSError

:ERROR 1025:.

Empty output result. No file created.

__init__(name)[source]
exception annotations.annotationsexc.NoChannelInputError[source]

Bases: OSError

:ERROR 1036:.

Missing input audio channel. Please read the documentation.

__init__()[source]
exception annotations.annotationsexc.NoInputError[source]

Bases: OSError

:ERROR 1030:.

No valid input.

__init__()[source]
exception annotations.annotationsexc.NoTierInputError[source]

Bases: OSError

:ERROR 1035:.

Missing input tier. Please read the documentation.

__init__()[source]
exception annotations.annotationsexc.SizeInputsError(number1, number2)[source]

Bases: OSError

:ERROR 1050:.

Inconsistency between the number of intervals of the input tiers. Got: {:d} and {:d}.

__init__(number1, number2)[source]
exception annotations.annotationsexc.SmallSizeInputError(number)[source]

Bases: OSError

:ERROR 1060:.

Not enough annotations in the input tier. At least {:d} are required.

__init__(number)[source]
exception annotations.annotationsexc.TooSmallInputError(name)[source]

Bases: OSError

:ERROR 1021:.

Input tier {name} does not contain enough annotations.

__init__(name)[source]
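
The exceptions of this module share one pattern: a numeric error code and a message template filled from the constructor arguments. The following is a minimal sketch of that pattern, not the SPPAS source. The class name `DemoOptionError`, the helper `get_option` and the exact layout of the message are assumptions made for illustration.

```python
class DemoOptionError(KeyError):
    """Hypothetical stand-in for an error such as AnnotationOptionError."""

    CODE = 1010  # error code taken from the documentation of the module

    def __init__(self, key):
        # Assumed layout: the error code followed by the filled template.
        self.parameter = ":ERROR {:d}: Unknown option with key {}.".format(
            self.CODE, key)
        super().__init__(self.parameter)

    def __str__(self):
        # KeyError would otherwise display the repr() of its argument.
        return self.parameter


def get_option(options, key):
    """Return options[key] or raise the demo error (hypothetical helper)."""
    if key not in options:
        raise DemoOptionError(key)
    return options[key]
```

Because the demo class derives from KeyError, a caller that already catches KeyError keeps working while still receiving the coded message.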

annotations.autils module

filename

sppas.src.annotations.autils.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Utility classes for the automatic annotations.

class annotations.autils.SppasFiles[source]

Bases: object

DEFAULT_EXTENSIONS = {'ANNOT': '.xra', 'ANNOT_ANNOT': '.xra', 'ANNOT_MEASURE': '.PitchTier', 'ANNOT_TABLE': '.arff', 'AUDIO': '.wav', 'IMAGE': '.jpg', 'VIDEO': '.mp4'}
OUT_FORMATS = ('ANNOT', 'IMAGE', 'VIDEO')
static get_default_extension(filetype_format)[source]

Return the default extension defined for a given format.

Parameters

filetype_format – (str)

Returns

(str) Extension with the dot or empty string

static get_informat_extensions(filetype_format)[source]

Return the list of input extensions a format can support.

Parameters

filetype_format – (str)

Returns

(list) Extensions, starting with the dot.

static get_outformat_extensions(filetype_format)[source]

Return the list of output extensions an out_format can support.

Parameters

filetype_format – (str)

Returns

(list) Extensions, starting with the dot.
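
As an illustration of the lookup described for get_default_extension, here is a minimal sketch built on the DEFAULT_EXTENSIONS values listed above. It is a stand-alone function written for this page, not the SPPAS implementation; returning an empty string for an unknown format is an assumption based on the documented return value.

```python
DEFAULT_EXTENSIONS = {
    "ANNOT": ".xra",
    "ANNOT_ANNOT": ".xra",
    "ANNOT_MEASURE": ".PitchTier",
    "ANNOT_TABLE": ".arff",
    "AUDIO": ".wav",
    "IMAGE": ".jpg",
    "VIDEO": ".mp4",
}


def get_default_extension(filetype_format):
    """Return the default extension of a format, or '' if it is unknown."""
    # Assumption: the lookup is exact (no case folding of the format name).
    return DEFAULT_EXTENSIONS.get(filetype_format, "")
```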

annotations.baseannot module

filename

sppas.src.annotations.baseannot.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Base class for any SPPAS integration of an automatic annotation.

class annotations.baseannot.sppasBaseAnnotation(config, log=None)[source]

Bases: object

Base class for any automatic annotation integrated into SPPAS.

__init__(config, log=None)[source]

Base class for any SPPAS automatic annotation.

Load default options/member values from a configuration file. This file must be in paths.etc.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters
  • config – (str) Name of the JSON configuration file, without path.

  • log – (sppasLog) Human-readable logs.

batch_processing(file_names, progress=None)[source]

Perform the annotation on a bunch of files.

Can be used by an annotation manager to launch all the annotations on all checked files of a workspace in a single process.

The given list of inputs can then be either:
  • a list of file names: [file1, file2, …], or

  • a list of lists of file names: [(file1_a, file1_b), (file2_a,)], or

  • a list of mixed files/list of files: [file1, (file2a, file2b), …].

Parameters
  • file_names – (list) List of inputs

  • progress – ProcessProgressTerminal() or ProcessProgressDialog()

Returns

(list of str) List of created files
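
The three accepted shapes of file_names can be reduced to a single one before processing. The sketch below shows one possible normalization; the function name `normalize_inputs` is hypothetical and the real method may handle its inputs differently.

```python
def normalize_inputs(file_names):
    """Turn a mixed list of inputs into a list of tuples of file names."""
    normalized = []
    for entry in file_names:
        if isinstance(entry, (list, tuple)):
            # Already a group of files: keep the order, freeze as a tuple.
            normalized.append(tuple(entry))
        else:
            # A single file name becomes a one-element tuple.
            normalized.append((entry,))
    return normalized
```

With this shape, every entry can then be handed to a per-file routine in the same way, whatever form the caller used.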

fix_options(options)[source]

Fix all options of the annotation from a list of sppasOption().

Parameters

options – (list of sppasOption)

fix_out_file_ext(output, out_format='ANNOT')[source]

Return the output with an appropriate file extension.

If the output already has an extension, it is not changed.

Parameters
  • output – (str) Base name or filename

  • out_format – (str) One of ANNOT, IMAGE, VIDEO

Returns

(str) filename
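
A minimal sketch of the behaviour described for fix_out_file_ext, using only the standard library. The table `DEFAULT_OUT_EXTENSIONS` is an assumption that reuses the default values documented for SppasFiles; the real method reads the extensions configured on the annotation instance.

```python
import os.path

# Assumed defaults, copied from SppasFiles.DEFAULT_EXTENSIONS.
DEFAULT_OUT_EXTENSIONS = {"ANNOT": ".xra", "IMAGE": ".jpg", "VIDEO": ".mp4"}


def fix_out_file_ext(output, out_format="ANNOT"):
    """Append the extension of out_format only if output has none."""
    _, ext = os.path.splitext(output)
    if ext:
        # The name already ends with an extension: leave it unchanged.
        return output
    return output + DEFAULT_OUT_EXTENSIONS[out_format]
```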

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

By default, the extensions are those of annotated files. Can be overridden to change the list of supported extensions: they must contain the dot.

Returns

(list of list)

get_input_patterns()[source]

List of patterns that the annotation expects for the input filenames.

Returns

(list of str)

static get_opt_input_extensions()[source]

Extensions that the annotation expects for its optional input filename.

get_option(key)[source]

Return the option value of a given key or raise KeyError.

Parameters

key – (str) Identifier of the option whose value is returned.

Raises

KeyError

get_out_name(filename, output_format='')[source]

Return the output filename from the input one.

Output filename is created from the given filename, the annotation output pattern and the given output format (if any).

Parameters
  • filename – (str) Name of the input file

  • output_format – (str) Extension of the output file with the dot

Returns

(str)

get_output_pattern()[source]

Pattern that the annotation uses for its output filename.

get_types()[source]

Return the list of types this annotation can perform.

If this annotation is expecting another file, the type allows it to be found by using the references of the workspace (if any).

load_resources(*args, **kwargs)[source]

Load the linguistic resources.

print_diagnosis(*filenames)[source]

Print the diagnosis of a list of files in the user report.

Parameters

filenames – (list) List of files.

print_filename(filename)[source]

Print the annotation name applied on a filename in the user log.

Parameters

filename – (str) Name of the file to annotate.

print_options()[source]

Print the list of options in the user log.

run(input_files, output=None)[source]

Run the automatic annotation process on a given input.

The input is a list of files the annotation needs: audio, video, transcription, pitch, etc.

Either returns the list of created files if the given output is not none, or the created object (often a sppasTranscription) if no output was given.

Parameters
  • input_files – (list of str) The required and optional input(s)

  • output – (str) The output name with or without extension

Returns

(sppasTranscription OR list of created file names)

run_for_batch_processing(input_files)[source]

Perform the annotation on a file.

This method is called by ‘batch_processing’. It fixes the name of the output file and calls the run method.

Parameters

input_files – (list of str) the required input(s) for a run

Returns

created output file name or None

set_default_out_extensions()[source]

Return the default output extension of each format.

The default extension of each format is defined in the config.

set_out_extension(extension, out_format='ANNOT')[source]

Set the extension for a specific out format.

Parameters
  • extension – (str) File extension for created files

  • out_format – (str) One of ANNOT, IMAGE, VIDEO

static transfer_metadata(from_trs, to_trs)[source]

Transfer the metadata from a sppasTranscription to another one.

The identifier is not copied, and metadata that already exists in the destination is not overwritten.
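
The copy rule of transfer_metadata can be sketched with plain dictionaries. This is an illustration only: sppasTranscription objects are replaced by dicts, and the key name "id" used for the identifier is an assumption.

```python
def transfer_metadata(from_meta, to_meta, id_key="id"):
    """Copy entries of from_meta into to_meta without overwriting any."""
    for key, value in from_meta.items():
        if key == id_key:
            continue  # the identifier is never copied
        if key in to_meta:
            continue  # an existing entry is left untouched
        to_meta[key] = value
    return to_meta
```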

annotations.diagnosis module

filename

sppas.src.annotations.diagnosis.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Diagnose if files are appropriate for automatic annotations.

class annotations.diagnosis.sppasDiagnosis[source]

Bases: object

Diagnose if files are appropriate.

A set of methods to check if files are valid for SPPAS automatic annotations. Each method returns a status and a message depending on whether the given file matches the requirements.

EXPECTED_CHANNELS = 1
EXPECTED_FRAME_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2
static check_audio_file(filename)[source]

Check an audio file.

Are verified:

  1. the format of the file (error);

  2. the number of channels (error);

  3. the sample width (error or warning);

  4. the framerate (error or warning);

  5. the filename (warning).

Parameters

filename – (str) name of the input file

Returns

tuple with (status identifier, message)
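
The checks listed for check_audio_file can be approximated for WAV files with the standard `wave` module. The sketch below is not the SPPAS code: the function name `check_wav_file`, the messages, and the choice of which mismatch is an error and which is a warning are assumptions. The status values reuse the identifiers documented for sppasLog.print_message (0 for OK, 1 for WARNING, -1 for ERROR).

```python
import wave

EXPECTED_CHANNELS = 1
EXPECTED_FRAME_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2

OK, WARNING, ERROR = 0, 1, -1


def check_wav_file(filename):
    """Return (status, message) for a WAV file, using the stdlib only."""
    try:
        with wave.open(filename, "rb") as audio:
            channels = audio.getnchannels()
            width = audio.getsampwidth()
            rate = audio.getframerate()
    except (wave.Error, EOFError, OSError) as e:
        return ERROR, "Not a readable WAV file: {}".format(e)

    # Assumption: a wrong number of channels is always an error.
    if channels != EXPECTED_CHANNELS:
        return ERROR, "Expected 1 channel, got {:d}.".format(channels)

    # Assumption: other mismatches only produce a warning.
    problems = []
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append("sample width is {:d} bytes".format(width))
    if rate != EXPECTED_FRAME_RATE:
        problems.append("frame rate is {:d} Hz".format(rate))
    if problems:
        return WARNING, "Conversion needed: " + ", ".join(problems) + "."
    return OK, "Valid audio file."
```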

static check_file(filename)[source]

Check file of any type: audio or image or annotated file.

The extension of the filename is used to know the type of the file.

Parameters

filename – (str) name of the input file to diagnose.

Returns

tuple with (status identifier, message)

static check_img_file(filename)[source]

Check an image file.

Are verified:

opencv can open the file

Parameters

filename – (string) name of the input file

Returns

tuple with (status identifier, message)

static check_trs_file(filename)[source]

Check an annotated file.

Are verified:

  1. the format of the file (error);

  2. the file encoding (error);

  3. the filename (warning).

Parameters

filename – (string) name of the input file

Returns

tuple with (status identifier, message)

static check_video_file(filename)[source]

Check a video file.

Are verified:

opencv can open the file

Parameters

filename – (string) name of the input file

Returns

tuple with (status identifier, message)

annotations.infotier module

filename

sppas.src.annotations.infotier.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Tier with meta information about SPPAS.

class annotations.infotier.sppasMetaInfoTier(meta_object=None)[source]

Bases: sppas.src.structs.metainfo.sppasMetaInfo

Meta information manager about SPPAS.

Manager of meta information about SPPAS. Allows creating a tier with activated meta-information.

__init__(meta_object=None)[source]

Create a new sppasMetaInfoTier instance.

Add and activate all known information about SPPAS.

Parameters

meta_object – (sppasMetadata) where to get meta infos.

create_time_tier(begin, end, tier_name='MetaInformation')[source]

Create a tier with activated information as annotations.

Parameters
  • begin – (float) Begin midpoint value

  • end – (float) End midpoint value

  • tier_name – (str) Name of the tier to create

Returns

sppasTier

annotations.log module

filename

sppas.src.annotations.log.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

The Procedure Outcome Report of automatic annotations of SPPAS.

class annotations.log.sppasLog(parameters=None)[source]

Bases: object

A log file utility class dedicated to automatic annotations.

Class to manage the SPPAS automatic annotations log file, which is also called the “Procedure Outcome Report”.

MAX_INDENT = 10
STR_INDENT = ' ... '
STR_ITEM = '  - '
__init__(parameters=None)[source]

Create a sppasLog instance and open an output stream to NULL.

Parameters

parameters – (sppasParam)

close()[source]

Close the current output stream.

create(filename)[source]

Create and open a new output stream.

Parameters

filename – (str) Output filename

static get_indent_text(number)[source]

Return a string representing some indentation.

Parameters

number – (int) A positive integer.

static get_status_text(status_id)[source]

Return a status text from a status identifier.

Parameters

status_id – (int)

open(filename)[source]

Open an existing file and set the output stream.

Parameters

filename – (str) Output filename

print_annotations_header()[source]

Print the parameters information in the output stream.

Do not print anything if no parameters were given.

print_header()[source]

Print the parameters information in the output file stream.

print_item(main_info, second_info=None)[source]

Print an item in the output stream.

Parameters
  • main_info – (str) Main information to print

  • second_info – (str) A secondary info to print

print_message(message, indent=0, status=None)[source]

Print a message at the end of the current output stream.

Parameters
  • message – (str) The message to communicate

  • indent – (int) Shift the message with indents

  • status – (int) A status identifier

0 means OK, 1 means WARNING, 2 means IGNORED, 3 means INFO, -1 means ERROR.
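
The status identifiers and the indentation constants of the class can be combined as in the sketch below. The `[ STATUS ]` layout, the clamping of the indent and the function name `format_message` are assumptions; only the constants and the status values come from this page.

```python
MAX_INDENT = 10
STR_INDENT = " ... "

# Status identifiers as documented for print_message.
STATUS_TEXT = {0: "OK", 1: "WARNING", 2: "IGNORED", 3: "INFO", -1: "ERROR"}


def format_message(message, indent=0, status=None):
    """Build one report line from a message, an indent and a status."""
    # Assumption: the indent is clamped to the range [0, MAX_INDENT].
    indent = max(0, min(int(indent), MAX_INDENT))
    prefix = STR_INDENT * indent
    if status is None:
        return prefix + message
    return "{}[ {} ] {}".format(prefix, STATUS_TEXT[status], message)
```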

print_newline()[source]

Print a CR in the output file stream, do nothing if logging.

print_raw_text(text)[source]

Print a text at the end of the output stream.

Parameters

text – (str) text to print

print_separator()[source]

Print a line in the output file stream, do nothing if logging.

print_stat_item(step_number, value=None)[source]

Print a statistic value in the output stream for a given step.

Do not print anything if no parameters were given.

Parameters
  • step_number – (1..N)

  • value – (str) A statistic value. If it is not given, the status (enabled or disabled) is printed instead.

print_stats(stats)[source]

Print the statistics values in the output stream for a given step.

Do not print anything if no parameters were given.

Parameters

stats – List of values (one for each annotation)

print_step(step_number)[source]

Print a step name in the output stream from its number.

Parameters

step_number – (1..N) Number of an annotation defined in a sppasParam instance.

annotations.manager module

filename

sppas.src.annotations.manager.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Automatic annotations manager for SPPAS integrated classes.

class annotations.manager.sppasAnnotationsManager[source]

Bases: threading.Thread

Parent class for running annotation processes.

Run annotations on a set of files.

__init__()[source]

Create a new instance.

Initialize a Thread.

annotate(parameters, progress=None)[source]

Execute the activated annotations.

Get execution information from the ‘parameters’ object. Create a Procedure Outcome Report if a filename is set in the parameters.

get_annot_files(annotation)[source]

Search for files of the workspace to be annotated by the given annotation.

Parameters

annotation – (sppasBaseAnnot) Annotation instance

Returns

List of file names matching patterns and extensions

set_do_merge(do_merge)[source]

Fix if the ‘annotate’ method has to create a merged file or not.

Parameters

do_merge – (bool) if set to True, a merged file will be created

annotations.param module

filename

sppas.src.annotations.param.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Parametrization of automatic annotations.

class annotations.param.annotationParam(filename=None)[source]

Bases: object

Annotation data parameters.

Class to store meta data of an automatic annotation like its name, description, supported languages, etc.

__init__(filename=None)[source]

Create a new annotationParam instance.

Parameters

filename – (str) Annotation configuration file

get_activate()[source]

Return the activation status of the annotation (bool).

get_api()[source]

Return the name of the class to instantiate to perform this annotation.

get_descr()[source]

Return the description of the annotation (str).

get_key()[source]

Return the identifier of the annotation (str).

get_lang()[source]

Return the language or an empty string or None.

get_langlist()[source]

Return the list of available languages (list of str).

get_langresource()[source]

Return the list of language resources.

get_name()[source]

Return the name of the annotation (str).

get_option(step)[source]

Return the step-th option.

get_option_by_key(key)[source]

Return an option from its key.

get_options()[source]

Return the list of options of the annotation.

get_reference_identifiers()[source]

Return the list of identifiers of the references.

get_reference_url(id_ref)[source]

Return the url of a given reference or an empty string.

get_types()[source]

Return the list of types the annotation can support (list of str).

parse(filename)[source]

Parse a configuration file to fill members.

Parameters

filename – (str) Annotation configuration file (.ini)

set_activate(activate)[source]

Enable the annotation but only if this annotation is valid.

Parameters

activate – (bool) Enable or disable the annotation

Returns

(bool) enabled or disabled

set_lang(lang)[source]

Set the language of the annotation, if it is accepted.

Parameters

lang – (str) Language to fix for the annotation

Returns

(bool) Language is set or not

set_option_value(key, value)[source]

Change value of an option.

Parameters
  • key – (str) Identifier of the option

  • value – (any) New value for the option

Raises

KeyError

class annotations.param.sppasParam(annotation_keys=None)[source]

Bases: object

Annotation parameters manager.

Parameters of a set of annotations.

__init__(annotation_keys=None)[source]

Create a new sppasParam instance with default values.

Parameters

annotation_keys – (list) List of annotations to load. None=ALL.

activate_annotation(stepname)[source]
activate_step(step)[source]
add_to_workspace(files)[source]

Add a list of files or directories into the workspace.

The state of all the added files is set to CHECKED.

Parameters

files – (str or list of str)

disable_step(step)[source]
get_checked_roots()[source]

Return the list of entries to annotate.

get_lang(step=None)[source]
get_langlist(step=2)[source]
get_langresource(step)[source]
get_options(step)[source]
get_output_extension(out_format)[source]

Return the output extension defined for the given out_format.

get_ref_ids(step)[source]

Return a list of identifiers of the reference publications.

Parameters

step – (int) Annotation index

get_ref_url(step, ref_id)[source]

Return the URL of the reference publication.

Parameters
  • step – (int) Annotation index

  • ref_id – (str) Identifier of a reference

get_report_filename()[source]

Return the name of the file for the Procedure Outcome Report.

get_step(step)[source]

Return the ‘sppasParam’ instance of the annotation.

get_step_descr(step)[source]
get_step_idx(annotation_key)[source]

Get the annotation step index from an annotation key.

Parameters

annotation_key – (str)

Raises

KeyError

get_step_key(step)[source]
get_step_name(step)[source]
get_step_numbers()[source]
get_step_status(step)[source]
get_step_types(step)[source]
get_steplist()[source]
get_workspace()[source]

Return the workspace.

load_annotations(annotation_files=None)[source]

Load the annotation configuration files.

Load from a list of given file names (without path) or from the default sppas ui configuration file.

Parameters

annotation_files – (list) List of annotations to load. None=ALL.

parse_config_file()[source]

Parse the sppasui.json file.

Parse the file to get the list of annotations and parse the corresponding “json” file.

set_lang(language, step=None)[source]
set_option_value(step, key, value)[source]
set_output_extension(output_ext, output_format)[source]

Fix the output extension of all the annotations of a given out_format.

Parameters
  • output_ext – (str) File extension (with or without a dot)

  • output_format – (str) Either ANNOT, AUDIO, VIDEO OR IMAGE

Returns

(str) the assigned extension

Raises

ValueError
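
Since output_ext is accepted with or without a dot, it has to be normalized before it is stored. The following sketch shows one way to do it; `normalize_extension` is a hypothetical helper, and the real method may also check that the extension is supported for the given format.

```python
OUT_FORMATS = ("ANNOT", "AUDIO", "VIDEO", "IMAGE")


def normalize_extension(output_ext, output_format):
    """Return output_ext with a leading dot, after checking the format."""
    if output_format not in OUT_FORMATS:
        raise ValueError("Unknown output format: {}".format(output_format))
    output_ext = output_ext.strip()
    if not output_ext or output_ext == ".":
        raise ValueError("Empty file extension.")
    if not output_ext.startswith("."):
        output_ext = "." + output_ext
    return output_ext
```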

set_report_filename(filename)[source]

Fix the name of the file to save the report of the annotations.

Parameters

filename – (str) Filename for the Procedure Outcome Report

set_workspace(wkp)[source]

annotations.searchtier module

filename

sppas.src.annotations.searchtier.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Search for tier with various names.

class annotations.searchtier.sppasFindTier[source]

Bases: object

Search for tiers in a sppasTranscription.

static aligned_lemmas(trs)[source]

Return the tier with time-aligned lemmas.

Parameters

trs – (sppasTier or None)

static aligned_phones(trs)[source]

Return the tier with time-aligned phonemes.

Parameters

trs – (sppasTier or None)

static aligned_syllables(trs)[source]

Return the tier with time-aligned syllables.

Parameters

trs – (sppasTier or None)

static aligned_tokens(trs)[source]

Return the tier with time-aligned tokens.

Parameters

trs – (sppasTier or None)

static phonetization(trs)[source]

Return the tier with phonetization.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static pitch(trs)[source]

Return the tier with pitch values.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static pitch_anchors(trs)[source]

Return the tier with pitch anchors, like momel result.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static tokenization(trs, pattern='')[source]

Return the tier with tokenization.

In case of EOT, several tiers with tokens are available. Priority is given to faked (i.e. without pattern).

Parameters
  • trs – (sppasTranscription)

  • pattern – (str) Priority pattern

Returns

(sppasTier or None)

static transcription(trs)[source]

Return the tier with orthographic transcription.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)
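
The matching rules of sppasFindTier are not described on this page. As an illustration only, a search by keywords on tier names could look like the sketch below; the function `find_tier` and the case-insensitive substring rule are assumptions, and tiers are reduced to their names.

```python
def find_tier(tier_names, *keywords):
    """Return the first tier name containing all keywords, or None."""
    wanted = [k.lower() for k in keywords]
    for name in tier_names:
        lowered = name.lower()
        # Every keyword must occur somewhere in the tier name.
        if all(k in lowered for k in wanted):
            return name
    return None
```

For example, with tiers named "Tokens", "PhonAlign" and "TokensAlign", searching for "phon" and "align" selects "PhonAlign".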

annotations.windowing module

filename

sppas.src.annotations.windowing.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Windowing system on a tier.

class annotations.windowing.sppasTierWindow(tier)[source]

Bases: object

Windowing system on a tier.

Support windows in the time domain or with tag separators, both with or without overlaps among windows.

__init__(tier)[source]

Create an instance of a sppasTierWindow.

Parameters

tier – (sppasTier) Tier to analyze

anchor_split(duration=1, step=1, separators=['#'])[source]

Return a set of annotations within a window given by separators.

Parameters
  • duration – (int) the duration of a window - number of intervals among the separators

  • step – (int) the step duration - number of intervals

  • separators – (list) list of separators

Returns

(List of sppasAnnSet)

continuous_anchor_split(separators)[source]

Return all time intervals within a window given by separators.

Parameters

separators – (list) list of separators

Returns

(List of intervals)

static drange(x, y, jump)[source]

Mimics ‘range’ with either float or int values.

Parameters
  • x – start value

  • y – end value

  • jump – step value
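
A generator with the same signature as drange can be sketched as follows. It is a minimal version written for this page; rejecting a non-positive jump is an assumption.

```python
def drange(x, y, jump):
    """Yield x, x + jump, x + 2 * jump, ... while the value is below y."""
    if jump <= 0:
        raise ValueError("jump must be strictly positive")
    value = x
    while value < y:
        yield value
        value += jump
```

With float steps such as 0.1, repeated addition accumulates rounding errors, so the values may differ slightly from what a decimal calculation suggests.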

search_for_annotations(start_time, end_time, delta=0.5, ignore=[])[source]

Return the set of annotations within the given interval.

Parameters
  • start_time – (int/float)

  • end_time – (int/float)

  • delta – (float) Rate of time the annotation must overlap

  • ignore – (list of str) List of tag contents to ignore – currently applied only on the best tag

Returns

(sppasAnnSet) The annotations matching all the requirements
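
The role of delta and ignore can be illustrated with annotations reduced to (begin, end, label) tuples. In this sketch the overlap rate is measured relative to the duration of the annotation and compared with `>=`; both choices are assumptions, as is the helper name `overlap_rate`.

```python
def overlap_rate(ann_begin, ann_end, win_begin, win_end):
    """Return the fraction of the annotation that lies inside the window."""
    duration = ann_end - ann_begin
    if duration <= 0:
        return 0.0
    shared = min(ann_end, win_end) - max(ann_begin, win_begin)
    return max(0.0, shared) / duration


def search_for_annotations(annotations, start_time, end_time,
                           delta=0.5, ignore=()):
    """Select (begin, end, label) tuples overlapping the window enough."""
    selected = []
    for begin, end, label in annotations:
        if label in ignore:
            continue  # ignored tag content
        if overlap_rate(begin, end, start_time, end_time) >= delta:
            selected.append((begin, end, label))
    return selected
```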

time_split(duration, step, delta=0.6)[source]

Return a set of annotations within a time window.

Parameters
  • duration – (float) the duration of a window

  • step – (float) the step duration

  • delta – (float) percentage of confidence for an overlapping label

Returns

(sppasAnnSet) Set of sppasAnnotation

Raises

sppasTypeError, ValueError
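
The window boundaries implied by duration and step can be computed as in the sketch below. The function `time_windows` is hypothetical, and truncating the last windows at the end of the tier is an assumption. Each start is derived from an integer index, which avoids accumulating rounding errors.

```python
def time_windows(begin, end, duration, step):
    """Return the (start, stop) pairs of windows covering [begin, end]."""
    if duration <= 0 or step <= 0:
        raise ValueError("duration and step must be strictly positive")
    windows = []
    index = 0
    while begin + index * step < end:
        start = begin + index * step
        # Assumption: a window never extends past the end of the tier.
        windows.append((start, min(start + duration, end)))
        index += 1
    return windows
```

A step smaller than the duration gives overlapping windows; a step equal to the duration gives adjacent ones.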

Module contents

filename

sppas.src.annotations.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

The automatic annotations of SPPAS.

annotations: automatic annotations.

This package includes all the automatic annotations, each one in its own package, and the classes to manage the data to be annotated and the resulting annotated data.

Requires the following other packages:

  • config

  • utils

  • exc

  • structs

  • wkps

  • resources

  • anndata

  • audiodata

  • imgdata – if “video” feature enabled

  • videodata – if “video” feature enabled

class annotations.ImageFaceLandmark[source]

Bases: object

__init__()[source]
class annotations.SppasFiles[source]

Bases: object

DEFAULT_EXTENSIONS = {'ANNOT': '.xra', 'ANNOT_ANNOT': '.xra', 'ANNOT_MEASURE': '.PitchTier', 'ANNOT_TABLE': '.arff', 'AUDIO': '.wav', 'IMAGE': '.jpg', 'VIDEO': '.mp4'}
OUT_FORMATS = ('ANNOT', 'IMAGE', 'VIDEO')
static get_default_extension(filetype_format)[source]

Return the default extension defined for a given format.

Parameters

filetype_format – (str)

Returns

(str) Extension with the dot or empty string

static get_informat_extensions(filetype_format)[source]

Return the list of input extensions a format can support.

Parameters

filetype_format – (str)

Returns

(list) Extensions, starting with the dot.

static get_outformat_extensions(filetype_format)[source]

Return the list of output extensions an out_format can support.

Parameters

filetype_format – (str)

Returns

(list) Extensions, starting with the dot.

class annotations.StopWords(case_sensitive=False)[source]

Bases: sppas.src.resources.vocab.sppasVocabulary

A vocabulary that can automatically evaluate a list of Stop-Words.

An entry ‘w’ is relevant for the speaker if its probability is less than a threshold:

P(w) <= 1 / (alpha * V)

where ‘alpha’ is an empirical coefficient and ‘V’ is the vocabulary size of the speaker.
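
As a worked illustration of the threshold, the sketch below estimates P(w) from token counts and keeps as stop-words the entries whose probability exceeds 1 / (alpha * V). It is a simplified stand-in for the evaluate method, working on a plain list of tokens; the function name `estimate_stop_words` is hypothetical.

```python
from collections import Counter


def estimate_stop_words(tokens, alpha=0.5):
    """Return (threshold, stop_words) for a list of tokens."""
    if not tokens:
        raise ValueError("Empty list of tokens.")
    counts = Counter(tokens)
    vocab_size = len(counts)  # V: number of distinct tokens
    threshold = 1.0 / (alpha * vocab_size)
    total = len(tokens)
    # A token is relevant if P(w) <= threshold; the others are stop-words.
    stop_words = sorted(
        w for w, n in counts.items() if n / total > threshold)
    return threshold, stop_words
```

With ten tokens made of six occurrences of "the" and four distinct other words, V is 5 and the threshold is 1 / (0.5 * 5) = 0.4; only "the", with P = 0.6, is above it.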

MAX_ALPHA = 4.0
MIN_ANN_NUMBER = 5
__init__(case_sensitive=False)[source]

Create a new StopWords instance.

Parameters

case_sensitive – (bool) Considers the case of entries or not.

property alpha

Return the value of alpha coefficient (float).

copy()[source]

Make a deep copy of the instance.

Returns

(StopWords)

evaluate(tier=None, merge=True)[source]

Add entries to the list of stop-words from the content of a tier.

Estimate if a token is relevant: if not it adds it in the stop-list.

Parameters
  • tier – (sppasTier) A tier with entries to be analyzed.

  • merge – (bool) Merge with the existing list (if True) or delete the existing list and create a new one (if False)

Returns

(int) Number of entries added into the list

Raises

EmptyInputError, TooSmallInputError

get_alpha()[source]

Return the value of alpha coefficient (float).

get_threshold()[source]

Return the last estimated threshold (float).

get_v()[source]

Return the last estimated vocabulary size (int).

load(filename, merge=True)[source]

Load a list of stop-words from a file.

Parameters
  • filename – (str)

  • merge – (bool) Merge with the existing list (if True) or delete the existing list (if False)

set_alpha(alpha)[source]

Fix the alpha option.

Alpha is a coefficient to add specific stop-words in the list. Default value is 0.5.

Parameters

alpha – (float) Value in range [0..4]

class annotations.sppasActivity(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the Activity generation.

__init__(log=None)[source]

Create a new sppasActivity instance.

Parameters

log – (sppasLog) Human-readable logs.

convert(tier, tmin, tmax)[source]

Create an Activity and ActivityDuration tier.

Parameters
  • tier – (sppasTier)

  • tmin – (sppasPoint)

  • tmax – (sppasPoint)

Returns

(sppasTier, sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

  • duration

Parameters

options – (sppasOption)

get_input_patterns()[source]

List of patterns this annotation expects for its input filenames.

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raises

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Important: options could be changed!

Parameters
  • input_files – (list of str) Time-aligned tokens

  • output – (str) the output name - either filename or basename

Returns

(sppasTranscription)

set_duration_tier(value)[source]

Fix the activity duration option.

Parameters

value – (bool) Activity tier generation.

class annotations.sppasAlign(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the Alignment automatic annotation.

Author

Brigitte Bigi

Contact

develop@sppas.org

This class can produce from 1 to 5 tiers with names:

  • PhonAlign

  • TokensAlign (if tokens are given in the input)

  • PhnTokAlign - option (if tokens are given in the input)

How to use sppasAlign?

>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones, audio, tokens], output)
__init__(log=None)[source]

Create a new sppasAlign instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]

Perform speech segmentation of data.

Parameters
  • phon_tier – (Tier) phonetization.

  • tok_tier – (Tier) tokenization, or None.

  • tok_faked_tier – (Tier) rescue tokenization, or None.

  • input_audio – (str) Audio file name.

  • workdir – (str) The working directory

Returns

tier_phn, tier_tok

fix_options(options)[source]

Fix all options.

Available options are:

  • clean

  • basic

  • aligner

Parameters

options – (sppasOption)

static fix_workingdir(inputaudio=None)[source]

Fix the working directory to store temporarily the data.

Parameters

inputaudio – (str) Audio file name

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the audio file name and the 2 tiers.

Two tiers: the tier with phonetization and the tier with text normalization.

Parameters

input_files – (list)

Raises

NoTierInputError

Returns

(sppasChannel, sppasTier, sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(model, model_L1=None, **kwargs)[source]

Fix the acoustic model directory.

Create a SpeechSegmenter and AlignerIO.

Parameters
  • model – (str) Directory of the acoustic model of the language of the text

  • model_L1 – (str) Directory of the acoustic model of the mother language of the speaker

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Phonemes, and optionally tokens, audio

  • output – (str) the output name

Returns

(sppasTranscription)

set_aligner(aligner_name)[source]

Fix the name of the aligner.

Parameters

aligner_name – (str) Case-insensitive name of the aligner.

set_basic(basic)[source]

Fix the basic option.

Parameters

basic – (bool) If basic is set to True, a basic segmentation will be performed if the main aligner fails.

set_clean(clean)[source]

Fix the clean option.

Parameters

clean – (bool) If clean is set to True then temporary files will be removed.

class annotations.sppasAnnotationsManager[source]

Bases: threading.Thread

Parent class for running annotation processes.

Run annotations on a set of files.

__init__()[source]

Create a new instance.

Initialize a Thread.

annotate(parameters, progress=None)[source]

Execute the activated annotations.

Get execution information from the ‘parameters’ object. Create a Procedure Outcome Report if a filename is set in the parameters.

get_annot_files(annotation)[source]

Search for files of the workspace to be annotated by the given annotation.

Parameters

annotation – (sppasBaseAnnot) Annotation instance

Returns

List of file names matching patterns and extensions

set_do_merge(do_merge)[source]

Fix whether the ‘annotate’ method has to create a merged file or not.

Parameters

do_merge – (bool) if set to True, a merged file will be created

class annotations.sppasCuedSpeech(*args, **kwargs)[source]

Bases: object

__init__(*args, **kwargs)[source]
class annotations.sppasFaceDetection(*args, **kwargs)[source]

Bases: object

__init__(*args, **kwargs)[source]
class annotations.sppasFaceSights[source]

Bases: object

__init__()[source]
class annotations.sppasFillIPUs(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the fill in IPUs automatic annotation.

__init__(log=None)[source]

Create a new sppasFillIPUs instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(channel, text_tier)[source]

Return a tier with transcription aligned to the audio.

Parameters
  • channel – (sppasChannel) Input audio channel

  • text_tier – (sppasTier) Input transcription text in a PointTier

fix_options(options)[source]

Fix all options.

Available options are:

  • threshold: volume threshold to decide a window is silence or not

  • win_length: length of the window for the estimation of volume values

  • min_sil: minimum duration of a silence

  • min_ipu: minimum duration of an IPU

  • shift_start: start boundary shift value.

  • shift_end: end boundary shift value.

Parameters

options – (sppasOption)

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the channel and the tier with ipus.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasChannel, sppasTier)

get_min_ipu()[source]
get_min_sil()[source]
get_output_pattern()[source]

Pattern this annotation uses in an output filename.

get_shift_end()[source]
get_shift_start()[source]
run(input_files, output=None)[source]

Run the automatic annotation process on an input.

input_files is a tuple (audio, raw transcription).

Parameters
  • input_files – (list of str) (audio, ortho)

  • output – (str) the output file name

Returns

(sppasTranscription)

run_for_batch_processing(input_files)[source]

Perform the annotation on a file.

This method is called by ‘batch_processing’. It fixes the name of the output file and calls the run method.

Overridden to NOT ANNOTATE if an annotation already exists.

Parameters

input_files – (list of str) the required inputs for a run

Returns

output file name or None

set_min_ipu(value)[source]

Fix the initial minimum duration of an IPU.

Parameters

value – (float) Duration in seconds.

set_min_sil(value)[source]

Fix the initial minimum duration of a silence.

Parameters

value – (float) Duration in seconds.

set_shift_end(value)[source]

Fix the end boundary shift value.

Parameters

value – (float) Duration in seconds.

set_shift_start(value)[source]

Fix the start boundary shift value.

Parameters

value – (float) Duration in seconds.

class annotations.sppasFindTier[source]

Bases: object

Search for tiers in a sppasTranscription.

static aligned_lemmas(trs)[source]

Return the tier with time-aligned lemmas.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static aligned_phones(trs)[source]

Return the tier with time-aligned phonemes.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static aligned_syllables(trs)[source]

Return the tier with time-aligned syllables.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static aligned_tokens(trs)[source]

Return the tier with time-aligned tokens.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static phonetization(trs)[source]

Return the tier with phonetization.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static pitch(trs)[source]

Return the tier with pitch values.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static pitch_anchors(trs)[source]

Return the tier with pitch anchors, like momel result.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

static tokenization(trs, pattern='')[source]

Return the tier with tokenization.

In the case of an Enriched Orthographic Transcription (EOT), several tiers with tokens are available. Priority is given to the faked one (i.e. without pattern).

Parameters
  • trs – (sppasTranscription)

  • pattern – (str) Priority pattern

Returns

(sppasTier or None)

static transcription(trs)[source]

Return the tier with orthographic transcription.

Parameters

trs – (sppasTranscription)

Returns

(sppasTier or None)

class annotations.sppasIVA(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

Estimate IVA on a tier.

Get or create segments then map them into a dictionary where:

  • key is a label assigned to the segment;

  • value is the list of observed values in the segment.
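
The mapping described above can be sketched on plain Python data. This is only an illustration, not the SPPAS implementation: the function name `map_values_to_segments`, the tuple-based inputs, the 1-based numbering of labels and the half-open interval test are all assumptions. Only the ‘sgmt_’ prefix comes from this documentation.

```python
def map_values_to_segments(values, segments, prefix="sgmt_"):
    """Group timed values by the segment that contains them.

    values: list of (time, value) pairs.
    segments: list of (start, end) pairs, in seconds.
    Returns a dict {label: [values...]}; labels are prefix + 1-based index.
    """
    result = {}
    for idx, (start, end) in enumerate(segments, start=1):
        label = "{}{}".format(prefix, idx)
        # Assumption: a value belongs to a segment if start <= time < end.
        result[label] = [v for t, v in values if start <= t < end]
    return result


pitch = [(0.1, 100.0), (0.2, 110.0), (0.6, 90.0)]
print(map_values_to_segments(pitch, [(0.0, 0.5), (0.5, 1.0)]))
```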

__init__(log=None)[source]

Create a new sppasIVA instance.

Parameters

log – (sppasLog) Human-readable logs.

convert(input_tier_values, input_tier_segments)[source]

Estimate IVA on the given input tier with values.

Parameters
  • input_tier_values – (sppasTier) Tier with numerical values.

  • input_tier_segments – (sppasTier) Tier with intervals.

Returns

(sppasTranscription)

fix_options(options)[source]

Fix all options.

Parameters

options – (sppasOption)

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

An annotated file with measure values (pitch, intensity…), and an annotated file with a sppasTier of type ‘interval’.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_input_tiers(input_files)[source]

Return tiers with values and segments.

Parameters

input_files – (list)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

static iva_to_tier(iva_result, sgmts_tier, tier_name, tag_type='float')[source]

Create a tier from one of the IVA results (mean, sd, …).

Parameters
  • iva_result – One of the results of IVA

  • sgmts_tier – (sppasTier) Tier with the segments

  • tier_name – (str) Name of the output tier

  • tag_type – (str) Type of the sppasTag to be included

Returns

(sppasTier)

static iva_to_tier_reglin(iva_result, sgmts_tier, intercept=True)[source]

Create tiers of intercept, slope from the IVA result.

Parameters
  • iva_result – intercept, slope result of IVA

  • sgmts_tier – (sppasTier) Tier with the segments

  • intercept – (boolean) Export the intercept. If False, export the slope.

Returns

(sppasTier)
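
For readers unfamiliar with the intercept and slope mentioned above, an ordinary least-squares fit is a common way to obtain them. Whether SPPAS uses exactly this estimator is not stated here, so the following is a generic sketch; the name `linear_regression` and the (x, y) input format are hypothetical.

```python
def linear_regression(points):
    """Least-squares fit of y = intercept + slope * x over (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = linear_regression([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(intercept, slope)
```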

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Values and Segments in a single file or in different ones

  • output – (str) the output file name

Returns

(sppasTranscription)

set_eval(occ=None, total=None, mean=None, median=None, stdev=None, linreg=None)[source]

Set IVA evaluations to perform.

Parameters
  • total – (bool) Estimates total of values in segments.

  • mean – (bool) Estimates mean of values in segments.

  • median – (bool) Estimates median of values in segments.

  • stdev – (bool) Estimates standard deviation of values in segments.

  • linreg – (bool) Estimates linear regression of values in segments.
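
The evaluations listed above are standard descriptive statistics. As a rough sketch of what each one computes on the values of a single segment, assuming that `occ` means the number of values (it is not documented here) and that the standard deviation is the sample one, with the hypothetical helper name `evaluate_segment`:

```python
import statistics


def evaluate_segment(values):
    """Return summary statistics for the non-empty values of one segment."""
    return {
        "occ": len(values),      # assumption: number of observed values
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


print(evaluate_segment([100.0, 110.0, 120.0]))
```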

set_input_tiername_segments(tiername)[source]

Fix the name of the tier with segments.

Parameters

tiername – (str) Default is ‘TokensAlign’

set_input_tiername_values(tiername)[source]

Fix the name of the tier with values.

Parameters

tiername – (str) Default is ‘PitchTier’

set_segments_separators(entry)[source]

Fix the separators to create segments.

Parameters

entry – (str) Entries separated by whitespace.

set_sgmt_prefix_label(prefix)[source]

Fix the prefix to add to each segment.

Parameters

prefix – (str) Default is ‘sgmt_’

tier_to_labelled_segments(segments, input_tier_values)[source]

Create the segment intervals within the values.

Parameters
  • segments – (sppasTier) segment intervals to get values

  • input_tier_values – (sppasTier) tags are float/int values

Returns

(dict, sppasTier) dict of segment/values, labelled segments

tier_to_segments(input_tier)[source]

Create segment intervals.

Parameters

input_tier – (sppasTier)

Returns

(sppasTier)

class annotations.sppasIntsint(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the INTSINT automatic annotation.

__init__(log=None)[source]

Create a new instance.

Parameters

log – (sppasLog) Human-readable logs.

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

INTSINT requires momel anchors which can either be stored in a TextGrid file or in a PitchTier file.

Returns

(list of list)

get_input_tier(input_files)[source]

Return the tier with Momel anchors.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier) Tier of type Point

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) momel anchors

  • output – (str) the output file name

Returns

(sppasTranscription)

static tier_to_anchors(momel_tier)[source]

Initialize INTSINT attributes from a Tier with anchors.

Parameters

momel_tier – (sppasTier) A PointTier with float values.

Returns

List of tuples (time, f0 value)

static tones_to_tier(tones, anchors_tier)[source]

Convert the INTSINT result into a tier.

Parameters
  • tones – (list)

  • anchors_tier – (sppasTier)

class annotations.sppasLexMetric(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the occ and rank estimator.
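
To give an idea of what an occurrence and rank estimation involves, here is a minimal sketch on a list of tokens. It is not the SPPAS algorithm: the function name `occ_and_rank` is hypothetical, and the ranking convention (rank 1 for the most frequent token, ties sharing a rank) is an assumption.

```python
from collections import Counter


def occ_and_rank(tokens):
    """Return {token: (occurrences, rank)}; rank 1 is the most frequent token.

    Tokens with the same number of occurrences share the same rank.
    """
    counts = Counter(tokens)
    # Distinct occurrence counts, from the most to the least frequent.
    levels = sorted(set(counts.values()), reverse=True)
    rank_of = {occ: rank for rank, occ in enumerate(levels, start=1)}
    return {tok: (occ, rank_of[occ]) for tok, occ in counts.items()}


print(occ_and_rank("the cat the dog the cat".split()))
```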

__init__(log=None)[source]

Create a new sppasLexMetric instance.

Parameters

log – (sppasLog) Human-readable logs.

fix_options(options)[source]

Fix all options.

Parameters

options – list of sppasOption instances

get_input_tier(input_files)[source]

Return the input tier from the inputs.

Parameters

input_files – (list)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Time-aligned tokens, or other

  • output – (str) the output file name

Returns

(sppasTranscription)

set_alt(alt)[source]

Fix the alt option, used to estimate occ and rank.

Parameters

alt – (bool)

set_segments_separators(entry)[source]

Fix the separators to create segments.

Parameters

entry – (str) Entries separated by whitespace.

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

tier_to_segment_occ(input_tier)[source]

Create segment intervals and eval the number of occurrences.

Parameters

input_tier – (sppasTier)

Returns

(sppasTier)

class annotations.sppasLexRep(log=None)[source]

Bases: annotations.SelfRepet.sppasbaserepet.sppasBaseRepet

SPPAS integration of the speaker lexical variation annotation.

Main differences compared to repetitions: The span option is used to fix the max number of continuous tokens to analyze. The span window has a duration limit.

__init__(log=None)[source]

Create a new sppasLexRep instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

static create_tier(sources, locations)[source]

Create a tier from content and localization lists.

Parameters
  • sources – (dict) dict of sources – in fact, the indexes.

  • locations – (list) list of locations corresponding to the tokens

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Parameters

options – list of sppasOption instances

get_inputs(input_files)[source]

Return 2 tiers with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

static get_longest(speaker1, speaker2)[source]

Return the index of the last token of the longest repeated sequence.

A non-speech event may occur in the middle of the repeated sequence or in the middle of the source sequence, and the tokens do not have to be repeated in the same order.

Parameters
  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

(int) Index or -1

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

lexical_variation_detect(tier1, tier2)[source]

Detect the lexical variations between 2 tiers.

Parameters
  • tier1 – (sppasTier)

  • tier2 – (sppasTier)

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of list of str) time-aligned tokens of 2 files

  • output – (str) the output file name

Returns

(sppasTranscription)

select(index1, speaker1, speaker2)[source]

Append (or not) a repetition.

Parameters
  • index1 – (int) end index of the entry of the source (speaker1)

  • speaker1 – (DataSpeaker) Entries of speaker 1

  • speaker2 – (DataSpeaker) Entries of speaker 2

Returns

(bool)

set_alpha(value)[source]

Set the alpha option.

Parameters

value – (float) Coefficient used to estimate stopwords

set_span(value)[source]

Set the max span, in number of words.

Parameters

value – (int) Max nb of tokens in a span window.

set_span_duration(value)[source]

Set the spandur option.

Parameters

value – (float, int) Max duration of a span window.

set_stopwords(value)[source]

Set the stopwords option.

Parameters

value – (bool) Enable the addition of estimated stopwords

static tier_to_list(tier, loc=False)[source]

Create a list with the tokens contained in a tier.

Parameters
  • tier – (sppasTier)

  • loc – (bool) if true create the corresponding list of sppasLocation()

Returns

(list, list) list of unicode content and list of location

windowing(content, location=None)[source]

Return the list of DataSpeaker matching the given content.

Parameters
  • content – (list) List of entries

  • location – (list) List of locations of the entries

Returns

list of DataSpeaker

class annotations.sppasMomel(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of Momel.

__init__(log=None)[source]

Create a new sppasMomel instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

static anchors_to_tier(anchors)[source]

Transform anchors to a sppasTier.

Anchors are stored in frames. They are converted to seconds (a frame lasts 10 ms).

Parameters

anchors – (List of Anchor)

Returns

(sppasTier)
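
The frame-to-seconds conversion mentioned above is a simple scaling. A minimal sketch, assuming anchors are given as plain (frame, f0) pairs; the real method works on Anchor objects and returns a sppasTier, and the name `frames_to_seconds` is hypothetical:

```python
FRAME_DURATION = 0.01  # seconds; one frame lasts 10 ms


def frames_to_seconds(anchors):
    """Convert (frame, f0) pairs into (time_in_seconds, f0) pairs."""
    return [(frame * FRAME_DURATION, f0) for frame, f0 in anchors]


print(frames_to_seconds([(150, 120.0), (275, 98.5)]))
```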

convert(pitch)[source]

Search for momel anchors.

Parameters

pitch – (list of float) pitch values sampled every 10 ms

Returns

sppasTier

estimate_momel(ipu_pitch, current_time)[source]

Estimate momel on an IPU.

Parameters
  • ipu_pitch – (list of float) Pitch values of an IPU.

  • current_time – (float) Time value of the last pitch value

Returns

(list of Anchor)

fix_options(options)[source]

Fix all options.

Available options are:

  • lfen1

  • hzinf

  • hzsup

  • maxec

  • lfen2

  • seuildiff_x

  • seuildiff_y

  • glitch

Parameters

options – (sppasOption)

static fix_pitch(input_filename)[source]

Load pitch values from a file.

The given file is expected to contain a tier named “Pitch” with a pitch value every 10 ms, or a tier named “PitchTier”.

Returns

A list of pitch values (one value every 10 ms).

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Pitch values

  • output – (str) the output name

Returns

(sppasTranscription)

set_option_elim_glitch(value)[source]
set_option_hi(value)[source]
set_option_lo(value)[source]
set_option_maxerr(value)[source]
set_option_mind(value)[source]
set_option_minr(value)[source]
set_option_win1(value)[source]
set_option_win2(value)[source]
class annotations.sppasOtherRepet(log=None)[source]

Bases: annotations.SelfRepet.sppasbaserepet.sppasBaseRepet

SPPAS Automatic Other-Repetition Detection.

Detect automatically other-repetitions. Result must be re-filtered by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output is made of 2 tiers with sources and echos.

__init__(log=None)[source]

Create a new sppasOtherRepet instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

fix_options(options)[source]

Fix all options.

Parameters

options – list of sppasOption instances

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return 2 tiers with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

other_detection(inputtier1, inputtier2)[source]

Other-Repetition detection.

Parameters
  • inputtier1 – (Tier)

  • inputtier2 – (Tier)

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Input file is a tuple with 2 files: the main speaker and the echoing speaker.

Parameters
  • input_files – (list of str) File(s) with time-aligned tokens

  • output – (str) the output name

Returns

(sppasTranscription)

set_all_echos_tier(all_echos)[source]

Create a tier with all tokens that are echo-candidates.

Parameters

all_echos – (bool)

class annotations.sppasOverActivity(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the automatic overlaps estimator on intervals.

__init__(log=None)[source]

Create a new instance.

Parameters

log – (sppasLog) Human-readable logs.

detection(tier_spk1, tier_spk2)[source]

Search for the overlaps of annotations.

Parameters
  • tier_spk1 – (sppasTier)

  • tier_spk2 – (sppasTier)
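
Searching for overlaps between two speakers amounts to intersecting two lists of time intervals. The sketch below shows the general two-pointer technique on plain (start, end) pairs; it is not taken from SPPAS, the name `find_overlaps` is hypothetical, and it assumes each list is sorted and free of internal overlaps.

```python
def find_overlaps(intervals1, intervals2):
    """Return the (start, end) spans shared by two sorted interval lists.

    Each list holds non-overlapping (start, end) pairs sorted by start time.
    """
    overlaps = []
    i = j = 0
    while i < len(intervals1) and j < len(intervals2):
        start = max(intervals1[i][0], intervals2[j][0])
        end = min(intervals1[i][1], intervals2[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance the list whose current interval finishes first.
        if intervals1[i][1] <= intervals2[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


spk1 = [(0.0, 2.0), (3.0, 5.0)]
spk2 = [(1.5, 3.5), (4.0, 6.0)]
print(find_overlaps(spk1, spk2))
```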

fix_options(options)[source]

Fix all options.

Parameters

options – (sppasOption)

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return 2 tiers with name given in options.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Input files is a list of 2 files: the activity of speaker 1 and the activity of speaker 2.

Parameters
  • input_files – (list of str) Time-aligned items, Time-aligned items

  • output – (str) the output name

Returns

(sppasTranscription)

set_default_out_items()[source]

Set the list of tags to be ignored.

set_out_items(str_entry)[source]

Fix the list of tags to be ignored from the given string.

Parameters

str_entry – (str) Entries separated by commas
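
A comma-separated entry of this kind is typically split and stripped before use. A minimal sketch of such parsing, with the hypothetical helper name `parse_out_items`; how SPPAS treats empty entries or surrounding whitespace is an assumption:

```python
def parse_out_items(str_entry):
    """Split a comma-separated string into a list of stripped, non-empty tags."""
    return [item.strip() for item in str_entry.split(",") if item.strip()]


print(parse_out_items("dummy, noise ,,laugh"))
```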

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

class annotations.sppasParam(annotation_keys=None)[source]

Bases: object

Annotation parameters manager.

Parameters of a set of annotations.

__init__(annotation_keys=None)[source]

Create a new sppasParam instance with default values.

Parameters

annotation_keys – (list) List of annotations to load. None=ALL.

activate_annotation(stepname)[source]
activate_step(step)[source]
add_to_workspace(files)[source]

Add a list of files or directories into the workspace.

The state of all the added files is set to CHECKED.

Parameters

files – (str or list of str)

disable_step(step)[source]
get_checked_roots()[source]

Return the list of entries to annotate.

get_lang(step=None)[source]
get_langlist(step=2)[source]
get_langresource(step)[source]
get_options(step)[source]
get_output_extension(out_format)[source]

Return the output extension defined for the given out_format.

get_ref_ids(step)[source]

Return a list of identifiers of the reference publications.

Parameters

step – (int) Annotation index

get_ref_url(step, ref_id)[source]

Return the URL of the reference publication.

Parameters
  • step – (int) Annotation index

  • ref_id – (str) Identifier of a reference

get_report_filename()[source]

Return the name of the file for the Procedure Outcome Report.

get_step(step)[source]

Return the ‘sppasParam’ instance of the annotation.

get_step_descr(step)[source]
get_step_idx(annotation_key)[source]

Get the annotation step index from an annotation key.

Parameters

annotation_key – (str)

Raises

KeyError

get_step_key(step)[source]
get_step_name(step)[source]
get_step_numbers()[source]
get_step_status(step)[source]
get_step_types(step)[source]
get_steplist()[source]
get_workspace()[source]

Return the workspace.

load_annotations(annotation_files=None)[source]

Load the annotation configuration files.

Load from a list of given file names (without path) or from the default sppas ui configuration file.

Parameters

annotation_files – (list) List of annotations to load. None=ALL.

parse_config_file()[source]

Parse the sppasui.json file.

Parse the file to get the list of annotations and parse the corresponding “json” file.

set_lang(language, step=None)[source]
set_option_value(step, key, value)[source]
set_output_extension(output_ext, output_format)[source]

Fix the output extension of all the annotations of a given out_format.

Parameters
  • output_ext – (str) File extension (with or without a dot)

  • output_format – (str) Either ANNOT, AUDIO, VIDEO OR IMAGE

Returns

(str) the assigned extension

Raise

ValueError
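
Since the extension may be given with or without a dot, some normalization is implied. The following sketch shows one way to do it; the name `normalize_extension` and the conditions under which ValueError is raised are assumptions, as the documentation above does not detail them.

```python
OUT_FORMATS = ("ANNOT", "AUDIO", "VIDEO", "IMAGE")


def normalize_extension(output_ext, output_format):
    """Return the extension with a single leading dot.

    Raise ValueError if the format is not one of OUT_FORMATS or if the
    extension is empty (assumed error conditions).
    """
    if output_format not in OUT_FORMATS:
        raise ValueError("Unknown output format: {!r}".format(output_format))
    ext = output_ext.strip().lstrip(".")
    if not ext:
        raise ValueError("Empty file extension")
    return "." + ext


print(normalize_extension("xra", "ANNOT"))
print(normalize_extension(".wav", "AUDIO"))
```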

set_report_filename(filename)[source]

Fix the name of the file to save the report of the annotations.

Parameters

filename – (str) Filename for the Procedure Outcome Report

set_workspace(wkp)[source]
class annotations.sppasPhon(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the Phonetization automatic annotation.

__init__(log=None)[source]

Create a sppasPhon instance without any linguistic resources.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(tier)[source]

Phonetize annotations of a tokenized tier.

Parameters

tier – (Tier) the ortho transcription previously tokenized.

Returns

(Tier) phonetized tier with name “Phones”

fix_options(options)[source]

Fix all options.

Available options are:

  • phonunk

  • usesstdtokens

Parameters

options – (sppasOption)

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(dict_filename=None, map_filename=None, **kwargs)[source]

Set the pronunciation dictionary and the mapping table.

Parameters
  • dict_filename – (str) The pronunciation dictionary in HTK-ASCII format with UTF-8 encoding.

  • map_filename – (str) The filename of a mapping table. It is used to generate new pronunciations by mapping phonemes of the dict.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Normalized text

  • output – (str) the output name

Returns

(sppasTranscription)

set_unk(unk)[source]

Fix the unk option value.

Parameters

unk – (bool) If unk is set to True, the system attempts to phonetize unknown entries (i.e. tokens missing in the dictionary). Otherwise, the phonetization of an unknown entry unit is set to the default stamp.

set_usestdtokens(stdtokens)[source]

Fix the stdtokens option.

Parameters

stdtokens – (bool) If it is set to True, the phonetization uses the standard transcription as input, instead of the faked transcription. This option makes sense only for an Enriched Orthographic Transcription.

class annotations.sppasRMS(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the automatic RMS estimator on intervals.

Estimate the root-mean-square of segments, i.e. sqrt(sum(S_i^2)/n). This is a measure of the power in an audio signal.
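
The formula above can be written in a few lines. This is a minimal illustration on a plain list of samples, not the SPPAS code; the function name `rms` is hypothetical.

```python
import math


def rms(samples):
    """Return sqrt(sum(s_i^2) / n) for a non-empty sequence of samples."""
    if not samples:
        raise ValueError("rms() requires at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


print(rms([3, -4, 3, -4]))
```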

__init__(log=None)[source]

Create a new sppasRMS instance.

Parameters

log – (sppasLog) Human-readable logs.

estimator(tier)[source]

Estimate RMS on all non-empty intervals.

Parameters

tier – (sppasTier)

fix_options(options)[source]

Fix all options.

Parameters

options – (sppasOption)

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the channel and the tier with ipus.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasChannel, sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Input file is a tuple with 2 files: the audio file and the annotation file

Parameters
  • input_files – (list of str) (audio, time-aligned items)

  • output – (str) the output name

Returns

(sppasTranscription)

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

class annotations.sppasReOcc(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the automatic re-occurrences annotation.

__init__(log=None)[source]

Create a new sppasReOcc instance with only the general rules.

Parameters

log – (sppasLog) Human-readable logs.

detection(tier_spk1, tier_spk2)[source]

Search for the re-occurrences of annotations.

Parameters
  • tier_spk1 – (sppasTier)

  • tier_spk2 – (sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

Parameters

options – (sppasOption)

get_inputs(input_files)[source]

Return 2 tiers with name given in options.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Input file is a tuple with 2 files: the main speaker and the echoing speaker.

Parameters
  • input_files – (list of list of str) Time-aligned items, Time-aligned items

  • output – (str) the output name

Returns

(sppasTranscription)

set_span(span)[source]

Fix the span option.

Span is the maximum number of annotations to search for re-occ. A value of 1 means to search only in the next annotation.

Parameters

span – (int) Value between 1 and 20
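
To make the role of the span concrete, here is a simplified sketch on plain (start, end, label) triples. It assumes that, for each annotation of the first speaker, only the `span` next annotations of the second speaker are inspected, and that a re-occurrence is an identical label; both points, the name `find_reoccurrences` and the ValueError are assumptions rather than documented SPPAS behaviour.

```python
def find_reoccurrences(items1, items2, span=1):
    """Return (index1, index2) pairs of labels of items1 repeated in items2.

    items1, items2: lists of (start, end, label) sorted by start time.
    For each item of items1, only the `span` next items of items2 (those
    starting at or after its end) are inspected.
    """
    if not 1 <= span <= 20:
        raise ValueError("span must be between 1 and 20")
    pairs = []
    for i, (_, end1, label1) in enumerate(items1):
        following = [j for j, (start2, _, _) in enumerate(items2)
                     if start2 >= end1]
        for j in following[:span]:
            if items2[j][2] == label1:
                pairs.append((i, j))
    return pairs


spk1 = [(0, 1, "a"), (1, 2, "b")]
spk2 = [(1, 2, "c"), (2, 3, "a"), (3, 4, "b")]
print(find_reoccurrences(spk1, spk2, span=2))
```

With span=1, only the annotation immediately following is examined, so a wider span finds more candidates at the cost of more comparisons.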

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

class annotations.sppasSearchIPUs(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the IPUs detection.

__init__(log=None)[source]

Create a new sppasSearchIPUs instance.

Parameters

log – (sppasLog) Human-readable logs.

convert(channel)[source]

Search for IPUs in the given channel.

Parameters

channel – (sppasChannel) Input channel

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

  • threshold: volume threshold to decide a window is silence or not

  • win_length: length of the window for the estimation of volume values

  • min_sil: minimum duration of a silence

  • min_ipu: minimum duration of an IPU

  • shift_start: start boundary shift value.

  • shift_end: end boundary shift value.

Parameters

options – (sppasOption)
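
The way these options interact can be sketched as a simple threshold-based segmentation over per-window volume values. This is an illustrative simplification under stated assumptions, not the SPPAS algorithm: the function name `search_ipus` is hypothetical, the volumes are assumed to be precomputed, and the shifts are assumed to widen each track.

```python
def search_ipus(volumes, threshold, win_length, min_sil, min_ipu,
                shift_start=0.0, shift_end=0.0):
    """Return (start, end) tracks, in seconds, of the sounding segments.

    volumes: one volume (e.g. RMS) value per window of win_length seconds.
    """
    total = len(volumes) * win_length
    # 1. Collect runs of consecutive windows below the threshold.
    silences = []
    run_start = None
    # A sentinel equal to the threshold closes a run that reaches the end.
    for idx, vol in enumerate(list(volumes) + [threshold]):
        if vol < threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            silences.append((run_start * win_length, idx * win_length))
            run_start = None
    # 2. Discard silences shorter than min_sil.
    silences = [(s, e) for s, e in silences if e - s >= min_sil]
    # 3. The IPU candidates are the gaps between the remaining silences.
    tracks = []
    cursor = 0.0
    for s, e in silences + [(total, total)]:
        if s - cursor >= min_ipu:
            tracks.append((cursor, s))
        cursor = e
    # 4. Shift the boundaries (assumed to widen), staying inside the signal.
    return [(max(0.0, s - shift_start), min(total, e + shift_end))
            for s, e in tracks]


volumes = [0] * 5 + [100] * 10 + [0] + [100] * 10 + [0] * 5
print(search_ipus(volumes, threshold=50, win_length=0.1,
                  min_sil=0.3, min_ipu=0.5))
```

In this example the single silent window in the middle is shorter than min_sil, so it does not split the sounding segment.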

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_min_ipu()[source]
get_min_sil()[source]
get_shift_end()[source]
get_shift_start()[source]
get_threshold()[source]
get_win_length()[source]
run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Audio

  • output – (str) the output file name

Returns

(sppasTranscription)

run_for_batch_processing(input_files)[source]

Perform the annotation on a file.

This method is called by ‘batch_processing’. It fixes the name of the output file. If the output file already exists, the annotation is cancelled (the file won’t be overwritten). If not, it calls the run method.

Parameters

input_files – (list of str) the inputs to perform a run

Returns

output file name or None
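
The "do not overwrite" behaviour described above can be sketched as a small guard around the run call. The helper name `run_unless_exists` and the `run` callable are hypothetical, and returning None when the file already exists is an assumption based on the documented return value.

```python
import os


def run_unless_exists(input_files, output_name, run):
    """Call run(input_files, output_name) only if output_name does not exist.

    Returns the output file name when the annotation was run, None otherwise.
    """
    if os.path.exists(output_name):
        return None  # never overwrite an existing annotation
    run(input_files, output_name)
    return output_name
```

A caller would pass the bound `run` method of an annotation instance, or any callable with the same two arguments.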

set_min_ipu(value)[source]

Fix the default minimum duration of an IPU.

Parameters

value – (float) Duration in seconds.

set_min_sil(value)[source]

Fix the default minimum duration of a silence.

Parameters

value – (float) Duration in seconds.

set_shift_end(value)[source]

Fix the end boundary shift value.

Parameters

value – (float) Duration in seconds.

set_shift_start(value)[source]

Fix the start boundary shift value.

Parameters

value – (float) Duration in seconds.

set_threshold(value)[source]

Fix the threshold volume.

Parameters

value – (int) RMS value used as volume threshold

set_win_length(value)[source]

Set a new length of window for the estimation of volume values.

TAKE CARE: it cancels any previous estimation of volume and silence search.

Parameters

value – (float) generally between 0.01 and 0.04 seconds.

static tracks_to_tier(tracks, end_time, vagueness)[source]

Create a sppasTier object from tracks.

Parameters
  • tracks – (List of tuple) with (from, to) values in seconds

  • end_time – (float) End-time of the tier

  • vagueness – (float) vagueness used for silence search
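
Turning tracks into a tier means filling the gaps between them so that the whole duration is covered. A sketch on plain tuples follows; the labels "#" for silences and "ipu_N" for tracks, as well as the name `tracks_to_intervals`, are assumptions, and the vagueness parameter is left out for simplicity.

```python
def tracks_to_intervals(tracks, end_time):
    """Return (start, end, label) triples covering [0, end_time].

    Gaps between tracks are labelled "#"; tracks are labelled "ipu_1", ...
    """
    intervals = []
    cursor = 0.0
    for number, (start, end) in enumerate(tracks, start=1):
        if start > cursor:
            intervals.append((cursor, start, "#"))
        intervals.append((start, end, "ipu_{}".format(number)))
        cursor = end
    if cursor < end_time:
        intervals.append((cursor, end_time, "#"))
    return intervals


for interval in tracks_to_intervals([(0.5, 2.6), (3.0, 4.0)], 5.0):
    print(interval)
```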

class annotations.sppasSelfRepet(log=None)[source]

Bases: annotations.SelfRepet.sppasbaserepet.sppasBaseRepet

SPPAS Automatic Self-Repetition Detection.

Detect self-repetitions. The result has never been validated by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output is made of 2 tiers with sources and echos.

__init__(log=None)[source]

Create a new sppasSelfRepet instance.

Parameters

log – (sppasLog) Human-readable logs.

get_input_pattern()[source]

Pattern this annotation expects for its input filename.

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Time-aligned tokens

  • output – (str) the output file name

Returns

(sppasTranscription)

self_detection(tier)[source]

Self-Repetition detection.

Parameters

tier – (sppasTier)

class annotations.sppasStopWords(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the identification of stop words in a tier.

__init__(log=None)[source]

Create a new instance.

Parameters

log – (sppasLog) Human-readable logs.

fix_options(options)[source]

Fix all options.

Parameters

options – list of sppasOption instances

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(lang_resources, lang=None)[source]

Load a list of stop-words and replacements.

Override the existing loaded lists…

Parameters
  • lang_resources – (str) File with extension ‘.stp’ or nothing

  • lang – (str)

make_stp_tier(tier)[source]

Return a tier indicating if entries are stop-words.

Parameters

tier – (sppasTier)
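
As a rough illustration of what such a tier contains, the sketch below tags each token of a list as stop-word or not. It works on plain strings rather than on a sppasTier; `tag_stop_words` is a hypothetical helper and the case-insensitive comparison is an assumption.

```python
def tag_stop_words(tokens, stop_words):
    """Return one boolean per token: True when the token is a stop-word."""
    # Assumption: comparison ignores case.
    known = {word.lower() for word in stop_words}
    return [token.lower() in known for token in tokens]


tokens = ["The", "cat", "is", "on", "the", "mat"]
stop_words = ["the", "is", "on"]
flags = tag_stop_words(tokens, stop_words)
```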

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Time-aligned tokens

  • output – (str) the output file name

Returns

(sppasTranscription)

set_alpha(alpha)[source]

Fix the alpha option.

Alpha is a coefficient to add specific stop-words in the list.

Parameters

alpha – (float)

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

class annotations.sppasSyll(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the automatic syllabification annotation.

__init__(log=None)[source]

Create a new sppasSyll instance with only the general rules.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(phonemes, intervals=None)[source]

Syllabify labels of a time-aligned phones tier.

Parameters
  • phonemes – (sppasTier) time-aligned phonemes tier

  • intervals – (sppasTier)

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

  • usesintervals

  • usesphons

  • tiername

  • createclasses

  • createstructures

Parameters

options – (sppasOption)
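
A common way to implement this kind of method is to check each option key against the known ones and store its value. The sketch below shows that pattern only: `Option` and `SyllOptions` are hypothetical stand-ins, the default values are invented for the example, and a plain `KeyError` is raised where the package defines its own exception for unknown keys.

```python
class Option:
    """Hypothetical stand-in for an option: a key and a value."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


class SyllOptions:
    """Toy holder for the option keys listed above."""

    def __init__(self):
        # Default values are assumptions made for this example.
        self.options = {
            "usesintervals": False,
            "usesphons": True,
            "tiername": "",
            "createclasses": True,
            "createstructures": True,
        }

    def fix_options(self, options):
        """Store the value of each known option, reject unknown keys."""
        for opt in options:
            if opt.key not in self.options:
                raise KeyError("Unknown option with key {}".format(opt.key))
            self.options[opt.key] = opt.value


holder = SyllOptions()
holder.fix_options([Option("usesintervals", True), Option("tiername", "Phones")])
```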

get_input_pattern()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(config_filename, **kwargs)[source]

Fix the syllabification rules from a configuration file.

Parameters

config_filename – Name of the configuration file with the rules

make_classes(syllables)[source]

Create the tier with syllable classes.

Parameters

syllables – (sppasTier)
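
One way to picture a syllable class is as the sequence of phoneme classes of the syllable. The sketch below assumes this reading; the class table, the `-` separator between phonemes and the `#` symbol for unknown phonemes are all hypothetical and do not come from a SPPAS configuration file.

```python
# Hypothetical phoneme-to-class table (V: vowel, P: plosive, F: fricative, L: liquid).
PHONEME_CLASSES = {"a": "V", "i": "V", "p": "P", "t": "P", "s": "F", "l": "L"}


def syllable_class(syllable, classes=PHONEME_CLASSES, unknown="#"):
    """Replace each phoneme of a syllable by the symbol of its class."""
    # Assumption: phonemes of a syllable are separated by "-".
    phonemes = syllable.split("-")
    return "".join(classes.get(p, unknown) for p in phonemes)


labels = [syllable_class(s) for s in ["p-a", "s-t-i-l", "a-x"]]
```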

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Time-aligned phonemes

  • output – (str) the output file name

Returns

(sppasTranscription)

set_create_tier_classes(create=True)[source]

Fix the createclasses option.

Parameters

create – (bool)

set_tiername(tier_name)[source]

Fix the tiername option.

Parameters

tier_name – (str)

set_usesintervals(mode)[source]

Fix the usesintervals option.

Parameters

mode – (bool) If mode is set to True, the syllabification operates inside specific (given) intervals.

set_usesphons(mode)[source]

Fix the usesphons option.

Parameters

mode – (bool) If mode is set to True, the syllabification operates by using only the tier with phonemes.

syllabify_interval(phonemes, from_p, to_p, syllables)[source]

Perform the syllabification of one interval.

Parameters
  • phonemes – (sppasTier)

  • from_p – (int) index of the first phoneme to be syllabified

  • to_p – (int) index of the last phoneme to be syllabified

  • syllables – (sppasTier)

class annotations.sppasTGA(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

Estimate TGA on a tier – from D. Gibbon.

Create time groups then map them into a dictionary where:

  • key is a label assigned to the time group;

  • value is the list of observed durations of segments in this TG.

__init__(log=None)[source]

Create a new sppasTGA instance.

Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(syllables)[source]

Estimate TGA on the given syllables.

Parameters

syllables – (sppasTier)

Returns

(sppasTranscription)

fix_options(options)[source]

Fix all options.

Available options are:

  • with_radius

  • original

  • annotationpro

  • tg_prefix_label

Parameters

options – (sppasOption)

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Syllabification

  • output – (str) the output file name

Returns

(sppasTranscription)

set_intercept_slope_annotationpro(value)[source]

Estimate intercepts and slopes with the method of annotationpro.

Default is True.

Parameters

value – (boolean)

set_intercept_slope_original(value)[source]

Estimate intercepts and slopes with the original method.

Default is False.

Parameters

value – (boolean)

set_tg_prefix_label(prefix)[source]

Fix the prefix to add to each TG.

Parameters

prefix – (str) Default is ‘tg_’

set_with_radius(with_radius)[source]

Set the with_radius option, used to estimate the duration.

Parameters

with_radius – (int)

  • 0 means to use Midpoint;

  • negative value means to use R-;

  • positive radius means to use R+.
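
The three modes can be sketched as follows, assuming that each boundary of an interval is a midpoint with a radius, that R- removes the radii from the midpoint duration and that R+ adds them. This reading is an assumption: only the names of the modes are given above, and `duration_with_radius` is a hypothetical helper.

```python
def duration_with_radius(begin, end, begin_radius, end_radius, with_radius=0):
    """Duration of an interval whose boundaries are midpoints with a radius."""
    value = end - begin                 # duration between the two midpoints
    margin = begin_radius + end_radius  # total uncertainty on the boundaries
    if with_radius < 0:
        return value - margin           # R-: assumed shortest duration
    if with_radius > 0:
        return value + margin           # R+: assumed longest duration
    return value                        # 0: midpoint duration


midpoint = duration_with_radius(1.0, 1.5, 0.25, 0.25, with_radius=0)
shortest = duration_with_radius(1.0, 1.5, 0.25, 0.25, with_radius=-1)
longest = duration_with_radius(1.0, 1.5, 0.25, 0.25, with_radius=1)
```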

syllables_to_timegroups(syllables)[source]

Create the time group intervals.

Parameters

syllables – (sppasTier)

Returns

(sppasTier) Time groups

syllables_to_timesegments(syllables)[source]

Create the time segments intervals.

Time segments are time groups with serialized syllables.

Parameters

syllables

Returns

(sppasTier) Time segments

static tga_to_tier(tga_result, timegroups, tier_name, tag_type='float')[source]

Create a tier from one of the TGA results.

Parameters
  • tga_result – One of the results of TGA

  • timegroups – (sppasTier) Time groups

  • tier_name – (str) Name of the output tier

  • tag_type – (str) Type of the sppasTag to be included

Returns

(sppasTier)

static tga_to_tier_reglin(tga_result, timegroups, intercept=True)[source]

Create a tier of intercepts or slopes from one of the TGA results.

Parameters
  • tga_result – One of the results of TGA

  • timegroups – (sppasTier) Time groups

  • intercept – (boolean) Export the intercept. If False, export the slope.

Returns

(sppasTier)
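
Intercept and slope are the two parameters of a linear regression over the durations of a time group. The sketch below is a generic least-squares fit, assuming the position of each segment (0, 1, 2, ...) is used as the x value; the two estimation methods named in this class ("original" and "annotationpro") are not described here, so this is not a reproduction of either. `linear_regression` is a hypothetical helper.

```python
def linear_regression(durations):
    """Least-squares fit of duration against position 0, 1, 2, ...

    Returns a tuple (intercept, slope).
    """
    n = len(durations)
    if n < 2:
        raise ValueError("at least two durations are needed")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, durations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Durations that lengthen regularly within a time group.
intercept, slope = linear_regression([0.25, 0.5, 0.75])
```

A positive slope indicates segments that get longer across the time group, and a negative slope indicates segments that get shorter.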

timegroups_to_durations(syllables, timegroups)[source]

Return a dict with timegroups and the syllable durations.

Parameters
  • syllables – (sppasTier) Syllables

  • timegroups – (sppasTier) Time groups

Returns

(dict)
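
The returned dictionary can be pictured with plain (begin, end) tuples instead of tiers. In this sketch, a syllable belongs to a time group when it lies entirely inside it, and keys are built from the ‘tg_’ prefix followed by a counter starting at 1. Both choices are assumptions, and the function below is a simplified stand-in for the method, not its actual code.

```python
def timegroups_to_durations(syllables, timegroups, prefix="tg_"):
    """Map each time group label to the durations of its syllables."""
    result = {}
    for index, (tg_begin, tg_end) in enumerate(timegroups, start=1):
        durations = [
            round(end - begin, 3)
            for begin, end in syllables
            if begin >= tg_begin and end <= tg_end
        ]
        result["{}{}".format(prefix, index)] = durations
    return result


syllables = [(0.0, 0.2), (0.2, 0.5), (1.0, 1.25), (1.25, 1.5)]
timegroups = [(0.0, 0.5), (1.0, 1.5)]
durations = timegroups_to_durations(syllables, timegroups)
```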

class annotations.sppasTextNorm(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

Text normalization automatic annotation.

__init__(log=None)[source]

Create a sppasTextNorm instance without any linguistic resources.

Parameters

log – (sppasLog) Human-readable logs.

convert(tier)[source]

Text normalization of all labels of a tier.

Parameters

tier – (sppasTier) the orthographic transcription (standard or EOT)

Returns

A tuple with 3 tiers named “Tokens-Faked”, “Tokens-Std” and “Tokens-Custom”.

fix_options(options)[source]

Fix all options. Available options are:

  • faked

  • std

  • custom

Parameters

options – (sppasOption)

get_inputs(input_files)[source]

Return the tier with aligned tokens.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(vocab_filename, lang='und', **kwargs)[source]

Fix the list of words of a given language.

It allows a better tokenization, and enables the language-dependent modules like num2letters.

Parameters
  • vocab_filename – (str) File with the orthographic transcription

  • lang – (str) the language code

occ_dur(tier)[source]

Create a tier with labels and duration of each annotation.

Parameters

tier
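
The kind of information such a tier carries can be sketched on plain (begin, end, text) tuples. The function below counts whitespace-separated tokens and computes the duration of each interval; splitting on whitespace and rounding to milliseconds are assumptions, and this is a stand-in written for illustration rather than the method itself.

```python
def occ_dur(intervals):
    """Return (number of tokens, duration) for each (begin, end, text) interval."""
    result = []
    for begin, end, text in intervals:
        nb_tokens = len(text.split())  # assumption: tokens are whitespace-separated
        result.append((nb_tokens, round(end - begin, 3)))
    return result


intervals = [
    (0.0, 1.5, "hello world"),
    (1.5, 2.0, ""),
    (2.0, 4.0, "one two three"),
]
stats = occ_dur(intervals)
```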

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) orthographic transcription

  • output – (str) the output file name

Returns

(sppasTranscription)

set_custom(value)[source]

Fix the custom option.

Parameters

value – (bool) Create a customized tokenization

set_faked(value)[source]

Fix the faked option.

Parameters

value – (bool) Create a faked tokenization

set_occ_dur(value)[source]

Fix the occurrences and duration tiers generation option.

Parameters

value – (bool) Create a tier with nb of tokens and duration

set_std(value)[source]

Fix the std option.

Parameters

value – (bool) Create a standard tokenization