annotations package¶
Subpackages¶
- annotations.Activity package
- annotations.Align package
- annotations.CuedSpeech package
- annotations.FaceClustering package
- annotations.FaceDetection package
- annotations.FaceSights package
- annotations.FillIPUs package
- annotations.IVA package
- annotations.Intsint package
- annotations.LexMetric package
- annotations.Momel package
- annotations.OtherRepet package
- annotations.Overlaps package
- annotations.Phon package
- annotations.RMS package
- annotations.ReOccurrences package
- annotations.SearchIPUs package
- annotations.SelfRepet package
- annotations.SpkLexRep package
- annotations.StopWords package
- annotations.Syll package
- annotations.TGA package
- annotations.TextNorm package
- Subpackages
- annotations.TextNorm.num2text package
- Submodules
- annotations.TextNorm.num2text.construct module
- annotations.TextNorm.num2text.num_asian_lang module
- annotations.TextNorm.num2text.num_base module
- annotations.TextNorm.num2text.num_cmn module
- annotations.TextNorm.num2text.num_europ_lang module
- annotations.TextNorm.num2text.num_fra module
- annotations.TextNorm.num2text.num_ita module
- annotations.TextNorm.num2text.num_jpn module
- annotations.TextNorm.num2text.num_khm module
- annotations.TextNorm.num2text.num_pol module
- annotations.TextNorm.num2text.num_spa module
- annotations.TextNorm.num2text.num_und module
- annotations.TextNorm.num2text.num_vie module
- annotations.TextNorm.num2text.por_num module
- Module contents
- annotations.TextNorm.num2text package
- Submodules
- annotations.TextNorm.language module
- annotations.TextNorm.normalize module
- annotations.TextNorm.num2letter module
- annotations.TextNorm.orthotranscription module
- annotations.TextNorm.splitter module
- annotations.TextNorm.sppastextnorm module
- annotations.TextNorm.tokenize module
- Module contents
- Subpackages
Submodules¶
annotations.annotationsexc module¶
- filename
sppas.src.annotations.annotationsexc.py
- author
Brigitte Bigi
- contact
- summary
Exceptions for annotations package.
- exception annotations.annotationsexc.AnnotationOptionError(key)[source]¶
Bases:
KeyError
:ERROR 1010:.
Unknown option with key {key}.
- exception annotations.annotationsexc.AnnotationSectionConfigFileError(section_name)[source]¶
Bases:
ValueError
:ERROR 4014:.
Missing section {section_name} in the configuration file.
- exception annotations.annotationsexc.AudioChannelError(nb)[source]¶
Bases:
OSError
:ERROR 1070:.
An audio file with only one channel is expected. Got {nb} channels.
- exception annotations.annotationsexc.BadInputError[source]¶
Bases:
TypeError
:ERROR 1040:.
Bad input tier type. Developer note: this exception should be renamed BadTierInputError and take the expected type as a parameter.
- exception annotations.annotationsexc.EmptyDirectoryError(dirname)[source]¶
Bases:
OSError
:ERROR 1220:.
The directory {dirname} does not contain relevant data.
- exception annotations.annotationsexc.EmptyInputError(name)[source]¶
Bases:
OSError
:ERROR 1020:.
Empty input tier {name}.
- exception annotations.annotationsexc.EmptyOutputError(name)[source]¶
Bases:
OSError
:ERROR 1025:.
Empty output result. No file created.
- exception annotations.annotationsexc.NoChannelInputError[source]¶
Bases:
OSError
:ERROR 1036:.
Missing input audio channel. Please read the documentation.
- exception annotations.annotationsexc.NoInputError[source]¶
Bases:
OSError
:ERROR 1030:.
No valid input.
- exception annotations.annotationsexc.NoTierInputError[source]¶
Bases:
OSError
:ERROR 1035:.
Missing input tier. Please read the documentation.
- exception annotations.annotationsexc.SizeInputsError(number1, number2)[source]¶
Bases:
OSError
:ERROR 1050:.
Inconsistency between the number of intervals of the input tiers. Got: {:d} and {:d}.
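Each exception above derives from a Python built-in, so callers can catch either the specific class or its built-in base. The sketch below is a stand-in written for illustration, not the SPPAS source: the way the error code and the message are joined into one string is an assumption, and get_option here is a hypothetical helper.

```python
class AnnotationOptionError(KeyError):
    """Stand-in for the documented exception (error 1010)."""

    def __init__(self, key):
        # Assumed layout: error code followed by the formatted message.
        self.parameter = ":ERROR 1010:. Unknown option with key {key}.".format(key=key)
        super().__init__(self.parameter)

    def __str__(self):
        # KeyError would otherwise display the repr() of its argument.
        return self.parameter


def get_option(options, key):
    """Hypothetical lookup that raises the annotation-specific error."""
    if key not in options:
        raise AnnotationOptionError(key)
    return options[key]


try:
    get_option({"duration": 0.02}, "threshold")
except KeyError as e:  # the built-in base class also catches it
    message = str(e)
```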
annotations.autils module¶
- filename
sppas.src.annotations.autils.py
- author
Brigitte Bigi
- contact
- summary
Utility classes for the automatic annotations.
- class annotations.autils.SppasFiles[source]¶
Bases:
object
- DEFAULT_EXTENSIONS = {'ANNOT': '.xra', 'ANNOT_ANNOT': '.xra', 'ANNOT_MEASURE': '.PitchTier', 'ANNOT_TABLE': '.arff', 'AUDIO': '.wav', 'IMAGE': '.jpg', 'VIDEO': '.mp4'}¶
- OUT_FORMATS = ('ANNOT', 'IMAGE', 'VIDEO')¶
- static get_default_extension(filetype_format)[source]¶
Return the default extension defined for a given format.
- Parameters
filetype_format – (str)
- Returns
(str) Extension with the dot or empty string
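The lookup described above can be pictured as a dictionary access with an empty-string fallback. This is a minimal sketch built only from the documented DEFAULT_EXTENSIONS value; whether the real method normalizes case or accepts other keys is not stated here and is not assumed.

```python
# Copied from the documented class attribute.
DEFAULT_EXTENSIONS = {
    "ANNOT": ".xra",
    "ANNOT_ANNOT": ".xra",
    "ANNOT_MEASURE": ".PitchTier",
    "ANNOT_TABLE": ".arff",
    "AUDIO": ".wav",
    "IMAGE": ".jpg",
    "VIDEO": ".mp4",
}


def get_default_extension(filetype_format):
    """Return the extension with the dot, or an empty string if unknown."""
    return DEFAULT_EXTENSIONS.get(filetype_format, "")
```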
annotations.baseannot module¶
- filename
sppas.src.annotations.baseannot.py
- author
Brigitte Bigi
- contact
- summary
Base class for any SPPAS integration of an automatic annotation.
- class annotations.baseannot.sppasBaseAnnotation(config, log=None)[source]¶
Bases:
object
Base class for any automatic annotation integrated into SPPAS.
- __init__(config, log=None)[source]¶
Base class for any SPPAS automatic annotation.
Load default options/member values from a configuration file. This file must be in paths.etc
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
config – (str) Name of the JSON configuration file, without path.
log – (sppasLog) Human-readable logs.
- batch_processing(file_names, progress=None)[source]¶
Perform the annotation on a bunch of files.
Can be used by an annotation manager to launch all the annotations on all checked files of a workspace in a single process.
- The given list of inputs can then be either:
a list of file names: [file1, file2, …], or
a list of lists of file names: [(file1_a, file1_b), (file2_a,)], or
a list of mixed files/list of files: [file1, (file2a, file2b), …].
- Parameters
file_names – (list) List of inputs
progress – ProcessProgressTerminal() or ProcessProgressDialog()
- Returns
(list of str) List of created files
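Because the input list may mix single file names with groups of file names, a caller or subclass typically needs one uniform shape before iterating. The helper below is a hypothetical illustration of that normalization (normalize_inputs is not part of the documented API).

```python
def normalize_inputs(file_names):
    """Turn a mixed list of names and name groups into a list of tuples."""
    normalized = []
    for entry in file_names:
        if isinstance(entry, (list, tuple)):
            # Already a group of files for one annotation run.
            normalized.append(tuple(entry))
        else:
            # A single file name becomes a one-element group.
            normalized.append((entry,))
    return normalized
```

Each resulting tuple then corresponds to one call of the per-file annotation step.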
- fix_options(options)[source]¶
Fix all options of the annotation from a list of sppasOption().
- Parameters
options – (list of sppasOption)
- fix_out_file_ext(output, out_format='ANNOT')[source]¶
Return the output with an appropriate file extension.
If the output already has an extension, it is not changed.
- Parameters
output – (str) Base name or filename
out_format – (str) One of ANNOT, IMAGE, VIDEO
- Returns
(str) filename
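The rule "keep an existing extension, otherwise append the default one" can be sketched with os.path.splitext. This is an illustration only: the extension table below is a simplified assumption, whereas the real method relies on the extensions configured for the annotation instance.

```python
import os

# Assumed defaults for the three documented output formats.
DEFAULT_EXTENSIONS = {"ANNOT": ".xra", "IMAGE": ".jpg", "VIDEO": ".mp4"}


def fix_out_file_ext(output, out_format="ANNOT"):
    """Append the default extension of out_format if output has none."""
    _, ext = os.path.splitext(output)
    if ext:
        # An extension is already present: leave the name untouched.
        return output
    return output + DEFAULT_EXTENSIONS[out_format]
```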
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
By default, the extensions are the annotated files. Can be overridden to change the list of supported extensions: they must contain the dot.
- Returns
(list of list)
- get_input_patterns()[source]¶
List of patterns that the annotation expects for the input filenames.
- Returns
(list of str)
- static get_opt_input_extensions()[source]¶
Extensions that the annotation expects for its optional input filename.
- get_option(key)[source]¶
Return the option value of a given key or raise KeyError.
- Parameters
key – (str) Key of the option to look up.
- Raises
KeyError
- get_out_name(filename, output_format='')[source]¶
Return the output filename from the input one.
Output filename is created from the given filename, the annotation output pattern and the given output format (if any).
- Parameters
filename – (str) Name of the input file
output_format – (str) Extension of the output file with the dot
- Returns
(str)
- get_types()[source]¶
Return the list of types this annotation can perform.
If this annotation expects another file, the type allows it to be found by using the references of the workspace (if any).
- print_diagnosis(*filenames)[source]¶
Print the diagnosis of a list of files in the user report.
- Parameters
filenames – (list) List of files.
- print_filename(filename)[source]¶
Print the annotation name applied on a filename in the user log.
- Parameters
filename – (str) Name of the file to annotate.
- run(input_files, output=None)[source]¶
Run the automatic annotation process on a given input.
The input is a list of files the annotation needs: audio, video, transcription, pitch, etc.
Returns either the list of created files if an output is given, or the created object (often a sppasTranscription) if no output was given.
- Parameters
input_files – (list of str) The required and optional input(s)
output – (str) The output name with or without extension
- Returns
(sppasTranscription OR list of created file names)
- run_for_batch_processing(input_files)[source]¶
Perform the annotation on a file.
This method is called by ‘batch_processing’. It fixes the name of the output file, and calls the run method.
- Parameters
input_files – (list of str) the required input(s) for a run
- Returns
created output file name or None
- set_default_out_extensions()[source]¶
Return the default output extension of each format.
The default extension of each format is defined in the config.
annotations.diagnosis module¶
- filename
sppas.src.annotations.diagnosis.py
- author
Brigitte Bigi
- contact
- summary
Diagnose if files are appropriate for automatic annotations.
- class annotations.diagnosis.sppasDiagnosis[source]¶
Bases:
object
Diagnose if files are appropriate.
A set of methods to check if files are valid for SPPAS automatic annotations. Each method returns a status and a message depending on the fact that the given file is matching the requirements.
- EXPECTED_CHANNELS = 1¶
- EXPECTED_FRAME_RATE = 16000¶
- EXPECTED_SAMPLE_WIDTH = 2¶
- static check_audio_file(filename)[source]¶
Check an audio file.
Are verified:
the format of the file (error);
the number of channels (error);
the sample width (error or warning);
the framerate (error or warning);
the filename (warning).
- Parameters
filename – (str) name of the input file
- Returns
tuple with (status identifier, message)
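A rough picture of such a check, using only the standard wave module and the documented EXPECTED_* values. This is a sketch under assumptions, not the SPPAS implementation: the status identifiers (0 OK, 1 WARNING, -1 ERROR) are borrowed from the log module, sample width and frame rate mismatches are always treated as warnings here, the filename check is omitted, and the message texts are invented.

```python
import wave

EXPECTED_CHANNELS = 1
EXPECTED_FRAME_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2

# Assumed status identifiers.
OK, WARNING, ERROR = 0, 1, -1


def check_audio_file(filename):
    """Return (status identifier, message) for a WAV file."""
    try:
        with wave.open(filename, "rb") as audio:
            channels = audio.getnchannels()
            width = audio.getsampwidth()
            rate = audio.getframerate()
    except (wave.Error, EOFError, OSError) as e:
        return ERROR, "Cannot read audio file: {}".format(e)

    if channels != EXPECTED_CHANNELS:
        return ERROR, "Expected 1 channel, got {:d}.".format(channels)

    problems = []
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append("sample width is {:d} bytes".format(width))
    if rate != EXPECTED_FRAME_RATE:
        problems.append("frame rate is {:d} Hz".format(rate))
    if problems:
        return WARNING, "; ".join(problems)
    return OK, "Audio file matches the requirements."
```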
- static check_file(filename)[source]¶
Check file of any type: audio or image or annotated file.
The extension of the filename is used to know the type of the file.
- Parameters
filename – (str) name of the input file to diagnose.
- Returns
tuple with (status identifier, message)
- static check_img_file(filename)[source]¶
Check an image file.
Are verified:
opencv can open the file
- Parameters
filename – (string) name of the input file
- Returns
tuple with (status identifier, message)
annotations.infotier module¶
- filename
sppas.src.annotations.infotier.py
- author
Brigitte Bigi
- contact
- summary
Tier with meta information about SPPAS.
- class annotations.infotier.sppasMetaInfoTier(meta_object=None)[source]¶
Bases:
sppas.src.structs.metainfo.sppasMetaInfo
Meta information manager about SPPAS.
Manager of meta information about SPPAS. Allows the creation of a tier with activated meta-information.
annotations.log module¶
- filename
sppas.src.annotations.log.py
- author
Brigitte Bigi
- contact
- summary
The Procedure Outcome Report of automatic annotations of SPPAS.
- class annotations.log.sppasLog(parameters=None)[source]¶
Bases:
object
A log file utility class dedicated to automatic annotations.
Class to manage the SPPAS automatic annotations log file, which is also called the “Procedure Outcome Report”.
- MAX_INDENT = 10¶
- STR_INDENT = ' ... '¶
- STR_ITEM = ' - '¶
- __init__(parameters=None)[source]¶
Create a sppasLog instance and open an output stream to NULL.
- Parameters
parameters – (sppasParam)
- create(filename)[source]¶
Create and open a new output stream.
- Parameters
filename – (str) Output filename
- static get_indent_text(number)[source]¶
Return a string representing some indentation.
- Parameters
number – (int) A positive integer.
- static get_status_text(status_id)[source]¶
Return a status text from a status identifier.
- Parameters
status_id – (int)
- open(filename)[source]¶
Open an existing file and set the output stream.
- Parameters
filename – (str) Output filename
- print_annotations_header()[source]¶
Print the parameters information in the output stream.
Do not print anything if no parameters were given.
- print_item(main_info, second_info=None)[source]¶
Print an item in the output stream.
- Parameters
main_info – (str) Main information to print
second_info – (str) A secondary info to print
- print_message(message, indent=0, status=None)[source]¶
Print a message at the end of the current output stream.
- Parameters
message – (str) The message to communicate
indent – (int) Shift the message with indents
status – (int) A status identifier
0 means OK, 1 means WARNING, 2 means IGNORED, 3 means INFO, -1 means ERROR.
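To make the indent and status parameters concrete, here is a small formatting sketch. STR_INDENT and MAX_INDENT are taken from the class attributes as displayed above; the bracketed status labels and the format_message helper are hypothetical, since the actual report layout is not documented here.

```python
STR_INDENT = " ... "
MAX_INDENT = 10

# Hypothetical labels for the documented status identifiers.
STATUS_LABELS = {
    0: "[ OK ]",
    1: "[ WARNING ]",
    2: "[ IGNORED ]",
    3: "[ INFO ]",
    -1: "[ ERROR ]",
}


def format_message(message, indent=0, status=None):
    """Build one report line from a message, an indent level and a status."""
    # Clip the indent level into the range 0..MAX_INDENT.
    indent = max(0, min(int(indent), MAX_INDENT))
    prefix = STR_INDENT * indent
    label = STATUS_LABELS.get(status, "")
    if label:
        return prefix + label + " " + message
    return prefix + message
```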
- print_raw_text(text)[source]¶
Print a text at the end of the output stream.
- Parameters
text – (str) text to print
- print_stat_item(step_number, value=None)[source]¶
Print a statistic value in the output stream for a given step.
Do not print anything if no parameters were given.
- Parameters
step_number – (1..N)
value – (str) A statistic value. If None, the status (enabled or disabled) is printed instead.
annotations.manager module¶
- filename
sppas.src.annotations.manager.py
- author
Brigitte Bigi
- contact
- summary
Automatic annotations manager for SPPAS integrated classes.
- class annotations.manager.sppasAnnotationsManager[source]¶
Bases:
threading.Thread
Parent class for running annotation processes.
Run annotations on a set of files.
- annotate(parameters, progress=None)[source]¶
Execute the activated annotations.
Get execution information from the ‘parameters’ object. Create a Procedure Outcome Report if a filename is set in the parameters.
annotations.param module¶
- filename
sppas.src.annotations.param.py
- author
Brigitte Bigi
- contact
- summary
Parametrization of automatic annotations.
- class annotations.param.annotationParam(filename=None)[source]¶
Bases:
object
Annotation data parameters.
Class to store meta data of an automatic annotation like its name, description, supported languages, etc.
- __init__(filename=None)[source]¶
Create a new annotationParam instance.
- Parameters
filename – (str) Annotation configuration file
- parse(filename)[source]¶
Parse a configuration file to fill members.
- Parameters
filename – (str) Annotation configuration file (.ini)
- set_activate(activate)[source]¶
Enable the annotation but only if this annotation is valid.
- Parameters
activate – (bool) Enable or disable the annotation
- Returns
(bool) enabled or disabled
- class annotations.param.sppasParam(annotation_keys=None)[source]¶
Bases:
object
Annotation parameters manager.
Parameters of a set of annotations.
- __init__(annotation_keys=None)[source]¶
Create a new sppasParam instance with default values.
- Parameters
annotation_keys – (list) List of annotations to load. None=ALL.
- add_to_workspace(files)[source]¶
Add a list of files or directories into the workspace.
The state of all the added files is set to CHECKED.
- Parameters
files – (str or list of str)
- get_output_extension(out_format)[source]¶
Return the output extension defined for the given out_format.
- get_ref_ids(step)[source]¶
Return a list of identifiers of the reference publications.
- Parameters
step – (int) Annotation index
- get_ref_url(step, ref_id)[source]¶
Return the URL of the reference publication.
- Parameters
step – (int) Annotation index
ref_id – (str) Identifier of a reference
- get_step_idx(annotation_key)[source]¶
Get the annotation step index from an annotation key.
- Parameters
annotation_key – (str)
- Raises
KeyError
- load_annotations(annotation_files=None)[source]¶
Load the annotation configuration files.
Load from a list of given file names (without path) or from the default sppas ui configuration file.
- Parameters
annotation_files – (list) List of annotations to load. None=ALL.
- parse_config_file()[source]¶
Parse the sppasui.json file.
Parse the file to get the list of annotations and parse the corresponding “json” file.
- set_output_extension(output_ext, output_format)[source]¶
Fix the output extension of all the annotations of a given out_format.
- Parameters
output_ext – (str) File extension (with or without a dot)
output_format – (str) Either ANNOT, AUDIO, VIDEO or IMAGE
- Returns
(str) the assigned extension
- Raise
ValueError
annotations.searchtier module¶
- filename
sppas.src.annotations.searchtier.py
- author
Brigitte Bigi
- contact
- summary
Search for tier with various names.
- class annotations.searchtier.sppasFindTier[source]¶
Bases:
object
Search for tiers in a sppasTranscription.
- static aligned_lemmas(trs)[source]¶
Return the tier with time-aligned lemmas.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_phones(trs)[source]¶
Return the tier with time-aligned phonemes.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_syllables(trs)[source]¶
Return the tier with time-aligned syllables.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_tokens(trs)[source]¶
Return the tier with time-aligned tokens.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static phonetization(trs)[source]¶
Return the tier with phonetization.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static pitch(trs)[source]¶
Return the tier with pitch values.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static pitch_anchors(trs)[source]¶
Return the tier with pitch anchors, like momel result.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
annotations.windowing module¶
- filename
sppas.src.annotations.windowing.py
- author
Brigitte Bigi
- contact
- summary
Windowing system on a tier.
- class annotations.windowing.sppasTierWindow(tier)[source]¶
Bases:
object
Windowing system on a tier.
Support windows in the time domain or with tag separators, both with or without overlaps among windows.
- __init__(tier)[source]¶
Create an instance of a sppasTierWindow.
- Parameters
tier – (sppasTier) Tier to analyze
- anchor_split(duration=1, step=1, separators=['#'])[source]¶
Return a set of annotations within a window given by separators.
- Parameters
duration – (int) the duration of a window - number of intervals among the separators
step – (int) the step duration - number of intervals
separators – (list) list of separators
- Returns
(List of sppasAnnSet)
- continuous_anchor_split(separators)[source]¶
Return all time intervals within a window given by separators.
- Parameters
separators – (list) list of separators
- Returns
(List of intervals)
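One way to read "time intervals within a window given by separators" is: merge each run of consecutive non-separator annotations into a single (begin, end) span. The sketch below implements that reading on plain (start, end, label) tuples; the tuple representation is an assumption made for illustration, and the real method works on the sppasTier given to the constructor.

```python
def continuous_anchor_split(annotations, separators):
    """Return (begin, end) spans of the runs found between separators."""
    intervals = []
    begin = end = None
    for start, stop, label in annotations:
        if label in separators:
            # A separator closes the current run, if any.
            if begin is not None:
                intervals.append((begin, end))
                begin = end = None
        else:
            if begin is None:
                begin = start
            end = stop
    # Close the last run when the sequence does not end with a separator.
    if begin is not None:
        intervals.append((begin, end))
    return intervals
```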
- static drange(x, y, jump)[source]¶
Mimics ‘range’ with either float or int values.
- Parameters
x – start value
y – end value
jump – step value
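A range-like helper for float or int steps can be written as a short generator. This is a sketch of the idea, assuming a positive step and an exclusive end value as with the built-in range; whether the real method returns a generator or a list is not stated above.

```python
def drange(x, y, jump):
    """Yield x, x + jump, x + 2 * jump, ... while the value is below y."""
    if jump <= 0:
        raise ValueError("jump must be strictly positive")
    while x < y:
        yield x
        x += jump
```

Repeated float addition accumulates rounding error, so steps such as 0.1 may not land exactly on the expected values.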
- search_for_annotations(start_time, end_time, delta=0.5, ignore=[])[source]¶
Return the set of annotations within the given interval.
- Parameters
start_time – (int/float)
end_time – (int/float)
delta – (float) Rate of time the annotation must overlap
ignore – (list of str) List of tag contents to ignore – currently applied only on the best tag
- Returns
(sppasAnnSet) The annotations matching all the requirements
- time_split(duration, step, delta=0.6)[source]¶
Return a set of annotations within a time window.
- Parameters
duration – (float) the duration of a window
step – (float) the step duration
delta – (float) percentage of confidence for an overlapping label
- Returns
(sppasAnnSet) Set of sppasAnnotation
- Raises
sppasTypeError, ValueError
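The sliding-window idea behind time_split can be sketched on plain (start, end, label) tuples. The assumptions here are loud ones: annotations are tuples rather than sppasAnnotation objects, delta is read as the minimum fraction of an annotation's duration that must fall inside the window, and the function returns one list per window. The real return type is a sppasAnnSet as documented above.

```python
def overlap_ratio(ann_start, ann_end, win_start, win_end):
    """Fraction of the annotation duration that lies inside the window."""
    overlap = min(ann_end, win_end) - max(ann_start, win_start)
    duration = ann_end - ann_start
    if overlap <= 0 or duration <= 0:
        return 0.0
    return overlap / duration


def time_split(annotations, duration, step, delta=0.6):
    """Return, for each window, the annotations overlapping it enough."""
    if duration <= 0 or step <= 0:
        raise ValueError("duration and step must be strictly positive")
    if not annotations:
        return []
    t_begin = min(a[0] for a in annotations)
    t_end = max(a[1] for a in annotations)
    windows = []
    i = 0
    while True:
        # Window i starts i steps after the beginning of the data.
        start = t_begin + i * step
        if start >= t_end:
            break
        end = start + duration
        windows.append(
            [a for a in annotations
             if overlap_ratio(a[0], a[1], start, end) >= delta]
        )
        i += 1
    return windows
```

With a step smaller than the duration, consecutive windows overlap and an annotation can appear in several of them.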
Module contents¶
- filename
sppas.src.annotations.__init__.py
- author
Brigitte Bigi
- contact
- summary
The automatic annotations of SPPAS.
annotations: automatic annotations.¶
This package includes all the automatic annotations, each one in a package and the classes to manage the data to be annotated and the resulting annotated data.
Requires the following other packages:
config
utils
exc
structs
wkps
resources
anndata
audiodata
imgdata – if “video” feature enabled
videodata – if “video” feature enabled
- class annotations.SppasFiles[source]¶
Bases:
object
- DEFAULT_EXTENSIONS = {'ANNOT': '.xra', 'ANNOT_ANNOT': '.xra', 'ANNOT_MEASURE': '.PitchTier', 'ANNOT_TABLE': '.arff', 'AUDIO': '.wav', 'IMAGE': '.jpg', 'VIDEO': '.mp4'}¶
- OUT_FORMATS = ('ANNOT', 'IMAGE', 'VIDEO')¶
- static get_default_extension(filetype_format)[source]¶
Return the default extension defined for a given format.
- Parameters
filetype_format – (str)
- Returns
(str) Extension with the dot or empty string
- class annotations.StopWords(case_sensitive=False)[source]¶
Bases:
sppas.src.resources.vocab.sppasVocabulary
A vocabulary that can automatically evaluate a list of Stop-Words.
An entry ‘w’ is relevant for the speaker if its probability is less than a threshold:
P(w) <= 1 / (alpha * V)
where ‘alpha’ is an empirical coefficient and ‘V’ is the vocabulary size of the speaker.
- MAX_ALPHA = 4.0¶
- MIN_ANN_NUMBER = 5¶
- __init__(case_sensitive=False)[source]¶
Create a new StopWords instance.
- Parameters
case_sensitive – (bool) Considers the case of entries or not.
- property alpha¶
Return the value of alpha coefficient (float).
- evaluate(tier=None, merge=True)[source]¶
Add entries to the list of stop-words from the content of a tier.
Estimate whether each token is relevant; a token that is not relevant is added to the stop-list.
- Parameters
tier – (sppasTier) A tier with entries to be analyzed.
merge – (bool) Merge with the existing list (if True) or delete the existing list and create a new one (if False)
- Returns
(int) Number of entries added into the list
- Raises
EmptyInputError, TooSmallInputError
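The relevance criterion P(w) <= 1 / (alpha * V) can be tried on a plain list of tokens. This is a minimal sketch of that inequality only: find_stop_words is a hypothetical function, P(w) is taken as the relative frequency of w, and the checks implied by MIN_ANN_NUMBER and MAX_ALPHA are left out.

```python
from collections import Counter


def find_stop_words(tokens, alpha):
    """Return the tokens whose probability exceeds 1 / (alpha * V)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts)
    if total == 0:
        return []
    threshold = 1.0 / (alpha * vocab_size)
    # A token is relevant when P(w) <= threshold; the others are stop-words.
    return sorted(w for w, n in counts.items() if n / total > threshold)
```

For instance, with 10 tokens, a vocabulary of 5 entries and alpha = 0.5, the threshold is 1 / 2.5 = 0.4, so only an entry covering more than 40% of the tokens is flagged.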
- class annotations.sppasActivity(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the Activity generation.
- __init__(log=None)[source]¶
Create a new sppasActivity instance.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(tier, tmin, tmax)[source]¶
Create an Activity and ActivityDuration tier.
- Parameters
tier – (sppasTier)
tmin – (sppasPoint)
tmax – (sppasPoint)
- Returns
(sppasTier, sppasTier)
- fix_options(options)[source]¶
Fix all options.
Available options are:
duration
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- class annotations.sppasAlign(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the Alignment automatic annotation.
- Author
Brigitte Bigi
- Contact
This class can produce from 1 to 5 tiers, with names:
PhonAlign
TokensAlign (if tokens are given in the input)
PhnTokAlign - option (if tokens are given in the input)
How to use sppasAlign?
>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones, audio, tokens], output)
- __init__(log=None)[source]¶
Create a new sppasAlign instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]¶
Perform speech segmentation of data.
- Parameters
phon_tier – (Tier) phonetization.
tok_tier – (Tier) tokenization, or None.
tok_faked_tier – (Tier) rescue tokenization, or None.
input_audio – (str) Audio file name.
workdir – (str) The working directory
- Returns
tier_phn, tier_tok
- fix_options(options)[source]¶
Fix all options.
Available options are:
clean
basic
aligner
- Parameters
options – (sppasOption)
- static fix_workingdir(inputaudio=None)[source]¶
Fix the working directory to store temporarily the data.
- Parameters
inputaudio – (str) Audio file name
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- get_inputs(input_files)[source]¶
Return the audio file name and the 2 tiers.
Two tiers: the tier with phonetization and the tier with text normalization.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasChannel, sppasTier, sppasTier)
- load_resources(model, model_L1=None, **kwargs)[source]¶
Fix the acoustic model directory.
Create a SpeechSegmenter and AlignerIO.
- Parameters
model – (str) Directory of the acoustic model of the language of the text
model_L1 – (str) Directory of the acoustic model of the mother language of the speaker
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Phonemes, and optionally tokens, audio
output – (str) the output name
- Returns
(sppasTranscription)
- set_aligner(aligner_name)[source]¶
Fix the name of the aligner.
- Parameters
aligner_name – (str) Case-insensitive name of the aligner.
- class annotations.sppasAnnotationsManager[source]¶
Bases:
threading.Thread
Parent class for running annotation processes.
Run annotations on a set of files.
- annotate(parameters, progress=None)[source]¶
Execute the activated annotations.
Get execution information from the ‘parameters’ object. Create a Procedure Outcome Report if a filename is set in the parameters.
- class annotations.sppasFillIPUs(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the fill in IPUs automatic annotation.
- __init__(log=None)[source]¶
Create a new sppasFillIPUs instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(channel, text_tier)[source]¶
Return a tier with transcription aligned to the audio.
- Parameters
channel – (sppasChannel) Input audio channel
text_tier – (sppasTier) Input transcription text in a PointTier
- fix_options(options)[source]¶
Fix all options.
Available options are:
threshold: volume threshold to decide a window is silence or not
win_length: length of the window used for the estimation of volume values
min_sil: minimum duration of a silence
min_ipu: minimum duration of an ipu
shift_start: start boundary shift value.
shift_end: end boundary shift value.
- Parameters
options – (sppasOption)
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- get_inputs(input_files)[source]¶
Return the channel and the tier with ipus.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasChannel, sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
input_files is a tuple (audio, raw transcription)
- Parameters
input_files – (list of str) (audio, ortho)
output – (str) the output file name
- Returns
(sppasTranscription)
- run_for_batch_processing(input_files)[source]¶
Perform the annotation on a file.
This method is called by ‘batch_processing’. It fixes the name of the output file, and calls the run method.
Override to NOT ANNOTATE if an annotation is already existing.
- Parameters
input_files – (list of str) the required inputs for a run
- Returns
output file name or None
- set_min_ipu(value)[source]¶
Fix the initial minimum duration of an IPU.
- Parameters
value – (float) Duration in seconds.
- set_min_sil(value)[source]¶
Fix the initial minimum duration of a silence.
- Parameters
value – (float) Duration in seconds.
- class annotations.sppasFindTier[source]¶
Bases:
object
Search for tiers in a sppasTranscription.
- static aligned_lemmas(trs)[source]¶
Return the tier with time-aligned lemmas.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_phones(trs)[source]¶
Return the tier with time-aligned phonemes.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_syllables(trs)[source]¶
Return the tier with time-aligned syllables.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static aligned_tokens(trs)[source]¶
Return the tier with time-aligned tokens.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static phonetization(trs)[source]¶
Return the tier with phonetization.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static pitch(trs)[source]¶
Return the tier with pitch values.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- static pitch_anchors(trs)[source]¶
Return the tier with pitch anchors, like momel result.
- Parameters
trs – (sppasTranscription)
- Returns
(sppasTier or None)
- class annotations.sppasIVA(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
Estimate IVA on a tier.
Get or create segments then map them into a dictionary where:
key is a label assigned to the segment;
value is the list of observed values in the segment.
- __init__(log=None)[source]¶
Create a new sppasIVA instance.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(input_tier_values, input_tier_segments)[source]¶
Estimate IVA on the given input tier with values.
- Parameters
input_tier_values – (sppasTier) Tier with numerical values.
input_tier_segments – (sppasTier) Tier with intervals.
- Returns
(sppasTranscription)
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
An annotated file with measure values (pitch, intensity…), and an annotated file with a sppasTier of type ‘interval’.
- get_input_tiers(input_files)[source]¶
Return tiers with values and segments.
- Parameters
input_files – (list)
- static iva_to_tier(iva_result, sgmts_tier, tier_name, tag_type='float')[source]¶
Create a tier from one of the IVA results (mean, sd, …).
- Parameters
iva_result – One of the results of IVA
sgmts_tier – (sppasTier) Tier with the segments
tier_name – (str) Name of the output tier
tag_type – (str) Type of the sppasTag to be included
- Returns
(sppasTier)
- static iva_to_tier_reglin(iva_result, sgmts_tier, intercept=True)[source]¶
Create tiers of intercept,slope from the IVA result.
- Parameters
iva_result – intercept,slope result of IVA
sgmts_tier – (sppasTier) Tier with the segments
intercept – (boolean) Export the intercept.
If False, export Slope.
- Returns
(sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Values and Segments in a single file or in different ones
output – (str) the output file name
- Returns
(sppasTranscription)
- set_eval(occ=None, total=None, mean=None, median=None, stdev=None, linreg=None)[source]¶
Set IVA evaluations to perform.
- Parameters
total – (bool) Estimates total of values in segments.
mean – (bool) Estimates mean of values in segments.
median – (bool) Estimates median of values in segments.
stdev – (bool) Estimates standard deviation of values in segments.
linreg – (bool) Estimates linear regression of values in segments.
- set_input_tiername_segments(tiername)[source]¶
Fix the name of the tier with segments.
- Parameters
tiername – (str) Default is ‘TokensAlign’
- set_input_tiername_values(tiername)[source]¶
Fix the name of the tier with values.
- Parameters
tiername – (str) Default is ‘PitchTier’
- set_segments_separators(entry)[source]¶
Fix the separators to create segments.
- Parameters
entry – (str) Entries separated by whitespace.
- set_sgmt_prefix_label(prefix)[source]¶
Fix the prefix to add to each segment.
- Parameters
prefix – (str) Default is ‘sgmt_’
- tier_to_labelled_segments(segments, input_tier_values)[source]¶
Create the segment intervals within the values.
- Parameters
segments – (sppasTier) segment intervals to get values
input_tier_values – (sppasTier) tags are float/int values
- Returns
(dict, sppasTier) dict of segment/values, labelled segments
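Once segments are mapped to their observed values, the evaluations named in set_eval can be computed per segment. The sketch below illustrates this on a plain dict. It makes several assumptions that the documentation does not confirm: "occ" is the number of values, the standard deviation is the sample one, and the linear regression uses the index of each value as x (the real estimator may use time positions instead).

```python
import statistics


def linear_regression(values):
    """Least-squares fit of values against their indexes: (intercept, slope)."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = statistics.mean(values)
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    if sxx == 0:
        # A single value: flat line through that value.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate_segments(segments):
    """Compute the statistics of each labelled segment of values."""
    results = {}
    for label, values in segments.items():
        if not values:
            continue  # nothing to estimate in an empty segment
        results[label] = {
            "occ": len(values),
            "total": sum(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "linreg": linear_regression(values),
        }
    return results
```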
- class annotations.sppasIntsint(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the INTSINT automatic annotation.
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
INTSINT requires momel anchors which can either be stored in a TextGrid file or in a PitchTier file.
- Returns
(list of list)
- get_input_tier(input_files)[source]¶
Return the tier with Momel anchors.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier) Tier of type Point
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) momel anchors
output – (str) the output file name
- Returns
(sppasTranscription)
- class annotations.sppasLexMetric(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the occ and rank estimator.
- __init__(log=None)[source]¶
Create a new sppasLexMetric instance.
- Parameters
log – (sppasLog) Human-readable logs.
- get_input_tier(input_files)[source]¶
Return the input tier from the inputs.
- Parameters
input_files – (list)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Time-aligned tokens, or other
output – (str) the output file name
- Returns
(sppasTranscription)
- class annotations.sppasLexRep(log=None)[source]¶
Bases:
annotations.SelfRepet.sppasbaserepet.sppasBaseRepet
SPPAS integration of the speaker lexical variation annotation.
Main differences compared to repetitions: The span option is used to fix the max number of continuous tokens to analyze. The span window has a duration limit.
- __init__(log=None)[source]¶
Create a new sppasLexRep instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- static create_tier(sources, locations)[source]¶
Create a tier from content and localization lists.
- Parameters
sources – (dict) dict of sources – in fact, the indexes.
locations – (list) list of location corresponding to the tokens
- Returns
(sppasTier)
- get_inputs(input_files)[source]¶
Return 2 tiers with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- static get_longest(speaker1, speaker2)[source]¶
Return the index of the last token of the longest repeated sequence.
The result does not depend on whether a non-speech event occurs in the middle of the repeated sequence or in the middle of the source sequence, nor on whether the tokens are repeated in the same order.
- Parameters
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
(int) Index or -1
- lexical_variation_detect(tier1, tier2)[source]¶
Detect the lexical variations between 2 tiers.
- Parameters
tier1 – (sppasTier)
tier2 – (sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of list of str) time-aligned tokens of 2 files
output – (str) the output file name
- Returns
(sppasTranscription)
- select(index1, speaker1, speaker2)[source]¶
Append (or not) a repetition.
- Parameters
index1 – (int) end index of the entry of the source (speaker1)
speaker1 – (DataSpeaker) Entries of speaker 1
speaker2 – (DataSpeaker) Entries of speaker 2
- Returns
(bool)
- set_alpha(value)[source]¶
Set the alpha option.
- Parameters
value – (float) Coefficient used to estimate stopwords
- set_span(value)[source]¶
Set the max span, in number of words.
- Parameters
value – (int) Max nb of tokens in a span window.
- set_span_duration(value)[source]¶
Set the spandur option.
- Parameters
value – (float, int) Max duration of a span window.
- set_stopwords(value)[source]¶
Set the stopwords option.
- Parameters
value – (bool) Enable adding the estimated stopwords
- class annotations.sppasMomel(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of Momel.
- __init__(log=None)[source]¶
Create a new sppasMomel instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- static anchors_to_tier(anchors)[source]¶
Transform anchors to a sppasTier.
Anchors are stored in frames. They are converted to seconds (a frame lasts 10 ms).
- Parameters
anchors – (List of Anchor)
- Returns
(sppasTier)
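The frame-to-seconds conversion described above amounts to a division by the frame rate. The sketch below shows it on a list of integer frame indexes; the function name is hypothetical, and it returns plain floats rather than a sppasTier.

```python
def frames_to_seconds(frame_indexes, frames_per_second=100):
    """Convert frame indexes to times in seconds.

    With one frame every 10 ms there are 100 frames per second.
    Hypothetical helper, for illustration only.
    """
    return [index / frames_per_second for index in frame_indexes]


print(frames_to_seconds([0, 10, 250]))  # [0.0, 0.1, 2.5]
```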
- convert(pitch)[source]¶
Search for momel anchors.
- Parameters
pitch – (list of float) pitch values sampled every 10 ms
- Returns
sppasTier
- estimate_momel(ipu_pitch, current_time)[source]¶
Estimate momel on an IPU.
- Parameters
ipu_pitch – (list of float) Pitch values of an IPU.
current_time – (float) Time value of the last pitch value
- Returns
(list of Anchor)
- fix_options(options)[source]¶
Fix all options.
Available options are:
lfen1
hzinf
hzsup
maxec
lfen2
seuildiff_x
seuildiff_y
glitch
- Parameters
options – (sppasOption)
- static fix_pitch(input_filename)[source]¶
Load pitch values from a file.
The given file is expected to contain a tier named “Pitch” with a pitch value every 10 ms, or a tier named “PitchTier”.
- Returns
A list of pitch values (one value every 10 ms).
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- class annotations.sppasOtherRepet(log=None)[source]¶
Bases:
annotations.SelfRepet.sppasbaserepet.sppasBaseRepet
SPPAS Automatic Other-Repetition Detection.
Automatically detect other-repetitions. The result must be re-filtered by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output is made of 2 tiers with sources and echoes.
- __init__(log=None)[source]¶
Create a new sppasOtherRepet instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- get_inputs(input_files)[source]¶
Return 2 tiers with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- other_detection(inputtier1, inputtier2)[source]¶
Other-Repetition detection.
- Parameters
inputtier1 – (Tier)
inputtier2 – (Tier)
- class annotations.sppasOverActivity(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the automatic overlaps estimator on intervals.
- detection(tier_spk1, tier_spk2)[source]¶
Search for the overlaps of annotations.
- Parameters
tier_spk1 – (sppasTier)
tier_spk2 – (sppasTier)
- get_inputs(input_files)[source]¶
Return 2 tiers with name given in options.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
The input files are given as a list of 2 files: the activity of speaker 1 and the activity of speaker 2.
- Parameters
input_files – (list of str) Time-aligned items, Time-aligned items
output – (str) the output name
- Returns
(sppasTranscription)
- class annotations.sppasParam(annotation_keys=None)[source]¶
Bases:
object
Annotation parameters manager.
Parameters of a set of annotations.
- __init__(annotation_keys=None)[source]¶
Create a new sppasParam instance with default values.
- Parameters
annotation_keys – (list) List of annotations to load. None=ALL.
- add_to_workspace(files)[source]¶
Add a list of files or directories into the workspace.
The state of all the added files is set to CHECKED.
- Parameters
files – (str or list of str)
- get_output_extension(out_format)[source]¶
Return the output extension defined for the given out_format.
- get_ref_ids(step)[source]¶
Return a list of identifiers of the reference publications.
- Parameters
step – (int) Annotation index
- get_ref_url(step, ref_id)[source]¶
Return the URL of the reference publication.
- Parameters
step – (int) Annotation index
ref_id – (str) Identifier of a reference
- get_step_idx(annotation_key)[source]¶
Get the annotation step index from an annotation key.
- Parameters
annotation_key – (str)
- Raises
KeyError
- load_annotations(annotation_files=None)[source]¶
Load the annotation configuration files.
Load from a list of given file names (without path) or from the default sppas ui configuration file.
- Parameters
annotation_files – (list) List of annotations to load. None=ALL.
- parse_config_file()[source]¶
Parse the sppasui.json file.
Parse the file to get the list of annotations and parse the corresponding “json” file.
- set_output_extension(output_ext, output_format)[source]¶
Fix the output extension of all the annotations of a given out_format.
- Parameters
output_ext – (str) File extension (with or without a dot)
output_format – (str) One of ANNOT, AUDIO, VIDEO or IMAGE
- Returns
(str) the assigned extension
- Raise
ValueError
- class annotations.sppasPhon(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the Phonetization automatic annotation.
- __init__(log=None)[source]¶
Create a sppasPhon instance without any linguistic resources.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(tier)[source]¶
Phonetize annotations of a tokenized tier.
- Parameters
tier – (Tier) the ortho transcription previously tokenized.
- Returns
(Tier) phonetized tier with name “Phones”
- fix_options(options)[source]¶
Fix all options.
Available options are:
phonunk
usesstdtokens
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(dict_filename=None, map_filename=None, **kwargs)[source]¶
Set the pronunciation dictionary and the mapping table.
- Parameters
dict_filename – (str) The pronunciation dictionary in HTK-ASCII format with UTF-8 encoding.
map_filename – (str) The filename of a mapping table. It is used to generate new pronunciations by mapping phonemes of the dict.
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Normalized text
output – (str) the output name
- Returns
(sppasTranscription)
- class annotations.sppasRMS(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the automatic RMS estimator on intervals.
Estimate the root-mean-square of segments, i.e. sqrt(sum(S_i^2)/n). This is a measure of the power in an audio signal.
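The formula sqrt(sum(S_i^2)/n) can be written directly in Python. This is a minimal sketch on a list of sample amplitudes; the function name and the choice of returning 0.0 for an empty segment are assumptions made for the example, and SPPAS itself works on audio channels rather than lists.

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of sample amplitudes."""
    if not samples:
        # Assumption: an empty segment has no power.
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


print(rms([3, -3, 3, -3]))  # 3.0
```

Because each sample is squared, positive and negative amplitudes contribute equally to the result.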
- __init__(log=None)[source]¶
Create a new sppasRMS instance.
- Parameters
log – (sppasLog) Human-readable logs.
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- get_inputs(input_files)[source]¶
Return the channel and the tier with ipus.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasChannel, sppasTier)
- class annotations.sppasReOcc(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the automatic re-occurrences annotation.
- __init__(log=None)[source]¶
Create a new sppasReOcc instance with only the general rules.
- Parameters
log – (sppasLog) Human-readable logs.
- detection(tier_spk1, tier_spk2)[source]¶
Search for the re-occurrences of annotations.
- Parameters
tier_spk1 – (sppasTier)
tier_spk2 – (sppasTier)
- fix_options(options)[source]¶
Fix all options.
Available options are:
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return 2 tiers with name given in options.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
Input file is a tuple with 2 files: the main speaker and the echoing speaker.
- Parameters
input_files – (list of list of str) Time-aligned items, Time-aligned items
output – (str) the output name
- Returns
(sppasTranscription)
- class annotations.sppasSearchIPUs(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the IPUs detection.
- __init__(log=None)[source]¶
Create a new sppasSearchIPUs instance.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(channel)[source]¶
Search for IPUs in the given channel.
- Parameters
channel – (sppasChannel) Input channel
- Returns
(sppasTier)
- fix_options(options)[source]¶
Fix all options.
Available options are:
threshold: volume threshold to decide whether a window is silence or not
win_length: length of the window used to estimate the volume values
min_sil: minimum duration of a silence
min_ipu: minimum duration of an ipu
shift_start: start boundary shift value.
shift_end: end boundary shift value.
- Parameters
options – (sppasOption)
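To show how the threshold, win_length, min_sil and min_ipu options could interact, here is a simplified sketch of a threshold-based detection on a list of per-window volume values. It is not the SPPAS algorithm: the function name, the default values and the merging strategy are assumptions, and the shift_start and shift_end options are not applied.

```python
def search_ipus(volumes, win_length=0.02, threshold=100,
                min_sil=0.2, min_ipu=0.3):
    """Return the (start, end) times, in seconds, of the detected IPUs.

    volumes: one volume (e.g. RMS) value per window of win_length seconds.
    Hypothetical helper, for illustration only.
    """
    # Step 1: build runs [is_silence, first_window, end_window) of
    # consecutive windows that are on the same side of the threshold.
    runs = []
    for i, volume in enumerate(volumes):
        is_sil = volume < threshold
        if runs and runs[-1][0] == is_sil:
            runs[-1][2] = i + 1
        else:
            runs.append([is_sil, i, i + 1])

    # Step 2: a silence shorter than min_sil is treated as sounding,
    # then adjacent runs of the same kind are merged.
    merged = []
    for is_sil, start, end in runs:
        if is_sil and (end - start) * win_length < min_sil:
            is_sil = False
        if merged and merged[-1][0] == is_sil:
            merged[-1][2] = end
        else:
            merged.append([is_sil, start, end])

    # Step 3: keep only the sounding runs lasting at least min_ipu.
    return [
        (start * win_length, end * win_length)
        for is_sil, start, end in merged
        if not is_sil and (end - start) * win_length >= min_ipu
    ]


volumes = [0, 0, 0, 500, 500, 0, 500, 500, 500, 0, 0, 0, 0]
print(search_ipus(volumes, win_length=0.25, threshold=100,
                  min_sil=0.5, min_ipu=0.5))  # [(0.75, 2.25)]
```

In this example the single silent window at index 5 lasts 0.25 s, which is shorter than min_sil, so the two sounding stretches around it are merged into one IPU.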
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Audio
output – (str) the output file name
- Returns
(sppasTranscription)
- run_for_batch_processing(input_files)[source]¶
Perform the annotation on a file.
This method is called by ‘batch_processing’. It fixes the name of the output file. If the output file already exists, the annotation is cancelled (the file won’t be overwritten). If not, it calls the run method.
- Parameters
input_files – (list of str) the inputs to perform a run
- Returns
output file name or None
- set_min_ipu(value)[source]¶
Fix the default minimum duration of an IPU.
- Parameters
value – (float) Duration in seconds.
- set_min_sil(value)[source]¶
Fix the default minimum duration of a silence.
- Parameters
value – (float) Duration in seconds.
- set_shift_end(value)[source]¶
Fix the end boundary shift value.
- Parameters
value – (float) Duration in seconds.
- set_shift_start(value)[source]¶
Fix the start boundary shift value.
- Parameters
value – (float) Duration in seconds.
- set_threshold(value)[source]¶
Fix the threshold volume.
- Parameters
value – (int) RMS value used as volume threshold
- class annotations.sppasSelfRepet(log=None)[source]¶
Bases:
annotations.SelfRepet.sppasbaserepet.sppasBaseRepet
SPPAS Automatic Self-Repetition Detection.
Detect self-repetitions. The result has never been validated by an expert. This annotation is performed on the basis of time-aligned tokens or lemmas. The output is made of 2 tiers with sources and echoes.
- __init__(log=None)[source]¶
Create a new sppasSelfRepet instance.
- Parameters
log – (sppasLog) Human-readable logs.
- class annotations.sppasStopWords(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the identification of stop words in a tier.
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(lang_resources, lang=None)[source]¶
Load a list of stop-words and replacements.
Override the existing loaded lists…
- Parameters
lang_resources – (str) File with extension ‘.stp’ or nothing
lang – (str)
- make_stp_tier(tier)[source]¶
Return a tier indicating if entries are stop-words.
- Parameters
tier – (sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Time-aligned tokens
output – (str) the output file name
- Returns
(sppasTranscription)
- class annotations.sppasSyll(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the automatic syllabification annotation.
- __init__(log=None)[source]¶
Create a new sppasSyll instance with only the general rules.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(phonemes, intervals=None)[source]¶
Syllabify labels of a time-aligned phones tier.
- Parameters
phonemes – (sppasTier) time-aligned phonemes tier
intervals – (sppasTier)
- Returns
(sppasTier)
- fix_options(options)[source]¶
Fix all options.
Available options are:
usesintervals
usesphons
tiername
createclasses
createstructures
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(config_filename, **kwargs)[source]¶
Fix the syllabification rules from a configuration file.
- Parameters
config_filename – Name of the configuration file with the rules
- make_classes(syllables)[source]¶
Create the tier with syllable classes.
- Parameters
syllables – (sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Time-aligned phonemes
output – (str) the output file name
- Returns
(sppasTranscription)
- set_create_tier_classes(create=True)[source]¶
Fix the createclasses option.
- Parameters
create – (bool)
- set_usesintervals(mode)[source]¶
Fix the usesintervals option.
- Parameters
mode – (bool) If mode is set to True, the syllabification operates inside specific (given) intervals.
- class annotations.sppasTGA(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
Estimate TGA on a tier – from D. Gibbon.
Create time groups then map them into a dictionary where:
key is a label assigned to the time group;
value is the list of observed durations of segments in this TG.
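The dictionary described above can be pictured with a small sketch. It assumes that syllables are given as (label, duration) pairs and that a time group ends whenever a separator label such as "#" is met; the separator, the 1-based numbering and the function name are assumptions made for the example, while the ‘tg_’ prefix is the documented default.

```python
def time_groups(syllables, separators=("#",), prefix="tg_"):
    """Map each time group label to the list of its segment durations.

    syllables: list of (label, duration) pairs, in time order.
    Hypothetical helper, for illustration only.
    """
    groups = {}
    current = []
    for label, duration in syllables:
        if label in separators:
            # A separator closes the time group in progress, if any.
            if current:
                groups[prefix + str(len(groups) + 1)] = current
                current = []
        else:
            current.append(duration)
    # Close the last time group when the input does not end on a separator.
    if current:
        groups[prefix + str(len(groups) + 1)] = current
    return groups


syllables = [("#", 0.5), ("a", 0.2), ("b", 0.25), ("#", 0.4), ("c", 0.3)]
print(time_groups(syllables))  # {'tg_1': [0.2, 0.25], 'tg_2': [0.3]}
```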
- __init__(log=None)[source]¶
Create a new sppasTGA instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(syllables)[source]¶
Estimate TGA on the given syllables.
- Parameters
syllables – (sppasTier)
- Returns
(sppasTranscription)
- fix_options(options)[source]¶
Fix all options.
Available options are:
with_radius
original
annotationpro
tg_prefix_label
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Syllabification
output – (str) the output file name
- Returns
(sppasTranscription)
- set_intercept_slope_annotationpro(value)[source]¶
Estimate intercepts and slopes with the method of annotationpro.
Default is True.
- Parameters
value – (boolean)
- set_intercept_slope_original(value)[source]¶
Estimate intercepts and slopes with the original method.
Default is False.
- Parameters
value – (boolean)
- set_tg_prefix_label(prefix)[source]¶
Fix the prefix to add to each TG.
- Parameters
prefix – (str) Default is ‘tg_’
- set_with_radius(with_radius)[source]¶
Set the with_radius option, used to estimate the duration.
- Parameters
with_radius – (int)
0 means to use Midpoint;
negative value means to use R-;
positive radius means to use R+.
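One plausible reading of these three modes is sketched below. It assumes that a duration is made of a value, computed from the midpoints of the boundaries, and of a margin derived from their radius, and that R- and R+ respectively subtract and add this margin. This interpretation and the function name are assumptions, not taken from the SPPAS source.

```python
def estimated_duration(value, margin, with_radius=0):
    """Return the duration to use, depending on the with_radius mode.

    value: duration between the midpoints of the two boundaries.
    margin: uncertainty of the duration, derived from the radius.
    Hypothetical helper, for illustration only.
    """
    if with_radius < 0:
        # R-: shortest duration compatible with the radius (assumption).
        return value - margin
    if with_radius > 0:
        # R+: longest duration compatible with the radius (assumption).
        return value + margin
    # 0: use the midpoints only.
    return value


for mode in (-1, 0, 1):
    print(mode, estimated_duration(0.5, 0.125, with_radius=mode))
```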
- syllables_to_timegroups(syllables)[source]¶
Create the time group intervals.
- Parameters
syllables – (sppasTier)
- Returns
(sppasTier) Time groups
- syllables_to_timesegments(syllables)[source]¶
Create the time segments intervals.
Time segments are time groups with serialized syllables.
- Parameters
syllables – (sppasTier)
- Returns
(sppasTier) Time segments
- static tga_to_tier(tga_result, timegroups, tier_name, tag_type='float')[source]¶
Create a tier from one of the TGA results.
- Parameters
tga_result – One of the results of TGA
timegroups – (sppasTier) Time groups
tier_name – (str) Name of the output tier
tag_type – (str) Type of the sppasTag to be included
- Returns
(sppasTier)
- static tga_to_tier_reglin(tga_result, timegroups, intercept=True)[source]¶
Create a tier of intercept or slope values from one of the TGA results.
- Parameters
tga_result – One of the results of TGA
timegroups – (sppasTier) Time groups
intercept – (boolean) Export the intercept. If False, export the slope.
- Returns
(sppasTier)
- class annotations.sppasTextNorm(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
Text normalization automatic annotation.
- __init__(log=None)[source]¶
Create a sppasTextNorm instance without any linguistic resources.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(tier)[source]¶
Text normalization of all labels of a tier.
- Parameters
tier – (sppasTier) the orthographic transcription (standard or EOT)
- Returns
A tuple with 3 tiers named “Tokens-Faked”, “Tokens-Std” and “Tokens-Custom”
- fix_options(options)[source]¶
Fix all options.
Available options are:
faked
std
custom
- Parameters
options – (sppasOption)
- get_inputs(input_files)[source]¶
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(vocab_filename, lang='und', **kwargs)[source]¶
Fix the list of words of a given language.
It allows a better tokenization, and enables the language-dependent modules like num2letters.
- Parameters
vocab_filename – (str) File with the orthographic transcription
lang – (str) the language code
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) orthographic transcription
output – (str) the output file name
- Returns
(sppasTranscription)
- set_custom(value)[source]¶
Fix the custom option.
- Parameters
value – (bool) Create a customized tokenization
- set_faked(value)[source]¶
Fix the faked option.
- Parameters
value – (bool) Create a faked tokenization