annotations.Align package¶
Subpackages¶
- annotations.Align.aligners package
- annotations.Align.models package
- Subpackages
- annotations.Align.models.acm package
- Submodules
- annotations.Align.models.acm.acfeatures module
- annotations.Align.models.acm.acmbaseio module
- annotations.Align.models.acm.acmodel module
- annotations.Align.models.acm.acmodelhtkio module
- annotations.Align.models.acm.hmm module
- annotations.Align.models.acm.htkscripts module
- annotations.Align.models.acm.htktrain module
- annotations.Align.models.acm.phoneset module
- annotations.Align.models.acm.readwrite module
- annotations.Align.models.acm.tiedlist module
- Module contents
- annotations.Align.models.slm package
- Submodules
- annotations.Align.models.modelsexc module
- annotations.Align.models.tiermapping module
- Module contents
- Subpackages
Submodules¶
annotations.Align.modelmixer module¶
- filename
sppas.src.annotations.Align.models.acm.modelmixer.py
- author
Brigitte Bigi
- contact
- summary
Combine two acoustic models.
- class annotations.Align.modelmixer.sppasModelMixer[source]¶
Bases:
object
Mix two acoustic models.
Create a mixed monophones model. Typical use is to create an acoustic model of a non-native speaker.
- mix(outputdir, format='hmmdefs', gamma=1.0)[source]¶
Mix the acoustic model (acm) of the text with the acoustic model of the mother language of the speaker reading that text.
All new phones are added and the shared ones are combined using a Static Linear Interpolation.
- Parameters
outputdir – (str) The directory to save the new mixed model.
format – (str) the format of the resulting acoustic model
gamma – (float) coefficient to apply to the model, between 0. and 1. A value of 1. keeps the current version of each shared hmm.
- Raises
TypeError, ValueError
- Returns
a tuple with the number of hmms that were (appended, interpolated, kept, changed).
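The Static Linear Interpolation used for shared hmms can be sketched as follows. This is only an illustration of the gamma weighting described above, not the actual sppasModelMixer code; the function name interpolate is hypothetical.

```python
def interpolate(current, other, gamma):
    """Combine two parameter vectors of a shared hmm.

    gamma is applied to the current model and (1 - gamma) to the
    other one, so gamma=1. keeps the current version unchanged.
    """
    return [gamma * c + (1.0 - gamma) * o for c, o in zip(current, other)]

# gamma=1. keeps the current model's values
print(interpolate([0.2, 0.8], [0.6, 0.4], 1.0))   # [0.2, 0.8]
# gamma=0.5 averages both models
print(interpolate([0.2, 0.8], [0.6, 0.4], 0.5))
```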
annotations.Align.sppasalign module¶
- filename
sppas.src.annotations.Align.sppasalign.py
- author
Brigitte Bigi
- contact
- summary
SPPAS integration of Alignment automatic annotation.
- class annotations.Align.sppasalign.sppasAlign(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the Alignment automatic annotation.
- Author
Brigitte Bigi
- Contact
This class can produce from 1 to 5 tiers with the following names:
PhonAlign
TokensAlign (if tokens are given in the input)
PhnTokAlign - option (if tokens are given in the input)
How to use sppasAlign?
>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones], [audio, tokens], output)
- __init__(log=None)[source]¶
Create a new sppasAlign instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]¶
Perform speech segmentation of data.
- Parameters
phon_tier – (Tier) phonetization.
tok_tier – (Tier) tokenization, or None.
tok_faked_tier – (Tier) rescue tokenization, or None.
input_audio – (str) Audio file name.
workdir – (str) The working directory
- Returns
tier_phn, tier_tok
- fix_options(options)[source]¶
Fix all options.
Available options are:
clean
basic
aligner
- Parameters
options – (sppasOption)
- static fix_workingdir(inputaudio=None)[source]¶
Fix the working directory to store temporarily the data.
- Parameters
inputaudio – (str) Audio file name
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- get_inputs(input_files)[source]¶
Return the audio file name and the 2 tiers.
Two tiers: the tier with phonetization and the tier with text normalization.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasChannel, sppasTier, sppasTier)
- load_resources(model, model_L1=None, **kwargs)[source]¶
Fix the acoustic model directory.
Create a SpeechSegmenter and AlignerIO.
- Parameters
model – (str) Directory of the acoustic model of the language of the text
model_L1 – (str) Directory of the acoustic model of the mother language of the speaker
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Phonemes, and optionally tokens, audio
output – (str) the output name
- Returns
(sppasTranscription)
- set_aligner(aligner_name)[source]¶
Fix the name of the aligner.
- Parameters
aligner_name – (str) Case-insensitive name of the aligner.
annotations.Align.tracksgmt module¶
- filename
sppas.src.annotations.Align.tracksgmt.py
- author
Brigitte Bigi
- contact
- summary
Automatic segmentation of the data into track segments
- class annotations.Align.tracksgmt.TrackSegmenter(model=None, aligner_name='basic')[source]¶
Bases:
object
Automatic segmentation of a unit of speech.
Speech segmentation of a unit of speech (IPU/utterance/sentence/segment) at phones and tokens levels.
This class is mainly an interface with external automatic aligners.
It is expected that all the following data were previously properly fixed:
audio file: 1 channel, 16000 Hz, 16 bits;
tokenization: UTF-8 encoding file (optional);
phonetization: UTF-8 encoding file;
acoustic model: HTK-ASCII (Julius or HVite expect this format);
- and that:
both the AC and phonetization are based on the same phone set
both the tokenization and phonetization contain the same number of words
- DEFAULT_ALIGNER = 'basic'¶
- __init__(model=None, aligner_name='basic')[source]¶
Create a TrackSegmenter instance.
- Parameters
model – (str) Name of the directory of the acoustic model.
aligner_name – (str) The identifier name of the aligner.
It is expected that the AC model contains at least a file with name “hmmdefs”, and a file with name “monophones” for HVite command. It can also contain:
tiedlist file;
monophones.repl file;
config file.
Any other file will be ignored.
- aligners = <annotations.Align.aligners.aligner.sppasAligners object>¶
- segment(audio_filename, phon_name, token_name, align_name)[source]¶
Call an aligner to perform speech segmentation and manage errors.
- Parameters
audio_filename – (str) the audio file name of an IPU
phon_name – (str) file name with the phonetization
token_name – (str) file name with the tokenization
align_name – (str) file name to save the result WITHOUT ext.
- Returns
A message from the aligner in case of any problem, or
an empty string on success.
annotations.Align.tracksio module¶
- filename
sppas.src.annotations.Align.tracksio.py
- author
Brigitte Bigi
- contact
- summary
Automatic segmentation of the data into tracks
- class annotations.Align.tracksio.ListOfTracks[source]¶
Bases:
object
Manage the file with a list of tracks (units, ipus…).
- DEFAULT_FILENAME = 'tracks.list'¶
- class annotations.Align.tracksio.TrackNamesGenerator[source]¶
Bases:
object
Manage names of the files for a given track number.
- static align_filename(track_dir, track_number, ext=None)[source]¶
Return the name of the time-aligned file, without extension.
- class annotations.Align.tracksio.TracksReader[source]¶
Bases:
object
Read time-aligned tracks.
Manage tracks for the time-aligned phonemes and tokens.
- RADIUS = 0.005¶
- class annotations.Align.tracksio.TracksReaderWriter(mapping)[source]¶
Bases:
object
Manager for tracks from/to tiers.
- DELIMITERS = (' ', '|', '-')¶
- __init__(mapping)[source]¶
Create a new TracksReaderWriter instance.
- Parameters
mapping – (Mapping) a mapping table to convert the phone set
- static get_filenames(track_dir, track_number)[source]¶
Return file names corresponding to a given track.
- Parameters
track_dir – (str)
track_number – (int)
- Returns
(audio, phn, tok, align) file names
- get_units(dir_name)[source]¶
Return the time units of all tracks.
- Parameters
dir_name – (str) Input directory to get files.
- read_aligned_tracks(dir_name)[source]¶
Read time-aligned tracks in a directory.
- Parameters
dir_name – (str) Input directory to get files.
- Returns
(sppasTier, sppasTier, sppasTier)
- split_into_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]¶
Write tracks from the given data.
- Parameters
input_audio – (str) Audio file name, or None if not needed (basic alignment).
phon_tier – (sppasTier) The phonetization tier.
tok_tier – (sppasTier) The tokens tier, or None.
tok_rescue_tier – (sppasTier) The tokens rescue tier, or None.
dir_align – (str) Output directory to store files.
- Returns
PhonAlign, TokensAlign
- class annotations.Align.tracksio.TracksWriter[source]¶
Bases:
object
Write non-aligned track files.
Manage tracks for the audio, the phonetization and the tokenization.
- static write_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]¶
Main method to write tracks from the given data.
- Parameters
input_audio – (str) File name of the audio file.
phon_tier – (Tier) Tier with phonetization to split.
tok_tier – (Tier) Tier with tokenization to split.
tok_rescue_tier – (Tier) Tier with tokens to split.
dir_align – (str) Directory to put units.
- Returns
List of tracks with (start-time end-time)
Module contents¶
- filename
sppas.src.annotations.Align.__init__.py
- author
Brigitte Bigi
- contact
- summary
Alignment automatic annotation of SPPAS.
Alignment is the process of aligning speech with its corresponding transcription at the phone level. This phonetic segmentation problem consists in time-matching a given speech unit with a phonetic representation of that unit. The goal is to generate an alignment between the speech signal and its phonetic representation.
By default, alignment is based on the Julius Speech Recognition Engine (SRE) for 3 reasons:
it is easy to install, which is important for users;
it is easy to use, and thus was easy to integrate into SPPAS;
its performance corresponds to the state of the art of HMM-based systems and is quite good.
The HTK command “HVite” can also be used to perform Alignment.
A finite state grammar that describes sentence patterns to be recognized and an acoustic model are needed. A grammar essentially defines constraints on what the SRE can expect as input. It is a list of words that the SRE listens to. Each word has a set of associated list of phonemes, extracted from the dictionary. When given a speech input, Julius searches for the most likely word sequence under constraint of the given grammar.
Speech Alignment also requires an Acoustic Model in order to align speech. An acoustic model is a file that contains statistical representations of each of the distinct sounds of one language. Each phoneme is represented by one of these statistical representations. SPPAS is based on the use of HTK-ASCII acoustic models.
- class annotations.Align.HMMInterpolation[source]¶
Bases:
object
HMM interpolation.
- static linear_interpolate_matrix(matrices, gammas)[source]¶
Interpolate linearly matrix with gamma coefficients.
- Parameters
matrices – List of matrix
gammas – List of coefficients (must sum to 1.)
- static linear_interpolate_mixtures(mixtures, gammas)[source]¶
Linear interpolation of a set of mixtures, of one stream only.
- Parameters
mixtures – (OrderedDict)
gammas – List of coefficients (must sum to 1.)
- Returns
mixture (OrderedDict)
- static linear_interpolate_states(states, gammas)[source]¶
Linear interpolation of a set of states, of one index only.
- Parameters
states – (OrderedDict)
gammas – List of coefficients (must sum to 1.)
- Returns
state (OrderedDict)
- static linear_interpolate_streams(streams, gammas)[source]¶
Linear interpolation of a set of streams, of one state only.
- Parameters
streams – (OrderedDict)
gammas – List of coefficients (must sum to 1.)
- Returns
stream (OrderedDict)
- static linear_interpolate_transitions(transitions, gammas)[source]¶
Linear interpolation of a set of transitions, of an hmm.
- Parameters
transitions – (OrderedDict): with key=’dim’ and key=’matrix’
gammas – List of coefficients (must sum to 1.)
- Returns
transition (OrderedDict)
- static linear_interpolate_values(values, gammas)[source]¶
Interpolate linearly values with gamma coefficients.
- Parameters
values – List of values
gammas – List of coefficients (must sum to 1.)
- static linear_interpolate_vectors(vectors, gammas)[source]¶
Interpolate linearly vectors with gamma coefficients.
- Parameters
vectors – List of vectors
gammas – List of coefficients (must sum to 1.)
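The value and vector interpolations above can be sketched as follows; this is a hedged illustration of what linear interpolation with gamma coefficients computes, not the HMMInterpolation implementation, and the helper names are hypothetical.

```python
def linear_values(values, gammas):
    """Interpolate a list of scalar values with gamma coefficients.

    gammas must sum to 1.; each model contributes to the result in
    proportion to its coefficient.
    """
    if abs(sum(gammas) - 1.0) > 1e-9:
        raise ValueError("gamma coefficients must sum to 1.")
    return sum(v * g for v, g in zip(values, gammas))

def linear_vectors(vectors, gammas):
    """Interpolate component-wise a list of equal-length vectors."""
    return [linear_values(col, gammas) for col in zip(*vectors)]

print(linear_values([2.0, 4.0], [0.75, 0.25]))                # 2.5
print(linear_vectors([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))   # [0.5, 0.5]
```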
- class annotations.Align.TrackSegmenter(model=None, aligner_name='basic')[source]¶
Bases:
object
Automatic segmentation of a unit of speech.
Speech segmentation of a unit of speech (IPU/utterance/sentence/segment) at phones and tokens levels.
This class is mainly an interface with external automatic aligners.
It is expected that all the following data were previously properly fixed:
audio file: 1 channel, 16000 Hz, 16 bits;
tokenization: UTF-8 encoding file (optional);
phonetization: UTF-8 encoding file;
acoustic model: HTK-ASCII (Julius or HVite expect this format);
- and that:
both the AC and phonetization are based on the same phone set
both the tokenization and phonetization contain the same number of words
- DEFAULT_ALIGNER = 'basic'¶
- __init__(model=None, aligner_name='basic')[source]¶
Create a TrackSegmenter instance.
- Parameters
model – (str) Name of the directory of the acoustic model.
aligner_name – (str) The identifier name of the aligner.
It is expected that the AC model contains at least a file with name “hmmdefs”, and a file with name “monophones” for HVite command. It can also contain:
tiedlist file;
monophones.repl file;
config file.
Any other file will be ignored.
- aligners = <annotations.Align.aligners.aligner.sppasAligners object>¶
- segment(audio_filename, phon_name, token_name, align_name)[source]¶
Call an aligner to perform speech segmentation and manage errors.
- Parameters
audio_filename – (str) the audio file name of an IPU
phon_name – (str) file name with the phonetization
token_name – (str) file name with the tokenization
align_name – (str) file name to save the result WITHOUT ext.
- Returns
A message from the aligner in case of any problem, or
an empty string on success.
- class annotations.Align.TracksReaderWriter(mapping)[source]¶
Bases:
object
Manager for tracks from/to tiers.
- DELIMITERS = (' ', '|', '-')¶
- __init__(mapping)[source]¶
Create a new TracksReaderWriter instance.
- Parameters
mapping – (Mapping) a mapping table to convert the phone set
- static get_filenames(track_dir, track_number)[source]¶
Return file names corresponding to a given track.
- Parameters
track_dir – (str)
track_number – (int)
- Returns
(audio, phn, tok, align) file names
- get_units(dir_name)[source]¶
Return the time units of all tracks.
- Parameters
dir_name – (str) Input directory to get files.
- read_aligned_tracks(dir_name)[source]¶
Read time-aligned tracks in a directory.
- Parameters
dir_name – (str) Input directory to get files.
- Returns
(sppasTier, sppasTier, sppasTier)
- split_into_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]¶
Write tracks from the given data.
- Parameters
input_audio – (str) Audio file name, or None if not needed (basic alignment).
phon_tier – (sppasTier) The phonetization tier.
tok_tier – (sppasTier) The tokens tier, or None.
tok_rescue_tier – (sppasTier) The tokens rescue tier, or None.
dir_align – (str) Output directory to store files.
- Returns
PhonAlign, TokensAlign
- class annotations.Align.sppasAlign(log=None)[source]¶
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the Alignment automatic annotation.
- Author
Brigitte Bigi
- Contact
This class can produce from 1 to 5 tiers with the following names:
PhonAlign
TokensAlign (if tokens are given in the input)
PhnTokAlign - option (if tokens are given in the input)
How to use sppasAlign?
>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones], [audio, tokens], output)
- __init__(log=None)[source]¶
Create a new sppasAlign instance.
Log is used for a better communication of the annotation process and its results. If None, logs are redirected to the default logging system.
- Parameters
log – (sppasLog) Human-readable logs.
- convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]¶
Perform speech segmentation of data.
- Parameters
phon_tier – (Tier) phonetization.
tok_tier – (Tier) tokenization, or None.
tok_faked_tier – (Tier) rescue tokenization, or None.
input_audio – (str) Audio file name.
workdir – (str) The working directory
- Returns
tier_phn, tier_tok
- fix_options(options)[source]¶
Fix all options.
Available options are:
clean
basic
aligner
- Parameters
options – (sppasOption)
- static fix_workingdir(inputaudio=None)[source]¶
Fix the working directory to store temporarily the data.
- Parameters
inputaudio – (str) Audio file name
- static get_input_extensions()[source]¶
Extensions that the annotation expects for its input filename.
- get_inputs(input_files)[source]¶
Return the audio file name and the 2 tiers.
Two tiers: the tier with phonetization and the tier with text normalization.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasChannel, sppasTier, sppasTier)
- load_resources(model, model_L1=None, **kwargs)[source]¶
Fix the acoustic model directory.
Create a SpeechSegmenter and AlignerIO.
- Parameters
model – (str) Directory of the acoustic model of the language of the text
model_L1 – (str) Directory of the acoustic model of the mother language of the speaker
- run(input_files, output=None)[source]¶
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Phonemes, and optionally tokens, audio
output – (str) the output name
- Returns
(sppasTranscription)
- set_aligner(aligner_name)[source]¶
Fix the name of the aligner.
- Parameters
aligner_name – (str) Case-insensitive name of the aligner.
- class annotations.Align.sppasAligners[source]¶
Bases:
object
Manager of the aligners implemented in the package.
- check(aligner_name)[source]¶
Check whether the aligner name is known or not.
- Parameters
aligner_name – (str) Name of the aligner.
- Returns
the formatted aligner name
- classes(aligner_name=None)[source]¶
Return the list of aligner classes.
- Parameters
aligner_name – (str) A specific aligner
- Returns
BasicAligner, or a list if no aligner name is given
- default_extension(aligner_name=None)[source]¶
Return the default extension of each aligner.
- Parameters
aligner_name – (str) A specific aligner
- Returns
str, or a dict of str if no aligner name is given
- extensions(aligner_name=None)[source]¶
Return the list of supported extensions of each aligner.
- Parameters
aligner_name – (str) A specific aligner
- Returns
list of str, or a dict of list if no aligner name is given
- instantiate(model_dir=None, aligner_name='basic')[source]¶
Instantiate an aligner to the appropriate system from its name.
If an error occurs, the basic aligner is returned.
- Parameters
model_dir – (str) Directory of the acoustic model
aligner_name – (str) Name of the aligner
- Returns
an Aligner instance.
- class annotations.Align.sppasArpaIO[source]¶
Bases:
object
ARPA statistical language models reader/writer.
This class is able to load and save statistical language models from ARPA-ASCII files.
- load(filename)[source]¶
Load a model from an ARPA file.
- Parameters
filename – (str) Name of the file of the model.
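ARPA-ASCII is a plain-text format in which each n-gram line holds a log10 probability, the n-gram itself, and an optional back-off weight. As a hedged sketch of what reading such a file involves (not the sppasArpaIO implementation; the function name is hypothetical), a minimal reader for the unigram section could look like:

```python
def read_arpa_unigrams(text):
    """Parse the 1-grams section of an ARPA-ASCII language model.

    Each data line holds: log10-probability, word, optional back-off
    weight. Returns a dict mapping word -> log-probability.
    """
    probs = {}
    in_unigrams = False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if line.startswith("\\") and in_unigrams:
            break                       # next section, or \end\
        if in_unigrams and line:
            fields = line.split()
            probs[fields[1]] = float(fields[0])
    return probs

arpa = """\\data\\
ngram 1=2

\\1-grams:
-0.30103 hello -0.1
-0.69897 world

\\end\\
"""
print(read_arpa_unigrams(arpa))   # {'hello': -0.30103, 'world': -0.69897}
```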
- class annotations.Align.sppasDataTrainer[source]¶
Bases:
object
Acoustic model trainer for HTK-ASCII models.
This class is a manager for the data created at each step of the acoustic model training procedure, following the HTK Handbook. It includes:
HTK scripts
phoneme prototypes
log files
features
- __init__()[source]¶
Create a sppasDataTrainer instance.
Initialize all members to None or empty lists.
- check()[source]¶
Check if all members are initialized with appropriate values.
- Returns
None if success.
- Raises
IOError
- create(workdir=None, scriptsdir='scripts', featsdir='features', logdir='log', protodir=None, protofilename='proto.hmm')[source]¶
Create all folders and their content (if possible) with their default names.
- Parameters
workdir – (str) Name of the working directory
scriptsdir – (str) The folder for HTK scripts
featsdir – (str) The folder for features
logdir – (str) Directory to store log files
protodir – (str) Name of the prototypes directory
protofilename – (str) Name of the file for the HMM prototype.
- Raises
IOError
- fix_proto(proto_dir='protos', proto_filename='proto.hmm')[source]¶
(Re-)Create the proto.
If relevant, create a protos directory and add the proto file. Create the macro if any.
- Parameters
proto_dir – (str) Directory in which prototypes will be stored
proto_filename – (str) File name of the default prototype
- fix_storage_dirs(basename)[source]¶
Fix the folders to store annotated speech and audio files.
The given basename can be something like: align, phon, trans, …
- Parameters
basename – (str) a name to identify storage folders
- Raises
IOError
- fix_working_dir(workdir=None, scriptsdir='scripts', featsdir='features', logdir='log')[source]¶
Set the working directory and its folders.
Create all of them if necessary.
- Parameters
workdir – (str) The working main directory
scriptsdir – (str) The folder for HTK scripts
featsdir – (str) The folder for features
logdir – (str) The folder to write output logs
- get_storemfc()[source]¶
Get the current folder name to store MFCC data files.
- Returns
folder name or None.
- get_storetrs()[source]¶
Get the current folder name to store transcribed data files.
- Returns
folder name or None.
- class annotations.Align.sppasHMM(name='und')[source]¶
Bases:
object
HMM representation for one phone.
Hidden Markov Models (HMMs) provide a simple and effective framework for modeling time-varying spectral vector sequences; as a consequence, most speech technology systems are based on HMMs. Each base phone is represented by a continuous density HMM, with transition probability parameters and output observation distributions. One of the most commonly used extensions to standard HMMs is to model the state-output distribution as a mixture model: a mixture of Gaussians is a highly flexible distribution able to model, for example, asymmetric and multi-modal data.
- An HMM-definition is made of:
state_count: int
states: list of OrderedDict with “index” and “state” as keys.
transition: OrderedDict with “dim” and “matrix” as keys.
options
regression_tree
duration
- DEFAULT_NAME = 'und'¶
- __init__(name='und')[source]¶
Create a sppasHMM instance.
The model includes a default name and an empty definition.
- Parameters
name – (str) Name of the HMM (usually the phoneme in SAMPA)
- create(states, transition, name=None)[source]¶
Create the hmm and set it.
- Parameters
states – (OrderedDict)
transition – (OrderedDict)
name – (string) The name of the HMM.
If name is set to None, the default name is assigned.
- static create_default()[source]¶
Create a default ordered dictionary, used for states.
- Returns
collections.OrderedDict()
- static create_gmm(means, variances, gconsts=None, weights=None)[source]¶
Create and return a GMM.
- Returns
collections.OrderedDict()
- create_proto(proto_size, nb_mix=1)[source]¶
Create the 5-states HMM proto and set it.
- Parameters
proto_size – (int) Number of mean and variance values. It’s commonly either 25 or 39, depending on the MFCC parameters.
nb_mix – (int) Number of mixtures (i.e. the number of times means and variances occur)
- create_sp()[source]¶
Create the 3-states HMM sp and set it.
The sp model is based on a 3-state HMM with the string “silst” as state 2, and a 3x3 transition matrix as follows:
0.0 1.0 0.0
0.0 0.9 0.1
0.0 0.0 0.0
- static create_square_matrix(matrix)[source]¶
Create a default matrix.
- Returns
collections.OrderedDict()
- static create_transition(state_stay_probabilities=(0.6, 0.6, 0.7))[source]¶
Create and return a transition matrix.
- Parameters
state_stay_probabilities – (list) Center transition probabilities
- Returns
collections.OrderedDict()
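The default stay probabilities (0.6, 0.6, 0.7) suggest an HTK-style matrix in which each emitting state either stays put or advances to the next state. The following is a sketch of that layout under the stated assumption (a non-emitting entry and exit state around the emitting ones); it is not the create_transition implementation, and the function name is hypothetical.

```python
def transition_matrix(stay=(0.6, 0.6, 0.7)):
    """Build an HTK-style transition matrix from stay probabilities.

    With n emitting states the matrix is (n+2) x (n+2): a non-emitting
    entry state, the emitting states (each either stays with the given
    probability or advances), and a non-emitting exit state.
    """
    n = len(stay) + 2
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0                     # entry -> first emitting state
    for i, p in enumerate(stay, start=1):
        matrix[i][i] = p                   # stay in the same state
        matrix[i][i + 1] = 1.0 - p         # advance to the next state
    return matrix                          # last row (exit) is all zeros

for row in transition_matrix():
    print(row)
```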
- property definition¶
Return the definition (OrderedDict) of the model.
- get_state(index)[source]¶
Return the state of a given index or None if index is not found.
- Parameters
index – (int) State index (commonly between 1 and 5)
- Returns
collections.OrderedDict or None
- get_vecsize()[source]¶
Return the number of means and variance of each state.
If state is pointing to a macro, 0 is returned.
- property name¶
Return the name (str) of the model.
- set(name, definition)[source]¶
Set the model.
- Parameters
name – (str) Name of the HMM
definition – (OrderedDict) Definition of the HMM (states and transitions)
- set_definition(definition)[source]¶
Set the definition of the model.
- Parameters
definition – (OrderedDict) Definition of the HMM (states and transitions)
- Raises
ModelsDataTypeError
- set_name(name)[source]¶
Set the name of the model.
- Parameters
name – (str) Name of the HMM.
- Raises
ModelsDataTypeError
- static_linear_interpolation(hmm, gamma)[source]¶
Static Linear Interpolation.
This is perhaps one of the most straightforward manners of combining models, and an efficient way of merging the GMMs of the component models.
Gamma coefficient is applied to self and (1-gamma) to the other hmm.
- Parameters
hmm – (HMM) the hmm to be interpolated with.
gamma – (float) coefficient to be applied to self.
- Returns
(bool) Status of the interpolation.
- class annotations.Align.sppasHTKModelTrainer(corpus=None)[source]¶
Bases:
object
Acoustic model trainer.
This class allows training an acoustic model from audio data and its transcriptions (phonetic, orthographic, or both).
Acoustic models are trained with HTK toolbox using a training corpus of speech, previously segmented in utterances and transcribed. The trained models are Hidden Markov models (HMMs). Typically, the HMM states are modeled by Gaussian mixture densities whose parameters are estimated using an expectation maximization procedure. The outcome of this training procedure is dependent on the availability of accurately annotated data and on good initialization.
Acoustic models are trained from 16-bit, 16000 Hz WAV files. The Mel-frequency cepstral coefficients (MFCC), along with their first and second derivatives, are extracted from the speech.
Step 1 is the data preparation.
Step 2 is the monophones initialization.
Step 3 is the monophones generation. This first model is re-estimated using the MFCC files to create a new model, using “HERest”. Then, it fixes the “sp” model from the “sil” model by extracting only 3 states of the initial 5-states model. Finally, this monophone model is re-estimated using the MFCC files and the training data.
Step 4 creates tied-state triphones from monophones and from some language specificity defined by means of a configuration file.
- __init__(corpus=None)[source]¶
Create a sppasHTKModelTrainer instance.
- Parameters
corpus – (sppasTrainingCorpus)
- align_trs(tokenizer, phonetizer, aligner)[source]¶
Alignment of the transcribed speech using the current model.
- make_triphones()[source]¶
Extract triphones from monophones data (mlf).
A new mlf file is created with triphones instead of monophones, and a file with the list of triphones is created. The latter is sorted in order of arrival (this is very important).
Command: HLEd -T 2 -n output/triphones -l '*' -i output/wintri.mlf scripts/mktri.led corpus.mlf
- small_pause()[source]¶
Create and save the “sp” model for short pauses.
create a “silst” macro, using state 3 of the “sil” HMM,
adapt state 3 of the “sil” HMM definition, to use “silst”,
create a “sp” HMM,
save the “sp” HMM into the directory of monophones.
- train_step(scpfile, rounds=3, dopruning=True)[source]¶
Perform some rounds of HERest estimation.
It expects the input HMM definition to have been initialised and it uses the embedded Baum-Welch re-estimation. This involves finding the probability of being in each state at each time frame using the Forward-Backward algorithm.
- Parameters
scpfile – (str) Description file with the list of data files
rounds – (int) Number of times HERest is called.
dopruning – (bool) Do the pruning
- Returns
bool
- training_recipe(outdir=None, delete=False, header_tree=None)[source]¶
Create an acoustic model and return it.
A corpus (sppasTrainingCorpus) must be previously defined.
- Parameters
outdir – (str) Directory to save the final model and related files
delete – (bool) Delete the working directory.
header_tree – (str) Name of the script file to train a triphone (commonly header-tree.hed).
- Returns
sppasAcModel
- class annotations.Align.sppasNgramCounter(n=1, wordslist=None)[source]¶
Bases:
object
N-gram representation.
- __init__(n=1, wordslist=None)[source]¶
Create a sppasNgramCounter instance.
- Parameters
n – (int) n-gram order, between 1 and MAX_ORDER.
wordslist – (sppasVocabulary) a list of accepted tokens.
- append_sentence(sentence)[source]¶
Append a sentence in a dictionary of data counts.
- Parameters
sentence – (str) A sentence with tokens separated by whitespace.
- count(*datafiles)[source]¶
Count ngrams of order n from data files.
- Parameters
datafiles – (*args) is a set of file names, with UTF-8 encoding.
If the file contains more than one tier, only the first one is used.
- get_count(sequence)[source]¶
Get the count of a specific sequence.
- Parameters
sequence – (str) tokens separated by whitespace.
- Returns
(int)
- get_ncount()[source]¶
Get the number of observed n-grams.
Excluding start symbols if unigrams.
- Returns
(int)
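Counting n-grams from a whitespace-tokenized sentence, with the usual start and end symbols, can be sketched as follows. This is an illustrative helper under those assumptions, not the sppasNgramCounter implementation, and the function name is hypothetical.

```python
from collections import Counter

def count_ngrams(sentence, n=2, start="<s>", end="</s>"):
    """Count n-grams of order n in a whitespace-tokenized sentence.

    The sentence is padded with n-1 start symbols and one end symbol,
    as is usual when training language models.
    """
    tokens = [start] * (n - 1) + sentence.split() + [end]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = count_ngrams("a b a b", n=2)
print(counts[("a", "b")])   # 2
```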
- class annotations.Align.sppasNgramsModel(norder=1)[source]¶
Bases:
object
Statistical language model trainer.
A model is made of:
n-gram counts: a list of sppasNgramCounter instances.
n-gram probabilities.
How to estimate n-gram probabilities?
A bit of theory… The following is copied (cribbed!) from the following SRILM web page: http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html
- a_z – An N-gram where a is the first word, z is the last word, and “_” represents 0 or more words in between.
- c(a_z) – The count of N-gram a_z in the training data.
- p(a_z) – The estimated conditional probability of the nth word z given the first n-1 words (a_) of an N-gram.
- a_ – The n-1 word prefix of the N-gram a_z.
- _z – The n-1 word suffix of the N-gram a_z.
N-gram models try to estimate the probability of a word z in the context of the previous n-1 words (a_). One way to estimate p(a_z) is to look at the number of times word z has followed the previous n-1 words (a_):
p(a_z) = c(a_z)/c(a_)
This is known as the maximum likelihood (ML) estimate. Notice that it assigns zero probability to N-grams that have not been observed in the training data.
To avoid the zero probabilities, we take some probability mass from the observed N-grams and distribute it to unobserved N-grams. Such redistribution is known as smoothing or discounting. Most existing smoothing algorithms can be described by the following equation:
p(a_z) = (c(a_z) > 0) ? f(a_z) : bow(a_) p(_z)
If the N-gram a_z has been observed in the training data, we use the distribution f(a_z). Typically f(a_z) is discounted to be less than the ML estimate so we have some leftover probability for the z words unseen in the context (a_). Different algorithms mainly differ on how they discount the ML estimate to get f(a_z).
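The maximum likelihood estimate p(a_z) = c(a_z)/c(a_) can be computed directly from counts. The sketch below does this for bigrams, and shows why smoothing matters: unseen bigrams get probability zero. The helper name is hypothetical and this is not the sppasNgramsModel implementation.

```python
from collections import Counter

def ml_bigram_prob(corpus, a, z):
    """Maximum likelihood estimate p(z | a) = c(a z) / c(a _).

    corpus is a list of whitespace-tokenized sentences; unseen
    bigrams get probability zero, which motivates smoothing.
    """
    bigrams = Counter()
    prefixes = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            prefixes[w1] += 1
    if prefixes[a] == 0:
        return 0.0
    return bigrams[(a, z)] / prefixes[a]

corpus = ["the cat sat", "the cat ran"]
print(ml_bigram_prob(corpus, "cat", "sat"))   # 0.5
print(ml_bigram_prob(corpus, "the", "cat"))   # 1.0
```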
- Example
>>> # create a 3-gram model
>>> model = sppasNgramsModel(3)
>>> # count n-grams from data
>>> model.count(*corpusfiles)
>>> # estimates probas
>>> probas = model.probabilities(method="logml")
Methods to estimates the probabilities:
raw: return counts instead of probabilities
lograw: idem with log values
ml: return maximum likelihood (un-smoothed probabilities)
logml: idem with log values
- __init__(norder=1)[source]¶
Create a sppasNgramsModel instance.
- Parameters
norder – (int) n-gram order, between 1 and MAX_ORDER.
- append_sentences(sentences)[source]¶
Append a list of sentences in data counts.
- Parameters
sentences – (list) sentences with tokens separated by whitespace.
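As an illustration of what such counting involves (a standalone sketch under assumed conventions, not the SPPAS implementation), n-grams from order 1 up to norder can be accumulated from whitespace-tokenized sentences wrapped in boundary symbols:

```python
from collections import Counter

def count_ngrams(sentences, norder):
    """Count all n-grams of orders 1..norder, with sentence boundary symbols."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, norder + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

counts = count_ngrams(["a b c", "a b"], 2)
print(counts[("a", "b")])  # 2
```

The `<s>` and `</s>` symbols here match the defaults visible in the probabilities example below and in set_end_symbol().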
- count(*datafiles)[source]¶
Count ngrams from data files.
- Parameters
datafiles – (*args) is a set of file names, with UTF-8 encoding.
If the file contains more than one tier, only the first one is used.
- probabilities(method='lograw')[source]¶
Return a list of probabilities.
- Parameters
method – (str) method to estimate probabilities
- Returns
list of n-gram probabilities.
- Example
>>> probas = probabilities("logml")
>>> for t in probas[0]:
...     print(t)
('</s>', -1.066946789630613, None)
('<s>', -99.0, None)
(u'a', -0.3679767852945944, None)
(u'b', -0.5440680443502756, None)
(u'c', -0.9420080530223132, None)
(u'd', -1.066946789630613, None)
- set_end_symbol(symbol)[source]¶
Set the end sentence symbol.
- Parameters
symbol – (str) String to represent the end of a sentence.
- set_min_count(value)[source]¶
Fix a minimum count value, applied only to the max order.
Any observed n-gram with a count under the value is removed.
- Parameters
value – (int) Threshold for minimum count
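A minimal sketch of this pruning step (illustrative, not the SPPAS code): given a table of n-gram counts, drop every max-order n-gram whose count falls below the threshold, leaving lower orders untouched:

```python
def prune_min_count(counts, norder, min_count):
    """Remove n-grams of the max order whose count is below min_count."""
    return {ngram: c for ngram, c in counts.items()
            if len(ngram) < norder or c >= min_count}

counts = {("a",): 3, ("a", "b"): 2, ("a", "c"): 1}
print(prune_min_count(counts, 2, 2))  # ("a", "c") is removed
```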
- class annotations.Align.sppasSLM[source]¶
Bases:
object
Statistical language model representation.
- interpolate(other)[source]¶
Interpolate the model with another one.
An N-Gram language model can be constructed from a linear interpolation of several models. In this case, the overall likelihood P(w|h) of a word w occurring after the history h is computed as the arithmetic average of P(w|h) for each of the models.
The default interpolation method is linear interpolation. In addition, log-linear interpolation of models is possible.
- Parameters
other – (sppasSLM)
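The linear interpolation described above can be sketched as follows (a standalone illustration with equal weights, not the sppasSLM internals): each n-gram's probability becomes the arithmetic average of the two models' estimates, with a missing n-gram contributing zero:

```python
def interpolate(p1, p2):
    """Linearly interpolate two n-gram probability tables with equal weights."""
    ngrams = set(p1) | set(p2)
    return {g: 0.5 * p1.get(g, 0.0) + 0.5 * p2.get(g, 0.0) for g in ngrams}

model_a = {("a",): 0.6, ("b",): 0.4}
model_b = {("a",): 0.2, ("c",): 0.8}
print(interpolate(model_a, model_b))
```

With unequal weights (as in log-linear or weighted linear interpolation), 0.5 would be replaced by per-model coefficients summing to 1.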
- load_from_arpa(filename)[source]¶
Load the model from an ARPA-ASCII file.
- Parameters
filename – (str) Filename from which to read the model.
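An ARPA-ASCII model lists each order in a `\N-grams:` section of `log10prob ngram [backoff]` lines. A minimal parser sketch for the unigram section (illustrative only; the real sppasSLM reader handles all orders and more edge cases):

```python
def parse_arpa_unigrams(text):
    """Parse the 1-grams section of ARPA text into (word, logprob, backoff) tuples."""
    unigrams = []
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_section = True
            continue
        if in_section:
            if not line or line.startswith("\\"):
                break  # blank line or next section ends the 1-grams block
            fields = line.split()
            logprob = float(fields[0])
            word = fields[1]
            backoff = float(fields[2]) if len(fields) > 2 else None
            unigrams.append((word, logprob, backoff))
    return unigrams

arpa = """\\data\\
ngram 1=2

\\1-grams:
-0.30103 a -0.1
-0.69897 b

\\end\\"""
print(parse_arpa_unigrams(arpa))
```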
- class annotations.Align.sppasTrainingCorpus(datatrainer=None, lang='und')[source]¶
Bases:
object
Manager of a training corpus, to prepare a set of data.
Data preparation is the step 1 of the acoustic model training procedure.
It establishes the list of phonemes. It converts the input data into the HTK-specific data format. It codes the audio data, also called “parameterizing the raw speech waveforms into sequences of feature vectors” (i.e. convert from wav to MFCC format).
Accepted input:
annotated files: one of sppasTrsRW.extensions_in()
audio files: one of audiodata.extensions
- __init__(datatrainer=None, lang='und')[source]¶
Create a sppasTrainingCorpus instance.
- Parameters
datatrainer – (sppasDataTrainer)
lang – (str) ISO 639-3 code of the language
- add_corpus(directory)[source]¶
Add a new corpus to deal with.
Find matching pairs of files (audio / transcription) of the given directory and its folders.
- Parameters
directory – (str) The directory to find data files of a corpus.
- Returns
the number of pairs appended.
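The pairing logic can be sketched as matching files that share a base name (a simplified standalone illustration; the extensions shown are assumptions, and the real method also scans subfolders and accepts any of the supported annotation and audio extensions):

```python
import os

def match_pairs(filenames, trs_ext=".xra", audio_ext=".wav"):
    """Pair annotated and audio files that share the same base name."""
    stems = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        stems.setdefault(stem, set()).add(ext)
    return [(s + trs_ext, s + audio_ext) for s, exts in sorted(stems.items())
            if trs_ext in exts and audio_ext in exts]

files = ["sample1.wav", "sample1.xra", "sample2.wav", "notes.txt"]
print(match_pairs(files))  # [('sample1.xra', 'sample1.wav')]
```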
- add_file(trs_filename, audio_filename)[source]¶
Add a new set of files to deal with.
If such files are already in the data, they will be added again.
- Parameters
trs_filename – (str) The annotated file.
audio_filename – (str) The audio file.
- Returns
(bool)
- fix_resources(vocab_file=None, dict_file=None, mapping_file=None)[source]¶
Fix resources using default values.
Ideally, resources are fixed after the datatrainer.
- Parameters
vocab_file – (str) The lexicon, used during tokenization of the corpus.
dict_file – (str) The pronunciation dictionary, used both to generate the list of phones and to perform phonetization of the corpus.
mapping_file – (str) File that contains the mapping table for the phone set.
- get_mlf()[source]¶
Fix the mlf file by defining the directories to add.
An example line of the MLF file: “/mfc-align/” => “workdir/trs-align”
- get_scp(aligned=True, phonetized=False, transcribed=False)[source]¶
Fix the train.scp file content.
- Parameters
aligned – (bool) Add time-aligned data in the scp file
phonetized – (bool) Add phonetized data in the scp file
transcribed – (bool) Add transcribed data in the scp file
- Returns
filename or None if no data is available.