annotations.Align package

Subpackages

Submodules

annotations.Align.modelmixer module

filename

sppas.src.annotations.Align.models.acm.modelmixer.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Combine two acoustic models.

class annotations.Align.modelmixer.sppasModelMixer[source]

Bases: object

Mix two acoustic models.

Create a mixed monophones model. Typical use is to create an acoustic model of a non-native speaker.

__init__()[source]

Create a sppasModelMixer instance.

mix(outputdir, format='hmmdefs', gamma=1.0)[source]

Mix the acoustic model of the text language with the one of the speaker's mother language.

Mix the acoustic model of the language of the text with the acoustic model of the mother language of the speaker reading that text.

All new phones are added and the shared ones are combined using a Static Linear Interpolation.

Parameters
  • outputdir – (str) The directory to save the new mixed model.

  • format – (str) the format of the resulting acoustic model

  • gamma – (float) coefficient to apply to the model, between 0. and 1. A value of 1. keeps the current version of each shared HMM unchanged.

Raises

TypeError, ValueError

Returns

a tuple with the numbers of HMMs that were

(appended, interpolated, kept, changed).

read(model_text_dir, model_spk_dir)[source]

Read the acoustic models from their directories.

Parameters
  • model_text_dir – (str)

  • model_spk_dir – (str)

set_models(model_text, model_spk)[source]

Fix the acoustic models.

Parameters
  • model_text – (sppasAcModel)

  • model_spk – (sppasAcModel)
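The gamma coefficient of mix() drives a Static Linear Interpolation over the HMMs shared by the two models. A minimal sketch of the arithmetic, assuming the shared parameters are plain numeric vectors (interpolate_shared is a hypothetical helper, not part of SPPAS):

```python
# Minimal sketch of the Static Linear Interpolation used when mixing two
# acoustic models: for each HMM shared by both models, every numeric
# parameter is combined as gamma * a + (1 - gamma) * b.
# This is a simplified illustration, not the SPPAS implementation.

def interpolate_shared(params_text, params_spk, gamma=1.0):
    """Interpolate two parameter vectors of the same length.

    gamma=1.0 keeps the text-language model unchanged;
    gamma=0.0 keeps the speaker-language model unchanged.
    """
    if len(params_text) != len(params_spk):
        raise ValueError("shared HMMs must have the same dimension")
    return [gamma * a + (1.0 - gamma) * b
            for a, b in zip(params_text, params_spk)]

# Example: mixing two 3-dimensional mean vectors half-and-half.
mixed = interpolate_shared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], gamma=0.5)
```

With gamma=0.5 each shared parameter ends up exactly halfway between the two models; phones present in only one model are simply appended.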

annotations.Align.sppasalign module

filename

sppas.src.annotations.Align.sppasalign.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

SPPAS integration of Alignment automatic annotation.

class annotations.Align.sppasalign.sppasAlign(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the Alignment automatic annotation.

Author

Brigitte Bigi

Contact

develop@sppas.org

This class can produce from 1 up to 5 tiers with the following names:

  • PhonAlign

  • TokensAlign (if tokens are given in the input)

  • PhnTokAlign - option (if tokens are given in the input)

How to use sppasAlign?

>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones], [audio, tokens], output)
__init__(log=None)[source]

Create a new sppasAlign instance.

The log is used for better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]

Perform speech segmentation of data.

Parameters
  • phon_tier – (Tier) phonetization.

  • tok_tier – (Tier) tokenization, or None.

  • tok_faked_tier – (Tier) rescue tokenization, or None.

  • input_audio – (str) Audio file name.

  • workdir – (str) The working directory

Returns

tier_phn, tier_tok

fix_options(options)[source]

Fix all options.

Available options are:

  • clean

  • basic

  • aligner

Parameters

options – (sppasOption)

static fix_workingdir(inputaudio=None)[source]

Fix the working directory to store temporarily the data.

Parameters

inputaudio – (str) Audio file name

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the audio file name and the 2 tiers.

Two tiers: the tier with phonetization and the tier with text normalization.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasChannel, sppasTier, sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(model, model_L1=None, **kwargs)[source]

Fix the acoustic model directory.

Create a SpeechSegmenter and AlignerIO.

Parameters
  • model – (str) Directory of the acoustic model of the language of the text

  • model_L1 – (str) Directory of the acoustic model of the mother language of the speaker

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Phonemes, and optionally tokens, audio

  • output – (str) the output name

Returns

(sppasTranscription)

set_aligner(aligner_name)[source]

Fix the name of the aligner.

Parameters

aligner_name – (str) Case-insensitive name of the aligner.

set_basic(basic)[source]

Fix the basic option.

Parameters

basic – (bool) If basic is set to True, a basic segmentation

will be performed if the main aligner fails.

set_clean(clean)[source]

Fix the clean option.

Parameters

clean – (bool) If clean is set to True then temporary files

will be removed.

annotations.Align.tracksgmt module

filename

sppas.src.annotations.Align.tracksgmt.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Automatic segmentation of the data into track segments

class annotations.Align.tracksgmt.TrackSegmenter(model=None, aligner_name='basic')[source]

Bases: object

Automatic segmentation of a unit of speech.

Speech segmentation of a unit of speech (IPU/utterance/sentence/segment) at the phone and token levels.

This class is mainly an interface with external automatic aligners.

It is expected that all the following data were previously properly fixed:

  • audio file: 1 channel, 16000 Hz, 16 bits;

  • tokenization: UTF-8 encoding file (optional);

  • phonetization: UTF-8 encoding file;

  • acoustic model: HTK-ASCII (Julius or HVite expect this format);

and that:
  • both the acoustic model and the phonetization are based on the same phone set

  • both the tokenization and phonetization contain the same number of words

DEFAULT_ALIGNER = 'basic'
__init__(model=None, aligner_name='basic')[source]

Create a TrackSegmenter instance.

Parameters
  • model – (str) Name of the directory of the acoustic model.

  • aligner_name – (str) The identifier name of the aligner.

It is expected that the acoustic model contains at least a file named “hmmdefs” and, for the HVite command, a file named “monophones”. It can also contain:

  • tiedlist file;

  • monophones.repl file;

  • config file.

Any other file will be ignored.

aligners = <annotations.Align.aligners.aligner.sppasAligners object>
get_aligner_ext()[source]

Return the output file extension the aligner will use.

get_aligner_name()[source]

Return the name of the instantiated aligner.

get_model()[source]

Return the model directory name.

segment(audio_filename, phon_name, token_name, align_name)[source]

Call an aligner to perform speech segmentation and manage errors.

Parameters
  • audio_filename – (str) the audio file name of an IPU

  • phon_name – (str) file name with the phonetization

  • token_name – (str) file name with the tokenization

  • align_name – (str) file name to save the result WITHOUT ext.

Returns

A message from the aligner in case of any problem, or

an empty string on success.

set_aligner(aligner_name)[source]

Fix the name of the aligner, one of aligners.ALIGNERS_TYPES.

Parameters

aligner_name – (str) Case-insensitive name of an aligner system.

set_aligner_ext(ext)[source]

Fix the output file extension the aligner will use.

set_model(model)[source]

Fix an acoustic model to perform time-alignment.

Parameters

model – (str) Name of the directory of the acoustic model.

annotations.Align.tracksio module

filename

sppas.src.annotations.Align.tracksio.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Automatic segmentation of the data into tracks

class annotations.Align.tracksio.ListOfTracks[source]

Bases: object

Manage the file with a list of tracks (units, ipus…).

DEFAULT_FILENAME = 'tracks.list'
static read(dir_name)[source]

Return a list of (start-time end-time).

Parameters

dir_name – Name of the directory with the file to read.

Returns

list of units

static write(dir_name, units)[source]

Write a list file (start-time end-time).

Parameters
  • dir_name – Name of the directory in which to write the file.

  • units – List of units to write.
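The read()/write() pair exchanges a list of (start-time, end-time) units through a list file. A minimal round-trip sketch, assuming one "start end" pair per line; the actual SPPAS file layout may differ, and write_units/read_units are hypothetical helpers used only to illustrate the API:

```python
import os
import tempfile

# Hypothetical layout for a tracks list file: one "start-time end-time"
# pair per line, which is the information read() and write() exchange.
FILENAME = "tracks.list"

def write_units(dir_name, units):
    """Write a list of (start, end) time pairs into dir_name."""
    with open(os.path.join(dir_name, FILENAME), "w") as fp:
        for start, end in units:
            fp.write("{:.3f} {:.3f}\n".format(start, end))

def read_units(dir_name):
    """Read the (start, end) time pairs back from dir_name."""
    units = []
    with open(os.path.join(dir_name, FILENAME)) as fp:
        for line in fp:
            start, end = line.split()
            units.append((float(start), float(end)))
    return units

tmp = tempfile.mkdtemp()
write_units(tmp, [(0.0, 1.25), (1.5, 3.0)])
units = read_units(tmp)
```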

class annotations.Align.tracksio.TrackNamesGenerator[source]

Bases: object

Manage names of the files for a given track number.

static align_filename(track_dir, track_number, ext=None)[source]

Return the name of the time-aligned file, without extension.

static audio_filename(track_dir, track_number)[source]

Return the name of the audio file.

static phones_filename(track_dir, track_number)[source]

Return the name of the file with Phonetization.

static tokens_filename(track_dir, track_number)[source]

Return the name of the file with Tokenization.

class annotations.Align.tracksio.TracksReader[source]

Bases: object

Read time-aligned tracks.

Manage tracks for the time-aligned phonemes and tokens.

RADIUS = 0.005
static read_aligned_tracks(dir_name)[source]

Read a set of alignment files and set as tiers.

Parameters

dir_name – (str) input directory containing a set of units

Returns

PhonAlign, TokensAlign

class annotations.Align.tracksio.TracksReaderWriter(mapping)[source]

Bases: object

Manager for tracks from/to tiers.

DELIMITERS = (' ', '|', '-')
__init__(mapping)[source]

Create a new TracksReaderWriter instance.

Parameters

mapping – (Mapping) a mapping table to convert the phone set

static get_filenames(track_dir, track_number)[source]

Return file names corresponding to a given track.

Parameters
  • track_dir – (str)

  • track_number – (int)

Returns

(audio, phn, tok, align) file names

get_units(dir_name)[source]

Return the time units of all tracks.

Parameters

dir_name – (str) Input directory to get files.

read_aligned_tracks(dir_name)[source]

Read time-aligned tracks in a directory.

Parameters

dir_name – (str) Input directory to get files.

Returns

(sppasTier, sppasTier, sppasTier)

split_into_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]

Write tracks from the given data.

Parameters
  • input_audio – (str) Audio file name, or None if not needed (basic alignment).

  • phon_tier – (sppasTier) The phonetization tier.

  • tok_tier – (sppasTier) The tokens tier, or None.

  • tok_rescue_tier – (sppasTier) The tokens rescue tier, or None.

  • dir_align – (str) Output directory to store files.

Returns

PhonAlign, TokensAlign

class annotations.Align.tracksio.TracksWriter[source]

Bases: object

Write non-aligned track files.

Manage tracks for the audio, the phonetization and the tokenization.

static write_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]

Main method to write tracks from the given data.

Parameters
  • input_audio – (str) File name of the audio file.

  • phon_tier – (Tier) Tier with phonetization to split.

  • tok_tier – (Tier) Tier with tokenization to split.

  • tok_rescue_tier – (Tier) Tier with tokens to split.

  • dir_align – (str) Directory to put units.

Returns

List of tracks with (start-time end-time)

Module contents

filename

sppas.src.annotations.Align.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Alignment automatic annotation of SPPAS.

Alignment is the process of aligning speech with its corresponding transcription at the phone level. This phonetic segmentation problem consists in time-matching a given speech unit with a phonetic representation of that unit. The goal is to generate an alignment between the speech signal and its phonetic representation.

By default, alignment is based on the Julius Speech Recognition Engine (SRE) for 3 reasons:

  1. it is easy to install, which is important for users;

  2. it is easy to use, and was thus easy to integrate into SPPAS;

  3. its performance corresponds to the state of the art of HMM-based systems.

The HTK command “HVite” can also be used to perform Alignment.

A finite state grammar that describes the sentence patterns to be recognized and an acoustic model are needed. A grammar essentially defines constraints on what the SRE can expect as input: it is the list of words the SRE listens to. Each word has an associated list of phoneme sequences, extracted from the dictionary. Given a speech input, Julius searches for the most likely word sequence under the constraints of the given grammar.
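The word-to-phonemes constraint described above can be pictured as a small mapping; the words and phoneme symbols below are invented examples, not SPPAS data:

```python
# Illustrative sketch of the kind of constraint a grammar gives the SRE:
# each word the engine listens for maps to its candidate phoneme
# sequences, taken from the pronunciation dictionary.
grammar = {
    "hello": [["h", "@", "l", "oU"], ["h", "E", "l", "oU"]],
    "world": [["w", "3:r", "l", "d"]],
}

def known_pronunciations(word):
    """Return the phoneme sequences the SRE may match for a word."""
    return grammar.get(word, [])
```

During recognition, the engine only considers word sequences whose pronunciations appear in such a mapping, which is what makes forced alignment tractable.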

Speech Alignment also requires an Acoustic Model in order to align speech. An acoustic model is a file that contains statistical representations of each of the distinct sounds of one language. Each phoneme is represented by one of these statistical representations. SPPAS is based on the use of HTK-ASCII acoustic models.

class annotations.Align.HMMInterpolation[source]

Bases: object

HMM interpolation.

static linear_interpolate_matrix(matrices, gammas)[source]

Interpolate linearly matrix with gamma coefficients.

Parameters
  • matrices – List of matrix

  • gammas – List of coefficients (must sum to 1.)

static linear_interpolate_mixtures(mixtures, gammas)[source]

Linear interpolation of a set of mixtures, of one stream only.

Parameters
  • mixtures – (OrderedDict)

  • gammas – List of coefficients (must sum to 1.)

Returns

mixture (OrderedDict)

static linear_interpolate_states(states, gammas)[source]

Linear interpolation of a set of states, of one index only.

Parameters
  • states – (OrderedDict)

  • gammas – List of coefficients (must sum to 1.)

Returns

state (OrderedDict)

static linear_interpolate_streams(streams, gammas)[source]

Linear interpolation of a set of streams, of one state only.

Parameters
  • streams – (OrderedDict)

  • gammas – List of coefficients (must sum to 1.)

Returns

stream (OrderedDict)

static linear_interpolate_transitions(transitions, gammas)[source]

Linear interpolation of a set of transitions, of an hmm.

Parameters
  • transitions – (OrderedDict): with key=’dim’ and key=’matrix’

  • gammas – List of coefficients (must sum to 1.)

Returns

transition (OrderedDict)

static linear_interpolate_values(values, gammas)[source]

Interpolate linearly values with gamma coefficients.

Parameters
  • values – List of values

  • gammas – List of coefficients (must sum to 1.)

static linear_interpolate_vectors(vectors, gammas)[source]

Interpolate linearly vectors with gamma coefficients.

Parameters
  • vectors – List of vectors

  • gammas – List of coefficients (must sum to 1.)
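The value and vector interpolation methods above all reduce to the same weighted sum. A minimal sketch of that arithmetic, assuming plain Python numbers and lists (these helpers illustrate the computation only, they are not the SPPAS code):

```python
# Minimal sketch of linear interpolation with gamma coefficients, as
# performed by linear_interpolate_values / linear_interpolate_vectors.

def interpolate_values(values, gammas):
    """Combine scalar values: the sum of gamma[i] * values[i]."""
    return sum(g * v for g, v in zip(gammas, values))

def interpolate_vectors(vectors, gammas):
    """Combine vectors element-wise with the same coefficients."""
    return [interpolate_values(column, gammas) for column in zip(*vectors)]

# Two models weighted 0.25 / 0.75 (coefficients sum to 1).
v = interpolate_vectors([[0.0, 4.0], [8.0, 0.0]], [0.25, 0.75])
```

The same pattern applies recursively to streams, states, and transition matrices: each level interpolates its numeric content with the shared gamma coefficients.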

static linear_states(states, coefficients)[source]

Linear interpolation of a set of states.

Parameters
  • states – (OrderedDict)

  • coefficients – List of coefficients (must sum to 1.)

Returns

state (OrderedDict)

static linear_transitions(transitions, coefficients)[source]

Linear interpolation of a set of transitions.

Parameters
  • transitions – (OrderedDict): with key=’dim’ and key=’matrix’

  • coefficients – List of coefficients (must sum to 1.)

Returns

transition (OrderedDict)

class annotations.Align.TrackSegmenter(model=None, aligner_name='basic')[source]

Bases: object

Automatic segmentation of a unit of speech.

Speech segmentation of a unit of speech (IPU/utterance/sentence/segment) at the phone and token levels.

This class is mainly an interface with external automatic aligners.

It is expected that all the following data were previously properly fixed:

  • audio file: 1 channel, 16000 Hz, 16 bits;

  • tokenization: UTF-8 encoding file (optional);

  • phonetization: UTF-8 encoding file;

  • acoustic model: HTK-ASCII (Julius or HVite expect this format);

and that:
  • both the acoustic model and the phonetization are based on the same phone set

  • both the tokenization and phonetization contain the same number of words

DEFAULT_ALIGNER = 'basic'
__init__(model=None, aligner_name='basic')[source]

Create a TrackSegmenter instance.

Parameters
  • model – (str) Name of the directory of the acoustic model.

  • aligner_name – (str) The identifier name of the aligner.

It is expected that the acoustic model contains at least a file named “hmmdefs” and, for the HVite command, a file named “monophones”. It can also contain:

  • tiedlist file;

  • monophones.repl file;

  • config file.

Any other file will be ignored.

aligners = <annotations.Align.aligners.aligner.sppasAligners object>
get_aligner_ext()[source]

Return the output file extension the aligner will use.

get_aligner_name()[source]

Return the name of the instantiated aligner.

get_model()[source]

Return the model directory name.

segment(audio_filename, phon_name, token_name, align_name)[source]

Call an aligner to perform speech segmentation and manage errors.

Parameters
  • audio_filename – (str) the audio file name of an IPU

  • phon_name – (str) file name with the phonetization

  • token_name – (str) file name with the tokenization

  • align_name – (str) file name to save the result WITHOUT ext.

Returns

A message from the aligner in case of any problem, or

an empty string on success.

set_aligner(aligner_name)[source]

Fix the name of the aligner, one of aligners.ALIGNERS_TYPES.

Parameters

aligner_name – (str) Case-insensitive name of an aligner system.

set_aligner_ext(ext)[source]

Fix the output file extension the aligner will use.

set_model(model)[source]

Fix an acoustic model to perform time-alignment.

Parameters

model – (str) Name of the directory of the acoustic model.

class annotations.Align.TracksReaderWriter(mapping)[source]

Bases: object

Manager for tracks from/to tiers.

DELIMITERS = (' ', '|', '-')
__init__(mapping)[source]

Create a new TracksReaderWriter instance.

Parameters

mapping – (Mapping) a mapping table to convert the phone set

static get_filenames(track_dir, track_number)[source]

Return file names corresponding to a given track.

Parameters
  • track_dir – (str)

  • track_number – (int)

Returns

(audio, phn, tok, align) file names

get_units(dir_name)[source]

Return the time units of all tracks.

Parameters

dir_name – (str) Input directory to get files.

read_aligned_tracks(dir_name)[source]

Read time-aligned tracks in a directory.

Parameters

dir_name – (str) Input directory to get files.

Returns

(sppasTier, sppasTier, sppasTier)

split_into_tracks(input_audio, phon_tier, tok_tier, tok_rescue_tier, dir_align)[source]

Write tracks from the given data.

Parameters
  • input_audio – (str) Audio file name, or None if not needed (basic alignment).

  • phon_tier – (sppasTier) The phonetization tier.

  • tok_tier – (sppasTier) The tokens tier, or None.

  • tok_rescue_tier – (sppasTier) The tokens rescue tier, or None.

  • dir_align – (str) Output directory to store files.

Returns

PhonAlign, TokensAlign

class annotations.Align.sppasAlign(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the Alignment automatic annotation.

Author

Brigitte Bigi

Contact

develop@sppas.org

This class can produce from 1 up to 5 tiers with the following names:

  • PhonAlign

  • TokensAlign (if tokens are given in the input)

  • PhnTokAlign - option (if tokens are given in the input)

How to use sppasAlign?

>>> a = sppasAlign()
>>> a.set_aligner('julius')
>>> a.load_resources(model_dirname)
>>> a.run([phones], [audio, tokens], output)
__init__(log=None)[source]

Create a new sppasAlign instance.

The log is used for better communication of the annotation process and its results. If None, logs are redirected to the default logging system.

Parameters

log – (sppasLog) Human-readable logs.

convert(phon_tier, tok_tier, tok_faked_tier, input_audio, workdir)[source]

Perform speech segmentation of data.

Parameters
  • phon_tier – (Tier) phonetization.

  • tok_tier – (Tier) tokenization, or None.

  • tok_faked_tier – (Tier) rescue tokenization, or None.

  • input_audio – (str) Audio file name.

  • workdir – (str) The working directory

Returns

tier_phn, tier_tok

fix_options(options)[source]

Fix all options.

Available options are:

  • clean

  • basic

  • aligner

Parameters

options – (sppasOption)

static fix_workingdir(inputaudio=None)[source]

Fix the working directory to store temporarily the data.

Parameters

inputaudio – (str) Audio file name

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_input_patterns()[source]

Pattern this annotation expects for its input filename.

get_inputs(input_files)[source]

Return the audio file name and the 2 tiers.

Two tiers: the tier with phonetization and the tier with text normalization.

Parameters

input_files – (list)

Raise

NoTierInputError

Returns

(sppasChannel, sppasTier, sppasTier)

get_output_pattern()[source]

Pattern this annotation uses in an output filename.

load_resources(model, model_L1=None, **kwargs)[source]

Fix the acoustic model directory.

Create a SpeechSegmenter and AlignerIO.

Parameters
  • model – (str) Directory of the acoustic model of the language of the text

  • model_L1 – (str) Directory of the acoustic model of the mother language of the speaker

run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Phonemes, and optionally tokens, audio

  • output – (str) the output name

Returns

(sppasTranscription)

set_aligner(aligner_name)[source]

Fix the name of the aligner.

Parameters

aligner_name – (str) Case-insensitive name of the aligner.

set_basic(basic)[source]

Fix the basic option.

Parameters

basic – (bool) If basic is set to True, a basic segmentation

will be performed if the main aligner fails.

set_clean(clean)[source]

Fix the clean option.

Parameters

clean – (bool) If clean is set to True then temporary files

will be removed.

class annotations.Align.sppasAligners[source]

Bases: object

Manager of the aligners implemented in the package.

__init__()[source]

Create a sppasAligners to manage the aligners supported by SPPAS.

check(aligner_name)[source]

Check whether the aligner name is known or not.

Parameters

aligner_name – (str) Name of the aligner.

Returns

the formatted aligner name

classes(aligner_name=None)[source]

Return the list of aligner classes.

Parameters

aligner_name – (str) A specific aligner

Returns

BasicAligner, or a list if no aligner name is given

static default_aligner_name()[source]

Return the name of the default aligner.

default_extension(aligner_name=None)[source]

Return the default extension of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

str, or a dict of str if no aligner name is given

extensions(aligner_name=None)[source]

Return the list of supported extensions of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

list of str, or a dict of list if no aligner name is given

get()[source]

Return a dictionary of aligners (key=name, value=instance).

instantiate(model_dir=None, aligner_name='basic')[source]

Instantiate the appropriate aligner system from its name.

If an error occurs, the basic aligner is returned.

Parameters
  • model_dir – (str) Directory of the acoustic model

  • aligner_name – (str) Name of the aligner

Returns

an Aligner instance.

names()[source]

Return the list of aligner names.
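sppasAligners is essentially a registry mapping normalized aligner names to classes, with check() validating a requested name and instantiate() falling back to the basic aligner on error. A minimal sketch of that pattern; the class names and internals here are illustrative, not the SPPAS implementation:

```python
# Sketch of the registry pattern sppasAligners implements: a mapping from
# normalized names to aligner classes, case-insensitive lookup, and a
# safe fallback to the basic aligner.

class BasicAligner:
    """Stand-in for the default aligner class."""
    pass

class AlignerRegistry:
    def __init__(self):
        self._aligners = {"basic": BasicAligner}

    def names(self):
        """Return the list of known aligner names."""
        return list(self._aligners)

    def check(self, aligner_name):
        """Return the formatted name, or raise if unknown."""
        name = aligner_name.strip().lower()
        if name not in self._aligners:
            raise KeyError("Unknown aligner: " + aligner_name)
        return name

    def instantiate(self, aligner_name="basic"):
        """Create the requested aligner; fall back to basic on error."""
        try:
            return self._aligners[self.check(aligner_name)]()
        except KeyError:
            return BasicAligner()

registry = AlignerRegistry()
aligner = registry.instantiate("BASIC")  # case-insensitive lookup
```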

class annotations.Align.sppasArpaIO[source]

Bases: object

ARPA statistical language models reader/writer.

This class is able to load and save statistical language models from ARPA-ASCII files.

__init__()[source]

Create a sppasArpaIO instance without model.

load(filename)[source]

Load a model from an ARPA file.

Parameters

filename – (str) Name of the file of the model.

save(filename)[source]

Save the model into a file, in ARPA-ASCII format.

The ARPA format:

\data\
ngram 1=nb1
ngram 2=nb2
…
ngram N=nbN

\1-grams:
p(a_z) a_z bow(a_z)
…

\2-grams:
p(a_z) a_z bow(a_z)
…

\N-grams:
p(a_z) a_z
…

\end\

Parameters

filename – (str) File where to save the model.

set(slm)[source]

Set the model of the sppasSLM.

Parameters

slm – (list) List of tuples for 1-gram, 2-grams, …

class annotations.Align.sppasDataTrainer[source]

Bases: object

Acoustic model trainer for HTK-ASCII models.

This class is a manager for the data created at each step of the acoustic training model procedure, following the HTK Handbook. It includes:

  • HTK scripts

  • phoneme prototypes

  • log files

  • features

__init__()[source]

Create a sppasDataTrainer instance.

Initialize all members to None or empty lists.

check()[source]

Check if all members are initialized with appropriate values.

Returns

None if success.

Raises

IOError

create(workdir=None, scriptsdir='scripts', featsdir='features', logdir='log', protodir=None, protofilename='proto.hmm')[source]

Create all folders and their content (if possible) with their default names.

Parameters
  • workdir – (str) Name of the working directory

  • scriptsdir – (str) The folder for HTK scripts

  • featsdir – (str) The folder for features

  • logdir – (str) Directory to store log files

  • protodir – (str) Name of the prototypes directory

  • protofilename – (str) Name of the file for the HMM prototype.

Raises

IOError

delete()[source]

Delete all folders and their content, then reset members.

fix_proto(proto_dir='protos', proto_filename='proto.hmm')[source]

(Re-)Create the proto.

If relevant, create a protos directory and add the proto file. Create the macro if any.

Parameters
  • proto_dir – (str) Directory in which prototypes will be stored

  • proto_filename – (str) File name of the default prototype

fix_storage_dirs(basename)[source]

Fix the folders to store annotated speech and audio files.

The given basename can be something like: align, phon, trans, …

Parameters

basename – (str) a name to identify storage folders

Raises

IOError

fix_working_dir(workdir=None, scriptsdir='scripts', featsdir='features', logdir='log')[source]

Set the working directory and its folders.

Create all of them if necessary.

Parameters
  • workdir – (str) The working main directory

  • scriptsdir – (str) The folder for HTK scripts

  • featsdir – (str) The folder for features

  • logdir – (str) The folder to write output logs

get_storemfc()[source]

Get the current folder name to store MFCC data files.

Returns

folder name or None.

get_storetrs()[source]

Get the current folder name to store transcribed data files.

Returns

folder name or None.

get_storewav()[source]

Get the current folder name to store audio data files.

Returns

folder name or None.

reset()[source]

Fix all members to their initial value.

class annotations.Align.sppasHMM(name='und')[source]

Bases: object

HMM representation for one phone.

Hidden Markov Models (HMMs) provide a simple and effective framework for modeling time-varying spectral vector sequences. As a consequence, most speech technology systems are based on HMMs. Each base phone is represented by a continuous-density HMM, with transition probability parameters and output observation distributions. One of the most commonly used extensions to standard HMMs is to model the state-output distribution as a mixture model; a mixture of Gaussians is a highly flexible distribution able to model, for example, asymmetric and multi-modal data.

An HMM-definition is made of:
  • state_count: int

  • states: list of OrderedDict with “index” and “state” as keys.

  • transition: OrderedDict with “dim” and “matrix” as keys.

  • options

  • regression_tree

  • duration

DEFAULT_NAME = 'und'
__init__(name='und')[source]

Create a sppasHMM instance.

The model includes a default name and an empty definition.

Parameters

name – (str) Name of the HMM (usually the phoneme in SAMPA)

create(states, transition, name=None)[source]

Create the hmm and set it.

Parameters
  • states – (OrderedDict)

  • transition – (OrderedDict)

  • name – (string) The name of the HMM.

If name is set to None, the default name is assigned.

static create_default()[source]

Create a default ordered dictionary, used for states.

Returns

collections.OrderedDict()

static create_gmm(means, variances, gconsts=None, weights=None)[source]

Create and return a GMM.

Returns

collections.OrderedDict()

create_proto(proto_size, nb_mix=1)[source]

Create the 5-states HMM proto and set it.

Parameters
  • proto_size – (int) Number of mean and variance values. It is commonly either 25 or 39, depending on the MFCC parameters.

  • nb_mix – (int) Number of mixtures (i.e. the number of times means and variances occur)

create_sp()[source]

Create the 3-states HMM sp and set it.

The sp model is based on a 3-state HMM with the string “silst” as state 2, and a 3x3 transition matrix as follows:

0.0 1.0 0.0
0.0 0.9 0.1
0.0 0.0 0.0

static create_square_matrix(matrix)[source]

Create a default matrix.

Returns

collections.OrderedDict()

static create_transition(state_stay_probabilities=(0.6, 0.6, 0.7))[source]

Create and return a transition matrix.

Parameters

state_stay_probabilities – (list) Center transition probabilities

Returns

collections.OrderedDict()
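The default stay probabilities (0.6, 0.6, 0.7) describe a left-to-right HMM where each emitting state either stays put or advances. A sketch of how such a matrix can be built, assuming one entry state, one emitting state per coefficient, and one exit state (make_transition is a hypothetical helper using a plain list of lists instead of the OrderedDict SPPAS returns):

```python
# Sketch of the transition matrix create_transition() describes: a
# left-to-right HMM where each emitting state stays with the given
# probability and moves to the next state with the remainder.

def make_transition(stay=(0.6, 0.6, 0.7)):
    n = len(stay) + 2                  # entry + emitting + exit states
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0                 # entry state jumps to state 1
    for i, p in enumerate(stay, start=1):
        matrix[i][i] = p               # stay in the same state
        matrix[i][i + 1] = 1.0 - p     # or advance to the next one
    return matrix                      # exit-state row stays all zeros

m = make_transition()
```

With a single coefficient, stay=(0.9,), the same construction yields the 3x3 “sp” matrix shown for create_sp() above.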

static create_vector(vector)[source]

Create a default vector.

Returns

collections.OrderedDict()

property definition

Return the definition (OrderedDict) of the model.

get_definition()[source]

Return the definition (OrderedDict) of the model.

get_name()[source]

Return the name (str) of the model.

get_state(index)[source]

Return the state of a given index or None if index is not found.

Parameters

index – (int) State index (commonly between 1 and 5)

Returns

collections.OrderedDict or None

get_vecsize()[source]

Return the number of means and variance of each state.

If state is pointing to a macro, 0 is returned.

property name

Return the name (str) of the model.

set(name, definition)[source]

Set the model.

Parameters
  • name – (str) Name of the HMM

  • definition – (OrderedDict) Definition of the HMM (states and transitions)

set_default_definition()[source]

Set an empty definition.

set_definition(definition)[source]

Set the definition of the model.

Parameters

definition – (OrderedDict) Definition of the HMM (states and transitions)

Raises

ModelsDataTypeError

set_name(name)[source]

Set the name of the model.

Parameters

name – (str) Name of the HMM.

Raises

ModelsDataTypeError

static_linear_interpolation(hmm, gamma)[source]

Static Linear Interpolation.

This is perhaps the most straightforward manner of combining models, and an efficient way to merge the GMMs of the component models.

The gamma coefficient is applied to self and (1 - gamma) to the other HMM.

Parameters
  • hmm – (HMM) the hmm to be interpolated with.

  • gamma – (float) coefficient to be applied to self.

Returns

(bool) Status of the interpolation.

class annotations.Align.sppasHTKModelTrainer(corpus=None)[source]

Bases: object

Acoustic model trainer.

This class allows to train an acoustic model from audio data and their transcriptions (either phonetic or orthographic or both).

Acoustic models are trained with HTK toolbox using a training corpus of speech, previously segmented in utterances and transcribed. The trained models are Hidden Markov models (HMMs). Typically, the HMM states are modeled by Gaussian mixture densities whose parameters are estimated using an expectation maximization procedure. The outcome of this training procedure is dependent on the availability of accurately annotated data and on good initialization.

Acoustic models are trained from 16-bit, 16000 Hz WAV files. The Mel-frequency cepstrum coefficients (MFCC), along with their first and second derivatives, are extracted from the speech.

Step 1 is the data preparation.

Step 2 is the monophones initialization.

Step 3 is the monophones generation. This first model is re-estimated on the MFCC files to create a new model, using “HERest”. Then, the “sp” model is fixed from the “sil” model by extracting only 3 states of the initial 5-state model. Finally, this monophone model is re-estimated using the MFCC files and the training data.

Step 4 creates tied-state triphones from monophones and from some language specificity defined by means of a configuration file.

__init__(corpus=None)[source]

Create a sppasHTKModelTrainer instance.

Parameters

corpus – (sppasTrainingCorpus)

align_trs(tokenizer, phonetizer, aligner)[source]

Alignment of the transcribed speech using the current model.

create_annotators()[source]

Return a sppasTextNorm, a sppasPhon and a sppasAlign.

get_current_macro()[source]

Return the macros of the current epoch, or None.

get_current_model()[source]

Return the model of the current epoch, or None.

init_epoch_dir()[source]

Create a new epoch folder and fill it with the macros.

make_triphones()[source]

Extract triphones from monophones data (mlf).

A new mlf file is created with triphones instead of monophones, and a file with the list of triphones is created. The latter is sorted in order of arrival (this is very important).

Command: HLEd -T 2 -n output/triphones -l '*' -i output/wintri.mlf scripts/mktri.led corpus.mlf
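The expansion HLEd performs can be sketched in a few lines; this is an illustrative re-implementation (not the HTK code) of word-internal l-p+r triphone expansion, keeping the order-of-arrival constraint mentioned above:

```python
def make_triphones(phones):
    """Turn a monophone sequence into HTK-style triphones (l-p+r).

    Boundary phones keep only the available context, as in
    word-internal expansion.
    """
    tri = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        tri.append(left + p + right)
    return tri


def triphone_list(utterances):
    """List of distinct triphones, sorted in order of arrival."""
    seen = []
    for utt in utterances:
        for t in make_triphones(utt):
            if t not in seen:
                seen.append(t)
    return seen
```

For instance, the sequence ["a", "b", "c"] expands to ["a+b", "a-b+c", "b-c"], and the triphone list preserves the order in which each new triphone is first encountered across utterances.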

small_pause()[source]

Create and save the “sp” model for short pauses.

  • create a “silst” macro, using state 3 of the “sil” HMM,

  • adapt state 3 of the “sil” HMM definition, to use “silst”,

  • create a “sp” HMM,

  • save the “sp” HMM into the directory of monophones.

train_step(scpfile, rounds=3, dopruning=True)[source]

Perform some rounds of HERest estimation.

It expects the input HMM definition to have been initialised and it uses the embedded Baum-Welch re-estimation. This involves finding the probability of being in each state at each time frame using the Forward-Backward algorithm.

Parameters
  • scpfile – (str) Description file with the list of data files

  • rounds – (int) Number of times HERest is called.

  • dopruning – (bool) Do the pruning

Returns

bool

training_recipe(outdir=None, delete=False, header_tree=None)[source]

Create an acoustic model and return it.

A corpus (sppasTrainingCorpus) must be previously defined.

Parameters
  • outdir – (str) Directory to save the final model and related files

  • delete – (bool) Delete the working directory.

  • header_tree – (str) Name of the script file to train a triphone (commonly header-tree.hed).

Returns

sppasAcModel

training_step1()[source]

Step 1 of the training procedure.

Data preparation.

training_step2()[source]

Step 2 of the training procedure.

Monophones initialization.

training_step3()[source]

Step 3 of the training procedure.

Monophones training.

  1. Train phonemes from manually time-aligned data.

  2. Create sp model.

  3. Train from phonetized data.

  4. Align transcribed data.

  5. Train from all data.

training_step4(header_tree)[source]

Step 4 of the training procedure. Not implemented yet.

Triphones training.

Parameters

header_tree – (str) Name of the script file to train a triphone (commonly header-tree.hed).

class annotations.Align.sppasNgramCounter(n=1, wordslist=None)[source]

Bases: object

N-gram representation.

__init__(n=1, wordslist=None)[source]

Create a sppasNgramCounter instance.

Parameters
  • n – (int) n-gram order, between 1 and MAX_ORDER.

  • wordslist – (sppasVocabulary) a list of accepted tokens.

append_sentence(sentence)[source]

Append a sentence in a dictionary of data counts.

Parameters

sentence – (str) A sentence with tokens separated by whitespace.
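Counting amounts to sliding a window of size n over the token sequence, padded with sentence boundary symbols. A standalone sketch of this step (the function name and the "&lt;s&gt;"/"&lt;/s&gt;" symbols are illustrative assumptions, not the SPPAS internals):

```python
from collections import defaultdict


def count_ngrams(sentence, n=2, start="<s>", end="</s>"):
    """Count the n-grams of one sentence, with boundary symbols.

    Tokens are separated by whitespace, as in append_sentence().
    """
    tokens = [start] * (n - 1) + sentence.split() + [end]
    counts = defaultdict(int)
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
    return dict(counts)
```

For example, count_ngrams("a b a b", n=2) counts the bigram ("a", "b") twice, plus one occurrence each of ("&lt;s&gt;", "a"), ("b", "a") and ("b", "&lt;/s&gt;").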

count(*datafiles)[source]

Count ngrams of order n from data files.

Parameters

datafiles – (*args) is a set of file names, with UTF-8 encoding.

If the file contains more than one tier, only the first one is used.

get_count(sequence)[source]

Get the count of a specific sequence.

Parameters

sequence – (str) tokens separated by whitespace.

Returns

(int)

get_ncount()[source]

Get the number of observed n-grams.

Start symbols are excluded when counting unigrams.

Returns

(int)

get_ngram_count(ngram)[source]

Get the count of a specific ngram.

Parameters

ngram – (tuple of str) Tuple of tokens.

Returns

(int)

get_ngrams()[source]

Get the list of alphabetically-ordered n-grams.

Returns

list of tuples

shave(value)[source]

Remove data if count is lower than the given value.

Parameters

value – (int) Threshold value

class annotations.Align.sppasNgramsModel(norder=1)[source]

Bases: object

Statistical language model trainer.

A model is made of:

  • n-gram counts: a list of sppasNgramCounter instances.

  • n-gram probabilities.

How to estimate n-gram probabilities?

A slight bit of theory… The following is copied (cribbed!) from the following SRILM web page: http://www.speech.sri.com/projects/srilm/manpages/ngram-discount.7.html

a_z – An N-gram where a is the first word, z is the last word, and “_” represents 0 or more words in between.

c(a_z) – The count of N-gram a_z in the training data.

p(a_z) – The estimated conditional probability of the nth word z given the first n-1 words (a_) of an N-gram.

a_ – The n-1 word prefix of the N-gram a_z.

_z – The n-1 word suffix of the N-gram a_z.

N-gram models try to estimate the probability of a word z in the context of the previous n-1 words (a_). One way to estimate p(a_z) is to look at the number of times word z has followed the previous n-1 words (a_):

(1) p(a_z) = c(a_z) / c(a_)

This is known as the maximum likelihood (ML) estimate. Notice that it assigns zero probability to N-grams that have not been observed in the training data.

To avoid the zero probabilities, we take some probability mass from the observed N-grams and distribute it to unobserved N-grams. Such redistribution is known as smoothing or discounting. Most existing smoothing algorithms can be described by the following equation:

(2) p(a_z) = (c(a_z) > 0) ? f(a_z) : bow(a_) p(_z)

If the N-gram a_z has been observed in the training data, we use the distribution f(a_z). Typically f(a_z) is discounted to be less than the ML estimate so we have some leftover probability for the z words unseen in the context (a_). Different algorithms mainly differ on how they discount the ML estimate to get f(a_z).
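The maximum-likelihood estimate above follows directly from the counts: the context count c(a_) is the sum of c(a_z) over all words z following that context. A standalone sketch (illustrative names, not the SPPAS code):

```python
def ml_probabilities(counts):
    """Maximum-likelihood estimates p(a_z) = c(a_z) / c(a_).

    counts maps n-gram tuples to their counts; the context count
    c(a_) is obtained by summing the counts over the last word.
    """
    context = {}
    for ngram, c in counts.items():
        context[ngram[:-1]] = context.get(ngram[:-1], 0) + c
    return {ngram: c / context[ngram[:-1]]
            for ngram, c in counts.items()}
```

With counts {("a","b"): 2, ("a","c"): 2}, both bigrams get p = 0.5; as noted above, any unseen n-gram simply gets no entry, i.e. zero probability, which is what smoothing is meant to repair.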

Example
>>> # create a 3-gram model
>>> model = sppasNgramsModel(3)
>>> # count n-grams from data
>>> model.count(*corpusfiles)
>>> # estimates probas
>>> probas = model.probabilities(method="logml")

Methods to estimate the probabilities:

  • raw: return counts instead of probabilities

  • lograw: idem with log values

  • ml: return maximum likelihood (un-smoothed probabilities)

  • logml: idem with log values

__init__(norder=1)[source]

Create a sppasNgramsModel instance.

Parameters

norder – (int) n-gram order, between 1 and MAX_ORDER.

append_sentences(sentences)[source]

Append a list of sentences in data counts.

Parameters

sentences – (list) sentences with tokens separated by whitespace.

count(*datafiles)[source]

Count ngrams from data files.

Parameters

datafiles – (*args) is a set of file names, with UTF-8 encoding.

If the file contains more than one tier, only the first one is used.

get_order()[source]

Return the n-gram order value.

Returns

The n-gram order (int) value.

probabilities(method='lograw')[source]

Return a list of probabilities.

Parameters

method – (str) method to estimate probabilities

Returns

list of n-gram probabilities.

Example
>>> probas = probabilities("logml")
>>> for t in probas[0]:
>>>      print(t)
('</s>', -1.066946789630613, None)
('<s>', -99.0, None)
(u'a', -0.3679767852945944, None)
(u'b', -0.5440680443502756, None)
(u'c', -0.9420080530223132, None)
(u'd', -1.066946789630613, None)

set_end_symbol(symbol)[source]

Set the end sentence symbol.

Parameters

symbol – (str) String to represent the end of a sentence.

set_min_count(value)[source]

Fix a minimum count value, applied only to the max order.

Any observed n-gram with a count under the value is removed.

Parameters

value – (int) Threshold for minimum count

set_start_symbol(symbol)[source]

Set the start sentence symbol.

Parameters

symbol – (str) String to represent the beginning of a sentence.

set_vocab(filename)[source]

Fix a list of accepted tokens; others are marked as unknown.

Parameters

filename – (str) List of tokens.

class annotations.Align.sppasSLM[source]

Bases: object

Statistical language model representation.

__init__()[source]

Create a sppasSLM instance without model.

evaluate(filename)[source]

Evaluate a model on a file (perplexity).

interpolate(other)[source]

Interpolate the model with another one.

An N-Gram language model can be constructed from a linear interpolation of several models. In this case, the overall likelihood P(w|h) of a word w occurring after the history h is computed as the arithmetic average of P(w|h) for each of the models.

The default interpolation method is linear interpolation. In addition, log-linear interpolation of models is possible.

Parameters

other – (sppasSLM)
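For a single history h, the linear combination described above is just a weighted average of the two distributions. A standalone sketch (illustrative function and parameter names, not the sppasSLM internals; lam is an assumed interpolation weight):

```python
def interpolate(p1, p2, lam=0.5):
    """Linear interpolation of two language models for one history.

    p1, p2 map words to probabilities P(w|h) for the same history h;
    the combined model is lam * P1(w|h) + (1 - lam) * P2(w|h).
    """
    words = set(p1) | set(p2)
    return {w: lam * p1.get(w, 0.0) + (1.0 - lam) * p2.get(w, 0.0)
            for w in words}
```

With equal weights, interpolating {"a": 0.6, "b": 0.4} and {"a": 0.2, "b": 0.8} gives {"a": 0.4, "b": 0.6}; since both inputs sum to 1, so does the result.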

load_from_arpa(filename)[source]

Load the model from an ARPA-ASCII file.

Parameters

filename – (str) Filename from which to read the model.

save_as_arpa(filename)[source]

Save the model into an ARPA-ASCII file.

Parameters

filename – (str) Filename in which to write the model.

set(model)[source]

Set the language model.

Parameters

model – (list) List of lists of tuples for 1-gram, 2-grams, …

class annotations.Align.sppasTrainingCorpus(datatrainer=None, lang='und')[source]

Bases: object

Manager of a training corpus, to prepare a set of data.

Data preparation is the step 1 of the acoustic model training procedure.

It establishes the list of phonemes. It converts the input data into the HTK-specific data format. It codes the audio data, also called “parameterizing the raw speech waveforms into sequences of feature vectors” (i.e. convert from wav to MFCC format).

Accepted input:

  • annotated files: one of sppasTrsRW.extensions_in()

  • audio files: one of audiodata.extensions

__init__(datatrainer=None, lang='und')[source]

Create a sppasTrainingCorpus instance.

Parameters
  • datatrainer – (sppasDataTrainer)

lang – (str) ISO 639-3 code of the language

add_corpus(directory)[source]

Add a new corpus to deal with.

Find matching pairs of files (audio / transcription) of the given directory and its folders.

Parameters

directory – (str) The directory to find data files of a corpus.

Returns

the number of pairs appended.
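The pairing step boils down to grouping files by base name and keeping those stems that have both a transcription and an audio file. A simplified sketch (the extensions and function name are illustrative; SPPAS accepts any of its supported annotated and audio formats):

```python
import os


def find_pairs(filenames, trs_ext=".textgrid", audio_ext=".wav"):
    """Match transcription/audio files sharing the same base name.

    Extensions are compared case-insensitively; returns sorted
    (transcription, audio) pairs.
    """
    stems = {}
    for f in filenames:
        root, ext = os.path.splitext(f)
        stems.setdefault(root, set()).add(ext.lower())
    return [(root + trs_ext, root + audio_ext)
            for root, exts in sorted(stems.items())
            if trs_ext in exts and audio_ext in exts]
```

For example, from ["a.wav", "a.TextGrid", "b.wav"] only the stem "a" yields a pair, since "b" has no transcription.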

add_file(trs_filename, audio_filename)[source]

Add a new set of files to deal with.

If such files are already in the data, they will be added again.

Parameters
  • trs_filename – (str) The annotated file.

  • audio_filename – (str) The audio file.

Returns

(bool)

create()[source]

Create files and directories.

fix_resources(vocab_file=None, dict_file=None, mapping_file=None)[source]

Fix resources using default values.

Ideally, resources are fixed after the datatrainer.

Parameters
  • vocab_file – (str) The lexicon, used during tokenization of the corpus.

  • dict_file – (str) The pronunciation dictionary, used both to generate the list of phones and to perform phonetization of the corpus.

  • mapping_file – (str) File that contains the mapping table for the phone set.

get_mlf()[source]

Fix the mlf file by defining the directories to add.

Example of a line in the MLF file: "/mfc-align/" => "workdir/trs-align"

get_scp(aligned=True, phonetized=False, transcribed=False)[source]

Fix the train.scp file content.

Parameters
  • aligned – (bool) Add time-aligned data in the scp file

  • phonetized – (bool) Add phonetized data in the scp file

  • transcribed – (bool) Add transcribed data in the scp file

Returns

filename or None if no data is available.

reset()[source]

Fix all members to None or to their default values.