annotations.Align.aligners package

Submodules

annotations.Align.aligners.aligner module

filename

sppas.src.annotations.Align.aligners.aligner.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Aligners manager.

class annotations.Align.aligners.aligner.sppasAligners[source]

Bases: object

Manager of the aligners implemented in the package.

__init__()[source]

Create a sppasAligners to manage the aligners supported by SPPAS.

check(aligner_name)[source]

Check whether the aligner name is known or not.

Parameters

aligner_name – (str) Name of the aligner.

Returns

formatted aligner name

classes(aligner_name=None)[source]

Return the list of aligner classes.

Parameters

aligner_name – (str) A specific aligner

Returns

BasicAligner, or a list if no aligner name is given

static default_aligner_name()[source]

Return the name of the default aligner.

default_extension(aligner_name=None)[source]

Return the default extension of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

str, or a dict of str if no aligner name is given

extensions(aligner_name=None)[source]

Return the list of supported extensions of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

list of str, or a dict of list if no aligner name is given

get()[source]

Return a dictionary of aligners (key=name, value=instance).

instantiate(model_dir=None, aligner_name='basic')[source]

Instantiate the appropriate aligner from its name.

If an error occurs, the basic aligner is returned.

Parameters
  • model_dir – (str) Directory of the acoustic model

  • aligner_name – (str) Name of the aligner

Returns

an Aligner instance.

names()[source]

Return the list of aligner names.
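
For illustration, a minimal usage sketch of this manager; the import path follows the module name above, and the model directory and aligner name are placeholders:

>>> from annotations.Align.aligners.aligner import sppasAligners
>>> manager = sppasAligners()
>>> manager.default_aligner_name()
>>> manager.names()                  # list of the supported aligner names
>>> manager.check("julius")          # formatted name of a known aligner
>>> aligner = manager.instantiate("/path/to/acoustic-model", "julius")
>>> aligner.name()                   # the basic aligner is returned if instantiation failed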

annotations.Align.aligners.alignerio module

filename

sppas.src.annotations.Align.aligners.alignerio.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Aligners Input/Output readers and writers

class annotations.Align.aligners.alignerio.AlignerIO[source]

Bases: object

Reader/writer of the output files of the aligners.

AlignerIO implements methods to read/write files of the external aligner systems.

EXTENSIONS_READ = {'mlf': <class 'annotations.Align.aligners.alignerio.mlf'>, 'palign': <class 'annotations.Align.aligners.alignerio.palign'>, 'walign': <class 'annotations.Align.aligners.alignerio.walign'>}
EXTENSIONS_WRITE = {'palign': <class 'annotations.Align.aligners.alignerio.palign'>}
static read_aligned(basename)[source]

Find an aligned file and read it.

Parameters

basename – (str) File name without extension

Returns

Two lists of tuples with phones and words:

  1. List of (start-time end-time phoneme score)

  2. List of (start-time end-time word score)

The score can be None. TODO: the "phoneme" column can be a sequence of alternative phonemes.
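
A usage sketch, assuming an aligned file such as track_000001.palign (a hypothetical name) exists for the given basename:

>>> phones, words = AlignerIO.read_aligned("track_000001")
>>> for start, end, phoneme, score in phones:
...     print(start, end, phoneme, score)     # score may be None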

class annotations.Align.aligners.alignerio.BaseAlignersReader[source]

Bases: object

Base class for readers/writers of time-aligned files.

__init__()[source]
static get_lines(filename)[source]

Return the lines of a file with the SPPAS encoding.

Parameters

filename – file to load

Returns

list of decoded lines

static get_phonemes_julius(lines)[source]

Return the pronunciation of all words.

Parameters

lines – (List of str)

Returns

List of tuples (ph1 ph2…phN)

static get_units_julius(lines)[source]

Return the units of a palign/walign file (in frames).

Parameters

lines – (List of str)

Returns

List of tuples (start, end)

static get_word_scores_julius(lines)[source]

Return all scores of words.

Parameters

lines – (List of str)

Returns

List

static get_words_julius(lines)[source]

Return all words.

Parameters

lines – (List of str)

Returns

List

static make_result(units, words, phonemes, scores)[source]

Make a unique data structure from the given data.

Parameters
  • units – (List of tuples)

  • words – (List of str)

  • phonemes – (List of tuples)

  • scores – (List of str, or None)

Returns

Two data structures

  1. List of (start_time end_time phoneme None)

  2. List of (start_time end_time word score)

static read(filename)[source]
static shift_time_units(units, delta)[source]

Return the units shifted by a delta time.

The first start time and the last end time are not shifted.

Parameters
  • units – (list of tuples) Time units

  • delta – (float) Delta time value in range [-0.02;0.02]
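
A sketch of the expected behaviour with illustrative values: only the inner boundaries move, as stated above.

>>> units = [(0.0, 0.40), (0.40, 0.75), (0.75, 1.10)]
>>> shifted = BaseAlignersReader.shift_time_units(units, 0.01)
>>> # inner boundaries are moved 10 ms later; the first start (0.0)
>>> # and the last end (1.10) are left unchanged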

static units_to_time(units, samplerate)[source]

Return the conversion of units.

Convert units (in frames) into time values (in seconds).

Parameters

samplerate – (int) Sample rate to be applied to the units.

Returns

List of tuples (start, end)

NOTE: in previous versions, everything was shifted 10 ms to the right.
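
A hedged sketch of units_to_time, assuming each unit is a (start, end) pair of frame indices and that the conversion divides them by the given rate; the value 100 (one frame every 10 ms) is an assumption for illustration:

>>> frames = [(0, 40), (40, 75)]
>>> BaseAlignersReader.units_to_time(frames, 100)
>>> # expected to give time values such as [(0.0, 0.4), (0.4, 0.75)]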

class annotations.Align.aligners.alignerio.mlf[source]

Bases: annotations.Align.aligners.alignerio.BaseAlignersReader

mlf reader of time-aligned files (HTK Toolkit).

When the -m option is used, the transcriptions output by HVITE would by default contain both the model level and word level transcriptions. For example, a typical fragment of the output might be:

7500000 8700000 f -1081.604736 FOUR 30.000000
8700000 9800000 ao -903.821350
9800000 10400000 r -665.931641
10400000 10400000 sp -0.103585
10400000 11700000 s -1266.470093 SEVEN 22.860001
11700000 12500000 eh -765.568237
12500000 13000000 v -476.323334
13000000 14400000 n -1285.369629
14400000 14400000 sp -0.103585

__init__()[source]

Create a mlf instance to parse mlf files from HVite.

static get_phonemes(lines)[source]

Return the pronunciation of all words.

Parameters

lines – (List of str)

Returns

List of tuples (ph1 ph2…phN)

static get_units(lines)[source]

Return the units of a mlf file (in nano-seconds).

Parameters

lines – (List of str)

Returns

List of tuples (start, end)

static get_words(lines)[source]

Return all words.

Parameters

lines – (List of str)

Returns

List

static is_integer(s)[source]

Check whether a string is an integer or not.

Parameters

s – (str or unicode)

Returns

(bool)

static read(filename)[source]

Read an alignment file (a mlf file).

Parameters

filename – (str) the input file (a HVite mlf output file).

Returns

2 lists of tuples:

  1. List of (start-time end-time phoneme None)

  2. List of (start-time end-time word None)
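
A usage sketch, with a hypothetical HVite output file name:

>>> phones, words = mlf.read("track_000001.mlf")
>>> # both lists contain (start-time, end-time, label, None) tuples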

class annotations.Align.aligners.alignerio.palign[source]

Bases: annotations.Align.aligners.alignerio.BaseAlignersReader

palign reader/writer of time-aligned files (Julius CSR Engine).

__init__()[source]

Create a palign instance to read palign files of Julius.

static read(filename)[source]

Read an alignment file in the format of Julius CSR engine.

Parameters

filename – (str) The input file name.

Returns

3 lists of tuples

  1. List of (start-time end-time phoneme None)

  2. List of (start-time end-time word None)

  3. List of (start-time end-time pron_word score)

static write(phoneslist, tokenslist, alignments, outputfilename)[source]

Write an alignment output file.

Parameters
  • phoneslist – (list) The phonetization of each token

  • tokenslist – (list) Each token

  • alignments – (list) Tuples (start-time end-time phoneme)

  • outputfilename – (str) Output file name (a Julius-like output).
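
A usage sketch for the reader, with a hypothetical Julius output file name:

>>> phones, words, pron_words = palign.read("track_000001.palign")
>>> # phones:     tuples (start-time, end-time, phoneme, None)
>>> # words:      tuples (start-time, end-time, word, None)
>>> # pron_words: tuples (start-time, end-time, pronounced word, score)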

class annotations.Align.aligners.alignerio.walign[source]

Bases: annotations.Align.aligners.alignerio.BaseAlignersReader

walign reader of time-aligned files (Julius CSR Engine).

__init__()[source]

Create a walign instance to read walign files of Julius.

static read(filename)[source]

Read an alignment file in the format of Julius CSR engine.

Parameters

filename – (str) The input file name.

Returns

A list of tuples (start-time end-time word score)

annotations.Align.aligners.basealigner module

filename

sppas.src.annotations.Align.aligners.basealigner.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Base class for any automatic forced alignment system.

class annotations.Align.aligners.basealigner.BaseAligner(model_dir=None)[source]

Bases: object

Base class for any automatic alignment system.

Base class for a system to perform phonetic speech segmentation.

__init__(model_dir=None)[source]

Create a BaseAligner instance.

Parameters

model_dir – (str) the acoustic model directory name

add_tiedlist(entries)[source]

Add missing triphones/biphones in the tiedlist of the model.

Backup the initial file if entries were added.

Parameters

entries – (list) List of missing entries into the tiedlist.

Returns

list of entries really added

check_data()[source]

Check the given data to be aligned (phones and tokens).

Returns

A warning message, or an empty string if check is OK.

extensions()[source]

Return the list of supported file name extensions.

name()[source]

Return the identifier name of the aligner.

outext()[source]

Return the extension of output files.

run_alignment(input_wav, output_align)[source]

Perform forced-alignment.

The alignment is expected to be performed on a unit no longer than a sentence (sentence/IPU/segment/utterance).

The audio file must be PCM-WAV, 16000 Hz, 16 bits, matching the acoustic model.

Parameters
  • input_wav – (str) the audio input file name

  • output_align – (str) the output file name

Returns

(str) A message of the aligner

set_phones(phones)[source]

Set the pronunciation of each token.

Parameters

phones – (str) Phonetization

set_tokens(tokens)[source]

Set the tokens.

Parameters

tokens – (str) Tokenization
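
The typical calling sequence shared by all aligners can be sketched as follows, with sppasAligners imported as in the earlier sketch; the phonetization format, model path and file names are assumptions for illustration:

>>> aligner = sppasAligners().instantiate("/path/to/model", "basic")
>>> aligner.set_phones("dh-ax k-ae-t")     # phonetization of the tokens (format shown is an assumption)
>>> aligner.set_tokens("the cat")
>>> aligner.check_data()                   # warning message, or "" if the data are consistent
>>> aligner.run_alignment("track_000001.wav", "track_000001." + aligner.outext())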

annotations.Align.aligners.basicalign module

filename

sppas.src.annotations.Align.aligners.basicalign.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

An aligner to set the same duration to each sound.

class annotations.Align.aligners.basicalign.BasicAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

Basic automatic alignment system.

This segmentation assigns the same duration to each phoneme. In case of phonetic variants, the first of the shortest pronunciations is selected.

__init__(model_dir=None)[source]

Create a BasicAligner instance.

This class aligns one unit by assigning the same duration to each phoneme. In case of variants, it selects the shortest sequence.

Parameters

model_dir – (str) Ignored.

run_alignment(input_wav, output_align)[source]

Perform the speech segmentation.

Assign the same duration to each phoneme.

Parameters
  • input_wav – (str/float) audio input file name, or its duration

  • output_align – (str) the output file name

Returns

Empty string.

run_basic(duration, output_align=None)[source]

Perform the speech segmentation.

Assign the same duration to each phoneme.

Parameters
  • duration – (float) the duration of the audio input

  • output_align – (str) the output file name

Returns

A list of tuples (begin, end, phone)

static select_shortest(pron)[source]

Return the first of the shortest pronunciations of an entry.

Parameters

pron – (str) The phonetization of a token

Returns

(str) pronunciation
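
A minimal sketch of the basic aligner, which ignores the acoustic model; it is assumed here that run_basic uses the phonetization previously given with set_phones, and the values are illustrative:

>>> basic = BasicAligner()
>>> basic.set_phones("a b c")            # three phonemes (separator shown is an assumption)
>>> basic.run_basic(0.3)                 # each phoneme gets the same share of the 0.3 s
>>> basic.run_alignment(0.3, "track_000001." + basic.outext())   # same, written to a file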

annotations.Align.aligners.hvitealign module

filename

sppas.src.annotations.Align.aligners.hvitealign.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Wrapper for HVite.

http://htk.eng.cam.ac.uk/links/asr_tool.shtml

class annotations.Align.aligners.hvitealign.HviteAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

HVite automatic alignment system.

__init__(model_dir=None)[source]

Create a HViteAligner instance.

This class aligns one inter-pausal unit with the external segmentation tool HVite.

HVite is able to align one audio segment that can be:
  • an inter-pausal unit,

  • an utterance,

  • a sentence,

  • a paragraph…

no longer than a few seconds.

Parameters

model_dir – (str) Name of the directory of the acoustic model

gen_dependencies(grammar_name, dict_name)[source]

Generate the dependencies (grammar, dictionary) for HVite.

Parameters
  • grammar_name – (str) the file name of the tokens

  • dict_name – (str) the dictionary file name

run_alignment(input_wav, output_align)[source]

Execute the external program HVite to align.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • input_wav – (str) audio input file name

  • output_align – (str) the output file name

Returns

(str) An empty string.

run_hvite(inputwav, outputalign)[source]

Perform the speech segmentation.

Call the system command HVite.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • inputwav – (str) audio input file name

  • outputalign – (str) the output file name

annotations.Align.aligners.juliusalign module

filename

sppas.src.annotations.Align.aligners.juliusalign.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Wrapper for Julius aligner.

http://julius.sourceforge.jp/en_index.php

Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers. Based on word N-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 60k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, Gaussian selection, etc. Besides search efficiency, it is also carefully modularized to be independent from model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted to interoperate with other free modeling toolkits such as HTK, the CMU-Cam SLM toolkit, etc.

The main platform is Linux and other Unix workstations, and it also works on Windows. The most recent version is developed on Linux and Windows (cygwin / mingw), and a Microsoft SAPI version also exists. Julius is distributed under an open license together with its source code.

Julius has been developed as research software for Japanese LVCSR since 1997, and the work was continued under the IPA Japanese dictation toolkit project (1997-2000), the Continuous Speech Recognition Consortium, Japan (CSRC) (2000-2003) and currently the Interactive Speech Technology Consortium (ISTC).

class annotations.Align.aligners.juliusalign.JuliusAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

Julius automatic alignment system.

JuliusAligner is able to align one audio segment that can be:
  • an inter-pausal unit,

  • an utterance,

  • a sentence…

no longer than a few seconds.

Things needed to run JuliusAligner:

To perform speech segmentation with Julius, three “models” have to be prepared. The models define the linguistic properties of the language: the recognition unit, the audio properties of the unit, and the linguistic constraints on the connections between units. Typically the unit is a word, and you should give Julius the models below:

1. “Acoustic model”, which is a stochastic model of input waveform patterns, typically per phoneme. Format is HTK-ASCII model.

2. “Word dictionary”, which defines vocabulary.

3. “Language model”, which defines syntax-level rules that define the connection constraints between words. It should give the constraints for the acceptable or preferable sentence patterns. It can be:

  • either a rule-based grammar,

  • or probabilistic N-gram model.

This class automatically constructs the word dictionary and the language model from both:

  • the tokenization of speech,

  • the phonetization of speech.

If outext is set to “palign”, JuliusAligner uses a grammar and produces both phone and word alignments. If outext is set to “walign”, JuliusAligner uses an SLM and produces word alignments only.
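
A sketch of the two modes, assuming run_alignment takes care of generating the grammar or SLM dependencies it needs; the model path, phonetization format and file names are placeholders:

>>> julius = JuliusAligner("/path/to/model")
>>> julius.set_phones("dh-ax k-ae-t")      # phonetization of the tokens (format shown is an assumption)
>>> julius.set_tokens("the cat")
>>> julius.set_outext("palign")            # grammar mode: phone and word alignments
>>> julius.run_alignment("track_000001.wav", "track_000001.palign")
>>> julius.set_outext("walign")            # SLM mode: word alignments only
>>> julius.run_alignment("track_000001.wav", "track_000001.walign", N=3)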

__init__(model_dir=None)[source]

Create a JuliusAligner instance.

Parameters

model_dir – (str) Name of the directory of the acoustic model

gen_grammar_dependencies(basename)[source]

Generate the dependencies (grammar, dictionary) for julius.

Parameters

basename – (str) base name of the grammar and dictionary files

gen_slm_dependencies(basename, N=3)[source]

Generate the dependencies (slm, dictionary) for julius.

Parameters
  • basename – (str) base name of the slm and dictionary files

  • N – (int) Language model N-gram length.

run_alignment(input_wav, output_align, N=3)[source]

Execute the external program julius to align.

The data related to the unit to time-align must be set beforehand with:

  • set_phones(str)

  • set_tokens(str)

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • input_wav – (str) the audio input file name

  • output_align – (str) the output file name

  • N – (int) for N-grams, used only if SLM (i.e. outext=walign)

Returns

(str) A message of julius.

run_julius(inputwav, basename, outputalign)[source]

Perform the speech segmentation.

System call to the command julius.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • inputwav – (str) audio input file name

  • basename – (str) base name of grammar and dictionary files

  • outputalign – (str) output file name

set_outext(ext)[source]

Set the extension for output files.

Parameters

ext – (str) Extension for output file name.

Module contents

filename

sppas.src.annotations.Align.aligners.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Internal or external automatic aligners

How to get the list of supported aligner names?

>>> a = sppasAligners()
>>> a.default_aligner_name()
>>> a.names()

How to get an instance of a given aligner?

>>> a1 = sppasAligners().instantiate(model_dir, "julius")
>>> a2 = JuliusAligner(model_dir)
class annotations.Align.aligners.BasicAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

Basic automatic alignment system.

This segmentation assigns the same duration to each phoneme. In case of phonetic variants, the first of the shortest pronunciations is selected.

__init__(model_dir=None)[source]

Create a BasicAligner instance.

This class aligns one unit by assigning the same duration to each phoneme. In case of variants, it selects the shortest sequence.

Parameters

model_dir – (str) Ignored.

run_alignment(input_wav, output_align)[source]

Perform the speech segmentation.

Assign the same duration to each phoneme.

Parameters
  • input_wav – (str/float) audio input file name, or its duration

  • output_align – (str) the output file name

Returns

Empty string.

run_basic(duration, output_align=None)[source]

Perform the speech segmentation.

Assign the same duration to each phoneme.

Parameters
  • duration – (float) the duration of the audio input

  • output_align – (str) the output file name

Returns

A list of tuples (begin, end, phone)

static select_shortest(pron)[source]

Return the first of the shortest pronunciations of an entry.

Parameters

pron – (str) The phonetization of a token

Returns

(str) pronunciation

class annotations.Align.aligners.HviteAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

HVite automatic alignment system.

__init__(model_dir=None)[source]

Create a HViteAligner instance.

This class aligns one inter-pausal unit with the external segmentation tool HVite.

HVite is able to align one audio segment that can be:
  • an inter-pausal unit,

  • an utterance,

  • a sentence,

  • a paragraph…

no longer than a few seconds.

Parameters

model_dir – (str) Name of the directory of the acoustic model

gen_dependencies(grammar_name, dict_name)[source]

Generate the dependencies (grammar, dictionary) for HVite.

Parameters
  • grammar_name – (str) the file name of the tokens

  • dict_name – (str) the dictionary file name

run_alignment(input_wav, output_align)[source]

Execute the external program HVite to align.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • input_wav – (str) audio input file name

  • output_align – (str) the output file name

Returns

(str) An empty string.

run_hvite(inputwav, outputalign)[source]

Perform the speech segmentation.

Call the system command HVite.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • inputwav – (str) audio input file name

  • outputalign – (str) the output file name

class annotations.Align.aligners.JuliusAligner(model_dir=None)[source]

Bases: annotations.Align.aligners.basealigner.BaseAligner

Julius automatic alignment system.

JuliusAligner is able to align one audio segment that can be:
  • an inter-pausal unit,

  • an utterance,

  • a sentence…

no longer than a few seconds.

Things needed to run JuliusAligner:

To perform speech segmentation with Julius, three “models” have to be prepared. The models define the linguistic properties of the language: the recognition unit, the audio properties of the unit, and the linguistic constraints on the connections between units. Typically the unit is a word, and you should give Julius the models below:

1. “Acoustic model”, which is a stochastic model of input waveform patterns, typically per phoneme. Format is HTK-ASCII model.

2. “Word dictionary”, which defines vocabulary.

3. “Language model”, which defines syntax-level rules that define the connection constraints between words. It should give the constraints for the acceptable or preferable sentence patterns. It can be:

  • either a rule-based grammar,

  • or probabilistic N-gram model.

This class automatically constructs the word dictionary and the language model from both:

  • the tokenization of speech,

  • the phonetization of speech.

If outext is set to “palign”, JuliusAligner uses a grammar and produces both phone and word alignments. If outext is set to “walign”, JuliusAligner uses an SLM and produces word alignments only.

__init__(model_dir=None)[source]

Create a JuliusAligner instance.

Parameters

model_dir – (str) Name of the directory of the acoustic model

gen_grammar_dependencies(basename)[source]

Generate the dependencies (grammar, dictionary) for julius.

Parameters

basename – (str) base name of the grammar and dictionary files

gen_slm_dependencies(basename, N=3)[source]

Generate the dependencies (slm, dictionary) for julius.

Parameters
  • basename – (str) base name of the slm and dictionary files

  • N – (int) Language model N-gram length.

run_alignment(input_wav, output_align, N=3)[source]

Execute the external program julius to align.

The data related to the unit to time-align must be set beforehand with:

  • set_phones(str)

  • set_tokens(str)

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • input_wav – (str) the audio input file name

  • output_align – (str) the output file name

  • N – (int) for N-grams, used only if SLM (i.e. outext=walign)

Returns

(str) A message of julius.

run_julius(inputwav, basename, outputalign)[source]

Perform the speech segmentation.

System call to the command julius.

The given audio file must match the one used to train the acoustic model: PCM-WAV, 16000 Hz, 16 bits.

Parameters
  • inputwav – (str) audio input file name

  • basename – (str) base name of grammar and dictionary files

  • outputalign – (str) output file name

set_outext(ext)[source]

Set the extension for output files.

Parameters

ext – (str) Extension for output file name.

class annotations.Align.aligners.sppasAligners[source]

Bases: object

Manager of the aligners implemented in the package.

__init__()[source]

Create a sppasAligners to manage the aligners supported by SPPAS.

check(aligner_name)[source]

Check whether the aligner name is known or not.

Parameters

aligner_name – (str) Name of the aligner.

Returns

formatted aligner name

classes(aligner_name=None)[source]

Return the list of aligner classes.

Parameters

aligner_name – (str) A specific aligner

Returns

BasicAligner, or a list if no aligner name is given

static default_aligner_name()[source]

Return the name of the default aligner.

default_extension(aligner_name=None)[source]

Return the default extension of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

str, or a dict of str if no aligner name is given

extensions(aligner_name=None)[source]

Return the list of supported extensions of each aligner.

Parameters

aligner_name – (str) A specific aligner

Returns

list of str, or a dict of list if no aligner name is given

get()[source]

Return a dictionary of aligners (key=name, value=instance).

instantiate(model_dir=None, aligner_name='basic')[source]

Instantiate the appropriate aligner from its name.

If an error occurs, the basic aligner is returned.

Parameters
  • model_dir – (str) Directory of the acoustic model

  • aligner_name – (str) Name of the aligner

Returns

an Aligner instance.

names()[source]

Return the list of aligner names.