anndata.aio package

Submodules

anndata.aio.aioutils module

filename

sppas.src.anndata.aio.aioutils.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Utilities for readers and writers.

anndata.aio.aioutils.check_gaps(tier, min_loc=None, max_loc=None)[source]

Check if there are holes between annotations.

Parameters
  • tier – (sppasTier)

  • min_loc – (sppasPoint)

  • max_loc – (sppasPoint)

Returns

(bool)

anndata.aio.aioutils.check_overlaps(tier)[source]

Check whether some annotations are overlapping or not.

Parameters

tier – (sppasTier)

Returns

(bool)

anndata.aio.aioutils.fill_gaps(tier, min_loc=None, max_loc=None)[source]

Fill the temporal gaps (holes) between annotations.

Parameters
  • tier – (sppasTier) A tier with intervals.

  • min_loc – (sppasPoint)

  • max_loc – (sppasPoint)

Returns

(sppasTier) a tier with un-labelled annotations instead of gaps.
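The behavior of check_gaps(), check_overlaps() and fill_gaps() can be illustrated on a simplified tier. This is a minimal sketch, not the SPPAS implementation: it assumes a tier is a list of `(begin, end, label)` tuples sorted by begin time, and the helper names `has_overlaps`, `has_gaps` and `fill_gaps_sketch` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified tier: (begin, end, label) tuples sorted by begin time.
Interval = Tuple[float, float, str]


def has_overlaps(intervals: List[Interval]) -> bool:
    """Return True if an interval starts before the previous one ends."""
    return any(nxt[0] < cur[1] for cur, nxt in zip(intervals, intervals[1:]))


def fill_gaps_sketch(intervals: List[Interval],
                     min_loc: Optional[float] = None,
                     max_loc: Optional[float] = None) -> List[Interval]:
    """Return a copy of the tier where each hole is an un-labelled ('') interval."""
    if not intervals:
        return []
    filled: List[Interval] = []
    prev_end = intervals[0][0] if min_loc is None else min_loc
    for begin, end, label in intervals:
        if begin > prev_end:
            filled.append((prev_end, begin, ""))
        filled.append((begin, end, label))
        prev_end = max(prev_end, end)
    if max_loc is not None and max_loc > prev_end:
        filled.append((prev_end, max_loc, ""))
    return filled


def has_gaps(intervals: List[Interval],
             min_loc: Optional[float] = None,
             max_loc: Optional[float] = None) -> bool:
    """A tier has gaps if filling them adds at least one interval."""
    return len(fill_gaps_sketch(intervals, min_loc, max_loc)) != len(intervals)


tier = [(0.0, 1.0, "a"), (1.5, 2.0, "b")]
print(fill_gaps_sketch(tier, max_loc=3.0))
# [(0.0, 1.0, 'a'), (1.0, 1.5, ''), (1.5, 2.0, 'b'), (2.0, 3.0, '')]
```

Using the gap-filled result to answer the has-gaps question keeps both checks consistent by construction.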

anndata.aio.aioutils.format_label(text, empty='', tag_type='str')[source]

Create a label from a text.

Use the “{ | }” system to parse the alternative tags, and “=” to parse their scores.

Parameters
  • text – (str)

  • empty – (str) The text representing an empty tag.

  • tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).

Returns

sppasLabel

anndata.aio.aioutils.format_labels(text, separator='\n', empty='', tag_type='str')[source]

Create a set of labels from a text.

Use the separator to split the text into labels. Use the “{ | }” system to parse the alternative tags.

Examples

  • text = “{le|les} {chat|chats}” is 2 labels with 2 tags each.

  • text = “{le=0.6|les=0.4}” is a label with 2 tags and their scores.

Parameters
  • text – (str)

  • separator – (str) String to separate labels.

  • empty – (str) The text representing an empty tag.

  • tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).

Returns

list of sppasLabel
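A minimal sketch of the “{ | }” and “=” conventions, assuming a label is represented as a plain list of `(tag, score)` pairs rather than a `sppasLabel`. The names `parse_label` and `parse_labels` are hypothetical. In this sketch the first example is passed with a space as separator so that it yields 2 labels.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified label: a list of (tag_text, score) pairs.
Tag = Tuple[str, Optional[float]]


def _is_float(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_label(text: str) -> List[Tag]:
    """Parse "{a=0.6|b=0.4}" or a plain "a" into (tag, score) pairs."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        parts = text[1:-1].split("|")
    else:
        parts = [text]
    tags: List[Tag] = []
    for part in parts:
        content, sep, score = part.rpartition("=")
        if sep and _is_float(score):
            tags.append((content, float(score)))
        else:
            tags.append((part, None))  # no score given
    return tags


def parse_labels(text: str, separator: str = "\n") -> List[List[Tag]]:
    """Split the text into labels, then parse each label."""
    return [parse_label(chunk) for chunk in text.split(separator)]


print(parse_labels("{le|les} {chat|chats}", separator=" "))
# [[('le', None), ('les', None)], [('chat', None), ('chats', None)]]
print(parse_label("{le=0.6|les=0.4}"))
# [('le', 0.6), ('les', 0.4)]
```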

anndata.aio.aioutils.format_point_to_float(p)[source]
anndata.aio.aioutils.format_score(text)[source]

Return a score from a text.

Parameters

text – (str) Unicode text

Returns

float or None

anndata.aio.aioutils.format_tag(text, empty='', tag_type='str')[source]

Return a tag from a text.

Parameters
  • text – (str) Unicode text

  • empty – (str) The text representing an empty tag.

  • tag_type – (str): The type of this content. One of: (‘str’, ‘int’, ‘float’, ‘bool’).

Returns

sppasTag

anndata.aio.aioutils.is_ortho_tier(tier_name)[source]

Return True if tier_name matches an orthographic transcription tier.

That is, if its name contains either “ipu”, “trans”, “trs”, “toe” or “ortho”.

Parameters

tier_name – (str)

Returns

(bool)
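The matching rule amounts to a substring test. This sketch assumes the comparison is case-insensitive, which the description above does not state.

```python
ORTHO_KEYS = ("ipu", "trans", "trs", "toe", "ortho")


def is_ortho_tier(tier_name: str) -> bool:
    """Return True if the tier name contains one of the ortho keywords."""
    name = tier_name.lower()  # assumption: the match is case-insensitive
    return any(key in name for key in ORTHO_KEYS)
```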

anndata.aio.aioutils.load(filename, file_encoding='utf-8')[source]

Load a file into lines.

Parameters
  • filename – (str)

  • file_encoding – (str)

Returns

list of lines (str)

anndata.aio.aioutils.merge_overlapping_annotations(tier)[source]

Merge overlapping annotations.

The labels of 2 overlapping annotations are appended.

Parameters

tier – (Tier)

Returns

(sppasTier)

anndata.aio.aioutils.point2interval(tier, radius=0.001)[source]

Convert a PointTier into an IntervalTier.

  • Ensure the radius is always >= 1 millisecond and that the newly created tier won’t contain overlapping intervals.

  • Do not convert alternative localizations.

  • Do not share the hierarchy.

  • The new tier shares the original tier’s metadata, except that its ‘id’ is different.

  • New annotations share the original annotations’ metadata, except that their ‘id’ is different.

Parameters
  • tier – (Tier)

  • radius – (float) the radius to use for all intervals

Returns

(sppasTier) or None if tier was not converted.
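A sketch of the point-to-interval idea on plain floats: each point p becomes [p - r, p + r], with r at least 1 ms, and overlaps between neighbors are removed. The function name is hypothetical, and splitting an overlap at the midpoint of the two points is an assumed strategy; SPPAS may resolve overlaps differently.

```python
from typing import List, Tuple

MIN_RADIUS = 0.001  # 1 millisecond


def points_to_intervals(points: List[float],
                        radius: float = 0.001) -> List[Tuple[float, float]]:
    """Turn sorted point times into non-overlapping [p - r, p + r] intervals."""
    radius = max(radius, MIN_RADIUS)
    intervals: List[Tuple[float, float]] = []
    for i, p in enumerate(points):
        begin = max(0., p - radius)  # never start before time 0
        end = p + radius
        if intervals and begin < intervals[-1][1]:
            # Assumption: resolve the overlap at the midpoint of the two points.
            mid = (points[i - 1] + p) / 2.
            intervals[-1] = (intervals[-1][0], mid)
            begin = mid
        intervals.append((begin, end))
    return intervals
```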

anndata.aio.aioutils.serialize_label(label, empty='', alt=True)[source]

Convert the label into a string, optionally including alternative tags.

Use the “{ | }” system to serialize the alternative tags. Scores of the tags are not returned.

Parameters
  • label – (sppasLabel)

  • empty – (str) The text to return if a tag is empty or not set.

  • alt – (bool) Include alternative tags

Returns

(str)

anndata.aio.aioutils.serialize_labels(labels, separator='\n', empty='', alt=True)[source]

Create a text from a list of labels.

Use the separator to separate the labels in the text. Use the “{ | }” system to serialize the alternative tags and = for scores.

Parameters
  • labels – (list of sppasLabel)

  • separator – (str) String separating labels

  • empty – (str) The text representing an empty tag

  • alt – (bool) Include alternative tags. If False, only the best tag is serialized.

Returns

(str)

anndata.aio.aioutils.unalign(aligned_tier, ipus_separators=('#', 'sil', 'dummy'))[source]

Convert a time-aligned tier into a non-aligned tier.

Parameters
  • aligned_tier – (sppasTier)

  • ipus_separators – (list)

Returns

(Tier)

anndata.aio.aioutils.unfill_gaps(tier)[source]

Return the tier in which un-labelled annotations are removed.

An un-labelled annotation is one where:

  • the annotation has no labels,

  • or the tags of each label are an empty string.

The hierarchy is not copied to the new tier.

Parameters

tier – (Tier)

Returns

(sppasTier)

anndata.aio.annotationpro module

filename

sppas.src.anndata.aio.annotationpro.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of annotation pro files.

Annotation Pro is a tool for annotation of audio and text files:

Klessa, K., Karpiński, M., Wagner, A. (2013).
Annotation Pro – a new software tool for annotation of linguistic and
paralinguistic features.
In D. Hirst & B. Bigi (Eds.)
Proceedings of the Tools and Resources for the Analysis of Speech Prosody
(TRASP) Workshop, Aix-en-Provence, pp. 51-54.

http://annotationpro.org/

anndata.aio.annotationpro.color_to_rgb(color)[source]

Convert an ANTX decimal color into RGB.

anndata.aio.annotationpro.pick_random_color(v1=0, v2=255)[source]

Return a random RGB color.

anndata.aio.annotationpro.rgb_to_color(r, g, b)[source]

Convert an RGB color into an ANTX decimal color.

class anndata.aio.annotationpro.sppasANT(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

AnnotationPro ANT reader and writer.

An ANT file is a zipped directory.

__init__(name=None)[source]

Initialize a new sppasANT instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANT format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read an ANT file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an ANT file.

Parameters

filename – (str)

class anndata.aio.annotationpro.sppasANTX(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

AnnotationPro ANTX reader and writer.

__init__(name=None)[source]

Initialize a new sppasANTX instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANTX format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

elt_to_meta(root, meta_object, uri, exclude_list=[])[source]

Add nodes of root in meta_object.

static indent(elem, level=0)[source]

Pretty indent of an ElementTree.

http://effbot.org/zone/element-lib.htm#prettyprint

static make_point(midpoint, sample_rate=44100)[source]

The localization is a frame value, so an integer.
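Because ANTX localizations are frame values, converting to and from seconds only needs the sample rate. A sketch, assuming a frame is a sample index at the `sample_rate` passed to make_point() (44100 Hz by default); the function names are hypothetical.

```python
def frames_to_seconds(frames: int, sample_rate: int = 44100) -> float:
    """Convert a frame (sample) index into seconds."""
    return frames / float(sample_rate)


def seconds_to_frames(seconds: float, sample_rate: int = 44100) -> int:
    """Convert seconds into the nearest frame index."""
    return int(round(seconds * sample_rate))


print(frames_to_seconds(22050))  # 0.5
print(seconds_to_frames(1.5))    # 66150
```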

read(filename)[source]

Read an ANTX file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an ANTX file.

Parameters

filename – (str)

anndata.aio.anvil module

filename

sppas.src.anndata.aio.anvil.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of ANVIL file format (.anvil).

ANVIL is a free video annotation tool.

Kipp, M. (2012)
Multimedia Annotation, Querying and Analysis in ANVIL.
In: M. Maybury (ed.) Multimedia Information Extraction,
Chapter 21, John Wiley & Sons, pp: 351-368.

Be aware that SPPAS support for ANVIL files still has to be verified, tested and extended. The last release of ANVIL was in 2017.

class anndata.aio.anvil.sppasAnvil(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Partial ANVIL reader.

Author

Brigitte Bigi, Jibril Saffi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

__init__(name=None)[source]

Initialize a new ANVIL instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANVIL format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so always a float.

read(filename)[source]

Read an ANVIL file and fill the Transcription.

Parameters

filename – (str)

anndata.aio.audacity module

filename

sppas.src.anndata.aio.audacity.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of Audacity file formats (.aup).

Audacity is a multi-platform, free, easy-to-use, multi-track audio editor and recorder. Audacity is free software, developed by a group of volunteers and distributed under the GNU General Public License (GPL).

See: http://www.audacityteam.org/

class anndata.aio.audacity.sppasAudacity(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Audacity projects reader.

Can work on both Audacity projects and Audacity Label tracks.

__init__(name=None)[source]

Initialize a new sppasAudacity instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of AUP format or not. AUP files are encoded in UTF-8 without BOM.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so a float.

static normalize(name)[source]

Provide namespaces in element names.

Example:

    <Element '{http://audacity.sourceforge.net/xml/}simpleblockfile' at 0x03270230>
    <Element '{http://audacity.sourceforge.net/xml/}envelope' at 0x032702C0>
    <Element '{http://audacity.sourceforge.net/xml/}labeltrack' at 0x03270C50>
    <Element '{http://audacity.sourceforge.net/xml/}label' at 0x032701E8>

See: http://effbot.org/zone/element-namespaces.htm
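ElementTree reports namespaced tags in “Clark notation” ({uri}name), so a lookup must use the same form. A sketch of a normalize-like helper; the XML snippet and the `t`, `t1` and `title` attribute names are illustrative assumptions, not taken from this documentation.

```python
import xml.etree.ElementTree as ET

AUDACITY_NS = "http://audacity.sourceforge.net/xml/"


def normalize(name: str) -> str:
    """Return the tag name in Clark notation: {namespace}name."""
    if name.startswith("{"):
        return name  # already qualified
    return "{%s}%s" % (AUDACITY_NS, name)


xml_text = (
    '<project xmlns="http://audacity.sourceforge.net/xml/">'
    '<labeltrack name="Track">'
    '<label t="0.5" t1="1.2" title="hello"/>'
    '</labeltrack>'
    '</project>'
)
root = ET.fromstring(xml_text)
for label in root.iter(normalize("label")):
    print(label.get("t"), label.get("t1"), label.get("title"))
# 0.5 1.2 hello
```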

read(filename)[source]

Read an AUP file and fill the Transcription.

<!ELEMENT project (tags, (wavetrack | labeltrack | timetrack)*)>

Parameters

filename – (str)

anndata.aio.basetrsio module

filename

sppas.src.anndata.aio.basetrsio.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Base class for any transcription input/output.

class anndata.aio.basetrsio.sppasBaseIO(name=None)[source]

Bases: anndata.transcription.sppasTranscription

Base object for readers and writers of annotated data.

__init__(name=None)[source]

Initialize a new Transcription reader-writer instance.

Parameters

name – (str) A transcription name.

alternative_localization_support()[source]

Return True if it supports alternative localizations.

It returns True whether alternatives are supported with or without a score.

Returns

boolean

alternative_tag_support()[source]

Return True if it supports alternative tags.

It returns True whether alternatives are supported with or without a score.

Returns

boolean

ctrl_vocab_support()[source]

Return True if it supports reading and writing a controlled vocabulary.

Returns

boolean

static detect(filename)[source]

Check whether a file is of the appropriate format or not.

disjoint_support()[source]

Return True if it supports tiers with disjoint localizations.

Returns

boolean

gaps_support()[source]

Return True if it supports gaps between annotations of a tier.

Returns

boolean

hierarchy_support()[source]

Return True if it supports a hierarchy between tiers.

Returns

boolean

interval_support()[source]

Return True if it supports tiers with localizations as intervals.

Returns

boolean

static is_number(s)[source]

Check whether a string is a number or not.

Parameters

s – (str or unicode)

Returns

(bool)

media_support()[source]

Return True if it supports reading and writing a link to a media file.

Returns

boolean

metadata_support()[source]

Return True if it supports reading and writing metadata.

Returns

boolean

multi_tiers_support()[source]

Return True if it supports reading and writing several tiers.

Returns

boolean

no_tiers_support()[source]

Return True if it supports writing a file with no tier.

Returns

boolean

overlaps_support()[source]

Return True if it supports overlaps between annotations of a tier.

Returns

boolean

point_support()[source]

Return True if it supports tiers with localizations as points.

Returns

boolean

radius_support()[source]

Return True if it supports the radius value.

Returns

boolean

read(filename)[source]

Read a file and fill the transcription.

Parameters

filename – (str)

set(other)[source]

Set self with other content.

Parameters

other – (sppasTranscription)

write(filename)[source]

Write the transcription into a file.

Parameters

filename – (str)

anndata.aio.elan module

filename

sppas.src.anndata.aio.elan.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of ELAN native file formats (.eaf).

ELAN is a professional tool for the creation of complex annotations on video and audio resources.

Brugman, H., Russel, A. (2004).
Annotating Multimedia/ Multi-modal resources with ELAN.
In: Proceedings of LREC 2004, Fourth International Conference on
Language Resources and Evaluation.

class anndata.aio.elan.sppasEAF(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Elan EAF reader and writer.

__init__(name=None)[source]

Initialize a new sppasEAF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of EAF format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

format_point(second_count)[source]

Convert a time in seconds into ELAN format.

Parameters

second_count – (float) Time value (in seconds)

Returns

(int) a time in ELAN format
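ELAN stores TIME_VALUE attributes of its time slots as integer milliseconds, so the conversion is a scaling by 1000 plus rounding. A sketch with hypothetical function names:

```python
def seconds_to_elan(second_count: float) -> int:
    """Seconds -> ELAN time value (integer milliseconds)."""
    return int(round(second_count * 1000.))


def elan_to_seconds(time_value: str) -> float:
    """ELAN time value (milliseconds, as read from XML) -> seconds."""
    return int(time_value) / 1000.
```

Any sub-millisecond precision is lost on writing.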

static indent(elem, level=0)[source]

Pretty indent.

http://effbot.org/zone/element-lib.htm#prettyprint

make_point(midpoint)[source]

Convert data into the appropriate sppasPoint().

Parameters

midpoint – (str) a time in ELAN format

Returns

(sppasPoint) Representation of time in seconds with a (very) large vagueness!

read(filename)[source]

Read an ELAN EAF file.

Parameters

filename – (str) input filename.

write(filename)[source]

Write an ELAN EAF file.

Parameters

filename – output filename.

anndata.aio.htk module

filename

sppas.src.anndata.aio.htk.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of HTK native file formats.

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models.

The first version of the HTK Hidden Markov Model Toolkit was developed at the Speech Vision and Robotics Group of the Cambridge University Engineering Department (CUED) in 1989 by Steve Young.

class anndata.aio.htk.sppasBaseHTK(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS HTK files reader and writer.

__init__(name=None)[source]

Initialize a new sppasBaseHTK instance.

Parameters

name – (str) This transcription name.

static make_point(time_string)[source]

Convert data into the appropriate sppasPoint().

No radius if data is an integer. A default radius of 0.001 if data is a float.

Parameters

time_string – (str) a time in HTK format

Returns

sppasPoint() representing time in seconds.

class anndata.aio.htk.sppasLab(name=None)[source]

Bases: anndata.aio.htk.sppasBaseHTK

SPPAS LAB reader and writer.

Each line of an HTK label file contains the actual label optionally preceded by start and end times, and optionally followed by a match score.

[<start> <end>] <<name> [<score>]> [“;” <comment>]

Multiple alternatives are written as a sequence of separate label lists separated by three slashes (///).

Examples:
  • simple transcription:

    0000000 3600000 ice
    3600000 8200000 cream

  • alternative labels:

    0000000 2200000 I
    2200000 8200000 scream
    ///
    0000000 3600000 ice
    3600000 8200000 cream
    ///
    0000000 3600000 eyes
    3600000 8200000 cream

Only simple transcriptions are implemented so far.
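HTK label times are integers in units of 100 ns, so 3600000 means 0.36 s. A sketch of a reader for the simple transcription case only, in line with the limitation above; the function name and tuple output are hypothetical, and the optional score is ignored.

```python
from typing import List, Optional, Tuple

HTK_UNITS_PER_SECOND = 1e7  # HTK times are in 100 ns units


def parse_lab(text: str) -> List[Tuple[Optional[float], Optional[float], str]]:
    """Parse simple HTK label lines: [<start> <end>] <name> [<score>] [; comment]."""
    annotations = []
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop the optional comment
        if not line:
            continue
        if line == "///":
            raise NotImplementedError("alternative label lists are not handled")
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            start = int(fields[0]) / HTK_UNITS_PER_SECOND
            end = int(fields[1]) / HTK_UNITS_PER_SECOND
            annotations.append((start, end, fields[2]))
        else:
            annotations.append((None, None, fields[0]))  # label without times
    return annotations


print(parse_lab("0000000 3600000 ice\n3600000 8200000 cream"))
# [(0.0, 0.36, 'ice'), (0.36, 0.82, 'cream')]
```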

__init__(name=None)[source]

Initialize a new sppasLab instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of HTK-Lab format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read a transcription from a file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

anndata.aio.phonedit module

filename

sppas.src.anndata.aio.phonedit.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of Phonedit-Signaix native file formats.

PHONEDIT Signaix is software for the analysis of sound, aerodynamic, articulatory and electro-physiological signals, developed by the Parole et Langage Laboratory, Aix-en-Provence, France.

It provides a complete environment for recording, playback, display, analysis and labeling of multi-parametric data.

http://www.lpl-aix.fr/~lpldev/phonedit/

class anndata.aio.phonedit.sppasBasePhonedit(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Readers and writers of Phonedit files.

__init__(name=None)[source]

Initialize a new sppasBasePhonedit instance.

Parameters

name – (str) This transcription name.

class anndata.aio.phonedit.sppasMRK(name=None)[source]

Bases: anndata.aio.phonedit.sppasBasePhonedit

Reader and writer of Phonedit MRK files.

Example of the old format:

    [DSC_LEVEL_AA]
    DSC_LEVEL_NAME="transcription"
    DSC_LEVEL_CREATION_DATE=2018/03/09 09:57:07
    DSC_LEVEL_LASTMODIF_DATE=2018/03/09 09:57:07
    DSC_LEVEL_SOFTWARE=Phonedit Application 4.2.0.8
    [LBL_LEVEL_AA]
    LBL_LEVEL_AA_000000= "#" 0.000000 2497.100755
    LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
    LBL_LEVEL_AA_000002= "#" 5683.888038 5743.602653
    LBL_LEVEL_AA_000003= "ipu_2" 5743.602653 8460.595544

The new MRK format includes sections for time slots.
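A sketch of how the label lines of the old format could be extracted with a regular expression. The function name and the output tuples are hypothetical, and treating the times as milliseconds is an inference from the example values (about 8.5 s for two IPUs), not a documented fact.

```python
import re
from typing import List, Tuple

# Matches lines such as: LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
LBL_LINE = re.compile(
    r'^LBL_LEVEL_([A-Za-z]+)_(\d+)=\s*"(.*)"\s+([\d.]+)\s+([\d.]+)\s*$')


def parse_mrk_labels(text: str) -> List[Tuple[str, float, float, str]]:
    """Return (level, begin_s, end_s, label) for each label line of an old-format MRK."""
    labels = []
    for line in text.splitlines():
        match = LBL_LINE.match(line.strip())
        if match is None:
            continue  # section headers, DSC_* metadata, blank lines
        level, _index, label, begin, end = match.groups()
        # Assumption: times are in milliseconds, as the example values suggest.
        labels.append((level, float(begin) / 1000., float(end) / 1000., label))
    return labels
```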

__init__(name=None)[source]

Initialize a new sppasMRK instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of MRK format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static format_point(second_count)[source]

Convert a time in seconds into MRK format.

static make_point(midpoint)[source]

In Phonedit, the localization is a time value, so a float.

Parameters

midpoint – (str) a time in MRK format

Returns

sppasPoint() representing time in seconds.

read(filename)[source]

Read a Phonedit mark file.

Parameters

filename – input filename.

write(filename)[source]

Write a Phonedit mark file.

Parameters

filename – output filename.

class anndata.aio.phonedit.sppasSignaix(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Reader and writer of F0 values from LPL-Signaix.

__init__(name=None)[source]

Initialize a new sppasSignaix instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of Signaix format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename, delta=0.01)[source]

Read a file with Pitch values sampled at delta seconds.

The file contains one value per line. If the audio file is 30 seconds long and delta is 0.01, we expect 100 * 30 = 3,000 lines in the file.

Parameters
  • filename – (str) input filename.

  • delta – (float) sampling period of the file. Default is one F0 value every 10 ms, i.e. 100 values per second.
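With one value per line, the time of each value follows from its line index and delta. A sketch returning `(time, value)` pairs rather than a tier; both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def read_f0_values(lines: Iterable[str], delta: float = 0.01) -> List[Tuple[float, float]]:
    """Pair the i-th line with time i * delta (seconds).

    Blank lines produce no point but still count as one time step.
    """
    points = []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue
        points.append((round(i * delta, 6), float(line)))
    return points


def expected_line_count(duration: float, delta: float = 0.01) -> int:
    """Number of values expected for a signal of the given duration."""
    return int(round(duration / delta))


print(expected_line_count(30.0))  # 3000
```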

write(filename)[source]

Write a file with pitch values.

Parameters

filename – (str) output filename

anndata.aio.praat module

filename

sppas.anndata.aio.praat.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

The Praat files reader/writer.

class anndata.aio.praat.sppasBaseNumericalTier(name=None)[source]

Bases: anndata.aio.praat.sppasBasePraat

SPPAS PitchTier, IntensityTier, etc. reader and writer.

Support of Praat file formats with only one tier of numerical values like pitch, intensity, etc.

__init__(name=None)[source]

Initialize a new sppasBaseNumericalTier instance.

Parameters

name – (str) This transcription name.

class anndata.aio.praat.sppasBasePraat(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Base class for readers and writers of Praat files.

Praat: doing phonetics by computer, is a GPL tool developed by:

Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam, The Netherlands

See: http://www.fon.hum.uva.nl/praat/

__init__(name=None)[source]

Initialize a new Praat instance.

Parameters

name – (str) This transcription name.

static make_point(midpoint, radius=0.0005)[source]

The localization is a time value, so a float.

Parameters
  • midpoint – (float, str, int) a time value (in seconds).

  • radius – (float): vagueness (in seconds)

Returns

(sppasPoint)

class anndata.aio.praat.sppasIntensityTier(name=None)[source]

Bases: anndata.aio.praat.sppasPitchTier

SPPAS IntensityTier reader and writer.

__init__(name=None)[source]

Initialize a new sppasIntensityTier instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of IntensityTier format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read an IntensityTier file.

Parameters

filename – (str) the input file name

write(filename)[source]

Write an IntensityTier file.

Parameters

filename – (str)

class anndata.aio.praat.sppasPitchTier(name=None)[source]

Bases: anndata.aio.praat.sppasBaseNumericalTier

SPPAS PitchTier reader and writer.

__init__(name=None)[source]

Initialize a new sppasPitchTier instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of PitchTier format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read a PitchTier file.

Parameters

filename – (str) the input file name

to_pitch()[source]

Convert the PitchTier to Pitch values.

Returns

list of pitch values with delta = 0.01

write(filename)[source]

Write a PitchTier file.

Parameters

filename – (str)

class anndata.aio.praat.sppasTextGrid(name=None)[source]

Bases: anndata.aio.praat.sppasBasePraat

SPPAS TextGrid reader and writer.

TextGrid has the following properties:

  • supports multiple tiers in a file;

  • does not support empty files (files with no tiers);

  • does not support alternative labels nor alternative localizations: only the ones with the best score are saved;

  • does not support controlled vocabularies;

  • does not support hierarchy;

  • does not support metadata;

  • does not support media assignment;

  • supports points and intervals;

  • does not support disjoint intervals;

  • does not support alternative tags (here called “text”);

  • does not support radius.

Both “short TextGrid” and “long TextGrid” file formats are supported.
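For reference, a sketch of the “long TextGrid” layout for a single interval tier. The function name is hypothetical. It assumes the intervals are already contiguous from xmin to xmax (for example after filling gaps), since Praat interval tiers do not allow holes, and it escapes a double quote inside a string by doubling it, as Praat does.

```python
from typing import List, Tuple


def write_long_textgrid(tier_name: str,
                        intervals: List[Tuple[float, float, str]],
                        xmin: float, xmax: float) -> str:
    """Serialize one contiguous interval tier in the long TextGrid layout."""
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = %s' % xmin,
        'xmax = %s' % xmax,
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        '        name = "%s"' % tier_name.replace('"', '""'),
        '        xmin = %s' % xmin,
        '        xmax = %s' % xmax,
        '        intervals: size = %d' % len(intervals),
    ]
    for i, (begin, end, text) in enumerate(intervals, start=1):
        lines.append('        intervals [%d]:' % i)
        lines.append('            xmin = %s' % begin)
        lines.append('            xmax = %s' % end)
        # Praat escapes a double quote inside a string by doubling it.
        lines.append('            text = "%s"' % text.replace('"', '""'))
    return "\n".join(lines) + "\n"
```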

__init__(name=None)[source]

Initialize a new sppasTextGrid instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of TextGrid format or not.

First try to open the file with the default SPPAS encoding, then with UTF-16.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read a TextGrid file.

Parameters

filename – (str) the input file name, ending with “.TextGrid”

write(filename)[source]

Write a TextGrid file.

Parameters

filename – (str)

anndata.aio.readwrite module

filename

sppas.anndata.aio.readwrite.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

The annotated files main reader/writer.

class anndata.aio.readwrite.FileFormatProperty(extension)[source]

Bases: object

Represent one format and its properties.

__init__(extension)[source]

Create a FileFormatProperty instance.

Parameters

extension – (str) File name extension.

get_extension()[source]

Return the extension, including the initial dot.

get_reader()[source]

Return True if SPPAS can read files of the extension.

get_software()[source]

Return the name of the software matching the extension.

get_trs_type()[source]

Return the transcription type: ANNOT, MEASURE, TABLE or None.

get_writer()[source]

Return True if SPPAS can write files of the extension.

class anndata.aio.readwrite.sppasTrsRW(filename)[source]

Bases: object

Main parser of annotated data: Reader and writer of annotated data.

All three types of annotated files are supported: ANNOT, MEASURE, TABLE.

TRANSCRIPTION_TYPES = {'IntensityTier': <class 'anndata.aio.praat.sppasIntensityTier'>, 'PitchTier': <class 'anndata.aio.praat.sppasPitchTier'>, 'TextGrid': <class 'anndata.aio.praat.sppasTextGrid'>, 'ant': <class 'anndata.aio.annotationpro.sppasANT'>, 'antx': <class 'anndata.aio.annotationpro.sppasANTX'>, 'anvil': <class 'anndata.aio.anvil.sppasAnvil'>, 'arff': <class 'anndata.aio.table.sppasARFF'>, 'aup': <class 'anndata.aio.audacity.sppasAudacity'>, 'csv': <class 'anndata.aio.text.sppasCSV'>, 'ctm': <class 'anndata.aio.sclite.sppasCTM'>, 'eaf': <class 'anndata.aio.elan.sppasEAF'>, 'hz': <class 'anndata.aio.phonedit.sppasSignaix'>, 'lab': <class 'anndata.aio.htk.sppasLab'>, 'mrk': <class 'anndata.aio.phonedit.sppasMRK'>, 'srt': <class 'anndata.aio.subtitle.sppasSubRip'>, 'stm': <class 'anndata.aio.sclite.sppasSTM'>, 'sub': <class 'anndata.aio.subtitle.sppasSubViewer'>, 'tdf': <class 'anndata.aio.xtrans.sppasTDF'>, 'tra': <class 'anndata.aio.table.sppasTRA'>, 'trs': <class 'anndata.aio.transcriber.sppasTRS'>, 'txt': <class 'anndata.aio.text.sppasRawText'>, 'vtt': <class 'anndata.aio.subtitle.sppasWebVTT'>, 'xra': <class 'anndata.aio.xra.sppasXRA'>, 'xrff': <class 'anndata.aio.table.sppasXRFF'>}

__init__(filename)[source]

Create a Transcription reader-writer.

Parameters

filename – (str)

static annot_extensions()[source]

Return the list of ANNOT extensions (case sensitive).

static create_trs_from_extension(filename)[source]

Return a transcription according to a given filename.

Only the extension of the filename is used.

Parameters

filename – (str)

Returns

Transcription()
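Choosing a reader-writer from the extension is essentially a dictionary lookup, as the TRANSCRIPTION_TYPES mapping above suggests. A sketch with a small hypothetical table mapping extensions to format names (not the SPPAS classes); the case-insensitive fallback is an assumption, since the extension lists of this class are described as case sensitive.

```python
import os

# Hypothetical subset of an extension -> format table (not the SPPAS classes).
FORMATS = {
    "TextGrid": "Praat TextGrid",
    "eaf": "ELAN",
    "lab": "HTK label",
    "ctm": "sclite CTM",
}


def format_from_extension(filename: str) -> str:
    """Pick a format from the file extension only; the file is not opened."""
    ext = os.path.splitext(filename)[1][1:]  # drop the leading dot
    if ext in FORMATS:
        return FORMATS[ext]
    # Assumption: fall back to a case-insensitive comparison.
    for key, name in FORMATS.items():
        if key.lower() == ext.lower():
            return name
    raise ValueError("Unsupported extension: %r" % ext)
```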

static create_trs_from_heuristic(filename)[source]

Return a transcription according to a given filename.

The given file is opened and a heuristic is used to determine its format.

Parameters

filename – (str)

Returns

Transcription()

static extensions()[source]

Return the whole list of supported extensions (case sensitive).

static extensions_in()[source]

Return the list of supported extensions for which a reader exists.

static extensions_out()[source]

Return the list of supported extensions for which a writer exists.

get_filename()[source]

Return the filename.

static measure_extensions()[source]

Return the list of MEASURE extensions (case sensitive).

read(heuristic=False)[source]

Read a transcription from a file.

Parameters

heuristic – (bool) if the extension of the file is unknown, use a heuristic to detect the format, then choose the reader-writer accordingly.

Returns

sppasTranscription reader-writer

set_filename(filename)[source]

Set a new filename.

Parameters

filename – (str)

static table_extensions()[source]

Return the list of TABLE extensions (case sensitive).

write(transcription)[source]

Write a transcription into a file.

Parameters

transcription – (sppasTranscription)

anndata.aio.sclite module

filename

sppas.src.anndata.aio.sclite.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of SCTK formats.

Sclite readers and writers: ctm, stm file formats. The program sclite is a tool for scoring and evaluating the output of speech recognition systems.

Sclite is part of the NIST SCTK Scoring Toolkit: https://www.nist.gov/itl/iad/mig/tools

File formats description: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm#ctm_fmt_name_0

Remark:

Because these formats allow comments, this class uses them as an opportunity to store metadata.

class anndata.aio.sclite.sppasBaseSclite(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS base Sclite reader and writer.

The current version does not fully support alternations.

__init__(name=None)[source]

Initialize a new sppasBaseSclite instance.

Parameters

name – (str) This transcription name.

static make_point(midpoint)[source]

The localization is a time value, so always a float.

class anndata.aio.sclite.sppasCTM(name=None)[source]

Bases: anndata.aio.sclite.sppasBaseSclite

SPPAS ctm reader and writer.

This is the reader/writer of the time-marked conversation input files used for scoring the output of speech recognizers via the NIST sclite() program. This file format is as follows (in BNF):

CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]

where:
  • <F> -> The waveform filename. NOTE: no path-names or extensions are expected.

  • <C> -> The waveform channel. Either “A” or “B”. The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.

  • <BT> -> The begin time (seconds) of the word, measured from the start time of the file.

  • <DUR> -> The duration (seconds) of the word.

  • <CONF> -> Optional confidence score.

The file must be sorted by the first three columns: the first and the second in ASCII order, and the third by a numeric order.

Lines beginning with ‘;;’ are considered comments and ignored by sclite. Blank lines are also ignored.
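The grammar above maps directly onto a whitespace split. A sketch of parsing one plain CTM line (alternations are not handled); `CtmToken` and `parse_ctm_line` are hypothetical names, not the SPPAS API.

```python
from typing import NamedTuple, Optional


class CtmToken(NamedTuple):
    """One CTM word: <F> <C> <BT> <DUR> word [<CONF>]."""
    filename: str
    channel: str
    begin: float
    duration: float
    word: str
    confidence: Optional[float]


def parse_ctm_line(line: str) -> Optional[CtmToken]:
    """Return a CtmToken, or None for comments (';;') and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError("Malformed CTM line: %r" % line)
    confidence = float(fields[5]) if len(fields) == 6 else None
    return CtmToken(fields[0], fields[1], float(fields[2]), float(fields[3]),
                    fields[4], confidence)


token = parse_ctm_line("7654 A 12.00 0.34 UM 0.87")
end_time = token.begin + token.duration  # end of the word, in seconds
```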

Not implemented yet: alternations, which are accepted in some extended CTM files. Examples:

    ;;
    7654 A * * <ALT_BEGIN>
    7654 A 12.00 0.34 UM
    7654 A * * <ALT>
    7654 A 12.00 0.34 UH
    7654 A * * <ALT_END>
    ;;
    5555 A * * <ALT_BEGIN>
    5555 A 222.77 0.32 BYEBYE
    5555 A * * <ALT>
    5555 A 222.78 0.12 BYE
    5555 A 222.93 0.16 BYE
    5555 A * * <ALT_END>
    ;;
    5555 A * * <ALT_BEGIN>
    5555 A 186.32 0.01 D-
    5555 A * * <ALT>
    5555 A * * <ALT_END>

__init__(name=None)[source]

Initialize a new CTM instance.

Parameters

name – (str) This transcription name.

static check_line(line, line_number=0)[source]

Check whether a line is an annotation or not.

Raises AioLineFormatError() or ValueError() in case of a malformed line.

Parameters
  • line – (str)

  • line_number – (int)

Returns

(bool)

static detect(filename)[source]

Check whether a file is of CTM format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static get_score(line)[source]

Return the score of the label of a given line.

Parameters

line – (str)

Returns

(float) or None if no score is given

get_tier(line)[source]

Return the tier related to the given line.

Find the tier or create it.

Parameters

line – (str)

Returns

(sppasTier)

read(filename)[source]

Read a ctm file and fill the Transcription.

It creates a tier for each media-channel observed in the file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sclite.sppasSTM(name=None)[source]

Bases: anndata.aio.sclite.sppasBaseSclite

SPPAS stm reader and writer.

This is the reader/writer for the segment time marked files to be used for scoring the output of speech recognizers via the NIST sclite() program.

STM :== <F> <C> <S> <BT> <ET> [ <LABEL> ] transcript …

where:

  • <F> -> The waveform filename. NOTE: no pathnames or extensions are expected.

  • <C> -> The waveform channel. Either “A” or “B”. The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.

  • <S> -> The speaker id; no restrictions apply to this name.

  • <BT> -> The begin time (seconds) of the segment, measured from the start time of the file.

  • <ET> -> The end time (seconds) of the segment.

  • <LABEL> -> A comma-separated list of subset identifiers enclosed in angle brackets.

transcript -> The transcript can take on two forms:

  1. a whitespace-separated list of words, or

  2. the string “IGNORE_TIME_SEGMENT_IN_SCORING”.

The list of words can contain a transcript alternation using the following BNF format:

ALTERNATE :== “{” <text> ALT+ “}”
ALT :== “|” <text>
TEXT :== 1 thru n words | “@” | ALTERNATE

The file must be sorted by the first and second columns in ASCII order, and the fourth in numeric order.

Lines beginning with ‘;;’ are considered comments and are ignored. Blank lines are also ignored.
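As an illustration only, the STM line layout described above can be parsed with a few lines of plain Python. `parse_stm_line` and `StmSegment` are hypothetical names, not part of SPPAS. The sketch handles comments, blank lines and the optional `<LABEL>` field, but leaves the `{ | }` alternation syntax inside the transcript untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StmSegment:
    """One STM segment (hypothetical container, not an SPPAS class)."""
    filename: str
    channel: str
    speaker: str
    begin: float
    end: float
    labels: List[str]
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse one STM line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    # <F> <C> <S> <BT> <ET>, then the rest of the line.
    fields = line.split(None, 5)
    if len(fields) < 5:
        raise ValueError("malformed STM line: %r" % line)
    filename, channel, speaker, begin, end = fields[:5]
    rest = fields[5] if len(fields) == 6 else ""
    labels: List[str] = []
    # Optional <LABEL> field: comma-separated identifiers in angle brackets.
    if rest.startswith("<"):
        closing = rest.find(">")
        if closing == -1:
            raise ValueError("unterminated <LABEL> field: %r" % line)
        labels = [x for x in rest[1:closing].split(",") if x]
        rest = rest[closing + 1:].strip()
    return StmSegment(filename, channel, speaker,
                      float(begin), float(end), labels, rest)
```

For example, `parse_stm_line("2345 A spk1 0.50 2.75 <o,f0,male> hello world")` yields a segment from 0.5 to 2.75 seconds with the labels `o`, `f0`, `male` and the transcript `hello world`.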

__init__(name=None)[source]

Initialize a new STM instance.

Parameters

name – (str) This transcription name.

static check_line(line, line_number=0)[source]

Check whether a line is an annotation or not.

Raises AioLineFormatError() or ValueError() in case of a malformed line.

Parameters
  • line – (str)

  • line_number – (int)

Returns

(bool)

static detect(filename)[source]

Check whether a file is of STM format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

get_tier(line)[source]

Return the tier related to the given line.

Find the tier or create it.

Parameters

line – (str)

Returns

(sppasTier)

read(filename)[source]

Read an STM file and fill the Transcription.

It creates a tier for each media-channel observed in the file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

anndata.aio.subtitle module

filename

sppas.src.anndata.aio.subtitle.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of subtitle formats (.sub, .srt…).

SubViewer is a utility for adding and synchronizing subtitles to video content. It was created by David Dolinski in 1999. Precision in time is 10ms.

SubRip is a free software program for Windows which “rips” (extracts) subtitles and their timings from video. It is free software, released under the GNU GPL. SubRip is also the name of the widely used and broadly compatible subtitle text file format created by this software. Precision in time is 1ms.

WebVTT (Web Video Text Tracks) is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track> element. Main differences from SubRip are:

  • WebVTT’s first line starts with WEBVTT after the optional UTF-8 byte order mark

  • There is space for optional header data between the first line and the first cue

  • Timecode fractional values are separated by a full stop instead of a comma

  • Timecode hours are optional

  • The frame numbering/identification preceding the timecode is optional

  • Comments identified by the word NOTE can be added

  • Metadata information can be added in a JSON-style format

  • Chapter information can be optionally specified

  • Only supports extended characters as UTF-8

  • CSS in a separate file defined in the companion HTML document for C tags is used instead of the FONT tag

  • Cue settings allow the customization of cue positioning on the video
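To make the timecode differences concrete, here is a small sketch that assumes times are held as float seconds (as `make_point` below suggests). `format_timecode` is a hypothetical helper. SubRip always writes the hours and puts a comma before the milliseconds, whereas WebVTT uses a full stop and may omit the hours.

```python
def format_timecode(seconds: float, style: str = "srt") -> str:
    """Format a time in seconds as an SRT or WebVTT timecode (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("negative time: %r" % seconds)
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    if style == "srt":
        # SubRip: hours are mandatory, comma before milliseconds.
        return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)
    if style == "vtt":
        # WebVTT: full stop before milliseconds; hours omitted here when zero.
        if hours:
            return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)
        return "%02d:%02d.%03d" % (minutes, secs, millis)
    raise ValueError("unknown style: %r" % style)
```

With this sketch, 3723.5 seconds becomes `01:02:03,500` in SubRip and `01:02:03.500` in WebVTT, while 83.25 seconds becomes `01:23.250` in WebVTT.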

class anndata.aio.subtitle.sppasBaseSubtitles(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS base class for subtitle formats.

__init__(name=None)[source]

Initialize a new sppasBaseSubtitles instance.

Parameters

name – (str) This transcription name.

static make_point(midpoint)[source]

In subtitles, the localization is a time value, so a float.

class anndata.aio.subtitle.sppasSubRip(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for SRT format.

The SubRip text file format (SRT) is used by the SubRip program to save subtitles ripped from video files or DVDs. It is free software, released under the GNU GPL.

Each subtitle is represented as a group of lines. Subtitles are separated by a blank line.

  • the first line of a subtitle is an index (starting from 1);

  • the second line is a timestamp interval, in the format HH:MM:SS,mmm (milliseconds after the comma), with the start and end of the range separated by -->;

  • optionally: a specific positioning by pixels, in the form X1:number Y1:number X2:number Y2:number;

  • the third line is the label. The HTML <b>, <i>, <u>, and <font> tags are allowed.
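A rough sketch of reading one SRT subtitle block as described above, assuming the file has already been split on blank lines. `parse_srt_block` is hypothetical. It ignores the optional pixel positioning and keeps any HTML tags in the text.

```python
import re
from typing import Tuple

# "HH:MM:SS,mmm --> HH:MM:SS,mmm", optionally followed by X1:... positioning.
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt_block(block: str) -> Tuple[int, float, float, str]:
    """Return (index, begin, end, text) for one SRT subtitle block."""
    lines = [line.rstrip("\r") for line in block.strip().split("\n")]
    if len(lines) < 2:
        raise ValueError("incomplete SRT block")
    index = int(lines[0])
    match = _TIMING.match(lines[1].strip())
    if match is None:
        raise ValueError("bad timing line: %r" % lines[1])
    groups = match.groups()
    begin = _to_seconds(*groups[:4])
    end = _to_seconds(*groups[4:])
    # Everything after the timing line is the label (it may span several lines).
    return index, begin, end, "\n".join(lines[2:])
```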

__init__(name=None)[source]

Initialize a new sppasSubRip instance.

Parameters

name – (str) This transcription name.

read(filename)[source]

Read a SRT file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.subtitle.sppasSubViewer(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for SUB format.

The SubViewer text file format (SUB) is used by the SubViewer program to save subtitles of videos.

__init__(name=None)[source]

Initialize a new sppasSubViewer instance.

Parameters

name – (str) This transcription name.

read(filename)[source]

Read a SUB file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.subtitle.sppasWebVTT(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for VTT format.

Only the writer is implemented so far.

__init__(name=None)[source]

Initialize a new sppasWebVTT instance.

Parameters

name – (str) This transcription name.

write(filename)[source]

Write a transcription into a file.

Not fully implemented.

Parameters

filename – (str)

anndata.aio.table module

filename

sppas.src.anndata.aio.table.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Export annotated data into time-tables.

Weka is a collection of machine learning algorithms for data mining tasks: https://www.cs.waikato.ac.nz/ml/weka/

WEKA supports two file formats:

  1. ARFF: a simple ASCII file,

  2. XRFF: an XML file which can be compressed with gzip.

This module also implements the TRA format of SPPAS: the Table Rich Annotations format.

ONLY writers for ARFF, XRFF and TRA are implemented.

class anndata.aio.table.sppasARFF(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS ARFF writer.

The ARFF format description is at the following URL: http://weka.wikispaces.com/ARFF+(book+version). An ARFF file for WEKA has the following structure:

  1. Several comment lines, each starting with ‘%’,

  2. The name of the relation,

  3. The set of attributes,

  4. The set of instances.
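The four-part structure listed above can be illustrated with a toy writer. This is only a sketch: `write_arff` and its arguments are hypothetical, and the relation, attribute names and values in the usage note are made up. SPPAS builds its attributes from tier metadata instead.

```python
from typing import List, Sequence, Tuple


def write_arff(relation: str,
               attributes: List[Tuple[str, str]],
               instances: Sequence[Sequence[str]],
               comment: str = "") -> str:
    """Return ARFF text: comments, relation, attributes, then instances."""
    out = []
    # 1. Comment lines start with '%'.
    for line in comment.splitlines():
        out.append("% " + line)
    # 2. The name of the relation.
    out.append("@RELATION %s" % relation)
    out.append("")
    # 3. The attributes: a type such as NUMERIC or STRING, or a nominal set {a,b,c}.
    for name, att_type in attributes:
        out.append("@ATTRIBUTE %s %s" % (name, att_type))
    out.append("")
    # 4. The instances, one comma-separated line each.
    out.append("@DATA")
    for row in instances:
        out.append(",".join(row))
    return "\n".join(out) + "\n"
```

For instance, `write_arff("phonemes", [("duration", "NUMERIC"), ("class", "{a,e,i}")], [["0.08", "a"]])` produces a relation header, two attribute lines and one data line.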

__init__(name=None)[source]

Initialize a new sppasARFF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of the appropriate format or not.

write(filename)[source]

Write an ARFF file.

Parameters

filename – (str)

class anndata.aio.table.sppasTRA(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS TRA writer: the Table Rich Annotations format.

This format contains the set of instances separated by ‘;’. It can be easily parsed like a CSV file.
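Because TRA instances are separated by ‘;’, Python's standard `csv` module can read them once the delimiter is changed. The following is a minimal sketch: `read_tra` is a hypothetical name, and the sample rows in the test are invented. They do not reflect the actual column layout that SPPAS writes.

```python
import csv
import io
from typing import List


def read_tra(text: str) -> List[List[str]]:
    """Split TRA-like text into rows of fields, using ';' as delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    # Skip empty rows (e.g. blank lines).
    return [row for row in reader if row]
```

Using `csv` rather than a plain `split(";")` keeps quoted fields that contain a ‘;’ intact.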

__init__(name=None)[source]

Initialize a new sppasTRA instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of CSV format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

write(filename, signed=True)[source]

Write a raw text file with data in a table.

If signed is False, the default encoding is used.

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed.

class anndata.aio.table.sppasTable(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS Base writer for ARFF and XRFF formats.

The following metadata of the Transcription object can be defined:

  • table_instance_step: time step for the data instances. Do not define it if “table_instance_anchor” is set to a tier.

  • table_max_class_tags

  • table_max_attributes_tags

  • table_empty_annotation_tag

  • table_empty_annotation_class_tag

  • table_uncertain_annotation_tag

The following metadata can be defined in a tier:

  • table_attribute is set if the tier will be used as an attribute (i.e. its data will be part of the instances). The value can be “numeric” to use distributions of probabilities or “label” to use the annotation labels in the vector of parameters.

  • table_class is set on the tier with the annotation labels to be inferred by the classification system. Its value does not matter.

  • table_instance_anchor is set if the tier has to be used to define the time intervals of the instances.

  • table_epsilon: probability of an unobserved tag.

Notice that the anchor tier can also be either an attribute tier or the class tier. TODO: BUG IF ANCHOR == CLASS

__init__(name=None)[source]

Initialize a new sppasTable instance.

Parameters

name – (str) This transcription name.

static check_max_attributes_tags(nb_tags)[source]

Check the maximum number of tags for an attribute.

Parameters

nb_tags – (int) Size of the controlled vocabulary of the attribute tier

static check_max_class_tags(nb_tags)[source]

Check the maximum number of tags for the class.

Parameters

nb_tags – (int) Size of the controlled vocabulary of the class tier

check_metadata()[source]

Check the metadata and fix the variable members.

create_class_tier()[source]

Return a tier with a single annotation to be used as class.

get_max_class_tags()[source]

Return the maximum number of tags for the class.

set_empty_annotation_class_tag(tag_str=None)[source]

Fix the annotation string used to replace empty annotations in the class tier.

Parameters

tag_str – (str or None) None means that unlabelled annotations are not filled, so they are ignored in the data.

set_empty_annotation_tag(tag_str)[source]

Fix the annotation string used to replace empty annotations.

Parameters

tag_str – (str)

set_max_attributes_tags(nb_tags)[source]

Set the maximum number of tags for an attribute.

Beyond this number, the program won’t list the attribute values and will use ‘STRING’ instead.

Parameters

nb_tags – (int) Size of the controlled vocabulary of the attribute tier

set_max_class_tags(nb_tags)[source]

Set the maximum number of tags for a class.

Parameters

nb_tags – (int) Size of the controlled vocabulary of the class tier

set_uncertain_annotation_tag(tag_str)[source]

Fix the annotation string used in the annotations to mention an uncertain label.

Parameters

tag_str – (str)

validate()[source]

Check the tiers.

Verify if everything is ok:

  1. A class is defined: “table_class” in the metadata of a tier

  2. Attributes are fixed: “table_attribute” in the metadata of at least one tier

Raises IOError or ValueError if something is wrong.

validate_annotations()[source]

Prepare data to be compatible with the expected format.

  • Convert tier names

  • Delete the existing controlled vocabularies

  • Convert tags: fill empty tags, replace whitespace by underscores

class anndata.aio.table.sppasXRFF(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS XRFF writer.

XML-based format of the WEKA software tool. The XRFF format description is at the following URL: http://weka.wikispaces.com/XRFF
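To give an idea of the XML layout, here is a sketch that builds a minimal XRFF tree with `xml.etree.ElementTree`. It assumes the structure shown in the WEKA documentation: `dataset`, then `header/attributes`, then `body/instances`, with the class attribute flagged by `class="yes"`. `build_xrff` and its arguments are hypothetical. Like the class documented here, the sketch writes no weights and applies no gzip compression.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence


def build_xrff(name: str,
               numeric: List[str],
               class_name: str,
               class_labels: List[str],
               instances: Sequence[Sequence[str]]) -> ET.Element:
    """Build a minimal XRFF <dataset> tree (uncompressed, no weights)."""
    dataset = ET.Element("dataset", name=name)
    attributes = ET.SubElement(ET.SubElement(dataset, "header"), "attributes")
    for att in numeric:
        ET.SubElement(attributes, "attribute", name=att, type="numeric")
    # The class attribute is nominal and flagged with class="yes"
    # ("class" is a Python keyword, hence the attribute dict).
    cls = ET.SubElement(attributes, "attribute",
                        {"class": "yes", "name": class_name, "type": "nominal"})
    labels = ET.SubElement(cls, "labels")
    for lab in class_labels:
        ET.SubElement(labels, "label").text = lab
    body = ET.SubElement(ET.SubElement(dataset, "body"), "instances")
    for row in instances:
        inst = ET.SubElement(body, "instance")
        for value in row:
            ET.SubElement(inst, "value").text = value
    return dataset
```

`ET.tostring(build_xrff(...), encoding="unicode")` then gives the XML text to write to disk.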

This class is limited to:
  1. Only the writers are implemented. No readers.

  2. The sparse option is supported by neither writer.

  3. XRFF output file is not gzipped.

  4. The XRFF format supports the following features, which are not currently implemented in this class:

    • attribute weights;

    • instance weights.

– !!!!!!!! No guarantee: this class has never been tested. !!!!!! –

__init__(name=None)[source]

Initialize a new sppasXRFF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of the appropriate format or not.

write(filename)[source]

Write an XRFF file.

Parameters

filename – (str)

anndata.aio.text module

filename

sppas.src.anndata.aio.text.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Text readers and writers for raw text, column-based text, csv.

class anndata.aio.text.sppasBaseText(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS base text reader and writer.

__init__(name=None)[source]

Initialize a new sppasBaseText instance.

Parameters

name – (str) This transcription name.

static create_media(media_url, meta_object)[source]

Return the media of the given name (create it if necessary).

Parameters
  • media_url – (str) Name (url) of the media to search/create

  • meta_object – (sppasTranscription)

Returns

(sppasMedia)

static fix_location(content_begin, content_end)[source]

Fix the location from the content of the data.

Parameters
  • content_begin – (str) The content of a column representing the begin of a localization.

  • content_end – (str) The content of a column representing the end of a localization.

Returns

sppasLocation or None

static format_quotation_marks(text)[source]

Remove the initial and final quotation marks.

Parameters

text – (str/unicode) Text to clean

Returns

(unicode) the text without its initial and final quotation marks.

static get_lines_columns(lines)[source]

Column-delimited? Search for the relevant separator.

Parameters

lines – (list of str)

Returns

lines (list) of columns (list of str)
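One plausible way to search for the relevant separator is to try a few candidates and keep the first one that splits every non-empty line into the same number of columns, at least two. This is a hypothetical sketch, not the SPPAS algorithm. The candidate list, its order and the acceptance rule are all assumptions.

```python
from typing import List, Optional, Sequence

# Assumed candidates, tried in this order.
CANDIDATE_SEPARATORS = ("\t", ";", ",", " ")


def guess_columns(lines: Sequence[str],
                  candidates: Sequence[str] = CANDIDATE_SEPARATORS
                  ) -> Optional[List[List[str]]]:
    """Split lines into columns with the first consistent separator, else None."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not rows:
        return None
    for sep in candidates:
        split = [row.split(sep) for row in rows]
        widths = {len(cols) for cols in split}
        # Accept the separator only if all lines have the same column count >= 2.
        if len(widths) == 1 and widths.pop() >= 2:
            return split
    return None
```

Trying the space last avoids splitting free text whose lines happen to contain a regular number of words.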

static is_comment(line)[source]

Check if the line is a comment, i.e. starts with ‘;;’.

Parameters

line – (str/unicode)

Returns

boolean

static make_point(data)[source]

Convert data into the appropriate sppasPoint().

No radius is fixed if data is an integer. A default radius of 0.001 seconds is used if data is a float.

Parameters

data – (any type)

Returns

sppasPoint().

static serialize_header(filename, meta_object)[source]

Create a comment with the metadata to be written.

Parameters
  • filename – (str) Name of the file to serialize.

  • meta_object – (sppasMeta)

static serialize_header_software()[source]

Serialize the header of a file with SPPAS information.

static serialize_metadata(meta_object)[source]

Serialize the metadata of an object in a multi-lines comment.

static split_lines(lines, separator=' ')[source]

Split the lines with the given separator.

Parameters
  • lines – (list) List of lines

  • separator – (char) a character used to separate columns of the lines

Returns

Lines (list) separated by columns (list) or None if error.

class anndata.aio.text.sppasCSV(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS CSV reader and writer.

Author

Brigitte Bigi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

__init__(name=None)[source]

Initialize a new CSV instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of CSV format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

format_columns_lines(lines)[source]

Append lines content into self.

The algorithm does not assume that the file is sorted by tiers.

Parameters

lines – (list)

read(filename, signed=True)[source]

Read a CSV file.

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed. If False, the default encoding is used.

write(filename, signed=True)[source]

Write a CSV file.

Because a label must fit on a single line, whitespace is used to separate labels (instead of CR, as in other formats like TextGrid).

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed. If False, the default encoding is used.
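In Python terms, a "signed" UTF-8 file is one that starts with a byte order mark (BOM). The standard `utf-8-sig` codec writes the BOM when encoding and strips it when decoding. The sketch below serializes rows accordingly. The function name, the row layout (tier name, begin, end, label) and the use of plain UTF-8 as the "default encoding" are assumptions made for illustration, not a description of the exact SPPAS output.

```python
import csv
import io
from typing import Iterable, Tuple

Row = Tuple[str, float, float, str]


def encode_csv(rows: Iterable[Row], signed: bool = True) -> bytes:
    """Serialize (tier, begin, end, label) rows; signed=True prepends a UTF-8 BOM."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
    for tier, begin, end, label in rows:
        # A label must fit on one line, so any line break becomes a space.
        writer.writerow([tier, begin, end, " ".join(label.split())])
    # 'utf-8-sig' writes the byte order mark; plain UTF-8 stands in here
    # for "the default encoding" (an assumption).
    return buffer.getvalue().encode("utf-8-sig" if signed else "utf-8")
```

The result can be saved with `open("out.csv", "wb").write(encode_csv(rows))`.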

class anndata.aio.text.sppasRawText(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS raw text reader and writer.

Author

Brigitte Bigi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

  • RawText does not support multiple tiers for writing (it does for reading): RawText can save only one tier. RawText accepts no tiers.

  • RawText does not support alternative labels or alternative locations. Only the ones with the best score are saved.

  • RawText does not support controlled vocabularies, hierarchy, metadata or media assignment.

  • RawText supports points and intervals. It does not support disjoint intervals.

  • RawText does not support alternative tags or radius.

RawText supports comments: such lines are starting with ‘;;’.

__init__(name=None)[source]

Initialize a new sppasRawText instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Detect if file is text.

read(filename)[source]

Read a raw file and fill the Transcription.

The file can be a simple raw text (without location information). It can also be a column-based (table-style) file, in which each column represents the annotation of a tier (the 1st and 2nd columns indicate the location).

Parameters

filename – (str)

write(filename)[source]

Write a RawText file.

Labels are preserved, i.e. separated by whitespace, with alternative tags included.

Parameters

filename – (str)

anndata.aio.transcriber module

filename

sppas.src.anndata.aio.transcriber.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of the deprecated Transcriber transcription tool.

Transcriber is a tool for assisting the manual annotation of speech signals. It provides a graphical user interface for segmenting long duration speech recordings, transcribing them, and labeling speech turns, topic changes and acoustic conditions. It is more specifically designed for the annotation of broadcast news recordings.

http://trans.sourceforge.net

class anndata.aio.transcriber.sppasTRS(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS reader for TRS format.

__init__(name=None)[source]

Initialize a new sppasTRS instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of TRS format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so a float.

read(filename)[source]

Read a TRS file and fill the Transcription.

<!ELEMENT Trans ((Speakers|Topics)*,Episode)>

Parameters

filename – (str)
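In a Transcriber file, the text of a speaker turn is cut by `<Sync time="..."/>` elements. Each text chunk therefore starts at one Sync and ends at the next Sync, or at the `endTime` of the turn. The sketch below extracts such segments from a `<Turn>` element using only `xml.etree.ElementTree`. `turn_segments` is hypothetical. It keeps the text that follows other inline elements (`<Event>`, `<Who>`...) but otherwise ignores them.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def turn_segments(turn: ET.Element) -> List[Tuple[float, float, str]]:
    """Return (begin, end, text) for each Sync-delimited chunk of a <Turn>."""
    segments = []
    current_time = float(turn.get("startTime", "0"))
    words = [turn.text or ""]
    for child in turn:
        if child.tag == "Sync":
            # A Sync closes the running chunk and opens a new one.
            text = " ".join(" ".join(words).split())
            new_time = float(child.get("time"))
            if text:
                segments.append((current_time, new_time, text))
            current_time, words = new_time, []
        # Text following any child belongs to the current chunk.
        words.append(child.tail or "")
    text = " ".join(" ".join(words).split())
    if text:
        segments.append((current_time, float(turn.get("endTime")), text))
    return segments
```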

anndata.aio.xra module

filename

sppas.src.anndata.aio.xra.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of SPPAS native file formats (.xra, .jra).

class anndata.aio.xra.sppasJRA[source]

Bases: object

JRA is intended to be the next default format of annotated files.

static format_label(label_root, label)[source]

Add a ‘Label’ in the dict from a sppasLabel().

Parameters
  • label_root – (list)

  • label – (sppasLabel)

static parse_label(tags_list)[source]

Create a sppasLabel from a list of tag dictionaries.

class anndata.aio.xra.sppasXRA(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS XRA reader and writer.

xra files are the native file format of the GPL tool SPPAS.

__init__(name=None)[source]

Initialize a new XRA instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of XRA format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static format_annotation(annotation_root, annotation)[source]

Add an ‘Annotation’ element in the tree from a sppasAnnotation().

Parameters
  • annotation_root – (ET) XML Element tree root.

  • annotation – (sppasAnnotation)

static format_label(label_root, label)[source]

Add a ‘Label’ element in the tree from a sppasLabel().

Parameters
  • label_root – (ET) XML Element tree root.

  • label – (sppasLabel)

static format_location(location_root, location)[source]

Add a ‘Location’ element in the tree from a sppasLocation().

Parameters
  • location_root – (ET) XML Element tree root.

  • location – (sppasLocation)

static format_metadata(metadata_root, meta_object, exclude=[])[source]

Add ‘Metadata’ element in the tree from a sppasMetaData().

Parameters
  • metadata_root – (ET) XML Element tree root.

  • meta_object – (sppasMetadata)

  • exclude – (list) List of keys to exclude

static format_tier(tier_root, tier)[source]

Add a ‘Tier’ element in the tree from a sppasTier().

Parameters
  • tier_root – (ET) XML Element tree root.

  • tier – (sppasTier)

static indent(elem, level=0)[source]

Pretty indent.

http://effbot.org/zone/element-lib.htm#prettyprint
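The effbot page referenced above describes a small recursive function that adds newlines and indentation to the `text` and `tail` of each element. A sketch in that spirit follows. It is written for illustration and is not copied from SPPAS. Since Python 3.9, the standard library's `xml.etree.ElementTree.indent()` provides the same service.

```python
import xml.etree.ElementTree as ET


def indent(elem: ET.Element, level: int = 0, space: str = "  ") -> None:
    """Indent an element tree in place by filling text/tail with whitespace."""
    pad = "\n" + level * space
    if len(elem):
        # Opening tag of a parent: put its first child on a new, deeper line.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in elem:
            indent(child, level + 1, space)
        # After the last child, come back to the parent's indentation level.
        last = elem[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Any non-root element is followed by a newline at its own level.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad
```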

static parse_label(label_root)[source]

Parse a ‘Label’ element and return it.

Parameters

label_root – (ET) XML Element tree root.

Returns

(sppasLabel)

read(filename)[source]

Read an XRA file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an XRA file.

Parameters

filename – (str)

anndata.aio.xtrans module

filename

sppas.src.anndata.aio.xtrans.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Input/Output of XTrans.

XTrans is a multi-platform, multilingual, multi-channel transcription tool that supports manual transcription and annotation of audio recordings. The last version of XTrans was released in 2009.

https://www.ldc.upenn.edu/language-resources/tools/xtrans

class anndata.aio.xtrans.sppasTDF(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS TDF reader.

This class implements a TDF reader, but not a writer. TDF is a Tab-Delimited Format. It contains 13 columns but SPPAS only extracts 8 of them.

TDF does not support alternative labels or alternative locations: only the ones with the best score are saved. TDF can save several tiers. TDF does not support controlled vocabularies, hierarchy or metadata. TDF supports media assignment. TDF supports intervals only. TDF does not support alternative tags or radius.
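As a rough illustration of reading a tab-delimited TDF line, the sketch below keeps the first eight columns. It assumes the usual XTrans column order (file, channel, start, end, speaker, speakerType, speakerDialect, transcript, followed by five further columns). It also assumes that comment lines start with ‘;;’ and that the header line starts with ‘file;’. Which eight columns SPPAS actually extracts is not stated here. `parse_tdf_line` and `TDF_COLUMNS` are hypothetical names.

```python
from typing import Dict, Optional

# Assumed names for the first 8 of the 13 TDF columns (XTrans order).
TDF_COLUMNS = ("file", "channel", "start", "end",
               "speaker", "speakerType", "speakerDialect", "transcript")


def parse_tdf_line(line: str) -> Optional[Dict[str, object]]:
    """Return the first 8 tab-separated fields of a TDF line as a dict, or None."""
    line = line.rstrip("\r\n")
    # Skip blank lines, ';;' comment lines and the 'file;unicode ...' header.
    if not line.strip() or line.startswith(";;") or line.startswith("file;"):
        return None
    fields = line.split("\t")
    if len(fields) < len(TDF_COLUMNS):
        raise ValueError("expected at least 8 columns, got %d" % len(fields))
    record: Dict[str, object] = dict(zip(TDF_COLUMNS, fields))
    # Typed columns: channel index and interval bounds in seconds.
    record["channel"] = int(fields[1])
    record["start"] = float(fields[2])
    record["end"] = float(fields[3])
    return record
```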

__init__(name=None)[source]

Initialize a new sppasTDF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of TDF format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so always a float.

read(filename)[source]

Read a TDF file and fill the sppasTranscription.

It creates a tier for each speaker-channel observed in the file.

Parameters

filename – (str)

Module contents

filename

sppas.src.anndata.aio.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Readers and writers of annotated data.

anndata.aio.format_label(text, empty='', tag_type='str')[source]

Create a label from a text.

Use the “{ | }” system to parse the alternative tags and = for scores.

Parameters
  • text – (str)

  • empty – (str) The text representing an empty tag.

  • tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).

Returns

sppasLabel

anndata.aio.format_labels(text, separator='\n', empty='', tag_type='str')[source]

Create a set of labels from a text.

Use the separator to split the text into labels. Use the “{ | }” system to parse the alternative tags.

Examples

  • text = “{le|les} {chat|chats}” is 2 labels with 2 tags each;

  • text = “{le=0.6|les=0.4}” is a label with 2 tags and their scores.

Parameters
  • text – (str)

  • separator – (str) String to separate labels.

  • empty – (str) The text representing an empty tag.

  • tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).

Returns

list of sppasLabel
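To show how the “{ | }” and “=” conventions combine, here is a small parser for a single label text that returns (tag, score) pairs. It is an illustrative sketch, and `parse_alternatives` is a hypothetical name. Unlike `format_label`, it returns plain tuples rather than a sppasLabel and ignores `empty` and `tag_type`. It also assumes that tags contain none of the characters ‘{’, ‘}’, ‘|’ or ‘=’.

```python
from typing import List, Optional, Tuple


def parse_alternatives(text: str) -> List[Tuple[str, Optional[float]]]:
    """Parse '{a=0.6|b=0.4}' or a plain tag into a list of (tag, score)."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        items = text[1:-1].split("|")
    else:
        items = [text]
    result: List[Tuple[str, Optional[float]]] = []
    for item in items:
        # 'tag=score' gives a scored tag; a bare 'tag' has no score.
        tag, sep, score = item.partition("=")
        result.append((tag.strip(), float(score) if sep else None))
    return result
```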

anndata.aio.serialize_label(label, empty='', alt=True)[source]

Convert the label into a string, include or not alternative tags.

Use the “{ | }” system to serialize the alternative tags. Scores of the tags are not returned.

Parameters
  • label – (sppasLabel)

  • empty – (str) The text to return if a tag is empty or not set.

  • alt – (bool) Include alternative tags

Returns

(str)

anndata.aio.serialize_labels(labels, separator='\n', empty='', alt=True)[source]

Create a text from a list of labels.

Use the separator to join the labels. Use the “{ | }” system to serialize the alternative tags.

Parameters
  • labels – (list of sppasLabel)

  • separator – (str) String separating labels

  • empty – (str) The text representing an empty tag

  • alt – (bool) Include alternative tags. If False, only the best tag is serialized.

Returns

(str)
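The reverse operation can be sketched on the same kind of (tag, score) pairs. Alternatives are wrapped in “{ | }”, and with `alt=False` only the best-scored tag is kept. The helpers `serialize_tags` and `serialize_all` are hypothetical. Following the `serialize_label` description, the scores are not written.

```python
from typing import List, Optional, Sequence, Tuple

Tag = Tuple[str, Optional[float]]


def serialize_tags(tags: Sequence[Tag], empty: str = "", alt: bool = True) -> str:
    """Serialize one label given as (tag, score) pairs; scores are not written."""
    if not tags:
        return empty
    if not alt or len(tags) == 1:
        # Keep the tag with the highest score (unscored tags count as 0).
        best = max(tags, key=lambda t: t[1] if t[1] is not None else 0.0)
        return best[0] or empty
    return "{" + "|".join(tag or empty for tag, _ in tags) + "}"


def serialize_all(labels: List[Sequence[Tag]], separator: str = "\n", **kwargs) -> str:
    """Join the serialized labels with the separator."""
    return separator.join(serialize_tags(tags, **kwargs) for tags in labels)
```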

class anndata.aio.sppasANT(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

AnnotationPro ANT reader and writer.

An ANT file is a ZIPPED directory.

__init__(name=None)[source]

Initialize a new sppasANT instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANT format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read an ANT file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an Ant file.

Parameters

filename – (str)

class anndata.aio.sppasANTX(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

AnnotationPro ANTX reader and writer.

__init__(name=None)[source]

Initialize a new sppasANTX instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANTX format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

elt_to_meta(root, meta_object, uri, exclude_list=[])[source]

Add nodes of root in meta_object.

static indent(elem, level=0)[source]

Pretty indent of an ElementTree.

http://effbot.org/zone/element-lib.htm#prettyprint

static make_point(midpoint, sample_rate=44100)[source]

The localization is a frame value, so an integer.
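ANTX positions are stored as frame (sample) counts, so converting them to seconds only requires the sample rate, which defaults to 44100 in the signature above. The two helpers below sketch that arithmetic. They are hypothetical and not the SPPAS code, and rounding to the nearest frame on the way back is an assumption.

```python
def frames_to_seconds(frames: int, sample_rate: int = 44100) -> float:
    """Convert a frame (sample) index into a time in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return frames / float(sample_rate)


def seconds_to_frames(seconds: float, sample_rate: int = 44100) -> int:
    """Convert seconds into the nearest frame index."""
    return int(round(seconds * sample_rate))
```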

read(filename)[source]

Read an ANTX file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an Antx file.

Parameters

filename – (str)

class anndata.aio.sppasARFF(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS ARFF writer.

The ARFF format description is at the following URL: http://weka.wikispaces.com/ARFF+(book+version). An ARFF file for WEKA has the following structure:

  1. Several comment lines, each starting with ‘%’,

  2. The name of the relation,

  3. The set of attributes,

  4. The set of instances.

__init__(name=None)[source]

Initialize a new sppasARFF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of the appropriate format or not.

write(filename)[source]

Write an ARFF file.

Parameters

filename – (str)

class anndata.aio.sppasAnvil(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

ANVIL reader (partial).

Author

Brigitte Bigi, Jibril Saffi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

__init__(name=None)[source]

Initialize a new ANVIL instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of ANVIL format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so always a float.

read(filename)[source]

Read an ANVIL file and fill the Transcription.

Parameters

filename – (str)

class anndata.aio.sppasAudacity(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Audacity projects reader.

Can work on both Audacity projects and Audacity Label tracks.

__init__(name=None)[source]

Initialize a new sppasAudacity instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of AUP format or not. AUP files are encoded in UTF-8 without BOM.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so a float.

static normalize(name)[source]

Provide namespaces in element names.

Example:

<Element ‘{http://audacity.sourceforge.net/xml/}simpleblockfile’ at 0x03270230>
<Element ‘{http://audacity.sourceforge.net/xml/}envelope’ at 0x032702C0>
<Element ‘{http://audacity.sourceforge.net/xml/}labeltrack’ at 0x03270C50>
<Element ‘{http://audacity.sourceforge.net/xml/}label’ at 0x032701E8>

See: http://effbot.org/zone/element-namespaces.htm
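ElementTree reports namespaced tags in Clark notation, `{namespace-uri}localname`, as the element reprs above show. The sketch below builds such names and uses them to collect labels. It assumes the namespace URI seen in those examples, and it assumes that label elements carry `t`, `t1` and `title` attributes, as in Audacity label tracks. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

AUDACITY_NS = "http://audacity.sourceforge.net/xml/"


def normalize(name: str, namespace: str = AUDACITY_NS) -> str:
    """Return `name` qualified with the namespace, as ElementTree spells tags."""
    return "{%s}%s" % (namespace, name)


def label_times(xml_text: str) -> List[Tuple[float, float, str]]:
    """Collect (begin, end, title) from <label t=.. t1=.. title=..> elements."""
    root = ET.fromstring(xml_text)
    return [(float(e.get("t")), float(e.get("t1")), e.get("title", ""))
            for e in root.iter(normalize("label"))]
```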

read(filename)[source]

Read an AUP file and fill the Transcription.

<!ELEMENT project (tags, (wavetrack | labeltrack | timetrack)*)>

Parameters

filename – (str)

class anndata.aio.sppasCSV(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS CSV reader and writer.

Author

Brigitte Bigi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

__init__(name=None)[source]

Initialize a new CSV instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of CSV format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

format_columns_lines(lines)[source]

Append lines content into self.

The algorithm does not assume that the file is sorted by tiers.

Parameters

lines – (list)

read(filename, signed=True)[source]

Read a CSV file.

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed. If False, the default encoding is used.

write(filename, signed=True)[source]

Write a CSV file.

Because a label must fit on a single line, whitespace is used to separate labels (instead of CR, as in other formats like TextGrid).

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed. If False, the default encoding is used.

class anndata.aio.sppasCTM(name=None)[source]

Bases: anndata.aio.sclite.sppasBaseSclite

SPPAS ctm reader and writer.

This is the reader/writer of the time marked conversation input files to be used for scoring the output of speech recognizers via the NIST sclite() program. This file format is as follows (in BNF):

CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]

where:

  • <F> -> The waveform filename. NOTE: no path-names or extensions are expected.

  • <C> -> The waveform channel. Either “A” or “B”. The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.

  • <BT> -> The begin time (seconds) of the word, measured from the start time of the file.

  • <DUR> -> The duration (seconds) of the word.

  • <CONF> -> Optional confidence score.

The file must be sorted by the first three columns: the first and the second in ASCII order, and the third by a numeric order.

Lines beginning with ‘;;’ are considered comments and ignored by sclite. Blank lines are also ignored.
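The basic CTM line described above (without the extended alternations) can be parsed as in the following sketch. `parse_ctm_line` and `CtmWord` are hypothetical names. The optional confidence is returned as None when absent, in the spirit of `get_score`, and the end of a word is `begin + duration`.

```python
from typing import NamedTuple, Optional


class CtmWord(NamedTuple):
    """One CTM record (hypothetical container, not an SPPAS class)."""
    filename: str
    channel: str
    begin: float
    duration: float
    word: str
    confidence: Optional[float]


def parse_ctm_line(line: str) -> Optional[CtmWord]:
    """Parse '<F> <C> <BT> <DUR> word [<CONF>]'; None for comments/blank lines."""
    fields = line.split()
    if not fields or fields[0].startswith(";;"):
        return None
    if len(fields) not in (5, 6):
        raise ValueError("expected 5 or 6 fields, got %d" % len(fields))
    conf = float(fields[5]) if len(fields) == 6 else None
    return CtmWord(fields[0], fields[1], float(fields[2]),
                   float(fields[3]), fields[4], conf)
```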

* * * NOT IMPLEMENTED * * *

Alternations are also accepted in some extended CTM files. Examples:

    ;;
    7654 A * * <ALT_BEGIN>
    7654 A 12.00 0.34 UM
    7654 A * * <ALT>
    7654 A 12.00 0.34 UH
    7654 A * * <ALT_END>
    ;;
    5555 A * * <ALT_BEGIN>
    5555 A 222.77 0.32 BYEBYE
    5555 A * * <ALT>
    5555 A 222.78 0.12 BYE
    5555 A 222.93 0.16 BYE
    5555 A * * <ALT_END>
    ;;
    5555 A * * <ALT_BEGIN>
    5555 A 186.32 0.01 D-
    5555 A * * <ALT>
    5555 A * * <ALT_END>

__init__(name=None)[source]

Initialize a new CTM instance.

Parameters

name – (str) This transcription name.

static check_line(line, line_number=0)[source]

Check whether a line is an annotation or not.

Raises AioLineFormatError() or ValueError() in case of a malformed line.

Parameters
  • line – (str)

  • line_number – (int)

Returns

(bool)

static detect(filename)[source]

Check whether a file is of CTM format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static get_score(line)[source]

Return the score of the label of a given line.

Parameters

line – (str)

Returns

(float) or None if no score is given

get_tier(line)[source]

Return the tier related to the given line.

Find the tier or create it.

Parameters

line – (str)

Returns

(sppasTier)

read(filename)[source]

Read a CTM file and fill the Transcription.

It creates a tier for each media-channel observed in the file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sppasEAF(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Elan EAF reader and writer.

__init__(name=None)[source]

Initialize a new sppasEAF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of EAF format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

format_point(second_count)[source]

Convert a time in seconds into ELAN format.

Parameters

second_count – (float) Time value (in seconds)

Returns

(int) a time in ELAN format
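ELAN stores time slot values as integer milliseconds, so the conversion performed by format_point(), and its reverse in make_point(), can be sketched as follows. The function names are hypothetical and the rounding shown is an assumption; SPPAS also attaches a vagueness to the resulting point, which this sketch ignores.

```python
def seconds_to_elan(second_count: float) -> int:
    """Convert a time in seconds into integer milliseconds (rounded)."""
    return int(round(second_count * 1000.0))


def elan_to_seconds(midpoint: str) -> float:
    """Convert an ELAN time value (milliseconds, as text) back into seconds."""
    return int(midpoint) / 1000.0


print(seconds_to_elan(2.5), elan_to_seconds("2500"))  # 2500 2.5
```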

static indent(elem, level=0)[source]

Pretty indent.

http://effbot.org/zone/element-lib.htm#prettyprint
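The recipe linked above pretty-prints an ElementTree in place by filling in the text and tail whitespace of each element. A minimal reconstruction is sketched below for illustration; it is not necessarily the exact SPPAS code, and the element names in the example are arbitrary. On Python 3.9 and later, the standard library also offers xml.etree.ElementTree.indent() for the same purpose.

```python
import xml.etree.ElementTree as ET


def indent(elem: ET.Element, level: int = 0) -> None:
    """Insert newlines and two-space indentation into an element tree, in place."""
    pad = "\n" + level * "  "
    if len(elem):
        # A parent element opens a new, deeper-indented line for its first child.
        if not elem.text or not elem.text.strip():
            elem.text = pad + "  "
        for child in elem:
            indent(child, level + 1)
        # The last child's tail closes the parent at the parent's own level.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


root = ET.Element("root")
tier = ET.SubElement(root, "tier")
ET.SubElement(tier, "annotation")
indent(root)
print(ET.tostring(root, encoding="unicode"))
```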

make_point(midpoint)[source]

Convert data into the appropriate sppasPoint().

Parameters

midpoint – (str) a time in ELAN format

Returns

(sppasPoint) Representation of time in seconds with a (very) large vagueness!

read(filename)[source]

Read an ELAN EAF file.

Parameters

filename – (str) input filename.

write(filename)[source]

Write an ELAN EAF file.

Parameters

filename – output filename.

class anndata.aio.sppasIntensityTier(name=None)[source]

Bases: anndata.aio.praat.sppasPitchTier

SPPAS IntensityTier reader and writer.

__init__(name=None)[source]

Initialize a new sppasIntensityTier instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of IntensityTier format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read an IntensityTier file.

Parameters

filename – (str) the input file name

write(filename)[source]

Write an IntensityTier file.

Parameters

filename – (str)

class anndata.aio.sppasLab(name=None)[source]

Bases: anndata.aio.htk.sppasBaseHTK

SPPAS LAB reader and writer.

Each line of an HTK label file contains the actual label optionally preceded by start and end times, and optionally followed by a match score.

[<start> <end>] <<name> [<score>]> [“;” <comment>]

Multiple alternatives are written as a sequence of separate label lists separated by three slashes (///).

Examples:
  • simple transcription:

    0000000 3600000 ice
    3600000 8200000 cream

  • alternative labels:

    0000000 2200000 I
    2200000 8200000 scream
    ///
    0000000 3600000 ice
    3600000 8200000 cream
    ///
    0000000 3600000 eyes
    3600000 8200000 cream

Only simple transcription is implemented so far.
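HTK label times are expressed in units of 100 ns, so 3600000 corresponds to 0.36 s. The following sketch reads the simple form of the format; parse_lab is a hypothetical helper, not the SPPAS reader, and it keeps only the first alternative when “///” separators are present.

```python
from typing import List, Optional, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_lab(text: str) -> List[Tuple[Optional[float], Optional[float], str]]:
    """Parse simple HTK label lines into (start, end, label) tuples in seconds.

    Hypothetical helper: lines without times give (None, None, label),
    scores and ';' comments are dropped, and only the first alternative
    (before any '///' separator) is kept.
    """
    segments: List[Tuple[Optional[float], Optional[float], str]] = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # remove an optional comment
        if not line:
            continue
        if line == "///":
            break  # start of the next alternative: ignored here
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            start = int(fields[0]) / HTK_UNITS_PER_SECOND
            end = int(fields[1]) / HTK_UNITS_PER_SECOND
            segments.append((start, end, fields[2]))
        else:
            segments.append((None, None, fields[0]))
    return segments


print(parse_lab("0000000 3600000 ice\n3600000 8200000 cream"))
# [(0.0, 0.36, 'ice'), (0.36, 0.82, 'cream')]
```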

__init__(name=None)[source]

Initialize a new sppasLab instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of HTK-Lab format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read a transcription from a file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sppasMRK(name=None)[source]

Bases: anndata.aio.phonedit.sppasBasePhonedit

Reader and writer of Phonedit MRK files.

Example of the old format:

    [DSC_LEVEL_AA]
    DSC_LEVEL_NAME="transcription"
    DSC_LEVEL_CREATION_DATE=2018/03/09 09:57:07
    DSC_LEVEL_LASTMODIF_DATE=2018/03/09 09:57:07
    DSC_LEVEL_SOFTWARE=Phonedit Application 4.2.0.8
    [LBL_LEVEL_AA]
    LBL_LEVEL_AA_000000= "#" 0.000000 2497.100755
    LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
    LBL_LEVEL_AA_000002= "#" 5683.888038 5743.602653
    LBL_LEVEL_AA_000003= "ipu_2" 5743.602653 8460.595544

The new MRK format includes sections for time slots.
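The label lines of the old format can be extracted with a regular expression, as in the following sketch. parse_mrk_labels is a hypothetical helper; it keeps the numeric values exactly as written, because their unit is not stated here, and it ignores the time-slot sections of the new format.

```python
import re
from typing import List, Tuple

# Matches lines such as: LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
LBL_LINE = re.compile(
    r'^LBL_LEVEL_(?P<level>\w+?)_(?P<index>\d+)=\s*"(?P<label>[^"]*)"'
    r'\s+(?P<begin>[-+]?\d+(?:\.\d+)?)\s+(?P<end>[-+]?\d+(?:\.\d+)?)\s*$'
)


def parse_mrk_labels(text: str) -> List[Tuple[str, float, float]]:
    """Return (label, begin, end) for each LBL_LEVEL_* line; other lines are ignored."""
    labels = []
    for line in text.splitlines():
        match = LBL_LINE.match(line.strip())
        if match:
            labels.append((match.group("label"),
                           float(match.group("begin")),
                           float(match.group("end"))))
    return labels


sample = '''[LBL_LEVEL_AA]
LBL_LEVEL_AA_000000= "#" 0.000000 2497.100755
LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038'''
print(parse_mrk_labels(sample))
# [('#', 0.0, 2497.100755), ('ipu_1', 2497.100755, 5683.888038)]
```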

__init__(name=None)[source]

Initialize a new sppasMRK instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of MRK format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static format_point(second_count)[source]

Convert a time in seconds into MRK format.

static make_point(midpoint)[source]

In Phonedit, the localization is a time value, so a float.

Parameters

midpoint – (str) a time value as written in the MRK file

Returns

sppasPoint() representing time in seconds.

read(filename)[source]

Read a Phonedit mark file.

Parameters

filename – (str) input filename.

write(filename)[source]

Write a Phonedit mark file.

Parameters

filename – output filename.

class anndata.aio.sppasPitchTier(name=None)[source]

Bases: anndata.aio.praat.sppasBaseNumericalTier

SPPAS PitchTier reader and writer.

__init__(name=None)[source]

Initialize a new sppasPitchTier instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of PitchTier format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename)[source]

Read a PitchTier file.

Parameters

filename – (str) the input file name

to_pitch()[source]

Convert the PitchTier to Pitch values.

Returns

list of pitch values with delta = 0.01
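How to_pitch() fills the time between PitchTier points is not stated here. Assuming linear interpolation between points, and values held constant before the first and after the last point, resampling every 0.01 s could look like the following sketch; sample_points and its end parameter are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def sample_points(points: List[Tuple[float, float]], end: float,
                  delta: float = 0.01) -> List[float]:
    """Resample (time, value) points every `delta` seconds from 0 to `end`.

    Values are linearly interpolated between neighbouring points and held
    constant outside them (an assumption, not necessarily SPPAS behaviour).
    """
    if not points:
        return []
    points = sorted(points)
    times = [t for t, _ in points]
    n = int(round(end / delta)) + 1
    values = []
    for i in range(n):
        t = i * delta
        k = bisect_right(times, t)  # number of points at or before t
        if k == 0:
            values.append(points[0][1])
        elif k == len(points):
            values.append(points[-1][1])
        else:
            (t0, v0), (t1, v1) = points[k - 1], points[k]
            values.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return values


print(sample_points([(0.0, 100.0), (0.02, 200.0)], end=0.04))
# [100.0, 150.0, 200.0, 200.0, 200.0]
```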

write(filename)[source]

Write a PitchTier file.

Parameters

filename – (str)

class anndata.aio.sppasRawText(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS raw text reader and writer.

Author

Brigitte Bigi

Organization

Laboratoire Parole et Langage, Aix-en-Provence, France

Contact

contact@sppas.org

License

GPL, v3

Copyright

Copyright (C) 2011-2018 Brigitte Bigi

RawText supports multiple tiers for reading but not for writing: it can save only one tier. RawText accepts a transcription with no tiers. RawText does not support alternative labels or locations; only the ones with the best score are saved. RawText supports points and intervals, but not disjoint intervals. RawText does not support controlled vocabularies, hierarchy, metadata, media assignment, alternative tags, or radius.

RawText supports comments: lines starting with ‘;;’.

__init__(name=None)[source]

Initialize a new sppasRawText instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is a raw text file or not.

read(filename)[source]

Read a raw file and fill the Transcription.

The file can be a simple raw text (without location information). It can also be a column-based (table-style) file, in which each column represents the annotations of a tier (the 1st and 2nd columns indicate the location).

Parameters

filename – (str)
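For the column-based case, a minimal reading sketch follows. It assumes tab-separated columns, which the description above does not specify, and groups labels by column index; read_columns is a hypothetical helper, not the SPPAS reader.

```python
from typing import Dict, List, Tuple


def read_columns(text: str) -> Dict[int, List[Tuple[float, float, str]]]:
    """Group the label columns of a table-style text by column index.

    Assumption: columns are tab-separated, the first two give the begin
    and end times, and each further column is one tier. Blank lines and
    ';;' comment lines are skipped.
    """
    tiers: Dict[int, List[Tuple[float, float, str]]] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith(";;"):
            continue
        columns = line.split("\t")
        begin, end = float(columns[0]), float(columns[1])
        for index, label in enumerate(columns[2:]):
            tiers.setdefault(index, []).append((begin, end, label.strip()))
    return tiers


table = ";; begin\tend\twords\n0.0\t1.2\thello\n1.2\t2.0\tworld"
print(read_columns(table))
# {0: [(0.0, 1.2, 'hello'), (1.2, 2.0, 'world')]}
```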

write(filename)[source]

Write a RawText file.

Labels are preserved, i.e. separated by whitespace, with alternative tags included.

Parameters

filename – (str)

class anndata.aio.sppasSTM(name=None)[source]

Bases: anndata.aio.sclite.sppasBaseSclite

SPPAS stm reader and writer.

This is the reader/writer for the segment time marked files to be used for scoring the output of speech recognizers via the NIST sclite() program.

STM :== <F> <C> <S> <BT> <ET> [ <LABEL> ] transcript …

where:
<F> -> The waveform filename.

NOTE: no pathnames or extensions are expected.

<C> -> The waveform channel. Either “A” or “B”.

The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.

<S> -> The speaker id; no restrictions apply to this name.

<BT> -> The begin time (seconds) of the segment, measured from the start time of the file.

<ET> -> The end time (seconds) of the segment.

<LABEL> -> A comma-separated list of subset identifiers enclosed in angle brackets.

transcript -> The transcript can take one of two forms:
  1. a whitespace-separated list of words, or

  2. the string “IGNORE_TIME_SEGMENT_IN_SCORING”.

The list of words can contain a transcript alternation using the following BNF format:

    ALTERNATE :== "{" <text> ALT+ "}"
    ALT :== "|" <text>
    TEXT :== 1 thru n words | "@" | ALTERNATE

The file must be sorted by the first and second columns in ASCII order, and the fourth in numeric order.

Lines beginning with ‘;;’ are considered comments and are ignored. Blank lines are also ignored.
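The following sketch parses one STM line, including the optional <LABEL> field. StmSegment and parse_stm_line are hypothetical names, not the SPPAS API; alternations in the transcript are left unparsed in the transcript string.

```python
import re
from typing import List, NamedTuple, Optional


class StmSegment(NamedTuple):
    filename: str
    channel: str
    speaker: str
    begin: float
    end: float
    labels: List[str]   # subset identifiers from the optional <LABEL> field
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse one STM line; return None for blank lines and ';;' comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith(";;"):
        return None
    fields = stripped.split(None, 5)  # the 6th field keeps the rest of the line
    if len(fields) < 5:
        raise ValueError("Malformed STM line: {!r}".format(line))
    rest = fields[5] if len(fields) == 6 else ""
    labels: List[str] = []
    match = re.match(r"<([^>]*)>\s*", rest)
    if match:
        labels = [item for item in match.group(1).split(",") if item]
        rest = rest[match.end():]
    return StmSegment(fields[0], fields[1], fields[2],
                      float(fields[3]), float(fields[4]), labels, rest)


print(parse_stm_line("2345 A spk1 0.00 2.50 <o,f0,male> hello world"))
```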

__init__(name=None)[source]

Initialize a new STM instance.

Parameters

name – (str) This transcription name.

static check_line(line, line_number=0)[source]

Check whether a line is an annotation or not.

Raises AioLineFormatError() or ValueError() in case of a malformed line.

Parameters
  • line – (str)

  • line_number – (int)

Returns

(bool)

static detect(filename)[source]

Check whether a file is of STM format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

get_tier(line)[source]

Return the tier related to the given line.

Find the tier or create it.

Parameters

line – (str)

Returns

(sppasTier)

read(filename)[source]

Read an STM file and fill the Transcription.

It creates a tier for each media-channel observed in the file.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sppasSignaix(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

Reader and writer of F0 values from LPL-Signaix.

__init__(name=None)[source]

Initialize a new sppasSignaix instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of Signaix format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

read(filename, delta=0.01)[source]

Read a file with Pitch values sampled at delta seconds.

The file contains one value per line. If the audio file is 30 seconds long and delta is 0.01, we expect 100 * 30 = 3,000 lines in the file.

Parameters
  • filename – (str) input filename.

  • delta – (float) sampling period of the file, in seconds. The default is one F0 value every 10 ms, i.e. 100 values per second.
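The relation between line index and time can be sketched as follows: the i-th value is assumed to lie at i * delta seconds, starting at 0, which the description implies but does not state explicitly. Both function names are hypothetical.

```python
from typing import List, Tuple


def read_f0_values(text: str, delta: float = 0.01) -> List[Tuple[float, float]]:
    """Pair each F0 value (one per line) with its time, i * delta seconds."""
    values = [float(token) for token in text.split()]
    # Rounding hides float artefacts such as 3 * 0.01 = 0.030000000000000002.
    return [(round(i * delta, 6), value) for i, value in enumerate(values)]


def expected_line_count(duration: float, delta: float = 0.01) -> int:
    """Number of values expected for an audio file of `duration` seconds."""
    return int(round(duration / delta))


print(read_f0_values("0\n120.5\n121.0\n"))  # [(0.0, 0.0), (0.01, 120.5), (0.02, 121.0)]
print(expected_line_count(30.0))            # 3000
```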

write(filename)[source]

Write a file with pitch values.

Parameters

filename – (str) output filename

class anndata.aio.sppasSubRip(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for SRT format.

The SubRip text file format (SRT) is used by the SubRip program to save subtitles ripped from video files or DVDs. SubRip is free software, released under the GNU GPL.

Each subtitle is represented as a group of lines. Subtitles are separated by a blank line.

  • the first line of a subtitle is an index (starting from 1);

  • the second line is a timestamp interval, in the format %H:%M:%S,%m (milliseconds after the comma), with the start and end of the range separated by “-->”;

  • optionally: a specific positioning by pixels, in the form X1:number Y1:number X2:number Y2:number;

  • the third line is the label. The HTML <b>, <i>, <u>, and <font> tags are allowed.
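The timestamp layout and block structure of SRT can be sketched as follows. format_srt_time and write_srt are hypothetical helpers, not the SPPAS writer, and the optional pixel positioning is not produced.

```python
from typing import Iterable, Tuple


def format_srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def write_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (begin, end, text) cues, numbered from 1."""
    blocks = []
    for index, (begin, end, text) in enumerate(cues, start=1):
        blocks.append("{}\n{} --> {}\n{}\n".format(
            index, format_srt_time(begin), format_srt_time(end), text))
    return "\n".join(blocks)  # one blank line between subtitles


print(write_srt([(0.0, 1.5, "Hello"), (1.5, 3661.5, "world")]))
```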

__init__(name=None)[source]

Initialize a new sppasSubRip instance.

Parameters

name – (str) This transcription name.

read(filename)[source]

Read a SRT file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sppasSubViewer(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for SUB format.

The SubViewer text file format (SUB) is used by the SubViewer program to save subtitles of videos.

__init__(name=None)[source]

Initialize a new sppasSubViewer instance.

Parameters

name – (str) This transcription name.

read(filename)[source]

Read a SUB file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write a transcription into a file.

Parameters

filename – (str)

class anndata.aio.sppasTDF(name=None)[source]

Bases: anndata.aio.text.sppasBaseText

SPPAS TDF reader.

This class implements a TDF reader, but not a writer. TDF is a Tab-Delimited Format. It contains 13 columns but SPPAS only extracts 8 of them.

TDF does not support alternative labels or locations; only the ones with the best score are saved. TDF can save several tiers. TDF supports media assignment. TDF supports intervals only. TDF does not support controlled vocabularies, hierarchy, metadata, alternative tags, or radius.

__init__(name=None)[source]

Initialize a new sppasTDF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of TDF format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static make_point(midpoint)[source]

The localization is a time value, so always a float.

read(filename)[source]

Read a TDF file and fill the sppasTranscription.

It creates a tier for each speaker-channel observed in the file.

Parameters

filename – (str)

class anndata.aio.sppasTRA(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS TRA writer: the Table Rich Annotations format.

This format contains the set of instances separated by ‘;’. It can be easily parsed like a CSV file.

__init__(name=None)[source]

Initialize a new sppasTRA instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of CSV format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

write(filename, signed=True)[source]

Write a raw text file with data in a table.

If signed is False, the default encoding is used.

Parameters
  • filename – (str)

  • signed – (bool) Indicate if the encoding is UTF-8 signed.
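A “signed” UTF-8 file starts with a byte-order mark (BOM), which Python writes with the utf-8-sig codec. The sketch below serializes ‘;’-separated rows that way. table_to_bytes, the column names, and the use of plain UTF-8 as the unsigned “default encoding” are all assumptions for illustration.

```python
import csv
import io
from typing import List, Sequence


def table_to_bytes(rows: List[Sequence[str]], signed: bool = True) -> bytes:
    """Serialize rows as ';'-separated text; prepend a UTF-8 BOM if signed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerows(rows)
    # Assumption: the unsigned "default encoding" is plain UTF-8.
    encoding = "utf-8-sig" if signed else "utf-8"
    return buffer.getvalue().encode(encoding)


data = table_to_bytes([["begin", "end", "label"], ["0.0", "1.2", "été"]])
print(data[:3])  # b'\xef\xbb\xbf'
```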

class anndata.aio.sppasTextGrid(name=None)[source]

Bases: anndata.aio.praat.sppasBasePraat

SPPAS TextGrid reader and writer.

TextGrid supports multiple tiers in a file, but not empty files (files with no tiers). TextGrid does not support alternative labels or locations; only the ones with the best score are saved. TextGrid supports points and intervals, but not disjoint intervals. TextGrid does not support controlled vocabularies, hierarchy, metadata, media assignment, alternative tags (here called “text”), or radius.

Both “short TextGrid” and “long TextGrid” file formats are supported.

__init__(name=None)[source]

Initialize a new sppasTextGrid instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of TextGrid format or not.

Try first to open the file with the default sppas encoding, then UTF-16.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)
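The two-step encoding check can be sketched as follows, assuming UTF-8 as the default SPPAS encoding and relying on the standard Praat text-file header (File type = "ooTextFile" followed by Object class = "TextGrid"), which both the short and long formats start with. looks_like_textgrid is a hypothetical name.

```python
def looks_like_textgrid(filename: str) -> bool:
    """Return True if the file starts like a Praat text TextGrid.

    UTF-8 is tried first (assumed default encoding), then UTF-16.
    """
    for encoding in ("utf-8", "utf-16"):
        try:
            with open(filename, encoding=encoding) as fp:
                head = fp.read(200)
        except (UnicodeError, OSError):
            continue  # wrong encoding or unreadable file: try the next one
        if "ooTextFile" in head and 'Object class = "TextGrid"' in head:
            return True
    return False
```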

read(filename)[source]

Read a TextGrid file.

Parameters

filename – (str) the input file name, ending with “.TextGrid”

write(filename)[source]

Write a TextGrid file.

Parameters

filename – (str)

class anndata.aio.sppasWebVTT(name=None)[source]

Bases: anndata.aio.subtitle.sppasBaseSubtitles

SPPAS reader/writer for VTT format.

Only the writer is implemented so far.

__init__(name=None)[source]

Initialize a new sppasWebVTT instance.

Parameters

name – (str) This transcription name.

write(filename)[source]

Write a transcription into a file.

Not fully implemented.

Parameters

filename – (str)
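A WebVTT file differs from SRT mainly in its mandatory WEBVTT header line and in using a period rather than a comma before the milliseconds; cue identifiers are optional. A minimal writing sketch follows, with hypothetical helper names; it is not the SPPAS writer.

```python
from typing import Iterable, Tuple


def format_vtt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (period before the milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, secs, millis)


def write_vtt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build a minimal WebVTT document: the header, then one cue per block."""
    blocks = ["WEBVTT\n"]
    for begin, end, text in cues:
        blocks.append("{} --> {}\n{}\n".format(
            format_vtt_time(begin), format_vtt_time(end), text))
    return "\n".join(blocks)


print(write_vtt([(0.0, 1.5, "Hello")]))
```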

class anndata.aio.sppasXRA(name=None)[source]

Bases: anndata.aio.basetrsio.sppasBaseIO

SPPAS XRA reader and writer.

XRA files are the native file format of the GPL tool SPPAS.

__init__(name=None)[source]

Initialize a new XRA instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of XRA format or not.

Parameters

filename – (str) Name of the file to check.

Returns

(bool)

static format_annotation(annotation_root, annotation)[source]

Add an ‘Annotation’ element in the tree from a sppasAnnotation().

Parameters
  • annotation_root – (ET) XML Element tree root.

  • annotation – (sppasAnnotation)

static format_label(label_root, label)[source]

Add a ‘Label’ element in the tree from a sppasLabel().

Parameters
  • label_root – (ET) XML Element tree root.

  • label – (sppasLabel)

static format_location(location_root, location)[source]

Add a ‘Location’ element in the tree from a sppasLocation().

Parameters
  • location_root – (ET) XML Element tree root.

  • location – (sppasLocation)

static format_metadata(metadata_root, meta_object, exclude=[])[source]

Add ‘Metadata’ element in the tree from a sppasMetaData().

Parameters
  • metadata_root – (ET) XML Element tree root.

  • meta_object – (sppasMetaData)

  • exclude – (list) List of keys to exclude

static format_tier(tier_root, tier)[source]

Add a ‘Tier’ element in the tree from a sppasTier().

Parameters
  • tier_root – (ET) XML Element tree root.

  • tier – (sppasTier)

static indent(elem, level=0)[source]

Pretty indent.

http://effbot.org/zone/element-lib.htm#prettyprint

static parse_label(label_root)[source]

Parse a ‘Label’ element and return it.

Parameters

label_root – (ET) XML Element tree root.

Returns

(sppasLabel)

read(filename)[source]

Read an XRA file and fill the Transcription.

Parameters

filename – (str)

write(filename)[source]

Write an XRA file.

Parameters

filename – (str)

class anndata.aio.sppasXRFF(name=None)[source]

Bases: anndata.aio.table.sppasTable

SPPAS XRFF writer.

XRFF is the XML-based format of the WEKA software tool. The XRFF format is described at the following URL: http://weka.wikispaces.com/XRFF

This class is limited to:
  1. Only the writers are implemented; there are no readers.

  2. The sparse option is not supported by either writer.

  3. The XRFF output file is not gzipped.

  4. The XRFF format supports the following features, which are not currently implemented in this class:

    • attribute weights;

    • instance weights.

No guarantee: this class has never been tested.

__init__(name=None)[source]

Initialize a new sppasXRFF instance.

Parameters

name – (str) This transcription name.

static detect(filename)[source]

Check whether a file is of the appropriate format or not.

write(filename)[source]

Write an XRFF file.

Parameters

filename – (str)