anndata.aio package
Submodules
anndata.aio.aioutils module
- filename
sppas.src.anndata.aio.aioutils.py
- author
Brigitte Bigi
- contact
- summary
Utilities for readers and writers.
- anndata.aio.aioutils.check_gaps(tier, min_loc=None, max_loc=None)
Check if there are holes between annotations.
- Parameters
tier – (sppasTier)
min_loc – (sppasPoint)
max_loc – (sppasPoint)
- Returns
(bool)
- anndata.aio.aioutils.check_overlaps(tier)
Check whether some annotations are overlapping or not.
- Parameters
tier – (sppasTier)
- Returns
(bool)
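The two checks above can be pictured on plain intervals. The sketch below is not the SPPAS implementation: it models each annotation as a hypothetical (begin, end) pair of floats instead of a sppasTier annotation, and it assumes that min_loc and max_loc bound the range that the annotations should cover.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an annotation: a (begin, end) pair in seconds.
Interval = Tuple[float, float]


def has_gaps(intervals: List[Interval],
             min_loc: Optional[float] = None,
             max_loc: Optional[float] = None) -> bool:
    """Return True if some time between min_loc and max_loc is not covered."""
    if not intervals:
        # Nothing covers the range, so any non-empty range is a gap.
        return min_loc is not None and max_loc is not None and min_loc < max_loc
    ordered = sorted(intervals)
    if min_loc is not None and ordered[0][0] > min_loc:
        return True  # hole before the first annotation
    covered_until = ordered[0][1]
    for begin, end in ordered[1:]:
        if begin > covered_until:
            return True  # hole between two annotations
        covered_until = max(covered_until, end)
    # Hole after the last annotation, if an upper bound is given.
    return max_loc is not None and covered_until < max_loc


def has_overlaps(intervals: List[Interval]) -> bool:
    """Return True if an interval starts before a previous one has ended."""
    covered_until = float("-inf")
    for begin, end in sorted(intervals):
        if begin < covered_until:
            return True
        covered_until = max(covered_until, end)
    return False
```

Tracking the furthest end seen so far (rather than only the previous interval) keeps a long annotation from hiding gaps or overlaps with later ones.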
- anndata.aio.aioutils.fill_gaps(tier, min_loc=None, max_loc=None)
Fill the temporal gaps (holes) between annotations.
- Parameters
tier – (sppasTier) A tier with intervals.
min_loc – (sppasPoint)
max_loc – (sppasPoint)
- Returns
(sppasTier) A tier with unlabelled annotations instead of gaps.
- anndata.aio.aioutils.format_label(text, empty='', tag_type='str')
Create a label from a text.
Alternative tags are parsed with the “{ | }” notation, and “=” introduces a score.
- Parameters
text – (str)
empty – (str) The text representing an empty tag.
tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).
- Returns
sppasLabel
- anndata.aio.aioutils.format_labels(text, separator='\n', empty='', tag_type='str')
Create a set of labels from a text.
Use the separator to split the text into labels. Use the “{ | }” system to parse the alternative tags.
- Examples
text = “{le|les} {chat|chats}” is 2 labels with 2 tags each.
text = “{le=0.6|les=0.4}” is a label with 2 tags and their scores.
- Parameters
text – (str)
separator – (str) String to separate labels.
empty – (str) The text representing an empty tag.
tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).
- Returns
list of sppasLabel
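The “{ | }” notation of the examples can be parsed with a few string operations. A minimal sketch, assuming one label per string and “=” followed by a number for scores; parse_label is a hypothetical helper, not the SPPAS function, and it returns plain (tag, score) tuples instead of sppasLabel and sppasTag objects.

```python
from typing import List, Optional, Tuple

Tag = Tuple[str, Optional[float]]  # (tag text, score or None)


def _to_float(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def parse_label(text: str) -> List[Tag]:
    """Split one label written with the "{ | }" notation into (tag, score) pairs."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        alternatives = text[1:-1].split("|")
    else:
        alternatives = [text]  # a single tag, no alternatives
    tags = []
    for alt in alternatives:
        # "le=0.6" -> ("le", 0.6); a tag without a numeric score keeps None.
        tag, sep, score = alt.rpartition("=")
        value = _to_float(score) if sep else None
        tags.append((tag, value) if value is not None else (alt, None))
    return tags
```

For the first example, splitting on whitespace (an assumption; the default separator of format_labels is a newline) gives two labels with two tags each.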
- anndata.aio.aioutils.format_score(text)
Return a score from a text.
- Parameters
text – (str) Unicode text
- Returns
float or None
- anndata.aio.aioutils.format_tag(text, empty='', tag_type='str')
Return a tag from a text.
- Parameters
text – (str) Unicode text
empty – (str) The text representing an empty tag.
tag_type – (str): The type of this content. One of: (‘str’, ‘int’, ‘float’, ‘bool’).
- Returns
sppasTag
- anndata.aio.aioutils.is_ortho_tier(tier_name)
Return True if tier_name matches an orthographic transcription tier,
i.e. if it contains either “ipu”, “trans”, “trs”, “toe” or “ortho” in its name.
- Parameters
tier_name – (str)
- Returns
(bool)
- anndata.aio.aioutils.load(filename, file_encoding='utf-8')
Load a file into lines.
- Parameters
filename – (str)
file_encoding – (str)
- Returns
list of lines (str)
- anndata.aio.aioutils.merge_overlapping_annotations(tier)
Merge overlapping annotations.
The labels of 2 overlapping annotations are appended.
- Parameters
tier – (Tier)
- Returns
(sppasTier)
- anndata.aio.aioutils.point2interval(tier, radius=0.001)
Convert a PointTier into an IntervalTier.
The radius is always at least 1 millisecond, and the newly created tier does not contain overlapping intervals.
- Alternative localizations are not converted.
- The hierarchy is not shared.
- The new tier shares the original tier’s metadata, except that its ‘id’ is different.
- New annotations share the original annotations’ metadata, except that their ‘id’ is different.
- Parameters
tier – (Tier)
radius – (float) the radius to use for all intervals
- Returns
(sppasTier) or None if tier was not converted.
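One way to satisfy both constraints of point2interval (a radius of at least 1 ms, no overlapping intervals) is sketched below on plain floats. Splitting halfway between two close points and clamping at 0.0 are assumptions of this sketch, not necessarily what SPPAS does.

```python
from typing import List, Tuple

MIN_RADIUS = 0.001  # 1 millisecond, the lower bound stated above


def points_to_intervals(points: List[float],
                        radius: float = 0.001) -> List[Tuple[float, float]]:
    """Turn each time point p into [p - radius, p + radius] without overlaps."""
    radius = max(radius, MIN_RADIUS)
    ordered = sorted(points)
    intervals: List[Tuple[float, float]] = []
    for i, point in enumerate(ordered):
        begin, end = max(0.0, point - radius), point + radius
        if intervals and begin < intervals[-1][1]:
            # Too close to the previous point: both intervals stop
            # halfway between the two points instead of overlapping.
            middle = (ordered[i - 1] + point) / 2.0
            intervals[-1] = (intervals[-1][0], middle)
            begin = middle
        intervals.append((begin, end))
    return intervals
```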
- anndata.aio.aioutils.serialize_label(label, empty='', alt=True)
Convert the label into a string, optionally including the alternative tags.
Use the “{ | }” system to serialize the alternative tags. Scores of the tags are not returned.
- Parameters
label – (sppasLabel)
empty – (str) The text to return if a tag is empty or not set.
alt – (bool) Include alternative tags
- Returns
(str)
- anndata.aio.aioutils.serialize_labels(labels, separator='\n', empty='', alt=True)
Create a text from a list of labels.
The separator is inserted between the serialized labels. Alternative tags are serialized with the “{ | }” notation.
- Parameters
labels – (list of sppasLabel)
separator – (str) String separating labels
empty – (str) The text representing an empty tag
alt – (bool) Include alternative tags. If False, only the best tag is serialized.
- Returns
(str)
- anndata.aio.aioutils.unalign(aligned_tier, ipus_separators=('#', 'sil', 'dummy'))
Convert a time-aligned tier into a non-aligned tier.
- Parameters
aligned_tier – (sppasTier)
ipus_separators – (list)
- Returns
(Tier)
- anndata.aio.aioutils.unfill_gaps(tier)
Return a new tier in which unlabelled annotations are removed.
An annotation is unlabelled if:
it has no labels,
or the tags of all its labels are empty strings.
The hierarchy is not copied to the new tier.
- Parameters
tier – (Tier)
- Returns
(sppasTier)
anndata.aio.annotationpro module
- filename
sppas.src.anndata.aio.annotationpro.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of Annotation Pro files.
Annotation Pro is a tool for the annotation of audio and text files.
- anndata.aio.annotationpro.rgb_to_color(r, g, b)
Convert an RGB color into an ANTX decimal color.
- class anndata.aio.annotationpro.sppasANT(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
AnnotationPro ANT reader and writer.
An ANT file is a zipped directory.
- __init__(name=None)
Initialize a new sppasANT instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.annotationpro.sppasANTX(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
AnnotationPro ANTX reader and writer.
- __init__(name=None)
Initialize a new sppasANTX instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of ANTX format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
anndata.aio.anvil module
- filename
sppas.src.anndata.aio.anvil.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of ANVIL files (.anvil).
ANVIL is a free video annotation tool.
Be aware that the support of ANVIL files by SPPAS still has to be verified, tested and extended. The last release of ANVIL was in 2017.
- class anndata.aio.anvil.sppasAnvil(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Partial ANVIL reader.
- Author
Brigitte Bigi, Jibril Saffi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
- __init__(name=None)
Initialize a new sppasAnvil instance.
- Parameters
name – (str) This transcription name.
anndata.aio.audacity module
- filename
sppas.src.anndata.aio.audacity.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of Audacity file formats (.aup).
Audacity is a multi-platform, free, easy-to-use, multi-track audio editor and recorder. Audacity is free software, developed by a group of volunteers and distributed under the GNU General Public License (GPL).
See: http://www.audacityteam.org/
- class anndata.aio.audacity.sppasAudacity(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Audacity projects reader.
Can work on both Audacity projects and Audacity Label tracks.
- __init__(name=None)
Initialize a new sppasAudacity instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of AUP format or not. AUP files are encoded in UTF-8 without BOM.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static normalize(name)
Provide the namespace in element names, i.e. qualify a bare element name with the Audacity XML namespace.
- Example:
<Element '{http://audacity.sourceforge.net/xml/}simpleblockfile' at 0x03270230>
<Element '{http://audacity.sourceforge.net/xml/}envelope' at 0x032702C0>
<Element '{http://audacity.sourceforge.net/xml/}labeltrack' at 0x03270C50>
<Element '{http://audacity.sourceforge.net/xml/}label' at 0x032701E8>
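Namespace-qualified names like the ones above are what ElementTree expects when searching an Audacity project. The sketch below assumes that label tracks contain label elements with t, t1 and title attributes (begin, end, text); the sample project is hypothetical and much smaller than a real AUP file, and read_labels is not a SPPAS function.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace taken from the element names shown in the example above.
AUDACITY_NS = "http://audacity.sourceforge.net/xml/"


def normalize(name: str) -> str:
    """Qualify a bare element name with the Audacity namespace."""
    return "{" + AUDACITY_NS + "}" + name


def read_labels(xml_text: str) -> List[Tuple[str, float, float, str]]:
    """Return (track name, begin, end, title) for every label of every label track."""
    root = ET.fromstring(xml_text)
    labels = []
    for track in root.iter(normalize("labeltrack")):
        for label in track.iter(normalize("label")):
            begin = label.get("t", "0")
            labels.append((track.get("name", ""),
                           float(begin),
                           float(label.get("t1", begin)),  # a point label has t1 == t
                           label.get("title", "")))
    return labels


# Hypothetical minimal project, for illustration only.
SAMPLE = (
    '<project xmlns="http://audacity.sourceforge.net/xml/" projname="demo">'
    '<labeltrack name="Labels" numlabels="2">'
    '<label t="0.5" t1="1.25" title="hello"/>'
    '<label t="2.0" t1="2.0" title="point"/>'
    '</labeltrack></project>'
)
```

Searching with the bare name "label" would find nothing, because the default xmlns puts every element in the Audacity namespace.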
anndata.aio.basetrsio module
- filename
sppas.src.anndata.aio.basetrsio.py
- author
Brigitte Bigi
- contact
- summary
Base class for any transcription input/output.
- class anndata.aio.basetrsio.sppasBaseIO(name=None)
Bases:
anndata.transcription.sppasTranscription
Base object for readers and writers of annotated data.
- __init__(name=None)
Initialize a new Transcription reader-writer instance.
- Parameters
name – (str) A transcription name.
- alternative_localization_support()
Return True if it supports alternative localizations.
It returns True whether the alternatives come with or without a score.
- Returns
boolean
- alternative_tag_support()
Return True if it supports alternative tags.
It returns True whether the alternatives come with or without a score.
- Returns
boolean
- ctrl_vocab_support()
Return True if it supports reading and writing a controlled vocabulary.
- Returns
boolean
- disjoint_support()
Return True if it supports tiers with disjoint localizations.
- Returns
boolean
- gaps_support()
Return True if it supports gaps between annotations of a tier.
- Returns
boolean
- interval_support()
Return True if it supports tiers with localizations as intervals.
- Returns
boolean
- static is_number(s)
Check whether a string is a number or not.
- Parameters
s – (str or unicode)
- Returns
(bool)
- media_support()
Return True if it supports reading and writing a link to a media file.
- Returns
boolean
- multi_tiers_support()
Return True if it supports reading and writing several tiers.
- Returns
boolean
- overlaps_support()
Return True if it supports overlaps between annotations of a tier.
- Returns
boolean
anndata.aio.elan module
- filename
sppas.src.anndata.aio.elan.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of ELAN native file formats (.eaf).
ELAN is a professional tool for the creation of complex annotations on video and audio resources.
- class anndata.aio.elan.sppasEAF(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Elan EAF reader and writer.
- __init__(name=None)
Initialize a new sppasEAF instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of EAF format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- format_point(second_count)
Convert a time in seconds into ELAN format.
- Parameters
second_count – (float) Time value (in seconds)
- Returns
(int) a time in ELAN format, i.e. in milliseconds
anndata.aio.htk module
- filename
sppas.src.anndata.aio.htk.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of HTK native file formats.
The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models.
The first version of the HTK Hidden Markov Model Toolkit was developed at the Speech Vision and Robotics Group of the Cambridge University Engineering Department (CUED) in 1989 by Steve Young.
- class anndata.aio.htk.sppasBaseHTK(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS HTK files reader and writer.
- class anndata.aio.htk.sppasLab(name=None)
Bases:
anndata.aio.htk.sppasBaseHTK
SPPAS LAB reader and writer.
Each line of an HTK label file contains the actual label, optionally preceded by start and end times, and optionally followed by a match score.
[<start> <end>] <<name> [<score>]> [“;” <comment>]
Multiple alternatives are written as a sequence of separate label lists separated by three slashes (///).
- Examples:
simple transcription:
0000000 3600000 ice
3600000 8200000 cream
alternative labels:
0000000 2200000 I
2200000 8200000 scream
///
0000000 3600000 ice
3600000 8200000 cream
///
0000000 3600000 eyes
3600000 8200000 cream
Only simple transcriptions are implemented so far.
- __init__(name=None)
Initialize a new sppasLab instance.
- Parameters
name – (str) This transcription name.
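HTK label times are integers in units of 100 nanoseconds, so 3600000 in the examples above is 0.36 seconds. A minimal reader for the simple (non-alternative) case, written as a standalone sketch rather than sppasLab itself; parse_lab is hypothetical, and it treats everything after “;” as a comment, following the syntax line above.

```python
from typing import List, Optional, Tuple

# HTK label times are integers expressed in units of 100 nanoseconds.
HTK_UNITS_PER_SECOND = 10_000_000

Entry = Tuple[Optional[float], Optional[float], str]


def parse_lab(text: str) -> List[Entry]:
    """Parse a simple HTK label file into (begin, end, name) entries, in seconds."""
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop an optional trailing comment
        if not line:
            continue
        if line == "///":
            # Alternative label lists are not handled, as stated above.
            raise NotImplementedError("alternative label lists (///)")
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            begin = int(fields[0]) / HTK_UNITS_PER_SECOND
            end = int(fields[1]) / HTK_UNITS_PER_SECOND
            entries.append((begin, end, fields[2]))
        else:
            entries.append((None, None, fields[0]))  # label without times
    return entries
```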
anndata.aio.phonedit module
- filename
sppas.src.anndata.aio.phonedit.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of Phonedit-Signaix native file formats.
PHONEDIT Signaix is a software for the analysis of sound, aerodynamic, articulatory and electro-physiological signals developed by the Parole et Langage Laboratory, Aix-en-Provence, France.
It provides a complete environment for the recording, the playback, the display, the analysis, the labeling of multi-parametric data.
http://www.lpl-aix.fr/~lpldev/phonedit/
- class anndata.aio.phonedit.sppasBasePhonedit(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Readers and writers of Phonedit files.
- class anndata.aio.phonedit.sppasMRK(name=None)
Bases:
anndata.aio.phonedit.sppasBasePhonedit
Reader and writer of Phonedit MRK files.
Example of the old format:
[DSC_LEVEL_AA]
DSC_LEVEL_NAME="transcription"
DSC_LEVEL_CREATION_DATE=2018/03/09 09:57:07
DSC_LEVEL_LASTMODIF_DATE=2018/03/09 09:57:07
DSC_LEVEL_SOFTWARE=Phonedit Application 4.2.0.8
[LBL_LEVEL_AA]
LBL_LEVEL_AA_000000= "#" 0.000000 2497.100755
LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
LBL_LEVEL_AA_000002= "#" 5683.888038 5743.602653
LBL_LEVEL_AA_000003= "ipu_2" 5743.602653 8460.595544
The new MRK format includes sections for time slots.
- __init__(name=None)
Initialize a new sppasMRK instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of MRK format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
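The label lines of the old MRK format shown above can be extracted with a regular expression. This sketch assumes level names made of uppercase letters and double-quoted labels; read_mrk_labels is hypothetical, and the times are returned as written in the file since their unit is not stated here.

```python
import re
from typing import List, Tuple

# One label line of an [LBL_LEVEL_xx] section, e.g.:
#   LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
LBL_LINE = re.compile(r'^LBL_LEVEL_([A-Z]+)_(\d+)=\s*"(.*)"\s+(\S+)\s+(\S+)\s*$')


def read_mrk_labels(text: str) -> List[Tuple[str, str, float, float]]:
    """Return (level, label, begin, end) for each label line of an old-format MRK text."""
    labels = []
    for line in text.splitlines():
        match = LBL_LINE.match(line.strip())
        if match:  # section headers and DSC_* lines do not match
            level, _index, label, begin, end = match.groups()
            labels.append((level, label, float(begin), float(end)))
    return labels
```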
- class anndata.aio.phonedit.sppasSignaix(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Reader and writer of F0 values from LPL-Signaix.
- __init__(name=None)
Initialize a new sppasSignaix instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of Signaix format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- read(filename, delta=0.01)
Read a file with pitch values sampled every delta seconds.
The file contains one value per line. If the audio file is 30 seconds long and delta is 0.01, the file is expected to contain 100 * 30 = 3,000 lines.
- Parameters
filename – (str) Input filename.
delta – (float) Sampling period of the file. The default is one F0 value every 10 ms, i.e. 100 values per second.
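The expected line count follows from one value every delta seconds. A minimal sketch of the reading step, assuming the i-th non-empty line holds the F0 value at time i * delta; read_f0_values is a hypothetical helper that returns (time, value) pairs instead of filling a tier.

```python
from typing import Iterable, List, Tuple


def read_f0_values(lines: Iterable[str], delta: float = 0.01) -> List[Tuple[float, float]]:
    """Return (time, f0) pairs, assuming the i-th value is sampled at i * delta seconds."""
    values = [line.strip() for line in lines if line.strip()]
    # Rounding avoids float noise such as 0.30000000000000004 in the times.
    return [(round(i * delta, 6), float(value)) for i, value in enumerate(values)]
```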
anndata.aio.praat module
- filename
sppas.anndata.aio.praat.py
- author
Brigitte Bigi
- contact
- summary
The Praat files reader/writer.
- class anndata.aio.praat.sppasBaseNumericalTier(name=None)
Bases:
anndata.aio.praat.sppasBasePraat
SPPAS PitchTier, IntensityTier, etc. reader and writer.
Supports Praat file formats with only one tier of numerical values, like pitch, intensity, etc.
- class anndata.aio.praat.sppasBasePraat(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
Base class for readers and writers of Praat files.
Praat (doing phonetics by computer) is a GPL tool developed by:
Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam, The Netherlands
- class anndata.aio.praat.sppasIntensityTier(name=None)
Bases:
anndata.aio.praat.sppasPitchTier
SPPAS IntensityTier reader and writer.
- __init__(name=None)
Initialize a new sppasIntensityTier instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.praat.sppasPitchTier(name=None)
Bases:
anndata.aio.praat.sppasBaseNumericalTier
SPPAS PitchTier reader and writer.
- __init__(name=None)
Initialize a new sppasPitchTier instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of PitchTier format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- class anndata.aio.praat.sppasTextGrid(name=None)
Bases:
anndata.aio.praat.sppasBasePraat
SPPAS TextGrid reader and writer.
TextGrid supports multiple tiers in a file, and supports both points and intervals. However, TextGrid does not support:
- empty files (files with no tiers);
- alternative labels or locations: only the ones with the best score are saved;
- alternative tags (here called “text”);
- controlled vocabularies;
- hierarchy;
- metadata;
- media assignment;
- disjoint intervals;
- radius.
Both “short TextGrid” and “long TextGrid” file formats are supported.
- __init__(name=None)
Initialize a new sppasTextGrid instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)
Check whether a file is of TextGrid format or not.
Try first to open the file with the default sppas encoding, then UTF-16.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
anndata.aio.readwrite module
- filename
sppas.anndata.aio.readwrite.py
- author
Brigitte Bigi
- contact
- summary
Main reader/writer of annotated files.
- class anndata.aio.readwrite.FileFormatProperty(extension)
Bases:
object
Represent one format and its properties.
- class anndata.aio.readwrite.sppasTrsRW(filename)
Bases:
object
Main parser of annotated data: reader and writer of annotated data.
All three types of annotated files are supported: ANNOT, MEASURE and TABLE.
- TRANSCRIPTION_TYPES = {'IntensityTier': <class 'anndata.aio.praat.sppasIntensityTier'>, 'PitchTier': <class 'anndata.aio.praat.sppasPitchTier'>, 'TextGrid': <class 'anndata.aio.praat.sppasTextGrid'>, 'ant': <class 'anndata.aio.annotationpro.sppasANT'>, 'antx': <class 'anndata.aio.annotationpro.sppasANTX'>, 'anvil': <class 'anndata.aio.anvil.sppasAnvil'>, 'arff': <class 'anndata.aio.table.sppasARFF'>, 'aup': <class 'anndata.aio.audacity.sppasAudacity'>, 'csv': <class 'anndata.aio.text.sppasCSV'>, 'ctm': <class 'anndata.aio.sclite.sppasCTM'>, 'eaf': <class 'anndata.aio.elan.sppasEAF'>, 'hz': <class 'anndata.aio.phonedit.sppasSignaix'>, 'lab': <class 'anndata.aio.htk.sppasLab'>, 'mrk': <class 'anndata.aio.phonedit.sppasMRK'>, 'srt': <class 'anndata.aio.subtitle.sppasSubRip'>, 'stm': <class 'anndata.aio.sclite.sppasSTM'>, 'sub': <class 'anndata.aio.subtitle.sppasSubViewer'>, 'tdf': <class 'anndata.aio.xtrans.sppasTDF'>, 'tra': <class 'anndata.aio.table.sppasTRA'>, 'trs': <class 'anndata.aio.transcriber.sppasTRS'>, 'txt': <class 'anndata.aio.text.sppasRawText'>, 'vtt': <class 'anndata.aio.subtitle.sppasWebVTT'>, 'xra': <class 'anndata.aio.xra.sppasXRA'>, 'xrff': <class 'anndata.aio.table.sppasXRFF'>}
- static create_trs_from_extension(filename)
Return a transcription according to a given filename.
Only the extension of the filename is used.
- Parameters
filename – (str)
- Returns
Transcription()
- static create_trs_from_heuristic(filename)
Return a transcription according to a given filename.
The given file is opened and a heuristic determines its format.
- Parameters
filename – (str)
- Returns
Transcription()
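The two strategies of sppasTrsRW differ in whether the file is opened. The self-contained sketch below uses two hypothetical reader classes: the extension lookup mirrors the TRANSCRIPTION_TYPES mapping, and the heuristic asks each class whether it recognizes the first lines of the file. The detection rules, the lowercased keys and the detect_lines hook are assumptions for illustration; the real classes expose detect(filename) instead.

```python
import os
from typing import Dict, List, Optional, Type


class BaseReader:
    """Hypothetical stand-in for sppasBaseIO: only the detection hook is modelled."""

    @staticmethod
    def detect_lines(lines: List[str]) -> bool:
        return False


class TextGridReader(BaseReader):
    @staticmethod
    def detect_lines(lines: List[str]) -> bool:
        # Both short and long TextGrid files declare their object class in the header.
        return any('Object class = "TextGrid"' in line for line in lines[:3])


class CTMReader(BaseReader):
    @staticmethod
    def detect_lines(lines: List[str]) -> bool:
        # First non-comment line: <F> <C> <BT> <DUR> word [<CONF>]
        for line in lines:
            if line.strip() and not line.startswith(";;"):
                return len(line.split()) in (5, 6)
        return False


# Extension -> class, in the spirit of TRANSCRIPTION_TYPES (keys lowercased here).
READERS: Dict[str, Type[BaseReader]] = {"textgrid": TextGridReader, "ctm": CTMReader}


def reader_from_extension(filename: str) -> Optional[Type[BaseReader]]:
    """Only the extension is used; the file is never opened."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return READERS.get(extension)


def reader_from_lines(lines: List[str]) -> Optional[Type[BaseReader]]:
    """Ask each candidate format whether it recognizes the content."""
    for reader in READERS.values():
        if reader.detect_lines(lines):
            return reader
    return None


def reader_from_heuristic(filename: str) -> Optional[Type[BaseReader]]:
    """Open the file, read its first 20 lines and guess the format from them."""
    with open(filename, encoding="utf-8", errors="replace") as fp:
        head = [line for _, line in zip(range(20), fp)]
    return reader_from_lines(head)
```

The extension lookup is cheap but trusts the filename; the heuristic costs a file read but still works when a file has an unexpected extension.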
anndata.aio.sclite module
- filename
sppas.src.anndata.aio.sclite.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of SCTK formats.
Sclite readers and writers: ctm, stm file formats. The program sclite is a tool for scoring and evaluating the output of speech recognition systems.
Sclite is part of the NIST SCTK Scoring Toolkit: https://www.nist.gov/itl/iad/mig/tools
File formats description: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm#ctm_fmt_name_0
Remark:
Because these formats allow comments, the readers and writers use them as an opportunity to store metadata.
- class anndata.aio.sclite.sppasBaseSclite(name=None)
Bases:
anndata.aio.text.sppasBaseText
SPPAS base Sclite reader and writer.
The current version does not fully support alternations.
- class anndata.aio.sclite.sppasCTM(name=None)
Bases:
anndata.aio.sclite.sppasBaseSclite
SPPAS ctm reader and writer.
This is the reader/writer of the time marked conversation input files to be used for scoring the output of speech recognizers via the NIST sclite() program. This file format is as follows (in BNF):
CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]
- where:
- <F> -> The waveform filename.
NOTE: no path names or extensions are expected.
- <C> -> The waveform channel. Either “A” or “B”.
The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.
- <BT> -> The begin time (seconds) of the word, measured from the start time of the file.
- <DUR> -> The duration (seconds) of the word.
- <CONF> -> Optional confidence score.
The file must be sorted by the first three columns: the first and the second in ASCII order, and the third in numeric order.
Lines beginning with ‘;;’ are considered comments and are ignored by sclite. Blank lines are also ignored.
Not implemented: alternations, which are accepted in some extended CTM files. Examples:
;;
7654 A * * <ALT_BEGIN>
7654 A 12.00 0.34 UM
7654 A * * <ALT>
7654 A 12.00 0.34 UH
7654 A * * <ALT_END>
;;
5555 A * * <ALT_BEGIN>
5555 A 222.77 0.32 BYEBYE
5555 A * * <ALT>
5555 A 222.78 0.12 BYE
5555 A 222.93 0.16 BYE
5555 A * * <ALT_END>
;;
5555 A * * <ALT_BEGIN>
5555 A 186.32 0.01 D-
5555 A * * <ALT>
5555 A * * <ALT_END>
- __init__(name=None)
Initialize a new CTM instance.
- Parameters
name – (str) This transcription name.
- static check_line(line, line_number=0)
Check whether a line is an annotation or not.
Raises AioLineFormatError() or ValueError() in case of a malformed line.
- Parameters
line – (str)
line_number – (int)
- Returns
(bool)
- static detect(filename)
Check whether a file is of CTM format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static get_score(line)
Return the score of the label of a given line.
- Parameters
line – (str)
- Returns
(float) or None if no score is given
- get_tier(line)
Return the tier related to the given line.
Find the tier or create it.
- Parameters
line – (str)
- Returns
(sppasTier)
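A CTM line maps directly onto the BNF given for sppasCTM. The minimal parser below is a sketch, not the sppasCTM code: it skips comments and blank lines and rejects lines without five or six fields. Extended alternation lines such as “7654 A * * <ALT_BEGIN>” raise ValueError, consistent with alternations not being implemented.

```python
from typing import NamedTuple, Optional


class CTMWord(NamedTuple):
    filename: str        # <F>
    channel: str         # <C>
    begin: float         # <BT>, in seconds
    duration: float      # <DUR>, in seconds
    word: str
    confidence: Optional[float]  # optional <CONF>

    @property
    def end(self) -> float:
        return self.begin + self.duration


def parse_ctm_line(line: str) -> Optional[CTMWord]:
    """Return the word of a CTM line, or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"Malformed CTM line: {line!r}")
    # float() raises ValueError on "*", so alternation markers are rejected too.
    confidence = float(fields[5]) if len(fields) == 6 else None
    return CTMWord(fields[0], fields[1], float(fields[2]), float(fields[3]),
                   fields[4], confidence)
```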
- class anndata.aio.sclite.sppasSTM(name=None)
Bases:
anndata.aio.sclite.sppasBaseSclite
SPPAS stm reader and writer.
This is the reader/writer for the segment time marked files to be used for scoring the output of speech recognizers via the NIST sclite() program.
STM :== <F> <C> <S> <BT> <ET> [ <LABEL> ] transcript …
- where:
- <F> -> The waveform filename.
NOTE: no path names or extensions are expected.
- <C> -> The waveform channel. Either “A” or “B”.
The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.
- <S> -> The speaker id; no restrictions apply to this name.
- <BT> -> The begin time (seconds) of the word, measured from the start time of the file.
- <ET> -> The end time (seconds) of the segment.
- <LABEL> -> A comma-separated list of subset identifiers enclosed in angle brackets.
- transcript -> The transcript can take one of two forms:
1) a whitespace-separated list of words, or
2) the string “IGNORE_TIME_SEGMENT_IN_SCORING”.
The list of words can contain a transcript alternation using the following BNF format:
ALTERNATE :== “{” <text> ALT+ “}”
ALT :== “|” <text>
TEXT :== 1 thru n words | “@” | ALTERNATE
The file must be sorted by the first and second columns in ASCII order, and the fourth in numeric order.
Lines beginning with ‘;;’ are considered comments and are ignored. Blank lines are also ignored.
- __init__(name=None)
Initialize a new STM instance.
- Parameters
name – (str) This transcription name.
- static check_line(line, line_number=0)
Check whether a line is an annotation or not.
Raises AioLineFormatError() or ValueError() in case of a malformed line.
- Parameters
line – (str)
line_number – (int)
- Returns
(bool)
- static detect(filename)
Check whether a file is of STM format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- get_tier(line)
Return the tier related to the given line.
Find the tier or create it.
- Parameters
line – (str)
- Returns
(sppasTier)
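An STM line can be split the same way as a CTM line, with the optional <LABEL> field detected by its angle brackets. This sketch (not the sppasSTM code) keeps the transcript, including any “{ | }” alternation, as raw text with normalized spacing; it assumes that subset identifiers contain no whitespace.

```python
from typing import NamedTuple, Optional, Tuple

IGNORE = "IGNORE_TIME_SEGMENT_IN_SCORING"


class STMSegment(NamedTuple):
    filename: str               # <F>
    channel: str                # <C>
    speaker: str                # <S>
    begin: float                # <BT>, in seconds
    end: float                  # <ET>, in seconds
    labels: Tuple[str, ...]     # subset identifiers from <LABEL>, possibly empty
    transcript: str

    @property
    def ignored(self) -> bool:
        return self.transcript == IGNORE


def parse_stm_line(line: str) -> Optional[STMSegment]:
    """Return the segment of an STM line, or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"Malformed STM line: {line!r}")
    filename, channel, speaker = fields[:3]
    begin, end = float(fields[3]), float(fields[4])
    rest = fields[5:]
    labels: Tuple[str, ...] = ()
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        labels = tuple(rest[0][1:-1].split(","))
        rest = rest[1:]
    return STMSegment(filename, channel, speaker, begin, end, labels, " ".join(rest))
```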
anndata.aio.subtitle module
- filename
sppas.src.anndata.aio.subtitle.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of subtitles formats (.sub, .srt…).
SubViewer is a utility for adding and synchronizing subtitles to video content. It was created by David Dolinski in 1999. Precision in time is 10ms.
SubRip is a free software program for Windows which “rips” (extracts) subtitles and their timings from video. It is free software, released under the GNU GPL. SubRip is also the name of the widely used and broadly compatible subtitle text file format created by this software. Precision in time is 1ms.
WebVTT (Web Video Text Tracks) is a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track> element. Main differences from SubRip are:
WebVTT’s first line starts with WEBVTT after the optional UTF-8 byte order mark
There is space for optional header data between the first line and the first cue
Timecode fractional values are separated by a full stop instead of a comma
Timecode hours are optional
The frame numbering/identification preceding the timecode is optional
Comments identified by the word NOTE can be added
Metadata information can be added in a JSON-style format
Chapter information can be optionally specified
Only supports extended characters as UTF-8
CSS in a separate file defined in the companion HTML document for C tags is used instead of the FONT tag
Cue settings allow the customization of cue positioning on the video
- class anndata.aio.subtitle.sppasBaseSubtitles(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS base class for subtitle formats.
- class anndata.aio.subtitle.sppasSubRip(name=None)
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for SRT format.
The SubRip text file format (SRT) is used by the SubRip program to save subtitles ripped from video files or DVDs. It is free software, released under the GNU GPL.
Each subtitle is represented as a group of lines, and subtitles are separated by a blank line:
the first line of a subtitle is an index (starting from 1);
the second line is a timestamp interval, in the format HH:MM:SS,mmm, with the start and end of the range separated by “-->”;
optionally: a specific positioning in pixels, in the form X1:number Y1:number X2:number Y2:number;
the third line is the label. The HTML <b>, <i>, <u>, and <font> tags are allowed.
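SRT timestamps use a comma before the milliseconds, whereas WebVTT uses a full stop, as listed in the differences above. A small formatting sketch with hypothetical helper names (not part of the SPPAS API); it rounds to the nearest millisecond, the SRT precision.

```python
def format_timestamp(seconds: float, vtt: bool = False) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    separator = "." if vtt else ","
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def srt_cue(index: int, begin: float, end: float, text: str) -> str:
    """Build one SRT subtitle: index line, time interval line, then the label.

    Cues are meant to be joined with "\n", which leaves a blank line between them.
    """
    return f"{index}\n{format_timestamp(begin)} --> {format_timestamp(end)}\n{text}\n"
```

For example, 3723.5 seconds becomes "01:02:03,500" in SRT and "01:02:03.500" in WebVTT.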
- class anndata.aio.subtitle.sppasSubViewer(name=None)
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for SUB format.
The SubViewer text file format (SUB) is used by the SubViewer program to save subtitles of videos.
- class anndata.aio.subtitle.sppasWebVTT(name=None)
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for VTT format.
Only the writer is implemented so far.
anndata.aio.table module
- filename
sppas.src.anndata.aio.table.py
- author
Brigitte Bigi
- contact
- summary
Export annotated data into time-tables.
Weka is a collection of machine learning algorithms for data mining tasks: https://www.cs.waikato.ac.nz/ml/weka/
WEKA supports two file formats:
ARFF: a simple ASCII file,
XRFF: an XML file which can be compressed with gzip.
This module also implements the TRA format of SPPAS: the Table Rich Annotations format.
Only writers for ARFF, XRFF and TRA are implemented.
- class anndata.aio.table.sppasARFF(name=None)
Bases:
anndata.aio.table.sppasTable
SPPAS ARFF writer.
The ARFF format description is at the following URL: http://weka.wikispaces.com/ARFF+(book+version)
An ARFF file for WEKA has the following structure:
Several lines starting by ‘%’ with any kind of comment,
The name of the relation,
The set of attributes,
The set of instances.
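The four parts listed above can be produced as plain text. A minimal ARFF writer sketch, not the sppasARFF implementation: write_arff is hypothetical, it does not quote nominal values containing spaces or commas, and the relation and attribute names used below are invented for illustration.

```python
from typing import Iterable, Sequence, Tuple, Union

# An attribute type is either an ARFF keyword such as "NUMERIC" or a list of nominal values.
AttributeType = Union[str, Sequence[str]]


def write_arff(relation: str,
               attributes: Sequence[Tuple[str, AttributeType]],
               instances: Iterable[Sequence[object]],
               comments: Iterable[str] = ()) -> str:
    lines = ["% " + comment for comment in comments]   # comment lines
    lines.append("@RELATION " + relation)              # name of the relation
    lines.append("")
    for name, kind in attributes:                      # set of attributes
        if not isinstance(kind, str):
            kind = "{" + ",".join(kind) + "}"          # nominal attribute
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines.append("")
    lines.append("@DATA")                              # set of instances
    for row in instances:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"
```

For instance, write_arff("phones", [("duration", "NUMERIC"), ("phone", ["a", "e"])], [(0.12, "a")]) declares one numeric and one nominal attribute, followed by one instance line "0.12,a".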
- class anndata.aio.table.sppasTRA(name=None)
Bases:
anndata.aio.table.sppasTable
SPPAS TRA writer: the Table Rich Annotations format.
This format contains the set of instances separated by ‘;’. It can be easily parsed like a CSV file.
- __init__(name=None)
Initialize a new sppasTRA instance.
- Parameters
name – (str) This transcription name.
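Since TRA fields are separated by ‘;’, Python's csv module can read them with a custom delimiter. This is a sketch only: read_tra is hypothetical, and the column layout of TRA files is not described here, so the sample rows in the usage are invented.

```python
import csv
import io
from typing import List


def read_tra(text: str) -> List[List[str]]:
    """Split a TRA text into rows of fields, using ';' as the delimiter."""
    # csv also handles quoted fields that contain the delimiter itself.
    return [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]
```

For example, read_tra("0.0;0.25;a\n0.25;0.5;b\n") returns two rows of three string fields each.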
- class anndata.aio.table.sppasTable(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS Base writer for ARFF and XRFF formats.
The following metadata of the Transcription object can be defined:
- table_instance_step: time step for the data instances. Do not define it if “table_instance_anchor” is set to a tier.
- table_max_class_tags
- table_max_attributes_tags
- table_empty_annotation_tag
- table_empty_annotation_class_tag
- table_uncertain_annotation_tag
The following metadata can be defined in a tier:
- table_attribute is set if the tier will be used as an attribute (i.e. its data will be part of the instances). The value can be “numeric” to use distributions of probabilities or “label” to use the annotation labels in the vector of parameters.
- table_class is set on the tier with the annotation labels to be inferred by the classification system. Its value does not matter.
- table_instance_anchor is set if the tier has to be used to define the time intervals of the instances.
- table_epsilon: probability of an unobserved tag.
Notice that the anchor tier can also be either an attribute tier or the class tier. TODO: there is a known bug when the anchor tier is also the class tier.
- __init__(name=None)
Initialize a new sppasTable instance.
- Parameters
name – (str) This transcription name.
- static check_max_attributes_tags(nb_tags)
Check the maximum number of tags for an attribute.
- Parameters
nb_tags – (int) Size of the controlled vocabulary of the attribute tier
- static check_max_class_tags(nb_tags)
Check the maximum number of tags for the class.
- Parameters
nb_tags – (int) Size of the controlled vocabulary of the class tier
- set_empty_annotation_class_tag(tag_str=None)
Set the annotation string used to replace empty annotations in the class tier.
- Parameters
tag_str – (str or None) If None, unlabelled annotations are not filled, so they are ignored in the data.
- set_empty_annotation_tag(tag_str)
Set the annotation string used to replace empty annotations.
- Parameters
tag_str – (str)
- set_max_attributes_tags(nb_tags)
Set the maximum number of tags for an attribute.
Beyond this number, the program won’t list the attribute values and will use ‘STRING’ instead.
- Parameters
nb_tags – (int) Size of the controlled vocabulary of the attribute tier
- set_max_class_tags(nb_tags)
Set the maximum number of tags for a class.
- Parameters
nb_tags – (int) Size of the controlled vocabulary of the class tier
- set_uncertain_annotation_tag(tag_str)
Set the annotation string that is used in the annotations to mention an uncertain label.
- Parameters
tag_str – (str)
- class anndata.aio.table.sppasXRFF(name=None)
Bases:
anndata.aio.table.sppasTable
SPPAS XRFF writer.
XML-based format of the WEKA software tool. The XRFF format description is at the following URL: http://weka.wikispaces.com/XRFF
- This class is limited as follows:
Only the writers are implemented; there are no readers.
The sparse option is supported by neither writer.
The XRFF output file is not gzipped.
The XRFF format supports the following features, which are not currently implemented in this class:
attribute weights;
instance weights.
No guarantee: this class has never been tested.
anndata.aio.text module
- filename
sppas.src.anndata.aio.text.py
- author
Brigitte Bigi
- contact
- summary
Text readers and writers for raw text, column-based text, csv.
- class anndata.aio.text.sppasBaseText(name=None)
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS base text reader and writer.
- __init__(name=None)
Initialize a new sppasBaseText instance.
- Parameters
name – (str) This transcription name.
- static create_media(media_url, meta_object)[source]¶
Return the media of the given name (create it if necessary).
- Parameters
media_url – (str) Name (url) of the media to search/create
meta_object – (sppasTranscription)
- Returns
(sppasMedia)
- static fix_location(content_begin, content_end)[source]¶
Fix the location from the content of the data.
- Parameters
content_begin – (str) The content of a column representing the beginning of a location.
content_end – (str) The content of a column representing the end of a location.
- Returns
sppasLocation or None
- static format_quotation_marks(text)[source]¶
Remove the initial and final quotation marks.
- Parameters
text – (str/unicode) Text to clean
- Returns
(unicode) The text without its initial and final quotation marks.
- static get_lines_columns(lines)[source]¶
Check whether the lines are column-delimited and, if so, search for the relevant separator.
- Parameters
lines – (list of str)
- Returns
(list) For each line, the list of its columns (list of str).
- static is_comment(line)[source]¶
Check if the line is a comment, i.e. it starts with ‘;;’.
- Parameters
line – (str/unicode)
- Returns
(bool)
- static make_point(data)[source]¶
Convert data into the appropriate sppasPoint().
If data is an integer, no radius is set. If data is a float, a default radius of 0.001 seconds is used.
- Parameters
data – (any type)
- Returns
sppasPoint().
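A minimal sketch of the conversion rule above. It uses a hypothetical `Point` dataclass as a stand-in for `sppasPoint` (the real constructor is not documented here) and assumes that digit-only strings count as integers.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Point:
    """Hypothetical stand-in for sppasPoint: a midpoint and an optional radius."""
    midpoint: Union[int, float]
    radius: Optional[float] = None


def make_point(data) -> Point:
    """Integers (or digit-only strings) get no radius; anything else is
    read as a float time value with a default radius of 0.001 seconds."""
    if isinstance(data, str) and data.strip().isdigit():
        return Point(int(data))
    if isinstance(data, int) and not isinstance(data, bool):
        return Point(data)
    return Point(float(data), radius=0.001)


print(make_point("12"))    # Point(midpoint=12, radius=None)
print(make_point("1.25"))  # Point(midpoint=1.25, radius=0.001)
```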
- static serialize_header(filename, meta_object)[source]¶
Create a comment with the metadata to be written.
- Parameters
filename – (str) Name of the file to serialize.
meta_object – (sppasMeta)
- class anndata.aio.text.sppasCSV(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS CSV reader and writer.
- Author
Brigitte Bigi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
- __init__(name=None)[source]¶
Initialize a new CSV instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of CSV format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- format_columns_lines(lines)[source]¶
Append the content of the lines into self.
The algorithm does not assume that the file is sorted by tiers.
- Parameters
lines – (list)
- read(filename, signed=True)[source]¶
Read a CSV file.
- Parameters
filename – (str)
signed – (bool) Indicate if the encoding is UTF-8 signed.
If False, the default encoding is used.
- write(filename, signed=True)[source]¶
Write a CSV file.
Because a label must fit on a single line, whitespace is used to separate labels (instead of CR, as in other formats such as TextGrid).
- Parameters
filename – (str)
signed – (bool) Indicate if the encoding is UTF-8 signed.
If False, the default encoding is used.
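As a hedged sketch of the two points above, the following uses Python's standard `csv` module. "UTF-8 signed" is read here as UTF-8 with a BOM (the `utf-8-sig` codec), and the default encoding is assumed to be plain UTF-8. The column layout (tier name, begin, end, label) and both function names are assumptions for illustration, not the SPPAS writer.

```python
import csv
import io


def write_rows(stream, annotations):
    """Write (tier, begin, end, labels) tuples as CSV lines.

    Several labels are joined with a single space so that each
    annotation stays on one line.
    """
    writer = csv.writer(stream)
    for tier, begin, end, labels in annotations:
        writer.writerow([tier, f"{begin:.3f}", f"{end:.3f}", " ".join(labels)])


def write_csv(filename, annotations, signed=True):
    # "utf-8-sig" prepends a BOM; plain "utf-8" does not.
    encoding = "utf-8-sig" if signed else "utf-8"
    with open(filename, "w", encoding=encoding, newline="") as fp:
        write_rows(fp, annotations)


buffer = io.StringIO()
write_rows(buffer, [("Tokens", 0.0, 1.2, ["le", "chat"])])
print(buffer.getvalue(), end="")  # Tokens,0.000,1.200,le chat
```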
- class anndata.aio.text.sppasRawText(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS raw text reader and writer.
- Author
Brigitte Bigi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
RawText supports multiple tiers for reading, but can write only one tier. RawText accepts files with no tiers. RawText does not support alternative labels or locations: only the ones with the best score are saved. RawText does not support controlled vocabularies, hierarchy, metadata, or media assignment. RawText supports points and intervals, but not disjoint intervals. RawText does not support alternative tags or radius.
RawText supports comments: such lines start with ‘;;’.
- __init__(name=None)[source]¶
Initialize a new sppasRawText instance.
- Parameters
name – (str) This transcription name.
- read(filename)[source]¶
Read a raw file and fill the Transcription.
The file can be a simple raw text (without location information). It can also be a column-based (table-style) file, where each column represents the annotation of a tier (the 1st and 2nd columns indicate the location).
- Parameters
filename – (str)
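A minimal sketch of the column-based case, assuming tab-separated columns and that every column after the first two holds one tier. The ‘;;’ comment rule comes from the class description. The function name and the returned dictionary are hypothetical.

```python
from typing import Dict, List, Tuple


def read_columns(lines: List[str]) -> Dict[int, List[Tuple[float, float, str]]]:
    """Group the annotations of a tab-separated table by tier index.

    Columns 0 and 1 are the begin and end times; each following column
    is the label of one tier. Lines starting with ';;' are comments.
    """
    tiers: Dict[int, List[Tuple[float, float, str]]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(";;"):
            continue
        columns = line.split("\t")
        begin, end = float(columns[0]), float(columns[1])
        for index, label in enumerate(columns[2:], start=1):
            tiers.setdefault(index, []).append((begin, end, label))
    return tiers


table = [";; a comment", "0.0\t0.5\thello\tHH", "0.5\t1.2\tworld\tWW"]
print(read_columns(table))
```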
anndata.aio.transcriber module¶
- filename
sppas.src.anndata.aio.transcriber.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of the deprecated Transcriber transcription tool.
Transcriber is a tool for assisting the manual annotation of speech signals. It provides a graphical user interface for segmenting long duration speech recordings, transcribing them, and labeling speech turns, topic changes and acoustic conditions. It is more specifically designed for the annotation of broadcast news recordings.
- class anndata.aio.transcriber.sppasTRS(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS reader for TRS format.
- __init__(name=None)[source]¶
Initialize a new sppasTRS instance.
- Parameters
name – (str) This transcription name.
anndata.aio.xra module¶
- filename
sppas.src.anndata.aio.xra.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of SPPAS native file formats (.xra, .jra).
- class anndata.aio.xra.sppasJRA[source]¶
Bases:
object
JRA is intended to be the next default format of annotated files.
- class anndata.aio.xra.sppasXRA(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS XRA reader and writer.
XRA files are the native file format of the GPL tool SPPAS.
- __init__(name=None)[source]¶
Initialize a new XRA instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of XRA format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static format_annotation(annotation_root, annotation)[source]¶
Add an ‘Annotation’ element in the tree from a sppasAnnotation().
- Parameters
annotation_root – (ET) XML Element tree root.
annotation – (sppasAnnotation)
- static format_label(label_root, label)[source]¶
Add a ‘Label’ element in the tree from a sppasLabel().
- Parameters
label_root – (ET) XML Element tree root.
label – (sppasLabel)
- static format_location(location_root, location)[source]¶
Add a ‘Location’ element in the tree from a sppasLocation().
- Parameters
location_root – (ET) XML Element tree root.
location – (sppasLocation)
- static format_metadata(metadata_root, meta_object, exclude=[])[source]¶
Add a ‘Metadata’ element in the tree from a sppasMetaData().
- Parameters
metadata_root – (ET) XML Element tree root.
meta_object – (sppasMetadata)
exclude – (list) List of keys to exclude
- static format_tier(tier_root, tier)[source]¶
Add a ‘Tier’ element in the tree from a sppasTier().
- Parameters
tier_root – (ET) XML Element tree root.
tier – (sppasTier)
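Each `format_*` helper appends one child element to an XML tree. A minimal sketch of that pattern with the standard `xml.etree.ElementTree` module, for one interval annotation: the element names ‘Tier’, ‘Annotation’, ‘Location’ and ‘Label’ come from the descriptions above, while the inner elements (‘Interval’, ‘Begin’, ‘End’, ‘Tag’) and the ‘midpoint’ and ‘tiername’ attributes are assumptions, not a transcription of the XRA schema.

```python
import xml.etree.ElementTree as ET


def format_annotation(tier_root: ET.Element, begin: float, end: float, text: str) -> ET.Element:
    """Append an 'Annotation' element with one interval and one label."""
    annotation_root = ET.SubElement(tier_root, "Annotation")

    # Location: a single interval (assumed layout).
    location_root = ET.SubElement(annotation_root, "Location")
    interval = ET.SubElement(location_root, "Interval")
    ET.SubElement(interval, "Begin", midpoint=str(begin))
    ET.SubElement(interval, "End", midpoint=str(end))

    # Label: a single tag (assumed layout).
    label_root = ET.SubElement(annotation_root, "Label")
    tag = ET.SubElement(label_root, "Tag")
    tag.text = text
    return annotation_root


tier = ET.Element("Tier", tiername="Tokens")
format_annotation(tier, 0.0, 0.5, "hello")
print(ET.tostring(tier, encoding="unicode"))
```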
anndata.aio.xtrans module¶
- filename
sppas.src.anndata.aio.xtrans.py
- author
Brigitte Bigi
- contact
- summary
Input/Output of XTrans.
XTrans is a multi-platform, multilingual, multi-channel transcription tool that supports manual transcription and annotation of audio recordings. The last version of XTrans was released in 2009.
https://www.ldc.upenn.edu/language-resources/tools/xtrans
- class anndata.aio.xtrans.sppasTDF(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS TDF reader.
This class implements a TDF reader, but not a writer. TDF is a Tab-Delimited Format. It contains 13 columns but SPPAS only extracts 8 of them.
TDF supports several tiers and media assignment. TDF supports intervals only. TDF does not support alternative labels or locations: only the ones with the best score are saved. TDF does not support controlled vocabularies, hierarchy, metadata, alternative tags, or radius.
- __init__(name=None)[source]¶
Initialize a new sppasTDF instance.
- Parameters
name – (str) This transcription name.
Module contents¶
- filename
sppas.src.anndata.aio.__init__.py
- author
Brigitte Bigi
- contact
- summary
Readers and writers of annotated data.
- anndata.aio.format_label(text, empty='', tag_type='str')[source]¶
Create a label from a text.
Use the “{ | }” system to parse the alternative tags and = for scores.
- Parameters
text – (str)
empty – (str) The text representing an empty tag.
tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).
- Returns
sppasLabel
- anndata.aio.format_labels(text, separator='\n', empty='', tag_type='str')[source]¶
Create a set of labels from a text.
Use the separator to split the text into labels. Use the “{ | }” system to parse the alternative tags.
- Examples
text = "{le|les} {chat|chats}" is 2 labels with 2 tags each
text = "{le=0.6|les=0.4}" is a label with 2 tags and their scores
- Parameters
text – (str)
separator – (str) String to separate labels.
empty – (str) The text representing an empty tag.
tag_type – (str): One of: (‘str’, ‘int’, ‘float’, ‘bool’).
- Returns
list of sppasLabel
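A minimal sketch of the “{ | }” and “=” conventions for a single label. It assumes that a text without braces is one tag, that alternatives are separated by ‘|’, and that a numeric score follows the last ‘=’. The function name is hypothetical, and unlike `format_label` it returns plain `(tag, score)` pairs instead of a `sppasLabel`.

```python
from typing import List, Optional, Tuple


def parse_alternatives(text: str) -> List[Tuple[str, Optional[float]]]:
    """Split '{le=0.6|les=0.4}' into [('le', 0.6), ('les', 0.4)]."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        items = text[1:-1].split("|")
    else:
        items = [text]

    tags = []
    for item in items:
        content, sep, score = item.rpartition("=")
        if sep:
            try:
                tags.append((content.strip(), float(score)))
                continue
            except ValueError:
                pass  # not a number: keep the whole item as the tag
        tags.append((item.strip(), None))
    return tags


print(parse_alternatives("{le=0.6|les=0.4}"))  # [('le', 0.6), ('les', 0.4)]
print(parse_alternatives("{chat|chats}"))      # [('chat', None), ('chats', None)]
```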
- anndata.aio.serialize_label(label, empty='', alt=True)[source]¶
Convert the label into a string, optionally including the alternative tags.
Use the “{ | }” system to serialize the alternative tags. Scores of the tags are not returned.
- Parameters
label – (sppasLabel)
empty – (str) The text to return if a tag is empty or not set.
alt – (bool) Include alternative tags
- Returns
(str)
- anndata.aio.serialize_labels(labels, separator='\n', empty='', alt=True)[source]¶
Create a text from a list of labels.
Use the separator to join the labels. Use the “{ | }” system to serialize the alternative tags.
- Parameters
labels – (list of sppasLabel)
separator – (str) String separating labels
empty – (str) The text representing an empty tag
alt – (bool) Include alternative tags. If False, only the best tag is serialized.
- Returns
(str)
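The serialization direction can be sketched the same way. Here a label is assumed to be a plain list of `(tag, score)` pairs sorted by decreasing score, a hypothetical stand-in for `sppasLabel`. With `alt=False` only the first (best) tag is kept, and scores are dropped, as stated for `serialize_label`.

```python
from typing import List, Optional, Sequence, Tuple

Label = Sequence[Tuple[str, Optional[float]]]


def serialize_label(label: Label, empty: str = "", alt: bool = True) -> str:
    """Return 'tag' or '{tag1|tag2}'; scores are not written."""
    tags = [tag if tag else empty for tag, _score in label]
    if not tags:
        return empty
    if not alt or len(tags) == 1:
        return tags[0]
    return "{" + "|".join(tags) + "}"


def serialize_labels(labels: List[Label], separator: str = "\n",
                     empty: str = "", alt: bool = True) -> str:
    """Join the serialized labels with the separator."""
    return separator.join(serialize_label(label, empty, alt) for label in labels)


labels = [[("le", 0.6), ("les", 0.4)], [("chat", None)]]
print(serialize_labels(labels, separator=" "))             # {le|les} chat
print(serialize_labels(labels, separator=" ", alt=False))  # le chat
```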
- class anndata.aio.sppasANT(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
AnnotationPro ANT reader and writer.
An ANT file is a zipped directory.
- __init__(name=None)[source]¶
Initialize a new sppasANT instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasANTX(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
AnnotationPro ANTX reader and writer.
- __init__(name=None)[source]¶
Initialize a new sppasANTX instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of ANTX format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- class anndata.aio.sppasARFF(name=None)[source]¶
Bases:
anndata.aio.table.sppasTable
SPPAS ARFF writer.
The ARFF format description is at the following URL: http://weka.wikispaces.com/ARFF+(book+version)
An ARFF file for WEKA has the following structure:
Several lines starting with ‘%’, containing any kind of comment,
The name of the relation,
The set of attributes,
The set of instances.
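To make the four parts concrete, here is a hedged sketch that builds a tiny ARFF document as a string. `@RELATION`, `@ATTRIBUTE`, `@DATA`, `NUMERIC`, `STRING` and nominal `{...}` types are standard ARFF keywords. The relation, attributes and values are invented, and this is not the SPPAS writer.

```python
def make_arff(relation, attributes, instances, comments=()):
    """Return ARFF text: comments, relation, attributes, then instances.

    attributes: list of (name, type) where type is 'NUMERIC', 'STRING'
    or a list of nominal values.
    """
    lines = [f"% {comment}" for comment in comments]
    lines.append(f"@RELATION {relation}")
    lines.append("")
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines.append("")
    lines.append("@DATA")
    for instance in instances:
        lines.append(",".join(str(value) for value in instance))
    return "\n".join(lines) + "\n"


text = make_arff(
    "phonemes",
    [("duration", "NUMERIC"), ("class", ["a", "e", "i"])],
    [(0.08, "a"), (0.12, "i")],
    comments=["Hypothetical example"],
)
print(text)
```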
- class anndata.aio.sppasAnvil(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
ANVIL reader (partial support).
- Author
Brigitte Bigi, Jibril Saffi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
- __init__(name=None)[source]¶
Initialize a new ANVIL instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasAudacity(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
Audacity projects reader.
Can work on both Audacity projects and Audacity Label tracks.
- __init__(name=None)[source]¶
Initialize a new sppasAudacity instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of AUP format or not. AUP files are encoded in UTF-8 without BOM.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static normalize(name)[source]¶
Provide namespaces in element names.
- Example:
<Element '{http://audacity.sourceforge.net/xml/}simpleblockfile' at 0x03270230>
<Element '{http://audacity.sourceforge.net/xml/}envelope' at 0x032702C0>
<Element '{http://audacity.sourceforge.net/xml/}labeltrack' at 0x03270C50>
<Element '{http://audacity.sourceforge.net/xml/}label' at 0x032701E8>
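These element names use ElementTree's `{uri}local` notation for namespaced tags. A minimal sketch of such a helper, assuming it simply prefixes the Audacity namespace URI shown in the example (not necessarily the SPPAS implementation). The small project string and its `t`, `t1` and `title` label attributes are an invented sample.

```python
import xml.etree.ElementTree as ET

AUDACITY_NS = "http://audacity.sourceforge.net/xml/"


def normalize(name: str) -> str:
    """Return the namespace-qualified name used by ElementTree."""
    if name.startswith("{"):
        return name  # already qualified
    return "{" + AUDACITY_NS + "}" + name


project = ET.fromstring(
    '<project xmlns="http://audacity.sourceforge.net/xml/">'
    '<labeltrack name="Labels"><label t="0.5" t1="1.2" title="hello"/></labeltrack>'
    "</project>"
)
for label in project.iter(normalize("label")):
    print(label.get("t"), label.get("t1"), label.get("title"))
```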
- class anndata.aio.sppasCSV(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS CSV reader and writer.
- Author
Brigitte Bigi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
- __init__(name=None)[source]¶
Initialize a new CSV instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of CSV format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- format_columns_lines(lines)[source]¶
Append the content of the lines into self.
The algorithm does not assume that the file is sorted by tiers.
- Parameters
lines – (list)
- read(filename, signed=True)[source]¶
Read a CSV file.
- Parameters
filename – (str)
signed – (bool) Indicate if the encoding is UTF-8 signed.
If False, the default encoding is used.
- write(filename, signed=True)[source]¶
Write a CSV file.
Because a label must fit on a single line, whitespace is used to separate labels (instead of CR, as in other formats such as TextGrid).
- Parameters
filename – (str)
signed – (bool) Indicate if the encoding is UTF-8 signed.
If False, the default encoding is used.
- class anndata.aio.sppasCTM(name=None)[source]¶
Bases:
anndata.aio.sclite.sppasBaseSclite
SPPAS ctm reader and writer.
This is the reader/writer of the time-marked conversation input files used for scoring the output of speech recognizers with the NIST sclite() program. This file format is as follows (in BNF):
CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]
- where:
- <F> -> The waveform filename.
NOTE: no path-names or extensions are expected.
- <C> -> The waveform channel. Either “A” or “B”.
The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.
- <BT> -> The begin time (seconds) of the word, measured from the start time of the file.
- <DUR> -> The duration (seconds) of the word.
- <CONF> -> Optional confidence score.
The file must be sorted by the first three columns: the first and the second in ASCII order, and the third in numeric order.
Lines beginning with ‘;;’ are considered comments and ignored by sclite. Blank lines are also ignored.
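A minimal sketch of reading one CTM line according to the grammar above, skipping comments and blank lines. The named tuple and function name are hypothetical. Malformed lines raise `ValueError`, in the spirit of `check_line`. The extended `<ALT_BEGIN>` lines are not handled.

```python
from typing import NamedTuple, Optional


class CtmWord(NamedTuple):
    filename: str
    channel: str
    begin: float
    duration: float
    word: str
    confidence: Optional[float]


def parse_ctm_line(line: str) -> Optional[CtmWord]:
    """Return a CtmWord, or None for a comment or blank line."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"Expected 5 or 6 columns, got {len(fields)}: {line!r}")
    confidence = float(fields[5]) if len(fields) == 6 else None
    return CtmWord(fields[0], fields[1], float(fields[2]), float(fields[3]),
                   fields[4], confidence)


print(parse_ctm_line("7654 A 12.00 0.34 UM 0.87"))
print(parse_ctm_line(";; a comment"))  # None
```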
Not implemented: alternations, which are accepted in some extended CTM files. Examples:
;;
7654 A * * <ALT_BEGIN>
7654 A 12.00 0.34 UM
7654 A * * <ALT>
7654 A 12.00 0.34 UH
7654 A * * <ALT_END>
;;
5555 A * * <ALT_BEGIN>
5555 A 222.77 0.32 BYEBYE
5555 A * * <ALT>
5555 A 222.78 0.12 BYE
5555 A 222.93 0.16 BYE
5555 A * * <ALT_END>
;;
5555 A * * <ALT_BEGIN>
5555 A 186.32 0.01 D-
5555 A * * <ALT>
5555 A * * <ALT_END>
- __init__(name=None)[source]¶
Initialize a new CTM instance.
- Parameters
name – (str) This transcription name.
- static check_line(line, line_number=0)[source]¶
Check whether a line is an annotation or not.
Raises AioLineFormatError() or ValueError() in case of a malformed line.
- Parameters
line – (str)
line_number – (int)
- Returns
(bool)
- static detect(filename)[source]¶
Check whether a file is of CTM format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static get_score(line)[source]¶
Return the score of the label of a given line.
- Parameters
line – (str)
- Returns
(float) or None if no score is given
- get_tier(line)[source]¶
Return the tier related to the given line.
Find the tier or create it.
- Parameters
line – (str)
- Returns
(sppasTier)
- class anndata.aio.sppasEAF(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
Elan EAF reader and writer.
- __init__(name=None)[source]¶
Initialize a new sppasEAF instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of EAF format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- format_point(second_count)[source]¶
Convert a time in seconds into ELAN format.
- Parameters
second_count – (float) Time value (in seconds)
- Returns
(int) A time in ELAN format (milliseconds).
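ELAN stores time slot values as integer milliseconds. A hedged sketch of the conversion, assuming rounding to the nearest millisecond (the SPPAS method may truncate instead):

```python
def format_point(second_count: float) -> int:
    """Convert a time in seconds into an ELAN time value (milliseconds)."""
    # Rounding avoids off-by-one results such as 1233 for 1.234 s.
    return int(round(float(second_count) * 1000.0))


print(format_point(1.234))  # 1234
```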
- class anndata.aio.sppasIntensityTier(name=None)[source]¶
Bases:
anndata.aio.praat.sppasPitchTier
SPPAS IntensityTier reader and writer.
- __init__(name=None)[source]¶
Initialize a new sppasIntensityTier instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasLab(name=None)[source]¶
Bases:
anndata.aio.htk.sppasBaseHTK
SPPAS LAB reader and writer.
Each line of an HTK label file contains the actual label, optionally preceded by start and end times, and optionally followed by a match score.
[<start> <end>] <<name> [<score>]> [“;” <comment>]
Multiple alternatives are written as a sequence of separate label lists separated by three slashes (///).
- Examples:
simple transcription:
0000000 3600000 ice
3600000 8200000 cream
alternative labels:
0000000 2200000 I
2200000 8200000 scream
///
0000000 3600000 ice
3600000 8200000 cream
///
0000000 3600000 eyes
3600000 8200000 cream
Only simple transcription is implemented so far.
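In HTK label files, times are integers in units of 100 ns, so 3600000 stands for 0.36 seconds. A minimal sketch of reading the simple transcription case, assuming that the start and end times are either both present or both absent. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def parse_lab_line(line: str) -> Tuple[Optional[float], Optional[float], str]:
    """Return (begin, end, name) with times in seconds, or None if absent."""
    fields = line.split(";", 1)[0].split()  # drop an optional ';' comment
    if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
        begin = int(fields[0]) / HTK_UNITS_PER_SECOND
        end = int(fields[1]) / HTK_UNITS_PER_SECOND
        return begin, end, fields[2]
    return None, None, fields[0]


lines: List[str] = ["0000000 3600000 ice", "3600000 8200000 cream"]
print([parse_lab_line(line) for line in lines])
# [(0.0, 0.36, 'ice'), (0.36, 0.82, 'cream')]
```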
- __init__(name=None)[source]¶
Initialize a new sppasLab instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasMRK(name=None)[source]¶
Bases:
anndata.aio.phonedit.sppasBasePhonedit
Reader and writer of Phonedit MRK files.
Example of the old format:
[DSC_LEVEL_AA]
DSC_LEVEL_NAME="transcription"
DSC_LEVEL_CREATION_DATE=2018/03/09 09:57:07
DSC_LEVEL_LASTMODIF_DATE=2018/03/09 09:57:07
DSC_LEVEL_SOFTWARE=Phonedit Application 4.2.0.8
[LBL_LEVEL_AA]
LBL_LEVEL_AA_000000= "#" 0.000000 2497.100755
LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038
LBL_LEVEL_AA_000002= "#" 5683.888038 5743.602653
LBL_LEVEL_AA_000003= "ipu_2" 5743.602653 8460.595544
The new MRK format includes sections for time slots.
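A minimal sketch of reading the label lines of the old format. It assumes that each `LBL_LEVEL_` value is a label in straight double quotes followed by two numbers (begin and end, in whatever unit the file uses). The regular expression and function name are hypothetical and ignore the `DSC_` description lines.

```python
import re
from typing import Optional, Tuple

LBL_LINE = re.compile(r'^LBL_LEVEL_(\w+?)_(\d+)=\s*"([^"]*)"\s+(\S+)\s+(\S+)\s*$')


def parse_mrk_label(line: str) -> Optional[Tuple[str, str, float, float]]:
    """Return (level, label, begin, end) for an LBL_LEVEL_ line, else None."""
    match = LBL_LINE.match(line.strip())
    if match is None:
        return None
    level, _index, label, begin, end = match.groups()
    return level, label, float(begin), float(end)


print(parse_mrk_label('LBL_LEVEL_AA_000001= "ipu_1" 2497.100755 5683.888038'))
# ('AA', 'ipu_1', 2497.100755, 5683.888038)
print(parse_mrk_label("[LBL_LEVEL_AA]"))  # None
```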
- __init__(name=None)[source]¶
Initialize a new sppasMRK instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of MRK format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- class anndata.aio.sppasPitchTier(name=None)[source]¶
Bases:
anndata.aio.praat.sppasBaseNumericalTier
SPPAS PitchTier reader and writer.
- __init__(name=None)[source]¶
Initialize a new sppasPitchTier instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of PitchTier format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- class anndata.aio.sppasRawText(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS raw text reader and writer.
- Author
Brigitte Bigi
- Organization
Laboratoire Parole et Langage, Aix-en-Provence, France
- Contact
- License
GPL, v3
- Copyright
Copyright (C) 2011-2018 Brigitte Bigi
RawText supports multiple tiers for reading, but can write only one tier. RawText accepts files with no tiers. RawText does not support alternative labels or locations: only the ones with the best score are saved. RawText does not support controlled vocabularies, hierarchy, metadata, or media assignment. RawText supports points and intervals, but not disjoint intervals. RawText does not support alternative tags or radius.
RawText supports comments: such lines start with ‘;;’.
- __init__(name=None)[source]¶
Initialize a new sppasRawText instance.
- Parameters
name – (str) This transcription name.
- read(filename)[source]¶
Read a raw file and fill the Transcription.
The file can be a simple raw text (without location information). It can also be a column-based (table-style) file, where each column represents the annotation of a tier (the 1st and 2nd columns indicate the location).
- Parameters
filename – (str)
- class anndata.aio.sppasSTM(name=None)[source]¶
Bases:
anndata.aio.sclite.sppasBaseSclite
SPPAS stm reader and writer.
This is the reader/writer for the segment time marked files to be used for scoring the output of speech recognizers via the NIST sclite() program.
STM :== <F> <C> <S> <BT> <ET> [ <LABEL> ] transcript …
- where:
- <F> -> The waveform filename.
NOTE: no pathnames or extensions are expected.
- <C> -> The waveform channel. Either “A” or “B”.
The text of the waveform channel is not restricted by sclite. The text can be any text string without whitespace so long as the matching string is found in both the reference and hypothesis input files.
- <S> -> The speaker id. No restrictions apply to this name.
- <BT> -> The begin time (seconds) of the segment, measured from the start time of the file.
- <ET> -> The end time (seconds) of the segment.
- <LABEL> -> A comma-separated list of subset identifiers enclosed in angle brackets.
- transcript -> The transcript can take on two forms:
1) a whitespace-separated list of words, or
2) the string “IGNORE_TIME_SEGMENT_IN_SCORING”.
The list of words can contain a transcript alternation using the following BNF format:
ALTERNATE :== "{" <text> ALT+ "}"
ALT :== "|" <text>
TEXT :== 1 thru n words | "@" | ALTERNATE
The file must be sorted by the first and second columns in ASCII order, and the fourth in numeric order.
Lines beginning with ‘;;’ are considered comments and are ignored. Blank lines are also ignored.
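A minimal sketch of splitting one STM line into the fields listed above. It assumes whitespace separation and an optional <LABEL> that contains no whitespace and is recognized by its angle brackets. Alternations inside the transcript are kept as raw text. The names are hypothetical.

```python
from typing import NamedTuple, Optional


class StmSegment(NamedTuple):
    filename: str
    channel: str
    speaker: str
    begin: float
    end: float
    label: Optional[str]
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Return a StmSegment, or None for a comment or blank line."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"Expected at least 5 columns: {line!r}")
    label = None
    rest = fields[5:]
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        label, rest = rest[0], rest[1:]
    return StmSegment(fields[0], fields[1], fields[2], float(fields[3]),
                      float(fields[4]), label, " ".join(rest))


seg = parse_stm_line("2345 A spk1 10.50 12.75 <o,f0,male> hello { world | word }")
print(seg.speaker, seg.begin, seg.end, seg.label, seg.transcript)
```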
- __init__(name=None)[source]¶
Initialize a new STM instance.
- Parameters
name – (str) This transcription name.
- static check_line(line, line_number=0)[source]¶
Check whether a line is an annotation or not.
Raises AioLineFormatError() or ValueError() in case of a malformed line.
- Parameters
line – (str)
line_number – (int)
- Returns
(bool)
- static detect(filename)[source]¶
Check whether a file is of STM format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- get_tier(line)[source]¶
Return the tier related to the given line.
Find the tier or create it.
- Parameters
line – (str)
- Returns
(sppasTier)
- class anndata.aio.sppasSignaix(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
Reader and writer of F0 values from LPL-Signaix.
- __init__(name=None)[source]¶
Initialize a new sppasSignaix instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of Signaix format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- read(filename, delta=0.01)[source]¶
Read a file with pitch values sampled every delta seconds.
The file contains one value per line. If the audio file is 30 seconds long and delta is 0.01, the file is expected to contain 100 * 30 = 3,000 lines.
- Parameters
filename – (str) Input filename.
delta – (float) Sampling period of the file, in seconds. The default is one F0 value every 10 ms, i.e. 100 values per second.
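With one value per line and a fixed sampling period, the time of the i-th value is i * delta. A hedged sketch, assuming the first value is at time 0 (not stated above). The function name is hypothetical.

```python
from typing import List, Tuple


def timestamp_values(lines: List[str], delta: float = 0.01) -> List[Tuple[float, float]]:
    """Pair each F0 value with its time, assuming the first value is at 0 s."""
    points = []
    for index, line in enumerate(lines):
        line = line.strip()
        if line:
            # Round the time to hide floating-point noise such as 0.030000000000000002.
            points.append((round(index * delta, 6), float(line)))
    return points


values = ["0", "120.5", "122.0"]
print(timestamp_values(values))  # [(0.0, 0.0), (0.01, 120.5), (0.02, 122.0)]
```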
- class anndata.aio.sppasSubRip(name=None)[source]¶
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for SRT format.
The SubRip text file format (SRT) is used by the SubRip program to save subtitles ripped from video files or DVDs. SubRip is free software, released under the GNU GPL.
Each subtitle is represented as a group of lines. Subtitles are separated by a blank line.
the first line of a subtitle is an index (starting from 1);
the second line is a timestamp interval, in the format %H:%M:%S,%m, with the start and end of the range separated by -->;
optionally: a specific positioning by pixels, in the form X1:number Y1:number X2:number Y2:number;
the third line is the label. The HTML <b>, <i>, <u>, and <font> tags are allowed.
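A minimal sketch of one SRT subtitle block as described above, with milliseconds after a comma. The function names are hypothetical, and the optional pixel positioning is omitted.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_block(index: int, begin: float, end: float, text: str) -> str:
    """Return one subtitle: index, time interval, text, then a blank line."""
    return f"{index}\n{srt_timestamp(begin)} --> {srt_timestamp(end)}\n{text}\n\n"


print(srt_block(1, 3661.5, 3663.25, "Hello world"), end="")
```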
- class anndata.aio.sppasSubViewer(name=None)[source]¶
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for SUB format.
The SubViewer text file format (SUB) is used by the SubViewer program to save subtitles of videos.
- class anndata.aio.sppasTDF(name=None)[source]¶
Bases:
anndata.aio.text.sppasBaseText
SPPAS TDF reader.
This class implements a TDF reader, but not a writer. TDF is a Tab-Delimited Format. It contains 13 columns but SPPAS only extracts 8 of them.
TDF supports several tiers and media assignment. TDF supports intervals only. TDF does not support alternative labels or locations: only the ones with the best score are saved. TDF does not support controlled vocabularies, hierarchy, metadata, alternative tags, or radius.
- __init__(name=None)[source]¶
Initialize a new sppasTDF instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasTRA(name=None)[source]¶
Bases:
anndata.aio.table.sppasTable
SPPAS TRA writer: the Table Rich Annotations format.
This format contains the set of instances separated by ‘;’. It can easily be parsed like a CSV file.
- __init__(name=None)[source]¶
Initialize a new sppasTRA instance.
- Parameters
name – (str) This transcription name.
- class anndata.aio.sppasTextGrid(name=None)[source]¶
Bases:
anndata.aio.praat.sppasBasePraat
SPPAS TextGrid reader and writer.
TextGrid supports multiple tiers in a file, with points or intervals. TextGrid does not support empty files (files with no tiers). TextGrid does not support alternative labels or locations: only the ones with the best score are saved. TextGrid does not support controlled vocabularies, hierarchy, metadata, media assignment, disjoint intervals, alternative tags (here called “text”), or radius.
Both “short TextGrid” and “long TextGrid” file formats are supported.
- __init__(name=None)[source]¶
Initialize a new sppasTextGrid instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of TextGrid format or not.
It first tries to open the file with the default SPPAS encoding, then with UTF-16.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
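A minimal sketch of this two-step detection. It assumes UTF-8 (with or without BOM) as the default encoding and checks for the `File type = "ooTextFile"` and `Object class = "TextGrid"` header lines that Praat writes at the top of both long and short text TextGrids. This is an illustration, not the SPPAS method.

```python
import os
import tempfile


def detect_textgrid(filename: str) -> bool:
    """Return True if the file starts with a Praat TextGrid header."""
    for encoding in ("utf-8-sig", "utf-16"):
        try:
            with open(filename, "r", encoding=encoding) as fp:
                first_line = fp.readline()
                second_line = fp.readline()
        except (UnicodeError, OSError):
            continue  # wrong encoding (or unreadable file): try the next one
        if "ooTextFile" in first_line and "TextGrid" in second_line:
            return True
    return False


header = 'File type = "ooTextFile"\nObject class = "TextGrid"\n'
with tempfile.TemporaryDirectory() as folder:
    textgrid_path = os.path.join(folder, "sample.TextGrid")
    with open(textgrid_path, "w", encoding="utf-16") as fp:
        fp.write(header)
    other_path = os.path.join(folder, "notes.txt")
    with open(other_path, "w", encoding="utf-8") as fp:
        fp.write("hello\n")
    results = (detect_textgrid(textgrid_path), detect_textgrid(other_path))

print(results)  # (True, False)
```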
- class anndata.aio.sppasWebVTT(name=None)[source]¶
Bases:
anndata.aio.subtitle.sppasBaseSubtitles
SPPAS reader/writer for VTT format.
Only the writer is implemented so far.
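A hedged sketch of the output side: a WebVTT file starts with a `WEBVTT` line, and cue timings put a ‘.’ before the milliseconds (where SRT uses ‘,’). The function names are hypothetical, and cue settings and identifiers are omitted.

```python
from typing import Iterable, Tuple


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def write_vtt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Return the text of a WebVTT file made of (begin, end, text) cues."""
    blocks = ["WEBVTT", ""]
    for begin, end, text in cues:
        blocks.append(f"{vtt_timestamp(begin)} --> {vtt_timestamp(end)}")
        blocks.append(text)
        blocks.append("")  # a blank line ends each cue
    return "\n".join(blocks)


print(write_vtt([(0.0, 1.5, "Hello"), (1.5, 2.25, "world")]))
```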
- class anndata.aio.sppasXRA(name=None)[source]¶
Bases:
anndata.aio.basetrsio.sppasBaseIO
SPPAS XRA reader and writer.
XRA files are the native file format of the GPL tool SPPAS.
- __init__(name=None)[source]¶
Initialize a new XRA instance.
- Parameters
name – (str) This transcription name.
- static detect(filename)[source]¶
Check whether a file is of XRA format or not.
- Parameters
filename – (str) Name of the file to check.
- Returns
(bool)
- static format_annotation(annotation_root, annotation)[source]¶
Add an ‘Annotation’ element in the tree from a sppasAnnotation().
- Parameters
annotation_root – (ET) XML Element tree root.
annotation – (sppasAnnotation)
- static format_label(label_root, label)[source]¶
Add a ‘Label’ element in the tree from a sppasLabel().
- Parameters
label_root – (ET) XML Element tree root.
label – (sppasLabel)
- static format_location(location_root, location)[source]¶
Add a ‘Location’ element in the tree from a sppasLocation().
- Parameters
location_root – (ET) XML Element tree root.
location – (sppasLocation)
- static format_metadata(metadata_root, meta_object, exclude=[])[source]¶
Add a ‘Metadata’ element in the tree from a sppasMetaData().
- Parameters
metadata_root – (ET) XML Element tree root.
meta_object – (sppasMetadata)
exclude – (list) List of keys to exclude
- static format_tier(tier_root, tier)[source]¶
Add a ‘Tier’ element in the tree from a sppasTier().
- Parameters
tier_root – (ET) XML Element tree root.
tier – (sppasTier)
- class anndata.aio.sppasXRFF(name=None)[source]¶
Bases:
anndata.aio.table.sppasTable
SPPAS XRFF writer.
XML-based format of the WEKA software tool. The XRFF format description is at the following URL: http://weka.wikispaces.com/XRFF
- This class is limited to:
Only writers are implemented, no readers.
The sparse option is supported by neither writer.
The XRFF output file is not gzipped.
The XRFF format supports the following features, which are not currently implemented in this class:
attribute weights;
instance weights.
No guarantee: this class has never been tested.