4. Convert files
Interoperability and compatibility: an introduction
Converting a file changes the form in which the data are presented, not the data themselves. Whenever a data file is to be used by another application, it must first be converted to a format that this application can read. Data conversion is normally automated, at least to some extent. SPPAS can automatically import and export work done in various file formats from a wide range of other software tools. For users, the only visible change is a different file extension, but for software it is the difference between understanding the contents of the file and being unable to read it.
Converting between file formats is therefore a challenging task, and it can imply that some data are left out. Representing annotated data is of crucial importance to SPPAS: its automatic annotations and its analyses of annotations depend on it, and it is what allows the software to annotate and analyze any kind of data file automatically. SPPAS therefore includes an original and sufficiently generic annotation representation framework. This framework is very rich: it contains both broad information and specific information such as alternative labels or alternative localizations of annotations, a tier hierarchy, controlled vocabularies, etc. A native format named XRA was developed to fit this data representation. The physical level of the XRA representation makes use of XML, XML-related standards and stand-off annotation. Thanks to an intuitive naming convention, XRA documents are as readable as possible within the limits of XML.
SPPAS conversion method
To ensure compatibility between SPPAS data and annotated data from other software tools or programs, SPPAS can open/save and import/export several file formats.
SPPAS makes use of its internal data representation to convert files. A conversion then consists of two steps:
- the incoming file is loaded and mapped to the SPPAS data framework;
- the resulting data are saved in the expected format.
These two steps are applied whatever the organization of annotations in the original or in the destination file format. This process allows SPPAS to import from and export to a variety of file formats. This is illustrated by the next Figure, which lists the supported file formats and the corresponding software. The arrows represent the following actions:
- import from, represented by a solid line with an arrow from the file format to the SPPAS framework;
- partially import from, represented by a dashed line with an arrow from the file format to the SPPAS framework;
- export to, represented by an arrow from the SPPAS framework to the file format.
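The two-step conversion described above can be sketched as follows. This is only a minimal illustration of the load-then-save principle, not the SPPAS implementation: the `Annotation` class, the `read_csv` and `write_txt` functions, the `READERS` and `WRITERS` registries, and the column layouts of the two toy formats are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Annotation:
    """One labelled time interval in the hypothetical in-memory model."""
    tier: str
    begin: float
    end: float
    label: str


def read_csv(text):
    """Step 1 (one possible reader): parse 'tier,begin,end,label' rows."""
    rows = csv.reader(io.StringIO(text))
    return [Annotation(t, float(b), float(e), lab) for t, b, e, lab in rows]


def write_txt(annotations):
    """Step 2 (one possible writer): tab-separated 'begin, end, label' lines.

    The tier name has no place in this flat layout, so it is dropped.
    """
    return "".join(
        f"{a.begin:.3f}\t{a.end:.3f}\t{a.label}\n" for a in annotations
    )


# Registries keyed by file extension (hypothetical): adding a format means
# adding one reader and/or one writer, not one converter per pair of formats.
READERS = {"csv": read_csv}
WRITERS = {"txt": write_txt}


def convert(text, source_ext, target_ext):
    """Load into the common model, then save to the target format."""
    if source_ext not in READERS:
        raise ValueError(f"cannot import from '{source_ext}'")
    if target_ext not in WRITERS:
        raise ValueError(f"cannot export to '{target_ext}'")
    annotations = READERS[source_ext](text)
    return WRITERS[target_ext](annotations)


source = "Tokens,0.0,0.5,hello\nTokens,0.5,1.2,world\n"
result = convert(source, "csv", "txt")
print(result, end="")
```

In this sketch the tier name is silently dropped by the writer, which illustrates why a conversion can lose information when the destination format is less expressive than the internal representation.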
Supported file formats
SPPAS supports the following software with their file extensions:
- Praat: TextGrid, PitchTier, IntensityTier
- Elan: eaf
- Annotation Pro: ant, antx
- Phonedit: mrk
- Sclite: ctm, stm
- HTK: lab, mlf
- Subtitles: srt, sub
- Signaix: hz
- Spreadsheets, R,…: csv
- Audacity, notepads: txt
The following can be imported:
- ANVIL: anvil
- Transcriber: trs
- Xtrans: tdf
The following can be exported:
- Weka: arff, xrff
- Subtitles: vtt
Support for external formats is regularly extended by the author at the request of users, and new formats are distributed to the community in the regular SPPAS updates.
Convert with the UI of SPPAS
The Convert function of the GUI converts the checked files of the workspace into the selected format. A table indicates which features each format supports. This shows what to expect when converting from one format to another. For example, a Praat TextGrid file contains neither metadata, media information, a controlled vocabulary nor a hierarchy. If the file to be converted contains such information, it will be lost in the TextGrid file.
bin/trsconvert.py converts files with the CLI.
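The kind of check that the feature table makes possible can be sketched as follows. This is a hypothetical illustration, not SPPAS code: the `SUPPORTED` table and the `lost_features` function are invented for this example. Only the TextGrid row reflects what is stated above, and the XRA row assumes that the native format keeps all four features.

```python
# Features named in the TextGrid example of this section.
FEATURES = ("metadata", "media", "controlled vocabulary", "hierarchy")

# Hypothetical support table: the "xra" row is an assumption,
# the "TextGrid" row follows the example given in the text.
SUPPORTED = {
    "xra": set(FEATURES),
    "TextGrid": set(),
}


def lost_features(used, target_format):
    """Return the features present in the data that the target cannot store."""
    supported = SUPPORTED[target_format]
    return [name for name in FEATURES if name in used and name not in supported]


used_in_file = {"metadata", "hierarchy"}
print(lost_features(used_in_file, "TextGrid"))  # ['metadata', 'hierarchy']
print(lost_features(used_in_file, "xra"))       # []
```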