This documentation is governed by the GNU Free Documentation License, version 1.3. It assumes that you are using a relatively recent version of SPPAS. There’s no reason not to download the latest version as soon as it is released: it’s easy and fast!
Any and all constructive comments are welcome.
SPPAS, the automatic annotation and analyses of speech, is a scientific computer software package. It is developed daily with the aim of providing robust and reliable software. It is available for free, with open source code, and there is simply no other package that linguists can use so simply for the automatic annotation of speech, the analysis of any kind of annotated data and the conversion of annotated file formats. You can imagine the annotations or analyses you need, SPPAS does the rest!
SPPAS favors flexibility over a one-size-fits-all approach in the implementation of its features. There are no fixed or unique solutions, but a set of customizable features that let users make choices according to their requirements.
Annotating recordings is very labor-intensive and costly, since it has to be performed manually by experienced researchers over many hours of work. As its primary functionality, SPPAS proposes a set of automatic or semi-automatic annotations of recordings. In the present context, annotations are "defined as the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data. 'Annotation' can also refer to the end-product of this process" (Leech, 1997). SPPAS automates the annotation processes and allows users to save time. To be used efficiently, SPPAS expects a rigorous methodology for collecting and preparing data.
Linguistic annotation, especially when dealing with multiple domains, makes use of different tools within a given project. This implies a rigorous annotation framework to ensure compatibility between annotations and to save time. SPPAS annotation files are in a specific XML format with the extension xra. Annotations can be imported from and exported to a variety of other formats including Praat (TextGrid, PitchTier, IntensityTier), Elan (eaf), Transcriber (trs), Annotation Pro (antx), Phonedit (mrk), Sclite (ctm, stm), HTK (lab, mlf), subtitles formats (srt, sub), CSV files…
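The formats listed above can be told apart by file extension. The following is a minimal sketch, not SPPAS code: the `FORMATS` table and the `guess_tool` helper are hypothetical names introduced here for illustration, and the mapping only restates the extensions named in this section.

```python
from pathlib import Path

# Extensions named in this section, mapped to the tool they belong to.
# This table is an illustrative assumption, not part of the SPPAS API.
FORMATS = {
    ".xra": "SPPAS",
    ".textgrid": "Praat",
    ".pitchtier": "Praat",
    ".intensitytier": "Praat",
    ".eaf": "Elan",
    ".trs": "Transcriber",
    ".antx": "Annotation Pro",
    ".mrk": "Phonedit",
    ".ctm": "Sclite",
    ".stm": "Sclite",
    ".lab": "HTK",
    ".mlf": "HTK",
    ".srt": "subtitles",
    ".sub": "subtitles",
    ".csv": "CSV",
}


def guess_tool(filename):
    """Return the tool associated with a file extension, or None if unknown."""
    # Compare in lower case so that "TextGrid" and "textgrid" both match.
    return FORMATS.get(Path(filename).suffix.lower())


print(guess_tool("recording.TextGrid"))  # Praat
```

Such a lookup only inspects the file name; it does not verify that the file content is valid for that format.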
"[…] when multiple annotations are integrated into a single data set, inter-relationships between the annotations can be explored both qualitatively (by using database queries that combine levels) and quantitatively (by running statistical analyses or machine learning algorithms)" (Chiarcos 2008).
As a consequence, the annotations must be time-synchronized: annotations need to be time-aligned in order to be useful for purposes such as analyses.
SPPAS offers some special features for managing annotated files and analyzing data. Among others, it includes a tool to filter multi-level annotations. Other included tools estimate descriptive statistics, manage annotated files, manage audio files, etc. These data analysis tools are mainly proposed in the Graphical User Interface. However, advanced users can also directly access the Application Programming Interface, for example to estimate statistics or to manipulate annotated data.
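To illustrate why time-aligned annotations lend themselves to quantitative analysis, here is a minimal sketch in plain Python. It does not use the SPPAS API: the `(start, end, label)` tuples and the `duration_stats` function are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical time-aligned annotations: (start, end, label), in seconds.
annotations = [
    (0.00, 0.25, "a"),
    (0.25, 0.50, "b"),
    (0.50, 1.00, "a"),
]


def duration_stats(items):
    """Group annotation durations by label and summarize them."""
    durations = defaultdict(list)
    for start, end, label in items:
        durations[label].append(end - start)
    # One summary per label: occurrences, total duration, mean duration.
    return {
        label: {"count": len(values), "total": sum(values), "mean": mean(values)}
        for label, values in durations.items()
    }


stats = duration_stats(annotations)
print(stats["a"])  # {'count': 2, 'total': 0.75, 'mean': 0.375}
```

Because each annotation carries its own time interval, statistics of this kind can be computed per label without any reference to the audio signal.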
By using SPPAS, you agree to cite a reference in your publications. The full list of references is available in Chapter 7.
Many problems can be solved by updating the version of SPPAS.
When looking for more detail about some subject, one can search this documentation. This documentation is available online on the SPPAS website; it is also included in the package in PDF format.
An F.A.Q., tutorials and slides are available on the SPPAS website.
Brigitte Bigi has been the main author of SPPAS since January 2011. She holds a tenured researcher position at the French CNRS - Centre National de la Recherche Scientifique. She has been working at Laboratoire Parole et Langage in Aix-en-Provence, France, since 2009.
More about the author:
Contact the author by e-mail:
Do not contact the author if:
Possible e-mails are:
Here is the list of other contributors in programming:
SPPAS software, except documentation and resources, is distributed under the terms of the GNU GENERAL PUBLIC LICENSE v3.
Linguistic resources of SPPAS are distributed under either:
GNU GENERAL PUBLIC LICENSE, v3, or
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.
See the documentation of the resources for details about their individual licenses.
To summarize, SPPAS users are:
SPPAS is regularly supported by the Laboratoire Parole et Langage in many ways.
Partly supported by ANR OTIM project (Ref. Nr. ANR-08-BLAN-0239), Tools for Multimodal Information Processing. Read more at: http://www.lpl-aix.fr/~otim/
Partly supported by ORTOLANG (Ref. Nr. ANR-11-EQPX-0032) funded by the « Investissements d’Avenir » French Government program managed by the French National Research Agency (ANR). Read more at: http://www.ortolang.fr/
SPPAS was also partly carried out thanks to the support of the following projects or groups:
The introduction of Naija language is supported by the ANR NaijaSynCor.
The introduction of workspaces to manage files and the SPEAKER annotation type were both supported by the Vapvisio ANR project (ANR-18-CE28-0011-01).
Both the Proof of Concept of the Cued Speech automatic annotation and the package for the dependencies’ installation were supported by the LPL.
The website of SPPAS is located at the following URL: https://sppas.org
The releases and the source code are hosted by SourceForge at: http://sppas.sf.net/
The main website contains the Download page, which provides recent versions of the SPPAS software package and describes the installation instructions.
There is a single version of SPPAS, which does not depend on the operating system. SPPAS is ready to run, so it does not need an elaborate installation. All you need to do is copy the SPPAS package from the website to somewhere on your computer. Preferably choose a location whose name, including its full path, contains only US-ASCII characters. The SPPAS package is a compressed zip archive, so you will need to unpack it once you have downloaded it.
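The unpacking step and the US-ASCII recommendation can be sketched with the Python standard library. This is only an illustration under stated assumptions: `unpack_package` is a hypothetical helper, not part of SPPAS, and the archive name in the usage comment is a placeholder.

```python
import zipfile
from pathlib import Path


def unpack_package(archive, destination):
    """Unpack a zip archive into destination, refusing non US-ASCII paths."""
    destination = Path(destination).resolve()
    # The check covers the whole absolute path, not only the last folder name.
    if not str(destination).isascii():
        raise ValueError(f"Non US-ASCII characters in path: {destination}")
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as package:
        package.extractall(destination)
    return destination


# Usage, with placeholder names:
# unpack_package("sppas-package.zip", "sppas")
```

Raising an error before extraction keeps a badly named location from being created at all.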
An installation guide is available and must be followed carefully.
Note that administrator rights may be required.
To summarize:
STEP 1: install the recommended version of Python 3.x ONCE (Windows/Linux)
setup.command (Linux, MacOS) to continue the installation and follow the instructions;
.sppaspyenv~, then launch the program sppas\bin\preinstall.py. It allows installing external programs to enable some features, and the linguistic resources.
If you encounter difficulties with this installation, search the web first: it will probably provide the solution. If the problems persist, ask a technician for help.
SPPAS is research software, distributed in the context of Open Science. Thus, unlike many other software tools, SPPAS is not distributed as an executable program. Instead, everything is done so that users can check or change its operations.
It is particularly suitable for automatic annotations: anyone can customize the automatic annotations to their own needs and implement their own annotation solution.
The SPPAS package is a directory containing the following files and folders:
README.txt file, which aims to be read!
setup.command to install some external programs;
sppas.command to launch the Graphical User Interface;
samples directory contains data of various languages: they are proposed in order to test various features of SPPAS;
demo folder contains an audio file, its corresponding orthographic transcription and the video file. It also contains a large amount of annotations automatically created by SPPAS;
plugins directory with one plugin. Others are available on the website;
workspaces directory with an example of a workspace;
sppas directory contains the program itself, i.e. the SPPAS software;
resources directory contains data that are used by automatic annotations (lexicons, dictionaries, …);
documentation directory contains:
scripting_solutions is a set of Python scripts corresponding to the exercises proposed in the chapter Scripting with Python and SPPAS
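After unpacking, the package content described above can be checked quickly. The sketch below is an assumption-laden illustration, not an SPPAS tool: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the list assumes that every entry named in this section sits at the top level of the package directory.

```python
from pathlib import Path

# Entries named in this section; assumed to be at the package top level.
EXPECTED_ENTRIES = [
    "README.txt",
    "setup.command",
    "sppas.command",
    "samples",
    "demo",
    "plugins",
    "workspaces",
    "sppas",
    "resources",
    "documentation",
]


def missing_entries(package_root):
    """Return the expected files and folders not found under package_root."""
    root = Path(package_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Report what is missing from the current directory.
print(missing_entries("."))
```

An empty list suggests the archive was unpacked completely; a long list usually means the wrong folder was given.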
SPPAS is constantly being improved and new packages are published frequently (about 10 versions a year). It is important to update regularly in order to get the latest features and corrections.
Updating SPPAS consists of downloading, decompressing and launching the setup.
There are three main ways to use SPPAS:
The Graphical User Interface (GUI) is as user-friendly as possible. It requires wxPython to be installed, i.e. selected during the Setup step.
sppas.bat file, under Windows;
sppas.command file, under MacOS or Linux.
The Command-line User Interface (CLI), with a set of programs, each one essentially independent of the others, that can be run on its own at the level of the shell.
Advanced users can also directly access the Application Programming Interface (API). Scripting with Python and SPPAS is the most powerful way to use it.
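The choice of GUI launcher per operating system, as listed above, can be expressed as a small sketch. The `gui_launcher` function is a hypothetical helper written for this example; only the two file names come from this section.

```python
import platform


def gui_launcher(system=None):
    """Return the launcher file name for the given or current system."""
    # platform.system() returns "Windows", "Linux" or "Darwin" (MacOS).
    system = system or platform.system()
    return "sppas.bat" if system == "Windows" else "sppas.command"


print(gui_launcher())
```

Passing the system name explicitly makes the function easy to test without changing machines.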
Features of SPPAS can be divided into 3 main categories:
The SPPAS website contains the list of the features and how most of the automatic annotations can be organized in an annotation workflow. See https://sppas.org/features.html
There is a list of important things to keep in mind while annotating with SPPAS. They are summarized as follows and detailed in the chapters of this documentation:
Speech audio files for automatic annotations:
au files are supported
Annotated data files:
It is recommended to use only US-ASCII characters in file names, including their full path.
The quality of the results for most of the automatic annotations is highly influenced by the quality of the data the annotation takes as input. This is a politically correct way to say: Garbage in, garbage out!
Annotations are based on the use of linguistic resources. Resources for several languages are kindly shared and freely available or downloadable. The quality of the automatic annotations is largely influenced by the quality of the linguistic resources.
Users are of crucial importance for resource development.
Do not hesitate to help in checking resource files and sharing your corrections with the community.
The users of SPPAS are invited to contribute to improving the resources. They can release their improvements to the public, so that the whole community benefits.
Any help is welcome to improve existing resources or to create new ones.