The automatic annotation and analysis of speech

Download, Install, First use, Extend

I finished the installation. What do I have to do now to execute SPPAS?

The answer is in both the tutorials and the documentation. Double-click the sppas.bat file under Windows, or the sppas.command file under macOS/Linux. Then test the SPPAS features with the provided samples.

SPPAS sends a message with an encoding error.

SPPAS can only deal with UTF-8 encoding. If the input file is not UTF-8, an encoding error message is sent, so the files must be converted.

Solution for TextGrid files: First, Praat has to be configured properly. Execute Praat, then click on the Praat menu. Click on Preferences, then click on "Text writing preferences...". Choose "Output encoding UTF-8" (it's the first choice). Then each file has to be converted: open the file, then save it.

Solution for any file under Windows: Open the file(s) with Notepad++, choose UTF-8 in the "Encoding" menu then save.
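When many files are involved, the conversion can also be scripted. The following is a minimal Python sketch, not part of SPPAS: `convert_to_utf8` is a hypothetical helper name, and it assumes you already know the real encoding of the source file.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding):
    """Rewrite a text file in place as UTF-8.

    source_encoding must be the actual encoding of the file;
    this is an assumption the caller has to get right.
    """
    p = Path(path)
    # Decode the raw bytes with the declared source encoding...
    text = p.read_bytes().decode(source_encoding)
    # ...and write the same text back, encoded as UTF-8.
    p.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("sample.TextGrid", "utf-16")` would convert a UTF-16 file. Since the file is overwritten, it is safer to keep a backup copy first.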

Automatic annotations

Support of a new language?

All automatic annotations included in SPPAS are implemented with language-independent algorithms. This means that adding a new language to SPPAS only consists of adding the linguistic resources related to the annotation (lexicons, dictionaries, models, sets of rules, etc.).

Linguistic resources can be edited, modified, changed or deleted by any user.

Support for new languages is added step by step, by adding linguistic resources, and constructing linguistic resources requires collaboration with linguists. So any help is welcome!

Find more details in chapter 3: each annotation requiring a specific resource has a section "Support of a new language".

[ ERROR ] End TimePoint must be greater than Begin TimePoint

This error message means that the given file contains a degenerate interval, i.e. an interval whose end time is not greater than its begin time. It has to be corrected before using SPPAS.
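To locate such intervals, a check like the following can help. It is only a sketch: it assumes the intervals have already been extracted from the file as (begin, end) pairs in seconds, and `find_degenerate_intervals` is a hypothetical helper, not an SPPAS function.

```python
def find_degenerate_intervals(intervals):
    """Return the indexes of intervals whose end is not greater than begin.

    intervals: list of (begin, end) times, e.g. in seconds.
    """
    return [i for i, (begin, end) in enumerate(intervals) if end <= begin]


intervals = [(0.0, 1.2), (1.2, 1.2), (1.2, 2.5), (2.5, 2.4)]
print(find_degenerate_intervals(intervals))  # [1, 3]
```

Here the second interval has a zero duration and the fourth one ends before it begins, so both would need to be fixed in the annotated file.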

Julius failed to time-align

The procedure outcome report indicates an error message, and the IPU is not time-aligned.

It can happen when something is wrong: Julius is not installed, the audio signal quality is not good, the orthographic transcription does not really match the audio signal, there are errors in the phonetization, etc. You can then try to identify the problem and solve it! You can also enable the basic option, and SPPAS will assign the same duration to each phoneme of that specific IPU.
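The idea behind the basic option, giving every phoneme of the IPU the same duration, can be sketched as follows. This is an illustration of the principle only, not the SPPAS implementation; the function name and the data layout are assumptions.

```python
def equal_duration_alignment(phonemes, begin, end):
    """Split the interval [begin, end] into equal slices, one per phoneme.

    Returns a list of (phoneme, start, stop) tuples.
    """
    if not phonemes:
        return []
    step = (end - begin) / len(phonemes)
    aligned = []
    for i, phoneme in enumerate(phonemes):
        start = begin + i * step
        # Use the exact IPU end for the last phoneme to avoid rounding drift.
        stop = end if i == len(phonemes) - 1 else begin + (i + 1) * step
        aligned.append((phoneme, start, stop))
    return aligned
```

For an IPU lasting from 0.0 to 1.0 second with two phonemes, each phoneme gets a 0.5 second slice.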

How long can the audio files processed by SPPAS be?

SPPAS can process audio files of any length, as long as the computer has enough memory.

[WARNING] Unknown word phonetization

When a token is missing from the pronunciation dictionary, SPPAS tries to phonetize it by analogy with other entries of the dictionary, and this warning message is issued. If the proposed phonetization is the right one, you can ignore it. If not, you can edit the dictionary, then run the phonetization again. See Chapter 3, Section 5 for details and the "Resources Documentation".

Can SPPAS transcribe automatically?

No, and it won't! SPPAS is not an Automatic Speech Recognition (ASR) system.

No existing ASR system is able to produce the high-quality orthographic transcription which is required for further reliable analyses!

Orthographic transcription has to be done manually into IPUs, and it must follow the convention briefly described in (Chapter 3, Section 3) and detailed here.

SPPAS doesn't create annotated files:

Since SPPAS version 2.3, when I run a Python script, I found that SPPAS doesn't create annotated files. My input file name is "bcf2e54f-16ea-46d2-a969-38bda8b9265e.wav". When I remove the "-", it runs. If my file name is "1-1-1.wav", it runs too. Prior to this version, SPPAS didn't behave this way.

There's no way for the workspace manager to tell whether the character "-" is part of a filename or a "root-pattern" separator. I recommend using "_" instead.

In detail: this is not a bug but the consequence of a new feature. From version 2.3, SPPAS annotations are based on the use of "Workspaces". All the advantages of this new feature are available in the new GUI (requires python3 + wx4). In workspaces, filenames like "oriana1.wav", "oriana1.TextGrid" or "oriana1-token.TextGrid" all share the same file root, which is "oriana1", and annotations append a pattern to that root, like "-token". When annotating, the workspace manager searches for the file root in the given filename, so:

  • for "oriana1-token.wav" the root is "oriana1", because a pattern "-token" exists.
  • for "bcf2e54f-16ea-46d2-a969-38bda8b9265e.wav" the root is "bcf2e54f-16ea-46d2-a969", because a pattern "-38bda8b9265e" exists. SPPAS then searches for annotated files like bcf2e54f-16ea-46d2-a969.TextGrid, bcf2e54f-16ea-46d2-a969-token.TextGrid, etc.
  • for "1-1-1.wav" the root is "1-1-1", because a pattern can't be shorter than 2 characters.
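The three cases above can be reproduced with a small sketch. It is not the code of the workspace manager, which may apply further rules: `split_root_pattern` is a hypothetical function, and it assumes that the minimum pattern length is counted without the leading "-".

```python
import os


def split_root_pattern(filename, min_pattern_len=2):
    """Split a file name into (root, pattern).

    The pattern is what follows the last "-" of the base name.
    Assumption: the minimum length is counted without the "-".
    """
    base = os.path.splitext(os.path.basename(filename))[0]
    pos = base.rfind("-")
    # No "-" at all, or a candidate pattern that is too short: no pattern.
    if pos == -1 or len(base) - pos - 1 < min_pattern_len:
        return base, ""
    return base[:pos], base[pos:]


for name in ("oriana1-token.wav",
             "bcf2e54f-16ea-46d2-a969-38bda8b9265e.wav",
             "1-1-1.wav"):
    print(name, "->", split_root_pattern(name))
```

Replacing "-" with "_" in the file name, as recommended above, leaves no candidate pattern, so the whole base name is used as the root.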

Analyses

File conversion

What are the file formats supported by SPPAS?

SPPAS can open, save and convert files in a wide range of file formats including (but not limited to) TextGrid, eaf, antx, mrk, ctm, stm, lab, srt, sub, csv and txt. The full list is available in the documentation, chapter "Introduction", section "Compatibility and Operability".