The automatic annotation and analysis of speech

Examples of SPPAS Solutions

Table of content

Before using SPPAS

A SPPAS Solution for Speech Segmentation

A SPPAS Solution for Selfies Segmentation

Prepare the data

Data preparation is the only path for good annotations then for a reliable analysis.

Description:

Examples of this tutorial are performed on the "Demo" files. In this way, you can follow the instructions and test yourself on these files.

The audio and video files are provided in the SPPAS packages, or, you can download directly from the following links:

Ready to start?

Launch the GUI of SPPAS then add both files in the workspace and check at least one of them, or open a Terminal application and change directory to the one of the SPPAS package.

Video preview
Video: Launch SPPAS GUI and add demo files

A SPPAS Solution for Speech Segmentation

Speech segmentation is the alignment of the speech recording with a phonetic transcription of the speech. SPPAS offers various solutions to perform it fully automatically or semi-automatically. This example is one of these solutions.

This solution starts with a semi-automatic macro-segmentation: the automatic search for Inter-Pausal Units (e.g. sounding segments) then the manual orthographic transcription. Three automatic annotations are then performing speech segmentation: Text Normalization, Phonetization and Alignment.

Step 1: Search for IPUs

What is it?

Given a speech recording, the goal of this task is to generate an annotation file in which the sounding segments between silences are marked. There are several parameters that must be fixed to get the expected result. Read the following paper:

Brigitte Bigi, Béatrice Priego-Valverde (2019). Search for Inter-Pausal Units: application to Cheese! corpus. In 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 289-293, Poznań, Poland.

What are the input/output files?

Search for IPUs requires an audio file: the better quality, the better result! Supported format is WAV with only one channel.

Input/Output of Search for IPUs

How can you do?

Use the GUI or the CLI to perform the STANDALONE "Search for IPUs" automatic annotation. In the given example, there's no need to change the default parameters because it's read speech in French language.

Read either the paper, or the book to properly configure the parameters.

Video preview
Video: SPPAS GUI - perform the search for IPUs

What's the result?

The result is an annotated file indicating boundaries of the sounding segments. It contains only one tier with name "IPUs". However, if the resulting sounding segments are not like you expected, it means either:

Screenshot of automatic annotation Search for IPUs
Image: Result of the automatic annotation "Search for IPUs".

Step 2: Orthographic transcription

What is it?

An orthographic transcription is often the minimum requirement for a speech corpus; it is at the top of the annotation procedure, and it is the entry point for most of the automatic annotations. A transcription convention is designed to provide rules for writing speech corpora. This convention establishes what are the phenomena to transcribe and also how to mention them in the orthography. From the beginning of its development, it was considered to be essential for SPPAS to deal with an Enriched Orthographic Transcription (EOT).

The SPPAS package contains the PDF of the transcription convention, or it can be downloaded from here.

Brigitte Bigi, Pauline Péri, Roxane Bertrand (2012). Orthographic Transcription: which enrichment is required for phonetization? In Proceedings of the Eight International Conference on Language Resources and Evaluation, pp. 1756–1763, Istanbul, Turkey.

What are the input/output files?

Input/Output of Transcription of IPUs

How can you do?

Whatever the solution you choose, the result of this step must be an annotation file (xra, textgrid, eaf, mrk...) with the orthographic transcription time-aligned at the IPUs level.

The following is a 100% SPPAS solution:

Notice that the page "Edit" of the GUI is still under development. Any constructive comment or suggestion or bug report is welcome.

The better alternative is to use Praat.
However, you must take care to configure it to save files in UTF-8 encoding.

Video preview
Video: SPPAS GUI - perform the orthographic transcription

What is the result?

The result is an annotated file with an exact orthographic transcription performed into the IPUs.

Screenshot of an IPU of the manual orthographic transcription annotation
Image: An IPU of the manual orthographic transcription annotation.

Step 3: Text normalization

What is it about?

The first task faced by any speech or language processing system is the conversion of input text/transcription into a linguistic representation. Speech transcriptions contain truncated words, orthographic reductions, etc. Normalizing or rewriting such texts using ordinary words is an important issue for various applications. Among the essential steps of building a corpus, word segmentation is a necessary but highly challenging task for some languages. SPPAS implements a generic approach, i.e., a text normalization method as language and task independent as possible.

Brigitte Bigi (2014). A Multilingual Text Normalization Approach. Human Language Technology Challenges for Computer Science and Linguistics, LNAI 8387, pp. 515–526.

What are the input/output files?

Input/Output of Text Normalization

How to do it?

Use the GUI or the CLI to perform the Text Normalization STANDALONE automatic annotation. Select "fra" language. In the GUI, you can check the option Create a tier with the standard tokens, but for this file, it is not really useful because it's read speech, so we didn't transcribe any specific mispronounciation or other phenomena.

Video preview
Video: SPPAS GUI - perform the text normalization

What is the result?

Screenshot of automatic annotation Text Normalization
Image: Result of the automatic annotation "Text Normalization".

Step 4: Phonetization

What is it about?

Phonetization is the process of representing sounds by phonetic symbols. The program for the phonetization of the orthographic transcription produces a phonetic transcription based on a phonetic dictionary. An important step is then to build the pronunciation dictionary, where each word in the vocabulary is expanded into its constituent phones. You can customize the pronounciation dictionaries of SPPAS: do it yourself!

Brigitte Bigi (2016). A phonetization approach for the forced-alignment task in SPPAS. Human Language Technology. Challenges for Computer Science and Linguistics, LNAI 9561, pp. 515–526.

What are the input/output files?

Input/Output of Phonetization

How to do it?

Use the GUI or the CLI to perform the Phonetization STANDALONE automatic annotation. Select "fra" language.

Video preview
Video: SPPAS GUI - perform the phonetization

What is the result?

Screenshot of automatic annotation Phonetization
Image: Result of the automatic annotation "Phonetization".

Step 5: Alignment

What is it about?

Phonetic alignment consists in a time-matching between a given speech utterance and a phonetic representation of the utterance. SPPAS is based on the Julius Speech Recognition Engine. For each utterance, the orthographic and phonetic transcriptions are used. Julius performs an alignment to identify the temporal boundaries of phones and words.

Brigitte Bigi, Christine Meunier (2018). Automatic segmentation of spontaneous speech. Revista de Estudos da Linguagem. International Thematic Issue: Speech Segmentation, 26(4), pp. 1489-1530

What are the input/output files?

Input/Output of Alignment

How to do it?

Use the GUI or the CLI to perform the Alignment STANDALONE automatic annotation. Select "fra" language. This annotation requires the program "Julius CSR Engine" (or HVite from HTK-Toolkit).

Video preview
Video: SPPAS GUI - perform the alignment

What is the result?

Screenshot of automatic annotation Alignment
Image: Result of the automatic annotation "Alignment".

A SPPAS Solution for Selfies Segmentation

All the following annotations require the "video" feature to be enabled.

Face detection is requiring models that are not installed by default. In the GUI, click the "Add annotations" button and follow instructions.

Step 1: Face detection

How to do it?

Notice that this annotation can be very (very very) slow, and the higher video framerate and image size, the slower it is!

Video preview
Video: SPPAS GUI - perform face detection

One of the possible results

Result of the automatic annotation "Face detection" with the first three models selected

Step 2: Person Face identity

By default, the annotation will produce one file with all detected persons and one file for each detected person.

Results

Result of the automatic annotations "Person Face identity" for the detected person identified as "id0001".
Result of the automatic annotation "Person Face identity".