Careful data preparation is the only path to good annotations, and therefore to a reliable analysis.
Try it yourself! Download the audio and the video recordings of the examples:
Either launch the SPPAS GUI, add both files to the workspace and check at least one of them, or open a Terminal application and change to the directory of the SPPAS package.
Speech segmentation is the alignment of a speech recording with its phonetic transcription. SPPAS offers several ways to perform it, either fully automatically or semi-automatically; this example illustrates one of them.
This solution starts with a semi-automatic macro-segmentation: the automatic search for Inter-Pausal Units (i.e. sounding segments), followed by the manual orthographic transcription. Three automatic annotations then perform the speech segmentation itself: Text Normalization, Phonetization and Alignment.
Given a speech recording, the goal of this task is to generate an annotation file in which the sounding segments between silences are marked. Several parameters must be set to get the expected result; they are discussed in the following paper:
Brigitte Bigi, Béatrice Priego-Valverde (2019). Search for Inter-Pausal Units: application to Cheese! corpus. In 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 289-293, Poznań, Poland.
Use the GUI or the CLI in order to perform the STANDALONE "Search for IPUs" automatic annotation. For this example, there is no need to change the default parameters because it is read speech in French.
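The paper above describes how SPPAS actually searches for IPUs. To make the role of the parameters concrete, here is a deliberately simplified sketch, not SPPAS's implementation: a window is sounding when its RMS energy exceeds a fixed threshold, silences shorter than a minimum duration are bridged, and IPUs shorter than a minimum duration are dropped. The function names, the fixed threshold and the default values are assumptions chosen for illustration.

```python
import math


def rms_frames(samples, rate, win=0.02):
    """Root-mean-square energy of consecutive, non-overlapping windows."""
    size = max(1, int(rate * win))
    frames = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        frames.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return frames, size / rate  # energies, window duration in seconds


def search_ipus(samples, rate, threshold, min_sil=0.2, min_ipu=0.1, win=0.02):
    """Return (start, end) times in seconds of sounding segments (sketch only).

    A window is 'sounding' if its RMS exceeds `threshold`. Silences shorter
    than `min_sil` are absorbed into the surrounding IPU, then IPUs shorter
    than `min_ipu` are discarded. All defaults are illustrative assumptions.
    """
    frames, step = rms_frames(samples, rate, win)
    # 1. Group consecutive sounding windows into raw segments (window indices).
    segments, start = [], None
    for i, value in enumerate(frames):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append([start, i])
            start = None
    if start is not None:
        segments.append([start, len(frames)])
    # 2. Merge segments separated by a silence shorter than min_sil.
    merged = []
    for seg in segments:
        if merged and (seg[0] - merged[-1][1]) * step < min_sil:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    # 3. Drop IPUs shorter than min_ipu and convert indices to seconds.
    return [(round(s * step, 3), round(e * step, 3))
            for s, e in merged if (e - s) * step >= min_ipu]
```

With a signal made of 0.3 s of silence, 0.3 s of sound, a 0.1 s pause and 0.2 s of sound, the 0.2 s minimum silence merges both sounding parts into a single IPU from 0.3 s to 0.9 s, whereas `min_sil=0.05` keeps them apart. This is why the minimum silence duration has to be tuned to the speaking style of the corpus.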
An orthographic transcription is often the minimum requirement for a speech corpus, so it is at the top of the annotation procedure, and it is the entry point for most of the automatic annotations. A transcription convention is designed to provide rules for writing speech corpora: it establishes which phenomena to transcribe and how to represent them in the orthography. From the very beginning of its development, handling an Enriched Orthographic Transcription (EOT) was considered essential for SPPAS. The SPPAS package includes the PDF of such a convention, which can also be downloaded separately. Finally, note that this convention does not depend on any particular software: the orthographic transcription can be performed manually with the Edit page of the SPPAS GUI, with Praat, with Annotation Pro, …
Brigitte Bigi, Pauline Péri, Roxane Bertrand (2012). Orthographic Transcription: which enrichment is required for phonetization? In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pp. 1756–1763, Istanbul, Turkey.
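As an illustration of why an enriched transcription is useful, the sketch below derives two readings from a single EOT string, assuming two rules of the SPPAS convention: parentheses mark an elided sound, e.g. `i(l) y a`, and `[orthography,pronunciation]` marks a specific pronunciation. The regular expressions and the function names `standard` and `spoken` are hypothetical and cover only these two phenomena; the convention itself defines the complete set of rules.

```python
import re

# Assumed EOT markup (only two of the phenomena of the convention):
#   [orthography,pronunciation]  a specific pronunciation
#   (x)                          an elided sound
SPECIFIC = re.compile(r"\[([^,\]]+),([^\]]+)\]")
ELISION = re.compile(r"\(([^)]*)\)")


def standard(eot):
    """Orthographic reading: keep the written form and the elided letters."""
    text = SPECIFIC.sub(r"\1", eot)
    return ELISION.sub(r"\1", text)


def spoken(eot):
    """Pronunciation-oriented reading: keep what was actually said."""
    text = SPECIFIC.sub(r"\2", eot)
    return ELISION.sub("", text)
```

With the hypothetical utterance `i(l) y a [je suis,chui] là`, `standard()` returns `il y a je suis là` and `spoken()` returns `i y a chui là`. Keeping both readings in one transcription lets later steps use the standard form for word-level analysis and the spoken form for phonetization.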
Whichever solution you choose, the result of this step must be an annotation file (xra, textgrid, eaf, mrk...) with the orthographic transcription time-aligned at the IPUs level.
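For instance, if the transcription is stored as a Praat TextGrid, the result is one interval tier in which each IPU carries its text and the pauses stay empty. The minimal writer below sketches that structure, assuming the IPUs are given as `(start, end, text)` tuples in seconds; the function and tier names are hypothetical, and SPPAS itself already reads and writes this format.

```python
def textgrid(intervals, xmax, tier="Transcription"):
    """Serialize (start, end, text) intervals as a Praat TextGrid (long format).

    Gaps between intervals are filled with empty (silent) intervals, because
    an IntervalTier must cover the whole [0, xmax] range without holes.
    """
    filled, cursor = [], 0.0
    for start, end, text in sorted(intervals):
        if start > cursor:
            filled.append((cursor, start, ""))
        filled.append((start, end, text))
        cursor = end
    if cursor < xmax:
        filled.append((cursor, xmax, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(filled)}",
    ]
    for i, (start, end, text) in enumerate(filled, 1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"
```

Calling `textgrid([(0.3, 0.9, "bonjour")], 1.2, "IPUs")` produces three intervals: an empty one from 0 to 0.3 s, the transcribed IPU, and an empty one up to 1.2 s.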
The following is a 100% SPPAS solution:
Note that the "Edit" page of the GUI is still under development. Constructive comments, suggestions and bug reports are welcome.
The first task faced by any speech or language processing system is the conversion of the input text or transcription into a linguistic representation. Speech transcriptions contain truncated words, orthographic reductions, etc. Normalizing such texts, i.e. rewriting them with ordinary words, is an important issue for various applications. Among the essential steps of building a corpus, word segmentation is necessary but highly challenging for some languages. SPPAS implements a generic approach: a text normalization method that is as language- and task-independent as possible.
Brigitte Bigi (2014). A Multilingual Text Normalization Approach. Human Language Technology. Challenges for Computer Science and Linguistics, LNAI 8387, pp. 515–526.
Use the GUI or the CLI in order to perform the Text Normalization STANDALONE automatic annotation. Select "fra" language. In the GUI, you can check the option "Create a tier with the standard tokens", but for this file it is not really useful: it is read speech, and no specific mispronunciation or other phenomenon was transcribed.
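The method in the paper relies on per-language resources and is much richer than this, but the core idea of rewriting a transcription as a sequence of ordinary words can be sketched in a few lines. The tiny lexicons below are hypothetical stand-ins for real language resources. The sketch lower-cases the text, splits off punctuation, spells out a few symbols and digits, and drops punctuation tokens.

```python
import re

# Hypothetical, tiny replacement lexicons (stand-ins for real resources).
REPLACE = {"&": "et", "%": "pour cent"}
NUMBERS = {"1": "un", "2": "deux", "3": "trois"}
PUNCTUATION = set(",.;:!?")


def normalize(utterance):
    """Lower-case, split off punctuation, spell out symbols and small numbers."""
    text = utterance.lower()
    # Surround punctuation and symbols with spaces so they become tokens.
    text = re.sub(r"([,.;:!?&%])", r" \1 ", text)
    tokens = []
    for token in text.split():
        if token in REPLACE:
            tokens.extend(REPLACE[token].split())
        elif token in NUMBERS:
            tokens.append(NUMBERS[token])
        elif token in PUNCTUATION:
            continue  # punctuation is not pronounced: drop it
        else:
            tokens.append(token)
    return tokens
```

For example, `normalize("Paul & Marie ont 2 chats.")` gives `['paul', 'et', 'marie', 'ont', 'deux', 'chats']`, a token sequence that a dictionary-based phonetizer can look up word by word.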
Phonetization is the process of representing sounds by phonetic symbols. The phonetization program converts the orthographic transcription into a phonetic transcription based on a pronunciation dictionary. An important step is therefore to build this dictionary, in which each word of the vocabulary is expanded into its constituent phones. You can customize the pronunciation dictionaries of SPPAS: do it yourself!
Brigitte Bigi (2016). A phonetization approach for the forced-alignment task in SPPAS. Human Language Technology. Challenges for Computer Science and Linguistics, LNAI 9561, pp. 515–526.
Use the GUI or the CLI in order to perform the Phonetization STANDALONE automatic annotation. Select "fra" language.
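Dictionary-based phonetization can be sketched as a lookup that keeps every pronunciation variant of a word, leaving the choice between variants to the alignment step. The dictionary excerpt, the `UNK` placeholder for missing words, and the separators (`-` between phones, `|` between variants) are assumptions made for this illustration; a real phonetizer would also try to guess a pronunciation for words that are not in its dictionary.

```python
# Hypothetical excerpt of a pronunciation dictionary (SAMPA-like symbols).
DICTIONARY = {
    "paul": ["p-O-l"],
    "et": ["e"],
    "marie": ["m-a-R-i"],
    "les": ["l-e", "l-E"],  # two pronunciation variants
}


def phonetize(tokens, unknown="UNK"):
    """Map each token to its pronunciation variants, joined by '|'.

    Words missing from the dictionary get the `unknown` placeholder.
    """
    result = []
    for token in tokens:
        variants = DICTIONARY.get(token)
        result.append("|".join(variants) if variants else unknown)
    return result
```

For instance, `phonetize(["paul", "et", "les", "chats"])` returns `['p-O-l', 'e', 'l-e|l-E', 'UNK']`: the two variants of "les" are both kept, and "chats" is flagged because it is missing from this toy dictionary.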
Phonetic alignment consists of time-matching a given speech utterance with its phonetic representation. SPPAS is based on the Julius Speech Recognition Engine. For each utterance, both the orthographic and the phonetic transcriptions are used, and Julius performs an alignment to identify the temporal boundaries of phones and words.
Brigitte Bigi, Christine Meunier (2018). Automatic segmentation of spontaneous speech. Revista de Estudos da Linguagem. International Thematic Issue: Speech Segmentation, 26(4), pp. 1489-1530.
Use the GUI or the CLI in order to perform the Alignment STANDALONE automatic annotation. Select "fra" language. This annotation requires the "Julius CSR Engine" program (or HVite, from the HTK Toolkit).
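Because the alignment is performed utterance by utterance, the phone boundaries it finds are relative to the start of each IPU and must be shifted back onto the timeline of the whole recording. SPPAS handles this internally; the sketch below only illustrates the idea, assuming the aligner reports inclusive frame indices with one frame every 10 ms. Both the frame convention and the function name are assumptions, not a description of the Julius or SPPAS output format.

```python
def frames_to_intervals(segments, frame_shift=0.01, offset=0.0):
    """Convert (first_frame, last_frame, phone) into (start_s, end_s, phone).

    Assumptions: frame indices are inclusive, one frame lasts `frame_shift`
    seconds, and `offset` is the start time of the IPU in the recording.
    """
    intervals = []
    for first, last, phone in segments:
        start = offset + first * frame_shift
        end = offset + (last + 1) * frame_shift  # inclusive last frame
        intervals.append((round(start, 3), round(end, 3), phone))
    return intervals
```

With an IPU starting at 0.3 s, the frame ranges `(0, 11)`, `(12, 20)` and `(21, 29)` for the phones `p`, `O` and `l` map to 0.3 to 0.42 s, 0.42 to 0.51 s and 0.51 to 0.6 s, so consecutive phones share their boundaries exactly.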
All the following annotations require the "video" feature to be enabled.
Face detection requires models that are not installed by default. To install them from the GUI, click the "Add annotations" button and follow the instructions.
Note that this annotation can be very (very, very) slow: the higher the video frame rate and image size, the slower it is!
By default, the annotation produces one file with all the detected persons, plus one file per detected person.