Linguistic resources

Linguistic resources for SPPAS enable five automatic annotations: normalization, phonetization, alignment, syllabification, and cued speech.

Since SPPAS algorithms are language-independent, you can annotate any language as long as the necessary linguistic resources are available. The quality of the annotations depends on the accuracy of these resources.


Is the language you’re looking for not in the list? Help create it!

Overview

SPPAS provides automatic annotations using language-independent algorithms. This means that adding a new language to SPPAS simply requires adding the necessary linguistic resources for annotation:

Text normalization requires lexicons;
Phonetization requires a pronunciation dictionary;
Alignment requires an acoustic model;
Syllabification requires a configuration file with the rules.
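To show what "phonetization requires a pronunciation dictionary" means in practice, here is a minimal sketch of dictionary-based phonetization. It is not SPPAS code. The file format is a hypothetical simplification: one entry per line, with the word followed by its space-separated X-SAMPA phones, and a word may appear on several lines to list pronunciation variants. The names `load_dictionary` and `phonetize` are illustrative only.

```python
from collections import defaultdict


def load_dictionary(lines):
    """Map each word to the list of its pronunciations (hypothetical format)."""
    prons = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0].lower(), fields[1:]
        prons[word].append(phones)  # repeated words accumulate variants
    return dict(prons)


def phonetize(tokens, prons):
    """Return, for each token, its candidate pronunciations (None if unknown)."""
    result = []
    for token in tokens:
        variants = prons.get(token.lower())
        # Unknown words get None: a real system needs a fallback strategy here.
        result.append([" ".join(p) for p in variants] if variants else None)
    return result


if __name__ == "__main__":
    entries = ["the D @", "the D i", "cat k { t"]
    lexicon = load_dictionary(entries)
    print(phonetize(["The", "cat", "dog"], lexicon))
```

Running it prints `[['D @', 'D i'], ['k { t'], None]`: "the" keeps both variants so a later step (such as alignment) can pick one, and "dog" is missing from the toy dictionary. This is why the coverage of the dictionary directly limits the quality of the phonetization.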

This document includes tables listing the phones available in the resources used for phonetization, alignment, and syllabification of a given language. The first column shows the symbols used by SPPAS, while the other columns help users better understand their meaning.

SPPAS encodes phones using X-SAMPA, a machine-readable phonetic alphabet based on 7-bit printable ASCII characters. It is a language-independent notation that covers the entire International Phonetic Alphabet (IPA). A SPPAS plugin allows users to convert time-aligned phones from X-SAMPA to IPA.
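The following is a minimal sketch of how an X-SAMPA to IPA conversion can work. It is not the SPPAS plugin's implementation: the table is a small, hypothetical subset of standard X-SAMPA values, and any symbol not in the table is assumed to be identical in both alphabets. Some X-SAMPA symbols span several characters (for example `r\` for the alveolar approximant), so the sketch uses longest-match tokenization rather than converting one character at a time.

```python
# Hypothetical, partial X-SAMPA -> IPA table (standard X-SAMPA values only).
XSAMPA_TO_IPA = {
    "r\\": "\u0279",  # ɹ  alveolar approximant (two-character symbol)
    "@": "\u0259",    # ə
    "E": "\u025b",    # ɛ
    "O": "\u0254",    # ɔ
    "A": "\u0251",    # ɑ
    "2": "\u00f8",    # ø
    "9": "\u0153",    # œ
    "S": "\u0283",    # ʃ
    "Z": "\u0292",    # ʒ
    "N": "\u014b",    # ŋ
    "R": "\u0281",    # ʁ
    "J": "\u0272",    # ɲ
    "H": "\u0265",    # ɥ
    "T": "\u03b8",    # θ
    "D": "\u00f0",    # ð
    ":": "\u02d0",    # ː  length mark
    "~": "\u0303",    # combining tilde (nasalization)
}
MAX_LEN = max(len(symbol) for symbol in XSAMPA_TO_IPA)


def xsampa_to_ipa(phone: str) -> str:
    """Convert one X-SAMPA phone string to IPA using longest-match lookup."""
    out = []
    i = 0
    while i < len(phone):
        # Try the longest possible symbol first, then shorter ones.
        for size in range(min(MAX_LEN, len(phone) - i), 0, -1):
            chunk = phone[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            # No match: assume the symbol is the same in both alphabets (e.g. "a", "t").
            out.append(phone[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for p in ["S", "a~", "E~", "r\\", "i:"]:
        print(p, "->", xsampa_to_ipa(p))
```

Because the nasalization mark is mapped to a combining character, `a~` becomes `a` followed by U+0303, which renders as a single nasalized vowel. A complete converter would need the full X-SAMPA inventory, including diacritics and tie bars.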

The acoustic models created by Brigitte Bigi were trained using the HTK toolbox, version 3.4.1. HTK was developed by the Machine Intelligence Laboratory (formerly known as the Speech Vision Robotics Group) at the Cambridge University Engineering Department (CUED) and Entropic Ltd. Microsoft has since licensed HTK back to CUED and is supporting its redistribution and development via the HTK3 website. (Source: http://htk.eng.cam.ac.uk/). Please note that HTK is available for free download after registration, and users must agree to its license terms. Section 2.2 of the license states that HTK cannot be distributed or sub-licensed to any third party, either in whole or in part, in any form.

Download and Install

To install the resources automatically, launch SPPAS, then open the Setup application from the dashboard. Follow the instructions and, finally, click the Install button.

Alternatively, you can install linguistic resources using the command-line interface with the script sppas/bin/sppassetup.py.

Contribute

No provided resource is perfect, and each one can always be improved. Here is how:

Edit the file, apply your changes, and email your modified version to the author.
Send new audio recordings and transcriptions to help re-train the acoustic model.

You can also create new resource files and share them! Altruism contributes to the quality of life… you can make a difference.

Licenses

Most of the distributed files are released under the terms of the GNU General Public License.

However, some resources may be subject to more restrictive licenses. Please check the license of each resource carefully, especially before redistributing it.