Ortolang repository
About
The Ortolang repository provides the linguistic resources for download.
- Repository: https://www.ortolang.fr
- Owner: Brigitte Bigi, contact@sppas.org
- Permanent URL: https://hdl.handle.net/11403/sppasresources
It contains two types of data, organized into separate folders:
- lang folder: This folder contains language-specific resources required for automatic annotation. Each language may include lexicons, pronunciation dictionaries, acoustic models, and/or syllabification rules. These resources are provided as ZIP files, named using the ISO 639-3 language code (e.g., fra for French, eng for English, cmn for Mandarin Chinese). For a complete list of language codes, visit http://www-01.sil.org/iso639-3/.
- annot folder: This folder contains additional linguistic resources needed for certain types of automatic annotation, such as statistical models. These resources are also available as ZIP files, with one file per supported annotation type.
The available resources were initially created for use with SPPAS, but since they are open source, they can be freely downloaded, used with other annotation tools, and even modified or redistributed in most cases.
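As an illustration of the naming convention for language packages, here is a minimal Python sketch that maps an ISO 639-3 code to an archive name and unpacks an archive that has already been downloaded. It rests on assumptions not stated above: that the archive is named exactly as the code followed by ".zip", and that the file sits in a local directory of your choice. The function names `archive_name` and `unpack_language` are hypothetical helpers and are not part of SPPAS.

```python
import re
import zipfile
from pathlib import Path

# ISO 639-3 codes are exactly three lowercase ASCII letters.
ISO_639_3 = re.compile(r"[a-z]{3}")


def archive_name(lang_code: str) -> str:
    """Return the assumed archive name for a language code, e.g. 'fra' -> 'fra.zip'."""
    if not ISO_639_3.fullmatch(lang_code):
        raise ValueError(f"not an ISO 639-3 code: {lang_code!r}")
    # Assumption: the archive is named as the bare code plus '.zip'.
    return f"{lang_code}.zip"


def unpack_language(download_dir: Path, lang_code: str, target_dir: Path) -> list[str]:
    """Extract an already-downloaded language archive and return its member names."""
    archive = download_dir / archive_name(lang_code)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)  # creates target_dir if it does not exist
        return sorted(zf.namelist())
```

For example, `unpack_language(Path("downloads"), "fra", Path("resources"))` would extract `downloads/fra.zip` into `resources/`, provided that file exists under that name.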
Repository version history
Version 1 - June, 2020
- Linguistic resources for Text Normalization, Phonetization, Alignment and Syllabification of SPPAS for the following languages: cat, cmn, deu, eng, fra, frq, hun, ita, jpn, kor, nan, pcm, pol, por, spa, vie, yue.
- Data resources for face detection, face landmark and LfPC automatic annotations of SPPAS.
Version 2 - July 2020
- Updated data for face detection and face landmark.
Version 3 - Sept 2020
- Updated linguistic resources for the Polish language.
- Added this documentation.
Version 4 - Feb 2021
- A DNN model was added to the Face Detection package.
- The file fra.txt of the LfPC package was modified (corrected by experts).
- The lightness of the hand pictures in the LPC package was adjusted.
Version 5 - Sept 2021
- New acoustic model of Italian: it is no longer a context-dependent model. It is a monophone model, as for the other languages. The French HMMs of a~ and O~, and the Naija HMM of e~ were added to the hmmdefs.
- The LPC file fra.txt is renamed cueConfig-fra.txt.
- The keys of the LPC vowels are coded with the characters b, s, m, c, t instead of numbers. This change is not compatible with versions 3.x of SPPAS.
- The keys of the LfPC consonants are coded differently: we now use the same keys as those previously defined for English.
Version 6 - Nov 2021
- New resources for the Bengali language: vocabulary, pronunciation dictionary and acoustic model.
Version 7 - 2022
- Updated resources for the Bengali language.
Version 8 - 2023
- Updated CuedSpeech resources (renamed from LfPC).
- Added resources for the Persian language.
Version 9 - March 2025
- Updated CuedSpeech resources.
- Added resources for Dutch.
- Added samples of audio files and transcripts in each language resource package.