French Language
Download
This chapter describes the linguistic resources included in the files fra.zip and fraquebec.zip of the lang folder of the Ortolang repository.
List of phonemes
Consonant Plosives
SPPAS | IPA | Description | Examples |
---|---|---|---|
p | p | voiceless bilabial | passé, par, appris |
b | b | voiced bilabial | beau, abris, baobab |
t | t | voiceless alveolar | tout, thé, patte |
d | d | voiced alveolar | doux, deux, addition |
k | k | voiceless velar | cabas, psycho, quatre, kelvin |
g | g | voiced velar | gain, guerre, second |
Consonant Fricatives
SPPAS | IPA | Description | Examples |
---|---|---|---|
f | f | voiceless labiodental | fête, pharmacie |
s | s | voiceless alveolar | sa, hausse, ce, garçon, option, scie |
S | ʃ | voiceless postalveolar | choux, schème, shampooing |
v | v | voiced labiodental | vous, wagon, neuf heures |
z | z | voiced alveolar | hasard, zéro, transit |
Z | ʒ | voiced postalveolar | joue, geai |
Consonant Nasals
SPPAS | IPA | Description | Examples |
---|---|---|---|
m | m | bilabial | mou, homme |
n | n | alveolar | nous, bonne |
Consonant Liquids
SPPAS | IPA | Description | Examples |
---|---|---|---|
l | l | alveolar lateral | lit, ville, fil |
R | ʁ | voiced uvular | roue, rhume, arrive |
Semivowels
SPPAS | IPA | Description | Examples |
---|---|---|---|
j | j | palatal | payer, fille, travail |
w | w | voiced labiovelar | oui, web, whisky |
H | ɥ | labial-palatal | huit, Puy |
Vowels for French (fra dictionary)
SPPAS | IPA | Description | Examples |
---|---|---|---|
E | ɛ | open-mid front unrounded | crème, faite, peine, fête, maître, mètre |
A/ | a | open front unrounded | patte, là |
A/ | ɑ | open back unrounded | pâte, glas |
9 | œ | open-mid front rounded | sœur, neuf, œuf |
i | i | close front unrounded | si, île, régie, y |
e | e | close-mid front unrounded | clé, les, chez, aller, pied, journée |
O/ | o | close-mid back rounded | sot, hôtel, haut |
O/ | ɔ | open-mid back rounded | sort, minimum |
u | u | close back rounded | cou, clown, roue |
y | y | close front rounded | tu, sûr, rue |
2 | ø | close-mid front rounded | ceux, jeûner, deux |
@ | ə | schwa | le, reposer, faisons |
a~ | ã | nasal | sans, champ, vent, temps, Jean, taon |
U~/ | ɛ̃ | nasal | vin, pain, brin, printemps |
U~/ | œ̃ | nasal | un, parfum, brun |
O~ | ɔ̃ | nasal | son, nom, bon |
Additional phonemes for French (fre dictionary)
The following phonemes are currently used only in the fre dictionary. However, any of them can be used in any of the 3 pronunciation dictionaries.
SPPAS | IPA | Description | Examples |
---|---|---|---|
o | o | close-mid back rounded | sot, hôtel, haut |
O | ɔ | open-mid back rounded | sort, minimum |
e~ | ɛ̃ | nasal | vin, pain, brin, sein |
9~ | œ̃ | nasal | un, parfum, brun |
N | ŋ | voiced velar | camping, bingo |
J | ɲ | palatal | gagne, pignon |
Vowels for Quebec French
SPPAS | IPA | Description | Examples |
---|---|---|---|
E | ɛ | open-mid front unrounded | crème, faite, peine |
a | a | open front unrounded | patte |
A | ɑ | open back unrounded | pâte |
2 | ø | close-mid front rounded | deux |
3 | ɜ | open-mid central unrounded | fête, maître, mètre |
9 | œ | open-mid front rounded | sœur, neuf, œuf |
i | i | close front unrounded | régie |
I | ɪ | near-close front unrounded | île |
e | e | close-mid front unrounded | clé, les, chez, aller, pied, journée |
o | o | close-mid back rounded | sot, haut, hôtel |
O | ɔ | open-mid back rounded | sort, minimum |
u | u | close back rounded | cou |
U | ʊ | near-close back rounded | clown |
y | y | close front rounded | tu, sûr, rue |
Y | ʏ | near-close front rounded | thune, truc |
@ | ə | schwa | |
A~ | ɑ̃ | nasal | banc, sans, champ, vent |
E~ | ɛ̃ | nasal | bassin, pain, brin |
O~ | ɔ̃ | nasal | son, nom, bon |
U~/ | œ̃ | nasal | un, parfum, brun |
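The tables above can be read as a lookup from SPPAS symbols to IPA. The following is a minimal sketch of such a lookup for Quebec French, built only from the consonant, semivowel and Quebec French vowel tables of this chapter. The function name `sppas_to_ipa` and the space-separated input format are assumptions made for illustration; they are not part of the SPPAS API.

```python
# SPPAS symbol -> IPA, copied from the consonant, semivowel and
# Quebec French vowel tables of this chapter.
CONSONANTS = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "s": "s", "S": "ʃ", "v": "v", "z": "z", "Z": "ʒ",
    "m": "m", "n": "n", "l": "l", "R": "ʁ",
    "j": "j", "w": "w", "H": "ɥ",
}
QF_VOWELS = {
    "E": "ɛ", "a": "a", "A": "ɑ", "2": "ø", "3": "ɜ", "9": "œ",
    "i": "i", "I": "ɪ", "e": "e", "o": "o", "O": "ɔ",
    "u": "u", "U": "ʊ", "y": "y", "Y": "ʏ", "@": "ə",
    "A~": "ɑ̃", "E~": "ɛ̃", "O~": "ɔ̃", "U~/": "œ̃",
}
SPPAS_TO_IPA = {**CONSONANTS, **QF_VOWELS}


def sppas_to_ipa(phonemes: str) -> str:
    """Convert a space-separated SPPAS phoneme string to an IPA string.

    Hypothetical helper: fillers (laugh, noise, ...) and any symbol that is
    not in the tables are rejected rather than guessed.
    """
    symbols = phonemes.split()
    unknown = [s for s in symbols if s not in SPPAS_TO_IPA]
    if unknown:
        raise ValueError(f"Unknown SPPAS symbol(s): {unknown}")
    return "".join(SPPAS_TO_IPA[s] for s in symbols)


if __name__ == "__main__":
    print(sppas_to_ipa("p A t"))  # phonemes of "pâte"
    print(sppas_to_ipa("s 9 R"))  # phonemes of "sœur"
```

The same approach does not work unchanged for the fra dictionary, because its meta-phonemes (A/, O/, U~/) each correspond to two IPA symbols.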
Fillers
SPPAS | Description |
---|---|
laugh | laughter |
noise | noises, unintelligible speech |
fp | filled pause (euh) |
dummy | un-transcribed speech |
Lexicons
All French lexicons are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France:
- fra.vocab contains a list of 345k different words;
- fra.stp contains a list of 65 stop-words;
- fra.lem is a list of words with their lemmas and occurrences;
- fra_num.repl converts numbers to their written form;
- fra.repl converts symbols and abbreviations into a text form.
All of them are distributed under the terms of the GNU General Public License.
Dictionaries
There are 3 dictionaries for the French language:
- fra.dict which is recommended for spontaneous speech;
- fre.dict which is recommended for standard read speech;
- fra_quebec.dict is for Quebec French.
The Quebec French pronunciation dictionary is (c) Marie-Hélène Côté, Université de Lausanne. The other two dictionaries are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France. All of them are distributed under the terms of the GNU General Public License.
The 2 French pronunciation dictionaries, fra and fre, were created by Brigitte Bigi by collecting and merging several open dictionaries downloaded from the web.
In the fra dictionary, some pronunciations were added using the LIA_Phon tool. Many words were manually corrected, and a large set of missing words and pronunciation variants was manually added. Moreover, pronunciations frequently observed in conversational corpora were added, mainly reductions. The following meta-phonemes are used: A/ for both a and A; O/ for both o and O; U~/ for both e~ and 9~. The phoneme N is not used: it is replaced by the sequence n g.
In the fre dictionary, only standard French pronunciations are used. Unlike in fra, the only meta-phoneme is A/, and the phoneme N is used.
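The relation between the two phoneme inventories can be sketched as a small mapping. The example below is a minimal illustration, assuming a pronunciation is held as a list of SPPAS symbols; the names `FRE_TO_FRA` and `to_fra_inventory` are hypothetical and do not come from SPPAS. It rewrites a pronunciation into the fra inventory by applying the meta-phonemes and replacing N with the sequence n g.

```python
# Each phoneme maps to the tuple of fra symbols that replaces it.
# The rules are the ones stated in the text: A/ covers a and A,
# O/ covers o and O, U~/ covers e~ and 9~, and N becomes "n g".
FRE_TO_FRA = {
    "a": ("A/",), "A": ("A/",),
    "o": ("O/",), "O": ("O/",),
    "e~": ("U~/",), "9~": ("U~/",),
    "N": ("n", "g"),
}


def to_fra_inventory(phonemes):
    """Rewrite a list of SPPAS phonemes into the fra phoneme inventory.

    Hypothetical helper: symbols without a rule are copied unchanged.
    """
    result = []
    for phoneme in phonemes:
        result.extend(FRE_TO_FRA.get(phoneme, (phoneme,)))
    return result


if __name__ == "__main__":
    # Illustrative pronunciations, not excerpts from the dictionaries.
    print(to_fra_inventory(["b", "R", "e~"]))             # "brin"
    print(to_fra_inventory(["k", "a~", "p", "i", "N"]))   # "camping"
```

The mapping only goes in one direction: a meta-phoneme such as O/ cannot be turned back into o or O without additional lexical knowledge.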
The Quebec French dictionary was created by Marie-Hélène Côté, Université de Lausanne. It was converted to the required format (.dict and SAMPA) by B. Bigi. The version currently distributed is limited to 40k words. The full original version (192k words) is available on-demand.
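As an illustration of how such a dictionary can be read, here is a minimal parsing sketch. It assumes an HTK-like line layout, `word [word] phoneme phoneme ...`, with pronunciation variants marked as `word(2)`, `word(3)`, and so on; this layout is an assumption and should be checked against the distributed .dict files. The function name `parse_dict_lines` is hypothetical, and the sample lines are made up for the example rather than taken from a dictionary.

```python
import re
from collections import defaultdict

# Matches a trailing variant marker such as "(2)" on the entry key.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_dict_lines(lines):
    """Group pronunciations by word from HTK-like dictionary lines.

    Assumed layout (not confirmed by this chapter):
        word [word] p1 p2 p3
        word(2) [word] p1 p3
    Returns a dict mapping each word to a list of phoneme lists.
    """
    entries = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = _VARIANT.sub("", fields[0])
        # The bracketed output symbol is treated as optional.
        has_bracket = (
            len(fields) > 1
            and fields[1].startswith("[")
            and fields[1].endswith("]")
        )
        phonemes = fields[2:] if has_bracket else fields[1:]
        entries[word].append(phonemes)
    return dict(entries)


if __name__ == "__main__":
    # Made-up sample lines, including a reduced variant of "petit".
    sample = [
        "patte [patte] p A/ t",
        "petit [petit] p @ t i",
        "petit(2) [petit] p t i",
    ]
    for word, variants in parse_dict_lines(sample).items():
        print(word, variants)
```

Keeping all variants of a word in one list makes it easy to compare, for instance, how many reduced forms the fra dictionary lists against the single standard form expected in fre.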
Acoustic Models
The French acoustic model was created by Brigitte Bigi from various corpora mainly recorded at Laboratoire Parole et Langage. Special thanks are addressed to Roxane Bertrand, Béatrice Priego-Valverde, Sophie Herment, Amandine Michelas, Christine Meunier and Cristel Portes for kindly sharing their corpora. This model was evaluated in (Bigi & Meunier, 2018).
The model was updated in February 2022 by adding 31 minutes of manually time-aligned read speech from the CLeLfPC corpus and by adding an HMM for each of the phonemes o, O, e~, 9~ and N. This new version has not been evaluated.
The Quebec French (QF) acoustic model is based on the HMMs of the French model; the missing vowels were taken from the German, English and Polish models. This initial QF acoustic model, built from the HMMs of other languages, was then adapted to QF with a corpus that Mélanie Lancien created for this purpose. This training subset, an extract of the PFC corpus, consists of 273 seconds of speech (2390 phonemes). The model was evaluated in (Lancien et al., 2020).
Both models are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. They were created using a Python script available in the SPPAS package: acmtrain.py.
Syllabification configuration file
The syllabification configuration file corresponds to the one described in the paper (Bigi et al. 2010). It was adapted to Quebec French by adding the missing vowels. It is distributed under the terms of the GNU General Public License.
Cued Speech
The resources for the automatic generation of Cued Speech keys are under construction. For now, they can only be used to help in their development. They are created by Brigitte Bigi in collaboration with Datha: http://www.datha.io
The file is a set of rules used to convert a sequence of phonemes into a sequence of keys.
They are distributed under the terms of the GNU General Public License.
References
Brigitte Bigi, Christine Meunier, Irina Nesterenko, Roxane Bertrand (2010). Automatic detection of syllable boundaries in spontaneous speech. In Language Resources and Evaluation Conference (LREC), pp. 3285-3292, Valletta, Malta.
Brigitte Bigi, Christine Meunier (2018). Automatic speech segmentation of spontaneous speech. Revista de Estudos da Linguagem. International Thematic Issue: Speech Segmentation. Volume 26, number 4, pages 1489-1530, e-ISSN 2237-2083.
Mélanie Lancien, Marie-Hélène Côté, Brigitte Bigi (2020). Developing Resources for Automated Speech Processing of Quebec French. In Language Resources and Evaluation Conference (LREC), pp. 5323–5328, Marseille, France.