French Language

Download

This chapter describes the linguistic resources included in the files fra.zip and fraquebec.zip, in the lang folder of the Ortolang repository.

List of phonemes

Consonant Plosives

SPPAS  IPA  Description         Examples
p      p    voiceless bilabial  passé, par, appris
b      b    voiced bilabial     beau, abris, baobab
t      t    voiceless alveolar  tout, thé, patte
d      d    voiced alveolar     doux, deux, addition
k      k    voiceless velar     cabas, psycho, quatre, kelvin
g      g    voiced velar        gain, guerre, second

Consonant Fricatives

SPPAS  IPA  Description             Examples
f      f    voiceless labiodental   fête, pharmacie
s      s    voiceless alveolar      sa, hausse, ce, garçon, option, scie
S      ʃ    voiceless postalveolar  choux, schème, shampooing
v      v    voiced labiodental      vous, wagon, neuf heures
z      z    voiced alveolar         hasard, zéro, transit
Z      ʒ    voiced postalveolar     joue, geai

Consonant Nasals

SPPAS  IPA  Description  Examples
m      m    bilabial     mou, homme
n      n    alveolar     nous, bonne

Consonant Liquids

SPPAS  IPA  Description       Examples
l      l    alveolar lateral  lit, ville, fil
R      ʁ    voiced uvular     roue, rhume, arrive

Semivowels

SPPAS  IPA  Description        Examples
j      j    palatal            payer, fille, travail
w      w    voiced labiovelar  oui, web, whisky
H      ɥ    labial-palatal     huit, Puy
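
The consonant and semivowel tables above amount to a one-to-one mapping from SPPAS symbols to IPA. As a minimal sketch, assuming phonemes are given as a whitespace-separated string, the mapping could be applied as follows. The names SPPAS_TO_IPA and sppas_to_ipa are hypothetical and are not part of the SPPAS package.

```python
# SPPAS symbol -> IPA symbol, copied from the consonant and semivowel tables.
SPPAS_TO_IPA = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",  # plosives
    "f": "f", "s": "s", "S": "ʃ", "v": "v", "z": "z", "Z": "ʒ",  # fricatives
    "m": "m", "n": "n",                                          # nasals
    "l": "l", "R": "ʁ",                                          # liquids
    "j": "j", "w": "w", "H": "ɥ",                                # semivowels
}


def sppas_to_ipa(phonemes):
    """Convert a whitespace-separated SPPAS phoneme string to a list of IPA symbols.

    Symbols that are not in the table (vowels, fillers) are returned unchanged.
    """
    return [SPPAS_TO_IPA.get(p, p) for p in phonemes.split()]


if __name__ == "__main__":
    print(sppas_to_ipa("H i t"))  # "huit"
```

Vowels are left out on purpose: in the fra dictionary some vowel symbols are meta-phonemes that stand for two IPA values, so they cannot be converted with a simple lookup.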

Vowels for French (fra dictionary)

SPPAS  IPA  Description                Examples
E      ɛ    open-mid front unrounded   crème, faite, peine, fête, maître, mètre
A/     a    open front unrounded       patte, là
A/     ɑ    open back unrounded        pâte, glas
9      œ    open-mid front rounded     sœur, neuf, œuf
i      i    close front unrounded      si, île, régie, y
e      e    close-mid front unrounded  clé, les, chez, aller, pied, journée
O/     o    close-mid back rounded     sot, hôtel, haut
O/     ɔ    open-mid back rounded      sort, minimum
u      u    close back rounded         cou, clown, roue
y      y    close front rounded        tu, sûr, rue
2      ø    close-mid front rounded    ceux, jeûner, deux
@      ə    schwa                      le, reposer, faisons
a~     ã    nasal                      sans, champ, vent, temps, Jean, taon
U~/    ɛ̃    nasal                      vin, pain, brin, printemps
U~/    œ̃    nasal                      un, parfum, brun
O~     ɔ̃    nasal                      son, nom, bon

Additional phonemes for French (fre dictionary)

The following phonemes are currently used only in the fre dictionary. However, any of them can be used in any of the 3 pronunciation dictionaries.

SPPAS  IPA  Description             Examples
o      o    close-mid back rounded  sot, hôtel, haut
O      ɔ    open-mid back rounded   sort, minimum
e~     ɛ̃    nasal                   vin, pain, brin, sein
9~     œ̃    nasal                   un, parfum, brun
N      ŋ    voiced velar            camping, bingo
J      ɲ    palatal                 gagne, pignon

Vowels for Quebec French

SPPAS  IPA  Description                 Examples
E      ɛ    open-mid front unrounded    crème, faite, peine
a      a    open front unrounded        patte
A      ɑ    open back unrounded         pâte
2      ø    close-mid front rounded     deux
3      ɜ    open-mid central unrounded  fête, maître, mètre
9      œ    open-mid front rounded      sœur, neuf, œuf
i      i    close front unrounded       régie
I      ɪ    near-close front unrounded  île
e      e    close-mid front unrounded   clé, les, chez, aller, pied, journée
o      o    close-mid back rounded      sot, haut, hôtel
O      ɔ    open-mid back rounded       sort, minimum
u      u    close back rounded          cou
U      ʊ    near-close back rounded     clown
y      y    close front rounded         tu, sûr, rue
Y      ʏ    near-close front rounded    thune, truc
@      ə    schwa
A~     ɑ̃    nasal                       banc, sans, champ, vent
E~     ɛ̃    nasal                       bassin, pain, brin
O~     ɔ̃    nasal                       son, nom, bon
U~/    œ̃    nasal                       un, parfum, brun

Fillers

SPPAS  Description
laugh  laughter
noise  noises, unintelligible speech
fp     filled pause (euh)
dummy  un-transcribed speech
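
Because fillers share the symbol space with phonemes, a script that post-processes phoneme sequences may need to tell them apart. The following is a minimal sketch, assuming a sequence is available as a Python list of symbols. The function name split_fillers is hypothetical and is not part of SPPAS.

```python
# Filler symbols, copied from the table above.
FILLERS = {"laugh", "noise", "fp", "dummy"}


def split_fillers(symbols):
    """Separate speech phonemes from filler symbols, preserving their order."""
    phonemes = [s for s in symbols if s not in FILLERS]
    fillers = [s for s in symbols if s in FILLERS]
    return phonemes, fillers


if __name__ == "__main__":
    print(split_fillers(["fp", "b", "O~", "laugh"]))
```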

Lexicons

All French lexicons are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France:

  • fra.vocab contains a list of 345k different words;
  • fra.stp contains a list of 65 stop-words;
  • fra.lem is a list of words with their lemmas and occurrences;
  • fra_num.repl is used to convert numbers to their written form;
  • fra.repl is used to convert symbols and abbreviations into a text form.

All of them are distributed under the terms of the GNU General Public License.
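
To illustrate the kind of conversion a replacement lexicon such as fra.repl supports, here is a minimal sketch of a token-level lookup. The table entries below are hypothetical examples and are not copied from fra.repl, whose file format is not described in this chapter. The function name replace_tokens is also an assumption.

```python
# Hypothetical entries for illustration only; NOT copied from fra.repl.
REPLACEMENTS = {
    "%": "pour cent",
    "&": "et",
    "km": "kilomètres",
}


def replace_tokens(tokens, table=REPLACEMENTS):
    """Replace each token found in the table by its written form.

    A written form may contain several words, so the result can be
    longer than the input. Unknown tokens are kept unchanged.
    """
    result = []
    for token in tokens:
        result.extend(table.get(token, token).split())
    return result


if __name__ == "__main__":
    print(replace_tokens(["dix", "%"]))
```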

Dictionaries

There are 3 dictionaries for the French language:

  1. fra.dict which is recommended for spontaneous speech;
  2. fre.dict which is recommended for standard read speech;
  3. fra_quebec.dict which is intended for Quebec French.

The Quebec French pronunciation dictionary is (c) Marie-Hélène Côté, Université de Lausanne. The other two dictionaries are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France. All of them are distributed under the terms of the GNU General Public License.

The fra and fre pronunciation dictionaries were created by Brigitte Bigi by collecting and merging several open dictionaries downloaded from the web.

In the fra dictionary, some pronunciations were added using the LIA_Phon tool. Many words were manually corrected, and a large set of missing words and pronunciation variants was manually added. Moreover, pronunciations frequently observed in conversational corpora were added; these are mainly reductions. The following meta-phonemes are used: A/ for both a and A; O/ for both o and O; U~/ for both e~ and 9~. The phoneme N is not used; it is replaced by the sequence n g.
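
Since each meta-phoneme of the fra dictionary stands for two phonemes, one fra pronunciation can correspond to several concrete phoneme sequences. The sketch below enumerates them, assuming a pronunciation is a whitespace-separated string of SPPAS symbols. The names META_PHONEMES and expand_meta_phonemes are hypothetical and are not part of SPPAS.

```python
from itertools import product

# Meta-phonemes of the fra dictionary and the phonemes they stand for.
META_PHONEMES = {
    "A/": ("a", "A"),
    "O/": ("o", "O"),
    "U~/": ("e~", "9~"),
}


def expand_meta_phonemes(pronunciation):
    """Return every concrete phoneme sequence a fra pronunciation can stand for."""
    # Each symbol contributes either its two alternatives or itself only.
    choices = [META_PHONEMES.get(p, (p,)) for p in pronunciation.split()]
    return [" ".join(sequence) for sequence in product(*choices)]


if __name__ == "__main__":
    print(expand_meta_phonemes("p A/ t"))
```

A pronunciation with k meta-phonemes expands to 2^k sequences, which stays small for ordinary words.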

In the fre dictionary, only the standard French pronunciations are used. The only meta-phoneme is A/, and N is used.
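
The differences between the two phoneme conventions can be summarized as a symbol mapping. As a minimal sketch, the function below rewrites a fre-style pronunciation with the fra conventions described above: o and O become O/, e~ and 9~ become U~/, and N becomes the sequence n g. This is an illustration derived from the text and not a tool distributed with SPPAS; the names FRE_TO_FRA and fre_to_fra are hypothetical.

```python
# fre symbol -> list of fra symbols, following the conventions described above.
FRE_TO_FRA = {
    "o": ["O/"],
    "O": ["O/"],
    "e~": ["U~/"],
    "9~": ["U~/"],
    "N": ["n", "g"],
}


def fre_to_fra(pronunciation):
    """Rewrite a whitespace-separated fre pronunciation with fra symbols."""
    result = []
    for phoneme in pronunciation.split():
        # Symbols shared by both dictionaries are copied unchanged.
        result.extend(FRE_TO_FRA.get(phoneme, [phoneme]))
    return " ".join(result)


if __name__ == "__main__":
    print(fre_to_fra("k a~ p i N"))  # "camping"
```

The conversion loses information: once o and O are merged into O/, the original distinction cannot be recovered.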

The Quebec French dictionary was created by Marie-Hélène Côté, Université de Lausanne. It was converted to the required format (.dict and SAMPA) by B. Bigi. The version currently distributed is limited to 40k words. The full original version (192k words) is available on demand.

Acoustic Models

The French acoustic model was created by Brigitte Bigi from various corpora mainly recorded at Laboratoire Parole et Langage. Special thanks are addressed to Roxane Bertrand, Béatrice Priego-Valverde, Sophie Herment, Amandine Michelas, Christine Meunier and Cristel Portes for kindly sharing their corpora. This model was evaluated in (Bigi & Meunier, 2018).

The model was updated in February 2022 by adding 31 minutes of manually time-aligned read speech from the CLeLfPC corpus, and by adding an HMM model for the phonemes o, O, e~, 9~ and N. This new version has not been evaluated.

The Quebec French (QF) acoustic model is based on the HMMs of the French model; the missing vowels were taken from the German, the English and the Polish models. This initial QF acoustic model, created from the HMMs of other languages, was then adapted to QF with a corpus that Mélanie Lancien created for this purpose. This training subset is made of 273 seconds of speech (2390 phonemes), an extract of the PFC corpus. The model was evaluated in (Lancien et al., 2020).

Both models are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. They were created using a Python script available in the SPPAS package: acmtrain.py.

Syllabification configuration file

The syllabification configuration file corresponds to the one described in the paper (Bigi et al., 2010). It was adapted to Quebec French by adding the missing vowels. It is distributed under the terms of the GNU General Public License.

Cued Speech

The resources for the automatic generation of Cued Speech keys are under construction. For now, they should only be used to help with their development. They are being created by Brigitte Bigi in collaboration with Datha: http://www.datha.io

The file is a set of rules used to convert a sequence of phonemes into a sequence of keys.

They are distributed under the terms of the GNU General Public License.
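
The rule file itself is not described here, so the following is only a sketch of what converting phonemes into keys could look like. It assumes the general principle of Cued Speech that a key combines at most one consonant with one following vowel, and it uses the vowel symbols of the fra and fre dictionaries listed in this chapter. The grouping rule and the names VOWELS and group_into_keys are assumptions; the actual rules may differ.

```python
# Vowel symbols of the fra and fre dictionaries, as listed in this chapter.
VOWELS = {
    "E", "A/", "9", "i", "e", "O/", "u", "y", "2", "@",
    "a~", "U~/", "O~", "o", "O", "e~", "9~",
}


def group_into_keys(phonemes):
    """Group a list of phonemes into (consonant, vowel) keys.

    Assumed rule: a consonant followed by a vowel forms one key; an
    isolated consonant or vowel forms a key with None in the other slot.
    Every non-vowel symbol is treated as a consonant.
    """
    keys = []
    i = 0
    while i < len(phonemes):
        current = phonemes[i]
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if current in VOWELS:
            keys.append((None, current))
            i += 1
        elif following in VOWELS:
            keys.append((current, following))
            i += 2
        else:
            keys.append((current, None))
            i += 1
    return keys


if __name__ == "__main__":
    print(group_into_keys("b O~ Z u R".split()))  # "bonjour"
```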

References

Brigitte Bigi, Christine Meunier, Irina Nesterenko, Roxane Bertrand (2010). Automatic detection of syllable boundaries in spontaneous speech. In Language Resources and Evaluation Conference (LREC), pp. 3285-3292, Valletta, Malta.

Brigitte Bigi, Christine Meunier (2018). Automatic speech segmentation of spontaneous speech. Revista de Estudos da Linguagem. International Thematic Issue: Speech Segmentation. Volume 26, number 4, pages 1489-1530, e-ISSN 2237-2083.

Mélanie Lancien, Marie-Hélène Côté, Brigitte Bigi (2020). Developing Resources for Automated Speech Processing of Quebec French. In Language Resources and Evaluation Conference (LREC), pp. 5323–5328, Marseille, France.