Mandarin Language
Download
This chapter describes the linguistic resources included in the file
cmn.zip
of the lang
folder.
List of phonemes
Help is welcome to improve the quality of both Mandarin Chinese resources and of this documentation.
These resources are distributed without any warranty.
Consonant Plosives
SPPAS | IPA | Description | Examples |
---|---|---|---|
p | p | voiceless bilabial | 诐, 一把手 |
p_h | pʰ | voiceless bilabial aspirated | 仳, 伾, 佩 |
t | t | voiceless alveolar | 掉, 诋 |
t_h | tʰ | voiceless alveolar aspirated | 条心 |
k | k | voiceless velar | 诰, 仡 |
k_h | kʰ | voiceless velar aspirated | 丂, 亢 |
Consonant Fricatives
SPPAS | IPA | Description | Examples |
---|---|---|---|
f | f | voiceless labiodental | 访, 佱, 俘 |
s | s | voiceless alveolar | 诉, 偲 |
s` | ʂ | voiceless alveolar with retroflex hook | 识 说 |
z` | ʐ | voiced alveolar with retroflex hook | 儒, 入 |
S | ʃ | voiceless postalveolar | 厄, 呃 |
x | x | voiceless velar | 和, 和 |
ss | 笑, 咸 |
Consonant Nasals
SPPAS | IPA | Description | Examples |
---|---|---|---|
m | m | bilabial | 哤, 咩, 喵 |
n | n | alveolar | 噛, 哝, 咛 |
N | ŋ | voiced velar | 尝, 嚝, 嚷 |
Consonant Liquids
SPPAS | IPA | Description | Examples |
---|---|---|---|
l | l | alveolar lateral | 咾, 哢 |
Vowels
SPPAS | IPA | Description | Examples |
---|---|---|---|
a | a | open front unrounded | 垵, 奡, 壒, 墺, 埏 |
o | o | close-mid back rounded | 怄, 欧 |
e | e | close-mid front unrounded | A, 诶 |
i | i | close front unrounded | 〡, 㐆, 一 诒 |
i_d | i̪ | close front unrounded dental | 子, 孖 |
i` | ᶖ | close front unrounded retroflex | 估值, 似 |
u | u | close back rounded | 诬, 罔, 五 |
y | y | close front rounded | 诩, 语, 伝 |
@` | ᶕ | schwa with retroflex hook | 佴, 儿 |
Affricates
SPPAS | IPA | Description | Examples |
---|---|---|---|
ts | t͡s | voiceless alveolar | 孖, 字 |
tss | 讵, 讲 | ||
ts_h | t͡sʰ | voiceless alveolar aspirated | 䌽, 吹 |
ts` | voiceless alveolar retroflex hook | 证, 诊 | |
ts_h` | 串 吹 | ||
ts_hs | 诎, 㐤 |
Fillers
SPPAS | Description |
---|---|
laugh | laughter |
noise | noises, unintelligible speech |
dummy | un-transcribed speech |
Lexicons
All lexicons are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France:
cmn.vocab
contains a list of 110k different words;cmn_num.repl
allows to convert numbers to their written form;cmn.repl
allows to convert symbols and abbreviations into a text form.
All of them are distributed under the terms of the GNU General Public License.
Pronunciation dictionary
The pronunciation dictionary was manually created for the syllables by Zhi Na. We address special thanks to her for sharing her work.
It is distributed under the terms of the GNU General Public License.
Acoustic model
The acoustic model was created by Brigitte Bigi from 2 corpora: the first one at Shanghai by Zhi Na, and another one by Hongwei Ding. We address special thanks to hers for giving us access to their data. Both recordings are a Chinese version of the Eurom1 corpus. See the following publication for details:
Daniel Hirst, Brigitte Bigi, Hyongsil Cho, Hongwei Ding, Sophie Herment, Ting Wang (2013). Building OMProDat: an open multilingual prosodic database, Proceedings of Tools ans Resources for the Analysis of Speech Prosody, Aix-en-Provence, France, Eds B. Bigi and D. Hirst, ISBN: 978-2-7466-6443-2, pp. 11-14.
Notice that the current model was trained from a very small amount of data: this will impact on the results. Do not expect to get good performances for the automatic alignment.
More Mandarin Chinese data are welcome! Because more data implies a better acoustic model then better alignments…
The model is distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.