Mandarin Language


This chapter describes the linguistic resources included in the file cmn.zip of the lang folder.

List of phonemes

Help is welcome to improve the quality of both the Mandarin Chinese resources and this documentation.

These resources are distributed without any warranty.

Consonant Plosives

SPPAS   IPA    Description                    Examples
p       p      voiceless bilabial             诐, 一把手
p_h     pʰ     voiceless bilabial aspirated   仳, 伾, 佩
t       t      voiceless alveolar             掉, 诋
t_h     tʰ     voiceless alveolar aspirated   条心
k       k      voiceless velar                诰, 仡
k_h     kʰ     voiceless velar aspirated      丂, 亢

Consonant Fricatives

SPPAS   IPA    Description                 Examples
f       f      voiceless labiodental       访, 佱, 俘
s       s      voiceless alveolar          诉, 偲
s`      ʂ      voiceless retroflex         识, 说
z`      ʐ      voiced retroflex            儒, 入
S       ʃ      voiceless postalveolar      厄, 呃
x       x      voiceless velar             和, 和
ss      ɕ      voiceless alveolo-palatal   笑, 咸

Consonant Nasals

SPPAS   IPA    Description    Examples
m       m      bilabial       哤, 咩, 喵
n       n      alveolar       噛, 哝, 咛
N       ŋ      voiced velar   尝, 嚝, 嚷

Consonant Liquids

SPPAS   IPA    Description        Examples
l       l      alveolar lateral   咾, 哢

Vowels

SPPAS   IPA    Description                        Examples
a       a      open front unrounded               垵, 奡, 壒, 墺, 埏
o       o      close-mid back rounded             怄, 欧
e       e      close-mid front unrounded          A, 诶
i       i      close front unrounded              〡, 㐆, 一, 诒
i_d            close front unrounded dental       子, 孖
i`             close front unrounded retroflex    估值, 似
u       u      close back rounded                 诬, 罔, 五
y       y      close front rounded                诩, 语, 伝
@`      ɚ      rhotacized schwa                   佴, 儿

Affricates

SPPAS   IPA    Description                           Examples
ts      t͡s     voiceless alveolar                    孖, 字
tss     t͡ɕ     voiceless alveolo-palatal             讵, 讲
ts_h    t͡sʰ    voiceless alveolar aspirated          䌽, 吹
ts`     t͡ʂ     voiceless retroflex                   证, 诊
ts_h`   t͡ʂʰ    voiceless retroflex aspirated         串, 吹
ts_hs   t͡ɕʰ    voiceless alveolo-palatal aspirated   诎, 㐤
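The SPPAS symbols in these tables use an ASCII notation resembling X-SAMPA. As an illustration only, the IPA column can be held in a Python dictionary to display SPPAS phoneme sequences in IPA. This is a minimal sketch, not part of SPPAS: the names `SPPAS_TO_IPA` and `sppas_to_ipa` are hypothetical, phonemes are assumed to be separated by spaces, the aspirated plosives assume that `_h` marks aspiration (as in ts_h → t͡sʰ), and any symbol without an entry is returned unchanged.

```python
# Illustrative sketch, not part of SPPAS: all names are hypothetical.
# Values come from the IPA column of the tables above; the "_h" plosives
# ASSUME that "_h" marks aspiration, as in ts_h -> t͡sʰ.
SPPAS_TO_IPA = {
    # plosives
    "p": "p", "p_h": "pʰ", "t": "t", "t_h": "tʰ", "k": "k", "k_h": "kʰ",
    # fricatives
    "f": "f", "s": "s", "s`": "ʂ", "z`": "ʐ", "S": "ʃ", "x": "x",
    # nasals and liquid
    "m": "m", "n": "n", "N": "ŋ", "l": "l",
    # vowels
    "a": "a", "o": "o", "e": "e", "i": "i", "u": "u", "y": "y",
    # affricates
    "ts": "t͡s", "ts_h": "t͡sʰ",
}


def sppas_to_ipa(phonemes):
    """Convert a space-separated SPPAS phoneme string (ASSUMED format) to IPA.

    Symbols without an entry (e.g. i_d, i`, or a filler) are kept unchanged.
    """
    return " ".join(SPPAS_TO_IPA.get(ph, ph) for ph in phonemes.split())


print(sppas_to_ipa("s` i`"))    # ʂ i`
print(sppas_to_ipa("p_h a N"))  # pʰ a ŋ
```

Passing unknown symbols through unchanged keeps the output readable when a symbol has no IPA value in the tables, at the cost of mixing notations in the result.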

Fillers

SPPAS   Description
laugh   laughter
noise   noises, unintelligible speech
dummy   un-transcribed speech

Lexicons

All lexicons are (c) CNRS, Laboratoire Parole et Langage, Aix-en-Provence, France:

  • cmn.vocab contains a list of 110k different words;
  • cmn_num.repl is used to convert numbers into their written form;
  • cmn.repl is used to convert symbols and abbreviations into their written form.
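The replacement lists and the vocabulary can be pictured as two steps of text normalization: replace known tokens, then flag the tokens that are missing from the vocabulary. The Python sketch below illustrates that idea only and is not how SPPAS applies these files: the function names are hypothetical, the file formats are assumed (a replacement line holds a key followed by its replacement; the vocabulary holds one word per line, passed here as a set), and the sample entries are hypothetical.

```python
# Illustrative sketch only; not the SPPAS implementation.
# ASSUMED formats: a replacement line is "<key> <replacement>";
# the vocabulary is one word per line (passed here as a set).


def parse_repl(lines):
    """Build a {key: replacement} table from replacement-list lines."""
    table = {}
    for line in lines:
        parts = line.split(maxsplit=1)  # key, then the rest of the line
        if len(parts) == 2:
            table[parts[0]] = parts[1].strip()
    return table


def normalize(tokens, repl):
    """Replace each token that appears as a key; keep the others as-is."""
    return [repl.get(tok, tok) for tok in tokens]


def unknown_words(tokens, vocab):
    """Return the tokens that are not in the vocabulary."""
    return [tok for tok in tokens if tok not in vocab]


# Hypothetical entries, not copied from cmn_num.repl, cmn.repl or cmn.vocab.
repl = parse_repl(["3 三", "km 公里"])
vocab = {"三", "公里"}

tokens = normalize(["3", "km", "猫"], repl)
print(tokens)                        # ['三', '公里', '猫']
print(unknown_words(tokens, vocab))  # ['猫']
```

A token-by-token lookup like this only handles numbers that are listed as keys; whether cmn_num.repl covers longer numbers this way is not stated here.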

All of them are distributed under the terms of the GNU General Public License.

Pronunciation dictionary

The pronunciation dictionary of syllables was manually created by Zhi Na. Special thanks to her for sharing her work.

It is distributed under the terms of the GNU General Public License.
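Because the dictionary is built for syllables, one way to read it is that a word written with several characters gets the concatenated pronunciations of its characters, and a polyphonic character yields several variants. The Python sketch below assumes that reading, a hypothetical line format `<syllable> <phoneme> <phoneme> ...`, and hypothetical entries; it is not the SPPAS phonetizer, and the function names are illustrative.

```python
from itertools import product

# Illustrative sketch, not the SPPAS phonetizer.
# ASSUMED line format: "<syllable> <phoneme> <phoneme> ...".
# A character listed on several lines is treated as polyphonic.


def parse_dict(lines):
    """Build {syllable: [phoneme list, ...]} from dictionary lines."""
    pron = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pron.setdefault(fields[0], []).append(fields[1:])
    return pron


def phonetize(word, pron):
    """Return all pronunciations of word, built character by character,
    or None if a character is missing from the dictionary."""
    variants = []
    for char in word:
        if char not in pron:
            return None
        variants.append(pron[char])
    # One output string per combination of per-character variants.
    return [" ".join(" ".join(v) for v in combo) for combo in product(*variants)]


# Hypothetical entries, not copied from the real dictionary.
pron = parse_dict(["一 i", "把 p a", "手 s` o u", "和 x e", "和 x u o"])
print(phonetize("一把手", pron))  # ['i p a s` o u']
print(phonetize("和", pron))      # ['x e', 'x u o']
print(phonetize("猫", pron))      # None
```

Enumerating every combination grows quickly with the number of polyphonic characters in a word; a real phonetizer would restrict or rank the variants.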

Acoustic model

The acoustic model was created by Brigitte Bigi from two corpora: one recorded in Shanghai by Zhi Na, and another by Hongwei Ding. Special thanks to both of them for giving us access to their data. Both recordings are Chinese versions of the Eurom1 corpus. See the following publication for details:

Daniel Hirst, Brigitte Bigi, Hyongsil Cho, Hongwei Ding, Sophie Herment, Ting Wang (2013). Building OMProDat: an open multilingual prosodic database. Proceedings of Tools and Resources for the Analysis of Speech Prosody, Aix-en-Provence, France, eds. B. Bigi and D. Hirst, ISBN: 978-2-7466-6443-2, pp. 11-14.

Note that the current model was trained on a very small amount of data, which limits its accuracy: do not expect good performance from the automatic alignment.

More Mandarin Chinese data are welcome: more data means a better acoustic model, and therefore better alignments.

The model is distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.