Cantonese Language
Download
This chapter describes the linguistic resources included in the file yue.zip of the lang folder.
List of phonemes
Consonant Plosives
SPPAS | IPA | Description |
---|---|---|
p | p | voiceless bilabial |
p_h | pʰ | voiceless bilabial aspirated |
t | t | voiceless alveolar |
t_h | tʰ | voiceless alveolar aspirated |
k | k | voiceless velar |
k_h | kʰ | voiceless velar aspirated |
k_w | kʷ | voiceless velar labialized |
k_h_w | kʰʷ | voiceless velar aspirated labialized |
Consonant Fricatives
SPPAS | IPA | Description |
---|---|---|
f | f | voiceless labiodental |
s | s | voiceless alveolar |
S | ʃ | voiceless postalveolar |
h | h | voiceless glottal |
Consonant Nasals
SPPAS | IPA | Description |
---|---|---|
m | m | bilabial |
n | n | alveolar |
N | ŋ | voiced velar |
Consonant Liquids
SPPAS | IPA | Description |
---|---|---|
l | l | alveolar lateral |
Semivowels
SPPAS | IPA | Description |
---|---|---|
j | j | palatal |
w | w | voiced labiovelar |
Vowels
SPPAS | IPA | Description |
---|---|---|
E: | ɛ: | open-mid front unrounded |
a: | a: | open front unrounded |
9: | œ: | open-mid front rounded |
O: | ɔ: | open-mid back rounded |
o | o | close-mid back rounded |
e | e | close-mid front unrounded |
8 | ɵ | close-mid central rounded |
i: | i: | close front unrounded |
u: | u: | close back rounded |
y: | y: | close front rounded |
6 | ɐ | near-open central |
I | ɪ | near-close near-front unrounded |
U | ʊ | near-close near-back rounded |
@ | ə | schwa |
Affricates
SPPAS | IPA | Description |
---|---|---|
ts | t͡s | voiceless alveolar |
ts_h | t͡sʰ | voiceless alveolar aspirated |
tS | t͡ʃ | voiceless postalveolar |
tS_h | t͡ʃʰ | voiceless postalveolar aspirated |
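The tables above amount to a lookup from SPPAS symbols to IPA. A minimal Python sketch of that lookup follows; the function name `sppas_to_ipa` and the use of a space as phoneme separator are assumptions made for illustration, not part of the distributed resources.

```python
# Mapping copied from the phoneme tables of this chapter (SPPAS -> IPA).
SPPAS_TO_IPA = {
    # plosives
    "p": "p", "p_h": "pʰ", "t": "t", "t_h": "tʰ",
    "k": "k", "k_h": "kʰ", "k_w": "kʷ", "k_h_w": "kʰʷ",
    # fricatives
    "f": "f", "s": "s", "S": "ʃ", "h": "h",
    # nasals
    "m": "m", "n": "n", "N": "ŋ",
    # liquid
    "l": "l",
    # semivowels
    "j": "j", "w": "w",
    # vowels
    "E:": "ɛ:", "a:": "a:", "9:": "œ:", "O:": "ɔ:", "o": "o", "e": "e",
    "8": "ɵ", "i:": "i:", "u:": "u:", "y:": "y:", "6": "ɐ", "I": "ɪ",
    "U": "ʊ", "@": "ə",
    # affricates
    "ts": "t͡s", "ts_h": "t͡sʰ", "tS": "t͡ʃ", "tS_h": "t͡ʃʰ",
}


def sppas_to_ipa(phonemes, separator=" "):
    """Convert a separator-delimited string of SPPAS symbols to IPA.

    The separator is an assumption of this sketch; adapt it to the
    convention used in your own annotation files.
    """
    symbols = phonemes.split(separator)
    unknown = [s for s in symbols if s not in SPPAS_TO_IPA]
    if unknown:
        raise ValueError(f"Unknown SPPAS symbol(s): {unknown}")
    return separator.join(SPPAS_TO_IPA[s] for s in symbols)


print(sppas_to_ipa("k_w O: N"))
```

Raising an error on unknown symbols, rather than passing them through, makes it easier to spot symbols that do not belong to the Cantonese phoneme set.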
Lexicons
Lexicons are (c) Laboratoire Parole et Langage, Aix-en-Provence, France:
- yue.vocab contains a list of 47k different character-based words;
- yue_chars.vocab is a list of 12k characters;
- yue.repl and yue_chars.repl are used to convert symbols and abbreviations into a text form.
Both are distributed under the terms of the GNU General Public License.
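As a rough illustration of how such files could be used, the following Python sketch loads a vocabulary and a replacement table and applies the replacements to a list of tokens. The file layouts are assumptions of this sketch (one entry per line for a .vocab file, "symbol, whitespace, text form" per line for a .repl file, UTF-8 encoding); check the actual files before relying on them. The function names and the sample data are hypothetical.

```python
import io


def load_vocab(lines):
    """Return the set of entries, assuming one entry per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def load_replacements(lines):
    """Return {symbol: text form}.

    Assumes each line holds a symbol, some whitespace, then its text form;
    lines that do not match this layout are skipped.
    """
    table = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1].strip()
    return table


def apply_replacements(tokens, table):
    """Replace each token found in the table, leave the others unchanged."""
    return [table.get(tok, tok) for tok in tokens]


# Made-up sample data, NOT taken from yue.repl.
sample_repl = io.StringIO("% percent\n& and\n")
table = load_replacements(sample_repl)
print(apply_replacements(["10", "%", "&", "more"], table))
```

With the real resources, the same functions could be given an open file, for example `open("yue.vocab", encoding="utf-8")`, where the path depends on where yue.zip was unpacked.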
Pronunciation dictionaries
The two dictionaries were built from the most frequently observed pronunciations in a conversational corpus.
Acoustic Model
The Cantonese acoustic model is copyrighted: (C) DSP and Speech Technology Laboratory, Department of Electronic Engineering, the Chinese University of Hong Kong.
This is a monophone Cantonese acoustic model, based on Jyutping of the Linguistic Society of Hong Kong (LSHK). Each state is trained with 32 Gaussian mixtures. The model is trained with HTK 3.4.1. The corpus for training is CUSENT, also developed in our laboratory.
Generally speaking, you may use the model for non-commercial, academic or personal use.
See COPYRIGHT for the details of the license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.
We also have other well-trained Cantonese acoustic models. If you would like to use the models and/or the CUSENT corpus for commercial applications or development, please contact Professor Tan LEE for appropriate license terms.
The character pronunciations come from the Jyutping phrase box of the Linguistic Society of Hong Kong.
The copyright of the Jyutping phrase box belongs to the Linguistic
Society of Hong Kong. We would like to thank the Jyutping Group of the
Linguistic Society of Hong Kong for permission to use the electronic
file in our research and/or product development.
If you use this model for academic research, please cite:
Tan Lee, W.K. Lo, P.C. Ching, Helen Meng (2002). Spoken language resources for Cantonese speech processing. Speech Communication, Volume 36, Issues 3–4, pp. 327-342.
- Website: http://dsp.ee.cuhk.edu.hk
- Email: tanlee@ee.cuhk.edu.hk
References
Roxana Fung, Brigitte Bigi (2015). Automatic word segmentation for spoken Cantonese. In Oriental COCOSDA and Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 196-201.