Cantonese Language

Download

This chapter describes the linguistic resources included in the file yue.zip of the "Ortolang repository".

List of phonemes

Consonant Plosives

SPPAS	IPA	Description
p	p	voiceless bilabial
p_h	pʰ	voiceless bilabial aspirated
t	t	voiceless alveolar
t_h	tʰ	voiceless alveolar aspirated
k	k	voiceless velar
k_h	kʰ	voiceless velar aspirated
k_w	kʷ	voiceless velar labialized
k_h_w	kʰʷ	voiceless velar aspirated labialized

Consonant Fricatives

SPPAS	IPA	Description
f	f	voiceless labiodental
s	s	voiceless alveolar
S	ʃ	voiceless postalveolar
h	h	voiceless glottal

Consonant Nasals

SPPAS	IPA	Description
m	m	bilabial
n	n	alveolar
N	ŋ	voiced velar

Consonant Liquids

SPPAS	IPA	Description
l	l	alveolar lateral

Semivowels

SPPAS	IPA	Description
j	j	palatal
w	w	voiced labiovelar

Vowels

SPPAS	IPA	Description
E:	ɛ:	open-mid front unrounded
a:	a:	open front unrounded
9:	œ:	open-mid front rounded
O:	ɔ:	open-mid back rounded
o	o	close-mid back rounded
e	e	close-mid front unrounded
8	ɵ	close-mid central rounded
i:	i:	close front unrounded
u:	u:	close back rounded
y:	y:	close front rounded
6	ɐ	near-open central
I	ɪ	near-close near-front unrounded
U	ʊ	near-close near-back rounded
@	ə	schwa

Affricates

SPPAS	IPA	Description
ts	t͡s	voiceless alveolar
ts_h	t͡sʰ	voiceless alveolar aspirated
tS	t͡ʃ	voiceless postalveolar
tS_h	t͡ʃʰ	voiceless postalveolar aspirated
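
The tables above can be read as a lookup from SPPAS symbols to IPA. The following is a minimal sketch of that lookup in Python, not part of SPPAS itself: the names `SPPAS_TO_IPA` and `sppas_to_ipa` are hypothetical, and the assumption that phonemes in a transcription are separated by a space is made only for illustration.

```python
# SPPAS symbol -> IPA symbol, copied from the tables in this chapter.
# The dictionary and function names are illustrative, not an SPPAS API.
SPPAS_TO_IPA = {
    # plosives
    "p": "p", "p_h": "pʰ", "t": "t", "t_h": "tʰ",
    "k": "k", "k_h": "kʰ", "k_w": "kʷ", "k_h_w": "kʰʷ",
    # fricatives
    "f": "f", "s": "s", "S": "ʃ", "h": "h",
    # nasals, liquid, semivowels
    "m": "m", "n": "n", "N": "ŋ", "l": "l", "j": "j", "w": "w",
    # vowels
    "E:": "ɛ:", "a:": "a:", "9:": "œ:", "O:": "ɔ:", "o": "o", "e": "e",
    "8": "ɵ", "i:": "i:", "u:": "u:", "y:": "y:", "6": "ɐ",
    "I": "ɪ", "U": "ʊ", "@": "ə",
    # affricates
    "ts": "t͡s", "ts_h": "t͡sʰ", "tS": "t͡ʃ", "tS_h": "t͡ʃʰ",
}


def sppas_to_ipa(phonemes, separator=" "):
    """Convert a string of SPPAS symbols to an IPA string.

    The separator between symbols is an assumption (a space by default).
    An unknown symbol raises KeyError rather than being silently skipped.
    """
    symbols = [p for p in phonemes.split(separator) if p]
    return "".join(SPPAS_TO_IPA[p] for p in symbols)


if __name__ == "__main__":
    print(sppas_to_ipa("k_w O: N"))
```

Raising on unknown symbols is a deliberate choice in this sketch: a symbol missing from the tables usually signals a transcription in another language's phoneme set.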

Lexicons

Lexicons are (c) Laboratoire Parole et Langage, Aix-en-Provence, France:

yue.vocab contains a list of 47k different character-based words;
yue_chars.vocab is a list of 12k characters;
yue.repl and yue_chars.repl are used to convert symbols and abbreviations into their written-out text form.

These files are distributed under the terms of the GNU General Public License.

Pronunciation dictionaries

The two dictionaries were constructed from the most frequently observed pronunciations in a conversational corpus.

Acoustic Model

The Cantonese acoustic model is copyrighted: (C) DSP and Speech Technology Laboratory, Department of Electronic Engineering, the Chinese University of Hong Kong.

This is a monophone Cantonese acoustic model, based on Jyutping of the Linguistic Society of Hong Kong (LSHK). Each state is trained with 32 Gaussian mixtures. The model is trained with HTK 3.4.1. The corpus for training is CUSENT, also developed in our laboratory.

Generally speaking, you may use the model for non-commercial, academic or personal use.

See COPYRIGHT for the details of the license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.

We also have other well-trained Cantonese acoustic models. If you would like to use the models and/or the CUSENT corpus for commercial applications or development, please contact Professor Tan LEE for appropriate license terms.

The character pronunciations come from the Jyutping phrase box of the Linguistic Society of Hong Kong.

The copyright of the Jyutping phrase box belongs to the Linguistic Society of Hong Kong. We would like to thank the Jyutping Group of the Linguistic Society of Hong Kong for permission to use the electronic file in our research and/or product development.

If you use this model for academic research, please cite:

Tan Lee, W.K. Lo, P.C. Ching, Helen Meng (2002). Spoken language resources for Cantonese speech processing. Speech Communication, Volume 36, Issues 3–4, pp. 327-342.

Website: http://dsp.ee.cuhk.edu.hk
Email: tanlee@ee.cuhk.edu.hk

References

Roxana Fung, Brigitte Bigi (2015). Automatic word segmentation for spoken Cantonese. In Oriental COCOSDA and Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 196-201.