Dutch Language

Author

The Dutch resources were kindly created and shared by Alex Elio Stasica <alex.stasica1@gmail.com> under the terms of the CC BY-NC 4.0 license. Brigitte Bigi adapted them for use with SPPAS.

List of phonemes

SAMPA  IPA  Word Example (orthographic)  Word Example (SAMPA)
EI     ɛi   AANLEIDING                   a n l EI d I N
Au     ɑu   AUGUSTUS                     Au G Y s t Y s
ui     ui   BESLUITEN                    b @ s l ui t @
E      ɛ    BESMETTELIJK                 b @ s m E t @ l @ k
A      ɑ    BEVALLEN                     b @ v A l @
O      ɔ    BEVOLKT                      b @ v O l k t
I      I    BEVINDEN                     b @ v I n d @
Y      ʉ    BEWUST                       b @ w Y s t
Q      ɒ    APLOMB                       A p l Q
U      œ    AANPASSINGSMANOEUVRE         a m p A s I N s m a n U v r @
@      ə    AANRAAKTE                    a n r a k t @
e      e    ACHTEREENVOLGENDE            A x t @ r e n v O l G @ n d @
u      u    ACAJOU                       a k a Z u
y      y    BUURT                        b y r t
o      o    CONTROLEREN                  k O n t r o l e r @
i      i    CRISISMANAGEMENT             k r i s @ s m E n @ dZ m @ n t
a      a    COMBINATIE                   k O m b i n a t s i:
E:     ɛ:   GRES                         x r E:
O:     ɔ:   HEUVELS                      h O: v @ l s
y:     y:   JUS                          Z y:
i:     i:   KERSTVAKANTIE                k E r s t f a k A n s i:
p      p    KNAPPE                       k n A p @
b      b    LABYRINT                     l a b i r I n t
t      t    LAND                         l A n t
d      d    LEEFDEN                      l e v d @
k      k    LEEK                         l e k
g      g    VAKBLAD                      v A g b l A t
m      m    VERMAAK                      v @ r m a k
n      n    VERNOEMD                     v @ r n u m t
N      ŋ    VERONGELUKKEN                v @ r O N G @ l Y k @
s      s    VERRASSING                   v @ r A s I N
z      z    VERZETTEN                    v @ r z E t @
S      ʃ    AANGEMARCHEERD               a N G @ m A r S e r t
Z      ʒ    AARDBEIENGELEI               a r d b EI j @ Z @ l EI
l      l    AANVANKELIJK                 a n v A N k @ l @ k
x      x    AARDIG                       a r d @ x
G      ɣ    AARDIGE                      a r d @ G @
j      j    AQUARIUM                     a k w a r i j Y m
w      w    AUDIO-VISUAL                 Q d i j o v I Z u w @ l
r      r    AUSTRALISCH                  Au s t r a l i s
h      h    BEHALVE                      b @ h A l v @
dZ     dʒ   CRISISMANAGEMENT             k r i s @ s m E n @ dZ m @ n t

Fillers

SPPAS  Description
laugh  laughter
noise  noises, unintelligible speech
dummy  un-transcribed speech
fp     filled pause

Lexicons

The word list for the Dutch language was extracted from the pronunciation dictionary (see below). The files dut_num.repl and dut.repl were automatically generated by ChatGPT-4. They enable number-to-letter conversion and automatic replacements during the normalization task.
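
As a rough illustration of what automatic replacement during normalization means, the Python sketch below applies a token-to-replacement table to a whitespace-tokenized utterance. It is not the SPPAS implementation. The two-column layout (token, then replacement text) is an assumption about the replacement files, the function names are hypothetical, and the sample entries are invented for the example rather than copied from dut.repl or dut_num.repl.

```python
def load_replacements(lines):
    """Build a token -> replacement table from 'token replacement' lines.

    Assumed layout: the first whitespace-separated field is the token,
    the remainder of the line is the replacement text.
    """
    table = {}
    for line in lines:
        fields = line.strip().split(None, 1)
        if len(fields) == 2:  # skip blank or incomplete lines
            table[fields[0]] = fields[1].strip()
    return table


def apply_replacements(text, table):
    """Replace every token that has an entry; leave other tokens unchanged."""
    return " ".join(table.get(token, token) for token in text.split())


if __name__ == "__main__":
    # Illustrative entries only, not taken from the distributed files.
    sample_lines = ["3 drie", "% procent", ""]
    table = load_replacements(sample_lines)
    print(apply_replacements("ik heb 3 katten", table))
```

A lookup per token keeps the sketch simple; converting multi-digit numbers to words requires more than a flat table and is not covered here.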

Pronunciation Dictionary

The dictionary is based on the CELEX2 Dutch corpus, which contains approximately 321,000 words. CELEX provides a comprehensive pronunciation dictionary, which was reformatted for use with SPPAS and completed with fillers. It is distributed under the Creative Commons Attribution 4.0 International License, allowing for modification and redistribution. We acknowledge and appreciate this contribution.

Dictionary reference: Baayen, R. H., R. Piepenbrock, and L. Gulikers. CELEX-2 Dutch (Version 2.0) (1995) [Data set]. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a2-w5
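
The extraction of a word list from a pronunciation dictionary, as mentioned under Lexicons, can be sketched in Python as follows. This is an illustration under stated assumptions, not the script that was actually used. It assumes an HTK-like line layout (word, bracketed display form, then space-separated phonemes) with pronunciation variants marked by a numeric suffix such as (2). The function names and sample entries are hypothetical; the variant pronunciation in particular is invented for the example.

```python
import re

# Assumed variant marker: "land(2)" denotes a second pronunciation of "land".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_dict_line(line):
    """Return (word, phonemes) for one dictionary line, or None if malformed.

    Assumed layout: word [display form] phoneme phoneme ...
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    word = VARIANT_SUFFIX.sub("", fields[0])
    phonemes = fields[2:]  # fields[1] is the bracketed display form
    return word, phonemes


def extract_word_list(lines):
    """Collect the distinct words of a dictionary, sorted alphabetically."""
    words = set()
    for line in lines:
        parsed = parse_dict_line(line)
        if parsed is not None:
            words.add(parsed[0])
    return sorted(words)


if __name__ == "__main__":
    # Hypothetical entries written in the assumed layout.
    sample = [
        "land [land] l A n t",
        "land(2) [land] l A n d",
        "leek [leek] l e k",
    ]
    print(extract_word_list(sample))
```

Stripping the variant suffix before collecting the words ensures that a word with several pronunciations appears only once in the list.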

Acoustic model

The acoustic model was trained on the IFA Spoken Language Corpus, which provided both the audio recordings and the corresponding phonetically aligned transcriptions.

To address the absence of certain Dutch phonemes in the original dataset, a minor data augmentation process was applied. Specifically, a Dutch speech synthesizer was used to generate approximately 40 additional words containing the missing phonemes.

The model was developed following the guidelines outlined in the VoxForge tutorial.