Dutch Language
Author
The Dutch resources were kindly created and shared by Alex Elio Stasica <alex.stasica1@gmail.com> under the terms of the CC BY-NC 4.0 license. Brigitte Bigi adapted them for use with SPPAS.
List of phonemes
SAMPA | IPA | Word Example (orthographic) | Word Example (SAMPA) |
---|---|---|---|
EI | ɛi | AANLEIDING | a n l EI d I N |
Au | ɑu | AUGUSTUS | Au G Y s t Y s |
ui | œy | BESLUITEN | b @ s l ui t @ |
E | ɛ | BESMETTELIJK | b @ s m E t @ l @ k |
A | ɑ | BEVALLEN | b @ v A l @ |
O | ɔ | BEVOLKT | b @ v O l k t |
I | ɪ | BEVINDEN | b @ v I n d @ |
Y | ʉ | BEWUST | b @ w Y s t |
Q | ɒ | APLOMB | A p l Q |
U | œ | AANPASSINGSMANOEUVRE | a m p A s I N s m a n U v r @ |
@ | ə | AANRAAKTE | a n r a k t @ |
e | e | ACHTEREENVOLGENDE | A x t @ r e n v O l G @ n d @ |
u | u | ACAJOU | a k a Z u |
y | y | BUURT | b y r t |
o | o | CONTROLEREN | k O n t r o l e r @ |
i | i | CRISISMANAGEMENT | k r i s @ s m E n @ dZ m @ n t |
a | a | COMBINATIE | k O m b i n a t s i: |
E: | ɛ: | GRES | x r E: |
O: | ɔ: | HEUVELS | h O: v @ l s |
y: | y: | JUS | Z y: |
i: | i: | KERSTVAKANTIE | k E r s t f a k A n s i: |
p | p | KNAPPE | k n A p @ |
b | b | LABYRINT | l a b i r I n t |
t | t | LAND | l A n t |
d | d | LEEFDEN | l e v d @ |
k | k | LEEK | l e k |
g | g | VAKBLAD | v A g b l A t |
m | m | VERMAAK | v @ r m a k |
n | n | VERNOEMD | v @ r n u m t |
N | ŋ | VERONGELUKKEN | v @ r O N G @ l Y k @ |
s | s | VERRASSING | v @ r A s I N |
z | z | VERZETTEN | v @ r z E t @ |
S | ʃ | AANGEMARCHEERD | a N G @ m A r S e r t |
Z | ʒ | AARDBEIENGELEI | a r d b EI j @ Z @ l EI |
l | l | AANVANKELIJK | a n v A N k @ l @ k |
x | x | AARDIG | a r d @ x |
G | ɣ | AARDIGE | a r d @ G @ |
j | j | AQUARIUM | a k w a r i j Y m |
w | w | AUDIO-VISUAL | Q d i j o v I Z u w @ l |
r | r | AUSTRALISCH | Au s t r a l i s |
h | h | BEHALVE | b @ h A l v @ |
dZ | dʒ | CRISISMANAGEMENT | k r i s @ s m E n @ dZ m @ n t |
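As a minimal sketch of how the table above can be used, the following Python snippet converts a space-separated SAMPA transcription into an IPA string. The mapping copies only a subset of the rows above, and the names `SAMPA_TO_IPA` and `sampa_to_ipa` are hypothetical; they are not part of SPPAS.

```python
# Subset of the SAMPA-to-IPA table above (illustrative, not exhaustive).
SAMPA_TO_IPA = {
    "a": "a", "A": "ɑ", "@": "ə", "EI": "ɛi",
    "d": "d", "k": "k", "l": "l", "n": "n", "N": "ŋ",
    "r": "r", "t": "t", "x": "x", "G": "ɣ",
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a space-separated SAMPA string to an IPA string.

    Hypothetical helper: raises ValueError on symbols missing from the mapping.
    """
    ipa = []
    for symbol in transcription.split():
        if symbol not in SAMPA_TO_IPA:
            raise ValueError(f"Unknown SAMPA symbol: {symbol!r}")
        ipa.append(SAMPA_TO_IPA[symbol])
    return "".join(ipa)


print(sampa_to_ipa("a r d @ G @"))  # AARDIGE
print(sampa_to_ipa("l A n t"))      # LAND
```

Splitting on whitespace keeps multi-character symbols such as `EI` or `dZ` intact, which is why the transcriptions in the table separate phonemes with spaces.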
Fillers
SPPAS | Description |
---|---|
laugh | laughter |
noise | noises, unintelligible speech |
dummy | un-transcribed speech |
fp | filled pause |
Lexicons
The word list for the Dutch language was extracted from the pronunciation dictionary (see below).
The files dut_num.repl and dut.repl were automatically generated by ChatGPT-4.
They enable number-to-letter conversion and automatic replacements during the normalization task.
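To illustrate what such replacement tables do during normalization, here is a minimal Python sketch. It assumes a simple two-column layout (a token followed by its replacement); the actual layout of the .repl files should be checked against the distributed resources. The function names and the sample entries are hypothetical and are not taken from dut_num.repl or dut.repl.

```python
def parse_replacements(lines):
    """Parse 'token replacement' lines into a dict (assumed two-column layout)."""
    table = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            table[parts[0]] = parts[1].strip()
    return table


def normalize(text, table):
    """Replace every whitespace-separated token that has an entry in the table."""
    return " ".join(table.get(token, token) for token in text.split())


# Hypothetical sample entries, for illustration only.
SAMPLE = ["2 twee", "3 drie", "% procent"]
table = parse_replacements(SAMPLE)
print(normalize("ik heb 2 katten", table))
```

Tokens without an entry are passed through unchanged, so the replacement step can be applied to any tokenized text without a prior check.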
Pronunciation Dictionary
The dictionary is based on the CELEX2 Dutch corpus (link), which contains approximately 321,000 words. CELEX provides a comprehensive pronunciation dictionary, which was reformatted for use with SPPAS and completed with fillers. It is distributed under the Creative Commons Attribution 4.0 International License, which allows modification and redistribution. We acknowledge and appreciate this contribution.
Dictionary reference: Baayen, R. H., R. Piepenbrock, and L. Gulikers. CELEX-2 Dutch (Version 2.0) (1995) [Data set]. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a2-w5
Acoustic model
The acoustic model was trained using the IFA Spoken Language Corpus (link), which provided both the audio recordings and the corresponding phonetically aligned transcriptions.
To address the absence of certain Dutch phonemes in the original dataset, a minor data augmentation process was applied. Specifically, a Dutch speech synthesizer (link) was used to generate approximately 40 additional words containing the missing phonemes.
The model was developed following the guidelines outlined in the VoxForge tutorial.