Polish Language
Download
This chapter describes the linguistic resources included in the file pol.zip of the lang folder.
List of phonemes
Consonant Plosives
SPPAS | IPA | Description | Examples |
---|---|---|---|
p | p | voiceless bilabial | paw, pan |
b | b | voiced bilabial | bal, ból |
t | t | voiceless (post)dental | tak, tata |
d | d | voiced (post)dental | dom, dawać |
k | k | voiceless velar | kot, kawa |
g | g | voiced velar | gar, gwiazda |
Consonant Fricatives
SPPAS | IPA | Description | Examples |
---|---|---|---|
f | f | voiceless labiodental | fan, fotel, faza |
s | s | voiceless (post)dental | sabat, subaru, sen |
s\ | ɕ | voiceless alveolo-palatal | świerszcz, śpi, się, siwy |
s` | ʂ | voiceless retroflex | |
z | z | voiced (post)dental | za, ząb |
z\ | ʑ | voiced alveolo-palatal | źrebak, zima, ziemia |
z` | ʐ | voiced retroflex | |
Z | ʒ | voiced postalveolar | że, żaba, rzeka |
v | v | voiced labiodental | wrak, lawenda |
x | x | voiceless velar | hak, chór |
S | ʃ | voiceless postalveolar | szum, sztama, Szczecin, Warszawa |
Consonant Nasals
SPPAS | IPA | Description | Examples |
---|---|---|---|
m | m | bilabial | mój, mama |
n | n | (post)dental | nos, nowy |
n` | ɳ | retroflex | |
N | ŋ | velar | bank, gang |
n\ | ɲ | alveolo-palatal | |
Consonant Liquids
SPPAS | IPA | Description | Examples |
---|---|---|---|
l | l | lateral (post)dental | las, lato |
r | r | alveolar trill/flap | rok, rata, krok |
Semivowels / Approximants
SPPAS | IPA | Description | Examples |
---|---|---|---|
j | j | front approximant | ja, jajo, już |
w | w | back approximant | ławka, łyk, łże |
Vowels
SPPAS | IPA | Description | Examples |
---|---|---|---|
a | a | open front unrounded | pat, ptak, Ala, adres |
E | ɛ | open-mid front unrounded | test, ten, Ewa, deszcz |
O | ɔ | open-mid back rounded | pot, kot, Ola, ogród, ogórek |
i | i | close front unrounded | miś, Irena, instytut |
I\ | ɨ | near-close central unrounded | ryba, mysz, być |
u | u | close back rounded | bum, uwaga, tutaj, wóz |
y | y | close front rounded | mysz |
Nasal vowels
SPPAS | IPA | Examples |
---|---|---|
E~ | ɛ̃ | węże, kęsy |
o~ | ɔ̃ | wąż, mąż, wąsy |
Affricates
SPPAS | IPA | Description | Examples |
---|---|---|---|
t^S | t͡ʃ | voiceless postalveolar | czas, czy, czwartek |
t^s | t͡s | voiceless (post)dental | co, cały, Francja |
t^s\ | t͡ɕ | voiceless alveolo-palatal | ćwiczenie, pamięć |
d^z | d͡z | voiced (post)dental | dzwon, sadza |
d^z\ | d͡ʑ | voiced alveolo-palatal | dźwięk, dziwny, niedziela |
d^Z | d͡ʒ | voiced postalveolar | dżem, drożdże, dżuma, dżungla |
Fillers
SPPAS | Description |
---|---|
laugh | laughter |
noise | noises, unintelligible speech |
dummy | un-transcribed speech |
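The correspondences in the tables above can be used to display a phonetized string in IPA. The following is a minimal sketch, not part of SPPAS: the function name `to_ipa` and the choice of separator are assumptions made for illustration, and the mapping only covers a subset of the listed symbols.

```python
# Subset of the SPPAS-to-IPA correspondences listed in the tables above.
# A trailing backslash in a SPPAS symbol is written "\\" in Python source.
SPPAS_TO_IPA = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "s": "s", "z": "z", "v": "v", "x": "x",
    "S": "ʃ", "Z": "ʒ", "s\\": "ɕ", "z\\": "ʑ",
    "m": "m", "n": "n", "N": "ŋ", "n\\": "ɲ",
    "l": "l", "r": "r", "j": "j", "w": "w",
    "a": "a", "E": "ɛ", "O": "ɔ", "i": "i", "I\\": "ɨ", "u": "u",
    "E~": "ɛ̃", "o~": "ɔ̃",
    "t^S": "t͡ʃ", "t^s": "t͡s", "t^s\\": "t͡ɕ",
    "d^z": "d͡z", "d^z\\": "d͡ʑ", "d^Z": "d͡ʒ",
}


def to_ipa(phonemes, sep=" "):
    """Convert a string of SPPAS symbols into a list of IPA symbols.

    `sep` is the delimiter between symbols (an assumption: adapt it to
    the actual output being processed). Symbols that are not in the
    mapping, such as the fillers, are returned unchanged.
    """
    return [SPPAS_TO_IPA.get(p, p) for p in phonemes.split(sep) if p]


print(to_ipa("k O t"))           # symbols for "kot"
print(to_ipa("k-O-t", sep="-"))  # same word, different separator
```

Keeping unknown symbols unchanged means that fillers such as laugh or noise pass through without raising an error.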
Lexicons
All Polish lexicons are (c) Laboratoire Parole et Langage, Aix-en-Provence, France:
pol.vocab contains a list of 500k different words;
pol_num.repl converts numbers to their written form;
pol.repl converts symbols and abbreviations into their text form.
All of them are distributed under the terms of the GNU General Public License.
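To illustrate how a vocabulary and a replacement table are typically combined during text normalization, here is a minimal sketch. It assumes a two-column layout (token, then replacement) for the replacement entries, which may not match the actual file format. The function names and the sample entries are hypothetical and are not copied from pol.repl or pol.vocab.

```python
def load_replacements(lines):
    """Build a token -> replacement dict from two-column lines.

    The two-column layout is an assumption made for this sketch.
    """
    table = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1].strip()
    return table


def normalize(text, vocab, replacements):
    """Replace known tokens, then list the words missing from the vocabulary."""
    tokens = [replacements.get(tok, tok) for tok in text.split()]
    words = " ".join(tokens).lower().split()
    unknown = [w for w in words if w not in vocab]
    return words, unknown


# Hand-written illustrative entries (hypothetical, not taken from the files).
repl = load_replacements(["% procent", "& i"])
vocab = {"kot", "i", "pies", "procent"}

words, unknown = normalize("kot & pies 100 %", vocab, repl)
print(words)    # tokens after replacement
print(unknown)  # tokens absent from the vocabulary
```

In this sketch the number 100 is left untouched and reported as unknown, which is the kind of token a number-to-text table such as pol_num.repl is meant to handle.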
Pronunciation Dictionary
The Polish pronunciation dictionary was downloaded in 2015 from the Ralf catalog of dictionaries for the Simon ASR system at http://spirit.blau.in/simon/import-pls-dictionary/.
It was then converted (format and phoneset) and corrected by Brigitte Bigi, with the help of Katarzyna Klessa (http://katarzyna.klessa.pl/). It was updated in 2017 to correct systematic errors.
It is distributed under the terms of the GNU General Public License.
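As an illustration of how such a dictionary can be consulted, the sketch below parses entries into a word-to-pronunciations mapping. It assumes an HTK-style line layout (word, optional variant index such as `(2)`, optional bracketed form, then space-separated phonemes); the actual layout of the distributed file may differ. The sample entries are hand-written with symbols from the phoneme tables and are not taken from the dictionary.

```python
import re
from collections import defaultdict

# Assumed entry layout: word, optional "(n)" variant index, optional
# bracketed form, then the space-separated phonemes.
ENTRY = re.compile(
    r"^(?P<word>\S+?)(?:\(\d+\))?\s+(?:\[[^\]]*\]\s+)?(?P<phones>.+)$"
)


def parse_dict(lines):
    """Group all pronunciation variants under their word."""
    prons = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            prons[match.group("word")].append(match.group("phones").split())
    return dict(prons)


# Hypothetical entries, written for this example only.
sample = [
    "kot [kot] k O t",
    "bank [bank] b a N k",
    "bank(2) [bank] b a n k",
]

lexicon = parse_dict(sample)
print(lexicon["bank"])  # both variants of the same word
```

Grouping variants under a single key keeps lookups simple: a word maps to a list of phoneme sequences, whatever the number of variants.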
Acoustic Model
The acoustic model was created by Brigitte Bigi. We address special thanks to Katarzyna Klessa for giving us access to a corpus.
It is distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.
The model was created using a Python script available in the SPPAS package: acmtrain.py.
References
Brigitte Bigi, Katarzyna Klessa (2015). Automatic Syllabification of Polish. In 7th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 262-266, Poznań, Poland.