Pronunciation dictionary manager.
A pronunciation dictionary contains a list of tokens, each one with a list of possible pronunciations.
sppasDictPron can load the dictionary from an HTK-ASCII file. Each line of such a file looks like the following:

    acted [acted] { k t e d
    acted(2) [acted] { k t i d

The first column contains the token, optionally followed by the variant number in parentheses. The second column (in brackets) is ignored; it should contain the token. The remaining columns are the phones, separated by whitespace. sppasDictPron accepts missing variant numbers, empty brackets, or missing brackets.
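To make the line layout concrete, here is a minimal sketch of how one such line could be split into its parts. `parse_htk_line` is a hypothetical helper written for illustration only. It is not part of sppasDictPron, and it assumes the bracketed column never contains internal whitespace.

```python
import re

# Hypothetical helper, NOT part of sppasDictPron: it only illustrates
# the HTK-ASCII line layout (token[(n)] [token] phone phone ...).
_TOKEN = re.compile(r"^(?P<token>.+?)(\((?P<num>\d+)\))?$")


def parse_htk_line(line):
    """Split one dictionary line into (token, variant, phones).

    variant is None when the token has no "(n)" suffix.
    The bracketed second column, if present, is skipped.
    """
    columns = line.split()
    if not columns:
        raise ValueError("empty line")
    match = _TOKEN.match(columns[0])
    token = match.group("token")
    variant = int(match.group("num")) if match.group("num") else None
    rest = columns[1:]
    # The bracketed column is optional and may be empty ("[]").
    # Assumption: it contains no whitespace, so it is a single column.
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        rest = rest[1:]
    return token, variant, rest
```

For example, `parse_htk_line("acted(2) [acted] { k t i d")` returns `('acted', 2, ['{', 'k', 't', 'i', 'd'])`, and a line without brackets such as `"acted { k t i"` still yields the token and its phones.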
Example
>>> d = sppasDictPron('eng.dict')
>>> d.add_pron('acted', '{ k t e')
>>> d.add_pron('acted', '{ k t i')
Then, the phonetization of a token can be accessed with the get_pron() method:
Example
>>> print(d.get_pron('acted'))
{-k-t-e-d|{-k-t-i-d|{-k-t-e|{-k-t-i
The following convention is adopted to represent the pronunciation variants:
- '-' separates the phones (X-SAMPA standard)
- '|' separates the variants
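Because the result of get_pron() is a plain string, the two separators are enough to move between that string and a list of variants, each a list of phones. The sketch below is illustrative; `join_variants` and `split_variants` are hypothetical names, not sppasDictPron methods.

```python
PHONE_SEP = "-"    # separates the phones of one variant
VARIANT_SEP = "|"  # separates the variants of one token


def join_variants(variants):
    """Build a get_pron()-style string from lists of phones (hypothetical helper)."""
    return VARIANT_SEP.join(PHONE_SEP.join(phones) for phones in variants)


def split_variants(pron):
    """Inverse of join_variants(): return one list of phones per variant."""
    return [variant.split(PHONE_SEP) for variant in pron.split(VARIANT_SEP)]
```

Applied to the output shown above, `split_variants("{-k-t-e-d|{-k-t-i-d|{-k-t-e|{-k-t-i")` gives four variants, the second being `['{', 'k', 't', 'i', 'd']`, and `join_variants` rebuilds the original string.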
Notice that tokens in the dictionary are case-insensitive.
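The behaviour described above (whitespace-separated phones on input, '-' and '|' on output, case-insensitive tokens) can be mimicked by a tiny in-memory class. `MiniDictPron` is a hypothetical stand-in, not the SPPAS implementation. Normalizing tokens with `str.lower()` and silently skipping duplicate variants are assumptions made for this sketch.

```python
class MiniDictPron:
    """Hypothetical in-memory stand-in for a pronunciation dictionary."""

    def __init__(self):
        self._dict = {}  # lower-cased token -> list of '-'-joined variants

    @staticmethod
    def _key(token):
        # Case-insensitive lookup: tokens are stored and queried lower-cased.
        return token.strip().lower()

    def add_pron(self, token, pron):
        # Whitespace-separated phones become '-'-separated phones.
        new_pron = "-".join(pron.split())
        variants = self._dict.setdefault(self._key(token), [])
        # Assumption of this sketch: an already known variant is not added twice.
        if new_pron not in variants:
            variants.append(new_pron)

    def get_pron(self, token, unknown=None):
        # Variants are joined with '|'; unknown tokens return `unknown`.
        variants = self._dict.get(self._key(token))
        return "|".join(variants) if variants else unknown
```

With this stand-in, calling `add_pron('acted', '{ k t e d')` and then `add_pron('ACTED', '{ k t i d')` makes `get_pron('Acted')` return `'{-k-t-e-d|{-k-t-i-d'`.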