The answer is in both the tutorials and the documentation. Double-click the sppas.bat file under Windows, or the sppas.command file under macOS/Linux. Test SPPAS features with the provided samples.
SPPAS can only deal with UTF-8 encoding. If the input file is not UTF-8, an encoding error message is displayed, so the files must be converted.
Solution for TextGrid files: First, Praat has to be configured properly. Launch Praat, then click on the Praat menu. Click on "Preferences", then on "Text writing preferences...". Choose "Output encoding UTF-8" (it's the first choice). Then each file has to be converted: open the file, then save it.
Solution for any file under Windows: Open the file(s) with Notepad++, choose UTF-8 in the "Encoding" menu then save.
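The same conversion can also be scripted, which helps when many files are affected. The following is a minimal sketch, not part of SPPAS: the function name convert_to_utf8 is hypothetical, and it assumes you already know the current encoding of the file (for example "utf-16" or "latin-1"), since no detection is attempted.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding):
    """Re-encode a text file to UTF-8, overwriting it in place.

    source_encoding must be the file's actual current encoding
    (an assumption made by the caller; it is not detected here).
    """
    path = Path(path)
    # Work on bytes so that line endings are left untouched.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return path
```

For instance, convert_to_utf8("sample.TextGrid", "utf-16") rewrites a UTF-16 file as UTF-8. Keep a backup of the original, because a wrong source_encoding either raises an error or silently produces garbled characters.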
All automatic annotations included in SPPAS are implemented with language-independent algorithms. This means that adding a new language to SPPAS only consists of adding the linguistic resources related to the annotation (lexicons, dictionaries, models, sets of rules, etc.).
Linguistic resources can be edited or deleted by any user.
Support for new languages is added step by step, by adding linguistic resources, and constructing linguistic resources requires collaboration with linguists. So any help is welcome!
Find more details in chapter 3: each annotation requiring a specific resource has a section "Support of a new language".
SPPAS can work on audio files of any length, as long as the computer has enough memory.
When a token is missing from the pronunciation dictionary, SPPAS tries to phonetize it by analogy with other entries of the dictionary, and this warning message is issued. If the proposed phonetization is appropriate, you can just ignore the warning. If not, you can edit the dictionary, add the correct pronunciation, then run the phonetization again. See Chapter 3, Section 5 for details, and the section for the language in Chapter 4.
This error message means that the given file contains a degenerate interval. It has to be corrected before using SPPAS.
No, and it won't! SPPAS is not an Automatic Speech Recognition (ASR) system.
No existing ASR system is able to produce the high-quality orthographic transcription that is required for further reliable analyses!
Orthographic transcription has to be done manually into IPUs, and it must follow the convention briefly described in (Chapter 3, Section 3) and detailed here.
The procedure outcome report indicates an error message, and the IPU is not time-aligned.
It can happen when something is wrong: Julius is not installed, the audio signal quality is not good, the orthographic transcription does not really match the audio signal, there are errors in the phonetization, etc. You can then try to identify the problem and solve it! You can also enable the "basic" option, and SPPAS will assign the same duration to each phoneme of that specific IPU.
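To make the idea of "the same duration for each phoneme" concrete, here is a minimal sketch. It is not SPPAS code: the function name equal_duration_alignment and the (start, end, phoneme) tuple layout are assumptions chosen for illustration only.

```python
def equal_duration_alignment(phonemes, start, end):
    """Split the interval [start, end] into equal parts, one per phoneme.

    Returns a list of (start_time, end_time, phoneme) tuples.
    Hypothetical helper for illustration; times are in seconds.
    """
    if not phonemes:
        return []
    if end <= start:
        raise ValueError("end must be greater than start")
    step = (end - start) / len(phonemes)
    return [
        (start + i * step, start + (i + 1) * step, phoneme)
        for i, phoneme in enumerate(phonemes)
    ]
```

Calling equal_duration_alignment(["b", "o", "n"], 1.0, 2.5) yields three consecutive intervals of half a second each. Such a result keeps the IPU usable, but the phoneme boundaries carry no acoustic information.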
From SPPAS version 2.3, when I run a Python script, I found that SPPAS doesn't create annotated files. My input file name is "bcf2e54f-16ea-46d2-a969-38bda8b9265e.wav". When I remove "-", it runs. If my file name is "1-1-1.wav", it runs too. Prior to this version, SPPAS didn't behave this way.
There's no way for the workspace manager to tell whether the character "-" is part of a filename or a "root-pattern" separator. I recommend using "_" instead.
In detail: this is not a bug but the consequence of a new feature. From version 2.3, annotations of SPPAS are based on the use of "Workspaces". All the advantages of this new feature are available in the new GUI (requires python3 + wx4). In workspaces, filenames like "oriana1.wav", "oriana1.TextGrid" or "oriana1-token.TextGrid" all share the same file root, which is "oriana1", and annotations append a pattern to that root, like "-token". When annotating, the workspace manager searches for the file root from the given filename and so...
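The recommendation to use "_" instead of "-" can be applied to input files before they are annotated. The sketch below is not part of SPPAS; the function name safe_name is hypothetical. It should only be applied to original input files, not to files SPPAS has already produced, since those rely on "-" to mark the pattern.

```python
from pathlib import Path


def safe_name(filename):
    """Return filename with every "-" in its stem replaced by "_".

    The extension and any leading directory are kept unchanged.
    Hypothetical helper for illustration only.
    """
    p = Path(filename)
    return str(p.with_name(p.stem.replace("-", "_") + p.suffix))
```

With the file name from the question, safe_name("bcf2e54f-16ea-46d2-a969-38bda8b9265e.wav") gives "bcf2e54f_16ea_46d2_a969_38bda8b9265e.wav". The actual renaming on disk can then be done with Path(old).rename(new).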
SPPAS can open, save and convert files in a wide range of file formats including (but not limited to) TextGrid, eaf, antx, mrk, ctm, stm, lab, srt, sub, csv, txt... The full list is available in the documentation, chapter "Introduction", section "Compatibility and Operability".