Multi-Lingual approaches to the automatic annotation of speech

Brigitte Bigi

Thursday 29th September - Aix-en-Provence

Introduction

Corpus and annotation

Corpus annotation “can be defined as the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data. 'Annotation' can also refer to the end-product of this process” (Leech, 1997).

Annotations

Time-synchronized annotations

Example of multi-level annotations

Annotation software

Automatic annotation

Before using any automatic annotation tool, it is important to consider its error rate (where applicable) and to estimate how those errors will affect the intended use of the annotated corpus.
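
One common way to estimate such an error rate is to compare the tool's output with a manually annotated reference on a small sample. The sketch below is illustrative only and is not taken from the talk: the function names `edit_distance` and `error_rate` and the phoneme-like labels are hypothetical. It counts substitutions, deletions and insertions between the two label sequences and divides by the length of the reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # label missing from hyp
                            curr[j - 1] + 1,      # extra label in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical sample: manual reference vs. automatic output.
manual = ["b", "o~", "Z", "u", "R"]
automatic = ["b", "o", "Z", "u"]
rate = error_rate(manual, automatic)
print(f"error rate: {rate:.2f}")
```

An error rate measured this way is only an estimate: it depends on how representative the sample is, and whether it is acceptable depends on what the annotated corpus will be used for.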

My research

Summary