Introduction

Corpus and annotation

Getting a corpus

Corpus annotation: Manual vs. Automatic

Example of automatic time-alignment vs manual time-alignment

Annotation procedure

Example of automatic time-alignment, from a speech file and its orthographic transcription

Multi-domain annotations

Annotation tools

Annotation: data and tool relations

It is unfortunate that there is still today an enormous gap between the community of linguists and phoneticians on the one hand and that of engineers and computer scientists on the other. Each community needs the other and, in an ideal world, linguists would provide theoretical frameworks and data which are useful to engineers, while engineers would provide tools which are useful to linguists. The exchange between the two communities, however, is in practice very slow. (D.J. Hirst 2006: 198)

Limitations of annotation tools

When multiple annotations are integrated into a single data set, inter-relationships between the annotations can be explored both qualitatively (by using database queries that combine levels) and quantitatively (by running statistical analyses or machine learning algorithms).

However, when such multi-layer corpora are to be created with existing task-specific annotation tools, a new problem arises: output formats of the annotation tools can differ considerably.

Tools for the analysis of annotations

With the help of multimodal corpora searches, the investigation of the temporal alignment (synchronized co-occurrence, overlap or consecutivity) of gesture and talk has become possible. (Abuczki and Baiat Ghazaleh, 2013)
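As an illustration of the temporal relations named above, here is a minimal Python sketch that classifies how two annotated intervals (for example a gesture and a word) relate in time. The `Interval` type, the function name `temporal_relation`, the category labels and the 50 ms tolerance are assumptions made for this example; they are not taken from any particular annotation tool or from the cited study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One annotated segment on a tier (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # annotation label, e.g. a word or a gesture type


def temporal_relation(a: Interval, b: Interval, tol: float = 0.05) -> str:
    """Classify the temporal relation between two intervals.

    tol is an assumed tolerance (in seconds) within which two
    boundaries are treated as coinciding.
    """
    # Both boundaries coincide within the tolerance.
    if abs(a.start - b.start) <= tol and abs(a.end - b.end) <= tol:
        return "synchronized"

    # Shared duration; negative when there is a gap between the intervals.
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared > tol:
        return "overlap"
    if shared >= -tol:
        # The intervals touch, or nearly touch, end to start.
        return "consecutive"
    return "disjoint"


if __name__ == "__main__":
    gesture = Interval(1.20, 1.80, "pointing")
    word = Interval(1.22, 1.79, "there")
    print(temporal_relation(gesture, word))
```

Running such a function over every pair of intervals from a gesture tier and a speech tier would give counts per relation, which is one simple way to approach the kind of alignment study described above.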

Automatic annotation analysis

Automatic pairwise duration difference plot

Wagner Quadrants plot (Genre G: Fiction (General))

Automatic item duration comparison

Box plot of Mandarin tone durations

A methodology for annotation...

The Automatic Annotator

The Automatic Analyzer

A methodology for annotation...

Why a rigorous methodology?

Quick and dirty prototyping

Summary