Basic concepts

Corpus and annotation

Multi-domain annotations

Annotation software

A methodology for annotation...

Corpus annotation: Manual vs. Automatic

Example of automatic time-alignment vs manual time-alignment

The Automatic Annotator (an example)

The Automatic Analyzer (an example)

Getting/Sharing a corpus

Corpora - Examples (created at LPL)

Screenshots of 4 corpora (left to right): CID, GrenelleII, Aix MapTask, DVD

CID - Corpus of Interactional Data

CID - Extracts

CID - a pioneer

Then...

  1. an annotation scheme was developed for each annotation level
  2. the framework I'm currently presenting was developed
  3. automatic tools were adapted or designed
  4. a multi-level request system was designed

... annotated either by LPL, LLING or LIMSI.

CID - Current annotations (1)

  1. Enriched orthographic transcription (manual)
    • time-aligned at the IPU level (automatic)

CID - Current annotations (2)

  1. Time-aligned phonemes, tokens and events such as noises and laughter (automatic)
  2. Time-aligned syllables (automatic)
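
To make the idea of time-aligned annotation levels concrete, here is a minimal Python sketch of how tokens could be looked up inside an IPU once both carry start and end times. The `Interval` class, the `within` helper, the labels and the time values are all invented for illustration; they are not taken from CID or from any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One time-aligned annotation: a label anchored to a time span."""
    start: float  # seconds
    end: float    # seconds
    label: str


def within(tier, span):
    """Return the intervals of `tier` that lie entirely inside `span`."""
    return [i for i in tier if span.start <= i.start and i.end <= span.end]


# Hypothetical data: one IPU and a token tier that also holds a laughter event.
ipu = Interval(0.50, 1.70, "ipu_1")
tokens = [
    Interval(0.50, 0.80, "oui"),
    Interval(0.80, 1.25, "mais"),
    Interval(1.25, 1.70, "@"),    # laughter; the "@" symbol is an assumption
    Interval(2.10, 2.40, "non"),  # belongs to a later IPU
]

labels = [t.label for t in within(tokens, ipu)]
print(labels)
```

Because every level shares the same time axis, the same containment test could relate any two levels (for example syllables to tokens), which is what makes automatic alignment useful for later multi-level queries.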

CID - Current annotations (3)

  1. Prosodic contours (manual)
  2. Momel - Modelling melody (automatic)
  3. INTSINT - INternational Transcription System for INTonation (automatic)

CID - Current annotations (4)

  1. Morpho-syntax and syntax time-aligned at the token level (automatic)
  2. Time-aligned lemmas (automatic)

CID - Current annotations (5)

  1. Dysfluencies (manual)
  2. Discourse and interaction (manual)
  3. Other- and self-repetitions (semi-automatic)

AB:

CM:

AB-CM:

CID - Current annotations (6)

  1. Gestures: postural, face, hands (manual)

CID - to summarize

GrenelleII

GrenelleII: annotations

  1. Enriched orthographic transcription (manual)
    • time-aligned at the utterance level (automatic)
  2. Time-aligned phonemes, tokens and events (automatic)
  3. Time-aligned syllables (automatic)
  4. Prosodic contours and intonation (manual)
  5. Morpho-syntax time-aligned at the token level (automatic)
  6. Self-repetitions (semi-automatic)
  7. Interruptions (manual)

GrenelleII: Multi-modal analysis

Aix Map-Task

Aix Map-Task: Screenshot

Face to face Aix Map-Task

Aix Map-Task: Annotations

  1. Enriched orthographic transcription (manual)
    • time-aligned at the utterance level (manual in 2002 / automatic in 2013)
  2. Time-aligned phonemes, tokens and events (automatic)
  3. Time-aligned syllables (automatic)
  4. Feedback (semi-automatic)

Why a rigorous methodology?

Quick and dirty prototyping

Summary