Conclusion

To sum up

  1. SPPAS: automatic speech segmentation
  2. Momel and INTSINT: modelling of pitch contours and microprosody
  3. TGA: analysis of interpausal time groups

Corpora

Screenshots of 4 corpora (left to right): CID, GrenelleII, Aix MapTask, DVD

CID - Corpus of Interactional Data

CID - Extracts

CID - Current annotations (1)

  1. Enriched orthographic transcription (manual)
    • time-aligned at the utterance level (automatic)

CID - Current annotations (2)

  1. Time-aligned phonemes, tokens, and events such as noises and laughter (automatic)
  2. Time-aligned syllables (automatic)

CID - Current annotations (3)

  1. Prosodic contours (manual)
  2. Momel - Modelling of melody (automatic)
  3. INTSINT - INternational Transcription System for INTonation (automatic)

CID - Current annotations (4)

  1. Morpho-syntax and syntax time-aligned at the token level (automatic)
  2. Time-aligned lemmas (automatic)

CID - Current annotations (5)

  1. Dysfluencies (manual)
  2. Discourse and interaction (manual)
  3. Other- and self-repetitions (semi-automatic)

CID - Current annotations (6)

  1. Gestures: posture, face, hands (manual)

GrenelleII

GrenelleII: annotations

  1. Enriched orthographic transcription (manual)
    • time-aligned at the utterance level (automatic)
  2. Time-aligned phonemes, tokens and events (automatic)
  3. Time-aligned syllables (automatic)
  4. Prosodic contours and intonation (manual)
  5. Morpho-syntax time-aligned at the token level (automatic)
  6. Self-repetitions (semi-automatic)
  7. Interruptions (manual)

Corpus creation workflow