Created in 2021 by Brigitte Bigi and Maryvonne Zimmermann
The corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. It can be used for any research or teaching purpose. The corpus is hosted on the ORTOLANG repository.
We asked the 23 volunteer participants to read aloud and to cue one topic among the 10 we prepared. Two participants agreed to read 2 different topics. Each topic was made up of 4 sessions; the sessions were recorded separately so that the participant could take a short break.
The corpus is made up of 4 hours of high-quality audio-video recordings.
Created in 2010-2011 by Brigitte Bigi, Cristel Portes, Agnès Steuckardt, Marion Tellier.
B. Bigi, C. Portes, A. Steuckardt, M. Tellier
Multimodal Annotations and Categorization for Political Debates,
ICMI Workshop on Multimodal Corpora for Machine Learning (ICMI-MMC), Alicante (Spain), 2011
B. Bigi, C. Portes, A. Steuckardt, M. Tellier
A Multimodal Study of Answers to Disruptions,
Journal on Multimodal User Interfaces, Volume 7, Issue 1, Pages 55-66, Springer (Publisher). ISSN 1783-7677. DOI 10.1007/s12193-012-0110-zi, 2012
Created in 2011 by Brigitte Bigi and Pauline Péri
Corpus in French, manually phonetized and time-aligned at the phoneme level. Its duration is 7 minutes (5400 phones), and it is made up of 3 extracts from the following corpora: CID, AixOx and Grenelle.
Freely available for download: SLDR 000786
B. Bigi, P. Péri, R. Bertrand
Orthographic Transcription: Which Enrichment is required for Phonetization?,
Language Resources and Evaluation Conference, Istanbul (Turkey), pages 1756-1763, ISBN 978-2-9517408-7-7. 2012
Read corpus created between 2010 and 2012 by Sophie Herment, Anastassia Loukina, Anne Tortel, Daniel Hirst, Brigitte Bigi
40 paragraphs of about 1 minute each in French and English, from the EUROM 1 corpus. French texts are read by native French speakers (mainly from Aix-en-Provence) and by native English speakers (from Oxford). English texts are read by native English speakers and by native French speakers.
Freely available for download: SLDR 000784
S. Herment, A. Loukina, A. Tortel, D. Hirst, B. Bigi
AixOx, a multi-layered learners corpus: automatic annotation,
Proceedings of the International Conference on Corpus Linguistics, Jaén (Spain), March 2012.
Enriched orthographic transcription (manual), time-aligned at the IPU level (automatic)
Time-aligned phonemes and tokens and events like noises, laughter (automatic) and time-aligned syllables (automatic)
Prosodic contours (manual), Momel - Modelization of melody (automatic) and INternational Transcription System for INTonation (automatic)
Morpho-syntax and syntax time-aligned at the token level (automatic) and time-aligned lemmas (automatic)
Dysfluencies (manual), Discourse and interaction (manual), Other- and Self- Repetitions (semi-automatic)
P. Blache, R. Bertrand, B. Bigi, E. Bruno, E. Cela, R. Espesser,
G. Ferré, M. Guardiola, D. Hirst, E.-P. Magro, J.-C. Martin,
C. Meunier, M.-A. Morel, E. Murisasco, I. Nesterenko, P. Nocera,
B. Pallaud, L. Prévot, B. Priego-Valverde, J. Seinturier,
N. Tan, M. Tellier, S. Rauzy
Multimodal Annotation of Conversational Data,
The Fourth Linguistic Annotation Workshop, ACL 2010, pages 186-191, Uppsala, Sweden, 2010.