I am a CNRS researcher at Laboratoire Parole et Langage in Aix-en-Provence, France.
My research focuses on reproducible methods for speech and multimodal interaction data, spanning applied computational linguistics, corpus science, and time-aligned signal processing. My earlier work includes automatic speech recognition (ASR) and natural language processing (NLP).
My approach is methodologically rigorous and implementation-aware: I define protocols for corpus design, annotation conventions, and quality criteria; implement and validate the associated workflows; and ensure results are traceable, auditable, and reusable. I work end-to-end, from corpus constraints to quantitative analyses, prioritizing robustness and explicit methodological choices.
I’m the author of SPPAS, which I use to translate these protocols into validated, reusable workflows.
In recent years, I have focused on tools and workflows that support annotated datasets: automatic and semi-automatic annotation, management of multi-level annotations, filtering and querying of structured layers, and statistical analyses of time-aligned measures. This work requires careful handling of heterogeneous data (recordings, aligned annotations, multimodal streams), systematic verification of inputs and outputs, and documentation that allows other teams to reproduce analyses reliably.
I also contribute to Open Science practices beyond publication: sustainable dissemination of methods and resources, long-term maintenance of research-grade software, and alignment with FAIR principles (clear documentation, standard formats, and reuse-oriented outputs).
More recently, I have developed multimodal corpus methodologies and computational analyses for French Cued Speech (LfPC), combining speech, video and manual/facial cues to study accessible communication in the context of hearing impairment.
Award Ceremony 2022 (audio/video, in French)
Recent projects and activities
- Leader of the project "VizLector" (2025-2028), funded by AMIDEX.
- Leader of the project "AutoCuedSpeech" (2023-2026), funded by FIRAH: https://auto-cuedspeech.org
- Contributor to Vapvisio (2018-2021), funded by ANR.
- Contributor to Physocial (2018-2019), funded by Aix-Marseille Univ.
- Leader of a Procore project (2015-2016), in collaboration with Poly U of Hong Kong, funded by Campus France.
- Contributor to Variamu (2014-2015), funded by Aix-Marseille Univ.
- Contributor to ORTOLANG (2012-2015), Equipex.
Resume
Past research topics are related to text corpora:
- Formalization and constitution of corpora, including for under-resourced languages
- Application to information retrieval (classification)
- Application to automatic speech recognition (French, Vietnamese, Khmer)
- Application to statistical translation (French-English, French-Vietnamese)
I graduated from Avignon University with a PhD in Computer Science. From 1997 to 2000, I worked with Professor Renato De Mori at LIA, France, on statistical language modelling for automatic speech recognition and information retrieval. During that time I introduced a new, effective model for topic identification.
From 2000 to 2002, my work at LORIA focused on topic identification in newspaper articles and e-mails.
From 2002 to 2009, I worked at LIG on statistical language modelling for automatic speech recognition and statistical machine translation.
Since 2009, at LPL (Laboratoire Parole et Langage, Aix-en-Provence, France), my research has focused on corpus creation and annotation of speech recordings. I favor language-independent approaches to tool and system development, so that the results can be used both for languages with few available data resources and for languages with an unexpectedly large amount of (unnecessary) data.
Professional Experiences
- 2025-present, CRHC CNRS at LPL Aix-en-Provence, France
- 2009-2024, CRCN CNRS at LPL Aix-en-Provence, France
- 2006-2009, CR1 CNRS at LIG Grenoble, France
- 2002-2006, CR2 CNRS at LIG Grenoble, France
- 2004 (1st semester), invited researcher at ICSI Berkeley, CA, USA
- 2000-2002, ATER (Assistant) at LORIA Nancy, France
- 1997-2000, PhD student at LIA Avignon, France