I am convinced that research should be public and, above all, cooperative. That is why I share all my research work under open-source licenses.
I'm a computer scientist working on Artificial Intelligence systems. I'm interested in Applied Computational Linguistics and Corpus Linguistics. I previously worked on speech technologies (ASR) and natural language processing (NLP).
My research topics are related to multimodal corpora:
Since 2011, all my research has been implemented, tested, documented and freely distributed as a software tool named SPPAS. It is developed on a daily basis with the aim of providing robust and reliable software for the automatic annotation of recordings and the analysis of annotated data. As its primary functionality, SPPAS offers a set of automatic or semi-automatic annotations of recordings. It also offers special features for managing corpora of annotated files; in particular, it includes a tool to filter multi-level annotations. Other tools are dedicated to the analysis of time-aligned data, for example to estimate descriptive statistics.
Past research topics are related to text corpora:
I graduated from Avignon University with a PhD in Computer Science. From 1997 to 2000, I worked with Professor Renato De Mori at LIA, France, on statistical language modelling for automatic speech recognition and information retrieval. I introduced a new, effective model for topic identification.
From 2000 to 2002, I worked with Professor Jean-Paul Haton and Professor Kamel Smaïli at LORIA, Nancy, France. My work focused on topic identification in newspaper articles and e-mails.
From 2002 to 2009, I worked at LIG on statistical language modelling for automatic speech recognition and statistical machine translation.
Since 2009, at LPL (Laboratoire Parole et Langage, Aix-en-Provence, France), my research has focused on corpus creation and the annotation of speech recordings. I favor language-independent approaches to tool and system development, so that the results can be used both for languages with few available data resources and for languages with an unexpected amount of (unnecessary) data.