Introduction

About the author

Brigitte Bigi - Author of SPPAS

Corpus and annotation

Corpus annotation “can be defined as the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data. ‘Annotation’ can also refer to the end-product of this process” (Leech, 1997).

Annotations

Example of multi-level annotations

Annotation software

Before using any automatic annotation tool or software, consider its error rate (where applicable) and estimate how those errors will affect the intended use of the annotated corpus.
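One simple way to put a number on this is to compare the automatic output with a small, manually checked sample. The sketch below is only an illustration: the function name and the sample labels are hypothetical, and it is not part of SPPAS.

```python
def label_error_rate(reference, automatic):
    """Fraction of positions where the automatic label differs from the reference."""
    if len(reference) != len(automatic):
        raise ValueError("both sequences must have the same length")
    if not reference:
        raise ValueError("cannot compute an error rate on an empty sample")
    errors = sum(1 for ref, auto in zip(reference, automatic) if ref != auto)
    return errors / len(reference)


# Hypothetical hand-checked sample: five labels, one disagreement.
manual = ["d", "O~", "k", "i", "l"]
auto = ["d", "O~", "g", "i", "l"]
rate = label_error_rate(manual, auto)
print(f"error rate: {rate:.0%}")
```

A rate measured this way only describes the checked sample; whether it is acceptable depends on what the annotated corpus will be used for.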

SPPAS is an award-winning research software

Multi-Lingual approaches to the automatic annotation of speech

Resources extend

Resources extend (continued)

SPPAS: Main reference to cite

		Brigitte Bigi (2015).
		SPPAS - Multi-lingual Approaches to the Automatic Annotation of Speech.
		In "The Phonetician" - International Society of Phonetic Sciences,
		ISSN 0741-6164, Number 111-112 / 2015-I-II, pages 54-69.
	
Screenshot of the SPPAS paper

You reached the end of this tutorial!


Corpus creation methodology

The context

Screenshots of four corpora (left to right): CID, GrenelleII, Aix MapTask, DVD

The corpus creation workflow

Corpus creation and annotation: methodology

Step 1: Recording speech

Step 2: Search for Inter-Pausal Units

Step 3: Orthographic Transcription

Step 4: Speech segmentation

Other steps:

Interested in knowing more?

You reached the end of this tutorial!


Data preparation for automatic annotations

(Step 1) Recording Speech

Recording audio

Recording audio: software tools

Audacity
SoX

Recording speech: SPPAS requirements

Example of recorded speech

(Step 2) Inter-Pausal Units

IPUs = sounding segments
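To make the idea of "sounding segments" concrete, here is a minimal sketch of one common approach: an energy threshold on short windows, followed by merging segments separated by very short silences. It is an assumption-laden illustration and not SPPAS's own algorithm; the function name, the window size, the threshold and the minimum durations are all hypothetical values to tune on real recordings.

```python
import math


def find_ipus(samples, rate, win_dur=0.02, threshold=0.05,
              min_sil=0.2, min_ipu=0.1):
    """Return (start, end) times in seconds of sounding segments.

    samples: sequence of floats in [-1, 1]; rate: samples per second.
    """
    win = max(1, int(win_dur * rate))
    # 1. Decide for each window whether it is sounding (RMS above threshold).
    sounding = []
    for i in range(0, len(samples), win):
        frame = samples[i:i + win]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        sounding.append(rms >= threshold)
    # 2. Group consecutive sounding windows into segments.
    segments, start = [], None
    for idx, is_sound in enumerate(sounding):
        if is_sound and start is None:
            start = idx
        elif not is_sound and start is not None:
            segments.append([start * win / rate, idx * win / rate])
            start = None
    if start is not None:
        segments.append([start * win / rate, len(samples) / rate])
    # 3. Merge segments separated by a silence shorter than min_sil.
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_sil:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    # 4. Drop segments too short to be an IPU.
    return [(s, e) for s, e in merged if e - s >= min_ipu]


# Synthetic test signal: 0.5 s silence, 1 s of a 100 Hz tone, 0.5 s silence.
RATE = 1000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / RATE) for n in range(1000)]
signal = [0.0] * 500 + tone + [0.0] * 500
ipus = find_ipus(signal, RATE)
print(ipus)
```

Real recordings contain background noise, so a fixed threshold rarely works across files; the boundaries found automatically should be checked manually before transcribing.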

Orthographic transcription into the IPUs

How to do it?

(Step 3) Transcribing Speech

Orthographic transcription:

Orthographic transcription: SPPAS convention

Transcription example 1 (Conversational speech)

donc + i- i(l) prend la è- recette et tout bon i(l) vé- i(l) dit bon [okay, k]

Transcription example 2 (Conversational speech)

ah mais justement c’était pour vous vendre bla bla bla bl(a) le mec i(l) te l’a emboucané + en plus i(l) lu(i) a [acheté,acheuté] le truc et le mec il est parti j(e) dis putain le mec i(l) voulait

Transcription example 3 (GrenelleII)

euh les apiculteurs + et notamment b- on ne sait pas très bien + quelle est la cause de mortalité des abeilles m(ais) enfin il y a quand même + euh peut-êt(r)e des attaques systémiques
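The three examples above pack two readings into one line: a standard orthographic form and a form closer to what was actually pronounced. The sketch below shows how both could be derived, under the following assumed reading of the notation (inferred from the examples, not an official specification): `+` is a short pause, a token ending in `-` is a truncated word, letters in parentheses were not pronounced, and `[a,b]` means the word `a` was pronounced like `b`. The function name and regular expressions are hypothetical and are not SPPAS code.

```python
import re

TOKEN = re.compile(r"\[[^\]]*\]|\S+")                 # keep "[a, b]" as one token
SPECIAL = re.compile(r"\[([^,\]]+),\s*([^\]]+)\]")    # [standard, pronounced]
ELISION = re.compile(r"\(([^)]*)\)")                  # letters not pronounced


def derive(transcription):
    """Return (standard, pronounced) strings from an enriched transcription."""
    standard, pronounced = [], []
    for token in TOKEN.findall(transcription):
        special = SPECIAL.fullmatch(token)
        if token == "+":
            # Short pause: kept only in the pronounced form.
            pronounced.append("+")
        elif special:
            standard.append(special.group(1).strip())
            pronounced.append(special.group(2).strip())
        elif token.endswith("-"):
            # Truncated word: pronounced, but not a word of the standard text.
            pronounced.append(ELISION.sub("", token[:-1]))
        else:
            standard.append(ELISION.sub(r"\1", token))
            pronounced.append(ELISION.sub("", token))
    return " ".join(standard), " ".join(pronounced)


example = ("donc + i- i(l) prend la è- recette et tout bon "
           "i(l) vé- i(l) dit bon [okay, k]")
std, pron = derive(example)
print(std)
print(pron)
```

Keeping both forms in a single transcription is a design choice: the standard form serves text-based analyses, while the pronounced form gives automatic phonetization and alignment a better starting point.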

Annotated files: recommendations

Supported file formats

SPPAS formats

Always remember…

Garbage in, garbage out

You reached the end of this tutorial!
