Automatic Annotation of Melody

Daniel Hirst
All the software described in this presentation and a number of related publications are downloadable from: https://uk.groups.yahoo.com/neo/groups/praat-users/files/Daniel_Hirst/

Automatic annotation of melody

Command/Response: Hiroya Fujisaki 1984.
Momel: Daniel Hirst & Robert Espesser 1993.
Tilt: Paul Taylor 1998.
Stem-ML: Greg Kochanski and Chilin Shih 2000.
INTSINT: Daniel Hirst et al. 2000 (Hirst 1987).
Prosogram: Piet Mertens 2004.
Penta: Yi Xu 2004.
AuToBI: Andrew Rosenberg 2010.
SLAM: Nicolas Obin et al. 2014.

Momel and INTSINT

Input: sound file
Detect f₀: optimise max and min f₀
Momel: model f₀ as a quadratic spline function defined by a sequence of anchor points (target points)
INTSINT: code target points as {t, m, b, h, s, l, u, d} ({T,M,B… })
Output: TextGrid with (optimised) INTSINT targets aligned with the signal.

Detect f₀

first pass
- detect f₀ with extreme parameters: min = 60, max = 750
- calculate first and third quartiles of f₀: q1, q3
- minf0 = q1 * 0.75
- maxf0 = q3 * 1.5 (or 2 or even 2.5 for expressive speech)
second pass
- detect f₀ with these values of min and max
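The second-pass limits can be sketched as follows. This is a minimal illustration, not the distributed script: the function name `pitch_bounds`, the convention that unvoiced frames are stored as 0, and the quartile interpolation method are all assumptions, and the f₀ detection itself (done in Praat) is taken as given.

```python
from statistics import quantiles


def pitch_bounds(first_pass_f0, ceiling_factor=1.5):
    """Return (minf0, maxf0) for the second pass from first-pass f0 values in Hz.

    Assumption: unvoiced frames are stored as 0 and are ignored.
    ceiling_factor is 1.5 by default (2 or 2.5 for expressive speech).
    """
    voiced = [f for f in first_pass_f0 if f > 0]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(voiced, n=4, method="inclusive")
    return q1 * 0.75, q3 * ceiling_factor


# First-pass values as they might come from a detector run with min = 60, max = 750.
minf0, maxf0 = pitch_bounds([0, 100, 110, 120, 130, 140, 0])
print(minf0, maxf0)
```

The two returned values would then be passed as the floor and ceiling of the second detection pass.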

Problem for Modelling f₀

f₀ is often discontinuous and not smooth
In this extract the beginning and end are continuous and smooth

Problem for Modelling f₀

"More news about the reverend Sun Myung M(oon)"

The discontinuities and irregularities in this extract are due to the micromelodic effect of the non-sonorant consonants.

Macromelody and micromelody

Statement intonation:

Macromelody and micromelody

Question intonation:

Momel's approach to the problem (1)

Don't stylise the f₀ but factor the curve into two components:

Macromelodic component
Micromelodic component

The macromelodic component of "More news about the reverend Sun Myung M(oon)"

Momel's approach to the problem (2)

Micromelodic component = each value of raw f₀ divided by the corresponding value of the macromelodic component.
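As a minimal sketch of this division, assuming both curves are sampled at the same frames and that unvoiced frames are stored as 0 (the function name `micromelodic` and the use of `None` for unvoiced frames are illustrative choices, not part of Momel):

```python
def micromelodic(raw_f0, macro_f0):
    """Frame-by-frame ratio raw / macro; None where either value is missing."""
    return [
        raw / macro if raw > 0 and macro > 0 else None
        for raw, macro in zip(raw_f0, macro_f0)
    ]


# A ratio of 1.0 means the raw value lies on the macromelodic curve;
# a ratio below 1.0 means the raw value has been lowered.
ratios = micromelodic([100.0, 0.0, 90.0], [100.0, 110.0, 100.0])
print(ratios)
```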

The micromelodic component of "More news about the reverend Sun Myung M(oon)"

Momel - assumptions

raw f₀ can be modelled as the product of 2 components: macromelodic and micromelodic
the macromelodic component can be modelled as a quadratic spline function:
a sequence of monotonic quadratic transitions between anchor-points where the first derivative of the curve is 0.
micromelodic variations are generally a lowering of the macromelodic curve
for one or two periods following a silence, the f₀ may be raised above the macromelodic curve

A quadratic spline transition between two anchor points
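One way to compute such a transition is with two parabolas that meet halfway between the anchor points. The sketch below assumes that the join lies at the temporal midpoint; the function name `quadratic_transition` and its argument names are hypothetical.

```python
def quadratic_transition(t, t1, h1, t2, h2):
    """f0 at time t on a monotonic transition from anchor (t1, h1) to (t2, h2).

    The slope is 0 at both anchors. Assumption: the two parabolas join at
    the midpoint between the anchors, where value and slope are continuous.
    """
    if not t1 <= t <= t2:
        raise ValueError("t must lie between the two anchor points")
    x = (t - t1) / (t2 - t1)  # normalised position, from 0 to 1
    if x <= 0.5:
        # First half: parabola with its vertex at the first anchor.
        return h1 + 2.0 * (h2 - h1) * x * x
    # Second half: parabola with its vertex at the second anchor.
    return h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2


# Sample a rise from 100 Hz at 0 s to 200 Hz at 1 s.
curve = [quadratic_transition(i / 4, 0.0, 100.0, 1.0, 200.0) for i in range(5)]
print(curve)
```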

The Momel Algorithm

eliminate the first two values of f₀ after a silence
fit a quadratic spline using an asymmetric form of robust regression so that:
- there are no values of f₀ above the spline curve
- the curve is less than ∆ from as many values of the raw f₀ as possible
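The asymmetric idea can be illustrated on a single window with a single parabola. The sketch below is a simplification, not the Momel implementation: it repeatedly fits a least-squares parabola and discards points lying more than ∆ below it, so that lowered (micromelodic) values stop pulling the curve down. It does not enforce the condition that no value lies above the curve, and the names `fit_parabola` and `asymmetric_fit` are hypothetical.

```python
def fit_parabola(points):
    """Least-squares fit of y = a*x^2 + b*x + c, solved with Cramer's rule."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    r = [sum(y * x ** k for x, y in points) for k in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)  # (a, b, c)


def asymmetric_fit(points, delta):
    """Refit until no remaining point lies more than delta below the curve."""
    kept = list(points)
    while True:
        a, b, c = fit_parabola(kept)
        survivors = [(x, y) for x, y in kept
                     if (a * x * x + b * x + c) - y <= delta]
        if len(survivors) == len(kept) or len(survivors) < 3:
            return (a, b, c), kept
        kept = survivors


# A smooth contour with two values lowered by 30 Hz, as a consonant might do.
pts = [(x, 100 + 2 * x - 0.1 * x * x) for x in range(11)]
pts[3] = (3, pts[3][1] - 30)
pts[7] = (7, pts[7][1] - 30)
coeffs, kept = asymmetric_fit(pts, delta=5.0)
print(coeffs, len(kept))
```

In this example the two lowered values are discarded on the first iteration and the second fit follows the remaining points.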

The output of Momel for "More news about the reverend Sun Myung M(oon)"

INTSINT: An INternational Transcription System for Intonation

References:
- Hirst 1987,
- Hirst & Di Cristo 1998,
- Hirst, Di Cristo & Espesser 2000,
- Hirst 2007
Originally designed as a tool for linguists to describe intonation patterns in a language-independent way.
Based on an inventory of minimal pitch contrasts found in published descriptions of intonation patterns.

INTSINT (2)

Describes an intonation contour as a sequence of discrete "tones":

Absolute tones t(op) m(id) b(ottom):
- These are assumed to refer to the corresponding position of the speaker’s current pitch range.
Relative tones h(igher) s(ame) l(ower):
- Unlike absolute tones, relative tones are assumed to be defined with respect to the preceding tonal segment.
Iterative relative tones u(pstepped) d(ownstepped):
- These are also defined relative to the preceding tonal segment but generally involve smaller pitch changes and often occur in a sequence of steps either upwards or downwards.

From INTSINT to Momel

The pitch range of an utterance is defined by two parameters: key (in Hz) and span (in octaves).

m is equal to the value of key
t and b are the limits of the pitch range of span octaves, centred on the value of key
h and l are defined with respect to the preceding target p, as the (geometric) mean of p and t, and of p and b, respectively
u and d are defined as the mean of p and the value of h or l (as defined above) respectively
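The definitions above can be sketched as a small function. Several points are assumptions rather than statements from the slides: the mean used for u and d is taken to be geometric, like that for h and l; s simply repeats the preceding target; and a relative tone with no preceding target is measured from key. The function name `intsint_targets` is hypothetical.

```python
import math


def intsint_targets(labels, key, span):
    """Map a sequence of INTSINT labels to target values in Hz."""
    top = key * 2 ** (span / 2)     # t: upper limit of the range
    bottom = key / 2 ** (span / 2)  # b: lower limit of the range
    targets = []
    p = key  # assumption: initial reference for relative tones
    for label in labels:
        if label == "t":
            value = top
        elif label == "m":
            value = key
        elif label == "b":
            value = bottom
        elif label == "h":
            value = math.sqrt(p * top)
        elif label == "l":
            value = math.sqrt(p * bottom)
        elif label == "s":
            value = p
        elif label == "u":
            value = math.sqrt(p * math.sqrt(p * top))
        elif label == "d":
            value = math.sqrt(p * math.sqrt(p * bottom))
        else:
            raise ValueError(f"unknown INTSINT label: {label!r}")
        targets.append(value)
        p = value
    return targets


# key = 200 Hz and span = 2 octaves give a range from 100 Hz to 400 Hz.
print(intsint_targets("mtbh", key=200, span=2.0))
```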

Graphic illustration of the mapping from INTSINT to Momel defined by 2 parameters key and span

From Momel to INTSINT

In the current implementation, every possible INTSINT coding is tested with values:

key from mean − 50 Hz to mean + 50 Hz with a step of 1 Hz
span from 0.5 to 2.5 octaves with a step of 0.1 octave

The optimal values of key and span and the optimal coding with INTSINT are then used to generate anchor points
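A reduced version of this search can be sketched as follows. It is not the distributed implementation: instead of testing every possible coding, it picks for each target the label whose predicted value is closest, which is a greedy shortcut. Measuring the error in squared octaves, rounding the mean to a whole number of Hz, and the function names are also assumptions.

```python
import math


def candidates(p, key, span):
    """Predicted value in Hz of each INTSINT label after preceding target p."""
    top = key * 2 ** (span / 2)
    bottom = key / 2 ** (span / 2)
    higher = math.sqrt(p * top)
    lower = math.sqrt(p * bottom)
    return {"t": top, "m": key, "b": bottom, "h": higher, "s": p,
            "l": lower, "u": math.sqrt(p * higher), "d": math.sqrt(p * lower)}


def code_targets(targets, key, span):
    """Greedy coding: choose the closest label for each target in turn."""
    labels, predicted, error = [], [], 0.0
    p = key  # assumption: initial reference for relative tones
    for f0 in targets:
        cands = candidates(p, key, span)
        label = min(cands, key=lambda k: abs(math.log2(cands[k] / f0)))
        p = cands[label]
        labels.append(label)
        predicted.append(p)
        error += math.log2(p / f0) ** 2  # squared distance in octaves
    return labels, predicted, error


def best_coding(targets):
    """Grid search over key (mean ± 50 Hz, 1 Hz steps) and span (0.5 to 2.5)."""
    centre = round(sum(targets) / len(targets))
    best = None
    for key in range(max(1, centre - 50), centre + 51):
        for tenths in range(5, 26):
            span = tenths / 10
            labels, predicted, error = code_targets(targets, key, span)
            if best is None or error < best[0]:
                best = (error, key, span, labels, predicted)
    return best


# Three Momel-style targets in Hz.
error, key, span, labels, predicted = best_coding(
    [200.0, 200 * 2 ** 0.5, 200 / 2 ** 0.5])
print(key, span, labels, predicted)
```

The `predicted` values play the role of the anchor points regenerated from the INTSINT labels.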

Output of the Momel and INTSINT algorithms

Output from Momel and INTSINT

Comparison of original anchor points detected by Momel and anchors generated from the points coded with INTSINT labels

These can then be visualised with Praat (via a TextGrid) or with ProZed.

Visualise with ProZed

Summary

Introduction
Selection of annotation software
Corpus development methodology
Momel and INTSINT
SPPAS
Time Group Analyzer
Conclusion and references
