Momel: model f0 as a quadratic spline function defined by a sequence of anchor points (target points)
INTSINT: code target points as {t, m, b, h, s, l, u, d} ({T,M,B… })
Output: TextGrid with (optimised) INTSINT targets aligned with the signal.
Detect f0
first pass
detect f0 with extreme parameters: min = 60 Hz, max = 750 Hz
calculate first and third quartiles of f0: q1, q3
minf0 = q1 * 0.75
maxf0 = q3 * 1.5 (or 2 or even 2.5 for expressive speech)
second pass
detect f0 with these values of min and max
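The quartile step between the two passes can be sketched as follows. This is a minimal illustration and not the original script: the function name `pitch_floor_ceiling`, the convention that unvoiced frames are coded as 0, and the choice of the "inclusive" quartile method are all assumptions. The f0 values themselves are taken to come from a first-pass pitch detector run with min = 60 and max = 750.

```python
import statistics

def pitch_floor_ceiling(f0_values, ceiling_factor=1.5):
    """Return (minf0, maxf0) for the second pass from first-pass f0 values.

    Assumption: unvoiced frames are stored as 0 and are ignored here.
    ceiling_factor is 1.5 by default; 2 or 2.5 may suit expressive speech.
    """
    voiced = [f for f in f0_values if f > 0]
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(voiced, n=4, method="inclusive")
    return q1 * 0.75, q3 * ceiling_factor

# Hypothetical first-pass values (Hz), with two unvoiced frames
floor, ceiling = pitch_floor_ceiling([0, 100, 110, 120, 130, 140, 0])
```

The resulting pair would then be passed as min and max to the second detection pass.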
Problem for Modelling f0
f0 is often discontinuous and not smooth
In this extract the beginning and end are continuous and smooth
Problem for Modelling f0
"More news about the reverend Sun Myung M(oon)"
the discontinuities and irregularities in this extract are due to the micromelodic effect of the non-sonorant consonants
Macromelody and micromelody
Statement intonation:
Macromelody and micromelody
Question intonation:
Momel's approach to the problem (1)
Don't stylise the f0 but factor the curve into two components:
Macromelodic component
Micromelodic component
Momel's approach to the problem (2)
Micromelodic component = each value of raw f0 divided by the corresponding value of the macromelodic component.
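As a small illustration of this factoring, the sketch below divides raw f0 by a macromelodic curve sampled at the same frames. The function name `micromelodic` and the use of 0 for unvoiced frames are assumptions made for the example.

```python
def micromelodic(raw_f0, macro_f0):
    """Ratio raw / macro for each frame; None where the frame is unvoiced.

    Assumption: both sequences are sampled at the same times and
    unvoiced raw frames are stored as 0.
    """
    return [
        raw / macro if raw > 0 and macro > 0 else None
        for raw, macro in zip(raw_f0, macro_f0)
    ]

# A ratio below 1 means the raw value lies under the macromelodic curve
ratios = micromelodic([190, 0, 200], [200, 200, 200])
```

Multiplying each ratio by the macromelodic value gives back the raw f0, which is what modelling f0 as a product of the two components means.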
Momel - assumptions
raw f0 can be modelled as the product of 2 components: macromelodic and micromelodic
the macromelodic component can be modelled as a quadratic spline function: a sequence of monotonic quadratic transitions between anchor-points where the first derivative of the curve is 0.
micromelodic variations are generally a lowering of the macromelodic curve
for one or two periods following a silence, the f0 may be raised above the macromelodic curve
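The quadratic spline assumption can be made concrete with a short sketch. Each transition between two anchor points is built from two parabolas that meet halfway between the anchors, so the curve is monotonic and its slope is 0 at both anchors. Placing the join at the midpoint and holding the curve flat outside the anchors are assumptions of this example, as are the function names.

```python
def transition(t, t1, h1, t2, h2):
    """Value at time t of a monotonic quadratic transition from (t1, h1) to (t2, h2)."""
    x = (t - t1) / (t2 - t1)          # normalised time, 0 at t1 and 1 at t2
    if x <= 0.5:
        # first parabola: slope 0 at the first anchor
        return h1 + (h2 - h1) * 2 * x * x
    # second parabola: slope 0 at the second anchor
    return h2 - (h2 - h1) * 2 * (1 - x) ** 2

def macromelody(t, anchors):
    """Evaluate the spline defined by time-sorted (time, Hz) anchor points."""
    if t <= anchors[0][0]:
        return anchors[0][1]          # assumption: flat before the first anchor
    if t >= anchors[-1][0]:
        return anchors[-1][1]         # assumption: flat after the last anchor
    for (t1, h1), (t2, h2) in zip(anchors, anchors[1:]):
        if t1 <= t <= t2:
            return transition(t, t1, h1, t2, h2)

# Hypothetical anchors: a rise from 100 Hz to 200 Hz, then a fall to 150 Hz
anchors = [(0.0, 100.0), (1.0, 200.0), (2.0, 150.0)]
curve = [macromelody(i / 10, anchors) for i in range(21)]
```

Both parabolas give the same value and the same slope at the midpoint, so the curve and its first derivative are continuous there.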
The Momel Algorithm
eliminate the first two values of f0 after a silence
fit a quadratic spline using an asymmetric form of robust regression so that:
there are no values of f0 above the spline curve
the curve is less than ∆ from as many values of the raw f0 as possible
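The asymmetric idea can be illustrated on a single parabola. The sketch below is not the Momel implementation: it fits one quadratic by ordinary least squares, discards any point lying more than a proportion `delta` below the fitted curve, and refits until no point is discarded. Treating ∆ as a relative threshold of 5%, fitting a single parabola rather than a full spline, and the function names are all assumptions. It also does not enforce the condition that no value lies above the curve.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t**2 using the normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]                  # power sums of t
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]  # moments of y
    m = [[s[0], s[1], s[2], r[0]],
         [s[1], s[2], s[3], r[1]],
         [s[2], s[3], s[4], r[2]]]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def asymmetric_fit(ts, ys, delta=0.05):
    """Refit after dropping points more than delta (relative) below the curve."""
    points = list(zip(ts, ys))
    while len(points) >= 3:
        a, b, c = fit_quadratic(*zip(*points))
        kept = [(t, y) for t, y in points
                if y >= (a + b * t + c * t * t) * (1 - delta)]
        if len(kept) == len(points):
            return (a, b, c), points
        points = kept
    raise ValueError("too few points left to fit a parabola")

# Hypothetical data: a smooth parabola with two micromelodic dips
times = list(range(11))
f0 = [120 + 10 * t - t * t for t in times]
f0[3] = f0[7] = 100
coefficients, retained = asymmetric_fit(times, f0)
```

Only points below the curve can be rejected, so the fit moves up towards the upper envelope of the raw f0. This matches the assumption that micromelodic effects mostly lower the curve.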
INTSINT: An INternational Transcription System for Intonation
References:
Hirst 1987,
Hirst & Di Cristo 1998,
Hirst, Di Cristo & Espesser 2000,
Hirst 2007
Originally designed as a tool for linguists to describe intonation patterns in a language-independent way.
Based on an inventory of minimal pitch contrasts found in published descriptions of intonation patterns.
INTSINT (2)
Describes an intonation contour as a sequence of discrete "tones":
Absolute tones t(op) m(id) b(ottom):
These are assumed to refer to the corresponding position of the speaker’s current pitch range.
Relative tones h(igher) s(ame) l(ower):
Unlike absolute tones, relative tones are assumed to be defined with respect to the preceding tonal segment.
Iterative tones u(pstepped) d(ownstepped):
These are also defined relative to the preceding tonal segment but generally involve smaller pitch changes and often occur in a sequence of steps either upwards or downwards.
From INTSINT to Momel
The pitch range of an utterance is defined by two parameters: key (in Hz) and span (in octaves)
m is equal to the value of key
t and b are the limits of the pitch range of span octaves, centred on the value of key
h and l are defined with respect to the preceding target p, as the (geometric) mean of p and the value of t or b respectively
u and d are defined as the mean of p and the value of h or l (as defined above) respectively
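These definitions can be turned into a short sketch that converts a tone sequence into target values in Hz. Several points are assumptions: the range is taken to be centred on key on a logarithmic scale, the mean used for u and d is taken to be geometric like the one for h and l, s simply repeats the preceding target, and the sequence is required to start with an absolute tone. The function name `intsint_to_hz` is hypothetical.

```python
import math

def intsint_to_hz(tones, key, span):
    """Convert a sequence of INTSINT tones to target values in Hz.

    key is in Hz and span in octaves. Assumption: t and b lie span/2
    octaves above and below key.
    """
    top = key * 2 ** (span / 2)
    bottom = key / 2 ** (span / 2)
    values = []
    prev = None
    for tone in tones:
        if tone in "hslud" and prev is None:
            raise ValueError("a relative tone needs a preceding target")
        if tone == "t":
            value = top
        elif tone == "m":
            value = key
        elif tone == "b":
            value = bottom
        elif tone == "s":
            value = prev
        elif tone == "h":
            value = math.sqrt(prev * top)
        elif tone == "l":
            value = math.sqrt(prev * bottom)
        elif tone == "u":
            value = math.sqrt(prev * math.sqrt(prev * top))
        elif tone == "d":
            value = math.sqrt(prev * math.sqrt(prev * bottom))
        else:
            raise ValueError(f"unknown tone: {tone!r}")
        values.append(value)
        prev = value
    return values

# Hypothetical coding with key = 200 Hz and span = 1 octave
targets = intsint_to_hz("mtlbh", key=200.0, span=1.0)
```

With these settings, l after t and h after b both come back to key, because the geometric mean of t and b is key itself.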
From Momel to INTSINT
In the current implementation, every possible INTSINT coding is tested with values:
key from mean -50 to mean +50 with a step of 1 Hz
span from 0.5 to 2.5 octaves with a step of 0.1 octave
The optimal values of key and span and the optimal coding with INTSINT are then used to generate anchor points
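The search over key and span can be sketched as a grid search. To keep the example short, it replaces the exhaustive test of every possible coding with a greedy one: for each target it picks the tone whose predicted value is closest, given the previously predicted target. This greedy coding, the squared error in octaves, the arithmetic mean of the targets as the centre of the key grid, and the restriction of the first tone to t, m or b are all assumptions rather than features of the current implementation.

```python
import math

TONES = "tmbhslud"

def tone_value(tone, prev, key_hz, span):
    """Predicted Hz value of one tone, given the preceding target prev."""
    top = key_hz * 2 ** (span / 2)
    bottom = key_hz / 2 ** (span / 2)
    if tone == "t":
        return top
    if tone == "m":
        return key_hz
    if tone == "b":
        return bottom
    if tone == "s":
        return prev
    if tone == "h":
        return math.sqrt(prev * top)
    if tone == "l":
        return math.sqrt(prev * bottom)
    if tone == "u":
        return math.sqrt(prev * math.sqrt(prev * top))
    return math.sqrt(prev * math.sqrt(prev * bottom))  # tone == "d"

def greedy_coding(targets, key_hz, span):
    """Code targets tone by tone; return (tones, summed squared error in octaves)."""
    tones, error, prev = [], 0.0, None
    for hz in targets:
        candidates = "tmb" if prev is None else TONES
        best = min(candidates,
                   key=lambda c: abs(math.log2(tone_value(c, prev, key_hz, span) / hz)))
        prev = tone_value(best, prev, key_hz, span)
        error += math.log2(prev / hz) ** 2
        tones.append(best)
    return "".join(tones), error

def optimise(targets):
    """Grid search: key = mean - 50 .. mean + 50 Hz, span = 0.5 .. 2.5 octaves."""
    mean = sum(targets) / len(targets)
    best = None
    for offset in range(-50, 51):          # step of 1 Hz
        key_hz = mean + offset
        if key_hz <= 0:
            continue
        for tenth in range(5, 26):         # step of 0.1 octave
            span = tenth / 10
            tones, error = greedy_coding(targets, key_hz, span)
            if best is None or error < best[0]:
                best = (error, key_hz, span, tones)
    return best

# Hypothetical Momel targets (Hz)
momel_targets = [200.0, 200.0 * 2 ** 0.5, 200.0, 200.0 / 2 ** 0.5, 200.0]
error, key_hz, span, tones = optimise(momel_targets)
```

The grid has 101 key values and 21 span values, so even this naive loop evaluates only 2121 settings per utterance.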
Output of the Momel and INTSINT algorithms
Output from Momel and INTSINT
These can then be visualised with Praat (via a TextGrid) or with ProZed.