Recently, cluster analysis of f0 contours has become a popular method in phonetic research. Cluster analysis provides an automated way of categorising f0 contours, which gives new insights into (phonological) categories of intonation that vary across languages. As cluster analysis can be performed in many different ways, it is important to understand the extent to which these analyses can capture human perception of f0. This study focuses on the way in which f0 contours and the differences among them are represented numerically, i.e., a crucial methodological choice preceding cluster analysis. These representations are then compared to the way in which f0 contour differences are perceived by human listeners of two different languages. To this end, four time-series contour representations (equivalent rectangular bandwidth, standardisation, octave-median rescaling, first derivative) and three distance measures [Euclidean distance (L2 norm), Pearson correlation, and dynamic time warping] were tested. The perceived differences were obtained from listeners of German and Papuan Malay, two typologically different languages. Results show that computed contour differences reflect human perception moderately, with dynamic time warping applied to the first derivative of the contour performing best and showing minimal differences between the languages.
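As a rough illustration of the representations and distance measures named above, the following Python sketch computes each of them for two toy contours. It is not the study's implementation: the function names, the ERB-rate formula (21.4 · log10(0.00437 · f + 1)), the definition of octave-median rescaling as log2 of f0 over a median, the use of 1 − r as the correlation distance, and the unconstrained DTW with absolute-difference cost are all assumptions made for this example, and the contours are invented.

```python
import numpy as np


def hz_to_erb(f0_hz):
    """Convert Hz to an ERB-rate scale (assumed formula, see lead-in)."""
    return 21.4 * np.log10(0.00437 * np.asarray(f0_hz, dtype=float) + 1.0)


def standardise(f0):
    """Z-score a contour: subtract its mean, divide by its standard deviation."""
    f0 = np.asarray(f0, dtype=float)
    return (f0 - f0.mean()) / f0.std()


def octave_median(f0_hz, median_hz):
    """Express f0 in octaves relative to a median (assumed definition)."""
    return np.log2(np.asarray(f0_hz, dtype=float) / median_hz)


def first_derivative(f0):
    """Approximate the first derivative by differences of adjacent samples."""
    return np.diff(np.asarray(f0, dtype=float))


def euclidean(a, b):
    """Euclidean distance (L2 norm) between two equal-length contours."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def pearson_distance(a, b):
    """Correlation-based distance, here taken as 1 - r (an assumption)."""
    return 1.0 - float(np.corrcoef(a, b)[0, 1])


def dtw_distance(a, b):
    """Unconstrained dynamic time warping with absolute-difference local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Two invented 20-point contours in Hz: a linear rise and its mirror image.
rise = np.linspace(150.0, 250.0, 20)
fall = rise[::-1]

# The best-performing combination in the abstract: DTW on the first derivative.
d_best = dtw_distance(first_derivative(rise), first_derivative(fall))

# Other pairings of a representation with a distance measure.
d_erb = euclidean(hz_to_erb(rise), hz_to_erb(fall))
d_cor = pearson_distance(standardise(rise), standardise(fall))
```

Each representation can be combined with each distance measure in this way, which is the kind of pairing the abstract describes comparing against listener judgements.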
