Reliable fundamental frequency (f0) extraction algorithms are crucial in many fields of speech research. Most studies testing the robustness of these algorithms have focused on healthy speech and/or sustained vowels. Few have tested f0 estimation on pathological speech, and even fewer on continuous speech. The present study evaluated 12 available pitch detection algorithms on a corpus of read speech by 24 speakers (8 healthy speakers, 8 speakers with Parkinson's disease, and 8 with head and neck cancer). Two fusion methods were also tested: one based on the median of the algorithms' outputs, and one that combines the best algorithm for voicing detection with the algorithm that produces the most accurate f0 estimates on voiced parts. Our results show that time-domain algorithms, like REAPER, are best for voicing detection, while deep neural network algorithms, like FCN-f0, yield more accurate f0 values on voiced parts. The combination of REAPER and FCN-f0 offers the best trade-off between performance and implementation complexity: it produces less than 4% voicing detection errors and less than 5% gross errors in the f0 estimates for all speaker groups.
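The two fusion strategies and the two error measures mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the convention that 0 Hz marks an unvoiced frame, the majority rule for voicing in the median fusion, and the 20% gross-error tolerance are all assumptions chosen for illustration, and the tracks are toy data rather than REAPER or FCN-f0 output.

```python
from statistics import median

UNVOICED = 0.0  # assumed convention: 0 Hz marks an unvoiced frame


def median_fusion(tracks):
    """Per frame, return the median of the voiced f0 estimates.

    A frame counts as voiced only if more than half of the trackers
    report it as voiced (an assumption of this sketch).
    """
    fused = []
    for frame in zip(*tracks):
        voiced = [f for f in frame if f > UNVOICED]
        if 2 * len(voiced) > len(frame):
            fused.append(median(voiced))
        else:
            fused.append(UNVOICED)
    return fused


def voicing_plus_f0_fusion(voicing_track, f0_track):
    """Take the voicing decision from one tracker and the f0 value from another."""
    return [f0 if v > UNVOICED else UNVOICED
            for v, f0 in zip(voicing_track, f0_track)]


def voicing_error_rate(reference, estimate):
    """Fraction of frames whose voiced/unvoiced label differs from the reference."""
    errors = sum((r > UNVOICED) != (e > UNVOICED)
                 for r, e in zip(reference, estimate))
    return errors / len(reference)


def gross_error_rate(reference, estimate, tolerance=0.2):
    """Fraction of frames voiced in both tracks whose f0 is off by more than
    `tolerance` (relative to the reference). The 20% default is an assumption."""
    both = [(r, e) for r, e in zip(reference, estimate)
            if r > UNVOICED and e > UNVOICED]
    if not both:
        return 0.0
    gross = sum(abs(e - r) > tolerance * r for r, e in both)
    return gross / len(both)


# Toy frame-aligned tracks in Hz (hypothetical values).
reference = [0.0, 100.0, 100.0, 0.0]
tracker_a = [0.0, 100.0, 102.0, 0.0]    # good voicing decisions
tracker_b = [0.0, 101.0, 204.0, 110.0]  # octave error and a spurious voiced frame
tracker_c = [0.0, 99.0, 103.0, 0.0]

print(median_fusion([tracker_a, tracker_b, tracker_c]))  # [0.0, 100.0, 103.0, 0.0]
combined = voicing_plus_f0_fusion(tracker_a, tracker_b)
print(combined)                                          # [0.0, 101.0, 204.0, 0.0]
print(voicing_error_rate(reference, tracker_b))          # 0.25
print(gross_error_rate(reference, combined))             # 0.5
```

In this toy case the median suppresses the octave error of one tracker, whereas the second fusion inherits whatever f0 errors the chosen estimator makes on frames the voicing detector marks as voiced.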
