Reliable fundamental frequency (f0) extraction algorithms are crucial in many fields of speech research. Most studies testing the robustness of these algorithms have focused on healthy speech and/or measurements of sustained vowels. Few studies have tested f0 estimation on pathological speech, and even fewer on continuous speech. The present study evaluated 12 available pitch detection algorithms on a corpus of read speech by 24 speakers (8 healthy speakers, 8 speakers with Parkinson's disease, and 8 with head and neck cancer). Two fusion methods were also tested: one based on the median of the algorithms' outputs and one that combines the best algorithm for voicing detection with the algorithm that produces the most accurate f0 estimates on voiced parts. Our results show that time-domain algorithms, like REAPER, are best for voicing detection, while deep neural network algorithms, like FCN-f0, yield more accurate f0 values on voiced parts. The combination of REAPER and FCN-f0 offers the best trade-off between performance and implementation complexity, since it produces fewer than 4% voicing detection errors and fewer than 5% gross errors in the f0 estimates for all speaker groups.
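The two fusion strategies and the two error measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that every algorithm outputs one f0 value per frame on a shared time grid, that 0.0 marks an unvoiced frame, that median fusion declares a frame voiced only when a strict majority of algorithms agree, and that a gross error is a deviation of more than 20% from the reference (a common convention, not a value taken from the abstract). All function names are hypothetical.

```python
from statistics import median


def median_fusion(tracks):
    """Frame-wise median across several f0 tracks (0.0 = unvoiced).

    Assumption: a frame is voiced only if a strict majority of the
    algorithms report a non-zero f0 for it.
    """
    fused = []
    for frame in zip(*tracks):
        voiced = [f for f in frame if f > 0]
        if 2 * len(voiced) > len(frame):
            fused.append(median(voiced))
        else:
            fused.append(0.0)
    return fused


def voicing_value_fusion(voicing_track, value_track):
    """Keep the voicing decision of one track and the f0 values of another."""
    return [v if d > 0 else 0.0 for d, v in zip(voicing_track, value_track)]


def voicing_error_rate(ref, est):
    """Share of frames whose voiced/unvoiced decision differs from the reference."""
    errors = sum((r > 0) != (e > 0) for r, e in zip(ref, est))
    return errors / len(ref)


def gross_error_rate(ref, est, tol=0.2):
    """Share of frames, voiced in both tracks, that deviate by more than tol.

    Assumption: tol=0.2 (20%) is a conventional threshold, not the paper's.
    """
    both = [(r, e) for r, e in zip(ref, est) if r > 0 and e > 0]
    if not both:
        return 0.0
    gross = sum(abs(e - r) > tol * r for r, e in both)
    return gross / len(both)


# Toy data: five frames, values in Hz (made up for illustration only).
ref = [0.0, 100.0, 110.0, 120.0, 0.0]
algo_a = [0.0, 101.0, 220.0, 119.0, 0.0]  # octave error on the third frame
algo_b = [0.0, 99.0, 111.0, 121.0, 118.0]  # false voicing on the last frame
algo_c = [0.0, 0.0, 109.0, 60.0, 0.0]

fused_median = median_fusion([algo_a, algo_b, algo_c])
fused_pair = voicing_value_fusion(algo_a, algo_b)
```

On this toy data, both fused tracks remove the octave error of `algo_a` and the false voicing of `algo_b`. In the second strategy, a frame that the voicing track marks as voiced but the value track marks as unvoiced would come out as 0.0; a real pipeline would need an explicit rule for that case.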
November 2022
November 29 2022
Performance analysis of various fundamental frequency estimation algorithms in the context of pathological speech
Robin Vaysse 1; Corine Astésano 2,c); Jérôme Farinas 1
1 IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, Toulouse, France
2 Laboratoire de NeuroPsychoLinguistique, Université Toulouse Jean-Jaurès, France
a) Also at: Laboratoire de NeuroPsychoLinguistique, Université Toulouse Jean-Jaurès, France
b) Electronic mail: [email protected]
c) Also at: UMR 5267 Praxiling - Université Paul Valéry Montpellier, France
J. Acoust. Soc. Am. 152, 3091–3101 (2022)
Article history
Received: August 02 2022
Accepted: October 26 2022
Citation
Robin Vaysse, Corine Astésano, Jérôme Farinas; Performance analysis of various fundamental frequency estimation algorithms in the context of pathological speech. J. Acoust. Soc. Am. 1 November 2022; 152 (5): 3091–3101. https://doi.org/10.1121/10.0015143