Existing pitch estimation techniques often rest on different and complementary principles. In this work, a frame selective dynamic programming (FSDP) method is proposed that exploits the complementary characteristics of two existing methods, namely, the subharmonic-to-harmonic ratio (SHR) and the sawtooth waveform inspired pitch estimator (SWIPE). Using variants of SHR and SWIPE, the proposed FSDP method classifies all voiced frames into two classes: for frames in the first class, pitch is estimated by maximizing a confidence score, while for frames in the second class, a dynamic programming (DP) based approach is proposed. Experiments are performed separately on speech signals from the KEELE, CSLU, and Paul Bagshaw corpora, under clean conditions and with additive white Gaussian noise at 20, 10, 5, and 0 dB SNR, and FSDP is compared with four baseline schemes: SHR, SWIPE, and two DP-based techniques. Averaged over all SNRs, the pitch estimation performance of FSDP is better than that of the baseline schemes, suggesting the benefit of applying a DP smoothness constraint only in selected frames. The voiced/unvoiced (VuV) classification error of FSDP is also lower than that of all four baseline schemes in almost all SNR conditions on all three corpora.
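The abstract does not state how frames are assigned to the two classes or what cost the DP minimizes, so the following is only a minimal Python sketch of the frame-selective idea under assumed rules. Each voiced frame is assumed to carry one pitch candidate per estimator (for example, an SHR-like and a SWIPE-like estimate) with a confidence in [0, 1]. Frames whose candidates agree within a relative tolerance `agree_tol` are treated as class 1 and keep their maximum-confidence candidate; the remaining frames (class 2) are resolved by a Viterbi-style DP that trades confidence against octave jumps weighted by `jump_weight`. The function `fsdp_track`, its parameters, the agreement rule, and the cost function are all hypothetical illustrations, not the FSDP method itself.

```python
import numpy as np


def fsdp_track(cands, confs, agree_tol=0.1, jump_weight=1.0):
    """Hypothetical frame-selective DP pitch tracker (illustrative sketch only).

    cands: (T, K) pitch candidates in Hz for T voiced frames, one column per
           estimator (e.g. an SHR-like and a SWIPE-like estimate); must be > 0.
    confs: (T, K) confidence scores in [0, 1] for those candidates.
    Returns the selected pitch per frame and a boolean mask of class-1 frames.
    """
    cands = np.asarray(cands, dtype=float)
    confs = np.asarray(confs, dtype=float)
    T, K = cands.shape

    # Assumed selection rule: a frame is "class 1" when all candidates agree
    # within a relative tolerance; otherwise it is left to the DP ("class 2").
    spread = cands.max(axis=1) / cands.min(axis=1) - 1.0
    class1 = spread <= agree_tol

    # Local cost: low confidence is expensive.
    local = 1.0 - confs
    # Class-1 frames keep only their max-confidence candidate (confidence
    # maximization); the other candidates get infinite cost, so these frames
    # act as fixed anchors for the DP.
    best = confs.argmax(axis=1)
    for t in np.flatnonzero(class1):
        pinned = np.full(K, np.inf)
        pinned[best[t]] = local[t, best[t]]
        local[t] = pinned

    # Viterbi-style DP; the transition cost penalizes pitch jumps in octaves.
    logf = np.log2(cands)
    acc = local[0].copy()               # accumulated cost per candidate
    back = np.zeros((T, K), dtype=int)  # backpointers
    for t in range(1, T):
        # trans[i, j]: cost of moving from candidate i at t-1 to j at t
        trans = jump_weight * np.abs(logf[t - 1][:, None] - logf[t][None, :])
        total = acc[:, None] + trans
        back[t] = total.argmin(axis=0)
        acc = total.min(axis=0) + local[t]

    # Backtrack the minimum-cost path.
    path = np.empty(T, dtype=int)
    path[-1] = int(acc.argmin())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return cands[np.arange(T), path], class1


# Toy example with made-up candidates: columns are [estimator A, estimator B].
cands = [[200, 202], [100, 205], [210, 208], [104, 212], [215, 214]]
confs = [[0.9, 0.8], [0.7, 0.6], [0.8, 0.9], [0.9, 0.5], [0.95, 0.9]]
f0, class1 = fsdp_track(cands, confs)
print(f0.tolist())      # [200.0, 205.0, 208.0, 212.0, 215.0]
print(class1.tolist())  # [True, False, True, False, True]
```

In this toy input, maximizing confidence in every frame would pick 100 Hz and 104 Hz in frames 1 and 3 (counting from 0), i.e., octave drops. Because those two frames fail the agreement test, the DP resolves them instead and prefers 205 Hz and 212 Hz, since the one-octave jump penalty outweighs the small confidence gain. Pinning the agreeing frames keeps the DP confined to the ambiguous frames, which mirrors the abstract's point of applying the smoothness constraint only where it is needed.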
