A sawtooth waveform inspired pitch estimator (SWIPE) has been developed for speech and music. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. The spectra are compared by computing a normalized inner product between the spectrum of the signal and a modified cosine. The size of the analysis window is chosen so that the width of the main lobes of the spectrum matches the width of the positive lobes of the cosine. SWIPE′, a variation of SWIPE, uses only the first and prime harmonics of the signal, which significantly reduces the subharmonic errors commonly found in other pitch estimation algorithms. The authors’ tests indicate that SWIPE and SWIPE′ performed better than the other algorithms tested on two speech databases, one disordered voice database, and one musical instrument database consisting of single notes performed at a variety of pitches.
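
The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the Hann window of duration 8/f0, the square-root magnitude spectrum, the 1/sqrt(f) kernel decay, and the half-weight valleys are assumptions chosen so that the main lobes of the spectrum and the positive lobes of the cosine have the same width (f0/2). Refinements such as a perceptual frequency scale or interpolation between window sizes are omitted.

```python
import numpy as np

def is_prime(n):
    """Trial-division primality test, adequate for small harmonic numbers."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def sawtooth_kernel(freqs, f0, primes_only=False):
    """Modified cosine: positive lobes at harmonics of f0, negative valleys between them."""
    q = freqs / f0                           # frequency in units of the candidate pitch
    n_harm = int(freqs[-1] / f0 - 0.75)      # keep the last valley below the top frequency
    kernel = np.zeros_like(freqs)
    for k in range(1, n_harm + 1):
        # SWIPE'-style variant: keep only the first and the prime harmonics.
        if primes_only and k > 1 and not is_prime(k):
            continue
        d = np.abs(q - k)
        weight = np.where(d < 0.25, 1.0, np.where(d < 0.75, 0.5, 0.0))
        kernel += weight * np.cos(2 * np.pi * q)
    return kernel / np.sqrt(freqs)           # assumed decaying envelope

def pitch_strength(x, fs, f0, primes_only=False):
    """Normalized inner product between the signal spectrum and the kernel."""
    n = int(round(8 * fs / f0))              # assumed window: 8 periods of the candidate
    if n > len(x):
        raise ValueError("signal too short for this pitch candidate")
    start = (len(x) - n) // 2
    frame = x[start:start + n] * np.hanning(n)
    spec = np.sqrt(np.abs(np.fft.rfft(frame)))[1:]   # drop the DC bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)[1:]
    kernel = sawtooth_kernel(freqs, f0, primes_only)
    denom = np.linalg.norm(kernel[kernel > 0]) * np.linalg.norm(spec)
    return float(kernel @ spec / denom) if denom > 0 else 0.0

def estimate_pitch(x, fs, f_min, f_max, n_candidates=200, primes_only=False):
    """Return the candidate (log-spaced between f_min and f_max) with the highest score."""
    candidates = np.geomspace(f_min, f_max, n_candidates)
    scores = [pitch_strength(x, fs, f, primes_only) for f in candidates]
    return float(candidates[int(np.argmax(scores))])

if __name__ == "__main__":
    fs = 8000
    t = np.arange(int(0.25 * fs)) / fs
    # Band-limited sawtooth at 220 Hz: harmonics with amplitudes 1/k.
    x = sum(np.sin(2 * np.pi * k * 220 * t) / k for k in range(1, 19))
    print(estimate_pitch(x, fs, 100, 500))
    print(estimate_pitch(x, fs, 100, 500, primes_only=True))
```

On a clean synthetic sawtooth, both variants should select a candidate close to the true fundamental. In this sketch, the prime-harmonics variant scores the subharmonic candidate (half the true pitch) lower because most of the even harmonics of that candidate, where the signal's energy sits, are excluded from the kernel.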
