Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.
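The key idea above is that HFCC keeps the mel-spaced center frequencies of MFCC but sets each filter's bandwidth from the psychoacoustic critical-bandwidth (ERB) formula, and HFCC-E then multiplies that bandwidth by a linear scale factor. The sketch below illustrates this decoupling under stated assumptions: the mel scale in its common 2595·log10(1 + f/700) form, the Moore–Glasberg ERB approximation, and a hypothetical `e_factor` parameter standing in for the paper's linear bandwidth scaling; the exact filter counts and scale factors used in the paper are not given here.

```python
import math

def mel(f_hz):
    # Hz -> mel, common MFCC convention (assumption: this variant of the scale).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_inv(m):
    # mel -> Hz, inverse of the above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def erb(f_hz):
    # Moore & Glasberg (1983) equivalent rectangular bandwidth in Hz,
    # with center frequency expressed in kHz inside the polynomial.
    f_khz = f_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52

def hfcc_filter_params(n_filters, f_low, f_high, e_factor=1.0):
    """Return (center_hz, bandwidth_hz) pairs for an HFCC-style filter bank.

    Centers are spaced uniformly on the mel scale (as in MFCC), but each
    bandwidth comes from the ERB formula scaled by e_factor (HFCC-E),
    so spacing and bandwidth are decoupled.
    """
    m_low, m_high = mel(f_low), mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    centers = [mel_inv(m_low + (i + 1) * step) for i in range(n_filters)]
    return [(fc, e_factor * erb(fc)) for fc in centers]

# Example: widen every filter by a factor of 2 without moving its center.
narrow = hfcc_filter_params(20, 0.0, 8000.0, e_factor=1.0)
wide = hfcc_filter_params(20, 0.0, 8000.0, e_factor=2.0)
```

Because only the filter bank coefficients change, a bank built this way drops into an existing MFCC pipeline (windowed FFT, filter bank, log, DCT) with no extra per-frame computation, which is the property the abstract highlights.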
