Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those signaling stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features matching or exceeding the performance obtained with estimated Mel-frequency cepstral coefficients (MFCCs). Results also show that acoustic-phonetic features can be combined with MFCCs for best performance, suggesting that these features provide information complementary to MFCCs.
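The abstract describes Gaussian mixture model based extrapolation of acoustic-phonetic features from channel-degraded speech, but does not spell out the estimator. A common realization of this idea is minimum mean-square-error regression under a joint GMM trained on paired clean and degraded feature vectors. The sketch below illustrates that general technique only; the array names, component count, and helper functions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(clean_feats, degraded_feats, n_components=8, seed=0):
    """Fit a full-covariance GMM on stacked [clean | degraded] vectors.
    clean_feats: (N, d_clean), degraded_feats: (N, d_obs); both hypothetical
    acoustic-phonetic feature matrices from parallel clean/telephone data."""
    joint = np.hstack([clean_feats, degraded_feats])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(joint)
    return gmm

def extrapolate_features(gmm, degraded_feats, d_clean):
    """MMSE estimate of the clean features given the degraded observation:
    E[x_c | x_o] = sum_k p(k | x_o) (mu_k_c + S_co S_oo^-1 (x_o - mu_k_o))."""
    n, k_total = degraded_feats.shape[0], gmm.n_components
    resp = np.zeros((n, k_total))
    # Posterior responsibility of each mixture component given only the
    # observed (degraded) block of the joint vector.
    for k in range(k_total):
        mu_o = gmm.means_[k, d_clean:]
        s_oo = gmm.covariances_[k, d_clean:, d_clean:]
        resp[:, k] = gmm.weights_[k] * multivariate_normal.pdf(
            degraded_feats, mean=mu_o, cov=s_oo)
    resp /= resp.sum(axis=1, keepdims=True) + 1e-12

    est = np.zeros((n, d_clean))
    for k in range(k_total):
        mu_c = gmm.means_[k, :d_clean]
        mu_o = gmm.means_[k, d_clean:]
        s_co = gmm.covariances_[k, :d_clean, d_clean:]
        s_oo = gmm.covariances_[k, d_clean:, d_clean:]
        # Per-component conditional mean of the clean block given the
        # degraded block, weighted by the component responsibility.
        cond = mu_c + (degraded_feats - mu_o) @ np.linalg.solve(s_oo, s_co.T)
        est += resp[:, [k]] * cond
    return est
```

In this sketch the extrapolated acoustic-phonetic vectors returned by `extrapolate_features` would then feed the stop place classifier, either alone or appended to MFCCs, mirroring the feature combinations compared in the abstract.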
