Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547–1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435–448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues to voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using naturally spoken consonant-vowel syllables, produced by four talkers and presented in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa/ and /ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise ratio (SNR), corresponding to a 79% correct classification score, was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
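As an illustration of the stimulus construction described above, the following minimal sketch mixes a token with white Gaussian noise at a requested SNR. It is not the authors' procedure: the function names are hypothetical, the SNR is assumed to be defined over the mean power of the whole token, and a pure tone stands in for a recorded syllable.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_white_gaussian_noise(samples, snr_db, seed=None):
    """Return (noisy, noise) with noise scaled to the requested SNR in dB.

    Assumption: SNR is the ratio of whole-token mean signal power to the
    expected noise power. Other definitions (e.g., peak or vowel-only
    power) would give different noise levels.
    """
    rng = random.Random(seed)
    noise_power = mean_power(samples) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    noise = [rng.gauss(0.0, sigma) for _ in samples]
    noisy = [s + n for s, n in zip(samples, noise)]
    return noisy, noise


# Stand-in for a speech token: 0.5 s of a 200 Hz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs // 2)]

noisy, noise = add_white_gaussian_noise(tone, snr_db=-5.0, seed=0)

# The realized SNR differs slightly from the target because the noise is random.
realized = 10 * math.log10(mean_power(tone) / mean_power(noise))
print(round(realized, 2))
```

Repeating the mixing step over a range of `snr_db` values would yield the kind of graded noise conditions from which a threshold SNR can be estimated.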

1. Allen, J. B. (1994). “How do humans process and recognize speech?” IEEE Trans. Speech Audio Process. 2, 567–577.
2. Benki, J. (2001). “Place of articulation and first formant transition pattern both affect perception of voicing in English,” J. Phonetics 29, 1–22.
3. Chen, M. (2001). “Perception of voicing for syllable-initial plosives in noise,” Master's thesis, Electrical Engineering Department, University of California at Los Angeles.
4. Cho, T., and Ladefoged, P. (1999). “Variation and universals in VOT: evidence from 18 languages,” J. Phonetics 27, 207–229.
5. Cutler, A., Weber, A., Smits, R., and Cooper, N. (2004). “Patterns of English phoneme confusions by native and non-native listeners,” J. Acoust. Soc. Am. 116, 3668–3678.
6. Fitch, H. L., Halwes, T., Erickson, D. M., and Liberman, A. M. (1980). “Perceptual equivalence of two acoustic cues for stop-consonant manner,” Percept. Psychophys. 27, 343–350.
7. Haggard, M., Ambler, S., and Callow, M. (1970). “Pitch as a voicing cue,” J. Acoust. Soc. Am. 47, 613–617.
8. Hall, M. D., Davis, K., and Kuhl, P. K. (1995). “Interactions between acoustic dimensions contributing to the perception of voicing,” J. Acoust. Soc. Am. 97, 3416.
9. Hant, J. (2000). “A computational model to predict human perception of speech in noise,” Ph.D. dissertation, Electrical Engineering Department, University of California at Los Angeles.
10. Hant, J., and Alwan, A. (2000). “Predicting the perceptual confusion of synthetic plosive consonants in noise,” Proceedings of the Sixth International Conference on Spoken Language Processing, Beijing, China, pp. 941–944.
11. Hant, J., and Alwan, A. (2003). “A psychoacoustic-masking model to predict the perception of speech-like stimuli in noise,” Speech Commun. 40, 291–313.
12. Hermansky, H. (1990). “Perceptual linear prediction (PLP) analysis for speech,” J. Acoust. Soc. Am. 87, 1738–1752.
13. Kewley-Port, D. (1982). “Measurement of formant transitions in naturally produced stop consonant-vowel syllables,” J. Acoust. Soc. Am. 72, 379–389.
14. Klatt, D. H. (1975). “Voice onset time, friction, and aspiration in word-initial consonant clusters,” J. Speech Hear. Res. 18, 686–706.
15. Levitt, H. (1971). “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467–477.
16. Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). “Perception of the speech code,” Psychol. Rev. 74, 431–461.
17. Liberman, A. M., Delattre, P. C., and Cooper, F. S. (1958). “Some cues for the distinction between voiced and voiceless stops in initial position,” Lang. Speech 1, 153–167.
18. Lisker, L. (1975). “Is it VOT or a first-formant transition detector?” J. Acoust. Soc. Am. 57, 1547–1551.
19. Lisker, L., and Abramson, A. S. (1964). “A cross-language study of voicing in initial stops: Acoustical measurements,” Word 20, 384–422.
20. Lisker, L., and Abramson, A. S. (1970). “The voicing dimension: Some experiments in comparative phonetics,” Proceedings of the Sixth International Congress of Phonetic Sciences, Prague, 1967 (Academia, Prague), pp. 563–567.
21. Lisker, L., Liberman, A. M., Erickson, D. M., Dechovitz, D., and Mandler, R. (1977). “On pushing the voice onset-time (VOT) boundary about,” Lang. Speech 20, 209–216.
22. Massaro, D. W., and Oden, G. C. (1980). “Evaluation and integration of acoustic features in speech perception,” J. Acoust. Soc. Am. 67, 996–1013.
23. Menard, S. W. (1995). Applied Logistic Regression Analysis (Sage Publications, Thousand Oaks, CA).
24. Miller, J. L. (1977). “Nonindependence of feature processing in initial consonants,” J. Speech Hear. Res. 20, 519–528.
25. Miller, G. A., and Nicely, P. E. (1955). “An analysis of perceptual confusions among some English consonants,” J. Acoust. Soc. Am. 27, 338–352.
26. Nearey, T. M. (1997). “Speech perception as pattern recognition,” J. Acoust. Soc. Am. 101, 3241–3254.
27. Nittrouer, S., Wilhelmsen, M., Shapley, K., Bodily, K., and Creutz, T. (2003). “Two reasons not to bring your children to cocktail parties,” J. Acoust. Soc. Am. 113, 2254.
28. Ohde, R. N. (1984). “Fundamental frequency as an acoustic correlate of stop consonant voicing,” J. Acoust. Soc. Am. 75, 224–230.
29. Peterson, G. E., and Lehiste, I. (1960). “Duration of syllable nuclei in English,” J. Acoust. Soc. Am. 32, 693–703.
30. Repp, B. (1979). “Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants,” Lang. Speech 22, 173–189.
31. Repp, B. (1983). “Trading relations among acoustic cues in speech perception are largely a result of phonetic categorization,” Speech Commun. 2, 341–361.
32. Sawusch, J. R., and Pisoni, D. B. (1974). “On the identification of place and voicing features in synthetic stop consonants,” J. Phonetics 2, 181–194.
33. Soli, S. D., and Arabie, P. (1979). “Auditory versus phonetic accounts of observed confusions between consonant phonemes,” J. Acoust. Soc. Am. 66, 46–59.
34. Stevens, K. N. (1998). Acoustic Phonetics (MIT Press, Cambridge, MA).
35. Stevens, K. N., and Klatt, D. H. (1974). “Role of formant transitions in the voiced-voiceless distinction for stops,” J. Acoust. Soc. Am. 55, 653–659.
36. Strope, B., and Alwan, A. (1997). “A model of dynamic auditory perception and its application to robust word recognition,” IEEE Trans. Speech Audio Process. 5, 451–464.
37. Summerfield, Q., and Haggard, M. (1977). “On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants,” J. Acoust. Soc. Am. 62, 435–448.
38. Whalen, D. H., Abramson, A. S., Lisker, L., and Mody, M. (1993). “F0 gives voicing information even with unambiguous voice onset times,” J. Acoust. Soc. Am. 93, 2152–2159.