With natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features normally used to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). Listeners performed worse with all three types of synthetic signals than with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Among the three synthetic signal types, the results indicate that SWS preserves more of the voice characteristics used for speech segregation than MNB or MSB. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.
