Informational masking is greater when the target and masker speech are more perceptually similar. Fundamental frequency (f0) contour, or the dynamic movement of f0, is thought to provide cues for segregating target speech from a speech masker. Most of the data demonstrating this effect have been collected using digitally modified stimuli; less work has explored the role of f0 contour in speech-in-speech recognition when all of the stimuli are produced naturally. The goal of this project was to explore the importance of target and masker f0 contour similarity by manipulating the speaking style of the talkers producing the target and masker speech streams. Sentence recognition thresholds were evaluated for target and masker speech produced with a flat, normal, or exaggerated speaking style; performance was also measured in speech-spectrum-shaped noise and for conditions in which the stimuli were processed through an ideal binary mask. Results confirmed that similarity in f0 contour depth elevated speech-in-speech recognition thresholds; however, when the target and masker had similar contour depths, targets with normal f0 contours were more resistant to masking than targets with flat or exaggerated contours. Differences in energetic masking across stimuli cannot account for these results.

