Foreign-accented speech recognition is typically tested with linguistically simple materials, which offer only a limited window into realistic speech processing. The present study examined the relationship between linguistic structure and talker intelligibility in several sentence-in-noise recognition experiments. Listeners transcribed simple, short sentences and more complex, longer sentences embedded in noise. The sentences were spoken by three talkers of varying intelligibility: a native English speaker, a high-intelligibility non-native English speaker, and a low-intelligibility non-native English speaker. Talker intelligibility modulated the effect of linguistic structure on sentence recognition accuracy. Increasing complexity reduced accuracy only for the native and high-intelligibility foreign-accented talkers, whereas no such effect was found for the low-intelligibility foreign-accented talker. This pattern held across conditions: at low and high signal-to-noise ratios, under mixed and blocked stimulus presentation, and in the absence of a major cue to prosodic structure (the natural pitch contour of the sentences). Moreover, the pattern generalized to a different set of three talkers matched in intelligibility to the original talkers. Taken together, the results suggest that listeners employ qualitatively different speech processing strategies for low- versus high-intelligibility foreign-accented talkers, with sentence-level linguistic factors emerging only for speech above a threshold of intelligibility. The findings are discussed in the context of alternative accounts.
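The abstract refers to sentences embedded in noise at "low and high signal-to-noise ratios" without giving the mixing procedure or the SNR values. As a point of reference only, the sketch below shows one common way to mix a signal with noise at a target SNR: scale the noise so that 10·log10(P_speech / P_noise) equals the target in dB. It is an illustrative assumption, not the procedure used in the study. The function names (`mean_power`, `mix_at_snr`, `snr_db`) and the example SNR levels are hypothetical.

```python
import math


def mean_power(x):
    """Mean power (mean of squared samples) of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so that the speech-to-noise power
    ratio equals target_snr_db (in dB), then add it to `speech` sample by sample.

    Assumes both signals share the same length and sampling rate.
    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # Gain that brings the noise power to p_speech / 10**(target_snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))
```

For example, with a 440 Hz tone as a stand-in for speech and Gaussian noise, `mix_at_snr(speech, noise, -5.0)` returns a mixture whose noise component sits 5 dB above the speech power. Real stimulus preparation would usually operate on recorded waveforms (for instance with NumPy arrays) and may define signal power over speech-active portions only.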
