Listeners improve their ability to understand nonnative speech through exposure. The present study examines the role of semantic predictability during adaptation. Listeners were trained on high-predictability, low-predictability, or semantically anomalous sentences. Results demonstrate that trained participants improve their perception of nonnative speech compared to untrained participants. Adaptation is most robust for the types of sentences participants heard during training; however, semantic predictability during exposure did not impact the amount of adaptation overall. Results show advantages in adaptation specific to the type of speech material, a finding similar to the specificity of adaptation previously demonstrated for individual talkers or accents.

The process of understanding speech is, in many situations, relatively effortless. Talkers are easily able to convey their message to listeners who demonstrate little difficulty in understanding. This ease makes situations in which understanding speech is difficult notable. These situations can include listening to speech in a noisy environment [e.g., Neff and Green (1987)] or listening to speech from a talker with an unfamiliar accent, including those who are not native speakers of the language of communication [e.g., Munro and Derwing (1995)]. Fortunately, even in these challenging listening situations, listeners can quickly adapt, improving their ability to understand the speech after exposure to a target talker or accent (Bradlow and Bent, 2008; Clarke and Garrett, 2004; Shannon et al., 1995). In this paper, we investigate adaptation to nonnative speech as a function of the linguistic and semantic structure of the sentences listeners are exposed to during training.

Previous work has investigated factors that may impact adaptation. For example, it is clear that the specific talkers and accents presented during an exposure or training phase impact the time course and extent of adaptation. The particular talkers and accents presented during training also influence generalization to novel talkers and novel accents (Baese-Berk et al., 2013; Bradlow and Bent, 2008). Further, listeners perform better when they receive feedback (Burchill et al., 2018; Cooper and Bradlow, 2016), with a range of feedback types showing benefits. For example, in Cooper and Bradlow (2016), the presentation of Jabberwocky sentences following a participant's transcription of a target sentence yielded as much benefit as sentences that lexically matched the target; in Burchill et al. (2018), participants benefited from concurrent or delayed presentation of subtitles. However, these studies manipulated the feedback provided to listeners, while holding the type of target sentences or words constant. Therefore, it is unclear how the syntactic and semantic content of the training sentences themselves may impact adaptation to nonnative speech. Here, we specifically investigate how semantic predictability of the training and testing sentences impacts adaptation.

Although the impact of semantic predictability on adaptation has not been investigated, previous work has demonstrated that semantic predictability impacts speech perception. For example, listeners more easily identify sentence-final words when those words are highly predictable (e.g., “The color of a lemon is yellow”) than when words are less predictable (e.g., “Mom thinks that it is yellow”) (Kalikow et al., 1977). Further, sentence-final words are easier to identify in cases where the sentences are meaningful than in cases where the sentences are semantically anomalous (e.g., “The black top ran the spring”) (Miller and Isard, 1963). This semantic predictability benefit is modulated by a variety of factors, including dialect familiarity (Clopper, 2012) and listener language background (Bradlow and Alexander, 2007). The semantic predictability benefit is relatively robust when listening to nonnative speech, including for children listening to nonnative speech (Holt and Bent, 2017). Indeed, listeners may be more reliant on semantic context when listening to unfamiliar nonnative speech than when listening to native speech (Schertz and Hawthorne, 2018). Therefore, it is possible that semantic predictability will impact not just baseline perception of nonnative speech but also how listeners adapt to this speech. Event-related potential (ERP) studies with written sentences also suggest that there is a distinction in neural processing for sentences that are low in predictability compared with those that are not plausible (DeLong et al., 2014; Quante et al., 2018). Thus, we may also see differential effects in perception and adaptation for low-predictability sentences compared with anomalous sentences.

In the present study, we examine adaptation to nonnative speech after exposure in three semantic predictability conditions. Listeners were exposed to either high-predictability sentences, low-predictability sentences, or semantically anomalous sentences. After training, listeners were tested on all three sentence types. A fourth group of listeners received no training and only completed the post-test. We considered two competing hypotheses:

  1. Listeners who transcribe high-predictability sentences during training will capitalize on top-down cues to help interpret the unfamiliar pronunciations, leading to more robust adaptation to the unfamiliar nonnative speech than listeners who transcribe low-predictability sentences. Similarly, participants transcribing low-predictability sentences may be able to utilize top-down cues more than participants in the anomalous training condition. If this is the case, participants may demonstrate improved performance on all post-test sentences if they have had exposure to higher semantic predictability during training.

  2. Listeners who transcribe sentences in the anomalous condition will be required to rely more on bottom-up cues to interpret the unfamiliar pronunciations than participants in the low-predictability condition, who in turn are required to rely on bottom-up cues more than participants in the high-predictability condition. Although the low-predictability and particularly the anomalous sentences may initially be more challenging due to the lack of semantic support, the enhanced difficulty of determining how to map the unfamiliar pronunciations onto words in the lexicon may ultimately result in enhanced performance in the post-test.

The stimuli were selected from existing lists of high-predictability, low-predictability, and semantically anomalous sentences (see supplementary material).1 The lists of high- and low-predictability sentences were taken from Bradlow and Alexander (2007). The semantically anomalous sentences were taken from the Syntactically Normal Sentence Test (Nye and Gaitenby, 1974). A female nonnative English speaker with a first language of Mandarin was recorded reading these sentences in a sound-attenuated booth.

Each sound file was leveled for intensity at 75 dB and then mixed with speech-shaped noise at a 0 dB signal-to-noise ratio. Following previous work, this noise was added to prevent participants from performing at ceiling (Smiljanic and Bradlow, 2011).
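As a rough sketch of this step (not the authors' actual processing pipeline), mixing a leveled speech signal with noise at a target SNR amounts to scaling the noise so its RMS level sits the desired number of decibels below the speech. The helper names and toy signals below are illustrative assumptions; generating true speech-shaped noise would additionally require filtering white noise with the long-term spectrum of the speech.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db` decibels."""
    return noise * (rms(speech) / (10 ** (snr_db / 20)) / rms(noise))

def mix_at_snr(speech, noise, snr_db=0.0):
    """Mix speech with noise at the requested signal-to-noise ratio."""
    return speech + scale_noise_to_snr(speech, noise, snr_db)

# Toy example: a 440 Hz "speech" stand-in and white noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(16000)

mixed = mix_at_snr(speech, noise, snr_db=0.0)
# At 0 dB SNR, the scaled noise has the same RMS level as the speech.
achieved_snr = 20 * np.log10(rms(speech) / rms(scale_noise_to_snr(speech, noise, 0.0)))
```

At 0 dB SNR the speech and noise contribute equal energy, which is what keeps transcription accuracy off ceiling.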

Ninety-two monolingual native English speakers between the ages of 18 and 34, drawn from the Human Subjects Pool at the University of Oregon, completed the experiment online. Each participant completed a questionnaire about their language background, familiarity with nonnative speech and with other languages, and history of hearing, speech, or language disorders. Of the 92 participants, 23 were excluded due to noncompliance with the task (i.e., answering “I don't know” to every item or navigating away from the experiment for long periods of time; n = 15), a self-reported hearing impairment (n = 1), significant experience with nonnative speech (n = 4), or issues with the computer program (n = 3). Thus, 69 participants are included in the analyses below (n = 16 for the semantically anomalous training condition, n = 17 for the high-predictability training condition, n = 18 for the low-predictability training condition, and n = 18 for the control condition). None of the included participants indicated extensive experience interacting with Mandarin-accented speakers.

Participants were assigned to one of four experimental groups. Three groups received training and then took a post-test. The fourth group completed only the post-test. For the training phase, participants were assigned randomly to one of three training conditions: semantically anomalous, high-predictability, or low-predictability. Each participant heard 40 sentences from the condition to which they were assigned, presented in random order. Participants were asked to transcribe each sentence they heard by typing their response. Each sentence was only presented once, and they received no feedback on their transcriptions.

Immediately after training, participants in all groups were given a post-training test consisting of a novel set of 10 high-predictability, 10 low-predictability, and 10 semantically anomalous sentences randomly presented. The task for the post-test was identical to that in the training portion.

Post-test data were scored for words correctly transcribed (0 for incorrect, 1 for correct). Typos and misspellings resulting in non-words that strongly resembled the target were counted as correct.
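A lenient scoring rule of this kind can be approximated with a small edit-distance check; `score_word` and its threshold of one edit are illustrative assumptions, not the authors' exact criterion (which also required the near-miss to be a non-word).

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # Minimum of deletion, insertion, and (mis)match costs.
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def score_word(response, target, max_dist=1):
    """Score 1 if the response matches the target, tolerating small typos."""
    response, target = response.lower().strip(), target.lower().strip()
    return int(edit_distance(response, target) <= max_dist)
```

For example, `score_word("yelow", "yellow")` counts the misspelling as correct, while `score_word("hello", "yellow")` does not.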

We performed a logistic mixed effects regression to analyze the data. The dependent variable was final word transcription [correct (1) or incorrect (0)]. The following factors were contrast coded and included as fixed effects in the final model: (1) post-test sentence type (anomalous test, high-predictability test, low-predictability test); (2) training (i.e., control vs all other training groups); (3) matched vs unmatched training (e.g., anomalous training in anomalous test vs high-predictability and low-predictability training in anomalous test); (4) comparison of the two unmatched training groups (e.g., high-predictability in anomalous test vs low-predictability training in anomalous test). These comparisons were included to assess the effect of training type on test performance. No interactions were included in the final model because they did not significantly improve model fit. Model comparisons were implemented via likelihood ratio tests between the model with the term of interest and the one without the term. Random effects were included if they improved the model fit. Final models included random intercepts for participant and item. Because random slopes did not improve the model fit, they were not included in the final models to avoid overfitting.
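The models themselves were presumably fit in standard mixed-effects software, but the likelihood-ratio arithmetic behind the reported comparisons is easy to reproduce. The sketch below is a hypothetical helper, using the closed-form chi-square survival function for one degree of freedom, and recovers p-values in the ranges reported from the χ2 statistics in the results.

```python
import math

def chi2_sf_df1(x):
    """Survival function of the chi-square distribution with df = 1.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)),
    so no statistics library is needed.
    """
    return math.erfc(math.sqrt(x / 2.0))

def likelihood_ratio_test(ll_full, ll_reduced):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the statistic 2 * (ll_full - ll_reduced) and its p-value.
    """
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf_df1(stat)

# Mapping the chi-square values reported below onto p-values:
p_training = chi2_sf_df1(29.562)      # far below 0.001
p_low_vs_high = chi2_sf_df1(3.9125)   # just under 0.05
p_unmatched = chi2_sf_df1(0.1368)     # clearly nonsignificant
```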

Before investigating the influence of training sentence type on adaptation, we evaluated whether our listeners demonstrate a semantic predictability effect for the sentences in the post-test. When collapsing across listeners in all training conditions, final words that are highly predictable are transcribed more accurately than final words that are low-predictability. These low-predictability words are transcribed more accurately than final words in anomalous sentences (high-predictability = 0.75; low-predictability = 0.53; anomalous = 0.34). This finding replicates the robust semantic predictability effect from previous work (Kalikow et al., 1977; Miller and Isard, 1963). We investigated performance on all words in a sentence, as well as on just final words. Results were similar across the two types of analyses. For the sake of concision, we present the results from final words only here.

Figure 1 shows the overall post-test performance for the trained and control groups (collapsed over post-test sentence type). Examining this figure, all trained groups appear to demonstrate improved perception of nonnative speech when compared to the control condition (i.e., post-test scores are higher for each of the trained groups than the control group). The regression model demonstrates that this observation is statistically significant. Inclusion of training (i.e., trained vs control groups) significantly improves model fit (χ2 = 29.562, p < 0.001). However, there do not appear to be substantial differences across the three training groups in overall performance.

Fig. 1.

Performance on all post-test sentences for each training group. Listeners who received anomalous sentences during training are shown in black, high-predictability sentences in dark gray, low-predictability sentences in light gray, and no training in white.


Figure 2 shows post-test performance across each of the post-test sentence types as a function of training condition. As observed above, participants in the trained conditions perform better than participants in the control condition, and this pattern holds across all three post-test sentence types. Further, when transcribing final words in the anomalous condition, participants perform less well overall than in the high- and low-predictability conditions, and less well in the low-predictability condition than in the high-predictability condition. Finally, participants who are trained on a specific sentence type demonstrate improved performance on that sentence type compared to participants who were trained on another sentence type (e.g., participants trained on anomalous sentences perform better on anomalous sentences than participants trained on high- or low-predictability sentences), and this pattern holds for all three sentence types. The results of the mixed effects model support these observations. Participants tested on anomalous sentences perform less well than participants in the other two test conditions (χ2 = 8.6237, p < 0.01). Participants also perform less well on the low-predictability sentences than on the high-predictability sentences (χ2 = 3.9125, p = 0.047). Further, performance in “matched” training and testing conditions differs from performance in the “unmatched” conditions (χ2 = 8.506, p < 0.01); however, the two unmatched training conditions (e.g., the high-predictability vs the low-predictability training groups in the anomalous test condition, or the high-predictability vs the anomalous training groups in the low-predictability test condition) do not differ from each other (χ2 = 0.1368, p = 0.7114).

Fig. 2.

Post-test performance by sentence type for participants trained on anomalous sentences (black), high-predictability sentences (dark gray), and low-predictability sentences (light gray) and participants with no training (white).


The results of this experiment demonstrate that congruency between training stimuli and test stimuli impacts adaptation to an unfamiliar accent, in this case, nonnative speech. All trained groups demonstrated overall increased performance when compared to a control group that did not receive training on the accent or task. Further, listeners demonstrated improved performance on sentences that are similar to the sentences they were exposed to during training compared to participants who heard a different type of sentences during training. For example, listeners who were trained on semantically anomalous sentences performed better on semantically anomalous sentences at test than listeners who were trained on high- or low-predictability semantically meaningful sentences.

There was no clear benefit for training on high-predictability, low-predictability, or anomalous sentences, demonstrating a lack of support for either of our competing hypotheses; overall, no training group showed improved performance compared to the other groups. Although it is difficult to fully interpret null results, it may be that there are multiple strategies for improving perception of unfamiliar accents. When faced with different levels of informativeness, listeners may adopt different strategies to reach the same end goal of optimizing word recognition [see, e.g., Schertz and Hawthorne (2018) for an example of dynamic speech perception strategies]. Listeners in the high-predictability condition may benefit from top-down cues that facilitate learning how to interpret the unfamiliar pronunciations. This process may lead to successful adaptation due to listeners' ability to bootstrap the semantic and syntactic cues. Listeners in the low-predictability and anomalous conditions have less top-down information at their disposal, which may initially make the process of mapping the unfamiliar pronunciations much more difficult and error prone. However, this increased level of difficulty may propel adaptation due to “desirable difficulty” (Bjork, 1994). That said, there do not seem to be differences in the rate of learning across the training conditions. Examining performance during training, we see relatively similar performance across conditions, with participants improving during the first half of training at similar rates across the high- and low-predictability training conditions (high-predictability: block 1 = 0.81, block 2 = 0.88, improvement = 0.07; low-predictability: block 1 = 0.61, block 2 = 0.66, improvement = 0.05).

We speculate that the similar level of performance during training may be due in part to the structure of the low- vs high-predictability sentences. While the lexical content of the final words of low-predictability sentences is less predictable than the final words in high-predictability sentences, the syntactic structure of these sentences is more predictable. That is, the low-predictability sentences follow one of only a small set of sentence structures (e.g., “NOUN VERB PREPOSITION the NOUN” or “NOUN VERB that it is NOUN”), whereas the high-predictability sentences demonstrate much more variability in their structure. The anomalous sentences were even more predictable—all sentences followed the same “The ADJECTIVE NOUN VERBed the NOUN” structure. Therefore, it is possible that listeners exposed to low-predictability or anomalous sentences are able to utilize this syntactic predictability in adapting to the non-final words in these sentences. That is, listeners could use their ability to understand non-final words to improve their ability to understand final words. These differences too might engender distinct strategies during training and test that could result in listeners demonstrating good performance on all types of stimuli. Future research could use sentences that allow for disentangling the effects of semantic and syntactic predictability on adaptation. A number of other lexical and sentential factors and their impacts on adaptation could also be explored in future work. For example, non-declarative sentences could be included, or sentences with lexical items that vary in their frequency or familiarity could be assessed. These inclusions would allow for further tests of how syntactic and semantic factors impact how listeners learn to map unfamiliar pronunciations to known words.

Although listeners in the three training groups showed similar learning trajectories, there were differences in performance across the groups. The results demonstrate the specificity of exposure on adaptation to nonnative speech. That is, listeners derive a benefit for sentence types that they were trained on compared to sentence types they were not trained on. This finding contrasts with Cooper and Bradlow (2016), who demonstrated that feedback during training was equally beneficial whether it matched or mismatched the target. After hearing a target sentence produced by a nonnative speaker in noise, participants derived similar benefits from either hearing that same sentence in the clear or hearing another, unrelated sentence produced in the clear. That is, a match between feedback and target did not improve performance. In this study, a match of sentence type between training and post-test appeared to provide strong benefits for adaptation to novel sentences, suggesting a potential difference between matching content in feedback during training vs matching sentence types between training and test. The benefit of a match between training and testing stimuli has been demonstrated for training on specific talkers and on specific accents (e.g., Baese-Berk et al., 2013; Bradlow and Bent, 2008; Sidaras et al., 2009). When trained on a single talker, listeners perform very well on that talker at test but are unable to generalize robustly to other talkers (i.e., talker-dependent adaptation). Further, when listeners are trained on multiple talkers from a single accent background, they are able to generalize to a novel talker from that accent background, but not to a novel talker from an unfamiliar accent background (i.e., accent-dependent adaptation). Only when listeners are exposed to multiple talkers from multiple backgrounds do they generalize to novel talkers from novel backgrounds (i.e., accent-independent adaptation).
Other previous work has demonstrated that not only does the type of variability influence adaptation, but so does the presentation of this variability (i.e., blocked or randomized presentation; Tzeng et al., 2016). The result of the present study suggests that a similar facilitation effect for adaptation may be at play as a function of sentence type. When individuals were trained on a single sentence type, they demonstrated an advantage on that sentence type. Future research could investigate whether variety in sentence types and order of presentation of these sentence types leads to more robust generalization, as is the case in variety of talkers and accents.

This work also speaks to a growing body of literature that examines the interplay between linguistic structure and word recognition. It is clear that linguistic structure can impact perception (see the sentence predictability effects discussed above); however, much is still unknown about how linguistic structure may interact with other factors that impact speech perception, including talker intelligibility. Some recent work has suggested that the effects of linguistic structure on perception may be modulated by other factors, including the intelligibility of the speakers (Strori et al., 2020). Listeners recognize simple sentences more easily than complex ones, but this benefit is most robust for more intelligible talkers (i.e., native or high-proficiency nonnative talkers) and is less robust for less intelligible talkers (i.e., low-proficiency nonnative talkers). This finding suggests that future work on adaptation should investigate not just how linguistic structure influences adaptation, but also how the influence of linguistic structure may vary depending on properties of the talkers or listeners.

Our results demonstrate that listeners who are exposed to nonnative speech—regardless of the sentence type included in the training—perform better than untrained listeners. Furthermore, trained listeners derive an additional benefit from the type of sentence on which they are trained. These results suggest that listeners are likely to use dynamic strategies in adapting to nonnative speech, taking advantage of the properties that allow them to best process the unfamiliar speech. Specificity of adaptation and how that specificity might be leveraged to generalize to novel stimuli, talkers, or situations should be investigated in future work.

This work was partially supported by the Undergraduate Research Opportunities Program at the University of Oregon.

1. See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0003326 for lists of high-predictability, low-predictability, and semantically anomalous sentences.

1. Baese-Berk, M. M., Bradlow, A. R., and Wright, B. A. (2013). “Accent-independent adaptation to foreign accented speech,” J. Acoust. Soc. Am. 133, EL174–EL180.
2. Bjork, R. A. (1994). “Memory and metamemory considerations in the training of human beings,” in Metacognition: Knowing About Knowing, edited by J. Metcalfe and A. Shimamura (MIT Press, Cambridge, MA), pp. 185–205.
3. Bradlow, A. R., and Alexander, J. A. (2007). “Semantic and phonetic enhancements for speech-in-noise recognition by native and non-native listeners,” J. Acoust. Soc. Am. 121, 2339–2349.
4. Bradlow, A. R., and Bent, T. (2008). “Perceptual adaptation to non-native speech,” Cognition 106, 707–729.
5. Burchill, Z., Liu, L., and Jaeger, T. F. (2018). “Maintaining information about speech input during accent adaptation,” PLoS ONE 13, e0199358.
6. Clarke, C. M., and Garrett, M. F. (2004). “Rapid adaptation to foreign-accented English,” J. Acoust. Soc. Am. 116, 3647–3658.
7. Clopper, C. G. (2012). “Effects of dialect variation on the semantic predictability benefit,” Lang. Cognit. Process. 27, 1002–1020.
8. Cooper, A., and Bradlow, A. R. (2016). “Linguistically guided adaptation to foreign-accented speech,” J. Acoust. Soc. Am. 140, EL378–EL384.
9. DeLong, K. A., Quante, L., and Kutas, M. (2014). “Predictability, plausibility, and two late ERP positivities during written sentence comprehension,” Neuropsychologia 61, 150–162.
10. Holt, R. F., and Bent, T. (2017). “Children's use of semantic context in perception of foreign-accented speech,” J. Speech Lang. Hear. Res. 60, 223–230.
11. Kalikow, D. N., Stevens, K. N., and Elliott, L. L. (1977). “Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability,” J. Acoust. Soc. Am. 61, 1337–1351.
12. Miller, G. A., and Isard, S. (1963). “Some perceptual consequences of linguistic rules,” J. Verbal Learn. Verbal Behav. 2, 217–228.
13. Munro, M. J., and Derwing, T. M. (1995). “Foreign accent, comprehensibility, and intelligibility in the speech of second language learners,” Lang. Learn. 45, 73–97.
14. Neff, D. L., and Green, D. M. (1987). “Masking produced by spectral uncertainty with multicomponent maskers,” Percept. Psychophys. 41, 409–415.
15. Nye, P. W., and Gaitenby, J. H. (1974). “The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences,” Status Report on Speech Research SR-37/38 (Haskins Laboratories, New Haven, CT), pp. 169–190.
16. Quante, L., Bölte, J., and Zwitserlood, P. (2018). “Dissociating predictability, plausibility and possibility of sentence continuations in reading: Evidence from late-positivity ERPs,” PeerJ 6, e5717.
17. Schertz, J., and Hawthorne, K. (2018). “The effect of sentential context on phonetic categorization is modulated by talker accent and exposure,” J. Acoust. Soc. Am. 143, EL231–EL236.
18. Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). “Speech recognition with primarily temporal cues,” Science 270, 303–304.
19. Sidaras, S. K., Alexander, J. E. D., and Nygaard, L. C. (2009). “Perceptual learning of systematic variation in Spanish accented speech,” J. Acoust. Soc. Am. 125, 3306–3316.
20. Smiljanic, R., and Bradlow, A. R. (2011). “Bidirectional clear speech perception benefit for native and high-proficiency non-native talkers and listeners: Intelligibility and accentedness,” J. Acoust. Soc. Am. 130, 4020–4031.
21. Strori, D., Bradlow, A. R., and Souza, P. E. (2020). “Recognising foreign-accented speech of varying intelligibility and linguistic complexity: Insights from older listeners with or without hearing loss,” Int. J. Audiol. 0, 1–11.
22. Tzeng, C. Y., Alexander, J. E. D., Sidaras, S. K., and Nygaard, L. C. (2016). “The role of training structure in perceptual learning of accented speech,” J. Exp. Psychol. Hum. Percept. Perform. 42, 1793–1805.
