This study reports that English speakers, after shadowing English-like nonwords beginning with /p/ with extended voice onset time, spontaneously shifted their subsequent reading productions of English words toward the shadowing targets. The extent of the imitative changes correlated positively with the speakers' declarative memory of the nonwords, but not with the lexical frequency of the produced words. These findings provide evidence for phoneme-level abstraction in perceptually induced phonetic drifts, and they further suggest that the mechanisms underlying phonetic drifts in direct shadowing, and in subsequent productions of words differing from the shadowing targets, may not be identical.

Speakers spontaneously shift their production toward the model speech they have just heard (e.g., Babel, 2012; Goldinger, 1998, among many others). For example, English speakers, after being exposed to model speech containing voiceless stops with longer voice onset time (VOT), spontaneously imitated such phonetic patterns (e.g., Nielsen, 2011; Shockley et al., 2004; Wade et al., 2021; Yu et al., 2013). The current study examines whether shadowing nonwords with a specific phonetic pattern can induce corresponding phonetic changes in subsequent productions of real words. We first ask whether the imitative changes are generalized from nonwords to real words, and further investigate whether the imitators' memory of the nonwords and the lexical frequency of the produced words influence the degree of imitative changes observed in real words, if any.

Various patterns observed in spontaneous imitation suggest that the phenomenon is related to memory and lexicon. For example, Goldinger (1998) claims that the imitation fidelity, or how closely a speaker changes their production toward the direction of the model speech, is influenced by the amount of exposure (i.e., the number of repetitions) or lexical frequency. Specifically, the degree of imitation was greater in the words that are repeated more than those repeated less, and in low-frequency words than in high-frequency words. The influence of lexical frequency and repetition on the imitation fidelity can be readily explained without added complexities in exemplar-based models of speech perception (Johnson, 1997, 2006; Pierrehumbert, 2001) which postulate that the exemplars, or memory traces, activated in the course of perception potentially contribute to the subsequent productions, inducing imitative changes (Goldinger, 1998; Tilsen, 2009). Specifically, imitation fidelity increases with more repetitions as more exemplars associated with the specific phonetic pattern of the model speech are newly added. In addition, low-frequency words demonstrate greater imitative changes because each new exemplar has a relatively greater weight when fewer existing exemplars are associated with a particular word. In contrast, high-frequency words have more existing exemplars, making the idiosyncrasies of one instance of a word less substantial, and thus resulting in a lesser degree of imitation.
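The exemplar-based account of repetition and frequency effects described above can be illustrated with a deliberately simplified sketch. It assumes that every stored exemplar is weighted equally and that a word's production target is the plain mean of its exemplar cloud; actual exemplar models use activation weighting and memory decay. The function names and all numbers (exemplar counts, VOT values) are hypothetical and chosen only for illustration.

```python
def cloud_mean_after_exposure(n_existing, existing_vot_ms, n_new, new_vot_ms):
    """Mean VOT of a word's exemplar cloud after new exemplars are added.

    Simplifying assumption: all exemplars carry equal weight.
    """
    total = n_existing * existing_vot_ms + n_new * new_vot_ms
    return total / (n_existing + n_new)


def imitative_shift(n_existing, n_new=8, existing_vot_ms=70.0, new_vot_ms=120.0):
    """Shift of the cloud mean toward the model speech, in ms (hypothetical values)."""
    updated = cloud_mean_after_exposure(n_existing, existing_vot_ms, n_new, new_vot_ms)
    return updated - existing_vot_ms


# A low-frequency word has few stored exemplars; a high-frequency word has many.
low_freq_shift = imitative_shift(n_existing=50)
high_freq_shift = imitative_shift(n_existing=5000)

# More repetitions of the model speech add more new exemplars.
few_repetitions_shift = imitative_shift(n_existing=50, n_new=2)
many_repetitions_shift = imitative_shift(n_existing=50, n_new=8)

print(f"low-frequency shift:  {low_freq_shift:.2f} ms")
print(f"high-frequency shift: {high_freq_shift:.2f} ms")
```

Under these assumptions, the same number of new exemplars moves a small cloud much further than a large one, and more repetitions move any cloud further, which mirrors the two patterns attributed to Goldinger (1998).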

Extending findings by Goldinger (1998) obtained from immediate shadowing tasks, in which participants heard and shadowed (i.e., immediately repeated) the model speech, Nielsen (2011) reports the frequency effects in the delayed, non-shadowing imitation. In Nielsen (2011), English speakers were passively exposed to the model speech consisting of high- and low-frequency words and read those words later in a separate, post-exposure block. Participants in Nielsen (2011) showed greater imitative changes in low-frequency words than in high-frequency words in the post-exposure reading productions, even though they were temporally remote from the model speech.

Nielsen (2011) further demonstrates that the imitative patterns are generalized to unheard, but phonologically related, words in the lexicon. For example, English speakers in Nielsen (2011), after being exposed to model speech that included the voiceless stop /p/ with extended VOT, produced the same words with longer VOT. Notably, this longer VOT was generalized to unheard words sharing the target phoneme /p/ or even another voiceless stop /k/. This generalization of convergent changes from the heard words to the unheard but related words has been reported in several previous studies (Kwon, 2019; Nielsen, 2011), highlighting the importance of phonological categories, such as phonemes or features, in spontaneous imitation. The current study extends this line of research and examines the generalization of imitative patterns from the nonwords to unheard words that are phonologically related to the nonwords of exposure. We specifically ask whether the speakers, after shadowing nonwords beginning with /p/ with longer VOT, converge to the phonetic pattern and produce longer VOT in the post-shadowing reading productions of real English words that begin with the same sound /p/.

The effects of lexical frequency have been reported in both shadowing (Goldinger, 1998) and non-shadowing delayed imitation (Nielsen, 2011). However, in both cases, the words produced by the participants included those they had heard—the difference between the two studies is the amount of time between perceiving a word and reproducing the same word. These findings confirm that word-size exemplars are involved in spontaneous imitation, which leads to a prediction that the lexical frequency effects would disappear when the words being produced and those heard do not share the same word-size exemplars. To our knowledge, this has not yet been tested. Are lexical frequency effects indeed tied to the words being perceived? To address this question, we compare the imitative changes in low- and high-frequency words that are generalized from a different set of (non)words. Would the relation between the imitative shift and lexical frequency hold true only when speakers reproduce the same words they heard, as predicted by exemplar-based views which assume that the frequency effects stem from the activation of word-size exemplars? Or, alternatively, could the generalized imitative changes in less frequent words surpass those in more frequent ones, even when the imitators are never exposed to the spoken forms of the words during the experiment?

If it is the nature of the memory system that gives rise to the observed patterns of imitation with regard to the effects of lexical frequency or the amount of exposure, the imitation fidelity can hypothetically be related to how much the speakers retain from the model speech they heard. Related to the link between memory and imitation, Yu et al. (2013) examined whether those with greater working memory capacity imitated the model speech more faithfully. Although they found significant correlations between the imitation fidelity and some individual traits, including autistic-like traits, openness, and attitude toward the model speaker, to their (and our) surprise, working memory capacity was not correlated with the degree of imitation. This outcome is puzzling because phonetic imitation is expected to require selective attention to fine-grained phonetic details of the model speech, which is known to be influenced by working memory resources (Yu et al., 2013). Working memory capacity in Yu et al. (2013), measured in an automated reading span task (RSPAN) (Unsworth et al., 2005), represents the individual's ability to control attention and selectively attend to relevant information (and ignore irrelevant information) (e.g., Kane et al., 2001). To elucidate the relation between phonetic imitation and memory, we focus on a rather different aspect of memory. Instead of memory capacity as a trait of the individual speaker, we ask whether a specific speaker who remembers the model speech better during the experiment would shift their subsequent productions more toward the model speech. Would the speaker's memory of the nonwords that they were exposed to during the experiment predict the degree of phonetic shift observed in the post-shadowing reading productions of real words?

In summary, the current study asks (1) whether the imitative phonetic changes are generalized from nonwords to real words, (2) whether the degree of imitative changes in English words, if any, is affected by the lexical frequency of the words to which participants have not been exposed during the experiment, and (3) whether the participants who remember the nonwords in the model speech more accurately show greater phonetic shifts toward the model speech. To answer these questions, we conducted an imitation experiment that examines whether shadowing nonwords beginning with /p/ with longer VOT induces VOT lengthening in subsequent reading productions of real words of high or low lexical frequency. This imitation experiment is followed by a recognition memory test that asks the participants to identify whether they have previously seen the nonwords during the experiment, and we examine how each participant's performance in the two tasks (imitation and memory) is related.

Twenty-five adult native speakers of American English (20 female and 5 male, mean age = 21.8 years, median = 9 years) participated in the current study. They were living in Northern Virginia at the time of testing and were not fluent in any other language. One male participant's recording was lost due to technical issues, and thus, the data from the remaining 24 participants were analyzed and reported. No participant reported any history of speech or hearing impairments. Each participant was paid $10 for completing the entire experiment.

The reading list, adapted from Nielsen (2011), included 80 English words: 40 /p/-initial targets (10 monosyllabic, 20 disyllabic, 10 trisyllabic) and 40 sonorant-initial fillers (15 monosyllabic, 12 disyllabic, 13 trisyllabic). Of the target words, 20 were of high frequency and 20 were of low frequency. The two frequency groups were balanced in terms of the number of syllables, neighborhood density, and familiarity (see Nielsen, 2011 for more details, including the thresholds for low- and high-frequency words). All target words had initial stress, while the fillers varied in their stress patterns.

For the shadowing stimuli, 30 nonwords were generated by the Wuggy program (Keuleers and Brysbaert, 2010), an orthography-based pseudoword generator. Of the 30 nonwords, 20 began with /p/ (five monosyllables, 10 disyllables, and five trisyllables), and 10 began with a sonorant (three monosyllables, five disyllables, and two trisyllables). Some of the nonwords were not phonological nonwords, such as parswer (homophonous with parser). All nonwords conformed to English phonotactics and spelling conventions, as confirmed by 12 native speakers of American English (different individuals from the main participants).

A phonetically trained male native speaker of American English (age = 26 years) served as the model speaker. The speaker saw the orthographic nonwords and was instructed to produce them naturally at a normal speaking rate. The speaker produced each nonword multiple times, of which the best token (free of unintended noises, self-corrections, or deviant prosody) was selected. All nonwords, including fillers, were produced with initial stress. The selected token of each /p/-initial nonword was then manipulated to extend the VOT of the initial /p/ by copying and pasting medial portions of the aspiration. To bring the final extended VOT to around 120 ms without any robotic sounds or clipping noises, the manipulation often involved copying and pasting multiple portions of varying duration. The mean VOT for the initial /p/ was 54.63 ms [standard deviation (SD) = 9.15 ms] before manipulation and 119.22 ms (SD = 4.36 ms) after manipulation.

Each participant completed two experiments: an imitation experiment followed by a recognition memory test. The participants were tested individually in a sound-attenuated booth in the George Mason University Phonetics and Phonology Laboratory. They were seated in front of a laptop computer, wearing Sennheiser HD 280 Pro headphones, with a Røde smartLav+ microphone attached to their shirts close to their upper chest. Presentation of stimuli was implemented in PsychoPy3 (Peirce, 2007), with the auditory stimuli presented via the headphones. Participants' productions were recorded onto a separate MacBook Pro using Praat (Boersma and Weenink, 2021) and a Focusrite Scarlett Solo audio interface (sampling rate: 44.1 kHz).

The imitation experiment used a modified version of the word-naming imitation paradigm (Babel, 2012; Kwon, 2019; Nielsen, 2011), and consisted of four blocks: (1) warm-up reading, (2) baseline reading, (3) nonword shadowing, and (4) post-shadowing reading. (1) In the warm-up block, the words in the reading list were randomly presented on the laptop screen, one at a time, every 2 s, and the participants were instructed to read them silently without pronouncing them. This block was included to prevent hyper-articulation on the first encounter of the words (Nielsen, 2011). (2) In the baseline reading block, the same words were presented in a different random order, and the participants were instructed to read the words aloud, as clearly and naturally as possible. (3) In the nonword shadowing block, the participants heard the nonword stimuli, which included /p/s with extended VOT, and were instructed to say what they heard quickly and clearly. The nonwords' orthography appeared on the laptop screen while the participants heard the nonwords so that the participants could experience them as nonwords. They were told that they would hear some spoken items and see their spellings on the screen. The 30 nonwords were repeated eight times, each time in a different random order. The inter-stimulus interval was 2 s, and the shadowing block lasted about 8 min. (4) The post-shadowing reading was conducted in the same way as the baseline reading block, in which the participants read the same English words in a different random order. The participants' productions during the baseline, shadowing, and post-shadowing blocks were recorded, but the focus of the current study is the VOT changes from the baseline to post-shadowing productions.

To determine whether the participants' memory of the nonwords would correlate with the VOT changes in their post-shadowing productions of English words, the participants completed a recognition memory test after the imitation experiment. In the memory test, the participants saw nonwords in orthography (without hearing them) and were asked to judge whether they had seen each nonword during the experiment. The test included 30 trials, half of which were randomly selected from the nonwords presented during the shadowing block, and the other half were novel nonwords that did not appear during the imitation experiment. Every participant was tested on the same set of nonwords. Both the nonwords from the shadowing list and the newly added ones included three monosyllabic, nine disyllabic, and three trisyllabic items, of which 10 were /p/-initial and five were sonorant-initial. The participants were not informed of this memory test prior to the imitation experiment. The memory test was self-paced, but no one spent more than 5 min completing the task. No feedback was provided during or after the memory test.

After the imitation experiment and the memory test, the participants completed a language background questionnaire and reported their birthplace, other places they had lived, their native language(s), and any additional languages spoken. This was to ensure the participants were monolingual native speakers of American English. The entire procedure took approximately 25 min for each participant. Participants were allowed to take a short break between blocks and tests, but no one rested more than 1 min.

To quantify the phonetic changes in English words before and after hearing the nonword stimuli, we measured VOT of initial /p/ in the participants' baseline and post-shadowing productions. Prior to taking measurements, 22 tokens were discarded: tokens that were not the target words, had partial repetitions (including self-corrections), or included extra-verbal interruptions, such as coughing, yawning, or clearing the throat, as well as words in the reading list whose counterparts were missing in either the baseline or the post-shadowing block. VOT of initial /p/ in the remaining target words (n = 1898) was measured in Praat (Boersma and Weenink, 2021), from the beginning of the release burst to the beginning of the glottal pulsing in the waveform and/or the appearance of a voicing bar in the spectrogram. Word duration was also measured to calculate the ratio between the VOT and the REST of the word duration (REST = word duration – VOT).
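The REST and ratio measures defined above can be sketched as follows. The function name and the token values are hypothetical and serve only to show how the ratio behaves; they are not measurements from the study.

```python
def vot_ratio(vot_ms, word_duration_ms):
    """Return VOT / REST, where REST = word duration - VOT (all in ms)."""
    rest_ms = word_duration_ms - vot_ms
    if rest_ms <= 0:
        raise ValueError("word duration must exceed VOT")
    return vot_ms / rest_ms


# Hypothetical baseline token: 70 ms VOT in a 500 ms word.
baseline_ratio = vot_ratio(70.0, 500.0)

# The same token produced 10% more slowly overall: VOT and word duration
# scale together, so the ratio stays the same.
slowed_ratio = vot_ratio(70.0 * 1.1, 500.0 * 1.1)

# Only the VOT is lengthened by 10 ms: the ratio increases.
lengthened_ratio = vot_ratio(80.0, 510.0)

print(f"baseline:   {baseline_ratio:.3f}")
print(f"slowed:     {slowed_ratio:.3f}")
print(f"lengthened: {lengthened_ratio:.3f}")
```

The comparison shows why a ratio can separate VOT-specific lengthening from a uniform change in speech rate: proportional slowing leaves the ratio unchanged, whereas lengthening the VOT alone raises it.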

The results of the memory test were quantified by converting the participants' responses to the yes–no question (i.e., “Have you seen this item during this experiment?”) into d-prime (d') scores using the dprime function in the psyphy package (Knoblauch, 2022) for R (R Core Team, 2020).
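For readers unfamiliar with d', the following minimal sketch computes it from yes–no response counts as z(hit rate) − z(false-alarm rate). It is not the psyphy implementation. The log-linear correction (adding 0.5 to each cell) is an assumption made here to avoid infinite z-scores at rates of 0 or 1; the source does not state how extreme rates were handled. The response counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for a yes-no recognition task.

    Assumption: a log-linear correction (+0.5 per cell) keeps the
    hit and false-alarm rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 15 old and 15 new nonwords, as in the test design.
example_d = d_prime(hits=13, misses=2, false_alarms=2, correct_rejections=13)

# A participant responding at chance has d' of zero.
chance_d = d_prime(hits=8, misses=7, false_alarms=8, correct_rejections=7)

print(f"example d': {example_d:.2f}")
print(f"chance d':  {chance_d:.2f}")
```

Higher values indicate better discrimination between shadowed and novel nonwords, independent of a participant's overall tendency to answer "yes".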

The descriptive statistics for VOT values in the baseline, shadowing, and post-shadowing blocks are in Table 1. The d' scores in the memory test ranged from 1.01 to 3.73 (mean = 2.52, median = 2.64, SD = 0.68).

Table 1.

Mean VOT, REST, and RATIO in baseline, shadowing, and post-shadowing blocks. The numbers in parentheses are SD.

Block            Mean VOT (ms)   Mean REST (ms)   Mean RATIO (VOT/REST)
Baseline         71.5 (19.0)     429.0 (102.5)    0.17 (0.06)
Shadowing^a      74.6 (20.6)     458.0 (102.2)    0.17 (0.06)
Post-shadowing   76.2 (20.2)     424.2 (98.7)     0.19 (0.07)
^a The shadowing block involved a different set of (non)words from the baseline and post-shadowing blocks, and thus, the REST and RATIO of the nonwords in the shadowing block cannot be directly compared with those of the real words in the other two blocks.

To investigate whether the mean difference in VOT between the baseline and post-shadowing blocks is significant, and if so, whether the VOT change is mediated by the speaker's memory of the shadowed nonwords and the lexical frequency of the read words, we built a series of linear mixed-effects models. The RATIO (RATIO = VOT/REST) was included as the dependent variable, instead of the raw VOT values, to determine whether the increase in VOT, if any, was specific to VOT, rather than due to overall changes in speech rate (i.e., slowing down). The initial model included BLOCK (baseline vs post-shadowing), the word FREQuency (high vs low), the number of SYLLables (1, 2, 3), MEMory (d' score), and their interactions as the predictors. SYLL was included as it was expected to systematically influence the RATIO. All other predictors were directly related to the research questions. Interaction terms were retained only when they significantly improved the model fit based on a likelihood ratio test (p < 0.05), using lrtest() in the lmtest package (Zeileis and Hothorn, 2002). The variables BLOCK and FREQ were treatment-coded, with the reference levels for the intercepts set to baseline and low, respectively. SYLL, which was an ordinal variable, was coded using the orthogonal polynomial coding scheme. MEM was scaled and centered at 0. In addition, by-item and by-participant random intercepts were included, as well as by-item random slopes for BLOCK and by-participant random slopes for BLOCK, FREQ, and SYLL. Adding more random slopes resulted in model-fitting errors.
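The likelihood ratio criterion for retaining an interaction term can be sketched as below. This is not the lrtest() implementation: it only reproduces the underlying arithmetic, using closed-form chi-square tail probabilities that hold for 1 and 2 degrees of freedom. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_upper_tail(stat, df):
    """P(X >= stat) for a chi-square variable; closed forms for df = 1 or 2 only."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare nested models; df is the number of extra parameters in the full model."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_upper_tail(stat, df)


# Hypothetical log-likelihoods for a model without and with one interaction term.
lr_stat, lr_p = likelihood_ratio_test(loglik_reduced=-1234.5, loglik_full=-1231.0, df=1)
retain_interaction = lr_p < 0.05

print(f"chi-square(1) = {lr_stat:.3f}, p = {lr_p:.4f}, retain = {retain_interaction}")
```

In this scheme, a term is kept only when the fuller model's gain in log-likelihood is large enough that the resulting p value falls below 0.05.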

The only interaction term retained in the final model was the interaction between BLOCK and MEM, whose inclusion significantly improved the model fit [χ²(1) = 6.522, p = 0.010]. To further understand this significant interaction, a post hoc analysis was conducted using emtrends() in the emmeans package (Lenth, 2022). This post hoc analysis revealed, as plotted in Fig. 1, that the influence of MEM on RATIO significantly differed between the baseline and the post-shadowing blocks [β = −0.011, t(25.6) = 2.461, p = 0.020], although neither slope was significantly different from zero [baseline: β = −0.004, t(24.6) = 0.838, p = 0.204; post-shadowing: β = 0.008, t(24.9) = 1.544, p = 0.932]. This outcome indicates that the relation between MEM and RATIO became more positive from the baseline to the post-shadowing block, suggesting that the participants who performed better in the memory test were more likely to lengthen their VOT in the post-shadowing block from their baseline VOT than those who remembered fewer nonwords.

Fig. 1.

The predicted ratios in two blocks (baseline vs post-shadowing) based on memory scores. Shaded areas indicate 95% confidence interval (CI).

The final model also revealed a significant linear effect of SYLL [β = −0.066, t(44.745) = 6.870, p < 0.001], indicating a significant linear relation between SYLL and RATIO. As shown in Fig. 2, the words with more syllables, presumably with a longer REST duration, were more likely to be associated with smaller RATIO values, as expected. However, the SYLL:BLOCK interaction was not significant [χ²(2) = 4.964, p = 0.083], indicating that the significant linear effect of SYLL was consistent across the baseline and post-shadowing blocks.

Fig. 2.

The predicted ratio based on the number of syllables. The error bars indicate 95% CI.

The effect of FREQ was not significant [β = 0.002, t(41.449) = 0.286, p = 0.776]. This suggests that RATIO was not significantly different between high- and low-frequency groups, as shown in Fig. 3. More relevant to our research question, FREQ:BLOCK was not significant either [χ²(1) = 0.55, p = 0.458], suggesting that the VOT change from baseline to post-shadowing was not related to FREQ.

Fig. 3.

The relation between predicted ratio and word frequency. The error bars indicate 95% CI.

To summarize, after shadowing nonwords beginning on /p/ with longer VOT, English speakers produced longer VOT in real English words than they did in the baseline block. This increase in VOT was not due to the change in speech rate (i.e., slowing down) because the ratio (VOT/REST), not the raw VOT measures, increased. This change in VOT ratio, presumably induced by shadowing nonwords with longer VOT, was related to the participants' memory scores such that those who remembered nonwords more accurately were likely to lengthen their VOT to a greater degree from their own baseline productions. On the other hand, the VOT changes were not related to the frequencies of the produced words: after shadowing nonwords (with their frequencies controlled), speakers did not significantly differ in how much they shifted their VOT productions between lower frequency words and higher frequency words.

This study investigated phonetic changes observed in reading productions of English words after hearing and shadowing nonwords with a particular phonetic pattern. We conducted an imitation experiment that tested whether shadowing English-like nonwords beginning on /p/ with longer VOT induced VOT lengthening in subsequent reading productions of English words, followed by a recognition memory test that examined how accurately the participants remembered the nonwords. We asked the following three questions: (1) whether speakers lengthened their /p/ VOT when they read English words after shadowing English-like nonwords beginning with /p/ with extended VOT, (2) whether the extent of VOT change in English words, if any, was affected by the lexical frequency of the words, and (3) whether the participants who remembered the nonwords better also changed their VOT more toward the nonwords.

Regarding the first question, our participants significantly lengthened their VOT of word-initial /p/ after shadowing nonwords with extended VOT on initial /p/, even when there was no overlap between the (non)words produced during the shadowing block and those in the post-shadowing reading block. The speakers generalized the atypical, and potentially salient, phonetic pattern from nonwords to real English words sharing the initial phoneme /p/ throughout the lexicon. This convergence to the phonetic pattern in the shadowed nonwords corroborates the claim that spontaneous imitation involves abstract phonological categories (Kwon, 2019; Nielsen, 2011), and further demonstrates that imitative changes (i.e., VOT lengthening) are generalized to delayed, non-shadowing productions that are not only temporally but also lexically remote from the shadowing targets.

The second question asked whether the lexical frequency of the words being produced (but not heard during the experiment) would influence the degree of generalized imitative changes (here, VOT lengthening). The current data did not reveal a significant difference in VOT changes between low- and high-frequency words (see Fig. 3). This outcome suggests that the effect of lexical frequency in speech imitation is tied to the words that are heard, or in the exemplar model's terminology, the words whose exemplars are activated during the course of auditory perception. If the frequency effects (i.e., the amount of exposure determines the imitation fidelity) arise exclusively from the activation of word-size exemplars, the frequency effects are expected to be tied strictly to the changes in the (non)words of exposure. For instance, exposure to a nonword, such as pude with hyper-aspiration [pʰʰʰud], would activate all the stored instances of [pʰʰʰ] associated with any words in a speaker's lexicon, or exemplar space. When the speaker produces an English word, such as peace /pis/, the speaker produces a longer VOT for the initial /p/ because the recently activated exemplars of peace are those with hyper-aspirated instances of /p/. Crucially, the exemplars of peace (or any other words in the reading list) that are not hyper-aspirated are activated to a far weaker extent than the hyper-aspirated ones and, thus, these non-hyper-aspirated exemplars (however many of them exist) would not obscure or attenuate the contribution of the hyper-aspirated exemplars to imminent productions. Therefore, the size of the exemplar clouds of the individual words that are only produced (without being heard) no longer matters. This is why the lexical frequency effects disappear when the imitators produce words that are different from the (non)words that they are exposed to.

If this is the case, the apparently identical phonetic changes (i.e., VOT lengthening) observed in the words of exposure and those generalized to unheard words arguably arise from distinct mechanisms. (The changes in the words of exposure are those reported in previous studies; we do not test immediate imitation in the current study because we do not know the baseline VOT of the nonwords before the participants heard the VOT-extended model speech.) Only the phonetic shifts in the words of exposure would directly result from the activation of the word-size exemplars, while both processes (reproducing the same words and producing different words sharing the same phoneme) would presumably involve the generalization described above.

Last, with regard to the memory effect on the generalized imitation, the current study revealed a positive correlation between the declarative memory of the nonwords measured in the recognition test and the VOT changes before and after being exposed to the nonwords. Even when the post-shadowing productions are not only temporally but also lexically remote from the shadowing targets, the speakers who remembered the nonwords more accurately seem to change their productions to greater extents toward the nonwords. This suggests that, unlike individual speakers' working memory capacity (Yu et al., 2013), their memory-related performance during the experiment is related to the degree of generalized imitative changes. The generalized imitative changes observed in the current study presumably involve phonological processing to greater extents than comparable changes in immediate shadowing (as in Goldinger, 1998, for example) or delayed productions of the same words (e.g., Nielsen, 2011; Yu et al., 2013). If so, the current finding seems to be consistent with the previous studies showing that declarative memory is primarily responsible for phonological and lexical learning (e.g., Arthur et al., 2021; Ullman, 2004).

However, we acknowledge that the memory scores in the current study may reflect not (only) the participants' verbal memory but also other executive functions, such as attention control or the ability to deal with (ir-)relevant information. In other words, there might exist an untested individual trait that potentially gives rise to both better performance in the nonword recognition test and greater phonetic changes in English words converging toward the shadowed nonwords. Further studies could use a more refined way to measure memory and attention, as well as their influence on spontaneous imitation and its generalization.

In sum, we observed that a salient phonetic pattern (i.e., extended VOT) is generalized from nonwords to reading productions of real words sharing the same initial phoneme. The degree of the convergent phonetic change generalized from nonwords to unheard words was positively associated with imitators' declarative memory of the nonwords. However, the effects of lexical frequency were not observed when the imitators were not exposed to the auditory forms of the words during the experiment. The findings support the claim that phonetic drifts arising from perceiving a particular phonetic pattern involve a phoneme-(or feature-)level abstraction (Kwon, 2019; Nielsen, 2011), and further suggest that the phonetic changes observed in the lexically remote productions of phonologically related words, although they involve apparently identical changes observed in the imitation of heard (non)words, do not directly stem from the activation of word-size exemplars.

See the supplementary material for the reading list; the list of the shadowed nonwords; the list of nonwords in the memory test; the output of the reported regression model and the output of the post hoc comparisons.

This research was supported by the New Faculty Startup Fund from Seoul National University to H.K. and by the Linguistic Program at George Mason University to Y.W. The authors thank the editor and the reviewer for the insightful comments.

The authors have no conflicts of interest to declare.

All recruitment and experimental procedures for this study were reviewed and approved by the institutional review board at George Mason University.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Arthur, D. T., Ullman, M. T., and Earle, F. S. (2021). “Declarative memory predicts phonological processing abilities in adulthood,” Front. Psychol. 12, 658402.
2. Babel, M. (2012). “Evidence for phonetic and social selectivity in spontaneous phonetic imitation,” J. Phon. 40(1), 177–189.
3. Boersma, P., and Weenink, D. (2021). “Praat: Doing phonetics by computer (version 6.1.38) [computer program],” http://www.praat.org.
4. Goldinger, S. D. (1998). “Echoes of echoes? An episodic theory of lexical access,” Psychol. Rev. 105(2), 251–279.
5. Johnson, K. (1997). “Speech perception without speaker normalization: An exemplar model,” in Talker Variability in Speech Processing, edited by K. Johnson and J. Mullennix (Academic Press, San Diego), pp. 145–165.
6. Johnson, K. (2006). “Resonance in an exemplar-based lexicon: The emergence of social identity and phonology,” J. Phon. 34(4), 485–499.
7. Kane, M. J., Bleckley, M. K., Conway, A. R., and Engle, R. W. (2001). “A controlled-attention view of working-memory capacity,” J. Exp. Psychol. Gen. 130(2), 169–183.
8. Keuleers, E., and Brysbaert, M. (2010). “Wuggy: A multilingual pseudoword generator,” Behav. Res. Meth. 42, 627–633.
9. Knoblauch, K. (2022). “psyphy: Functions for Analyzing Psychophysical Data in R,” The Comprehensive R Archive Network (r package version 0.2-3), available at https://CRAN.R-project.org/package=psyphy.
10. Kwon, H. (2019). “The role of native phonology in spontaneous imitation: Evidence from Seoul Korean,” Lab. Phonol. 10(1), 10.
11. Lenth, R. V. (2022). “emmeans: Estimated marginal means, aka least-squares means” (r package version 1.8.2), available at https://CRAN.R-project.org/package=emmeans.
12. Nielsen, K. (2011). “Specificity and abstractness of VOT imitation,” J. Phon. 39(2), 132–142.
13. Peirce, J. W. (2007). “Psychopy–psychophysics software in Python,” J. Neurosci. Meth. 162(1–2), 8–13.
14. Pierrehumbert, J. B. (2001). “Exemplar dynamics: Word frequency, lenition and contrast,” Typol. Stud. Lang. 45, 137–158.
15. Shockley, K., Sabadini, L., and Fowler, C. A. (2004). “Imitation in shadowing words,” Percept. Psychophys. 66, 422–429.
16. R Core Team (2020). “R: A language and environment for statistical computing,” R Foundation for Statistical Computing.
17. Tilsen, S. (2009). “Toward a dynamical interpretation of hierarchical linguistic structure,” UC Berkeley PhonLab Annu. Rep. 5(5), 462–512.
18. Ullman, M. T. (2004). “Contributions of memory circuits to language: The declarative/procedural model,” Cognition 92(1–2), 231–270.
19. Unsworth, N., Heitz, R. P., Schrock, J. C., and Engle, R. W. (2005). “An automated version of the operation span task,” Behav. Res. Meth. 37(3), 498–505.
20. Wade, L., Lai, W., and Tamminga, M. (2021). “The reliability of individual differences in VOT imitation,” Lang. Speech 64(3), 576–593.
21. Yu, A. C., Abrego-Collier, C., and Sonderegger, M. (2013). “Phonetic imitation from an individual-difference perspective: Subjective attitude, personality and ‘autistic’ traits,” PloS One 8(9), e74746.
22. Zeileis, A., and Hothorn, T. (2002). “Diagnostic checking in regression relationships,” R News 2(3), 7–10, available at https://journal.r-project.org/articles/RN-2002-018/.

Supplementary Material