The perceptual boundary between short and long categories depends on speech rate. We investigated the influence of speech rate on perceptual boundaries for short and long vowel and consonant contrasts by Spanish–English bilingual listeners and English monolinguals. Listeners tended to adapt their perceptual boundaries to speech rates, but the strategy differed between groups, especially for consonants. Understanding the factors that influence auditory processing in this population is essential for developing appropriate assessments of auditory comprehension. These findings have implications for the clinical care of older populations whose ability to rely on spectral and/or temporal information in the auditory signal may decline.

Casual speech includes different rates of speaking, which listeners must accommodate in order to comprehend the message. Little research exists on how bilinguals accommodate various speech rates, especially in their non-native language. Non-native (L2) listeners may have more difficulty understanding speech at faster rates because their speech perception is less automatic than that of native listeners (Segalowitz, 2003; Strange, 2011) and relies to a greater degree on bottom-up processing than listening in the native, or first-learned (L1), language (Medina et al., 2020; Vandergrift, 2003). For example, in a study of L1 Arabic-L2 Hebrew bilinguals, accelerated speech rates resulted in a greater decrement in speech perception in the L2 than in the L1 (Rosenhouse et al., 2006). However, a study of L2 English consonant perception by L1 Japanese speakers reported no effect of speech rate on performance (Guion et al., 2000). Studies focusing on L2 listening comprehension have also reported that faster speech rates negatively affect performance (e.g., Conrad, 1989; Griffiths, 1990; Medina et al., 2020; but see Griffiths, 1992). In addition to the over-reliance on bottom-up processing during incoming speech, L2 listeners may have less developed lexical systems, making retrieval of incoming words and phrases slower or more effortful (Medina et al., 2020). Thus, L2 listening is more cognitively demanding, leaving fewer available cognitive resources to flexibly accommodate more difficult listening conditions, such as fast speech rates. This is supported by studies of speech perception in noise, another challenging listening context. Bilinguals perform better under noise conditions in their dominant language than in their non-dominant language (Rogers et al., 2006; Weiss and Dempsey, 2008), and monolinguals often perform better than highly proficient bilinguals who are listening in their dominant language (Mayo et al., 1997; Rogers et al., 2006). 
Thus, non-ideal listening conditions such as noise or fast speech rates may disproportionately affect speech perception for bilinguals compared to monolinguals (Garcia Lecumberri et al., 2010).

It is not necessarily bilingualism that contributes to worse performance but rather experience using the acoustic-phonetic properties of their native language. Bilinguals' language experiences are highly varied, with differences among individuals in age and context of acquisition, context and frequency of use, proficiency gained (and, in some cases, lost), and which language is dominant (or most proficient) across the lifespan, among other variables (Takahesu Tabori et al., 2018). Thus, bilinguals with different L1s would be expected to perform better or worse on some aspects of L2 speech perception depending on whether the features of their L1 are present or absent in their L2. For example, Japanese uses temporal cues to differentiate word meanings, whereas Spanish exclusively uses spectral distinctions. English employs durational information as a secondary cue, combined with vowel quality, formant frequency, intensity, and/or fundamental frequency. For example, vowels in English can be distinguished by intrinsic durational differences, with longer durations for vowels such as /ɑ/ in “hot” and /æ/ in “hat,” and shorter durations for vowels such as /ʌ/ in “hut.” English also utilizes durational cues to indicate stress differences. For instance, stressed vowels in words such as “PERmit” and “perMIT” differ not only in pitch contour but also in duration. Both Japanese and English use temporal cues while Spanish does not. Thus, English and Japanese speakers should be able to utilize durational information in unfamiliar linguistic contexts better than Spanish speakers. Evidence suggests that this is the case: Late Spanish–English bilinguals demonstrate lower accuracy than English monolingual listeners on English discrimination (Hisagi et al., 2020) and three types (vowel, consonant, syllable) of Japanese temporally-cued contrasts (Hisagi et al., 2022b). 
The authors suggest that Spanish–English bilinguals are less able to make use of durational cues in speech because duration is not an intrinsic property of their L1 sound system. Thus, depending on the properties of the L1, bilinguals may utilize certain auditory processing strategies (such as durational cues) in the L2 or in novel linguistic contexts.

A model that has been used to explain why non-native speech perception can often be more difficult, particularly for late bilinguals, is the Automatic Selective Perception model (Strange, 2011). According to this model, listeners develop Selective Perceptual Routines (SPRs) for their L1 that make L1 speech comprehension efficient and automatic by attending to relevant perceptual-acoustic cues (e.g., duration, tone, and spectral cues) (Strange and Shafer, 2008; Strange, 2011). These SPRs reduce the amount of cognitive effort needed to comprehend speech in that language. L2 listeners, on the other hand, do not have SPRs in their L2, but rather they tend to use their L1 SPRs to comprehend L2 speech. When the perceptual-acoustic features for the two languages overlap, these SPRs are beneficial. For example, both English and Japanese show durational variation in their vowels, though they differ somewhat in that these durational distinctions are contrastive in Japanese but not in English. This overlap may support L2 speech perception among these listeners (both L1 English listeners perceiving L2 Japanese and L1 Japanese listeners perceiving L2 English). For example, L1 English speakers performed well on a vowel-length discrimination task in Japanese after a brief training (Hisagi and Strange, 2011). In another study, L1 Japanese speakers discriminated English vowel contrasts that differ temporally better than vowel contrasts that do not differ temporally (Strange et al., 2011). On the other hand, when L2 listeners’ auditory processing is not attuned to certain critical features of the L2 (e.g., consonant or vowel duration) or when L2 auditory processing leads to misperceptions (e.g., assimilating an L2 phoneme to an L1 category), listeners are at a disadvantage. The result is greater cognitive effort for L2 listeners as they attempt to use top-down processes, such as lexical-semantic content and pragmatic context, to disambiguate speech. 
Furthermore, Garcia Lecumberri et al. (2010) suggest that bilingual listeners are more likely to fall back on L1 SPRs in more challenging listening conditions.

Bilinguals may use different strategies for processing speech than monolinguals, especially in their L2 (e.g., Bosker and Reinisch, 2017; Flege et al., 2021), and these strategies can be affected by speech rate (e.g., Miller and Volaitis, 1989; Pind, 1995; Newman and Sawusch, 1996; Kato and Tajima, 2004). Speech rate normalization (also known as adaptation) occurs when a perceptual boundary is based on the duration of the phonemic segment relative to the rest of the word or phrase. In other words, the category boundary is maintained despite variations in speech rate. Adaptation appears to be an obligatory process that happens in the early phase of perception for native listeners (Kawahara et al., 2022). By contrast, counter-adaptation occurs when the perceptual boundary is based on the absolute duration of the phonemic segment. In other words, the category boundary will be affected by speaking rate, with more sounds being perceived as “short” in fast speech and more sounds being perceived as “long” in slow speech. Non-native listeners may not be able to adapt their perceptual boundaries according to speech rate and may therefore show a counter-adaptation approach. Bosker and Reinisch (2017) examined the effect of speech rate on categorical perception of an unfamiliar language (Dutch) by native German listeners and found that when listeners encountered an ambiguous vowel between short /a/ and long /a:/, their perception of it as either /a:/ or /a/ was influenced by the speed of the surrounding speech, reflecting a counter-adaptation approach. Interestingly, their other participant group, Dutch listeners responding to unfamiliar German stimuli, showed more of an adaptation approach, despite the authors' predictions that they would show a counter-adaptation approach, like the native German listeners. The differences between the groups may have to do with degree of familiarity with the non-native language. 
German listeners were generally less familiar with Dutch than Dutch listeners were with German. Moreover, among the German listeners, more familiarity with Dutch was associated with more of an adaptation approach to categorical perception in Dutch. These findings highlight the different approaches to native and non-native listening strategies as well as the role of proficiency or familiarity in the non-native language.
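The distinction between the two approaches can be illustrated with a minimal sketch. The durations, the 150-ms absolute boundary, and the 0.35 relative boundary below are hypothetical values chosen only for illustration; they are not taken from any of the studies cited here, and the function names are our own.

```python
def classify_absolute(segment_ms, boundary_ms):
    """Counter-adaptation: compare the raw segment duration to a fixed boundary."""
    return "long" if segment_ms >= boundary_ms else "short"


def classify_relative(segment_ms, word_ms, boundary_ratio):
    """Adaptation: compare the segment's share of the word to a fixed ratio."""
    return "long" if segment_ms / word_ms >= boundary_ratio else "short"


# Hypothetical example: the same vowel occupies 40% of the word at every rate,
# so both the word and the vowel shrink as speech gets faster.
words_ms = {"slow": 500.0, "normal": 400.0, "fast": 300.0}
vowel_share = 0.40

absolute = {}
relative = {}
for rate, word_ms in words_ms.items():
    vowel_ms = vowel_share * word_ms
    absolute[rate] = classify_absolute(vowel_ms, boundary_ms=150.0)
    relative[rate] = classify_relative(vowel_ms, word_ms, boundary_ratio=0.35)

print(absolute)  # the absolute criterion flips to "short" at the fast rate
print(relative)  # the relative criterion gives the same label at every rate
```

Under the absolute criterion the label changes with rate even though the vowel's proportion of the word is constant, which is the pattern described above as counter-adaptation.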

Wilson et al. (2005) explored the long/short perceptual boundary for Japanese contrasts at three different speech rates (fast, normal, and slow) in two conditions, a blocked condition, where all stimuli were at the same rate, and a mixed-rate condition, which included all three rates. Their stimuli included non-words with vowel contrasts presented in the context of a carrier sentence and in isolation. They explored which strategy, an adaptation or a counter-adaptation strategy, listeners used to determine whether the stimuli sounded short or long using a single-stimulus two-alternative forced choice identification task. Native English listeners performed more poorly at faster rates and in the mixed-rate condition compared to native Japanese listeners. When the stimulus was presented in isolation, English listeners demonstrated a stronger counter-adaptation trend compared to native Japanese speakers. Most notably, the English listeners showed higher performance when the short vowels were presented at faster rates and when the long vowels were presented at slower rates. This reflects their reliance on a counter-adaptation strategy, which occurs when the perceptual boundary is based on the absolute duration of the phonemic segment. Thus, listeners whose language includes durational cues to distinguish phonemes tend to rely on relative duration (i.e., the segment duration is relative to the rest of the word or sentence), while listeners who are not skilled at using duration cues to distinguish phonemes tend to rely on absolute duration (i.e., the segment duration is independent of the duration of the word or carrier sentence). At faster rates, the number of “short” vowels (in terms of actual duration) increases while at slow rates the number of “long” vowels increases. By contrast, native Japanese listeners engaged in an adaptation strategy, which occurs when the perceptual boundary is based on the duration of the phonemic segment relative to the rest of the word or phrase. 
Thus, Japanese listeners are able to reliably hear length contrasts regardless of the speech rate. Kawahara et al. (2022) investigated the influence of speech rate through surrounding context using singleton-geminate stop contrasts and short-long vowel contrasts in Japanese. Their results indicated that native listeners' durational cue generalization (i.e., short or long) stemmed from their rate-based adjustments to different talkers' speech regardless of the type of speech stimuli (e.g., silent closure duration or vowel duration). They concluded that listeners' rate-based adjustments are independent of talkers. Namely, regarding the general auditory normalization process, the rate information precedes the talker's characteristics (e.g., Bosker, 2017; Kingston et al., 2009; Sawusch and Newman, 2000; Wade and Holt, 2005).

SPRs are thought to develop and become entrenched during early stages of language development. Thus, bilinguals who learn both languages in early childhood should in theory develop SPRs for each of their languages. However, there is very little research on how early bilinguals develop SPRs. Many early bilinguals in the U.S. learn one language primarily at home (often called the “heritage language”) and another in school and the broader society. Bilinguals are often imbalanced in their proficiency in each language due to differences in how often and in which contexts they use them. It is common for early bilinguals to develop higher proficiency in the heritage language in their first few years but then to switch their language dominance to the societal language within a few years of starting school, where the societal language is typically the only one supported (Birdsong, 2014). One question guiding our research is whether early bilinguals use L1 or L2 SPRs during speech categorization in an unfamiliar language. In this study, we focused on early Spanish–English bilinguals who grew up in a primarily Spanish-speaking home environment but who were more dominant in English at the time of testing. Thus, we sought to determine whether these Spanish–English bilinguals would rely on English SPRs to distinguish durational contrasts in an unfamiliar language (Japanese) or whether they would rely more on Spanish SPRs, despite their switched language dominance. Hisagi et al. (2022a) found that early bilinguals performed similarly to monolingual English listeners on nine out of ten auditory tests in English, including tests of spectro-temporal pattern processing and speech intelligibility and comprehension in quiet and in noise, with the only group difference appearing for speech comprehension in noise. This suggests that the early age of L2 acquisition supports the development of native-like SPRs in the L2, particularly among early bilinguals with switched dominance. 
Additional evidence has shown that L1 SPRs may not be maintained in adulthood in the context of switched language dominance (von Hapsburg et al., 2004; Weiss and Dempsey, 2008), suggesting that L2 SPRs have become dominant.

The current study examined whether early bilinguals with switched dominance rely more on their L1 (first-learned) or their L2 (second-learned) SPRs when processing novel speech sounds at different speech rates. If early Spanish–English bilinguals rely on L2 SPRs (English, their dominant language), this would give them an advantage in processing temporal contrasts since English also uses duration as an acoustic-phonetic cue. If, however, they rely on L1 SPRs (Spanish, their first-learned language), they may not be sensitive to temporal cues. Furthermore, as suggested by Garcia Lecumberri et al. (2010), bilinguals may resort to more automatic L1 SPRs in non-ideal listening conditions. Thus, we may see that bilinguals rely on L2 SPRs in the easier listening condition(s) and L1 SPRs in the more difficult listening condition(s). Bilinguals were only tested in English (their L2 and current dominant language), and their performance was compared to a group of American English monolinguals and a group of Japanese controls. We also compared these groups on whether they showed an adaptation or counter-adaptation approach to durational perception.

The stimuli included Japanese words with temporally-contrasting vowels and consonants presented in either fixed-rate blocks (the same speech rate for an entire block of stimuli) or mixed-rate blocks (varying speech rates). Differences in vowel length hold linguistic significance in English, but not in Spanish. For consonants, duration has no linguistic significance in either English or Spanish. We hypothesized that Japanese controls would show minimal effects of speech rate while both monolinguals and bilinguals would show a detriment at faster speech rates, particularly for consonants, which do not align with their SPRs. Moreover, given their switched language dominance, if bilinguals rely on their L1 (Spanish) SPRs, we predict that monolinguals would have clearer boundaries than bilinguals because English SPRs include using duration as a secondary cue. Finally, we predicted that stimuli presented in fixed-rate blocks would be easier for all groups than stimuli presented in mixed-rate blocks, which we examined through visual inspection. To our knowledge, this is the first study to compare the effect of speech rate on the perception of temporal cues by Spanish–English bilinguals.

A total of 52 participants completed all the tasks, including 20 early Spanish–English bilinguals (EB) (English acquired by age seven with regular exposure by age ten), 24 American English monolinguals (MONO), and eight Japanese (JP) speakers with limited English experience. The Spanish–English bilinguals and American English monolinguals reported no Japanese language experience. The participants, ranging in age from 19 to 35 (mean = 24.77), were residing in the U.S. (MONO and EB) or Japan (JP) at the time of testing. Participants self-evaluated their English proficiency across four domains (speaking, reading, writing, and understanding) using a Likert scale ranging from 0 (not proficient) to 10 (native-like). The average self-rated proficiency in English across the four domains showed no significant difference between MONO and EB. Their language history profile indicates that the bilinguals grew up in a predominantly Spanish environment (Spanish was used on average 89.0% of the time before age five) but eventually became dominant in English (average self-rated proficiency of 7.80 in Spanish compared to 9.11 in English at the time of testing). Table 1 presents descriptive statistics on demographics and language background for each group.

The Japanese-sounding nonword vowels were identical to those used in Wilson et al. (2005), and the consonants were very similar to those used in Sonu et al. (2013) (where the target consonant was /k/ or /s/; the current study used /t/ instead of /k/ or /s/). All stimuli were spoken by a professionally trained, male native Japanese speaker from Tokyo. See Table 2 for detailed information about the stimuli. The stimuli were spoken at three different speech rates: normal, slow, and fast. Target words were presented in isolation. The original recordings were re-synthesized with a duration manipulation in 20-ms steps using the STRAIGHT synthesis program (Kawahara et al., 1999). For vowels, there were 12 durational steps (continua) between −20 and 200 ms, and for consonants, there were ten durational steps (continua) between −20 and 160 ms. All stimuli had six repetitions, three speech rates, and were presented in two conditions (fixed-rate and mixed-rate), resulting in a total of 432 trials for vowels and 360 trials for consonants.
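As a check on the design arithmetic, the continuum steps and trial totals can be reproduced from the values given above. The following is a minimal sketch; the variable and function names are ours, and the step values are simply the stated endpoints spaced 20 ms apart.

```python
def continuum_ms(start_ms, stop_ms, step_ms=20):
    """Return the durational steps from start to stop (inclusive) in step_ms increments."""
    return list(range(start_ms, stop_ms + 1, step_ms))


vowel_steps = continuum_ms(-20, 200)      # 12 steps
consonant_steps = continuum_ms(-20, 160)  # 10 steps

REPETITIONS = 6
RATES = ("slow", "normal", "fast")
CONDITIONS = ("fixed-rate", "mixed-rate")


def n_trials(steps):
    """Steps x repetitions x speech rates x presentation conditions."""
    return len(steps) * REPETITIONS * len(RATES) * len(CONDITIONS)


print(len(vowel_steps), n_trials(vowel_steps))          # 12 432
print(len(consonant_steps), n_trials(consonant_steps))  # 10 360
```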

Participants completed a single-stimulus two-alternative forced-choice identification task in which they heard a stimulus and were asked to decide if the target phoneme sounded short or long. After each trial, the participant was prompted “Does the [x] in the word you just heard sound short or long?” The interstimulus interval (ISI) was 500 ms. Participants were instructed to press “s” on the keyboard if the target stimulus sounded “short” and “l” if it sounded “long.”

Participants completed the experiment online through their web browser, using their own computer, and at a location of their choosing. The listening task was created using PsychoPy and hosted on Pavlovia. After giving consent to participate, participants completed a comprehensive language background questionnaire on Qualtrics. Before beginning the listening task, participants were instructed to use headphones and to test their volume by listening to a test sound and adjusting the volume to a comfortable level.

Participants were given three practice trials for vowels and three for consonants, the same as they would see in the experimental block. The practice trials were given without feedback on accuracy. There were four experimental blocks for each phoneme type, namely, “vowels” or “consonants”: the first three blocks included stimuli with slow, normal, and fast speech rates, and these blocks were presented in random order. A final block included a mixture of all three speech rates. Stimulus order within each block was randomized. The total duration of the experiment was approximately 55 min. The presentation of the vowel blocks and consonant blocks was counterbalanced. The participants were encouraged to take a 2-min break in the middle of the experiment. Upon completion of the study, participants were paid $20.00 USD or the equivalent in yen, or were given extra credit in a course.

Two-way analysis of variance (ANOVA) was used for both between-subject (Group: JP vs MONO vs EB) and within-subject (Rate: slow vs normal vs fast) comparisons. Correction for multiple comparisons was conducted using Tukey's HSD (honestly significant difference). Post hoc tests were conducted to examine the effect of Rate. A significance level of 0.05 was used. The data were first checked for the direction of the boundary. A logistic function was then fitted to each participant's responses, and the boundary position between short and long stimuli was taken as the point where the fitted function crossed the y = 50% line. Boundary position could not be estimated for some participants, for example, when they gave almost all stimuli the same response. Such participants were excluded from the analysis and are reported in more detail in the following section.
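The boundary estimation can be sketched as follows. This is an illustrative implementation only, not the analysis code used in the study: the fitting routine (Newton-Raphson on binomial counts), the small ridge penalty, and the rule of returning no boundary when the 50% crossing falls outside the tested continuum are all assumptions made for this example, as are the function names and the response counts.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_boundary(durations_ms, n_long, n_trials, ridge=1e-3, iterations=100):
    """Return the duration (ms) where fitted P("long") crosses 50%, or None.

    Fits P(long) = sigmoid(a + b * x) to binomial counts by Newton-Raphson.
    The ridge penalty is an assumption added here to keep the fit finite when
    responses are (nearly) all "short" or all "long".
    """
    # Centre and scale durations so the Newton steps are well conditioned.
    centre = sum(durations_ms) / len(durations_ms)
    scale = max(abs(x - centre) for x in durations_ms) or 1.0
    xs = [(x - centre) / scale for x in durations_ms]

    a = b = 0.0
    for _ in range(iterations):
        g_a, g_b = -ridge * a, -ridge * b     # gradient of penalised log-likelihood
        h_aa, h_ab, h_bb = ridge, 0.0, ridge  # negative Hessian
        for x, k, n in zip(xs, n_long, n_trials):
            p = sigmoid(a + b * x)
            w = n * p * (1.0 - p)
            g_a += k - n * p
            g_b += (k - n * p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break

    if b <= 1e-6:  # flat or reversed function: no short-to-long boundary
        return None
    boundary = centre - scale * a / b  # solve a + b * x = 0 in original units
    if not min(durations_ms) <= boundary <= max(durations_ms):
        return None  # 50% crossing falls outside the tested continuum
    return boundary


# Hypothetical listener: counts of "long" responses out of 6 at each vowel step.
steps = list(range(-20, 201, 20))
graded = estimate_boundary(steps, [0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6, 6], [6] * 12)
all_short = estimate_boundary(steps, [0] * 12, [6] * 12)
print(graded, all_short)
```

For the graded hypothetical listener the estimate falls near the step where half of the responses were "long", whereas the listener who answered "short" throughout yields no boundary, mirroring the exclusion case described above.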

We had to exclude some participants from the analysis because their boundary position could not be estimated (5 MONO, 9 EB for fixed-rate blocks, and 1 JP, 10 MONO, and 13 EB for the mixed-rate block).

3.1.1 Fixed-rate blocks

A two-way ANOVA with Group (JP, MONO, EB) as the between-subjects factor and Rate (fast, normal, slow) as the within-subjects factor was carried out using the boundary position as the dependent variable. There was a significant main effect of Rate (p < 0.001) and a significant Group-by-Rate interaction (p = 0.028). Figure 1(A) shows the overall group comparison by rate. Planned post hoc analyses showed that the effect of Rate was significant for the MONO (p < 0.001) and EB (p < 0.01) groups, but not JP. Multiple comparisons using Tukey's HSD showed the following results. For MONO, boundary position was significantly higher at the fast rate than at the slow rate (p < 0.05). For EB, boundary position was significantly higher at the fast rate than at the normal rate (p < 0.05) or the slow rate (p < 0.01). Furthermore, the effect of Group was significant for the fast rate (p < 0.05). Multiple comparisons revealed that boundary position was significantly higher for EB than for JP (p < 0.05), but no significant difference was found between JP and MONO or between MONO and EB. The effect of Group was not significant for the normal and slow rates.

3.1.2 Mixed-rate block

The logistic function used to estimate the category boundary did not cross the y = 50% line for most listeners in the slow-rate condition. Therefore, the analyses for this condition include only the normal and fast rates. A two-way ANOVA with Group (JP, MONO, EB) as the between-subjects factor and Rate (fast, normal) as the within-subjects factor was carried out using the boundary position as the dependent variable. There were significant main effects of Group (p < 0.05) and Rate (p < 0.001) and a significant Group-by-Rate interaction (p < 0.001). Figure 1(B) shows the overall group comparison by rate. Planned post hoc analyses showed that the effect of Rate was significant for the MONO (p < 0.001) and EB (p < 0.01) groups, but not JP. Furthermore, the effect of Group was significant for the fast rate (p < 0.001). Multiple comparisons revealed that boundary position was significantly higher for the MONO (p < 0.01) and EB (p < 0.001) groups than for the JP controls, but no significant difference was found between the MONO and EB groups. The effect of Group was not significant for the normal rate.

Perception of consonant length was unstable and inconsistent for many listeners. When a logistic function was fitted to individual listeners' data, the function did not cross the category boundary (y = 50%) for many listeners. Instead, most listeners' responses were biased toward “short” or “long” responses regardless of the duration of the consonant. Thus, the dependent variable used here is the mean percentage of “long” responses to all stimuli. One JP participant was excluded from the analyses because their responses were predominantly “short” or “long” to almost all stimuli.
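The alternative dependent variable is straightforward to compute from the raw key presses. The sketch below assumes the “s”/“l” response keys described in the task; the listener data and the function name are hypothetical and serve only to show the calculation.

```python
KEY_TO_LABEL = {"s": "short", "l": "long"}  # response keys described in the task


def percent_long(keypresses):
    """Mean percentage of "long" responses over all trials in one condition."""
    labels = [KEY_TO_LABEL[key] for key in keypresses]
    return 100.0 * labels.count("long") / len(labels)


# Hypothetical key presses for one listener, pooled over every continuum step.
listener = {
    "slow": list("llls"),
    "normal": list("llss"),
    "fast": list("lsss"),
}
percent_by_rate = {rate: percent_long(keys) for rate, keys in listener.items()}
print(percent_by_rate)
```

Unlike the boundary position, this measure is defined even for listeners whose responses never cross 50%, at the cost of no longer locating the boundary on the continuum.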

3.2.1 Fixed-rate blocks

A two-way ANOVA with Group (JP, MONO, EB) as the between-subjects factor and Rate (fast, normal, slow) as the within-subjects factor was carried out using the mean percentage of “long” responses as the dependent variable. There was a significant main effect of Rate (p < 0.01), but no significant main effect of Group nor a significant Group-by-Rate interaction. Figure 2(A) shows the overall group comparison by rate. Planned post hoc analyses showed that the effect of Rate was significant for the EB group (p < 0.001), but not for the JP or MONO groups. Multiple comparisons showed that the mean percentage of “long” responses was significantly higher for the EB participants at the slow rate than at the normal rate (p < 0.05) or fast rate (p < 0.01). The effect of Group was not significant for any of the speech rates.

3.2.2 Mixed-rate block

A two-way ANOVA with Group (JP, MONO, EB) as the between-subjects factor and Rate (fast, normal, slow) as the within-subjects factor was carried out using the mean percentage of “long” responses as the dependent variable. There was a significant main effect of Rate (p < 0.001), but no significant main effect of Group nor a significant Group-by-Rate interaction. Figure 2(B) shows the overall group comparison by rate. Planned post hoc analyses showed that the effect of Rate was significant for the MONO (p < 0.001) and EB (p < 0.001) groups, but not JP. Multiple comparisons showed that for the MONO group, the mean percentage of “long” responses was significantly lower at the fast rate than at both the normal rate (p < 0.001) and the slow rate (p < 0.001). Similarly, for the EB group, the mean was significantly lower at the fast rate than at both the normal rate (p < 0.01) and the slow rate (p < 0.001). The effect of Group was not significant for any of the speech rates.

The JP group showed no shift in the boundary position regardless of the rate and presentation conditions (fixed-rate or mixed-rate) for both vowel and consonant contrasts, reflecting an adaptation approach. That is, even though the words containing the target vowel or consonant were generally shorter at faster speech rates and longer at slower speech rates, the JP listeners adjusted the perceptual boundary in accordance with the changes in speech rate, thus resulting in little or no shift in boundary position across speech rates. For vowels, both the MONO and EB groups were affected by speech rate, reflecting a counter-adaptation approach, but more so for the EB group. That is, because the words containing the target vowel were generally shorter at faster speech rates and longer at slower speech rates, the MONO and EB listeners tended to give more “short” responses (resulting in a shift in boundary location to the “long” end of the continuum) at faster speech rates, and they tended to give more “long” responses (resulting in a shift in boundary location to the “short” end of the continuum) at slower speech rates. As for consonants, the EB group showed an effect of Rate in both fixed-rate and mixed-rate blocks while the MONO group showed the effect of Rate only in the mixed-rate block. In general, the impact of speech rate was more pronounced for the EB group compared to the MONO group. This indicates that EB participants consistently employed a counter-adaptation strategy, whereas MONO participants typically exhibited a counter-adaptation approach, except for consonants in the fixed-rate blocks.

The present study investigated whether early Spanish–English bilinguals show similar performance as English monolinguals in perceiving durational contrasts in an unfamiliar language, in order to examine whether these bilinguals rely more on L1 (Spanish) SPRs or L2 (English) SPRs. We hypothesized that if the bilinguals had developed efficient SPRs in their L2 English, the language they were now dominant in, they should perform similarly to the monolinguals. However, even if this were the case for typical listening conditions, under more difficult listening conditions, the bilinguals may resort to their L1 SPRs for Spanish. To test this, we examined the perception of temporally-cued vowel and consonant contrasts in Japanese spoken at three different speech rates (normal, slow, fast). We predicted specifically that (1) the Japanese control group would show no effect of speech rate, revealing an adaptation approach; (2) bilinguals and monolinguals would show variability based on speech rate, revealing a counter-adaptation approach; and (3) the bilinguals would show a similar pattern of categorical perception as the monolingual English listeners, except for the more difficult listening conditions, such as for consonants and at fast or slow speech rates.

As predicted, Japanese controls showed minimal effects of speech rate while both English monolinguals and Spanish–English bilinguals showed variable perceptual boundaries that were influenced by speech rate. Thus, Japanese listeners showed a clear adaptation approach, which occurs when listeners adapt their perceptual boundaries to different speech rates. Unlike the previous study by Wilson et al. (2005), but in line with our predictions, the monolingual group used more of a counter-adaptation approach. This reflects a strategy similar to that of the Japanese controls, whose SPRs are optimized for the type of stimuli presented, compared to the two groups of non-native listeners, whose SPRs are optimized for English vowel durational cues, which are less strongly contrastive than in Japanese.

We found that the early bilinguals generally performed quite similarly to the monolinguals with one exception: bilinguals' perceptual boundaries were more affected by speech rate than those of the other two groups for consonant duration contrasts in the fixed-rate blocks. We hypothesized that bilinguals may lose their ability to apply English SPRs during difficult listening conditions, like fast or slow speech rates (Garcia Lecumberri et al., 2010). We predicted that consonant durational contrasts would be more difficult for non-native listeners than vowel durational contrasts because durational cues are found in English only for vowels. Thus, it appears that the monolingual English speakers were able to perceive durational differences among consonants better than the bilinguals, perhaps making use of more efficient and automatic English SPRs.

Overall, these results support the idea that early bilinguals who experienced a language dominance shift from Spanish to English are generally able to use English SPRs similar to the monolingual English speakers to detect durational differences in novel stimuli. SPRs are thought to develop in early childhood, though the precise age range for optimal development is not clear. Since the average reported age of English acquisition in our sample was 3.9 years, it seems that learning a second language starting at around age four is early enough to develop efficient SPRs in that language. Since these bilinguals were exposed primarily to Spanish at home before the age of ten, it is possible that their English SPRs were not well established while they were dominant in Spanish but that they became more efficient and automatic as their English became more dominant. This would be an interesting area for future research.

Overall, based on visual inspection, vowel contrasts showed less variability than consonant contrasts, and fixed-rate blocks showed less variability than mixed-rate blocks, which was expected. Future research could compare early bilinguals with late bilinguals who are highly proficient in the L2 in order to determine whether early age of acquisition is critical for the development of efficient SPRs in the L2. Another comparison group could be early bilinguals who continued to be dominant in the L1 to examine the effect of switched language dominance. Given that bilingual experiences are highly variable, we caution against generalizing the results from this study to other types of bilinguals who have different language backgrounds.

There were a number of challenges in running this study online. As mentioned in Sec. 3, the perception of consonant length was unstable and inconsistent for many of our listeners, which was unexpected. Sonu et al. (2013) reported that although the perceptual boundaries for consonant contrasts were variable and ambiguous, boundary position and width were still measurable. Online administration of our experiment may have introduced additional variability because neither participants' attention to the task nor their level of motivation could be controlled. For example, we encouraged participants to take a short break between blocks, but some participants appear to have taken a break during a block. We had to exclude a substantial amount of data from the analyses in the vowel condition, particularly data from students who were compensated with class credit, because they did not follow the task instructions properly. For the consonant condition, instead of excluding a large amount of data, we modified the dependent variable from boundary position to mean.
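To make the terms boundary position and boundary width concrete, the following minimal sketch estimates both from identification responses along a duration continuum. It is an illustration only and not the analysis used in this study: the function names (`crossing`, `boundary_stats`), the 25%/75% criterion for width, and the example proportions are all hypothetical, and simple linear interpolation stands in for the curve fitting that a full analysis would more likely use.

```python
from typing import List, Optional, Tuple


def crossing(durations: List[float], p_long: List[float],
             level: float) -> Optional[float]:
    """Duration at which the proportion of 'long' responses first rises
    through `level`, by linear interpolation between adjacent steps.
    Returns None if the curve never rises through that level."""
    for i in range(len(durations) - 1):
        p0, p1 = p_long[i], p_long[i + 1]
        if p1 > p0 and p0 <= level <= p1:
            frac = (level - p0) / (p1 - p0)
            return durations[i] + frac * (durations[i + 1] - durations[i])
    return None


def boundary_stats(durations: List[float],
                   p_long: List[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (position, width): position is the 50% crossover, width is
    the distance between the 25% and 75% crossovers (assumed criterion)."""
    position = crossing(durations, p_long, 0.50)
    lower = crossing(durations, p_long, 0.25)
    upper = crossing(durations, p_long, 0.75)
    width = None if lower is None or upper is None else upper - lower
    return position, width


# Hypothetical listener: proportion of "long" responses at each step (ms).
durations_ms = [100.0, 120.0, 140.0, 160.0, 180.0]
p_long = [0.0, 0.1, 0.4, 0.9, 1.0]
position, width = boundary_stats(durations_ms, p_long)
print(position, width)
```

When the response curve is flat or never reaches the criterion levels, the function returns `None` for the position, the width, or both, which illustrates why a boundary-based measure can become hard to estimate when responses are unstable.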

We investigated the perception of non-native phonemic length contrasts by American English monolinguals and early Spanish–English bilinguals. Overall, in contrast to native listeners, who exhibited an adaptation approach, early bilinguals and monolinguals exhibited a counter-adaptation approach. Although the two groups performed similarly in general, monolinguals outperformed bilinguals in discerning consonant duration contrasts in fixed-rate blocks. More precisely, the performance of bilinguals was more adversely affected by fast speech rates than that of monolinguals. These findings support the notion that early bilinguals who underwent a language dominance shift from Spanish to English can use English SPRs much as monolingual English speakers do when detecting durational differences in unfamiliar stimuli. These findings have implications for auditory processing among Spanish–English bilinguals with age-related sensorineural hearing loss, whose ability to rely on spectral and/or temporal information in the auditory signal may decline.

This study was supported by a Faculty Support Grant from California State University, East Bay and a National Science Foundation Postdoctoral Research Fellowship (SBE-1715073) to Eve Higby.

There are no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Birdsong, D. (2014). “Dominance and age in bilingualism,” Appl. Ling. 35(4), 374–392.
2. Bosker, H. R. (2017). “How our own speech rate influences our perception of others,” J. Exp. Psychology: Learn. Mem. Cogn. 43(8), 1225–1238.
3. Bosker, H. R., and Reinisch, E. (2017). “Foreign languages sound fast: Evidence from implicit rate normalization,” Front. Psychol. 8, 1063.
4. Conrad, L. (1989). “The effects of time-compressed speech on native and EFL listening comprehension,” Stud. Sec. Lang. Acq. 11, 1–16.
5. Flege, J., Aoyama, K., and Bohn, O. (2021). “The revised speech learning model (SLM-r) applied,” in Second Language Speech Learning: Theoretical and Empirical Progress, edited by R. Wayland (Cambridge University Press, Cambridge, UK), pp. 84–118.
6. Garcia Lecumberri, M. L., Cooke, M., and Cutler, A. (2010). “Non-native speech perception in adverse conditions: A review,” Speech Commun. 52, 864–886.
7. Griffiths, R. (1990). “Speech rate and NNS comprehension: A preliminary study in time-benefit analysis,” Lang. Learn. 40, 311–336.
8. Griffiths, R. (1992). “Speech rate and listening comprehension: Further evidence of the relationship,” TESOL Quart. 26, 385–390.
9. Guion, S. G., Flege, J. E., Akahane-Yamada, R., and Pruitt, J. C. (2000). “An investigation of current models of second language speech perception: The case of Japanese adults' perception of English consonants,” J. Acoust. Soc. Am. 107(5), 2711–2724.
10. Hisagi, M., Baker, M., Alvarado, E., and Shafiro, V. (2022a). “Online assessment of speech perception and auditory spectrotemporal processing in Spanish-English bilinguals,” Am. J. Audiol. 31(3S), 936–949.
11. Hisagi, M., Higby, E., Zandona, M., Kent, J., Castillo, D., Davidovich, I., and Shafer, V. L. (2020). “Perceptual discrimination measure of non-native phoneme perception in early and late Spanish-English & Japanese-English bilinguals,” Proc. Mtgs. Acoust. 42, 060013.
12. Hisagi, M., and Strange, W. (2011). “Perception of Japanese temporally-cued contrasts by American English listeners,” Lang. Speech 54(2), 241–264.
13. Hisagi, M., Zandona, M., Kent, J., and Higby, E. (2022b). “Perception of temporally-contrasted Japanese words by Spanish-English bilinguals and American English monolinguals,” JASA Express Lett. 2(1), 015201.
14. Kato, H., and Tajima, K. (2004). “Effects of speaking rate on the perception of phonemic length contrasts in Japanese,” J. Acoust. Soc. Am. 115, 2392–2393.
15. Kawahara, S., Kato, M., and Idemaru, K. (2022). “Speaking rate normalization across different talkers in the perception of Japanese stop and vowel length contrasts,” JASA Express Lett. 2(3), 035204.
16. Kawahara, H., Masuda-Katsuse, I., and de Cheveigne, A. (1999). “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun. 27, 187–207.
17. Kingston, J., Kawahara, S., Chambless, D., Mash, D., and Brenner-Alsop, E. (2009). “Contextual effects on the perception of duration,” J. Phon. 37(3), 297–320.
18. Mayo, L. H., Florentine, M., and Buus, S. (1997). “Age of second language acquisition and perception of speech in noise,” J. Speech. Lang. Hear. Res. 40(3), 686–693.
19. Medina, A., Socarrás, G., and Krishnamurti, S. (2020). “L2 Spanish listening comprehension: The role of speech rate, utterance length, and L2 oral proficiency,” Mod. Lang. J. 104(2), 439–456.
20. Miller, J. L., and Volaitis, L. E. (1989). “Effect of speaking rate on the perceptual structure of a phonetic category,” Percept. Psychophys. 46(6), 505–512.
21. Newman, R. S., and Sawusch, J. R. (1996). “Perceptual normalization for speaking rate: Effects of temporal distance,” Percept. Psychophys. 58(4), 540–560.
22. Pind, J. (1995). “Speaking rate, voice-onset time, and quantity: The search for higher-order invariants for two Icelandic speech cues,” Percept. Psychophys. 57(3), 291–304.
23. Rogers, C., Lister, J., Febo, D., Besing, J., and Abrams, H. (2006). “Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing,” Appl. Psycholinguist. 27(3), 465–485.
24. Rosenhouse, J., Haik, L., and Kishon-Rabin, L. (2006). “Speech perception in adverse listening conditions in Arabic-Hebrew bilinguals,” Int. J. Bilingualism 10, 119–135.
25. Sawusch, J. R., and Newman, R. S. (2000). “Perceptual normalization for speaking rate II: Effects of signal discontinuities,” Percept. Psychophys. 62, 285–300.
26. Segalowitz, N. (2003). “Automaticity and second language learning,” in The Handbook of Second Language Acquisition, edited by C. Doughty and M. Long (Blackwell, Oxford, UK), pp. 382–408.
27. Sonu, M., Arai, T., Tajima, K., and Kato, H. (2013). “Effect of speaking rate variation on the perception of singleton and geminate consonants in Japanese by native and Korean listeners,” Proc. Mtgs. Acoust. 19, 025024.
28. Strange, W. (2011). “Automatic selective perception (ASP) of first and second language speech: A working model,” J. Phon. 39(4), 456–466.
29. Strange, W., Hisagi, M., Akahane-Yamada, R., and Kubo, R. (2011). “Cross-language perceptual similarity predicts categorial discrimination of American vowels by naïve Japanese listeners,” JASA Express Lett. 130(4), EL226–EL231.
30. Strange, W., and Shafer, V. L. (2008). “Speech perception in second language learners: The re-education of selective perception,” in Phonology and Second Language Acquisition, edited by J. G. Hansen Edwards and M. L. Zampini (John Benjamins, New York), pp. 153–191.
31. Takahesu Tabori, A., Mech, E. N., and Atagi, N. (2018). “Exploiting language variation to better understand the cognitive consequences of bilingualism,” Front. Psychol. 9, 01686.
32. Vandergrift, L. (2003). “Orchestrating strategy use: Toward a model of the skilled second language listener,” Lang. Learn. 53, 463–496.
33. Von Hapsburg, D., Champlin, C. A., and Shetty, S. R. (2004). “Reception thresholds for sentences in bilingual (Spanish/English) and monolingual (English) listeners,” J. Am. Acad. Audiol. 15(1), 88–98.
34. Wade, T., and Holt, L. L. (2005). “Effects of later-occurring nonlinguistic sounds on speech categorization,” J. Acoust. Soc. Am. 118, 1701–1710.
35. Weiss, D., and Dempsey, J. J. (2008). “Performance of bilingual speakers on the English and Spanish versions of the Hearing in Noise Test (HINT),” J. Am. Acad. Audiol. 19(1), 005–017.
36. Wilson, A., Kato, H., and Tajima, K. (2005). “Native and non-native perception of phonemic length contrasts in Japanese: Effects of speaking rate and presentation context,” J. Acoust. Soc. Am. 117, 2425.