Bilinguals are capable of retuning phonetic categories in both languages through lexically-guided perceptual learning, but recent work suggests that some bilingual speakers may lose the ability to adapt in the native language. In the study reported here, early Cantonese-English bilinguals, who are on average English-dominant, successfully retuned Cantonese /f/. Scores of Cantonese-English dominance were not shown to correlate with phonetic retuning. The results are discussed in light of what may support the maintenance of perceptual flexibility in a lesser-used language.
1. Introduction
Phonetic variation is ubiquitous, yet humans readily understand each other. One proposed mechanism behind this ability is perceptual learning—a process in which the perceptual system continually updates and improves itself, leading to lasting changes (Goldstone, 1998). In speech perception, Samuel and Kraljic (2009) distinguish between global adaptation for difficult-to-understand speech (Clarke and Garrett, 2004; Davis et al., 2005; Pallier et al., 1998), and the phonetic retuning of specific sounds (Bertelson et al., 2003; Norris et al., 2003). We investigate phonetic retuning via lexically-guided perceptual learning for Cantonese /f/ with a population of Cantonese-English bilingual listeners in Vancouver, Canada. In this paradigm, listeners hear an ambiguous fricative in real words. This lexical scaffolding is critical, as it guides listeners’ interpretation and leads to phonetic retuning (Norris et al., 2003; Scharenborg and Janse, 2013; van Heugten and Johnson, 2014), though also see Cooper and Bradlow (2016).
Phonetic retuning can generalize to new words (McAuliffe and Babel, 2016; McQueen et al., 2006), related sounds (Kraljic and Samuel, 2006), across talkers (Eisner and McQueen, 2005; Reinisch and Holt, 2014; van der Zande et al., 2014; but also see Kraljic and Samuel, 2005), across languages, and occur in a non-native language (L2; Reinisch et al., 2013; Schuhmann, 2015). L2 retuning, however, may be limited by the categories involved (Drozdova et al., 2016). Phonetic retuning does not occur uniformly in multilingual communities. Bruggeman and Cutler (2019) and Cutler et al. (2019) found that high-proficiency L1 Dutch-L2 English and L1 Mandarin-L2 English bilinguals in Australia only retuned English sounds. In both studies, the authors highlight the importance of regular exposure to new talkers, arguing that these bilinguals fail to retune L1 categories because regular L1 use is limited to immediate family. In sum, perceptual learning is not limited to the native language (L1), and is affected by language use and proficiency. We build on this work by studying the role of language dominance for a different bilingual population.
Cantonese has been spoken in Vancouver since the 1860s (Yee, 2006), and is the most common mother tongue besides English in Metro Vancouver (7.9% of the population: Statistics Canada, 2017). Cantonese speakers typically use Cantonese with their family and local community, and are highly proficient, but tend to be English-dominant. Crucially, in contrast to the bilinguals studied in Australia (Bruggeman and Cutler, 2019; Cutler et al., 2019), this population uses Cantonese widely.
We investigate perceptual learning for Cantonese /f/ and assess language dominance with the Bilingual Language Profile (BLP: Birdsong et al., 2012).¹ While dominance did not play a role for high-proficiency bilinguals in previous work (Bruggeman, 2016), the BLP was not available when Bruggeman was collecting data. We predicted that Cantonese-English bilinguals in Vancouver would demonstrate perceptual learning, and that increased Cantonese dominance would correspond to greater perceptual flexibility.
2. Methods
The experiment comprised a lexical decision exposure phase and a categorization task. During exposure, the Experimental group heard ambiguous pronunciations of /f/ in the critical words, while Control participants heard the same materials with unambiguous /f/ pronunciations. In the categorization task, all participants heard the same /s/–/f/ continua. All materials were pretested with an independent listener group.
2.1 Participants
Participants included 156 Cantonese-English bilinguals residing in Metro Vancouver. Forty-seven were excluded for low self-rated Cantonese understanding (>2 of 6), and nine for missing BLP data. The remaining 100 were evenly split between Control and Experimental conditions—see Table 1 for details. An additional 45 bilinguals from the same population pretested the stimuli. Participants gave verbal consent and were compensated with partial course credit or $10 CAD. No speech or hearing disorders or disabilities were reported.
Condition | Gender | Age | Age of Acquisition: Cantonese | Age of Acquisition: English | Understanding: Cantonese | Understanding: English | Dominance
---|---|---|---|---|---|---|---
Control | | 20.8 (3.7) | 1.2 (3.5) | 3.6 (2.9) | 4.8 (1.2) | 5.5 (0.8) | 46.1 (66.8)
 | | Mdn: 20 | Mdn: 0 | Mdn: 3 | Mdn: 5 | Mdn: 6 | Mdn: 65.0
Experimental | | 21.0 (2.9) | 3.0 (6.3) | 3.6 (3.3) | 4.6 (1.3) | 5.3 (0.9) | 40.6 (61.4)
 | | Mdn: 21 | Mdn: 0 | Mdn: 4 | Mdn: 5 | Mdn: 6 | Mdn: 45.4
2.2 Materials
Cantonese words (15 critical /f/, 15 control /s/, 45 fillers) and phonologically legal nonwords (75 fillers) comprised the lexical decision exposure materials.² All stimuli were bisyllabic, with /f/ or /s/ at the onset of the second syllable, if present. Critical /f/ words became nonwords when the /f/ was replaced with /s/. Fillers did not contain fricatives. The critical /f/ and control /s/ stimuli had comparable word frequency distributions [/f/: ; /s/: ; , p = 0.99] in the Hong Kong Cantonese Adult Language Corpus (Leung and Law, 2001).³
Four nonword–nonword consonant-vowel-consonant-vowel (CVCV) minimal pair continua with a medial ambiguous /f/-/s/ fricative were used as categorization test stimuli. The six most ambiguous steps of each continuum were used. Identical categorization stimuli were used in both conditions.
Stimuli were produced by a 20-year-old bilingual male, recorded with a head-mounted microphone and pre-amp at 44.1 kHz with 16-bit resolution, and root mean square (RMS)-amplitude normalized. For all /f/ words and test stimuli, /f/ and /s/ endpoint productions (e.g., 豆腐 dau6fu6 “tofu” and nonword dau6su6) were used to generate 11-step continua with STRAIGHT (Kawahara et al., 2008). All items were resynthesized for uniform quality.
Stimuli were selected via pretesting from a larger pool of possible stimuli. Pretesting included a lexical decision task where listeners heard steps randomly sampled from each critical /f/ word to /s/ nonword continuum.⁴ For each continuum, the step closest to the 50% word acceptance mark was used as the ambiguous /f/ word in the Experimental condition (Control participants heard endpoint /f/ productions). Pretest listeners also categorized items from the 11-step test continua as /f/ or /s/. For each continuum, the six steps surrounding the 50% /f/ mark were selected for use in the categorization test. In both types of continua, by-item selection captures natural phonetic variation due to environment.
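The by-item selection described above can be illustrated with a minimal sketch. The response rates below are hypothetical, not pretest data from the study, and the function names are introduced here for illustration only. The rule used to pick the "six steps surrounding the 50% /f/ mark" (the six consecutive steps whose rates lie closest to 0.5 in total) is an assumption, since the exact criterion is not stated.

```python
def closest_to_half(rates):
    """Return the 1-based step whose response rate is closest to 0.5."""
    return min(range(1, len(rates) + 1), key=lambda s: abs(rates[s - 1] - 0.5))


def window_around_half(rates, width=6):
    """Return `width` consecutive 1-based steps whose rates are, in total,
    closest to 0.5. This windowing rule is an assumption for illustration."""
    n = len(rates)
    best_start = min(
        range(0, n - width + 1),
        key=lambda i: sum(abs(r - 0.5) for r in rates[i:i + width]),
    )
    return list(range(best_start + 1, best_start + width + 1))


# Hypothetical pretest rates for one 11-step continuum (step 1 = /f/ endpoint).
word_acceptance = [0.97, 0.95, 0.90, 0.84, 0.71, 0.55, 0.38, 0.22, 0.10, 0.05, 0.02]
f_responses = [0.99, 0.98, 0.95, 0.90, 0.80, 0.65, 0.48, 0.30, 0.15, 0.06, 0.02]

ambiguous_step = closest_to_half(word_acceptance)  # exposure item for this word
test_steps = window_around_half(f_responses)       # categorization test steps
print(ambiguous_step, test_steps)
```

Because selection is done per continuum, different items can end up with different original step numbers, which is consistent with the by-item variation noted above.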
2.3 Procedures
Participants were run up to four at a time in sound-attenuated cubicles with PCs running E-Prime 2.0, wearing AKG-240 Studio Headphones. Participants made responses using buttons 1 and 5 on a serial response box. Apart from the auditory stimuli, the experiment was conducted in English. All participants first completed a lexical decision exposure task in which items were presented pseudo-randomly (as in Reinisch et al., 2013). Both conditions heard the same set of items, with different versions of /f/, and categorized each item as “Word” or “Not a word” as quickly and accurately as possible. All participants then completed the same categorization task, in which they heard a CVCV nonword with an ambiguous fricative as the second C, and responded with whether the medial consonant was “f” or “s.” Listeners heard seven repetitions of the six most ambiguous steps of each nonword–nonword continuum, in fully randomized order. In both tasks, the order of responses on the screen, and corresponding button were left/right counterbalanced across participants within each condition, without consideration of dominant hand.
3. Analysis and results
3.1 Word recognition accuracy
Non-responses, responses faster than 200 ms, and responses more than two standard deviations from the grand mean were removed (6.7% of the data). All participants performed well on Filler words (>92%), Nonwords (>82%), and /s/ words (>89%). Control participants were highly accurate for /f/ words (94%), while Experimental participants, who were exposed to ambiguous /f/, were less accurate (58%). These results were confirmed with a mixed logit model, with accuracy as the dependent variable. Fixed effects were Type (Filler words-reference level, /f/ words, Nonwords, /s/ words), Condition (Control-reference level, Experimental), and their interaction. There were random intercepts for Subject and Word.
The intercept was significant [, standard error (SE) = 0.20, z = 17.99, p < 0.001]. The negative effect of Type:Nonword (, SE = 0.18, , p < 0.001) indicates a lower accuracy for Filler nonwords. Accuracy was lower in the Experimental condition (, SE = 0.23, , p = 0.004), likely due to the ambiguous /f/ sounds. This is apparent in the Condition:Experimental and Type:/f/ interaction (, SE = 0.23, , p < 0.001). Last, the Condition:Experimental and Type:Nonword interaction (, SE = 0.15, z = 2.65, p = 0.007) suggests that increased task difficulty (i.e., exposure to ambiguous productions) corresponds to a slight Nonword bias.
As perceptual retuning depends in part on lexical scaffolding, the rate at which listeners endorse ambiguous /f/ items as words is important. The rate of 58% is roughly equivalent to what was found in Cutler et al. (2019). These mean values, however, conceal considerable variability: by-word accuracy ranged from 42% to 82%, and by-individual accuracy from 7% to 100%.
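The response-trimming rule stated at the start of this section can be sketched as follows. This is an illustrative reading rather than the authors' script: it assumes that the grand mean and standard deviation are computed over all recorded response times before any removal, that the sample standard deviation is used, and that non-responses are coded as `None`. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def trim_responses(rts, floor_ms=200, n_sd=2.0):
    """Drop non-responses (None), response times below `floor_ms`, and
    response times more than `n_sd` standard deviations from the grand mean."""
    recorded = [rt for rt in rts if rt is not None]
    grand_mean = mean(recorded)
    sd = stdev(recorded)  # sample SD; an assumption
    return [
        rt for rt in recorded
        if rt >= floor_ms and abs(rt - grand_mean) <= n_sd * sd
    ]


# Hypothetical response times in milliseconds.
rts = [150, 600, 650, 700, 720, 750, 800, 820, 900, None, 3000]
kept = trim_responses(rts)
print(len(kept), "of", len(rts), "trials kept")
```

For the categorization data, where only slow outliers are removed, the `abs(...)` test would be replaced by a one-sided comparison.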
3.2 Categorization data
Responses faster than 200 ms and responses more than two standard deviations above the grand mean were removed (6.1% of the data). Figure 1 depicts categorization functions by condition, where lower steps are more /f/-like. A mixed logit model was fit with the dependent variable as whether the participant responded with /f/ or not. Fixed effects were Step (1–6; numeric), Condition (Control-reference level, Experimental), and their interaction. There were random intercepts for Subject, with by-Subject random slopes for Step.
The intercept was significant (, SE = 0.36, z = 15.29, p < 0.001). The main effect of Step (, SE = 0.09, , p < 0.001) indicates that lower steps were more likely to be categorized as /f/. The main effect of Condition was not significant (, SE = 0.50, , p = 0.061). The Step and Condition:Experimental interaction (, SE = 0.12, z = 2.73, p = 0.006) indicates that Experimental participants categorized more of the /s/-like Steps as /f/ than Control listeners, which signals successful adaptation to the novel /f/. While the effect seems to arise on the /s/ end of the categorization functions in Fig. 1, steps 4–6 correspond to medial steps⁵ of the original 11-step continua.
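For reference, the categorization model described above can be written out as follows. The symbols are notation introduced here, not taken from the original analysis: $i$ indexes trials, $j$ indexes subjects, and $\mathrm{Cond}_j$ is assumed to be treatment-coded (0 = Control, 1 = Experimental), in line with the stated reference level.

```latex
\operatorname{logit} P\left(y_{ij} = \text{/f/}\right)
  = \beta_0
  + \beta_1\,\mathrm{Step}_{ij}
  + \beta_2\,\mathrm{Cond}_{j}
  + \beta_3\,\mathrm{Step}_{ij}\,\mathrm{Cond}_{j}
  + u_{0j}
  + u_{1j}\,\mathrm{Step}_{ij},
\qquad
(u_{0j}, u_{1j}) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

Under this parameterization, the step at which a group is equally likely to respond /f/ or /s/ is $-(\beta_0 + \beta_2\,\mathrm{Cond}_j)/(\beta_1 + \beta_3\,\mathrm{Cond}_j)$. Since lower steps were more likely to be categorized as /f/, $\beta_1$ is negative, and a positive $\beta_3$ gives the Experimental group a shallower decline, so its /f/ responses extend further toward the /s/-like steps.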
Listeners’ word endorsement rates of the ambiguous /f/ words in the exposure condition positively and significantly affected retuning [Pearson’s , p < 0.001], confirming the importance of lexical scaffolding.
3.3 Language dominance
Language dominance was scored following the BLP guidelines (Birdsong et al., 2012), with −218 and 218 as extreme endpoint scores. Negative scores correspond to Cantonese dominance, and positive to English. As depicted in the Fig. 2(A) density plot, scores were not significantly different between conditions (Control range: −85.1–167.3; Experimental range: −101.7–; , p = 0.67). Both groups were more English-dominant. To assess the effect of dominance, we tested for a correlation between Experimental participants’ mean proportion of /f/ responses and BLP scores. The correlation depicted in Fig. 2(B) was not significant [Pearson’s , p = 0.22], indicating that dominance did not play a role in retuning. Language dominance also did not predict listeners’ lexical decision accuracy for ambiguous /f/ words [Pearson’s , p = 0.21].
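The dominance analysis above rests on two simple computations: interpreting the sign of a BLP dominance score and correlating those scores with each listener's mean proportion of /f/ responses. A minimal sketch is given below. The function names are hypothetical, the "balanced" label for a score of exactly zero is an assumption, and no significance test is included.

```python
from math import sqrt

BLP_MIN, BLP_MAX = -218.0, 218.0  # extreme endpoint scores given in the text


def dominance_label(score):
    """Label a BLP dominance score: negative = Cantonese, positive = English."""
    if not BLP_MIN <= score <= BLP_MAX:
        raise ValueError("BLP dominance scores range from -218 to 218")
    if score < 0:
        return "Cantonese-dominant"
    if score > 0:
        return "English-dominant"
    return "balanced"  # label for exactly zero is an assumption


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical listeners: BLP dominance score and mean proportion of /f/ responses.
blp_scores = [-100.0, -50.0, 0.0, 50.0, 100.0]
prop_f = [0.50, 0.60, 0.40, 0.70, 0.50]
r = pearson_r(blp_scores, prop_f)
print(round(r, 3))
```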
4. Discussion
This experiment tests whether early Cantonese-English bilinguals living in an English-dominant language environment are perceptually flexible enough to adjust the /f/ category in Cantonese, a sound that also occurs in English. This was not a certainty, given the disparate results for different types of bilingual communities (Bruggeman and Cutler, 2019; Cutler et al., 2019; Drozdova et al., 2016; Reinisch et al., 2013), and lack of comparable research with Cantonese. Cantonese-English bilinguals in this study were able to adjust their Cantonese /f/ category boundary following exposure to phonetically ambiguous /f/ words in Cantonese, despite living in an English-dominant society. This result replicates prior multilingual perceptual learning work (e.g., Reinisch et al., 2013), and extends the conclusions to a pair of languages from different language families (English-Cantonese vs English-Dutch). There was a strong correlation between identifying the ambiguous /f/ words as words during the lexical decision exposure phase and perceptual retuning, highlighting the importance of connecting the novel pronunciation to a lexical frame (Scharenborg and Janse, 2013).
The finding that early Cantonese-English bilinguals who are English dominant show perceptual learning in Cantonese may seem to contradict the findings of Bruggeman and Cutler (2019) and Cutler et al. (2019). Bruggeman and Cutler (2019) found that Dutch-English bilingual émigrés to Sydney, Australia were unable to adapt to novel pronunciations in Dutch, but did so with English fricatives. Similarly, the Mandarin-English early bilinguals in Cutler et al. (2019) showed perceptual learning with English fricatives, but not Mandarin ones. In both of these studies, the interpretation is that regularly encountering new speakers is crucial to maintaining perceptual flexibility in a native language. The Dutch-English and Mandarin-English bilinguals are described as largely using Dutch and Mandarin only with known familial contacts. While we do not have data detailing with whom our Cantonese-English listeners use Cantonese, the median percentage of time participants estimated using Cantonese in the last 5 weeks, 5 months, and 5 years was 20%, 25%, and 25%, respectively. In the Metro Vancouver region, where 7.9% of the population speaks Cantonese as a mother tongue (Statistics Canada, 2017), using Cantonese with novel speakers is a distinct possibility. Comparatively, Mandarin is spoken by 4.7% of the population in Sydney and Dutch by less than 1% (Australian Bureau of Statistics, 2017). Cantonese speakers in Vancouver constitute a higher percentage of a smaller city, and are thus more likely to encounter one another. Beyond encountering new speakers, the cultural significance and historical presence of Cantonese in the region make it a known speech community in the Metro Vancouver area.
It is also possible that the lack of perceptual learning in Bruggeman and Cutler (2019) and Cutler et al. (2019) is not due to the lack of exposure to novel speakers, but to an as-yet unidentified variable. For example, what matters may be not the number of speakers, but the amount of variability exhibited by each speaker an individual interacts with. In Metro Vancouver, the large number of Cantonese heritage speakers alongside more Cantonese-dominant speakers may increase the variability of spoken Cantonese in the community. Navigating that variability may in turn elicit greater perceptual adaptiveness from Cantonese listeners, as is the case with listeners who regularly hear their native language spoken with a non-native accent (Samuel and Larraza, 2015).
As perceptual learning did occur, we were able to test the role of language dominance in perceptual learning. On average, the population in this study self-reported higher English understanding scores and scored as more English-dominant than Cantonese-dominant on the BLP, though the BLP scores spanned a wide range. BLP dominance scores had no bearing on perceptual learning or on accuracy of identifying ambiguously pronounced words as words. This result supports Bruggeman and Cutler’s (2019) argument that language dominance has little bearing on perceptual flexibility.
Acknowledgments
Parts of this work were completed as part of the honours thesis of L.C. at the University of British Columbia. We would like to thank everyone in the Speech in Context Lab for their feedback on the research reported here, particularly Brianne Senior, Lauretta Cheng, Karina Wong, Zoe Lam, and Michael McAuliffe. Thank you to Boaz Chan for lending his voice to the project, and to Martin Oberg for technical assistance. This work was supported by an NSERC award to M.B.
¹ The BLP combines measures from the Multilingual British Picture Vocabulary Scale (Lim et al., 2008) and the Bilingual Dominance Scale questionnaires (Dunn and Tree, 2009). It is based on self-reported measures of proficiency, as well as language history, usage, and attitudes, and is freely available online at https://sites.la.utexas.edu/bilingual/.
² Items were selected from a larger initial pool according to naturalness after synthesis and pretest results for real word familiarity. Due to an error, 玻璃 bo1lei4 “glass” was presented twice.
³ Original stimuli lists included the manipulation of /s/ words as well, but pre-testing revealed surprisingly flat categorization functions in which listeners accepted a wide range of acoustic variation for /s/. Thus, we decided to compare ambiguous /f/ productions to a control condition. While this deviates from some prior research, manipulating two phonemes (e.g., /s/ and /f/) fails to address (potential) asymmetries in learning. Comparing against a Control condition remedies this issue, which is important given the results of Zhang and Samuel (2014).
⁴ The /s/ word to /f/ nonword continua were also pretested in this task, but were not used in this experiment.
⁵ Depending on the continuum: 4–7, 6–8, or 7–9.