This study examined how phonetic categorization in a second language (L2) is jointly affected by perceptual abilities and lexical knowledge. Adult L1 Mandarin Chinese listeners and L1 English-L2 Mandarin learners performed a phonetic categorization task. The stimuli varied the F0 contour along a continuum, yielding four tonal word/nonword endpoint combinations. Both L1 and L2 listeners categorized more ambiguous tokens as words than nonwords, thus demonstrating a lexical bias in their behavior, i.e., the Ganong effect. Non-phonetic, linguistic information can thus modify L2 phonetic categorization of lexical tones. This effect, however, can be constrained by the listener's pitch perception abilities.
Learning to understand second language (L2) speech as an adult is not an easy task. At least two factors make L2 speech learning particularly difficult. First, an adult's perceptual system is influenced, or warped, by the first language (L1) (Kuhl, 1991). L2 speech learning therefore requires overcoming L1 perceptual warping, which can make gradual improvement extremely difficult and can even prevent improvement beyond a certain point. For example, an L1 English listener does not heavily attend to a speaker's fundamental frequency (F0) when a single consonant-vowel syllable is spoken, because F0 information at the syllable level does not carry lexical information in English. In contrast, F0 information at the syllable level carries lexical information in Mandarin Chinese (Chao, 1965). For instance, the Mandarin syllable wei spoken with a rising F0 (tone 2) can mean “hello” when answering the telephone; the same syllable spoken with a high-level F0 (tone 1) can mean “small.” An L1 English listener must therefore learn to reweight F0 contour cues in order to understand spoken Mandarin words (Chandrasekaran et al., 2010). Ample research on L2 Mandarin acquisition indicates that this type of perceptual learning can take years, if not decades, to approach L1-like levels (Hao, 2012; Pelzl et al., 2019). As expected, learners with better pitch perception abilities tend to learn tone categories better than those with weaker pitch perception abilities (Bowles et al., 2016; Perrachione et al., 2011).
L2 speech learning is also challenging because listeners must match the variable speech signal to a lexical representation in memory. This process requires substantial L2 input and involves inhibiting competition from both L1 and L2 words [see Cutler (2006)]. Crucial to the present study, learning new words can change how a listener perceives the speech signal. Evidence for this claim comes from the phonetic categorization task (Ganong, 1980). In this task, an acoustic cue is systematically varied along a defined continuum (e.g., voice onset time from /g/ to /k/, resulting in a gift-*kift continuum or a *giss-kiss continuum). Listeners hear the stimuli and are instructed to categorize each utterance. When plotted, these category responses typically form an S-shaped function whose steepest region marks an approximate phoneme category boundary. Importantly, listeners are biased to respond in a manner that results in a word, which shifts category boundaries, i.e., the “Ganong effect” (Connine and Clifton, 1987; Fox, 1984). Non-native listeners who lack lexical knowledge of the target language (i.e., who do not know that *kift and *giss are nonwords) do not show the same sensitivity and boundary shift as L1 listeners in their phonetic categorization (Keating et al., 1981). To further test the Ganong effect, Fox and Unkefer (1985) used the phonetic categorization task with L1 Mandarin and naive L1 English listeners. The authors varied the F0 contour on Mandarin syllables to create tone 2—tone 1 endpoints, resulting in four conditions: word—word, word—nonword, nonword—word, and nonword—nonword. For example, shei2 means “who” but *shei1 is a nonword; *hei2 is a nonword but hei1 means “black.” The L1 Mandarin listeners categorized more ambiguous tokens as words than nonwords: tone category boundaries shifted according to the endpoints' lexical status. In contrast, the naive L1 English listeners, who had no Mandarin lexical knowledge, showed no boundary shift as a result of the endpoints' lexical status.
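The boundary shift described above can be made concrete with a small sketch. Assuming the proportion of one response category has already been computed at each continuum step, the hypothetical function `boundary_50` below locates the category boundary as the point at which that proportion crosses 0.5, using linear interpolation between adjacent steps. This is a simple descriptive measure chosen for illustration, not the analysis used in this study or by Fox and Unkefer (1985), and the example proportions are invented.

```python
from typing import Sequence


def boundary_50(steps: Sequence[float], p_resp: Sequence[float]) -> float:
    """Return the continuum value at which p_resp first crosses 0.5.

    Assumes p_resp (e.g., proportion of tone 1 responses) rises along the
    continuum; interpolates linearly between the two steps bracketing 0.5.
    """
    if len(steps) != len(p_resp):
        raise ValueError("steps and p_resp must have the same length")
    for i in range(1, len(steps)):
        p0, p1 = p_resp[i - 1], p_resp[i]
        if p0 <= 0.5 <= p1 and p1 > p0:
            frac = (0.5 - p0) / (p1 - p0)
            return steps[i - 1] + frac * (steps[i] - steps[i - 1])
    raise ValueError("response proportions never cross 0.5")


# Invented proportions of tone 1 responses on a 9-step continuum
# (step 1 = tone 2 endpoint, step 9 = tone 1 endpoint).
steps = list(range(1, 10))
baseline = [0.02, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.98]
# More tone 1 responses at ambiguous steps, as a lexical bias toward a
# tone 1 word endpoint (e.g., *hei2-hei1) would predict.
biased = [0.03, 0.08, 0.20, 0.50, 0.70, 0.85, 0.93, 0.97, 0.99]

print(boundary_50(steps, baseline))  # 5.0
print(boundary_50(steps, biased))    # 4.0
```

A boundary that moves toward the tone 2 end (a smaller value) indicates more tone 1 responses at ambiguous steps. Fitting a logistic function to the responses would be a common alternative to simple interpolation.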
In this study we examine how L2 phonetic categorization of Mandarin lexical tone continua is jointly constrained by perceptual abilities and lexical knowledge. We extend Fox and Unkefer's study to examine whether lexical learning alone is sufficient to shift L2 tone boundaries, or whether good pitch perception abilities are also necessary. Knowing that shei2 is a word and that *shei1 is a nonword may be irrelevant if the L2 listener's perception is so warped that they cannot perceive the F0 difference between the two tones. Crucially, we predict that pitch perception and lexical knowledge of our target words will interact: even learners who know the target words may not categorize more ambiguous tokens as words than nonwords if they lack the necessary perceptual abilities.
A total of 150 adult participants took part in the study. L2 participants received class credit and L1 participants volunteered their time. All tasks were approved by the Institutional Review Board and took place in a quiet lab setting. All participants self-reported normal hearing and vision. Forty participants were L1 Mandarin speakers (mean age = 22; range = 18–36) from Beijing and the surrounding areas of northern mainland China; all self-reported being Mandarin-dominant with no knowledge of other tonal dialects. All L1 Mandarin speakers had completed their schooling through high school in China, spoke English as an L2, and were living in the U.S. as undergraduate or graduate students at the time of testing. The remaining 110 participants were L1 English speakers from the U.S. (mean age = 20; range = 18–34) who self-reported having less than five years of musical training. All L1 English speakers were undergraduate or graduate students enrolled in an L2 Mandarin language class.
All participants provided language history and background information and then performed three behavioral tasks. First, the Tonometric adaptive pitch test (Mandell, 2018) was used to assess pitch perception abilities. In this task, participants heard two pure tones over headphones and indicated via keyboard whether the second tone was higher or lower in pitch than the first. Participants whose discrimination threshold exceeded 20 Hz (i.e., who could not reliably tell apart two pure tones differing by 20 Hz) were removed from the analysis. Although any cutoff is arbitrary, 20 Hz served to remove participants with potential congenital amusia or “tone deafness” (see Liu et al., 2012) while preserving a reasonable amount of data (five L1 participants and 20 L2 participants were removed for having thresholds higher than 20 Hz). Second, participants completed a computerized version of the LexTALE_CH test (Chan and Chang, 2018) to estimate their Mandarin lexical knowledge. The test presented 90 characters (60 real, 30 nonce) in a pseudorandomized order, and participants indicated via mouse-click which characters were real. Corrected accuracy was calculated as hits − 2 × false alarms. Third, participants performed a phonetic categorization task. Participants heard stimuli over headphones and indicated as quickly and accurately as possible via button press whether each stimulus was tone 2 or tone 1. Tone labels and arrows remained on screen until the participant responded or 2000 ms elapsed. For this task, four tone 2—tone 1 Mandarin pairs were created such that they formed a word—word (wei2—wei1), word—nonword (shei2—*shei1), nonword—word (*hei2—hei1), and nonword—nonword (*tei2—*tei1) continuum. These items differed slightly from those tested in Fox and Unkefer (1985) in order to ensure that the majority of L2 learners were familiar with the four lexical targets (为,¹ 微, 谁, 黑).
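The scoring and exclusion rules just described can be written out as a short sketch. The function names are hypothetical, the range checks assume the 60 real and 30 nonce items reported here, and the example values are invented; the code restates the rules as described above rather than reproducing the authors' scripts.

```python
def lextale_ch_corrected(hits: int, false_alarms: int) -> int:
    """Corrected LexTALE_CH accuracy as described: hits - 2 * false alarms.

    hits: real characters (of 60) correctly judged to be real.
    false_alarms: nonce characters (of 30) wrongly judged to be real.
    Possible scores therefore range from -60 to 60.
    """
    if not 0 <= hits <= 60 or not 0 <= false_alarms <= 30:
        raise ValueError("counts outside the 60 real / 30 nonce item set")
    return hits - 2 * false_alarms


def passes_pitch_cutoff(threshold_hz: float, cutoff_hz: float = 20.0) -> bool:
    """Keep a participant unless their Tonometric threshold exceeds the cutoff."""
    return threshold_hz <= cutoff_hz


print(lextale_ch_corrected(55, 5))   # 45
print(lextale_ch_corrected(20, 10))  # 0
print(passes_pitch_cutoff(8.4))      # True
print(passes_pitch_cutoff(25.0))     # False
```

Under this formula, a participant who answers “real” to every item scores 60 − 2 × 30 = 0, so the doubled false-alarm penalty removes any advantage from a blanket yes-bias.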
The eight endpoint targets were recorded at 44.1 kHz in a sound-attenuated booth by a 24-year-old female L1 Mandarin speaker from Beijing. The speaker produced five tokens of each target to elicit natural citation-form F0 contours. Wei2 and wei1 tokens were selected to serve as the tone 2 and tone 1 F0 contours based on their relatively similar F0 offset values (tone 2 F0 onset = 203.8 Hz, offset = 261.5 Hz; tone 1 F0 onset = 283.3 Hz, offset = 268.3 Hz). From these contours, a nine-step continuum was created using the pitch-synchronous overlap-add (PSOLA) method in Praat (Boersma and Weenink, 2019). The resulting continuum had an F0 onset of 203 Hz at step 1 (tone 2), increasing by 10 Hz per step to 283 Hz at step 9 (tone 1). The F0 offset for all nine steps was set at 268 Hz. These contours were separately superimposed on all stimuli, which were normalized to a vowel duration of 380 ms and an amplitude of 66.7 dB, resulting in a total of 36 unique tokens (four continua with nine steps each). Five additional L1 Mandarin speakers confirmed the tone endpoints with 100% agreement. Each of the 36 tokens was presented six times over two blocks (216 trials), along with 164 filler trials designed for a separate pilot study. Participants first performed six practice trials using the word—word yu2—yu1 endpoint targets to become familiar with the task. Stimuli were presented using E-Prime 2.0. After completing the categorization task, L2 participants were shown, on paper, the four Chinese characters corresponding to the wei1, wei2, shei2, and hei1 lexical endpoints and asked to write down each word's romanization (pinyin) and English meaning. On average, 55% of the targets' pinyin pronunciations were correctly identified. Roughly three-quarters of the L2 participants' errors involved the wrong tone.
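The continuum values and trial counts above follow from simple arithmetic, sketched below. The function and variable names are hypothetical; the sketch lists only the F0 onset and offset targets for each step and makes no assumption about the contour shape between them, which is not specified here.

```python
F0_OFFSET_HZ = 268.0  # shared offset for all nine steps


def f0_onsets(start_hz: float = 203.0, step_hz: float = 10.0, n_steps: int = 9) -> list[float]:
    """F0 onset for each continuum step, from tone 2 (step 1) to tone 1 (step 9)."""
    return [start_hz + k * step_hz for k in range(n_steps)]


onsets = f0_onsets()
for step, onset in enumerate(onsets, start=1):
    print(f"step {step}: onset {onset:.0f} Hz -> offset {F0_OFFSET_HZ:.0f} Hz")

n_continua = 4                        # word-word, word-nonword, nonword-word, nonword-nonword
n_tokens = n_continua * len(onsets)   # unique tokens
n_critical = n_tokens * 6             # each token presented six times
n_total = n_critical + 164            # plus filler trials for the pilot study
print(n_tokens, n_critical, n_total)  # 36 216 380
```

Although the recorded tone 2 endpoint had an offset of 261.5 Hz, every continuum step shares the 268 Hz offset, so only the onset varies across steps.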
The reported data analysis was thus carried out on 125 participants (35 L1, 90 L2). Trials with no response within 2000 ms were removed (N = 818), leaving 26 182 observations. See Wiener and Liu (2020) for data and R code detailing all our statistical analyses. Figure 1 (left-hand side) uses raincloud plots (Allen et al., 2019) to show the corrected LexTALE_CH score (i.e., correct hits adjusted for false alarms), the Tonometric score (i.e., the smallest Hz difference at which the listener could discern two pure tones), and months in the L2 classroom. The L1 and L2 groups differed in mean corrected LexTALE_CH score (44.87 for the L1 group; –0.80 for the L2 group; [t(122.96) = 27.92, p < 0.001, d = 4.58]) and mean Tonometric pitch threshold (3.93 Hz for the L1 group; 8.36 Hz for the L2 group; [t(120.93) = –5.08, p < 0.001, d = 0.81]). The right-hand side of Fig. 1 plots the two groups' categorization means (with error bars showing 95% confidence intervals) and loess-smoothed response curves by condition. The response curves were overall similar in that both L1 and L2 listeners categorized more ambiguous tokens as words than nonwords. Clear group-level differences, however, were observed in each condition, as the L1 group showed considerably sharper slopes than the L2 group.
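Group comparisons with fractional degrees of freedom, such as t(122.96), are characteristic of Welch's unequal-variance t-test. The standard-library Python sketch below shows how that statistic, its Welch-Satterthwaite degrees of freedom, and a pooled-SD Cohen's d are computed. We assume, without having checked the authors' R code, that these are the procedures behind the reported values; the input numbers are invented for illustration.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    sx2, sy2 = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(sx2 + sy2)
    df = (sx2 + sy2) ** 2 / (sx2 ** 2 / (nx - 1) + sy2 ** 2 / (ny - 1))
    return t, df


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean difference divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Invented scores, for illustration only (not the study's data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(group_a, group_b)
print(round(t, 3), round(df, 2), round(cohens_d(group_a, group_b), 2))  # -1.897 5.88 -1.2

# Observation count: 125 participants x 216 critical trials, minus 818 timeouts.
print(125 * 216 - 818)  # 26182
```

The sign of d only reflects which group is listed first; effect sizes are usually reported as magnitudes, as in the text.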
We first carried out a two-way ANOVA following Fox and Unkefer (1985), comparing participant means at step 5 by group and condition with the word—word condition as the baseline. Like Fox and Unkefer, we do not report on the nonword—nonword condition. We found significant effects of group [F(1, 369) = 26.95, p < 0.001, ηp² = 0.07] and condition [F(2, 369) = 170.01, p < 0.001, ηp² = 0.48], as well as a significant group × condition interaction [F(2, 369) = 15.05, p < 0.001, ηp² = 0.08]. Both groups showed tone boundary shifts in the nonword—word and word—nonword conditions compared to the word—word condition (p < 0.001). The L1 and L2 groups differed in their responses in the word—word and nonword—word conditions (p < 0.001), but not in the word—nonword condition (p = 0.68).
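Partial eta squared can be recovered from an F ratio and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the reported F values as a consistency check, and it reproduces the reported ηp² values to two decimals. The error df of 369 is also what one would expect if the 125 participants' three condition means (375 observations) were analyzed across the 2 × 3 group-by-condition cells (375 − 6 = 369); this reading of the design is our assumption and has not been checked against the authors' R code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


reported = {
    "group": (26.95, 1, 369),
    "condition": (170.01, 2, 369),
    "group x condition": (15.05, 2, 369),
}
for effect, (f, df1, df2) in reported.items():
    print(effect, round(partial_eta_squared(f, df1, df2), 2))
# group 0.07
# condition 0.48
# group x condition 0.08

# Error df under the assumed design: 125 participants x 3 conditions, minus 2 x 3 cells.
print(125 * 3 - 2 * 3)  # 369
```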
To further explore L2 learner variability, we modeled the tone 1 response curves using growth curve analysis (Mirman, 2014). We treated the nine-step tone continuum as a continuous time term, which we modeled using first-order (linear), second-order (quadratic), and third-order (cubic) polynomials. Tone 1 response (1 = tone 1; 0 = tone 2) served as the dependent variable, while Tonometric scores, corrected LexTALE_CH scores, and months in the L2 classroom were standardized and treated as fixed effects. Knowledge of the target word endpoints' pinyin (1 = correct; 0 = incorrect) was also included as a fixed effect. (See the R code online for additional information, including the random effects structure.) We confirmed that the L2 group's response curves differed across the three conditions (p < 0.001). We also confirmed that knowledge of the target word's pinyin (β = 0.22, z = 4.12, p < 0.001) affected the L2 group's overall behavior; Tonometric score had a marginal effect on overall behavior (β = 0.12, z = 1.96, p = 0.05). Neither corrected LexTALE_CH score nor months in an L2 classroom had a significant effect on the L2 group's behavior (p > 0.1). To examine the slopes of the three curves, we specifically tested for a three-way interaction between the linear polynomial, Tonometric score, and knowledge of the target word's pinyin, which was significant (β = 0.55, z = 5.48, p < 0.001). Subset analyses confirmed this three-way interaction for the word—word (β = 0.50, z = 2.15, p = 0.03) and nonword—word (β = 0.36, z = 2.05, p = 0.04) conditions, but not for the word—nonword condition (β = 0.13, z = 0.59, p = 0.56).
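Growth curve analysis in the style of Mirman (2014) commonly uses orthogonal, rather than raw, polynomial time terms, so that the linear, quadratic, and cubic terms are uncorrelated and the linear term can be read as the overall slope. The NumPy sketch below constructs such terms for the nine continuum steps via a QR decomposition, in the spirit of R's `poly()`. It is an illustrative assumption rather than the authors' code, and it covers only the time terms, not the mixed-effects logistic model with its predictors and random effects.

```python
import numpy as np


def orthogonal_poly(x, degree: int) -> np.ndarray:
    """Orthonormal polynomial terms of x (linear, quadratic, ...), one per column.

    Built by QR-decomposing the raw power basis [1, x, x^2, ...]; the constant
    column is dropped, so every returned column sums to zero.
    """
    x = np.asarray(x, dtype=float)
    vander = np.vander(x, degree + 1, increasing=True)
    q, r = np.linalg.qr(vander)
    # Flip signs so each term has a positive leading coefficient (e.g., the
    # linear term increases along the continuum).
    q = q * np.sign(np.diag(r))
    return q[:, 1:]


steps = np.arange(1, 10)             # nine continuum steps
terms = orthogonal_poly(steps, 3)    # columns: linear, quadratic, cubic
print(np.round(terms[:, 0], 3))      # linear term equals (step - 5) / sqrt(60)
print(np.round(terms.T @ terms, 6))  # identity matrix: the terms are orthonormal
```

Because each column is centered and orthogonal to the others, a predictor's interaction with the linear term (as in the three-way interaction above) can be interpreted as an effect on slope without being confounded with curvature.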
To summarize, the L1 and L2 groups differed in their mean pitch perception abilities and lexical knowledge (Fig. 1, left two plots). Yet both groups categorized more ambiguous tokens as words than nonwords. The L1 and L2 groups' behavior differed in the nonword—word and word—word conditions, but not in the word—nonword condition (Fig. 1, right three plots). Modeling the L2 response curves with individual-difference predictors indicated that overall performance was significantly predicted by knowledge of the target words' pinyin and marginally predicted by Tonometric pitch perception results. Neither corrected LexTALE_CH score (i.e., lexical knowledge) nor months in an L2 classroom predicted the L2 learners' overall behavior. The tone category slopes for the nonword—word and word—word continua were characterized by an interaction between knowledge of the target word's pinyin and Tonometric pitch perception score. L2 participants with knowledge of the target word's pinyin and lower Tonometric scores (i.e., better pitch perception abilities) had steeper slopes than L2 participants with knowledge of the target word's pinyin but higher Tonometric scores (i.e., weaker pitch perception abilities).
We found that L2 listeners' pitch perception abilities marginally predicted overall L2 tone categorization performance, in line with previous studies demonstrating the importance of pitch perception abilities in L2 tone acquisition [e.g., Bowles et al. (2016)]. In the word—word (wei) condition, L2 listeners showed evidence of categorical perception of tone, similar to recent Mandarin tone studies involving intermediate and advanced L2 learners [e.g., Ling and Grüter (2020) and Shen and Froud (2016)]. Overall performance in the two conditions with a nonword endpoint (word—nonword and nonword—word) was largely predicted by whether the L2 listeners knew the target words we tested (i.e., correctly reported the pinyin). This finding is in line with previous L2 phonetic categorization studies at the segmental level [e.g., Keating et al. (1981)]. L2 listeners, like L1 listeners, are thus biased towards a tone category that results in a syllable-tone word. Lexical knowledge of the target syllable-tone word can be sufficient for this category boundary shift to occur. To the best of our knowledge, this is novel evidence of the Ganong (1980) effect in L2 learners of a tonal language. These results extend Fox and Unkefer's (1985) original study by demonstrating that higher-order L2 lexical knowledge involving tone categories can help listeners interpret lower-level non-native speech input involving small F0 differences.
We also found both similarities and differences between the L1 and L2 groups across the two conditions with a nonword endpoint. In the nonword—word (hei) condition, the L1 and L2 groups' behavior differed. L2 listeners with stronger pitch perception abilities and knowledge of the target word hei1 had steeper tone category slopes than L2 listeners with knowledge of the target word but weaker pitch perception abilities. These findings indicate that for hei1, knowledge of the target word shifted phonetic categorization much less if the listener had weak pitch perception abilities. In the word—nonword (shei) condition, the L1 and L2 groups performed similarly, and none of our L2 individual difference measures predicted the slope of the shei2—*shei1 tone category boundary. These results indicate that lexical knowledge of shei2 was sufficient to shift category boundaries, that L1 and L2 listeners shifted their boundaries in a similar manner, and that this shift occurred even for L2 listeners with relatively weaker pitch perception abilities. One likely reason for this pattern is that shei2 (“who”) is an extremely frequent/familiar word and the only word with the syllable shei in Mandarin. Our L2 learners were most likely exposed to this word early and often during the first month of L2 classroom instruction. Nearly 88% of our L2 participants correctly identified the meaning of the shei2 character, whereas 69% identified the correct syllable and tone (cf. 59% for hei1). We suspect that even those L2 learners who did not correctly identify the character knew that shei2 meant “who” and either struggled to remember the character's pronunciation or had encoded an incorrect tone in memory, as is common even in advanced L2 learners [e.g., Han and Tsukada (2020) and Pelzl (2020)].
A larger dataset involving more words is needed to fully evaluate this frequency/familiarity account, but the account would explain why a highly frequent/familiar word like shei2 (“who”) exerted a relatively stronger lexical bias in L2 listeners than a less frequent/familiar word like hei1 (“black”) did. We note that although no frequency-based Ganong effect has been reported in L1 Mandarin listeners (Politzer-Ahles et al., 2020), it remains an open question whether L2 listeners demonstrate similar behavior.
Taken together, our results underscore the fundamental challenges associated with L1 perceptual warping and L2 lexical learning. In line with previous studies on L2 tone perception, we found that overcoming L1 perceptual warping in terms of F0 cue weighting can be extremely challenging (Chandrasekaran et al., 2010). Many of our L2 participants, even those with extensive L2 classroom experience, did not demonstrate L1-like pitch perception abilities. In our sample, L2 classroom experience and Tonometric pitch perception abilities were only weakly correlated (r = 0.14), suggesting that many of the L2 participants with greater classroom experience actually had weaker pitch perception abilities than many of those with less classroom experience. These findings call into question whether experience with a tonal language is necessarily related to improved pitch perception [cf. Bent (2006) and Pfordresher and Brown (2009)], and they raise the question of to what degree our intermediate and advanced adult L2 Mandarin learners had plateaued in their tone categorization abilities given their limited pitch perception abilities [see Hao (2012)]. This plateau in L2 perceptual learning could also explain why overall L2 lexical knowledge (LexTALE_CH) was only weakly correlated with L2 classroom experience (r = 0.26). For our sample of adult learners, time invested in L2 learning did not necessarily result in linear gains in Mandarin word learning (at least as measured by character recognition; an auditory lexical decision task may yield different results). Importantly, lexical learning alone did not always cause L2 listeners to categorize more ambiguous sounds as words than nonwords. For some of our L2 listeners, knowledge of the target words did not lead to the same categorization behavior if they lacked strong pitch perception abilities.
Figure 2 illustrates the gradual way in which bottom-up perceptual abilities and top-down lexical knowledge jointly shape L2 phonetic categorization of tone. In this figure, “weak” pitch abilities are defined as a threshold above the L2 group's mean (8.36 Hz); “strong” abilities indicate a threshold below the L2 group's mean. Lexical knowledge represents knowledge of the pinyin of the target words used as continuum endpoints. Of note are the gradual and additive shift towards more “L1-like” tone category boundaries, the steeper slopes, and the reduced variability across participants (smaller error bars). This slow progression towards more categorical behavior mirrors that observed in adolescents [e.g., McMurray (2018)], suggesting that experience with the target language affects categorization abilities. Interestingly, our results indicate that this progression in L2 learning may not necessarily be tied to L2 exposure, or at least not to self-reported passive exposure in the classroom as we measured it. For instance, an “advanced” learner who spent multiple years in an L2 classroom speaking only with the instructor and classmates may be less proficient than a “beginner” learner who spent only a few months in a classroom but speaks the L2 regularly with friends. A more nuanced measure of L2 use or active exposure may better capture this behavior.
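The grouping used for Figure 2 can be sketched as a mean split on Tonometric threshold crossed with pinyin knowledge of the target word. The class and function names below are hypothetical, the cutoff is the L2 group mean reported for the Tonometric test (8.36 Hz), and the handling of a threshold exactly at the mean is our assumption, since the text does not specify it.

```python
from dataclasses import dataclass

L2_MEAN_THRESHOLD_HZ = 8.36  # L2 group mean Tonometric threshold reported above


@dataclass
class L2Listener:
    threshold_hz: float  # Tonometric threshold; lower = better pitch perception
    knows_pinyin: bool   # correct pinyin for the target word of one continuum


def figure2_cell(listener: L2Listener, group_mean_hz: float = L2_MEAN_THRESHOLD_HZ) -> str:
    """Assign a listener to one of the four pitch x lexical-knowledge cells.

    Thresholds above the group mean count as "weak"; we assume that a
    threshold exactly at the mean counts as "strong".
    """
    pitch = "weak pitch" if listener.threshold_hz > group_mean_hz else "strong pitch"
    lexical = "lexical knowledge" if listener.knows_pinyin else "no lexical knowledge"
    return f"{pitch}, {lexical}"


# Invented listeners. Pinyin knowledge is scored per target word, so the same
# person can fall into different cells on different continua.
for listener in [L2Listener(4.0, True), L2Listener(12.5, True), L2Listener(6.0, False)]:
    print(figure2_cell(listener))
# strong pitch, lexical knowledge
# weak pitch, lexical knowledge
# strong pitch, no lexical knowledge
```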
In conclusion, L2 perception of suprasegmental information can be modulated by non-phonetic, linguistic information. Lexical knowledge alone can be sufficient to shift tone categories. Listeners with better pitch perception abilities, however, tend to show steeper tone category slopes and more “L1-like” categorical behavior than listeners with weak pitch perception abilities. Lexical frequency/familiarity of the target word may also affect how much L2 tone categories shift, though additional data are needed to fully evaluate this claim.
We thank Rachel Theodore, Pamela Fuhrmeister, and an anonymous reviewer for their incredibly helpful comments and suggestions on earlier versions of this research.
¹We acknowledge the polyphonic nature of the character 为 (wei), but used it because it was familiar to both L1 and L2 speakers and is most often realized as wei2 in speech.