Current frameworks of L2 phonetic acquisition remain largely underspecified with respect to the role of L1 allophonic variability. To examine this role, the current study compared the perceptual discrimination of English /i-ɪ/ and /ɛ-æ/ by L1 Korean and L1 Mandarin speakers. Korean and Mandarin vowel inventories differ in that Mandarin employs significantly greater allophonic variation of the mid-region /E/ vowel. Results demonstrated worse perceptual accuracy by L1 Mandarin speakers than by L1 Korean speakers for the /ɛ-æ/ contrast. These results suggest that both L1 phonemic inventories and allophonic variation play a role in L2 phonetic acquisition.
1. Introduction
Second language (L2) learners often have difficulties discriminating L2 contrasts that are absent in their L1 (Aoyama et al., 2004; Bohn, 1995; Bohn and Flege, 1992; Chen et al., 2001; Flege et al., 1997; Flege and MacKay, 2004; Guion et al., 2000). The cause of this is often thought to be interference from the existing L1 inventory (Flege, 1987, 1995). Popular theoretical frameworks focus on the nature of the L1 phonemes (i.e., what phonemes are actually present in the L1 inventory) when discussing L1 interference, but overlook cases of significant allophonic variability (i.e., places where multiple allophones exist in a region of the phonological space) when making predictions about the acquisition of contrasts.
1.1 Theoretical models
Among the most influential models, the Speech Learning model (SLM) (Flege, 1995), the perceptual assimilation model (PAM) and PAM-L2 (Best, 1995; Best and Tyler, 2007), and the Second Language Linguistic Perception model (L2LP) (Escudero, 2005, 2009) all propose that the outcomes for acquisition of novel L2 sounds (or contrasts) depend on their relation to the L1 inventory.
For example, the SLM proposes that, at the level of position-specific allophones, L2 vowels may be “identical,” where the L1 and L2 phones are acoustically identical, “similar,” where the L2 phoneme is similar but not identical to an L1 category, or “new,” where the L2 phoneme does not match up with any existing L1 category (Flege, 1988, 1992; Flege et al., 2003). Learners may either create a separate category for the “new” phoneme or assimilate the “identical” and “similar” phonemes to an existing L1 category (Flege et al., 2003) via equivalence classification. Considering L2 contrasts, the SLM indirectly suggests that if two L2 phones are perceived as “equivalent” to the same L1 category (i.e., one “identical” and one “similar”), they may be subsumed into a single existing L1 category, resulting in poor contrast discrimination. In terms of allophones, SLM-r argues that acquiring one position-specific allophone of an L2 phoneme may not help with the perception and production of any other allophones (Flege and Bohn, 2021).
PAM-L2 also focuses on the role of L1 inventories in L2 acquisition, namely, whether an L2 learner has “perceived equivalence between an L2 and L1 phonological category” [Best and Tyler (2007), p. 24]. PAM-L2 predicts several perceptual outcomes for pairs of L2 contrastive sounds: two-category assimilation, category-goodness assimilation, uncategorized, and single-category L2 contrast assimilation. Relevant for the current study, single-category assimilation occurs when two L2 phonemes are perceived as equally good or bad representations of the same L1 category (Best and Tyler, 2007).
Finally, the L2LP model similarly suggests that learners initially perceive L2 phonemes relative to the L1 acoustic space (van Leussen and Escudero, 2015). Among several possible outcomes, van Leussen and Escudero (2015) note that when a pair of L2 sounds is acoustically similar to a single L1 sound, learners must create a new L2 category or split their existing single L1 category, known as a NEW scenario in L2LP terms.
Although all three models consider the role of the L1 in L2 acquisition, they remain underspecified with respect to the dual roles that the phonetic inventory and allophonic variation may play in the acquisition of novel L2 contrasts. The L2LP model (van Leussen and Escudero, 2015), for example, focuses on the relation of “phonemic categories” in the L1 and L2. Although SLM (Flege, 1995) suggests that the acoustic-perceptual similarities between L1 and L2 phones may be established with respect to specific positional allophones, subsequent research on L2 vowel perception has largely focused on L1 and L2 phonemic (rather than allophonic) inventories [e.g., Bohn and Flege (1992), Chen et al. (2001), and Morrison (2008)]. To better explore the role of L1 allophonic variability in L2 perception and acquisition, it is worth comparing languages, such as Korean and Mandarin, that evidence similar L1 phonemic structures, but differ in allophonic variability.
1.2 L1 vowel inventories of the two speaker groups
This study examines the perception of two American English vowel pairs (/i-ɪ/ and /ɛ-æ/) by two listener groups (L1 Korean and L1 Mandarin listeners) to begin addressing the issue of whether allophonic variability is a factor in L2 contrast acquisition.
While subject to some ongoing debate (Kang, 2003; Lee, 1999; Oh, 1997), recent research has supported a seven-vowel Korean inventory (Ahn and Iverson, 2007; Kim-Renaud, 2009; Lee and Iverson, 2012; Shin et al., 2012). Considering the target contrasts, there is consensus that the Korean vowel inventory includes /i/ and /ɛ/, and excludes /ɪ/ and /æ/.
As with Korean, there is no consensus on which phonemes are present in Mandarin. However, recent studies have shown support for a five-vowel Standard Mandarin inventory (Wiese, 1997; Mok, 2013). Considering the target contrast /i-ɪ/, Standard Mandarin includes /i/ and excludes /ɪ/. Regarding the /ɛ-æ/ contrast, the situation is more complex, as there are many vowel qualities in the mid-range of Mandarin, including [e, ɛ, ə, ɤ, ɔ, o], with complementary distributions. These qualities are widely considered to be allophones of the same phoneme: Mok (2013), among others, proposes an underspecified mid-vowel, typically represented by /E/, that changes in frontness/backness and rounding, but not in height [for discussion of other underlying phonemes, see Mok (2013)]. In Mandarin, [ɛ] is present as an allophone of the underspecified mid-vowel phoneme /E/, but the status of [æ] is debated, as some researchers include it as an allophone of the low vowel /a/ (Wiese, 1997), and others do not (Lin, 2007). For the purposes of this study, we follow the analysis of Lin (2007) and exclude [æ] as a possible allophone.
1.3 Surface level predictions on the acquisition of the relevant contrasts
Considering the perception and acquisition of these contrasts, Korean and Mandarin are similar with respect to the /i-ɪ/ contrast: both contain /i/, but not /ɪ/. In PAM-L2 terminology, a single-category assimilation is likely for both groups (i.e., English /i/ and /ɪ/ will be assimilated to Korean/Mandarin /i/). Previous studies have indeed found that both L1 Korean and L1 Mandarin speakers make this assimilation (Chen et al., 2001; Hwang and Lee, 2015; Jia et al., 2006; Kim et al., 2017). With respect to the /ɛ-æ/ contrast, both Korean and Mandarin contain a single mid-vowel phoneme, but differ with respect to their allophonic variability. Korean contains /ɛ/ but not /æ/ and has only a single allophone of /ɛ/. Mandarin also does not contain /æ/ and has a single mid-vowel phoneme, represented as /E/, but notably contains [ɛ] as an allophone of /E/ along with many others. If L2 perception is conditioned by L1 allophonic variability, then, given the wide range of allophones in the mid-region of the Mandarin vowel space, L1 Mandarin speakers may experience additional perceptual difficulties for the /ɛ-æ/ contrast relative to L1 Korean speakers. This hypothesis is based on the assumption that Mandarin speakers are more likely to accept a non-native vowel /æ/ as a deviant but possible version of the highly variable L1 mid-vowel phoneme, which already includes [ɛ] as one of its allophones.
Previous work on the acquisition of L2 vowels in the mid-range of the vowel space by native Mandarin speakers, who have high allophonic variability in this region, has largely overlooked allophonic variation in its analyses. For example, Flege et al. (1997) examined the perception and production of the two English vowel pairs by L1 German, L1 Spanish, L1 Korean, and L1 Mandarin speakers with different levels of L2 experience, focusing on the use of temporal and spectral cues for their perception study. The relevant results show that L1 Korean and L1 Mandarin speakers had difficulty both producing and distinguishing the two vowels in each pair (/i-ɪ/ and /ɛ-æ/) [see also Jia et al. (2006) and Wang et al. (2006) for similar findings with respect to Mandarin speakers' perception of the /ɛ-æ/ contrast]. Flege et al. (1997) speculated that L1 allophony may have hindered the English /æ/ productions, as “[æ]-quality vowels” occur in some contexts in Mandarin and can occur as a realization of /ɛ/ in Korean, but did not explore this topic in more detail. The recent merger of non-high front vowels (Ahn and Iverson, 2007; Kim-Renaud, 2009; Lee and Iverson, 2012; Shin et al., 2012) in Korean decreases the possibility that /ɛ/ is being produced similarly to [æ], suggesting that allophony is likely not a factor for L1 Korean learners, while it may still play a role for L1 Mandarin learners.
To conclude, many previous studies (e.g., Flege et al., 1997) and theories (i.e., SLM, PAM, L2LP) predict issues arising from category similarity in L2 phonological acquisition but have largely not considered the role of allophonic variability in the discriminatory abilities of L2 learners. This study aims to determine whether allophonic variability is a factor in the discrimination of L2 contrasts, or whether category acquisition is solely shaped by existing phonemic categories.
1.4 The current study
The current study examines the role of L1 allophonic variability in the perception (and acquisition) of L2 phonemic contrasts by examining the discrimination accuracy of the English vowel contrasts /i-ɪ/ and /ɛ-æ/ by L1 Korean and L1 Mandarin listeners. If allophonic variability plays a role in L2 perception, it is anticipated that L1 Mandarin listeners will be less successful than L1 Korean listeners in the discrimination of the /ɛ-æ/ contrast, due to the high variability of the Mandarin mid vowel. Couched within predominant theoretical models, the allophonic variability in the Mandarin mid-vowel space may cause listeners to accept an acoustically proximal L2 realization as an equivalent to the L1 mid-vowel. Korean speakers, with a narrower L1 space for their mid phoneme /ɛ/, may be less likely to assimilate English /æ/ to the L1 /ɛ/ category. In contrast, it is expected that both groups of listeners will perform similarly in the perception of the /i-ɪ/ contrast, as their relation to the L1 is similar in both languages.
2. Methods
2.1 Participants
Sixteen L1 Mandarin–L2 English speakers (Mage = 25.3, SD = 6.0) and 14 L1 Korean–L2 English speakers (Mage = 25.4, SD = 5.5) participated in the discrimination task. The two groups were well-matched for both self-rated L2 (English) proficiency on a 7-point Likert scale (L1 Mandarin M = 5.81, SD = 1.01; L1 Korean M = 6.71, SD = 0.80) and L2 age of acquisition (L1 Mandarin M = 8.6, SD = 4.0; L1 Korean M = 7.7, SD = 1.4).
2.2 Stimuli
The stimuli, 10 minimal pairs in total, consisted of /i-ɪ/ and /ɛ-æ/ inserted in monosyllabic minimal pairs with a CVC structure (e.g., /ɛ/ in pet). There were five pairs for the /i-ɪ/ distinction: sit–seat, sick–seek, pitch–peach, pip–peep, chip–cheap. There were five pairs for the /ɛ-æ/ distinction: bed–bad, dead–dad, pet–pat, guess–gas, deb–dab. Analysis, using frequency counts from Davies (2008), showed no significant differences in word frequencies between words containing /i/ and /ɪ/ [t(8) = 0.57, p = 0.58] or between words containing /ɛ/ and /æ/ [t(8) = 0.24, p = 0.82].
A native speaker of Midwestern American English (F, age 23) recorded the stimuli. For each of the “different” pairs (e.g., /bɛd–bæd/), the phonetic context around the vowel was acoustically identical (i.e., same recording), with different vowels spliced in. The vowels were extracted from the original recording beginning at the end of the release portion of the initial consonant and ending at the offset of periodicity and formant structure of the vowel. For the “same” pairs (e.g., /bɛd–bɛd/ and /bæd–bæd/), the phonetic context surrounding the vowels was also acoustically identical, but the two vowels came from two different recordings of the same word.
2.3 Procedure
Participants were recruited online using Prolific with the following constraints: location (United States), first (native) language (Korean or Mandarin), and age (18–40 years). Participants answered a brief background questionnaire and completed the perception task.
The perceptual discrimination task consisted of an AX discrimination paradigm. In each trial, participants heard an audio clip containing two words consecutively, with a 200 ms inter-stimulus interval, chosen based on similar AX task studies [see Nagle and Baese-Berk (2022) for an AX task review]. Participants were asked to indicate whether “the two vowels in the words” they heard were the same or different. The task was not timed; however, participants could not replay the audio. Both “same” and “different” pairs were included as stimuli, in equal numbers. In different pairs, the order of words with different vowels was counterbalanced (e.g., /æ–ɛ/ vs /ɛ–æ/). Each vowel contrast was presented in five different stimulus pairs in four different orientations, resulting in a total of 40 discrimination trials per participant. The order of presentation was randomized, and each participant received a different randomized order. Participant responses were coded as either correct (1) or incorrect (0). The final analysis included 1200 tokens (30 participants × 10 word pairs × 4 vowel orientations [e.g., /æ–ɛ/, /ɛ–ɛ/]).
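The trial counts above can be checked with a short sketch. This is only an illustration of the design as described, not the authors' experiment script: the function names, the dictionary layout, and the per-participant seed are hypothetical, and only the word pairs and the four orientations come from the text.

```python
import random

# Word pairs as listed in Sec. 2.2, keyed by vowel contrast.
PAIRS = {
    "i-ɪ": [("sit", "seat"), ("sick", "seek"), ("pitch", "peach"),
            ("pip", "peep"), ("chip", "cheap")],
    "ɛ-æ": [("bed", "bad"), ("dead", "dad"), ("pet", "pat"),
            ("guess", "gas"), ("deb", "dab")],
}


def build_trials(pairs=PAIRS):
    """Return one participant's AX trials: four orientations per word pair."""
    trials = []
    for contrast, word_pairs in pairs.items():
        for a, b in word_pairs:
            # Orientations: A-B and B-A ("different"), A-A and B-B ("same").
            for first, second in [(a, b), (b, a), (a, a), (b, b)]:
                trials.append({
                    "contrast": contrast,
                    "first": first,
                    "second": second,
                    "expected": "same" if first == second else "different",
                })
    return trials


def randomized_trials(seed):
    """Shuffle a fresh trial list; the seed is a hypothetical per-participant value."""
    trials = build_trials()
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
n_participants = 30
n_same = sum(t["expected"] == "same" for t in trials)

print(len(trials))                   # trials per participant
print(n_same, len(trials) - n_same)  # "same" vs "different" trials
print(len(trials) * n_participants)  # total tokens in the analysis
```

Enumerating the orientations explicitly makes the equal split between “same” and “different” trials, and the counterbalanced order of the “different” pairs, visible in the trial list itself.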
3. Results
Answers to the perception questions were marked as either correct “1” or incorrect “0” and entered into MATLAB R2021b for statistical analysis using a logistic regression mixed-effects model (MathWorks, 2021). This model included perceptual accuracy (binary, categorical) as a dependent variable and group (L1 Mandarin or L1 Korean) and contrast type (/i–ɪ/ or /ɛ–æ/) as fixed effects. Item and participant were included as random effects (random intercept). The results of this mixed effects model are shown in Table 1.
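For reference, a random-intercept logistic model of the kind described above can be written as follows. The notation is introduced here for illustration and is not the authors' own: G_p and C_i are assumed to be 0/1 indicators for the L1 Mandarin group and the /i–ɪ/ contrast (so that L1 Korean and /ɛ–æ/ are the reference levels, as the row labels of Table 1 suggest), u_p and v_i are the participant and item intercepts, and y_{pi} = 1 stands for whichever response level the model treats as the event.

```latex
\operatorname{logit} P(y_{pi} = 1)
  = \beta_0 + \beta_1 G_p + \beta_2 C_i + \beta_3 \, G_p C_i + u_p + v_i,
\qquad
u_p \sim \mathcal{N}(0, \sigma_u^2), \quad
v_i \sim \mathcal{N}(0, \sigma_v^2)
```

Under this coding, \beta_1 is the group difference for the /ɛ–æ/ contrast only, and \beta_1 + \beta_3 is the group difference for the /i–ɪ/ contrast.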
Table 1. Results of mixed-effects model 1.
Fixed effect | Estimate | SE | t-value | p-value | Odds ratio |
---|---|---|---|---|---|
Intercept | −3.08 | 0.51 | −6.03 | <0.001 | 0.046 |
Group: Mandarin | 0.98 | 0.39 | 2.54 | 0.011 | 2.66 |
Contrast type: /i–ɪ/ | 0.35 | 0.67 | 0.52 | 0.600 | 1.42 |
Group: Mandarin × Contrast type: /i–ɪ/ | −0.96 | 0.44 | −2.19 | 0.029 | 0.38 |
The results show that the L1 Mandarin group was significantly different from the L1 Korean group (β = 0.98, SE = 0.39, p = 0.011) with respect to overall perceptual discrimination accuracy; specifically, the L1 Mandarin group was less accurate than the L1 Korean group.
Contrast type (/i–ɪ/ or /ɛ–æ/) did not significantly affect discriminatory accuracy (p = 0.600) when adjusted for group and speaker. However, there was a significant interaction between group (L1 Mandarin or L1 Korean) and contrast type (/i–ɪ/ or /ɛ–æ/). Specifically, while performance was similar for the two groups for the /i–ɪ/ contrast, the L1 Korean group was significantly more accurate than the L1 Mandarin group for the /ɛ–æ/ contrast. While L1 Korean speakers showed no difference in accuracy between the two contrast types, L1 Mandarin speakers were more accurate for /i–ɪ/ than for /ɛ–æ/. This also shows that the main effect of group is driven primarily by the /ɛ–æ/ contrast, where Korean and Mandarin participants diverged in their performance. Figure 1 shows the average accuracy scores of the L1 Korean and L1 Mandarin groups for the two different contrast types.
Fig. 1. The average accuracy scores of the L1 Korean and L1 Mandarin groups for the two different contrast types.
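To make the interaction in Table 1 easier to read, the fixed-effect estimates can be combined into one log-odds value per group and contrast. The sketch below is an illustration only: it assumes 0/1 dummy coding with L1 Korean and /ɛ–æ/ as reference levels, sets the random intercepts to zero, and uses hypothetical names throughout. The source does not state which response level the model treats as the event, so the probabilities refer to that unspecified level.

```python
import math

# Fixed-effect estimates copied from Table 1 (log-odds scale).
COEF = {
    "intercept": -3.08,
    "mandarin": 0.98,
    "i_contrast": 0.35,
    "interaction": -0.96,
}


def log_odds(mandarin, i_contrast):
    """Linear predictor for one group-by-contrast cell (random effects set to 0)."""
    return (COEF["intercept"]
            + COEF["mandarin"] * mandarin
            + COEF["i_contrast"] * i_contrast
            + COEF["interaction"] * mandarin * i_contrast)


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Odds ratios are the exponentiated estimates.
odds_ratios = {name: math.exp(beta) for name, beta in COEF.items()}

# One linear predictor per cell of the 2 x 2 design.
cells = {
    (group, contrast): log_odds(g, c)
    for group, g in (("Korean", 0), ("Mandarin", 1))
    for contrast, c in (("ɛ-æ", 0), ("i-ɪ", 1))
}

for name, ratio in odds_ratios.items():
    print(f"{name:12s} odds ratio = {ratio:.3f}")

for (group, contrast), eta in cells.items():
    print(f"{group:8s} /{contrast}/  log-odds = {eta:+.2f}  p = {inv_logit(eta):.3f}")
```

Under these assumptions the two groups' cells are nearly identical for /i–ɪ/ (−2.73 vs −2.71) but differ for /ɛ–æ/ (−3.08 vs −2.10), which is the pattern the interaction term describes.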
Finally, to ensure that the results above were not an artifact of participant language profile, a separate logistic regression mixed-effects model was implemented to determine whether participants' language backgrounds were predictive of their discrimination accuracy. Perceptual accuracy was again used as a binary categorical dependent variable. The fixed effects were group (L1 Mandarin or L1 Korean), age of acquisition (AOA, in years), age of arrival (in years), and self-rated proficiency (1–7), with item and subject as random intercepts. The results (Table 2) show that none of the fixed effects significantly affected the discriminatory accuracy of the two speaker groups. It is important to note that age of arrival approached significance (p = 0.059), which may indicate a potential impact on discriminatory accuracy.
Table 2. Results of mixed-effects model 2.
Fixed effect | Estimate | SE | t-value | p-value | Odds ratio |
---|---|---|---|---|---|
Intercept | −1.65 | 1.16 | −1.42 | 0.155 | 0.52 |
Group: Mandarin | 0.23 | 0.33 | 0.69 | 0.490 | 1.26 |
AOA | −0.044 | 0.056 | −0.79 | 0.429 | 0.96 |
Age of arrival | 0.044 | 0.023 | 1.89 | 0.059 | 1.05 |
Proficiency | −0.21 | 0.15 | −1.33 | 0.182 | 0.82 |
4. Discussion
The current study examined the potential role of allophonic variability in L2 phonemic contrast perception by exploring the perception of the English /ɪ–i/ and /ɛ–æ/ contrasts by two L1 speaker groups: Korean and Mandarin. With respect to the /ɪ–i/ contrast, both Korean and Mandarin vowel inventories contain the /i/ phoneme but lack the /ɪ/ phoneme, and neither contains any significant allophonic variation of the /i/ phoneme; it was therefore predicted that the two groups would perform similarly on this contrast. With respect to the /ɛ–æ/ contrast, both Korean and Mandarin contain one phoneme in this mid-vowel range, /ɛ/ and /E/, respectively, but differ with respect to their allophonic variability. Specifically, Mandarin has multiple allophones, including [ɛ], while Korean has only a single allophone [ɛ]. If allophonic variability plays a role in L2 contrast discrimination, it was hypothesized that the L1 Mandarin group would be less accurate than the L1 Korean group for the /ɛ–æ/ contrast because of the pronounced allophonic variability in the mid-region of the Mandarin vowel space. For the /ɪ–i/ contrast, the two groups did indeed perform comparably, as predicted. For the /ɛ–æ/ contrast, the L1 Mandarin group evidenced significantly lower discriminatory accuracy compared to the L1 Korean group, suggesting that increased allophonic variability impacts L2 contrast perception. This difference was not attributable to language background factors (e.g., AOA, age of arrival, L2 proficiency).
The results of this study suggest that L1 Mandarin learners of English face greater difficulty with the /ɛ–æ/ contrast, plausibly as a result of significant allophonic variation in the mid-vowel region of their L1. Due to this variation, Mandarin listeners may perceive different phonetic realizations in the mid region of the vowel space as variations of the same native vowel phoneme /E/, including the target English /ɛ/ and /æ/ phonemes. On the other hand, Korean, although also lacking /æ/ in its phonemic inventory, has a more restricted space of variability for its /ɛ/, leading /æ/ to be less acceptable to L1 Korean speakers as a possible realization of the L1 Korean /ɛ/. From a theoretical perspective, using Flege's (1988, 1992) distinction between “similar” and “new” phones, Korean learners may perceive English /æ/ as less “similar” to native vowels than Mandarin learners do. As a result, equivalence classification between a similar native vowel and /æ/ is more likely for L1 Mandarin learners, while more successful discrimination of English /æ/ is likely for L1 Korean learners as separate category formation occurs. These results may also have pedagogical implications. Given that L1 Mandarin learners may face more difficulty acquiring the English /ɛ–æ/ contrast, special emphasis should be placed on teaching this contrast.
The current study highlights the importance of considering the entire native vowel system, including both the phonemic and allophonic structures, in L2 phonetic perception and acquisition. This finding parallels the SLM proposal that the acoustic-perceptual similarities between the sounds of the L1 and L2 need to be based on specific positional allophones. Nevertheless, research exploring the acquisition of L2 vowels following SLM did not consider the detailed allophonic variation and instead compared languages based on the presence of certain vowel qualities as independent phonemes in the L1 and the L2 inventories (Bohn and Flege, 1992; Chen et al., 2001; Morrison, 2008). Although this approach is likely sufficient for making relatively accurate predictions for many L1–L2 language pairings, the present study suggests that in instances of significant variation (e.g., mid vowels in Mandarin), taking allophonic variability into account can give a more refined view on the acquisition of particular L2 vowels. Results from this study also suggest that special attention should be given towards teaching L2 contrasts realized in areas of high variability in the L1 phonemic space.
5. Conclusion
The current study highlights the role of L1 allophonic variability in L2 contrast perception, with greater L1 allophonic variability (i.e., Mandarin mid-vowel allophones) resulting in greater difficulty in perceptual discrimination of an L2 contrast in a similar location in the acoustic space (i.e., English /ɛ–æ/). Future research may seek to complement the current findings by exploring other language pairings and other sounds (e.g., consonant natural classes). An increase in the number of participants can also serve to strengthen the results, as the low number of participants is a clear limitation in this study. Moreover, as participants in the current study all reported high levels of L2 proficiency, future research may extend these findings to beginning L2 learners, where the effect may be more pronounced.
Acknowledgments
Publication of this article was funded in part by Purdue University Libraries Open Access Publishing Fund.