French participants learned English pseudowords either with the orthographic form displayed under the corresponding picture (Audio-Ortho) or without (Audio). In a naming task, pseudowords learned in the Audio-Ortho condition were produced faster and with fewer errors, providing a first piece of evidence that orthographic information facilitates the learning and on-line retrieval of productive vocabulary in a second language. Formant analyses, however, showed that productions from the Audio-Ortho condition were more French-like (i.e., less target-like), a result confirmed by a vowel categorization task performed by native speakers of English. It is argued that novel word learning and pronunciation accuracy should be considered together.
1. Introduction
Adult learners of a second language (L2) rarely attain native-like pronunciation. One factor that may lead to non-target-like productions is exposure to the orthographic form of words. Unlike children, who learn to understand and speak their native language (L1) years before they learn to read, adult learners of an L2 often learn the written and spoken forms of words together. However, the extent to which orthography influences the L2 performance of adult learners, and which aspects of performance it affects, remain far from settled. The aim of the current study is to examine the influence of orthographic information on two aspects of language production in L2: pronunciation accuracy and novel word form learning (i.e., the ability to encode, access, and produce novel words). We concentrate on the case of L1–L2 pairs with a common alphabetic writing system, since the additional issues raised by the processing of unfamiliar writing systems are beyond the scope of the study.
The results of earlier studies paint a picture of a pervasive influence of L1 orthography on pronunciation accuracy in L2. Observations of spurious "spelling pronunciations" among language learners are common and, unsurprisingly, many studies have found that exposure to orthography may lead to productions that are less target-like. For L1 Italian learners of English, Bassetti and Atkinson (2015) found orthography effects on the pronunciation of words with silent letters, of vowels spelled with digraphs versus single letters, and of homophonous words with different spellings. Bassetti (2017) found differences in the pronunciation of English words with double versus single consonants by Italian learners (see also Rafat, 2016). Hayes-Harb et al. (2018) found an effect of L1 English orthography on the pronunciation of German devoiced obstruents, as did Young-Scholten and Langer (2015) in a study of learners in a natural immersion setting. In some cases, however, exposure to orthographic forms can lead to productions that are more target-like. For example, in a study of the production of assibilated/fricative rhotics of Spanish (e.g., the word-final rhotic in ahumar) by L1 English speakers, Rafat (2015) found that participants who both heard novel words and saw their written forms produced more rhotic sounds, while those who did not see the written form produced non-target-like fricatives [see also Erdener and Burnham (2005) and Nimz and Khattab (2019)].
A number of models have formalized the influence of the L1 phonological system on the perception and subsequent categorization of sounds in the L2 input (e.g., Flege, 1995; Best and Tyler, 2007; see Rafat, 2016 for an overview). The pronunciation of L2 words should reflect this phonemic categorization. None of these models, however, has integrated a role for L1 orthography-induced phonological transfer, although Best and Tyler (2007) do raise the possibility that the use of a common grapheme in L1 and L2 may lead L2 learners to "equate" phoneme categories, even when their realizations are phonetically quite different, e.g., the rhotics represented by ⟨r⟩ in French [ʁ] and English [ɹ].
Importantly, however, pronunciation is only one component of word learning. In order to produce words, speakers must first encode novel labels (i.e., phonological representations) and their associations with the corresponding concepts in long-term memory. To our knowledge, the question of whether orthography influences novel word learning as measured in L2 production tasks has not yet been examined. According to the dominant view in psycholinguistics, production and recognition/perception recruit distinct phonological representations (see Kittredge and Dell, 2016, for recent evidence and a literature review). Existing studies have all focused on receptive vocabulary,¹ and thus their findings cannot be taken to inform language production.
Ehri (2005) argues that knowledge of the alphabetic system allows readers to bond spellings to pronunciations in memory when they encounter novel written words (see also Ehri and McCormick, 1998). Research on L1 provides evidence that this orthographic information benefits spoken-word learning in children. For example, Ehri and Rosenthal (2007) taught second graders sets of novel English words. For half of these words, the orthographic form was presented during learning, while for the other half, no orthographic information was provided. In the naming test, words learned with their orthography were better recalled than those learned without. The same results were obtained with fifth graders; in addition, for this population, the contribution of orthographic information was greater for participants with higher reading abilities. The dual-coding theory (Sadoski, 2005) also predicts better memory for words learned with orthographic information, since lexical representations become stronger with each additional source of information. In the current study, we investigate whether the benefit of orthographic information extends to adult learners of an L2.
We test the hypotheses that the presentation of the orthographic form along with the auditory form (1) facilitates the learning of novel word forms and their retrieval from the lexicon, and (2) leads to less native-like pronunciation when the critical grapheme-to-phoneme correspondences (GTPCs) differ between L1 and L2. We focus on L1 speakers of French learning novel L2 English pseudowords.
2. Method
2.1 Participants
Twenty-six undergraduate students (20 women; age: 18–26 years, mean: 20.6) from the Université Grenoble Alpes (France) participated in the experiment for course credit. Participants were native speakers of French who reported normal or corrected-to-normal vision and no hearing impairment. All had English as an L2, with varying degrees of proficiency, and all reported spending most of their time speaking and hearing French rather than English (French: 88%, standard deviation (SD) = 12; English: 11%, SD = 8).
2.2 Materials
Stimuli were 20 monosyllabic English pseudowords with the structure C(C)VC(C), each of which was paired with a color picture of a rare animal, plant, or object. The pseudowords contained only consonant phonemes present in both English and French, and there were no minimal pairs. Half were spelled with ⟨i⟩ (e.g., lisk) and half with ⟨o⟩ (e.g., mog). Crucially, the French GTPCs for these graphemes (⟨i⟩ ∼ /i/, e.g., disque [disk] "disk"; ⟨o⟩ ∼ /ɔ/ in closed syllables, e.g., bogue [bɔɡ] "husk") differ both from the vowels produced in the spoken stimuli and from the most common North American English GTPCs: ⟨i⟩ ∼ /ɪ/, the "default rule" for this grapheme (Carney, 1994, p. 337), e.g., disk [dɪsk]; and ⟨o⟩ ∼ /ɑ/, generally in monosyllabic words, e.g., log [lɑɡ]. The graphemes ⟨i⟩ and ⟨o⟩ can also correspond to other phonemes in English (e.g., ⟨i⟩ ∼ /aɪ/, file; ⟨o⟩ ∼ /oʊ/, go). Note, however, that the vowel /ɪ/ is not present in the French inventory and that in French ⟨o⟩ never corresponds to /ɑ/. The pseudowords were: ⟨i⟩ biv, blit, disp, flid, glizz, lisk, mib, nif, vig, zick; ⟨o⟩ blop, flob, gosk, losp, mog, skock, sloz, stot, vod, zox. They were recorded by a 26-yr-old female native speaker of English from Winnipeg, Manitoba, Canada, using an AKG C520 head-worn microphone (AKG by Harman, Stamford, CT) and a Zoom H4nSP Handy recorder (Zoom, Tokyo, Japan) at a sampling rate of 48 kHz.
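To make the design logic concrete, the following minimal Python sketch encodes the default correspondences named above and checks that, for every pseudoword, the French GTPC and the North American English GTPC predict different vowels. It is an illustration under stated assumptions: the table `GTPC` and the helper names are hypothetical, and the table deliberately ignores the alternative English correspondences (⟨i⟩ ∼ /aɪ/, ⟨o⟩ ∼ /oʊ/).

```python
# Hypothetical illustration of the stimulus design (not the authors' materials script).
I_ITEMS = ["biv", "blit", "disp", "flid", "glizz", "lisk", "mib", "nif", "vig", "zick"]
O_ITEMS = ["blop", "flob", "gosk", "losp", "mog", "skock", "sloz", "stot", "vod", "zox"]

# Default correspondences for the critical vowel grapheme in closed monosyllables.
# Simplification: alternative English readings such as <i> ~ /aɪ/ are ignored.
GTPC = {
    "i": {"French": "i", "English": "ɪ"},
    "o": {"French": "ɔ", "English": "ɑ"},
}


def vowel_grapheme(item: str) -> str:
    """Return the single vowel letter of a C(C)VC(C) pseudoword."""
    vowels = [ch for ch in item if ch in "aeiouy"]
    if len(vowels) != 1:
        raise ValueError(f"expected exactly one vowel letter in {item!r}")
    return vowels[0]


def predicted_vowel(item: str, language: str) -> str:
    """Vowel phoneme predicted for the item by the given language's GTPC."""
    return GTPC[vowel_grapheme(item)][language]


if __name__ == "__main__":
    for item in I_ITEMS + O_ITEMS:
        fr = predicted_vowel(item, "French")
        en = predicted_vowel(item, "English")
        print(f"{item:6} French /{fr}/  English /{en}/")
```

For example, `predicted_vowel("nif", "French")` returns `"i"` while `predicted_vowel("nif", "English")` returns `"ɪ"`: the consonants are shared, but the vowel GTPCs conflict.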
2.3 Procedures
Each participant was tested individually in a quiet room on two successive days: the training session took place on the first day and the test session on the second. This allowed for the consolidation of the representations of the novel words, since newly learned words are more likely to be lexicalized after a night of sleep (Gaskell and Dumay, 2003).
Training session: Word learning (Day 1). Participants were told that they were going to learn new English words to be used in an American mobile phone app under development and that they would be tested on how well they had learned the words. Each of the 20 pseudowords was presented 20 times, once in each of 20 blocks, in randomized order. On each trial, a sound file was played over Sennheiser HD 212Pro headphones while its associated picture was displayed at the center of a computer screen for 4 s. Immediately after the offset of the image, the next sound file/picture pair was presented. For each participant, half (10) of the pseudowords were presented with the orthographic form displayed under the picture (Audio-Ortho condition) and half (10) without it (Audio only condition). The two conditions were counterbalanced across two experimental lists. The learning session lasted approximately 40 min. There was no practice session, and since no responses were collected, no feedback was given.
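The training structure can be sketched as a trial list. This is a minimal Python sketch, not the E-Prime script used in the study; the function names are hypothetical, and the way items are split between conditions (half of the ⟨i⟩ and half of the ⟨o⟩ items in each) is an assumption, since the text states only that the conditions were counterbalanced across two lists.

```python
import random

I_ITEMS = ["biv", "blit", "disp", "flid", "glizz", "lisk", "mib", "nif", "vig", "zick"]
O_ITEMS = ["blop", "flob", "gosk", "losp", "mog", "skock", "sloz", "stot", "vod", "zox"]


def counterbalanced_lists(i_items, o_items):
    """Return two complementary item-to-condition assignments (lists A and B).

    Assumption: each condition receives half of the <i> and half of the <o> items.
    """
    ortho_in_a = set(i_items[: len(i_items) // 2] + o_items[: len(o_items) // 2])
    list_a = {w: ("Audio-Ortho" if w in ortho_in_a else "Audio") for w in i_items + o_items}
    # List B swaps the condition of every item.
    list_b = {w: ("Audio" if c == "Audio-Ortho" else "Audio-Ortho") for w, c in list_a.items()}
    return list_a, list_b


def training_trials(assignment, n_blocks=20, seed=None):
    """Each item appears once per block; the order is reshuffled in every block."""
    rng = random.Random(seed)
    trials = []
    for block in range(1, n_blocks + 1):
        order = list(assignment)
        rng.shuffle(order)
        trials.extend((block, word, assignment[word]) for word in order)
    return trials


if __name__ == "__main__":
    list_a, list_b = counterbalanced_lists(I_ITEMS, O_ITEMS)
    trials = training_trials(list_a, seed=1)
    print(len(trials), trials[:3])
```

With 20 items and 20 blocks this yields 400 training trials per participant, each presented for 4 s.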
Test session: Picture naming (Day 2). Participants were asked to name each picture as quickly and as accurately as possible and their responses were recorded. Each picture was named four times (in separate blocks) by each participant. Within each block, presentation order was random. The experiment was controlled by E-Prime software (version 2.0, Schneider et al., 2012). Participants then completed a language background questionnaire.
3. Results
To assess learning of the novel L2 word forms, naming accuracy and response times were measured, and to assess pronunciation accuracy, formant analyses and a vowel categorization task were used.
3.1 Naming accuracy
Responses were coded as correct if all phones of the target pseudoword were produced in the correct order with no additional phones, and the vowel produced was in the same region of the vowel space as the target (e.g., for mog: [mɑɡ], [mæɡ], or [mɔɡ]). Other responses were coded as incorrect (e.g., for mog: [miɡ]). Coding was performed independently by two coders, and inter-coder agreement was high (κ = 0.99).
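The text reports inter-coder agreement as κ without naming the statistic. Assuming it is Cohen's kappa for two coders, it corrects the observed agreement p_o for the agreement p_e expected by chance from each coder's label frequencies: κ = (p_o − p_e)/(1 − p_e). A minimal Python sketch (the function name is illustrative):

```python
from collections import Counter


def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders who labelled the same responses."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(coder1)
    # Observed proportion of responses on which the two coders agree.
    p_observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement from the product of each coder's marginal label frequencies.
    c1, c2 = Counter(coder1), Counter(coder2)
    p_expected = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    a = ["correct", "correct", "incorrect", "incorrect"]
    b = ["correct", "incorrect", "incorrect", "incorrect"]
    print(cohens_kappa(a, b))  # 0.5: 75% raw agreement, 50% expected by chance
```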
The data of one participant who produced only one correct response were excluded from the analyses. The remaining 25 participants gave a total of 906 correct responses (45.3% of 2000 responses). Data analyses were conducted in R (R Core Team, 2018). A generalized linear mixed-effects model with a logit link function was fitted to these data, with learning modality [i.e., spoken form only (Audio) versus spoken and written form (Audio-Ortho)] and repetition (sum-coded) as fixed effects, Participant and Item as random intercepts, and by-Participant and by-Item random slopes for learning modality. Correct responses were more frequent in the Audio-Ortho than in the Audio modality [51% vs 39%, b = 0.67, standard error (SE) = 0.33, z = 2.02, p = 0.04]. The probability of producing an error was higher for the first repetition than for the other repetitions (b = 0.31, SE = 0.095, z = 3.25, p = 0.0012).
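The reported totals and the logit-scale coefficients can be checked and interpreted with a little arithmetic. The sketch below assumes a fully crossed design after exclusion (25 participants × 20 pictures × 4 repetitions), which reproduces the reported n = 2000, and converts a coefficient b into an odds ratio, exp(b). In a mixed-effects logistic model this ratio is conditional on the random effects, so it need not match an odds ratio computed from the raw percentages (51% vs 39%).

```python
import math

# Assumed fully crossed design after excluding one participant.
N_PARTICIPANTS, N_PICTURES, N_REPETITIONS = 25, 20, 4
n_trials = N_PARTICIPANTS * N_PICTURES * N_REPETITIONS
accuracy = 906 / n_trials


def odds_ratio(b: float) -> float:
    """Convert a logit-scale coefficient into an odds ratio."""
    return math.exp(b)


def inv_logit(x: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    print(n_trials, round(accuracy, 3))   # 2000 0.453
    print(round(odds_ratio(0.67), 2))     # 1.95 (modality coefficient)
    print(round(odds_ratio(0.31), 2))     # 1.36 (repetition coefficient)
```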
3.2 Response times
The data set was restricted to correct responses to the first presentation of each picture (201 responses). Three data points were removed because the participant initially produced an incorrect response. For each correct response, naming latency was defined as the time between the onset of the image and the onset of the vocal response. Visual inspection of the distribution led us to remove one data point below 600 ms and one above 3200 ms, leaving 196 data points for the statistical analyses. A linear mixed-effects model was fitted to the log-transformed naming times (the transformation indicated by the Box-Cox test), with learning modality as a fixed effect, Participant and Item as random intercepts, and by-Participant and by-Item random slopes for modality. Latencies were significantly shorter in the Audio-Ortho condition (1417 ms, SD = 608) than in the Audio condition (1609 ms, SD = 707; b = 0.13, SE = 0.053, t = 2.41, p = 0.023).
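The trimming rule and the transformation can be sketched as follows; the cutoffs are those reported above, and the helper names are hypothetical. The Box-Cox family (y^λ − 1)/λ reduces to log y as λ → 0, so a Box-Cox estimate of λ close to 0 motivates a log transform, while one close to 1 leaves the data essentially untransformed (as for F2 in Section 3.3). Estimating λ itself (e.g., by profile likelihood, as R's `MASS::boxcox` does) is not shown.

```python
import math


def trim_latencies(latencies_ms, lower=600.0, upper=3200.0):
    """Keep naming latencies within [lower, upper] ms (cutoffs from the text)."""
    return [rt for rt in latencies_ms if lower <= rt <= upper]


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value; lam == 0 gives the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


if __name__ == "__main__":
    kept = trim_latencies([550, 1417, 1609, 3400])
    print(kept)  # [1417, 1609]
    print([round(box_cox(rt, 0), 3) for rt in kept])
```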
3.3 Formant analyses
We conducted formant analyses to examine whether seeing the orthographic form of the word led to pronunciations compatible with French orthography. The data from all four repetitions were considered. For each correct response, the beginning and end of the pseudoword and of the vowel were labeled. All labeling and acoustic analyses were performed in Praat (Boersma and Weenink, 2017), using scripts to semi-automate the process. The first and second formants (F1, F2) were extracted at the vowel midpoint and hand-corrected where necessary (21 items). Thirty items with unclear formants were excluded, leaving 876 data points. The formant values were then normalized using the Bladon procedure, which is appropriate for data sets in which very few vowel categories are represented (see the evaluation in Flynn and Foulkes, 2011).
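For readers unfamiliar with the Bladon procedure, a minimal sketch follows. It assumes the formulation used, as we understand it, in common normalization software (e.g., the NORM suite and the R package vowels): each formant is converted to Bark with Traunmüller's (1990) formula, and 1 Bark is subtracted for female speakers. Traunmüller's corrections for very low and very high Bark values are omitted, the function names are illustrative, and the details may differ from the scripts used in the study.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) Hz-to-Bark conversion, without end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bladon_normalize(f_hz: float, female: bool) -> float:
    """Bladon-style normalization: Bark value minus 1 Bark for female speakers."""
    return hz_to_bark(f_hz) - (1.0 if female else 0.0)


if __name__ == "__main__":
    # Hypothetical F1/F2 values (Hz) for one vowel token from a female speaker.
    f1, f2 = 450.0, 1900.0
    print(round(bladon_normalize(f1, True), 2), round(bladon_normalize(f2, True), 2))
```

Because the shift is a constant per speaker, it affects comparisons between speakers of different sexes but leaves each speaker's Bark difference between conditions unchanged.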
In line with the French (L1) GTPCs, for ⟨i⟩, we expected vowels to be more /i/-like (French-like) in the Audio-Ortho than in the Audio modality, that is, higher and fronter, thus with lower F1 and higher F2. For ⟨o⟩, we expected vowels to be more /ɔ/-like (French-like) in the Audio-Ortho modality, that is, higher and backer and possibly rounded, thus with both lower F1 and lower F2. These predictions were borne out by the analyses (see Fig. 1).
A linear mixed-effects model was fitted to the log-transformed F1 values (following the Box-Cox test). Vowel, learning modality, and their interaction were entered as fixed effects. We included random intercepts for Participant and Item, by-Participant and by-Item random slopes for modality, and a by-Participant random slope for vowel. The interaction was not significant [F(1,15.85) = 1.16, p = 0.30]. The model without the interaction revealed main effects of vowel, with a lower F1 value for ⟨i⟩ than for ⟨o⟩ (b = 0.58, SE = 0.047, t = 12.31, p < 0.0001), and of modality, with lower F1 values in the Audio-Ortho modality (b = 0.052, SE = 0.021, t = 2.51, p = 0.022). The same analysis was performed for F2, using untransformed values (following the Box-Cox test). This model revealed a significant interaction between vowel and modality [F(1,16.95) = 4.56, p = 0.048], suggesting that seeing the orthographic form led to higher F2 values for ⟨i⟩ and lower F2 values for ⟨o⟩.
3.4 Vowel categorization task
We designed a follow-up experiment to determine whether the observed formant differences between conditions correspond to the perception of different vowel categories by native speakers of English. The stimuli included all correct response tokens from the naming task that were produced without hesitations or dysfluencies (465 ⟨i⟩ and 423 ⟨o⟩ tokens), as well as the 20 pseudowords produced by the model speaker of English and an additional 20 tokens produced by a naive native speaker of French. The experiment was run in two blocks (one for each vowel grapheme) using Praat ExperimentMFC. Listeners were 24 native speakers of North American English recruited in Aix-en-Provence, France (19 women; age: 19–70 years, mean: 30.1), with no reported speech or language disorders. They listened to the pseudoword tokens over headphones in randomized order and performed a forced-choice identification task, using keywords to indicate the vowel heard (for ⟨i⟩: beet, bit, or bet, corresponding to /i/, /ɪ/, or /ɛ/; for ⟨o⟩: huck, hoke, hock, or hack, corresponding to /ʌ/, /o/, /ɑ/, or /æ/). The order of presentation of the blocks and of the keywords was counterbalanced. Responses were coded as compatible with French GTPCs (⟨i⟩: beet /i/; ⟨o⟩: huck /ʌ/, hoke /o/) or incompatible (⟨i⟩: bit /ɪ/, bet /ɛ/; ⟨o⟩: hock /ɑ/, hack /æ/). In line with our predictions, tokens produced in the Audio-Ortho condition were more often categorized as French-orthography-compatible (78%) than tokens produced in the Audio only condition (65%; b = 0.72, SE = 0.22, z = 3.2, p = 0.0014). As expected, the vowels of the model native speaker of Canadian English were almost never (2%) categorized as French-orthography-compatible.
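The coding scheme described above amounts to a lookup from response keyword to a binary category. The Python sketch below (names are hypothetical) computes raw proportions of French-orthography-compatible categorizations per learning condition; the example responses are invented for illustration and are not the study's data, which were analyzed with a mixed-effects model.

```python
# Keyword -> whether it is compatible with the French GTPC of the vowel grapheme.
FRENCH_COMPATIBLE = {
    "i": {"beet": True, "bit": False, "bet": False},
    "o": {"huck": True, "hoke": True, "hock": False, "hack": False},
}


def compatibility_by_condition(responses):
    """responses: iterable of (condition, grapheme, keyword) triples.

    Returns the proportion of French-orthography-compatible choices per condition.
    """
    totals, hits = {}, {}
    for condition, grapheme, keyword in responses:
        compatible = FRENCH_COMPATIBLE[grapheme][keyword]  # KeyError on unknown keyword
        totals[condition] = totals.get(condition, 0) + 1
        hits[condition] = hits.get(condition, 0) + int(compatible)
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Invented example responses, for illustration only.
    demo = [
        ("Audio-Ortho", "i", "beet"),
        ("Audio-Ortho", "o", "hock"),
        ("Audio", "i", "bit"),
        ("Audio", "o", "hoke"),
    ]
    print(compatibility_by_condition(demo))  # {'Audio-Ortho': 0.5, 'Audio': 0.5}
```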
Note also that the probability of categorizing an occurrence as French-orthography-compatible was predicted for ⟨i⟩ by both F1 (b = 1.65, SE = 0.18, z = 9.2, p < 0.0001) and F2 (normalized) values (b = 1.38, SE = 0.13, z = 10.9, p < 0.0001) and for ⟨o⟩ by F1 (b = 1.15, SE = 0.14, z = 8.5, p < 0.0001), though not by F2 (normalized) values (b = 0.16, SE = 0.24, z = 0.67, p = 0.50).
4. Discussion
We investigated whether the presentation of the orthographic as well as the auditory form influenced two aspects of L2 learning: pronunciation accuracy and novel word form learning. Results showed a clear effect of orthographic information on pronunciation accuracy. Presentation of the orthographic form along with the auditory form led to less native-like productions of the novel words, whose vowel graphemes have different GTPCs in the participants' L1 and L2. This effect was found for a language pair in which both L1 and L2 have relatively opaque orthographies (i.e., lack one-to-one grapheme-to-phoneme and phoneme-to-grapheme correspondences), contra the original predictions of Erdener and Burnham (2005). These and other recent results (Rafat, 2016) highlight the importance of expanding models of the influence of the L1 phonological system on that of L2 to integrate the potential role of L1 orthography. We note that the orthography-induced phonological transfer observed here for L2 is in line with the hypothesis that orthography can modify the nature of the phonological representations in the L1. Perre et al. (2009), for example, showed with EEG recordings that a typical effect of orthography in the processing of spoken (L1) words (the phono-graphemic consistency effect) was localized in an area traditionally dedicated to phonological processing (the supramarginal gyrus) whereas no activation was observed in the area that codes orthographic information (the visual word form area) (see also Racine et al., 2014).
In addition, the results showed that the presentation of the orthographic form helped learners to successfully encode new items in the lexicon (fewer errors) and facilitated their retrieval (faster naming times). To our knowledge, the current study is the first to provide evidence that orthographic information facilitates the learning and on-line retrieval of new productive vocabulary in an L2. Several studies have examined the role of orthography in the learning of receptive vocabulary, with mixed results. In the paradigms typically used, participants learn novel (pseudo)words associated with pictures representing their meanings. In the training phase, pseudowords are presented either with auditory forms only or with both auditory and orthographic forms. In the test phase, a recognition task measures whether the new words have been memorized. In Escudero et al. (2008), L1 Dutch speakers of L2 English learned English pseudowords containing a highly confusable non-native vowel contrast (/ɛ/-/æ/, e.g., tenzer-tandik). Participants who were presented with both the spoken and written forms during training were better able to discriminate between target words and their competitors in subsequent tasks. In contrast, Simon et al. (2010) found no evidence that orthography helped listeners learn new words or a new phonemic contrast. In a study of native speakers of Spanish learning Dutch pseudowords, Escudero et al. (2014) examined whether the influence of orthography on word learning depended on whether pairs of vowel graphemes signal a phonemic contrast in both languages. For minimal pairs, access to orthographic forms in training facilitated word recognition when GTPCs signaled a phonemic contrast in both languages (e.g., pig-pug), while it hindered word recognition when they did not (e.g., pig-pieg) (see also Escudero, 2015).
Some studies report null results, from which firm conclusions cannot be drawn, and the failure to find effects may well be due to methodological issues. In the word recognition/picture mapping tasks commonly used in previous studies, for example, a correct response required participants to have retained only minimal phonological information (e.g., the onset or offset consonant). In some studies, responses are close to ceiling, which makes effects difficult to detect. Future studies could examine the contribution of orthographic knowledge by directly comparing performance in production and recognition tasks.
Of course, we must bear in mind that while the availability of orthographic information clearly facilitated word form learning in our study, it may do so only in cases where there is at least a partial overlap in the L1 and L2 GTPCs. For example, in our pseudowords, while the vowels have inconsistent GTPCs between L1 and L2, the consonants generally have consistent GTPCs (e.g., nif ∼ French /nif/, English /nɪf/), which may facilitate L2 word learning. In cases where there is little or no overlap, for example, for native speakers of French or English learning L2 Irish words like aghaidh /aɪ/, “face” or Aodh /iː/ (a boy's name), orthography may not be helpful. We also note that our participants had different degrees of proficiency in L2 English, and that further research is needed to establish how (or if) L2 proficiency modulates the influence of orthography on the learning of new productive vocabulary and on pronunciation accuracy.
Our results show that a nuanced view of the influence of orthographic information on L2 learning is needed: its effects cannot simply be characterized as "positive" or "negative," "friend" or "foe." In a single experimental task with the same materials, we found that the presentation of orthographic information led both to more successful novel word learning and to less accurate, more L1-like vowel pronunciation. The effect on pronunciation may depend not only on whether orthographic forms are presented but also on when they are presented. In our study, the orthographic forms of the words learned in the Audio only condition were never presented. In many L2 learning contexts, however, even if a word is first learned by ear, learners will eventually encounter its written form. The potential influence of the timing of orthographic information on pronunciation accuracy and on novel word form learning, in both the short and the long term, is worth exploring.
The datasets and scripts to reproduce the analyses reported in this paper are stored on the Open Science Framework website and can be accessed via the following link: https://osf.io/rfjh6/.
Acknowledgments
This work was partly supported by the University of Potsdam KoUP Cooperation Funding scheme and by a BQR grant from LPL. Research supported by Grant Nos. ANR-16-CONV-0002 (ILCB) and ANR-11-LABX-0036 (BLRI) and the Excellence Initiative of Aix-Marseille University (A*MIDEX). This paper is partly based on a master's thesis submitted by M.C. to Université Grenoble Alpes.
¹ The speech production tasks of studies such as Erdener and Burnham (2005) and Rafat (2015, 2016) demonstrate the automaticity of the influence of L1 orthography. However, they required participants to retain (pseudo)words of an unfamiliar language for no more than approximately 16 s.