Previous studies suggest that listeners may use segmental coarticulation cues to facilitate spoken word recognition. Based on existing production studies which showed a pre-low raising effect in Cantonese tonal coarticulation, this study used a word identification task to investigate whether the tonal coarticulatory cue, carried by high-level and rising tones, was used when native listeners recognized pre-low and pre-high disyllabic words. The findings indicated that the listeners may rely on the F0 of the rising tone to resolve lexical competition when hearing pre-high words. However, they did not provide evidence supporting the use of the pre-low raising cue in spoken word recognition.

A fundamental question in speech perception is how listeners deal with the variability of speech sounds, which are often articulated differently under the influence of neighboring sounds. The sound variability induced by preceding and upcoming sounds is referred to as coarticulation (Kühnert and Nolan, 2009). While listeners may find it challenging to identify sounds that undergo coarticulatory modification in different phonetic contexts, they may also use coarticulation cues, which are available early in the acoustic signal, to anticipate upcoming speech sounds. Previous studies examined how anticipatory coarticulation cues of vowels (Beddor et al., 2013; Fowler and Brown, 2000) and consonants (Salverda et al., 2014) were used to identify or discriminate (English) words. The findings suggested that coarticulation cues may facilitate spoken word recognition, with participants showing faster and/or more accurate responses when the cues are available. For instance, Beddor (2009) found a trade-off relationship between (assimilatory) vowel nasalization and the upcoming nasal consonant in the discrimination and identification of English words (e.g., bent). Importantly, among word pairs such as bet-bent, the coarticulatory cue of vowel nasalization alone was sufficient to bias listeners' responses toward bent (vs bet). Beddor et al. (2013) then conducted a spoken word recognition study to further examine English listeners' use of coarticulatory vowel nasalization to anticipate the upcoming nasal consonant during online processing. When participants heard a target word (e.g., bent; heard in the signal) in which coarticulatory nasalization began early or late in the vowel, they were faster to respond by clicking the mouse if the competitor word (not heard in the signal) did not have a nasal following the vowel (e.g., bet; the target and competitor were mismatched in the pattern of vowel nasalization).
In contrast, they were slower if the competitor word had a nasal following the vowels (e.g., bend; the target and competitor were matched in the pattern of vowel nasalization). Given listeners' use of anticipatory coarticulation cues in segments, the present study examines whether, and if so how, anticipatory coarticulation cues of lexical tones facilitate recognition of spoken words.

Pitch (i.e., F0) variations are used to distinguish lexical meanings in tonal languages. For instance, segmentally identical words that contain different lexical tones in Cantonese [/si 55/ “silk” (tone 1 (T1)), /si 25/ “history” (T2), /si 33/ “to try” (T3), /si 21/ “time” (T4), /si 23/ “city” (T5), and /si 22/ “matter” (T6)] differ in meaning (Yip, 2006). While pitch distinguishes the identity of isolated tones in tone languages, the pitch contours of lexical tones in continuous speech deviate considerably from their isolated forms depending on the preceding and following tones. These contextual tonal variations are due to a phonetic effect of tonal coarticulation, which has been documented in tone languages such as Mandarin (Xu, 1997) and Thai (Gandour et al., 1996). For instance, Xu (1997) found an anticipatory effect of tonal dissimilation in Mandarin. In particular, the (dissimilatory) anticipatory effect was attributed to a pre-low raising (PLR) mechanism, that is, the F0 values of a high tone become higher when preceding a low tone (a High-Low sequence; a pre-low word) than when preceding a high tone (a High-High sequence; a pre-high word) (Lee et al., 2017). The PLR phenomenon has also been documented as a type of anticipatory tonal coarticulation in Cantonese (Gu and Lee, 2009; Lee et al., 2021). Lee et al. (2021) found that Hong Kong Cantonese speakers produced PLR in the first syllable (i.e., syllable 1) when it carried T1 (a high-level tone, /55/) or T2 (a high-rising tone, /25/) [Gu and Lee (2009) found the effect for T2, but not for T1]. In other words, the Cantonese high tones in syllable 1 became even higher in a pre-low word (i.e., preceding tones with a low pitch onset in the second syllable, that is, syllable 2) than in a pre-high word (i.e., preceding tones with a high pitch onset in syllable 2). The effect was found to interact with speech rate, with a larger PLR effect in normal or fast speech than in slow speech.
While tonal coarticulation has been well-documented in production studies, it is less clear whether listeners use tonal coarticulatory cues in perception. Most tone perception studies tested the perception of isolated Mandarin tones and Cantonese tones (Chang et al., 2017; Qin and Jongman, 2016). However, few studies, as far as we know, have tested whether listeners use the anticipatory PLR cue, found in tone production, in spoken word recognition.

Since it remains unclear whether listeners use tonal coarticulation cues to anticipate the following tone, this study used the anticipatory PLR cue as a test case of tonal coarticulation and examined whether Cantonese listeners used it to recognize spoken words. Based on previous findings on segmental coarticulatory cues [e.g., Beddor (2009)], this study conducted a word identification task to examine whether Cantonese listeners would use the PLR cue, carried by syllable 1, to identify Cantonese disyllabic words. If listeners use the (dissimilatory) anticipatory cues of tones in the same way as they use the (assimilatory) anticipatory cues of segments [e.g., nasalization in Beddor (2009) and Fowler and Brown (2000)], the PLR cue will facilitate word identification, with the mismatch condition (i.e., two words having different patterns of PLR, such as pre-low vs pre-high words) yielding higher identification accuracy and/or faster responses than the match condition (i.e., two words sharing the same pattern of PLR, such as two pre-low words). It was an open question whether the expected effect (i.e., match vs mismatch) would interact with other factors such as PLR carriers (T1 or T2) and word types (pre-high or pre-low target words).

Thirty native speakers of Macau Cantonese (15 female; mean age = 20; range = 18–24) were recruited from the University of Macau for the perception study. All participants spoke Macau Cantonese (henceforth, Cantonese) as their dominant language (self-reported percentage of using Cantonese = 75%). They had learned Mandarin and English in primary school. None of the participants reported speech or hearing impairments. They were monetarily compensated for their time.

The auditory stimuli were Cantonese disyllabic words which carried either a high tone (e.g., 中間 “middle”) or a low tone (e.g., 中年 “middle-aged”) in the second syllable. The first syllable always carried a high-level tone (T1, /55/) or a high-rising tone (T2, /25/). Both tones have a high pitch offset and were found to carry the PLR cue (Gu and Lee, 2009; Lee et al., 2021). The word type was manipulated based on the tone carried by syllable 2. When the tone of this syllable was T1 /55/, which has a high pitch onset, the tone of syllable 1 was not raised (pre-high words, which did not trigger PLR). When the tone of this syllable was T4 /21/, T5 /23/, or T6 /22/, which all have a low pitch onset, the tone of syllable 1 was raised (pre-low words, which triggered PLR) relative to that in the pre-high words (see Fig. 1).

A female native speaker of Cantonese recorded the disyllabic word stimuli at a normal speech rate three times in a sound-proof room using an AKG C520 microphone linked to a digital recorder. The recordings were made at a sampling rate of 44 100 Hz with 16 bits per sample. One natural token for each target disyllabic word was chosen by the investigators based on its intelligibility and sound quality. Consistent with previous findings of the PLR effect (Gu and Lee, 2009; Lee et al., 2021), as illustrated in Fig. 1, the natural stimuli showed a higher F0 value (i.e., higher by 9 Hz) in the pre-low target words than in the pre-high target words when syllable 1 carried T1 (Pre-Low: F0 mean = 254 Hz; Pre-High: F0 mean = 245 Hz) and T2 (Pre-Low: F0 mean = 170 Hz; Pre-High: F0 mean = 161 Hz). The naturally produced words had an average duration of 679 ms (duration range: 440–857 ms). The stimuli were scaled to 70 dB using Praat (Boersma and Weenink, 2018).
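As a rough illustration of the size of the PLR cue in these stimuli, the reported F0 means can be compared in Hz and, as an added assumption, in semitones. The study reports the difference in Hz only; the semitone conversion and the function name `plr_difference` are introduced here for illustration.

```python
import math

# Mean F0 (Hz) of syllable 1 as reported for the natural stimuli.
F0_MEANS = {
    "T1": {"pre_low": 254.0, "pre_high": 245.0},
    "T2": {"pre_low": 170.0, "pre_high": 161.0},
}

def plr_difference(pre_low_hz, pre_high_hz):
    """Return the pre-low minus pre-high difference in Hz and in semitones.

    The semitone value is 12 * log2(ratio); this scale is an assumption
    added for illustration and is not used in the study itself.
    """
    diff_hz = pre_low_hz - pre_high_hz
    diff_st = 12.0 * math.log2(pre_low_hz / pre_high_hz)
    return diff_hz, diff_st

results = {
    tone: plr_difference(f0["pre_low"], f0["pre_high"])
    for tone, f0 in F0_MEANS.items()
}

for tone, (hz, st) in results.items():
    print(f"{tone}: {hz:.0f} Hz, {st:.2f} semitones")
```

Because the T2 means are lower than the T1 means, the same 9 Hz difference corresponds to a larger F0 ratio for T2 than for T1 under this conversion.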

In each trial of the word identification task, participants heard a target word and saw two printed words in a visual display at the onset of the auditory stimulus. The test trials consisted of 18 quadruplets (2 tones in syllable 1 × 3 tones in syllable 2 × 3 trials) of Cantonese disyllabic words which shared syllable 1 but differed in syllable 2, such as 中間 (“middle,” pre-high), 中心 (“center,” pre-high), 中年 (“middle-aged,” pre-low), and 中華 (“China,” pre-low). While the first two words have a high tone in syllable 2 (the pre-high words), the last two words have a low tone in syllable 2 (the pre-low words). The PLR condition was manipulated based on whether the two words in each trial were matched or mismatched in the PLR pattern (pre-low vs pre-high). For instance, the target word (heard in the signal) 中間 (“middle,” a pre-high word without PLR) was paired with a competitor word (not heard in the signal) 中心 (“center,” another pre-high word without PLR) in the match condition, whereas it was paired with another competitor word 中年 (“middle-aged,” a pre-low word with PLR) in the mismatch condition. Likewise, the target pre-low word (e.g., 中年 with PLR) was paired with other pre-low words (e.g., 中華 with PLR) in the match condition (see footnote 1) or with pre-high words (e.g., 中間 without PLR) in the mismatch condition. Importantly, the same stimuli of the target words were heard in the match and mismatch conditions. The visually displayed competitor words were matched between the two conditions on a scale of word familiarity (see footnote 2; match: 3.35; mismatch: 3.43; 1 meaning “never heard or said” and 5 meaning “very often heard and said”) and on the number of strokes (match: 17; mismatch: 18).
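The pairing logic described above can be sketched as follows. This is a minimal illustration using the example quadruplet from the text; the function name `build_test_trials` is hypothetical, and picking the first available word as the mismatch competitor is an assumption, since the text does not state how that competitor was selected.

```python
# Example quadruplet from the text: (word, word type).
QUADRUPLET = [
    ("中間", "pre_high"),
    ("中心", "pre_high"),
    ("中年", "pre_low"),
    ("中華", "pre_low"),
]

def build_test_trials(target, quadruplet):
    """Pair a target with a same-type (match) and an other-type (mismatch) competitor.

    Assumption: the first candidate of the other word type is used as the
    mismatch competitor; the selection rule is not specified in the source.
    """
    word, word_type = target
    same_type = [w for w, t in quadruplet if t == word_type and w != word]
    other_type = [w for w, t in quadruplet if t != word_type]
    return [
        {"target": word, "competitor": same_type[0], "condition": "match"},
        {"target": word, "competitor": other_type[0], "condition": "mismatch"},
    ]

# One pre-high and one pre-low target per quadruplet, each in both conditions.
trials = (
    build_test_trials(QUADRUPLET[0], QUADRUPLET)
    + build_test_trials(QUADRUPLET[2], QUADRUPLET)
)

for trial in trials:
    print(trial)
```

Each quadruplet contributes four test trials in this sketch, and the heard target is identical across the match and mismatch versions of a trial, so any condition difference cannot come from the audio itself.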

Prior to the experiment, participants completed a language background questionnaire. During the task, as mentioned above, participants were instructed to identify a target Cantonese disyllabic word from two words displayed (printed in traditional Chinese characters) on the left and right sides of the screen. Specifically, they were instructed to press one of two buttons, labeled “right” and “left” on the keyboard, to indicate which word they had heard as quickly as possible (i.e., a speeded task) right after an auditory word was played to them. The positions of the target and competitor words were counterbalanced in the visual display throughout the experiment. No feedback was given. The time-out was 2 s from word onset (see footnote 3). Participants' response accuracy (0 or 1) and reaction time (RT) were recorded. RT was also measured from word onset.

Given the 2 word types (pre-high and pre-low) by 2 conditions (match and mismatch) design, there were 72 test trials in total (18 quadruplets × 2 types of words × 2 matching conditions; see the supplementary material, footnote 4, for a complete word list). There were 144 filler trials. Among them, 36 filler trials had as targets the words (e.g., 中心 “center” or 中華 “China”) that were displayed (but not heard) in test trials, so that each word in the quadruplet was heard during the experiment. The other fillers were an additional 18 quadruplets of Cantonese disyllabic words which carried other tones (e.g., T3, T4, and T6) in syllable 1. The task had three blocks, with the 72 test trials and 144 filler trials presented once in each block. Before the actual experiment, the participants familiarized themselves with 12 practice trials, randomly selected from the filler trials. The perception task was conducted using Paradigm software (Perception Research Systems, Inc.). It took approximately 25–30 min to complete this task.
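The trial counts above follow from simple arithmetic, sketched here as a check. The per-block and overall totals assume that every block contains all test and filler trials once, which is how the description of the three blocks is read here; the variable names are illustrative.

```python
# Design factors as stated in the text.
TONES_SYLLABLE_1 = 2          # T1, T2
TONES_SYLLABLE_2 = 3          # three low tones in syllable 2
TRIALS_PER_COMBINATION = 3
WORD_TYPES = 2                # pre-high, pre-low
CONDITIONS = 2                # match, mismatch

quadruplets = TONES_SYLLABLE_1 * TONES_SYLLABLE_2 * TRIALS_PER_COMBINATION
test_trials = quadruplets * WORD_TYPES * CONDITIONS

FILLER_TRIALS = 144
FILLERS_FROM_TEST_QUADRUPLETS = 36
other_fillers = FILLER_TRIALS - FILLERS_FROM_TEST_QUADRUPLETS

# Assumption: each of the three blocks presents every trial once.
BLOCKS = 3
trials_per_block = test_trials + FILLER_TRIALS
total_trials = trials_per_block * BLOCKS

print(quadruplets, test_trials, other_fillers, trials_per_block, total_trials)
```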

The thirty participants completed the perception task. Missing responses were excluded from analysis, which resulted in a loss of 0.04% of the data. Two participants whose word identification accuracy was lower than 75% were removed prior to analysis (see footnote 5). The remaining twenty-eight participants' identification accuracy and RT of correct responses were submitted for statistical analysis. Removing items with incorrect responses resulted in a loss of 4.4% of the RT data.
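The exclusion steps can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, and all accuracy and RT values are invented to show the logic; they are not the study's data.

```python
def retain_participants(accuracy_by_participant, threshold=0.75):
    """Keep participants whose overall accuracy is at or above the threshold."""
    return {
        pid: acc
        for pid, acc in accuracy_by_participant.items()
        if acc >= threshold
    }

def correct_rts(trials):
    """Keep RTs from answered, correct trials only (missing responses have rt_ms=None)."""
    return [
        t["rt_ms"]
        for t in trials
        if t["rt_ms"] is not None and t["correct"] == 1
    ]

# Invented values for illustration only.
accuracy = {f"P{i:02d}": 0.95 for i in range(1, 29)}
accuracy.update({"P29": 0.60, "P30": 0.70})
kept = retain_participants(accuracy)

example_trials = [
    {"rt_ms": 812.0, "correct": 1},    # kept for the RT analysis
    {"rt_ms": 1040.0, "correct": 0},   # incorrect: dropped from the RT analysis
    {"rt_ms": None, "correct": 0},     # missing response: dropped entirely
]
rts = correct_rts(example_trials)

print(len(kept), rts)
```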

The accuracy data, as illustrated in Fig. 2, were entered into a mixed-effects logistic regression model with a maximal random-effects structure that converged without error (Matuschek et al., 2017). Word type (Pre-high, Pre-low; deviation coded), condition (Match, Mismatch; deviation coded), and the tone type of syllable 1 (henceforth, Tone type; T1 and T2; deviation coded) were fixed effects. There were by-participant and by-item random intercepts, with by-item random slopes for the effects of Word type and Tone type. The dependent variable was accuracy (0 or 1). p values were calculated using the lmerTest package (Kuznetsova et al., 2018) in R.

The model returned a significant intercept (β = 3.60, SE = 0.19, z = 18.45, p < 0.001), but none of the main effects was significant (Word type: β = 0.25, SE = 0.29, z = 0.87, p = 0.38; condition: β = 0.14, SE = 0.14, z = 1.03, p = 0.30; Tone type: β = 0.23, SE = 0.29, z = 0.43, p = 0.43). There was a significant interaction between Word type and condition (β = 1.09, SE = 0.28, z = 3.92, p < 0.001) as well as a significant three-way interaction with Tone type (β = 1.59, SE = 0.56, z = 2.87, p = 0.004). Given the three-way interaction, post hoc analyses were conducted separately for the pre-high and pre-low words. The analysis of the pre-high words revealed a main effect of condition (β = 0.69, SE = 0.17, z = 4.06, p < 0.001), with the mismatch condition yielding a higher identification accuracy than the match condition, and a significant interaction between condition and Tone type (β = 1.21, SE = 0.34, z = 3.53, p < 0.001). Further analysis showed a significant effect of condition for T2 stimuli (β = 1.32, SE = 0.25, z = 5.25, p < 0.001) among pre-high words, with the mismatch condition having a higher identification accuracy than the match condition. The effect of condition was not significant for T1 stimuli (β = 0.09, SE = 0.23, z = 0.40, p = 0.69) among pre-high words. In contrast, none of the effects was significant in the analysis of the pre-low words (condition: β = 0.40, SE = 0.22, z = 1.83, p = 0.07; Tone type: β = 0.28, SE = 0.26, z = 1.09, p = 0.28; interaction: β = 0.40, SE = 0.44, z = 0.91, p = 0.36).
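For readers who want to relate the reported statistics to one another, the sketch below shows how a Wald z value and a two-sided p value follow from an estimate and its standard error under a standard normal reference distribution. The function name is hypothetical, and because the reported β and SE values are rounded, recomputed z values may differ slightly from those in the text.

```python
import math

def wald_z_and_p(beta, se):
    """Return the Wald z statistic and its two-sided p value.

    z = beta / se; the two-sided p under a standard normal is
    2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    """
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Effect of condition among pre-high words (beta = 0.69, SE = 0.17).
z_high, p_high = wald_z_and_p(0.69, 0.17)

# Effect of condition among pre-low words (beta = 0.40, SE = 0.22).
z_low, p_low = wald_z_and_p(0.40, 0.22)

print(f"pre-high: z = {z_high:.2f}, p = {p_high:.5f}")
print(f"pre-low:  z = {z_low:.2f}, p = {p_low:.3f}")
```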

The RT data of correct responses were log-transformed to adjust for a positive skew in the raw data (Larson-Hall, 2015). The log-transformed RT data, as shown in Fig. 3, were then entered into a linear mixed-effects model with a maximal random-effects structure that converged without error. Word type (Pre-high, Pre-low; deviation coded), condition (Match, Mismatch; deviation coded), and the tone type of syllable 1 (Tone type; T1 and T2; deviation coded) were fixed effects. There were by-participant and by-item random intercepts, with by-item random slopes for the effects of Word type and Tone type. A back-fitting function (i.e., bfFixefLMER_F.fnc) from the package LMERConvenienceFunctions (Tremblay and Ransijn, 2015) in R, which works exclusively for linear mixed-effects models, was used to identify the best model, that is, the model that accounted for significantly more of the variance than simpler models, as determined by log-likelihood ratio tests. The model with the best fit included none of the fixed effects, so no model estimates are reported. The analysis of log-transformed RT (see footnote 6) thus did not reveal any significant effect.
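The motivation for the log transform can be illustrated with a small sketch. The RT values below are invented for illustration and are not the study's data; the skewness helper is a plain moment-based estimate written for this example.

```python
import math

def skewness(values):
    """Moment-based sample skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / (m2 ** 1.5)

# Invented RTs (ms) with a long right tail, as is typical of RT data.
raw_rts = [620, 650, 700, 720, 760, 800, 900, 1100, 1500, 1900]
log_rts = [math.log(rt) for rt in raw_rts]

raw_skew = skewness(raw_rts)
log_skew = skewness(log_rts)

print(f"raw skewness: {raw_skew:.2f}")
print(f"log skewness: {log_skew:.2f}")
```

In this toy sample the log transform pulls in the long right tail and reduces the positive skew, which brings the data closer to the distributional assumptions of a linear mixed-effects model.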

The present study used a word identification task to investigate whether Cantonese listeners would use the anticipatory PLR cue, carried by T1 and T2 in syllable 1, to identify Cantonese disyllabic words. Specifically, the listeners' accuracy and RT were compared between the match and mismatch conditions for pre-high words and pre-low target words, respectively. The results of accuracy showed that, when hearing the pre-high words which carried T2 in syllable 1, the participants had a higher accuracy when the target and competitor words were mismatched, rather than when they were matched, in the PLR effect. No such difference was found when they heard the pre-low words. In addition, the effect of condition was not found for accuracy of target words which carried T1 and RT data of all the target words.

The results of accuracy first indicated that, as expected, the mismatch condition yielded higher accuracy of word identification than the match condition for the pre-high target (T2) words. When hearing a pre-high word (which did not trigger PLR), the Cantonese target and competitor words revealed more lexical competition (i.e., more errors) when the competitor word was also a pre-high word in the match condition, and they revealed less competition (i.e., fewer errors) when the competitor word was a pre-low word in the mismatch condition. When the anticipatory PLR cue was not heard in the auditory stimuli, the listeners judged accurately that the target word had a high tone (i.e., not a low tone) in syllable 2, specifically when the competitor word had a different pattern of PLR. In other words, when the F0 (of T2) in syllable 1 of pre-high words was more canonical, as it was not raised by the PLR effect, Cantonese listeners relied on the F0 cue to resolve lexical competition during spoken Cantonese word recognition. However, without the anticipatory PLR cue heard in auditory stimuli, the finding of pre-high words did not directly show that the listeners used the anticipatory PLR cue during spoken Cantonese word recognition [e.g., Dahan et al. (2001)].

The results of accuracy further revealed that a high accuracy of word identification (above 0.95) was shown in both the match and the mismatch conditions for the pre-low target (T2) words. Unexpectedly, no difference in accuracy was found between the two conditions. When the anticipatory PLR cue was heard in the auditory stimuli, the cue did not seem to help the listeners determine that the target word had a low tone (i.e., not a high tone) in syllable 2. In other words, when the F0 (of T2) in syllable 1 of pre-low words was less canonical, in the sense of being raised by the PLR effect, Cantonese listeners did not seem to rely on the PLR cue to recognize spoken Cantonese words. Hearing deviant acoustic forms of F0 (higher F0 than expected) in syllable 1 might have disrupted the process of spoken word recognition [see a similar disruptive effect in Qin et al. (2019)] and thus resulted in no significant difference between the match and mismatch conditions. To this end, further experiments are needed to include a condition in which the PLR cue in pre-low words is removed through acoustic manipulation (e.g., stimuli cross-spliced from pre-high words) to verify the disruptive effect of the PLR cue (Dahan et al., 2001; Salverda et al., 2014). On the other hand, the anticipatory PLR cue in the auditory stimuli may not have been salient enough to directly facilitate the spoken word recognition by yielding a higher accuracy in the mismatch condition than in the match condition. For instance, Lee et al. (2021) found a difference of 10–20 Hz, depending on speech rates, between the pre-low words and pre-high words in Hong Kong Cantonese. However, the T1 and T2 between the target pre-low words and pre-high words (Macau Cantonese produced in a normal speech rate) yielded a difference of 9 Hz (see Fig. 1), which suggests a relatively nuanced PLR cue possibly under the influence of dialectal and/or speech rate variations. 
Meanwhile, it should be noted that the listeners did not recognize spoken words based on the acoustic difference as they only heard a target word, with competitor words visually displayed, in each trial. While the present study was designed to examine the use of the anticipatory PLR cue in natural stimuli, further research should resynthesize tone stimuli (i.e., cross-spliced stimuli with a greater acoustic difference) to investigate whether the PLR cue and its salience would modulate listeners' use of the cue during spoken word recognition.

The results of accuracy also indicated that the condition difference was found for the (pre-high) target words which carried T2 in syllable 1, but not for those which carried T1 in syllable 1. The difference between the T1 and T2 stimuli is presumably attributable to the more salient PLR cue of the T2 stimuli. As illustrated in Fig. 1, the T1 stimuli of our study showed a higher average F0 (by 9 Hz) for the pre-low words than for the pre-high words. In addition to the higher average F0 (by 9 Hz), the T2 stimuli also showed a steeper rising slope, on which Cantonese listeners may rely to perceive tones (Gandour, 1981), in the second part of the syllable for the pre-low words than for the pre-high words. The salience account is consistent with production studies which showed a larger PLR effect for T2 than for other tones such as T1 (Gu and Lee, 2009). Compared with T2, which is a rising tone, it is possible that the PLR effect in production was less robust for T1, which is a level tone. For instance, Lee et al. (2021) showed that the PLR effect for T1 production was only observed in some cases (e.g., T1T5 vs T1T1) in faster speech, but not necessarily in normal or slower speech.

Identification accuracy might only reflect the decision process and may have masked the anticipatory process (Beddor et al., 2013). Thus, the present study included the measure of RT (also see footnote 3 for the short time-out time) in an attempt to shed light on the online use of the PLR cue in (speeded) word identification [chapter 3.3 in Cutler (2012) and Yip (2015)]. However, contrary to our expectation, the results of RT did not reveal any condition difference for either pre-high or pre-low words (see footnote 7). The results differ from those of previous studies that reported participants' faster responses (and faster eye fixations) when the target and competitor words were mismatched than when they were matched in the coarticulatory nasalization pattern (Beddor et al., 2013; Fowler and Brown, 2000). Methodological differences may explain the discrepancy between the present study and previous studies. The eye-tracking paradigm, which was used in the previous studies [e.g., Beddor et al. (2013)], might be more sensitive to the time course of spoken word recognition and thus may have captured listeners' moment-to-moment processing of anticipatory cues, especially when the cues were acoustically nuanced [e.g., a tonal difference of 10 Hz in Qin et al. (2019)].

To conclude, complementing existing studies which showed a coarticulatory PLR effect in Cantonese tone production (Gu and Lee, 2009; Lee et al., 2021), this study examined (Macau) Cantonese listeners' use of the PLR cue in spoken word recognition. Our finding suggests that listeners may rely on the F0 of T2 to resolve lexical competition when hearing pre-high words, but the effect was not found when listeners heard pre-low words, potentially due to a disruptive effect of PLR (i.e., higher F0 than expected). This research calls for further research (e.g., eye-tracking studies) on the use of tonal coarticulatory cues and raises questions regarding the potential influence of coarticulatory cues and their salience on spoken word recognition.

This research was partially supported by a Start-up Fund at the Hong Kong University of Science and Technology awarded to Zhen Qin, and by a Start-up Fund at the University of Macau awarded to Jingwei Zhang. The authors would like to thank Weijie Tan and Qiongyu Liu for their help in data collection. We thank Dr. Martin Cooke and two anonymous reviewers for their helpful comments.

1 In the match condition, the low tone of syllable 2 was identical in the pair of two pre-low words.

2 Since there is no large Cantonese frequency corpus, a familiarity rating task was conducted to determine the subjective familiarity of Cantonese disyllabic words (four participants rated the words, ranging from 1 “never heard or said” to 5 “very often heard and said”).

3 A short time-out time, together with instruction encouraging timely responses, was used so that participants were under time pressure (i.e., a speeded task) and their word identification process was less susceptible to post-lexical processing strategies. Pilot testing using a shorter time-out time (1 or 1.5 s) resulted in a great deal of data loss such as missing responses.

4 See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0009728 for a complete word list of test trials.

5 All the other participants had overall accuracy higher than 90%. We figured that the two participants may not have fully understood the instruction or engaged themselves in the task.

6 A model on the raw RT data of correct responses yielded the same (null) results.

7 Given the potential trade-off between response speed and accuracy, the null results of RT did not necessarily undermine the PLR pattern which was found for accuracy.

1. Beddor, P. S. (2009). “A coarticulatory path to sound change,” Language 85, 785–821.
2. Beddor, P. S., McGowan, K. B., Boland, J. E., Coetzee, A. W., and Brasher, A. (2013). “The time course of perception of coarticulation,” J. Acoust. Soc. Am. 133, 2350–2366.
3. Boersma, P., and Weenink, D. (2018). “Praat: Doing phonetics by computer” [computer program], version 6.0.43, retrieved 8 September 2018.
4. Chang, Y. S., Yao, Y., and Huang, B. H. (2017). “Effects of linguistic experience on the perception of high-variability non-native tones,” J. Acoust. Soc. Am. 141, EL120–EL126.
5. Cutler, A. (2012). “Native listening: Language experience and the recognition of spoken words,” Word 61, 92–95.
6. Dahan, D., Magnuson, J. S., Tanenhaus, M. K., and Hogan, E. M. (2001). “Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition,” Lang. Cogn. Process. 16, 507–534.
7. Fowler, C. A., and Brown, J. M. (2000). “Perceptual parsing of acoustic consequences of velum lowering from information for vowels,” Percept. Psychophys. 62, 21–32.
8. Gandour, J. (1981). “Perceptual dimensions of tone: Evidence from Cantonese,” J. Chin. Linguist. 9, 20–36.
9. Gandour, J., Potisuk, S., Ponglorpisit, S., Dechongkit, S., Khunadorn, F., and Boongird, P. (1996). “Tonal coarticulation in Thai after unilateral brain damage,” Brain Lang. 52, 505–535.
10. Gu, W., and Lee, T. (2009). “Effects of tone and emphatic focus on F0 contours of Cantonese speech: A comparison with Standard Chinese,” Chin. J. Phon. 2, 133–147.
11. Kühnert, B., and Nolan, F. (2009). “The origin of coarticulation,” in Coarticulation: Theory, Data and Techniques, Cambridge Studies in Speech Science and Communication (Cambridge University Press, Cambridge), pp. 7–30.
12. Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2018). “lmerTest package: Tests in linear mixed effects models,” J. Stat. Softw. 82, 1–26.
13. Larson-Hall, J. (2015). A Guide to Doing Statistics in Second Language Research Using SPSS and R, 2nd ed. (Routledge, New York).
14. Lee, A., Prom-on, S., and Xu, Y. (2021). “Pre-low raising in Cantonese and Thai: Effects of speech rate and vowel quantity,” J. Acoust. Soc. Am. 149, 179–190.
15. Lee, A., Prom-on, S., and Xu, Y. (2017). “Pre-low raising in Japanese pitch accent,” Phonetica 74, 231–246.
16. Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., and Bates, D. (2017). “Balancing type I error and power in linear mixed models,” J. Mem. Lang. 94, 305–315.
17. Qin, Z., and Jongman, A. (2016). “Does second language experience modulate perception of tones in a third language?,” Lang. Speech 59, 318–338.
18. Qin, Z., Tremblay, A., and Zhang, J. (2019). “Influence of within-category tonal information in the recognition of Mandarin-Chinese words by native and non-native listeners: An eye-tracking study,” J. Phon. 73, 144–157.
19. Salverda, A. P., Kleinschmidt, D., and Tanenhaus, M. K. (2014). “Immediate effects of anticipatory coarticulation in spoken-word recognition,” J. Mem. Lang. 71, 145–163.
20. Tremblay, A., and Ransijn, J. (2015). “LMERConvenienceFunctions: A suite of functions to back-fit fixed effects and forward-fit random effects, as well as other miscellaneous functions,” R package version 2.1, Comprehensive R Archive Network.
21. Xu, Y. (1997). “Contextual tonal variations in Mandarin,” J. Phon. 25, 61–83.
22. Yip, M. (2006). “Tone: Phonology,” Encycl. Lang. Linguist. 2006, 761–764.
23. Yip, M. C. W. (2015). “Spoken word recognition of Chinese words in continuous speech,” J. Psycholinguist. Res. 44, 775–787.
