High vowels have higher f0 than low vowels, creating a context effect on the interpretation of f0. Since onset F0 is a cue to stop voicing, the vowel context is expected to influence voicing judgements. Listeners categorized syllables starting with high (“bee”-“pea”) and low (“bye”-“pie”) vowels varying orthogonally in VOT and onset F0. Listeners made use of both cues as expected. Furthermore, vowel height affected listeners' categorization. Syllables with the low vowel /a/ elicited more voiceless responses compared to syllables with the high vowel /i/. This suggests that listeners compensate for vowel intrinsic effects when making other phonemic judgements.

In everyday communication, listeners categorize speech sounds that are made up of a multitude of acoustic cues. In English, for example, the distinction between a voiced /b/ and a voiceless /p/ as in the words “bat” vs “pat” can be signaled by a number of cues, one such cue being the pitch (f0) at the onset of the following vowel (Lisker, 1986). More specifically, f0 is higher following voiceless stops compared to voiced stops. This has been found across languages, regardless of how the phonological voicing contrast is phonetically realized. For example, the pattern occurs in languages where the voicing contrast is dependent on the presence/absence of long-lag VOT, such as English and Danish (Petersen, 1978), as well as in languages where the voicing contrast is dependent on the presence/absence of (pre-)voicing, such as French and Spanish (Caisse, 1981; Dmitrieva et al., 2015) (see also Ting et al., 2024 for a cross-linguistic comparison). These f0 patterns following voiced and voiceless stops in production are called consonant intrinsic F0 (CF0) effects (Kingston, 2007; Kirby and Ladd, 2016) because they are linked to a so-called intrinsic property of the stops (i.e., voicing).

A similar effect has long been observed for the production of vowels, such that the intrinsic height property of vowels affects the f0 of the vowel (Meyer, 1896; Connell, 2002; Gonzales, 2008). More specifically, high vowels (such as /i/ and /u/) tend to have an overall higher f0, compared to low vowels (such as /a/). This has also been reported across languages (see, e.g., Whalen and Levitt, 1995 for a cross-linguistic overview) and has been termed vowel intrinsic F0 (VF0) effects.

Some researchers have proposed that such effects are the direct result of physiological properties of producing the segments involved. That is, CF0 effects occur as a result of articulatory gestures required to inhibit or sustain voicing (Halle and Stevens, 1971; Kirby and Ladd, 2016) and VF0 effects occur as a result of tongue movement required to make high vs low vowels (Lehiste, 1976; Whalen and Levitt, 1995). Others have proposed that intrinsic f0 effects are implemented by speakers in order to maintain phonological contrasts (Connell, 2002; Van Hoof and Verhoeven, 2011). Yet others have proposed that the effects are better explained by a combination of physiological/articulatory properties and explicit use of the effects (Chen, 2011; Hoole and Honda, 2011). Although the source of CF0 and VF0 effects is still debated, the growing literature on intrinsic f0 effects shows that both effects are consistently found in production across many languages.

The perception of intrinsic f0 effects has also been examined, though much of the literature has focused on CF0 effects rather than VF0 effects. Studies of native English listeners have shown that, all else being equal, higher onset F0 values lead to more voiceless responses compared to lower onset F0 values (Haggard, Summerfield, and Roberts, 1981; Abramson and Lisker, 1985; Whalen et al., 1990; Pardo and Fowler, 1997). This effect of onset F0 has been found even when VOT is not ambiguous (Whalen et al., 1993). English listeners have also been shown to consider CF0 effects in the perception of relative pitch (Pardo and Fowler, 1997). Since voiceless stops tend to be produced with higher f0 compared to voiced stops, the same post-stop f0 in the context of a voiceless stop is interpreted as relatively low and vice versa. Results of such studies suggest that English listeners make use of the CF0 pattern that is observed in production when making judgements in perception.

To what extent VF0 effects influence perception has been less studied. Most studies on the perception of these effects investigate differences in vowel quality. For example, Hombert (1977) examined the effect of vowel height on listeners' perceived f0. In their study, listeners heard two synthesized vowels that had the same f0 value but differed in formant frequencies. The synthesized vowels were created to resemble the high vowels /i/ and /u/, and the low vowel /a/, using frequency values as reported in Peterson and Barney (1952). Their results showed that when listeners were presented with two vowels differing in height (e.g., /i/ and /a/), they judged the low vowel to be higher in pitch more often than they judged the high vowel to be higher in pitch. A similar study by Fowler and Brown (1997) found the same pattern. These findings suggest that vowel quality affects perception of pitch in vowels. Since high vowels tend to be produced with higher f0, compared to low vowels, the same f0 in the context of a high vowel is interpreted as relatively low and vice versa. Reinholt-Peterson (1986) also demonstrated the reverse; f0 influences the perception of vowel quality. An ambiguous vowel between Danish vowels /u:/ and /o:/ was perceived more often as the high vowel /u:/ when the vowel was synthesized with a higher f0, compared to a lower f0. Importantly, from an intrinsic f0 point of view, these results show that listeners take VF0 effects that are observed in production into consideration in their perception of vowels.

Taken together, these results suggest that VF0 effects play a role in the perception of vowels and vowels play a role in the perception of f0. However, these studies rely on the perception of relative pitch using meta-linguistic methods, or on the perception of vowel quality between vowels that are similar in terms of formant frequencies. Vowel quality judgements are hard to compare across languages with different vowel categories, making cross-linguistic comparisons difficult. Furthermore, vowel height judgments require comparing vowels of similar heights and the effects of f0 may be underestimated since vowels of similar height have smaller inherent differences in f0 (see, e.g., Hoole and Mooshammer, 2002). Since vowel quality will always be a contributing factor to f0, we might expect that listeners will account for VF0 effects whenever f0 is relevant for perception. This should be parallel to the way that other coarticulatory processes have been shown to affect perception. For example, anticipatory lip rounding of the fricative /s/, preceding vowels such as /u/ and /o/, results in a lowering of the fricative noise spectrum, making it more like /ʃ/. The effect of the rounding of the following vowel has been shown to affect listeners' perception of the contrast between /s/ and /ʃ/. Listeners are more likely to report hearing the fricative /s/ when the following vowel is /u/ rather than /a/ (Mann and Repp, 1980). This is consistent with listeners taking into consideration the coarticulatory effect of anticipatory lip rounding, and expecting a lower fricative noise spectrum.

In the same way, we may be able to detect expectations about VF0 effects through testing the perception of consonant voicing contrasts. That is, if listeners make use of VF0 information (i.e., that high vowels are associated with higher f0 compared to low vowels), then they may interpret a given f0 in the context of a low vowel as higher than the same f0 in the context of a high vowel. Then, when listeners make the judgment about consonant voicing, they will use the perceived higher f0 in their judgment and give more voiceless responses. In this study, we test whether both consonant and vowel intrinsic f0 effects can be observed through English voicing judgements. If the perceptual effects of CF0 mirror the effect found in production, then we expect to see more voiceless responses when the post-stop f0 level is high (as in previous studies). If the perceptual effects of VF0 mirror the effect found in production and are detectable through stop voicing judgements, then we expect to see more voiceless responses for low vowels compared to high vowels.

Our stimuli were created from natural recordings (following Lo, 2022). Using a base token of this kind provides a basis for “naturalistic but controlled stimuli” (McGuire, 2010). To estimate CF0 effects, we varied VOT through cross-splicing and onset F0 through resynthesis. This allowed us to compare stimuli that are identical in all aspects except onset F0. To estimate VF0 effects, however, we used recordings of different vowels, which means that we compared across stimuli that are not identical in all aspects other than vowel quality. In all cases, individual recordings differ from one another in various ways due to idiosyncratic properties. Thus, it is important to examine how the choice of base tokens used for stimuli creation affects experimental outcomes. In this study, multiple base tokens of each target stimulus were used so that we could estimate the extent to which our effects depend on particular tokens and obtain more generalizable results. The stimuli used in the experiment, as well as the data, code used for data analysis, and supplementary materials, are provided on the paper's OSF site (see Data Availability).

The stimuli for this study were created to be used in multiple experiments as part of a larger study involving both English and Mandarin speakers. Therefore, the stimuli consisted of monosyllabic CV syllables that correspond to real words in both English and Mandarin. A native male speaker of Mandarin recorded five repetitions of the syllables /baɪ/, /paɪ/, /bi/, and /pi/, in all four Mandarin tones (1: high level, 2: low rising, 3: low falling, 4: high falling) for a total of 80 syllables. Since English and Mandarin do not share a low vowel category (English has a low back vowel /ɑ/, Mandarin has a low central vowel /a/), the high vowel /i/ and diphthong /aɪ/ were chosen to represent high and low vowel categories. As the current study focuses on native English speakers, we do not have any predictions about tone effects. Thus, for our purposes, the four different tones can be thought of as variation in intonation, as suggested by So and Best (2008).

The recorded syllables were manipulated to vary in f0 for the four tones, while keeping duration constant across syllables. Both f0 and duration manipulations were performed using the PSOLA algorithm (Moulines and Charpentier, 1990) as implemented in Praat (Boersma and Weenink, 2023). Average f0 values at the beginning and end of the naturally produced vowels were used to determine the f0 range for stimuli with Tones 1 and 4. Following Lo (2022), both Tone 1 and Tone 4 stimuli were created using Tone 1 base syllables. Thus, for each Tone 1 syllable, f0 of the vowel was held constant at 250 Hz (for Tone 1) or had a linear decline from 250 to 100 Hz (for Tone 4). For rising Tone 2 and low falling Tone 3, the f0 contours of the natural productions were used, so that the utterances would sound as natural as possible. Vowel duration for each syllable was manipulated to be 350 ms so that vowel length was constant across all stimuli. The syllables were then grouped to form 40 pairs (two vowels × four tones × five recordings) such that each pair consisted of a voiced syllable and its voiceless counterpart with the same vowel and tone. Within each pair, the aspiration portion of the voiceless syllable was used for VOT manipulation and the vowel portion of the voiced syllable was used for f0 manipulation, thus creating 40 base tokens. Each of the 40 base tokens was then manipulated to create 35 new stimuli per token that varied orthogonally along two continua: (1) a VOT continuum from 0–48 ms in seven equal steps, and (2) an f0 continuum for the onset F0 of the vowel ranging in five equal steps from 45 Hz above to 45 Hz below the starting f0 of the vowel. VOT and onset F0 manipulations were implemented using a Praat script originally written by Matthew B. Winn, which converts sounds to manipulation objects, enabling dynamic manipulation of f0 and duration (see Winn, 2020 for full details and tutorial of the script). VOT was manipulated with a progressive-cutback-and-replacement approach and the temporal extent of f0 manipulation was fixed at 75 ms (following Guo, 2020). We note that while f0 perturbations have been observed for 100 ms or more in English production (see, e.g., Whalen 1990, among others), longer manipulations resulted in unnatural sounding stimuli when applied to the recordings used for the current study. In total, there were 1400 stimuli (two vowels × four tones × five recordings × seven VOT steps × five f0 steps).
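The stimulus counts above can be checked with a short enumeration of the design. The following is only a sketch of the factorial layout described in the text, not the Praat script used to create the stimuli; the variable and function names are illustrative assumptions, and the step values assume that "equal steps" means equally spaced values that include both endpoints.

```python
from itertools import product

# Design factors as described in the text; all names here are illustrative.
VOWELS = ["i", "aɪ"]
TONES = [1, 2, 3, 4]
RECORDINGS = [1, 2, 3, 4, 5]


def equal_steps(start, stop, n):
    """Return n equally spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + k * step for k in range(n)]


# Assumed reading of the continua: 7 VOT values and 5 onset-F0 offsets.
VOT_MS = equal_steps(0.0, 48.0, 7)         # 0, 8, ..., 48 ms
F0_SHIFT_HZ = equal_steps(45.0, -45.0, 5)  # +45 Hz down to -45 Hz

# One base token per vowel x tone x recording combination.
base_tokens = list(product(VOWELS, TONES, RECORDINGS))

# Every base token is crossed with every VOT step and every onset-F0 step.
stimuli = list(product(base_tokens, VOT_MS, F0_SHIFT_HZ))

print(len(base_tokens), len(stimuli))
```

Under these assumptions the enumeration yields 40 base tokens, 35 stimuli per token, and 1400 stimuli in total, matching the figures reported above.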

A total of 36 participants were recruited online and paid for their participation through the Prolific online recruitment platform (Prolific, 2014). All participants reported being between 18 and 35 years of age, self-identified as native speakers of North American English raised with their native language only, and reported normal hearing and normal or corrected-to-normal vision. As this study was designed as part of a larger study that investigates both English and Mandarin, speakers with experience with Mandarin or other tonal languages were excluded from further analysis (n = 1).2

The experiment was built and hosted using the Gorilla Experiment Builder (Anwyl-Irvine et al., 2020). Before completing the main experiment, participants first completed a short demographic and language background questionnaire, and a headphone screening task. Responses to the background questionnaires are provided in the supplementary materials.

Stimuli were randomly split into four stimulus lists, while ensuring that the lists were balanced for vowel (/i, aɪ/). Each list consisted of 350 trials, or 25% of the total stimuli. This was done to keep the total experiment duration under 30 min. Each participant was assigned to one of the four stimulus lists and completed a self-paced identification task. On each trial, participants saw a fixation cross for 500 ms, with 100 ms of pause before and after the fixation cross. Participants then heard an audio stimulus and were asked to choose the corresponding word on the screen. Two words were displayed on the screen: the word on the left always corresponded to the word with a voiced onset (e.g., bye) while the word on the right always corresponded to the word with a voiceless onset (e.g., pie). Stimulus order was randomized for each participant.
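One way to obtain a random split that is balanced for vowel is to shuffle the stimuli within each vowel and deal them out to the lists in turn. The sketch below illustrates that idea only; the text does not state how the split was actually carried out, so the function name `split_balanced`, the seed, and the tuple layout of each stimulus are all assumptions.

```python
import random
from itertools import product


def split_balanced(stimuli, n_lists=4, seed=0):
    """Randomly assign stimuli to n_lists lists, balanced by vowel.

    Each stimulus is a tuple whose first element is the vowel label.
    """
    rng = random.Random(seed)
    lists = [[] for _ in range(n_lists)]

    # Group stimuli by vowel so each vowel is spread evenly over the lists.
    by_vowel = {}
    for stim in stimuli:
        by_vowel.setdefault(stim[0], []).append(stim)

    for items in by_vowel.values():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Deal the shuffled items out to the lists in round-robin order.
        for k, stim in enumerate(shuffled):
            lists[k % n_lists].append(stim)
    return lists


# Hypothetical stimulus identifiers: (vowel, tone, recording, VOT step, f0 step).
stimuli = list(product(["i", "aɪ"], range(1, 5), range(1, 6), range(7), range(5)))
lists = split_balanced(stimuli)

print([len(lst) for lst in lists])
```

With 700 stimuli per vowel, this procedure gives each list 175 stimuli of each vowel, for 350 trials per list. It does not balance the lists for tone, VOT, or onset F0, which would require grouping on those factors as well.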

Participants whose perception did not show a significant effect of VOT (p > 0.1) were excluded from further analysis (n = 1), following Ting and Kang (2023). The final analysis included data from 34 participants. Figure 1(a) shows the proportion of /p/ responses over the VOT continuum for the five levels of f0. Figure 1(b) shows the proportion of /p/ responses over the VOT continuum for the two levels of vowel height (/i/ vs /aɪ/).

Fig. 1.

Proportion of voiceless responses over the VOT continuum for (a) the consonant f0 effect and (b) the vowel f0 effect. CF0: proportion of voiceless responses for the five levels of onset F0, ranging from 45 Hz above to 45 Hz below the starting f0 of the vowel, averaged across vowel (n = 2), tone (n = 4), and recording (n = 5); VF0: proportion of voiceless responses for low vs high vowel categories averaged across onset F0, tone (n = 4), and recording (n = 5). Dots show the mean proportion for each VOT value. Curved lines show logistic smooths of the response proportions for each category and shadings mark 89% confidence intervals.


Statistical analyses were done in R (R Core Team, 2022) using the glmer function of the lme4 package with the bobyqa optimizer. A logistic mixed-effects model was fit with response (voiceless coded as 1, voiced as 0) as the dependent variable and VOT, onset F0, Vowel (/i/, /aɪ/; contrast-coded), and Tone (contrast-coded) as fixed effects. VOT and onset F0 were centered and scaled using the rescale function of the arm package (Gelman and Su, 2022). Model selection involved starting with the maximal random effects structure justified by the design, then progressively simplifying the random effects structure until the model had no convergence or singular fit issues. In our final model, random effects included by-Token correlated random intercepts and random slope adjustments to VOT (including onset F0 led to a singular fit), where Token refers to the individual recordings of each syllable used to create the stimuli, as well as by-Participant correlated random intercepts and random slope adjustments to VOT, onset F0, Vowel, and Tone. We do not interpret tone effects.
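For readers unfamiliar with this kind of standardization, the sketch below shows the transformation in plain Python. It assumes that, for a numeric predictor with more than two distinct values, rescaling means subtracting the mean and dividing by two standard deviations, which is our understanding of the default behavior of arm's rescale function; the function name `rescale_2sd` and the example values are illustrative, and the analysis itself was run in R, not with this code.

```python
from statistics import mean, stdev


def rescale_2sd(values):
    """Center values on their mean and divide by two sample standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample SD with n - 1 in the denominator
    return [(v - m) / (2 * s) for v in values]


# Illustrative input: the seven VOT steps (ms) rather than trial-level data.
vot_ms = [0, 8, 16, 24, 32, 40, 48]
vot_scaled = rescale_2sd(vot_ms)

print([round(v, 3) for v in vot_scaled])
```

After this transformation the predictor has a mean of 0 and a standard deviation of 0.5, so a regression coefficient describes the change in log odds associated with a two-standard-deviation change in the predictor.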

The model showed a significant effect of VOT (β = 5.26, z = 11.62, p < 0.001), with more voiceless responses as VOT duration increased, and a significant main effect of onset F0 (β = 0.60, z = 7.22, p < 0.001), with more voiceless responses for higher onset F0 values, replicating previous findings. We also found a significant main effect of Vowel (β = 0.58, z = 3.92, p < 0.001), with the low vowel category /aɪ/ eliciting more voiceless responses compared to the high vowel category /i/, indicating compensation for the effect of vowel height. There was no significant interaction between onset F0 and Vowel. The full model output is included on the OSF site (see Data Availability) for reference.

We wanted to further investigate the extent to which differences in our base recordings might affect our results. Most previous studies that involve a perception task of this kind use only one naturally produced base token for creating stimuli. Usually, the base token is chosen with specific criteria, such as having a reliable or consistent pitch track, having modal voicing throughout, not having excess noise, etc. However, there will always be idiosyncratic differences between different recordings, which can bias responses toward one option or the other [see, e.g., Hillenbrand et al. (1984) and Oh (2020) for discussion].

Importantly, the use of different base tokens is unavoidable in our study to evaluate the VF0 effect. This poses a problem, as any given base token could be more or less biased toward a voiced response, and a comparison between any two base tokens could produce an apparent vowel effect (a difference in voicing responses between the two vowels) by chance. The same issue does not arise in assessing the CF0 effect, for which we can manipulate onset F0 from a single base token while keeping all else equal. Our use of multiple base tokens mitigates this issue for VF0 effects while at the same time allowing us to estimate the effect of base tokens on both effects. To examine the effect of base token choice, we compared the VF0 effect for all possible combinations of base tokens with the same pitch contour (n = 100) by subtracting the mean proportion of voiceless responses across VOT and onset F0 steps for each high vowel token (bi-pi) from the mean proportion of voiceless responses for each low vowel token (baɪ-paɪ). To make a parallel comparison, we did the same calculation to compare the CF0 effect for all possible combinations of base tokens with the same pitch contour (n = 100), using the highest and lowest onset F0 levels. Figure 2 shows the distribution of (a) the CF0 effect and (b) the VF0 effect across base tokens.
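The pairing procedure for the VF0 effect can be sketched as follows. The code pairs every low vowel token with every high vowel token of the same tone and takes the difference in mean proportion of voiceless responses. The proportions are invented placeholders, not the study's data, and the function name `pairwise_effects` is an assumption made for illustration.

```python
from itertools import product


def pairwise_effects(low_props, high_props):
    """Return low-minus-high differences for all same-tone token pairings.

    low_props and high_props map each tone to a list of per-token mean
    proportions of voiceless responses (one value per base token).
    """
    effects = []
    for tone in low_props:
        for low, high in product(low_props[tone], high_props[tone]):
            effects.append(low - high)
    return effects


# Hypothetical per-token proportions (NOT the study's data): 5 tokens per tone.
low = {tone: [0.50, 0.55, 0.60, 0.52, 0.58] for tone in (1, 2, 3, 4)}
high = {tone: [0.45, 0.57, 0.51, 0.48, 0.53] for tone in (1, 2, 3, 4)}

effects = pairwise_effects(low, high)
n_negative = sum(e < 0 for e in effects)

print(len(effects), n_negative)
```

Five low vowel tokens crossed with five high vowel tokens gives 25 pairings per tone, and four tones give the 100 pairings reported in the text. Counting the negative differences corresponds to counting the pairings that would have shown an effect in the opposite direction.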

Fig. 2.

Distribution of effect sizes across all possible token pairings for (left) the CF0 effect and (right) the VF0 effect. Solid vertical lines represent mean effect size for each effect; dotted lines indicate effect size of 0.


Figure 2 shows that while the mean effect sizes for CF0 and VF0 are similar and both positive, both distributions include positive and negative values. The range of CF0 effect sizes is slightly smaller, from −0.08 to 0.23, than the range of VF0 effect sizes, from −0.14 to 0.21. Although both ranges include negative numbers, meaning that some combinations of tokens resulted in an effect in the opposite direction, the CF0 effect is more consistently positive and has a slightly higher mean than the VF0 effect. More specifically, for the CF0 effect, 11 possible pairings of tokens (11%) result in a negative effect size, compared to 22 possible pairings (22%) for the VF0 effect. Examining the distribution of effect sizes given our stimuli thus indicates that relying on a single pair of base tokens could have resulted in quite different outcomes, especially for the vowel effect. Nevertheless, because the large majority of pairings yield a positive effect, we can be confident that the effect is real despite token-intrinsic variation.

The current study tested whether vowel height influences listeners' perception of the English voicing contrast through compensation for VF0 effects. We compared vowel effects to onset F0 differences intrinsic to the voicing contrast itself. We found that both onset F0 and vowel height affect English listeners' perception of the voicing contrast, and that effect sizes were similar. Our results for the effect of onset F0 replicate previous findings and show that English listeners make use of onset F0 as a cue to voicing (Whalen et al., 1993; Pardo and Fowler, 1997; Yu, 2022). Our results for the effect of vowel height show that English listeners also compensate for expected VF0 effects when making judgements on the English voicing contrast. These findings suggest that both CF0 and VF0 effects play a role in perception, and that VF0 effects can be detected in the context of voicing contrast judgements, which, to the best of our knowledge, has not been tested before.

These results add to the growing literature on intrinsic f0 effects, particularly in relation to cross-linguistic analyses. For example, while cross-linguistic data of VF0 effects in production have been examined [e.g., with a meta-analysis by Whalen and Levitt (1995)], only recently has there been a cross-linguistic study for the production of CF0 effects (Ting et al., 2024). The results of these cross-linguistic studies show variation in the production of both VF0 and CF0 effect size across languages. However, we do not yet know about the cross-linguistic variation, if any, in the perception of VF0 and CF0 effects. While perception studies investigating CF0 effects have been done across different languages [e.g., Shultz et al. (2012); Llanos et al. (2013); Schertz and Khan (2020)], to our knowledge, no studies have examined perception of VF0 effects cross-linguistically. Our finding that VF0 effects can be detected in the context of voicing contrast judgements should allow for easier cross-linguistic comparison of VF0 effects in perception as well as comparison between CF0 and VF0 effects within a language.

This research was supported by the Social Sciences and Humanities Research Council (Grant No. 435–2020-1140).

The authors have no conflicts to disclose.

The research was approved by the Research Ethics Board of McGill University. All participants gave informed consent.

The stimuli and data that support the findings of this study are openly available in the Center for Open Science repository at https://osf.io/4pfg7/. These materials can be used for research and teaching purposes and should make reference to this study.

2

The language background questionnaire showed that ten participants spoke or learned an additional language other than English. Further analysis showed that excluding these participants did not lead to different overall results; thus, we include these participants in our analysis and discussion.

1. Abramson, A. S., and Lisker, L. (1985). “Relative power of cues: F0 shift versus voice timing,” in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. Fromkin (Academic Press, Orlando, FL), pp. 25–33.
2. Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N., and Evershed, J. K. (2020). “Gorilla in our midst: An online behavioral experiment builder,” Behav. Res. 52, 388–407.
3. Boersma, P., and Weenink, D. (2023). “Praat: Doing phonetics by computer (version 6.3.16) [Computer program],” http://www.praat.org/ (Last viewed 2023).
4. Caisse, M. (1981). “Cross-linguistic differences in fundamental frequency perturbation induced by voiceless unaspirated stops,” J. Acoust. Soc. Am. 70, S76–S77.
5. Chen, Y. (2011). “How does phonology guide phonetics in segment-f0 interaction?,” J. Phon. 39, 612–625.
6. Connell, B. (2002). “Tone language and the universality of intrinsic F0: Evidence from Africa,” J. Phon. 30, 101–129.
7. Dmitrieva, O., Llanos, F., Shultz, A. A., and Francis, A. L. (2015). “Phonological status, not voice onset time, determines the acoustic realization of onset f0 as a secondary voicing cue in Spanish and English,” J. Phon. 49, 77–95.
8. Fowler, C. A., and Brown, J. M. (1997). “Intrinsic f0 differences in spoken and sung vowels and their perception by listeners,” Percept. Psychophys. 59, 729–738.
9. Gonzales, A. (2008). “Intrinsic f0 in Shona vowels: A descriptive study,” in Selected Proceedings of the 39th Annual Conference on African Linguistics, April 17–20, Georgia (Cascadilla Proceedings, Somerville, MA), pp. 145–155.
10. Guo, Y. (2020). “Production and perception of laryngeal contrasts in Mandarin and English by Mandarin speakers,” Ph.D. dissertation, George Mason University, Fairfax, VA.
11. Gelman, A., and Su, Y. (2022). “arm: Data analysis using regression and multilevel/hierarchical models (R package version 1.13-1),” (Last viewed 2023).
12. Haggard, M., Summerfield, Q., and Roberts, M. (1981). “Psychoacoustical and cultural determinants of phoneme boundaries: Evidence from trading F0 cues in the voiced-voiceless distinction,” J. Phon. 9, 49–62.
13. Halle, M., and Stevens, K. N. (1971). “A note on laryngeal features,” in From Memory to Speech and Back: Papers on Phonetics and Phonology 1954–2002 (De Gruyter Mouton, Berlin, Boston, 2003), pp. 45–61.
14. Hillenbrand, J., Ingrisano, D. R., Smith, B. L., and Flege, J. E. (1984). “Perception of the voiced-voiceless contrast in syllable-final stops,” J. Acoust. Soc. Am. 76, 18–26.
15. Hombert, J. M. (1977). “Consonant types, vowel height and tone in Yoruba,” Stud. Afr. Linguistics 8, 173–190.
16. Hoole, P., and Honda, K. (2011). “Automaticity vs. feature-enhancement in the control of segmental f0,” in Where Do Phonological Features Come From?: Cognitive, Physical and Developmental Bases of Distinctive Speech Categories, edited by N. Clements and R. Ridouane (John Benjamins, Amsterdam), pp. 133–171.
17. Hoole, P., and Mooshammer, C. M. (2002). “Articulatory analysis of the German vowel system,” in Silbenschnitt Und Tonakzente (Syllable Cut and Tonal Accents), edited by H. P. Auer, P. Gilles, and H. Spiekermann (Niemeyer, Tübingen), pp. 129–152.
18. Kingston, J. (2007). “Segmental influences on f0: Automatic or controlled?,” in Volume 2 Experimental Studies in Word and Sentence Prosody (Mouton de Gruyter, Berlin), pp. 171–210.
19. Kirby, J. P., and Ladd, D. R. (2016). “Effects of obstruent voicing on vowel F0: Evidence from ‘true voicing’ languages,” J. Acoust. Soc. Am. 140, 2400–2411.
20. Llanos, F., Dmitrieva, O., Shultz, A., and Francis, A. L. (2013). “Auditory enhancement and second language experience in Spanish and English weighting of secondary voicing cues,” J. Acoust. Soc. Am. 134, 2213–2224.
21. Lisker, L. (1986). “‘Voicing’ in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees,” Lang. Speech. 29, 3–11.
22. Lehiste, I. (1976). “Influence of fundamental frequency pattern on the perception of duration,” J. Phon. 4, 113–117.
23. Lo, Y. H. (2022). “Post-stop fundamental frequency perturbation in production and perception of Mandarin stop voicing,” Ph.D. dissertation, University of British Columbia, Vancouver, British Columbia.
24. Mann, V. A., and Repp, B. H. (1980). “Influence of vocalic context on perception of the [ʃ]-[s] distinction,” Percept. Psychophys. 28, 213–228.
25. McGuire, G. (2010). “A brief primer on experimental designs for speech perception research,” Lab. Rep. 77, 2–19.
26. Meyer, E. A. (1896). “Zur Tonbewegung des vokals im gesprochenen und gesungenen einzelwort,” Phonetische Studien (Phonetic Studies) (Beiblatt zu der Z. Die Neuren Sprachen) 10, 1–21.
27. Moulines, E., and Charpentier, F. (1990). “Pitch-synchronous processing techniques for text-to-speech synthesis using diphones,” Speech Commun. 9, 453–467.
28. Oh, E. (2020). “Effects of base token for stimuli manipulation on the perception of Korean stops among native and non-native listeners,” Phonetics Speech Sci. 12, 43–50.
29. Pardo, J. S., and Fowler, C. A. (1997). “Perceiving the causes of coarticulatory acoustic variation: Consonant voicing and vowel pitch,” Percept. Psychophys. 59, 1141–1152.
30. Petersen, N. R. (1978). “Intrinsic fundamental frequency of Danish vowels,” J. Phon. 6, 177–189.
31. Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24, 175–184.
32. Prolific (2014). “Prolific [computer program],” https://www.prolific.com (Last viewed 2023).
33. R Core Team (2022). “R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria.
34. Reinholt-Peterson, N. (1986). “Perceptual compensation for segmentally conditioned fundamental-frequency perturbations,” Phonetica 43, 31–42.
35. Schertz, J., and Khan, S. (2020). “Acoustic cues in production and perception of the four-way stop laryngeal contrast in Hindi and Urdu,” J. Phon. 81, 100979.
36. Shultz, A. A., Francis, A. L., and Llanos, F. (2012). “Differential cue weighting in perception and production of consonant voicing,” JASA Express Lett. 132, EL95–EL101.
37. So, C. K., and Best, C. T. (2008). “Do English speakers assimilate Mandarin tones to English prosodic categories?,” in Proceedings of the 9th Annual Interspeech, September 22–26, Brisbane, Australia (Curran Associates, Inc., NY), pp. 1120.
38. Ting, C., Clayards, M., Sonderegger, M., and McAuliffe, M. (2024). “The cross-linguistic distribution of vowel and consonant intrinsic f0 effects,” Language (in press).
39. Ting, C., and Kang, Y. (2023). “The effect of habitual speech rate on speaker-specific processing in English stop voicing perception,” Lang. Speech. (published online).
40. Van Hoof, S., and Verhoeven, J. (2011). “Intrinsic vowel F0, the size of vowel inventories and second language acquisition,” J. Phon. 39, 168–177.
41. Whalen, D. H. (1990). “Coarticulation is largely planned,” J. Phon. 18, 3–35.
42. Whalen, D. H., and Levitt, A. G. (1995). “The universality of intrinsic F0 of vowels,” J. Phon. 23, 349–366.
43. Whalen, D. H., Abramson, A. S., Lisker, L., and Mody, M. (1990). “Gradient effects of fundamental frequency on stop consonant voicing judgments,” Phonetica 47, 36–49.
44. Whalen, D. H., Abramson, A. S., Lisker, L., and Mody, M. (1993). “F0 gives voicing information even with unambiguous voice onset times,” J. Acoust. Soc. Am. 93, 2152–2159.
45. Winn, M. B. (2020). “Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script,” J. Acoust. Soc. Am. 147, 852–866.
46. Yu, A. C. L. (2022). “Perceptual cue weighting is influenced by the listener's gender and subjective evaluations of the speaker: The case of English stop voicing,” Front. Psychol. 13, 840291.