Covariation among vowel height effects on vowel intrinsic fundamental frequency (IF0), voice onset time (VOT), and voiceless interval duration (VID) is analyzed to assess the plausibility of a common physiological mechanism underlying variation in these measures. Phrases spoken by 20 young adults, containing words composed of initial voiceless stops or /s/ and high or low vowels, were produced in habitual and voluntarily increased F0 conditions. High vowels were associated with increased IF0 and longer VIDs. VOT and VID exhibited significant covariation with IF0 only for males at habitual F0. The lack of covariation for females and at increased F0 is discussed.
I. Introduction
The current work examines covariation among intrinsic vowel fundamental frequency (IF0), voice onset time (VOT), and voiceless interval duration (VID) as a function of vowel height. Issues regarding the association between laryngeal and extralaryngeal function of the speech mechanism have been addressed by many researchers. The preponderance of existing work has focused on one or two of these measures, treating issues such as IF0 independently of other segmental variability (cf., Higgins et al., 1998). Consequently, while independent association between each of these three measures of laryngeal function and vowel height has been demonstrated in the existing literature, no work has examined covariation among these three measures in a single data set.
Extralaryngeal mechanical influences on laryngeal function have been proposed to influence fundamental frequency (F0) (Honda, 2004, 1995, 1983; Vilkman et al., 1996; Sapir, 1989), VOT (Weismer, 1979; Klatt, 1975), and VID (Weismer, 1979). The magnitude of each of these acoustic measures increases for high vowel compared to low vowel contexts. Honda (1983) describes a purported mechanical relationship between vowel articulation and IF0. In short, contraction of the genioglossus muscle causes forward movement of the hyoid bone that rotates the thyroid cartilage, resulting in increased longitudinal tension along the vocal folds.
Such an explanation could reasonably be extended to account for variation in VID and VOT. Specifically, an increase in vocal fold tension resulting from contraction of the extrinsic laryngeal and lingual musculature may exert a common influence that delays voicing by increasing phonation threshold pressure. The implication of this hypothesis is that these segmental effects may automatically co-vary. The possibility of universality of the IF0 effect has been considered to lend further evidence toward such an explanation (Whalen and Levitt, 1995). A contrasting explanation subscribes to active control, purportedly motivated by a need to enhance spectral or durational contrast (Kingston, 2007; Diehl et al., 1990). The debate over passive versus active accounts of IF0 has received considerable attention (cf., Hoole and Honda, 2011; Hoole et al., 2006) due in part to the observation that lax German vowels pose particular problems for a passive, mechanical account for IF0 (Fischer-Jørgensen, 1990). Hoole et al. (2006) suggest that a passive, mechanical explanation is viable but that talkers may also learn to enact an active enhancement strategy. Covariation among IF0, VOT, and VID during voluntarily increased F0 speech would suggest that these effects are automatic, not learned (Holt et al., 2001), because this is an atypical mode of speech, unfamiliar to participants. Moreover, if covariation among segmental measures reflects common, passive extralaryngeal influences that increase vocal fold tension, then increased F0 speech should further increase the magnitude of all effects by further sensitizing the vocal folds to tension increases.
II. Methods
Participants were 20 adults (10 males, 10 females), 23–35 years old, free of any speech, language, or hearing impairments. Speech was recorded in a sound-insulated booth. Talkers read CVC sequences in the carrier phrase “Say ______ instead.” Each sequence was repeated five times in random order at a subject’s habitual F0, and five additional times at an F0 approximately one-quarter octave above the habitual F0. For example, a male with habitual F0 of 120 Hz was required to speak at 150 Hz or above. This behavior was practiced by each participant and then monitored by the experimenter. CVCs included voiceless initial stops (/p, t, k/) and the voiceless initial fricative /s/ with vowels /i/, /u/, /a/, and /æ/ and the final consonant /d/.
Cspeech (Milenkovic, 1988) was used to measure three quantities: F0, taken as the average F0 across the five middle cycles of the vowel; VOT, measured from the release of the stop burst to the first glottal pulse of the following vowel; and VID, measured from the last glottal pulse of the preceding vowel to the first glottal pulse of the following vowel.
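The three measurement definitions above can be illustrated with a minimal sketch. It assumes that the relevant landmarks (burst release, glottal pulse times) have already been marked by hand or by software and are given in seconds; the function names are hypothetical and are not part of Cspeech. It also assumes that "average F0 across 5 middle cycles" means the mean of the per-cycle F0 values, which is one plausible reading.

```python
from statistics import mean


def vot_ms(burst_release_s, first_pulse_s):
    """VOT: stop burst release to first glottal pulse of the following vowel."""
    return (first_pulse_s - burst_release_s) * 1000.0


def vid_ms(last_pulse_prev_vowel_s, first_pulse_s):
    """VID: last glottal pulse of the preceding vowel to first pulse of the next."""
    return (first_pulse_s - last_pulse_prev_vowel_s) * 1000.0


def mid_vowel_f0_hz(pulse_times_s, n_cycles=5):
    """Mean per-cycle F0 over the n_cycles cycles centred in the vowel.

    Assumption: each cycle's F0 is 1 / period, and the cycles are averaged.
    """
    periods = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    if len(periods) < n_cycles:
        raise ValueError("vowel has fewer cycles than requested")
    start = (len(periods) - n_cycles) // 2
    middle = periods[start:start + n_cycles]
    return mean(1.0 / p for p in middle)


# Hypothetical token: voicing of the carrier vowel ends at 0.100 s, the burst
# is released at 0.215 s, and the vowel's glottal pulses start at 0.300 s
# with an 8 ms period.
pulses = [0.300 + 0.008 * i for i in range(11)]
token_vot = vot_ms(0.215, pulses[0])
token_vid = vid_ms(0.100, pulses[0])
token_f0 = mid_vowel_f0_hz(pulses)
```

For a stop, VID as defined here spans the closure plus the VOT interval, so it is always at least as long as VOT for the same token.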
III. Results
A total of 11 946 measures were viable and contributed to the results.
For females, the mean F0 difference between high and low vowels was greater in the increased F0 condition (∼27 Hz) than in the habitual F0 condition (∼23 Hz), with high vowels produced at a higher average frequency. The same pattern was observed for males, with a larger high-low vowel difference in the increased F0 condition (∼17 Hz) than at habitual F0 (∼11 Hz; Table I).
Table I. IF0 (Hz) mean (s.d.) and comparisons between vowel heights (across consonant).
| Gender | Condition | High vowels | Low vowels | t-Test |
|---|---|---|---|---|
| Female | Habitual F0 | 229.6 (22.8) | 207.1 (16.7) | t (731) = −15.90, P < 0.001 |
| Female | Raised F0 | 329.5 (39.2) | 302.1 (39.6) | t (795) = −9.85, P < 0.001 |
| Male | Habitual F0 | 128.5 (14.4) | 117.8 (13.2) | t (791) = −10.94, P < 0.001 |
| Male | Raised F0 | 194.2 (26.2) | 177.2 (35.0) | t (738) = −7.79, P < 0.001 |
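The high-low vowel differences quoted in the text follow directly from the Table I means. A minimal sketch of that arithmetic is given below; the dictionary layout and function name are illustrative choices, and only the means themselves come from the table.

```python
# Mean IF0 (Hz) from Table I, stored as (high vowels, low vowels).
TABLE_I_MEANS = {
    ("female", "habitual"): (229.6, 207.1),
    ("female", "raised"): (329.5, 302.1),
    ("male", "habitual"): (128.5, 117.8),
    ("male", "raised"): (194.2, 177.2),
}


def high_low_differences(means):
    """Return the high-minus-low vowel difference in Hz for each cell."""
    return {key: round(high - low, 1) for key, (high, low) in means.items()}


differences = high_low_differences(TABLE_I_MEANS)
for (gender, condition), diff in differences.items():
    print(f"{gender:6s} {condition:8s} {diff:5.1f} Hz")
```

In both groups the difference computed this way is larger in the raised F0 condition than in the habitual condition, which is the pattern described in the text.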
For females, vowel height contrasts resulted in mean VOT differences that were small and non-significant for /p/ and /t/ at both habitual and increased F0. High vowel contexts did result in significantly longer VOT durations (∼10 ms) for /k/ at the habitual F0 and longer VOTs (∼12 ms) at the increased F0.
For males, VOT durations were significantly longer for high versus low vowels for every consonant at each F0, with the exception of /p/ (∼5 ms, non-significant) at the habitual F0. At habitual F0, mean VOT differences were significant and slightly larger for /t/ (∼7 ms) and larger yet for /k/ (∼14 ms). At increased F0, a similar pattern emerged with significant VOT differences that were smallest for /p/ (∼6 ms), larger for /t/ (∼9 ms), and largest for /k/ (∼18 ms).
Vowel height contrasts on VOT durations were always greater at the increased F0 for every consonant, for both males and females (Table II).
Table II. VOT (ms) mean (s.d.) and comparisons between vowel heights (within consonant).
| Gender | Condition | Consonant | High vowels | Low vowels | t-Test |
|---|---|---|---|---|---|
| Female | Habitual F0 | /p/ | 98.9 (17.3) | 96.8 (23.1) | NS |
| Female | Habitual F0 | /t/ | 108.3 (19.0) | 107.9 (21.2) | NS |
| Female | Habitual F0 | /k/ | 118.3 (20.4) | 108.6 (23.8) | t (192) = 3.08, P = 0.0024 |
| Female | Raised F0 | /p/ | 82.1 (29.1) | 73.3 (29.6) | NS |
| Female | Raised F0 | /t/ | 91.3 (28.2) | 85.9 (30.9) | NS |
| Female | Raised F0 | /k/ | 97.4 (23.7) | 85.1 (39.9) | t (162) = 2.64, P = 0.0091 |
| Male | Habitual F0 | /p/ | 77.5 (16.6) | 72.5 (14.6) | NS |
| Male | Habitual F0 | /t/ | 88.1 (15.3) | 81.6 (15.8) | t (197) = 2.99, P = 0.0032 |
| Male | Habitual F0 | /k/ | 96.9 (16.0) | 83.1 (15.3) | t (197) = 6.21, P < 0.001 |
| Male | Raised F0 | /p/ | 64.1 (16.0) | 58.0 (17.9) | t (195) = 2.54, P = 0.012 |
| Male | Raised F0 | /t/ | 78.4 (15.5) | 69.8 (20.4) | t (184) = 3.39, P < 0.001 |
| Male | Raised F0 | /k/ | 90.1 (15.1) | 72.3 (20.1) | t (183) = 7.09, P < 0.001 |
For females, average VIDs were significantly longer for all consonants preceding high versus low vowels at each F0. At habitual F0, /p/ showed the smallest average difference (∼11 ms), followed by /t/ (∼19 ms) and /k/ (∼19 ms) with /s/ having the largest average difference (∼29 ms). At the increased F0, /p/ and /s/ had similar average VID differences (∼15 ms) with larger /t/ and /k/ differences (∼22 ms).
For males, average VIDs were significantly longer for all consonants preceding high versus low vowels in each F0 condition. At habitual F0, /p/ showed the smallest average difference (∼13 ms), followed by /t/ (∼18 ms), /s/ (∼20 ms), and /k/ (∼23 ms). At the increased F0, /p/ again showed the smallest average VID difference (∼14 ms), followed by /s/ (∼17 ms), /t/ (∼18 ms), and /k/ (∼28 ms).
Vowel height contrasts on VID were greater at the increased F0 for every stop consonant for both males and females (except for males, where /t/ showed equal average VID differences in both F0 conditions). However, /s/ showed a different pattern for both males and females with vowel height contrasts on VID greater at habitual F0 (Table III).
Table III. VID (ms) means (s.d.) and comparisons between vowel heights (within consonant).
| Gender | Condition | Consonant | High vowels | Low vowels | t-Test |
|---|---|---|---|---|---|
| Female | Habitual F0 | /p/ | 208.4 (28.9) | 197.6 (32.3) | t (194) = 2.50, P = 0.013 |
| Female | Habitual F0 | /t/ | 210.6 (30.2) | 191.2 (32.2) | t (195) = 4.36, P < 0.001 |
| Female | Habitual F0 | /k/ | 216.9 (27.7) | 197.8 (34.9) | t (188) = 4.29, P < 0.001 |
| Female | Habitual F0 | /s/ | 213.6 (34.5) | 185.0 (31.3) | t (195) = 6.12, P < 0.001 |
| Female | Raised F0 | /p/ | 178.9 (37.8) | 163.5 (37.9) | t (194) = 2.85, P = 0.0048 |
| Female | Raised F0 | /t/ | 184.4 (37.6) | 162.1 (37.1) | t (193) = 4.17, P < 0.001 |
| Female | Raised F0 | /k/ | 185.5 (30.0) | 163.6 (33.0) | t (194) = 4.89, P < 0.001 |
| Female | Raised F0 | /s/ | 177.4 (34.4) | 162.0 (26.3) | t (189) = 3.66, P < 0.001 |
| Male | Habitual F0 | /p/ | 182.6 (23.1) | 169.7 (21.1) | t (196) = 4.12, P < 0.001 |
| Male | Habitual F0 | /t/ | 184.6 (30.2) | 166.9 (24.1) | t (179) = 4.48, P < 0.001 |
| Male | Habitual F0 | /k/ | 188.8 (27.7) | 166.3 (24.1) | t (194) = 6.14, P < 0.001 |
| Male | Habitual F0 | /s/ | 194.7 (25.3) | 171.7 (23.3) | t (196) = 5.13, P < 0.001 |
| Male | Raised F0 | /p/ | 164.9 (27.8) | 151.3 (28.4) | t (194) = 3.42, P < 0.001 |
| Male | Raised F0 | /t/ | 167.3 (27.6) | 149.3 (30.1) | t (196) = 4.41, P < 0.001 |
| Male | Raised F0 | /k/ | 179.2 (27.8) | 151.5 (27.3) | t (195) = 7.07, P < 0.001 |
| Male | Raised F0 | /s/ | 173.1 (24.8) | 155.8 (22.8) | t (196) = 5.13, P < 0.001 |
Covariance was analyzed pairwise by calculating the correlation coefficients and corresponding t-values to evaluate significance. The square of the correlation coefficient (r²) was calculated to quantify the variance accounted for in each analysis.
For females, there were no significant covariances between VOT durations and IF0 in either F0 condition for any consonant. For males, covariances were significant for all consonants at habitual F0 (/p/, r² = 0.171, t (38) = 2.79971, P = 0.01; /t/, r² = 0.287, t (38) = 3.911, P = 0.001; /k/, r² = 0.407, t (38) = 5.10695, P = 0.001). No significant covariances were found in the increased F0 condition. When data were pooled across consonants, no significant covariances were found between VOT durations and F0 for either subject group.
For females, there were no significant covariances between VIDs and IF0 in either F0 condition, in any consonant context. For males, covariances were significant for all consonants at habitual F0 (/p/, r² = 0.403, t (38) = 5.06474, P = 0.001; /t/, r² = 0.51, t (38) = 6.28896, P = 0.001; /k/, r² = 0.329, t (38) = 4.31647, P = 0.001; /s/, r² = 0.602, t (38) = 7.58138, P = 0.001). No significant covariances were found in the increased F0 condition. When data were pooled across consonants, statistically significant covariances were found for females at both the habitual F0 and increased F0 across subjects (habitual F0, r² = 0.788, t (14) = 7.21372, P = 0.001; increased F0, r² = 0.731, t (14) = 6.16803, P = 0.001). The same was true for males (habitual F0, r² = 0.717, t (14) = 5.95567, P = 0.001; increased F0, r² = 0.672, t (14) = 5.35564, P = 0.001).
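The t-values reported above appear to be consistent with the standard conversion from a correlation coefficient to a t statistic, t = r·√(df / (1 − r²)). The sketch below applies that conversion to the squared correlation; the function name is hypothetical, and the use of this particular formula is an inference from the reported numbers rather than something stated in the analysis description.

```python
import math


def t_from_r_squared(r_squared, df):
    """Magnitude of the t statistic for a correlation with the given r^2 and df.

    Uses t = r * sqrt(df / (1 - r^2)); the sign of r is lost when squaring,
    so only the absolute value of t is returned.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if df <= 0:
        raise ValueError("df must be positive")
    return math.sqrt(r_squared * df / (1.0 - r_squared))


# Males, habitual F0, VOT with IF0 (values quoted in the text above).
t_p = t_from_r_squared(0.171, 38)  # /p/
t_k = t_from_r_squared(0.407, 38)  # /k/
# Females, habitual F0, VID pooled across consonants.
t_pooled = t_from_r_squared(0.788, 14)
print(round(t_p, 3), round(t_k, 3), round(t_pooled, 3))
```

Under this assumption, the computed values agree with the reported t (38) = 2.79971 for /p/, t (38) = 5.10695 for /k/, and t (14) = 7.21372 for the pooled female data to within rounding of the published r² values.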
IV. Discussion
Mean differences in IF0 between high and low vowels were significant for both women and men, and in both habitual and increased F0 conditions. In addition, the differences between the means of high and low vowels were exaggerated in the increased F0 condition.
Vowel height effects on VOT were observed for males in both F0 conditions with significantly longer VOT before high vowels (except for /p/ at habitual F0). For females, vowel height effects on VOT were observed for /k/ in both F0 conditions. At increased F0, slight, non-significant vowel height effects were observed for /p/ (∼9 ms), and /t/ (∼5 ms).
More consistent effects of vowel height were observed for VID with durations for all consonants significantly longer before high vowels for both men and women and in both F0 conditions. For stop consonants, the effects of vowel height on mean durations of both VOT and VID were exaggerated in the increased F0 condition. Data with /s/ did not follow this pattern.
The finding of significant vowel effects on VOT durations for only the female /k/ productions may reflect the confluence of place and gender effects on VOT. Several researchers have demonstrated an effect of place of articulation on VOT durations (Volaitis and Miller, 1992; Klatt, 1975; Lisker and Abramson, 1964). Specifically, VOT durations for stop consonants increase from bilabial to alveolar to velar places of articulation. Differences in the time-varying cross-sectional area of the constriction release, conditioned by place of articulation, could contribute an aerodynamic influence on place-conditioned differences in VOT. This aerodynamic influence could interact with the raising of the tongue for /k/, which could indirectly increase tension in the vocal folds, further increasing phonation threshold pressure (Solomon et al., 2007; Titze, 1992) and delaying voicing.
Females tend to produce longer VOTs than males (Swartz, 1992; Whiteside and Irving, 1997; Ryalls et al., 1997; Koenig, 2000; Robb et al., 2005). Voiceless stop contexts appear to reveal this effect most consistently. Results for voiced consonants are equivocal, with some studies reporting a comparable effect (Swartz, 1992; Whiteside and Irving, 1997; Ryalls et al., 1997), and others reporting a tendency for male VOTs to be longer than female VOTs (Smith, 1978; Whiteside and Irving, 1998).
Two studies suggest that gender differences in VOT are eliminated by correcting for (Allen et al., 2003), or controlling (Morris et al., 2008) gender differences in speaking rate. In the current data, average vowel durations for females were longer than for males (∼25 ms for habitual F0 and ∼8 ms for increased F0), indicating that females tended to assume a slower speaking rate. Moreover, vowel duration changed significantly with F0 condition only for females, with shorter vowel durations (∼14 ms) for increased F0 compared to habitual F0. Thus females spoke faster in the increased F0 condition (compared to habitual F0), whereas males did not. VOT durations can decrease with increasing speaking rate (Volaitis and Miller, 1992; Wayland et al., 1994) and with increasing F0 (McCrea and Morris, 2005). VOT durations may also increase in clear speech conditions with or without accompanying decreases in speaking rate (Picheny et al., 1986; Krause and Braida, 2004). Taken together, the relationship between F0 condition and speaking rate and the overall slower rate assumed by female participants suggests that females may have been more apt to assume a clear mode of speech, despite receiving no instructions regarding speech clarity. This possibility could account for the lack of covariation among female measures due to competing demands on laryngeal control.
The increased F0 condition was assumed to eliminate learning influences on possible covariation and provide a further test for a strictly passive, mechanical account of segmental variability. The lack of covariation exhibited within this condition could suggest that covariation observed for the male speakers’ habitual F0 speech is, in fact, reflective of a learned, active covariation. Another explanation is that the increased F0 condition increased the influence of reflexive neural coupling on laryngeal control (Liu and Larson, 2007; Sapir, 1989). High F0 phonation is characterized by a reduction in F0 jitter (Gelfer, 1995). An increased sensitivity to mechanical and auditory perturbation during increased F0 speech (increased “pitch shift reflex”), presumably to achieve this increased vocal stability, could involve a general increase in laryngeal and extralaryngeal muscular contraction (Larson et al., 2008; Liu and Larson, 2007; Loucks et al., 2005; Sapir et al., 2000). Such a mechanism could conceivably disrupt passive, mechanical covariation among segmental measures by stiffening the laryngeal and extralaryngeal mechanism and reducing the motor system tolerance for passive variation. While much of the research addressing this issue has looked at sustained phonation, pitch reflex sensitivity has been shown to vary as a function of F0 during speech (Liu et al., 2010).
In summary, the present study demonstrates covariation among IF0, VOT, and VID for male participants speaking at a habitual F0. Covariation among these three acoustic measures has not been previously demonstrated. While differences between high and low vowel conditions were quite small, and significant covariation among measures was confined to the male, habitual F0 data, the finding of covariation is consistent with a common passive, mechanical account for variability of these segmental measures. Data acquired from females and during voluntarily increased F0 speech were not consistent with the passive, mechanical account. A lack of covariation in the female data may have reflected a group tendency toward a clear speech mode during data acquisition. Increased F0 speech may invoke an increase in reflexive neural coupling that confounds passive, mechanical mechanisms of covariation among IF0, VOT, and VID. Overall, the results of the current work suggest that a common passive, mechanical explanation for the acoustic effects of vowel height variation may only be plausible for certain talkers and speaking conditions. Thus while some variation in these acoustic measures may derive automatically from extralaryngeal influences, talkers can actively modify laryngeal behavior to enhance specific acoustic cues (Hoole and Honda, 2011).
Acknowledgments
Portions of this work were derived from Dr. Moyle’s unpublished master’s thesis. We would like to thank Dr. Gary Weismer, who supervised that thesis and encouraged Dr. Berry’s reanalysis for the current manuscript, and Dr. Anders Löfqvist and two anonymous reviewers for helpful comments on an earlier version of this manuscript. This work was supported by Marquette University and NIH R01 DC003723.