The present study assessed the effect of sex on voice fundamental frequency (F0) responses to pitch feedback perturbations during sustained vocalization. Sixty-four native-Mandarin speakers heard their voice pitch feedback shifted at ±50, ±100, or ±200 cents for 200 ms, five times during each vocalization. The results showed that, as compared to female speakers, male speakers produced significantly larger but slower vocal responses to the pitch-shifted stimuli. These findings reveal a modulation of vocal response as a function of sex, and suggest that there may be a differential processing of vocal pitch feedback perturbations between men and women.

In recent years, researchers have exposed speakers to altered auditory feedback to explore the mechanisms underlying the control of voice fundamental frequency (F0). This research demonstrates that speakers respond to pitch feedback perturbations by rapidly shifting their F0 production in the direction opposite to the shifted auditory feedback in an effort to stabilize their production of sustained vowels (Hain et al., 2000) or speech phrases (Liu et al., 2009). Moreover, multiple lines of evidence have shown that vocal responses to pitch feedback perturbations are modulated not only as a function of the properties of the stimulus (perturbation magnitude, direction, etc.) (Liu and Larson, 2007) but also according to the demands of the specific vocal tasks (e.g., speaking vs singing) (Natke et al., 2003).

Although stimulus- or task-dependent modulation of vocal responses to perturbations in voice auditory feedback has been extensively investigated, little research has focused on whether auditory feedback is used differently to control voice F0 in different populations (Russo et al., 2008; Liu et al., 2010). In particular, there is no published report indicating whether male speakers differ from female speakers in their vocal responses to pitch feedback perturbations during sustained phonation. To date, studies have assumed that no sex effect exists for the online processing of auditory feedback perturbations. As a consequence, many of these studies used convenience sampling and almost exclusively involved female participants (Hain et al., 2000; Larson et al., 2007).

Clear anatomical differences between the sexes, such as the size of the larynx, do exist (Titze, 1989). These differences are the primary reason that women’s average F0 values are 1.45–1.7 times higher than men’s (Monsen and Engebretson, 1977; Klatt and Klatt, 1990). Functional brain studies have also indicated sex differences in pitch processing, with women relying on less lateralized processing strategies than men (Salmelin et al., 1999; Gaab et al., 2003). Thus, it is reasonable to hypothesize that these anatomical and functional differences may lead to differential processing of pitch perturbations in voice auditory feedback between men and women during sustained vowels. In the present study, native-Mandarin-speaking men and women were exposed to pitch-shifted auditory feedback to assess the role of sex in the processing of auditory feedback regarding F0.

Sixty-four native-Mandarin-speaking subjects (aged 19–27 yr, 32 men) participated in the experiment. They reported no history of hearing, language, speech, or neurological disorders. All subjects signed a consent form approved by the Institutional Review Board of The First Affiliated Hospital at Sun Yat-sen University of China.

The experiment was conducted in an acoustically shielded chamber. The subjects’ voices were recorded through a Genuine Shupu (Guangdong Province, China) microphone (model SM-306), amplified with a Mark of the Unicorn (MOTU) Ultralite Mk3 FireWire (Cambridge, MA) audio interface, pitch-shifted with an Eventide Eclipse Harmonizer (Little Ferry, NJ), and then played back to the subjects through Fostex (Akishima City, Tokyo, Japan) headphones (model T20RP mkII). Prior to the experiment, the recording system was acoustically calibrated so that the intensity of the feedback was 10 dB sound pressure level (SPL) higher than that of the voice output. A MIDI computer program developed with Max/MSP (v. 5.0 by Cycling ’74) was used to control the stimulus parameters (e.g., magnitude, direction, and duration) through the Eventide Eclipse Harmonizer. A transistor–transistor logic (TTL) pulse was used to mark the onset and offset of each pitch-shift stimulus. The voice, feedback, and TTL pulses were digitized at 10 kHz by a PowerLab analog-to-digital (A/D) converter (model ML880, AD Instruments, Castle Hill, Australia) and recorded using LabChart software (v. 7.0 by AD Instruments).

Subjects were asked to vocalize the vowel /u/ for approximately 5 s at their comfortable F0. During each vocalization, voice feedback was randomly pitch-shifted upward or downward five times. Each pitch-shift stimulus had a fixed duration of 200 ms. The first of the five stimuli in each vocalization was presented at a random time between 500 and 1000 ms after vocal onset, and the succeeding stimuli were separated by inter-stimulus intervals varying between 700 and 900 ms. Subjects produced 12 consecutive vocalizations in each of the three blocks of the experimental session, yielding 60 perturbations per block. Within each block, the pitch perturbation magnitude was fixed at ±50, ±100, or ±200 cents (100 cents = 1 semitone). The direction of the perturbation was randomized across the utterances within a block, leading to 30 upward and 30 downward perturbations. The order of the blocks was randomized across subjects.

Event-related averaging was used to measure the magnitude and latency of vocal responses (Larson et al., 2008) in Igor Pro [v. 6.0 by Wavemetrics Inc. (Lake Oswego, OR)] (see Fig. 1). A valid response was defined as a change in the F0 contour that exceeded two standard deviations (SDs) of the pre-stimulus mean, began at least 60 ms after stimulus onset, and lasted at least 50 ms. Response latency was measured as the time from stimulus onset to the point at which the response exceeded two SDs of the pre-stimulus mean, and response magnitude was measured as the difference between the pre-stimulus mean and the greatest or lowest value of the F0 contour following the response onset. Absolute values of response magnitude and latency were statistically analyzed using SPSS (v. 16.0). A repeated-measures analysis of variance (RMANOVA) was used to test for significant differences in response magnitude and latency across all conditions. Probability values were corrected for multiple degrees of freedom using the Greenhouse–Geisser correction if the assumption of sphericity was violated. The corrected p values were reported along with the original degrees of freedom.
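The response criteria described above can be illustrated with a short Python sketch. It is not the Igor Pro procedure used in the study; it is a minimal illustration that assumes the averaged F0 contour has already been converted to cents and sampled on a regular time grid aligned to stimulus onset. The function names (`hz_to_cents`, `detect_response`) and the choice to take the largest absolute deviation after response onset as the magnitude are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def hz_to_cents(f0_hz, ref_hz):
    """Express f0_hz in cents relative to ref_hz (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f0_hz / ref_hz)


def detect_response(times_ms, f0_cents, n_sd=2.0,
                    min_onset_ms=60.0, min_dur_ms=50.0):
    """Return (latency_ms, magnitude_cents), or None if no valid response.

    times_ms : sample times relative to stimulus onset (negative = pre-stimulus)
    f0_cents : averaged F0 contour in cents, same length as times_ms
    """
    # Pre-stimulus baseline statistics
    baseline = [v for t, v in zip(times_ms, f0_cents) if t < 0]
    base_mean = mean(baseline)
    threshold = n_sd * stdev(baseline)

    onset = None
    run_start = None
    for i, (t, v) in enumerate(zip(times_ms, f0_cents)):
        if t >= min_onset_ms and abs(v - base_mean) > threshold:
            if run_start is None:
                run_start = i  # first sample of a candidate excursion
            if t - times_ms[run_start] >= min_dur_ms:
                onset = run_start  # excursion has lasted long enough
                break
        else:
            run_start = None  # excursion ended too early; start over

    if onset is None:
        return None

    latency_ms = times_ms[onset]
    # Assumption: magnitude is the largest deviation from baseline
    # (in either direction) at or after the response onset.
    peak = max(f0_cents[onset:], key=lambda v: abs(v - base_mean))
    return latency_ms, peak - base_mean
```

Applied to a contour sampled every 10 ms, the function returns the time of the first sample of the qualifying excursion as the latency, so the temporal resolution of the estimate is limited by the sampling grid.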

FIG. 1. Representative voice F0 contours in response to +100 cents stimuli. Horizontal dense dotted lines denote ±2 SDs of the pre-stimulus mean averaged F0 and vertical dashed lines denote the onset and offset time of the response. Horizontal sparse dashed lines indicate the place where response magnitude is measured. Pitch-shift stimulus onset is at time 0.0.

A three-way (stimulus magnitude, stimulus direction, and sex) RMANOVA was performed on the response magnitude, and the results revealed significant main effects of stimulus magnitude [F(2, 124) = 4.595, p = 0.012] and sex [F(1, 62) = 5.659, p = 0.020] [see Fig. 2(A)]. Male speakers (mean ± SD: 15.37 ± 7.09 cents) produced significantly larger vocal response magnitudes than female speakers (13.32 ± 5.97 cents). Post-hoc Bonferroni tests indicated that 100 cents stimuli (15.39 ± 5.18 cents) yielded significantly larger response magnitudes than 50 cents stimuli (13.18 ± 5.53 cents) (p = 0.006) across all participants. There was, however, no main effect of stimulus direction on the response magnitude [F(1, 62) = 3.004, p = 0.088]: upward stimuli (13.89 ± 6.57 cents) and downward stimuli (14.89 ± 6.72 cents) elicited similar response magnitudes. No significant interactions among these three variables were observed.

FIG. 2.

(A) Averaged response magnitude (SD) as a function of stimulus magnitude for male and female speakers. (B) Averaged response latency (SD) across stimulus direction for male and female speakers.

Analysis of the response latency using a three-way RMANOVA showed significant main effects of stimulus direction [F(1, 62) = 8.934, p = 0.004] and sex [F(1, 62) = 4.688, p = 0.034] [see Fig. 2(B)] but not of stimulus magnitude [F(2, 124) = 2.691, p = 0.072]. Upward stimuli (115 ± 66 ms) yielded significantly longer latencies than downward stimuli (99 ± 51 ms). Male speakers (114 ± 61 ms) produced significantly longer latencies than female speakers (101 ± 57 ms). There were no significant interactions among these three variables.

In addition, the averaged voice F0 values were calculated from the baseline voice prior to the stimulus onset (i.e., the 200 ms pre-stimulus period) across all conditions for each subject. As expected, female speakers (285 ± 45 Hz) produced significantly higher voice F0 values than male speakers (184 ± 44 Hz) (t = −21.364, df = 350, p < 0.001). An analysis of covariance (ANCOVA), with voice F0 level as a covariate and stimulus magnitude and direction as fixed factors, was performed on all the response data. The results showed a significant effect of voice F0 level [F(1, 345) = 6.708, p = 0.010], where higher voice F0 level yielded smaller response magnitude (r = −0.17). Stimulus magnitude also had a significant effect on the response magnitude [F(1, 345) = 3.507, p = 0.031]. The subsequent ANCOVAs performed on the data from female and male speakers separately, however, revealed no effect of voice F0 level on response magnitude for either women [F(1, 161) = 0.129, p = 0.719; r = −0.12] or men [F(1, 177) = 0.315, p = 0.575; r = −0.083]. No significant main effects of stimulus magnitude or direction were observed in these analyses either (p > 0.05). Regarding response latency, there was no main effect of voice F0 level for the pooled latency data or for the data from male or female speakers separately (p > 0.05).
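As a quick arithmetic check, the group-mean baseline F0 values reported above can be expressed as a frequency ratio and as a musical interval in cents, the same unit used for the stimuli. The sketch below uses only the two reported means; the variable names are illustrative, and the result describes the group means rather than any individual speaker.

```python
import math

# Group-mean baseline F0 values reported in the text
female_f0_hz = 285.0
male_f0_hz = 184.0

# Female-to-male F0 ratio (about 1.55)
ratio = female_f0_hz / male_f0_hz

# Same difference expressed in cents (1200 cents = 1 octave)
interval_cents = 1200.0 * math.log2(ratio)

print(f"F0 ratio: {ratio:.2f}")
print(f"Interval: {interval_cents:.1f} cents")
```

The ratio falls within the 1.45–1.7 range cited from earlier work, and the corresponding interval is several times larger than the largest perturbation (±200 cents) used in the experiment.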

Anatomical differences in the larynx of men and women (Titze, 1989), as well as functional differences in the processing of pitch (Salmelin et al., 1999; Gaab et al., 2003), suggest the possibility that a speaker’s sex might affect vocal responses to pitch feedback perturbations during vocalization. The results of the present study showed that male speakers produced significantly larger but slower responses to pitch-shifted perturbations than female speakers. These findings indicate that sex represents one of the many important factors that contribute to the online processing of auditory feedback during sustained vocalization.

One possible explanation for larger vocal responses produced by men than by women is that, regardless of sex, speakers with lower voice F0 levels produce larger vocal compensation responses. Men, on average, have lower voice F0 levels than women, and in this study, this difference was significant. Indeed, we found an effect of the voice F0 as the covariate on the response magnitude when the data were collapsed over sex, where the higher voice F0 level yielded smaller vocal responses. In a previous study, vocal responses were found to be modulated as a function of voice F0 level (Liu and Larson, 2007). However, in that study, vocal responses were found to be larger when speakers produced higher F0 values relative to the vocal responses produced during lower F0 productions (Liu and Larson, 2007). Moreover, in our study, the ANCOVAs revealed no effect of the voice F0 level on the response magnitude or latency within each sex. Therefore, it is unlikely that vocal response differences we observed across men and women are merely the result of the overall difference between their vocal pitches. At present, the source of this sex-based modulation of corrective vocal responses is unknown and could lie at the peripheral (e.g., intrinsic laryngeal muscles) or at the cortical level (e.g., auditory cortex).

Although male speakers produced larger vocal responses, female speakers produced shorter response latencies. One possible reason is that women took less time to initiate compensation for the pitch feedback perturbations than men. This interpretation is consistent with a study by Xu and Sun (2002) on the maximum speed of pitch change during speech production. In that study, female speakers used less time than male speakers in the acceleration and deceleration phases of pitch changes. Physiological differences between men and women could be responsible for such differences in executing pitch changes. Differences between men and women in voice F0 are primarily accounted for by differences in the thickness, mass, and length of the vocal folds; women’s vocal folds are shorter, thinner, and less massive than men’s (Titze, 1989). These physiological characteristics may result in less laryngeal inertia during vocalization, which in turn may allow female speakers to initiate pitch changes and vocal compensation faster than male speakers.

The present findings also showed an effect of stimulus magnitude on the response magnitude, where 100 cents stimuli yielded larger response magnitudes than 50 cents stimuli. Although it would seem logical for response magnitudes to be positively correlated with the size of pitch perturbations, a stimulus size-dependent modulation of response magnitude is not always observed. For example, Liu and Larson (2007) reported that larger response magnitudes were associated with larger stimulus sizes, while two other studies utilizing a similar paradigm did not find this effect (Chen et al., 2007; Larson et al., 2008). Similarly, the present study found that downward pitch-shift stimuli elicited shorter response latencies than upward stimuli. Although this finding is consistent with the results reported by Larson et al. (2008), systematic changes in response latencies as a function of stimulus direction were not found in other studies (Chen et al., 2007; Larson et al., 2007). Since these studies used a paradigm similar to the one in our study, it is likely that the inconsistent results are due to other factors that differ across the studies, such as the language backgrounds of the participants (Mandarin vs English) or manipulations of somatosensation using anesthesia.

In summary, the present study demonstrates sex-specific processing of pitch feedback perturbations during sustained vocalization. Along with previous studies, these findings suggest that in addition to the stimulus properties and vocal tasks, characteristics of the participants such as sex, language experience, or vocal function may make important contributions to the mechanisms underlying voice F0 control. Although the source of the observed differences between the sexes in their responses to auditory feedback remains unclear, it is clear that the sex of the participants must be considered in future studies.

This work was supported by grants from the National Natural Science Foundation of China (NSFC, Grant Nos. 30970965 and 31070990) and the Guangdong Natural Science Foundation (Grant No. 9151008901000053).

1. Chen, S. H., Liu, H., Xu, Y., and Larson, C. R. (2007). “Voice F0 responses to pitch-shifted voice feedback during English speech,” J. Acoust. Soc. Am. 121, 1157–1163.
2. Gaab, N., Keenan, J. P., and Schlaug, G. (2003). “The effects of gender on the neural substrates of pitch memory,” J. Cogn. Neurosci. 15, 810–820.
3. Hain, T. C., Burnett, T. A., Kiran, S., Larson, C. R., Singh, S., and Kenney, M. K. (2000). “Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex,” Exp. Brain Res. 130, 133–141.
4. Klatt, D. H., and Klatt, L. C. (1990). “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” J. Acoust. Soc. Am. 87, 820–857.
5. Larson, C. R., Altman, K. W., Liu, H., and Hain, T. C. (2008). “Interactions between auditory and somatosensory feedback for voice F0 control,” Exp. Brain Res. 187, 613–621.
6. Larson, C. R., Sun, J., and Hain, T. C. (2007). “Effects of simultaneous perturbations of voice pitch and loudness feedback on voice F0 and amplitude control,” J. Acoust. Soc. Am. 121, 2862–2872.
7. Liu, H., and Larson, C. R. (2007). “Effects of perturbation magnitude and voice F0 level on the pitch-shift reflex,” J. Acoust. Soc. Am. 122, 3671–3677.
8. Liu, H., Russo, N., and Larson, C. R. (2010). “Age-related differences in vocal responses to pitch feedback perturbations: A preliminary study,” J. Acoust. Soc. Am. 127, 1042–1046.
9. Liu, H., Xu, Y., and Larson, C. R. (2009). “Attenuation of vocal responses to pitch perturbations during Mandarin speech,” J. Acoust. Soc. Am. 125, 2299–2306.
10. Monsen, R. B., and Engebretson, A. M. (1977). “Study of variations in the male and female glottal wave,” J. Acoust. Soc. Am. 62, 981–993.
11. Natke, U., Donath, T. M., and Kalveram, K. T. (2003). “Control of voice fundamental frequency in speaking versus singing,” J. Acoust. Soc. Am. 113, 1587–1593.
12. Russo, N., Larson, C., and Kraus, N. (2008). “Audio–vocal system regulation in children with autism spectrum disorders,” Exp. Brain Res. 188, 111–124.
13. Salmelin, R., Schnitzler, A., Parkkonen, L., Biermann, K., Helenius, P., Kiviniemi, K., Kuukka, K., Schmitz, F., and Freund, H. (1999). “Native language, gender, and functional organization of the auditory cortex,” Proc. Natl. Acad. Sci. U.S.A. 96, 10460–10465.
14. Titze, I. R. (1989). “Physiologic and acoustic differences between male and female voices,” J. Acoust. Soc. Am. 85, 1699–1707.
15. Xu, Y., and Sun, X. (2002). “Maximum speed of pitch change and how it may relate to speech,” J. Acoust. Soc. Am. 111, 1399–1413.