Voice pitch carries important information for speech understanding. This study examines the neural representation of voice pitch at the subcortical level, as reflected by scalp-recorded frequency-following responses from ten American and ten Chinese newborns. Using a set of four distinctive Mandarin pitch contours that mimic the English vowel /yi/, the study shows that the rising and dipping pitch contours produce significantly better tracking accuracy and larger response amplitudes than the falling pitch contour. This finding suggests a hierarchy of potential stimuli when testing neonates who are born in a tonal or non-tonal linguistic environment.

Voice pitch is an important cue for speech understanding in both tonal and non-tonal languages. In tonal languages, such as Chinese, a change in the pitch of a person's voice can assign a whole new meaning to a word. In non-tonal languages, such as English, differences in voice pitch can change the prosody of a message and indicate a statement, a question, or a demand. The frequency-following response (FFR) is a scalp-recorded physiological measurement that reflects synchronous patterns of action potentials in subcortical neural structures that are phase-locked to the periodicity of a stimulus.1,2 The FFR is advantageous because it is non-invasive and does not require attention, alertness, or any active participation of the listener. Because of these advantages, the FFR has been used to examine subcortical neural representation in adults,3,4 children,5,6 and infants.7,8 While the FFR has been investigated throughout the human lifespan, subcortical pitch representation in newborns is less well characterized. This study was designed to examine the subcortical neural representation of voice pitch during the immediate postnatal days of human infancy.

To date, only two studies have demonstrated phase locking to the Mandarin pitch contours in newborns.9,10 In 2011, Jeng and colleagues9 reported no difference between American and Chinese newborns in response to a rising pitch contour. In this first study of neonatal FFRs to voice pitch, no data were compared across the four Mandarin pitch contours. In a recent paper,10 the same group reported a developmental trajectory of subcortical pitch representation during the first three months of life in Chinese newborns. In that study, each newborn was tested twice: 1–3 days after birth and at 3 months of age. However, the FFR data were compared only between the two age groups (i.e., newborns versus 3-month-olds), with all the FFR recordings pooled together across the four pitch contours. Thus, how each of the four Mandarin tones is represented at the subcortical level in newborns remained undetermined.

There is a gap in our knowledge regarding the efficiency of the four Mandarin pitch contours in eliciting measurable FFRs during the immediate postnatal days. Because the FFR has the potential to be used as an objective assessment tool for pitch representation in newborns, it is critical to determine the pitch pattern that will elicit the most robust response. Although the FFRs to the four Mandarin pitch contours have been reported in adults,11 those results are not directly applicable to newborns because subcortical neural circuitries during the immediate postnatal days are different from those in adults. The purpose of this study was to determine the pitch pattern that yields FFRs with the best tracking accuracy and the largest response magnitude. Because neural phase locking is stronger at low frequencies than at high frequencies12 and each Mandarin pitch contour carries a distinctive trajectory of voice pitch, it was hypothesized that the newborn's subcortical pitch representation in response to the rising or dipping pitch contour would exhibit the best tracking accuracy and the largest response magnitude during the immediate postnatal days. To further examine whether subcortical pitch representation is a result of the common repertoire of auditory anatomy and functional capacity at birth or a consequence of early language exposure, neonates who were born in a tonal or non-tonal linguistic environment were included.

Ten American newborns (1–3 days old, 6 boys and 4 girls) and ten Chinese newborns (1–3 days old, 4 boys and 6 girls) were recruited from OhioHealth O'Bleness Hospital in Athens, Ohio and China Medical University Hospital in Taichung, Taiwan, respectively. All the newborn participants were recruited through on-site communications with the legal guardians of the newborns at their respective hospitals. All the American newborns were from native English-speaking households in the United States and all the Chinese newborns were from native Mandarin-speaking households in Taiwan. Testing was conducted in a quiet room in the nursery station of the newborns' hospitals. All newborns included were full-term at birth and passed a hearing screening in the nursery. These newborns were not diagnosed with any neurological disorders and were free from any risk factors for hearing loss such as low birth weight and low APGAR scores.13 The legal guardians of the newborns provided consent to allow their newborns to participate in this study. Research design and experimental protocols were approved by the Institutional Review Boards at both Ohio University and China Medical University Hospital.

A set of four contrastive, mono-syllabic Mandarin stimuli was used to evoke the FFRs. The four Mandarin stimuli were tone 1 /yi1/ with a flat pitch contour [fundamental frequency (f0) range: 163–180 Hz], tone 2 /yi2/ with a unidirectional rising pitch contour (f0 range: 116–157 Hz), tone 3 /yi3/ with a bidirectional falling-rising pitch contour (f0 range: 98–125 Hz), and tone 4 /yi4/ with a unidirectional falling pitch contour (f0 range: 135–186 Hz). The four Mandarin stimuli were recorded from a male Mandarin speaker and each stimulus had a duration of 250 ms with a 10 ms rise and fall time of the stimulus envelope.

Stimulus presentation and data acquisition were controlled using custom-made software written in LabVIEW (National Instruments, Austin, USA). The stimuli were presented monaurally, with alternating polarities, at 60 dB sound pressure level through an electro-magnetically shielded insert earphone (Etymotic ER-3A) to the newborn's left or right ear. The ear that was most accessible to the experimenters when the newborn was in a restful state of natural sleep was selected for testing. The silent interval between the offset of a stimulus and the onset of the next stimulus was fixed at 45 ms.
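A common rationale for presenting stimuli with alternating polarities, which the text does not spell out, is that any component that inverts with the stimulus waveform (such as electromagnetic stimulus artifact) cancels when responses to the two polarities are averaged, whereas components that follow the stimulus envelope remain. The sketch below illustrates this with a toy signal model. The amplitudes, the 120 Hz carrier, and all names are illustrative assumptions; this is not the authors' LabVIEW code.

```python
import math

FS = 20000   # samples/s (the digitization rate reported in the text)
F0 = 120.0   # Hz, arbitrary value chosen for illustration
N = 500      # 25 ms of synthetic signal

def carrier(i):
    """Toy stimulus waveform at sample index i."""
    return math.sin(2 * math.pi * F0 * i / FS)

def sweep(polarity):
    """Synthetic recording for one stimulus polarity (+1 or -1)."""
    out = []
    for i in range(N):
        artifact = 0.5 * polarity * carrier(i)   # inverts with the stimulus
        neural = 0.1 * abs(carrier(i))           # hypothetical polarity-invariant term
        out.append(artifact + neural)
    return out

pos, neg = sweep(+1), sweep(-1)

# Averaging the two polarities cancels the inverting component.
averaged = [(p + n) / 2 for p, n in zip(pos, neg)]

# What is left should equal the polarity-invariant term alone.
residual_artifact = max(abs(a - 0.1 * abs(carrier(i)))
                        for i, a in enumerate(averaged))
```

In this toy model `residual_artifact` is zero up to floating-point rounding, while each single-polarity sweep is dominated by the artifact term.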

The FFR signal was recorded using three gold-plated recording electrodes positioned at the midline of the high forehead just below the hairline (non-inverting), the mastoid ipsilateral to the insert earphone (inverting), and the low forehead (ground). The impedances of the three electrodes were kept under 3000 Ohms and balanced within 1500 Ohms at 10 Hz. Continuous brain activities were amplified (OptiAmp8008, gain 50000), filtered (10–3000 Hz, 6 dB/octave), digitized (16-bit analog-to-digital conversion, 20 000 samples/s), and saved on a computer for offline analysis. To better isolate spectral energies around the f0 contours, each recording was digitally band-pass filtered using a brick-wall, linear-phase, finite-impulse-response filter (90–1500 Hz, 500th order). Filtered brain waves were segmented (295 ms in length), artifact rejected (threshold = ±25 μV), and averaged.
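The segment, reject, and average steps can be sketched as follows. This is a minimal illustration under stated assumptions: sweeps are taken as contiguous blocks starting at sample 0 (a real recording would align them to stimulus triggers), rejection is applied to the peak absolute amplitude of each sweep, and the band-pass filtering step is omitted. The function name and arguments are hypothetical.

```python
def average_sweeps(recording_uv, fs, sweep_ms=295, reject_uv=25.0):
    """Cut a continuous recording (in microvolts) into fixed-length sweeps,
    drop sweeps whose absolute amplitude exceeds reject_uv, and average the rest.

    Assumes sweeps are contiguous and start at sample 0.
    """
    n = round(fs * sweep_ms / 1000)   # samples per sweep (5900 at 20 000 samples/s)
    sweeps = [recording_uv[i:i + n]
              for i in range(0, len(recording_uv) - n + 1, n)]
    kept = [s for s in sweeps if max(abs(v) for v in s) <= reject_uv]
    if not kept:
        raise ValueError("all sweeps were rejected")
    # Point-by-point mean across the accepted sweeps.
    average = [sum(column) / len(kept) for column in zip(*kept)]
    return average, len(kept)

# Toy example: three 4-sample sweeps; the middle one contains a 30 uV artifact.
toy = [1, 2, 3, 4,   1, 2, 30, 4,   3, 4, 5, 6]
toy_average, toy_kept = average_sweeps(toy, fs=1000, sweep_ms=4)
```

In the toy example the middle sweep is rejected and the remaining two are averaged point by point.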

Four FFR recordings were obtained from each newborn. Each recording consisted of 2000 artifact-free recording sweeps in response to one of the four tonal stimuli. The four Mandarin stimuli were presented in a random order within and across the participants. A control condition, where the sound tube was occluded and removed from the participant, was also administered. A typical testing session took about 49 min (295 ms × 2000 sweeps × 5 conditions ≅ 49 min) to complete; however, the total amount of time needed to ensure that the newborn was in a restful state or asleep varied. Due to time constraints, the control condition was conducted in only eight American and six Chinese newborns.
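The session-length estimate follows directly from the stimulus timing. The short calculation below reproduces it, taking the 295 ms sweep as the 250 ms stimulus plus the 45 ms silent interval; the variable names are illustrative. Because 2000 artifact-free sweeps are required, rejected sweeps would lengthen the session beyond this figure.

```python
STIM_MS = 250      # stimulus duration
GAP_MS = 45        # silent interval between stimulus offset and next onset
SWEEPS = 2000      # artifact-free sweeps per recording
CONDITIONS = 5     # four tones plus the control condition

sweep_ms = STIM_MS + GAP_MS                        # onset-to-onset interval
per_condition_min = sweep_ms * SWEEPS / 1000 / 60  # minutes per recording
session_min = per_condition_min * CONDITIONS       # minutes per session
presentation_rate_hz = 1000 / sweep_ms             # stimuli per second

print(f"{sweep_ms} ms per sweep, {per_condition_min:.2f} min per condition, "
      f"{session_min:.2f} min per session, {presentation_rate_hz:.2f} stimuli/s")
```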

Data were analyzed using bespoke scripts written in MATLAB (MathWorks, Natick, USA). Accuracy and magnitudes of the subcortical responses were estimated through two objective indices: frequency error and pitch strength, respectively. To estimate the accuracy of pitch tracking at the subcortical level, frequency error was estimated by calculating the mean difference between the f0 contour of the stimulus and that of a response. This index represented how accurately the response followed the f0 contour of a stimulus. To estimate the magnitude of a response, each recording was put through an autocorrelation algorithm to estimate the summed energy of the FFR to voice pitch for each recording. Pitch strength was derived by finding the peak-to-trough amplitude starting from the maximum positive peak (within the range of 3–10 ms time shifts) to the following trough in the normalized autocorrelation output of each recording. This index represented the robustness of pitch tracking at the subcortical level.
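The two indices can be sketched in a few lines. This is a dependency-free Python illustration, not the authors' MATLAB scripts, and it rests on several assumptions: the frequency error is taken as the mean absolute difference between two f0 contours sampled at the same time points (the text says only "mean difference"), the autocorrelation is normalized by its zero-lag value, and the "following trough" is taken as the first local minimum after the peak. Extraction of the f0 contours themselves is not shown.

```python
import math

def frequency_error(stim_f0_hz, resp_f0_hz):
    """Mean absolute difference (Hz) between two equally sampled f0 contours."""
    if len(stim_f0_hz) != len(resp_f0_hz):
        raise ValueError("contours must have the same number of points")
    return sum(abs(s - r) for s, r in zip(stim_f0_hz, resp_f0_hz)) / len(stim_f0_hz)

def normalized_autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, scaled so the zero-lag value is 1."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    energy = sum(v * v for v in xc)
    return [sum(xc[i] * xc[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]

def pitch_strength(x, fs, min_shift_s=0.003, max_shift_s=0.010):
    """Peak-to-trough amplitude of the normalized autocorrelation.

    The peak is the maximum within the 3-10 ms shift range; the trough is the
    first local minimum after that peak. Returns (strength, peak_shift_s).
    """
    lo = round(min_shift_s * fs)
    hi = round(max_shift_s * fs)
    r = normalized_autocorrelation(x, max_lag=2 * hi)
    peak = max(range(lo, hi + 1), key=lambda k: r[k])
    trough = peak
    while trough + 1 < len(r) and r[trough + 1] < r[trough]:
        trough += 1
    return r[peak] - r[trough], peak / fs

# Toy check: a steady 125 Hz tone has an 8 ms period, inside the 3-10 ms range.
fs = 2000
tone = [math.sin(2 * math.pi * 125 * i / fs) for i in range(500)]
strength, peak_shift_s = pitch_strength(tone, fs)
```

For a clean periodic signal the peak falls at the period and the strength approaches 2; noisy neonatal recordings would be expected to give much smaller values, consistent with the pitch strengths below 1 reported in this study.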

A two-way analysis of variance (ANOVA) was used to test for differences between the ethnic groups (American versus Chinese) and, within each group, among the tone conditions (tones 1, 2, 3, and 4). A conservative, post hoc Tukey-Kramer analysis was conducted to examine the significance of all pairwise comparisons among the four pitch contours. Student t tests were calculated to determine whether the means of the responses to the four pitch contours were significantly different from those obtained in the control condition. A p value less than 0.05 was considered statistically significant.
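The results report each t statistic together with its degrees of freedom and Cohen's d. As a minimal sketch of how these three quantities relate, the function below computes them for two independent samples using a pooled standard deviation. Whether the authors used this pooled, independent-samples form is an assumption, and the function name and toy data are hypothetical. The p value is not computed here because it requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics

def pooled_t_and_cohens_d(a, b):
    """Return (t, degrees_of_freedom, cohens_d) for two independent samples,
    using the pooled-variance (Student) form."""
    na, nb = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    pooled_sd = math.sqrt(pooled_var)
    t = mean_diff / (pooled_sd * math.sqrt(1 / na + 1 / nb))
    d = mean_diff / pooled_sd        # effect size in pooled-SD units
    return t, na + nb - 2, d

# Toy data, not values from this study.
t_stat, dof, effect = pooled_t_and_cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```

The two quantities differ only by the sample-size factor: t equals d divided by the square root of (1/n1 + 1/n2).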

Subcortical pitch representations were visualized through the use of sliding-window, narrow-band spectrograms of the neonatal FFRs (Fig. 1). Amplitude spectrograms of the four Mandarin stimuli (top row) delineated the distinctive pitch contours of the four Mandarin tones. Grand-averaged spectrograms derived from the recordings obtained in the ten American (middle row) and ten Chinese (bottom row) newborns showed clear energy that followed the pitch contours of the four Mandarin tones.

Fig. 1.

Amplitude spectrograms and time waveforms of the four Mandarin stimuli (top row) and grand-averaged frequency-following responses obtained in ten American (middle row) and ten Chinese (bottom row) newborns who were 1–3 days after birth. A gradient progression and a vertical bar on the right indicate the spectral and temporal amplitudes of the recordings, respectively. All spectrograms were obtained using a sliding Hanning window with a duration of 50 ms, step size of 1 ms in length, and frequency resolution of 1 Hz.
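The windowing parameters in the caption translate into sample counts as follows. The calculation assumes the spectrograms were computed at the 20 000 samples/s digitization rate, on one 295 ms averaged sweep, without padding at the edges, and that the 1 Hz figure refers to the spacing of FFT bins obtained by zero-padding; none of these details are stated in the text.

```python
FS = 20000        # samples/s, assumed equal to the digitization rate
WINDOW_MS = 50    # Hanning window duration
STEP_MS = 1       # step between successive windows
BIN_HZ = 1        # assumed FFT bin spacing
SWEEP_MS = 295    # length of one averaged sweep

window_samples = FS * WINDOW_MS // 1000    # samples in each window
hop_samples = FS * STEP_MS // 1000         # samples between window starts
fft_points = FS // BIN_HZ                  # FFT length needed for 1 Hz bins
zero_pad_factor = fft_points // window_samples
sweep_samples = FS * SWEEP_MS // 1000
frames = (sweep_samples - window_samples) // hop_samples + 1

print(window_samples, hop_samples, fft_points, zero_pad_factor, frames)
```

Under these assumptions each 1000-sample window is zero-padded twentyfold. Zero-padding interpolates the spectrum rather than sharpening it, so the ability to separate nearby frequencies is still set by the 50 ms window and is on the order of tens of hertz.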


Results of the two-way ANOVA for frequency error revealed a significant difference for the Mandarin-tone factor [F(3, 320.460) = 3.804, p = 0.015, partial η2 = 0.174], but not for the group factor [F(1, 46.929) = 0.959, p = 0.340, partial η2 = 0.051] nor for the interaction [F(3, 12.475) = 0.148, p = 0.930, partial η2 = 0.008] between the two factors. A post hoc analysis revealed that tones 2 and 3 produced smaller frequency errors than tone 4 (see Table 1). For clarity, results of all pairwise comparisons are summarized in Table 1.

Table 1.

Group mean differences ± one standard error of the frequency errors and pitch strengths obtained in ten American and ten Chinese newborns who were 1–3 days old. The stimuli were four distinctive Mandarin mono-syllables that mimicked the English vowel /yi/ with a relatively flat (tone 1), rising (tone 2), dipping (tone 3), or falling (tone 4) pitch contour. These results were from a post hoc Tukey-Kramer procedure conducted within the four Mandarin tone conditions. Note: For the American newborns, the mean frequency errors for tones 1, 2, 3, and 4 were 13.862, 11.299, 10.784, and 16.404 Hz, respectively. The mean frequency errors for the Chinese newborns were 11.917, 9.396, 10.614, and 14.295 Hz, respectively. The mean pitch strengths for the American (and Chinese) newborns were 0.452 (0.509), 0.534 (0.614), 0.621 (0.621), and 0.394 (0.487) for tones 1, 2, 3, and 4, respectively.

Measure           Tone (I)   Tone (J)   Mean difference (I − J)   Standard error   p        95% confidence interval
Frequency error   1          2           2.541                    1.705            0.153    −1.041 to 6.124
                  1          3           2.190                    1.888            0.261    −1.776 to 6.157
                  1          4          −2.461                    1.666            0.157    −5.961 to 1.039
                  2          3          −0.351                    1.521            0.820    −3.546 to 2.844
                  2          4          −5.002                    1.376            0.002a   −7.894 to −2.110
                  3          4          −4.651                    1.842            0.021a   −8.522 to −0.781
Pitch strength    1          2          −0.093                    0.059            0.133    −0.216 to −0.031
                  1          3          −0.138                    0.085            0.122    −0.317 to −0.041
                  1          4           0.042                    0.047            0.380    −0.056 to 0.140
                  2          3          −0.046                    0.061            0.467    −0.174 to 0.083
                  2          4           0.134                    0.049            0.013a    0.032 to 0.237
                  3          4           0.180                    0.060            0.007a    0.055 to 0.305

a p < 0.05. Tone (I) and tone (J) each denote tone 1, 2, 3, or 4.
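The mean differences in Table 1 can be cross-checked against the per-group means given in the table note. Because the two groups are the same size (ten newborns each), the mean for each tone across all newborns is the simple average of the two group means. The sketch below does this for the frequency errors and takes the tone pairs in the order (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4); the variable names are illustrative.

```python
# Per-group mean frequency errors (Hz) for tones 1-4, from the table note.
freq_error_hz = {
    "American": [13.862, 11.299, 10.784, 16.404],
    "Chinese":  [11.917, 9.396, 10.614, 14.295],
}

# Equal group sizes, so the across-group mean is the plain average.
tone_means = [sum(vals) / len(vals) for vals in zip(*freq_error_hz.values())]

# All six pairwise differences, tone I minus tone J, with I < J.
mean_diffs = {(i + 1, j + 1): tone_means[i] - tone_means[j]
              for i in range(4) for j in range(i + 1, 4)}

for pair, diff in mean_diffs.items():
    print(pair, round(diff, 3))
```

The six values agree with the frequency-error mean differences listed in the table to within rounding of the reported means.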

For pitch strength, the ANOVA also revealed a significant effect of the tone factor [F(2.031, 0.409) = 3.632, p = 0.036, partial η2 = 0.168], but not of the group factor [F(1, 0.067) = 0.941, p = 0.345, partial η2 = 0.050] nor of the interaction [F(2.031, 0.026) = 0.229, p = 0.800, partial η2 = 0.013] between the two factors. A post hoc Tukey-Kramer analysis revealed that tones 2 and 3 elicited larger pitch strengths than tone 4 (see Table 1). These two findings indicated a differential pitch representation at the subcortical level for both the American and Chinese newborns during their first three days after birth.

Frequency errors and pitch strengths of the FFRs obtained in the American and Chinese newborns are plotted in the same panels for comparison. Figure 2 plots the frequency errors (left panel) and pitch strengths (right panel) obtained from the American and Chinese newborns in response to the four Mandarin pitch contours. Recordings obtained in the control condition produced significantly larger frequency errors (t = −3.760, p < 0.001, degrees of freedom = 91, Cohen's d = 1.124) and smaller pitch strengths (t = 5.198, p < 0.001, degrees of freedom = 91, Cohen's d = 1.554) than those obtained in the experimental conditions. Furthermore, the means of the frequency errors and pitch strengths of the recordings obtained in the control condition in the American newborns were not significantly different from those obtained in the Chinese newborns (t = −0.094, p = 0.463, degrees of freedom = 11, Cohen's d = 0.054 for frequency error; t = 0.174, p = 0.433, degrees of freedom = 11, Cohen's d = 0.099 for pitch strength). These two findings indicated that the recordings were not contaminated by stimulus artifact. Furthermore, they demonstrated the consistency of experimental preparations and the quality of data collection in both the American and Chinese newborns at their respective hospitals.

Fig. 2.

Group comparisons of the frequency errors (left panel) and pitch strengths (right panel) derived from the FFR recordings obtained in ten American (black vertical bars) and ten Chinese (gray vertical bars) newborns who were 1–3 days old. The stimuli were a set of four distinctive Mandarin pitch contours (tone 1 flat, tone 2 rising, tone 3 dipping, and tone 4 falling). The dotted and solid horizontal lines (that go across the entire width of each panel) indicate the means of the frequency errors and pitch strengths obtained in a control condition for the American and Chinese newborns, respectively. Each asterisk marks a statistically significant difference (p < 0.05).


The feasibility of assessing the subcortical pitch representation in response to the four Mandarin pitch contours was demonstrated through the characteristics of the neonatal FFRs. Results of this study support the idea that infants are born with the necessary anatomical structures and functional capacities to track pitch contours, which provide a basis for language acquisition. Results also demonstrate that rising and dipping pitch contours elicit FFRs with better accuracy and larger magnitudes than falling pitch contours. This finding supports the idea of differential pitch representation at the subcortical level in 1–3-day-old newborns.

Cross-linguistic studies have shown that newborns and young infants are capable of distinguishing the various features of human speech that may exist in their native language from those that may exist only in foreign languages.14 The results of the present study showed no statistical differences in the FFRs recorded from the American and Chinese newborns. This finding is consistent with the biological capacity model,15 which states that human infants are born with an equal ability to detect any differences between speech sounds. The newborn's capability of detecting differences between speech sounds can be attributed, at least partially, to the early maturation and readiness of the auditory system during the early stages of life. For example, the human cochlea has reached adult size and is fully functional at birth.16 The auditory nerve and neural elements at the subcortical level are also capable of processing a wide spectrum of speech sounds and producing recognizable electrophysiological potentials, such as the auditory brainstem response to clicks.17 The similar trend of differential pitch representation in both American and Chinese newborns indicates that differential processing of pitch contours is common to newborns from both English- and Chinese-speaking environments, and thus is not the result of early language exposure, but is instead a consequence of the common repertoire of auditory anatomy and functional capabilities at birth.

Differential neural representation at the subcortical level, in response to the various pitch contours, has also been observed in human adults.11 Krishnan and colleagues11 examined the subcortical neural representation in response to a set of four Mandarin stimuli. Their results also demonstrate that rising and dipping contours elicit stronger FFRs in adults than the other two Mandarin pitch contours. One possible explanation of this phenomenon common to both newborns and adults is that the four Mandarin tones have different pitch contours and each contour starts at a different frequency range. Specifically, the f0's of the rising and dipping pitch contours start at lower frequencies (117 Hz for the rising and 113 Hz for the dipping pitch contours) than those of the other two Mandarin pitch contours (163 Hz for the flat and 186 Hz for the falling pitch contours). Animal studies have shown similar trends.18,19 Certain neurons in the subcortical nuclei18 and auditory cortex19 have demonstrated differential neural sensitivities to the direction of different frequency trajectories of the stimuli. It is possible that once neurons have phase-locked to the f0's at the beginning of a stimulus presentation, it is easier for them to continue following and tracking the frequency changes of the stimuli. Thus, this finding may be generalizable to participants across cultures and species.

In conclusion, results obtained in this study demonstrated the presence of differential pitch representation at the subcortical level in both American and Chinese newborns during their immediate postnatal days. These findings hold importance in both the basic-science research and clinical application realms. From the viewpoint of basic science, data obtained in this study provide a basis of subcortical pitch representation in newborns and allow further exploration of signal processing and neuroplasticity at the subcortical level when newborns advance in age. From the viewpoint of clinical applications, if the FFR is to be utilized as an assessment tool for newborns, the rising and dipping pitch contours may be the most efficient stimuli to employ.

This research received funding from the National Science Foundation—Division of Behavioral and Cognitive Sciences in the United States (Grant No. BCS-1250700) and the Ministry of Health and Welfare Clinical Trial and Research Center of Excellence in Taiwan (Grant No. MOHW104-TDU-B-212-113002).

1. F. G. Worden and J. T. Marsh, “Frequency-following (microphonic-like) neural responses evoked by sound,” Electroencephalogr. Clin. Neurophysiol. 25, 42–52 (1968).
2. G. Moushegian, A. L. Rupert, and R. D. Stillman, “Laboratory note. Scalp-recorded early responses in man to frequencies in the speech range,” Electroencephalogr. Clin. Neurophysiol. 35, 665–667 (1973).
3. P. C. M. Wong, E. Skoe, N. M. Russo, T. Dees, and N. Kraus, “Musical experience shapes human brainstem encoding of linguistic pitch patterns,” Nat. Neurosci. 10, 420–422 (2007).
4. F.-C. Jeng, H.-K. Chung, C.-D. Lin, B. M. Dickman, and J. Hu, “Exponential modeling of human frequency-following responses to voice pitch,” Int. J. Audiol. 50, 582–593 (2011).
5. N. Russo, T. Nicol, G. Musacchia, and N. Kraus, “Brainstem responses to speech syllables,” Clin. Neurophysiol. 115, 2021–2030 (2004).
6. N. Kraus, J. Slater, E. C. Thompson, J. Hornickel, D. L. Strait, T. Nicol, and T. White-Schwoch, “Music enrichment programs improve the neural encoding of speech in at-risk children,” J. Neurosci. 34, 11913–11918 (2014).
7. F.-C. Jeng, E. A. Schnabel, B. M. Dickman, J. Hu, X. Li, C.-D. Lin, and H.-K. Chung, “Early maturation of frequency-following responses to voice pitch in infants with normal hearing,” Percept. Motor Skills 111, 765–784 (2010).
8. S. Anderson, A. Parbery-Clark, T. White-Schwoch, and N. Kraus, “Development of subcortical speech representation in human infants,” J. Acoust. Soc. Am. 137, 3346–3355 (2015).
9. F.-C. Jeng, J. Hu, B. M. Dickman, K. Montgomery-Reagan, M. Tong, G. Wu, and C.-D. Lin, “Cross-linguistic comparison of frequency-following responses to voice pitch in American and Chinese neonates and adults,” Ear Hear. 32, 699–707 (2011).
10. F.-C. Jeng, C.-D. Lin, M.-S. Chou, G. R. Hollister, J. T. Sabol, G. N. Mayhugh, T.-C. Wang, and C.-Y. Wang, “Development of subcortical pitch representation in three-month-old Chinese infants,” Percept. Motor Skills 122, 123–135 (2016).
11. A. Krishnan, Y. Xu, J. Gandour, and P. A. Cariani, “Human frequency-following response: Representation of pitch contours in Chinese tones,” Hearing Res. 189, 1–12 (2004).
12. E. D. Young and M. B. Sachs, “Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers,” J. Acoust. Soc. Am. 66, 1381–1403 (1979).
13. Joint Committee on Infant Hearing, “Position statement,” ASHA 24, 1017–1018 (1982).
14. M. Friedrich, B. Herold, and A. D. Friederici, “ERP correlates of processing native and non-native language word stress in infants with different language outcomes,” Cortex 45, 662–676 (2009).
15. P. D. Eimas, E. R. Siqueland, P. Jusczyk, and J. Vigorito, “Speech perception in infants,” Science 171, 303–306 (1971).
16. D. A. Frenz, J. R. McPhee, and T. R. Van De Water, “Structural and functional development of the ear,” in Physiology of the Ear, edited by A. F. Jahn and J. Santos-Sacchi (Singular Thomson Learning, San Diego, 2001), pp. 191–214.
17. D. R. Stapells, “Maturation of the contralaterally recorded auditory brain stem response,” Ear Hear. 12, 167–173 (1991).
18. P. G. Nelson, S. D. Erulkar, and J. S. Bryan, “Responses of units of the inferior colliculus to time-varying acoustic stimuli,” J. Neurophysiol. 29, 834–860 (1966).
19. T. Watanabe, “Fundamental study of the neural mechanism in cats subserving the feature extraction process of complex sounds,” Jpn. J. Physiol. 22, 569–583 (1972).