Vocal recognition of socially relevant conspecifics is an important skill throughout the animal kingdom. Human infants recognize their own mother at birth, and they distinguish between unfamiliar female talkers by 4.5 months of age. Can 4.5-month-olds also distinguish between unfamiliar male talkers? To date, no adequately powered study has addressed this question. Here, a visual fixation procedure demonstrates that, unlike adults, 4.5-month-olds (N = 48) are worse at telling apart unfamiliar male voices than they are at telling apart unfamiliar female voices. This result holds despite infants' equal attentiveness to unfamiliar male and female voices.
1. Introduction
Newborns recognize their own mother's voice soon after birth, presumably due to in utero exposure during the third trimester of pregnancy (e.g., DeCasper and Fifer, 1980; Ockleford et al., 1988). Soon thereafter, recognition of the father's voice emerges (Hulsebus, 1981; Lynn, 1974). Thus, voice recognition could be conceptualized as a ready-made skill requiring little time or experience to develop. However, work with older children paints a different picture, suggesting that voice recognition does not fully mature until adolescence (Creel and Jimenez, 2012; Fecher and Johnson, 2018a; Mann et al., 1979) and that this slowly emerging skill is sculpted by experience with specific types of voices heard early in development (Fecher and Johnson, 2018a; Friendly et al., 2014; Levi and Schwartz, 2013). The stark contrast between the competency suggested in the infant literature and the protracted period of development demonstrated in the child literature may be in part due to the former's narrow focus on the identification of familiar female voices and the latter's broader focus on a wide variety of familiar and unfamiliar talkers presented in a diverse range of recognition contexts [see, e.g., Fecher (2019) for discussion]. Here, we aim to narrow the gap between these infant and child literatures by probing the limits of 4.5-month-olds' voice discrimination abilities. More specifically, we ask how well young infants can tell apart newly encountered male voices.
The few existing infant studies examining unfamiliar voice discrimination have used pairs of female voices (Fecher and Johnson, 2019) or a male and a female voice (e.g., Floccia et al., 2000; Lecanuet et al., 1993; Miller, 1983). Unsurprisingly, even while still in the womb, infants readily distinguish between unfamiliar talkers when one talker is male and the other talker is female (Floccia et al., 2000; Lecanuet et al., 1993). Evidence for the ability to tell apart unfamiliar female voices, however, has not been reported until infants reach 4–8 months of age (Friendly et al., 2014), and even then, infants only succeed at distinguishing between unfamiliar female voices when they speak in a familiar language (Fecher and Johnson, 2018b; Johnson et al., 2011). Thus, infants appear to initially find it much easier to identify familiar talkers than to distinguish between unfamiliar talkers, especially when those unfamiliar talkers speak in an unfamiliar language.
While few studies explore successful discrimination of same-gendered voices in infancy, studies examining infants' ability to identify male talkers are even rarer. We know infants can distinguish between a familiar male voice (i.e., the infant's own father) and an unfamiliar male voice (e.g., Ward and Cooper, 1999; DeCasper and Prescott, 1984), but can they distinguish between two unfamiliar male voices? This is a valid question; past studies claim that infants prefer to listen to female voices (Brazelton, 1978; Standley and Madsen, 1990) and that the cues used to distinguish male voices can differ from the cues used to distinguish female voices [Murry and Singh, 1980; Singh and Murry, 1978; see, e.g., Kreiman (2008) for discussion]. Nonetheless, we are aware of only two studies that have asked whether infants can distinguish unfamiliar male voices as readily as they distinguish unfamiliar female voices. One of these studies pooled data from infants' performance on male and female voice pairs, making it impossible to determine whether infants were equally successful with male and female voices (Brookes et al., 2001). The other study, which reported that newborns can distinguish between male voices, drew this conclusion from a sample of just six infants (DeCasper and Prescott, 1984). Resolving whether infants can distinguish between unfamiliar male talkers will provide crucial understanding of the extent of early talker discrimination abilities, specifically, how infants' ability to tell apart voices might be shaped by early experience and whether it generalizes to different kinds of voices.
To summarize, young infants excel at identifying familiar talkers, but we know much less about their ability to identify unfamiliar talkers. Although a few studies report discrimination between unfamiliar female talkers, there is limited evidence that infants can tell apart other types of talkers—such as unfamiliar males. Here, we probe the limits of infants' voice identification abilities by presenting 4.5-month-olds with pairs of unfamiliar male voices. We then directly compare our results to the results of Fecher and Johnson (2019), who tested 4.5-month-olds' ability to tell apart unfamiliar female voices using an identical procedure. The infants we tested were drawn from the same population and run by the same experimenters using the same equipment as in the Fecher and Johnson study. We predict two possible outcomes. If young infants are equally skilled at telling apart male and female voices, then we should observe the same results as Fecher and Johnson, with infants looking longer to same-voice trials than different-voice trials in the test phase; however, if the ability to tell voices apart is fragile at 4.5 months and/or if talker recognition is initially best for the kinds of talkers with whom young infants spend the most time, and perhaps are most dependent on for their basic care (i.e., adult females, in the case of our sample), then we predict the infants who hear male voices in the current study will perform worse in this voice discrimination task than the infants who heard female voices in Fecher and Johnson.
2. Method
2.1 Participants
Forty-eight full-term monolingual English-learning 4–5-month-old infants [mean age (Mage) = 137 days, range = 120–157; 24 female] from the greater Toronto area were tested.¹ Infants were exposed to English at least 90% of the time. Of the 46 families that responded, all reported mothers as the primary caregiver. Note that Ontario has a generous maternal leave policy, allowing primary caregivers to be at home with their newborn for a full year. Data from 16 additional infants were excluded from analysis because of failure to complete at least six exposure trials before reaching the predefined exposure criterion required to proceed to test (n = 8), fussiness (n = 3), failure to reach the posttest criterion (n = 2), or completion of the maximum number of exposure trials (n = 3).
2.2 Stimuli
Stimuli consisted of 40 unrelated sentences (16–18 syllables/sentence) used in previous infant studies (e.g., Fecher and Johnson, 2021, 2019) that were re-recorded by four adult male talkers in a neutral tone of voice. As in Fecher and Johnson, these four talkers were separated into two pairs. All talkers learned English from birth in Canada and were non-smokers with no particularly distinctive voice characteristics. Recordings (48 kHz; normalized to 69.5 dB) were made in a double-walled, sound-attenuated Industrial Acoustics Company (IAC) booth. Thirty-six sentences were used during the exposure phase, and the remaining four sentences were used at test. As in Fecher and Johnson, which used female voices, the current study used four acoustically similar male voices (see Table 1).
Table 1. Mean (M) acoustic measures, with standard deviation (SD) in parentheses, for the female voices from Fecher and Johnson (2019) and the male voices from the current study.
Gender pair | Talker | M F0 (Hz) | SD F0 (Hz) | Durationᵃ (s) | Articulation rate (syllables/s)
---|---|---|---|---|---
Female pair 1 | Talker 1 | 196.5 (7.8) | 47.4 (8.4) | 4.2 (0.4) | 4.1 (0.4)
Female pair 1 | Talker 2 | 204.85 (7.1) | 40.5 (6.5) | 3.7 (0.3) | 4.6 (0.4)
Female pair 2 | Talker 3 | 211.4 (9.3) | 45.2 (11.8) | 4.2 (0.4) | 4.1 (0.4)
Female pair 2 | Talker 4 | 212.1 (8.3) | 35.3 (8.5) | 3.9 (0.3) | 4.4 (0.4)
Male pair 1 | Talker 1 | 124.3 (5.0) | 24.8 (4.9) | 3.2 (0.3) | 5.3 (0.5)
Male pair 1 | Talker 2 | 113.0 (7.3) | 26.4 (7.2) | 3.2 (0.3) | 5.4 (0.5)
Male pair 2 | Talker 3 | 133.1 (7.0) | 33.5 (8.5) | 3.9 (0.3) | 4.4 (0.4)
Male pair 2 | Talker 4 | 135.8 (7.5) | 27.3 (5.9) | 3.8 (0.5) | 4.5 (0.5)
ᵃ Average sentence duration.
In addition to ensuring that the male talkers were comparable to the female talkers used in Fecher and Johnson in terms of acoustic variability, we pre-tested our stimuli to establish that adults perceived the male talkers used in the current study as matched in degree of perceptual distinctiveness to the female talkers used in Fecher and Johnson. Adults who learned English before the age of six completed two perception tasks with the talkers used in Fecher and Johnson and the current study: an AX discrimination task and a pairwise similarity rating task. Adults in the AX discrimination task (N = 20; Mage = 19.1 years; 17 female) completed 48 randomized trials blocked by gender and were asked to decide whether two different phrases were produced by the same talker or two different talkers. As expected, adults distinguished male (M = 0.93; SD = 0.07) and female (M = 0.94; SD = 0.07) voices equally well, t(19) = 0.75, p = 0.46. An additional 30 adults (Mage = 20.1 years; seven female; one non-binary) completed a pairwise similarity rating task. Here, listeners rated the similarity (7 = very different; 1 = same person) of each voice to every other voice twice, across a total of 64 randomized trials. A Tukey multiple comparison test was used to assess pairwise mean similarity between the specific male and female voice pairs featured in Fecher and Johnson and the current study. Both pairs of male voices (MPair1 = 4.15; SDPair1 = 1.6; MPair2 = 3.65; SDPair2 = 1.5) were rated as distinctive as at least one of the female pairs (MPair1 = 5.12; SDPair1 = 1.5; MPair2 = 3.85; SDPair2 = 1.6). In sum, these results indicate that, at least according to the measures reported here, adults found the female and male voice pairs equally discriminable and generally similar in perceptual distinctiveness.
2.3 Procedure
Infants were tested using a visual fixation procedure identical to Fecher and Johnson (2019), except that, as described above, stimuli were produced by four male talkers instead of four female talkers. Infants sat on their caregiver's lap in an IAC booth facing a 21.5-in. computer monitor that displayed a multicolored flickering checkerboard during all trials (see Fig. 1). Speech stimuli were presented over loudspeakers [Alesis (Cumberland, RI) M1Active 520 USB] at a constant, comfortable listening level. The experimenter monitored and relayed infants' looking behavior to a computer (running Habit2, 2018, version 2.2.1) outside the booth. Each trial was initiated by the experimenter once infants oriented toward a blinking red star that appeared in the center of the monitor. Caregivers wore noise-canceling headphones that played masking music mixed with speech stimuli from the experiment to prevent them from influencing their child's behavior.
Fig. 1. Example of the exposure and test phase stimuli used in the visual fixation procedure.
Infants were randomly assigned to one of the two talker pair conditions: pair 1 (N = 24) or pair 2 (N = 24). The experiment consisted of an exposure phase and a test phase (see Fig. 1). In the exposure phase, each infant-controlled trial (maximum 16 s long) featured two sentences produced by one of the talkers, repeated cyclically (interstimulus interval between sentences = 300 ms; minimum look time = 1 s; minimum look-away time = 2 s). Once infants' looking time had decreased to 65% of the initial duration (averaged over a sliding window of the first three trials) or they had completed the maximum of 18 exposure trials, the test phase began. Because the exposure and test trials were identical in design, neither the caregivers nor the experimenter knew when the test phase began.
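The exposure criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `trials_to_criterion` is hypothetical, the baseline is taken to be the mean of the first three trials, and the criterion is checked against the mean of the three most recent trials. The rule actually applied by the testing software may differ in detail.

```python
from statistics import mean

WINDOW = 3        # trials per window (from the text)
CRITERION = 0.65  # proportion of the initial looking time (from the text)
MAX_TRIALS = 18   # maximum number of exposure trials (from the text)


def trials_to_criterion(looking_times):
    """Return the trial on which the exposure criterion is met, or None.

    Assumption: the baseline is the mean of the first WINDOW trials, and the
    criterion is met when the mean of the most recent WINDOW trials falls to
    CRITERION * baseline or below. The software's exact rule may differ.
    """
    times = list(looking_times)[:MAX_TRIALS]
    if len(times) <= WINDOW:
        return None  # not enough trials to compare against the baseline
    baseline = mean(times[:WINDOW])
    for n in range(WINDOW + 1, len(times) + 1):
        if mean(times[n - WINDOW:n]) <= CRITERION * baseline:
            return n
    return None  # criterion never met within MAX_TRIALS


# Hypothetical looking times (s) for one infant, one value per trial.
print(trials_to_criterion([10, 10, 10, 9, 8, 6, 5, 4]))
```

Under these assumptions, a return value of `None` after 18 trials corresponds to an infant who completed the maximum number of exposure trials without meeting the criterion.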
As is typical of infant discrimination studies, we included a pretest trial (before the first exposure trial) and a posttest trial (after the last test trial), during which infants were presented with a video of a colorful spinning pinwheel paired with sound effects. For the data to be included in the final analysis, infants' looking time during the posttest trial had to be at least 80% of their looking time during the pretest trial. This criterion ensured that infants were sufficiently attentive and could detect a change in stimulus during the task. As in Fecher and Johnson, data from infants who failed to reach the criteria for habituation (completing fewer than six, or reaching the maximum of 18, habituation trials) were also excluded prior to analysis.
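The inclusion rules in this paragraph amount to a simple filter, sketched below. The function and parameter names are hypothetical, and the sketch assumes that any infant who ran all 18 habituation trials is excluded, which is one reading of the text. Fussiness, the remaining exclusion reason, is a judgment call and is not modeled.

```python
POSTTEST_RATIO = 0.80  # posttest looking must be at least 80% of pretest (from the text)
MIN_TRIALS = 6         # fewer than six habituation trials leads to exclusion (from the text)
MAX_TRIALS = 18        # reaching the maximum leads to exclusion (from the text)


def include_infant(pretest_s, posttest_s, n_habituation_trials):
    """Return True if an infant's data meet the stated inclusion criteria.

    pretest_s, posttest_s: looking times (s) on the pretest and posttest trials.
    n_habituation_trials: number of habituation trials the infant completed.
    Assumption: completing exactly MAX_TRIALS trials always leads to exclusion.
    """
    attentive = posttest_s >= POSTTEST_RATIO * pretest_s
    habituated = MIN_TRIALS <= n_habituation_trials < MAX_TRIALS
    return attentive and habituated


# Hypothetical infants: (pretest s, posttest s, habituation trials completed).
for infant in [(10.0, 9.0, 9), (10.0, 7.0, 9), (10.0, 9.0, 5), (10.0, 9.0, 18)]:
    print(infant, include_infant(*infant))
```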
In the test phase, infants completed two same-voice and two different-voice trials. In the same-voice trials, the same talker from the exposure phase was presented. In the different-voice trials, a new (unfamiliar) talker was presented. Across infants, we counterbalanced the order of the presentation of the two types of test trials, which talker was heard during the exposure phase and which one was heard during the test phase, and which sentences were heard during exposure and test.
3. Results
We first analyzed infants' looking behavior during the exposure phase to ensure that infants in the current study were as attentive to the male voices as the infants in Fecher and Johnson (2019) were to the female voices. To do this, we compared the number of exposure trials completed prior to the initiation of the test phase. There was no difference between the number of exposure trials infants completed in the current study with male voices (M = 10.3; SD = 3.2) and in Fecher and Johnson with female voices (M = 9.3; SD = 2.6), t(90.76) = 1.54, p = 0.126. To get a more fine-grained measure of attention to male vs female voices, we also compared infants' average total looking time during the exposure phase in the current study (M = 130.0 s; SD = 6.9 s) to that in Fecher and Johnson (M = 115.9 s; SD = 6.8 s), and once again no difference was found, t(86.24) = 1.58, p = 0.118. Thus, despite past work suggesting infants are more interested in female voices than male voices in some contexts (Brazelton, 1978), we observed no evidence that the infants were any less attentive to male voices than the infants tested on female voices in Fecher and Johnson.
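The fractional degrees of freedom reported above are characteristic of an unequal-variance (Welch) t-test, although the text does not name the test, so treating it as such is an inference. As a minimal sketch, the statistic and its Welch-Satterthwaite degrees of freedom can be computed from summary statistics as follows. The function name `welch_t` is hypothetical, and the group sizes of 48 are taken from the participant descriptions.

```python
from math import sqrt


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Number of exposure trials: male voices (current study) vs female voices.
t, df = welch_t(10.3, 3.2, 48, 9.3, 2.6, 48)
print(f"t({df:.2f}) = {t:.2f}")
```

Because the published means and standard deviations are rounded, this calculation can only approximate the reported statistics, which were computed from the raw data.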
Next, we turned to our primary measure of interest: the difference in infants' looking time to the same- and different-voice test trials. Note that, in Fecher and Johnson, infants' ability to tell apart female talkers was indicated by longer looking time to same-voice trials than different-voice trials. Thus, if infants are equally adept at telling apart unfamiliar male voices, we expected to also observe longer looking times to same-voice trials than different-voice trials. However, if infants find male voices more difficult to distinguish than female voices, then we predicted no difference in looking time and a statistically significant interaction between trial type and talker gender. Because we observed no effect of talker pair on looking times during the test trials for the male voices (MPair1 = 1.00 s; SD = 3.0; MPair2 = 0.66 s; SD = 4.8), t(38.4) = 0.30, p = 0.769, or for the female voices (MPair1 = 2.06 s; SD = 3.6; MPair2 = 1.61 s; SD = 3.7), t(46.0) = −0.42, p = 0.674, we collapsed across the talker pairs in our main analysis. A linear mixed-effects regression (LMER) model was fitted using the lme4 package (Bates et al., 2015), and p-values were computed using the lmerTest package (Kuznetsova et al., 2017) in R, with mean looking time as the dependent variable and contrast-coded fixed effects for trial type (same, different), talker gender (female, male), and their interaction. The model also included random intercepts for participant, talker, and item. Model comparisons were performed to determine whether the inclusion of each fixed factor and the interaction made a significant contribution to the model. No significant main effects of trial type (p = 0.13) or talker gender (p = 0.17) were observed. However, there was a significant interaction between trial type and talker gender, β = −2.67, SE = 0.78, χ² (1, N = 96) = 11.4, p < 0.001.
To further investigate this interaction, we constructed separate LMER models for the two talker conditions, with each model including a fixed effect for trial type and random intercepts for participant, talker, and item. In the female talker condition in Fecher and Johnson, infants looked significantly longer during the same-voice (M = 8.02 s; SD = 4.4) than the different-voice (M = 6.18 s; SD = 3.1) trials, β = −1.84, SE = 0.53, χ² (1, N = 48) = 11.1, p < 0.01. In our male talker condition, however, infants' looking times during the same-voice (M = 6.78 s; SD = 3.4) and different-voice (M = 7.61 s; SD = 4.5) trials did not differ (p = 0.15), indicating that infants did not notice the talker change with male voices. To summarize, irrespective of the talker pair, infants detected a talker change with female voices, but not with male voices (see Fig. 2).
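To make the interaction concrete, the sketch below works through the reported cell means. It rests on an assumption: the text says the fixed effects were contrast coded but does not give the coding values, so the sketch assumes ±0.5 coding (same = −0.5, different = +0.5; male = −0.5, female = +0.5). In a balanced design with that coding, the trial-type coefficient within a condition is the different-minus-same difference, and the interaction coefficient is the female difference minus the male difference. The names used are hypothetical, and this is simple arithmetic on cell means, not a refit of the mixed model.

```python
# Mean looking times (s) per cell, as reported in the text.
cell_means = {
    ("female", "same"): 8.02,
    ("female", "different"): 6.18,
    ("male", "same"): 6.78,
    ("male", "different"): 7.61,
}


def trial_type_effect(gender):
    """Different-voice minus same-voice mean looking time for one condition."""
    return cell_means[(gender, "different")] - cell_means[(gender, "same")]


def interaction_effect():
    """Difference of differences: female trial-type effect minus male."""
    return trial_type_effect("female") - trial_type_effect("male")


print(round(trial_type_effect("female"), 2))
print(round(trial_type_effect("male"), 2))
print(round(interaction_effect(), 2))
```

Under the assumed coding, the female difference (−1.84) and the difference of differences (−2.67) agree with the β values reported for the female-condition model and for the interaction, respectively.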
Fig. 2. Looking time difference between same-voice and different-voice trials for male voices (current study) and female voices (Fecher and Johnson, 2019). This measure differed significantly from chance (0) for female, but not male, voices. Error bars, SE. **, p < 0.01.
4. Discussion
Voice recognition is a slowly emerging skill that is crucially shaped by interactions with others in the social environment. Nonetheless, the developmental studies examining voice recognition are far outnumbered by those examining other perceptual abilities, such as face recognition and depth perception. This is surprising given that the human auditory system matures earlier in development than the visual system [Birnholz and Benacerraf, 1983; see, e.g., Saffran (2006) for review] and that voice is likely an important cue young infants rely on to identify people in their environment. Indeed, only a handful of studies investigate infants' ability to distinguish unfamiliar voices, and to date, none have provided convincing data that infants can tell apart unfamiliar male voices.
Previously, Fecher and Johnson (2019) demonstrated that 4.5-month-olds are surprisingly adept at distinguishing the voices of unfamiliar females. Here, we used the same methodology as Fecher and Johnson to investigate whether this ability extended to male voices. Surprisingly, we found no evidence that 4.5-month-olds can tell apart two unfamiliar male voices. This finding is striking given that infants drawn from the same population and tested by the same experimenters in the same lab readily distinguished between unfamiliar female voices (see Fecher and Johnson, 2019). Why would infants distinguish female talkers more readily than male talkers?
Perhaps the most straightforward explanation for infants' superior performance with female talkers is motivational. Past studies demonstrate that infants preferentially attend to things they are well-acquainted with [e.g., same-race faces (Kelly et al., 2007); familiar languages and accents (Kinzler et al., 2007; Kinzler et al., 2009)]. Collectively, the infants in the current study had female primary caregivers. It is therefore possible that we failed to observe successful discrimination with male voices simply because the infants were not interested in listening to the male talkers. However, this explanation is unlikely given that the 4.5-month-olds we tested demonstrated equivalent looking times while listening to male and female talkers during the exposure phase.
Given infants' equal attentiveness during the exposure phase to male talkers in the current study and to female talkers in Fecher and Johnson (2019), we propose two other possible explanations for infants' superior performance with female voices. The first is that the ability to distinguish unfamiliar talkers, whether male or female, is not yet robust at 4.5 months. That is, perhaps infants' success with female voices in Fecher and Johnson does not reliably replicate with other voices. Once again, we found no support for this explanation, and indeed, infants performed equally well with the two pairs of female voices in Fecher and Johnson and equally poorly with the two pairs of male voices in the current study. The second and more theoretically interesting explanation is that infants, unlike adults, are more sensitive to acoustic and linguistic differences between female talkers than male talkers. Perhaps this increased sensitivity to female talkers is evolutionary, as females have traditionally cared for young infants. Or perhaps this sensitivity can be explained by perceptual learning. Indeed, there is substantial evidence that perceptual narrowing, i.e., the improvement in infants' ability to make perceptual discriminations between routinely encountered stimuli, occurs early in development across a variety of domains (e.g., languages, musical systems, and facial features; see Lewkowicz and Ghazanfar, 2009, for review). In fact, research indicates that modern-day North American infants receive far more speech input from female speakers than male speakers (Bergelson et al., 2019). Thus, as the vast majority of the infants in the current study had a female primary caregiver, this routine exposure to a female voice may have improved infants' ability to distinguish between female voices relative to the less-encountered male voices.
However, we readily admit that at this point, this suggestion is mere speculation, and to distinguish between the above-mentioned possibilities, more data with different types of voices and infants raised in different types of households would need to be collected.
In summary, the current study probed the limits of infants' early voice discrimination abilities. Specifically, we asked whether 4.5-month-old infants can readily distinguish between unfamiliar male voices just as well as they are reported to distinguish between unfamiliar female voices. The infants in our sample showed no evidence of distinguishing male voices, despite demonstrating equivalent attentiveness to male and female voices. This finding suggests that early voice discrimination abilities may not generalize to all types of voices, but rather are shaped by early experience. This possibility is supported by evidence that 4.5-month-olds only distinguish unfamiliar females when they speak a familiar language (Fecher and Johnson, 2019). Future work will require systematic investigations of how auditory experience shapes early talker identification abilities. Better understanding the development of human voice recognition will help us understand speech processing more generally.
Acknowledgments
This research was supported by grants awarded to E.K.J. from the Social Sciences and Humanities Research Council of Canada and the Natural Sciences and Engineering Research Council of Canada.
Author declarations
Conflict of Interest
The authors have no conflicts of interest to report.
Ethics Approval
This study was approved by the University of Toronto Research Ethics Board. Informed consent was obtained for all participants through their caregivers.
Data Availability
The data that support the findings of this study are openly available in the Open Science Framework, at http://doi.org/10.17605/OSF.IO/3NSUD.
¹ Our data were directly compared to a separate set of 48 4–5-month-old infants (Mage = 136 days, range = 122–151; 27 female) tested by Fecher and Johnson (2019), who were drawn from the same population.