Past research on speech perception has shown that speaker information, such as gender, affects phoneme categorization. Additionally, studies on listening under divided attention have argued that cognitive load decreases attention to phonetic detail and increases reliance on higher-level cues such as lexical information. This study examines the processing of speaker information under divided attention. The results of two perception experiments indicate that additional cognitive load does not increase listeners' reliance on the gender of the speaker during phoneme categorization tasks. This suggests that the processing of speaker information may pattern with lower-level acoustic rather than higher-level lexical information.

Acoustically, every production of any given word is unique. Countless factors, such as the linguistic context and differences in speakers' vocal tract length or voice quality, contribute to this variability (Ohala, 1983; Foulkes and Docherty, 2006; Kleinschmidt and Jaeger, 2015). Speaker normalization refers to listeners' ability to compensate for these differences and accurately recognize speech despite the vast diversity of input they receive (Johnson, 1997; Johnson and Sjerps, 2021). Information inferred by the listener about the speaker's identity, such as gender, is known to play a role in this process (Strand, 1999; Foulkes, 2010).

The effect of speaker gender on phoneme categorization is well established. In a seminal study, Strand and Johnson (1996) found that the gender of a speaker affected the classification of sounds along a 9-step /s/-/ʃ/ continuum, which they spliced onto one male and one female voice, producing the minimal pair sod-shod. In a categorization task, participants gave more /s/ responses when the voice was male and more /ʃ/ responses when the voice was female. Listeners appeared to expect fricatives produced by male speakers to have lower centers of gravity than fricatives produced by female speakers. It was concluded that these (unconscious) expectations are employed in speaker normalization during speech perception. The role of expectations was replicated in a study using a /ʊ/-/ʌ/ continuum (Johnson et al., 1999) in which only one gender-ambiguous voice was used and participants were merely told that the speaker was either male or female. This manipulation alone sufficed to produce a speaker effect, which suggests that spectral contrast (Holt, 2006; Watkins and Makin, 1994) between the male and female voice and the manipulated sound is not solely responsible for the difference in categorization. In a further experiment using an /s/-/θ/ continuum, participants were more likely to categorize the stimuli as /s/ when paired with a female voice or face (Munson, 2011). Again, these results could not have been predicted by acoustic characteristics of male and female speech alone, as, unlike /s/ or /ʃ/, men and women's /θ/ productions do not significantly differ in their spectral centers of gravity.

These findings indicate that expectations about how men and women produce various phonemes are involved in speaker normalization and that acoustic differences alone cannot explain the phenomenon. As with other socially constructed norms, differing expectations for men's and women's fricatives may be largely learned.1 In addition to research on gender, other studies [e.g., Drager (2011) and Hay et al. (2006)] have shown that further aspects of speaker identity, such as age and social class, play a similar role in speaker normalization. However, it is not known at what stage of speech processing speaker information is employed. The speech signal conveys many kinds of information, such as “low-level” phonetic and “higher-level” (e.g., lexical) information. Within this hierarchy and weighing of information used for speech processing, speaker information, such as gender, might be presumed to fall between low- and high-level information: it is coded in the acoustic signal rather than in the words, yet speaker identity refers to higher-level indexical knowledge. Generally, the processing of lower-level information is thought to begin before higher-level information is considered (Sjerps and Reinisch, 2015). This work seeks to understand how and when, relative to other kinds of information, speaker information is processed.

Conducting experiments on speech perception under divided attention is one method of addressing questions about levels and order of processing. Cognitive load (CL) in the form of concurrent visual search tasks has been shown to increase the magnitude of lexical bias (the Ganong effect, i.e., the tendency for listeners to perceive ambiguous sounds as forming real words rather than non-words), presumably by decreasing attention to fine phonetic detail and in turn increasing reliance on lexical information (Mattys and Wiget, 2011; Mattys and Scharenborg, 2014). Similar suggestions regarding decreased attention to phonetic detail were made in a study on the effect of CL on the phoneme restoration effect (Mattys et al., 2014). Thus, additional CL during speech perception is suggested to cause listeners to up-weigh higher-level information in order to compensate for decreased efficiency in the processing of acoustic detail.

The current study examines how the processing of speaker information is affected by CL. Since speaker gender may be seen as indexical information, we hypothesized that dual-tasking will increase listeners' reliance on speaker information, in the same way that CL magnifies listeners' lexical bias. In order to test this hypothesis, two experiments with a phonetic categorization task testing a male against a female voice and visual search task were conducted. In experiment 1, participants completed the categorization task under a high and a low CL condition. In experiment 2, the low CL condition was replaced by a no load (focused attention) condition.

Participants. Native speakers of Austrian German (N = 38, 25 female) were recruited via Prolific (2022) in May (N = 20) and November (N = 18) 2021. Participants were aged 19–40 (mean = 26) and reported normal hearing as well as normal or corrected to normal vision. Participants all gave informed consent and were paid according to Prolific's compensation scheme.

Design and materials. Auditory stimuli: Following the design of Strand and Johnson (1996), the auditory stimuli consisted of minimal word pairs with a word-initial /s/-/ʃ/ contrast produced by male and female speakers. Initially, four minimal pairs in Austrian German were selected; note that in Austrian German, unlike the German spoken in Germany, word-initial /s/ is produced as a voiceless alveolar sibilant [s]. Eight native speakers of Austrian German (four male and four female) were recorded producing the minimal pairs in a sound-conditioned room.

Based on clarity of their speech, the recordings of four speakers (two male and two female) were selected for further manipulation. The center of gravity (CoG) of all fricatives as well as the f0 of these speakers were measured. The average centers of gravity for the male speakers were 7173 Hz and 3849 Hz, for /s/ and /ʃ/, respectively; for the female speakers they were 7646 and 3715 Hz. Using a script in praat (Boersma and Weenink, 2021), selected fricatives were interpolated into a 15-step continuum. The continuum was then spliced onto the word-ends of the /s/-/ʃ/ minimal pairs produced by both male and female speakers, so that the fricatives were in word-initial position. Multiple continua were created, and a final continuum was selected that sounded natural when combined with at least one male and one female voice and multiple words. The centers of gravity of the continuum end points were 7758 Hz (/s/) and 3636 Hz (/ʃ/). The RMS amplitude of all words was normalized, and words were rate normalized so that, per minimal pair, the words produced by all speakers were approximately the same length.
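The spacing of the continuum can be illustrated with a minimal sketch that linearly interpolates between the end-point centers of gravity reported above. This is only an illustration: the actual stimuli were created with a Praat script operating on the full fricative spectra, not on CoG summary values.

```python
# Toy sketch: equally spaced center-of-gravity targets for a 15-step
# /s/-/sh/ continuum between the end points reported in the text.
# Illustration only; the real manipulation interpolated spectra in Praat.

def cog_continuum(cog_s=7758.0, cog_sh=3636.0, n_steps=15):
    """Return n_steps equally spaced CoG targets from /s/ to /sh/ (Hz)."""
    step = (cog_sh - cog_s) / (n_steps - 1)
    return [cog_s + i * step for i in range(n_steps)]

targets = cog_continuum()
# targets[0] is the /s/ end point (7758 Hz), targets[-1] the /sh/
# end point (3636 Hz); the midpoint step sits at 5697 Hz.
```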

All continua were piloted without a cognitive load task. Two speakers (one male and one female) and two minimal pairs were then chosen for use in further experiments. We selected stimuli that elicited a sizeable speaker effect (such that more /ʃ/-responses were given for the female speaker), similar to that found by Strand and Johnson (1996). By ensuring that the stimuli produced a speaker effect in listeners, the main experiments could examine how the magnitude of this effect varied under cognitive load. The minimal pairs were “sein” – “Schein,” to be – appearance, and “senken” – “schenken,” to sink – to bestow/to gift. From the 15 interpolated continuum steps, a subset of nine was selected, consisting of the two end points and seven intermediary steps.

Visual stimuli: The visual search task was modeled after the dual-task design used in Derawi, Reinisch and Gabay (2021). The visual stimuli were arrays of black diamonds and upside-down red triangles arranged randomly. In the low CL condition, the arrays consisted of 3 rows and 3 columns, and in the high CL condition they consisted of 8 rows and 8 columns (see Fig. 1). In half of the trials, one of the symbols was replaced by a red diamond, henceforth referred to as the oddball. Participants' task was to identify whether the oddball was present or not. The cognitive load tasks were piloted to ensure that participants performed above chance on the high load task but were still less accurate than on the low load task.

Fig. 1.

Examples of arrays used for the visual search task. On the left is an example of the 3 × 3 array used for the low load condition and on the right is an example of the 8 × 8 array used for the high load condition. Both examples contain an oddball (the red diamond).


Procedure. The experiment was created using PSYCHOPY3 (Peirce et al., 2019) and conducted online via Pavlovia (2022). One trial consisted of one audio stimulus and one visual search task presented simultaneously. The visual stimuli were displayed 200 ms before the audio started, for a total of 700 ms. The audio stimuli were 570 or 650 ms long, depending on the minimal pair. Participants first responded to the phoneme categorization task, then to the visual search task. The two CL conditions were blocked, and block order was counterbalanced across participants. All participants completed both blocks with trial order randomized for each participant. Each block consisted of 192 trials: from the nine continuum steps, the two end points were presented three times per speaker and minimal pair and the seven intermediary steps were presented six times [i.e., 2 speakers × 2 minimal pairs × (2 end points × 3 repetitions + 7 intermediary steps × 6 repetitions)].
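The per-block trial count follows directly from the design; a quick check of the arithmetic (using only the numbers given above):

```python
# Trials per block: 2 speakers x 2 minimal pairs x
# (2 end points x 3 repetitions + 7 intermediary steps x 6 repetitions)
speakers, pairs = 2, 2
endpoint_trials = 2 * 3        # end points x repetitions
intermediate_trials = 7 * 6    # intermediary steps x repetitions
trials_per_block = speakers * pairs * (endpoint_trials + intermediate_trials)
# trials_per_block == 192
```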

Participants were instructed to listen to the audio stimulus and simultaneously search for the oddball. If unsure how to categorize a word or whether an oddball was present, they were told to answer according to their first impression or guess. To ensure that the procedure was clear, five practice trials were completed before the experiment. The practice trials used visual search tasks from the high CL condition (8 × 8 arrays) and audio from both speakers and minimal pairs, but only continuum end points. No explicit feedback was given on either the audio or the visual task. In order to reduce fatigue and the chance of non-compliance, participants could take a break after every 50 trials. There was also a break between the two CL blocks. The experiment took approximately 25 min to complete.

Phonetic categorization task. Categorization data, calculated as the proportion of /ʃ/ responses, is presented in Fig. 2. For statistical analysis, we fit a generalized linear mixed-effects model to our data using the lme4 package (Bates et al., 2015) in r (version 4.1.0), the results of which are shown in Table 1. Fixed effects were Continuum Step (centered on zero, with end points excluded from the analysis; higher values mean more /ʃ/-like), Cognitive Load (high load coded as 0.5, low load coded as −0.5), and Speaker Gender (female coded as 0.5, male as −0.5), as well as all interactions. Participant was included as a random factor with random slopes for Continuum Step and Speaker Gender over participants. Note that given the coding of the effects, the results can be interpreted as main effects, and the direction of each effect is indicated by the (positive vs negative) sign of the regression weight (b).
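The ±0.5 effect coding described above can be made concrete with a small sketch (illustrative Python; the analysis itself was run in R with lme4, and the step labels 1–7 below are placeholders for the seven intermediary steps):

```python
# Sum-to-zero (+/-0.5) effect coding as described for the model.
# With this coding, each regression weight can be read as a main
# effect evaluated at the average level of the other predictors.

LOAD = {"high": 0.5, "low": -0.5}
GENDER = {"female": 0.5, "male": -0.5}

def center_steps(steps):
    """Center continuum steps on zero (end points were excluded)."""
    mean = sum(steps) / len(steps)
    return [s - mean for s in steps]

# The seven intermediary steps entering the analysis, centered:
centered = center_steps(list(range(1, 8)))
# centered == [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

Because each contrast spans exactly one unit (+0.5 vs −0.5), a regression weight equals the predicted log-odds difference between the two levels of that factor.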

Fig. 2.

Categorization data (as a proportion of /ʃ/ responses) over continuum steps (labeled according to the original continuum) for experiment 1. Results are presented according to speaker gender (female speaker in black, male speaker in gray), split by cognitive load condition (solid lines and dark dots represent the low CL condition, dashed lines and light dots the high CL condition). Dots show the raw data, lines the fitted data.

Table 1.

Results of the mixed-effects model for experiment 1 (high vs low CL). The model structure was: glmer(phoneme classification ∼ continuum step * load type * speaker gender + (1 + continuum step + speaker gender | participant)).

                                                   b      SE       z       p
(Intercept)                                   −0.382   0.211  −1.810   0.070
Continuum Step                                 1.056   0.061  17.373  <0.001
Load Type                                      0.094   0.052   1.810   0.070
Speaker Gender                                 1.317   0.108  12.195  <0.001
Continuum Step: Load Type                      0.073   0.030   2.445   0.015
Continuum Step: Speaker Gender                 0.076   0.035   2.186   0.029
Load Type: Speaker Gender                      0.105   0.103   1.019   0.308
Continuum Step: Load Type: Speaker Gender      0.070   0.060   1.173   0.241

Results in Table 1 show significant main effects for Continuum Step and Speaker Gender. The main effect of Continuum Step confirms that listeners gave more /ʃ/ responses the higher the continuum step (i.e., the more /ʃ/-like the stimulus). The main effect of Speaker Gender shows that more /ʃ/ responses were given for the stimuli produced by the female speaker. The two-way interactions Continuum Step × Load type and Continuum Step × Speaker Gender just reached significance and show that the effect of Continuum Step is smaller, meaning the categorization function is shallower, in the low-load than in the high-load condition and for the male than the female speaker. The interaction Load type × Speaker Gender as well as the three-way interaction were not significant. Thus, in both CL conditions, the gender of the speaker affected categorization, such that stimuli elicited more /ʃ/ responses when the word was produced by the female speaker. However, this effect did not differ between the high and low CL conditions.
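To convey the size of the Speaker Gender effect, the log-odds estimates in Table 1 can be converted into predicted /ʃ/ proportions at the continuum midpoint. This is a back-of-the-envelope illustration using only the fixed-effect intercept and the Speaker Gender weight, ignoring all other terms and the random effects:

```python
import math

def inv_logit(x):
    """Logistic function: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

intercept, b_gender = -0.382, 1.317  # fixed-effect estimates, Table 1
p_female = inv_logit(intercept + 0.5 * b_gender)  # gender coded +0.5
p_male = inv_logit(intercept - 0.5 * b_gender)    # gender coded -0.5
# roughly 0.57 vs 0.26 predicted /sh/ responses at the midpoint
```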

Visual search task. In line with values observed by Bosker et al. (2017) and Derawi et al. (2021), we found that accuracy was lower in the high load condition (67%, SD = 7%) than in the low load condition (95%, SD = 4.5%). Participants performed above chance in the high load condition and near ceiling in the low load condition, indicating that the high load task was more difficult than the low load task, but not so difficult as to lead to non-compliance.

Like experiment 1, this experiment was conducted to test the effect of cognitive load on speaker normalization in a phonetic categorization task. While experiment 1 compared a low CL and a high CL condition, this experiment used the high CL condition (8 × 8 visual search task) and a no load (focused attention) condition without a concurrent visual search task. Although the difference in mean accuracy of responses to the visual search task in experiment 1 indicates that the smaller search task was indeed less difficult than the larger task (95% vs 67% correct), it is possible that the presence of any concurrent visual search task demanded enough working memory to affect phoneme categorization. In a series of experiments testing phoneme categorization and discrimination tasks, Mattys and Scharenborg (2014) found no differences in performance between low and high CL conditions, while they did observe significant differences between high and no CL conditions. These results led them to speculate that the presence of any concurrent task, and not its difficulty, may be decisive in eliciting an effect of CL. Thus, to isolate the effect of divided attention more thoroughly, this experiment compares load and no-load conditions.

Participants. Native speakers of Austrian German (N = 40, 27 female) were recruited via Prolific (2022) in May (N = 20) and November (N = 20) 2021. Users who participated in experiment 1 were excluded from the participant pool. Participants were aged 18–37 (mean = 25) and reported normal hearing and normal or corrected to normal vision. All participants gave informed consent and were compensated for their time according to the Prolific pay scheme.

Design, materials, and procedure. The design, materials, and procedure of experiment 2 were identical to experiment 1, except that the low cognitive load condition was replaced with a no-load condition without a visual search task. That is, participants heard the audio stimuli while viewing a blank screen, followed immediately by the phoneme categorization screen. The auditory stimuli and the high load condition were the same as in experiment 1. The experiment took approximately 20 min to complete.

Phonetic categorization task. Categorization data, calculated as the proportion of /ʃ/ responses, is presented in Fig. 3. We fitted the same generalized mixed-effects model as in experiment 1 to the data obtained in this experiment, the results of which are presented in Table 2. Again, fixed effects were Continuum Step (centered on zero and, again, with end points excluded from the analysis), Cognitive Load (high load coded as 0.5, no load coded as −0.5), Speaker Gender (female coded as 0.5, male as −0.5), and all interactions. Participant was included as a random effect with random slopes for Continuum Step and Speaker Gender over participant.

Fig. 3.

Categorization data (as a proportion of /ʃ/ responses) over continuum steps (labeled according to the original continuum) for experiment 2. Results are presented according to speaker gender (female speaker in black, male speaker in gray), split by cognitive load condition (solid lines and dark dots represent the no load condition, dashed lines and light dots the high CL condition). Dots show the raw data, lines the fitted data.

Table 2.

Results of the mixed-effects model in experiment 2 (high load vs no-load). The model structure was: glmer(phoneme classification ∼ continuum step * load type * speaker gender + (1 + continuum step + speaker gender | participant)).

                                                   b      SE       z       p
(Intercept)                                   −0.117   0.157  −0.749   0.454
Continuum step                                 1.081   0.068  15.831  <0.001
Load type                                     −0.285   0.049  −5.775  <0.001
Speaker gender                                 1.197   0.108  11.064  <0.001
Continuum step: Load type                     −0.101   0.030  −3.393  <0.001
Continuum step: Speaker gender                 0.120   0.033   3.678  <0.001
Load type: Speaker gender                     −0.102   0.099  −1.030   0.303
Continuum step: Load type: Speaker gender     −0.081   0.060  −1.358   0.174

Similar to experiment 1, results show significant main effects for Continuum Step and Speaker Gender. The main effect of Continuum Step again confirms that listeners gave more /ʃ/ responses the higher the continuum step (i.e., the more /ʃ/-like the fricative), and the main effect of Speaker Gender shows that more /ʃ/ responses were given for the stimuli produced by the female speaker. The main effect of Load type, which is negative with high load coded as 0.5, shows that fewer /ʃ/ responses were given in the high load than in the no load condition. The significant interactions Continuum Step × Load type and Continuum Step × Speaker Gender show that the effect of Continuum Step is smaller, that is, the categorization function is shallower, in the high load than in the no load condition and for the male than the female speaker. However, the interaction Load type × Speaker Gender as well as the three-way interaction Continuum Step × Load type × Speaker Gender were not significant. Thus, there was no significant effect of Load type on how listeners employed Speaker Gender during the phonetic categorization task.

Visual search task. Accuracy in the cognitive load task was very similar to the value observed for the high load condition in experiment 1, with an average accuracy of 68% (SD = 10%).

Experiment 2 was designed to further explore the null effect for the interaction Speaker Gender × Load type found in experiment 1, now with a stronger manipulation comparing the high load with a no load condition. As experiment 2 replicated the null effect for this interaction, a power analysis was conducted in order to determine the statistical power of this result in experiment 2. The expected effect size was based on data from Mattys and Wiget (2011) (experiment 1) that showed an increase in the Ganong effect by one third under cognitive load compared to a no-load condition (i.e., from 25% to 33%). To estimate power for our generalized linear mixed-effects model we used data simulation as presented in DeBruine and Barr (2021) using 1000 runs. Values for fixed and random effects were taken from experiment 2 with the exception of the critical Speaker Gender × Load type interaction that was set to one third of the speaker effect. The results showed that with 40 participants we have a power of 84% to detect an effect of CL modulating the effect of Speaker Gender, provided the effect was similar to the impact of CL on the Ganong effect in Mattys and Wiget. Details of this power analysis as well as all analyses reported above can be found on OSF (2022).
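The logic of simulation-based power analysis can be sketched in simplified form. The following toy Monte Carlo is not the DeBruine and Barr glmer procedure used in the study: it assumes binomial noise only and omits random-effect variance, so it will overestimate power. The fixed-effect values come from Table 2 and the interaction is set to one third of the speaker effect, as in the text; the per-cell trial count of 84 (2 pairs × 7 steps × 6 repetitions) is an assumption about how trials pool per gender × load cell.

```python
import math
import random
import statistics

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def emp_logit(k, n):
    """Empirical logit with a continuity correction."""
    return math.log((k + 0.5) / (n - k + 0.5))

def simulate_power(n_runs=200, n_part=40, n_trials=84,
                   intercept=-0.117, b_gender=1.197, seed=1):
    """Toy power estimate for a Gender x Load interaction set to one
    third of the speaker effect. Simplified: binomial noise only,
    no random-effect variance, so the estimate is optimistic."""
    rng = random.Random(seed)
    b_int = b_gender / 3.0
    hits = 0
    for _ in range(n_runs):
        per_part = []
        for _p in range(n_part):
            est = {}
            for g in (0.5, -0.5):        # female / male
                for l in (0.5, -0.5):    # high load / no load
                    p = inv_logit(intercept + g * b_gender + g * l * b_int)
                    k = sum(rng.random() < p for _ in range(n_trials))
                    est[(g, l)] = emp_logit(k, n_trials)
            # gender effect under high load minus under no load
            inter = ((est[(0.5, 0.5)] - est[(-0.5, 0.5)])
                     - (est[(0.5, -0.5)] - est[(-0.5, -0.5)]))
            per_part.append(inter)
        t = statistics.mean(per_part) / (statistics.stdev(per_part)
                                         / math.sqrt(n_part))
        if abs(t) > 2.023:               # two-tailed t, df = 39
            hits += 1
    return hits / n_runs
```

Each run simulates 40 participants' responses, recovers the interaction as a difference of gender effects between load conditions, and tests it across participants; power is the proportion of runs in which the test is significant.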

This study examined the effect of cognitive load (CL) on the weighing of speaker information in a phonetic categorization task. We asked if dual-tasking in the form of a concurrent visual search task would cause listeners to up-weigh speaker information, similar to the effect that has been shown on lexical bias in previous research. To test this, participants categorized words containing a word-initial fricative sampled from a single /s/-/ʃ/ continuum spliced onto one male and one female voice. In experiment 1, we compared low and high CL conditions, and in experiment 2 we compared a no load with the high CL condition. Across experiments and conditions, we replicated the effect of speaker gender on fricative perception (Strand and Johnson, 1996; Munson et al., 2006; Munson, 2011), such that more /s/-responses were given when the speaker was male. In experiment 2, we found that high CL yielded a less steep categorization function than the no load condition, which is consistent with the notion that CL decreases listeners' sensitivity to fine acoustic detail (Mattys et al., 2014). However, we found no effect of CL on the magnitude of the speaker effect. Listeners' reliance on speaker information relative to acoustic detail neither increased nor decreased under high CL as opposed to low CL or focused attention.

This has implications for our reasoning that speaker gender information is indexical, hence potentially higher-level information, but at the same time is coded in the acoustic signal rather than in the lexicon. In past studies (Mattys and Wiget, 2011; Mattys and Scharenborg, 2014) it has been argued that CL in the form of a concurrent visual search task increased lexical bias as a result of CL diminishing attention to phonetic detail. However, studies focusing on effects of CL on acoustic/phonetic processing during perception did not find an influence of CL (i.e., neither up- nor down-weighing of acoustic information under CL). Mattys and Scharenborg (2014), for instance, examined the effect of CL on phoneme discrimination tasks. In these tasks, the audio stimuli were pairs of syllables and participants' task was to determine if the first and second syllable were the same. The results showed that among younger adult listeners (mean age = 20.6) performance in phoneme discrimination was not affected by CL. Similarly, Bosker et al. (2017) found no effect of CL on temporal or spectral context effects, and concluded that the early encoding of acoustic information is not affected by CL. Therefore, in the studies testing the magnitude of the lexical bias, the primary effect of CL may not be a decrease in attention to acoustic detail, as often argued, but instead the up-weighing of lexical information via another mechanism.

The results of the present experiments pattern with studies focusing on the processing of acoustic information where no influence of CL was found. However, they also demonstrate that speaker information is not up-weighed in the same way as lexical information. Yet experiments demonstrating a speaker effect using only visual stimuli or explicit instruction to signal speaker gender (Strand and Johnson, 1996; Johnson et al., 1999; Munson, 2011) make it unlikely that the speaker effect in phonetic classification can solely be explained by acoustic properties and indicate that gendered expectations play a role.

If it is not a purely low-level, acoustic process, the resistance of speaker normalization to the effect of CL must be explained within another framework. Derawi et al. (2021) argue that effortful processes will be more affected by additional CL, while processes that are highly efficient and therefore less effortful should not be disrupted by a concurrent task. In line with this reasoning, our data suggest that both the processing of acoustic detail and the normalization of speaker gender are completed so efficiently by typical listeners as not to be affected by CL.

Participants were paid through a grant from the German Research Foundation (DFG) (Grant No. RE 3047/1-1) to E.R. Open access funding provided by University of Vienna.

1 For a thorough discussion of the relevance of gender theory for speech perception research, we direct readers to Tripp and Munson (2021).

1. Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). “Fitting linear mixed-effects models using lme4,” J. Stat. Softw. 67(1), 1–48.
2. Boersma, P., and Weenink, D. (2021). “Praat: Doing phonetics by computer (version 6.1.50) [computer program],” http://www.praat.org.
3. Bosker, H. R., Reinisch, E., and Sjerps, M. J. (2017). “Cognitive load makes speech sound fast, but does not modulate acoustic context effects,” J. Mem. Lang. 94, 166–176.
4. DeBruine, L. M., and Barr, D. J. (2021). “Understanding mixed-effects models through data simulation,” Adv. Meth. Practices Psychol. Sci. 4(1), 251524592096511–251524592096515.
5. Derawi, H., Reinisch, E., and Gabay, Y. (2021). “Increased reliance on top-down information to compensate for reduced bottom-up use of acoustic cues in dyslexia,” Psychon. Bull. Rev. 29(1), 281–292.
6. Drager, K. (2011). “Speaker age and vowel perception,” Lang. Speech 54(1), 99–121.
7. Foulkes, P. (2010). “Exploring social-indexical knowledge: A long past but a short history,” Lab. Phonol. 1(1), 5–39.
8. Foulkes, P., and Docherty, G. (2006). “The social life of phonetics and phonology,” J. Phon. 34(4), 409–438.
9. Hay, J., Warren, P., and Drager, K. (2006). “Factors influencing speech perception in the context of a merger-in-progress,” J. Phon. 34(4), 458–484.
10. Holt, L. L. (2006). “The mean matters: Effects of statistically defined nonspeech spectral distributions on speech categorization,” J. Acoust. Soc. Am. 120, 2801–2817.
11. Johnson, K. (1997). “Speech perception without speaker normalization: An exemplar model,” in Talker Variability in Speech Processing, edited by K. Johnson and J. W. Mullennix (Academic Press, San Diego), pp. 145–165.
12. Johnson, K., and Sjerps, M. J. (2021). “Speaker normalization in speech perception,” in Handbook of Speech Perception (Blackwell, Hoboken, NJ), pp. 145–176.
13. Johnson, K., Strand, E. A., and D'Imperio, M. (1999). “Auditory–visual integration of talker gender in vowel perception,” J. Phon. 27(4), 359–384.
14. Kleinschmidt, D. F., and Jaeger, T. F. (2015). “Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel,” Psychol. Rev. 122(2), 148–203.
15. Mattys, S. L., Barden, K., and Samuel, A. G. (2014). “Extrinsic cognitive load impairs low-level speech perception,” Psychon. Bull. Rev. 21(3), 748–754.
16. Mattys, S. L., and Scharenborg, O. (2014). “Phoneme categorization and discrimination in younger and older adults: A comparative analysis of perceptual, lexical, and attentional factors,” Psychol. Aging 29(1), 150–162.
17. Mattys, S. L., and Wiget, L. (2011). “Effects of cognitive load on speech recognition,” J. Mem. Lang. 65(2), 145–160.
18. Munson, B. (2011). “The influence of actual and imputed talker gender on fricative perception, revisited (L),” J. Acoust. Soc. Am. 130(5), 2631–2634.
19. Munson, B., Jefferson, S. V., and McDonald, E. C. (2006). “The influence of perceived sexual orientation on fricative identification,” J. Acoust. Soc. Am. 119(4), 2427–2437.
20. Ohala, J. J. (1983). “The origin of sound patterns in vocal tract constraints,” in The Production of Speech (Springer, New York), pp. 189–216.
21. Open Science Framework (OSF) (2022). https://osf.io/edj3f/ (Last viewed 5/12/2022).
22. Pavlovia (2022). pavlovia.org (Last viewed 5/12/2022).
23. Peirce, J. W., Gray, J. R., Simpson, S., MacAskill, M. R., Höchenberger, R., Sogo, H., Kastman, E., and Lindeløv, J. (2019). “PsychoPy2: Experiments in behavior made easy,” Behav. Res. 51, 195–203.
24. Prolific (2022). prolific.co (Last viewed 5/12/2022).
25. Sjerps, M. J., and Reinisch, E. (2015). “Divide and conquer: How perceptual contrast sensitivity and perceptual learning cooperate in reducing input variation in speech perception,” J. Exp. Psychol. Hum. Percept. Perform. 41(3), 710–722.
26. Strand, E., and Johnson, K. (1996). “Gradient and visual speaker normalization in the perception of fricatives,” in Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference, pp. 14–26.
27. Strand, E. (1999). “Uncovering the role of gender stereotypes in speech perception,” J. Lang. Soc. Psychol. 18(1), 86–100.
28. Tripp, A., and Munson, B. (2021). “Perceiving gender while perceiving language: Integrating psycholinguistics and gender theory,” in Wiley Interdisciplinary Reviews: Cognitive Science (Wiley, New York), p. e1583.
29. Watkins, A. J., and Makin, S. J. (1994). “Perceptual compensation for speaker differences and for spectral-envelope distortion,” J. Acoust. Soc. Am. 96(3), 1263–1282.