This study investigates imitation of English /s/ to determine whether speakers converge toward normalized or raw acoustic targets. Participants exposed to increased spectral mean (SM) raised SM, converging toward both the raw acoustics of the model talker (who had high baseline SM) and the pattern of increased SM. However, after exposure to decreased SM, direction of shift depended on participant baseline. All participants converged to the raw acoustic values of the model talker, increasing or decreasing their own SM accordingly. These results suggest imitation is not necessarily mediated by perceptual normalization to different talkers, and raw acoustics can be the target of phonetic imitation. This has theoretical implications for the perception-production link and methodological implications for analysis of convergence studies.

Normalization is the perceptual process by which listeners identify phones across talkers with varying acoustics (Johnson and Sjerps, 2021). Phonetic imitation (also called convergence or accommodation) occurs when talkers alter their production toward speech they hear and can occur in lab settings without explicit instruction to imitate (e.g., Goldinger, 1998; Nielsen, 2011). While there is considerable evidence for imitative processes, it is not clear how imitation interacts with perceptual normalization. Do speakers converge toward a normalized pattern or the raw acoustic properties of a talker's voice when these targets compete? This study presents a test case where raw acoustics are the target of phonetic imitation. This has theoretical implications for the perception-production link and methodological implications for analysis of convergence studies.

Perceptual normalization is robust and automatic for many types of speech sounds [see Johnson and Sjerps (2021) for a review]. For example, perceived vowel category boundaries shift according to fundamental frequency (F0) and frequencies of higher vowel formants (e.g., Miller, 1953; Fujisaki and Kawashima, 1968; Gottfried and Chew, 1986). F0 perception is normalized within a speaker's pitch range, and listeners can determine the relative location of an F0 within that range even without prior experience with the speaker (Honorof and Whalen, 2005).

Perception of sibilant spectral mean (SM) is normalized according to phonetic information in surrounding vowels. On average, English /ʃ/ has lower SM than /s/, and men produce lower SM than women. These gender differences are bimodal (not binary) and related to physiological and social factors (Fuchs and Toda, 2010). However, gender does not fully account for SM variability, as there are still correlations between vocal tract size and SM within gender categories (Fuchs and Toda, 2010). Formants and SM have been shown to be correlated across talkers beyond gender differences (Toda, 2007), and sibilant category boundaries shift accordingly in perceptual tasks based on fundamental and formant frequencies in following vowels (e.g., Johnson, 1991; May, 1976).

Previous work has demonstrated that imitation generalizes across phonetic dimensions (Kwon, 2019), words, and segments (e.g., Nielsen, 2011). While such results suggest that abstract phonological categories structure imitation, they do not distinguish between convergence toward normalized patterns vs raw acoustics. This is because these targets do not typically compete in exposure stimuli. For example, imitation of increased voice onset time (VOT) could reflect convergence to the pattern of enhancement or the raw duration values.1 In addition, common metrics of convergence, such as difference in distance (DID) or linear combination (Priva and Sanker, 2019), cannot distinguish normalized vs raw acoustic imitation as these measures only indicate whether shifts are convergent to the raw acoustics of the model. Analysis of how participants shifted relative to their own baselines is also necessary to evaluate imitation of a normalized pattern because it is possible for a shift to be convergent to a pattern but divergent from raw acoustics (and vice versa).

The target of imitation bears on the nature of the perception-production link, specifically, whether imitation reflects perceptual normalization to different talkers. Raw acoustic and normalized pattern convergence could both be compatible with multiple existing accounts of imitation, including representational accounts in which imitation is a consequence of episodic memory (e.g., Goldinger, 1998) and gestural accounts in which imitation is driven by direct perception of articulatory gestures (e.g., Fowler, 1986). However, convergence toward within-speaker patterns would require that these mechanisms be mediated by perceptual normalization to different talkers.

Although there has been little direct investigation of competing raw and normalized targets in imitation, there is some evidence for convergence to normalized patterns. In Zellou et al. (2016), English nasalized vowels were manipulated to exhibit reduced nasality. Due to the high baseline nasality of the model talker, the stimuli exhibited higher nasality than that of most participants even when reduced. On average, participants decreased their own nasality after exposure, converging to the pattern of reduced nasality but diverging from the raw acoustics of the model. The measure of nasality was the relative amplitudes of a low frequency nasal peak and the first formant peak (A1-P0), which depends on vocal tract physiology and does not necessarily capture relative differences in nasality across talkers. Although participants did not converge to the raw acoustics of the model's A1-P0, they may have acoustically converged on other dimensions.

To investigate the perceptual similarity of convergence toward different targets, Nielsen and Scarborough (2019) conducted an AXB perception task using the post-exposure recordings from Zellou et al. (2016). Participants judged productions with decreased nasality (normalized pattern convergence) as more similar to the model speech than productions with closer A1-P0 to the model. This indicates that listeners evaluated nasality using cues other than A1-P0. Overall, this line of work suggests that imitators can converge to at least some types of normalized patterns, and such shifts are perceived as convergent by listeners. The present study examines imitation of sibilant SM in English, which differs from A1-P0 as it can be compared across speakers and is perceptually normalized across talkers. It is also a primary cue to the phonological contrast between /ʃ/ and /s/, while vowel nasalization is a secondary cue.

If imitation is mediated by normalization, imitators should converge toward the within-speaker pattern exhibited by the model talker. If imitation is not mediated by normalization, imitators should converge to the raw acoustics of the model talker's voice. The present study tests this by using stimuli in which these two targets compete, English /s/-initial words produced by a model talker with higher-than-average SM. The experiment has two conditions: increased and decreased SM on the model speech. Convergence toward raw acoustics predicts that direction of shift depends on where participant baseline is relative to the model. If the model exhibits higher/lower SM relative to participant baseline, they are expected to increase/decrease SM accordingly, regardless of whether that SM is increased or decreased for the model. Convergence toward the normalized pattern predicts that direction of acoustic change depends on the pattern exhibited by the model—participants will increase SM after exposure to the increased stimuli and decrease SM after exposure to the decreased stimuli regardless of their own baseline SM.

All participants in the increased condition exhibited lower baseline SM than the model. Therefore, the decreased condition distinguishes the two hypotheses. If the target of imitation is raw acoustics, participants with lower baseline SM than the model are expected to increase SM after exposure, as the model's decreased SM is still higher than most participants' baseline. However, if the target is the normalized pattern of reduction, participants are expected to decrease SM. These predictions are summarized in Table 1, where arrows indicate direction of SM shift.
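The two sets of predictions can be written as a small decision rule. The following is a minimal sketch, not part of the original analysis: the function name and its string labels are hypothetical, and `model_sm` is assumed to be the manipulated SM value that the participant actually hears.

```python
def predicted_shift(baseline_sm, model_sm, condition, hypothesis):
    """Return the predicted direction of a participant's SM shift.

    baseline_sm, model_sm: spectral mean in Hz (model_sm is the manipulated
        value presented during exposure).
    condition: "increased" or "decreased" (manipulation applied to the model).
    hypothesis: "raw" or "normalized".
    """
    if hypothesis == "raw":
        # Raw target: move toward the model's absolute SM value.
        if baseline_sm < model_sm:
            return "increase"
        if baseline_sm > model_sm:
            return "decrease"
        return "none"
    if hypothesis == "normalized":
        # Normalized target: copy the within-talker pattern, regardless
        # of where the participant's own baseline sits.
        return "increase" if condition == "increased" else "decrease"
    raise ValueError(f"unknown hypothesis: {hypothesis!r}")
```

For a hypothetical participant with a 6500 Hz baseline hearing a decreased model at 7751 Hz, `predicted_shift(6500, 7751, "decreased", "raw")` gives `"increase"` while the normalized hypothesis gives `"decrease"`, which is the contrast the decreased condition is designed to test.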

Table 1.

Predicted SM shift for different hypotheses about the target of imitation.

Condition | Hypothesis | Predicted shift in participant /s/ by relative baseline SM (axis from left to right: low, average, high, very high SM)
Increased SM | Raw | /s/ → Model
Increased SM | Normalized | /s/ → Model
Decreased SM | Raw | /s/ → Model ← /s/
Decreased SM | Normalized | ← /s/ Model ← /s/

The stimuli were modeled after Nielsen (2011). The words played during exposure comprised 40 high frequency [s]-initial words, 40 low frequency [s]-initial words, and 40 sonorant-initial filler words. All words had initial stress and no onset clusters. Word frequency data were obtained from CELEX (Baayen et al., 1996) with thresholds for low and high frequency at below 300 and above 1000 per 17.9 × 10⁶, respectively. Test words were balanced for phonological neighborhood density [data from Balota et al. (2007)], number of syllables (1–3), and rounding of the following vowel.

The exposure words were recorded by a female native American English speaker in a sound-attenuated booth with a Shure (Niles, IL) SM35 microphone and Audacity software (Audacity Team, 2021). Recordings were sampled at a rate of 44.1 kHz with a bit depth of 16. The model talker had higher-than-average raw SM, formant, and fundamental frequencies for a female English speaker: 8938 Hz for /s/ SM [cf. 6500–8100 Hz in Flipsen et al. (1999)], 457–1242 Hz range for F1 and 1100–2900 Hz range for F2 [cf. 437–936 and 1035–2761 Hz in Hillenbrand et al. (1995)], and 244 Hz for F0 [cf. 210–235 Hz in Hillenbrand et al. (1995)]. To construct the model speech, words were first normalized for pitch and amplitude. The sibilants with increased and decreased SM were created by shifting the spectrum of the sibilant noise up or down using the “shift frequencies” function in Praat (Boersma and Weenink, 2001). The initial sibilants were segmented using TextGrids, so only the sibilant noise was shifted, and no other parts of the word were altered. The spectra were shifted up 15% of the raw SM value for an average of 10 430 Hz in the increased condition and down 15% for an average of 7751 Hz in the decreased condition.2

Although participants did not hear the unmanipulated SM from the model talker, it can be assumed that the pattern of increased or decreased SM was perceptible. In the increased condition, SM is well above the typical range for a female adult talker. The decreased stimuli are in the typical range but lower than would be expected based on the model's higher-than-average fundamental and formant frequencies, which do trigger normalization of sibilants (see Sec. 1.1). This leads to a perceptual mismatch: Participants hear average SM values from a speaker with higher-than-average F0 and formants. An additional mismatch comes from coarticulation as participants heard the model's original formant transitions, which are associated with the unmanipulated fricatives. This provides additional information that the sibilant noise itself is either high or low for the model talker's voice.

The participants were 24 American English speakers recruited at a large American university (18 female, 6 male). While most participants had some exposure to other languages during school, none reported exposure to other languages during early childhood or high degree of fluency outside of English. Because the model talker had higher-than-average SM, it was expected that most participants would have lower baseline SM in both conditions, which was the case. Twenty-one participants had lower baseline SM than the model, and only three participants had higher baseline SM.

Participants were randomly assigned to one of two conditions: exposure to increased SM (9 female, 3 male) or decreased SM (9 female, 3 male). Each condition used the same four-block procedure: warm-up reading, baseline word-naming, listening (exposure), and post-exposure word-naming. The recording was done using the same equipment as the stimulus recording. The stimuli were presented in a random order inside the sound-attenuated booth on an external monitor.

The warm-up block for silent reading is intended to reduce hyperarticulation of low frequency words (e.g., Goldinger, 1998; Nielsen, 2011). The word lists for the warm-up and baseline blocks contained the same 80 [s]-initial words from the exposure list and 30 sonorant-initial fillers that were different from those in exposure. In the baseline block, the participants were instructed to name the word by reading it aloud naturally. In the exposure block, participants were asked to listen to the (manipulated) 80 [s]-initial words and 40 fillers from the exposure word list (see Sec. 2.1) using Audio Technica (Tokyo, Japan) M20x headphones. The post-exposure production block used the same instructions as the baseline word-naming task, and the word list comprised the baseline list plus 20 novel low frequency [s]-initial words and 20 novel low frequency [z]-initial words (which are not analyzed here).

The recordings were force aligned using the Montreal Forced Aligner with the pre-trained English model (McAuliffe et al., 2017). All sibilant boundaries were hand edited for accuracy. SM values were calculated from a time-averaged spectrum over the middle 80% of the fricative duration using a Hanning window with size of 0.015 s over six windows [script based on DiCanio (2021)]. Frequencies below 1000 Hz were filtered out. All analyses were done in R (R Core Team, 2013).
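The SM measure can be illustrated as the first spectral moment of a time-averaged spectrum. The sketch below is not the Praat script used in the study. It assumes that the windowed power spectra have already been computed elsewhere, that SM is weighted by power, and that the 1000 Hz cutoff is applied by discarding lower bins. The function names and toy values are hypothetical.

```python
def average_spectra(spectra):
    """Average several power spectra bin by bin (one list per window)."""
    n = len(spectra)
    return [sum(bins) / n for bins in zip(*spectra)]


def spectral_mean(freqs_hz, power, min_hz=1000.0):
    """Power-weighted mean frequency, ignoring bins below min_hz."""
    kept = [(f, p) for f, p in zip(freqs_hz, power) if f >= min_hz]
    total = sum(p for _, p in kept)
    if total == 0:
        raise ValueError("no spectral energy at or above min_hz")
    return sum(f * p for f, p in kept) / total


# Toy example: two windows over four frequency bins.
freqs = [500.0, 4000.0, 8000.0, 12000.0]
windows = [[9.0, 1.0, 2.0, 1.0], [7.0, 1.0, 4.0, 1.0]]
sm = spectral_mean(freqs, average_spectra(windows))  # 8000.0 Hz
```

In the toy example the 500 Hz bin carries the most energy but is excluded by the cutoff, so the result reflects only the frication noise above 1000 Hz.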

Because raw acoustic targets predict that direction of convergence differs by baseline, participants are grouped according to their condition and baseline SM in Fig. 1. The model average SM is indicated by the dotted lines. As expected, the model's SM was higher than most participants', but there were three participants who had higher baseline SM than the model in the decreased condition. In the increased condition (left panel), all participants had a lower baseline than the model and, on average, increased SM after exposure. In the decreased condition, most participants (n = 9; right panel) had a lower baseline and increased SM post-exposure, but the three individuals with a higher baseline decreased SM (middle panel).

Fig. 1.

Change in SM across blocks and conditions. Dotted line, average Model SM.


While convergence and direction of SM shift are visually apparent in each panel in Fig. 1, two separate statistical analyses must be performed to estimate the strength of these effects. This is because measures of convergence, like DID or linear combination, only test whether participants converged and do not indicate the direction in which participants shifted relative to their own baselines. In comparison, modeling raw SM indicates direction of shift from participant baseline but does not indicate whether the shift is convergent or divergent from the model. Therefore, both approaches are necessary to determine whether participants converged to a normalized or raw target.
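The difference between the two kinds of measure can be made concrete with a small sketch. It assumes the common formulation of DID as baseline distance minus post-exposure distance (positive values indicating convergence). The function names are hypothetical, and the participant values in the usage note are invented for illustration.

```python
def difference_in_distance(baseline, post, model):
    """DID: positive when the post-exposure value is closer to the model."""
    return abs(baseline - model) - abs(post - model)


def shift_from_baseline(baseline, post):
    """Signed change relative to the participant's own baseline (Hz)."""
    return post - baseline


def classify(baseline, post, model, condition):
    """Report convergence to raw acoustics and to the exposure pattern."""
    did = difference_in_distance(baseline, post, model)
    shift = shift_from_baseline(baseline, post)
    # The pattern is an increase in the increased condition, else a decrease.
    pattern_direction = 1 if condition == "increased" else -1
    return {
        "converges_to_raw": did > 0,
        "matches_pattern": shift * pattern_direction > 0,
    }
```

A hypothetical participant who moves from 6500 to 6700 Hz after hearing the decreased model (7751 Hz) has a DID of 200 Hz, so `classify(6500, 6700, 7751, "decreased")` reports convergence to raw acoustics but no match to the pattern of decreased SM. Neither measure alone separates the two outcomes.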

To determine whether participants significantly converged, a linear combination model was employed (Priva and Sanker, 2019) because DID exhibits bias according to starting distance (MacLeod, 2021). A mixed effects linear regression was constructed with post-exposure SM as the dependent variable (Table 2). Baseline SM is a significant positive predictor of post-exposure SM. The crucial effect is Model SM, which is also significant and positive. This indicates significant convergence to the model talker in the post-exposure block. Neither the main effect of Condition nor the interaction of Condition with Model SM is significant, indicating no significant difference in convergence between conditions.3

Table 2.

Fixed effect table for linear combination model. Call: Post-exposure SM ∼ Baseline SM + Model SM + Condition + Model SM × Condition + (1|Speaker) + (1|Word). Model and baseline SM are continuous predictors. Condition is a categorical predictor with increased SM as the reference level. Asterisks indicate statistical significance for an alpha level of 0.05.

Effect | Estimate (s.e.) | df | t | p
Baseline SM | 0.33 (0.02) | 1432 | 14.03 | <0.001***
Model SM | 0.27 (0.06) | 90.00 | 4.41 | <0.001***
Condition: decreased | 665.5 (597.10) | 651 | 1.12 | 0.27
Condition × Model SM | 0.04 (0.06) | 1422 | 0.57 | 0.57

s.e., standard error; df, degrees of freedom.
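Written out, the model call in the caption of Table 2 corresponds to the equation below. This is a sketch of one reading of that call: the indexing by speaker s and word w, and the dummy coding of Condition (0 for increased, 1 for decreased, following the stated reference level), are assumptions.

```latex
\[
\mathrm{SM}^{\mathrm{post}}_{sw} = \beta_0
  + \beta_1\,\mathrm{SM}^{\mathrm{base}}_{sw}
  + \beta_2\,\mathrm{SM}^{\mathrm{model}}_{sw}
  + \beta_3\,C_s
  + \beta_4\,\bigl(\mathrm{SM}^{\mathrm{model}}_{sw} \times C_s\bigr)
  + u_s + v_w + \varepsilon_{sw}
\]
```

Here u_s and v_w are the random intercepts for speaker and word, and the model SM term is the value of the stimulus that speaker s heard for word w. Under this reading, convergence corresponds to a positive slope on Model SM: the estimate 0.27 is that slope in the increased condition, and 0.27 + 0.04 = 0.31 is the slope in the decreased condition, a difference that Table 2 shows is not significant.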

To determine whether participants significantly increased or decreased their own SM to achieve this convergence, the data were submitted to a linear mixed effects model with raw SM as the dependent variable.4 Stepwise comparison was used to determine the best-fit model from a maximally specified model with fixed effects of Block (baseline vs post-exposure), Condition (increased vs decreased SM), Lexical Frequency (high vs low), Vowel Rounding (round vs unround), all possible two-way interactions between these fixed effects, and random intercepts for speaker and word. The best-fit model is shown in Table 3.

Table 3.

Fixed effect table for linear mixed effects regression of best-fit model. Call: SM ∼ Condition + Block + Rounding + Condition:Rounding + (1|Speaker) + (1|Word). Reference levels are the increased condition, the baseline block, and the rounded vowel context.

Effect | Estimate (s.e.) | df | t | p
Condition: decreased | 533.16 (338.52) | 21.21 | 1.58 | 0.13
Block: post | 133.67 (21.19) | 3545.19 | 6.31 | <0.001***
Rounding: unround | 891.60 (57.76) | 108.71 | 15.44 | <0.001***
Condition × Rounding | −328.09 (40.88) | 3462.30 | −8.03 | <0.001***

The significant effect and positive estimate of Block indicate that participants significantly increased SM after exposure. The interaction between Condition and Block did not significantly improve model fit and is not included in Table 3. The significant and positive effect of Rounding indicates that sibilants before unrounded vowels had higher SM, as expected (Soli, 1981). There is also a significant interaction between rounding and condition, indicating that participants had lower SM before rounded vowels in the increased condition. However, this effect was consistent across blocks, so it does not indicate a difference in convergence and likely reflects individual variability in degree of sibilant coarticulation.
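The interaction can be unpacked by combining the fixed-effect estimates in Table 3. The sketch below assumes treatment (dummy) coding with the reference levels stated in the table caption. Because the intercept is not reported, it returns only offsets from the reference cell. The function names are hypothetical.

```python
# Fixed-effect estimates in Hz, copied from Table 3.
ESTIMATES = {
    "condition_decreased": 533.16,
    "block_post": 133.67,
    "rounding_unround": 891.60,
    "condition_x_rounding": -328.09,
}


def offset_from_reference(condition, block, rounding):
    """Predicted SM offset (Hz) from the reference cell.

    The reference cell is the increased condition, baseline block, and
    rounded vowel context.
    """
    dec = 1 if condition == "decreased" else 0
    post = 1 if block == "post" else 0
    unround = 1 if rounding == "unround" else 0
    return (
        ESTIMATES["condition_decreased"] * dec
        + ESTIMATES["block_post"] * post
        + ESTIMATES["rounding_unround"] * unround
        + ESTIMATES["condition_x_rounding"] * dec * unround
    )


def rounding_effect(condition):
    """Unround minus round SM difference (Hz) within one condition."""
    return (offset_from_reference(condition, "baseline", "unround")
            - offset_from_reference(condition, "baseline", "round"))
```

Under these assumptions the estimated rounding difference is about 892 Hz in the increased condition and about 564 Hz (891.60 − 328.09) in the decreased condition, while the Block effect adds the same 133.67 Hz in every cell.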

This study investigated spontaneous imitation of English /s/ from a model talker with high baseline SM. After exposure to model speech with increased SM, participants increased their own SM values, converging toward the raw acoustics of the model and the normalized pattern of increased SM. However, when exposed to speech with decreased SM on /s/, participants with lower baseline SM also increased SM. This diverges from the pattern of decreased SM but converges to the raw acoustics of the model, as even the decreased model speech had higher SM than most participants' baseline.

The post-exposure changes cannot be explained by an increase in global hyperarticulation. The three participants with higher baseline SM than the model did decrease SM post-exposure.5 In addition, a post hoc analysis of other acoustic features associated with hyperarticulation (sibilant and vowel duration) showed no significant differences between baseline and post-exposure in either condition. It is also unlikely that participants avoided decreasing SM on /s/ because of the phonological contrast with /ʃ/. While some previous work suggests contrast constrains imitation, results are mixed. English speakers have been shown to imitate increased VOT on /p/ (e.g., Nielsen, 2011), but only some studies found imitation of decreased VOT (Schertz and Johnson, 2022). In the present study, the three participants with higher baseline SM than the model did decrease /s/ SM after exposure, so it cannot be the case that phonological contrast generally prevented participants from decreasing SM.

Convergence to raw acoustics differs from previous work, which suggests that speakers may imitate normalized patterns when faced with competing targets (Zellou et al., 2016). This may be explained by design differences, like gender of the model talker, combined with the nature of the phonetic dimensions examined. Here, the dependent variable (SM) is a primary cue to phonological contrast, which does undergo perceptual normalization across talkers. In Zellou et al. (2016), the dimension of A1-P0 as a cue to nasalization can capture relative differences within speakers but not necessarily between speakers. Further work is needed to fully determine how and when normalization affects imitation, including what types of phonetic dimensions and tasks trigger normalized vs raw convergence. The data here indicate that the target of imitation is raw acoustics in at least some circumstances, and phonetic imitation is not necessarily mediated by perceptual normalization to different talkers. In addition, these results demonstrate that modeling convergence and direction of shift relative to participant baseline are both necessary to distinguish between normalized pattern and raw acoustic imitation.

This research was supported by funds from the University of Texas Arlington. Thanks to those who have provided feedback on the project: Kristine Yu, John Kingston, Melissa Baese-Berk, Cynthia Kilpatrick, and anonymous reviewers. The authors have no conflicts of interest to disclose. Human subject data collection was approved by the University of Texas Arlington Institutional Review Board, and informed consent was obtained from all participants. The data that support the findings of this study are available from the corresponding author upon reasonable request.

1 It may also be difficult to determine what the normalized target would be in such a case.

2 There are some social associations between /s/ SM and sexual identity in American English, although these are mostly reported in men's speech (e.g., Munson et al., 2006), and the model talker here was a cisgender woman.

3 An alternative analysis with only baseline SM and condition as fixed effects still obtains a non-significant effect for condition.

4 If SM normalized within-speaker is the dependent variable instead, the results do not change.

5 This refers only to the descriptive shift in average SM and not statistical significance, which was not tested directly due to the mismatched n between participants with higher (three) and lower (nine) baselines.

1. Audacity Team (2021). “Audacity: Free audio editor and recorder (version 3.2) [computer program],” https://www.audacityteam.org/ (Last viewed June 22, 2023).
2. Baayen, R. H., Piepenbrock, R., and Gulikers, L. (1996). The CELEX Lexical Database, Release 2 (Linguistics Data Consortium, University of Pennsylvania [distributor], Philadelphia, PA).
3. Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., and Treiman, R. (2007). “The English Lexicon Project,” Behav. Res. Methods 39(3), 445–459.
4. Boersma, P., and Weenink, D. J. M. (2001). “Praat, a system for doing phonetics by computer,” Glot Int. 5(9/10), 341–347.
5. DiCanio, C. (2021). “Time averaging for fricatives 2.0,” Praat script published online, https://www.acsu.buffalo.edu/~cdicanio/scripts.html.
6. Flipsen, P., Shriberg, L., Karlsson, H., McSweeny, J., and Weismer, G. (1999). “Acoustic characteristics of /s/ in adolescents,” J. Speech Lang. Hear. Res. 42(3), 663–677.
7. Fowler, C. A. (1986). “An event approach to the study of speech perception from a direct-realist perspective,” J. Phon. 14(1), 3–28.
8. Fuchs, S., and Toda, M. (2010). “Do differences in male versus female /s/ reflect biological or sociophonetic factors,” in Turbulent Sounds: An Interdisciplinary Guide (Mouton de Gruyter, Berlin).
9. Fujisaki, H., and Kawashima, T. (1968). “The roles of pitch and higher formants in the perception of vowels,” IEEE Trans. Audio Electroacoust. 16(1), 73–77.
10. Goldinger, S. D. (1998). “Echoes of echoes? An episodic theory of lexical access,” Psychol. Rev. 105(2), 251–279.
11. Gottfried, T. L., and Chew, S. L. (1986). “Intelligibility of vowels sung by a countertenor,” J. Acoust. Soc. Am. 79(1), 124–130.
12. Hillenbrand, J., Getty, L., Clark, M., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97(5), 3099–3111.
13. Honorof, D., and Whalen, D. (2005). “Perception of pitch location within a speaker's F0 range,” J. Acoust. Soc. Am. 117(4), 2193–2200.
14. Johnson, K. (1991). “Differential effects of speaker and vowel variability on fricative perception,” Lang. Speech 34(3), 265–279.
15. Johnson, K., and Sjerps, M. J. (2021). “Speaker normalization in speech perception,” in The Handbook of Speech Perception (Wiley, New York), pp. 145–176.
16. Kwon, H. (2019). “The role of native phonology in spontaneous imitation: Evidence from Seoul Korean,” Lab. Phonol. 10(1), 10.
17. MacLeod, B. (2021). “Problems in the difference-in-distance measure of phonetic imitation,” J. Phon. 87, 101058.
18. May, J. (1976). “Vocal tract normalization for /s/ and /ʃ/,” Haskins Lab. Status Rep. Speech Res. 48, 67–73.
19. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). “Montreal Forced Aligner (version 0.9.0) [computer program],” https://montreal-forced-aligner.readthedocs.io/en/latest/ (Last viewed June 22, 2023).
20. Miller, R. L. (1953). “Auditory tests with synthetic vowels,” J. Acoust. Soc. Am. 25(1), 114–121.
21. Munson, B., McDonald, E., DeBoe, N. L., and White, A. R. (2006). “The acoustic and perceptual bases of judgments of women and men's sexual orientation from read speech,” J. Phon. 34(2), 202–240.
22. Nielsen, K. (2011). “Specificity and abstractness of VOT imitation,” J. Phon. 39(2), 132–142.
23. Nielsen, K., and Scarborough, R. (2019). “Perceptual target of phonetic accommodation: A pattern within a speaker's phonetic system or the raw acoustic signal?,” in Proceedings of the International Congress of Phonetic Sciences, August 5–9, Melbourne, Australia.
24. Priva, U. C., and Sanker, C. (2019). “Limitations of difference-in-difference for measuring convergence,” Lab. Phonol. 10(1), 15.
25. R Core Team (2013). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria).
26. Schertz, J., and Johnson, E. K. (2022). “Voice onset time imitation in teens versus adults,” J. Speech Lang. Hear. Res. 65(5), 1839–1850.
27. Soli, S. D. (1981). “Second formants in fricatives: Acoustic consequences of fricative-vowel coarticulation,” J. Acoust. Soc. Am. 70(4), 976–984.
28. Toda, M. (2007). “Speaker normalization of fricative noise: Considerations on language-specific contrast,” in Proceedings of the 16th International Congress on Phonetic Sciences, August 6–10, Saarbrucken, Germany, pp. 825–828.
29. Zellou, G., Scarborough, R., and Nielsen, K. (2016). “Phonetic imitation of coarticulatory vowel nasalization,” J. Acoust. Soc. Am. 140(5), 3560–3575.