When a bilingual switches languages, do they switch their voice? Using a conversational corpus of speech from early Cantonese-English bilinguals (n = 34), this paper examines the talker-specific acoustic signatures of bilingual voices. Following the psychoacoustic model of voice, 24 filter- and source-based acoustic measurements are estimated. The analysis summarizes mean differences for these dimensions and identifies the underlying structure of each talker's voice across languages with principal component analyses. Canonical redundancy analyses demonstrate that while talkers vary in the degree to which they have the same voice across languages, all talkers show strong similarity with themselves, suggesting an individual's voice remains relatively constant across languages. Voice variability is sensitive to sample size, and we establish the sample size required to settle on a consistent impression of a talker's voice. These results have implications for human and machine voice recognition for bilinguals and monolinguals and speak to the substance of voice prototypes.
