Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
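The variance-accounted-for figures above come from principal component analysis. As a rough illustration of how such a proportion is computed, the sketch below runs the first step of a PCA on purely synthetic data. It assumes the analysis is done on the correlation matrix (the abstract does not say), uses three made-up variables instead of the study's 26, and extracts only the first component by power iteration. All function names and values are hypothetical and are not taken from the paper.

```python
import math
import random


def make_synthetic_frames(n_frames=200, seed=0):
    """Synthetic stand-in for per-frame acoustic measures (not real data).

    Two variables share a latent factor; the third is independent noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_frames):
        shared = rng.gauss(0.0, 1.0)
        rows.append([
            shared + 0.3 * rng.gauss(0.0, 1.0),
            shared + 0.3 * rng.gauss(0.0, 1.0),
            rng.gauss(0.0, 1.0),
        ])
    return rows


def standardize(rows):
    """Z-score each column so that PCA operates on the correlation matrix."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z_rows):
    """Correlation matrix of already standardized rows."""
    n = len(z_rows)
    p = len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(corr, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


frames = make_synthetic_frames()
corr = correlation_matrix(standardize(frames))
eigenvalue, loadings = first_component(corr)

# The trace of a correlation matrix equals the number of variables,
# so the proportion of variance explained is eigenvalue / p.
proportion = eigenvalue / len(corr)
print(f"First component explains {proportion:.1%} of total variance")
print("Loadings:", [round(w, 2) for w in loadings])
```

Running the same computation once on pooled data from all talkers and once per talker would give the kind of population-level versus individual-level comparison the abstract describes, although the study's actual pipeline may differ.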
