A psychophysical experiment was conducted to perceptually validate six spectral audio features through ordinal scaling: spectral centroid, spectral spread, spectral skewness, odd-to-even harmonic ratio, spectral slope, and harmonic spectral deviation. For each feature, several sets of stimuli were synthesized at different fundamental frequencies and spectral centroids. Wherever possible, each spectral feature was controlled independently of the others, which isolated the effect of each feature on the stimulus rankings within a sound set. Overall, listeners were able to order stimuli along every spectral feature tested, provided the feature values were appropriately spaced. For the specific stimulus sets in which the ordering task partially failed, psychophysical interpretations are offered to account for listeners' confusions. The results outline trajectories of spectral features that correspond to listeners' perceptions and suggest several sound synthesis parameters that could carry timbral contour information.
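The six descriptors are named but not defined here. As a rough illustration, the sketch below computes common textbook-style versions of each from the amplitudes of a harmonic spectrum. The function name `harmonic_descriptors` is hypothetical, and the exact definitions are assumptions: amplitude-weighted moments for centroid, spread, and skewness; an energy-based odd-to-even ratio; a least-squares amplitude-versus-frequency slope; and a three-harmonic local envelope for the deviation. The study's own formulas, normalizations, and any auditory-model weighting may differ.

```python
import math


def harmonic_descriptors(amps, f0):
    """Hypothetical helper: spectral descriptors of a harmonic spectrum.

    amps[k] is the (linear, positive-sum) amplitude of harmonic k + 1,
    assumed to lie exactly at (k + 1) * f0 Hz. Definitions are illustrative
    assumptions, not necessarily those used in the study.
    """
    n_harm = len(amps)
    if n_harm < 3:
        raise ValueError("need at least three harmonics")
    freqs = [(k + 1) * f0 for k in range(n_harm)]
    total = sum(amps)
    weights = [a / total for a in amps]  # amplitudes normalized to sum to 1

    # Centroid, spread, skewness: moments of the normalized amplitude distribution.
    centroid = sum(f * w for f, w in zip(freqs, weights))
    spread = math.sqrt(sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights)))
    third = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights))
    skewness = third / spread ** 3 if spread > 0 else 0.0

    # Odd-to-even ratio: energy in harmonics 1, 3, 5, ... over harmonics 2, 4, 6, ...
    odd = sum(a * a for k, a in enumerate(amps) if k % 2 == 0)
    even = sum(a * a for k, a in enumerate(amps) if k % 2 == 1)
    odd_even_ratio = odd / even if even > 0 else math.inf

    # Spectral slope: least-squares slope of amplitude against frequency.
    mean_f = sum(freqs) / n_harm
    mean_a = total / n_harm
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(freqs, amps))
    var = sum((f - mean_f) ** 2 for f in freqs)
    slope = cov / var

    # Harmonic spectral deviation: mean absolute departure of each interior
    # harmonic from a local envelope (mean of itself and its two neighbours).
    devs = [abs(amps[k] - (amps[k - 1] + amps[k] + amps[k + 1]) / 3)
            for k in range(1, n_harm - 1)]
    deviation = sum(devs) / len(devs)

    return {"centroid": centroid, "spread": spread, "skewness": skewness,
            "odd_even_ratio": odd_even_ratio, "slope": slope,
            "deviation": deviation}


# Usage: a 1/n (sawtooth-like) spectrum with 10 harmonics on a 220 Hz fundamental.
saw = [1.0 / n for n in range(1, 11)]
print(harmonic_descriptors(saw, 220.0))
```

Because each descriptor is computed from the same amplitude vector, a synthesis procedure that aims to vary one feature while holding the others fixed has to adjust the amplitudes jointly; this is why independent control is only possible "wherever possible."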
