The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt to rethink the concept of features so that it more closely reflects the underlying mechanisms of human perception. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and were motivated by an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled from both symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a combination of a small number of perceptual features, with an explained variance of 75% to 93% for the emotional dimensions activity and valence; and (3) the perceptual features could be modeled only to a limited extent using existing audio features. The results also clearly indicated that a small number of dedicated features was superior to a "brute force" model using a large number of general audio features.
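
The abstract reports explained variance (R², the share of the variance in the emotion ratings accounted for by a model) when ratings are predicted from a few perceptual features. It does not state the fitting procedure, so the following is only a minimal sketch, assuming an ordinary least-squares model with an intercept that is evaluated on the same data it was fitted to. The function names and the example ratings are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(features: Sequence[Sequence[float]], ratings: Sequence[float]) -> List[float]:
    """Fit rating ~ b0 + b1*f1 + ... + bk*fk via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ratings)) for i in range(p)]
    return solve(xtx, xty)


def explained_variance(features: Sequence[Sequence[float]], ratings: Sequence[float]) -> float:
    """Return in-sample R^2 = 1 - SS_res / SS_tot of the OLS fit."""
    coef = fit_ols(features, ratings)
    preds = [coef[0] + sum(c * f for c, f in zip(coef[1:], row)) for row in features]
    mean_y = sum(ratings) / len(ratings)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(ratings, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ratings)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical mean listener ratings of two perceptual features per excerpt
    # (labelled "speed" and "dynamics" for illustration only) and of activity.
    speed_dynamics = [[3.0, 2.0], [5.0, 4.0], [7.0, 5.0], [2.0, 6.0], [6.0, 1.0], [4.0, 3.0]]
    activity = [2.5, 5.0, 7.5, 4.0, 4.5, 4.0]
    print(f"R^2 = {explained_variance(speed_dynamics, activity):.3f}")
```

Solving the normal equations directly is adequate for a handful of predictors, as in the small perceptual-feature models described above. With many correlated predictors, such as a large set of general audio features, a QR- or SVD-based solver (for example `numpy.linalg.lstsq`) or a regularized method would be more stable, and in-sample R² would overstate predictive accuracy, so cross-validation would be needed.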
