The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated by an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled from both symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a “brute force” model using a large number of general audio features.
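Result (2) rests on regressing emotion ratings on a small set of perceptual features and reporting the explained variance. A minimal sketch of that kind of computation follows, assuming ordinary least squares (the abstract does not name the regression method) and entirely synthetic data in place of the listener ratings. The function name `fit_linear_model`, the number of excerpts, and the weights are illustrative assumptions, not values from the study.

```python
import numpy as np


def fit_linear_model(X, y):
    """Fit ordinary least squares with an intercept; return (coefficients, R^2)."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    pred = X1 @ coef
    ss_res = np.sum((y - pred) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return coef, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): 100 excerpts, nine perceptual features.
rng = np.random.default_rng(0)
n_excerpts, n_features = 100, 9
X = rng.normal(size=(n_excerpts, n_features))

# Hypothetical ground truth: only three features drive the emotion rating.
true_w = np.zeros(n_features)
true_w[:3] = [1.0, -0.5, 0.8]
y = X @ true_w + rng.normal(scale=0.3, size=n_excerpts)

coef, r2 = fit_linear_model(X, y)
print(f"Explained variance (R^2): {r2:.2f}")
```

The explained variance here is computed on the same data used for fitting; on real ratings it would normally be estimated with cross-validation.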
October 01 2014
Using listener-based perceptual features as intermediate representations in music information retrieval
Anders Friberg a)
KTH Royal Institute of Technology, School of Computer Science and Communication, Speech, Music and Hearing, Stockholm, Sweden
Erwin Schoonderwaldt
Hanover University of Music, Drama and Media, Institute of Music Physiology and Musicians' Medicine, Hannover, Germany
Anton Hedblad
KTH Royal Institute of Technology, School of Computer Science and Communication, Speech, Music and Hearing, Stockholm, Sweden
Marco Fabiani
KTH Royal Institute of Technology, School of Computer Science and Communication, Speech, Music and Hearing, Stockholm, Sweden
Anders Elowsson
KTH Royal Institute of Technology, School of Computer Science and Communication, Speech, Music and Hearing, Stockholm, Sweden
a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
J. Acoust. Soc. Am. 136, 1951–1963 (2014)
Article history
Received: January 09 2014
Accepted: July 16 2014
Citation
Anders Friberg, Erwin Schoonderwaldt, Anton Hedblad, Marco Fabiani, Anders Elowsson; Using listener-based perceptual features as intermediate representations in music information retrieval. J. Acoust. Soc. Am. 1 October 2014; 136 (4): 1951–1963. https://doi.org/10.1121/1.4892767