A system is proposed in which rhythmic representations are used to model the perception of tempo in music. The system can be understood as a five-layered model, where representations are transformed into higher-level abstractions in each layer. First, source separation is applied (Audio Level), onsets are detected (Onset Level), and interonset relationships are analyzed (Interonset Level). Then, several high-level representations of rhythm are computed (Rhythm Level). The periodicity of the music is modeled by the cepstroid vector, defined as the periodicity of an interonset interval (IOI) histogram. The pulse strength for plausible beat length candidates is defined by computing the magnitudes in different IOI histograms. The speed of the music is modeled as a continuous function, on the premise that such a function corresponds to the underlying perceptual phenomena; this also appears to reduce octave errors effectively. By combining the rhythmic representations in a logistic regression framework, the tempo of the music is finally computed (Tempo Level). The results are the highest reported in a formal benchmarking test (2006–2013), with a P-Score of 0.857. Furthermore, the highest results so far are reported for two widely adopted test sets, with Acc1 values of 77.3% and 93.0% for the Songs and Ballroom datasets, respectively.
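As a rough illustration of the interonset idea mentioned in the abstract, the sketch below builds a plain IOI histogram from a list of onset times and reads off its magnitude at candidate beat lengths. It is a simplified sketch and not the authors' procedure: the unweighted all-pairs histogram, the function names (`ioi_histogram`, `pulse_strength`), the 10 ms bin width, and the 2 s maximum interval are all assumptions made for this example. The source separation, cepstroid, and logistic regression stages are not covered.

```python
import numpy as np

def ioi_histogram(onsets, max_ioi=2.0, bin_width=0.01):
    """Histogram of all pairwise interonset intervals up to max_ioi seconds.

    Illustrative only: an unweighted all-pairs histogram, with bins
    centered on multiples of bin_width (both are assumptions).
    """
    onsets = np.sort(np.asarray(onsets, dtype=float))
    # Differences between every pair of onsets; keep later-minus-earlier only.
    diffs = onsets[None, :] - onsets[:, None]
    iois = diffs[(diffs > 0) & (diffs <= max_ioi)]
    # Shift the range by half a bin so that 0, bin_width, 2*bin_width, ...
    # fall in the middle of a bin rather than on an edge.
    n_bins = int(round(max_ioi / bin_width)) + 1
    lo, hi = -bin_width / 2, max_ioi + bin_width / 2
    counts, edges = np.histogram(iois, bins=n_bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return counts, centers

def pulse_strength(counts, centers, beat_length):
    """Histogram magnitude at the bin closest to a candidate beat length."""
    idx = int(np.argmin(np.abs(centers - beat_length)))
    return int(counts[idx])

# Synthetic example: 20 onsets spaced 0.5 s apart (a steady 120 BPM pulse).
onsets = np.arange(0, 10, 0.5)
counts, centers = ioi_histogram(onsets)

# The most frequent interval is taken as a naive beat-length estimate.
beat = float(centers[int(np.argmax(counts))])
bpm = 60.0 / beat
print(f"beat length: {beat:.3f} s, tempo: {bpm:.1f} BPM")
print("strength at 0.5 s:", pulse_strength(counts, centers, 0.5))
print("strength at 1.0 s:", pulse_strength(counts, centers, 1.0))
```

In this toy case the histogram also has strong peaks at 1.0 s and 1.5 s, which hints at why picking a single peak is prone to octave errors and why the abstract describes combining several rhythmic representations instead.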
June 2015
June 01 2015
Modeling the perception of tempo
Anders Elowsson a)
School of Computer Science and Communication, Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Anders Friberg
School of Computer Science and Communication, Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 137, 3163–3177 (2015)
Article history
Received: July 22 2014
Accepted: April 04 2015
Citation
Anders Elowsson, Anders Friberg; Modeling the perception of tempo. J. Acoust. Soc. Am. 1 June 2015; 137 (6): 3163–3177. https://doi.org/10.1121/1.4919306