This paper investigates the use of musical priors for the sparse expansion of music audio signals on an overcomplete dual-resolution dictionary formed by the union of two orthonormal bases, which can describe both the transient and the tonal components of a music audio signal. More specifically, chord and metrical structure information is used to build a structured model that takes into account dependencies between coefficients of the decomposition, for both the tonal and the transient layers. Denoising is used as a proof-of-concept application for the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and interpretability of the representation is also provided and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals.
