An unsupervised single-channel audio separation method is presented from a pattern recognition viewpoint. The proposed method does not require training knowledge, and the separation system is based on non-uniform time-frequency (TF) analysis and feature extraction. Unlike conventional research that concentrates on the spectrogram or its variants, the proposed separation algorithm uses an alternative TF representation based on the gammatone filterbank. In particular, the monaural mixed audio signal is shown to be considerably more separable in this non-uniform TF domain, and an analysis of signal separability is provided to verify this finding. In addition, a variational Bayesian approach is derived to learn the sparsity parameters that optimize the matrix factorization. Experimental tests show that extraction of the spectral dictionary and temporal codes is more efficient with sparsity learning, which in turn leads to better separation performance.
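To make the roles of the "spectral dictionary", "temporal codes", and "sparsity parameter" concrete, the following is a minimal sketch of ordinary (one-dimensional) non-negative matrix factorization with a fixed L1 sparsity penalty, using standard multiplicative updates. It is not the method of the paper: the two-dimensional factorization, the cochleagram front end, and the variational Bayesian adaptation of the sparsity parameters are all omitted. The function name `sparse_nmf`, the fixed weight `lam`, and the column renormalization heuristic are assumptions made for illustration only.

```python
import numpy as np


def sparse_nmf(V, rank, lam=0.1, n_iter=200, seed=0):
    """Illustrative sparse NMF: V (freq x time) ~= W @ H.

    W holds the spectral dictionary (one column per component) and H holds
    the temporal codes (one row per component). `lam` is a fixed,
    hand-chosen L1 weight on H; the paper instead learns its sparsity
    parameters, which this sketch does not attempt.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Multiplicative update for the codes; lam in the denominator
        # shrinks H toward zero (L1 penalty, Euclidean cost).
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Multiplicative update for the dictionary.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # Heuristic: keep dictionary columns at unit norm and move the
        # scale into H, so the penalty cannot be evaded by inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


if __name__ == "__main__":
    # Synthetic non-negative "TF" matrix of rank 2 (assumed test data).
    rng = np.random.default_rng(1)
    V = rng.random((20, 2)) @ rng.random((2, 30))
    W, H = sparse_nmf(V, rank=2, lam=0.01)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print("relative reconstruction error:", rel_err)
```

In this simplified setting, the contribution of component `k` to the mixture is `W[:, k:k+1] @ H[k:k+1, :]`; a separation system would group such components by source and map them back to the time domain, steps that depend on the TF representation and are not shown here.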
March 2014
March 01 2014
Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation
Bin Gao
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, People's Republic of China
W. L. Woo a)
School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
L. C. Khor
School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
J. Acoust. Soc. Am. 135, 1171–1185 (2014)
Article history
Received: November 06 2012
Accepted: January 17 2014
Citation
Bin Gao, W. L. Woo, L. C. Khor; Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation. J. Acoust. Soc. Am. 1 March 2014; 135 (3): 1171–1185. https://doi.org/10.1121/1.4864294