This paper proposes a fusion of K models of full-rank weighted nonnegative tensor factor two-dimensional deconvolution (K-wNTF2D) to separate acoustic sources mixed in an underdetermined reverberant environment. The model is adapted in an unsupervised manner under a hybrid framework of the generalized expectation-maximization and multiplicative update algorithms. The derivation of the algorithm and the development of the proposed full-rank K-wNTF2D are presented. The algorithm also encodes a set of variable sparsity parameters, derived from the Gibbs distribution, into the K-wNTF2D model, so that each sub-model is optimized with the sparsity required to model the time-varying variances of the sources in the spectrogram. In addition, an initialization method is proposed for the K-wNTF2D parameters. Experimental results in an underdetermined reverberant mixing environment show that the proposed algorithm separates the mixture effectively, with an average signal-to-distortion ratio of 3 dB.


By definition, a three-dimensional NTF is given by $V_{i,f,n} = \sum_{j} a_{i,j}\, b_{f,j}\, c_{j,n}$. This can be extended to NTF2D by introducing the convolutive parameters as $V_{i,f,n} = \sum_{j}\sum_{\tau}\sum_{\phi} a_{i,j}\, b^{\tau}_{f-\phi,j}\, c^{\phi}_{j,n-\tau}$. We can further extend the NTF2D by making $a_{i,j}$ depend on one of the dimensions, say $f$, i.e., $a_{i,j}(f)$. In this case, we replace $a_{i,j}$ with $a_{i,j,f}$ so that $V_{i,f,n} = \sum_{j}\sum_{\tau}\sum_{\phi} a_{i,j,f}\, b^{\tau}_{f-\phi,j}\, c^{\phi}_{j,n-\tau}$. This coupling allows us to weight the NTF2D as a function of $f$. We term this the weighted NTF2D (wNTF2D). Finally, we introduce a fusion of $K$ models of weighted NTF2D, resulting in $V_{i,f,n} = \sum_{k=1}^{K}\sum_{j}\sum_{\tau}\sum_{\phi} a_{i,j,f}\, b^{\tau,k}_{f-\phi,j}\, c^{\phi,k}_{j,n-\tau}$, which we term the “K-wNTF2D.”
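The K-wNTF2D synthesis defined above can be illustrated with a short NumPy sketch. It is a minimal illustration, not the authors' implementation: the function names `shift` and `k_wntf2d`, the array layouts, and the zero-filling of out-of-range indices (entries with f − φ < 0 or n − τ < 0 are treated as zero) are assumptions made for this example. The index placement (τ indexing the spectral bases b, φ indexing the temporal codes c) follows the usual two-dimensional deconvolution convention and is likewise assumed.

```python
import numpy as np


def shift(x, s, axis):
    """Shift x by s >= 0 positions along `axis`, zero-filling vacated entries."""
    out = np.zeros_like(x)
    n = x.shape[axis]
    if s >= n:
        return out
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    src[axis] = slice(0, n - s)
    dst[axis] = slice(s, n)
    out[tuple(dst)] = x[tuple(src)]
    return out


def k_wntf2d(a, b, c):
    """Synthesize V[i, f, n] from K weighted NTF2D sub-models.

    Assumed layouts (hypothetical, chosen for this sketch):
      a: (I, J, F)     frequency-dependent channel weights a_{i,j,f}
      b: (K, T, F, J)  spectral bases b^{tau,k}_{f,j}
      c: (K, P, J, N)  temporal codes c^{phi,k}_{j,n}
    """
    I, J, F = a.shape
    K, T = b.shape[:2]
    P, N = c.shape[1], c.shape[3]
    V = np.zeros((I, F, N))
    for k in range(K):
        for tau in range(T):
            for phi in range(P):
                b_shift = shift(b[k, tau], phi, axis=0)  # b_{f-phi, j}
                c_shift = shift(c[k, phi], tau, axis=1)  # c_{j, n-tau}
                # Sum over j; f is shared by a, b, and the output.
                V += np.einsum("ijf,fj,jn->ifn", a, b_shift, c_shift)
    return V


# Small usage example with random nonnegative factors.
rng = np.random.default_rng(0)
a = rng.random((2, 3, 4))      # I=2 channels, J=3 components, F=4 bins
b = rng.random((2, 2, 4, 3))   # K=2 sub-models, T=2 time shifts
c = rng.random((2, 3, 3, 5))   # P=3 frequency shifts, N=5 frames
V = k_wntf2d(a, b, c)          # shape (2, 4, 5)
```

With K = 1, a single shift in each direction (T = P = 1), and weights that do not vary with f, the same function reduces to the plain three-dimensional NTF in the first equation.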
