A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single-source (SS) DOA estimation in the time-frequency (TF) bins for which an SS assumption is valid. Typical SS-validity confidence metrics analyse the validity of the SS assumption over a fixed-size TF region local to the TF bin. The performance of such methods degrades as the number of simultaneously active sources increases, because the TF regions in which the SS assumption is valid become smaller. An SS-validity confidence metric is proposed that exploits a dynamic MS assumption over relatively larger TF regions. The proposed metric first clusters the initial DOA estimates (one per TF bin) and then weights each TF bin using the spatial consistency of its cluster's members as well as the cluster's spread. Distance-based and density-based clustering are employed as two alternative approaches for clustering DOAs. A noise-robust density-based clustering is also used in an evolutionary framework to propose a method for source counting and source direction estimation. Evaluation results based on simulations and on real recordings show that the proposed weighting strategy significantly improves the accuracy of source counting and MS DOA estimation compared to the state-of-the-art.
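
The cluster-then-weight idea can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not give the weighting formula, so the exponential weight, the scale `sigma_deg`, the function names and the synthetic data below are all assumptions made for illustration, and a simple spherical k-means stands in for the distance-based clustering variant.

```python
import numpy as np

def spherical_kmeans(doas, k, iters=20):
    """Distance-based clustering of unit DOA vectors (cosine similarity)."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centres = [doas[0]]
    for _ in range(1, k):
        sim = np.max(doas @ np.array(centres).T, axis=1)
        centres.append(doas[np.argmin(sim)])
    centres = np.array(centres)
    for _ in range(iters):
        labels = np.argmax(doas @ centres.T, axis=1)
        for c in range(k):
            members = doas[labels == c]
            if len(members):
                mean = members.sum(axis=0)
                centres[c] = mean / np.linalg.norm(mean)
    return np.argmax(doas @ centres.T, axis=1), centres

def bin_weights(doas, labels, centres, sigma_deg=10.0):
    """Hypothetical weight per TF bin: high when the bin's DOA is close to
    its cluster centre (consistency) and when the cluster is tight (spread)."""
    sigma = np.radians(sigma_deg)  # assumed angular scale, not from the paper
    cos = np.clip(np.sum(doas * centres[labels], axis=1), -1.0, 1.0)
    ang = np.arccos(cos)  # angular distance of each bin to its cluster centre
    weights = np.zeros(len(doas))
    for c in range(len(centres)):
        idx = labels == c
        if not idx.any():
            continue
        spread = ang[idx].mean()  # cluster spread as mean angular deviation
        weights[idx] = np.exp(-ang[idx] / sigma) * np.exp(-spread / sigma)
    return weights

# Synthetic per-bin DOA estimates: one tight and one loose cluster (assumed data).
rng = np.random.default_rng(0)

def noisy(direction, std, n):
    v = np.asarray(direction, float) + std * rng.standard_normal((n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

doas = np.vstack([noisy([1, 0, 0], 0.01, 50), noisy([0, 1, 0], 0.10, 50)])
labels, centres = spherical_kmeans(doas, k=2)
weights = bin_weights(doas, labels, centres)
```

In this sketch, bins belonging to the tight cluster receive larger weights than bins in the loose cluster, which is the qualitative behaviour the abstract describes; a density-based clusterer could replace `spherical_kmeans` without changing `bin_weights`.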
