In this letter, a generic search grid generation algorithm for far-field source localization (SL) is proposed. Since conventional uniform regular grid structures consider only the resolution of the distribution, it is difficult to control the number of grid points to be distributed. The proposed algorithm generates a search grid by distributing a desired number of points evenly, depending on the target criterion, in either the direction of arrival (DoA) or the time difference of arrival (TDoA) domain. The experimental results show that the proposed algorithm provides optimally distributed grid points for a given number of desired points and the corresponding domain for SL processing.

Recent developments in voice-activated user interfaces require acquiring the desired speech signal from a distance. Since the acquired signal is corrupted by coherent interference and diffuse background noise in this distant-talker scenario, the desired speech signal must be estimated using various enhancement techniques. Microphone array techniques are well suited to removing both coherent directional interference and diffuse noise, and their overall enhancement performance depends on how accurately the desired speech source is localized. Toward this end, numerous source localization (SL) techniques have been investigated over the past several decades.1–9 

One way to classify SL algorithms is by whether the location is derived directly from the time difference of arrival (TDoA) or through a grid search. TDoA-based SL schemes1,2 provide fast localization, whereas search grid-based SL algorithms3–7 yield better localization performance. To provide feasible and robust solutions, data-driven SL algorithms that utilize deep neural networks (DNNs) have recently been introduced.8,9 Since DNNs are highly effective at modeling nonlinear relationships between input and output data, they have been shown to deliver excellent localization performance in adverse environments,8 even with small-aperture arrays.

The overall localization performance of search grid-based SL algorithms varies significantly with the structure of the search grid.10 Search grids with a higher resolution provide more accurate localization but require more computation; for instance, the complexity of the steered response power3 algorithm is directly proportional to the number of search grid points. Increasing the number of grid points indefinitely, however, may not improve localization accuracy because of the discriminability issue.11 The structure of the search grid should also be chosen according to the application. For instance, search grids uniformly distributed in the TDoA domain are more suitable than those uniformly distributed in the direction of arrival (DoA) domain for beamforming applications.12 On the other hand, for applications such as audio rendering and reproduction, it is preferable to optimize the search grid in the DoA domain. Furthermore, data-driven approaches require the search grid to be fixed before the training phase, because its structure cannot be altered once training has begun. Hence, to maximize localization performance, the structure of the search grid has to be defined carefully.

Although the search grid is an important aspect of the SL process, only a few studies have discussed how the grid points should be generated and distributed to represent the search space. The majority of SL algorithms rely on uniform regular grid structures3–7 because their generation is simple. Nunes et al. showed that the geometry of the microphone array is significantly correlated with localization performance.11 Building on this concept, Salvati et al. introduced a geometrically sampled grid structure under a near-field assumption.10 Lee et al. also introduced the pseudo-uniform grid (PUG) structure, in which the grid points are uniformly distributed in the TDoA domain in three-dimensional (3D) space under a far-field assumption.12 However, the number of grid points generated by these conventional algorithms is determined by the spatial resolution. Since the conventional approaches do not provide direct control over the number of search grid points, it is difficult to control the trade-off between localization performance and complexity. Moreover, the way in which the grid points are distributed may affect localization performance even when the number of grid points is the same.

In this letter, we propose a generic process for determining an optimal search grid structure in a far-field 3D scenario. The proposed algorithm generates an evenly distributed search grid, using an iterative update to control the grid distribution process. By utilizing a generic cost function, the proposed algorithm can generate search grids for specific applications, e.g., beamforming. Furthermore, the proposed algorithm provides direct control over the number of grid points, which corresponds to the output dimension of a data-driven model. Through simulations, we show that the proposed algorithm generates evenly distributed search grids in either the DoA or the TDoA domain when the user explicitly specifies the desired number of grid points.

In an M-channel microphone array with a small aperture, the relationship between DoA and TDoA is described as

$$\tau_m(\mathbf{p}) = \frac{\varrho_m}{c}\bigl(\cos(\phi-\varphi_m)\cos\theta\cos\vartheta_m + \sin\theta\sin\vartheta_m\bigr), \tag{1}$$

where $m \in \{1, 2, \ldots, \binom{M}{2}\}$ is the microphone pair index, p = [ϕ, θ] is the DoA of the source described by its azimuth and elevation angles, [φm, ϑm] are the azimuth and elevation angles of the displacement vector of the mth microphone pair, ϱm is the distance between the two microphones of the mth pair, and c is the speed of sound. In search grid-based SL algorithms, the source location is estimated as the one of N predefined grid points that is most likely to have generated the observed TDoA. Hence, the structure of the predefined search grid affects both the computational cost and the localization performance.
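Equation (1) is the projection of a pair's displacement vector onto the unit vector pointing toward the DoA, scaled by 1/c. The Python/NumPy sketch below evaluates it for an 8-channel, 10.5 cm uniform circular array like the one used in the simulations. The speed of sound (343 m/s), the horizontal placement of the array, the mic_j − mic_i sign convention for pair displacements, and all function names are assumptions made for illustration.

```python
import itertools

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value of c


def uca_positions(num_mics=8, diameter=0.105):
    """(x, y, z) coordinates of a horizontal uniform circular array (assumed layout)."""
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    radius = diameter / 2.0
    return np.stack([radius * np.cos(angles), radius * np.sin(angles), np.zeros(num_mics)], axis=1)


def pair_geometry(mics):
    """Distance rho_m and displacement azimuth/elevation for every pair i < j.

    The displacement of pair m is taken as mic_j - mic_i (assumed sign convention).
    """
    disp = np.array([mics[j] - mics[i] for i, j in itertools.combinations(range(len(mics)), 2)])
    rho = np.linalg.norm(disp, axis=1)
    azimuth = np.arctan2(disp[:, 1], disp[:, 0])
    elevation = np.arcsin(disp[:, 2] / rho)
    return rho, azimuth, elevation


def tdoa(phi, theta, rho, azimuth, elevation, c=SPEED_OF_SOUND):
    """Eq. (1): far-field TDoA in seconds of every pair for the DoA p = [phi, theta]."""
    return rho / c * (np.cos(phi - azimuth) * np.cos(theta) * np.cos(elevation)
                      + np.sin(theta) * np.sin(elevation))


mics = uca_positions()
rho, az, el = pair_geometry(mics)
tau = tdoa(np.deg2rad(30.0), np.deg2rad(20.0), rho, az, el)
print(tau.shape)  # (28,): one TDoA per microphone pair, C(8, 2) = 28
```

Because |τm| ≤ ϱm/c, the longest pair (the array diameter) bounds every TDoA at about 0.31 ms under the assumed c.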

The PUG structure12 defines search grids based on the criterion that the desired TDoA variation

$$\hat{\tau} = \mathbb{E}\bigl[\,\lvert\tau_m(\mathbf{p}_i) - \tau_m(\mathbf{p}_j)\rvert\,\bigr], \tag{2}$$

is constant for all neighboring grid points pi and pj, where the subscripts i and j denote grid point indices. To simplify the problem, the elevation and azimuth angles are treated as independent axes in the grid distribution process, as follows:

$$\tilde{N} = \sum_{k=1}^{N_\theta} N_\phi(\theta_k) + 1, \tag{3}$$

where Ñ, k, Nθ, and Nϕ(θk) denote the number of distributed grid points, the elevation grid index, the number of elevation grids, and the number of azimuth grids for a given elevation grid, respectively. Then the desired TDoA variation between adjacent grid points can be described as

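$$\hat{\tau} = \int_{\theta_i}^{\theta_{i+1}} \eta_\theta(\phi_i,\theta)\,d\theta = \int_{\phi_j}^{\phi_{j+1}} \eta_\phi(\phi,\theta_j)\,d\phi, \tag{4}$$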

where ηϕ(ϕ, θ) and ηθ(ϕ, θ) denote the TDoA sensitivity functions with respect to the azimuth and elevation angles, respectively. An analytical derivation shows that the number of grid points at elevation angle $\theta_k = \cos^{-1}(k/N_\theta)$ is

$$N_\phi(\theta_k) = \bigl\lfloor 2\pi k \bigr\rceil, \tag{5}$$

where $\lfloor\cdot\rceil$ denotes the rounding operator. Since Ñ is not directly related to τ̂, it is difficult to ensure that Ñ equals the desired number of grid points N without heuristic post-processing. As seen in Table 1, Ñ resulting from Eqs. (3) and (5) can take only certain values.

Table 1.

Number of grid points, Ñ, of the PUG structure for various numbers of elevation grid points, Nθ, based on Eqs. (3) and (5).

Nθ   1   2    3    4    5    6     7     8     9     10    11    12    13    14    15
Ñ    7   20   39   64   95   133   177   227   284   347   416   491   573   661   755
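Equations (3) and (5) can be checked numerically. The Python/NumPy sketch below (function names are hypothetical) computes the ring sizes, the ring elevations θk = cos⁻¹(k/Nθ), and Ñ, and it reproduces the Ñ values listed in Table 1. We read the extra +1 in Eq. (3) as a single point at the zenith; that reading is our interpretation, not a statement from the text.

```python
import numpy as np


def pug_ring_sizes(n_theta):
    """Eq. (5): azimuth points per elevation ring, N_phi(theta_k) = round(2*pi*k)."""
    k = np.arange(1, n_theta + 1)
    return np.rint(2.0 * np.pi * k).astype(int)


def pug_elevations(n_theta):
    """Ring elevations theta_k = arccos(k / N_theta) in radians; k = N_theta is the horizon."""
    k = np.arange(1, n_theta + 1)
    return np.arccos(k / n_theta)


def pug_count(n_theta):
    """Eq. (3): total number of PUG grid points (the +1 read as an assumed zenith point)."""
    return int(pug_ring_sizes(n_theta).sum()) + 1


print([pug_count(n) for n in range(1, 16)])
```

Because Ñ jumps by roughly 2πNθ from one Nθ to the next, a target such as N = 100 cannot be met exactly; this is the lack of direct control the proposed method addresses.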

Although PUG provides a uniform grid structure in the TDoA domain, the heuristic approach in Eq. (3) limits the tractability of the algorithm. In this section, a generic uniform grid (GUG) structure, which does not need a heuristic assumption and provides better tractability, is introduced.

Following the Thomson problem, N nodes on a unit sphere can be distributed uniformly by minimizing the electrostatic potential energy of N unit point charges:

$$J = \sum_{i=1}^{N}\sum_{j=i+1}^{N} \frac{1}{\lVert \mathbf{r}(\mathbf{p}_i) - \mathbf{r}(\mathbf{p}_j)\rVert_2}, \tag{6}$$

where $\mathbf{r}(\mathbf{p}) = [\cos\phi\cos\theta,\ \sin\phi\cos\theta,\ \sin\theta]^T$ is the unit vector pointing in the direction p. The cost J can be minimized with a simple iterative update rule13 

$$\mathbf{p}_i^{t+1} = \mathbf{p}_i^{t} - \mu\,\nabla_{\mathbf{p}_i^{t}} J^{t}, \tag{7}$$

where t is the iteration index, μ is the learning rate, and

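$$\nabla_{\mathbf{p}_i^{t}} J^{t} = \left[\frac{\partial J^{t}}{\partial \phi_i},\ \frac{\partial J^{t}}{\partial \theta_i}\right] \tag{8}$$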

is the analytical gradient of the cost function with respect to the azimuth and elevation angles. The iterative update is stopped when

$$\frac{\lvert J^{t+1} - J^{t}\rvert}{J^{t}} \le J_{\mathrm{th}} \tag{9}$$

for some threshold Jth. Since the iterative update is initialized with N randomly placed nodes, the number of grid points to be distributed can be controlled directly.
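The update of Eqs. (6)–(9) can be sketched as plain gradient descent on the angles. In this Python/NumPy sketch, the uniform random initialization on the sphere, the seed, the iteration cap, and the function names are our assumptions; μ = 0.01 and Jth = 10⁻⁶ mirror the values quoted later for the simulations. The gradient is derived from Eq. (6) with the chain rule through r(p).

```python
import numpy as np


def thomson_cost_and_grad(phi, theta):
    """Eq. (6) cost J and its gradient [dJ/dphi_i, dJ/dtheta_i] of Eq. (8)."""
    cp, sp, ct, st = np.cos(phi), np.sin(phi), np.cos(theta), np.sin(theta)
    r = np.stack([cp * ct, sp * ct, st], axis=1)               # r(p_i), shape (N, 3)
    dr_dphi = np.stack([-sp * ct, cp * ct, np.zeros_like(phi)], axis=1)
    dr_dtheta = np.stack([-cp * st, -sp * st, ct], axis=1)
    diff = r[:, None, :] - r[None, :, :]                        # r(p_i) - r(p_j)
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)                              # exclude j == i
    cost = 0.5 * np.sum(1.0 / dist)                             # each pair counted once
    w = 1.0 / dist**3                                           # d(1/d)/dx = -(diff . dr/dx) / d^3
    g_phi = -np.sum(w * np.einsum("ijk,ik->ij", diff, dr_dphi), axis=1)
    g_theta = -np.sum(w * np.einsum("ijk,ik->ij", diff, dr_dtheta), axis=1)
    return cost, g_phi, g_theta


def thomson_grid(n, mu=0.01, j_th=1e-6, max_iter=20000, seed=0):
    """Distribute n nodes on the unit sphere with the update of Eq. (7) and the stop rule of Eq. (9)."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-np.pi, np.pi, n)                         # random initial nodes,
    theta = np.arcsin(rng.uniform(-1.0, 1.0, n))                # uniform on the sphere
    cost, g_phi, g_theta = thomson_cost_and_grad(phi, theta)
    initial_cost = cost
    for _ in range(max_iter):
        phi, theta = phi - mu * g_phi, theta - mu * g_theta     # Eq. (7)
        new_cost, g_phi, g_theta = thomson_cost_and_grad(phi, theta)
        converged = abs(new_cost - cost) / cost <= j_th         # Eq. (9)
        cost = new_cost
        if converged:
            break
    return phi, theta, initial_cost, cost


phi, theta, j_start, j_end = thomson_grid(12)
print(f"J: {j_start:.3f} -> {j_end:.3f}")
```

The argument n is the only knob that sets the grid size, which is the direct control over N described above.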

Equation (1) implies that the space over which the search grid must be distributed can be restricted to the surface of the unit sphere, which allows Eq. (6) to be used for search grid generation. However, for microphone arrays with certain geometries, e.g., planar arrays, the search space becomes asymmetric because of the cone-of-confusion issue. In such an asymmetric search space, the equilibrium of forces is reached when the nodes are pushed from the center toward the outer region, which disrupts the uniform separation between neighboring nodes. Furthermore, since the cost function disregards the array geometry, the resulting grid is uniform only in the DoA domain, not in the TDoA domain.
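One concrete form of this ambiguity for planar arrays follows directly from Eq. (1): when every pair lies in the horizontal plane (ϑm = 0), the elevation enters only through cos θ, so a source at (ϕ, θ) and its mirror image at (ϕ, −θ) produce identical TDoAs. The NumPy sketch below illustrates this with a hypothetical horizontal 8-channel circular array and, for contrast, the same array with one hypothetical microphone raised 5 cm above the plane; c = 343 m/s is assumed.

```python
import itertools

import numpy as np

C = 343.0  # assumed speed of sound in m/s


def tdoa_vector(mics, phi, theta, c=C):
    """Eq. (1) written as a projection: tau_m = (mic_j - mic_i) . r(p) / c for every pair i < j."""
    r = np.array([np.cos(phi) * np.cos(theta), np.sin(phi) * np.cos(theta), np.sin(theta)])
    disp = np.array([mics[j] - mics[i] for i, j in itertools.combinations(range(len(mics)), 2)])
    return disp @ r / c


# Hypothetical planar (z = 0) uniform circular array: 8 mics, 10.5 cm diameter.
ang = 2.0 * np.pi * np.arange(8) / 8
planar = np.stack([0.0525 * np.cos(ang), 0.0525 * np.sin(ang), np.zeros(8)], axis=1)
# Hypothetical non-planar variant: a 9th microphone 5 cm above the array center.
raised = np.vstack([planar, [0.0, 0.0, 0.05]])

phi, theta = np.deg2rad(40.0), np.deg2rad(25.0)
above = tdoa_vector(planar, phi, theta)
below = tdoa_vector(planar, phi, -theta)
print(np.allclose(above, below))  # True: mirrored DoAs give identical TDoAs
print(np.allclose(tdoa_vector(raised, phi, theta), tdoa_vector(raised, phi, -theta)))  # False
```

For such an array, only one hemisphere is distinguishable, so the effective search space is a half-sphere rather than the full symmetric sphere assumed by Eq. (6).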

To overcome these limitations, we introduce a generalized cost function

$$J_K = \sum_{i=1}^{N}\sum_{j=i+1}^{N} \frac{1}{d(\mathbf{p}_i,\mathbf{p}_j)^{K}}, \tag{10}$$

where d(pi, pj) is the distance function and K ≥ 1 is a weighting factor. The distance functions in the DoA and TDoA domains are defined as

$$d_r(\mathbf{p}_i,\mathbf{p}_j) = \lVert \mathbf{r}(\mathbf{p}_i) - \mathbf{r}(\mathbf{p}_j)\rVert_2 \tag{11}$$

and

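$$d_\tau(\mathbf{p}_i,\mathbf{p}_j) = \lVert \boldsymbol{\tau}(\mathbf{p}_i) - \boldsymbol{\tau}(\mathbf{p}_j)\rVert_2, \tag{12}$$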

respectively, where $\boldsymbol{\tau}(\mathbf{p}) = [\tau_1(\mathbf{p}), \ldots, \tau_{\binom{M}{2}}(\mathbf{p})]^T$ and τm(p) is given in Eq. (1). The analytical gradients of Eq. (10) with respect to ϕi and θi for the distance functions of Eqs. (11) and (12) are

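$$\frac{\partial J_{r,K}}{\partial \phi_i} = K\sum_{j=1,\,j\neq i}^{N} \frac{\sin\phi_i\cos\theta_i\,\Delta x_{ij} - \cos\phi_i\cos\theta_i\,\Delta y_{ij}}{\left(\Delta x_{ij}^2 + \Delta y_{ij}^2 + \Delta z_{ij}^2\right)^{(K+2)/2}}, \tag{13a}$$
$$\frac{\partial J_{r,K}}{\partial \theta_i} = K\sum_{j=1,\,j\neq i}^{N} \frac{\cos\phi_i\sin\theta_i\,\Delta x_{ij} + \sin\phi_i\sin\theta_i\,\Delta y_{ij} - \cos\theta_i\,\Delta z_{ij}}{\left(\Delta x_{ij}^2 + \Delta y_{ij}^2 + \Delta z_{ij}^2\right)^{(K+2)/2}} \tag{13b}$$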

and

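$$\frac{\partial J_{\tau,K}}{\partial \phi_i} = \frac{K}{c}\sum_{j=1,\,j\neq i}^{N} \frac{\sum_{m=1}^{\binom{M}{2}}\varrho_m\,\Delta\tau_m(\mathbf{p}_i,\mathbf{p}_j)\,\sin(\phi_i-\varphi_m)\cos\theta_i\cos\vartheta_m}{\left(\sum_{m=1}^{\binom{M}{2}}\Delta\tau_m(\mathbf{p}_i,\mathbf{p}_j)^2\right)^{(K+2)/2}}, \tag{14a}$$
$$\frac{\partial J_{\tau,K}}{\partial \theta_i} = \frac{K}{c}\sum_{j=1,\,j\neq i}^{N} \frac{\sum_{m=1}^{\binom{M}{2}}\varrho_m\,\Delta\tau_m(\mathbf{p}_i,\mathbf{p}_j)\bigl(\cos(\phi_i-\varphi_m)\sin\theta_i\cos\vartheta_m - \cos\theta_i\sin\vartheta_m\bigr)}{\left(\sum_{m=1}^{\binom{M}{2}}\Delta\tau_m(\mathbf{p}_i,\mathbf{p}_j)^2\right)^{(K+2)/2}}, \tag{14b}$$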

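respectively, where $\Delta x_{ij} = \cos\phi_i\cos\theta_i - \cos\phi_j\cos\theta_j$, $\Delta y_{ij} = \sin\phi_i\cos\theta_i - \sin\phi_j\cos\theta_j$, $\Delta z_{ij} = \sin\theta_i - \sin\theta_j$, and $\Delta\tau_m(\mathbf{p}_i,\mathbf{p}_j) = \tau_m(\mathbf{p}_i) - \tau_m(\mathbf{p}_j)$.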
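Equations (10)–(14) share one structure: each node has an embedding f(p), namely r(p) in the DoA domain and τ(p) in the TDoA domain; d is the Euclidean distance between embeddings; and the chain rule gives −K d^(−(K+2)) (f(pi) − f(pj)) · ∂f(pi)/∂ϕi, which becomes Eq. (13a) or (14a) once the dot product is expanded. The NumPy sketch below implements both domains in one hypothetical function, cost_and_grad, so the analytic gradients can be checked against finite differences. The array geometry and c = 343 m/s are assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)


def cost_and_grad(phi, theta, K, pairs=None, c=C):
    """J_K of Eq. (10) with its gradient.

    pairs=None          -> DoA domain, d = d_r of Eq. (11), gradient of Eqs. (13a)-(13b)
    pairs=(rho, az, el) -> TDoA domain, d = d_tau of Eq. (12), gradient of Eqs. (14a)-(14b)
    """
    cp, sp, ct, st = np.cos(phi), np.sin(phi), np.cos(theta), np.sin(theta)
    if pairs is None:
        # Embedding f = r(p) and its partial derivatives, shape (N, 3).
        f = np.stack([cp * ct, sp * ct, st], axis=1)
        df_dphi = np.stack([-sp * ct, cp * ct, np.zeros_like(phi)], axis=1)
        df_dtheta = np.stack([-cp * st, -sp * st, ct], axis=1)
    else:
        # Embedding f = tau(p) from Eq. (1) and its partial derivatives, shape (N, P).
        rho, az, el = pairs
        dphi = phi[:, None] - az[None, :]
        f = rho / c * (np.cos(dphi) * ct[:, None] * np.cos(el) + st[:, None] * np.sin(el))
        df_dphi = -rho / c * np.sin(dphi) * ct[:, None] * np.cos(el)
        df_dtheta = rho / c * (-np.cos(dphi) * st[:, None] * np.cos(el) + ct[:, None] * np.sin(el))
    diff = f[:, None, :] - f[None, :, :]           # Delta x/y/z (DoA) or Delta tau_m (TDoA)
    sq = np.sum(diff**2, axis=2)                   # squared distance d(p_i, p_j)^2
    np.fill_diagonal(sq, np.inf)                   # removes the j == i terms
    cost = 0.5 * np.sum(sq ** (-K / 2.0))          # each unordered pair counted once
    w = -K * sq ** (-(K + 2) / 2.0)                # chain rule on (d^2)^(-K/2)
    g_phi = np.sum(w * np.einsum("ijk,ik->ij", diff, df_dphi), axis=1)
    g_theta = np.sum(w * np.einsum("ijk,ik->ij", diff, df_dtheta), axis=1)
    return cost, g_phi, g_theta


# Hypothetical geometry: horizontal 8-channel circular array, 10.5 cm diameter.
ang = 2.0 * np.pi * np.arange(8) / 8
mics = np.stack([0.0525 * np.cos(ang), 0.0525 * np.sin(ang), np.zeros(8)], axis=1)
i_idx, j_idx = np.triu_indices(8, k=1)
disp = mics[j_idx] - mics[i_idx]
rho = np.linalg.norm(disp, axis=1)
pairs = (rho, np.arctan2(disp[:, 1], disp[:, 0]), np.arcsin(disp[:, 2] / rho))

rng = np.random.default_rng(1)
phi = rng.uniform(-np.pi, np.pi, 6)
theta = rng.uniform(0.1, 1.2, 6)
j_doa, g_phi_doa, g_theta_doa = cost_and_grad(phi, theta, K=4)
j_tdoa, g_phi_tdoa, g_theta_tdoa = cost_and_grad(phi, theta, K=4, pairs=pairs)
```

Plugging either gradient into the update of Eq. (7) turns this into a GUG-DoA or GUG-TDoA generator; only the embedding changes between the two domains.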

The parameter K in Eq. (10) weights the repulsion force according to the distance between two nodes pi and pj. Figure 1(a) depicts how the node distribution varies with K for 11 nodes, p1 through p11, distributed on a line over the range x ∈ [−5, 5], where the circles and the ×'s denote the results for K = 1 and K = 8, respectively. The nodes are clearly distributed more evenly when K = 8. The actual distance between neighboring nodes is depicted in Fig. 1(b), which confirms that the spacing between neighboring nodes is much more uniform for K = 8 (dashed-dotted line) than for K = 1 (solid line).
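The effect of K in Fig. 1 can be reproduced qualitatively with a 1-D analogue of Eq. (10). In this Python sketch, the 11 nodes are confined to [−5, 5] by box bounds, and SciPy's L-BFGS-B optimizer (`scipy.optimize.minimize` with `method="L-BFGS-B"`) is swapped in for the fixed-step update of Eq. (7) because the K = 8 cost is stiff. The bounds handling, starting point, and helper names are assumptions, not the setup used to produce Fig. 1.

```python
import numpy as np
from scipy.optimize import minimize


def line_cost(x, K):
    """1-D analogue of Eq. (10): J_K = sum_{i<j} |x_i - x_j|^-K, returned with its gradient."""
    diff = x[:, None] - x[None, :]
    dist = np.abs(diff)
    np.fill_diagonal(dist, np.inf)                  # drop the i == j terms
    cost = 0.5 * np.sum(dist ** (-K))               # each pair counted once
    # d/dx_i |x_i - x_j|^-K = -K * sign(x_i - x_j) * |x_i - x_j|^-(K+1)
    grad = np.sum(-K * np.sign(diff) * dist ** (-(K + 1)), axis=1)
    return cost, grad


def spread_on_line(n=11, K=1, lo=-5.0, hi=5.0):
    """Minimize the 1-D cost with the nodes kept inside [lo, hi]; returns sorted positions."""
    x0 = np.linspace(0.9 * lo, 0.9 * hi, n)         # ordered start strictly inside the interval
    res = minimize(line_cost, x0, args=(K,), jac=True, method="L-BFGS-B", bounds=[(lo, hi)] * n)
    return np.sort(res.x)


for K in (1, 8):
    gaps = np.diff(spread_on_line(K=K))
    print(K, gaps.round(3), round(float(gaps.std()), 4))
```

With K = 1 the gaps should shrink toward the ends of the interval, whereas with K = 8 they should stay close to the uniform value of 1: a larger K makes the repulsion short-ranged, so each node is balanced mainly by its two nearest neighbors.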

Fig. 1.

(Color online) (a) Distribution of 11 nodes on a line x ∈ [−5, 5] using Eq. (10) and (b) corresponding distance between neighboring nodes for K = 1 and K = 8.


To quantify the evenness of the search grids, the mean and standard deviation of the great-circle central angle, defined as

$$\Delta\sigma = \arccos\bigl(\cos(\phi_i-\phi_j)\cos\theta_i\cos\theta_j + \sin\theta_i\sin\theta_j\bigr), \tag{15}$$

and of the norm defined in Eq. (12) were measured over all pairs of neighboring grid points, with neighbors obtained by Delaunay triangulation. The search grids generated by the PUG algorithm are compared with those generated by the proposed algorithm when uniformly distributed in the TDoA (GUG-TDoA) and DoA (GUG-DoA) domains. An 8-channel uniform circular array with a 10.5 cm diameter was used for the simulations. The desired number of grid points for all three structures was chosen from Table 1 with Nθ ∈ {6, 7, 8, 9, 10, 11, 12}. Although the proposed algorithm allows any natural number N, values matching those of the PUG structure were selected to ensure a fair comparison. For the iterative update of the proposed algorithm, K = 4, μ = 0.01, and Jth = 10⁻⁶ were used. The distributions obtained from Eqs. (15) and (12) for each grid setting are depicted in Figs. 2(a) and 2(b), respectively, where the markers denote the mean and the vertical lines denote the 1σ range of each distribution. Since a smaller standard deviation implies more uniformly distributed grid points, the grids obtained from GUG-DoA are clearly much more uniformly distributed in the DoA domain, whereas those obtained from PUG and GUG-TDoA are more uniformly distributed in the TDoA domain. Furthermore, the figures show that the proposed GUG-TDoA structure is more evenly distributed than the conventional PUG structure in both the DoA and TDoA domains.
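The evenness measure can be sketched as follows. For points on the unit sphere, the spherical Delaunay triangulation coincides with the convex hull of the vectors r(p), so SciPy's `scipy.spatial.ConvexHull` yields the neighboring pairs, and Eq. (15) is evaluated on each edge. The hemisphere option, which discards the flat base facets of an upper-hemisphere grid whose lowest points lie on one horizontal ring, is our assumption about how such grids would be handled; the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def neighbor_pairs(phi, theta, hemisphere=False):
    """Neighboring grid points (i, j), i < j, from a spherical Delaunay triangulation.

    For points on the unit sphere, the convex hull of r(p) is the spherical Delaunay
    triangulation. With hemisphere=True, facets whose outward normal points straight
    down are skipped: their edges are chords across the base disk, not neighbors.
    """
    r = np.stack([np.cos(phi) * np.cos(theta), np.sin(phi) * np.cos(theta), np.sin(theta)], axis=1)
    hull = ConvexHull(r)
    edges = set()
    for tri, eq in zip(hull.simplices, hull.equations):
        if hemisphere and eq[2] < -1.0 + 1e-6:      # outward normal ~ (0, 0, -1)
            continue
        a, b, c = (int(v) for v in tri)
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))
    return np.array(sorted(edges))


def great_circle_stats(phi, theta, hemisphere=False):
    """Mean and standard deviation (degrees) of Eq. (15) over all neighboring pairs."""
    i, j = neighbor_pairs(phi, theta, hemisphere).T
    cos_sigma = (np.cos(phi[i] - phi[j]) * np.cos(theta[i]) * np.cos(theta[j])
                 + np.sin(theta[i]) * np.sin(theta[j]))
    sigma = np.degrees(np.arccos(np.clip(cos_sigma, -1.0, 1.0)))  # clip guards rounding error
    return sigma.mean(), sigma.std()
```

The TDoA-variation norm of Eq. (12) can be evaluated on the same edge list by replacing the arccos expression with the Euclidean distance between the TDoA vectors of the two endpoints.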

Fig. 2.

(Color online) Distribution of (a) great circle distance and (b) TDoA variation norm measured between neighboring grid points for PUG, GUG-TDoA, and GUG-DoA.


In addition, the 3D SL performance was evaluated for all three grid structures using the steered response power with phase transform (SRP-PHAT) introduced in Ref. 3 and the DNN-based localization scheme introduced in Ref. 8. The reverberant target signals were generated by convolving speech signals from the TIMIT database with room impulse responses synthesized using the image-source method,14 and were subsequently mixed with diffuse noise.15 The detailed conditions of the mixtures are described in Table 2. Localization performance was evaluated by measuring the error between the true and estimated source locations in terms of Eqs. (15) and (12). The averaged errors, listed in Table 3, show that the performance variation between PUG and GUG-TDoA is relatively small compared with that between the PUG and GUG-DoA structures. This is expected, since both PUG and GUG-TDoA aim to distribute the search grid points evenly in the TDoA domain, whereas GUG-DoA does so in the DoA domain. The localization performance is also much better, and the differences between the conventional and proposed search grid structures are more evident, when DNN-based localization is used instead of SRP-PHAT. Finally, the results in the table show that the GUG-DoA structure is suited to DoA estimation, particularly with DNN-based localization, whereas the PUG and GUG-TDoA structures are suited to TDoA estimation.

Table 2.

Environmental conditions used to generate the training and test data. All rooms are assumed to be 3 m high, and the distance denotes the distance between the source and the center of the array.

Condition       Training data                        Test data
Speech          45 000 utterances                    12 960 utterances
Room size (m)   small: 4 × 6                         small: 4 × 6, 4 × 7
                medium: 6 × 10                       medium: 6 × 10, 5 × 10
                large: 10 × 14                       large: 10 × 14, 12 × 12
Distance        near: 1.5 m                          near: 1.5 m
                far: 2 m, 3 m, 5 m                   far: 2 m, 3 m, 5 m
T60             0.3, 0.5, and 0.7 s                  0.3, 0.5, 0.7, and 1.0 s
Noise types     Babble, white, pink, and volvo       White and HFNoise
SNR             0 to 20 dB in 5 dB increments        0, 10, and 20 dB

Table 3.

Overall localization error for each grid structure, measured in the DoA and TDoA domains.

Error type             Localizer   PUG      GUG-TDoA   GUG-DoA
DoA error (degrees)    SRP-PHAT    11.110   11.101     11.217
                       DNN         6.970    6.741      6.432
TDoA error (samples)   SRP-PHAT    1.766    1.790      1.833
                       DNN         1.024    0.988      1.172

In this letter, we proposed a generic search grid generation algorithm that distributes grid points evenly over the search space. Unlike conventional grid generation algorithms, the generalized cost function provides direct control over both the number of grid points and the domain in which they are distributed. Hence, application-specific search grids can be generated by choosing an appropriate cost function. The simulation results reveal that localization performance in the desired domain can be improved by selecting an appropriate search grid. Search grids uniformly distributed in the TDoA domain show similar or slightly better performance than the conventional PUG algorithm in all cases, whereas grids uniformly distributed in the DoA domain outperform the conventional PUG algorithm in the DoA domain when DNN-based localization is used.

This work was supported and funded by NAVER Corp.

1. Y. Huang, J. Benesty, G. W. Elko, and R. M. Mersereau, “Real-time passive source localization: A practical linear-correction least-squares approach,” IEEE Trans. Speech Audio Process. 9(8), 943–956 (2001).
2. S. J. Spencer, “Closed-form analytical solutions of the time difference of arrival source location problem for minimal element monitoring arrays,” J. Acoust. Soc. Am. 127(5), 2943–2954 (2010).
3. J. H. DiBiase, “A high accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays,” Ph.D. thesis, Brown University, Providence, RI, 2000.
4. S. Zhao, T. Saluev, and D. L. Jones, “Underdetermined direction of arrival estimation using acoustic vector sensor,” Signal Process. 100, 160–168 (2014).
5. A. Marti, M. Cobos, J. J. Lopez, and J. Escolano, “A steered response power iterative method for high-accuracy acoustic source localization,” J. Acoust. Soc. Am. 134(4), 2627–2630 (2013).
6. J. P. Dmochowski, J. Benesty, and S. Affes, “A generalized steered response power method for computationally viable source localization,” IEEE Trans. Audio, Speech, Lang. Process. 15(8), 2510–2526 (2007).
7. L. O. Nunes, W. A. Martins, M. V. S. Lima, L. W. P. Biscainho, M. V. M. Costa, F. M. Goncalves, A. Said, and B. Lee, “A steered-response power algorithm employing hierarchical search for acoustic source localization using microphone arrays,” IEEE Trans. Signal Process. 62(19), 5171–5183 (2014).
8. X. Xiao, S. Zhao, X. Zhong, D. L. Jones, E. S. Chng, and H. Li, “A learning-based approach to direction of arrival estimation in noisy and reverberant environments,” in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Queensland (2015), pp. 2814–2818.
9. R. Takeda and K. Komatani, “Sound source localization based on deep neural networks with directional activate function exploiting phase information,” in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Shanghai, China (2016), pp. 405–409.
10. D. Salvati, C. Drioli, and G. L. Foresti, “Exploiting a geometrically sampled grid in the steered response power algorithm for localization improvement,” J. Acoust. Soc. Am. 141(1), 586–601 (2017).
11. L. O. Nunes, W. A. Martins, M. V. S. Lima, L. W. P. Biscainho, B. Lee, A. Said, and R. W. Schafer, “Discriminability measure for microphone array source localization,” in International Workshop on Acoustic Signal Enhancement, Aachen, Germany (2012), pp. 1–4.
12. J. Lee, S. W. Chung, M. S. Choi, and H. G. Kang, “A study on search grid points for data-driven 3-D beamsteering,” in Hands-Free Speech Communication and Microphone Arrays, San Francisco, CA (March 1–3, 2017), pp. 6–10.
13. M. Thomas, “Fast computation of cubature formulae for the sphere,” in Hands-Free Speech Communication and Microphone Arrays, San Francisco, CA (March 1–3, 2017), pp. 201–205.
14. J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am. 65, 943–950 (1979).
15. E. A. P. Habets, I. Cohen, and S. Gannot, “Generating nonstationary multisensor signals under a spatial coherence constraint,” J. Acoust. Soc. Am. 124(5), 2911–2917 (2008).