In this letter, a generic search grid generation algorithm for far-field source localization (SL) is proposed. Since conventional uniform regular grid structures are parameterized only by the resolution of the distribution, it is difficult to control the number of grid points to be distributed. The proposed algorithm generates a search grid by distributing a desired number of points evenly, depending on the target criterion, in either the direction of arrival domain or the time difference of arrival domain. The experimental results show that the proposed algorithm provides evenly distributed grid points for a given number of desired points and a given domain for SL processing.

## 1. Introduction

Recent developments in voice-activated user interfaces require acquiring a desired speech signal from a distance. Since the acquired signal is corrupted by coherent interference and diffuse background noise in this distant-target scenario, the desired speech signal must be estimated using various enhancement techniques. Microphone array techniques are well suited to removing both coherent directional interference and diffuse noise, and their overall enhancement performance depends on the localization accuracy of the desired speech signal. Toward this end, numerous source localization (SL) techniques have been investigated over the past several decades.^{1–9}

One way to classify SL algorithms is by whether the location is derived directly from the time difference of arrival (TDoA) or through a grid search. TDoA-based SL schemes^{1,2} provide fast localization, but search grid-based SL algorithms^{3–7} yield better localization performance. To provide a feasible and robust solution, data-driven SL algorithms utilizing deep neural networks (DNNs) have recently been introduced.^{8,9} Since a DNN is highly effective in modeling the non-linear relationship between input and output data, it has been shown to deliver excellent localization performance in adverse environments^{8} even with small-aperture arrays.

The overall localization performance of search grid-based SL algorithms varies significantly depending on the structure of the search grid.^{10} Search grids with a higher resolution provide more accurate localization but require more computational power. For instance, the complexity of the steered response power^{3} algorithm is directly proportional to the number of search grid points. Increasing the number of grid points indefinitely, however, may not improve localization accuracy because of the discriminability issue.^{11} Also, the structure of the search grid should be chosen based on the application. For instance, search grids uniformly distributed in the TDoA domain are more suitable than those uniformly distributed in the direction of arrival (DoA) domain for beamforming applications.^{12} On the other hand, for applications such as audio rendering and reproduction, it would be better to optimize search grids in the DoA domain. Furthermore, data-driven approaches require that the search grid be determined prior to the training phase, and its structure cannot be altered once the training has begun. Hence, in order to maximize the efficiency of the localization performance, the structure of the search grid has to be carefully defined.

Although the search grid is an important aspect of the SL process, only a few studies have discussed how the grids should be generated and distributed to represent the search space. The majority of SL algorithms rely on uniform regular grid structures^{3–7} because the grid generation process is simple. Nunes *et al.* showed that the geometry of the microphone array is significantly correlated to the localization performance.^{11} Using this concept, Salvati *et al.* introduced the geometrically sampled grid structure under a near-field assumption scenario.^{10} Lee *et al.* also introduced the pseudo-uniform grid (PUG) structure, where the grid points are uniformly distributed in the TDoA domain in three-dimensional (3D) space under a far-field assumption scenario.^{12} However, the number of grid points generated by the conventional algorithms is defined by the spatial resolution. Since the conventional approaches do not provide direct control over the number of search grid points to be distributed, it is difficult to control the trade-off between localization performance and complexity for these algorithms. In addition, the way the search grid points are distributed may affect the localization performance, even when the same number of grid points is distributed.

In this letter, we propose a generic process to determine an optimal search grid structure in a far-field 3D space scenario. The proposed algorithm generates an evenly distributed search grid structure, where an iterative update is used to control the grid distribution process. By utilizing a generic cost function for the grid distribution process, the proposed algorithm enables generating search grids for a specific application, e.g., beamforming. Furthermore, the proposed algorithm provides direct control over the number of grid points to be distributed, which is equivalent to the output dimension of the data-driven model. Through simulation results, we show that the proposed algorithm generates evenly distributed search grids in either the DoA or the TDoA domain under a constraint where the user explicitly defines the number of desired grid points.

## 2. Conventional uniform search grid generation in TDoA domain

In an *M*-channel microphone array with a small aperture, the relationship between DoA and TDoA is described as

where $m \in \{1, 2, \ldots, C_{2}^{M}\}$ is the microphone pair index, **p** = [*ϕ*, *θ*] is the DoA of the source described in azimuth and elevation angles, [*φ*_{m}, *ϑ*_{m}] is the azimuth and elevation angles of the displacement of the *m*th microphone pair, and *ϱ*_{m} is the distance between the *m*th microphone pair. In search grid-based SL algorithms, the location of the source is one of the *N* predefined grid points, which is most likely to generate the observed TDoA. Hence, the structure of predefined search grids affects both the computational cost and the localization performance.
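The DoA-to-TDoA relationship described above can be sketched in Python as follows. This is only an illustration under the standard far-field plane-wave model, in which the TDoA of a pair is its spacing times the cosine of the angle between the pair displacement and the DoA, divided by the speed of sound. The constant `SPEED_OF_SOUND`, the function names, the sign convention, and the result in seconds rather than samples are assumptions of this sketch and are not taken from Eq. (1). The 8-channel, 10.5 cm circular array mirrors the one used in Sec. 4.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not stated in the text


def circular_array(num_mics, diameter):
    """Microphone positions of a uniform circular array lying in the x-y plane."""
    radius = diameter / 2.0
    return [
        (radius * math.cos(2.0 * math.pi * n / num_mics),
         radius * math.sin(2.0 * math.pi * n / num_mics),
         0.0)
        for n in range(num_mics)
    ]


def pair_geometry(mics):
    """Return (spacing, azimuth, elevation) of the displacement of every mic pair."""
    pairs = []
    for a, b in combinations(mics, 2):
        dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
        rho = math.sqrt(dx * dx + dy * dy + dz * dz)
        pairs.append((rho, math.atan2(dy, dx), math.asin(dz / rho)))
    return pairs


def far_field_tdoa(p, pair):
    """TDoA in seconds of one pair for the DoA p = (azimuth, elevation) in radians."""
    az, el = p
    rho, pair_az, pair_el = pair
    # Inner product of the pair and source unit vectors, written in spherical angles.
    cosine = (math.cos(pair_el) * math.cos(el) * math.cos(az - pair_az)
              + math.sin(pair_el) * math.sin(el))
    return rho * cosine / SPEED_OF_SOUND


mics = circular_array(num_mics=8, diameter=0.105)
pairs = pair_geometry(mics)
p = (math.radians(30.0), math.radians(45.0))
tdoas = [far_field_tdoa(p, pair) for pair in pairs]
print(len(tdoas))                        # 28 pairs for M = 8
print(max(abs(t) for t in tdoas) * 1e6)  # largest TDoA magnitude in microseconds
```

With such a small aperture, every TDoA is bounded by the array diameter divided by the speed of sound, which is why the resolution available in the TDoA domain is limited and the placement of grid points matters.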

The PUG structure^{12} defines search grids based on the criterion that the desired TDoA variation

is constant for all neighboring grid points **p**_{i} and **p**_{j}, where the subscripts *i* and *j* denote the grid point indices. To simplify the problem, the elevation and the azimuth angles are regarded as independent axes in the grid distribution process as follows:

where $\tilde{N}$, *k*, *N*_{θ}, and *N*_{ϕ}(*θ*_{k}) denote the number of distributed grid points, the elevation grid index, the number of elevation grids, and the number of azimuth grids for a given elevation grid, respectively. Then the desired TDoA variation between adjacent grid points can be described as

where *η*_{ϕ}(*ϕ*, *θ*) and *η*_{θ}(*ϕ*, *θ*) denote the TDoA sensitivity functions with respect to azimuth and the elevation angles. Analytical derivation shows that the number of grid points at the elevation angle $\theta_{k} = \cos^{-1}(k/N_{\theta})$ is

where $\lfloor \cdot \rceil$ denotes the rounding operator. Since $\tilde{N}$ is not directly related to $\hat{\tau}$, it is difficult to ensure that $N = \tilde{N}$ without any heuristic post-processing. As seen in Table 1, $\tilde{N}$, resulting from Eqs. (3) and (5), can only accommodate certain values.

## 3. Proposed generic search grid generation algorithm

Although PUG provides a uniform grid structure in the TDoA domain, the heuristic approach in Eq. (3) limits the tractability of the algorithm. In this section, a generic uniform grid (GUG) structure, which does not need a heuristic assumption and provides better tractability, is introduced.

The Thomson problem states that *N* nodes on a unit sphere can be uniformly distributed by minimizing the following cost function:

where $\mathbf{r}(\mathbf{p}) = [\cos(\phi)\cos(\theta),\ \sin(\phi)\cos(\theta),\ \sin(\theta)]^{T}$. The cost *J* can be minimized with a simple iterative update rule^{13}

where *t* is the iteration index, *μ* is the learning rate, and

is the analytical gradient of the cost function with respect to the azimuth and elevation angles. The criterion to stop the iterative update is given as

for some threshold value *J*_{th}. Since the iterative update process is initialized with *N* randomly placed nodes, it is possible to directly control the number of grid points to be distributed.
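The iterative update described above can be sketched as a plain gradient descent over the angles of each node. The sketch below assumes the classical Thomson form of the cost, i.e., the sum of inverse Euclidean distances between all pairs of nodes **r**(**p**_{i}), and it assumes that the update stops once the decrease of *J* between iterations falls below *J*_{th}; neither form is confirmed by the equations shown here. The step-halving safeguard and all function names are additions made for this example.

```python
import math
import random


def to_xyz(p):
    """Map a DoA p = (azimuth, elevation) to a point on the unit sphere."""
    az, el = p
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def thomson_cost(points):
    """Sum of inverse Euclidean distances over all node pairs (assumed form of J)."""
    xyz = [to_xyz(p) for p in points]
    return sum(
        1.0 / math.dist(xyz[i], xyz[j])
        for i in range(len(xyz))
        for j in range(i + 1, len(xyz))
    )


def thomson_gradient(points):
    """Analytical gradient of the cost with respect to each azimuth and elevation."""
    xyz = [to_xyz(p) for p in points]
    grads = []
    for i, (az, el) in enumerate(points):
        # Partial derivatives of r(p_i) with respect to azimuth and elevation.
        d_az = (-math.sin(az) * math.cos(el), math.cos(az) * math.cos(el), 0.0)
        d_el = (-math.cos(az) * math.sin(el), -math.sin(az) * math.sin(el), math.cos(el))
        g_az = g_el = 0.0
        for j, other in enumerate(xyz):
            if j == i:
                continue
            diff = [xyz[i][k] - other[k] for k in range(3)]
            inv_cube = 1.0 / math.dist(xyz[i], other) ** 3
            g_az -= inv_cube * sum(diff[k] * d_az[k] for k in range(3))
            g_el -= inv_cube * sum(diff[k] * d_el[k] for k in range(3))
        grads.append((g_az, g_el))
    return grads


def distribute(num_points, mu=0.01, j_th=1e-6, max_iter=2000, seed=0):
    """Gradient descent from random nodes; returns the nodes and the cost history."""
    rng = random.Random(seed)
    points = [(rng.uniform(-math.pi, math.pi), math.asin(rng.uniform(-1.0, 1.0)))
              for _ in range(num_points)]
    history = [thomson_cost(points)]
    for _ in range(max_iter):
        grads = thomson_gradient(points)
        step = mu
        while True:
            trial = [(az - step * g_az, el - step * g_el)
                     for (az, el), (g_az, g_el) in zip(points, grads)]
            cost = thomson_cost(trial)
            if cost <= history[-1] or step < 1e-12:
                break
            step *= 0.5  # safeguard (an addition): shrink the step if the cost rose
        if cost > history[-1]:
            break
        points = trial
        history.append(cost)
        if history[-2] - history[-1] < j_th:  # assumed stopping rule
            break
    return points, history


nodes, costs = distribute(num_points=12)
print(len(nodes), costs[0], costs[-1])
```

Because the number of nodes is fixed at initialization and never changes during the update, the size of the resulting grid is exactly the value requested by the caller. Elevations may drift outside [−π/2, π/2] during the update; they still describe valid points on the sphere but would need wrapping before being reported as DoAs.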

Equation (1) implies that the space over which the search grid has to be distributed can be limited to the surface of the unit sphere, which allows Eq. (6) to be utilized for the search grid generation process. However, for microphone arrays with certain geometries, e.g., planar arrays, the search space becomes asymmetrically shaped due to the cone of confusion issue. In the asymmetrical shape condition, the equilibrium of forces occurs when the nodes are pushed from the center toward the outer region, which changes the uniform separation between the neighboring nodes. Furthermore, since the geometry of the array is disregarded in the cost function, the resulting grid is uniform only in the DoA domain, not the TDoA domain.

To overcome these limitations, we introduce a generalized cost function

where *d*(**p**_{i}, **p**_{j}) is the distance function and *K* ≥ 1 is a weighting factor. The distance functions in the DoA and the TDoA domain are defined as

and

respectively, where $\boldsymbol{\tau}(\mathbf{p}) = [\tau_{1}(\mathbf{p})\ \cdots\ \tau_{C_{2}^{M}}(\mathbf{p})]^{T}$, with *τ*_{m}(**p**) given in Eq. (1). The analytical gradients of Eq. (10) with respect to *ϕ* and *θ* for the distance functions of Eqs. (11) and (12) are

and

respectively, where $\Delta x_{ij} = \cos(\phi_{i})\cos(\theta_{i}) - \cos(\phi_{j})\cos(\theta_{j})$, $\Delta y_{ij} = \sin(\phi_{i})\cos(\theta_{i}) - \sin(\phi_{j})\cos(\theta_{j})$, $\Delta z_{ij} = \sin(\theta_{i}) - \sin(\theta_{j})$, and $\Delta\tau_{m}(\mathbf{p}_{i}, \mathbf{p}_{j}) = \tau_{m}(\mathbf{p}_{i}) - \tau_{m}(\mathbf{p}_{j})$.
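To make the role of the distance function concrete, the sketch below shows one way a cost with a pluggable distance could be organized. It assumes that the DoA-domain distance is the Euclidean norm of **r**(**p**_{i}) − **r**(**p**_{j}), that the TDoA-domain distance is the Euclidean norm of the stacked TDoA difference, and that the cost sums the distances raised to the power −*K*. These forms are consistent with the Δ terms defined above but are not confirmed by the equations shown here. The square array, the speed of sound, and all function names are hypothetical.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def unit_vector(p):
    """r(p) for a DoA p = (azimuth, elevation) in radians."""
    az, el = p
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def tdoa_vector(p, mics):
    """Stacked far-field TDoAs (seconds) of all microphone pairs, plane-wave model."""
    u = unit_vector(p)
    return [
        sum((a[k] - b[k]) * u[k] for k in range(3)) / SPEED_OF_SOUND
        for a, b in combinations(mics, 2)
    ]


def doa_distance(p_i, p_j):
    """Euclidean distance between the two DoA unit vectors."""
    return math.dist(unit_vector(p_i), unit_vector(p_j))


def make_tdoa_distance(mics):
    """Build a distance function that compares stacked TDoA vectors."""
    def tdoa_distance(p_i, p_j):
        return math.dist(tdoa_vector(p_i, mics), tdoa_vector(p_j, mics))
    return tdoa_distance


def generalized_cost(points, distance, K=4):
    """Sum of distance**(-K) over all node pairs (assumed form of the cost)."""
    return sum(
        distance(points[i], points[j]) ** (-K)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )


# Hypothetical planar array: four microphones on a 10 cm square in the x-y plane.
square = [(0.05, 0.05, 0.0), (-0.05, 0.05, 0.0), (-0.05, -0.05, 0.0), (0.05, -0.05, 0.0)]
tdoa_distance = make_tdoa_distance(square)

above = (0.3, 0.5)   # a DoA above the array plane
below = (0.3, -0.5)  # its mirror image below the plane
print(doa_distance(above, below))   # clearly separated as directions
print(tdoa_distance(above, below))  # practically zero for a planar array

grid = [(0.0, 0.2), (2.0, 0.6), (4.0, 1.0)]
print(generalized_cost(grid, doa_distance), generalized_cost(grid, tdoa_distance))
```

The mirrored pair illustrates the cone of confusion for planar arrays: the two directions are far apart in the DoA domain yet produce the same TDoA vector, so a TDoA-domain cost would treat them as a single location. Swapping the `distance` argument is the only change needed to move between the two domains in this sketch.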

The parameter *K* in Eq. (10) weights the repulsion force according to the distance between two nodes, **p**_{i} and **p**_{j}. Figure 1(a) depicts the variation of the node distribution for different values of *K*; it shows the distribution of 11 nodes, **p**_{1} through **p**_{11}, on a line with a range of *x* ∈ [−5, 5], where the circles and the ×'s denote results for *K* = 1 and *K* = 8, respectively. It is clear that the nodes are distributed more evenly when *K* = 8. The actual distance between neighboring nodes is depicted in Fig. 1(b), which verifies that the distance between neighboring nodes is much more uniform when *K* = 8 than when *K* = 1, as depicted by the dashed-dotted and solid lines, respectively.
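The effect of *K* in the one-dimensional example can be reproduced qualitatively with a few lines of Python. The sketch assumes a cost of the form of a sum of |*x*_{i} − *x*_{j}|^{−*K*} over all node pairs and assumes that the two end nodes are clamped to the boundaries of the range; both are guesses about the setup behind Fig. 1 rather than details stated in the text. The nodes start from equal spacing purely to keep the fixed step size numerically stable, and the step size and iteration count are arbitrary choices.

```python
import math
import statistics


def spread_on_line(num_nodes=11, lo=-5.0, hi=5.0, power=1, mu=2e-3, iters=10000):
    """Gradient descent on sum_{i<j} |x_i - x_j|**(-power) with clamped end nodes."""
    x = [lo + (hi - lo) * n / (num_nodes - 1) for n in range(num_nodes)]
    for _ in range(iters):
        grad = [0.0] * num_nodes
        for i in range(1, num_nodes - 1):  # end nodes stay on the boundary
            for j in range(num_nodes):
                if j != i:
                    d = x[i] - x[j]
                    # d/dx_i of |d|**(-power) = -power * sign(d) * |d|**(-power - 1)
                    grad[i] -= power * math.copysign(abs(d) ** (-power - 1), d)
        x = [xi - mu * gi for xi, gi in zip(x, grad)]
    return x


def gap_std(x):
    """Standard deviation of the gaps between neighboring nodes."""
    gaps = [b - a for a, b in zip(x, x[1:])]
    return statistics.pstdev(gaps)


x_k1 = spread_on_line(power=1)
x_k8 = spread_on_line(power=8)
print(gap_std(x_k1), gap_std(x_k8))
```

With a small exponent, distant nodes still exert a noticeable push, so the interior nodes drift toward the clamped ends and the gaps become uneven. With a large exponent, the nearest neighbors dominate and the gaps stay nearly equal.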

## 4. Performance evaluation

To demonstrate the evenness of the search grids, the mean and the standard deviation of the central angle of great circle distance, defined as

and the norm defined in Eq. (12) were measured for all neighboring grid points, obtained using Delaunay triangulation. The search grids generated using the PUG algorithm are compared to those generated using the proposed algorithms, uniformly distributed in the TDoA (GUG-TDoA) and the DoA (GUG-DoA) domains. An 8-channel uniform circular array with a 10.5 cm diameter was used for the simulations. The desired number of grid points for all three structures was chosen from the information presented in Table 1, where *N*_{θ} ∈ {6, 7, 8, 9, 10, 11, 12}. Although the proposed algorithm allows any natural value for *N*, values matching those of a PUG structure were selected to ensure a fair comparison. For the iterative update process of the proposed algorithm, *K* = 4, *μ* = 0.01, and *J*_{th} = 10^{–6} were used. The distributions obtained from Eqs. (15) and (12) for each grid setting are depicted in Figs. 2(a) and 2(b), respectively, where the markers in the center denote the mean and the vertical lines denote the 1*σ* range of each distribution. Since it can be said that a smaller *σ* implies more uniformly distributed grid points, it is clear that the grids obtained from GUG-DoA are much more uniformly distributed in the DoA domain whereas those obtained from PUG and GUG-TDoA are more uniformly distributed in the TDoA domain. Furthermore, it can be noted from the figures that the proposed GUG-TDoA structure is more evenly distributed than the conventional PUG structure in both DoA and TDoA domains.
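The evenness measure in the DoA domain can be illustrated with a short Python sketch. It assumes the central angle is computed with the spherical law of cosines, which is a standard formula but is not necessarily the exact form of Eq. (15). The neighbor list is written by hand for a six-point octahedral grid instead of being derived from a Delaunay triangulation, and the function names are hypothetical.

```python
import math
import statistics


def central_angle(p_i, p_j):
    """Central angle (radians) of the great-circle arc between two DoAs."""
    az_i, el_i = p_i
    az_j, el_j = p_j
    cos_angle = (math.sin(el_i) * math.sin(el_j)
                 + math.cos(el_i) * math.cos(el_j) * math.cos(az_i - az_j))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against rounding


def evenness(points, neighbor_pairs):
    """Mean and standard deviation of the central angle over neighboring pairs."""
    angles = [central_angle(points[i], points[j]) for i, j in neighbor_pairs]
    return statistics.mean(angles), statistics.pstdev(angles)


# Octahedral grid: four nodes on the equator plus the two poles.
half_pi = math.pi / 2
grid = [(0.0, 0.0), (half_pi, 0.0), (math.pi, 0.0), (-half_pi, 0.0),
        (0.0, half_pi), (0.0, -half_pi)]
# The 12 edges of the octahedron serve as the neighbor relation.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
edges += [(4, n) for n in range(4)] + [(5, n) for n in range(4)]

mean_angle, std_angle = evenness(grid, edges)
print(math.degrees(mean_angle), math.degrees(std_angle))
```

Every edge of the octahedron subtends a 90° central angle, so the mean is 90° and the standard deviation is zero up to rounding, which is the signature of a perfectly even grid under this measure. Less regular grids yield a larger standard deviation.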

In addition, the 3D SL performance was evaluated using the steered response power with phase transform (SRP-PHAT) introduced in Ref. 3 and the DNN-based localization scheme introduced in Ref. 8 for all three grid structures. The reverberant target signals were generated by convolving speech signals from the TIMIT database with room impulse responses synthesized using the image-source method,^{14} and were subsequently mixed with diffuse noise.^{15} The detailed conditions of the mixture are described in Table 2. The localization results were evaluated by measuring the error between the true and estimated source locations using Eqs. (15) and (12). The averaged error results, displayed in Table 3, show that the performance variation between PUG and GUG-TDoA is relatively small compared to that between the PUG and GUG-DoA structures. This is an expected result since both PUG and GUG-TDoA aim to distribute search grid points evenly in the TDoA domain, whereas GUG-DoA aims to do so in the DoA domain. Also, the localization performance is much better, and the performance variations between the conventional and proposed search grid structures are more evident, when DNN-based localization is used instead of SRP-PHAT. Finally, the localization results in the table show that the GUG-DoA structure is suitable for DoA estimation whereas the PUG and GUG-TDoA structures are suitable for TDoA estimation.

Table 2. Conditions of the mixtures used for training and testing.

Condition | Training data | Test data
---|---|---
Speech | 45 000 utterances | 12 960 utterances
Room size (m) | small 4 × 6; medium 6 × 10; large 10 × 14 | small 4 × 6, 4 × 7; medium 6 × 10, 5 × 10; large 10 × 14, 12 × 12
Distance, near | 1.5 m | 1.5 m
Distance, far | 2 m (small), 3 m (medium), 5 m (large) | 2 m (small), 3 m (medium), 5 m (large)
T60 | 0.3, 0.5, and 0.7 s | 0.3, 0.5, 0.7, and 1.0 s
Noise types | Babble, white, pink, and volvo | White and HFNoise
SNR | 0 to 20 dB, 5 dB increment | 0, 10, and 20 dB


Table 3. Averaged localization errors for each search grid structure.

Type | Localization method | PUG | GUG-TDoA | GUG-DoA
---|---|---|---|---
DoA error (degrees) | SRP-PHAT | 11.110 | 11.101 | 11.217
DoA error (degrees) | DNN | 6.970 | 6.741 | 6.432
TDoA error (samples) | SRP-PHAT | 1.766 | 1.790 | 1.833
TDoA error (samples) | DNN | 1.024 | 0.988 | 1.172


## 5. Conclusion

In this letter, we proposed a generic search grid generation algorithm that distributes the grid points evenly over the search space. Unlike conventional grid generation algorithms, the generalization of the cost function provides direct control over the number of grid points to be distributed and the domain in which the grids are distributed. Hence, it is possible to generate application-specific search grids by choosing an appropriate cost function. The simulation results reveal that localization performance in the desired domain can be improved by selecting appropriate search grids. The uniformly distributed search grids in the TDoA domain show similar or slightly enhanced performance compared to the conventional PUG algorithm for all cases, whereas uniformly distributed grids in the DoA domain outperform the conventional PUG algorithm in the DoA domain when DNN-based localization is used.

## Acknowledgments

This work was supported and funded by NAVER Corp.