Time-delay estimation (TDE), which measures the relative time delay between different receivers, is a fundamental approach for identifying, localizing, and tracking radiating sources. The generalized cross-correlation method is the most popular and is well explained in a landmark paper by Knapp and Carter [(1976). IEEE Trans. Acoust. Speech Signal Process. 24(4), 320–327]. Adaptive eigenvalue decomposition (EVD)-based algorithms have also been developed to improve TDE performance, especially in reverberant environments. This paper extends the adaptive EVD algorithm to exploit the sparsity of the transfer channel between the source and the receivers. Two estimation algorithms based on log-sum and lp-norm penalized minor component analysis by excitatory and inhibitory learning rules are proposed. Simulations with uncorrelated noise, correlated noise, and reverberation at several signal-to-noise ratios show the improved estimation performance.
I. INTRODUCTION
Time delay estimation (TDE), which estimates the relative time difference of arrival (TDOA) among spatially separated sensors, has played an important role in radar, sonar, and seismology for localizing radiating sources. Nowadays, TDE on a microphone array is applied to localize and track acoustic sources in a room environment for applications such as automatic camera tracking for videoconferencing and microphone array beam steering for suppressing noise and reverberation in various communication and voice processing systems. TDE with only two receivers is also actively used in many practical applications such as humanoid robotics (Murray et al., 2004; Ferreira et al., 2009; Trifa et al., 2007) and hearing aids (May et al., 2011). More recent works using just two receivers include studies by Cobos et al. (2011), Zhang and Rao (2010), and Cobos and Lopez (2010).
Among the many TDE methods, the generalized cross-correlation (GCC) method, proposed by Knapp and Carter (1976), is the most popular. The delay estimate is obtained as the time lag that maximizes the cross-correlation between filtered versions of the received signals. Since then, many new ideas have been proposed to mitigate noise and reverberation. Benesty (2000) developed an adaptive eigenvalue decomposition (EVD) algorithm for the blind estimation of the two acoustic impulse responses from the source to the two microphones. The adaptive EVD algorithm for TDE performs better in reverberant environments than GCC-based methods. Doclo and Moonen (2003) extended the adaptive EVD algorithm for TDE to the spatiotemporally colored noise case using an adaptive generalized eigenvalue decomposition (GEVD) algorithm.
In practice, we often encounter sparse impulse responses, in which a small percentage of the components have a significant magnitude while the rest are zero or small (Hansler, 2006). Sparse impulse responses are encountered in many applications: network and acoustic echo cancellation, feedback cancellation in hearing aids, blind identification of acoustic impulse responses for time delay estimation, and source localization. Many recent works address sparse channel estimation using the l1-norm penalty (Donoho, 2006; Li et al., 2017; Berger et al., 2010). These algorithms achieve good performance in sparse channel estimation applications (Bajwa et al., 2010; Taheri and Vorobyov, 2011; Lim and Pang, 2016b).
In this paper, we propose a new TDE algorithm to improve the adaptive EVD-based algorithm in Benesty (2000). The proposed algorithm exploits the sparsity of the propagation channel. To deal with sparsity, we reformulate the objective function from Benesty (2000) in Rayleigh quotient form and add the log-sum penalty. We also reformulate the objective function with the lp-norm penalty. In the simulations, we evaluate the two proposed TDE algorithms by comparing them with two existing algorithms, GCC and the adaptive EVD-based TDE. This paper is organized as follows. Section II discusses two models for the TDE problem. In Sec. III, the adaptive EVD algorithm for TDE is summarized, as previously proposed in Benesty (2000). In Secs. IV and V, the two proposed algorithms are developed. Section VI gives simulation results and comparisons of the different algorithms.
II. MODELS FOR THE TDE PROBLEM
This section presents two models often used for the TDE problem. First, the “ideal model” is described, followed by the “real model,” which more accurately describes a real acoustic environment (Benesty, 2000).
A. Ideal model
A simple and widely used signal model for the classical TDE problem is as follows. Let $x_i(k)$, $i = 1, 2$, denote the ith receiver signal; then
$x_i(k) = \alpha_i s(k - \tau_i) + n_i(k)$, (1)
where αi is the attenuation factor due to propagation effects, τi is the propagation time from the unknown source s(k) to receiver i, and $n_i(k)$ is an additive noise signal at the ith receiver. It is assumed that s(k), $n_1(k)$, and $n_2(k)$ are zero-mean, uncorrelated, and stationary Gaussian random processes. The relative delay between the two received signals 1 and 2 is defined as
$\tau_{12} = \tau_1 - \tau_2$. (2)
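The ideal model can be illustrated with a minimal sketch that generates two delayed, noisy copies of a white Gaussian source and reads the relative delay off the peak of the plain (unweighted) time-domain cross-correlation. The function names, the noise level, and the seed are assumptions made for this illustration, and the unweighted correlation stands in for the filtered GCC variants discussed in the text.

```python
import random


def simulate_ideal_model(n, tau1, tau2, alpha1=1.0, alpha2=1.0,
                         noise_std=0.1, seed=0):
    """Generate x_i(k) = alpha_i * s(k - tau_i) + n_i(k) for i = 1, 2."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white Gaussian source

    def delayed(tau, k):
        # s(k - tau), taken as zero outside the simulated record
        return s[k - tau] if 0 <= k - tau < n else 0.0

    x1 = [alpha1 * delayed(tau1, k) + rng.gauss(0.0, noise_std)
          for k in range(n)]
    x2 = [alpha2 * delayed(tau2, k) + rng.gauss(0.0, noise_std)
          for k in range(n)]
    return x1, x2


def xcorr_delay(x1, x2, max_lag):
    """Return the lag maximizing sum_k x1(k) * x2(k - lag).

    Under the ideal model this lag estimates tau1 - tau2.
    """
    n = len(x1)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x1[k] * x2[k - lag] for k in range(n) if 0 <= k - lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


x1, x2 = simulate_ideal_model(n=500, tau1=12, tau2=2)
print(xcorr_delay(x1, x2, max_lag=20))
```

With propagation times of 12 and 2 samples the relative delay is ten samples, which is the value the correlation peak is expected to recover at this noise level.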
B. Real model
In a real acoustic environment we must consider the reverberation of the room; therefore, the ideal model no longer holds. A more complicated but more complete model for the received signals $x_i(k)$, $i = 1, 2$, can be expressed as follows:
$x_i(k) = g_i * s(k) + n_i(k)$, (3)
where * denotes convolution and gi is the channel impulse response between the source s(k) and the ith receiver. Moreover, $n_1(k)$ and $n_2(k)$ might be correlated, which is the case when the noise is directional, e.g., from a ceiling fan or an overhead projector (Benesty, 2000; Rubo et al., 2011; Netsch and Stachurski, 2014). For example, $n_1(k)$ is correlated with $n_2(k)$ when both originate from the same noise source. In this case, Eq. (3) can be rewritten as
where τsignal is the time delay between signals in receiver 1 and receiver 2, and τnoise is the time delay between noises in receiver 1 and receiver 2.
III. ADAPTIVE EVD ALGORITHM
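Benesty (2000) utilized the following relationship between the received signals 1 and 2:
$\mathbf{x}_1^T(k)\,\mathbf{g}_2 = \mathbf{x}_2^T(k)\,\mathbf{g}_1$, (5)
where $\mathbf{x}_i(k) = [x_i(k), x_i(k-1), \ldots, x_i(k-M+1)]^T$, i = 1, 2, is the vector of signal samples at the microphone outputs, “T” denotes the transpose of a vector or a matrix, and the impulse response vectors of length M are defined as $\mathbf{g}_i = [g_{i,0}, g_{i,1}, \ldots, g_{i,M-1}]^T$, i = 1, 2.
This linear relation follows from the fact that, neglecting noise, $x_i = s * g_i$, i = 1, 2, thus $x_1 * g_2 = s * g_1 * g_2 = x_2 * g_1$. The covariance matrix of the two microphone signals is
$\mathbf{R} = \begin{bmatrix} \mathbf{R}_{x_1 x_1} & \mathbf{R}_{x_1 x_2} \\ \mathbf{R}_{x_2 x_1} & \mathbf{R}_{x_2 x_2} \end{bmatrix}$, (6)
where $\mathbf{R}_{x_i x_j} = E\{\mathbf{x}_i(k)\,\mathbf{x}_j^T(k)\}$. Consider the vector $\mathbf{u} = [\mathbf{g}_2^T, -\mathbf{g}_1^T]^T$.
From Eqs. (5) and (6), it can be seen that $\mathbf{R}\mathbf{u} = \mathbf{0}$, which means that the vector $\mathbf{u}$ (containing the two impulse responses) is the eigenvector of the covariance matrix $\mathbf{R}$ corresponding to the eigenvalue 0. Moreover, if the two impulse responses $\mathbf{g}_1$ and $\mathbf{g}_2$ have no common zeros and the autocorrelation matrix of the source signal s(k) is full rank, which is assumed in the rest of this paper, the covariance matrix $\mathbf{R}$ has only one eigenvalue equal to zero (Benesty, 2000).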
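The cross relation behind this zero eigenvalue can be checked numerically. The following sketch, with made-up channel taps and a random source (all names and values are illustrative assumptions), convolves one source with two short impulse responses and verifies that the stacked vector [g2, −g1] is orthogonal to every stacked data vector in the noise-free case.

```python
import random


def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_relation_residuals(s, g1, g2):
    """Return u^T x(k) for all valid k, with u = [g2, -g1] (noise-free)."""
    M = len(g1)  # both channels are assumed to have the same length M
    x1, x2 = convolve(s, g1), convolve(s, g2)
    u = list(g2) + [-g for g in g1]
    residuals = []
    for k in range(M - 1, len(x1)):
        # x_i(k) = [x_i(k), x_i(k-1), ..., x_i(k-M+1)]
        x1_vec = [x1[k - j] for j in range(M)]
        x2_vec = [x2[k - j] for j in range(M)]
        x = x1_vec + x2_vec
        residuals.append(sum(ui * xi for ui, xi in zip(u, x)))
    return residuals


rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
g1 = [1.0, 0.5, 0.0, 0.2]   # hypothetical channel to receiver 1
g2 = [0.0, 0.8, 0.3, 0.0]   # hypothetical channel to receiver 2
res = cross_relation_residuals(source, g1, g2)
print(max(abs(r) for r in res))  # zero up to floating-point rounding
```

Because every residual vanishes, the sample covariance built from these stacked vectors also maps u to zero, which is the property the adaptive algorithm exploits.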
In practice, accurate estimation of the vector $\mathbf{u}$ is not trivial because of the nature of speech, the length of the impulse responses, the background noise, etc. However, for this application, we only need an efficient way to detect the direct paths of the two impulse responses. In the following, it is explained how this can be done. Benesty (2000) derived the objective function for the optimal vector as the minimization of the quantity $\mathbf{u}^T \mathbf{R} \mathbf{u}$ with respect to $\mathbf{u}$, subject to $\|\mathbf{u}\|^2 = 1$.
Benesty (2000) proposed a simple algorithm to iteratively estimate the eigenvector $\mathbf{u}$ (here corresponding to the minimum eigenvalue of $\mathbf{R}$) by using an algorithm similar to the Frost algorithm, which is a simple constrained least-mean-square (LMS) algorithm.
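As a rough sketch of this kind of unit-norm constrained LMS iteration, the code below performs one update of the form u ← (u − μ e x) / ‖u − μ e x‖ with e = uᵀx, and reads a delay from the dominant taps of the two halves of u. The update form is written from memory of the cited approach and should be treated as an assumption; the peak-picking rule (largest magnitude in each half) is a simplification introduced here, and both function names are hypothetical.

```python
import math


def evd_lms_step(u, x, mu):
    """One unit-norm constrained LMS update of the minor-eigenvector estimate.

    u  : current estimate of [g2, -g1] (length 2M, unit norm)
    x  : stacked data vector [x1(k), ..., x1(k-M+1), x2(k), ..., x2(k-M+1)]
    mu : step size
    Returns the renormalized estimate and the error signal e = u^T x.
    """
    e = sum(ui * xi for ui, xi in zip(u, x))
    v = [ui - mu * e * xi for ui, xi in zip(u, x)]
    norm = math.sqrt(sum(vi * vi for vi in v))  # assumed nonzero
    return [vi / norm for vi in v], e


def delay_from_u(u, M):
    """Read tau_12 = tau_1 - tau_2 from the dominant taps of g1 and g2."""
    g2_hat = u[:M]
    g1_hat = [-ui for ui in u[M:]]
    peak1 = max(range(M), key=lambda i: abs(g1_hat[i]))
    peak2 = max(range(M), key=lambda i: abs(g2_hat[i]))
    return peak1 - peak2


# Example: an exact solution with direct paths at taps 5 (g1) and 2 (g2).
M = 8
u = [0.0] * (2 * M)
u[2] = 0.5 ** 0.5
u[M + 5] = -(0.5 ** 0.5)
print(delay_from_u(u, M))
```

In a full simulation the step would be applied once per sample with the stacked data vector, and the delay would be read out after convergence.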
IV. LOG-SUM PENALIZED MINOR COMPONENT ANALYSIS BY EXCITATORY AND INHIBITORY LEARNING RULES (MCA EXIN) BASED TIME DELAY ESTIMATION
The minimization problem derived by Benesty (2000) can be formulated with the method of Lagrange multipliers
where, if .
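The above optimization problem can also be reformulated as the minimization of the Rayleigh quotient (Cirrincione and Cirrincione, 2010; Cichocki and Amari, 2002)
$J(\mathbf{u}) = \dfrac{\mathbf{u}^T \mathbf{R} \mathbf{u}}{\mathbf{u}^T \mathbf{u}}$, (8)
where $\mathbf{R}$ is the covariance matrix of the two microphone signals and $\mathbf{u}$ is the vector containing the two impulse responses.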
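To make the Rayleigh quotient formulation concrete, the sketch below runs plain batch gradient descent on uᵀRu / uᵀu for a small, known symmetric matrix and recovers the eigenvector of the smallest eigenvalue. This is a deterministic illustration only: the matrix, step size, and iteration count are assumptions, and the sample-by-sample MCA EXIN learning law of the cited works is not reproduced here.

```python
import math


def mat_vec(R, u):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(rij * uj for rij, uj in zip(row, u)) for row in R]


def rayleigh_quotient(R, u):
    """Return u^T R u / u^T u."""
    Ru = mat_vec(R, u)
    return sum(ui * rui for ui, rui in zip(u, Ru)) / sum(ui * ui for ui in u)


def minor_component(R, u0, mu=0.1, iters=500):
    """Gradient descent on the Rayleigh quotient; returns a unit-norm vector.

    The gradient of the quotient is (2 / u^T u) * (R u - r(u) u).
    """
    u = list(u0)
    for _ in range(iters):
        Ru = mat_vec(R, u)
        uu = sum(ui * ui for ui in u)
        r = sum(ui * rui for ui, rui in zip(u, Ru)) / uu
        u = [ui - mu * (2.0 / uu) * (rui - r * ui) for ui, rui in zip(u, Ru)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    return [ui / norm for ui in u]


R = [[3.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
u_min = minor_component(R, [1.0, 1.0, 1.0])
print(u_min, rayleigh_quotient(R, u_min))
```

For this diagonal matrix the iteration settles on the third coordinate axis, where the quotient equals the smallest eigenvalue, 1. The gradient is orthogonal to u, so the quotient form keeps the norm of u nearly constant without an explicit constraint.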
In general, the acoustic impulse response is sparse because a large fraction of its energy is concentrated in a small fraction of its duration (Hansler, 2006). We can exploit this sparsity to solve Eq. (8) more accurately.
In Lim and Pang (2016a), the authors proposed a time-recursive MCA EXIN with log-sum regularization (the so-called RZA-TLS EXIN) in order to estimate the coefficients of a sparse system,
where $u_i$ is the ith component of $\mathbf{u}$. Because the impulse response is sparse, the augmented vector of the impulse responses is also sparse. Therefore, for time delay estimation we can replace the objective function of Eq. (8) with that of Eq. (9).
For the update equation from Eq. (9), the steepest descent method and subgradient concept were used in Lim and Pang (2016a). The update equation is as follows:
where is the sub-gradient of the convex function (Chen et al., 2009) and .
where μ is the learning rate and . The third term in Eq. (11) acts as a reweighted zero attractor, which takes effect only on taps whose magnitudes are comparable to 1/ε; little shrinkage is exerted on the taps for which $|u_i| \gg 1/\varepsilon$ (Chen et al., 2009).
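The behavior of the reweighted zero attractor described above can be sketched in isolation. The function below follows the per-tap form ρ · sgn(u_i) / (1 + ε|u_i|) used in the reweighted zero-attracting LMS literature; whether it matches the third term of Eq. (11) symbol for symbol is an assumption, and the names `rho` and `eps` are chosen for this illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def reweighted_zero_attractor(u, rho, eps):
    """Per-tap shrinkage term rho * sgn(u_i) / (1 + eps * |u_i|).

    The term is subtracted from each tap, pulling small taps toward zero
    while leaving large taps almost untouched.
    """
    return [rho * sign(ui) / (1.0 + eps * abs(ui)) for ui in u]


taps = [0.01, 1.0, -1.0, 0.0]
shrink = reweighted_zero_attractor(taps, rho=1.0, eps=10.0)
print(shrink)
```

With ε = 10 the tap at 0.01 receives roughly ten times more shrinkage than the taps at ±1, and an exactly zero tap is left alone.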
The instantaneous approximation algorithm applied to Eq. (11) yields
V. lp-NORM PENALIZED MCA EXIN BASED TIME DELAY ESTIMATION
The log-sum penalty term in Eq. (9) acts as an approximation of the l0-norm penalty. We consider another penalty function that is closer to the l0-norm penalty: the lp-norm with $0 < p < 1$. The lp-norm has been introduced to LMS-based sparse channel estimation in Taheri and Vorobyov (2011), Taheri and Vorobyov (2014), and Wu and Tong (2013). In this case, the cost function becomes
where $\|\cdot\|_p$ stands for the lp-norm of a vector and γp is the weighting constant. By using the steepest descent method and the subgradient concept as in Eq. (10), the update equation is as follows:
where . Furthermore, we can set an upper bound on the last term in Eq. (14) in order to cope with an element of $\mathbf{u}$ approaching zero in a sparse channel impulse response. Then the update equation is modified as
where is a value for bounding the last term in Eq. (15) (Taheri and Vorobyov, 2011; Taheri and Vorobyov, 2014).
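The bounded lp-norm attractor can be sketched as follows. The per-tap form ρ_p · ‖u‖_p^(1−p) · sgn(u_i) / (ε_p + |u_i|^(1−p)) is written from the lp-norm LMS literature cited above and is an assumption about the last term of Eq. (15); the names `rho_p` and `eps_p` are hypothetical labels for the weighting and bounding constants.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lp_norm(u, p):
    """Return (sum_i |u_i|^p)^(1/p); for 0 < p < 1 this is a quasi-norm."""
    return sum(abs(ui) ** p for ui in u) ** (1.0 / p)


def lp_zero_attractor(u, p, rho_p, eps_p):
    """Bounded subgradient-style shrinkage term for the lp penalty.

    Without eps_p the denominator |u_i|^(1-p) would blow up as a tap
    approaches zero; eps_p caps the magnitude of the term.
    """
    scale = lp_norm(u, p) ** (1.0 - p)
    return [rho_p * scale * sign(ui) / (eps_p + abs(ui) ** (1.0 - p))
            for ui in u]


taps = [4.0, 0.0, -1.0]
print(lp_norm(taps, 0.5))
print(lp_zero_attractor(taps, p=0.5, rho_p=1.0, eps_p=1.0))
```

For these taps and p = 1/2 the quasi-norm is 9, so the common scale factor is 3; the smaller tap at −1 is shrunk more strongly than the tap at 4, and the zero tap receives no update.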
VI. SIMULATION AND COMPARISON
In this section, we compare the performance of different time delay estimation algorithms in three scenarios. The algorithms considered here are GCC (Knapp and Carter, 1976) and the adaptive EVD algorithm (Benesty, 2000), as well as the proposed log-sum penalized MCA EXIN-based time delay algorithm and the proposed lp-norm penalized MCA EXIN-based time delay algorithm.
In the first experiment, we consider two receiving sensors. In most practical applications, the source signals of interest are correlated. We therefore generated the source signal as a first-order autoregressive (AR) process, viz., , where w(k) is a white Gaussian process (So, 2001). Assuming that the signals propagate in free space, the second signal is a time-delayed version of the first with a delay of ten time steps. The two signals are then contaminated by two real white Gaussian noises, $n_1(k)$ and $n_2(k)$, respectively. We verified that the two noise sequences are mutually uncorrelated.
The signal sequences were scaled to obtain the desired signal-to-noise ratio (SNR) and added to the noise sequences as in Eq. (1) to form the sensor outputs $x_1(k)$ and $x_2(k)$. SNRs from approximately 20 dB down to −10 dB were considered, where SNR = .
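The scaling step can be sketched as below. Since the SNR definition is not legible in the text, the code assumes the common definition of SNR as the ratio of average signal power to average noise power, expressed in decibels; the function names are illustrative.

```python
import math


def power(x):
    """Average power of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """SNR in dB, assumed here to be 10*log10(signal power / noise power)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_to_snr(signal, noise, target_db):
    """Scale the signal so that it reaches target_db against the given noise."""
    gain = math.sqrt(power(noise) * 10.0 ** (target_db / 10.0) / power(signal))
    return [gain * v for v in signal]


signal = [1.0, -2.0, 3.0, -4.0]
noise = [0.5, -0.5, 0.5, -0.5]
scaled = scale_to_snr(signal, noise, target_db=-10.0)
sensor_output = [s + n for s, n in zip(scaled, noise)]
print(snr_db(scaled, noise))
```

Scaling the signal rather than the noise keeps the noise realization fixed across SNR values, which is how the sentence above describes the procedure.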
The sequences $x_1(k)$ and $x_2(k)$ were processed using the traditional GCC, the adaptive EVD-based method, and the two proposed algorithms. We set the step size to 0.001 and the filter length to 30 in the adaptive EVD-based algorithm and the proposed algorithms. For the lp-norm penalized algorithm, p is chosen to be 1/2 with and . The parameters of the log-sum penalized algorithm are set to and . In GCC, we apply the phase transform generalized cross-correlation (PHAT-GCC) algorithm with a fast Fourier transform (FFT) size of 2048 samples at a 16 kHz sampling rate.
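For readers unfamiliar with the PHAT weighting, a minimal sketch is given below. It uses a naive O(N²) discrete Fourier transform on a short frame so that it needs only the standard library; it is not the 2048-point FFT configuration used in the experiments, and the circular shift, frame length, and regularization constant are assumptions for illustration.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sgn = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sgn * 2j * cmath.pi * f * m / n)
               for m in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_delay(x1, x2):
    """Return d such that x2(k) is approximately x1(k - d), circularly.

    The cross-spectrum is normalized to unit magnitude (PHAT), so only
    phase information contributes to the correlation peak.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    phat = [c / (abs(c) + 1e-12) for c in cross]
    r = [v.real for v in dft(phat, inverse=True)]
    peak = max(range(n), key=lambda i: r[i])
    return peak if peak < n // 2 else peak - n  # map to a signed lag


rng = random.Random(3)
frame = [rng.gauss(0.0, 1.0) for _ in range(64)]
delayed = [frame[(k - 5) % 64] for k in range(64)]  # circular delay of 5
print(gcc_phat_delay(frame, delayed))
```

Because the PHAT weighting whitens the cross-spectrum, a pure delay produces a single sharp peak regardless of the source spectrum, which is why it is a common choice for correlated sources.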
We compared the root mean squared delay errors (RMSD) of the algorithms. All results are averages over 500 independent trials. Figure 1 shows the RMSD of the four algorithms. When SNR −5 dB, the proposed methods outperform the adaptive EVD method and the GCC. In particular, the lp-norm penalized algorithm is superior to the others. This result suggests that the lp-norm penalty handles sparsity better than the log-sum penalty. From the first experiment, we conclude that applying a sparsity penalty to the delay estimation algorithm improves its performance.
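The RMSD metric can be written out as a short helper. The sketch assumes the usual definition, the square root of the mean squared difference between the per-trial delay estimates and the true delay; the function name and the example values are illustrative and are not results from the paper.

```python
import math


def rmsd(estimates, true_delay):
    """Root mean squared delay error over a set of independent trials."""
    return math.sqrt(sum((d - true_delay) ** 2 for d in estimates)
                     / len(estimates))


# Four hypothetical trial outcomes around a true delay of 10 samples.
print(rmsd([10, 10, 12, 8], true_delay=10))
```

Two of the four trials miss by two samples, so the mean squared error is 2 and the RMSD is its square root.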
RMSD comparison in the uncorrelated additive noise environments (-○-: log-sum penalized algorithm, --: lp-norm penalized algorithm, -×-: GCC (Knapp and Carter, 1976), --: adaptive EVD method (Benesty, 2000)).
In the second experiment, the case of correlated noise was considered. This case frequently occurs when the noise is directional, such as from a ceiling fan or an overhead projector. We generated correlated noise as, i.e., , where is real white Gaussian noise. The other parameters for the algorithms are the same as those in the first experiment. We again computed the RMSD over 500 independent trials. Figure 2 shows that the two proposed estimators outperform the GCC and the adaptive EVD method. From these results, we conclude that adding the sparsity penalty term improves the performance of the adaptive EVD method.
RMSD comparison in the correlated additive noise environments (-○-: log-sum penalized algorithm, : lp-norm penalized algorithm, -×-: GCC (Knapp and Carter, 1976), --: adaptive EVD method (Benesty, 2000)).
In the third experiment, we simulate realistic reverberation conditions. We simulated a room with dimensions [5 m, 4 m, 2 m] and two different reverberation times, T60 = 250 ms and 1 s. The reverberation time T60 can be expressed as a function of the absorption coefficient γ of the walls, according to Eyring's formula (Doclo and Moonen, 2003; Everest, 2001)
$T_{60} = \dfrac{0.163\,V}{-S \ln(1 - \gamma)}$,
with V the volume of the room and S the total surface area of the room. As shown in Fig. 3, the room contains a microphone array with two omnidirectional microphones at positions [1 m, 1 m, 1 m] and [1.5 m, 1 m, 1 m], and a sound source at position [2 m, 2 m, 1.7 m]. Microphone 2 is set as the reference. The received signals, $x_i(k)$, are filtered versions of the source signal used in experiment 1, obtained using simulated acoustic impulse responses constructed by the image method (Lehmann and Johansson, 2008; Lehmann, 2008; Jarrett et al., 2011; De Sena et al., 2015) with a filter length . The exact time delay between the speech components is −12.18 samples, which has been obtained by a simple geometrical calculation for a sampling frequency of 16 kHz. Noise has been generated by considering uncorrelated white noise sources equally distributed over all directions. We have performed simulations using the adaptive EVD algorithm, GCC, and the two proposed algorithms for different SNRs (10 dB and 0 dB). In this experiment, we perform 100 independent trials.
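The geometrical calculation and the Eyring relation can be reproduced with a short sketch. Several values are assumptions rather than statements from the text: the speed of sound of 340 m/s and the 16 kHz sampling rate (together they reproduce the stated 12.18 samples), the sign convention of the delay (chosen so the result is negative with microphone 2 as reference), and the constant 0.163 in Eyring's formula.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def relative_delay_samples(source, mic_a, mic_b, fs, c=340.0):
    """Propagation delay to mic_a minus delay to mic_b, in samples."""
    return (distance(source, mic_a) - distance(source, mic_b)) * fs / c


def eyring_absorption(t60, volume, surface, k=0.163):
    """Invert T60 = k*V / (-S*ln(1 - gamma)) for the absorption gamma."""
    return 1.0 - math.exp(-k * volume / (surface * t60))


source = (2.0, 2.0, 1.7)
mic1, mic2 = (1.0, 1.0, 1.0), (1.5, 1.0, 1.0)
print(round(relative_delay_samples(source, mic2, mic1, fs=16000.0), 2))

room = (5.0, 4.0, 2.0)
volume = room[0] * room[1] * room[2]
surface = 2.0 * (room[0] * room[1] + room[0] * room[2] + room[1] * room[2])
for t60 in (0.25, 1.0):
    print(t60, eyring_absorption(t60, volume, surface))
```

The source is about 0.26 m closer to microphone 2 than to microphone 1, which corresponds to roughly 12 samples at 16 kHz. The longer reverberation time maps to a smaller absorption coefficient, as expected for more reflective walls.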
Chamber floor plan with the position of two microphones and one sound source.
Figures 4 and 5 show histograms of the estimated delays obtained with a pair of omnidirectional microphones for T60 = 250 ms and 1 s. Results for SNR = 10 dB are shown in Figs. 4(a)–4(h), and results for SNR = 0 dB are shown in Figs. 5(a)–5(h). The inverted triangle in the figures marks the true delay. In all cases, the two proposed algorithms maintain accurate estimates. The lp-norm penalized algorithm performs better than the log-sum penalized algorithm (labeled l1 MCA EXIN in the figures), especially in the low SNR case. From these results, we conclude that the two proposed algorithms are more accurate than GCC and the adaptive EVD algorithm, and that they improve the robustness of the adaptive EVD algorithm in reverberant environments.
(Color online) Comparison of TDE in SNR = 10 dB, (a) proposed l1 MCA EXIN in T60 = 250 ms, (b) proposed lp MCA EXIN in T60 = 250 ms, (c) GCC in T60 = 250 ms, (d) adaptive EVD in T60 = 250 ms, (e) proposed l1 MCA EXIN in T60 = 1 s, (f) proposed lp MCA EXIN in T60 = 1 s, (g) GCC in T60 = 1 s, (h) adaptive EVD in T60 = 1 s.
(Color online) Comparison of TDE in SNR = 0 dB, (a) proposed l1 MCA EXIN in T60 = 250 ms, (b) proposed lp MCA EXIN in T60 = 250 ms, (c) GCC in T60 = 250 ms, (d) adaptive EVD in T60 = 250 ms, (e) proposed l1 MCA EXIN in T60 = 1 s, (f) proposed lp MCA EXIN in T60 = 1 s, (g) GCC in T60 = 1 s, (h) adaptive EVD in T60 = 1 s.
VII. CONCLUSION
This paper proposed two new algorithms for time delay estimation. The algorithms estimate the time delay from the eigenvector associated with the minimum eigenvalue, which is estimated by log-sum penalized MCA EXIN and lp-norm penalized MCA EXIN, respectively. In the simulations, the two proposed algorithms were more robust than GCC and the adaptive EVD method in reverberant environments as well as in uncorrelated and correlated noise environments. These results provide a starting point, and further research is needed to validate the proposed algorithms by testing them in real reverberant facilities.
ACKNOWLEDGMENTS
This work was supported by the Agency for Defense Development (ADD) in Korea (UD160015DD).