Spatial post-filter for linear hydrophone arrays with applications to underwater source localisation

: This letter presents a spatial post-ﬁlter that can be employed in linear hydrophone arrays, commonly found in sonar systems, for the task of improving the bearing estimation and noise suppression capabilities of traditional beamformers. The proposed ﬁlter is computed in the time-frequency domain as the normalised cross-spectral density between two beamformed signals, which are generated by applying conventional beamforming to two adjacent non-overlapping sub-arrays. The evaluation on both simulated and real-world data demonstrates promising performance compared to other popular post-ﬁlters in some cases, especially for targets near the end-ﬁre direction and in the presence of uncorrelated interferers or diffuse noise. V C


Introduction
Underwater surveillance and submarine navigation rely predominantly on passive-sonar systems, which employ hydrophone arrays to visualise the surrounding soundfield on the horizontal plane.The main limitation of conventional soundfield visualisation techniques is that their spatial resolution (main-lobewidth) and background noise suppression capabilities are tightly linked to the array aperture and the number of hydrophones in the array.Therefore, improving their overall performance would typically require additional resources in the form of added hydrophones in the array.Modern signal processing techniques, such as adaptive beamformers and subspace methods, can achieve improved performance, under certain conditions, without the need to increase the number of hydrophones.However, these methods often lead to a significant increase in computational requirements, especially when the arrays comprise a large number of hydrophones, as is the case for contemporary sonar systems (Gentilho et al., 2020;Stergiopoulos, 2017).
To overcome such limitations, this paper proposes a post-filter based on the cross-pattern coherence (CroPaC) method (Delikaris-Manias and Pulkki, 2013;Pulkki et al., 2018).This method, which may be applied to the output of existing beamformer-based systems, is relatively inexpensive from a computational perspective and has recently shown promising performance for underwater soundfield visualisation applications in Bountourakis et al. (2021).However, since the original method operates in either the circular or the spherical harmonic domain, existing formulations are only directly applicable to circular and spherical arrays.This work aims to extend the method's applicability to linear arrays, which represent the most common type of arrays encountered in submarines, surface ships, and unmanned underwater vehicles (Maguer et al., 2008;Nicholson and Healey, 2008); hence, the proposed algorithm may be of high practical relevance to the underwater acoustics community.The novel formulation of CroPaC operates directly in the space (sensor) domain, i.e., without the need to transform the hydrophone signals into the circular or spherical harmonic domain, while largely retaining the desirable properties of the original method.

Related works
Underwater soundfields are typically visualised on the horizontal plane as spatial spectra which depict the beamformer energy over the steering direction.The most popular beamformer for sonar applications is the delay-and-sum (DaS) algorithm, commonly referred to as conventional beamforming (Carter, 1981;Knight et al., 1981;Quazi, 1981).The main reasons for its popularity pertain to its simplicity, robustness, and low computational requirements.The main downside of conventional beamforming, however, is that its performance is inherently limited by the array geometry and the number of hydrophones (Van Trees, 2004).To address this drawback, signal-dependent adaptive beamformers have been proposed, such as the minimum variance distortionless response (MVDR) (Ferguson and Carevic, 2010; Van Veen and Buckley, 1988;   a) Author to whom correspondence should be addressed.Zoltowski, 1988) and the more general linearly constrained minimum-variance solution (Frost, 1972).In certain cases, these beamformers can yield higher resolution than DaS, but with the penalty of increased computational load.Other drawbacks of these beamformers include their performance drop in low signal-to-noise ratio (SNR) levels (Zoltowski, 1988), as well as their susceptibility to coherent interferers (Reddy et al., 1987) and to modelling errors (Cox, 1973).
One class of signal-dependent techniques which do not fall into the category of beamformers are those referred to as subspace methods, which include the multiple signal classification (MUSIC) (Schmidt, 1986) and the estimation of signal parameters via rotational invariance technique (ESPRIT) (Roy and Kailath, 1989) algorithms.In theory, subspace methods can resolve arbitrarily closely spaced targets and, therefore, may provide higher resolution than beamformers in some cases.Their primary downsides, however, are their requirement for source detection algorithms (Akaike, 1974;Han and Nehorai, 2013) and their sensitivity to ambient noise and correlated signals arising due to multipath propagation.
Split-beam processing is a bearing estimation technique with passive linear towed arrays which has attracted attention in the underwater acoustics community (MacDonald and Schultheiss, 1969) due to its low-complexity.It is based on dividing the array into two equal sub-arrays and steering coincident beams from each sub-array through conventional beamforming.The cross-correlations between these beamformed signals provide time delays through which bearing estimates are extracted.Stergiopoulos and Ashley (1997) demonstrated that, when prefiltering is included, split-beam processing may offer better angular resolution than conventional full-aperture beamforming at the end-fire beams, although this advantage is restricted only to dominant signals.Chen et al. (2019), on the other hand, recently proposed a split-beam processing algorithm, which may also detect weak targets.The concept of splitting a linear hydrophone array into adjacent sub-arrays for bearing estimation has been also examined in Cox et al. (1990) and Hinich and Rule (1975), while the capabilities of distributed sonar subarrays have been studied in Ganti (2018).
Another class of techniques that may be employed for improving soundfield visualisation includes the spatial post-filters.In the context of microphone array filtering, post-filtering refers to the post-processing of the beamformer output by a single-channel noise suppression (Wiener) filter, which is computed by means of auto-and cross-spectral densities of the multichannel input (Marro et al., 1998;Simmer et al., 2001).Although these filters have received limited attention in the underwater acoustics community, they are known to perform well in environments with high background noise and coherent interferers, such as conditions typically found in maritime shallow-water environments.Popular implementations of such filters are the Zelinski (Zelinski, 1988), the McCowan (McCowan and Bourlard, 2003), and the Leukimmiatis post-filters (Leukimmiatis and Maragos, 2006), which have been tested for speech enhancement applications with microphone arrays and have demonstrated good background noise suppression capabilities in reverberant rooms.
This work is inspired by the CroPaC post-filter, which has demonstrated improved performance compared to the Zelinski and the McCowan filters in the suppression of diffuse noise, especially at low frequencies (Delikaris-Manias and Pulkki, 2013;Pulkki et al., 2018).The key difference of CroPaC is that it utilises directional microphones to compute the required auto-and cross-spectral densities as opposed to the omnidirectional microphones employed by the other post-filters.Although the original application domain of post-filters is audio signal enhancement, CroPaC has also been employed for soundfield visualisation in McCormack et al. (2017).More recently, a reformulated version for circular hydrophone arrays has been successfully applied for underwater soundfield visualisation (Bountourakis et al., 2021).The results of this previous study motivated the present research, which proposes a new CroPaC formulation suitable for linear arrays.

CroPaC post-filter
The spatial discrimination achieved by beamformers may be further improved through the use of post-filters, which aim to suppress the residual noise, and thus increase the SNR.An analysis made by Simmer et al. (2001) showed that the maximum SNR in the minimum mean square error (MMSE) sense for the estimation of broadband signals can be achieved by the multi-channel Wiener filter, which can be decomposed into a MVDR beamformer and a single-channel Wiener postfilter.In that derivation, the observation vector of the array signals at frequency k and time i is modelled as xðk; iÞ ¼ sðk; iÞdðkÞ þ nðk; iÞ; (1 where s is the target source signal, d is the array transfer vector which describes the propagation from the target to the array, and n is the noise vector.The optimal MMSE solution for the beamformer weights is then given as where U ss and U nn are the signal and noise power spectral densities at the beamformer output.Here, the frequency and angle indices are omitted for brevity.Since U ss and U nn are not known a priori, the post-filter transfer function is essentially a theoretical expression.Therefore, research in post-filters has mainly focused on proposing heuristic approximations of this transfer function.2 (8), 084802 (2022)  2, 084802-2 The first implementation of such a filter was proposed in Zelinski (1988), where the required spectral densities are computed through the auto-and cross-spectral densities of the sensor signals.A generalisation of the Zelinski filter was proposed by McCowan and Bourlard (2003), who included a noise field coherence function, allowing the assumption of more realistic noise fields, such as spherically or cylindrically isotropic (diffuse) fields.An extended version of the McCowan filter is derived in Leukimmiatis and Maragos (2006), where the noise spectral density is estimated optimally in a Wiener sense.
CroPaC (Delikaris-Manias and Pulkki, 2013) introduced the concept of using directional microphones to estimate the required spectral densities in Eq. (2).These microphones, which are "virtual" in the sense that they are created through beamforming, are used to capture four signals S j , j ¼ 1; 2; 3; 4 with directional characteristics corresponding to circular or spherical harmonic patterns.The post-filter transfer function, denoted by G, is then computed in the timefrequency domain for each steering angle h as the half-wave rectified normalised cross-spectral density between S 1 and S 2 Gðk; i; (3) where U 12 ðk; i; hÞ ¼ EfS Ã 1 ðk; i; hÞS 2 ðk; i; hÞg is the cross-spectral density between S 1 and S 2 , U jj ðk; i; hÞ ¼ EfjS j ðk; i; hÞj 2 g are the auto-power spectral densities of S j , j ¼ 1; 2; 3; 4; E denotes an expectation operator, < denotes the real part, and k 2 ½0; 1 is a spectral floor parameter.The directional patterns for capturing S 1 and S 2 are designed to have maximum gain and the same phase in the steering direction, with their side-lobes occurring at different angles and/or having opposite phases.The directional patterns of the two additional signals S 3 and S 4 are designed such that the sum of each pair of spectral densities U 11 ; U 33 and U 22 ; U 44 equals the total soundfield energy.This choice constrains the value of the normalised cross-spectral density between À1 and 1.The half-wave rectification, indicated with the max function in Eq. ( 3), then discards the negative values, which correspond to phase mismatches between the beamformers.As a result, G attains its maximum value when the DOA of a plane wave coincides with the steering direction, while for waves arriving from other directions, it obtains lower values.The real operator is applied on the numerator of Eq. ( 3) as it expresses the auto-power spectrum of the desired signal U ss [Eq.( 2)], which is a real-valued quantity.Finally, the computed parameter G can be used to modulate the output of a beamformer b steered in the same direction.The filtered signal is then obtained as yðk; i; hÞ ¼ Gðk; i; hÞ Á bðk; i; hÞ: (4) Note that, the baseline beamformer b may be obtained through any beamforming algorithm, i.e., it does not necessarily need to be the MVDR design, as defined in Eq. (2).

Proposed method
The original CroPaC algorithm is intended for cylindrical or spherical arrays, since the required beam patterns in the derivation of the post-filter correspond to circular or spherical harmonics of different orders.This therefore necessitates transforming the sensor signals into the circular or spherical harmonic domains.These harmonic patterns may also be obtained, within a limited frequency range, by using other array configurations, such as rectangular arrays (Delikaris-Manias and Pulkki, 2014b).However, when considering linear arrays, which are of particular importance in underwater acoustics, the required harmonic patterns can only be obtained adequately in the end-fire direction.Therefore, a new formulation of CroPaC, henceforth referred to as space-domain cross-pattern coherence (SD-CroPaC), is proposed, which is more suitable for linear arrays.
The SD-CroPaC algorithm, as the name implies, utilises patterns obtained by conventional beamforming directly in the space domain.Without loss of generality, consider a uniform linear array in free space consisting of Q hydrophones along the x axis at the positions x q , q ¼ 0; …; Q À 1 centered around the origin x ¼ 0. As with any algorithm employing conventional beamforming, the sound sources captured by the array are assumed to be in the far-field so that the received signals can be considered as plane waves.The array is first divided into two adjacent non-overlapping sub-arrays U and V, comprising M and Q -M sensors, respectively, as follows: sub-array U contains the sensors q ¼ 0; …; M À 1, while sub-array V the sensors q ¼ M; …; Q À 1.The time-domain signals of the sub-arrays are then transformed into the timefrequency domain, typically through a short-time Fourier transform (STFT), and the resulting signals are denoted by the vectors x U ðk; iÞ 2 C MÂ1 and x V ðk; iÞ 2 C ðQÀMÞÂ1 .The SD-CroPaC transfer function is computed as the half-wave rectified normalised cross-spectral density between two signals S U and S V , which are obtained by applying the DaS beamformer with uniform amplitude weighting on each of the signals x U and x V .The beamforming weights are obtained through the array transfer vectors of the sub-arrays d U ðk; hÞ ¼ ½e jðx k =cÞx0 cos h ; …; e jðx k =cÞxMÀ1 cos h T and d V ðk; hÞ ¼ ½e jðx k =cÞxM cos h ; …; e jðxk=cÞxQÀ1 cos h T ; where h 2 ½0; p is the steering angle (in half-space due to axial symmetry), x k is the temporal angular frequency of the frequency bin k, and c is the sound speed in the medium of propagation.Note that the reference point for the computation of the required phase shifts is placed in the centre of the original array for both sub-arrays.(5) where U UV ðk; i; hÞ ¼ EfS Ã U ðk; i; hÞS V ðk; i; hÞg is the cross-spectral density between the beamformed signals, and U U ðk; i; hÞ ¼ EfjS U ðk; i; hÞj 2 g; U V ðk; i; hÞ ¼ EfjS V ðk; i; hÞj 2 g are their auto-power spectral densities.
The choice of the specific type of sub-arrays is dictated by the requirement that G attain low values when noise is dominant in the received signals.In other words, both diffuse and sensor noise are required to be uncorrelated between S U and S V .This is achieved by (a) selecting sub-arrays that are separated in space to ensure that the captured diffuse noise is uncorrelated (above a certain frequency) between the two beamformed signals, and (b) selecting sub-arrays that are non-overlapping, i.e., they do not share common sensors, so that sensor noise is also uncorrelated between the two signals.
The normalisation scheme in this formulation, inspired by Delikaris-Manias and Pulkki (2014a), differs from the original CroPaC in that it does not require the computation of two additional signals; instead, it uses in the denominator the same two signals S U and S V, which capture approximately the local soundfield energy.This type of spatially-localised normalisation has been shown to be more suitable for environments with high diffuse noise and/or multiple interfering targets (Bountourakis et al., 2021) so that the resulting parameter G lies in the interval ½k, 1] in accordance with the original CroPaC algorithm.The spectral floor k is commonly set to a value above 0 in order to suppress any audible artifacts.However, in soundfield visualisation applications, where audio quality is not a concern, it may be set to 0, thus permitting maximum noise suppression.Finally, note that the robustness of the bearing estimation and the noise suppression may be improved by averaging over several G values computed using different sub-array combinations.Exemplary choices of beamformers to be modulated by the post-filter are the DaS and MVDR designs, both utilising all the array sensors.

Ideal patterns in the noise-free case
The proposed filter is initially examined under noise-free conditions with a simulated uniform linear array (ULA) consisting of 12 omnidirectional sensors with inter-sensor distance d.The array is divided into two non-overlapping arrays U and V of 7 and 5 sensors, respectively, as described in Sec. 3. Note that the investigation of optimal division ratios is beyond the scope of this study.Figure 1(a) shows the directional patterns employed for the computation of the post-filter at f ¼ c=ð2dÞ when they are simultaneously steered at h ¼ 90 (broadside).Note that the employed patterns for the computation of parameter G are frequency-dependent, as opposed to the original and other variants of CroPaC where the same patterns are used for all frequencies.It can be observed that patterns P U and P V exhibit their most significant side-lobes at different angles, so that their magnitudes will not produce large numerator values when the patterns are multiplied together in the crossspectral density computation [Eq.( 5)].An additional attenuation for the off-axis directions is achieved due to their different phase values: as shown in Fig. 1(b), the phases of P U and P V are equal only at h ¼ 90 , and thus it is only at this direction that the incoming waves are captured coherently.For all other directions, the incoming waves are captured with different phase, which results in lower cross-spectral density values.It is noted that for phase differences larger than p=2, the numerator in Eq. ( 5) becomes negative, and thus the half-wave rectification operation will completely suppress such side-lobes.Figure 1(c) shows the resulting attenuation pattern G computed for a range of frequencies up to the aliasing limit f ¼ c=ð2dÞ using the same combination of sub-arrays.It can be seen that, with increasing frequency, the post-filter pattern becomes more directive with some low-level side-lobes starting to appear at the higher end.

Performance analysis in noise conditions
In this section, SD-CroPaC is tested with the 12-sensor ULA presented in Sec.4.1 against two types of noise: uncorrelated (spatially white) noise and correlated noise.In typical underwater scenarios, noise in the hydrophones represents a mixture of the former type, which is mainly flow noise and sensor self-noise, and the latter type, which is attributed to sea noise, tow ship noise, and traffic noise (Groen et al., 2005).It is therefore sensible to evaluate the performance separately for each type of noise.Additionally, the method is tested on real-world data of an eight-hydrophone non-uniform linear array, which include both types of noise, as well as additional potential sources of errors, such as uncertainties in the sensor positions, gain and phase mismatches, coherent reflections, and non-stationary conditions.
For the first set of experiments, a target was modelled as a static white-noise source in the far-field, which is captured by the array as plane waves on the horizontal plane arriving from either h ¼ 90 (broadside) or h ¼ 30 (near endfire).Additionally, uncorrelated white Gaussian noise was added to each sensor with SNR ¼ -10 dB.For the second set of experiments, the target source was placed in the same directions but the added noise was modelled as a spherically isotropic diffuse field, approximated by 5012 incoherent noise sources uniformly distributed around the array.This type of noise is correlated between the sensors at low frequencies, while it becomes increasingly uncorrelated at higher frequencies.The SNR was set again at À10 dB.Figures 2(a)-2(e) show the performance of SD-CroPaC applied to the output of DaS beamformer (using all the array sensors) by means of the spatial spectra obtained during an observation interval of 10 s in the frequency range [f al =8, f al ].For comparison, the plots also include other popular post-filters modulating the same baseline beamformer (DaS).It can be observed that SD-CroPaC is more directive than the other post-filters when the noise is more correlated between the sensors, and almost equally directive when the noise is uncorrelated.However, the McCowan and Zelinski filters achieve higher background noise suppression away from the target in all cases.It is also worth noting that Zelinski suppresses more noise than McCowan when the noise is uncorrelated, a result that can be explained by the assumption of purely uncorrelated noise in the derivation of the Zelinski filter.A case of particular interest is the one of the target near end-fire accompanied by diffuse noise, where both McCowan and Zelinski practically fail to localise the target, while SD-CroPaC notably outperforms all methods.It is yet worth noting that, in this case, Leukimmiatis still offers an improvement compared to DaS.
The real-world data involve recordings of a single moving target (surface vessel) in shallow water using a towed array which remains stationary during the measurements.The array consists of eight non-uniformly distributed hydrophones along the x axis with coordinates (in metres) x q ¼ ½0:00; 0:08; 0:37; 0:75; 0:94; 1:07; 1:50; 2:08.The sampling frequency is 32 kHz and the conversion of the sensor signals into the time-frequency domain is performed via an STFT utilising Hann windows of 100 ms with 50% overlap.The obtained spatial spectra are averaged in the range 300 Hz to 15 kHz and over a time interval of 2 s.Note that, due to the non-uniform distribution of the hydrophones, it is less likely for grating lobes to appear above the theoretical aliasing limit of the array (f al ' 1280 Hz).The array is divided into two sub-arrays, which consist of the first 4 and the remaining 4 hydrophones, respectively.The spatial spectra for the target located at the same angles as in the simulated scenarios are shown in Figs.2(c) and 2(f).From these snapshots, it may be concluded that, in real-world conditions, all the tested post-filters demonstrate a behaviour that falls between the two cases of the simulated data.A further conclusion is that the application scope of the proposed method may be extended to nonuniform linear arrays consisting of as few as eight sensors.The DOA estimation accuracy of the studied post-filters was tested with a set of 2000 Monte Carlo trials; the results of which are shown in Fig. 3. Figure 3(a) depicts the minimum squared error (MSE) in the DOA estimation of a target either at h ¼ 90 or h ¼ 30 for a range of SNR levels when the noise is spatially uncorrelated.The corresponding plots for correlated noise are given in Fig. 3(b).First, it can be seen that, in both cases of uncorrelated noise, DaS achieves the minimum MSE, as expected from theory under the tested conditions (Stoica and Moses, 2005).Second, in all tested cases, SD-CroPaC has a better DOA estimation accuracy than McCowan and Zelinski, while Leukimmiatis demonstrates a performance that approaches that of DaS.More notably, for a target near end-fire and spatially correlated noise, SD-CroPaC outperforms all the tested methods.Along with the improved directivity and background noise suppression, this last observation reveals an additional advantage of the proposed method when used near the end-fire direction.

Performance analysis in the presence of uncorrelated interferers
Another set of Monte Carlo simulations was performed in order to investigate the spatial resolution limits of the proposed filter.In these experiments, a static target was placed alternately at h ¼ 90 and h ¼ 30 while a second target of equal power (uncorrelated interferer) was placed consecutively in a range of angular distances Dh from the main target varying with a step of d h ¼ 1 .For each trial, a peak finding algorithm recorded the number of detected peaks with at least 1 dB of prominence in the spatial spectra of each method along with the DOA estimation of the main target.The results for the cases of the main target at broadside (h ¼ 90 ) and near end-fire (h ¼ 30 ) are shown in Figs.4(a   respectively.It can be seen that, in both cases, SD-CroPaC detects two distinct peaks at a shorter angular distance than DaS and the other tested post-filters.Additionally, it achieves the lowest MSE in the DOA estimation of the main target for a wide range of angular distances.Leukimmiatis presents the next best performance by offering a less significant improvement than SD-CroPaC on the DaS performance.On the other hand, Zelinski and McCowan filters demonstrate the lowest resolution capability of all tested methods; nonetheless, above a certain angular distance Dh, they both improve the DOA estimation accuracy of DaS.

Conclusions
This work proposes a spatial post-filtering method for applications requiring underwater soundfield visualisation with linear hydrophone arrays.The proposed filter represents a space-domain reformulation of the CroPaC algorithm, which is originally defined in the spherical and circular harmonic domains.The filter's operation is based on applying attenuation values to the output of conventional or adaptive beamformers when the received signals correspond to noise or interference.The evaluation results, performed with simulated data of a 12-sensor ULA and with real-world data of an 8-hydrophone towed array, effectively demonstrate the capabilities of the filter in terms of noise suppression and DOA estimation accuracy.A set of comparison tests against other popular post-filters revealed the advantages of the method, especially near the end-fire direction and in the presence of spatially correlated noise, such as diffuse noise in the lowfrequency region, or uncorrelated interferers.Finally, it is noted that the proposed method may also be applicable to realtime applications, due to its rather low computational requirements.

Fig. 1 .
Fig. 1.Examples of SD-CroPaC directional patterns for a 12-sensor ULA split into two adjacent non-overlapping sub-arrays U and V comprising 7 and 5 sensors, respectively: (a) magnitude of the patterns for h ¼ 90 and f ¼ c=2d and (b) phase response of the same patterns.The resulting attenuation pattern G is plotted for different frequencies in (c).

Fig. 2 .
Fig. 2. Spatial spectra for a single target at broadside (upper row) and near end-fire (bottom row): (a), (b) using simulated data of a 12-sensor ULA with added spatially white noise; (c), (d) using the same array with added simulated diffuse noise; (e), (f) using real-world data of an 8hydrophone non-ULA.
ARTICLE asa.scitation.org/journal/jel Fig. 3. Mean squared error (MSE) in the direction-of-arrival (DOA) estimation of a static target located at h ¼ 90 and h ¼ 30 for different levels of (a) spatially white noise and (b) diffuse noise.

Fig. 4 .
Fig. 4. DOA resolution and estimation accuracy of the proposed method and other post-filters for two closely spaced targets: the main target is located either at h ¼ 90 (a) or at h ¼ 30 (b) and an interfering target of equal power at an angular distance Dh away from it.In both cases, SD-CroPaC achieves better separation of the two targets than DaS and the other post-filters as well as the lowest MSE of the main target's DOA.