Recent work has shown that endfire beamforming of ocean noise can be used to produce images of the seabed layering [Siderius *et al.*, J. Acoust. Soc. Am. 120, 1315–1323 (2006)]. This initial noise imaging technique used conventional beamforming and was later extended to adaptive beamforming that is theoretically optimal. However, there can be problems with adaptive methods, which include extreme sensitivity to random errors, the required averaging time, and computational complexity. Here, the concept of supergain is used to show that delay and sum beamforming can produce nearly the same results as the optimal adaptive methods without the drawbacks.

## I. Introduction

Ocean ambient noise is generated from a variety of man-made and natural sources. Here, ambient noise is considered in the band from 20 Hz to about 4 kHz. The lower frequencies of this band can have shipping components of noise and the entire band can contain natural sounds such as those from marine life. However, a large part of the noise in this band is typically due to surface breaking waves, and it is this surface-generated noise that is used here for seabed imaging. The process of cross-correlating noise signals measured on two sensors has been shown to produce an approximation of the Green’s function (or impulse response) between the sensors. The basis of the technique is found in research literature from the last 10 or more years and was first applied in the ocean to retrieve the Green’s function between horizontally separated sensors by Roux and co-workers.^{1,2} The noise cross-correlation phenomenon can also be applied to beams that are formed using a hydrophone array (as opposed to correlating individual hydrophone sensors). This is the so-called passive fathometer application where beamformed surface noise is used to determine the seabed interface and sub-bottom layering.^{3} The beamforming provides the capability for focusing on the useful (i.e., coherent) noise while reducing interference from unwanted noise sources. This greatly reduces the noise correlation averaging time needed and improves the estimates through array gain.

The interest here is in imaging the seabed using the passive fathometer method where the noise directly overhead a vertical hydrophone array is the effective signal and all other sources are contaminating noise.^{3,4} For this application, a vertical array is essential to form up-looking and down-looking beams that are cross-correlated rather than individual sensors. In the original passive fathometer formulation, conventional (i.e., delay and sum) beamforming was used to form the up- and down-looking (endfire) beams.^{3,5} Subsequent analysis showed there to be substantial contamination to the seabed image due to competing noise sources coming from directions other than endfire. This noise enters the passive fathometer process because the beamforming is not perfect and endfire beams contain sidelobes that allow noise from other directions into the beamformer output. Further, the conventional beams can be relatively large in the endfire directions, especially at the low frequencies where the array length may be only a fraction of a wavelength. Only a limited region containing noise sources coming from the endfire direction contributes to the coherent components of the noise cross correlation.^{2} Therefore, some control over the size of the endfire beams is desired in the processing.

To mitigate the beam-width problem as well as to minimize sidelobe contamination, adaptive processing was introduced.^{4,6} Adaptive processing, in theory, should address these issues and provide optimal complex array weighting for gain and sidelobe suppression. However, in practice, there are several drawbacks to adaptive methods. First, they are known to be highly sensitive to random errors. For example, these can be errors in the actual versus assumed hydrophone locations (e.g., due to tilting or sagging of the array). Second, the adaptive processing requires inversion of the measured cross-spectral density matrix (CSDM). The inversion not only requires adequate data averaging but also usually requires diagonal loading for stability. The diagonal loading is equivalent to adding white noise to the CSDM and choosing the ideal amount of white noise is an additional complication. In spite of these difficulties, the adaptive methods showed a marked improvement on certain data sets.^{6} However, in some cases it may not be possible to overcome random errors or the required averaging may pose serious limitations. For example, if the seabed or array motion is changing rapidly the averaging may be better applied after the beamforming rather than before to allow for time alignment of data.

In this paper, the concept of supergain^{7} is explored for the passive noise seabed imaging problem and compared to the theoretically optimal adaptive methods. Supergain uses delay and sum beamforming, but with performance that is nearly the same as adaptive methods. Therefore, it does not suffer from the drawbacks associated with adaptive methods. The supergain technique uses simple array shading to suppress sidelobes and oversteering past endfire to limit beam-width size. In Sec. II, the passive fathometer cross-correlation methods will be briefly summarized. The practical supergain technique is described in Sec. III, and key results from the conventional, adaptive, and supergain processing methods are presented in Sec. IV.

## II. Imaging the seabed with passive fathometer processing

The image of the seabed is obtained from the surface noise by using a vertical array and correlating a beam steered toward the surface with a beam steered toward the seabed. The surface steered beam captures the surface signal directly and the bottom steered beam captures the corresponding echo after reflecting from the seabed and sub-bottom layers.

This process has been referred to as a passive fathometer since the water–seabed interface and sub-bottom layer structure can be determined using only a receiver array.^{3} The original formulation used simple delay and sum (or conventional) beamforming and is briefly summarized here.

The hydrophone data for each channel at angular frequency ω are written as the vector **p** = [*p*_{1}, *p*_{2}, …, *p*_{M}]^{T} for the *M* hydrophones (with *T* indicating transpose operation). Each entry is determined through a discrete Fourier transform (DFT) of an ambient noise time series measured on each channel, $p_m(\omega) = \mathcal{F}\{p_m(t)\}$. The number of points in the DFT processing will be referred to as the snapshot size. With conventional beamforming, each channel is multiplied by a complex weight to properly delay (phase shift) before summing all channels together. The weight for the *m*th hydrophone steered at angle θ is written $w_m = e^{-imkd\sin\theta}$, for plane waves arriving at grazing angle θ between the hydrophones separated by distance *d*. The array is referenced to the shallowest hydrophone, which is channel *m* = 0. The wavenumber is *k* = ω/*c* and *c* is the sound speed in the water (around 1500 m/s).
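The conventional weights described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the processing code used for the results; the function names, the default sound speed of 1500 m/s, and the zero-based channel index are assumptions made for the example.

```python
import numpy as np

def steering_weights(num_hydrophones, spacing_m, freq_hz, grazing_deg,
                     sound_speed=1500.0):
    """Delay-and-sum weights w_m = exp(-i m k d sin(theta)), m = 0..M-1."""
    k = 2.0 * np.pi * freq_hz / sound_speed   # wavenumber k = omega / c
    m = np.arange(num_hydrophones)            # m = 0 is the reference hydrophone
    theta = np.radians(grazing_deg)
    return np.exp(-1j * m * k * spacing_m * np.sin(theta))

def conventional_beam(weights, p):
    """Beam output b = w^dagger p for one frequency-domain snapshot p."""
    return np.vdot(weights, p)                # vdot conjugates its first argument
```

For a snapshot whose phase across the array matches the weights exactly, `conventional_beam` returns the number of hydrophones, which is the coherent gain of the unshaded array.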

As stated, for passive fathometer processing the up-looking beam is correlated with the down-looking beam. To steer an up-looking beam directly upward toward the surface, θ = +90° and to steer directly toward the seabed, θ = −90°. The steering weights are $w_m = e^{\pm imkd}$ and the down-looking weight will be denoted **w** and the up-looking weight as **w*** (conjugated). Therefore, the upward beam is *b*_{up} = **w**^{†}**p** († represents conjugate transpose operation), and the downward beam is *b*_{dn} = **w**^{T}**p**. The frequency domain correlation of these two beams is then $C = b_{\mathrm{up}} b_{\mathrm{dn}}^{*} = \mathbf{w}^{\dagger}\mathbf{K}\mathbf{w}^{*}$, where **K** is the data cross-spectral density matrix (CSDM), **K** = **pp**^{†}. The image of the seabed is obtained by moving the vertical array (e.g., having it drift over the seabed) and stacking the correlation time series *c*(*t*) which is obtained through inverse discrete Fourier transform of *C*, $c(t) = \mathcal{F}^{-1}\{C(\omega)\}$.
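The correlation step can be sketched as follows, assuming a single snapshot per frequency and NumPy's DFT sign convention (with a different convention the echo may appear at the opposite lag sign). The function names are hypothetical, and `correlation_time_series` assumes *C* is supplied on the one-sided frequency grid returned by `np.fft.rfftfreq`.

```python
import numpy as np

def endfire_weights(num_hydrophones, spacing_m, freq_hz, sound_speed=1500.0):
    """Endfire weights w_m = exp(-i m k d), m = 0..M-1."""
    m = np.arange(num_hydrophones)
    k = 2.0 * np.pi * freq_hz / sound_speed
    return np.exp(-1j * m * k * spacing_m)

def beam_correlation(w, p):
    """C = b_up * conj(b_dn) = w^dagger K w^*, with K = p p^dagger."""
    K = np.outer(p, p.conj())           # single-snapshot CSDM
    return w.conj() @ K @ w.conj()      # w^dagger K w^*

def correlation_time_series(C_of_freq, num_samples):
    """c(t) from the one-sided spectrum C(omega) via an inverse real DFT."""
    return np.fft.irfft(C_of_freq, n=num_samples)
```

Evaluating `beam_correlation` at every frequency bin and passing the result to `correlation_time_series` gives one trace of the image; stacking traces as the array drifts gives the seabed section.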

Adaptive processing can be applied with a relatively simple extension using, for example, the Minimum Variance Distortionless Response (MVDR) processor.^{8} In this case, the optimal, complex steering weights $\tilde{\mathbf{w}}$ are computed using the conventional weights according to

$$\tilde{\mathbf{w}} = \frac{\mathbf{K}^{-1}\mathbf{w}}{\mathbf{w}^{\dagger}\mathbf{K}^{-1}\mathbf{w}}. \tag{1}$$

The adaptive cross correlation is found the same way as the conventional processing except for using the adaptive up-looking and down-looking weights. As can be seen in Eq. (1), this process requires inversion of the CSDM and problems can occur when the matrix is not full rank. This can happen, for example, when averaging time for the CSDM is limited (the so-called snapshot deficiency problem). For the seabed imaging problem the averaging time is limited by how fast the data are changing due, for example, to the array moving. If the array is stable and slowly moving and if the seabed is not changing rapidly (relative to the array moving over the seabed), then sufficient averaging time may not be a big problem. However, this is sometimes not possible if the array has vertical motion, if the array is moving too rapidly horizontally or if the seabed is somewhat spatially irregular.
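To make the rank issue concrete, the sketch below computes weights of the standard MVDR form, **K**^{−1}**w** / (**w**^{†}**K**^{−1}**w**), with optional diagonal loading. The function name and the `loading` parameter are assumptions for illustration; how much loading to apply is exactly the tuning question raised in the text and is not settled here.

```python
import numpy as np

def mvdr_weights(K, w, loading=0.0):
    """MVDR weights K^{-1} w / (w^dagger K^{-1} w) with optional diagonal loading.

    K is the (M, M) CSDM, w the conventional steering weights, and `loading`
    a non-negative constant added to the diagonal of K (an assumed knob).
    """
    M = K.shape[0]
    K_loaded = K + loading * np.eye(M)
    Kinv_w = np.linalg.solve(K_loaded, w)   # solve instead of forming K^{-1}
    return Kinv_w / np.vdot(w, Kinv_w)      # enforces w_tilde^dagger w = 1
```

When the CSDM is estimated from fewer snapshots than hydrophones it is singular, so a call with `loading=0.0` is not reliable; a small positive loading makes the system solvable at the cost of moving the weights toward the conventional ones.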

The MVDR processor is also known to be sensitive to random errors. This might be due to slight errors in the assumed hydrophone locations due, for example, to array tilt or sag. The White Noise Gain Constraint (WNC) beamformer adjusts the diagonal loading on the CSDM for each steering angle to provide robustness to the adaptive processor that is constrained by the white noise gain (WNG). The WNC beamformer can be tuned to be pure conventional, pure MVDR or somewhere in between according to the WNG. Details of the WNC beamformer applied to noise imaging are given in Siderius *et al.*^{6} and references within. One of the issues with the WNC beamformer is setting the WNG and, unfortunately, this is often determined in an *ad hoc* way.
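The link between diagonal loading and WNG can be illustrated with a short sketch. It assumes a common definition of white noise gain, |**w**^{†}**v**|² / (**w**^{†}**w**) for weights **w** and steering vector **v**, and it only shows how WNG responds to a given loading level; it is not the WNC algorithm itself, which searches for the loading that meets a chosen WNG constraint. The function names are hypothetical.

```python
import numpy as np

def loaded_mvdr_weights(K, v, loading):
    """MVDR weights for steering vector v with diagonal loading on the CSDM K."""
    Kinv_v = np.linalg.solve(K + loading * np.eye(K.shape[0]), v)
    return Kinv_v / np.vdot(v, Kinv_v)

def white_noise_gain(weights, v):
    """WNG = |w^dagger v|^2 / (w^dagger w); equals M for conventional weights w = v."""
    return np.abs(np.vdot(weights, v)) ** 2 / np.vdot(weights, weights).real
```

With very heavy loading the weights approach the conventional ones and the WNG approaches its maximum of *M*; with light loading the WNG can fall well below *M*, which is where sensitivity to random errors grows.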

## III. Using practical supergain for noise imaging of the seabed

Adaptive methods give the theoretically optimal complex beamforming weights, but for steering in the endfire directions practical supergain produces nearly optimal results by simply applying array shading and oversteering to the delay and sum beamforming.^{7} Using this approach avoids the pitfalls of adaptive processing while maintaining the performance. It also has the advantage of allowing the averaging to be done after beamforming since, unlike the adaptive weights of Eq. (1), the supergain weights are not constructed from the CSDM data.

Supergain uses beamforming in spatial frequency rather than grazing angle (sometimes referred to as *k*–ω beamforming but here *k–f* is used where *f* = ω/2π). This is performed by taking a DFT over the array elements. The spatial frequency, or wavenumber, is bounded according to −π/*d* < *k* < +π/*d*, where *d* is the hydrophone separation. However, propagating acoustic waves are bounded by the slowest speed a wave can physically propagate (i.e., *c*), which corresponds to a wavenumber magnitude of |*k*| = ω/*c*. The region |*k*| ≤ ω/*c* is the so-called visible space, since outside this region there are no propagating acoustic waves.

To illustrate, first consider beamforming on typical ocean noise, which is shown in Fig. 1. These data were taken in the Mediterranean Sea (also analyzed in Siderius *et al.*^{6}) and cover the 20–4000 Hz band; the sampling frequency is 12 kHz, and the vertical array has 32 equally spaced hydrophones with spacing *d* = 0.18 m. Figure 1(a) shows the vertical directionality of the noise field using conventional delay and sum beamforming as a function of frequency and grazing angle. The grazing angles range from +90° to −90°. Horizontally arriving sound corresponds to 0° and upward steered beams with positive angles are toward the sea surface while negative beams are toward the seabed (+90° is directly above the array and −90° is directly below the array). A 32 point Taylor window was used for shading the array.
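As a rough check on these array parameters, taking *c* = 1500 m/s and using $f_{\lambda/2}$ and $L$ as labels introduced only for this example, the half-wavelength frequency and the array aperture are

$$f_{\lambda/2} = \frac{c}{2d} = \frac{1500}{2 \times 0.18} \approx 4167\ \text{Hz}, \qquad L = (M-1)\,d = 31 \times 0.18 = 5.58\ \text{m}.$$

The 20–4000 Hz band therefore stays below the spatial aliasing limit of the array, and at 20 Hz (wavelength 75 m) the aperture is only about 0.07 wavelengths, consistent with the earlier remark that the array can be a small fraction of a wavelength at low frequencies.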

To illustrate DFT beamforming used for supergain, the same data were processed again as a function of spatial frequency rather than grazing angle. This was performed by taking a DFT over the 32 array elements. The DFT beamforming results are shown in Fig. 1(b). Zero padding was used to extend the DFT from 32 to 256 points. The solid white lines correspond to the endfire directions, and form a V region inside which grazing angles between ±90° fall. This region is the so-called visible space where the spatial frequencies correspond to sound speeds equal to or greater than the ocean sound speed *c* ≈ 1500 m/s (i.e., |*k*| ≤ ω/*c*). Outside the V is beyond the visible space and corresponds to slow waves that travel at sound speeds less than *c*. In principle, if these slow waves existed they would not represent propagating sound but some other type of signal such as mechanical waves due to array strum. However, for properly functioning arrays this region has no propagating sound and therefore shows very low beam outputs as evident in Fig. 1. Note that some of these non-propagating signals are present at the lowest frequencies and may, in fact, indicate some low frequency mechanical signals on the array.
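The zero-padded spatial DFT and the visible-space boundary can be sketched as below for a single frequency. The function names, the optional `shading` argument, and the use of NumPy's DFT sign convention are assumptions for illustration; the sign of *k* associated with up- or down-going energy depends on that convention and on the element ordering.

```python
import numpy as np

def kf_beamform(p, spacing_m, nfft=256, shading=None):
    """Zero-padded spatial DFT over the array elements at one frequency.

    Returns (k, power): wavenumbers in rad/m ordered from -pi/d upward,
    and the corresponding beam power.
    """
    if shading is not None:
        p = p * shading
    spectrum = np.fft.fftshift(np.fft.fft(p, n=nfft))   # n > len(p) zero-pads
    k = 2.0 * np.pi * np.fft.fftshift(np.fft.fftfreq(nfft, d=spacing_m))
    return k, np.abs(spectrum) ** 2

def visible_mask(k, freq_hz, sound_speed=1500.0):
    """True where |k| <= omega / c, i.e., inside the visible space."""
    return np.abs(k) <= 2.0 * np.pi * freq_hz / sound_speed
```

Repeating `kf_beamform` over all frequency bins gives a *k–f* surface of the kind described for Fig. 1(b), and the edge of `visible_mask` traces the V-shaped endfire lines.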

For noise imaging of the seabed, only the endfire beams are used and these are cross correlated as described in Sec. II. One of the factors that degrades results is the interference from noise sources other than endfire. This can be seen in Fig. 1, where equally high intensity sound can be seen coming from most of the positive angular directions. These high levels will enter the cross correlations through the beam sidelobes. If aggressive array shading is used, these sidelobes can be greatly suppressed; however, this is at the high price of widening the main lobes, which are already large in the endfire directions.

The practical supergain approach consists of array shading to suppress sidelobes and mild oversteering to limit the effective size of endfire beam widths. That is, the effect of the wider main lobe is reduced by steering the beam out of the visible space past endfire, effectively narrowing the beam width. The sidelobes were suppressed using a 32 point Hanning window. To oversteer, the time delay applied to the *m*th array element was $\tau_m = (1+\beta)md/c$, where β is a small positive constant that was used to oversteer the array. For the example that will be presented in Sec. IV the array was oversteered by 7 points in the 256 point DFT. This oversteering factor β was determined through trial and error.
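A minimal sketch of shaded, oversteered endfire weights follows. It assumes the delay τ_m = (1 + β)*md*/*c* is applied as the phase factor exp(−iωτ_m), uses `np.hanning` (whose end points are zero) as the Hanning window, and leaves the weights unnormalized; the function names and the β value in the usage note are hypothetical, not the values used for the results.

```python
import numpy as np

def supergain_weights(num_hydrophones, spacing_m, freq_hz, beta,
                      sound_speed=1500.0):
    """Hann-shaded endfire weights with oversteered delays tau_m = (1+beta) m d / c."""
    m = np.arange(num_hydrophones)
    tau = (1.0 + beta) * m * spacing_m / sound_speed    # oversteered time delays
    shading = np.hanning(num_hydrophones)               # sidelobe suppression
    return shading * np.exp(-2j * np.pi * freq_hz * tau)

def beam_response(weights, spacing_m, freq_hz, grazing_deg, sound_speed=1500.0):
    """Power response |w^dagger a(theta)|^2 to a plane wave at a grazing angle."""
    m = np.arange(weights.size)
    k = 2.0 * np.pi * freq_hz / sound_speed
    a = np.exp(-1j * m * k * spacing_m * np.sin(np.radians(grazing_deg)))
    return np.abs(np.vdot(weights, a)) ** 2
```

Comparing `beam_response` at 60° and 90° for `beta=0.0` and for a small positive `beta` shows the intended effect: the response relative to endfire falls off faster with oversteering, at the cost of some gain in the endfire direction itself.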

## IV. Results

A comparison can be made between imaging the seabed using conventional delay and sum, adaptive, and practical supergain processing. This is shown for data collected in the Mediterranean (same array parameters as indicated previously) in Fig. 2. The conventional beamforming results are in Fig. 2(a) and the adaptive processing results are in Fig. 2(b). Each time trace is a vertical line of the image and is formed using 90 s of averaging time. The horizontal axis is a file number that corresponds to range since the array was drifting over the seabed. The vertical axis is distance to the seabed and sub-bottom layers (two-way travel time is actually measured but is converted to distance using sound speed of 1500 m/s). Supergain results are in Fig. 2(c). The supergain beamforming weights do not depend on the data so these are formed independently from the data. For comparison, here the data were beamformed after the same 90 s averaging time as done for the conventional and adaptive processing. However, essentially the same results were found by beamforming and cross correlating after just 10 s of averaging for the CSDM followed by additional averaging after beamforming (for a total of 90 s). It may be possible to do most of the averaging after beamforming but the advantages of pre-beamforming averaging versus post-beamforming averaging have not been studied in detail here.
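Because the supergain weights are fixed, the correlation *C* = **w**^{†}**Kw*** is linear in the CSDM, so at a single frequency and without any realignment the order of averaging and beamforming does not change the result. The sketch below checks this identity on an array of snapshots; the function names and the (snapshots × hydrophones) data layout are assumptions, and nothing here addresses the time-alignment question left open in the text.

```python
import numpy as np

def correlate_then_average(snapshots, w):
    """Beamform and cross-correlate each snapshot, then average over snapshots."""
    b_up = snapshots @ w.conj()          # w^dagger p for each row (snapshot)
    b_dn = snapshots @ w                 # w^T p for each row
    return np.mean(b_up * b_dn.conj())

def average_then_correlate(snapshots, w):
    """Average the CSDM over snapshots first, then form w^dagger K w^*."""
    K = snapshots.T @ snapshots.conj() / snapshots.shape[0]
    return w.conj() @ K @ w.conj()
```

The two functions agree to numerical precision for any fixed `w`. With data-dependent adaptive weights this equivalence is not available, since the weights themselves require an averaged CSDM.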

## V. Conclusion

The idea of using practical supergain as an alternative to adaptive beamforming for noise imaging has been presented showing very similar results between the two techniques. However, supergain is a simple application of the delay and sum beamformer with array shading and oversteering. It does not suffer from some of the drawbacks associated with adaptive processing such as sensitivity to random errors and averaging time. With the supergain approach it is possible to do most of the averaging after forming the cross correlation. This will allow better time alignment of the signals if, for example, the seabed or array is changing rapidly.

## Acknowledgment

Support for this work from the Office of Naval Research Ocean Acoustics Program is gratefully acknowledged.