State-of-the-art mode estimation methods either utilize active source transmissions or rely on a full-spanning array to extract normal modes from noise radiated by a ship-of-opportunity. Modal-MUSIC, an adaptation of the MUSIC algorithm (best known for direction-of-arrival estimation), extracts normal modes from a moving source of unknown range recorded on a partially spanning vertical line array, given knowledge of the water column sound speed profile. The method is demonstrated on simulations, as well as on data from the SWellEx-96 experiment. Extracted normal modes from ship noise during the experiment are used to successfully localize a multitone source without any geoacoustic information.

The normal modes of a waveguide are determined by the sound speed profile (SSP) and boundary conditions. In underwater acoustics, the surface boundary condition is always pressure-release, and the SSP in the water column is easy to measure. The set of all possible modes that satisfy the surface condition and the water-column SSP can be explored via the shooting method,1 which reframes the boundary value problem as an initial value problem (IVP). The true modes for a given waveguide are the subset of IVP solutions that also satisfy the bottom geoacoustic parameters. This paper presents a mode extraction method, modal-MUSIC, which picks the true modes from the set of IVP solutions by comparison with data from noise, thus replacing the need for a geoacoustic model. As a result, modal-MUSIC allows matched field processing (MFP) to be performed without any information below the sea floor.

Mode estimation methods can be divided into two categories: active methods that require the use of a controlled source and passive methods that rely on ambient acoustic noise to perform the extraction. Active source methods fall into two subcategories: wavenumber estimation–based methods applied to a horizontal line array/synthetic aperture2–6 and modal group/phase speed estimation from broadband impulsive source transmissions.7–9 Passive mode extraction on a vertical line array (VLA) has thus far been limited to the special case where the receiving array fully spans and sufficiently samples the water column (in which case, the singular value decomposition provides mode estimates).10,11 An effective passive mode extraction method for more general experimental configurations would be beneficial; the present work extends passive mode estimation to partially spanning arrays.

This paper presents modal-MUSIC, a method for extracting normal modes in an underwater waveguide using noise from a moving source of unknown range received on a partially spanning array, and further demonstrates its application to model-free MFP without needing knowledge of the bottom properties.

MUSIC, which stands for “Multiple Signal Classification,” is a general algorithm for the estimation of the parameters of multiple sources radiating within the vicinity of an array.12 In the most common MUSIC presentation, it is used to estimate the angles associated with sources radiating plane waves.

Measurements on an array, written as a vector x, to obey the signal model
(1)
where the N columns of A are the incident plane waves
(2)
received on the array from N sources with amplitudes
(3)
The noise w contains anything uncorrelated with the source signals. The data covariance matrix can be written as
(4)
where, with E ( · ) denoting expectation, S = E ( s s ) is the source covariance matrix, λ W = E ( w w ) is the covariance matrix of the noise, and λ is a scaling parameter that is relevant for the general derivation of the MUSIC algorithm.
As shown by Schmidt,12 whenever the matrix AS A is singular (i.e., there are fewer sources N than there are number of sensors L), the covariance matrix can be written as
(5)
where λmin is the smallest solution to X λ W = 0 (i.e., λmin is the smallest generalized eigenvalue of X and W).

As long as the sources are not fully correlated with one another, the Vandermonde structure of A guarantees that the rank of AS A is the number of sources N. Schmidt shows that λmin must have multiplicity LN, and the (LN)–dimensional associated eigenspace is orthogonal to the space spanned by the columns of AS A .

Therefore, given (a) the matrix E N whose columns are the generalized eigenvectors associated with λ min ( X , W ) and (b) a parameterization a ( θ ) of all possible columns of A, the MUSIC spectrum can be computed as
(6)
and will have peaks at the parameter values θ excited by the sources.

With respect to (a) in theory, knowledge of W permits one to estimate E N by computing the generalized eigenvalues λ ( X , W ). However, under the stronger assumption that the noise is uncorrelated white noise, the matrix E N can be estimated directly from the eigenvectors of the covariance matrix X [since the generalized eigenvectors λ ( X , a I ) for identity matrix I are simply the eigenvectors of X]. At high SNR, estimation of the noise subspace can be made by choosing a cutoff eigenvalue from the spectrum of the covariance matrix. A practical way to automate this is to find the peak in the “difference spectrum,” defined as { λ i + 1 λ i }. Then, the eigenvectors that fall to the right of this peak (assuming eigenvectors sorted into a descending order) span the noise space and form the columns of E N.

With respect to (b) the possible columns of A form the “array manifold,” which for a plane wave signal model is comprised of the set of all possible arriving plane waves evaluated for the array. In this case, it is parameterized by source angle θ and is trivially computed given the positions of the array elements.

Here, it is shown that by viewing each excited mode in a waveguide as a virtual source (whose waveform on the array is a normal mode rather than a plane wave), the mode estimation problem has the same form as the source estimation problem that MUSIC solves. The field produced by a moving source produces the necessary covariance matrix structure.

We can express the covariance matrix for a VLA, built up from snapshots of a single moving source over a set of ranges { r i } in the familiar matrix form X = AS A + λ W , where now the N columns of A are the normal modes evaluated at the array depths,
(7)
and the source covariance matrix is
(8)
Here, S ( ω ) is the source's amplitude at frequency ω, zs is the depth of the source, Φ i ( z ) is the ith normal mode depth function that solves the depth-separated Helmholtz equation, and ki is the corresponding wavenumber. S is the modal correlation matrix. Although any amount of partial source correlation is theoretically permissible in the derivation of MUSIC, source correlations can significantly degrade the performance of MUSIC in the presence of noise.13 Therefore, a requirement for the use of modal-MUSIC for a moving source is that the source transits a significant range (see Neilson and Westwood10 for details) in which case S in Eq. (8) will be approximately diagonal.

The previous paragraph showed that we can express the field due to a moving source in the multiple source matrix form X = AS A + W. The source parameters determine the columns of A: in this case, the parameters are the horizontal wavenumbers of the excited modes, and the columns of A are the corresponding modes { Φ i } evaluated at the array depths.

Recall that MUSIC also requires a means of exploring the array manifold. This is trivial for a plane wave signal model, where for each possible source angle, the array response is simply the steering vector a ( θ ) = e i k sin θ z. For the normal mode problem, the waveforms of the “signals” are the normal modes sampled at the array depths. For each candidate wavenumber kr, the associated normal mode sampled at the array depths, Φ ( k r ), is computed using the shooting method.1 This requires knowledge of the SSP from the surface down to the bottom of the array. Now that we are equipped with the means of exploring the array manifold, the modal-MUSIC spectrum can be computed as
(9)
and will have peaks at the wavenumbers supported by the waveguide.

Simulations demonstrate the performance of modal-MUSIC under conditions of varying signal-to-noise ratio (SNR) and array length. It is seen that the quality of the estimates is always better for the higher order modes, which are better sampled by the array. This effect is accentuated because we use a surface source that emphasizes the high order modes.

We simulate an 8 m deep, 50 Hz source moving through a 216.5 m deep Pekeris waveguide with water column sound speed c = 1500 m/s, bottom sound speed c b = 1600 m/s, and bottom density ρ b = 2.0 g/cm3. The source has five excited modes in this waveguide. For each simulation setup, we generate a sequence of pressure field values [ p 1 , p 2 , , p N r ] by sampling the field received on the array from the source as it moves from 5–8 km range every 5 m (for a total of 301 snapshots). The normal modes and wavenumbers are computed using the Sturm sequence eigenvalue method.14 

An ensemble of data is generated by adding realizations of white Gaussian noise to the sequence of pressure field values. The noise variance σ N 2 required to produce a desired SNR in dB is given by σ N 2 = σ s 2 / 10 SNR / 10, where σ s 2 is the average received power from the source over all ranges and array elements. For each realization, uncorrelated complex Gaussian noise with real and imaginary variance of σ N 2 / 2 is added to each element in the pressure field sequence. Then, the sample covariance matrix is formed from the 301 noise-corrupted pressure field snapshots. The signal subspace rank is estimated from the eigenvalue difference spectrum, and that estimate is used to compute the modal-MUSIC spectrum.

To look at the effect of SNR on modal-MUSIC performance, we use a half-spanning VLA similar to the VLA deployed in the SWellEx-96 experiment. It consists of 50 elements spaced at 2 m, and starts 100 m deep in the 216.5 m–deep waveguide. For SNR values from 10–30 dB, 50 realizations of data are generated, and the modal-MUSIC spectrum is computed (shown in Fig. 1). At 30 dB SNR, modal-MUSIC resolves all five modes of the wave guide. As 10 dB and 20 dB SNR, modal-MUSIC can no longer resolve mode 1 (which has a vertical wavenumber of X), and the estimate of mode 2 becomes biased.

Fig. 1.

Simulation results for (a) varying SNR and (b) array size. For both, all 50 realizations are shown faintly, with the mean spectrum shown darkly. The modeled wavenumbers are shown as dashed lines. (a) Normalized modal-MUSIC spectra are shown for SNR of 10, 20, and 30 dB. (b) Normalized modal-MUSIC spectra are shown for arrays with 50, 30, and 20 elements spaced at 2 m and with shallowest element at 100 m depth. The SNR is fixed at 30 dB for all of the array lengths.

Fig. 1.

Simulation results for (a) varying SNR and (b) array size. For both, all 50 realizations are shown faintly, with the mean spectrum shown darkly. The modeled wavenumbers are shown as dashed lines. (a) Normalized modal-MUSIC spectra are shown for SNR of 10, 20, and 30 dB. (b) Normalized modal-MUSIC spectra are shown for arrays with 50, 30, and 20 elements spaced at 2 m and with shallowest element at 100 m depth. The SNR is fixed at 30 dB for all of the array lengths.

Close modal

We also demonstrate the effect of shortening the array from 50 elements (100 m) to 20 elements (40 m), keeping the SNR at 30 dB for all of the data realizations. Note that the wavelength of the 50 Hz source is around 30 m. The ability to estimate the low-angle modes quickly drops off with array length, though the ability to estimate the highest mode is consistent even as the array length drops close to the wavelength of the sound.

MFP15 is a source localization method used in underwater acoustics. For each potential source position, which in a range-independent environment consists of range and depth (r, z), a model is used to compute a replica r ( r , z ) which is compared with the received field x. Using a simple linear processor (the Bartlett), the ambiguity surface is computed as
(10)
where the weight vector w = r / | | r | | is the normalized replica vector and the sample covariance matrix of the data X is used when multiple snapshots are available. Assuming the model is correct, the ambiguity surface will show a peak at the true source location. When a source transmits at multiple frequencies, a single ambiguity surface is formed by summing the normalized ambiguity surfaces from each frequency in dB.15 

As mentioned, a full acoustic model (water column SSP plus geoacoustic model) is typically used to compute the replica r ( r , z ). Here, we will use the normal modes extracted from noise to calculate the replica vectors, thus sidestepping the use of a geoacoustic model.

Noise from the R/V Sproul during the SWellEx-96 experiment is used to extract normal modes using data from a VLA. Then, matched field processing using the extracted normal modes is used to demonstrate the source localization capability on the S5 event.

SWellEx-96 is a well-known shallow-water MFP experiment performed off the coast of San Diego, CA.16 This work uses recordings from the experiment made by a 64-element VLA with element spacing 1.875 m that spans the bottom half of the water column. The array is cut for 400 Hz, and the sampling rate of the sensors is 1500 Hz.

For normal mode extraction, two 32 min segments (labeled “noise 1” and “noise 2”) of data were selected from the recording during which sound from the ship driving in the experiment region is received on the VLA. For localization, two 10 min segments (labeled “segment A” and “segment B”) were selected from the S5 event, which was a 75 min source tow. During the source tow, both a shallow and a deep source were towed, although we focus on the deep source which was towed at a depth of 60 m and transmitted 13 tonals between 49 Hz and 388 Hz. The ship's position in the experiment during these four segments is shown in Fig. 2(a), and the source–receiver range is shown in Fig. 2(b).

Fig. 2.

(a) SWellEx-96 experiment region. The two tracks of the ship during the half-hour segment of data used for mode estimation are shown to the right of the Coronado Bank and south of the VLA. Segments A and B, used for localization are also shown. (b) Ranges from the ship to the VLA during noise 1 and noise 2 (segments used for mode extraction) and segment A and segment B (used for localization). The full ship track during the S5 event is also shown.

Fig. 2.

(a) SWellEx-96 experiment region. The two tracks of the ship during the half-hour segment of data used for mode estimation are shown to the right of the Coronado Bank and south of the VLA. Segments A and B, used for localization are also shown. (b) Ranges from the ship to the VLA during noise 1 and noise 2 (segments used for mode extraction) and segment A and segment B (used for localization). The full ship track during the S5 event is also shown.

Close modal

To obtain covariance matrices for modal-MUSIC, a short time Fourier transform (STFT) with a 2048 point Hamming-windowed fast Fourier transform and no overlap, was applied to both noise 1 and noise 2 to obtain snapshots with a time step of 1.36 s and a frequency spacing around 0.73 Hz. The roughly 1300 snapshots at each frequency between 30 and 400 Hz were then used to build the roughly 500 sample covariance matrices (one for each frequency) used in modal-MUSIC.

Similarly for the MFP application, tones were extracted from a 2048 point, Hamming-windowed STFT with a 50% overlap between snapshots, resulting in snapshots spaced at 0.68 s. The signal bins were selected as those closest to the Doppler-shifted frequency, as estimated from mean range-rates for the source over the segments (−2.45 m/s for segment A and −2.1 m/s for segment B) and a nominal sound speed of 1500 m/s. Then, covariance matrices were formed using sets of 100 snapshots, corresponding to a covariance integration time of 68 s. By overlapping each covariance integration interval by 50 snapshots, the final sequence of covariance matrices have a time step of 34 s. These covariance matrix sequences are then used for multi-tone MFP.

Estimation of the noise subspace E N required by modal-MUSIC requires estimates of the rank q ̂ ( f ) of the signal subspace for each frequency. This was estimated using the peak in the eigenvalue difference spectrum. The resulting rank estimates q ̂ ( f ) are then fit to a curve over frequency to remove outliers and noise from the subspace dimension estimates. These smoothed rank estimates are used to build up E ̂ N ( f ) at each frequency by taking the 64 q ̂ ( f ) smallest eigenvectors (64 is the number of array elements). Once the noise subspace is estimated, a conductivity–temperature–depth cast from the experiment is used to generate an SSP for shooting candidate modes, and the modal-MUSIC spectrum is computed using Eq. (9). The single modal-MUSIC spectrum from ship noise at 50.54 Hz during both noise 1 and noise 2 is compared to wavenumbers computed using an established geoacoustic model17 in Fig. 3(a).

Fig. 3.

(a) Normalized modal-MUSIC spectrum at 50.54 Hz, shown for both noise 1 and noise 2 and compared to modeled wavenumbers. (b) Normalized modal-MUSIC spectra and mode curves extracted from noise 2 data segment. The peaks are shown as black lines, and their polynomial fits are shown in blue. (b) Smoothed wavenumber curves extracted from both segments noise 1 and noise 2, shown in blue and red. The wavenumbers computed from the model are shown in black. The model and data agree fairly well up to mode 15.

Fig. 3.

(a) Normalized modal-MUSIC spectrum at 50.54 Hz, shown for both noise 1 and noise 2 and compared to modeled wavenumbers. (b) Normalized modal-MUSIC spectra and mode curves extracted from noise 2 data segment. The peaks are shown as black lines, and their polynomial fits are shown in blue. (b) Smoothed wavenumber curves extracted from both segments noise 1 and noise 2, shown in blue and red. The wavenumbers computed from the model are shown in black. The model and data agree fairly well up to mode 15.

Close modal

In order to compare the spectra across frequency, the modes are parameterized by an “angle” θ ( k r ) = arccos ( k r / k ), where k = ω / min z c ( z ) is the maximum wavenumber for the waveguide. The normalized spectra for every processed frequency are shown in Fig. 3(b) for noise 2.

The mode estimates at each frequency are obtained by picking peaks from each frequency's modal-MUSIC spectrum. The spectra are smooth enough that picking peaks based on local optima is sufficient. Peaks are thresholded at 10% of the peak-maximum to limit the selection to well-excited normal modes.

As seen in Fig. 3(b), the normal mode estimates are noisy. To obtain denoised estimates of the modes, the estimates from all 506 frequencies can be combined using a smoothness constraint. To enforce smoothness of the normal mode curves in frequency, a polynomial fit is made for each measured mode. Polynomials of increasing degree are fit through each mode curve until the change in the residuals with increasing degree falls below a threshold value (we selected 0.1), or the maximum degree of 14 is reached. These polynomial curves provide the final product of normal modes as a function of frequency, and are compared to modes computed with an established geoacoustic model for the SWellEx environment17 in Fig. 3(c) for both noise 1 and noise 2.

Typically, MFP uses a geoacoustic model to compute the replica vectors. Here, we rely on the normal modes extracted from data to compute the replica vectors for each candidate source position (r, z). These replicas are combined with the covariance matrix sequences (described in Sec. 6) to compute a sequence of ambiguity surfaces using Eq. (10). By taking the maximum over depth, a range-time ambiguity surface can be formed (and similarly, taking the maximum over range yields a depth–time ambiguity surface), which is useful for visualizing the three-dimensional range–depth–time ambiguity surface produced by MFP. The two-dimensional surfaces are shown in Fig. 4. It can be seen that the extracted normal modes allow for accurate source range and depth estimation out to a range of around 5 km, although the quality of estimation decreases with range.

Fig. 4.

Results for localization on segments A and B from S5 data use data-derived normal modes. In each of the two plots, segment A is shown on the left and segment B is shown on the right. The range and depth are well-estimated out to around 5 km. No geoacoustic model was used for these results.

Fig. 4.

Results for localization on segments A and B from S5 data use data-derived normal modes. In each of the two plots, segment A is shown on the left and segment B is shown on the right. The range and depth are well-estimated out to around 5 km. No geoacoustic model was used for these results.

Close modal

Knowledge of the water column SSP is sufficient to explore the array manifold of an array measuring a normal mode acoustic field. This opens up the use of the high resolution estimator MUSIC to then find the excited modes in the waveguide, provided a suitable source of acoustic energy. A moving source transiting a sufficient (unknown) range is a suitable source for normal mode extraction.

The estimated excited modes can be related to the environment properties for a geoacoustic inversion (not explored here), or be used directly for performing source localization without any knowledge of the bottom properties of the environment. It was shown that normal modes extracted from ship noise and smoothed over frequency had sufficient accuracy to perform localization using broadband MFP.

This work was funded by the TFO program of the United States Office of Naval Research. Bill Hodgkiss and Dave Ensberg helped with data access for the SWellEx-96 experiment.

1.
M. B.
Porter
and
E. L.
Reiss
, “
A note on the relationship between finite-difference and shooting methods for ODE eigenvalue problems
,”
SIAM J. Numer. Anal.
23
(
5
),
1034
1039
(
1986
).
2.
G. V.
Frisk
,
K. M.
Becker
,
S. D.
Rajan
,
C. J.
Sellers
,
K.
von der Heydt
,
C. M.
Smith
, and
M. S.
Ballard
, “
Modal mapping experiment and geoacoustic inversion using sonobuoys
,”
IEEE J. Ocean. Eng.
40
(
3
),
607
620
(
2015
).
3.
S. C.
Walker
,
P.
Roux
, and
W. A.
Kuperman
, “
Modal Doppler theory of an arbitrarily accelerating continuous-wave source applied to mode extraction in the oceanic waveguide
,”
J. Acoust. Soc. Am.
122
(
3
),
1426
1439
(
2007
).
4.
Y.
Park
,
P.
Gerstoft
, and
W.
Seong
, “
Grid-free compressive mode extraction
,”
J. Acoust. Soc. Am.
145
(
3
),
1427
1442
(
2019
).
5.
T. C.
Yang
, “
Data-based matched-mode source localization for a moving source
,”
J. Acoust. Soc. Am.
135
(
3
),
1218
1230
(
2014
).
6.
T. C.
Yang
, “
Source depth estimation based on synthetic aperture beamforming for a moving source
,”
J. Acoust. Soc. Am.
138
(
3
),
1678
1686
(
2015
).
7.
J.
Bonnel
,
A.
Thode
,
D.
Wright
, and
R.
Chapman
, “
Nonlinear time-warping made simple: A step-by-step tutorial on underwater acoustic modal separation with a single hydrophone
,”
J. Acoust. Soc. Am.
147
(
3
),
1897
1926
(
2020
).
8.
H.
Niu
,
P.
Gerstoft
,
E.
Ozanich
,
Z.
Li
,
R.
Zhang
,
Z.
Gong
, and
H.
Wang
, “
Block sparse Bayesian learning for broadband mode extraction in shallow water from a vertical array
,”
J. Acoust. Soc. Am.
147
(
6
),
3729
3739
(
2020
).
9.
H.
Niu
,
P.
Gerstoft
,
R.
Zhang
,
Z.
Li
,
Z.
Gong
, and
H.
Wang
, “
Mode separation with one hydrophone in shallow water: A sparse Bayesian learning approach based on phase speed
,”
J. Acoust. Soc. Am.
149
(
6
),
4366
4376
(
2021
).
10.
T. B.
Neilsen
and
E. K.
Westwood
, “
Extraction of acoustic normal mode depth functions using vertical line array data
,”
J. Acoust. Soc. Am.
111
(
2
),
748
756
(
2002
).
11.
P.
Hursky
,
W. S.
Hodgkiss
, and
W. A.
Kuperman
, “
Matched field processing with data-derived modes
,”
J. Acoust. Soc. Am.
109
(
4
),
1355
1366
(
2001
).
12.
R.
Schmidt
, “
Multiple emitter location and signal parameter estimation
,”
IEEE Trans. Antennas Propag.
34
(
3
),
276
280
(
1986
).
13.
T.-J.
Shan
,
M.
Wax
, and
T.
Kailath
, “
On spatial smoothing for direction-of-arrival estimation of coherent signals
,”
IEEE Trans. Acoust., Speech, Signal Process.
33
(
4
),
806
811
(
1985
).
14.
M.
Porter
and
E. L.
Reiss
, “
A numerical method for ocean-acoustic normal modes
,”
J. Acoust. Soc. Am.
76
(
1
),
244
252
(
1984
).
15.
A. B.
Baggeroer
,
W. A.
Kuperman
, and
H.
Schmidt
, “
Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem
,”
J. Acoust. Soc. Am.
83
(
2
),
571
587
(
1988
).
16.
Information on the swellex-96 experiment
available at http://www.mpl.ucsd.edu/swellex96 (
1996
).
17.
P. A.
Baxley
,
N. O.
Booth
, and
W. S.
Hodgkiss
, “
Matched-field replica model optimization and bottom property inversion in shallow water
,”
J. Acoust. Soc. Am.
107
(
3
),
1301
1323
(
2000
).