Conventional direction-of-arrival (DOA) estimation algorithms for shallow water environments typically contain large errors due to the presence of many acoustic reflective surfaces and scattering fields. The magnitude and DOA of an acoustic signature can be estimated from a single acoustic vector sensor, and DOA algorithms are used to reduce the error in these estimates. Three experiments were conducted using a moving boat as an acoustic target in a waterway in Houghton, Michigan. The shallow and narrow waterway is a complex and non-linear environment for DOA estimation. This paper compares minimizing DOA errors using conventional and machine learning algorithms. The conventional algorithm uses frequency-masking averaging, and the machine learning algorithms incorporate two recurrent neural network architectures, one shallow and one deep. Results show that the deep neural network models the shallow water environment better than the shallow neural network, and both networks outperform the frequency-masking average method.

Source direction-of-arrival (DOA) estimation in shallow water has seen strong advancements for applied water acoustics in the past decade with success specifically in machine learning (Niu et al., 2017a; Niu et al., 2017b; Wang and Peng, 2018). It is of interest to determine the location of anthropogenic sources for many applications: naval operations, merchant shipping, and environmental studies, to name a few. Using neural networks to estimate the DOA of an underwater acoustic source is of recent interest, including the use of multi-layer perceptron (MLP) networks (Ozanich et al., 2020; Yangzhou et al., 2019; Zou et al., 2017), convolutional neural networks (CNNs) (Cao et al., 2021; Ferguson et al., 2019), and recurrent neural networks (RNNs) (Huang et al., 2019; Qin et al., 2020).

This paper discusses conventional and machine learning methods of improving surface-water angle-finding utilizing a single underwater acoustic vector sensor (AVS). Generally, multiple sensors working together are required to find the angle-of-arrival of a signal source (Huang et al., 2018; Trees, 2002; Yangzhou et al., 2019). A pressure-particle acceleration (pa) AVS is capable of determining the angle-of-arrival with a triaxial piezoelectric accelerometer in a neutrally buoyant body. The triaxial accelerometer in the AVS generates a vector quantity of the DOA of the acoustic wave (Bereketli et al., 2015; Fahy, 1995; Kang et al., 2004). There are different types of AVSs: pressure-particle velocity (pu), pa, pressure-pressure (pp), and particle velocity-particle velocity (uu); all have their advantages and disadvantages. This paper solely discusses angle-finding utilizing a Meggitt VS-209 (Wilcoxon Sensing Technologies, Frederick, MD) underwater pa AVS for its broader frequency response, though the methods described here would generalize to any AVS.

We will investigate a shallow RNN architecture and a deep RNN architecture as the machine learning algorithms in the paper. The parameters, such as the inner node lengths and depth of the network, were tested and compared for accuracy. The best models we found with our data are shown in Sec. IV.

The Meggitt VS-209 AVS consists of a hydrophone and a triaxial accelerometer whose x, y, and z axes are oriented, as shown in Fig. 1, with respect to the physical sensor. The underwater pa-type AVS records the particle acceleration in three orthogonal axes together with a scalar underwater sound pressure measurement. The particle acceleration and sound pressure are combined to produce a sound intensity vector, where the intensity vector contains the strength and angle-of-arrival of all the incident wavefronts.

FIG. 1.

(Color online) Underwater AVS accelerometer orientation.


The estimation techniques in this paper require some post-processing of the AVS data. Let a_x(t), a_y(t), and a_z(t) be the three components of the time-domain accelerometer data, and let p(t) be the pressure time-series data from the underwater pa AVS. To account for sensor bandwidth and noise, the sensor measurements are first projected into the frequency domain, where A_x(ω) = F{a_x(t)} is the Fourier transform of a_x(t), and likewise for each remaining component of the sensor data. Since we are concerned with a moving acoustic source, a short-time Fourier transform (STFT) captures its time-dependence. Using the STFT, we compute A_x, A_y, A_z, P ∈ ℂ^{N×T} for the three time-domain accelerometer channels and the hydrophone channel, respectively, where N is the block-size of the STFT and T is the number of time-series samples divided by the block-size, rounded down. Equations (1) and (2) are computed along each axis, with only the x axis shown for brevity. The measurements are composed into the cross power spectra via

G_{A_xP} = A_x^* ⊙ P,  (1)

where A_x^* is the complex conjugate of the frequency domain accelerometer data in the x axis direction and P is the pressure vector. With the cross power spectra, G_{A_xP} ∈ ℂ^{N×T}, the acoustic intensity is computed as

I_x = Im{G_{A_xP}} / ω,  (2)

where I_x ∈ ℝ^{N×T} are the active intensity levels in the x axis direction. The intensities are computed for all three axes, i.e., the x, y, and z directions corresponding to the three-axis accelerometer. With the three AVS-relative intensity orientations, an intensity vector, I_r = (I_x, I_y, I_z)^T ∈ ℝ^{3×N×T}, can be composed. The intensity vector is relative to the orientation of the AVS as shown in Fig. 1.
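As a concrete illustration, the pipeline of Eqs. (1) and (2) can be sketched with scipy's STFT. This is a minimal sketch, not the paper's implementation: the scaling and sign conventions, the handling of the DC bin, and the toy test signal are all assumptions.

```python
import numpy as np
from scipy.signal import stft

def active_intensity(p, ax, ay, az, fs=17067, nperseg=1706):
    """Sketch of Eqs. (1)-(2): STFT each channel, form the cross power
    spectrum G = A* . P per axis, and take the active intensity.
    Scaling/sign conventions here are assumptions for illustration."""
    f, _, P = stft(p, fs=fs, nperseg=nperseg)   # pressure, N x T bins
    I = []
    for a in (ax, ay, az):
        _, _, A = stft(a, fs=fs, nperseg=nperseg)
        G = np.conj(A) * P                       # Eq. (1): cross power spectrum
        w = 2 * np.pi * np.maximum(f, f[1])      # avoid divide-by-zero at DC
        I.append(np.imag(G) / w[:, None])        # Eq. (2): active intensity
    return f, np.stack(I)                        # (3, N, T) intensity vector I_r

# toy usage: a 1 kHz tone with pressure and x acceleration 90 deg apart
t = np.arange(0, 1.0, 1 / 17067)
p = np.sin(2 * np.pi * 1000 * t)
ax = np.cos(2 * np.pi * 1000 * t)
f, Ir = active_intensity(p, ax, np.zeros_like(t), np.zeros_like(t))
```

With a 1706-sample block, the STFT yields 854 one-sided frequency bins per frame, so `Ir` has shape (3, 854, T).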

The Meggitt VS-209 AVS has a magnetic heading sensor and a gravitational sensor to remove any relative orientation in data collection. The pitch, roll, and heading are the respective rotations along the x, y, and z axes in Fig. 1. A rotation matrix, Qfixed, is calculated from the magnetic and gravitational sensors (Penhale, 2019), such that

I_g = Q_fixed I_r.  (3)

After the rotation, the intensity vector Ig is no longer oriented with respect to the sensor's orientation; instead, it is oriented relative to magnetic north and the gravity vector. We call this a global coordinate system, and global angle measurements are now considered for localization.
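The rotation of Eq. (3) can be sketched as follows. The Euler-angle convention is an assumption (the paper follows Penhale, 2019), and the toy heading is chosen only to make the effect visible.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def to_global(Ir, roll_deg, pitch_deg, heading_deg):
    """Sketch of Eq. (3): rotate the sensor-relative intensity vector I_r
    into the global coordinate system using pitch, roll, and heading
    (rotations about the x, y, and z axes). The Euler convention used
    here is an assumption for illustration."""
    Q = Rotation.from_euler("xyz", [pitch_deg, roll_deg, heading_deg],
                            degrees=True).as_matrix()
    # Ir has shape (3, N, T); apply Q_fixed to the leading (axis) dimension
    return np.einsum("ij,jnt->int", Q, Ir)

# toy usage: a 90-degree heading rotates the x-axis intensity onto y
Ir = np.zeros((3, 4, 5)); Ir[0] = 1.0
Ig = to_global(Ir, 0.0, 0.0, 90.0)
```

After the rotation, all frames share one global frame regardless of how the sensor was deployed.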

The re-oriented intensity vector, I_g = (I_west, I_north, I_up)^T, is then converted to a spherical coordinate system with

|I| = √(I_west² + I_north² + I_up²),  (4a)
Θ = tan⁻¹(I_north / I_west),  (4b)
Φ = tan⁻¹(I_up / √(I_west² + I_north²)),  (4c)
where |I|, Θ, and Φ are the magnitude of the acoustic intensity vector, azimuth angle, and elevation angle of the received signal, respectively. Notice that each of these is a function of frequency and time. The magnitude of the intensity vector shows the signal strength at each frequency at a specific time; |I| is an indicator of the signal-to-noise ratio (SNR) in the system. The two angles show the DOA of the incident sound wave at each frequency at a specific time. If a particular magnitude of the signal, |I_{ω_i,t_i}|, is at the noise floor, then the associated angles-of-arrival, θ_{ω_i,t_i} and ϕ_{ω_i,t_i}, correspond to a DOA of noise; therefore, the measurement at that frequency is not a useful measurement. A noise gate is used to remove these angles at the noise floor in post-processing. Table I shows the post-processing parameters used in this paper.
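The spherical conversion and noise gate can be sketched as below. Two simplifications are assumptions: the axis pairing inside `arctan2`, and gating relative to the peak bin rather than the paper's absolute −40 dB re 1 pW/m² threshold.

```python
import numpy as np

def spherical_doa(Ig, gate_db=-40.0):
    """Sketch of Eq. (4): convert the global intensity vector
    I_g = (I_west, I_north, I_up) to magnitude, azimuth, and elevation,
    then noise-gate bins below the threshold. Gating is done here
    relative to the peak bin, a simplification of the paper's setup."""
    Iw, In, Iu = Ig
    mag = np.sqrt(Iw**2 + In**2 + Iu**2)                # Eq. (4a)
    theta = np.degrees(np.arctan2(In, Iw))              # Eq. (4b): azimuth
    phi = np.degrees(np.arctan2(Iu, np.hypot(Iw, In)))  # Eq. (4c): elevation
    level_db = 10 * np.log10(mag / mag.max() + 1e-30)   # level re the peak bin
    keep = level_db > gate_db                           # noise gate mask
    return mag, np.where(keep, theta, np.nan), np.where(keep, phi, np.nan)

# toy usage: equal west/north intensity gives a 45-degree azimuth
Ig = np.array([[[1.0]], [[1.0]], [[0.0]]])
mag, theta, phi = spherical_doa(Ig)
```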

TABLE I.

Post-processing parameters.

Parameter               Value
Sample rate             17 067 Hz
STFT block-size         1706 samples
STFT zero padding       1024 samples
Noise gate threshold    −40 dB (re 1 pW/m²)
Frequency range         100–8000 Hz

In the experiments in this paper, all signal sources are assumed to be on the surface of the water; hence, we only need to estimate the azimuth angle Θ from the AVS signals. Also, note that this paper focuses on DOA estimation, so range is not of interest. To determine the estimated azimuth angle, θ*, of the signal source in our experiment, Θ must be processed along its frequency axis into a single angle prediction at each time step, such that

θ_t* = g(θ_{f_1,t}, θ_{f_2,t}, …, θ_{f_N,t}).  (5)

To process Θ in a machine learning approach, a linear regression—i.e., single-layer perceptron (SLP) network—can be trained to output θ* using the input Θ. Comparatively, a conventional approach can average Θ along its frequency axis to generate a θ* angle prediction.

After processing Θ to estimate θ*, time-series filtering can be performed to smooth out the effect of noise and outliers to generate more realistic results. Considering machine learning, our hypothesis is that a RNN architecture can be trained to output a better estimate of θ* than conventional averaging, enhancing the localization performance of the AVS.

We use a weighted average with our experimental data to demonstrate a conventional approach for combining the predicted DOA of an acoustic signal from an AVS. For each frequency component in the AVS signal, there is an angle measurement Θ and intensity measurement |I|. The intensity measurement is directly proportional to the SNR; hence, the intensity is used as a weight for the angle measurement. The sample-based average of the weighted angles is the estimated θ*. It follows that

θ_t* = ( Σ_{i=1}^{N} I_{f_i,t} θ_{f_i,t} ) / ( Σ_{i=1}^{N} I_{f_i,t} ),  (6)

with the intensities, I, in dB scale normalized on the interval [0, 1], and each f_i term corresponding to a frequency bin, i = 1, 2, …, N. This estimate gives more weight to an angle that has a stronger corresponding intensity, under the assumption that this signal emanates from the direct path of the source to be localized. This approach works well with high SNR measurements (Bereketli et al., 2015), though the results deteriorate appreciably with band limited, low SNR responses, as demonstrated in Sec. V. When the acoustic source generates a strong signal, the acoustic intensity, I, at that point dominates the weighted average, while a weak signal will vary greatly depending upon the noise. To address this degraded performance with low SNR measurements, we next explore use of a SLP as an alternative approach to estimate DOA.
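The weighted average of Eq. (6) can be sketched as follows. Normalizing the dB intensities by their own min/max is an assumption about how the [0, 1] scaling is done, and the straight arithmetic average ignores angle wrap-around, which the paper addresses separately in its loss function.

```python
import numpy as np

def weighted_average_doa(theta, intensity_db):
    """Sketch of Eq. (6): at each time step, average the per-frequency
    angles weighted by intensity (dB values normalized to [0, 1]).
    The min/max normalization is an assumption for illustration."""
    I = intensity_db - intensity_db.min()
    w = I / (I.max() + 1e-30)              # weights on [0, 1]
    num = (w * theta).sum(axis=0)          # sum over frequency bins f_i
    return num / (w.sum(axis=0) + 1e-30)   # theta* per time step

# toy usage: 2 frequency bins x 2 time steps; the louder bin dominates
theta = np.array([[10.0, 20.0], [30.0, 40.0]])
Idb = np.array([[0.0, -10.0], [-10.0, 0.0]])
est = weighted_average_doa(theta, Idb)
```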

While the weighted average is a reasonable approach for processing the AVS measurements into a predicted DOA, there are numerous sources of error that are not taken into account. The source may be a band limited signal and thus only be present in certain frequencies; there may be signal outside these bands that emanates from other sources, say marine mammals, other underwater activity, or noise. Hence, to implicitly learn the best relationship between the AVS measurements, |I| and Θ, we will employ machine learning, specifically a neural network. For this experiment, we use a SLP network regression to process the frequency domain of the signal. The SLP network processes the frequency domain angle measurements by

θ_t* = w_f^T θ_t + b,  (7)

where w_f is a vector of weights for each frequency bin in θ_t and b is a scalar bias. In essence, if w_f = 1/N, ∀f, where N is the number of frequency bins, and b = 0, then the neural network would estimate a non-weighted average of the angle measurements across the frequency axis. To create a weighted average, the neural network learns w_f and b such that it minimizes the root-mean squared error (RMSE), E,

E = √( (1/T) Σ_{t=1}^{T} (θ_t* − θ_t^true)² ),  (8)

where θ_t^true is the true angle measurement (or label), and θ_t* is the neural network's prediction at each time step t.

Since the AVS is the source of the angle measurements, the neural network must minimize a modified RMSE that accounts for the AVS's polar nature. The angle measurements are wrapped to the −180° to 180° range, so a circular RMSE, where the error is the angular difference between two angles, is necessary. This is important because a prediction of −179° with a true angle of 179° should have an angle difference of 2°; a standard RMSE would score an angle difference of 358°, overly penalizing this small error. The circular mean squared error that the neural network incorporates is

E = √( (1/T) Σ_{t=1}^{T} [min(d_t, 360° − d_t)]² ),  (9)

where d_t = |θ_t* − θ_t^true| is the absolute difference of the predicted and true angles at each time step t. The SLP processes the AVS measurements in a linear fashion [see Eq. (7)]; hence, this algorithm may be unable to capture non-linearity present in the system. Thus, we next describe a neural network architecture that can better model non-linearities.
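The wrapped error of Eq. (9) can be sketched in a few lines; this numpy version mirrors what the training loss would compute.

```python
import numpy as np

def circular_rmse(pred_deg, true_deg):
    """Sketch of Eq. (9): wrap the absolute angle difference so that,
    e.g., -179 deg vs. 179 deg scores 2 deg instead of 358 deg."""
    d = np.abs(pred_deg - true_deg) % 360.0
    d = np.minimum(d, 360.0 - d)      # take the shorter way around the circle
    return np.sqrt(np.mean(d ** 2))

# toy usage: the wrap-around case from the text
err = circular_rmse(np.array([-179.0]), np.array([179.0]))
```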

The SLP network is useful in determining the frequencies at which a band limited signal is present; the learned weights w_f in Eq. (7) show how the SLP weights the measurements at each frequency. On the other hand, the SLP architecture does not handle time-dependent parameters or non-linearity in the environment. Following Eq. (7), the SLP estimate at each time step, t, is calculated independently of the others. A RNN, however, considers both the current and previous samples (Connor and Atlas, 1991). Thus, a RNN is better able to handle temporal aspects of the signal, creating a time-dependence in its predictions by looking at previous samples. We use a conventional form of a RNN, a fully recurrent neural network with no gates, as the simplest neural network model. A fully recurrent neural network predicts with the n previous samples and its current sample,

s_t = tanh(w θ_t + h s_{t−1}),  (10)

where w and h are trainable parameters and s_t is the recurrent state. Equation (10) is repeated n times, once for each θ_t, until

s_t = tanh(w θ_t + h s_{t−1}),  (11a)
θ_t* = s_t + b,  (11b)

where b is also a trainable bias parameter. There is an inherent issue with fully recurrent architectures: w is back-propagated n times during training, and values of w significantly greater than 1 or significantly less than 1 cause exploding or vanishing gradients, respectively (Bengio et al., 1994). For example, with n = 20 and w = 1.4, the gradient factor grows to 1.4^20 ≈ 836. A SLP is used to reduce the dimensionality of the RNN backbone, and a small n value is used to prevent gradient descent from failing due to this issue. The weights in the RNN, namely w, h, and b, are learned using the truncated backpropagation through time (TBPTT) algorithm (Werbos, 1990) to minimize E in Eq. (9).
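The recurrence of Eqs. (10) and (11) can be sketched as a single-unit, gate-free cell unrolled over the last n samples. The weight values here are fixed stand-ins for illustration; in the paper they are learned with TBPTT.

```python
import numpy as np

def simple_rnn_predict(theta_slp, w, h, b, n=5):
    """Sketch of Eqs. (10)-(11): a single-unit fully recurrent network
    unrolled over the last n SLP outputs. The parameters w, h, b are
    fixed stand-ins here; the paper learns them with TBPTT."""
    s = 0.0
    for x in theta_slp[-n:]:           # Eq. (10), repeated n times
        s = np.tanh(w * x + h * s)     # current sample plus previous state
    return s + b                       # Eq. (11): output with trainable bias

# toy usage: an all-zero input sequence simply returns the bias
out = simple_rnn_predict(np.zeros(10), w=1.4, h=0.9, b=0.5)
```

Note that the tanh keeps the state bounded in the forward pass; the exploding-gradient issue described above appears only in the backward pass, where w compounds once per unrolled step.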

The output of a RNN is either multi-input, multi-output (MIMO) or multi-input, single-output (MISO), as shown in Fig. 2. In this paper, the MIMO-type RNN is used for internal layers. With the output of the MIMO-type RNN having the same vector length as the input, the internal layers can be connected multiple times, permitting use of a deep neural network (DNN) architecture. The MISO-type RNN is used for the final prediction layer so that a single prediction is made, θ*. The MISO-type RNN is useful for predicting a single angle measurement based on the previous n samples. Both the SLP network and the RNN network can be combined such that the output of one network is the input of another. Now that we have described the basis of the three main algorithms we will use for predicting DOA, we turn to our experiments.
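The MIMO/MISO distinction can be sketched in numpy: a MIMO layer returns its full output sequence so another layer can stack on top, while a MISO layer returns only the final state. The weights are random stand-ins for trained parameters, and the sizes follow the 32-unit layers used later in Table III.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(x, Wx, Wh, return_sequences):
    """Sketch of Fig. 2: step a gate-free recurrent layer through time.
    return_sequences=True gives MIMO (stackable), False gives MISO
    (single output for the final prediction)."""
    h = np.zeros(Wh.shape[0])
    seq = []
    for t in range(x.shape[0]):               # step through the time axis
        h = np.tanh(x[t] @ Wx + h @ Wh)
        seq.append(h)
    return np.stack(seq) if return_sequences else h

# toy usage: 5 time steps of a 1-feature input, MIMO feeding MISO
x = rng.normal(size=(5, 1))
mimo = rnn_layer(x, rng.normal(size=(1, 32)), rng.normal(size=(32, 32)), True)
miso = rnn_layer(mimo, rng.normal(size=(32, 32)), rng.normal(size=(32, 32)), False)
```

In Keras this corresponds to the `return_sequences` flag on recurrent layers, with a final dense layer mapping the MISO state to θ*.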

FIG. 2.

(a) MIMO-type RNN and (b) MISO-type RNN.


To record angle data, we staged collections from three events on the Keweenaw Portage Waterway in Houghton, Michigan, on July 14, July 27, and August 18, 2020. Figure 3 shows the location of the Keweenaw Portage Waterway in Michigan. The events consisted of driving a boat near the AVS while recording the boat's GPS position at a 1 Hz sample rate. The three experiments total roughly 79 min of GPS and acoustic data. A bathymetric cross section and measured sound speed profile are shown in Sec. VI.

FIG. 3.

(Color online) Experiment location at Keweenaw Portage Waterway: (A) location in upper peninsula of Michigan, (B) location in Keweenaw peninsula, and (C) on-site location of experiment.


The sensor data were recorded using a data acquisition (DAQ) unit, National Instruments (NI) cRIO-9035, which has eight slots for NI C-series modules. The C-series modules used in this setup were two NI-9234 analog-to-digital converters (ADCs) for reading the acoustic data, one NI-9467 GPS receiver for timing and location, and one NI-9344 switch module for system-related control. The NI-9234 ADC has 24-bit precision and stored each data point as a 32-bit, single-precision floating point number. The acoustic data collected on the cRIO-9035 were sampled at 17.067 kHz and chunked into 4-min intervals. These intervals are continuous, meaning that there are no missing data between each 4-min interval. The 17.067 kHz sample rate was used because it is the closest discrete sample rate available on the NI-9234 module above the Meggitt VS-209 pa AVS's 3-dB frequency cutoff, which is above 7 kHz.

The post-processing of these data, described in Table I, converts the 17.067 kHz sampled data into 1023 frequency bins at a block-size of 0.1 s using the STFT. The four AVS channels are used to generate Θ in Eq. (4). Since the GPS data were recorded at 1 Hz, we linearly interpolated between GPS measurements to match the time interval at which the AVS data were post-processed. Figure 4 shows the 1 Hz rate at which the GPS locations were mapped onto the Keweenaw Portage Waterway.
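The interpolation step can be sketched with numpy; the timestamps and coordinates below are hypothetical stand-ins for the recorded track.

```python
import numpy as np

# GPS fixes arrive at 1 Hz, while post-processed AVS frames arrive every
# 0.1 s (the Table I block-size), so the boat track is linearly
# interpolated onto the STFT time axis. Coordinates are hypothetical.
gps_t = np.arange(0.0, 10.0, 1.0)                  # 1 Hz GPS timestamps (s)
gps_lat = np.linspace(47.120, 47.130, gps_t.size)  # stand-in latitude track
stft_t = np.arange(0.0, 9.0, 0.1)                  # 0.1 s STFT frame times
lat_interp = np.interp(stft_t, gps_t, gps_lat)     # labels matched to frames
```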

FIG. 4.

(Color online) First experiment's GPS data.


Table II shows the parameters used within the two compared RNN architectures, and Table III shows the layer structures, which are illustrated visually in Fig. 5. The optimizer used is stochastic gradient descent (SGD) with a learning rate of 0.01. No activation function is used on the output layer of the neural network to prevent any skewing of the angle measurement data. The experimental data are split between training and testing 20 times, so that 20 different models are generated per neural network architecture and every portion of the data set is tested in a cross-fold validation setup. Within a single data split, 5% of the training data is held out as validation data, and the model state with the lowest validation error is used to predict the test data in each fold. To generate the network architectures, we use the Keras open-source library for its simple modularity and ease of use. Since Keras is written in Python, the AVS post-processing in Sec. II B is also written in Python.
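The evaluation setup described above can be sketched as follows. Contiguous fold boundaries and the random validation draw are assumptions; the paper does not specify how fold membership is assigned.

```python
import numpy as np

def cross_fold_splits(n_samples, n_folds=20, val_frac=0.05, seed=0):
    """Sketch of the evaluation setup: 20 folds so every portion of the
    data set is tested once; within each fold, 5% of the training data
    is held out for validation. Fold boundaries here are an assumption."""
    idx = np.arange(n_samples)
    rng = np.random.default_rng(seed)
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        val_idx = rng.choice(train_idx, size=int(len(train_idx) * val_frac),
                             replace=False)
        yield np.setdiff1d(train_idx, val_idx), val_idx, test_idx

# toy usage on 1000 hypothetical time steps
splits = list(cross_fold_splits(1000))
```

Each of the 20 splits would then train one model, with early selection on the validation subset before predicting its test fold.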

TABLE II.

Experimental parameters.

Parameter                      Value
SLP activation                 None
RNN activation                 tanh
RNN lookback                   5 steps
Epochs                         20
Train/Validation/Test split    90%/5%/5%
Optimizer                      SGD
Learning rate                  0.01
TABLE III.

RNN architecture shape.

Layer type    Deep RNN dimensions    Shallow RNN dimensions
SLP           1023×1                 1023×1
RNN           1×32                   1×1
RNN           32×32                  —
RNN           32×32                  —
SLP           32×1                   1×1
FIG. 5.

(Color online) (a) Deep RNN and (b) shallow RNN architectures.


All results in this section only use the test data defined per model described in Sec. IV. Once the networks have been trained on the experiment training data, the networks are compared with one another. The RMSEs of the test data follow Eq. (9) and are shown in Table IV.

TABLE IV.

RMSE results of experiments.

        Weighted average    Shallow RNN    Deep RNN
RMSE    39.4°               33.5°          24.8°
SD      45.3°               22.4°          13.8°

Each neural network has its test data folded 20 times and averaged to yield a RMSE and standard deviation (SD). The time-series predictions of the different algorithms are compared to the total testing truth data in Fig. 6 with Fig. 6(b) using a Kalman filter added to the output of each algorithm. The covariance of the process noise (Q=106) and covariance of the observation noise (R = 0.025) are chosen empirically to show the differences between each algorithm along a larger portion of the data set. It should be noted that no results other than Fig. 6(b) use these filtered data; every other figure, table, result, and discussion uses the original algorithm data.
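The post-filter used in Fig. 6(b) can be sketched as a scalar random-walk Kalman filter with the stated covariances Q = 10⁻⁶ and R = 0.025. The random-walk state model is an assumption, and this sketch ignores angle wrap-around, which a production filter on azimuth data would need to handle.

```python
import numpy as np

def kalman_smooth(z, Q=1e-6, R=0.025):
    """Sketch of the Fig. 6(b) post-filter: a scalar random-walk Kalman
    filter with the paper's process noise Q and observation noise R,
    applied to an algorithm's angle predictions. Wrap-around at +/-180
    degrees is ignored in this simplified sketch."""
    x, P = z[0], 1.0
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        P = P + Q                 # predict step (random-walk state model)
        K = P / (P + R)           # Kalman gain
        x = x + K * (zk - x)      # update with the new prediction
        P = (1 - K) * P
        out[k] = x
    return out

# toy usage: a constant angle corrupted by an oscillating disturbance
noisy = np.radians(45) + 0.1 * np.sin(np.arange(200))
smooth = kalman_smooth(noisy)
```

With Q this small, the steady-state gain is tiny and the filter heavily attenuates fast fluctuations, matching its role of exposing slow trends across the full test set.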

FIG. 6.

(Color online) (a) Subset of algorithm predictions and (b) full test data predictions with Kalman filter.


The results show that the trained deep RNN has the lowest total error throughout the data set, but a single RMSE does not fully convey the deep RNN's performance. Another representation is the average angle error with respect to the SNR of the signal. The SNR is calculated by subtracting the ambient acoustic intensity from the acoustic source intensity. A 4-min time average recorded before the acoustic experiment is used as the ambient acoustic signature. Figure 7 shows the comparison of the acoustic source signal at different boat distances, each with a 30-s time average.

FIG. 7.

(Color online) Comparison of ambient background and acoustic signal source at varying SNRs.


Figure 8 shows the error with respect to SNR. These data are presented by averaging the RMSEs within 0.5-dB SNR bins and then comparing the results of the three estimation techniques. For example, in the discrete SNR range of 10–10.5 dB, there are 121 error points, and the mean of these errors for the deep RNN is 13.47°; the shallow RNN and weighted average have mean errors of 30.26° and 44.22°, respectively, in this range. To prevent discrepancies, any SNR bin containing fewer than five samples is removed. The data with high SNR correspond to a small portion of very fast crossings of the boat driving by the sensor. Due to the high vessel speed, experimental timing errors become noticeable in these data.
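The construction behind Fig. 8 can be sketched as below. The synthetic SNR/error data are hypothetical stand-ins, used only to exercise the binning and the five-sample cutoff.

```python
import numpy as np

def binned_mean_error(snr_db, err_deg, bin_width=0.5, min_count=5):
    """Sketch of Fig. 8's construction: average the angle errors inside
    each 0.5 dB SNR bin and discard bins holding fewer than five samples."""
    edges = np.arange(snr_db.min(), snr_db.max() + bin_width, bin_width)
    which = np.digitize(snr_db, edges)
    centers, means = [], []
    for b in np.unique(which):
        sel = which == b
        if sel.sum() >= min_count:            # drop sparse bins
            centers.append(snr_db[sel].mean())
            means.append(err_deg[sel].mean())
    return np.array(centers), np.array(means)

# toy usage: synthetic errors that shrink as SNR grows
rng = np.random.default_rng(1)
snr = rng.uniform(0, 20, 500)
err = 40.0 - 1.5 * snr + rng.normal(0, 2, 500)
c, m = binned_mean_error(snr, err)
```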

FIG. 8.

(Color online) Algorithm mean errors with respect to SNR.


Of particular note is the range from 0 to 20 dB SNR, where both RNN architectures perform significantly better than the weighted average. The shallow RNN produces results slightly better than a weighted average of the angles, and the deep RNN produces results significantly better than both the shallow RNN and the weighted average inside this range. The shallow RNN architecture adds some non-linearity to the algorithm, and the large amount of training data permits the use of a deeper RNN without overfitting during training.

Each model converges in quality at a SNR of 20 dB. We see that the weighted average algorithm performs equally well as the neural network architectures at this SNR. A SNR of 20 dB is high enough for the weighted average, a linear model, to perform as well as the neural networks, a non-linear model. Our data find the neural networks unnecessary for signals above 20 dB SNR in our acoustic environment.

At some points in these data, the acoustic source's distance from the AVS is too large and/or there is no direct acoustic path to the AVS. Using solely the weighted frequency intensity analysis, the results are poor at high angle values, above 100°, as shown in Fig. 9. The high angles correspond to the boat being west of the sensor (Fig. 4), far from the sensor and with no direct acoustic path present. These data are kept in the analysis, as the purpose of the machine learning algorithms is to work with these highly noisy signals and still estimate the DOA with higher accuracy than the weighted average; the results in Table IV show this is the case.

FIG. 9.

(Color online) First experiment's data with (a) weighted average analysis and (b) the distance from the source.


The experimental data contain multi-path interference. To validate this claim, two simulations were created to compare the Portage Waterway acoustic channel and an open field. Figure 10 shows a comparison of two RAMGeo (Collins, 1993) simulations (one with multi-path and one without) and the corresponding experimental data. The same source-receiver distances, taken from the experimental GPS track of a single pass in Fig. 4, are used in all panels of Fig. 10, and each simulation time step is computed independently. The Portage Waterway simulation parameters, drawn from recorded bathymetry and water sound speed on the Portage Waterway, are shown in Fig. 11. Note that the sound speed varies by less than 0.05 m/s about 1471.5 m/s.

FIG. 10.

(Color online) Moving source past sensor of a single pass for (a) Portage Waterway simulation, (b) open water simulation, and (c) Portage Waterway experimental data.

FIG. 11.

(Color online) Portage Waterway environment simulation input from historical measured data.


The open water simulation has the same sound speed profile with infinite depth. The swept frequency patterns are a common result of acoustic interference from a moving source in a channel, while the open water simulation contains very little of this pattern. Multi-path constructive and destructive interference is present in the shallow waveguide both in the Portage Waterway simulation and in the Portage Waterway experimental data. The experimental data also show electrical power noise at harmonics of 60 Hz, common when working with alternating current (ac) power in a marine environment.

In this paper, we compared two types of RNNs and a weighted acoustic intensity average to predict the direction-of-arrival from acoustic vector sensor data. The RNNs helped in predicting the temporal aspect of a moving acoustic source. The weighted acoustic intensity average was a good indicator to determine the benefits of using deep learning. Our real-world experiment results suggest that DNNs are a strong candidate for use for direction-of-arrival estimation in high-noise scenarios. Conversely, if the signal has a relatively high SNR—our data show that in our environment the threshold is around 25 dB SNR—linear methods, such as weighted averaging or SLPs, suffice.

These results encourage further study of the use of machine learning for localization with multiple acoustic vector sensors in difficult-to-model acoustic environments. There is also an opportunity to analyze detection and estimation tasks in near-shore ice in Houghton's surrogate Arctic environment (Penhale, 2019; Penhale et al., 2018) with the neural network models. Near-shore ice has been shown to be a difficult acoustic environment (Penhale, 2019; Penhale et al., 2018), and we anticipate that machine learning will prove to be a good candidate for increased performance in detection and estimation tasks in this scenario. We are currently carrying out experiments to test this hypothesis. Future work will also examine advanced machine learning methods, such as other deep network architectures, including long short-term memory networks (Hochreiter and Schmidhuber, 1997) and transformers (Vaswani et al., 2017), enabled by ongoing data collection.

This work was funded by the United States Naval Undersea Warfare Center and Naval Engineering Education Consortium (NEEC) (Grant No. N00174-19-1-0004) and the Office of Naval Research (ONR) (Grant No. N00014-20-1-2793). This is Contribution No. 76 of the Great Lakes Research Center at Michigan Technological University.

1. Bengio, Y., Simard, P., and Frasconi, P. (1994). "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw. 5(2), 157–166.
2. Bereketli, A., Guldogan, M. B., Kolcak, T., Gudu, T., and Avsar, A. L. (2015). "Experimental results for direction of arrival estimation with a single acoustic vector sensor in shallow water," J. Sens. 2015, 401353.
3. Cao, H., Wang, W., Su, L., Ni, H., Gerstoft, P., Ren, Q., and Ma, L. (2021). "Deep transfer learning for underwater direction of arrival using one vector sensor," J. Acoust. Soc. Am. 149(3), 1699–1711.
4. Collins, M. D. (1993). "A split-step Padé solution for the parabolic equation method," J. Acoust. Soc. Am. 93(4), 1736–1742.
5. Connor, J., and Atlas, L. (1991). "Recurrent neural networks and time series prediction," in Proceedings of IJCNN-91–Seattle International Joint Conference on Neural Networks, July 8–12, Seattle, WA, Vol. I, pp. 301–306.
6. Fahy, F. (1995). Sound Intensity, 2nd ed. (CRC, Boca Raton, FL).
7. Ferguson, E. L., Williams, S. B., and Jin, C. T. (2019). "Convolutional neural network for single-sensor acoustic localization of a transiting broadband source in very shallow water," J. Acoust. Soc. Am. 146(6), 4687–4698.
8. Hochreiter, S., and Schmidhuber, J. (1997). "Long short-term memory," Neural Comput. 9(8), 1735–1780.
9. Huang, Z., Xu, J., Gong, Z., Wang, H., and Yan, Y. (2018). "Source localization using deep neural networks in a shallow water environment," J. Acoust. Soc. Am. 143(5), 2922–2932.
10. Huang, Z., Xu, J., Gong, Z., Wang, H., and Yan, Y. (2019). "Multiple source localization in a shallow water waveguide exploiting subarray beamforming and deep neural networks," Sensors 19(21), 4768.
11. Kang, K., Gabrielson, T. B., and Lauchle, G. C. (2004). "Development of an accelerometer-based underwater acoustic intensity sensor," J. Acoust. Soc. Am. 116, 3384–3392.
12. Niu, H., Ozanich, E., and Gerstoft, P. (2017a). "Ship localization in Santa Barbara Channel using machine learning classifiers," J. Acoust. Soc. Am. 142(5), EL455–EL460.
13. Niu, H., Reeves, E., and Gerstoft, P. (2017b). "Source localization in an ocean waveguide using supervised machine learning," J. Acoust. Soc. Am. 142(3), 1176–1188.
14. Ozanich, E., Gerstoft, P., and Niu, H. (2020). "A feedforward neural network for direction-of-arrival estimation," J. Acoust. Soc. Am. 147(3), 2035–2048.
15. Penhale, M. B. (2019). "Acoustic localization techniques for application in near-shore Arctic environments," Ph.D. thesis, Michigan Technological University, Houghton, MI.
16. Penhale, M. B., Barnard, A. R., and Shuchman, R. (2018). "Multi-modal and short-range transmission loss in thin, ice-covered, near-shore Arctic waters," J. Acoust. Soc. Am. 143(5), 3126–3137.
17. Qin, D., Tang, J., and Yan, Z. (2020). "Underwater acoustic source localization using LSTM neural network," in Proceedings of the 2020 39th Chinese Control Conference (CCC), pp. 7452–7457.
18. Trees, H. L. V. (2002). Optimum Array Processing (Wiley, New York).
19. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. U., and Polosukhin, I. (2017). "Attention is all you need," in Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), December 4–9, Long Beach, CA, Vol. 30.
20. Wang, Y., and Peng, H. (2018). "Underwater acoustic source localization using generalized regression neural network," J. Acoust. Soc. Am. 143(4), 2321–2331.
21. Werbos, P. J. (1990). "Backpropagation through time: What it does and how to do it," Proc. IEEE 78(10), 1550–1560.
22. Yangzhou, J., Ma, Z., and Huang, X. (2019). "A deep neural network approach to acoustic source localization in a shallow water tank experiment," J. Acoust. Soc. Am. 146(6), 4802–4811.
23. Zou, Y., Gu, R., Wang, D., Jiang, A., and Ritz, C. H. (2017). "Learning a robust DOA estimation model with acoustic vector sensor cues," in Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 12–15, Kuala Lumpur, Malaysia, pp. 1688–1691.