Predicting acoustic transmission loss in the SOFAR channel with classical methods faces challenges such as algorithmic complexity and high computational cost. To address these challenges, a deep learning-based underwater acoustic transmission loss prediction method is proposed. By properly training a U-net-type convolutional neural network, the method provides an accurate mapping from ray trajectories to the transmission loss over the problem domain. Verifications are performed in a SOFAR channel with Munk's sound speed profile. The results suggest that the method has the potential to serve as a fast prediction model without sacrificing accuracy.
1. Introduction
Predicting low-frequency acoustic transmission loss (TL) in the SOFAR channel is an important research field in acoustics. Low-frequency TL plays a crucial role in various applications, such as early warning of undersea earthquakes,1 underwater sound source localization,2 and monitoring of marine mammals.3 As sound travels through the complex environment of the deep ocean, it encounters factors, such as a varying seafloor and a stratified medium, that affect its propagation and intensity. Therefore, the prediction of low-frequency underwater acoustic TL has long been a challenge.
Ray-based models are commonly used to calculate TL in the SOFAR channel by providing a simplified representation of sound waves traveling through water. Based on the traced rays, the sound field can be calculated by solving the eikonal equation and the transport equations. Ray-based methods can handle range-dependent environments and are well adapted to long-range propagation. However, due to the high-frequency approximation, classical ray methods are usually not considered suitable for low-frequency problems.4 Here, "low frequency" does not refer to a fixed value or band; it varies with the environment. For example, in the user manual of BELLHOP, which uses ray theory as its core algorithm, a calculation example at 50 Hz is performed in a SOFAR channel with a range of 100 km and a depth of 5 km. It is noted that 50 Hz is usually considered a low frequency in such an environment and that the ray method cannot give accurate results for this problem because of errors in the shadow zone.5
Wave-based models are another important class of methods for calculating TL in the SOFAR channel. The normal mode (NM) method is one of the basic wave-based methods. In the NM method, the sound pressure is expressed as a sum of modal functions weighted according to the source depth.6 The NM method calculates the sound field with high accuracy but is ineffective for range-dependent ocean environments. The parabolic equation (PE) method is a suitable and popular wave-theory technique for solving range-dependent propagation problems.7 Early PEs usually have inherent phase errors, which limit their applicability to a certain range of angles around the main propagation direction. However, very-wide-angle PE implementations based on Padé approximants, proposed in subsequent research, have nearly eliminated the small-angle limitation.8 This high-angle capability comes at the cost of additional computational effort.
To improve the performance of TL calculations in complex ocean environments, many extensions to the classical methods have been proposed, such as the Gaussian beam tracing method,9 the coupled mode method,10 and hybrid methods.11 These methods have improved the accuracy of underwater sound field simulation in various respects. However, they also introduce issues such as long computation times.
In recent years, deep learning techniques have achieved remarkable progress in various scientific research fields.12,13 Deep learning architectures based on neural networks are capable of extracting valuable patterns and insights that would be challenging or time-consuming to obtain with traditional methods. Deep learning has been successfully applied in underwater acoustics, for example, in source localization,14 source depth estimation,15 and dim frequency line detection.16 It has also been increasingly used to model ocean acoustic propagation. A deep convolutional recurrent autoencoder network has been presented for data-driven learning of complex underwater wave scattering and interference.17 Deep learning methods have also been used to predict modal horizontal wavenumbers and group velocities,18 and to predict far-field acoustic propagation from near-field data.19
To rapidly and accurately predict the acoustic TL in SOFAR channels, we develop a convolutional neural network-based method that predicts low-frequency underwater acoustic TL maps from ray trajectories. In this method, a U-net-type neural network is trained with ray trajectories as input to predict the TL at low frequencies. Compared with the conventional ray-based method, the solution of the transport equation is replaced by the deep learning model. This avoids the problem of the high-frequency approximation in the construction of the transport equation. In addition, since ray trajectories can be easily determined even in complex environments, the proposed method can conveniently and accurately predict the TL.
2. Method
2.1 Problem description
2.2 Calculation of ray trajectories and downsampling
To calculate ray trajectories in a given SOFAR channel, a grid is defined on the 2D plane illustrated in Fig. 1(a). Assume Nr rays are emitted from the sound source within an angular range with equal angular intervals.
Fig. 1. Schematic of the generation of ray trajectories. (a) The calculating plane is discretized into a grid, and the source emits rays within an angular range. (b) Calculated ray trajectories on a fine grid. (c) Original rays are downsampled onto a coarse grid. (d) Downsampled ray trajectories.
For each ray, its trajectory on the grid can be calculated according to Snell's law.4 After all rays are calculated, a set of ray trajectories is obtained as shown in Fig. 1(b). For the cell corresponding to depth index i and range index j, the ijth component of the resulting ray-count map is the number of rays that pass through that cell.
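To make the construction of the network input concrete, the following is a minimal Python sketch, assuming ray trajectories are available as arrays of (range, depth) points exported from a ray tracer such as BELLHOP; the function name, grid sizes, and the depth × range orientation of the map are illustrative assumptions rather than the exact implementation used here.

```python
import numpy as np

def ray_count_map(rays, R, Z, n_depth, n_range):
    """Bin ray trajectories onto an (n_depth x n_range) grid and count,
    for each cell, how many rays pass through it.

    rays : list of (M, 2) arrays with columns (range [m], depth [m])
    R, Z : maximum range and depth of the calculation plane [m]
    """
    counts = np.zeros((n_depth, n_range), dtype=np.float32)
    for ray in rays:
        # Map trajectory points to cell indices (row = depth, column = range).
        j = np.clip((ray[:, 0] / R * n_range).astype(int), 0, n_range - 1)
        i = np.clip((ray[:, 1] / Z * n_depth).astype(int), 0, n_depth - 1)
        # Each ray contributes at most once to any cell it visits.
        cells = np.unique(np.stack([i, j], axis=1), axis=0)
        counts[cells[:, 0], cells[:, 1]] += 1
    return counts

# The same routine can produce both the fine map of Fig. 1(b) and the coarser
# downsampled map of Fig. 1(d) by choosing the grid resolution (sizes illustrative):
# fine   = ray_count_map(rays, R=100e3, Z=5e3, n_depth=512, n_range=1024)
# coarse = ray_count_map(rays, R=100e3, Z=5e3, n_depth=128, n_range=256)
```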
2.3 Neural network architecture
A U-net is used as the neural network architecture in this research. The U-net is a type of convolutional neural network architecture commonly used for image segmentation tasks. Proposed in 2015, the U-net has become widely adopted in the field of image analysis.22 In recent years, it has been introduced into sound field prediction23 and has achieved promising results.
The architecture of the U-net used in this research is illustrated in Fig. 2(a). It consists of an encoder path and a decoder path, which are connected through skip connections. The encoder path gradually downsamples the input ray trajectory, extracting high-level features. In each convolutional layer, convolution is performed as illustrated in Fig. 2(b). "Same" padding is used for the convolution, and the rectified linear unit (ReLU) is applied as the activation function to mitigate the vanishing gradient problem. Following each convolutional layer, max pooling with 2 × 2 filters and a stride of (2, 2) is performed, which reduces the spatial dimensions of the feature maps. The decoder path performs upsampling operations to progressively recover the spatial resolution of the input and finally generates the scaled predicted TL map.
Fig. 2. Schematic diagram of the U-net architecture used in this paper. (a) Architecture of the U-net. Blue numbers represent the width × height. Black numbers denote the length of the input and output volumes. (b) Schematic of a convolution layer.
The skip connections between the encoder and decoder paths allow the network to preserve and integrate both local and global information. They help recover fine details by passing the low-level feature maps directly to the decoder path.24
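As a concrete illustration, a compact PyTorch sketch consistent with the layer dimensions later listed in Table 1 (four 3 × 3 convolution plus 2 × 2 max-pooling encoder stages with 64 to 512 filters, 2 × 2 upsampling with skip concatenation in the decoder, and a final 1 × 1 convolution) is given below. The class name, the nearest-neighbor upsampling, and the single convolution per stage are assumptions; the paper's 2 × 2 up-operation could equally be a transposed convolution.

```python
import torch
import torch.nn as nn

class UNetTL(nn.Module):
    """U-net mapping a 1-channel ray-count map to a scaled TL map (channel sizes follow Table 1)."""

    def __init__(self):
        super().__init__()
        def conv(cin, cout):
            # 3 x 3 convolution with "same" padding followed by ReLU.
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
        self.enc1, self.enc2 = conv(1, 64), conv(64, 128)
        self.enc3, self.enc4 = conv(128, 256), conv(256, 512)
        self.pool = nn.MaxPool2d(2, 2)                      # 2 x 2 max pooling, stride (2, 2)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec1, self.dec2 = conv(512 + 512, 512), conv(512 + 256, 256)
        self.dec3, self.dec4 = conv(256 + 128, 128), conv(128 + 64, 64)
        self.out = nn.Conv2d(64, 1, 1)                      # final 1 x 1 convolution

    def forward(self, x):                                   # x: (N, 1, 128, 256)
        e1 = self.enc1(x)                                   # (N, 64, 128, 256)
        e2 = self.enc2(self.pool(e1))                       # (N, 128, 64, 128)
        e3 = self.enc3(self.pool(e2))                       # (N, 256, 32, 64)
        e4 = self.enc4(self.pool(e3))                       # (N, 512, 16, 32)
        b = self.pool(e4)                                   # (N, 512, 8, 16)
        d1 = self.dec1(torch.cat([self.up(b), e4], dim=1))  # skip connection from e4
        d2 = self.dec2(torch.cat([self.up(d1), e3], dim=1)) # skip connection from e3
        d3 = self.dec3(torch.cat([self.up(d2), e2], dim=1)) # skip connection from e2
        d4 = self.dec4(torch.cat([self.up(d3), e1], dim=1)) # skip connection from e1
        return self.out(d4)                                 # (N, 1, 128, 256) scaled TL map
```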
2.4 Data scaling and loss function
The reference data for the neural network consist of TL maps produced in the same environment as that used for the ray trajectory calculation. These data can be computed with any method appropriate for the specified environment. The TL maps in dB at frequency f are referred to as the ground truth data in the training.
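Because the explicit scaling and loss definitions of this section are not reproduced above, the following sketch only illustrates one plausible implementation: min-max scaling of the TL map to [0, 1] and a weighted combination of an SSIM term (via the third-party pytorch_msssim package) with an image-gradient term. The clipping bounds, the weight alpha, and the reading of the "IG" term mentioned in Sec. 4 as an image-gradient loss are all assumptions.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party SSIM implementation

def scale_tl(tl_db, tl_min=40.0, tl_max=120.0):
    """Min-max scale a TL map in dB to [0, 1]; the clipping range is an assumption."""
    return (tl_db.clamp(tl_min, tl_max) - tl_min) / (tl_max - tl_min)

def image_gradient(img):
    """Finite-difference gradients along depth and range for a (N, 1, H, W) tensor."""
    dz = img[:, :, 1:, :] - img[:, :, :-1, :]
    dr = img[:, :, :, 1:] - img[:, :, :, :-1]
    return dz, dr

def hybrid_loss(pred, target, alpha=0.5):
    """Weighted SSIM + image-gradient loss; alpha is an illustrative weight."""
    ssim_term = 1.0 - ssim(pred, target, data_range=1.0)
    pdz, pdr = image_gradient(pred)
    tdz, tdr = image_gradient(target)
    ig_term = F.l1_loss(pdz, tdz) + F.l1_loss(pdr, tdr)
    return alpha * ssim_term + (1.0 - alpha) * ig_term
```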
3. Training and test
3.1 Training
To evaluate the performance of the proposed method in predicting the TL map, we train the network using data generated at different sound source depths and then test it using data from new source depths that were never used in training. The training and test data are simulated in a SOFAR channel with a continental slope, as illustrated in Fig. 3(a). In such a range-dependent environment, ray-based methods lack sufficient accuracy at low frequencies, while wave-based methods usually require complex and time-consuming algorithms to perform the calculations.
Fig. 3. Environment and training loss. (a) Calculating environment with a continental slope. (b) Munk's sound speed profile. (c) Training loss curves of the model at 10 Hz. (d) Training loss curves of the model at 50 Hz.
The maximum range R is 100 km, and the maximum depth Z is 5 km. The maximum range and height of the slope are 100 km and 1 km, respectively. The ocean surface is assumed to be a pressure-release boundary, and the seafloor is an acousto-elastic half-space with a sound speed of 1550 m/s and a density of 1 g/cm3. The depth-dependent sound speed follows Munk's sound speed profile,27 as shown in Fig. 3(b). The ray trajectories are calculated with the BELLHOP code,28 and the ground truth TL maps are calculated with the RAM code,29 which is based on the PE method.
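For reference, the canonical Munk profile is simple to evaluate; the short sketch below uses the textbook parameters (channel axis at 1300 m, epsilon = 0.00737, reference speed 1500 m/s), which may differ slightly from the values used to generate Fig. 3(b).

```python
import numpy as np

def munk_ssp(z, z_axis=1300.0, eps=0.00737, c_axis=1500.0, B=1300.0):
    """Canonical Munk sound speed profile c(z) in m/s for depth z in meters."""
    z_tilde = 2.0 * (z - z_axis) / B
    return c_axis * (1.0 + eps * (z_tilde - 1.0 + np.exp(-z_tilde)))

depths = np.linspace(0.0, 5000.0, 501)  # 0-5 km water column, as in Fig. 3(a)
c = munk_ssp(depths)                    # sound speed profile for the SOFAR channel
```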
In this paper, the networks are trained independently at individual frequencies from 10 to 50 Hz with an interval of 5 Hz; thus, nine network models are obtained. This strategy increases the repetitive work in training. However, for tasks with a specific frequency of interest, it effectively reduces the complexity of the network.
Based on the aforementioned training strategy, nine training sets are built at the specified frequencies. Each training set consists of two parts, namely, the ray trajectories and the ground truth TL maps. Original ray trajectories are computed on a fine grid and then downsampled onto the coarser grid used as the network input. Note that the grids are generated on the rectangular plane shown in Fig. 3(a), which covers the slope area. Calculations were performed for 541 source depths ranging from 300 to 3000 m with a constant interval of 5 m. For each source depth, Nr = 30 rays are emitted from the source within the angular range θ = [−30°, 30°]. Ground truth TL maps are also computed on the same grid at the same source depths to calculate the loss. Since ray trajectories are frequency-independent, they are identical in all training sets.
In training, the data are randomly divided into two sets with a ratio of 3:1, used as the training set and the validation set, respectively. Training is performed with the Adam optimizer, with a batch size of 2 and a learning rate of 0.0001. The hyper-parameters of the neural networks in this paper are listed in Table 1.
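A minimal PyTorch training-loop sketch with the stated hyper-parameters (3:1 random split, Adam optimizer, learning rate 1e-4, batch size 2) is given below; it reuses the UNetTL and hybrid_loss sketches above, and the placeholder tensors and epoch count are assumptions.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Placeholders: rays and tl are (541, 1, 128, 256) tensors of ray-count maps
# and scaled ground-truth TL maps at a single frequency.
dataset = TensorDataset(rays, tl)
n_train = int(0.75 * len(dataset))                      # 3:1 train/validation split
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=2, shuffle=True)
val_loader = DataLoader(val_set, batch_size=2)

model = UNetTL()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(200):                                # epoch count is an assumption
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = hybrid_loss(model(x), y)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(hybrid_loss(model(x), y).item()
                       for x, y in val_loader) / len(val_loader)
```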
Table 1. Parameters of the neural network and the training. C: convolution (filter size, (stride), number of filters); M: max pooling (filter size, (stride)); D: upsampling (filter size).

| Layer name | Input size | Hyper-parameters | | Output size |
|---|---|---|---|---|
| C1–M1 | 128 × 256 × 1 | C1: 3 × 3 × 1, (1, 1), 64 | M1: 2 × 2, (2, 2) | 64 × 128 × 64 |
| C2–M2 | 64 × 128 × 64 | C2: 3 × 3 × 64, (1, 1), 128 | M2: 2 × 2, (2, 2) | 32 × 64 × 128 |
| C3–M3 | 32 × 64 × 128 | C3: 3 × 3 × 128, (1, 1), 256 | M3: 2 × 2, (2, 2) | 16 × 32 × 256 |
| C4–M4 | 16 × 32 × 256 | C4: 3 × 3 × 256, (1, 1), 512 | M4: 2 × 2, (2, 2) | 8 × 16 × 512 |
| D1–C5 | 8 × 16 × 512 | D1: 2 × 2 | C5: 3 × 3 × 1024, (1, 1), 512 | 16 × 32 × 512 |
| D2–C6 | 16 × 32 × 512 | D2: 2 × 2 | C6: 3 × 3 × 768, (1, 1), 256 | 32 × 64 × 256 |
| D3–C7 | 32 × 64 × 256 | D3: 2 × 2 | C7: 3 × 3 × 384, (1, 1), 128 | 64 × 128 × 128 |
| D4–C8 | 64 × 128 × 128 | D4: 2 × 2 | C8: 3 × 3 × 192, (1, 1), 64 | 128 × 256 × 64 |
| C9 | 128 × 256 × 64 | C9: 1 × 1 × 64, (1, 1), 1 | — | 128 × 256 × 1 |
3.2 Test
Test data are calculated in the same environment as the training data. At each frequency f, ray trajectories at 100 random source depths ranging from 300 to 3000 m are computed as the test inputs. TL maps calculated by RAM under the same conditions are taken as the ground truth data. The proposed method is also compared with the classical ray method, which likewise uses the ray trajectories to calculate the TL; the results of the ray method are obtained with BELLHOP. Two examples of TL maps obtained from the different methods are shown in Fig. 4(a).
Fig. 4. Comparisons of TL maps and error analysis. (a) Two examples of TL maps from the ground truth, the proposed method, and the ray method. (b) MAE, MSSIM, and their 95% confidence intervals for the proposed method and the ray method.
The mean absolute error (MAE), the mean structural similarity index (MSSIM), and their 95% confidence intervals from 10 to 50 Hz with an interval of 5 Hz are illustrated in Fig. 4(b).
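The two measures can be computed per test sample as in the sketch below, which assumes scikit-image's structural_similarity for MSSIM and a normal-approximation confidence interval; these implementation choices are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(pred_db, true_db):
    """Per-sample MAE (dB) and MSSIM over sets of predicted / ground-truth TL maps."""
    mae = np.array([np.mean(np.abs(p - t)) for p, t in zip(pred_db, true_db)])
    mssim = np.array([structural_similarity(p, t, data_range=t.max() - t.min())
                      for p, t in zip(pred_db, true_db)])
    return mae, mssim

def ci95(x):
    """Normal-approximation 95% confidence interval of the mean."""
    sem = x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - 1.96 * sem, x.mean() + 1.96 * sem
```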
From Fig. 4(a), good agreement is observed between the predicted and true data. This shows that the proposed method is capable of predicting the TL map for a given source depth, frequency, and sound speed profile from the corresponding ray trajectories. Figure 4(b) illustrates that the proposed method predicts the TL maps at low error levels. The error tends to increase slightly with frequency because the TL distribution at higher frequencies is more complex. In addition, the 95% confidence intervals of both MAE and MSSIM are similar across frequencies, which demonstrates that the network predicts the TL maps stably for different source depths.
Figure 4 also demonstrates that the ray method has noticeably larger errors than the proposed method. The ray method lacks sufficient accuracy in regions that rays do not pass through, which shows that the proposed method performs well in the low-frequency range.
4. Conclusion
To efficiently predict the TL in SOFAR channels, a deep learning-based method is proposed and examined in this research. The method provides an accurate mapping between ray trajectories and the TL using convolutional neural networks in an image-processing-like framework. Ray trajectories contain rich information about the wave propagation and are usually easy to obtain, and thus provide a solid data foundation for predicting the TL. A U-net-type network is used, and a hybrid loss function combining SSIM and IG is designed. With the network successfully trained, the model achieves generalized learning of the underlying physics of underwater acoustic transmission from ray trajectories and can then effectively and efficiently predict the low-frequency TL. The tests in a SOFAR channel with a continental slope show that training converges quickly with a small amount of training data. The method also offers promising prospects for use in more complex environments, where its computational efficiency can be further exploited.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (12074317).
Author Declarations
Conflict of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.