Underwater source ranging based on deep learning demands a considerable amount of labeled data, which is costly to collect. To alleviate this challenge, the wrapper paradigm of semi-supervised learning is introduced into this task. First, a Siamese network is used to generate pseudo labels for unlabeled data to expand the labeled dataset. A new confidence criterion based on the similarity score and the distribution of similar samples is proposed to evaluate the reliability of the pseudo labels. The model can then be trained more fully on the expanded dataset. Experiments on the SWellEx-96 dataset validate that this method effectively improves prediction accuracy.
Due to the excellent performance of deep learning in other fields, researchers began to apply deep learning methods for underwater source localization. In deep learning methods, a neural network trained with a large amount of labeled data can directly predict source range, even without target environment parameters. Many works have shown that deep learning methods1–5 are better than the Matched Field Processing (MFP) method.6–9 However, deep learning-based methods usually require a large amount of labeled data for supervised training, which is difficult to collect in real scenarios.
Semi-supervised learning (SSL) is an effective way to address insufficient labeled data. Zhu et al.10 applied the unsupervised preprocessing paradigm of SSL to underwater source ranging. They first perform Principal Component Regression (PCR) to reduce dimensions and then train an encoder to extract the target feature space for source ranging, which we call PCR-Encoder. This step is performed on both labeled and unlabeled data. Then, the pre-trained encoder (with its parameters frozen in this step) is combined with a Multilayer Perceptron (MLP) and trained on labeled data for source ranging. Several improved methods build on encoder preprocessing. Zhu et al.11 propose using time-domain and frequency-domain features to train two encoders separately. Jin et al.12 use a residual convolutional autoencoder and a self-attention mechanism to extract features. However, the ranging performance gain is not significant if the feature space extracted by the encoder is not the one required for source ranging. Thus, we introduce another SSL paradigm, the wrapper method,13–15 into this task. Specifically, a model is first trained with labeled data and then used to generate pseudo labels for unlabeled data. If a pseudo label is judged reliable by a confidence criterion, the unlabeled sample and its pseudo label are added to the labeled dataset.
In the SSL wrapper paradigm, generating reliable pseudo labels is crucial. Typically, a geometric distance criterion such as the Euclidean distance is used to identify, for each unlabeled sample, the most similar samples in the labeled dataset. The pseudo label of the unlabeled sample is then the average of the labels of these similar samples. Liu et al.16 first used a Siamese network to measure similarity: it projects the data into a new feature space and then calculates the Euclidean distance there. However, the geometric distance criterion faces a problem in practice: two samples that are close in geometric distance may have significantly different labels.
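The distance-based pseudo-labeling described above can be sketched as follows. This is a minimal illustration, not the implementation of Ref. 16 or of this letter: the function names, the toy one-dimensional features, and the choice of two neighbors are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_pseudo_label(sample, labeled_x, labeled_y, n_neighbors=5):
    """Average the labels of the n_neighbors labeled samples closest to `sample`."""
    order = sorted(range(len(labeled_x)),
                   key=lambda i: euclidean(sample, labeled_x[i]))
    nearest = order[:n_neighbors]
    return sum(labeled_y[i] for i in nearest) / len(nearest)

# Toy data (hypothetical): 1-D features with source ranges as labels.
labeled_x = [[0.0], [1.0], [2.0], [3.0], [10.0]]
labeled_y = [1.0, 1.5, 2.0, 2.5, 9.0]

# The two nearest labeled samples to 1.4 are 1.0 and 2.0, so their labels are averaged.
pseudo = knn_pseudo_label([1.4], labeled_x, labeled_y, n_neighbors=2)
print(pseudo)
```

The weakness noted in the text is visible in this form: the pseudo label depends only on feature distance, so neighbors with very different labels are averaged without any check.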
To range the underwater source accurately with insufficient labeled data, we introduce SSL guided by a Siamese network into the underwater ranging task and propose a two-step SSL method. In the first step, an improved Siamese network, which builds a direct mapping between the input samples and the sample similarity, is used to generate pseudo labels for unlabeled data, and a new confidence criterion evaluates the reliability of the pseudo labels through the similarity score and the distribution of similar samples. In the second step, a range predictor is trained on the expanded labeled dataset to achieve underwater source ranging. Experiments on the public SWellEx-96 dataset show that our method outperforms pure supervised learning and PCR-Encoder.
2. The proposed method
2.1 Improved Siamese network
The Siamese network was first proposed for signature verification to measure the similarity of two input samples. The input to the Siamese network is a sample pair (xi, xj, G), in which (xi, xj) are the two input samples and G is the label of the sample pair. G = 0 means the two samples are similar, and G = 1 means they are dissimilar. The Siamese network outputs a similarity score S for the two samples. The network is optimized by minimizing the difference between S and G.
The proposed Siamese network in this letter consists of the Feature Network and the Similarity Prediction Network, and the structure is shown in Fig. 1(b). The Feature Network consists of four convolution blocks and a fully connected block. Each convolution block contains a Conv-1D layer followed by a normalization layer and a ReLU activation layer, and the parameters (In_channels, Out_channels, Kernel_size, Stride) of the four convolution blocks are: . The Similarity Prediction Network contains a middle layer and a fully connected block.
The Feature Network extracts features from the input data, and the Similarity Prediction Network then gives the similarity score S of two samples. S is usually given by the Euclidean distance, but the Euclidean distance between two samples' features may not be consistent with the actual distance. To address this problem, we merge the features of the two samples in a middle layer by square deviation and give the similarity score directly with a fully connected network. That is, the mapping between input samples and sample similarity is learned by a neural network.
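The idea of merging two feature vectors by square deviation and mapping the result to a score can be sketched as below. This is only an illustration under stated assumptions: a single linear unit with a sigmoid stands in for the fully connected block, and the weights, bias, and feature vectors are hypothetical values rather than learned parameters.

```python
import math

def squared_deviation(f_i, f_j):
    """Middle layer: element-wise squared difference of two feature vectors."""
    return [(a - b) ** 2 for a, b in zip(f_i, f_j)]

def similarity_score(f_i, f_j, weights, bias):
    """Map the merged features to a score S in (0, 1).

    Assumption: one linear unit plus a sigmoid replaces the fully connected block.
    Following the pair labels in the text, S near 0 means similar, S near 1 dissimilar.
    """
    merged = squared_deviation(f_i, f_j)
    z = sum(w * m for w, m in zip(weights, merged)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: positive weights push large deviations toward S = 1.
weights, bias = [2.0, 2.0, 2.0], -3.0
s_same = similarity_score([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], weights, bias)
s_diff = similarity_score([0.1, 0.2, 0.3], [1.5, -1.0, 2.0], weights, bias)
print(s_same, s_diff)
```

Because the merge is an element-wise squared difference, the score is symmetric in the two inputs, while the learned weights decide which feature dimensions matter for similarity.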
2.2 Generate pseudo labels for unlabeled data
2.3 Underwater source ranging based on SSL
We use the Feature Network in Fig. 1(b) and a fully connected block to construct the predictor, which is shown in Fig. 1(c). The predictor is then trained on the expanded labeled dataset. The whole process of our method is as follows:
(1) Prepare training data. Generate similar sample pairs LS and dissimilar sample pairs LD for training the Siamese network. The detailed procedure is presented in Sec. 3.4.
(2) Train the Siamese network with LS and LD.
(3) Use the Siamese network to generate a pseudo label for each unlabeled sample by Eq. (2). In this letter, N = 5.
(4) Measure the reliability of the pseudo labels by Eq. (3). Then add the unlabeled samples with reliable pseudo labels to the labeled dataset L.
(5) Train the range predictor on the augmented labeled dataset and complete the source ranging task.
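The pseudo-labeling and filtering steps of this procedure can be sketched as below. Eqs. (2) and (3) are not reproduced here, so this is an assumption-laden illustration: the pseudo label is taken as the mean label of the N = 5 best-scoring labeled samples, and the confidence check (thresholds `max_score` and `max_spread`) is a hypothetical stand-in that uses the similarity scores and the spread of the matched labels. All names and numbers are invented for the example.

```python
def pseudo_label_with_confidence(scores, labels, n=5, max_score=0.5, max_spread=1.0):
    """Return (pseudo_label, is_reliable) for one unlabeled sample.

    scores[k] is the Siamese score between the unlabeled sample and labeled
    sample k (lower = more similar, matching G = 0 for similar pairs).
    max_score and max_spread are hypothetical thresholds, not the letter's Eq. (3).
    """
    top = sorted(range(len(scores)), key=lambda k: scores[k])[:n]
    top_labels = [labels[k] for k in top]
    pseudo = sum(top_labels) / len(top_labels)
    # Reliable only if every match is scored as similar AND the matched labels cluster.
    similar_enough = all(scores[k] <= max_score for k in top)
    concentrated = (max(top_labels) - min(top_labels)) <= max_spread
    return pseudo, similar_enough and concentrated

# Toy data (hypothetical): ranges of seven labeled samples.
labels = [1.0, 1.2, 1.4, 1.6, 1.8, 6.0, 7.0]
# Scores of two unlabeled samples against the seven labeled samples.
unlabeled_scores = {
    "u1": [0.1, 0.1, 0.2, 0.2, 0.3, 0.9, 0.9],  # matches one tight cluster of labels
    "u2": [0.1, 0.9, 0.2, 0.9, 0.3, 0.1, 0.2],  # matches samples with scattered labels
}

accepted = {}
for name, scores in unlabeled_scores.items():
    pseudo, reliable = pseudo_label_with_confidence(scores, labels)
    if reliable:
        accepted[name] = pseudo  # would be added to the labeled dataset L

print(accepted)
```

In this toy run only `u1` is kept: `u2` has low scores against labeled samples whose ranges disagree, which is the situation a distribution-based check is meant to reject.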
3. Data preprocessing and preparation
3.1 The SWellEx-96 experiment
This letter uses the public data of event S5 in the SWellEx-96 experiment to assess the validity of our method. A source at a depth of 9 m, towed by a ship, starts its track south of all arrays and proceeds northward at a speed of 5 knots (2.5 m/s). The source transmits various broadband signals at frequencies between 50 and 400 Hz. A vertical line array (VLA) was deployed with 21 hydrophones (denoted as R1–R21) equally spaced between 94.125 and 212.25 m. The VLA recorded 75 min of data at a sampling rate of 1500 Hz. The dataset also provides the horizontal range between the VLA and the moving source once per minute. For more experimental details, please refer to Ref. 17.
To test the validity of the proposed method, hydrophone data from different depths are used in the experiments. In this letter, R1, R10, and R20 are selected, at depths of 94.125, 195.38, and 206.62 m, respectively. The same experiments are performed on all three hydrophones.
3.2 Data preprocessing
The SWellEx-96 dataset only provides a horizontal range per minute, so labels for each sample are obtained by uniform interpolation. Similarly, the sample labels obtained by interpolation are expressed in matrix form Y with shape 2250 × 1.
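A label vector of this kind can be produced by interpolating the per-minute ranges onto the sample times. The sketch below assumes that "uniform interpolation" means linear interpolation on an evenly spaced time grid spanning the first to the last known range; the function name and the toy range values are hypothetical.

```python
def interpolate_labels(minute_ranges, n_samples):
    """Linearly interpolate per-minute ranges onto n_samples evenly spaced times.

    Assumes at least two known ranges and at least two samples.
    """
    last = len(minute_ranges) - 1          # time of the last known range, in minutes
    labels = []
    for k in range(n_samples):
        t = k * last / (n_samples - 1)     # sample time in minutes
        lo = min(int(t), last - 1)         # index of the known point to the left
        frac = t - lo                      # position between the two known points
        labels.append(minute_ranges[lo] * (1 - frac) + minute_ranges[lo + 1] * frac)
    return labels

# Toy example (hypothetical ranges): 3 known points interpolated onto 5 samples.
y = interpolate_labels([8.0, 7.0, 6.5], n_samples=5)
print(y)
```

With the real data the same call would take the per-minute ranges of the track and `n_samples=2250`, giving one label per sample.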
3.3 Split dataset
In this letter, we conducted two experiments using 12.5% and 25% of the data as labeled data. We refer to them as the “12.5% training case” and the “25% training case.” Underwater source ranging is regarded as a regression task, so the labeled training data should cover the whole track. Therefore, labeled data are obtained by uniformly sampling from X. The specific sampling method is as follows:
- 12.5% training case
- 25% training case
For both cases, 12.5% of the samples are uniformly sampled from the unlabeled dataset U as a test set.
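One way to realize such a split is with fixed strides over the sample indices. The sketch below is an assumption, not the exact sampling rule of this letter: it takes every 8th sample as labeled for the 12.5% case (every 4th for the 25% case) and every 8th unlabeled sample as the test set. The function name is hypothetical, and whether test samples are excluded from pseudo-labeling is not specified here.

```python
def uniform_split(n_total, labeled_stride, test_stride=8):
    """Split sample indices into labeled, unlabeled, and test index lists.

    Assumption: "uniform sampling" means taking every k-th sample along the track.
    """
    labeled = list(range(0, n_total, labeled_stride))
    labeled_set = set(labeled)
    unlabeled = [i for i in range(n_total) if i not in labeled_set]
    test = unlabeled[::test_stride]        # every test_stride-th unlabeled sample
    return labeled, unlabeled, test

# 2250 samples in total, as given by the shape of Y.
labeled, unlabeled, test = uniform_split(2250, labeled_stride=8)      # 12.5% case
labeled25, unlabeled25, test25 = uniform_split(2250, labeled_stride=4)  # 25% case
print(len(labeled), len(unlabeled), len(test))
print(len(labeled25), len(unlabeled25), len(test25))
```

Stride-based sampling keeps the labeled samples spread over the whole track, which is the property the regression setting requires.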
3.4 Generate sample pairs
After the sample pairs are generated, LS contains far fewer pairs than LD. To balance the number of positive and negative samples, we duplicate the pairs in LS until LS and LD are equal in number.
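The duplication step can be sketched as a simple oversampling routine. The function name and the toy pairs are hypothetical; pairs are written as (i, j, G) index triples with G = 0 for similar and G = 1 for dissimilar.

```python
def balance_pairs(similar_pairs, dissimilar_pairs):
    """Repeat the similar pairs until their count matches the dissimilar pairs."""
    if not similar_pairs:
        raise ValueError("need at least one similar pair to duplicate")
    if len(similar_pairs) >= len(dissimilar_pairs):
        return list(similar_pairs)         # already balanced or larger: nothing to do
    repeats, remainder = divmod(len(dissimilar_pairs), len(similar_pairs))
    # Whole copies first, then a partial copy to reach exactly the same count.
    return similar_pairs * repeats + similar_pairs[:remainder]

# Toy pairs (hypothetical sample indices).
LS = [(0, 1, 0), (1, 2, 0)]
LD = [(0, 5, 1), (0, 6, 1), (1, 7, 1), (2, 8, 1), (3, 9, 1)]
LS_balanced = balance_pairs(LS, LD)
print(len(LS_balanced), len(LD))
```

Duplicating whole copies before the partial copy keeps every similar pair represented almost equally often in the balanced set.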
4.1 Results of generated pseudo labels
The Siamese network was trained for 50 epochs with the Adam optimizer on an RTX 3080 GPU (NVIDIA Corporation, Santa Clara, CA). The learning rate and weight decay are and , respectively. A cosine annealing learning rate schedule with warm-up is used to adjust the learning rate. Mean square error (MSE) is used to quantify the performance. We conducted two experiments on each hydrophone (the 12.5% training case and the 25% training case), resulting in a total of six experiments.
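A cosine annealing schedule with warm-up can be written as a function of the epoch index. The sketch below is a generic form of this schedule: the linear warm-up, the warm-up length, the base learning rate, and the minimum learning rate are placeholder assumptions, since the values used in this letter are not given here. Only the 50-epoch length is taken from the text.

```python
import math

def lr_at_epoch(epoch, total_epochs=50, warmup_epochs=5, base_lr=1e-3, min_lr=0.0):
    """Learning rate with linear warm-up followed by cosine annealing.

    warmup_epochs, base_lr and min_lr are placeholders, not values from the letter.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Progress through the annealing phase, from 0 (start) to 1 (last epoch).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(50)]
print(schedule[0], max(schedule), schedule[-1])
```

The warm-up avoids large updates while the Adam statistics are still unreliable, and the cosine phase then lowers the rate smoothly to its minimum at the final epoch.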
First, we compare the accuracy of the Euclidean distance and the improved Siamese network for generating pseudo labels. The results for hydrophone R1 with 12.5% labeled data are shown in Fig. 2. Comparing Figs. 2(a) and 2(b) shows that the pseudo labels generated by our method are more accurate, which supports the effectiveness of the improved Siamese network. In addition, Fig. 2(c) shows that the proposed confidence criterion can effectively filter out the pseudo labels with large errors, so that the remaining pseudo labels are sufficiently reliable.
All six experiment results are shown in Fig. 3. Except for a few samples, most of the pseudo labels are close to the true labels. The average MSE between pseudo labels and true labels is 0.0136. Across all experiments, an average of 1300 samples with reliable pseudo labels are generated, which means the proposed method can effectively alleviate the problem of insufficient labeled data. In the 25% training case, the Siamese network generates pseudo labels for almost all unlabeled samples; in the 12.5% training case, it generates pseudo labels for 69% of the unlabeled samples. This shows that our method can achieve good results with less labeled data. The details of the generated pseudo labels are shown in Table 1.
| Experiments | Number | MSE of pseudo labels (a) | MSE of SL | MSE of SSL | MSE of PCR-Encoder |
(a) The MSE between pseudo labels and true labels.
4.2 Ranging performance gain
The predictor is trained on the original labeled dataset and on the newly augmented labeled dataset, respectively, and its performance is then verified on the test data. The results of the different experiments are shown in Fig. 4. At the beginning and the end of the track, the prediction based on SSL using the augmented dataset is much better than that of supervised learning (SL) trained on the original dataset. In the middle part, the prediction based on SSL is more accurate and the distribution is more even. The MSEs of all experiments are shown in Table 1. The average MSEs of SL and SSL in the 12.5% training case are 0.45 and 0.17, respectively; in the 25% training case, they are 0.27 and 0.12, respectively. Thus, the less labeled data is available, the more significant the performance improvement brought by the proposed method. Moreover, we compare our method with the PCR-Encoder. Except for the R1–12.5% experiment, our method outperforms the PCR-Encoder method.
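The comparison above can be made explicit with a short calculation. The sketch below defines the MSE metric and computes the absolute and relative drop in average MSE from the four averages quoted in the text; the relative drops are derived here for illustration and the variable names are hypothetical.

```python
def mse(predictions, targets):
    """Mean square error between predicted and true ranges."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Average MSEs quoted in the text for supervised (SL) and semi-supervised (SSL) training.
avg_mse = {
    "12.5%": {"SL": 0.45, "SSL": 0.17},
    "25%": {"SL": 0.27, "SSL": 0.12},
}

drops = {}
for case, m in avg_mse.items():
    absolute = m["SL"] - m["SSL"]
    relative = 100 * absolute / m["SL"]    # percentage of the SL error removed
    drops[case] = (round(absolute, 2), round(relative))
    print(f"{case} training case: MSE drop {absolute:.2f} ({relative:.0f}% of the SL value)")
```

Both the absolute and the relative drop are larger in the 12.5% training case, in line with the observation that the gain grows as labeled data become scarcer.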
Because labeled data for underwater source ranging are scarce, we propose a two-step SSL method. First, pseudo labels are generated for unlabeled data using the improved Siamese network and the confidence criterion to expand the labeled dataset. Across all experiments, the average number of pseudo labels generated by our method is 1300 and their average MSE is 0.0136. This means that our method can generate a large number of reliable pseudo labels and thereby enlarge the labeled dataset. Second, the predictor trained with SSL on the augmented dataset performs better than pure supervised learning on the small original labeled dataset: the average MSE decreases by 0.28 and 0.15 in the 12.5% and 25% training cases, respectively. Moreover, the less labeled data is available, the larger the benefit of our method.
This research is supported by the National Natural Science Foundation of China (NSFC, Nos. 62201046, 62192711, 62192712).
Conflict of interest
The authors have no conflicts of interest to disclose.