Underwater source ranging based on deep learning demands a considerable amount of labeled data, which is costly to collect. To alleviate this challenge, the wrapper paradigm of semi-supervised learning is introduced into this task. First, a Siamese network is used to generate pseudo labels for unlabeled data and thereby expand the labeled dataset. A new confidence criterion based on the similarity score and the distribution of similar samples is proposed to evaluate the reliability of the pseudo labels. The range predictor can then be trained more fully on the expanded dataset. Experiments on the SWellEx-96 dataset validate that this method effectively improves prediction accuracy.

Owing to the excellent performance of deep learning in other fields, researchers have begun to apply deep learning methods to underwater source localization. In these methods, a neural network trained on a large amount of labeled data can directly predict the source range, even without environmental parameters of the target area. Many works have shown that deep learning methods1–5 outperform Matched Field Processing (MFP).6–9 However, deep learning-based methods usually require a large amount of labeled data for supervised training, which is difficult to collect in real scenarios.

Semi-supervised learning (SSL) is an effective way to address the problem of insufficient labeled data. Zhu et al.10 utilized the unsupervised preprocessing paradigm of SSL in underwater source ranging. First, they perform Principal Component Regression (PCR) to reduce dimensionality and then train an encoder to extract the target feature space for source ranging; we call this method the PCR-Encoder. This step is performed on both labeled and unlabeled data. Then, the pre-trained encoder (with its parameters frozen) followed by a Multilayer Perceptron (MLP) is trained on labeled data for source ranging. Several improved methods build on encoder preprocessing: Zhu et al.11 propose using time-domain and frequency-domain features to train two encoders separately, and Jin et al.12 use a residual convolutional autoencoder with a self-attention mechanism to extract features. However, the ranging performance gain is not significant if the feature space extracted by the encoder is not the one required for source ranging. Thus, we introduce another SSL paradigm, the wrapper method,13–15 into this task. Specifically, a model is first trained with labeled data and then used to generate pseudo labels for unlabeled data. If a pseudo label is judged reliable by a confidence criterion, the unlabeled sample with its pseudo label is added to the labeled dataset.

In the SSL wrapper paradigm, generating reliable pseudo labels is crucial. Typically, a geometric distance criterion such as the Euclidean distance is used to identify, for each unlabeled sample, the most similar samples in the labeled dataset; the pseudo label of the unlabeled sample is then the average of the labels of those similar samples. Liu et al.16 first utilized the Siamese network to measure similarity: the network projects the data into a new feature space, in which the Euclidean distance is then calculated. However, the geometric distance criterion faces a problem in practice: two samples that are close in geometric distance may have significantly different labels.

In order to range the underwater source accurately with insufficient labeled data, we introduce SSL guided by the Siamese network into the underwater ranging task and propose a two-step SSL method. In the first step, an improved Siamese network, which builds a direct mapping between the input samples and the sample similarity, is used to generate pseudo labels for unlabeled data, and a new confidence criterion evaluates the reliability of the pseudo labels through the similarity score and the distribution of similar samples. In the second step, a range predictor is trained on the expanded labeled dataset to achieve underwater source ranging. Experiments on the public SWellEx-96 dataset show that our method outperforms both pure supervised learning and the PCR-Encoder.

The Siamese network was first proposed for signature verification to obtain the similarity of two input samples. The input to the Siamese network is a sample pair $[(x_i, x_j), G]$, in which $(x_i, x_j)$ are the two input samples and $G \in \{0, 1\}$ is the label of the pair: $G = 0$ means the two samples are similar, and $G = 1$ means they are dissimilar. The Siamese network outputs the similarity score of the two samples, $S = \mathrm{Siamese}(x_i, x_j)$, and is optimized by minimizing the difference between $S$ and $G$.

The proposed Siamese network in this letter consists of the Feature Network and the Similarity Prediction Network; the structure is shown in Fig. 1(b). The Feature Network consists of four convolution blocks and a fully connected block. Each convolution block contains a Conv-1D layer followed by a normalization layer and a ReLU activation layer, and the parameters (In_channels, Out_channels, Kernel_size, Stride) of the four convolution blocks are (1, 128, 5, 2), (128, 256, 5, 2), (256, 256, 3, 2), and (256, 256, 3, 2). The Similarity Prediction Network contains a middle layer and a fully connected block.

The Feature Network extracts features from the input data; the Similarity Prediction Network then gives the similarity score $S$ of the two samples. $S$ is usually given by the Euclidean distance, but the Euclidean distance between two samples' features may not be consistent with the actual range difference. To address this problem, we merge the features of the two samples in a middle layer by their squared deviation and give the similarity score directly with a fully connected network. That is, the mapping between input samples and sample similarity is learned by a neural network.
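For concreteness, a minimal PyTorch sketch of this architecture follows. The convolution parameters match those listed above; the choice of batch normalization, the pooling, the feature dimension, and the fully connected sizes are illustrative assumptions, as the letter does not specify them.

```python
import torch
import torch.nn as nn

class FeatureNetwork(nn.Module):
    """Four Conv-1D blocks with the stated parameters, then a fully
    connected block. BatchNorm and pooling choices are assumptions."""
    def __init__(self, feat_dim=128):
        super().__init__()
        cfg = [(1, 128, 5, 2), (128, 256, 5, 2),
               (256, 256, 3, 2), (256, 256, 3, 2)]
        layers = []
        for in_ch, out_ch, k, s in cfg:
            layers += [nn.Conv1d(in_ch, out_ch, k, stride=s),
                       nn.BatchNorm1d(out_ch), nn.ReLU()]
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                nn.Linear(256, feat_dim))

    def forward(self, x):               # x: (batch, 1, feature_length)
        return self.fc(self.conv(x))

class SiameseNetwork(nn.Module):
    """Shared Feature Network; the middle layer merges the two feature
    vectors by squared deviation, and a fully connected block outputs S."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.feature = FeatureNetwork(feat_dim)
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, xi, xj):
        fi, fj = self.feature(xi), self.feature(xj)
        return self.head((fi - fj) ** 2).squeeze(-1)   # similarity score S
```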

The network is optimized with the contrastive loss function defined in Eq. (1). The contrastive loss drives the score of similar pairs toward zero while constraining the score of dissimilar pairs to exceed the margin α,
\[ \mathcal{L}(G, S) = (1 - G)\,S^{2} + G\,\big[\max(0,\ \alpha - S)\big]^{2}, \tag{1} \]
where G and S are the label and similarity score of the sample pair, respectively.
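In code, the reconstruction of Eq. (1) above might read as follows (the margin value shown is an assumption):

```python
import torch

def contrastive_loss(S, G, alpha=1.0):
    """Eq. (1): G = 0 for similar pairs, G = 1 for dissimilar pairs;
    alpha is the margin (its value here is an assumption)."""
    similar_term = (1 - G) * S ** 2
    dissimilar_term = G * torch.clamp(alpha - S, min=0) ** 2
    return (similar_term + dissimilar_term).mean()
```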
The key to the proposed method is to generate pseudo labels for unlabeled data as accurately as possible. In our method, the pseudo labels are generated based on the K-Nearest Neighbor (KNN) principle. As depicted in Fig. 1(a), let $L = \{(LX_1, Y_1), \ldots, (LX_m, Y_m)\}$ denote the labeled dataset and $U = \{UX_1, \ldots, UX_n\}$ the unlabeled dataset. For an unlabeled sample $UX_i$, we first calculate its similarity to all samples in $L$, obtaining the similarity scores $S_1, S_2, \ldots, S_m$. The scores are then sorted from low to high (the smaller the similarity score, the more similar the two samples): $S_{i_1}, \ldots, S_{i_N}, \ldots, S_{i_m}$. Following the KNN principle, the pseudo label of $UX_i$ is determined by the $N$ samples in $L$ most similar to $UX_i$. From the sorted similarity scores, we take the corresponding $N$ labels $Y_{i_1}, \ldots, Y_{i_N}$; the pseudo label is then given by Eq. (2),
\[ \hat{Y}_i = \frac{1}{N} \sum_{k=1}^{N} Y_{i_k}. \tag{2} \]
Some predicted pseudo labels may deviate substantially from the true labels, so we need a criterion to select reliable ones. Since the true labels of unlabeled data are unknown, we propose a confidence criterion that jointly uses the similarity score and the distribution of similar samples. On the one hand, the selected samples should be similar enough, that is, their similarity scores should be small enough. On the other hand, a pseudo label is more reliable when the similar samples are distributed within a small range. Specifically, a reliable pseudo label should satisfy the two criteria in Eq. (3); this process is illustrated in the decision branch shown in Fig. 1(a),
\[ S_{i_N} \le \theta_1 \quad \text{and} \quad \max_{1 \le k \le N} Y_{i_k} - \min_{1 \le k \le N} Y_{i_k} \le \theta_2, \tag{3} \]
where $\theta_1$ is a threshold on the similarity score and $\theta_2$ is a threshold on the spread of the similar samples' labels. In this letter, both $\theta_1$ and $\theta_2$ are set to 0.5.
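The following sketch implements this pseudo-label step under our reconstruction of Eqs. (2) and (3); in particular, treating the "distribution" test as the max-min spread of the N neighbor labels is an assumption.

```python
import numpy as np

def pseudo_label(scores, labels, N=5, theta1=0.5, theta2=0.5):
    """KNN pseudo-labeling with the confidence criterion.

    scores: (m,) similarity scores of one unlabeled sample against all
            m labeled samples (smaller means more similar).
    labels: (m,) labels of the labeled dataset.
    Returns (pseudo_label, is_reliable).
    """
    idx = np.argsort(scores)[:N]            # N most similar samples
    neighbor_scores, neighbor_labels = scores[idx], labels[idx]
    y_hat = neighbor_labels.mean()          # Eq. (2): average of N labels
    reliable = (neighbor_scores.max() <= theta1 and   # similar enough
                neighbor_labels.max() - neighbor_labels.min() <= theta2)
    return y_hat, reliable
```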

We use the Feature Network of Fig. 1(b) and a fully connected block to construct the range predictor shown in Fig. 1(c). The predictor is then trained on the expanded labeled dataset. Based on the above, the whole process of our method is as follows (a compact sketch of the loop is given after the list):

  1. Prepare training data. Generate similar sample pairs LS and dissimilar sample pairs LD for training the Siamese network. The detailed procedure is presented in Sec. 3.4.

  2. Train the Siamese network with LS and LD.

  3. Use the Siamese network to generate a pseudo label for each $UX_i \in U$ by Eq. (2). In this letter, N = 5.

  4. Measure the reliability of the pseudo labels by Eq. (3). Then add unlabeled samples with reliable pseudo labels to L.

  5. Train the range predictor on the augmented labeled dataset and complete the source ranging task.
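Putting the steps together, the wrapper loop might look like the following; `train_siamese`, `siamese_score`, and `train_predictor` are hypothetical stand-ins for the procedures described above, while `make_pairs` and `pseudo_label` refer to sketches given in this letter.

```python
import numpy as np

# Hypothetical driver for steps 1-5; helper names are stand-ins.
X_lab = [x for x, _ in L_data]                    # labeled samples
labels = np.array([y for _, y in L_data])         # their range labels
LS, LD = make_pairs(X_lab, labels, theta3=0.25)   # step 1, Eq. (7)
siamese = train_siamese(LS, LD)                   # step 2
new_samples = []
for UX in U_data:                                 # steps 3 and 4
    scores = np.array([siamese_score(siamese, UX, LX) for LX in X_lab])
    y_hat, reliable = pseudo_label(scores, labels, N=5)
    if reliable:
        new_samples.append((UX, y_hat))           # keep reliable labels
predictor = train_predictor(L_data + new_samples) # step 5
```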

This letter uses the public data of event S5 in the SWellEx-96 experiment to assess the validity of our method. A source at a depth of 9 m, towed by a ship, started south of all arrays and proceeded northward at a speed of 5 knots (2.5 m/s), transmitting various broadband signals at frequencies between 50 and 400 Hz. A vertical line array (VLA) with 21 hydrophones (denoted R1–R21) was deployed, equally spaced between depths of 94.125 and 212.25 m. The VLA recorded 75 min of data at a sampling rate of 1500 Hz. The dataset also provides the horizontal range between the VLA and the moving source at one-minute intervals. For more experimental details, please refer to Ref. 17.

To prove the validity of the proposed method, hydrophone data from different depths are used in the experiments. In this letter, R1, R10, and R20 are selected, at depths of 94.125, 195.38, and 206.62 m, respectively. The same experiments are performed on all three hydrophones.

We follow the data preprocessing method of Zhu et al.10 The experimental data are obtained from the hydrophone recordings with a sliding-window approach, with both the window length and the step size set to 2 s, yielding a total of 2250 samples. By analyzing the amplitude spectrum, we find that most of the energy is concentrated in the direct current (DC) component. To ensure training stability, we remove the DC component and arrange the sample data as a matrix X of shape 2250 × 1499, in which each row is a sample feature sequence of length 1499. Then we normalize each sample to [0, 1] by the following Min-Max scaling:
\[ x' = \frac{x - \min(x)}{\max(x) - \min(x)}. \tag{4} \]

The SWellEx-96 dataset provides the horizontal range only once per minute, so a label for each sample is obtained by uniform interpolation. The interpolated sample labels are likewise arranged as a matrix Y of shape 2250 × 1.
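A preprocessing sketch under the description above: a 2 s window at 1500 Hz gives 3000-point frames, whose one-sided amplitude spectrum has 1501 bins; we keep 1499 non-DC bins to match the stated feature length. The exact bin selection, and the use of the amplitude spectrum as the feature, follow our reading of Ref. 10 and are assumptions.

```python
import numpy as np

FS = 1500                    # sampling rate (Hz)
WIN = 2 * FS                 # 2 s window; step size equals window length

def preprocess(signal, ranges_per_min):
    """Sliding-window amplitude spectra and interpolated range labels."""
    n_frames = len(signal) // WIN                  # 75 min of data -> 2250
    X = np.empty((n_frames, 1499))
    for k in range(n_frames):
        frame = signal[k * WIN:(k + 1) * WIN]
        spec = np.abs(np.fft.rfft(frame))[1:1500]  # drop the DC component
        X[k] = (spec - spec.min()) / (spec.max() - spec.min())   # Eq. (4)
    # Uniform interpolation of the per-minute ranges to per-sample labels
    t_min = np.arange(len(ranges_per_min))
    t_smp = np.linspace(0, len(ranges_per_min) - 1, n_frames)
    Y = np.interp(t_smp, t_min, ranges_per_min).reshape(-1, 1)
    return X, Y
```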

In this letter, we conduct two experiments, using 12.5% and 25% of the data as labeled data; we refer to them as the "12.5% training case" and the "25% training case." Underwater source ranging is treated as a regression task, so the training data should cover the whole track. Therefore, the labeled data are obtained by uniform sampling from X. The specific sampling method is as follows:

  • 12.5% training case:
    \[ L = \{(X_i, Y_i) \mid i \equiv 1 \ (\mathrm{mod}\ 8),\ 1 \le i \le 2250\}; \tag{5} \]
  • 25% training case:
    \[ L = \{(X_i, Y_i) \mid i \equiv 1 \ (\mathrm{mod}\ 4),\ 1 \le i \le 2250\}. \tag{6} \]

For both cases, 12.5% of the samples are uniformly sampled from the unlabeled dataset U as the test set.
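One plausible realization of this split is sketched below; the modular indexing reconstructed in Eqs. (5) and (6) and the exact test-set draw are assumptions, since the letter states only that the sampling is uniform.

```python
import numpy as np

idx = np.arange(2250)
labeled_idx = idx[::8]                  # 12.5% training case, Eq. (5)
# labeled_idx = idx[::4]                # 25% training case, Eq. (6)
unlabeled_idx = np.setdiff1d(idx, labeled_idx)
test_idx = unlabeled_idx[::8]           # ~12.5% of samples held out as test
```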

The input for training the Siamese network is a sample pair $[(LX_i, LX_j), G]$, so both similar and dissimilar pairs are needed. In this letter, we use the absolute label difference to determine whether two samples are similar. For samples $LX_i, LX_j \in L$, $i \ne j$, the absolute label difference is $|Y_i - Y_j|$. If this difference is less than a threshold, the two samples are considered similar and the pair is given the label 0; otherwise, the label is 1. The division scheme is as follows:
\[ [(LX_i, LX_j), G] \in \begin{cases} L_S\ (G = 0), & |Y_i - Y_j| < \theta_3, \\ L_D\ (G = 1), & |Y_i - Y_j| \ge \theta_3, \end{cases} \tag{7} \]
where $L_S$ and $L_D$ denote the similar and dissimilar pairs, respectively, and $\theta_3$ is a threshold parameter that splits similar from dissimilar sample pairs. In our experiments, $\theta_3 = 0.25$.

After the sample pairs are constructed, the number of pairs in $L_S$ is much smaller than that in $L_D$. To balance the numbers of positive and negative samples, we duplicate $L_S$ until $L_S$ and $L_D$ are equal in number.
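A sketch of this pair construction and balancing follows; the all-pairs enumeration is an assumption, as the letter does not say how many pairs are drawn.

```python
import itertools

def make_pairs(X_lab, Y_lab, theta3=0.25):
    """Build similar (G = 0) and dissimilar (G = 1) pairs per Eq. (7),
    then duplicate the similar pairs to balance the two sets."""
    LS, LD = [], []
    for i, j in itertools.combinations(range(len(X_lab)), 2):
        pair = (X_lab[i], X_lab[j])
        if abs(Y_lab[i] - Y_lab[j]) < theta3:
            LS.append((pair, 0))        # similar pair
        else:
            LD.append((pair, 1))        # dissimilar pair
    while LS and len(LS) < len(LD):     # oversample the similar pairs
        LS = LS + LS[:len(LD) - len(LS)]
    return LS, LD
```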

The Siamese network is trained for 50 epochs with the Adam optimizer on an RTX 3080 GPU (NVIDIA Corporation, Santa Clara, CA). The learning rate and weight decay are $10^{-4}$ and $10^{-6}$, respectively, and a cosine-annealing learning-rate schedule with warm-up is used. The mean square error (MSE) is used to quantify performance. We conduct two experiments on each hydrophone (the 12.5% and 25% training cases), for a total of six experiments.
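A sketch of this training configuration follows; the warm-up length, margin, and batch pipeline (`pair_loader`) are assumptions, and `SiameseNetwork` and `contrastive_loss` refer to the earlier sketches.

```python
import math
import torch

model = SiameseNetwork().cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-6)
warmup, epochs = 5, 50                  # warm-up length is an assumption
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda e: (e + 1) / warmup if e < warmup
    else 0.5 * (1 + math.cos(math.pi * (e - warmup) / (epochs - warmup))))

for epoch in range(epochs):
    for (xi, xj), G in pair_loader:     # batches of Eq. (7) sample pairs
        S = model(xi.cuda(), xj.cuda())
        loss = contrastive_loss(S, G.float().cuda(), alpha=1.0)
        opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                        # cosine annealing with warm-up
```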

First, we compare the accuracy of pseudo labels generated with the Euclidean distance and with the improved Siamese network. The results for hydrophone R1, with 12.5% labeled data, are shown in Fig. 2. Comparing Figs. 2(a) and 2(b), it is evident that the pseudo labels generated by our method are more accurate, which demonstrates the effectiveness of our method. In addition, Fig. 2(c) shows that the proposed confidence criterion effectively filters out pseudo labels with large errors, ensuring that the retained pseudo labels are sufficiently reliable.

All six experimental results are shown in Fig. 3. Except for several outliers, most pseudo labels are fairly close to the true labels; the average MSE between pseudo labels and true labels is 0.0136. Across all experiments, an average of 1300 reliable pseudo-label samples are generated, which means the proposed method can effectively alleviate the problem of insufficient labeled data. In addition, in the 25% training case the Siamese network generates pseudo labels for almost all unlabeled samples, and in the 12.5% training case for 69% of the unlabeled samples. This shows that our method achieves good results even with less labeled data. Details of the generated pseudo labels are given in Table 1.

The predictor is trained on the original labeled dataset and on the augmented labeled dataset, respectively, and its performance is verified on the test data. The results of the different experiments are shown in Fig. 4. At the beginning and end of the track, prediction based on SSL with the augmented dataset is clearly better than supervised learning (SL) on the original dataset; in the middle part, the SSL prediction is more accurate and more evenly distributed. The MSEs of all experiments are listed in Table 1. The average MSEs of SL and SSL in the 12.5% training case are 0.45 and 0.17, respectively; in the 25% training case, they are 0.27 and 0.12. Evidently, the less labeled data, the more significant the performance improvement brought by the proposed method. Moreover, we compare our method with the PCR-Encoder: except for the R1–12.5% experiment, our method clearly outperforms it.

To address the lack of labeled data for underwater source ranging, we propose a two-step SSL method. First, pseudo labels are generated for unlabeled data using the improved Siamese network and the confidence criterion, expanding the labeled dataset; across all experiments, our method generates an average of 1300 pseudo labels with an average MSE of 0.0136, showing that it can generate a large number of reliable pseudo labels. Prediction based on SSL with the augmented dataset is clearly better than pure supervised learning on the small original labeled dataset: the average MSE decreases by 0.28 and 0.15 in the 12.5% and 25% training cases, respectively. Moreover, the less labeled data available, the more pronounced the benefit of our method.

This research was supported by the National Natural Science Foundation of China (NSFC, Grant Nos. 62201046, 62192711, and 62192712).

All authors have no conflicts of interest to disclose.

The data that support the findings of this study are openly available at http://swellex96.ucsd.edu/; please refer to Ref. 17.

1. H. Niu, E. Reeves, and P. Gerstoft, "Source localization in an ocean waveguide using supervised machine learning," J. Acoust. Soc. Am. 142(3), 1176–1188 (2017).
2. W. Liu, Y. Yang, M. Xu, L. Lü, Z. Liu, and Y. Shi, "Source localization in the deep ocean using a convolutional neural network," J. Acoust. Soc. Am. 147(4), EL314–EL319 (2020).
3. S. Yoon, H. Yang, and W. Seong, "Deep learning-based high-frequency source depth estimation using a single sensor," J. Acoust. Soc. Am. 149(3), 1454–1465 (2021).
4. R. Chen and H. Schmidt, "Model-based convolutional neural network approach to underwater source-range estimation," J. Acoust. Soc. Am. 149(1), 405–420 (2021).
5. Y. Liu, H. Niu, Z. Li, and M. Wang, "Deep-learning source localization using autocorrelation functions from a single hydrophone in deep ocean," JASA Express Lett. 1(3), 036002 (2021).
6. A. B. Baggeroer, W. Kuperman, and H. Schmidt, "Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem," J. Acoust. Soc. Am. 83(2), 571–587 (1988).
7. A. B. Baggeroer, W. A. Kuperman, and P. N. Mikhalevsky, "An overview of matched field methods in ocean acoustics," IEEE J. Oceanic Eng. 18(4), 401–424 (1993).
8. W. Mantzel, J. Romberg, and K. Sabra, "Compressive matched-field processing," J. Acoust. Soc. Am. 132(1), 90–102 (2012).
9. T. Yang, "Data-based matched-mode source localization for a moving source," J. Acoust. Soc. Am. 135(3), 1218–1230 (2014).
10. X. Zhu, H. Dong, P. S. Rossi, and M. Landrø, "Feature selection based on principal component analysis for underwater source localization by deep learning," arXiv:2011.12754 (2020).
11. X. Zhu, H. Dong, P. S. Rossi, and M. Landrø, "Time-frequency fused underwater acoustic source localization based on contrastive predictive coding," IEEE Sens. J. 22(13), 13299–13308 (2022).
12. P. Jin, B. Wang, L. Li, P. Chao, and F. Xie, "Semi-supervised underwater acoustic source localization based on residual convolutional autoencoder," EURASIP J. Adv. Signal Process. 2022(1), 107 (2022).
13. Z.-H. Zhou and M. Li, "Semisupervised regression with cotraining-style algorithms," IEEE Trans. Knowl. Data Eng. 19(11), 1479–1493 (2007).
14. M. F. Abdel Hady, F. Schwenker, and G. Palm, "Semi-supervised learning for regression with co-training by committee," in Proceedings of the International Conference on Artificial Neural Networks, Limassol, Cyprus (September 14–17, 2009), pp. 121–130.
15. G. Kostopoulos, S. Karlos, S. Kotsiantis, and O. Ragos, "Semi-supervised regression: A recent review," J. Intell. Fuzzy Syst. 35(2), 1483–1500 (2018).
16. C.-L. Liu and Q.-H. Chen, "Metric-based semi-supervised regression," IEEE Access 8, 30001–30011 (2020).
17. J. Murray and D. Ensberg, "The SWellEx-96 experiment," http://www.mpl.ucsd.edu/swellex96 (Last viewed August 20, 2023).