Coherent processing in synthetic aperture sonar (SAS) requires platform motion estimation and compensation with sub-wavelength accuracy for high-resolution imaging. Micronavigation, i.e., through-the-sensor platform motion estimation, is essential when positioning information from navigational instruments is absent or inadequately accurate. A machine learning method based on variational Bayesian inference has been proposed for unsupervised data-driven micronavigation. Herein, the multiple-input multiple-output arrangement of a multi-band SAS system is exploited and combined with a hierarchical variational inference scheme, which self-supervises the learning of platform motion and results in improved micronavigation accuracy.

Synthetic aperture sonar (SAS) combines coherently the backscattered echoes recorded with an active sonar from a platform moving along a predefined trajectory.1 Coherent processing requires platform motion estimation and compensation with sub-wavelength accuracy to produce high-resolution acoustic imaging of the seafloor.2 Motion estimation with navigational instruments, which are commonly mounted on the SAS platform, is limited by the nominal accuracy of the sensors and, possibly, by interrupted data acquisition.3 In multi-channel systems, the relative ping-to-ping platform motion can be estimated by cross-correlating the signals of overlapping elements between successive pings due to the spatiotemporal coherence of homogeneous reverberation.4,5 This through-the-sensor platform motion estimation, referred to as micronavigation, aims at providing sub-wavelength accuracy for coherent SAS processing.6 

Contrary to traditional micronavigation methods, which are based on analytical or numerical coherence models and involve spatiotemporal interpolation and fitting,7–9 a representation learning approach based on variational inference and implemented with a variational autoencoder (VAE) offers a fully data-driven method for platform motion estimation.10 The trained VAE provides immediate ping-to-ping platform translation estimates from coherence measurements on the coarse spatiotemporal acquisition grid determined by overlapping sensors, without further processing. Data-driven autofocusing methods11 compensate for phase errors in post-processing after image reconstruction, so their algorithmic complexity depends on the size of the image patch. In contrast, data-driven micronavigation is a pre-processing phase correction that is independent of the image size and of the beamforming method used for image reconstruction.

This study extends the variational inference scheme for micronavigation introduced in Ref. 10, by relating the coherence data from each subsystem of a multiple-input multiple-output (MIMO) configuration with a hierarchical Bayesian model. MIMO configurations utilize the waveform diversity from multiple transmitters for multi-spectral processing12 or for improving the spatial sampling.13 Herein, a SAS system with a two-dimensional (2D) receiver array and two transmitters is considered for multi-band imaging.14,15 Such a configuration results in two subsystems of virtual monostatic phase centers,5 which can be utilized for coherence measurements. Due to each transmitter's distinctive aperture and transmitted pulse bandwidth, the shape of the coherence function on overlapping elements at successive pings differs between the two subsystems, but the location of the coherence peak is defined by the relative translation of the platform in both cases.

We show that, for such multi-static systems, micronavigation estimates can be fused for better estimation accuracy. Specifically, we introduce a variational inference scheme with two coupled but independently parameterized VAEs that uses the common latent space between the two coherence datasets to learn jointly the corresponding generative features. Such cross-domain learning has been used for data fusion from different modalities of sensory signals16 and unified learning from multi-view images.17 The hierarchical formulation of the variational inference problem transfers the knowledge between the datasets and thus self-supervises the training of coupled VAEs and improves the estimation accuracy.

A SAS system comprises an arrangement of transmitters and receivers to insonify the seafloor and record the backscattered echoes, respectively, as the platform moves along a predefined (usually linear) path. In monostatic systems, the transmitter and the receiver are collocated by definition, whereas in multi-static configurations the phase center approximation (PCA)5 replaces each transmitter-receiver pair with a virtual transceiver located midway between them (see Fig. 1). In MIMO systems, each transmitter $m \in \{1, \ldots, M\}$ with spatial aperture $b_{T_m}(\mathbf{r})$ insonifies the seafloor with a pulse, referred to as a ping, with frequency response $q_m(f)$ at frequency $f$. In any given time frame, the frequency response of the sound pressure recorded at a receiver located at $\mathbf{r}$ is the total backscattered field from all scatterers within the corresponding isochronous volume $V_s$ insonified by the superposition of the $M$ transmitted pulses,

$$
p(\mathbf{r}, f) = \int_{V_s} \left[ \sum_{m=1}^{M} q_m(f)\, B_m(\mathbf{k}_s)\, \frac{e^{-j(2\pi f/c)\, 2\|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2}}{\left(4\pi \|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2\right)^2} \right] s(\mathbf{r}_s, f)\, d\mathbf{r}_s, \tag{1}
$$

where $s$ is the scattering strength of a scatterer located at $\mathbf{r}_s$, $\mathbf{r}_{v_m}$ is the location of the transceiver on the virtual aperture associated with the $m$th transmitter, and $B_m(\mathbf{k}_s)$ expresses the combined beampattern of each transmitter-receiver pair, where $\mathbf{k}_s = (2\pi f/c)(\mathbf{r}_s - \mathbf{r}_{v_m})/\|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2$ is the wavenumber vector for sound speed $c$. The information from the transmit diversity is exploited with matched filtering, i.e., by multiplying the recorded signal in Eq. (1) with the complex conjugate (denoted by an overline) of each transmitted pulse,

$$
p_{\mathrm{mf}}(\mathbf{r}_{v_m}, f) = \bar{q}_m(f)\, p(\mathbf{r}, f) = \frac{\|q_m(f)\|_2^2}{(4\pi)^2} \int_{V_s} B_m(\mathbf{k}_s)\, \frac{e^{-j(2\pi f/c)\, 2\|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2}}{\|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2^2}\, s(\mathbf{r}_s, f)\, d\mathbf{r}_s + \mathrm{res}. \tag{2}
$$
Fig. 1.

(a) Schematic of a multi-static SAS system with a two-dimensional (2D) array of receivers and two transmitters with different aperture sizes and frequency bands at either side of the receiver array. (b) The corresponding PCA virtual sensor configuration at two successive pings along the nominal trajectory (along the x axis). An instance of the 3D spatial coherence of diffuse backscatter, as a function of displacement relative to the transducer size $D$ and the pulse bandwidth $\Delta f$, is annotated for each of the virtual arrays.
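
The virtual sensor configuration in Fig. 1(b) follows from the phase center approximation, which places each virtual transceiver midway between a transmitter and a receiver. The snippet below is a minimal sketch of that construction. The coordinates, the 2 × 2 receiver layout, and the function name `virtual_phase_centers` are illustrative assumptions, not the geometry of the system considered here.

```python
def virtual_phase_centers(tx, receivers):
    """Midpoint between one transmitter and each receiver (PCA)."""
    return [tuple((t + r) / 2.0 for t, r in zip(tx, rx)) for rx in receivers]

# Hypothetical geometry in metres (x along-track, y slant-range, z vertical).
receivers = [(0.00, 0.0, 0.00), (0.03, 0.0, 0.00),
             (0.00, 0.0, 0.03), (0.03, 0.0, 0.03)]
transmitters = {"tx1": (-0.10, 0.0, 0.0), "tx2": (0.13, 0.0, 0.0)}

# One virtual array per transmitter, i.e., one subsystem per transmitted band.
virtual_arrays = {name: virtual_phase_centers(tx, receivers)
                  for name, tx in transmitters.items()}
```

Under this approximation the spacing of the virtual elements is half the physical receiver spacing, so the assumed 3 cm receiver pitch maps to a 1.5 cm virtual pitch.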


The matched-filtered response in Eq. (2) assumes orthogonal waveforms, e.g., transmitted pulses that occupy distinct parts of the frequency spectrum, which is the case considered in this study for multi-band imaging. In this case, the residual term $\mathrm{res} = \bar{q}_m(f) \int_{V_s} \left[ \sum_{n=1, n \neq m}^{M} q_n(f)\, B_n(\mathbf{k}_s)\, e^{-j(2\pi f/c)\, 2\|\mathbf{r}_s - \mathbf{r}_{v_n}\|_2} / \left(4\pi \|\mathbf{r}_s - \mathbf{r}_{v_n}\|_2\right)^2 \right] s(\mathbf{r}_s, f)\, d\mathbf{r}_s$ vanishes since $\bar{q}_m(f)\, q_{n \neq m}(f) = 0$.
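
The orthogonality condition can be illustrated with idealized flat-spectrum pulses in disjoint frequency bands. In the sketch below, the band edges, the 1 kHz frequency grid, and the helper names `flat_pulse` and `matched_filter_gain` are hypothetical choices made only for illustration.

```python
def flat_pulse(f_lo, f_hi):
    """Idealized pulse spectrum: unit magnitude inside [f_lo, f_hi), zero elsewhere."""
    def q(f):
        return (1.0 + 0.0j) if f_lo <= f < f_hi else 0.0j
    return q

# Hypothetical, non-overlapping bands in Hz.
q1 = flat_pulse(100e3, 180e3)
q2 = flat_pulse(40e3, 80e3)

freqs = [1e3 * k for k in range(201)]  # 0 to 200 kHz in 1 kHz steps

def matched_filter_gain(q_ref, q_tx):
    """Sum over frequency of conj(q_ref) * q_tx."""
    return sum((q_ref(f).conjugate() * q_tx(f)).real for f in freqs)

cross = matched_filter_gain(q1, q2)  # residual term: zero for disjoint bands
own = matched_filter_gain(q1, q1)    # desired term: survives the matched filter
```

Because the two spectra never overlap, every product of the conjugate reference with the other pulse is zero, which is the condition under which the residual term drops out of Eq. (2).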

The coherence of the matched-filtered signals recorded at $\mathbf{r}_{v_m}$ and $\mathbf{r}_0$ (here $\mathbf{r}_0 = \mathbf{0}$ for notational brevity) is

$$
C(\mathbf{r}_{v_m}, f) = \mathbb{E}_s\!\left[ p_{\mathrm{mf}}(\mathbf{r}_{v_m}, f)\, \bar{p}_{\mathrm{mf}}(\mathbf{r}_0, f) \right] \approx \frac{\|q_m(f)\|_2^4}{(4\pi R_s)^4} \int_{V_s} \mathbb{E}_s\!\left[ s(\mathbf{r}_s, f)\, \bar{s}(\mathbf{r}_s, f) \right] \|B_m(\mathbf{k}_s)\|_2^2\, e^{j(2\pi f/c)\, 2\, \hat{\mathbf{r}}_s \cdot \mathbf{r}_{v_m}}\, d\hat{\mathbf{r}}_s, \tag{3}
$$

where the expectation $\mathbb{E}_s[\cdot]$ is over the scattering function $s$, which is the only random variable in Eq. (3), and considering that all scatterers in the isochronous volume lie on a spherical shell with mean radius $R_s \approx \|\mathbf{r}_s\|_2$ in the far field of the sonar such that $\|\mathbf{r}_s - \mathbf{r}_{v_m}\|_2 \approx \|\mathbf{r}_s\|_2 - \hat{\mathbf{r}}_s \cdot \mathbf{r}_{v_m}$.18 The spatial dependence of the coherence is described by the integral in Eq. (3), which represents the spatial Fourier transform of the power spectral density of the virtual transceiver beampattern modulated by the spatial coherence of the scattering function. For unresolved scatterers, the scattering function is spectrally and spatially uncorrelated, $\mathbb{E}_s[s(\mathbf{r}_s, f)\, \bar{s}(\mathbf{r}_s, f)] = \|s\|_2^2$. In this case, the spatial coherence depends only on the autocorrelation function of the virtual transceiver aperture,19 $C_{T_m}(\mathbf{r}_{v_m}) = (b_{v_m} \star b_{v_m})[\mathbf{r}]$, where $b_{v_m} = b_{T_m} \ast b_R$ is the virtual transceiver aperture, $b_R$ is the receiver's spatial aperture, $\star$ is the correlation operator, and $\ast$ denotes convolution. Considering sensors with rectangular apertures, as in Fig. 1, the resulting autocorrelation function of the virtual transceiver aperture can be well-approximated by a 2D Gaussian function.8,20 Including the effect of spatially coherent scattering due to inhomogeneous media or anisotropic backscatter modulates the spatial coherence accordingly: $C_{T_m}(\mathbf{r}_{v_m}) = (b_{v_m} \star b_{v_m})[\mathbf{r}] \ast (s \star s)[\mathbf{r}]$.19

The temporal dependence of the coherence is obtained through the inverse Fourier transform of Eq. (3), which depends on the power spectral density of the transmitted pulse, $C_{q_m}(\tau) = \int \|q_m(f)\|_2^4\, e^{j 2\pi f \tau}\, df = (q_m \star q_m)[\tau] \ast (q_m \star q_m)[\tau]$. For linear frequency modulated (LFM) waveforms with uniform frequency response over their corresponding bandwidth, the temporal coherence is a $\mathrm{sinc}^2$ function, which can also be well-approximated by a Gaussian function. Normalizing Eq. (3) with the average power of the corresponding signals yields the spatiotemporal correlation coefficient on each virtual sub-array as the product of the normalized spatial and temporal coherence associated with the corresponding transmitter characteristics,

$$
C_m(\mathbf{r}_{v_m}, \tau) = \frac{C_{T_m}(\mathbf{r}_{v_m})}{C_{T_m}(\mathbf{0})}\, \frac{C_{q_m}(\tau)}{C_{q_m}(0)}. \tag{4}
$$
The temporal dimension of the correlation coefficient in Eq. (4) is transformed into slant-range as $y = c\tau/2$ for two-way propagation. For generality, we assume a SAS system with a 2D receiver array, which allows for three-dimensional (3D) coherence measurements. Note that 2D coherence measurements, obtained with SAS systems with a linear receiver array or in arrangements in which the transmitters insonify the seafloor at very low grazing angles, can be treated as a special case with reduced dimensionality, i.e., without direct motion estimation capability along the z axis.
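
As a minimal sketch of Eq. (4) under the Gaussian approximations discussed above, the function below multiplies a normalized Gaussian spatial coherence by a normalized Gaussian temporal coherence, with the time lag first converted to slant-range. The Gaussian widths `s_x`, `s_z`, `s_y`, the nominal sound speed of 1500 m/s, and the function names are assumptions made for illustration.

```python
import math

SOUND_SPEED = 1500.0  # m/s, assumed nominal value

def slant_range(tau, c=SOUND_SPEED):
    """Convert a two-way time lag tau (s) to slant-range (m): y = c * tau / 2."""
    return c * tau / 2.0

def correlation_coefficient(x, z, tau, s_x, s_z, s_y, c=SOUND_SPEED):
    """Product of normalized Gaussian spatial and temporal coherence, as in Eq. (4)."""
    spatial = math.exp(-0.5 * ((x / s_x) ** 2 + (z / s_z) ** 2))
    temporal = math.exp(-0.5 * (slant_range(tau, c) / s_y) ** 2)
    return spatial * temporal
```

Both factors equal one at zero displacement, so the coefficient is already normalized and peaks at the origin in the absence of platform motion.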
Let $\mathbf{d} \in \mathbb{R}^{L}$ be a backscatter coherence measurement on a 3D spatial grid, $C(x, y, z)$ over $L$ voxels, as derived in Sec. 2, and let $\mathbf{u} \in \mathbb{R}^{K}$ describe the $K$-dimensional vector ($K \ll L$) of the unobserved latent features such that

$$
\mathbf{d} = f_{\theta}(\mathbf{u}), \tag{5}
$$

where $f$ is a non-linear generative model with parameters $\theta$. The non-linear generative model offers flexibility as higher moments of the data are captured in the likelihood, which, in the presence of additive Gaussian noise with zero mean and variance $\sigma^2$, is expressed as the Gaussian distribution

$$
p_{\theta}(\mathbf{d} \,|\, \mathbf{u}) = \mathcal{N}\!\left(\mathbf{d} \,|\, f_{\theta}(\mathbf{u}), \sigma^2 \mathbf{I}\right). \tag{6}
$$
Probabilistic inference provides a statistical estimate of the latent variables that have generated the noisy observed data, described by the posterior distribution

$$
p_{\theta}(\mathbf{u} \,|\, \mathbf{d}) = \frac{p_{\theta}(\mathbf{d} \,|\, \mathbf{u})\, p(\mathbf{u})}{p_{\theta}(\mathbf{d})}, \tag{7}
$$

where $p(\mathbf{u})$ is the prior distribution of the latent variables and $p_{\theta}(\mathbf{d})$ is the marginal likelihood. However, the posterior in Eq. (7) is usually analytically or computationally intractable [e.g., when the data are non-linear functions of the latent variables,21 as in Eq. (5)] due to the marginal likelihood, which involves the integration of the joint probability distribution $p_{\theta}(\mathbf{d}, \mathbf{u})$ over the latent variables. To overcome the intractability of the true posterior distribution, variational inference methods approximate the true posterior with a simpler distribution, such as a Gaussian distribution with diagonal covariance matrix

$$
q_{\phi}(\mathbf{u} \,|\, \mathbf{d}) = \mathcal{N}\!\left(\mathbf{u} \,|\, \boldsymbol{\mu}_{\phi}, \mathrm{diag}(\boldsymbol{\sigma}_{\phi}^2)\right), \tag{8}
$$

with the mean and the variance inferred from the data through a non-linear model, $[\boldsymbol{\mu}_{\phi}, \log \boldsymbol{\sigma}_{\phi}] = g_{\phi}(\mathbf{d})$, parameterized by $\phi$.21 The accuracy of the approximation, $q_{\phi}(\mathbf{u} \,|\, \mathbf{d}) \approx p_{\theta}(\mathbf{u} \,|\, \mathbf{d})$, is quantified with the Kullback–Leibler (KL) divergence, $D_{KL}$, between the approximating and the exact posterior distributions and is optimized through the parameters $\phi$ and $\theta$:22,23

$$
\{\hat{\phi}, \hat{\theta}\} = \arg\min_{\phi, \theta} D_{KL}\!\left(q_{\phi}(\mathbf{u} \,|\, \mathbf{d}) \,\|\, p_{\theta}(\mathbf{u} \,|\, \mathbf{d})\right) = \arg\min_{\phi, \theta} \left[ -\mathbb{E}_{q_{\phi}}\!\left[\log p_{\theta}(\mathbf{d} \,|\, \mathbf{u})\right] + \beta\, D_{KL}\!\left(q_{\phi}(\mathbf{u} \,|\, \mathbf{d}) \,\|\, p(\mathbf{u})\right) \right]. \tag{9}
$$

The regularization parameter $\beta$ is introduced to control the relative importance of the data-fitting term and the KL divergence between the approximate posterior and the prior distribution of the latent variables.24 With a prior that promotes independent latent variables, e.g., $p(\mathbf{u}) = \mathcal{N}(\mathbf{u} \,|\, \mathbf{0}, \mathbf{I})$ due to the diagonal covariance, setting $\beta > 1$ favors disentangled representations of the latent variables at the expense of less accurate data reconstruction.25 VAEs solve the variational inference problem by implementing the non-linear generative $f_{\theta}$ and inference $g_{\phi}$ models that parameterize the data likelihood and approximate posterior distribution, respectively, as deep neural networks with mirrored architecture.26
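
For the Gaussian likelihood in Eq. (6) and the diagonal Gaussian posterior in Eq. (8), both terms of the objective in Eq. (9) have simple expressions. The sketch below evaluates them for a single latent sample, assuming that the decoder output `d_hat` and the encoder outputs `mu` and `log_sigma` are already available. The function names are illustrative, and the default values of `sigma` and `beta` are the ones used in the simulations reported later in this study.

```python
import math

def gaussian_nll(d, d_hat, sigma):
    """-log N(d | d_hat, sigma^2 I), including the normalization constant."""
    n = len(d)
    sq = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return 0.5 * sq / sigma ** 2 + n * math.log(sigma) + 0.5 * n * math.log(2.0 * math.pi)

def kl_to_standard_normal(mu, log_sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return sum(0.5 * (m ** 2 + math.exp(2.0 * ls) - 1.0) - ls
               for m, ls in zip(mu, log_sigma))

def beta_vae_loss(d, d_hat, mu, log_sigma, sigma=0.05, beta=25.0):
    """Single-sample estimate of the bracketed objective in Eq. (9)."""
    return gaussian_nll(d, d_hat, sigma) + beta * kl_to_standard_normal(mu, log_sigma)
```

The KL term vanishes only when the approximate posterior equals the prior, so a larger `beta` pushes uninformative latent dimensions toward zero mean and unit variance.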
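Consider the case that the same latent features generate correlated data points in two different datasets, $\mathbf{d}_1 \in \mathcal{D}_1$ and $\mathbf{d}_2 \in \mathcal{D}_2$. To describe this dependency, we introduce the hierarchical model

$$
p_{\theta}(\mathbf{u} \,|\, \mathbf{d}_1, \mathbf{d}_2) = \frac{p_{\theta}(\mathbf{d}_1, \mathbf{d}_2 \,|\, \mathbf{u})\, p(\mathbf{u})}{p_{\theta}(\mathbf{d}_1, \mathbf{d}_2)} = \frac{p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1, \mathbf{u})}{p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1)}\, \frac{p_{\theta_1}(\mathbf{d}_1 \,|\, \mathbf{u})\, p(\mathbf{u})}{p_{\theta_1}(\mathbf{d}_1)}. \tag{10}
$$

Following Eqs. (6) and (8), the data likelihoods and the approximate posterior distributions for the hierarchical model in Eq. (10) are expressed, respectively, as

$$
\begin{aligned}
p_{\theta_1}(\mathbf{d}_1 \,|\, \mathbf{u}) &= \mathcal{N}\!\left(\mathbf{d}_1 \,|\, \hat{\mathbf{d}}_1, \sigma^2 \mathbf{I}\right), & \hat{\mathbf{d}}_1 &= f_{\theta_1}(\mathbf{u}), \\
p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1, \mathbf{u}) &= \mathcal{N}\!\left(\mathbf{d}_2 \,|\, \hat{\mathbf{d}}_2, \sigma^2 \mathbf{I}\right), & \hat{\mathbf{d}}_2 &= f_{\theta_2}(\mathbf{u}),
\end{aligned} \tag{11}
$$

$$
\begin{aligned}
q_{\phi_1}(\mathbf{u} \,|\, \mathbf{d}_1) &= \mathcal{N}\!\left(\mathbf{u} \,|\, \boldsymbol{\mu}_{\phi_1}, \mathrm{diag}(\boldsymbol{\sigma}_{\phi_1}^2)\right) \approx \frac{p_{\theta_1}(\mathbf{d}_1 \,|\, \mathbf{u})\, p(\mathbf{u})}{p_{\theta_1}(\mathbf{d}_1)}, & [\boldsymbol{\mu}_{\phi_1}, \log \boldsymbol{\sigma}_{\phi_1}] &= g_{\phi_1}(\mathbf{d}_1), \\
q_{\phi_2}(\mathbf{u} \,|\, \mathbf{d}_1, \mathbf{d}_2) &= \mathcal{N}\!\left(\mathbf{u} \,|\, \boldsymbol{\mu}_{\phi_2}, \mathrm{diag}(\boldsymbol{\sigma}_{\phi_2}^2)\right) \approx \frac{p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1, \mathbf{u})}{p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1)}\, q_{\phi_1}(\mathbf{u} \,|\, \mathbf{d}_1), & [\boldsymbol{\mu}_{\phi_2}, \log \boldsymbol{\sigma}_{\phi_2}] &= g_{\phi_2}(\mathbf{d}_2).
\end{aligned} \tag{12}
$$

To solve the hierarchical model in Eq. (10) with variational inference, we introduce a pair of VAEs that implement the non-linear generative, $f_{\theta_1}$ and $f_{\theta_2}$, and inference, $g_{\phi_1}$ and $g_{\phi_2}$, models as neural networks with the same architecture but independently parameterized (see Fig. 2). The two VAEs are optimized simultaneously by minimizing the coupled loss function with respect to their independent parameters, $\phi_{1,2}$ and $\theta_{1,2}$:

$$
\mathcal{L}^{\beta}_{\phi_{1,2}, \theta_{1,2}}(\mathbf{d}_1, \mathbf{d}_2) = \left[ \begin{aligned} &-\mathbb{E}_{q_{\phi_1}}\!\left[\log p_{\theta_1}(\mathbf{d}_1 \,|\, \mathbf{u})\right] + \beta\, D_{KL}\!\left(q_{\phi_1}(\mathbf{u} \,|\, \mathbf{d}_1) \,\|\, p(\mathbf{u})\right) \\ &-\mathbb{E}_{q_{\phi_2}}\!\left[\log p_{\theta_2}(\mathbf{d}_2 \,|\, \mathbf{d}_1, \mathbf{u})\right] + \beta\, D_{KL}\!\left(q_{\phi_2}(\mathbf{u} \,|\, \mathbf{d}_1, \mathbf{d}_2) \,\|\, q_{\phi_1}(\mathbf{u} \,|\, \mathbf{d}_1)\right) \end{aligned} \right]. \tag{13}
$$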
Fig. 2.

Schematic of the coupled VAE architecture.


The loss function in Eq. (13) maximizes simultaneously the data likelihood for both datasets and couples the latent variables by minimizing the KL divergence between the two approximate posteriors. This coupling, introduced by the hierarchical model in Eq. (10), allows the approximate posterior q ϕ 1 to progressively supervise the training of the approximate posterior q ϕ 2 by labeling its prior, improving simultaneously the estimation accuracy of both models compared to the unsupervised case.
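
The coupling term in Eq. (13) is a KL divergence between two diagonal Gaussians, which has a closed form. The sketch below evaluates the coupled loss for one pair of data points, assuming the reconstructions and the encoder outputs are given. The function names and the tuple layout `(mu, log_sigma)` are illustrative, and the likelihood terms are kept only up to an additive constant. How gradients are propagated through the first posterior in the coupling term is not specified here and is left out of the sketch.

```python
import math

def kl_diag_gaussians(mu_a, log_sig_a, mu_b, log_sig_b):
    """Closed-form KL( N(mu_a, diag(sig_a^2)) || N(mu_b, diag(sig_b^2)) )."""
    total = 0.0
    for ma, la, mb, lb in zip(mu_a, log_sig_a, mu_b, log_sig_b):
        var_a, var_b = math.exp(2.0 * la), math.exp(2.0 * lb)
        total += lb - la + (var_a + (ma - mb) ** 2) / (2.0 * var_b) - 0.5
    return total

def sq_error_nll(d, d_hat, sigma):
    """Gaussian negative log-likelihood up to an additive constant."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(d, d_hat)) / sigma ** 2

def coupled_loss(d1, d1_hat, d2, d2_hat, post1, post2, sigma=0.05, beta=25.0):
    """Single-sample estimate of Eq. (13); each post is (mu, log_sigma)."""
    mu1, ls1 = post1
    mu2, ls2 = post2
    zeros = [0.0] * len(mu1)  # standard normal prior: zero mean, zero log-sigma
    term1 = sq_error_nll(d1, d1_hat, sigma) + beta * kl_diag_gaussians(mu1, ls1, zeros, zeros)
    term2 = sq_error_nll(d2, d2_hat, sigma) + beta * kl_diag_gaussians(mu2, ls2, mu1, ls1)
    return term1 + term2
```

The second KL term is zero only when the two encoders agree on the latent mean and variance, which is how the first posterior acts as a learned prior for the second.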

The coupled model for hierarchical variational inference of the platform motion from coherence measurements with the multi-static system in Fig. 1 is implemented with two VAEs, depicted in Fig. 2, independently parameterized and trained with correlated datasets. Since this study builds upon the work in Ref. 10, we employ the same neural network architecture for the encoder and the decoder for each VAE.

The coherence data samples are simulated with 3D Gaussian functions, as derived in Sec. 2 for rectangular sensor apertures and LFM waveforms, on a 3D spatial grid corresponding to a 12 × 12 grid of adjacent PCA virtual transceivers with spacing 1.5 cm and a temporal window of 60 samples, which is transformed into slant-range as $y = c\tau/2$. To allow comparison with the unsupervised model introduced in Ref. 10, the data depend on nine generative factors: $\Delta x$, $\Delta y$, and $\Delta z$, which determine the 3D location of the Gaussian function and simulate the ping-to-ping translation; $s_x$, $s_y$, and $s_z$, which determine the spread of the Gaussian function in each dimension and simulate the effect of transmitter aperture and pulse bandwidth; and rotation ($\psi$), scale ($\alpha$), and noise-floor ($\zeta$), which modulate and scale the Gaussian function and simulate the effect of anisotropic backscatter and noise, such that

$$
\mathbf{d} = C(x, y, z) = \max\!\left( \alpha \exp\!\left( -\tfrac{1}{2}\left[ (x'/s_x)^2 + (y'/s_y)^2 + (z'/s_z)^2 \right] \right),\ \zeta \right),
$$

where $x' = \cos\psi\,(x - \Delta x) - \sin\psi\,(z - \Delta z)$, $y' = (y - \Delta y)$, and $z' = \sin\psi\,(x - \Delta x) + \cos\psi\,(z - \Delta z)$. The choice of a Gaussian coherence function simplifies the analysis and aids reproducibility. Nevertheless, the models can be trained with other simulated or measured datasets from different array geometries,7,9 with potential impact on performance.
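
A minimal sketch of how one such coherence sample could be generated is given below. The grid centering, the 2 mm slant-range bin size, the assignment of the 12 × 12 virtual elements to the x and z axes, and the function name `coherence_sample` are assumptions for illustration. The factor values in the example call are the first-dataset means listed in Table 1.

```python
import math

def coherence_sample(grid_x, grid_y, grid_z, dx, dy, dz, sx, sy, sz,
                     psi_deg=0.0, alpha=1.0, zeta=0.0):
    """Evaluate max(alpha * exp(-q / 2), zeta) on a 3D grid, indexed [ix][iy][iz]."""
    c = math.cos(math.radians(psi_deg))
    s = math.sin(math.radians(psi_deg))
    volume = []
    for x in grid_x:
        plane = []
        for y in grid_y:
            row = []
            for z in grid_z:
                xr = c * (x - dx) - s * (z - dz)  # rotated x coordinate
                yr = y - dy                       # slant-range is not rotated
                zr = s * (x - dx) + c * (z - dz)  # rotated z coordinate
                q = (xr / sx) ** 2 + (yr / sy) ** 2 + (zr / sz) ** 2
                row.append(max(alpha * math.exp(-0.5 * q), zeta))
            plane.append(row)
        volume.append(plane)
    return volume

# 12 x 12 virtual elements at 1.5 cm spacing and 60 slant-range bins (bin size assumed).
gx = [0.015 * (i - 5.5) for i in range(12)]
gz = [0.015 * (i - 5.5) for i in range(12)]
gy = [0.002 * (i - 29.5) for i in range(60)]
d = coherence_sample(gx, gy, gz, dx=0.03, dy=0.0, dz=0.0,
                     sx=0.015, sy=0.01, sz=0.015, alpha=0.75, zeta=0.1)
```

The `max` with the noise floor clips the Gaussian tails, so voxels far from the peak carry no information about the translation.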

Note that for multi-static SAS systems with several transmitters, the location of the maximum coherence is determined by the relative platform motion between pings and is common for all sub-systems. We employ a SAS configuration with two transmitters operating in different frequency bands, assuming that the second transducer has double the aperture size and half the bandwidth of the first transducer (80 and 40 kHz, respectively). Hence, the spread of the Gaussian coherence instances from the second dataset is double that of the first dataset for all dimensions (see Fig. 2 and Table 1). The rest of the generative factors are common for both datasets. Data points are generated simultaneously for the two datasets by randomly sampling each generative factor from a Gaussian distribution with mean $\mu_{gf}$ and variance $\sigma_{gf}^2$ (see Table 1). The generative factors that are common between datasets are sampled once for each pair of data points.

Table 1.

Generative factors parameterizing the Gaussian functions in the coupled coherence datasets and the parameters of the Gaussian distributions they are sampled from. Where two means are listed, they refer to the first / second dataset.

| Generative factor | $\mu_{gf}$ | $\sigma_{gf}$ |
|---|---|---|
| x-location $\Delta x$ | 0.03 m | 0.02 m |
| y-location $\Delta y$ | 0 m | 0.04 m |
| z-location $\Delta z$ | 0 m | 0.04 m |
| x-spread $s_x$ | 0.015 / 0.03 m | 0.01 m |
| y-spread $s_y$ | 0.01 / 0.02 m | 0.01 m |
| z-spread $s_z$ | 0.015 / 0.03 m | 0.01 m |
| Rotation $\psi$ | 0° | 15° |
| Scale $\alpha$ | 0.75 | 0.15 |
| Noise-floor $\zeta$ | 0.1 | 0.05 |
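
The paired sampling of generative factors can be sketched as follows, using the means and standard deviations from Table 1. Whether the spreads of the two datasets are drawn independently, and how non-positive spread draws are handled, is not stated here. The sketch assumes independent draws around the respective means and leaves the draws unmodified. The dictionary keys and the function name `sample_factor_pair` are illustrative.

```python
import random

# (mean, standard deviation) per factor, from Table 1.
COMMON = {"dx": (0.03, 0.02), "dy": (0.0, 0.04), "dz": (0.0, 0.04),
          "psi": (0.0, 15.0), "alpha": (0.75, 0.15), "zeta": (0.1, 0.05)}
# Spread factors: first-dataset mean; the second-dataset mean is twice as large.
SPREAD = {"sx": (0.015, 0.01), "sy": (0.01, 0.01), "sz": (0.015, 0.01)}

def sample_factor_pair(rng):
    """Draw one pair of generative-factor sets that share the common factors."""
    common = {k: rng.gauss(m, s) for k, (m, s) in COMMON.items()}
    f1, f2 = dict(common), dict(common)
    for k, (m, s) in SPREAD.items():
        f1[k] = rng.gauss(m, s)        # first dataset
        f2[k] = rng.gauss(2.0 * m, s)  # second dataset, doubled mean (assumed independent draw)
    return f1, f2

rng = random.Random(0)  # fixed seed only for repeatability of the example
factors_1, factors_2 = sample_factor_pair(rng)
```

Sharing the location factors within each pair is what ties the two datasets to the same platform translation.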

The coupled VAEs were trained by optimizing Eq. (13) for $5 \times 10^3$ iterations, by which point the loss had converged, i.e., its value changed negligibly between iterations. At each training iteration, a batch of 1000 data points is used to update the parameters of the encoder and decoder networks. The data likelihood is considered Gaussian with $\sigma = 0.05$ for both datasets [see Eq. (11)]. The regularization parameter is set to $\beta = 25$, which offers a good balance between data reconstruction accuracy and a disentangled latent representation; see Ref. 10 for details on regularization parameter tuning. The latent space dimension, $K = 15$, is chosen larger than the number of generative factors of the simulated datasets to account for realistic cases, where the number of latent features is not known a priori. Any extra features not present in the data will correspond to non-informative latents after training.

Figure 3 summarizes the capacity of the coupled VAEs to learn the latent features that represent their respective datasets. Specifically, the results in Fig. 3(a) refer to the VAE model fed with data points from the first dataset (corresponding to the smaller transmitter with wider bandwidth), referred to as β-VAE I, whereas the results in Fig. 3(b) are associated with the second model fed with data points from the second dataset, referred to as β-VAE II. The covariance matrix of the mean values $\boldsymbol{\mu}$ that parameterize the approximate posterior of the latent variables inferred from the encoder during training shows diminishing cross-correlations, which indicates that both VAEs have learned disentangled representations of the data generative factors.

Fig. 3.

Performance statistics, including the covariance matrix of the approximate posterior mean values and the RMSE, the correlation coefficient, and the error histogram between the actual and the inferred variables from (a) β-VAE I and (b) β-VAE II, corresponding to the independent and dependent models of the coupled architecture, respectively.


β-VAE I and II have learned to represent the data in their corresponding datasets with six and nine generative factors, respectively, as indicated by the number of non-zero diagonal elements, which relate to the variance of the corresponding inferred mean values $\mu_i$ about zero. The number of informative latents for VAE I is smaller than that for VAE II because the employed spatial grid is too coarse to resolve some parameters of the corresponding dataset, namely the spreads $s_x$ and $s_z$ and the rotation $\psi$. Hence, the common features learned correspond to 3D location, scale, and noise-floor. The rest of the learned features for VAE II capture the variation in spread and rotation but not very accurately due to the lack of supervision (see Ref. 10 for details). In Fig. 3, the predictive ability of the VAE models is quantified by the root mean square error (RMSE) between the latent variables encoding the 3D location of the Gaussian coherence and the corresponding generative factors, as well as by the square of the Pearson correlation coefficient, $\rho^2$, associated with each pair. The histograms show the statistics of the error between the actual generative factor and the corresponding inferred latent mean value, $\Delta x - \mu_x$, $\Delta y - \mu_y$, and $\Delta z - \mu_z$, after training and provide a statistical description of the inference accuracy. Note that the error variance is smaller for VAE II, even though it relates to the dataset corresponding to a transmitter with larger aperture and narrower bandwidth, as its approximate posterior is supervised by the approximate posterior of VAE I in the hierarchical formulation.
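
The two summary metrics can be computed as in the sketch below. The function names and the toy values are made up for illustration and are not results from this study.

```python
import math

def rmse(actual, inferred):
    """Root mean square error between generative factors and latent means."""
    n = len(actual)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, inferred)) / n)

def pearson_r2(actual, inferred):
    """Square of the Pearson correlation coefficient."""
    n = len(actual)
    ma, mb = sum(actual) / n, sum(inferred) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(actual, inferred))
    va = sum((a - ma) ** 2 for a in actual)
    vb = sum((b - mb) ** 2 for b in inferred)
    return cov ** 2 / (va * vb)

# Toy values in metres, for illustration only.
dx_true = [0.010, 0.020, 0.030, 0.040]
mu_x = [0.011, 0.019, 0.031, 0.039]
error_m = rmse(dx_true, mu_x)
r2 = pearson_r2(dx_true, mu_x)
```

The RMSE measures the absolute estimation error in metres, whereas $\rho^2$ measures how linearly the latent mean tracks the generative factor regardless of scale.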

Finally, Fig. 4 demonstrates the predictive ability of the trained coupled VAEs on a specific test case, which is the same as in Ref. 10 for comparison with the unsupervised model. A predefined translation track over 100 pings is superimposed with an interval of ±1 standard deviation inferred from each of the coupled VAEs. The absolute difference between the actual and the inferred tracks from coherence measurements is less than 2 mm for all translations for both models in the coupled architecture. Coupling the training of the VAEs through a common loss reduces the micronavigation estimation error by up to a factor of 10 compared to the unsupervised case.10

Fig. 4.

Ping-to-ping 3D translation trajectory (black solid line) of the platform carrying a SAS system along with the estimated values from an unsupervised (see Ref. 10) and a self-supervised coupled β-VAE (β = 25).


Coherent processing in SAS requires platform motion estimation and compensation with sub-wavelength accuracy. Micronavigation aims to infer the ping-to-ping platform displacement from the spatial coherence of diffuse backscatter on redundant recordings between pings. Variational inference offers a fully data-driven method for platform motion estimation from coherence measurements. In this study, we introduce a hierarchical variational model implemented with coupled VAEs to relate the common latent features between datasets of coherence measurements in multi-band MIMO SAS systems. Self-supervising the training process of independently parameterized but coupled VAEs significantly improves the accuracy of the micronavigation estimates.

This work was performed under Project No. SAC000E04 of the STO-CMRE Programme of Work, funded by the NATO Allied Command Transformation.

The authors have no conflicts to disclose.

The data that support the findings of this study are available within the article.

1. P. T. Gough and D. W. Hawkins, “Unified framework for modern synthetic aperture imaging algorithms,” Int. J. Imag. Syst. Technol. 8(4), 343–358 (1997).
2. R. E. Hansen, Sonar Systems (InTechOpen, Rijeka, Croatia, 2011), Chap. 1, pp. 3–28.
3. R. E. Hansen, H. J. Callow, T. O. Saebø, and S. Synnes, “Challenges in seafloor imaging and mapping with synthetic aperture sonar,” IEEE Trans. Geosci. Remote Sens. 49(10), 3677–3687 (2011).
4. Y. Doisy, “General motion estimation from correlation sonar,” IEEE J. Ocean. Eng. 23(2), 127–140 (1998).
5. A. Bellettini and M. A. Pinto, “Theoretical accuracy of synthetic aperture sonar micronavigation using a displaced phase-center antenna,” IEEE J. Ocean. Eng. 27(4), 780–789 (2002).
6. M. A. Pinto, F. Fohanno, O. Trémois, and S. Guyonic, “Autofocusing a synthetic aperture sonar using the temporal and spatial coherence of seafloor reverberation,” in Proceedings of the High Frequency Acoustics in Shallow Water, Lerici, Italy (NATO SACLANTCEN, La Spezia, Italy, 1997), pp. 417–424.
7. T. E. Blanford, D. C. Brown, and R. J. Meyer, “Measurements and models of the correlation of redundant spatial coherence measurements for the incoherently scattered field,” J. Acoust. Soc. Am. 146(6), 4224–4236 (2019).
8. D. C. Brown, I. D. Gerg, and T. E. Blanford, “Interpolation kernels for synthetic aperture sonar along-track motion estimation,” IEEE J. Ocean. Eng. 45(4), 1497–1505 (2020).
9. B. Thomas and A. Hunter, “Coherence-induced bias reduction in synthetic aperture sonar along-track micronavigation,” IEEE J. Ocean. Eng. 47, 162–178 (2021).
10. A. Xenaki, B. Gips, and Y. Pailhas, “Unsupervised learning of platform motion in synthetic aperture sonar,” J. Acoust. Soc. Am. 151(2), 1104–1114 (2022).
11. I. D. Gerg and V. Monga, “Real-time, deep synthetic aperture sonar (SAS) autofocus,” in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium (IEEE, New York, 2021), pp. 8684–8687.
12. S. M. Steele, R. Charron, J. Dillon, and D. Shea, “Performance prediction for a low frequency ultra-wideband synthetic aperture sonar,” in Oceans 2019 MTS/IEEE SEATTLE, Seattle, WA (IEEE, New York, 2019).
13. A. Xenaki, Y. Pailhas, and R. Sabatini, “Sparse MIMO synthetic aperture sonar processing with distributed optimization,” in 54th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA (IEEE, New York, 2020), pp. 82–87.
14. A. L. D. Beckers, R. van Vossen, and G. Vlaming, “Low-frequency synthetic aperture sonar for detecting explosives in harbors,” Sea Technol. 53(3), 15–18 (2012).
15. S. A. V. Synnes and R. E. Hansen, “Ultra wideband SAS imaging,” in Proceedings of the 1st International Conference and Exhibition on Underwater Acoustics (UA 2013), Corfu, Greece (2013), pp. 111–118.
16. R. Gala, A. Budzillo, F. Baftizadeh, J. Miller, N. Gouwens, A. Arkhipov, G. Murphy, B. Tasic, H. Zeng, M. Hawrylycz, and U. Sümbül, “Consistent cross-modal identification of cortical neurons with coupled autoencoders,” Nat. Comp. Sci. 1(2), 120–127 (2021).
17. S. Wang, Z. Ding, and Y. Fu, “Coupled marginalized auto-encoders for cross-domain multi-view learning,” in IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY (AAAI Press, 2016), pp. 2125–2131.
18. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts and Company, Englewood, CO, 2005), Chap. 4.
19. R. Mallart and M. Fink, “The van Cittert–Zernike theorem in pulse echo measurements,” J. Acoust. Soc. Am. 90(5), 2718–2727 (1991).
20. F. Novella, Y. Pailhas, G. L. Chenadec, I. Quidu, and M. Legris, “Low frequency SAS: Influence of multipaths on spatial coherence,” Proc. Mtgs. Acoust. 44, 070032 (2021).
21. D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv:1312.6114 (2014).
22. K. P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, MA, 2012), Chap. 21.
23. D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference: A review for statisticians,” J. Am. Stat. Assoc. 112(518), 859–877 (2017).
24. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “β-VAE: Learning basic visual concepts with a constrained variational framework,” in 5th International Conference on Learning Representations, Toulon, France (2017), pp. 1–22.
25. C. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, “Understanding disentangling in β-VAE,” arXiv:1804.03599 (2018).
26. D. Kingma and M. Welling, “An introduction to variational autoencoders,” arXiv:1906.02691 (2019).