We present a new data-driven model to reconstruct nonlinear flow from spatially sparse observations. The proposed model is a version of a Conditional Variational Auto-Encoder (CVAE), which allows for probabilistic reconstruction and thus uncertainty quantification of the prediction. We show that in our model, conditioning on measurements from the complete flow data leads to a CVAE where only the decoder depends on the measurements. For this reason, we call the model semi-conditional variational autoencoder. The method, reconstructions, and associated uncertainty estimates are illustrated on the velocity data from simulations of 2D flow around a cylinder and bottom currents from a simulation of the southern North Sea by the Bergen Ocean Model. The reconstruction errors are compared to those of the Gappy proper orthogonal decomposition method.

Reconstruction of non-linear dynamic processes from sparse observations traditionally requires knowledge of the processes and the governing equations to be able to generalize to a wider area around, in-between, and beyond the measurements. Alternatively, it is possible to learn the underlying processes or equations based on the data itself, the so-called data-driven methods. In geophysics and environmental monitoring, measurements are often only available at sparse locations. For instance, within the field of meteorology, atmospheric pressures, temperatures, and wind are only measured at a limited number of stations. Producing accurate and general weather predictions requires methods that both forecast the future and reconstruct the state where no data are available. Within oceanography, one faces the same problem that in situ information about the ocean dynamics is only available at sparse locations such as buoys or sub-sea sensors.

Both the weather and ocean currents can be approximated with models that are governed by physical laws, e.g., the Navier–Stokes equation. However, it is of crucial importance to incorporate observations in order to obtain accurate and reliable reconstructions and forecasts.

Reconstruction and inference based on sparse observations are important in numerous applications.1–6 Bolton and Zanna3 used Convolutional Neural Networks (CNNs) to hindcast ocean models, and Yeo7 reconstructed time series from nonlinear dynamics based on sparse observations. Oikonomo et al.8 proposed a method for filling data gaps in groundwater level observations, and Kong et al.2 used reconstruction techniques to model the characteristics of cartridge valves.

The above mentioned applications and methods are just some of the many examples of reconstruction of a dynamic process based on limited information. Here, we focus on the reconstruction of flow. This problem can be formulated as follows: Let $w \in \mathbb{R}^d$, $d \in \mathbb{N}$, represent a state vector of the flow containing, for example, velocity, pressure, and temperature. Here, we will focus on incompressible unsteady fluid flows in 2D, so $w = (u, v) \in \mathbb{R}^2$, where u and v are the two velocity components. The velocity vector, w, is typically obtained from computational fluid dynamics simulations on a meshed spatial domain, P, at discrete times, $T = \{t_1, \ldots, t_K\}$.

Let $P = \{p_1, \ldots, p_N\}$ consist of N grid points $p_n$, n = 1, …, N. Then, the state of the flow, w, evaluated on P at a time $t_i \in T$, can be represented as a vector $x^{(i)} \in \mathbb{R}^{2N}$,

$$x^{(i)} = \big(u(p_1, t_i), \ldots, u(p_N, t_i), v(p_1, t_i), \ldots, v(p_N, t_i)\big)^T.$$
(1)

The collection of $x^{(i)}$, i = 1, …, K, constitutes the dataset X. In order to account for incompressibility, we introduce a discrete divergence operator $L_{\mathrm{div}}$, which is given by an N × 2N matrix associated with a finite difference scheme, and

$$(L_{\mathrm{div}}\, x)_k \approx (\nabla \cdot w)(p_k) = 0.$$
(2)
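For illustration, a field-based version of this incompressibility check can be sketched with central differences; the helper below is a sketch with placeholder grid spacings and only mimics the sparse finite-difference matrix $L_{\mathrm{div}}$ used in the text.

```python
import numpy as np

# Minimal sketch of the incompressibility check in Eq. (2), using central
# differences (np.gradient) on a uniform grid. Grid spacings dx, dy are
# placeholders; the paper's L_div is an N x 2N finite-difference matrix,
# which this field-based version only mimics.
def divergence(u, v, dx=1.0, dy=1.0):
    """u, v: 2D arrays of shape (nx, ny) with the velocity components."""
    du_dx = np.gradient(u, dx, axis=0)
    dv_dy = np.gradient(v, dy, axis=1)
    return du_dx + dv_dy  # approximately zero for incompressible flow
```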

Furthermore, we assume that the state is measured only at specific points in P, that is, at $Q = \{q_1, \ldots, q_M\} \subset P$ with typically $M \ll N$. Hence, there is $M = \{m^{(i)} \in \mathbb{R}^{2M} : m^{(i)} = C x^{(i)},\; x^{(i)} \in X\}$, where $C \in \mathbb{R}^{2M \times 2N}$ represents a sampling matrix. More specifically, C is a two block matrix,

$$C = \begin{pmatrix} C_{1/2} & O \\ O & C_{1/2} \end{pmatrix}, \qquad (C_{1/2})_{ij} = \begin{cases} 1, & \text{if } q_i = p_j \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1, \ldots, M,\; j = 1, \ldots, N,$$

and $O \in \mathbb{R}^{M \times N}$ is the zero matrix. The problem of reconstructing fluid flow $x^{(i)} \in X$ from $m^{(i)} \in M$ is presented as a schematic plot in Fig. 1.
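To make the notation concrete, a minimal sketch of the sampling operator is given below; the grid size and sensor indices are hypothetical, not those used in the experiments.

```python
import numpy as np

# Sketch of the measurement operator: C is the 2M x 2N block matrix above,
# acting on the stacked state x = (u_1, ..., u_N, v_1, ..., v_N). The grid
# size and the measurement indices below are hypothetical.
N = 8000                                 # number of grid points
obs_idx = np.array([10, 2500, 6400])     # hypothetical sensor locations, M = 3
M = obs_idx.size

C_half = np.zeros((M, N))
C_half[np.arange(M), obs_idx] = 1.0      # (C_1/2)_ij = 1 if q_i = p_j

C = np.block([[C_half, np.zeros((M, N))],
              [np.zeros((M, N)), C_half]])

x = np.random.randn(2 * N)               # stand-in for a snapshot x^(i)
m = C @ x                                # sparse measurements m^(i) = C x^(i)
```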

FIG. 1.

Sketch of reconstruction of x(i) from m(i). The dots on the right-hand side represent the grid P, and those on the left-hand side represent the measurement locations Q.


There has been a wide range of methods for solving this problem, e.g., Refs. 9–14. In particular, the use of proper orthogonal decomposition (POD)9 techniques has been popular. POD is a traditional dimension reduction technique where, based on a dataset, a number of basis functions are constructed. The key idea is that linear combinations of these basis functions can reconstruct the original data within some error margin, effectively reducing the dimension of the problem. In a modified version of the POD, the Gappy POD (GPOD),10 the aim is to fill the gaps in-between sparse measurements. Given a POD basis, one can minimize the L2-error with respect to the measurements and find a linear combination of the POD basis that complements the measurements. If the basis is not known, an iterative scheme can be formulated to optimize the basis based on the measurements. The original application of GPOD10 was related to the reconstruction of human faces, and it has later been applied to fluid flow reconstruction.4,15–17 This motivates us to use this method for comparison in our study.
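For reference, a minimal sketch of the GPOD reconstruction described above (a fixed POD basis computed from training snapshots, followed by a least-squares fit of the modal coefficients to the measurements) could look as follows; the variable names are placeholders and the iterative basis update of Ref. 10 is omitted.

```python
import numpy as np

# Minimal Gappy POD sketch: POD basis from training snapshots, then a
# least-squares fit of r modal coefficients to the sparse measurements.
def gpod_reconstruct(X_train, C, m, r):
    """X_train: (2N, K) snapshot matrix; C: (2M, 2N) sampling matrix;
    m: (2M,) measurements of an unseen snapshot; r: number of POD modes."""
    x_mean = X_train.mean(axis=1, keepdims=True)              # (2N, 1)
    U, _, _ = np.linalg.svd(X_train - x_mean, full_matrices=False)
    Phi = U[:, :r]                                            # POD basis
    A = C @ Phi                                               # basis at sensor locations
    a, *_ = np.linalg.lstsq(A, m - (C @ x_mean).ravel(), rcond=None)
    return x_mean.ravel() + Phi @ a                           # reconstructed state
```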

A similar approach is the Compressed Sensing (CS) technique.11 As for the GPOD method, the aim is to solve a linear system; in the CS case, it will be an under-determined linear system. Hence, there is a need for additional information about the system to be able to solve it; typically, this can be a condition/constraint related to the smoothness of the solution. The core difference between CS and GPOD is, however, the sparsity constraint. That is, instead of minimizing the L2-norm, we minimize the L1-norm. Minimizing the L1-norm favors sparse solutions, i.e., solutions with a small number of nonzero coefficients.

Another reconstruction approach is Dynamical Mode Decomposition (DMD).12 Instead of using principal components in the spatial domain, DMD seeks to find modes or representations that are associated with a specific frequency in the data, i.e., modes in the temporal domain. Again, the goal is to find a solution to an under-determined linear system and reconstruct based on the measurements by minimizing the deviance from the observed values.

During the past decade, data-driven methods have become popular, partly because of the growth and availability of data, but also driven by new technology and improved hardware. Modeling non-linear relationships with linear approximations is one of the fundamental limitations of the DMD, CS, and GPOD methods. There have been efforts that involve neural networks and other machine learning methods to solve the flow reconstruction and decomposition problem. For example, Pawar and San18 used data assimilation to correct a physics-based model coupled with a neural network as a surrogate for resolved flow dynamics in multiscale systems. Other examples include the work of Erichson et al.19 that used shallow neural networks to learn the input-to-output mapping between sensor measurements and flow fields for a deterministic reconstruction of the data and that of Pérez et al.20 that reconstructed three-dimensional flow fields from two-dimensional data through higher order dynamic mode decomposition,21 while Gao et al.22 used an attentional generative adversarial neural network to reconstruct and interpolate air pollution data from Beijing. Probabilistic reconstruction of flow fields is not that widespread; however, Maulik et al.23 recently developed a probabilistic neural network for fluid flow surrogate modeling and data recovery through Mixture Density Networks (MDNs). MDNs can account for complex multi-modal conditional distributions but are, in general, difficult to optimize. In addition, MDNs have a tendency to overfit the training data and are prone to mode collapse.24 These are examples of recent initiatives that seek to either reconstruct or decompose flow fields based on the input from limited information with data-driven methods.

Another promising and interesting method, where the neural networks are informed with a physical law, the so-called Physics-Informed Neural Networks (PINNs),14 has been developed during the past few years. In PINNs, the reconstruction is typically informed by a Partial Differential Equation (PDE) (e.g., the Navier–Stokes equations for fluid flow), and thus, the neural network can learn to fill the gap between measurements in compliance with this equation. This is what Raissi et al.25 have shown for benchmark examples such as flow around a 2D and 3D cylinder.

Although PINNs are showing encouraging results, we have yet to see applications to complex systems such as atmospheric or oceanographic systems, where other aspects have to be accounted for in large scale oceanic circulation models that are driven by forcing, such as tides, bathymetry, and river-influx. That being said, these problems may be resolved through PINNs in the future.

Popular non-linear data-driven approaches for reconstruction of fluid flow are different variations of autoencoders.13,26,27 An autoencoder28 is a special configuration of an artificial neural network that first encodes the data by gradually decreasing the size of the hidden layers. Through this process, the data are represented in a lower dimensional space. A second neural network then takes the output of the encoder as the input and decodes the representation back to its original shape. These two neural networks together constitute an autoencoder.

Principal Component Analysis (PCA)29 also represents the data in a different and more compact space. However, PCA reduces the dimension of the data by finding orthogonal basis functions or principal components through singular value decomposition. In fact, it has been shown that, with a linear activation function, PCA and autoencoders produce the same basis functions.30 The probabilistic version of the autoencoder is called Variational Auto-Encoder (VAE).31 Conditional Variational Auto-Encoders (CVAEs)32 are conditional probabilistic autoencoders, that is, the model is dependent on some additional information such that it is possible to create representations that are dependent on this information.

Here, we address the mentioned problem from a probabilistic point of view. Let $x: P \to \mathbb{R}^{2N}$ and $m: Q \to \mathbb{R}^{2M}$ be two multivariate random variables associated with the flow on P and Q, respectively. Then, the datasets X and M consist of the realizations of x and m, respectively. Using X and M, we intend to approximate the probability distribution p(x|m). This would allow us not only to predict $x^{(i)}$ given $m^{(i)}$ but also to estimate an associated uncertainty. We use a variational autoencoder to approximate p(x|m). The method we use is a Bayesian Neural Network (BNN)33 approximated through variational inference,34,35 which we have called the Semi-Conditional Variational Auto-Encoder (SCVAE). A detailed description of the SCVAE method for reconstruction and associated uncertainty quantification is given in Sec. III B.

Here, we focus on fluid flow, being the main driving mechanism behind transport and dilution of tracers in marine waters. The world’s oceans are under tremendous stress,36 the UN has declared 2021–2030 as the ocean decade,80 and an ecosystem-based Marine Spatial Planning initiative has been launched by the Intergovernmental Oceanographic Commission of UNESCO (IOC).37

Local and regional current conditions determine transport of tracers in the ocean.38,39 Examples are accidental release of radioactive, biological, or chemical substances from industrial complexes, e.g., organic waste from fish farms in Norwegian fjords,40 plastic,41 or other contaminants that might have adverse effects on marine ecosystems.42 

The ability to predict the environmental impact of a release, i.e., concentrations as a function of distance and direction from the source, requires reliable current conditions.43,44 Subsequently, these transport predictions support design of marine environmental monitoring programs.45–48 The aim here is to model current conditions in a probabilistic manner using SCVAEs. This allows for predicting footprints in a Monte Carlo framework, providing simulated data for training networks used for analyzing environmental time series.49 

We will compare results with the GPOD method.50 We are aware that there are recent methods (e.g., PINNs and traditional autoencoders) that may perform better on the specific datasets than the GPOD; however, the simplicity, versatility, and, not least, popularity of GPOD5,51,52 make it a great method for comparison.

The remainder of this manuscript is outlined as follows: Sec. II presents a motivating example for the SCVAE method in comparison with the GPOD method. In Sec. III, we review both the VAE and CVAE methods and present the SCVAE. Results from experiments on two different datasets are presented in Sec. IV. Section V summarizes and discusses the method, experiments, drawbacks, benefits, potential extensions, and further work.

Here, we illustrate the performance of the proposed method vs the GPOD method in order to give a motivation for this study. We use data from simulations of a two dimensional viscous flow around a cylinder at the Reynolds number of 160 obtained from Ref. 53. The simulations were performed by Weinkauf and Theisel54 with the Gerris Flow Solver software.55 The dataset consists of two velocity components, u and v, on a uniform 400 × 50 × 1001 grid of the [−0.5, 7.5] × [−0.5, 0.5] × [15, 23] spatial–temporal domain. The simulations are run from t = 0 to t = 23, but velocities are only extracted from t = 15 to t = 23. In particular, we have 400 points in the horizontal direction, 50 points in the vertical direction, and 1001 points in time. The cylinder has a diameter of 0.125 and is centered at the origin (see Fig. 2).

FIG. 2.

A spatial snapshot at the time t ≈ 19 (a time step of 500) of the flow field from the original 2D flow around a cylinder dataset53 with u and v components presented in the upper and lower panels, respectively.


The left vertical boundary (inlet) has the Dirichlet boundary condition u = 1 and v = 0. A homogeneous Neumann boundary condition is imposed at the right boundary (outlet), and homogeneous Dirichlet conditions are imposed on the remaining boundaries. At the start of the simulation, t = 0, both velocities were equal to zero. We plot a snapshot of the velocities in Fig. 2.

In the experiment below, we extract the data downstream from the cylinder, that is, from grid points 40–200 in the horizontal direction, and keep all grid points in the vertical direction. Hence, P contains N = 8000 points, 160 points in the horizontal direction, and 50 in the vertical direction. The temporal resolution is kept as before, that is, the number of time steps in T is K = 1001.

For validation purposes, the dataset was split into a train, validation, and test dataset. For both the SCVAE and the GPOD, the goal was to minimize the L2 error between the true and the modeled flow state. The restriction of the GPOD is that the number of components r can be at most 2M. To deal with this problem and to account for the flow incompressibility, we added the regularization term $\lambda \| L_{\mathrm{div}}\, x^{(i)} \|$, λ > 0, to the objective function (see  Appendix A). For the GPOD method, the parameters r and λ are optimized on the validation dataset in order to have the smallest mean error. We give more details about objective functions for the SCVAE in Sec. III B. For now, we mention that there are two versions, where one version uses an additional divergence regularization term similar to GPOD.

In Fig. 3, we plot the mean of the relative L2 error calculated on the test data for both methods with and without the div-regularization (incompressible fluid). The results are presented for three, four, and five measurement locations, that is, M = 3, 4, 5. For each of these three cases, we selected 20 different random configurations of M. In particular, we created 20 subgrids Q, each containing five randomly sampled spatial grid points. Next, we removed one and then two points from each of the 20 subgrids Q to create new subgrids of 4 and 3 measurements, respectively. The figure is a box-and-whisker plot of these 20 configurations, where outliers are defined as data points that are outside 1.5 times the interquartile range from the upper and lower quantiles.

FIG. 3.

The mean relative error for the two reconstruction methods. The orange and blue labels correspond to the SCVAE with (div-on) and without (div-off) additional divergence regularization. The green and red labels correspond to the GPOD method. The figure is a box-and-whisker plot of the 20 configurations, where outliers are defined as data points that are outside 1.5 times the interquartile range from the upper and lower quantiles (25/75).


As shown in Fig. 3, both methods perform well with 5 measurements. The resulting relative errors have comparable mean and variance. When reducing the number of observations, the SCVAE method maintains low errors, while the GPOD error increases. The SCVAE seems to benefit from the additional regularization of minimizing the divergence in terms of lower error and less variation in the error estimates. The effect is more pronounced with fewer measurements.

The key benefit of the SCVAE is that its predictions are optimal for the given measurement locations. In contrast, the POD based approaches and particularly the GPOD create a set of basis functions (principal components) based on the training data independently of the measurements. While this has an obvious computational advantage, the number of principal components for complex flows can be high and, as a result, many more measurements are needed.6,50,56 There are a number of algorithms that aim to optimize the measurement locations to achieve the best performance of the POD based methods (see Refs. 17, 50, and 51). In practice, however, the locations are often fixed and other approaches are needed. The results in Fig. 3 suggest that the SCVAE could be one of these approaches.

Before we introduce the model used for reconstruction of flows, we give a brief introduction to VAEs and CVAEs. For a detailed introduction, see Ref. 57. VAEs are neural network models that have been used for learning structured representations in a wide variety of applications, e.g., image generation,58 interpolation between sentences,59 and compressed sensing.26

Let us assume that the dataset X is generated by a random process that involves an unobserved continuous random variable z. The process consists of two steps: (i) a value $z^{(i)}$ is sampled from a prior $p_{\theta^*}(z)$ and (ii) $x^{(i)}$ is generated from a conditional distribution, $p_{\theta^*}(x|z)$. In the case of flow reconstruction, z could be unknown boundary or initial conditions, tidal and wind forcing, etc. However, generally, z is just a convenient construct to represent X, rather than a physically explained phenomenon. Hence, it is, for convenience, assumed that $p_{\theta^*}(z)$ and $p_{\theta^*}(x|z)$ come from parametric families of distributions $p_\theta(z)$ and $p_\theta(x|z)$, and their density functions are differentiable almost everywhere with respect to both z and θ. A probabilistic autoencoder is a neural network that is trained to represent its input X as $p_\theta(x)$ via a latent representation $z \sim p_\theta(z)$, that is,

$$p_\theta(x) = \int p_\theta(x, z)\, dz = \int p_\theta(x|z)\, p_\theta(z)\, dz.$$
(3)

As $p_\theta(z)$ is unknown and observations $z^{(i)}$ are not accessible, we must use X in order to generate $z \sim p_\theta(z|x)$. That is, the network can be viewed as consisting of two parts: an encoder $p_\theta(z|x)$ and a decoder $p_\theta(x|z)$. Typically, the true posterior distribution $p_\theta(z|x)$ is intractable but could be approximated with variational inference.34,35 That is, we define the so-called recognition model $q_\phi(z|x)$ with variational parameters ϕ, which aims to approximate $p_\theta(z|x)$. The recognition model is often parameterized as a Gaussian. Thus, the problem of estimating $p_\theta(z|x)$ is reduced to finding the best possible estimate for ϕ, effectively turning the problem into an optimization problem.

An autoencoder that uses a recognition model is called a Variational Auto-Encoder (VAE). In order to get good predictions, we need to estimate the parameters ϕ and θ. The marginal likelihood is equal to the sum over the marginal likelihoods of the individual samples, that is, $\sum_{i=1}^{K} \log p_\theta(x^{(i)})$. Therefore, we further on present estimates for an individual sample. The Kullback–Leibler divergence between two probability distributions $q_\phi(z|x^{(i)})$ and $p_\theta(z|x^{(i)})$, defined as

$$D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|x^{(i)})\big) = \int q_\phi(z|x^{(i)}) \log \frac{q_\phi(z|x^{(i)})}{p_\theta(z|x^{(i)})}\, dz,$$

can be interpreted as a measure of distinctiveness between these two distributions.60 It can be shown (see Ref. 57) that

$$\log p_\theta(x^{(i)}) = D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|x^{(i)})\big) + \mathcal{L}(\theta, \phi; x^{(i)}),$$
(4)

where

$$\mathcal{L}(\theta, \phi; x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\big[-\log q_\phi(z|x^{(i)}) + \log p_\theta(x^{(i)}, z)\big].$$

Since the KL-divergence is non-negative, we have $\log p_\theta(x^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)})$, and $\mathcal{L}(\theta, \phi; x^{(i)})$ is called the Evidence Lower Bound (ELBO) for the marginal likelihood $\log p_\theta(x^{(i)})$. Thus, instead of maximizing the marginal probability, one can maximize its variational lower bound, to which we also refer as an objective function. It can be further shown that the ELBO can be written as

$$\mathcal{L}(\theta, \phi; x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z)\big] - D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\big).$$
(5)

Reformulating the traditional VAE framework as a constrained optimization problem, it is possible to obtain the β-VAE61 objective function if $p_\theta(z) = \mathcal{N}(0, I)$,

$$\mathcal{L}(\theta, \phi; x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z)\big] - \beta\, D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\big),$$
(6)

where β > 0. Here, β is a regularization coefficient that constrains the capacity of the latent representation z. The term $\mathbb{E}_{q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]$ can be interpreted as the reconstruction term, while the KL-term $\beta\, D_{KL}[q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)]$ can be interpreted as a regularization term.
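As a concrete illustration, a minimal sketch of the per-sample β-VAE objective in Eq. (6) is shown below, assuming a Gaussian recognition model and a standard normal prior so that the KL-term has a closed form; the squared-error reconstruction term and all variable names are assumptions, not the exact implementation used in this work.

```python
import tensorflow as tf

# Sketch of the (negative) per-sample beta-VAE objective in Eq. (6), with
# q_phi(z|x) = N(mu, diag(exp(log_var))) and p(z) = N(0, I). x, x_hat, mu,
# and log_var are the input, decoder output, and encoder outputs.
def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Gaussian log-likelihood up to constants: a squared reconstruction error.
    recon = tf.reduce_sum(tf.square(x - x_hat), axis=-1)
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
    return tf.reduce_mean(recon + beta * kl)  # minimizing this maximizes the ELBO
```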

Conditional Variational Auto-Encoders32 (CVAEs) are similar to VAEs but differ by conditioning on an additional property of the data (e.g., a label or class), here denoted c. Conditioning both the recognition model and the true posterior on both $x^{(i)}$ and c results in the CVAE ELBO,

$$\mathcal{L}(\theta, \phi; x^{(i)}, c) = \mathbb{E}_{q_\phi(z|x^{(i)}, c)}\big[\log p_\theta(x^{(i)}|z, c)\big] - D_{KL}\big(q_\phi(z|x^{(i)}, c) \,\|\, p_\theta(z|c)\big).$$
(7)

In the decoding phase, CVAE allows for conditional probabilistic reconstruction and permits sampling from the conditional distribution pθ(z|c), which has been useful for generative modeling of data with known labels (see Ref. 32). Here, we investigate a special case of the CVAE when c is a partial observation of x. We call this the Semi-Conditional Variational Auto-Encoder (SCVAE).

The SCVAE takes the input data X, conditioned on M, and approximates the probability distribution pθ(x|z, m). Then, we can generate x(i) based on the observations m(i) and latent representation z. As m(i) = Cx(i), where C is a non-stochastic sampling matrix, we have

$$p_\theta(z|x^{(i)}, m^{(i)}) = p_\theta(z|x^{(i)}), \quad \text{and} \quad q_\phi(z|x^{(i)}, m^{(i)}) = q_\phi(z|x^{(i)}).$$

Therefore, from Eq. (7), the ELBO for SCVAE is

$$\log p_\theta(x^{(i)}|m^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)}, m^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z, m^{(i)})\big] - D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)})\big),$$
(8)

where $p_\theta(z|m^{(i)}) = \mathcal{N}(0, I)$. Similarly as for the β-VAE,61 we can obtain a relaxed version of Eq. (8) by maximizing the expected log-likelihood $\mathbb{E}_{q_\phi(\cdot)}[\log p_\theta(x^{(i)}|m^{(i)}, z)]$ with respect to the parameters {ϕ, θ} and treating this as a constrained optimization problem,

$$\max_{\phi, \theta} \mathbb{E}_{q_\phi(\cdot)}\big[\log p_\theta(x^{(i)}|m^{(i)}, z)\big] \quad \text{subject to} \quad D_{KL}\big(q_\phi(z|m^{(i)}, x^{(i)}) \,\|\, p_\theta(z|m^{(i)})\big) \leq \epsilon,$$
(9)

where ϵ > 0 is small. The subscript qϕ(·) is short for qϕ(z|m(i), x(i)). Since m(i) is dependent on x(i), we have qϕ(z|m(i), x(i)) = qϕ(z|x(i)). Equation (9) can be expressed as a Lagrangian under the Karush–Kuhn–Tucker (KKT) conditions.62,63 Hence,

$$\mathcal{F}(\theta, \phi, \beta; x^{(i)}, m^{(i)}) = \mathbb{E}_{q_\phi(\cdot)}\big[\log p_\theta(x^{(i)}|m^{(i)}, z)\big] - \beta\Big(D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)})\big) - \epsilon\Big).$$
(10)

According to the complementary slackness KKT condition β ≥ 0, we can rewrite Eq. (10) as

$$\mathcal{F}(\theta, \phi, \beta; x^{(i)}, m^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)}, m^{(i)}) = \mathbb{E}_{q_\phi(\cdot)}\big[\log p_\theta(x^{(i)}|m^{(i)}, z)\big] - \beta\, D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)})\big).$$
(11)

The objective functions in Eqs. (8) and (11), and later in Eq. (13), show that when conditioning on a feature that is a known function of the original data, such as the measurements, we do not need to account for it in the encoding phase. The measurements are then coupled with the encoded data in the decoder. We sketch the main components of the SCVAE in Fig. 4.

FIG. 4.

The figure shows a sketch of the model used to estimate pθ(x|m(i)).


In order to preserve some physical properties of the data X, we can condition on yet another feature. Here, we utilize the incompressibility property of the fluid, i.e., $d^{(i)} = L_{\mathrm{div}}\, x^{(i)} \approx 0$ [see Eq. (2)]. We intend to maximize a log-likelihood under an additional constraint on $d^{(i)}$ compared to Eq. (9). That is,

$$\max_{\phi, \theta} \mathbb{E}_{q_\phi}\big[\log p_\theta(x^{(i)}|m^{(i)}, z)\big] \quad \text{subject to} \quad D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)}, d^{(i)})\big) \leq \epsilon \quad \text{and} \quad \mathbb{E}_{q_\phi}\big[\log p_\theta(d^{(i)}|m^{(i)}, z)\big] \geq \delta,$$
(12)

where ϵ, δ > 0 are small. Equation (12) can be expressed as a Lagrangian under the Karush–Kuhn–Tucker (KKT) conditions as before, and as a consequence of the complementary slackness conditions λ, β ≥ 0, we obtain the objective function

$$\mathcal{F}(\theta, \phi, \beta, \lambda; x^{(i)}, m^{(i)}, d^{(i)}) \geq \mathcal{L}(\theta, \phi; x^{(i)}, m^{(i)}, d^{(i)}) = \mathbb{E}_{q_\phi}\big[\log p_\theta(x^{(i)}|m^{(i)}, z)\big] + \lambda\, \mathbb{E}_{q_\phi}\big[\log p_\theta(d^{(i)}|m^{(i)}, z)\big] - \beta\, D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)}, d^{(i)})\big),$$
(13)

where $p(z|m^{(i)}, d^{(i)}) = \mathcal{N}(0, I)$. For convenience of notation, we refer to the objective function [Eq. (11)] as the case with λ = 0 and the objective function [Eq. (13)] as the case with λ > 0. Observe that under the Gaussian assumptions on the priors, Eq. (13) is equivalent to Eq. (11) if λ = 0. Thus, from now on, we will refer to it as a special case of Eq. (13) and denote it as $\mathcal{L}_0$. Optimizing Eq. (13) can be interpreted as multitask learning,64 i.e., we optimize the network to solve more than one task at a time. Multitask learning reduces the risk of overfitting65 and produces more consistent results.64

Similar to Ref. 31, we obtain $q_\phi(z|x^{(i)}) = \mathcal{N}\big(\mu^{(i)}, \mathrm{diag}(\sigma^{(i)})^2\big)$, that is, ϕ = {μ(i), σ(i)}. This allows us to express the KL-divergence terms in a closed form and avoid issues related to differentiability of the ELBOs. Under these assumptions, the KL-divergence terms can be integrated analytically, while the terms $\mathbb{E}_{q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z, m^{(i)})]$ and $\mathbb{E}_{q_\phi(z|x^{(i)})}[\log p_\theta(d^{(i)}|z, m^{(i)})]$ require estimation by sampling,

$$\mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z, m^{(i)})\big] \approx \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(x^{(i)}|z^{(i,l)}, m^{(i)}\big), \qquad \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(d^{(i)}|z, m^{(i)})\big] \approx \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(d^{(i)}|z^{(i,l)}, m^{(i)}\big),$$

where

$$z^{(i,l)} = g_\phi\big(\epsilon^{(i,l)}, x^{(i)}\big), \qquad \epsilon^{(l)} \sim p(\epsilon).$$
(14)

Here, $\epsilon^{(l)}$ is an auxiliary (noise) variable with independent marginal p(ϵ), and $g_\phi(\cdot)$ is a differentiable transformation of ϵ, parametrized by ϕ (see Ref. 31 for details). We denote $\mathcal{L}_\lambda$, λ ≥ 0 [Eq. (13)], with the approximation above as $\hat{\mathcal{L}}_\lambda$, that is,

$$\hat{\mathcal{L}}_\lambda(\theta, \phi; x^{(i)}, m^{(i)}, d^{(i)}) = \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(x^{(i)}|z^{(i,l)}, m^{(i)}\big) + \lambda\, \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(d^{(i)}|z^{(i,l)}, m^{(i)}\big) - \beta\, D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|m^{(i)}, d^{(i)})\big).$$
(15)

The objective function $\hat{\mathcal{L}}_\lambda$ can be maximized by gradient descent. Since the gradient $\nabla_{\theta,\phi} \hat{\mathcal{L}}_\lambda$ cannot be calculated for large datasets, stochastic gradient descent methods (see Refs. 66 and 67) are typically used, where

$$\hat{\mathcal{L}}_\lambda(\theta, \phi; X, M, D) \approx \hat{\mathcal{L}}_R(\theta, \phi; X_R, M_R, D_R) = \frac{K}{R} \sum_{r=1}^{R} \hat{\mathcal{L}}_\lambda\big(\theta, \phi; x^{(i_r)}, m^{(i_r)}, d^{(i_r)}\big), \qquad \lambda \geq 0.$$
(16)

Here, $X_R = \{x^{(i_r)}\}_{r=1}^{R}$, R < K, is a mini-batch consisting of randomly sampled datapoints, $M_R = \{m^{(i_r)}\}_{r=1}^{R}$, and $D_R = \{d^{(i_r)}\}_{r=1}^{R}$. After the network is optimized, a posterior predictive distribution $p_\theta(x|m, d)$ can be approximated with a Monte Carlo estimator.
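A hedged sketch of such a mini-batch objective is given below, with L = 1 Monte Carlo sample, Gaussian likelihoods (so the log-likelihood terms reduce to squared errors), and the divergence likelihood taken as a squared penalty on $L_{\mathrm{div}}\hat{x}$; the encoder, decoder, and matrices are placeholders, snapshots are treated as flattened vectors, and this is one plausible realization rather than the exact implementation used here.

```python
import tensorflow as tf

# Sketch of a mini-batch loss corresponding to Eqs. (15) and (16): negating the
# ELBO terms gives a loss that can be minimized with Adam. `encoder` and
# `decoder` are placeholder models; C is the (2M, 2N) sampling matrix and
# L_div the (N, 2N) divergence matrix. x has shape (batch, 2N).
def scvae_batch_loss(x, encoder, decoder, C, L_div, beta=1.0, lam=0.0):
    m = tf.matmul(x, C, transpose_b=True)                 # measurements m = C x
    z_mean, z_log_var, z = encoder(x)
    x_hat = decoder([z, m])
    recon = tf.reduce_sum(tf.square(x - x_hat), axis=-1)  # reconstruction term
    div = tf.reduce_sum(                                   # divergence penalty
        tf.square(tf.matmul(x_hat, L_div, transpose_b=True)), axis=-1)
    kl = -0.5 * tf.reduce_sum(                             # closed-form KL term
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(recon + lam * div + beta * kl)   # negative ELBO
```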

1. Uncertainty quantification

Let $\hat{\theta}$ and $\hat{\phi}$ be estimates of the generative and variational parameters, as described in Sec. III B. Then, the decoder can be used to predict the posterior as

$$p_{\hat{\theta}}(x|m^*, d^*) \approx \frac{1}{N_{MC}} \sum_{j=1}^{N_{MC}} p_{\hat{\theta}}\big(x|z^{(j)}, m^*, d^*\big) \;\xrightarrow{N_{MC} \to \infty}\; \int p_{\hat{\theta}}(x|z, m^*, d^*)\, p_{\hat{\theta}}(z|m^*, d^*)\, dz.$$
(17)

While sampling from the latent space has typically been viewed as an approach for generating new samples with similar properties, here, we use it to estimate the prediction uncertainty of the trained model. From Eq. (17), we are able to estimate the mean prediction $\hat{x}^*$ and empirical covariance matrix $\hat{\Sigma}$ using a Monte Carlo estimator. We get

$$\hat{x}^* = \frac{1}{N_{MC}} \sum_{j=1}^{N_{MC}} x^{(j)} \qquad \text{and} \qquad \hat{\Sigma} = \frac{1}{N_{MC} - 1} \sum_{j=1}^{N_{MC}} \big(x^{(j)} - \hat{x}^*\big)\big(x^{(j)} - \hat{x}^*\big)^T,$$
(18)

where $x^{(j)} \sim p_{\hat{\theta}}(x|m^*, d^*)$. The empirical standard deviation is then $\hat{\sigma} = \sqrt{\mathrm{diag}(\hat{\Sigma})}$. To estimate the confidence region, we assume that the predicted $p_{\hat{\theta}}(x|m^*, d^*)$ is well approximated by a normal distribution $\mathcal{N}(\mu, \Sigma)$. Given that $\hat{x}^*$ and $\hat{\Sigma}$ are approximations of μ and Σ, obtained from $N_{MC}$ samples as above, a confidence region estimate for a prediction $x^*$ can be given as

$$\Big\{ x^{(i)} \in \mathbb{R}^{2N} : \big(x^{(i)} - \hat{x}^*\big)^T \hat{\Sigma}^{+} \big(x^{(i)} - \hat{x}^*\big) \leq \chi_k^2(p) \Big\},$$
(19)

where $\chi_k^2(p)$ is the quantile function for probability p of the chi-squared distribution with $k = \min\{N_{MC}, 2N\}$ degrees of freedom and $\hat{\Sigma}^{+}$ is the pseudoinverse of $\hat{\Sigma}$. Using the singular value decomposition, $\hat{\Sigma} = U S U^T$, the corresponding interval for $(x^*)_n$, n = 1, …, 2N, is

$$\Big[\, (\hat{x}^*)_n - \sqrt{\chi_k^2(p)}\, \big\| u_n^T S^{1/2} \big\|_2,\;\; (\hat{x}^*)_n + \sqrt{\chi_k^2(p)}\, \big\| u_n^T S^{1/2} \big\|_2 \,\Big],$$
(20)

where $u_n^T$ is the nth row of the matrix U.
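A sketch of this procedure, combining the Monte Carlo estimates of Eq. (18) with the componentwise interval of Eq. (20), is given below; the decoder handle and sample sizes are placeholders, and forming the full empirical covariance is shown only for clarity (it is expensive for large 2N).

```python
import numpy as np
from scipy.stats import chi2

# Monte Carlo estimates of the predictive mean, covariance, and the
# componentwise interval of Eq. (20). `decode(z, m_star)` stands for the
# trained decoder returning a flattened state; all names are placeholders.
def predict_with_uncertainty(decode, m_star, n_mc=100, latent_dim=2, p=0.95):
    samples = np.stack([decode(np.random.randn(latent_dim), m_star)
                        for _ in range(n_mc)])            # (n_mc, 2N)
    x_hat = samples.mean(axis=0)                          # Eq. (18), mean
    cov = np.cov(samples, rowvar=False)                   # Eq. (18), covariance
    k = min(n_mc, samples.shape[1])
    U, S, _ = np.linalg.svd(cov)
    # Eq. (20): half-width sqrt(chi^2_k(p)) * ||u_n^T S^(1/2)||_2 per component.
    half_width = np.sqrt(chi2.ppf(p, df=k)) * np.linalg.norm(U * np.sqrt(S), axis=1)
    return x_hat, x_hat - half_width, x_hat + half_width
```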

We will present the SCVAE method on two different datasets. The first one is the 2D flow around a cylinder described in Sec. II, and the second is ocean currents on the seafloor created by the Bergen Ocean Model (BOM).68 The data X consist of the two dimensional velocities w = (u, v). To illustrate the results, we will plot the u and v components of $x^{(i)} \in X$ [see Eq. (1)]. For validation of the models, the data X are split into train, test, and validation subsets, which are subscripted accordingly, if necessary. The datasets, splitting, and preprocessing for each case are described in Secs. IV A and IV B.

We use a schematically simple architecture to explore the SCVAE. The main ingredient of the encoder is the convolutional neural network (CNN),69,70 and for the decoder, we use transposed CNN-layers.71 The SCVAE has a slightly different architecture in each case, which we present in  Appendix C.

The SCVAE is trained to maximize the objective function in Eq. (16) with the backpropagation algorithm72 and the Adam algorithm.73 We use an adaptive approach of weighting the reconstruction term against the KL-divergence and/or divergence terms,74 that is, finding the regularization parameters β and λ. Specifically, we calculate the proportion of the contribution of each term to the total value of the objective function and scale the terms accordingly. This approach prevents posterior collapse. Posterior collapse occurs if the KL-divergence term becomes too close to zero, resulting in a non-probabilistic reconstruction. The approach of weighting the terms proportionally iteratively adjusts the weight of the KL-divergence term, β, such that posterior collapse is mitigated. For the results shown here, we trained the SCVAEs with an early stopping criterion of 50 epochs, i.e., the optimization is stopped if we do not see any improvement after 50 epochs and returns the best model. We use a two-dimensional Gaussian distribution for $p_\theta(z|m^{(i)}, d^{(i)})$ in all the experiments.

Let the test data $X_{\mathrm{test}}$ consist of n instances $x^{(i)}$, i = 1, …, n, and let $\hat{x}^{(i)}$ denote a prediction of the true $x^{(i)}$ given $m^{(i)}$. In the case of the SCVAE, $\hat{x}^{(i)}$ is the mean prediction obtained as in Eq. (18). For the GPOD method, $\hat{x}^{(i)}$ is a deterministic output of the optimization problem (see  Appendix A). In order to compare the SCVAE results with the results of the GPOD method, we introduce the mean of the relative error for the prediction,

$$E = \frac{1}{n} \sum_{i=1}^{n} \frac{\big\| \hat{x}^{(i)} - x^{(i)} \big\|_2}{\big\| x^{(i)} \big\|_2},$$
(21)

and the mean of the absolute error for the divergence,

$$E_{\mathrm{div}} = \frac{1}{n} \sum_{i=1}^{n} \big\| L_{\mathrm{div}}\, \hat{x}^{(i)} \big\|_2.$$
(22)
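Written out explicitly, the two metrics can be sketched as follows; the array shapes are assumptions.

```python
import numpy as np

# The two evaluation metrics, Eqs. (21) and (22), written out explicitly.
# X_true and X_pred are (n, 2N) arrays of test snapshots and predictions;
# L_div is the (N, 2N) discrete divergence matrix from Eq. (2).
def mean_relative_error(X_true, X_pred):
    return np.mean(np.linalg.norm(X_pred - X_true, axis=1)
                   / np.linalg.norm(X_true, axis=1))

def mean_divergence_error(X_pred, L_div):
    return np.mean(np.linalg.norm(X_pred @ L_div.T, axis=1))
```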

Here, we return to the example in Sec. II. In the following, we give some additional details of the data preprocessing and model implementation.

1. Preprocessing

The data are reduced, as described in Sec. II. We assess the SCVAE with a sequential split for train, test, and validation: the last 15% of the data is used for test, the last 30% of the remaining data is used for validation, and the first 70% is used for training. To improve the conditioning of the optimization problem, we scale the data as described in  Appendix B. The errors, E [Eq. (21)] and Ediv [Eq. (22)], are calculated after re-scaling the data back. The input to the SCVAE, $x^{(i)}$, was reshaped as an array with dimension (160 × 50 × 2) in order to apply convolutional layers. Here, we use five, four, three, and two fixed spatial measurements, that is, four different subgrids Q,

Q5 = {(12, 76), (47, 8), (30, 40), (153, 34), (16, 10)},
Q4 = {(12, 76), (47, 8), (30, 40), (153, 34)},
Q3 = {(12, 76), (47, 8), (30, 40)},
Q2 = {(12, 76), (47, 8)}.
(23)

The flow state at these specific locations constitutes M.
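The sequential split described above can be sketched as follows; the rounding of the split sizes is an assumption.

```python
import numpy as np

# Sketch of the sequential split: last 15% for test, last 30% of the
# remainder for validation, first 70% of the remainder for training.
# X is an array of snapshots ordered in time.
def sequential_split(X):
    n_test = int(0.15 * len(X))
    X_rest, X_test = X[:len(X) - n_test], X[len(X) - n_test:]
    n_val = int(0.30 * len(X_rest))
    X_train, X_val = X_rest[:len(X_rest) - n_val], X_rest[len(X_rest) - n_val:]
    return X_train, X_val, X_test
```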

2. Model

A schematic description of the model is given in  Appendix C. The first layer of the encoder is a zero-padding layer that expands the horizontal and vertical dimensions by adding zeros on the boundaries, a stencil of four in the horizontal direction and three in the vertical direction. The subsequent layers consist of two convolutional layers, where the first and second layers have 160 and 200 filters, respectively. We use a kernel size and strides of 2 in both convolutional layers and ReLU activation functions. This design compresses the data into a (42 × 14 × 200) shape. The compressed representation from the convolutional layers is flattened and is further compressed into a 64 dimensional vector through a traditional dense layer. Two output layers are defined to represent the mean and log-variance of the latent representation z. The reparameterization trick is realized in a third layer, the so-called lambda layer, which takes the mean and log-variance as an input and generates z. The outputs of the encoder are the samples $z^{(i)}$ and the mean and log-variance of $z^{(i)}$.

The decoder takes the latent representation z(i) and the measurements m(i) as the input. The input m(i) is flattened and then concatenated with z(i). The next layer is a dense layer with shape (42 × 14 × 200). Afterward, there are two transposed convolutional layers with filters of 200 and 160. The strides and the kernel size are the same as those for the encoder. The final layer is a transposed convolutional layer with the same dimension as the input to the encoder, the dimension of x(i). A linear activation function is used for this output layer. The last layer of the model is a lambda layer that removes the zero-padding. In Sec. IV A 3, we show statistics of the probabilistic reconstruction and compare with the GPOD method.
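To make the layer description concrete, a hedged Keras sketch of this encoder and decoder is given below. The layer sizes follow the text; the latent dimension (two, cf. the Gaussian latent distribution used in the experiments), the exact padding and cropping amounts, the use of a Cropping2D layer in place of the final lambda layer, and the stride-1 output layer are assumptions chosen so that the tensor shapes close.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2      # two-dimensional latent space (assumption, cf. Sec. IV)
M = 3               # number of measurement locations, e.g., Q3

# Encoder: depends only on the full state x, shaped (160, 50, 2).
x_in = layers.Input(shape=(160, 50, 2))
h = layers.ZeroPadding2D(padding=(4, 3))(x_in)                            # -> (168, 56, 2)
h = layers.Conv2D(160, kernel_size=2, strides=2, activation="relu")(h)    # -> (84, 28, 160)
h = layers.Conv2D(200, kernel_size=2, strides=2, activation="relu")(h)    # -> (42, 14, 200)
h = layers.Flatten()(h)
h = layers.Dense(64, activation="relu")(h)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

def reparameterize(args):
    mu, log_var = args                      # reparameterization trick, Eq. (14)
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(reparameterize)([z_mean, z_log_var])
encoder = Model(x_in, [z_mean, z_log_var, z], name="encoder")

# Decoder: depends on the latent sample z and the measurements m.
z_in = layers.Input(shape=(latent_dim,))
m_in = layers.Input(shape=(M, 2))
d = layers.Concatenate()([z_in, layers.Flatten()(m_in)])
d = layers.Dense(42 * 14 * 200, activation="relu")(d)
d = layers.Reshape((42, 14, 200))(d)
d = layers.Conv2DTranspose(200, kernel_size=2, strides=2, activation="relu")(d)  # -> (84, 28, 200)
d = layers.Conv2DTranspose(160, kernel_size=2, strides=2, activation="relu")(d)  # -> (168, 56, 160)
d = layers.Conv2DTranspose(2, kernel_size=2, strides=1, padding="same",
                           activation="linear")(d)                               # -> (168, 56, 2)
x_out = layers.Cropping2D(cropping=(4, 3))(d)                                    # remove the padding
decoder = Model([z_in, m_in], x_out, name="decoder")
```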

3. Results

In Fig. 5, we have plotted the reconstructed velocity fields and associated statistics. The observation locations are shown as stars (black and white). The SCVAE with the objective function $\hat{\mathcal{L}}_0$ [see Eq. (7)] was used for this prediction. To generate the posterior predictive distribution [Eq. (17)], we sample 100 realizations from $z \sim \mathcal{N}(0, I)$, which allows for calculation of the mean prediction and uncertainty estimates [see Eq. (18)].

FIG. 5.

(a)–(d) represent u-velocities and (e)–(h) represent v-velocities and associated statistical measures, respectively. The results are based on a model trained with λ = 0 and Q3 measurement locations. [(a) and (e)] True solutions. [(b) and (f)] Reconstructed solutions. [(c) and (g)] Standard deviations. [(d) and (h)] Absolute errors.


We emphasize again that the SCVAEs with $\hat{\mathcal{L}}_0$ and $\hat{\mathcal{L}}_\lambda$, λ > 0, are two different models. For the sake of notation, we here refer to λ = 0 when we mean the model with the objective function in Eq. (7) and to λ > 0 when in Eq. (13). The same holds for the GPOD method (see  Appendix B). When λ = 0, the number of principal components r is at most 2M. The number r is chosen such that the prediction on the validation data has the smallest possible error on average. If λ > 0, no restrictions on r are imposed. In this case, both λ and r are estimated from the validation data.

The results show that the SCVAE reconstructs the data well with the associated low uncertainty. This can be explained by the periodicity in the data. In particular, the training and validation datasets represent the test data well enough.

In Fig. 6, we have plotted four time series of the reconstructed test data at two specific grid points, together with the confidence regions constructed as in Eq. (20) with p = 0.95. Figures 6(a) and 6(c) represent the reconstruction at the grid point (6, 31), and Figs. 6(b) and 6(d) represent that at (101, 25) for u and v, respectively. The SCVAE reconstruction is significantly better than the GPOD and close to the true solution for all time steps.

FIG. 6.

Velocities u [(a) and (c)] and v [(b) and (d)] at grid points (6, 31) and (101, 25) with associated confidence regions, respectively. The estimates are based on a model trained with λ = 0 and Q3 measurement locations.


Figure 7 shows the difference between the true values and the model prediction in time for the same two locations. This figure has to be seen in context with Fig. 6. The difference marginals are obtained based on the confidence regions in Fig. 6. In Table I, we display the relative errors [Eq. (21)] for the SCVAE and the GPOD method, both with and without divergence regularization, for five, four, three, and two measurement locations given in Eq. (23).

FIG. 7.

The difference between the true and predicted u [(a) and (c)] and v [(b) and (d)] at grid points (6, 31) and (101, 25) with associated difference marginals, respectively. The estimates are based on a model trained with λ = 0 and Q3 measurement locations.

TABLE I.

The mean relative error E [Eq. (21)] for the SCVAE prediction and the GPOD prediction with or without div-regularization and different number of measurements.

                                     Measurement locations
Method    Regularization        5               4               3               2
SCVAE     λ = 0           0.30 × 10−2     0.33 × 10−2     0.26 × 10−2     0.28 × 10−2
SCVAE     λ > 0           0.31 × 10−2     0.32 × 10−2     0.30 × 10−2     0.28 × 10−2
GPOD      λ = 0           2.35 × 10−2     2.49 × 10−2     3.38 × 10−2     17.38 × 10−2
GPOD      λ > 0           2.12 × 10−2     2.33 × 10−2     3.15 × 10−2     16.38 × 10−2

The results of the SCVAE depend on two stochastic inputs, which are (i) randomness in the initialization of the prior weights and (ii) random mini-batch sampling. We have trained the model with each measurement configuration ten times and chose the model that performs the best on the validation dataset. Ideally, we would run test cases where we used all the values as measurements, i.e., M = X, and test how well the model would reconstruct in this case. This would then give us the lower bound of the best reconstruction that is possible for this specific architecture and hyper-parameter settings. However, this scenario was not possible to test due to limitations in the memory in the GPU. Therefore, we have used a large enough M, which still allowed us to run the model. In particular, we used every fifth and second pixel in the horizontal and vertical directions, which resulted in a total of (32 × 25) measurement locations, or M = 800. We believe that training the model with these settings gave us a good indication of the lower bound of the reconstruction error. The error observed was of the magnitude of 10−3.

This lower bound has been reached for all measurement configurations [see Eq. (23)]. However, a larger computational cost was needed to reach the lower bound for fewer measurement locations. Figure 8 shows the number of epochs as a boxplot diagram. For each measurement configuration and regularization technique, the model is run ten times. The variation of the number of epochs for each measurement configuration is due to different priors of the weights and random mini-batch sampling. In comparison, the SCVAE error is about ten times lower than the GPOD error, and this difference becomes larger with fewer measurements. Note that adding regularization did not have much effect on the relative error. From the motivating example, we observed that regularizing with λ > 0 is better in terms of more consistent and less variable error estimates. Here, we selected from the ten trained models the one that performed best on the validation dataset. This model selection approach shows that there are no significant differences between the two regularization techniques. The associated errors in the divergence of the velocity fields are reported in Table II.

FIG. 8.

Number of epochs trained depending on the number of measurements.

TABLE II.

Comparison of the divergence error Ediv as calculated in Eq. (22) for the different methods and regularization techniques. The true divergence error on the entire test dataset is 0.1058.

                              Measurement locations
Method    Regularization      5          4          3          2
SCVAE     λ = 0            0.1439     0.1580     0.1383     0.1432
SCVAE     λ > 0            0.1533     0.1408     0.1468     0.1410
GPOD      λ = 0            0.1052     0.1047     0.0943     0.08866
GPOD      λ > 0            0.1039     0.1051     0.0966     0.0669

We tested the SCVAE on simulations from the Bergen Ocean Model (BOM).68 The BOM is a three-dimensional terrain-following nonhydrostatic ocean circulation model with capabilities of resolving mesoscale to large-scale processes. Here, we use velocities simulated by Ali et al.43 The simulations were conducted on the entire North Sea with a horizontal grid resolution of 800 m and 41 vertical layers for the period from 1 January 2012 to 15 January 2012. Forcing of the model consists of wind, atmospheric pressure, harmonic tides, rivers, and initial fields for salinity and temperature. For details of the setup of the model, forcing, and the simulations, we refer to Ref. 43.

Here, the horizontal velocities in the bottom layer of an excerpt of the model domain, with dimensions 25.6 × 25.6 km2 in the southern North Sea [center at (58.36°N, 1.91°E)] are used as the dataset for reconstruction. In Fig. 9, we have plotted time series of the mean and extreme values of the two velocity components, u and v, for each time t in T.

FIG. 9.

Mean velocities u (a) and v (b) for each time t in T and associated extremes for each instance. The horizontal lines indicate the sequential data split. Units on the x-axis are in minutes.


1. Preprocessing

We extract the 32 × 32 central grid from the bottom layer velocity data. Hence, P contains N = 1024 points. The original temporal resolution is 105 000 time steps, and the time between each time step is 1 min. We downsample the temporal dimension of the original data uniformly such that the number of time steps in T is K = 8500. We train and validate the SCVAE with two different data splits: randomized and sequential in time. For the sequential split, we have used the last 15% for the test, the last 30% of the remaining data for validation, and the first 70% for training. In Fig. 9, the red and blue vertical lines indicate the data split for this case. For the random split, the instances $x^{(i)}$ are drawn randomly from X with the same percentages. The data were scaled as described in  Appendix B. The input $x^{(i)}$ to the SCVAE was shaped as (32 × 32 × 2) in order to apply convolutional layers. We use nine, five, and three fixed spatial measurement locations. In particular, the subgrids Q are given as

Q9 = {(6, 6), (6, 17), (6, 27), (17, 17), (17, 27), (17, 6), (27, 6), (27, 17), (27, 27)},
Q5 = {(6, 6), (17, 17), (27, 27), (6, 27), (27, 6)},
Q3 = {(6, 27), (17, 17), (27, 6)}.
(24)

As before, the values of u and v at these specific locations constitute the measurements m(i)M.

2. Model

A schematic description of the model is given in Appendixes C 3 and C 4. The first two layers of the encoder are convolutional layers with 64 and 128 filters, with strides and a kernel size of 2 and ReLU activation functions. This compresses the data into a shape of (8 × 8 × 128). The next layers are flattening and dense layers, where the latter has 16 units and ReLU activation. The subsequent layers define the mean and log-variance of the latent representation z, which is input to a lambda layer for realization of the reparameterization trick. The encoder outputs the samples $z^{(i)}$, the mean, and the log-variance of $z^{(i)}$.

The input to the decoder is the output $z^{(i)}$ of the encoder and the measurements $m^{(i)}$. To concatenate the inputs, $m^{(i)}$ is flattened. After concatenation of $z^{(i)}$ and $m^{(i)}$, the next layer is a dense layer with shape (8 × 8 × 128) and ReLU activation. This allows for the use of transposed convolutional layers to obtain the original shape of the data. Hence, the following layers are two transposed convolutional layers with 64 and 128 filters, strides and kernel size of 2, and ReLU activation. The final layer is a transposed convolutional layer with a linear activation function and output of shape (32 × 32 × 2), i.e., the same shape as $x^{(i)}$.

3. Results

We illustrate the obtained posterior predictive distribution in terms of the predictive mean and standard deviation for the prediction at a specific time. The SCVAE is compared with the GPOD method, both with λ > 0 and λ = 0, for measurement locations given in Eq. (24) for the random and sequential split cases. To generate the posterior predictive distributions [Eq. (17)], we sample 200 realizations from $z \sim \mathcal{N}(0, I)$, which allows for calculation of the mean prediction and uncertainty estimates [see Eq. (18)]. Figure 10 shows the results of the prediction at time step 1185 for both the u and v components and associated uncertainty estimates for a trained model with λ = 0 and Q3 measurement locations [see Eq. (24)].

FIG. 10.

(a)–(d) represent u-velocities and (e)–(h) represent v-velocities and associated statistical measures, respectively. The results are based on a model trained with λ = 0 and Q3 measurement locations. [(a) and (e)] True solutions. [(b) and (f)] Reconstructed solutions. [(c) and (g)] Standard deviations. [(d) and (h)] Absolute errors.


In Fig. 11, we plot the true solution and the predicted mean velocity [Eq. (18)] with the associated uncertainty [Eq. (20)] for two grid points. We plot only the first 600 time steps for readability. The grid points are (26, 6) and (4, 1). The first location is ∼5.1 km from the nearest observation, and the second is about 16.1 km away.

FIG. 11.

Velocities u [(a) and (c)] and v [(b) and (d)] at grid points (26, 6) (5.1 km from the nearest observation) and (4, 1) (16.1 km from the nearest observation) with the associated confidence regions, respectively. The estimates are based on a model trained with λ = 0 and Q3 measurement locations.


Figure 12 has to be viewed in context with Fig. 11 and shows the difference between the true and the predicted solutions with the associated difference marginal in time for the two locations as in Fig. 11.

FIG. 12.

The difference between the true and predicted u [(a) and (c)] and v [(b) and (d)] at grid points (26, 6) and (4, 1) with the associated difference marginals, respectively. The estimates are based on a model trained with λ = 0 and Q3 measurement locations.


Integrating over the latent space generates a posterior distribution of the reconstruction, as described in Sec. III B 1. It is also possible to use the latent space to generate new statistically sound versions of u and v. This is presented in Fig. 13, where we sample uniformly over the two dimensional latent space $z \sim \mathcal{N}(0, I)$, and the result shows how different variations can be created with the SCVAE model, given only the sparse measurements.
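A sketch of this sampling procedure is given below; the uniform grid bounds over the latent space and the decoder handle are assumptions.

```python
import numpy as np

# Sketch of generating flow variations for fixed measurements m*, as in
# Fig. 13: decode latent codes taken on a uniform grid over the 2D latent
# space (the grid bounds are an assumption). `decode` is the trained decoder.
def sample_flow_variations(decode, m_star, n_side=5, bound=2.0):
    grid = np.linspace(-bound, bound, n_side)
    return [decode(np.array([z1, z2]), m_star) for z1 in grid for z2 in grid]
```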

FIG. 13.

Predictions with uniform sampling over the latent representation, u (a) and v (b), for sample number 1185, with the associated true solutions in (c) and (d), respectively. The predictions are generated from a model with λ = 0 and Q3 measurement locations.


These sampled velocities could be used for ensemble simulations when estimating uncertainty in a passive tracer transport (see Ref. 48).

The SCVAE results are compared with results of the GPOD method (see Tables III and IV). The tables show the errors as calculated in Eqs. (21) and (22) on the test dataset for both sequential and random splits. For the sequential splitting, the SCVAE is better for three measurement locations, while the GPOD method performs better for nine and five locations. From Fig. 9, we observe that the test dataset seems to arise from a different process than the training and validation data (especially for v). Thus, the SCVAE generalizes worse than a simpler model such as the GPOD.75 For the three location case, the number of components in the GPOD is not enough to compete with the SCVAE.

TABLE III.

Errors as calculated in Eq. (21) for the different methods, regularization techniques (λ = 0 or λ > 0), split regimes, and measurements.

                                             Measurement locations
Split            Regularization   Method       9          5          3
Random           λ = 0            SCVAE      0.1379     0.2097     0.2928
                                  GPOD       0.3300     0.3822     0.4349
                 λ > 0            SCVAE      0.1403     0.2025     0.3016
                                  GPOD       0.2971     0.3579     0.4039
Time dependent   λ = 0            SCVAE      0.3493     0.3913     0.4155
                                  GPOD       0.3767     0.4031     0.4678
                 λ > 0            SCVAE      0.3527     0.3889     0.4141
                                  GPOD       0.3362     0.3695     0.4462
TABLE IV.

Divergence errors as calculated in Eq. (22) for the different methods, regularization techniques (λ = 0 or λ > 0), split regimes, and measurements. The true divergence of the test data is of order 10−4.

                                             Measurement locations
Split            Regularization   Method         9               5               3
Random           λ = 0            SCVAE     3.75 × 10−5     3.62 × 10−5     3.42 × 10−5
                                  GPOD      6.51 × 10−5     5.88 × 10−5     5.02 × 10−5
                 λ > 0            SCVAE     3.60 × 10−5     3.60 × 10−5     3.13 × 10−5
                                  GPOD      6.23 × 10−5     4.77 × 10−5     4.14 × 10−5
Time dependent   λ = 0            SCVAE     2.02 × 10−5     1.80 × 10−5     1.69 × 10−5
                                  GPOD      5.09 × 10−5     4.03 × 10−5     4.15 × 10−5
                 λ > 0            SCVAE     2.05 × 10−5     1.99 × 10−5     1.85 × 10−5
                                  GPOD      4.39 × 10−5     3.65 × 10−5     2.92 × 10−5

With the random split of the train, test, and validation data, we see that the SCVAE is significantly better than the GPOD. The training data and measurements represent the test data and test measurements better with random splitting. This highlights the importance of large datasets that cover as many outcomes as possible. Demanding that λ > 0 in Eq. (16) does not improve the result. From the SCVAE models with λ = 0, we learn that the reconstructed representations have low divergence even without explicitly demanding it during optimization. However, as discussed in the 2D flow around cylinder experiment, demanding λ > 0 seems to improve the conditioning of the optimization problem and give more consistent results. In Fig. 14, we present a boxplot of the number of epochs against the number of measurements. For each measurement configuration and regularization technique, the model is optimized ten times. The variation in the number of epochs for each measurement configuration and regularization technique is due to different priors of the weights and mini-batch sampling.

FIG. 14.

The number of training epochs as a function of the number of measurement locations.


We have presented the SCVAE method for efficient data reconstruction based on sparse observations. The derived objective functions for the network optimization show that the encoding is independent of measurements. This allows for a simpler model structure with fewer model parameters than a CVAE and results in an optimization procedure that requires less computational resources.

We have shown that the SCVAE is suitable for the reconstruction of fluid flow. The method is showcased on two different datasets: velocity data from simulations of 2D flow around a cylinder and bottom currents from the BOM. The fact that the fluids studied in the experiments are incompressible served as a motivation for adding an extra constraint to the objective function with λ > 0.

Our investigation of additional regularization showed that the mean reconstruction errors over all models were lower with λ > 0 compared to the model where λ = 0, but the best reconstruction errors were similar for λ = 0 and λ > 0.

The SCVAE is a probabilistic model, which allows us to make predictions, estimate their uncertainty, and draw multiple samples from the predictive distribution. The last two properties make the SCVAE a useful method especially when the predictions are used as a component in a larger application, i.e., ensemble simulations of tracer transport. Motivated by Ref. 17, we compared the SCVAE predictions with the predictions of a modified GPOD method.

A benefit of the SCVAE method, unlike the GPOD method, is that it scales well to larger datasets. Moreover, as suggested by the experiments in Sec. IV, the GPOD seems more sensitive to the number of measurement locations than the SCVAE. On the other hand, the experiments suggested that the GPOD is better than the SCVAE with a larger number of measurement locations if the training data and the test data are too different. Essentially, the SCVAE is overfit to the training data and, as a result, performs poorly on the test dataset. This fact shows the importance of training the SCVAE on large datasets, which cover as many potential flow patterns as possible. Furthermore, the results show that the GPOD is more sensitive to the measurement location choice than the SCVAE, and the GPOD method is not expected to perform well on a complex flow with very few fixed measurement locations.

VAEs have been used for generating data in computer vision,31 and autoencoders are natural choices in reconstruction tasks.13 Many reconstruction approaches, including the GPOD, first construct a basis and then determine the coefficients by minimizing the error at the observation locations.50,76 This makes the GPOD suitable for fast optimization of measurement locations that minimize the reconstruction error. The SCVAE, on the other hand, optimizes the basis functions given the measurements, i.e., the measurement locations are known and fixed. This makes it challenging to use the framework for optimizing the sensor layout. However, if the measurement locations are fixed and large amounts of training data are available, the SCVAE outperforms the GPOD for reconstruction. The SCVAE optimizes the latent representation and the variational and generative network parameters given the measurements, which ensures that the reconstruction is adapted to the specific configuration of measurements.

A limitation of our experiments is that we used only 100 and 200 samples and constructed the confidence regions under further simplifying assumptions. The uncertainty estimates could be improved by increasing the sample size and by using a better model for the confidence region.

Natural applications of the SCVAE are related to environmental data, where measurements are often sparse. The sensor layout for detecting unintentional discharges in the marine environment can be optimized by using a simple transport model, forced by given flow fields, to predict the area of influence (see the work of Oleynik et al.48). The SCVAE can improve such an approach by efficiently generating realistic flow fields in a Monte Carlo framework. The incompressibility constraint is important in this case, as it ensures conservation of mass in the transport model. Such a framework may be an important input to design, environmental risk assessments, and emergency preparedness plans.

We have demonstrated the SCVAE on the reconstruction of currents and flow fields; however, the method is not limited to fluid flow problems. For instance, the same principles could be used in computer vision to generate a new picture based on sparse pixel representations or in time series reconstruction.

A survey paper on Bayesian networks that account for time dependence in the model itself, the so-called dynamical BNNs, was recently published.77 A variety of recurrent neural networks are presented that are applicable to spatial and temporal data, although the flow reconstruction problem is not explicitly addressed. A natural extension of the SCVAE would be to implement it as a dynamical BNN, i.e., to predict the current state $p_{\theta}(x_t \,|\, m_t, x_{t-1})$ given the measurements and the reconstruction from the previous time step. This could potentially improve the reconstruction further.

The 2D flow around a cylinder dataset is simulated by Weinkauf54 using the Free Software Gerris Flow Solver.55 The data that support the findings of this study are openly available at http://tinoweinkauf.net/notes/cylinder2d.html.53 The BOM dataset is simulated by Alfatih Ali and contains time series of CO2 concentration, velocity components, time, and position in longitude and latitude. The data that support the findings of this study are openly available on Zenodo at http://doi.org/10.5281/zenodo.806088.78

This work is part of the project ACTOM, funded through the ACT program (Accelerating CCS Technologies, Horizon2020 Project No. 294766). This work received financial contributions from the Research Council of Norway (RCN), Norway; the Netherlands Enterprise Agency (RVO), the Netherlands; the Department for Business, Energy and Industrial Strategy (BEIS) together with extra funding from NERC and EPSRC research councils, United Kingdom; and the U.S. Department of Energy (U.S. DOE), USA. Kristian Gundersen was supported by the Research Council of Norway through the CLIMIT program (Project No. 254711, BayMode) and the European Union Horizon 2020 research and innovation program under Grant Agreement No. 654462, STEMM-CCS. The authors acknowledge NVIDIA Corporation for providing their GPUs in the academic GPU Grant Program.

Let the vectors $x^{(i)}$ in $X$ be organized as a snapshot matrix $X_h \in \mathbb{R}^{2N \times K}$. Here, we take the latent space to be given by the $r$ principal components of the matrix $X_h$, assuming that $r \ll N$. Thus, $X_h \approx \Phi A_h$, where $\Phi$ is the $2N \times r$ matrix of principal components and $A_h = \Phi^{+} X_h$ is the $r \times K$ representation of $X_h$.

Let $x^* \in \mathbb{R}^{2N}$ be an unknown state that we need to reconstruct from $M$ spatial measurements $m^* = C x^*$. We assume that there is an $a$ such that $x^* = \Phi a$, and we search for a solution of $C\Phi a = m^*$. Even if the system $C\Phi a = m^*$ is overdetermined, the matrix $C\Phi$ could be ill-conditioned or rank-deficient. However, since the number of sensors is usually small, $2M < r$, the system is underdetermined and regularization is required.

Assuming that the flow is incompressible, the natural regularization is to penalize the divergence error of a solution. That is, we solve

$$a^* = \operatorname*{arg\,min}_{a}\; \|C\Phi a - m^*\|_2^2 + \lambda \|L_{\mathrm{div}} \Phi a\|_2^2,$$
(A1)

where $L_{\mathrm{div}}: \mathbb{R}^{2N} \to \mathbb{R}^{N}$ is a linear operator approximating the divergence and $\lambda > 0$ is a regularization constant. Finally, we decode $x^*$ from the measurements $m^*$ as $x^* = \Phi a^*$.
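
A minimal NumPy sketch of this reconstruction, assuming a snapshot matrix `X_h`, a measurement matrix `C`, a discrete divergence operator `L_div`, and a measurement vector `m_obs` (all hypothetical names), is given below; Eq. (A1) is solved by stacking the data-misfit and divergence-penalty terms into one least-squares problem.

```python
# Sketch of the GPOD reconstruction in Eq. (A1), assuming:
#   X_h   : (2N, K) snapshot matrix of training states,
#   C     : (2M, 2N) measurement (selection) matrix,
#   L_div : (N, 2N) discrete divergence operator,
#   m_obs : (2M,) measurements of the unknown state,
#   r     : number of retained POD modes, lam : regularization weight.
import numpy as np

def gpod_reconstruct(X_h, C, L_div, m_obs, r, lam):
    # POD basis from the left singular vectors of the snapshot matrix.
    Phi, _, _ = np.linalg.svd(X_h, full_matrices=False)
    Phi = Phi[:, :r]                                   # (2N, r)

    # Stack misfit and divergence penalty:  || A a - b ||_2^2
    # equals ||C Phi a - m||_2^2 + lam * ||L_div Phi a||_2^2.
    A = np.vstack([C @ Phi, np.sqrt(lam) * (L_div @ Phi)])
    b = np.concatenate([m_obs, np.zeros(L_div.shape[0])])
    a_star, *_ = np.linalg.lstsq(A, b, rcond=None)

    return Phi @ a_star                                # reconstructed state x*
```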

Let $T_{\mathrm{train}}$ contain the times $t_{l_i}$, $i = 1, \dots, n$, corresponding to the training data. We define

$$u_{\max} = \max_{p,t} u(p,t) \quad\text{and}\quad u_{\min} = \min_{p,t} u(p,t)$$

and

$$v_{\max} = \max_{p,t} v(p,t) \quad\text{and}\quad v_{\min} = \min_{p,t} v(p,t)$$

as the largest and smallest values of $u$ and $v$ on $P$ and $T_{\mathrm{train}}$. Then, the middle points are given as

$$u_c = \frac{u_{\max} + u_{\min}}{2}, \qquad v_c = \frac{v_{\max} + v_{\min}}{2},$$

and the half lengths are given as

$$d_u = \frac{u_{\max} - u_{\min}}{2}, \qquad d_v = \frac{v_{\max} - v_{\min}}{2}.$$

Then, the whole data are scaled as

$$\tilde{u} = \frac{u - u_c}{d_u}, \qquad \tilde{v} = \frac{v - v_c}{d_v},$$

and the divergence operator $L_{\mathrm{div}}$ is scaled accordingly.

After the optimization is completed, the data are scaled back, i.e.,

$$u = d_u \tilde{u} + u_c, \qquad v = d_v \tilde{v} + v_c.$$

The relative errors in Eq. (21) are calculated on the scaled data. The divergence errors [Eq. (22)] are unaffected by the scaling.
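
A small NumPy sketch of this scaling, assuming `u_train` and `v_train` hold the velocity components at the training times and `u` and `v` hold the whole dataset, is

```python
# Sketch of the scaling above: centers and half-lengths are computed on the
# training portion only, and the whole dataset is then scaled with them.
# `u_train`, `v_train`, `u`, `v` are assumed NumPy arrays of the velocity components.
import numpy as np

def fit_scaling(u_train, v_train):
    u_c = 0.5 * (u_train.max() + u_train.min())
    v_c = 0.5 * (v_train.max() + v_train.min())
    d_u = 0.5 * (u_train.max() - u_train.min())
    d_v = 0.5 * (v_train.max() - v_train.min())
    return u_c, v_c, d_u, d_v

def scale(u, v, u_c, v_c, d_u, d_v):
    return (u - u_c) / d_u, (v - v_c) / d_v        # values roughly in [-1, 1]

def unscale(u_t, v_t, u_c, v_c, d_u, d_v):
    return d_u * u_t + u_c, d_v * v_t + v_c        # back to physical units
```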

We use Keras79 in the implementation of the SCVAE for all experiments. Here, we present details on the architecture of the encoders and decoders for the different experiments (Figs. 15–18). We have optimized SCVAE models with different numbers of measurements; that is, the shape of the sensor-input layer of the decoder depends on the number of measurements. The architectures shown are those with the largest number of measurements for both experiments. The figures show one extra dimension for the encoders and decoders; here, this dimension is one, but the framework is implemented to allow for more dimensions in time.
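
As an illustration of these input conventions, the following sketch (with purely hypothetical grid and sensor sizes) shows how the input layers could be declared with the extra, here size-one, time dimension and a sensor-input layer whose shape follows the number of measurement locations.

```python
# Sketch of input layers with the extra (here size-one) time dimension and a
# sensor-input layer whose shape follows the number of measurement locations.
# All sizes below are hypothetical and not those of the actual experiments.
from tensorflow.keras import layers

n_t, n_y, n_x = 1, 64, 128          # time steps and grid size (illustrative)
n_sensors = 8                       # number of measurement locations

field_in = layers.Input(shape=(n_t, n_y, n_x, 2), name="velocity_field")   # (u, v) on the grid
sensor_in = layers.Input(shape=(n_t, 2 * n_sensors), name="sensor_input")  # u and v at the sensors
```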

FIG. 15. Schematic overview and details of the encoder used in the CW-data experiment.

FIG. 16. Schematic overview and details of the decoder used in the CW-data experiment.

FIG. 17. Schematic overview and details of the encoder used in the BOM-data experiment.

FIG. 18. Schematic overview and details of the decoder used in the BOM-data experiment.

1. Encoder for 2D flow around cylinder data experiment

2. Decoder for 2D flow around cylinder experiment

3. Encoder for BOM data experiment

4. Decoder for BOM data experiment

1. S. L. Brunton and B. R. Noack, "Closed-loop turbulence control: Progress and challenges," Appl. Mech. Rev. 67(5), 050801 (2015).
2. L. Kong, W. Wei, and Q. Yan, "Application of flow field decomposition and reconstruction in studying and modeling the characteristics of a cartridge valve," Eng. Appl. Comput. Fluid Mech. 12(1), 385–396 (2018).
3. T. Bolton and L. Zanna, "Applications of deep learning to ocean data inference and subgrid parameterization," J. Adv. Model. Earth Syst. 11(1), 376–399 (2019).
4. D. Venturi and G. E. Karniadakis, "Gappy data and reconstruction procedures for flow past a cylinder," J. Fluid Mech. 519, 315 (2004).
5. J. L. Callaham, K. Maeda, and S. L. Brunton, "Robust flow reconstruction from limited measurements via sparse representation," Phys. Rev. Fluids 4(10), 103907 (2019).
6. K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton, "Data-driven sparse sensor placement for reconstruction: Demonstrating the benefits of exploiting known patterns," IEEE Control Syst. Mag. 38(3), 63–86 (2018).
7. K. Yeo, "Data-driven reconstruction of nonlinear dynamics from sparse observation," J. Comput. Phys. 395, 671–689 (2019).
8. P. D. Oikonomou, A. H. Alzraiee, C. A. Karavitis, and R. M. Waskom, "A novel framework for filling data gaps in groundwater level observations," Adv. Water Resour. 119, 111–124 (2018).
9. L. Sirovich, "Turbulence and the dynamics of coherent structures. I. Coherent structures," Q. Appl. Math. 45(3), 561–571 (1987).
10. R. Everson and L. Sirovich, "Karhunen–Loève procedure for gappy data," J. Opt. Soc. Am. A 12(8), 1657–1664 (1995).
11. D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
12. P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," J. Fluid Mech. 656, 5–28 (2010).
13. S. M. A. Al Mamun, C. Lu, and B. Jayaraman, "Extreme learning machines as encoders for sparse reconstruction," Fluids 3(4), 88 (2018).
14. M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys. 378, 686–707 (2019).
15. I. Bright, G. Lin, and J. N. Kutz, "Compressive sensing based machine learning strategy for characterizing the flow around a cylinder with limited pressure measurements," Phys. Fluids 25(12), 127102 (2013).
16. K. Cohen, S. Siegel, and T. McLaughlin, "Sensor placement based on proper orthogonal decomposition modeling of a cylinder wake," in 33rd AIAA Fluid Dynamics Conference and Exhibit (Aerospace Research Central, 2003), p. 4259.
17. B. Yildirim, C. Chryssostomidis, and G. E. Karniadakis, "Efficient sensor placement for ocean measurements using low-dimensional concepts," Ocean Modell. 27(3-4), 160–173 (2009).
18. S. Pawar and O. San, "Data assimilation empowered neural network parameterizations for subgrid processes in geophysical flows," arXiv:2006.08901 (2020).
19. N. B. Erichson, L. Mathelin, Z. Yao, S. L. Brunton, M. W. Mahoney, and J. N. Kutz, "Shallow neural networks for fluid flow reconstruction with limited sensors," Proc. R. Soc. A 476(2238), 20200097 (2020).
20. J. M. Pérez, S. Le Clainche, and J. M. Vega, "Reconstruction of three-dimensional flow fields from two-dimensional data," J. Comput. Phys. 407, 109239 (2020).
21. S. Le Clainche and J. M. Vega, "Higher order dynamic mode decomposition to identify and extrapolate flow patterns," Phys. Fluids 29(8), 084102 (2017).
22. Y. Gao, L. Liu, C. Zhang, X. Wang, and H. Ma, "SI-AGAN: Spatial interpolation with attentional generative adversarial networks for environment monitoring," in 24th European Conference on Artificial Intelligence–ECAI (IOS Press BV, 2020), pp. 1786–1794.
23. R. Maulik, K. Fukami, N. Ramachandra, K. Fukagata, and K. Taira, "Probabilistic neural networks for fluid flow surrogate modeling and data recovery," Phys. Rev. Fluids 5(10), 104401 (2020).
24. O. Makansi, E. Ilg, O. Cicek, and T. Brox, "Overcoming limitations of mixture density networks: A sampling and fitting framework for multimodal future prediction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2019), pp. 7144–7153.
25. M. Raissi, A. Yazdani, and G. E. Karniadakis, "Hidden fluid mechanics: A Navier-Stokes informed deep learning framework for assimilating flow visualization data," arXiv:1808.04327 (2018).
26. A. Grover and S. Ermon, "Uncertainty autoencoders: Learning compressed representations via variational information maximization," in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (PMLR, 2019), pp. 2514–2524.
27. K. Fukami, T. Nakamura, and K. Fukagata, "Convolutional neural network based hierarchical autoencoder for nonlinear mode decomposition of fluid field data," Phys. Fluids 32(9), 095110 (2020).
28. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation (MIT Press, Cambridge, MA, USA, 1986), pp. 318–362.
29. K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," London, Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901).
30. H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biol. Cybern. 59(4-5), 291–294 (1988).
31. D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv:1312.6114 (2013).
32. K. Sohn, H. Lee, and X. Yan, "Learning structured output representation using deep conditional generative models," in Advances in Neural Information Processing Systems (NIPS, 2015), pp. 3483–3491.
33. D. J. C. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Comput. 4(3), 448–472 (1992).
34. M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley, "Stochastic variational inference," J. Mach. Learn. Res. 14(1), 1303–1347 (2013).
35. D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, "Variational inference: A review for statisticians," J. Am. Stat. Assoc. 112(518), 859–877 (2017).
36. B. S. Halpern, C. Longo, D. Hardy, K. L. McLeod, J. F. Samhouri, S. K. Katona, K. Kleisner, S. E. Lester, J. O'Leary, M. Ranelletti, A. A. Rosenberg, C. Scarborough, E. R. Selig, B. D. Best, D. R. Brumbaugh, F. S. Chapin, L. B. Crowder, K. L. Daly, S. C. Doney, C. Elfes, M. J. Fogarty, S. D. Gaines, K. I. Jacobsen, L. B. Karrer, H. M. Leslie, E. Neeley, D. Pauly, S. Polasky, B. Ris, K. St Martin, G. S. Stone, U. R. Sumaila, and D. Zeller, "An index to assess the health and benefits of the global ocean," Nature 488(7413), 615–620 (2012).
37. E. Domínguez-Tejo, G. Metternicht, E. Johnston, and L. Hedge, "Marine spatial planning advancing the ecosystem-based approach to coastal zone management: A review," Mar. Policy 72, 115–130 (2016).
38. H. Drange, G. Alendal, and O. M. Johannessen, "Ocean release of fossil fuel CO2: A case study," Geophys. Res. Lett. 28(13), 2637–2640 (2001).
39. S. F. Barstow, "The ecology of Langmuir circulation: A review," Mar. Environ. Res. 9(4), 211–236 (1983).
40. A. Ali, Ø. Thiem, and J. Berntsen, "Numerical modelling of organic waste dispersion from fjord located fish farms," Ocean Dyn. 61(7), 977–989 (2011).
41. K. L. Law, "Plastics in the marine environment," Annu. Rev. Mar. Sci. 9(1), 205–229 (2017).
42. K. Hylland, T. Burgeot, C. Martínez-Gómez, T. Lang, C. D. Robinson, J. Svavarsson, J. E. Thain, A. D. Vethaak, and M. J. Gubbins, "How can we quantify impacts of contaminants in marine ecosystems? The ICON project," Mar. Environ. Res. 124, 2 (2015).
43. A. Ali, H. G. Frøysa, H. Avlesen, and G. Alendal, "Simulating spatial and temporal varying CO2 signals from sources at the seafloor to help designing risk-based monitoring programs," J. Geophys. Res.: Oceans 121(1), 745–757 (2016).
44. J. Blackford, G. Alendal, H. Avlesen, A. Brereton, P. W. Cazenave, B. Chen, M. Dewar, J. Holt, and J. Phelps, "Impact and detectability of hypothetical CCS offshore seep scenarios as an aid to storage assurance and risk assessment," Int. J. Greenhouse Gas Control 95, 102949 (2020).
45. H. K. Hvidevold, G. Alendal, T. Johannessen, A. Ali, T. Mannseth, and H. Avlesen, "Layout of CCS monitoring infrastructure with highest probability of detecting a footprint of a CO2 leak in a varying marine environment," Int. J. Greenhouse Gas Control 37, 274–279 (2015).
46. H. K. Hvidevold, G. Alendal, T. Johannessen, and A. Ali, "Survey strategies to quantify and optimize detecting probability of a CO2 seep in a varying marine environment," Environ. Modell. Software 83, 303–309 (2016).
47. G. Alendal, "Cost efficient environmental survey paths for detecting continuous tracer discharges," J. Geophys. Res.: Oceans 122(7), 5458–5467 (2017).
48. A. Oleynik, M. I. García-Ibáñez, N. Blaser, A. Omar, and G. Alendal, "Optimal sensors placement for detecting CO2 discharges from unknown locations on the seafloor," Int. J. Greenhouse Gas Control 95, 102951 (2020).
49. K. Gundersen, G. Alendal, A. Oleynik, and N. Blaser, "Binary time series classification with Bayesian convolutional neural networks when monitoring for marine gas discharges," Algorithms 13(6), 145 (2020).
50. K. Willcox, "Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition," Comput. Fluids 35(2), 208–226 (2006).
51. T. Jo, B. Koo, H. Kim, D. Lee, and J. Y. Yoon, "Effective sensor placement in a steam reformer using gappy proper orthogonal decomposition," Appl. Therm. Eng. 154, 419–432 (2019).
52. M. Mifsud, A. Vendl, L.-U. Hansen, and S. Görtz, "Fusing wind-tunnel measurements and CFD data using constrained gappy proper orthogonal decomposition," Aerosp. Sci. Technol. 86, 312–326 (2019).
53. T. Weinkauf, 2D flow around a cylinder dataset, https://www.csc.kth.se/~weinkauf/datasets/Cylinder2D.7z, 2010.
54. T. Weinkauf and H. Theisel, "Streak lines as tangent curves of a derived vector field," IEEE Trans. Visualization Comput. Graphics 16(6), 1225–1234 (2010).
55. S. Popinet, "Free computational fluid dynamics," ClusterWorld 2(6), 7 (2004).
56. J. L. Proctor, S. L. Brunton, B. W. Brunton, and J. N. Kutz, "Exploiting sparsity and equation-free architectures in complex systems," Eur. Phys. J. Spec. Top. 223, 2665–2684 (2014).
57. D. Kingma and M. Welling, "An introduction to variational autoencoders," Found. Trends Mach. Learn. 12, 307–392 (2019).
58. K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra, "DRAW: A recurrent neural network for image generation," in Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7-9 July 2015, Proceedings of Machine Learning Research Vol. 37, edited by F. Bach and D. Blei (PMLR, 2015), pp. 1462–1471.
59. S. R. Bowman, L. Vilnis, O. Vinyals, A. Dai, R. Jozefowicz, and S. Bengio, "Generating sentences from a continuous space," in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, August 2016 (Association for Computational Linguistics, 2016), pp. 10–21.
60. S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat. 22(1), 79–86 (1951).
61. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE: Learning basic visual concepts with a constrained variational framework," in International Conference on Learning Representations ICLR (OpenReview.net/ICLR, 2017).
62. H. W. Kuhn and A. W. Tucker, "Nonlinear programming," in Traces and Emergence of Nonlinear Programming (Springer, 2014), pp. 247–258.
63. W. Karush, "Minima of functions of several variables with inequalities as side constraints," M.Sc. dissertation (Department of Mathematics, University of Chicago, 1939).
64. R. Caruana, "Multitask learning," Mach. Learn. 28(1), 41–75 (1997).
65. J. Baxter, "A Bayesian/information theoretic model of learning to learn via multiple task sampling," Mach. Learn. 28(1), 7–39 (1997).
66. J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Stat. 23(3), 462–466 (1952).
67. H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat. 22, 400–407 (1951).
68. J. Berntsen, "Users guide for a modesplit σ-coordinate numerical ocean model," Technical Report No. 135 (Department of Applied Mathematics, University of Bergen, 2000), p. 48.
69. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86(11), 2278–2324 (1998).
70. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS, 2012), pp. 1097–1105.
71. H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2015), pp. 1520–1528.
72. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Comput. 1(4), 541–551 (1989).
73. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 (2014).
74. A. Ali Heydari, C. A. Thompson, and A. Mehmood, "SoftAdapt: Techniques for adaptive loss weighting of neural networks with multi-part loss functions," arXiv:1912.12355 (2019).
75. K. Burnham and D. Anderson, Model Selection and Multimodel Inference - A Practical Information-theoretic Approach (Springer-Verlag, 2004).
76. T. Bui-Thanh, M. Damodaran, and K. Willcox, "Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition," AIAA J. 42(8), 1505–1516 (2004).
77. L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, and X. Alameda-Pineda, "Dynamical variational autoencoders: A comprehensive review," arXiv:2008.12595 (2020).
78. A. Ali, G. Alendal, and H. Avlesen, "Modelled time series of CO2 in the vicinity of a seep in the North Sea," Zenodo (2017), http://doi.org/10.5281/zenodo.806088.
79. F. Chollet et al., Keras, https://keras.io, 2015.
80. See https://www.oceandecade.org/ for more information about the project "United Nations Decade of Ocean Science for Sustainable Development (2021-2030)."
81. The simulations are run from t = 0 to t = 23, but velocities are only extracted from t = 15 to t = 23.