Developing data-driven subgrid-scale (SGS) models for large eddy simulations (LESs) has received substantial attention recently. Despite some success, particularly in a priori (offline) tests, challenges remain, including numerical instabilities in a posteriori (online) tests and poor generalization (i.e., extrapolation) of trained data-driven SGS models, for example, to higher Reynolds numbers. Here, using stochastically forced Burgers turbulence as the test-bed, we show that deep neural networks trained on properly pre-conditioned (augmented) data yield stable and accurate a posteriori LES models. Furthermore, we show that transfer learning enables accurate/stable generalization to a flow with 10× higher Reynolds number.

Due to their high computational cost, the direct numerical simulation (DNS) of turbulent flows will remain out of reach for many real-world applications in the foreseeable future. As a result, the need for parameterization of subgrid-scale (SGS) processes in coarse-resolution models such as large eddy simulation (LES) continues in various areas of science and engineering.1,2 In recent years, there has been substantial interest in applications of deep learning for data-driven modeling of turbulent flows,3–13 including for developing data-driven SGS parameterization (DDP) models.14–27 In many of these studies, the goal is to learn the relationship between the filtered variables and SGS terms in high-fidelity data (e.g., DNS data), and use this DDP model in LES. A priori tests in some of these studies17–19,25 have shown that such a non-parametric approach can yield DDP models that capture important physical processes (e.g., energy backscatter28,29) beyond the simple diffusion process that is represented in canonical physics-based SGS models such as Smagorinsky and dynamic Smagorinsky (DSMAG).30–32 However, these studies have also reported that a posteriori (i.e., online) LES tests, in which the DDP model is coupled to a coarse-resolution Navier-Stokes solver, show numerical instabilities or lead to physically unrealistic flows.17–19,25,26 As a remedy, often ad hoc post-processing steps of the DDP models' outputs are introduced, e.g., to remove backscattering or to attenuate the SGS feedback into the numerical solver. Usually, such post-processing steps substantially take away the advantages gained from using deep learning. As a result, numerical instabilities remain a major obstacle to broadening the applications of LES with DDP models.

Another major concern with DDP models is their (in)ability to accurately generalize beyond the flow they are trained for, particularly to flows with higher Reynolds numbers (Re); such extrapolations are known to be challenging for neural networks.23,33 Some degree of generalization is essential for building robust and trustworthy LES models with DDP. Furthermore, given that high-fidelity data from often-expensive simulations (e.g., DNS) are needed to train DDP models, some capability to extrapolate to higher Re makes such DDP models much more practically useful.

In this paper, with a particular focus on the issues of stability and generalization, we use a deep artificial neural network (ANN) to develop a DDP model for stochastically forced Burgers turbulence. The forced Burgers equation is34 

\frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial (uu)}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} + F,  (1)

where u is velocity, ν = 1/Re, and F is a stochastic forcing (defined later). The domain is periodic with length L. Despite being one-dimensional, the presence of strongly nonlinear local regions in the form of shocks, often multiple shocks [Fig. 1(a)], makes Burgers turbulence a complex and challenging system, which has been used as the test-bed in various SGS and reduced-order modeling studies.34–40 The forcing F(x, t) is defined as34

F = \sum_{k=1}^{3} \alpha_k A \sqrt{\frac{2}{20\,k\,\Delta t}} \cos\left(2\pi\left(\frac{kx}{L} + \Phi_k\right)\right),  (2)

where k, Δt, and A are the wavenumber, time step, and forcing amplitude, respectively. Φk and αk are real, random numbers. To develop the LES model, we spatially filter Eq. (1) to obtain

\frac{\partial \bar{u}}{\partial t} + \frac{1}{2}\frac{\partial (\bar{u}\bar{u})}{\partial x} = \nu \frac{\partial^2 \bar{u}}{\partial x^2} + \bar{F} + \Pi,  (3)

with SGS term

\Pi = \frac{1}{2}\frac{\partial}{\partial x}\left(\bar{u}\,\bar{u} - \overline{uu}\right).  (4)
FIG. 1.

A sample profile and statistics of the stochastically forced Burgers turbulence (from DNS data at Re = Rec). (a) u showing three distinct shocks. (b) The KE spectrum, showing the inertial range. (c) PSD, as a function of frequency ω, showing chaotic behavior.


Here, we use a box filter;1 explorations with Gaussian and sharp spectral filters yield the same findings and conclusions. Overbars indicate filtered (and coarse-grained to LES resolution) variables. Note that the difference between F and F¯ is negligible. Our aim is to train an ANN to learn Π as a function of u¯ in the DNS data, and then use this DDP model as a closure in Eq. (3).
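As an illustration of this workflow, the following is a minimal sketch of computing ū and Π from a DNS snapshot with a periodic box filter (assuming NumPy; the function names, filter implementation, and coarse-graining by subsampling are illustrative choices on our part, not necessarily the authors' exact implementation):

    import numpy as np

    def box_filter(u, width):
        # Periodic box filter of the given width (in DNS grid points).
        padded = np.concatenate([u[-width:], u, u[:width]])
        kernel = np.ones(width) / width
        return np.convolve(padded, kernel, mode="same")[width:-width]

    def filtered_fields(u_dns, n_les, L):
        # Returns u_bar and the SGS term Pi of Eq. (4) on the LES grid.
        n_dns = u_dns.size
        width = n_dns // n_les                   # filter width = coarsening ratio
        u_bar = box_filter(u_dns, width)
        uu_bar = box_filter(u_dns * u_dns, width)
        tau = u_bar * u_bar - uu_bar             # u_bar*u_bar - filter(u*u)
        ik = 2j * np.pi * np.fft.fftfreq(n_dns, d=L / n_dns)
        dtau_dx = np.real(np.fft.ifft(ik * np.fft.fft(tau)))
        return u_bar[::width], 0.5 * dtau_dx[::width]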

We define a setup, referred to as "control" and indicated with subscripts "c," with the following parameters (identical to those used by Dolaptchiev et al.34): L = 100, ν = 0.02, and A = 2/100. Φk and αk are drawn randomly from N(0,1) every 20Δt to update F. To obtain the DNS data, which are treated as the "truth," Eq. (1) is integrated using a pseudo-spectral solver with 1024 Fourier modes and time step Δt = 0.01. (The second-order Adams-Bashforth and Crank-Nicolson methods are used for time integration.) Figure 1 shows a sample profile of u(x), and the kinetic energy (KE) spectrum and power spectral density (PSD) of the flow. To perform LES, Eq. (3) with the DDP model of Π(ū) is integrated using the same pseudo-spectral solver but with 128 Fourier modes and time step 20Δt. At this spatial resolution, the filtered velocity field accounts for 81% of the total KE of the flow (from DNS), conforming with the commonly used ratio.1 Also, note that the spatial and temporal resolutions of the LES solver are, respectively, 8× and 20× lower than those of the DNS solver, substantially reducing the computational cost.
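A minimal sketch of one step of such a pseudo-spectral solver (second-order Adams-Bashforth for the advection and forcing terms, Crank-Nicolson for diffusion) is given below; this is an outline consistent with the description above, written under our own assumptions about variable names and the treatment of the forcing, not the authors' code:

    import numpy as np

    N, L, nu, dt = 1024, 100.0, 0.02, 0.01
    ik = 2j * np.pi * np.fft.fftfreq(N, d=L / N)   # spectral derivative operator

    def advection(u_hat):
        # Spectral transform of -0.5 * d(u*u)/dx.
        u = np.real(np.fft.ifft(u_hat))
        return -0.5 * ik * np.fft.fft(u * u)

    def step(u_hat, rhs_prev, f_hat):
        # AB2 for advection + forcing; Crank-Nicolson for the diffusion term.
        rhs = advection(u_hat) + f_hat
        u_new = ((1 + 0.5 * dt * nu * ik**2) * u_hat
                 + dt * (1.5 * rhs - 0.5 * rhs_prev)) / (1 - 0.5 * dt * nu * ik**2)
        return u_new, rhs
    # (The first step can use rhs_prev = rhs, i.e., forward Euler.)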

The schematic of LES with DDP is shown in Fig. 2(a). Next, we present the details of the ANN and the training data/procedure. We use a multilayer perceptron ANN41 to develop the DDP model. This ANN is unidirectional (information only passes in one direction, from input to output) and is fully connected between the layers. The ANN is trained, i.e., all learnable parameters of the network (weights and biases, collectively represented by θ) are computed, by minimizing the mean-square error \mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M} \left\| \mathrm{ANN}(\tilde{\bar{u}}_i;\theta) - \tilde{\Pi}_i \right\|_2^2. Here, M is the number of training samples, ‖·‖₂ is the L2 norm, ū and Π are calculated from DNS data, and the tilde indicates pre-conditioned (augmented) training data (discussed shortly). The best network architecture, found through extensive trial and error using the MSE, consists of an input layer, six hidden layers with 250 nodes each, and a linear output layer. On all but the final layer, the swish activation function42 is used. Overall, the ANN has 394,640 trainable parameters.
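For concreteness, a network with this architecture could be written as follows (a sketch in TensorFlow/Keras; the framework, optimizer, and input size of 128 LES grid points are assumptions on our part):

    import tensorflow as tf

    n_les = 128  # input: normalized u_bar; output: normalized Pi on the LES grid
    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(n_les,))]
        + [tf.keras.layers.Dense(250, activation="swish") for _ in range(6)]
        + [tf.keras.layers.Dense(n_les, activation="linear")]
    )
    model.compile(optimizer="adam", loss="mse")  # minimizes the MSE defined above
    # model.fit(u_bar_train, pi_train, validation_data=(u_bar_val, pi_val), ...)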

FIG. 2.

(a) The schematic of the LES with DDP model. With normalized u¯(x,t) as input, the trained ANN predicts Π, which is then de-normalized and used in Eq. (3) to compute u¯(x,t+20Δt), and the cycle continues. (b) The pre-conditioning step to augment the training data by adding random shifts in x to produce spatially diverse samples from a relatively small DNS dataset.


Our first attempts to train the DDP model with M = O(10⁵) resulted in inaccurate Π terms in a priori tests and unstable LES with DDP in a posteriori tests. Further analysis showed that the problem stems from the fact that the SGS dynamics, and thus the Π terms, in Burgers turbulence are highly localized around the shocks,37 which, as explained below, leads to overfitting, i.e., poor generalization of the ANN (at the same Re) beyond the training set. Shocks are persistent and can remain fairly stationary for many time steps, which can lead to small or near-zero Π terms in some regions of the domain that do not experience shocks throughout the training set. An ANN trained on such a dataset will predict Π ≈ 0 in those regions regardless of the input ū during (a priori or a posteriori) tests. Note that, by design, the flow during training could be very different, in terms of the location of shocks and their evolution, from the flow during testing. (Though the training and testing sets have the same Re, the latter is chosen from an independent DNS run or from a time window far from that of the training set.) Of course, this overfitting problem could be resolved by using a much larger training set that contains a sufficient number of samples of shock waves occurring in all regions; however, such large training sets are often unavailable. Here, we propose a simple strategy, based on pre-conditioning the training samples, to overcome this problem without the need for a larger dataset.

As shown in Fig. 2(b), a random shift η, drawn from the uniform distribution U(0, L), is added to x for each input-output pair (ū, Π):

\tilde{\bar{u}}(x,t) = \bar{u}(x-\eta,\,t) \quad \mathrm{and} \quad \tilde{\Pi}(x,t) = \Pi(x-\eta,\,t).  (5)

The periodicity in x is used when x − η < 0. Artificially enriching the information content of the training set in this way is common practice in the machine learning community and is called data augmentation.43 For example, in the processing of natural images, data augmentation generally involves artificially enlarging the training set by rotating, mirroring, or cropping images. Here, we have exploited the periodicity of x to introduce a physically meaningful augmentation, which enriches the information about the localized flow and SGS terms around shock waves in the training set without the need for a longer DNS dataset. Finally, as is common practice in machine learning, the pre-conditioned input and output samples are separately normalized (by removing the mean and dividing by the standard deviation).
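In code, this augmentation reduces to a random periodic roll of each sample (a sketch assuming NumPy arrays of shape [samples, grid points]; np.roll applies the periodic wrap of Eq. (5) automatically):

    import numpy as np

    rng = np.random.default_rng()

    def augment(u_bar, pi):
        # Shift each (u_bar, Pi) pair by a random eta ~ U(0, L), Eq. (5);
        # on the grid, eta corresponds to an integer number of points.
        shifts = rng.integers(0, u_bar.shape[1], size=u_bar.shape[0])
        u_aug = np.stack([np.roll(u, s) for u, s in zip(u_bar, shifts)])
        pi_aug = np.stack([np.roll(p, s) for p, s in zip(pi, shifts)])
        return u_aug, pi_aug

    def normalize(a):
        # Remove the mean and divide by the standard deviation.
        return (a - a.mean()) / a.std()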

The pre-conditioned input-output pairs are used to train the ANN. As shown next, the DDP model with an ANN trained using augmented data leads to accurate Π terms in a priori tests and stable and accurate LES models in a posteriori tests without the need for any post-processing of the trained ANN or its output [with the exception of de-normalizing the predicted Π; see Fig. 2(a)]. We have used M = 5×10⁵ samples for training and another (independent) 5×10⁴ samples for validation from a DNS run at Re = Rec. For testing, we have used data from the same run but separated by 5×10⁴Δt from the training/validation sets, as well as data from two other independent DNS runs at Re = Rec.

We examine the performance of the LES with DDP in a posteriori (online) tests to assess both the accuracy (of the SGS modeling) and the stability of the hybrid model. Given that the numerical solution of Eq. (3) blows up without any SGS modeling (i.e., with Π = 0), we use a conventional SGS scheme, DSMAG,44 as the baseline. Figures 3(a) and 3(b) show the spectrum and the probability density function (PDF) of the Π terms predicted by DDP and DSMAG, compared against those of the filtered DNS (FDNS), which is treated as the truth. Both panels show that the statistics of Π predicted by DDP closely follow those of the truth at any k and even at the tails of the PDF. Furthermore, both panels show that DDP outperforms DSMAG in modeling the statistics of the SGS term (Π). The better performance of DDP is clearly seen at high and low k in (a) and beyond ±1 standard deviation in (b). Note that, based on both the Kolmogorov-Smirnov (KS) and Kullback-Leibler (KL) divergence tests,45 the difference between the PDFs of Π from FDNS and DSMAG is statistically significant at the 95% confidence level, while the difference between FDNS and DDP is not.
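The diagnostics behind these comparisons can be sketched as follows (FFT-based spectra and a two-sample KS test via SciPy; this is illustrative, not the authors' exact analysis scripts):

    import numpy as np
    from scipy import stats

    def spectrum(snapshots):
        # Magnitude of the Fourier coefficients, averaged over snapshots (rows).
        coeffs = np.fft.rfft(snapshots, axis=1) / snapshots.shape[1]
        return np.abs(coeffs).mean(axis=0)

    # pi_ddp, pi_fdns: arrays of Pi snapshots from LES-with-DDP and filtered DNS.
    # spec_ddp, spec_fdns = spectrum(pi_ddp), spectrum(pi_fdns)
    # Two-sample KS test at the 95% level (reject similarity if p < 0.05):
    # ks_stat, p_value = stats.ks_2samp(pi_ddp.ravel(), pi_fdns.ravel())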

FIG. 3.

Statistics of the resolved flow ū and SGS term Π calculated using results from a posteriori tests at Re = Rec. The training and testing data are both at Re = Rec. (a) Spectrum of Π, denoted as Π̂(k). The spectrum for FDNS agrees with those reported in previous studies of Burgers turbulence.36 (b) PDF of Π. (c) Spectrum of KE. The curl up in KE around the maximum resolved k of LES is a common feature of spectral LES solvers applied to Burgers turbulence.38,39,46 In (a)-(c), each curve is produced using 3×10⁵ sequential samples that are 20Δt apart. (d) PDF of ū computed using a kernel estimator.45 Inset panels in (d) show the zoomed-in left and right tails. Shading shows uncertainty as ±1 standard deviation obtained from bootstrapping three independent LES or DNS runs that are combined (each providing 3×10⁵ samples as before). In (b) and (d), σ is the variable's standard deviation.


To examine the statistics of the resolved flow, Figs. 3(c) and 3(d) show the spectrum of KE and the PDF of ū. Both LES with DDP and LES with DSMAG capture the KE spectrum up to near the maximum resolved k (=64), although DDP does slightly better: it agrees with the FDNS KE spectrum up to k ≈ 60, while DSMAG does so only up to k ≈ 50. Furthermore, as shown in panel (d), LES with DDP outperforms LES with DSMAG in capturing the PDF's tails, which correspond to shocks. Note that the differences between the PDFs of DDP, FDNS, and DSMAG are not statistically significant (at the 95% confidence level) based on the KS or KL test, but that is because such tests mainly assess similarities in the bulk rather than the tails of the PDFs. A closer visual inspection shows that the difference between the tails of the PDFs from FDNS and DDP (DSMAG) is within (outside) the uncertainty range, indicating that DDP (DSMAG) accurately captures (does not capture) the statistics of the rare events.

So far, we have discussed the results with an LES resolution of 128 Fourier modes, which, as mentioned before, conforms with the commonly used criterion for LES resolution based on the KE of the filtered flow. With a lower resolution of 96 modes, the DDP model still leads to a stable LES that outperforms LES with DSMAG. Further lowering the resolution to 64 modes renders the LES with DDP unstable. While LES with DSMAG remains stable at this resolution (due to the purely diffusive nature of this SGS model), its accuracy is poor. At this resolution, the ū field accounts for only 40% of the total KE, and LES (with any SGS model) is not expected to be used at such low resolutions.

In summary, the DDP model that uses an ANN trained with augmented data (from Re = Rec) leads to an LES model that is stable in a posteriori tests (at Re = Rec, and at sufficiently high LES resolution) and more accurate than LES with DSMAG. Next, we examine whether a DDP model trained with augmented data from a given Re can be used for LES of a flow with higher Re.

Figure 4 shows the statistics of the resolved flow and of Π calculated using results from a posteriori tests at Re = Rec but with a DDP model that uses an ANN trained on data from Re=Rec/10 (see the dashed blue lines). It is clear that this DDP model does not generalize as the spectrum and PDF of Π and the spectrum of KE all deviate from those of the FDNS. The results are not surprising as it is known that data-driven models often have difficulty with generalization to a different (especially more complex) system. For example, using a multi-scale Lorenz 96 system, we23 showed that ANN- and recurrent neural network-based data-driven SGS models do not accurately generalize when the system is forced to become more chaotic. However, we also showed that transfer learning (TL)47 provides an effective way for addressing this challenge, at least for a simple chaotic toy model. Below, we show the effectiveness of TL in making DDP generalizable to higher Re in a turbulent flow.

FIG. 4.

Statistics of the resolved flow and SGS term calculated using results from a posteriori tests at Re = Rec but with DDP models mainly trained on data from Re=Rec/10. Each curve is produced using 3×105 sequential samples that are 20Δt apart. The DDP model without transfer learning (TL) uses the ANN trained on M=5×105 samples from DNS at Re=Rec/10. The DDP model with TL uses the same ANN but after its last two layers are re-trained with 5×104 samples from DNS at Re = Rec (Fig. 5). (a) Spectrum of Π. (b) PDF of Π. (c) Spectrum of KE.


Figure 5 shows the schematic of TL applied to the ANN of a DDP model. In general, the weights of an ANN are randomly initialized and then updated through training on M samples from a given data distribution (here, data from a flow with Re = Rec/10). The test in Fig. 4 showed that this ANN does not work accurately for Re = Rec. The idea of TL is to re-train this ANN (starting with its current weights rather than random initializations), updating the weights only in the deeper layers, using a smaller number of samples (e.g., MTL = M/10) from the new data distribution (i.e., the flow with Re = Rec). The underlying rationale is that in deep networks, the initial layers learn general features, while the deeper layers learn features that are specific to a particular data distribution.47 Thus, for generalization, we only need to re-train the deeper layers, which can be done using a small amount of data from the new distribution.
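In the Keras sketch shown earlier, TL amounts to reloading the trained network, freezing the earlier layers, and re-training only the last two on the small Re = Rec dataset (the file name and optimizer settings below are hypothetical):

    import tensorflow as tf

    # Start from the network trained at Re = Re_c/10 (its weights, not random ones).
    model = tf.keras.models.load_model("ddp_low_re.h5")  # hypothetical path

    # Freeze all layers except the last two, which are re-trained (Fig. 5).
    for layer in model.layers[:-2]:
        layer.trainable = False

    model.compile(optimizer="adam", loss="mse")  # recompile after freezing
    # model.fit(u_bar_tl, pi_tl, ...)  # M_TL = 5e4 samples from Re = Re_c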

FIG. 5.

Schematic of transfer learning (TL) to develop an accurate DDP model for Re = Rec. Without TL, the ANN in the DDP model is trained, starting with random weights, on M = 5×10⁵ samples from DNS at Re = Rec/10. This DDP model does not generalize to Re = Rec (dashed blue lines in Fig. 4). Then, TL is applied: the weights in the first three layers (blue) of this ANN are fixed, and the last two layers (red) are re-trained, starting with the previously computed weights and using only MTL = 5×10⁴ samples from DNS at Re = Rec. The DDP model with TL is accurate and stable in a posteriori tests at Re = Rec (solid blue lines in Fig. 4).


Figure 4 shows that the DDP model with TL (solid blue lines) accurately generalizes to the flow with Rec, as the spectrum and PDF of Π and the spectrum of KE closely match those of FDNS. In fact, the accuracy of the DDP model with TL in Fig. 4 (which uses only MTL = 5×10⁴ training samples from Rec) is comparable with the accuracy of the DDP model in Fig. 3 (which uses M = 5×10⁵ training samples from Rec). Furthermore, Fig. 6 shows how gradually increasing MTL improves the generalization capability of the DDP model. Finally, a figure in the supplementary material further demonstrates the effect of the number of re-trained layers (as well as MTL), showing that at large enough MTL, re-training more than one layer yields accurate generalization.

FIG. 6.

Spectrum of Π in a posteriori tests at Re = Rec as MTL (the number of training samples from Rec used in TL) is increased. MTL = 0 corresponds to no TL, i.e., the original ANN trained on M = 5×10⁵ samples from Re = Rec/10. Adding MTL = 500-5000 samples improves the generalization capability of the DDP model to some degree. MTL = 10⁴ (2% of M) leads to substantial improvements, although Π̂ is underestimated at high k and overestimated at low k. Increasing MTL to 5×10⁴ (10% of M) further improves the generalization capability, leaving Π̂ only slightly underestimated.


In conclusion, we have investigated ANN-based data-driven SGS modeling of Burgers turbulence, with a particular focus on the stability of a posteriori LES models and generalization to higher Re. We show that developing a DDP model for Burgers turbulence is particularly challenging due to the presence of shocks, which localize the SGS term (Π), resulting in ANNs that overfit in the absence of a large training set. The overfitting ANNs lead to inaccurate/unstable DDP models. To overcome this challenge, we introduce a pre-conditioning step in which, exploiting periodicity, training samples are randomly shifted, thus enriching and augmenting the training set. The DDP model trained on this augmented dataset leads to stable and accurate a posteriori LES models. These results suggest that similar data augmentation strategies that exploit symmetries, invariances, and other physical properties (see Xie et al.,48 Pan and Duraisamy,49 and Formentin et al.50 for more examples) should be considered in developing DDP models for more complex flows when large training sets are unavailable, not only to improve accuracy but also to improve the stability of a posteriori LES runs.

We have also found that the DDP model does not generalize (i.e., extrapolate) to a flow with 10× higher Re. However, we show, for the first time to the best of our knowledge, that TL can make a DDP model generalizable in a turbulent flow. Transfer learning enables the development of DDP models for high-Re flows with most of the training data provided by high-fidelity simulations at lower Re, which is highly appealing for practical purposes because the computational cost of simulating turbulent flows increases rapidly with Re.

In future work, we will investigate the application of TL and data augmentation to develop accurate, stable, and generalizable DDP models for more complex (2D and 3D) turbulent flows.

See the supplementary material for a figure showing the effects of the number of re-trained layers and the number of samples used for re-training on the generalization accuracy.

We thank Romit Maulik, Rambod Mojgani, and Ebrahim Nabizadeh for insightful discussions. We are grateful to two anonymous reviewers for helpful comments and suggestions that improved the quality of the manuscript. This work was supported by an award from the ONR Young Investigator Program, N00014-20-1-2722, and by NSF Grant No. OAC-2005123 (to P.H.). A.C. thanks the Rice University Ken Kennedy Institute for Information Technology for a BP HPC Graduate Fellowship. Computational resources were provided by NSF XSEDE (Allocation No. ATM170020) and by the Rice University Center for Research Computing.

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.4316338 and in GitHub at https://github.com/envfluids/Burgers_DDP_and_TL.

1. S. B. Pope, Turbulent Flows (Cambridge University Press, 2001).
2. P. Sagaut, Multiscale and Multiresolution Approaches in Turbulence: LES, DES and Hybrid RANS/LES Methods: Applications and Guidelines (World Scientific, 2013).
3. J. Ling, A. Kurzawski, and J. Templeton, "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance," J. Fluid Mech. 807, 155–166 (2016).
4. J. N. Kutz, "Deep learning in fluid dynamics," J. Fluid Mech. 814, 1–4 (2017).
5. S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annu. Rev. Fluid Mech. 52, 477–508 (2020).
6. J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, "Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach," Phys. Rev. Lett. 120, 024102 (2018).
7. J.-L. Wu, K. Kashinath, A. Albert, D. Chirila, H. Xiao et al., "Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems," J. Comput. Phys. 406, 109209 (2020).
8. A. T. Mohan, D. Tretiak, M. Chertkov, and D. Livescu, "Spatio-temporal deep learning models of 3D turbulence with physics informed diagnostics," J. Turbul. 21, 484 (2020).
9. A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, "Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network," Nonlinear Processes Geophys. 27, 373–389 (2020).
10. A. Chattopadhyay, E. Nabizadeh, and P. Hassanzadeh, "Analog forecasting of extreme-causing weather patterns using deep learning," J. Adv. Model. Earth Syst. 12, e2019MS001958 (2020).
11. M. Raissi, A. Yazdani, and G. E. Karniadakis, "Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations," Science 367, 1026–1030 (2020).
12. H. Eivazi, H. Veisi, M. H. Naderi, and V. Esfahanian, "Deep neural networks for nonlinear model order reduction of unsteady flows," Phys. Fluids 32, 105104 (2020).
13. S. Pandey, J. Schumacher, and K. R. Sreenivasan, "A perspective on machine learning in turbulent flows," J. Turbul. 21, 567–584 (2020).
14. S. Pan and K. Duraisamy, "Data-driven discovery of closure models," SIAM J. Appl. Dyn. Syst. 17, 2381–2413 (2018).
15. K. Duraisamy, G. Iaccarino, and H. Xiao, "Turbulence modeling in the age of data," Annu. Rev. Fluid Mech. 51, 357–377 (2019).
16. C. Xie, J. Wang, H. Li, M. Wan, and S. Chen, "Artificial neural network mixed model for large eddy simulation of compressible isotropic turbulence," Phys. Fluids 31, 085112 (2019).
17. R. Maulik, O. San, A. Rasheed, and P. Vedula, "Subgrid modelling for two-dimensional turbulence using neural networks," J. Fluid Mech. 858, 122–144 (2019).
18. A. Beck, D. Flad, and C.-D. Munz, "Deep neural networks for data-driven LES closure models," J. Comput. Phys. 398, 108910 (2019).
19. Z. Zhou, G. He, S. Wang, and G. Jin, "Subgrid-scale model for large-eddy simulation of isotropic turbulent flows using an artificial neural network," Comput. Fluids 195, 104319 (2019).
20. T. Bolton and L. Zanna, "Applications of deep learning to ocean data inference and sub-grid parameterisation," J. Adv. Model. Earth Syst. 11, 376–399 (2019).
21. S. Pawar, O. San, A. Rasheed, and P. Vedula, "A priori analysis on deep learning of subgrid-scale parameterizations for Kraichnan turbulence," Theor. Comput. Fluid Dyn. 34, 429 (2020).
22. C. Xie, J. Wang, and E. Weinan, "Modeling subgrid-scale forces by spatial artificial neural networks in large eddy simulation of turbulence," Phys. Rev. Fluids 5, 054606 (2020).
23. A. Chattopadhyay, A. Subel, and P. Hassanzadeh, "Data-driven super-parameterization using deep learning: Experimentation with multi-scale Lorenz 96 systems and transfer learning," J. Adv. Model. Earth Syst. 12, e2020MS002084 (2020).
24. H. Frezat, G. Balarac, J. Le Sommer, R. Fablet, and R. Lguensat, "Physical invariance in neural networks for subgrid-scale scalar flux modeling," Phys. Rev. Fluids 6, 024607 (2021).
25. L. Zanna and T. Bolton, "Data-driven equation discovery of ocean mesoscale closures," Geophys. Res. Lett. 47, e2020GL088376 (2020).
26. M. Kurz and A. Beck, "A machine learning framework for LES closure terms," arXiv:2010.03030 (2020).
27. S. Pawar, S. E. Ahmed, and O. San, "Interface learning in fluid dynamics: Statistical inference of closures within micro–macro-coupling models," Phys. Fluids 32, 091704 (2020).
28. U. Piomelli, W. H. Cabot, P. Moin, and S. Lee, "Subgrid-scale backscatter in turbulent and transitional flows," Phys. Fluids A 3, 1766–1771 (1991).
29. H. T. Hewitt, M. Roberts, P. Mathiot, A. Biastoch, E. Blockley, E. P. Chassignet, B. Fox-Kemper, P. Hyder, D. P. Marshall, E. Popova et al., "Resolving and parameterising the ocean mesoscale in earth system models," Curr. Clim. Change Rep. 6, 137 (2020).
30. J. Smagorinsky, "General circulation experiments with the primitive equations: I. The basic experiment," Mon. Weather Rev. 91, 99–164 (1963).
31. M. Germano, U. Piomelli, P. Moin, and W. H. Cabot, "A dynamic subgrid-scale eddy viscosity model," Phys. Fluids A 3, 1760–1765 (1991).
32. C. Meneveau and T. S. Lund, "The dynamic Smagorinsky model and scale-dependent coefficients in the viscous range of turbulence," Phys. Fluids 9, 3932–3934 (1997).
33. D. Krueger, E. Caballero, J.-H. Jacobsen, A. Zhang, J. Binas, R. L. Priol, and A. Courville, "Out-of-distribution generalization via risk extrapolation (REx)," arXiv:2003.00688 (2020).
34. S. I. Dolaptchiev, U. Achatz, and I. Timofeyev, "Stochastic closure for local averages in the finite-difference discretization of the forced Burgers equation," Theor. Comput. Fluid Dyn. 27, 297–317 (2013).
35. S. S. Girimaji, "Spectrum and energy transfer in steady Burgers turbulence," Phys. Lett. A 202, 279–287 (1995).
36. A. Das and R. D. Moser, "Optimal large-eddy simulation of forced Burgers equation," Phys. Fluids 14, 4344–4351 (2002).
37. M. Love, "Subgrid modelling studies with Burgers' equation," J. Fluid Mech. 100, 87–110 (1980).
38. A. LaBryer, P. J. Attar, and P. Vedula, "A framework for large eddy simulation of Burgers turbulence based upon spatial and temporal statistical information," Phys. Fluids 27, 035116 (2015).
39. R. Maulik and O. San, "Explicit and implicit LES closures for Burgers turbulence," J. Comput. Appl. Math. 327, 12–40 (2018).
40. J. Alcala and I. Timofeyev, "Subgrid-scale parametrization of unresolved scales in forced Burgers equation using generative adversarial networks (GAN)," arXiv:2007.06692 (2020).
41. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, 2016), Vol. 1.
42. P. Ramachandran, B. Zoph, and Q. V. Le, "Searching for activation functions," arXiv:1710.05941 (2017).
43. M. A. Tanner and W. H. Wong, "The calculation of posterior distributions by data augmentation," J. Am. Stat. Assoc. 82, 528–540 (1987).
44. Y. Li and Z. J. Wang, "A priori and a posteriori evaluations of sub-grid scale models for the Burgers equation," Comput. Fluids 139, 92–104 (2016).
45. R. Wilcox, Fundamentals of Modern Statistical Methods: Substantially Improving Power and Accuracy (Springer Science & Business Media, 2010).
46. C. Bayona, J. Baiges, and R. Codina, "Variational multiscale approximation of the one-dimensional forced Burgers equation: The role of orthogonal subgrid scales in turbulence modeling," Int. J. Numer. Methods Fluids 86, 313–328 (2018).
47. J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?," in Advances in Neural Information Processing Systems (2014), pp. 3320–3328.
48. Y. Xie, E. Franz, M. Chu, and N. Thuerey, "tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow," ACM Trans. Graph. 37, 1–15 (2018).
49. S. Pan and K. Duraisamy, "Long-time predictive modeling of nonlinear dynamical systems using neural networks," Complexity 2018, 1 (2018).
50. S. Formentin, M. Mazzoleni, M. Scandella, and F. Previdi, "Nonlinear system identification via data augmentation," Syst. Control Lett. 128, 56–63 (2019).
