A convolutional encoder–decoder-based transformer model is proposed for autoregressive training on spatiotemporal data of turbulent flows. Each future flow field is predicted from the previously predicted flow field, so that long-term predictions can be generated without diverging. A combination of convolutional neural networks and a transformer architecture handles both the spatial and the temporal dimensions of the data. To assess the performance of the model, a priori assessments are conducted, and good agreement with the ground truth data is found. The a posteriori predictions, which are generated after a considerable number of simulation steps, remain bounded but show growing deviations from the ground truth as errors accumulate. The autoregressive training and the prediction of a posteriori states are deemed crucial steps toward the development of more complex data-driven turbulence models and simulations. The proposed model handles the highly nonlinear and chaotic dynamics of turbulent flows and generates reasonably accurate predictions over long time horizons. Overall, this approach demonstrates the potential of deep learning techniques to improve the accuracy and efficiency of turbulence modeling and simulation. The proposed model can be further optimized and extended to incorporate additional physics and boundary conditions, paving the way for more realistic simulations of complex fluid dynamics.

Convection dominates turbulent flows, which makes tasks such as flow control and model reduction complex and challenging: because convection prevails over diffusion, these tasks become nonlinear, high-dimensional, multi-scale, and nonconvex optimization problems. With the vast amount of numerical and experimental data available for turbulent flows, data-driven approaches are gaining popularity in the fluid mechanics community. These approaches use deep learning models to make predictions and represent a valid alternative to traditional methods. This article explores a new data-driven approach based on deep learning to estimate future fluid flow fields from previous ones. The proposed method combines a novel convolutional encoder–decoder transformer model with autoregressive training to achieve long-term spatiotemporal predictions. The approach is tested on two turbulent flow cases, namely, a wake flow past a stationary obstacle and an environmental flow past a tower mounted on a surface. The results show that the proposed method predicts the flow fields accurately, highlighting the potential of data-driven approaches for challenging problems in fluid mechanics.

There are several traditional approaches to temporal estimation, such as Koopman theory and proper orthogonal decomposition, which are suitable for prediction and control.1–3 Data assimilation schemes are also popular, in which the model is updated to reflect new observations.4 In recent years, supervised learning techniques based on neural networks have been applied to capture nonlinear relations between past and future states. For example, a recurrent neural network (RNN) with long short-term memory (LSTM) was used to predict the chaotic Lorenz system, and convolutional networks were used to predict transient flows.5,6 There have also been attempts to approximate the full Navier–Stokes equations with deep neural networks, but the prediction accuracy decreased significantly for chaotic and turbulent flows.7–10 Regarding the estimation of flow fields with deep neural networks, several studies have focused on spatial and temporal reconstruction, as well as spatial supersampling.11–13 Hybrid deep neural network architectures have been designed to capture the spatiotemporal features of unsteady flows,14 and machine learning-based reduced-order models have been proposed for complex three-dimensional flows.15 A deep learning framework combining long short-term memory networks and convolutional neural networks has been used to predict the temporal evolution of turbulent flames.16 However, despite significant progress in accelerating flow simulations, these models still generalize poorly and are sensitive to parameter changes.17

New deep learning architectures for temporal problems in unstructured and structured data are emerging, with transformers being one of the most promising. These models use self-attention mechanisms to weigh the significance of each part of the input data differently,18,19 without the need for a recurrent architecture. Inspired by neighborhood-like notions in convolutional neural networks, transformers build features of the inputs using a self-attention mechanism that determines the importance of the other elements of the input sequence with respect to the current element. The updated features are simply the sum of linear transformations of all features, weighted by their importance. Transformers thus avoid recurrence: the self-attention mechanism combines the similarity score between the elements of a sequence with the positional embedding of these elements, allowing the model to account for the full sequence instead of single elements. These models have been successful in natural language processing (NLP) tasks, such as translation and text summarization, and are becoming the model of choice for NLP problems, replacing classical recurrent neural network (RNN) models such as long short-term memory (LSTM).20–22 Transformers have also been applied to image processing tasks, in combination with convolutional neural networks, to capture relationships between different portions of an image.23–25 Hybrid architectures combining convolutional layers with transformers have achieved excellent results in several computer vision tasks.26,27 In a spatiotemporal context, transformers have been used for video-understanding tasks, capturing spatial and temporal information through divided space–time attention.28,29 In fluid mechanics, attention mechanisms have enhanced reduced-order models by extracting temporal feature relationships from high-fidelity numerical solutions.30 Recently, autoregressive transformers were proposed for the spatiotemporal prediction of two-dimensional homogeneous isotropic turbulence.31 However, to the best of our knowledge, transformers have not yet been used for the spatiotemporal prediction of turbulent flows past obstacles, such as the wake and environmental flows considered here.
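The statement that the updated features are a sum of linearly transformed features weighted by their importance can be made concrete with a minimal single-head scaled dot-product self-attention. The sketch below is illustrative only: the function names, random weights, and sizes are hypothetical, and no positional embedding is added, which is why permuting the input simply permutes the output.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (n, d) sequence of n feature vectors.
    w_q, w_k: (d, d_k) query/key projections; w_v: (d, d_v) value projection.
    Returns the updated features (n, d_v) and the attention weights (n, n).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity score between every pair of sequence elements.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    alpha = softmax(scores, axis=-1)  # each row sums to one
    # Each output is an importance-weighted sum of the linearly transformed inputs.
    return alpha @ v, alpha

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
y, alpha = self_attention(x, w_q, w_k, w_v)
print(y.shape, alpha.sum(axis=1))
```

Because every element attends to every other element in one matrix product, no recurrence over the sequence is needed; order information has to be injected separately through positional embeddings.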

The present contribution is organized as follows. First, the deep learning method based on the convolutional self-attention transformer is presented, followed by the autoregressive training procedure. Section III then assesses the performance of the proposed approach on (i) a turbulent flow past an obstacle embedded in a rectangular domain and (ii) a flow past a surface-mounted tower in an open flow. A discussion and a conclusion close the paper.

The primary focus of this contribution is the challenge of learning the spatiotemporal dynamics of turbulent flows, which are known for their high complexity, nonlinear behavior, and high dimensionality. There are two main approaches to estimating a reference spatiotemporal field $X_t$: (i) reconstruction, which uses limited measurements $\tilde{X}_t$ at a specific time $t$ to reconstruct the full field $X_t$ at the same time, and (ii) prediction, in which a dynamical model advances the field in time based on previous estimates. Here, spatiotemporal learning is formulated as follows: given a time series of $N$ sequential snapshots $x_t, x_{t+\Delta t}, \ldots, x_{t+(N-1)\Delta t}$, the task is to predict the same quantity of interest $M$ steps ahead in time. The input $X$ of the deep learning model is thus $x_t, x_{t+\Delta t}, \ldots, x_{t+(N-1)\Delta t}$, and the output $Y$ is $x_{t+N\Delta t}, \ldots, x_{t+(N+M-1)\Delta t}$. Each snapshot $x_t$ can be a scalar field or a vector field containing multiple features.
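The input/output formulation with $N$ past and $M$ future snapshots amounts to cutting the stored time series into sliding windows. The following is a minimal sketch, assuming the snapshots are stored as a NumPy array of shape (T, C, H, W); the helper name `make_windows` and the toy sizes are hypothetical and not part of the original work.

```python
import numpy as np

def make_windows(series, n_in, m_out):
    """Split a snapshot time series into (input, target) pairs (hypothetical helper).

    series: array of shape (T, C, H, W), snapshots sampled every dt.
    Each input holds n_in consecutive snapshots x_t, ..., x_{t+(N-1)dt};
    each target holds the next m_out snapshots x_{t+N dt}, ..., x_{t+(N+M-1)dt}.
    """
    T = series.shape[0]
    n_pairs = T - n_in - m_out + 1
    if n_pairs <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([series[s:s + n_in] for s in range(n_pairs)])
    Y = np.stack([series[s + n_in:s + n_in + m_out] for s in range(n_pairs)])
    return X, Y

# Toy series: 10 snapshots of a 2-component field on a 4 x 6 grid.
series = np.random.default_rng(0).standard_normal((10, 2, 4, 6))
X, Y = make_windows(series, n_in=1, m_out=2)
```

If the input snapshots are to be concatenated channel-wise before the first convolution, `X.reshape(len(X), -1, H, W)` would merge the time and channel axes; whether the original implementation orders them this way is an assumption.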

Transformer models were created to address problems in natural language processing, such as sentence completion and translation, in which words are embedded one by one. These tasks involve sequences of words or sentence tensors ordered in time and can therefore be considered temporal learning problems.18 Transformer models have also demonstrated impressive results in a range of other tasks, including learning image patches as sequences, image reconstruction, and image completion.18,26,27 As a result, they now challenge long short-term memory (LSTM) models, the de facto standard RNNs, and are becoming the preferred state-of-the-art approach for a variety of temporal learning tasks.

Like RNNs, transformers are designed to handle sequential input data. Unlike RNNs, however, they do not necessarily process the data in order. Instead, the attention mechanism provides context for any position in the input sequence, and self-attention learns the attention weights themselves. For spatiotemporal data, attention can be applied to the spatial as well as the temporal sequence. Vanilla transformers in their original form are pure sequence-to-sequence models: they learn a target output sequence from an input sequence, i.e., they perform the transformation at the sequence level. Their limitations, such as disrupted temporal coherence and a failure to capture long-term dependencies, became apparent in sentence completion and language generation tasks, where difficulties arose when generating text with a model that learns sequences without knowledge of the full sequences.19,32,33 Several studies, such as that of Dai et al.,34 addressed this inability to capture long-term dependencies by attending to memories of previously computed hidden states, at the expense of computational cost. To deal with some of these issues, autoregressive transformers were proposed in Ref. 35 for sentence and image completion tasks. Although this is not explicitly stated in some works, the Generative Pre-trained Transformer (GPT) family of models22,36,37 consists of autoregressive transformers inspired by the decoder part of the original transformer. In Ref. 35, Katharopoulos et al. showed that a self-attention layer trained in an autoregressive fashion can be seen as a recurrent neural network. When the input and target output tensors are spatiotemporal, transformers can be combined with classic convolutional encoder–decoder models to harness their full potential.
Since locality matters most for learning small-scale features, this combination is a powerful method for a variety of computer-vision problems, including video-frame prediction. Applied to convolutional layers, the self-attention mechanism not only attends to the most significant elements of a sequence but also improves the representation of spatially relevant regions by focusing on important features and suppressing less important ones.38
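The observation of Ref. 35 that autoregressive self-attention can be read as a recurrent network is easiest to see for causal linear attention, the variant studied there. The sketch below uses the positive feature map elu(x) + 1 from that work and checks that the masked parallel form and the step-by-step recurrent form agree. It illustrates the equivalence only: it is not the convolutional transformer layer of this paper, and ordinary softmax attention does not admit this constant-size recurrent state. Function names are hypothetical.

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_parallel(q, k, v):
    """Masked (causal) linear attention computed for the whole sequence at once."""
    scores = phi(q) @ phi(k).T                # (n, n) unnormalized similarities
    scores = scores * np.tril(np.ones_like(scores))  # element i only sees j <= i
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def causal_linear_attention_recurrent(q, k, v):
    """Same operator evaluated step by step, like an RNN with state (S, z)."""
    S = np.zeros((k.shape[1], v.shape[1]))   # running sum of phi(k_j) v_j^T
    z = np.zeros(k.shape[1])                 # running sum of phi(k_j)
    out = []
    for qi, ki, vi in zip(phi(q), phi(k), v):
        S += np.outer(ki, vi)
        z += ki
        out.append(qi @ S / (qi @ z))
    return np.array(out)

rng = np.random.default_rng(2)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
y_par = causal_linear_attention_parallel(q, k, v)
y_rec = causal_linear_attention_recurrent(q, k, v)
```

The recurrent form only stores the running sums `S` and `z`, so each new prediction costs the same regardless of how long the sequence already is.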

When a transformer block follows a convolutional layer, the model learns to highlight significant features across the channel sequence and the spatial dimensions. The input sequences are first concatenated channel-wise in the input layer, and subsequent convolutional operations take place in the encoder. Within the convolutional layers, the intermediate feature maps $F \in \mathbb{R}^{C \times H \times W}$ of a given layer pass through the self-attention convolutional transformer layer $\mathrm{conv}_\alpha$, which attends to both the spatial representation and the positional embeddings of the input sequence channels. In $\mathrm{conv}_\alpha$, let $x, y \in \mathbb{R}^{C}$ denote the input and output intermediate features at a spatial node, where $C$ is the number of intermediate channels. With $i$ and $j$ indexing the $H \times W$ spatial nodes, a standard convolution reads
$y_i = \sum_{j \in \mathcal{N}(i)} W_{i-j}\, x_j, \qquad (1)$
where $\mathcal{N}(i)$ denotes the spatial nodes in a local neighborhood defined by a kernel of size $k \times k$ centered at node $i$, $i-j$ denotes the relative spatial position of $j$ with respect to $i$, and $W_{i-j} \in \mathbb{R}^{C \times C}$ is the corresponding weight matrix. Self-attention on intermediate convolutional features, on the other hand, uses three weight matrices $W_q, W_k, W_v \in \mathbb{R}^{C \times C}$ to compute the query, key, and value, respectively. For each convolution window, the self-attention is given as
(2)
where the attention weight $\alpha_{ij} \in (0, 1)$ is a scalar that controls the contribution of the values at spatial node $j$, $W_{qk} \in \mathbb{R}^{C \times k^2}$, and $[j]$ denotes the $j$th element of a tensor. The weights $\alpha$ are usually normalized by a softmax operation such that $\sum_j \alpha_{ij} = 1$. These operations are summarized in Fig. 1.
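To make the neighborhood notation concrete, the sketch below implements the local convolution of Eq. (1) and a standard softmax self-attention restricted to each $k \times k$ window, with the weights normalized so that $\sum_j \alpha_{ij} = 1$. It is a simplified stand-in: the exact form of Eq. (2), including how $W_{qk}$ enters, is not reproduced, zero padding is used at the borders for brevity, and all function names are hypothetical.

```python
import numpy as np

def local_windows(x, k=3):
    """Gather the k x k neighbourhood N(i) of every node: (C, H, W) -> (H, W, k*k, C).

    Zero padding is used at the borders for simplicity.
    """
    C, H, W = x.shape
    r = k // 2
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))
    win = np.empty((H, W, k * k, C))
    for a in range(k):
        for b in range(k):
            # Neighbour at offset (a - r, b - r) from the centre node i.
            win[:, :, a * k + b, :] = xp[:, a:a + H, b:b + W].transpose(1, 2, 0)
    return win

def local_conv(x, weights, k=3):
    """Eq. (1): y_i = sum_{j in N(i)} W_{i-j} x_j, one (C_out, C) matrix per offset."""
    win = local_windows(x, k)                          # (H, W, k*k, C)
    return np.einsum("hwpc,poc->ohw", win, weights)    # (C_out, H, W)

def local_self_attention(x, w_q, w_k, w_v, k=3):
    """Softmax self-attention restricted to each k x k window (standard form, assumed)."""
    win = local_windows(x, k)                          # (H, W, k*k, C)
    q = x.transpose(1, 2, 0) @ w_q                     # query of the centre node
    keys, vals = win @ w_k, win @ w_v                  # keys/values of the neighbours
    scores = np.einsum("hwc,hwpc->hwp", q, keys) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=-1, keepdims=True)         # sum_j alpha_ij = 1 per window
    y = np.einsum("hwp,hwpc->hwc", alpha, vals)
    return y.transpose(2, 0, 1), alpha                 # (C, H, W), (H, W, k*k)

rng = np.random.default_rng(1)
C, H, W, k = 4, 6, 5, 3
x = rng.standard_normal((C, H, W))
weights = 0.1 * rng.standard_normal((k * k, C, C))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
y_conv = local_conv(x, weights, k)
y_att, alpha = local_self_attention(x, w_q, w_k, w_v, k)
```

The convolution applies the same learned matrices $W_{i-j}$ at every node, whereas the attention weights $\alpha_{ij}$ change with the input; this is the contrast between a learnable filter and a dynamic kernel that the convolutional transformer layer combines.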
FIG. 1.

The convolutional transformer layer is composed of two blocks: the batched matrix multiplication (BMM) and the self-attention summation. The BMM block corresponds to $W_{i-j}x_j$ in Eq. (2), with the batch dimension being the number of spatial locations. The summation block performs $k \times k$ different input-dependent summations with the weights $\alpha$ of Eq. (2). The layer thus contains both a learnable filter and a dynamic kernel.

Combining Eqs. (1) and (2) yields both an input-sequence-dependent kernel and learnable convolution filters, which together give the final output feature map $F$ of the convolutional transformer layer:
(3)

The present self-attention convolutional transformer layer uses a 3 × 3 kernel and incorporates the representation of convolutional features. Combining convolutional neural networks with self-attention thus offers superior capabilities for learning spatiotemporal structures, which should benefit turbulent flows and computational fluid dynamics (CFD) in general, where one must learn spatial filters as well as temporal embeddings and dependencies. In addition to using the convolutional transformer layer, the model is trained in an autoregressive fashion. Formally, autoregressive models forecast future sequences from previously forecasted sequences in a cyclical way; the prefix auto thus indicates the regression of the variable sequence against itself.

In turbulent flow problems, the high-dimensional state space is characterized by intricate spatiotemporal dynamics, so dimensionality reduction techniques can be useful.39 The prediction and reconstruction problems can be interpreted as estimating a reduced or latent state, which makes encoder–decoder architectures a natural choice. The encoder takes the input tensors, learns their most relevant parts, and maps them to a latent representation. The decoder then converts this latent representation into the target output tensors through successive up-samplings and convolutions. Because the encoder and decoder are connected, their weight matrices learn jointly to map the input to the output tensors, allowing small-scale features to be learned. The decoder transforms the latent representation, of dimension $n_z \times n_z$, back to the original spatial dimensions of the target output $X_{t+\Delta t}$.
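The shape bookkeeping of the encoder–decoder can be sketched with parameter-free operations. In the sketch below, each of four stages halves the resolution by 2 × 2 average pooling in the encoder and doubles it by nearest-neighbour up-sampling in the decoder, so a 64 × 64 snapshot maps to an $n_z \times n_z = 4 \times 4$ latent grid. The resolution, the pooling and up-sampling operators, and the function names are assumptions for illustration; the actual model interleaves learned convolutions and convolutional transformer layers between these steps.

```python
import numpy as np

def avg_pool2(x):
    """2 x 2 average pooling on (C, H, W); H and W must be even."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x up-sampling on (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def encode(x, stages=4):
    for _ in range(stages):
        x = avg_pool2(x)   # a learned encoder would apply convolutions here too
    return x

def decode(z, stages=4):
    for _ in range(stages):
        z = upsample2(z)   # a learned decoder would follow each step with convolutions
    return z

x = np.random.default_rng(3).standard_normal((2, 64, 64))
z = encode(x)              # latent grid of size n_z x n_z
x_rec = decode(z)
```

With purely linear pooling the decoder can only restore the block averages; the learned convolutions and the skip between encoder and decoder are what allow small-scale features to be recovered.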

For the model $\mathcal{M}$ shown in Fig. 2, multi-step training is performed for the quantity $X_t$ in an autoregressive manner, i.e., $X_{t+\Delta t}$ is predicted from the previously predicted $X_t$, where $t$ is a non-dimensional time. In other words, an initial condition $X_t$ is input to the model to predict $\hat{X}_{t+\Delta t}$; this prediction is then fed back to the model to predict $\hat{X}_{t+2\Delta t}$, and so on:
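$\hat{X}_{t+\Delta t} = \mathcal{M}(X_t), \qquad \hat{X}_{t+(k+1)\Delta t} = \mathcal{M}(\hat{X}_{t+k\Delta t}), \quad k \geq 1, \qquad (4)$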
where $\Delta t$ is the time step and $X_t \in \mathbb{R}^{C \times H \times W}$ is the input snapshot at instant $t$. In the following, the autoregressive training sequence length is set to two to limit the computational cost.
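The autoregressive training objective with a sequence length of two can be sketched as an unrolled rollout followed by an equally weighted sum of per-step mean squared errors. This is a minimal NumPy illustration with a toy linear "model"; whether Eq. (5) sums or averages the per-step errors is not stated here, so the summation is an assumption, and the function names are hypothetical. In an automatic-differentiation framework, the gradient of this loss would propagate through both unrolled steps.

```python
import numpy as np

def rollout(model, x0, n_steps):
    """Feed each prediction back as the next input: X_hat_{t+(k+1)dt} = M(X_hat_{t+k dt})."""
    preds, x = [], x0
    for _ in range(n_steps):
        x = model(x)          # the prediction becomes the next input
        preds.append(x)
    return np.stack(preds)

def autoregressive_loss(model, x0, targets):
    """Equally weighted sum of the per-step MSEs over the unrolled sequence."""
    preds = rollout(model, x0, len(targets))
    axes = tuple(range(1, preds.ndim))
    per_step = ((preds - targets) ** 2).mean(axis=axes)
    return per_step.sum(), per_step

# Toy check with a hypothetical linear "model" and a sequence length of two.
model = lambda x: 0.5 * x
x0 = np.ones((1, 4, 4))
targets = np.zeros((2, 1, 4, 4))
loss, per_step = autoregressive_loss(model, x0, targets)
```

Here the first step contributes 0.25 and the second 0.0625, so the total loss is 0.3125; the error of the second step depends on the first prediction, which is exactly what autoregressive training exposes the model to.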
FIG. 2.

Convolutional encoder–decoder transformer architecture. Model architecture of the convolutional encoder–decoder transformer processing low- and high-level features. The canonical four-stage design is used in addition to the convolutional transformer blocks or layers. $H$ and $W$ are the input resolutions of each snapshot in the $T_{in}$ and $T_{out}$ sequences.

In order to preserve meaningful values at the boundaries, the convolutional filters used in the proposed architecture incorporate a symmetric boundary condition into the padding operation. While padding is typically used to retain the spatial dimensions of the field undergoing convolution, zero-padding does not accurately represent the expected physical behavior. Indeed, padding with zeros everywhere would violate the representation of existing boundary conditions, for example, the notion of wall boundaries would have lesser significance if a region is padded with zeros on all the sides in a channel flow. To preserve the boundary conditions after multiple successive convolutions, a boundary condition formulation was implemented such that the walls could be padded with zeros if required, while the other sides could be padded with adequate values from the symmetric cells. To train the model, the Adam40 optimizer is used to iteratively minimize the total equi-weighted mean squared error (MSE) loss defined by
(5)
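The boundary-aware padding described above can be sketched with NumPy's `np.pad` in `"symmetric"` mode, which mirrors the field including its edge values, and then overwriting the padded cells on wall sides with zeros. Which sides count as walls (here only the bottom, as for the no-slip ground of case 2), the restriction to a single 2D field, and the name `pad_with_bc` are assumptions made for illustration.

```python
import numpy as np

def pad_with_bc(field, pad=1, walls=("bottom",)):
    """Pad a 2D field (H, W) with mirrored values, and zeros on the listed walls.

    Row 0 is taken as the top of the domain (hypothetical convention).
    """
    out = np.pad(field, pad, mode="symmetric")  # mirror interior values, edge included
    if "top" in walls:
        out[:pad, :] = 0.0
    if "bottom" in walls:
        out[-pad:, :] = 0.0
    if "left" in walls:
        out[:, :pad] = 0.0
    if "right" in walls:
        out[:, -pad:] = 0.0
    return out

f = np.arange(12.0).reshape(3, 4)
out = pad_with_bc(f, walls=("bottom",))
```

Compared with zero padding on all sides, only the wall rows see zeros, so a convolution near an open or symmetric boundary sees values close to the interior field rather than an artificial wall.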

The activation function used in the network is ReLU, which is known to help stabilize the weight updates during training.41 During training, the entire training dataset was presented to the network repeatedly after shuffling; each complete pass is called an epoch. An early stopping criterion was used to end training, and the learning rate was reduced whenever the loss did not improve over 100 epochs. The network was implemented on the TensorFlow platform42 and trained on NVIDIA Tesla V100 graphics processing units (GPUs).
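The learning-rate reduction and early stopping logic can be sketched in plain Python. Only the 100-epoch patience for the learning-rate reduction comes from the text; the reduction factor, the early-stopping patience, and the class name `PlateauController` are assumptions. In TensorFlow, comparable behavior is available through the Keras `ReduceLROnPlateau` and `EarlyStopping` callbacks, although which settings the original training used is not stated.

```python
class PlateauController:
    """Reduce the learning rate after `lr_patience` epochs without improvement
    and signal a stop after `stop_patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, lr_patience=100, stop_patience=300, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.since_best = 0   # epochs since the last improvement
        self.since_lr = 0     # epochs since the last improvement or LR reduction

    def step(self, val_loss):
        """Update with this epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.since_best = 0
            self.since_lr = 0
        else:
            self.since_best += 1
            self.since_lr += 1
            if self.since_lr >= self.lr_patience:
                self.lr *= self.factor
                self.since_lr = 0
        return self.since_best >= self.stop_patience

# Short patiences so the behaviour is visible on a handful of epochs.
ctrl = PlateauController(1e-3, factor=0.5, lr_patience=2, stop_patience=5)
flags = [ctrl.step(loss) for loss in [1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95]]
```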

The evolution of the velocity $u$ and pressure $p$ of an incompressible fluid is governed by the Navier–Stokes momentum and continuity equations, which read, for a positive constant density $\rho$ and dynamic viscosity $\mu$,
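$\rho\left(\partial_t u + u \cdot \nabla u\right) - \nabla \cdot \left(2\mu\, \varepsilon(u) - p\, I_d\right) = f, \qquad \nabla \cdot u = 0. \qquad (6)$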
Equation (6) includes the strain-rate tensor $\varepsilon(u)$, a function of the velocity, the $d$-dimensional identity tensor $I_d$, and an additional forcing or source term $f$. Suitable boundary and initial conditions complete the equations for a given physical problem. Turbulence is accounted for by including an eddy viscosity $\mu_t$ in the equations, which is modeled based on one or more turbulent scales. The eddy viscosity is computed using the Spalart–Allmaras (SA) model,43 a one-equation model that describes the evolution of the kinematic eddy viscosity-like variable $\tilde{\nu}$ through a nonlinear convection–diffusion–reaction equation, with a damping function that enforces a linear profile in the viscous sublayer. More details on the implementation of this model can be found in Ref. 44, and more details on turbulent eddy viscosity models in Ref. 45.

The turbulent flow around a 2D square cylinder, a widely used benchmark, is examined first. Based on the reference velocity $U$ and the lateral size $H$ of the cylinder placed at the center of the domain, the Reynolds number is set to 22 × 10³. The computational domain spans $[-5H, 15H] \times [-7H, 7H]$, i.e., 20 length units in the streamwise $x$ direction and 14 length units in the crosswise $y$ direction. To conduct an unsteady Reynolds-averaged Navier–Stokes (URANS) or variational multi-scale (VMS) simulation, the domain is discretized into a sufficient number of cells (around 100 000) using an in-house finite-element flow solver.44,46–48 The inflow boundary conditions are $u = (V_{in}, 0)$ and $\tilde{\nu} = 3\nu$, leading to an eddy-to-kinematic viscosity ratio of about 0.2. The side boundaries are treated as symmetric, with $\partial_y u_x = u_y = 0$ and $\partial_y \tilde{\nu} = 0$. At the outflow, $\partial_x u_x = \partial_x u_y = 0$, $\partial_x \tilde{\nu} = 0$, and $p = 0$ are enforced. At the cylinder surface, the no-slip conditions $u = 0$ and $\tilde{\nu} = 0$ are applied. The simulation runs for a physical time of 5000 s with a time step of Δt = 0.05 s. A stable flow is reached after about 200 s, and the data for the remaining 4800 s are collected for training and testing. Data are recorded every Δt = 0.25 s, resulting in about 1500 snapshots. In terms of the non-dimensional time $t^* = tU/H$, this sampling interval corresponds to Δt* = 1. Approximately 24 shedding cycles are observed in the simulation data. With a 70/30 split, in which 70% of the data is used for training and 30% for validation and testing, the training data contain 16 shedding cycles, which seems sufficient to characterize the dynamics of this relatively simple wake flow past a two-dimensional square cylinder. Figure 3(a) shows a sketch of this case.
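The eddy-to-kinematic viscosity ratio of about 0.2 quoted for the inflow value $\tilde{\nu} = 3\nu$ follows from the standard Spalart–Allmaras relation $\nu_t = \tilde{\nu} f_{v1}$ with $f_{v1} = \chi^3/(\chi^3 + c_{v1}^3)$, $\chi = \tilde{\nu}/\nu$, and $c_{v1} = 7.1$. The short check below evaluates it; the function name is hypothetical.

```python
def sa_eddy_viscosity_ratio(chi, c_v1=7.1):
    """Spalart-Allmaras: nu_t / nu = chi * f_v1, with f_v1 = chi^3 / (chi^3 + c_v1^3)
    and chi = nu_tilde / nu."""
    f_v1 = chi**3 / (chi**3 + c_v1**3)
    return chi * f_v1

ratio = sa_eddy_viscosity_ratio(3.0)
print(f"nu_t / nu = {ratio:.3f}")  # prints nu_t / nu = 0.210
```

The damping function $f_{v1}$ is small for low $\chi$, which is why an inflow $\tilde{\nu}$ three times the molecular viscosity yields an eddy viscosity of only about a fifth of it.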

FIG. 3.

Setup cases used in this study. (a) Case 1 setup: Wake-flow past a square cylinder. (b) Case 2 setup: Environmental flow over a surface-mounted tower.


The turbulent flow past a two-dimensional (2D) rectangular tower on the land surface is considered next. The Reynolds number is set to 45 × 10², based on the reference velocity $U$ and the side length $H$ of the square tower placed on the surface. The computational domain spans $[-5H, 30H] \times [-H, 7H]$, i.e., 35 length units in the streamwise $x$ direction and 8 length units in the crosswise $y$ direction, and is discretized into a sufficiently large number of cells (around 100 000) to perform a URANS or VMS simulation. The inflow boundary conditions are $u = (V_{in}, 0)$ and $\tilde{\nu} = 3\nu$, which corresponds to an eddy-to-kinematic viscosity ratio of ∼0.2. At the top of the domain, the velocity component normal to the boundary is set to zero. The no-slip boundary conditions $u = 0$ and $\tilde{\nu} = 0$ are imposed at the tower surface and at the bottom surface y = −1. The time step is Δt = 0.01 s, and 300 s are simulated. Around 100 s are required to reach a statistically steady state, in which periodic vortex shedding is observed. The data of the remaining 200 s (i.e., ∼20 × 10³ time steps) are stored for training and testing. The data are sampled every Δt = 0.1 s, yielding around 2000 snapshots; in terms of the non-dimensional time $t^* = tU/H$, this sampling interval corresponds to Δt* = 1. Figure 3(b) shows a sketch of this case. A free separated shear layer first develops above the tower, becomes wavy, and then reattaches at the bottom surface of the domain. The shear layer flaps, and vortical structures are shed from it. Approximately 18 shedding cycles are observed in the simulation data. With the same 70/30 split (70% of the data for training and 30% for validation and testing), the training data contain 12 shedding cycles, which is enough to reasonably characterize the dynamics of the environmental flow over the surface-mounted obstacle.
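The data bookkeeping for case 2 can be checked with a few lines of arithmetic using only the values stated above. The derived ratio $U/H = 10\ \mathrm{s}^{-1}$ follows from requiring that the 0.1 s sampling interval equal Δt* = 1; the variable names are illustrative.

```python
# Case 2 bookkeeping, using only values stated in the text.
dt_solver = 0.01    # solver time step [s]
dt_sample = 0.1     # snapshot sampling interval [s]
t_stored = 200.0    # stored duration after the transient [s]
n_cycles = 18       # approximate number of shedding cycles in the stored data

n_steps = round(t_stored / dt_solver)       # solver steps in the stored window
n_snapshots = round(t_stored / dt_sample)   # stored snapshots
U_over_H = 1.0 / dt_sample                  # dt* = dt U / H = 1  =>  U / H = 1 / dt [1/s]
n_train = round(0.7 * n_snapshots)          # 70/30 split
cycles_train = 0.7 * n_cycles               # shedding cycles in the training part

print(n_steps, n_snapshots, U_over_H, n_train, cycles_train)
```

The 70% share of 18 cycles is about 12.6, consistent with the roughly 12 full shedding cycles contained in the training data.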

In this section, the results are discussed as follows. First, the temporal evolutions of the quantities are compared, followed by spatial measurements of the velocity components at various times. Next, the temporal propagation of errors and correlation coefficients is compared, along with the propagation of phase shifts. Contour plots of the quantities are also compared to provide a qualitative assessment. These comparisons are performed for both cases and for both the a priori and the a posteriori simulations, as illustrated in Fig. 4. First, an a priori simulation is performed using data samples that were not used during training. The trained model is fed the snapshot at instant t and predicts the next two snapshots at instants t + Δt and t + 2Δt; the process is repeated by feeding the subsequent snapshots from the dataset until the number of predicted snapshots matches that of the ground truth time series. Because snapshots from the dataset are used as inputs, this approach is termed an a priori deep learning simulation. An a posteriori simulation, on the other hand, is performed by feeding a snapshot at instant t from the same held-out dataset and predicting the next two snapshots at instants t + Δt and t + 2Δt. The predicted snapshot at instant t + 2Δt is then injected back into the model to predict the snapshots at instants t + 3Δt and t + 4Δt, and the process is repeated until the number of predicted snapshots matches that of the true snapshots. This recycling of the model predictions is termed an a posteriori deep learning simulation. Once time series of equal length are obtained, both the a priori and the a posteriori results can be compared with the ground truth from the dataset. Figure 5 shows the temporal evolution of the ensemble average of the velocity magnitude for case 1 and case 2.
For case 1, both the a priori and the a posteriori predictions agree well with the truth, whereas for case 2 the predictions, though fairly accurate, show some deterioration. Moreover, the accuracy of the a posteriori predictions indicates that the model has learned the global long-term dynamics, at least in terms of ensemble averages.
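The difference between the two evaluation modes can be sketched as two loops around the same model, which maps one snapshot to the next two. The sketch assumes the a priori loop restarts from every second true snapshot (stride two); the text only says "subsequent snapshots", so this stride, the function names, and the toy model are assumptions. With a model that drifts by 1% per step, the a priori error stays bounded while the a posteriori error keeps growing, mirroring the error accumulation discussed in this section.

```python
import numpy as np

def a_priori(model, truth):
    """Always restart from ground-truth snapshots truth[0], truth[2], ..."""
    n_pairs = (len(truth) - 1) // 2
    preds = []
    for s in range(n_pairs):
        p1, p2 = model(truth[2 * s])       # predictions for t + dt and t + 2dt
        preds.extend([p1, p2])
    return np.stack(preds)

def a_posteriori(model, truth):
    """Start from truth[0] once, then recycle the model's own predictions."""
    n_pairs = (len(truth) - 1) // 2
    preds, x = [], truth[0]
    for _ in range(n_pairs):
        p1, p2 = model(x)
        preds.extend([p1, p2])
        x = p2                              # re-inject the prediction at t + 2dt
    return np.stack(preds)

# Toy check: the true field is constant; the hypothetical model drifts by 1% per step.
truth = np.ones((11, 2))
model = lambda x: (1.01 * x, 1.01 ** 2 * x)
prior = a_priori(model, truth)
posterior = a_posteriori(model, truth)
err_prior = np.abs(prior - truth[1:11]).max(axis=1)
err_posterior = np.abs(posterior - truth[1:11]).max(axis=1)
```

Both outputs are aligned with `truth[1:1 + 2 * n_pairs]`, so the two simulations can be compared against the same reference series.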

FIG. 4.

Illustration of a priori and a posteriori simulations. Left: in an a priori simulation, each $X_t$ from $\{X\}$, the dataset not used during training, is fed to the model. Right: in an a posteriori simulation, the inputs $\hat{X}_t$ are the model's own previous predictions, starting from a suitable initial $X_t$.

FIG. 5.

Temporal evolution of the ensemble averages of the a priori and a posteriori velocity magnitude compared to the true values (black). Left: ensemble mean of the spatial values of velocity magnitude for case 1 (a priori and a posteriori predictions compared to ground truth). Right: ensemble mean of the spatial values of velocity magnitude for case 2 (a priori and a posteriori predictions compared to ground truth).


The accuracy of the predictions is further verified by comparing values at various streamwise and cross-streamwise locations, marked with dashed lines in Fig. 6 for both cases. For case 1, measurements were made at the streamwise locations x = [−2.5H, 8H, 12H, 16H] and the cross-streamwise locations y = [−2H, 0, 2H]; for case 2, they were made at x = [−2.5H, 8H, 16H, 32H] and y = [0.5H, 1.5H, 5H]. As the wake flows are of primary interest, these locations were chosen in the region of interest away from the obstacle in both cases. To assess the temporal evolution, the predictions were compared at given fractions of the total number of predicted snapshots. As a reminder, around 200 snapshots were predicted for case 1 and around 100 snapshots for case 2. The predictions are compared at t = [2%, 33%, 66%] of this horizon to verify the quality of the temporal evolution.

FIG. 6.

Locations of the probe lines used for comparison with the reference quantities. Top: Lines along streamwise and cross-streamwise directions for case 1. Bottom: For case 2.


Figure 7 shows the temporal evolution of the predicted streamwise velocity component u0 measured along the cross-streamwise lines. The a priori predictions closely follow the reference values, indicating good short-term agreement along the measured spatial directions. For the a posteriori predictions, the deviation from the reference increases as time evolves, which can be attributed to error accumulation in long-term predictions. Similarly, Fig. 8 shows the evolution of the same quantity (u0) measured at the streamwise locations. A similar trend is observed, with the a posteriori predictions deteriorating as time evolves. Note that the upstream predictions at x = −2.5H remain better over time, as this location is not affected by the turbulent wake. Overall, the measurements indicate decent agreement of both the short-term a priori predictions and the long-term a posteriori predictions with the reference solutions.

FIG. 7.

Comparative predictions of streamwise velocity component (u0) for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row contains instantaneous predictions at t = 0.33 Tn, and the bottom row contains instantaneous predictions at t = 0.66 Tn.

FIG. 8.

Comparative predictions of streamwise velocity component (u0) sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row contains instantaneous predictions at t = 0.33 Tn, and the bottom row contains instantaneous predictions at t = 0.66 Tn.


Next, the temporal evolution of the prediction error is investigated by computing the relative mean squared error of the velocity magnitude with respect to the reference for both cases, at the locations mentioned earlier. Figure 9(a) shows the evolution of the error for the a priori predictions and Fig. 9(b) for the a posteriori predictions of case 1, measured at the streamwise locations. As expected, the errors accumulate in the long-term a posteriori predictions, which clearly distinguishes them from the a priori predictions. Interestingly, although the error magnitude increases over time, its evolution also follows the vortex shedding cycles, indicating that the trained model retains the shedding dynamics in long-term a posteriori predictions. A similar trend is observed for case 2, as shown in Figs. 9(c) and 9(d), although the magnitude of the accumulated errors is higher than for case 1. As shown in Fig. 10, a similar trend is observed for the errors measured at the cross-streamwise locations.
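A relative mean squared error per time step along a probe line could be computed as below. The normalization (squared error summed over the probe points divided by the summed squared reference) is an assumption, since the exact definition is not given in this section; the function names are hypothetical, and extracting a probe line from a field of shape (T, H, W) would amount to selecting one grid column, e.g. `field[:, :, ix]`.

```python
import numpy as np

def velocity_magnitude(ux, uy):
    return np.sqrt(ux ** 2 + uy ** 2)

def relative_mse(pred, ref, eps=1e-12):
    """Relative squared error per time step along one probe line (assumed definition).

    pred, ref: (T, P) velocity magnitudes at P probe points for T time steps.
    Returns (T,): sum_p (pred - ref)^2 / sum_p ref^2.
    """
    num = ((pred - ref) ** 2).sum(axis=1)
    den = (ref ** 2).sum(axis=1) + eps   # eps guards against an all-zero reference
    return num / den

# Toy check: a uniform 10% overshoot gives a relative error of 0.01 at every step.
ref = 2.0 * np.ones((3, 4))
err = relative_mse(1.1 * ref, ref)
```

Normalizing by the reference energy on the same line makes the error comparable between probe locations in the slow upstream region and in the energetic wake.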

FIG. 9.

Mean squared error propagation for velocity magnitude with respect to reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of the a priori mean squared error, while on the right is the temporal evolution of the a posteriori mean squared error. The values are shown for locations along the X-axis.

FIG. 10.

Mean squared error propagation for velocity magnitude with respect to the reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of a priori mean squared error, while on the right is the temporal evolution of a posteriori mean squared error. The values are shown for locations along the Y-axis.

The a posteriori predictions are observed to be strongly affected by noise caused by error propagation. It is therefore important to quantify the correlation between the two temporal evolutions (predicted vs reference). In particular, this assesses whether the heavy noise contributions degrade the correlation values. To do so, the Pearson product–moment correlation coefficient $R_{xy}$ of $n$ pairs of time series data $(x_1, y_1), \ldots, (x_n, y_n)$ is computed,

$$
R_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{7}
$$

where $n$ is the sample size, $x_i$ and $y_i$ are the individual sample points indexed by $i$, and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean (analogously for $\bar{y}$). In simple terms, $R_{xy}$ is the covariance of the two variables divided by the product of their standard deviations. For our measurements, the two variables are simply the predicted and reference snapshots at the same instants. The computation is performed for both the a priori and a posteriori predictions.

Figures 11(b) and 11(d) show a gradual decrease in the correlation coefficient of the a posteriori predictions for case 1 and case 2, respectively. At cross-streamwise locations, the a posteriori correlation degrades more steeply for both cases, as shown in Figs. 12(b) and 12(d), while that of the a priori predictions remains stable. Because the correlation of the a posteriori predictions clearly degrades, the phase shift $\varphi(t)$ of the predicted velocity magnitude relative to the reference was measured along the same spatial directions as before. The results are shown in Fig. 13. A value $\varphi(t) < 0$ denotes that the predictions lead the reference by $|\varphi(t)|$, whereas $\varphi(t) > 0$ denotes that they lag behind it. For case 1, the magnitude of the phase shift increases steadily as time evolves. Since the error appears as a gradual phase drift rather than a divergence, this indicates the model’s stability for long-term predictions. However, no clear trend was observed for case 2 at streamwise locations.
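A minimal NumPy sketch of Eq. (7) and of one possible phase-shift estimate follows. The text does not state how $\varphi(t)$ is computed. For illustration only, it is assumed here to be the lag that maximizes the cross-correlation between prediction and reference, evaluated window by window to give a time-dependent value. The helper names and toy signals are hypothetical.

```python
import numpy as np


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson product-moment correlation coefficient R_xy of Eq. (7)."""
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))


def phase_shift(pred: np.ndarray, ref: np.ndarray, dt: float) -> float:
    """Time lag that maximizes the cross-correlation of pred against ref.

    Sign convention (matching the text): < 0 means pred leads ref,
    > 0 means pred lags behind ref.
    """
    p, r = pred - pred.mean(), ref - ref.mean()
    xcorr = np.correlate(p, r, mode="full")  # output index i <-> lag i - (len(r) - 1)
    return float((np.argmax(xcorr) - (len(r) - 1)) * dt)


def windowed(metric, pred: np.ndarray, ref: np.ndarray, width: int, **kwargs) -> np.ndarray:
    """Evaluate a metric on consecutive non-overlapping windows, giving R(t) or phi(t)."""
    starts = range(0, len(ref) - width + 1, width)
    return np.array([metric(pred[s:s + width], ref[s:s + width], **kwargs) for s in starts])


# Toy usage (hypothetical signals): a broadband reference and a copy delayed by 10 samples.
rng = np.random.default_rng(0)
dt = 0.05
ref = rng.standard_normal(500)
pred = np.concatenate([np.zeros(10), ref[:-10]])  # pred[i] = ref[i - 10]

r_same = pearson_r(ref, ref)
r_lagged = pearson_r(pred, ref)
phi = phase_shift(pred, ref, dt)
phi_t = windowed(phase_shift, pred, ref, 100, dt=dt)
```

The toy case also shows why the phase shift is worth measuring. A pure time lag drives $R_{xy}$ toward zero even though the predicted signal has exactly the right shape.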
FIG. 11.

Correlation propagation for velocity magnitude with respect to the true values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of the Pearson product–moment correlation coefficient R of the a priori predictions with respect to the true values, while on the right is that of the a posteriori predictions. The values are shown for locations along the X-axis.

FIG. 12.

Correlation propagation for velocity magnitude with respect to the true values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of the Pearson product–moment correlation coefficient R of the a priori predictions with respect to the true values, while on the right is that of the a posteriori predictions. The values are shown for locations along the Y-axis.

FIG. 13.

Phase-shift evolution for a posteriori values of velocity magnitude with respect to the reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left are the temporal evolutions measured at streamwise locations, while on the right are those measured at cross-streamwise locations.

For a qualitative assessment of the results, the velocity contours of both cases are compared. Figure 14 shows instantaneous snapshots of the streamwise velocity contours for case 1 at 2%, 33%, and 66% of the total number of predicted snapshots, as mentioned earlier. The corresponding snapshots for case 2 are shown in Fig. 15. For both cases, the a priori and a posteriori predictions agree fairly well with the reference.
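The snapshot selection and line sampling used in these comparisons can be sketched as follows (for example, the profiles sampled along the y-axis in Fig. 8). This hypothetically assumes that fields are stored as a NumPy array of shape (n_t, ny, nx) on a Cartesian grid. The helper names, grid, and analytic wake-deficit field are illustrative placeholders.

```python
import numpy as np


def snapshot_indices(n_snapshots: int, fractions=(0.02, 0.33, 0.66)) -> list:
    """Indices of the snapshots at the given fractions of the predicted horizon."""
    return [min(round(f * n_snapshots), n_snapshots - 1) for f in fractions]


def line_profile(field: np.ndarray, x: np.ndarray, x_probe: float) -> np.ndarray:
    """Values along y at the grid column nearest to x_probe; field has shape (ny, nx)."""
    ix = int(np.argmin(np.abs(x - x_probe)))
    return field[:, ix]


# Hypothetical data: a steady wake-like deficit repeated over n_t snapshots.
n_t, ny, nx = 200, 33, 64
x = np.linspace(0.0, 4.0, nx)
y = np.linspace(-1.0, 1.0, ny)
deficit = 1.0 - 0.5 * np.exp(-((y / 0.2) ** 2))        # shape (ny,)
u0 = np.broadcast_to(deficit[:, None], (n_t, ny, nx))   # shape (n_t, ny, nx)

profiles = {t: line_profile(u0[t], x, x_probe=2.0) for t in snapshot_indices(n_t)}
print(sorted(profiles))  # [4, 66, 132]
```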

FIG. 14.

Comparison of a priori and a posteriori prediction of streamwise velocity contours against the reference showing the temporal evolution for case 1.

FIG. 15.

Comparison of a priori and a posteriori prediction of streamwise velocity contours against the reference showing the temporal evolution for case 2.

A convolutional encoder–decoder-based transformer model has been developed to train autoregressively on spatiotemporal data of turbulent flows. In autoregressive training, future fluid flow fields are predicted from the previously predicted fluid flow field, which ensures long-term predictions without divergence. The model has been validated on turbulent wake flow past an obstacle and on environmental flow past a surface-mounted obstacle. The work demonstrates a promising model and method for forecasting fluid flow fields where training data are available. The proposed model trained autoregressively shows significant agreement in a priori evaluations. The a posteriori predictions show expected deviations after a considerable number of simulation steps. The spatiotemporal complexity of the predictions is comparable to that of the target simulations of fully developed turbulence. The autoregressive training and prediction of a posteriori states is a crucial step toward the development of more complex data-driven turbulence models and simulations.

It is shown that the self-attention transformers incorporated within the convolutional encoder–decoder can predict up to 200 time steps (200Δt) with relatively high accuracy. The proposed data-driven deep learning model also remains stable over multiple long time scales, making it a promising candidate for stable and physically consistent predictive turbulence modeling with deep learning. Longer autoregressive training sequences would allow the model to capture longer-range dependencies in time, which can be especially important for understanding and predicting patterns over extended time intervals. However, longer sequences and larger numbers of parameters require more computation during both training and inference. They also slow down training convergence, because dependencies across long time scales are harder to learn. As a trade-off between training cost and the ability to capture longer-range dependencies, the present model was trained with an autoregressive training sequence length of two.
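The sequence-length-two autoregressive training described above can be illustrated with a framework-agnostic NumPy sketch of the loss only. The following assumptions are not taken from the paper:

- The loss is the mean-squared error averaged over the rollout steps.
- `step` is a hypothetical stand-in for one forward pass of the network.
- No gradient is computed; in an automatic-differentiation framework, the same loop would be differentiated through both model calls.

```python
from typing import Callable

import numpy as np

Step = Callable[[np.ndarray], np.ndarray]


def autoregressive_loss(step: Step, window: np.ndarray) -> float:
    """Mean-squared loss of a rollout started from window[0].

    window holds the true states x_t, ..., x_{t+K}; the sequence length is
    K = len(window) - 1 (K = 2 in the text). From the second step on, the model
    is fed its own prediction, so the loss also penalizes error accumulation.
    """
    x, total = window[0], 0.0
    for target in window[1:]:
        x = step(x)
        total += float(np.mean((x - target) ** 2))
    return total / (len(window) - 1)


def teacher_forced_loss(step: Step, window: np.ndarray) -> float:
    """One-step loss for contrast: every prediction starts from a true state."""
    preds = np.stack([step(x) for x in window[:-1]])
    return float(np.mean((preds - window[1:]) ** 2))


# Toy check (hypothetical): a "model" that never changes the state.
def identity(x: np.ndarray) -> np.ndarray:
    return x


window = np.array([1.0, 2.0, 3.0])  # x_t, x_{t+1}, x_{t+2}: sequence length 2
print(autoregressive_loss(identity, window))  # 2.5 = (1**2 + 2**2) / 2
print(teacher_forced_loss(identity, window))  # 1.0 = (1**2 + 1**2) / 2
```

The contrast with `teacher_forced_loss` shows the design choice. The autoregressive loss sees the drift that builds up when the model consumes its own output, which a purely one-step loss never penalizes.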

Achieving longer a posteriori simulations with reasonable accuracy can be challenging because errors compound in autoregressive models. Even so, several strategies could improve the accuracy of longer-term predictions:

- increasing the model’s capacity by adding more layers while guarding against overfitting;
- scheduling the learning rate so that it decreases as training progresses;
- implementing error-feedback mechanisms during training, in which errors are used to correct subsequent predictions and thereby reduce error accumulation.

Future work includes training the model to predict multiple future snapshots from multiple past time steps, so as to achieve longer a posteriori deep learning simulations.
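On the data side, feeding several past snapshots and predicting several future ones, as proposed for future work, amounts to a sliding-window split of the snapshot series. A minimal NumPy sketch follows. The names `make_windows`, `n_in`, and `n_out` are hypothetical, and the use of overlapping, unit-stride windows is an assumption.

```python
import numpy as np


def make_windows(series: np.ndarray, n_in: int, n_out: int):
    """Split a snapshot series of shape (T, ...) into (inputs, targets) pairs.

    inputs[i]  = series[i : i + n_in]                  (n_in past snapshots)
    targets[i] = series[i + n_in : i + n_in + n_out]   (n_out future snapshots)
    """
    n = len(series) - n_in - n_out + 1
    if n <= 0:
        raise ValueError("series too short for the requested window sizes")
    inputs = np.stack([series[i:i + n_in] for i in range(n)])
    targets = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n)])
    return inputs, targets


# Hypothetical usage: 10 snapshots of a 4 x 4 field whose value equals its time index.
series = np.arange(10.0)[:, None, None] * np.ones((1, 4, 4))
inputs, targets = make_windows(series, n_in=3, n_out=2)
print(inputs.shape, targets.shape)  # (6, 3, 4, 4) (6, 2, 4, 4)
```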

The generalization ability of the proposed model could be extended by transfer learning and fine-tuning, which would let the model leverage knowledge from one Reynolds regime to improve its performance in another. Pretraining the model on data from a range of Reynolds numbers and then fine-tuning it for the target Reynolds number can enhance its generalization. If the model generalizes reasonably but still shows discrepancies at specific Reynolds numbers, fine-tuning it on data from those conditions could further improve accuracy. The loss function could also be modified to obtain even longer, stable, and physically realistic results. Additional experiments are needed to demonstrate the model’s ability to generalize to local mesh regions and to longer a posteriori simulation horizons. Further investigations on a variety of industrial and academic cases could include training across different flow Reynolds numbers, turbulence intensities, and other inlet parameters. Conclusions from this work would also provide valuable insights for the development of new deep learning methods and their deployment for turbulent flows on complex geometries in industrial problems. Deploying a trained model to assist a fluid solver is regarded as a future extension of the present work.

This work was supported by the Carnot M.I.N.E.S. Institute through the project MINDS—Mines Initiative for Numerics and Data Science.

The authors have no conflicts to disclose.

Aakash Patil: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Jonathan Viquerat: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). Elie Hachem: Conceptualization (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are openly available at https://github.com/aakash30jan/Spatio-Temporal-Learning-of-Turbulent-Flows (Ref. 49).

For additional verification, we show the instantaneous snapshots of the cross-streamwise velocity contours for case 1 and case 2 in Figs. 16 and 17, respectively. The temporal evolution of the predicted cross-streamwise velocity component is shown in Fig. 18 when measured along the cross-streamwise direction and in Fig. 19 when measured along the streamwise direction. Similarly, the temporal evolution of the predicted SA turbulent viscosity is shown in Fig. 20 when measured along the cross-streamwise direction and in Fig. 21 when measured along the streamwise direction.

FIG. 16.

Comparison of a priori and a posteriori prediction of cross-streamwise velocity contours against the reference showing the temporal evolution for case 1.

FIG. 17.

Comparison of a priori and a posteriori prediction of cross-streamwise velocity contours against the reference showing the temporal evolution for case 2.

FIG. 18.

Comparative predictions of the cross-streamwise velocity component (u1) sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row at t = 0.66 Tn.

FIG. 19.

Comparative predictions of the cross-streamwise velocity component (u1) for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row at t = 0.66 Tn.

FIG. 20.

Comparative predictions of SA turbulent viscosities sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row at t = 0.66 Tn.

FIG. 21.

Comparative predictions of SA turbulent viscosities for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row at t = 0.66 Tn.

1. M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data–driven approximation of the Koopman operator: Extending dynamic mode decomposition,” J. Nonlinear Sci. 25, 1307 (2015).
2. S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, “Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control,” PLoS One 11, e0150171 (2016).
3. C. W. Rowley and S. T. M. Dawson, “Model reduction for flow analysis and control,” Annu. Rev. Fluid Mech. 49, 387 (2017).
4. V. Mons, J.-C. Chassaing, T. Gomez, and P. Sagaut, “Reconstruction of unsteady viscous flows using data assimilation schemes,” J. Comput. Phys. 316, 255 (2016).
5. P. Dubois, T. Gomez, L. Planckaert, and L. Perret, “Data-driven predictions of the Lorenz system,” Physica D 408, 132495 (2020).
6. J. Xu and K. Duraisamy, “Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics,” Comput. Methods Appl. Mech. Eng. 372, 113379 (2020).
7. B. Lusch, J. N. Kutz, and S. L. Brunton, “Deep learning for universal linear embeddings of nonlinear dynamics,” Nat. Commun. 9, 4950 (2018).
8. J. Sirignano and K. Spiliopoulos, “DGM: A deep learning algorithm for solving partial differential equations,” J. Comput. Phys. 375, 1339 (2018).
9. H. Tang, L. Li, M. Grossberg, Y. Liu, Y. Jia, S. Li, and W. Dong, “An exploratory study on machine learning to couple numerical solutions of partial differential equations,” Commun. Nonlinear Sci. Numer. Simul. 97, 105729 (2021).
10. Y. Sun, L. Zhang, and H. Schaeffer, “NeuPDE: Neural network based ordinary and partial differential equations for modeling time-dependent data,” in Mathematical and Scientific Machine Learning (PMLR, 2020), pp. 352–372.
11. C. Cheng and G.-T. Zhang, “Deep learning method based on physics informed neural network with Resnet block for solving fluid flow problems,” Water 13, 423 (2021).
12. M. Z. Yousif, L. Yu, and H.-C. Lim, “High-fidelity reconstruction of turbulent flow from spatially limited data using enhanced super-resolution generative adversarial network,” Phys. Fluids 33, 125119 (2021).
13. D. Schmidt, R. Maulik, and K. Lyras, “Machine learning accelerated turbulence modeling of transient flashing jets,” Phys. Fluids 33, 127104 (2021).
14. R. Han, Y. Wang, Y. Zhang, and G. Chen, “A novel spatial-temporal prediction method for unsteady wake flows based on hybrid deep neural network,” Phys. Fluids 31, 127101 (2019).
15. T. Nakamura, K. Fukami, K. Hasegawa, Y. Nabae, and K. Fukagata, “Convolutional neural network and long short-term memory based reduced order surrogate for minimal turbulent channel flow,” Phys. Fluids 33, 025116 (2021).
16. J. Ren, H. Wang, G. Chen, K. Luo, and J. Fan, “Predictive models for flame evolution using machine learning: A priori assessment in turbulent flames without and with mean shear,” Phys. Fluids 33, 055113 (2021).
17. D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, “Machine learning–accelerated computational fluid dynamics,” Proc. Natl. Acad. Sci. U. S. A. 118, e2101784118 (2021).
18. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30 (NeurIPS, 2017).
19. D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv:1409.0473 (2014).
20. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz et al., “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (Association for Computational Linguistics, 2020), pp. 38–45.
21. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv:1810.04805 (2018).
22. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI Blog 1, 9 (2019).
23. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” arXiv:2010.11929 (2020).
24. N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, and D. Tran, “Image transformer,” in International Conference on Machine Learning (PMLR, 2018), pp. 4055–4064.
25. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers and distillation through attention,” in International Conference on Machine Learning (PMLR, 2021), pp. 10347–10357.
26. Z. Dai, H. Liu, Q. V. Le, and M. Tan, “CoAtNet: Marrying convolution and attention for all data sizes,” in Advances in Neural Information Processing Systems 34 (NeurIPS, 2021), p. 3965.
27. H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, “CvT: Introducing convolutions to vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (IEEE, 2021), pp. 22–31.
28. G. Sharir, A. Noy, and L. Zelnik-Manor, “An image is worth 16×16 words, what is a video worth?,” arXiv:2103.13915 (2021).
29. G. Bertasius, H. Wang, and L. Torresani, “Is space-time attention all you need for video understanding?,” arXiv:2102.05095 (2021).
30. P. Wu, S. Gong, K. Pan, F. Qiu, W. Feng, and C. Pain, “Reduced order model using convolutional auto-encoder with self-attention,” Phys. Fluids 33, 077107 (2021).
31. W. Peng, Z. Yuan, and J. Wang, “Attention-enhanced neural network models for turbulence simulation,” Phys. Fluids 34, 025111 (2022).
32. L. Yu, W. Zhang, J. Wang, and Y. Yu, “SeqGAN: Sequence generative adversarial nets with policy gradient,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI, 2017), Vol. 31.
33. J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, and J. Wang, “Long text generation via adversarial training with leaked information,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI, 2018), Vol. 32.
34. Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-XL: Attentive language models beyond a fixed-length context,” arXiv:1901.02860 (2019).
35. A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, “Transformers are RNNs: Fast autoregressive transformers with linear attention,” in International Conference on Machine Learning (PMLR, 2020), pp. 5156–5165.
36. A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., “Improving language understanding by generative pre-training,” OpenAI Blog (2018).
37. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems 33 (NeurIPS, 2020), p. 1877.
38. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (ECCV) (Springer Nature, 2018), pp. 3–19.
39. P. Dubois, T. Gomez, L. Planckaert, and L. Perret, “Machine learning for fluid flow reconstruction from limited measurements,” J. Comput. Phys. 448, 110733 (2022).
40. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).
41. V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the International Conference on Machine Learning (ICML) (Omnipress, 2010), pp. 285–319.
42. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (USENIX Association, 2016), Vol. 16, pp. 265–283.
43. P. Spalart and S. Allmaras, “A one-equation turbulence model for aerodynamic flows,” in 30th Aerospace Sciences Meeting and Exhibit (American Institute of Aeronautics and Astronautics, 1992), p. 439.
44. G. Guiza, A. Larcher, A. Goetz, L. Billon, P. Meliga, and E. Hachem, “Anisotropic boundary layer mesh generation for reliable 3D unsteady RANS simulations,” Finite Elem. Anal. Des. 170, 103345 (2020).
45. S. B. Pope, Turbulent Flows (Cambridge University Press, 2001).
46. Y. Bazilevs, V. Calo, J. Cottrell, T. Hughes, A. Reali, and G. Scovazzi, “Variational multiscale residual-based turbulence modeling for large eddy simulation of incompressible flows,” Comput. Methods Appl. Mech. Eng. 197, 173 (2007).
47. K. Takizawa, T. E. Tezduyar, and Y. Otoguro, “Stabilization and discontinuity-capturing parameters for space–time flow computations with finite element and isogeometric discretizations,” Comput. Mech. 62, 1169 (2018).
48. E. Hachem, S. Feghali, R. Codina, and T. Coupez, “Immersed stress method for fluid–structure interaction using anisotropic mesh adaptation,” Int. J. Numer. Methods Eng. 94, 805 (2013).
49. A. Patil, J. Viquerat, and E. Hachem, “Autoregressive transformers for data-driven spatio-temporal learning of turbulent flows,” GitHub (2023), https://github.com/aakash30jan/Spatio-Temporal-Learning-of-Turbulent-Flows