A convolutional encoder–decoder-based transformer model is proposed for autoregressive training on spatiotemporal data of turbulent flows. The prediction of future fluid flow fields is based on the previously predicted fluid flow field to enable long-term predictions without diverging. A combination of convolutional neural networks and transformer architecture is utilized to handle both the spatial and temporal dimensions of the data. To assess the performance of the model, a priori assessments are conducted, and good agreement is found with the ground truth data. The a posteriori predictions, which are generated after a considerable number of simulation steps, exhibit the expected deviations. The autoregressive training and prediction of a posteriori states are deemed crucial steps toward the development of more complex data-driven turbulence models and simulations. The proposed model handles the highly nonlinear and chaotic dynamics of turbulent flows and generates accurate predictions over long time horizons. Overall, this approach demonstrates the potential of deep learning techniques to improve the accuracy and efficiency of turbulence modeling and simulation. The proposed model can be further optimized and extended to incorporate additional physics and boundary conditions, paving the way for more realistic simulations of complex fluid dynamics.
I. INTRODUCTION
The dominant mechanism in turbulent flows is convection, which makes tasks such as flow control and model reduction complex and challenging. These tasks become nonlinear, high-dimensional, multi-scale, and nonconvex optimization problems due to the dominance of convection over diffusion. Owing to the vast amount of numerical and experimental data available for turbulent flows, data-driven approaches are gaining popularity in the fluid mechanics community. These approaches use deep learning models to make predictions and represent a viable alternative to traditional methods. This article explores a new data-driven approach based on deep learning to estimate future fluid flow fields from previous ones. The proposed method uses a novel convolutional encoder–decoder transformer model and autoregressive training to achieve long-term spatiotemporal predictions. The approach is tested on two turbulent fluid flow cases, namely, a wake flow past a stationary obstacle and an environmental flow past a tower fixed on a surface. The results show the effectiveness of the proposed method in predicting the fluid flow fields accurately, highlighting the potential of data-driven approaches for solving challenging problems in fluid mechanics.
There are several traditional ways to address temporal estimation, such as Koopman theory and proper orthogonal decomposition, which are suitable for prediction and control.1–3 Additionally, data assimilation schemes are popular, in which the model weights are updated to reflect new observations.4 In recent years, supervised learning techniques using neural networks have been applied to capture nonlinear relations between past and future states. For example, a recurrent neural network (RNN) with long short-term memory (LSTM) was used to predict the chaotic Lorenz system, and convolutional networks were used to predict transient flows.5,6 There have also been attempts to approximate the full Navier–Stokes equations using deep neural networks, but prediction accuracy decreased significantly for chaotic and turbulent flows.7–10 Regarding the estimation of flow fields using deep neural networks, several studies have focused on spatial and temporal reconstruction, as well as spatial supersampling.11–13 Hybrid deep neural network architectures have been designed to capture the spatial–temporal features of unsteady flows,14 and machine learning-based reduced-order models have been proposed for three-dimensional complex flows.15 A deep learning framework combining long short-term memory networks and convolutional neural networks has been used to predict the temporal evolution of turbulent flames.16 However, despite significant progress in the acceleration of flow simulation, these models still suffer from poor generalization and are sensitive to parameter changes.17
New deep learning architectures for temporal problems in unstructured and structured data are emerging, with transformers being one of the most promising. These models make use of self-attention mechanisms to differentially weigh the significance of each part of the input data,18,19 without the need for a recurrent network architecture. Inspired by neighborhood-like notions in convolutional neural networks, transformers build features of inputs using a self-attention mechanism to determine the importance of other samples in the dataset with respect to the current sample. The updated features of the inputs are simply the sum of linear transformations of all features weighted by their importance. Transformers avoid recurrence by using the self-attention mechanism, which accounts for the similarity score between elements of a sequence and the positional embedding of these elements, allowing them to account for the full sequence instead of single elements. These models have been successful in natural language processing (NLP) tasks, such as translation and text summarization, and are becoming the model of choice for NLP problems, replacing classical recurrent neural network (RNN) models, such as long short-term memory (LSTM).20–22 Transformers have also been applied to image processing tasks, using convolutional neural networks to capture relationships between different portions of an image.23–25 Hybrid architectures combining convolutional layers with transformers have achieved excellent results in several computer vision tasks.26,27 In a spatiotemporal context, transformers have been used for video-understanding tasks, capturing spatial and temporal information through the use of divided space–time attention.28,29 In fluid mechanics, attention mechanisms have enhanced reduced-order models by extracting temporal feature relationships from high-fidelity numerical solutions.30 Recently, an autoregressive transformer was applied to two-dimensional homogeneous isotropic turbulence for spatiotemporal prediction of flow fields.31 However, transformers have not previously been used for spatiotemporal prediction of turbulent flows past obstacles, as considered here.
The present contribution is organized as follows: first, the deep learning method based on the convolutional self-attention transformer is discussed, with a focus on the autoregressive training procedure. Section III then describes the two turbulent flow cases considered, (i) a turbulent flow with an obstacle embedded in a rectangular domain and (ii) a surface-mounted tower in an open flow, and Section IV assesses the performance of the proposed approach on them. A discussion and conclusions close the paper.
II. DEEP LEARNING METHOD
The primary focus of this contribution is to address the challenge of learning the spatiotemporal dynamics of turbulent flows, which are known for their high complexity, nonlinear behavior, and high dimensionality. There are two main approaches to estimate a reference spatiotemporal field Xt: (i) reconstruction, which involves utilizing limited measurements at a specific time t to reconstruct the full Xt field at the same time, and (ii) prediction, where a dynamical model is utilized to advance the field in time based on previous estimates. Here, spatiotemporal learning is formulated as a task in which, given a time series containing N sequential snapshots {xt−(N−1)Δt, …, xt−Δt, xt}, the same quantity of interest is predicted M steps ahead in time. The input X of the deep learning model is {xt−(N−1)Δt, …, xt−Δt, xt}, and the output Y is {xt+Δt, …, xt+MΔt}. Each snapshot xt can be a scalar field or a vector field containing multiple features.
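As an illustration, the following minimal sketch builds such (X, Y) pairs from a snapshot series stored as a NumPy array of shape (T, H, W, C); the function name and shapes are illustrative assumptions, not taken from the released code:

```python
import numpy as np

def make_windows(snapshots, n_past, m_future):
    """Slice a snapshot series of shape (T, H, W, C) into (X, Y) pairs:
    N past snapshots as input, M future snapshots as target."""
    T = snapshots.shape[0]
    X, Y = [], []
    for t in range(n_past - 1, T - m_future):
        X.append(snapshots[t - n_past + 1 : t + 1])    # x_{t-(N-1)dt}, ..., x_t
        Y.append(snapshots[t + 1 : t + 1 + m_future])  # x_{t+dt}, ..., x_{t+Mdt}
    return np.stack(X), np.stack(Y)

# Example: 1500 snapshots of one feature (e.g., velocity magnitude) on a 64 x 64 grid
series = np.random.rand(1500, 64, 64, 1).astype("float32")
X, Y = make_windows(series, n_past=1, m_future=2)  # one input, two targets
print(X.shape, Y.shape)  # (1498, 1, 64, 64, 1) (1498, 2, 64, 64, 1)
```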
Transformer models were created to address problems in natural language processing, such as sentence completion and translation, by embedding words one by one. These tasks involve a sequence of words or sentence tensors measured over time and can be considered temporal learning problems.18 Transformer models have demonstrated impressive results in a range of other tasks, including learning image patches as sequences, image reconstruction, and completion.18,26,27 As a result, transformer models are now challenging the traditional long short-term memory (LSTM) models, the de facto RNNs, and are becoming the preferred state-of-the-art approach for a variety of temporal learning tasks.
Like RNNs, transformers are designed to handle sequential input data. However, unlike RNNs, they do not necessarily process the data in order. Rather, the attention mechanism provides context for any position in the input sequence, and self-attention itself learns the attention weights. In the case of spatiotemporal data, attention can be applied to the spatial as well as the temporal sequence. The vanilla transformers in their original form are pure sequence-to-sequence models, as they learn a target output sequence from an input sequence, i.e., they perform transformation at the sequence level. Their limitations, such as disrupted temporal coherence and failure to capture long-term dependencies, became apparent in sentence-completion and language-generation tasks, where models that learn sequences without knowledge of the full sequence struggle to generate coherent text.19,32,33 Several studies, such as that of Dai et al.,34 addressed this inability to capture long-term dependencies by attending to memories from previously learned parameters, yet at the expense of computing costs. To deal with some of these issues, autoregressive transformers were proposed in Ref. 35 for sentence- and image-completion tasks. Although not explicitly stated in some works, the Generative Pre-trained Transformer (GPT) family of models22,36,37 are in fact autoregressive transformers inspired by the decoder part of the original transformer. In Ref. 35, Katharopoulos et al. showed that a self-attention layer trained in an autoregressive fashion can be seen as a recurrent neural network. Transformers can be combined with classic convolutional encoder–decoder models to harness their full potential when the input and target output tensors are in spatiotemporal form. As locality is important for learning small-scale features, this combination serves as a powerful method for a variety of computer-vision problems, including video-frame prediction. The self-attention mechanism on convolutional layers not only attends to a sequence of significance, but also improves the representation of spatially relevant regions by focusing on important features and suppressing less-important ones.38
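For reference, this generic self-attention update, in which each updated element is a sum of linearly transformed inputs weighted by learned importance scores α, can be written in a few lines. The sketch below is a single-head, sequence-level illustration with assumed names; it is not the exact layer used in the present model:

```python
import tensorflow as tf

def self_attention(x, d_k):
    """Generic single-head self-attention over a sequence x of shape
    (batch, seq_len, d_model): each output element is a weighted sum
    of linearly transformed inputs."""
    q = tf.keras.layers.Dense(d_k)(x)  # queries
    k = tf.keras.layers.Dense(d_k)(x)  # keys
    v = tf.keras.layers.Dense(d_k)(x)  # values
    # alpha[i, j] weighs the importance of element j for element i
    scores = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(tf.cast(d_k, tf.float32))
    alpha = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(alpha, v)  # attention-weighted sum of transformed features
```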
The convolutional transformer layer is composed of two blocks: the batched matrix multiplication (BMM) and the self-attention summation. The BMM block corresponds to Wi→jxj in Eq. (2), with the batch dimension being the number of spatial locations. It performs k × k different input-dependent summations with the weights α in Eq. (2). The layer thus contains both a learnable static filter and an input-dependent dynamic kernel.
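A minimal sketch of such a layer is given below, using TensorFlow patch extraction to form the k × k neighborhoods and batched multiplications in which the batch dimension is the set of spatial locations. It keeps only the dynamic, input-dependent kernel and omits the static learnable filter, so it should be read as a simplified, assumption-laden illustration rather than the authors' implementation:

```python
import tensorflow as tf

class LocalSelfAttention2D(tf.keras.layers.Layer):
    """Sketch of a convolutional self-attention block: each spatial location
    attends to its k x k neighborhood, and the output is a sum of linearly
    transformed neighbors weighted by the attention coefficients alpha."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k, self.c = k, channels
        self.to_q = tf.keras.layers.Conv2D(channels, 1)       # queries
        self.to_kv = tf.keras.layers.Conv2D(2 * channels, 1)  # keys and values

    def call(self, x):                       # x: (B, H, W, C)
        q = self.to_q(x)
        kv = self.to_kv(x)
        # Unfold k x k neighborhoods: (B, H, W, k*k*2C)
        patches = tf.image.extract_patches(
            kv, sizes=[1, self.k, self.k, 1], strides=[1, 1, 1, 1],
            rates=[1, 1, 1, 1], padding="SAME")
        B, H, W = tf.shape(x)[0], tf.shape(x)[1], tf.shape(x)[2]
        patches = tf.reshape(patches, (B, H, W, self.k * self.k, 2 * self.c))
        keys, vals = tf.split(patches, 2, axis=-1)
        # Batched matrix multiplication: batch dimension = spatial locations
        scores = tf.einsum("bhwc,bhwnc->bhwn", q, keys)
        alpha = tf.nn.softmax(scores / tf.math.sqrt(float(self.c)), axis=-1)
        return tf.einsum("bhwn,bhwnc->bhwc", alpha, vals)
```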
The current self-attention convolutional transformer layer has a 3 × 3 kernel and incorporates the representation of convolutional features. Combining convolutional neural networks with self-attention thus offers superior learning capabilities for spatiotemporal structures, which benefits turbulent flows and computational fluid dynamics (CFD) in general, where one learns spatial filters as well as temporal embeddings and dependencies. In addition to using the convolutional transformer layer, the model is trained in an autoregressive fashion. Formally, autoregressive models are those that forecast future sequences from previously forecasted sequences in a cyclical way; here, "auto" thus indicates the regression of the variable sequence against itself.
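Under this definition, a training step with autoregressive sequence length two (the setting retained in Sec. V) re-feeds the model its own prediction before computing the loss. The sketch below assumes, for simplicity, a model that advances the flow by a single step; the optimizer and loss choices are assumptions:

```python
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

@tf.function
def autoregressive_step(model, x_t, y_true):
    """One training step with sequence length two: the second prediction
    is made from the first prediction rather than from the ground truth,
    so rollout errors contribute to the loss."""
    with tf.GradientTape() as tape:
        y1 = model(x_t, training=True)  # predict x_{t+dt}
        y2 = model(y1, training=True)   # re-feed the prediction: x_{t+2dt}
        loss = mse(y_true[:, 0], y1) + mse(y_true[:, 1], y2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```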
In turbulent flow problems, the high-dimensional state space is characterized by intricate spatiotemporal dynamics, and therefore, dimensionality reduction techniques can be useful.39 The prediction and reconstruction problems can be interpreted as estimating the reduced or latent state, making it natural to use encoder–decoder architectures. The encoder takes input tensors, learns their most relevant parts, and maps them to a latent representation. This latent representation is then converted to target output tensors by the decoder, which involves successive up-samplings and convolutions. By connecting the encoder and decoder, their weight matrices learn to jointly map the input to the output tensors, allowing small-scale features to be learned. The decoder transforms the latent space representation, of dimension nz × nz, to the original spatial dimensions of the target output xt+Δt.
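A compressed sketch of such a convolutional encoder–decoder is shown below; the stage count, filter sizes, and activations are illustrative rather than those of the actual four-stage architecture, and the convolutional transformer blocks described above would sit between these stages:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_encoder_decoder(h, w, c):
    """Encoder: strided convolutions down to a small latent map.
    Decoder: successive up-samplings and convolutions back to (h, w, c)."""
    inp = layers.Input((h, w, c))
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    z = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    # z: latent representation of spatial size (h/8, w/8), i.e., nz x nz
    x = layers.UpSampling2D()(z)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    out = layers.Conv2D(c, 3, padding="same")(x)  # target snapshot x_{t+dt}
    return tf.keras.Model(inp, out)

model = build_encoder_decoder(64, 64, 1)
```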
Convolutional encoder–decoder transformer architecture. Model architecture of the convolutional encoder–decoder transformer processing low- and high-level features. The canonical four-stage design is used in addition to the convolutional transformer blocks or layers. H and W are the input resolutions of each snapshot in the Tin and Tout sequences.
The activation function used in the neural network was ReLU, which is known to help stabilize the weight updates during training.41 During training, the entire training dataset was presented to the network repeatedly after shuffling; each complete pass is called an epoch. An early stopping criterion was used to end the training process, along with a learning rate reduction whenever no improvement occurred over 100 epochs. The neural network was implemented on the TensorFlow platform42 and trained on Nvidia Tesla V100 graphics processing units (GPUs).
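This training loop maps directly onto standard Keras callbacks. In the sketch below, the 100-epoch patience mirrors the criterion stated above, while the monitored quantity, reduction factor, and stopping patience are assumptions:

```python
import tensorflow as tf

callbacks = [
    # Halve the learning rate when no improvement is seen for 100 epochs
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=0.5, patience=100),
    # Stop training once the validation loss stops improving
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=200,
                                     restore_best_weights=True),
]
# model.fit(X_train, Y_train, validation_split=0.3, epochs=5000,
#           shuffle=True, callbacks=callbacks)
```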
III. NUMERICAL SIMULATION CASES AND DATA GENERATION
A. Governing equations
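Consistent with the URANS/VMS solver settings and the ν̃ boundary conditions quoted below, the flow is assumed to be governed by the incompressible unsteady Reynolds-averaged Navier–Stokes equations closed by the Spalart–Allmaras (SA) one-equation model; a minimal sketch of the standard form (assumed here rather than quoted from the original) reads

```latex
\begin{aligned}
&\nabla \cdot \mathbf{u} = 0, \\
&\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
 = -\frac{1}{\rho}\nabla p
 + \nabla \cdot \big[(\nu + \nu_t)\big(\nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}}\big)\big], \\
&\nu_t = \tilde{\nu}\, f_{v1}, \qquad
\frac{D\tilde{\nu}}{Dt} = P(\tilde{\nu}) - D(\tilde{\nu})
 + \frac{1}{\sigma}\nabla\cdot\big[(\nu+\tilde{\nu})\nabla\tilde{\nu}\big]
 + \frac{c_{b2}}{\sigma}\,\lvert\nabla\tilde{\nu}\rvert^{2},
\end{aligned}
```

where u is the velocity, p the pressure, ν the kinematic viscosity, ν_t the eddy viscosity, f_v1 a damping function, and P(ν̃) and D(ν̃) the SA production and destruction terms.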
B. Case 1: Wake-flow past a square cylinder
The turbulent flow around a 2D square cylinder is examined as a widely used benchmark case. The reference velocity U∞ and the cylinder lateral size H set the baseline Reynolds number to 22 × 10³; the cylinder is located at the domain's center. The computational domain spans 20 length units in the streamwise (x) and 14 length units in the crosswise (y) direction. To conduct an Unsteady Reynolds Averaged Navier–Stokes (URANS) or Variational Multi-scale (VMS) simulation, the domain is discretized into a sufficient number of cells (around 100 000) using a finite-element flow solver developed in-house.44,46–48 The inflow boundary conditions comprise u = (Vin, 0) and ν̃ = ν̃in, leading to an eddy-to-kinematic viscosity ratio of about 0.2. The side boundaries are treated as symmetric, with ∂yux = uy = 0 and ∂yν̃ = 0. At the outflow, ∂xux = ∂xuy = 0, along with p = 0, are enforced. At the cylinder surface, no-slip conditions u = 0 and ν̃ = 0 are applied. The simulation runs for a physical time of 5000 s with a time step of Δt = 0.05 s. A stable flow is achieved after about 200 s, and the data for the remaining 4800 s are collected for training and testing purposes. Data are recorded every Δt = 0.25 s, resulting in about 1500 snapshots. In terms of the non-dimensional time t* = tU∞/H, this sampling rate corresponds to Δt* = 1. Approximately 24 shedding cycles are observed in the simulation data. With the 70/30 splitting strategy (70% of the data for training, 30% for validation and testing), 16 shedding cycles are contained in the training data, which seems sufficient to fully characterize the dynamics of the turbulent wake flow past a two-dimensional square cylinder, given its simplicity. Figure 3(a) shows the sketch of the associated case.
Setup cases used in this study. (a) Case 1 setup: Wake-flow past a square cylinder. (b) Case 2 setup: Environmental flow over a surface-mounted tower.
C. Case 2: Environmental flow over surface-mounted tower
The turbulent flow past a two-dimensional (2D) rectangular tower on the land surface is considered. The baseline Reynolds number is set to 45 × 10², based on the reference velocity U∞ and the tower of square side H, which is placed on the surface. The computational domain spans 35 length units in the streamwise (x) and 8 length units in the crosswise (y) direction, and the domain is discretized into a sufficiently large number of cells (around 100 000) to perform a URANS or VMS simulation. The inflow boundary conditions are u = (Vin, 0), together with ν̃ = ν̃in, which corresponds to an eddy-to-kinematic viscosity ratio of ∼0.2. At the top of the domain, the velocity component normal to the surface is set to zero. No-slip boundary conditions u = 0 and ν̃ = 0 are imposed at the tower surface, as well as at the bottom surface at y = −1. The time step is Δt = 0.01 s, and 300 s are simulated. Around 100 s are required for a statistically steady state to be reached (periodic vortex shedding to be observed). The data of the remaining 200 s (i.e., ∼20 × 10³ time steps) are stored for training and testing purposes. The data are sampled every Δt = 0.1 s, thus collecting around 2000 snapshots. In terms of the non-dimensional time t* = tU∞/H, this sampling corresponds to Δt* = 1. Figure 3(b) shows the sketch of the associated case. Initially, a free separated shear layer expands above the tower, becomes wavy, and then reattaches at the bottom surface of the domain. The shear layer flaps, and vortical structures are shed from it. Approximately 18 shedding cycles are observed in the simulation data. With the 70/30 splitting strategy (70% of the data for training, 30% for validation and testing), 12 shedding cycles are contained in the training data, which is enough to reasonably characterize the dynamics of the environmental flow over the surface-mounted obstacle.
IV. RESULTS AND DISCUSSION
In this section, the results are discussed as follows: first, the temporal evolutions of the quantities are compared, and then the spatial measurements of the velocity components at various times are compared. Next, the temporal propagation of errors and correlation coefficients is compared, along with the propagation of phase shifts. Additionally, contour plots of the quantities are compared to provide qualitative assessments. These comparisons are performed for both cases and for both the a priori and a posteriori simulations, as illustrated in Fig. 4. To compare results, a first, a priori simulation is performed by exploiting data samples that were not used during training. The trained model is fed a snapshot at instant t and predicts the next two snapshots at instants t + Δt and t + 2Δt; the process is repeated by feeding the subsequent snapshots from the dataset until the same number of snapshots is reached for comparison with the original ground-truth time series. As snapshots from the dataset are utilized, this approach is termed the a priori deep learning simulation. On the other hand, the a posteriori simulation is performed by feeding a snapshot at instant t0 from the same dataset not used during training and predicting the next two snapshots at instants t0 + Δt and t0 + 2Δt. The predicted snapshot at instant t0 + 2Δt is then injected back into the model to predict the snapshots at instants t0 + 3Δt and t0 + 4Δt, and the process is repeated until the same number of snapshots is obtained for comparison with the true snapshots. This way of recycling the model predictions is termed the a posteriori deep learning simulation; a sketch of both protocols is given after this paragraph. Once an equal number of snapshots is obtained, both the a priori and a posteriori results can be compared against the truth from the dataset. Figure 5 shows the temporal evolution of the ensemble average of the velocity magnitude for case 1 and case 2. For case 1, both the a priori and a posteriori predictions present a good agreement with the truth, whereas for case 2 the predictions, though fairly accurate, deteriorate over time. Moreover, the accuracy of the a posteriori predictions demonstrates the long-term prediction capability of the model, indicating global long-term learning when comparing ensemble averages.
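Both protocols reduce to short rollout loops; a sketch is given below, assuming a model that maps one snapshot to the next two (all names and shapes are illustrative):

```python
import numpy as np

def a_priori(model, snapshots, n_steps):
    """Always feed ground-truth snapshots; collect the two-step predictions."""
    preds = []
    for t in range(0, n_steps, 2):
        y = model(snapshots[t : t + 1])  # predict t+dt and t+2dt from truth at t
        preds.extend([y[0, 0], y[0, 1]])
    return np.array(preds)

def a_posteriori(model, x0, n_steps):
    """Recycle the model's own predictions, starting from a suitable snapshot x0."""
    preds, x = [], x0[None, ...]
    for _ in range(0, n_steps, 2):
        y = model(x)
        preds.extend([y[0, 0], y[0, 1]])
        x = y[:, 1]  # inject the prediction at t+2dt back as the next input
    return np.array(preds)
```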
Illustration of a priori and a posteriori simulations. Left: For the a priori simulation, each Xt from {X}, the dataset not used at training time, is fed to the model. Right: For the a posteriori simulation, the inputs are the model's own previous predictions, provided the rollout was initiated with a suitable Xt.
Temporal evolution of the ensemble averages for a priori and a posteriori values of velocity magnitude compared to the true values in black. Left: Ensemble mean for spatial values of velocity magnitude for case 1 (a priori and a posteriori predictions compared to ground truth). Right: Ensemble mean for spatial values of velocity magnitude for case 2 (a priori and a posteriori predictions compared to ground truth).
The accuracy of the predictions is further verified by comparing values at various streamwise and cross-streamwise locations. These locations are marked with dashed lines in Fig. 6 for both cases. For case 1, measurements were made along the streamwise directions at x = [−2.5H, 8H, 12H, 16H] and along the cross-streamwise directions at y = [−2H, 0, 2H]; similarly, for case 2, the measurements were made at x = [−2.5H, 8H, 16H, 32H] and at y = [0.5H, 1.5H, 5H]. As the wake flows are the topic of interest, these locations were chosen in the region of interest away from the obstacle for both cases. With regard to temporal evolution, the predictions were compared at a certain percentage of the total number of predicted snapshots. As a reminder, around 200 snapshots were predicted for case 1 and around 100 snapshots for case 2. The predictions are compared at instants t = [2%, 33%, 66%] of the prediction horizon to verify the quality of the temporal evolution.
Locations of the probe lines used for comparison with the reference quantities. Top: Lines along streamwise and cross-streamwise directions for case 1. Bottom: For case 2.
Figure 7 shows the temporal evolution of the predictions of the streamwise velocity component u0 measured along the cross-streamwise directions. The a priori predictions closely follow the reference values, indicating a good agreement of the short-term predictions along the measured spatial directions. For the a posteriori predictions, an increasing deviation from the reference is observed as time evolves, which can be attributed to the accumulation of error in long-term predictions. Similarly, the evolution of the same quantity (u0) measured along the streamwise directions is shown in Fig. 8. A similar trend is observed for the predictions against the reference, where the a posteriori predictions deteriorate as time evolves. It can be noted that the upstream predictions at x = −2.5H are better across times, as this location is not affected by the turbulent wake. Overall, the measurements indicate a decent agreement of both the short-term a priori predictions and the long-term a posteriori predictions with the reference solutions.
Comparative predictions of streamwise velocity component (u0) for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row contains instantaneous predictions at t = 0.33 Tn, and the bottom row contains instantaneous predictions at t = 0.66 Tn.
Comparative predictions of streamwise velocity component (u0) sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row contains instantaneous predictions at t = 0.33 Tn, and the bottom row contains instantaneous predictions at t = 0.66 Tn.
Next, the temporal evolution of the prediction error against the reference is investigated by computing relative mean-squared errors of the velocity magnitude for both cases. These errors are measured at the locations mentioned earlier. Figure 9(a) shows the evolution of the error for the a priori predictions, and Fig. 9(b) shows that of the a posteriori predictions for case 1, measured at the streamwise locations. As could be expected, the errors accumulate for long-term a posteriori predictions, leading to a clear distinction from the a priori predictions. It is interesting to note that although the error magnitude increases over time, its evolution also follows the vortex/wake shedding cycles, indicating that the trained model performs well for long-term a posteriori predictions. A similar trend is observed for case 2, as shown in Figs. 9(c) and 9(d), although here the magnitude of the accumulated errors is higher than that of case 1. As shown in Fig. 10, a similar trend is observed when the errors are measured at the cross-streamwise locations.
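For completeness, the two quantities tracked in these comparisons, the relative mean-squared error and the Pearson product–moment correlation coefficient per snapshot along a probe line, can be computed as in this sketch (array shapes and names are assumptions):

```python
import numpy as np

def relative_mse(pred, ref):
    """Relative mean-squared error per snapshot; pred, ref of shape (T, n)."""
    return np.mean((pred - ref) ** 2, axis=1) / np.mean(ref ** 2, axis=1)

def pearson_r(pred, ref):
    """Pearson product-moment correlation coefficient per snapshot."""
    p = pred - pred.mean(axis=1, keepdims=True)
    r = ref - ref.mean(axis=1, keepdims=True)
    return np.sum(p * r, axis=1) / np.sqrt(
        np.sum(p ** 2, axis=1) * np.sum(r ** 2, axis=1))
```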
Mean squared error propagation for velocity magnitude with respect to reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of a priori mean squared error, while on the right is the temporal evolution of a posteriori mean squared error. The values are shown for locations along the X-axis.
Mean squared error propagation for velocity magnitude with respect to the reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of a priori mean squared error, while on the right is the temporal evolution of a posteriori mean squared error. The values are shown for locations along the Y-axis.
Correlation propagation for velocity magnitude with respect to the true values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of the Pearson product–moment correlation coefficient for a priori values with reference to true values, while on the right are the R values for a posteriori values with reference to true values. The values are shown for locations along the X-axis.
Correlation propagation for velocity magnitude with respect to the true values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left is the temporal evolution of the Pearson product–moment correlation coefficient for a priori values with reference to true values, while on the right are the R values for a posteriori values with reference to true values. The values are shown for locations along the Y-axis.
Phase-shift evolution for a posteriori values of velocity magnitude with respect to the reference values. The top row shows the evolution for case 1, and the bottom row shows the evolution for case 2. On the left are the temporal evolutions measured at the streamwise locations, while on the right are the evolutions measured at the cross-streamwise locations.
For a qualitative assessment of the results, the contours of the velocity components are compared for both cases. Figure 14 shows the instantaneous snapshots of the streamwise velocity contours for case 1 at t = [2%, 33%, 66%] of the total predicted snapshots, as mentioned earlier, and similar instantaneous snapshots for case 2 are shown in Fig. 15. For both cases, the a priori as well as the a posteriori predictions show fairly good agreement with the reference.
Comparison of a priori and a posteriori prediction of streamwise velocity contours against the reference showing the temporal evolution for case 1.
Comparison of a priori and a posteriori prediction of streamwise velocity contours against the reference showing the temporal evolution for case 2.
V. CONCLUSIONS
A convolutional encoder–decoder-based transformer model has been developed and trained autoregressively on spatiotemporal data of turbulent flows. The autoregressive training predicts future fluid flow fields from previously predicted fluid flow fields to enable long-term predictions without diverging. The model has been validated by demonstrating its applicability to the turbulent wake flow past an obstacle and the environmental flow past a surface-mounted obstacle. The work demonstrates a promising model and method for forecasting fluid flow fields where training data are available. The proposed model, trained in an autoregressive way, shows good agreement in the a priori evaluations, whereas the a posteriori predictions show the expected deviations after a considerable number of simulation steps. The spatiotemporal complexity of the predictions is comparable to that of the target simulations of fully developed turbulence. The autoregressive training and prediction of a posteriori states is a primary step toward the development of more complex data-driven turbulence models and simulations. It is shown that self-attention transformers incorporated within the convolutional encoder–decoder can predict up to 200Δt time steps with relatively high accuracy, and the proposed data-driven deep learning model remains stable over multiple long time scales, making it a promising candidate for stable and physical deep learning predictive turbulence modeling. Longer autoregressive training sequences would allow the model to capture longer-range dependencies in time, which can be especially important for understanding and predicting patterns over extended time intervals. However, longer sequences and more parameters require more computation during training and inference, and they slow down training convergence due to the increased difficulty of learning dependencies across long time scales. To balance training cost against capturing longer-range dependencies, the present model was trained with an autoregressive training sequence length of two.
Although achieving longer a posteriori simulation horizons with reasonable accuracy can be challenging due to the compounding nature of errors in autoregressive models, several strategies could be considered to improve the accuracy of longer-term predictions: increasing the model's capacity by adding more layers while taking care not to overfit, scheduling the learning rate to decrease as training progresses, and implementing error-feedback mechanisms within training, where errors are used to correct subsequent predictions and reduce error accumulation. Future work includes the training and prediction of multiple future snapshots from multiple past time steps to achieve longer a posteriori deep learning simulations.
The generalization ability of the proposed model could be extended by transfer learning and fine-tuning, allowing the model to leverage knowledge from one Reynolds regime to improve performance in another. Pretraining the model on data spanning a range of Reynolds numbers and then fine-tuning it for the target Reynolds number can enhance its generalization. If the model generalizes reasonably well but still exhibits discrepancies at specific Reynolds numbers, fine-tuning on data from those conditions can further enhance accuracy. Modifications to the loss function could be made to achieve even longer, stable, physically realistic predictions. Additional experiments are needed to demonstrate the model's ability to generalize to local mesh regions as well as to longer a posteriori simulation horizons. Further investigations on a variety of industrial and academic cases could include training across flow Reynolds numbers, turbulence intensities, and other inlet parameters. The conclusions from this work also provide valuable insights for the development of new deep learning methods and their deployment for turbulent flows over complex geometries in industrial problems. Deploying a trained model to assist a fluid solver is regarded as a future extension of the present work.
ACKNOWLEDGMENTS
This work was supported by the Carnot M.I.N.E.S. Institute through the project MINDS—Mines Initiative for Numerics and Data Science.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Aakash Patil: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Jonathan Viquerat: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). Elie Hachem: Conceptualization (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available at https://github.com/aakash30jan/Spatio-Temporal-Learning-of-Turbulent-Flows.49
APPENDIX: ADDITIONAL VERIFICATIONS
For additional verification, the instantaneous snapshots of the cross-streamwise velocity contours for case 1 and case 2 are shown in Figs. 16 and 17, respectively. The temporal evolution of the predictions of the cross-streamwise velocity component measured along the cross-streamwise directions is shown in Fig. 18, and that measured along the streamwise directions is shown in Fig. 19. Similarly, the temporal evolution of the predictions of the Spalart–Allmaras (SA) turbulent viscosities measured along the cross-streamwise directions is shown in Fig. 20, and that measured along the streamwise directions is shown in Fig. 21.
Comparison of a priori and a posteriori prediction of cross streamwise velocity contours against the reference showing the temporal evolution for case 1.
Comparison of a priori and a posteriori prediction of cross streamwise velocity contours against the reference showing the temporal evolution for case 2.
Comparative predictions of cross-streamwise velocity components (u1) sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row shows the predictions at t = 0.66 Tn.
Comparative predictions of cross-streamwise velocity components (u1) for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row shows the predictions at t = 0.66 Tn.
Comparative predictions of SA turbulent viscosities sampled along the y-axis for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row shows the predictions at t = 0.66 Tn.
Comparative predictions of SA turbulent viscosities for case 1 (left) and case 2 (right). Figures from top to bottom denote the predictions at increasing times, i.e., the top row contains instantaneous predictions at t = 0.02 Tn, the middle row at t = 0.33 Tn, and the bottom row shows the predictions at t = 0.66 Tn.