Numerical simulation of fluids is important in modeling a variety of physical phenomena, such as weather, climate, aerodynamics, and plasma physics. The Navier–Stokes equations are commonly used to describe fluids, but solving them at a large scale can be computationally expensive, particularly when it comes to resolving small spatiotemporal features. This trade-off between accuracy and tractability can be challenging. In this paper, we propose a novel artificial intelligence-based method for improving fluid flow approximations in computational fluid dynamics (CFD) using deep learning (DL). Our method, called CFDformer, is a surrogate model that can handle both local and global features of CFD input data. It is also able to adjust boundary conditions and incorporate additional flow conditions, such as velocity and pressure. Importantly, CFDformer performs well under different velocities and pressures outside of the flows it was trained on. Through comprehensive experiments and comparisons, we demonstrate that CFDformer outperforms other baseline DL models, including U-shaped convolutional neural network (U-Net) and TransUNet models.
I. INTRODUCTION
Computational fluid dynamics (CFD) is a technology that calculates and predicts physical phenomena. The fundamental CFD task is applied to various scientific and engineering problems, such as automobile optimization1 and aircraft aerodynamic design.2,3 CFD simulations have also been used to determine the evolution of weather patterns,4,5 catastrophic events such as wildfires,6 and even the heart’s blood flow to detect irregularities.7,8
Furthermore, CFD simulation plays a vital role in industrial engineering problems, frequently in modeling the atmospheric diffusion of chemicals.9,10 In the case of a chemical spill and its propagation, CFD simulation is advantageous because it quickly depicts the flow of the fluid, even if accuracy is slightly reduced.
CFD is a popular method for solving the Navier–Stokes equations,11 the partial differential equations (PDEs) that describe the flow of incompressible fluids in fluid mechanics. CFD finds an approximate solution of the Navier–Stokes equations using a numerical method executed on a computer. Simulating each cell iteratively until convergence incurs high computational costs. Consequently, real-time prediction is impractical even with supercomputers.
The Navier–Stokes equations are PDEs with several terms, involving the velocity vector u, pressure p, kinematic viscosity γ, density ρ, and body force F. They are solved subject to boundary conditions that specify the velocity field. This study considers the incompressible Navier–Stokes equations.
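For reference, the standard incompressible form, written with the symbols defined above, is shown below. Treating F as a body force per unit mass and stating the continuity constraint explicitly are assumptions taken from the textbook formulation, not from the paper's own typesetting.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  &= -\frac{1}{\rho}\,\nabla p + \gamma\,\nabla^{2}\mathbf{u} + \mathbf{F}, \\
\nabla\cdot\mathbf{u} &= 0 .
\end{aligned}
```

The first line is the momentum balance and the second enforces incompressibility; a steady-state solution is one for which the time derivative vanishes.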
To reduce the high computational costs of CFD, several recent deep learning (DL) methods have been proposed as surrogates for solving the Navier–Stokes equations. In particular, multiple studies12,13 have employed a convolutional neural network (CNN)-based encoder-decoder architecture to predict steady or unsteady flow, with fields represented as image arrays.
For example, Ribeiro et al.14 proposed DeepCFD, an efficient method for approximating non-uniform 2D steady laminar flow based on the U-shaped convolutional neural network (U-Net) architecture. U-Net is a convolutional neural network architecture widely used in image segmentation tasks. It is characterized by its “U”-shaped architecture, which includes an encoder network for feature extraction and a decoder network for spatial resolution recovery. In DeepCFD, the Signed Distance Function (SDF) feature and the flow region channel are fed into the U-Net model as two input features. The encoder compresses the spatial features of the mesh and boundary conditions, and the flow is then generated from the compressed information. Le et al.16 combined interpolated feature data generation with a deep U-Net model to estimate incompressible laminar flow. In addition, several studies17–20 have introduced machine learning (ML)-accelerated CFD, which combines a physical fluid flow simulator with CNN inference.
Although these methods can predict results in real time, they must be trained with the initial flow velocity fixed, because they extract only the features of an image containing spatial information. Furthermore, although a CNN captures spatial features within its local receptive field, it is weak at capturing global flow features that depend on the positions of the obstacle, inlet, and outlet. General vision tasks can mitigate this weakness with data augmentation, but augmentation is difficult to apply to fluid flow problems.
In this paper, we present a novel, efficient DL-based CFD model with low computational cost that overcomes several limitations of previous studies. In particular, we use an encoder-decoder neural network scheme that hybridizes a Vision Transformer (ViT)21 and U-Net22 to predict fluid flow on a 2D geometry. In detail, we extract local spatial features with convolutional layers in the encoder and calculate global fluid flow attention using geometry information. The decoder, an upscaling path built from repeated deconvolution operations, generates the steady-state flow result as an image array.
The contributions of this work can be summarized as follows:
To the best of the authors’ knowledge, this is the first CFD surrogate model that simultaneously reflects and analyzes various initial velocity conditions.
CFDformer not only produces laminar flows similar to those of a standard CFD solver but also reduces simulation time by up to 99.94%.
We perform extensive experiments showing that our method outperforms existing methods, comparing it against several deep generative baseline models under the same initial conditions.
Our extensive experimentation demonstrates that the proposed method exhibits superior performance when compared to existing methods in the prediction of results under a range of initial velocity conditions. In particular, our proposed method demonstrates the ability to generalize to new, unseen conditions that were not present in the training data. This suggests that the proposed method possesses robust generalization capabilities.
II. METHODOLOGY
Given a grid image with spatial resolution H × W, a boundary-condition array, and two initial velocity components under a fixed pressure p0, our goal is to predict the corresponding steady-state velocity result. In this paper, we introduce a deep neural network (DNN) named CFDformer as a surrogate model for solving the Reynolds-averaged Navier–Stokes (RANS) equations. CFDformer’s architecture couples a Vision Transformer (ViT) and U-Net, as shown in Fig. 1. Our aim is to obtain visually natural and accurate results in real time with CFDformer.
CFDformer uses a series of convolutions to convert input images into a set of lower-resolution feature maps, then encodes them using a ViT.21 This process extracts latent local features of the grid space. At the same time, we extract embedding features of the two initial velocity conditions using a linear projection. Subsequently, we concatenate the geometric and condition features. Positional embedding vectors are added to these concatenated vectors to identify positions. The transformer then calculates self-attention to capture global latent features from the concatenated vectors.
Next, the model decodes them with several upsamplers employing convolution layers. We propose using the MHA upsampling methodology in the decode step, which contains multi-head attention (MHA) and skip-connection to enable precise localization. MHA upsampling uses key and value vectors from the encoder side and query vectors from the previous step feature map on the decoder side.
A. Encoder
1. Latent spatial representation
To encode the local spatial information, we first extract latent features from the concatenated arrays xgrid and xbdr using the Latent Spatial Representation neural network (LSRNet), as shown in Fig. 1. LSRNet consists of three convolution blocks. In each block, batch normalization is applied to normalize the outputs of the preceding convolution. The feature maps from every block except the first are used as skip connections when upsampling is performed in the decoding stage. These latent features capture both the positions (e.g., inlet, outlet, and obstacles) and the grid resolution, so that structural features can be extracted. We also use the flattened latent features as the transformer encoder’s input patches. Here, θ denotes the parameters of LSRNet.
2. Latent air inlets condition
Using a trainable linear projection, implemented as a standard fully connected (FC) layer, we map the two velocity conditions into a latent D-dimensional embedding space. The latent dimension is the same as that of the LSRNet outputs.
Finally, we use the patch-embedding vectors z, which include the spatial information and the initial velocity information, as inputs to the transformer encoder, as shown in Eq. (5). Note that Epos is the position-encoding vector for the unfolded patches.
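The assembly of the encoder input can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function and argument names are hypothetical, and appending the velocity embedding as one extra token (rather than concatenating it along the feature axis) is an assumption, since the text does not state the concatenation axis.

```python
import numpy as np

def build_tokens(feature_map, velocity, w_cond, b_cond, pos_embed):
    """Assemble transformer input tokens from a feature map and two velocities.

    feature_map: (h, w, D) output of the convolutional encoder
    velocity:    (2,) the two initial velocity conditions
    w_cond:      (2, D) weights of the linear projection (hypothetical name)
    b_cond:      (D,) bias of the linear projection (hypothetical name)
    pos_embed:   (h*w + 1, D) position embedding, one row per token
    """
    h, w, d = feature_map.shape
    patches = feature_map.reshape(h * w, d)   # flatten the grid into h*w tokens
    cond = velocity @ w_cond + b_cond         # (D,) condition embedding
    # ASSUMPTION: the condition embedding is appended as one extra token.
    tokens = np.concatenate([patches, cond[None, :]], axis=0)
    return tokens + pos_embed                 # add position information

rng = np.random.default_rng(0)
D = 8
fmap = rng.normal(size=(4, 3, D))
z = build_tokens(
    fmap,
    np.array([0.1, 0.05]),        # hypothetical velocity values
    rng.normal(size=(2, D)),
    np.zeros(D),
    np.zeros((4 * 3 + 1, D)),     # zero position embedding for the demo
)
print(z.shape)
```

Because the condition embedding has the same dimension D as the spatial tokens, self-attention can relate every grid location to the velocity condition.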
As illustrated in Fig. 1, the embedding vector z is first passed into Layer Normalization (LN) and then into Multi-head Self-Attention (MSA). The MSA consists of multiple parallel self-attention heads. The MSA result is then passed into a position-wise Feed-Forward Network (FFN).
The MSA layer relates each input vector to all input vectors z1, …, zd and thereby transforms the input into a more refined spatial representation that includes the initial velocity information.
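In the standard pre-norm ViT formulation, which matches the order of operations described above, one encoder layer is usually written as follows. The residual connections, the second layer normalization, and the layer index ℓ are assumptions taken from that standard form rather than from this text.

```latex
\begin{aligned}
z'_{\ell} &= \mathrm{MSA}\big(\mathrm{LN}(z_{\ell-1})\big) + z_{\ell-1}, \\
z_{\ell}  &= \mathrm{FFN}\big(\mathrm{LN}(z'_{\ell})\big) + z'_{\ell}.
\end{aligned}
```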
B. Decoder
A decoder composed of multiple MHA upsampling layers decodes the hidden features to estimate the steady-state flow, as shown in Fig. 1. Inspired by Vaswani et al.,23 we perform multi-head attention. The query Q corresponds to the target pixels of upsampling, whereas the key K and value V are the spatial vectors from LSRNet. We calculate attention values with Q, K, and V to emphasize global spatial features. In other words, during upsampling, the correlation between pixels guides the generation of natural-looking flow results. In addition, the concatenated skip connections allow the decoder to recover structural information, such as the obstacle’s position.
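The attention step of MHA upsampling can be sketched as standard multi-head cross-attention, with queries taken from decoder tokens and keys and values taken from encoder tokens. This NumPy sketch is illustrative only: the weight names are hypothetical, and the upsampling convolution and the skip-connection concatenation are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(dec_tokens, enc_tokens, wq, wk, wv, wo, num_heads):
    """Queries come from decoder pixels; keys and values come from encoder features."""
    n_q, d_model = dec_tokens.shape
    n_kv = enc_tokens.shape[0]
    d_head = d_model // num_heads

    # Project, then split the model dimension into heads: (heads, tokens, d_head).
    q = (dec_tokens @ wq).reshape(n_q, num_heads, d_head).transpose(1, 0, 2)
    k = (enc_tokens @ wk).reshape(n_kv, num_heads, d_head).transpose(1, 0, 2)
    v = (enc_tokens @ wv).reshape(n_kv, num_heads, d_head).transpose(1, 0, 2)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n_q, n_kv)
    weights = softmax(scores, axis=-1)                   # each query row sums to 1
    heads = weights @ v                                  # (heads, n_q, d_head)

    # Merge the heads back into the model dimension and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return merged @ wo, weights

rng = np.random.default_rng(0)
d_model, num_heads = 16, 4
dec = rng.normal(size=(6, d_model))    # 6 decoder (query) tokens
enc = rng.normal(size=(10, d_model))   # 10 encoder (key/value) tokens
wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_cross_attention(dec, enc, wq, wk, wv, wo, num_heads)
print(out.shape, attn.shape)
```

Each output token is a weighted mixture of all encoder tokens, which is how a decoder pixel can draw on spatial features from anywhere in the domain.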
C. Loss function
In this study, flow estimation is achieved by minimizing a combination of a mean squared loss and a smooth loss.
We employ the total variation loss24 as the smooth loss. It penalizes differences between vertically and horizontally adjacent pixels, which suppresses image discontinuities and yields natural-looking flow. Note that γ is a constant that determines the contribution of the smooth loss.
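A minimal sketch of such a combined objective is given below. The anisotropic (absolute-difference) form of total variation, the use of means rather than sums, and the default weight `gamma=0.1` are assumptions made for illustration; the paper's exact normalization and weight are not recoverable from the text.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error over all pixels."""
    return float(np.mean((pred - target) ** 2))

def total_variation(img):
    """Anisotropic total variation: mean absolute difference between neighbors."""
    dv = np.abs(img[1:, :] - img[:-1, :])  # vertically adjacent pixels
    dh = np.abs(img[:, 1:] - img[:, :-1])  # horizontally adjacent pixels
    return float(dv.mean() + dh.mean())

def combined_loss(pred, target, gamma=0.1):
    """MSE plus gamma-weighted smoothness penalty (gamma value is hypothetical)."""
    return mse_loss(pred, target) + gamma * total_variation(pred)

target = np.zeros((2, 2))
checker = np.array([[0.0, 1.0], [1.0, 0.0]])  # maximally non-smooth 2x2 field
flat = np.full((2, 2), 0.5)                   # perfectly smooth field

print(combined_loss(checker, target))
print(combined_loss(flat, target))
```

The smooth field incurs no total variation penalty, whereas the checkerboard is penalized in both directions, which is the behavior the smooth loss is meant to encourage.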
D. Optimization
For optimization, we use the Adam25 optimizer with a polynomial decay schedule. We set the start learning rate to 2e-5, the end learning rate to 1e-1, and the power to 0.9.
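The schedule can be written out in plain Python using the formula documented for TensorFlow's `PolynomialDecay` schedule (without cycling). The start rate, end rate, and power are the values quoted above; `decay_steps` is not given in the text, so the value used here is purely hypothetical.

```python
def polynomial_decay(step, decay_steps, start_lr=2e-5, end_lr=1e-1, power=0.9):
    """Learning rate at `step` under a non-cycling polynomial schedule."""
    step = min(step, decay_steps)        # hold the end value after decay_steps
    frac = 1.0 - step / decay_steps      # goes from 1 down to 0
    return (start_lr - end_lr) * frac ** power + end_lr

DECAY_STEPS = 1000  # ASSUMPTION: not stated in the paper

for s in (0, 250, 500, 1000):
    print(s, polynomial_decay(s, DECAY_STEPS))
```

With the endpoints as printed, the rate moves from 2e-5 at step 0 toward 1e-1 at the final step, so this schedule increases rather than decreases over training.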
III. EXPERIMENTS
A. Dataset
For verification of our experiment, we employ two different datasets. The first one is the dataset of DeepCFD,14 whereas the other is our own dataset.15 Specifically, we utilize CFDTool26 to generate a dataset. CFDTool is a graphical user interface (GUI) in MATLAB that allows users to set up and solve fluid flow and heat transfer problems using various numerical methods. It also includes tools for visualizing the results of simulations. CFDTool is useful for analyzing complex fluid flow and heat transfer problems in fields such as fluid mechanics, heat transfer, and thermodynamics.
As shown in Figs. 2 and 3, we fix the inlet and outlet in a 2D space with a width of 200 and a height of 300. We encode the boundary condition elements as follows: (0) inner part of an obstacle, (1) cell, (2) no-slip wall, (3) inlet, (4) outlet, and (5) wall of an obstacle.
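The label scheme above can be illustrated by building a small boundary-condition array. This is a sketch under several assumptions: the enum names are hypothetical, label 1 is read as an ordinary fluid cell, the obstacle is rectangular, the inlet and outlet occupy the full left and right edges, and the grid is far smaller than the one used in the paper.

```python
from enum import IntEnum

class Cell(IntEnum):
    OBSTACLE_INTERIOR = 0
    FLUID = 1            # listed simply as "cell" in the text
    NO_SLIP_WALL = 2
    INLET = 3
    OUTLET = 4
    OBSTACLE_WALL = 5

def make_boundary_array(height, width, obstacle):
    """Build a label grid; obstacle = (top, left, bottom, right), inclusive."""
    top, left, bottom, right = obstacle
    grid = [[int(Cell.FLUID)] * width for _ in range(height)]
    for r in range(height):                 # inlet on the left, outlet on the right
        grid[r][0] = int(Cell.INLET)
        grid[r][width - 1] = int(Cell.OUTLET)
    for c in range(width):                  # no-slip walls along top and bottom
        grid[0][c] = int(Cell.NO_SLIP_WALL)
        grid[height - 1][c] = int(Cell.NO_SLIP_WALL)
    for r in range(top, bottom + 1):        # obstacle: wall on the rim, interior inside
        for c in range(left, right + 1):
            on_rim = r in (top, bottom) or c in (left, right)
            label = Cell.OBSTACLE_WALL if on_rim else Cell.OBSTACLE_INTERIOR
            grid[r][c] = int(label)
    return grid

grid = make_boundary_array(7, 9, obstacle=(2, 3, 4, 5))
for row in grid:
    print("".join(str(v) for v in row))
```

Encoding the geometry as integer labels keeps inlet, outlet, wall, and obstacle positions in a single image-like array that convolutional layers can consume directly.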
We collect data while changing two velocity components, as shown in Table I. The number of grid cells in each sample is about 10,300. We generated 1000 samples of fluid flow data using an automated MATLAB script and CFDTool. The samples included obstacles with random sizes that were located beyond the middle of the width, as shown in Fig. 2, where the left side represents the inlet and the right side represents the outlet. The code for generating the fluid flow datasets15 is available at https://github.com/HyoeunKang/cfdformer. We split the data into a training set of 800 samples, a validation set of 100, and a test set of 100.
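The 800/100/100 split can be reproduced with a short helper. Whether the original split was random or sequential is not stated, so the seeded shuffle here is an assumption, and the function name is hypothetical.

```python
import random

def split_indices(n_samples, n_train, n_val, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # everything left over
    return train, val, test

train, val, test = split_indices(1000, 800, 100)
print(len(train), len(val), len(test))
```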
B. Experimental settings
We perform distributed training on two Titan Xp GPUs with TensorFlow version 2.6.0. The batch size is set to 16. Moreover, we employ h = 4 parallel attention heads, hidden unit size dmodel = 256 for Multi-Head Attention, and dffn = 512 for FFN. Finally, we set the dropout rate to 0.1 to avoid overfitting.
C. Experimental results on DeepCFD dataset
Table II compares the results obtained with U-Net,22 TransUNet,27 DeepCFD,14 and CFDformer on the DeepCFD dataset.14 We report the mean squared error (MSE) of the best model from each architecture. As shown in Table II, CFDformer attains the lowest error on both steady-state velocity components: 0.23 for ux and 0.21 for uy, compared with 1.21 and 0.97 for DeepCFD, the strongest baseline.
Table II. Model performance comparison between CFDformer and baseline models.
| Estimation (MSE) | U-Net | TransUNet | DeepCFD | Ours (CFDformer) |
|---|---|---|---|---|
| ux | 3.1 | 1.51 | 1.21 | 0.23 |
| uy | 2.75 | 0.98 | 0.97 | 0.21 |
D. Experimental results on our dataset
We test the CFDformer’s performance by varying the initial conditions, as shown in Table I. The obstacle’s position is also randomly selected, as shown in Fig. 2.
In Fig. 4, the ground-truth data distribution is plotted against the CFDformer-modeled data distribution from 100 test samples. The approximated CFDformer solution on the test set produces data distributions with shapes very similar to those from the ground-truth CFD simulation for all quantities analyzed.
The histogram compares ground-truth CFD data distribution against CFDformer predicted distribution from 100 test samples.
As depicted in Fig. 5, the predicted values closely match the ground truth for most pixels. Because the smooth loss discourages abrupt changes between neighboring pixels, it trades away some accuracy near certain boundary lines. In addition, Fig. 6 shows that the model converges as learning proceeds.
E. Interpolation
DL-based accelerator models, including our CFDformer, are trained on the probability distribution of their datasets. A dataset can be regarded as a set of points in a vector space. Our training data are therefore bounded: they are confined to a finite region of the vector space28 called the interpolation zone.
We validate the universality of CFDformer using vectors from the interpolation zone, as shown in Fig. 7. We select two initial velocity values from within the range of our dataset. Figure 8 shows that our model makes reasonable predictions for condition values in this range that were not used in training.
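A simple way to check whether a test condition lies inside the range covered by training is a per-component bounds test, sketched below. This axis-aligned box is only a coarse stand-in for the interpolation zone (a convex-hull test would be stricter), and the velocity values are hypothetical, not those of Table I.

```python
def in_interpolation_zone(condition, training_conditions):
    """True if every component lies within the min/max seen in training."""
    dims = range(len(condition))
    lows = [min(c[d] for c in training_conditions) for d in dims]
    highs = [max(c[d] for c in training_conditions) for d in dims]
    return all(lows[d] <= condition[d] <= highs[d] for d in dims)

# Hypothetical (ux, uy) conditions used for training.
train_conditions = [(0.1, 0.0), (0.3, 0.1), (0.5, 0.2)]

inside = in_interpolation_zone((0.2, 0.05), train_conditions)
outside = in_interpolation_zone((0.7, 0.05), train_conditions)
print(inside, outside)
```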
Interpolation results for the two velocity components at initial velocity values selected from within the training range: (a) ux (m/s) and (b) uy (m/s).
The histogram compares ground-truth CFD data distribution against CFDformer predicted distribution from 100 samples for interpolation validation.
Furthermore, as shown in Fig. 9, the standard solver takes about 15 s per sample, excluding the time needed to generate geometries in the solver. In comparison, CFDformer performs inference on the entire set of samples in 8 s. Thus, CFDformer reduces the analysis time by 99.94%.
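The arithmetic behind the time reduction can be checked directly. The number of samples covered by the 8 s figure is not stated in this passage, so the sketch below evaluates two candidate counts as assumptions: the full dataset of 1000 samples and the test set of 100.

```python
def time_reduction_percent(solver_s_per_sample, surrogate_total_s, n_samples):
    """Percentage of solver time saved when the surrogate handles all samples."""
    solver_total = solver_s_per_sample * n_samples
    return 100.0 * (1.0 - surrogate_total_s / solver_total)

full_dataset = time_reduction_percent(15, 8, 1000)  # ASSUMPTION: 1000 samples
test_set = time_reduction_percent(15, 8, 100)       # ASSUMPTION: 100 samples
print(round(full_dataset, 2), round(test_set, 2))
```

With 1000 samples the reduction is about 99.95%, which is close to the reported 99.94%; with 100 samples it is about 99.47%.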
IV. CONCLUSIONS AND FUTURE WORK
In this paper, we presented CFDformer, a surrogate model for improving the accuracy of fluid flow approximations in computational fluid dynamics. Our model, which is based on a vision transformer and a U-shaped convolutional neural network, is able to handle both local and global features of input data and adjust boundary conditions. Through comprehensive experiments and comparisons, we demonstrated that CFDformer outperforms other baseline models and is able to approximate the full Navier–Stokes solutions with a high degree of accuracy, even in cases where the input vectors are not present in the training data.
One of the key contributions of our work is the demonstration of the generalization capabilities of our model. Our results suggest that CFDformer has strong generalization capabilities and has the potential to be used in a wide range of applications where accurate predictions are required, even in cases where the training data are limited.
Several factors might influence the generalization capabilities of our model. For example, the quality and diversity of the training data may affect the model’s ability to generalize to new situations. In future work, we plan to explore these factors in more detail and to identify strategies for improving the model’s performance and generalization. We also intend to extend the model to 3D laminar fluid flow simulations.
Overall, our study provides a novel approach for improving the accuracy of fluid flow approximations in computational fluid dynamics and has the potential to significantly reduce the computational cost of simulating complex flow scenarios. We believe that our work will be of interest to researchers in the field of artificial intelligence and computational fluid dynamics, and we hope that it will inspire further research in this area.
ACKNOWLEDGMENTS
This work was supported in part by the MSIT (Ministry of Science and ICT), Korea, under the Convergence security core talent training business (Pusan National University) support program (Grant No. IITP-2023-2022-0-01201) supervised by the IITP (Institute for Information and Communications Technology Planning & Evaluation) and in part by the BK21 FOUR Program by Pusan National University Research Grant, 2021-2022.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Hyoeun Kang: Conceptualization (lead); Formal analysis (equal); Methodology (lead); Project administration (equal); Resources (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Yongsu Kim: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Resources (equal); Visualization (equal); Writing – original draft (equal). Thi-Thu-Huong Le: Formal analysis (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal). Changwoo Choi: Conceptualization (equal); Formal analysis (equal); Methodology (equal); Validation (equal). Yoonyoung Hong: Data curation (equal); Formal analysis (equal); Investigation (equal). Seungdo Hong: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Project administration (equal); Resources (equal). Sim Won Chin: Funding acquisition (equal); Investigation (equal); Project administration (equal); Resources (equal); Supervision (equal). Howon Kim: Funding acquisition (equal); Project administration (equal); Resources (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available in CFDFormer at https://doi.org/10.5281/zenodo.7527624, Ref. 15.