A nonlocal subgrid-scale stress (SGS) model is developed based on the convolutional neural network (CNN), a powerful supervised data-driven approach that naturally incorporates spatial information through its wide receptive field. The CNN-based models used in this study take only primitive flow variables as input; the flow features are then extracted automatically without any *a priori* guidance. The nonlocal models, trained on direct numerical simulation (DNS) data of a turbulent channel flow at *Re*_{τ} = 178, are assessed in both *a priori* and *a posteriori* tests, providing reasonable flow statistics (such as mean velocity and velocity fluctuations) close to the DNS results even when extrapolating to a higher Reynolds number *Re*_{τ} = 600. The nonlocal models are found to outperform local data-driven models, such as the artificial neural network, and some typical SGS models (e.g., the dynamic Smagorinsky model) in large eddy simulation (LES). The models are also numerically robust: stable solutions are obtained when the grid resolution is varied from one-half to double the spatial resolution used in training. We also investigate the influence of the receptive field and propose two-point correlation analysis as a quantitative method to guide the design of nonlocal physical models. The present study provides effective data-driven nonlocal methods for SGS modeling in LES of complex anisotropic turbulent flows.

## I. INTRODUCTION

Large eddy simulation (LES) is a powerful tool for turbulence simulation. The flow field in LES is decomposed into resolved and unresolved parts. The resolved large-scale turbulent motions are solved directly from the Navier–Stokes equations, while the unresolved part is modeled by subgrid-scale (SGS) stress models. In recent decades, a variety of SGS models have been developed to establish mappings from the resolved flow variables to the SGS stress tensor. The model proposed originally by Smagorinsky^{1} assumes that the SGS stresses are linearly related to the resolved strain-rate tensor. This model is simple and effective but purely dissipative, since no backward energy transfer from the subgrid scale to the resolved scale is allowed. The dynamic models^{2–4} are motivated by this defect of the Smagorinsky model: the local value of the Smagorinsky coefficient is determined dynamically in different flow regimes, and backscatter can be predicted. In practice, the backward energy transfer from residual motions to the resolved flow field is sometimes clipped by setting the negative eddy viscosity *ν*_{t} to zero for numerical stability.^{5,6} The dynamic models are generally time-consuming because of the additional filtering operation. On the other hand, the gradient model^{7,8} and the similarity model^{9,10} are not sufficiently dissipative and are prone to numerical instability in actual LES.
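As a concrete reference point, the Smagorinsky closure described above can be sketched in a few lines. The following numpy example is a hedged illustration only: the grid sizes, the value of the constant, the central-difference gradients, and the function name are our assumptions, not the paper's implementation.

```python
import numpy as np

def smagorinsky_stress(u, v, w, dx, dy, dz, cs=0.1):
    """Illustrative Smagorinsky closure on a triply periodic grid.

    tau_ij = -2 nu_t S_ij with nu_t = (cs * Delta)**2 * |S| and
    |S| = sqrt(2 S_ij S_ij); gradients use second-order central
    differences with periodic wrap-around (demonstration only).
    """
    vel = (u, v, w)
    h = (dx, dy, dz)
    # grad[i][j] = d(vel_i)/d(x_j)
    grad = [[(np.roll(vel[i], -1, axis=j) - np.roll(vel[i], 1, axis=j))
             / (2.0 * h[j]) for j in range(3)] for i in range(3)]
    S = [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]
    s_mag = np.sqrt(2.0 * sum(S[i][j] ** 2 for i in range(3) for j in range(3)))
    delta = (dx * dy * dz) ** (1.0 / 3.0)  # filter width from the cell volume
    nu_t = (cs * delta) ** 2 * s_mag       # eddy viscosity, nonnegative by construction
    tau = {(i, j): -2.0 * nu_t * S[i][j] for i in range(3) for j in range(3)}
    return tau, nu_t
```

Because *ν*_{t} ≥ 0 by construction, the modeled stress is purely dissipative, which is exactly the defect that motivates the dynamic procedure.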

Recently, data-driven approaches have been introduced to turbulence modeling,^{11–18} and many data-driven models for LES have emerged.^{19–26} Gamahara and Hattori^{27} attempted to construct SGS models from filtered direct numerical simulation (fDNS) data of turbulent channel flow using the artificial neural network (ANN). In their work, the strain-rate tensor, rotation tensor, and distance from the wall are selected as the input features of the ANN, yet the ANN shows no advantage over the Smagorinsky model in the *a posteriori* test. Wang *et al.*^{28} compared the performance of two machine learning (ML) algorithms, i.e., random forests and ANN; they found that the ANN is better for SGS modeling, while random forests are helpful for input feature selection. In addition, Maulik *et al.*^{29} used the blind deconvolution method to recover unfiltered variables from the corresponding filtered ones. For compressible flows, Xie *et al.*^{30} applied an ANN mixed model to predict the SGS stresses and the SGS heat flux of compressible isotropic turbulence.

Most models mentioned above work in a pointwise manner, i.e., only local information is considered in the prediction. Some researchers^{31,32} choose a multi-point stencil as the input of the ANN to include spatial information. The work of Park and Choi^{33} shows that a model with multi-point input can obtain higher correlation coefficients than a single-point input in the *a priori* test but may suffer from numerical instability in the *a posteriori* simulation. In addition, within the ANN framework, as shown in Fig. 1(a), increasing the number of stencil grid points usually means more input features, a larger model size, and lower efficiency. Apart from that, feature engineering is a common issue for ANN-based models no matter how many grid points are included.

Some previous studies^{33,34} have revealed that input feature selection is crucial to the success of ANN-based models and that extra spatial information helps improve model performance. In the machine learning field, the convolutional neural network (CNN) is another powerful tool that works well in feature extraction tasks and naturally incorporates spatial topology information. Based on CNN, we can simply use primitive variables as input and let the machine learn a model from the DNS data automatically, without any assumption. As depicted in Fig. 1(b), the convolution operation in CNN is similar to the filter operation in physics. The combination of convolutional layers enables CNN to extract information such as first- or higher-order derivatives of the considered variables, which may provide a basis for understanding CNN in a physical sense.^{35} Furthermore, CNN has a much wider scope than ANN with stencil-type input. The scope size increases with the depth of the network, and the prediction performed at the last layer is based on information from a block of the first layer rather than a single point.

Because of the above-mentioned advantages, CNN has started to be used in subgrid-scale modeling.^{36,37} Typically, Beck *et al.*^{38} constructed a mapping from coarse-grid quantities to the SGS force with CNN in decaying homogeneous isotropic turbulence. Their *a priori* test results indicate that the CNN-based models can obtain better results than ANN-based ones. Pawar *et al.*^{39} conducted an *a priori* analysis of LES of two-dimensional Kraichnan turbulence using different data-driven parameterizations, including CNN, ANN with point-to-point mapping, and ANN with neighboring stencil data mapping. They found that CNN can provide accurate predictions with less computational overhead. Although these *a priori* tests have demonstrated the potential of CNN-based models, further tests in practical simulations are absent.

Motivated by CNN's advantages of a large receptive field in physical space and its feature extraction ability, as well as the lack of relevant studies, we aim to advance the research on nonlocal data-driven models for anisotropic flow and develop a concise, efficient SGS model using CNN. However, some obstacles, mainly related to efficiency and maneuverability, prevent the application of CNN in actual LES (i.e., the *a posteriori* test). First, CNN is quite slow when running serially on the Central Processing Unit (CPU); it shows a speed advantage only when running in a highly parallelized manner. However, most Computational Fluid Dynamics (CFD) programs are written for the CPU and cannot meet this demand. Second, Python is the most widely used language for constructing machine learning models such as CNN, and various Python open-source libraries make programming convenient. Nevertheless, CFD codes are mainly written in Fortran or C++, so mixed-language programming may be a troublesome issue when embedding CNN models into realistic LES. One solution is writing both the ML and CFD codes in the same programming language; however, it may take a lot of time to rewrite the source code, and the efficiency problem remains. Another solution is mixed-language programming, but then the CFD and ML processes have to run on the same device or machine.

To overcome the above-mentioned obstacles, we propose the heterogeneous machine-learning CFD (HML-CFD) framework based on inter-process communication technology that breaks the limitation of programming language and running machines. In this framework, the ML process [usually runs on Graphics Processing Unit (GPU), written in Python] and the CFD process [usually runs on Central Processing Unit (CPU), written in Fortran/C++] implemented in different programming languages can work on different devices to leverage the concise and nonlocal property of CNN while maintaining efficiency.

The main contributions of this work are as follows:

- We propose CNN-based nonlocal SGS models for anisotropic flow. CNN has built-in invariance to spatial displacements and is especially suitable for structured tensors.^{35}
- We show that the CNN-based models can automatically extract features simply from primitive flow variables, remain stable in *a posteriori* LES, and have an advantage over ANN-based SGS models in terms of the receptive field.
- We study the influence of the receptive field on the models' performance. Through two-point correlation analysis, we elucidate the physical meaning of the CNN kernel size and relate the design of data-driven models to the physical problem.
- We propose the HML-CFD framework, which provides a convenient way to embed ML algorithms into CFD programs. With this framework, the CNN-based models can work as efficiently as traditional models.

The rest of the paper is organized as follows: In Sec. II, we will introduce the numerical methods of ML and CFD, as well as the details of the HML-CFD framework. The results of *a priori* and *a posteriori* tests in turbulent channel flow are discussed in Secs. III A and III B, respectively, and then the investigation of numerical stability and efficiency is given in Sec. III C. Finally, concluding remarks are addressed in Sec. IV.

## II. NUMERICAL METHODS AND HML-CFD FRAMEWORK

The purpose of this study is to establish a mapping from the velocity field (*u*, *v*, *w*) to the SGS stress tensor *τ*_{ij} using CNN-based models. The datasets used for training and testing the ML models come from the DNS of turbulent channel flow. Such wall-bounded turbulent flow exhibits a variety of flow characteristics, which increases the difficulty of modeling; we therefore study the ML-based SGS models in wall-bounded turbulent flow here.

### A. Details of DNS

The governing equations for the DNS of incompressible turbulent channel flow are given as

$$\nabla \cdot \mathbf{u} = 0, \qquad \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \nu \nabla^2 \mathbf{u},$$

where **u** is the velocity vector, *p* is the pressure, and *ν* is the kinematic viscosity. The simulation is performed with periodic boundary conditions in the streamwise (*x*) and spanwise (*z*) directions, and the no-slip boundary condition is employed at the top and bottom walls in the wall-normal (*y*) direction.

To solve the governing equations, a third-order Runge–Kutta scheme and sixth-order compact schemes are applied for time integration and spatial derivatives, respectively. A constant mass flux is maintained in the channel. The simulations are performed at *Re*_{τ} = 178 and 600 with the open-source flow solver Xcompact3d.^{40} Here, *Re*_{τ} is the friction Reynolds number defined by the wall friction velocity *u*_{τ}, the kinematic viscosity *ν*, and the channel half-width *δ*, while the bulk Reynolds number is denoted as *Re*_{b}. The computational domain sizes of the DNS in the three dimensions are *L*_{x} = 4*π*, *L*_{y} = 2, and *L*_{z} = 2*π* for the *Re*_{τ} = 178 case. The other computational parameters are listed in Table I. The validity of the DNS is shown in Fig. 2.

| *Re*_{b} | *Re*_{τ} | *N*_{x}, *N*_{y}, *N*_{z} | *L*_{x}, *L*_{y}, *L*_{z} | $\Delta x^{+}, \Delta y^{+}_{min}, \Delta z^{+}$ |
|---|---|---|---|---|
| 4200 | 178 | 384, 129, 192 | 4π, 2, 2π | 4.4, 1.0, 4.4 |
| 16 800 | 600 | 384, 321, 384 | 2π, 2, π | 9.8, 1.0, 4.9 |


The DNS data at *Re*_{τ} = 178 are used to generate input and output data pairs for the ML-based models. The real SGS stresses can be calculated by the relation $\tau_{ij}^{fDNS} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j$, where the overbar denotes the filtering operation; the mapping function from the filtered velocity field $(\bar{u}, \bar{v}, \bar{w})$ to $\tau_{ij}^{fDNS}$ can then be learned by the models. To obtain the filtered data from the DNS result, the Gaussian filter is used in the wall-parallel (i.e., *x* and *z*) directions. The filter function for the one-dimensional Gaussian filter is

$$G(r) = \sqrt{\frac{6}{\pi \Delta^2}}\, \exp\!\left(-\frac{6 r^2}{\Delta^2}\right),$$

where Δ is the filter width. In this work, the filter widths Δ_{x} and Δ_{z} in these two directions are 8Δ_{DNS} and 4Δ_{DNS}, respectively.
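The label-generation step can be sketched with numpy. In the example below, the discrete kernel truncation at ±Δ, the array layout (*x*, *y*, *z*), and the function names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def gaussian_filter_periodic(f, delta, dx, axis):
    """Filter f along `axis` with a discrete Gaussian kernel of width delta.

    The kernel follows the common LES form
    G(r) = sqrt(6 / (pi * delta**2)) * exp(-6 r**2 / delta**2),
    truncated at +/- delta (an assumption of this sketch) and renormalized
    to sum to one; periodic wrap-around matches the homogeneous directions.
    """
    half = max(1, int(round(delta / dx)))
    r = np.arange(-half, half + 1) * dx
    g = np.sqrt(6.0 / (np.pi * delta ** 2)) * np.exp(-6.0 * r ** 2 / delta ** 2)
    g /= g.sum()
    out = np.zeros_like(f)
    for shift, weight in zip(range(-half, half + 1), g):
        out += weight * np.roll(f, -shift, axis=axis)
    return out

def sgs_stress_label(ui, uj, delta_x, delta_z, dx, dz):
    """tau_ij^fDNS = bar(ui*uj) - bar(ui)*bar(uj), filtered in x and z."""
    def filt(f):
        f = gaussian_filter_periodic(f, delta_x, dx, axis=0)     # streamwise
        return gaussian_filter_periodic(f, delta_z, dz, axis=2)  # spanwise
    return filt(ui * uj) - filt(ui) * filt(uj)
```

For a spatially uniform field the stress label vanishes identically, which is a useful sanity check on the implementation.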

The training dataset is sampled at every eighth and every fourth grid point in the *x* and *z* directions, respectively, so that the CNN can be trained with a coarse grid resolution similar to that of the actual LES. All grid points in the *y* direction are sampled except those on the walls. The grid sizes used in the DNS of the *Re*_{τ} = 178 case are *N*_{x}, *N*_{y}, *N*_{z} = 384, 129, 192. Using the above data sampling method, we collect sample data from 200 instantaneous snapshots with *N*_{x}/8 × (*N*_{y} − 2) × *N*_{z}/4 grid points each, based on the DNS at *Re*_{τ} = 178. The sampled data from 80% of the snapshots (i.e., 160 snapshots) are used for training, 10% (i.e., 20 snapshots) for validation, and 10% (i.e., 20 snapshots) for the *a priori* test in Sec. III A. The training data are normalized to zero mean and unit variance. Based on our tests, we also find that no noticeable improvement in performance is attained when more training data are used.

### B. CNN-based SGS models

The artificial neural network and the convolutional neural network are among the most widely used tools in the machine learning field. As shown in Figs. 1(a) and 1(b), there are multiple layers between the input and output layers of both networks; each layer's input comes from the output of the preceding layer. In ANN, each neuron in one layer is usually connected to all neurons in the preceding layer, and the computation of a layer's output can be formalized as

$$X_i^l = \phi\left(\sum_{j=1}^{f_n'} W_{ij}^l X_j^{l-1} + b_i^l\right),$$

where $X_i^l$ denotes the *i*th output in layer *l*, *ϕ* is the nonlinear activation function, $f_n'$ is the number of hidden neurons in layer *l* − 1, and *W*^{l} and *b*^{l} are the trainable parameters called weights and biases, respectively.

For CNN, each neuron only receives a restricted region of data from the previous layer, called the neuron's receptive field. The trainable weights in CNN are called filters; they are shared across the neurons in the same layer and are convolved with the receptive fields iteratively to produce the feature maps. The convolution operation is expressed as

$$X_{i,j,k}^{l} = \phi\left(\sum_{m=1}^{f_h} \sum_{n=1}^{f_w} \sum_{k'=1}^{f_n'} W_{m,n,k',k}^{l}\, X_{i+m-1,\, j+n-1,\, k'}^{l-1} + b_k^{l}\right),$$

where *f*_{h} and *f*_{w} denote the height and width of the filters, respectively; $f_n'$ is the depth of the filters in the previous layer (layer *l* − 1); $X_{i,j,k}^{l}$ is the output of the neuron located at row *i*, column *j* in feature map *k* of convolution layer *l*; $W_{m,n,k',k}^{l}$ is the connection weight between feature map *k*′ (layer *l* − 1) and feature map *k* (layer *l*); and $b_k^{l}$ denotes the corresponding bias term. Weight sharing makes the extraction of shift-invariant and hierarchical features possible in deep neural networks. Because of this distinct character, CNN is ideal for data with a grid topology, such as images and CFD data.

As shown in Fig. 1(b), the prediction on a point in the feature map *X*^{l+1} totally relies on its receptive field (e.g., 3 × 3) on the feature map *X*^{l}, while the prediction of a point in *X*^{l} depends on its receptive fields (e.g., 3 × 3) on *X*^{l−1}, which means a 5 × 5 size receptive field in layer *l* − 1 contributes to the prediction of the point in layer *l* + 1. With the deepening of the neural network, the receptive field gets larger. Generally, the receptive field of a CNN with kernel size *f*_{h} × *f*_{w} and *L* hidden layers is [*L*(*f*_{h} − 1) + 1] × [*L*(*f*_{w} − 1) + 1]. In other words, information from a [*L*(*f*_{h} − 1) + 1] × [*L*(*f*_{w} − 1) + 1] size regime around a point directly or indirectly influences the final prediction on the point. Therefore, we could use CNN to develop a nonlocal modeling method.
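This receptive-field bookkeeping is simple enough to verify with a few lines (a sketch; the function name is ours):

```python
def receptive_field(kernel_size, n_hidden_layers):
    """Side length (in grid points) of the receptive field of a stack of
    stride-1 convolutions with square kernels: each layer widens the
    field by (kernel_size - 1)."""
    return n_hidden_layers * (kernel_size - 1) + 1
```

With five hidden layers, kernels of size 3, 5, and 7 give receptive fields of 11 × 11, 21 × 21, and 31 × 31, matching the models examined in Sec. III.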

For the SGS stress modeling problem, a CNN-based model can directly handle the raw field variables *u*, *v*, and *w* without destroying the spatial topology, and flow field feature extraction is automatic. For ANN, in contrast, feature selection is hand-engineered and depends on *a priori* knowledge. Moreover, an ANN-based model can only include spatial information by adding more neighboring grid points to the input features, which multiplies the number of input features and enlarges the model size.

The training data are the same for both ANN and CNN but are arranged in different shapes. ANN makes predictions in a point-to-point way, i.e., it takes the physical values on a grid point (data shape *N*_{feature}, where *N*_{feature} is the number of input features) and predicts *τ*_{ij} on this grid point, while CNN works in a block-to-block way. CNN can take flow variables on an *x*–*z* plane [data shape $N_x^{LES} \times N_z^{LES} \times N_{feature}$, corresponding to a two-dimensional (2D) convolution kernel] or a whole three-dimensional (3D) flow field [data shape $N_x^{LES} \times N_y^{LES} \times N_z^{LES} \times N_{feature}$, corresponding to a 3D convolution kernel] as the input. The grid of turbulent channel flow is usually uniform in the *x* and *z* directions but nonuniform in the *y* direction, and the traditional 3D convolution kernel cannot represent the nonuniform grid spacings in different directions. In addition, the 2D convolution kernel requires fewer parameters and is thus more efficient than the 3D one. Therefore, we choose the *x*–*z* plane type input in this work. Specifically, the training data are fed into the CNN in the shape of 48 × 48 × 3 (=*N*_{x}/8 × *N*_{z}/4 × *N*_{feature}). The information in the *y* direction is contained in the training data sampled at different *y* locations. This kind of input can be applied to different grid resolutions in the *y* direction, making it more flexible.

The CNN used in this work consists of five hidden layers, as shown in Fig. 3. The input is the three components of velocity (*N*_{feature} = 3), while the output is the corresponding six components of the SGS stresses *τ*_{ij}. The filter kernels in each hidden layer (except the last one) consist of *f*_{n} feature maps of size *f*_{h} × *f*_{w}. Here, *f*_{h} and *f*_{w} are the kernel sizes in each layer, which can be regarded as a measure of the receptive field. The square filter (*f*_{h} = *f*_{w}) is the most widely used in practice and is also adopted in this study. A series of CNN-based models with different filter depths and sizes are examined in Sec. III A. For comparison, an ANN with three hidden layers (*f*_{n} neurons per hidden layer) is also employed. Note that input feature selection is crucial for ANN-based models, but it is not the focus of this paper. The velocity gradient tensor *∂u*_{i}/*∂x*_{j} is chosen as the input feature of the ANN (*N*_{feature} = 9), since it is widely used in previous work,^{27,28,33} while the output is also *τ*_{ij}. The activation function *ϕ* for both models is the exponential linear unit (ELU) defined as

$$\phi(x) = \begin{cases} x, & x > 0, \\ \alpha\left(e^{x} - 1\right), & x \le 0, \end{cases}$$

where *α* is a positive constant.

There is no activation function in the last layer of both ANN and CNN.
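The forward pass of such a network can be sketched in numpy. The periodic "same" padding, channel widths, weight initialization, and function names below are illustrative assumptions, not the trained model:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit; expm1 avoids overflow for negative inputs."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def conv2d_periodic(x, w, b):
    """'Same' 2D convolution with periodic wrap-around.
    x: (H, W, c_in); w: (kh, kw, c_in, c_out); b: (c_out,)."""
    kh, kw, c_in, c_out = w.shape
    out = np.zeros(x.shape[:2] + (c_out,))
    for m in range(kh):
        for n in range(kw):
            shifted = np.roll(x, (kh // 2 - m, kw // 2 - n), axis=(0, 1))
            out += (shifted.reshape(-1, c_in) @ w[m, n]).reshape(out.shape)
    return out + b

def cnn_forward(vel_plane, params):
    """Layer stack with ELU after every layer except the last."""
    x = vel_plane
    for i, (w, b) in enumerate(params):
        x = conv2d_periodic(x, w, b)
        if i < len(params) - 1:
            x = elu(x)
    return x

def init_params(k=3, f_n=8, n_in=3, n_out=6, n_layers=5, seed=0):
    """Random weights for a k x k, n_layers-deep stack: n_in -> f_n ... -> n_out."""
    rng = np.random.default_rng(seed)
    widths = [n_in] + [f_n] * (n_layers - 1) + [n_out]
    return [(0.1 * rng.standard_normal((k, k, widths[i], widths[i + 1])),
             np.zeros(widths[i + 1])) for i in range(n_layers)]
```

Because the convolution is translation equivariant, shifting the input plane shifts the predicted stresses by the same amount, which is the built-in invariance to spatial displacements mentioned in Sec. I.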

For both the ML models, the objective function to minimize is the mean square error (MSE) loss function denoted as

$$\mathrm{MSE} = \frac{1}{N_{data}} \sum \left(\tau_{ij}^{pred} - \tau_{ij}^{fDNS}\right)^2,$$

where $\tau_{ij}^{pred}$ is the output of the ML models, $\tau_{ij}^{fDNS}$ is computed from the fDNS data using the relation $\tau_{ij}^{fDNS} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j$, and *N*_{data} is the total number of training samples. The Adam optimizer^{43} is employed to minimize the loss function.

### C. HML-CFD framework

It is known that ML algorithms (e.g., ANN and CNN), written mostly in Python, are suited to devices with a highly parallel structure such as the GPU, while most CFD algorithms, written in Fortran/C++, run on the CPU. When applying ML models to realistic numerical simulations, we may therefore encounter problems of device selection and mixed-language programming. The HML-CFD framework is designed to combine ML models and CFD simulations concisely and efficiently. As depicted in Fig. 4, the framework consists of a client and a server. The client is the CFD simulation process demanding unclosed terms or undetermined coefficients, while the server is the ML process waiting for requests from the client. The client sends fluid information to the server and then waits for the prediction of the pre-trained ML models; the CFD process continues after the predicted result is returned. The two processes exchange messages through an inter-process communication technology called sockets. A socket is one endpoint of a two-way communication link between two programs running on a network; sockets allow communication between two processes on the same or different machines. With the HML-CFD framework, we can focus on the development of the CFD process, while the ML process runs on efficient parallel devices such as GPUs or cloud computing resources.

In the case of CNN-based SGS models, the CFD process initially sends the fluid field data *u*, *v*, and *w* to the ML process, and then the ML model predicts the SGS stresses *τ*_{ij} according to the received messages and returns the predictions to the CFD process. This procedure repeats until the end of the CFD simulation. Owing to the independence of the two program processes, the CFD program and the ML program can run on different machines using different languages. In this way, we can take advantage of parallel computing in the computation of ML models and avoid the trouble of mixed-language programming. Hence, our proposed technique is applicable to the actual LES efficiently.
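A minimal sketch of this exchange with Python's standard socket module follows; the length-prefixed float64 message format and the function names are our illustrative choices, not the paper's actual protocol:

```python
import socket
import struct
import threading
import numpy as np

def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf

def send_array(sock, arr):
    """8-byte big-endian length header followed by raw float64 bytes."""
    payload = np.ascontiguousarray(arr, dtype=np.float64).tobytes()
    sock.sendall(struct.pack(">Q", len(payload)) + payload)

def recv_array(sock, shape):
    (size,) = struct.unpack(">Q", _recv_exact(sock, 8))
    return np.frombuffer(_recv_exact(sock, size), dtype=np.float64).reshape(shape)

def ml_server(sock, model, shape):
    """One round trip: receive a velocity field, reply with the prediction."""
    send_array(sock, model(recv_array(sock, shape)))
```

In a real run the server loops over time steps; the same functions work unchanged whether the two endpoints are threads on one machine (e.g., via `socket.socketpair()`) or a Fortran/C++ client and a GPU server on different hosts.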

In this work, the CFD simulations are implemented in Fortran on an Intel Core i7-9700K (CPU), while the ML algorithms are implemented with the open-source library PyTorch^{44} on an NVIDIA GeForce GTX 2080 (GPU).

## III. RESULTS

To examine the performance of the ML-based models, *a priori* and *a posteriori* tests are conducted in Secs. III A and III B, respectively. In the *a priori* test, the SGS stresses are predicted by the ML models (ANN and CNN) with input variables from the fDNS at *Re*_{τ} = 178. LESs with two traditional models, i.e., the dynamic Smagorinsky (DSM) model and the wall-adapting local eddy-viscosity (WALE) model,^{45} are also performed for comparison. The WALE model is an algebraic eddy-viscosity model that recovers the correct wall-asymptotic behavior and is thus suitable for wall-bounded flows such as channel flow. In the *a posteriori* test, all the models are applied in actual LESs. Finally, the numerical stability and computational efficiency are discussed in Sec. III C.

### A. *A priori* test

First, the pre-trained models are examined with the test data from the fDNS at *Re*_{τ} = 178. The correlation coefficient between the models' prediction $\tau_{ij}^{pred}$ and $\tau_{ij}^{fDNS}$ is defined as

$$\rho_{\tau} = \frac{\left\langle \left(\tau_{ij}^{fDNS} - \langle \tau_{ij}^{fDNS} \rangle\right) \left(\tau_{ij}^{pred} - \langle \tau_{ij}^{pred} \rangle\right) \right\rangle}{\sqrt{\left\langle \left(\tau_{ij}^{fDNS} - \langle \tau_{ij}^{fDNS} \rangle\right)^2 \right\rangle \left\langle \left(\tau_{ij}^{pred} - \langle \tau_{ij}^{pred} \rangle\right)^2 \right\rangle}},$$

which is used to assess the models' performance. In total, three CNN-based models, i.e., CNN-K3, CNN-K5, and CNN-K7, are examined here. CNN-K3 denotes that the size of the filters used in this network is *f*_{h} × *f*_{w} = 3 × 3, and CNN-K5 and CNN-K7 have analogous meanings.
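The correlation coefficient can be computed directly from the standard Pearson definition; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def corr_coeff(tau_pred, tau_ref):
    """Pearson correlation between predicted and reference stress fields,
    averaged over all sample points."""
    a = tau_pred - tau_pred.mean()
    b = tau_ref - tau_ref.mean()
    return float((a * b).mean() / np.sqrt((a ** 2).mean() * (b ** 2).mean()))
```

A perfect prediction gives ρ = 1, and the coefficient is invariant to any positive affine rescaling of the prediction, so it measures the shape of the stress field rather than its amplitude.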

To identify the relatively optimal parameters for the different data-driven models, the effect of the filter depth *f*_{n} (or the number of hidden neurons per layer for ANN) is investigated first. As shown in Fig. 5(a), *ρ*_{τ} rises slightly with increasing *f*_{n} up to a critical value, after which *ρ*_{τ} shows no noticeable change, indicating that a larger *f*_{n} is unhelpful. ANN is a local method with the velocity gradient tensor as its input feature; its *ρ*_{τ} is as large as about 0.97, indicating that the velocity gradient tensor is highly correlated with the SGS stresses. The nonlocal CNN-based models with the three velocity components as input, i.e., CNN-K3, CNN-K5, and CNN-K7, also show a high correlation (*ρ*_{τ} about 0.9) with the reference data, indicating that the CNN successfully extracts effective features related to the SGS stresses from the primitive flow variables. Comparing the nonlocal CNN-based models, we find that the models with larger kernel sizes (i.e., CNN-K5 and CNN-K7), which include more nonlocal information in the prediction, generally give better results when *f*_{n} is smaller than 32, since the fitting ability of CNN-K3 is not strong enough to construct the mapping well. Note that the kernel size determines the number of trainable parameters and the fitting ability; the ratio of parameter counts for the CNN-based models is CNN-K3:CNN-K5:CNN-K7 = 9:25:49 when *f*_{n} is fixed. However, as *f*_{n} exceeds 32, the trend reverses: CNN-K7 becomes the worst even though it has the most parameters. Two reasons may account for this. First, CNN-K7, with the most parameters, is harder to train than CNN-K3. Second, the receptive field of CNN-K7 is much larger than that of CNN-K3 and thus contains much more information irrelevant to the prediction of the SGS stresses, which increases the difficulty of extracting the effective features. The ML models are more likely to focus on useless information when the receptive field is too large, leading to poor extrapolation ability. Further discussion of the influence of the receptive field is given in Sec. III B.

Considering the balance of accuracy and efficiency, CNN-K3, CNN-K5, and CNN-K7 with *f*_{n} = 32 and ANN with *f*_{n} = 16 are selected as the test models in the following work. The variation of *ρ*_{τ} with *y* is depicted in Fig. 5(b). The *ρ*_{τ} of all models are relatively uniform in the outer layer but vary dramatically in the inner layer. This nonuniform performance may result from the fact that distinct flow characteristics exist in different flow regions, while only one model is used here to represent the features over the whole flow regime. The *ρ*_{τ} of the nonlocal data-driven models remain relatively high at all wall-normal distances, especially in the near-wall vicinity. The results of CNN-K3 and CNN-K5 are quite close and slightly higher than that of CNN-K7 in the outer layer, consistent with the result displayed in Fig. 5(a).

Therefore, the CNN-based models successfully extract the key flow features from the primitive variables, and the correlation coefficients *ρ*_{τ} are large for the predictions from both ANN and CNN. However, a high *ρ*_{τ} in the *a priori* test cannot guarantee success in actual LES,^{33} so it is necessary to assess the accuracy, stability, and efficiency of the SGS models in the *a posteriori* test.

### B. *A posteriori* test

The LES of turbulent channel flow with a constant mass flow at *Re*_{b} = 4200 (denoted LES178) and *Re*_{b} = 16 800 (denoted LES600) is performed using the different models. All LESs are carried out with the same numerical methods as the DNS described in Sec. II A. The grid spacing is uniform in the *x* and *z* directions but nonuniform in the *y* direction; the corresponding computational parameters of the LESs are listed in Table II. The time step of the LES is eight times that of the DNS. The values of *Re*_{τ} obtained from LES with ANN, CNN-K3, and CNN-K5 are close to that of DNS (less than 3% error) in the LES178 case, while CNN-K7 shows performance similar to the traditional models (around 4% error).

| Case | *Re*_{b} | (*N*_{x}, *N*_{y}, *N*_{z}) | *L*_{x}, *L*_{y}, *L*_{z} | $(\Delta \bar{x}^{+}, \Delta \bar{z}^{+})$ | SGS model | *Re*_{τ} |
|---|---|---|---|---|---|---|
| LES178 | 4200 | 24, 49, 24 | 2π, 2, π | 46.7, 23.4 | DSM | 170 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | WALE | 170 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ANN | 174 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K3 | 173 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K5 | 173 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K7 | 171 |
| LES600 | 16 800 | 42, 129, 42 | π, 2, π/2 | 44.8, 22.4 | DSM | 570 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | WALE | 570 |
| ⋯ | ⋯ | 42, 97, 42 | ⋯ | ⋯ | ANN | Diverged |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K3 | 581 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K5 | 581 |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | CNN-K7 | 589 |


Figure 6(a) compares the mean velocity profiles from the LES and fDNS of channel flow at *Re*_{b} = 4200. All the models capture the trend well and generate similar results. CNN-K3 and CNN-K5 are slightly better than WALE, but the difference is not obvious. In terms of velocity fluctuations, the nonlocal models also perform well, providing fairly good agreement with the reference data in the *x* direction but an obvious deviation in the *y* direction. It is difficult to capture the fluctuations in the *y* direction since their magnitude is small compared to the streamwise fluctuations, especially in the near-wall regions. On the other hand, ANN slightly underpredicts the rms velocity fluctuations in the streamwise direction, while the DSM and WALE models overpredict them. Therefore, all the SGS models generate reasonable solutions in general, and the nonlocal data-driven models show an advantage over the traditional models.

The previous test is conducted at the same Reynolds number (*Re*_{b} = 4200) as that of the training data. To evaluate the extrapolation ability of the ML models, the *a posteriori* test is also carried out at a higher Reynolds number, *Re*_{b} = 16 800. The ANN-based model diverges in this case: numerical stability is a major issue for local data-driven methods such as the ANN-based model. As demonstrated in previous work,^{33} most ANN-based models face the numerical divergence problem without special treatments (e.g., clipping backscatter). The DSM and WALE models are numerically stable over a wide range of Reynolds numbers, which is the advantage of traditional models without backscatter. All the CNN-based models provide physically reasonable solutions, just as in the LES178 case. The values of *Re*_{τ} from LES600 are well predicted by the nonlocal models (about 3% error), while the two traditional models underpredict them. Figures 7(a) and 7(b) show that the nonlocal models capture the mean velocity profile and the rms of the streamwise velocity fluctuations well, although they generate fewer fluctuations than the reference data in the spanwise and wall-normal directions. The performance of the DSM and WALE models is similar, and both are worse than the nonlocal CNN-based models. The CNN-based models do not differ much from one another in the *a posteriori* test, which indicates that more parameters do not necessarily result in better performance when building physical models. The kinetic energy spectra predicted by the different models at *y*^{+} = 30 are also analyzed. For clarity, Fig. 8 only shows the results of the LESs with CNN-K3 and the WALE model. The predicted velocity spectra from CNN-K3 agree well with those of the DNS, which indicates that the data-driven models can generate physically reasonable results, as traditional models such as the WALE model do.

From the previous tests, we find that the nonlocal CNN-based models predict the SGS stresses in turbulent channel flow well in both the LES178 and LES600 cases, and their performance in the LES600 case surpasses both the local data-driven method (ANN) and the traditional models (DSM and WALE). The extra spatial information included contributes to the success of the SGS stress modeling. However, the comparison of CNN-K3 and CNN-K7 in both the *a priori* and *a posteriori* tests also suggests that overly large receptive fields do not help the modeling. To analyze the effect of receptive fields qualitatively and quantitatively, the two-point correlations *R* in the streamwise and spanwise directions, calculated from the DNS data at *Re*_{τ} = 178, are presented in Fig. 9. Two locations are selected, one close to the wall (*y*^{+} = 5) and the other close to the centerline (*y*^{+} = 148). As shown in Fig. 9, the two-point correlations tend to zero as the separation distance increases; *R* falls off more steeply in the spanwise direction than in the streamwise direction, since the streamwise vortices are elongated.
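The two-point correlations used here follow the standard definition $R_{ii}(r) = \langle u_i'(x)\,u_i'(x+r)\rangle / \langle u_i'^2\rangle$. As a minimal sketch (not the authors' code; the function name and interface are illustrative), the statistic can be computed from a fluctuation field along a periodic direction as follows:

```python
import numpy as np

def two_point_correlation(u, axis=0):
    """Normalized two-point autocorrelation of a velocity-fluctuation
    field (mean already removed) along a periodic direction.

    Returns R(r) for separations r = 0 .. N//2 grid spacings.
    """
    n = u.shape[axis]
    var = np.mean(u * u)
    seps = np.arange(n // 2 + 1)
    R = np.empty(len(seps))
    for k, r in enumerate(seps):
        # Periodic shift by r points along the chosen direction
        shifted = np.roll(u, -r, axis=axis)
        R[k] = np.mean(u * shifted) / var
    return R
```

By construction R(0) = 1, and the separation at which R drops below a chosen threshold gives a physical correlation length that can be compared against a model's nonlocal scope.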

In a CNN, the final prediction depends entirely on the information within its receptive field. As described in Sec. II B, the receptive field of a CNN with *f*_{h} × *f*_{w} kernels and *L* hidden layers is [*L*(*f*_{h} − 1) + 1] × [*L*(*f*_{w} − 1) + 1]; for CNN-K3 it is 11 × 11. Because the training data are organized in the shape of 48 × 48 × 3, the physical domain size corresponding to this receptive field is $\frac{11}{12}\pi \times \frac{11}{24}\pi$ ($4\pi \times \frac{11}{48} = \frac{11}{12}\pi$ and $2\pi \times \frac{11}{48} = \frac{11}{24}\pi$). In other words, the prediction of *τ*_{ij} at a grid point is affected by the flow in a $\frac{11}{12}\pi \times \frac{11}{24}\pi$ region around it. The receptive fields of CNN-K3, CNN-K5, and CNN-K7 are indicated by rectangles in Fig. 9. As shown in Fig. 9(a), the receptive field of CNN-K3 contains only the region within a physical distance of $\frac{11}{24}\pi$ in the *x* direction, which covers the information most correlated with the predicted point. For CNN-K5, the scope is larger and includes additional, less correlated information (*R*_{ii} < 0.3). CNN-K7 uses nearly $\frac{2}{3}$ of the flow field to make a prediction at a grid point, although most of that region is uncorrelated with the point, so much irrelevant flow information is included, which increases the difficulty of modeling and the risk of overfitting. The effect is even more obvious in the *z* direction: as seen in Fig. 9(b), the two-point correlations *R* nearly vanish once the separation exceeds 0.8 (except for *R*_{11} at *y*^{+} = 148), so the extra scope of CNN-K5 and CNN-K7 provides no additional useful information. This explains why CNN-K7 cannot surpass the other nonlocal CNN-based models despite having access to more information. Therefore, introducing a moderate amount of nonlocal information benefits SGS stress modeling, while excessive nonlocal information is unnecessary.
This correlation analysis provides a quantitative method to guide the design of nonlocal data-driven models. Furthermore, a larger kernel size generally leads to more trainable weights and a larger network, affecting the model's efficiency, as discussed below.
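The receptive-field arithmetic is simple enough to sketch. A short helper (illustrative names; *L* = 5 hidden layers is inferred from the stated 11 × 11 field of CNN-K3, and the 48-point training planes span the 4π streamwise domain) reproduces the sizes quoted in the text:

```python
import numpy as np

def receptive_field(kernel, hidden_layers):
    """Receptive-field width [L(f - 1) + 1] of L stacked
    stride-1 convolution layers with f x f kernels."""
    return hidden_layers * (kernel - 1) + 1

def physical_extent(kernel, hidden_layers, n_points=48, domain_length=4 * np.pi):
    """Physical length covered by the receptive field when the
    training planes have n_points grid points over domain_length."""
    return receptive_field(kernel, hidden_layers) / n_points * domain_length
```

Under these assumptions, CNN-K3, CNN-K5, and CNN-K7 have receptive fields of 11, 21, and 31 points, i.e., CNN-K7 spans 31/48 ≈ 2/3 of the plane, consistent with the discussion above.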

### C. The numerical stability and computational efficiency

In the preceding analysis, the ML-based models were examined in terms of accuracy, but numerical stability and computational efficiency are also crucial for SGS models. Thus, an additional *a posteriori* test is carried out for the nonlocal CNN-K3 model to investigate its numerical stability and the influence of grid resolution. We select a series of grid resolutions from roughly one-half to double the spatial resolution in the *x* and *z* directions used in training, $(\Delta \bar{x}^{+},\Delta \bar{z}^{+})=(46.7,23.4)$; the simulations are conducted in the channel flow at *Re*_{b} = 16 800 with domain size *L*_{x} × *L*_{y} × *L*_{z} = *π* × 2 × 0.5*π*. The grid point numbers *N*_{x} and *N*_{z} in the *x* and *z* directions are varied. Hereafter, we use *N*_{x} × *N*_{z} to denote the grid resolution, e.g., 32 × 32 representing *N*_{x} = 32 and *N*_{z} = 32. The grid numbers in the *y* direction are set as *N*_{y} = 65 for the coarser mesh cases (21 × 21 and 32 × 32) and *N*_{y} = 97 for the finer mesh cases (63 × 63 and 84 × 84). Note that the CNN-K3 model examined here is the same as that tested in Sec. III B, trained only on the DNS data at *Re*_{τ} = 178, while this test is conducted at a higher Reynolds number.

Figure 10 shows the predicted results at different grid resolutions. The result for the resolution 42 × 42 in Fig. 10(a), which matches the resolution used in training, agrees fairly well with the DNS data. As the resolution becomes coarser (32 × 32), the Reynolds shear stress predicted by the CNN-based model begins to deviate from the DNS result but remains within a rational range, while the predictions of the mean flow and velocity fluctuations remain accurate. When the resolution is as coarse as half that of the training data (21 × 21), the grid becomes too coarse for the CNN-based model to generate a correct result: the mean flow is overpredicted and the Reynolds shear stress is obviously underestimated. Although not accurate, the model remains numerically stable and does not diverge even at such a coarse resolution. Conversely, the simulations at finer resolutions are more stable and accurate than the coarser ones. Overall, the CNN-based model generates stable and reasonable solutions over a wide range of resolutions, exhibiting excellent numerical robustness.

Finally, Table III compares the models in terms of the number of parameters, multiply–accumulate operations (MACs), and time consumption in realistic simulations of the LES600 case with grid sizes (*N*_{x}, *N*_{y}, *N*_{z}) = (42, 97, 42). The number of parameters determines the model size, while the MACs denote the number of calculations in a forward pass of the neural network. As the kernel size increases, the number of parameters multiplies, as do the MACs. Note that the MACs for the data-driven models (measured in billions) are much larger than those of traditional models such as the Smagorinsky model, which means the models would be inefficient when run serially in practice, so parallel execution is necessary. When these data-driven models are embedded into realistic LESs, the discrepancies in time consumption among the nonlocal models are not as large as the MACs suggest, since the convolution operation is well optimized by mainstream machine learning libraries and parallel devices. Here, the ML algorithms and CFD simulations are performed on different devices (as introduced in Sec. II C) of the same machine in the HML-CFD framework. The time consumption of the LES without an SGS model (i.e., coarse DNS) under the same conditions is used to normalize the other models' time consumption. Through the parallelization of the ML algorithms, the nonlocal data-driven models, i.e., CNN-K3, CNN-K5, and CNN-K7, incur merely 37%, 40%, and 42% extra cost, respectively, which is comparable to the traditional models. These results show the potential of ML-based models to replace traditional models as an efficient tool in practice. The current simulations are small-scale; the HML-CFD framework will have a greater advantage in large-scale computing.

| SGS model | No model | ANN | CNN-K3 | CNN-K5 | CNN-K7 |
|---|---|---|---|---|---|
| Parameters (K) | 0.00 | 0.53 | 30.37 | 84.13 | 164.77 |
| MACs (G) | 0.00 | 0.85 | 5.20 | 14.40 | 28.19 |
| Time consumption | 1.00 | 1.67 | 1.37 | 1.40 | 1.42 |

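The scaling in Table III can be rationalized with back-of-envelope counts. The sketch below (hypothetical channel widths; the full architecture is given in Sec. II B, not restated here) counts parameters and MACs for a single stride-1 convolution layer:

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of one k x k convolution layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def conv2d_macs(c_in, c_out, k, height, width):
    """Multiply-accumulate operations for one forward pass over an
    height x width plane ('same' padding, stride 1): one MAC per
    weight per output location."""
    return c_in * c_out * k * k * height * width
```

Since each layer's cost is proportional to *k*², the parameter ratios in Table III (84.13/30.37 ≈ 25/9 and 164.77/30.37 ≈ 49/9) roughly track the kernel-area ratio at fixed channel widths, which is why CNN-K7 is about five times larger than CNN-K3.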

## IV. CONCLUDING REMARKS

Based on the convolutional neural network, several SGS models are developed and examined in turbulent channel flows. The CNN automatically extracts flow features from raw flow variables, thus avoiding the trouble of feature selection. The naturally nonlocal property of the CNN enables the prediction process to account for spatial correlations over a wide flow region and contributes to numerical stability in realistic LES. These nonlocal models are trained only with the DNS data at *Re*_{τ} = 178 and tested in both the *Re*_{τ} = 178 and 600 cases. In the *a priori* test, the correlation coefficients between *τ*_{ij} predicted by the nonlocal CNN-based models and $\tau_{ij}^{fDNS}$ reach about 0.9. In the *a posteriori* test, the CNN-based model accurately predicts the mean flow and the velocity fluctuations in the LES600 case, showing an advantage in accuracy over the ANN and the traditional models. The influence of the filter kernel size (or receptive field) of the CNN is also investigated: a receptive field of appropriate size incorporates spatial information into the prediction process, which benefits the SGS modeling, while too large a scope is unnecessary. The two-point correlation analysis explains the impact of kernel size from a physical perspective, providing a quantitative method to guide the design of nonlocal models. Moreover, the CNN-based model adapts well to different grid resolutions, with excellent numerical robustness. Owing to our HML-CFD framework, the CNN-based model can be as efficient as traditional models in practical simulations. This framework is suitable for various machine learning algorithms and physical problems and is not limited to the LES modeling problem.

Although the CNN has advantages in data fitting and feature extraction, it still has some shortcomings. One is that the CNN can readily be applied on structured grids but struggles with complex geometries or unstructured grids, because traditional convolutional kernels cannot encode unstructured topology. Data transformations or specially designed convolutional filters may help address this problem. The ANN-based model is more flexible in this respect since it works in a point-to-point manner. Apart from that, it remains a challenge for ML-based models to extrapolate to flow types not seen during training. Unsupervised and semi-supervised learning methods such as generative adversarial networks^{46,47} may help alleviate this problem. Interpretability is also a major issue for all data-driven approaches and will be a target of our future work.

## ACKNOWLEDGMENTS

The authors thank Professor Z.-H. Wan at USTC for useful discussions on the numerical results. This work was supported by the Natural Science Foundation of China (Grant Nos. 92052301 and 11621202), by the Science Challenge Project (Grant No. TZ2016001), and by Anhui NARI Jiyuan Electric Power Grid Technology Co. Ltd. through the Joint Laboratory of USTC-NARI. The present direct numerical simulations were performed on the supercomputing system at the Supercomputing Center of USTC.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES
