The Lattice Boltzmann Method (LBM) is an approach for modeling mesoscopic fluid flow and heat transfer, based on modeling distributions of particles moving and colliding on a lattice. Through a perturbative expansion of the Boltzmann equation, it recovers the macroscopic Navier–Stokes equations. We simulate natural thermal convection via LBM in a 2D rectangular box heated from below and cooled from above, and use the results as training, testing, and generalization datasets to build a deep learning model. GoogLeNet, a convolutional neural network, is used to classify the simulation results according to two parameters, the Rayleigh (Ra) and Prandtl (Pr) numbers, from a single snapshot of either the entire modeling field or a crop of it. For each fixed Pr in the range from 1 to 128, increasing by a factor of 2, we estimate Ra with an accuracy varying from 40% to 90%, depending on the chosen augmentation strategy. For each fixed Ra in the range from 10⁵ to 10⁹, increasing by a factor of √10, the method predicts Pr with a systematically lower accuracy, ranging from 30% to 80%. This approach has great potential for industrial applications, such as the control of industrial flows, as well as for scientific research on geophysical flows, including the transport of heat in the Earth's interior, oceans, and atmosphere.
I. INTRODUCTION
Convection, a mechanism of heat transfer in a fluid due to the diffusion–advection of heat, is a sophisticated phenomenon that has attracted much attention because of both its fundamental interest and its practical applications. Convection is categorized as natural or forced; natural convection is driven by external or internal heat sources, which create a buoyancy imbalance that causes warmer fluid to rise and cooler fluid to sink.1
Natural convection has a major role in many industrial and geophysical processes. Industrial examples include the transport of heat in chemical reactors and electronic cooling systems. Geophysical flow comprises heat transport in the atmosphere and the ocean, which determines weather, climate, and ocean circulation. In all these cases, the ability to control and anticipate the flow behavior requires real-time access to the critical parameters of the flow based on the boundary conditions and geometries.
Any pair of these three parameters uniquely determines the convection characteristics; the most commonly used are the Rayleigh (Ra) and Prandtl (Pr) numbers. With the present development of machine learning, it is tempting to examine the possibility of extracting Ra and Pr from the convection patterns. Our research is in line with previous work in which fundamental convection parameters such as yield stress and viscosity were extrapolated from numerical results2 and with attempts at generating depth profiles of steady-state convecting systems through machine learning.3–5 Our work more generally explores the prediction of fundamental fluid-dynamic parameters, which has applications in engineering as well as for the geo/planetary community.
Progress in machine learning leaves open several challenges and decisions on how to implement this task. First, there is no general recipe for creating an effective dataset designed to capture the fundamental characteristics of the physical model. Second, it is unknown whether the complexity emerging in fluid-dynamic systems when chaos, or turbulence, develops is an obstacle, due to its unpredictability, or an advantage, due to its richness of details, when combined with machine learning algorithms such as neural networks. Finally, the generation of datasets of computational fluid-dynamic simulations for machine learning applications requires an amount of computational resources that can be inaccessible to many scientists.6
In this work, we generate training and testing sets for a popular CNN algorithm, GoogLeNet, by simulating 2D natural thermal convection between two plates using the Lattice Boltzmann method (LBM). LBM allows modeling incompressible convection over a wide range of Ra and Pr numbers (Sec. II). By using traditional data augmentation,7 we either fix Ra or Pr and extract the other fundamental convection parameter (Sec. III). Finally, our approach is tested for different boundary conditions, sampling rates, and augmentation methods (Sec. IV).
II. SIMULATION
The Lattice Boltzmann method (LBM) is an alternative to classical numerical solutions of PDEs for modeling fluid dynamics, based on simulating the Boltzmann equation on a discrete lattice. It originated from the Lattice Gas method, in which it was proven that the Navier–Stokes equations are recovered at the macroscopic scale by modeling discrete particles moving and colliding on a discrete lattice.8 The more efficient Lattice Boltzmann Method was subsequently developed,9 in which particle number densities are moved on a discrete lattice, with collisions achieved by relaxing the number densities toward the Boltzmann equilibrium distribution.10 The LBM has since been used to model a wide range of fluid-dynamics problems,11 including mantle convection,12–15 magneto-hydrodynamics,16 multiphase flow in porous media,17,18 and aerospace engineering.19 In the following, we overview the thermal Lattice Boltzmann method, which can be used to simulate natural convection, and present 2D simulations with a linear rheology.
A. The thermal Lattice Boltzmann method
The thermal LBM models two distributions, one representing the mass density and one the energy density, moving one lattice spacing along the directions (i.e., orthogonal and diagonal) of a Cartesian lattice. After these number densities move by one lattice spacing, they are "relaxed" toward the Boltzmann equilibrium distribution. These two steps, applied to the distribution that relates to the fluid density, are found to yield the Navier–Stokes equations for incompressible fluids.9 The same two steps, applied to the distribution that relates to the temperature, yield the advection–diffusion equation for the energy density.20
(a) Space discretization of the 2D convection field and (b) weighted discretized velocity space of the D2Q9 lattice.
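To make the collide-and-stream update concrete, the following minimal Python sketch (our illustration, not the code used for the simulations in this paper) implements one BGK update of a single D2Q9 distribution; the thermal solver applies the same two steps to a second distribution carrying the energy density. Boundary conditions and the buoyancy forcing are omitted here.

import numpy as np

# D2Q9 lattice: 9 discrete velocities and their weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uz):
    """Second-order truncated Boltzmann equilibrium on the D2Q9 lattice."""
    feq = np.empty((9,) + rho.shape)
    for i, (cx, cz) in enumerate(c):
        cu = cx*ux + cz*uz
        feq[i] = w[i]*rho*(1 + 3*cu + 4.5*cu**2 - 1.5*(ux**2 + uz**2))
    return feq

def lbm_step(f, tau):
    """One collide-and-stream update of a distribution f of shape (9, nx, nz)."""
    rho = f.sum(axis=0)                                   # macroscopic density
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho      # macroscopic x-velocity
    uz = (f * c[:, 1, None, None]).sum(axis=0) / rho      # macroscopic z-velocity
    f += -(f - equilibrium(rho, ux, uz)) / tau            # BGK relaxation toward equilibrium
    for i, (cx, cz) in enumerate(c):                      # streaming: move each population one lattice spacing
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cz, axis=1)
    return f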
1. Boussinesq buoyancy term
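As a reference, and in a standard form consistent with the parameters g and β of Table I (the exact discretization used in the code may differ), the Boussinesq buoyancy force added to the vertical momentum reads

F_z = ρ g β (T − T_ref),

where T_ref is the reference (mean) temperature, so that fluid warmer than the mean rises and cooler fluid sinks.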
B. Simulation setup and results
To set up the simulation, we select a suitable range for the Ra and Pr numbers. For Pr, we choose the range 1–128 to encompass a broad range of fluids, from air to heavy oils.21 Convection patterns converge toward an asymptotic solution for higher Prandtl numbers, normally obtained for infinite Pr.22 We vary Ra in the range from 10⁵, which is sufficiently above the critical Rayleigh number for fixed boundary conditions to guarantee a rapid initiation of convection, to 10⁹. The maximum Ra is determined by the resolution, since high-Ra convection is characterized by progressively thinner boundary layers, which need to be well resolved for the validity of the solution.21
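For reference, the two control parameters follow their standard definitions in terms of the quantities listed in Table I:

Ra = g β (T_l − T_u) L³ / (ν κ),  Pr = ν / κ,

so that, for a fixed geometry and temperature contrast, each (Ra, Pr) pair is obtained by adjusting the kinematic viscosity ν and the thermal diffusivity κ.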
The values and ranges of the parameters used for the simulations.
Symbol | Definition | Value
ρ | Density | ≃1
g | Acceleration due to gravity | 0.42
β | Coefficient of thermal expansion | 0.1
T_l | Temperature of the lower plate | 0.0005
T_u | Temperature of the upper plate | −0.0005
L | The characteristic length | 1024
Pr | Prandtl number | [1, 128]
Ra | Rayleigh number | [10⁵, 10⁹]
ν | Kinematic viscosity | [6.7 × 10⁻³, 7.6]
κ | Thermal diffusivity | [5.9 × 10⁻⁴, 0.67]
The boundary conditions used in the simulations are either (i) fixed or (ii) periodic. With fixed boundary conditions, the temperature and both velocity components are set equal to 0 on the right and left sides of the modeling domain. With periodic boundary conditions, these values are the same on the left and right sides of the simulation domain.
All simulations in this work are executed on a lattice of resolution 1024 × 1024. Ra varies from 10⁵ to 10⁹, increasing by a factor of √10 for each fixed Pr number; the Pr number varies from 1 to 128, increasing by a factor of 2 at each iteration. Using √10 as the factor, the intermediate Ra values are approximated as 3.16 × 10ⁿ. For each pair of Ra and Pr, we run two simulations, one up to 2 × 10⁵ steps, saving the results every 1000 steps, and a second one up to 2 × 10⁶ steps, saving the solution every 10 000 steps. Three scalars are saved for each computing node: (1) the temperature field (T), (2) the x-component of the velocity (u_x), and (3) the z-component of the velocity (u_z). In this way, for every Ra and Pr number, two sets of 603 images are generated, each used to train a different neural network in a separate ML experiment.
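As an illustration of the parameter sweep and output cadence described above, the short sketch below enumerates the (Ra, Pr) grid and the snapshot schedule; the factor √10 for Ra, the loop structure, and the driver call are our reconstruction, not the original scripts.

import numpy as np

prandtl_values = [2**k for k in range(8)]                     # 1, 2, 4, ..., 128
rayleigh_values = [1e5 * np.sqrt(10)**n for n in range(9)]    # 1e5 ... 1e9, factor sqrt(10)

# Two runs per (Ra, Pr) pair: a short one sampled every 1e3 steps
# and a long one sampled every 1e4 steps (see Table II).
runs = [(int(2e5), int(1e3)), (int(2e6), int(1e4))]

for Pr in prandtl_values:
    for Ra in rayleigh_values:
        for duration, every in runs:
            snapshot_steps = range(0, duration + 1, every)    # 201 snapshots x 3 fields = 603 images
            # run_simulation(Ra, Pr, duration, snapshot_steps)  # hypothetical driver call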
Examples of results ranging from laminar to turbulent flow are shown in Fig. 2.
Simulation results for laminar and turbulent convection with different boundary conditions and time steps. (a) Convection field for and and fixed boundary conditions at , (b) Convection field for and and fixed boundary conditions at , (c) Convection field for and and periodic boundary conditions at , (d) Convection field for and and periodic boundary conditions at .
III. CONVOLUTIONAL NEURAL NETWORKS
A. Algorithm
In the last several years, mainly due to the advances of deep learning, and of convolutional networks in particular, the quality of image recognition and object detection has been progressing at an exceptional pace.24 Most of this progress is not just the result of more powerful hardware, expanded datasets, and larger models, but primarily a consequence of new ideas, improved algorithms, and innovative network architectures. The most successful methods of object detection did not emerge from more powerful deep networks alone, but from the combination of deep architectures and classical computer vision algorithms, such as the R-CNN algorithm by Girshick et al.25 In this paper, we use a widely adopted deep neural network architecture for computer vision, codenamed GoogLeNet,23 which is rooted in pioneering work by Lin et al.,26 who experimented with stacking micro neural networks to improve feature map generation.
The simplest way to improve the performance of a deep neural network is to increase its size (depth, i.e., the number of levels of the network, and width, i.e., the number of units per level). With a sufficiently large training set, this approach is successful, but with some caveats. First of all, if the training set is not sufficiently large, the large number of parameters makes the bigger network more prone to overfitting. In works like ours, the generation of the training dataset is constrained by the limited computational resources available. Moreover, because some of the numerical models produce similar results once a steady state is reached, the number of training images required increases.
Another aspect in which computational resources limit the size of a network suitable for practical application is that training a network requires resources that scale quadratically with the network size.27 On top of that, for specific and limited applications such as the one presented in this paper, where the number of categories is of the order of 10, many weights of the network are irrelevant: they would be set to values close to zero, implying a waste of computational resources.28 Overall, an optimal algorithm uses all the available computational resources, minimizing waste.
GoogLeNet addressed these challenges in Szegedy et al.23 as part of the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The proposed solution replaces fully connected architectures with primarily sparsely connected ones, which allows the network to flexibly adapt to features at many scales.29 Following this idea, inspired by the biological interpretation of neural networks (e.g., Arora et al.30), the optimal network topology is constructed layer by layer by clustering neurons with highly correlated outputs. In Ref. 23, this is described with the well-known Hebbian principle that "neurons that fire together, wire together."
From the computational point of view, the connectivity between neural layers can be either sparse or dense. Sparse connectivity is more challenging to implement efficiently than dense connectivity. Constantly improving numerical libraries for fast dense matrix multiplication, built on state-of-the-art hierarchical CPU/GPU hardware,31,32 allow rapid, hardware-driven progress of traditional dense computational approaches. For this reason, the Inception architecture (GoogLeNet) that we employ approximates sparse structures for vision networks with dense, computationally efficient components. This approach allows more uniformity of structures and filters by employing efficient dense computations.
A key aspect responsible for the quality of the classification is the choice of the dataset used for pre-training the network. Training a neural network such as GoogLeNet, characterized by a complex structure and a vast number of parameters, from scratch is computationally overly expensive and not necessary, given the common and well-tested use of pretraining.27 Even when pretraining is done with datasets radically different from the one for which the network is then used, as in our case, the result is effective because of the layered structure of the network. The network layers are responsible for detecting individual features, simpler for the lowest layers and more abstract for the apical ones. For instance, for a lower layer, the curvature of the temperature field in a fluid-dynamic simulation is not different from the curvature of the body of a car. Hence, a network pre-trained with a very large general-purpose dataset can be retrained much more rapidly than a blank network. Experience has shown that only the last layer needs to be reconstructed, based on the specific dataset of interest. This approach is commonly called transfer learning in the ML literature.33 We use a version of GoogLeNet pre-trained with the ImageNet dataset, composed of over 10 million images (Fig. 3). The use of ImageNet has two important advantages: (1) the large amount of data and (2) the diversity and fairness employed in designing the composition of the dataset.34,35
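As a minimal sketch of this transfer-learning setup, assuming the torchvision implementation of GoogLeNet (the paper does not prescribe a framework, and the class count and hyperparameters below are placeholders), only the final fully connected layer is replaced and trained:

import torch
import torch.nn as nn
from torchvision import models

num_classes = 9  # e.g., the number of Ra classes; use the Pr class count for the other experiment

# Load GoogLeNet pre-trained on ImageNet and freeze all existing weights.
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; only this layer is trained.
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # hypothetical optimizer and learning rate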
An overview of the GoogLeNet internal structure. The green area shows the layers whose weights are not trained during transfer learning. The inception modules have different parameters based on Ref. 23. The final linear layer is adapted to the number of classes and is then trained using a dataset built from the simulation results. The figure shows the configuration for one of the two fixed-parameter setups. The values reported in the convolution, max pooling, and average pooling layers are abbreviations of patch size and stride.
B. Dataset
1. Design
Since GoogLeNet, like most pre-trained networks, has been developed to process RGB images (three fields per pixel), we designed a dataset that would optimally fit our neural network. For each Ra, Pr, and time step, we created an RGB image as a combination of the three fields T, u_x, and u_z. Four examples of the combined RGB images for the end-members of the convection regime (minimum and maximum Ra vs maximum and minimum Pr, for both fixed and periodic boundary conditions) are shown in Fig. 4.
The RGB images obtained from a combination of the T (red), u_x (green), and u_z (blue) fields with (a) and (b) fixed and (c) and (d) periodic boundary conditions.
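One possible way of building such RGB images, assuming each field is independently rescaled to the 8-bit range (the paper does not specify the normalization), is the following sketch:

import numpy as np
from PIL import Image

def to_channel(field):
    """Rescale a 2D scalar field to 8-bit values."""
    f = (field - field.min()) / (field.max() - field.min() + 1e-12)
    return (255 * f).astype(np.uint8)

def fields_to_rgb(T, ux, uz):
    """Combine temperature and the two velocity components into one RGB image."""
    rgb = np.stack([to_channel(T), to_channel(ux), to_channel(uz)], axis=-1)
    return Image.fromarray(rgb, mode="RGB")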
2. Image mapping
Since we use the pre-trained version of GoogLeNet, only images with a resolution of 224 × 224 pixels can be fed to the network. We adopted three strategies to map the simulated field to the required resolution.
The first approach uses bilinear interpolation, a resampling method that uses the distance-weighted average of the four nearest pixel values to estimate a new pixel value, to map the full field to the required resolution.36 This implies a certain reduction of resolution, which reduces the sensitivity of the neural network to the smallest details.
In the second approach, we always cropped a central region of the same size from the domain. Although a smaller region implies a loss of information on the overall convection field, this method tests whether the NN can extrapolate the missing information from a subregion, which is expected particularly for the complex flow characteristic of high Rayleigh numbers.
The third approach, random cropping, is based on the selection of regions at random x and y positions of the convection field. This choice is expected to increase the information fed into the network, since data from the entire domain are used for training, but with the disadvantage of not providing positional information to the network.
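The three mappings can be sketched as follows, assuming the 224 × 224 input of the pre-trained network; calling random_crop 20 times per snapshot also reproduces the multiple-cropping augmentation described in Sec. III B 3.

import numpy as np
from PIL import Image

SIZE = 224  # input resolution of the pre-trained network

def resize_bilinear(img: Image.Image) -> Image.Image:
    """Strategy 1: bilinear interpolation of the whole field."""
    return img.resize((SIZE, SIZE), resample=Image.BILINEAR)

def center_crop(img: Image.Image) -> Image.Image:
    """Strategy 2: fixed central tile."""
    left = (img.width - SIZE) // 2
    top = (img.height - SIZE) // 2
    return img.crop((left, top, left + SIZE, top + SIZE))

def random_crop(img: Image.Image, rng=np.random.default_rng()) -> Image.Image:
    """Strategy 3: tile at a random position; call repeatedly for augmentation."""
    left = int(rng.integers(0, img.width - SIZE + 1))
    top = int(rng.integers(0, img.height - SIZE + 1))
    return img.crop((left, top, left + SIZE, top + SIZE))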
3. Image augmentation
Present machine learning algorithms are known to need huge amounts of data to perform satisfactorily, contrary to human intelligence, which can extrapolate conclusions from small inputs. To obtain the datasets necessary to train an ML model, two approaches exist: one is to produce large amounts of low-quality data, such as unfiltered web pages containing many errors, as done to train large language models; the other is to augment a small set of curated, structured data.7 We use this second approach, which has been proven effective by several recent works.37–39
Given that initial tests of random cropping gave poor results, as shown later in the results section, we implemented an augmentation technique for the third, random cropping, design. For each model output, we cropped 20 RGB images, each at a different random position. This represents our fourth dataset.
Four datasets were thus created for each type of boundary condition, fixed and periodic (Fig. 5).
Different methods of image preparation for building the datasets. (a) Bilinear interpolation, (b) center cropping, (c) random cropping, and (d) multiple cropping.
C. Implementation
Using GoogLeNet, we perform classification on the Ra and Pr classes. For each test, the dataset was divided into two randomly picked parts, one for training and one for testing. The training was executed for a fixed number of epochs with a batch size equal to 1. This minimal batch size was chosen to favor accuracy over performance.
We performed training and testing separately on two setups. In one, we fixed the Pr number and trained the network to predict the Ra number from each image; in the second, we fixed Ra and tried to predict the Pr number.
Detailed information on the augmentation and mapping is given in the previous subsections, and all the information is summarized in Table II.
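A compact sketch of the training and testing loop, reusing the model, criterion, and optimizer from the transfer-learning sketch of Sec. III A; the dataset object, the split fraction, and the number of epochs are placeholders rather than the values actually used:

import torch
from torch.utils.data import DataLoader, random_split
# `dataset` is assumed to be a torch Dataset yielding (image_tensor, class_index) pairs,
# and `model`, `criterion`, `optimizer` come from the transfer-learning sketch above.

train_fraction = 0.8   # placeholder: the actual train/test split is stated in the text
num_epochs = 10        # placeholder: the actual number of epochs is stated in the text

n_train = int(train_fraction * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=1, shuffle=True)   # batch size 1, as in the paper
test_loader = DataLoader(test_set, batch_size=1)

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

model.eval()
with torch.no_grad():
    correct = sum((model(x).argmax(1) == y).item() for x, y in test_loader)
print(f"test accuracy: {correct / len(test_set):.2%}")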
Detailed information about the machine learning datasets.
Definition | Value
Boundary conditions | Fixed, periodic
Sampling rate (time steps) | 10³,ᵃ 10⁴ᵇ
Simulation duration (time steps) | 2 × 10⁵, 2 × 10⁶
No. of images per simulation | 200
No. of images: resized | 1 image
No. of images: center cropped | 1 tile
No. of images: random cropped | 1 tile
No. of images: augmented random cropped | 20 tiles
Dataset size (fixed Pr) | ᶜ
Dataset size (fixed Ra) |
ᵃRefers to the 2 × 10⁵ time-step-long models.
ᵇRefers to the 2 × 10⁶ time-step-long models.
ᶜEach number refers to one of the four approaches.
IV. RESULTS AND DISCUSSION
Given the large number of numerical experiments performed, we synthesize the results in two plots in which each line represents the average of four models: (1) sampling intervals of 1000 and 10 000 time steps and (2) fixed and periodic lateral boundary conditions. The fine-grained effect of each of these choices is shown in the figures in Appendix A and discussed later. The main picture emerges from the four types of datasets: (a) resized image from the entire domain, (b) center-cropped image, (c) random-cropped image, and (d) augmented random-cropped image (×20). Figure 6 presents the results for (a) fixed Pr and variable Ra and (b) fixed Ra and variable Pr.
Average classification accuracy for fixed Pr and Ra numbers. (a) Average classification accuracy in obtaining the Ra number at fixed Pr; (b) average classification accuracy in obtaining the Pr number at fixed Ra.
An immediate result is that, regardless of the augmentation mechanism, the classification accuracy is always superior when estimating Ra (at fixed Pr) than when estimating Pr (at fixed Ra). This can be associated with the greater factor increase for Ra (√10 per class) than for Pr (2 per class). We performed tests with a larger factor in the "Resized" case and verified that performance drastically improved. However, it is not clear whether this is due to the greater factor or to the reduced number of classes.
Another aspect related to the superior performance of the Ra classifier is that high-Pr solutions asymptotically converge toward the infinite-Pr solution. This implies that high-Pr solutions are more similar to one another than solutions at different Ra are. For example, the complexity of the patterns determined by the fluid flow varies with Ra (the general relationship between the Reynolds and Rayleigh numbers is well known), and therefore an increase in turbulence can likely be picked up by the CNN. On the contrary, an increase in Pr produces smoother solutions (more laminar flow), more and more similar to one another with increasing Pr.
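A commonly used estimate of this relationship is the free-fall scaling Re ∼ (Ra/Pr)^(1/2): at fixed Pr, Re (and hence the turbulence and its small-scale detail) grows with Ra, while at fixed Ra an increase in Pr lowers Re and smooths the flow. We quote this standard scaling only as an indication; it is not necessarily the exact relationship referenced above.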
For the classification of both Ra and Pr, a common result is that GoogLeNet displays consistently better results for the bilinear dataset than for the others. The most obvious explanation is that the mapping of the convection field features (T, u_x, u_z) maintains the macroscopic patterns even if several pixels are mapped into one. When looking in more detail into the Ra- and Pr-dependent results, one observes that for fixed Ra [Fig. 6(b)], the performance of the network systematically improves with increasing Ra, until it saturates at the highest Ra. The increase is likely due to the greater amount of detail of high-Ra convection compared to low-Ra convection; however, the very small-scale features that characterize the most turbulent convection are most likely smoothed by the bilinear algorithm. On the contrary, for fixed Pr [Fig. 6(a)], performance is very uniform for varying Pr, with a mild improvement at high Pr.
Our work does not investigate which features are extracted by the trained neural network to infer Ra and Pr. Fluid-dynamic theory predicts that the boundary-layer thickness decreases with increasing Ra, for a fixed Pr. We estimate that, for the most extreme case, the boundary layer at the simulation resolution is thin enough that in its mapping to a 224 × 224 image it will be only about 2 pixels thick. However, we do not have information on whether at very high Ra the network infers Ra and Pr from the thickness of the boundary layers. Since we do not observe a worsening of the performance at high Ra, we expect that these values emerge from the patterns of the turbulence in the rest of the modeling domain. Our work does not put a final word on this point, and future work will have to reverse-engineer the network to find the active features.
Of the other three cases, in which we cropped part of the domain, GoogLeNet systematically performs better for the center-cropped dataset, albeit always much worse than for the bilinear mapping. We attribute this relative success of the network in learning to interpret this small portion, about 20 times smaller than the entire domain, to the fact that features at the same location, at equal distance from the top and bottom boundaries (and from the sides in the non-periodic case), are much more similar to one another than features at a random location in the domain.
We further challenged the network by testing its ability to detect Ra and Pr from a randomly cropped subset of the convection domain. Unsurprisingly, for the same dataset size as the bilinear and center-cropped cases, performance dropped drastically, to below 50% for fixed Pr and below 30% for fixed Ra. However, we observed a substantial improvement by using image augmentation. By selecting 20 crops from each image, thus increasing the training dataset twentyfold, performance systematically improved by 20% in both cases. We conclude from this last test that, even for the extremely challenging task of extracting fundamental physical properties from a small patch of a large-scale fluid-dynamic experiment, satisfactory performance can be achieved if a sufficiently augmented dataset is used.
The network performance is superior for the smaller time-step sampling interval compared to the larger one. A more detailed analysis of the role of the sampling rate and of the boundary conditions shows that the former has a substantial effect on the final result (Fig. 7), while the boundary conditions do not (Fig. 8). Systematically, choosing a smaller time-step interval improves the performance for most scenarios. A detailed analysis of the images has shown that this difference is likely due to the greater variety among the snapshots, because most images are collected before a steady state is reached for low Ra numbers. This is also confirmed by the matching between the results at the two sampling rates for the cropped datasets (Fig. 7) and for low Ra numbers (Fig. 8). Overall, this result is encouraging for future applications to natural systems, where a steady state is not expected to be reached. In turn, training the network with a more densely sampled dataset will be preferred when generalizing our approach to datasets of different origins.
Average classification accuracy for fixed Pr and Ra numbers: the solid lines represent the average of the values for the two tested cases with different time sampling rates, 10³ and 10⁴ (dashed lines). (a) Average classification accuracy in obtaining the Ra number at fixed Pr; (b) average classification accuracy in obtaining the Pr number at fixed Ra.
Average classification accuracy for fixed Pr and Ra numbers: the solid lines represent the average of the values for either fixed or periodic boundary conditions (dashed lines). (a) Average classification accuracy in obtaining the Ra number at fixed Pr; (b) average classification accuracy in obtaining the Pr number at fixed Ra.
On the contrary, we observed little effect from changing the choice of boundary conditions; however, even if in small measure, training sets with fixed boundary conditions exhibit superior performance. We believe that this is due to the limited freedom that the flow has with fixed boundary conditions, whereas with periodic boundary conditions the fluid dynamics can be more complex. Overall, this effect will likely vanish for a wider domain, where the lateral boundary conditions play a minor role.
These models are just a small sample of the opportunities offered by machine learning to solve the inverse problem of extracting fluid-dynamic properties from numerical (and potentially experimental) results. Future tests will be necessary to verify consistency for non-linear flows and temperature-dependent parameters such as fluid viscosity, which have wide applications in geophysics, planetary science, and engineering. Furthermore, our results suggest that generative machine learning will likely offer even better performance than the one we obtained, since our approach is limited by a predefined number of categories.
ACKNOWLEDGMENTS
This work was supported by NASA through NASA Emerging World Program Grant (No. 20-EW20_2-0026) as well as Louisiana Optical Network Infrastructure (LONI).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Mohammad Ali Boroumand: Data curation (equal); Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Gabriele Morra: Conceptualization (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Peter Mora: Software (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: REFINED ANALYSIS OF THE NETWORK PERFORMANCE
Figure 7 provides more information about the effect of choosing different sampling rates on the classification results. Every point on the dashed lines is the average classification accuracy for fixed and periodic boundary conditions at the chosen sampling rate. To present the same analysis for the different boundary conditions, the dashed lines in Fig. 8 show the average classification results over the sampling rates at the selected boundary condition. Both analyses have been done for the situation in which the dataset comprises images with fixed Pr and changing Ra, and vice versa.
APPENDIX B: STEADY-STATE REGIMES OF THE CONVECTION
Root mean square of the velocity (v_rms) and Nusselt number (Nu) through time for two end-member fluid-dynamic solutions with fixed boundary conditions. Panels (a) and (b) show v_rms and Nu vs time for the two end-member combinations of Ra and Pr.
The results in Fig. 9 show that, while for high Ra the majority of the images used for this work are in a dynamic steady state, half or more of the images for the low-Ra cases are taken from the transient period before the system reaches a steady state. Our prediction approach shows robustness by combining transient and steady-state images.
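For reference, a sketch of how these two diagnostics can be computed from the saved fields (assuming the non-dimensionalization of Table I, with ΔT = T_l − T_u; the exact averaging used for Fig. 9 may differ):

import numpy as np

def v_rms(ux, uz):
    """Root mean square velocity over the whole domain."""
    return np.sqrt(np.mean(ux**2 + uz**2))

def nusselt(T, uz, kappa, dT, L):
    """Nusselt number: total vertical heat flux normalized by the conductive flux."""
    dz = L / (T.shape[1] - 1)                            # vertical grid spacing
    dTdz = np.gradient(T, dz, axis=1)                    # vertical temperature gradient
    flux = np.mean(uz * T - kappa * dTdz)                # advective + diffusive flux
    return flux / (kappa * dT / L)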