The Lattice Boltzmann Method (LBM) is an approach for modeling mesoscopic fluid flow and heat transfer, based on modeling distributions of particles moving and colliding on a lattice. A perturbative expansion of the Boltzmann equation shows that it recovers the macroscopic Navier–Stokes equations. We simulate natural thermal convection via LBM in a 2D rectangular box heated from below and cooled from above, and use the results as training, testing, and generalization datasets to build a deep learning model. GoogLeNet, a convolutional neural network, is used to classify the simulation results based on two parameters, the Rayleigh (Ra) and Prandtl (Pr) numbers, from a single snapshot of either the entire modeling field at a resolution of 1024 × 1024, or a 224 × 224 crop. For each fixed Pr in the range from 1 to 128, increasing by a factor of 2, we estimate Ra with an accuracy varying from 40% to 90%, depending on the chosen augmentation strategy. For each fixed Ra in the range from 10^5 to 10^9, increasing by a factor of 10, the method predicts Pr with a systematically lower accuracy, ranging from 30% to 80%. This approach has great potential for industrial applications, such as the control of industrial flows, and for scientific research on geophysical flows, including the transport of heat in the Earth's interior, ocean, and atmosphere.

Convection, a mechanism of heat transfer in a fluid due to the diffusion–advection of heat, is a sophisticated phenomenon that has attracted much attention because of both its fundamental interest and its practical applications. Convection is categorized as natural or forced; natural convection is caused by external or internal heat sources, which produce a buoyancy imbalance, the rise of warmer fluid, and the downwelling of cooler fluid.1

Natural convection has a major role in many industrial and geophysical processes. Industrial examples include the transport of heat in chemical reactors and electronic cooling systems. Geophysical flows comprise heat transport in the atmosphere and the ocean, which determines weather, climate, and ocean circulation. In all these cases, the ability to control and anticipate the flow behavior requires real-time access to the critical parameters of the flow, based on the boundary conditions and geometries.

The Rayleigh number (Ra) is a dimensionless number associated with the intensity of natural convection: a low Ra is associated with laminar flow, while a high Ra is associated with turbulent behavior. Below a critical value Ra_c, heat diffusion dominates over advection, and therefore convection does not occur. The Rayleigh number is defined as the product of the Grashof number (Gr), which describes the ratio between buoyancy and viscosity, and the Prandtl number (Pr), the ratio between momentum diffusivity and thermal diffusivity,
$$Ra = Gr\,Pr. \tag{1}$$

Any pair of these three parameters uniquely determines the convection characteristics; most commonly, Ra and Pr are used. With the present development of machine learning, it is tempting to examine the possibility of extracting Ra and Pr from the convection patterns. Our research is in line with previous work in which fundamental parameters for convection, such as yield stress and viscosity, were extrapolated from numerical results2 and with attempts at generating depth profiles of steady-state convecting systems with machine learning.3–5 Our work more generally explores the prediction of fundamental fluid-dynamic parameters, which has applications in engineering as well as for the geo/planetary community.

Progress in machine learning leaves open several challenges and decisions on how to implement this task. First, there is no general recipe for creating an effective dataset designed to capture the fundamental characteristics of the physical model. Second, it is unknown whether the complexity emerging in fluid-dynamic systems when chaos, or turbulence, develops is an obstacle, due to its unpredictability, or an advantage, due to its richness of details, when combined with machine learning algorithms such as neural networks. Finally, the generation of datasets of computational fluid-dynamic simulations for machine learning applications requires computational resources that can be inaccessible to many scientists.6

In this work, we generate training and testing sets for a popular CNN algorithm, GoogLeNet, by simulating 2D natural thermal convection between two plates using the Lattice Boltzmann method (LBM). LBM allows modeling incompressible convection over a wide range of Pr and Ra numbers (Sec. II). Using traditional data augmentation,7 we fix either Ra or Pr and extract the other fundamental convection parameter (Sec. III). Finally, our approach is tested for different boundary conditions, sampling rates, and augmentation methods (Sec. IV).

The Lattice Boltzmann method (LBM) is an alternative to classical numerical solutions of PDEs for modeling fluid dynamics, based on simulating the Boltzmann equation on a discrete lattice. It originated with the Lattice gas method, in which it was proven that the Navier–Stokes equations are recovered at the macroscopic scale by modeling discrete particles moving and colliding on a discrete lattice.8 The more efficient Lattice Boltzmann Method was developed later,9 in which particle number densities are moved on a discrete lattice, with collisions achieved by relaxing the number densities toward the Boltzmann equilibrium distribution.10 The LBM has since been used to model a wide range of fluid dynamics problems,11 including mantle convection,12–15 magneto-hydrodynamics,16 multiphase flow in porous media,17,18 and aerospace engineering.19 In the following, we overview the thermal Lattice Boltzmann method, which can be used to simulate natural convection, and present 2D simulations with a linear rheology.

The thermal LBM models two distributions, f_α and g_α, representing the mass density and energy density moving one lattice spacing along the α directions (i.e., orthogonally and diagonally) of a Cartesian lattice. After these number densities move by one lattice spacing, they are "relaxed" toward the Boltzmann equilibrium distribution. These two steps, applied to f_α, which relates to the fluid density ρ, are found to yield the Navier–Stokes equations for incompressible fluids.9 The same two steps, applied to g_α, yield the advection–diffusion equation for the energy density E, which relates to the temperature T.20

The combined movement and collision steps can be written as
$$f_\alpha(\mathbf{x} + \mathbf{c}_\alpha \Delta t,\, t + \Delta t) = f_\alpha(\mathbf{x}, t) + \Delta_\alpha C_f \tag{2}$$
and
$$g_\alpha(\mathbf{x} + \mathbf{c}_\alpha \Delta t,\, t + \Delta t) = g_\alpha(\mathbf{x}, t) + \Delta_\alpha C_g, \tag{3}$$
where c_α is the velocity vector pointing between adjacent lattice sites (i.e., c_α Δt equals one lattice spacing orthogonally or diagonally in a square or cubic lattice), and Δ_α C_f and Δ_α C_g are the collision terms that redistribute number densities between directions due to collisions. These collision terms can be calculated by an efficient method of relaxing the distributions toward the equilibrium distribution due to Bhatnagar et al.,10 using
$$\Delta_\alpha C_f = -\frac{f_\alpha - f_\alpha^{eq}}{\tau_f} \tag{4}$$
$$\Delta_\alpha C_g = -\frac{g_\alpha - g_\alpha^{eq}}{\tau_g}, \tag{5}$$
where τ_f and τ_g are the relaxation times for the two number densities f_α and g_α, and f_α^eq and g_α^eq are, respectively, the corresponding equilibrium distributions, which can be calculated as a Taylor expansion to second order of the Boltzmann distribution, namely,
$$f_\alpha^{eq} = w_\alpha \rho \left[ 1 + \frac{\mathbf{c}_\alpha \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{c}_\alpha \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right] \tag{6}$$
and
$$g_\alpha^{eq} = w_\alpha \rho \varepsilon \left[ 1 + \frac{\mathbf{c}_\alpha \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{c}_\alpha \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right], \tag{7}$$
where ρ is the macroscopic density, u is the macroscopic fluid velocity, ε is the energy density, w_α are weights which depend on the lattice (to be specified shortly), and c_s = s/√3 is the speed of sound in the lattice, where s = Δx/Δt is the lattice speed (i.e., Δx is the lattice spacing of the Cartesian lattice and Δt is the time step). In the following, we use non-dimensionalized simulations and hence set Δx = Δt = 1, which leads to equilibrium distributions given by
$$f_\alpha^{eq} = w_\alpha \rho \left[ 1 + 3\, \mathbf{c}_\alpha \cdot \mathbf{u} + \tfrac{9}{2} (\mathbf{c}_\alpha \cdot \mathbf{u})^2 - \tfrac{3}{2} \mathbf{u}^2 \right] \tag{8}$$
and
$$g_\alpha^{eq} = w_\alpha \rho \varepsilon \left[ 1 + 3\, \mathbf{c}_\alpha \cdot \mathbf{u} + \tfrac{9}{2} (\mathbf{c}_\alpha \cdot \mathbf{u})^2 - \tfrac{3}{2} \mathbf{u}^2 \right]. \tag{9}$$
The relaxation times in the above equilibrium distributions relate to the kinematic viscosity and the thermal diffusivity as
$$\nu_f = c_s^2 \left( \tau_f - \tfrac{1}{2} \right) = \tfrac{1}{3} \left( \tau_f - \tfrac{1}{2} \right), \tag{10}$$
where ν_f is the kinematic viscosity, and
$$\nu_g = c_s^2 \left( \tau_g - \tfrac{1}{2} \right) = \tfrac{1}{3} \left( \tau_g - \tfrac{1}{2} \right), \tag{11}$$
where ν_g = κ is the thermal diffusivity.
So far, the above equations are valid in 2D or 3D. However, in the following numerical study, we will perform simulations in 2D. We use the so-called D2Q9 lattice (2 dimensions, 9 lattice velocities), in which number densities move on a square lattice along the axes or in the diagonal directions. We define the velocity vectors c_α in the D2Q9 lattice as
$$\mathbf{c}_\alpha \in \{ (0,0),\ (\pm 1, 0),\ (0, \pm 1),\ (\pm 1, \pm 1) \}, \tag{12}$$
and the weights vector w_α for this lattice is given by
$$w_\alpha = \begin{cases} 4/9, & |\mathbf{c}_\alpha| = 0, \\ 1/9, & |\mathbf{c}_\alpha| = 1, \\ 1/36, & |\mathbf{c}_\alpha| = \sqrt{2}, \end{cases} \tag{13}$$
i.e., a weight of 4/9 for the zero velocity vector (i.e., the velocity vector for no movement), 1/9 for directions along the Cartesian axes, and 1/36 along diagonals (note that Σ_α w_α = 1) (Fig. 1). It is well known that the above choice leads to the Navier–Stokes equations for incompressible fluid flow,9 where the macroscopic properties can be obtained from the number densities f_α and g_α using
$$\rho = \sum_\alpha f_\alpha, \tag{14}$$
$$\rho \mathbf{u} = \sum_\alpha f_\alpha \mathbf{c}_\alpha, \tag{15}$$
and
$$\rho \varepsilon = \sum_\alpha g_\alpha, \tag{16}$$
where the energy density ε relates to temperature through ε = DRT/2, where D = 2 is the number of dimensions, T is the temperature, and R is the gas constant, which we set to unity for our non-dimensionalized simulation runs in this paper.
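To make the scheme concrete, the following is a minimal NumPy sketch of one D2Q9 stream-and-collide step implementing Eqs. (2)–(5), (8), (9), and (14)–(16); the array layout, names, and the periodic streaming are our assumptions, not the authors' code.

```python
import numpy as np

# D2Q9 velocity vectors (rest, axes, diagonals) and weights, Eqs. (12)-(13)
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + 4*[1/9] + 4*[1/36])

def equilibrium(rho, u):
    """Second-order equilibrium of Eq. (8), with Dx = Dt = 1 so c_s^2 = 1/3."""
    cu = np.einsum('ad,dxy->axy', c, u)        # c_alpha . u at every node
    usq = np.einsum('dxy,dxy->xy', u, u)       # |u|^2 at every node
    return w[:, None, None] * rho * (1.0 + 3.0*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, g, tau_f, tau_g):
    """One streaming + BGK collision step for f and g, Eqs. (2)-(5)."""
    for a in range(9):   # streaming: move populations one lattice spacing
        f[a] = np.roll(f[a], tuple(c[a]), axis=(0, 1))   # periodic wrap;
        g[a] = np.roll(g[a], tuple(c[a]), axis=(0, 1))   # walls need bounce-back
    rho = f.sum(axis=0)                         # Eq. (14)
    u = np.einsum('ad,axy->dxy', c, f) / rho    # Eq. (15)
    eps = g.sum(axis=0) / rho                   # Eq. (16): rho*eps = sum_a g_a
    feq = equilibrium(rho, u)
    f -= (f - feq) / tau_f                      # Eq. (4)
    g -= (g - eps * feq) / tau_g                # Eq. (5), with g_eq of Eq. (9)
    return f, g, rho, u, eps
```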
FIG. 1. (a) Space discretization of the 2D convection field and (b) weighted discretized velocity space in the D2Q9 lattice.

1. Boussinesq buoyancy term

To model convection, we need to add a buoyancy force, and we adopt the usual Boussinesq approximation, in which the density has a fixed part plus a perturbation that depends linearly on the temperature,
$$\rho = \rho_0 \left[ 1 - \beta (T - T_0) \right]. \tag{17}$$
We assume gravity and hence, under the Boussinesq approximation, we need to add a force term of
$$\mathbf{G} = \rho_0\, \beta\, (T - T_0)\, g\, \hat{\mathbf{z}}, \tag{18}$$
where g is the acceleration due to gravity, β is the coefficient of thermal expansion, and G is the resulting buoyancy force. As is common practice in LBM, the physical quantities used in the simulations are dimensionless. Here, ρ is a perturbative value around 1.
The above gravitational force can be added to the LBM by adding a forcing term to the Lattice Boltzmann equation for f_α. Equation (2) including the gravitational force becomes
$$f_\alpha(\mathbf{x} + \mathbf{c}_\alpha \Delta t,\, t + \Delta t) = f_\alpha(\mathbf{x}, t) + \Delta_\alpha C_f + F_b, \tag{19}$$
where the Boussinesq buoyancy forcing term F_b is given by
$$F_b = 3\, w_\alpha\, G\, c_{\alpha z}, \tag{20}$$
where G = |G| = G_z = ρβΔT g. Equation (20) can be verified by calculating the change in velocity during one time step due to the Boussinesq term using Eq. (15).
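Under the same assumptions as the earlier sketch (whose c and w arrays are reused here), the forcing of Eqs. (19) and (20) can be appended after the collision step; with D = 2 and R = 1, ε = T, so the energy density can be passed directly as the temperature.

```python
def boussinesq_force(rho, T, g_accel, beta):
    """Boussinesq forcing of Eq. (20): F_b = 3 w_alpha G c_{alpha z}."""
    G = rho * beta * T * g_accel                  # buoyancy magnitude, Eq. (18)
    return 3.0 * w[:, None, None] * c[:, 1, None, None] * G

# inside lbm_step, after the collision:
# f += boussinesq_force(rho, eps, g_accel, beta)  # Eq. (19)
```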

To set up the simulations, we select suitable ranges for the Ra and Pr numbers. For Pr, we choose the range 1–128 to encompass a broad range of fluids, from air to heavy oils.21 Convection patterns converge toward an asymptotic solution for higher Prandtl numbers, normally obtained for infinite Pr.22 We vary Ra in the range from 10^5, which is sufficiently above the critical Ra number (Ra_c = 1708 for fixed boundary conditions) to guarantee a rapid initiation of convection, to 10^9. The maximum Ra is determined by the resolution, since high-Ra convection is characterized by progressively thinner boundary layers, which need to be well resolved for the validity of the solution.21

Once Pr and Ra are chosen, and assuming the other physical parameters to be fixed, the viscosity and thermal diffusivity are uniquely determined. Using Eq. (1) and the definitions of the Grashof number (Gr) and of the Prandtl number (Pr), we have13
$$Ra = Gr\, Pr = \frac{g \beta \Delta T L^3}{\nu \kappa}, \tag{21}$$
with
$$Pr = \frac{\nu}{\kappa}, \tag{22}$$
where ν is the kinematic viscosity and κ is the thermal diffusivity. Using Eqs. (21) and (22), one obtains, for all our simulations,
$$\nu = \sqrt{\frac{g \beta \Delta T L^3\, Pr}{Ra}}, \tag{23}$$
$$\kappa = \sqrt{\frac{g \beta \Delta T L^3}{Ra\, Pr}}. \tag{24}$$
The ranges of values obtained through this formulation, which vary over several orders of magnitude, are summarized in Table I.
TABLE I. The values and ranges of the parameters used for the simulations.

Symbol | Definition | Value
ρ | Density | ≃1
g | Acceleration due to gravity | 0.42
β | Coefficient of thermal expansion | 0.1
T_l | Temperature of the lower plate | 0.0005
T_u | Temperature of the upper plate | −0.0005
L | Characteristic length | 1024
Pr | Prandtl number | 1–128
Ra | Rayleigh number | 10^5–10^9
ν | Kinematic viscosity | 6.7 × 10^−3–7.6
κ | Thermal diffusivity | 5.9 × 10^−4–0.67
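As a consistency check on Eqs. (23) and (24), the following sketch (the function name and the default ΔT = T_l − T_u = 0.001 are ours) reproduces the ν and κ ranges listed in Table I, together with the relaxation times obtained by inverting Eqs. (10) and (11).

```python
import numpy as np

def transport_coefficients(Ra, Pr, g=0.42, beta=0.1, dT=0.001, L=1024):
    """Recover nu, kappa, and the BGK relaxation times from Ra and Pr."""
    buoyancy = g * beta * dT * L**3        # g*beta*DeltaT*L^3 = Ra*nu*kappa
    nu = np.sqrt(buoyancy * Pr / Ra)       # Eq. (23)
    kappa = np.sqrt(buoyancy / (Ra * Pr))  # Eq. (24)
    tau_f = 3.0 * nu + 0.5                 # inverted Eq. (10)
    tau_g = 3.0 * kappa + 0.5              # inverted Eq. (11)
    return nu, kappa, tau_f, tau_g

print(transport_coefficients(1e5, 128)[:2])  # nu ~ 7.6 and kappa ~ 0.059
print(transport_coefficients(1e9, 1)[:2])    # nu ~ 6.7e-3 and kappa ~ 6.7e-3
```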

The boundary conditions that we use for the simulations are either (i) fixed or (ii) periodic. For fixed boundary conditions, the values of T, v_x, and v_z are set equal to 0 on the right and left sides of the modeling domain. For periodic boundary conditions, these values are the same on the left and right sides of the simulation domain.

All simulations in this work are executed on a lattice of resolution 1024 × 1024. For each fixed Pr number, Ra varies from 10^5 to 10^9, increasing by a factor of √10 ≈ 3.16. The Pr number varies from 1 to 128, increasing by a factor of 2 at each iteration. Using 3.16 as the factor, 10^9 is therefore approximated by 9.89 × 10^8. For each pair of Ra and Pr, we run two simulations: one from t = 0 to t = 2.0 × 10^5, saving the results every 1000 steps, and a second one from t = 0 to t = 2.0 × 10^6, saving the solution every 10 000 steps. Three scalars are saved for each computing node: (1) the temperature field (T), (2) the x-component of the velocity (v_x), and (3) the z-component of the velocity (v_z). In this way, for every Ra and Pr number, two sets of 603 images are generated, each used to train a different neural network and perform a separate ML experiment.
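A sketch of this simulation campaign (the loop body is a placeholder for the two runs described above):

```python
import numpy as np

# Ra spaced by a factor of 3.16 (~sqrt(10)) and Pr by a factor of 2
Ra_values = 1e5 * 3.16**np.arange(9)     # 1e5 ... ~9.9e8 (approximating 1e9)
Pr_values = 2.0**np.arange(8)            # 1, 2, 4, ..., 128
for Ra in Ra_values:
    for Pr in Pr_values:
        # run one LBM simulation to t = 2e5 (saving every 1e3 steps) and one
        # to t = 2e6 (saving every 1e4 steps), storing T, v_x, and v_z
        pass
```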

Examples of results ranging from laminar to turbulent flow are shown in Fig. 2.

FIG. 2. Simulation results for laminar and turbulent convection with different boundary conditions and time steps. (a) Convection field for Ra = 10^5 and Pr = 128 with fixed boundary conditions at t = 2 000 000; (b) convection field for Ra = 10^9 and Pr = 1 with fixed boundary conditions at t = 200 000; (c) convection field for Ra = 10^5 and Pr = 128 with periodic boundary conditions at t = 2 000 000; (d) convection field for Ra = 10^9 and Pr = 1 with periodic boundary conditions at t = 200 000.

In the last several years, mainly due to the advances in deep learning, and principally in convolutional networks, the quality of image recognition and object detection has been progressing at an exceptional pace.24 Most of this progress is not just the result of more powerful hardware, expanded datasets, and larger models, but primarily a consequence of new ideas, improved algorithms, and innovative network architectures. The most successful methods of object detection, instead of coming from more powerful deep networks alone, emerged from the combination of deep architectures and classical computer vision algorithms, such as the R-CNN algorithm by Girshick et al.25 In this paper, we use a widely adopted deep neural network architecture for computer vision, codenamed GoogLeNet,23 which is rooted in pioneering work by Lin et al.,26 who experimented with stacking micro neural networks to improve feature map generation.

The simplest way to improve the performance of a deep neural network is to increase its size (depth, i.e., the number of levels of the network, and width, i.e., the number of units per level). With a sufficiently large training set, this approach is successful, but with some caveats. First of all, if the training set is not sufficiently large, the large number of parameters will make the bigger network more prone to overfitting. In works like ours, the generation of the training dataset is constrained by limited computational resources. Moreover, because some of the numerical models produce similar results once a steady state is reached, the number of required training images increases.

Another aspect in which computational resources limit the size of a network suitable for practical applications is that the training of a network requires resources that scale quadratically with the network size.27 On top of that, for specific and limited applications such as the one presented in this paper, where the number of categories is of the order of 10, many weights of the network are irrelevant: they would be set to values close to zero, implying a waste of computational resources.28 Overall, an optimal algorithm uses all the available computational resources, minimizing waste.

GoogLeNet addressed these challenges in Szegedy et al.23 as part of the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The proposed solution replaces fully connected architectures with primarily sparsely connected ones, which allows the network to adapt flexibly to features at many scales.29 Following this idea, inspired by the biological interpretation of neural networks (e.g., Arora et al.30), the optimal network topology is constructed layer by layer by clustering neurons with highly correlated outputs. In Ref. 23, this is related to the well-known Hebbian principle that "neurons that fire together, wire together," of which the clustering construction is a more algorithmic description.

From the computational point of view, the connectivity between neural layers can be either sparse or dense. Sparse connectivity is more challenging to implement efficiently than dense connectivity: constantly improving numerical libraries for fast dense matrix multiplication, based on state-of-the-art hierarchical CPU/GPU hardware,31,32 allow hardware-driven rapid progress of traditional computational approaches. For this reason, the Inception architecture (GoogLeNet) that we employ approximates sparse structures for vision networks with dense, computationally efficient components. This approach allows more uniformity of structures and filters by employing efficient dense computations.

A key aspect responsible for the quality of the classification is the choice of the dataset used for pre-training the network. Training a neural network such as GoogLeNet from scratch, given its complex structure and vast number of parameters, is computationally overly expensive and not necessary, given the common and well-tested use of pre-training.27 Even if pre-training is done with datasets radically different from the ones for which the network will be used, as in our case, the result is good because of the layered structure of networks. The network layers are responsible for detecting individual features, simpler for the lowest layers and more abstract for the apical ones. For instance, for a lower layer, the curvature of the temperature field in a fluid-dynamic simulation is not different from the curvature of the body of a car. Hence, a network pre-trained with a very large general-purpose dataset can be trained much more rapidly than a blank network. Experience has shown that only the last layer needs to be reconstructed, based on the specific dataset of interest. This approach is commonly called transfer learning in the ML literature.33 We use a version of GoogLeNet pre-trained with the ImageNet dataset, composed of over 10 million images (Fig. 3). The use of ImageNet has two important advantages: (1) the large amount of data and (2) the diversity and fairness employed in designing the composition of the dataset.34,35
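A minimal transfer-learning sketch in PyTorch, assuming the torchvision implementation of GoogLeNet; the weight-loading flag, the choice to train only the final layer, and the class count are our assumptions consistent with the text, not the authors' code.

```python
import torch
import torchvision

# Load GoogLeNet pre-trained on ImageNet and freeze the pre-trained weights
model = torchvision.models.googlenet(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False          # green area in Fig. 3: frozen layers

# Replace the final linear layer with one sized to our classes
# (e.g., 9 Ra classes at fixed Pr, or 8 Pr classes at fixed Ra)
num_classes = 9
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # trainable head
```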

FIG. 3. An overview of the GoogLeNet internal structure. The green area shows the layers whose weights are not trained during transfer learning. The inception modules have different parameters based on Ref. 23. The final linear layer is sized according to the number of classes and is then trained using a dataset built from the simulation results. The figure shows the situation for fixed Pr = 1. In the convolution, max pooling, and average pooling layers, P and S are abbreviations of patch size and stride, respectively.

1. Design

Since GoogLeNet, like most pre-trained networks, was developed to process RGB images (three fields for each pixel), we designed a dataset that would optimally fit our neural network. For each Ra, Pr, and time step, we created an RGB image as a combination of the three fields T, v_x, and v_z. Four examples of the combined RGB images at step 200 000 of the simulation for the four end-members of the convection regime (minimum Ra and maximum Pr vs maximum Ra and minimum Pr, for both fixed and periodic boundary conditions) are shown in Fig. 4.
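A sketch of this dataset design; the per-field min–max normalization is our assumption, since the paper does not specify the scaling.

```python
import numpy as np
from PIL import Image

def fields_to_rgb(T, vx, vz):
    """Stack the T, v_x, and v_z fields into the R, G, and B channels."""
    def to_byte(field):
        field = (field - field.min()) / (field.max() - field.min() + 1e-12)
        return (255 * field).astype(np.uint8)
    return Image.fromarray(np.dstack([to_byte(T), to_byte(vx), to_byte(vz)]))
```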

FIG. 4. The RGB images from a combination of the T (red), v_x (green), and v_z (blue) fields with (a) and (b) fixed and (c) and (d) periodic boundary conditions. (a) Ra = 10^5, Pr = 128, t = 2 × 10^6; (b) Ra = 10^9, Pr = 1, t = 2 × 10^5; (c) Ra = 10^5, Pr = 128, t = 2 × 10^6; (d) Ra = 10^9, Pr = 1, t = 2 × 10^5.

2. Image mapping

Since we use the pre-trained version of GoogLeNet, only images with a resolution of 224 × 224 can be fed to the network. We adopted three strategies, sketched in code after the list, to map the 1024 × 1024 field to the required resolution.

  1. The first approach uses bilinear interpolation, a resampling method that uses the distance-weighted average of the four nearest pixel values to estimate a new pixel value, to map the 1024 × 1024 field to 224 × 224.36 This implies a certain reduction of resolution, which reduces the sensitivity of the neural network to the smallest details.

  2. In the second approach, we always cropped the same 224 × 224 central region of the domain. Although a smaller region implies a loss of information on the overall convection field, this method tests whether the NN can extrapolate missing information from a subregion, which is expected particularly for the complex flow characteristic of a high Rayleigh number.

  3. The third approach, random cropping, is based on the selection of 224 × 224 regions at random x and y positions of the convection field. This choice is expected to increase the information fed into the network, since data from the entire domain are used for training, but with the disadvantage of not giving positional information to the network.
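The three mappings can be sketched with standard torchvision transforms; the exact implementation used in the paper is not specified, so these are assumed equivalents.

```python
from torchvision import transforms

bilinear_resize = transforms.Resize(        # strategy 1: downsample the field
    (224, 224), interpolation=transforms.InterpolationMode.BILINEAR)
center_crop = transforms.CenterCrop(224)    # strategy 2: always the same tile
random_crop = transforms.RandomCrop(224)    # strategy 3: a randomly placed tile
```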

3. Image augmentation

Present machine learning algorithms are known to need huge amounts of data to perform satisfactorily, contrary to human intelligence, which can instead extrapolate conclusions from small inputs. To obtain the datasets necessary to train an ML model, two approaches exist: one is to use large amounts of low-quality data, like unfiltered Web pages containing many errors, as done to train large language models; the other is to augment a small set of curated, structured data.7 We use this second approach, which has been proven effective by several recent works.37–39

Given that initial tests of random cropping gave poor results, as later shown in the results section, we implemented an augmentation technique for the third, random cropping, design. For each 1024 × 1024 model output, we cropped 20 RGB images of 224 × 224 pixels, each at a different random position. This represents our fourth dataset.
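A sketch of this augmentation step, reusing the random_crop transform defined above (file handling omitted):

```python
def augment(image, n_crops=20):
    """Return 20 randomly placed 224x224 crops of one 1024x1024 snapshot."""
    return [random_crop(image) for _ in range(n_crops)]
```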

In this way, four datasets were created for each type of boundary condition, fixed and periodic (Fig. 5).

FIG. 5. Different methods of image preparation for making the datasets. (a) Bilinear interpolation, (b) center cropping, (c) random cropping, and (d) multiple cropping.

Using GoogLeNet, we perform classification on Ra and Pr classes. For each test, the dataset was divided into two parts, randomly picked each time: 80% for training and 20% for testing. The training was executed for 40 epochs with a batch size equal to 1. This minimal batch size was chosen to favor accuracy over performance.
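A hedged sketch of this training protocol; the dataset object, optimizer choice, and learning rate are our assumptions, while the split, epoch count, and batch size follow the text.

```python
import torch
from torch.utils.data import DataLoader, random_split

def train(model, dataset, epochs=40, device="cuda"):
    n_train = int(0.8 * len(dataset))                      # 80/20 random split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=1, shuffle=True)
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            out = model(x.to(device))
            # torchvision's GoogLeNet returns auxiliary logits in train mode
            logits = out.logits if hasattr(out, "logits") else out
            loss = loss_fn(logits, y.to(device))
            loss.backward()
            optimizer.step()
    return model, test_set
```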

We performed training and testing separately on two setups. In the first, we fixed the Pr number and trained the network to predict the Ra number from each image; in the second, we fixed Ra and tried to predict the Pr number.

Detailed information on the augmentation and mapping is given in the previous subsection, and all the information is summarized in Table II.

TABLE II. Detailed information about the machine learning datasets.

Definition | Value
Boundary conditions | Fixed, periodic
Sampling rate (time steps) | 10^3,a 10^4 b
Simulation duration (time steps) | 2 × 10^5, 2 × 10^6
No. of images per simulation | 200
No. of images: resized | 1 image
No. of images: center cropped | 1 tile
No. of images: random cropped | 1 tile
No. of images: augmented random cropped | 20 tiles
Dataset size (fixed Pr) | [1800, 1800, 1800, 36 000]c
Dataset size (fixed Ra) | [1600, 1600, 1600, 32 000]

a Refers to the 2 × 10^5-step models.
b Refers to the 2 × 10^6-step models.
c Each number refers to one of the four approaches.

Given the large number of numerical experiments performed, we synthesize the results in two plots, in which each line represents the average of four models: (1) 1000 and 10 000 time-step sampling intervals, for each of (2) fixed and periodic lateral boundary conditions. The fine-grained effect of each of these choices is shown in the figures in Appendix A and discussed later. The main picture emerges from the four types of datasets: (a) resized image from the entire domain, (b) center cropped image, (c) random cropped image, and (d) augmented random cropped image (×20). Figure 6 presents the results for (a) fixed Ra and variable Pr, and (b) fixed Pr and variable Ra.

FIG. 6. Average classification accuracy for fixed Pr and Ra numbers. (a) Average classification accuracy in obtaining the Ra number at fixed Pr number; (b) average classification accuracy in obtaining the Pr number at fixed Ra number.

An immediate result is that, regardless of the augmentation mechanism, the classification accuracy is always superior when estimating Ra (at fixed Pr) than when estimating Pr (at fixed Ra). This can be associated with the greater factor between successive Ra values (√10) than between successive Pr values (2). We performed tests with a factor of 4 for Pr in the "Resized" case and verified that performance drastically improved. However, it is not clear whether this is due to the greater factor or to the reduced number of options.

Another aspect related to the superior performance of the Ra classifier is that high-Pr solutions asymptotically converge toward the infinite-Pr solution. This implies that high-Pr solutions are more similar to each other than different-Ra solutions are. For example, the complexity of the patterns determined by the fluid flow varies with Ra (the general relationship between the Reynolds and Rayleigh numbers, Re ∝ Ra^{1/2}, is well known), and therefore an increase in turbulence can likely be picked up by the CNN. On the contrary, an increase in Pr produces smoother solutions (more laminar flow) that become more and more similar to each other.

A common result for the classification of both Ra and Pr is that GoogLeNet displays consistently much better results for the bilinear dataset than for the others. The most obvious explanation is that the mapping of the convection field features (T, v_x, v_z) maintains the macroscopic patterns even if about 20 pixels are mapped into one. When looking in more detail into the Pr- and Ra-dependent results, one observes that for fixed Pr [Fig. 6(b)], the performance of the network systematically improves with increasing Ra, until it saturates for Ra ≳ 10^8. The increase is likely due to the greater amount of detail of high-Ra convection flow vs low-Ra convection; however, the very small scale features that characterize the highly turbulent convection above Ra = 10^8 are most likely smoothed out by the bilinear algorithm. On the contrary, for fixed Ra [Fig. 6(a)], performance is very uniform for varying Pr, with a mild improvement at high Pr.

Our work does not investigate which features are extracted by the trained neural network to infer Ra and Pr. Fluid-dynamic theory predicts that the boundary layer thickness decreases with increasing Ra, for a fixed Pr. We estimate that, for the extreme case of Pr = 1 and Ra = 10^9, the boundary layer at the 1024 × 1024 resolution is about 10 pixels thick, so in its mapping to a 224 × 224 image the boundary layer will be only about 2 pixels thick. However, we do not have information on whether at very high Ra the network infers Ra and Pr from the thickness of the boundary layers. Since we do not observe a worsening of the performance at high Ra, we expect that these values emerge from the patterns of the turbulence in the rest of the modeling domain. Our work does not put a final word on this matter, and future work will have to invert the network to find the active features.

Of the other three cases, in which we cropped part of the domain, GoogLeNet systematically performs better for the center-cropped dataset, albeit always much worse than for the bilinear mapping. We attribute this relative success of the network in learning to interpret this small portion, 20 times smaller than the entire domain, to the fact that features in the same location, at equal distance from the top and bottom boundaries, and from the sides in the non-periodic case, are much more similar to each other than features at a random location in the domain.

We challenged the network by testing its ability to detect Pr and Ra from a randomly cropped subset of the convection domain. Unsurprisingly, for the same dataset size as the bilinear and center-cropped cases, performance dropped drastically, to below 50% for fixed Ra and below 30% for fixed Pr. However, we observed a substantial improvement by using image augmentation. By selecting 20 crops for each image, thereby increasing the training dataset twenty-fold, performance systematically improved by 20% in both cases. We conclude from this last test that, even for the extremely challenging task of extracting fundamental physical properties from a small patch of a large-scale fluid-dynamic experiment, satisfactory performance can be achieved if a sufficiently augmented dataset is used.

The network performance is superior for dt = 10^3 compared to dt = 10^4. A more detailed analysis of the role of the time-step interval and the boundary conditions shows that the first has a substantial effect on the final result (Fig. 7), while the boundary conditions do not (Fig. 8). Systematically, choosing a smaller time-step interval improves the performance by about 10% in most scenarios. A detailed analysis of the images has shown that this difference is likely due to greater variety among the dt = 10^3 snapshots, because for low Ra numbers most images are collected before a steady state is reached. This is also confirmed by the match between the dt = 10^3 and dt = 10^4 results for the cropped Pr = 1 case (Fig. 7) and for low Ra numbers (Fig. 8). Overall, this result is encouraging for future applications to natural systems, where a steady state is not expected to be reached. In turn, training the network with a denser dataset will be preferred when generalizing our approach to datasets of different origins.

FIG. 7. Average classification accuracy for fixed Pr and Ra numbers (solid lines), representing the average of the values for the two tested time sampling rates, dt = 1000 and dt = 10 000 (dashed lines). (a) Average classification accuracy in obtaining the Ra number at fixed Pr number; (b) average classification accuracy in obtaining the Pr number at fixed Ra number.
FIG. 8. Average classification accuracy for fixed Pr and Ra numbers (solid lines), representing the average of the values for either fixed or periodic boundary conditions (dashed lines). (a) Average classification accuracy in obtaining the Ra number at fixed Pr number; (b) average classification accuracy in obtaining the Pr number at fixed Ra number.

On the contrary, we observed little effect from changing the choice of boundary conditions; however, even if in small measure, training sets with fixed boundary conditions exhibit superior performance. We believe that this is due to the limited freedom that the flow has with fixed BCs, whereas with periodic boundary conditions the fluid dynamics can be more complex. This effect will likely vanish for a wider domain, where the lateral boundary conditions will play a minor role.

These models are just a small sample of the opportunities offered by machine learning to solve the inverse problem of extracting fluid-dynamic properties from numerical (and potentially experimental) results. Future tests will be necessary to assess consistency for non-linear flows and temperature-dependent parameters such as fluid viscosity, which have wide applications in geophysics, planetary science, and engineering. Further, our results suggest that generative machine learning will likely offer even better performance than the one we obtained, since our approach is limited by a predefined number of categories.

This work was supported by NASA through NASA Emerging World Program Grant (No. 20-EW20_2-0026) as well as Louisiana Optical Network Infrastructure (LONI).

The authors have no conflicts to disclose.

Mohammad Ali Boroumand: Data curation (equal); Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Gabriele Morra: Conceptualization (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Peter Mora: Software (equal); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Figure 7 provides more information about the effect of choosing different time-step intervals on the classification results. Every point on the dashed-line graphs is the average classification accuracy for fixed and periodic boundary conditions at the chosen time-step interval. To present the same analysis for different boundary conditions, the dashed lines in Fig. 8 show the average classification results over the different time-step intervals for the selected boundary conditions. Both analyses have been done for the situations where the dataset comprises images with fixed Pr number and changing Ra, and vice versa.

Figure 9 illustrates the effect of changing the fundamental parameters on the time at which convection becomes stable. For a lower Pr number and a higher Ra number, convection reaches the steady state at an earlier time. This can be seen from the time at which V_rms and the Nusselt number, Nu, become constant or exhibit less fluctuation. In this study, Nu was calculated based on the following formula:40
$$Nu = -\frac{L_z}{L_x\, \Delta T} \int_0^{L_x} \left. \frac{\partial T}{\partial z} \right|_{z = L_z} dx, \tag{B1}$$
where T is the temperature, ΔT = T_l − T_u, and L_z and L_x are the height and width of the convection field. We conducted experiments in a rectangular domain with a cold upper lid at temperature T_u and a hot lower base at temperature T_l. In our simulations, we investigated convection with Rayleigh numbers far from the critical Rayleigh number Ra_c = 1708 (a high Rayleigh number corresponding to vigorous convection). When the Rayleigh number is similar to but greater than Ra_c, broad Rayleigh–Bénard convection cells tend to form. For high Rayleigh numbers, Ra ≫ Ra_c, convection becomes increasingly vigorous.21
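A sketch of these diagnostics, assuming fields indexed as field[x, z] with the upper (cold) boundary at the last z index and unit lattice spacing; the one-sided gradient is our discretization choice, not necessarily the authors'.

```python
import numpy as np

def v_rms(vx, vz):
    """Root mean square velocity over the whole field."""
    return np.sqrt(np.mean(vx**2 + vz**2))

def nusselt(T, Lz, Lx, dT):
    """Nu from the mean vertical temperature gradient at the top, Eq. (B1)."""
    dT_dz = T[:, -1] - T[:, -2]           # one-sided gradient, spacing = 1
    return -Lz / (Lx * dT) * dT_dz.sum()  # the sum approximates the integral
```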
FIG. 9. Root mean square of the velocity and Nusselt number through time for two end-member fluid-dynamic solutions with fixed boundary conditions. (a) V_rms and Nu vs time for a 1024 × 1024 convection field with Ra = 10^5 and Pr = 128; (b) V_rms and Nu vs time for a 1024 × 1024 convection field with Ra = 10^9 and Pr = 1.

The results in Fig. 9 show that, while for high Ra the majority of the images used for this work are in a dynamic steady state, half or more of the images for low Ra are taken from the transient regime, before the system reaches a steady state. Our prediction approach shows robustness by combining transient and steady-state images.

1. Y. Çengel and A. Ghajar, Heat and Mass Transfer: Fundamentals & Applications (McGraw-Hill Education, 2019).
2. S. Atkins, A. P. Valentine, P. J. Tackley, and J. Trampert, "Using pattern recognition to infer parameters governing mantle convection," Phys. Earth Planet. Inter. 257, 171–186 (2016).
3. S. Agarwal, N. Tosi, D. Breuer, S. Padovan, P. Kessel, and G. Montavon, "A machine-learning-based surrogate model of Mars' thermal evolution," Geophys. J. Int. 222, 1656–1670 (2020).
4. S. Agarwal, N. Tosi, P. Kessel, S. Padovan, D. Breuer, and G. Montavon, "Toward constraining Mars' thermal evolution using machine learning," Earth Space Sci. 8, e2020EA001484 (2021).
5. M. Shahnas, D. Yuen, and R. Pysklywec, "Inverse problems in geodynamics using machine learning algorithms," J. Geophys. Res.: Solid Earth 123, 296–310, https://doi.org/10.1002/2017JB014846 (2018).
6. S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annu. Rev. Fluid Mech. 52, 477–508 (2020).
7. L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," arXiv:1712.04621 (2017).
8. U. Frisch, B. Hasslacher, and Y. Pomeau, "Lattice-gas automata for the Navier–Stokes equation," Phys. Rev. Lett. 56, 1505 (1986).
9. S. Chen and G. D. Doolen, "Lattice Boltzmann method for fluid flows," Annu. Rev. Fluid Mech. 30, 329–364 (1998).
10. P. L. Bhatnagar, E. P. Gross, and M. Krook, "A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems," Phys. Rev. 94, 511 (1954).
11. S. Succi, The Lattice Boltzmann Equation: For Complex States of Flowing Matter (Oxford University Press, 2018).
12. P. Mora and D. A. Yuen, "Simulation of plume dynamics by the Lattice Boltzmann method," Geophys. J. Int. 210, 1932–1937 (2017).
13. P. Mora and D. A. Yuen, "Simulation of regimes of convection and plume dynamics by the thermal Lattice Boltzmann method," Phys. Earth Planet. Inter. 275, 69–79 (2018).
14. P. Mora and D. Yuen, "Comparison of convection for Reynolds and Arrhenius temperature dependent viscosities," Fluid Mech. Res. Int. 2, 99–107 (2018).
15. P. Mora, G. Morra, and D. A. Yuen, "Models of plate tectonics with the Lattice Boltzmann method," Artif. Intell. Geosci. 4, 47–58 (2023).
16. P. J. Dellar, "Lattice kinetic schemes for magnetohydrodynamics," J. Comput. Phys. 179, 95–126 (2002).
17. H. Huang, M. Sukop, and X. Lu, Multiphase Lattice Boltzmann Methods: Theory and Application (John Wiley & Sons, 2015).
18. P. Mora, G. Morra, D. A. Yuen, and R. Juanes, "Influence of wetting on viscous fingering via 2D Lattice Boltzmann simulations," Transp. Porous Media 138, 511–538 (2021).
19. K. V. Sharma, R. Straka, and F. W. Tavares, "Current status of Lattice Boltzmann methods applied to aerodynamic, aeroacoustic, and thermal flows," Prog. Aeronaut. Sci. 115, 100616 (2020).
20. T. Krüger, H. Kusumaatmaja, A. Kuzmin, O. Shardt, G. Silva, and E. M. Viggen, The Lattice Boltzmann Method (Springer International Publishing, 2017).
21. D. Turcotte and G. Schubert, Geodynamics (Cambridge University Press, 2014).
22. P. Constantin and C. R. Doering, "Infinite Prandtl number convection," J. Stat. Phys. 94, 159–172 (1999).
23. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 1–9.
24. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Comput. 1, 541–551 (1989).
25. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 580–587.
26. M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv:1312.4400 (2013).
27. N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, "The computational limits of deep learning," arXiv:2007.05558 (2020).
28. G. MacDonald, A. Godbout, B. Gillcash, and S. Cairns, "Volume-preserving neural networks," in 2021 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2021), pp. 1–9.
29. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324 (1998).
30. S. Arora, A. Bhaskara, R. Ge, and T. Ma, "Provable bounds for learning some deep representations," in International Conference on Machine Learning (PMLR, 2014), pp. 584–592.
31. F. Song and J. Dongarra, "Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores," in Proceedings of the 28th ACM International Conference on Supercomputing (Association for Computing Machinery, New York, 2014), pp. 333–342.
32. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM 60(6), 84–90 (2017).
33. F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, "A comprehensive survey on transfer learning," Proc. IEEE 109, 43–76 (2020).
34. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis. 115, 211–252 (2015).
35. K. Yang, K. Qinami, L. Fei-Fei, J. Deng, and O. Russakovsky, "Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy," in Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, New York, 2020).
36. K. T. Gribbon and D. G. Bailey, "A novel approach to real-time bilinear interpolation," in Proceedings of DELTA 2004, Second IEEE International Workshop on Electronic Design, Test and Applications (IEEE, 2004), pp. 126–131.
37. C. N. Vasconcelos and B. N. Vasconcelos, "Convolutional neural network committees for melanoma classification with classical and expert knowledge based image transforms data augmentation," arXiv:1702.07025 (2017).
38. Y. Xu, R. Jia, L. Mou, G. Li, Y. Chen, Y. Lu, and Z. Jin, "Improved relation classification by deep recurrent neural networks with data augmentation," arXiv:1601.03651 (2016).
39. S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell, "Understanding data augmentation for classification: When to warp?," in 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (IEEE, 2016), pp. 1–6.
40. B. Blankenbach, F. Busse, U. Christensen, L. Cserepes, D. Gunkel, U. Hansen, H. Harder, G. Jarvis, M. Koch, G. Marquart et al., "A benchmark comparison for mantle convection codes," Geophys. J. Int. 98, 23–38 (1989).