This work proposes a chemical mechanism tabulation method using artificial neural networks (ANNs) for turbulent combustion simulations. The method is employed here in the context of the Large-Eddy Simulation (LES)–Probability Density Function (PDF) approach and the method of stochastic fields for numerical solution, but can also be employed in other methods featuring real-time integration of chemical kinetics. The focus of the paper is on exploring an ANN architecture aiming at improved generalization, which uses a single multilayer perceptron (MLP) for each species over the entire training dataset. This method is shown to outperform previous approaches which take advantage of specialization by clustering the composition space using the Self-Organizing Map (SOM). The ANN training data are generated using the canonical combustion problem of igniting/extinguishing one-dimensional laminar flamelets with a detailed methane combustion mechanism, before being augmented with randomly generated data to produce a hybrid random/flamelet dataset with improved composition space coverage. The ANNs generated in this study are applied to the LES of a turbulent non-premixed CH_{4}/air flame, Sydney flame L. The transported PDF approach is used for reaction source term closure, while numerical solution is obtained using the method of stochastic fields. Very good agreement is observed between direct integration (DI) and the ANNs, meaning that the ANNs can successfully replace the integration of chemical kinetics. The time taken for the reaction source computation is reduced 18-fold, which means that LES–PDF simulations with comprehensive mechanisms can be performed on modest computing resources.

## I. INTRODUCTION

The accurate and efficient simulation of turbulent reacting flows is important for increased fuel efficiency and reduced pollutant formation. Direct Numerical Simulation (DNS) resolves all spatial and temporal scales of a turbulent flow field, which is powerful but computationally expensive. Large eddy simulation (LES) reduces the computational burden by filtering the equations of motion so that only the large scale motions of the flow field are resolved directly, while the subgrid scales are modeled. However, in the context of reacting flows, fluctuations at the unresolved subgrid scales strongly influence the filtered chemical reaction source terms, requiring important modeling decisions to be made.

Introducing a transport equation for the filtered joint probability density function (PDF) for the reactive scalars offers a description of these scalars at the subgrid scale. This has the key advantage of presenting the reaction source terms in closed form, with no further reaction modeling required. The present paper commemorates O'Brien who was a pioneer of PDF methods for reacting flows, initially laying the mathematical basis for PDF methods in simple reacting systems^{1,2} before moving on to PDF methods for turbulent reacting flows.^{3–6} Later, Gao and O'Brien^{7} established the LES–PDF method that underpins this work by combining the LES equations with the filtered fine-grained joint PDF for the reactive scalars. For a comprehensive treatment of PDF methods in reacting flows, the reader is referred to the reviews of O'Brien,^{8} Pope,^{9} and Dopazo.^{10} One complication associated with the LES–PDF method is the fact that the PDF for the reactive scalars contains a large number of independent variables, and therefore numerical solution can only be obtained with stochastic methods.

A commonly used stochastic solution method is based on Lagrangian stochastic particles,^{9} where a collection of particles is used to represent the joint PDF, coupled with an Eulerian approach for the velocity components. An alternative is the stochastic fields method, an Eulerian approach first formulated by Valiño^{11} with an alternative derivation provided by Sabelnikov and Soulard.^{12} The method was developed further for use in LES by, for example, Jones and Navarro-Martinez.^{13} It introduces an ensemble of stochastic fields which evolve according to stochastic partial differential equations and describe a stochastic system equivalent to the joint PDF transport equation. It has been applied successfully to a variety of turbulent flames.^{14–17} A caveat of stochastic methods is that the reaction source terms must be evaluated for every stochastic particle or field employed, which multiplies the number of evaluations required. For example, it is common to employ eight stochastic fields, in which case the number of reaction source term evaluations increases eightfold.

The chemical kinetics of a reaction mechanism is described by a set of stiff ordinary differential equations (ODEs), which must be solved at every point in space and time for each stochastic field. This system of ODEs can be large, particularly when using more complex mechanisms, such as those required for predicting pollutant formation. The evaluation of the reaction source terms consumes the majority of the required computational time when doing an LES–PDF simulation with stochastic fields. This is not a problem exclusive to the LES–PDF method; other methods such as DNS, unsteady flamelet, Conditional Moment Closure (CMC), Multiple Mapping Closure (MMC), Linear Eddy Model (LEM), Thickened Flame Model (TFM), and the OpenFOAM Partially Stirred Reactor (PaSR) all require solution of the reaction source term at every point and time instance. Further details about these methods can be found, e.g., in Refs. 18–20. The chemistry tabulation approach developed in the present paper is applicable to all of these methods.

The computational time required for the solution of the reaction source term can be reduced by shifting from an on-the-fly solution of the system of ODEs to a retrieval of the desired solution through a table assembled ahead of time. The simplest tabulation methods, such as the lookup table,^{21} suffer from the curse of dimensionality, as memory requirements grow exponentially with the number of species. This problem limits the practical application of simple tabulation methods to simulations involving only a few variables. Methods which tabulate the relevant chemical space on the fly, such as the binary tree based *in situ* adaptive tabulation (ISAT),^{22} alleviate this issue although memory requirements still grow as the simulation progresses, particularly if new regions of composition space are accessed. Tabulation using the flamelet model^{18} requires small amounts of memory with fast retrieval times but is restricted to certain flame regimes.

An alternative approach to tabulation is the use of Artificial Neural Networks (ANNs), a class of machine learning techniques capable of non-linear function approximation. In the case of reacting flows, the function being approximated describes the temporal evolution of the chemical species due to reaction, which is dictated solely by the reaction mechanism chosen. The ANNs are composed of many interconnected basis units, or neurons, which can contain non-linear activation functions that determine the neuron output given the inputs. The connections between neurons are assigned numerical values known as weights. The values of these weights are determined ahead of a simulation through a non-linear optimization process, referred to as training. It is only the weights that need to be stored during a simulation, so memory requirements are low. Retrievals are performed using a small number of matrix multiplications, which is a considerably faster process than the solution of the original ODEs.

Previous approaches to using ANNs for combustion chemistry tabulation can be roughly classified according to the method used to generate ANN training data. The simplest approaches^{23–27} generate training data by performing uniform sampling across the composition space, which is only feasible when considering a small number of species. Blasco *et al.*^{28} and Chen *et al.*^{29} both generated training data using a partially stirred reactor, before using different methods of splitting the composition space to increase ANN accuracy. Mixture fraction and temperature windows were used by Chen *et al.*,^{29} while Blasco *et al.*^{28} used the Self-Organizing Map (SOM),^{30} an ANN that is able to find regions of similarity in high-dimensional data, to subdivide the composition space. A single multilayer perceptron (MLP), a type of ANN capable of performing non-linear function approximation, was then trained for each subdivision.

A major issue related to the data generation process is generalization. If the training data are generated with the problem one is aiming to simulate, ANN application is made much easier, as the data encountered will be similar to that on which they were trained. However, this also means that new ANNs must be trained for each problem, thus negating any speed gain. A number of works have addressed this problem with the generation of data via a canonical problem and subsequent application to turbulent flames. Using a canonical problem has the added advantage of being problem non-specific, meaning that ANNs trained with data collected in this way can be applied to families of similar problems. Sen and Menon^{31} employed a flame-vortex interaction, while Sen *et al.*^{32,33} and Sinaei and Tabejamaat^{34} used LEM simulations to generate training data for the application of ANNs to LES of non-premixed syngas/air and CH_{4}/air jet flames, respectively. Chatzopoulos and Rigopoulos^{35} used an ensemble of igniting unsteady laminar flamelets to generate training data, with the aim of generating data applicable to a family of turbulent non-premixed flames. Since the data are shuffled prior to ANN training, the diffusion-reaction structure of flamelets is not retained and the predictive potential of the ANNs is not limited by the flamelet model assumptions. The data were divided into 400 regions using an SOM with a single MLP trained for all species in each region. The combined SOM–MLP method was then applied to a Reynolds Averaged Navier Stokes (RANS) PDF simulation of German Aerospace Center (DLR) flames A and B, and good agreement between direct integration (DI) and the SOM–MLP method was obtained. The SOM–MLP method was further developed by Franke *et al.*^{36} to include flamelets with high strain rates to capture extinction, before application to an LES–PDF simulation of Sydney flame L, again with good agreement between the SOM–MLP method and DI. 
The SOM–MLP method was also applied by An *et al.*^{37} to the LES simulation of a rocket-based H_{2}/CO/kerosene combustor, with training data collected via a RANS simulation of the same configuration rather than via unsteady flamelets. Other methods employed for generating ANN training data include the eddy dissipation model^{38} and a stochastic micromixing system.^{39} In the last work, two large MLPs were trained, one of them focusing on slow reactions, before application to a turbulent non-premixed syngas/O_{2} flame, where good agreement between the MLPs and DI was obtained.

Another class of works has used ANNs in conjunction with the steady flamelet model, which uses the mixture fraction or a progress variable to tabulate all reaction variables, thus reducing the computational time required compared to conventional integration; the main benefit of using ANNs with the flamelet model is therefore reduced memory requirements. For example, Kempf *et al.*^{40} used a single MLP for each predicted flamelet variable to perform an LES of a non-premixed piloted CH_{4}/air jet flame with good results, while Ihme *et al.*^{41,42} used a quantitative method to determine an optimal MLP architecture for steady flamelet training data, before application to the LES of a non-premixed bluff-body swirl-stabilized CH_{4}/H_{2} flame. A recent work by Hansinger *et al.*^{43} used a deep ANN trained on flamelet/progress variable (FPV) tables in an LES of the Sydney/Sandia flame, with good agreement obtained between the ANN approach and the simulation using conventional FPV tabulation.

The present paper contributes to the development of ANN tabulation methodologies for turbulent combustion modeling approaches featuring real-time integration of chemical kinetics. Previous work by our group was based on the SOM–MLP approach^{35,36} which, as mentioned, is an approach based on clustering and training of ANNs specialized to particular subdomains of the composition space. That approach, as well as those relying on alternative clustering techniques, takes advantage of MLP specialization to achieve more effective training. However, effective training is not the sole determinant of MLP accuracy; even a well-trained MLP will perform poorly if asked to make predictions on data that are significantly different to the training data. Therefore, this work proposes an ANN approach that focuses on generalization rather than specialization to improve performance, while retaining accuracy through careful ANN architecture selection and training. To achieve this, training data obtained from a canonical combustion problem are augmented with randomly generated data to improve composition space coverage. An investigation of different ANN architectures then shows that focusing on generalization gives increased accuracy in a representative application. Furthermore, in this work the MLP training is performed using a more sophisticated method than standard gradient descent, the Levenberg–Marquardt algorithm. While more computationally demanding, the use of this algorithm is rendered feasible by the simpler architecture of the single-species MLPs.

This paper is structured as follows: Sec. II describes the LES–PDF method and the stochastic field formulation in more detail, Sec. III gives an overview of the ANN tabulation process, and Sec. IV A describes how training data are generated and then augmented to better represent the composition space. Section IV C provides results from an investigation into ANN architecture and training, which shows that approaches relying on generalization are beneficial. Finally, Secs. IV D and V present the application of the best performing ANNs to laminar flamelet simulations and to the LES–PDF simulation of the turbulent non-premixed CH_{4}/air flame, Sydney flame L.

## II. THE LES–PDF METHOD

The equations for the LES–PDF method can be derived by applying density weighted, or Favre filtering, to the transport equations for reacting flows. Application of a box filter, of width equal to the cube root of the local cell volume, to the equations of continuity and momentum conservation gives Eqs. (1) and (2), respectively,

where overbars represent filtered values, tildes denote density weighted filtered values, and *u*, *p*, *μ*, and *ρ* represent the velocity, pressure, viscosity, and density, respectively. The isotropic parts of the viscous and subgrid scale stresses are absorbed into the pressure term, giving the subgrid scale stress tensor as

This term is modeled using the Smagorinsky model,^{44} which introduces a subgrid scale viscosity given by

where *C_s* is the Smagorinsky constant, determined through the dynamic procedure of Piomelli and Liu,^{45} Δ is the filter width, and $||\tilde{S}_{ij}||$ is the Frobenius norm of the resolved rate of strain tensor. Under the assumptions of constant thermodynamic pressure (low Mach number flow), equal diffusivities for all species, and a unity Lewis number, the filtered equation for the reactive scalars is given by

where $\varphi_\alpha$ is a general scalar, with $\alpha = 1, \ldots, N_s$ and *N_s* the number of variables needed to describe the system (i.e., the enthalpy and the species mass fractions), $\bar{J}_{j,\alpha}$ is the mean scalar flux, and $J_{j,\alpha}^{sgs}$ is the subgrid scale scalar flux. Corresponding to the unity Lewis number assumption,

where *σ* is the Prandtl/Schmidt number.

Because of the difficulties in evaluating the filtered reaction source terms, $\overline{\rho\dot{\omega}_\alpha}$, Eq. (5) is not solved directly and instead the LES–PDF approach is followed. A density weighted joint subgrid PDF for the scalars, $\tilde{P}_{sgs}$, is introduced. A transport equation for $\tilde{P}_{sgs}$ can be derived by following, for example, Gao and O'Brien.^{7} Following the approach of Brauner *et al.*,^{46} the resolved parts of the convection and "molecular" mixing are added to both sides of the equation, so that the modeled form of the equation becomes

where $\psi_\alpha$ is the "phase space" variable for the scalar *α*. The subgrid scale convection of $\tilde{P}_{sgs}$ is modeled using a gradient closure analogous to the Smagorinsky model, while the Linear Mean Square Estimation (LMSE)^{4} or Interaction by Exchange with the Mean (IEM)^{47} closure has been employed for the subgrid scale mixing term. The sgs mixing constant *C_d* is assigned the value 2.0, while $\tau_{sgs}$ is the micro-mixing time scale, given by $\tau_{sgs} = \bar{\rho}\Delta^2/\mu_{sgs}$.

The stochastic fields method formulated in Ref. 11 and extended to LES in Ref. 13 is used for the numerical solution of Eq. (6). This involves the solution of an equivalent stochastic system which shares the same statistical moments as $\tilde{P}_{sgs}$—see the Appendix. This equivalent system is described by an ensemble of *N* stochastic fields for each of the *N_s* reactive scalars. Using the Itô stochastic integral, the stochastic fields $\xi_\alpha^n$ can be shown to evolve according to

where $dW_i^n$ represents increments of a (vector) Wiener process, different for each field but independent of the spatial location **x**. The increments are given by $\eta_i^n\sqrt{dt}$, where $\eta_i^n$ is a $\{-1,1\}$ dichotomic random vector. Filtered quantities, $\tilde{\varphi}_\alpha$, are easily obtained through a simple averaging over the stochastic fields, as in the following equation:

$$\tilde{\varphi}_\alpha(\mathbf{x},t) = \frac{1}{N}\sum_{n=1}^{N}\xi_\alpha^{n}(\mathbf{x},t). \qquad (8)$$

From Eq. (7), it can be seen that each of the *N* stochastic fields has its own reaction source term, and thus the number of calls to the chemical reaction solver is increased with each extra field used. The solution of the reaction source term occupies the majority of the time in an LES–PDF simulation using the stochastic fields method. Section III outlines how ANNs are used to approximate the reaction source term, in order to reduce computational expense.

## III. OVERVIEW OF ANN TABULATION OF CHEMICAL KINETICS

### A. The ANN tabulation process

The reaction source term for a given stochastic field defines an initial value problem which, for an adiabatic, isobaric reaction can be expressed by the following equation:

The function *f* is implicitly defined by the system of ODEs given by the chosen reaction mechanism. This function is in turn approximated by the ANNs. In the present paper, the ANNs will be applied to the tabulation of the Gas Research Institute (GRI) 1.2 mechanism,^{48} so a single point in composition space is described by a vector of *N _{s}* = 32 variables: the total enthalpy and 31 species mass fractions.

The relevant ANN data are obtained by numerical integration of Eq. (9) over a time step *δt*, giving an input-change pair consisting of $\xi_\alpha(t)$ and the change in the composition vector, $\delta\xi_\alpha(t)$, over that time step. The total enthalpy, *h*, and the N_{2} mass fraction are unchanged over the reaction step, meaning that $\delta\xi_1$ and $\delta\xi_{32}$ are always zero; the changes in these variables are therefore not predicted by the ANNs, although they do form part of the ANN input. The composition vector at the subsequent time step, $\xi_\alpha(t+\delta t)$, is easily obtained as $\xi_\alpha(t) + \delta\xi_\alpha(t)$. Using $\delta\xi_\alpha(t)$ as the output instead of $\xi_\alpha(t+\delta t)$ has been done previously (e.g., Refs. 23, 24, 26, and 29) and is beneficial: the changes in species mass fractions are generally small compared to the mass fractions themselves, so using the changes increases the resolution of the problem. The ANN tabulation method consists of the following steps.

#### 1. Generation of data for ANN training

The first step is to generate a training dataset consisting of many input-change pairs of the form $(\xi_\alpha, \delta\xi_\alpha)$ so that the ANNs can be trained to accurately imitate the reaction mechanism. This is performed by augmenting data obtained from unsteady one-dimensional laminar flamelet simulations with randomly generated data to form a hybrid random/flamelet training dataset, as explained in Sec. IV A.
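As an illustration of this step, the integration of one input-change pair can be sketched as below. The stiff-solver call is standard SciPy; `toy_rates` is a stand-in for the source terms of a real mechanism such as GRI 1.2, which would in practice come from a chemistry library.

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_input_change_pair(xi, rates_fn, dt=1e-6):
    """Integrate the chemistry ODEs over one time step dt, returning the
    (input, change) training pair (xi(t), delta_xi(t)).  rates_fn(xi) is a
    stand-in for the mechanism's source terms, not GRI 1.2 itself."""
    sol = solve_ivp(lambda t, y: rates_fn(y), (0.0, dt), xi,
                    method="BDF", rtol=1e-8, atol=1e-12)  # stiff solver
    delta = sol.y[:, -1] - xi
    return xi, delta

# Toy stand-in "mechanism": a single reaction A -> B.  Entries 0 and -1 play
# the roles of enthalpy and N2, whose source terms are zero, so their
# changes are always zero and need not be predicted by the ANNs.
def toy_rates(y):
    dydt = np.zeros_like(y)
    dydt[1] = -100.0 * y[1]   # species A consumed
    dydt[2] = +100.0 * y[1]   # species B formed
    return dydt

xi0 = np.array([0.5, 0.2, 0.0, 0.3])   # [h, Y_A, Y_B, Y_N2], illustrative
inp, delta = make_input_change_pair(xi0, toy_rates)
```

Repeating this over many sampled compositions yields the $(\xi_\alpha, \delta\xi_\alpha)$ pairs used for training.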

#### 2. ANN architecture selection and training

The next step is to choose an ANN architecture and perform training using the data generated in step 1. Details of the ANN architecture selection and the training method are given in Sec. IV C, with a comparison made between methods which focus on specialization and those which focus on generalization.

#### 3. Application of ANNs

The best performing ANNs from step 2 are then applied to two simulations. In Sec. IV D, they replace the reaction mechanism in laminar flamelet simulations at varying strain rates to validate the data generation and training procedure. Then, in Sec. V, they are used in place of the reaction mechanism in an LES–PDF simulation of the non-premixed CH_{4}/air flame Sydney flame L, and comparison is made to a simulation using DI for the reaction source term.

### B. Summary of SOM and MLP theory

A brief explanation of the theory of the two types of ANN used in this work is given here, with a thorough explanation given elsewhere (e.g., Ref. 49). The first type of ANN used is the self-organizing map, whose purpose is to divide the composition space into subdomains in which further ANNs are going to perform function approximation. The SOM consists of a two-dimensional grid of neurons, each represented by a vector in composition space. At the start of training, the neurons are given random initial values and the input part of an input-change pair is presented. The Euclidean distances between the point considered and the SOM neurons are computed, and the closest SOM neuron is moved toward that point. This process is repeated by iterating through the training dataset, until the locations of the SOM neurons no longer change appreciably. Once this has been achieved, the SOM neurons are fixed, with each SOM neuron representing the center of a cluster in composition space. Any presented data can then be subdivided according to these clusters by finding the closest neuron in Euclidean space to the input point considered.
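A minimal sketch of the winner-update rule described above; this toy version uses a one-dimensional list of neurons and omits the neighborhood updates of a full two-dimensional SOM, so it is simpler than the SOMs used in the works cited here.

```python
import numpy as np

def train_som(data, n_neurons=16, lr=0.5, n_epochs=5, seed=0):
    """Toy SOM: neurons start at random positions, and for each sample the
    closest neuron (the 'winner') is moved toward it.  A full SOM also
    updates the winner's grid neighbors; that is omitted here."""
    rng = np.random.default_rng(seed)
    neurons = rng.uniform(data.min(0), data.max(0),
                          (n_neurons, data.shape[1]))
    for _ in range(n_epochs):
        for x in data:
            w = np.argmin(np.linalg.norm(neurons - x, axis=1))  # winner
            neurons[w] += lr * (x - neurons[w])
        lr *= 0.5   # decaying learning rate aids convergence
    return neurons

def assign_cluster(x, neurons):
    """Subdivide presented data by the nearest neuron in Euclidean space."""
    return int(np.argmin(np.linalg.norm(neurons - x, axis=1)))

# Two well-separated blobs should map to different clusters.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.05, (50, 2)),
                  rng.normal(1.0, 0.05, (50, 2))])
neurons = train_som(data)
```

After training, `assign_cluster` performs the nearest-neuron lookup used to route each composition point to its cluster-specific MLP.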

Function approximation is performed using multilayer perceptrons. In the SOM–MLP approach, individual MLPs are trained for clusters found through the SOM. As Fig. 1 shows, an MLP consists of successive layers of neurons with connecting weights.

The first layer is the input layer, which takes the input part of an input-change data pair. These inputs are fed forward to the so-called hidden layer according to

$$\mathbf{n} = g^1\left(W^1\mathbf{r} + \mathbf{b}^1\right), \qquad (10)$$

where **r** is the input vector, **n** is the hidden layer output vector, $W^1$ is the matrix of weights connecting the input and hidden layers, and $\mathbf{b}^1$ is the vector of weights connecting the input layer bias neuron (indicated by $n_b$ in Fig. 1) to the hidden layer neurons. The function $g^1$ is the hidden layer activation function, a non-linear function which is the source of the MLP's ability to perform non-linear function approximation. Although in principle an MLP can have any number of hidden layers with any number of neurons in each layer, throughout this work only one hidden layer of 30 neurons is used, so the overall MLP output **o** is calculated as

$$\mathbf{o} = g^2\left(W^2\mathbf{n} + \mathbf{b}^2\right), \qquad (11)$$

where $g^2$ is the output layer activation function, $W^2$ is the matrix of weights connecting the hidden and output layers, and $\mathbf{b}^2$ is the weights vector connecting the hidden layer bias neuron to the output. The MLP shown has a generic number of outputs $N_o$: this can range from $N_o = 1$ if predicting the change in a single species to $N_o = 30$ if predicting the changes in all of the species in the GRI 1.2 mechanism (as the changes in total enthalpy and N_{2} mass fraction are not predicted). During training, the MLP weights are adjusted based on the error between the MLP output and the target output **s** for a given input **r**, using the Levenberg–Marquardt algorithm,^{50} detailed in Sec. IV B. At run-time, the weights determined at the end of the training are used with Eqs. (10) and (11) to make predictions for each input.
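The run-time retrieval of Eqs. (10) and (11) amounts to two small matrix multiplications. A sketch follows, with illustrative activation choices (tanh hidden layer, linear output; the specific $g^1$ and $g^2$ are assumptions here, and the weights are random rather than trained):

```python
import numpy as np

def mlp_forward(r, W1, b1, W2, b2):
    """Single-hidden-layer MLP of Eqs. (10) and (11):
    n = g1(W1 r + b1), o = g2(W2 n + b2)."""
    n = np.tanh(W1 @ r + b1)   # hidden layer, non-linear activation g1
    return W2 @ n + b2         # output layer, linear activation g2

# Shapes used in this work: 32 inputs, one hidden layer of 30 neurons,
# and N_o outputs (here N_o = 1, a single-species MLP).
rng = np.random.default_rng(0)
Ni, Nh, No = 32, 30, 1
W1, b1 = rng.normal(size=(Nh, Ni)), rng.normal(size=Nh)
W2, b2 = rng.normal(size=(No, Nh)), rng.normal(size=No)
o = mlp_forward(rng.normal(size=Ni), W1, b1, W2, b2)
```

The low cost of this forward pass, compared with solving the stiff ODE system, is the source of the speed-up reported later.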

## IV. A NEW APPROACH TO TRAINING ANNs FOR CHEMICAL KINETICS TABULATION

This section details the elements of the approach developed in the present paper for training ANNs for chemical kinetics tabulation. The main elements are as follows:

- ANN training data generation based on the canonical combustion problem approach of Chatzopoulos and Rigopoulos,^{35} but with the addition of randomly generated data to increase composition space coverage and the generalization capability of the ANNs.
- ANN training using the Levenberg–Marquardt algorithm, which gives greater accuracy than stochastic gradient based training algorithms.
- An ANN architecture based on generalization, using a single MLP for each considered species, which is found to outperform approaches based on specialization through clustering.

### A. Training data generation

The underlying aim is for composition space coverage to be as general as possible, so that the ANNs can be applied to a range of combustion problems. However, the considered composition space has 32 dimensions, making uniform coverage infeasible. Only a section of the composition space will be covered in realistic combustion scenarios, largely dictated by the chosen types and quantities of fuel and oxidizer, as well as by the reaction mechanism. Generating composition data using a canonical combustion problem ensures that only the section of the 32 dimensional space relevant to combustion is targeted. In this work, the canonical combustion problem used is the method of igniting/extinguishing one-dimensional laminar flamelets.^{35,36}

One hundred flamelets were simulated with strain rates chosen randomly between 1 s^{−1} and 1100 s^{−1}, using a time step of $1 \times 10^{-6}$ s. Half of the flamelets were piloted by setting a small part of the mixture fraction space to equilibrium as in Ref. 36, while the other half were initialized with the entire mixture fraction space set to equilibrium conditions. The initial temperature of the flamelets was varied randomly between 300 K and 500 K to expand the range of enthalpy covered by the training data; this is equivalent to adjusting the temperature of a burnt mixture without altering the composition. Data with a temperature below 500 K or a mixture fraction outside the range $0.02 \le z \le 0.1$ were not collected, due to low reactivity. Overall, approximately 250 000 composition space data points were taken from the flamelet simulations.

To expand the generality of the training dataset, a hybrid random/flamelet dataset was formed by augmenting the flamelet data with randomly generated data.^{51} Although the flamelet data broadly cover the composition space expected in the turbulent flame simulation, it is unlikely that data from the turbulent flame will conform exactly to the flamelet equations. Like other function approximation models, MLPs have the capacity to interpolate, but if the regions of composition space accessed by the turbulent flame are sparsely populated by the training data, accuracy loss is inevitable. Additionally, the MLPs should be robust to their own prediction errors, as reactive points pass through the MLPs many times as a simulation progresses.

The two main types of MLP prediction error arising in simulations are mass conservation errors and element conservation errors. Mass conservation errors were corrected by normalizing the total predicted mass to the initial mass of the reacting species, $1 - \xi_{N_2}$. Element conservation errors from MLP predictions are harder to correct, so allowance was made for variable molar element ratios in the augmented data. The element ratios in the data from the flamelet simulations are exact: the molar ratio of H atoms to C atoms is always exactly four, as the fuel is CH_{4}, while the ratio of O atoms to N atoms is the same as that in air, set to 0.264. The allowable range of molar C to N ratios is determined by the mixture fraction limits.

To generate data with variable element ratios, the data from the flamelet simulations are augmented with randomly generated data. As the composition space is multidimensional, the generation of a representative dataset has to be guided by a canonical combustion problem. Therefore, it makes sense to employ the flamelet data, which roughly target the desired region of composition space, and modify them to increase composition space coverage and ANN robustness. Augmentation of the flamelet data was achieved by taking each input variable from the flamelet data in turn, with *h* and the N_{2} mass fraction adjusted linearly using the following equation:

where *p* is the randomly generated value, $p^f$ is the value of either *h* or the N_{2} mass fraction from the flamelet data, and *a* is a uniform random number between −1 and 1. The remaining species mass fractions were individually adjusted using the following equation:

where $p^f$ is again the relevant species mass fraction from the flamelet data and *b* is another uniform random number between −1 and 1. In contrast to the linear alteration of *h* and N_{2}, the remaining species are altered in an exponential fashion, which gives larger changes. This is because the range of *h* and N_{2} is well defined through the choice of fuel and oxidizer, whereas it is harder to predict the values that the species (particularly intermediate species) will take throughout a simulation. The resulting mass fractions were normalized to ensure summation to unity. Then, each generated composition went through an acceptance-rejection process to ensure that the element molar ratios were within allowable limits. Table I gives the acceptable ranges of the H/C and O/N molar ratios and the mixture fraction.

| Quantity | Range |
|---|---|
| H/C | 3.8–4.2 |
| O/N | 0.254–0.274 |
| *z* | 0.02–0.10 |

Any point which failed any of these conditions was rejected, with different values of *a* and *b* generated until the point was accepted. The limits on *a*, *b*, H/C, O/N, and *z* are somewhat arbitrary, but some guidance can be provided: increasing the range of *a* and *b* introduces more randomness (and consequently more rejections), while widening the ranges of allowable element ratios leads to fewer rejections. The values given have been found to work well, introducing data significantly different to the flamelet data quickly and efficiently, without an unacceptably high level of rejections. The randomly generated data were combined with the original flamelet data and shuffled, with points then integrated over a time step of $1 \times 10^{-6}$ s to form a training dataset of approximately 500 000 input-change pairs, $[p_\alpha, \delta p_\alpha]$. The magnitudes of the considered variables span a wide range, so for effective training linear scaling is performed on each variable in the input-change pairs using the following equations:

where $r_\alpha$ and $s_\alpha$ are the scaled inputs and changes, respectively. This scaled dataset is used for ANN training.
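The acceptance-rejection step described above can be sketched as follows, with the Table I limits hard-coded. The `perturb` and `ratios` callables are placeholders for the random adjustments of Eqs. (13) and (14) and the mechanism-specific element-ratio computation, both of which depend on the species ordering of the mechanism used.

```python
import numpy as np

# Acceptance limits of Table I
LIMITS = {"H/C": (3.8, 4.2), "O/N": (0.254, 0.274), "z": (0.02, 0.10)}

def augment_point(flamelet_point, perturb, ratios, rng, max_tries=1000):
    """Draw perturbed candidates until one satisfies the element-ratio and
    mixture-fraction limits.  perturb(point, rng) applies the random
    adjustments and renormalization; ratios(point) returns (H/C, O/N, z).
    Both are user-supplied placeholders here."""
    for _ in range(max_tries):
        cand = perturb(flamelet_point, rng)
        hc, on, z = ratios(cand)
        if (LIMITS["H/C"][0] <= hc <= LIMITS["H/C"][1]
                and LIMITS["O/N"][0] <= on <= LIMITS["O/N"][1]
                and LIMITS["z"][0] <= z <= LIMITS["z"][1]):
            return cand
    raise RuntimeError("no acceptable candidate found")

# Toy demonstration: additive-noise perturbation, fixed (valid) ratios.
rng = np.random.default_rng(0)
flamelet_pt = np.full(4, 0.25)
cand = augment_point(flamelet_pt,
                     perturb=lambda p, r: p + r.normal(0.0, 0.01, p.shape),
                     ratios=lambda p: (4.0, 0.264, 0.05),
                     rng=rng)
```

Tightening the limits or widening the perturbation ranges directly trades composition space coverage against the rejection rate, as discussed above.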

### B. The Levenberg–Marquardt method for MLP training

During training, the MLP weights $W_1,\, b_1,\, W_2,\, b_2$ are adjusted using the Levenberg–Marquardt algorithm,^{50} which has been used before for training MLPs for combustion chemical kinetics tabulation (e.g., Refs. 42 and 51–53). The aim of MLP training is to find the weights which minimize a given objective function, *F*, in this case the sum of squared errors between the target outputs, **s**, and the MLP predictions, **o**, over a dataset of $N_d$ points ($q = 1, \ldots, N_d$),

$$F(\mathbf{w}) = \sum_{q=1}^{N_d} e_q^2, \qquad e_q = s_q - o_q, \qquad (16)$$

where $\mathbf{w}$ is the vectorized form of the network weights and biases and $e_q$ is the error between the MLP output and the target output for data point *q*. After giving the MLP weights initial values, Levenberg–Marquardt training is performed iteratively, with the change in weights for a given data point calculated as

$$\Delta\mathbf{w} = -\left(\mathbf{J}_q^{\mathsf{T}}\mathbf{J}_q + \lambda\mathbf{I}\right)^{-1}\mathbf{J}_q^{\mathsf{T}}\, e_q, \qquad (17)$$

where *λ* is an adjustable parameter, $\mathbf{I}$ is the identity matrix, and $\mathbf{J}_q$ is the Jacobian matrix, containing the derivatives of the network error with respect to the weights, of the form

$$\mathbf{J}_q = \left[\frac{\partial e_q}{\partial w_1}\;\; \frac{\partial e_q}{\partial w_2}\;\; \cdots\;\; \frac{\partial e_q}{\partial w_{N_w}}\right], \qquad (18)$$

with $N_w$ the total number of weights. By varying the value of *λ*, the Levenberg–Marquardt algorithm can effectively switch between the Gauss–Newton method, which converges quickly but can be unstable, and standard gradient descent, which converges slowly but is very robust. Here, the Levenberg–Marquardt algorithm is implemented with the modification of Wilamowski and Yu,^{54} which allows training to be performed on large datasets without overwhelming memory requirements.
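A single Levenberg–Marquardt update of the kind described above can be sketched for a tiny MLP as follows. This is a minimal illustration, not the paper's implementation: the network size is arbitrary, and the Jacobian of the scalar error is obtained by finite differences for clarity rather than by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(w, x, n_hidden=5):
    """Evaluate a one-hidden-layer tanh MLP from a flat weight vector w."""
    n_in = x.size
    i = 0
    W1 = w[i:i + n_hidden * n_in].reshape(n_hidden, n_in); i += n_hidden * n_in
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden]; i += n_hidden
    b2 = w[i]
    return W2 @ np.tanh(W1 @ x + b1) + b2

def lm_step(w, x, target, lam=1e-2, eps=1e-6):
    """One Levenberg-Marquardt weight update for a single data point."""
    e = target - mlp(w, x)                    # scalar error e_q
    J = np.zeros_like(w)                      # row Jacobian de_q/dw
    for k in range(w.size):
        wp = w.copy(); wp[k] += eps
        J[k] = ((target - mlp(wp, x)) - e) / eps
    A = np.outer(J, J) + lam * np.eye(w.size)
    # dw = -(J^T J + lam I)^{-1} J^T e
    return w + np.linalg.solve(A, -J * e)
```

Small values of `lam` recover a Gauss–Newton-like step, while large values shrink the update toward scaled gradient descent.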

Figure 2 shows the predictions vs targets of an MLP trained with stochastic gradient descent to predict the change in CO mass fractions over the hybrid random/flamelet dataset. Figure 3 shows the predictions vs targets of an MLP of the same size, trained on the same problem with the Levenberg–Marquardt algorithm. It is clear from Figs. 2 and 3 that training MLPs with the Levenberg–Marquardt algorithm gives substantial accuracy benefits.

One caveat of the Levenberg–Marquardt algorithm is that it becomes infeasible for training MLPs with a very large number of weights. Using a large number of weights increases the size of the Jacobian matrix, as shown by Eq. (18), while also increasing the computational cost of the weight updates of Eq. (17). This practical consideration precludes the use of the Levenberg–Marquardt algorithm with very large MLPs (such as the ones used in Ref. 39) or with large numbers of medium-sized ones (such as those in Refs. 36 and 37), meaning that approaches based on stochastic gradient descent have to be used instead. This suggests that the number of MLP weights should be kept small wherever possible, to allow more sophisticated training algorithms to be used.

### C. Specialization vs generalization

The next step in the method is to determine a suitable ANN architecture. In general, with ANNs, there is a speed-accuracy trade-off: larger networks give greater accuracy for a given problem at the expense of computational speed. Small, fast MLPs can be used with high accuracy if the problem can be simplified. This simplification has previously been performed by subdividing the composition space, with the SOM frequently used for this purpose.^{28,35–37}

In general, it is good practice when training ANNs to split the entire training dataset into two parts, with one part used to determine the optimal SOM/MLP parameters (cluster locations and weights). The reserved part is never used to determine the optimal ANN parameters but instead forms a test set to ensure that the training procedure has been successful. It is on this reserved portion of the training dataset that the SOM has been shown to improve accuracy.
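The split described above can be sketched as follows; the 80/20 ratio is an illustrative choice, not one stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_test_split(n_points, test_fraction=0.2):
    """Return disjoint index arrays for the fitting and held-out portions."""
    idx = rng.permutation(n_points)       # shuffle so both parts are representative
    n_test = int(test_fraction * n_points)
    return idx[n_test:], idx[:n_test]
```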

The performance of an ANN method in this reserved test set is of limited use as it is the performance in the application that is of most interest. It would be possible to achieve very high accuracy in the reserved test set through specialization (that is, learning the exact patterns in the training data rather than the underlying function) and consequently have poor generalization capabilities. Here, to test the performance of different ANN architectures, 500 000 points are randomly sampled throughout an LES–PDF simulation of Sydney flame L with stochastic fields, using conventional DI of the reaction source term. It should be emphasized that this dataset has not been used at any point during the ANN training procedure, but instead serves only to test the generalization capability of the various ANN architectures.

In order to assess the effects of specialization using SOM clustering on overall MLP prediction accuracy, a comparison of the performance of various architectures on this Sydney L dataset was undertaken, with a summary of all of the considered architectures given by Table II. Architecture A uses the best performing SOM architecture of Franke *et al.*^{36} and serves as the reference case. Architecture B uses the same number of SOM neurons as architecture A, but in B each cluster has one MLP per species predicted (so 3000 MLPs in total), rather than one MLP for all species predictions as in A.

| Architecture | SOM | SOM neurons | One MLP per species | Other features |
|---|---|---|---|---|
| A | Yes | 100 | No | … |
| B | Yes | 100 | Yes | … |
| C | Yes | 9 | Yes | Grouping of SOM |
| D | Yes | 4 | Yes | Major species only |
| E | Yes | 4 | Yes | … |
| F | No | … | Yes | … |

Architecture C groups data from a given SOM neuron and its three closest neighbors by Euclidean distance and uses these groups to train one MLP for each species. Architecture D uses one MLP per species trained on data in clusters found by applying an SOM to only the enthalpy and the major species: CH_{4}, O_{2}, H_{2}O, CO, CO_{2}, and N_{2}. Architectures E and F use one MLP per species, with architecture E using only four SOM neurons, and architecture F using no SOM at all.

The hybrid random/flamelet dataset from Sec. IV A was used for both SOM and MLP training. In all cases, the MLPs used contained a single hidden layer of thirty neurons. Hyperbolic tangent and linear activation functions were used on the hidden and output layers, respectively. All MLPs were trained with the Levenberg–Marquardt algorithm, as described in Sec. IV B. After training, the root mean square (RMS) error between the network output and the desired target was calculated over the entire Sydney L testing dataset for each species, according to the following equation:

$$\epsilon_\alpha = \sqrt{\frac{1}{N_d}\sum_{i=1}^{N_d}\left(t_\alpha^i - o_\alpha^i\right)^2}, \qquad (19)$$

where $\epsilon_\alpha$ is the RMS error in species *α*, $N_d$ is the number of data points in the Sydney L testing dataset ($\approx$ 500 000), $t_\alpha^i$ is the network target for the *i*th input point, and $o_\alpha^i$ is the network output for the corresponding input point. Table III gives the mean RMS error from Eq. (19) over all the species, while Figs. 4 and 5 show the errors for each architecture for the species CO_{2} and OH. The worst performing architecture is A, while accuracy generally increases with fewer SOM neurons. Architecture B is more accurate than A, showing that using a single MLP for each species is considerably more accurate than using a single MLP for all species, even when using the SOM.
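The per-species RMS test error of Eq. (19) is straightforward to compute over arrays of network targets and outputs; a minimal sketch:

```python
import numpy as np

def rms_error(targets, outputs):
    """Per-species RMS error as in Eq. (19).
    targets, outputs: arrays of shape (N_d, n_species)."""
    return np.sqrt(np.mean((targets - outputs) ** 2, axis=0))
```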

| Architecture | Mean ϵ (%) |
|---|---|
| A | 0.202 |
| B | 0.152 |
| C | 0.164 |
| D | 0.063 |
| E | 0.061 |
| F | 0.059 |

Figure 6 shows the prediction error, $\epsilon^i$, for a given input, as calculated by Eq. (20), plotted against the Euclidean distance between the point and its closest SOM neuron for architecture A:

$$\epsilon^i = \frac{1}{N_s}\sum_{\alpha=1}^{N_s}\left|t_\alpha^i - o_\alpha^i\right|, \qquad (20)$$

where $\epsilon^i$ is the mean absolute error over all $N_s$ species for the *i*th input. It can be seen that as the distance from the center of an SOM cluster increases, prediction accuracy decreases in a roughly linear manner. This is a consequence of specialization: more common points near the center of a cluster are prioritized during training at the expense of less frequent points far from the center. Architecture C mitigates this by having fewer SOM neurons, with Figs. 4 and 5 showing improved accuracy, although the good performance in these species is outweighed by poorer performance in other species, leading to a higher mean error overall compared to architecture B.

Architectures D and E showed very similar performance, with the small error difference indicating that the most meaningful clusters are found in the major variables and that clustering in the minor species is not as relevant. The best performing architecture overall is architecture F, using a single MLP for each species and no SOM. The key problem with the SOM is that it identifies regions of similarity in the training data, which may not share the same distribution as the data encountered in the simulation. This is true regardless of the complexity of the ANNs used for regression: although a more complex ANN would be more accurate, it would still be specialized to data that may not be representative of the simulation. By using a single MLP for each species, training can be performed effectively over the entire training dataset, meaning clustering is not necessary even with a complex reaction mechanism. This has the added benefit that, if desired, different MLP architectures can be used for each species, depending on how difficult they are to predict.

One downside of using a single MLP for each species is that at run-time 30 MLPs have to be called, rather than one. This is mitigated to some extent by no longer having to compute the Euclidean distance of a point to all of the SOM neurons. A slight increase in the ANN computation time is an acceptable trade-off for a large increase in prediction accuracy, particularly in the context of LES–PDF simulations where DI of the reaction source term is many times slower than the ANNs anyway.
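The run-time structure of architecture F is simple: the same scaled input vector is fed to one MLP per species, with no SOM distance computation required. A minimal sketch (the MLP container is an illustrative assumption):

```python
import numpy as np

def ann_reaction_step(scaled_input, species_mlps):
    """Return the predicted (scaled) composition change for every species,
    evaluating one MLP per species on the shared input."""
    return np.array([mlp(scaled_input) for mlp in species_mlps])
```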

### D. MLP validation via laminar flamelet simulations

The aim of Sec. IV A was to increase the composition space covered by the dataset used to train the MLPs. To ensure that using hybrid random/flamelet data does not come at the cost of decreased MLP accuracy on the original flamelet data, the trained MLPs from architecture F were used to replace the reaction mechanism in simulations of CH_{4}/air laminar flamelets at strain rates between 1 s^{−1} and 530 s^{−1} (just below the extinction strain rate). The flamelets were piloted by setting a small section of mixture fraction space to equilibrium conditions for 1000 steps before being simulated until steady state. The time step used was the same as that for which the MLPs were trained, $1\times10^{-6}$ s.

Figures 7 and 8 show the steady state temperature and CO mass fraction flamelet profiles for strain rates between 25 s^{−1} and 530 s^{−1}. The profiles from the simulations using MLPs are shown as black dotted lines, while the red lines are the simulations using DI of the reaction step. It can be seen from Figs. 7 and 8 that the MLP prediction accuracy is generally very good, but decreases slightly with decreasing strain rate. This is because the lower the strain rate, the more slowly the flamelet profile develops, and so the error from repeated MLP predictions has more time to accumulate.

Figure 9 shows the temperature flamelet profile for a strain rate of 1 s^{−1}. In this extreme case the MLP predicted profile deviates considerably from the DI profile. Although the predicted flamelet profile at a strain rate of 1 s^{−1} is inaccurate, this does not cast doubt on the overall MLP method. This is something of a contrived case, as at such low strain rates the equilibrium pilot used to ignite the flamelet is essentially conserved due to a lack of diffusion, meaning the MLPs are tasked with repeatedly predicting equilibrium states where the chemical reaction rates are approximately zero. Because there is always some error associated with MLP predictions, the best that could be hoped for in this case is that the net prediction errors are zero. The flamelet simulation at a strain rate of 1 s^{−1} was run for 0.1 s, meaning the error shown has accumulated over 100 000 MLP predictions; a single MLP prediction, although not exact when predicting equilibrium states, is only slightly inaccurate.

Such an accumulation of errors does not occur in a flame simulation, where the fluid elements do not remain indefinitely within the flame zone. If one wished to avoid incorrect MLP predictions at equilibrium, a comparison of the current temperature of a reacting point with the equilibrium temperature could be performed, with no prediction made if the current temperature is sufficiently close to the equilibrium temperature.
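The guard suggested above can be sketched as follows; the temperature tolerance is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def guarded_reaction_step(composition, T, T_eq, mlp_predict, tol=5.0):
    """Skip the MLP prediction when the point is already near equilibrium,
    where the true source term is approximately zero (tol in K, assumed)."""
    if abs(T - T_eq) < tol:
        return composition            # treat the point as chemically inert
    return composition + mlp_predict(composition)
```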

## V. APPLICATION TO SYDNEY FLAME L

### A. Experimental details and previous studies

The Sydney L flame considered here as the test case for the ANN method was studied experimentally by Masri *et al.*^{55} The burner exit geometry is identical to the Sandia flame series, with a central fuel jet of diameter *D* = 7.2 mm surrounded by an annular pilot with a diameter of $D_P = 18.2$ mm. The fuel jet is 100% CH_{4} injected at a bulk velocity of 41 m/s, while the pilot consists of a fully burnt mixture of C_{2}H_{2}/H_{2}/air with the same mixture fraction as stoichiometric CH_{4}/air, *z* = 0.055. The pilot bulk velocity is assumed to be 24 m/s, based on the unburnt pilot velocity of 3 m/s and a fully burnt pilot mixture temperature of 2600 K. An air co-flow is maintained around the pilot at 15 m/s.

Several previous simulations of Sydney flame L have been conducted. Chen *et al.*^{21} used RANS with a joint-scalar transported PDF, with the Curl model used for micromixing closure. A stochastic solution scheme based on Lagrangian particles was employed, with two mechanisms tested: a reduced mechanism of four species and a constrained equilibrium model with four scalars. Mean radial profile peak values were reasonably predicted for most species, but the experimental spreading rate was not well captured, a discrepancy attributed to the micromixing model. Masri and Pope^{56} employed RANS coupled with the joint scalar-velocity PDF and a stochastic solution scheme using Lagrangian particles. Although the experimental spreading rate was well captured, local extinction was significantly under-predicted due to the simple thermochemical approaches employed. Jones and Kakhi^{57} used RANS with a joint-scalar PDF approach and a stochastic solution scheme, testing two different micromixing models: Curl and LMSE. Mean radial profiles were reasonably predicted close to the nozzle but deteriorated further downstream, with the differences attributed to the simple micromixing models employed.

More recent studies have applied LES to the simulation of Sydney flame L. Prasad *et al.*^{16} employed the LES–PDF method using stochastic fields with a reduced mechanism of 19 species in simulations of Sydney flames B, L, and M. A very fine body-fitted pyramid grid of approximately 7.2 × 10^{6} cells was used, with the mean radial profiles shown (temperature, mixture fraction, CO_{2}, and H_{2}) agreeing closely with experiment at all axial locations. Wang *et al.*^{58} used the LES–PDF method with Lagrangian stochastic particles to simulate the transient behavior of Sydney flames B, L, and M when subjected to an inlet velocity pulse. Mean radial profiles generally show good agreement with experiment, although the spreading rate is not well captured for some species. Finally, Franke *et al.*^{36} simulated Sydney flame L using the LES–PDF method with stochastic fields and a reduced mechanism of 16 species. Good agreement is seen at the axial location $x/D=10$, with extinction and the flame spreading rate more poorly predicted further downstream, resulting in significant over-predictions of temperature, CO_{2}, and H_{2}O.

### B. Sydney flame L simulation and results

The simulation was performed using the in-house LES code BOFFIN,^{59} which uses a second-order accurate finite volume method based on a low Mach number, pressure based, variable density formulation. An energy-conserving discretization scheme is used for the momentum equation convection terms, whilst second-order central differencing is used for all other spatial derivatives, apart from the scalar equation convection terms, which use a total variation diminishing (TVD) scheme to avoid unphysical values. The stochastic field equations were solved using fractional steps involving an Euler–Maruyama scheme.^{60} The code is fully parallelized using domain splitting and Message Passing Interface (MPI) routines, while the reaction fractional step is further parallelized to distribute reaction source computations evenly across all processors. Computation of the reaction source terms using DI was with the variable-coefficient ODE solver VODE.
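The Euler–Maruyama update mentioned above has a simple generic form for an SDE $dX = A(X)\,dt + B(X)\,dW$; the sketch below uses placeholder coefficient functions rather than the actual stochastic field equations.

```python
import numpy as np

def euler_maruyama_step(x, A, B, dt, rng):
    """One explicit Euler-Maruyama step for dX = A(X) dt + B(X) dW,
    with the Wiener increment represented dichotomically."""
    dW = rng.choice([-1.0, 1.0]) * np.sqrt(dt)   # +/- sqrt(dt) increment
    return x + A(x) * dt + B(x) * dW
```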

An overall domain of 25 × 10 × 10 cm^{3} was represented using a Cartesian finite volume grid of approximately 3.2 × 10^{6} cells arranged 224 × 120 × 120 and split into 288 domains, with a central processing unit (CPU) assigned to each one. Cell widths were approximately 0.4 mm near the burner exit, with a smooth radial and axial expansion. Eight stochastic fields were used, a number which has previously been found to be adequate^{14} for predicting mean quantities. The time step was fixed at $1\times10^{-6}$ s, which is also the time step for which the MLPs were trained. If a larger or variable time step were required, the MLPs could be called repeatedly to make up the required time. Unsteady inflow profiles were generated using the digital turbulence filter method of Di Mare *et al.*^{61}
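The repeated-call strategy for covering a larger time step can be sketched as follows, assuming the LES step is an integer multiple of the training step:

```python
def advance_chemistry(state, dt, mlp_step, dt_train=1e-6):
    """Cover a time step dt by applying the fixed-step MLP increment
    round(dt / dt_train) times (dt assumed a multiple of dt_train)."""
    for _ in range(int(round(dt / dt_train))):
        state = mlp_step(state)
    return state
```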

The MLPs developed as detailed in Sec. IV C were used to simulate Sydney flame L from the beginning of the simulation, which differs from the approach of Franke *et al.*^{36} where DI was used to obtain a converged solution before switching to ANNs to collect statistics. The ANN simulation was compared with a simulation using conventional DI of the chemical kinetics. Time mean and RMS statistics were collected over eight flow through times, based on the jet bulk velocity, once the flame had reached a statistical steady state. For faster MLP computations, the tanh hidden layer activation function was replaced by a polynomial approximation based on Lambert's continued fraction, given by

The reason for using such high powers of *x* is accuracy: MLPs can be trained using exact tanh activation functions and then, at run time, use the polynomial replacement for increased speed with no significant loss of accuracy.
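One such rational substitute is obtained by truncating Lambert's continued fraction for tanh; the truncation depth below (after the $x^2/7$ level), and hence the exact polynomial, is an assumption of this sketch, as the paper's formula is not reproduced here.

```python
import numpy as np

def tanh_cf(x):
    """Rational tanh approximation from Lambert's continued fraction,
    truncated after the x^2/7 term (truncation depth is an assumption)."""
    x2 = x * x
    return x * (105.0 + 10.0 * x2) / (105.0 + 45.0 * x2 + x2 * x2)
```

On the range of hidden-layer pre-activations typically encountered, such a rational form evaluates faster than library tanh while staying close to it.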

Figures 10 and 11 show the radial mean and RMS plots for temperature, mixture fraction, and major species at three axial locations. Excellent agreement is observed between the DI and ANN simulations for the major species, with very minor discrepancies at the larger axial locations for CO and H_{2}. The agreement with experiments is reasonable although, as in many previous studies, extinction at $x/D=20$ is not captured well, resulting in a significant over prediction of combustion at $x/D=30$. This point is not important regarding the objective of this study, which is to obtain good agreement between DI and ANNs.

Figures 12 and 13 show the radial mean and RMS profiles for several minor species. The benefits of the ANN approach are most apparent when using complex mechanisms with numerous minor species, and Figs. 12 and 13 show that, on the whole, the agreement between the ANN and DI simulations is very good. With the exceptions of H_{2}O_{2} and C_{2}H_{6}, the ANNs successfully reproduce the shape and peak of the radial profiles at all axial locations, with agreement generally decreasing slightly with axial position. If increased accuracy on H_{2}O_{2} and C_{2}H_{6} is desired, larger MLPs can be used on these species specifically; this is another advantage of the approach of using specialized MLPs for each species.

The results in Figs. 10–13 show that using individual networks for species rather than for composition subdomains has allowed a much smaller number of weights to be targeted at species predictions more effectively: this work uses approximately 30 000 weights for a mechanism of 31 species, whereas Franke *et al.*^{36} used approximately 750 000 for a mechanism of 16 species, An *et al.*^{37} used approximately 4 500 000 for a mechanism of 41 species, and Wan *et al.*^{39} used approximately 180 000 for a mechanism of 11 species.

Table IV shows the average time taken to complete the various parts of one step of the Sydney L flame simulation, normalized to the time taken to integrate the reaction source terms of the stochastic fields when using DI. It can be seen that, when using DI, the majority of the simulation time is devoted to reaction source term computations. With the ANNs, the time taken to integrate the reaction source term reduces approximately 18-fold, and the chemical reaction step takes only 12% of the total CPU time, meaning it is no longer the bottleneck in LES–PDF simulations. The results demonstrate the potential of using ANNs for larger mechanisms for more complex fuels such as kerosene, where the time savings will be even larger.

## VI. CONCLUSIONS

This work has presented an ANN tabulation method for large combustion chemistry mechanisms that focuses on generalization rather than specialization. Representative composition space data for ANN training were generated using the canonical combustion problem of igniting/extinguishing one-dimensional laminar flamelets with the full GRI 1.2 mechanism, before being augmented with randomly generated data to produce a hybrid random/flamelet dataset with improved composition space coverage. These data were used to train a selection of ANN architectures using the Levenberg–Marquardt algorithm, with testing taking place on data representative of the desired ANN application. ANN architectures based on clustering the composition space via SOM and training specialized MLPs were compared with the new approach where individual MLPs were trained for each chemical species on the entire set of composition states. The new approach yielded better results, indicating that it attains better generalization over the entire composition space.

The simpler architecture of the single-species MLPs has the added benefit of rendering more sophisticated MLP training methods, such as the Levenberg–Marquardt algorithm, feasible. The method was first validated via simulation of one-dimensional laminar flamelets, and excellent agreement was obtained between the MLPs and DI for all but the very lowest strain rates. Subsequently, the method was applied to the LES–PDF simulation of a non-premixed turbulent CH_{4}/air flame, the Sydney L flame. Excellent agreement between the MLPs and DI was obtained for all of the major species and for most of the minor species as well. The required time for reaction source computation was reduced approximately 18-fold when using the MLPs compared to DI, while the chemical reaction step took only 12% of the total CPU time. While the method was employed here in the context of LES–PDF simulations, it can be used with any other method that requires real-time integration of chemical kinetics such as DNS, unsteady flamelet, CMC, MMC, LEM, TFM, PaSR (OpenFOAM), and laminar flame computation. In all of these methods, by removing the bottleneck caused by the integration of chemistry, the ANNs can render simulations with complex mechanisms feasible on modest computing resources.

## ACKNOWLEDGMENTS

Thomas Readshaw gratefully acknowledges the support by the Engineering and Physical Sciences Research Council (EPSRC) and Rolls-Royce plc in the form of a Doctoral Training Partnership (DTP) and Cooperative Award in Science and Technology (CASE) award. Tianjie Ding gratefully acknowledges the financial support provided by the China Scholarship Council (CSC). The simulations in this work have been conducted on the Imperial College Research Computing Service (http://doi.org/10.14469/hpc/2232).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX: EQUIVALENCE OF MODELED PDF AND STOCHASTIC FIELD EQUATION

The equivalence of the modeled filtered PDF $\bar{P}_{sgs}(\psi;x,t)$ and the stochastic fields can be examined by comparing the conservation equations describing the one-point moments arising from the two approaches. Equivalence requires that the two equations be identical. Here the PDF of a single scalar variable, $\varphi(x,t)$, will be considered with the density assumed constant. The equations describing the *n*th moment of the scalar, $M_n$, are derived first from the PDF using Eq. (6) and then from the stochastic fields using Eqs. (7). The conclusions drawn are valid for an arbitrary number of scalar quantities.

##### 1. The modeled PDF equation

The PDF, $\bar{P}_{sgs}(\psi;x,t)$, is considered as a good generalized function with the following properties:

which implies that in physical space the problem is bounded. The *n*th moment of the scalar $\varphi(x,t)$ is

$$M_n(x,t) = \int_{-\infty}^{+\infty} \psi^n\, \bar{P}_{sgs}(\psi;x,t)\, \mathrm{d}\psi.$$

If, for a constant density fluid, Eq. (6) is multiplied term by term by $\psi^n$ and then integrated between the limits $\pm\infty$, the following equation for the *n*th moment results:

##### 2. The stochastic fields

The increment of the stochastic field $\xi (x,t)$, obtained using the relations between the stochastic process and the PDF, is given by

The stochastic term involves the increment of a Wiener process, which is represented as the product of *η* (a dichotomic random variable taking the values ±1, with zero mean and unit variance) and the square root of the time step; it is thus of $O(dt^{1/2})$. The only constraint on *η* is this moment property. The stochastic process so defined is not differentiable in time, and it is therefore necessary to resort to a time expansion around a time instance, *t*. The derivative of the *n*th moment is calculated assuming an infinite number of stochastic fields, *N*:

It is to be noted that a single realization of the scalar field does not have any physical significance. However, in an averaged formulation, the stochastic fields should share the same one-point moments as those given by the modeled PDF equation.
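The dichotomic Wiener-increment representation used above can be checked numerically: $dW = \eta\sqrt{dt}$ with $\eta = \pm 1$ has zero mean and variance exactly $dt$.

```python
import numpy as np

# Monte Carlo check of the dichotomic increment: mean(dW) ~ 0 and
# mean(dW^2) = dt, matching the first two moments of a Wiener increment.
rng = np.random.default_rng(3)
dt = 1.0e-6
eta = rng.choice([-1.0, 1.0], size=200_000)
dW = eta * np.sqrt(dt)
```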

To calculate $(\xi(x,t+dt))^n$, it is necessary to use a Taylor–Itô expansion^{62} of the function $f(\xi)=(\xi(x,t+dt))^n$. The Itô formula (Martingale representation theorem) in differential form allows the expansion of the function and yields

where *R* is the remainder containing higher order terms. Substituting $f(\xi)=(\xi(x,t+dt))^n$ into Eq. (A6) and neglecting $R \sim O(dt^{3/2})$

where the superscript denotes the power *n*. Once the increment is calculated, $\xi^n(x,t+dt)$ can be obtained from $\xi^n(x,t+dt)=\xi^n(x,t)+d\xi^n$. On substitution into Eq. (A5)

On substitution of $A(\xi )$ and $B(\xi )$ into Eq. (A8), term *I* then becomes

In the stochastic contribution, the quantities *B* and *dW* are uncorrelated so that term II is zero, viz.,

and the equation for the moments is

The moments calculated from Eq. (A4) differ from those calculated directly from the modeled PDF equation, Eq. (6), by the term

The term is negligible if $\mu_{sgs} \gg \mu$ and is zero if there are no subgrid-scale variations, i.e., if the flow is laminar. In the majority of LES computations the former situation will prevail over most of the solution domain, but the flow is quite likely to be laminar or quasi-laminar in small, and sometimes important, regions. In these latter circumstances the direct viscous terms become dominant.

## References
