Computer simulations of high energy density science experiments are computationally challenging, consisting of multiple physics calculations including radiation transport, hydrodynamics, atomic physics, nuclear reactions, laser–plasma interactions, and more. To simulate inertial confinement fusion (ICF) experiments at high fidelity, each of these physics calculations should be as detailed as possible. However, this quickly becomes too computationally expensive even for modern supercomputers, and thus many simplifying assumptions are made to reduce the required computational time. Much of the research has focused on acceleration techniques for the various packages in multiphysics codes. In this work, we explore a novel method for accelerating physics packages via machine learning. The non-local thermodynamic equilibrium (NLTE) package is one of the most expensive calculations in the simulations of indirect drive inertial confinement fusion, taking several tens of percent of the total wall clock time. We explore the use of machine learning to accelerate this package, by essentially replacing the physics calculation with a deep neural network that has been trained to emulate the physics code. We demonstrate the feasibility of this approach on a simple problem and perform a side-by-side comparison of the physics calculation and the neural network inline in an ICF *Hohlraum* simulation. We show that the neural network achieves a 10× speed up in NLTE computational time while achieving good agreement with the physics code for several quantities of interest.

## I. INTRODUCTION

Inertial confinement fusion (ICF) experiments, such as those carried out at the National Ignition Facility, are highly complex and are designed with multiphysics codes that include dozens of coupled physics packages—radiation transport, hydrodynamics, neutronics, magnetic fields, and more. One of the most expensive calculations is the non-local thermodynamic equilibrium (NLTE) opacity, emissivity, and ionization. Due to the plasma conditions reached in ICF experiments, it is necessary to use NLTE models when describing the interactions between the x-rays and the plasma. This is a more involved calculation than the standard approximation of local thermodynamic equilibrium (LTE), in which populations of ions are described by the Maxwell–Boltzmann distribution.

In the NLTE case, the distribution of the ion populations is found by solving the collisional-radiative equations. In the steady-state case and at a fixed electronic density, this model is a linear system whose size is the number of ion states described. At either LTE or NLTE, the absorption and emission spectra and the ionization are readily constructed from these populations and the radiative cross sections. For gold, a common *Hohlraum* material in ICF experiments, this size ranges from $N=10^{2}$ to $N=10^{6}$ depending on the fidelity of the atomic model,^{1} with the upper limit set only by computational feasibility. The CPU time of one call to the collisional-radiative model scales as $N^{2}$ or $N^{3}$, depending on the linear solver.

The LTE opacities are usually stored in a three-dimensional (frequency, density, and temperature) database. In the NLTE case, the opacities also depend on the radiative spectrum, which makes an NLTE database very challenging to build in the general case. Its feasibility depends on the number of parameters needed to describe the radiative spectrum.

Therefore, the collisional-radiative model is solved inline in the multiphysics code. It has to be called in each cell of the mesh, at each time step, for every iteration of the solver. Much work has been done to accelerate the NLTE calculation in ICF codes, by the use of parallel computing and by simplifying the description of the atomic model.^{2} However, these calculations remain computationally taxing, taking from 10% to 90% of the total wall clock time of an ICF simulation.

A faster representation of the collisional-radiative model would enable significant acceleration of ICF simulations. Moreover, with such a fast representation we could, at fixed computing power, improve the physics (atomic description), the numerics (number of calls per cycle), or the parallel computing (low memory, easy to parallelize).

This article is summarized in Fig. 1. A single collisional-radiative calculation can require milliseconds to days of computing time, depending on the size of the atomic model. We demonstrate here the feasibility of using deep neural networks (DNNs) to obtain a fast representation of the collisional-radiative model in the case of Krypton with 1808 atomic levels. Extending this work to higher-fidelity atomic models (green dashed lines) may allow such models to be used in an ICF simulation. This work is a first step toward improving the trade-off between the accuracy of the physical description and limited CPU resources. The CPU cost of NLTE physics is even more prohibitive in 3D simulations, because they contain many more cells.^{3–5}

In this work, we explore the use of machine learning models to accelerate the NLTE calculations in multiphysics ICF simulations. The inline collisional-radiative model takes as input a broad-band radiation spectrum that describes the radiation field, and outputs broad-band spectra that describe the material absorptivity and emissivity. This can be framed as a high-dimensional regression task, in which a machine learning model (e.g., a series of neural networks) is trained to map from the input to the output spectra via supervised learning. The neural network representation of the collisional-radiative model is significantly faster than the collisional-radiative model itself, providing a factor of 10 reduction in the NLTE CPU-cost for the 1D ICF *Hohlraum* simulation we present here.

Neural networks have long been used in spectroscopy, even before they revolutionized the fields of image recognition, natural language processing, or gaming. Spectra have been used by neural networks for classification purposes, such as predicting the electronic configurations of ions of manganese from narrow-band spectra,^{6} or to infer scalars, such as the inference of temperature and density of a plasma using the K-shell spectroscopy of Aluminum tracers.^{7} More recently, narrow-band spectra of molecules have been predicted using the description of molecules by Coulomb matrices.^{8} We extend these studies in the case of broad-band spectra (either in inputs or outputs).

Fast representations can be obtained with machine learning either around codes or inside codes. This can yield global models,^{9,10} multi-scale or multi-fidelity coupling,^{11,12} or generalizations of numerical methods.^{13,14} Deep neural networks improve accuracy and the use of multi-modal data.^{15} We confirm and quantify this statement on the high-dimensional regression problem of NLTE opacities with the construction of novel hybrid neural networks.

In this article, we use the radiation hydrodynamics code *HYDRA*^{16} and the collisional-radiative package of the code *Cretin*^{17} to demonstrate the efficacy of replacing inline *Cretin* calculations with neural networks for *HYDRA* ICF simulations. In Sec. II, we describe the collisional-radiative model. In Sec. III, we describe the neural networks used in this study. Finally, in Secs. IV and V, we will describe proof-of-principle simulations showing that neural networks are able to replace the collisional-radiative model with enough accuracy to be used in a radiation hydrodynamics code, with a tenfold reduction in CPU time.

## II. THE COLLISIONAL-RADIATIVE MODEL IN *CRETIN*

We briefly describe here the collisional-radiative model of the NLTE atomic kinetics/radiation transport code *Cretin*.^{17} When in-lined in *HYDRA*, this model is called DCA. It is most often used with an atomic model constructed to be inexpensive yet reasonably accurate.^{18} Consider a plasma of a single isotope of Krypton ($Z=36$), of atomic mass $A$ in $\mathrm{g\,mol^{-1}}$. It is composed of ions with density $n_{ion}$ in $\mathrm{cm^{-3}}$ and free electrons with density $n_{el}$ in $\mathrm{cm^{-3}}$. Each ion is characterized by its charge state, i.e., the number of bound electrons $Z_s$, and by its atomic energy levels. Each atomic level encompasses one or more atomic states, described by the distribution of the bound electrons over possible quantum states. The extent of quantum states included in a given atomic model affects the range of conditions for which that model would be suitable, while the lumping of atomic states into levels provides a means of decreasing computational expense at the cost of decreased fidelity. The atomic models commonly used for ICF simulations are generated with a screened-hydrogenic approach. Quantum states are highly averaged, based on superconfigurations, and detailed (improved treatment of photoexcitation transitions, approximate unresolved transition array (UTA) widths, etc.) so as to contain relevant phenomena for ICF simulations.

We summarize the atomic level by a single index $s$, and we denote its density $N_s$. The total ionic density is given by:

$$n_{ion} = \sum_s N_s. \qquad (1)$$

Transitions between two ion states $s$ and $s'$ occur via several atomic processes, characterized as collisional when induced by a collision with a free electron and as radiative when a photon is absorbed or emitted. The rate of transition between states $s$ and $s'$, $R_{ss'}$ (in $\mathrm{s^{-1}}$), depends on the plasma conditions and is calculated by the collisional-radiative model. The populations of all levels then follow by solving the rate equations:^{19}

In the following, we will solve the steady-state collisional-radiative model:^{20}

which is a linear system of size equal to the number of atomic levels.
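With $R_{ss'}$ read as the rate from level $s$ to level $s'$, the rate equations and the steady-state system (2) referred to above can be written in the following standard form. This is a sketch under that sign and index convention, which is an assumption here; *Cretin*'s internal formulation may differ in detail.

```latex
\frac{dN_s}{dt} = \sum_{s' \neq s} \left( R_{s's}\,N_{s'} - R_{ss'}\,N_s \right),
\qquad
\sum_{s' \neq s} \left( R_{s's}\,N_{s'} - R_{ss'}\,N_s \right) = 0 \quad (2)
```

The first term counts gains into level $s$ and the second counts losses out of it. Summing the steady-state equations over all $s$ gives zero identically, so one of them is redundant and the system must be closed by a normalization condition.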

At the macroscopic scale, at every point in space and time, the plasma is characterized by its mass density $\rho$ in $\mathrm{g\,cm^{-3}}$ and its electronic temperature $T_e$ in K. The plasma is embedded in a radiation field described by its mean energy (over solid angles) $J(\nu)$ in $\mathrm{erg\,cm^{-2}\,s^{-1}\,Hz^{-1}}$ as a function of frequency $\nu$ (in Hz), equivalent to the phase space distribution of photons. Charge neutrality of the plasma implies the following electronic density:

$$n_{el} = \sum_s Q_s N_s, \qquad (3)$$

with $Q_s = Z - Z_s$ being the ionization of the atomic level $s$. For example, $Q_s = 3$ means that ions in the level $s$ have 33 bound electrons.

Rates $R_{ss'}$ vary with electronic density. Therefore, Eq. (2) is a linear system only if we fix the electronic density. In hydro-codes, we fix the mass density, and thus the ionic density, and we obtain a nonlinear system. It is solved iteratively: one redundant equation of (2) is replaced by the ionic density equation (1); the linear system is solved at a fixed electronic density; a new electronic density is calculated with (3); and the procedure is repeated until convergence.
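The iteration described above can be sketched on a toy problem. Everything in this example is hypothetical: the three-level chain, the dimensionless rate coefficients in `toy_rates`, and the function names are made up for illustration and are not taken from *Cretin*. Only the structure of the loop (linear solve at fixed electronic density, normalization row, update of the electronic density) follows the text.

```python
import numpy as np

def toy_rates(n_el):
    """Hypothetical 3-level rate matrix; R[s, t] is the rate from level s to level t."""
    photo, coll, recomb = 1.0, 0.5, 1.0          # made-up dimensionless coefficients
    R = np.zeros((3, 3))
    for s in range(2):
        R[s, s + 1] = photo + coll * n_el        # upward: photo- plus collisional ionization
        R[s + 1, s] = recomb * n_el              # downward: recombination
    return R

def solve_steady_state(n_ion, Q, rates, tol=1e-10, max_iter=200):
    """Fixed-point iteration on the electronic density around a linear solve."""
    n_el = n_ion                                 # initial guess for the electronic density
    for _ in range(max_iter):
        R = rates(n_el)
        # (A N)_s = sum_t R[t, s] N_t - N_s sum_t R[s, t]  (gains minus losses)
        A = R.T - np.diag(R.sum(axis=1))
        rhs = np.zeros(len(Q))
        A[-1, :] = 1.0                           # replace one redundant equation ...
        rhs[-1] = n_ion                          # ... by sum_s N_s = n_ion
        N = np.linalg.solve(A, rhs)
        n_el_new = float(Q @ N)                  # charge neutrality
        if abs(n_el_new - n_el) <= tol * n_el_new:
            return N, n_el_new
        n_el = n_el_new
    raise RuntimeError("electronic density iteration did not converge")

Q = np.array([0.0, 1.0, 2.0])                    # ionization of each toy level
N, n_el = solve_steady_state(1.0, Q, toy_rates)
```

A production solver would typically add damping or a Newton step when the plain fixed-point update oscillates; the toy rates here were chosen so that the undamped iteration converges.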

For illustration purposes, we consider here transitions between ion states $s$ and $s'$ by collisional excitation (resp. photo-ionization). These states differ by the quantum state of only one electron, which is initially bound and finally bound (resp. free). Then, upward rates (going to the state of higher energy) are given by:^{19}

The atomic collisional cross section for collisional excitation, $\sigma^{coll-exc}_{ss'}(v)$, describes collisions of free electrons at velocity $v$, exciting the ion of state $s$ to state $s'$. The free electron requires a minimum speed ($v_0$) to provide enough energy for the transition of the bound electron. Free electrons are described by a Maxwellian velocity distribution $f(v, T)$. The atomic radiative cross section for photo-ionization, $\sigma^{rad-ion}_{ss'}(\nu)$, describes the absorption of photons of frequency $\nu$. The inverse rates (from $s'$ to $s$) are obtained from the principle of detailed balance, which ensures that upward and downward processes are balanced under conditions of thermodynamic equilibrium.
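In terms of the quantities just defined, upward rates of this kind are commonly written in the textbook form below. This is offered as a sketch rather than as the exact expression used in *Cretin*: it assumes $f(v,T)$ is normalized to unity over speed, that $J(\nu)$ is expressed per steradian, and it introduces the threshold frequency $\nu_0$ and the Planck constant $h$, which are not named in the text.

```latex
R^{coll-exc}_{ss'} = n_{el} \int_{v_0}^{\infty} \sigma^{coll-exc}_{ss'}(v)\, v\, f(v,T)\, dv ,
\qquad
R^{rad-ion}_{ss'} = 4\pi \int_{\nu_0}^{\infty} \sigma^{rad-ion}_{ss'}(\nu)\, \frac{J(\nu)}{h\nu}\, d\nu
```

The collisional rate is proportional to $n_{el}$, which is the origin of the dependence of $R_{ss'}$ on electronic density, while the radiative rate depends on the full spectrum $J(\nu)$, which is why the radiation field must be an input to any surrogate of the model.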

These rates are constructed for every atomic collisional and radiative process that entails transitions, either upward (excitation, ionization, and dielectronic capture) or downward (de-excitation, recombination, and auto-ionization).

Finally, for the given plasma and radiation field conditions $(\rho, T, J(\nu))$, and atomic structures (levels and cross sections), we can solve Eq. (2) and obtain the ion populations (their densities $N_s$).

The mean ionization:

$$\bar{Z} = \frac{n_{el}}{n_{ion}} = \frac{\sum_s Q_s N_s}{\sum_s N_s}$$

is used in the calculations of equations of state, thermal conduction, and laser absorption. Other quantities of interest are the absorptivity $\kappa(\nu)$ in $\mathrm{cm^{-1}}$ and the emissivity $\eta(\nu)$ in erg/cc/s/Hz. The absorptivity and emissivity characterize the interaction between the plasma and the radiation: a photon of frequency $\nu$ travels a mean distance $1/\kappa(\nu)$ without any collisions, and a volume of plasma $V$ will emit during the time $dt$ in the frequency interval $d\nu$ the energy $\eta(\nu)\,V\,dt\,d\nu$.

By splitting the radiative upward cross sections^{21,22} into $\sigma^{R}_{ss'}$ (for transitions from the bound level $s$ to the upper bound level $s'$), $\sigma^{R}_{s}$ (for ionizations from the bound level $s$ to the continuum), and $\sigma^{R}_{brem}$ (a free electron decelerated in the ion electric field, emitting free–free radiation), we have:

and

The statistical weights $g_s$ represent the degeneracy of the state $s$, i.e., the multiplicity of atomic levels that are then treated identically in the collisional-radiative model. These weights may also depend on plasma conditions to account for the progressive delocalization of bound states with increasing ion density. Absorption and emission are here isotropic, as they are calculated in the reference frame moving with the plasma.

To conclude this brief description, we summarize the role of the neural network model in Fig. 2. This model will replace the construction of the rates (4), the solution of the steady-state collisional-radiative model (2), and the construction of the spectra (5), (6).

It is tempting to use our knowledge of the code or of the physics inside *Cretin* to help our model. For instance, we could ask our neural network to obtain populations $N_s$, or we could help our neural network by providing the free–free part of absorptivity and emissivity, but we aim to be as general as possible, and to encapsulate *Cretin* with no intrusive development or *Cretin*-specific tricks. In the long term, we would like to construct a framework that can encapsulate any NLTE code with any atomic structure.

## III. NEURAL NETWORK SURROGATES FOR *CRETIN*

Deep neural networks (DNN) are commonly used as “surrogate” models, or emulators, of expensive scientific simulations and experiments. In this section, we describe the deep neural network architectures used to emulate the code *Cretin* for use in the radiation hydrodynamics code, *HYDRA*. In a standard ICF *Hohlraum* simulation, *HYDRA* will call *DCA* each time step, for each zone in the problem. *HYDRA* gives as input to *DCA* the material density, temperature, and radiation field represented by a binned spectrum. *DCA* then produces for *HYDRA* absorption and emission spectra to be used.

We describe the neural networks used to map from the inputs of density, temperature, and radiation field, into the output absorption and emission spectra. In the following, we do not use a DNN for the ionization, focusing instead on the spectral output, although using a DNN for the ionization has not encountered any difficulties. The high dimension of spectral data benefits from the use of two types of neural networks—auto-encoders, which determine a low-dimensional representation of high-dimensional data, and feed-forward neural networks used as regression functions, to map from low dimension inputs to low dimension outputs.

### A. Neural network architectures

In its simplest form, a neural network is a composition of layers, and a layer is a series of matrix–vector multiplication and addition operations whose outputs are fed through nonlinear “activation” functions:

$$z' = \sigma(w z + b),$$

where $z$ is the input vector to the layer of interest, $w$ is the weight array, $b$ is the bias, and $\sigma$ is the activation function. The output of this layer, $z'$, is the input for the next layer in the neural network until the final layer is reached, in which case the output is $y = z'$. The number of layers in the neural network and the width of each layer (the number of elements in each vector $z'$) are hyper-parameters that must be tuned for each dataset. The neural network is trained by giving the model examples of (input, output) or $(x, y)$ pairs. The input to the model $x$ goes through a series of transformations given by the expression above to produce a prediction $y^{*}$. The difference between $y^{*}$ and the true output $y$ is computed in the form of a cost function, often the mean squared error. The weights and biases of the model are then adjusted via an optimization process (such as stochastic gradient descent), by an amount set by the learning rate. The network is trained on many $(x, y)$ pairs for multiple iterations (called epochs) until the cost function reaches a minimum. The number of epochs for which a neural network is trained, how many data points it is exposed to between weight adjustments (batch size), and how large a change can be made to the weights/biases in each step (learning rate), in addition to the overall shape of the neural network, are hyper-parameters that are empirically tuned to maximize the performance of the model.
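The layer composition and the cost function described above can be illustrated with a few lines of NumPy. This is a minimal sketch: the layer sizes, the random weights, and the choice of softplus as the activation for this illustration are assumptions, and no training step is shown.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x)), used here as the activation sigma."""
    return np.logaddexp(0.0, x)

def forward(x, layers):
    """Apply z' = sigma(w z + b) layer after layer; the last z' is the prediction."""
    z = x
    for w, b in layers:
        z = softplus(w @ z + b)
    return z

def mse(y_pred, y_true):
    """Mean squared error cost function."""
    return float(np.mean((y_pred - y_true) ** 2))

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                             # hypothetical input, hidden, and output widths
layers = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=4)                           # one input sample
y_true = rng.uniform(size=3)                     # its (made-up) target
y_star = forward(x, layers)                      # prediction y*
cost = mse(y_star, y_true)                       # quantity the optimizer would minimize
```

Training consists of repeatedly evaluating `cost` on batches of samples and moving the entries of `layers` against its gradient, with a step size set by the learning rate.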

In this article, we use a variant of the stochastic gradient descent, the Adam optimizer,^{23} with a learning rate initialized at 0.001. The batch size is fixed at 5% of the training dataset size, and the number of epochs is chosen empirically based on the convergence of the cost function.

To ensure the neural networks do not simply memorize their training data, the total dataset of (x,y) pairs is split into training and validation sets. In this work, 90% of the data is used to train the models, and they are validated using the withheld 10% of the dataset to ensure that the models can accurately predict data that were not part of the training set. In this work, we also produce a test dataset that is composed of 30,000 samples not used in the training or validation process.

As previously mentioned, two types of neural networks are used to create the *Cretin* surrogates: auto-encoders and feed-forward networks constructed using the algorithm Deep Jointly Informed Neural Networks (DJINNs).^{24} Two separate models are used for absorption and emissivity, but the models share a common set of inputs: temperature, density, and radiation field.

It is important to note that preprocessing of the data has a significant impact on neural network performance for this application. The spectral data span many orders of magnitude; thus we scale the spectral data according to $\log_{10}(1+X)$, where $X$ is the absorption, the emissivity, or the radiation field. Additionally, the inputs and outputs of the feed-forward DJINN-based networks (see Sec. III C) are scaled as $(X - X_{min})/(X_{max} - X_{min})$, where the minimum and maximum are computed on the training data. This scaling ensures all inputs and outputs are treated as being equally important for the purposes of minimizing the cost function.
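The two scalings can be sketched as follows. The class and function names are hypothetical, the synthetic data are placeholders, and computing the minimum and maximum per feature (column by column) is an assumption; the text only states that they are computed on the training data.

```python
import numpy as np

def log_scale(X):
    """Compress spectra spanning many orders of magnitude: log10(1 + X)."""
    return np.log10(1.0 + X)

def inverse_log_scale(Y):
    """Undo log_scale."""
    return 10.0 ** Y - 1.0

class MinMaxScaler:
    """(X - X_min) / (X_max - X_min), with the extrema taken from the training set only."""
    def fit(self, X_train):
        self.x_min = X_train.min(axis=0)
        self.x_max = X_train.max(axis=0)
        return self

    def transform(self, X):
        return (X - self.x_min) / (self.x_max - self.x_min)

    def inverse_transform(self, X_scaled):
        return X_scaled * (self.x_max - self.x_min) + self.x_min

rng = np.random.default_rng(0)
spectra = 10.0 ** rng.uniform(-3, 12, size=(1000, 200))   # synthetic placeholder spectra
logged = log_scale(spectra)

n_train = int(0.9 * len(logged))                          # 90% / 10% split as in the text
train, valid = logged[:n_train], logged[n_train:]

scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
valid_scaled = scaler.transform(valid)                    # may fall slightly outside [0, 1]
```

Fitting the scaler on the training subset only keeps information about the validation data out of the preprocessing, at the price of validation values that can fall marginally outside the unit interval.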

### B. Auto-encoders

Auto-encoders are used to compress the spectral data into lower dimensional representations that are more readily processed by the DJINN-based predictive model. An auto-encoder is essentially a model for nonlinearly compressing data with minimal information loss. Auto-encoders typically have an “hour-glass” architecture, shown in Fig. 3—the input layer is wide, and each layer compresses this information gradually until the architecture reaches its narrowest layer, which is often called the “latent space.” The latent space is a low-dimensional representation of the input space. After the latent space, the data is decompressed, with each layer getting progressively wider until you get back to the full size vector in the output layer. The auto-encoder is trained by minimizing the difference between the input data and the output data. If the output layer can reproduce the input layer with minimal error, the latent space can then be taken as a good low-dimensional representation of the high-dimensional data. We often refer to the auto-encoder by its two halves: the “encoder” compresses the full-dimensional data into the low-dimensional latent space. The “decoder” decompresses the latent space to give back the full-dimensional data.

The architecture of the auto-encoders used in this work is determined by adjusting the size of the latent space until minimum reconstruction error is achieved. The number of neurons per layer between the input and latent space decays by geometric progression. The models included in this work are trained for 10,000 epochs. The auto-encoders are fully connected with softplus activation functions, and are initialized with Xavier weights and biases of zero.^{25}
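As an illustration of the geometric progression of layer widths, the sketch below builds an untrained hour-glass network in NumPy. The helper names, the number of steps, the mirrored encoder, and the uniform variant of Xavier initialization are assumptions. The step count of 5 was chosen because it reproduces the decoder widths (10, 22, 46, 96) listed in Table I for a latent dimension of 5 and 200 output bins.

```python
import numpy as np

def geometric_widths(n_in, n_out, n_steps):
    """Hidden-layer widths interpolating geometrically from n_in to n_out in n_steps steps."""
    ratio = (n_out / n_in) ** (1.0 / n_steps)
    return [round(n_in * ratio ** k) for k in range(1, n_steps)]

def softplus(x):
    return np.logaddexp(0.0, x)

def build(rng, sizes):
    """Fully connected layers with Xavier (Glorot) uniform weights and zero biases."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        layers.append((rng.uniform(-limit, limit, size=(n_out, n_in)), np.zeros(n_out)))
    return layers

def apply(layers, z):
    for w, b in layers:
        z = softplus(w @ z + b)
    return z

rng = np.random.default_rng(0)
n_bins, n_latent = 200, 5
hidden = geometric_widths(n_latent, n_bins, 5)             # [10, 22, 46, 96]

encoder = build(rng, [n_bins] + hidden[::-1] + [n_latent])  # 200 -> ... -> 5
decoder = build(rng, [n_latent] + hidden + [n_bins])        # 5 -> ... -> 200

spectrum = rng.uniform(0.0, 1.0, size=n_bins)              # placeholder scaled spectrum
latent = apply(encoder, spectrum)
reconstruction = apply(decoder, latent)
reconstruction_error = float(np.mean((reconstruction - spectrum) ** 2))
```

Training would minimize `reconstruction_error` over the dataset; scanning `n_latent` and keeping the smallest value that still gives a low error mirrors the selection procedure described above.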

Thus, all spectra described in this article may be encoded in a latent space of low dimension, and a decoder may recover the original spectrum from the latent space.

### C. DJINN models

“Deep jointly informed neural networks,” or DJINNs,^{24} are used to create networks that map from the inputs of *Cretin* (encoded on a latent space for spectra), to the outputs of *Cretin* (encoded on a latent space for spectra). The DJINN algorithm simplifies the neural network training process by choosing an appropriate neural network architecture for the data automatically, without requiring the user to manually tune the architecture. DJINN trains a decision tree-based model of a depth specified by the user on the data, then maps the tree to an initialized neural network architecture. The model is then trained using the Adam optimizer with the learning rate, batch size, and number of epochs set by the user. The DJINN models included in this work are of depth 11 and are trained for 1000 epochs. Each model has about 2 million trainable parameters.

## IV. STANDALONE NEURAL NETWORK SURROGATES FOR *CRETIN*

To illustrate the feasibility of using neural networks to emulate an atomic physics calculation, we first consider a simple *Cretin*-only problem with a smooth radiation field described analytically surrounding a Krypton plasma (in Sec. V, we consider a more realistic example with radiation fields produced by the multiphysics code *HYDRA*). The smooth radiation field is an approximation of a typical field in an ICF *Hohlraum*; it is a superposition of a Planckian spectrum and M-band emission generated in the gold coronal plasma. This field is described by two parameters, the radiation temperature $T_r$ and the M-band ratio $\alpha$:

where $b(\nu, T_r)$ is the reduced Planckian, and $g(\nu)$ is a Gaussian of mean 3 keV and full width at half maximum 1 keV.
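One way such a two-parameter field could be built numerically is sketched below. The convex mixing $(1-\alpha)\,b + \alpha\,g$, the normalization of both shapes to unit integral, the omission of any absolute scale, and all function names are assumptions made for illustration; they are not the authors' definition. Photon energies and $T_r$ are in eV.

```python
import numpy as np

def reduced_planckian(E, T_r):
    """Planck spectral shape in photon energy E, normalized to unit integral over E."""
    x = E / T_r
    # x^3 / (exp(x) - 1), written so that it does not overflow at large x
    shape = x**3 * np.exp(-x) / (-np.expm1(-x))
    return (15.0 / np.pi**4) * shape / T_r

def m_band_gaussian(E, mean=3000.0, fwhm=1000.0):
    """Unit-area Gaussian; FWHM = 2 sqrt(2 ln 2) sigma."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((E - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def field_shape(E, T_r, alpha):
    """Assumed mixing: a fraction alpha of the energy sits in the M-band."""
    return (1.0 - alpha) * reduced_planckian(E, T_r) + alpha * m_band_gaussian(E)

def integrate(y, x):
    """Trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

E = np.linspace(1.0, 20000.0, 20001)             # photon energy grid in eV
J = field_shape(E, T_r=300.0, alpha=0.3)
total = integrate(J, E)
m_band_fraction = integrate(0.3 * m_band_gaussian(E), E) / total
```

With this normalization, $\alpha$ is directly the fraction of the frequency-integrated field carried by the M-band component, which is one natural reading of "M-band ratio".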

A dataset is generated by running an ensemble of *Cretin* simulations under various plasma conditions typical for ICF experiments. The mass density ($\rho$) ranges from 3 to 100 mg/cc, the electron temperature ($T_e$) from 300 to 3000 eV, the radiation temperature ($T_r$) from 30 to 300 eV, and the M-band ratio ($\alpha$) from 0 to 0.3. The data generation process is illustrated in Fig. 4.

We construct two distinct auto-encoders for absorption ($\kappa_\nu$) and emission ($\eta_\nu$) spectra given on 200 bins. After multiple trainings, we find that suitable latent space dimensions are 5 and 7, respectively, as shown in Fig. 5 and Table I. Encoding these data on their latent space, we can train the DJINN model; DJINN maps between the density, temperature, and the two parameters of the radiation field, ($\rho$, $T_e$, $T_r$, $\alpha$), to the compressed absorption and emission spectra. The compressed spectra are then decompressed by the appropriate decoder networks. The compressed representation of the data reduces the overall size of the DJINN network, and thus reduces the amount of training data required to train the model.

| | Absorption DNN |
|---|---|
| Input dimension | 4 |
| DJINN-based DNN | (6,9,15,26,49,100,197,389,766,1528); 2 106 596 parameters |
| Latent space dimension | 5 |
| Spectra decoder | (10,22,46,96); 24 898 parameters |
| Output dimension | 200 |

The accuracy of the neural network surrogate is compared graphically to *Cretin* data in Fig. 6 and quantitatively in Fig. 7 and Table II. For these comparisons, we use a test dataset of 30,000 samples, distinct from the training and validation datasets.

| | Mean (%) | Max (%) |
|---|---|---|
| Planck absorption | 0.2 | 6.1 |
| Rosseland absorption | 0.2 | 8.8 |
| Integrated emissivity | 0.4 | 15^{a} |

^{a} For the integrated emissivity, we kept only values above $5\times10^{6}$ erg/cc/s/sr. The highest relative errors occur at the lowest emissivities, which is expected because the cost function is the mean squared error; they have no effect on the final simulations, since errors made at low emissivity matter little. As a consequence, however, the maximum relative error is very sensitive to the emissivity threshold used in the relative error calculation.

There is good agreement between the neural network predicted spectra and those from *Cretin* simulations. For the integrated absorption metrics, the mean prediction error is 0.2% and the maximum observed error in the outlier datapoints is less than 9%; for the integrated emissivity, the mean error is 0.4% and the maximum is 15% (Table II). The neural network models thus provide an accurate representation of *Cretin* when considering simple, analytic radiation fields. In Sec. V, we implement more realistic *HYDRA*-generated radiation fields.
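Integrated metrics of the kind reported in Table II can be computed from binned spectra as in the sketch below. It uses the standard definitions of the Planck and Rosseland means; the logarithmic bin edges, the weighting temperature, the power-law reference spectrum, and the uniform 1% bias standing in for a surrogate prediction are all hypothetical choices for illustration.

```python
import numpy as np

def planck_weight(E, T):
    """Shape of B_nu(T) versus photon energy E (same units as T), up to a constant."""
    x = E / T
    return x**3 * np.exp(-x) / (-np.expm1(-x))

def rosseland_weight(E, T):
    """Shape of dB_nu/dT versus photon energy E, up to a constant."""
    x = E / T
    return x**4 * np.exp(-x) / np.expm1(-x) ** 2

def planck_mean(kappa, edges, T):
    """Weighted arithmetic mean of the binned absorptivity."""
    centers, widths = 0.5 * (edges[1:] + edges[:-1]), np.diff(edges)
    w = planck_weight(centers, T) * widths
    return float(np.sum(kappa * w) / np.sum(w))

def rosseland_mean(kappa, edges, T):
    """Weighted harmonic mean of the binned absorptivity."""
    centers, widths = 0.5 * (edges[1:] + edges[:-1]), np.diff(edges)
    w = rosseland_weight(centers, T) * widths
    return float(np.sum(w) / np.sum(w / kappa))

def relative_error_percent(pred, ref):
    return 100.0 * abs(pred - ref) / abs(ref)

edges = np.geomspace(10.0, 40000.0, 201)          # 200 hypothetical bins, in eV
centers = 0.5 * (edges[1:] + edges[:-1])
kappa_ref = 1.0e3 * (centers / 1000.0) ** -3      # placeholder reference absorptivity
kappa_dnn = 1.01 * kappa_ref                      # placeholder prediction with a 1% bias

T = 300.0                                         # weighting temperature in eV (assumed)
err_planck = relative_error_percent(planck_mean(kappa_dnn, edges, T),
                                    planck_mean(kappa_ref, edges, T))
err_rosseland = relative_error_percent(rosseland_mean(kappa_dnn, edges, T),
                                       rosseland_mean(kappa_ref, edges, T))
```

Because the Planck mean is dominated by the most opaque bins and the Rosseland mean by the most transparent ones, the two metrics probe different parts of a predicted spectrum.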

## V. *HYDRA* SIMULATIONS WITH INLINE NEURAL NETWORKS

In Sec. IV, we demonstrated the feasibility of training a neural network to emulate *Cretin* with a well-sampled input space described by a smooth analytically described radiation field, electron temperature, and material density. This dataset provides the neural network with a broad view of the various plasma conditions it might encounter during the validation and testing stages.

We aim to train a neural network suitable for embedding in the radiation hydrodynamics code *HYDRA*, to illustrate the computational time reduction that replacing *Cretin* with a neural network could provide. This model is slightly more complicated to train because the radiation field is no longer analytically described—it must be generated by *HYDRA*. Thus, instead of running tens of thousands of independent *Cretin* simulations to produce training data, we now run several independent *Hohlraum* simulations, each of which provides a large number of radiation fields, to produce a wide variety of realistic radiation fields.

To demonstrate the advantages of using neural networks as an approximation to *Cretin* inline in a *HYDRA* calculation, we consider a simple ICF simulation: a 1D spherical Krypton *Hohlraum* filled with helium gas (at 0.6 mg/cc), that is used to compress a cryogenic deuterium–tritium (DT) capsule with a beryllium ablator doped with copper to reduce x-ray preheat at the ablator-ice interface.^{26} Although Krypton is not an experimentally relevant material for *Hohlraums*, it is a mid-Z material that is a reasonable stand-in for the high-Z materials used for standard ICF *Hohlraums*. Krypton is also used as a tracer in some ICF experiments; thus the neural network could be used inline for such calculations in the future (with possible adaptation of the networks to the non-steady-state case). We use a DCA^{18} description for Krypton, with 1808 atomic levels. For the discretization of frequencies in the absorption and emission spectra, we use 40 bins from 10 eV to 40 keV, arranged such that they capture the K and L edges. *HYDRA* is run with multi-group diffusion with a flux limiter of 15% for conduction; NLTE is always activated.

The neural network architectures used in the inline calculations for *HYDRA* are shown in Fig. 8 and detailed in Table III.

| Input dimension | 42 |
|---|---|
| Radiation field encoder | (19,9,4); 975 parameters |
| Latent space dimension | 2 |
| DJINN-based DNN | (6,9,15,26,49,100,197,389,766,1535); 2 111 768 parameters |
| Latent space dimension | 4 |
| Spectra decoder | (6,9,13,19,27,40); 2035 parameters |
| Output dimension | 40 |

Instead of the simple radiation field described in Sec. IV, we now have a 40-bin radiation field that is produced by *HYDRA*. This field is compressed via the encoder portion of an auto-encoder to a 2-parameter latent space. The two parameters are combined with the material density and temperature, and are mapped to compressed absorption and emission spectra via a DJINN model. The absorption and emission spectra each have a 4D latent space, which is decoded by the decoder portion of their respective auto-encoders to produce full 40-bin spectra. The spectra are used in *HYDRA* during the next time step of the radiation-hydrodynamics calculation.
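The data flow just described can be sketched with untrained stand-in networks. The layer widths are those of Table III, but everything else is an assumption: the regressors are plain dense networks rather than real DJINN models, the weights are random, the inputs are taken to be already scaled to order unity, and the function and variable names are hypothetical.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)

def build(rng, sizes):
    """Dense stand-in network with Xavier uniform weights and zero biases."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        layers.append((rng.uniform(-limit, limit, size=(n_out, n_in)), np.zeros(n_out)))
    return layers

def apply(layers, z):
    for w, b in layers:
        z = softplus(w @ z + b)
    return z

rng = np.random.default_rng(0)
outputs = ("absorption", "emission")
field_encoder = build(rng, [40, 19, 9, 4, 2])
regressor = {k: build(rng, [4, 6, 9, 15, 26, 49, 100, 197, 389, 766, 1535, 4]) for k in outputs}
decoder = {k: build(rng, [4, 6, 9, 13, 19, 27, 40]) for k in outputs}

def surrogate(rho_scaled, T_scaled, J_bins):
    """40-bin field + density + temperature -> 40-bin absorption and emission spectra."""
    field_latent = apply(field_encoder, np.log10(1.0 + J_bins))    # 40 -> 2
    x = np.concatenate(([rho_scaled, T_scaled], field_latent))     # 4 regression inputs
    spectra = {}
    for k in outputs:
        spectrum_latent = apply(regressor[k], x)                   # 4 -> 4
        log_spectrum = apply(decoder[k], spectrum_latent)          # 4 -> 40
        spectra[k] = 10.0 ** log_spectrum - 1.0                    # undo log10(1 + X)
    return spectra

J_bins = rng.uniform(0.0, 1.0, size=40)          # placeholder field in arbitrary units
spectra = surrogate(0.3, 0.5, J_bins)
```

Each call costs a fixed number of small matrix-vector products, independent of the number of atomic levels in the model that generated the training data.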

We first produce a dataset of radiation fields to train the radiation field auto-encoder, and as input for *Cretin* calculations. To generate a diverse set of radiation fields in *HYDRA*, we make 10 variations of the laser drive as shown in Fig. 10. Each simulation produces radiation fields every 50 ps, producing 78,000 total fields. The fields are used as inputs to *Cretin*, along with randomly sampled mass densities (ranging from 2.4 mg/cc to 19 g/cc) and temperatures (3.4 eV to 3.3 keV), to generate a set of 120,000 *Cretin* calculations. This process is illustrated in Fig. 9. It should be noted that the 78,000 radiation fields produced by the *HYDRA* simulations are not unique; many of the radiation fields are highly similar and correlated. A path to improve the generalizability of the neural network models is therefore to create more radiation fields that span regimes not reached in this particular set of ten simulations.

Table IV shows the error in integrated absorption and emissivity metrics for the holdout test dataset for the case of realistic radiation fields. The mean errors are 1%–3% with a maximum error of less than 13%. Including a more diverse set of radiation fields and increasing the size of the training dataset is expected to improve these results further.

The trained DNN models are embedded into the *HYDRA* simulation in place of calls to *Cretin*. The performance of the DNN is compared to *Cretin* in *HYDRA* for the purple laser pulse shown in Fig. 10. The comparisons between *HYDRA* with *Cretin* and with the DNN are shown in Fig. 11.

*HYDRA* run with the DNN model in place of *Cretin* achieves comparable results, with *Hohlraum* temperature differences of less than 1%, while providing a 10× speed up in NLTE computational time. The serial *HYDRA* ICF calculation takes 454 s with DCA and 65 s with the DNN in-lined through a Python entry point; in this simulation, a 10× speed up of the NLTE package thus yields a 7× speed up of the overall simulation. To test the robustness of the DNN model, ten more *HYDRA* simulations are run with the entire laser pulse adjusted by a fixed percentage of power ($\pm 2\%$, $\pm 4\%$, $\pm 6\%$, $\pm 8\%$, $\pm 10\%$), unlike the training data, in which the power change was randomly sampled at several time points throughout the pulse. Changing the overall power of the laser pulse is expected to produce different radiation conditions than were seen in the training data, thus testing the ability of the DNN to extrapolate. The results of the extrapolation are shown in Fig. 12; the errors in the radiation temperature ($T_r$) and electron temperature ($T_e$) increase as the change in power from the baseline laser pulse grows, that is, as we extrapolate further. However, the error is below 10% even at the extreme ends of extrapolation, and stays within about 2% for laser drive variations of up to 4% in power. To make the DNN model more accurate for new laser pulses, a more diverse set of radiation fields can be generated for use in the DNN training process.

In the future, to cope with extrapolation, we will have to define a metric that indicates how far our radiation fields are from the training fields, build a large dataset that samples a given manifold in that metric, and devise a strategy to enrich the training dataset and re-train our networks when new situations are encountered.
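The relation between the package-level and overall speed ups quoted above follows Amdahl's law. The short sketch below reproduces that arithmetic from the two wall-clock times given in the text. It assumes the 10× figure is exact and that every other cost in the run is unchanged, so the implied NLTE share of roughly 95% for this serial calculation is an estimate, not a number reported by the authors; the function name is hypothetical.

```python
def nlte_time_fraction(t_before, t_after, package_speedup):
    """Fraction of t_before spent in the accelerated package (Amdahl's law).

    Solves t_after = t_before * ((1 - f) + f / package_speedup) for f.
    """
    return (1.0 - t_after / t_before) / (1.0 - 1.0 / package_speedup)


t_cretin, t_dnn = 454.0, 65.0        # wall-clock seconds quoted in the text
global_speedup = t_cretin / t_dnn    # overall speed up, about 7
f_nlte = nlte_time_fraction(t_cretin, t_dnn, package_speedup=10.0)

print(f"overall speed up: {global_speedup:.2f}x")
print(f"implied NLTE share of the original run: {f_nlte:.1%}")
```

The estimate is sensitive to rounding in the package speed up: a somewhat larger NLTE speed up would imply a somewhat smaller NLTE share for the same two timings.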

## VI. CONCLUSION

Multiphysics computer simulations for high energy density physics experiments, such as ICF, are prohibitively expensive to run at high fidelity. This work presents a novel method for accelerating one of the most expensive physics calculations in ICF simulations via the use of machine learning. A machine learning model trained to emulate the atomic physics code *Cretin* is used in place of *Cretin* in an integrated ICF *Hohlraum* simulation run with the radiation hydrodynamics code *HYDRA*. The machine learning model reduces the NLTE computational time of the simulation by a factor of 10, with significant room for further speed up.

This speed up will be all the more important in 3D simulations for ICF,^{3–5} as the number of cells, and thus the number of atomic physics calculation calls, increases dramatically. Moreover, this speed up may improve with the use of parallelism, as it is easier to parallelize a DNN evaluation than a collisional-radiative calculation.
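One reason a DNN evaluation parallelizes easily is that every cell goes through the same sequence of matrix products, so all cells can be processed in a single batched call. The following NumPy sketch illustrates this with a small fully connected network; the layer widths, random weights, and cell count are placeholders chosen for illustration and do not describe the networks used in this work.

```python
import numpy as np


def mlp_forward(x, weights, biases):
    """Evaluate a small fully connected ReLU network on a batch of inputs.

    x has shape (n_cells, n_inputs); each row is an independent cell.
    """
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)    # hidden layers with ReLU
    return h @ weights[-1] + biases[-1]   # linear output layer


rng = np.random.default_rng(0)
sizes = [8, 32, 32, 4]                    # hypothetical layer widths
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

cells = rng.normal(size=(10_000, sizes[0]))   # one row of inputs per cell

# All cells in one call: the matrix products vectorize over the batch axis.
batched = mlp_forward(cells, weights, biases)

# Cell-by-cell evaluation gives the same answer, checked on the first 100 cells.
looped = np.stack([mlp_forward(c[None, :], weights, biases)[0] for c in cells[:100]])
assert np.allclose(batched[:100], looped)
```

The batch axis could equally be split across threads, processes, or an accelerator without any communication between cells, since no cell's output depends on another's.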

This method for accelerating physics calculations also offers a straightforward path to including higher fidelity physics without changing the computational cost: the neural network models used in this work can be trained on data produced with more detailed atomic models. This would increase the cost of creating the training data for the networks, but once trained, the evaluation time of the network will not change significantly; thus ICF simulations can be run with more accurate atomic physics models without added expense.

For a given physics representation and a fixed computing cost, this method may also be used to improve ensemble strategies for ICF design or prediction,^{9,10} by making larger ensembles affordable.

This work demonstrates a proof of principle, and much work is needed to create a robust machine learning replacement for *Cretin* in ICF simulations. First, the networks will have to generalize to a broad range of radiation fields, and will need to be insensitive to the choice of energy binning. The model should also generalize to other materials in an efficient manner. In this work, we examined only krypton, but we would like to use these models for all materials that require NLTE calculations without needing to train a new set of neural networks from scratch. The DNNs also have to cope with noisy data when Monte Carlo methods are used.^{27}

Finally, we can improve these models by using higher fidelity atomic models to generate training data (the dashed green curves in Fig. 1). This will enable us to use accurate physics that is currently not feasible in *Hohlraum* simulations. Assessing the impact higher fidelity atomic models have on ICF quantities of interest could provide important insights into the deficiencies of current ICF simulations.

The ideas presented in this work are also not specific to the atomic physics calculation in a radiation hydrodynamics code. One can imagine using the same techniques to accelerate other physics packages or table look ups—such as the equation of state or nuclear cross section information. Often the machine learning models can be much less memory intensive than the data on which they are trained, making them attractive for replacing large data tables. Inline machine learning models have the potential to significantly improve multiphysics simulations; they can reduce computational time and memory requirements while providing the opportunity to include more accurate physics models that enable us to better simulate experiments.

## ACKNOWLEDGMENTS

The first author was sponsored by DGA-AID (ERE) of the French government.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Document released as LLNL-JRNL-805050. This document was prepared as an account of the work sponsored by an agency of the United States government. Neither the United States government nor the Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or the Lawrence Livermore National Security, LLC. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States government or the Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.