Computer simulations of high energy density science experiments are computationally challenging, consisting of multiple physics calculations including radiation transport, hydrodynamics, atomic physics, nuclear reactions, laser–plasma interactions, and more. To simulate inertial confinement fusion (ICF) experiments at high fidelity, each of these physics calculations should be as detailed as possible. However, this quickly becomes too computationally expensive even for modern supercomputers, and thus many simplifying assumptions are made to reduce the required computational time. Much of the research has focused on acceleration techniques for the various packages in multiphysics codes. In this work, we explore a novel method for accelerating physics packages via machine learning. The non-local thermodynamic equilibrium (NLTE) package is one of the most expensive calculations in the simulations of indirect drive inertial confinement fusion, taking several tens of percent of the total wall clock time. We explore the use of machine learning to accelerate this package, by essentially replacing the physics calculation with a deep neural network that has been trained to emulate the physics code. We demonstrate the feasibility of this approach on a simple problem and perform a side-by-side comparison of the physics calculation and the neural network inline in an ICF Hohlraum simulation. We show that the neural network achieves a 10× speed up in NLTE computational time while achieving good agreement with the physics code for several quantities of interest.
I. INTRODUCTION
Inertial confinement fusion (ICF) experiments, such as those carried out at the National Ignition Facility, are highly complex and are designed with multiphysics codes that include dozens of coupled physics packages—radiation transport, hydrodynamics, neutronics, magnetic fields, and more. One of the most expensive calculations is the non-local thermodynamic equilibrium (NLTE) opacity, emissivity, and ionization. Due to the plasma conditions reached in ICF experiments, it is necessary to use NLTE models when describing the interactions between the x-rays and the plasma. This is a more involved calculation than the standard approximation of local thermodynamic equilibrium (LTE), in which populations of ions are described by the Maxwell–Boltzmann distribution.
In the NLTE case, the distribution of the ion populations is found by solving the collisional-radiative equations. In the steady-state case and at a fixed electronic density, this model is a linear system whose size is the number of described ion states. Either at LTE or at NLTE, the absorption and emission spectra and the ionization are readily constructed from these populations and the radiative cross sections. For gold, a common Hohlraum material in ICF experiments, this size spans several orders of magnitude depending on the fidelity of the atomic model,1 with the upper limit set only by computational feasibility. The CPU time of one call to the collisional-radiative model scales as N^2 or N^3 in the number of levels N, depending on the linear solver.
The LTE opacities are usually stored in a three-dimensional (frequency, density, and temperature) database. In the NLTE case, the opacity also depends on the radiative spectrum, which makes a precomputed NLTE database very challenging in the general case; its feasibility depends on the number of parameters needed to describe the radiative spectrum.
Therefore, the collisional-radiative model is solved inline in the multiphysics code. It has to be called in each cell of the mesh, at each time step, for every iteration of the solver. Much work has been done to accelerate the NLTE calculation in ICF codes, by the use of parallel computing and by simplifying the description of the atomic model.2 However, these calculations remain computationally taxing, taking from 10% to 90% of the total wall clock time of an ICF simulation.
A faster representation of the collisional-radiative model would enable significant acceleration of ICF simulations. Moreover, with such a fast representation we could, at fixed computing power, improve the physics (atomic description), the numerics (number of calls per cycle), or the parallel computing (low memory footprint, easy to parallelize).
This article is summarized in Fig. 1. One call to the collisional-radiative model can require milliseconds to days of computing time, depending on the size of the atomic model. We demonstrate here the feasibility of using deep neural networks (DNNs) to obtain a fast representation of the collisional-radiative model for Krypton with 1808 atomic levels. Extending this work to higher-fidelity atomic models (green dashed lines in Fig. 1) may allow their use in an ICF simulation. This work is a first step toward improving the trade-off between the accuracy of the physical description and limited CPU resources. The CPU cost of NLTE physics is even more prohibitive in 3D simulations, because they contain many more cells.3–5
Fig. 1. Cost of one call of the collisional-radiative model as a function of the number of atomic levels (solid black line). The DCA and DNN points represent the models used in this article for Krypton. The dashed lines represent possible extensions of this work, which would allow the use of more complex atomic descriptions in ICF simulations. These lines are purely illustrative; further studies will describe how the cost scales with the fidelity of the atomic model.
In this work, we explore the use of machine learning models to accelerate the NLTE calculations in multiphysics ICF simulations. The inline collisional-radiative model takes as input a broad-band radiation spectrum that describes the radiation field, and outputs broad-band spectra that describe the material absorptivity and emissivity. This can be framed as a high-dimensional regression task, in which a machine learning model (e.g., a series of neural networks) is trained to map from the input to the output spectra via supervised learning. The neural network representation of the collisional-radiative model is significantly faster than the collisional-radiative model itself, providing a factor of 10 reduction in the NLTE CPU-cost for the 1D ICF Hohlraum simulation we present here.
Neural networks have long been used in spectroscopy, even before they revolutionized the fields of image recognition, natural language processing, or gaming. Spectra have been used by neural networks for classification purposes, such as predicting the electronic configurations of ions of manganese from narrow-band spectra,6 or to infer scalars, such as the inference of temperature and density of a plasma using the K-shell spectroscopy of Aluminum tracers.7 More recently, narrow-band spectra of molecules have been predicted using the description of molecules by Coulomb matrices.8 We extend these studies in the case of broad-band spectra (either in inputs or outputs).
Obtaining fast representations with machine learning can be done around codes or inside codes. This yields global surrogate models,9,10 multi-scale or multi-fidelity couplings,11,12 or generalizations of numerical methods.13,14 Deep neural networks improve accuracy and enable the use of multi-modal data.15 We confirm and quantify this statement on the high-dimensional regression problem of NLTE opacities with the construction of novel hybrid neural networks.
In this article, we use the radiation hydrodynamics code HYDRA16 and the collisional-radiative package of the code Cretin17 to demonstrate the efficacy of replacing inline Cretin calculations with neural networks for HYDRA ICF simulations. In Sec. II, we describe the collisional-radiative model. In Sec. III, we describe the neural networks used in this study. Finally, in Secs. IV and V, we will describe proof-of-principle simulations showing that neural networks are able to replace the collisional-radiative model with enough accuracy to be used in a radiation hydrodynamics code, with a tenfold reduction in CPU time.
II. THE COLLISIONAL-RADIATIVE MODEL IN CRETIN
We briefly describe here the collisional-radiative model of the NLTE atomic kinetics/radiation transport code Cretin.17 This model, in-lined in HYDRA, is called DCA. It is most often used with an atomic model constructed to be inexpensive yet reasonably accurate.18 Consider a plasma of a single isotope of Krypton (Z = 36) of atomic mass A. It is composed of ions with density n_ion (in cm⁻³) and free electrons with density n_el (in cm⁻³). Each ion is characterized by its charge state, i.e., its number of bound electrons, and by its atomic energy levels. Each atomic level encompasses one or more atomic states, described by the distribution of the bound electrons over possible quantum states. The extent of quantum states included in a given atomic model affects the range of conditions for which that model is suitable, while the lumping of atomic states into levels provides a means of decreasing computational expense at the cost of decreased fidelity. The atomic models commonly used for ICF simulations are generated with a screened-hydrogenic approach. Quantum states are highly averaged, based on superconfigurations, and detailed where necessary (improved treatment of photoexcitation transitions, approximate unresolved transition array (UTA) widths, etc.) so as to contain the phenomena relevant for ICF simulations.
We label each atomic level by a single index s and denote its density N_s. The total ionic density is given by:

n_ion = Σ_s N_s.     (1)
Transitions between two ion states s and s′ occur via several atomic processes, characterized as collisional when induced by a collision with a free electron and as radiative when a photon is absorbed or emitted. The rate of transition R_{s→s′} between states s and s′ (in s⁻¹) depends on the plasma conditions and is calculated by the collisional-radiative model. The populations of all levels then follow by solving the rate equations:19

dN_s/dt = Σ_{s′≠s} ( N_{s′} R_{s′→s} − N_s R_{s→s′} ).
In the following, we will solve the steady-state collisional-radiative model:20

Σ_{s′≠s} ( N_{s′} R_{s′→s} − N_s R_{s→s′} ) = 0  for every level s,     (2)
which is a linear system of size equal to the number of atomic levels.
At the macroscopic scale, at every point in space and time, the plasma is characterized by its mass density ρ (in g/cc) and its electronic temperature Te (in K). The plasma is embedded in a radiation field described by its mean (over solid angles) spectral energy density u_ν as a function of frequency ν (in Hz), equivalent to the phase-space distribution of photons. Charge neutrality of the plasma implies the following electronic density:

n_el = Σ_s Q_s N_s,     (3)
with Q_s being the ionization of the atomic level s. For example, Q_s = 3 means that ions in the level s have 33 bound electrons (Z − Q_s = 36 − 3 = 33).
Rates vary with the electronic density. Therefore, (2) is a linear system only if we fix the electronic density. In hydro-codes, we fix the mass density, and thus the ionic density, and we obtain a nonlinear system: one redundant equation of (2) is replaced by the ionic density equation (1); we solve the linear system at a fixed electronic density; we calculate the new electronic density with (3); and we iterate until convergence.
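To make this iteration concrete, here is a minimal sketch in Python, assuming a hypothetical routine build_rate_matrix(n_el, T_e, radiation_field) that assembles the matrix of rates at a fixed electronic density (the routine name, signature, and array conventions are illustrative, not Cretin's API):

```python
import numpy as np

def solve_steady_state_cr(n_ion, T_e, radiation_field, build_rate_matrix, Q,
                          max_iter=200, tol=1e-8):
    """Sketch of the nonlinear steady-state collisional-radiative solve.

    build_rate_matrix(n_el, T_e, radiation_field) is a hypothetical routine
    returning the (L, L) array A with A[t, s] = rate R_{s->t} (in 1/s) at a
    fixed electronic density; Q[s] is the ionization of level s.
    """
    Q = np.asarray(Q, dtype=float)
    L = len(Q)
    n_el = n_ion * Q.mean()                 # initial guess for the electronic density
    for _ in range(max_iter):
        A = build_rate_matrix(n_el, T_e, radiation_field)
        M = A - np.diag(A.sum(axis=0))      # dN_s/dt = sum_t A[s,t] N_t - N_s sum_t A[t,s]
        M[-1, :] = 1.0                      # replace one redundant equation by Eq. (1)
        b = np.zeros(L)
        b[-1] = n_ion                       # total ionic density constraint
        N = np.linalg.solve(M, b)           # linear solve at fixed n_el, Eq. (2)
        n_el_new = Q @ N                    # charge neutrality, Eq. (3)
        if abs(n_el_new - n_el) <= tol * max(n_el, 1.0):
            return N, n_el_new
        n_el = n_el_new
    return N, n_el
```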
For illustration purposes, we consider here transitions between ion states s and s′ by collisional excitation (resp. photo-ionization). These states differ by the quantum state of only one electron, initially bound and finally bound (resp. free). The upward rates (going to the state of higher energy) are then given by:19

R^c_{s→s′} = n_el ∫_{v0}^{∞} v σ^c_{s→s′}(v) f(v, Te) dv,   R^r_{s→s′} = ∫_{ν0}^{∞} σ^r_{s→s′}(ν) [c u_ν/(hν)] dν.     (4)

The atomic collisional cross section for collisional excitation, σ^c_{s→s′}(v), describes collisions of free electrons at velocity v, exciting the ion from state s to state s′. The free electron requires a minimum speed v0 to provide enough energy for the transition of the bound electron. Free electrons are described by a Maxwellian velocity distribution f(v, T). The atomic radiative cross section for photo-ionization, σ^r_{s→s′}(ν), describes the absorption of photons of frequency ν above the threshold frequency ν0. The inverse rates (from s′ to s) are obtained from the principle of detailed balance, which ensures that upward and downward processes are balanced under conditions of thermodynamic equilibrium.
The construction of these rates is done for every atomic collisional and radiative process that entails transitions, either upward (excitation, ionization, and dielectronic capture) or downward (de-excitation, recombination, and auto-ionization).
Finally, for the given plasma and radiation field conditions (ρ, Te, u_ν) and atomic structure (levels and cross sections), we can solve Eq. (2) and obtain the ion populations (their densities N_s).
The mean ionization:

Z̄ = n_el / n_ion = Σ_s Q_s N_s / Σ_s N_s

is used in the calculations of equations of state, thermal conduction, and laser absorption. Other quantities of interest are the absorptivity κ_ν (in 1/cm) and the emissivity ε_ν (per unit volume, time, frequency, and solid angle). The absorptivity and emissivity characterize the interaction between the plasma and the radiation: a photon of frequency ν travels a mean distance 1/κ_ν without any collision, and a volume of plasma V emits, during the time dt and in the frequency interval dν, the energy ε_ν V dν dt per unit solid angle.
By splitting the radiative upward cross sections21,22 into bound–bound cross sections (for transitions from the bound level s to an upper bound level s′), bound–free cross sections (for ionizations from the bound level s to the continuum), and free–free cross sections (for a free electron decelerated in the ion electric field, emitting free–free radiation), we construct the absorptivity (5) and the emissivity (6) as sums, over the atomic levels, of the populations N_s weighted by the corresponding cross sections.
The statistical weights g_s give the degeneracy of the state s, i.e., the multiplicity of atomic states that are treated identically in the collisional-radiative model. These weights may also depend on the plasma conditions to account for the progressive delocalization of bound states with increasing ion density. Absorption and emission are isotropic here, as they are calculated in the reference frame moving with the plasma.
To conclude this brief description, we summarize the role of the neural network model in Fig. 2. This model will replace the construction of the rates (4), the solution of the steady-state collisional-radiative model (2), and the construction of the spectra (5), (6).
Fig. 2. Calculation of the absorption and emission spectra from the plasma conditions and the radiation field. For a given atomic structure (1808 described levels of Krypton electronic configurations), for a given version of a collisional-radiative code (Cretin), and for a given frequency binning (200 bins for the Cretin test and 40 for the HYDRA test), we aim at constructing deep neural networks that give the same results as the collisional-radiative model.
It is tempting to use our knowledge of the code or of the physics inside Cretin to help our model. For instance, we could ask our neural network to output the populations N_s, or we could help it by providing the free–free part of the absorptivity and emissivity. However, we aim to be as general as possible and to encapsulate Cretin with no intrusive development or Cretin-specific tricks. In the long term, we would like to construct a framework that can encapsulate any NLTE code with any atomic structure.
III. NEURAL NETWORK SURROGATES FOR CRETIN
Deep neural networks (DNN) are commonly used as “surrogate” models, or emulators, of expensive scientific simulations and experiments. In this section, we describe the deep neural network architectures used to emulate the code Cretin for use in the radiation hydrodynamics code, HYDRA. In a standard ICF Hohlraum simulation, HYDRA will call DCA each time step, for each zone in the problem. HYDRA gives as input to DCA the material density, temperature, and radiation field represented by a binned spectrum. DCA then produces for HYDRA absorption and emission spectra to be used.
We describe the neural networks used to map from the inputs of density, temperature, and radiation field to the output absorption and emission spectra. In the following, we do not use a DNN for the ionization, focusing instead on the spectral output, although using a DNN for the ionization has not presented any difficulties. The high dimensionality of the spectral data motivates the use of two types of neural networks: auto-encoders, which determine a low-dimensional representation of high-dimensional data, and feed-forward neural networks used as regression functions to map from low-dimensional inputs to low-dimensional outputs.
A. Neural network architectures
In its simplest form, a neural network is a composition of layers, and a layer is a series of matrix–vector multiplication and addition operations whose outputs are fed through nonlinear "activation" functions:

z′ = σ(w z + b),

where z is the input vector to the layer of interest, w is the weight array, b is the bias, and σ is the activation function. The output of this layer, z′, is the input for the next layer in the neural network until the final layer is reached, in which case the output is the prediction of the network. The number of layers in the neural network and the width of each layer (the number of elements in each vector z) are hyper-parameters that must be tuned for each dataset. The neural network is trained by giving the model examples of (input, output) or (x, y) pairs. The input to the model x goes through the series of transformations given by the expression above to produce a prediction y*. The difference between y* and the true output y is computed in the form of a cost function, often the mean squared error. The weights and biases of the model are then adjusted via an optimization process (such as stochastic gradient descent), by an amount set by the learning rate. The network is trained on many (x, y) pairs for multiple iterations (called epochs) until the cost function reaches a minimum. The number of epochs a neural network is trained for, how many data points it is exposed to between weight adjustments (the batch size), and how large a change can be made to the weights and biases in each step (the learning rate), in addition to the overall shape of the neural network, are hyper-parameters that are empirically tuned to maximize the performance of the model.
In this article, we use a variant of stochastic gradient descent, the Adam optimizer,23 with a learning rate initialized at 0.001. The batch size is fixed at 5% of the training dataset size, and the number of epochs is chosen empirically based on the convergence of the cost function.
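For illustration only, the training procedure just described (Adam optimizer, learning rate 0.001, mean-squared-error cost, batches of 5% of the training set) can be sketched as follows in PyTorch; this is not the code used in this work, and the hidden-layer widths are placeholders:

```python
import torch
from torch import nn

def train_regressor(x_train, y_train, hidden=(64, 64), epochs=1000):
    """Minimal feed-forward regression training loop (illustrative placeholders).
    x_train, y_train: 2-D float tensors of shape (n_samples, n_features)."""
    layers, width = [], x_train.shape[1]
    for h in hidden:                               # stack of z' = sigma(w z + b) layers
        layers += [nn.Linear(width, h), nn.ReLU()]
        width = h
    layers.append(nn.Linear(width, y_train.shape[1]))
    model = nn.Sequential(*layers)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, learning rate 0.001
    loss_fn = nn.MSELoss()                                     # mean squared error cost
    batch_size = max(1, int(0.05 * len(x_train)))              # 5% of the training set

    for _ in range(epochs):
        perm = torch.randperm(len(x_train))
        for i in range(0, len(x_train), batch_size):
            idx = perm[i:i + batch_size]
            optimizer.zero_grad()
            loss = loss_fn(model(x_train[idx]), y_train[idx])
            loss.backward()                        # back-propagate the cost gradient
            optimizer.step()                       # adjust weights and biases
    return model
```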
To ensure the neural networks do not simply memorize their training data, the total dataset of (x,y) pairs is split into training and validation sets. In this work, 90% of the data is used to train the models, and they are validated using the withheld 10% of the dataset to ensure that the models can accurately predict data that were not part of the training set. In this work, we also produce a test dataset that is composed of 30,000 samples not used in the training or validation process.
As previously mentioned, two types of neural networks are used to create the Cretin surrogates: auto-encoders and feed-forward networks constructed using the algorithm Deep Jointly Informed Neural Networks (DJINNs).24 Two separate models are used for absorption and emissivity, but the models share a common set of inputs: temperature, density, and radiation field.
It is important to note that preprocessing of the data has a significant impact on neural network performance for this application. The spectral data span many orders of magnitude, thus we scale each spectral quantity X logarithmically, where X is either the absorption, the emissivity, or the radiation field. Additionally, the inputs and outputs of the feed-forward DJINN-based networks (see Sec. III C) are scaled to [0, 1] as (X − X_min)/(X_max − X_min), where the minimum and maximum are computed on the training data. This scaling ensures all inputs and outputs are treated as equally important for the purposes of minimizing the cost function.
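A minimal sketch of this preprocessing, assuming a base-10 logarithm for the spectral scaling and the usual min-max normalization fitted on the training set (the floor value guarding against zeros is an illustrative choice):

```python
import numpy as np

def scale_spectrum(X, floor=1e-30):
    """Logarithmic scaling of a spectral quantity (absorption, emissivity, or
    radiation field); the floor guarding against zeros is an illustrative choice."""
    return np.log10(np.maximum(X, floor))

class MinMaxScaler:
    """Min-max scaling to [0, 1]; statistics are computed on the training data only."""
    def fit(self, X_train):
        self.lo = X_train.min(axis=0)
        self.hi = X_train.max(axis=0)
        return self

    def transform(self, X):
        return (X - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, Xs):
        return Xs * (self.hi - self.lo) + self.lo
```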
B. Auto-encoders
Auto-encoders are used to compress the spectral data into lower-dimensional representations that are more readily processed by the DJINN-based predictive model. An auto-encoder is essentially a model for nonlinearly compressing data with minimal information loss. Auto-encoders typically have an "hour-glass" architecture, shown in Fig. 3: the input layer is wide, and each layer compresses this information gradually until the architecture reaches its narrowest layer, often called the "latent space." The latent space is a low-dimensional representation of the input space. After the latent space, the data are decompressed, with each layer getting progressively wider until the output layer recovers the full-size vector. The auto-encoder is trained by minimizing the difference between the input data and the output data. If the output layer can reproduce the input layer with minimal error, the latent space can be taken as a good low-dimensional representation of the high-dimensional data. We often refer to the auto-encoder by its two halves: the "encoder" compresses the full-dimensional data into the low-dimensional latent space, and the "decoder" decompresses the latent space to give back the full-dimensional data.
Fig. 3. Illustration of an auto-encoder. In the upper picture, the auto-encoder is trained to reproduce the 10-bin spectrum given as input, with a latent space of dimension 3. In the lower picture, if the training has succeeded, the two halves of the auto-encoder provide an encoder and a decoder for the 10-bin spectra.
The architecture of the auto-encoders used in this work is determined by adjusting the size of the latent space until minimum reconstruction error is achieved. The number of neurons per layer between the input and latent space decays by geometric progression. The models included in this work are trained for 10,000 epochs. The auto-encoders are fully connected with softplus activation functions, and are initialized with Xavier weights and biases of zero.25
Thus, all spectra described in this article may be encoded in a latent space of low dimension, and a decoder may recover the original spectrum from the latent space.
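As an illustration of the auto-encoders described above (hour-glass shape, geometric progression of layer widths, softplus activations, Xavier weights, and zero biases), a minimal PyTorch sketch follows; the helper names and the number of layers are illustrative, not the exact networks of Tables I and III:

```python
import torch
from torch import nn

def hourglass(input_dim=200, latent_dim=5, n_steps=4):
    """Encoder/decoder pair whose layer widths decay geometrically from
    input_dim to latent_dim (illustrative, not the exact architectures of
    Tables I and III)."""
    ratio = (latent_dim / input_dim) ** (1.0 / n_steps)
    widths = [max(latent_dim, int(round(input_dim * ratio ** i)))
              for i in range(n_steps + 1)]
    widths[0], widths[-1] = input_dim, latent_dim

    def stack(sizes):
        layers = []
        for i, (a, b) in enumerate(zip(sizes[:-1], sizes[1:])):
            lin = nn.Linear(a, b)
            nn.init.xavier_uniform_(lin.weight)   # Xavier weights
            nn.init.zeros_(lin.bias)              # zero biases
            layers.append(lin)
            if i < len(sizes) - 2:
                layers.append(nn.Softplus())      # softplus activations
        return nn.Sequential(*layers)

    encoder = stack(widths)                 # e.g., 200 -> 80 -> 32 -> 13 -> 5
    decoder = stack(widths[::-1])           # 5 -> 13 -> 32 -> 80 -> 200
    return encoder, decoder

# Training minimizes the reconstruction error || decoder(encoder(x)) - x ||^2.
```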
C. DJINN models
“Deep jointly informed neural networks,” or DJINNs,24 are used to create networks that map from the inputs of Cretin (encoded on a latent space for spectra), to the outputs of Cretin (encoded on a latent space for spectra). The DJINN algorithm simplifies the neural network training process by choosing an appropriate neural network architecture for the data automatically, without requiring the user to manually tune the architecture. DJINN trains a decision tree-based model of a depth specified by the user on the data, then maps the tree to an initialized neural network architecture. The model is then trained using the Adam optimizer with the learning rate, batch size, and number of epochs set by the user. The DJINN models included in this work are of depth 11 and are trained for 1000 epochs. Each model has about 2 million trainable parameters.
IV. STANDALONE NEURAL NETWORK SURROGATES FOR CRETIN
To illustrate the feasibility of using neural networks to emulate an atomic physics calculation, we first consider a simple Cretin-only problem with a smooth, analytically described radiation field surrounding a Krypton plasma (in Sec. V, we consider a more realistic example with radiation fields produced by the multiphysics code HYDRA). The smooth radiation field is an approximation of a typical field in an ICF Hohlraum; it is a superposition of a Planckian spectrum and of the M-band emission generated in the gold coronal plasma. This field is described by two parameters, the radiation temperature Tr and the M-band ratio α: a reduced Planckian at temperature Tr is combined with a Gaussian of mean 3 keV and a full width at half maximum of 1 keV, with relative weight set by the M-band ratio α.
A dataset is generated by running an ensemble of Cretin simulations under various plasma conditions typical for ICF experiments. The mass density (ρ) ranges from 3 to 100 mg/cc, the electron temperature (Te) from 300 to 3000 eV, the radiation temperature (Tr) from 30 to 300 eV, and the M-band ratio (α) from 0 to 0.3. The data generation process is illustrated in Fig. 4.
Fig. 4. The workflow used to train the neural networks begins by sampling the material density, electron temperature, radiation temperature, and M-band ratio. The code Cretin produces absorption and emission spectra corresponding to these inputs. This process is executed 33,000 times to generate training data for the neural network, taking about 4 CPU-days. The neural network training is comparatively quick, taking 1–2 h on Nvidia Tesla P100 GPUs.
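The data-generation step of Fig. 4 can be sketched as follows; the exact mixture of the reduced Planckian and the 3 keV Gaussian M-band component, the uniform sampling of the parameters, and the photon-energy grid are illustrative assumptions rather than the precise choices used for the article:

```python
import numpy as np

def analytic_field(photon_ev, T_r_ev, alpha):
    """Two-parameter radiation field: a reduced Planckian at temperature T_r
    combined with a Gaussian M-band component (mean 3 keV, FWHM 1 keV) with
    weight alpha. The mixture and normalization here are illustrative."""
    x = np.clip(photon_ev / T_r_ev, 1e-6, 700.0)
    planck = x**3 * np.exp(-x) / (1.0 - np.exp(-x))      # reduced Planckian shape
    planck /= planck.sum()
    sigma = 1000.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM of 1 keV, in eV
    gauss = np.exp(-0.5 * ((photon_ev - 3000.0) / sigma) ** 2)
    gauss /= gauss.sum()
    return (1.0 - alpha) * planck + alpha * gauss

rng = np.random.default_rng(0)
photon_ev = np.logspace(1.0, np.log10(4.0e4), 200)       # illustrative 200-bin grid
dataset = []
for _ in range(33_000):                                   # about 33,000 Cretin runs (Fig. 4)
    rho = rng.uniform(3.0e-3, 0.1)                        # mass density in g/cc (3-100 mg/cc)
    T_e = rng.uniform(300.0, 3000.0)                      # electron temperature in eV
    T_r = rng.uniform(30.0, 300.0)                        # radiation temperature in eV
    alpha = rng.uniform(0.0, 0.3)                         # M-band ratio
    dataset.append((rho, T_e, T_r, alpha, analytic_field(photon_ev, T_r, alpha)))
# Each entry is then passed to Cretin (not shown) to obtain the 200-bin
# absorption and emission spectra used as training targets.
```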
We construct two distinct auto-encoders for the absorption and emission spectra, each given on 200 bins. After multiple trainings, we find that suitable latent space dimensions are 5 and 7, respectively, as shown in Fig. 5 and Table I. Encoding these data on their latent spaces, we can train the DJINN model; DJINN maps from the density, temperature, and the two parameters of the radiation field, (ρ, Te, Tr, α), to the compressed absorption and emission spectra. The compressed spectra are then decompressed by the appropriate decoder networks. The compressed representation of the data reduces the overall size of the DJINN network and thus reduces the amount of training data required to train the model.
Fig. 5. Network architectures for the Cretin-only example. Inputs of dimension 4 are transformed by a DJINN model to a latent space of dimension 5 or 7, for absorption and emission spectra, respectively. The corresponding decoder decompresses the latent spaces to the full 200 spectral bins.
Table I. Network architectures for the Cretin-only example. DNNs are described by the number of active neurons per hidden layer and the total number of parameters. The architecture for emission is nearly identical, but with a latent space of dimension 7.
 | Absorption DNN
---|---
Input dimension | 4
DJINN-based DNN | (6, 9, 15, 26, 49, 100, 197, 389, 766, 1528); 2 106 596 parameters
Latent space dimension | 5
Spectra decoder | (10, 22, 46, 96); 24 898 parameters
Output dimension | 200
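Once trained, the surrogate of Fig. 5 and Table I evaluates as a simple composition of the DJINN regressor and the spectrum decoder; the function and scaler names below are illustrative stand-ins for the trained networks:

```python
import numpy as np

def predict_absorption(rho, T_e, T_r, alpha,
                       djinn_absorption, absorption_decoder,
                       input_scaler, spectrum_unscale):
    """Surrogate evaluation for the Cretin-only case (illustrative names):
    (rho, T_e, T_r, alpha) -> DJINN -> 5-D latent space -> decoder -> 200-bin spectrum."""
    x = input_scaler.transform(np.array([rho, T_e, T_r, alpha]))  # min-max scaled inputs
    latent = djinn_absorption(x)                 # 5-dimensional latent representation
    log_spectrum = absorption_decoder(latent)    # decoded spectrum, still in scaled units
    return spectrum_unscale(log_spectrum)        # back to physical absorption units
```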
The accuracy of the neural network surrogate is compared graphically to Cretin data in Fig. 6 and quantitatively in Fig. 7 and Table II. For these comparisons, we use a test dataset of 30,000 samples, distinct from the training and validation datasets.
Fig. 6. Results of the DNN (crosses) compared to Cretin (circles). Absorption coefficients on top and emissivities below. For each bin, we show the maximum, mean, 30th percentile, and minimum values over the test dataset.
Fig. 7. The spectral distribution of the relative error (plotted because it is difficult to visually distinguish differences between the DNN and Cretin in Fig. 6), for the absorption on the left and for the emissivity on the right. The three curves are the relative errors for the maximum, the 95th percentile, and the mean values (per frequency) of the test dataset. The error made on the maximum is the largest.
Table II. For the Cretin-only example, relative errors on the test dataset for the Planck absorption mean, the Rosseland absorption mean (without scattering), and the integrated emissivity.
 | Mean (%) | Max (%)
---|---|---
Planck absorption | 0.2 | 6.1
Rosseland absorption | 0.2 | 8.8
Integrated emissivity | 0.4 | 15(a)
(a) For the integrated emissivity, we filtered and kept only values above a threshold (in erg/cc/s/ste). We obtain the highest relative errors at the lowest emissivities, which is expected since we used the mean squared error as the cost function; this has little impact on the final simulations, as errors made at low emissivity have little effect. As a consequence, however, the maximum relative error is very sensitive to the cutoff value below which emissivities are excluded from the relative error calculation.
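For reference, the integrated metrics reported in Tables II and IV can be computed from binned spectra using the standard Planck- and Rosseland-mean definitions, as sketched below (the binning and weighting details used in the article may differ):

```python
import numpy as np

def planck_weight(e_ev, T_ev):
    """Planckian weight B(e, T) up to a constant (photon energy e and kT in eV)."""
    x = np.clip(e_ev / T_ev, 1e-6, None)
    return x**3 * np.exp(-x) / (1.0 - np.exp(-x))

def rosseland_weight(e_ev, T_ev):
    """Temperature derivative dB/dT up to a constant, the Rosseland weight."""
    x = np.clip(e_ev / T_ev, 1e-6, None)
    return x**4 * np.exp(-x) / (1.0 - np.exp(-x)) ** 2

def planck_mean(kappa, e_ev, de_ev, T_ev):
    """Planck mean of a binned absorptivity kappa (bin centers e_ev, widths de_ev)."""
    w = planck_weight(e_ev, T_ev) * de_ev
    return np.sum(kappa * w) / np.sum(w)

def rosseland_mean(kappa, e_ev, de_ev, T_ev):
    """Rosseland mean (harmonic, dB/dT-weighted) of a binned absorptivity."""
    w = rosseland_weight(e_ev, T_ev) * de_ev
    return np.sum(w) / np.sum(w / kappa)
```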
There is good agreement between the neural network predicted spectra and those from Cretin simulations. The mean integrated prediction error is 0.2%, and the maximum observed error in the outlier datapoints is less than 9%. The neural network models thus provide an accurate representation of Cretin when considering simple, analytic radiation fields. In Sec. V, we implement more realistic HYDRA-generated radiation fields.
V. HYDRA SIMULATIONS WITH INLINE NEURAL NETWORKS
In Sec. IV, we demonstrated the feasibility of training a neural network to emulate Cretin with a well-sampled input space consisting of a smooth, analytically described radiation field, the electron temperature, and the material density. This dataset provides the neural network with a broad view of the various plasma conditions it might encounter during the validation and testing stages.
We aim to train a neural network suitable for embedding in the radiation hydrodynamics code HYDRA, to illustrate the computational time reduction that replacing Cretin with a neural network could provide. This model is slightly more complicated to train because the radiation field is no longer analytically described—it must be generated by HYDRA. Thus, instead of running tens of thousands of independent Cretin simulations to produce training data, we now run several independent Hohlraum simulations, each of which provides a large number of radiation fields, to produce a wide variety of realistic radiation fields.
To demonstrate the advantages of using neural networks as an approximation to Cretin inline in a HYDRA calculation, we consider a simple ICF simulation: a 1D spherical Krypton Hohlraum filled with helium gas (at a fixed fill density), used to compress a cryogenic deuterium–tritium (DT) capsule with a beryllium ablator doped with copper to reduce x-ray preheat at the ablator–ice interface.26 Although Krypton is not an experimentally relevant material for Hohlraums, it is a mid-Z material that is a reasonable stand-in for the high-Z materials used for standard ICF Hohlraums. Krypton is also used as a tracer in some ICF experiments, so the neural network could be used inline for such calculations in the future (with possible adaptation of the networks to the non-steady-state case). We use a DCA18 description for Krypton, with 1808 atomic levels. For the discretization of frequencies in the absorption and emission spectra, we use 40 bins from 10 eV to 40 keV, arranged such that they capture the K and L edges. HYDRA is run with multi-group diffusion and a flux limiter of 15% for conduction; NLTE is always activated.
The neural network architectures used in the inline calculations for HYDRA are shown in Fig. 8 and detailed in Table III.
Fig. 8. Network architectures for the inline surrogate model in HYDRA. The radiation field of dimension 40 is encoded to a latent space of dimension 2. A DJINN model transforms the mass density, the temperature, and the 2D latent space into a 4D latent space for the absorption or emission, which is then decoded to give the output absorption or emission spectrum of dimension 40.
Table III. Network architectures for the inline surrogate model in HYDRA. DNNs are described by the number of active neurons per hidden layer and the total number of parameters. The architecture is the same for emission and absorption.
Input dimension | 42
---|---
Radiation field encoder | (19, 9, 4); 975 parameters
Latent space dimension | 2
DJINN-based DNN | (6, 9, 15, 26, 49, 100, 197, 389, 766, 1535); 2 111 768 parameters
Latent space dimension | 4
Spectra decoder | (6, 9, 13, 19, 27, 40); 2035 parameters
Output dimension | 40
Instead of the simple radiation field described in Sec. IV, we now have a 40-bin radiation field produced by HYDRA. This field is compressed via the encoder portion of an auto-encoder to a 2-parameter latent space. The two parameters are combined with the material density and temperature and mapped to compressed absorption and emission spectra via a DJINN model. The absorption and emission spectra each have a 4D latent space, which is decoded by their respective auto-encoders to produce full 40-bin spectra. The spectra are used in HYDRA during the next time step of the radiation-hydrodynamics calculation.
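The resulting inline call, which replaces one DCA evaluation per zone and per time step (Fig. 8 and Table III), chains three small networks; the function and scaler names are illustrative stand-ins for the trained models, and the scaling of density and temperature is omitted for brevity:

```python
import numpy as np

def nlte_surrogate_call(rho, T_e, radiation_40bins,
                        field_encoder, djinn_abs, djinn_emis,
                        abs_decoder, emis_decoder, scale, unscale):
    """One surrogate call replacing a DCA evaluation for a single zone
    (illustrative names; scaling of rho and T_e omitted for brevity).

    radiation_40bins : the 40-bin radiation field provided by HYDRA.
    Returns the 40-bin absorption and emission spectra for the next time step."""
    z_rad = field_encoder(scale(radiation_40bins))      # 40 bins -> 2-D latent space
    x = np.concatenate(([rho, T_e], z_rad))             # 4 inputs for the DJINN models
    absorption = unscale(abs_decoder(djinn_abs(x)))     # 4-D latent -> 40-bin spectrum
    emission = unscale(emis_decoder(djinn_emis(x)))     # 4-D latent -> 40-bin spectrum
    return absorption, emission
```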
We first produce a dataset of radiation fields to train the radiation field auto-encoder, and as input for Cretin calculations. To generate a diverse set of radiation fields in HYDRA, we make 10 variations of the laser drive as shown in Fig. 10. Each simulation produces radiation fields every 50 ps, producing 78,000 total fields. The fields are used as inputs to Cretin, along with randomly sampled mass densities (ranging from 2.4 mg/cc to 19 g/cc) and temperatures (3.4 eV to 3.3 keV), to generate a set of 120,000 Cretin calculations. This process is illustrated in Fig. 9. It should be noted that the 78 000 radiation fields produced by the HYDRA simulations are not unique; many of the radiation fields are highly similar and correlated, thus a path to improve the generalizability of the neural network models is to create more radiation fields that span regimes not reached in this particular set of ten simulations.
Fig. 9. Ten HYDRA calculations produce radiation fields every 50 ps, yielding 78,000 radiation fields. These fields, along with randomly sampled densities and temperatures, are used as inputs to Cretin. An ensemble of 120,000 Cretin simulations is run to produce absorption and emission spectra with 40 bins each.
Table IV shows the error in integrated absorption and emissivity metrics for the holdout test dataset for the case of realistic radiation fields. The mean errors are 1%–3% with a maximum error of less than 13%. Including a more diverse set of radiation fields and increasing the size of the training dataset is expected to improve these results further.
Table IV. On the test dataset, relative errors on the Planck absorption mean, the Rosseland absorption mean (without scattering), and the integrated emissivity.
 | Mean (%) | Max (%)
---|---|---
Planck absorption | 1.1 | 3.6
Rosseland absorption | 3.3 | 7.4
Integrated emissivity | 1.3 | 12.7(a)
(a) See Table II.
The trained DNN models are embedded into the HYDRA simulation in place of calls to Cretin. The performance of the DNN is compared to Cretin in HYDRA for the purple laser pulse shown in Fig. 10. The comparisons between HYDRA with Cretin and HYDRA with the DNN are shown in Fig. 11.
Fig. 10. The laser power used, as a function of time (in μs). The solid purple laser drive is used for the test problem to validate the inline DNN in HYDRA. The 10 other laser drives vary around the test laser drive with an amplitude randomly varying between −10% and +10%.
Fig. 11. Results of HYDRA with Cretin (circles) compared to HYDRA with the DNN (crosses). Electron temperatures on the left and radiation temperatures on the right. There are four pairs of curves, one for each time (3, 4, 5, and 6 ns). The relative difference is given for the temperatures in the coronal plasma at peak flux (at 5 ns).
HYDRA run with the DNN model in place of Cretin achieves comparable results, with Hohlraum temperature differences of less than 1%, while providing a 10× speed up in NLTE computational time (the serial HYDRA ICF calculation lasts 454 s with DCA and 65 s with the DNN in-lined through a Python entry point: in this simulation, a 10× speed up of the NLTE package entails a 7× speed up of the global simulation). To test the robustness of the DNN model, ten more HYDRA simulations are run with the entire laser pulse adjusted by a fixed percentage of power (see Fig. 12), unlike the training data, in which the power change was randomly sampled at several time points throughout the pulse. Changing the overall power of the laser pulse is expected to produce different radiation conditions than were seen in the training data, thus testing the ability of the DNN to extrapolate. The results of the extrapolation are shown in Fig. 12; the error in the radiation temperature (Tr) and electron temperature (Te) increases as the change in power from the baseline laser pulse is increased, i.e., as we extrapolate more. However, the error is below 10% even at the extreme ends of extrapolation, and stays within about 2% error for laser drive variations of up to 4% in power. To make the DNN model more accurate for new laser pulses, a more diverse set of radiation fields can be generated for use in the DNN training process.
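As a back-of-the-envelope consistency check on these timings (assuming the non-NLTE work is unchanged between the two runs), Amdahl's law relates the global speed-up to the fraction f of the baseline run spent in the NLTE package and to the package speed-up s:

1 / ((1 − f) + f/s) = 454/65 ≈ 7 with s = 10, which gives f ≈ 0.95,

i.e., roughly 95% of this particular serial 1D calculation is spent in DCA, which is why the 10× package speed-up translates into a nearly 7× global speed-up.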
Fig. 12. Each point is a different simulation, with a laser pulse that is at a constant offset from the test laser pulse (x-axis). For each of these calculations, we give the relative errors of the temperatures (Te in purple and Tr in green) in the bubble at peak flux between the HYDRA calculation with Cretin and the one with the DNN.
In the future, to cope with extrapolation, we will have to define a metric that indicates how far our radiation fields are from the training fields, a large dataset that samples a given manifold in that metric, and a strategy to enrich the training dataset and re-train our networks when new situations are encountered.
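One simple possibility for such a metric, shown here only as an illustrative sketch and not a method used in this work, is the Mahalanobis distance of a new radiation field from the training fields in the encoder latent space:

```python
import numpy as np

class LatentDistanceMonitor:
    """Flags radiation fields far from the training distribution, measured by the
    Mahalanobis distance in the encoder latent space (illustrative sketch only)."""
    def fit(self, encoder, training_fields):
        z = np.array([encoder(f) for f in training_fields])
        cov = np.cov(z, rowvar=False) + 1e-9 * np.eye(z.shape[1])
        self.mean = z.mean(axis=0)
        self.cov_inv = np.linalg.inv(cov)
        return self

    def distance(self, encoder, field):
        d = encoder(field) - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

# Fields with a large distance would be flagged, added to the training dataset,
# and the networks re-trained, following the strategy outlined above.
```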
VI. CONCLUSION
Multiphysics computer simulations for high energy density physics experiments, such as ICF, are prohibitively expensive to run at high fidelity. This work presents a novel method for accelerating one of the most expensive physics calculations in ICF simulations via the use of machine learning. A machine learning model trained to emulate the atomic physics code Cretin is used in place of Cretin in an integrated ICF Hohlraum simulation run with the radiation hydrodynamics code HYDRA. The machine learning model reduces the NLTE computational time of the simulation by a factor of 10, with significant room for further speed up.
This speed up will be all the more important in 3D simulations for ICF,3–5 as the number of cells and thus the number of atomic physics calculation calls dramatically increases. Moreover, this speed up may improve with the use of parallelism, as it is easier to parallelize a DNN evaluation than a collisional-radiative calculation.
This method for accelerating physics calculations also offers a straightforward path to including higher fidelity physics without changing the computational cost: the neural network models used in this work can be trained on data produced with more detailed atomic models. This would increase the cost of creating the training data for the networks, but once trained, the evaluation time of the network will not change significantly; thus ICF simulations can be run with more accurate atomic physics models without added expense.
At a given physics representation and a fixed computing cost, this method may also be used to improve ensemble strategies for ICF design or prediction,9,10 by allowing larger ensembles.
This work demonstrates a proof of principle, and much work is needed to create a robust machine learning replacement of Cretin for ICF simulations. First, the networks will have to generalize to a broad range of radiation fields and will need to be insensitive to the choice of energy binning. The model should also generalize to other materials in an efficient manner; in this work, we examined only Krypton, but we would like to use these models for all materials that require NLTE calculations without needing to train a new set of neural networks from scratch. The DNNs also have to cope with noisy data when Monte Carlo methods are used.27
Finally, we can improve these models by using higher fidelity atomic models to generate training data (the dashed green curves in Fig. 1). This will enable us to use accurate physics that is currently not feasible in Hohlraum simulations. Assessing the impact higher fidelity atomic models have on ICF quantities of interest could provide important insights into the deficiencies of current ICF simulations.
The ideas presented in this work are also not specific to the atomic physics calculation in a radiation hydrodynamics code. One can imagine using the same techniques to accelerate other physics packages or table look ups—such as the equation of state or nuclear cross section information. Often the machine learning models can be much less memory intensive than the data on which they are trained, making them attractive for replacing large data tables. Inline machine learning models have the potential to significantly improve multiphysics simulations; they can reduce computational time and memory requirements while providing the opportunity to include more accurate physics models that enable us to better simulate experiments.
ACKNOWLEDGMENTS
The first author was sponsored by DGA-AID (ERE) of the French government.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Document released as LLNL-JRNL-805050. This document was prepared as an account of the work sponsored by an agency of the United States government. Neither the United States government nor the Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or the Lawrence Livermore National Security, LLC. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States government or the Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.