Computer models of inertial confinement fusion (ICF) implosions play an essential role in experimental design and interpretation as well as our understanding of fundamental physics under the most extreme conditions that can be reached in the laboratory. Building truly predictive models is a significant challenge, with the potential to greatly accelerate progress to high yield and ignition. One path to more predictive models is to use experimental data to update the underlying physics in a way that can be extrapolated to new experiments and regimes. We describe a statistical framework for the calibration of ICF simulations using data collected at the National Ignition Facility (NIF). We perform Bayesian inferences for a series of laser shots using an approach that is designed to respect the physics simulation as much as possible and then build a second model that links the individual-shot inferences together. We show that this approach is able to match multiple X-ray and neutron diagnostics for a whole series of NIF “BigFoot” shots. Within the context of 2D radiation hydrodynamic simulations, our inference strongly favors a significant reduction in fuel compression over other known degradation mechanisms (namely, hohlraum issues and engineering perturbations). This analysis is expanded using a multifidelity technique to pick fuel-ablator mix from several candidate causes of the degraded fuel compression (including X-ray preheat and shock timing errors). Finally, we use our globally calibrated model to investigate the extra laser drive energy that would be required to overcome the inferred fuel compression issues in NIF BigFoot implosions.

## I. INTRODUCTION

Developing computer models that are predictive, in the sense that they accurately describe experimental data before they are collected, is a ubiquitous problem in science. In inertial confinement fusion (ICF) research,^{1} predictive models are essential for long-term planning of experimental campaigns aimed at pushing current experiments to high energy yield. Unfortunately, current experiments at the National Ignition Facility (NIF) have shown that state-of-the-art radiation-hydrodynamics simulations, while very successful at interpreting existing data,^{2,3} can be quite unreliable when it comes to predicting future experiments.

A traditional approach to this problem is to perform focused experiments and code development to improve specific aspects of the simulation; eventually, the improved model becomes predictive of the experiments. This is exceptionally difficult in ICF owing to the highly complex nature of the simulations themselves and the extreme expense of the experiments. The experiments are also very hard to diagnose, making it difficult to identify specific physics issues in the codes.

In this work, we present a complementary approach that aims to combine experimental data and simulations in a single model that simultaneously matches the available data and respects our best models for the underlying physics. In essence, the aim is to update the simulation model using experimental data in a physically consistent manner. We show that by combining machine learning and Bayesian model calibration, we can develop strong, physics-based hypotheses for the observed performance in NIF implosions. By combining these hypotheses over a whole range of laser shots, we can build a global model that spans the experimental design space. Our model is one of the first statistically calibrated ICF models and can be used to investigate some of the most pressing questions in ICF research.

## II. PREDICTIVE MODELS OF ICF IMPLOSIONS

In this work, we aim to use a process known as “model calibration” to develop a new ICF implosion model that can accurately predict experiments before they are done. In essence, some modification is introduced to the existing computer model to force agreement with the available experimental data. This is a well known problem in statistics; however, the ICF application introduces specific issues which influence the design of our calibration approach. We describe the issues, and our approach to overcome them, in Secs. II A and II B.

### A. Requirements for a reliable calibration

We aim to build a model which accurately describes our knowledge about implosion performance, both in the region of the available NIF data and away from it. For example, we may want to use data collected at low laser energy to predict the neutron yield from experiments done at an increased scale. Our model predictions must be equipped with uncertainties which allow us to assess how confident we are about a particular aspect of the physics. The uncertainties are essential for describing our understanding of current experiments and how it degrades as we extrapolate. In this work, we tackle this problem by statistically calibrating a high-fidelity computer model to the available data, modifying the underlying physics in such a way as to bring the code into agreement with experimental data. By repeating this process for a series of laser shots, we build a picture of the way that the modified physics behaves over the experimental design space.

For our extrapolated predictions to be trustworthy, we have several requirements which influence the design of the model. First and foremost, it is essential that the prediction model obeys the fundamental physics of the problem; while some aspects of the physics driving ICF performance are uncertain or unknown, many others are not (conservation laws, for example). Since it is not clear how to express these laws in terms of experimental observables, an essential part of our model is that the predicted outputs are generated from detailed simulations. Second, it is desirable that the inferred updates to the physics models are interpretable so that they can be communicated to subject matter experts. For example, a simple reduction in the neutron yield to force agreement with experiment not only breaks the fundamental relationship between the yield and ion temperature but also provides no information that could be used to improve the simulation code itself. Finally, it is important that the prediction uncertainties are complete and come from all sources; it is quite possible that there are multiple physics explanations for the observed performance which extrapolate quite differently, and our model needs to be able to support all possibilities.

In order to build our predictive model, we need to combine information from various sources: simulation models, experimental data, and expert opinion (where available). The nature of these information sources introduces further design constraints on our approach. Simulations give us complete information about an approximation to the real system. In ICF research, the associated radiation-hydrodynamic simulations represent decades of work and have been tested against literally thousands of experiments on a variety of platforms, materials, and plasma conditions, and so we consider the approximation to be generally good. It is also worth noting that although simulations are often not predictive, it is usually possible to find a modified simulation that matches certain data after the experiment has been done, which is further evidence that the underlying physics is mostly correct.^{4–6} The experimental data, on the other hand, give us precise but incomplete information about the real system. They are very expensive to collect, and therefore very sparse, and so it is not reasonable to expect a typical experiment (5–10 laser shots) to completely constrain the important physics. Finally, expert opinion and/or priors can be very strong but difficult to write down in a form that can be built into a prediction model. In this situation, it is common to develop several models based on assumptions ranging from optimistic to pessimistic and to then ask the experts to choose. This process requires a relatively flexible model in which the various assumptions are explicit.

The challenges we have described make this problem ideally suited for a Bayesian approach.^{7} At its core, Bayesian analysis provides a rule for combining information from various sources into a single model in a consistent way.^{8} Modern numerical approaches allow for complex interactions between different elements of the model as well as multiple explanations of the observations, both of which are expected to be important in ICF problems. What is needed is a description of the information content (equivalently, the uncertainty) in each source. For the experimental data, this is well known since huge effort is made to properly determine the measurement error. For simulations, the uncertainty can be more difficult to quantify. In this work, we introduce input parameters which modify the underlying physics and use them to describe the simulation uncertainty.

Our approach is based on a probabilistic calibration in the input space of the simulation code. Mathematical details are given in Sec. II B; however, it is worth noting that working in the input space ensures that the predicted outputs are always outputs of the simulation code (with modified physics). This means that the calibrated model automatically obeys the laws of physics included in the code. We take a fully Bayesian approach that explicitly includes uncertainties from all sources.

- 1. Experimental uncertainty: arising from both measurement uncertainties and poor experimental constraints on the physics modifications.
- 2. Extrapolation uncertainty: arising from poor constraints on the variation in modified physics with experimental design parameters and driven by the sparsity of the experimental data.
- 3. Shot-to-shot variation: due to random fluctuations in the experimental system.
- 4. Simulation uncertainty: arising from (known) uncertainties in code parameters or underlying physics.^{9}
- 5. Surrogate uncertainty: arising from the computational expense of our radiation-hydrodynamic simulations.

Each of these contributions is explicitly modeled, allowing our model to provide detailed information about the origin of the prediction uncertainty and how it can be reduced.

### B. Mathematical details

A standard approach to the calibration of simulations to experiments is through a probabilistic discrepancy on the output of the simulation,^{10,11}

$$\mathbf{y}_M(\mathbf{x}) = \mathbf{y}_S(\mathbf{x}) + \delta(\mathbf{x}), \tag{1}$$

where **y** denotes a vector of (potentially nonscalar) experimental observables, **x** is a vector of independent variables (the experimental design parameters varied across a series of laser shots), and subscripts (*S*, *M*) denote the simulation and calibrated model, respectively. The calibrated model is made to match the experimental data by training a probabilistic model for the discrepancy *δ*(**x**).

As discussed, we consider the output space calibration in Eq. (1) to be inappropriate for our problem. Instead, our calibration is based on learning the values of a set of “latent variables” **z** so that

$$\mathbf{y}_M(\mathbf{x}) = \mathbf{y}_S\big(\mathbf{z}(\mathbf{x})\big). \tag{2}$$

The latent variables are introduced to modify the simulation, allowing it to match the experimental data. They could be variations in the initial conditions for the simulation (i.e., the size of an applied hydrodynamic perturbation), or they could modify the underlying physics to describe uncertainties (i.e., an uncertain transport coefficient) or to approximately model known physics deficiencies that are too expensive to simulate directly (i.e., an extra pressure source or energy loss). The important point is that the modification to the physics is known and that the outputs of the calibrated model are still consistent with the physics models which we did not modify. This input space calibration approach has been shown to give effective calibration and improved extrapolation behavior in various applications.^{12,13}
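A toy numerical sketch may help illustrate why input-space calibration preserves the code's physics; the one-line "simulator" and all numbers here are hypothetical, chosen only so the contrast with an output-space discrepancy is visible.

```python
import numpy as np

# Toy "simulator": a yield-like observable with a latent physics knob z.
# Hypothetical stand-in for y_S; it is positive by construction.
def y_S(x, z):
    """Simulated observable at design point x with latent variable z."""
    return x**2 * np.exp(-z)

# Output-space calibration, Eq. (1): y_M = y_S(x, z=0) + delta(x).
# An additive discrepancy can push the prediction below zero,
# violating the built-in positivity of the simulator.
delta = -2.0
x = 1.0
y_output_space = y_S(x, 0.0) + delta      # unphysical: negative "yield"

# Input-space calibration, Eq. (2): choose z so the simulator itself
# reproduces the observation; the prediction stays on the code manifold.
y_obs = 0.5
z_cal = -np.log(y_obs / x**2)             # invert the toy model for z
y_input_space = y_S(x, z_cal)             # matches y_obs and is positive
```

Any subset of physics left unmodified (here, positivity) is automatically respected by the input-space prediction, which is the point of Eq. (2).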

Building our complete predictive model can be split into two stages. The first, which we call “local matching,” determines the latent variables required to match each laser shot in the experimental series independently. This provides samples from the mapping $\mathbf{x} \rightarrow \mathbf{z}$ at the values of **x** used in the experimental series. The second stage, called “global scaling,” stitches these samples together into a probabilistic model that predicts the distribution of latent variables for any **x**. We describe these two stages in Secs. II B 1 and II B 2, respectively.

#### 1. Local matching to a single laser shot

For a given laser shot, for which the vector $\mathbf{x}=\mathbf{x}_E$ is known, the values of the latent variables $\mathbf{z}(\mathbf{x}_E)=\mathbf{z}_E$ can be found using standard methods, either by a brute-force walk through parameter space to find a “good” fit or by a more formal probabilistic approach. We are particularly interested in uncertainties and the possibility of multiple solutions, and so we use Markov chain Monte Carlo (MCMC). The MCMC fit develops probability distributions of the values of the latent variables consistent with the experimental observables using Bayes' theorem,

$$P(\mathbf{z}_E|\mathbf{y}_E) = \frac{P(\mathbf{y}_E|\mathbf{z}_E)\, P(\mathbf{z}_E)}{P(\mathbf{y}_E)}. \tag{3}$$

In the above equation, $P(\mathbf{z}_E|\mathbf{y}_E)$ is the “posterior” distribution we wish to sample, $P(\mathbf{y}_E|\mathbf{z}_E)$ is the “likelihood” of the experimental data given some values of the latent variables, and $P(\mathbf{z}_E)$ is a “prior” distribution which expresses our belief about the values of the latent variables before the experiment was done. Note that we have dropped the dependence of all quantities on $\mathbf{x}_E$ for clarity. The prior is specified before the analysis starts and can be neglected when there are no preferred values of the latent variables. Assuming normally distributed experimental uncertainties, the likelihood can be written as

$$P(\mathbf{y}_E|\mathbf{z}_E) = N\big(\mathbf{y}_E;\, \mathbf{y}_S(\mathbf{z}_E),\, \Sigma_E\big), \tag{4}$$

where $N(\mathbf{t}; \mu, \Sigma)$ is the probability density function (PDF) of a multivariate normal random variable $\mathbf{t}$ with mean $\mu$ and covariance $\Sigma$.

The MCMC inference requires many millions of evaluations of the likelihood, which as written in Eq. (4) would require many millions of evaluations of the simulation model **y**_{S}(**z**_{E}). Even for the simplest simulations, this is infeasible, and so it is standard to train a rapid interpolation model (or “surrogate”), which can be trained on a much smaller number of simulations, and to use it in place of the detailed simulation code. Decoupling the MCMC from the simulation code in this way has the added advantage of allowing multiple inferences to be performed using a single simulation set.

There are many choices for the form of the surrogate model; in this work, we use a deep neural network (DNN) trained using a novel warm-start procedure.^{14} This choice is motivated by several considerations: first, it gives excellent results for our datasets; second, DNNs are known to scale well to large datasets; third, DNNs can be generalized to predict diverse data types (images, spectra, and time series),^{15} all of which are represented in a typical ICF dataset. One disadvantage is that the capacity of DNNs almost always far exceeds the number of training simulations that can be run, making overfitting a serious concern. For this reason, we use Bayesian dropout to equip our trained DNN with prediction uncertainties that reflect the range of possible models that are consistent with the training simulations.^{16} Dropout is a nonparametric approach, meaning that the uncertainty distribution must be built by Monte Carlo sampling. In this work, we draw *N_d* samples from the DNN output $\{\mathbf{y}_{\mathrm{DNN}}^{(i)}\}$ and build a kernel density estimate (KDE) for the prediction distribution.^{17–19} Then, the approximate likelihood becomes

$$P(\mathbf{y}_E|\mathbf{z}_E) \approx \frac{1}{N_d} \sum_{i=1}^{N_d} N\!\big(\mathbf{y}_E;\, \mathbf{y}_{\mathrm{DNN}}^{(i)}(\mathbf{z}_E),\, \Sigma_T\big), \tag{5}$$

where $\Sigma_T = \Sigma_E + \Sigma_{\mathrm{KDE}}$ and $\Sigma_{\mathrm{KDE}}$ is the bandwidth covariance of the KDE model. Usually, Σ_{E} and Σ_{KDE} are diagonal and the total standard deviation is simply the quadrature sum of the experimental error bar and the KDE bandwidth. This approach is convenient since the experimental error bar is usually large compared to the surrogate uncertainty, meaning that a reliable KDE can be built with only a small number of MC samples, *N_d* < 100. The likelihood in Eq. (5) can be evaluated very rapidly.
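The KDE-convolved likelihood can be sketched in a few lines. This is a minimal illustration with made-up observables and synthetic "dropout samples," assuming diagonal covariances and the quadrature-sum total variance described in the text.

```python
import numpy as np

def kde_likelihood(y_exp, y_dnn_samples, sigma_exp, sigma_kde):
    """Approximate likelihood P(y_E | z): a mixture of Gaussians centered
    on the N_d dropout samples, with experimental error bar and KDE
    bandwidth added in quadrature (diagonal covariances assumed)."""
    var_tot = sigma_exp**2 + sigma_kde**2            # quadrature sum
    resid = y_exp[None, :] - y_dnn_samples           # (N_d, N_obs) residuals
    log_norm = -0.5 * np.log(2.0 * np.pi * var_tot)  # per-observable norm
    log_pdfs = log_norm[None, :] - 0.5 * resid**2 / var_tot[None, :]
    # product over observables, mean over the N_d mixture components
    return np.mean(np.exp(log_pdfs.sum(axis=1)))

rng = np.random.default_rng(0)
y_exp = np.array([1.0, 2.0])                          # two toy observables
samples = rng.normal([1.0, 2.0], 0.05, size=(50, 2))  # N_d = 50 "dropout" draws
sigma_exp = np.array([0.1, 0.2])                      # experimental error bars
sigma_kde = np.array([0.05, 0.05])                    # KDE bandwidths

L_near = kde_likelihood(y_exp, samples, sigma_exp, sigma_kde)
L_far = kde_likelihood(y_exp + 5.0, samples, sigma_exp, sigma_kde)
```

Because the mixture is evaluated analytically, each likelihood call costs only an `N_d`-term sum, which is what makes millions of MCMC evaluations affordable.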

At this point, we have developed an approach suitable for the MCMC inversion of the experimental data to give posterior distributions of the latent variables for each laser shot. Since the number of latent variables is severely limited by the requirement that we can sample the latent space using detailed simulations, the MCMC is quite straightforward and can be run many times to investigate the importance of different experimental observables, prior distributions, etc. For a given set of MCMC runs, the posteriors are used to build a global scaling model as described in Sec. II B 2.

#### 2. Global scaling for a series of laser shots

Following the local matching procedure, we have a new dataset relating the experimental design parameters $\{\mathbf{x}_E^{(i)}\}$ to the inferred latent variables $\{\mathbf{z}_E^{(i)}\}$ (recalling that these are random variables with distributions found from the MCMC runs). The global scaling model aims to build a predictive model $P(\mathbf{z}|\mathbf{x})$, trained purely on this dataset.

We have already mentioned that the number of experimental shots in a series is always small (∼10) and comparable to the dimensionality of **x**, *N_x*. As a result, the information available to learn the form of $P(\mathbf{z}|\mathbf{x})$ is very limited. The situation is made worse by the fact that the posterior distributions often have complex shapes and so cannot be parameterized in a simple manner (this fact motivated the use of MCMC in the first place). Approximating the posterior distributions with an exponential-family distribution, for example, misses many important correlations between latent variables, with serious consequences for the accuracy of the scaling model. This is a major challenge for the general applicability of the approach we describe in this manuscript.

In the present work, we can develop a reliable model by inspection of the posterior distributions. As a function of **x**, the inferred distributions move around the latent space, but their detailed shape appears the same. In fact, the distribution moments higher than 1 (the mean) are approximately constant, and so we may build our scaling model by introducing a parametric dependence of the mean on **x** and using a common, nonparametric distribution for the higher moments. That is,

$$P(\mathbf{z}|\mathbf{x}) = C\big(\mathbf{z} - \langle\mathbf{z}\rangle(\mathbf{x})\big), \tag{6}$$

where $\langle\mathbf{z}\rangle$ is the mean value of the latent variables and $C$ is a nonparametric model of the higher order moments formed by combining the MCMC samples (with their means subtracted) for all the shots in the series. This approximation is specific to our application but can be generalized by expanding each posterior distribution into a polynomial chaos expansion^{20,21} and only treating the first *M* terms as dependent on **x**; in effect, we are taking only the first moment so that *M* = 1. The validity of the approximation is easy to check by detailed comparison with the MCMC distributions we wish to approximate.
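The two-part construction (a parametric mean plus pooled, mean-subtracted residuals) can be sketched numerically. The one-dimensional latent variable, the linear mean function, and the gamma-shaped synthetic "MCMC" posteriors below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-shot "MCMC" samples of one latent variable for 5 shots:
# a fixed (skewed) shape whose mean drifts linearly with a design
# variable x, mimicking the behavior assumed in Eq. (6).
x_shots = np.array([0.9, 1.0, 1.1, 1.2, 1.3])
posteriors = [rng.gamma(4.0, 0.25, 2000) + 0.5 * x for x in x_shots]

# Stage 1: fit a parametric mean function <z>(x) = a*x + b to the means.
means = np.array([p.mean() for p in posteriors])
a, b = np.polyfit(x_shots, means, 1)

# Stage 2: pool mean-subtracted samples into the common residual model C.
residuals = np.concatenate([p - p.mean() for p in posteriors])

def sample_z(x, n=1000):
    """Predict latent-variable samples at a new design point x:
    shift the pooled residual distribution by the fitted mean."""
    return (a * x + b) + rng.choice(residuals, size=n)

z_new = sample_z(1.5)   # extrapolated latent-variable distribution
```

The predicted distribution at a new `x` inherits the full (non-Gaussian) shape of the pooled residuals while its location follows the fitted mean function, which is exactly the structure of Eq. (6).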

In the scaling model described by Eq. (6), the prediction from our calibrated model is controlled by the mean function $\langle\mathbf{z}\rangle(\mathbf{x})$, and the correlation model $C$ describes the experimental uncertainty. The form of the mean function is unknown, resulting in potentially large extrapolation error. Here, we describe it using a parametric Bayesian “scaling” model,

$$\langle\mathbf{z}\rangle(\mathbf{x}) = \mu(\mathbf{x}; \alpha) + \epsilon, \qquad \epsilon \sim N(0, \sigma_{SS}^2), \tag{7}$$

in which *μ* is an assumed functional form, *α* are learned parameters, and *σ_{SS}* is the shot-shot variability in the mean. As before, we use MCMC to develop probability distributions on *α* and *σ_{SS}*. The sparsity of the experimental data limits the number of learned parameters in each scaling model, and so, in practice, we try to remove many of the causal relationships between independent and latent variables on physical grounds.

A final note regarding the shot-shot variability, *σ_{SS}*, is in order. This parameter is designed to account for the inherent variability in the NIF experiments; however, it can also serve to catch any residual variability that has not been captured by the particular choice of independent or latent variables used to build our model. If different shots in a given series vary due to some design parameter that we did not include, then this variability will show up as increased shot-shot noise. As such, a very large level of inferred shot-shot noise serves as a red flag for a poor model design, at which point we can go back and attempt to improve the model.

#### 3. Summary of the global scaling model

At this point, we can summarize the training process for our global scaling model (also see Fig. 8 for a simple example):

- 1. Identify:
    - Independent variables: the variables changed between laser shots, which define the inputs to the full scaling model.
    - Latent variables: simulation inputs associated with a set of hypotheses for the performance degradation mechanisms in the experiments.
    - Outputs: the observed quantities in the experiment which will be matched.
- 2. Sample the space of latent variables with as many high-fidelity simulations as possible and train an accurate surrogate model.
- 3. For each laser shot in the experimental series, generate samples from the Bayesian posterior on each simulation input by MCMC.
- 4. Extract the means of each MCMC distribution and train a scaling model to predict the mean as a function of the independent variables.

## III. APPLICATION TO NIF BigFoot DATA

We have applied our Bayesian calibration framework to a series of NIF laser shots known as the BigFoot campaign. In the following, we give a detailed description of the formation of the BigFoot model and describe the physics insights that the training provides. We then demonstrate the extension of our model to further constrain the physics at play in the experiments, before giving a simple example of the application of the global scaling model to investigate paths to improved performance.

### A. Experimental data and simulation ensemble

The experimental dataset used in this study consists of 7 individual indirect-drive BigFoot implosions. The BigFoot platform uses tungsten-doped, high-density carbon (HDC) ablators, and the implosions are designed to be relatively hydrodynamically stable and to have a predictable hohlraum.^{22,23} As a result, it is expected that the effect of low-mode asymmetries is small, making this campaign an interesting first application of our approach. In particular, we expect to get a reasonable description of the experimental observables using 2D simulations without applied drive asymmetries; this fact is crucial in reducing the size of the simulation parameter space.

A projection of the data is shown in Fig. 1. Over the 7 laser shots, at least 5 independent variables were changed, and so the data are very sparse. In addition, some of the independent variables are collinear; for example, there is a strong correlation between laser energy and fill tube diameter, as well as between laser energy and power. It is well known that these collinearities will result in large interpolation and extrapolation errors^{24} in some directions in the design space. These features are typical of ICF datasets and can only be properly handled with a fully Bayesian treatment of uncertainties.

As with all NIF experiments, each laser shot produces a huge number of diagnostic quantities of various types. In this initial study, we identified 7 key observables that we will match in our analysis: the neutron yield, bang time, and burn width; the down-scatter ratio (DSR), which measures the areal density of the stagnated fuel; the total 22 keV X-ray yield; and the mean radius of the X-ray image (called the P0). Simultaneously matching all these diagnostics in a statistical manner is a significant challenge.

A crucial step in the success of our analysis is the generation of physics hypotheses and their meaningful mapping into a set of simulation input parameters. In a previous analysis, Thomas used an artificial preheat of the DT fuel to reproduce the low observed DSR,^{25} which may be due to short-wavelength fuel-ablator mix while the shell is in flight.^{26} Alongside this, we have identified several other candidate physics degradation mechanisms and their associated input parameters:

- Hohlraum physics: parameterized by a varying energy scale and peak power multiplier applied to the frequency-dependent capsule drive.

- Capsule support tent: described by a variable-amplitude surface roughness perturbation applied to the simulated capsule surface.

- Fill tube: described by a Gaussian bump applied to the surface of the simulated capsule with a varying width and amplitude.

- Poor conditioning of the DT ice: modeled by a varying amount of extra internal energy (preheat) added to the ice layer.

- Hotspot contamination: modeled by a varying level of premixed carbon in the DT gas fill.

Alongside these physics degradation mechanisms, we also vary the hydrodynamic scale of the target and the level of the tungsten dopant in the ablator. These are required to capture variations across the experimental dataset and serve as the independent variables of the scaling model we wish to build.

All together, the above parameters define an 8-dimensional space of simulation inputs, which we investigate using the radiation-hydrodynamics code HYDRA.^{27} A nominal simulation is defined by generating a capsule drive from an integrated hohlraum simulation for one of the experimental shots (N170109, laser energy *E_{las}* = 1.1 MJ), which provides a radiation drive for a capsule-only simulation. The simulation input space is then sampled by applying the degradation mechanisms and input variations to the nominal capsule simulation and postprocessing to obtain values of the experimental observables at a range of points in the 8D space. In this work, we generated 70 000 two-dimensional HYDRA simulations, representing 20 million CPU hours and 25 TB of data. These points are then used to train our rapid neural network surrogate.

It is important to appreciate that the approximations in our simulation model, whether due to computational constraints or deficiencies in the simulation itself, will be aliased into the degradation mechanisms we introduce to describe the experimental data or into our model for shot-shot noise. Given the very large number of simulations required in our multivariate analysis, some of these approximations could be severe (this is the motivation for the multifidelity analysis we describe in Sec. III C). In the current work, our two-dimensional HYDRA simulations are able to resolve angular modes from 6 to 24 (128 cells in the polar angle direction), meaning that the effect of high-frequency perturbations due to surface roughness, interface mix, etc. is intentionally aliased into the latent preheat and hotspot contamination parameters. We use the SESAME equation of state^{28} for DT, QEOS^{29} for C, Lee-More conductivities,^{30} diffusive radiation transport, and Monte Carlo neutron transport. Deficiencies in these physics models are difficult to capture in a low-dimensional study and will require focused simulation studies that can feed back into our global analysis. The simulations are very fast and very robust (>97% of simulations finish successfully), which is fundamental to making our approach feasible.

To train the neural network, we use the DJINN decision-tree-based initialization,^{14} in which the network structure and training hyperparameters are largely selected automatically. The DJINN approach uses Bayesian dropout to produce surrogate uncertainties,^{16} which introduces a single hyperparameter *p*, the probability that a given node in the network is removed during training and prediction. We have found that setting *p* too small can result in significant systematic errors in the surrogate prediction in some regions of the input space, while setting *p* too large results in an imprecise surrogate. We tune the value of *p* during training until the systematic errors are sufficiently small that they do not influence the results of our inferences. For the BigFoot simulations, we found that *p* = 0.02 gives excellent generalization performance with quite small systematic errors in the region of the experimental data.
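The role of *p* in Monte Carlo dropout can be sketched with a toy forward pass. The network below is a hypothetical, untrained one-hidden-layer stand-in, not the DJINN surrogate; the layer sizes and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy one-hidden-layer network with random (untrained) weights.
W1 = rng.normal(0.0, 1.0, (8, 4))   # input -> hidden
W2 = rng.normal(0.0, 1.0, (4, 2))   # hidden -> output
p_drop = 0.02                       # dropout probability, as tuned in the text

def predict_once(x):
    """One stochastic forward pass with dropout left ON at prediction time."""
    h = np.maximum(0.0, x @ W1)                  # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop         # drop each node with prob p
    h = h * mask / (1.0 - p_drop)                # inverted-dropout rescaling
    return h @ W2

def predict_dist(x, n_d=100):
    """N_d stochastic passes -> empirical prediction distribution."""
    return np.array([predict_once(x) for _ in range(n_d)])

samples = predict_dist(np.ones(8))               # (N_d, N_outputs)
mean, std = samples.mean(axis=0), samples.std(axis=0)
```

The spread `std` grows with `p_drop`: a tiny *p* collapses the ensemble toward a single deterministic model (risking overconfident systematic error), while a large *p* inflates the spread and blurs the prediction, which is the trade-off described above.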

With the experimental observables defined and the trained surrogate model, we are now ready to build our local matches to all the shots in the dataset.

### B. Detailed analysis of individual laser shots

The first step in our analysis is the development of detailed fits to all shots in the BigFoot dataset, based on the experimental data and simulations described in Sec. III A. Before we proceed, it is important to check that the simulations are able to match the experimental data; if this is not the case, then a new set of physics hypotheses is required. This check can be made by finding the maximum a posteriori (MAP) solution to the inference problem, that is, the best-fit simulation parameters given the observables for each shot. This process is a generalization of standard *χ*^{2} minimization and returns the mode of the full posterior distributions we develop in the rest of this work.
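The MAP check can be illustrated with a toy one-parameter example. The surrogate, observables, and error bars are made up, and a grid search stands in for a proper optimizer.

```python
import numpy as np

# Hypothetical surrogate mapping one latent variable z to two observables.
def surrogate(z):
    """Toy stand-in for the trained DNN surrogate."""
    return np.array([np.exp(-z), 1.0 + 0.5 * z])

y_exp = np.array([0.6, 1.25])       # made-up experimental observables
sigma = np.array([0.05, 0.1])       # made-up experimental error bars

def neg_log_post(z):
    """chi^2 / 2 under flat priors; its minimizer is the MAP point."""
    r = (surrogate(z) - y_exp) / sigma
    return 0.5 * np.sum(r**2)

# Brute-force grid search for the MAP solution (generalized chi^2 fit).
z_grid = np.linspace(0.0, 2.0, 2001)
z_map = z_grid[np.argmin([neg_log_post(z) for z in z_grid])]
```

Because each MAP fit is cheap, the same machinery can be rerun on subsets of the observables, which is the feature-selection check used to flag shots that no simulation can match.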

The MAP approach is a convenient starting point as the solution can be found very rapidly, allowing a feature selection procedure in which we attempt to match various subsets of the 7 observables we discussed previously. This provides protection against the situation in which no simulations are able to match all the observables, either due to missing physics, overly simple postprocessing, or inadequate physics hypotheses. By quantifying the quality of the MAP fit to different combinations of the observed quantities, we can discover which outputs are causing problems for the analysis. For the BigFoot data, this procedure identified two shots which cannot be matched by the simulation model, in the sense that it is not possible to match all 7 observables simultaneously. For those shots, all observables appear to be consistent with the simulations (and each other) except for the neutron yield, which is 2 times lower than the simulated value. The problematic shots are those with a larger fill tube (10 *μ*m vs 5 *μ*m for the others), which has been seen in several experimental studies to introduce a significant yield degradation.^{31–33} In principle, our simulations try to match both fill tube sizes by changing the size of the surrogate Gaussian bump; however, in reality, this simple picture is unable to reproduce the yield degradation when the fill tube is large.

The failure of the fill tube surrogate for certain shots poses two problems for our analysis. The first is that the matches to the lower energy shots are compromised; the MCMC that attempts to match all observables will compromise between them and give a bad fit. This problem can be solved by removing the neutron yield from the target dataset. In that case, the MCMC will explicitly match all experimental observables except the neutron yield, and we can use the experimental neutron yield as an independent check of the physical consistency of the fits to the data. The second problem is related to the global scaling model we aim to build in Sec. III D; since the fits for the larger fill tube are untrustworthy, our model is limited to describing the scaling of the 5 *μ*m fill tube designs. This is a minor problem since the smaller fill tube is now the standard NIF design; nonetheless, it is clear that an experimental shot at low energy with a small fill tube would be very constraining for our global analysis.

It is worth noting that in the current application, the problems matching all the observables are quite simple to work around. However, this is unlikely to be true for all applications of our approach, particularly when fitting ignition-scale designs where significant physics uncertainties could be expected. In that case, it might become necessary to introduce a secondary output-space calibration similar to Eq. (1) to work around the physics deficiencies. Given the arguments made in Sec. II A, this is an undesirable approach, and so we would aim to use a MAP-based feature selection approach to ensure as much of the calibration as possible is done in the input space.

Given the extra understanding from the MAP models, we can perform detailed fits using MCMC. As discussed, our fits are generated by numerically matching six observables: those described in Sec. III A, excluding the neutron yield. There are many possible approaches to the generation of MCMC samples, which vary in their numerical and statistical efficiency. For this work, we have found that a simple Metropolis-Hastings algorithm with a sample-based step size tuner works well^{34–36} and has the advantage that the algorithm hyperparameters can be tuned quite easily to give efficient sampling. This is a significant advantage for complex problems like ours, where the posteriors are not expected to have simple forms. Once tuned, we use our MCMC sampler to generate many millions of samples from the posterior, in 10–50 batches starting from randomly selected initial points. This approach is designed to allow us to resolve multimodal posterior distributions (describing different physical explanations for our data), which are a key feature of our approach. The samples are then thinned by a factor of ∼10 to give a set of well-mixed samples with low autocorrelation.
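A minimal sketch of this sampling strategy, using a toy 1D Gaussian log-posterior in place of the surrogate likelihood (the real posterior lives in the multidimensional physics-input space, and the tuner target of ∼40% acceptance is an illustrative choice):

```python
import math
import random

random.seed(0)

# Toy 1D Gaussian log-posterior standing in for the surrogate likelihood.
def log_post(z):
    return -0.5 * ((z - 1.0) / 0.2) ** 2

def mh_chain(z0, n, step):
    """Simple Metropolis-Hastings random walk; returns samples and acceptance rate."""
    z, lp, accepted, out = z0, log_post(z0), 0, []
    for _ in range(n):
        zp = z + random.gauss(0.0, step)
        lpp = log_post(zp)
        if math.log(random.random()) < lpp - lp:
            z, lp, accepted = zp, lpp, accepted + 1
        out.append(z)
    return out, accepted / n

# Sample-based step-size tuner: short pilot chains nudge the proposal width
# toward a healthy acceptance rate (~40% here, an illustrative target).
step = 1.0
for _ in range(20):
    _, acc = mh_chain(random.uniform(-2.0, 2.0), 500, step)
    step *= 1.1 if acc > 0.4 else 0.9

# Batches from random starting points (to catch multimodality), burn-in
# removed, then thinned by ~10 to reduce autocorrelation.
batches = [mh_chain(random.uniform(-2.0, 2.0), 4000, step)[0][500:] for _ in range(10)]
samples = [z for chain in batches for z in chain[::10]]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
```

The multiple-batch structure is the important piece: with a multimodal posterior, chains started from different points can settle into different modes, and pooling them preserves both explanations in the sample set.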

The MCMC fits provide important information about the quality of the physics hypotheses, as well as interesting avenues for further investigation. Figure 2 shows the difference between the MCMC fits and the experimental data, normalized by the total standard deviation of the fit (found by adding the experimental and surrogate errors in quadrature). Points that lie within 2*σ* are considered good fits; we see excellent agreement with all target observables for all shots. For the smaller fill tube shots (the rightmost 5 points in the figures), we also see excellent agreement with the experimental neutron yield. While the fits are excellent, there are some interesting features in Fig. 2: the predicted bangtime is consistently early, the X-ray image is consistently slightly too small, and the DSR shows an increasing discrepancy as the drive increases. These residual differences may be due to deficiencies in our physics model and could motivate more detailed studies to try to explain them; our statistical approach is designed to allow this (we give an example later in this section).
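The residual metric of Fig. 2 amounts to the following; the observable names and numbers here are illustrative placeholders, not the BigFoot values.

```python
import math

# Residual metric of Fig. 2: (fit - data) normalized by the total standard
# deviation, with experimental and surrogate errors added in quadrature.
def normalized_residual(fit, data, err_exp, err_surr):
    sigma = math.hypot(err_exp, err_surr)  # quadrature sum of the two errors
    return (fit - data) / sigma

observables = [  # (name, fit, data, experimental error, surrogate error)
    ("bangtime (ns)", 7.62, 7.70, 0.05, 0.03),
    ("Tion (keV)", 4.45, 4.30, 0.15, 0.10),
]
for name, fit, data, e_exp, e_surr in observables:
    r = normalized_residual(fit, data, e_exp, e_surr)
    flag = "good" if abs(r) < 2.0 else "poor"  # the 2-sigma criterion
    print(f"{name}: {r:+.2f} sigma ({flag})")
```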

The main purpose of the MCMC fits is to update our understanding of the implosion based on the experimental data, and this process is best understood by looking at the posterior input distributions. The posteriors describe the probability that a simulation with a given set of physics hypotheses matches the experimental data and provide constraints on the true values of physics parameters.

A well-known source of uncertainty in indirect-drive ICF simulations is the response of the hohlraum, which we have characterized through variations in the peak power and energy incident on the capsule across the experimental shots. Since the shots span a range of laser energies, our analysis provides information on the scaling of the hohlraum response; however, the different shots were fired with different capsule radii and hohlraum dimensions, and so a correction is needed to properly compare them. We introduce a simple model for the actual energy incident on the capsule,

*E*_{cap} ∝ *E*_{las} *R*_{cap}^{2}/(*R*_{hohl} *L*_{hohl}), (8)

in which *E*_{las} is the total energy in the laser pulse, *R*_{cap} and *R*_{hohl} are the radii of the capsule and hohlraum, respectively, and *L*_{hohl} is the length of the hohlraum. Equation (8) is derived by assuming that the energy radiated by the hohlraum walls ∼*E*_{las}, and it results in a 9% reduction in the conversion from laser energy to capsule drive energy for the higher energy shots. We compare this model for the “actual” capsule drive energy with the inferred distributions in Fig. 3, with generally good agreement. However, there are some differences, in particular for three shots with a slight residual drive deficit; these shots had a significantly shorter pulse than the others, which is not included in our simple correction. A more detailed analysis than Eq. (8) would be required to investigate these effects.
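As an illustration, the geometric correction can be sketched as below, assuming the capsule's share of the re-emitted drive scales as the capsule-to-wall area ratio; the dimensions used are placeholders, not the actual BigFoot target geometry.

```python
# Assumed scaling: wall re-emission ~ E_las, and the capsule intercepts a
# fraction set by the capsule-to-wall area ratio, so
#   E_cap ~ E_las * R_cap**2 / (R_hohl * L_hohl).
# The dimensions below (cm) are placeholders, not actual BigFoot geometry.
def capsule_drive_energy(e_las, r_cap, r_hohl, l_hohl):
    return e_las * r_cap**2 / (r_hohl * l_hohl)

low = capsule_drive_energy(1.1e6, 0.090, 0.310, 0.560)   # smaller target, lower energy
high = capsule_drive_energy(1.7e6, 0.100, 0.340, 0.640)  # larger target, higher energy

# Conversion efficiency of the high-energy shot relative to the low-energy
# one: the larger hohlraum reduces the capsule's share of the drive even
# as the total laser energy increases.
ratio = (high / 1.7e6) / (low / 1.1e6)
print(f"relative conversion efficiency: {ratio:.3f}")
```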

Given that the inferred hohlraum response can be explained in terms of variations in the capsule and hohlraum geometry, it is clear that capsule degradation mechanisms are required to describe the observed experimental performance. We show the inferred capsule degradations for a single shot (N171015, 1.7 MJ) in Fig. 4. It is striking that the inferred degradation mechanisms are all small with the exception of the preheat. In particular, the probability of a low level of preheat is zero, meaning that the simulation model cannot describe the data without a significant amount of preheat. This result is consistent across all shots in the series and is driven by the combination of high temperature and low DSR in the experimental data.

The MCMC fits have provided significant information about the performance degradation mechanisms at play in the BigFoot experiments. The analysis does not require modifications to the large-scale hohlraum response and strongly prefers a significant degradation of fuel compression over the effect of engineering features. This fuel compression issue is described through the latent preheat variable, and the present analysis does not provide information on the likely cause of that preheat. If we can extend our approach to explain the origin of the preheat, then we will have a more physically meaningful result. Our Bayesian latent variable approach allows us to do this, as we describe in Sec. III C.

### C. A physics-based explanation of preheat

Our Bayesian latent-variable approach allows for a multifidelity analysis in which a new set of simulation input variables is introduced to explain the latent variables used in the first analysis. We will demonstrate this approach by introducing a new set of physics hypotheses that could provide a physical explanation of the large levels of preheat (∼10% of the total energy in the hotspot^{37}) inferred in Sec. III B.

We start with the MAP fit to a particular shot (N170109, 1.1 MJ) from the previous study, that is, a simulation that matches all experimental observables (besides the neutron yield in this case, since N170109 used a larger 10 *μ*m fill tube) by introducing 250–300 J of preheat into the cold fuel at peak kinetic energy. This MAP simulation is the basis of a new simulation study in which all of the previous degradation mechanisms are fixed at their best-fit values and a new set of physics hypotheses is introduced, with the aim of explaining the observed preheat. We consider three new mechanisms: changes in the hard X-ray (*hν* > 1.8 keV) component of the drive spectrum, which can penetrate the ablator and heat the ice; changes in the rise time of the main drive pulse, which can change the shock timing; and a simple model for fuel-ablator mix in which a constant fraction of the distance to the fall line is mixed.^{38} A similar mix model has been shown to be consistent with beryllium-ablator implosion data.^{39} Alongside these new processes, we also allow unphysical preheat and varying peak drive power in the same manner as before. At this point, the analysis proceeds as before: sample the new space of physics parameters using high fidelity simulations, train a DNN surrogate, and generate a fit to the experiment using MCMC.

Correlations in the MCMC posterior distributions indicate that the correlated variables have explanatory power over the same features of the data. We therefore plot two-dimensional probability distributions, showing the joint probability of each of the new physics degradations with the preheat, in Fig. 5. It is quite clear that the level of fuel-interface mix (increasing values of the “fracl” parameter on the vertical axis correspond to increasing mix) is strongly correlated with the preheat. In fact, fuel-ablator mix can completely explain the level of preheat, since a simulation run with fracl ∼0.075 does not require any preheat to match the data (in fact, the match is slightly improved). None of the other physics processes has this property; in some cases, the parameters are inferred to be different from their nominal values, but they do not correlate with the nonphysical preheat. The result of this analysis is quite clear: of the physics processes we have considered, only fuel-ablator mix can explain the observed degradation in fuel compression.
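The correlation diagnostic itself is simple to sketch. The samples below are synthetic, not the N170109 posterior, but they show how a trade-off between a mix fraction and the preheat appears as a strong (here negative) correlation.

```python
import math
import random

random.seed(1)

# Synthetic joint posterior: preheat (J) and a mix fraction "fracl" that
# trades off against it (more mix -> less preheat needed). Purely
# illustrative; not the real N170109 inference.
preheat = [random.gauss(275.0, 25.0) for _ in range(5000)]
fracl = [0.075 * (1.0 - p / 275.0) + random.gauss(0.0, 0.005) for p in preheat]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

r = pearson(fracl, preheat)  # strongly negative: mix can stand in for preheat
print(f"fracl-preheat correlation: {r:+.2f}")
```

A parameter that is merely shifted from its nominal value, but uncorrelated with the preheat, would give r near zero and carry no explanatory power for the compression issue.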

At the end of our extended analysis, we have developed two candidate models that explain the observed performance. The first is the original “preheat” model, where a significant amount of preheat is invoked. The second “mix” model uses a fuel-ablator mix model to remove the requirement for nonphysical preheat entirely. These two explanations are almost identical as far as the experimental data are concerned, but we tend to favor the second since it is more physically satisfying. It is interesting to investigate how these two models are different, and since our model is based entirely on input-space calibration, we have very detailed information in the form of numerical simulations.

Figures 6 and 7 show detailed temperature and density profiles from 2D HYDRA simulations of the two models for shot N170109. Our multifidelity analysis approach has significantly updated our understanding of the implosion, with important implications for the expected scaling and robustness of the BigFoot design.

### D. A global model of BigFoot implosion performance

The final piece of our analysis is the synthesis of the global fits from Sec. III C into a single, global model which can be used to make predictions away from the experimental data. In this section, we will describe the training of this model and then give a simple example of its application to compare potential avenues for increased yield.

As described in Sec. II B 2, the core of the global model is a set of parametric scaling models which relate the independent variables in the experiment **x** to distributions of simulation inputs **z**. The set of independent variables considered needs to capture the most important variations in the experimental dataset; for the current analysis, we use the laser energy *E*_{las}, the laser peak power *P*_{las}, the atomic fraction of the W dopant *f*_{dp}, and the outer diameter of the fill tube Φ_{FT}. These are used as predictors for the mean values of the latent variables (simulation inputs) according to Eq. (7), which in turn describe the full simulation input distributions through Eq. (6).

It is important to limit the number of trained parameters in the scaling models since the number of data points is small. This can be done by limiting the functional dependencies on independent variables based on physical arguments and by choosing simple forms for the scaling functions *μ*(**x**). Of course, we also desire a good fit to the data, and so the limitations should not be too severe. To reach a compromise, we employ an approach similar to the one described before, running relatively fast MAP fits for various options and inspecting the fits until an acceptable set of scaling models is found. It might seem like this process is laborious; however, it is an essential step in ensuring that the scaling models are physically reasonable, which in turn is very important in ensuring accurate extrapolations from the model. In this work, MAP models are generated for various sets of candidate inputs $\hat{\mathbf{x}}$ to each scaling model, as well as for two choices of the functional form of the scaling function *μ*($\hat{\mathbf{x}}$).
The results of the MAP study are shown in Table I. The inferred relationships contain valuable physics information which could be used to understand the behavior of the performance degradations or deficiencies in the simulation code.

| Simulation input **z** | Independent variables **x** |
|---|---|
| Hydrodynamic scale | *E*_{las} |
| Peak drive power | *S* |
| Fill tube amplitude | *E*_{las}, Φ_{FT} |
| Fill tube FWHM | None |
| Log (hotspot contamination) | *P*_{las}, *f*_{dp}, Φ_{FT} |
| Tent roughness | *E*_{las}, Φ_{FT} |
| Preheat | *E*_{las}, *P*_{las}, *f*_{dp} |

Given the assumed functional forms and dependencies on independent variables, we are ready to train the scaling model. We show an example training workflow in Fig. 8. The mean inputs for each shot are extracted from the MCMC and used to train the scaling models. The remaining, possibly complex, shape of the MCMC posteriors is approximated through the correlation model $C$ which is formed by concatenating the mean-removed MCMC samples from all shots into a single set. The assumption is that the MCMC posteriors have a common shape across all shots, with the variations between them entirely captured by the variation in the mean. This assumption is checked by comparing the combined model with the shapes of the individual distributions as shown in the lower middle panel of Fig. 8; for the current application, the shapes match very closely.
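The training workflow can be sketched as follows, with synthetic per-shot posteriors standing in for the MCMC output and a single latent variable (preheat) assumed, for illustration, to scale linearly with laser energy.

```python
import random

random.seed(2)

# Synthetic per-shot MCMC posteriors for one latent variable (preheat, J),
# with the true mean assumed linear in laser energy. All numbers are
# illustrative, not the BigFoot inferences.
shots = {1.1e6: 250.0, 1.4e6: 310.0, 1.7e6: 370.0}  # E_las (J) -> mean preheat
posteriors = {e: [random.gauss(m, 20.0) for _ in range(2000)] for e, m in shots.items()}

# 1. Per-shot posterior means train a linear scaling model mu(E_las) = a + b*E_las.
means = {e: sum(s) / len(s) for e, s in posteriors.items()}
xs, ys = list(means), list(means.values())
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

# 2. Correlation model: pool mean-removed samples from all shots, assuming a
#    common posterior shape (an assumption checked against the individual fits).
pooled = [s - means[e] for e, post in posteriors.items() for s in post]

# 3. Predict for a new shot: scaled mean plus draws from the pooled shape.
def predict(e_las, n=500):
    return [a + b * e_las + d for d in random.sample(pooled, n)]

new = predict(1.55e6)
```

The common-shape assumption is what lets three sparse datasets share a single correlation model; if the individual mean-removed distributions disagreed, the pooling in step 2 would not be justified.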

Once the scaling and correlation models have been trained, performance predictions can be made over the whole range of independent variables by predicting the mean latent variables, adding the samples from $C$, and using the surrogate to map them into experimental observables. Since every stage of the training is fully Bayesian, prediction uncertainties can be generated quite easily by nested sampling of the various MCMC posterior distributions. As described previously, these prediction uncertainties include all of the important contributions, and in addition, we are able to measure each one individually and determine which is the most important contribution.
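The per-source uncertainty measurement can be illustrated with independent Gaussian stand-ins for the three sources (scaling-model posterior, correlation model, and surrogate error; the widths are arbitrary choices for the sketch).

```python
import random

random.seed(3)

# Independent Gaussian stand-ins for the three uncertainty sources; the
# widths are arbitrary illustrative values, not fitted quantities.
def sample_prediction(vary_scaling=True, vary_shape=True, vary_surrogate=True):
    mu = random.gauss(340.0, 8.0) if vary_scaling else 340.0  # scaling-model posterior
    shape = random.gauss(0.0, 20.0) if vary_shape else 0.0    # correlation model C
    surr = random.gauss(0.0, 5.0) if vary_surrogate else 0.0  # surrogate error
    return mu + shape + surr

def variance(draws):
    m = sum(draws) / len(draws)
    return sum((d - m) ** 2 for d in draws) / len(draws)

# Nested sampling over all sources gives the total prediction uncertainty;
# freezing all but one source measures each contribution individually.
total = variance([sample_prediction() for _ in range(20000)])
per_source = {
    "scaling": variance([sample_prediction(True, False, False) for _ in range(20000)]),
    "shape": variance([sample_prediction(False, True, False) for _ in range(20000)]),
    "surrogate": variance([sample_prediction(False, False, True) for _ in range(20000)]),
}
```

For independent sources the variances add, so the dominant contribution (here the correlation model) can be read off directly from the per-source values.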

We will conclude our discussion of the global scaling model with a relatively simple application to the performance scaling of the NIF BigFoot experiments. We aim to answer an important question for current ICF research: What is the most efficient path to an increased yield starting with current NIF experiments? There are at least two paths forward; the first is to simply increase the scale of the current experiments, in which case the drive energy and power must also be increased, possibly beyond the capabilities of the NIF laser system. The second is to drive higher quality implosions by improving or removing degradation mechanisms. With our global scaling model, we are now equipped to compare these two options.

The yield increase from the two options is shown in Fig. 9, where we show the predicted yield improvement for shot N171015 when the target is scaled or the quality is improved. The quality improvements are quite easy to consider by taking the best fit simulations to the experimental data and switching off the various degradation mechanisms. The improvements due to the increased scale from the global scaling model are also shown, where the scaling is done at constant velocity by introducing a nonhydrodynamic increase in peak drive power. We find that controlling preheat increases the experimental yield by 1.9×, while controlling all degradation mechanisms (preheat, fill tube, and tent) results in an increase of 2.2×. Comparison with the scaling model shows that the factor of 1.9× is equivalent to an increase in the laser energy of $500^{+330}_{-280}$ kJ, from 1.7 to $2.20^{+0.33}_{-0.28}$ MJ. This is a significant result since it suggests that improving the quality of this fairly low-yield shot gives a yield improvement equivalent to laser energies beyond the capabilities of NIF. We have already shown that we can constrain the cause of this preheat, and so this suggests an important path forward for increased performance.

## IV. FURTHER WORK

It is clear that further work is required to fully validate our predictive model. This can be done using the standard cross-validation technique, in which a subset of experimental data are held out of model training and then used to validate the model afterwards. In the case of the NIF BigFoot data, this is challenging since the data are so sparse that removing data during training significantly reduces the quality of the final model. We therefore plan on validating our approach by predicting future NIF shots using the current best model and comparing our predictions as the experiments are performed. We can expect to significantly increase the number of BigFoot shots over the next few years, and so this will provide an interesting test of both our modeling approach and the convergence of our physics understanding as more data are collected. There is also a wealth of existing data for other experimental platforms which can be used to validate our approach in a similar manner.

Even after thorough validation, our approach can only explain the observed data in terms of the physics hypotheses we include in the underlying HYDRA simulations. In the present analysis, the excellent agreement between the calibrated simulations and the data gives us confidence that we have found a valid explanation of the data; however, it may not be the “only” explanation. In particular, we have neglected drive asymmetries which are known to be important in NIF implosions. An important extension of this work will be to include these asymmetries with the aim of either improving the final fit or providing an alternative explanation of the observed performance. A complementary study will be to include more experimental observables, for example, X-ray images and neutron sky data, which could differentiate between the bulk compression issues studied here and drive asymmetries. As mentioned, the extension to these diverse data types was an important factor in motivating the use of deep neural networks in our model.

Alongside the prediction problem described here, it is also interesting to use our calibrated model for experimental design.^{40} Given a particular overall goal, for example, reducing the prediction uncertainty, we can evaluate the expected improvement in the model for various candidate experiments.^{41} This information can help to make design decisions over the course of an experimental campaign; given the sparsity and expense of ICF experiments, this could be very significant. Even if it turns out that the global model is too uncertain to be of use for prediction (due to lack of data), it can still be used to design new experiments with the aim of improving the global model itself.

Finally, since our calibration is underpinned by physics hypotheses, we can use our results to make connections with other experiments on different facilities and at different scales as well as high fidelity simulations. For example, it would be interesting to compare our simple fall-line mix model with high resolution simulations which are too expensive to match with the data directly.

## V. SUMMARY AND CONCLUSIONS

We have described the challenges associated with model calibration using NIF data and developed a Bayesian framework specifically designed for this application. In particular, we have identified the importance of remaining physically consistent with the underlying simulation codes, and this has motivated a latent-variable input space calibration approach in which we introduce known modifications to underlying physics models. This approach is essential for ensuring a physically sensible prediction as results are extrapolated to new regions of the experimental design space. Our fully Bayesian approach allows us to equip our predictions with meaningful uncertainties from all sources, which are essential to ensure that we properly represent our current understanding of ICF implosions, at the current scale and for new experiments.

The application of our framework to NIF BigFoot implosions has demonstrated the strength of this approach. The calibration procedure is able to fit a large number of experimental observables, including both X-ray and neutron diagnostics. Using 2-dimensional radiation-hydrodynamics simulations, we identify a significant degradation in fuel compression and find that several competing hypotheses, including drive energy issues and the effect of engineering perturbations, are inconsistent with the experimental data. This result is only possible with a fully Bayesian input-space calibration approach. Our latent variable approach also allows for a multifidelity analysis where the nonphysical fuel preheat is expanded into multiple candidate explanatory physics effects, which can be constrained in a similar manner to the first round of analysis. This approach convincingly picks fuel-ablator mix from several possible causes of fuel compression problems, again assigning near-zero probability to the competing explanations. Since our approach provides detailed simulation models for the two explanations of the experimental data, we can explore the differences in detail; in this work, we find very different in-flight profiles.

The global scaling model, which is built from detailed analyses of all the BigFoot data, provides a calibrated model of the observables as a function of hydrodynamic scale, peak laser power, dopant fraction, and fill tube diameter. We have used this model to compare two potential approaches to increasing the neutron yield: improving implosion quality and increasing the target scale. The effective cost of degraded fuel compression is $500^{+330}_{-280}$ kJ in laser energy, showing that improvements in implosion quality can improve the neutron yield beyond the current capabilities of NIF. The yield increase in either case is modest, which is a result of the current analysis being based around rather low performance shots which are not expected to scale favorably. Extension of this work to describe high-performance implosions will be very interesting.

The statistical calibration approach we have described has been made possible by significant advances in both high-volume simulation studies^{42–44} and deep learning for scientific applications.^{14,15} The results we have presented here are a proof-of-principle for the reliable application of these novel methods to ICF and HEDP in general. We feel that the physics-based calibration approach we have outlined will play an increasingly important role in our understanding of complex physical systems like ICF implosions.

## ACKNOWLEDGMENTS

This paper was prepared by LLNL under Contract No. DE-AC52-07NA27344. This document was prepared as an account of the work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.