Bayesian analysis enables flexible and rigorous definition of statistical model assumptions with well-characterized propagation of uncertainties and resulting inferences for single-shot, repeated, or even cross-platform data. This approach has a strong history of application to a variety of problems in physical sciences ranging from inference of particle mass from multi-source high-energy particle data to analysis of black-hole characteristics from gravitational wave observations. The recent adoption of Bayesian statistics for analysis and design of high-energy density physics (HEDP) and inertial confinement fusion (ICF) experiments has provided invaluable gains in expert understanding and experiment performance. In this Review, we discuss the basic theory and practical application of the Bayesian statistics framework. We highlight a variety of studies from the HEDP and ICF literature, demonstrating the power of this technique. Due to the computational complexity of multi-physics models needed to analyze HEDP and ICF experiments, Bayesian inference is often not computationally tractable. Two sections are devoted to a review of statistical approximations, efficient inference algorithms, and data-driven methods, such as deep-learning and dimensionality reduction, which play a significant role in enabling use of the Bayesian framework. We provide additional discussion of various applications of Bayesian and machine learning methods that appear to be sparse in the HEDP and ICF literature constituting possible next steps for the community. We conclude by highlighting community needs, the resolution of which will improve trust in data-driven methods that have proven critical for accelerating the design and discovery cycle in many application areas.

In recent years, there has been a tremendous increase in the development and application of the so-called advanced analysis techniques to problems in Inertial Confinement Fusion (ICF)1–3 and High Energy Density Physics (HEDP).4 Experiments in this field are conducted using some of the most complex facilities on the planet with an array of diagnostics that stretch the limits of measurement in exceedingly harsh environments to study matter in the most extreme conditions produced on Earth. Facilities such as the National Ignition Facility (NIF)5 at Lawrence Livermore National Laboratory (LLNL), the Z Machine6,7 at Sandia National Laboratories (SNL), the OMEGA laser8 at the Laboratory for Laser Energetics (LLE), the Laser Mega-Joule (LMJ)9 at the Commissariat à l’Énergie Atomique (CEA), and the Linac Coherent Light Source (LCLS)10 at the SLAC National Accelerator Laboratory (SLAC) provide scientists with the ability to study dense fusion plasmas, planetary physics, astrophysical phenomena, material properties, shock physics, and more. The new generation of ultra-intense lasers such as those found throughout the international Extreme Light Infrastructure (ELI),11 the planned Zettawatt-Equivalent Ultrashort pulsed laser System (ZEUS)12 at the University of Michigan, and others provide access to unprecedented power and repetition rates to study high-field physics, radiation production, and more. Numerous university-scale laser and pulsed power facilities support these large efforts by training students and scientists and enabling additional studies. These facilities provide unique opportunities for studying matter in extreme conditions and unique challenges in collecting, analyzing, and interpreting data.

The reason for the recent growth in advanced analysis methodologies originates from the realization that few, if any, quantities-of-interest can be directly measured in experiments. This comes from two distinct but related realities of the field. First, HEDP and ICF experiments, by their very nature, operate in a transient regime where spatial and temporal gradients are rarely possible to avoid and often dominate the interpretation of our measurements. This means that all measurements we make are best described as weighted measurements, where the weight is provided by the messenger used to make the measurement (neutron emission from a fusion plasma, spatial and temporal character of an x-ray pulse used to perform Thomson scattering, etc.). The interpretation of such weighted quantities then requires that we use a physical model of the system that can account for gradients. The second realization is that many of the most important quantities we desire to understand from the experiment are not directly measurable at all, even in the sense established above. Instead, we must combine multiple measurements obtained from different instruments through a physical model to obtain a constraint on the quantity of interest.

This is perhaps best illustrated by considering the generalized Lawson criterion χ, which characterizes how close a given fusion experiment is to the ideal ignition threshold,13 
$$\chi = \frac{\varepsilon_\alpha\,\tau_E \int_{V_{HS}} n_D\, n_T\, \langle\sigma v\rangle_{DT}\, dV}{\int_{V_{HS}} 3\, n\, T\, dV}, \qquad (1)$$
$$\chi = \frac{\varepsilon_\alpha\, \langle\sigma v\rangle_{DT}\, P_{HS}\, \tau_E}{24\, T^2}, \qquad (2)$$
where the first line shows a general form of χ and the second line shows a simplified version applicable to a clean, 1D, isobaric plasma. The burning fuel, or hotspot, volume is V_HS, ε_α is the energy of the emitted α particle, ⟨σv⟩_DT is the DT fusion reactivity, and n_D, n_T, and n denote the deuteron, triton, and total ion number densities. The key quantities of interest are the ion temperature T, the hotspot pressure P_HS, and the energy confinement time τ_E. The ion temperature is directly measurable in the first sense above, where an instrument such as a neutron time of flight (nTOF) detector can measure a signal that is sensitive to a spatiotemporal integration of the ion temperature weighted by the local neutron emissivity. The pressure and confinement time are not directly measurable at all, and so we use proxy quantities, such as the full-width at half-maximum (FWHM) of the nuclear burn history, which are informed by simulation. Alternatively, simulation and theory can be used to recast quantities such as χ in terms of quantities that are measurable, implicitly invoking a model assumption to constrain these relationships.

The picture above further assumes that we are able to extract reliable and useful features directly from our raw data and assign meaningful uncertainties to them. Unfortunately, for the HEDP and ICF experimentalist, our instruments exist in harsh environments where significant noise, background, and other artifacts are present. Perhaps more unique to these experiments is the fact that our instruments are routinely moved, modified, and otherwise adjusted. This makes common tasks such as calibrations and flat-fielding difficult and often impractical. The ability to extract important information directly from noisy data and quantify our confidence in such a process is the crucial first step in our ability to use these data to constrain models and improve our understanding. Furthermore, the process of reducing data is labor intensive and often requires user input, which can compromise reproducibility. The repetitive and laborious nature of data reduction, which often requires defining and removing backgrounds from signals and images, calibrating spectra, etc., often means that significant quantities of data remain unprocessed and, subsequently, unused.

It is with these challenges in mind that we define what we mean by the term advanced analysis. We seek methods to analyze experimental data that are robust to complex artifacts in the data, quantify uncertainties in both the data and inferred quantities, and enable the self-consistent integration of multiple sources of information to constrain quantities of interest, some of which may not be observable. Historically, the approach to this in HEDP and ICF has been to measure as many quantities directly as possible and compare these individual quantities to simulations of the experiment, looking for consistency. Alternatively, the measured quantities are often used as input to another equation to estimate a secondary important quantity. Unfortunately, this approach ignores shared information between different instruments and does not admit the inclusion of additional information if a new instrument provides constraining information. It also makes the propagation of uncertainties challenging and generally overestimates the resulting uncertainty.

The preferred method to circumvent the shortcomings of a more traditional approach relies on Bayesian inference. Bayesian methods provide a consistent statistical framework for the extraction of physical insights from experimental data. These methods are able to elegantly handle noisy and even incomplete data, providing a means to regularize ill-posed problems. While Bayesian methods have been popular in many fields, such as neuroscience,14–16 biology,17,18 astronomy,19–22 and others,23–25 the ICF and HEDP community has been slow to adopt them. However, the rapid acceleration of computational capabilities and the proliferation of open source, general use software packages for probabilistic programming have started to change this. In the last decade, a small, but growing number of applications have seen the use of Bayesian methods for data reduction, inference of experimental quantities, and calibration of complex physics models, which will be discussed in this Review.

Furthermore, the rise in popularity of Machine Learning (ML) methods in physical sciences has opened a wide array of new tools for use in the processing, reduction, and integration of experimental data. These tools offer the potential for automation of many common tasks, increasing the throughput of data so that vastly more experimental data can be used to inform models and falsify hypotheses than ever before.

We begin in Secs. II A–II D by reviewing the foundations of Bayesian inference. In Sec. II A, we work through the example of inferring the ion temperature from a neutron spectrum, which allows different approaches and approximations to be explored. As a companion to illustrate the methodology, we provide a Jupyter notebook with all of the code necessary to run the problem and explore the different methods.26 This is followed by practical advice for the application of Bayesian methods to problems of interest. In Sec. II E, we discuss examples from the recent literature that have employed Bayesian inference to improve the quality of analysis over what was possible before or to access fundamentally new information. In Sec. III, we discuss the topic of Bayesian data assimilation and review relevant examples in the literature. This extension of Bayesian inference allows for the use of multiple disparate sources of diagnostic information to jointly constrain model parameters. In Sec. IV, we discuss the application of a variety of methods to accelerate the computationally expensive task of statistical inference. This section will introduce the key concepts of a variety of ML methods for producing efficient surrogate models that can be employed and a variety of statistical approximations that can be used to improve efficiency with examples from the literature. In Sec. V, we discuss methods for quantifying and dealing with noise and background, extracting useful reduced features from raw data, and quantifying uncertainty from these processes. We highlight multiple examples from the literature where these techniques have been successfully applied. In Sec. VI, we discuss ways to extend the demonstrated use of these techniques to improve our ability to integrate experiment and theory, test hypotheses, and intelligently guide the design of experiments. In Sec. VII, we discuss actions that could be taken to accelerate the adoption of these methods across the ICF and HEDP community. Finally, in Sec. VIII, we summarize the current state of advanced data analysis in ICF and HEDP and the outlook for the future.

The methods and examples we chose to highlight are generally more mature and advanced outside of the field of ICF and HEDP. We do not consider this Review to be exhaustive, and as such, the interested reader will need to supplement the material herein with textbooks, reviews, and articles outside of this domain. We make no attempt to be comprehensive in this regard, but do provide some useful references, and the cited articles are likely to hold a trove of valuable resources. We strongly encourage the reader to use the accompanying Jupyter notebook as a resource to aid in the development of intuition and applicability of the methods discussed. We hope that this Review provides the interested reader with a sense of the power and potential of these methods and enough information to start them on their journey.

Bayes’ theorem provides experimental scientists and data analysts with a framework that allows for the solution of non-trivial, and often ill-posed, inverse problems with rigorously defined uncertainties by casting the problem in probabilistic terms. Bayes’ theorem specifies the relationship between the probability of a hypothesis being correct given an observation and the probability of observing a particular outcome given the hypothesis. Therefore, the Bayesian framework provides us with a way to transform an inverse problem to a forward problem, reducing the complexity of the analysis and improving the robustness of the solution. Many texts have been written on this subject,27–30 to which we refer the reader for a more comprehensive discussion of Bayesian statistics, their implications, and their use in a range of disciplines.

In this work, we are primarily interested in the application of Bayesian methods to the analysis of experimental data in the fields of ICF and HEDP. The basic problem we aim to solve is the following: With a set of experimental observations D, with associated uncertainties σ, and a model describing the system of interest f(θ), where θ is a vector of model parameters, we wish to find θ that best describes the observations. Bayes’ theorem provides us with a self-consistent framework with which to solve this problem. To see this, we first write Bayes’ theorem,
$$p(\theta|D, I) = \frac{p(D|\theta, I)\, p(\theta|I)}{p(D|I)}. \qquad (3)$$
Each term in Eq. (3) is a conditional probability with a specific meaning for our task. The first term in the numerator on the right-hand side (RHS) of Eq. (3) is known as the likelihood, which describes the probability of a particular observation D given a specific set of model parameters θ and any background information we have regarding the system of interest I. This background information can be viewed as physical constraints, assumptions made in the formulation of the model, or even previous observational or theoretical evidence that informs our task. The mathematical form of the likelihood depends on the type of observation being considered. The second term in the numerator is known as the prior probability, which encodes any information or belief we have about the model parameters θ prior to seeing the observations, conditioned on our background information I. Hard constraints or unphysical values can be excluded from the solution using a prior distribution with bounds. Additionally, priors can be used to provide stronger guidance for the solution by, for example, imposing correlations on the model parameters. Generally, it is preferred to use maximally uninformative priors so that they do not unduly bias the inference. We will explore the topic of selecting priors in more detail in Sec. II D. The denominator in Eq. (3) is known as the evidence and is a normalization constant for the posterior. As we can see, it does not depend on the model parameters at all and therefore is often omitted when one is only interested in estimating the parameters of a given model. However, this term becomes essential when one wishes to evaluate which model among a library of competing models best matches the available data. Finally, the left-hand side (LHS) is known as the posterior probability, which encodes the probability distribution of model parameters θ, given the observed data D and our background information I. This is the quantity of interest we seek when performing Bayesian analysis.
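To make these terms concrete before turning to the spectroscopy example, the short sketch below composes an unnormalized log posterior from a log likelihood and a log prior for a toy one-parameter problem. The data values, the normal noise model, and the prior bounds are illustrative assumptions of this sketch, not part of the example developed later in this section.

```python
import numpy as np
from scipy import stats

# Toy illustration of Eq. (3): the unnormalized log posterior is the sum of
# the log likelihood and the log prior. The single parameter theta is the
# mean of some normally distributed data; all numbers are placeholders.
data = np.array([4.2, 5.1, 4.8, 5.4])
noise = 0.5

def log_likelihood(theta):
    # log p(D | theta, I): product of independent normal densities
    return np.sum(stats.norm.logpdf(data, loc=theta, scale=noise))

def log_prior(theta):
    # log p(theta | I): uniform on [0, 10]; -inf excludes out-of-bounds values
    return 0.0 if 0.0 <= theta <= 10.0 else -np.inf

def log_posterior(theta):
    # log p(theta | D, I) up to the evidence p(D | I), a constant in theta
    lp = log_prior(theta)
    return lp + log_likelihood(theta) if np.isfinite(lp) else -np.inf
```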

In what follows, we will use an example motivated by ICF research to elaborate on these concepts and the practical implementation of Bayesian inference.

An important task in ICF research is the measurement of the ion temperature. This is because in a thermonuclear plasma, the temperature of the ions uniquely determines the fusion reactivity. Several instruments have been devised to inform this parameter; here, we will focus on a simplified process where we have access to the neutron energy spectrum directly. This is akin to a simplified version of the Magnetic Recoil Spectrometer (MRS)31,32 fielded on the NIF and OMEGA 60. The MRS works by converting neutrons to knock-on protons or deuterons by interacting the incident neutrons with a CH or CD foil. The knock-on charged particles are dispersed using a magnet system where they are detected by CR-39 foils. The energy of the incident particles is determined by the position on the CR-39. Myriad effects contribute to blurring of the spectrum, determining the spectrometer resolution, as well as the absolute calibration. In what follows, we will ignore these effects for simplicity and treat the measurement as a series of counts collected in known energy bins. We simulate the counts as a Poisson process with a model for the neutron spectrum determining the rate parameter (specifying the expected counts in each energy bin), including a fixed background rate.

To generate the test data, we make use of the relativistic neutron spectrum model from Ballabio.33 We do not include any downscatter, making this spectrum appropriate for application to exploding pushers, which have negligible areal density.34 We further assume that the neutrons are produced by a single temperature deuterium/tritium (DT) plasma. This is not necessary, and arbitrary temperature and density distributions can be used to generate data with ease; however, using a single temperature makes comparison of inferred values to the known true values less ambiguous. In order to apply Bayesian methods, we require uncertainties on the data. Because our data generation process is Poissonian, the expected variance in any data point is just the recorded counts, giving an uncertainty equal to the square root of the counts, σ_i = √D_i, where D_i are the counts measured in each bin. We assume that the recorded counts are the sum of the data and a fixed background process. Figure 1 shows the data generated for this example in blue along with the true neutron spectrum in orange. We note that we have deliberately chosen a count rate that is low enough such that proper treatment of counting statistics is important in order to obtain useful results.
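A minimal sketch of such a data-generation step is shown below. It is not the notebook's code: for brevity it uses a Brysk-style Gaussian primary peak rather than the Ballabio model, and the bin grid, peak amplitude, background level, and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" conditions and constants (illustrative values)
m_n, m_alpha = 939.565, 3727.379          # neutron and alpha masses, MeV/c^2
E_mean = 14021.0                          # mean DT neutron energy, keV
T_true = 4.75                             # true ion temperature, keV
w = 2.0 * m_n / (m_n + m_alpha) * E_mean  # Brysk variance factor, keV

# Expected counts per bin: Gaussian primary peak plus a flat background
energies = np.linspace(13400.0, 14700.0, 60)              # bin centers, keV
rate = 200.0 * np.exp(-(energies - E_mean) ** 2 / (2.0 * w * T_true)) + 5.0

# Poisson counting statistics and the sigma_i = sqrt(D_i) uncertainties
counts = rng.poisson(rate)
sigma = np.sqrt(np.maximum(counts, 1))    # floor at 1 to avoid zero-count bins
```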

FIG. 1. Blue staircase plot shows the neutron counts in each energy bin with corresponding error bars comprising our sample dataset. The orange dashed line shows the true neutron spectrum used to generate the data.

We begin by making further assumptions that allow this problem to be solved analytically. Although this is not often the case in practice, it does help develop intuition. The first assumption is to approximate the posterior with a normal distribution. This can be a very useful method to obtain quick, approximate results, even when the full model cannot be approached analytically. Accordingly, our posterior takes the following form:
$$p(\theta|D, I) \propto \exp\left[-\frac{1}{2}(\theta - \theta_o)^T \Lambda\, (\theta - \theta_o)\right]. \qquad (4)$$
For this multivariate Gaussian distribution, the mean is located at θ_o and the covariance matrix is Σ = Λ⁻¹, which defines the width of the distribution, as well as linear correlations between the parameters. Λ is often referred to as the precision matrix. As an example, the covariance matrix describing a multivariate normal distribution with three parameters has the form
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{12} & \sigma_2^2 & \Sigma_{23} \\ \Sigma_{13} & \Sigma_{23} & \sigma_3^2 \end{pmatrix}, \qquad (5)$$
where σ_i² is the variance of the ith parameter and Σ_ij is the covariance between the ith and jth parameters. The matrix is symmetric such that Σ_ij = Σ_ji. This form is trivially extended to arbitrary dimension.
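As a quick illustration of how such a covariance matrix is used in practice, the snippet below extracts the marginal standard deviations and the correlation coefficients from an assumed three-parameter Σ; the numerical entries are invented for demonstration only.

```python
import numpy as np

# Illustrative three-parameter covariance matrix of the form in Eq. (5)
Sigma = np.array([[0.25,  0.10, -0.05],
                  [0.10,  1.00,  0.20],
                  [-0.05, 0.20,  4.00]])

std = np.sqrt(np.diag(Sigma))        # marginal standard deviations sigma_i
corr = Sigma / np.outer(std, std)    # correlation matrix, with ones on the diagonal
```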
Our task is to use Eq. (3) to find θo and Σ. According to Bayes’ theorem, we must specify the likelihood and the priors. We define our priors to be unbounded uniform distributions. Accordingly, they show up as multiplicative constants. The likelihood is specified according to the measurement at hand. Our problem deals with counting particles on a series of detectors, and so a Poisson distribution would be most appropriate. However, if we assume a large number of counts, the Poisson distribution converges to a normal distribution, simplifying our task. The RHS of Eq. (3) is then given as
$$p(D|\theta, I)\, p(\theta|I) = \frac{1}{Z} \prod_{i=1}^{N} \exp\left[-\frac{\left(D_i - f(E_i|\theta)\right)^2}{2\sigma_i^2}\right] \prod_j C_{\theta_j}, \qquad (6)$$
with Z being a parameter-independent normalization (unimportant for our purposes), D_i, i = 1…N, being the measured data points, σ_i being the measurement uncertainty associated with each point, and C_{θ_j} being the constant scale factor for the prior on each parameter. The forward model used to generate the data for comparison is given by f(E_i|θ), where E_i are the energies at each bin where data are collected. Because we are only concerned with a proportionality, we can drop any scale factors that do not depend explicitly on θ, allowing for a simplified expression. Additionally, the log of this expression is typically used for numerical convenience, giving
$$L = \ln\left[p(D|\theta, I)\, p(\theta|I)\right] = -\sum_{i=1}^{N} \frac{\left(D_i - f(E_i|\theta)\right)^2}{2\sigma_i^2} + \mathrm{const}. \qquad (7)$$
This expression is easily recognized as −χ²/2, with χ² being the familiar statistic often used in data analysis. In fact, a Bayesian approach to data analysis with the assumption of a normal likelihood and unbounded uniform priors is exactly equivalent to the familiar χ² minimization procedure. This is an important point, as it illustrates that the concepts used in Bayesian inference are not really all that unfamiliar, but a generalization of the special case we are more comfortable with. Since the LHS and RHS of Eq. (3) are equivalent, finding the peak in our posterior amounts to finding the peak in L, also known as the Maximum Likelihood Estimate (MLE) or alternatively the minimum of χ². More generally, with arbitrary prior and posterior forms, this is called the Maximum A Posteriori (MAP) solution. We can, however, do more than find the optimum: we can determine the covariance as well, which, under the stated assumptions, determines the uncertainties in our inference. We begin by expanding L in a Taylor series about the optimum point θ_o,
$$L(\theta) \approx L(\theta_o) + \sum_i \left.\frac{\partial L}{\partial \theta_i}\right|_{\theta_o}\!\left(\theta_i - \theta_{i,o}\right) + \frac{1}{2}\sum_{i,j} \left.\frac{\partial^2 L}{\partial \theta_i \partial \theta_j}\right|_{\theta_o}\!\left(\theta_i - \theta_{i,o}\right)\left(\theta_j - \theta_{j,o}\right). \qquad (8)$$
The first derivative term in the expansion disappears because we are evaluating at the optimum. We can rewrite this in the matrix notation, dropping the constant term, as follows:
$$L(\theta) \approx \frac{1}{2}(\theta - \theta_o)^T \left.\nabla\nabla L\right|_{\theta_o}(\theta - \theta_o). \qquad (9)$$
This is immediately recognizable as the exponent in the definition of the posterior from Eq. (4) with Λ = −∇∇L|_{θ_o}, where ∇∇L is the Hessian matrix. In other words, determination of the posterior has been cast as a simple optimization problem to obtain the optimum θ_o, followed by the computation of the Hessian. This approximation, where the posterior is assumed to be a multivariate normal distribution with mean located at the mode of the true distribution θ_o and covariance determined by the Hessian computed at the mode, is called the Laplace approximation.35 Intuitively, the Hessian describes how narrow or broad L is about the optimum. The terms along the diagonal determine the width of the distribution for each parameter. If we imagine the equal probability contours of a two-dimensional Gaussian distribution as a series of ellipses, we see that the principal axes of the ellipse need not be aligned with the coordinates of our model. The off-diagonal terms give the rotation, if any, of the distribution. In terms of distributions, this rotation means that two or more parameters are correlated with each other. This approximation is able to capture correlations up to first order, referred to as linear correlations. Higher order correlations imply curvature of the distribution and will violate the assumptions employed in this example, requiring more terms to be retained in the Taylor series expansion of L.
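The sketch below implements this recipe numerically rather than analytically: it locates the optimum with a standard optimizer and builds the precision matrix Λ = −∇∇L from central finite differences, so it can be applied to any log posterior that can be evaluated. The optimizer choice and step sizes are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_post, theta_guess):
    """Numerical Laplace approximation: return the MAP point and covariance."""
    neg_log_post = lambda th: -log_post(np.asarray(th, dtype=float))
    theta_map = minimize(neg_log_post, theta_guess, method="Nelder-Mead",
                         options={"maxiter": 20000, "xatol": 1e-8}).x

    # Precision matrix Lambda = -Hessian(L) = Hessian(-L) by central differences
    n = theta_map.size
    h = 1e-4 * np.maximum(1.0, np.abs(theta_map))   # relative step sizes
    Lam = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xpp = theta_map.copy(); xpp[i] += h[i]; xpp[j] += h[j]
            xpm = theta_map.copy(); xpm[i] += h[i]; xpm[j] -= h[j]
            xmp = theta_map.copy(); xmp[i] -= h[i]; xmp[j] += h[j]
            xmm = theta_map.copy(); xmm[i] -= h[i]; xmm[j] -= h[j]
            Lam[i, j] = (neg_log_post(xpp) - neg_log_post(xpm)
                         - neg_log_post(xmp) + neg_log_post(xmm)) / (4.0 * h[i] * h[j])

    covariance = np.linalg.inv(Lam)     # Sigma = Lambda^{-1}
    return theta_map, covariance
```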
At this point, we must specify f(Ei|θ) to make further progress. We have been able to proceed analytically up to this point; however, taking derivatives of L analytically requires that f be analytically differentiable. Fortunately, we can make one final assumption that allows this for the neutron spectrum. Physically, the peak location and width of the neutron spectrum are related to the ion temperature; however, the shift in the peak location with temperature is small and can be difficult to resolve, so we will ignore this dependence for the moment. Following the work of Brysk,36 we can specify the neutron spectrum as a Gaussian distribution with variance proportional to the ion temperature,
$$f(E_i|\theta) = \theta_1 \exp\left[-\frac{(E_i - \theta_3)^2}{2\, w\, \theta_2}\right] + \theta_4. \qquad (10)$$
In the above equation, θ_1 is a scale factor, θ_2 is the ion temperature, θ_3 is the central energy, and θ_4 is an additive background term. The factor w = 2m_n⟨E⟩/(m_n + m_α), where m_n is the neutron mass and m_α is the α-particle mass. Because the Brysk model assumes that the average neutron energy is independent of ion temperature, we fix it to ⟨E⟩ = 14.021 MeV. This will introduce a bias in our estimate of the ion temperature, though we expect it to be small. Using the dataset from Fig. 1, we can use any common optimization package to solve for θ_o by maximizing Eq. (7) with Eq. (10) substituted for f(E_i|θ), from which we find that the best-fit ion temperature is 4.5 keV and the best central energy is 14.04 MeV. The central energy is found almost exactly, while the ion temperature is not far from the true value of 4.75 keV, but we must determine the covariance and evaluate the posterior to see how much the data constrain our inference.
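A sketch of this fit is given below. It reuses the synthetic `energies`, `counts`, `sigma`, and `w` arrays and the `laplace_approximation` helper from the earlier sketches; the starting guess is arbitrary, and the numbers it returns will differ from the 4.5 keV quoted above because the synthetic data are not identical to those of Fig. 1.

```python
import numpy as np

# Brysk-style forward model of Eq. (10):
# theta = [amplitude, T_ion (keV), central energy (keV), background]
def spectrum_model(E, theta):
    amp, T_ion, E0, bkg = theta
    return amp * np.exp(-(E - E0) ** 2 / (2.0 * w * T_ion)) + bkg

# Log likelihood of Eq. (7); uniform priors contribute only a constant
def log_likelihood(theta):
    resid = (counts - spectrum_model(energies, theta)) / sigma
    return -0.5 * np.sum(resid ** 2)

theta_guess = [counts.max(), 3.0, 14000.0, 1.0]          # ad hoc starting point
theta_map, covariance = laplace_approximation(log_likelihood, theta_guess)
T_fit, T_err = theta_map[1], np.sqrt(covariance[1, 1])   # ion temperature +/- 1 sigma
```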

Given the optimum found above, we now compute the Hessian to fully determine the posterior distribution. Because of the approximations we made in this example, this can be computed analytically using a symbolic mathematics package or an automatic differentiation tool. Once we have the Hessian, we take the inverse of the precision matrix Λ to obtain the covariance matrix. The posterior obtained in this way is shown as the red contours in the plot in Fig. 2. This kind of plot is known as a corner plot, where each plot shows the pairwise distribution between two parameters. From this plot, we can visualize the mode of each parameter (marked with a black square and black solid lines), the width of the distribution, and any correlations present. We see that the ion temperature is positively correlated with both the amplitude and the energy shift. The constant offset is negatively correlated with both the ion temperature and amplitude.

FIG. 2. Corner plot showing the posterior distribution of the model parameters ion temperature, amplitude, energy shift, and constant offset inferred by fitting our example data. The red contours show the distribution inferred with our analytic Laplace approximation. Blue contours show the posterior inferred using the normal likelihood sampled via MCMC. Fully marginalized distributions are shown in the plots along the diagonal with the pairwise distributions in the off-diagonal plots.

Since the ion temperature is the only parameter of real interest in this example, we can use this distribution to determine the uncertainty of our estimate of 4.5 keV from the optimization above. The other parameters, while necessary to produce a fit, are not of physical interest and are referred to as nuisance parameters. Because we have assumed a Gaussian distribution, quoting the standard deviation fully characterizes the distribution, giving a measurement of 4.5 ± 0.75 keV. As we can see, the true value of 4.75 keV is within one standard deviation of the inferred value.

As stated, it is not often possible to compute the derivatives analytically, and so another means to solve this problem is desired. Additionally, even for this simple example, we had to make significant approximations to enable our analytic solution. The Laplace approximation can still be used without analytic derivatives by applying a finite difference method instead, but the posterior is not always guaranteed to be well approximated as a normal distribution. In order to relax this assumption and allow the posterior to take whatever form the problem dictates, we must have a method by which we can sample an arbitrary probability distribution. For this problem, the RHS of Bayes’ theorem is most generally written as in Eq. (6), where we maintain uniform priors. The technique of choice for sampling this distribution is known as Markov Chain Monte Carlo (MCMC).28,29,37–40 Here, we present the basic theory of Markov chains and stationary distributions needed to understand MCMC. We additionally present a specific example of an MCMC algorithm to clarify these concepts. We present the formulation using a continuous state-space notation as this is typically more appropriate for problems in HEDP and ICF. Additionally, many presentations of the theory in discrete formulation are readily available and correspond to intuitive conversions of integrals to summations from our presentation.29 

A Markov chain is a stochastic process where the probability of transitioning from the current state indexed by j to the next state at j + 1 only depends on the current state. Mathematically, this is expressed as p(x_{j+1}|x_j, x_{j−1}, …, x_0) = p(x_{j+1}|x_j). Simply stated, a Markov process has no memory. To emphasize the sequential nature of transitions from one state x = x_j to another y = x_{j+1} in a Markov process, we define the notation p(x → y) ≡ p(y = x_{j+1}|x = x_j). Note that p(x → y) must satisfy
$$\int p(x \to y)\, dy = 1. \qquad (11)$$
When there is a unique probability density π(x) satisfying
$$\pi(y) = \int \pi(x)\, p(x \to y)\, dx, \qquad (12)$$
π is referred to as the stationary distribution for the Markov chain with transition probability density p(x → y). Beginning from some state x_0 and sequentially drawing samples, accepting each transition with probability given by the transition density, we build the chain of samples {x_0, x_1, …, x_n}; eventually, the sampling converges to drawing from the stationary distribution π, i.e., x_i ∼ π. A stronger condition than the Markov property for p(x → y) that will guarantee the existence of the stationary distribution π is that of detailed balance. Detailed balance requires that
$$\pi(x)\, p(x \to y) = \pi(y)\, p(y \to x). \qquad (13)$$
To see this, integrate Eq. (13) and apply Eq. (11),
$$\int \pi(x)\, p(x \to y)\, dx = \int \pi(y)\, p(y \to x)\, dx = \pi(y) \int p(y \to x)\, dx = \pi(y), \qquad (14)$$
so that Eq. (12) is satisfied.

For an MCMC algorithm, one wishes to estimate the posterior distribution via sampling. It is sufficient for such an algorithm to choose an appropriate transition probability density so that the stationary distribution corresponds to the posterior distribution in Eq. (3). The Metropolis–Hastings algorithm41 achieves this by specifying a proposal distribution g(x, y) and a probability of acceptance A(x, y) such that the transition kernel p(x → y) = A(x, y)g(x, y) satisfies detailed balance. The proposal transition distribution g is not specified a priori and must be chosen by the user (see Refs. 28 and 30 for detailed discussion of the properties and requirements for the proposal distribution). Often, g is chosen to be symmetric, g(x, y) = g(y, x), or independent of the current sample, g(x, y) = g(y), with a uniform or normal distribution often selected for computational convenience.

For an arbitrary proposal transition density g (whose state space must contain the support of π), we see that to satisfy detailed balance, the acceptance probability must satisfy
$$\frac{A(x, y)}{A(y, x)} = \frac{\pi(y)\, g(y, x)}{\pi(x)\, g(x, y)}. \qquad (15)$$
It is easy to see that letting the acceptance probability take the form
$$A(x, y) = \min\left(1,\ \frac{\pi(y)\, g(y, x)}{\pi(x)\, g(x, y)}\right) \qquad (16)$$
satisfies this condition. The Metropolis–Hastings algorithm uses this acceptance probability according to the following algorithm:
  1. Propose y by drawing from g(x, y).

  2. Compute A(x, y).

  3. Accept y with probability A(x, y); otherwise, set the new step equal to the current step x.

  4. Return to 1.

A couple of notes are in order. First, note that the transition probability being factored as a product of A and g allows us to first propose y ∼ g(x, y) and then accept or reject with probability A(x, y). We also note that, intuitively, if a symmetric proposal distribution is used, g(x, y) = g(y, x), then for π(y) > π(x), the new point will always be accepted. That is, the chain will always take a proposed move to a region of higher probability density unless such a move is otherwise penalized by the proposal distribution g. Finally, we remind the reader that by satisfying detailed balance, the stationary distribution is guaranteed to be given by π.
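A compact random-walk implementation of this algorithm is sketched below for a symmetric Gaussian proposal, in which case the ratio of proposal densities cancels from Eq. (16). Here `log_post` is any function returning the log of the unnormalized posterior (such as the sketches above), and the step size is a tuning parameter the user must choose.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_samples, step, seed=None):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    n_accept = 0
    for k in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)  # y ~ g(x, y)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, pi(y)/pi(x))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            n_accept += 1
        chain[k] = theta        # rejected moves repeat the current state
    return chain, n_accept / n_samples
```

For the spectrum example, one would pass the log posterior (or the log likelihood under uniform priors) together with per-parameter step sizes, tuning them until the acceptance rate sits roughly in the 20%–50% range commonly targeted for random-walk samplers.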

As previously mentioned, in our problem of interest, π is the posterior distribution, and we can evaluate the unnormalized posterior density for a given θ by evaluating the prior probability and the likelihood at that point. However, even though we can evaluate the posterior, we resort to MCMC to sample it because any problem with more than a few dimensions will be exceedingly expensive to sample by brute force on a grid, and we do not a priori know where the posterior will have the highest weight. MCMC sampling solves this problem for us by sampling from the posterior distribution and providing us a chain of samples. However, we do have to evaluate the posterior at each point proposed by the sampler, and many thousands of samples are often required to provide a converged chain. This necessitates that the forward models used to emulate the observed data be computationally efficient. We will address a variety of data-driven approaches that can enable computational tractability for this problem in Sec. IV.

To compare against the results of the analytic Laplace approximation from Sec. II A, we sample the same model using the Metropolis–Hastings algorithm, allowing us to relax the assumption that the posterior is Gaussian. We can also take advantage of the flexibility of the algorithm to impose more informative priors. First, we observe that the Laplace approximation to the posterior shown in Fig. 2 allows the constant offset and energy shift terms to assume negative values. As negative backgrounds are unphysical in this case, we restrict the uniform prior to be positive. We also impose our knowledge that the mean energy shift in the neutron spectrum is positive by placing a bound on the prior for this parameter. Next, we apply realistic bounds to the ion temperature of [0.1,10] keV. Finally, we restrict the amplitude to be within a sensible range as well. We use the Metropolis–Hastings algorithm to sample the posterior, which is shown as the blue contours overlaid in Fig. 2. We can see very good agreement between the posteriors inferred with both methods, indicating that our assumptions made in the analytic model were not unfounded. We note very small shifts in inferred mean values of ion temperature and amplitude that are well within the uncertainty of the inference. In general, we see that the distribution inferred using MCMC is slightly tighter than the approximate version. This is desirable, as we would not want our approximate posterior to give us a reduced credible range in our inferences. However, we note that this is not a general feature of the Laplace approximation that can be guaranteed, so caution is warranted when using approximate inference methods. Finally, we note the cutoffs in the distributions of the constant offset and energy shift parameters (most noticeable in the fully marginalized distributions along the diagonal), which are due to the bounds placed on the priors and are not informed by the data.
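A pymc sketch of this bounded-prior setup is shown below. It reuses the synthetic `energies`, `counts`, `sigma`, and `w` arrays from the data-generation sketch; the specific bounds on the amplitude and energy shift and the number of draws are illustrative choices, not those used to produce Fig. 2.

```python
import pymc as pm

with pm.Model() as bounded_model:
    # Bounded uniform priors excluding unphysical (e.g., negative) values
    T_ion = pm.Uniform("T_ion", lower=0.1, upper=10.0)              # keV
    amp = pm.Uniform("amplitude", lower=0.0, upper=10.0 * float(counts.max()))
    dE = pm.Uniform("energy_shift", lower=0.0, upper=100.0)         # keV
    bkg = pm.Uniform("background", lower=0.0, upper=float(counts.max()))

    # Gaussian (Brysk-style) spectrum plus constant offset, as in Eq. (10)
    mu = amp * pm.math.exp(-1.0 / (2.0 * w * T_ion)
                           * (dE + 14021.0 - energies) ** 2) + bkg

    # Normal likelihood using the sigma_i = sqrt(D_i) uncertainties
    pm.Normal("obs", mu=mu, sigma=sigma, observed=counts)

    trace = pm.sample(draws=20000, tune=5000, step=pm.Metropolis(), chains=4)
```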

Now that we have multiple different inference methodologies at our disposal that can handle physical models of varying complexity with different assumptions made about the statistical model, we can explore how these choices impact our inference. As stated above, the problem of interest is a counting problem. We assumed that counting statistics were sufficient to warrant a normal likelihood distribution, but this will not always be true. Both the Laplace approximation and MCMC sampling can utilize a Poisson likelihood. Additionally, we can relax our assumptions on the physical model and use one of higher fidelity for f(E|θ), such as the Ballabio model used to generate the data. Because the Ballabio model computes the shift in the mean neutron energy with ion temperature, we are also able to remove a parameter from our model. Our new model requires an ion temperature, a scale factor, and a background level.
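In a probabilistic-programming framework, the statistical-model swap is essentially a one-line change. The sketch below keeps the simpler Gaussian spectral shape (the Ballabio model is not reproduced here) and only replaces the normal likelihood with a Poisson one, so the expected counts per bin enter directly as the Poisson rate.

```python
import pymc as pm

with pm.Model() as poisson_model:
    T_ion = pm.Uniform("T_ion", lower=0.1, upper=10.0)              # keV
    amp = pm.Uniform("amplitude", lower=0.0, upper=10.0 * float(counts.max()))
    bkg = pm.Uniform("background", lower=0.0, upper=float(counts.max()))

    # Expected counts per bin; a full treatment would evaluate the Ballabio
    # spectrum here, which also ties the peak location to the ion temperature.
    rate = amp * pm.math.exp(-1.0 / (2.0 * w * T_ion)
                             * (14021.0 - energies) ** 2) + bkg

    # Poisson likelihood: the data are integer counts, no sigma_i required
    pm.Poisson("obs", mu=rate, observed=counts)

    trace_poisson = pm.sample(draws=20000, tune=5000, step=pm.Metropolis())
```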

Figure 3 shows the posterior distribution of the ion temperature inferred with different physical and statistical models, as well as different inference methodologies. The top panel shows inferences all using a normal likelihood with the Laplace approach and the Brysk model in blue, MCMC and the Brysk model in red, and MCMC with the Ballabio model in green. The true value is shown as a vertical dashed line, and the mean of each inference is shown as a colored dot. We see that the Laplace approach results in a slightly wider posterior with the mean a bit further from the true value relative to the other methods. However, the mean is well within one standard deviation of the true value, so this answer is still a good one. Using MCMC brings the mean a bit closer to the truth with a narrower posterior. Finally, using MCMC with the Ballabio model produces a negligible change in the posterior. This is due to the fact that, given the low count rate, the Brysk model is easily able to explain the data, so the improved physical model does not produce a tighter posterior. The bottom panel shows two cases using the Ballabio model and a Poisson likelihood. The posterior found using the Laplace approximation is shown in orange, and that found using MCMC is shown in cyan. Both approaches produce a noticeably narrower posterior than with the normal likelihood, and the MCMC approach shows a slight improvement relative to the Laplace method. We expect this to be the case since the use of the Poisson likelihood better matches the measurement mechanism and the use of the Ballabio model in our forward physics model better matches the true physics of the observation. We provide the details of these cases as additional examples in the Jupyter notebook that accompanies this Review.

FIG. 3. Marginal posterior distribution of the ion temperature inferred from different statistical and physical models using MCMC or the Laplace approximation. Top panel: All inferences use a normal likelihood. Results using the Laplace approximation in conjunction with the Brysk model are shown in blue, MCMC with the Brysk model in red, and MCMC with the Ballabio model in green. Bottom panel: All inferences use the Ballabio model and a Poisson likelihood. Results using the Laplace approximation are shown in orange and MCMC sampling in cyan.

In summary, there are many choices one can make when constructing a statistical inference model. The inference methodology is largely independent of the physical and statistical models implemented in the inference, but the choices will have an impact on the result. Improving the fidelity of the physical and statistical models will generally improve the quality of the inference, and a poor physical model can produce highly biased results with misleading uncertainties. Care must be taken to ensure that a valid model is used. The inference methodology is largely determined by the complexity of the problem and the accuracy required in the results. If done properly, the Laplace approach and other approximate inference methods (e.g., variational inference37–40) can provide excellent results more efficiently than MCMC. However, as these methods rely on traditional optimizations, they can be sensitive to the starting point and the usual warnings apply. If the posterior is not well approximated by a multivariate normal distribution with nearly linear correlations or if more accurate results are required, MCMC approaches should be considered. Fortunately, the software tools available today make it fairly straightforward to test these different approaches.

The methods outlined above provide us with full joint probability distributions of the model parameters conditioned on the data. Plots such as Fig. 2 show the full distribution as a series of pairwise distributions with the univariate distributions along the diagonal. Often, we want to see the distribution of a parameter with the influence of the other parameters removed, a process referred to as marginalization. Marginalization is done by integrating over all parameters except for the one of interest,
$$p(\theta_i|D, I) = \int p(\theta|D, I) \prod_{j \neq i} d\theta_j. \qquad (17)$$
This process leaves us with only the dependence on the parameter of interest. The marginalized distribution can be further reduced to produce the usual moments of the distribution for the parameter in question, such as the mean and variance.

Various analytic distributions have known solutions to the integral above. For the multivariate normal distribution obtained via the Laplace method, we can easily obtain a univariate distribution and the necessary summary statistics using the optimal parameter location θ_o and the covariance matrix Σ. This is accomplished simply by using the mean and variance of the parameter of interest, p(θ_i) = N(θ_{i,o}, σ_i²). As a direct result, the expectation value and variance of the ith parameter are θ_{i,o} and σ_i², respectively. Other inference methods that provide an analytic description of the posterior, for example, variational inference,37 can use similar closed-form solutions to marginalize the posterior and obtain summary statistics.

When sampling the posterior via MCMC methods, we do not obtain a closed form description of the posterior. Instead, the sampling algorithm maps the distribution and returns a chain of samples, where the frequency of occurrence of a given value is proportional to the probability density. While this chain does fully quantify the distribution, it can be difficult to visualize and interpret. The upside is that we can obtain the marginal distributions and the moments without need for the complex integration above. The marginal distribution for a given parameter is just the chain of samples obtained for that parameter. The distribution can be visualized using a histogram or similar plotting method. Using this chain, we can define the expectation value and variance as
$$E[\theta] \approx \frac{1}{N}\sum_{n=1}^{N} \theta_n, \qquad (18)$$
$$\mathrm{Var}[\theta] \approx \frac{1}{N}\sum_{n=1}^{N} \left(\theta_n - E[\theta]\right)^2, \qquad (19)$$
where θn, n = 1, 2, …, N, are the MCMC samples from the chain. In fact, the expectation value of any function f can be estimated from samples θn as
$$E[f(\theta)] \approx \frac{1}{N}\sum_{n=1}^{N} f(\theta_n), \qquad (20)$$
providing us with a means to estimate virtually any quantity of interest from our chain of samples.
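As a sketch, assuming `chain` is the two-dimensional array of samples returned by the Metropolis–Hastings sketch above (with the ion temperature in column 1, following the parameter ordering assumed there), these estimators reduce to simple array operations:

```python
import numpy as np

T_samples = chain[:, 1]        # ion-temperature column (ordering assumed above)

T_mean = T_samples.mean()      # Eq. (18)
T_var = T_samples.var()        # Eq. (19)

# Eq. (20): the expectation of any derived quantity, here an illustrative
# function of temperature, is just the sample average of that function.
f_mean = np.mean(np.sqrt(T_samples))
```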

If the posterior is reasonably Gaussian, the moments above are useful descriptors. However, if there is significant skew in the distribution, the median may be a more appropriate descriptor. Additionally, quantiles are often used instead of the variance to quote credible intervals for an inference. This is because quantiles are able to accurately capture skewness in the distribution where a significant amount of the density is on one side or the other of the mode, therefore encoding more information, but still being amenable to the use of an error bar, albeit potentially asymmetric, when plotting inferred values.

The plot in Fig. 4 shows commonly used quantiles represented on a skewed distribution. The distribution is log normal with a mean of 1 and a standard deviation of 0.5. Quantiles are computed by finding the point in the cumulative distribution function (black line) that exceeds a specified fraction of the total probability. The vertical dashed line marked 50% shows the p50 or median point. This is the location where half the weight of the distribution is on either side. Next to this is a small tick mark showing the mean, illustrating that for a skewed distribution, the mean and median are not the same and the median will be closer to the peak. On either side of the p50 point are the p25 and p75 quantiles. These indicate the location where 25% or 75% of the weight of the distribution is to the left of the line, respectively. Between these two locations is the 50% credible interval (CI) or the symmetric region of the distribution about the median that contains 50% of the probability. Going further out, we have the p2.5 and p97.5 quantiles, between which lies the 95% CI. This interval is equivalent to a 2σ interval for a normal distribution. Any combination of quantiles can be used to specify a credible interval, but these are commonly used ones. Above the histogram is a so-called box and whisker plot. The line in the middle of the box shows the p50 point, and the edges of the box show the p25 and p75 points on the left and right, respectively, denoting the 50% CI. Finally, the whiskers extending from the box region indicate the p2.5 and p97.5 quantiles, giving the 95% CI. This form of plot is a convenient way to visually represent a distribution. Here, we note that there is a subtle difference between Bayesian credible intervals and frequentist confidence intervals. The credible interval conveys the range of parameter values, which are consistent with the observations and the prior. The frequentist confidence interval is a statement on the long-run performance of the method under repeated sampling of data. For further details on the interpretation of credible and confidence intervals, we refer the reader to one of the many textbooks or articles on the subject.28,39
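Computed from an MCMC chain, these quantiles are again one-line operations; the sketch below assumes the `T_samples` array from the summary-statistics sketch above.

```python
import numpy as np

p2p5, p25, p50, p75, p97p5 = np.percentile(T_samples, [2.5, 25, 50, 75, 97.5])

print(f"median T = {p50:.2f} keV")
print(f"50% CI: [{p25:.2f}, {p75:.2f}] keV")
print(f"95% CI: [{p2p5:.2f}, {p97p5:.2f}] keV")
```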

FIG. 4. Lower: Illustration of different summary statistics using a log-normal distribution. The solid hash mark indicates the mean, while the dashed vertical line marked 50% shows the median or p50 quantile. The 50% CI is indicated between the p25 and p75 quantiles. The 95% CI is indicated between the p2.5 and p97.5 quantiles. Upper: Box and whisker plot corresponding to the quantiles illustrated below.

Even though these metrics are useful for describing a distribution function, they do fail to convey important information in certain circumstances. The most common example is a multi-modal distribution, i.e., one that contains more than one peak, or mode. In this situation, the quantiles and, in particular, the median will fail to convey that there are multiple regions of the distribution that have high probability. The other common scenario is when significant non-linear correlations exist between parameters. In this scenario, the pair-wise distribution shown in Fig. 2 will be strongly distorted, which has implications for interpreting various probabilities of interest. In these situations, it is generally best to show the full posterior rather than reduced quantities.

The example shown in Sec. II A, along with the companion Jupyter notebook,26 illustrates how to perform Bayesian inference for an idealized problem using different methods, approximations, and likelihood functions. As we have seen, these approximations have an impact on the quality of the inference. In general, the higher the fidelity of the physics model used in the inference and the better the match of the likelihood function to the true measurement process, the better the quality of the inference. In our example, using a Poisson likelihood in conjunction with the Ballabio model for the neutron spectrum produced the tightest posterior distribution for the ion temperature, which is the parameter of interest. We caution that a better quality inference does not always equate to a tighter posterior. In fact, as previously mentioned, using poor quality models can often give highly biased results with a very tight posterior about the biased values. This illustrates that many choices can be made when formulating a Bayesian inference problem, and these choices will have an effect on the quality of the resulting inference, and so caution must be exercised when interpreting results. We encourage the reader to change various parameters in the example code and study the effect each has on the posterior and how data quality impacts these results. Below, we will discuss some general principles to help the practitioner navigate these choices and develop an understanding to guide them.

The choice of prior is a critical input to any Bayesian inference problem and a very common conceptual hurdle for the novice. In general, we want to use the least informative prior possible. This is often, but not always, a uniform distribution with appropriate bounds placed to avoid non-physical parameters. The reasoning is that we wish to provide enough constraining information to ensure a useful solution without unduly influencing the posterior distribution. There are three major exceptions when a uniform distribution is not the appropriate prior. The first scenario is when important information is known before the experiment about a parameter (or subset of parameters) that warrants a more informative prior. This could be, for example, when target metrology or engineering tolerances are able to constrain certain initial conditions or when constraints such as conservation of mass can be employed. Additionally, correlations known about the model parameters a priori can be enforced by choosing a multivariate normal distribution with a covariance matrix that is informed by known physics. It is important to remember that only information known prior to observing the data should be included and not information that can be gleaned from the data. As an example, some diagnostics are sensitive to the product of two parameters, e.g., the areal density ρR, and thus, the posterior distribution of these parameters is strongly correlated. This correlation should not be built into the prior, even if it is expected, because this is informed by the data and, thus, is a posterior quantity. However, if, for example, we are inferring the density at two different spatial locations and physics arguments, e.g., a high sound speed, tell us that these two locations should be correlated, we can build this knowledge into the prior.

The second common situation is when a uniform distribution is genuinely not the least informative distribution. This can often occur when a change of variables is employed in the model. We may use a uniform distribution in the original variable system, but variable transformation requires a transformation be applied to the distribution to maintain the desired properties. The classic lighthouse problem illustrates this well, where a transformation is made from angular coordinates to position. We desire a uniform distribution in the angular variable, but this turns out to be non-uniform in position. We encourage the reader to see Ref. 30 for more details. Additionally, there are some common statistical models where a uniform prior on the parameters is not appropriate. Detailed discussion of these cases can be found in Refs. 28–30, though they tend to be uncommon in our application.

The final reason to use a more informative prior is when the algorithm being used requires it. Some approximations that are useful for describing the posterior require that the likelihood and priors be normal,42 though some methods can employ distributions with bounds.43 Common practice is to use very broad normal distributions with bounds to avoid non-physical or extreme values, thereby approaching a bounded uniform distribution. These approaches are often used for computational convenience and can be avoided if an efficient forward model is used when computing the likelihood. However, methods such as these can be used to find a linearized normal posterior, which can then be used to condition the proposal distribution for a subsequent MCMC sampling step, providing more efficient sampling than the general unconditioned case. Additionally, at least one example of an iterative method exists to remove bias in the posterior introduced by an informative prior.44 These methods will be discussed further in Sec. IV D.

Choosing the likelihood is the next significant decision that the practitioner must make. The likelihood distribution should be determined by the measurement method at hand. For the classic problem of determining the fairness of a coin using a series of coin flips, the binomial distribution is the appropriate likelihood. This is because each flip has only two possible outcomes and the probability of each is determined by a binomial distribution. A Poisson likelihood is most appropriate for a counting problem, which can be approximated by a normal likelihood in the limit of high counts. When using features extracted from data in the likelihood, it is imperative to establish the statistics of the feature as derived from the data. If they can be approximated as a normal distribution, then a normal likelihood could be appropriate. If not, a different distribution should be chosen.

One alternative likelihood is the hard boundary distribution,45 
$$p(D|\theta, I) \propto \prod_{i} \begin{cases} 1, & |D_i - f_i(\theta)| \leq \sigma_i \\ 0, & \mathrm{otherwise}. \end{cases} \qquad (21)$$
Here, σ is the uncertainty band on the data. This likelihood gives equal weight to any f that matches D within the uncertainty band and zero weight to any f that falls outside, providing a uniform distribution centered about D. One can imagine other likelihood functions that act similarly. For example, a super-Gaussian would provide a similar flat-top response and rapid decay to negligible weight outside of the uncertainty band. As we saw in our example, the choice of likelihood can strongly impact the results of the inference. Because of this, it is important to choose a likelihood that matches the physics of the measurement in question.
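Returning to the hard-boundary form of Eq. (21), a minimal implementation written as a log-likelihood, so that it can be dropped into the samplers sketched earlier, might look like the following; the element-wise band test is an assumption of this sketch.

```python
import numpy as np

def hard_boundary_loglike(prediction, data, sigma):
    """Flat (zero) log likelihood inside the uncertainty band, -inf outside."""
    inside = np.abs(np.asarray(prediction) - np.asarray(data)) <= np.asarray(sigma)
    return 0.0 if np.all(inside) else -np.inf
```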

Once our priors and likelihood have been specified, along with our forward model f(θ), we must choose the inference method that we will use to find the posterior. For some well-behaved cases, we may find an analytic solution; for an estimate of the posterior, we can use the Laplace method, variational inference, or other approximate methods; or, if the problem is more complex and we require an accurate posterior, we can use MCMC methods. A wide range of software packages exist to perform this step. Popular options are Dakota,46 Stan,47 emcee,48 pymc,49,50 JAGS,51 and TensorFlow Probability.52 Our example notebook uses pymc. Each of these packages has built-in MCMC sampling algorithms. The Metropolis–Hastings algorithm is a widely used and reliable workhorse with a small number of hyperparameters to be chosen by the user. Another benefit of this algorithm is that it does not require the calculation of gradients during sampling, so the physics model being used does not need to be differentiable, making it amenable to almost any scientific computation with little to no modification.

MCMC samplers require a period of time to establish a stationary distribution. This time period, known as burn-in, is needed so that the samples in the chain do not retain memory of their starting point. MCMC sampling is a kind of diffusion process, so this knowledge can take some time to leave the system. Once the chain has settled to its stationary state, these burn-in samples should be discarded. No hard and fast rules exist to establish the burn-in phase. Generally, on the order of 1000 samples are required, but this will depend on the dimensionality and properties of the individual problem. Best practice is to monitor the evolution of the chain for the problem of interest and determine the appropriate number of samples. We refer the reader to Refs. 28, 30, and 37 for more information. Additionally, some algorithms produce samples that have some degree of correlation between neighbors, meaning that the Markov chain has some degree of residual memory. In this situation, a practice called thinning is employed. Essentially, only every xth sample is retained from the chain, where x is an integer specified by the user. The remaining samples are discarded. Alternatively, all samples can be retained, but the effective number of samples will be less than the total. This has implications for the Monte Carlo error of an estimate, which scales as 1/√N_eff, where N_eff is the effective number of samples, not the total number.
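In practice, burn-in removal and thinning amount to simple slicing of the raw chain; the cut points below are illustrative choices for the raw `chain` array from the Metropolis–Hastings sketch, and packages such as arviz provide effective-sample-size diagnostics to guide them.

```python
import numpy as np

burn_in = 1000      # discard early samples that remember the starting point
thin = 10           # keep every 10th sample if neighboring samples correlate
clean_chain = chain[burn_in::thin]

# Monte Carlo standard error of the posterior mean scales as 1/sqrt(N_eff);
# the retained sample count is only an upper bound on the true N_eff.
n_eff = clean_chain.shape[0]
mc_error = clean_chain[:, 1].std() / np.sqrt(n_eff)
```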

Many of the available packages provide additional samplers that can be exploited to improve efficiency or avoid pathologies that are specific to certain classes of problems. Two popular samplers based on Hamiltonian dynamics are Hamiltonian Monte Carlo (HMC)37,53 and the No-U-Turn Sampler (NUTS).54 These algorithms improve efficiency by enabling larger jumps in model parameter space while maintaining a high probability of acceptance due to the energy-preserving nature of their dynamics. Details of these algorithms can be found in Refs. 37 and 53.

Additional care must be taken for high-dimensional problems (dimensionality ≳ 20). This is because in a high-dimensional space, standard MCMC exploration of the probability density becomes inefficient, a phenomenon referred to as the curse of dimensionality. Regions of high density can be numerous, small in extent, and widely separated, making it ever more difficult to map the probability distribution fully. Various ideas from global optimization theory have been adapted to MCMC sampling, such as differential evolution,55 graduated homotopy,56 and mode-hopping,57,58 which seek to allow the chain to explore a broad region of the parameter space, enabling the full distribution to be sampled efficiently. Still, this is an active area of research, and a general, reliable algorithm for strongly non-convex problems remains elusive.

The benefits of Bayesian inference have been accepted and exploited in a variety of fields for several decades now. However, ICF and HEDP have been slow to catch up. This is, in part, due to the complexity of the measurements and physics models involved in inferring important quantities. Recently, computational capabilities and the availability of easy to use inference software have improved to the point where these tools are being applied to a range of problems in the field. An important benefit that Bayesian inference provides is the evaluation of rigorously quantified uncertainties, which can consistently incorporate all the relevant experimental and model uncertainties and constraining information from any measurements that are provided. Rigorously quantified uncertainties are crucial when comparing experimental inferences to models or attempting to discriminate between competing models.

Figure 5 shows four different examples where Bayesian inference was applied in ICF and HEDP research. The left panel from Ref. 59 demonstrates a method where very limited data are used to constrain the energy coupling and flow between different sources and sinks throughout the course of a direct drive exploding pusher implosion34 on the OMEGA 60 laser. In this work, a thin SiO2 shell is filled with 10 mg/cm3 of pure deuterium gas. The shell is directly illuminated with intense UV light, causing the shell to ablate outward, producing a rocket effect that drives the remaining shell material inward and a strong shock into the deuterium fill. These implosions have been shown to be reproducible and fairly symmetric,65,66 making them ideal for the investigation of detailed physics processes. Time resolved self-emission images are obtained as the capsule is exploding after the shock has coalesced on axis and struck the shell. These images show a bright ring of emission as the shell material is heated and accelerated outward, from which a radius as a function of time is extracted. The rate of this expansion can be used to constrain the efficiency and prior dynamics of the implosion. With knowledge of the initial radius and laser drive, a simplified model of the ablation, acceleration, and rebound of the shell is constructed that matches the expansion phase of the experiment and simultaneously constrains the unobserved implosion and shock reflection phases. The model consists of an ordinary differential equation (ODE) describing the radius vs time as a function of ten model parameters, five of which have strong priors informed by independent measurements or physical constraints. The other parameters take the form of broad normal distributions with bounds (such as positivity) being enforced where appropriate. The measured radius vs time is compared to the modeled values using a normal likelihood function, and MCMC sampling is used to constrain all of the model parameters. The result is the plot on the bottom of the left panel, showing the posterior distribution of the shell trajectory and pressure of the fuel. The posterior distribution of the model parameters can then be fed back into the ODE to obtain a wealth of information, such as the ablation kinetic energy, fuel energy, and shell kinetic energy, among other quantities, as a function of time with credible intervals. This dataset provides valuable constraints with quantified uncertainties on radiation hydrodynamics simulations, helping to provide consistency checks with the equation of state (EOS) models in untested regimes and improve the code’s predictive capability.

FIG. 5.

Upper left: Summary of Refs. 59–61 illustrating the experimental data used to constrain the evolution of an exploding pusher implosion on OMEGA 60. Adapted with permission from Ruby et al., Phys. Rev. E 102, 053210 (2020). Copyright 2020 American Physical Society. Upper center: Figures from Ref. 62 illustrating the model used to generate synthetic images for comparison to measurement in a convergent Rayleigh–Taylor instability experiment. The bottom panel shows the resulting trajectories extracted from the analysis compared to simulation. Black denotes the location of the shock, blue and red are locations of the spike tips, and green is the location of the bubble (see Ref. 62 for details). Adapted with permission from Tobias et al., High Energy Density Phys. 37, 100879 (2020). Copyright 2020 Elsevier. Upper right: Figure from Ref. 63 illustrating the forward model used to emulate velocity measurements on dynamic compression experiments on Z. The bottom plot shows the calibrated equation of state obtained from the Bayesian inference model. Adapted with permission from J. L. Brown and L. B. Hund, J. R. Stat. Soc.: Ser. C: Appl. Stat. 67, 1023 (2018). Copyright 2018 The Royal Statistical Society. Lower: Bayesian analysis loop from Ref. 64 showing the Bayesian extraction of width and asymmetry features from the raw nTOF data and the DNN physics model surrogate used in the MCMC sampling to infer BR. On the right is a plot showing the resulting inferred BR as a function of laser preheat energy for four experiments on Z. Reproduced with permission from Lewis et al., Phys. Plasmas 28, 092701 (2021). Copyright 2021 AIP Publishing LLC.


The center panel shows an experiment designed to study the growth of the Rayleigh–Taylor instability (RTI) in a converging geometry. These experiments, conducted on both the NIF and the OMEGA 60 lasers, drive a cylindrical implosion with a pre-imposed perturbation, which grows due to the acceleration driven Rayleigh–Taylor instability; see Refs. 62 and 67 for details. The growth of the perturbation is modified by the convergence of the shell as it implodes, a process that is important for ICF and has seen extensive theoretical study over the years, but little experimental investigation until recently. X-ray radiography is used to measure the amplitude of the perturbation and the trajectory of the interface and the converging shock wave. Initial studies of this platform arrived at relatively large uncertainties in the positions and amplitudes of interest primarily due to instrumental effects. The radiographs are formed by a large area backlighter68,69 that illuminates a pinhole array. Each pinhole then forms a quasi-point source that illuminates the target and projects an image onto a time resolved detector. Because of this configuration, each pinhole projects through the target at a slightly different angle creating a position-dependent parallax effect that distorts each image differently. Furthermore, the locations of the backlighter source, pinhole array, target, and detector are not known exactly, and so the parallax cannot be removed directly from the data. To circumvent this, a ray tracing model was developed as the forward model for a Bayesian inference performed using the Bayes Inference Engine (BIE).70 The locations of each of the elements in the experiment are able to vary within the known uncertainties of the setup. Rays are passed through a model of the target, which is constructed of primitive objects whose positions vary, creating a mask for the rays in the model, shown in the top left image in the center panel. Backlighter intensity variations and background are also included to form a synthetic radiograph, shown in the upper right. This image is compared to the measured image (bottom left), and a residual is formed (bottom right). The residual is minimized using a normal likelihood, accounting for all the uncertainties in the instrumental setup, producing a posterior distribution of the amplitudes and positions of various features of interest. The plot on the bottom of the center panel shows the comparison of the inferred quantities (×’s) to the simulated trajectories (solid lines). The uncertainties derived from the posterior are invisible on the plot, demonstrating the significant improvement provided by the Bayesian technique. With this analysis, we can clearly see that the simulation fails to reproduce some important aspects of the experiment. Looking forward, this level of confidence in the measurement will easily be able to distinguish between competing hydrodynamic models or different methods of running the same model (choice of EOS, flux limiters, drive multipliers, etc.).

The panel on the far right of Fig. 5 illustrates the inference of equation of state parameters from velocimetry data in dynamic compression experiments. As shown in the top image of that panel, Brown and Hund developed a Bayesian method to calibrate an equation of state model for tantalum at high pressure.63 The Z machine6,7 is used to drive an aluminum electrode, which supplies a continuously ramped pressure drive to the tantalum sample.71 The velocity of the sample is measured on the opposite side where the free surface of the tantalum is mated to a lithium fluoride window. Typically, an inverse Lagrangian analysis is used to turn the free surface velocity information into the equation of state quantities of interest. The Bayesian method, in contrast, uses the magnetic field drive as a boundary condition and runs a forward model to produce simulated velocities, which are compared with the measurement via a normal likelihood. Priors are placed on the parameters of a Vinet equation of state model72 and other experimental parameters, such as the magnetic field drive and the sample thickness, allowing for the incorporation of a wide range of experimental uncertainties. The posterior distribution of the Vinet fit parameters, shown in the bottom plot of Fig. 5, is used to produce an experimentally calibrated equation of state for tantalum. For efficiency, this work uses a Gaussian process73 (GP) surrogate of the forward model. This calibration provides more accurate constraints on the equation of state than the inverse Lagrangian method with more complete accounting of the experimental and model uncertainties. Due to its success, this methodology has been extended and applied to a wide range of equation of state experiments across different regimes and drivers.74–76

The final example we will discuss, shown in the bottom panel of Fig. 5, demonstrates a Bayesian technique for inferring magnetization of the fuel at stagnation in Magnetized Liner Inertial Fusion (MagLIF)77–79 implosions using secondary DT neutron spectra. The method was developed using a kinetic model that tracks tritons born from DD fusion reactions in a cylindrical fuel configuration and computes the resulting secondary DT neutron spectrum. It was shown that this signature is extremely sensitive to the magnetic field-radius product BR during burn, which is an important confinement parameter for magneto-inertial fusion concepts.80,81 This analysis was expensive, requiring thousands of evaluations of the kinetic model for every experiment analyzed, and did not provide rigorously defined uncertainties on either the data or the inferred values. The authors of Ref. 64 discuss a fully Bayesian pipeline to extract features from the raw waveforms utilizing a background fit and hand-engineered features that characterize the width and asymmetry of the neutron time of flight signals and infer BR from these featurized data in conjunction with the DD and DT neutron yields. To circumvent the expense of running the kinetic model for each MCMC step, a neural network surrogate model was trained to map from model parameter inputs to the extracted data features. The data featurization pipeline is shown on the top of the flow chart in the lower panel of Fig. 5, and the inference pipeline, including the neural network surrogate model, is shown on the bottom. The use of surrogate models and their implications will be discussed in Sec. IV. The Bayesian model implemented in this work is parameterized with five parameters, of which BR is of primary interest. The work demonstrated the technique using broad uniform priors on the nuisance parameters, but other measurements could be used to inform these quantities more tightly. This method was applied to an ensemble of four experiments. The results are shown in the plot on the right-hand side of the bottom panel as BR vs preheat energy deposited into the fuel. These data clearly show that the inclusion of the so-called Nernst effect is critical for capturing the remaining magnetic flux at stagnation.

These examples illustrate the power and flexibility of using a Bayesian framework for the inference of quantities of interest in ICF and HEDP experiments. Other works have utilized these methods to infer equation of state parameters from radiographs,82 to infer performance metrics in ICF experiments in the alpha-heating regime,83,84 and to extract electron temperature from K-shell spectra.85 Additionally, the authors of Ref. 45 used Bayesian inference to demonstrate the advantages and potential pitfalls of these methods using a reanalysis of two datasets from the literature. This work aptly demonstrates the potential for bias and misleading uncertainties when using maximum-likelihood methods that do not fully account for uncertainties and priors.

Going beyond the examples outlined above, where a small number of scalar-valued observations or a single vector-valued observation is used as the basis for an inference, we quickly enter the realm of data assimilation,25,86 sometimes called data fusion. Data assimilation is the process by which multiple disparate sources of data are used to jointly constrain an inference, typically through a physical model that links the different observables. The same basic principles as above apply, but now have to be extended to allow for the combination of a variety of different likelihood functions, each of which must be constructed prudently such that one does not dominate the solution more than is warranted by the true information content of the data. This necessity arises from the fact that it is often difficult or impossible to infer realistic correlation models between observations both within, and between, instruments.87,88 Computational tractability often dictates that we assume independence between measurements, but this can dramatically overstate the true information content of the observations when there are structured or correlated noise effects. When realistic covariances cannot be inferred, downweighting the likelihoods by rescaling is an effective substitute, mimicking the reduced statistical power that a likelihood with realistic covariances would assign to the data. This strategy is particularly helpful when very low-dimensional measurements are used in conjunction with data such as images or spectra, which may be represented with a very high sample/voxel count. If treated naively, images can easily outweigh a scalar or vector-valued datum due to the sheer number of data points in the image. Fortunately, the number of pixels does not equate to the true number of data points in an image, though determining the true information content of images can be a challenge. In this section, we will review the construction of a statistical model for data assimilation and highlight a variety of dimensionality reduction techniques used in the literature to help solve the weighting problem.

Despite this challenge, the benefits of this approach are tremendous. Let us suppose that we are making an inference using a model that has a number of nuisance parameters, i.e., parameters that are necessary in order to fit the data but are not of physical interest. Revisiting our example in Sec. II A, fitting the neutron spectrum to extract an ion temperature requires three parameters, a background, an energy shift, and a scale factor, in addition to the parameter of interest T. If we have measurements that can inform these nuisance parameters, we can improve our inference by incorporating this knowledge in some way. In our example, a yield measurement could help to constrain the scale factor. Perhaps the simplest way is to use a separate measurement to independently constrain the nuisance parameter and then place an informative prior on that parameter in our inference. A more elegant and powerful way to accomplish this is to use both measurements to jointly constrain the inference. In order to do this, our forward model must account for both measurements simultaneously, and so may require more physics be added to accomplish this. Additionally, we now require two likelihood functions that are combined to perform the inference. These additions are not trivial, but they are extremely powerful. The resulting posterior is now informed by both measurements, which generally improves the quality of inference for all parameters, not just the one of interest. If our nuisance parameter is not really a nuisance, but something we are interested in knowing, then by taking this approach, we have improved our ability to infer multiple important quantities by combining them into a single statistical model.

By way of providing a concrete example, it is useful to revisit Eq. (6), which specifies a normal likelihood and uniform priors. For the moment, we ignore the priors, which need not be uniform, and focus on the normal likelihood. In our example in Sec. II A, the function f(Ei|θ) takes as inputs the model parameters θ and returns a neutron spectrum. In the more general case, we can split f into two parts: one that models the physics of interest and another that models the diagnostic observables. The physics model will generally produce as its output a variety of field quantities (density, temperature, velocity, etc.). These fields are not directly observable to us, but are related to the diagnostic signatures we do observe. The diagnostic model will take these fields and produce any number of observables that we may wish to compare to (x-ray radiographs, emission spectra, neutron spectra, x-ray emission power, etc.), all of which are self-consistent. Here, it is useful to introduce the concept of an inference network,37,89 an example of which is shown in Fig. 6. An inference network is a directed graph that visually depicts the connections between all of the important pieces in our model: the priors, the physics model, the diagnostic models, the observations, and unobserved, or unobservable, quantities of interest.

FIG. 6.

Bayesian inference network showing the flow of information used to define a Bayesian model. Model parameters θ connect to the physics model f, which outputs the quantity y. To capture potential uncertainties associated with the model such as those arising from the use of an inexact surrogate, a stochastic noise parameter ɛ can be added. The output of the physics model is then fed into the diagnostic models g along with any uncertain parameters z that control the behavior of the diagnostic. The diagnostic models are compared to observations D through the likelihood. Finally, an auxiliary model h can be used to tabulate statistics on other unobserved quantities Y. Stochastic parameters are denoted by open circles, deterministic quantities by triangles, and observed quantities by filled circles.


The graph in Fig. 6 contains three different types of nodes: open circles, which denote stochastic quantities; triangles, which denote deterministic quantities; and filled circles, which denote observed quantities. At the top of our graph, we have open circles representing the prior probability for each of the physics model parameters θi, i = 1, 2, …, N. These nodes are connected to the triangle f(x|θ), which represents our physics model. Here, x is the independent variable or variables, such as position, time, and frequency. Samples are drawn from θ and fed into this node, producing an output y. To account for uncertainty due to a surrogate model or other approximation, we can include an additive stochastic parameter ɛ. In the limit of no uncertainty, y becomes a deterministic variable. This output is then directed into each of the diagnostic models gj(y|zj), j = 1, 2, …, M, where zj represent any diagnostic model parameters that have priors placed on them, such as calibration quantities, detector response, and alignment uncertainties. The output of the diagnostic models is compared to the observations Dj, j = 1, 2, …, M, through the likelihood distribution specified for each. Finally, through the process of sampling the posterior distribution of the model parameters, we can also accumulate statistics on unobserved quantities. In this example, we have another function Y = h(y) that takes the output of the physics model and returns some quantity of interest. This could be, for example, the quantity χ discussed in the introduction, where the field quantities of pressure and temperature can be used to compute a value for a given sample of model parameters. The posterior samples of this new quantity Y now give us a probability distribution p(Y|D) of a fundamentally unobservable quantity that is consistent with all of the data.
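To make the structure of Fig. 6 concrete, the sketch below encodes a toy version of such a network in pymc. The exponential "physics model," the gain parameter, and the observations are hypothetical stand-ins chosen only so the example runs; they are not any of the models discussed in this Review.

```python
import numpy as np
import pymc as pm

x = np.linspace(0.0, 5.0, 50)
rng = np.random.default_rng(1)
D1 = np.exp(-x / 2.0) + rng.normal(scale=0.05, size=x.size)  # instrument 1 (a profile)
D2 = 2.0 + rng.normal(scale=0.1)                             # instrument 2 (a scalar)

with pm.Model() as network:
    # Priors on physics-model parameters (theta in Fig. 6)
    amp = pm.TruncatedNormal("amp", mu=1.0, sigma=0.5, lower=0.0)
    tau = pm.TruncatedNormal("tau", mu=2.0, sigma=1.0, lower=0.0)

    # Physics-model output y (a deterministic node)
    y = pm.Deterministic("y", amp * pm.math.exp(-(1.0 / tau) * x))

    # Diagnostic-model parameter with its own prior (z in Fig. 6)
    gain = pm.Normal("gain", mu=2.0, sigma=0.2)

    # One likelihood per instrument, both constrained by the same physics state
    pm.Normal("obs1", mu=y, sigma=0.05, observed=D1)
    pm.Normal("obs2", mu=gain * amp, sigma=0.1, observed=D2)

    # Unobserved derived quantity Y = h(y), accumulated during sampling
    Y = pm.Deterministic("Y", amp / tau)
```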

The posterior can be derived directly from the graph above and written as follows:
\begin{equation}
p(\theta, z \mid D) \propto \prod_{i=1}^{N} p(\theta_i)\, \prod_{j=1}^{M} p(z_j)\, \mathcal{L}_j\!\left[ D_j \mid g_j\!\left( f(x \mid \theta) \mid z_j \right) \right].
\tag{22}
\end{equation}
Here, $\mathcal{L}_j$ are the likelihoods for each of the observations, which require an evaluation of both f and g and depend on the model parameters θ and the diagnostic model parameters $z_j$. We note that this form assumes independence of the different likelihoods. It is, in principle, possible to relax this assumption by estimating the covariance between the different measurements, but this is rarely possible in practice.87,88 Even though this posterior is derived from a relatively simple graph, the equation is difficult to interpret, making the graph itself more useful for understanding the flow of information. Typically, as before, the log of the above equation is used, producing a sum over the individual log-likelihoods. It is the terms in this summation that must be appropriately weighted for a meaningful composite likelihood to be used. We note that, in general, these graphs can be far more complicated, with multiple models interacting in more complex ways.
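As a schematic illustration of such a weighted composite log-likelihood (the helper functions `forward` and `diagnostics` are hypothetical, and Gaussian terms are assumed for simplicity), the weighting by an effective number of independent data points might look like the following.

```python
import numpy as np

def composite_log_likelihood(theta, z, data, forward, diagnostics, n_eff):
    """Weighted sum of per-instrument Gaussian log-likelihoods.

    forward(theta)       -> physics fields y
    diagnostics[j](y, z) -> synthetic observable for instrument j
    data[j]              -> (D_j, sigma_j): measured signal and its uncertainty
    n_eff[j]             -> effective number of independent points, used to
                            downweight over-sampled signals such as images
    """
    y = forward(theta)
    total = 0.0
    for j, (D, sigma) in enumerate(data):
        model = diagnostics[j](y, z[j])
        weight = n_eff[j] / D.size   # rescale the naive point-wise likelihood
        total += weight * np.sum(-0.5 * ((model - D) / sigma) ** 2)
    return total
```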

Although the application of Bayesian data assimilation in HEDP and ICF is fairly new, a few examples of this approach are available. The authors of Ref. 90 demonstrated an analysis of MagLIF stagnation data using a combination of six different diagnostics (neutron yield, neutron time-of-flight, filtered x-ray power measurements, filtered x-ray pinhole camera, and a monochromatic x-ray imaging system). A simplified isobaric model of the stagnation plasma was used to produce spatially varying neutron and x-ray emissivities. The quasi-cylindrical plasma column was coarsely divided into three axial regions to attempt to capture low-frequency variations in the stagnation parameters. Each axial slice is characterized by a pressure, peak temperature, liner areal density, radius, and mix fraction. A schematic representation of this model parameterization is shown in Figs. 7(a) and 7(b). The computed emissivities were then used to produce synthetic versions of each of the instruments in the analysis. Normal likelihood distributions were used for all observable quantities. The posterior was found using the Laplace approximation discussed above. In this work, the optimum was found using Levenberg–Marquardt minimization, and the Hessian was computed numerically via finite difference to describe the Gaussian posterior. Priors were specified using very broad bounded normal distributions to minimize impact on the posterior. For a subset of cases analyzed, an MCMC sampling was done to confirm that the Gaussianized posterior was an unbiased and conservative estimate of the true posterior. The individual likelihood distributions were scaled by considering two independent factors. First, the data from the imaging instruments were reduced to correspond to only the three axial zones, ensuring repeated identical information was not being included. The images were further reduced by matching the number of samples along the x-dimension to the real resolution of the instrument, ensuring the traces were not over-sampled. The neutron time-of-flight signal was also downsampled to reflect the true temporal resolution of the instrument. Finally, an estimate was made of the number of degrees of freedom or truly independent data points in the signal. This effective number of data points was then used to scale the individual likelihoods. Before application to experimental data, this model was extensively validated using a broad ensemble of data from a simplified model and against results of 1D and 3D resistive magneto-hydrodynamics (MHD) calculations.
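Generically, the Laplace approximation used in this kind of analysis amounts to finding the posterior mode with an optimizer, computing the Hessian of the negative log-posterior numerically, and taking its inverse as the Gaussian covariance. The toy sketch below illustrates the idea; the quadratic-plus-quartic objective and the use of L-BFGS-B rather than Levenberg–Marquardt are arbitrary stand-ins, not the implementation of Ref. 90.

```python
import numpy as np
from scipy import optimize

def neg_log_posterior(theta):
    # Hypothetical stand-in for -log p(theta | D)
    return 0.5 * np.sum((theta - 1.0) ** 2) + 0.05 * np.sum(theta ** 4)

# Posterior mode (MAP estimate)
res = optimize.minimize(neg_log_posterior, x0=np.zeros(3), method="L-BFGS-B")

# Central finite-difference Hessian at the mode
n, eps = res.x.size, 1e-4
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
        H[i, j] = (neg_log_posterior(res.x + ei + ej) - neg_log_posterior(res.x + ei - ej)
                   - neg_log_posterior(res.x - ei + ej) + neg_log_posterior(res.x - ei - ej)) / (4 * eps**2)

# Laplace approximation: p(theta | D) ~ N(theta_MAP, H^{-1})
cov = np.linalg.inv(H)
```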

FIG. 7.

(a) and (b) Schematic of the model used to provide a simplified description of the stagnation in order to analyze experimental data in MagLIF experiments from Ref. 90. (c) Resulting inferred temperature, pressure, and χ from 35 MagLIF experiments. Isocontours of χ are shown for reference. Adapted with permission from Knapp et al., Phys. Plasmas 29, 052711 (2022). Copyright 2022 AIP Publishing LLC.


The validated model was used to infer the model parameters over an ensemble of 35 MagLIF experiments. The posterior distributions of each of the model parameters inferred for each axial zone were then processed to give emissivity weighted values of pressure, temperature, fuel volume, beryllium mix fraction, and liner areal density. Finally, an inferred distribution of χ was computed using these values. The results of this ensemble analysis are shown in Fig. 7(c) where the emissivity-weighted pressure is plotted as a function of temperature, and the inferred mean value of χ is shown as the color. This kind of plot summarizes the progress made on improving the performance and scalability of the MagLIF concept and provides key quantities for comparison to trends from integrated calculations. Importantly, this example illustrates how to use multiple disparate instruments to jointly constrain a quantity of interest χ that is fundamentally unobservable.

The authors of Ref. 91 developed a Bayesian model calibration approach that uses an ensemble of experimental data obtained during the BigFoot campaign on the NIF to constrain potential sources of performance degradation. The authors chose a form of the model calibration problem that transforms the experimental input parameters into a latent space that allows for the inclusion of these degradation mechanisms within the model, rather than the traditional method of including a probabilistic discrepancy term. This approach allows for data taken at a specific experimental configuration x to be used to constrain the values of the latent variables z that best describe the observations.

The model is parameterized to include six potential physical sources of degradation (variable hohlraum peak power and total energy multiplier, variable amplitude surface roughness on the capsule, variable width and amplitude of surrogate fill tube perturbation, variable preheat introduced into the DT layer, and a variable carbon mass premixed into the DT gas). The experimental observations to which the model is compared are the DT neutron yield, the ion temperature, the nuclear bangtime and burnwidth, the downscatter ratio (DSR), the mean radius of the hotspot from x-ray imaging, and the radiated x-ray energy at 22 keV. These quantities are compared through a normal likelihood, and the scaling has been accounted for by converting the measurements responsible for each of the observables to a single scalar feature. The forward physics model for this inference is based on the HYDRA radiation hydrodynamics code.92 However, running 2D HYDRA simulations for each MCMC sample is too expensive, so a deep neural network (DNN) surrogate model is trained by generating 70000 samples across the eight-dimensional parameter space. These points are then used to train a DNN using the DJINN93 approach, providing a mapping from the input parameters to experimental observables with a calibrated surrogate uncertainty. This surrogate can then be used to efficiently sample the posterior distribution of the parameters quantifying the various degradation mechanisms, an example of which is shown in the top four plots in Fig. 8. The analysis consistently shows that reduced compression due to artificial preheat is a significant contributor to degradation of the experiments compared to 2D HYDRA predictions. The analysis is then refined to test a variety of hypotheses to explain the reduced compression, revealing the fuel–ablator mix to be the most likely mechanism. Finally, the ensemble of analyzed shots is used to calibrate a scaling model, which accounts for the measured discrepancies to predict performance at larger delivered laser energy (bottom plot in Fig. 8). In this example, the model is used to predict what drive energy is required to recover the predicted capsule performance if no degradation is present. Recent work, which will be discussed in Sec. IV, has extended the deep learned surrogate model to incorporate vastly more diagnostic information than the handful of scalars used in this work.94 

FIG. 8.

Upper: Posterior distributions of four latent variables describing various potential degradation mechanisms in NIF experiments from Ref. 91. Lower: Results of the calibrated scaling model using the ensemble of analyzed data, showing 2.2 MJ of laser energy needed to recover the undegraded performance predicted by simulation. Adapted with permission from Gaffney et al., Phys. Plasmas 26, 082704 (2019). Copyright 2019 AIP Publishing LLC.


A final example of data assimilation in the HEDP literature uses a hydrodynamic model with parametric representations of equation of state and strength models to integrate data from multiple different dynamic compression experiments.75 This framework has been used to infer a variety of parameters describing the equation of state, plasticity, and anelasticity of tantalum under ramp compression by simultaneously leveraging experimental data from the Z machine and a gas gun. The forward model is constructed to simulate the evolution of the tantalum sample under the measured drive conditions, which are themselves inferred via the Bayesian method described previously,63 subject to parameterized strength and equation of state models. The forward model produces a simulated free surface velocity, which is compared to the measured free surface velocity using a normal likelihood distribution. As has become typical, the forward model is expensive, so a surrogate is used for efficiency. In this example, the surrogate is a GP model that captures the mean of the observable and the uncertainty of the surrogate. An effective number of data points is determined for the measurements based on an auto-correlation analysis of the signals. The inference of the model parameters is performed in one case using all Z shots simultaneously and, in another, using all Z shots and gun shots simultaneously. By incorporating the data from both facilities, which span different ranges of strain and strain rate, the method is able to constrain the model parameters with lower uncertainty and reveal correlations between parameters that were not present with Z data alone. The differences observed by the authors clearly demonstrate the power of simultaneously and self-consistently analyzing experiments from different facilities and in different regimes to constrain constitutive models of material properties. Figure 9(a) shows the resulting parameter values with credible intervals inferred from one of the Z experiments analyzed as a function of time, illustrating when in the experiment different parameters are being accessed. Figure 9(b) shows the equation of state calibrated using the data in this study compared to existing Diamond Anvil Cell (DAC) data.

FIG. 9.

Reproduced from Ref. 75. (a) Variation in model parameters as a function of time with credible intervals from Z shot 2488. (b) Equation of state calibrated with uncertainty is plotted along with Diamond Anvil Cell (DAC) experiments. Reproduced with permission from Schill et al., J. Appl. Phys. 130, 055901 (2021). Copyright 2021 AIP Publishing LLC.


As indicated in Sec. II, statistical inference typically requires many thousands or more evaluations of forward (and sometimes adjoint) physics models. In general, this will necessitate the use of some combination of simplified physics models,59–61,84,90,98–101 machine learning methods,91,95,102–106 analytic approximations in statistical inference,90,107,108 high-performance computing, and advanced inference algorithms in order to reduce computational costs. In this section, we review some applications of machine learning, analytic approximations, and advanced algorithms drawn from the HEDP, ICF, and adjacent literature to achieve this reduction. These examples will apply tools such as neural networks,37–39 Gaussian process regression,73 projection-based reduced order models (ROMs),109 and Bayesian optimization.39,40 However, it is important to point out that many of the examples mentioned in this section are not currently leveraged to conduct statistical inference and uncertainty quantification (UQ), which is the central theme of this article. To further clarify the role that such methods can play in enabling Bayesian inference, we will provide some additional context in Sec. IV E. While we highlight a wide variety of methods, an extensive exposition of the veritable zoo of machine learning and surrogate modeling approaches goes well beyond the scope of this work, with many methods having yet to see adoption in the HEDP community. Thus, we recommend the interested reader see one of the many excellent reviews110–114 or textbooks37–40,109 for more details on model architectures as well as associated data-types and tasks that various approaches can handle.

In Fig. 10, we indicate a coarse categorization of computational reduction approaches into four different levels of granularity with examples of each taken from the literature. Panel (a) of Fig. 10 represents the lowest level category, where physics/diagnostic submodules that may be placed within a larger simulation framework are replaced by a fast emulator. The second categorization in panel (b) of Fig. 10 represents cases in which various physical quantities of interest, such as field variables or observables, are evolved dynamically using a reduced model. We note that general (not necessarily UQ driven) applications using these first two approaches appear to be fairly sparse in the HEDP and ICF literature, but are beginning to be more widely adopted in a variety of physics and engineering applications, with a particularly rich application space in computational fluid dynamics (CFD).115–118 The pursuit of these approaches in the fields of HEDP and ICF represents an exciting and impactful area of research likely to see significant growth as practitioners in the field become more familiar with advances in machine learning. To reinforce this perspective, we have included two examples of such applications in Sec. IV B from the broader ML for CFD community. Although not directly applied to HEDP and ICF experiment or simulation, they are relevant to this community (e.g., shock propagation and hydrodynamic instabilities). We emphasize, however, that adopting these cutting-edge techniques to realistic problems in HEDP and ICF presents a basic research challenge that must be overcome. Panel (c) of Fig. 10 categorizes together methods in which a full calculation from model inputs to a set of outputs or observables is surrogated. Note that methods that predict time and space resolved measurements blur the line somewhat between this category and the previous one. Panel (d) of Fig. 10 groups together methods that enable simplifications in the statistical analysis to produce computational cost savings. We organize the discussion in the remainder of this section largely around Fig. 10, providing details of several examples from within each grouping. Where possible, we provide additional references for relevant work not covered in Fig. 10 for the reader to further explore.

FIG. 10.

Bayesian inference requires many tens to hundreds of thousands of forward model evaluations. This may become too costly for complex physics models and simulations to be used regularly for inference and uncertainty quantification. Methods from machine learning and statistics provide a variety of routes to alleviate this challenge. We broadly categorize these methods into four categories, each discussed with examples in a corresponding subsection. (a) Emulating physics or diagnostic submodules: By building machine-learned emulators of the most expensive physics modules or diagnostic models (e.g., opacity models as shown in Ref. 95), multiphysics computations may be significantly accelerated. Adapted with permission from Kluth et al., Phys. Plasmas 27, 052707 (2020). Copyright 2020 AIP Publishing LLC. (b) Surrogate dynamics: There are a wide variety of approaches to accelerating computation of dynamical system evolution, including the deep-learning-based sequence-to-sequence regression model used in Ref. 96 to treat several exemplars from HEDP. Adapted with permission from arXiv:1811.05852v1.96 (c) Direct mapping of inputs to outputs: In many cases, it may be practical to learn mappings directly from model inputs to observables/outputs that will be analyzed from experiments. Here, we show an example from Ref. 97, where such a mapping was used to accelerate calibration of unknown simulation inputs against experimental observations. Reproduced with permission from Stripling et al., Ann. Nucl. Energy 52, 103 (2013). Copyright 2013 Elsevier. (d) Statistical algorithms and assumptions: Assumptions on statistical models, such as the normal approximation to the posterior made in Ref. 90 shown here, as well as a variety of algorithmic improvements can significantly reduce computational costs, enabling or simplifying Bayesian inference. Adapted with permission from Knapp et al., Phys. Plasmas 29, 052711 (2022). Copyright 2022 AIP Publishing LLC. We cover these examples and a multitude of others from HEDP, ICF, and adjacent literature (e.g., CFD) based on their categorization in Secs. IV A–IV D.

In this section, we cover three examples of the use of machine learning to surrogate physics or diagnostic computations that may be part of a larger physics simulation package. Again, we remind the reader that by reducing computational costs, these methods may enable the application of Bayesian inference. The first two examples covered leverage artificial neural networks (ANNs)37–39,110,111 to enable fast surrogates. Mathematically, neural networks (NNs) have a relatively simple form. If we denote by $x_\ell$ the output of layer $\ell$, we may write
\begin{equation}
x_\ell = f_\ell\!\left( W_\ell \circ x_{\ell-1} + b_\ell \right),
\tag{23}
\end{equation}
where $f_\ell$ is a (generally nonlinear) "activation" function of the layer evaluated element-wise on its argument, $W_\ell \in \mathbb{R}^{n_\ell \times n_{\ell-1}}$ is a trainable weight matrix for the layer, $b_\ell$ is a trainable bias vector, and ◦ indicates linear composition of $W_\ell$ with $x_{\ell-1}$ (matrix multiplication in fully connected NNs or convolution in convolutional NNs). Typical activation functions include the rectified linear unit, sigmoid, and hyperbolic tangent.38
If we denote the output of the final layer of a NN by $x_L$, then we may note that $x_L(x_0, \theta)$ depends on the input data and all of the weights and biases $\theta = [W_L, b_L, W_{L-1}, b_{L-1}, \ldots, W_0, b_0]$. The tuning of these parameters is referred to as training, with the goal of achieving $x_L(x_0, \theta) \approx F(x_0)$ for some target function F. Generally, training is formulated as a minimization problem,
\begin{equation}
\theta^* = \arg\min_{\theta} \sum_i \mathcal{L}\!\left[ x_L(x_0^i, \theta), F(x_0^i) \right],
\tag{24}
\end{equation}
where $\mathcal{L}\left[x_L(x_0^i,\theta), F(x_0^i)\right]$, referred to as a "loss function," measures the difference ($L_p$ norm, binary cross-entropy, etc.) between $x_L$ and F on a training data point $x_0^i$. The optimization is typically achieved via gradient descent.38 We briefly discuss in Sec. IV E how this training may be promoted to a Bayesian framework, but assume that point estimates for $\theta^*$ are obtained from training in what follows.
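As a minimal illustration of Eqs. (23) and (24), a small fully connected network can be trained as follows; the PyTorch implementation, the two-input toy target function, and the hyperparameters are our own arbitrary choices and are not tied to the NLTE application discussed next.

```python
import torch
from torch import nn

# Small fully connected network x_L(x_0, theta) with tanh activations, cf. Eq. (23)
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

# Synthetic training data for an arbitrary target function F(x_0)
x0 = torch.rand(1000, 2)
F = x0[:, :1] * torch.exp(-x0[:, 1:])

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()   # an L2 loss, one of the choices named after Eq. (24)

for epoch in range(2000):
    opt.zero_grad()
    loss = loss_fn(net(x0), F)   # L[x_L(x_0, theta), F(x_0)] averaged over the batch
    loss.backward()              # gradients with respect to the weights and biases
    opt.step()                   # gradient-descent (Adam) update
```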

In Ref. 95, non-local thermal equilibrium (NLTE) opacity calculations are replaced with a NN [Fig. 10(a)] for use inline with an MHD simulation capability. Once trained, small to medium sized neural networks containing 10s–100s of thousands of parameters may typically be evaluated on millisecond timescales, offering potential for orders-of-magnitude speedup. For this case, a factor of 7× speedup of the full simulation was obtained by replacing the physics-based opacity calculations with the NN version with <10% relative error. Furthermore, it is anticipated that the NN solution may scale favorably as compared to the full NLTE calculations as the number of atomic levels included in the opacity calculation is increased [see Fig. 11(a)].

FIG. 11.

Three examples of emulating physics and diagnostic submodules. (a) In Ref. 95, an NN is trained on NLTE opacity data reducing computational cost for inline use in magneto-hydrodynamic simulations. Reproduced from the work of Kluth et al., Phys. Plasmas 27, 052707 (2020). Copyright 2020 AIP Publishing LLC. (b) In Ref. 119, an NN-based diagnostic inversion is paired with a physics forward model regularizing the NN through cycle-consistency. From Jiang et al., OSA Imaging and Applied Optics Congress 2021 (3D, COSI, DH, ISA, pcAOP). Copyright 2021 Optica Publishing Group. Reprinted with permission from Optica Publishing Group. (c) The authors of Ref. 106 demonstrate a multi-fidelity method to surrogate computation of transport coefficients that could be used inline with simulations. Adapted from L. J. Stanek, S. D. Bopardikar, and M. S. Murillo, Phys. Rev. E 104, 065303 (2021). Copyright 2021 Author(s), licensed under a Creative Commons Attribution 4.0 License.120 See discussion of Sec. IV A for further details.


As a second example at this level of granularity, we highlight the work of Ref. 119 shown in Fig. 11(b). In this application, x-ray radiography measurements are inverted to obtain electron density profiles in CH/Be interface heating experiments conducted at the OMEGA 60 laser facility. In this work, the authors utilize a neural network architecture for diagnostic inversion. They begin by specifying a physics-based diagnostic forward model, d(z), that operates on an underlying physics state z, here the electron density profile, to produce a synthetic diagnostic x-ray radiograph, y. The authors proceed to construct $f(y) \equiv d(e_{\mathrm{NN}}(y)) \approx i(y) = y$, where f(·) approximates the identity map i(·) that operates on radiography signals and $e_{\mathrm{NN}}$ is a NN that will be trained to approximate $d^{-1}$, the diagnostic inverse. The neural network function $e_{\mathrm{NN}}$ is trained by minimizing the loss $\mathcal{L} = |y_{\mathrm{obs}} - f(y_{\mathrm{obs}})|^2$. As an aside, we note that it is also possible to train an approximation of the diagnostic inverse mapping, $\tilde{e}_{\mathrm{NN}}$, by providing pairs of points $\{(y_{\mathrm{obs}}^k, z_{\mathrm{sys}}^k)\}_{k=1}^{N_{\mathrm{train}}}$, which must be obtained directly from forward modeling of the samples $z_{\mathrm{sys}}^k$. Once the NN is trained using the forward modeled data, it can be applied to experimental data. However, this approach may risk being less robust to non-representative or out-of-training-sample data as compared to the self-consistent architecture utilized by Ref. 119. We also mention that this type of self-consistent learning architecture is similar to that used in Ref. 94 and can be useful for regularizing the trained model when an ill-posed inversion problem is being addressed.
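The cycle-consistency idea can be sketched in a few lines of PyTorch; the 1D convolutional "diagnostic" d and the random stand-in signals below are toys chosen only so the example runs and are not the models of Ref. 119.

```python
import torch
from torch import nn

# Toy differentiable "diagnostic" forward model d(z): profile -> blurred attenuation signal
def d(z):
    kernel = torch.exp(-torch.linspace(-2, 2, 11) ** 2).view(1, 1, -1)
    kernel = kernel / kernel.sum()
    return nn.functional.conv1d(torch.exp(-z), kernel, padding=5)

# NN trained to approximate the inverse map d^{-1}
e_nn = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
                     nn.Conv1d(16, 1, 5, padding=2))

y_obs = d(torch.rand(32, 1, 64))          # stand-in "measured" signals
opt = torch.optim.Adam(e_nn.parameters(), lr=1e-3)

for step in range(1000):
    opt.zero_grad()
    loss = torch.mean((y_obs - d(e_nn(y_obs))) ** 2)   # cycle-consistency loss |y - d(e_NN(y))|^2
    loss.backward()
    opt.step()
```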

The final example we present in Fig. 11(c) is taken from Ref. 106 and uses multi-fidelity (MF) GP regression to model transport coefficients. Briefly, a GP may be thought of as defining a prior distribution over functions that will be allowed to model real-valued data. If the input domain is denoted as $\mathcal{X}$, then the prior on functions $f: \mathcal{X} \to \mathbb{R}$ can be stated as requiring that any combination of function values $\mathbf{f} = [f(x_1), f(x_2), \ldots, f(x_M)]$ evaluated on a set $X = \{x_i\}_{i=1}^{M} \subset \mathcal{X}$ with M ≥ 1 is jointly Gaussian. This joint Gaussian distribution has mean $\boldsymbol{\mu} = [m(x_1), m(x_2), \ldots, m(x_M)]$ and covariance $\Sigma_{i,j} = K(x_i, x_j)$, where m is a mean function (typically taken to be zero) and K is a kernel function (e.g., the square exponential described in Sec. IV E). It may be shown analytically that the posterior distribution of f evaluated at points $X^* = \{x_j^*\}_{j=1}^{N}$ has the form
\begin{align}
\mathbf{f}^* \mid X^*, X, \mathbf{f} &\sim \mathcal{N}\!\left(\boldsymbol{\mu}^*, \Sigma^*\right), \tag{25}\\
\boldsymbol{\mu}^* &= m(X^*) + K_{X^*,X}\, K_{X,X}^{-1}\left[\mathbf{f} - m(X)\right], \tag{26}\\
\Sigma^* &= K_{X^*,X^*} - K_{X^*,X}\, K_{X,X}^{-1}\, K_{X,X^*}, \tag{27}
\end{align}
where $K_{A,B}$ indicates the matrix formed by evaluating the kernel function on all pairs of points (a, b) for $a \in A$ and $b \in B$. We pause to note from the structure of Eqs. (25)–(27) that GP's have the attractive property that they exactly interpolate training data, making them a strong candidate for fitting deterministic data from physics simulations.
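The posterior of Eqs. (26) and (27) is short enough to write out directly. The sketch below uses a square-exponential kernel, a zero mean function, and arbitrary 1D training points; the small jitter term is a standard numerical-stability device rather than part of the formulas.

```python
import numpy as np

def sq_exp_kernel(A, B, length=0.5, amp=1.0):
    """Square-exponential kernel evaluated on all pairs of points in A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2 / length**2)

# Noise-free training data (X, f) and prediction points X*
X = np.array([0.0, 0.5, 1.2, 2.0])
f = np.sin(X)
Xs = np.linspace(0.0, 2.0, 100)

K = sq_exp_kernel(X, X) + 1e-10 * np.eye(X.size)   # jitter for numerical stability
Ks = sq_exp_kernel(Xs, X)
Kss = sq_exp_kernel(Xs, Xs)

mu_post = Ks @ np.linalg.solve(K, f)              # Eq. (26) with m = 0
cov_post = Kss - Ks @ np.linalg.solve(K, Ks.T)    # Eq. (27)
# The posterior mean passes through the training data, up to the tiny jitter.
```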
By MF, we mean an approach that leverages relationships between an inexpensive-to-compute but physically less accurate low-fidelity (LF) model $f_{\mathrm{LF}}(x)$ and sparsely sampled expensive high-fidelity (HF) models $f_{\mathrm{HF}}(x)$, where x are the inputs (e.g., temperature and ion density). In the original Kennedy–O'Hagan framework,121 a linear correlation between the LF and HF models along with a non-linear discrepancy term $\delta_{\mathrm{HF}}(x)$ is specified so that
\begin{equation}
f_{\mathrm{HF}}(x) = \rho\, f_{\mathrm{LF}}(x) + \delta_{\mathrm{HF}}(x),
\tag{28}
\end{equation}
where ρ is a proportionality constant used to capture an overall linear correlation between the LF and HF data. The authors of Ref. 106 consider both this “linear MF” approach and a more general framework allowing for a nonlinear mapping between LF and HF models given by122 
\begin{equation}
f_{\mathrm{HF}}(x) = z\!\left(f_{\mathrm{LF}}(x)\right) + \delta_{\mathrm{HF}}(x),
\tag{29}
\end{equation}
where z is, in general, a nonlinear mapping that may depend non-trivially on x. A combination of relatively inexpensive LF model evaluations and sparse HF model data may be used to fit GP's for $f_{\mathrm{LF}}(x)$, $\delta_{\mathrm{HF}}(x)$, and ρ or z(·), depending on the chosen framework. The authors find that the approach given by Eq. (29) generally outperforms the Kennedy–O'Hagan framework when significant nonlinear correlations between the LF and HF models are present. Although the authors of Ref. 106 do not incorporate their GP emulator into a larger simulation framework, it should be clear from context that this could constitute such a submodule of a larger calculation.
Of significant note, in an MF approach, one need not evaluate the LF and HF models at the same set of training points, $\{x_i\}_{i=1}^{N_{\mathrm{train,LF}}} \neq \{x_i\}_{i=1}^{N_{\mathrm{train,HF}}}$, and the number of training points used to fit $f_{\mathrm{LF}}(x)$ and $\delta_{\mathrm{HF}}(x)$ need not be the same. This allows for fewer evaluations of the HF model, i.e., $N_{\mathrm{train,HF}} \ll N_{\mathrm{train,LF}}$, allowing for a reduced training computational budget. We also note that this approach allows for straightforward extension to a set of multiple hierarchical models through recursive application of Eq. (29),
\begin{equation}
f_{i}(x) = z_{i}\!\left(f_{i-1}(x)\right) + \delta_{i}(x),
\tag{30}
\end{equation}
where $i = 1, 2, \ldots, i_{\max}$ indexes the expected physical accuracy of the model. The MF approach additionally allows for the incorporation of experimental datasets as the highest fidelity results.
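A minimal sketch of the linear (Kennedy–O'Hagan) construction of Eq. (28) using scikit-learn is shown below; the synthetic LF/HF functions, training-point counts, and the simple least-squares estimate of ρ are illustrative choices of ours, not the implementation of Ref. 106.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Synthetic low- and high-fidelity data (many cheap LF points, few expensive HF points)
x_lf = np.linspace(0.0, 1.0, 40)[:, None]
y_lf = np.sin(6.0 * x_lf).ravel()
x_hf = np.linspace(0.0, 1.0, 8)[:, None]
y_hf = 1.2 * np.sin(6.0 * x_hf).ravel() + 0.3 * x_hf.ravel()

# GP for the LF model, fit on the dense LF data
gp_lf = GaussianProcessRegressor().fit(x_lf, y_lf)

# Estimate rho by least squares, then fit a GP to the discrepancy delta_HF(x)
lf_at_hf = gp_lf.predict(x_hf)
rho = np.dot(lf_at_hf, y_hf) / np.dot(lf_at_hf, lf_at_hf)
gp_delta = GaussianProcessRegressor().fit(x_hf, y_hf - rho * lf_at_hf)

def f_hf_predict(x):
    """Multi-fidelity prediction following Eq. (28)."""
    return rho * gp_lf.predict(x) + gp_delta.predict(x)
```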

Interestingly, the use of GP’s naturally incorporates an estimate of uncertainty in the emulator, namely, the posterior predictive interval, which grows as one moves away from data points xi on which GP’s for fLF and δHF are trained. This uncertainty could be propagated by directly sampling from the GP at runtime. However, interpretation of this and other forms of surrogate uncertainty requires caution. We give further details in Sec. IV E. We also wish to highlight that GP regression can be extended to incorporate both functional (e.g., monotonic,123 bounded,124 and multi-output125) and physics-based constraints that may be useful in certain applications (e.g., divergence-/curl-free126 and thermodynamic consistency for EOS modeling127). Furthermore, we note that the work of Ref. 106 assumed that there is an inherent ordering to the accuracy of the models used, leading to a hierarchical structure where it makes sense to treat the high-fidelity case with a discrepancy from the low fidelity case. Recent efforts utilizing MF models to capture non-hierarchical structure between different models128,129 are capable of lifting this assumption. Incorporating non-hierarchical structures may be particularly useful in HEDP and ICF applications where different models are not necessarily ordered uniformly across their input domain, e.g., one model may work better at high temperature and another at low temperature.

Before proceeding, we emphasize that in choosing to cover primarily HEDP and ICF literature, we have biased the application space to some degree. In particular, we wish to emphasize that ML methods may also be used to accelerate or replace expensive algorithmic portions of computational solvers. For example, applications of machine learning to choose effective solver-preconditioner setups,130–134 to efficiently generate matrix preconditioners,135 or to replace such solvers altogether with learned components136,137 have been explored. Neural networks have also found application in large eddy simulations (LESs). In LES, low-resolution simulations include subgrid scale source and sink terms to capture higher-resolution effects, improving computational cost while maintaining model fidelity. Accurate closure models of the subgrid scale terms are an important and active area of research. Recently, subgrid scale effects parameterized by machine learned models have been incorporated, for example, into numerical simulation of turbulent flows,138 the 1D Burgers' equation,139 and the 2D MHD Kelvin–Helmholtz instability.140 In Ref. 141, direct numerical simulations (DNSs) are accelerated through ML-enabled accurate coarse-gridding of algorithm components most affected by a decrease in resolution. This was achieved by allowing a convolutional neural network (CNN) to learn local flux interpolation and field correction rules from DNS training data. These methods may provide significant speedups in partial differential equation (PDE) solvers at the sub-component level.

At the next higher level of granularity presented in panel (b) of Fig. 10, one may attempt to surrogate the dynamical evolution of the system under consideration. If this results in sufficient computational speedup, such an endeavor may allow for the incorporation of simulations directly into forward modeling in Bayesian inference and/or experiment design frameworks. To be specific, suppose that we have a simulation code that operates on some history of the system state $\{z_{t_i}\}_{t_i=t_k}^{t_N}$ to give a sequence of states at later times $\{z_{t_j}\}_{t_j=t_{N+1}}^{t_M}$. We denote this by $H(\{z_{t_i}\}_{t_i=t_k}^{t_N}, \mu) = \{z_{t_j}\}_{t_j=t_{N+1}}^{t_M}$, where μ are any other parameters needed to specify the operator that evolves $z_t$. For continuous time dynamics, H typically takes the form of an integro-differential operator that can be solved using a discretization scheme. A goal of the work reviewed in this section is to replace the operator H and/or state z with a much faster approximation Ĥ to evolve the state ẑ, which may be the original state or some reduced-dimensional representation from which an estimate of z may be obtained.

The example from Ref. 96 shown in panel (b) of Fig. 10 uses a recurrent neural network (RNN) to surrogate the evolution of 1D temperature, pressure, and density fields. There, time evolution of the fields computed using a 1D radiation hydrodynamics simulation code is emulated using a sequence-to-sequence model based on gated recurrent units in an RNN architecture. The use of an RNN architecture allows the network to learn short- and long-time correlations in the evolution of the dynamics that cannot be effectively captured using a more standard ANN architecture lacking recurrent units. Particularly, the authors demonstrate the efficacy of the RNN approach over a simple state transition estimation by an ANN, showing a 30% reduction in the integrated absolute error between predicted and true values on a testing set for the RNN case compared to the ANN. Two additional examples are presented in Ref. 96. One example treats time evolution in a 1D diffusive problem. The second surrogates computation of time dependent x-ray emission images computed from 3D ICF implosion simulations. Both examples obtain similar findings in terms of ANN vs RNN performance.
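As a schematic of the sequence-to-sequence idea (a generic illustration only, not the architecture or hyperparameters of Ref. 96), the PyTorch sketch below trains a single-layer GRU to advance a discretized 1D field given a short history of previous states; the field size, hidden size, and synthetic training data are placeholders.

# Generic GRU time-advance sketch (illustrative only).
import torch
import torch.nn as nn

n_x = 64          # number of spatial points in the 1D field (placeholder)
n_hidden = 128    # GRU hidden size (placeholder)

class FieldGRU(nn.Module):
    # Maps a short history of field states to a prediction of the next state.
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=n_x, hidden_size=n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_x)

    def forward(self, history):
        # history: (batch, n_steps, n_x)
        out, _ = self.gru(history)
        return self.head(out[:, -1, :])

model = FieldGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic training pairs (placeholders for radiation-hydrodynamics output):
# histories of 8 states and the state that follows each history.
histories = torch.randn(256, 8, n_x)
targets = torch.randn(256, n_x)

for epoch in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(histories), targets)
    loss.backward()
    opt.step()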

In the remainder of this section, we discuss basic principles of reduced order models needed to consider the work of Refs. 142 and 143. The authors of these works apply interesting extensions to traditional linear projection-based reduced order models (ROMs) in order to surrogate dynamical system evolution. Briefly, projection-based ROMs operate on the assumption that the dynamics of the system under consideration live in a (linear) subspace of lower dimension than the full dimensionality (i.e., spatiotemporal computational mesh/grid size). Working with a space-discretized system of partial differential equations, one may write generally
$A\,\frac{dz}{dt} = f(z, t; \mu), \qquad z(0; \mu) = z_0(\mu),$ (31)
where $A \in \mathbb{R}^{N_G \times N_s}$ is a matrix of coefficients, giving the set of $N_G$ coupled first order ODEs after applying spatial discretization and any operations needed to reduce the system to first order in time; $z \in \mathbb{R}^{N_s}$ is the system state; and μ are parameters describing initial conditions or other free model parameters (e.g., viscosity). We assume that the system state may be expanded as
$z(t) \approx \Phi\, \hat{z}(t),$ (32)
with $\Phi = [\phi_1, \phi_2, \ldots, \phi_{n_s}] \in \mathbb{R}^{N_s \times n_s}$ having columns $\{\phi_i\}$ spanning the trial solution space of chosen dimensionality, $n_s$, in which the dynamics are allowed to evolve ($n_s \ll N_s$ for ROM) and with $\hat{z}(t)$ representing the (time-dependent) mode amplitudes. We note that the expansion in Eq. (32) is referred to as Galerkin projection and is commonly used in CFD. We refer the reader to Chap. 11 of Ref. 109 and references therein for applications and extensions of Galerkin projection. With this ansatz, the equations of motion (after projecting onto the trial space) read
$\Phi^{T} A \Phi\, \frac{d\hat{z}}{dt} = \Phi^{T} f(\Phi \hat{z}, t; \mu).$ (33)
We may gain further insights by splitting f into linear (L) and nonlinear (N) pieces resulting in
$\Phi^{T} A \Phi\, \frac{d\hat{z}}{dt} = \Phi^{T} L \Phi\, \hat{z} + \Phi^{T} N(\Phi \hat{z}, t; \mu), \qquad \hat{z}(0; \mu) = \Phi^{T} z_0(\mu).$ (34)
If N = 0 above, then the matrix multiplications involving Φ, A, L, and $z_0$ may be computed once and Eq. (34) clearly represents a reduction of the original $N_s$ ODEs to $n_s < N_s$ equations, providing computational gains. Unfortunately, the residual nonlinearity will still require evaluation of functions and inner products at each time step with sizes that scale with the original problem dimension $N_s$. The careful treatment of this term goes beyond the scope of our discussion here, but we refer the interested reader to Chap. 12 of Ref. 109 and references therein for more details.

Another natural question arising from projection methods is how to find the best set of basis functions, $\phi_i$, on which to project. One common approach is to apply proper orthogonal decomposition (POD) to a simulation dataset with a representative input parameter point μ. POD decomposes the spatiotemporal simulation data into linear combinations of orthonormal spatially dependent basis functions with time dependent coefficients of ever decreasing energy content (cf. the discussion in Sec. V A for a more detailed description of POD). Typically, such representations are truncated at some rank, r. Geometrically, if we consider the solution to live on some manifold $\mathcal{M}$, POD will find the best r-dimensional linear approximation to the portion of $\mathcal{M}$ from which the solution snapshots are provided to the algorithm. This basis is then used to compute the ROM evolution for parameters $\mu^* \in B_\mu$ in some neighborhood of μ. Alternately, one may attempt to develop interpolation schemes by including realizations of dynamics with different values of μ in the training data.
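A minimal numpy sketch of extracting a rank-r POD basis from a snapshot matrix and projecting a full state onto it, assuming snapshots of the field are stacked as columns (the random snapshot matrix is a placeholder for solver output):

# POD basis from simulation snapshots via the singular value decomposition.
import numpy as np

Ns, Nt, r = 500, 200, 10      # state dimension, number of snapshots, POD rank
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((Ns, Nt))   # placeholder for solver output (one column per time)

# Subtract the temporal mean and take the truncated SVD; the singular values
# squared give the "energy" content of each mode.
mean = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
Phi = U[:, :r]                # POD basis: orthonormal spatial modes as columns

# Project a full state z onto the reduced coordinates and reconstruct it.
z = snapshots[:, [0]]
z_hat = Phi.T @ (z - mean)    # reduced coordinates, dimension r
z_approx = mean + Phi @ z_hat # rank-r reconstruction
print(np.linalg.norm(z - z_approx) / np.linalg.norm(z))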

The linear approximation of Eq. (32) will typically be too restrictive to accurately capture the system evolution over long periods of time when truncating at small rank r. For example, in problems dominated by advection or with transient phenomena, POD may fail to capture a low-dimensional representation of the dynamics. However, Eq. (32) may be thought of as attempting to find a low-dimensional representation of the dynamics specified by $\hat{z}$. In this context, we may think of $\hat{z}$ as providing Euclidean coordinates $\hat{z} \in \mathbb{R}^r$ that map to the configuration space of the full dynamics z given, more generally, by
$z = g(\hat{z}),$ (35)
with g providing the mapping from Euclidean coordinates $\hat{z}$ into manifold coordinates $z \in \mathcal{M} \subset \mathbb{R}^{N_s}$.

A more accurate approach then would be to construct an appropriate set of locally linear approximations of g, i.e., a set of ROM bases that depend on time, $\{\Phi(t_j)\}_{j=1}^{N}$, and which can be thought of as spanning the tangent space of $\mathcal{M}$ near the point on $\mathcal{M}$ corresponding to time $t_j$. Evolving the dynamics within one of these basis sets and projecting into the next basis at the appropriate time may then accurately capture the dynamics. While there is much mathematical detail we will not provide, this is the general idea behind the approach used to treat evolution of the classical Rayleigh–Taylor instability (RTI) in Ref. 142, as indicated in Fig. 12(a). In that work, different POD basis vectors are used based on how far the interfacial instability (bubble/spike) has evolved, allowing the authors to accurately capture single-mode RTI evolution well into the nonlinear regime.

FIG. 12.

Two examples of reduced order models for dynamical system evolution from CFD. (a) The authors of Ref. 142 utilize a time-dependent projection-based model, where the basis set used is chosen during online simulation based on a simulation parameter (here A±) dependent temporal partitioning as shown in the upper illustration. This allows for efficient and accurate modeling of single mode RTI well into the nonlinear regime. Adapted with permission from Cheung et al., J. Comput. Phys. 472, 111655 (2023). Copyright 2023 Elsevier. (b) The authors of Ref. 143 utilize a deep convolutional neural network to nonlinearly encode dynamics into a low-dimensional latent space in which the dynamics may be evolved efficiently using numerical integration. Adapted with permission from K. Lee and K. T. Carlberg, J. Comput. Phys. 404, 108973 (2020). Copyright 2020 Elsevier. See discussion of Sec. IV B for further details.


On the other hand, the method in Ref. 143 attempts to circumvent such approximations by training an auto-encoder on realizations of the dynamics for different values of μ. The decoder portion then represents g in Eq. (35). Furthermore, with the use of a neural network, gradients are readily available for allowing straightforward and rapid computation of the Jacobian matrix that appears in the equation of motion derived in Ref. 143 for the reduced dimensional Euclidean variable ẑ. This approach has been successfully applied, among other examples, for surrogating shock propagation simulated with the 1D Burger’s equation showing significant improvement in solution fidelity as compared to the linear ROM approach while also effectively capturing the intrinsic dimensionality of the solution manifold. The basic approach and shock propagation example are shown in Fig. 12(b).
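The following toy PyTorch sketch illustrates the encoder/decoder structure underlying such nonlinear ROMs, with the decoder playing the role of g in Eq. (35); Ref. 143 uses deep convolutional networks and a specific reduced equation of motion, neither of which is reproduced here.

# Toy autoencoder: the decoder acts as the nonlinear map g of Eq. (35).
import torch
import torch.nn as nn

Ns, r = 256, 4    # full state dimension and latent dimension (placeholders)

encoder = nn.Sequential(nn.Linear(Ns, 64), nn.ELU(), nn.Linear(64, r))
decoder = nn.Sequential(nn.Linear(r, 64), nn.ELU(), nn.Linear(64, Ns))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
snapshots = torch.randn(1024, Ns)   # placeholder for solution snapshots at various mu

for epoch in range(10):
    opt.zero_grad()
    z_hat = encoder(snapshots)                         # reduced coordinates
    loss = nn.functional.mse_loss(decoder(z_hat), snapshots)
    loss.backward()
    opt.step()

# The decoder's Jacobian d g / d z_hat is available by automatic differentiation.
z0 = torch.zeros(r)
J = torch.autograd.functional.jacobian(decoder, z0)    # shape (Ns, r)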

Many significant details for these types of approaches go well beyond the scope of this Review and have been omitted. The goal was to provide only a high level outline to give the reader some intuition for approaches taken in the literature with application to a wide variety of fields. A number of other closely related tools should also be mentioned here. For example, sparse identification of nonlinear dynamics is a tool that has been developed to identify underlying dynamical systems models from measurement or simulation data and may be viewed as extracting a ROM directly from data.144,145 Similarly, operator regression has been developed to discover nonlinear operators in PDEs from data.146–149 Dynamic mode decomposition (DMD) is a matrix decomposition method that can identify spatiotemporal coherent structures in high-dimensional measurements/simulations having fixed linear behavior in time, i.e., damped or growing oscillatory structure.150 Recently, DMD has been applied to study magnetized plasmas151 and has been leveraged to accelerate computations of thermal radiation transfer for a Marshak wave test problem.152 Physics informed neural networks (PINNs) are another tool that has been introduced to enable data-driven solution of PDEs. PINNs have been applied, for example, to the 1D Burger’s equation153,154 and to PDE identification from data, as demonstrated for the incompressible 2D Navier–Stokes equations using data for vortex shedding behind a cylindrical obstruction.153

We emphasize that all of the methods covered in this section sacrifice some physical accuracy and flexibility (to a greater or lesser extent depending on the particular application and method) in exchange for computational performance. ROM methods that incorporate conservation laws,118,153,155,156 for example, of mass/momentum/energy, are in active development and may improve physical fidelity while preserving the performant nature of the surrogate. Where possible, effort should also be made to quantify and incorporate the uncertainty and bias introduced by the choice to use a surrogate model. The further development and adaptation of ROM methods with uncertainty quantification to realistic problems is an active area of research. Advocating for and contributing to research on realistic and relevant HEDP and ICF exemplars represent a significant community need.

For the task of Bayesian parameter estimation, one typically only requires a mapping from the model parameters θ to be estimated to the final quantities that will be compared to observations. Here, we refer to such a map as an “input-to-observable” map. As a somewhat conceptually simpler approach, these methods have proven popular despite being more “black-box” than attempting to surrogate full dynamical evolution. In particular, they have enabled a number of HEDP and ICF applications, including model calibration,91,97 plasma parameter inferences,64 post-shot analysis,157 and optimization of experiment designs102–104,158,159 and diagnostic configurations.160 In many cases, these applications involve the use of Bayesian statistical inference models, while those that do not could, in principle, be promoted to such a setting.

As a first example, we consider the work of Ref. 97. The authors’ goal in this work was the calibration of uncertain inputs to a model of 2D laser energy deposition in beryllium (2D HYADES) using experimental data collected at the OMEGA facility, as indicated in Fig. 13(a). The authors wished to calibrate three values,
(36)
as unknown simulation inputs along with additional simulation input values derived from the experimental setup with characterized uncertainty
(37)
An emulator of the experimentally accessible shock breakout time computed using HYADES was trained using N = 104 simulations with uniformly distributed random inputs covering the domain of experimental configurations and values of θ considered possible. The chosen emulator model was a Bayesian extension of multivariate adaptive regression splines (BMARS). Rather than providing a single function estimating the output, fitting a BMARS model actually provides a distribution over MARS functions, each of which estimates the shock breakout time across the input domain. Example draws from this distribution are shown in panel (c) of Fig. 10. This distribution of regression functions allows for propagation of uncertainty in the MARS model fit through the calibration process via sampling of the posterior over the regression model parameters $p(k, \beta_k \mid \{y(x_i, \theta_i)\}_{i=1}^{N_\mathrm{train}})$, where k and $\beta_k$ represent parameters in a given MARS model being fit on $N_\mathrm{train} \approx 100$ samples from HYADES. We note that Sec. IV E provides additional details and references for a variety of approaches to incorporating UQ arising from the surrogate modeling process.
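As a schematic of emulator-based calibration (with a GP emulator and a hand-written random-walk Metropolis sampler standing in for the BMARS machinery of Ref. 97, and a toy one-parameter simulator in place of HYADES), one might proceed as follows.

# Schematic emulator-based calibration: GP emulator plus random-walk Metropolis.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def toy_simulator(theta):
    # Hypothetical scalar output (e.g., a breakout time) as a function of one input.
    return 3.0 + 0.5 * theta + 0.1 * theta**2

theta_train = np.linspace(-2.0, 2.0, 30).reshape(-1, 1)
emulator = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-8).fit(
    theta_train, toy_simulator(theta_train).ravel())

y_obs, sigma_obs = 3.6, 0.05         # hypothetical measurement and its uncertainty

def log_post(theta):
    if abs(theta) > 2.0:             # uniform prior on [-2, 2]
        return -np.inf
    resid = y_obs - emulator.predict(np.array([[theta]]))[0]
    return -0.5 * (resid / sigma_obs) ** 2

# Random-walk Metropolis sampling of the calibration posterior p(theta | y_obs).
rng = np.random.default_rng(1)
theta, lp, samples = 0.0, log_post(0.0), []
for _ in range(5000):
    prop = theta + 0.2 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
print(np.mean(samples[1000:]), np.std(samples[1000:]))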
FIG. 13.

Three examples of the use of input-to-observable maps in HEDP. (a) The authors of Ref. 97 utilized a regression model to surrogate simulation-parameter-dependent shock breakout times, which is then used to calibrate model parameters using fits to experimental data. Reproduced with permission from Stripling et al., Ann. Nucl. Energy 52, 103 (2013). Copyright 2013 Elsevier. (b) The authors of Ref. 102 used a random forest model to fit the total energy yield from the HYDRA radiation hydrodynamics simulation code of NIF ICF capsule implosions. Random forests are fast to evaluate, allowing for optimization over simulation input parameters to aid in experiment design. Adapted with permission from Peterson et al., Phys. Plasmas 24, 032702 (2017). Copyright 2017 AIP Publishing LLC. (c) The authors of Ref. 160 utilize a Bayesian optimization approach to discover optimal radiation detector configurations for reducing bias and uncertainty in inferred quantities. Adapted with permission from Knapp et al., J. Plasma Phys. 89, 895890101 (2023). Copyright 2023 Cambridge University Press. See discussion of Sec. IV C for further details.


In the remaining two examples, input-to-output emulators were developed to aid in optimizing the design of an experiment or diagnostic configuration. Broadly speaking, the critical component to such approaches is expert-motivated selection of a scalar valued loss for optimization (or multiple loss terms with Pareto optimization161). Typically, the evaluation of such a loss across a broad range of design or diagnostic configurations will be costly to compute. As a result, one must typically aid the optimization process either by emulating the expensive forward model or surrogating the computation of the loss in a fashion that allows for an intelligent parameter search.

In Ref. 102, represented in Fig. 13(b), a random forest model is fit to the total energy yield from the HYDRA radiation hydrodynamics simulation code of NIF ICF capsule implosions. The authors developed an expert-motivated objective function for optimization given by $C = 10P + \mathrm{ITFX}$, where P is a measure of the sensitivity of the yield output from the surrogate to small perturbations in the input and $\mathrm{ITFX} = Y(\rho R)^2$ is an ignition threshold parameter. The factor of 10 in the first term ensures the two are of similar scale. The optimization results in the proposal of an ICF experimental configuration, which may not belong to the data used to train the random forest surrogate. The new design point may be evaluated through a forward simulation. Provided that the original surrogate was trained with sufficient data to map the energy yield response accurately, the discovered experimental design will balance robustness to small, O(10%), deviations from nominal input parameters and performance as measured through ITFX. As random forests are relatively fast to evaluate, an exhaustive search is then possible.
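A schematic of this surrogate-assisted design search, with a toy yield function in place of HYDRA and an illustrative robustness/performance score whose precise form differs from the objective of Ref. 102:

# Schematic surrogate-assisted design search (toy stand-in only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def toy_yield(x):
    # Placeholder for the HYDRA yield response over three design inputs.
    return np.exp(-np.sum((x - 0.6) ** 2, axis=-1) / 0.05)

X_train = rng.uniform(0.0, 1.0, size=(2000, 3))
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, toy_yield(X_train))

def score(x):
    # Illustrative objective: penalize sensitivity to ~10% input perturbations
    # and reward surrogate performance (the definitions in Ref. 102 differ in detail).
    perturbed = np.clip(x + 0.1 * x * rng.standard_normal((64, x.size)), 0.0, 1.0)
    sensitivity = np.std(forest.predict(perturbed))
    performance = forest.predict(x[None, :])[0]
    return 10.0 * sensitivity - performance

candidates = rng.uniform(0.0, 1.0, size=(1000, 3))
best = candidates[np.argmin([score(c) for c in candidates])]
print("proposed design point:", best)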

Rather than optimizing an experiment design, the authors of Ref. 160 sought to optimize experimental diagnostic configurations across a range of possible designs to maximize understanding of the plasma conditions generated in MagLIF experiments. In this case, numerical experiments were conducted using a 1D resistive magneto-hydrodynamics code producing space-time dependent plasma conditions. The goal was to optimize a set of photoconducting diamond (PCD) sensitivities and x-ray filter materials and thicknesses used on the PCD detectors to enable unbiased inference of emissivity averaged quantities, such as plasma temperature and liner areal density, using a simplified uniform plasma model. The work used Bayesian optimization to minimize a loss, say, M, characterizing both the bias and variance in inferences relative to ground truth across a set of simulation input ranges as indicated in Fig. 13(c). We note that the Bayesian optimization process involves fitting a GP to the objective M to be optimized as a function of the diagnostic configuration. The GP provides a probability distribution p(M|x) conditioned on the inputs x. The width of p(M|x) is related to the sparsity of known data in the particular region of input parameter space under consideration. Furthermore, the behavior of the average value of the GP is governed by the observed data. Bayesian optimization works by proposing the next model evaluation point as that which maximizes the expected improvement given by
$x_\mathrm{next} = \underset{x \in X}{\arg\max}\; \mathbb{E}\big[\max\big(0,\, M(x) - M_\mathrm{max,obs}\big)\big],$ (38)
where X is the domain of input parameters and Mmax,obs is the maximum value of the metric observed so far. In this way, Bayesian optimization balances exploration needed to fill in regions with a low density of known data and exploitation of regions of previously observed good performance. Furthermore, the method flexibly allows for a mixed continuous and discrete diagnostic configuration parameter search. Applying the method to synthetic data resulted in finding a diagnostic configuration that substantially reduced the bias and uncertainty in inferring plasma temperature relative to an expert-chosen configuration. We note that Bayesian optimization has also notably been used in Ref. 158 for optimizing the design of a graded inner shell in double shell ICF capsules in a multi-objective optimization setting.
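A minimal expected-improvement loop over a one-dimensional toy objective (illustrative only; Ref. 160 optimizes mixed continuous/discrete detector configurations against a full inference-quality metric) might look as follows.

# Minimal Bayesian optimization loop with the expected-improvement acquisition.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def metric(x):
    # Hypothetical expensive objective M(x) to be maximized.
    return np.sin(3.0 * x) * np.exp(-x)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(4, 1))      # small initial design
y = metric(X).ravel()
grid = np.linspace(0.0, 2.0, 400).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(0.3), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    z = (mu - y.max()) / np.maximum(sigma, 1e-12)
    ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[[np.argmax(ei)]]                            # exploration/exploitation balance
    X = np.vstack([X, x_next])
    y = np.append(y, metric(x_next).ravel())

print("best observed M:", y.max(), "at x =", float(X[np.argmax(y), 0]))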

Interestingly, unlike the example in Ref. 102, Bayesian optimization does not necessarily require an accurate mapping of the response at the outset. It thereby reduces the need to generate large amounts of initial training data. However, there are practical considerations regarding the ability of Bayesian optimization to avoid becoming stuck in local optima (see Ref. 162). Furthermore, the robustness parameter P in Ref. 102 requires O(1000) forward model evaluations to compute, necessitating the use of a surrogate with detailed knowledge of the simulation response across the input domain. In practice, one will need to carefully choose the approach utilized based on computational resources and complexity, expert insight, and the particular application space.

The examples presented in Secs. IV A–IV C provide approaches to reducing the computational cost of a given physics model through the application of data-driven methods. This subsection is devoted to highlighting statistical approximations and statistical algorithms that enable analytic calculations or reduce computational burden for more rapid and accurate inference, irrespective of an underlying physics model. We begin with a discussion of some practical applications of the analytic approach discussed in Sec. II and of variational inference and then move on to examples in the literature demonstrating the role that differentiable programming and gradient-based sampling algorithms play in efficient inference.

The Laplace approximation was discussed in Sec. II A as a means to enable an analytic treatment and connection of Bayesian methods to potentially more familiar χ2 analysis. This approximation may also be useful as a first approximation or to marginalize out well understood uncertain parameters. For example, the Bayesian data assimilation of Ref. 90 utilizes the Laplace approximation to the full posterior distribution for emissivity weighted temperature, hotspot pressure, and hotspot volume, while also assessing the Lawson parameter χ consistent with these inferred values across a database of MagLIF experiments. As indicated in panel (d) of Fig. 10, for this specific application, the normal approximation tended to slightly overestimate the posterior variance while not tending to strongly bias the inference. At the same time, the Laplace approximation generally will require fewer forward model evaluations, provided that the initial parameter optimization step is not too costly.
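A minimal sketch of the Laplace approximation for a hypothetical two-parameter fit: locate the posterior mode by optimization and approximate the posterior as a Gaussian whose covariance is the inverse Hessian of the negative log-posterior at the mode (here estimated by finite differences).

# Laplace approximation: Gaussian centered at the MAP point with covariance from
# the inverse Hessian of the negative log-posterior.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
y_obs = 1.5 * x + 0.3 + 0.05 * rng.standard_normal(x.size)   # synthetic data

def neg_log_post(theta):
    # Gaussian likelihood with sigma = 0.05 and flat priors (assumed for illustration).
    slope, offset = theta
    return 0.5 * np.sum(((y_obs - (slope * x + offset)) / 0.05) ** 2)

opt = minimize(neg_log_post, x0=np.array([1.0, 0.0]))

def hessian_fd(f, p, eps=1e-4):
    # Finite-difference Hessian of a scalar function f at the point p.
    n = p.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += eps; pp[j] += eps
            pm = p.copy(); pm[i] += eps; pm[j] -= eps
            mp = p.copy(); mp[i] -= eps; mp[j] += eps
            mm = p.copy(); mm[i] -= eps; mm[j] -= eps
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * eps**2)
    return H

cov = np.linalg.inv(hessian_fd(neg_log_post, opt.x))
print("MAP estimate:", opt.x, "1-sigma:", np.sqrt(np.diag(cov)))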

As indicated in Fig. 14(a), the authors of Refs. 107 and 108 performed an analytical marginalization of $N_\mathrm{nuisance} = 29$ simulation inputs of convergent ablator experiments on the NIF with well-characterized uncertainties. This is achieved by assuming a linear response of simulation outputs to variations in these nuisance parameters along with a normal likelihood function. The very high dimensional integral needed to marginalize these parameters would have required a prohibitively large number of simulations to evaluate numerically. Despite these assumptions, the posterior for the parameters of interest can be highly non-Gaussian as a result of the nonlinear nature of the simulations coupling these parameters to the nuisance parameters to generate simulation output. In addition to providing an analytic route to nuisance parameter marginalization, this work demonstrates the impact of leaving out these nuisance parameters and the importance of physically reasonable priors on microphysics models. The corresponding maximum likelihood estimate (MLE) differs significantly when including the well constrained priors on the microphysics informed by previous experiments. In particular, it is found that the prior uncertainty on the microphysics model multipliers would need to be increased by about 1–2 orders of magnitude in order to recover the MLE of the uninformed prior case. The result demonstrates the critical need to incorporate prior knowledge into inferences when available. Complementary to these observations, note that the authors of Refs. 107 and 108 highlighted sensitivity to prior information, which should be carefully considered when establishing a choice of prior to avoid biasing the result without strong prior experimental evidence. Before proceeding, we point out another approximate inference method known as variational inference (VI). In VI, the posterior distribution p(θ|D) of Eq. (3) is approximated with a more tractable distribution $q_{\alpha^*}(\theta)$ from a family of distributions parameterized by a set of quantities α (typically multi-variate normal with α giving the mean and covariance) such that $q_{\alpha^*}(\theta)$ minimizes some measure of discrepancy with the target distribution p(θ|D) (typically the Kullback–Leibler divergence163). As we have not found significant application in the ICF and HEDP literature, we forgo further discussion and refer the reader to the textbooks of Refs. 37–40 and references therein for further details.
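As a generic numerical illustration of the analytic marginalization step (not the specific implementation of Refs. 107 and 108): if the simulation output responds linearly to normally distributed nuisance parameters, their uncertainty folds into an effective Gaussian data covariance, shown below with arbitrary placeholder values.

# Analytic marginalization of Gaussian nuisance parameters under a linear response.
import numpy as np

n_data, n_nuisance = 12, 29
rng = np.random.default_rng(0)

# Linear response J of the outputs to the nuisance parameters and the (assumed)
# prior covariance of those parameters; values are placeholders.
J = 0.1 * rng.standard_normal((n_data, n_nuisance))
Sigma_eta = np.diag(rng.uniform(0.01, 0.05, n_nuisance) ** 2)
Sigma_data = np.diag(np.full(n_data, 0.02 ** 2))      # measurement covariance

# Marginalizing Gaussian nuisances under a linear response gives a Gaussian
# likelihood with effective covariance Sigma_data + J Sigma_eta J^T.
Sigma_eff = Sigma_data + J @ Sigma_eta @ J.T

def log_like(resid):
    _, logdet = np.linalg.slogdet(2.0 * np.pi * Sigma_eff)
    return -0.5 * (resid @ np.linalg.solve(Sigma_eff, resid) + logdet)

print(log_like(0.05 * rng.standard_normal(n_data)))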

FIG. 14.

Two examples of the use of statistical algorithms and simplification. (a) In Refs. 107 and 108, an analytical marginalization of Nnuisance = 29 simulation inputs with well-characterized uncertainties is performed assuming a linear response. Adapted with permission from Nucl. Fusion 53, 073032 (2013). Copyright IOP Publishing 2013. (b) By formulating an adjoint PDE, derivatives of quantities of interest such as time-integrated absorption of incident radiation in a particular region with respect to model parameters may be obtained from a single evaluation of the adjoint equations. Adapted with permission from K. D. Humbird and R. G. McClarren, High Energy Density Phys. 22, 12 (2017). Copyright 2017 Elsevier. See discussion of Sec. IV D for further details.


As mentioned in Sec. II D, gradient assisted MCMC is a useful way to improve efficiency of sampling algorithms. When available, the adjoint equations to a PDE allow for direct computation of derivatives of quantities of interest with respect to input parameters. For example, the study164 shown in Fig. 14(b) used analytic adjoint methods to conduct sensitivity analyses of flux-limited radiative diffusion. This work demonstrated that a full adjoint-based sensitivity analysis significantly outperformed even a single variable sensitivity analysis based on finite differencing of the forward model. In general, providing adjoint equations analytically by hand may be difficult. Recently, the authors of Ref. 165 demonstrated the use of JAX,166 an automatic-differentiation programming framework, to conduct fully differentiable kinetic simulations of the Vlasov–Poisson–Fokker–Planck equations. This code was paired with a neural network trained to tune a forcing function to minimize a physics cost function, resulting in the discovery of a novel nonlinear physics response. Neural networks also allow for rapid gradient computations through the chain rule (using an approach commonly called back-propagation in the machine learning community167). This can be achieved as a result of using a series of simple operations with known derivatives. As an example, by replacing a physics forward model with an ANN trained on simulation data, the authors of Ref. 64 were able to use the gradient-based NUTS54 method for MCMC using the python packages keras168 and pymc.49,50
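As a small, generic illustration (unrelated to the specific codes cited above) of how automatic differentiation supplies the gradients used by samplers such as NUTS or by adjoint-style sensitivity studies, JAX returns the exact gradient of a chi-squared objective through a differentiable toy forward model.

# Gradients of a chi-squared objective through a differentiable forward model with JAX.
import jax
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 50)
y_obs = 2.0 * jnp.exp(-3.0 * x)              # synthetic "measurement"

def forward(params):
    # Toy differentiable forward model: exponential decay with two parameters.
    amp, rate = params
    return amp * jnp.exp(-rate * x)

def chi2(params):
    resid = (forward(params) - y_obs) / 0.05
    return 0.5 * jnp.sum(resid ** 2)

grad_chi2 = jax.grad(chi2)                   # exact gradient via automatic differentiation
params = jnp.array([1.5, 2.0])
print(chi2(params), grad_chi2(params))       # usable by gradient-based MCMC or optimizers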

Before concluding, we highlight that MF modeling ideas have also recently been applied to accelerate MCMC by reducing the number of high-fidelity model evaluations required. We refer the reader to Sec. 4 of the review in Ref. 169 and citations provided in that work for more details on such multi-level MCMC schemes. Given the range of model fidelities and associated computational costs, we believe these methods are an excellent candidate for application to problems in HEDP and ICF physics.

We note explicitly that this section has omitted a review of physics model simplifications that do not rely on the above surrogate modeling approaches. Although not covered here, such models will continue to play a significant role in enabling computationally tractable statistical inferences while allowing for detailed physical insights. Generally, the methods and examples highlighted in this section seek to balance the trade-off between physical fidelity and computational complexity incurred by the use of emulators to enable statistical inference. As such, it is important to consider when and how the resulting bias and uncertainty may be incorporated into the analysis to be conducted. However, we emphasize that the considerations presented here are not unique to data-driven modeling, but apply equally to physics-motivated approaches.

In regard to uncertainty quantification, models such as GP regression and BMARS are formulated using a Bayesian framework. These models come equipped with a posterior distribution over regression functions (and thus a posterior predictive interval), allowing for direct incorporation of uncertainty through sampling from said posterior distribution. In the context of other machine learning methods, such as NN regression, multiple approaches of varying degrees of rigor are available. For example, the model may be fit to training data using a Bayesian approach in which, as discussed earlier in this Review, the loss function is viewed as being proportional to a log-likelihood, leading to Bayesian NNs (BNNs).171,172 BNNs come equipped with posterior distributions for the NN weights and biases, allowing for a direct sampling approach as with GP regression. Alternately, NN weight dropout methods may be used to approximate the Bayesian approach.173 In Fig. 15(a), we indicate the approach to UQ taken in Refs. 64 and 170. In particular, the authors of those works utilize direct computation of out-of-sample error estimates from holdout data not used to train the models and then characterize the variance of the employed emulator about the ground-truth model. This requires assumptions on the statistics of the error term, assumed normal in the above two citations, so that $y_\mathrm{emulated} = y_\mathrm{ground\,truth} + \mathcal{N}(0, \Sigma_\mathrm{est})$, where $\Sigma_\mathrm{est}$ is estimated on holdout data not used for training. The authors of Ref. 105 developed a more flexible approach, indicated in Fig. 15(b), by training two NNs on different loss functions to provide point and interval estimates. The authors are able to reduce the sensitivity to parametric error models by training directly on estimated distributions of the error for given input samples. In all cases, we note that caution must be applied in tasks involving model extrapolation (i.e., out-of-sample model evaluation) and in interpreting uncertainty. The choice of model always imposes a bias. If this bias is not well matched to the data, then one cannot expect model extrapolation and any associated posterior predictive intervals to appropriately model the data or give reasonable estimates of uncertainty.
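A sketch of the holdout-based treatment described above, with a toy two-output model and a GP emulator standing in for the trained surrogate: residuals on unseen data provide an estimate of $\Sigma_\mathrm{est}$ that is added to the measurement covariance in the likelihood.

# Emulator-error covariance estimated from holdout residuals.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def ground_truth(x):
    # Placeholder for an expensive model with two outputs.
    return np.column_stack([np.sin(x), np.cos(2.0 * x)])

x_train = rng.uniform(0.0, 3.0, 60).reshape(-1, 1)
x_hold = rng.uniform(0.0, 3.0, 40).reshape(-1, 1)
emulator = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-8).fit(
    x_train, ground_truth(x_train.ravel()))

# Residuals on holdout data give an estimate of the emulator error covariance.
resid = ground_truth(x_hold.ravel()) - emulator.predict(x_hold)
Sigma_est = np.cov(resid, rowvar=False)

Sigma_meas = np.diag([0.02 ** 2, 0.02 ** 2])
Sigma_total = Sigma_meas + Sigma_est    # used in place of Sigma_meas in the likelihood
print(Sigma_total)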

FIG. 15.

(a) In Refs. 64 and 170, model error is assumed to follow a multi-variate normal distribution with the covariance matrix computed on an unseen test dataset. Reproduced with permission from Glinsky et al., Phys. Plasmas 27, 112703 (2020). Copyright 2020 AIP Publishing LLC. (b) In Ref. 105, two separate networks are trained to reproduce point and interval estimates. The result is a more flexible error model. However, extrapolation of the point estimate and intervals for error bounds to unseen regions of input parameter space should be treated with caution. Reproduced with permission from Thiagarajan et al., Nat. Commun. 11, 5622 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License.120 

As a simple example to demonstrate the impact of model-data mismatch, we train a zero-mean GP, $\mathcal{N}(\mu(x) = 0,\, k(x, x'))$, using the squared exponential kernel function,
$k(x, x') = \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right),$ (39)
with kernel scale parameter $\ell = 0.5$ on different data shown in Fig. 16. In the first row of Fig. 16, data follow f(x) = sin(x), while in the second row, data follow f(x) = sin(x) + 3, and in the first column, we train on only the two leftmost data points, while in the right column, we train on the full dataset. In the first case, the data follow a functional form that has zero mean with an output domain f(x) ∈ [−1, 1] and therefore matches the inductive bias imposed by the GP model reasonably well. As a result, even when training on a small number of data points, the model provides reasonably meaningful (although still poor) point estimates and uncertainties, but this is not the case in the second row of Fig. 16. We see in the second row that the GP regression is sufficiently flexible to interpolate the full dataset and provide reasonable uncertainty where the model is interpolative, but not extrapolative. Furthermore, with less data (lower left plot of Fig. 16), the posterior predictive interval does not share a particularly meaningful relation with the unseen data, providing only the information that the model lacks knowledge of data in the region x ≳ 0. Of course, in our somewhat contrived example, if sufficient data are available to estimate the mean and variance of observations, standard normal scaling (subtraction by the mean and division by the standard deviation) can be applied to enforce that the data better match our model bias. Regardless, we wish to highlight that these considerations are universal and not unique to the GP example represented in Fig. 16. Direct assessment of the bias and variance of out-of-sample data with respect to a model and strong inductive priors (e.g., physics constraints) are effectively the only safeguards against misinterpretation of model uncertainty. Methods such as cross-validation play an important role in understanding the generalizability of models fit to data.37–40,174 These techniques and others are also important in hypothesis testing and model selection, which are discussed briefly in Sec. VI A. We wish to emphasize that the particular choice of GP’s to demonstrate this feature was out of convenience. This is because GP’s allow for more direct access to the underlying model assumptions, making it more straightforward to emphasize the importance of out-of-sample testing. However, these considerations are critical regardless of the underlying model. A detailed discussion of the benefits and drawbacks of different approaches to uncertainty quantification for general machine learning methods is well beyond the scope of this Review. Rather, the intent here is to indicate key ideas to the reader. For further reading, we recommend a variety of textbooks that treat machine learning in a probabilistic (i.e., Bayesian) framework.37,39,40 For uncertainty quantification in deep learning, see the review in Ref. 175.
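The behavior described above can be reproduced in a few lines (plotting omitted; the kernel length scale is held fixed at 0.5 rather than optimized):

# Zero-mean GP with squared exponential kernel fit to data that do or do not
# match the zero-mean model bias (cf. Fig. 16).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

x_train = np.array([[-3.0], [-2.0]])         # two leftmost training points
x_test = np.linspace(-4.0, 4.0, 9).reshape(-1, 1)

for shift in (0.0, 3.0):                     # data from sin(x) and from sin(x) + 3
    y_train = np.sin(x_train).ravel() + shift
    # optimizer=None keeps the squared exponential length scale fixed at 0.5.
    gp = GaussianProcessRegressor(kernel=RBF(0.5), optimizer=None).fit(x_train, y_train)
    mu, sigma = gp.predict(x_test, return_std=True)
    # Far from the data the prediction reverts to the zero prior mean, which is
    # only meaningful for the zero-mean (shift = 0) dataset.
    print(f"shift={shift}: prediction at x=4 is {mu[-1]:+.2f} +/- {sigma[-1]:.2f}")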
FIG. 16.

The choice of the model form imposes a bias on predictions made. When that bias is insufficiently matched to the data, extrapolative prediction and uncertainty estimation from posterior predictive intervals cannot be expected to have a meaningful relation to unseen data. In more detail, note that in the left column, only the two leftmost data points are fit. In the upper plot, the assumption of zero mean and unit variance made in the GP is sufficiently well matched to the function f(x) = sin(x) from which the data were drawn. However, in the lower left plot, the data are drawn from f(x) = sin(x) + 3, shifting the mean away from zero; the model bias imposed by the GP is insufficiently matched to the data, and the uncertainty in regions of extrapolation is not meaningful. When fitting to more data in the right column, the GP is flexible enough to act as an interpolator, and in regions where the model is interpolating, the uncertainty is somewhat meaningful in both cases, while for extrapolation, the uncertainties are meaningful only in the case where the GP is sufficiently matched to the data.


Various model calibration approaches may be appropriate to aid in understanding the model bias in the context of experimental data or high-fidelity simulation. Transfer learning, for example, has shown significant promise by leveraging multi-fidelity data ranging from simple simulations through complex integrated experiments to make models more predictive. Typically, transfer learning is achieved by modifying the output of a model directly, either via a learned discrepancy or multi-fidelity terms as highlighted in Ref. 106 of Sec. IV A or by freezing all but the last few layers in a NN model and retraining to match data from a related target task.176–179 In these cases, the base task might be a simulation and the similar target to be learned could be a higher-fidelity simulation or experiment. As an alternate approach, calibration could be conducted on the inputs, including inputs that are, in principle, well controlled in the experiment. One may then attempt to learn a mapping from the known experimental inputs to the inputs needed to reproduce observed data. This approach, taken in Ref. 91 and discussed in detail in Sec. III, additionally allows for the exploration of explanatory degradation mechanisms, with the potential for developing data-driven physics insights. Critically, we emphasize that these methods should be viewed not as a replacement for but rather a complement to the toolsets currently employed across the range of theory, simulation, and experimental analysis to accelerate the design and discovery cycle. In particular, post-facto investigation of the learned models using, for example, feature importance metrics180 and correlation analysis181 may provide physicists with new insights into fundamental processes.
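A generic PyTorch sketch of the freeze-and-retrain pattern (not the specific workflows of Refs. 176–179): all but the final layer of a network assumed to be pre-trained on simulation data are frozen before retraining on a small, higher-fidelity or experimental dataset (random placeholders here).

# Transfer learning by freezing all but the last layer of a pre-trained network.
import torch
import torch.nn as nn

# Stand-in for a network already trained on abundant simulation data.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

for p in model.parameters():          # freeze every layer,
    p.requires_grad = False
for p in model[-1].parameters():      # then unfreeze only the final layer
    p.requires_grad = True

opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
x_target, y_target = torch.randn(16, 8), torch.randn(16, 1)   # small target-task dataset

for epoch in range(50):               # retrain the last layer on the target task
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_target), y_target)
    loss.backward()
    opt.step()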

As indicated in Secs. II and III, attempting to perform Bayesian inference in cases with large numbers of inputs will prove challenging due to the curse of dimensionality. This arises from the fact that volume grows exponentially quickly with dimensionality, making (probability) density estimation difficult. Furthermore, spurious background signals may contribute to significantly biased inferences, while noisy data may result in high-variance inferences. At the same time, high-dimensional experiment signals and synthetic simulation outputs can create memory usage and data storage problems, slowing or halting analysis. To alleviate these issues, a variety of expert-motivated and/or data-driven automated processing and featurization steps may be employed. Goals of these methods include reducing dimensionality of representations used for inference, accounting for instrument effects, filtering noise, subtracting background, and rejecting outliers. It is important to highlight that these methods may need to be applied to experimental or simulation derived data or both depending on the context to allow for efficient inference. In this section, we highlight a variety of applications demonstrating these approaches. We attempt to emphasize how these methods may enable or improve inference, although many examples presented stop short of performing a full inference with UQ. For clarity, we attempt to group the examples into two main categories. The first category is methods associated with dimensionality reduction. The second category is approaches to address background subtraction, noise reduction, and removal or assessment of instrumental effects that may adversely affect inferences. As has been the case throughout, we primarily restrict our consideration to HEDP and ICF relevant examples. However, many of the approaches discussed have been developed in diverse communities for applications ranging from digital signal processing to medical imaging. We again encourage the reader to refer to the wider literature for a more comprehensive perspective.

Dimensionality reduction refers to any process by which important information for a particular application is distilled into a smaller set of quantities for use in further analysis or data storage. Uses of dimensionality reduction range from enabling sparse measurements for high-fidelity state estimation and data compression to data visualization and avoidance of the curse of dimensionality in statistical analysis. As indicated in Fig. 17, dimensionality reduction methods can range from expert-motivated feature engineering approaches through purely data-driven feature or representation learning. A variety of methods are available to the practitioner, some of which are explicitly called out in Fig. 17.

FIG. 17.

Dimensionality reduction methods can range from expert motivated engineering of features through data-driven approaches, such as low-rank approximations of data through matrix decompositions and embedding methods that can learn to capture nonlinear structures in the data with reduced dimensional representations. A small subset of examples are given here with their corresponding category.

To illustrate how these ideas can aid in Bayesian inference, we begin with the example of tomographic inversion. Suppose one wishes to reconstruct a source distribution of emissivity (at a fixed spectral position ν) given by ϵν(x, y, z) from projections of the emission onto a number of lines-of-sight (LOS). Denoting the projection operator along the ith LOS by Pi, one typically proceeds by minimizing a loss function characterizing the difference between the modeled and observed projection,
$\mathcal{L}_\mathrm{tot} = \sum_i \mathcal{L}\!\left(\mathcal{P}_i[\epsilon_\nu],\, \mathcal{P}_i^\mathrm{obs}\right),$ (40)
where $\mathcal{L}$ is typically taken to be the mean square error and $\mathcal{P}_i^\mathrm{obs}$ is the observed projection. Note that this naturally promotes to a Bayesian inference by taking $\mathcal{L}_\mathrm{tot}$ to specify a likelihood function. Using a pixel-wise or voxel (3D pixel) basis, one then has to estimate the model inputs $\epsilon_\nu(x_i, y_j, z_k)$, which has $n_i n_j n_k \sim N^3$ input parameters. For even modest dimensions of, say, 64 cells in each cardinal direction, one must infer emissivity for over 250 000 voxels, quickly making Bayesian posterior estimation unviable as resolution increases. Furthermore, in HEDP and ICF experiments, only a sparse number of projected views is generally available to help constrain the full morphology of an object. Often, such tomographic inversion problems are ill-posed due to insufficient data. In Ref. 182, this is addressed by deriving 3D electron temperature maps from 2D projections using a voxel basis, as indicated in Fig. 18(a), while characterizing uncertainty using a bootstrapping39 method rather than Bayesian inference. In this case, variance in the solution on a voxel-by-voxel basis is assessed by adding Poisson distributed noise to experimental images before performing repeated tomographic inversion (using O(1000) realizations of images with noise added). The authors of Ref. 182 additionally assess systematic error incurred by using limited two and three lines-of-sight for reconstruction in this fashion. In this case, however, the rigorously defined uncertainty quantification from Bayesian inference is forgone in favor of a more flexible representation that may be able to better capture detailed features in the 3D object. We note that inspection of 1D lineouts from the bootstrapped data (as in Fig. 12 of Ref. 182) shows significant variance in the local structure, so that it is not clear if the structure is associated with the physical quantity of interest, with the noise added to the experimental images for the bootstrapping procedure, or with a possibly multimodal structure in the loss landscape. As an alternate approach involving a high-dimensional formulation, we note the work of Ref. 183, discussed in more detail in Sec. V B. Here, a deep-learning approach with a synthetic model is used to perform tomographic inversion. This approach corresponds to an implicit regularization for the inversion based on the network structure and training data. However, in the context of Bayesian inference, this suffers from difficulties of characterizing the high-dimensional posterior distribution of neural network weights.
FIG. 18.

Examples of tomographic inversion indicate different approaches to dimensionality reduction. (a) The authors of Ref. 182 choose a voxel basis to reconstruct electron temperature maps from 2D projections. The voxel size is matched to the resolution of the projections so that no dimensionality reduction is applied. Adapted with permission from K. W. Wong and B. Bachmann, Rev. Sci. Instrum. 93, 073501 (2022). Copyright 2022 AIP Publishing LLC. (b) The authors of Ref. 184 reconstructed x-ray and neutron emissivities by assuming a spherical basis function expansion. This is an expert-motivated choice, as the spherical target geometry is expected to result in nearly spherical emission volumes. Adapted with permission from Volegov et al., J. Appl. Phys. 122, 175901 (2017). Copyright 2017 AIP Publishing LLC.

By imposing constraints on the allowed modes in a spectral decomposition of the tomographic solution, the problem may allow for a well-behaved reduced dimensional solution amenable to Bayesian inference to be obtained. This approach has been applied to reconstruct x-ray and neutron sources from three views, for example, in Ref. 184. As shown in Fig. 18(b), a spherical harmonic basis expansion
$\epsilon_\nu(r, \theta, \phi) = \sum_{\ell, m} s_{\ell, m}(r)\, Y_{\ell, m}(\theta, \phi)$ (41)
was used. In this case, the input degrees of freedom are subsumed into the unknown radial (or corresponding spectral transform) dependence functions $s_{\ell,m}(r)$, reducing the input dimensionality from the number of voxels, $\propto N^3$, down to $\propto N$, i.e., linear in the number of bins in the radial (or spectral) coordinate.

In general, all of these approaches make significant trade-offs in representational flexibility and dimensionality with considerations for how uncertainty is quantified. It will be critical for this application space to compare different algorithms to better understand performance and reliability of uncertainty estimates. More generally, methods that enable accurate uncertainty quantification in tomographic inversion incorporating a variety of effects are of significant interest to the broader community and have been an active area of research in a variety of application spaces for some time.185–193 

Basis expansion methods are also commonly used to provide useful summaries of data for comparison in a variety of other contexts. For example, the authors of Refs. 194 and 195 analyze nonlinear harmonic generation and mode–mode interactions in growth of the magneto-Rayleigh–Taylor instability using a spectral approach indicated in Fig. 19(a). Methods for extracting spectral features rely on regression, as in the least-squares spectral analysis196 approach used in Ref. 195. Such methods, therefore, have natural Bayesian extensions197 (recall connection between loss function and log-likelihood), providing a route to rigorous uncertainty quantification, for example, of assessing growth rates of different modes. In another example, the authors of Ref. 198 built an emulator of the code HYADES for laser deposition in a radiating shock experiment to be used as input for the radiation hydrodynamics code CRASH. Note that this is an example of surrogating a physics submodule as discussed in Sec. IV A. CRASH requires full space-time dependent field information (velocity, density, pressure, etc.) from the HYADES output. However, the authors noted that CRASH was insensitive to many of the details of the full HYADES output. As a result, the authors were able to select a set of 40 parameters (position, density, velocity, and pressure at each of ten locations specified by physical conditions), which could be used to reconstruct the HYADES field quantities using linear interpolation. This constitutes using a triangle basis function expansion, where the basis function location and width depend on the simulation input parameters. The authors found that the free parameters (coefficients, basis function locations, and widths) in this expansion could be found and used to emulate field variables with sufficient fidelity, as shown in Fig. 19(b), for use as input to the code CRASH.

FIG. 19.

Basis decomposition methods may provide useful features for further analysis. (a) The authors of Refs. 194 and 195 used a spectral approach, providing a small set of highly interpretable physics motivated features to explore nonlinear harmonic generation and mode–mode interactions in the magneto-Rayleigh–Taylor instability. Adapted with permission from Ruiz et al., Phys. Rev. Lett. 128, 255001 (2022). Copyright 2022 American Physical Society. (b) The authors of Ref. 198 learned a triangle basis function representation of the output of the code HYADES of sufficient fidelity for use in the radiation hydrodynamics code CRASH. Adapted with permission from McClarren et al., Reliab. Eng. Syst. Saf. 96, 1194 (2011). Copyright 2011 Elsevier.

In many cases, basis expansion methods may not provide a straightforward means to characterize data for comparison. More generally, features may be computed by applying a function $F: \mathbb{R}^N \rightarrow \mathbb{R}^k$, $k \ll N$, to an experimental or synthetic dataset of dimension N to obtain a reduced representation of dimension k. The expert will generally wish to choose this mapping to give rise to a set of reduced dimensional quantities that are sensitive and constraining for the physics of interest. Physics intuition and synthetic data studies may both play a role in developing expert-motivated choices of F for this purpose. In this case, the log-likelihood function would be formulated as
$\log \mathcal{L} \propto -\mathcal{L}\!\left(F\big(y^\mathrm{model}(\theta)\big),\, F\big(y^\mathrm{obs}\big)\right),$ (42)
where the synthetic model features depend on parameters θ to be inferred. The difference between this equation and Eq. (40) is that both the observed data and synthetic data have an additional transformation applied before comparison. Note that a basis function expansion of the observations may be chosen as a special case of F, where the function F now maps the observation to a set of basis function coefficients. While we do not have explicit need to be more general for our discussion, note that transformations may be simultaneously applied to both observations and model parameters.
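A schematic of evaluating the likelihood on summary features rather than on raw signals; the toy forward model, the particular feature map F (peak height, centroid, and rms width), and the feature uncertainties are all hypothetical choices.

# Gaussian log-likelihood evaluated on summary features F(y) rather than raw signals.
import numpy as np

t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]

def synthetic_signal(theta):
    # Toy forward model: a Gaussian pulse with amplitude and width parameters.
    amp, width = theta
    return amp * np.exp(-((t - 0.5) / width) ** 2)

def F(y):
    # Hypothetical feature map: peak height, centroid, and rms width.
    area = y.sum() * dt
    centroid = (t * y).sum() * dt / area
    rms_width = np.sqrt(((t - centroid) ** 2 * y).sum() * dt / area)
    return np.array([y.max(), centroid, rms_width])

rng = np.random.default_rng(0)
y_obs = synthetic_signal((1.0, 0.1)) + 0.01 * rng.standard_normal(t.size)
sigma_feat = np.array([0.02, 0.005, 0.005])      # assumed feature uncertainties

def log_like(theta):
    resid = (F(synthetic_signal(theta)) - F(y_obs)) / sigma_feat
    return -0.5 * np.sum(resid ** 2)

print(log_like((1.0, 0.1)), log_like((0.8, 0.15)))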

We emphasize that the expert-engineered features discussed here provide highly interpretable results and will continue to prove useful for the community. For this reason, we now highlight several cases in the literature of choices for the function F before moving on to discuss details of data-driven dimensionality reduction. In the field of x-ray spectroscopy, spectral features, including linewidths, line-ratios, and continuum emission slope, constitute a small set of scalar quantities that are sensitive to and used for deriving estimates of various aspects of a plasma’s state and even geometry.199–202 Recent work has leveraged such summary features to obtain 2D electron temperature maps, with the ratio of the energy-integrated Bremsstrahlung signal after attenuation by two different filters providing the summary feature.203 Time dependent electron temperature of the ablating plasma in gas-filled hohlraums at the NIF has also been assessed using line-ratios from spectral features induced by inclusion of a mid-Z tracer layer204 [top panel of Fig. 20(a)]. Similarly, in inertial fusion experiments, fitting of neutron spectra in different contexts can be used to characterize ion temperature,205,206 ρR of the DT shell surrounding the hot-spot for implosions on OMEGA 60,207 magnetic flux entrained in a premagnetized fusion fuel,64,80,81,208,209 residual kinetic flows in the hot-spot,210–212 and, using neutron backscatter spectra, the dense fuel layer.213 Summary features have played a similar role in the interpretation of neutron spectra. For example, the authors of Refs. 64 and 81 perform parameter inference directly on features characterizing the width and asymmetry of secondary DT neutron spectra and nToF measurements.

FIG. 20.

(a) Using line shape features induced by including a mid-Z tracer, the authors of Ref. 204 were able to extract the time-dependent electron temperature of the ablating plasma in gas-filled hohlraums at the NIF. Reproduced with permission from Barrios et al., Phys. Plasmas 23, 056307 (2016). Copyright 2016 AIP Publishing LLC. (b) By extracting the blastwave radius from shadowgraphy, the authors of Ref. 214 were able to compare to MHD simulations to infer laser energy deposited in a gas-filled target. Reproduced with permission from Harvey-Thompson et al., Phys. Plasmas 26, 032707 (2019). Copyright 2019 AIP Publishing LLC. (c) The authors of Ref. 215 extracted bubble, spike, and shock position to understand how energy fluxes produced by heat conduction and radiation impact growth of RTI. Adapted with permission from Kuranz et al., Nat. Commun. 9, 1564 (2018). Copyright 2018 Author(s), licensed under a Creative Commons Attribution 4.0 License.120 


Image data are also amenable to such hand-picked featurization. For example, the authors of Ref. 216 summarized self-emission x-ray images from MagLIF experiments utilizing brightness contours to define a plasma volume, similar to the extraction of volume from neutron imaging diagnostics in Ref. 217. In Refs. 170 and 218, features for self-emission x-ray images from MagLIF experiments were computed using a geometric parameterization to mimic the observed image structure and Fourier analysis of axially resolved radially integrated intensity and brightness centroid, respectively. As another case, the blastwave radius may be extracted from multi-frame shadowgraphy images in experiments designed as surrogates to assess the laser preheat energy deposited in MagLIF214 as seen in Fig. 20(b).

Feature engineering of image data has also played a significant role in the study of hydrodynamic instabilities relevant in a variety of HEDP settings ranging from astrophysical phenomena, such as supernova explosions,215,219–221 to their role in degrading ICF performance.222 These reduced features have proved significant in astronomical observations, laboratory experiments, and in the study of numerical and analytic calculations. For example, bubble-spike amplitudes and mix-layer width in classical-/ablative-/magneto-Rayleigh–Taylor (RT) and Richtmyer–Meshkov (RM) instabilities are commonly used to assess growth rates,62,215,221,223–228 an example of which is shown in Fig. 20(c).

Whether it is some hand-chosen, highly reduced features sensitive to a particular quantity or a hand-picked basis (generally chosen on physics-based grounds) for reconstruction, the types of features discussed may constitute a dramatic reduction in the overall dimensionality. When this dimensionality reduction applies to the model parameters themselves, as in the case of basis expansion methods in tomographic inversion, one may avoid difficulties with the curse of dimensionality in sampling from the Bayesian posterior distribution of those parameters. When applied to observables or simulation state variables, dimensionality reduction may aid in data storage and retrieval, reduce computational costs, and address the likelihood weighting problem discussed in Sec. III. Furthermore, in some cases, such as when dealing with image data, dimensionality reduction may be thought of as providing a metric for comparison under a likelihood function. This is useful because pixel-wise mean-square error can be highly sensitive to small deformations of the image, such as a slight shift in registration, and is known to be a poor choice in many cases.229 However, the effectiveness of such approaches will, in general, depend on the chosen representation. As a simple example, a propagating shock may be well described by the shock front location and density jump with just a few parameters, but a Fourier representation will require infinitely many modes to capture the discontinuous jump. For more complex systems, it may not be possible for an expert to discover a parsimonious description of the system. It is therefore advantageous to develop machine learning approaches that may be applied directly to the data to extract the most important features for the problem or task under consideration. We will refer to this approach as data-driven feature learning.
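To make the representation dependence concrete, the short Python sketch below (a toy example, not drawn from the cited works) compares the two descriptions of a propagating shock mentioned above: a two-parameter description (jump location and amplitude) is exact, while a truncated Fourier series converges slowly near the discontinuity.

```python
import numpy as np

# Toy 1D "shock": a density jump of amplitude 3 located at x = 0.6, which is
# described exactly by just two parameters (jump location and amplitude).
x = np.linspace(0.0, 1.0, 1024, endpoint=False)
rho = 1.0 + 3.0 * (x > 0.6)

# Truncated Fourier reconstructions of the same profile.
coeffs = np.fft.rfft(rho)
for r in (8, 64, 512):
    truncated = np.zeros_like(coeffs)
    truncated[:r] = coeffs[:r]                 # keep only the first r modes
    rho_r = np.fft.irfft(truncated, n=x.size)
    err = np.max(np.abs(rho_r - rho))
    print(f"{r:4d} Fourier modes: max reconstruction error = {err:.3f}")
# The error near the discontinuity decays very slowly (the Gibbs phenomenon),
# in contrast to the exact two-parameter description of the same profile.
```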

As a first case, we consider matrix factorization methods. These methods seek to represent a dataset as a sum over a reduced set of basis vectors extracted directly from data, possibly with constraints as in the case of non-negative matrix factorization methods (see Ref. 230). Perhaps the most common and important of these methods is principal component analysis (PCA), which is also known as the proper orthogonal decomposition (POD) and Karhunen–Loève expansion depending on the particular application.37,109 The basic algorithm for PCA is as follows: Given a dataset (assumed to be zero-mean) represented by a matrix $X\in\mathbb{R}^{n\times m}$, where n is the number of data samples and m is the number of features (typically the number of pixels in an image or the number of time-series data points), we may compute the eigenvectors and eigenvalues (known as the principal vectors and principal values, respectively) of the covariance matrix $X^TX$. It is not hard to show that the first principal vector, associated with the largest principal value, is oriented along the direction of maximum variance in the data, the second principal vector and value with the direction of second largest variance (after projecting out the first principal component), etc.37 From a practical standpoint, one often uses the singular value decomposition (SVD), which computes the factorization $X = U\Sigma V^T$, where $U\in\mathbb{R}^{n\times n}$ and $V\in\mathbb{R}^{m\times m}$ are orthogonal matrices and $\Sigma\in\mathbb{R}^{n\times m}$ is a diagonal matrix. The columns of V are then the principal vectors, and the diagonal entries of Σ, referred to as singular values, are the square roots of the principal values. By truncating the representation to the first r principal vectors, one obtains $X \approx X_r = U_r\Sigma_r V_r^T$, where $U_r\in\mathbb{R}^{n\times r}$ contains the first r columns of U, $\Sigma_r\in\mathbb{R}^{r\times r}$, and $V_r\in\mathbb{R}^{m\times r}$. $X_r$ provides the provably best (in the sense of minimizing the Frobenius norm of $X - X_r$) rank-r approximation to X. This is an r-dimensional representation that explains a fraction
$$f_r = \frac{\sum_{i=1}^{r}\sigma_i^2}{\sum_{i=1}^{\min(n,m)}\sigma_i^2} \tag{43}$$
of the total variance in the original data X, where $\sigma_i$ are the singular values. Provided that $f_r \approx 1$ with $r \ll m$, a significant dimensionality reduction is achieved. In Fig. 21(a), we show the principal components of a dataset drawn from a 2D multi-variate normal distribution. The dark blue vector shows the direction of maximum variance in the data, and the orthogonal projection of the data onto this vector constitutes a dimensionality reduction that captures a significant portion of the information contained in the data.
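The following short numpy sketch (synthetic data only) carries out the steps just described: PCA via the SVD, the explained-variance fraction of Eq. (43), and the rank-r reconstruction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic zero-mean data matrix X (n samples by m features) with three
# dominant linear directions plus a small amount of noise.
n, m, r_true = 500, 64, 3
X = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, m))
X += 0.05 * rng.standard_normal((n, m))
X -= X.mean(axis=0)

# PCA via the singular value decomposition X = U S V^T.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Fraction of total variance explained by the first r principal vectors, Eq. (43).
r = 3
f_r = np.sum(S[:r] ** 2) / np.sum(S**2)
print(f"variance fraction captured by r = {r}: {f_r:.4f}")

# Rank-r reconstruction X_r = U_r S_r V_r^T and the r-dimensional scores.
X_r = (U[:, :r] * S[:r]) @ Vt[:r, :]
scores = X @ Vt[:r, :].T          # reduced representation of each sample
print("relative Frobenius error of the rank-r approximation:",
      f"{np.linalg.norm(X - X_r) / np.linalg.norm(X):.4f}")
```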
FIG. 21.

(a) Principal component analysis provides a set of basis vectors ordered by the amount of variance explained. For $x\in\mathbb{R}^n$, projecting onto a subset of the first r < n principal directions provides an optimal (in the sense of minimizing residual variance) reduced-dimensional linear representation of the data. When applied to snapshots of time series data from dynamical systems, PCA is referred to as proper orthogonal decomposition (POD) and captures coherent structures. (b) The first (top) and third (bottom) principal components in the streamwise (left) and normal (right) directions for turbulent flow past a cylinder. Reproduced with permission from Wang et al., Comput. Methods Appl. Mech. Eng. 237–240, 10 (2012). Copyright 2012 Elsevier. (c) Manifold learning seeks to find a nonlinear representation of data (red) capable of capturing the intrinsic dimension of the manifold (blue) on which the data lie. (d) Application of manifold learning from Ref. 94, where the authors trained a multi-modal Wasserstein auto-encoder to compress data from ICF simulations. The method learned to compress the data by a factor of 500×. Adapted with permission from Anirudh et al., Proc. Natl. Acad. Sci. U. S. A. 117, 9741 (2020). 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License.120 


In Fig. 21(b), we show an example of the first (top) and third (bottom) principal components in the streamwise (left) and normal (right) directions for turbulent 3D fluid flow past a cylinder taken from Ref. 231. Since these modes are computed using snapshots of dynamical data, they are referred to as POD modes. Notice that the modes capture large-scale coherent structures that are observed across time in the fluid. Furthermore, the flow is highly asymmetric on either side of the cylinder, so that a Fourier decomposition would be a less appropriate descriptor of the data (i.e., it would require incorporating more components for the same level of accuracy). Despite being a very common starting point, explicit use of PCA in HEDP and ICF applications appears to be uncommon. We do, however, note Refs. 232–235, where the authors compared inference of X-pinch plasma properties from spectroscopy using a principal component decomposition of the non-local thermodynamic equilibrium spectra against a more traditional line-ratio analysis. We also note the work of Ref. 180, which utilized PCA and a sparse formulation of PCA to identify clusters of related design variables for ICF experiments on the NIF.

We pause to reflect on the fact that PCA/SVD may be thought of as representing a particular data sample from X, say $x_n$, by $x_n \approx \sum_{i=1}^{r} a_i^{(n)} v_i$, where $a_i^{(n)} = U_{n,i}\,\sigma_i$ are the weights of the basis vectors $v_i = \left(V^T\right)_{i,:}$. Here, the subscript {i, :} indicates the vector given by taking the ith row of $V^T$. PCA guarantees that $v_i \cdot v_j = \delta_{i,j}$ (by orthogonality of V). This decomposition, therefore, bears significant resemblance to a variety of methods such as Fourier decomposition. However, PCA may be more appropriate for certain applications as the linear basis vectors are learned directly from data to optimally explain variance. Matrix decomposition methods come in a variety of forms and have found a wealth of applications (cf. the discussion of POD and DMD in Sec. IV B). Given the ubiquity and success of these methods, the reader is encouraged to consult the wider literature and statistics and machine learning textbooks37,38,40,109 where a plethora of applications, extensions (such as probabilistic PCA236), and limitations can be more fully appreciated. We especially highlight that the authors of Ref. 109 pointed to many excellent references, especially from the CFD community.

In general, matrix factorization approaches will fail to produce significant dimensionality reduction in cases with significant nonlinear correlations within a particular dataset. Strong nonlinearity of the correlations manifests as a singular value spectrum that decays very slowly with increasing rank r of the representation. Fortunately, there is strong empirical evidence for the “manifold hypothesis,”237 which suggests that natural signals tend to cluster about surfaces or manifolds of significantly lower dimension than the naive dimension of the data. To discover and exploit this structure, a vast number of “manifold learning” or non-linear dimensionality reduction methods have been developed, such as local linear embedding,238 t-distributed stochastic neighbor embedding,239 UMAP,240 kernel-PCA,241 Gaussian process latent variable models,242 auto-encoders,243 and many others.244,245 Figure 21(c) demonstrates the general goal behind manifold learning as a dimensionality reduction technique. In Fig. 21(c), the data appear to be two dimensional, yet a one-dimensional coordinate under a nonlinear mapping is sufficient to capture most of the structure in the data, with the residual structure corresponding to random fluctuations.

Auto-encoders have proven to be a particularly popular approach to learning underlying nonlinear structure owing, in part, to the fact that they can easily be constructed using auto-differentiable programming. This allows for their incorporation in reduced order modeling (cf. discussion in Sec. IV B). Furthermore, they are commonly extended to serve as generative models37–40,246 for use in sampling and Bayesian inference. In Fig. 21(d), we highlight the work of Ref. 94, where the authors trained a multi-modal Wasserstein auto-encoder with a 32-dimensional latent space on a combination of image and scalar data whose naive dimension, given by four 64 × 64 pixel images plus 15 scalars, is approximately 16 400. This constitutes a more than 500× reduction in the data output size. The authors then utilized the learned latent representation to train an invertible emulator of a semi-analytic numerical simulator on the latent space with enforced cycle consistency, improving the generalization error of the emulator.
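As an illustration of the general idea (and not of the multi-modal Wasserstein architecture of Ref. 94), the sketch below trains a small dense auto-encoder in keras on synthetic profiles whose intrinsic dimension is two; the layer sizes and latent dimension are arbitrary choices, and the example assumes TensorFlow/Keras is available.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy dataset: 1D profiles of naive dimension 256 that actually live on a
# two-parameter manifold (bump position and width).
rng = np.random.default_rng(2)
n, m = 4000, 256
grid = np.linspace(0.0, 1.0, m)
pos = rng.uniform(0.2, 0.8, size=(n, 1))
wid = rng.uniform(0.02, 0.10, size=(n, 1))
X = np.exp(-0.5 * ((grid[None, :] - pos) / wid) ** 2).astype("float32")

latent_dim = 2  # arbitrary choice for this sketch

encoder = keras.Sequential([
    keras.Input(shape=(m,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(latent_dim),
])
decoder = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(m),
])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)

# The encoder output is the learned nonlinear reduced representation.
print("latent codes for the first few profiles:\n",
      encoder.predict(X[:3], verbose=0))
```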

It is beyond the scope of our review to pursue a rigorous and detailed explanation of the incorporation of the topics of Sec. V into a Bayesian inference framework. However, we note that experimental and synthetic data features are often extracted using an algorithm or model and, where possible, should be equipped with some form of uncertainty obtained, e.g., via Bayesian fitting or hyperparameter sensitivity analysis in the extraction of features from data. See, for example, Ref. 247, which discusses calibration of uncertainty for the manifold and cyclically consistent (MaCC) architecture.94 We also observe that significant recent effort has been placed on the development of generative models (see the textbooks of Refs. 37–40 and 246 for more details and references) that provide tractable methods for treating the original data space and latent space in a rigorous probabilistic manner. This enables their direct incorporation into Bayesian inference. We anticipate that as these methods continue to mature, they will be more widely adopted in the HEDP and ICF community. We also remind the reader that there is an inherent connection between parameter optimization, as in χ2 minimization, and Bayesian inference via maximum-likelihood estimation, as discussed in Sec. II. We hope that knowledge of this intuitive connection will help to guide the reader in further investigation of this topic in these more advanced cases. We also advocate for readers to attempt to reformulate analyses they have conducted or observed in the literature in terms of the Bayesian framework to better understand these connections.

In this section, we detail additional aspects that the practitioner should consider when attempting to perform Bayesian inference with experimental data. Without too much loss of generality, we take an additive model for an experimentally observed quantity as
$$O_i^{\mathrm{obs}} = S_i^{\mathrm{obs}} + B_i^{\mathrm{obs}} + N_i^{\mathrm{obs}}, \tag{44}$$
where $S_i^{\mathrm{obs}}$, $B_i^{\mathrm{obs}}$, and $N_i^{\mathrm{obs}}$ are the signal of interest (henceforth referred to as the signal), unwanted background signal (henceforth background), and noise, respectively. Note that in Eq. (44), we have suppressed any spatiotemporal dependence that may be present. If we consider the framework of Sec. V A, we see that, in general, we will not be able to fairly compare $F(O_i^{\mathrm{model}}(\theta))$ and $F(O_i^{\mathrm{obs}})$ under a likelihood function in order to infer θ. Indeed, even before applying the function F, $O_i^{\mathrm{model}}(\theta)$ and $O_i^{\mathrm{obs}}$ may not be comparable without properly accounting for background and instrument response. As addressing these aspects is integral to being able to perform accurate inference, we now present several approaches in the literature that can be used to avoid these issues. We roughly categorize these methods into those that deal with the noise term, the background term, and impacts of the instrument response on the signal term.
To begin, we will work with a simplified version of Eq. (44),
$$O_i^{\mathrm{obs}} = S_i^{\mathrm{obs}} + N_i^{\mathrm{obs}}, \tag{45}$$
assuming the background to be negligible or non-existent. Taking the observations to be image data, we note that image comparison is often best done using a non-pixel-based representation.229 Here, we follow the work of Ref. 248, which makes use of a representation known as the Mallat Scattering Transform (MST). The MST has a number of desirable features as an image metric, for example, having been shown to be continuous to small image deformations.249 This metric is used in Ref. 248 to directly compare MagLIF stagnation images and test sensitivity to a variety of realistic features, such as image noise and instrument effects. In this case, one may think of the function F (here the function computing the MST) as mapping from spatial information to localized scale information, i.e., a space-scale representation. One may then ask if the application of F to images sensitively depends on the signal-to-noise ratio (SNR). Using ideas of data augmentation from machine learning, the authors compared two different scans of the same image plate, with nominally the same morphology, differing only in the SNR. Defining the SNR by
(46)
where σ(·) computes the variance in the noise, the authors found that $F(S_i^1+N_i^1) \neq F(S_i^2+N_i^2)$ for signals $S_i^1 \approx S_i^2$ when $\mathrm{SNR}_1 \neq \mathrm{SNR}_2$. This is due to the fact that the MST is highly sensitive to statistical texture, so that otherwise identical images could show significant discrepancy in this space when exhibiting different SNRs, an example of which is shown in Fig. 22. As a result, denoising before computing the image metric was needed to ensure a fair comparison between pairs of images taken from a combination of experiment and simulation using this metric in the presence of different SNRs. Ensuring such a fair comparison would be important for allowing Bayesian inference using this metric in the likelihood function in any future work. As another example where noise reduction plays a role, real-space filtering methods were applied in Ref. 250 to recover image features from gated x-ray detector data collected on NIF implosions. The improved SNR allowed for the accurate extraction of Legendre mode amplitudes fit to an intensity contour characterizing low-mode asymmetry with uncertainty, which may then be used in a Bayesian inference.
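A minimal sketch of the role of real-space filtering is given below; it applies a median filter from scipy to a toy noisy image and reports an illustrative SNR proxy (peak signal over residual noise rms), which is not the specific definition of Eq. (46) used in Ref. 248.

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(3)

# Toy emission image: a smooth bright column with Poisson counting noise.
ny, nx = 128, 64
yy, xx = np.mgrid[0:ny, 0:nx]
truth = 50.0 * np.exp(-0.5 * ((xx - nx / 2) / 6.0) ** 2)
image = rng.poisson(truth).astype(float)

# Real-space filtering applied before computing any image metric or contour fit.
filtered = median_filter(image, size=5)

def snr_proxy(img):
    """Illustrative SNR proxy: peak of the known truth over residual noise rms."""
    return truth.max() / (img - truth).std()

print(f"SNR proxy before filtering: {snr_proxy(image):5.1f}")
print(f"SNR proxy after  filtering: {snr_proxy(filtered):5.1f}")
```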
FIG. 22.

The authors of Ref. 248 demonstrated that real-space filtering should be applied to MagLIF stagnation images to avoid undue influence of the signal-to-noise ratio in comparing morphology of images between experiments or between simulation and experiment when using a wavelet-based image metric known as the Mallat scattering transform (MST). In the top row, two scans of the same image plate are seen to induce different SNR while leaving the structure of the underlying signal unchanged. The left two plots of the bottom two rows compare MST coefficients between unfiltered and filtered images, indicating that filtering aids in improving identifiability. Adapted from Lewis et al., “A framework for experimental-data-driven assessment of magnetized liner inertial fusion stagnation image metrics,” arXiv:2303.15680v1 [physics.plasm-ph].


Returning to Eq. (44), we notice that similar considerations apply for the background term. Note that when working directly with $O_i^{\mathrm{obs}}$, if one has a model for both $S_i$ and $B_i$, it is possible to apply Bayesian inference directly, inferring posterior distributions for signal and background model parameters simultaneously. However, there are several reasons such an approach may not be desirable. First, referring back to Sec. V A, one sees that incorporating background model parameters will increase the problem dimensionality, potentially resulting in difficulties from the curse of dimensionality. Second, returning to the case of image data, it is often useful to work with a transformed quantity $F(O_i^{\mathrm{obs}})$. As a result, steps must be taken to ensure that the background term does not introduce strong bias into the inference process. For this reason, we now discuss several different conceptual approaches to handling the background term to allow for accurate inference.

One common approach to addressing background due to spatial non-uniformities caused by instrument and invariant environmental effects is to perform direct measurements for use in a process called flat-field calibration.251–253 Similar procedures can be followed to obtain a spectral normalization that corrects for spectral non-uniformities by fielding reference samples. In Refs. 254–256, this is achieved by fielding a reference sample in tandem with a sample of interest on opacity experiments conducted at Sandia’s Z Facility. The authors then performed a statistical analysis to obtain a spectral reference with uncertainty across many experiments, which enabled characterization of shot-to-shot reproducibility. These approaches are attractive due to their effectively model-free characterization of the background, though we note that an additive or multiplicative background is typically assumed, depending on context. Furthermore, with a detailed understanding of the statistics of the noise sources, one may easily adapt such methods to a Bayesian context to propagate uncertainty from the background subtraction. This may be achieved either by parameterizing the uncertainty or by sampling directly from experimental observations to mimic sampling from a prior distribution.
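The basic flat-field correction is conceptually simple; the following minimal Python sketch (with entirely synthetic detector frames, where the gain map, dark level, and noise levels are arbitrary assumptions for illustration) shows the standard offset-subtraction and gain-division step, which could be wrapped in a Monte Carlo loop over calibration uncertainties to propagate errors as described above.

```python
import numpy as np

rng = np.random.default_rng(4)
shape = (128, 128)

# Hypothetical detector: per-pixel gain map and a uniform dark (offset) level.
gain_true = 1.0 + 0.1 * rng.standard_normal(shape)
dark_level = 100.0

# Calibration frames: a dark exposure and a uniformly illuminated flat field.
dark = dark_level + rng.normal(0, 2, shape)
flat = dark_level + 500.0 * gain_true + rng.normal(0, 2, shape)

# An experimental exposure of a localized source through the same detector.
yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
scene = 300.0 * np.exp(-0.5 * ((xx - 64) ** 2 + (yy - 64) ** 2) / 20.0**2)
raw = dark_level + scene * gain_true + rng.normal(0, 2, shape)

# Flat-field correction: subtract the offset and divide by the normalized gain.
gain_est = flat - dark
gain_est /= gain_est.mean()
corrected = (raw - dark) / gain_est

print("rms deviation from the true scene, uncorrected:",
      float(np.sqrt(np.mean((raw - dark - scene) ** 2))))
print("rms deviation from the true scene, corrected:  ",
      float(np.sqrt(np.mean((corrected - scene) ** 2))))
```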

Unfortunately, such direct measurements are often unavailable or costly to obtain, or they cannot be used consistently across multiple datasets due to varying environmental effects. Alternately, approximate background subtraction may be achieved by fitting a parameterized model (e.g., polynomial or exponential) to a region lacking the signal of interest. The fitted background is then subtracted, and any desired features are computed. This approach has the added benefit of incorporating models for additional sources of background signal that may not be present in a calibration experiment. However, this procedure fails to effectively capture uncertainty arising from the inability to perfectly fit the background. In order to account for such uncertainty, a statistical background fitting approach must be employed. Most relevant to the presentation here is a procedure for “Bayesian fitting” of the background signal. We note that this approach requires some assumptions to be well satisfied. First, in order to fit just the background, the support of the background must differ from that of the signal. As a concrete example, suppose that the observation is a function of space; then, one needs a set of points $X=\{x_i\}_{i=1}^{N_{\mathrm{BG}}}$ such that $S_i(X)\approx 0$ while $B_i(X)\neq 0$. Second, fitting the background independently of the signal and attempting to propagate uncertainty in an inference of the signal model parameters amount to an assumption of independence under the observed data. Specifically, one desires to marginalize the full joint distribution $p(\theta_S,\theta_B|D)$ of signal and background parameters conditioned on the data with respect to $\theta_B$ to obtain $p(\theta_S|D)$. Here, we assume $p(\theta_S,\theta_B|D_X)=p(\theta_S)\,p(\theta_B|D_X)\approx p(\theta_S)\,p(\theta_B|D)$. In other words, we assume that the background is constrained by the data in X. We then perform the marginalization as
$$\begin{aligned} p(\theta_S|D) &= \int p(\theta_S,\theta_B|D)\,d\theta_B \\ &= \int p(\theta_S|\theta_B,D)\,p(\theta_B|D)\,d\theta_B \\ &\approx \int p(\theta_S|\theta_B,D)\,p(\theta_B|D_X)\,d\theta_B \\ &\approx \frac{1}{N_s}\sum_{j=1}^{N_s} p(\theta_S|\theta_{B,j},D) \\ &\propto \frac{1}{N_s}\sum_{j=1}^{N_s} p\!\left(D-B(\theta_{B,j})\,\middle|\,\theta_S\right)p(\theta_S). \end{aligned} \tag{47}$$
In going from the first to second line above, we use the product rule of probability.37 We then employ the stated approximation going from line two to three and utilize MCMC samples from the posterior distribution $p(\theta_B|D_X)$ to approximate the integral, obtaining the fourth line. In the final line, we use Bayes’ rule while also writing out the explicit subtraction of the background resulting from the particular MCMC sample $\theta_{B,j}$.
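The sketch below illustrates the procedure of Eq. (47) on a synthetic trace: a linear background is sampled with a simple Metropolis sampler using only a signal-free window, and each background sample is propagated into a feature (here the integrated signal area) of the background-subtracted data. All model choices (linear background, noise level, step sizes) are hypothetical, and a production analysis would use a well-tested MCMC library.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic trace: a Gaussian signal on a slowly varying linear background.
t = np.linspace(0.0, 1.0, 400)
dt = t[1] - t[0]
signal = 5.0 * np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)
background = 2.0 + 1.5 * t
data = signal + background + rng.normal(0, 0.1, t.size)

# Signal-free window X used to constrain only the background parameters.
mask = (t < 0.25) | (t > 0.75)

def log_post(theta):
    """Log-posterior of the linear background (flat priors assumed)."""
    a, b = theta
    resid = data[mask] - (a + b * t[mask])
    return -0.5 * np.sum(resid**2) / 0.1**2

# Simple random-walk Metropolis sampler over (a, b).
theta, lp = np.array([2.0, 1.0]), -np.inf
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.02, size=2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
samples = np.array(samples[5000::10])          # discard burn-in and thin

# Propagate each background sample into a feature of the background-subtracted
# signal (here its integrated area), mimicking the marginalization in Eq. (47).
areas = [np.sum(data - (a + b * t)) * dt for a, b in samples]
print(f"inferred signal area = {np.mean(areas):.3f} +/- {np.std(areas):.3f}")
print(f"true signal area     = {np.sum(signal) * dt:.3f}")
```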

This is the type of approach used in Ref. 64, where backgrounds in secondary DT nToF data were fit using Bayesian inference to find background model parameters, as indicated in Fig. 23. By sampling from the posterior of background fit parameters, the authors of Ref. 64 obtained a distribution for the nToF shape quantities-of-interest, resulting in a significant dimensionality reduction in the observations to be fit (see Sec. V A). We note that this work made the further approximation that the likelihood of the background-subtracted data given the shape features of the signal $\theta_S$ was delta-function distributed about the shape features computed from the background-subtracted data. This amounts to assuming that the variance in the shape features is dominated by the uncertainty in the background fit, so that noise in the cumulative area under the nToF used to compute the shape features is negligible. The authors used the experimental feature distributions as observations equipped with an approximate uncertainty in a final Bayesian inference of fuel magnetization from nToF and neutron yield data. This was achieved by approximating the likelihood function with a multivariate normal over the experimental shape features, with the covariance matrix estimated directly from the MCMC samples of shape parameters obtained by the Bayesian background fitting procedure for a given experiment. Paired with a neural network surrogate of a physics forward model, this allowed for fast MCMC sampling and an assessment of the magnetic confinement parameter for the stagnated fuel in MagLIF experiments, as discussed in Sec. II E. We highlight that this inference incorporates uncertainty from both the surrogate model fit (cf. discussion in Sec. IV E) and the experimental feature extraction via the Bayesian background fitting discussed here.

FIG. 23.

Left: shape features of the secondary DT nToF signal were obtained with uncertainty in Ref. 64. The slowly decaying background signal (blue line inset) was parameterized and fit using a Bayesian inference. The posterior distribution of background parameters was sampled, and the cumulative area under the background subtracted nToF signal was computed with resulting percentiles being used to compute shape features. Right: By computing these shape features on samples from the background parameter posterior distribution, one may obtain a distribution of the shape features with quantified uncertainty. Here, we plot the obtained distribution of δAt, which characterizes the p25–p75 width of the nToF signal shown in the inset of the left panel. Reproduced with permission from Lewis et al., Phys. Plasmas 28, 092701 (2021). Copyright 2021 AIP Publishing LLC.


In general, the process specified in Eq. (47) requires identification of a subset of the observational data X not containing the signal of interest. This may be done manually if needed, but may also be automated using signal processing and machine learning methods. As an example of a machine-learning-based approach, Fig. 24(a) shows results from Ref. 257, where a CNN-based pipeline was used to segment self-emission x-ray images from MagLIF experiments into fuel and background. The segmentation allowed for background subtraction and statistical analysis of the noise and background. The resulting experimental distributions of SNR and of the coefficients of low-degree polynomials fit to the slowly varying background are shown in Fig. 24(a). In particular, the analysis in Ref. 257 demonstrated that the assumption of additive Gaussian noise is well matched to the experimental data, while also providing a multivariate normal approximation for the background coefficients. We emphasize that this type of analysis can provide quantitative guides for the selection of parameterized distributions for data augmentation in machine learning applications or as priors in Bayesian background subtraction procedures that quantify uncertainty. Such data augmentation is commonly employed in machine learning applications such as those in Refs. 170, 257, and 258. Ensuring that the data augmentation procedure is well matched to the experiment is critical. For example, it has been shown that convolutional neural network techniques for image analysis, such as classification and segmentation, may exhibit significant sensitivity to realistic noise.259 Augmenting the training data is a common approach to mitigating this sensitivity. While it is common practice to assume normally distributed noise, it is important to consider rigorous approaches to effectively capture the noise distribution associated with experimental data.

FIG. 24.

(a) In Ref. 257, a CNN is used to perform image segmentation (six images in upper left corner) so that statistics of slowly varying background (lower left) and SNR (lower right) may be assessed. The statistical properties may be parameterized [here using (multivariate) normal distributions] and applied in data augmentation of synthetic data for training ML models or as priors in a Bayesian background subtraction to quantify uncertainty. In the upper right panel, we show statistics of coefficients of polynomial terms fit to the slowly varying background. The plot inset in the upper right panel shows the computed covariance matrix. Adapted with permission from Lewis et al., J. Plasma Phys. 88, 895880501 (2022). Copyright 2022 Cambridge University Press. (b) Rather than attempting to model noise, background, and instrument response effects, the authors of Ref. 183 utilized GANs to learn a function that approximates the inclusion of realistic experimental data features in synthetically generated images. This approach may help to reduce bias in training ML models with synthetic data by better minimizing the difference between the experimental and synthetic data distributions. Adapted with permission from Wolfe et al., Rev. Sci. Instrum. 94, 023504 (2023). Copyright 2023 AIP Publishing LLC.


Rather than attempting to remove background and noise to ensure that an inference on a signal or derived quantity is insensitive to those features, one may instead try to construct a mapping F with the desired property that F(Si + Bi + Ni) ≈ F(Si). This is the type of approach undertaken in Ref. 183 and indicated in Fig. 24(b). In that work, generative adversarial networks (GANs) are used to learn a function that may be applied to synthetic images to approximate the background and noise features seen in the experiment. That is, $S_i^{\mathrm{model}} \rightarrow S_i^{\mathrm{model}} + B_i^{\mathrm{model}} + N_i^{\mathrm{model}}$ (note that an additive form is not necessary here and is not assumed in Ref. 183). Broadly speaking, GANs operate using a split generator and discriminator architecture to learn the desired mapping. The generator, G(x), takes some input (in this case, a synthetic image) and generates an output, y (in this case, a modified synthetic image). The discriminator, $D(\tilde{y})$, attempts to classify the output of the generator $\tilde{y}=G(x)$ as either an output of G or a member of the ground-truth dataset. The networks are trained using the feedback of one to inform the other so that G learns to better approximate the distribution of experimental data known to the discriminator. The learned generator function applies noise distributions to synthetic images that appear to arise from aspects of the measurement process that would be difficult or impossible to capture using an analytic forward model. The authors of Ref. 183 discuss the use of an additional neural network model applied to the GAN output to obtain high-resolution 3D reconstructions of ICF capsule density profiles from a sparse number of views or even a single view. Although more work will be required to validate the reconstructions, the use of realistic-looking GAN outputs in training the tomographic inversion is likely to reduce bias in the reconstruction. Before moving on, we note that, although not discussed explicitly in Ref. 183, generative models may be thought of as estimating the joint probability distribution over data and can serve as effective priors in Bayesian inference problems.260 

Finally, the instrument response should also be appropriately accounted for to avoid biased inference. This is usually done in one of two ways, depending on the application. Perhaps the easier of the two involves convolving a model of the instrument response function (IRF) with the forward synthetic model. The synthetic output may be directly compared to observations, which naturally include these instrument response effects. This is the approach taken, for example, in Refs. 64, 205, and 261 to derive synthetic nToF data from a physics model that includes response. The authors of Ref. 261 went a step further by propagating uncertainty associated with imperfect information on the IRFs in a Bayesian framework, arriving at results for ion temperatures that accurately reflect that uncertainty. Alternately, one may attempt to solve the inverse problem to deconvolve the experimental data using known responses, such as point spread functions, to remove instrument effects and obtain a ground-truth experimental signal. This is the approach taken in Refs. 262 and 263, where an iterative maximum likelihood technique is used to remove non-stationary distortions caused by the aperture in pinhole neutron imager data from NIF experiments. If the IRF is parameterized, then a prior distribution over those parameters may be established by Bayesian fitting to calibration data. If it is directly measured but not parameterized, the measurement uncertainty may be propagated through the forward model using Monte Carlo sampling.
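The forward-convolution approach is straightforward to implement; the sketch below convolves an idealized signal with a hypothetical parameterized IRF and propagates an assumed calibration uncertainty on the IRF width by Monte Carlo sampling. The pulse shapes and uncertainties are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(6)

# Idealized physics signal on a time axis (the units of ns are purely illustrative).
t = np.linspace(0.0, 200.0, 2001)
dt = t[1] - t[0]
ideal = np.exp(-0.5 * ((t - 100.0) / 4.0) ** 2)

def irf(width, tail=8.0, frac=0.2):
    """Hypothetical IRF: a Gaussian impulse response plus an exponential tail."""
    tau = np.arange(0.0, 30.0, dt)
    kernel = np.exp(-0.5 * (tau / width) ** 2) + frac * np.exp(-tau / tail)
    return kernel / kernel.sum()                 # preserve the integrated signal

def rms_width(s):
    centroid = np.sum(s * t) / np.sum(s)
    return np.sqrt(np.sum(s * (t - centroid) ** 2) / np.sum(s))

# Forward synthetic: convolve the ideal signal with the nominal IRF.
synthetic = np.convolve(ideal, irf(1.5), mode="full")[: t.size]

# Propagate an assumed 1-sigma calibration error on the IRF width by Monte Carlo.
widths = [rms_width(np.convolve(ideal, irf(rng.normal(1.5, 0.1)),
                                mode="full")[: t.size])
          for _ in range(200)]
print(f"synthetic rms width = {np.mean(widths):.3f} +/- {np.std(widths):.4f} ns")
print(f"ideal rms width     = {rms_width(ideal):.3f} ns")
```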

In this section, we have highlighted a unifying framework (for the case of additive signals) to understand the impact of realistic features in experimental data on dimensionality reduction and inference. We have presented a wide variety of methods spread across many subdisciplines that can be used to mitigate unwanted bias introduced from these features. It is our hope that the scope of applications is broad enough to provide good starting points for new practitioners regardless of their particular physics application.

In this section, we highlight more advanced applications of the concepts presented in this Review that have been applied in other fields and could be adapted to enhance the power of ICF and HEDP experiments.

The examples outlined in Secs. II E and III almost exclusively analyzed data with a single model to infer parameters of importance from an experiment. However, one of the great powers of the Bayesian formalism is that it provides a mechanism to quantitatively compare models against each other. While estimating parameters from data and calibrating models is a critical task, scientific discovery moves forward when a candidate hypothesis can be ruled out by existing data. Hypothesis testing, or Bayesian model selection, is a well-established methodology applied in many fields.264–267 

In order to do this, we must evaluate which model among a library of candidate models has more explanatory power, given the available data. We quickly realize that our models may have different numbers of parameters, and a model with more parameters could reasonably be expected to fit the data better than a model with fewer parameters, though this does not necessarily make it a better model. A naive example arises when fitting a noisy dataset with two different polynomial models. One would expect a higher-order polynomial to intersect more points than a linear model, producing a “better” fit. However, a high-order polynomial is prone to overfitting and will not necessarily explain the data better than the linear model. Appealing to Ockham’s razor, all things being equal, we prefer the simpler model. Bayesian model selection takes this effect into account when comparing models.
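This effect is easy to demonstrate numerically; the toy sketch below fits noisy data drawn from a straight line with a linear and a ninth-degree polynomial and compares the error on the fitted points with the error on held-out points.

```python
import numpy as np

rng = np.random.default_rng(9)

# Noisy data generated from a straight line.
def truth(x):
    return 1.0 + 2.0 * x

x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 200)
y_train = truth(x_train) + 0.3 * rng.standard_normal(x_train.size)
y_test = truth(x_test) + 0.3 * rng.standard_normal(x_test.size)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    rms_fit = np.sqrt(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    rms_heldout = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree}: rms on fitted points = {rms_fit:.3f}, "
          f"rms on held-out points = {rms_heldout:.3f}")
# A high-degree polynomial typically achieves a smaller error on the fitted
# points but a larger error on held-out points: the extra parameters chase the
# noise rather than the underlying trend.
```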

Revisiting Bayes’ theorem [Eq. (3)], if we have two different models hi, each with their own parameter vectors θi, we can write the posterior probability of each model as
$$p(h_i|D,I) = \frac{p(D|h_i,I)\,p(h_i|I)}{p(D|I)} \tag{48}$$
$$\propto p(D|h_i,I)\,p(h_i|I), \tag{49}$$
where the posterior on the left-hand side (LHS) now expresses our belief that model hi is correct, given some observations and our background data. The likelihood on the RHS expresses the probability that we would observe the data D, given the model and our background. We immediately notice that there is no dependence on θi in this expression. This is because we are interested in the probability of a model being correct regardless of the particular parameter values, so θi has been marginalized out. We can expand the likelihood in Eq. (49) to show the dependence as
$$p(D|h_i,I) = \int p(D|\theta_i,h_i,I)\,p(\theta_i|h_i,I)\,d\theta_i. \tag{50}$$
Due to the marginalization over θi, this term is often called the marginal likelihood. By inspection, we see that the integrand in Eq. (50) is equivalent to the numerator in Bayes’ theorem, where the hypothesis or model hi can be viewed as part of the background information I. Therefore, the marginal likelihood is nothing more than the evidence in Bayes’ theorem, acting as the normalization constant for the posterior.
It is apparent that if we are to prefer h1 over h2, then we would expect that p(h1|D) > p(h2|D), or alternatively the ratio of the posteriors p(h1|D)/p(h2|D) > 1. If we assume the prior probability of each model (how much we believe either model to be correct before seeing the data) is equal, determining the ratio of the posterior probabilities reduces to the ratio of the evidences, also known as the Bayes factor,
$$B_{12} = \frac{p(D|h_1,I)}{p(D|h_2,I)}. \tag{51}$$

Evaluation of the Bayes factor requires marginalization of the likelihood over the entire space of the parameters, which tends to be a very complex and costly integration. Many methods have been developed to perform or approximate this integration, such as MCMC,268 nested sampling,269 thermodynamic integration,270 and the Laplace approximation,35 among others.
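In very low dimensions, the evidence integral can simply be evaluated by quadrature, which the toy sketch below uses to compare a constant and a linear model of noisy data under flat priors; the models and prior ranges are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(10)

# Data generated from a weak linear trend with Gaussian noise of known sigma.
x = np.linspace(0, 1, 30)
sigma = 0.2
y = 0.3 * x + rng.normal(0, sigma, x.size)

def log_like_const(c):
    return -0.5 * np.sum((y - c) ** 2) / sigma**2

def log_like_line(m):
    return -0.5 * np.sum((y - m * x) ** 2) / sigma**2

def log_evidence(log_like, lo, hi, n=2001):
    """Marginal likelihood for a one-parameter model with a flat prior on
    [lo, hi], evaluated by direct quadrature (feasible only in low dimensions)."""
    theta = np.linspace(lo, hi, n)
    log_prior = -np.log(hi - lo)
    log_integrand = np.array([log_like(t) for t in theta]) + log_prior
    ref = log_integrand.max()                      # log-sum-exp for stability
    return ref + np.log(np.sum(np.exp(log_integrand - ref)) * (theta[1] - theta[0]))

ln_bayes_factor = (log_evidence(log_like_line, -1.0, 1.0)
                   - log_evidence(log_like_const, -1.0, 1.0))
print(f"ln B12 (linear vs constant) = {ln_bayes_factor:.2f}")
# Positive values favor the linear model in the sense of Eq. (51); likelihood
# normalization constants cancel in the ratio and are omitted here.
```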

Due to the expense of this integration, a number of approximate metrics have been developed to perform model selection. Two common metrics are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).28,39,40,271 Both metrics use the negative log-likelihood at the maximum as a surrogate for the posterior weight and add a penalty term for the number of parameters. The AIC can be written as
$$\mathrm{AIC} = 2k - 2\ln\mathcal{L}(\hat{\theta}), \tag{52}$$
and the BIC is defined as
$$\mathrm{BIC} = k\ln N - 2\ln\mathcal{L}(\hat{\theta}). \tag{53}$$
In both expressions, θ̂ is the MLE and k is the number of model parameters. In the BIC, N is the number of data points used in the inference. When comparing multiple models, the model providing the smallest value of AIC or BIC will be preferred. These metrics only require the likelihood be evaluated at the MLE, obviating the need for the complex and costly integration of the evidence.
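Computing these criteria is trivial once the maximized log-likelihood is in hand, as the short sketch below shows; the log-likelihood values and parameter counts are hypothetical.

```python
import numpy as np

def aic(log_like_max, k):
    """Akaike information criterion, Eq. (52): 2k - 2 ln L(theta_hat)."""
    return 2 * k - 2 * log_like_max

def bic(log_like_max, k, n_data):
    """Bayesian information criterion, Eq. (53): k ln N - 2 ln L(theta_hat)."""
    return k * np.log(n_data) - 2 * log_like_max

# Hypothetical maximized log-likelihoods for two competing fits of 500 points.
for name, ll, k in [("model A", -1210.3, 4), ("model B", -1211.0, 3)]:
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, 500):.1f}")
# The smaller value is preferred; BIC penalizes additional parameters more
# strongly than AIC for all but very small datasets.
```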

These methods are approximate, and importantly, they ignore all information about the posterior apart from the weight at the MLE. Other metrics have been developed to utilize more of the information provided by a chain of MCMC samples from the posterior. These methods, while still approximate, do take into account the full distribution of parameters provided by the chain, making them more accurate. Two of these methods are Leave-One-Out (LOO) cross-validation and the Widely Applicable (or Watanabe–Akaike) Information Criterion (WAIC). We refer the reader to Refs. 174 and 272 for detailed information on the theory and implementation of these metrics. For our purposes, we note that they utilize a chain of samples from MCMC for each model and can be readily computed with many modern statistical packages. To compare multiple models, we compute either the LOO or WAIC from the posterior samples of each. The preferred model is the one that produces the largest value of the chosen metric.
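For example, assuming the posterior samples are stored as ArviZ InferenceData objects containing pointwise log-likelihood values (as produced by libraries such as PyMC or NumPyro, and assuming a recent ArviZ version), the comparison can be carried out in a few lines; the toy data below only mimic that structure.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(7)

def toy_idata(scale):
    """Build a toy InferenceData carrying the pointwise log-likelihood group
    that LOO/WAIC require; in practice this comes from PyMC, NumPyro, etc."""
    n_chain, n_draw, n_obs = 2, 500, 50
    log_lik = -0.5 * (rng.normal(0.0, scale, size=(n_chain, n_draw, n_obs)) ** 2)
    return az.from_dict(
        posterior={"theta": rng.normal(size=(n_chain, n_draw))},
        log_likelihood={"y": log_lik},
    )

models = {
    "Normal + Brysk": toy_idata(1.10),
    "Normal + Ballabio": toy_idata(1.05),
    "Poisson + Ballabio": toy_idata(1.00),
}

print(az.loo(models["Poisson + Ballabio"]))    # PSIS-LOO for a single model
print(az.waic(models["Poisson + Ballabio"]))   # WAIC for a single model
print(az.compare(models, ic="loo"))            # ranked comparison across models
```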

To illustrate this, we revisit our example from Sec. II. In this example, we fit our neutron spectrum with multiple different models. The first was an approximate model for the neutron spectrum, which assumed that it took a Gaussian shape and used a normal likelihood. This model required four parameters to describe the spectrum. The second model used a normal likelihood but replaced the neutron spectrum with the Ballabio model, which reduced the number of model parameters from four to three. Finally, we implemented a Poisson likelihood in the inference with the Ballabio model. Figure 3 shows the posterior distribution of the ion temperature inferred with each model (as well as the Laplace approximation model). We can see that the Ballabio model with the normal likelihood produced a posterior for the ion temperature that was nearly identical to that produced using the Gaussian model, but required one fewer parameter to fit the data. The Poisson model produces a slightly narrower distribution. We can use LOO to quantify these differences and select the preferred model.

Figure 25 shows the LOO scores (circles) computed for each model along with the estimated standard error in the difference between each model and the highest ranked model (error bars). The Poisson model shows no error because it is the top-ranked model. We can see that the Ballabio model performs better than the Gaussian model, but the Poisson model is clearly superior. This agrees with our intuition, given the reduced uncertainty in the ion temperature demonstrated with this model and the fact that the data were generated using the Ballabio model as the ground truth with a Poisson process.

FIG. 25.

Circles show the log of the LOO score computed for the three different models tested. The error bars show the Monte Carlo error in the difference of the scores relative to the top ranked model. “Normal + Brysk” represents the model employing a Gaussian neutron spectrum and a normal likelihood. “Normal + Ballabio” represents the model using the Ballabio neutron spectrum and a normal likelihood. “Poisson + Ballabio” uses the Ballabio neutron spectrum and the Poisson likelihood. The Poisson model is clearly preferred over the other two.


A prominent example applying Bayesian model selection in the literature is in the detection of gravitational waves by the LIGO observatory and the identification of the event that produced the observed signal.273,274 An inference pipeline was developed that applies a library of models to the observed data, estimating the relevant parameters for each candidate model. Multiple independent methods were used to sample the posterior and evaluate the evidence to check for consistency. The event was identified as a binary black hole merger because this is the event model that produced the largest evidence, by a significant margin. This process allowed the team to simultaneously identify the event and estimate the relevant parameters of the system (initial black hole masses, final mass, final black hole spin, etc.). Similar approaches have been applied to other gravitational wave observations275 and other large astronomical observational endeavors, such as WMAP3.266 

In HEDP and ICF, we are faced with many models that are used to explain experiments and predict trends. Sometimes, these models are theoretical models with small numbers of parameters, but often, they take the form of multi-physics simulation codes, each of which employs different discretizations, physical approximations, and numerical choices that can impact the fidelity and applicability of the model to a specific domain or experiment. The example of Ref. 91 discussed in Sec. III developed multiple hypotheses for physical mechanisms, which were implemented in a simulation tool in an ad hoc manner. The experimental data were then used to select which hypothesized process was most consistent. Although Bayes factors were not computed, this application is a clear example of the power of this methodology to guide scientific discovery by systematically ruling out hypotheses that are inconsistent with data. By applying this framework to a more diverse array of experiments and physical processes, we can expect rapid progress in our understanding.

Building from the foundations of inference and hypothesis testing, we can use these tools not only to analyze experimental data but to intelligently guide how we conduct our experiments. By combining these ideas, we can construct a framework that allows us to decide where to obtain experimental data to maximally inform our models or provide data that maximally discriminate between competing model predictions. A fully Bayesian experiment design formalism accounts for all known uncertainties and sensitivities when selecting design points. It has the added benefit that the formalism used to determine the design points is readily able to ingest new experimental data once it is obtained to produce refined model predictions and propose new points.

We begin by proposing a utility function U(D|x) that quantifies the value of newly obtained data D at a given experiment design point x. This utility function is domain specific and must be engineered based on the problem at hand. However, some general metrics have been developed, which rely on various statistical approximations, most commonly the Laplace approximation. Two common metrics are A-optimality and D-optimality. The A-optimality criterion minimizes the trace of the covariance matrix, effectively minimizing the combined variance of each parameter in the model, ignoring correlations. The D-optimality criterion minimizes the determinant of the covariance matrix, allowing for the incorporation of linear correlations in the design. Variations on these criteria can be found in the literature. Alternatively, information gain can be maximized by using the Kullback–Leibler divergence as the utility function, a metric with roots in information theory. We refer the reader to Ref. 271 for more details on these and other criteria.
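For a Gaussian measurement model, these criteria can be evaluated from the Fisher information (whose inverse is the Laplace-approximation covariance). The sketch below compares two hypothetical measurement-time designs for a simple exponential-decay model; the model and designs are illustrative only.

```python
import numpy as np

def fisher_information(jac, sigma):
    """Fisher information for a Gaussian measurement model with
    sensitivity (Jacobian) matrix J = d(observables)/d(parameters)."""
    return jac.T @ jac / sigma**2

def a_optimality(fisher):
    """A-optimality: trace of the parameter covariance matrix (inverse Fisher)."""
    return np.trace(np.linalg.inv(fisher))

def d_optimality(fisher):
    """D-optimality: determinant of the covariance, written as -log det(Fisher)."""
    return -np.linalg.slogdet(fisher)[1]

# Toy design problem: choose measurement times for y(t; a, tau) = a exp(-t/tau).
def jacobian(times, a=1.0, tau=1.0):
    d_da = np.exp(-times / tau)
    d_dtau = a * times / tau**2 * np.exp(-times / tau)
    return np.column_stack([d_da, d_dtau])

designs = {"early": np.array([0.1, 0.2, 0.3]),
           "spread": np.array([0.1, 1.0, 3.0])}
for name, times in designs.items():
    fisher = fisher_information(jacobian(times), sigma=0.05)
    print(f"{name:7s}  A-opt = {a_optimality(fisher):10.4f}   "
          f"D-opt = {d_optimality(fisher):8.3f}")
# Smaller values are better under both criteria; the spread design constrains
# the decay time far more tightly than clustering all measurements at early times.
```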

The experiment design points can be thought of as any parameters of the experiment that we have control over. This could be capsule dimensions, drive power or history, gas density, etc. These design points can also entail properties of the measurements being taken, such as the photon energy of a probe, pinhole diameter, the time at which images are taken, and filters used in an instrument. Because our task is to propose an experiment, we do not know what the data will be ahead of time, so we must marginalize over possible outcomes. The data themselves depend on the parameters of our model θ, which are also uncertain, so we must include this uncertainty by marginalizing over the model parameters. This process gives the expected utility, i.e., the expected value of U(D|x) as a function of the experiment design parameters, weighted by the likelihood of D and the prior distributions on the model parameters. The proposed design point is then the x that maximizes this expected utility. Finding this design point, however, is computationally very challenging. It involves a high-dimensional optimization that requires sampling and integrating over the likelihood at each iteration.
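A generic (and deliberately simple) nested Monte Carlo sketch of this expected-utility calculation is shown below for a one-parameter toy model with the information-gain (Kullback–Leibler) utility; the model, prior, and noise level are arbitrary assumptions, and real applications would replace the analytic forward model with a surrogate.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy forward model: a single measurement y = exp(-x / tau) + noise, with an
# unknown decay time tau and a controllable design variable x (measurement time).
sigma = 0.05

def prior_samples(n):
    return rng.uniform(0.5, 2.0, size=n)          # flat prior on tau

def log_like(y, tau, x):
    return (-0.5 * ((y - np.exp(-x / tau)) / sigma) ** 2
            - np.log(sigma * np.sqrt(2.0 * np.pi)))

def expected_information_gain(x, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo estimate of the expected Kullback-Leibler utility:
    the outer loop draws (tau, simulated data) from the prior and likelihood,
    and the inner loop estimates the evidence p(D|x) for each simulated datum."""
    tau_outer = prior_samples(n_outer)
    y = np.exp(-x / tau_outer) + rng.normal(0, sigma, size=n_outer)
    tau_inner = prior_samples(n_inner)
    ll_outer = log_like(y, tau_outer, x)
    ll_inner = log_like(y[:, None], tau_inner[None, :], x)
    log_evidence = np.logaddexp.reduce(ll_inner, axis=1) - np.log(n_inner)
    return np.mean(ll_outer - log_evidence)

for x in (0.2, 1.0, 3.0, 8.0):
    print(f"design x = {x:4.1f}   expected information gain ~ "
          f"{expected_information_gain(x):.3f} nats")
# The most informative measurement time balances signal amplitude against
# sensitivity to tau; very late times carry almost no information here.
```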

Often, our goal is to find design points that maximize the constraining power of our diagnostic observations, not the performance or dynamics of the experiment itself. However, this approach can be used to maximize some predicted output of an experiment. With traditional approaches to designing experiments in HEDP and ICF, we will often look for sensitivity to underlying conditions or processes in our models that are not directly accessible. This formalism requires us to then look at those underlying conditions through the lens of our instruments. In order to do this well, we must have high fidelity instrument models that account for instrumental response, noise, and background effects so that we can confidently say that our proposed experiment is sensitive to measurable quantities.

Although not Bayesian in nature, an important example in the literature is found in Ref. 276, where a statistical model of yield from direct drive capsules on the Omega laser was used to predict design changes that would lead to enhanced neutron yield. The statistical model in question used a large ensemble of 1D calculations to produce a baseline model of target performance dependent on experimental parameters, such as capsule dimensions and laser drive. An experimental database was used to correct, or calibrate, this model based on experimental observables such as neutron yield and fuel areal density. The calibrated model was then used to suggest new design points that would not have been intuitively discovered if the 1D predictions were naively used to optimize performance. The study ultimately succeeded in tripling the neutron yield from a baseline high performance design. An important benefit of this approach is that the calibrated model is able to ingest the output of each new design point once the experiment is conducted, enabling continual improvement of its predictive capability.

A common application of the Bayesian information based formalism is the design of instruments to minimize uncertainty in a specified quantity. As discussed in Sec. IV, Knapp et al. applied this formalism to the optimization of an x-ray detector configuration (detector sensitivity, x-ray filter material, and thickness) to minimize uncertainty and bias in specific inferred quantities.160 Bayesian optimization was used to minimize the utility function, which was defined as the mean-squared error of the inferred x-ray spectrum relative to the ground truth from simulation data. In the closely related field of magnetic confinement fusion, this approach has been used to optimize the configuration of an interferometry diagnostic subject to constraints enforced by the vacuum vessel and the geometry of the experiment.277 The goal was to configure the lines of sight of three interferometers to maximally inform the electron density profile, for which the information gain was quantified using the Kullback–Leibler divergence in the utility function. Due to the complexity of both the instruments and the underlying physics, applying this approach to the design of new instruments on HEDP facilities could significantly improve the value of new data in addition to providing a means to prioritize resources.

Although well established in other fields (e.g., observational astronomy278–280), the use of this formalism to design the conditions and parameters of experiments on HEDP and ICF facilities is nascent. In this application, our utility function describes the value of new information in informing or constraining model parameters. This is an extremely promising approach in ICF and HEDP, where experiments are rare and expensive, as it provides a quantitative means to choose between proposed experiments and guide decisions that maximize each experiment’s value. Again, sampling our models over large spans of experimental parameter space while marginalizing over model parameters is expensive, so the surrogate modeling techniques and statistical simplifications previously discussed may be necessary.

HEDP and ICF experiments operate over a wide range of parameter space, with important processes spanning many orders of magnitude in spatial and temporal scales and wide ranges of plasma density and temperature. Because of this, it is not trivial to know a priori when and where a given model is valid or whether other processes may have a non-negligible impact on the outcome. By designing experiments with the goal of selecting between candidate physical models, we can readily reject hypotheses and update our understanding of the complex physical phenomena under study. The application of optimal experiment design techniques to model selection is less well developed, though examples do exist outside of HEDP and ICF that we can learn from.281,282 In this application, we wish to design experiments such that the observables are capable of discriminating between different models. Now, our utility function captures the discrepancy between predictions from different models, marginalized over the model parameters and likelihood. In the simplest case, one model is correct and the other is incorrect. In reality, most physical models do not obey this dichotomy. Most models are correct in some regime of applicability and fail when parameters stray outside of this regime, where another model is applicable. An example of this is the transition between Spitzer-like transport for weakly coupled plasmas and transport in the strongly coupled regime.283 Spitzer theory works extremely well in the weakly coupled limit, but breaks down as the Coulomb coupling parameter Γ → 1. Strongly coupled theories work well for large coupling, but break down as the coupling parameter decreases. Often, these two theories differ widely in the regime of Γ ∼ 1 because neither is valid in this transitional regime. In this and similar situations, we seek to optimize experiments that will delineate the regime of validity for each model. Additionally, the Bayesian framework admits model averaging, which can potentially produce a model that gracefully transitions between regimes of validity.

An important special case that often occurs in the physical sciences is that of nested models. Two or more models are said to be nested when they become degenerate in some region of parameter space. Physically, nested models can be viewed as a hierarchy in which one model is a simplification of another. This situation arises frequently in plasma physics because simplified models are derived by making assumptions that reduce the complexity of more complete, but less tractable, models. It is difficult from an optimization standpoint because, in the regime where the two models are degenerate, their relative fitness is undetermined and neither can be favored over the other. Model selection for nested models has been addressed to some extent in the literature, though it remains an active research area.284 By applying these ideas to experiments in ICF and HEDP, we gain a means to optimize the use of expensive resources with the goal of advancing our state of knowledge as efficiently as possible.
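
For nested models specifically, one standard (though not the only) route to a Bayes factor is the Savage–Dickey density ratio, which compares the posterior and prior densities of the extra parameter at the value where the models coincide. The toy sketch below, based on a conjugate Gaussian problem, is meant only to illustrate the calculation; it is not drawn from Ref. 284.

# Minimal sketch of nested-model comparison via the Savage-Dickey density
# ratio: when model M0 fixes an extra parameter at phi = 0 inside model M1,
# the Bayes factor B01 equals p(phi = 0 | data, M1) / p(phi = 0 | M1).
# The posterior samples here come from a toy conjugate Gaussian problem.
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(3)

# Toy data generated with a small true value of the extra parameter phi
phi_true, sigma = 0.15, 0.5
data = phi_true + sigma * rng.standard_normal(20)

# Prior phi ~ N(0, 1); Gaussian likelihood -> Gaussian posterior (conjugate)
prior_var = 1.0
post_var = 1.0 / (1.0 / prior_var + data.size / sigma ** 2)
post_mean = post_var * data.sum() / sigma ** 2
posterior_samples = rng.normal(post_mean, np.sqrt(post_var), 20000)

# Savage-Dickey ratio estimated from samples (KDE) at the nested value phi = 0
posterior_density_at_0 = gaussian_kde(posterior_samples)(0.0)[0]
prior_density_at_0 = norm.pdf(0.0, loc=0.0, scale=np.sqrt(prior_var))
bayes_factor_01 = posterior_density_at_0 / prior_density_at_0
print("B01 (support for the simpler, nested model):", bayes_factor_01)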

The advent of high repetition rate facilities, such as the new generation of high-powered lasers11,12 and the LCLS,10 will push the frontier in both HEDP experiments and data-driven modeling and analysis. These facilities provide a wealth of high-quality data at rates that are unprecedented in HEDP. The flagship large HEDP facilities (e.g., NIF and Z) operate at 1–3 shots per day, and the OMEGA laser can provide up to 15 shots per day, whereas today's high rep-rate facilities can operate in the range of 1 experiment/min to 10 experiments/s. The traditional model of operating HEDP experiments does not extend to this scale, and a new way of thinking about operation and control at these facilities is needed. To this end, Hatfield et al. discussed opportunities for the use of machine learning to analyze data for real-time control and feedback.285 Clearly, real-time analysis and inference methods will need to leverage many of the methods presented in Secs. IV and V in order to be feasible on relevant timescales.

As examples of forward-looking approaches, the authors of Refs. 286 and 287 demonstrate the use of Bayesian optimization for real-time control of a high-power laser facility to optimize a wakefield accelerator. In Ref. 288, the authors developed an NN that takes x-ray spectra as inputs and outputs the spectrum amplitude, the slope of the spectrum (a proxy for electron temperature), and the total particle dose, and they discussed how the approach may be appropriate for real-time control applications. As the HEDP/ICF community begins to develop these approaches, we encourage seeking existing solutions from fields where such methods have already reached significant maturity. In magnetic confinement fusion, for example, NN-based methods have been developed for real-time applications, including the classification of Alfvén eigenmodes that may cause loss of confinement and wall damage289 as well as profile prediction.290,291 In Ref. 292, a model trained with reinforcement learning293 and implemented on a field-programmable gate array (FPGA) was applied to control an important subsystem of the Fermilab Booster accelerator.

A key consideration for real-time applications is satisfying latency requirements. One approach to improving latency is to use code that is optimized for run time. To this end, the authors of Ref. 294 developed a library to convert NNs built and trained in keras168 to “real-time compatible C.” In high-energy particle physics (HEP), particularly for experiments at the Large Hadron Collider, event rates of O(10 MHz) with data sizes of O(1 MB/event) give rise to data rates of O(10 TB/s). In order to manage these data, most of which correspond to uninteresting background events, the HEP community has developed a number of tools that enable ML algorithms to be programmed on FPGAs to provide fast and accurate triggering. For example, the library hls4ml295 converts NNs trained in common ML libraries into high-level synthesis code for FPGAs. This has enabled a variety of ML algorithms to be programmed on FPGAs, including graph NNs,296 boosted decision trees,297 and autoencoders.298 Finally, we note that FPGA-based applications will typically require smaller NNs due to memory constraints. Various methods to compress NNs while preserving accuracy have been developed, including pruning and reduced numerical precision of the weights and biases (e.g., binary and ternary representations).299,300
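
To illustrate the kind of compression arithmetic involved, the sketch below applies magnitude-based pruning followed by ternary quantization to a single weight matrix in plain NumPy. It does not use the hls4ml or keras-to-C tool chains referenced above and omits the retraining usually needed to recover accuracy after compression.

# Minimal sketch of two NN-compression steps mentioned above, applied to a
# single weight matrix in plain NumPy: magnitude-based pruning followed by
# ternary quantization. Deployment tools such as hls4ml operate on whole
# trained models; this is only meant to illustrate the arithmetic.
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((64, 32))                  # toy trained layer weights

def prune_by_magnitude(weights, sparsity=0.8):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_ternary(weights):
    """Map remaining weights to {-s, 0, +s} with a per-layer scale s."""
    nonzero = weights[weights != 0]
    scale = np.mean(np.abs(nonzero)) if nonzero.size else 1.0
    return scale * np.sign(weights)

W_pruned = prune_by_magnitude(W, sparsity=0.8)
W_ternary = quantize_ternary(W_pruned)
print("nonzero fraction:", np.mean(W_ternary != 0))
print("unique weight levels:", np.unique(W_ternary).size)   # at most 3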

Before concluding this section, we note that application of these methods may enable not only observation- or point-estimate-based control but also real-time posterior inference. This could allow more general methods, such as the optimal experiment design framework discussed in Sec. VI B, to be brought to bear. At a high rep-rate facility, the typical targets, such as metal foils, liquid jets, and gas jets, can be fed in a continuous stream for each shot. As a result, the varied parameters will primarily be those characterizing the drive and the diagnostic settings, such as the relative pump–probe timing. The same model could be applied to more traditional HEDP facilities, taking advantage of the additional time available between experiments. If we consider operations on OMEGA, where 45–60 min are available between experiments, it is clear that only a select number of parameters can be varied on a per-shot basis. The optimization algorithm would need knowledge of the inventory of available targets and of the quantities that can be changed (diagnostic timing, laser power history, etc.). However, campaigns could be designed with this mode of operation in mind. With judicious use of surrogate models and automated workflows, it is entirely feasible to process data, infer important quantities, and propose a new design point in under an hour. Facilities such as NIF and Z allow one or more days between successive experiments. While this is still not enough time to build new targets, it is enough time to make some changes to the experimental configuration.
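
The sketch below shows the skeleton of such a shot-to-shot loop: acquire a measurement, update a (here, deliberately crude grid-based) posterior using a fast surrogate forward model, and propose the next controllable setting by sampling from that posterior. All functions and the "facility" response are hypothetical stand-ins intended only to convey the loop structure.

# Minimal sketch of the shot-to-shot loop described above: acquire data,
# infer with a fast surrogate-based posterior update, and propose the next
# controllable setting. All functions here are toy stand-ins (the "facility"
# is a noisy analytic response); only the loop structure is the point.
import numpy as np

rng = np.random.default_rng(5)

def acquire_shot(setting):
    """Stand-in for a facility shot: returns a noisy scalar observable."""
    return np.exp(-(setting - 2.3) ** 2) + 0.05 * rng.standard_normal()

def update_posterior(settings, observations, grid):
    """Grid-based posterior over the setting that maximizes the response
    (a crude surrogate for a full Bayesian inference of physics quantities)."""
    log_post = np.zeros_like(grid)
    for s, y in zip(settings, observations):
        pred = np.exp(-(s - grid) ** 2)              # surrogate forward model
        log_post += -0.5 * ((y - pred) / 0.05) ** 2
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

def propose_next(grid, post):
    """Sample the next setting from the current posterior (Thompson-like rule)."""
    cdf = np.cumsum(post)
    idx = min(np.searchsorted(cdf, rng.uniform()), grid.size - 1)
    return grid[idx]

grid = np.linspace(0.0, 5.0, 200)
settings, observations = [rng.uniform(0.0, 5.0)], []
for shot in range(10):
    observations.append(acquire_shot(settings[-1]))
    post = update_posterior(settings, observations, grid)
    settings.append(propose_next(grid, post))
print("posterior mean of optimal setting:", float(np.sum(grid * post)))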

Such an approach has the potential to dramatically accelerate progress in HEDP science. Traditionally, campaigns on such facilities are designed so that pre-chosen sets of data are collected and then analyzed after the experimental run is over. More data can only be collected at the next opportunity for facility time, which often requires a new proposal submission. This introduces a significant lag between design iterations that could be circumvented with intelligent adaptive control algorithms. Significant investment in data handling, automated processing, and fast inference algorithms will be required to realize this vision.

We have demonstrated broad interest within the HEDP and ICF communities in the application of advanced analysis tools to better constrain inferences and maximize the value of data obtained from expensive experiments. These tools have demonstrated success in improving inferences, providing access to unobserved physical processes, and intelligently combining disparate data within and across experiments. These advances are partially enabled by the similarly recent explosion in the use of machine learning to provide efficient surrogates of complex models. One of the goals of this Review is to help proliferate these techniques across the fields of HEDP and ICF and to spur advancement of key technologies that will improve our ability to perform high-quality measurements and efficiently integrate experiment with theory and modeling. In this section, we discuss gaps and needs that could be filled to help realize these goals. Many of the items discussed strongly parallel recommendations from the work of Hatfield et al.285 and multiple recent community reports.301–303

In order to spread these methodologies and increase their use across HEDP and ICF, the community requires training materials. Most tutorials on Bayesian inference available online focus on standard statistical problems. These examples are extremely important, but they miss critical aspects of the problems encountered when analyzing experimental data in HEDP and ICF. Importantly, they generally use built-in statistical models and do not address the issues encountered when implementing physical forward models of the experimental system and instruments. This leaves a conceptual leap that must be made, and one that is not always trivial.

Similarly, the literature and databases available to the machine learning novice are focused on standard problems of interest to industry. These often include data suitable for training a variety of classification tasks (e.g., cats vs dogs, identification of tumors, spam detection, and predicting user product preferences) and some regression tasks (e.g., real estate, weather, and traffic forecasting). While these problems illustrate important tools and techniques and leverage existing databases of well-understood data, the applications are often subtly, but importantly, different from the needs of physicists. Again, this introduces a conceptual barrier that is difficult to overcome.

To circumvent this shortcoming, it is imperative that the HEDP and ICF community develop examples and instructional material geared toward problems of interest. This will be the most efficient way to help students and early-career researchers adopt these new methodologies, which are often missing from graduate school curricula. This Review includes a Jupyter notebook working through the simplified Bayesian inference example from Sec. II, which is available on GitHub. This is intended to be a starting point around which other efforts can coalesce, and the authors encourage collaboration with the community to expand and improve this material. The LLNL open data initiative serves as a useful entry point for machine-learning-ready datasets with applications in the physical sciences.304 This site provides databases in a variety of topical areas, the most relevant to this Review being the JAG ICF dataset for machine learning.305

We also recommend that experts in these methods develop instructional materials that can be used for summer schools and mini-courses at conferences. This development should be done in conjunction with academia to ensure proper pedagogy and integration with existing and potential future graduate school curricula. Several examples within the HEDP community could be used as templates for such activities. Two HEDP summer schools operate on an alternating schedule306,307 and have been running for well over a decade, making them well established within the community. Additionally, LLNL has partnered with the University of California San Diego to make several courses covering various topics in HEDP available to students and researchers across the country.308 These examples provide models for new courses and potential opportunities for collaboration that should be explored.

In addition to training materials, another means to accelerate the adoption and advancement of these methods is to provide curated databases to the community. Many of the methods presented here require data with which to train algorithms. By making well-organized and labeled databases available, others can leverage them to test new ideas, develop new algorithms, and build tools when application-specific data are not available or extremely limited. Again, we refer to the JAG ICF database as an example on which the community can build. In cases where the application of interest differs substantially from the use case that generated the data, curated databases can still be useful for prototyping. Alternatively, they may serve as a basis for training models that are subsequently updated via transfer learning using a smaller, more targeted dataset.

Bayesian methods can also benefit from the availability of data. These databases can be used to test the sensitivity of a particular inference model, develop new diagnostic concepts that can enhance experiments, test model calibration and selection methods, and, in general, validate a particular inference. The use of data that are well vetted by the community to establish a new technique increases the acceptance of new methods and promotes reproducibility because others will have access to the data.

Beyond making data available to the public for testing and development, these datasets also provide an opportunity to establish a standard for storing raw data, metadata, and other important information. This applies not only to experimental data, but also to simulation data. The availability of a common format with tools for accessing and manipulating data enhances communication and collaboration across the entire community. It also provides a route to readily adapt methods developed at one institution to problems being investigated at another. This applies to ML algorithms and to the development of more conventional forward models for physics and diagnostics. Care must be taken when developing a standard for complex data like that generated in HEDP and ICF research so that the data are not only easy to access and manipulate but are also complete.

Finally, all of the tools and techniques discussed in this Review require some form of testing and validation before they are applied to experimental data. In addition to setting standards for the storage and dissemination of data, the community must also set standards for validation. The ML community has standard datasets for specific applications that can be used to train and test new algorithms. In ICF and HEDP, it will not always be the case that one can train an algorithm on one of the available standard datasets and then apply it to a specific problem of interest. In such cases, reporting validation scores on these datasets will not be meaningful. To circumvent this, we recommend that the community develop methodologies and metrics that are meaningful for the tasks discussed here.

In this Review, we have provided a foundation for researchers to begin applying advanced methods to their work, along with a number of examples from the recent literature illustrating the current status of these methods in ICF and HEDP research. The background presented here should convey the power and potential of these methods to transform many aspects of experimental ICF and HEDP research. Many of the tools discussed, from inference to experiment design to automated data reduction, have already been applied with great success. The future is bright for the further development and application of advanced analysis tools to greatly advance our understanding of matter in the most extreme conditions.

The authors would like to thank Dr. Andrew J. Porwitzky, Dr. Owen Mannion, Dr. Jeff Woolstrum, and Dr. Lucas Stanek for providing feedback on the full article, greatly improving its overall quality; Dr. Jeffrey R. Fein for providing feedback on Sec. V with particularly insightful comments on image analysis and tomography; Dr. Ravi G. Patel for reviewing Sec. IV with especially helpful references on CFD and physics-informed machine learning and detailed comments throughout; Dr. Kathryn Maupin for feedback on Bayesian inference and model selection in Secs. II and VI; and Dr. James Gunning for insightful discussion on the topic of data assimilation. In addition, the authors would like to thank the data science working group within the Pulsed Power Sciences Center at SNL for providing a vibrant research environment for exploring and applying advanced methods to problems in HEDP and ICF. This group includes Dr. B. Klein, Dr. M.-A. Schaeuble, Dr. J. Fein, Dr. K. Maupin, Dr. T. Nagayama, Dr. C. Jennings, Dr. S. Hansen, Dr. D. Ampleford, Dr. K. Beckwith, Dr. S. Slutz, Dr. C. Tyler, Dr. R. Patel, Dr. K. Blaha, Dr. E. Harding, Dr. S. Fields, Dr. J. Brown, and Dr. G. Vasey. Finally, P. F. Knapp would like to thank Dr. Michael Glinsky for his mentorship, without which this work would not have been possible.

This article was authored by an employee of National Technology and Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE). The employee owns all right, title, and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan, https://www.energy.gov/downloads/doe-public-access-plan.

The authors have no conflicts to disclose.

P. F. Knapp: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). W. E. Lewis: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal).

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

1.
2. S. Atzeni and J. M. ter Vehn, The Physics of Inertial Fusion, 1st ed. (Oxford University Press, 2004), p. 19.
3. R. S. Craxton, K. S. Anderson, T. R. Boehly, V. N. Goncharov, D. R. Harding, J. P. Knauer, R. L. McCrory, P. W. McKenty, D. D. Meyerhofer, J. F. Myatt, A. J. Schmitt, J. D. Sethian, R. W. Short, S. Skupsky, W. Theobald, W. L. Kruer, K. Tanaka, R. Betti, T. J. B. Collins, J. A. Delettrez, S. X. Hu, J. A. Marozas, A. V. Maximov, D. T. Michel, P. B. Radha, S. P. Regan, T. C. Sangster, W. Seka, A. A. Solodov, J. M. Soures, C. Stoeckl, and J. D. Zuegel, Phys. Plasmas 22, 110501 (2015).
4. R. P. Drake, High Energy Density Physics: Fundamentals, Inertial Fusion, and Experimental Astrophysics, 1st ed. (Springer, 2006), p. 70.
5. G. H. Miller, E. I. Moses, and C. R. Wuest, Opt. Eng. 43, 2841 (2004).
6. M. E. Savage, K. R. LeChien, M. R. Lopez, B. S. Stoltzfus, W. A. Stygar, D. S. Artery, J. A. Lott, and P. A. Corcoran, in 2011 IEEE Pulsed Power Conference (IEEE, 2011), pp. 983–990.
7. M. Savage, K. LeChien, W. Stygar, J. Maenchen, D. McDaniel, and K. Struve, in Proceedings of the 2008 IEEE International Power Modulators and High Voltage Conference (IEEE, 2008), p. 93.
8. T. R. Boehly, D. L. Brown, R. S. Craxton, R. L. Keck, J. P. Knauer, J. H. Kelly, T. J. Kessler, S. A. Kumpan, S. J. Loucks, S. A. Letzring, F. J. Marshall, R. L. McCrory, S. F. B. Morse, W. Seka, J. M. Soures, and C. P. Verdon, Opt. Commun. 133, 495 (1997).
9. J.-L. Miquel, C. Lion, and P. Vivini, J. Phys.: Conf. Ser. 688, 012067 (2016).
10. J. Arthur, G. Materlik, R. Tatchyn, and H. Winick, Rev. Sci. Instrum. 66, 1987 (1995).
11. B. Rus, P. Bakule, D. Kramer, G. Korn, J. Green, J. Nóvak, M. Fibrich, F. Batysta, J. Thoma, J. Naylon et al., Proc. SPIE 8780, 87801T (2013).
12. J. Nees, A. Maksimchuk, G. Kalinchenko, B. Hou, Y. Ma, P. Campbell, A. McKelvey, L. Willingale, I. Jovanovic, C. Kuranz et al., in 2020 Conference on Lasers and Electro-Optics (CLEO) (IEEE, 2020), pp. 1–2.
13. R. Betti, P. Y. Chang, B. K. Spears, K. S. Anderson, J. Edwards, M. Fatenejad, J. D. Lindl, R. L. McCrory, R. Nora, and D. Shvarts, Phys. Plasmas 17, 058102 (2010).
14. D. Bzdok, D. L. Floris, and A. F. Marquand, Philos. Trans. R. Soc., B 375, 20190661 (2020).
15. M. W. Woolrich, NeuroImage 62, 801 (2012), 20 Years of fMRI.
16. Z. Chen, Intell. Neurosci. 2013, 251905.
17. S. M. Hill, Y. Lu, J. Molina, L. M. Heiser, P. T. Spellman, T. P. Speed, J. W. Gray, G. B. Mills, and S. Mukherjee, Bioinformatics 28, 2804 (2012).
18. J. P. Huelsenbeck, F. Ronquist, R. Nielsen, and J. P. Bollback, Science 294, 2310 (2001).
19. G. Ashton, M. Hübner, P. D. Lasky, C. Talbot, K. Ackley, S. Biscoveanu, Q. Chu, A. Divakarla, P. J. Easter, B. Goncharov et al., Astrophys. J., Suppl. Ser. 241, 27 (2019).
20. S. Sharma, Annu. Rev. Astron. Astrophys. 55, 213 (2017).
21. E. Thrane and C. Talbot, Publ. Astron. Soc. Aust. 36, e010 (2019).
23. M. K. Sen and P. L. Stoffa, Geophys. Prospect. 44, 313 (1996).
24. F. Feroz, M. P. Hobson, and M. Bridges, Mon. Not. R. Astron. Soc. 398, 1601 (2009).
25. U. Von Toussaint, Rev. Mod. Phys. 83, 943 (2011).
26. P. Knapp and W. Lewis, “Examples for advanced data analysis in inertial confinement fusion and high energy density physics,” https://github.com/sandialabs/Advanced-Data-Analysis-in-Inertial-Confinement-Fusion-and-High-Energy-Density-Physics.
27. E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, 2003).
28. A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin, Bayesian Data Analysis, 3rd ed. (CRC Press, 2014).
29. W. von der Linden, V. Dose, and U. von Toussaint, Bayesian Probability Theory: Applications in the Physical Sciences, 1st ed. (Cambridge University Press, 2014).
30. D. Sivia and J. Skilling, Data Analysis, 2nd ed. (Oxford University Press, 2006).
31. D. T. Casey, J. A. Frenje, M. Gatu Johnson, F. H. Séguin, C. K. Li, R. D. Petrasso, V. Y. Glebov, J. Katz, J. Magoon, D. D. Meyerhofer, T. C. Sangster, M. Shoup, J. Ulreich, R. C. Ashabranner, R. M. Bionta, A. C. Carpenter, B. Felker, H. Y. Khater, S. LePape, A. MacKinnon, M. A. McKernan, M. Moran, J. R. Rygg, M. F. Yeoman, R. Zacharias, R. J. Leeper, K. Fletcher, M. Farrell, D. Jasion, J. Kilkenny, and R. Paguio, Rev. Sci. Instrum. 84, 043506 (2013).
32. J. A. Frenje, R. Bionta, E. J. Bond, J. A. Caggiano, D. T. Casey, C. Cerjan, J. Edwards, M. Eckart, D. N. Fittinghoff, S. Friedrich, V. Y. Glebov, S. Glenzer, G. Grim, S. Haan, R. Hatarik, S. Hatchett, M. G. Johnson, O. S. Jones, J. D. Kilkenny, J. P. Knauer, O. Landen, R. Leeper, S. L. Pape, R. Lerche, C. K. Li, A. Mackinnon, J. McNaney, F. E. Merrill, M. Moran, D. H. Munro, T. J. Murphy, R. D. Petrasso, R. Rygg, T. C. Sangster, F. H. Séguin, S. Sepke, B. Spears, P. Springer, C. Stoeckl, and D. C. Wilson, Nucl. Fusion 53, 043014 (2013).
33. L. Ballabio, J. Källne, and G. Gorini, Nucl. Fusion 38, 1723 (1998).
34. B. Ahlborn, M. H. Key, and A. R. Bell, Phys. Fluids 25, 541 (1982).
35. L. Tierney and J. B. Kadane, J. Am. Stat. Assoc. 81, 82 (1986).
37. C. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
38. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
39. K. Murphy, Probabilistic Machine Learning: An Introduction (MIT Press, 2016).
40. K. P. Murphy, Probabilistic Machine Learning: Advanced Topics (MIT Press, 2023).
41. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
42. A. Zellner and P. E. Rossi, J. Econometrics 25, 365 (1984).
44. M. E. Glinsky, M. C. Haase, V. Charoing, G. Duncan, R. Hill, G. O’Halloran, L. Dang, and J. Gunning, Leading Edge 27, 642 (2008).
45. M. F. Kasim, T. P. Galligan, J. Topp-Mugglestone, G. Gregori, and S. M. Vinko, Phys. Plasmas 26, 112706 (2019).
46. K. Dalbey, M. S. Eldred, G. Geraci, J. D. Jakeman, K. A. Maupin, J. A. Monschke, D. T. Seidl, L. P. Swiler, A. Tran, F. Menhorn, and X. Zeng, Sandia Technical Report No. SAND2020-12495, 2020.
47. B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell, J. Stat. Software 76, 1 (2017).
48. D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman, Publ. Astron. Soc. Pac. 125, 306 (2013); arXiv:1202.3665 [astro-ph.IM].
49. A. Patil, D. Huard, and C. J. Fonnesbeck, J. Stat. Software 35, 1 (2010).
50. J. Salvatier, T. V. Wiecki, and C. Fonnesbeck, PeerJ Comput. Sci. 2, e55 (2016).
51. S. Depaoli, J. P. Clifton, and P. R. Cobb, J. Educ. Behav. Stat. 41, 628 (2016).
52. J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, A. Alemi, M. Hoffman, and R. A. Saurous, “TensorFlow distributions,” arXiv:1711.10604 (2017).
53. M. Betancourt, “A conceptual introduction to Hamiltonian Monte Carlo,” arXiv:1701.02434 (2017).
54. M. D. Hoffman and A. Gelman, J. Mach. Learn. Res. 15, 1593 (2014).
55. J. A. Vrugt, C. J. F. Ter Braak, C. G. H. Diks, B. A. Robinson, J. M. Hyman, and D. Higdon, Int. J. Nonlinear Sci. Numer. Simul. 10, 273 (2009).
56. H. Mobahi and J. W. Fisher, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (Springer, 2015), pp. 43–56.
57. H. Tjelmeland and B. K. Hegstad, Scand. J. Stat. 28, 205 (2001).
58. E. Pompe, C. Holmes, and K. Łatuszyński, Ann. Stat. 48, 2930 (2020).
59. J. J. Ruby, J. R. Rygg, D. A. Chin, J. A. Gaffney, P. J. Adrian, C. J. Forrest, V. Y. Glebov, N. V. Kabadi, P. M. Nilson, Y. Ping, C. Stoeckl, and G. W. Collins, Phys. Rev. Lett. 125, 215001 (2020).
60. J. J. Ruby, J. R. Rygg, D. A. Chin, J. A. Gaffney, P. J. Adrian, D. Bishel, C. J. Forrest, V. Y. Glebov, N. V. Kabadi, P. M. Nilson, Y. Ping, C. Stoeckl, and G. W. Collins, Phys. Rev. E 102, 053210 (2020).
61. J. J. Ruby, J. A. Gaffney, J. R. Rygg, Y. Ping, and G. W. Collins, Phys. Plasmas 28, 032703 (2021).
62. B. Tobias, C. F. Kawaguchi, S. Palaniyappan, J. P. Sauppe, K. A. Flippo, and J. L. Kline, High Energy Density Phys. 37, 100879 (2020).
63. J. L. Brown and L. B. Hund, J. R. Stat. Soc.: Ser. C: Appl. Stat. 67, 1023 (2018).
64. W. E. Lewis, P. F. Knapp, S. A. Slutz, P. F. Schmit, G. A. Chandler, M. R. Gomez, A. J. Harvey-Thompson, M. A. Mangan, D. J. Ampleford, and K. Beckwith, Phys. Plasmas 28, 092701 (2021).
65. M. J. Rosenberg, A. B. Zylstra, F. H. Séguin, H. G. Rinderknecht, J. A. Frenje, M. Gatu Johnson, H. Sio, C. J. Waugh, N. Sinenian, C. K. Li, R. D. Petrasso, P. W. McKenty, M. Hohenberger, P. B. Radha, J. A. Delettrez, V. Y. Glebov, R. Betti, V. N. Goncharov, J. P. Knauer, T. C. Sangster, S. LePape, A. J. Mackinnon, J. Pino, J. M. McNaney, J. R. Rygg, P. A. Amendt, C. Bellei, L. R. Benedetti, L. Berzak Hopkins, R. M. Bionta, D. T. Casey, L. Divol, M. J. Edwards, S. Glenn, S. H. Glenzer, D. G. Hicks, J. R. Kimbrough, O. L. Landen, J. D. Lindl, T. Ma, A. MacPhee, N. B. Meezan, J. D. Moody, M. J. Moran, H.-S. Park, B. A. Remington, H. Robey, M. D. Rosen, S. C. Wilks, R. A. Zacharias, H. W. Herrmann, N. M. Hoffman, G. A. Kyrala, R. J. Leeper, R. E. Olson, J. D. Kilkenny, and A. Nikroo, Phys. Plasmas 21, 122712 (2014).
66. M. Gatu Johnson, A. B. Zylstra, A. Bacher, C. R. Brune, D. T. Casey, C. Forrest, H. W. Herrmann, M. Hohenberger, D. B. Sayre, R. M. Bionta, J.-L. Bourgade, J. A. Caggiano, C. Cerjan, R. S. Craxton, D. Dearborn, M. Farrell, J. A. Frenje, E. M. Garcia, V. Y. Glebov, G. Hale, E. P. Hartouni, R. Hatarik, M. Hohensee, D. M. Holunga, M. Hoppe, R. Janezic, S. F. Khan, J. D. Kilkenny, Y. H. Kim, J. P. Knauer, T. R. Kohut, B. Lahmann, O. Landoas, C. K. Li, F. J. Marshall, L. Masse, A. McEvoy, P. McKenty, D. P. McNabb, A. Nikroo, T. G. Parham, M. Paris, R. D. Petrasso, J. Pino, P. B. Radha, B. Remington, H. G. Rinderknecht, H. Robey, M. J. Rosenberg, B. Rosse, M. Rubery, T. C. Sangster, J. Sanchez, M. Schmitt, M. Schoff, F. H. Séguin, W. Seka, H. Sio, C. Stoeckl, and R. E. Tipton, Phys. Plasmas 24, 041407 (2017).
67. J. P. Sauppe, S. Palaniyappan, B. J. Tobias, J. L. Kline, K. A. Flippo, O. L. Landen, D. Shvarts, S. H. Batha, P. A. Bradley, E. N. Loomis, N. N. Vazirani, C. F. Kawaguchi, L. Kot, D. W. Schmidt, T. H. Day, A. B. Zylstra, and E. Malka, Phys. Rev. Lett. 124, 185003 (2020).
68. K. A. Flippo, J. L. Kline, F. W. Doss, E. N. Loomis, M. Emerich, B. Devolder, T. J. Murphy, K. B. Fournier, D. H. Kalantar, S. P. Regan, M. A. Barrios, E. C. Merritt, T. S. Perry, I. L. Tregillis, L. Welser-Sherrill, and J. R. Fincke, Rev. Sci. Instrum. 85, 093501 (2014).
69. K. Flippo, B. DeVolder, F. Doss, J. Kline, E. Merritt, E. Loomis, D. Capelli, D. Schmidt, and M. J. Schmitt, J. Phys.: Conf. Ser. 717, 012062 (2016).
70. K. M. Hanson and G. S. Cunningham, in Maximum Entropy and Bayesian Methods, edited by K. M. Hanson and R. N. Silver (Springer, Dordrecht, The Netherlands, 1996), pp. 125–134.
71. J. L. Brown, C. S. Alexander, J. R. Asay, T. J. Vogler, D. H. Dolan, and J. L. Belof, J. Appl. Phys. 115, 043530 (2014).
72. P. Vinet, J. H. Rose, J. Ferrante, and J. R. Smith, J. Phys.: Condens. Matter 1, 1941 (1989).
73. C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006).
74. S. J. Ali, R. G. Kraus, D. E. Fratanduono, D. C. Swift, and J. H. Eggert, J. Appl. Phys. 121, 195901 (2017).
75. W. J. Schill, R. A. Austin, K. L. Schimdt, J. L. Brown, and N. R. Barton, J. Appl. Phys. 130, 055901 (2021).
76. A. Porwitzky, J. Brown, S. Duwal, D. H. Dolan, C. Blada, J. Boerner, J. Williams, and S. Payne, J. Appl. Phys. 132, 115102 (2022).
77. S. A. Slutz, M. C. Herrmann, R. A. Vesey, A. B. Sefkow, D. B. Sinars, D. C. Rovang, K. J. Peterson, and M. E. Cuneo, Phys. Plasmas 17, 056303 (2010).
78. M. R. Gomez, S. A. Slutz, A. B. Sefkow, D. B. Sinars, K. D. Hahn, S. B. Hansen, E. C. Harding, P. F. Knapp, P. F. Schmit, C. A. Jennings, T. J. Awe, M. Geissel, D. C. Rovang, G. A. Chandler, G. W. Cooper, M. E. Cuneo, A. J. Harvey-Thompson, M. C. Herrmann, M. H. Hess, O. Johns, D. C. Lamppa, M. R. Martin, R. D. McBride, K. J. Peterson, J. L. Porter, G. K. Robertson, G. A. Rochau, C. L. Ruiz, M. E. Savage, I. C. Smith, W. A. Stygar, and R. A. Vesey, Phys. Rev. Lett. 113, 155003 (2014).
79. D. Yager-Elorriaga, M. R. Gomez, D. E. Ruiz, S. A. Slutz, A. J. Harvey-Thompson, C. Jennings, P. Knapp, P. Schmit, M. Weis, T. J. Awe, G. A. Chandler, M. A. Mangan, C. E. Myers, J. R. Fein, B. R. Galloway, M. Geissel, M. E. Glinsky, S. B. Hansen, E. C. Harding, D. C. Lamppa, W. E. Lewis, P. Rambo, G. K. Robertson, M. E. Savage, G. A. Shipley, I. C. Smith, J. Schwarz, D. J. Ampleford, K. Beckwith, K. Peterson, J. L. Porter, G. A. Rochau, and D. B. Sinars, Nucl. Fusion 62, 042015 (2021).
80. P. F. Schmit, P. F. Knapp, S. B. Hansen, M. R. Gomez, K. D. Hahn, D. B. Sinars, K. J. Peterson, S. A. Slutz, A. B. Sefkow, T. J. Awe, E. Harding, C. A. Jennings, G. A. Chandler, G. W. Cooper, M. E. Cuneo, M. Geissel, A. J. Harvey-Thompson, M. C. Herrmann, M. H. Hess, O. Johns, D. C. Lamppa, M. R. Martin, R. D. McBride, J. L. Porter, G. K. Robertson, G. A. Rochau, D. C. Rovang, C. L. Ruiz, M. E. Savage, I. C. Smith, W. A. Stygar, and R. A. Vesey, Phys. Rev. Lett. 113, 155004 (2014).