Our understanding of physical systems often depends on our ability to match complex computational modeling with the measured experimental outcomes. However, simulations with large parameter spaces suffer from inverse problem instabilities, where similar simulated outputs can map back to very different sets of input parameters. While of fundamental importance, such instabilities are seldom resolved due to the intractably large number of simulations required to comprehensively explore parameter space. Here, we show how Bayesian inference can be used to address inverse problem instabilities in the interpretation of x-ray emission spectroscopy and inelastic x-ray scattering diagnostics. We find that the extraction of information from measurements on the basis of agreement with simulations alone is unreliable and leads to a significant underestimation of uncertainties. We describe how to statistically quantify the effect of unstable inverse models and present an approach to experimental design that mitigates their impact.

## I. INTRODUCTION

Our understanding of physical systems closely depends on our ability to predictively model their behavior in controlled experimental settings. By conducting simulations that produce outcomes resembling those observed experimentally, we can investigate which processes are important and quantify their effect on the observed outcomes. This approach is straightforward when experiments can be designed to probe a specific process in isolation, for which the modeling contains as few variable parameters as possible. Unfortunately, it is common for experimental outcomes to intertwine a host of competing processes and for simulations to operate in large parameter spaces. An example of such research is the study of matter at high energy densities, i.e., systems at temperatures exceeding ∼10 000 K at the typical density of a solid or at pressures exceeding 1 Mbar. Matter in these conditions is of great interest to astrophysical and inertial confinement fusion (ICF) investigations and tends to be highly transient and inhomogeneous, and is difficult both to create in a controlled manner and to probe. Research here relies strongly on complex computational modeling that needs to account for a wide range of processes and interactions, which, critically, depend on a large number of variable parameters. Some difficulties with this approach for inertial fusion energy research have been discussed within the context of insufficiently accurate models and diagnostics,^{1,2} but little attention has been paid to the intrinsic limitation of integrated experiments in their own right and to the systematic uncertainties introduced by correlated physical parameters in both experiment and simulation.

Here, we show how stochastic optimization methods^{3} can be used to robustly solve the inverse problem in high-dimensional modeling and combine them with Bayesian inference to explore the behavior of complex, multiparameter simulations. Our Bayesian inference approach uses Markov Chain Monte Carlo (MCMC) algorithms to efficiently sample high-dimensional parameter spaces.^{4} While the approach is general, we will focus specifically on two widely used experimental plasma diagnostics: x-ray emission spectroscopy and inelastic x-ray scattering. As we will show, inverse problem instabilities are significant in both and must be comprehensively treated if meaningful information is to be extracted robustly from experimental measurements.

### A. Searching the space of solutions

Simulations in physics are forward models: an initial set of parameters is chosen together with a set of models and assumptions and is fed into an algorithm to calculate a set of outputs. The outputs typically correspond to the experimental observables, while the input parameters are related to the experimental conditions or govern the models deployed. The models represent our best current understanding of the physical system. The comparison between experiment and simulation is then the process by which we vary the inputs of the simulation to find outputs that match the experimental observables. In this picture, a simulation *f* can be viewed as a map from a set of *n* input parameters to a set of *m* output observables, $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$. The comparison with experiment is then simply the search for its inverse, $f^{-1}$, the function that maps the known experimental outputs $y \in \mathbb{R}^m$ onto some vector of input parameters $x \in \mathbb{R}^n$. If the number of input parameters is large, this inversion problem cannot be efficiently solved via a brute-force grid search. This is due to the curse of dimensionality: the number of simulations required to fill parameter space grows exponentially with the number of dimensions, quickly rendering the problem computationally intractable. The common workaround is to artificially reduce the range and number of parameters, restricting the search to a smaller subspace. However, this can introduce significant bias into the analysis. The problem of parameter bias is most acute if *f* is not injective, i.e., when a point in the space of outcomes can map onto several possible values in the space of parameters. However, the problem remains relevant even for injective maps in the presence of experimental uncertainties, such as noise. We show a schematic of this mapping process, for an injective problem with noise, in Fig. 1.
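The non-injective case can be made concrete with a deliberately simple toy model (not one of the physical simulations used in this work): two parameters enter the observable only through their product, so very different inputs yield identical outputs, and noise blurs the distinction further.

```python
import numpy as np

def forward(x):
    """Toy forward model: two input parameters, one observable.
    Only the product of the parameters is observable, so the map
    is not injective: many (a, b) pairs give the same output."""
    a, b = x
    return a * b

# Two very different parameter sets...
x1 = np.array([2.0, 6.0])
x2 = np.array([0.5, 24.0])

# ...produce exactly the same observable,
y1, y2 = forward(x1), forward(x2)
assert np.isclose(y1, y2)

# and with measurement noise even nearby-but-distinct outputs become
# indistinguishable within the experimental uncertainty band.
rng = np.random.default_rng(0)
y_measured = y1 + 0.05 * rng.standard_normal()
print(y1, y2, y_measured)
```

Inverting the measured value cannot recover a unique `(a, b)`: every point on the hyperbola `a * b = y_measured` (within the noise band) is an equally valid solution.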

If similar sets of outcomes can be generated from very different sets of inputs, the inverse function $f^{-1}$ is said to be unstable:^{5} the inversion can yield multiple solutions, indistinguishable within the experimental uncertainty. Nevertheless, the problem can be treated statistically, by searching for the probability *P* of finding a specific set of parameters **x** given an observed outcome **y**, $P(x|y)$. Finding this probability can therefore be phrased as a Bayesian inference problem,^{6}

$$P(x|y) = \frac{P(y|x)\,P(x)}{P(y)},$$

where $P(y|x)$ is the likelihood of finding the observables **y** given a set of input parameters **x** (a forward model calculation), $P(x)$ is the prior distribution of possible parameters, and $P(y)$ is the marginal likelihood of the observed data over all possible parameters. The prior distribution here is important as it provides a way to bias the parameters due to physical constraints or to use results from complementary diagnostics, the results of which would otherwise not feed into the evaluation of the observed output. The marginal likelihood $P(y)$ is intractable to compute in high-dimensional space, but it is simply a scaling constant given its independence of **x**. In the present work, we chose the forward model likelihood $P(y|x)$ to be uniform if the simulated output lies within some uncertainty band around the data we aim to reproduce, and to approach zero otherwise. This enforces equal preference for all simulation outcomes that match the data to within the uncertainty band, regardless of the detailed fit. More details on the choice of likelihood functions and their influence on the results are provided in the supplementary material.

To sample the posterior distribution of the parameters, $P(x|y)$, we employ Markov Chain Monte Carlo (MCMC) algorithms^{7–9} to obtain samples from an unknown probability distribution in high-dimensional space.^{4} This approach provides an efficient way to sample the possible simulations that yield outcomes matching, to within a given uncertainty, the experimentally observed results. Importantly, while all matching simulations have equal prior probability, their posterior probability distribution in parameter space will not generally be uniform. We can thus address the inverse problem instability by investigating the posterior probability distribution of parameters and provide best estimates that are statistically meaningful even for highly pathological, noninvertible forward models.
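The sampling scheme above can be sketched in a few lines. The two-parameter forward model, uncertainty band, and box priors below are illustrative stand-ins, not the actual plasma codes: with a uniform prior and a hard-boundary likelihood, the Metropolis-Hastings acceptance test reduces to accepting any in-bounds candidate whose simulated output matches the data to within the band.

```python
import numpy as np

def forward(x):
    """Toy stand-in for an expensive simulation: two parameters, one observable."""
    a, b = x
    return a * b

def in_bounds(x, bounds):
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))

def sample_hard_boundary(y_obs, band, x0, bounds, n_steps=20_000, step=0.2, seed=1):
    """Metropolis-Hastings with a uniform box prior and hard-boundary likelihood."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        cand = x + step * rng.standard_normal(x.size)
        if in_bounds(cand, bounds) and abs(forward(cand) - y_obs) <= band:
            x = cand          # accept; on rejection the chain stays put
        samples[i] = x
    return samples

samples = sample_hard_boundary(y_obs=12.0, band=0.5, x0=[2.0, 6.0],
                               bounds=[(0.1, 10.0), (0.1, 50.0)])
# Every retained state reproduces the "data" within the band, yet the
# individual parameters spread widely: the inverse problem is unstable.
print(samples[:, 0].min(), samples[:, 0].max())
```

The histogram of `samples` is then an estimate of the posterior $P(x|y)$, which need not be uniform even though every accepted simulation matches the data equally well.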

## II. RESULTS

### A. Inelastic x-ray scattering

Inelastic x-ray Thomson scattering (XRTS) is a widely used tool in high energy density physics to determine the conditions of warm-dense matter via direct probing of the electron response function.^{10–12} This includes measuring plasma temperatures and densities,^{13,14} ion correlations,^{15,16} transport properties,^{17,18} ionization and continuum lowering,^{19,20} and informing equation of state studies.^{21–24} In XRTS, the backscattering spectrum provides access to the noncollective Compton scattering regime, which can probe the temperature, density, and ionization state of the plasma, while the forward scattering spectrum is sensitive to collective plasmon oscillations.

To explore the inverse problem instability in XRTS, we use the model of Gregori *et al.*^{25,26} within the Chihara approximation.^{27} This is a forward model that takes as input a set of plasma parameters (electron and ion temperature, ionization, density, and spectrum of incident x-rays) and produces an energy-resolved x-ray scattering spectrum. The scattering spectrum is then convolved with the point spread function of the spectrometer to predict the experimental observation.

Our analysis is generally applicable to any XRTS experiment, but to focus our discussion and derive quantitative results, we will analyze the high-quality experimental data of Lee *et al.*^{21} In this experiment, XRTS was used to deduce the temperature and density of shock-compressed beryllium. We show the scattering spectrum from Ref. 21 in Fig. 2(a), alongside a 3.5% uncertainty band. This band is determined by the full variation of the experimental signal due to noise on the Compton feature, and all our simulated scattering spectra are required to fall within it. It is impractical to solve the inversion problem directly via MCMC starting from a random point in parameter space, so we start by finding the best-fit solution using the CMA-ES algorithm^{3} to minimize the MCMC burn-in time. This algorithm scales well with dimensionality and is efficient and robust. The best-fit solution is found by minimizing the sum of squared residuals between the experimental data and the simulation results and provides the starting point for the MCMC sampler to explore the space of solutions. The forward likelihood $P(y|x)$ used in our MCMC calculations is given by the hard boundary condition,

$$P(y|x) \propto \begin{cases} 1, & |f(x) - y| \leq b, \\ 0, & \text{otherwise}. \end{cases}$$

Here, *b* is the 3.5% uncertainty band on the experimental data. In the search, the temperature is constrained to the range (0–60) eV, the beryllium ionization to between 0.01 and 4.0, and the mass density to (0.01–40) g cm^{−3}, all with uniform prior probability. For the MCMC, we use the Metropolis-Hastings algorithm^{7,8} following the implementation in the PyMC3 package,^{28} using a Gaussian-step proposal with a step size of 10% of the bounding-box size, tuned every 20 steps. The full set of collected simulated spectra (∼50 000) obtained by our MCMC algorithm is shown in Fig. 2(b) for the electron temperature, mass density, and ionization. We find a large parameter spread, ranging over orders of magnitude in temperature and density, all while yielding total scattering spectra that are indistinguishable within the given uncertainty. This large spread illustrates the difficulties that arise in data interpretation due to the instability of the inverse problem, as any point could potentially represent the experiment.
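The CMA-ES pre-fit stage described above can be illustrated with a much simpler evolution strategy. The sketch below uses a minimal (1+1)-ES with a 1/5th-success-rule flavor of step-size adaptation as a stand-in for the full CMA-ES of Ref. 3; the two-parameter model and synthetic "data" are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

def es_minimize(cost, x0, sigma0=0.5, iters=3000, seed=0):
    """Minimal (1+1) evolution strategy: mutate, keep improvements,
    and adapt the step size (a crude stand-in for CMA-ES)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, sigma = cost(x), sigma0
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal(x.size)
        fc = cost(cand)
        if fc <= fx:
            x, fx = cand, fc
            sigma *= 1.22   # grow the step on success...
        else:
            sigma *= 0.95   # ...shrink it on failure (~1/5th rule)
    return x, fx

# Illustrative cost: sum of squared residuals against synthetic "data"
data = np.array([1.0, 4.0, 9.0])          # generated by p = (1, 2)
def cost(p):
    model = p[0] * np.arange(1, 4) ** p[1]  # toy two-parameter model
    return float(np.sum((model - data) ** 2))

best, best_cost = es_minimize(cost, x0=[0.5, 0.5])
print(best, best_cost)
```

The returned `best` then seeds the MCMC chain, so that the sampler starts inside (or very near) the set of simulations that already match the data.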

For each point in Fig. 2 we know, by construction, that the probability $P(y|x)$ of finding the experimental spectrum given the parameters (density, temperature, and ionization) is large. However, this alone does *not* imply that the observed spectrum favors that specific set of input parameters, i.e., that $P(x|y)$ is also large. In fact, the posterior distribution probability $P(x|y)$ is generally unknown, unless calculated explicitly. We show our calculations for the posterior parameter distributions for this case in Figs. 2(b)–2(d).

Not considering posterior distributions when interpreting the data can lead to contradictions and erroneous conclusions. For example, an *a priori* assumption that the temperature should be around 40 eV would imply a corresponding density of around 3.4 g cm^{−3} and an ionization of $Z \sim 2.3$. However, assuming the temperature to be around 0.15 eV would yield a density estimate in excess of 7 g cm^{−3} and an ionization of 1. Note that both of these contradictory estimates rely on comparable agreement with the experimental data. This exemplifies the insidious problem where prior assumptions (the bias of the modeler) can completely determine outcomes, irrespective of the underlying experimental data. It is therefore important to evaluate the posterior distribution probabilities rather than simply relying on agreement between simulation and experiment. The posterior distribution allows us to quantify which plasma conditions are most likely and to estimate uncertainties. From the histograms in Fig. 2, we find mean values of $T = 15^{+21}_{-14}$ eV, $Z = 1.8^{+0.5}_{-1.0}$, and $\rho = 5^{+8}_{-4}$ g cm^{−3}. The quoted uncertainties are 95% confidence intervals and are significantly larger than those originally reported. We note that the distributions show a complex mode structure, which is not well captured by a single mean value for each parameter. While intriguing, a more detailed discussion of what this implies for the underlying physics of the experiment is beyond the scope of this paper.
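Asymmetric best-estimate values of the form quoted above can be read off the posterior samples directly from percentiles. A minimal sketch, using synthetic log-normal samples in place of the real MCMC output:

```python
import numpy as np

def credible_interval(samples, level=0.95):
    """Median with asymmetric bounds covering `level` of the posterior mass."""
    lo, mid, hi = np.percentile(samples, [50 * (1 - level),
                                          50,
                                          100 - 50 * (1 - level)])
    return mid, mid - lo, hi - mid   # value, minus error, plus error

# Synthetic, right-skewed "posterior" standing in for the MCMC temperature samples
rng = np.random.default_rng(2)
temps = rng.lognormal(mean=np.log(15.0), sigma=0.6, size=50_000)

value, minus, plus = credible_interval(temps)
print(f"T = {value:.1f} -{minus:.1f} +{plus:.1f} eV")
```

For a skewed, multimodal posterior the upper and lower errors differ, which is why the paper's values are quoted with separate plus and minus bounds rather than a single symmetric error bar.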

### B. X-ray emission spectroscopy

X-ray emission spectroscopy is a popular diagnostic technique across the physical sciences, widely used in astrophysical investigations,^{29–31} for elemental analysis in laser-induced breakdown spectroscopy,^{32} applied condensed matter physics,^{33,34} fundamental plasma physics,^{35,36} laser-plasma interactions,^{37–39} and inertial confinement fusion (ICF) research.^{40–43}

To explore the role of inverse problem instabilities in the interpretation of spectroscopic data, we will examine emission spectroscopy used in ICF experiments on the National Ignition Facility (NIF).^{42,44} Specifically, it was observed in these experiments that the mixing of cold material into the hotspot of an imploding capsule via hydrodynamic instabilities can quench the ignition process by enhancing radiative and conductive losses.^{45} It is thus of considerable interest to understand the mechanisms and amount of mix taking place in experiments. To this end, a novel spectroscopic model was proposed by Ciricosta *et al.*,^{46} which makes use of the spectroscopic emission signature to infer the amount of mixed mass. Here, the emission was simulated by the atomic-kinetics radiation-transfer code Cretin,^{47} extensively benchmarked against large-scale hydrodynamic simulations. The simulation is a forward model that takes as input ten parameters describing the state of the imploding capsule, including a jet of cold material driven toward the core. It produces a synthetic x-ray emission spectrum that can be compared with experiment. A comprehensive description of the ten parameters and their significance is beyond the scope of this paper, but can be found in Ref. 46. For the purpose of the present discussion, we will focus only on inferring the mass of cold material that mixes into the hotspot (mixed mass) by searching for matching simulated and experimental emission spectra.

As previously, we start by minimizing the sum of squared residuals between the experimental data and the simulation results using the CMA-ES optimization algorithm, using the forward model from the study by Ciricosta *et al.*^{46} to simulate the experimental data of Regan *et al.*^{42} As can be seen in Fig. 3(a), there is a difference in the $K\alpha $ peak position around 9.9 keV between the simulated and experimental results. As this is due to difficulties in calculating accurately the transition energy in the atomic physics model,^{46} we shift the simulated $K\alpha $ peak in the calculation of the cost function to match the experimental $K\alpha $ peak position and thus avoid introducing spurious fitting effects due to forward model inaccuracies. The best fit result is then again used as a starting point for the MCMC sampler to calculate posterior distributions of the modeling parameters, via the ensemble MCMC algorithm described in Ref. 9 using only the stretch move. We have parallelized the simulations by running multiple samplers separately. We again assume a uniform distribution of non-negative priors (no bias). The full range of the 10 parameters used in the search is provided in the supplementary material.

The emission spectra are shown in Fig. 3(a), where we compare the experimental data with the best-fit result and deduced mixed mass from Ref. 46, alongside two of our calculations that yield the largest and smallest mixed mass estimates. All simulations shown are contained within the experimental error bars. Regan *et al.* initially estimated a mixed mass of $34^{+50}_{-13}$ ng by fitting only the Ge $He\alpha$ complex, while Ciricosta *et al.* estimated a mixed mass of 35 ng by fitting to the whole spectrum shown in Fig. 3, estimating their uncertainty to be within ∼80%. In contrast, we find spectra that match the experimental data with associated values of the mixed mass spanning the entire range between 20 and 117 ng. Nevertheless, as shown in Fig. 3(b), our posterior probability distribution for the mixed mass exhibits a distinct peak around 35 ng, a value largely in line with previous estimates. Specifically, we find the mixed mass to be $38^{+27}_{-14}$ ng, with a 95% confidence interval. So while a wide range of mix values is consistent with the data, certain values are more probable than others. This observation seems encouraging for the exploitation of x-ray emission spectroscopy to measure the mixed mass in ICF experiments, despite the intrinsically large uncertainties of the approach. Estimates made during the National Ignition Campaign place a mixed-mass limit of around 75 ng for a hotspot of $\sim 2 \times 10^{4}$ ng before significant implosion efficiency deterioration.^{48} Diagnostics are therefore needed that are accurate to well within that range, and this evaluation of x-ray emission spectroscopy suggests that the inversion could be made robust to the required level.

The results presented in this 10-dimensional case used a total of ∼110 000 MCMC samples, generated at a cost of over 24 000 CPU hours. Using the technique described in Ref. 9, we find that 100 000 samples are sufficient to reconstruct the correct histogram for two benchmark 10-dimensional MCMC test cases: (1) a 10-dimensional multivariate normal distribution with strong correlations in every pair of dimensions and (2) the 10-dimensional Rosenbrock density as in Ref. 9. We have also checked the distributions after only half (50 000) of the total samples used and find that they already look very similar to the final result, both in shape and in the deduced final mean values, providing confidence that the MCMC is converged. Plots of the intermediate distributions with further details can be found in the supplementary material.
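The half-versus-full comparison can be automated with a simple histogram-distance heuristic. The Gaussian chain below stands in for real MCMC output, and this particular distance measure is our illustrative choice, not the convergence procedure of Ref. 9.

```python
import numpy as np

def histogram_distance(a, b, bins=50):
    """Maximum per-bin probability-mass difference between two sample sets,
    computed on a common grid. Small values mean the histograms agree."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return np.max(np.abs(ha - hb)) * (hi - lo) / bins   # density -> bin mass

rng = np.random.default_rng(3)
chain = rng.normal(size=100_000)   # stand-in for a converged MCMC chain

half, full = chain[:50_000], chain
print(histogram_distance(half, full))   # small value: distributions agree
```

If the distance between the half-chain and full-chain histograms is already negligible, extending the run further is unlikely to change the inferred intervals.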

### C. The effect of constraints

So far we have assumed no bias on the parameters used for the simulations, other than the limits of the sampling range. However, complementary experimental diagnostics and other limitations can constrain the possible space of parameters to be explored. This is equivalent to adding a prior bias to the parameter set. It is thus interesting to see how constraints can influence posterior distribution probabilities and, through them, our conclusions about the measured outcomes. This is particularly important in the context of experimental design, where we wish to design experiments and choose diagnostics in such a way as to minimize the overall variance of the measured parameters.

We illustrate the effect of three different types of constraints on the outcome of an XRTS measurement in Fig. 4(a). The panels show the distributions of the deduced temperature, density, and ionization in the Be plasma, together with their respective median values. The constraint that the ionization of Be is always larger than 2 strongly affects the range of possible densities and noticeably raises the estimated temperature. We note that in the original work, the ionization was set to *Z* = 2 on the basis of radiation-hydrodynamic modeling.^{21} Such an assumption is seen to have a profound effect on the interpretation of the experimental results and predetermines the range of possible outcomes. In contrast, the constraint that Be is compressed above solid density is seen not to influence the results, as most simulations that match the data already require higher densities. Given the timescales of the experiment, one could also suppose the Be system to be near local thermodynamic equilibrium (LTE). Under this assumption, the range of densities and ionization again decreases but leads only to a modest change in deduced temperature. The benefit of complementary diagnostics that could provide support to any of these conditions can thus be evaluated directly from Fig. 4(a).
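On an existing set of posterior samples, a hard constraint such as $Z \geq 2$ can be applied after the fact by rejection, i.e., by filtering the samples and recomputing the statistics. A sketch with synthetic correlated samples (the linear Z-density relation below is purely illustrative, not the Be physics):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
# Toy correlated "posterior": ionization Z and density rho move together,
# standing in for the real MCMC output.
Z = rng.uniform(0.01, 4.0, size=n)
rho = 2.0 * Z + rng.normal(scale=0.5, size=n)

def constrained_stats(mask, values):
    """Median and 95% interval of `values` over the samples selected by `mask`."""
    sel = values[mask]
    return np.median(sel), np.percentile(sel, 2.5), np.percentile(sel, 97.5)

# Unconstrained vs. constrained (Z >= 2) density estimate
print(constrained_stats(np.ones(n, dtype=bool), rho))
print(constrained_stats(Z >= 2.0, rho))   # constraint shifts and narrows rho
```

Because the parameters are correlated, constraining one (here Z) both shifts and narrows the credible interval of another (here the density), which is exactly the behavior seen in Fig. 4(a).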

The effects of parameter constraints in inverting x-ray spectroscopy models are shown in Fig. 4(b). Here, we assume that, for example, complementary diagnostics in an ICF implosion experiment are able to limit the possible variation of the temperature or mass of the hotspot or the density of the cold material in the capsule. Such additional data, for example, obtained from a fusion neutron spectrum, can then be integrated into the parameter search by restricting the parameter space in the implosion model. As we can see, the expectation value of the mixed mass seems to be robust to the addition of constraints to the MCMC calculation, and the uncertainties are lowered by over a factor of two, provided that the hotspot temperature can be determined to within ∼10%. Our approach thus allows us not only to integrate various diagnostic outputs into the evaluation of specific observables but also to estimate what additional parameters need to be measured, and with what accuracy, for the spectroscopic measurement to be robust.

We note that the uncertainties quoted in the literature are generally much smaller than those deduced via our MCMC approach, often by over an order of magnitude. To understand one reason why this might happen, we plot in Fig. 5 the deduced uncertainties for the two cases examined here, obtained in two different ways. The first is taken directly from the posterior distribution shown in Figs. 2 and 3. The second is based on the smallest variation that we can make to a single parameter to produce simulations that no longer agree with the data to within the specified uncertainty—the simplest way of estimating uncertainties in fits. We find that the variation of only a single parameter significantly underestimates the uncertainty, as the other parameters are not allowed to compensate for the changes made.
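This underestimation can be demonstrated on a correlated Gaussian posterior: varying a single parameter with the others pinned probes only the conditional spread $\sigma\sqrt{1-\rho^2}$, while the true marginal uncertainty, with the other parameters free to compensate, is the full $\sigma$. A sketch with an assumed correlation of 0.95:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, corr = 1.0, 0.95
cov = sigma**2 * np.array([[1.0, corr], [corr, 1.0]])
samples = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

# "Single-parameter" uncertainty: vary x1 with x2 pinned at its best-fit
# value. For a Gaussian this is the conditional spread sigma*sqrt(1 - corr^2).
single_param = sigma * np.sqrt(1 - corr**2)

# Full posterior uncertainty: the marginal spread of x1, with x2 free to
# compensate for changes in x1.
marginal = samples[:, 0].std()

print(single_param, marginal)
```

With a correlation of 0.95, the one-parameter-at-a-time estimate is roughly a factor of three smaller than the marginal spread; the stronger the parameter correlations, the larger the underestimation.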

Performing experiments where some parameters can be explicitly constrained by the setup or by the interaction itself is clearly very appealing in the context of trying to minimize difficulties related to inverting physical models. For example, recent scattering measurements of the electrical conductivity in warm dense aluminum took place on time scales so short that the ion density remained frozen, thus removing all uncertainties from that parameter in the inversion.^{18} A similar frozen-ion constraint was also exploited in spectroscopic investigations of continuum lowering.^{49–51} Other experiments have instead added additional diagnostics such as x-ray diffraction to help place stringent experimental constraints on regions of the parameter space.^{12,23} However, such mitigation-by-design is not always possible, and some investigations suffer more strongly from inverse problem instabilities. Because the data with the highest levels of uncertainty will be most affected, experimental results with large error bands should be treated with caution, especially if used as a benchmark for physical properties such as plasma correlations^{16} or in determining the equation of state of extreme states of matter.^{22,24}

For the cases discussed here, we deliberately used small uncertainties to constrain the parameters and have neglected several additional sources of uncertainty. For example, we have ignored the presence of gradients in the samples under investigation and the possibility that the measured outcome includes contributions from systems under very different conditions.^{52} Importantly, despite our efforts, some aspects of both model inaccuracy (e.g., in collision ionization models and in atomic kinetics modeling^{18,36,53}) and model inadequacy (e.g., breakdown of the Chihara decomposition in XRTS^{54}) might not be treated reliably with our choice of log-likelihood function. Clearly, such effects can vary significantly across experiments, but their combined effect will further restrict the information that can reliably be extracted from the measurement. Nevertheless, provided that limits can be placed on these uncertainties, their effect on the posterior probability distributions can be investigated comprehensively within the present framework.

## III. CONCLUSIONS

We have presented a study of inverse problem instabilities in complex simulations with large parameter spaces, applied to the interpretation of high energy-density plasma physics experiments. We find that spectroscopic and scattering diagnostics can suffer from inverse instabilities to a significant extent. A robust interpretation of such data thus requires knowledge of the posterior probability distribution, possibly in the presence of parameter constraints. The uncertainties arising from inverse instabilities are also significantly larger than those commonly quoted in the literature. Importantly, fielding appropriate complementary experimental diagnostics can be highly beneficial in restricting the possible space of solutions. In this regard, our approach based on Bayesian inference can provide valuable guidance on the selection of diagnostics, experimental parameters, and experimental design more generally.

## SUPPLEMENTARY MATERIAL

See the supplementary material for more information on the details of our calculations, including how the choice of likelihood function affects the results, the convergence of MCMC sampling, and the range of parameters used in the physical forward models.

## ACKNOWLEDGMENTS

We thank Dr. S. Regan at the Laboratory for Laser Energetics (University of Rochester, USA) for providing support for J. T.-M. XRTS research. S.M.V. is a University Research Fellow of the Royal Society. M.F.K. and S.M.V. acknowledge support from UK EPSRC Grant No. EP/P015794/1 and from the Royal Society. T.G. acknowledges support from the Oxford University Centre for High Energy Density Science (OxCHEDS). G.G. acknowledges support from AWE plc. and the UK EPSRC (Nos. EP/M022331/1 and EP/N014472/1).

M.K., G.G., and S.M.V. are founders and directors of Machine Discovery Ltd., a University of Oxford spinout company.

## References
