Learning thermodynamically constrained equations of state with uncertainty

Numerical simulations of high energy-density experiments require equation of state (EOS) models that relate a material's thermodynamic state variables -- specifically pressure, volume/density, energy, and temperature. EOS models are typically constructed using a semi-empirical parametric methodology, which assumes a physics-informed functional form with many tunable parameters calibrated using experimental/simulation data. Since there are inherent uncertainties in the calibration data (parametric uncertainty) and the assumed functional EOS form (model uncertainty), it is essential to perform uncertainty quantification (UQ) to improve confidence in the EOS predictions. Model uncertainty is challenging for UQ studies since it requires exploring the space of all possible physically consistent functional forms. Thus, it is often neglected in favor of parametric uncertainty, which is easier to quantify without violating thermodynamic laws. This work presents a data-driven machine learning approach to constructing EOS models that naturally captures model uncertainty while satisfying the necessary thermodynamic consistency and stability constraints. We propose a novel framework based on physics-informed Gaussian process regression (GPR) that automatically captures total uncertainty in the EOS and can be jointly trained on both simulation and experimental data sources. A GPR model for the shock Hugoniot is derived and its uncertainties are quantified using the proposed framework. We apply the proposed model to learn the EOS for the diamond solid state of carbon, using both density functional theory data and experimental shock Hugoniot data to train the model and show that the prediction uncertainty reduces by considering the thermodynamic constraints.


I. INTRODUCTION
Hydrodynamics simulations, which are widely used to predict and understand the evolution of experiments in high energy density physics, inertial confinement fusion, laboratory astrophysics, and geophysics, are underpinned by equation of state (EOS) models, which are needed to relate the thermodynamic state variables of the materials of interest 1. The accuracy and precision of the EOS, and the development of methods to quantify uncertainty in EOS models, are therefore crucial concerns. Quantifying this uncertainty is a challenging task that will require novel methods.
EOS models are typically constructed using semi-empirical functions where the functional form is motivated by the physics, and the parameters are calibrated using a complex combination of experimental and first-principles simulation data from a variety of sources. Once calibrated, the EOS model can be used to interpolate and extrapolate over the wide range of input states needed in hydrodynamic simulations. The semi-empirical approach is subject to two sources of uncertainty: uncertainty in the values of the parameters in the EOS model (parameter uncertainty) and uncertainty in the form of the EOS model itself (model uncertainty). Both sources must be quantified simultaneously to give a complete picture of the total EOS uncertainty. While parametric uncertainty in the EOS has been addressed in several recent works, model uncertainty remains a significant challenge. In this work, we describe a new machine-learning-based approach to UQ which accounts for all sources of uncertainty and provides an analytical framework for combining heterogeneous data sources into a single, uncertainty-aware EOS model.
Our new approach uses Gaussian process (GP) regression 2 to construct a data-driven EOS model that automatically satisfies all thermodynamic constraints. The resulting model provides pointwise predictions in the thermodynamic state space that include both model and data uncertainty. Incorporating thermodynamic constraints ensures that predictions satisfy the underlying physics across the entire domain, thereby avoiding pathologies that lead to the failure of downstream tasks like hydrodynamics modeling. Next, we derive a GP model for the shock Hugoniot directly from the uncertain EOS. This allows us to derive a novel unified approach enabling the model to be trained from first-principles simulation data, various experimental data sources, or both. We apply the proposed method to EOS modeling for the diamond phase of carbon, for which first-principles simulation data are first used to train the model. Then, experimental Hugoniot data are integrated into the unified GP EOS to create a jointly trained uncertain EOS. The proposed model provides a powerful yet flexible non-parametric EOS that can be learned directly from heterogeneous data, obeys the important thermodynamic principles, and quantifies uncertainties that stem from noisy and sparse data from disparate sources.

A. Relevant Prior Work
In the standard setting, EOS parameter calibration depends on individual modelers who leverage domain knowledge and expertise to align EOS predictions with given experimental and simulation data. Recently, it has become common to pose the calibration process as an optimization problem and solve for the parameters that give the least prediction error compared to available data 3-6. This approach can naturally be extended to capture parametric uncertainty by considering uncertainties in the calibration data sets. Ali et al. 7 proposed a method that considers small perturbations in experimental data to calibrate model parameters using an optimization routine and propagates the experimental uncertainties through the EOS models using Monte Carlo simulation.
Brown and Hund 8 apply Bayesian model calibration to estimate parameters using dynamic material properties experiments under extreme conditions. Lindquist and Jadrich 9
Quantifying model form uncertainty, on the other hand, is more challenging since it requires exploring the infinite-dimensional space of possible functional forms that are thermodynamically constrained. Nonetheless, some work has been done to explore model uncertainty 12,13. For example, Kamga et al. 12 have performed UQ in a single model by exploring discrepancies in legacy experimental data. Gaffney, Yang, and Ali 13 used GP regression to capture model uncertainty in the EOS of B4C, accounting for the thermodynamic consistency constraint by explicitly modeling the free energy. They showed that the constraint reduces model uncertainty in the EOS by limiting the space of functions that can be fit to first-principles simulations. However, their model ignores the important thermodynamic stability constraints, which ensure that the specific heat and isothermal compressibility remain positive and whose violation will quickly cause hydrodynamic simulations to fail.

B. Uncertainty in Parametric EOS
An EOS model is a semi-empirical equation that relates a set of state variables in a material such as temperature T, mass density ρ (or volume V), pressure P, internal energy E, entropy S, etc.
The standard process of building EOS models is to leverage expert knowledge of the material state under different conditions and assume a functional form that obeys the laws of thermodynamics.
These often involve many parameters that must be carefully calibrated using a combination of experimental results and first-principles simulations. A generic EOS model may be written as
$$ F(\Theta; \alpha) = 0, \tag{1} $$
where F is a vector function relating a set of state variables, α is the set of parameters unique to the assumed EOS model, and Θ is the set of state variables (e.g. Θ = {P, V, T, E}) that the EOS relates. In hydrodynamic simulations, for example, it is common to express the EOS in terms of the volume V and temperature T in the following form:
$$ P = P(V, T; \alpha), \qquad E = E(V, T; \alpha). \tag{2} $$
Once the parameters α are learned, the EOS model can be utilized to predict the desired material state in the thermodynamic phase space.
To learn these parameters with uncertainty, we can solve the requisite inverse problem in a Bayesian setting. Here, we can determine the distribution of the parameters α conditioned on the observed data d as
$$ p(\alpha \mid d) = \frac{p(d \mid \alpha)\, p(\alpha)}{p(d)}, \tag{3} $$
where p(d|α) is the likelihood function, p(α) is the prior distribution reflecting our existing knowledge of the parameters, and p(d) is the evidence, which serves as a normalization and does not need to be computed for parameter estimation. In a general setting, this Bayesian inference problem is solved indirectly by drawing samples from p(α|d) using various Markov Chain Monte Carlo (MCMC) methods. As we will see, this Bayesian inference process can be difficult when data are limited and/or the parameter vector α is very high-dimensional.
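As a minimal sketch of the sampling step described above, the following random-walk Metropolis loop draws from an unnormalized posterior, which is why the evidence p(d) never has to be evaluated. Everything here is an assumption made for illustration: the one-parameter model P = α/V, the synthetic data, the noise level, and the prior are hypothetical and are not the EOS, data, or sampler used in this work.

```python
import math
import random


def log_posterior(alpha, data, sigma=0.5):
    """Unnormalized log p(alpha | d) = log p(d | alpha) + log p(alpha)."""
    # Hypothetical one-parameter model P = alpha / V (illustration only).
    log_like = sum(-0.5 * ((p - alpha / v) / sigma) ** 2 for v, p in data)
    # Broad zero-mean Gaussian prior on alpha (assumed).
    log_prior = -0.5 * (alpha / 100.0) ** 2
    return log_like + log_prior


def metropolis(data, n_steps=5000, step=0.2, seed=0):
    """Random-walk Metropolis sampler for the scalar parameter alpha."""
    rng = random.Random(seed)
    alpha = 1.0
    lp = log_posterior(alpha, data)
    samples = []
    for _ in range(n_steps):
        proposal = alpha + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, data)
        # Accept with probability min(1, ratio); the evidence p(d) cancels.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            alpha, lp = proposal, lp_prop
        samples.append(alpha)
    return samples


# Synthetic (V, P) pairs scattered around alpha = 10.
data = [(1.0, 10.2), (2.0, 4.9), (4.0, 2.6), (5.0, 1.9)]
samples = metropolis(data)
burned = samples[1000:]  # discard burn-in
post_mean = sum(burned) / len(burned)
```

A single scalar parameter mixes quickly with this sampler; the difficulty noted in the text arises when α has many components and the data constrain only some of them.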
UQ for existing parametric EOS models is further limited by the prescribed form of these models. Although often derived from physical principles, these models are nonetheless built upon assumptions, simplifications, and approximations of known physics, while neglecting physics that are poorly understood. Consequently, these models may be very accurate in certain regimes (e.g. of T and P) and inadequate in others. The resulting uncertainty in these predictions is referred to as model-form uncertainty and cannot be accounted for in existing parametric models. In certain cases, competing parametric models can be compared and selected using Bayesian model selection. However, this requires the computation of the evidence term (denominator in Eq. (3)), which poses significant practical challenges. Model-form uncertainty, combined with parametric uncertainty, results in a range of outputs for a fixed input state, thus yielding an ensemble of valid EOS models. Hence, it is also necessary to quantify these model-form uncertainties in a rigorous UQ framework to enhance our confidence in EOS predictions. Several approaches exist in the literature to address model-form uncertainty in Bayesian model calibration. These include the Kennedy and O'Hagan (KOH) approach 14, the hierarchical Bayesian approach 15, and the Bayesian model averaging approach 16,17, among others. Specifically, in the KOH approach, a discrepancy model is introduced into the mathematical model to address model-form uncertainty. However, these methodologies have not been extensively explored in the context of EOS models, partly due to challenges associated with their implementation and computational complexity. Our approach is complementary to these previous works, as we can use it to introduce physics constraints into the existing uncertainty frameworks as appropriate for a specific problem; for example, we could add a physically constrained GP correction term to the physics-based parametric models in the manner of KOH.
Significant research efforts have been made in quantifying parametric uncertainty; however, model-form UQ for equations of state has not received adequate attention, and only a few publications are available in the literature 12,13. The biggest hurdle in quantifying model-form uncertainty is enumerating all the potential mappings that could form physics-consistent EOSs.
In this work, we develop a framework using Gaussian process regression that automatically explores the range of physics-consistent EOS models to capture both sources of uncertainty. The proposed method is non-parametric and data-driven, yet satisfies both thermodynamic consistency and stability constraints.

C. Parametric EOS with Uncertainty: An Illustration
An example of a parametric equation of state is the Mie-Grüneisen-Debye model 18 for single-phase diamond. This model, which has been widely used for modeling materials (e.g. carbon and neon) in high-temperature and high-pressure environments, has the following form: where $P_V(V, T = 0)$ is the zero-temperature Vinet EOS 19 given by
$$ P_V(V, T = 0) = 3 K_0 \frac{1 - x}{x^2} \exp\left[\tfrac{3}{2}\left(K'_0 - 1\right)(1 - x)\right], \tag{5} $$
where $x = (V/V_0)^{1/3}$ and is the Debye thermal pressure. In total, the model has six parameters. The Vinet model has parameters $V_0$, the atomic volume, $K_0$, the bulk modulus, and $K'_0$, the pressure derivative of $K_0$, all at a reference state of ambient pressure and zero temperature. The Debye thermal pressure has an additional three parameters. The volume-dependent characteristic Debye temperature $\theta_D$ is given by: and the Debye-Grüneisen parameter is given by Fitting the model therefore requires a complicated calibration process to infer the following vector of six parameters $\alpha = \{V_0, K_0, K'_0, \theta_0, \gamma_1, q\}$ from data at various pressures and temperatures 20,21. This calibration has been performed in the literature 20, and we have further conducted a Bayesian parameter estimation using the same data, with the resulting parameter distributions shown in Figure 1. These results were generated using Markov Chain Monte Carlo (MCMC) with an affine-invariant sampler with stretch moves 22, implemented in our UQpy software 23,24.
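For concreteness, the cold-curve part of the model can be sketched in a few lines using the standard Vinet form, P = 3 K0 (1 − x) x⁻² exp[(3/2)(K0′ − 1)(1 − x)] with x = (V/V0)^(1/3). The parameter values below are round illustrative numbers chosen for this sketch, not the calibrated diamond parameters from the cited literature, and the Debye thermal pressure is deliberately left out.

```python
import math


def vinet_pressure(V, V0, K0, K0p):
    """Zero-temperature Vinet pressure (same units as K0)."""
    x = (V / V0) ** (1.0 / 3.0)
    return 3.0 * K0 * (1.0 - x) / x**2 * math.exp(1.5 * (K0p - 1.0) * (1.0 - x))


# Illustrative values only (assumed): volume in A^3/atom, modulus in GPa.
V0, K0, K0p = 5.7, 450.0, 3.5

p_ref = vinet_pressure(V0, V0, K0, K0p)          # pressure at the reference volume
p_comp = vinet_pressure(0.8 * V0, V0, K0, K0p)   # pressure at 20% compression

# Finite-difference bulk modulus at V0; it should recover K0.
h = 1e-4 * V0
dP_dV = (vinet_pressure(V0 + h, V0, K0, K0p) - vinet_pressure(V0 - h, V0, K0, K0p)) / (2 * h)
K_at_V0 = -V0 * dP_dV
```

The finite-difference check is a quick way to confirm that the parameter called K0 really is the bulk modulus at the reference volume, which is the kind of physical meaning a calibration prior should respect.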
Our Bayesian parameter estimation, with naïve priors on the parameter values, produces well-converged posterior distributions and a high-quality fit to the experimental data. Yet the Bayesian calibration is allowed to accept values in regions where the parameters have no physical meaning; for example, the posterior distributions of the $\theta_{D0}$ and q parameters extend to negative values 20 (Figure 1). These parameters describe the thermal contribution to the EOS, while data are limited to the narrow range 298 K ≤ T ≤ 900 K, and so it can be expected that this model will quickly develop inconsistencies when extended over a larger range of conditions. While this incorrect behavior would be relatively easy to correct through a prior that limits parameters to positive values, such an approach would be very difficult for more complex models where pathologies are difficult, if not impossible, to identify a priori. This is in addition to the well-known issues with scaling MCMC to large numbers of parameters and to posterior distributions with complex structure, as can be expected from poorly constrained problems.
The example in Figure 1 serves to demonstrate a key difficulty with Bayesian model calibration, even when models have a modest number of parameters. Many physics-based and empirical EOS models have a much larger number of parameters, which makes the parametric approach difficult and expensive. For example, the carbon EOS in the Radiative Emissivity and Opacity of Dense Plasmas (REODP) code 25 has 17 parameters for the diamond phase of carbon (plus 17 each for the BC8, SC, SH phases and 30 parameters for the liquid phase), which makes even deterministic calibration a massive undertaking 26. Calibrating this set of parameters using Bayesian inference is practically impossible without huge data sets and highly specialized expertise.

D. Thermodynamic constraints on EOS models
The EOS expresses the thermodynamic response of a material and so is subject to the laws of thermodynamics. These laws impose two types of constraints, often known as thermodynamic consistency and thermodynamic stability. The consistency constraint arises from the fact that changes in the various thermodynamic variables Θ are all related to changes in a single quantity, the thermodynamic potential. For an EOS of the form in Eq. (2), this potential is the Helmholtz free energy given by
$$ F = E - TS, \tag{9} $$
where E is the internal energy, T is the temperature, and S is the entropy. According to the first and second laws of thermodynamics, changes in the state variables induce a change in the free energy dF = −S dT − P dV, allowing us to express the pressure and energy by
$$ P = -\left(\frac{\partial F}{\partial V}\right)_T, \qquad E = F - T\left(\frac{\partial F}{\partial T}\right)_V. \tag{10} $$
Taking the derivatives $\partial P/\partial T$ and $\partial E/\partial V$ gives the thermodynamic consistency constraint
$$ P = T\left(\frac{\partial P}{\partial T}\right)_V - \left(\frac{\partial E}{\partial V}\right)_T. \tag{11} $$
Deviations from thermodynamic consistency represent an erroneous source or sink of heat or work in hydrocode simulations, and therefore any valid (useful) EOS model must satisfy this equality constraint. Thus, the space of functions that satisfy Eq. (11) forms the maximal set of possible EOS functions for any system and therefore provides an upper bound on EOS uncertainty 13. Note that we have chosen the Helmholtz free energy to suit the data typically used to train EOS models (which have T and V as independent variables); the above discussion can be applied to any other choice of thermodynamic potential depending on the application.
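The second set of EOS constraints, known as thermodynamic stability, is derived from the second law of thermodynamics, which requires that the Helmholtz free energy be convex in volume and concave in temperature. As a result, the isothermal compressibility ($\kappa_T$) and specific heat ($c_V$) are positive quantities, and the thermodynamic stability constraints are given as
$$ \left(\frac{\partial^2 F}{\partial V^2}\right)_T = -\left(\frac{\partial P}{\partial V}\right)_T = \frac{1}{V \kappa_T} \ge 0, \tag{12} $$
$$ \left(\frac{\partial^2 F}{\partial T^2}\right)_V = -\frac{1}{T}\left(\frac{\partial E}{\partial T}\right)_V = -\frac{c_V}{T} \le 0. \tag{13} $$
Deviations from thermodynamic stability can be catastrophic in hydrocode simulations, leading to numerical instability, and so the above stability conditions provide another important constraint on the functional space of valid EOSs.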
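The two kinds of constraint can be checked numerically for any candidate free energy. The sketch below does so for a toy monatomic ideal gas with N k_B = 1 (an assumption chosen only because its answers are known in closed form, P = T/V and E = 3T/2; it is not a model of diamond). Pressure and energy are obtained from F by central finite differences, then the consistency residual P − (T ∂P/∂T − ∂E/∂V) and the signs of ∂P/∂V and ∂E/∂T are evaluated.

```python
import math


def helmholtz(V, T):
    """Toy free energy of a monatomic ideal gas with N*k_B = 1 (assumed)."""
    return -T * (math.log(V) + 1.5 * math.log(T))


def d_dV(f, V, T, h=1e-3):
    """Central finite difference with respect to volume."""
    return (f(V + h, T) - f(V - h, T)) / (2 * h)


def d_dT(f, V, T, h=1e-3):
    """Central finite difference with respect to temperature."""
    return (f(V, T + h) - f(V, T - h)) / (2 * h)


def pressure(V, T):
    # P = -(dF/dV)_T
    return -d_dV(helmholtz, V, T)


def energy(V, T):
    # E = F - T (dF/dT)_V
    return helmholtz(V, T) - T * d_dT(helmholtz, V, T)


def consistency_residual(V, T):
    """P - [T (dP/dT)_V - (dE/dV)_T]; zero for a consistent EOS."""
    return pressure(V, T) - (T * d_dT(pressure, V, T) - d_dV(energy, V, T))


def is_stable(V, T):
    """Stability: (dP/dV)_T <= 0 and (dE/dT)_V >= 0."""
    return d_dV(pressure, V, T) <= 0.0 and d_dT(energy, V, T) >= 0.0


residual = consistency_residual(2.0, 3.0)
stable = is_stable(2.0, 3.0)
```

Because both P and E are derived from a single potential, the residual vanishes up to finite-difference error; fitting P and E independently offers no such guarantee.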

E. Gaussian process regression
The EOS model in this work is built using physically constrained Gaussian process regression (GPR). GPR is a non-parametric supervised machine learning method that is widely used to construct surrogate models for expensive physics-based models 27,28, since it can approximate complex non-linear functions with an inbuilt probabilistic estimate of prediction uncertainty. It is also easily interpretable: the Gaussian probability measure defined at each prediction point makes it straightforward to understand prediction uncertainty and establish a degree of confidence in the model. Furthermore, the model hyper-parameters establish the length-scale of the process and can be easily interpreted in terms of correlations among point predictions.
The flexibility of GPs allows them to cover a wide range of functional forms in a single model. These features make GPR an ideal choice for quantifying model uncertainty.
Formally, a GP is a stochastic process consisting of an infinite collection of random variables indexed by time or space (points in Ω), such that any finite collection of these random variables follows a multivariate Gaussian distribution. Hence, a GP can be completely defined by a joint Gaussian probability distribution over a set of functions 2. A single function f drawn from the set of admissible functions is known as a realization of the GP and, in our case, represents one possible model for the EOS. Our task in GPR is to identify the appropriate joint Gaussian probability distribution that best represents a set of available data.
Consider that, for a given set of N observations of the input x, i.e. $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$, $x^{(i)} \in \mathbb{R}^d$, we have the respective output vector $y = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^\top$, $y^{(i)} \in \mathbb{R}$. We aim to use a GP to approximate the underlying function
$$ y = f(x). \tag{14} $$
In a Bayesian framework, we start by assuming a prior for the GP Y(x) as
$$ Y(x) \sim \mathcal{GP}\left(\mu(x), K(x, x')\right), \tag{15} $$
where $\mu(\cdot): \mathbb{R}^d \to \mathbb{R}$ and $K(\cdot, \cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ are the mean function and covariance function, respectively, defined as
$$ \mu(x) = \mathbb{E}[Y(x)], \qquad K(x, x') = \mathbb{E}\left[(Y(x) - \mu(x))(Y(x') - \mu(x'))\right]. \tag{16} $$
The covariance function, selected as a positive definite kernel, defines the degree of linear dependence between the output values computed at input points x and x′. Typically, the closer two points are in the input space (by some measure, e.g. Euclidean distance), the more strongly correlated they are in the output space. A variety of kernel functions are available in the literature 2.
Throughout this work, we will use the squared exponential covariance kernel with noise, given by
$$ K(x, x') = \sigma^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2\, \delta_{x, x'}, \tag{17} $$
where $l$, $\sigma^2$, $\sigma_n^2$, and $\delta_{x,x'}$ are the length-scale (correlation length, input scale), signal variance (output scale), Gaussian noise variance, and Kronecker delta function, respectively. Generally, $\theta = (\sigma, l, \sigma_n)$ denotes the set of hyper-parameters that are estimated from the training data.
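Next, define the matrix $\mathbf{K} = [K(x^{(i)}, x^{(j)})]_{ij}$, the mean vector $\boldsymbol{\mu} = [\mu(x^{(1)}), \mu(x^{(2)}), \ldots, \mu(x^{(N)})]^\top$, and the kernel vector $\mathbf{k}_* = [K(x^*, x^{(1)}), \ldots, K(x^*, x^{(N)})]^\top$. The posterior predictive distribution of the output $y^*$ for a new test input $x^*$ conditioned on the training data set (X, y) is given by
$$ y^* \mid X, y, x^* \sim \mathcal{N}\left(m(x^*), s^2(x^*)\right), \tag{18} $$
where
$$ m(x^*) = \mu(x^*) + \mathbf{k}_*^\top \mathbf{K}^{-1} (y - \boldsymbol{\mu}), \qquad s^2(x^*) = K(x^*, x^*) - \mathbf{k}_*^\top \mathbf{K}^{-1} \mathbf{k}_*. \tag{19} $$
The mean $m(x^*)$ and variance $s^2(x^*)$ are determined by estimating the hyper-parameters θ.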
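One popular approach to determine the optimal θ is to minimize the negative marginal log-likelihood 2, given by
$$ -\log p(y \mid X, \theta) = \frac{1}{2}(y - \boldsymbol{\mu})^\top \mathbf{K}^{-1} (y - \boldsymbol{\mu}) + \frac{1}{2}\log\lvert\mathbf{K}\rvert + \frac{N}{2}\log 2\pi. \tag{20} $$
This is performed by using a numerical optimizer. For more details on standard GPR and its implementation, we refer the reader to the textbook by Williams and Rasmussen 2.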
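The standard GPR prediction step can be sketched without any external library. The example below assumes a zero prior mean, scalar inputs, and fixed (not optimized) hyper-parameters; the helper names `sq_exp_kernel`, `solve`, and `gp_predict` are hypothetical and introduced only for this illustration. It returns the posterior mean k*ᵀK⁻¹y and the noise-free posterior variance k(x*, x*) − k*ᵀK⁻¹k*, with the noise variance added on the diagonal of K.

```python
import math


def sq_exp_kernel(x1, x2, sigma=1.0, length=1.0):
    """Squared exponential kernel for scalar inputs (noise term excluded)."""
    return sigma**2 * math.exp(-0.5 * (x1 - x2) ** 2 / length**2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def gp_predict(X, y, x_star, sigma=1.0, length=1.0, sigma_n=1e-3):
    """Posterior mean and latent variance at x_star for a zero-mean GP."""
    n = len(X)
    # Training covariance with the noise variance on the diagonal.
    K = [[sq_exp_kernel(X[i], X[j], sigma, length) + (sigma_n**2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(x_star, xi, sigma, length) for xi in X]
    weights = solve(K, y)     # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(ks * w for ks, w in zip(k_star, weights))
    var = sq_exp_kernel(x_star, x_star, sigma, length) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


X_train = [0.0, 1.0, 2.0]
y_train = [0.0, 0.8, 0.9]
mean_at_data, var_at_data = gp_predict(X_train, y_train, 1.0)
mean_far, var_far = gp_predict(X_train, y_train, 10.0)
```

Near a training point the variance collapses to roughly the noise level, while far from the data it returns to the prior signal variance; this is the behavior that makes the GP variance usable as a model-uncertainty estimate.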
The standard GPR output is unconstrained, making it impractical for physics-based models such as EOS. Recently, several techniques have been developed to incorporate physical constraints on the GPR output 29. In our work, we employ two approaches to incorporate the thermodynamic consistency and stability constraints described in Section II D. The first approach is based on the work by Jidling et al. 30, where the kernel is modified to incorporate known linear operator constraints. We use this approach to design a specialized kernel that encodes the desired thermodynamic consistency constraint. The second approach was recently proposed by Pensoneault, Yang, and Zhu 31 to incorporate inequality-type constraints by minimizing the negative marginal log-likelihood function (Eq. (20)) while requiring that the probability of violating the constraints is small. We use this approach to impose thermodynamic stability constraints. The mathematical details of incorporating these approaches in our proposed framework are described next.

III. MATHEMATICAL FORMULATION OF THE CONSTRAINED GP EOS
In the following sections, we present a novel constrained GPR framework to build an uncertain EOS model constrained by the laws of thermodynamics. We first construct a GP EOS model constrained by the thermodynamic consistency and stability constraints presented in Section II D. We then use the resulting model to derive a GP model for the shock Hugoniot with uncertainty. Finally, we present a unified GPR that can be jointly trained from first-principles simulations and experimental shock Hugoniot data.

A. Thermodynamically constrained GP EOS model
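Let $X = \{(V^{(i)}, T^{(i)})\}_{i=1}^{N} \in \mathcal{X}$ be N input data points from the index set $\mathcal{X}$, with the corresponding pressure and energy observations P and E. We assume a GP prior for the Helmholtz free energy as
$$ F \sim \mathcal{GP}\left(\mu_F(X), k_{FF}(X, X')\right). \tag{21} $$
Using Eq. (10), we can define a linear operator as
$$ \mathcal{L}_X = \begin{bmatrix} -\dfrac{\partial}{\partial V} \\[2mm] 1 - T\dfrac{\partial}{\partial T} \end{bmatrix}. \tag{22} $$
Since GPs are closed under linear operations, we can derive the joint GP priors for P and E as 30
$$ \begin{bmatrix} P \\ E \end{bmatrix} = \mathcal{L}_X F \sim \mathcal{GP}\left(\mathcal{L}_X \mu_F,\; \mathcal{L}_X k_{FF} \mathcal{L}_{X'}^{\top}\right), \tag{23} $$
which can be rewritten as, and ensures that the thermodynamic consistency constraint is guaranteed. For notational simplicity, let us denote Eq. (24) as, From this joint GP, the prediction $P^*, E^*$ at a new point $X^*$ can be calculated by conditioning as Again for notational simplicity, let us denote the block covariance matrix in Eq. (26) by Further conditioning on the training data, we obtain the following GP model, The negative log marginal likelihood of the joint GP is given by,
Next, we enforce the thermodynamic stability constraints (Eqs. (12) and (13)) by limiting the functional space through constrained hyper-parameter optimization 31. We obtain the hyper-parameters by minimizing the negative marginal log-likelihood function in Eq. (28) while requiring that the probability of violating the thermodynamic stability constraints is small. Formally, for 0 < η ≪ 1, we impose the following probabilistic constraints at virtual locations $X_v$ in the input domain:
$$ \Pr\left[\left.\frac{\partial P^*}{\partial V}(X_v) \le 0 \,\right|\, X_v, P, X\right] \ge 1 - \eta, \tag{29} $$
$$ \Pr\left[\left.\frac{\partial E^*}{\partial T}(X_v) \ge 0 \,\right|\, X_v, E, X\right] \ge 1 - \eta. \tag{30} $$
Since $\partial P^*/\partial V \mid X_v, P, X$ and $\partial E^*/\partial T \mid X_v, E, X$ follow a Gaussian distribution, the constraints in Eqs. (29) and (30) can be simplified as
$$ m_{\partial_V P}(X_v) - \Phi^{-1}(\eta)\, s_{\partial_V P}(X_v) \le 0 \tag{31} $$
and
$$ m_{\partial_T E}(X_v) + \Phi^{-1}(\eta)\, s_{\partial_T E}(X_v) \ge 0, \tag{32} $$
where m and s denote the posterior mean and standard deviation of the indicated derivative and $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. By minimizing the objective function Eq. (28) subject to the constraints Eq. (31) and Eq. (32), we can obtain a set of hyper-parameters, θ, that ensures the resulting GP EOS model (Eq. (27)) satisfies both the thermodynamic consistency and stability constraints.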
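The way a probabilistic inequality on a Gaussian quantity reduces to a deterministic one can be sketched as follows. For Z ~ N(m, s²), requiring Pr[Z ≥ 0] ≥ 1 − η is equivalent to m + Φ⁻¹(η) s ≥ 0, and requiring Pr[Z ≤ 0] ≥ 1 − η is equivalent to m − Φ⁻¹(η) s ≤ 0. The function below applies these two checks at a set of virtual points; the function name, argument names, and numerical values are assumptions made for this illustration, and in practice the means and standard deviations would come from the trained GP for the derivatives ∂P/∂V and ∂E/∂T.

```python
from statistics import NormalDist


def stability_satisfied(mean_dPdV, std_dPdV, mean_dEdT, std_dEdT, eta=0.025):
    """Check chance constraints Pr[dP/dV <= 0] and Pr[dE/dT >= 0] >= 1 - eta."""
    z = NormalDist().inv_cdf(eta)  # Phi^{-1}(eta), negative for small eta
    # Pressure: upper confidence bound of dP/dV must stay non-positive.
    pressure_ok = all(m - z * s <= 0.0 for m, s in zip(mean_dPdV, std_dPdV))
    # Energy: lower confidence bound of dE/dT must stay non-negative.
    energy_ok = all(m + z * s >= 0.0 for m, s in zip(mean_dEdT, std_dEdT))
    return pressure_ok and energy_ok


# Hypothetical derivative statistics at two virtual points.
ok = stability_satisfied([-5.0, -3.0], [1.0, 1.0], [4.0, 2.5], [1.0, 1.0])
# A mean of 1.0 with unit standard deviation is too close to zero for eta = 0.025.
violated = stability_satisfied([-5.0], [1.0], [1.0], [1.0])
```

In a constrained optimization these inequalities would be passed to the optimizer as constraints on the hyper-parameters rather than checked after the fact.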

B. Hugoniot derivation from the GP EOS model
In this section, we first derive the Hugoniot function (H) as a GP from the constrained GP EOS model described in Section III A. Then, we obtain the probabilistic set of so-called Hugoniot points satisfying H(V, T) = 0.
From Eqs. (A.3) and (A.4), the H GP prior is given as The predictive distribution of H at test points $X^*$ is given by Using Eq. (34), we can define a subset $\mathcal{X}_H \subset \mathcal{X}$ such that, for all $X_H \in \mathcal{X}_H$, $H(X_H) = 0$ lies within the (1 − α) × 100% confidence interval of the GP for H. We can achieve this by defining the standardized GP where In other words, the points $X_H$ satisfy the following condition Given an arbitrary point $X_H \in \mathcal{X}_H$, we can therefore establish the predictive distribution for pressure and internal energy at this point, $P_H$ and $E_H$, using Eq. (27), thus providing an estimate of the uncertain Hugoniot curve satisfying $H(X_H) = 0$.
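The selection of Hugoniot points can be sketched as a simple filter on the GP prediction for H. The example assumes the standard Rankine-Hugoniot energy relation, H = E − E0 − ½(P + P0)(V0 − V), as the Hugoniot function (the appendix equations cited above are not reproduced here, so this form is an assumption), and keeps every state whose predicted H is statistically indistinguishable from zero, i.e. |m_H| ≤ z s_H with z the two-sided standard normal quantile. All numerical values are hypothetical.

```python
from statistics import NormalDist


def hugoniot_function(P, E, V, P0, E0, V0):
    """Rankine-Hugoniot energy residual; zero on the Hugoniot (assumed form)."""
    return E - E0 - 0.5 * (P + P0) * (V0 - V)


def hugoniot_candidates(points, mean_H, std_H, alpha=0.05):
    """Keep states where H = 0 lies inside the (1 - alpha) interval of the H GP."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return [p for p, m, s in zip(points, mean_H, std_H) if abs(m) <= z * s]


# Hypothetical (V, T) test states with predicted mean and std of H.
states = [(4.0, 3000.0), (4.2, 3000.0), (4.4, 3000.0)]
selected = hugoniot_candidates(states, [0.1, 3.0, -1.5], [1.0, 1.0, 1.0])
```

Because the filter depends on the predictive standard deviation, regions where the EOS is poorly constrained yield a wider band of admissible Hugoniot states.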

C. Unified GP EOS model learned from multiple data sources
In this section, we propose a unified framework to train the proposed constrained GP EOS model using heterogeneous data sources. In particular, we show that the EOS model can be learned from a combination of first-principles simulation data and experimental shock Hugoniot observations. Let us define the model outputs as P, E, and H, for respective inputs $X_P$, $X_E$, and $X_H$. The joint GP of P, E, H is then defined as follows We can train the joint GP using any combination of available data and impose the thermodynamic constraints, following the steps described in Section III A, to obtain the predictive distribution of the joint GP of P, E, and H at any test point $X^*$. Further, we can condition on H by first partitioning the block covariance of the joint GP in Eq. (37) in a similar manner as done previously. We denote this block covariance by Conditioning on H gives the resulting conditional distribution Next, conditioning on a set of training points $X_H$ constrained by $H(X_H) = 0$ (e.g. from experimental data collected along the Hugoniot curve) yields Eq. (39), from which the predictive distribution of P and E at $X_H$ follows. Using these equations, it is now possible to learn the thermodynamically constrained GP EOS model using a combination of experimentally observed points along the shock Hugoniot and first-principles calculations that relate P, V, T, E, as we demonstrate in the next section.
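The conditioning step relies on the standard formula for a partitioned Gaussian: if (a, b) are jointly Gaussian, then a | b = b_obs has mean μ_a + Σ_ab Σ_bb⁻¹ (b_obs − μ_b) and variance Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba. A scalar sketch, with a standing in for a pressure prediction and b for the Hugoniot function observed to be zero, is given below; the numbers are hypothetical and the scalar blocks are a simplification of the block matrices in the joint GP.

```python
def condition_gaussian(mu_a, mu_b, s_aa, s_ab, s_bb, b_obs):
    """Mean and variance of a | b = b_obs for a bivariate Gaussian."""
    gain = s_ab / s_bb
    mean = mu_a + gain * (b_obs - mu_b)
    var = s_aa - gain * s_ab
    return mean, var


# Hypothetical prior: pressure ~ N(100, 25), H ~ N(2, 4), covariance 6.
# Observing H = 0 (a point on the Hugoniot) shifts and sharpens the pressure.
mean_P, var_P = condition_gaussian(100.0, 2.0, 25.0, 6.0, 4.0, 0.0)
```

The variance can only decrease under conditioning, which is how experimental Hugoniot observations tighten the EOS prediction even though they do not measure P and E at arbitrary states.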

IV. RESULTS AND DISCUSSION
In this section, we apply the proposed constrained GPR framework described in Section III to learn the EOS for the diamond phase of carbon. We use a sample of 20 data points obtained from Density Functional Theory Molecular Dynamics (DFT-MD) simulations from Benedict et al. 26 to train the GP EOS model. We first build an uncertain EOS model that satisfies both the thermodynamic consistency and stability constraints. We then derive the Hugoniot function GP from the EOS model and obtain the resulting shock Hugoniots with uncertainty. Finally, we train the unified GP EOS model using the simulation data and a limited number of experimental Hugoniot data from laser-driven shock compression experiments 32.
These results are supported by plotting the thermodynamic stability conditions in the (V, T) space in Figures 3(a) and (b) for the constrained P-GP and E-GP, respectively. We see that the thermodynamic stability constraints are satisfied across the domain in the proposed constrained EOS model. Meanwhile, Figures 3(c) and (d) show the stability constraints for the unconstrained P-GP and E-GP EOS models, respectively, where we see that the energy stability constraint (Eq. (32)) is violated in regions characterized by high temperatures across volumes within the domain. This results in a negative specific heat in these regions, which will likely cause a hydrodynamics simulation to crash. The pressure stability constraint, on the other hand, is not violated even in the unconstrained GP. However, we note that the unconstrained GP EOS does not satisfy the thermodynamic consistency constraint because the covariance model is not physics informed.
EOS. As can be observed from Figure 4(a), the percentage error in pressure remains relatively low across most regions in the (V, T) space. In specific regions characterized by high volume and low temperature, as well as high volume and temperature, there is a moderate deviation, typically ranging between 5% and 10%. For these regions, our pressure GP EOS also gives the highest uncertainty. Similarly, in Figure 4(b), the percentage error in energy is relatively low across most (V, T) regions, with a moderate deviation (around 3% to 5.5%) for high volume and temperature. Again, our energy GP EOS gives the highest uncertainty for this region. This indicates the effectiveness of our model in accurately representing the characteristic uncertainty in the model.
We also obtained the specific isochoric heat capacity ($c_V$) and isothermal bulk modulus ($K_T$) from the Benedict EOS 26 and the proposed GP EOS model. These quantities, which were not explicitly included in the training of our model but are derived from it by numerical differentiation, serve to further validate the GP EOS and its uncertainties. In Figure 4(c), we plot the isothermal bulk modulus against volume. The figure shows that the mean $K_T$ obtained from our model aligns well with the values derived from the Benedict EOS 26. Any observed deviation is effectively characterized within the 95% confidence bounds. Additionally, it can be seen that $K_T$ does not vary appreciably with temperature within the specified temperature range.
Similarly, for $c_V$ we can observe good agreement between the mean $c_V$ obtained from our model and the Benedict EOS in Figure 4(d). As pointed out in the literature 26,34, the specific heat is, surprisingly, observed to be indistinguishable from $3k_B$/atom up to 20,000 K and is also very nearly independent of volume. Our model captures these observations within the 95% confidence intervals, but suggests there is significant uncertainty in $c_V$, particularly at high temperature. This uncertainty results from seemingly small uncertainties in the EOS that grow considerably when

B. GP Hugoniot Curve for Diamond
The GP Hugoniot function, H(V, T), is derived from the physics-constrained GP EOS model following the formulation presented in Section III B and illustrated in Figure 5. The input states (V, T) that satisfy Eq. (36), for which H = 0 lies in the 95% confidence interval of the H-GP, are also shown in Figure 5(a). These Hugoniot points are plotted separately in Figure 5(b), and a deterministic curve is fit to establish a mapping $V_H \to T_H$. For these Hugoniot points ($V_H$, $T_H$), the predictive distribution of pressure and energy is computed using Eq. (27). The resulting Hugoniot points are shown in Figure 6

C. Unified GP EOS for Diamond Trained from Simulations and Experiments
In this section, we present a unified physics-informed GP EOS for the diamond phase of carbon that is trained from multiple data sources and provides accurate EOS predictions with uncertainty, as described in Section III C. The DFT-MD simulation data (the same 20 data points used above) are augmented by three Hugoniot data points from dynamic shock wave experiments that relate pressure and volume 32. These data are from a pressure regime where diamond has significant strength, meaning that we expect to see a discrepancy between simulation and experiment, which provides an interesting test for our unified modeling approach. Additionally, the temperature is not usually measured in these high-pressure experiments (except in those with static compression), which provides another difficulty to overcome with our approach. The GP EOS model trained in the previous sections can be used to estimate the temperature for a given volume using the mapping ($V_H \to T_H$) shown in Figure 5(b). We apply a deterministic mapping here,
between model and experiment is desired, various approaches from the literature can be considered to address the inconsistency. For example, a specific analysis may choose a suitable kernel (e.g., a Matérn kernel with a higher smoothness parameter), more robust likelihood functions (e.g., a Huber likelihood), regularization, and cross-validation, among others 2, while still ensuring physical consistency in the manner we have described. Finally, as shown in Figure 8, the unified GP EOS still satisfies the thermodynamic stability constraints as necessary. We have therefore developed a comprehensive unified framework that has been successfully trained using both first-principles simulation data and experimental shock compression data.

V. CONCLUSION
In this work, we have developed a novel data-driven framework to construct thermodynamically constrained equation of state (EOS) models with uncertainty. The proposed framework is based on a non-parametric constrained Gaussian process regression (GPR), which inherently captures the model and data uncertainties while satisfying the essential thermodynamic stability and consistency constraints. Violation of these constraints results in a non-physical EOS that will cause problems in downstream applications such as hydrodynamics simulation. The key benefits of using GPR to build the EOS model are that it can be trained on relatively small data sets compared to other machine learning methods like neural networks and that it automatically estimates the prediction uncertainty. The resulting EOS model also yields a GP for the shock Hugoniot with uncertainty, which has been derived herein. Further, we proposed a unified framework such that the GP can leverage both simulation and experimental data and provides pointwise EOS predictions with uncertainty. The resulting EOS can therefore be directly incorporated into hydrocode simulations for

We have specifically demonstrated the training of this physics-constrained GP EOS for the diamond phase of carbon from first-principles DFT-MD simulations. We show that the model satisfies the thermodynamic constraints, which also results in a reduction in uncertainty at certain points. In short, we show that considering thermodynamic constraints improves confidence in EOS predictions and supplements limited data. We then derive the Hugoniot for diamond from the GP EOS and demonstrate that the trained model can be augmented with experimental shock Hugoniot data to improve the EOS, thus demonstrating our unified framework for EOS training.
The proposed framework can be similarly applied to different material phases. However, an extension of the proposed framework to a more generalized multiphase EOS model that captures phase transitions is the subject of further work. More broadly, we anticipate that the proposed framework opens the door to a wide range of improvements in EOS modeling and the associated downstream applications. For example, in future studies the prediction uncertainty can be used to inform the choice of points for new simulations/experiments through, e.g., Bayesian optimization, resulting in smaller data set requirements and thus accelerating the development of new EOSs. Moreover, the proposed framework can be integrated with physics-based parametric models, similar to the KOH approach 14, such that the physically constrained GP serves as a correction to the parametric model and potentially facilitates multi-fidelity modeling. Finally, these physics-informed GP EOS models will be integrated into hydrocode simulations of shock experiments to enable uncertainty quantification studies.
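As a rough illustration of how the prediction uncertainty could guide the choice of the next simulation or experiment, the sketch below picks the candidate state with the largest posterior standard deviation. Several simplifications are assumed here and are not taken from the paper: maximum-variance selection (uncertainty sampling) stands in for a full Bayesian optimization acquisition function, a generic squared-exponential kernel replaces the physics-informed covariance model, the hyperparameters are fixed, and the inputs (V, T) are taken to be rescaled to the unit square. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def posterior_std(X_train, X_cand, length_scale=0.3, variance=1.0, noise=1e-6):
    """GP posterior standard deviation at the candidate points. For fixed
    hyperparameters it depends only on where the data are, not on the
    observed values."""
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand, length_scale, variance)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)              # shape (n_train, n_cand)
    var = variance - np.sum(v**2, axis=0)
    return np.sqrt(np.clip(var, 0.0, None))

# Existing (scaled) states and a grid of candidate states.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
g = np.linspace(0.0, 1.0, 11)
X_cand = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)

# Next state to simulate or measure: the most uncertain candidate.
x_next = X_cand[np.argmax(posterior_std(X_train, X_cand))]
```

In this toy setup the selected state is the corner of the domain farthest from the existing data. With a constrained, multi-output GP the same idea would apply to whichever output variance (pressure, energy, or Hugoniot) is of interest.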

APPENDIX
The Hugoniot equation can be expressed as, where $E_0$, $P_0$ and $V_0$ are the initial energy, pressure and volume, respectively.
We recognize that $H = f(P, E)$, where $f$ is a linear function of both $P$ and $E$. Applying the Taylor series expansion about its mean, we get

The cross-covariance of the H and P GPs is given by

$$K_{HP}(X, X') = \frac{1}{2}(V - V_0)\, K_{PP}(X, X') + K_{EP}(X, X') \tag{A.5}$$

Similarly, we can obtain the cross-covariance between the H and E GPs as

$$K_{HE}(X, X') = K_{EE}(X, X') + \frac{1}{2}(V - V_0)\, K_{PE}(X, X') \tag{A.6}$$

For cases where the GP EOS is intended for applications where a linear relationship may not exist, we can still derive the mean and covariance model in the same manner by considering higher-order closure terms in the Taylor series approximation. Constantinescu and Anitescu 36 demonstrated how to address non-linear dependencies between GP outputs for physics-based systems to obtain a valid covariance model. There are also recent works in the multi-output GP literature that deal with obtaining covariance models for non-linear systems while ensuring positive definiteness 37,38.
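The linear case can be checked numerically with a short sketch. It assumes that the Hugoniot function has the standard Rankine-Hugoniot energy form $H = E - E_0 + \frac{1}{2}(V - V_0)(P + P_0)$, with the sign convention inferred from Eq. (A.5), and that all kernels are evaluated on one common set of input points. The mean and covariances of H then follow by applying the linear map $A = [\mathrm{diag}(\frac{1}{2}(V - V_0)),\ I]$ to the joint (P, E) Gaussian. The function name and the numerical values are illustrative only and are not taken from the paper.

```python
import numpy as np

def hugoniot_gp_moments(V, mu_P, mu_E, K_PP, K_PE, K_EE, V0, P0, E0):
    """Mean and covariances of H = E - E0 + 0.5 (V - V0)(P + P0) when
    (P, E) are jointly Gaussian at the points with volumes V.
    Convention: K_PE[i, j] = Cov(P(X_i), E(X_j))."""
    c = 0.5 * (V - V0)
    C = np.diag(c)
    K_EP = K_PE.T
    mu_H = mu_E - E0 + c * (mu_P + P0)
    K_HP = C @ K_PP + K_EP                      # Cov(H(X_i), P(X_j))
    K_HE = C @ K_PE + K_EE                      # Cov(H(X_i), E(X_j))
    K_HH = C @ K_PP @ C + C @ K_PE + K_EP @ C + K_EE
    return mu_H, K_HH, K_HP, K_HE

# Illustrative inputs (arbitrary units, NOT diamond data).
n = 4
V = np.array([3.2, 3.6, 4.0, 4.4])
V0, P0, E0 = 5.0, 0.0, 0.0
rng = np.random.default_rng(0)
B = rng.normal(size=(2 * n, 2 * n))
K = B @ B.T                                     # a valid joint (P, E) covariance
K_PP, K_PE, K_EE = K[:n, :n], K[:n, n:], K[n:, n:]
mu_P, mu_E = rng.normal(size=n), rng.normal(size=n)

mu_H, K_HH, K_HP, K_HE = hugoniot_gp_moments(
    V, mu_P, mu_E, K_PP, K_PE, K_EE, V0, P0, E0)

# The same quantities written as a single linear map on the joint vector.
A = np.hstack([np.diag(0.5 * (V - V0)), np.eye(n)])
```

Because H is obtained from (P, E) by a linear map, `K_HH` equals `A @ K @ A.T` and is positive semi-definite whenever the joint (P, E) covariance is, which is why no additional closure terms are needed in the linear case.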
proposed a Bayesian framework to perform UQ of a multi-phase EOS model for carbon by accounting for calibration data uncertainty and yielding an ensemble of model parameter sets. Walters et al. 10 used a Bayesian statistical approach to quantify parametric uncertainty by coupling hydrocode simulations and velocimetry measurements of a series of plate impact experiments on an aluminum alloy. Robinson et al. 11 quantified uncertainties in the EOS stemming from the measurement noise in the experimental data and propagated these uncertainties through hydrocode simulations. They emphasized the need to take model uncertainty into consideration, identifying it as a significant factor.

FIG. 1: Joint probability distribution function from Bayesian calibration of the Mie-Grüneisen-Debye EOS model for diamond phase carbon. Inset table: Deterministic parameters from Dewaele et al. 20

Figures 2(a) and (b) show the marginalized pressure GP (P-GP) and energy GP (E-GP) as a function of the state variables $(V, T)$, respectively, trained from 20 DFT-MD simulations, where uncertainties are shown with colors denoting the standard deviation. We clearly see that the uncertainties are small near the training data points and larger in regions with no data. The overall uncertainty is small, with the coefficient of variation (COV) under 7% for the P-GP and under 1.3% for the E-GP, suggesting high confidence in the predicted state. We similarly trained a GP model without imposing constraints, as shown in Figures 2(c) and (d), which show that the unconstrained EOS model has higher uncertainty in general.

Figure 3 further supports the observations made in Figure 2. The reduction in uncertainty in the constrained E-GP (Figure 2(b)) compared to the unconstrained E-GP (Figure 2(d)) is due to the introduction of the energy stability constraint. Moreover, since the proposed GP has a physics-informed covariance model, which yields valid cross-covariances between the pressure and energy GPs, improvement in the prediction of the E-GP also improves predictions of the P-GP even though

FIG. 3: Plots of the thermodynamic stability conditions in the $(V, T)$ space. Regions of constraint violation are shown in red. (a) Probabilistic stability constraint for the constrained pressure GP EOS model, showing that the constraints are not violated. (b) Probabilistic stability constraint for the constrained energy GP EOS model, showing that the constraints are not violated. (c) Probabilistic stability constraint for the unconstrained pressure GP EOS model, showing that this specific constraint is not violated even when unconstrained. (d) Probabilistic stability constraint for the unconstrained energy GP EOS model, showing that the constraint is violated in regions characterized by high temperatures across volumes within the domain.

FIG. 5: (a) GP Hugoniot function, $H(V, T)$, showing in black the points where $H = 0$ lies within the 95% confidence interval of the GP. (b) Hugoniot points where $H = 0$ lies within the 95% confidence interval of the GP, shown in the $(V, T)$ plane.

FIG. 6: Plots of the uncertain Hugoniot curve. The width of the curve represents uncertainty in the position of the Hugoniot (i.e., satisfying $H = 0$ with 95% confidence) along the EOS, and the corresponding uncertainty in pressure is shown with colors denoting the standard deviation. (a) Complete Hugoniot in $(V, T, P)$ space showing all points along the EOS at which $H = 0$ lies within the 95% confidence interval of the Hugoniot GP. (b) Uncertain Hugoniot curve as a function of volume. (c) Uncertain Hugoniot curve as a function of temperature.

FIG. 7: Unified constrained GP EOS model trained on 20 DFT-MD simulation data points and 3 shock compression experimental data points relating pressure and volume. (a) Pressure marginal GP EOS model. (b) Energy marginal GP EOS model.