Data-driven machine learning techniques can be useful for the rapid evaluation of material properties in extreme environments, particularly in cases where direct access to the materials is not possible. Such problems occur in high-throughput material screening and material design approaches where many candidates may not be amenable to direct experimental examination. In this paper, we perform an exhaustive examination of the applicability of machine learning for the estimation of isothermal shock compression properties, specifically the shock Hugoniot, for diverse material systems. A comprehensive analysis is conducted in which the effects of scarce data, variances in the source data, feature choices, and model choices are systematically explored. New modeling strategies based on feature engineering, including a feature augmentation approach, are introduced to mitigate the effects of scarce data. The findings show significant promise for machine learning techniques in the design and discovery of materials suited for shock compression applications.

Advances in the understanding of shock compression mechanisms in materials and flows have enabled impressive successes in diverse applications, including drug delivery,1 destruction of kidney stones,2 analysis of meteorite impacts on planetary surfaces,3,4 the study of volcanoes,5 and laser ablation.6 Fundamental to the study of such mechanisms is knowledge of material- and condition-dependent behaviors such as the shock velocity, particle velocity, and pressure behind a shock wave. The equations of state and the conservation laws for mass, momentum, and energy encapsulate this knowledge and provide the formal shock Hugoniot/equation of state (SH/EOS) relationships that govern shock compression behaviors through a range of thermodynamic conditions.

In scenarios where rapid evaluation of notional material designs for SH/EOS are needed or where experiments are prohibitively difficult to conduct, data-driven methods may serve an important role. Notional materials are materials that may be inaccessible, have no known synthesis route, or cannot be studied either in silico or in situ for other reasons. For example, the material may not exist terrestrially, be sensitive to strongly exothermic reactions,7 or be an operando biological material.8 Determination of the SH/EOS relationships for general media can require a significant amount of information that often necessitates full realization of the material and extensive measurements. In the context of optimization or design of materials, the cost and time required to obtain this information, if such data can be obtained at all,9 can greatly inhibit exploration of the material design space.

Two traditional approaches are available for the determination of SH/EOS relationships.10–13 The first is empirical14–16 and the second is via physics-based models.12,13,17–22 To varying degrees, both generally reduce to a parameterization of an underlying constitutive relationship.11,23 Though empirical models are popular for data analysis and extrapolation, constructing these models necessarily requires reliable source data. Multiscale simulations can overcome some of the more costly or time-consuming elements of experiments, but only when the idealizations required in the models are valid, such as a gas phase assumption or periodic unit cell boundary conditions.24 Today, the experimental and multiscale approaches are mainstays, each associated with a prolific literature, but both require information that can be difficult to acquire for notional material candidates.

Multiscale modeling is used extensively for the study of materials in extreme environments.25–27 Studies exist based on molecular dynamics17,18,28–30 and electronic structure methods.20–22,31,32 Yet, there are significant challenges in developing techniques that use multiscale methods as the basis for rapid screening of notional materials. First, the range of materials used in the context of shock compression is not limited to elemental crystals and alloys; the types of materials often used can be highly polymorphic10 and have crystal unit cells with tens or hundreds of atoms. This makes the problem cost prohibitive since structure determination33–35 and configuration optimization,36–39 well suited for the study of a single material at a time, must be scaled to screen thousands, if not millions, of compounds with an accuracy that can distinguish different crystalline polymorphs. The difficulty of this task is echoed in other fields, such as molecular simulations of folded proteins,40 where despite the ostensible maturity of computational techniques, the determination of accurate condensed phase structures starting only from a notional molecular graph can be difficult. In this area, machine learning (ML) has already been making advances.41 

Second, many mechanisms important for the performance of materials in extreme environments manifest at length scales that cannot be handled rapidly using atomistic models. These mechanisms can manifest at the microstructural scale due to, for instance, submicrometer defects or interfaces present in heterogeneous mixtures.42 At these length scales, performing a single simulation with electronic structure accuracy is prohibitively difficult, let alone performing many simulations in the context of statistical estimation over many different material and configurational combinations.

ML methods are well suited to model mechanisms or behaviors43–45 that cannot be modeled solely using physics-based approaches due to, for instance, the lack of an existing theoretical or analytic model.46–48 ML methods are also amenable to design or optimization because the models often contain an expressive response surface. For biochemical property models, ML prediction tools have been used to create surrogate models that navigate the response surface with respect to chemical composition and enable the search for molecules with optimal properties such as the water–octanol partition coefficient, drug likeness, synthetic accessibility, and (de)protonation free energy.49,50 In the present context, this means that ML methods can estimate the shock compression properties of a notional material or, inversely, find material candidates that deliver a desired shock compression performance. This inverse design ability can accelerate the development of new materials for biological, medical, material, space, and other applications.

To the best of our knowledge, no data-driven approach for the prediction of shock compression properties of notional materials is available. In this paper, we assess ML techniques for estimating the SH properties based on data at standard temperature and pressure. A comprehensive evaluation is performed, organized by data, feature, and model considerations. First, the data considerations center on scarce-data challenges and the methods that may be used to mitigate them; specifically, the union of multiple data sets and the handling of variances due to noise are systematically explored. Second, the properties of ML models depend heavily on strong feature engineering,51 which, in turn, depends on the availability of features in the data and the application of rational and deep domain knowledge by the user. Overfitting and over-parameterization52 can be mitigated by the sensible design, selection, or engineering of features. In this work, three featurization methods are considered: naïve, physics-informed, and high-throughput features. Finally, we consider both the form of the physical model, in this case the SH, and the choice of the ML model. The physical model is presumed to take the form of a polynomial relating the particle and shock velocities. The ML models considered include Gaussian process regression (GPR), ridge regression (RR), and a neural network (NN).

The rest of the paper is organized as follows. Section II describes the data and methodology details. Sections III and IV contain the results and discussions, and Sec. V is reserved for the conclusions. A brief overview of the symbols and abbreviations is given in Table S3 in the supplementary material.

The data used in the present paper come primarily from Ref. 53, a consolidated report summarizing experimental shock compression properties of a highly diverse set of 474 materials from ten different material classes: 100 materials containing a single chemical element, 51 alloys, 106 minerals, 32 rocks and mixtures of minerals, 42 plastics, 41 other synthetics, 12 woods, 26 liquids, 6 aqueous solutions, and 58 energetic materials (high explosives and propellants). The shock and particle velocity data are reported in raw form for each material along with the initial sample densities ($\rho_o$) and the longitudinal and shear acoustic velocities measured at ambient conditions. Though the sample densities were reported for all 474 materials in the data set, longitudinal and shear acoustic velocity data were reported for only 199 materials across nine different classes; the one missing class is woods, for which no acoustic velocities were reported. The set of $N_{total} = 199$ materials constitutes the principal data used in Sec. II B 1.

In Ref. 53, the initial density of a material varies substantially between measurements. The standard deviation in the initial densities is shown in Fig. 1, computed as $s_i^\rho = \sqrt{\sum_j (\rho_{ij}^o - \bar{\rho}_i^o)^2 / (n_i - 1)}$, where $i$ is the material index, $n_i$ denotes the number of data points in the SH for the $i$th material, $j \in (1, n_i)$ is the index for the experimental SH data points, and $\bar{\rho}_i^o$ is the mean value of $\rho_{ij}^o$ over $j$. $\bar{\rho}_i^o$ varies from 0.66 to 22.47 g/cm³, and the standard deviation ranges from 0 to 0.20. $s_i^\rho$ was zero for some of the materials in the data because their initial density values were the same for all state points within an SH curve. Forty-five materials have $s_i^\rho$ larger than 0.03 g/cm³; only the first six material classes are represented in this set, namely, elements, alloys, minerals, rocks and mixtures of minerals, plastics, and synthetics. This leads us to believe that this fraction of the data may reflect the effects of other components in the mixture or other morphological features. Despite such large variations, the diverse set of 199 materials is used without prejudice.
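As a brief illustration, the per-material scatter in initial density can be computed directly from the tabulated state points. The sketch below assumes a hypothetical long-format table with one row per experimental point; the column names and example values are illustrative and are not taken from Ref. 53.

```python
import pandas as pd

# Hypothetical long-format layout: one row per experimental SH point, with a
# material identifier and the initial density rho0 (g/cm^3) reported for that
# sample. Column names and values are illustrative only.
df = pd.DataFrame({
    "material": ["Cu", "Cu", "Cu", "PMMA", "PMMA"],
    "rho0":     [8.93, 8.94, 8.90, 1.186, 1.184],
})

# Sample standard deviation s_i^rho per material; ddof=1 gives the (n_i - 1)
# denominator, and materials whose rho0 is identical at every point give 0.
s_rho = df.groupby("material")["rho0"].std(ddof=1)
rho_bar = df.groupby("material")["rho0"].mean()
```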

FIG. 1. Occurrence counts of the standard deviation of initial material densities.

The data for the Grüneisen parameters used in Sec. II B 2 come from the Appendix of Ref. 54; 46 of these materials are also found in Ref. 53. Information about compositional features is also used for the ML models. The compositional features include elemental properties, molecular formulas, and other physical properties that are generally known for compositionally pure ingredients.55 The compositional features were determined for 159 of the 199 materials. The specific values and associated references are listed in the supplementary material.

Multiple feature vectors are considered in this paper and are summarized here. A feature vector is generally defined as $x \in \mathbb{R}^m$, where $m$ is the chosen number of features. Ideally, $x$ will include the most important factors needed to accurately predict the SH curve. The use of unnecessary or redundant factors results in a higher dimensionality of the representation vector, which can burden the model and increase the resource demands without significantly improving prediction accuracy. A careful design of a model's feature vector should consider the expected insight gained from the inclusion of each factor and the likely cost to acquire its values.

The various studies and their associated feature and target variables considered in the present paper are summarized in Table I.

TABLE I.

Summary of ML models and evaluations. $\rho_o$ = density (g/cm³), $v_T$ = shear or transverse sound velocity (km/s), $v_L$ = longitudinal sound velocity (km/s), $\gamma$ = Grüneisen parameter (scalar), $\bar{m}$ = LCCF atomic mass (g), $\bar{r}$ = LCCF atomic radius (pm), $\bar{\chi}$ = LCCF electronegativity (Pauling scale), $\bar{E}$ = LCCF ionization enthalpy (eV), $\bar{T}$ = LCCF melting point (K). $N_{total}$ = total number of materials in the data set for a specific trial.

No. | Study | Data | Results | $N_{total}$ and data sources | Feature sets
1 | Naïve features | Sec. II B 1 | Sec. III A | 199 materials (Ref. 53) | ($\rho_o$, $v_T$, $v_L$)
2 | Physics-informed features | Sec. II B 2 | Sec. III B 1 | 46 materials (Refs. 53 and 54) | ($\rho_o$, $v_T$, $v_L$) vs ($\rho_o$, $v_T$, $v_L$, $\gamma$)
3 | Linear combination of compositional features (LCCFs) | Sec. II B 3 | Sec. III B 2 | 159 materials (Refs. 53 and 56) | ($\rho_o$, $v_T$, $v_L$) vs ($\bar{m}$, $\bar{r}$, $\bar{\chi}$, $\bar{E}$, $\bar{T}$) vs ($\rho_o$, $v_T$, $v_L$, $\bar{r}$, $\bar{\chi}$, $\bar{E}$, $\bar{T}$)
4 | Segregated vs combined data | Sec. II C 1 | Sec. III C 1 | 199 materials (Ref. 53) | ($\rho_o$, $v_T$, $v_L$)
5 | Source data variance | Sec. II C 2 | Sec. III C 2 | 131, 192, 197, 199, or 131 materials (Ref. 53) | ($\rho_o$, $v_T$, $v_L$)

1. Naïve features: Density and speed of sound

In study 1, the feature set comprises the density and the shear and longitudinal sound velocities ($x \in \mathbb{R}^3$). This is presently called the "naïve" set because this information is already provided in Ref. 53 and the variables are correlated with the shock compression properties of many solids.57–60 A separate density is reported in Ref. 53 for each sample prior to its measurement, but only a single value, their average, is presently used in the input feature vector.

2. Physics-informed features: Thermodynamics

Shock compression is a strongly thermophysical process, and this insight can be featurized through the Grüneisen parameter. The resulting feature vector in study 2, presently termed the physics-informed feature set, contains the naïve features concatenated with the Grüneisen parameter ($x \in \mathbb{R}^4$). Under high compression, a change in density must inevitably lead to a change in the acoustic wave speed, a behavior to which the Grüneisen parameter is relevant. The efficacy of including this value in the features is, therefore, examined. To ensure that the dimensionality of the feature vectors is the same with and without the Grüneisen parameter, the "without" case is trained and tested with the Grüneisen parameter set to a constant value of zero for all materials. The "with" case is then trained and tested with the Grüneisen parameter values from Ref. 54, which were estimated using zero-pressure thermodynamic parameters such as the thermal expansion coefficient, zero-pressure bulk modulus, specific heat at constant volume, and initial density.

3. Feature augmentation: Linear combination of compositional features (LCCFs) for mixtures

In study 3, we explore the use of simple, low-cost compositional information through a linear combination of compositional features (LCCFs) ($x \in \mathbb{R}^5$). The terms of the LCCF vector are the atomic mass $\bar{m}$, atomic radius $\bar{r}$, electronegativity $\bar{\chi}$, ionization energy $\bar{E}$, and melting point $\bar{T}$. The LCCF provide underlying contextual information that can be used to enrich available material data. The study first examines the predictive performance of the LCCF terms as the sole features ("LCCF-Only") and then the effect of concatenating the LCCF features with the naïve features ("LCCF + Naïve").

The LCCF vector comprises features obtained from simple rules of mixtures and basic information about the composition. Namely, the features are based on the mole fractions of the ingredients, a combined single-molecule representation of all fractional chemical species (even for mixtures containing multiple ingredients), and basic properties of the elements in the periodic table, as described in the discussion of Eq. (1). The data in Ref. 53, in contrast, contain a unique density for each experimental sample; two samples with ostensibly identical molecular formulas may report different density values due to sample variations. This necessarily means that the LCCF, used on its own, cannot account for polymorphs, allotropes, microstructure, grain orientation, or any configurational detail beyond the simple molecule representation. Other efforts have estimated the SH of mixtures using the SH of the individual constituents10 and have used ML with rules of mixtures to estimate the elastic properties of alloys.61 In Fig. 2, the SH curves are estimated in three possible ways using distinct features: the first featurization uses the as-is measured data provided in Ref. 53 (Naïve), the second uses notional ingredients (LCCF-Only), and the third concatenates both (LCCF + Naïve).

FIG. 2. Schematic showing the Naïve, LCCF-Only, and LCCF + Naïve experiments.
The LCCF are theoretical features and are determined using the following approach. For a material with pseudomolecule formula $(a_1)_{x_1}(a_2)_{x_2}\cdots(a_n)_{x_n}$, the constituent chemical species are denoted by $a_1, a_2, \ldots$, and $x_1, x_2, \ldots$ are the compositional fractions or integer counts of each element in the pseudomolecule. For example, in C3H6N6O6, $a_1 = \mathrm{C}$, $a_2 = \mathrm{H}$, $x_1 = 3$, $x_2 = 6$, and so on. The feature is then determined using
$$\bar{\phi} = \frac{x_1 \phi_{a_1} + x_2 \phi_{a_2} + \cdots + x_n \phi_{a_n}}{x_1 + x_2 + \cdots + x_n},$$
(1)
where $\bar{\phi}$ is the composition-dependent feature and $\phi_{a_1}, \phi_{a_2}, \ldots, \phi_{a_n}$ are the feature values of the constituent elements. This is effectively a mole-fraction-weighted rule of mixtures. The elemental feature values come from Ref. 55 for each constituent element. We use the elemental properties that are based on the stable physical state at standard room temperature and pressure conditions, as typically published in the periodic table. The data also include the melting temperature; for consistency, we use the as-is value of the melting temperature even though the elemental stable phase may not be a solid.
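A minimal sketch of Eq. (1) in Python is given below; the stoichiometric dictionary and the elemental property values are illustrative placeholders rather than the tabulated data of Ref. 55.

```python
# Mole-fraction-weighted rule of mixtures, Eq. (1): average an elemental
# property phi over the pseudomolecule composition.
def lccf_feature(composition, elemental_phi):
    """composition: dict element -> count x_k in the pseudomolecule;
    elemental_phi: dict element -> elemental property value phi_a."""
    total = sum(composition.values())
    return sum(x * elemental_phi[el] for el, x in composition.items()) / total

# Example: mean atomic mass for C3H6N6O6 (approximate standard atomic masses).
atomic_mass = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
c3h6n6o6 = {"C": 3, "H": 6, "N": 6, "O": 6}
m_bar = lccf_feature(c3h6n6o6, atomic_mass)   # ~10.58
```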

The final sets of features used for LCCF and LCCF + Naïve are found after eliminating information redundancy among the terms. The redundancy was determined using the variance inflation factor (VIF)62 computed with the statsmodels package.63 The VIF represents the degree of multicollinearity, or information redundancy, among feature variables; it was determined for each feature, and values greater than 10 are taken as an indication of high collinearity. The final LCCF feature set is $x = \{\bar{m}, \bar{r}, \bar{\chi}, \bar{E}, \bar{T}\}$. The final LCCF + Naïve set is $x = \{\rho_o, v_T, v_L, \bar{r}, \bar{\chi}, \bar{E}, \bar{T}\}$, after $\bar{m}$ was removed due to its high VIF value and likely information redundancy with $\rho_o$.
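A sketch of the VIF screening step is shown below, using the statsmodels function referenced above. The iterative drop-the-largest-VIF loop is one common convention and is an assumption here; the paper states only that VIF > 10 indicates high collinearity and that $\bar{m}$ was removed.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF of each column of a numeric feature table X."""
    vals = np.asarray(X, dtype=float)
    return pd.Series(
        [variance_inflation_factor(vals, j) for j in range(vals.shape[1])],
        index=X.columns, name="VIF",
    )

def drop_collinear(X: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Iteratively drop the feature with the largest VIF while any VIF > threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = vif_table(X)
        if vif.max() <= threshold:
            break
        X = X.drop(columns=[vif.idxmax()])
    return X
```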

The lack of training data is a major challenge in the translation of ML to any field.64,65 With such a small amount of data and the need to model many material types and classes, variances in data can exacerbate the problems associated with scarce data.

1. ML from segregated vs combined material class data

Study 4 assesses the value of combining data from multiple material classes in the small-data regime, despite the variance of properties between material classes. ML model performance is determined by comparing the prediction accuracy of models trained and tested on each class separately vs an ML model trained and tested on all of the classes combined. This evaluates a transfer learning effect due to the enrichment of the data from multiple classes. The nine material classes in the data set are described in Sec. II A, and the number of materials in each class ranges between 5 and 56. The training and evaluation procedures are described in Sec. II D 3.

2. Accounting for variances in data

A second type of variance occurs in the data due to potential variations in measured properties even within a single SH curve. For example, within any single SH curve in the data of Ref. 53, each point is recorded with its own unique density.

In study 5, the effect of this type of variance on the ML models is studied by regrouping the data into five subsets with progressively larger degrees of SH variability with respect to the slope coefficient, $C_1$, in the SH model described below in Sec. II D 1. The first subset comprises 131 materials and has the smallest variance; the data are limited to materials whose standard error of the SH regression slope ($s^{C_1}$, defined in Sec. S1) is smaller than 0.04, as shown in Fig. 3. Materials with larger $s^{C_1}$ are shown in Fig. S4 in the supplementary material to have SH curves that exhibit a noticeable departure from linearity. The 131 materials in this subset exclude materials with $s_i^{C_1} > 0.04$ as well as materials with $n_i < 4$ (five such materials), the latter exclusion owing to the use of the standard error.

FIG. 3. Ascending list of $s_i^{C_1}$ for the materials in the 199-material data set. A dot indicates the point at which $s_i^{C_1} = 0.04$; above this point, materials were found to exhibit a noticeable degree of nonlinearity in their SH curves.

The next three subsets correspond to materials whose standard error in $C_1$ is bounded from above by $s_i^{C_1} < 0.2$, $s_i^{C_1} < 0.4$, and $s_i^{C_1} < 0.7$ and contain 192, 197, and 199 materials, respectively. The fifth and final subset accounts for sample-size effects: it also limits the standard error to $s_i^{C_1} < 0.7$ but uses only 131 materials randomly chosen across trials, identical in size to the first subset. Notably, the fourth subset contains all materials in the data, implying that all materials have standard errors in the regression slope smaller than 0.7. To distinguish the fifth subset, with its smaller sample size, it is referenced as $s_i^{C_1} < \infty$.
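The subset construction can be sketched as below; scipy's linregress is used here as a stand-in for the standard-error estimator, whereas the exact definition of $s_i^{C_1}$ used in the paper is given in Sec. S1 of its supplementary material, and the minimum-point requirement mirrors the $n_i \geq 4$ condition applied to the first subset.

```python
import numpy as np
from scipy import stats

def slope_and_stderr(up, us):
    """Fit us = C0 + C1*up and return (C1, standard error of C1)."""
    res = stats.linregress(np.asarray(up, float), np.asarray(us, float))
    return res.slope, res.stderr

def subset_by_slope_error(materials, threshold, min_points=4):
    """materials: dict name -> (up, us) arrays.
    Keep materials with enough points whose slope standard error is below threshold."""
    kept = []
    for name, (up, us) in materials.items():
        if len(up) >= min_points:
            _, se = slope_and_stderr(up, us)
            if se < threshold:
                kept.append(name)
    return kept
```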

1. Target model: Higher-order polynomial SH relation

The ML task is to learn the coefficients of polynomials that model the unreacted SH relationships of the materials in a data set. The SH relationship describes the shock and particle velocities in a unidirectional shock loading scenario, and the SH state points constitute an isoline in the thermodynamic $P$–$V$ state space. When either set of variables is known, the isothermal Rankine–Hugoniot relations66 can be used to convert between $u_s$–$u_p$ and $P$–$V$ information using the respective statements of conservation of mass and momentum,
$$\frac{V}{V_o} = \frac{u_s - u_p}{u_s},$$
(2)
$$P - P_o = \rho_o u_p u_s,$$
(3)
where $P_o$, $V_o$, and $\rho_o$ are the initial pressure, specific volume, and density, while $P$ and $V$ are the pressure and specific volume behind the shock front, respectively; $u_s$ and $u_p$ are the shock and particle velocities. If the initial thermodynamic state in front of the shock wave is known, measurements of $u_s = g_i(u_p)$ for material $i$ determine the relationship between the possible thermodynamic states ahead of and behind the wave front.
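As an illustration of Eqs. (2) and (3), the sketch below maps measured ($u_s$, $u_p$) pairs to ($V$, $P$) behind the shock for a known initial state; the function and variable names are illustrative. With velocities in km/s and density in g/cm³, the pressure conveniently comes out in GPa.

```python
import numpy as np

def hugoniot_state(us, up, rho0, P0=0.0):
    """Eqs. (2) and (3): thermodynamic state behind the shock front.
    us, up in km/s; rho0 in g/cm^3; P0 in GPa (ambient pressure ~1e-4 GPa is
    usually negligible). Returns specific volume V (cm^3/g) and pressure P (GPa)."""
    us, up = np.asarray(us, float), np.asarray(up, float)
    V0 = 1.0 / rho0
    V = V0 * (us - up) / us          # Eq. (2): V/V0 = (us - up)/us
    P = P0 + rho0 * up * us          # Eq. (3): P - P0 = rho0 * up * us
    return V, P

# Example: rho0 = 8.93 g/cm^3, us = 5.0 km/s, up = 1.0 km/s -> P ~ 44.65 GPa.
V, P = hugoniot_state(us=[5.0], up=[1.0], rho0=8.93)
```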
A general model of this relationship is a polynomial,67–70
$$u_s = g_i(u_p) = C_0^i + C_1^i u_p + C_2^i (u_p)^2 + C_3^i (u_p)^3.$$
(4)
The coefficients $C_r^i$, $r = 0, \ldots, 3$, can be obtained via least squares for each material $i$, and a higher-order polynomial is well suited for the current task because SH curves can often be nonlinear. This can be due to underlying physical reasons, including phase transformations,71–73 porosity,10 and varying material composition or material characteristics,10 as is found in materials with microstructures that are vitreous, fibrous, or anisotropic. However, for the data considered presently, we found that many of the materials exhibit a dominant linear behavior. It will be shown in Sec. III A that the higher-order terms in the model struggle to provide the level of ML prediction accuracy obtained using a simpler linear assumption, so it is also convenient to define a linear SH model,
$$u_s = g_i(u_p) = C_0^i + C_1^i u_p.$$
(5)

The coefficients obtained directly via least squares from the source data are used as ground-truth values to evaluate the ML models in Sec. II D 2. In Sec. S10 in the supplementary material, orthogonal polynomials are shown to yield improved higher-order accuracy but at the cost of reduced accuracy at lower orders.
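A minimal sketch of the ground-truth coefficient extraction via ordinary least squares is shown below; numpy.polyfit is used as a stand-in for the fitting routine, and the example numbers are illustrative rather than values from Ref. 53.

```python
import numpy as np

def fit_sh_coefficients(up, us, order=1):
    """Least-squares fit of Eq. (4) (order=3) or Eq. (5) (order=1).
    numpy.polyfit returns the highest order first, so the result is reversed
    to the paper's ordering [C0, C1, ...]."""
    coeffs = np.polyfit(np.asarray(up, float), np.asarray(us, float), deg=order)
    return coeffs[::-1]

# Illustrative, nominally linear Hugoniot us = 3.94 + 1.49*up.
up = np.array([0.5, 1.0, 1.5, 2.0])
us = 3.94 + 1.49 * up
C0, C1 = fit_sh_coefficients(up, us, order=1)
```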

2. ML models

ML methods such as neural networks,74 ridge regression,75 and Gaussian process regression76 have been used previously to learn material structure–property relationships. Gaussian process regression (GPR) is particularly noted for its ability to model stochastic systems77 and is, therefore, a good choice for the study of SH data. The main idea in GPR is to use covariances in the data to parameterize a jointly distributed probability density function, with the covariance typically specified through a kernel. In Sec. III A, GPR will be shown to be the strongest performing model in this study and will, therefore, serve as the primary ML model used elsewhere in Sec. III.

The current GPR model78 implementation is based on Scikit-learn.79 For a detailed introduction to the fundamentals of GPR, we refer the reader to Ref. 78. In this work, the kernel function $k(x_i, x_{i'})$ is defined as the sum of a Matérn covariance kernel $[\sigma_f^2 k_1(x_i, x_{i'})]$ and a white noise kernel $[k_2(x_i, x_{i'})]$, where $i, i' \in (1, N)$ are material indices, $N$ is the number of materials in the training data, $\sigma_f^2$ is a variance hyperparameter, and $x_i, x_{i'} \in \mathbb{R}^m$ are the input feature vectors. The Matérn correlation kernel $k_1(x_i, x_{i'})$ is given by79
$$k_1(x_i, x_{i'}) = \frac{1}{\Gamma(\nu)\, 2^{\nu - 1}} \left( \frac{\sqrt{2\nu}\, d(x_i, x_{i'})}{l} \right)^{\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}\, d(x_i, x_{i'})}{l} \right),$$
(6)
where $d(x_i, x_{i'})$ is the Euclidean distance, $K_{\nu}(\cdot)$ is the modified Bessel function of the second kind,80 $\Gamma(\nu)$ is the gamma function, $l$ is the length scale parameter of the kernel, and $\nu$ is the smoothness parameter of the learned function. The physical features in $x_i$ may contain the density and sound velocities (Sec. II B 1), the Grüneisen parameter (Sec. II B 2), or the LCCF (Sec. II B 3), as summarized in Table I. Section S8 in the supplementary material provides a comparison of the SH predictions using isotropic and anisotropic kernels; based on those observations, we proceed with an isotropic kernel in the rest of this study. The white noise kernel specifies the noise level for the GPR by adding independent and identically normally distributed noise to the kernel $k(x_i, x_{i'})$ and is defined by
$$k_2(x_i, x_{i'}) = \begin{cases} \alpha^2, & i = i', \\ 0, & \text{otherwise}, \end{cases}$$
(7)
where $\alpha^2$ is a real-valued user parameter controlling the level of noise in the kernel. Following the observations in Sec. S9 in the supplementary material, we employ $\sigma_f^2 = 1$ in the rest of this study.

Each of the coefficients in Eqs. (4) and (5) is modeled using an independent GPR model. The hyperparameters are the length scale $l$ and smoothness $\nu$ in the Matérn kernel of Eq. (6) and $\alpha^2$ in the white noise kernel of Eq. (7). Popular choices for $\nu$ are 1/2,81,82 3/2,83–85 and 5/2.84–87 The GPR model is determined by the values of the hyperparameters $\nu$, $l$, and $\alpha^2$ that give the minimum value of the negative log likelihood.
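A sketch of this GPR setup in scikit-learn is given below. The kernel composition follows Eqs. (6) and (7) with $\sigma_f^2$ fixed at 1; scikit-learn optimizes $l$ and $\alpha^2$ by maximizing the log marginal likelihood (equivalently, minimizing the negative log likelihood), while $\nu$ is not optimized by the library and is therefore selected here by comparing fitted likelihoods. Details such as the initial values and the number of optimizer restarts are assumptions.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

def fit_coefficient_gpr(X_train, y_train, nus=(0.5, 1.5, 2.5)):
    """One independent GPR per SH coefficient (y_train holds that coefficient's
    values for the training materials; X_train holds the scaled feature vectors)."""
    best_lml, best_model = -float("inf"), None
    for nu in nus:
        kernel = (ConstantKernel(1.0, constant_value_bounds="fixed")   # sigma_f^2 = 1
                  * Matern(length_scale=1.0, nu=nu)                    # k1, Eq. (6)
                  + WhiteKernel(noise_level=1e-2))                     # k2, Eq. (7)
        gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5,
                                       random_state=0)
        gpr.fit(X_train, y_train)                 # optimizes l and alpha^2
        lml = gpr.log_marginal_likelihood_value_
        if lml > best_lml:
            best_lml, best_model = lml, gpr
    return best_model
```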

In this work, the mean prediction of the GPR model on the test data is used in the evaluation of model performance. The performance is compared with that of other ML models, including ridge regression (RR) and a neural network (NN). Only the best performing RR model is reported, obtained with hyperparameters optimized via a grid search79 with fivefold cross validation. The NN architecture has two hidden layers with 64 and 32 neurons, each with ReLU activation, and a final layer with a linear activation function. The model was trained using the Adam optimizer and a mean square error (MSE) loss criterion with a learning rate of 0.0001 and a batch size of 16. The hyperparameters of the NN, implemented using Keras,88 were optimized using a validation set formed from a random 10% subset of the training set. Results are shown for the best performing NN model with the lowest validation loss vs epoch, obtained using an early stopping callback88 (patience of 100 epochs, monitoring the validation loss). The maximum number of training epochs was set to 1000.
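The NN baseline as described can be sketched with Keras as below; unstated details (weight initialization, random seeds, restoring the best weights at the end of training) are assumptions.

```python
from tensorflow import keras

def build_and_train_nn(X_train, y_train):
    """Two hidden layers (64 and 32 ReLU units), linear output, Adam with
    learning rate 1e-4, MSE loss, batch size 16, up to 1000 epochs, early
    stopping on a 10% validation split with a patience of 100 epochs.
    y_train is a 2D array of target SH coefficients."""
    model = keras.Sequential([
        keras.layers.Input(shape=(X_train.shape[1],)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(y_train.shape[1], activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=100,
                                               restore_best_weights=True)
    model.fit(X_train, y_train, validation_split=0.1, batch_size=16,
              epochs=1000, callbacks=[early_stop], verbose=0)
    return model
```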

3. ML model training and evaluation procedures

For each study in Table I, the models are trained and evaluated using a set of 50 trials. Where appropriate, a 90/10 train-test split by material class is used in each trial; that is, 90% of the materials from each material class, rounded to the nearest integer, are included in the training set. The material classes are defined in Sec. II A. Before each trial, the data are shuffled using a different random seed. Each test set comprises 25 materials and usually includes at least one material from every class. The hyperparameters of the model are optimized in each trial. Each element in the input feature vector (Sec. II B) and target vector (Sec. II D 1) was independently scaled to the range [0, 1] before training using the MinMaxScaler79 function.
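One trial of this procedure might be sketched as follows; the stratified splitter approximates the per-class 90/10 rounding described above, and fitting the scalers on the training partition only is a standard choice that the text does not state explicitly.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def make_trial(X, y, classes, seed):
    """X: feature matrix; y: 2D array of target SH coefficients;
    classes: material-class label per row; seed: per-trial random seed."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.1, stratify=classes, shuffle=True, random_state=seed)
    x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()           # scale to [0, 1]
    X_tr_s, X_te_s = x_scaler.fit_transform(X_tr), x_scaler.transform(X_te)
    y_tr_s, y_te_s = y_scaler.fit_transform(y_tr), y_scaler.transform(y_te)
    return X_tr_s, X_te_s, y_tr_s, y_te_s, y_scaler
```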

This training procedure is used in the method described in Sec. II B 1. The methods described in Secs. II B 2, II B 3, and II C 2 also follow the same procedure but with data limited according to the conditions described in Table I.

In study 4, the combined data set follows the same approach for model training. In the trials that use segregated data, however, each material class is divided using the 90/10 train-test ratio, and the data from each class are used to train a separate GPR model. Each class is retrained and tested 50 times (i.e., 50 trials). The model evaluation procedure described next is then applied to each material in the test set. Further details are provided in Sec. S3 in the supplementary material. To ensure consistency in the comparison of any two models, the same random seed is used for all models.

Model evaluation uses two types of $R^2$ coefficients of determination. The first is based on the measured (experimental), $u_{ij}^s$, and predicted, $\hat{u}_{ij}^s$, shock velocities,
$$R_i^H = 1 - \left( \frac{\sum_j \left( u_{ij}^s - \hat{u}_{ij}^s \right)^2}{\sum_j \left( u_{ij}^s - \bar{u}_i^s \right)^2} \right),$$
(8)
where $i \in (1, N_{total})$ is the material index, $N_{total}$ is the total number of materials in the data defined in Table I, $j \in (1, n_i)$ is the index for the SH points, and $\bar{u}_i^s$ is the mean value of $u_{ij}^s$ over $j$. Let $R^{SH}$ be the set of all $R_i^H$ values such that $\bar{R}^{SH}$, $\tilde{R}^{SH}$, and IQR($R^{SH}$) are the overall mean, median, and interquartile range of $R^{SH}$. The cardinality of $R^{SH}$ varies by study; its value is the number of materials tested per trial times the number of trials, $N' N_{trial}$. The number of test materials in a single trial is $N'$, whose value in each study is provided in Sec. S6 in the supplementary material. Every study in this paper uses $N_{trial} = 50$.
The second is based on the polynomial coefficients in the SH equation. The coefficient of determination for the $r$th coefficient in Eq. (4) between the ML-predicted $\hat{C}_r^i$ and ground-truth $C_r^i$ values is defined by
$$R_r^P = 1 - \left( \frac{\sum_i \left( C_r^i - \hat{C}_r^i \right)^2}{\sum_i \left( C_r^i - \bar{C}_r \right)^2} \right),$$
(9)
where the mean value $\bar{C}_r$ and the sums are taken over the $N'$ test materials in a single trial (see Sec. S6 in the supplementary material). The ground-truth coefficients $C_r^i$ are determined directly from the source SH data through least squares. $R_r^P$ is a more conservative evaluation measure than $R_i^H$. The symbol $\bar{R}_r^P$ represents the average of $R_r^P$ over the trials.
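Both metrics are ordinary coefficients of determination applied to different quantities; a minimal sketch is given below, with array names chosen for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Generic R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# R_i^H, Eq. (8): measured vs predicted shock velocities along one material's
# Hugoniot (us_meas[j], us_pred[j] at that material's experimental up points):
#   r_ih = r_squared(us_meas, us_pred)
# R_r^P, Eq. (9): ground-truth vs ML-predicted values of coefficient C_r over
# the N' test materials of a single trial:
#   r_rp = r_squared(c_true, c_pred)
```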

Table II summarizes the results of study 1, showing the effect of the choice of polynomial order on the prediction accuracy. The linear SH produces the highest accuracy in terms of both the $R^{SH}$ and $\bar{R}_r^P$ metrics. The higher accuracy is attributable to the use of a fixed set of input features (density and sound velocities) together with a smaller number of target parameters (two coefficients). The coefficients of the higher-order terms in the SH polynomial appear sensitive to features beyond density and sound velocities, likely indicating that these two features are not correlated with the nonlinearity in the SH curve. The larger errors associated with the higher-order coefficients ultimately compound the errors in the final SH prediction, thereby decreasing the performance. In the case of the third-order polynomial, the prediction performance in terms of $R_i^H$ is exceptionally poor. Such highly negative values arise from poor ML-based estimation of the higher-order coefficients and occur when the state points within an SH curve take large $u_s$ and $u_p$ values; under these conditions, the predicted state points with large $u_s$ and $u_p$ deviate significantly from the experimental data points. We, therefore, limit further study to the linear form of the SH model shown in Eq. (5).

TABLE II.

Study 1 comparisons of the effect of polynomial order and ML models on the accuracy of the SH and polynomial coefficient predictions. Total test materials = Number of materials in each trial (25) × number of trials (test/train splits) (50).

(The $\tilde{R}^{SH}$ and $\bar{R}^{SH}$ columns report $R^{SH}$ for the test data; the $\bar{R}_r^P$ columns, $r = 0, 1, 2, 3$, are also for the test data. Ellipses indicate coefficients absent from the corresponding polynomial order.)

Polynomial order | ML model | $\tilde{R}^{SH}$ | $\bar{R}^{SH}$ | $\bar{R}_0^P$ | $\bar{R}_1^P$ | $\bar{R}_2^P$ | $\bar{R}_3^P$
1 | GPR | 0.90 | 0.65 | 0.86 | 0.28 | … | …
1 | NN | 0.85 | 0.56 | 0.82 | 0.17 | … | …
1 | RR | 0.88 | 0.53 | 0.82 | 0.15 | … | …
2 | GPR | 0.75 | 0.27 | 0.74 | 0.17 | −0.04 | …
2 | NN | 0.83 | 0.43 | 0.69 | 0.13 | 0.04 | …
2 | RR | 0.86 | 0.50 | 0.69 | 0.10 | −0.02 | …
3 | GPR | −3.24 | −18.08 | 0.12 | −0.19 | −0.32 | −0.59
3 | NN | −29.75 | −145.54 | 0.33 | −0.14 | −0.66 | −0.42
3 | RR | −2.50 | −10.49 | 0.34 | −0.18 | −0.32 | −0.55

The large differences between $\bar{R}^{SH}$ and $\tilde{R}^{SH}$ are due to the skewed distribution of $R_i^H$ values, in which a small number of negative outliers have an outsized influence on the mean; 49 of the 1250 total test candidates had $R_i^H$ values less than −1.00. The median $\tilde{R}^{SH}$ is, therefore, more suitable for describing the central tendency. The associated cross-validation plots for the NN model are provided in Fig. S7 in the supplementary material.

The $R_i^H$ and $R_r^P$ values from the 50 trials are shown in Figs. 4(a)–4(c). GPR clearly outperforms the other ML models (NN, RR) in Fig. 4(a) in terms of the $R^{SH}$ mean, median, and interquartile range and in Fig. 4(b) in terms of $\bar{R}_0^P$. In Fig. 4(c), however, the values of $\bar{R}_1^P$ are relatively low, indicating poor prediction performance for the SH slope. On the other hand, the associated mean square error (MSE) between $\hat{C}_1^i$ and $C_1^i$ is 0.1, in the context of ground-truth $C_1^i$ values with a mean of 1.38 and a 95% confidence interval of (1.33, 1.43). This seemingly conflicting behavior is due to the relatively narrow range of $C_1^i$ values across the many different materials in the data. That is, the low $\bar{R}_1^P$ is a consequence of a moderate MSE value in the numerator of Eq. (9) being divided by a fairly small variance in the denominator, the small variance reflecting the narrow range of slopes among different materials. The underlying physical cause of the variations lies in the inability of the current input features to capture the incipient appearance of anharmonic or nonlinear material effects. The low values of $\bar{R}_1^P$ indicate that it is the most conservative accuracy metric considered here. However, poor slope prediction does not presently preclude high $R_i^H$, which appears to be more useful as an indicator of overall SH accuracy.

FIG. 4. Violin box plots for the linear form of the SH in study 1. Distributions of (a) $R_i^H$, (b) $R_0^P$, and (c) $R_1^P$ from 50 trials with respective medians (in orange) and means (in green). GPR: Gaussian process regression; NN: neural network; RR: ridge regression. Not all outliers are shown for ease of viewing.

Shown in Fig. 5 is the accuracy in histogram form, reported relative to the experimental data for each material in the test set across all 50 trials. Of the 1250 test instances, 633 had $R_i^H$ values above 0.9 and 841 had values above 0.8. Due to the random selection, only 197 of the 199 materials in the data appeared among the 1250 test instances. Figures S3(a)–S3(i) in the supplementary material show the violin plots of $R_i^H$ values grouped by material class, and the predicted and experimental SH curves of the materials in the test data for each class are shown in Fig. S4 in the supplementary material.

FIG. 5. Distribution of $R_i^H$ determined for all test materials across 50 trials. Of the 1250 total test instances (25 test materials per trial × 50 trials), 1139 yielded positive $R_i^H$ values.

1. Thermodynamic features

The thermodynamically inspired feature set containing the Grüneisen parameter is used in study 2, as described in Sec. II B 2. Figure 6 depicts the effect of adding this feature to the naïve feature set for the cross-validation study of 46 materials. The changes in accuracy from adding the Grüneisen parameter are modest: it increases $\tilde{R}^{SH}$ by 0.8%, increases $\bar{R}^{SH}$ by 7.3%, and reduces IQR($R^{SH}$) by 56%. The appearance of negative $R_1^P$ values and the higher metric values relative to Sec. III A are attributable to the significantly smaller data set used in this part of the study. Nevertheless, the observations once again appear to confirm that the naïve features are insufficient for representing the factors that lead to some of the nonlinear SH behaviors and that enriching the data with more physics information, such as the Grüneisen parameter, may improve model performance.

FIG. 6. Violin box plots showing distributions of (a) $R_i^H$, (b) $R_0^P$, and (c) $R_1^P$ from 50 trials with respective medians (in orange) and means (in green). No Grüneisen: the Grüneisen parameter is set to zero; With Grüneisen: the feature vector includes the actual values of the Grüneisen parameter. Not all outliers are shown for ease of viewing.

2. Linear combination of compositional features (LCCFs) for mixtures

The LCCF vector in study 3 contains only information about each notional composition and its pure ingredients, namely, the pseudomolecular formula along with the properties of the relevant chemical species from the periodic table. The model performances are shown in Fig. 7. Unsurprisingly, LCCF-Only offers the lowest ML prediction performance among the three featurizations, lower even than the naïve features. It is remarkable, however, that LCCF + Naïve is the strongest in predicting the SH: relative to the naïve features, LCCF + Naïve improves $\tilde{R}^{SH}$ by 4.5% and $\bar{R}^{SH}$ by 3.9% and lowers IQR($R^{SH}$) by 32%. A possible explanation for the improved $R^{SH}$ metrics is that significant nonequilibrium effects may underlie the SH curves. Anharmonic effects and mechanisms occurring in negative-definite regions of the potential energy surface are not well represented by the experimental naïve features, which are presently limited to harmonic properties. It is likely that the nonequilibrium information about the ionization enthalpy and melting temperature aids LCCF + Naïve in this case.

FIG. 7. Violin box plots showing distributions of (a) $R_i^H$, (b) $R_0^P$, and (c) $R_1^P$ from 50 trials with respective medians (in orange) and means (in green). Naïve features: the feature vector includes the experimental values of the density and the shear and longitudinal sound velocities; LCCF-Only: the LCCF featurization alone; LCCF + Naïve: LCCF features concatenated with the naïve features. Not all outliers are shown for ease of viewing.

A well-understood tension in many investigations is between the need for data containing the feature variables most closely correlated with the target properties and the need for data that are easy to obtain or otherwise readily accessible through rapid calculation or measurement.52 In the second experiment, LCCF-Only, we investigate the use of the LCCF to overcome featurization challenges, particularly in the context of scarce data for material mixtures: a significant fraction of the present data is for materials that are in fact mixtures,53 and data containing the most relevant feature variables can be prohibitively difficult to obtain. LCCF-Only is evaluated against the experimental SH curves despite containing no experimental feature values, unlike the studies in Secs. III A and III B 1, which use input features whose values were determined from experiments.

Not surprisingly, the results in Fig. 7 make evident the poorer accuracy of the LCCF-Only models; in Fig. 7(a), naïve outperforms LCCF-Only by a wide margin. It is quite remarkable, however, that a simple augmentation with elemental chemical properties leads to a substantial improvement in performance, with LCCF + Naïve improving upon either LCCF-Only or naïve alone. This noteworthy finding indicates that future efforts may weigh the benefits of using hybrid synthetic data (a) in place of, (b) to limit the amount of, or (c) to augment experimental data. In situations lacking experimental or physics-based simulation data, linear combinations of elemental properties may be used to substantially improve ML model prediction accuracy. Figures 7(b) and 7(c) largely reiterate the observations of Sec. III A: underlying causes of nonlinearity within tightly clustered predictions of the slope lead to poor $\bar{R}_1^P$ without hurting the overall physical accuracy indicated by the $R^{SH}$ metrics.

1. ML from segregated vs combined material class data

In Fig. 8, the evaluation measures are shown for study 4 using the segregated and combined data. The number of materials in the classes used in the segregated-data scenario is given in Sec. II A. Though the segregated data provide marginal improvements in $\tilde{R}^{SH}$ and IQR($R^{SH}$) of 2.1% and 2.4%, respectively, the reduction in $\bar{R}^{SH}$ is substantial, on the order of 24%. As evident from the thicker tail, the lower $\bar{R}^{SH}$ for the segregated data signifies a relatively large number of materials with worse predictions. Such a drastic decline in $\bar{R}^{SH}$ and the associated larger number of poor predictions favor the use of combined data over segregated data. The effect of combined data is attributed to the knowledge sharing via diversification that effectively occurs when joining the classes. Indeed, estimates of the SH of a mixture can be based on the SH of its individual constituents10 or on weighted averages such as the kinetic energy averaging technique.89 Thus, combining the classes places a more diverse set of ingredients into the population from which to learn. The use of uniform input features (i.e., density and sound velocities) in the complementary data facilitates the sharing, and the fact that density and sound velocities are known to correlate well with shock compression behavior in multiple material classes7,57–60 likely aids the performance. Perhaps most importantly, this is an indication that the dearth of data for one class may be overcome by joining available data from other classes.

FIG. 8. Effect of combining the material classes on the accuracy of SH prediction vs models trained separately on each material class. Distributions of (a) $R_i^H$, (b) $R_0^P$, and (c) $R_1^P$ from 50 trials with respective medians (in orange) and means (in green).

It should be briefly remarked that, for the GPR models trained on segregated classes, two material classes with a single material in the test set (i.e., synthetics and aqueous solutions) were excluded from the evaluation of $R_r^P$ because Eq. (9) is only valid for sample sizes of two or more. However, since each material in our data set contained at least two SH state points, no classes were exempted from the evaluation of $R_i^H$ [Eq. (8)].

2. Variances in the experimental data

The generalization errors for the subsets of data with progressively larger permitted variance are shown in Figs. 9(a)–9(c). Reducing the variance in the data demonstrably leads to more accurate (less biased) and more robust (less variable) predictions. Between the worst (far right) and best (far left) cases, $\bar{R}^{SH}$ improves by 84%, $\tilde{R}^{SH}$ by 6.0%, and IQR($R^{SH}$) is reduced by 64%. The progressively larger variance is accompanied by a systematic and nearly monotonic decrease in prediction accuracy. Interestingly, the smaller sample size of the fifth subset ($s_i^{C_1} < \infty$), which uses 131 random samples over the $N_{trial}$ trials drawn from the original data set of 199 materials, has a pronounced effect in broadening the tail of the distribution compared to the other four subsets. Using smaller subsets drawn from the larger set with greater variance magnifies the effect of noise and decreases the ML performance. These trends are consistent in Figs. 9(b) and 9(c), where the evaluation is performed with respect to the SH coefficients. The lone exception is the third subset, where the prediction performance runs contrary to the broader trend; this is an anecdotal effect related to the materials admitted to that subset, for which the SH curve intercept happens to be unusually easy to predict.

FIG. 9. Comparison of the effect of different levels of variance in the source data. Distributions of (a) $R_i^H$, (b) $R_0^P$, and (c) $R_1^P$ from 50 trials with respective medians (in orange) and means (in green).

Insofar as prediction performance can be understood through a single coefficient of determination, and in spite of the smallness of the data sets, the $\tilde{R}^{SH}$ values throughout this work have been remarkably high. However, a complete understanding of this performance must examine the worst performers, particularly in the interest of steering future data curation efforts. The worst performers are evident in the stark differences between the mean $\bar{R}^{SH}$ and median $\tilde{R}^{SH}$, which are due to $R_i^H$ values that are consistently low for 18 of the 197 unique materials. The worst performers are shown in Fig. 10, and the classes to which they belong are listed in Table III. These materials, starting with the worst performer, are forsterite ($\rho_o = 3.201$ g/cm³), composition B ($\rho_o = 1.715$ g/cm³), steel ($\rho_o = 7.92$ g/cm³), hematite ($\rho_o = 5.007$ g/cm³), iron magnesium oxide ($\rho_o = 5.191$ g/cm³), uranium dioxide ($\rho_o = 10.3$ g/cm³), ilmenite ($\rho_o = 4.787$ g/cm³), sillimanite ($\rho_o = 3.1$ g/cm³), silicon carbide ($\rho_o = 3.122$ g/cm³), albitite ($\rho_o = 2.61$ g/cm³), eclogite ($\rho_o = 3.551$ g/cm³), magnetite ($\rho_o = 5.117$ g/cm³), wollastonite ($\rho_o = 2.87$ g/cm³), strontium ($\rho_o = 2.628$ g/cm³), beryllium oxide ($\rho_o = 2.989$ g/cm³), anorthosite ($\rho_o = 2.732$ g/cm³), zirconium dioxide ($\rho_o = 4.512$ g/cm³), and carbon ($\rho_o = 1.492$ g/cm³). The worst performers belong primarily to the minerals, mixtures of minerals, and energetics classes; the elements, alloys, and other synthetics classes are the next lowest. Fourteen of the 18 worst prediction cases belong to the minerals and mixtures of minerals classes, even though these two classes contain only 45 of the 199 materials in the data set.

FIG. 10. (a) Violin plot of the $R_i^H$ values of the worst-performing materials, labeled by material name and average initial density ($\rho_o$ in g/cm³), and (b) normalized SH curves of the worst performers. Black lines are the ML predictions; colored dots and lines are experimental data. Complete comparisons between predicted and experimental SH curves are shown in Fig. S4 in the supplementary material. The normalization approach is detailed in Sec. S5 in the supplementary material.
TABLE III.

The effect of source data variance on ML predictions by material class. The number of training and test materials in each trial is the same for both $s_i^{C_1} < 0.04$ and $s_i^{C_1} < \infty$. The total number of unique test materials for $s_i^{C_1} < \infty$ is higher than for $s_i^{C_1} < 0.04$ due to the random selection process for test materials.

(The first four data columns describe the full 199-material data set; the next three columns correspond to the $s_i^{C_1} < 0.04$ subset and the last two to the $s_i^{C_1} < \infty$ subset.)

Class name | Total number of materials | $\bar{R}^{SH}$ | Number of worst performers | Number of materials (% of materials) | $\bar{R}^{SH}$ by class ($s_i^{C_1} < 0.04$) | Number of worst performers, with total unique materials in each class in parentheses ($s_i^{C_1} < 0.04$) | $\bar{R}^{SH}$ by class ($s_i^{C_1} < \infty$) | Number of worst performers, with total test materials in each class in parentheses ($s_i^{C_1} < \infty$)
Elements | 56 | 0.81 | 2 | 8 (14%) | 0.84 | 1 (46) | 0.39 | 3 (54)
Alloys | 31 | 0.72 | 1 | 7 (23%) | 0.93 | 0 (21) | 0.71 | 2 (31)
Minerals | 32 | 0.11 | 11 | 23 (72%) | 0.65 | 1 (8) | −0.26 | 14 (32)
Mixtures of minerals | 13 | 0.39 | 3 | 8 (62%) | 0.52 | 0 (5) | 0.36 | 3 (13)
Plastics | 23 | 0.93 | 0 | 0 (0%) | 0.95 | 0 (23) | 0.91 | 0 (23)
Other synthetics | 9 | 0.89 | 0 | 4 (44%) | 0.81 | 0 (4) | 0.91 | 0 (9)
Liquids | 16 | 0.95 | 0 | 1 (6%) | 0.98 | 0 (14) | 0.95 | 0 (16)
Aqueous solutions | 6 | 0.99 | 0 | 0 (0%) | 0.98 | 0 (6) | 0.98 | 0 (6)
Energetics | 13 | 0.39 | 1 | 11 (85%) | 0.88 | 0 (2) | 0.07 | 3 (13)

We offer three probable causes for the low $\bar{R}^{SH}$ associated with these material classes. The first cause, which is likely the most important, is the insufficiency of the physics captured by the current ML features and representations. This insufficiency leads to a poor representation of the causes of nonlinearity in the SH curve. Nonlinearity is not a monolithic class of SH behaviors and can occur due to kinetic mechanisms or experimental noise. Kinetic mechanisms exist in these materials and may include pressure-induced phase changes or elastic–plastic transitions; anharmonic properties can be used to reveal their incipient characteristics. However, when we attempted to use a higher-order polynomial in Sec. III A, the $R^{SH}$ metrics were shown to decrease when the higher-order terms were included.

Indeed, the worst performance is associated with materials expected to exhibit physical behaviors that lead to nonlinear SH curves. A typical nonlinear SH curve for a material undergoing yielding and phase change is shown in Fig. 11.90 The SH measurements induce pressures or stresses in the range 1–15 GPa.53 In beryllium oxide, the Hugoniot elastic limit (HEL) was observed to be 8.2 GPa.91 For magnetite, the HEL is around 5 GPa,92 and a high-pressure phase is found above pressures of about 25 GPa.93 Silicon carbide undergoes an elastic-to-elastic–plastic transition at a shock velocity of 0.55 km/s, or a stress of about 15–16 GPa.94 Similarly, the HEL of corundum occurs at around 15–21 GPa.95 In addition to an elastic–plastic transition, a Hugoniot kink appears at 79.3 GPa in corundum's SH curve, marking the beginning of a phase transition with a large volume change.95 A $u_s$–$u_p$ plot of albitite53 shows a region of particle velocity greater than 2.49 km/s, or pressure greater than 40 GPa, indicating the existence of a shock-induced phase transformation.96

FIG. 11. Schematic SH curve for a mineral undergoing yielding and phase change. 0: the Hugoniot elastic limit; 1: transition via yielding; 2: low-pressure state; 3: mixed region with phase change; 4: high-pressure state. Adapted with permission from T. J. Ahrens and M. L. Johnson, Mineral Physics and Crystallography: A Handbook of Physical Constants (American Geophysical Union, Washington, DC, 1995). Copyright 1995, American Geophysical Union.90

The potential improvements afforded by including anharmonic physics were shown in Sec. III B 1 apropos the Grüneisen parameter as a feature. Previous shock compression studies97–101 have shown that the equation of state of many materials depends on a nonconvex internal energy, which is composed of both mechanical strain energy and thermal energy. As the Grüneisen parameter102 is an anharmonic property, it provides more information about the nonconvex structure of the potential energy surface than the density and sound velocity alone. Theoretical works by Slater,103 Dugdale and McDonald,104 and the free volume model105 have previously shown the existence of a direct correlation between the Grüneisen parameter and the slope of the SH curve.

Furthermore, though the present work is limited to isothermal shock compression,98 the Grüneisen parameter will likely serve as an important feature in nonisothermal conditions as well. This is evident from the fact that the EOS can be derived from the internal energy, $U$, through the classical relation $P = -(\partial U / \partial V)_S$. The internal energy due to shock compression, measured as half of the product of the shock pressure and the specific volume change (the Hugoniot equation),57 is primarily stored in the form of mechanical work due to deformation or strain and of thermal vibrations. The thermal energy contribution to the internal energy, which in general is not limited to isothermal conditions, is a function of the Grüneisen parameter.106

The second cause for the worse-performing materials may be misinterpretation of the source experimental data. For some materials, such as beryllium oxide, magnetite, albitite, and corundum, we noticed regions in the data of Ref. 53 where the shock velocity is negatively sloped with respect to the particle velocity. This is contrary to the general rule that a stable shock wave must have a shock velocity that increases with pressure,10 an assumption made in the present work, and such regions are therefore a confounding effect during training of the ML models. However, these negative regions may be physically realistic and must be treated with greater care; other works90 suggest they arise from explainable physical mechanisms. In beryllium oxide,107 for example, a negative trend can appear when the shock front breaks into two or more waves due to an elastic–plastic transition or phase transformation in an experiment set up to measure only the first wave of the shock front. This may happen in flash-gap experiments, for instance, where the first wave can be of sufficient amplitude to close the flash gap and cause the gas to light before the second, main wave reaches the target.10,107

The third cause for the worst performers is the scarcity of data, both in the number of measurements taken along an SH curve and in the number of materials available in the data. A small value of $n_i$ means that the experimental uncertainty at each data point can lead to large variations in the coefficients of the SH model. For instance, in Fig. 10, steel 348 ($\rho_o = 7.92\ \mathrm{g/cm^3}$) has only two points. Table III shows that study 5 in Sec. II C 2 filters out a significant percentage of materials from the minerals, mixtures of minerals, and energetics classes. Thus, a larger fraction of the materials in those three classes has higher shock-velocity variance than in the other classes.
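To make the role of $n_i$ concrete, the short sketch below fits the linear SH form $u_s = C_0 + C_1 u_p$ for a single material and reads off the standard error of the slope, $s^{C_1}$, using the statsmodels package (Ref. 63). The array values are placeholders for illustration only, not measurements from Ref. 53.

import numpy as np
import statsmodels.api as sm

# Placeholder (u_p, u_s) measurements for one material, in km/s.
u_p = np.array([0.6, 0.9, 1.3, 1.8, 2.2])
u_s = np.array([4.1, 4.6, 5.2, 5.9, 6.5])

# Ordinary least squares fit of the linear SH form u_s = C0 + C1 * u_p.
X = sm.add_constant(u_p)      # prepends the intercept column for C0
fit = sm.OLS(u_s, X).fit()

C0, C1 = fit.params           # intercept and Hugoniot slope
se_C1 = fit.bse[1]            # standard error of the slope, s^{C1}

print(f"C0 = {C0:.3f} km/s, C1 = {C1:.3f}, s^C1 = {se_C1:.3f}")
# With only two points (n_i = 2) the linear fit is exactly determined and the
# residual degrees of freedom are zero, so s^{C1} cannot be estimated; this is
# why sparsely measured materials are fragile inputs to the ML workflow.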

The scarcity of material types in the data can be analyzed using the spatial population density with respect to the initial density $\rho_o$ and the bulk sound speed $C_i^b$, defined as $C_i^b = \sqrt{(v_i^L)^2 - \tfrac{4}{3}(v_i^T)^2}$ in terms of the shear ($v_i^T$) and longitudinal ($v_i^L$) sound velocities.53 The left panel in Fig. 12 shows the clustering of materials by class, and the right panel is annotated with the worst-performing materials. Regions of lower population density coincide with materials of lower ML prediction performance. In Fig. 13, we furthermore see that $C_i^b$ is a more informative feature than $\rho_o$ for SH prediction; Fig. 13 was obtained by using $C_i^b$ and $\rho_o$ each as a single feature for predicting SH curves. This means that each new material selected for inclusion in the ML workflow will have a greater impact if it increases the population density of points with respect to the sound velocity. In the present work, the region $20\ \mathrm{km/s} < C_i^b < 70\ \mathrm{km/s}$ is particularly data-scarce; only 55 of the 199 materials currently used occupy this range, whereas the 0 to 20 km/s range contains 144 materials. Indeed, as shown in Fig. 12, the worst predictions primarily cluster in the interval $20\ \mathrm{km/s} < C_i^b < 70\ \mathrm{km/s}$. Notably, 36 of the 55 materials in this interval are minerals or mixtures of minerals, constituting almost 80% of all minerals and mixtures of minerals; the remaining 19 are from the elements and alloys classes, constituting about 22% of all elements and alloys in the 199-material data set.
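As a small worked example of the feature definition above, the bulk sound speed follows directly from the two measured wave speeds; the numerical values below are placeholders for a hypothetical material rather than entries from Ref. 53.

import numpy as np

def bulk_sound_speed(v_l, v_t):
    """Bulk sound speed C_b = sqrt(v_L^2 - (4/3) v_T^2), in the units of the inputs."""
    return np.sqrt(v_l**2 - (4.0 / 3.0) * v_t**2)

# Placeholder longitudinal and shear sound velocities (km/s).
v_l, v_t = 6.4, 3.1
print(f"C_b = {bulk_sound_speed(v_l, v_t):.2f} km/s")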

FIG. 12. (a) Distribution of materials by class and (b) worst performers (average initial density in g/cm³ in brackets), with negative $R_i^{H}$ shown as an orange dot.

FIG. 13. Plot comparing the effect of bulk sound velocity and density on the $R_{SH}$ metrics.

In this paper, we performed a comprehensive study of an ML framework to estimate the shock Hugoniot (SH) of solid-state materials using relatively scarce data. We systematically examined featurization approaches and ML techniques for estimating the SH curves of diverse material systems. The results suggest the approach can generalize to material classes beyond those considered here while using only a limited amount of ambient-state data as input. Data sets for training and testing were based on experiments on 199 materials from diverse material classes. The main findings are as follows:

  • Among the models studied, which included Gaussian process regression, ridge regression, and a neural network, Gaussian process regression performed best at estimating the SH curves of solids (a minimal comparison sketch is given after this list).

  • Despite the scarce-data context of this study, SH prediction through ML is found to be feasible, with further improvements expected from larger data sets and from feature engineering.

  • By restricting the SH model to a linear form, a median prediction performance of $R^2 = 0.90$ was obtained with high generalizability across materials.

  • When noise or nonlinear effects are present in the SH data, particularly when the standard error of the regression slope satisfies $s_i^{C_1} > 0.04$, ML model performance can be substantially degraded. A model trained on the subset of materials with lower noise ($s_i^{C_1} < 0.04$) improved the average prediction accuracy by more than 33%. A systematic dependency was observed, with smaller noise leading to greater accuracy in the ML predictions.

  • Physically informed feature selection and engineering are important in ML model development. By leveraging separate but available data to include Grüneisen parameters in the feature vector, an improvement of 7.3% in average prediction accuracy and a 56% reduction in interquartile range were observed.

  • When data are scarce, increasing the diversity of material classes by combining multiple complementary data sets can significantly improve ML performance, through a knowledge-sharing effect, relative to models developed from data for a single class. Comparison of models trained with the combined data to those trained on segregated classes shows about a 24% relative improvement for the former.

  • Where available, experimental features are preferred for training ML models. However, low-cost compositional features can have significant merit in the absence of more costly features and data. A new featurization technique, the linear combination of compositional features (LCCF), was found to improve both the median and mean $R^2$ by 4% when used to augment the existing experimental data.
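As a concrete illustration of the model comparison noted in the first bullet above, the sketch below trains Gaussian process regression and ridge regression on the same feature matrix with scikit-learn (Ref. 79). The feature and target arrays are random placeholders standing in for ambient-state features and SH coefficients, and the kernel choice is an assumption for illustration rather than the configuration used in this work.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder design matrix: rows are materials, columns are ambient-state
# features (e.g., initial density, bulk sound speed, Gruneisen parameter).
X = rng.normal(size=(199, 3))
# Placeholder target: one SH coefficient per material (e.g., the slope C1).
y = 1.2 + 0.8 * X[:, 1] + 0.1 * rng.normal(size=199)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
ridge = Ridge(alpha=1.0)

for name, model in [("GPR", gpr), ("Ridge", ridge)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")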

See the supplementary material for further details of data, data analysis procedures, and ancillary supportive results.

Support is gratefully acknowledged for this work under ONR (Contract No. N00014-19-C-1052), Energetics Technology Center Project (No. 2044-002), and Army Cooperative Agreement (No. W911NF2120076).

R.M.D. and W.H.W. are part-time employees of the Energetics Technology Center.

Sangeeth Balakrishnan: Writing – original draft (equal). Francis G. VanGessel: Writing – review & editing (equal). Brian C. Barnes: Writing – review & editing (equal). Ruth Doherty: Writing – review & editing (equal). William Wilson: Writing – review & editing (equal). Zois Boukouvalas: Writing – review & editing (equal). Mark D. Fuge: Writing – review & editing (equal). Peter W. Chung: Writing – original draft (equal).

The data that support the findings of this study are available in the supplementary material.

1. L. M. López-Marín, A. L. Rivera, F. Fernández, and A. M. Loske, “Shock wave-induced permeabilization of mammalian cells,” Phys. Life Rev. 26, 1–38 (2018).
2. S. Shrivastava and Kailash, “Shock wave treatment in medicine,” J. Biosci. 30(2), 269–275 (2005).
3. T. J. Ahrens and J. D. O'Keefe, “Shock melting and vaporization of lunar rocks and minerals,” Moon 4(1), 214–249 (1972).
4. T. J. Ahrens and J. D. O'Keefe, “Equations of state and impact-induced shock-wave attenuation on the moon,” in Impact and Explosion Cratering: Planetary and Terrestrial Implications (Pergamon Press, New York, 1977), pp. 639–656.
5. K. Takayama and T. Saito, “Shock wave/geophysical and medical applications,” Annu. Rev. Fluid Mech. 36, 347–379 (2004).
6. S. Bardy, B. Aubert, L. Berthe, P. Combis, D. Hébert, E. Lescoute, J.-L. Rullier, and L. Videau, “Numerical study of laser ablation on aluminum for shock-wave applications: Development of a suitable model by comparison with recent experiments,” Opt. Eng. 56(1), 011014 (2016).
7. R. L. Gustavesen and S. A. Sheffield, “Unreacted Hugoniots for porous and liquid explosives,” AIP Conf. Proc. 309(1), 1393–1396 (1994).
8. C. Albert-Weissenberger and A.-L. Sirén, “Experimental traumatic brain injury,” Exp. Transl. Stroke Med. 2(1), 1–8 (2010).
9. K. Nagayama, Y. Mori, K. Shimada, and M. Nakahara, “Water shock Hugoniot measurement up to less than 1 GPa,” AIP Conf. Proc. 505(1), 65–68 (2000).
10. R. G. McQueen, S. P. Marsh, J. W. Taylor, J. N. Fritz, and W. J. Carter, “The equation of state of solids from shock wave studies,” High Velocity Impact Phenom. 293, 294–417 (1970).
11. J. Wang, Y. Yin, and C. Luo, “Johnson–Holmquist-II (JH-2) constitutive model for rock materials: Parameter determination and application in tunnel smooth blasting,” Appl. Sci. 8(9), 1675 (2018).
12. S. X. Hu, B. Militzer, L. A. Collins, K. P. Driver, and J. D. Kress, “First-principles prediction of the softening of the silicon shock Hugoniot curve,” Phys. Rev. B 94(9), 094109 (2016).
13. P. Vinet, J. H. Rose, J. Ferrante, and J. R. Smith, “Universal features of the equation of state of solids,” J. Phys.: Condens. Matter 1(11), 1941 (1989).
14. F. D. Murnaghan, “The compressibility of media under extreme pressures,” Proc. Natl. Acad. Sci. U.S.A. 30(9), 244 (1944).
15. J.-P. Poirier and A. Tarantola, “A logarithmic equation of state,” Phys. Earth Planet. Interiors 109(1–2), 1–8 (1998).
16. F. Birch, “Finite elastic strain of cubic crystals,” Phys. Rev. 71(11), 809 (1947).
17. A. B. Belonoshko, “Molecular dynamics of MgSiO3 perovskite at high pressures: Equation of state, structure, and melting transition,” Geochim. Cosmochim. Acta 58(19), 4039–4047 (1994).
18. S. Ono, “First-principles molecular dynamics calculations of the equation of state for tantalum,” Int. J. Mol. Sci. 10(10), 4342–4351 (2009).
19. X. Yang, X. Zeng, H. Chen, Y. Wang, L. He, and F. Wang, “Molecular dynamics investigation on complete Mie-Gruneisen equation of state: Al and Pb as prototypes,” J. Alloys Compd. 808, 151702 (2019).
20. B. K. Godwal, S. K. Sikka, and R. Chidambaram, “Equation of state theories of condensed matter up to about 10 TPa,” Phys. Rep. 102(3), 121–197 (1983).
21. A. C. Landerville, M. W. Conroy, M. M. Budzevich, Y. Lin, C. T. White, and I. Oleynik, “Equations of state for energetic materials from density functional theory with van der Waals, thermal, and zero-point energy corrections,” Appl. Phys. Lett. 97(25), 251908 (2010).
22. W. F. Perger, J. Zhao, J. M. Winey, and Y. M. Gupta, “First-principles study of pentaerythritol tetranitrate single crystals under high pressure: Vibrational properties,” Chem. Phys. Lett. 428(4–6), 394–399 (2006).
23. G. R. Johnson and T. J. Holmquist, “An improved computational constitutive model for brittle materials,” AIP Conf. Proc. 309(1), 981–984 (1994).
24. V. Botu and R. Ramprasad, “Adaptive machine learning framework to accelerate ab initio molecular dynamics,” Int. J. Quantum Chem. 115(16), 1074–1083 (2015).
25. B. C. Barnes, K. W. Leiter, J. P. Larentzos, and J. K. Brennan, “Forging of hierarchical multiscale capabilities for simulation of energetic materials,” Propellants Explos. Pyrotech. 45(2), 177–195 (2020).
26. B. C. Barnes, C. E. Spear, K. W. Leiter, R. Becker, J. Knap, M. Lísal, and J. K. Brennan, “Hierarchical multiscale framework for materials modeling: Equation of state implementation and application to a Taylor anvil impact test of RDX,” AIP Conf. Proc. 1793(1), 080001 (2017).
27. B. C. Barnes, K. W. Leiter, R. Becker, J. Knap, and J. K. Brennan, “LAMMPS integrated materials engine (LIME) for efficient automation of particle-based simulations: Application to equation of state generation,” Modell. Simul. Mater. Sci. Eng. 25(5), 055006 (2017).
28. S. Hamel, L. X. Benedict, P. M. Celliers, M. A. Barrios, T. R. Boehly, G. W. Collins, T. Döppner, J. H. Eggert, D. R. Farley, D. G. Hicks et al., “Equation of state of CH1.36: First-principles molecular dynamics simulations and shock-and-release wave speed measurements,” Phys. Rev. B 86(9), 094113 (2012).
29. S. Ono, J. P. Brodholt, D. Alfè, M. Alfredsson, and G. D. Price, “Ab initio molecular dynamics simulations for thermal equation of state of B2-type NaCl,” J. Appl. Phys. 103(2), 023510 (2008).
30. A. B. Belonoshko, “Molecular dynamics of silica at high pressures: Equation of state, structure, and phase transitions,” Geochim. Cosmochim. Acta 58(6), 1557–1566 (1994).
31. D. A. Rehn, C. W. Greeff, L. Burakovsky, D. G. Sheppard, and S. D. Crockett, “Multiphase tin equation of state using density functional theory,” Phys. Rev. B 103(18), 184102 (2021).
32. E. D. Chisolm, S. D. Crockett, and D. C. Wallace, “Test of a theoretical equation of state for elemental solids and liquids,” Phys. Rev. B 68(10), 104103 (2003).
33. W. D. S. Motherwell, H. L. Ammon, J. D. Dunitz, A. Dzyabchenko, P. Erk, A. Gavezzotti, D. W. M. Hofmann, F. J. J. Leusen, J. P. M. Lommerse, and W. T. M. Mooij, “Crystal structure prediction of small organic molecules: A second blind test,” Acta Crystallogr. B 58(4), 647–661 (2002).
34. G. M. Day, W. D. S. Motherwell, H. L. Ammon, S. X. M. Boerrigter, R. G. Della Valle, E. Venuti, A. Dzyabchenko, J. D. Dunitz, B. Schweizer, and B. P. Van Eijck, “A third blind test of crystal structure prediction,” Acta Crystallogr. B 61(5), 511–527 (2005).
35. R. Tom, T. Rose, I. Bier, H. O'Brien, Á. Vázquez-Mayagoitia, and N. Marom, “Genarris 2.0: A random structure generator for molecular crystals,” Comput. Phys. Commun. 250, 107170 (2020).
36. E. Paquet and H. L. Viktor, “Molecular dynamics, Monte Carlo simulations, and Langevin dynamics: A computational review,” BioMed. Res. Int. 2015, 1–18 (2015).
37. A. R. Oganov and C. W. Glass, “Crystal structure prediction using ab initio evolutionary techniques: Principles and applications,” J. Chem. Phys. 124(24), 244704 (2006).
38. D. H. Case, J. E. Campbell, P. J. Bygrave, and G. M. Day, “Convergence properties of crystal structure prediction by quasi-random sampling,” J. Chem. Theory Comput. 12(2), 910–924 (2016).
39. C. J. Pickard and R. J. Needs, “Ab initio random structure searching,” J. Phys.: Condens. Matter 23(5), 053201 (2011).
40. H. A. Scheraga, M. Khalili, and A. Liwo, “Protein-folding dynamics: Overview of molecular simulation techniques,” Annu. Rev. Phys. Chem. 58, 57–83 (2007).
41. M. AlQuraishi, “AlphaFold at CASP13,” Bioinformatics 35(22), 4862–4865 (2019).
42. D. Chichili, K. Ramesh, and K. Hemker, “The high-strain-rate response of alpha-titanium: Experiments, deformation mechanisms and modeling,” Acta Mater. 46(3), 1025–1043 (1998).
43. J. Jung, J. I. Yoon, H. K. Park, J. Y. Kim, and H. S. Kim, “An efficient machine learning approach to establish structure-property linkages,” Comput. Mater. Sci. 156, 17–25 (2019).
44. F. Musil, S. De, J. Yang, J. E. Campbell, G. M. Day, and M. Ceriotti, “Machine learning for the structure–energy–property landscapes of molecular crystals,” Chem. Sci. 9(5), 1289–1300 (2018).
45. T. L. Galvão, G. Novell-Leruth, A. Kuznetsova, J. Tedim, and J. R. Gomes, “Elucidating structure–property relationships in aluminum alloy corrosion inhibitors by machine learning,” J. Phys. Chem. C 124(10), 5624–5635 (2020).
46. J. Schmidt, M. R. Marques, S. Botti, and M. A. Marques, “Recent advances and applications of machine learning in solid-state materials science,” npj Comput. Mater. 5(1), 83 (2019).
47. Q. Deng, J. Hu, L. Wang, Y. Liu, Y. Guo, T. Xu, and X. Pu, “Probing impact of molecular structure on mechanical property and sensitivity of energetic materials by machine learning methods,” Chemom. Intell. Lab. Syst. 215, 104331 (2021).
48. Y. Liu, T. Zhao, W. Ju, and S. Shi, “Materials discovery and design using machine learning,” J. Materiomics 3(3), 159–177 (2017).
49. R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik, “Automatic chemical design using a data-driven continuous representation of molecules,” ACS Cent. Sci. 4(2), 268–276 (2018).
50. N. C. Iovanac and B. M. Savoie, “Improved chemical prediction from scarce data sets via latent space enrichment,” J. Phys. Chem. A 123(19), 4295–4302 (2019).
51. Z. Li, X. Ma, and H. Xin, “Feature engineering of machine-learning chemisorption models for catalyst design,” Catal. Today 280, 232–238 (2017).
52. D. C. Elton, Z. Boukouvalas, M. S. Butrico, M. D. Fuge, and P. W. Chung, “Applying machine learning techniques to predict the properties of energetic materials,” Sci. Rep. 8(1), 9059 (2018).
53. S. P. Marsh, LASL Shock Hugoniot Data (University of California Press, 1980).
54. J. R. Asay and M. Shahinpoor, High-Pressure Shock Compression of Solids (Springer Science and Business Media, 2012).
55. See https://pubchem.ncbi.nlm.nih.gov/periodic-table/ for PubChem Periodic Table of Elements (National Center for Biotechnology Information, 2022).
56. S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu et al., “PubChem in 2021: New data content and improved web interfaces,” Nucleic Acids Res. 49(D1), D1388–D1395 (2021).
57. S. M. Peiris and G. J. Piermarini, Static Compression of Energetic Materials (Springer, 2008).
58. B. J. Alder and R. H. Christian, “Behavior of strongly shocked carbon,” Phys. Rev. Lett. 7(10), 367 (1961).
59. P. L. Stanton and R. A. Graham, “Shock-wave compression of lithium niobate from 2.4 to 44 GPa,” J. Appl. Phys. 50(11), 6892–6901 (1979).
60. G. F. Davies and D. L. Anderson, “Revised shock-wave equations of state for high-pressure phases of rocks and minerals,” J. Geophys. Res. 76(11), 2617–2627, https://doi.org/10.1029/JB076i011p02617 (1971).
61. V. Revi, S. Kasodariya, A. Talapatra, G. Pilania, and A. Alankar, “Machine learning elastic constants of multi-component alloys,” Comput. Mater. Sci. 198, 110671 (2021).
62. G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng. 40(1), 16–28 (2014).
63. S. Seabold and J. Perktold, “Statsmodels: Econometric and statistical modeling with Python,” in Proceedings of the 9th Python in Science Conference, edited by S. van der Walt and J. Millman (SciPy, 2010), pp. 92–96.
64. L. Alzubaidi, J. Zhang, A. J. Humaidi, A. Al-Dujaili, Y. Duan, O. Al-Shamma, J. Santamaría, M. A. Fadhel, M. Al-Amidie, and L. Farhan, “Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions,” J. Big Data 8(1), 53 (2021).
65. N. K. Dinsdale, E. Bluemke, V. Sundaresan, M. Jenkinson, S. M. Smith, and A. I. Namburete, “Challenges for machine learning in clinical translation of big data imaging studies,” Neuron 110, 3866–3881 (2022).
66. J. M. Walsh, M. H. Rice, R. G. McQueen, and F. L. Yarger, “Shock-wave compressions of twenty-seven metals: Equations of state of metals,” Phys. Rev. 108(2), 196 (1957).
67. G. I. Kerley, “The linear us-up relation in shock-wave physics,” arXiv:1306.6916 (2013).
68. N. Holmes, J. Moriarty, G. Gathers, and W. Nellis, “The equation of state of platinum to 660 GPa (6.6 Mbar),” J. Appl. Phys. 66(7), 2962–2967 (1989).
69. J. M. Brown, J. N. Fritz, and R. S. Hixson, “Hugoniot data for iron,” J. Appl. Phys. 88(9), 5496–5498 (2000).
70. R. E. Setchell and M. U. Anderson, “Shock-compression response of an alumina-filled epoxy,” J. Appl. Phys. 97(8), 083518 (2005).
71. L. B. Munday, P. W. Chung, B. M. Rice, and S. D. Solares, “Simulations of high-pressure phases in RDX,” J. Phys. Chem. B 115(15), 4378–4386 (2011).
72. T. Yan, K. Wang, D. Duan, X. Tan, B. Liu, and B. Zou, “p-Aminobenzoic acid polymorphs under high pressures,” RSC Adv. 4(30), 15534–15541 (2014).
73. C. E. Morris, “Shock-wave equation-of-state studies at Los Alamos,” Shock Waves 1(3), 213–222 (1991).
74. H. K. D. H. Bhadeshia, “Neural networks in materials science,” ISIJ Int. 39(10), 966–979 (1999).
75. Y. Özçelik, S. Kulaksız, and M. Çetin, “Assessment of the wear of diamond beads in the cutting of different rock types by the ridge regression,” J. Mater. Process. Technol. 127(3), 392–400 (2002).
76. L. C. Yee and Y. C. Wei, “Current modeling methods used in QSAR/QSPR,” Assessment 10, 1978 (2012).
77. J. Quinonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate Gaussian process regression,” J. Mach. Learning Res. 6, 1939–1959 (2005).
78. C. E. Rasmussen and C. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
79. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” J. Mach. Learning Res. 12, 2825–2830 (2011); available at http://jmlr.org/papers/v12/pedregosa11a.html.
80. F. Bowman, Introduction to Bessel Functions (Courier Corporation, 2012).
81. I. Nunez, A. Marani, and M. L. Nehdi, “Mixture optimization of recycled aggregate concrete using hybrid machine learning model,” Materials 13(19), 4331 (2020).
82. M. Gheytanzadeh, A. Baghban, S. Habibzadeh, E. A. Baghban, O. Abida, A. Mohaddespour, and M. T. Munir, “Towards estimation of CO2 adsorption on highly porous MOF-based adsorbents using Gaussian process regression approach,” Sci. Rep. 11(1), 1–13 (2021).
83. B. Hilloulin, M. Lagrange, M. Duvillard, and G. Garioud, “ε-greedy automated indentation of cementitious materials for phase mechanical properties determination,” Cem. Concr. Compos. 129, 104465 (2022).
84. E. Ford, S. Kailas, K. Maneparambil, and N. Neithalath, “Machine learning approaches to predict the micromechanical properties of cementitious hydration phases from microstructural chemical maps,” Constr. Building Mater. 265, 120647 (2020).
85. D. Furuya, T. Miyashita, Y. Miura, Y. Iwasaki, and M. Kotsugi, “Autonomous synthesis system integrating theoretical, informatics, and experimental approaches for large-magnetic-anisotropy materials,” Sci. Technol. Adv. Mater. Methods 2(1), 280–293 (2022).
86. Y.-F. Lim, C. K. Ng, U. Vaitesswar, and K. Hippalgaonkar, “Extrapolative Bayesian optimization with Gaussian process and neural network ensemble surrogate models,” Adv. Intell. Syst. 3(11), 2100101 (2021).
87. A. C. Rajan, A. Mishra, S. Satsangi, R. Vaish, H. Mizuseki, K.-R. Lee, and A. K. Singh, “Machine-learning-assisted accurate band gap predictions of functionalized MXene,” Chem. Mater. 30(12), 4031–4038 (2018).
88. F. Chollet et al., “Keras,” https://keras.io (2015).
89. S. S. Batsanov, Effects of Explosions on Materials: Modification and Synthesis Under High-Pressure Shock Compression (Springer Science and Business Media, 1994).
90. T. J. Ahrens and M. L. Johnson, “Shock wave data for rocks,” in Mineral Physics and Crystallography: A Handbook of Physical Constants (American Geophysical Union, Washington, DC, 1995), Vol. 3, pp. 35–44.
91. J. J. Petrovic and C. L. Haertling, Beryllium Oxide (BeO) Handbook (Los Alamos National Laboratory, 2020).
92. A. Agarwal, A. Kontny, T. Kenkmann, and M. H. Poelchau, “Variation in magnetic fabrics at low shock pressure due to experimental impact cratering,” J. Geophys. Res. Solid Earth 124(8), 9095–9108, https://doi.org/10.1029/2018JB017128 (2019).
93. H.-K. Mao, T. Takahashi, W. A. Bassett, G. L. Kinsland, and L. Merrill, “Isothermal compression of magnetite to 320 KB,” J. Geophys. Res. 79(8), 1165–1170, https://doi.org/10.1029/JB079i008p01165 (1974).
94. D. Grady, “Shock-wave strength properties of boron carbide and silicon carbide,” Le J. Phys. IV 4(C8), C8-385 (1994).
95. T. Mashimo, K. Tsumoto, K. Nakamura, Y. Noguchi, K. Fukuoka, and Y. Syono, “High-pressure phase transformation of corundum (α-Al2O3) observed under shock compression,” Geophys. Res. Lett. 27(14), 2021–2024, https://doi.org/10.1029/2000GL008490 (2000).
96. R. McQueen, S. Marsh, and J. Fritz, “Hugoniot equation of state of twelve rocks,” J. Geophys. Res. 72(20), 4999–5036, https://doi.org/10.1029/JZ072i020p04999 (1967).
97. F. Murnaghan, Finite Deformation of an Elastic Solid (Chapman and Hall, London, 1951).
98. F. Birch, “Elasticity and constitution of the Earth's interior,” J. Geophys. Res. 57(2), 227–286, https://doi.org/10.1029/JZ057i002p00227 (1952).
99. J. Bardeen, “Compressibilities of the alkali metals,” J. Chem. Phys. 6(7), 372–378 (1938).
100. L. A. Davis and R. B. Gordon, “Compression of mercury at high pressure,” J. Chem. Phys. 46(7), 2650–2660 (1967).
101. P. M. Morse, “Diatomic molecules according to the wave mechanics. II. Vibrational levels,” Phys. Rev. 34(1), 57 (1929).
102. E. Grüneisen, “Theorie des festen Zustandes einatomiger Elemente” [“Theory of the solid state of monatomic elements”], Ann. Phys. 344(12), 257–306 (1912).
103. J. C. Slater, Introduction to Chemical Physics (Read Books Limited, 2011).
104. J. Dugdale and D. MacDonald, “The thermal expansion of solids,” Phys. Rev. 89(4), 832 (1953).
105. Y. Wang, X. Zeng, H. Chen, X. Yang, F. Wang, and J. Ding, “Hugoniot states and Mie–Grüneisen equation of state of iron estimated using molecular dynamics,” Crystals 11(6), 664 (2021).
106. V. Chuzeville, G. Baudin, M. Genetier, A. Lefrancois, R. Boulanger, and L. Catoire, “Complete Mie-Grüneisen equation of state for several explosives and universal unreacted Hugoniot relations,” in 16th International Detonation Symposium (JHU WSE Energetics Research Group, Cambridge, MD, 2018).
107. S. Marsh, “Hugoniot equation of state of beryllium oxide,” High Temp. High Press. 5(5), 503–508 (1973).

Supplementary Material