Data-driven machine learning techniques can be useful for the rapid evaluation of material properties in extreme environments, particularly in cases where direct access to the materials is not possible. Such problems occur in high-throughput material screening and material design approaches where many candidates may not be amenable to direct experimental examination. In this paper, we perform an exhaustive examination of the applicability of machine learning for the estimation of isothermal shock compression properties, specifically the shock Hugoniot, for diverse material systems. A comprehensive analysis is conducted where effects of scarce data, variances in source data, feature choices, and model choices are systematically explored. New modeling strategies are introduced based on feature engineering, including a feature augmentation approach, to mitigate the effects of scarce data. The findings show significant promise of machine learning techniques for design and discovery of materials suited for shock compression applications.

## I. INTRODUCTION

Advances in the understanding of shock compression mechanisms in materials and flows have enabled impressive successes in diverse applications including drug delivery,^{1} destruction of kidney stones,^{2} effect of meteorite impacts on surfaces of planets,^{3,4} the study of volcanoes,^{5} and laser ablation applications.^{6} Fundamental to the study of such mechanisms is knowledge about material- and condition-dependent behaviors such as shock velocity, particle velocity, and pressure behind a shock wave. The equations of state and conservation laws for mass, momentum, and energy encapsulate this knowledge and provide the formal shock Hugoniot/equation of state (SH/EOS) relationships which govern shock compression behaviors through a range of thermodynamic conditions.

In scenarios where rapid evaluation of notional material designs for SH/EOS is needed or where experiments are prohibitively difficult to conduct, data-driven methods may serve an important role. Notional materials are materials that may be inaccessible, have no known synthesis route, or cannot be studied either *in silico* or *in situ* for other reasons. For example, the material may not exist terrestrially, be sensitive to strongly exothermic reactions,^{7} or be an *operando* biological material.^{8} Determination of the SH/EOS relationships for general media can require a significant amount of information that often necessitates full realization of the material and extensive measurements. In the context of optimization or design of materials, the cost and time required to obtain this information, if such data can be obtained at all,^{9} can greatly inhibit exploration of the material design space.

Two traditional approaches are available for the determination of SH/EOS relationships.^{10–13} The first is empirical^{14–16} and the second is via physics-based models.^{12,13,17–22} To varying degrees, both generally reduce to a parameterization of an underlying constitutive relationship.^{11,23} Though empirical models are popular for data analysis and extrapolation, constructing these models necessarily requires reliable source data. Multiscale simulations can overcome some of the more costly or time-consuming elements of experiments, but only when the idealizations required in the models are valid, such as a gas phase assumption or periodic unit cell boundary conditions.^{24} Today, the experimental and multiscale approaches are mainstays, each associated with a prolific literature, but both require information that can be difficult to acquire for notional material candidates.

Multiscale modeling is used extensively for the study of materials in extreme environments.^{25–27} Studies exist based on molecular dynamics^{17,18,28–30} and electronic structure methods.^{20–22,31,32} Yet, there are significant challenges in developing techniques that use multiscale methods as the basis for rapid screening of notional materials. First, the range of materials used in the context of shock compression is not limited to elemental crystals and alloys; the types of materials often used can be highly polymorphic^{10} and have crystal unit cells with tens or hundreds of atoms. This makes the problem cost prohibitive since structure determination^{33–35} and configuration optimization,^{36–39} well suited for the study of a single material at a time, must be scaled to screen thousands, if not millions, of compounds with an accuracy that can distinguish different crystalline polymorphs. The difficulty of this task is echoed in other fields, such as molecular simulations of folded proteins,^{40} where despite the ostensible maturity of computational techniques, the determination of accurate condensed phase structures starting only from a notional molecular graph can be difficult. In this area, machine learning (ML) has already been making advances.^{41}

Second, many mechanisms important for the performance of materials in extreme environments manifest at length scales that cannot be handled rapidly using atomistic models. These mechanisms can manifest at the microstructural scale due to, for instance, submicrometer defects or interfaces present in heterogeneous mixtures.^{42} At these length scales, performing a single simulation with electronic structure accuracy is prohibitively difficult, let alone performing many simulations in the context of statistical estimation over many different material and configurational combinations.

ML methods are well suited to model mechanisms or behaviors^{43–45} that cannot be modeled solely using physics-based approaches due to, for instance, the lack of an existing theoretical or analytic model.^{46–48} ML methods are also amenable to design or optimization because the models often contain an expressive response surface. In biochemical property models, ML prediction tools have been used to create surrogate models to navigate the response surface with respect to chemical composition and enable the search for molecules with optimal properties such as water-octanol partition coefficient, drug-likeness, synthetic accessibility, and (de)protonation free energy.^{49,50} In the present paper, this means that ML methods can estimate the shock compression properties for a notional material, or inversely, find material candidates that can deliver a desired shock compression performance. This inverse design ability can accelerate the development of new materials for biological, medical, material, space, and other applications.

To the best of our knowledge, no data-driven approach for the prediction of shock compression properties of notional materials is available. In this paper, we assess ML techniques for estimating the SH properties based on data at standard temperature and pressure. A comprehensive evaluation is performed, organized by data, features, and model considerations. First, the data considerations center on scarce data challenges and the associated methods that may be used to mitigate them. Specifically, the union of multiple data sets and the variances in the data due to noise are systematically explored. Second, the properties of ML models heavily depend upon strong feature engineering^{51} which, in turn, depends on the availability of features in the data and the application of rational and deep domain knowledge by the user. Overfitting and over-parameterization^{52} can be mitigated by the sensible design, selection, or engineering of features. In this work, three featurization methods are considered: naïve, physics-informed, and high-throughput features. Finally, we consider both the form of the physical model, in this case the SH, and the choice of the ML model. The physical model is presumed to take the form of a polynomial relating the particle and shock velocities. The ML models considered include Gaussian process regression (GPR), ridge regression (RR), and neural network (NN).

The rest of the paper is organized as follows. Section II describes the data and methodology details. Sections III and IV contain the results and discussions, and Sec. V is reserved for the conclusions. A brief overview of the symbols and abbreviations is given in Table S3 in the supplementary material.

## II. DATA, METHODS, AND APPROACHES

### A. Data sources and preparation

The data used in the present paper come primarily from Ref. 53 which is a consolidated report summarizing experimental shock compression properties of a highly diverse set of 474 materials from ten different material classes: 100 materials containing a single chemical element, 51 alloys, 106 minerals, 32 rocks and mixtures of minerals, 42 plastics, 41 other synthetics, 12 woods, 26 liquids, 6 aqueous solutions, and 58 energetic materials (high explosives and propellants). The shock and particle data are reported in raw form for each material along with the initial sample densities ($\rho^{o}$) and longitudinal and shear acoustic velocities measured at ambient conditions. Though the sample densities were reported for all 474 materials in the data set, longitudinal and shear acoustic velocity data were reported for only 199 materials across nine different classes. The one missing class is woods, for which acoustic velocities were not reported. The set of $N_{\text{total}}=199$ materials constitutes the principal data used in Sec. II B 1.

In Ref. 53, the initial density of a material varies substantially between measurements. The standard deviation in the initial densities is shown in Fig. 1, using $s_i^{\rho} = \sqrt{\sum_j \left[ \frac{(\rho_{ij}^{o} - \bar{\rho}_i^{o})^{2}}{n_i - 1} \right]}$, where *i* is the material index, $n_i$ denotes the number of data points in the SH for the $i$th material, $j \in (1, n_i)$ is the index for the experimental SH data points, and $\bar{\rho}_i^{o}$ is the mean value of $\rho_{ij}^{o}\ \forall j$. $\bar{\rho}_i^{o}$ varies from 0.66 to 22.47 g/cm^{3} and the standard deviation ranges from 0 to 0.20 g/cm^{3}. $s_i^{\rho}$ was zero for some of the materials in the data, as their initial density values were the same for all state points within an SH curve. Forty-five materials have $s_i^{\rho}$ larger than 0.03 g/cm^{3}. Only the first six material classes are represented in this set, including elements, alloys, minerals, rocks and mixtures of minerals, plastics, and synthetics. This leads us to believe that this fraction of the data may reflect the effects of other components in the mixture or other morphological features. Despite such large variations, the diverse set of 199 materials is used without prejudice.
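
As a minimal sketch of the per-material density spread described above, the snippet below computes the mean and the sample standard deviation (with the $n_i - 1$ denominator) of the initial densities recorded along one SH curve. The function name `density_spread` and the example densities are illustrative assumptions and are not values taken from Ref. 53.

```python
import numpy as np

def density_spread(rho_ij):
    """Return (mean, sample standard deviation) of initial densities for one material.

    rho_ij: initial densities in g/cm^3, one per state point on the SH curve.
    The standard deviation uses the n_i - 1 denominator; a single point gives 0.0.
    """
    rho = np.asarray(rho_ij, dtype=float)
    mean = rho.mean()
    if rho.size < 2:
        return mean, 0.0
    return mean, rho.std(ddof=1)

# Hypothetical densities for one material (illustrative only).
mean_rho, s_rho = density_spread([2.70, 2.71, 2.69, 2.70])

# 0.03 g/cm^3 is the threshold the text uses to single out large spreads.
has_large_spread = s_rho > 0.03
```

A material whose samples all share one density returns a spread of exactly zero, which matches the zero-valued cases noted in the text.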

The data for the Grüneisen parameters used in Sec. II B 2 come from the Appendix in Ref. 54, of which 46 materials are also found in Ref. 53. The information about compositional features is used for the ML models. The compositional features include elemental properties, molecular formulas, and other physical properties that are generally known for compositionally pure ingredients.^{55} The compositional features were determined for 159 materials from the set of 199. The specific values and associated references are listed in the supplementary material.

### B. Feature engineering

Multiple feature vectors are considered in this paper and are summarized here. A feature vector is generally defined as $x \in \mathbb{R}^{m}$, where $m$ is the chosen number of features. Ideally, $x$ will include the most important factors that are needed to accurately predict the SH curve. The use of unnecessary or redundant factors will result in a higher dimensionality of the representation vector which can burden the model and increase the resource demands without significantly improving prediction accuracy. A careful design of a model's feature vector should consider the expected insights gained from the inclusion of each factor and the likely cost to acquire its values.

The various studies and their associated feature and target variables considered in the present paper are summarized in Table I.

| No. | Study | Data | Results | $N_{\text{total}}$ and data sources | Feature sets |
|---|---|---|---|---|---|
| 1 | Naïve features | Sec. II B 1 | Sec. III A | 199 materials^{53} | (ρ^{o}, v^{T}, v^{L}) |
| 2 | Physics-informed features | Sec. II B 2 | Sec. III B 1 | 46 materials^{53,54} | (ρ^{o}, v^{T}, v^{L}) vs (ρ^{o}, v^{T}, v^{L}, γ) |
| 3 | Linear combination of compositional features (LCCFs) | Sec. II B 3 | Sec. III B 2 | 159 materials^{53,56} | (ρ^{o}, v^{T}, v^{L}) vs $(\bar{m}, \bar{r}, \bar{\chi}, \bar{E}, \bar{T})$ vs $(\rho^{o}, v^{T}, v^{L}, \bar{r}, \bar{\chi}, \bar{E}, \bar{T})$ |
| 4 | Segregated vs combined data | Sec. II C 1 | Sec. III C 1 | 199 materials^{53} | (ρ^{o}, v^{T}, v^{L}) |
| 5 | Source data variance | Sec. II C 2 | Sec. III C 2 | 131, 192, 197, 199, or 131 materials^{53} | (ρ^{o}, v^{T}, v^{L}) |

#### 1. Naïve features: Density and speed of sound

In study 1, the feature set comprises the density and the shear and longitudinal sound velocities $(x \in \mathbb{R}^{3})$. This is presently called the “naïve” set because this information is already provided in Ref. 53 and the variables are correlated to the shock compression properties in many solids.^{57–60} A separate density is reported in Ref. 53 for each sample prior to its measurement, but only a single value based on their average is used in the input feature vector presently.

#### 2. Physics-informed features: Thermodynamics

Shock compression is a strongly thermophysical process. This insight can be featurized through the Grüneisen parameter. The resulting feature vector in study 2 is presently termed the physics-informed feature set containing the naïve features concatenated with the Grüneisen parameter $(x \in \mathbb{R}^{4})$. Under high compression, a change in density must inevitably lead to a change in the acoustic wave speed, which is relevant to the Grüneisen parameter. The efficacy of including this value into the features is, therefore, examined. To ensure that the dimensionality of the feature vectors is the same with and without the Grüneisen parameter, the “without” case is trained and tested with the Grüneisen parameter set to a constant value of zero for all materials. The “with” case is then trained and tested with the values of the Grüneisen parameter from Ref. 54. The values in Ref. 54 were estimated using zero pressure thermodynamic parameters such as thermal expansion coefficient, zero pressure bulk modulus, specific heat at constant volume, and initial density.
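
The "with" and "without" feature vectors can be sketched as follows. The exact estimation procedure of Ref. 54 is not reproduced here; the sketch assumes the standard thermodynamic definition $\gamma = \alpha_V K / (\rho^{o} c_v)$, which uses the same four quantities named above. The function names and the numerical inputs are hypothetical and serve only to illustrate the units and the zero-filled "without" case.

```python
import numpy as np

def gruneisen(alpha_v, bulk_modulus, rho0, c_v):
    """Thermodynamic Grueneisen parameter, gamma = alpha_V * K / (rho0 * c_v).

    alpha_v:      volumetric thermal expansion coefficient (1/K)
    bulk_modulus: zero-pressure isothermal bulk modulus (Pa)
    rho0:         initial density (kg/m^3)
    c_v:          specific heat at constant volume (J/(kg K))
    The result is dimensionless.
    """
    return alpha_v * bulk_modulus / (rho0 * c_v)

def feature_vector(rho0, v_t, v_l, gamma=None):
    """Four-element feature vector (rho0, v_T, v_L, gamma).

    gamma=None reproduces the "without" case, where the Grueneisen
    entry is held at zero so that both cases have the same dimension.
    """
    return np.array([rho0, v_t, v_l, 0.0 if gamma is None else gamma])

# Hypothetical, order-of-magnitude inputs for a dense metal (illustrative only).
gamma = gruneisen(alpha_v=4.95e-5, bulk_modulus=1.4e11, rho0=8960.0, c_v=385.0)

x_without = feature_vector(8.96, 2.3, 4.7)           # gamma entry fixed at 0
x_with = feature_vector(8.96, 2.3, 4.7, gamma=gamma)  # gamma entry populated
```

Holding the fourth entry at zero keeps the model architecture and kernel dimensionality identical between the two cases, so any difference in accuracy can be attributed to the information carried by the Grüneisen parameter.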

#### 3. Feature augmentation: Linear combination of compositional features (LCCFs) for mixtures

In study 3, we explore the use of simple, low-cost compositional information using a linear combination of compositional features (LCCFs) $(x \in \mathbb{R}^{5})$. The terms of the LCCF vector are the atomic mass $\bar{m}$, atomic radius $\bar{r}$, electronegativity $\bar{\chi}$, ionization energy $\bar{E}$, and melting point $\bar{T}$. LCCF provides underlying contextual information that can be used to enrich available material data. The study will first examine the predictive performance of LCCF terms alone as the sole features (“LCCF-Only”) and then the effect of concatenating the LCCF features to the naïve features (“LCCF + Naïve”).

The LCCF vector comprises features obtained from simple rules of mixtures and basic information about the composition. Namely, the features are based on the mole fractions of ingredients, a combined single molecule representation of all fractional chemical species, even for mixtures containing multiple ingredients, and basic properties for elements in the periodic table. This is described in the discussion for Eq. (1). The data in Ref. 53, in contrast, contain a unique density for each experimental sample; two samples with ostensibly identical molecular formula may report different density values due to sample variations. This necessarily means LCCF, when used on its own, cannot account for polymorphs, allotropes, microstructure, grain orientation, or any configurational detail that is not represented by the simple molecule representation. Other efforts have estimated the SH of mixtures using the SH of the individual constituents^{10} and used ML with rules of mixtures to estimate the elastic properties of alloys.^{61} In Fig. 2, the SH curves are estimated in three possible ways using distinct features. The first featurization uses the as-is measured data provided in Ref. 53 (Naïve), the second uses notional ingredients (LCCF-Only), and the third concatenates both of these (LCCF + Naïve).

Let the pseudomolecular formula of a material contain the elements $a_1, a_2, \ldots, a_n$ with stoichiometric coefficients $x_1, x_2, \ldots, x_n$. For example, for C_{3}H_{6}N_{6}O_{6}, $a_1=\mathrm{C}$, $a_2=\mathrm{H}$, $x_1=3$, $x_2=6$, and so on. Then, the feature is determined using

$$\bar{\varphi} = \frac{\sum_{k=1}^{n} x_k\, \varphi_{a_k}}{\sum_{k=1}^{n} x_k}, \tag{1}$$

where $\varphi_{a_1}$, $\varphi_{a_2},\ldots,\varphi_{a_n}$ are the feature values of the constituent elements. This is effectively a mole-fraction-weighted rule of mixtures. The feature values, $\varphi_{a_1}$, $\varphi_{a_2},\ldots,\varphi_{a_n}$, come from Ref. 55 for each constituent element. We use the elemental properties that are based on the stable physical state at standard room temperature and pressure conditions typically published in the periodic table. The data also include information about the melting temperature. For consistency, we use the as-is value of melting temperature even though the elemental stable phase may not be a solid.
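
A minimal sketch of the mole-fraction-weighted rule of mixtures is given below for a single LCCF term. The helper names `parse_formula` and `lccf` are hypothetical, the parser handles only flat formulas without parentheses, and the atomic masses are rounded standard atomic weights used for illustration rather than the values of Ref. 55.

```python
import re

def parse_formula(formula):
    """Parse a flat formula such as 'C3H6N6O6' into {'C': 3.0, 'H': 6.0, ...}.

    Supports element symbols followed by optional (possibly fractional) counts.
    Parentheses, hydrates, and charges are out of scope for this sketch.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    return counts

def lccf(formula, elemental_values):
    """Mole-fraction-weighted average of an elemental property over a formula."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return sum(x * elemental_values[a] for a, x in counts.items()) / total

# Rounded standard atomic weights in g/mol (illustrative inputs).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Mean atomic mass term for the example formula used in the text.
m_bar = lccf("C3H6N6O6", ATOMIC_MASS)
```

The same function can be reused for each of the other LCCF terms by passing a different dictionary of elemental properties (radius, electronegativity, ionization energy, or melting point).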

The final set of features used for LCCF and LCCF + Naïve are found after eliminating information redundancy among the terms. The redundancy was determined using the Variance Inflation Factor (VIF)^{62} based on the statsmodels package.^{63} VIF represents the degree of multicollinearity, or information redundancy, among feature variables. The VIF was determined for each feature and values > 10 are taken to be an indication of high collinearity. The final LCCF feature set is $x=\{\bar{m}, \bar{r}, \bar{\chi}, \bar{E}, \bar{T}\}$. The final LCCF + Naïve set is $x=\{\rho^{o}, v^{T}, v^{L}, \bar{r}, \bar{\chi}, \bar{E}, \bar{T}\}$ after $\bar{m}$ was removed due to its high VIF value and likely information redundancy with $\rho^{o}$.
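
The VIF screening can be illustrated with a short sketch. To keep the example self-contained, it computes the VIF directly with NumPy as $1/(1-R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on the remaining features with an intercept; this is a stand-in for the statsmodels routine used in the paper, and the synthetic feature matrix is an assumption made purely for demonstration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (samples x features).

    Column j is regressed on all other columns plus an intercept, and
    VIF_j = 1 / (1 - R_j^2). Perfect collinearity is reported as inf.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    out = np.empty(m)
    for j in range(m):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Synthetic features: column 2 is nearly a copy of column 0 (redundant pair).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
vif_values = vif(np.column_stack([a, b, c]))

# Features above the threshold of 10 used in the text are flagged as collinear.
redundant = vif_values > 10
```

In this toy case the two nearly identical columns are both flagged; which member of a redundant pair is dropped is then a modeling decision, as with the removal of $\bar{m}$ in favor of $\rho^{o}$.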

### C. Scarce data and data variance

The lack of training data is a major challenge in the translation of ML to any field.^{64,65} With such a small amount of data and the need to model many material types and classes, variances in data can exacerbate the problems associated with scarce data.

#### 1. ML from segregated vs combined material class data

Study 4 is used to assess the value of combining data from multiple material classes when in the small data regime, despite the variance of properties between material classes. ML model performance is determined by comparing ML prediction accuracy for models trained and tested on each class separately vs an ML model trained and tested on all of the classes combined. This evaluates a transfer learning effect due to the enrichment of the data from multiple classes. The nine material classes in the data set are described in Sec. II A. The number of materials in each class ranges between 5 and 56. The training and evaluation procedures are described in Sec. II D 3.

#### 2. Accounting for variances in data

A second type of variance occurs in the data due to potential variations in measured properties even within a single SH curve. For example, within any single SH curve in the data of Ref. 53, each point is recorded with its own unique density.

In study 5, the effect of this type of variance on the ML models is studied by regrouping the data into five subsets with progressively larger degrees of SH variability with respect to the slope coefficient, $C_1$, in the SH model, which is described below in Sec. II D 1. The first subset comprises 131 materials and has the smallest variance. The data are limited to materials whose standard error of the SH regression slope ($s^{C_1}$, defined in S1) is smaller than 0.04, shown in Fig. 3. Materials whose $s^{C_1}$ is larger are shown in Fig. S4 in the supplementary material to have SH curves that exhibit a noticeable departure from linearity. The 131 materials in this subset exclude materials with $s_i^{C_1}>0.04$ and $n_i<4$ (five such materials), the latter due to the use of the standard error.

The next three subsets correspond to materials whose standard error in $C_1$ is bounded from above by $s_i^{C_1}<0.2$, $s_i^{C_1}<0.4$, and $s_i^{C_1}<0.7$ and contain 192, 197, and 199 materials, respectively. The fifth and final subset ensures that the sample size effects are accounted for and limits the standard error to $s_i^{C_1}<0.7$ but for only 131 materials randomly chosen across trials, identical in size to the first subset. Notably, the fourth subset contains all materials in the data, implying that all materials have standard errors in the regression slope smaller than 0.7. However, because the fifth subset has the same bound but a smaller sample size, we reference the fifth subset using $s_i^{C_1}<\infty$.
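
The regrouping by slope uncertainty can be sketched as below. The sketch assumes the textbook standard error of the slope in simple linear regression; the paper's own definition is given in its supplementary material and may differ in detail. The function names, the `min_points` argument, and the two toy materials are hypothetical.

```python
import numpy as np

def slope_standard_error(u_p, u_s):
    """Standard error of the fitted slope C1 in u_s = C0 + C1 * u_p.

    Uses s = sqrt( SSR / (n - 2) / sum((u_p - mean(u_p))^2) ), which
    requires at least three state points.
    """
    u_p = np.asarray(u_p, dtype=float)
    u_s = np.asarray(u_s, dtype=float)
    n = u_p.size
    if n < 3:
        raise ValueError("at least three state points are required")
    c1, c0 = np.polyfit(u_p, u_s, 1)  # highest-order coefficient first
    resid = u_s - (c0 + c1 * u_p)
    sxx = ((u_p - u_p.mean()) ** 2).sum()
    return np.sqrt((resid @ resid) / (n - 2) / sxx)

def select_subset(materials, threshold, min_points=4):
    """Names of materials with enough points and slope error below threshold."""
    keep = []
    for name, (u_p, u_s) in materials.items():
        if len(u_p) >= min_points and slope_standard_error(u_p, u_s) < threshold:
            keep.append(name)
    return keep

# Two toy SH curves (illustrative only): one exactly linear, one scattered.
materials = {
    "linear": ([0.0, 1.0, 2.0, 3.0], [5.0, 6.5, 8.0, 9.5]),
    "scattered": ([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0]),
}
tight = select_subset(materials, threshold=0.04)
loose = select_subset(materials, threshold=0.2)
```

Raising the threshold admits progressively noisier SH curves, which mirrors the sequence of bounds 0.04, 0.2, 0.4, and 0.7 described above.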

### D. Models

#### 1. Target model: Higher-order polynomial SH relation

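The Rankine–Hugoniot jump conditions^{66} can be used to convert between $u_s$–$u_p$ and $P$–$V$ information using the respective statements for conservation of mass and momentum,^{67–70}

$$\rho^{o} u_s = \rho\,(u_s - u_p), \tag{2}$$

$$P - P^{o} = \rho^{o} u_s u_p, \tag{3}$$

where $\rho = 1/V$ and $P$ are the density and pressure behind the shock and $P^{o}$ is the initial pressure. The SH is modeled as a polynomial relating the shock and particle velocities,

$$u_s = C_0 + C_1 u_p + C_2 u_p^{2} + C_3 u_p^{3}. \tag{4}$$

Departures from linearity in the SH have been attributed to several factors,^{71–73} including porosity^{10} and varying material composition or material characteristics,^{10} as is found in materials with microstructures that are vitreous, fibrous, or anisotropic. However, for the data considered presently, we found that many of the materials exhibit a dominant linear behavior. It will be shown later in Sec. III A that higher-order terms in the model struggle to provide the level of ML prediction accuracy obtained using a simpler linear assumption. So, it is also convenient to define a linear SH model,

$$u_s = C_0 + C_1 u_p. \tag{5}$$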

The coefficients obtained directly via least squares from the source data are used as ground-truth values to evaluate the ML models in Sec. II D 2. In Sec. S10 in the supplementary material, orthogonal polynomials are shown to yield improved higher-order accuracy but at the cost of reduced accuracy at lower orders.
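
The two operations described in this subsection, fitting ground-truth SH coefficients and converting between $u_s$–$u_p$ and $P$–$V$ information, can be sketched as follows. The conversion assumes the standard mass and momentum jump conditions for a material initially at rest, $P = P^{o} + \rho^{o} u_s u_p$ and $V = (1/\rho^{o})(1 - u_p/u_s)$. The function names and the sample state points are hypothetical.

```python
import numpy as np

def fit_sh(u_p, u_s, order=1):
    """Least-squares SH coefficients [C0, C1, ...], lowest order first."""
    return np.polynomial.polynomial.polyfit(u_p, u_s, order)

def us_up_to_pv(u_s, u_p, rho0, p0=0.0):
    """Convert (u_s, u_p) state points to (P, V) using the jump conditions.

    With velocities in km/s and rho0 in g/cm^3, the pressure is in GPa
    and the specific volume is in cm^3/g.
    """
    u_s = np.asarray(u_s, dtype=float)
    u_p = np.asarray(u_p, dtype=float)
    pressure = p0 + rho0 * u_s * u_p          # momentum conservation
    volume = (1.0 / rho0) * (1.0 - u_p / u_s)  # mass conservation
    return pressure, volume

# Hypothetical state points lying on a linear SH with C0 = 5.0 and C1 = 1.4.
u_p = np.array([0.5, 1.0, 1.5, 2.0])
u_s = 5.0 + 1.4 * u_p

coeffs = fit_sh(u_p, u_s, order=1)
pressure, volume = us_up_to_pv(u_s, u_p, rho0=2.7)
```

Passing `order=2` or `order=3` to `fit_sh` returns the higher-order coefficients in the same lowest-order-first layout, so the ML targets for each polynomial order can be generated with one routine.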

#### 2. ML models

ML methods, such as neural networks,^{74} ridge regression^{75} and Gaussian process regression,^{76} have been used previously to learn material structure–property relationships. Gaussian process regression (GPR) is particularly noted for its ability to model stochastic systems^{77} and is, therefore, a good choice for the study of SH data. The main idea in GPR models is to use covariances in data to parameterize a jointly distributed probability density function. The function typically presumes a kernel. In Sec. III A, GPR will be shown to be the strongest performing model in this study and will, therefore, serve as the primary ML model used elsewhere in Sec. III.

The GPR^{78} implementation is based on Scikit-learn.^{79} For a detailed introduction to the fundamentals of GPR, we refer the readers to Ref. 78. In this work, the kernel function $k(x_i, x_{i'})$ is defined as a sum of a Matern covariance kernel $[\sigma_f^{2} k_1(x_i, x_{i'})]$ and white noise kernel $[k_2(x_i, x_{i'})]$, where $i, i' \in (1, N)$ are material indices, *N* is the number of materials in training data, $\sigma_f^{2}$ is a variance hyperparameter, and $x_i, x_{i'} \in \mathbb{R}^{m}$ are the input feature vectors. $k_1(x_i, x_{i'})$ is a Matern correlation kernel given by^{79}

$$k_1(x_i, x_{i'}) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, \lVert x_i - x_{i'} \rVert}{l} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\, \lVert x_i - x_{i'} \rVert}{l} \right), \tag{6}$$

where $K_\nu(\cdot)$ is a modified Bessel function,^{80} $\Gamma(\nu)$ is the gamma function, *l* is the length scale parameter of the kernel, and $\nu$ is the smoothness parameter of the learned function. The physical features in $x_i$ may contain density and sound velocities (see Sec. II B 1), Grüneisen parameter (see Sec. II B 2) or LCCF (see Sec. II B 3) as summarized in Table I. Section S8 in the supplementary material provides a comparison of the SH predictions using an isotropic and anisotropic kernel. Based on the observations in Sec. S8 in the supplementary material, we proceed with an isotropic kernel in the rest of this study. The white noise kernel specifies the noise level for the GPR by adding independently and identically normally distributed noise to the kernel $k(x_i, x_{i'})$. The white noise kernel is defined by

$$k_2(x_i, x_{i'}) = \alpha^{2}\, \delta_{i i'}. \tag{7}$$

Each of the coefficients in Eqs. (4) and (5) is modeled using an independent GPR model. The hyperparameters are the length scale *l* and smoothness $\nu$ in the Matern kernel in Eq. (6) and $\alpha^{2}$ in the white noise kernel in Eq. (7). The popular choices for $\nu$ are 1/2,^{81,82} 3/2,^{83–85} and 5/2.^{84–87} The GPR model is determined by the values of the hyperparameters $\nu$, *l*, and $\alpha^{2}$ that give the minimum value of the negative log likelihood.
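
As a minimal sketch of the composite kernel, the snippet below builds the training covariance matrix from a scaled Matern term plus a white noise term. It is written in plain NumPy rather than with the Scikit-learn kernel classes used in the paper, it covers only the three popular half-integer values of $\nu$ for which the Matern kernel has well-known closed forms, and it assumes the white noise contributes $\alpha^{2}$ on the diagonal only. All function names and inputs are hypothetical.

```python
import numpy as np

def matern(d, length_scale, nu):
    """Isotropic Matern correlation for nu in {0.5, 1.5, 2.5} (closed forms)."""
    s = np.asarray(d, dtype=float) / length_scale
    if nu == 0.5:
        return np.exp(-s)
    if nu == 1.5:
        a = np.sqrt(3.0) * s
        return (1.0 + a) * np.exp(-a)
    if nu == 2.5:
        a = np.sqrt(5.0) * s
        return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)
    raise ValueError("this sketch supports nu = 0.5, 1.5, or 2.5 only")

def kernel_matrix(X, sigma_f2, length_scale, nu, alpha2):
    """Training covariance: sigma_f^2 * Matern + alpha^2 on the diagonal."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between feature vectors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return sigma_f2 * matern(d, length_scale, nu) + alpha2 * np.eye(len(X))

# Two hypothetical, already scaled feature vectors separated by unit distance.
K = kernel_matrix([[0.0, 0.0], [1.0, 0.0]],
                  sigma_f2=2.0, length_scale=1.0, nu=1.5, alpha2=0.1)
```

The noise term raises only the diagonal, which keeps the matrix well conditioned and lets the model attribute part of the scatter in the fitted SH coefficients to measurement noise rather than to the features.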

In this work, the mean prediction of the GPR model on the test data is used in the evaluation of model performance. The performance is compared with other ML models including ridge regression (RR) and neural network (NN). The results compare only the best performing RR model, which was obtained with optimized hyperparameters based on a grid search approach^{79} with fivefold cross validation. The NN model architecture has two hidden layers with 64 and 32 neurons, respectively, each with ReLU activation; the output layer uses a linear activation function. The model was trained using the Adam optimizer with a learning rate of 0.0001, a mean square error (MSE) loss criterion, and a batch size of 16. The hyperparameters of the NN, which was implemented using Keras,^{88} were optimized using data in the validation set. The validation set is formed using a random 10% subset of the training set. Results are shown for the best performing NN model with the lowest validation loss vs epoch using early stopping callback^{88} (patience of 100 epochs and validation loss as the monitor). The maximum number of training epochs was set to 1000 epochs.

#### 3. ML model training and evaluation procedures

For each study in Table I, the models are trained and evaluated using a set of 50 trials. Where appropriate, a 90/10 train-test split by material class is used in each trial. That is, 90% of the materials from each material class, rounded to the nearest integer, are included in the training set in each trial. The material classes are defined in Sec. II A. Before each trial, the data are shuffled and randomized using a different random seed. Each test set comprises 25 materials and usually includes at least one material from every class. The hyperparameters of the model are optimized in each trial. Each element in the input feature vector (Sec. II B) and target vector (Sec. II D 1) was independently scaled to the range [0,1]. Scaling was done before training using the MinMaxScaler^{79} function.
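
One trial of the split-and-scale procedure can be sketched as follows. The sketch uses plain NumPy in place of the Scikit-learn utilities, and it assumes that the scaling bounds are taken from the training rows and then applied to the test rows, a detail the text does not specify. The function names, class labels, and random features are hypothetical.

```python
import numpy as np

def stratified_split(classes, train_frac=0.9, seed=0):
    """Per-class shuffle and split; returns (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    classes = np.asarray(classes)
    train, test = [], []
    for c in np.unique(classes):
        idx = rng.permutation(np.flatnonzero(classes == c))
        n_train = int(round(train_frac * idx.size))  # nearest integer
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train, dtype=int), np.array(test, dtype=int)

def minmax_scale(train, other):
    """Scale each column to [0, 1] using the training minimum and range."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant columns
    return (train - lo) / span, (other - lo) / span

# Hypothetical data: 30 materials in two classes with three features each.
classes = np.array(["element"] * 10 + ["alloy"] * 20)
X = np.random.default_rng(0).uniform(1.0, 10.0, size=(30, 3))

train_idx, test_idx = stratified_split(classes, seed=1)
X_train, X_test = minmax_scale(X[train_idx], X[test_idx])
```

Repeating this with 50 different seeds gives the 50 trials; reusing the same seed for every model being compared keeps the train and test memberships identical across models.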

This training procedure is used in the method described in Sec. II B 1. The methods described in Secs. II B 2, II B 3, and II C 2 also follow the same procedure but with data limited according to the conditions described in Table I.

In study 4, the combined data set follows the same approach for model training. In the trials that use segregated data, however, each material class is divided using the 90/10 train-test ratio but the data from each class is used to train separate GPR models. Each class is retrained and tested 50 times (i.e., 50 trials). The model evaluation procedure that follows next is then applied to each material in the test set. Further details are provided in Sec. S3 in the supplementary material. To ensure consistency in the comparisons of any two models, the same random seed is used for all models.

## III. RESULTS

### A. SH property prediction using naïve features

Table II summarizes the results of study 1. The effect of the choice of polynomial order on the prediction accuracy is shown. The linear SH produces the highest accuracy in terms of both the $R^{SH}$ and the $\bar{R}_r^{P}$ metrics. The higher accuracy is attributable to the use of a fixed set of input features (density and sound velocities) but a smaller number of target parameters (two coefficients). The coefficients of the higher-order terms in the SH polynomial appear sensitive to features beyond density and sound velocities, likely indicating that these features are not correlated to the nonlinearity in the SH curve. The larger errors associated with the higher-order coefficients ultimately compound the errors in the final SH prediction, thereby decreasing the performance. In the case of the third order polynomial, we find the prediction performance in terms of $R_i^{H}$ to be exceptionally poor. We find that such highly negative occurrences arise due to poor ML based estimation of the coefficient of higher-order terms and when the state points within an SH curve take large $u_s$ and $u_p$ values. Under these conditions, we observe that the predicted state points in the SH curve with large values of $u_s$ and $u_p$ have significant deviations from the experimental data points. We, therefore, limit further study henceforth to the linear form of the SH model shown in Eq. (5).

| Polynomial order | ML model | $R^{SH}$ for test data: $\tilde{R}^{SH}$ | $R^{SH}$ for test data: $\bar{R}^{SH}$ | $\bar{R}_r^{P}$ for test data: $\bar{R}_0^{P}$ | $\bar{R}_1^{P}$ | $\bar{R}_2^{P}$ | $\bar{R}_3^{P}$ |
|---|---|---|---|---|---|---|---|
| 1 | GPR | 0.90 | 0.65 | 0.86 | 0.28 | … | … |
| 1 | NN | 0.85 | 0.56 | 0.82 | 0.17 | … | … |
| 1 | RR | 0.88 | 0.53 | 0.82 | 0.15 | … | … |
| 2 | GPR | 0.75 | 0.27 | 0.74 | 0.17 | −0.04 | … |
| 2 | NN | 0.83 | 0.43 | 0.69 | 0.13 | 0.04 | … |
| 2 | RR | 0.86 | 0.50 | 0.69 | 0.10 | −0.02 | … |
| 3 | GPR | −3.24 | −18.08 | 0.12 | −0.19 | −0.32 | −0.59 |
| 3 | NN | −29.75 | −145.54 | 0.33 | −0.14 | −0.66 | −0.42 |
| 3 | RR | −2.50 | −10.49 | 0.34 | −0.18 | −0.32 | −0.55 |

The large differences between $\bar{R}^{SH}$ and $\tilde{R}^{SH}$ are due to the skewed ranges of $R_i^{H}$ values where a small number of outlier negative values have an outsized influence on the mean; 49 out of 1250 total test candidates had $R_i^{H}$ values less than −1.00. The median $\tilde{R}^{SH}$ is, therefore, more suitable to describe central tendency. The associated cross-validation plots for the NN model are provided in Fig. S7 in the supplementary material.
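
The sensitivity of the mean to a few strongly negative scores can be seen in a small numerical sketch. The ten scores below are hypothetical and are not taken from the study; they only mimic a distribution in which most test materials score well and one is a large negative outlier.

```python
import numpy as np

# Hypothetical per-material scores: nine good predictions and one outlier.
scores = np.array([0.9] * 9 + [-20.0])

mean_score = scores.mean()        # pulled far below the typical value
median_score = np.median(scores)  # unaffected by the single outlier
```

Here the median stays at the typical value while the mean becomes negative, which is the same qualitative pattern seen in the third-order rows of Table II.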

The $R_i^{H}$ and $R_r^{P}$ values from the 50 trials are shown in Figs. 4(a)–4(c). GPR clearly outperforms other ML models (NN, RR) in Fig. 4(a) in terms of the $R^{SH}$ mean, median, and interquartile range and in Fig. 4(b) in terms of $\bar{R}_0^{P}$. In Fig. 4(c), however, the values of $\bar{R}_1^{P}$ are relatively lower, indicating poor prediction performance for the SH slope. The associated mean square error (MSE), on the other hand, between $\hat{C}_1^{i}$ and $C_1^{i}$ is 0.1. This is in the context of ground-truth $C_1^{i}$ values with a mean of 1.38 and a 95% confidence interval of (1.33, 1.43). This seemingly conflicting behavior is due to the relatively narrow range of $C_1^{i}$ values across the many different materials in the data. That is, the low $\bar{R}_1^{P}$ is a consequence of a moderate MSE value, which is in the numerator in Eq. (9), that gets divided by a fairly small-valued variance in the denominator. The variance is indicated by the narrow range of slopes among different materials. The underlying physical cause of the variations is the inability of the current input features to capture the incipient appearance of anharmonic or nonlinear material effects. The low values of $\bar{R}_1^{P}$ are an indication that it is the most conservative accuracy metric considered here. However, poor slope prediction does not presently preclude high $R_i^{H}$, which appears to be more useful as an indicator of overall SH accuracy.
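
The interplay between a moderate MSE and a narrow target range can be sketched numerically. The snippet assumes the metric has the usual coefficient-of-determination form, one minus the ratio of the MSE to the variance of the ground-truth values, which is consistent with the numerator and denominator described above. The target values and errors are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - MSE / variance of the true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    var = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse / var

# The same absolute prediction errors are applied to two sets of targets.
errors = np.array([0.3, -0.3, 0.3, -0.3, 0.3])

y_wide = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # widely spread targets
y_narrow = np.array([1.30, 1.35, 1.40, 1.45, 1.50])  # narrowly spread targets

r2_wide = r2_score(y_wide, y_wide + errors)
r2_narrow = r2_score(y_narrow, y_narrow + errors)
```

With identical errors, the widely spread targets give a score close to one while the narrowly spread targets give a strongly negative score, which is why a slope that clusters tightly across materials is a demanding target for this metric.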

Shown in Fig. 5 is the accuracy in histogram form. The accuracy is reported relative to the experimental data for each material in the test set across all 50 trials. Of the 1250 test instances, 633 had $R_i^H$ values above 0.9, and 841 had values above 0.8. Due to random selection, only 197 of 199 materials in the data appeared among the 1250 test materials. Figures S3(a)–S3(i) in the supplementary material show the violin plots of $R_i^H$ values grouped by material class. The predicted and experimental SH curves of the materials in the test data for each class are shown in Fig. S4 in the supplementary material.

### B. Alternative features

#### 1. Thermodynamic features

The thermodynamically inspired feature set containing the Grüneisen parameter is used in study 2, as described in Sec. II B 2. Figure 6 depicts the effect of adding this feature to the naïve feature set for the cross-validation study of 46 materials. The changes in accuracy from adding the Grüneisen parameter are modest but consistent: $\tilde{R}_S^H$ increases by 0.8%, $\bar{R}_S^H$ increases by 7.3%, and IQR $(R_S^H)$ decreases by 56%. The appearance of negative values of $R_1^P$ and the higher metric values than in Sec. III A are attributable to the significantly smaller data set used in this part of the study. Nevertheless, the observations once again appear to confirm that the naïve features are insufficient in representing the factors that lead to some of the nonlinear SH behaviors and that increasing the amount of data with more physics information, such as the Grüneisen parameter, may improve model performance.

#### 2. Linear combination of compositional features (LCCFs) for mixtures

The LCCF vector in study 3 contains only information about each notional composition and its pure ingredients, namely, the pseudomolecular formula along with the properties of the relevant chemical species from the periodic table. The model performances are shown in Fig. 7. Unsurprisingly, LCCF-Only offers the lowest ML prediction performance among the three featurizations, even lower than the naïve features. It is remarkable, however, that LCCF + Naïve is strongest in predicting the SH. LCCF + Naïve improves over naïve in $\tilde{R}_S^H$ by 4.5%, in $\bar{R}_S^H$ by 3.9%, and in IQR $(R_S^H)$ (lower) by 32%. A possible explanation for the improved $R_S^H$ metrics is that significant nonequilibrium effects may underlie the SH curves. Anharmonic effects and mechanisms occurring at negative definite regions of the potential energy are not well represented by the experimental naïve features, which are presently limited to harmonic properties. It is likely that the nonequilibrium information about ionization enthalpy and melting temperature aids LCCF + Naïve in this case.

A well-understood tension in many investigations is between the need for data containing the feature variables most closely correlated with the target properties and the need for data that are easy to obtain or otherwise readily accessible through rapid calculation or measurement.^{52} In the second experiment, LCCF-Only, we investigate the use of LCCF to overcome featurization challenges, particularly in the context of scarce data for material mixtures. A significant fraction of the present data is for materials that are in fact mixtures.^{53} Data containing the feature variables most closely correlated with the target properties can be prohibitively difficult to obtain. LCCF-Only is evaluated using the experimental SH curves despite containing no experimental feature values, unlike the studies in Secs. III A and III B 1, which use input features whose values were determined from experiments.

Not surprisingly, the results in Fig. 7 make evident the poorer accuracy of the LCCF-Only models; in Fig. 7(a), naïve outperforms LCCF-Only by a wide margin. However, it is quite remarkable that a simple augmentation with elemental chemical properties leads to a substantial improvement in performance, with LCCF + Naïve improving upon either LCCF-Only or naïve alone. This noteworthy finding indicates that future efforts may weigh the benefits of using hybrid synthetic data (a) in place of, (b) to limit the amount of, or (c) to augment experimental data. To wit, in situations lacking experimental or physics-based simulation data, linear combinations of elemental properties may be used to substantially improve ML model prediction accuracy. Figures 7(b) and 7(c) largely reiterate the observations of Sec. III A that underlying causes of nonlinearity within tightly clustered predictions of slope lead to poor $\bar{R}_1^P$ but without hurting the overall physical accuracy indicated by the $R_S^H$ metrics.
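A linear combination of elemental properties can be sketched as follows. The weighting scheme (atom-fraction weights from the pseudomolecular formula), the choice of properties, and the function name `lccf_vector` are assumptions made for illustration; the exact construction used in this work is defined in the methods section, and the elemental values below are approximate handbook numbers rather than the study's data.

```python
from typing import Dict

# Approximate elemental properties for illustration only (not the study's data).
ELEMENT_PROPERTIES: Dict[str, Dict[str, float]] = {
    "Mg": {"atomic_mass": 24.305, "melting_point_K": 923.0, "ionization_energy_eV": 7.646},
    "Si": {"atomic_mass": 28.085, "melting_point_K": 1687.0, "ionization_energy_eV": 8.152},
    "O": {"atomic_mass": 15.999, "melting_point_K": 54.36, "ionization_energy_eV": 13.618},
}

def lccf_vector(formula: Dict[str, float]) -> Dict[str, float]:
    """Atom-fraction-weighted average of each elemental property (assumed scheme)."""
    total_atoms = sum(formula.values())
    property_names = sorted(next(iter(ELEMENT_PROPERTIES.values())))
    features = {}
    for name in property_names:
        features[name] = sum(
            (count / total_atoms) * ELEMENT_PROPERTIES[element][name]
            for element, count in formula.items()
        )
    return features

# Forsterite, Mg2SiO4, written as a pseudomolecular formula.
forsterite = {"Mg": 2, "Si": 1, "O": 4}
features = lccf_vector(forsterite)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed scheme, the augmented featurization would simply concatenate these composition-derived values with the experimental naïve features (density and sound velocities) for each material.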

### C. Data scarcity

#### 1. ML from segregated vs combined material class data

In Fig. 8, the evaluation measures are shown for study 4 using the segregated and combined data. The number of materials in the classes used in the segregated data scenario is defined in Sec. II A. Though the segregated data provide a marginal improvement in $\tilde{R}_S^H$ and IQR $(R_S^H)$ by 2.1% and 2.4%, respectively, the reduction in $\bar{R}_S^H$ is quite substantial, on the order of 24%. As evident from the thicker tail, the lower values of $\bar{R}_S^H$ for segregated data signify a relatively large number of materials with worse predictions. Such a drastic decline in $\bar{R}_S^H$ and the associated larger number of worse predictions favor the use of combined data over segregated data. The effect of combined data is attributed to the knowledge sharing via diversification that effectively occurs when joining the classes. Indeed, estimates of the SH of mixtures can be based on the SH of its individual constituents^{10} or weighted averages such as the kinetic energy averaging technique.^{89} Thus, combining the classes places a more diverse set of ingredients into the population from which to learn. The use of uniform input features (i.e., density and sound velocities) in the complementary data facilitates the sharing, and the fact that density and sound velocities are known to correlate well to shock compression behavior in multiple material classes^{7,57–60} likely aids the performance. Perhaps most importantly, however, this is an indication that the dearth of data for one class may be overcome by joining available data for other classes.

It should be briefly remarked that for the GPR model trained on segregated classes, two material classes with a single material in the test set (i.e., synthetics and aqueous solutions) were excluded from the evaluation of $R_r^P$ as Eq. (9) is only valid for sample sizes of 2 or more. However, since each material in our data set contained at least two SH state points, no classes were exempted from the evaluation of $R_i^H$ [Eq. (8)].

#### 2. Variances in the experimental data

The generalization errors for the subsets of data with progressively larger permitted variance are shown in Figs. 9(a)–9(c). Reducing variance in the data demonstrably leads to more accurate (less bias) and more robust (less variance) predictions. Between the worst (far right) and best (far left) cases, $\bar{R}_S^H$ improved by 84%, $\tilde{R}_S^H$ improved by 6.0%, and IQR $(R_S^H)$ was reduced by 64%. The progressively larger variance was accompanied by a systematic and nearly monotonic decrease in prediction accuracy. Interestingly, the smaller sample size studied in the fifth subset ($s_i^{C_1}<\infty$), which was associated with using 131 random samples over $N_{\mathrm{trial}}$ trials from the original data set containing 199 materials, has a pronounced effect in broadening the tail of the distribution (compared to the other four subsets). The use of subsets within the larger set with a greater variance magnifies the effect of noise and results in decreased ML performance. These trends are consistent in Figs. 9(b) and 9(c), where evaluation is performed with respect to the SH coefficients. The lone exception is the third subset, where the prediction performance runs contrary to the broader trend. This is an anecdotal effect: for the materials admitted into that subset, the SH curve intercept happens to be unusually easy to predict.
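The variance-based filtering can be sketched with an ordinary least-squares fit of a linear SH relation, shock velocity equal to an intercept plus a slope times particle velocity, followed by the textbook standard error of the slope. The function name, the synthetic data points, and the requirement of at least three points are assumptions for illustration; only the 0.04 threshold is taken from this work.

```python
import math

def fit_linear_hugoniot(u_p, u_s):
    """Least-squares fit u_s = c0 + c1 * u_p; returns (c0, c1, slope standard error)."""
    n = len(u_p)
    if n < 3:
        # The slope standard error needs n - 2 > 0 degrees of freedom.
        raise ValueError("at least three state points are required")
    mean_x = sum(u_p) / n
    mean_y = sum(u_s) / n
    sxx = sum((x - mean_x) ** 2 for x in u_p)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(u_p, u_s))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    sse = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(u_p, u_s))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return c0, c1, slope_se

# Synthetic particle velocities (km/s) and two synthetic shock-velocity records.
u_p = [0.5, 1.0, 1.5, 2.0]
clean_u_s = [4.75, 5.5, 6.25, 7.0]   # exactly 4.0 + 1.5 * u_p
noisy_u_s = [4.95, 5.3, 6.45, 6.8]   # same line with +/- 0.2 km/s scatter

THRESHOLD = 0.04  # slope standard-error cutoff quoted in this work
for label, u_s in (("clean", clean_u_s), ("noisy", noisy_u_s)):
    c0, c1, slope_se = fit_linear_hugoniot(u_p, u_s)
    keep = slope_se < THRESHOLD
    print(f"{label}: c0={c0:.3f}, c1={c1:.3f}, slope_se={slope_se:.4f}, keep={keep}")
```

In this sketch the scattered record is rejected by the cutoff while the clean record is retained, mirroring how a tighter threshold admits only materials whose SH data are well described by a straight line.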

## IV. DISCUSSION

Insofar as prediction performance can be understood through a single coefficient of determination and in spite of the smallness of the data sets, $\tilde{R}_S^H$ values throughout this work have been remarkably high. However, a complete understanding of this performance must examine the worst performers, particularly in the interest of steering future data curation efforts. The worst performers are evident in the stark differences between the mean $\bar{R}_S^H$ and the median $\tilde{R}_S^H$. This is due to $R_i^H$ values that are consistently low for 18 out of the 197 unique materials. The worst performers are shown in Fig. 10, and the classes to which they belong are in Table III. These specific materials, starting with the worst performer, are forsterite ($\rho_o = 3.201\ \mathrm{g/cm^3}$), composition B ($\rho_o = 1.715\ \mathrm{g/cm^3}$), steel ($\rho_o = 7.92\ \mathrm{g/cm^3}$), hematite ($\rho_o = 5.007\ \mathrm{g/cm^3}$), iron magnesium oxide ($\rho_o = 5.191\ \mathrm{g/cm^3}$), uranium dioxide ($\rho_o = 10.3\ \mathrm{g/cm^3}$), ilmenite ($\rho_o = 4.787\ \mathrm{g/cm^3}$), sillimanite ($\rho_o = 3.1\ \mathrm{g/cm^3}$), silicon carbide ($\rho_o = 3.122\ \mathrm{g/cm^3}$), albitite ($\rho_o = 2.61\ \mathrm{g/cm^3}$), eclogite ($\rho_o = 3.551\ \mathrm{g/cm^3}$), magnetite ($\rho_o = 5.117\ \mathrm{g/cm^3}$), wollastonite ($\rho_o = 2.87\ \mathrm{g/cm^3}$), strontium ($\rho_o = 2.628\ \mathrm{g/cm^3}$), beryllium oxide ($\rho_o = 2.989\ \mathrm{g/cm^3}$), anorthosite ($\rho_o = 2.732\ \mathrm{g/cm^3}$), zirconium dioxide ($\rho_o = 4.512\ \mathrm{g/cm^3}$), and carbon ($\rho_o = 1.492\ \mathrm{g/cm^3}$). The classes to which the worst performers belong are primarily the minerals, mixtures of minerals, and energetics classes. The elements, alloys, and other synthetics classes are the next lowest. Of the 18 worst prediction cases, 14 belong to the minerals and mixtures of minerals classes, even though the two classes contain only 45 of the 199 materials in the data set.

Class name | Total number of materials | $\bar{R}_S^H$ | Number of worst performers | $s_i^{C_1}<0.04$: number of materials (% of materials) | $s_i^{C_1}<0.04$: $\bar{R}_S^H$ by class | $s_i^{C_1}<0.04$: number of worst performers in each class (total unique materials in each class) | $s_i^{C_1}<\infty$: $\bar{R}_S^H$ by class | $s_i^{C_1}<\infty$: number of worst performers (total test materials in each class) |
---|---|---|---|---|---|---|---|---|
Elements | 56 | 0.81 | 2 | 8 (14%) | 0.84 | 1 (46) | 0.39 | 3 (54) |
Alloys | 31 | 0.72 | 1 | 7 (23%) | 0.93 | 0 (21) | 0.71 | 2 (31) |
Minerals | 32 | 0.11 | 11 | 23 (72%) | 0.65 | 1 (8) | −0.26 | 14 (32) |
Mixtures of minerals | 13 | 0.39 | 3 | 8 (62%) | 0.52 | 0 (5) | 0.36 | 3 (13) |
Plastics | 23 | 0.93 | 0 | 0 (0%) | 0.95 | 0 (23) | 0.91 | 0 (23) |
Other synthetics | 9 | 0.89 | 0 | 4 (44%) | 0.81 | 0 (4) | 0.91 | 0 (9) |
Liquids | 16 | 0.95 | 0 | 1 (6%) | 0.98 | 0 (14) | 0.95 | 0 (16) |
Aqueous solutions | 6 | 0.99 | 0 | 0 (0%) | 0.98 | 0 (6) | 0.98 | 0 (6) |
Energetics | 13 | 0.39 | 1 | 11 (85%) | 0.88 | 0 (2) | 0.07 | 3 (13) |


We offer three probable causes for the low $\bar{R}_S^H$ associated with these material classes. The first cause, which is likely the most important, is the insufficiency of the physics in the current ML features and representations. This insufficiency leads to poor representation of the causes of nonlinearity in the SH curve. Nonlinearity is not a monolithic class of SH behaviors and can occur due to kinetic mechanisms or experimental noise. Kinetic mechanisms in these materials may include pressure-induced phase changes or elastic–plastic transitions. Anharmonic properties can be used to reveal their incipient characteristics. However, when we attempted to use a higher-order polynomial in Sec. III A, the $R_S^H$ metrics were shown to decrease when the higher-order terms were included.

Indeed, the worst performance is associated with materials expected to exhibit physical behaviors that lead to nonlinear SH curves. A typical nonlinear SH curve for a material undergoing yielding and phase change is shown in Fig. 11.^{90} The SH measurements induce pressures or stresses in the range 1–15 GPa.^{53} In beryllium oxide, the Hugoniot elastic limit (HEL) was observed to be 8.2 GPa.^{91} For magnetite, the HEL is around 5 GPa,^{92} and a high-pressure phase can be found above pressures of about 25 GPa.^{93} Silicon carbide undergoes an elastic-to-elastic–plastic transition at a shock velocity of 0.55 km/s or a stress of about 15–16 GPa.^{94} Similarly, the HEL of corundum occurs at around 15–21 GPa.^{95} In addition to an elastic–plastic transition, a Hugoniot kink appears at 79.3 GPa in corundum's SH curve, showing the beginning of a phase transition with a large volume change.^{95} A $u_s$–$u_p$ plot of albitite^{53} shows a region of particle velocity greater than 2.49 km/s or pressure greater than 40 GPa, which shows the existence of a shock-induced phase transformation.^{96}

The potential improvements through inclusion of anharmonic physics were shown in Sec. III B 1 apropos the inclusion of the Grüneisen parameter as a feature. Previous shock compression studies^{97–101} have shown that the equation of state of many materials depends on a nonconvex internal energy, which is composed of both mechanical strain energy and thermal energy. As the Grüneisen parameter^{102} is an anharmonic property, it provides greater information about the nonconvex structure of the potential energy surface than the density and sound velocity alone. Theoretical works by Slater,^{103} Dugdale and MacDonald,^{104} and the Free Volume model^{105} have previously shown the existence of a direct correlation between the Grüneisen parameter and the slope of the SH curve.

Furthermore, though the present work is limited to isothermal shock compression,^{98} the Grüneisen parameter will likely serve as an important feature in nonisothermal conditions as well. This is evident when considering the fact that the EOS can be derived from internal energy, *U*, through the classical relation $P = -\left(\frac{\partial U}{\partial V}\right)_S$. The internal energy due to shock compression, measured as half of the product of shock pressure and specific volume change (Hugoniot equation),^{57} is primarily stored in the form of mechanical work due to deformation or strain and thermal vibrations. The thermal energy contribution to the internal energy, which is in general not limited to isothermal conditions, is a function of the Grüneisen parameter.^{106}
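The Hugoniot energy statement above can be checked numerically with the standard jump conditions for a material initially at rest with negligible initial pressure, combined with a linear SH relation. The function name and the copper-like parameter values are illustrative assumptions (commonly quoted handbook-style numbers), not values taken from the data in this work.

```python
def hugoniot_state(rho0, c0, c1, u_p):
    """Shock state from a linear u_s = c0 + c1 * u_p relation.

    Units: rho0 in g/cm^3, velocities in km/s, so that pressure is in GPa
    and specific energy is in kJ/g. Assumes zero initial pressure and a
    material initially at rest.
    """
    u_s = c0 + c1 * u_p
    pressure = rho0 * u_s * u_p                 # momentum conservation
    v0 = 1.0 / rho0                             # initial specific volume
    v = v0 * (1.0 - u_p / u_s)                  # mass conservation
    energy_jump = 0.5 * pressure * (v0 - v)     # Hugoniot energy equation
    return {"u_s": u_s, "pressure_GPa": pressure,
            "specific_volume": v, "energy_jump_kJ_per_g": energy_jump}

# Illustrative copper-like parameters (assumed, not from the study's data).
state = hugoniot_state(rho0=8.93, c0=3.94, c1=1.489, u_p=1.0)
for key, value in state.items():
    print(f"{key}: {value:.4f}")
```

Under these assumptions the energy jump reduces to one half of the particle velocity squared, which gives a quick consistency check on any predicted SH curve.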

The second cause for the worse performing materials may be the improper reading of the source experimental data. For some materials, such as beryllium oxide, magnetite, albitite, and corundum, we noticed regions in the data of Ref. 53 where the shock velocity is negatively sloped with respect to the particle velocity. This is contrary to the general rule that a stable shock wave must have a shock velocity that increases with pressure,^{10} which is presumed true in the present work and is, therefore, a confounding effect during the training of the ML models. However, such negative regions may be realistic and must be accounted for with greater care. Other works^{90} suggest these are explainable physical mechanisms. In beryllium oxide,^{107} for example, a negative trend can be seen when the shock front breaks into two or more waves due to an elastic–plastic transition or phase transformation in an experiment that is set up to measure only the first wave of the shock front. This may happen in flash gap experiments, for instance, where the first wave can be of sufficient amplitude to close the flash gap and cause the gas to light before the second main wave reaches the target.^{10,107}

The third cause for the worst performers is the scarcity of data, both in terms of the number of measurements taken along an SH curve and in the number of materials available in the data. A small value of $n_i$ means the experimental uncertainty at each data point can lead to large variations in the coefficients of the SH model. For instance, in Fig. 10, steel 348 ($\rho_o = 7.92\ \mathrm{g/cm^3}$) has only two points. Table III shows that study 5 in Sec. II C 2 filters out a significant percentage of materials from the minerals, mixtures of minerals, and energetics classes. Thus, a larger percentage of materials in those three classes have higher shock velocity variance than others.

The scarcity of material types in the data can be analyzed using spatial population density with respect to density $\rho_o$ and bulk sound speed $C_i^b$, defined as $C_i^b = (v_i^L)^2 - \frac{4}{3}(v_i^T)^2$ in terms of the shear $v_i^T$ and longitudinal $v_i^L$ sound velocities.^{53} The left panel in Fig. 12 shows the clustering of materials by class, and the right panel is annotated with the worst performing materials. Regions of lower population density coincide with materials with lower ML prediction performance. In Fig. 13, we furthermore see that $C_i^b$ is a more informative feature than $\rho_o$ for SH prediction. Figure 13 was obtained by using $C_i^b$ and $\rho_o$ separately, each as the sole input feature for predicting SH curves. This means that each new material selected for inclusion into the ML workflow will have a greater impact if it increases the population density of points with respect to the sound velocity. In the present work, the region $20\ \mathrm{km/s} < C_i^b < 70\ \mathrm{km/s}$ is particularly scarce of data; only 55 of the 199 materials currently used occupy this range, whereas the 0 to 20 km/s range contains 144 materials. Indeed, as shown in Fig. 12, the worse predictions primarily cluster in the interval $20\ \mathrm{km/s} < C_i^b < 70\ \mathrm{km/s}$. Notably, 36 of the 55 materials in this interval are from the minerals and mixtures of minerals classes, constituting almost 80% of all minerals and mixtures of minerals; the remaining 19 are from the elements and alloys classes, constituting about 22% of all elements and alloys in the 199 material data set.

## V. CONCLUSION

In this paper, we performed a comprehensive study of an ML framework to estimate the shock Hugoniot (SH) of solid-state materials using relatively scarce data. We systematically examined featurization approaches and ML techniques to estimate the SH curves for diverse material systems. This establishes a potential for further generalizability to enable investigations of alternate material classes even beyond the scope of the data considered presently using only a limited amount of ambient state data as input. Data sets for training and testing were based on experiments on 199 materials from diverse material classes. The following are the main findings:

Among the models studied, which included Gaussian process regression, ridge regression, and a neural network, Gaussian process regression performs better at estimating SH curves of solids.

Despite the scarce data context of this study, the problem of SH prediction through ML is found to be feasible with an expectation for improvements with increased data size and improvements through feature engineering.

By restricting the SH model to a linear form, a median prediction performance of $R^2 = 0.90$ was obtained with high generalizability to different materials.

When noise or nonlinear effects in the SH are present in the data, particularly when the standard error of the regression slope $s_i^{C_1} > 0.04$, the ML model performance can be substantially degraded. The model trained on a set of materials with lower noise ($s_i^{C_1} < 0.04$) improved the average prediction accuracy by more than 33%. A systematic dependency was observed with smaller noise leading to greater accuracy in ML predictions.

Physically informed feature selection and engineering are important in ML model development. By leveraging separate but available data to include Grüneisen parameters in the feature vector, significant improvements in average prediction accuracy by 7.3% and reduction in interquartile range by 56% were observed.

Diversity in material classes by combining multiple complementary data sets can significantly improve ML performance via a knowledge sharing effect over models developed from data for a single class when data are scarce. Comparison of the models trained with combined data to those trained using segregated classes reveals that the former model showed about 24% relative improvement.

Where available, experimental features are preferred for training ML models. However, low-cost compositional features may have significant merit in the absence of more costly features and data. A new featurization technique, the so-called linear combination of compositional features (LCCF), was found to improve both the median and mean $R^2$ by 4% when used to augment the existing experimental data.

## SUPPLEMENTARY MATERIAL

See the supplementary material for further details of data, data analysis procedures, and ancillary supportive results.

## ACKNOWLEDGMENTS

Support is gratefully acknowledged for this work under ONR (Contract No. N00014-19-C-1052), Energetics Technology Center Project (No. 2044-002), and Army Cooperative Agreement (No. W911NF2120076).

## AUTHOR DECLARATIONS

### Conflict of Interest

R.M.D. and W.H.W. are part-time employees of the Energetics Technology Center.

### Author Contributions

**Sangeeth Balakrishnan:** Writing – original draft (equal). **Francis G. VanGessel:** Writing – review & editing (equal). **Brian C. Barnes:** Writing – review & editing (equal). **Ruth Doherty:** Writing – review & editing (equal). **William Wilson:** Writing – review & editing (equal). **Zois Boukouvalas:** Writing – review & editing (equal). **Mark D. Fuge:** Writing – review & editing (equal). **Peter W. Chung:** Writing – original draft (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available in the supplementary material.

## REFERENCES

_{3}perovskite at high pressures: Equation of state, structure, and melting transition

*ab initio*molecular dynamics

*Ab initio*molecular dynamics simulations for thermal equation of state of B 2-type NaCl

*ab initio*evolutionary techniques: Principles and applications

*Ab initio*random structure searching

*High-Pressure Shock Compression of Solids*

*PubChem Periodic Table of Elements*

*Static Compression of Energetic Materials*

*Proceedings of the 9th Python in Science Conference*

*Gaussian Processes for Machine Learning*

_{2}adsorption on highly porous MOF-based adsorbents using Gaussian process regression approach

*Effects of Explosions on Materials: Modification and Synthesis Under High-Pressure Shock Compression*

*Mineral Physics and Crystallography: A Handbook of Physical Constants*

*Beryllium Oxide (BeO) Handbook*

_{2}O

_{3}) observed under shock compression

*Finite Deformation of an Elastic Solid*