Machine-learning (ML)-based interatomic potentials are increasingly popular in material modeling, enabling highly accurate simulations with thousands or even millions of atoms. However, the performance of machine-learned potentials depends strongly on the choice of hyperparameters—that is, those parameters that are set before the model encounters data. This problem is particularly acute where hyperparameters have no intuitive physical interpretation and where the corresponding optimization space is large. Here, we describe an openly available Python package that facilitates hyperparameter optimization across different ML potential fitting frameworks. We discuss methodological aspects relating to the optimization itself and to the selection of validation data, and we show example applications. We expect this package to become part of a wider computational framework to speed up the mainstream adoption of ML potentials in the physical sciences.
I. INTRODUCTION
Machine learning (ML) is rapidly gaining importance in developing interatomic potential models for molecules and materials.1–5 ML potentials are fitted to (“trained on”) reference databases of energies and forces on atoms, typically encompassing several hundreds or thousands of representative small-cell computations with density-functional theory (DFT), and they are able to reach accuracy similar to that of the reference method at orders-of-magnitude lower computational cost. Consequently, ML potentials are now widely used, with applications ranging from the varied structures of elemental carbon6–10 to complex battery materials.11–15 Furthermore, ML potentials are now also able to accurately simulate as-yet-unknown phases, holding promise for use in assisting and guiding material design.13,16
Methods for developing reference databases for ML potentials range from iterative exploration and random structure searching to active learning.17–20 However, even if a reliable database is available, fitting accurate and robust potentials to it is not trivial: in many, if not all, ML frameworks, there are parameters that need to be chosen before the model encounters data (and often in-between rounds of fitting). These are referred to as “hyperparameters,” distinct from fitting parameters such as the neural-network weights that are optimized during training. In the present paper, we denote a specific set of hyperparameters as $\mathbf{h}$ and the space of all possible choices as $\mathcal{H}$ (that is, $\mathbf{h} \in \mathcal{H}$). The individual entries in $\mathbf{h}$ can be discrete, such as the number of basis functions in a structural descriptor, or continuous, e.g., the weighting of different components of the model. Overall, the expectation is that hyperparameter optimization can increase the accuracy and robustness of potentials for a given reference database (Fig. 1).
Schematic overview of the scope of the present work. An ML potential fitting code takes as input the reference data (gray) sampled at selected points and also a set of hyperparameters (blue) that can be either discrete or continuous (sketched for two arbitrary parameters, “1” and “2,” respectively). An optimizer is used to improve model fitting over time by suggesting updated hyperparameters after each fit. After several iterations, the error of the ML fit improves to a satisfactory level, and the optimization is deemed to be complete.
Software packages have been developed for hyperparameter optimization, both for general ML tasks21–23 and specifically for atomistic ML.24–26 Previous work in the latter area has explored several strategies. Grid searches are frequently used when there is a tractable number of parameters to try,27,28 and Sobol sequences,29 swarm optimization,30 and Bayesian optimization26,31 have been explored. For example, Spectral Neighbor Analysis Potential (SNAP)32 models have been optimized with several methods with regard to the parameters used in the atomic representation and the weighting of energy and force training data in the fit.8,32,33 Very recent work has focused on optimizing two- and three-body descriptors through variance analysis.34 Thus far, however, most existing packages in the context of ML potentials are either linked to a single fitting methodology31,32 or optimize regressors separately from descriptor hyperparameters.26
Here, we describe a Python package to facilitate hyperparameter optimization for ML potentials across multiple fitting methods, and we discuss physical questions related to that process. We aim to provide an easy-to-use interface to ML potential fitting software—enabling automated benchmarking across platforms, for example, with a uniform format for user input and data files. We view this tool as a platform for experimenting with hyperparameters in ML potential development and as part of a future, wider computational infrastructure for the routine generation of ready-to-use potentials.
II. METHODOLOGY
A. ML potential fitting
In the Gaussian Approximation Potential (GAP) framework,36,37 the local atomic energy is predicted by a sparse Gaussian process regression model using a set of M representative points in the fit. A kernel function, k, is used to determine the similarity of atomic neighbor environments: at the training stage, to compare input data, and at the prediction stage, to measure the similarity of a new, jth environment to the M ones known from training.
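In this sparse formulation (we quote the standard form used in GAP, e.g., Refs. 36 and 37), the local energy assigned to an environment $\mathbf{x}_j$ is a weighted sum of kernel evaluations against the M representative points,

$$\varepsilon(\mathbf{x}_j) = \sum_{m=1}^{M} c_m \, k(\mathbf{x}_j, \mathbf{x}_m),$$

where the coefficients $c_m$ are determined during training, and the total energy of a structure is obtained by summing $\varepsilon$ over all atoms. Several of the hyperparameters discussed below enter through the kernel k itself, e.g., the zeta exponent of SOAP-based kernels (Table I).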
In this work, we focus on “classical” ML methods such as Gaussian process regression rather than deep-learning models, which are also widely used for describing chemical systems. Currently, the optimization of deep-learning potentials is not included in our package.
B. Optimization task
We use Bayesian Optimization (BO) to minimize the loss function, $\mathcal{L}$. BO is effective for the global optimization of functions where only the inputs and outputs are known but not the functional form itself.41 Hence, BO can be applied to essentially any function, but it is inefficient if the function has known linearity or other specific structure that dedicated methods could exploit.41 Here, we use BO as a generalizable method for optimizing across multiple ML potential fitting methods.
We note that in related work, Stuke et al.31 demonstrated the favorable performance of random searches and BO as the dimensionality of $\mathcal{H}$ increases, especially compared to ordered, grid-style searches. Additionally, work on optimizing hyperparameters for atomistic neural-network models showed further improvement when using Centralized BO (CBO) over random-search methods.25
C. Validation and testing data
A common approach in ML research is to split the data into training and validation sets using k-fold cross-validation [Fig. 2(a)]. We include this functionality in our package, and we also explore optimization against an “external” dataset that is entirely separate from the training data [Fig. 2(b)]. The motivation for doing so is that early in the development of an ML potential, the reference dataset may not yet be comprehensive enough to allow for useful validation, and we therefore hypothesize that fixed sets of external data may exist that are suitable for hyperparameter optimization. In this case, an external dataset could either be computed by the user or sourced from existing structure libraries, such as the Samara Carbon Allotrope Database (SACADA)43 for carbon allotropes or the Materials Project44 for other structures. The choice of validation structures is important: if their selection, and thus the loss function, is not representative of the properties required, then the minimum found by optimization will not necessarily be useful when applying the potential in practice (e.g., in large-scale MD).
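The following is a minimal sketch of the two evaluation modes, using a generic kernel regressor and random stand-in data rather than an actual potential fit (the function names are illustrative and not part of the XPOT API):

```python
# Sketch: two ways to evaluate the loss for one hyperparameter set (cf. Fig. 2).
# A generic kernel regressor and random data stand in for a real potential fit.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 8)), rng.normal(size=200)        # reference data
X_ext, y_ext = rng.normal(size=(50, 8)), rng.normal(size=50)  # external set

def fit_model(X_train, y_train, gamma):
    """Stand-in for fitting an ML potential with given hyperparameters."""
    return KernelRidge(kernel="rbf", gamma=gamma).fit(X_train, y_train)

def rmse(model, X_test, y_test):
    return float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))

# (a) k-fold cross-validation on the reference dataset itself
cv_losses = [
    rmse(fit_model(X[tr], y[tr], gamma=0.1), X[te], y[te])
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X)
]
loss_cv = np.mean(cv_losses)

# (b) external validation: fit to all reference data, test on the separate set
loss_ext = rmse(fit_model(X, y, gamma=0.1), X_ext, y_ext)
print(f"k-fold loss: {loss_cv:.3f}   external-set loss: {loss_ext:.3f}")
```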
Schematic of the two ways of evaluating the loss function, $\mathcal{L}$, for hyperparameter optimization. In all cases, an ML potential model, denoted for simplicity by a general function $f$, is fitted to the training data and used to predict the labels for the points in the validation set. (a) Cross-validation on the reference dataset itself, which is split into k folds for training (gray) and testing (orange), drawn in a style similar to that of Ref. 42. (b) Optimization against a user-defined, separate, external test set. In this case, all reference data are used for the potential fit. Both options are available in XPOT.
D. Loss function
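The loss function combines the energy and force errors on the validation data into a single scalar value. Consistent with the α-weighting described in Sec. V B [which refers to Eq. (8)], it can be written as

$$\mathcal{L}(\mathbf{h}) = \alpha\,\Delta E(\mathbf{h}) + (1 - \alpha)\,\Delta F(\mathbf{h}), \qquad \alpha \in [0, 1], \tag{8}$$

where $\Delta E$ and $\Delta F$ denote suitably scaled energy and force RMSE values of the potential fitted with hyperparameters $\mathbf{h}$, evaluated on the validation set, and the scaling factor α balances the relative weighting of the two contributions.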
III. IMPLEMENTATION
We implemented the above approaches in an open-source Python package that we call “Cross-Platform Optimizer for Potentials” (XPOT). Figure 3 provides an overview.
Flowchart detailing the operation of the XPOT package, enclosed by a dashed line, and its interfaces to external software for ML potential model fitting, evaluation, and Bayesian optimization (BO). For external calls (green), relevant software packages are mentioned in the figure and correspond to current functionality; we note, however, that the code is designed to be general where possible and to allow for compatibility with other frameworks as well.
XPOT uses a single input-file structure for all potential types, with only the selected hyperparameters changing between fitting frameworks. The optimization process is consistent for all fitting methods, with the gp_minimize scheme from scikit-optimize used to wrap the BO. While scikit-optimize is chosen here, many general optimization packages, such as hyperopt21 or Optuna,22 are available. Initially, hyperparameters are chosen to sample the loss function for N iterations. These points are then used to generate an initial approximation of $\mathcal{L}$ as the optimization stage starts. Data from previous runs can also be used as a prior if the unoptimized hyperparameter values have not changed. However, $\mathcal{H}$ can be extended between sweeps by adding the previously fixed hyperparameter value to the priors. $\mathcal{H}$ can include discrete (non-continuous) hyperparameter values if required, which is common in ML potential fitting for values such as twojmax (FitSNAP46) as well as zeta, n_max, and l_max for SOAP (gap_fit).
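As a minimal sketch of the optimization loop that XPOT wraps (this is not the XPOT code itself; fit_and_score and the toy loss are stand-ins for writing an input file, running the fitting code, and computing validation errors):

```python
# Sketch of a BO loop over a mixed continuous/discrete hyperparameter space;
# `fit_and_score` stands in for a call to an ML potential fitting framework.
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [
    Real(2.5, 8.0, name="cutoff"),      # continuous hyperparameter (Å)
    Real(0.2, 0.8, name="atom_sigma"),  # continuous hyperparameter (Å)
    Integer(2, 6, name="zeta"),         # discrete hyperparameter
]

def fit_and_score(params):
    cutoff, atom_sigma, zeta = params
    # In practice: fit a potential with these hyperparameters and return the
    # validation loss, Eq. (8). A toy quadratic stands in so the sketch runs.
    return (cutoff - 4.0) ** 2 + (atom_sigma - 0.5) ** 2 + 0.1 * (zeta - 4) ** 2

result = gp_minimize(
    fit_and_score,
    space,
    n_calls=50,                           # total number of fits
    n_initial_points=10,                  # initialization stage before BO proper
    initial_point_generator="hammersly",  # skopt's spelling of the Hammersley
                                          # sequence (cf. Fig. 4)
    acq_func="EI",                        # expected-improvement acquisition
    random_state=0,
)
print(result.x, result.fun)  # best hyperparameters and corresponding loss
```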
Iterations are performed serially, so that the approximation to the loss function can be updated after each fit. The approximation model is generated via a Gaussian process, and an acquisition function selects the hyperparameters to be sampled in the next iteration. As such, the limiting factor for “real-time” optimization with XPOT is the sequential nature of the iterative process.
We summarize relevant hyperparameters in Table I, indicating those that can be converged out (i.e., a larger value typically means a better model, albeit at larger cost) with asterisks in the “Conv.” column. The other parameters do not have a known optimum, and those are the ones for which it is expected that hyperparameter optimization will be most useful. While Table I refers to the GAP and SNAP frameworks used in the present work,47,48 we note that interfaces to other ML potential fitting methods and optimization processes are expected to be added to XPOT in the future.
Hyperparameters and their ranges in XPOT. We list hyperparameter settings that are relevant to the GAP and (quadratic) SNAP fitting frameworks discussed in the present work. Settings that are convergence parameters (i.e., will generally improve the model when increased) are marked with an asterisk (∗) in the “Conv.” column and are therefore not optimized unless the user specifically requests it.
| | Parameter | Description | Conv. | Default | Default range |
|---|---|---|---|---|---|
| GAP | cutoff | Cutoff radius (Å) | | 4.0 | 2.5–8.0 |
| | atom_sigma | Gaussian width for SOAP neighbor density (Å) | | 0.5 | 0.2–0.8 |
| | n_max | Maximum n for radial basis functions (SOAP) | ∗ | | |
| | l_max | Maximum l for angular basis functions (SOAP) | ∗ | | |
| | zeta | Power to which the kernel is raised (SOAP) | | 4 | 2–6 |
| | n_sparse | Number of representative points, M | ∗ | | |
| | sigmaᵃ | Regularization for energies (eV at.⁻¹) | | 0.01 | 0.001–0.1 |
| | | Regularization for forces (eV Å⁻¹) | | 0.1 | 0.01–0.5 |
| (q)SNAP | rcutfac | Cutoff radius (Å) | | 4.0 | 2.5–8.0 |
| | twojmax | (2×) value of Jmax used to formulate SNA descriptors | ∗ | | |
| | rfac0 | Distance-to-angle conversion parameters | | 0.994 | 0.6–1.6 |
| | rmin0 | | | 0 | 0–0.1 |
| | wj | Weighting of each element in the fit | | 1.0 | 0.2–2.0 |
| | eweight | Energy weighting for input data | | 100 | 10–10⁵ |
| | fweight | Force weighting for input data | | 1 | 0.1–100 |
ᵃInput is given in the format {energy:force:stress:hessian}. The stress regularizer is not optimized in the present work to keep the number of parameters tractable. The Hessian value is set to zero and not used.
As part of the output, Python scripts for the initial analysis of newly fitted ML potentials can be generated, offering relevant standard tests: potential-energy scans for dimers in vacuum (“dimer curves”) as well as validation energy and force output for all fitted models. An example is provided in the GitHub repository hosting the code (see the data availability statement below).
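To illustrate the kind of test these scripts perform, the following sketch computes a dimer curve with ASE; the Lennard-Jones calculator is only a placeholder for an ASE-compatible calculator wrapping the fitted potential:

```python
# Sketch of a "dimer curve" scan; the LJ calculator is a placeholder for an
# ASE-compatible calculator wrapping the fitted ML potential (e.g., via quippy).
import numpy as np
from ase import Atoms
from ase.calculators.lj import LennardJones

calc = LennardJones()  # stand-in; swap in the fitted ML potential here
distances = np.linspace(0.8, 5.0, 50)  # C-C separations in Å
energies = []
for d in distances:
    dimer = Atoms("C2", positions=[[0.0, 0.0, 0.0], [0.0, 0.0, d]])
    dimer.calc = calc
    energies.append(dimer.get_potential_energy())

for d, e in zip(distances, energies):
    print(f"{d:.2f} Å  {e:.4f} eV")
```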
IV. PERFORMANCE
In this section, we demonstrate the functionality of XPOT by optimizing a series of ML potential fits for published datasets and by comparing the numerical performance of the re-fitted potentials to existing ones. We first optimize a series of qSNAP models for elemental systems to show that we can reach the same quality level as in a previous benchmark study,28 and we then show examples of re-fitting existing potentials with different fitting frameworks (“cross-platform” optimization).
For the first test, we re-fitted qSNAP models to the datasets from Zuo et al. (Ref. 28) and compared to the accuracies reported in that work. Figure 4 shows two examples, and a comprehensive overview is provided in Table II. After a preliminary low-cost sweep, α values of 0.7–0.9 were used to balance energies and forces in the fit [cf. Eq. (8)]. The qSNAP fits were optimized over 150 iterations. Longer runs and more extensive initial exploration improved accuracy further, but the gains diminished beyond 300 iterations.
Illustrative examples of optimization runs with XPOT, shown for three series of representative qSNAP models fitted to existing databases. The figure shows the convergence of numerical error measures for sequentially optimized qSNAP models, evaluated for the Si and Cu datasets from Zuo et al. (Ref. 28) as well as the structurally much more complex Si-GAP-18 reference database from Bartók et al. (Ref. 49). Blue (orange) dashed lines represent the errors for qSNAP (GAP) models from the cited papers, respectively. The gray shaded areas indicate the initialization stage (here, carried out using a Hammersley sequence; cf. Ref. 50).
Energy and force component RMSE values for a series of testing databases for elemental systems, provided in Ref. 28.
E RMSE (meV/at.):

| | GAP (Ref. 28) | qSNAP (Ref. 28) | qSNAP (XPOT) |
|---|---|---|---|
| Ni | 0.62 | 1.04 | 0.80 |
| Cu | 0.56 | 1.16 | 0.52 |
| Li | 0.63 | 0.85 | 0.59 |
| Mo | 3.55 | 3.96 | 3.51 |
| Si | 4.18 | 6.28 | 3.31 |
| Ge | 4.47 | 10.55 | 3.74 |

F RMSE (eV/Å):

| | GAP (Ref. 28) | qSNAP (Ref. 28) | qSNAP (XPOT) |
|---|---|---|---|
| Ni | 0.04 | 0.07 | 0.04 |
| Cu | 0.02 | 0.05 | 0.03 |
| Li | 0.01 | 0.04 | 0.01 |
| Mo | 0.16 | 0.33 | 0.17 |
| Si | 0.12 | 0.29 | 0.11 |
| Ge | 0.08 | 0.20 | 0.11 |
Figure 4 illustrates the effect of using a combined energy and force loss during optimization, most clearly visible in the results for Cu. While the loss value (lower panel) decreases, the constituent energy and force errors sometimes move in opposite directions, one increasing while the other decreases, emphasizing the importance of a well-chosen α value to balance both. The plots also illustrate that XPOT occasionally finds good hyperparameters already during the initialization stage (gray). Whether this happens depends strongly on the chosen search space and sampling method, which determine whether one of the sampling points lands close to a minimum on the loss surface. In these cases, as seen in Fig. 4, there is less improvement to be gained by further optimization (Si, left) compared to other systems (Cu, center).
More generally, we find that our optimized qSNAP models compare favorably to the qSNAP and GAP results by Zuo et al.,28 especially for Mo and Si (Table II). We note that the absolute differences are small—in fact, we are reporting more significant digits than we normally would—and that the authors’ qSNAP fits and ours will likely have been carried out with different versions of the fitting code. The main qualitative conclusion to be drawn from Table II is therefore that XPOT reaches good performance across chemical systems.
For the second test, we moved to a more widely applicable, “general-purpose” ML potential with a highly complex reference database, viz., Si-GAP-18 (Ref. 49). We used XPOT to optimize qSNAP fits to the same database. In this case, we exactly match the published train–test split and use the published testing set for validation (rather than cross-validation) for direct comparability with the original potential.49 The results are shown on the right-hand side of Fig. 4: in this case, our optimized qSNAP model reaches an energy error of about 10 meV/at. (compared to 6 meV/at. for the GAP of Ref. 49) and a force error of about 0.15 eV/Å (compared to 0.10 eV/Å). We view this as a representative example of re-fitting, with a different ML potential framework, to the database of an already highly optimized model (here, GAP-18). One possible application case could be to use a graphics processing unit (GPU) implementation that is not available for the published model or, more generally, to gain evaluation speed at the cost of some loss of accuracy.
To expand the range of tests further, we fitted a series of linear and quadratic SNAP ML potentials to the C-GAP-17 dataset for elemental carbon.51 C-GAP-17 is an example of a potential that has been (manually, at the time) optimized for the description of liquid and amorphous phases; it has notoriously high numerical errors compared to more recent potential fits27,52 and yet is already able to describe important physical aspects related to carbon.6,53–55 We use this example here to measure the performance of models re-fitted at different levels and in Sec. V to explore the physical role of hyperparameters.
Table III compares the accuracy of linear and quadratic SNAP models across J values. The cutoff radius was fixed to 3.7 Å, as in C-GAP-17,51 and the training and testing sets used were identical to those in Ref. 51. These potentials were fitted using α = 0.75, similar to the qSNAP tests on data from Ref. 28, but higher than for those on the Si-GAP-18 database. The rfac0 and rmin0 hyperparameters were optimized for the SNA descriptors as well as the energy and force weights within the default ranges given in Table I. All these were optimized in a single sweep to demonstrate the ability of BO (and thus XPOT) to survey large hyperparameter spaces.
Performance of potentials fitted to the C-GAP-17 dataset. Energy-per-atom and force errors are included as well as MD benchmark speeds. Errors are calculated on the C-GAP-17 testing dataset.51 All simulations were carried out on the same CPU architecture, and the speed is given relative to the C-GAP-17 model (4.4 steps s⁻¹).
| | E RMSE (meV/at.) | F RMSE (eV/Å) | MD speed (relative) |
|---|---|---|---|
| Linear SNAP (J = 3) | 192.2 | 4.34 | 65 |
| Linear SNAP (J = 4) | 117.6 | 2.43 | 24 |
| Linear SNAP (J = 5) | 76.8 | 1.68 | 9 |
| Linear SNAP (J = 6) | 63.2 | 1.40 | 4 |
| qSNAP (J = 3) | 50.8 | 1.26 | 38 |
| qSNAP (J = 4) | 41.4 | 1.00 | 14 |
| qSNAP (J = 5) | 34.9 | 0.89 | 5 |
| qSNAP (J = 6) | 28.8 | 0.81 | 2 |
| C-GAP-17 (Ref. 51) | 42.0 | 1.26 | 1 |
Our linear SNAP model fits did not match the accuracy of the C-GAP-17 potential even up to J = 6. In contrast, qSNAP models optimized with XPOT show favorable numerical energy and force errors for J = 4 and higher. It should be noted that the twojmax hyperparameter in the FitSNAP input46 is 2Jmax and, thus, double the J values we report throughout this paper.
We ran MD simulations with the resulting potentials and found that, as J increases, so too does the computational cost of evaluation, as expected. We report these data in the rightmost column of Table III. The computational cost of moving from linear to quadratic fitting was significantly smaller than that of increasing J.
We note that while the cost of evaluation changes by no more than a factor of two upon moving from linear to quadratic SNAP models, the cost of fitting increases more strongly (∼3.7× for J = 5 and ∼7.4× for J = 6) and cannot be significantly reduced by adding computational resources. This is because the solver used by FitSNAP46 is single-core: although the calculation of the descriptors is parallelized, the fitting of the potential becomes the bottleneck in the optimization process.
V. EXPERIMENTS
A. Physical role of hyperparameters
We now describe numerical experiments that explore aspects of XPOT and its operation. The first point concerns the meaning of specific hyperparameters: these can be viewed as purely free optimization parameters, on one hand, or as physically constrained and informed ones, on the other hand.37
Amorphous carbon (a-C) presents a relevant challenge for fitting interatomic potentials with ML or otherwise54 due to the diverse nature of its atomic structure. In Sec. IV, we have shown SNAP model fits with optimized hyperparameters, leading to lower numerical errors than those of C-GAP-17. The final SNAP models are computationally cheaper than C-GAP-17 and can run on GPU architectures, but the robustness of the potential is reduced compared to C-GAP-17, as detailed in the following.
We used relatively flexible and numerically accurate qSNAP potentials with J = 6 in this case. We note that when fitting a potential from scratch, a larger and more diverse dataset with a lower J value might provide a more cost-effective potential for extensive large-scale simulations, such as the qSNAP model by Willman et al. (Ref. 8; J = 4) for high-pressure carbon systems.
A low value of rfac0 directly reduces the amount of the 3-sphere that can be used for mapping the possible neighbor positions because points south of θ0 are excluded. Thompson et al.32 state that “it is advantageous to use most of the 3-sphere while still excluding the region near the south pole where the configurational space becomes highly compressed.” Thus, both very high and very low values of rfac0 can diminish the accuracy and/or robustness of the resulting potential. For a-C, values lower than 0.7 resulted in potentials becoming unstable in MD, and we hence specified a lower bound of 0.7 for this parameter. While it may be possible for potentials with low values of rfac0 to be stable, we were unable to perform 9000 K NVT simulations without erroneous behavior (“losing atoms”), even when using a 0.2 fs timestep, a factor of five shorter than that used for typical C-GAP-17 MD simulations.6,51
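For orientation, we restate the distance-to-angle conversion in the form commonly quoted for the LAMMPS implementation of SNAP (our paraphrase of Refs. 32 and 46; the reader should consult those works for the authoritative definition):

$$\theta_0(r) = \texttt{rfac0}\,\pi\,\frac{r - \texttt{rmin0}}{R_{\mathrm{cut}} - \texttt{rmin0}},$$

so that the maximum polar angle, reached at $r = R_{\mathrm{cut}}$, is $\theta_0^{\max} = \texttt{rfac0}\,\pi$. A small rfac0 thus confines all mapped neighbor positions to a cap around the north pole of the 3-sphere, whereas values approaching 1 use the full sphere, including the compressed region near the south pole.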
Our final qSNAP J = 6 potential is stable at 9000 K with a 0.2 fs timestep, and when replicating the short, small-scale melt-quench simulations of Ref. 51, it predicted 53.5% sp3 environments (Fig. 5, bottom), compared to 57.5% for C-GAP-17 and 65.7% from ab initio molecular dynamics (AIMD) simulations. We checked the predicted radial distribution functions for 216-atom cells, finding good agreement with DFT and C-GAP-17 predictions for quenched a-C between 1.5 and 3.5 g/cm3.
Analyzing qSNAP models for carbon. (a) Potential-energy surface of an isolated C–C dimer, evaluated for XPOT-optimized qSNAP models compared to DFT and C-GAP-17.51 (b) Percentage of atomic environments in an sp3 (fourfold-connected) configuration during an MD simulation following the same protocol as in Ref. 51: randomization at 9000 K (3 ps), followed by holding the liquid at 5000 K for 3 ps, a rapid quench to 300 K (0.5 ps), and holding the quenched structure at 300 K (3 ps). The SNAP model characterized in blue was fitted with a fixed 3.7 Å Rcut (rcutfac); the model characterized in red is based on an optimization run that included optimization of the cutoff (final value, 3.49 Å). The shaded areas indicate standard deviations over three (DFT, from Ref. 51) and ten (qSNAP) separate runs.
Next, we investigated optimizing the cutoff radius. We ran a second optimization sweep with the same ranges as before, now adding the cutoff as a variable. All hyperparameters previously optimized for a fixed cutoff were re-optimized in the variable-cutoff sweep. Here, the “best” potential was fitted with a cutoff of 3.49 Å, compared to the 3.7 Å used previously, and the energy and force weightings were altered slightly (although the energies of the dimer structures are lowest-weighted in both cases).
Figure 5 shows that the model with optimized cutoff behaves almost identically for the isolated-dimer curve compared to the fixed-cutoff potential. However, we find that the optimized-cutoff SNAP better predicts the sp3 character in carbon MD: when repeating the tests from above, the new potential predicts 60.0% sp3 carbon environments. In the lower panel of Fig. 5, we show these results, noting that for both SNAP models, the sp3 content increased more slowly during quenching than in the AIMD simulations. We also fitted a potential with Rcut = 3.49 Å and all other hyperparameters identical to those of our fixed-cutoff potential, aiming to separate the effect of Rcut from that of other aspects of the model. This potential gave 56.9% sp3 content, between the values for the fixed- and optimized-cutoff potentials. This test suggests that optimizing the cutoff alongside other hyperparameters can be beneficial for the physical behavior of an ML potential.
Comparing the hyperparameters of optimized potentials across the different datasets from Zuo et al. (analyzed in Table II above), we found that the optimal rfac0 changes significantly from element to element, ranging between 0.5 (Cu) and 0.845 (Mo). Notably, not only does the optimal value of rfac0 depend on the system; in the case of the Si-GAP-18 database, the optimal rfac0 and rcutfac values also differed from those for the Zuo et al. dataset for silicon. Specifically, rfac0 was optimized to 0.87 (GAP-18) and 0.58 (Zuo et al.). We also found that the optimized cutoffs for these databases differed, viz., 4.68 vs 4.23 Å. These findings emphasize that different types of reference data (e.g., liquid vs crystalline structures and general vs specialized datasets) may require different hyperparameters for an optimized fit, and they therefore underline the usefulness of hyperparameter optimization in the development of ML potential models.
B. Balancing energies and forces in the loss function
We use a combined loss function that includes energy and force error terms, controlled by a factor α that takes values between zero and one [Eq. (8)]. We investigated the effect of tuning this balance by optimizing a series of potentials from one where the loss only depends on force errors [α = 0 in Eq. (8)] to one where it only depends on energy errors (α = 1). For this experiment, at each value of α, we fitted ten independent qSNAP models with J = 5 for the Si-GAP-18 database49 (as in Sec. IV).
Figure 6(a) illustrates the need for joint optimization of the energy and force errors, which is achieved through the selection of α. When optimizing for only forces or only energies, we produced potentials that are significantly skewed toward one property and not accurate at predicting the other.
Balancing the effect of energy and force errors in the definition of the loss function, . (a) Trade-off of errors. The plot shows energy and force errors for qSNAP (J = 5) fits to the Si-GAP-18 database with varied values of the parameter α. As expected, energy errors (blue) and force errors (green) are improved when is weighted more toward the respective quantity, demonstrating the importance of having a well-chosen α. Error bars show standard deviations across 10 independent runs with differing random seeds initializing the Hammersley sequence. (b) Optimized cutoff values obtained for the potentials characterized in panel (a).
Across the range 0 ≤ α ≤ 1, the energy RMSE decreases, and the force RMSE increases. However, the change is not linear, and for this particular system, the optimal value of α lies between 0.3 and 0.5. Not only do the errors change with varying α, but the standard deviation of the errors across the ten optimization runs also broadly decreases as the weighting of the corresponding part of the loss function is increased. For J = 4, we found this effect to be even more pronounced. Thus, a well-chosen α also increases the reliability of the optimization, improving consistency and performance across optimization sweeps.
Figure 6(b) shows the optimized cutoff values for potentials after optimization at each α. Studying the trend in optimized cutoff values, we see that as α increases, so too does the optimal Rcut. Interestingly, the best force errors are found at lower cutoffs, whereas the best energy errors are found where higher Rcut values are preferred. In testing, we found that potentials at intermediate α values show robust performance in melt-quench simulations, while models with very small α occasionally demonstrated non-physical behavior. Further study of this behavior will be undertaken in the future.
To create well-balanced potentials, α should therefore be chosen with care to avoid over-weighting either energies or forces. Using either α = 0 or α = 1 results in significantly less accurate potentials overall, and neither extreme should be selected in general use.
C. Role of the validation set
The dataset used plays a crucial role in validating potentials and, thus, in hyperparameter optimization, too. If the test set is structurally similar to the training data, it will not comprehensively assess the ability of the potential to describe “unseen” structures. In the case of C-GAP-17, the training and testing data are structurally similar because they were obtained by a random split of the overall dataset.51 We therefore tested whether a useful, but more generic, validation set might allow for efficient hyperparameter optimization, emphasizing robustness over accuracy.
We optimized a-C potentials using two other types of validation sets: (i) a set of snapshots from very-high-temperature (15 000 K) MD simulations performed with C-GAP-17 and (ii) a set of fully randomized structures (hard-sphere random-packing model); see the sketch below for how such structures can be generated. Here, we created our own validation datasets aimed specifically at highly disordered structures, but we note that existing external datasets, such as SACADA,43 could also be used to guide hyperparameter optimization for carbon. The new datasets were labeled with DFT and used independently as validation sets for optimization with XPOT. Afterwards, we evaluated the errors of the optimized potentials on the respective other datasets. The results for qSNAP (J = 5) potentials are shown in Table IV.
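The following is a minimal sketch of how a hard-sphere random-packing structure can be generated; the atom count, box size, and minimum distance are illustrative values, not those used in this work:

```python
# Sketch: hard-sphere random packing as a maximally disordered validation
# structure. All numerical parameters here are illustrative assumptions.
import numpy as np
from ase import Atoms

rng = np.random.default_rng(42)
n_atoms, box, r_min = 64, 8.0, 1.2  # atoms, box length (Å), min. distance (Å)

positions = []
while len(positions) < n_atoms:
    trial = rng.uniform(0.0, box, size=3)
    # accept only if no neighbor is closer than r_min (minimum-image convention)
    if all(
        np.linalg.norm((trial - p + box / 2) % box - box / 2) > r_min
        for p in positions
    ):
        positions.append(trial)

structure = Atoms(f"C{n_atoms}", positions=positions, cell=[box] * 3, pbc=True)
structure.write("random_packing.xyz")  # to be labeled with DFT subsequently
```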
Energy and force RMSE values of carbon potentials against the three test sets mentioned above. The validation set used for evaluating the loss function (cf. Fig. 2) is marked in boldface for each potential. The value in the C-GAP-17 test column for the cross-validated potential is the mean of the test errors across the testing data used in the 5-fold cross-validation.
Energy/force RMSE on the defined test sets:

| | C-GAP-17 test set (meV/at.)/(eV/Å) | High-T MD (meV/at.)/(eV/Å) | Random packing (meV/at.)/(eV/Å) |
|---|---|---|---|
| qSNAP (5-fold cross-validation) | **35.5/0.85** | 38.1/1.43 | 38.9/2.14 |
| qSNAP ($\mathcal{L}$: C-GAP-17 test set) | **31.9/0.90** | 36.6/1.47 | 36.8/2.27 |
| qSNAP ($\mathcal{L}$: high-T MD) | 39.5/0.87 | **38.0/1.32** | 41.4/2.12 |
| qSNAP ($\mathcal{L}$: random packing) | 194.8/0.90 | 47.1/1.36 | **39.3/1.68** |
| C-GAP-17 (Ref. 51) | 42.0/1.26 | 65.8/1.51 | 33.8/1.60 |
These results demonstrate that the chosen validation set has a direct impact on the hyperparameters selected during optimization and, thus, on the potential itself. The qSNAP model optimized on the random-packing dataset has very high energy errors on crystalline structures (which is expected, as the atomic environments there are very different). We attribute this to a low effective weighting of the crystalline data in the optimization process, as this allows for better fitting to random structures at the cost of worsened accuracy for more ordered environments.
We find that the high-T MD validation set provides generally robust and balanced results across the three testing sets; conversely, a purely random packing of atomic spheres is not informative for “real-world” applications. Comparing the optimized SNAP models to the original C-GAP-17 model, an interesting pattern emerges: C-GAP-17 has slightly higher errors overall, and notably the highest energy error on the high-T MD dataset, and yet it maintains reasonable accuracy on the random-packing set, outperforming all our qSNAP models tested on this dataset, including the one specifically optimized with this set as validation. This observation suggests that C-GAP-17 is robust outside of regular structures, consistent with the physically meaningful performance seen in previous applications (e.g., Ref. 6).
VI. CONCLUSIONS
We have described an openly available Python package, which we call XPOT, that facilitates hyperparameter optimization for ML potentials by providing interfaces to widely used fitting software. We have shown practical applications of XPOT for “cross-platform” optimization of potential fits to datasets of quantum-mechanical reference data. In a series of numerical experiments, we explored the role of the loss function in this optimization and how it affects the characteristics of the resulting potentials with implications for the present software package and for atomistic ML more generally.
We find that potential fits with XPOT perform favorably compared to literature data.28 A scaling factor, α, can be used to balance the weighting of energy and force validation during optimization, and we note that different α values had to be chosen for the Zuo et al.28 and Si-GAP-1849 datasets for silicon to achieve the best results. This requirement for manual user input remains one of the challenges in fully automating the fitting process. We also showed that, being interfaced to a Bayesian optimization code, XPOT can explore large search spaces: up to 11 hyperparameters were optimized simultaneously in fitting qSNAP models for carbon.
In the future, XPOT will include functionality for optimizing other classes of ML potentials, as well as asynchronous optimization to offer better parallelization across large computing resources and facilities. We expect that one of the most important use cases for XPOT will be in optimizing those hyperparameters that do not significantly affect the evaluation cost of an ML potential (e.g., the relative weighting of different datapoints in the fit): this way, one can improve the quality of prediction with no additional computational cost at runtime. Other parameters do affect the cost, and in such cases, especially for general-purpose ML potentials and large-scale simulations, the extra time spent on hyperparameter optimization is likely worthwhile compared to the cost one would incur with more expensive settings.
We have focused on hyperparameter optimization as but one component in the development pipeline of ML potential models. Looking further ahead, it seems beneficial to combine this approach with strategies such as active learning for database generation and datapoint selection. We will explore these synergies in future work and, in the long run, expect XPOT to become part of a larger framework (and set of workflows) for the automated and routine generation of machine-learned potentials for mainstream material modeling.
ACKNOWLEDGMENTS
We thank J. D. Morrow and J. L. A. Gardner for testing and reviewing the code and A. P. Thompson for helpful comments on the manuscript. This work was supported through a UK Research and Innovation Frontier Research grant (Grant No. EP/X016188/1).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Daniel F. Thomas du Toit: Conceptualization (equal); Investigation (lead); Software (lead); Writing – original draft (equal). Volker L. Deringer: Conceptualization (equal); Supervision (lead); Writing – original draft (equal); Writing – review & editing (lead).
DATA AVAILABILITY
The XPOT Python code is openly available at https://github.com/dft-dutoit/xpot. Interfaces to the LAMMPS,56 FitSNAP,46 and QUIP/GAP36 codes are provided, and all these codes are freely available for non-commercial research. Data and code to reproduce the results from the paper are provided in the same repository.