Machine-learning (ML)-based interatomic potentials are increasingly popular in material modeling, enabling highly accurate simulations with thousands and millions of atoms. However, the performance of machine-learned potentials depends strongly on the choice of hyperparameters—that is, of those parameters that are set before the model encounters data. This problem is particularly acute where hyperparameters have no intuitive physical interpretation and where the corresponding optimization space is large. Here, we describe an openly available Python package that facilitates hyperparameter optimization across different ML potential fitting frameworks. We discuss methodological aspects relating to the optimization itself and to the selection of validation data, and we show example applications. We expect this package to become part of a wider computational framework to speed up the mainstream adaptation of ML potentials in the physical sciences.

## I. INTRODUCTION

Machine learning (ML) is rapidly gaining importance in developing interatomic potential models for molecules and materials.^{1–5} ML potentials are fitted to (“trained on”) reference databases of energies and forces on atoms, typically encompassing several hundreds or thousands of representative small-cell computations with density-functional theory (DFT), and they are able to reach similar accuracy as the reference method at orders-of-magnitude lower computational cost. Consequently, ML potentials are now being widely used with applications ranging from the varied structures of elemental carbon^{6–10} to complex battery materials.^{11–15} Furthermore, ML potentials are also now able to accurately simulate yet unknown phases, holding promise for use in assisting and guiding material design.^{13,16}

Methods for developing reference databases for ML potentials range from iterative exploration and random structure searching to active learning.^{17–20} However, even if a reliable database is available, fitting accurate and robust potentials to it is not trivial: in many, if not all, ML frameworks, there are parameters that need to be chosen before the model encounters data (and often in-between rounds of fitting). The latter are referred to as “hyperparameters,” distinct from fitting parameters such as the neural-network weights that are optimized during training. In the present paper, we denote a specific set of hyperparameters as $H={h1,\u2026,hN}$ and the space of all possible choices as $H$ (that is, $H\u2208H$). The individual entries in $H$ can be *discrete*, such as the number of basis functions in a structural descriptor, or *continuous*, e.g., the weighting of different components of the model. Overall, the expectation is that hyperparameter optimization can increase the accuracy and robustness of potentials for a given reference database (Fig. 1).

Software packages have been developed for hyperparameter optimization, both for general ML tasks^{21–23} and specifically for atomistic ML.^{24–26} Previous work in the latter area has explored several strategies. Grid searches are frequently used when there is a tractable number of parameters to try,^{27,28} and Sobol sequences,^{29} swarm optimization,^{30} and Bayesian optimization^{26,31} have been explored. For example, Spectral Neighbor Analysis Potential (SNAP)^{32} models have been optimized with several methods with regard to the parameters used in the atomic representation and the weighting of energy and force training data in the fit.^{8,32,33} Very recent work has focused on optimizing two- and three-body descriptors through variance analysis.^{34} Thus far, however, most existing packages in the context of ML potentials are either linked to a single fitting methodology^{31,32} or optimize regressors separately from descriptor hyperparameters.^{26}

Here, we describe a Python package to facilitate hyperparameter optimization for ML potentials across multiple fitting methods, and we discuss physical questions related to that process. We aim to provide an easy-to-use interface to ML potential fitting software—enabling automated benchmarking across platforms, for example, with a uniform format for user input and data files. We view this tool as a platform for experimenting with hyperparameters in ML potential development and as part of a future, wider computational infrastructure for the routine generation of ready-to-use potentials.

## II. METHODOLOGY

### A. ML potential fitting

*E*, is predicted as a sum of per-atom energies,

^{35,36}

*j*identifies an individual atom in the structure. We review essential points of the fitting methods that are used in the present work, and we expect to extend to other methods in the future.

In the Gaussian Approximation Potential (GAP) framework,^{36,37} $\epsilon \u0302j$ is predicted by a sparse Gaussian process regression model using a set of *M* representative points in the fit. A kernel function, *k*, is used to determine the similarity of atomic neighbor environments: at training stage to compare input data and also at prediction stage to measure the similarity of the new, *j*th environment to the *M* ones known from training.

^{38,39}which is formulated using Gaussians centered on atomic positions, up to a cutoff radius,

*r*

_{cut}, and with smoothness

*σ*

_{at}. This neighbor density is then expanded in a local basis,

*i*th neighbor atoms, the second up to maximum values of

*n*

_{max}and

*l*

_{max}(which are hyperparameters controlling the resolution of the local basis), and

*g*

_{n}and $Ylm$ are radial basis functions and spherical harmonics, respectively. After collecting the individual $cnlmi$ in the rotationally invariant power-spectrum vector,

**p**

_{i}, the SOAP kernel becomes

*ζ*is the power to which the dot product is raised.

*c*

_{1}, …,

*c*

_{M}}, are determined by minimizing a regularized least-squares problem: the regularization here corresponds to the “expected error” of the input and is a tunable hyperparameter in GAP.

^{36,37}With the fitting coefficients known, a new prediction for the

*j*th atom is made as

^{32}framework, where, unlike in GAP, there is no sparsification. Bispectrum components are real and invariant under rotation. The SNAP energy of an atom is calculated as a function of

*K*bispectrum components with

*K*being determined by a convergence parameter,

*J*

_{max}. $Bkj$ denotes the

*k*th bispectrum component for atom

*j*, and

*β*

_{k}represents the corresponding linear coefficient computed by SNAP.

^{32,40}The linear SNAP energy prediction for an atom is

*β*

_{0}in Ref. 32, has been omitted for brevity.

^{40}the energy prediction can be written as follows:

**=**

*α**F*″

**a**⊗

**a**(a symmetric matrix). Equation (6) demonstrates that qSNAP differs from linear SNAP by including all distinct pairwise products of bispectrum components

**B**

^{j}in the atomic-energy expression. This extension provides improved accuracy compared to linear SNAPs of the same

*J*value.

^{28,40}

In this work, we focus on “classical” ML methods such as Gaussian process regression rather than deep-learning models, which are also widely used for describing chemical systems. Currently, the optimization of deep-learning potentials is not included in our package.

### B. Optimization task

**D**

_{val}is a set of validation data that are not included in the training.

We use Bayesian Optimization (BO) to minimize the loss function. BO is effective for global optimization of functions where only the inputs and outputs are known but not the functional form itself.^{41} Therefore, BO can be used for any function but is inefficient if the function has known linearity or other specific structure.^{41} Here, we use BO as a generalizable method for optimizing across multiple ML potential fitting methods.

We note that in related work, Stuke *et al.*^{31} demonstrated the favorable performance of random searches and BO as the dimensionality of $H$ increases, especially against ordered “grid” style searches. Additionally, work on optimizing hyperparameters for atomistic neural-network models showed further improvement when using Centralized BO (CBO) over random-search methods.^{25}

### C. Validation and testing data

A common approach in ML research is to split the data into training and validation sets using *k*-fold cross-validation [Fig. 2(a)]. We include this functionality in our package, and we also explore optimization against an “external” dataset that is entirely separate from the training data [Fig. 2(b)]. The motivation for doing so is that early on in the development process of an ML potential, the reference dataset may not yet be comprehensive enough to allow for useful validation, and we therefore hypothesize that fixed sets of external data may exist that are suitable for hyperparameter optimization. In this case, an external dataset could either be computed by the user or sourced from existing structure libraries, such as the Samara Carbon Allotrope Database (SACADA)^{43} for carbon allotropes, or the Materials Project^{44} for other structures. The choice of validation structures is important: if their selection and thus the loss function is not representative of the properties required, then, the minimum found by optimization will not necessarily be useful when applying the potential in practice (e.g., in large-scale MD).

### D. Loss function

^{45}), the definition of $L$ is straightforward: it will simply be given by the root mean square error (RMSE) or a similar metric for the individual predictions on the test set. The issue is more complex for interatomic potentials, where both energy and force errors are relevant and not normally directly proportional to each other. As is common in the field, we therefore use a combined loss function

*α*is a scaling parameter that allows the user to control the balance between a pure energy-based optimization (

*α*= 1) and a purely force-based one (

*α*= 0). We note that $L$ is dimensionless as we divide by the respective units (square brackets). Consequently, the

*absolute*values of $L$ do not carry meaning and are only comparable for a given value of

*α*.

*N*

_{cells}corresponds to the number of structures (cells) in the database and $E\u0302i$ is the energy prediction for the

*i*th cell in the dataset, whereas

*E*

_{i}is the corresponding true energy computed with quantum mechanics. We scale the deviation between the two by the

*square root*of the number of atoms in the cell,

*n*

_{at,i}; see Ref. 42 for details.

*j*again refers to an individual atom and

*N*

_{at}denotes the number of atoms in all of

**D**. ($F\u0302j$ here indicates the ML model prediction of the force vector for consistency with the notation above rather than the direction of the vector.)

## III. IMPLEMENTATION

We implemented the above approaches in an open-sourced Python package that we call “Cross-Platform Optimizer for Potentials” (XPOT). Figure 3 provides an overview.

XPOT uses a single input file structure for all potential types with only the selected hyperparameters changing between fitting frameworks. The optimization process is consistent for all fitting methods with the gp_minimize optimization scheme from scikit-optimize used to wrap the BO. While scikit-optimize is chosen here, many general optimization packages, such as hyperopt^{21} or Optuna,^{22} are available. Initially, hyperparameters are chosen to sample the loss function for *N* iterations. These points are then used to generate an initial approximation for $L(H)$ as the optimization stage starts. Data from previous runs can also be used as a prior if the unoptimized hyperparameter values have not changed. However, $H$ can be extended between sweeps by adding the fixed hyperparameter value to the priors. $H$ can include discrete (non-continuous) hyperparameter values if required, which is common in ML potential fitting for values such as twojmax (FitSNAP^{46}) as well as zeta, n_max, and l_max for SOAP (gap_fit).

Iterations are performed serially, allowing the loss approximation function to be calculated after each fit. The approximation model is generated via a Gaussian process, and an acquisition function selects the hyperparameters to be sampled in the next iteration. Between fits, the approximation function is re-evaluated and new hyperparameters are selected. As such, the limiting factor in “real-time” optimization with XPOT is the sequential nature of the iterative process.

We summarize relevant hyperparameters in Table I, indicating those that can be converged out (i.e., a larger value typically means a better model, albeit at larger cost) with asterisks in the “Conv.” column. The other parameters do not have a known optimum, and those are the ones for which it is expected that hyperparameter optimization will be most useful. While Table I refers to the GAP and SNAP frameworks used in the present work,^{47,48} we note that interfaces to other ML potential fitting methods and optimization processes are expected to be added to XPOT in the future.

. | . | Description . | Conv. . | Default . | Default range . |
---|---|---|---|---|---|

GAP | cutoff | Cutoff radius (Å) | 4.0 | 2.5–8.0 | |

atom_sigma | Gaussian width for SOAP neighbor density (Å) | 0.5 | 0.2–0.8 | ||

n_max | Maximum n for radial basis functions (SOAP) | ∗ | |||

l_max | Maximum l for angular basis functions (SOAP) | ∗ | |||

zeta | Power to which the kernel is raised (SOAP) | 4 | 2–6 | ||

n_sparse | Number of representative points, M | ∗ | |||

sigmaa | Regularization for energies (eV at.^{−1}) | 0.01 | 0.001–0.1 | ||

Regularization for forces (eV Å^{−1}) | 0.1 | 0.01–0.5 | |||

(q)SNAP | rcutfac | Cutoff radius (Å) | 4.0 | 2.5–8.0 | |

twojmax | (2×) value of J_{max} used to formulate SNA descriptors | ∗ | |||

rfac0 | Distance to angle conversion parameters | 0.994 | 0.6–1.6 | ||

rmin0 | 0 | 0–0.1 | |||

wj | Weighting of each element in the fit | 1.0 | 0.2–2.0 | ||

eweight | Energy weighting for input data | 100 | 10–10^{5} | ||

fweight | Force weighting for input data | 1 | 0.1–100 |

. | . | Description . | Conv. . | Default . | Default range . |
---|---|---|---|---|---|

GAP | cutoff | Cutoff radius (Å) | 4.0 | 2.5–8.0 | |

atom_sigma | Gaussian width for SOAP neighbor density (Å) | 0.5 | 0.2–0.8 | ||

n_max | Maximum n for radial basis functions (SOAP) | ∗ | |||

l_max | Maximum l for angular basis functions (SOAP) | ∗ | |||

zeta | Power to which the kernel is raised (SOAP) | 4 | 2–6 | ||

n_sparse | Number of representative points, M | ∗ | |||

sigmaa | Regularization for energies (eV at.^{−1}) | 0.01 | 0.001–0.1 | ||

Regularization for forces (eV Å^{−1}) | 0.1 | 0.01–0.5 | |||

(q)SNAP | rcutfac | Cutoff radius (Å) | 4.0 | 2.5–8.0 | |

twojmax | (2×) value of J_{max} used to formulate SNA descriptors | ∗ | |||

rfac0 | Distance to angle conversion parameters | 0.994 | 0.6–1.6 | ||

rmin0 | 0 | 0–0.1 | |||

wj | Weighting of each element in the fit | 1.0 | 0.2–2.0 | ||

eweight | Energy weighting for input data | 100 | 10–10^{5} | ||

fweight | Force weighting for input data | 1 | 0.1–100 |

^{a}

Input is given in the format {energy:force:stress:hessian}. The stress regularizer is not optimized in the present work to keep the number of parameters tractable. The Hessian value is set to zero and not used.

As part of the output, Python scripts for initial analysis of newly fitted ML potentials can be generated, offering relevant standard tests: potential-energy scans for dimers in vacuum (“dimer curves”) as well as validation energy and force output for all fitted models. An example is provided in the GitHub repository hosting the code (see data and code availability statements in the following).

## IV. PERFORMANCE

In this section, we demonstrate the functionality of XPOT by optimizing a series of ML potential fits for published datasets and by comparing the numerical performance of the re-fitted potentials to existing ones. We first optimize a series of qSNAP models for elemental systems to show that we can reach the same quality level as in a previous benchmark study,^{28} and we then show examples of re-fitting existing potentials with different fitting frameworks (“cross-platform” optimization).

For the first test, we re-fitted qSNAP models to the datasets from Zuo *et al.* (Ref. 28) and compared to the accuracies reported in that work. Figure 4 shows two examples, and a comprehensive overview is provided in Table II. After a preliminary low-cost sweep of values, *α* values of 0.7–0.9 were used to balance energies and forces for fitting [cf. Eq. (8)]. The qSNAP fits were optimized over 150 iterations. Further optimization and more extensive initial exploration improved accuracy further, but improvements diminished beyond 300 iterations.

. | E RMSE (meV/at.)
. | F RMSE (eV/Å)
. | ||||
---|---|---|---|---|---|---|

Reference 28 . | XPOT . | Reference 28 . | XPOT . | |||

GAP . | qSNAP . | qSNAP . | GAP . | qSNAP . | qSNAP . | |

Ni | 0.62 | 1.04 | 0.80 | 0.04 | 0.07 | 0.04 |

Cu | 0.56 | 1.16 | 0.52 | 0.02 | 0.05 | 0.03 |

Li | 0.63 | 0.85 | 0.59 | 0.01 | 0.04 | 0.01 |

Mo | 3.55 | 3.96 | 3.51 | 0.16 | 0.33 | 0.17 |

Si | 4.18 | 6.28 | 3.31 | 0.12 | 0.29 | 0.11 |

Ge | 4.47 | 10.55 | 3.74 | 0.08 | 0.20 | 0.11 |

. | E RMSE (meV/at.)
. | F RMSE (eV/Å)
. | ||||
---|---|---|---|---|---|---|

Reference 28 . | XPOT . | Reference 28 . | XPOT . | |||

GAP . | qSNAP . | qSNAP . | GAP . | qSNAP . | qSNAP . | |

Ni | 0.62 | 1.04 | 0.80 | 0.04 | 0.07 | 0.04 |

Cu | 0.56 | 1.16 | 0.52 | 0.02 | 0.05 | 0.03 |

Li | 0.63 | 0.85 | 0.59 | 0.01 | 0.04 | 0.01 |

Mo | 3.55 | 3.96 | 3.51 | 0.16 | 0.33 | 0.17 |

Si | 4.18 | 6.28 | 3.31 | 0.12 | 0.29 | 0.11 |

Ge | 4.47 | 10.55 | 3.74 | 0.08 | 0.20 | 0.11 |

Figure 4 illustrates the effect of using a combined energy and force loss during optimization, most notably visible in the results for Cu. While the loss value (lower panel) decreases, the constituent energy and force errors sometimes increase for one while the other decreases, emphasizing the importance of a well-chosen *α* value to balance both. The plots also illustrate that XPOT occasionally finds good hyperparameters in the initialization section (gray). This is highly dependent on the chosen search space and sampling method, determining whether one of the sampling points lands close to a minimum on the loss surface. In these cases, as seen in Fig. 4, there is less improvement to be gained by further optimization (Si, left) compared to other systems (Cu, center).

More generally, we find that our optimized qSNAP models compare favorably to qSNAP and GAP results by Zuo *et al.*,^{28} especially for Mo and Si (Table II). We note that the absolute differences are small—in fact, we are reporting more significant digits than we would normally do—and that the authors’ qSNAP fits and ours will likely have been carried out with different versions of the fitting code. The main qualitative conclusion to be drawn from Table II is therefore that XPOT reaches good performance across chemical systems.

For the second test, we moved to a more widely applicable, “general-purpose” ML potential with a highly complex reference database ($>100K$ atomic environments), viz., Si-GAP-18 (Ref. 49). We used XPOT to optimize qSNAP fits to the same database. In this case, we match exactly the train–test split and use the published testing set for validation (rather than cross-validation) for direct comparability with the original potential.^{49} The results are shown on the right-hand side of Fig. 4: in this case, our optimized qSNAP model reaches an energy error of about 10 meV/at. (compared to 6 meV/at. for the GAP of Ref. 49) and a force error of about 0.15 eV/Å (compared to 0.10 eV/Å). We view this as a representative example of re-fitting with a different ML potential framework to the database of an already highly optimized model (here, GAP-18)—one possible application case could be to use a graphics processing unit (GPU) implementation that is not available for the published model or more generally to gain evaluation speed at the cost of some loss of accuracy.

To expand the range of tests further, we fitted a series of linear and quadratic SNAP ML potentials to the C-GAP-17 dataset for elemental carbon.^{51} C-GAP-17 is an example of a potential that has been (manually, at the time) optimized for the description of liquid and amorphous phases; it has notoriously high numerical errors compared to more recent potential fits^{27,52} and yet is already able to describe important physical aspects related to carbon.^{6,53–55} We use this example here to measure the performance of models re-fitted at different levels and in Sec. V to explore the physical role of hyperparameters.

Table III compares the accuracy of linear and quadratic SNAP models across *J* values. The cutoff radius was fixed to 3.7 Å, as in C-GAP-17,^{51} and the training and testing sets used were identical to those in Ref. 51. These potentials were fitted using *α* = 0.75, similar to the qSNAP tests on data from Ref. 28, but higher than for those on the Si-GAP-18 database. The rfac0 and rmin0 hyperparameters were optimized for the SNA descriptors as well as the energy and force weights within the default ranges given in Table I. All these were optimized in a single sweep to demonstrate the ability of BO (and thus XPOT) to survey large hyperparameter spaces.

. | . | E RMSE (meV/at.)
. | F RMSE (eV/Å)
. | MD speed (relative) . |
---|---|---|---|---|

Linear SNAP | (J = 3) | 192.2 | 4.34 | 65 |

(J = 4) | 117.6 | 2.43 | 24 | |

(J = 5) | 76.8 | 1.68 | 9 | |

(J = 6) | 63.2 | 1.40 | 4 | |

qSNAP | (J = 3) | 50.8 | 1.26 | 38 |

(J = 4) | 41.4 | 1.00 | 14 | |

(J = 5) | 34.9 | 0.89 | 5 | |

(J = 6) | 28.8 | 0.81 | 2 | |

C-GAP-17 | (Reference 51) | 42.0 | 1.26 | 1 |

. | . | E RMSE (meV/at.)
. | F RMSE (eV/Å)
. | MD speed (relative) . |
---|---|---|---|---|

Linear SNAP | (J = 3) | 192.2 | 4.34 | 65 |

(J = 4) | 117.6 | 2.43 | 24 | |

(J = 5) | 76.8 | 1.68 | 9 | |

(J = 6) | 63.2 | 1.40 | 4 | |

qSNAP | (J = 3) | 50.8 | 1.26 | 38 |

(J = 4) | 41.4 | 1.00 | 14 | |

(J = 5) | 34.9 | 0.89 | 5 | |

(J = 6) | 28.8 | 0.81 | 2 | |

C-GAP-17 | (Reference 51) | 42.0 | 1.26 | 1 |

Our linear SNAP model fits did not match the accuracy of the C-GAP-17 potential up to *J* = 6. In contrast, qSNAP models optimized with XPOT show favorable numerical energy and force errors for *J* = 4 and higher. It should be noted that the twojmax hyperparameter in the FitSNAP input^{46} is 2*J*_{max} and, thus, it is double the *J* values we report throughout this paper.

We ran MD simulations with the resulting potentials and found that as *J* increases so too does the computational cost of evaluation, as expected. We report these data in the rightmost column of Table III. The computational cost of moving from linear to quadratic fitting was significantly less than that of increasing *J*.

We note that while the cost of evaluation does not change by more than a factor of two upon moving from linear to quadratic SNAP models, the cost of fitting is increased further (∼3.7× for *J* = 5 and 7.4× for *J* = 6) and cannot be significantly improved by increasing computational resources. This is because the solver used by FitSNAP^{46} is single-core, and so, although the calculation of the descriptors is parallelized, the fitting of the potential becomes the bottleneck in the optimization process.

## V. EXPERIMENTS

### A. Physical role of hyperparameters

We now describe numerical experiments that explore aspects of XPOT and its operation. The first point is to do with the meaning of specific hyperparameters: these could be viewed as purely free optimization parameters, on one hand, or as physically constrained and informed ones, on the other hand.^{37}

Amorphous carbon (a-C) presents a relevant challenge for fitting interatomic potentials with ML or otherwise^{54} due to the diverse nature of its atomic structure. In Sec. IV, we have shown SNAP model fits with optimized hyperparameters, leading to lower numerical errors than those of C-GAP-17. The final SNAP models are computationally cheaper than C-GAP-17 and can run on GPU architectures, but the robustness of the potential is reduced compared to C-GAP-17, as detailed in the following.

We used relatively flexible and numerically accurate qSNAP potentials with *J* = 6 in this case. We note that when fitting a potential from scratch, a larger and more diverse dataset with a lower *J* value might provide a more cost-effective potential for extensive large-scale simulations, such as the qSNAP model by Willman *et al.* (Ref. 8; *J* = 4) for high-pressure carbon systems.

*r*is mapped onto

*θ*

_{0}via

A low value of rfac0 directly reduces the amount of the 3-sphere that can be used for mapping the possible neighbor positions because points south of *θ*_{0} are excluded. Thompson *et al.*^{32} state that “it is advantageous to use most of the 3-sphere while still excluding the region near the south pole where the configurational space becomes highly compressed.” Thus, both very high or low values of rfac0 can diminish accuracy and/or robustness of the resulting potential. For a-C, values lower than 0.7 resulted in potentials becoming unstable in MD, and we, hence, specified $rfac0\u22650.7$. While it may be possible for potentials with low values of rfac0 to be stable, we were unable to perform 9000 K NVT simulations without erroneous behavior (“losing atoms”)—even when using a 0.2 fs timestep, a factor of five shorter than that used for typical C-GAP-17 MD simulations.^{6,51}

Our final qSNAP *J* = 6 potential is stable at 9000 K with a 0.2 fs timestep, and when replicating the short, small-scale melt-quench simulations in Ref. 51, it predicted 53.5% sp^{3} environments (Fig. 5, bottom) compared to 57.5% by GAP-17% and 65.7% from *Ab Initio* Molecular Dynamics (AIMD) simulations after the melt-quench simulations. We checked the predicted radial distribution functions for 216 atom-cells, finding good agreement with DFT and C-GAP-17 predictions for quenched a-C between 1.5 and 3.5 g/cm^{3}.

Next, we investigated optimizing the cutoff radius. We ran a second optimization sweep with the same ranges selected for optimization, adding a variable cutoff. All hyperparameters optimized for a fixed cutoff were re-optimized in the variable-cutoff sweep. Here, the “best” potential was fitted with a cutoff of 3.49 Å compared to the 3.7 Å used previously, and the energy and force weightings were altered slightly (albeit the energies of the dimer structures are lowest-weighted in both cases).

Figure 5 shows that the model with optimized cutoff has almost identical behavior for the isolated-dimer curve compared to the fixed-cutoff potential. However, we find that the optimized-cutoff-radius SNAP better predicts sp^{3} character in carbon MD. When repeating the tests from above, the new potential predicts 60.0% sp^{3} carbon environments. In the lower panel of Fig. 5, we show these results, noting that for both SNAP models the sp^{3} content increased more slowly during quenching than for the AIMD simulations. We also fitted a potential with *R*_{cut} = 3.49 Å with the other hyperparameters remaining the same as for our fixed-cutoff potential, aiming to separate the effect of *R*_{cut} from that of other aspects of the model. This potential gave 56.9% sp^{3} content, which is between the fixed- and optimized-cutoff potentials. This test suggests that optimization of cutoff can be beneficial, alongside optimizing other hyperparameters, to optimize the physical behavior of an ML potential.

Comparing the hyperparameters of optimized potentials across the different datasets from Zuo *et al.* (analyzed above in Table II), we found that the optimal rfac0 changes significantly from element to element, ranging between 0.5 (Cu) and 0.845 (Mo). Notably, not only does the optimal value of rfac0 depend on the system, we observed that in the case of the Si GAP-18 database, the optimal rfac0 and rcutfac values were different to that of the Zuo *et al.* dataset for silicon. Specifically, rfac0 was optimized to 0.87 (GAP-18) and 0.58 (Zuo *et al.*). We also found that the optimized cutoff for these databases differed, viz., 4.68 vs 4.23 Å. These findings emphasize that different types of reference data (e.g., liquid vs crystalline structures and general vs specialized datasets) may require different hyperparameters for an optimized fit, and therefore, they underline the usefulness of hyperparameter optimization in the development of ML potential models.

### B. Balancing energies and forces in the loss function

We use a combined loss function that includes energy and force error terms, controlled by a factor *α* that takes values between zero and one [Eq. (8)]. We investigated the effect of tuning this balance by optimizing a series of potentials from one where the loss only depends on force errors [*α* = 0 in Eq. (8)] to one where it only depends on energy errors (*α* = 1). For this experiment, at each value of *α*, we fitted ten independent qSNAP models with *J* = 5 for the Si-GAP-18 database^{49} (as in Sec. IV).

Figure 6(a) illustrates the requirement for joint optimization of the energy and force errors, which is achieved through the selection of *α*. In optimizing for only forces or energies, we produced potentials that are skewed significantly to either force or energy and are not accurate at predicting the respective other property.

Across the range 0 ≥ *α* ≥ 1, the energy RMSE decreases, and the force RMSE increases. However, the change is not linear, and for this particular system, the optimal value of *α* lies between 0.3 and 0.5. Not only do the errors change with varying *α*, but the standard deviation of the errors across the ten optimization runs also broadly decreases as the weighting of the corresponding part of the loss function is increased. For *J* = 4, we found this effect to be even further pronounced. Thus, a well-chosen *α* also increases reliability of optimization, improving consistency and performance across optimization sweeps.

Figure 6(b) shows the optimized cutoff values for potentials after optimizations at each *α*. When studying the trend in optimized cutoff values, we see that as *α* increases, so too does the optimal *R*_{cut}. Interestingly, the best force errors are found when the cutoff is lower, and better energy errors are found where higher *R*_{cut} values are preferred. In testing, we found that potentials at intermediate *α* values show robust performance in melt-quench simulations, while models with very small *α* occasionally demonstrated non-physical behavior. Further study of this behavior will be undertaken in the future.

To create well-balanced potentials, *α* should therefore be chosen being mindful of over-weighting either energies or forces. Using either *α* = 0 or 1 results in significantly less accurate potentials overall, and neither should be selected in general use.

### C. Role of the validation set

In validating potentials and thus in hyperparameter optimization too, the dataset used plays a crucial role. If the test set is structurally similar to the training data, it will not comprehensively assess the ability of the potential to describe “unseen” structures. In the case of C-GAP-17, the training and testing data are structurally similar because they have been obtained by a random split of the overall dataset.^{51} We therefore tested whether a useful, but more generic, validation set might allow for efficient hyperparameter optimization, emphasizing robustness over accuracy.

We optimized a-C potentials using two other types of validation sets: (i) a set of snapshots from very-high-temperature, 15 000 K MD simulations performed with C-GAP-17 and (ii) a set of fully randomized structures (hard-sphere random-packing model). Here, we created our own validation datasets aimed specifically at highly disordered structures, but we note that existing external datasets such as SACADA^{43} could also be used to guide hyperparameter optimization for carbon. The new datasets were labeled with DFT and used independently as validation sets for optimization with XPOT. After the fact, we evaluated the errors of the optimized potentials for the respective other datasets. The results for qSNAP (*J* = 5) potentials are shown in Table IV.

. | Energy/force RMSE on defined test sets . | ||
---|---|---|---|

C-GAP-17 test set (meV/at.)/(eV/Å) . | High-T MD (meV/at.)/(eV/Å)
. | Random packing (meV/at.)/(eV/Å) . | |

qSNAP (5-fold cross-validation) | 35.5/0.85 | 38.1/1.43 | 38.9/2.14 |

qSNAP ($L$: C-GAP-17 test set) | 31.9/0.90 | 36.6/1.47 | 36.8/2.27 |

qSNAP ($L$: High-T MD) | 39.5/0.87 | 38.0/1.32 | 41.4/2.12 |

qSNAP ($L$: Random packing) | 194.8/0.90 | 47.1/1.36 | 39.3/1.68 |

C-GAP-17 (Ref. 51) | 42.0/1.26 | 65.8/1.51 | 33.8/1.60 |

. | Energy/force RMSE on defined test sets . | ||
---|---|---|---|

C-GAP-17 test set (meV/at.)/(eV/Å) . | High-T MD (meV/at.)/(eV/Å)
. | Random packing (meV/at.)/(eV/Å) . | |

qSNAP (5-fold cross-validation) | 35.5/0.85 | 38.1/1.43 | 38.9/2.14 |

qSNAP ($L$: C-GAP-17 test set) | 31.9/0.90 | 36.6/1.47 | 36.8/2.27 |

qSNAP ($L$: High-T MD) | 39.5/0.87 | 38.0/1.32 | 41.4/2.12 |

qSNAP ($L$: Random packing) | 194.8/0.90 | 47.1/1.36 | 39.3/1.68 |

C-GAP-17 (Ref. 51) | 42.0/1.26 | 65.8/1.51 | 33.8/1.60 |

We demonstrate that the validation set chosen has a direct impact on the hyperparameters selected during optimization and thus on the potential itself. The qSNAP model optimized on the random-packing dataset has very high energy errors on crystalline structures (which is expected as the atomic environments there are very different). We attribute this to a low weighting of the crystalline data in the optimization process as it allows for better fitting to random structures at the cost of worsened accuracy on more ordered environments.

We find that the high-*T* MD validation set provides generally robust and balanced results across the three testing sets; conversely, a purely random packing of atomic spheres is not informative for “real-world” applications. Comparing the optimized SNAPs to the original C-GAP-17 model, an interesting pattern emerges: C-GAP-17 overall has slightly higher errors and notably the highest energy error on the high-*T* MD dataset, and yet, it maintains reasonable accuracy when tested on the random-packing set—outperforming all our qSNAP models tested for this dataset, including the one specifically optimized with this set as validation. This observation suggests that C-GAP-17 is robust outside of regular structures, consistent with the physically meaningful performance seen in previous applications (e.g., Ref. 6).

## VI. CONCLUSIONS

We have described an openly available Python package, which we call XPOT, that facilitates hyperparameter optimization for ML potentials by providing interfaces to widely used fitting software. We have shown practical applications of XPOT for “cross-platform” optimization of potential fits to datasets of quantum-mechanical reference data. In a series of numerical experiments, we explored the role of the loss function in this optimization and how it affects the characteristics of the resulting potentials with implications for the present software package and for atomistic ML more generally.

We find that potential fits with XPOT perform appreciably compared to literature data.^{28} A scaling factor, *α*, can be used to balance the weighting of energy and force validation during optimization, and we note that different *α* values for the Zuo *et al.*^{28} and Si-GAP-18^{49} datasets for silicon had to be chosen for best results. This requirement for manual user input remains one of the challenges in fully automating the fitting process. We also showed that being interfaced to a Bayesian optimization code, XPOT can explore large search spaces: up to 11 hyperparameters were optimized simultaneously in fitting qSNAP models for carbon.

In the future, XPOT will include functionality for optimizing other classes of ML potentials as well as asynchronous optimization to offer better parallelization across large computing resources and facilities. We expect that one of the most important use cases for XPOT will be in optimizing those hyperparameters that do *not* significantly affect the evaluation cost of an ML potential (e.g., the relative weighting of different datapoints in the fit): this way, one can improve the quality of prediction with no additional computational cost at runtime. Other parameters do affect the cost—and in such cases, especially for general-purpose ML potentials and large-scale simulations, the extra time spent on hyperparameter optimization is likely favorable over the cost one would incur with more expensive settings.

We have focused on hyperparameter optimization as but one component in the development pipeline of ML potential models. Looking further, it seems beneficial to combine this approach with strategies such as active learning for database generation and datapoint selection. We will explore these synergies in the future work and in the long run expect XPOT to become part of a larger framework (and set of workflows) for the automated and routine generation of machine-learned potentials for mainstream material modeling.

## ACKNOWLEDGMENTS

We thank J. D. Morrow and J. L. A. Gardner for testing and reviewing the code and A. P. Thompson for helpful comments on the manuscript. This work was supported through a UK Research and Innovation Frontier Research grant (Grant No. EP/X016188/1).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Daniel F. Thomas du Toit**: Conceptualization (equal); Investigation (lead); Software (lead); Writing – original draft (equal). **Volker L. Deringer**: Conceptualization (equal); Supervision (lead); Writing – original draft (equal); Writing – review & editing (lead).

## DATA AVAILABILITY

The XPOT Python code is openly available at https://github.com/dft-dutoit/xpot. Interfaces to the LAMMPS,^{56}^{,} FitSNAP,^{46} and QUIP/GAP^{36} codes are provided, and all these codes are freely available for non-commercial research. Data and code to reproduce the results from the paper are provided in the same repository.

## REFERENCES

*sp*

^{3}content in tetrahedral amorphous carbon

_{10}GeP

_{2}S

_{12}-type superionic conductors

_{2}S-P

_{2}S

_{5}solid-state electrolytes using machine learning potentials

_{x}La

_{3}Zr

_{x−5}Ta

_{7−x}O

_{12}by machine learning interatomic potentials

*Homo citans*and carbon allotropes: For an ethics of citation

_{60}