A strategy is outlined to reduce the number of training points required to model intermolecular potentials using Gaussian processes, without reducing accuracy. An asymptotic function is used at a long range, and the crossover distance between this model and the Gaussian process is learnt from the training data. The results are presented for different implementations of this procedure, known as boundary optimization, across the following dimer systems: CO–Ne, HF–Ne, HF–Na+, CO2–Ne, and (CO2)2. The technique reduces the number of training points, at fixed accuracy, by up to ∼49%, compared to our previous work based on a sequential learning technique. The approach is readily transferable to other statistical methods of prediction or modeling problems.

In any molecular simulation, approximations of the potential energy surfaces (PESs) that describe the relevant interactions are a pre-requisite. Traditionally, these approximations have been made using empirical potentials (force fields).1,2 However, such force fields have closed functional forms, which limit their capacity to capture the complicated topography of the PES. Furthermore, they are laborious to produce and may fail to capture accurately even the fitted data. As the accuracy of the simulation depends on the potential employed, much work has been devoted to developing force fields that provide approximations of PESs with quantum-mechanical accuracy.

Such work includes methods to generate ab initio force fields3 and attempts to “learn” the potential via a machine learning algorithm.4,5 This article concerns the latter approach, which has also been applied in other fields in chemistry and materials science.6–8 This approach proceeds by training a statistical technique on a relatively small set of data from ab initio calculations on the PES of interest, known as the training set. Many such techniques have been employed to predict the energy in these algorithms, including neural networks,9–13 moment tensors,14–16 and Gaussian processes17–34 (GPs).

Herein, GPs are employed as the statistical technique. An existing example of a force field that uses GPs is FFLUX,35 which has been used to approximate the energies of weakly bound complexes36 and water clusters,37 among other applications.38–40 Furthermore, in the field of materials science, GP models that invoke a smooth overlap of atomic positions (SOAP) kernel41 have seen successful applications to many systems.29–31,42–45

GP models have also produced promising results in applications to intermolecular interactions.17–19 Initially,17 the training sets for these models were constructed with Latin hypercube (LHC) sampling46–48 though sequential design strategies,49,50 which achieve a prescribed accuracy with a smaller training set, have been shown to outperform such methods.19 Regardless of the training set design, a fixed crossover distance, Rcross, was imposed prior to training. This parameter defines a boundary beyond which a simple, long-range asymptotic function takes over prediction from the GP.17,19 A similar approach is used in materials science, in which the contribution by one atom to the neighbor density of another is assumed to be zero if the two are separated by a distance in excess of Rcross.31,44 Crossover distances have been applied alongside other statistical methods of prediction too,5,51 with neural network9–12 and moment tensor15 models of PESs employing fixed, pre-determined crossover distances.

The use of a crossover distance limits the portion of the PES approximated by the statistical method to the region around the potential well. Consequently, fewer training points are required to develop an accurate model. Previously, these distances were all fixed and specified a priori; however, they can be learnt from the training data. Herein, the work of Uteva et al.17,19 is extended into a sequential design strategy in which Rcross varies as a function of the number of training points and the species of the interacting atoms. This produces models of the same accuracy with fewer training points than the same design strategy with a fixed Rcross. The process by which Rcross is optimized is referred to here as boundary optimization.

All GP models herein make predictions via GP regression, with detailed descriptions of GP regression theory available elsewhere.52–54 This section briefly introduces the key concepts in the context of intermolecular potentials. For any intermolecular interaction, the PES can be thought of as a multivariate function f(x), x ∈ ℝ^Z, where x is a vector of inputs and Z is the number of elements in x. Here, the inputs are inverse interatomic separations, although promising results have also been obtained with Morse variables.24,55 Only pair intermolecular interactions are considered, although GP models have been applied successfully to non-additive interactions17,19,40 and the methodology outlined here can be extended to such cases straightforwardly.

When the outputs f(x_i) = Y_i are available at N values of i, the set {x_i, Y_i}, i = 1, …, N, forms the training set. Letting Y be a vector of the observed energies from this set, GP regression can be used to approximate the value of f(x) at a new point x* as

f(\mathbf{x}^{*}) = \mathbf{K}_{*}^{T}\mathbf{K}^{-1}\mathbf{Y},
(1)

where K* is a vector of the covariances between x* and all xi,

\mathbf{K}_{*} = \left[ k(\mathbf{x}^{*}, \mathbf{x}_{1}),\ k(\mathbf{x}^{*}, \mathbf{x}_{2}),\ \ldots,\ k(\mathbf{x}^{*}, \mathbf{x}_{N}) \right]^{T},
(2)

and K is the positive-definite covariance matrix,

\mathbf{K} = \begin{bmatrix}
k(\mathbf{x}_{1},\mathbf{x}_{1}) + \sigma_{n}^{2} & k(\mathbf{x}_{1},\mathbf{x}_{2}) & \cdots & k(\mathbf{x}_{1},\mathbf{x}_{N}) \\
k(\mathbf{x}_{2},\mathbf{x}_{1}) & k(\mathbf{x}_{2},\mathbf{x}_{2}) + \sigma_{n}^{2} & \cdots & k(\mathbf{x}_{2},\mathbf{x}_{N}) \\
\vdots & \vdots & \ddots & \vdots \\
k(\mathbf{x}_{N},\mathbf{x}_{1}) & k(\mathbf{x}_{N},\mathbf{x}_{2}) & \cdots & k(\mathbf{x}_{N},\mathbf{x}_{N}) + \sigma_{n}^{2}
\end{bmatrix}.

Here, σ_n² is the Gaussian noise variance, which accounts for the noise in the training data.

All entries in K are found by evaluating a covariance function, k(x, x′), where x and x′ are configurations from the training set. The same holds for K*, only with one training configuration replaced by the new point x*. A common example is the squared exponential covariance function for which

k(\mathbf{x}, \mathbf{x}') = \sigma_{f}^{2} \prod_{d=1}^{D} \exp\!\left[-\frac{(x_{d} - x'_{d})^{2}}{2 l_{d}^{2}}\right].
(3)

Here, D is the total number of interatomic distances, each corresponding to a unique atomic pair; σ_f² is the signal variance; x_d and x′_d are the inverse separations in x and x′ that involve the dth atomic pair; and l_d is the lengthscale for the interaction between this pair of atoms. The hyperparameters of the covariance function are σ_n², σ_f², and the set of lengthscales, l_d.

This covariance function can be modified to account for the fact that two configurations can have equivalent energies due to symmetry. The result is the symmetric squared exponential covariance function, k_sym(x, x′), investigated by Uteva et al.,17 which is used here. The set of all permutations of the interatomic distances in x under which the energy surface is unchanged is denoted P, while p is a single permutation within this set. Assuming that the lengthscales l_d are invariant under the permutations in P,

k_{\mathrm{sym}}(\mathbf{x}, \mathbf{x}') = \sum_{p \in P} k(p\mathbf{x}, \mathbf{x}'),
(4)

where k(x, x′) is the squared exponential covariance function from Eq. (3). GPs also employ a mean function, which is taken here to be zero everywhere.
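To make Eqs. (1)–(4) concrete, the following sketch evaluates the symmetric squared exponential kernel and the GP mean prediction with NumPy. The permutation list, hyperparameter values, and training data are illustrative placeholders rather than quantities used in this work.

```python
import numpy as np

def k_se(x, xp, sigma_f2, lengthscales):
    """Squared exponential covariance, Eq. (3); x and xp hold the D inverse separations."""
    return sigma_f2 * np.exp(-np.sum((x - xp) ** 2 / (2.0 * lengthscales ** 2)))

def k_sym(x, xp, perms, sigma_f2, lengthscales):
    """Symmetric kernel, Eq. (4): sum of k(px, x') over the energy-invariant permutations p."""
    return sum(k_se(x[list(p)], xp, sigma_f2, lengthscales) for p in perms)

def gp_predict(x_star, X_train, Y_train, perms, sigma_f2, lengthscales, sigma_n2):
    """GP mean prediction, Eq. (1), with the noise variance added along the diagonal of K."""
    N = X_train.shape[0]
    K = np.array([[k_sym(X_train[i], X_train[j], perms, sigma_f2, lengthscales)
                   for j in range(N)] for i in range(N)]) + sigma_n2 * np.eye(N)
    K_star = np.array([k_sym(x_star, X_train[i], perms, sigma_f2, lengthscales)
                       for i in range(N)])
    return K_star @ np.linalg.solve(K, Y_train)

# Toy usage with placeholder data: two inverse separations per configuration and a
# permutation set containing the identity and the swap of the two distances.
X = np.array([[0.30, 0.25], [0.20, 0.45], [0.15, 0.10]])
Y = np.array([-1.0e-4, 2.0e-3, -5.0e-5])
perms = [(0, 1), (1, 0)]
print(gp_predict(np.array([0.25, 0.30]), X, Y, perms,
                 sigma_f2=0.01, lengthscales=np.array([0.1, 0.1]), sigma_n2=1e-10))
```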

To specify the hyperparameters of ksym(x, x′) that produce a model that achieves the best fit to the training data, the log of the marginal likelihood function, log(L), is maximized,52,54

\log(L) = -\frac{1}{2}\mathbf{Y}^{T}\mathbf{K}^{-1}\mathbf{Y} - \frac{1}{2}\log|\mathbf{K}| - \frac{N}{2}\log(2\pi).
(5)

Equation (5) shows that this optimization entails inversion of the N × N covariance matrix. This incurs a sizable computational cost, which translates to scaling of order O(N³) for hyperparameter optimization.56 Consequently, although they achieve higher predictive accuracies than other statistical methods when modeling PESs,51,57,58 GPs require greater computational effort to train.51 For prediction, meanwhile, K⁻¹Y in Eq. (1) needs to be calculated only once, and as a result, the cost scales linearly with N. Even so, molecular simulations that employ a GP model for which N is large are more computationally intensive than those that use a traditional force field.
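As an illustration of this cost argument, a minimal sketch of Eq. (5) is given below; it assumes the covariance matrix K (with σ_n² already added to the diagonal) and the vector Y are available, and it uses a Cholesky factorization, the usual O(N³) step, as an implementation detail not specified in the text.

```python
import numpy as np

def log_marginal_likelihood(K, Y):
    """Evaluate Eq. (5); the Cholesky factorization is the O(N^3) step."""
    N = len(Y)
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))      # alpha = K^{-1} Y
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))             # log|K| from the Cholesky factor
    return -0.5 * Y @ alpha - 0.5 * log_det_K - 0.5 * N * np.log(2.0 * np.pi)
```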

These issues have led to attempts to minimize the training set size required by a GP model to achieve a given error. For two training strategies that achieve the same error, that which does so with fewer training points is more computationally efficient. Attempts to develop more computationally efficient training strategies have involved active learning or sequential design methods,19,29,59 composite kernels,24 and new sampling methods.27,60

However, no attempt was made in the previous works to increase training efficiency by learning the optimal crossover distance from the data used in training. Our boundary optimization approach has the potential to increase the gains in efficiency from these prior methods further still with little associated increase in computational time, provided a viable long-range approximation is available. Such an improvement is shown here, building upon the sequential design method of Uteva et al.19 

It was found by Uteva et al.17 that the predictive performance of a GP for a PES is enhanced by using inverse interatomic separations rather than non-inverse separations to describe intermolecular configurations. This is because k_sym(x, x′) is a stationary kernel (i.e., it relies only on the distance between the two configurations being compared), meaning it assumes that the rate of change of the output (i.e., the interaction energy) with the input (the interatomic distances) is constant. This is not the case for a PES, where the energy varies rapidly with separation in the short-range repulsive wall, but barely at all in the long-range asymptotic region. The r → r⁻¹ conversion addresses this issue: for low r, the rate of change in r⁻¹ is faster than in r, meaning that the change in the input more closely matches the change in the output under the former; meanwhile, for large r, the rate of change in r⁻¹ is far smaller, as is that of the energy.

Previously, Latin hypercube (LHC) sampling, which is a space-filling design, has been employed to build datasets when modeling PESs with GPs.17–19,61 The approach of Uteva et al.17 entailed designing LHCs in inverse separations over a range of angles that specified the symmetry-distinct region.17 The algorithm generates a large number of candidate LHCs and finds the minimum separation in each. The LHC with the largest such separation is selected under what is referred to here as the “maximin” criterion. Quantum chemical calculations are undertaken on the selected LHC only. This LHC is then subject to a high-energy cutoff, Ecut, and any configuration for which the interaction energy exceeds Ecut is discarded from the LHC.17 In addition, a geometric constraint of 8.5 Å was placed on all LHCs to ensure no configurations with minimum separations above this threshold were included.17 It was observed that at separations greater than the geometric constraint, the GP predictions tended toward a small, non-zero constant.17 As these predictions should tend toward zero, a long-range function derived from the multipole expansion62 was introduced to approximate energies at large separations.
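A sketch of the maximin LHC selection described above is given below. The candidate generator here draws simple random Latin hypercubes in the unit cube and is purely illustrative; the actual designs were built in inverse separations and angles over the symmetry-distinct region, and the high-energy cutoff is applied only after the ab initio calculations.

```python
import numpy as np
from scipy.spatial.distance import pdist

def random_lhc(n_points, n_dims, rng):
    """One random Latin hypercube sample in the unit hypercube [0, 1]^n_dims."""
    cols = [(rng.permutation(n_points) + rng.random(n_points)) / n_points
            for _ in range(n_dims)]
    return np.column_stack(cols)

def maximin_lhc(n_points, n_dims, n_candidates=1000, seed=0):
    """Generate candidate LHCs and keep the one whose smallest pairwise separation is largest."""
    rng = np.random.default_rng(seed)
    best, best_min_sep = None, -np.inf
    for _ in range(n_candidates):
        candidate = random_lhc(n_points, n_dims, rng)
        min_sep = pdist(candidate).min()        # smallest separation in this candidate
        if min_sep > best_min_sep:
            best, best_min_sep = candidate, min_sep
    return best
```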

In a later work, Uteva et al.19 presented active and sequential learning approaches, in which the model of the PES is progressively refined by adding new points to the GP. In this approach, training and validation of the GP models involve three datasets: a training, a reference, and a test set. The training set is used in the GP regression. The reference set provides a pool of configurations from which new training points can be selected, while the test set determines the GP’s accuracy against an independent dataset.

One sequential design strategy presented by Uteva et al.19 is the highest error search. This approach selects new training points based on the configuration in the reference set with the largest GP prediction error. We adopt this approach herein. However, Uteva et al.19 applied this strategy only at separations below the fixed, pre-determined crossover distance,19 meaning no new training data could be added at separations above Rcross.

The highest error search is best described as a sequential design rather than an active learning63,64 method. This is because active learning methods are a subset of sequential design methods that compute the output corresponding to an input only at the point it is added to the training set. This means, in the context of modeling PESs, an active learning method would use a reference set comprising configurations for which the energies were not calculated. Active learning strategies have, however, been successfully used to develop GP,19,29,59 moment tensor,15 and neural network13 models of PESs, and the methodology used here for boundary optimization could be altered for use with the two-set method of Uteva et al.,19 which is an active learning method.

When using a sequential design method, meanwhile, the energies in the reference and test sets are pre-computed using a relatively computationally inexpensive ab initio technique. Once a model with the requisite predictive accuracy is obtained, the energies in the minimal training set can be re-calculated using a more accurate, and costly, ab initio method before use in applications. This process is known as transfer learning,65,66 and it allows a GP model of a PES to be produced with the accuracy of a high-level ab initio technique with relatively few computationally expensive calculations. For example, MP267 energies have been upgraded to coupled cluster with single, double, and perturbative triple excitations [CCSD(T)]68 energies for calculation of the CO2–CO second virial coefficient.17 

In previous applications of both LHC learning17 and sequential design methods,19 a fixed value of the crossover distance, Rcross, was chosen a priori; however, it is possible to determine its optimal value from the reference data. This is boundary optimization and is possible because a sequential design method permits Rcross to be varied each time the GP is updated. Consequently, the size of the region over which GP regression is used for prediction will grow with the predictive accuracy of the GP.

Boundary optimization may lead to increased accuracy for the following reason. When the number of training points, NTP, is low, the predictive accuracy of the long-range function at the outer edge of the potential well will exceed that of the GP. Consequently, it is anticipated that allowing the long-range function to approximate the energies of configurations in this region at low NTP will increase the predictive accuracy of the overall model and facilitate more efficient model development.

Both the reference and test sets for each chemical system were generated under the LHC design strategy discussed in Sec. I B. All energies were calculated in Molpro69 using MP2 with an aug-cc-pVTZ basis set and the counterpoise correction, with the only exception being HF–Na+, which used an aug-cc-pwCVTZ basis set instead. The specifications of the LHCs for each system are given in Table I. Therein, r is the distance between the bond centers (not centers of mass), θ is the angle between r and the bond axis of the molecule, θ1 and θ2 are the angles between r and the bond axis of the first and second CO2 molecules, respectively, and ϕ is the torsional angle between the two CO2 molecules. The molecules were kept rigid, with rCO = 1.1283 Å for CO, rCO = 1.1632 Å for CO2, and rHF = 0.9170 Å. However, boundary optimization could be applied to non-rigid systems straightforwardly, with the only additional requirement being generalization of the long-range function to non-rigid molecules. A larger geometric constraint of 100 Å (instead of 8.5 Å17) was employed to probe the long-range behavior of the system. The maximum value of NTP was 100 for all systems apart from (CO2)2, which used 300 training points at most.

TABLE I.

The co-ordinates for the reference and test LHCs for each system. Nref and Ntest are the number of points in the reference and test sets, respectively, after application of the high-energy cutoff, while the maximum number of training points for models of each potential is given in the text. Also shown is the minimum energy across the reference and test sets, Emin, in Hartrees (Eh), although no attempt was made to approximate the global minimum energy for any potential.

System   | Coordinate | Range          | Nref | Ntest | Emin (Eh)
CO–Ne    | r⁻¹        | 0.01–0.67 Å⁻¹  | 1914 | 5718  | −1.502 × 10⁻⁴
         | cos(θ)     | −1 to 1        |      |       |
HF–Ne    | r⁻¹        | 0.01–0.67 Å⁻¹  | 2148 | 6468  | −2.633 × 10⁻⁴
         | cos(θ)     | −1 to 1        |      |       |
HF–Na+   | r⁻¹        | 0.01–0.67 Å⁻¹  | 2760 | 8416  | −2.518 × 10⁻²
         | cos(θ)     | −1 to 1        |      |       |
CO2–Ne   | r⁻¹        | 0.01–0.67 Å⁻¹  | 5057 | 5072  | −2.895 × 10⁻³
         | cos(θ)     | 0 to 1         |      |       |
(CO2)2   | r⁻¹        | 0.01–0.67 Å⁻¹  | 5810 | 5837  | −1.975 × 10⁻³
         | cos(θ1)    | 0 to 1         |      |       |
         | cos(θ2)    | 0 to 1         |      |       |
         | ϕ          | 0°–180°        |      |       |

A high-energy cutoff, Ecut, was applied to the reference and test sets to remove configurations with interaction energies exceeding its value. Ecut = 0.005 Eh for all systems apart from HF–Na+, for which Ecut = 0.05 Eh because the charge–dipole interaction increases the well depth. (1 Eh ≈ 27.211 eV ≈ 2625.5 kJ mol−1).

For systems with Ntest ≫ Nref, the independence of the test set is self-evident. However, for systems where Ntest ≈ Nref, it is also true that the test set is independent. This follows because, although the reference and test sets for each system were designed using the same “maximin” strategy, the stochastic nature of the LHC algorithm means that separate LHCs contain completely independent sets of configurations. Furthermore, the “maximin” criterion is based on just one separation in the whole dataset, meaning that two LHCs with similar maximin values will still be dissimilar. This is demonstrated in Fig. 1, which shows the energies against the inverse C–Ne separation for the reference and test sets used in training models of the CO2–Ne potential.

FIG. 1.

Calculated energies in the reference set (red) and the test set (black) of the CO2–Ne potential for separations between 2.85 and 4 Å. Both sets contain ∼5000 configurations (see Table I).


For the long-range energy model, multipole series were employed for all systems apart from HF–Na+. The contributions included in the multipolar long-range functions are shown in Table II for each system apart from CO2–Ne, for which the function is already described in previous work,17 and (CO2)2. The latter was developed prior to the other multipolar long-range functions and uses atomistic charge, dipole, quadrupole, and polarizability contributions from Hartree–Fock70 theory, which were scaled to give the known total molecular properties. For HF–Na+, a fitted long-range function was used, which was derived by fitting a sum of two power laws between two points where the energy was predicted by GP regression. More information on the motivation for and derivation of this function is found in the  Appendix.

TABLE II.

Properties included in the multipolar long-range functions of each system, as well as the ab initio methods used in their calculation. All calculations were carried out using an aug-cc-pVQZ basis set apart from those for the dispersion coefficients, which used an aug-cc-pVTZ basis set.

System          | Dipole    | Quadrupole | Polarizability | Dispersion
CO–Ne           | ✓         | ×          | ✓              | ✓
HF–Ne           | ✓         | ×          | ✓              | ✓
HF–Na+          | ✓         | ✓          | ✓              | ✓
Level of theory | MRCI71,72 | MRCI       | MP2            | CCSD

When modeling PESs using GPs, it is often necessary to classify configurations as suitable for prediction via the GP or via a long-range asymptotic function. In the work of Uteva et al.,17,19 the classifier formed a boundary from the superposition of atom-centered spheres defined by a single constant, Rcross. Specifically, if any interatomic distance was less than Rcross, the GP was used. As Rcross was fixed, this classifier is referred to here as Cfixed. Denoting the region in which GP regression was used for prediction as AGP and the region which employed the long-range function as ALR, under Cfixed, these regions were

A_{\mathrm{GP}} = \{\mathbf{r} : \min(\mathbf{r}) \le R_{\mathrm{cross}}\}
(6)

and

A_{\mathrm{LR}} = \{\mathbf{r} : \min(\mathbf{r}) > R_{\mathrm{cross}}\},
(7)

where r is a set of intermolecular atom–atom distances and min(r) is the smallest separation in r.

In boundary optimization, Rcross varies according to the GP accuracy. Thus, Rcross is not constant, but a model parameter, which is learnt from the reference data. As this classifier is still parameterized by a single value, Rcross, it is referred to here as Csingle.

More elaborate classifiers are possible by using more detailed parametric forms to define the boundary region. A simple way of defining a more complex classifier is for the value of Rcross to depend on the atom types that comprise the interatomic distance. The resulting classifier is referred to here as Cmulti. For a system of molecules with D interatomic pairs of chemically different atoms, using Cmulti requires the vector of crossover distances Rcross = (R1, …, Rd, …, RD). This defines a multiple-parameter boundary region as follows:

C_{\mathrm{multi}}(\mathbf{r}) = \begin{cases} A_{\mathrm{GP}} & \text{if } \min_{d}(\mathbf{r}) \le R_{d} \text{ for any } d, \\ A_{\mathrm{LR}} & \text{if } \min_{d}(\mathbf{r}) > R_{d} \text{ for all } d, \end{cases}
(8)

where mind(r) is the minimum separation in r that involves an atomic pair of type d.
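A compact sketch of the classification in Eq. (8) is shown below; `pair_type` (an illustrative name) maps each intermolecular distance to its atom-pair type d, and Csingle is recovered when Rcross contains a single repeated value (or D = 1).

```python
import numpy as np

def classify_multi(r, pair_type, R_cross):
    """Eq. (8): return "GP" if any distance whose pair type is d lies at or below its
    crossover distance R_cross[d]; otherwise return "LR"."""
    r, pair_type = np.asarray(r), np.asarray(pair_type)
    for d, R_d in enumerate(R_cross):
        dists_d = r[pair_type == d]
        if dists_d.size and dists_d.min() <= R_d:
            return "GP"    # configuration belongs to A_GP
    return "LR"            # configuration belongs to A_LR

# Illustrative example for CO2-Ne: pair type 0 = C-Ne, pair type 1 = O-Ne (distances in angstroms).
print(classify_multi([5.2, 4.1, 6.3], pair_type=[0, 1, 1], R_cross=[6.0, 4.5]))  # -> "GP"
```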

The optimum value of the classifier parameters is determined by minimizing the error between the model and the reference set (i.e., by minimizing the training error), meaning the sizes of AGP and ALR vary with the GP. The sum of squared errors, SSEtot, over the two regions is

\mathrm{SSE}_{\mathrm{tot}} = \mathrm{SSE}_{\mathrm{GP}} + \mathrm{SSE}_{\mathrm{LR}},
(9)

where

\mathrm{SSE}_{\mathrm{method}} = \sum_{i=1}^{N_{\mathrm{method}}} \left(\hat{Y}_{i} - Y_{i}\right)^{2}.
(10)

Here, “method” denotes either GP or LR, Nmethod is the number of points in Amethod, Ŷi is the prediction of the energy for the ith configuration from the desired method, and Yi is the calculated energy of the same configuration. The root-mean-square error (RMSE) against the test set, RMSEtest, is given by

\mathrm{RMSE}_{\mathrm{test}} = \left(\frac{\mathrm{SSE}_{\mathrm{test}}}{N_{\mathrm{test}}}\right)^{1/2},
(11)

where SSEtest is SSEtot over the test set and Ntest is the number of configurations in the test set. RMSEtest is therefore a function of Rcross with discrete steps, as the RMSE changes only when a change in Rcross causes a configuration in the test set to be re-classified. Both Csingle and Cmulti are simple parametric classifiers that pre-impose a mathematical form on the classification. Hence, neither is expected to be optimal with respect to the RMSE against the reference or test sets. That is, a more complicated boundary than that described by these classifiers will likely produce a lower RMSE against a given dataset. However, it is shown later that an artificial “ideal” classifier produces only very marginal improvements over Cmulti, suggesting this classifier provides a very good balance of simplicity and accuracy.

Using the data and calculation methods above, the algorithm generates a GP model sequentially as follows: Train the GP to the current training set, select the classifier parameters by minimizing the RMSE against the reference set, and move a new point from the reference to the training set based on the largest error. Below are further details of each step, along with how the choice of classifier and placement strategy affects the algorithm.

1. GP training

All GPs described herein were trained using the GPy73 package in Python 2.7. Optimization of the hyperparameters was carried out by maximizing log(L) using 20 independent restarts whenever a configuration was added to the training set. Moreover, a gamma distribution with an expectation of one and a variance of two was used as a prior on all hyperparameters for all systems. This was to weakly penalize large hyperparameter values, given that the expected values are typically of order 0.1 or below.
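A sketch of this training step using the GPy API is given below. Note that it uses GPy's built-in ARD RBF kernel rather than the symmetric kernel of Eq. (4) (which requires a custom kernel), and the exact attribute names may differ between GPy versions; the original work used Python 2.7.

```python
import numpy as np
import GPy

def train_gp(X, Y, n_restarts=20):
    """Fit a GP to inverse-separation inputs X (N x D) and energies Y (N x 1)."""
    kernel = GPy.kern.RBF(input_dim=X.shape[1], ARD=True)   # ARD RBF; not the symmetric kernel
    model = GPy.models.GPRegression(X, Y, kernel)
    # Gamma prior with expectation 1 and variance 2 on all hyperparameters,
    # weakly penalizing large values.
    prior = GPy.priors.Gamma.from_EV(1.0, 2.0)
    model.kern.variance.set_prior(prior)
    model.kern.lengthscale.set_prior(prior)
    model.Gaussian_noise.variance.set_prior(prior)
    model.optimize_restarts(num_restarts=n_restarts, verbose=False)
    return model
```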

2. Direct search algorithm

When using Csingle, Rcross can be optimized via a direct search that exploits how the RMSE varies as a piecewise constant function of Rcross. Because of this feature, all possible values of SSEtot (and hence the RMSE) can be computed readily against the reference set. A direct search of these values is guaranteed to find the global minimum. Full details of the direct search are given in Algorithm 1.

ALGORITHM 1.

Direct search algorithm.

1: Compute the squared errors for the GP and long-range function and min(r) at each configuration in the reference set. 
2: Order the configurations from smallest to largest in terms of min(r). 
3: Approximate SSEtot initially from the squared errors of the long-range function alone. 
4: Iterate through the min(r) from the lowest to the highest: a: set Rcross = min(r), which moves a single configuration from ALR to AGP; b: update SSEtot by deducting the squared long-range error of the moved point from SSELR and adding its squared GP error to SSEGP; c: store the new SSEtot. 
5: Select whichever value of Rcross corresponds to the smallest value of SSEtot

The direct search algorithm is computationally cheap for the following reasons. Each instance of the direct search requires only a single set of predictions from each of the long-range function and GP regression. As the long-range function is fixed throughout the sequential design process, these predictions need only be calculated once at the start. The GP predictions need to be re-calculated whenever the GP is updated, which occurs once per step of the sequential design. However, these predictions are already required by a sequential design step when choosing the new training point. Furthermore, the calculation of all possible squared error values (step 4) is cheap because calculating each value in order requires only a simple update of SSELR and SSEGP. Consequently, the direct search is fast compared to the other steps of the sequential design algorithm and can therefore be undertaken at each design step with negligible additional computational effort.
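A sketch of Algorithm 1 follows, assuming the squared errors of both prediction methods and min(r) have been pre-computed for every reference configuration and passed in as equal-length arrays.

```python
import numpy as np

def direct_search(min_r, se_gp, se_lr):
    """Algorithm 1: return the crossover distance that minimizes SSE_tot = SSE_GP + SSE_LR,
    given the smallest intermolecular distance min_r and the squared errors of the GP
    (se_gp) and long-range function (se_lr) for every reference configuration."""
    order = np.argsort(min_r)                        # step 2: sort by min(r)
    min_r, se_gp, se_lr = min_r[order], se_gp[order], se_lr[order]
    sse_tot = se_lr.sum()                            # step 3: all points start in A_LR
    best_sse, best_R = sse_tot, 0.0
    for i in range(len(min_r)):                      # step 4: move points into A_GP one by one
        sse_tot += se_gp[i] - se_lr[i]
        if sse_tot < best_sse:
            best_sse, best_R = sse_tot, min_r[i]
    return best_R, best_sse                          # step 5: best R_cross found
```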

3. Orthogonal direct search algorithm

An adaptation of the direct search algorithm is required for Cmulti, as this involves a multidimensional optimization over multiple classifier parameters. This is called the orthogonal direct search as it optimizes one element of Rcross while keeping the others fixed. The single element that varies is optimized using the direct search algorithm, as this is guaranteed to return the best minimum along that 1D slice of Rcross. Although orthogonal optimizations can be time-consuming, the speed of the one-dimensional direct search means that repeating it multiple times for all crossover distances is feasible.

The orthogonal direct search proceeds via Algorithm 2. As in the one-dimensional direct search, the squared errors of the long-range and GP predictions at each configuration are pre-computed for this algorithm. The orthogonal search algorithm is not guaranteed to find the global minimum, as local minima may exist in the RMSE landscape. Hence, NRmax restarts are performed with randomly selected starting points. Each restart involves NImax optimizations of each Rd. Here, NImax = 15 and NRmax = 5 for all systems. If the values in Rcross converge such that further one-dimensional searches in any orthogonal direction do not change its elements prior to NI = NImax, Rcross is saved and the next restart is undertaken. In fact, it was rare that NI reached NImax for any of the systems explored here. Moreover, despite the low NRmax employed, the same minimum was usually found across multiple restarts.

ALGORITHM 2.

Orthogonal direct search algorithm.

1: Choose limits NImax and NRmax on the number of iterations and restarts, respectively (see the text for details). 
2: Assemble the array of minimum distances M, as follows: a: For every configuration, collect all interatomic distances which comprise the same atom types into group d and find mind(r) for each d. b: Arrange the lists of mind(r) values in an N × D array, M, where N is the number of configurations in the dataset and D is the number of unique atomic pairs in r. c: For each column in M, order the values from smallest to largest to produce an ordered list in each column. 
3: For each restart, select a row from M at random to be the initial guess at Rcross
4: For each d in turn, fix all cross-over distances apart from Rd and find the optimal value of Rd using a direct search. 
5: Repeat step 4 until NI = NImax or the elements of Rcross remain unchanged; save this Rcross and its corresponding SSEtot
6: Repeat steps 3–5 until NR = NRmax
7: Select the Rcross that corresponds to the lowest value of SSEtot

The direct search offers a method that is at once very fast, designed for the discrete-stepped error surface, and exact in a single dimension. This latter property means it skips over any local minima in a given orthogonal direction. As such, optimal crossover distances are obtained quickly and reproducibly under this method. Typically, about 80% of random restarts return the same set of crossover distances for a given system and a given training set.
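A sketch of Algorithm 2 is given below. For brevity, each one-dimensional search simply re-evaluates SSEtot at every candidate value of Rd rather than using the incremental update of Algorithm 1, so it is slower than the implementation described above but returns the same result.

```python
import numpy as np

def orthogonal_direct_search(M, se_gp, se_lr, n_restarts=5, n_iter_max=15, seed=0):
    """Algorithm 2 sketch: M is the N x D array of per-pair-type minimum distances mind(r)
    for the reference configurations; se_gp and se_lr are the corresponding squared errors."""
    rng = np.random.default_rng(seed)
    N, D = M.shape

    def sse_tot(R):
        in_gp = (M <= R).any(axis=1)        # Eq. (8): any pair type at or below its R_d
        return se_gp[in_gp].sum() + se_lr[~in_gp].sum()

    best_R, best_sse = None, np.inf
    for _ in range(n_restarts):             # steps 3-6: random restarts
        R = M[rng.integers(N)].copy()       # step 3: initial guess is a random row of M
        for _ in range(n_iter_max):
            R_previous = R.copy()
            for d in range(D):              # step 4: one-dimensional search along each R_d
                candidates = np.unique(M[:, d])
                errors = [sse_tot(np.where(np.arange(D) == d, c, R)) for c in candidates]
                R[d] = candidates[int(np.argmin(errors))]
            if np.array_equal(R, R_previous):   # step 5: converged before NImax iterations
                break
        if sse_tot(R) < best_sse:
            best_R, best_sse = R.copy(), sse_tot(R)
    return best_R, best_sse                 # step 7: best R_cross over all restarts
```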

4. Training point placement methods

Once the classifier parameters are optimized, the PES model is specified and the next training point is determined from the highest error method. The point with the greatest error can be selected either from AGP alone or from the union of AGP and ALR. Using AGP alone is referred to as the constrained placement method, which proceeds via Algorithm 3. Steps 2–5 of this algorithm comprise a stage of training, with the RMSEtest also calculated at each such stage.

ALGORITHM 3.

Constrained placement algorithm.

1: Select the configuration in the reference set with the highest energy; add this configuration to the training set and remove it from the reference set. 
2: Retrain the GP to the updated training set. 
3: Determine the boundary that minimizes the RMSE against the reference set (see Subsection II E 2). 
4: Find the configuration in AGP for which the GP error is highest, where AGP is defined by the boundary from the previous step. 
5: Add this configuration to the training set and remove it from the reference set. 
6: Repeat steps 2–5 until the desired RMSE or number of training points is reached. 

Choosing new points from the union of AGP and ALR is named the open placement method. This proceeds identically to the constrained placement method (Algorithm 3), except that the highest-error point may come from either AGP or ALR. In each region, this point is found using the corresponding method of prediction (i.e., the error of the long-range function is used in ALR).

Models were trained under both point placement methods because each held potential advantages over the other. The constrained placement method ensures that all training points are added in the GP region and so are of immediate use in prediction. Meanwhile, the open placement method is capable of immediately placing points in regions of the PES where the long-range function performed poorly, potentially transferring these regions to AGP more rapidly than under the constrained placement approach.
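The overall sequential design loop, covering Algorithm 3 and its open-placement variant, can be sketched as follows; all of the callables passed in are placeholders for the GP training, prediction, long-range, boundary optimization, and classification routines described above.

```python
import numpy as np

def sequential_design(X_ref, Y_ref, train_gp, predict_gp, predict_lr,
                      optimize_boundary, classify, n_max=100, constrained=True):
    """Sketch of Algorithm 3 (constrained placement) and its open-placement variant.
    The callables are placeholders: train_gp fits the GP, predict_gp/predict_lr return
    energies, optimize_boundary runs the (orthogonal) direct search, and classify
    returns a boolean array that is True for configurations assigned to A_GP."""
    train_idx = [int(np.argmax(Y_ref))]                       # step 1: highest-energy point
    ref_idx = [i for i in range(len(Y_ref)) if i not in train_idx]

    for _ in range(n_max - 1):
        gp = train_gp(X_ref[train_idx], Y_ref[train_idx])                   # step 2
        R_cross = optimize_boundary(gp, X_ref[ref_idx], Y_ref[ref_idx])     # step 3
        se_gp = (predict_gp(gp, X_ref[ref_idx]) - Y_ref[ref_idx]) ** 2
        se_lr = (predict_lr(X_ref[ref_idx]) - Y_ref[ref_idx]) ** 2
        in_gp = classify(X_ref[ref_idx], R_cross)
        if constrained:
            errors = np.where(in_gp, se_gp, -np.inf)    # step 4: search A_GP only
        else:
            errors = np.where(in_gp, se_gp, se_lr)      # open placement: either region
        new_point = ref_idx[int(np.argmax(errors))]
        train_idx.append(new_point)                     # step 5: move point to training set
        ref_idx.remove(new_point)                       # step 6: repeat until n_max points
    return train_idx
```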

5. Closest model training strategy

In the present work, all training strategies used are combinations of a classifier and a point placement method. A further strategy, which is intended for comparison only, is the closest model training strategy. This employs the open placement method to select training points and classifies a configuration using Coptimal, where

C_{\mathrm{optimal}}(\mathbf{r}) = \begin{cases} A_{\mathrm{GP}} & \text{if } \mathrm{SE}_{\mathrm{GP}} \le \mathrm{SE}_{\mathrm{LR}}, \\ A_{\mathrm{LR}} & \text{otherwise}. \end{cases}
(12)

Here, SEmethod is the squared error in the prediction from “method” at r. Coptimal is so named because it classifies a configuration based on whether GP regression or the long-range function best approximates its energy.
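In code, Coptimal reduces to a single comparison of the two squared-error arrays used above (a sketch):

```python
def classify_optimal(se_gp, se_lr):
    """Eq. (12): assign a configuration to A_GP wherever the GP squared error does not
    exceed that of the long-range function; requires the true energy, so it cannot be
    used for prediction."""
    return se_gp <= se_lr    # boolean array: True -> A_GP, False -> A_LR
```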

Equation (12) shows that Coptimal employs no boundary and so requires prior knowledge of the energy of a configuration in order to classify it. Consequently, models obtained from the closest model strategy are unsuitable for prediction. However, this method represents an “ideal” classifier that is guaranteed to find the optimal RMSEtest for a given GP model. Hence, Coptimal is useful for estimating how inaccuracies in the parametric classifiers affect the training efficiency. If models from Coptimal significantly outperform the other classifiers, this suggests that the other classifiers are too simple to properly approximate the true boundary.

There are circumstances where Coptimal may not give an optimal RMSE against the test set. There may be short-range hypersurfaces where the interaction energy is predicted near-exactly by the long-range function due to chance. Points in the reference set that are near these hypersurfaces will be classified by Coptimal as part of ALR instead of AGP. If the configuration density of the reference set around such a hypersurface is insufficiently high, no training points will be added in its vicinity. Consequently, points in the test set near the hypersurface will be inadequately approximated by either method of prediction; the GP will perform poorly due to a lack of nearby training points, and the long-range function will be inaccurate for short-range points that are not extremely close to the hypersurface. This problem was avoided by using dense reference sets and by not using the constrained placement strategy with Coptimal.

6. Combining classifiers and placement strategies

The sequential design method, or training strategy, requires a choice of classifier and placement strategy. The combinations of these examined in this work are given in Table III. Methods that do not involve Coptimal can make predictions and so are suitable for applications. The method involving Coptimal can only classify points if the true energy is already known and so is only useful to estimate the loss in performance due to inaccuracies in the parametric classifiers. The fixed boundary method corresponds to the method of Uteva et al.19 and is included to allow comparison with this prior method, on which the new methods build. The method of Uteva et al.19 was previously shown to significantly reduce the number of training points compared to the LHC design. It is demonstrated below that these new boundary optimization methods improve further the already efficient methods of Uteva et al.19 

TABLE III.

Classifier and training point placement method for all training strategies examined here.

Training strategy  | Classifier | Point placement method
Single-constrained | Csingle    | Constrained placement
Multi-constrained  | Cmulti     | Constrained placement
Single-open        | Csingle    | Open placement
Multi-open         | Cmulti     | Open placement
Closest model      | Coptimal   | Open placement
Fixed boundary     | Cfixed     | Constrained placement

Comparisons of the performances of the different training strategies are made using the HF–Ne, HF–Na+, CO–Ne, CO2–Ne, and (CO2)2 potentials. These were selected as they provide a range of interaction types and well depths to test the robustness of the new training strategies.

The number of training points is NTP, which in each system was less than 10% of the number of configurations in the corresponding reference set (see Table I and the related text). Consequently, the training sets for the models discussed are candidates for transfer learning because their small size makes, for example, CCSD(T) calculations possible for the whole training set. Boundary optimization is anticipated to improve training efficiency as, for small NTP, the long-range function will outperform the GP at the outer edge of the potential well. This is illustrated in Fig. 2 for the (CO2)2 system using a GP model trained under the single-constrained strategy up to NTP = 21.

FIG. 2.

Predictions of the GP (blue) and the long-range function (red) for a slice through the PES of the (CO2)2 potential in which the two molecules are in a T-shaped configuration. The crossover distance for this model is shown by the black line at 4.505 Å. The data points (black circles) are independent of the GP training data.


Figure 2 shows that the predictive accuracy of the long-range function exceeds that of GP regression for configurations with C–C separations above the crossover distance. Despite this, in the fixed boundary method, these configurations would be predicted with the GP. This not only reduces the overall accuracy but also means that new training points must be placed at long range to address it, when simply allowing the long-range function to predict these energies would have given adequate accuracy.

The impact of the above on training efficiency compared to fixed boundary training is shown for HF–Ne in Fig. 3. For Cmulti, the RMSE falls faster with NTP compared to Cfixed, indicating improved training efficiency. This improvement is most pronounced when NTP is low and so RMSEtest is high. This is expected because increasing the size of the training set increases the size of the GP region, meaning Rcross in the boundary-optimized strategies will approach the fixed value of 8.5 Å. Consequently, the difference between the fixed boundary and boundary-optimized models will close as NTP increases. Equivalent plots to Fig. 3 for all training strategies for all systems are found in the supplementary material. The RMSE data are somewhat noisy. Possible causes of this noise are minor variations in the hyperparameters upon retraining, the discrete nature and stochastic design of the reference set, and the low values of NTP meaning that the addition of a single point has a non-smooth effect on the RMSE. The new methods herein have considerably lower noise than the prior fixed boundary method.

FIG. 3.

Plot of RMSEtest/Eh against NTP on a log10 scale for models of the HF–Ne PES trained via the multi-constrained (circles) and fixed boundary (triangles) training strategies. The blue circles and pink triangles are points that were included in the fitting of the lines shown.


To compare the efficiency gain over fixed boundary training across all new strategies from Subsections II E 5 and II E 6, a metric, E, is used. This compares NTP required by two different strategies to achieve a given RMSEtest. When comparing a new training strategy with fixed boundary training,

E(\mathrm{RMSE}_{\mathrm{test}}) = \frac{N_{\mathrm{TP}}(\mathrm{RMSE}_{\mathrm{test}})}{N_{\mathrm{TP,fixed}}(\mathrm{RMSE}_{\mathrm{test}})} \times 100\%,
(13)

where NTP and NTP,fixed are the number of training points required by the new strategy and fixed boundary training, respectively. These quantities and E are shown as functions of RMSEtest as they vary with its value.

For a given RMSEtest, the values of NTP and NTP,fixed are determined from least squares fits of log10(RMSEtest) vs log10(NTP), with examples of such fits shown in Fig. 3 for the HF–Ne potential. Fits are made in the region where the RMSE decays as a power law of NTP. For all systems, this corresponds to 1 × 10−6 Eh ≤ RMSEtest ≤ 1 × 10−4 Eh, apart from HF–Na+, where the region is 1 × 10−5 Eh ≤ RMSEtest ≤ 1 × 10−3 Eh due to the larger high-energy cutoff for this potential. The fits provide continuous lines to interpolate to any RMSEtest within the above range. This enables a comparison of NTP between training strategies at fixed RMSEtest that accounts for the somewhat noisy data. Moreover, this is the range of errors in which the models are useful for applications and the decrease in log(RMSEtest) with log(NTP) is linear.
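A sketch of how E in Eq. (13) can be evaluated from the noisy data is given below, assuming the (NTP, RMSEtest) points supplied for each strategy have already been restricted to the power-law region.

```python
import numpy as np

def efficiency_metric(ntp_new, rmse_new, ntp_fixed, rmse_fixed, rmse_target):
    """Eq. (13): fit log10(RMSE) against log10(N_TP) for each strategy within the
    power-law region, interpolate N_TP at the target RMSE, and return E in percent."""
    def n_at_rmse(ntp, rmse, target):
        slope, intercept = np.polyfit(np.log10(ntp), np.log10(rmse), 1)  # RMSE = a * N_TP^slope
        return 10.0 ** ((np.log10(target) - intercept) / slope)
    return 100.0 * (n_at_rmse(ntp_new, rmse_new, rmse_target)
                    / n_at_rmse(ntp_fixed, rmse_fixed, rmse_target))
```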

The fitted equations for each training strategy for the CO–Ne system are given in Table IV as power laws in NTP. Equivalent tables for all other systems are found in the supplementary material. As the data are noisy, the tables in the supplementary material also show the R2 value of each fit. In the case of CO–Ne, these values evidence the high quality of the fits. In fact, no fit from any training strategy for any system achieves an R2 value lower than 0.8907 (from the fixed boundary training strategy on the HF–Na+ system). Furthermore, Fig. 3 and the corresponding plots in the supplementary material show that the RMSE data follow a straight line (in a log–log plot). This implies that the departure of R2 from unity arises from scatter in the data rather than unsuitability of the fitting function.

TABLE IV.

Equation of the lines of best fit for RMSEtest of models from all training strategies for CO–Ne as power laws in the number of training points, NTP. Also shown are the R2 values of each fit on the data. All fits were over points in the range 1 × 10−6Eh ≤ RMSEtest ≤ 1 × 10−4Eh.

Training strategy            | Line of best fit to RMSEtest | R²
Single-constrained placement | 4.694 NTP^−3.842             | 0.929
Multi-constrained placement  | 1.551 NTP^−3.647             | 0.944
Single-open placement        | 2.901 NTP^−3.718             | 0.921
Multi-open placement         | 3.113 NTP^−3.830             | 0.958
Closest model                | 0.865 NTP^−3.490             | 0.944
Fixed boundary               | 546.0 NTP^−5.069             | 0.914

Re-arranging the equations in Table IV provides expressions for NTP in terms of RMSEtest for the CO–Ne potential. These give E as a function of RMSEtest, via Eq. (13), for all new training strategies herein. This was done for all other potentials as well, with plots of E against RMSEtest for all training strategies for each system given in Fig. 4.

FIG. 4.

Plots of E against RMSEtest for all potentials are shown in parts (a)–(e). The potential referred to in each frame is shown in the upper right corner.


As observed earlier, the training efficiency gains in Fig. 4 are more pronounced at high RMSEtest for all training strategies across all systems. This suggests that boundary optimization is most effective when the training set is small, making this technique ideal for applications where a computationally cheap but less accurate potential is required. Nevertheless, significant reductions in the number of training points are also obtained in the RMSE range where PESs become useful for first-principles predictions. For example, a GP potential with an RMSE of 3 × 10−4 eV per atom (1.1 × 10−5Eh per atom) was employed in a recent simulation of the thermal properties of β-Ga2O3.31 Furthermore, Uteva et al. successfully determined the CO2–CO second virial coefficient using a GP PES with an RMSE of 2.4 × 10−5Eh.17 In this RMSE range, the boundary optimization methods typically reduce the required number of training points by 15%–33% (see Fig. 4).

Generally, the closest model strategy generates the largest efficiency gain, while the smallest improvement comes from strategies involving Csingle. Efficiency gains only slightly below those from the closest model strategy are achieved by the strategies that use Cmulti. This hierarchy of improvement implies that the choice of classifier, rather than the point placement strategy, is most important because strategies with the same classifier perform more similarly than those with the same point placement method. The closest model strategy is included only to illustrate the total training efficiency gain possible from an “ideal” classifier, and it is encouraging that the best boundary optimization methods are close to this ideal case. Indeed, the difference in E between this ideal method and the closest-performing Cmulti strategy never exceeds ∼3% for any system. This implies that Cmulti captures the true shape of the boundary region sufficiently well for the systems explored. Thus, introducing a more detailed classifier would not be worth the increased cost of evaluating the PES.

Cmulti always outperforms Csingle. However, for the CO2–Ne potential [Fig. 4(d)], Cmulti also outperforms the closest model strategy. This is because the long-range function is nearly exact for a group of short-range configurations in the reference set. This is shown in Fig. 5 by the thin “peninsula” of points that are best estimated by the long-range function (in red), which encroaches deep into the GP region (in blue). This “peninsula” exists because the long-range function is of higher energy than the MP2 data in some regions of the PES and of lower energy in others, meaning there must be some hypersurface in between where the two are equal. Prediction for test configurations close to the “peninsula” is problematic under Coptimal unless the reference set is very dense in this region. This is because the exact predictions of the long-range function on the “peninsula” mean no training points are added there, leading the closest model strategy to perform relatively poorly for the CO2–Ne potential.

FIG. 5.

Plot of the x and y coordinates of the Ne in the CO2–Ne system for configurations in the reference set. The CO2 molecule is aligned along the x-axis with C at the origin. Points are classified as GP points (blue) or asymptotic points (red) by Coptimal using a model from the closest model strategy trained up to NTP = 100.


The total gain in training efficiency achieved by boundary optimization varies somewhat between systems. While the best-performing training strategy for HF–Ne improved training efficiency by between 25% and 39% (i.e., E = 61%–75%), for (CO2)2, the gain was only 12%–18% (E = 82%–88%). The more limited gains for (CO2)2 may be because the a priori choice of Rcross = 8.5 Å, required for the fixed boundary method, is reasonable for this system. Nevertheless, even for the (CO2)2 potential, the use of multi-constrained training confers an efficiency gain of ∼18% over fixed boundary training. This means that for all of the potentials explored, the use of a training strategy that employs Cmulti confers a useful improvement over fixed boundary training.

Boundary optimization improves the training efficiency due to more effective placement of training points. This is illustrated in Fig. 6, which shows, for HF–Ne, the differences in training point placement for three training methods: fixed boundary, single-constrained, and closest model. While all place most points at separations below 3 Å, indicating that the repulsive wall is the hardest region to model, the placement of points at larger separations diverges between methods. For the first 20 training points, few configurations are placed beyond 3 Å for the single-constrained and closest model strategies. Thereafter, the distance at which training points are added slowly increases, even for the closest model strategy, which does not employ a boundary directly. In fact, single-constrained training adds training points beyond 8 Å only after ∼90 training points have been placed and the closest model strategy does not add any training points above 7 Å at all.

FIG. 6.

Plots showing training point placement for the first 100 training points for models of the HF–Ne PES trained using the fixed boundary (a), single-constrained placement (b), and closest model (c) strategies. These points are colored based on the shortest interatomic distance in the configuration, with only this shortest distance shown for each. The boundary values are represented by the black lines where applicable.


In contrast, the fixed boundary training method adds its fifth training point at a minimum separation of above 5 Å and its eighth at 8.5 Å. This demonstrates that a fixed boundary strategy switches between placing training points in the repulsive wall and at the boundary from the onset of training. This difference is because fixed boundary training requires the GP to predict energies at separations up to 8.5 Å from the start of training. Consequently, training points must be added at separations near the 8.5 Å boundary from the onset, even though the energies at these separations are very small and generally well-approximated by the long-range function. This contrasts with the boundary-optimized and closest model strategies, which allow the long-range function to approximate configurations at separations around 8.5 Å when the number of training points is low. This produces more efficient model development under boundary-optimized and closest model training because point placement can be focused on the short-range region of the PES, where the energy varies more rapidly with configuration.

Training point plots for the CO–Ne potential, given in Fig. 7, show that the single-open strategy extends AGP much further than single-constrained training. This suggests that the capacity to place points in ALR facilitates the faster expansion of the boundary for this system. However, these plots also indicate that this discrepancy is most noticeable when the number of training points is large; specifically, the rapid expansion of the boundary under single-open training was instigated by the placement of a single training point at long range, which facilitated the increase in Rcross to 63.0 Å. Prior to this, the crossover value from single-open training was quite similar to that from single-constrained training (both were ∼17 Å). Hence, the two training strategies differ significantly only when the predictions from GP regression are already highly accurate, as only at this stage are the predictions of the long-range function poor enough by comparison to merit placement of a training point at long range. The other systems also show a close similarity between the crossover distances from the single-constrained and single-open strategies, with equivalent plots to Fig. 8 found in the supplementary material. In addition, Fig. 8 shows that for the CO2–Ne potential, the values of Rcross from the two strategies are similar throughout training. Such a trend suggests that the choice of point placement strategy does not significantly change the value of Rcross, implying once more that the choice of classifier is of greater importance.

FIG. 7.

Plots showing training point placement for the first 100 training points for models of the CO–Ne PES trained using the single-constrained placement (a) and single-open placement (b) strategies. These points are colored based on the shortest interatomic distance in the configuration, with only this shortest distance shown for each. The boundary values are represented by the black lines.

FIG. 8.

Plot showing the values of Rcross achieved for the first 100 training points by single-open and single-constrained training for models of the CO2–Ne potential.


Similarities in the evolution of the boundary are seen when the crossover distances achieved under Cmulti and Csingle are compared for a given point placement method. Figure 9 shows the results of such a comparison for the HF–Na+ and CO2–Ne systems. The crossover distances generally grow at a similar rate, but the single value of Rcross under Csingle is consistently larger than all of the entries in the vector Rcross under Cmulti. This suggests that there are differences between Csingle and Cmulti, which manifest in both the training efficiency and boundary placement. The larger GP region under Csingle compared with Cmulti is explained by noting that both are approximations of an “ideal” classifier. When NTP is low (i.e., one or two), such a classifier will attempt to make AGP as small as possible because the training set comprises configurations from the repulsive wall only. Consequently, given the same training set, Csingle and Cmulti also try to minimize the size of AGP. As Cmulti is more flexible, its approximation of the “ideal” classifier will be closer than that of Csingle, meaning AGP under Csingle will be initially larger. Other than the repulsive wall, the largest errors tend to occur at separations around the boundary. Thus, due to its larger AGP, a Csingle strategy will place its first long-range training point at a larger separation than a Cmulti strategy. This facilitates faster expansion of the boundary under Csingle than Cmulti regardless of which point placement method is used.

FIG. 9.

Plots showing the value of Rcross and entries in Rcross for the first 100 training points for models of the HF–Na+ potential from single/multi-open training (a) and the CO2–Ne potential from single/multi-constrained training (b).


Figure 9(a) shows that RH–Na+ < RF–Na+ throughout most of the training for the HF–Na+ potential, meaning that the interaction involving the larger atom (F) obeys a larger crossover distance. Such ordering of the crossover distances also applies to the HF–Ne system for most of the training. Moreover, Fig. 9(b) shows that the ordering of the crossover distances for the CO2–Ne potential is consistent, with RO–Ne < RC–Ne throughout the training. In fact, for the (CO2)2 and CO–Ne potentials, it holds generally throughout training that RO–O < RO–C < RC–C and RO–Ne < RC–Ne, respectively, for both point placement methods. This implies that the ordering of the crossover distances under Cmulti is consistent between systems as well as being physically reasonable.

Additionally, Figs. 6–9 show that the crossover distance increases with the size of the training set. This is expected because a larger training set gives the GP more information with which to infer the function describing the PES, and the resulting, more accurate GP permits larger crossover distance(s). This is the case for all training strategies across all systems.

Boundary optimization methods decrease the cost of using the PES by reducing both the number of training points and the size of the GP region. Fewer training points make GP evaluations cheaper, as the cost of a GP prediction is proportional to NTP. Additionally, the smaller crossover distance(s) of the boundary-optimized model mean that in any application the GP is used less often, in favor of the much cheaper asymptotic function. Hence, boundary optimization produces models that are more efficient to implement in applications than fixed boundary training, with no reduction in overall accuracy.
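As an illustration of why the boundary reduces the cost of predictions, the sketch below routes each configuration either to the cheap asymptotic function or to a GP mean prediction whose cost is linear in NTP. The names gp_predict, in_gp_region, and long_range_fn are assumed, generic names rather than interfaces from this work.

```python
import numpy as np

def gp_predict(x, x_train, alpha, kernel):
    """Mean prediction of a trained GP: a kernel vector against the N_TP
    training points dotted with precomputed weights alpha, so the cost per
    evaluation scales linearly with N_TP."""
    k_vec = np.array([kernel(x, xt) for xt in x_train])
    return k_vec @ alpha

def predict_energy(config, x_train, alpha, kernel, long_range_fn, in_gp_region):
    """Route the prediction: the GP is consulted only inside the learnt
    boundary; beyond it the much cheaper closed-form asymptotic function
    is used instead."""
    if in_gp_region(config):
        return gp_predict(config, x_train, alpha, kernel)
    return long_range_fn(config)
```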

Boundary optimization also increases the reproducibility of training. That is, two separate models produced by the same training strategy from the same reference set will be more similar under a boundary-optimized strategy than under a fixed boundary. For example, when modeling the CO2–Ne potential with single-open training, the results from two separate training runs were identical, as shown in Fig. 10. This is noteworthy because the values of the hyperparameters selected when maximizing log(L) can vary slightly, even for the same training set. Such variations can alter the predictions of the GP, leading to different crossover distances and the selection of different training points. The fact that two separate runs of the same training strategy are completely identical is therefore encouraging.

FIG. 10. Plots showing the first 100 training points and values of Rcross from two runs of single-open training (a) and fixed boundary training (b) for models of the CO2–Ne potential.

Furthermore, Fig. 10 shows that for fixed boundary training, separate runs were identical only up to NTP = 27. While the exact reproducibility seen in Fig. 10(a) is not present for the other potentials, there is generally significantly less difference between independent runs of the boundary-optimized training methods than between runs of fixed boundary training. For example, models of the HF–Ne potential from single-constrained and single-open training are reproducible up to NTP = 30 and NTP = 41, respectively, compared with NTP = 16 for fixed boundary training. Plots equivalent to Fig. 10 for this system under these strategies are given in the supplementary material. This increase in reproducibility relative to fixed boundary training does not transfer to the Cmulti strategies, likely because a direct search is not as reproducible in multiple dimensions as in a single dimension. However, the R2 values in Table IV show that using either Cmulti or Csingle reduces the scatter in the RMSEtest data relative to using Cfixed, as evidenced by the larger R2 values achieved when training with the former two. This trend is repeated across all the other potentials examined here, which suggests that boundary optimization leads to more stable hyperparameter selection and hence more consistent predictions.

For the (CO2)2 system, reproducibility is seen not just between repeat runs of identical training strategies but also between the training strategies that use Csingle. This is illustrated in Fig. 11, which shows that the single-constrained and single-open training strategies choose identical training points until NTP = 210. This observation explains why the two methods have identical E in Fig. 4(e): at NTP = 210, the RMSEtest values for single-open and single-constrained placement were 2.817 × 10−7 Eh and 2.784 × 10−7 Eh, respectively, so the error was already too low at this NTP to be included in the fit, meaning the two strategies were identical over the range of RMSEtest values used in fitting.

FIG. 11. First 300 training points and values of Rcross from single-open and single-constrained training for the (CO2)2 potential.

It has been shown that boundary optimization produces GP PESs of the same accuracy using fewer training points than fixing Rcross a priori. This improvement in efficiency is hierarchical: a boundary defined by a single, variable crossover distance offers a modest improvement, and a boundary defined by multiple such distances facilitates a further gain.

The results presented imply that the classifier is more important to the training strategy than the point placement method. In the RMSE range suitable for first-principles calculations (∼2 × 10−5 Eh), the boundary optimization methods typically reduce the required number of training points by 15%–33% relative to a training strategy that is already established as efficient.19 Because of their reduced training set size, the resulting boundary-optimized PESs are strong candidates for transfer learning, in which the existing ab initio calculations are upgraded to a higher level of theory. Furthermore, because the size of the GP region increases with the size of the training set only as needed, the resulting GPs are also less computationally intensive in applications than those from fixed boundary methods, as they employ the GP over a smaller region of phase space.

The classifier Cmulti, which uses different crossover distances for different atomic pairs, performed almost as well as an “ideal” classifier. Across all systems, the largest difference in performance between the closest model strategy, which uses an “ideal” classifier, and the nearest boundary-optimized strategy was ∼3%. In all cases, the best-performing boundary-optimized strategy employed Cmulti, implying that a classifier comprising a spherical boundary with a unique radius for each unique atomic pair captures the true boundary effectively. Further refinement of the classifier would therefore not reduce the number of training points enough to justify the extra expense of the classifier.

Under boundary optimization, the crossover distances are learned from the reference data using a direct search. This search is fast enough to be used at every stage of training, whether one or many crossover distances are employed, and in the multi-dimensional case it returns crossover distances in a physically reasonable order. Moreover, both direct search algorithms can easily be used in conjunction with another machine learning technique or applied to another chemical interaction. In fact, a direct search can be applied to any problem in which a boundary is sought between a good method of approximation in one region of phase space (here, the long-range function) and a machine learning technique in another. The methodology that underpins boundary optimization is therefore fast and flexible, in addition to being effective at reducing the computational expense associated with training a GP model of an intermolecular PES.
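A minimal sketch of the one-dimensional direct search is given below. The candidate grid, the error criterion (RMSE of the combined model over the reference set), and all function names are assumptions used only to illustrate the idea; the exact objective used in this work may differ.

```python
import numpy as np

def direct_search_rcross(candidates, reference_configs, reference_energies,
                         gp_energy, lr_energy, min_separation):
    """One-dimensional direct search for the crossover distance.

    For each candidate R, every reference configuration is predicted by the
    GP if its shortest relevant separation is below R and by the long-range
    function otherwise; the candidate giving the lowest RMSE against the
    reference energies is returned.
    """
    best_r, best_rmse = None, np.inf
    for r in candidates:
        preds = np.array([
            gp_energy(c) if min_separation(c) < r else lr_energy(c)
            for c in reference_configs
        ])
        rmse = np.sqrt(np.mean((preds - reference_energies) ** 2))
        if rmse < best_rmse:
            best_r, best_rmse = r, rmse
    return best_r
```

The multi-dimensional version used with Cmulti can be viewed as the same search applied over a vector of per-pair distances rather than a single scalar.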

Physical systems in which the behavior crosses over from a simple asymptotic function to more complicated behavior are common in many fields. A prominent example is the transition from ideal to non-ideal gas behavior. As the boundary optimization techniques herein exploit this crossover in behavior, there are many potential applications of this technique to physical problems beyond intermolecular potentials.

See the supplementary material for more plots of results from the explored systems.

The computations of all reference and test sets described in this paper were performed using the University of Nottingham’s Augusta HPC service, which provides a High Performance Computing service to the University’s research community. See https://www.nottingham.ac.uk/it-services/research/hpc/ for more details. The authors also acknowledge funding provided by the Leverhulme Trust Doctoral Scholarship through Modelling and Analytics for a Sustainable Society (MASS) at the University of Nottingham.

The authors have no conflicts to disclose.

The data that support the findings of this study are openly available in the Boundary Optimization Data repository at https://doi.org/10.6084/m9.figshare.16610713.v1 (Ref. 74).

HF–Na+ was the only intermolecular potential examined above for which a multipolar long-range function was not used; instead, a fitted empirical long-range function was employed. Although determining an optimal long-range function was not the goal of this work, a function accurate enough that configurations were not all rapidly transferred to the GP region was needed for proof of concept. The multipolar long-range function was unsuitable for this because of the discrepancy between its predictions and the MP2 energies.

Evidence of this is given in Fig. 12, which also highlights the superiority of the predictions of the empirical long-range function. Against the MP2 data shown in Fig. 12, the RMSEs of the two methods of prediction were separated by two orders of magnitude: 3.07 × 10−7 Eh for the empirical function vs 4.15 × 10−5 Eh for the multipolar function.

FIG. 12. Plots of the predicted energies of the long-range functions derived by the empirical method (red) and the multipole method (green) for the HF–Na+ potential in a linear configuration with the F proximal to the Na+ [i.e., with cos(θ) = 1]. Also shown for comparison are MP2 energies (blue) for this configuration.

Figure 12 shows that the multipolar long-range function captured the power law of the interaction energy for this system in this configuration; the source of its error was an offset from the calculated energies. This offset arose because the energies were calculated using MP2, while many of the properties used to derive the multipolar long-range function were calculated at higher levels of theory (see Table II). In the other systems, the small magnitude of the long-range energies meant that this disagreement was negligible and the multipolar long-range function was usable. In the case of HF–Na+, however, long-range energies of larger magnitude were commonplace owing to the strong repulsive and attractive interactions between the H–F dipole and the Na+ cation, which exacerbated the difference between the predictions of the multipolar long-range function and the MP2 energies.

To approximate accurately the long-range data without upgrading the reference and test data to a higher level of theory, a long-range function was produced by fitting directly to these data. This was the empirical long-range function. Taking r to be the distance between the center of the H–F bond (not center of mass) and the Na+, this function estimated the energy, E, as a sum of power laws,

E = \frac{A}{r^{2}} + \frac{B}{r^{3}}.  (A1)

In doing so, the empirical long-range function exploits the fact that the dominant powers of r in the HF–Na+ interaction are known to be −2 and −3, while assuming that the coefficients of these terms are unknown.

As the energy varies differently with r when θ changes, a sum such as that in Eq. (A1) must be fitted for every unique value of θ in a given dataset. For a given θ, the coefficients in Eq. (A1) can be found from simultaneous equations, which are set up by following Algorithm 4. In all cases, rmin = 8.5 Å and rmax = 10.5 Å, although the resulting power laws were accurate up to 100 Å. Furthermore, both GPs in Algorithm 4 were trained on ∼300 training points from an LHC design.

ALGORITHM 4. Coefficients for the empirical long-range function.

1: Train a GP, GPmin, on a range of θ values at separation r = rmin; do the same for another GP, GPmax, at r = rmax.
2: For a given θ, predict Emin and Emax using GPmin and GPmax, respectively.
3: Set up simultaneous equations of the form shown in Eq. (A1): one with E = Emin and r = rmin, and another with E = Emax and r = rmax.
4: Solve the equations from step 3 for the coefficients for the current θ.

By repeating steps 2–4 of Algorithm 4, a sum of power laws was determined for every θ value in the reference set and the test set. Fitting to the latter was possible because knowledge of the energies in the set being fitted was unnecessary: the GP predictions were based on their respective training sets, and r and θ were obtained from the inverse interatomic separations of each configuration alone.
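The following sketch illustrates steps 2–4 of Algorithm 4 for a single θ, solving the two simultaneous equations of the form of Eq. (A1) for A and B. The predict interface assumed for GPmin and GPmax is a simplified one, not the actual interface used in this work.

```python
import numpy as np

def fit_power_law_coefficients(theta, gp_min, gp_max, r_min=8.5, r_max=10.5):
    """Steps 2-4 of Algorithm 4 for one value of theta.

    E(r) = A/r**2 + B/r**3 [Eq. (A1)] is fitted by requiring it to reproduce
    the GP predictions at r_min and r_max, which gives a 2x2 linear system
    for (A, B).  gp_min and gp_max are assumed to expose a predict(theta)
    method returning the energy at the fixed separation each was trained on.
    """
    e_min = gp_min.predict(theta)
    e_max = gp_max.predict(theta)
    coeff_matrix = np.array([[r_min**-2, r_min**-3],
                             [r_max**-2, r_max**-3]])
    a, b = np.linalg.solve(coeff_matrix, np.array([e_min, e_max]))
    return a, b

def empirical_long_range_energy(r, a, b):
    """Evaluate Eq. (A1) at separation r with the fitted coefficients."""
    return a / r**2 + b / r**3
```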

Deriving a long-range function from the predictions of two GPs could be problematic if a transfer learning approach were invoked, as the data used to train these GPs would themselves need to be upgraded. For the HF–Na+ potential, this would not be an issue because increasing the quality of the training data would improve the agreement with the multipolar long-range function, which could then be used instead.

However, when an empirical function is the only option for modeling the long-range data, upgrading the data used in training GPmin and GPmax would be of considerable computational expense, because each training set currently comprises ∼300 configurations. Such an issue could be circumvented by using a sequential design strategy to build minimal training sets for these GPs, which could then be upgraded instead. Furthermore, the training sets used for GPmin and GPmax are independent of that used to train a GP on the wider PES. Letting the number of training points in the training sets of GPmin and GPmax be Y and that in the training set of the other GP be Z, the cost of short-range predictions in any simulation would scale linearly with Z, and the cost of long-range predictions would scale linearly with 2Y. Given that GPmin and GPmax are trained at a fixed separation, it is likely that 2Y < Z if all GPs were trained under a sequential design strategy. This means that the predictions of the empirical long-range function would not be the computational bottleneck and that such a function is suitable for use in simulations. Finally, the method is sufficiently flexible that the GP predictions can easily be replaced by those of another statistical method.

1. T. A. Halgren and W. Damm, “Polarizable force fields,” Curr. Opin. Struct. Biol. 11, 236–242 (2001).
2. A. Warshel, M. Kato, and A. V. Pisliakov, “Polarizable force fields: History, test cases, and prospects,” J. Chem. Theory Comput. 3, 2034–2045 (2007).
3. P. Xu, E. B. Guidez, C. Bertoni, and M. S. Gordon, “Perspective: Ab initio force field methods derived from quantum mechanics,” J. Chem. Phys. 148, 090901 (2018).
4. J. Behler, “Perspective: Machine learning potentials for atomistic simulations,” J. Chem. Phys. 145, 170901 (2016).
5. T. Mueller, A. Hernandez, and C. Wang, “Machine learning for interatomic potential models,” J. Chem. Phys. 152, 050902 (2020).
6. B. Selvaratnam and R. T. Koodali, “Machine learning in experimental materials chemistry,” Catal. Today 371, 77 (2021).
7. J. Noh, G. H. Gu, S. Kim, and Y. Jung, “Machine-enabled inverse design of inorganic solid materials: Promises and challenges,” Chem. Sci. 11, 4871–4881 (2020).
8. P. O. Dral, “Quantum chemistry in the age of machine learning,” J. Phys. Chem. Lett. 11, 2336–2347 (2020).
9. J. Behler and M. Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Phys. Rev. Lett. 98, 146401 (2007).
10. N. Artrith, T. Morawietz, and J. Behler, “High-dimensional neural-network potentials for multicomponent systems: Applications to zinc oxide,” Phys. Rev. B 83, 153101 (2011).
11. N. Artrith, B. Hiller, and J. Behler, “Neural network potentials for metals and oxides—First applications to copper clusters at zinc oxide,” Phys. Status Solidi B 250, 1191–1203 (2013).
12. J. Behler, “Representing potential energy surfaces by high-dimensional neural network potentials,” J. Phys.: Condens. Matter 26, 183001 (2014).
13. Q. Lin, Y. Zhang, B. Zhao, and B. Jiang, “Automatically growing global reactive neural network potential energy surfaces: A trajectory-free active learning strategy,” J. Chem. Phys. 152, 154104 (2020).
14. A. V. Shapeev, “Moment tensor potentials: A class of systematically improvable interatomic potentials,” Multiscale Model. Simul. 14, 1153–1173 (2016).
15. E. V. Podryabinkin and A. V. Shapeev, “Active learning of linearly parametrized interatomic potentials,” Comput. Mater. Sci. 140, 171–180 (2017).
16. I. I. Novoselov, A. V. Yanilkin, A. V. Shapeev, and E. V. Podryabinkin, “Moment tensor potentials as a promising tool to study diffusion processes,” Comput. Mater. Sci. 164, 46–56 (2019).
17. E. Uteva, R. S. Graham, R. D. Wilkinson, and R. J. Wheatley, “Interpolation of intermolecular potentials using Gaussian processes,” J. Chem. Phys. 147, 161706 (2017).
18. A. J. Cresswell, R. J. Wheatley, R. D. Wilkinson, and R. S. Graham, “Molecular simulation of the thermophysical properties and phase behaviour of impure CO2 relevant to CCS,” Faraday Discuss. 192, 415–436 (2016).
19. E. Uteva, R. S. Graham, R. D. Wilkinson, and R. J. Wheatley, “Active learning in Gaussian process interpolation of potential energy surfaces,” J. Chem. Phys. 149, 174114 (2018).
20. C. M. Handley, G. I. Hawe, D. B. Kell, and P. L. A. Popelier, “Optimal construction of a fast and accurate polarisable water potential based on multipole moments trained by machine learning,” Phys. Chem. Chem. Phys. 11, 6365–6376 (2009).
21. M. J. L. Mills and P. L. A. Popelier, “Intramolecular polarisable multipolar electrostatics from the machine learning method Kriging,” Comput. Theor. Chem. 975, 42–51 (2011).
22. M. J. L. Mills and P. L. A. Popelier, “Polarisable multipolar electrostatics from the machine learning method Kriging: An application to alanine,” Theor. Chem. Acc. 131, 1137 (2012).
23. S. M. Kandathil, T. L. Fletcher, Y. Yuan, J. Knowles, and P. L. A. Popelier, “Accuracy and tractability of a kriging model of intramolecular polarizable multipolar electrostatics and its application to histidine,” J. Comput. Chem. 34, 1850–1861 (2013).
24. J. Dai and R. V. Krems, “Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by Gaussian processes with composite kernels,” J. Chem. Theory Comput. 16, 1386–1395 (2020).
25. H. Sugisawa, T. Ida, and R. V. Krems, “Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimer,” J. Chem. Phys. 153, 114101 (2020).
26. G. Schmitz, E. L. Klinting, and O. Christiansen, “A Gaussian process regression adaptive density guided approach for potential energy surface construction,” J. Chem. Phys. 153, 064105 (2020).
27. M. J. Burn and P. L. A. Popelier, “Creating Gaussian process regression models for molecular simulations using adaptive sampling,” J. Chem. Phys. 153, 054111 (2020).
28. M. A. Boussaidi, O. Ren, D. Voytsekhovsky, and S. Manzhos, “Random sampling high dimensional model representation Gaussian process regression (RS-HDMR-GPR) for multivariate function representation: Application to molecular potential energy surfaces,” J. Phys. Chem. A 124, 7598–7607 (2020).
29. G. Sivaraman, A. N. Krishnamoorthy, M. Baur, C. Holm, M. Stan, G. Csányi, C. Benmore, and Á. Vázquez-Mayagoitia, “Machine learning inter-atomic potentials generation driven by active learning: A case study for amorphous and liquid hafnium dioxide,” npj Comput. Mater. 6, 1–8 (2019).
30. M. A. Caro, G. Csányi, T. Laurila, and V. L. Deringer, “Machine learning driven simulated deposition of carbon films: From low-density to diamond like amorphous carbon,” Phys. Rev. B 102, 174201 (2020).
31. Y.-B. Liu, J.-Y. Yang, G.-M. Xin, L.-H. Liu, G. Csányi, and B.-Y. Cao, “Machine learning interatomic potential developed for molecular simulations on thermal properties of β-Ga2O3,” J. Chem. Phys. 153, 144501 (2020).
32. A. Glielmo, P. Sollich, and A. De Vita, “Accurate interatomic force fields via machine learning with covariant kernels,” Phys. Rev. B 95, 214302 (2017).
33. A. Glielmo, C. Zeni, and A. De Vita, “Efficient nonparametric n-body force fields from machine learning,” Phys. Rev. B 97, 184307 (2018).
34. C. Zeni, K. Rossi, A. Glielmo, Á. Fekete, N. Gaston, F. Baletto, and A. De Vita, “Building machine learning force fields for nanoclusters,” J. Chem. Phys. 148, 241739 (2018).
35. P. Maxwell, N. di Pasquale, S. Cardamone, and P. L. A. Popelier, “The prediction of topologically partitioned intra-atomic and inter-atomic energies by the machine learning method kriging,” Theor. Chem. Acc. 135, 195 (2016).
36. P. I. Maxwell and P. L. A. Popelier, “Accurate prediction of the energetics of weakly bound complexes using the machine learning method kriging,” Struct. Chem. 28, 1513–1523 (2017).
37. Z. E. Hughes, E. Ren, J. C. R. Thacker, B. C. B. Symons, A. F. Silva, and P. L. A. Popelier, “A FFLUX water model: Flexible, polarizable and with a multipolar description of electrostatics,” J. Comput. Chem. 41, 619–628 (2020).
38. T. L. Fletcher and P. L. A. Popelier, “FFLUX: Transferability of polarizable machine-learned electrostatics in peptide chains,” J. Comput. Chem. 38, 1005–1014 (2017).
39. J. C. R. Thacker, A. L. Wilson, Z. E. Hughes, M. J. Burn, P. I. Maxwell, and P. L. A. Popelier, “Towards the simulation of biomolecules: Optimisation of peptide-capped glycine using FFLUX,” Mol. Simul. 44, 881–890 (2018).
40. A. Konovalov, B. C. B. Symons, and P. L. A. Popelier, “On the many-body nature of intramolecular forces in FFLUX and its implications,” J. Comput. Chem. 42, 107–116 (2020).
41. A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Phys. Rev. B 87, 184115 (2013).
42. F. C. Mocanu, K. Konstantinou, T. H. Lee, N. Bernstein, V. L. Deringer, G. Csányi, and S. R. Elliott, “Modeling the phase-change memory material, Ge2Sb2Te5, with a machine-learned interatomic potential,” J. Phys. Chem. B 122, 8998–9006 (2018).
43. D. Dragoni, T. D. Daff, G. Csányi, and N. Marzari, “Achieving DFT accuracy with a machine-learning interatomic potential: Thermomechanics and defects in BCC ferromagnetic iron,” Phys. Rev. Mater. 2, 013808 (2018).
44. A. P. Bartók, J. Kermode, N. Bernstein, and G. Csányi, “Machine learning a general-purpose interatomic potential for silicon,” Phys. Rev. X 8, 041048 (2018).
45. V. L. Deringer, M. A. Caro, and G. Csányi, “A general-purpose machine-learning force field for bulk and nanostructured phosphorus,” Nat. Commun. 11, 5461 (2020).
46. M. D. McKay, R. J. Beckman, and W. J. Conover, “A comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Technometrics 42, 55–61 (2000).
47. M. Stein, “Large sample properties of simulations using Latin hypercube sampling,” Technometrics 29, 143–151 (1987).
48. W.-L. Loh, “On Latin hypercube sampling,” Ann. Stat. 24, 2058–2080 (1996).
49. H. Robbins, “Some aspects of the sequential design of experiments,” Bull. Am. Math. Soc. 58, 527–535 (1952).
50. H. Chernoff, “Sequential design of experiments,” Ann. Math. Stat. 30, 755–770 (1959).
51. Y. Zuo, C. Chen, X. Li, Z. Deng, Y. Chen, J. Behler, G. Csányi, A. V. Shapeev, A. P. Thompson, M. A. Wood et al., “Performance and cost assessment of machine learning interatomic potentials,” J. Phys. Chem. A 124, 731–745 (2020).
52. D. J. MacKay, “Introduction to Gaussian processes,” in NATO ASI Series F Computer and Systems Sciences (Springer, 1998), Vol. 168, pp. 133–166.
53. M. N. Gibbs and D. J. C. MacKay, “Variational Gaussian process classifiers,” IEEE Trans. Neural Networks 11, 1458–1464 (2000).
54. C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006).
55. C. Qu, Q. Yu, B. L. Van Hoozen, Jr., J. M. Bowman, and R. A. Vargas-Hernández, “Assessing Gaussian process regression and permutationally invariant polynomial approaches to represent high-dimensional potential energy surfaces,” J. Chem. Theory Comput. 14, 3381–3396 (2018).
56. J. W. Ng and M. P. Deisenroth, “Hierarchical mixture-of-experts model for large-scale Gaussian process regression,” Stat 1050, 9 (2014).
57. T. T. Nguyen, E. Székely, G. Imbalzano, J. Behler, G. Csányi, M. Ceriotti, A. W. Götz, and F. Paesani, “Comparison of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials in representing water interactions through many-body expansions,” J. Chem. Phys. 148, 241725 (2018).
58. A. Kamath, R. A. Vargas-Hernández, R. V. Krems, T. Carrington, Jr., and S. Manzhos, “Neural networks vs Gaussian process regression for representing potential energy surfaces: A comparative study of fit quality and vibrational spectrum accuracy,” J. Chem. Phys. 148, 241702 (2018).
59. R. V. Krems, “Bayesian machine learning for quantum molecular dynamics,” Phys. Chem. Chem. Phys. 21, 13392–13410 (2019).
60. M. P. Metz and K. Szalewicz, “A statistically guided grid generation method and its application to intermolecular potential energy surfaces,” J. Chem. Phys. 152, 134111 (2020).
61. J. Cui and R. V. Krems, “Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with Gaussian processes,” J. Phys. B: At., Mol. Opt. Phys. 49, 224001 (2016).
62. A. Stone, The Theory of Intermolecular Forces (OUP, Oxford, 2013).
63. L. E. Atlas, D. A. Cohn, and R. E. Ladner, “Training connectionist networks with queries and selective sampling,” in Advances in Neural Information Processing Systems (CiteSeer, 1990), pp. 566–573.
64. D. J. C. MacKay, “Information-based objective functions for active data selection,” Neural Comput. 4, 590–604 (1992).
65. L. Torrey and J. Shavlik, “Transfer learning,” in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques (IGI Global, 2010), pp. 242–264.
66. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2009).
67. C. Møller and M. S. Plesset, “Note on an approximation treatment for many-electron systems,” Phys. Rev. 46, 618 (1934).
68. R. J. Bartlett and M. Musiał, “Coupled-cluster theory in quantum chemistry,” Rev. Mod. Phys. 79, 291 (2007).
69. H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, and M. Schütz, “Molpro: A general-purpose quantum chemistry program package,” Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 242–253 (2012).
70. V. Fock, “Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems,” Z. Phys. 61, 126–148 (1930).
71. H.-J. Werner and P. J. Knowles, “An efficient internally contracted multiconfiguration–reference configuration interaction method,” J. Chem. Phys. 89, 5803–5814 (1988).
72. P. J. Knowles and H.-J. Werner, “An efficient method for the evaluation of coupling coefficients in configuration interaction calculations,” Chem. Phys. Lett. 145, 514–522 (1988).
73. GPy, GPy: A Gaussian process framework in Python, http://github.com/SheffieldML/GPy, since 2012.
74. J. Broad, R. J. Wheatley, and R. S. Graham, “Boundary optimisation data,” Figshare (2021), https://doi.org/10.6084/m9.figshare.16610713.v1.
