We introduce ACEpotentials.jl, a Julia-language software package that constructs interatomic potentials from quantum mechanical reference data using the Atomic Cluster Expansion [R. Drautz, Phys. Rev. B 99, 014104 (2019)]. As the latter provides a complete description of atomic environments, including invariance to overall translation and rotation as well as permutation of like atoms, the resulting potentials are systematically improvable and data efficient. Furthermore, the descriptor’s expressiveness enables use of a linear model, facilitating rapid evaluation and straightforward application of Bayesian techniques for active learning. We summarize the capabilities of ACEpotentials.jl and demonstrate its strengths (simplicity, interpretability, robustness, performance) on a selection of prototypical atomistic modelling workflows.
I. INTRODUCTION
Machine-learning interatomic potentials (MLIPs) continue to revolutionize the fields of molecular and materials simulation.1,2 MLIPs provide the means to simulate atomistic systems at or close to the accuracy of electronic structure methods, while being computationally cheaper by orders of magnitude. They make the simulation of large-scale systems and long time-scales at high model accuracy accessible and have therefore become an indispensable tool for atomic-scale simulation. Recent reviews of the field are provided in Refs. 3–6. Of particular relevance to the present work are the methods introduced in Refs. 2 and 7–9.
To create an MLIP, one begins with a flexible functional form, constrained only to comply with the natural symmetries of the potential energy in three-dimensional space, then estimates its parameters using reference data, typically in the form of energies, forces, and virial stresses for a set of representative atomic configurations. Ordinarily, the data are generated with quantum mechanical techniques, such as density functional theory (DFT) calculations, which may be performed only for relatively small structures. A well-trained MLIP is then expected to provide accurate predictions of processes on similar but also much larger spatial scales.
The Atomic Cluster Expansion (ACE) introduced in Ref. 9 is a particular MLIP flavor that is flexible, theoretically well founded, interpretable, and for which it is straightforward to tune the cost-accuracy balance. It is establishing itself as a successful MLIP approach for a wide range of tasks, especially but not exclusively in materials simulation; see e.g., Refs. 10–17. Linear variants of the ACE model have been found to be remarkably data and computationally efficient, and as such have proven particularly useful for active learning (AL) workflows,14 as Secs. III and IV will demonstrate. Linearity in particular enables sensitivity analysis and a path towards reliable uncertainty quantification.
This article describes ACEpotentials.jl, which ties together a collection of Julia-language packages to expose a user-oriented interface facilitating the convenient construction of ACE MLIPs. To highlight the ease of use of our package, Listing 1 provides a complete Julia-language example that produces an ACE potential for a TiAl dataset.
A minimal Julia-language script for fitting an ACEpotentials.jl potential. It first downloads a training dataset, then uses acemodel to create a model object whose parameters are explained fully in the following sections. The model parameters are estimated with the acefit! command, and the result is exported in a LAMMPS-compatible format.
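The listing is reproduced here only as a hedged sketch: the function names follow the ACEpotentials.jl documentation at the time of writing, while the file name, model parameters, and isolated-atom reference energies are illustrative. To keep the sketch self-contained, the dataset is read from a local extended-XYZ file rather than downloaded.

```julia
using ACEpotentials

# Read a TiAl training set stored in extended-XYZ format (file name illustrative).
data = read_extxyz("TiAl_tutorial.xyz")

# Specify a linear ACE model; parameter values are illustrative.
model = acemodel(elements = [:Ti, :Al],
                 order = 3,            # correlation order of the expansion
                 totaldegree = 12,     # total polynomial degree
                 rcut = 5.5,           # cutoff radius in Å
                 Eref = [:Ti => -1586.0195, :Al => -105.5954])  # isolated-atom energies

# Estimate the model parameters, then export in a LAMMPS-compatible format.
acefit!(model, data)
export2lammps("TiAl_model.yace", model)
```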
At the time of writing, ACEpotentials.jl provides interfaces for linear ACE models, which give good accuracy as well as performance both in parameter estimation and prediction. We have incorporated a range of geometric and analytical priors into the default model parameters that have proven robust in a range of tasks, including the challenging low data regime arising in active learning workflows. ACEpotentials.jl models can be used for molecular dynamics (MD) simulation in Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS),18 the Atomic Simulation Environment (ASE)19 and Molly.jl.20
The Julia-language codes on which ACEpotentials.jl builds are written with ease-of-use, performance, and flexibility of model development in mind. Several variations and extensions of the ACE model implementations discussed in this article are under active development. The choice of Julia as the development language enables a seamless transition from rapid prototyping to performance optimization. Moreover, Julia is establishing itself as a leader in scientific machine learning (see, e.g., Ref. 21), facilitating highly customized model architectures with novel computational kernels.
Finally, we emphasize that the aim of this article is to illustrate the capabilities of ACEpotentials.jl but not to precisely document its use; for the latter see the reference material at Ref. 22, which will evolve along with the software. While the examples and code snippets provided throughout this article are compatible with the present version of ACEpotentials.jl, they should be taken primarily as illustrations of how the package may be used. The documentation will be kept up-to-date for the foreseeable future and will continually expand to describe additional options and features.
II. METHODS
A. Review of the linear ACE framework
1. Model specification
Both series in Sec. II A are truncated versions of an exact body-order expansion. An exact expansion would include terms up to the number of atoms in the system, while here the maximum body-order is νmax + 1 (corresponding to a correlation order of νmax), which constitutes the first approximation parameter. In practice, the truncation is performed at low to moderate correlation order (typically 5 or less) for several reasons, including control of model complexity and computational cost.
The basis functions Bi specify the model. In a typical example this can be done as demonstrated in Listing 2.
The model object specifies the site energy potential, from which derived properties such as the potential energy, forces, and virial stresses can be computed; these are the quantities used in molecular statics, molecular dynamics, and sampling algorithms.
There are many additional parameters and options available to specify an ACE model, some of which we discuss throughout the remainder of this paper. For a complete list of options we refer to the documentation.22 We only remark briefly on the Eref parameter: We recommend the explicit specification of the one-body term V(0). We observed in many tests that constraining V(0)(Zi) to be the energy of a single isolated atom with atomic number Zi yields more chemically realistic potentials that are more robust in practical molecular dynamics and molecular statics simulations, especially those involving breaking and forming bonds. One provides this information to an ACE model as shown in Listing 2, line 6.
In the remainder of this section we maintain a focus on high-level intuitive understanding of options and parameters and avoid details and technicalities of the ACE framework as much as possible. For those details we refer to the Appendix and to the many publications now available on the subject.6,9,11,12
2. Parameter estimation
In Listing 3 we read in such a prepared training set provided in the extended XYZ format and then estimate the model parameters with a default solver (Bayesian Linear Regression; cf. Sec. II F). Several steps are combined and hidden from the user in the acefit! convenience function, but all these steps can in principle also be performed manually, e.g., to explore different parameter estimation algorithms that are currently not interfaced by ACEpotentials.jl. In line 5 of the listing, the fitted model is exported to a format that can be used for molecular dynamics simulations in LAMMPS.
A representative example loading a training dataset and estimating ACE model parameters.
B. Choice of basis functions and geometric priors
1. One-particle basis
2. Radial basis
- y is an element-dependent distance transform, which can be used to impose increased spatial resolution where needed, especially near the equilibrium bond-length. We typically employ

y(r) = [1 + (r/r0)^q / (a (1 + (r/r0)^(q−p)))]^(−1),

where r0 is an estimate of the equilibrium bond-length in the system and a is chosen to maximize the gradient of y at r = r0, thereby maximizing resolution for the nearest-neighbour interaction.
The idea behind this transform is that it behaves as r^(−q) for large r and as 1 − (r/r0)^p/a for small r, thereby decreasing resolution in those two limits at rates determined by the parameters p, q. The reduction in resolution in the small-r regime is desirable when no data is available to specify the model in that regime; see also Fig. 1 and the numerical sketch following the discussion of Fig. 1 below.
- Pn is an orthogonal basis in y-coordinates. Our default choice is the Legendre orthogonal polynomial basis, which implicitly assumes equidistribution of resolution in y-coordinates.
- Finally, fenv is an envelope that specifies the cutoff radius rcut.
– The default and canonical choice of envelope for the many-body basis is

fenv(y) = (y − ycut)²,

where ycut = y(rcut, Zi, Zj).
– The default choice of envelope for the pair potential U(1) or V(1) is a Coulomb potential tilted to ensure a smooth cutoff, i.e., an r^(−1) term adjusted by a linear function of r so that the envelope and its first derivative vanish at r = rcut.
Center: a typical interaction potential V(r), plotted in r-coordinates. Left: a coordinate transform y = y(r) to a non-dimensional variable y that increases resolution near r = r0, where the potential minimum is located, and decreases resolution below rmin (the smallest radial distance occurring in the training dataset), to zero near r = 0, where there is no data (and the envelope fenv becomes relevant), and near r = rcut, where the potential converges to a constant. The histograms show the distribution of a typical dataset in both r- and y-coordinates. Right: the interaction potential plotted (i) in transformed coordinates V(r(y)), (ii) with the default pair envelope removed, and (iii) with the theoretically optimal, typically unknown, envelope removed. The parameterisation and the smoothness priors are not applied to the original potential V(r) but to the transformed potential V(y)/fenv(y).
While the envelope for the many-body potential is canonical, for the pair potential envelope there is significant scope for inserting prior modelling knowledge of the system of interest. For example, one could replace the r^(−1)-type behaviour with r^(−p) + r^(−q) to obtain different behaviour as r → 0 and r → rcut, or in fact one could incorporate the Ziegler-Biersack-Littmark (ZBL) potential23 to obtain asymptotically exact repulsion.
An example demonstrating more fine-grained control over the choice of radial basis Rnl. The function transformed_jacobi_env constructs the polynomial basis from which the radial basis is built; this can be any basis within the general class of Jacobi polynomials, but is normally taken to be the Legendre basis in transformed y-coordinates.
The effect of the distance transform y = y(r) and of the envelope function are visualized in Fig. 1.
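To make the role of the transform concrete, the following minimal sketch implements it in the form reconstructed above and probes its limiting behaviour; the parameter values are illustrative and not package defaults.

```julia
# Distance transform in the form given above; r0, a, p, q are illustrative.
y(r; r0 = 2.9, a = 2.0, p = 4, q = 2) =
    1 / (1 + (r / r0)^q / (a * (1 + (r / r0)^(q - p))))

# Small r: y ≈ 1 - (r/r0)^p / a, so y varies slowly (low resolution).
# Large r: y ≈ a (r/r0)^(-q), decaying algebraically to zero.
@show y(0.1)    # ≈ 1, flat region near r = 0
@show y(2.9)    # steepest region, near the equilibrium bond length r0
@show y(10.0)   # slow algebraic decay towards 0
```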
- Repulsion restraint: The construction outlined above means that, in the canonical cluster expansion formulation, the pair potential is given by

V(1)(rij) = fenv(rij) P(y(rij)),

where P is a polynomial in transformed y-coordinates. By imposing the constraint that P(y0) > 0, where y0 = y(0, Zi, Zj), we ensure that E ∼ fenv(rij) as rij → 0. This guarantees repulsive behaviour of the total energy, independently of whether or not this is provided through the training data. In practice we enforce this weakly through a mild restraint, to give the potential more flexibility.
C. A priori sparsification and smoothness prior
1. Sparse basis selection
Recall that kt = (zt, nt, lt, mt), and that the bound |mt| ≤ lt on mt automatically gives a selection of possible mt values once lt bounds are chosen. Roughly speaking, nt and lt measure how oscillatory the corresponding basis functions are in, respectively, the radial coordinate rt and the angular coordinates. Therefore one typically imposes upper bounds nt ≤ nmax and lt ≤ lmax in the basis selection, i.e., one chooses all basis functions (k1, …, kν) in the expansion for which these bounds are satisfied. Lower values of these bounds lead to a smaller basis, but also less flexibility and correspondingly lower accuracy on the training set.
The default usage is that totaldegree(ν) takes the same value for all ν, but one may also specify a separate total degree for each correlation order ν. For example, Listing 5 demonstrates how to select a stronger weight wL = 2.0, thus providing less angular resolution, as well as how to select total polynomial degrees 25, 23, 20, and 10 for parameterising, respectively, V(1), V(2), V(3), and V(4).
Construct an ACE model with finer control on the sparse selection of basis functions.
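A hedged sketch of such a specification follows; the keyword names mirror those quoted in the text (totaldegree, wL), while the element choice is illustrative and exact signatures may differ between versions.

```julia
# Finer control of the sparse basis selection (illustrative values).
model = acemodel(elements = [:Ti, :Al],
                 order = 4,
                 # one total degree per correlation order, for V(1)...V(4)
                 totaldegree = [25, 23, 20, 10],
                 # a stronger angular weight penalises angular resolution
                 wL = 2.0)
```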
Significant further fine-tuning of the basis specification is possible, e.g. choosing different total degrees and wL parameters for different interacting species. This is explained in the package documentation.22
2. Smoothness prior
The matrix Γ also serves as a smoothness prior within the Bayesian interpretation of ridge regression: the prior distribution for the model parameters c is given by a multivariate normal distribution that is centered at the origin and has variance proportional to Γ−2; see Secs. II E and II F. In ACEpotentials.jl this operator can be constructed as shown in Listing 6, with p = 4, wL = 1 the default.
Construct an operator that estimates the smoothness of the MLIP model, to be used as a Tikhonov regulariser, or prior in a Bayesian framework.
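A hedged sketch of this construction and of its use as a prior follows; the function name and the prior keyword are assumed from the package documentation, and data is a previously loaded training set.

```julia
# Construct the smoothness operator Γ for the model basis
# (p = 4, wL = 1 are the defaults quoted in the text).
P = algebraic_smoothness_prior(model.basis; p = 4)

# Use Γ as the Tikhonov regulariser, or as the Bayesian prior.
acefit!(model, data; solver = ACEfit.BLR(), prior = P)
```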
The resulting operator Γ may now be used to specify the regularizer (or prior) in parameter estimation algorithms, e.g., in Listing 3, line 2; this is explained in more detail in Secs. II E and II F. A key point is that Γ is a rigorous smoothness prior for the canonical cluster expansion (2a) but only a heuristic for the self-interacting expansion (2b).
D. Training data
In the foregoing sections we discussed in some depth how an ACE interatomic potential architecture can be conveniently specified. The next task is to estimate the parameters matching the model to training data.
A training dataset consists of a collection of reference structures, R = {R}, each with an associated potential energy ER, forces FR and, when appropriate, virials WR (in Voigt notation). The reference energies, forces, and virials are typically obtained by evaluating a “high fidelity” reference potential energy surface for which we wish to obtain an ACE surrogate model. Density functional theory is a common choice, but higher levels of theory, such as coupled-cluster methods, are also used, especially for non-periodic systems. In addition, each training structure should be given a label that specifies related sub-groups. For example, these subgroups could indicate different phases of a material, and the resulting labels might be “bcc,” “fcc,” “liquid.” The label could also indicate the MD temperature from which the structures were generated, e.g., “fcc500K” or “liquid2500K.” This allows convenient filtering of the training set, e.g., for assigning training weights (cf. Sec. II E) or fitting to subsets.
Acquisition of training data need not be performed within the ACEpotentials.jl package, but can be undertaken in any simulation software that makes it convenient to generate and manipulate atomic structures, perform molecular dynamics or Monte Carlo simulations, and to evaluate structures using a high fidelity electronic structure model. Because of the general ease of use and in particular ease of interoperability with the Julia molecular simulation eco-system, we often use the Atomic Simulation Environment.19
The standard format for storing and retrieving a training set in ACEpotentials.jl is the extended XYZ format and can be read as shown in Listing 7. This results in a list of atomic structures storing the structure information as well as the training data.
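For example (file name illustrative):

```julia
# Read a training set stored in extended-XYZ format; the result is a
# vector of atomic structures carrying the reference data and labels.
data = read_extxyz("training_set.xyz")
@show length(data)   # number of training structures
```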
1. Overview of training set acquisition
The acquisition of training data is often the most time-consuming aspect of MLIP development. An in-depth discussion goes beyond the scope of this software review article; important details can be found, for example, in Refs. 5, 12, and 24–26. In the remainder of this section we outline general strategies to consider, while in Sec. III we go into practical aspects of how training sets can be constructed in a few prototypical applications and what kinds of tools ACEpotentials.jl provides to support that task.
The overarching requirements are that training sets (1) must contain small enough atomic structures that they can be evaluated using high-fidelity electronic structure models; and (2) must contain snapshots of all possible local atomic configurations one expects to encounter during simulation and prediction tasks. Thus, generating a training set reduces to generating representative atomic structures which are then evaluated with the reference model to obtain target potential energies, forces and virials. While the latter is usually straightforward and varies little between projects, there is no standard way yet to generate the training structures. The choice will depend on the atomic system at hand, and the simulation tasks that the model must be able to perform reliably, e.g. which system properties (observables) are to be modelled.
As a first step, one should “sketch out” the parts of the potential energy landscape that are of interest, e.g., construct one representative structure for each distinct energy minimum of interest. This might include different phases or material defects that the final model should be able to describe. Next, one generates random samples from those sketches, for example by displacing the atom positions (randomly, along normal modes, via volume scans, and so forth), or by subsampling an ab initio molecular dynamics trajectory. After collecting a seemingly adequate number of training structures (the total number of observations should normally exceed the number of parameters) one can fit a first model and test that model’s accuracy with respect to some target property. If the accuracy is inadequate, or the model is not robust (e.g., an MD simulation is unstable), then a good strategy is to proceed with an iterative model refinement process. In each iteration additional training structures are selected to converge the model’s accuracy with respect to the target properties of interest. One might add hand-crafted structures to fix a particular flaw (e.g., to improve the description of inter-molecular interactions in a molecular liquid, or to include supercells with vacant atomic sites), or use model-driven MD to explore the relevant parts of the potential energy surface at lower computational cost (for example, sampling low potential energy regions to bring the model’s Boltzmann ensemble closer to that of the reference, or sampling a wider temperature/pressure range than the intended application requires, to make the model-driven simulations more stable).
Iterative model refinement is closely related to active learning. That strategy assumes that an accurate and efficient estimate of the model uncertainty is available. During a simulation task, for example a molecular dynamics simulation, when a structure with high uncertainty is encountered, it is evaluated with the reference method and added to the training data. To accelerate this process, we developed hyperactive learning,14 which biases the molecular dynamics simulation towards regions of high uncertainty and high predicted error. This strategy is sometimes capable of more rapidly generating many independent training samples. Section III will go into some detail on how this strategy is used in practice.
E. Parameter estimation: Ridge regression
Recall from Sec. II A that the linear ACE models are parameterized linearly as shown in (3). As described in Sec. II D we estimate parameters by matching the model to observations of total energies, forces and virials evaluated via a high fidelity reference model on different training structures R ∈ R, where R denotes the training set. To estimate the parameters we minimize the loss function (4). In the current section, we go into further details of the parameter estimation process once the model and training set have been specified.
Prototypical parameter estimation script, using some simple control over regression weights and solver parameters.
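A hedged sketch of such a script follows; the line references in the text apply to the original listing, the weight values and file names are illustrative, and the function names follow the package documentation at the time of writing.

```julia
using ACEpotentials

data = read_extxyz("training_set.xyz")

# Regression weights for energies, forces, and virials, keyed by the
# configuration label ("default" applies to unlabelled structures).
weights = Dict("default" => Dict("E" => 30.0, "F" => 1.0, "V" => 1.0))

# Estimate the parameters with the Bayesian linear regression solver.
solver = ACEfit.BLR()
acefit!(model, data; solver = solver, weights = weights)

# (1) Confirm accuracy on a held-out test set: prints RMSE and MAE tables.
testdata = read_extxyz("test_set.xyz")
ACEpotentials.linear_errors(testdata, model)

# (2) Export for molecular dynamics: .yace for the LAMMPS pace pair style.
#     (A .json export for use via ACEpotentials.jl or ASE is also
#     available; see the documentation for the exact call.)
export2lammps("model.yace", model)
```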
To solve the ridge regression problem (12), ACEpotentials.jl employs the package ACEfit.jl (https://github.com/ACEsuit/ACEfit.jl), which offers a range of such algorithms. In the simplest setting, it can be used as shown in Listing 8, lines 6 and 7. For a list of the most important solvers, see Table I. For large models and/or large datasets, the parameter estimation task can be computationally challenging and may have to be performed on a cluster.
Table of solvers for the ridge regression problem (12).
| Solver | Description |
|---|---|
| QR | QR decomposition: Direct solution of the ridge regression problem (12). Tikhonov regularisation is imposed by extending the linear system. This method should rarely be used in practice and is included mostly for theoretical interest and the sake of completeness. solver = QR(lambda = 0.0) |
| LSQR | Krylov method: The standard iterative Krylov algorithm for the ridge regression problem (12). Tikhonov regularisation is imposed implicitly in the algorithm, with damp corresponding to the parameter λ. Early termination, by adjusting atol, provides an additional and different form of regularisation. This algorithm is suitable for very large-scale parameter estimation problems. solver = LSQR(damp = 1e-4, atol = 1e-6) |
| RRQR | Rank-revealing QR decomposition: A random matrix sketching approach, which is computationally more efficient than the standard QR decomposition. The parameter rtol is closely related to λ in (12) but not identical: instead of adding a Tikhonov term, RRQR regularisation is imposed by removing highly sensitive subspaces as determined by rtol. For large problems, this algorithm is more performant than the standard QR decomposition. solver = RRQR(rtol = 1e-5) |
| BLR | Bayesian linear regression (or Bayesian ridge regression): a class of solvers that estimate regularisation hyperparameters; depending on the setting, either the scaling parameter λ or the entire Tikhonov matrix Γ is estimated. A BLR solver also determines a posterior model distribution that can be used for uncertainty quantification; see Sec. II F for further details. This algorithm is more robust than QR, LSQR, and RRQR, but computationally more intensive. It is highly recommended for relatively small datasets. solver = BLR() |
For small and moderate datasets we normally recommend the BLR method. For large datasets, where finely tuned regularisation is often less important, the random matrix sketching RRQR and the iterative LSQR solvers may be more appropriate.
Once the model parameters are determined as shown above, we typically wish to perform two tasks: (1) confirm the model accuracy on a test set; and (2) export the model to a format that can be used in standard MD codes, e.g., LAMMPS and ASE. Suppose that we are provided with a test dataset testdata; then we can determine the model errors on that test set as seen in Listing 8, lines 9–11. This will print tables of root-mean-square errors (RMSE) and mean absolute errors (MAE) for individual configuration types. If we wish to store and/or export the fitted potential for later use, we typically save it in .json format, which can be read by ACEpotentials.jl as well as its Python interface to ASE, and in .yace format, which can be read by the pace extension to LAMMPS; cf. Listing 8, lines 13–15.
F. Bayesian framework for parameter estimation
1. Solvers and model selection via evidence maximisation
2. Uncertainty estimates via committees
The first term in (17) is the aleatoric, or irreducible, uncertainty arising from randomness of the system; for an MLIP it is dominated by the limits on model complexity set by the linear ACE convergence parameters, such as correlation order, polynomial degree, and cutoff. The second term is the epistemic, or reducible, uncertainty arising from a lack of data, or rather of information. An example of how a variance estimate of the epistemic uncertainty can be obtained in the linear ACE framework is shown in Listing 9.
An example of how to use a committee to estimate the uncertainty of a prediction. (Note that model.potential gives access to the calculator object.) Analogously, one can obtain committees of forces and virials.
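A hedged sketch of the committee evaluation follows; the committee function names (co_energy and, analogously, co_forces and co_virial) are assumed from the package documentation, and at stands for any atomic structure, e.g., one entry of the training data.

```julia
using ACEpotentials, Statistics

pot = model.potential           # the calculator object
# Mean prediction together with the committee of energy predictions.
E, co_E = co_energy(pot, at)
# The committee spread serves as an epistemic uncertainty estimate.
σ_E = std(co_E)
```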
III. WORKFLOW EXAMPLES
In this section, we present several practical examples of ACE usage, ranging from simple benchmarks and practical potentials for materials and liquids to examples illustrating the hyperactive learning workflow. The scripts we used to generate the reported results are made available in a separate git repository (https://github.com/ACEsuit/ACEworkflows) that will be regularly updated as the ACEpotentials.jl package evolves.
A. Tests with pre-existing data sets
1. Benchmarks with limited-diversity datasets
We test ACEpotentials.jl with default parameters on an early single-element benchmark dataset taken from Ref. 27. This dataset was originally used to assess the relative strengths and weaknesses of four important MLIPs: the high-dimensional neural network potential (NNP),1 the Gaussian approximation potential (GAP),2 the Spectral Neighbor Analysis Potential (SNAP),7 and moment tensor potentials (MTP).8 The benchmark contains six separate datasets corresponding to the six elements Li, Mo, Ni, Cu, Si, and Ge, spanning a variety of chemistries (main group metal, transition metal, and semiconductor), crystal structures (bcc, fcc, and diamond), and bonding types (metallic and covalent). For each element, the dataset contains the ground-state crystal structure, strained structures with strains between −10% and 10%, slab structures up to a maximum Miller index of three, and NVT ab initio molecular dynamics simulations of the bulk supercells with and without a single vacancy. These datasets contain a relatively large number of training structures, but only limited diversity.
In Table II we compare the MAEs in energies and forces for the best performing potentials in the benchmark (GAP and MTP) with two linear ACE models trained with the default parameters and total degrees chosen to reach basis sizes of, respectively, ∼300 basis functions for ACE(sm) and ∼1000 basis functions for ACE(lge). We did not optimize any of the hyperparameters and used RRQR to estimate the parameters. We chose RRQR since the datasets are very large, hence highly tuned regularisation is less important. This results in competitive accuracy across the entire benchmark. The only small exception is the slightly larger energy error of ACE(lge) for Mo, which suggests some fine-tuning of the model parameters could be beneficial in this particular case. Our aim with this experiment was to demonstrate that, with only minimal effort, linear ACE models can perform with (near-) best accuracy on a data set geared towards testing statistical generalization.
Mean absolute test errors in predicted energies and forces of two ACE models, ACE(sm) with ∼300 basis functions and ACE(lge) with ∼1000 basis functions, compared against the two best performing MLIPs published in Ref. 27.
Energy MAE (meV):

| | ACE(sm) | ACE(lge) | GAP | MTP |
|---|---|---|---|---|
| Ni | 0.416 | 0.34 | 0.42 | 0.48 |
| Cu | 0.292 | 0.228 | 0.46 | 0.41 |
| Li | 0.231 | 0.165 | 0.49 | 0.49 |
| Mo | 2.597 | 2.911 | 2.24 | 2.83 |
| Si | 3.501 | 1.985 | 2.91 | 2.21 |
| Ge | 2.594 | 2.162 | 2.06 | 1.79 |

Force MAE (eV/Å):

| | ACE(sm) | ACE(lge) | GAP | MTP |
|---|---|---|---|---|
| Ni | 0.018 | 0.015 | 0.02 | 0.01 |
| Cu | 0.007 | 0.005 | 0.01 | 0.01 |
| Li | 0.006 | 0.005 | 0.01 | 0.01 |
| Mo | 0.123 | 0.097 | 0.09 | 0.09 |
| Si | 0.086 | 0.066 | 0.07 | 0.06 |
| Ge | 0.064 | 0.051 | 0.05 | 0.05 |
2. Silicon
We used ACEpotentials.jl to fit a linear ACE potential to the silicon dataset introduced by Bartók et al.25 for fitting a Gaussian approximation potential (GAP). This extensive database contains a wide range of configurations, including several bulk crystal structures (diamond, hcp, fcc, etc.), amorphous structures, and liquid MD snapshots, aiming to cover as much of the silicon energy landscape as possible. The corresponding GAP model was shown to outperform a wide range of other (classical) interatomic potentials on a large selection of accuracy, property, and generalisation tests, ranging from surface formation energies to liquid radial distribution functions. The current work benchmarks an ACEpotentials.jl model with default model parameters, containing basis functions up to correlation order 4, polynomial total degree Dmax = 20, and 6 Å cutoff, against this silicon GAP potential. The model was fitted using generalised Tikhonov regularisation (12) with Tikhonov matrix λΓ, where Γ was constructed using the algebraic smoothness prior (10) with p = 5, whilst the BLR solver was used to estimate the scaling parameter λ. This benchmark comprises a series of property tests, including bulk diamond elastic constants, vacancy formation energies, surface formation energies for the (100), (110), and (111) surfaces, and hexagonal, dumbbell, and tetragonal point defect energies for bulk diamond. The results of these property tests for the CASTEP28 DFT reference, GAP, and ACE are shown in Fig. 2 and indicate good accuracy across the range of property tests. Percentage errors relative to the DFT reference are also included, confirming similarly accurate performance between the GAP and the ACEpotentials.jl frameworks.
Benchmark of the silicon GAP25 and ACE model presented in this work. Percentage relative errors against the DFT reference are provided in the Table.
We also used this silicon ACE potential to carry out a more challenging test, namely to simulate fracture in the (111) cleavage system. We used the matscipy package to set up a 12 × 11 × 1 supercell containing 1586 atoms and to carry out structural optimizations with a Mode I crack anisotropic continuum linear elastic displacement field29 applied with stress intensity factors ranging from 0.6 KG to 1.5 KG (where KG is the Griffith load at which fracture becomes thermodynamically favorable). We observed spontaneous formation of the Pandey 2 × 1 reconstructed (111) surface behind the crack tip, in good agreement with previous studies using DFT30 and GAP.25 The critical stress intensity factor was determined to be KI = (1.0 ± 0.02) KG, which is very close to the expected Griffith value, indicating minimal lattice trapping. Overestimating the extent of lattice trapping is a common failure mode of previous interatomic potentials when applied to model fracture.31 The total simulation time was around 30 minutes on a 28-core workstation.
To successfully carry out the fracture test it was crucial to produce a highly regular (smooth) ACE potential. To illustrate the effect of changing the smoothness prior, a sequence of ACE potentials (total degree Dmax = 21 and 6 Å cutoff) was fitted using no smoothness prior (Γ = 1) and increasing strengths of the algebraic smoothness prior (10), p = 1, 2, 5, and 10. In all cases the model parameters were estimated using generalised Tikhonov regularisation (12), with the scale factor λ tuned such that all potentials achieved a force RMSE of ∼0.075 eV/Å, which is ∼5% larger than without any regularisation. The effect of the prior on predicted Si–Si dimer curves and rigid bulk Si decohesion curves, which respectively probe the smoothness of the two-body and many-body terms, is shown in Fig. 3. Applying a moderate smoothness prior aids extrapolation into the close-approach region and reduces the amplitude of spurious oscillations seen in the stress (S) during decohesion.
Top Left: The predicted energy of the Si–Si dimer is shown for a sequence of ACE potentials trained with varying strengths of smoothness prior but equal accuracy (Force RMSE ≈ 0.075 eV/Å). Γ = 1 corresponds to an equal prior for all basis functions whilst p indicates the strength of the algebraic smoothness prior defined in (10). The black curve shows the corresponding result using GAP. All curves are shifted for clarity. Bottom: The evolution of stress (S) as a function of separation (z) during rigid decohesion of bulk silicon into the unrelaxed (110) and (100) surfaces is shown for the same sequence of potentials. Top Right: Snapshot from Si quasi-static fracture simulation at a stress intensity factor of 1.8KG using our ACE potential. The lower fracture surface shows a 2 × 1 Pandey reconstruction (alternating pentagons and heptagons), consistent with previous studies using DFT and GAP models, but at much reduced cost. The critical fracture toughness is very close to KG, showing minimal lattice trapping.
3. Water
We investigated the ability of ACEpotentials.jl to capture the interactions in complex molecular liquids and to perform robust molecular dynamics simulations in such systems, fitting a linear ACE potential to a dataset containing 1593 liquid water configurations.32 We chose only default model parameters, with polynomial total degree Dmax = 15 and cutoff rcut = 5.5 Å. Parameter estimation was performed using ARD, with the relevance threshold set by minimising the Bayesian information criterion (BIC).33 The training RMSEs were 1.732 meV/atom for energies and 0.099 eV/Å for forces. To investigate the performance and robustness of the fitted ACE model, a series of mean squared displacement (MSD) simulations was performed under 1 bar NPT conditions at 300 K. The simulations were performed using 5184-atom simulation boxes, shown in Fig. 4 below, with the pace pair style in LAMMPS.12 The total simulation time for each of these simulations was 20 min utilising 1280 cores on ARCHER2, illustrating the efficiency of ACE potentials. The diffusion constant predicted by these simulations was (1.20 ± 0.03) × 10⁻⁹ m²/s. It should be noted that diffusion constants are notoriously difficult to determine accurately, especially considering the absence of long-range interactions in these ACE models. This example is therefore mostly an illustration of robustness and performance.
Mean squared displacement (MSD) for three liquid water simulations in the NPT ensemble at 1 bar and 300 K. The simulation cell contained 5184 atoms.
B. The hyperactive learning (HAL) workflow
The HAL framework shares similarities with Bayesian optimization (BO), as the biasing term is formally equivalent to a lower confidence bound (LCB) acquisition function.36 Similarly to BO, the parameter τ adjusts the tradeoff between exploration and exploitation during the generation of training configurations using HAL. HAL-generated configurations are both energetically reasonable, guided by EACE (exploitation), and informative, as predicted by a relatively large value of σ (exploration). The bias towards uncertainty, mediated by an emergent biasing force during HAL dynamics, can be viewed as a strategy to acquire information gain by seeking out unseen (local) environments. The HAL approach can also be viewed as an adversarial attack, aimed to destabilize a fitted ACE potential such that, after iteratively adding sufficiently many new configurations, the linear ACE model is robust to such attacks, which all but guarantees stable dynamics over long timescales.
The biasing parameter τ in HAL necessitates careful tuning, which HAL achieves through an adaptive scheme14 that tunes τ on the fly by balancing the magnitude of the biasing force relative to the forces obtained by EACE. The relative biasing parameter τr used in this scheme is typically set to 0.1–0.2 and ensures that the biasing strength is reduced or increased depending on the degree of predicted uncertainty explored during the dynamics.
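As a sketch of this adaptive rule, simplified from Ref. 14 and with hypothetical names throughout, τ may be rescaled so that the mean biasing-force magnitude stays at a fraction τr of the mean ACE-force magnitude:

```julia
using LinearAlgebra, Statistics

# Simplified sketch of the adaptive biasing strength: rescale τ so that
# the mean biasing force is a fraction τr of the mean ACE force.
# F_ace and F_bias are vectors of per-atom force vectors.
function adaptive_tau(τr, F_ace, F_bias)
    mean_ace  = mean(norm.(F_ace))     # mean force magnitude from E_ACE
    mean_bias = mean(norm.(F_bias))    # mean magnitude of the biasing force
    return τr * mean_ace / (mean_bias + eps())
end
```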
Hyperactive learning (HAL) protocol. Linear ACE potentials are fitted using BRR or ARD, after which biased MD/MC steps are performed, controlled by the biasing parameter τ. Once the uncertainty metric fi exceeds ftol, a DFT calculation is triggered, a HAL iteration is completed, and the training database is extended.
1. AlSi10 melting temperature
The HAL framework was used to create an ACE potential for determining the melting temperature of the AlSi10 alloy. An initial dataset consisted of 108-atom random fcc lattice configurations, each containing 98 aluminium and 10 silicon atoms. This initial dataset was composed of five fcc random alloy configurations with atomic volumes ranging from 14.3 to 16.6 Å³/atom. The ACE basis included interactions up to correlation order 2 (three-body interactions) and employed a cutoff of 5.5 Å. The parameters were estimated using automatic relevance determination (ARD), with the sparsity set by minimising the BIC, which resulted in increasingly complex ACE models as more configurations (more information) were added. The chosen maximum polynomial degree Dmax during the HAL procedure increased from 4 to 12. The HAL relative biasing strength was set to τr = 0.2, and the relative uncertainty threshold to ftol = 0.2.
The HAL dynamics was used to melt the random alloy crystal structure by ramping the temperature from 0 to 1500 K at 1 GPa, using a 1 fs timestep. Atom-swap and volume-adjustment HAL-MC steps were taken to facilitate exploration of the (biased) energy landscape. After 18 HAL iterations, the ACE potential was already able to consistently perform 5000 HAL MD/MC timesteps without encountering new structures with high uncertainty. This final ACE potential contained 79 basis functions, as selected by ARD pruning.
During these 18 HAL iterations, the dimer curves were examined to ensure that the potentials exhibit attraction at typical interatomic distances and short-range repulsion, as illustrated in Fig. 6.
ACE dimer curves for pair interactions over several HAL iterations. Stronger colours indicate later HAL iterations. The key observation to be drawn from this figure is that, even in the early stages of the HAL process with very little available data, our priors ensure that the dimer curves are physically sensible, in particular smooth and repulsive.
The ACE potential obtained after HAL iteration 18 (fitted to 22 structures in total) was subsequently used to perform nested sampling (NS) simulations to model the liquid-solid phase transition. NS simulations were performed using 384 NS walkers, with a total decorrelation length of 512 formed by volume/shear/stretch/swap MC steps at a ratio of 4:4:4:4. The resulting heat capacity curves obtained by NS are presented in Fig. 7 and are in close agreement with the melting temperature of 867 K given by Thermo-Calc using the TCAL4 database.37
NS AlSi10 heat capacity curves for several runs indicating the liquid-solid transition as predicted by the HAL generated ACE potential.
2. Polyethylene glycol
The HAL framework38 was used to create a polyethylene glycol (PEG) model. To initialize HAL, 18 structures of PEG (n = 32, i.e., formed of 32 monomer units) in vacuum were evaluated using the ORCA code39 with the ωB97X DFT exchange-correlation functional40 and the 6-31G(d) basis set. ACE models were fitted to the initial and subsequent datasets with correlation order 3, total degree Dmax = 12, and a cutoff radius of 5.5 Å, using the ARD algorithm. The HAL protocol used a relative biasing parameter τr = 0.1 and uncertainty tolerance ftol = 0.3, and was performed at 500 K. Unlike the previous AlSi10 example, no cell-adjusting or atom-swapping HAL-MC steps were performed, as the configurations are isolated molecules in vacuum. We also chose not to change the ACE basis throughout the HAL procedure but rather to keep it constant (Dmax = 12), as the initial database was relatively diverse. After 50 HAL iterations, an ACE potential was generated that was deemed stable, as it completed 10⁴ HAL biased MD steps without triggering a DFT calculation. It was then used to determine the density of a PEG polymer formed of n = 200 monomer units in LAMMPS under periodic boundary conditions using the PACE evaluator.12 The PEG (n = 200) density was determined at 300, 350, and 400 K at 1 bar pressure over a timescale of 0.5 ns, as shown in Fig. 8. The density at 300 K is in good agreement with the experimental density of 1.2 g/cm³ at 293 K.41 This illustrates remarkable extrapolative performance of the linear ACE framework, as the DFT reference (ORCA) does not itself support periodic boundary conditions, making a determination of the PEG density purely from first principles otherwise impossible.
PEG (n = 200) density for the HAL-generated ACE potential under periodic boundary conditions using LAMMPS.
3. Perovskite CsPbBr3
We used the HAL framework38 to create a training dataset for the lead-halide perovskite CsPbBr3, which shows three relevant phases: orthorhombic at low temperatures, tetragonal at intermediate temperatures, and cubic at high temperatures, with experimental transition temperatures of 361 and 403 K.42 The HAL process was designed to sample all of these phases so that the resulting potential accurately represents the energy and entropy of each phase and is hence capable of predicting the transition temperatures. To ensure consistent DFT energies and effective vibrational mode sampling, approximately cubic 40-atom supercells were created for all three phases.
This problem required some refinement of the standard HAL procedure, and careful testing of fitted ACE potentials for several basis sizes. We therefore give more detail about the process than in the previous cases.
The initial fit starting the HAL process used a set of 15 randomly perturbed (unit cell and atomic positions) 40-atom configurations, five from each of the three high-symmetry phases. The default ACE basis was used, with a cutoff of 8 Å, a smoothness prior with p = 3, and the sklearn BayesianRidge linear solver. Automated basis selection was applied every 10 HAL iterations, with a maximum basis size of 2000, a maximum total polynomial degree of 16, and the model score as the selection criterion. To encourage exploration of a wide range of temperatures and configurations, over a maximum of 10⁴ 1 fs HAL MD steps the temperature was ramped from 200 to 600 K, and τr from 0.1 to 0.5. New fitting configurations were selected when the fractional force error exceeded ftol = 0.4. After 20 iterations starting from the three unperturbed high-symmetry 40-atom cells at fixed unit cell shape and size, the process was restarted from nine 80-atom high-symmetry cells, obtained by doubling each of the three 40-atom cells along each cell vector, for 20 additional iterations. Then 20 additional iterations were carried out with a variable unit cell and an applied pressure of 0.
At this point the model appeared to be stable for 10⁵ steps without a HAL bias, so we switched to an unbiased sampling process to gather more data and improve the model accuracy. Starting the fit from the complete set of configurations from the HAL process, we generated fitting configurations from 2000-step runs with a maximum basis size of 4000. These used the same 80-atom starting configurations, but at fixed temperatures of 200–500 K at 100 K intervals, with fixed shape but variable unit cell volume. To further refine the performance on the low-energy parts of the PES around each high-symmetry structure, we sampled 36 more configurations, each with 160 atoms (the three 40-atom supercells doubled along each of the three pairs of lattice vectors), at a range of lower temperatures, 150–300 K at 50 K intervals.
The original set of 15 randomly perturbed configurations, another similar set of 15, and the 168 HAL configurations were used as the reference database for a set of fits exploring the performance of the model over a wide range of basis sizes. At this stage we filtered out physically unreasonable fitting data, as defined by a criterion that excluded any force larger than 10 eV/Å, as well as the energies and virials from such configurations. To fit the model and evaluate its predictive accuracy we split the set of configurations into 75% fitting and 25% testing, stratifying the split by the HAL iteration (or initial random perturbation set) that produced the configuration. The same fitting procedure and basis as in the HAL run were used, with correlation orders 2 and 3 and maximum polynomial degrees from 4 to 16, up to a maximum basis size of 2 × 10⁴. We also compared three choices for the smoothness prior: none, p = 2, and p = 4.
The training set residuals, test set residuals, and BayesianRidge score (log marginal likelihood) are plotted as a function of basis size in Fig. 9. For each value of the correlation order, the fitting error improves monotonically as the basis size (and polynomial degree) increases, but at equal basis size the residuals of the correlation order two models are lower by as much as 25% (especially for moderately sized bases), indicating that for this system increasing the polynomial degree provides the basis with more useful flexibility than increasing the correlation order. For the basis size range where the error is minimized, the testing set residuals are larger than the fitting set residuals by about a factor of 2 or more, indicating that some amount of overfitting is occurring. The smoothness prior is successful at limiting the extent of this overfitting.
Fitting set residual (left), testing set residual (center), and log marginal likelihood (right) as a function of basis size for the CsPbBr3 ACE model fit to a database generated with HAL. Symbols indicate the correlation order, and colors indicate the smoothness prior exponent p.
The generally lower training and test errors of the correlation order two models relative to the correlation order three models are reflected in their Bayesian ridge scores (log marginal likelihoods). However, within each correlation order, the optimal choice of polynomial degree and corresponding basis size indicated by the minimum test error is not consistent with the score. Indeed, the results displayed in Fig. 9 lead us to conclude that the Bayesian ridge score is not always a reliable tool for optimal basis selection, and other options should be explored in the future.
We used the model with the lowest test set error, generated by the fit with correlation order 2, maximum polynomial degree 12, and smoothness prior p = 4, to simulate larger unit cells of CsPbBr3 at a range of temperatures spanning its expected range of phase transition temperatures. We simulated 32 independent constant temperature, constant pressure MD trajectories at temperatures from 200 to 355 K and zero pressure for 10⁴ time steps of 10 fs. Each trajectory started from an 8 × 8 × 6 supercell (7680 atoms) of the orthorhombic structure. To analyze the resulting structure we reconstructed the effective cubic lattice vectors and averaged their magnitudes over the last 8000 steps of each trajectory. A plot of these effective cubic cell lattice vector magnitudes as a function of temperature is shown in Fig. 10. We see the three expected phases, as indicated by the degeneracy of the lattice constants: cubic at high temperatures, tetragonal at intermediate temperatures, and orthorhombic at low temperatures. The transition temperatures are 240 and 255 K, which are substantially shifted relative to the experimental results of 361 and 403 K.42 We expect that this deviation from experiment is primarily due to our choice of exchange correlation functional, the Perdew–Burke–Ernzerhof generalized-gradient approximation,43 as has been seen in similar simulations.44 A direct comparison to DFT would be useful, but it would require an accurate calculation of the predicted phase transition temperatures directly from the DFT PES, which is too computationally demanding to be practical without additional approximations.
Effective cubic lattice constants at fixed temperature simulated using the ACE model with correlation order 2, maximum polynomial degree 12, and p = 4. All three values are identical (to within the estimated error) at T > 255 K, indicating a cubic structure. At lower temperatures these split into a single value and a group of two, consistent with a tetragonal structure, and at T < 240 K they split further into three distinct values, consistent with an orthorhombic structure.
IV. COMPUTATIONAL PERFORMANCE
The linearity of ACE potentials renders them not only interpretable but also efficient in terms of computational performance. To demonstrate this, a performance test was conducted on various linear ACE potentials referenced in this paper. The evaluation times, as well as some of the ACE hyperparameters used, are shown in Table III for the AlSi10, CsPbBr3, H2O, PEG, and Si potentials developed in this work. The number of basis functions for each model is given too, and may be fewer than for a complete ACE basis parameterized by the correlation order and Dmax, due to ARD pruning of basis functions with low relevance. The timings were obtained using the LAMMPS PACE implementation12 on a 128-core ARCHER2 node, equivalent to two separate AMD EPYC 7742 64-core processors at 2.25 GHz. The 10⁶ steps/day figures are equivalent to ns/day (with a 1 fs timestep) and were obtained for varying cell sizes to illustrate scaling. A standardized performance figure, in the form of core-μs/atom, is also provided. The fitted silicon database originates from the silicon GAP potential, whereas the AlSi10, PEG, and CsPbBr3 potentials were fitted using HAL-generated databases containing 22, 68, and 198 configurations, respectively, as discussed in the previous subsections.
Performance of linear ACE potentials for various systems using an ARCHER2 node utilising 128 cores for the 10⁶ steps/day figures (equivalent to ns/day using a 1 fs timestep). Core-μs/atom figures were obtained by performing simulations in serial.
| System | Order | Dmax | rcut (Å) | No. basis func. | 10⁶ steps/day (atoms) | Core-μs/atom |
|---|---|---|---|---|---|---|
| AlSi10 | 2 | 7 | 5.5 | 79 | 636 (32) | 23 |
| CsPbBr3 | 2 | 12 | 5.5 | 544 | 334 (20) | 93 |
| PEG | 3 | 12 | 5.5 | 4897 | 10 (1400) | 227 |
| Si | 4 | 20 | 6 | 5434 | 7 (250) | 744 |
V. CONCLUSION AND OUTLOOK
We introduced ACEpotentials.jl, a front-end for several Julia-language packages that implement Atomic Cluster Expansion (ACE) MLIPs and related functionality. This front-end provides a user-oriented interface, while the backend packages combine excellent performance with the flexibility for rapid model development and experimentation that is typical for the Julia language. The front-end ACEpotentials.jl exposes a relatively simple subset of ACE type models, linear models with robust priors, that we consider reliable in every-day use, especially in the context of an active learning type workflow.
However, we emphasize that the ACE framework allows for a much richer MLIP design space9,12,45–47 as well as the parameterisation of many other types of particle systems.48–51 We therefore conclude by mentioning some of those extensions, as well as current shortcomings, that require further development.
Robust parameter estimation, in particular hyperparameter tuning, remains under-investigated in the MLIP context. We regularly find that hand-tuned hyperparameters give superior results, basis sparsification remains poorly understood, and uncertainty estimates are often only indicative of actual errors. Further research is required to resolve these closely related issues.
The design space of the ACEpotentials.jl ACE models can be expanded to admit trainable radial embeddings, composition of ACE features with nonlinearities, or even multi-layer architectures such as those of Refs. 45 and 46. This comes at the cost of highly nonlinear and less efficient models, but some of those extensions, such as trainable radial embeddings, can be undertaken while keeping the spirit of our current ACE models: small models for rapid iterative development and low evaluation cost. A minimal illustration of composing linear ACE features with a nonlinearity is sketched below.
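The following sketch shows one common choice, a Finnis-Sinclair-style square-root readout applied to two independent linear combinations of invariant site features (cf. the nonlinear models of Ref. 12). The function and variable names are purely illustrative and not part of the ACEpotentials.jl API.

```julia
using LinearAlgebra

# B is an invariant site feature vector; θ1, θ2 are two independent
# coefficient vectors. The square root introduces the nonlinearity;
# abs(...) guards against a negative argument in this toy setting.
fs_site_energy(B, θ1, θ2) = dot(θ1, B) - sqrt(abs(dot(θ2, B)))

# Toy usage with a 5-dimensional feature vector.
B, θ1, θ2 = rand(5), randn(5), rand(5)
E_i = fs_site_energy(B, θ1, θ2)
```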
The extension to highly nonlinear models would likely require that the computational kernels on which ACEpotentials.jl is built also be made graphics processing unit (GPU) capable. Towards that end, a deep learning framework such as MACE46 [see also the mace (https://github.com/ACEsuit/mace) code] may be better suited.
Finally, we note that there are already several related ACE software packages within ACEsuit (https://github.com/ACEsuit) that implement a variety of models for other particle systems at different stages of development: Hamiltonians (Ref. 48, ACEhamiltonians.jl); wave functions (Refs. 50 and 51, ACEpsi.jl); jet tagging models (Ref. 49, BIPs.jl). These build on ACE.jl, an experimental and significantly expanded Julia-language ACE package.
ACKNOWLEDGMENTS
G.C. acknowledges support from EPSRC Grant No. EP/X035956/1. C.O., A.R., and T.J. were supported by NSERC Discovery Grant No. GR019381 and NFRF Exploration Grant No. GR022937. W.J.B. was supported by US AFRL Grant No. FA8655-21-1-7010. C.v.d.O. and G.C. acknowledge ARCHER2 for which access was obtained via the UKCP consortium and funded by EPSRC Grant No. EP/P022065/1. N.B. was supported by the U.S. Office of Naval Research through the U.S. Naval Research Laboratory’s fundamental research base program. E.G. acknowledges support from the EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies with Grant Reference No. EP/S024220/1. W.C.W. was supported by the Schmidt Science Fellows in partnership with the Rhodes Trust, and additionally acknowledges support from EPSRC (Grant No. EP/V062654/1). J.K. and C.O. acknowledge funding from the Leverhulme Trust under grant RPG-2017-191 and the EPSRC under Grant No. EP/R043612/1. J.K., J.P.D. and G.C. acknowledge support from the NOMAD Centre of Excellence funded by the European Commission under grant agreement 951786. J.K. acknowledges support from the EPSRC under Grant Nos. EP/P002188 and EP/R012474/1. This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the EPSRC (capital Grant No. EP/T022159/1), and DiRAC funding from the STFC (www.dirac.ac.uk). Further computing facilities were provided by the Scientific Computing Research Technology Platform of the University of Warwick.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
William C Witt: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cas van der Oord: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Elena Gelžinytė: Software (equal); Writing – original draft (equal). Teemu Järvinen: Software (equal); Writing – original draft (equal). Andres Ross: Methodology (equal); Software (equal); Writing – original draft (equal). James P. Darby: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cheuk Hin Ho: Data curation (equal); Investigation (equal); Methodology (equal); Software (equal). William J. Baldwin: Data curation (equal); Investigation (equal); Software (equal); Writing – original draft (equal). Matthias Sachs: Methodology (equal); Writing – original draft (equal). James Kermode: Software (equal); Writing – review & editing (equal). Noam Bernstein: Data curation (equal); Investigation (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Gábor Csányi: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Christoph Ortner: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available on GitHub and are additionally available from the corresponding author upon reasonable request.
APPENDIX: LINEAR SCALING COST AND COMPUTATIONAL KERNELS
In Secs. II A and II B we outlined some basic ideas behind the ACE model, in particular expressing the potential energy model in terms of the many-body expansion (Sec. II A). A naive implementation of the many-body expansion results in prohibitive computational cost due to the exponential cost of the sums over clusters (j1, …, jν). However, after discretizing the many-body potentials U(ν) of the self-interacting many-body expansion (2b), the sum can be rewritten to yield a linear scaling cost. This is presented in detail in, for example, Refs. 9, 11, and 12; hence we shall not review the process in full here. In order to outline what is involved in an implementation of an ACE potential, we only recall the form that the ACE model takes after this re-organisation of the many-body summation. The evaluation of the self-interacting ACE basis then proceeds in the following stages:
- Embeddings and pooling: evaluation of the one-particle embeddings Rnl(rij, Zi, Zj) and Ylm(r̂ij), followed by summation over all neighbours j to form the atomic basis

$$A_{i,nlm} = \sum_{j \neq i} R_{nl}(r_{ij}, Z_i, Z_j)\, Y_l^m(\hat{\boldsymbol{r}}_{ij}). \tag{A1}$$

- Product basis: for lexicographically ordered tuples $\boldsymbol{k} = (k_1, \dots, k_\nu)$ with $k_t = (n_t, l_t, m_t)$, we define

$$\boldsymbol{A}_{i,\boldsymbol{k}} = \prod_{t=1}^{\nu} A_{i,k_t}. \tag{A2}$$

This operation can be thought of as a sparse symmetric tensor product, or as taking ν-correlations.

- Symmetrization: to ensure invariance one averages the product basis over all rotations, resulting in the O(3)-invariant basis

$$B_i = C\, \boldsymbol{A}_i \tag{A3}$$

employed in the definition of the linear ACE model (3). Here, Ai is the vector of product basis functions (A2), while C is a sparse matrix.
For each of these stages efficient computational kernels are implemented, designed in a modular way so that they can be independently optimized or composed into new model architectures.
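To make the structure of these stages concrete, the following toy sketch evaluates a site basis in plain Julia. The embedding phi, the tuple list kk_tuples, and the matrix C are stand-ins chosen for illustration; they are not the actual ACEpotentials.jl kernels.

```julia
using SparseArrays

# Toy version of the pipeline (A1)-(A3) for a single site i.
function ace_site_basis(Rs, phi, kk_tuples, C)
    A  = sum(phi(r) for r in Rs)                      # pooling over neighbours, (A1)
    AA = [prod(A[k] for k in kk) for kk in kk_tuples] # sparse ν-correlations, (A2)
    return C * AA                                     # symmetrization, (A3)
end

# Example: a polynomial embedding of a scalar "distance".
phi(r) = [r^n for n in 0:3]
Rs = [1.0, 1.2, 0.9]               # toy neighbour data
kk_tuples = [(1,), (2,), (2, 3)]   # lexicographically ordered index tuples
C = sparse([1.0 0.0 0.0; 0.0 1.0 0.5])
B = ace_site_basis(Rs, phi, kk_tuples, C)
```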
1. Canonical many-body expansion
Under the condition that the radial basis and envelope function are pure polynomials, it is possible to transform the self-interacting ACE basis Bi defined in (A3) into a basis for the canonical many-body expansion (2a). The idea behind this procedure is sketched in Ref. 11; the precise details of the implementation and a detailed study are beyond the scope of this review. Here, we only mention that, upon slightly extending the Rnl, Ai, and product bases, one can obtain a “purification operator” P such that the linearly transformed product basis becomes a basis for the canonical many-body expansion (2a). The symmetrisation (A3) can then be applied to obtain an O(3)-invariant basis for the canonical expansion.
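Schematically, and in the notation of (A2) and (A3), the purification and subsequent symmetrisation compose into a single linear map, as the following display is meant to suggest (a sketch only; the explicit construction of P is developed in Ref. 11):

```latex
% Purification followed by symmetrization, written as linear maps:
\boldsymbol{A}_i^{\mathrm{pur}} = P\,\boldsymbol{A}_i,
\qquad
B_i^{\mathrm{pur}} = C\,\boldsymbol{A}_i^{\mathrm{pur}}
                   = (C P)\,\boldsymbol{A}_i .
```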