We introduce ACEpotentials.jl, a Julia-language software package that constructs interatomic potentials from quantum mechanical reference data using the Atomic Cluster Expansion [R. Drautz, Phys. Rev. B 99, 014104 (2019)]. As the latter provides a complete description of atomic environments, including invariance to overall translation and rotation as well as permutation of like atoms, the resulting potentials are systematically improvable and data efficient. Furthermore, the descriptor’s expressiveness enables use of a linear model, facilitating rapid evaluation and straightforward application of Bayesian techniques for active learning. We summarize the capabilities of ACEpotentials.jl and demonstrate its strengths (simplicity, interpretability, robustness, performance) on a selection of prototypical atomistic modelling workflows.

Machine-learning interatomic potentials (MLIPs) continue to revolutionize the fields of molecular and materials simulation.1,2 MLIPs provide the means to simulate atomistic systems at or close to the accuracy of electronic structure methods, while being computationally cheaper by orders of magnitude. They make the simulation of large-scale systems and long time-scales at high model accuracy accessible and have therefore become an indispensable tool for atomic-scale simulation. Recent reviews of the field are provided in Refs. 36. Of particular relevance to the present work are the methods introduced in Refs. 2 and 79.

To create an MLIP, one begins with a flexible functional form, constrained only to comply with the natural symmetries of the potential energy in three-dimensional space, then estimates its parameters using reference data, typically in the form of energies, forces, and virial stresses for a set of representative atomic configurations. Ordinarily, the data are generated with quantum mechanical techniques, such as density functional theory (DFT) calculations, which may be performed only for relatively small structures. A well-trained MLIP is then expected to provide accurate predictions of processes on similar but also much larger spatial scales.

The Atomic Cluster Expansion (ACE) introduced in Ref. 9 is a particular MLIP flavor that is flexible, theoretically well founded, interpretable, and for which it is straightforward to tune the cost-accuracy balance. It is establishing itself as a successful MLIP approach for a wide range of tasks, especially but not exclusively in materials simulation; see e.g., Refs. 1017. Linear variants of the ACE model have been found remarkably data efficient and computationally efficient and as such have proven particularly useful for active learning (AL) workflows14 as Secs. III and IV will demonstrate. Linearity in particular enables sensitivity analysis and a path towards reliable uncertainty quantification.

This article describes ACEpotentials.jl, which ties together a collection of Julia-language packages to expose a user-oriented interface facilitating the convenient construction of ACE MLIPs. To highlight the ease of use of our package, Listing 1 provides a complete Julia-language example that produces an ACE potential for a TiAl dataset.

LISTING 1.

A minimal Julia-language script for fitting an ACEpotentials.jl potential. It first downloads a training dataset, then uses acemodel to create a model object whose parameters are explained fully in the following sections. The model parameters are estimated with the acefit! command, and the result is exported in a LAMMPS compatible format.

 

 

At the time of writing, ACEpotentials.jl provides interfaces for linear ACE models, which give good accuracy as well as performance both in parameter estimation and prediction. We have incorporated a range of geometric and analytical priors into the default model parameters that have proven robust in a range of tasks, including the challenging low data regime arising in active learning workflows. ACEpotentials.jl models can be used for molecular dynamics (MD) simulation in Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS),18 the Atomic Simulation Environment (ASE)19 and Molly.jl.20 

The Julia-language codes on which ACEpotentials.jl builds are written with ease-of-use, performance, and flexibility of model development in mind. Several variations and extensions of the ACE model implementations discussed in this article are under active development. The choice of Julia as the development language enables seamless transition from rapid prototyping to performance optimization. Moreover, Julia is establishing itself as leader in scientific machine learning (see, e.g., Ref. 21), facilitating highly customized model architectures with novel computational kernels.

Finally, we emphasize that the aim of this article is to illustrate the capabilities of ACEpotentials.jl but not to precisely document its use; for the latter see the reference material at Ref. 22, which will evolve along with the software. While the examples and code snippets provided throughout this article are compatible with the present version of ACEpotentials.jl, they should be taken primarily as illustrations of how the package may be used. The documentation will be kept up-to-date for the foreseeable future and will continually expand to describe additional options and features.

1. Model specification

An atomic structure is described by a collection of position-element pairs (ri, Zi), and the computational unit cell (with open or periodic boundary conditions). In the ACE model, the total potential energy of such a structure is decomposed into site energies,
E=iεi,
(1)
where the summation ranges over all atoms belonging to the computational cell and each ɛi depends on its atomic neighbourhood containing all atoms within a cutoff radius rcut from ri, taking into account the boundary conditions. The ACE framework provides a design space to construct systematic models for the site energy ɛi in terms of a complete linear basis of body-ordered symmetric polynomials.
For convenience we introduce the new variables xi ≔ (ri, Zi) for the state of an atom and xij ≔ (rij, Zi, Zj), where rij = rjri, for the state of a bond between atoms xi, xj. In terms of these variables the site energy is expanded in body-order, in two different formulations:
εi=V(0)(Zi)+j1V(1)(xij1)+j1<j2V(2)(xij1,xij2)++j1<<jν̄V(ν̄)(xij1,,xijν̄)
(2a)
=V(0)(Zi)+j1U(1)(xij1)+12!j1,j2U(2)(xij1,xij2)++1ν̄!j1,,jν̄U(ν̄)(xij1,,xijν̄).
(2b)
We call the first formulation (2a) the canonical cluster expansion. It can be transformed9 into the second formulation (2b), where the sums run over all possible combinations of atoms, including all permutation-equivalent clusters and even “artificial clusters” with repeated particles. This transformation introduces unphysical self-interaction terms such as V(2)(xij, xij), but this counter-intuitive choice leads to a tensor product structure that can be exploited in constructing a highly efficient evaluation scheme. Our code is unique in that it implements the transformation between the two descriptions and also allows the evaluation of the canonical formulation (2a). Indeed the default ACEpotentials.jl model specification uses a combination of the two formulations. We will briefly review the challenges involved in evaluating cluster expansion models in the  Appendix.

Both series in Sec. II A are truncated versions of an exact body-order expansion. An exact expansion would include terms up to the number of atoms in the system, while here the maximum body-order is ν̄+1 (corresponding to a correlation order of ν̄), which constitutes the first approximation parameter. In practice, the truncation is performed at low to moderate ν̄ (typically 5 or less) for several reasons, including control of model complexity and computational cost.

Each potential V(ν) (or, U(ν)) is parameterized by a linear model, a process for which we give details below in the following sections. This then results in a parameterisation of the site energy that is also linear,
εi=cBi,
(3)
where c is a vector of parameters and Bi a vector of basis functions (or, features) involved in the expansion of the many-body potentials V(ν) or U(ν). The basis functions are by construction invariant under rotations, reflections and permutations of like atoms. The representation is also complete (or, universal) in the sense that when the approximation parameters (body-order, cutoff radius, and expansion resolution) are taken to infinity, the model can in principle represent an arbitrary smooth site-energy potential. Linearity of the model allows us to employ a vast range of established tools for parameter estimation and uncertainty quantification, and enables rapid model development by refitting to new training data or with adjusted hyperparameters.

The basis functions Bi specify the model. In a typical example this can be done as demonstrated in Listing 2.

LISTING 2.

A typical construction of an ACE model and description of parameters.

 

 

The model object specifies the model site energy potential, from which derived properties such as potential energy, forces and virial stresses can be computed that are used in molecular statics, molecular dynamics or sampling algorithms.

There are many additional parameters and options available to specify an ACE model, some of which we discuss throughout the remainder of this paper. For a complete list of options we refer to the documentation.22 We only remark briefly on the Eref parameter: We recommend the explicit specification of the one-body term V(0). We observed in many tests that constraining V(0)(Zi) to be the energy of a single isolated atom with atomic number Zi yields more chemically realistic potentials that are more robust in practical molecular dynamics and molecular statics simulations, especially those involving breaking and forming bonds. One provides this information to an ACE model as shown in Listing 2, line 6.

In the remainder of this section we maintain a focus on high level intuitive understanding of options and parameters and avoid details and technicalities of the ACE framework as much as possible. For those details we refer to the  Appendix. and to the many publications now available on the subject.6,9,11,12

2. Parameter estimation

Having specified a physically reasonable model architecture, we must now estimate its parameters. To that end we require a training set, which typically consists of a list of atomic structures, R = {R}, for which the total potential energy ERR, forces FRR3×NR (with NR the number of atoms in the computational unit cell) and possibly also virial stresses VRR6 (in Voigt notation) have been evaluated with an electronic structure model. We define E(cR), F(cR), V(cR) as the corresponding energies, forces and virials for the structure R in the ACE model, with parameters c. The simplest way to estimate those parameters is then to minimize the least squares loss function
L(c)=RRwE,R2|E(c;R)ER|2+wF,R2|F(c;R)FR|2+wV,R2|V(c;R)VR|2.
(4)
The weights wE,R, wF,R, wV,R can be used to give more or less relative “importance” to certain structures or observations. They are usually highly structured (e.g., wE,R, wV,R are scaled with the number of atoms in a structure R), which will be discussed in more detail in Sec. II E. Since the ACE model is linear in c it follows that L(c) is quadratic, which means that minimizing L is a linear least squares problem. A wide range of efficient numerical techniques exist for its solution. In particular we will normally employ regularized or Bayesian variations of the naive least squares minimization, which are discussed in Secs. II E and II F.

In Listing 3 we read in such a prepared training set provided in the extended XYZ format and then estimate the model parameters with a default solver (Bayesian Linear Regression; cf. Sec. II F). Several steps are combined and hidden from the user in the acefit! convenience function, but all these steps can in principle also be performed manually, e.g., to explore different parameter estimation algorithms that are currently not interfaced by ACEpotentials.jl. In line 5 of the listing, the fitted model is exported to a format that can be used for molecular dynamics simulations in LAMMPS.

LISTING 3.

A representative example loading a training dataset and estimating ACE model parameters.

 

 

In the remainder of Sec. II we will dive slightly deeper into some the steps we outlined above. Then, in Sec. III we will demonstrate how the framework can be used to fit potential energy models for realistic materials and molecular systems of scientific interest.

The parameters in the model specification in Listing 2 specify a basis in which the V(ν) potentials are expanded. In the current section we will detail the basis functions that are employed, while in Sec. II C we will then explain how to select a finite subset from the infinite complete basis set.

1. One-particle basis

To begin we must select a one-particle basis ϕk in which all smooth functions f(xij) = f(rij, Zi, Zj) can be expanded. The most general form we consider is
ϕznlm(rij,Zi,Zj)=Rnl(rij,Zi,Zj)Ylm(r̂ij)δzZj,
(5)
where δ denotes the Kronecker symbol and we have identified k = (z, n, l, m). The Ylm are the standard complex spherical harmonics, while Rnl is called the radial basis. The choice of Ylm to embed the angular component r̂ij facilitates the exact symmetrization of the parameterisation with respect to rotations. Since (rij, Zi, Zj) is already invariant under rotations, the choice of Rnl is extremely general. Nevertheless we will below outline a heuristic that leads to a narrow class of choices that have proven successful in many applications. However, we note that the optimal choice of Rnl remains an active area of research and will likely also evolve within ACEpotentials.jl.
Once ϕk is selected, each potential V(ν) (or, U(ν)) is expanded in terms of a tensor product many-body basis,
V(1)xij1=k1ck1(Zi)ϕk1(xij1)V(2)xij1,xij2=k1,k2ck1k2(Zi)ϕk1xij1ϕk2xij2V(ν̄)xij1,,xijν̄=k1,,kν̄ck1kν̄(Zi)ϕk1xij1ϕkν̄xijν̄
(6)
The model parameters ck1kν(Zi) will be estimated from data. Note that we choose individual model parameters for each center-atom element Zi. During the parameter estimation, the parameters will be constrained to guarantee invariance of the resulting potentials under rotations and reflections of an atomic environment. Invariance under permutations is already ensured through the summation in Sec. II A. The  Appendix reviews additional details of this invariant basis construction, resulting in the specification of Bi in terms of which site energy is defined in (3).

To complete the model specification two steps remain: (i) the choice of radial basis Rnl; and (ii) the selection of basis functions (k1, …, kν) that we employ in the expansions (6). In the remainder of this section we discuss (i) while (ii) will be discussed in Sec. II C.

2. Radial basis

There is considerable freedom in the choice of the radial basis Rnl, which can be thought of as a geometric prior. For example, it incorporates the interaction range (cutoff radius, rcut) and can be tuned to capture rough qualitative information about interacting atoms. In the following we describe a class of radial bases, available through ACEpotentials.jl, that require no data-driven optimization and thus leads to genuinely linear models. At the time of writing this article, ACEpotentials.jl supports radial bases indexed by n only, i.e. Rnl = Rn for all l. This class is described by
Rn(rij,Zj,Zi)=fenv(rij,Zj,Zi)Pny(rij,Zj,Zi),
(7)
with the following components (Listing 4):
  • y is an element-dependent distance transform, which can be used to impose increased spatial resolution where needed, especially near the equilibrium bond-length. We typically employ
    y(rij,Zi,Zj)=1+a(r/r0)q1+(r/r0)qp1,
    where r0 is an estimate of the equilibrium bond-length in the system and a is chosen to maximize the gradient of y at r = r0, thereby maximizing resolution for nearest-neighbour interaction.

    The idea behind this transform is that it behaves as rq for large r and as 1 − rp/a for small r thereby decreasing resolution in those two limits at rates determined by the parameters p, q. The reduction in resolution in the small r regime is desirable when no data is available to specify the model in that regime; see also Fig. 1.

    • Pn is an orthogonal basis in y-coordinates. Our default choice is the Legendre orthogonal polynomial basis, which implicitly assumes equidistribution of resolution in y-coordinates.

    • Finally, fenv is an envelope that specifies the cutoff radius rcut.

      – The default and canonical choice for the many-body basis is
      fenv(rij,Zi,Zj)=y2(yycut)2,
      where ycut = y (rcut, Zi, Zj).
      – The default choice of envelope for the pair potential U(1) or V(1) is Coulomb potential tilted to ensure a smooth cutoff,
      fenv(rij,Zi,Zj)=rijr01rcutr01+rcutr02rijr0rcutr0,
    which is repulsive as rij1 as r → 0 but continuously differentiable at the cutoff.

FIG. 1.

Center: a typical interaction potential V(r), plotted in r-coordinates. Left: a coordinate transform y = y(r) to a non-dimensional variable y that increases resolution near r = r0 where the potential minimum is located and decreases resolution below rmin (the radial distance occuring in the training dataset), to zero near r = 0 where there is no data (and the envelope fenv becomes relevant) and near r = rcut where the potential converges to a constant. The histograms show the distribution of a typical dataset in both r- and y-coordinates. Right: the interaction potential plotted (i) in transformed coordinates V(r(y)), (ii) with the default pair envelope removed and (iii) with the theoretically optimal, typically unknown, envelope removed. The parameterisation and the smoothness priors are not applied to the original potential V(r) but to the transformed potential V(y)/fenv(y).

FIG. 1.

Center: a typical interaction potential V(r), plotted in r-coordinates. Left: a coordinate transform y = y(r) to a non-dimensional variable y that increases resolution near r = r0 where the potential minimum is located and decreases resolution below rmin (the radial distance occuring in the training dataset), to zero near r = 0 where there is no data (and the envelope fenv becomes relevant) and near r = rcut where the potential converges to a constant. The histograms show the distribution of a typical dataset in both r- and y-coordinates. Right: the interaction potential plotted (i) in transformed coordinates V(r(y)), (ii) with the default pair envelope removed and (iii) with the theoretically optimal, typically unknown, envelope removed. The parameterisation and the smoothness priors are not applied to the original potential V(r) but to the transformed potential V(y)/fenv(y).

Close modal

While the envelope for the many-body potential is canonical, for the pair potential envelope there is significant scope for inserting prior modelling knowledge of the system of interest. For example, one could replace the r−1 type behaviour with rp + rq to obtain different behaviour as r → 0 and rrcut, or in fact one could incorporate the Ziegler-Biersack-Littmark (ZBL) potential23 to obtain asymptotically exact repulsion.

LISTING 4.

A example demonstrating more fine-grained control over the choice of radial basis Rnl. The function transformed_jacobi_env constructs the polynomial basis from which the radial basis is constructed, which can be within the general class of Jacobi polynomials, but is normally taken to be the Legendre basis in transformed y coordinates.

 

 

The effect of the distance transform y = y(r) and of the envelope function are visualized in Fig. 1.

  • Repulsion restraint: The construction outlined above means that, in the canonical cluster expansion formulation, the pair potential is given by
    V(1)(rij,Zi,Zj)=fenv(rij,Zi,Zj)pZiZj(yij),
    where pZiZj is a polynomial in transformed y coordinates. By imposing the constraint that pZiZj(y0)=1, where y0 = y(0, Zi, Zj), we ensure that Efenv(rij) as rij → 0. This guarantees repulsive behaviour of the total energy, independently of whether or not this is provided through the training data. In practice we enforce this weakly through a mild restraint to give the potential more flexibility.

We now turn towards the second aspect of basis construction: how to select which of the infinitely many tensor product basis functions
ϕk1ϕkν,
(8)
specified by the tuples (k1, …, kν), we wish to incorporate into the expansion of the (ν + 1)-body potential V(ν).

1. Sparse basis selection

Recall that kt = (zt, nt, lt, mt), and that the bound |mt| ≤ lt on mt automatically gives a selection of possible mt values once lt bounds are chosen. Roughly speaking, nt, lt measure how oscillatory the corresponding basis functions are in, respectively, the radial rt and angular r̂t coordinates. Therefore one typically puts upper bounds ntnmax and ltlmax in the basis selection, i.e. one chooses all basis functions (k1, …, kν) in the expansion for which these bounds are satisfied. Lower bounds lead to a smaller basis, but also less flexibility and correspondingly lower accuracy on the training set.

This simple strategy is available in ACEpotentials.jl but the default usage takes the notion of regularity a step further and bounds the mixed regularity of the basis functions we select. This is done by choosing a maximum total degree totaldegree(ν) for each correlation order ν and choosing all basis functions (k1, …, kν) such that
1νν̄andt=1νnt+wLlttotaldegree(ν).
(9)
The additional weight wL allows us to select whether we require lower or higher resolution of the angular vs radial components of the interaction. Note that a higher weight wL decreases the angular resolution. The resulting selected basis is much sparser and is appropriate for parameterising very smooth functions in high dimension.

The default usage is that totaldegree(ν) takes the same value for all ν but one may also specify a separate total degree for each correlation order ν. For example, Listing 5 demonstrates how to select a stronger weight wL = 2.0 thus providing less angular resolution, as well as how to select total polynomial degrees 25, 23, 20, 10 for, respectively, parameterising V(1), V(2), V(3), V(4).

LISTING 5.

Construct an ACE model with finer control on the sparse selection of basis functions.

 

 

Significant further fine-tuning of the basis specification is possible, e.g. choosing different total degrees and wL parameters for different interacting species. This is explained in the package documentation.22 

2. Smoothness prior

The foregoing discussion concludes the model architecture specification. An issue closely related to the sparse basis selection (9) is the definition of a smoothness prior that may be employed for ridge regression (regularized least squares) which we discuss in Sec. II E or in the Bayesian framework of Sec. II F. As explained above, the value
t=1νnt+wLlt
is a qualitative estimate for how oscillatory or smooth a basis function (8) is. We can extend this definition slightly by adding another parameter p and defining
γznlmt=1νntp+wLltp,
(10)
where z=(zt)t=1ν,n=(nt)t=1ν,l=(lt)t=1ν and m=(mt)t=1ν. We then collect these parameters into a diagonal matrix Γ with Γkk = γk. If c are the model parameters then ‖Γc2 will be a rough estimate for how smooth the potential energy surface (PES) is.

The matrix Γ also serves as a smoothness prior within the Bayesian interpretation of ridge regression: the prior distribution for the model parameters c is given by a multivariate normal distribution that is centered at the origin and has variance proportional to Γ−2; see Secs. II E and II F. In ACEpotentials.jl this operator can be constructed as shown in Listing 6, with p = 4, wL = 1 the default.

LISTING 6.

Construct an operator that estimates the smoothness of the MLIP model, to be used as a Tikhonov regulariser, or prior in a Bayesian framework.

 

 

The resulting operator Γ may now be used to specify the regularizer (or prior) of parameter estimation algorithms, e.g., in Listing 3, line 2 and explained in more detail in Secs. II E and II F. A key point is that Γ is a rigorous smoothness prior for the canonical cluster expansion (2a) but only a heuristic for the self-interacting expansion (2b).

It is interesting in general, but in particular in the low-data regime, to explore different choices of priors. Two particular variants that are also available in ACEpotentials.jl are the exponential and Gaussian priors
γznlmexp=expαltlt+αntnt,
and
γznlmgauss=expσltlt2+σntnt2,
which enforce even stronger smoothness requirements than the algebraic prior (10) and are currently still experimental features.

In the foregoing sections we discussed in some depth how an ACE interatomic potential architecture can be conveniently specified. The next task is to estimate the parameters matching the model to training data.

A training dataset consists of a collection of reference structures, R = {R}, each with associated potential energy ERR, forces FRR3×NR and, when appropriate, virials VRR6 (Voigt notation). The reference energies, forces and virials are typically obtained by evaluating a “high fidelity” reference potential energy surface for which we wish to obtain an ACE surrogate model. Density Functional Theory is a common choice, but higher levels of theory such as Coupled-Cluster methods are also used, especially for non-periodic systems. In addition each training structure should be given a label that specifies related sub-groups. For example, these subgroups could indicate different phases of a material, and the resulting labels might be “bcc,” “fcc,” “liquid.” The label could also indicate the MD temperature from the which the structures were generated, e.g. “fcc500K” or “liquid2500K.” This allows convenient filtering of the training set, e.g., for assigning training weights (cf. Sec. II E) or fitting to subsets.

Acquisition of training data need not be performed within the ACEpotentials.jl package, but can be undertaken in any simulation software that makes it convenient to generate and manipulate atomic structures, perform molecular dynamics or Monte Carlo simulations, and to evaluate structures using a high fidelity electronic structure model. Because of the general ease of use and in particular ease of interoperability with the Julia molecular simulation eco-system, we often use the Atomic Simulation Environment.19 

The standard format for storing and retrieving a training set in ACEpotentials.jl is the extended XYZ format and can be read as shown in Listing 7. This results in a list of atomic structures storing the structure information as well as the training data.

LISTING 7.

Reading a training set from an extended XYZ file.

 

 

1. Overview of training set acquisition

The acquisition of training data is often the most time-consuming aspect of MLIP development. An in-depth discussion goes beyond the scope of this software review article; important details can be found for example in Refs. 5, 12, and 2426. In the remainder of this section we give an outline of general strategies to consider, while in Sec. III we go into practical aspects how training sets can be constructed in a few prototypical applications and what kind of tools ACEpotentials.jl provide to support that task.

The overarching requirements are that training sets (1) must contain small enough atomic structures that they can be evaluated using high-fidelity electronic structure models; and (2) must contain snapshots of all possible local atomic configurations one expects to encounter during simulation and prediction tasks. Thus, generating a training set reduces to generating representative atomic structures which are then evaluated with the reference model to obtain target potential energies, forces and virials. While the latter is usually straightforward and varies little between projects, there is no standard way yet to generate the training structures. The choice will depend on the atomic system at hand, and the simulation tasks that the model must be able to perform reliably, e.g. which system properties (observables) are to be modelled.

As a first step, one should “sketch out” the parts of the potential energy landscape that are of interest, e.g. construct one representative structure for each distinct energy minimum of interest. This might include different phases or material defects that the final model should be able to describe. Next, one generates random samples from those sketches for example by displacing the atom positions (randomly, along normal modes, volume scans, and so forth), or by subsampling an ab initio molecular dynamics trajectory. After collecting a seemingly adequate number of training structures (the total number of observations should normally exceed that number of parameters) one can fit a first model and test that model’s accuracy with respect to some target property. If the accuracy is inadequate, or the model not robust (e.g., an MD simulation is unstable), then a good strategy is to proceed with an iterative model refinement process. In each iteration additional training structures are selected to converge the model’s accuracy with respect to the target properties of interest. One might add hand-crafted structures to fix a particular flaw (e.g. to improve description of inter-molecular interaction in a molecular liquid or include supercells with vacant atomic sites) or model-driven MD to less computationally expensively explore relevant parts of Potential Energy Surface (for example, low potential energy regions to bring potential-Boltzmann-sample closer to reference-Boltzmann-sample and wider temperature/pressure range than intended for application of interest to make the model-driven simulations more stable).

Iterative model refinement is closely related to active learning. That strategy assumes that there is an accurate and efficient way available to estimate model uncertainty. During a simulation task, for example a molecular dynamics simulation, when a structure with high uncertainty is encountered it is evaluated with a reference method and added to the training data. To accelerate this process, we developed Hyper-Active Learning,14 which biases molecular dynamics simulation towards high-uncertainty and high predicted error regions. This strategy is sometimes capable of more rapidly generating many independent training samples. Section III will go into some details how this strategy is used in practice.

Recall from Sec. II A that the linear ACE models are parameterized linearly as shown in (3). As described in Sec. II D we estimate parameters by matching the model to observations of total energies, forces and virials evaluated via a high fidelity reference model on different training structures RR, where R denotes the training set. To estimate the parameters we minimize the loss function (4). In the current section, we go into further details of the parameter estimation process once the model and training set have been specified.

First, we discuss the regression weights wE,R, wF,R and wV,R, which allow users to specify the relative importance of different observations and structures. In principle one could specify individual weights for each structure R and observation type E, F, V. In practice, it has proven convenient to label all structures R with a configuration type as described in Sec. II D and to assign weights according to such groups. In addition the weights wE,R, wV,R should scale like 1/NR where NR denotes the number of atoms in the structure R.2,12 Thus, the weights wE,R, wV,R take the form
wE,R=w̃E,cfgtype(R)NR,wF,R=w̃F,cfgtype(R),wV,R=w̃V,cfgtype(R)NR,
with w̃*,cfgtype defined by the user as follows: Suppose, for example, that a training set contains several solid phase structures as well as liquid structures, then we may wish to demand a higher fit accuracy on the solid structures. In addition we typically find that energies must be given higher weights in order to achieve the best possible balance of accuracy. This might result in weight specifications as shown in Listing 8, lines 4 and 5.
LISTING 8.

Prototypical parameter estimation script, using some simple control over regression weights and solver parameters.

 

 
Next we discuss the minimization of the loss. Since all observations we consider here are linear, the minimization of L(c) can be rewritten in the form
arg mincW(yAc)2,
(11)
where y is a vector containing the observation values ER,FR,VR, A is the design matrix containing the ACE basis values corresponding to those observations and W a diagonal matrix containing the weights wE,R, wF,R, wV,R. Solving the linear least squares system (11) often results in overfitting, hence one almost always employs regularized methods, for example the ridge regression formulation,
arg mincW(yAc)2+λΓc2,
(12)
where Γ specifies the form of the regularizer and λ a scaling parameter determining the relative weight of the regularisation. This formulation of the least squares problem is often also called regularized least squares, and the λ‖Γc2 term is often called generalized Tikhonov regularisation. The default for Γ is zero or the identity, depending on the choice of solver. Our recommendation is to use the smoothness prior introduced in (10) instead for most solvers. Automatic relevance determination (ARD) is unique amongst the ridge regression solvers available in ACEpotentials.jl in that it estimates a regularizer Γ from the sensitivity of the parameters to the training data, at additional computational cost; see Sec. II F for more details.

To solve the ridge regression problem (12), ACEpotentials.jl employes the package ACEfit.jl (https://github.com/ACEsuit/ACEfit.jl), which offers a range of such algorithms. In the simplest setting, it can be used as shown in Listing 8, lines 6 and 7. For a list of the most important solvers, see Table I. For large models and/or large datasets, the parameter estimation task can be computationally challenging and may have to be performed on a cluster.

TABLE I.

Table of solvers for the ridge regression problem (12).

QR QR decomposition: Direct solution of the ridge regression problem (12). Tikhonov regularisation is imposed by extending the 
 linear system. This method should rarely be used in practice and is included mostly for theoretical interest and the sake of 
 completeness. 
 solver = QR(lambda = 0.0) 
LSQR Krylov method: The standard iterative Krylov algorithm to solve the ridge regression problem (12). Tikhonov regularisation is 
 imposed implicitly in the algorithm, with damp corresponding to the parameter λ. Early termination, by adjusting atol provides an 
 additional and different form of regularisation. This algorithm is suitable for very large-scale parameter estimation problems. 
 solver = LSQR(damp = 1 × 10−4, atol = 1 × 10−6) 
RRQR Rank-revealing QR decomposition: A random matrix sketching approach, which is computationally more efficient than the 
 standard QR decomposition. In addition, the parameter rtol is closely related to λ in (12) but not identical. Instead of adding a 
 Tikhonov term, RRQR regularisation is imposed by removing highly sensitive subspaces as determined by rtol. For large problems, 
 this algorithm is more performant than the standard QR decomposition. 
 solver = RRQR(rtol = 1e−5) 
BLR Bayesian linear regression: (or, Bayesian ridge regression) specifies a class of solvers that estimate regularisation hyperparameters, 
 depending on the setting it estimates the scaling parameter λ or the entire Tikhonov matrix Γ. This solver also determines 
 a posterior model distribution that can be used for uncertainty quantification. See Section II F. for further details. This algorithm is 
  more robust than QR, LSQR, RRQR, but computationally more intensive. It is highly recommended for relatively small datasets. 
 solver = BLR() 
QR QR decomposition: Direct solution of the ridge regression problem (12). Tikhonov regularisation is imposed by extending the 
 linear system. This method should rarely be used in practice and is included mostly for theoretical interest and the sake of 
 completeness. 
 solver = QR(lambda = 0.0) 
LSQR Krylov method: The standard iterative Krylov algorithm to solve the ridge regression problem (12). Tikhonov regularisation is 
 imposed implicitly in the algorithm, with damp corresponding to the parameter λ. Early termination, by adjusting atol provides an 
 additional and different form of regularisation. This algorithm is suitable for very large-scale parameter estimation problems. 
 solver = LSQR(damp = 1 × 10−4, atol = 1 × 10−6) 
RRQR Rank-revealing QR decomposition: A random matrix sketching approach, which is computationally more efficient than the 
 standard QR decomposition. In addition, the parameter rtol is closely related to λ in (12) but not identical. Instead of adding a 
 Tikhonov term, RRQR regularisation is imposed by removing highly sensitive subspaces as determined by rtol. For large problems, 
 this algorithm is more performant than the standard QR decomposition. 
 solver = RRQR(rtol = 1e−5) 
BLR Bayesian linear regression: (or, Bayesian ridge regression) specifies a class of solvers that estimate regularisation hyperparameters, 
 depending on the setting it estimates the scaling parameter λ or the entire Tikhonov matrix Γ. This solver also determines 
 a posterior model distribution that can be used for uncertainty quantification. See Section II F. for further details. This algorithm is 
  more robust than QR, LSQR, RRQR, but computationally more intensive. It is highly recommended for relatively small datasets. 
 solver = BLR() 

For small and moderate datasets we normally recommend the BLR method. For large datasets. when finely tuned regularisation is often less important, the random matrix sketching RRQR and iterative LSQR may be more appropriate.

Once the model parameters are determined as shown above, we typically wish to perform two tasks: (1) confirm the model accuracy on a test set; and (2) export the model to a format that can be used in standard MD codes, e.g., LAMMPS and ASE. Suppose that we are provided with a test data set testdata, then we can determine the model errors on that test set as seen in Listing 8, lines 9–11. This will print tables of root-mean-square error (RMSE) and mean absolute error (MAE) errors for individual configuration types. If we wish to store and/or export the fitted potential for later use, we typically save it in .json format which can be read by ACEpotentials.jl as well as its Python interface to ASE, and in .yace format which can be read by the pace extension to LAMMPS; cf. Listing 8, lines 13–15.

Uncertainty estimates of model predictions are highly sought after tools to judge the accuracy of a prediction during simulation with a fitted model, but can also be employed to great effect during the model development workflow, e.g., in an active learning context. Such uncertainty estimates can be derived in a principled way by recasting the ridge regression problem (12) in a Bayesian framework where inference is based on the Bayesian posterior distribution
post(c)=p(c|A,y)p(A,y|c)p(c).
(13)
Here, p(A, y|c) denotes the likelihood of the observed data, and p(c) the prior distribution on the model parameters. The Bayesian analogue of (12) is a Bayesian Linear Regression model with Gaussian observational noise and prior,
p(A,y|c)exp12(yAc)T(βW2)(yAc),
(14)
and
p(c)exp12cTΣ01c,
(15)
where the covariance β−1W−2 of the observation noise depends on the regression weight matrix W and a hyper-parameter β > 0. This choice of prior and noise model yields a Gaussian posterior distribution, p(c|A,y)=N(c;μ,Σ), with mean and covariance given, respectively, by μ = βΣATW2y and Σ=βATW2A+Σ011. We assume that the prior covariance Σ0 is of the form of a diagonal matrix. The above Bayesian model can be connected to the ridge regression formulation of Eq. (12) by noticing that maximising the posterior density (13) is equivalent to minimizing the regularized loss in (12) when Σ01=ζΓ2 for some ζ > 0 and λ = ζ/β.

1. Solvers and model selection via evidence maximisation

The reliability of uncertainty estimates critically depends on the values of the model hyper-parameters, the noise and prior covariance matrices β−1W−2 and Σ0. In ACE, it is sometimes difficult to make informed guesses of explicit values of these hyper-parameters that lead to good fits. We therefore commonly employ empirical Bayes approaches that infer appropriate values of these parameters directly from the training data by virtue of maximising the model evidence
p(A,y|Σ0,Λ1)=p(A,y|c,Λ1)p(c|Σ0)dc=(2π)Nobs|Σ||Σ0||Λ|×exp12(yAμ)TΛ1(yAμ)12μTΣ01μ
(16)
as a function of Σ0, β. Intuitively, maximising the model evidence results in a model where the regularising effect of the covariance matrix Σ0 and the degree of penalization of model misfit—modelled by the noise covariance matrix β−1W−2—are balanced against the degree to which the regression coefficients are determined by the data.

Within ACEpotentials.jl this is implemented in the BLR solver (cf. Table I). Different solver options result in different constraints on the form of the prior covariance Σ0, and we refer to the documentation22 for further details.

2. Uncertainty estimates via committees

Formally, the Bayesian ridge solver provides not an optimal parameter vector c but a posterior parameter distribution p(c). In practice, one then selects the mean parameter vector μ to specify the model. However, the posterior distribution remains important to estimate the uncertainty of predictions. Evaluating such uncertainties from the exact posterior distribution is computationally expensive; instead, ACEpotentials draws K samples {ck}k=1K from post(c) resulting in a committee of ACE models which can be used to obtain computationally efficient uncertainty estimates for predictions. For example, the standard deviation σ of a total energy prediction can be approximated by a committee via
σ̃2=1λ+1Kk=1K(EkEμ)2,
(17)
where Eμ is the prediction made by the mean model with parameters μ, while Ek are the committee predictions from models with parameters ck. Similarly, uncertainty estimates can be made for any partial derivative of the potential energy surface such as for committee forces Fk = ck · ∇Bi, or the mean force Fμ = μ · ∇Bi.

The first term in (17) refers to the aleatoric, or irreducible, uncertainty arising due to randomness of the system which is dominated by the complexity of the linear ACE convergence parameters such as correlation order, polynomial degree and cutoff. The second term is the epistemic, or reducible, uncertainty arising due to a lack of data or rather information. An example how a variance estimate of the epistemic uncertainty can be obtained in the linear ACE framework is shown in Listing 9.

LISTING 9.

Example how to use a committee to estimate the uncertainty of a prediction. (Note that model.potential gives access to the calculator object.) Analogously, one can obtain committees of forces and virials.

 

 

In this section, we present several practical examples of ACE usage, including simple benchmarks, practical potentials for materials and liquids to examples illustrating the hyperactive learning workflow. The scripts we used to generate the reported results are made available in a separate git repository (https://github.com/ACEsuit/ACEworkflows) that will be regularly updated as the ACEpotentials.jl package evolves.

1. Benchmarks with limited-diversity datasets

We test ACEpotentials.jl with default parameters on an early single-element benchmark dataset taken from Ref. 27. This dataset was originally used to assess the relative strengths and weaknesses of four important MLIPs, the high-dimensional neural network potential (NNP),1 the Gaussian approximation potential (GAP),2 the Spectral Neighbor Analysis Potential (SNAP),7 and moment tensor potentials (MTP).8 The benchmark contains six separate datasets corresponding to the six elements Li, Mo, Ni, Cu, Si and Ge, spanning a variety of chemistries (main group metal, transition metal and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). For each element, the dataset contains the ground-state crystal structure, strained structures with strains of −10%–10%, slab structures up to a maximum Miller index of three, and NVT ab initio molecular dynamics simulations of the bulk supercells with and without a single vacancy. These datasets contain a relatively large number of training structures, but only limited diversity.

In Table II we see the comparison of the MAEs in energies and forces for the best performing potentials in the benchmark (GAP and MTP) with two linear ACE models trained with the default parameters and total degrees chosen to reach basis sizes of, respectively, 300 basis functions for ACE(s) and ∼1000 basis functions for ACE(l). We optimized none of the hyperparameters and solved used RRQR to estimate the parameters. We chose RRQR since the datasets are very large, hence a highly tuned regularisation is less important. This results in competitive accuracy across the entire benchmark. The only small exception is the slightly larger energy error for Mo-ACE(l), which suggests some fine-tuning of the model parameters could be beneficial in this particular case. Our aim with this experiment was to demonstrate that, with only minimal effort, linear ACE models can perform with (near-) best accuracy in a data set geared towards testing statistical generalization.

TABLE II.

Mean absolute test errors in predicted energies and forces of two ACE models, ACE(sm) with ∼300 basis functions and ACE(lge) with ∼1000 basis functions, compared against the two best performing MLIPs published in.27 

Energy (meV)Forces (eV/Å)
ACE(sm)ACE(lge)GAPMTPACE(sm)ACE(lge)GAPMTP
Ni 0.416 0.34 0.42 0.48  0.018 0.015 0.02 0.01 
Cu 0.292 0.228 0.46 0.41 Ni 0.007 0.005 0.01 0.01 
Li 0.231 0.165 0.49 0.49 Cu 0.006 0.005 0.01 0.01 
Mo 2.597 2.911 2.24 2.83 Li 0.123 0.097 0.09 0.09 
Si 3.501 1.985 2.91 2.21 Mo 0.086 0.066 0.07 0.06 
Ge 2.594 2.162 2.06 1.79 Si 0.064 0.051 0.05 0.05 
Energy (meV)Forces (eV/Å)
ACE(sm)ACE(lge)GAPMTPACE(sm)ACE(lge)GAPMTP
Ni 0.416 0.34 0.42 0.48  0.018 0.015 0.02 0.01 
Cu 0.292 0.228 0.46 0.41 Ni 0.007 0.005 0.01 0.01 
Li 0.231 0.165 0.49 0.49 Cu 0.006 0.005 0.01 0.01 
Mo 2.597 2.911 2.24 2.83 Li 0.123 0.097 0.09 0.09 
Si 3.501 1.985 2.91 2.21 Mo 0.086 0.066 0.07 0.06 
Ge 2.594 2.162 2.06 1.79 Si 0.064 0.051 0.05 0.05 

2. Silicon

We used ACEpotentials.jl to fit a linear ACE potential to the silicon dataset introduced by Bartók et al.25 for fitting a Gaussian approximation potential (GAP). This extensive database contains a wide range of configurations ranging from several bulk crystal structures (diamond, hcp, fcc, etc.), amorphous structures as well as liquid MD snapshots, aiming to cover as much of the silicon energy landscape as possible. The corresponding GAP model was shown to outperform a wide range of other (classical) interatomic potentials on a large selection of accuracy and property or generalisation tests ranging from surface formation energies as well as liquid and radial distribution functions. The current work benchmarks an ACEpotentials.jl model, with default model parameters, containing basis functions up to order ν̄=4, polynomial total degree Dmax = 20 and 6 Å cutoff against this silicon GAP potential. The model was fitted using generalised Tikhonov regularisation (12) of λΓ, where Γ was constructed using an algebraic smoothness prior (10) with p = 5, whilst the BLR solver was used to estimate the scaling parameter λ. This benchmark is formed of a series of property tests including bulk diamond elastic constants, vacancy formation energies, surface formation energies for the (100), (110), (111) surfaces and hexagonal, dumbbell and tetragonal point defect energies for bulk diamond. These results of these property tests for the CASTEP28 DFT reference, GAP and ACE are shown in Fig. 2 and indicate good accuracy across the range of property tests. Percentage errors relative to the DFT reference are also included, confirming similarly accurate performance between the GAP and the ACEpotentials.jl frameworks.

FIG. 2.

Benchmark of the silicon GAP25 and ACE model presented in this work. Percentage relative errors against the DFT reference are provided in the Table.

FIG. 2.

Benchmark of the silicon GAP25 and ACE model presented in this work. Percentage relative errors against the DFT reference are provided in the Table.

Close modal

We also used this silicon ACE potential to carry out a more challenging test, namely to simulate fracture in the (111)[11̄0] cleavage system. We used the matscipy package to setup a 12 × 11 × 1 supercell containing 1586 atoms and to carry out structural optimizations with a Mode I crack anisotropic continuum linear elastic displacement field29 applied with stress intensity factors ranging from 0.6KG to 1.5KG (where KG is the Griffith load at which fracture becomes thermodynamically favorable). We observed spontaneous formation of the Pandey 2 × 1 reconstructed (111) surface behind the crack tip, in good agreement with previous studies using DFT30 and GAP.25 The critical stress intensity factor was determined to be KI = 1.0 ± 0.02KG, which is very close to the expected Griffith value, indicating minimal lattice trapping. Overestimating the extent of lattice trapping is a common failure mode of previous interatomic potentials when applied to model fracture.31 The total simulation time was around 30 minutes on a 28-core workstation.

To successfully carry out the fracture test it was crucial to produce a highly regular (smooth) ACE potential. To illustrate the effect of changing the smoothness prior, a sequence of ACE potentials (order ν̄=4, total degree Dmax = 21 and 6 Å cutoff), was fitted using no smoothness prior (Γ = 1) and increasing strengths of algebraic smoothness prior (10), p = 1, 2, 5 and 10. In all cases the model parameters were estimated using generalized Tychonov regularisation (12) with the scale factor λ tuned such that all potentials achieved a force RMSE of ∼0.075 eV/Å, which is ∼5% larger than without any regularisation. The effect of the prior on predicted Si–Si dimer curves and rigid bulk Si decohesion curves, which respectively probe smoothness of two-body and many-body terms, is shown in Fig. 3. Applying a moderate smoothness prior aids extrapolation into the close-approach region and reduces the amplitude of spurious oscillations seen in the stress (S) during decohesion.

FIG. 3.

Top Left: The predicted energy of the Si–Si dimer is shown for a sequence of ACE potentials trained with varying strengths of smoothness prior but equal accuracy (Force RMSE ≈ 0.075 eV/Å). Γ = 1 corresponds to an equal prior for all basis functions whilst p indicates the strength of the algebraic smoothness prior defined in (10). The black curve shows the corresponding result using GAP. All curves are shifted for clarity. Bottom: The evolution of stress (S) as a function of separation (z) during rigid decohesion of bulk silicon into the unrelaxed (110) and (100) surfaces is shown for the same sequence of potentials. Top Right: Snapshot from Si(111)[11̄0] quasi-static fracture simulation at a stress intensity factor of 1.8KG using our ACE potential. The lower fracture surface shows a 2 × 1 Pandey reconstruction (alternating pentagons and heptagons), consistent with previous studies using DFT and GAP models, but at much reduced cost. The critical fracture toughness is very close to KG, showing minimal lattice trapping.

FIG. 3.

Top Left: The predicted energy of the Si–Si dimer is shown for a sequence of ACE potentials trained with varying strengths of smoothness prior but equal accuracy (Force RMSE ≈ 0.075 eV/Å). Γ = 1 corresponds to an equal prior for all basis functions whilst p indicates the strength of the algebraic smoothness prior defined in (10). The black curve shows the corresponding result using GAP. All curves are shifted for clarity. Bottom: The evolution of stress (S) as a function of separation (z) during rigid decohesion of bulk silicon into the unrelaxed (110) and (100) surfaces is shown for the same sequence of potentials. Top Right: Snapshot from Si(111)[11̄0] quasi-static fracture simulation at a stress intensity factor of 1.8KG using our ACE potential. The lower fracture surface shows a 2 × 1 Pandey reconstruction (alternating pentagons and heptagons), consistent with previous studies using DFT and GAP models, but at much reduced cost. The critical fracture toughness is very close to KG, showing minimal lattice trapping.

Close modal

3. Water

We investigated the ability of ACEpotentials.jl to capture the interactions in complex molecular liquids and to perform robust molecular dynamics simulations in such systems, fitting a linear ACE potential to a dataset containing 1593 liquid water configurations.32 We chose only default model parameters, containing basis functions up to correlation order ν̄=3, polynomial total degree Dmax = 15 and rcut = 5.5 Å cutoff. Parameter estimation was performed using ARD with relevance threshold set by minmising the Bayesian Information Criterion (BIC).33 The training RMSE were 1.732 meV/atom for energies and 0.099 eV/Å for forces. To investigate the performance and robustness of the fitted ACE model, a series of mean squared displacement (MSD) simulation were performed under 1 bar NPT conditions at 300 K. The simulations were performed using 5184 atom simulation boxes, shown in Fig. 4 below, with the pace pair style in LAMMPS.12 The total simulation time for each of these simulation was 20 min utilising 1280 cores on ARCHER2, illustrating the efficiency of ACE potentials. The diffusion constant predicted by this simulation was 1.20 ± 0.03 m/s2. It should be noted that diffusion constants are notoriously difficult to accurately determine especially considering the absence of long-range interactions into these ACE models. This example is therefore mostly an illustration of robustness and performance.

FIG. 4.

Mean squared displacement (MSD) for three liquid water simulation at 1 bar NPT simulations and 300 K. The simulation cell contained 5184 atoms.

FIG. 4.

Mean squared displacement (MSD) for three liquid water simulation at 1 bar NPT simulations and 300 K. The simulation cell contained 5184 atoms.

Close modal
While fitting ACE potentials to pre-existing or “manually” assembled datasets is a common task, as discussed in Sec. III A, the real benefit of the linear ACE framework is in the construction robust and computationally inexpensive ACE potentials from the ground up with automated dataset assembly. This is achieved through the use of an iterative loop employing an active learning (AL) type approach,34,35 where relevant training configurations are sampled to form a training database. To accelerate this AL process, we introduced hyperactive learning (HAL),14 which adds a biasing term to a molecular dynamics simulation towards predicted high uncertainty σ, as shown in (18). A tunable parameter τ controls the strength of the biasing and thus the balance between physical exploration (molecular dynamics) and discovery of new structures (biasing).
EHAL=EACEτσ.
(18)

The HAL framework shares similarities with Bayesian Optimization (BO) as the biasing term is formally equivalent to a Lower Confidence Bound (LCB) acquisition function.36 Similarly to BO, the parameter τ adjusts the tradeoff between exploration and exploitation during the generation of training configurations using HAL. HAL-generated configurations are both energetically reasonable, guided by EACE (exploitation), and informative, predicted by a relatively large value of σ (exploration). The bias towards uncertainty, mediated by an emerging biasing force during HAL dynamics, can be viewed as a strategy to acquire information (gain) by seeking out unseen (local) environments. The HAL approach can also be viewed as an adversarial attack, aimed to destabilize a fitted ACE potential such that, after iteratively adding sufficiently many new configurations, the linear ACE model is robust to such attacks which all but guarantees stable dynamics over long timescales.

The biasing parameter τ in HAL necessitates careful tuning, which HAL achieves through an adaptive scheme14 that tunes τ on the fly by balancing the magnitude of the biasing force relative to the forces obtained by EACE. The relative biasing parameter τr used in this scheme is typically set to 0.1–0.2 and ensures that the biasing strength is reduced or increased depending on the degree of predicted uncertainty explored during the dynamics.

To initiate HAL, an initial database is typically constructed consisting of 1–10 configurations that sketch out some aspects of the energy landscape that are of interest to the application at hand. An ACE potential is fitted using a variant of the BLR solver, after which committee parameterisations {ck}k=1K, typically K = 8, are sampled from the posterior as discussed in Sec. II F. Biased MD/MC dynamics are then performed on EHAL, using the dynamically tuned τ parameter. During the dynamics the relative force uncertainty fi is recorded and once it exceeds a predefined tolerance ftol a DFT calculation is triggered, and the training database is extended. This relative force uncertainty fi is defined as
fi=1Kk=1KFikFiμFiμ+ε,
(19)
where Fik are the forces as obtained by the committee and Fiμ the forces predicted by the mean μ of the posterior over the coefficients as outlined in Sec. II F. ɛ is a regularising constant used to regularize the fraction typically set to 0.2–0.4 eV/Å. Careful tuning of ftol is required as it tunes the degree of extrapolation when adding new (unseen) configurations to the training database. Too large ftol may lead to the sampling of energetically unreasonable configurations, whereas too small ftol leads to suboptimal information gain during the HAL scheme resulting in sampling unnecessarily many configurations. The HAL scheme is outlined in Fig. 5 illustrating how from a small initial training database containing a handful of configurations of interest a stable ACE potential is generated by performing biased MD and MC steps and iteratively triggering DFT calculations. For future reference, we define a HAL iteration to consist of (i) a biased MD simulation run until a new unseen structure is flagged, (ii) evaluating energies, forces and virials on the new structure, and (iii) updating the ACE potential model.
FIG. 5.

Hyperactive Learning (HAL) protocol. Linear ACE potentials are fitted using BRR or ARD after which biased MD/MC steps are performed controlled by biasing parameter τ. Once the uncertainty metric fi exceeds ftol a DFT calculation is triggered a HAL iteration is completed and the training database extended.

FIG. 5.

Hyperactive Learning (HAL) protocol. Linear ACE potentials are fitted using BRR or ARD after which biased MD/MC steps are performed controlled by biasing parameter τ. Once the uncertainty metric fi exceeds ftol a DFT calculation is triggered a HAL iteration is completed and the training database extended.

Close modal

1. AlSi10 melting temperature

The HAL framework was used to create an ACE potential for determining the melting temperature of the AlSi10 alloy. An initial dataset consisted of 32-atom random fcc lattice configurations, each containing 98 aluminium and 10 silicon atoms. This initial dataset was composed of five fcc random alloy configurations with lattice constants ranging from 14.3 to 16.6 Å3/atom. The ACE basis set included interactions up to correlation order ν̄=2 (three-body), and employed a cutoff of 5.5 Å. The model was fitted using Automatic Relevance Determination (ARD) and its sparsity set by minimising BIC which resulted in increasingly complex ACE models as more configurations (or information) were added. The chosen maximum polynomial degree Dmax during the HAL procedure increased from 4 to 12. The parameter estimation was carried out using ARD. The HAL relative biasing strength was set to τr = 0.2, and the relative uncertainty threshold to ftol = 0.2.

The HAL dynamics was used to melt the random alloy crystal structure, by ramping the temperature from 0 to 1500 K at 1 GPa using a 1 fs timestep. Cell swapping and volume adjusting HAL-MC steps were taken to facilitate exploration of the (biased) energy landscape. After 18 HAL iterations, the ACE potential was already able to consistently perform 5000 HAL MD/MC timesteps without encountering new structures with high uncertainty. This final ACE potential contained 79 basis functions as selected using ARD pruning.

During these 18 HAL iterations the dimer curves are typically examined to ensure the potentials exhibit attraction at typical interatomic distances and short range repulsion as illustrated in Fig. 6.

FIG. 6.

ACE dimer curves for pair interactions for several HAL iterations. Stronger colours indicate later HAL iterations. They key observation to be drawn from this figure is that even in the early stages of the HAL process with very little available data, our priors ensure that the dimer curves are physically sensible, in particular smooth and repulsive.

FIG. 6.

ACE dimer curves for pair interactions for several HAL iterations. Stronger colours indicate later HAL iterations. They key observation to be drawn from this figure is that even in the early stages of the HAL process with very little available data, our priors ensure that the dimer curves are physically sensible, in particular smooth and repulsive.

Close modal

The ACE potential obtained after HAL iteration 18 (fitted to 22 structures in total) was subsequently used to perform nested sampling (NS) simulations to model the liquid-solid phase transition. NS simulations were performed using 384 NS walkers, using a total decorrelation length of 512 formed by volume/shear/stretch/swap MC steps at a ratio of 4:4:4:4. The resulting heat capacity curves obtained by NS are presented in Fig. 7 and are in close agreement to the melting temperature of 867 K as given by Thermo-Calc using the TCAL4 database.37 

FIG. 7.

NS AlSi10 heat capacity curves for several runs indicating the liquid-solid transition as predicted by the HAL generated ACE potential.

FIG. 7.

NS AlSi10 heat capacity curves for several runs indicating the liquid-solid transition as predicted by the HAL generated ACE potential.

Close modal

2. Polyethylene glycol

The HAL framework38 was used to create a polyethylene glycol (PEG) model. To initilize HAL, 18 structures of PEG (n = 32) formed of 32 monomer units in vacuum were evaluated using the ORCA code39 with the ωB97X DFT exchange correlation functional40 and the 6-31G(d) basis set. ACE models were fitted to the initial and subsequent datasets with correlation order ν̄=3, total degree Dmax = 12 and a cutoff radius 5.5 Å, using the ARD algorithm. The HAL protocol used relative biasing parameter τr = 0.1 and uncertainty tolerance ftol = 0.3 and performed at 500 K. Unlike the previous AlSi10 example, no cell adjusting or atom swapping HAL-MC steps were performed as the configurations are isolated molecules in vacuum. It was also chosen to not change the ACE basis throughout the HAL procedure but rather to keep it constant (e.g. Dmax = 12) as the initial database was relatively diverse. After 50 HAL iterations an ACE potential was generated that was deemed stable as it completed 104 HAL biased MD steps without triggering a DFT calculation. It was then used to determine the density of a PEG polymer formed of n = 200 monomer units in LAMMPS under periodic boundary conditions using the PACE evaluator.12 The PEG (n = 200) density was determined at 300, 350 and 400 K at 1 bar pressure over a timescale of 0.5 ns as shown in Fig. 8. The density at 300 K is in good agreement with the experimental density of 1.2 g/cm341 at 293 K. This illustrates remarkable extrapolative performance by the linear ACE framework as the DFT reference (ORCA) does not support periodic boundary conditions itself, making determining the PEG density purely from first principles impossible.

FIG. 8.

PEG (n = 200) density for HAL generated ACE potential under periodic boundary conditions using LAMMPs.

FIG. 8.

PEG (n = 200) density for HAL generated ACE potential under periodic boundary conditions using LAMMPs.

Close modal

3. Perovskite CsPbBr3

We used the HAL framework38 to create a training dataset for the lead-halide perovskite CsPbBr3, which shows three relevant phases: orthorhombic at low temperatures, tetragonal at intermediate temperatures, and cubic at high temperatures, with experimental transition temperatures of 361 and 403 K.42 The HAL process was designed to sample all of these phases so that the resulting potential accurately represents energy and entropy of each phase and is hence capable of predicting the transition temperatures. To ensure consistent DFT energies and effective vibrational mode sampling, approximately cubic 40 atom supercells were created for all three phases.

This problem required some refinement of the standard HAL procedure, and careful testing of fitted ACE potentials for several basis sizes. We therefore give more detail about the process than in the previous cases.

The initial fit starting the HAL process used a set of 15 randomly perturbed (unit cell and atomic positions) 40-atom configurations, three from each of the high symmetry phases. The default ACE basis was used, with a cutoff of 8 Å, a smoothness prior with p = 3, and the sklearn BayesianRidge linear solver. Automated basis selection was applied every 10 HAL iterations, with a maximum basis size of 2000, ν̄=3, a maximum total polynomial degree of 16, and the model score as the selection criterion. To encourage exploration of a wide range of temperatures and configurations, over a maximum of 104 1 fs HAL MD steps the temperature was ramped from 200 to 600 K, and τr from 0.1 to 0.5. New fitting configurations were selected when the fractional force error ftol exceeded 0.4. After 20 iterations starting from the three unperturbed high-symmetry 40-atom cells at fixed unit cell shape and size, the process was restarted from 9 80-atom high symmetry cells, doubling each of the three 40-atom cells along each cell vector, for 20 additional iterations. Then 20 additional iterations were carried out with variable unit cell and an applied pressure of 0.

At this point the model appeared to be stable enough for 105 steps without a HAL bias, so we switched to an unbiased sampling process to gather more data and improve the model accuracy. Starting the fit from the complete set of configurations from the HAL process, we generated fitting configurations from 2000 step runs with a maximum basis size of 4000. These used the same 80 atom starting configurations, but at fixed temperatures of 200–500 K at 100 K intervals, and fixed shape but variable unit cell volume. To further refine the performance of the low energy parts of the PES around each high symmetry structure, we sampled 36 more configurations, each with 160 atoms (the three 40 atom supercells doubled along each of the three pairs of lattice vectors) at a range of lower temperatures, 150–300 K at 50 K intervals.

The original set of 15 randomly perturbed configurations, another similar set of 15, and the 168 HAL configurations were used as the reference database for a set of fits to explore the performance of the model for a wide range of basis sizes. At this stage we filtered out physically unreasonable fitting data, as defined by a criterion that excluded any force larger than 10 eV/Å, as well as the energies and virials from such configurations. To fit the model and evaluate its predictive accuracy we split the set of configurations into 75% fitting and 25% testing, stratifying the split by the HAL iteration (or initial random perturbation set) that produced the configuration. The same fitting procedure and basis as in the HAL run were used, with ν̄=2 and ν̄=3 and maximum polynomial degree 4–16, up to a maximum basis size of 2 × 104. We also compared three choices for the smoothness prior: none, p = 2, and p = 4.

The training set residuals, test set residuals, and BayesianRidge score (log marginal likelihood) are plotted as a function of basis size in Fig. 9. For each value of ν̄ the fitting error improves monotonically as the basis size (and polynomial degree) increases, but at equal basis size the ν̄=2 residuals are lower by as much as 25% (especially for moderately sized bases), indicating that for this system increasing the polynomial degree provides the basis with more useful flexibility as compared to increasing ν̄. For the basis size range where the error is minimized, the testing set residuals are larger than the fitting set by at least about a factor of 2, indicating that some amount of overfitting is occurring. The smoothness prior is successful at limiting the extent of this overfitting.

FIG. 9.

Fitting set residual (left), testing set residual (center), and log marginal likelihood (right) as a function of basis size for CsPbBr3 ACE model fit to a database generated with HAL. Symbol indicates correlation order ν̄, and color indicates smoothness prior exponent p.

FIG. 9.

Fitting set residual (left), testing set residual (center), and log marginal likelihood (right) as a function of basis size for CsPbBr3 ACE model fit to a database generated with HAL. Symbol indicates correlation order ν̄, and color indicates smoothness prior exponent p.

Close modal

The generally lower training and test errors for the ν̄=2 models relative to the correlation order three models are reflected in their Bayesian ridge scores (log marginal likelihoods). However, within each correlation order the optimal choice of polynomial degree and corresponding basis size indicated by the minimum test error are not consistent with the score. Indeed, the results displayed in Fig. 9 lead us to conclude that the Bayesian ridge score is not always a reliable tool for optimal basis selection and other options should be explored in the future.

We used the model with lowest test set error, generated by the fit with ν̄=2, maximum polynomial degree 12, and smoothness prior p = 4, to simulate larger unit cells of CsPbBr3 at a range of temperatures spanning its expected range of phase transition temperatures. We simulated 32 independent constant temperature, constant pressure, MD trajectories at temperatures from 200 to 355 K and zero pressure for 104 10 fs time steps. Each trajectory started from an 8 × 8 × 6 supercell (7680 atoms) of the orthorhombic structure. To analyze the resulting structure we reconstructed the effective cubic lattice vectors and averaged their magnitudes over the last 8000 steps of each trajectory. A plot of these effective cubic cell lattice vector magnitudes as a function of temperature is shown in Fig. 10. We see the three expected phases as indicated by the degeneracy of the lattice constants: cubic at high temperature, tetragonal at intermediate temperatures, and orthorhombic at low temperatures. The transition temperatures are 240 and 255 K, which are substantially shifted relative to the experimental results of 361 and 403 K.42 We expect that this deviation from experiment is primarily due to our choice of exchange correlation functional, the Perdew–Burke–Enzerhof generalized-gradient approximation,43 as has been seen in similar simulations.44 A direct comparison to DFT would be useful, but it would require an accurate calculation of the predicted phase transition temperatures directly from the DFT PES, which is too computationally demanding to be practical without additional approximations.

FIG. 10.

Effective cubic lattice constants at fixed temperature simulated using the ACE model with ν̄=2, maximum polynomial degree 12, and p = 4. All three values are identical (to within the estimated error) at T > 255 K indicating a cubic structure. At lower temperatures these split into a single value and a group of two, consistent with a tetragonal structure, and at T < 240 K they split further into three distinct values, consistent with an orthorhombic structure.

FIG. 10.

Effective cubic lattice constants at fixed temperature simulated using the ACE model with ν̄=2, maximum polynomial degree 12, and p = 4. All three values are identical (to within the estimated error) at T > 255 K indicating a cubic structure. At lower temperatures these split into a single value and a group of two, consistent with a tetragonal structure, and at T < 240 K they split further into three distinct values, consistent with an orthorhombic structure.

Close modal

The linearity of ACE potentials renders them not only interpretable but also efficient in terms of computational performance. To demonstrate this, a performance test was conducted on various linear ACE potentials referenced in this paper. The evaluation times, as well as some ACE hyperparameters used, are shown in Table III for the AlSi10, CsPbI3, H2O, PEG and Si potential developed in this work. The number of basis functions for each model is given too and may be fewer than a complete ACE basis parameterized by ν̄ and Dmax due to ARD pruning basis functions with low relevance. The timings were obtained using the LAMMPs-PACE implementation12 using a 128 core ARCHER2 node, equivalent to two seperate AMD EPYC 7742 64-core at 2.25 GHz. The 106 steps/day figures are equivalent to a ns/day and were obtained for varying cell sizes to illustrate scaling. A standardized performance figure in the form of core-μs/atom figure is also provided. The silicon database fitted originates from the silicon GAP potential, whereas the AlSi10, PEG and CsPbBr3 potentials were fitted using HAL generated databases containing 22, 68 and 198 configurations respectively as discussed in the previous subsections.

TABLE III.

Performance of linear ACE potentials for various systems using an ARCHER2 node utilising 128 cores for the 106 steps/day figures (equivalent ns/day using a 1 fs timestep). Core-μs/atom figures were obtained by performing simulations in serial.

ACE parametersPerformance
ν̄DmaxrcutNo. basis func.106 steps/day (atoms)Core-μs/atom
AlSi10 5.5 79 636 (32) 23 
CsPbBr3 12 5.5 544 334 (20) 93 
PEG 12 5.5 4897 10 (1400) 227 
Si 20 5434 7 (250) 744 
ACE parametersPerformance
ν̄DmaxrcutNo. basis func.106 steps/day (atoms)Core-μs/atom
AlSi10 5.5 79 636 (32) 23 
CsPbBr3 12 5.5 544 334 (20) 93 
PEG 12 5.5 4897 10 (1400) 227 
Si 20 5434 7 (250) 744 

We introduced ACEpotentials.jl, a front-end for several Julia-language packages that implement Atomic Cluster Expansion (ACE) MLIPs and related functionality. This front-end provides a user-oriented interface, while the backend packages combine excellent performance with the flexibility for rapid model development and experimentation that is typical for the Julia language. The front-end ACEpotentials.jl exposes a relatively simple subset of ACE type models, linear models with robust priors, that we consider reliable in every-day use, especially in the context of an active learning type workflow.

However, we emphasize that the ACE framework allows for a much richer MLIPs design space9,12,48–47 as well as parameterisation of many other types of particle systems.51–51 We therefore conclude by mentioning some of those extensions, as well as current short-comings, that require further development.

  • Robust parameter estimation, in particular hyperparameter tuning, remains under-investigated in the MLIPs context. We regularly experience that hand-tuned hyperparameters can give superior results, basis sparsification remains poorly understood, and uncertainties are often only indicative of actual errors. Further research is required to resolve these closely related issues.

  • The design space of the ACEpotentials.jl ACE models can be expanded to admit trainable radial embeddings, composition of ACE features with nonlinearities, or even multi-layer architectures such as.45,46 This comes at the cost of highly nonlinear and less efficient models, but some of those extensions, such as trainable radial embeddings, can be undertaken while keeping the spirit of our current ACE models: small models for rapid iterative development and low evaluation cost.

  • The extension to highly nonlinear models would likely require that the computational kernels on which ACEpotentials.jl is built also be made graphics processing unit (GPU)-capable. Towards that end a deep learning framework such as MACE46 [see also the mace (https://github.com/ACEsuit/mace) code] may be better suited.

  • Finally, we note that there are already several related ACE software packages within ACEsuit (https://github.com/ACEsuit) that implement a variety of models for other particle systems at different stages of development: Hamiltonians (Ref. 48, ACEhamiltonians.jl); wave functions (Ref. 50 and 51, ACEpsi.jl); jet tagging models (Ref. 49, BIPs.jl). These build on an experimental and significantly expanded Julia-language ACE package ACE.jl.

G.C. acknowledges support from EPSRC Grant No. EP/X035956/1. C.O., A.R., and T.J. were supported by NSERC Discovery Grant No. GR019381 and NFRF Exploration Grant No. GR022937. W.J.B. was supported by US AFRL Grant No. FA8655-21-1-7010. C.v.d.O. and G.C. acknowledge ARCHER2 for which access was obtained via the UKCP consortium and funded by EPSRC Grant No. EP/P022065/1. N.B. was supported by the U.S. Office of Naval Research through the U.S. Naval Research Laboratory’s fundamental research base program. E.G. acknowledges support from the EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies with Grant Reference No. EP/S024220/1. W.C.W. was supported by the Schmidt Science Fellows in partnership with the Rhodes Trust, and additionally acknowledges support from EPSRC (Grant No. EP/V062654/1). J.K. and C.O. acknowledge funding from the Leverhulme Trust under grant RPG-2017-191 and the EPSRC under Grant No. EP/R043612/1. J.K., J.P.D. and G.C. acknowledge support from the NOMAD Centre of Excellence funded by the European Commission under grant agreement 951786. J.K. acknowledges support from the EPSRC under Grant Nos. EP/P002188 and EP/R012474/1. This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the EPSRC (capital Grant No. EP/T022159/1), and DiRAC funding from the STFC (www.dirac.ac.uk). Further computing facilities were provided by the Scientific Computing Research Technology Platform of the University of Warwick.

The authors have no conflicts to disclose.

William C Witt: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cas van der Oord: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Elena Gelžinytė: Software (equal); Writing – original draft (equal). Teemu Järvinen: Software (equal); Writing – original draft (equal). Andres Ross: Methodology (equal); Software (equal); Writing – original draft (equal). James P. Darby: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cheuk Hin Ho: Data curation (equal); Investigation (equal); Methodology (equal); Software (equal). William J. Baldwin: Data curation (equal); Investigation (equal); Software (equal); Writing – original draft (equal). Matthias Sachs: Methodology (equal); Writing – original draft (equal). James Kermode: Software (equal); Writing – review & editing (equal). Noam Bernstein: Data curation (equal); Investigation (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Gábor Csányi: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Christoph Ortner: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request. are openly available in github.

In Secs. II A and II B we outlined some basic ideas behind the ACE model, in particular expressing the potential energy model in terms of the many-body expansion (Sec. II A). A naive implementation of the many-body expansion results in prohibitive computational cost due to the exponential cost of the sums over clusters (j1, …, jν). However, after discretizing the U(ν)-body potential of the self-interacting many-body expansion (2b) the sum can be rewritten to result in linear scaling cost. This is presented in detail, for example, in.9,11,12 hence we shall not review this process in full detail here. In order to outline what is involved in an implementation of an ACE potential, we only recall the form that the ACE model takes after this re-organisation of the many-body summation. The evaluation of the self-interacting ACE basis then results in the following stages:

  1. Evaluation of the embeddings, Rnl (rij, Zi, Zj) and Ylm(r̂ij).

  2. A pooling operation; also called called the atomic basis,9 or the density projection,2 
    Aznlmi=jN(i)ϕznml(rij,Zj,Zi),
    (A1)
    where N(i) denotes the set of indices of all atoms within the cutoff radius from atom i.
  3. Product basis: for lexicographically ordered tuples (z,n,l,m)=(zt,nt,lt,mt)t=1ν we define
    Aznlmi=t=1νAztntltmti.
    (A2)
    This operation can be thought of as a sparse symmetric tensor product, or as taking ν-correlations.
  4. Symmetrization: To ensure invariance one averages Ai over all rotations, resulting in the O(3)-invariant basis
    Bi=CAi,
    (A3)
    employed in the definition of the linear ACE model (3). Here, Ai is the vector of (Aznlmi) basis functions while C a sparse matrix.

For each of these stages efficient computational kernels are implemented, designed in a modular way so that they can be independently optimized or composed into new model architectures.

1. Canonical many-body expansion

Under the condition that the radial basis and envelope function are pure polynomials, it is possible to transform the self-interacting ACE basis Bi defined in (A3) into a basis for the canonical many-body expansion (2a). The idea behind this procedure is sketched out in Ref. 11. The precise details of the implementation and a detailed study is not the purpose of this review. Here, we only mention that, upon slightly extending the Rnl, Ai and Ai bases, one can obtain a “purification operator” P such that the linearly transformed PAi becomes a basis for the canonical many-body expansion (2a). The symmetrisation C can then be applied to obtain an O(3)-invariant basis BiCPAi.

An important variation of the “purification operation” P is to only purify the two-body interaction. This entails replacing the fully self-interacting basis functions
Aki=j1,,jνt=1νϕkt(xijt)withj1,,jνjajbt=1νϕkt(xijt)
All three options (i) fully self-interacting, (ii) purified pair interaction, and (iii) canonical cluster expansion are available in ACEpotentials.jl. The package documentation should be reviewed on how to select the different basis sets.
1.
J.
Behler
and
M.
Parrinello
, “
Generalized neural-network representation of high-dimensional potential-energy surfaces
,”
Phys. Rev. Lett.
98
,
146401
(
2007
).
2.
A. P.
Bartók
,
M. C.
Payne
,
R.
Kondor
, and
G.
Csányi
, “
Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons
,”
Phys. Rev. Lett.
104
(
13
),
136403
(
2010
).
3.
V. L.
Deringer
,
M. A.
Caro
, and
G.
Csányi
, “
Machine learning interatomic potentials as emerging tools for materials science
,”
Adv. Mater.
31
(
46
),
1902765
(
2019
).
4.
J.
Behler
and
G.
Csányi
, “
Machine learning potentials for extended systems: A perspective
,”
Eur. Phys. J. B
94
,
142
(
2021
).
5.
V. L.
Deringer
,
A. P.
Bartók
,
N.
Bernstein
,
D. M.
Wilkins
,
M.
Ceriotti
, and
G.
Csányi
, “
Gaussian process regression for materials and molecules
,”
Chem. Rev.
121
(
16
),
10073
10141
(
2021
).
6.
F.
Musil
,
A.
Grisafi
,
A. P.
Bartók
,
C.
Ortner
,
G.
Csányi
, and
M.
Ceriotti
, “
Physics-inspired structural representations for molecules and materials
,”
Chem. Rev.
121
(
16
),
9759
9815
(
2021
).
7.
A. P.
Thompson
,
L. P.
Swiler
,
C. R.
Trott
,
S. M.
Foiles
, and
G. J.
Tucker
, “
Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials
,”
J. Comput. Phys.
285
,
316
330
(
2015
).
8.
A. V.
Shapeev
, “
Moment tensor potentials: A class of systematically improvable interatomic potentials
,”
Multiscale Model. Simul.
14
(
3
),
1153
1173
(
2016
).
9.
R.
Drautz
, “
Atomic cluster expansion for accurate and transferable interatomic potentials
,”
Phys. Rev. B
99
,
014104
(
2019
).
10.
A.
Seko
,
A.
Togo
, and
I.
Tanaka
, “
Group-theoretical high-order rotational invariants for structural representations: Application to linearized machine learning interatomic potential
,”
Phys. Rev. B
99
,
214108
(
2019
).
11.
G.
Dusson
,
M.
Bachmayr
,
G.
Csányi
,
R.
Drautz
,
S.
Etter
,
C.
van der Oord
, and
C.
Ortner
, “
Atomic cluster expansion: Completeness, efficiency and stability
,”
J. Comput. Phys.
454
,
110946
(
2022
).
12.
Y.
Lysogorskiy
,
C.
van der Oord
,
A.
Bochkarev
,
S.
Menon
,
M.
Rinaldi
,
T.
Hammerschmidt
,
M.
Mrovec
,
A.
Thompson
,
G.
Csányi
,
C.
Ortner
, and
R.
Drautz
, “
Performant implementation of the atomic cluster expansion (PACE) and application to copper and silicon
,”
npj Comput. Mater.
7
(
1
),
97
(
2021
).
13.
D. P.
Kovács
,
C.
van der Oord
,
J.
Kucera
,
E. A.
AliceAllen
,
D. J.
Cole
,
C.
Ortner
, and
G.
Csányi
, “
Linear atomic cluster expansion force fields for organic molecules: Beyond RMSE
,”
J. Chem. Theory Comput.
17
(
12
),
7696
7711
(
2021
).
14.
C.
van der Oord
,
M.
Sachs
,
D. P.
Kovács
,
C.
Ortner
, and
G.
Csányi
, “
Hyperactive learning (HAL) for data-driven interatomic potentials
,”
npj Comput. Mater.
9
(
1
),
1
14
(
2022
).
15.
Y.
Liang
,
M.
Mrovec
,
Y.
Lysogorskiy
,
M.
Vega-Paredes
,
C.
Scheu
, and
R.
Drautz
, “
Atomic cluster expansion for Pt–Rh catalysts: From ab initio to the simulation of nanoclusters in few steps
,”
J. Mater. Res.
(published online
2023
).
16.
A.
Bochkarev
,
Y.
Lysogorskiy
,
S.
Menon
,
M.
Qamar
,
M.
Mrovec
, and
R.
Drautz
, “
Efficient parametrization of the atomic cluster expansion
,”
Phys. Rev. Mater.
6
,
013804
(
2022
).
17.
M.
Qamar
,
M.
Mrovec
,
Y.
Lysogorskiy
,
A.
Bochkarev
, and
R.
Drautz
, “
Atomic cluster expansion for quantum-accurate large-scale simulations of carbon
,”
J. Chem. Theory Comput.
19
(
15
),
5151
5167
(
2023
).
18.
A. P.
Thompson
,
H. M.
Aktulga
,
R.
Berger
,
D. S.
Bolintineanu
,
W. M.
Brown
,
P. S.
Crozier
,
P. J.
in ’t Veld
,
A.
Kohlmeyer
,
S. G.
Moore
,
T. D.
Nguyen
,
R.
Shan
,
M. J.
Stevens
,
J.
Tranchida
,
C.
Trott
, and
S. J.
Plimpton
, “
LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales
,”
Comput. Phys. Commun.
271
,
108171
(
2022
).
19.
A.
Hjorth Larsen
,
J.
Jørgen Mortensen
,
J.
Blomqvist
,
I. E.
Castelli
,
R.
Christensen
,
M.
Dułak
,
J.
Friis
,
M. N.
Groves
,
B.
Hammer
,
C.
Hargus
,
E. D.
Hermes
,
P. C.
Jennings
,
P.
Bjerre Jensen
,
J.
Kermode
,
J. R.
Kitchin
,
E.
Leonhard Kolsbjerg
,
J.
Kubal
,
K.
Kaasbjerg
,
S.
Lysgaard
,
J.
Bergmann Maronsson
,
T.
Maxson
,
T.
Olsen
,
L.
Pastewka
,
A.
Peterson
,
C.
Rostgaard
,
J.
Schiøtz
,
O.
Schütt
,
M.
Strange
,
K. S.
Thygesen
,
T.
Vegge
,
L.
Vilhelmsen
,
M.
Walter
,
Z.
Zeng
, and
K. W.
Jacobsen
, “
The atomic simulation environment—a Python library for working with atoms
,”
J. Phys.: Condens. Matter
29
(
27
),
273002
(
2017
).
20.
Molly.jl: Molecular simulation in Julia.
21.
C.
Rackauckas
,
Y.
Ma
,
J.
Martensen
,
C.
Warner
,
K.
Zubov
,
R.
Supekar
,
D.
Skinner
, and
A.
Ramadhan
, “
Universal differential equations for scientific machine learning
,” arXiv:2001.04385 (
2020
).
22.
ACEpotentials.jl. Documentation and user interface for Julia-language development of ACE potentials, https://github.com/ACEsuit/ACEpotentials.jl.
23.
J. F.
Ziegler
,
J. P.
Biersack
, and
U.
Littmark
,
The Stopping and Range of Ions in Solids
(
Pergamon
,
1985
).
24.
W. J.
Szlachta
,
A. P.
Bartók
, and
G.
Csányi
, “
Accuracy and transferability of Gaussian approximation potential models for tungsten
,”
Phys. Rev. B
90
(
10
),
104108
(
2014
).
25.
A. P.
Bartók
,
J.
Kermode
,
N.
Bernstein
, and
G.
Csányi
, “
Machine learning a general-purpose interatomic potential for silicon
,”
Phys. Rev. X
8
(
4
),
041048
(
2018
).
26.
A. P.
Bartók
,
S.
De
,
C.
Poelking
,
N.
Bernstein
,
J. R.
Kermode
,
G.
Csányi
, and
M.
Ceriotti
, “
Machine learning unifies the modeling of materials and molecules
,”
Sci. Adv.
3
(
12
),
e1701816
(
2017
).
27.
Y.
Zuo
,
C.
Chen
,
X.
Li
,
Z.
Deng
,
Y.
Chen
,
J.
Behler
,
G.
Csányi
,
A. V.
Shapeev
,
A. P.
Thompson
,
M. A.
Wood
, and
S. P.
Ong
, “
Performance and cost assessment of machine learning interatomic potentials
,”
J. Phys. Chem. A
124
(
4
),
731
745
(
2020
).
28.
S. J.
Clark
,
M. D.
Segall
,
C. J.
Pickard
,
P. J.
Hasnip
,
M. I. J.
Probert
,
K.
Refson
, and
M. C.
Payne
, “
First principles methods using CASTEP
,”
Z. Kristallogr. - Cryst. Mater.
220
(
5–6
),
567
570
(
2005
).
29.
G. C.
Sih
,
P. C.
Paris
, and
G. R.
Irwin
, “
On cracks in rectilinearly anisotropic bodies
,”
Int. J. Fract. Mech.
1
(
3
),
189
203
(
1965
).
30.
J. R.
Kermode
,
T.
Albaret
,
D.
Sherman
,
N.
Bernstein
,
P.
Gumbsch
,
M. C.
Payne
,
G.
Csányi
, and
A.
De Vita
, “
Low-speed fracture instabilities in a brittle crystal
,”
Nature
455
(
7217
),
1224
1227
(
2008
).
31.
E.
Bitzek
,
J. R.
Kermode
, and
P.
Gumbsch
, “
Atomistic aspects of fracture
,”
Int. J. Fract.
191
(
1–2
),
13
30
(
2015
).
32.
B.
Cheng
,
E. A.
Engel
,
J.
Behler
,
C.
Dellago
, and
M.
Ceriotti
, “
Ab initio thermodynamics of liquid and solid water
,”
Proc. Natl. Acad. Sci. U. S. A.
116
(
4
),
1110
1115
(
2019
).
33.
G.
Schwarz
, “
Estimating the dimension of a model
,”
Ann. Stat.
6
(
2
),
461
464
(
1978
).
34.
J.
Vandermause
,
Y.
Xie
,
J. S.
Lim
,
C. J.
Owen
, and
B.
Kozinsky
, “
Active learning of reactive Bayesian force fields applied to heterogeneous catalysis dynamics of H/Pt
,”
Nat. Commun.
13
(
1
),
5183
(
2022
).
35.
E. V.
Podryabinkin
and
A. V.
Shapeev
, “
Active learning of linearly parametrized interatomic potentials
,”
Comput. Mater. Sci.
140
,
171
180
(
2017
).
36.
M. K.
Bisbo
and
B.
Hammer
, “
Global optimization of atomic structure enhanced by machine learning
,”
Phys. Rev. B
105
,
245404
(
2022
).
37.
M.
Tang
,
P. C.
Pistorius
,
S.
Narra
, and
J. L.
Beuth
, “
Rapid solidification: Selective laser melting of AlSi10Mg
,”
JOM
68
,
960
(
2016
).
38.
ACEHAL, Implementation in Python, https://github.com/libAtoms/ACEHAL.
39.
F.
Neese
, “
The ORCA program system
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
2
(
1
),
73
78
(
2012
).
40.
J.-D.
Chai
and
M.
Head-Gordon
, “
Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections
,”
Phys. Chem. Chem. Phys.
10
,
6615
(
2008
).
41.
Polyethylene Glycol [MAK Value Documentation, 1998]
(
John Wiley and Sons, Ltd.
,
2012
), pp.
248
270
.
42.
C. C.
Stoumpos
,
C. D.
Malliakas
,
J. A.
Peters
,
Z.
Liu
,
M.
Sebastian
,
J.
Im
,
T. C.
Chasapis
,
A. C.
Wibowo
,
D. Y.
Chung
,
A. J.
Freeman
,
B. W.
Wessels
, and
M. G.
Kanatzidis
, “
Crystal growth of the perovskite semiconductor CsPbBr3: A new material for high-energy radiation detection
,”
Cryst. Growth Des.
13
(
7
),
2722
2727
(
2013
).
43.
J. P.
Perdew
,
K.
Burke
, and
M.
Ernzerhof
, “
Generalized gradient approximation made simple
,”
Phys. Rev. Lett.
77
,
3865
3868
(
1996
).
44.
E.
Fransson
,
J.
Wiktor
, and
P.
Erhart
, “
Phase transitions in inorganic halide perovskites from machine learning potentials
,”
J. Phys. Chem. C
127
(
28
),
13773
13781
(
2023
).
45.
A.
Bochkarev
,
Y.
Lysogorskiy
,
C.
Ortner
,
G.
Csanyi
, and
R.
Drautz
, “
Multilayer atomic cluster expansion for semilocal interactions
,”
Phys. Rev. Res.
4
,
L042019
(
2022
).
46.
I.
Batatia
,
D. P.
Kovács
,
G. N. C.
Simm
,
C.
Ortner
, and
G.
Csányi
, “
MACE: Higher order equivariant message passing neural networks for fast and accurate force fields
,” in
Advances in Neural Information Processing Systems
(
Neural Information Processing Systems Foundation
,
2022
), Vol. 35, pp.
11423
11436
.
47.
J. P.
Darby
,
J. R.
Kermode
, and
G.
Csányi
, “
Compressing local atomic neighbourhood descriptors
,”
npj Comput. Mater.
8
(
1
),
166
(
2022
).
48.
L.
Zhang
,
B.
Onat
,
G.
Dusson
,
A.
McSloy
,
G.
Anand
,
R. J.
Maurer
,
C.
Ortner
, and
J. R.
Kermode
, “
Equivariant analytical mapping of first principles Hamiltonians to accurate and transferable materials models
,”
npj Comput. Mater.
8
,
158
(
2022
).
49.
J. M.
Munoz
,
I.
Batatia
, and
C.
Ortner
, “
Boost invariant polynomials for efficient jet tagging
,”
Mach. Learn.: Sci. Technol.
3
,
04LT05
(
2022
).
50.
R.
Drautz
and
C.
Ortner
, “
Atomic cluster expansion and wave function representations
,” arXiv:2206.11375.
51.
D.
Zhou
,
H.
Chen
,
C.
Hin Ho
, and
C.
Ortner
, “
A multilevel method for many-electron Schrödinger equations based on the atomic cluster expansion
,” arXiv:2304.04260.