We introduce ACEpotentials.jl, a Julia-language software package that constructs interatomic potentials from quantum mechanical reference data using the Atomic Cluster Expansion [R. Drautz, Phys. Rev. B **99**, 014104 (2019)]. As the latter provides a complete description of atomic environments, including invariance to overall translation and rotation as well as permutation of like atoms, the resulting potentials are systematically improvable and data efficient. Furthermore, the descriptor’s expressiveness enables use of a linear model, facilitating rapid evaluation and straightforward application of Bayesian techniques for active learning. We summarize the capabilities of ACEpotentials.jl and demonstrate its strengths (simplicity, interpretability, robustness, performance) on a selection of prototypical atomistic modelling workflows.

## I. INTRODUCTION

Machine-learning interatomic potentials (MLIPs) continue to revolutionize the fields of molecular and materials simulation.^{1,2} MLIPs provide the means to simulate atomistic systems at or close to the accuracy of electronic structure methods, while being computationally cheaper by orders of magnitude. They make the simulation of large-scale systems and long time-scales at high model accuracy accessible and have therefore become an indispensable tool for atomic-scale simulation. Recent reviews of the field are provided in Refs. 3–6. Of particular relevance to the present work are the methods introduced in Refs. 2 and 7–9.

To create an MLIP, one begins with a flexible functional form, constrained only to comply with the natural symmetries of the potential energy in three-dimensional space, then estimates its parameters using reference data, typically in the form of energies, forces, and virial stresses for a set of representative atomic configurations. Ordinarily, the data are generated with quantum mechanical techniques, such as density functional theory (DFT) calculations, which may be performed only for relatively small structures. A well-trained MLIP is then expected to provide accurate predictions of processes on similar but also much larger spatial scales.

The *Atomic Cluster Expansion* (ACE) introduced in Ref. 9 is a particular MLIP flavor that is flexible, theoretically well founded, interpretable, and for which it is straightforward to tune the cost-accuracy balance. It is establishing itself as a successful MLIP approach for a wide range of tasks, especially but not exclusively in materials simulation; see e.g., Refs. 10–17. Linear variants of the ACE model have been found remarkably data efficient and computationally efficient and as such have proven particularly useful for active learning (AL) workflows^{14} as Secs. III and IV will demonstrate. Linearity in particular enables sensitivity analysis and a path towards reliable uncertainty quantification.

This article describes ACEpotentials.jl, which ties together a collection of Julia-language packages to expose a user-oriented interface facilitating the convenient construction of ACE MLIPs. To highlight the ease of use of our package, Listing 1 provides a complete Julia-language example that produces an ACE potential for a TiAl dataset.

At the time of writing, ACEpotentials.jl provides interfaces for *linear* ACE models, which give good accuracy as well as performance both in parameter estimation and prediction. We have incorporated a range of geometric and analytical priors into the default model parameters that have proven robust in a range of tasks, including the challenging low data regime arising in active learning workflows. ACEpotentials.jl models can be used for molecular dynamics (MD) simulation in Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS),^{18} the Atomic Simulation Environment (ASE)^{19} and Molly.jl.^{20}

The Julia-language codes on which ACEpotentials.jl builds are written with ease-of-use, performance, and flexibility of model development in mind. Several variations and extensions of the ACE model implementations discussed in this article are under active development. The choice of Julia as the development language enables seamless transition from rapid prototyping to performance optimization. Moreover, Julia is establishing itself as leader in *scientific machine learning* (see, e.g., Ref. 21), facilitating highly customized model architectures with novel computational kernels.

Finally, we emphasize that the aim of this article is to illustrate the capabilities of ACEpotentials.jl but not to precisely document its use; for the latter see the reference material at Ref. 22, which will evolve along with the software. While the examples and code snippets provided throughout this article are compatible with the present version of ACEpotentials.jl, they should be taken primarily as illustrations of how the package may be used. The documentation will be kept up-to-date for the foreseeable future and will continually expand to describe additional options and features.

## II. METHODS

### A. Review of the linear ACE framework

#### 1. Model specification

*r*_{i},

*Z*

_{i}), and the computational unit cell (with open or periodic boundary conditions). In the ACE model, the total potential energy of such a structure is decomposed into site energies,

*ɛ*

_{i}depends on its atomic neighbourhood containing all atoms within a cutoff radius

*r*

_{cut}from

*r*_{i}, taking into account the boundary conditions. The ACE framework provides a design space to construct systematic models for the site energy

*ɛ*

_{i}in terms of a complete linear basis of body-ordered symmetric polynomials.

*x*_{i}≔ (

*r*_{i},

*Z*

_{i}) for the state of an atom and

*x*_{ij}≔ (

*r*_{ij},

*Z*

_{i},

*Z*

_{j}), where

*r*_{ij}=

*r*_{j}−

*r*_{i}, for the state of a bond between atoms

*x*_{i},

*x*_{j}. In terms of these variables the site energy is expanded in body-order, in two different formulations:

*canonical cluster expansion*. It can be transformed

^{9}into the second formulation (2b), where the sums run over all possible combinations of atoms, including all permutation-equivalent clusters and even “artificial clusters” with repeated particles. This transformation introduces unphysical self-interaction terms such as

*V*

^{(2)}(

*x*_{ij},

*x*_{ij}), but this counter-intuitive choice leads to a tensor product structure that can be exploited in constructing a highly efficient evaluation scheme. Our code is unique in that it implements the transformation between the two descriptions and also allows the evaluation of the canonical formulation (2a). Indeed the default ACEpotentials.jl model specification uses a combination of the two formulations. We will briefly review the challenges involved in evaluating cluster expansion models in the Appendix.

Both series in Sec. II A are truncated versions of an exact body-order expansion. An exact expansion would include terms up to the number of atoms in the system, while here the maximum body-order is $\nu \u0304+1$ (corresponding to a correlation order of $\nu \u0304$), which constitutes the first approximation parameter. In practice, the truncation is performed at low to moderate $\nu \u0304$ (typically 5 or less) for several reasons, including control of model complexity and computational cost.

*V*

^{(ν)}(or,

*U*

^{(ν)}) is parameterized by a linear model, a process for which we give details below in the following sections. This then results in a parameterisation of the site energy that is also linear,

**is a vector of parameters and**

*c*

*B*_{i}a vector of basis functions (or, features) involved in the expansion of the many-body potentials

*V*

^{(ν)}or

*U*

^{(ν)}. The basis functions are by construction invariant under rotations, reflections and permutations of like atoms. The representation is also

*complete*(or, universal) in the sense that when the approximation parameters (body-order, cutoff radius, and expansion resolution) are taken to infinity, the model can in principle represent an arbitrary smooth site-energy potential. Linearity of the model allows us to employ a vast range of established tools for parameter estimation and uncertainty quantification, and enables rapid model development by refitting to new training data or with adjusted hyperparameters.

The basis functions *B*_{i} specify the model. In a typical example this can be done as demonstrated in Listing 2.

The model object specifies the model site energy potential, from which derived properties such as potential energy, forces and virial stresses can be computed that are used in molecular statics, molecular dynamics or sampling algorithms.

There are many additional parameters and options available to specify an ACE model, some of which we discuss throughout the remainder of this paper. For a complete list of options we refer to the documentation.^{22} We only remark briefly on the Eref parameter: We recommend the explicit specification of the one-body term *V*^{(0)}. We observed in many tests that constraining *V*^{(0)}(*Z*_{i}) to be the energy of a single isolated atom with atomic number *Z*_{i} yields more chemically realistic potentials that are more robust in practical molecular dynamics and molecular statics simulations, especially those involving breaking and forming bonds. One provides this information to an ACE model as shown in Listing 2, line 6.

In the remainder of this section we maintain a focus on high level intuitive understanding of options and parameters and avoid details and technicalities of the ACE framework as much as possible. For those details we refer to the Appendix. and to the many publications now available on the subject.^{6,9,11,12}

#### 2. Parameter estimation

**= {**

*R**R*}, for which the total potential energy $ER\u2208R$, forces $FR\u2208R3\xd7NR$ (with

*N*

_{R}the number of atoms in the computational unit cell) and possibly also virial stresses $VR\u2208R6$ (in Voigt notation) have been evaluated with an electronic structure model. We define

*E*(

**;**

*c**R*),

*F*(

**;**

*c**R*),

*V*(

**;**

*c**R*) as the corresponding energies, forces and virials for the structure

*R*in the ACE model, with parameters

**. The simplest way to estimate those parameters is then to minimize the least squares loss function**

*c**w*

_{E,R},

*w*

_{F,R},

*w*

_{V,R}can be used to give more or less relative “importance” to certain structures or observations. They are usually highly structured (e.g.,

*w*

_{E,R},

*w*

_{V,R}are scaled with the number of atoms in a structure

*R*), which will be discussed in more detail in Sec. II E. Since the ACE model is linear in

**it follows that**

*c**L*(

**) is quadratic, which means that minimizing**

*c**L*is a linear least squares problem. A wide range of efficient numerical techniques exist for its solution. In particular we will normally employ regularized or Bayesian variations of the naive least squares minimization, which are discussed in Secs. II E and II F.

In Listing 3 we read in such a prepared training set provided in the extended XYZ format and then estimate the model parameters with a default solver (Bayesian Linear Regression; cf. Sec. II F). Several steps are combined and hidden from the user in the acefit! convenience function, but all these steps can in principle also be performed manually, e.g., to explore different parameter estimation algorithms that are currently not interfaced by ACEpotentials.jl. In line 5 of the listing, the fitted model is exported to a format that can be used for molecular dynamics simulations in LAMMPS.

### B. Choice of basis functions and geometric priors

The parameters in the model specification in Listing 2 specify a basis in which the *V*^{(ν)} potentials are expanded. In the current section we will detail the *basis functions* that are employed, while in Sec. II C we will then explain how to select a finite subset from the infinite complete basis set.

#### 1. One-particle basis

*one-particle basis*

*ϕ*

_{k}in which all smooth functions

*f*(

*x*_{ij}) =

*f*(

*r*_{ij},

*Z*

_{i},

*Z*

_{j}) can be expanded. The most general form we consider is

*δ*denotes the Kronecker symbol and we have identified

*k*= (

*z*,

*n*,

*l*,

*m*). The $Ylm$ are the standard complex spherical harmonics, while

*R*

_{nl}is called the

*radial basis*. The choice of $Ylm$ to embed the angular component $r\u0302ij$ facilitates the exact symmetrization of the parameterisation with respect to rotations. Since (

*r*

_{ij},

*Z*

_{i},

*Z*

_{j}) is already invariant under rotations, the choice of

*R*

_{nl}is extremely general. Nevertheless we will below outline a heuristic that leads to a narrow class of choices that have proven successful in many applications. However, we note that the optimal choice of

*R*

_{nl}remains an active area of research and will likely also evolve within ACEpotentials.jl.

*ϕ*

_{k}is selected, each potential

*V*

^{(ν)}(or,

*U*

^{(ν)}) is expanded in terms of a tensor product many-body basis,

*Z*

_{i}. During the parameter estimation, the parameters will be constrained to guarantee invariance of the resulting potentials under rotations and reflections of an atomic environment. Invariance under permutations is already ensured through the summation in Sec. II A. The Appendix reviews additional details of this invariant basis construction, resulting in the specification of

*B*_{i}in terms of which site energy is defined in (3).

#### 2. Radial basis

*R*

_{nl}, which can be thought of as a

*geometric prior*. For example, it incorporates the interaction range (cutoff radius,

*r*

_{cut}) and can be tuned to capture rough qualitative information about interacting atoms. In the following we describe a class of radial bases, available through ACEpotentials.jl, that require no data-driven optimization and thus leads to genuinely linear models. At the time of writing this article, ACEpotentials.jl supports radial bases indexed by

*n*only, i.e.

*R*

_{nl}=

*R*

_{n}for all

*l*. This class is described by

*y*is an element-dependent distance transform, which can be used to impose increased spatial resolution where needed, especially near the equilibrium bond-length. We typically employwhere$y(rij,Zi,Zj)=1+a(r/r0)q1+(r/r0)q\u2212p\u22121,$*r*_{0}is an estimate of the equilibrium bond-length in the system and*a*is chosen to maximize the gradient of*y*at*r*=*r*_{0}, thereby maximizing resolution for nearest-neighbour interaction.The idea behind this transform is that it behaves as

*r*^{−q}for large*r*and as 1 −*r*^{p}/*a*for small*r*thereby decreasing resolution in those two limits at rates determined by the parameters*p*,*q*. The reduction in resolution in the small*r*regime is desirable when no data is available to specify the model in that regime; see also Fig. 1.*P*_{n}is an orthogonal basis in*y*-coordinates. Our default choice is the Legendre orthogonal polynomial basis, which implicitly assumes equidistribution of resolution in*y*-coordinates.Finally,

*f*_{env}is an envelope that specifies the cutoff radius*r*_{cut}.– The default and canonical choice for the many-body basis iswhere$fenv(rij,Zi,Zj)=y2(y\u2212ycut)2,$*y*_{cut}=*y*(*r*_{cut},*Z*_{i},*Z*_{j}).– The default choice of envelope for the pair potential*U*^{(1)}or*V*^{(1)}is Coulomb potential tilted to ensure a smooth cutoff,$fenv(rij,Zi,Zj)=rijr0\u22121\u2212rcutr0\u22121+rcutr0\u22122rijr0\u2212rcutr0,$

*r*→ 0 but continuously differentiable at the cutoff.

While the envelope for the many-body potential is canonical, for the pair potential envelope there is significant scope for inserting prior modelling knowledge of the system of interest. For example, one could replace the *r*^{−1} type behaviour with *r*^{−p} + *r*^{−q} to obtain different behaviour as *r* → 0 and *r* → *r*_{cut}, or in fact one could incorporate the Ziegler-Biersack-Littmark (ZBL) potential^{23} to obtain asymptotically exact repulsion.

The effect of the distance transform *y* = *y*(*r*) and of the envelope function are visualized in Fig. 1.

- Repulsion restraint: The construction outlined above means that, in the canonical cluster expansion formulation, the pair potential is given bywhere $pZiZj$ is a polynomial in transformed$V(1)(rij,Zi,Zj)=fenv(rij,Zi,Zj)pZiZj(yij),$
*y*coordinates. By imposing the constraint that $pZiZj(y0)=1$, where*y*_{0}=*y*(0,*Z*_{i},*Z*_{j}), we ensure that*E*∼*f*_{env}(*r*_{ij}) as*r*_{ij}→ 0. This*guarantees*repulsive behaviour of the total energy, independently of whether or not this is provided through the training data. In practice we enforce this weakly through a mild restraint to give the potential more flexibility.

### C. A priori sparsification and smoothness prior

*k*

_{1}, …,

*k*

_{ν}), we wish to incorporate into the expansion of the (

*ν*+ 1)-body potential

*V*

^{(ν)}.

#### 1. Sparse basis selection

Recall that *k*_{t} = (*z*_{t}, *n*_{t}, *l*_{t}, *m*_{t}), and that the bound |*m*_{t}| ≤ *l*_{t} on *m*_{t} automatically gives a selection of possible *m*_{t} values once *l*_{t} bounds are chosen. Roughly speaking, *n*_{t}, *l*_{t} measure how oscillatory the corresponding basis functions are in, respectively, the radial *r*_{t} and angular $r\u0302t$ coordinates. Therefore one typically puts upper bounds *n*_{t} ≤ *n*_{max} and *l*_{t} ≤ *l*_{max} in the basis selection, i.e. one chooses all basis functions (*k*_{1}, …, *k*_{ν}) in the expansion for which these bounds are satisfied. Lower bounds lead to a smaller basis, but also less flexibility and correspondingly lower accuracy on the training set.

*mixed regularity*of the basis functions we select. This is done by choosing a maximum

*total*degree totaldegree(

*ν*) for each correlation order

*ν*and choosing all basis functions (

*k*

_{1}, …,

*k*

_{ν}) such that

*w*

_{L}allows us to select whether we require lower or higher resolution of the angular vs radial components of the interaction. Note that a higher weight

*w*

_{L}decreases the angular resolution. The resulting selected basis is much sparser and is appropriate for parameterising very smooth functions in high dimension.

The default usage is that totaldegree(*ν*) takes the same value for all *ν* but one may also specify a separate total degree for each correlation order *ν*. For example, Listing 5 demonstrates how to select a stronger weight *w*_{L} = 2.0 thus providing less angular resolution, as well as how to select total polynomial degrees 25, 23, 20, 10 for, respectively, parameterising *V*^{(1)}, *V*^{(2)}, *V*^{(3)}, *V*^{(4)}.

Significant further fine-tuning of the basis specification is possible, e.g. choosing different total degrees and *w*_{L} parameters for different interacting species. This is explained in the package documentation.^{22}

#### 2. Smoothness prior

*architecture*specification. An issue closely related to the sparse basis selection (9) is the definition of a smoothness prior that may be employed for ridge regression (regularized least squares) which we discuss in Sec. II E or in the Bayesian framework of Sec. II F. As explained above, the value

*p*and defining

_{kk}=

*γ*

_{k}. If

**are the model parameters then ‖Γ**

*c***‖**

*c*_{2}will be a rough estimate for how smooth the potential energy surface (PES) is.

The matrix Γ also serves as a smoothness prior within the Bayesian interpretation of ridge regression: the prior distribution for the model parameters ** c** is given by a multivariate normal distribution that is centered at the origin and has variance proportional to Γ

^{−2}; see Secs. II E and II F. In ACEpotentials.jl this operator can be constructed as shown in Listing 6, with

*p*= 4,

*w*

_{L}= 1 the default.

The resulting operator Γ may now be used to specify the regularizer (or prior) of parameter estimation algorithms, e.g., in Listing 3, line 2 and explained in more detail in Secs. II E and II F. A key point is that Γ is a *rigorous* smoothness prior for the canonical cluster expansion (2a) but only a heuristic for the self-interacting expansion (2b).

### D. Training data

In the foregoing sections we discussed in some depth how an ACE interatomic potential architecture can be conveniently specified. The next task is to estimate the parameters matching the model to training data.

A training dataset consists of a collection of reference structures, ** R** = {

*R*}, each with associated potential energy $ER\u2208R$, forces $FR\u2208R3\xd7NR$ and, when appropriate, virials $VR\u2208R6$ (Voigt notation). The reference energies, forces and virials are typically obtained by evaluating a “high fidelity” reference potential energy surface for which we wish to obtain an ACE surrogate model. Density Functional Theory is a common choice, but higher levels of theory such as Coupled-Cluster methods are also used, especially for non-periodic systems. In addition each training structure should be given a label that specifies related sub-groups. For example, these subgroups could indicate different phases of a material, and the resulting labels might be “bcc,” “fcc,” “liquid.” The label could also indicate the MD temperature from the which the structures were generated, e.g. “fcc500K” or “liquid2500K.” This allows convenient filtering of the training set, e.g., for assigning training weights (cf. Sec. II E) or fitting to subsets.

Acquisition of training data need not be performed within the ACEpotentials.jl package, but can be undertaken in any simulation software that makes it convenient to generate and manipulate atomic structures, perform molecular dynamics or Monte Carlo simulations, and to evaluate structures using a high fidelity electronic structure model. Because of the general ease of use and in particular ease of interoperability with the Julia molecular simulation eco-system, we often use the Atomic Simulation Environment.^{19}

The standard format for storing and retrieving a training set in ACEpotentials.jl is the extended XYZ format and can be read as shown in Listing 7. This results in a list of atomic structures storing the structure information as well as the training data.

#### 1. Overview of training set acquisition

The acquisition of training data is often the most time-consuming aspect of MLIP development. An in-depth discussion goes beyond the scope of this software review article; important details can be found for example in Refs. 5, 12, and 24–26. In the remainder of this section we give an outline of general strategies to consider, while in Sec. III we go into practical aspects how training sets can be constructed in a few prototypical applications and what kind of tools ACEpotentials.jl provide to support that task.

The overarching requirements are that training sets (1) must contain small enough atomic structures that they can be evaluated using high-fidelity electronic structure models; and (2) must contain snapshots of all possible local atomic configurations one expects to encounter during simulation and prediction tasks. Thus, generating a training set reduces to generating representative atomic structures which are then evaluated with the reference model to obtain target potential energies, forces and virials. While the latter is usually straightforward and varies little between projects, there is no standard way yet to generate the training structures. The choice will depend on the atomic system at hand, and the simulation tasks that the model must be able to perform reliably, e.g. which system properties (observables) are to be modelled.

As a first step, one should “sketch out” the parts of the potential energy landscape that are of interest, e.g. construct one representative structure for each distinct energy minimum of interest. This might include different phases or material defects that the final model should be able to describe. Next, one generates random samples from those sketches for example by displacing the atom positions (randomly, along normal modes, volume scans, and so forth), or by subsampling an *ab initio* molecular dynamics trajectory. After collecting a seemingly adequate number of training structures (the total number of observations should normally exceed that number of parameters) one can fit a first model and test that model’s accuracy with respect to some target property. If the accuracy is inadequate, or the model not robust (e.g., an MD simulation is unstable), then a good strategy is to proceed with an iterative model refinement process. In each iteration additional training structures are selected to converge the model’s accuracy with respect to the target properties of interest. One might add hand-crafted structures to fix a particular flaw (e.g. to improve description of inter-molecular interaction in a molecular liquid or include supercells with vacant atomic sites) or model-driven MD to less computationally expensively explore relevant parts of Potential Energy Surface (for example, low potential energy regions to bring potential-Boltzmann-sample closer to reference-Boltzmann-sample and wider temperature/pressure range than intended for application of interest to make the model-driven simulations more stable).

Iterative model refinement is closely related to *active learning*. That strategy assumes that there is an accurate and efficient way available to estimate model uncertainty. During a simulation task, for example a molecular dynamics simulation, when a structure with high uncertainty is encountered it is evaluated with a reference method and added to the training data. To accelerate this process, we developed Hyper-Active Learning,^{14} which biases molecular dynamics simulation towards high-uncertainty and high predicted error regions. This strategy is sometimes capable of more rapidly generating many independent training samples. Section III will go into some details how this strategy is used in practice.

### E. Parameter estimation: Ridge regression

Recall from Sec. II A that the linear ACE models are parameterized linearly as shown in (3). As described in Sec. II D we estimate parameters by matching the model to observations of total energies, forces and virials evaluated via a high fidelity reference model on different training structures *R* ∈ ** R**, where

**denotes the training set. To estimate the parameters we minimize the loss function (4). In the current section, we go into further details of the parameter estimation process once the model and training set have been specified.**

*R**w*

_{E,R},

*w*

_{F,R}and

*w*

_{V,R}, which allow users to specify the relative importance of different observations and structures. In principle one could specify individual weights for each structure

*R*and observation type

*E*,

*F*,

*V*. In practice, it has proven convenient to label all structures

*R*with a

*configuration type*as described in Sec. II D and to assign weights according to such groups. In addition the weights

*w*

_{E,R},

*w*

_{V,R}should scale like $1/NR$ where

*N*

_{R}denotes the number of atoms in the structure

*R*.

^{2,12}Thus, the weights

*w*

_{E,R},

*w*

_{V,R}take the form

*L*(

**) can be rewritten in the form**

*c***y**is a vector containing the observation values $ER,FR,VR$,

**A**is the design matrix containing the ACE basis values corresponding to those observations and

**W**a diagonal matrix containing the weights

*w*

_{E,R},

*w*

_{F,R},

*w*

_{V,R}. Solving the linear least squares system (11) often results in overfitting, hence one almost always employs regularized methods, for example the ridge regression formulation,

**Γ**specifies the form of the regularizer and

*λ*a scaling parameter determining the relative weight of the regularisation. This formulation of the least squares problem is often also called regularized least squares, and the

*λ*‖Γ

**‖**

*c*^{2}term is often called generalized Tikhonov regularisation. The default for

**Γ**is zero or the identity, depending on the choice of solver. Our recommendation is to use the smoothness prior introduced in (10) instead for most solvers. Automatic relevance determination (ARD) is unique amongst the ridge regression solvers available in ACEpotentials.jl in that it estimates a regularizer

**Γ**from the sensitivity of the parameters to the training data, at additional computational cost; see Sec. II F for more details.

To solve the ridge regression problem (12), ACEpotentials.jl employes the package ACEfit.jl (https://github.com/ACEsuit/ACEfit.jl), which offers a range of such algorithms. In the simplest setting, it can be used as shown in Listing 8, lines 6 and 7. For a list of the most important solvers, see Table I. For large models and/or large datasets, the parameter estimation task can be computationally challenging and may have to be performed on a cluster.

QR | QR decomposition: Direct solution of the ridge regression problem (12). Tikhonov regularisation is imposed by extending the |

linear system. This method should rarely be used in practice and is included mostly for theoretical interest and the sake of | |

completeness. | |

solver = QR(lambda = 0.0) | |

LSQR | Krylov method: The standard iterative Krylov algorithm to solve the ridge regression problem (12). Tikhonov regularisation is |

imposed implicitly in the algorithm, with damp corresponding to the parameter λ. Early termination, by adjusting atol provides an | |

additional and different form of regularisation. This algorithm is suitable for very large-scale parameter estimation problems. | |

solver = LSQR(damp = 1 × 10^{−4}, atol = 1 × 10^{−6}) | |

RRQR | Rank-revealing QR decomposition: A random matrix sketching approach, which is computationally more efficient than the |

standard QR decomposition. In addition, the parameter rtol is closely related to λ in (12) but not identical. Instead of adding a | |

Tikhonov term, RRQR regularisation is imposed by removing highly sensitive subspaces as determined by rtol. For large problems, | |

this algorithm is more performant than the standard QR decomposition. | |

solver = RRQR(rtol = 1e−5) | |

BLR | Bayesian linear regression: (or, Bayesian ridge regression) specifies a class of solvers that estimate regularisation hyperparameters, |

depending on the setting it estimates the scaling parameter λ or the entire Tikhonov matrix Γ. This solver also determines | |

a posterior model distribution that can be used for uncertainty quantification. See Section II F. for further details. This algorithm is | |

more robust than QR, LSQR, RRQR, but computationally more intensive. It is highly recommended for relatively small datasets. | |

solver = BLR() |

QR | QR decomposition: Direct solution of the ridge regression problem (12). Tikhonov regularisation is imposed by extending the |

linear system. This method should rarely be used in practice and is included mostly for theoretical interest and the sake of | |

completeness. | |

solver = QR(lambda = 0.0) | |

LSQR | Krylov method: The standard iterative Krylov algorithm to solve the ridge regression problem (12). Tikhonov regularisation is |

imposed implicitly in the algorithm, with damp corresponding to the parameter λ. Early termination, by adjusting atol provides an | |

additional and different form of regularisation. This algorithm is suitable for very large-scale parameter estimation problems. | |

solver = LSQR(damp = 1 × 10^{−4}, atol = 1 × 10^{−6}) | |

RRQR | Rank-revealing QR decomposition: A random matrix sketching approach, which is computationally more efficient than the |

standard QR decomposition. In addition, the parameter rtol is closely related to λ in (12) but not identical. Instead of adding a | |

Tikhonov term, RRQR regularisation is imposed by removing highly sensitive subspaces as determined by rtol. For large problems, | |

this algorithm is more performant than the standard QR decomposition. | |

solver = RRQR(rtol = 1e−5) | |

BLR | Bayesian linear regression: (or, Bayesian ridge regression) specifies a class of solvers that estimate regularisation hyperparameters, |

depending on the setting it estimates the scaling parameter λ or the entire Tikhonov matrix Γ. This solver also determines | |

a posterior model distribution that can be used for uncertainty quantification. See Section II F. for further details. This algorithm is | |

more robust than QR, LSQR, RRQR, but computationally more intensive. It is highly recommended for relatively small datasets. | |

solver = BLR() |

For small and moderate datasets we normally recommend the BLR method. For large datasets. when finely tuned regularisation is often less important, the random matrix sketching RRQR and iterative LSQR may be more appropriate.

Once the model parameters are determined as shown above, we typically wish to perform two tasks: (1) confirm the model accuracy on a test set; and (2) export the model to a format that can be used in standard MD codes, e.g., LAMMPS and ASE. Suppose that we are provided with a test data set testdata, then we can determine the model errors on that test set as seen in Listing 8, lines 9–11. This will print tables of root-mean-square error (RMSE) and mean absolute error (MAE) errors for individual configuration types. If we wish to store and/or export the fitted potential for later use, we typically save it in .json format which can be read by ACEpotentials.jl as well as its Python interface to ASE, and in .yace format which can be read by the pace extension to LAMMPS; cf. Listing 8, lines 13–15.

### F. Bayesian framework for parameter estimation

*p*(

**A**,

**y**|

**) denotes the likelihood of the observed data, and**

*c**p*(

**) the prior distribution on the model parameters. The Bayesian analogue of (12) is a Bayesian Linear Regression model with Gaussian observational noise and prior,**

*c**β*

^{−1}

**W**

^{−2}of the observation noise depends on the regression weight matrix

**W**and a hyper-parameter

*β*> 0. This choice of prior and noise model yields a Gaussian posterior distribution, $p(c|A,y)=N(c;\mu ,\Sigma )$, with mean and covariance given, respectively, by

**=**

*μ**β*

**ΣA**

^{T}

**W**

^{2}

**y**and $\Sigma =\beta ATW2A+\Sigma 0\u22121\u22121$. We assume that the prior covariance

**Σ**

_{0}is of the form of a diagonal matrix. The above Bayesian model can be connected to the ridge regression formulation of Eq. (12) by noticing that maximising the posterior density (13) is equivalent to minimizing the regularized loss in (12) when $\Sigma 0\u22121=\zeta \Gamma 2$ for some

*ζ*> 0 and

*λ*=

*ζ*/

*β*.

#### 1. Solvers and model selection via evidence maximisation

*β*

^{−1}

**W**

^{−2}and

**Σ**

_{0}. In ACE, it is sometimes difficult to make informed guesses of explicit values of these hyper-parameters that lead to good fits. We therefore commonly employ empirical Bayes approaches that infer appropriate values of these parameters directly from the training data by virtue of maximising the model evidence

**Σ**

_{0},

*β*. Intuitively, maximising the model evidence results in a model where the regularising effect of the covariance matrix

**Σ**

_{0}and the degree of penalization of model misfit—modelled by the noise covariance matrix

*β*

^{−1}

**W**

^{−2}—are balanced against the degree to which the regression coefficients are determined by the data.

#### 2. Uncertainty estimates via committees

**but a posterior parameter distribution**

*c**p*(

**). In practice, one then selects the mean parameter vector**

*c***to specify the model. However, the posterior distribution remains important to estimate the uncertainty of predictions. Evaluating such uncertainties from the exact posterior distribution is computationally expensive; instead, ACEpotentials draws**

*μ**K*samples ${ck}k=1K$ from post(

**) resulting in a committee of ACE models which can be used to obtain computationally efficient uncertainty estimates for predictions. For example, the standard deviation**

*c**σ*of a total energy prediction can be approximated by a committee via

*E*

^{μ}is the prediction made by the mean model with parameters

**, while**

*μ**E*

^{k}are the committee predictions from models with parameters

*c*_{k}. Similarly, uncertainty estimates can be made for any partial derivative of the potential energy surface such as for committee forces

*F*

^{k}=

*c*_{k}· ∇

*B*_{i}, or the mean force

*F*

^{μ}=

**· ∇**

*μ*

*B*_{i}.

The first term in (17) refers to the aleatoric, or irreducible, uncertainty arising due to randomness of the system which is dominated by the complexity of the linear ACE convergence parameters such as correlation order, polynomial degree and cutoff. The second term is the epistemic, or reducible, uncertainty arising due to a lack of data or rather information. An example how a variance estimate of the epistemic uncertainty can be obtained in the linear ACE framework is shown in Listing 9.

## III. WORKFLOW EXAMPLES

In this section, we present several practical examples of ACE usage, including simple benchmarks, practical potentials for materials and liquids to examples illustrating the hyperactive learning workflow. The scripts we used to generate the reported results are made available in a separate git repository (https://github.com/ACEsuit/ACEworkflows) that will be regularly updated as the ACEpotentials.jl package evolves.

### A. Tests with pre-existing data sets

#### 1. Benchmarks with limited-diversity datasets

We test ACEpotentials.jl with default parameters on an early single-element benchmark dataset taken from Ref. 27. This dataset was originally used to assess the relative strengths and weaknesses of four important MLIPs, the high-dimensional neural network potential (NNP),^{1} the Gaussian approximation potential (GAP),^{2} the Spectral Neighbor Analysis Potential (SNAP),^{7} and moment tensor potentials (MTP).^{8} The benchmark contains six separate datasets corresponding to the six elements Li, Mo, Ni, Cu, Si and Ge, spanning a variety of chemistries (main group metal, transition metal and semiconductor), crystal structures (bcc, fcc, and diamond) and bonding types (metallic and covalent). For each element, the dataset contains the ground-state crystal structure, strained structures with strains of −10%–10%, slab structures up to a maximum Miller index of three, and NVT *ab initio* molecular dynamics simulations of the bulk supercells with and without a single vacancy. These datasets contain a relatively large number of training structures, but only limited diversity.

In Table II we see the comparison of the MAEs in energies and forces for the best performing potentials in the benchmark (GAP and MTP) with two linear ACE models trained with the default parameters and total degrees chosen to reach basis sizes of, respectively, 300 basis functions for **ACE(s)** and ∼1000 basis functions for **ACE(l)**. We optimized none of the hyperparameters and solved used RRQR to estimate the parameters. We chose RRQR since the datasets are very large, hence a highly tuned regularisation is less important. This results in competitive accuracy across the entire benchmark. The only small exception is the slightly larger energy error for Mo-ACE(l), which suggests some fine-tuning of the model parameters could be beneficial in this particular case. Our aim with this experiment was to demonstrate that, with only minimal effort, linear ACE models can perform with (near-) best accuracy in a data set geared towards testing statistical generalization.

Energy (meV) . | Forces (eV/Å) . | ||||||||
---|---|---|---|---|---|---|---|---|---|

. | ACE(sm) . | ACE(lge) . | GAP . | MTP . | . | ACE(sm) . | ACE(lge) . | GAP . | MTP . |

Ni | 0.416 | 0.34 | 0.42 | 0.48 | 0.018 | 0.015 | 0.02 | 0.01 | |

Cu | 0.292 | 0.228 | 0.46 | 0.41 | Ni | 0.007 | 0.005 | 0.01 | 0.01 |

Li | 0.231 | 0.165 | 0.49 | 0.49 | Cu | 0.006 | 0.005 | 0.01 | 0.01 |

Mo | 2.597 | 2.911 | 2.24 | 2.83 | Li | 0.123 | 0.097 | 0.09 | 0.09 |

Si | 3.501 | 1.985 | 2.91 | 2.21 | Mo | 0.086 | 0.066 | 0.07 | 0.06 |

Ge | 2.594 | 2.162 | 2.06 | 1.79 | Si | 0.064 | 0.051 | 0.05 | 0.05 |

Energy (meV) . | Forces (eV/Å) . | ||||||||
---|---|---|---|---|---|---|---|---|---|

. | ACE(sm) . | ACE(lge) . | GAP . | MTP . | . | ACE(sm) . | ACE(lge) . | GAP . | MTP . |

Ni | 0.416 | 0.34 | 0.42 | 0.48 | 0.018 | 0.015 | 0.02 | 0.01 | |

Cu | 0.292 | 0.228 | 0.46 | 0.41 | Ni | 0.007 | 0.005 | 0.01 | 0.01 |

Li | 0.231 | 0.165 | 0.49 | 0.49 | Cu | 0.006 | 0.005 | 0.01 | 0.01 |

Mo | 2.597 | 2.911 | 2.24 | 2.83 | Li | 0.123 | 0.097 | 0.09 | 0.09 |

Si | 3.501 | 1.985 | 2.91 | 2.21 | Mo | 0.086 | 0.066 | 0.07 | 0.06 |

Ge | 2.594 | 2.162 | 2.06 | 1.79 | Si | 0.064 | 0.051 | 0.05 | 0.05 |

#### 2. Silicon

We used ACEpotentials.jl to fit a linear ACE potential to the silicon dataset introduced by Bartók *et al.*^{25} for fitting a Gaussian approximation potential (GAP). This extensive database contains a wide range of configurations ranging from several bulk crystal structures (diamond, hcp, fcc, etc.), amorphous structures as well as liquid MD snapshots, aiming to cover as much of the silicon energy landscape as possible. The corresponding GAP model was shown to outperform a wide range of other (classical) interatomic potentials on a large selection of accuracy and property or generalisation tests ranging from surface formation energies as well as liquid and radial distribution functions. The current work benchmarks an ACEpotentials.jl model, with default model parameters, containing basis functions up to order $\nu \u0304=4$, polynomial total degree *D*^{max} = 20 and 6 Å cutoff against this silicon GAP potential. The model was fitted using generalised Tikhonov regularisation (12) of *λ*Γ, where Γ was constructed using an algebraic smoothness prior (10) with *p* = 5, whilst the BLR solver was used to estimate the scaling parameter *λ*. This benchmark is formed of a series of property tests including bulk diamond elastic constants, vacancy formation energies, surface formation energies for the (100), (110), (111) surfaces and hexagonal, dumbbell and tetragonal point defect energies for bulk diamond. These results of these property tests for the CASTEP^{28} DFT reference, GAP and ACE are shown in Fig. 2 and indicate good accuracy across the range of property tests. Percentage errors relative to the DFT reference are also included, confirming similarly accurate performance between the GAP and the ACEpotentials.jl frameworks.

We also used this silicon ACE potential to carry out a more challenging test, namely to simulate fracture in the $(111)[11\u03040]$ cleavage system. We used the matscipy package to setup a 12 × 11 × 1 supercell containing 1586 atoms and to carry out structural optimizations with a Mode I crack anisotropic continuum linear elastic displacement field^{29} applied with stress intensity factors ranging from 0.6*K*_{G} to 1.5*K*_{G} (where *K*_{G} is the Griffith load at which fracture becomes thermodynamically favorable). We observed spontaneous formation of the Pandey 2 × 1 reconstructed (111) surface behind the crack tip, in good agreement with previous studies using DFT^{30} and GAP.^{25} The critical stress intensity factor was determined to be *K*_{I} = 1.0 ± 0.02*K*_{G}, which is very close to the expected Griffith value, indicating minimal lattice trapping. Overestimating the extent of lattice trapping is a common failure mode of previous interatomic potentials when applied to model fracture.^{31} The total simulation time was around 30 minutes on a 28-core workstation.

To successfully carry out the fracture test it was crucial to produce a highly regular (smooth) ACE potential. To illustrate the effect of changing the smoothness prior, a sequence of ACE potentials (order $\nu \u0304=4$, total degree *D*^{max} = 21 and 6 Å cutoff), was fitted using no smoothness prior (Γ = 1) and increasing strengths of algebraic smoothness prior (10), *p* = 1, 2, 5 and 10. In all cases the model parameters were estimated using generalized Tychonov regularisation (12) with the scale factor *λ* tuned such that all potentials achieved a force RMSE of ∼0.075 eV/Å, which is ∼5% larger than without any regularisation. The effect of the prior on predicted Si–Si dimer curves and rigid bulk Si decohesion curves, which respectively probe smoothness of two-body and many-body terms, is shown in Fig. 3. Applying a moderate smoothness prior aids extrapolation into the close-approach region and reduces the amplitude of spurious oscillations seen in the stress (S) during decohesion.

#### 3. Water

We investigated the ability of ACEpotentials.jl to capture the interactions in complex molecular liquids and to perform robust molecular dynamics simulations in such systems, fitting a linear ACE potential to a dataset containing 1593 liquid water configurations.^{32} We chose only default model parameters, containing basis functions up to correlation order $\nu \u0304=3$, polynomial total degree *D*^{max} = 15 and *r*_{cut} = 5.5 Å cutoff. Parameter estimation was performed using ARD with relevance threshold set by minmising the Bayesian Information Criterion (BIC).^{33} The training RMSE were 1.732 meV/atom for energies and 0.099 eV/Å for forces. To investigate the performance and robustness of the fitted ACE model, a series of mean squared displacement (MSD) simulation were performed under 1 bar NPT conditions at 300 K. The simulations were performed using 5184 atom simulation boxes, shown in Fig. 4 below, with the pace pair style in LAMMPS.^{12} The total simulation time for each of these simulation was 20 min utilising 1280 cores on ARCHER2, illustrating the efficiency of ACE potentials. The diffusion constant predicted by this simulation was 1.20 ± 0.03 m/s^{2}. It should be noted that diffusion constants are notoriously difficult to accurately determine especially considering the absence of long-range interactions into these ACE models. This example is therefore mostly an illustration of robustness and performance.

### B. The hyperactive learning (HAL) workflow

^{34,35}where relevant training configurations are sampled to form a training database. To accelerate this AL process, we introduced hyperactive learning (HAL),

^{14}which adds a biasing term to a molecular dynamics simulation towards predicted high uncertainty

*σ*, as shown in (18). A tunable parameter

*τ*controls the strength of the biasing and thus the balance between physical exploration (molecular dynamics) and discovery of new structures (biasing).

The HAL framework shares similarities with Bayesian Optimization (BO) as the biasing term is formally equivalent to a Lower Confidence Bound (LCB) acquisition function.^{36} Similarly to BO, the parameter *τ* adjusts the tradeoff between exploration and exploitation during the generation of training configurations using HAL. HAL-generated configurations are both energetically reasonable, guided by *E*^{ACE} (exploitation), and informative, predicted by a relatively large value of *σ* (exploration). The bias towards uncertainty, mediated by an emerging biasing force during HAL dynamics, can be viewed as a strategy to acquire information (gain) by seeking out unseen (local) environments. The HAL approach can also be viewed as an adversarial attack, aimed to destabilize a fitted ACE potential such that, after iteratively adding sufficiently many new configurations, the linear ACE model is robust to such attacks which all but guarantees stable dynamics over long timescales.

The biasing parameter *τ* in HAL necessitates careful tuning, which HAL achieves through an adaptive scheme^{14} that tunes *τ* on the fly by balancing the magnitude of the biasing force relative to the forces obtained by *E*^{ACE}. The *relative biasing parameter* *τ*_{r} used in this scheme is typically set to 0.1–0.2 and ensures that the biasing strength is reduced or increased depending on the degree of predicted uncertainty explored during the dynamics.

*K*= 8, are sampled from the posterior as discussed in Sec. II F. Biased MD/MC dynamics are then performed on

*E*

^{HAL}, using the dynamically tuned

*τ*parameter. During the dynamics the relative force uncertainty

*f*

_{i}is recorded and once it exceeds a predefined tolerance

*f*

^{tol}a DFT calculation is triggered, and the training database is extended. This relative force uncertainty

*f*

_{i}is defined as

**of the posterior over the coefficients as outlined in Sec. II F.**

*μ**ɛ*is a regularising constant used to regularize the fraction typically set to 0.2–0.4 eV/Å. Careful tuning of

*f*

_{tol}is required as it tunes the degree of extrapolation when adding new (unseen) configurations to the training database. Too large

*f*

_{tol}may lead to the sampling of energetically unreasonable configurations, whereas too small

*f*

_{tol}leads to suboptimal information gain during the HAL scheme resulting in sampling unnecessarily many configurations. The HAL scheme is outlined in Fig. 5 illustrating how from a small initial training database containing a handful of configurations of interest a stable ACE potential is generated by performing biased MD and MC steps and iteratively triggering DFT calculations. For future reference, we define a

*HAL iteration*to consist of (i) a biased MD simulation run until a new unseen structure is flagged, (ii) evaluating energies, forces and virials on the new structure, and (iii) updating the ACE potential model.

#### 1. AlSi10 melting temperature

The HAL framework was used to create an ACE potential for determining the melting temperature of the AlSi10 alloy. An initial dataset consisted of 32-atom random fcc lattice configurations, each containing 98 aluminium and 10 silicon atoms. This initial dataset was composed of five fcc random alloy configurations with lattice constants ranging from 14.3 to 16.6 Å^{3}/atom. The ACE basis set included interactions up to correlation order $\nu \u0304=2$ (three-body), and employed a cutoff of 5.5 Å. The model was fitted using Automatic Relevance Determination (ARD) and its sparsity set by minimising BIC which resulted in increasingly complex ACE models as more configurations (or information) were added. The chosen maximum polynomial degree *D*^{max} during the HAL procedure increased from 4 to 12. The parameter estimation was carried out using ARD. The HAL relative biasing strength was set to *τ*_{r} = 0.2, and the relative uncertainty threshold to *f*^{tol} = 0.2.

The HAL dynamics was used to melt the random alloy crystal structure, by ramping the temperature from 0 to 1500 K at 1 GPa using a 1 fs timestep. Cell swapping and volume adjusting HAL-MC steps were taken to facilitate exploration of the (biased) energy landscape. After 18 HAL iterations, the ACE potential was already able to consistently perform 5000 HAL MD/MC timesteps without encountering new structures with high uncertainty. This final ACE potential contained 79 basis functions as selected using ARD pruning.

During these 18 HAL iterations the dimer curves are typically examined to ensure the potentials exhibit attraction at typical interatomic distances and short range repulsion as illustrated in Fig. 6.

The ACE potential obtained after HAL iteration 18 (fitted to 22 structures in total) was subsequently used to perform nested sampling (NS) simulations to model the liquid-solid phase transition. NS simulations were performed using 384 NS walkers, using a total decorrelation length of 512 formed by volume/shear/stretch/swap MC steps at a ratio of 4:4:4:4. The resulting heat capacity curves obtained by NS are presented in Fig. 7 and are in close agreement to the melting temperature of 867 K as given by Thermo-Calc using the TCAL4 database.^{37}

#### 2. Polyethylene glycol

The HAL framework^{38} was used to create a polyethylene glycol (PEG) model. To initilize HAL, 18 structures of PEG (*n* = 32) formed of 32 monomer units in vacuum were evaluated using the ORCA code^{39} with the *ω*B97X DFT exchange correlation functional^{40} and the 6-31G(d) basis set. ACE models were fitted to the initial and subsequent datasets with correlation order $\nu \u0304=3$, total degree *D*^{max} = 12 and a cutoff radius 5.5 Å, using the ARD algorithm. The HAL protocol used relative biasing parameter *τ*_{r} = 0.1 and uncertainty tolerance *f*^{tol} = 0.3 and performed at 500 K. Unlike the previous AlSi10 example, no cell adjusting or atom swapping HAL-MC steps were performed as the configurations are isolated molecules in vacuum. It was also chosen to not change the ACE basis throughout the HAL procedure but rather to keep it constant (e.g. *D*^{max} = 12) as the initial database was relatively diverse. After 50 HAL iterations an ACE potential was generated that was deemed stable as it completed 10^{4} HAL biased MD steps without triggering a DFT calculation. It was then used to determine the density of a PEG polymer formed of *n* = 200 monomer units in LAMMPS under periodic boundary conditions using the PACE evaluator.^{12} The PEG (*n* = 200) density was determined at 300, 350 and 400 K at 1 bar pressure over a timescale of 0.5 ns as shown in Fig. 8. The density at 300 K is in good agreement with the experimental density of 1.2 g/cm^{3}^{41} at 293 K. This illustrates remarkable extrapolative performance by the linear ACE framework as the DFT reference (ORCA) does not support periodic boundary conditions itself, making determining the PEG density purely from first principles impossible.

#### 3. Perovskite CsPbBr_{3}

We used the HAL framework^{38} to create a training dataset for the lead-halide perovskite CsPbBr_{3}, which shows three relevant phases: orthorhombic at low temperatures, tetragonal at intermediate temperatures, and cubic at high temperatures, with experimental transition temperatures of 361 and 403 K.^{42} The HAL process was designed to sample all of these phases so that the resulting potential accurately represents energy and entropy of each phase and is hence capable of predicting the transition temperatures. To ensure consistent DFT energies and effective vibrational mode sampling, approximately cubic 40 atom supercells were created for all three phases.

This problem required some refinement of the standard HAL procedure, and careful testing of fitted ACE potentials for several basis sizes. We therefore give more detail about the process than in the previous cases.

The initial fit starting the HAL process used a set of 15 randomly perturbed (unit cell and atomic positions) 40-atom configurations, three from each of the high symmetry phases. The default ACE basis was used, with a cutoff of 8 Å, a smoothness prior with *p* = 3, and the sklearn BayesianRidge linear solver. Automated basis selection was applied every 10 HAL iterations, with a maximum basis size of 2000, $\nu \u0304=3$, a maximum total polynomial degree of 16, and the model score as the selection criterion. To encourage exploration of a wide range of temperatures and configurations, over a maximum of 10^{4} 1 fs HAL MD steps the temperature was ramped from 200 to 600 K, and *τ*_{r} from 0.1 to 0.5. New fitting configurations were selected when the fractional force error *f*^{tol} exceeded 0.4. After 20 iterations starting from the three unperturbed high-symmetry 40-atom cells at fixed unit cell shape and size, the process was restarted from 9 80-atom high symmetry cells, doubling each of the three 40-atom cells along each cell vector, for 20 additional iterations. Then 20 additional iterations were carried out with variable unit cell and an applied pressure of 0.

At this point the model appeared to be stable enough for 10^{5} steps without a HAL bias, so we switched to an unbiased sampling process to gather more data and improve the model accuracy. Starting the fit from the complete set of configurations from the HAL process, we generated fitting configurations from 2000 step runs with a maximum basis size of 4000. These used the same 80 atom starting configurations, but at fixed temperatures of 200–500 K at 100 K intervals, and fixed shape but variable unit cell volume. To further refine the performance of the low energy parts of the PES around each high symmetry structure, we sampled 36 more configurations, each with 160 atoms (the three 40 atom supercells doubled along each of the three pairs of lattice vectors) at a range of lower temperatures, 150–300 K at 50 K intervals.

The original set of 15 randomly perturbed configurations, another similar set of 15, and the 168 HAL configurations were used as the reference database for a set of fits to explore the performance of the model for a wide range of basis sizes. At this stage we filtered out physically unreasonable fitting data, as defined by a criterion that excluded any force larger than 10 eV/Å, as well as the energies and virials from such configurations. To fit the model and evaluate its predictive accuracy we split the set of configurations into 75% fitting and 25% testing, stratifying the split by the HAL iteration (or initial random perturbation set) that produced the configuration. The same fitting procedure and basis as in the HAL run were used, with $\nu \u0304=2$ and $\nu \u0304=3$ and maximum polynomial degree 4–16, up to a maximum basis size of 2 × 10^{4}. We also compared three choices for the smoothness prior: none, *p* = 2, and *p* = 4.

The training set residuals, test set residuals, and BayesianRidge score (log marginal likelihood) are plotted as a function of basis size in Fig. 9. For each value of $\nu \u0304$ the fitting error improves monotonically as the basis size (and polynomial degree) increases, but at equal basis size the $\nu \u0304=2$ residuals are lower by as much as 25% (especially for moderately sized bases), indicating that for this system increasing the polynomial degree provides the basis with more useful flexibility as compared to increasing $\nu \u0304$. For the basis size range where the error is minimized, the testing set residuals are larger than the fitting set by at least about a factor of 2, indicating that some amount of overfitting is occurring. The smoothness prior is successful at limiting the extent of this overfitting.

The generally lower training and test errors for the $\nu \u0304=2$ models relative to the correlation order three models are reflected in their Bayesian ridge scores (log marginal likelihoods). However, within each correlation order the optimal choice of polynomial degree and corresponding basis size indicated by the minimum test error are not consistent with the score. Indeed, the results displayed in Fig. 9 lead us to conclude that the Bayesian ridge score is not always a reliable tool for optimal basis selection and other options should be explored in the future.

We used the model with lowest test set error, generated by the fit with $\nu \u0304=2$, maximum polynomial degree 12, and smoothness prior *p* = 4, to simulate larger unit cells of CsPbBr_{3} at a range of temperatures spanning its expected range of phase transition temperatures. We simulated 32 independent constant temperature, constant pressure, MD trajectories at temperatures from 200 to 355 K and zero pressure for 10^{4} 10 fs time steps. Each trajectory started from an 8 × 8 × 6 supercell (7680 atoms) of the orthorhombic structure. To analyze the resulting structure we reconstructed the effective cubic lattice vectors and averaged their magnitudes over the last 8000 steps of each trajectory. A plot of these effective cubic cell lattice vector magnitudes as a function of temperature is shown in Fig. 10. We see the three expected phases as indicated by the degeneracy of the lattice constants: cubic at high temperature, tetragonal at intermediate temperatures, and orthorhombic at low temperatures. The transition temperatures are 240 and 255 K, which are substantially shifted relative to the experimental results of 361 and 403 K.^{42} We expect that this deviation from experiment is primarily due to our choice of exchange correlation functional, the Perdew–Burke–Enzerhof generalized-gradient approximation,^{43} as has been seen in similar simulations.^{44} A direct comparison to DFT would be useful, but it would require an accurate calculation of the predicted phase transition temperatures directly from the DFT PES, which is too computationally demanding to be practical without additional approximations.

## IV. COMPUTATIONAL PERFORMANCE

The linearity of ACE potentials renders them not only interpretable but also efficient in terms of computational performance. To demonstrate this, a performance test was conducted on various linear ACE potentials referenced in this paper. The evaluation times, as well as some ACE hyperparameters used, are shown in Table III for the AlSi10, CsPbI_{3}, H_{2}O, PEG and Si potential developed in this work. The number of basis functions for each model is given too and may be fewer than a complete ACE basis parameterized by $\nu \u0304$ and *D*^{max} due to ARD pruning basis functions with low relevance. The timings were obtained using the LAMMPs-PACE implementation^{12} using a 128 core ARCHER2 node, equivalent to two seperate AMD EPYC 7742 64-core at 2.25 GHz. The 10^{6} steps/day figures are equivalent to a ns/day and were obtained for varying cell sizes to illustrate scaling. A standardized performance figure in the form of core-*μ*s/atom figure is also provided. The silicon database fitted originates from the silicon GAP potential, whereas the AlSi10, PEG and CsPbBr_{3} potentials were fitted using HAL generated databases containing 22, 68 and 198 configurations respectively as discussed in the previous subsections.

. | ACE parameters . | Performance . | ||||
---|---|---|---|---|---|---|

$\nu \u0304$ . | D^{max}
. | r_{cut}
. | No. basis func. . | 10^{6} steps/day (atoms)
. | Core-μs/atom
. | |

AlSi10 | 2 | 7 | 5.5 | 79 | 636 (32) | 23 |

CsPbBr_{3} | 2 | 12 | 5.5 | 544 | 334 (20) | 93 |

PEG | 3 | 12 | 5.5 | 4897 | 10 (1400) | 227 |

Si | 4 | 20 | 6 | 5434 | 7 (250) | 744 |

. | ACE parameters . | Performance . | ||||
---|---|---|---|---|---|---|

$\nu \u0304$ . | D^{max}
. | r_{cut}
. | No. basis func. . | 10^{6} steps/day (atoms)
. | Core-μs/atom
. | |

AlSi10 | 2 | 7 | 5.5 | 79 | 636 (32) | 23 |

CsPbBr_{3} | 2 | 12 | 5.5 | 544 | 334 (20) | 93 |

PEG | 3 | 12 | 5.5 | 4897 | 10 (1400) | 227 |

Si | 4 | 20 | 6 | 5434 | 7 (250) | 744 |

## V. CONCLUSION AND OUTLOOK

We introduced ACEpotentials.jl, a front-end for several Julia-language packages that implement Atomic Cluster Expansion (ACE) MLIPs and related functionality. This front-end provides a user-oriented interface, while the backend packages combine excellent performance with the flexibility for rapid model development and experimentation that is typical for the Julia language. The front-end ACEpotentials.jl exposes a relatively simple subset of ACE type models, linear models with robust priors, that we consider reliable in every-day use, especially in the context of an active learning type workflow.

However, we emphasize that the ACE framework allows for a much richer MLIPs design space^{9,12,48–47} as well as parameterisation of many other types of particle systems.^{51–51} We therefore conclude by mentioning some of those extensions, as well as current short-comings, that require further development.

Robust parameter estimation, in particular hyperparameter tuning, remains under-investigated in the MLIPs context. We regularly experience that hand-tuned hyperparameters can give superior results, basis sparsification remains poorly understood, and uncertainties are often only indicative of actual errors. Further research is required to resolve these closely related issues.

The design space of the ACEpotentials.jl ACE models can be expanded to admit trainable radial embeddings, composition of ACE features with nonlinearities, or even multi-layer architectures such as.

^{45,46}This comes at the cost of highly nonlinear and less efficient models, but some of those extensions, such as trainable radial embeddings, can be undertaken while keeping the spirit of our current ACE models: small models for rapid iterative development and low evaluation cost.The extension to highly nonlinear models would likely require that the computational kernels on which ACEpotentials.jl is built also be made graphics processing unit (GPU)-capable. Towards that end a deep learning framework such as MACE

^{46}[see also the mace (https://github.com/ACEsuit/mace) code] may be better suited.Finally, we note that there are already several related ACE software packages within ACEsuit (https://github.com/ACEsuit) that implement a variety of models for other particle systems at different stages of development: Hamiltonians (Ref. 48, ACEhamiltonians.jl); wave functions (Ref. 50 and 51, ACEpsi.jl); jet tagging models (Ref. 49, BIPs.jl). These build on an experimental and significantly expanded Julia-language ACE package ACE.jl.

## ACKNOWLEDGMENTS

G.C. acknowledges support from EPSRC Grant No. EP/X035956/1. C.O., A.R., and T.J. were supported by NSERC Discovery Grant No. GR019381 and NFRF Exploration Grant No. GR022937. W.J.B. was supported by US AFRL Grant No. FA8655-21-1-7010. C.v.d.O. and G.C. acknowledge ARCHER2 for which access was obtained via the UKCP consortium and funded by EPSRC Grant No. EP/P022065/1. N.B. was supported by the U.S. Office of Naval Research through the U.S. Naval Research Laboratory’s fundamental research base program. E.G. acknowledges support from the EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies with Grant Reference No. EP/S024220/1. W.C.W. was supported by the Schmidt Science Fellows in partnership with the Rhodes Trust, and additionally acknowledges support from EPSRC (Grant No. EP/V062654/1). J.K. and C.O. acknowledge funding from the Leverhulme Trust under grant RPG-2017-191 and the EPSRC under Grant No. EP/R043612/1. J.K., J.P.D. and G.C. acknowledge support from the NOMAD Centre of Excellence funded by the European Commission under grant agreement 951786. J.K. acknowledges support from the EPSRC under Grant Nos. EP/P002188 and EP/R012474/1. This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the EPSRC (capital Grant No. EP/T022159/1), and DiRAC funding from the STFC (www.dirac.ac.uk). Further computing facilities were provided by the Scientific Computing Research Technology Platform of the University of Warwick.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**William C Witt**: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Cas van der Oord**: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Elena Gelžinytė**: Software (equal); Writing – original draft (equal). **Teemu Järvinen**: Software (equal); Writing – original draft (equal). **Andres Ross**: Methodology (equal); Software (equal); Writing – original draft (equal). **James P. Darby**: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Cheuk Hin Ho**: Data curation (equal); Investigation (equal); Methodology (equal); Software (equal). **William J. Baldwin**: Data curation (equal); Investigation (equal); Software (equal); Writing – original draft (equal). **Matthias Sachs**: Methodology (equal); Writing – original draft (equal). **James Kermode**: Software (equal); Writing – review & editing (equal). **Noam Bernstein**: Data curation (equal); Investigation (equal); Software (equal); Visualization (equal); Writing – original draft (equal). **Gábor Csányi**: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Christoph Ortner**: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request. are openly available in github.

### APPENDIX: LINEAR SCALING COST AND COMPUTATIONAL KERNELS

In Secs. II A and II B we outlined some basic ideas behind the ACE model, in particular expressing the potential energy model in terms of the many-body expansion (Sec. II A). A naive implementation of the many-body expansion results in prohibitive computational cost due to the exponential cost of the sums over clusters (*j*_{1}, …, *j*_{ν}). However, after discretizing the *U*^{(ν)}-body potential of the *self-interacting many-body expansion* (2b) the sum can be rewritten to result in linear scaling cost. This is presented in detail, for example, in.^{9,11,12} hence we shall not review this process in full detail here. In order to outline what is involved in an implementation of an ACE potential, we only recall the form that the ACE model takes after this re-organisation of the many-body summation. The evaluation of the *self-interacting* ACE basis then results in the following stages:

Evaluation of the embeddings,

*R*_{nl}(*r*_{ij},*Z*_{i},*Z*_{j}) and $Ylm(r\u0302ij)$.- Product basis: for lexicographically ordered tuples $(z,n,l,m)=(zt,nt,lt,mt)t=1\nu $ we defineThis operation can be thought of as a sparse symmetric tensor product, or as taking(A2)$Aznlmi=\u220ft=1\nu Aztntltmti.$
*ν*-correlations. - Symmetrization: To ensure invariance one averages
*A*^{i}over all rotations, resulting in the*O*(3)-invariant basisemployed in the definition of the linear ACE model (3). Here,(A3)$Bi=CAi,$*A*^{i}is the vector of $(Aznlmi)$ basis functions while $C$ a sparse matrix.

For each of these stages efficient computational kernels are implemented, designed in a modular way so that they can be independently optimized or composed into new model architectures.

#### 1. Canonical many-body expansion

Under the condition that the radial basis and envelope function are pure polynomials, it is possible to transform the self-interacting ACE basis *B*^{i} defined in (A3) into a basis for the canonical many-body expansion (2a). The idea behind this procedure is sketched out in Ref. 11. The precise details of the implementation and a detailed study is not the purpose of this review. Here, we only mention that, upon slightly extending the *R*_{nl}, *A*^{i} and *A*^{i} bases, one can obtain a “purification operator” $P$ such that the linearly transformed $PAi$ becomes a basis for the canonical many-body expansion (2a). The symmetrisation $C$ can then be applied to obtain an *O*(3)-invariant basis $Bi\u2254CPAi$.

## REFERENCES

*The Stopping and Range of Ions in Solids*

_{10}Mg

*Polyethylene Glycol [MAK Value Documentation, 1998]*

_{3}: A new material for high-energy radiation detection

*Advances in Neural Information Processing Systems*