Machine-learning potentials provide computationally efficient and accurate approximations of the Born–Oppenheimer potential energy surface. This potential determines many materials properties and simulation techniques usually require its gradients, in particular forces and stress for molecular dynamics, and heat flux for thermal transport properties. Recently developed potentials feature high body order and can include equivariant semi-local interactions through message-passing mechanisms. Due to their complex functional forms, they rely on automatic differentiation (AD), overcoming the need for manual implementations or finite-difference schemes to evaluate gradients. This study discusses how to use AD to efficiently obtain forces, stress, and heat flux for such potentials, and provides a model-independent implementation. The method is tested on the Lennard-Jones potential, and then applied to predict cohesive properties and thermal conductivity of tin selenide using an equivariant message-passing neural network potential.

## I. INTRODUCTION

Molecular dynamics (MD) simulations enable computational prediction of thermodynamic quantities for a wide range of quantum systems, and constitute a cornerstone of modern computational science.^{1} In MD, systems are simulated by propagating Newton’s equations of motion – potentially modified to model statistical ensembles – numerically in time, based on the forces acting on each atom due to their movement on the Born–Oppenheimer potential-energy surface (PES). Therefore, the quality of the underlying PES is important for the predictive ability of this method. First-principles electronic structure methods such as density-functional theory (DFT) can be used to perform high-accuracy *ab initio* molecular dynamics (aiMD) simulations,^{2} provided the exchange-correlation approximation is reliable.^{3} Such approaches are, however, restricted by high computational cost, severely limiting accessible size and time scales. Computationally efficient approximations to the underlying PES are therefore required for the atomistic simulation of larger systems: Forcefields (FFs) are built on analytical functional forms that are often based on physical bonding principles, parametrized to reproduce quantities of interest for a given material.^{4} They are computationally cheap, but parametrizations for novel materials are not always available, and their flexibility is limited by their fixed functional form. Machine-learning interatomic potentials (MLIPs),^{5–11} where a potential is inferred based on a small set of reference calculations, aim to retain near first-principles accuracy while remaining linear scaling with the number of atoms. While MLIPs are limited, in principle, to modeling the physical mechanisms present in the training data, they have nevertheless emerged as an important tool for MD,^{12–16} often combined with active learning schemes.^{17–21} Modern MLIPs can include semi-local interactions through message-passing (MP) mechanisms,^{22} internal equivariant representations,^{23} and body-order expansions,^{24} which enable the efficient construction of flexible many-body interactions. In such complex architectures, the manual implementation of derivatives is often unfeasible. Finite-difference approaches require tuning of additional parameters, as well as repeated energy evaluations. Automatic differentiation (AD)^{25,26} presents an intriguing alternative: If the computation of the potential energy *U* is implemented in a suitable framework, derivatives such as the forces ** F** or the stress

**can be computed with the same asymptotic computational cost as computing the energy. This is accomplished by decomposing the underlying “forward” computation into elementary operations with analytically tractable derivatives, computing local gradients for a given input, and then combining the resulting values using the chain rule.**

*σ*This work provides a systematic discussion of and an introduction to the use of AD to compute forces, stress, and heat flux for MLIPs. While the calculation of forces with AD is common practice,^{27–30} stress and heat flux are not yet commonly available for many MLIPs. Both quantities have been the focus of much previous work due to the difficulty of defining and implementing them for many-body potentials and periodic boundary conditions.^{31–42} Introducing an abstract definition of MLIPs as functions of a graph representation of atomistic systems, unified formulations of stress and heat flux are given, which can be implemented generally for any such graph-based machine-learning potential (GLP). An example implementation using jax^{43} is provided in the glp package.^{44} To validate the approach, different formulations of stress and heat flux are compared for the Lennard-Jones potential,^{45} where analytical derivatives are readily available for comparison, as well as a state-of-the art message-passing neural network (MPNN), so3krates.^{46} Having established the correctness of the proposed implementation, the ability of so3krates to reproduce first-principles cohesive properties and thermal conductivity of tin selenide (SnSe) is studied.

## II. AUTOMATIC DIFFERENTIATION

Automatic differentiation (AD) is a technique to obtain derivatives of functions implemented as computer programs.^{25,26} It is distinct from *numerical differentiation*, where finite-difference schemes are employed, and *symbolic differentiation*, where analytical derivatives are obtained manually or via computer algebra systems, and then implemented explicitly. Instead, AD relies on the observation that complex computations can often be split up into elementary steps, for which derivatives are readily implemented. If one can track those derivatives during the computation of the forward, or “primal,” function, the chain rule allows to obtain derivatives.

For this work, two properties of AD are particularly relevant: It allows the computation of derivatives with respect to quantities that are explicitly used in the forward computation, and it can do so at the same asymptotic computational cost as the forward function. In particular, AD can obtain two quantities efficiently: Given a differentiable function $u:RN\u2192RM$, the Jacobian of *u* is defined as the *M* × *N* matrix *∂u*_{i}(** x**)/

*∂x*

_{j}. AD can then obtain Jacobian-vector and vector-Jacobian products, i.e., the multiplication and summation of factors over either the input or the output dimension. This corresponds to propagating derivatives from the inputs

*forwards*, leading to

*forward-mode AD*, or from the end result

*backwards*, leading to

*reverse-mode AD*. As many popular AD frameworks are primarily implemented to work with neural networks, where scalar loss functions must be differentiated with respect to many parameters, reverse-mode AD, also called “backpropagation,”

^{47}is more generally available. More recent frameworks implement both approaches, for instance jax

^{43}which is used in the present work. AD can also be leveraged to compute contractions of higher-order derivative operators.

^{48}

## III. CONSTRUCTING GRAPH MLIPS

^{49}consisting of

*N*atoms with atomic numbers

*Z*

_{i}placed in a simulation cell which is infinitely periodically tiled in space. We define

_{sc}, B, Z), to a set of atomic potential energies U ≔ {

*U*

_{i}:

*i*= 1, …,

*N*}. Each

*U*

_{i}is a many-body function of positions; in general, we do not assume that a further decomposition into body-ordered contributions is available. The total potential energy is then obtained by summing over atomic contributions: $U=\u2211i=1NUi$. Since AD relies on the forward computation to calculate derivatives, it is sensitive to the exact implementation of this mapping. Care must be taken to construct the MLIP such that required derivatives are available.

This work considers MLIPs that scale linearly with *N*. Therefore, the number of atoms contributing to a given *U*_{i} must be bounded, which is achieved by introducing a cutoff radius *r*_{c}, restricting interactions to finite-sized atomic neighborhoods $N(i)={rj:rij\u2264rc,rj\u2208Rall}$. To ensure translational invariance, MLIPs do not rely on neighbor positions directly, but rather on atom-pair vectors centered on *i*, from which atom-pair vectors between neighboring atoms can be constructed, for instance to determine angles.

The resulting structure can be seen as a graph G. The vertices V of this graph are identified with atoms, labeled with their respective atomic numbers, and connected by edges E that are labeled with atom-pair vectors if placed closer than *r*_{c}. Starting from G, MLIPs can be constructed in different ways: Local MLIPs compute a suitable representation^{50} of each neighborhood, and predict *U*_{i} from that representation using a learned function, such as a neural network or a kernel machine. Such models are conceptually simple, but cannot account for effects that extend beyond *r*_{c}. Recently, semi-local models such as MPNNs^{22,29,51–58} have been introduced to tackle this shortcoming without compromising asymptotic runtime. In such models, effective longer-range interactions are built up iteratively by allowing adjacent neighborhoods to interact repeatedly. We introduce the parameter *M*, the *interaction depth*, to quantify how many such iterations are included. After *M* interactions, the energy at any given site can depend implicitly on positions within *M* hops on the graph, which we denote by N^{M}(*i*), leading to an *effective* cutoff radius $rceff=Mrc$. Atomic potential energy contributions can therefore depend on positions up to a distance of $rceff$. Since interactions are confined to neighborhoods at each iteration, the computational cost of semi-local models nevertheless remains linear in *N* and *M*.

We term this class of potentials, which act on sets of neighborhoods and use atom-pair vectors as input, graph-based machine-learning potentials (GLPs). Models with *M* = 1 are termed local, those with *M* > 1 semi-local. By construction, this framework does not include global interactions: GLPs are still local at thermodynamic scales. Nevertheless, the distinction between local and semi-local models is useful for the purpose of classifying the functional dependence of atomic potential energy contributions, which in turn determines the derivatives available via AD: In a local model, *U*_{i} is a function only of atom-pair vectors within its neighborhood, and derivatives of *U* with respect to a given *r*_{ij} therefore uniquely map to one *U*_{i}. This is not the case for semi-local models and will be discussed further in Sec. IV.

We consider two strategies to construct GLPs: The “standard” way, which includes periodic boundary conditions via the edges in the graph, and an “unfolded” formulation, where periodicity is explicitly included via replica positions. Both concepts are illustrated in Fig. 1.

*in the simulation cell*, using the minimum image convention (MIC) to include periodicity. Edges E

_{ij}exist between two atoms

*i*and

*j*in R

_{sc}if they interact:

^{std}and the set of edges E

^{std}.

Alternatively, we can first determine the total set of positions R_{unf} ⊂ R_{all} that can interact with atoms in the simulation cell, creating an unfolded system extracted from the bulk, consisting of R_{sc} and all replicas with up to $rceff$ distance from the cell boundary. This construction can be performed efficiently, and adds only a number of positions that is proportional to the *surface* of the simulation cell, therefore becoming increasingly negligible as *N* increases at constant density.^{42} However, the number of positions increases cubically with *M*, and this construction is therefore practical only for small *M* (see supplementary material).

We proceed by constructing a correspondingly modified graph G^{unf}, and compute potential energies for vertices corresponding to atoms in the simulation cell only. By construction, since the same atom-pair vectors appear in the graph, this approach then reproduces the potential energy of the standard method. A similar construction appears in spatial decomposition approaches for multi-device parallelization, discussed for instance in Ref. 59: To contain the calculation of interactions in one device, all interacting positions outside a given domain must be considered as “ghost atoms.” The number of ghost atoms corresponds to the number of additional atoms due to the unfolding. As this approach becomes unfeasible for large values of $rceff$, alternative schemes must be considered, for instance propagating message-passing updates, and later gradient information, between devices.

## IV. DERIVATIVES

Having constructed the forward function for a given GLP, we can compute derivatives with respect to its inputs using AD. In this section, we discuss how forces, stress, and heat flux can be computed in this manner, and demonstrate the relationship between different formulations.

### A. Forces

_{sc}are an explicit input, they can be computed directly with AD:

*U*is a scalar, and this therefore is a trivial Jacobian-vector product, which can be computed with the same asymptotic cost as

*U*.

*U*is a function of all edges,

*M*= 1, the local case, this definition reduces to a more standard form

^{37}

*M*> 1, this definition includes a sum over all

*U*

_{k}that are influenced by a given edge

*U*

_{i}purely as a function of atom-pair vector originating from

*i*. However, to construct derivatives with respect to these atom-pair vectors with AD, these extended neighborhoods have to be constructed and included explicitly, therefore negating the computational efficiency gains of a GLP architecture.

### B. Stress

^{60}

*U*denoting the potential energy after a strain transformation with the 3 × 3 symmetric tensor

*ϵ*_{all}.

One approach, followed for instance by schnetpack,^{28,61} schnetpack 2.0,^{30} and nequip,^{54,62} is to inject the strain transformation explicitly into the construction of the GLP. This can be done at different points: One can transform R_{sc} and B *before* constructing G, directly transform atom-pair vectors, or transform all contributing positions R_{unf}. Alternatively, as the inner derivative of inputs with respect to ** ϵ** is simply the input, the derivative of

*U*with respect to inputs can be obtained with AD, and the stress can then be computed analytically from the results. This avoids modifying the forward computation of

*U*entirely.

^{31,33}Louwerse and Baerends

^{31}considered the calculation of the stress for periodic systems, highlighting that contributions from replica positions must be considered, either by computing the elementary definition in Eq. (10) or the inclusion of the second term in Eq. (13). Both options are considered “much effort” in that work. Thompson

*et al.*

^{33}discuss an alternative approach based on explicitly including replica positions, obtaining a result similar to the unfolded construction used for Eq. (15).

All these forms of the stress are equivalent and can be implemented directly with AD, obtaining numerically identical results as seen in Tables I and IV. In all cases, as the scalar *U* is differentiated with respect to its inputs, asymptotic cost remains linear.

Equations . | Single . | Double . | ||
---|---|---|---|---|

MAE (eV) . | MAPE (%) . | MAE (eV) . | MAPE (%) . | |

Fin.diff. | 7.70 × 10^{−4} | 1.04 × 10^{−1} | 1.13 × 10^{−6} | 1.18 × 10^{−4} |

(10) | 1.19 × 10^{−5} | 1.79 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(11) | 8.25 × 10^{−6} | 1.27 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(12) | 9.17 × 10^{−6} | 1.36 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(13) | 1.18 × 10^{−5} | 1.79 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(14) | 8.22 × 10^{−6} | 1.27 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(15) | 9.16 × 10^{−6} | 1.37 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

Equations . | Single . | Double . | ||
---|---|---|---|---|

MAE (eV) . | MAPE (%) . | MAE (eV) . | MAPE (%) . | |

Fin.diff. | 7.70 × 10^{−4} | 1.04 × 10^{−1} | 1.13 × 10^{−6} | 1.18 × 10^{−4} |

(10) | 1.19 × 10^{−5} | 1.79 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(11) | 8.25 × 10^{−6} | 1.27 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(12) | 9.17 × 10^{−6} | 1.36 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(13) | 1.18 × 10^{−5} | 1.79 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(14) | 8.22 × 10^{−6} | 1.27 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

(15) | 9.16 × 10^{−6} | 1.37 × 10^{−3} | 3.15 × 10^{−6} | 3.69 × 10^{−4} |

However, as AD computes algorithmic derivatives, which reflect the particular implementation of the energy function exactly, care must be taken to ensure that the strain transformation in Eq. (9) is applied consistently. For example, obtaining a correct stress with Eqs. (10) and (13), where derivatives are obtained *before* the construction of the input graph, requires that the implementation of the MIC used matches the definition in Sec. III. If the MIC is implemented using fractional coordinates,^{63} this can be challenging. In such cases, it may be preferable to rely on approaches that compute derivatives on the basis of the graph input, such as Eqs. (11) and (14). This method is adopted, for instance, by jax-md.^{27} Many other MLIP packages use offsets to implement the MIC, and can therefore use Eq. (10), for instance schnetpack,^{61}^{,} schnetpack 2.0,^{30}^{,} nequip,^{62} and mace.^{64} The matgl package,^{65} on the other hand, uses Eq. (14), avoiding the introduction of a strain transformation.

*U*

_{i}, or one forward pass for each entry in

**. If only reverse-mode AD is available, its evaluation therefore scales quadratically with**

*ϵ**N*. Linear scaling is retained with forward mode. For GLPs with

*M*= 1, linear scaling in reverse mode can be recovered by using Eq. (14): Every edge can be uniquely assigned to one

*U*

_{i}, and therefore the derivatives can be used to construct atomic stresses. For

*M*> 1, this is not possible; similar to the observations of Sec. IV A, atomic stresses take a semi-local form.

### C. Heat flux

Finally, we discuss the heat flux, which is required to compute thermal conductivities with the Green–Kubo (GK) method.^{67–68} It describes how energy flows between atoms, and has been the focus of a large body of previous work.^{32,35,37,40–42,69}

^{70}for periodic quantum systems. It reads

^{42}

^{,}

*v*_{i}denote velocities,

*m*

_{i}masses, and $Ei=Ui+1/2mivi2$ is the total energy per atom. Intuitively, the “potential” term

*J*_{pot}describes how the total instantaneous change in

*U*

_{i}can be attributed to interactions with other atoms, with energy flowing between them, while the second, “convective,” term

*J*_{conv}describes energy being carried by individual atoms. In the present setting,

*J*_{conv}can be computed directly, as

*E*

_{i}are available.

*J*_{pot}, however, presents a challenge in an AD framework: In principle,

*J*_{pot}could be computed directly, obtaining the required partial derivatives with AD. However, as

*J*_{pot}is neither a Jacobian-vector nor a vector-Jacobian product, this requires repeated evaluations over the input or output dimension. Even when restricting

*j*∈ R

_{sc}, which can be achieved by introducing the MIC for

*r*_{ji}(see supplementary material for details),

*N*, rendering the system sizes and simulation times required for the GK method inaccessible.

^{42}Based on previous work,

^{42}we therefore survey approaches that restore linear scaling in the following.

*M*= 1, edges can be uniquely assigned to atomic energy contributions as discussed for atomic stresses in Sec. IV B. In this case

We note that the terms appearing in front of *v*_{j} also appear in the stress in Eq. (14). However, for a given *j*, the pre-factor *cannot* be identified with the atomic stress as defined in Eq. (16) – the atomic energy being differentiated is not *U*_{j}, but *U*_{i}. The indices can only be exchanged for additive pairwise potentials; this inequivalence was recently corrected in the large-scale atomic/molecular massively parallel simulator (LAMMPS) code.^{40}

This approach is not applicable for *M* > 1, since the relation in Eq. (20) no longer holds, and the mapping between stress contributions and heat flux contributions becomes invalid. Therefore, Eq. (19) must be used, incurring quadratic computational cost.

To restore linear scaling, one must find a way to rewrite Eq. (19) in terms of either Jacobian-vector or vector-Jacobian products by executing the sum over *i* before taking the derivative with respect to *j*. The use of the MIC for the $rjiMIC$ prefactors prevents this: An index-dependent offset appears in $rjiMIC=ri\u2212rj+\u2211anajiba$. Adopting the unfolded construction avoids this issue, as replica positions are now included explicitly. The heat flux can then be split into two terms, which can be rewritten in a form suitable for AD.^{42}

*r*_{i}, but not used to compute

*U*, and defining the energy barycenter $B=\u2211i\u2208RscriauxUi$, the heat flux becomes

_{unf}scales as

*N*

^{2/3}, overall linear scaling is restored, albeit with a pre-factor due to the higher number of positions to be considered.

To summarize, we have introduced two forms of the heat flux that can be implemented efficiently with AD: Eq. (21), which applies to GLPs with *M* = 1, and Eq. (22), which applies for *M* ≥ 1, but introduces some additional overhead. Both are equivalent to the general, quadratically-scaling, form given in Eq. (17), as seen in Tables II and VI.

. | Single . | Double . | ||
---|---|---|---|---|

Equations . | MAE ($eVA\u030a/fs$) . | MAPE (%) . | MAE ($eVA\u030a/fs$) . | MAPE (%) . |

(19) | 2.81 × 10^{−9} | 1.71 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

(21) | 2.84 × 10^{−9} | 1.67 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

(22) | 2.44 × 10^{−9} | 1.54 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

. | Single . | Double . | ||
---|---|---|---|---|

Equations . | MAE ($eVA\u030a/fs$) . | MAPE (%) . | MAE ($eVA\u030a/fs$) . | MAPE (%) . |

(19) | 2.81 × 10^{−9} | 1.71 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

(21) | 2.84 × 10^{−9} | 1.67 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

(22) | 2.44 × 10^{−9} | 1.54 × 10^{−2} | 1.47 × 10^{−10} | 6.81 × 10^{−4} |

## V. EXPERIMENTS

### A. Lennard-Jones argon

The stress formulas in Eqs. (10)–(15) and heat flux formulas Eqs. (19), (21), and (22) have been implemented in the glp package^{44} using jax.^{43} As a first step, we numerically verify this implementation.

^{45}is employed, where analytical derivatives including those required for the heat flux are readily available, and implementations are included in many packages, for example the atomic simulation environment (ASE).

^{71}In the GLP framework, the Lennard-Jones potential can be seen as an extreme case of a

*M*= 1 GLP, where

*U*

_{i}is composed of a sum of pair terms:

For this experiment, parameters approximating elemental argon are used.^{72} 100 randomly displaced and distorted geometries, based on the 512-atom 8 × 8 × 8 supercell of the face-centered cubic primitive cell with lattice parameter $3.72A\u030a$ and angle 60° are used. Random velocities to evaluate a finite heat flux are sampled from the Boltzmann distribution corresponding to 10 K. Additional computational details are discussed in the supplementary material.

Table I compares the stress formulations in Eqs. (10)–(15) with finite differences. We report “best-case” results for finite differences, choosing the stepsize that minimizes the error. In the table, the mean absolute error (MAE) and mean absolute percentage error (MAPE) with respect to the analytical ground truth are reported. All given formulations are found to be equivalent. In single precision arithmetic, the AD-based implementations outperform finite differences, in double precision, errors are similar.

For the heat flux, finite difference approaches are not feasible. Therefore, only AD-based implementations are shown in Table II. In the case of the Lennard-Jones potential, where *M* = 1, Eqs. (19), (21), and (22) are found to be identical.

Having confirmed the numerical correctness of the presented implementations, we then proceed with a benchmark of the computation time of Eqs. (10)–(15). Here, we construct cubic supercells of varying sizes, generating 1000 randomly displaced samples; additional details are given in the supplementary material. The results, computed on one Nvidia V100 graphics processing unit (GPU), are given in Table III: Overall, computation times per atom are similar for AD-based approaches. Computation time per atom decreases with system size, as calculations are parallelized on the GPU. In this case, stress formulations based on the unfolded construction are on par with other approaches, indicating that in this setting, computational cost is dominated by graph construction, rather than the evaluation of the potential itself. For more complex potentials, as seen in Sec. VI, this is not the case, and the introduction of additional positions yields significant computational cost. The best performance in single precision is obtained for Eq. (14), which avoids the use of a strain transformation, and re-uses gradients computed for the forces. For double precision, Eq. (15) is optimal, which avoids the use of the MIC or a strain transformation, and also re-uses gradients.

. | Single . | Double . | ||||
---|---|---|---|---|---|---|

. | Time per atom (s) . | Time per atom (s) . | ||||

Equations . | N = 864
. | N = 2048
. | N = 4000
. | N = 864
. | N = 2048
. | N = 4000
. |

F.d. | 3.1 × 10^{−5} | 1.6 × 10^{−5} | 9.9 × 10^{−6} | 3.8 × 10^{−5} | 2.1 × 10^{−5} | 1.4 × 10^{−5} |

(10) | 1.0 × 10^{−6} | 7.1 × 10^{−7} | 6.0 × 10^{−7} | 1.8 × 10^{−6} | 1.3 × 10^{−6} | 1.1 × 10^{−6} |

(11) | 7.7 × 10^{−7} | 5.1 × 10^{−7} | 4.5 × 10^{−7} | 1.3 × 10^{−6} | 9.8 × 10^{−7} | 8.0 × 10^{−7} |

(12) | 9.2 × 10^{−7} | 5.4 × 10^{−7} | 3.7 × 10^{−7} | 1.1 × 10^{−6} | 6.7 × 10^{−7} | 4.9 × 10^{−7} |

(13) | 9.4 × 10^{−7} | 6.7 × 10^{−7} | 5.8 × 10^{−7} | 1.6 × 10^{−6} | 1.3 × 10^{−6} | 1.0 × 10^{−6} |

(14) | 7.2 × 10^{−7} | 4.5 × 10^{−7} | 3.6 × 10^{−7} | 1.2 × 10^{−6} | 9.0 × 10^{−7} | 7.1 × 10^{−7} |

(15) | 8.9 × 10^{−7} | 5.3 × 10^{−7} | 3.7 × 10^{−7} | 1.0 × 10^{−6} | 6.5 × 10^{−7} | 4.6 × 10^{−7} |

. | Single . | Double . | ||||
---|---|---|---|---|---|---|

. | Time per atom (s) . | Time per atom (s) . | ||||

Equations . | N = 864
. | N = 2048
. | N = 4000
. | N = 864
. | N = 2048
. | N = 4000
. |

F.d. | 3.1 × 10^{−5} | 1.6 × 10^{−5} | 9.9 × 10^{−6} | 3.8 × 10^{−5} | 2.1 × 10^{−5} | 1.4 × 10^{−5} |

(10) | 1.0 × 10^{−6} | 7.1 × 10^{−7} | 6.0 × 10^{−7} | 1.8 × 10^{−6} | 1.3 × 10^{−6} | 1.1 × 10^{−6} |

(11) | 7.7 × 10^{−7} | 5.1 × 10^{−7} | 4.5 × 10^{−7} | 1.3 × 10^{−6} | 9.8 × 10^{−7} | 8.0 × 10^{−7} |

(12) | 9.2 × 10^{−7} | 5.4 × 10^{−7} | 3.7 × 10^{−7} | 1.1 × 10^{−6} | 6.7 × 10^{−7} | 4.9 × 10^{−7} |

(13) | 9.4 × 10^{−7} | 6.7 × 10^{−7} | 5.8 × 10^{−7} | 1.6 × 10^{−6} | 1.3 × 10^{−6} | 1.0 × 10^{−6} |

(14) | 7.2 × 10^{−7} | 4.5 × 10^{−7} | 3.6 × 10^{−7} | 1.2 × 10^{−6} | 9.0 × 10^{−7} | 7.1 × 10^{−7} |

(15) | 8.9 × 10^{−7} | 5.3 × 10^{−7} | 3.7 × 10^{−7} | 1.0 × 10^{−6} | 6.5 × 10^{−7} | 4.6 × 10^{−7} |

## VI. TIN SELENIDE WITH SO3KRATES

To investigate stress and heat flux in a practical setting, we now study tin selenide (SnSe) using the state-of-the-art so3krates GLP.^{46} In contrast to other equivariant MLIPs, for instance nequip,^{54} so3krates replaces shared equivariant feature representations by separated branches for invariant and equivariant information, whose information exchange is handled using an equivariant self-attention mechanism. By doing so, one can achieve data efficiency and extrapolation quality competitive to state-of-the-art GLPs at reduced time and memory complexity. As non-local interactions are not modeled in the GLP framework introduced in this work, global interactions are disabled in the so3krates models used at present.

The training data for this task is obtained from a large-scale *ab initio* Green–Kubo (aiGK) benchmark study by Knoop *et al.*,^{73,74} and consists of four aiMD trajectories in the *NVT* ensemble at 300 K and different volumes, from which a total of 3000 timesteps are used. These simulations were performed as the first steps of the aiGK workflow, first determining the equilibrium volume at 300 K and then generating thermalized starting geometries for the long-running *NVE* simulations required for the GK method. For this experiment, we repurpose these thermalization trajectories as training data, aiming to replace the subsequent, computationally expensive, aiMD with MLIP-accelerated MD. Additional details on the MLIP training can be found in the supplementary material.

### A. Implementation of stress and heat flux

While no analytical derivatives are available for so3krates, the implementation of the stress can be verified with finite differences, and the heat flux can be checked for consistency between different implementations. Similar to Sec. V A, we use 100 randomly displaced and distorted 4 × 8 × 8 supercells of the 0 K primitive cell of SnSe for this experiment, sampling velocities from the Boltzmann distribution at 10 K to evaluate a finite heat flux.

Table IV compares the stress implementations in Eqs. (10)–(15) with finite differences, confirming the equivalence of all implemented formulations. Table V shows the computation times for the stress, for so3krates with *M* = 2; additional tables for *M* = 1, 3 can be found in the supplementary material. In contrast to the results for the Lennard-Jones potential, all AD-based approaches using the standard construction display similar performance, while the unfolded formulations, Eqs. (12) and (15), are significantly slower to compute, with the additional computational cost proportional to the number of unfolded positions. We conclude that in this setting, using a state-of-the-art MLIP, the computational cost of evaluating the potential dominates implementation differences in the stress, and all standard methods can be used.

Equations . | Single . | Double . | ||
---|---|---|---|---|

MAE (eV) . | MAPE (%) . | MAE (eV) . | MAPE (%) . | |

(10) | 1.58 × 10^{−2} | 4.40 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(11) | 1.58 × 10^{−2} | 4.38 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(12) | 1.57 × 10^{−2} | 4.33 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(13) | 1.58 × 10^{−2} | 4.40 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(14) | 1.58 × 10^{−2} | 4.38 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(15) | 1.57 × 10^{−2} | 4.33 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

Equations . | Single . | Double . | ||
---|---|---|---|---|

MAE (eV) . | MAPE (%) . | MAE (eV) . | MAPE (%) . | |

(10) | 1.58 × 10^{−2} | 4.40 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(11) | 1.58 × 10^{−2} | 4.38 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(12) | 1.57 × 10^{−2} | 4.33 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(13) | 1.58 × 10^{−2} | 4.40 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(14) | 1.58 × 10^{−2} | 4.38 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

(15) | 1.57 × 10^{−2} | 4.33 × 10^{−2} | 1.45 × 10^{−4} | 2.32 × 10^{−4} |

. | Single . | Double . | ||||
---|---|---|---|---|---|---|

Time per atom (s) . | Time per atom (s) . | |||||

Equations . | N = 864
. | N = 2048
. | N = 4000
. | N = 864
. | N = 2048
. | N = 4000
. |

F.d. | 1.2 × 10^{−4} | 9.5 × 10^{−5} | 8.8 × 10^{−5} | 2.2 × 10^{−4} | 1.9 × 10^{−4} | 1.8 × 10^{−4} |

(10) | 8.2 × 10^{−6} | 7.1 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(11) | 8.0 × 10^{−6} | 7.0 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(12) | 3.5 × 10^{−5} | 2.5 × 10^{−5} | 2.0 × 10^{−5} | 8.4 × 10^{−5} | 5.7 × 10^{−5} | 4.6 × 10^{−5} |

(13) | 8.0 × 10^{−6} | 7.1 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(14) | 8.0 × 10^{−6} | 7.0 × 10^{−6} | 6.8 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(15) | 3.5 × 10^{−5} | 2.5 × 10^{−5} | 2.0 × 10^{−5} | 8.4 × 10^{−5} | 5.7 × 10^{−5} | 4.6 × 10^{−5} |

. | Single . | Double . | ||||
---|---|---|---|---|---|---|

Time per atom (s) . | Time per atom (s) . | |||||

Equations . | N = 864
. | N = 2048
. | N = 4000
. | N = 864
. | N = 2048
. | N = 4000
. |

F.d. | 1.2 × 10^{−4} | 9.5 × 10^{−5} | 8.8 × 10^{−5} | 2.2 × 10^{−4} | 1.9 × 10^{−4} | 1.8 × 10^{−4} |

(10) | 8.2 × 10^{−6} | 7.1 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(11) | 8.0 × 10^{−6} | 7.0 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(12) | 3.5 × 10^{−5} | 2.5 × 10^{−5} | 2.0 × 10^{−5} | 8.4 × 10^{−5} | 5.7 × 10^{−5} | 4.6 × 10^{−5} |

(13) | 8.0 × 10^{−6} | 7.1 × 10^{−6} | 6.9 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(14) | 8.0 × 10^{−6} | 7.0 × 10^{−6} | 6.8 × 10^{−6} | 1.6 × 10^{−5} | 1.5 × 10^{−5} | 1.5 × 10^{−5} |

(15) | 3.5 × 10^{−5} | 2.5 × 10^{−5} | 2.0 × 10^{−5} | 8.4 × 10^{−5} | 5.7 × 10^{−5} | 4.6 × 10^{−5} |

Table VI compares the heat flux formulations in Eqs. (21) and (22) with the baseline in Eq. (19), implementing the quadratically-scaling “Hardy” heat flux with the MIC. For *M* = 1, all formulations are precisely equivalent. For *M* > 1, the semi-local case, only Eq. (22) is equivalent to the “Hardy” heat flux; Eq. (21) does not apply and consequently is not equivalent, displaying large deviations.

. | . | Single . | Double . | ||
---|---|---|---|---|---|

Equations . | M
. | MAE ($eVA\u030a/fs$) . | MAPE (%) . | MAE ($eVA\u030a/fs$) . | MAPE (%) . |

(21) | 1 | 5.78 × 10^{−9} | 9.74 × 10^{−4} | 1.09 × 10^{−17} | 1.73 × 10^{−12} |

(22) | 1 | 8.75 × 10^{−8} | 2.65 × 10^{−2} | 1.69 × 10^{−16} | 4.31 × 10^{−11} |

(21) | 2 | 3.73 × 10^{−3} | 5.84 × 10^{2} | 3.73 × 10^{−3} | 5.84 × 10^{2} |

(22) | 2 | 9.36 × 10^{−8} | 1.00 × 10^{−2} | 1.54 × 10^{−16} | 1.60 × 10^{−11} |

(21) | 3 | 1.01 × 10^{−3} | 3.27 × 10^{2} | 1.01 × 10^{−3} | 3.25 × 10^{2} |

(22) | 3 | 9.74 × 10^{−8} | 3.04 × 10^{−2} | 1.65 × 10^{−16} | 2.91 × 10^{−11} |

. | . | Single . | Double . | ||
---|---|---|---|---|---|

Equations . | M
. | MAE ($eVA\u030a/fs$) . | MAPE (%) . | MAE ($eVA\u030a/fs$) . | MAPE (%) . |

(21) | 1 | 5.78 × 10^{−9} | 9.74 × 10^{−4} | 1.09 × 10^{−17} | 1.73 × 10^{−12} |

(22) | 1 | 8.75 × 10^{−8} | 2.65 × 10^{−2} | 1.69 × 10^{−16} | 4.31 × 10^{−11} |

(21) | 2 | 3.73 × 10^{−3} | 5.84 × 10^{2} | 3.73 × 10^{−3} | 5.84 × 10^{2} |

(22) | 2 | 9.36 × 10^{−8} | 1.00 × 10^{−2} | 1.54 × 10^{−16} | 1.60 × 10^{−11} |

(21) | 3 | 1.01 × 10^{−3} | 3.27 × 10^{2} | 1.01 × 10^{−3} | 3.25 × 10^{2} |

(22) | 3 | 9.74 × 10^{−8} | 3.04 × 10^{−2} | 1.65 × 10^{−16} | 2.91 × 10^{−11} |

### B. Equation of state and pressure

To assess the capability of so3krates to predict stress- and pressure-related materials properties, we calculate energy-volume and pressure-volume curves for SnSe to obtain an equation of state (EOS) of the Vinet form.^{75,76} The experiment is performed for unit cells that were homogeneously strained up to ±2% starting from the fully relaxed geometry, and relaxing the internal degrees of freedom afterwards. The energy vs volume curves for so3krates with *M* = 1, 2, 3 interaction steps are shown in Fig. 2 in comparison to the DFT reference using the PBEsol exchange-correlation functional with “light” default basis sets in FHI-aims.^{77,78} so3krates with three interaction steps (*M* = 3) yields the best visual agreement for the energy-volume curve in Fig. 2, and the best equilibrium volume. To quantify the agreement, we evaluate the Vinet EOS and extract the cohesive properties equilibrium volume *V*_{0}, isothermal Bulk modulus *B*_{0}, and its volume derivative $B0\u2032$ (functional forms are given in the supplementary material). Results are listed in Table VII. For so3krates with three interaction steps (*M* = 3), the predicted volume deviates by −16% from the DFT reference, and the deviation of the bulk modulus is −3.14%. A larger error is seen for the volume derivative of the bulk modulus, $B0\u2032$, which deviates by 27.7%, indicating worse agreement further away from the training region. Overall, the agreement between DFT and so3krates when predicting cohesive properties can be considered satisfactory. The energy-volume predictions are very good, and transfer even to volumes that are larger or smaller then the ones seen during training.

. | DFT . | M = 1
. | M = 2
. | M = 3
. |
---|---|---|---|---|

B_{0} ($eV/A\u030a3$) | 0.230 | 0.236 | 0.229 | 0.223 |

$B0\u2032$ ($eV/A\u030a6$) | 5.576 | 2.830 | 4.815 | 7.118 |

V_{0} ($A\u030a3$) | 26.429 | 26.489 | 26.322 | 26.388 |

Error B_{0} (%) | ⋯ | 2.58 | −0.47 | −3.14 |

Error $B0\u2032$ (%) | ⋯ | −49.25 | −13.64 | 27.66 |

Error V_{0} (%) | ⋯ | 0.22 | −0.40 | −0.16 |

. | DFT . | M = 1
. | M = 2
. | M = 3
. |
---|---|---|---|---|

B_{0} ($eV/A\u030a3$) | 0.230 | 0.236 | 0.229 | 0.223 |

$B0\u2032$ ($eV/A\u030a6$) | 5.576 | 2.830 | 4.815 | 7.118 |

V_{0} ($A\u030a3$) | 26.429 | 26.489 | 26.322 | 26.388 |

Error B_{0} (%) | ⋯ | 2.58 | −0.47 | −3.14 |

Error $B0\u2032$ (%) | ⋯ | −49.25 | −13.64 | 27.66 |

Error V_{0} (%) | ⋯ | 0.22 | −0.40 | −0.16 |

Finally we check the internal consistency of energy and stress predictions with so3krates by fitting energy-volume and pressure-volume curves and verify that they yield identical parameters for the EOS. Results are shown in Fig. 3. As seen there, results are in perfect agreement, which verifies that stress and resulting pressure are consistent with the underlying energy function as expressed in Eq. (8).

### C. Thermal conductivity

Finally, we proceed to GK calculations, following the approach outlined in our previous work.^{42,83} Finding that simulation cells with *N* = 864 atoms and a simulation duration of *t*_{0} = 1 ns yield converged results (see supplementary material), we run 11 MD simulations using the glp package and so3krates, computing ** J** at every step.

Table VIII shows the result: For *M* = 2, 3, so3krates in excellent agreement with the work by Brorsson *et al.*,^{80} which uses a force constant potential (FCP). A larger difference is observed with the work by Liu *et al.*,^{81} who, however, use the Perdew–Burke–Ernzerhof (PBE) exchange-correlation functional, as opposed to PBEsol, which was used for the present work. The observed thermal conductivity is consistent with experiments, as well as the size-extrapolated DFT result of Knoop *et al.*^{73} The anisotropy of ** κ** in SnSe is captured as well.

Source . | Method . | κ (W/mK)
. | κ_{x}
. | κ_{y}
. | κ_{z}
. |
---|---|---|---|---|---|

This work | so3krates, M = 1 | 0.99 ± 0.10 | 0.53 ± 0.03 | 1.31 ± 0.13 | 1.12 ± 0.12 |

so3krates, M = 2 | 1.13 ± 0.07 | 0.48 ± 0.04 | 1.59 ± 0.07 | 1.20 ± 0.07 | |

so3krates, M = 3 | 1.13 ± 0.10 | 0.56 ± 0.05 | 1.56 ± 0.15 | 1.32 ± 0.16 | |

Brorsson et al.^{80} | FCP | 1.12 | 0.57 | 1.46 | 1.32 |

Liu et al.^{81} | MLIP | 0.86 ± 0.13 | 0.57 ± 0.05 | 1.25 ± 0.24 | 0.76 ± 0.08 |

Knoop et al.^{73} | DFT (extrapolated) | 1.40 ± 0.39 | ⋯ | ⋯ | ⋯ |

Review by Wei et al.^{82} | Experiments | 0.45–1.9 | ⋯ | ⋯ | ⋯ |

Source . | Method . | κ (W/mK)
. | κ_{x}
. | κ_{y}
. | κ_{z}
. |
---|---|---|---|---|---|

This work | so3krates, M = 1 | 0.99 ± 0.10 | 0.53 ± 0.03 | 1.31 ± 0.13 | 1.12 ± 0.12 |

so3krates, M = 2 | 1.13 ± 0.07 | 0.48 ± 0.04 | 1.59 ± 0.07 | 1.20 ± 0.07 | |

so3krates, M = 3 | 1.13 ± 0.10 | 0.56 ± 0.05 | 1.56 ± 0.15 | 1.32 ± 0.16 | |

Brorsson et al.^{80} | FCP | 1.12 | 0.57 | 1.46 | 1.32 |

Liu et al.^{81} | MLIP | 0.86 ± 0.13 | 0.57 ± 0.05 | 1.25 ± 0.24 | 0.76 ± 0.08 |

Knoop et al.^{73} | DFT (extrapolated) | 1.40 ± 0.39 | ⋯ | ⋯ | ⋯ |

Review by Wei et al.^{82} | Experiments | 0.45–1.9 | ⋯ | ⋯ | ⋯ |

Overall, we find that so3krates with more than one interaction step is able to predict the converged thermal conductivity of SnSe, using training data *only* from the thermalization step of a full aiGK workflow. With this approach, long-running equilibrium aiMD simulations, the bottleneck of the GK method, have been avoided.

## VII. CONCLUSION

We explained how and demonstrated that the stress and heat flux can be computed efficiently with AD for potentials based on a graph of atom-pair vectors, which we termed GLPs, and provided example implementations in the glp package.^{44} In order to provide this functionality, glp also implements the necessary infrastructure to generate input graphs from structures, as well as the ability to compute forces. With this, it can be used as a lightweight driver for MD. An example implementation of the velocity Verlet integrator^{84} is therefore also provided. This functionality is extended in the gkx package,^{85} which implements the GK method using GLP.

Numerical experiments for Lennard-Jones argon and tin selenide with the so3krates GLP, verified that these quantities are computed correctly, efficiently, and consistently. The equivariant so3krates GLP was shown to predict cohesive properties and thermal conductivity of SnSe in good agreement with DFT, other MLIPs, and experiments, confirming the practical relevance of computational access to stress and heat flux.

This work enables the use of a large class of recently developed MLIPs, those which can be described in the GLP framework, in computational materials science, and in particular for the calculation of thermal conductivities using the GK method. For GLPs implemented with jax, the glp package is provided to enable the calculation of stress and heat flux without requiring further implementation efforts.

## SUPPLEMENTARY MATERIAL

The supplementary material contains (1) an explanation of Eq. (19), (2) additional details and results for the numerical experiments in Sec. V, (3) information about the GK workflow and convergence, and (4) explicit forms for the equation of state.

## ACKNOWLEDGMENTS

M.F.L. was supported by the German Ministry for Education and Research BIFOLD program (Ref. Nos. 01IS18025A and 01IS18037A), and by the TEC1p Project (ERC Horizon 2020 Grant No. 740233). M.F.L. would like to thank Samuel Schoenholz and Niklas Schmitz for constructive discussions, Shuo Zhao for feedback on the manuscript, and acknowledges contributions by Fabian Nagel and Adam Norris. J.T.F. acknowledges support from the Federal Ministry of Education and Research (BMBF) and BIFOLD program (Ref. Nos. 01IS18025A and 01IS18037A). F.K. acknowledges support from the Swedish Research Council (VR) Program No. 2020-04630, and the Swedish e-Science Research Centre (SeRC). The computations were partially enabled by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre (NSC) and by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at NSC partially funded by the Swedish Research Council through Grant Agreement No. 2022-06725. Xuan Gu at NSC is acknowledged for technical assistance on the Berzelius resource.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Marcel F. Langer**: Conceptualization (lead); Investigation (equal); Project administration (lead); Software (lead); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). **J. Thorben Frank**: Investigation (equal); Software (supporting); Writing – original draft (supporting); Writing – review & editing (supporting). **Florian Knoop**: Investigation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (supporting).

## DATA AVAILABILITY

The data and code that support the findings of this study are available at https://doi.org/10.5281/zenodo.8406532 or https://github.com/sirmarcel/glp-archive. The glp package is available at https://github.com/sirmarcel/glp, the so3krates model is implemented in mlff at https://github.com/thorben-frank/mlff. Further information can also be found at https://marcel.science/glp.

## REFERENCES

*Statistical Mechanics: Theory and Molecular Simulation*

*Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6-11 August 2017*

*Evaluating Derivatives*

*Ab initio*Green-Kubo approach for the thermal conductivity of solids

Non-periodic systems can be formally accommodated by setting B such that replicas lie outside the interaction cutoff radius. The glp framework supports non-periodic systems.

*Ab initio*study of high-pressure behavior of a low compressibility metal and a hard material: Osmium and diamond

*Ab initio*molecular simulations with numeric atom-centered orbitals

*Ab initio*Green-Kubo simulations of heat transport in solids: Method and implementation