Stress and heat flux via automatic differentiation

Machine-learning potentials provide computationally efficient and accurate approximations of the Born-Oppenheimer potential energy surface. This potential determines many materials properties and simulation techniques usually require its gradients, in particular forces and stress for molecular dynamics, and heat flux for thermal transport properties. Recently developed potentials feature high body order and can include equivariant semi-local interactions through message-passing mechanisms. Due to their complex functional forms, they rely on automatic differentiation (AD), overcoming the need for manual implementations or finite-difference schemes to evaluate gradients. This study demonstrates a unified AD approach to obtain forces, stress, and heat flux for such potentials, and provides a model-independent implementation. The method is tested on the Lennard-Jones potential, and then applied to predict cohesive properties and thermal conductivity of tin selenide using an equivariant message-passing neural network potential.


I. INTRODUCTION
Molecular dynamics (MD) simulations enable computational prediction of thermodynamic quantities for a wide range of quantum systems, and constitute a cornerstone of modern computational science 1. In MD, systems are simulated by propagating Newton's equations of motion, potentially modified to model statistical ensembles, numerically in time, based on the forces acting on each atom due to their movement on the Born-Oppenheimer potential-energy surface (PES). Therefore, the quality of the underlying PES is important for the predictive ability of this method. First-principles electronic structure methods such as density-functional theory (DFT) can be used to perform high-accuracy MD simulations 2, provided the exchange-correlation approximation is reliable 3. Such approaches are, however, restricted by high computational cost, severely limiting accessible size and time scales. Computationally efficient approximations to the underlying PES are therefore required for the atomistic simulation of larger systems: Force fields (FFs) are built on analytical functional forms that are often based on physical bonding principles, parametrized to reproduce quantities of interest for a given material 4. They are computationally cheap, but parametrizations for novel materials are not always available, and their flexibility is limited by their fixed functional form. Machine-learning potentials (MLPs) [5][6][7][8][9][10][11], where a potential is inferred from a small set of reference calculations, aim to retain near first-principles accuracy while remaining linear scaling with the number of atoms. While MLPs are limited, in principle, to modeling the physical mechanisms present in the training data, they have nevertheless emerged as an important tool for MD [12][13][14][15][16], often combined with active learning schemes [17][18][19][20][21]. Modern MLPs can include semi-local interactions through message-passing (MP) mechanisms 22, internal equivariant representations 23, and
body-order expansions 24, which enable the efficient construction of flexible many-body interactions. In such complex architectures, the manual implementation of derivatives is often unfeasible. Finite-difference approaches require tuning of additional parameters, as well as repeated energy evaluations. Automatic differentiation (AD) 25,26 presents an intriguing alternative: If the computation of the potential energy U is implemented in a suitable framework, derivatives such as the forces F or the stress σ can be computed with the same asymptotic computational cost as computing the energy. This is accomplished by decomposing the underlying 'forward' computation into elementary operations with analytically tractable derivatives, computing local gradients for a given input, and then combining the resulting values using the chain rule.
This work provides a systematic discussion of the use of AD to compute forces, stress, and heat flux for MLPs. While the calculation of forces with AD is common practice [27][28][29][30], stress and heat flux are not yet commonly available for many MLPs. Both quantities have been the focus of much previous work due to the difficulty of defining and implementing them for many-body potentials and periodic boundary conditions [31][32][33][34][35][36][37][38][39][40][41][42]. Introducing an abstract definition of MLPs as functions of a graph representation of atomistic systems, unified formulations of stress and heat flux are given, which can be implemented generally for any such graph-based machine-learning potential (GLP). An example implementation using jax 43 is provided in the glp package 44. To validate the approach, different formulations of stress and heat flux are compared for the Lennard-Jones potential 45, where analytical derivatives are readily available for comparison, as well as for a state-of-the-art message-passing neural network (MPNN), SO3KRATES 46. Having established the correctness of the proposed implementation, the ability of SO3KRATES to reproduce first-principles cohesive properties and thermal conductivity of tin selenide (SnSe) is studied.

II. AUTOMATIC DIFFERENTIATION
Automatic differentiation (AD) is a technique to obtain derivatives of functions implemented as computer programs 25,26. It is distinct from numerical differentiation, where finite-difference schemes are employed, and from symbolic differentiation, where analytical derivatives are obtained manually or via computer algebra systems and then implemented explicitly. Instead, AD relies on the observation that complex computations can often be split up into elementary steps for which derivatives are readily implemented. If one can track those derivatives during the computation of the forward, or 'primal', function, the chain rule allows one to obtain derivatives of the overall computation.
For this work, two properties of AD are particularly relevant: It allows the computation of derivatives with respect to quantities that are explicitly used in the forward computation, and it can do so at the same asymptotic computational cost as the forward function. In particular, AD can obtain two quantities efficiently: Given a differentiable function u : R^N → R^M, the Jacobian of u is defined as the M × N matrix ∂u_i(x)/∂x_j. AD can then obtain Jacobian-vector and vector-Jacobian products, i.e., the multiplication and summation of this matrix with factors over either the input or the output dimension. This corresponds to propagating derivatives from the inputs forwards, leading to forward-mode AD, or from the end result backwards, leading to reverse-mode AD. As many popular AD frameworks are primarily implemented to work with neural networks, where scalar loss functions must be differentiated with respect to many parameters, reverse-mode AD, also called 'backpropagation' 47, is more generally available. More recent frameworks implement both approaches, for instance jax 43, which is used in the present work. AD can also be leveraged to compute contractions of higher-order derivative operators 48.
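The two product modes can be sketched in jax with a toy function standing in for any forward computation (the function and the names are purely illustrative):

```python
import jax
import jax.numpy as jnp

# Toy differentiable map u: R^3 -> R^2, standing in for any forward computation.
def u(x):
    return jnp.array([jnp.sum(x ** 2), jnp.prod(x)])

x = jnp.array([1.0, 2.0, 3.0])
v_in = jnp.array([1.0, 0.0, 0.0])   # tangent vector over the input dimension
v_out = jnp.array([1.0, 0.0])       # cotangent vector over the output dimension

# Forward mode: Jacobian-vector product, evaluated alongside the primal.
primal, jvp_val = jax.jvp(u, (x,), (v_in,))

# Reverse mode ('backpropagation'): vector-Jacobian product.
primal, vjp_fn = jax.vjp(u, x)
(vjp_val,) = vjp_fn(v_out)

# Both contract the same 2 x 3 Jacobian du_i/dx_j, from opposite sides.
jacobian = jax.jacobian(u)(x)
assert jnp.allclose(jvp_val, jacobian @ v_in)
assert jnp.allclose(vjp_val, v_out @ jacobian)
```

Either product costs a constant multiple of the primal evaluation, whereas materializing the full Jacobian does not.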

III. CONSTRUCTING GRAPH MLPS
This work considers periodic systems 49, consisting of N atoms with atomic numbers Z_i placed in a simulation cell which is infinitely periodically tiled in space. We define basis, or lattice, vectors b_1, b_2, b_3, collected in B, such that the positions R_sc = {r_1, ..., r_N} in the simulation cell generate the set of all positions R_all = {r_i + n_1 b_1 + n_2 b_2 + n_3 b_3 : r_i ∈ R_sc, n ∈ Z^3}. In this setting, an MLP is a function that maps the structure represented by its positions, lattice vectors, and atomic numbers, (R_sc, B, Z), to a set of atomic potential energies U := {U_i : i = 1 ... N}, which yield the total potential energy U = ∑_{i=1}^N U_i. Since AD relies on the forward computation to calculate derivatives, it is sensitive to the exact implementation of this mapping. Care must therefore be taken to construct the MLP such that required derivatives are available. This work considers MLPs that scale linearly with N. Therefore, the number of atoms contributing to a given U_i must be bounded, which is achieved by introducing a cutoff radius r_c, restricting interactions to finite-sized atomic neighborhoods N(i) = {r_j : |r_ij| ≤ r_c, r_j ∈ R_all}. To ensure translational invariance, MLPs do not rely on neighbor positions directly, but rather on atom-pair vectors r_ij = r_j − r_i centered on i, from which atom-pair vectors between neighboring atoms can be constructed, for instance to determine angles.
The resulting structure can be seen as a graph G. The vertices V of this graph are identified with atoms, labeled with their respective atomic numbers, and connected by edges E that are labeled with atom-pair vectors if the atoms are placed closer than r_c. Starting from G, MLPs can be constructed in different ways: Local MLPs compute a suitable representation 50 of each neighborhood, and predict U_i from that representation using a learned function, such as a neural network or a kernel machine. Such models are conceptually simple, but cannot account for effects that extend beyond r_c. Recently, semi-local models such as MPNNs 22,29,[51][52][53][54][55][56][57][58] have been introduced to tackle this shortcoming without compromising asymptotic runtime. In such models, effective longer-range interactions are built up iteratively by allowing adjacent neighborhoods to interact repeatedly. We introduce the parameter M, the interaction depth, to quantify how many such iterations are included. After M interactions, the energy at any given site can depend implicitly on positions within M hops on the graph, which we denote by N^M(i), leading to an effective cutoff radius r_c^eff = M r_c. However, since interactions are confined to neighborhoods at each iteration, the asymptotic linear scaling is not impacted. Local MLPs are formally included as semi-local models with M=1, allowing a unified treatment of both. We term this class of potentials, which act on sets of neighborhoods and use atom-pair vectors as input, graph-based machine-learning potentials (GLPs). By construction, this framework does not include global interactions.
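The graph construction described above can be sketched as follows (a minimal, non-periodic version with a dense distance matrix and illustrative names; a production implementation would use cell lists and the minimum image convention):

```python
import jax.numpy as jnp

# Vertices are atoms; edges carry atom-pair vectors for pairs closer than r_c.
def build_graph(positions, cutoff):
    diffs = positions[None, :, :] - positions[:, None, :]  # diffs[i, j] = r_j - r_i
    dist = jnp.linalg.norm(diffs, axis=-1)
    senders, receivers = jnp.nonzero((dist < cutoff) & (dist > 0.0))
    edge_vectors = diffs[senders, receivers]  # atom-pair vectors labeling the edges
    return senders, receivers, edge_vectors

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [5.0, 0.0, 0.0]])
senders, receivers, vecs = build_graph(positions, cutoff=2.0)
# Atoms 0 and 1 are connected in both directions; atom 2 has no neighbors.
```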
We consider two strategies to construct GLPs: The 'standard' way, which includes periodic boundary conditions via the edges in the graph, and an 'unfolded' formulation, where periodicity is explicitly included via replica positions.
In the standard architecture, vertices in G are identified with atoms in the simulation cell, using the minimum image convention (MIC) to include periodicity. Edges E_ij exist between two atoms i and j in R_sc if they interact. We denote the graph constructed in this manner as G_std and the set of edges E_std. Alternatively, we can first determine the total set of positions R_unf ⊂ R_all that can interact with atoms in the simulation cell, creating an 'unfolded' system extracted from the bulk, consisting of R_sc and all replicas up to a distance r_c^eff from the cell boundary. This construction can be performed efficiently, and adds only a number of positions that is proportional to the surface of the simulation cell, which therefore becomes increasingly negligible as N increases at constant density 42. We proceed by constructing a correspondingly modified graph G_unf, and compute potential energies only for vertices corresponding to atoms in the simulation cell. Since the same atom-pair vectors appear in the graph, this approach reproduces the potential energy of the standard method by construction.

IV. DERIVATIVES
Having constructed the forward function for a given GLP, we can compute derivatives with respect to its inputs using AD.In this section, we discuss how forces, stress, and heat flux can be computed in this manner, and demonstrate the relationship between different formulations.

A. Forces
For MD, the most relevant quantities are the forces F_i = −∂U/∂r_i acting on the atoms in the simulation cell. Since R_sc are an explicit input, they can be computed directly with AD: U is a scalar, so this is a trivial vector-Jacobian product, which can be computed with the same asymptotic cost as U itself.
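A minimal sketch with jax, using a toy pair energy as a stand-in for an actual GLP forward pass (names and the potential are illustrative):

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for a GLP: a smooth, translationally invariant pair energy.
def potential(positions):
    diffs = positions[:, None, :] - positions[None, :, :]
    r2 = jnp.sum(diffs ** 2, axis=-1) + jnp.eye(len(positions))  # shift diagonal off zero
    return jnp.sum(jnp.triu(1.0 / r2, k=1))

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.5, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])

# Forces as the negative gradient of the scalar energy: one reverse-mode pass,
# with the same asymptotic cost as evaluating U itself.
forces = -jax.grad(potential)(positions)
```

Since the toy potential depends only on pair differences, the resulting forces sum to zero, consistent with momentum conservation.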
An interesting situation arises if pairwise forces are desired. Strictly speaking, in a many-body MLP, where interactions cannot be decomposed into pairwise contributions, such quantities are not well-defined, and Newton's third law is replaced by conservation of momentum, which requires ∑_{i=1}^N F_i = 0. Nevertheless, pairwise forces with an antisymmetric structure can be defined by exploiting the construction of GLPs in terms of atom-pair vectors. In the standard formulation, U is a function of all edges, U = U({r_ij : (i,j) ∈ E_std}). Hence, by the chain rule,

F_i = −∂U/∂r_i = ∑_j ( ∂U/∂r_ij − ∂U/∂r_ji ) =: ∑_j F_ij .

The pairwise forces so defined exhibit anti-symmetry, F_ij = −F_ji, and therefore fulfil Newton's third law. For M=1, the local case, this definition reduces to a more standard form 37, F_ij = ∂U_i/∂r_ij − ∂U_j/∂r_ji, as each edge contributes only to the atomic energies of its endpoints. However, for general GLPs with M>1, this definition includes a sum over all U_k that are influenced by a given edge, subverting expectations connecting local potential energies to pairwise forces. We note that this seeming contradiction is a consequence of the combination of the peculiar construction of GLPs and AD: In principle, it is always possible to define extended neighborhoods up to r_c^eff, obtaining U_i purely as a function of atom-pair vectors originating from i. However, to construct derivatives with respect to these atom-pair vectors with AD, these extended neighborhoods have to be constructed and included explicitly, negating the computational efficiency gains of a GLP architecture.

B. Stress
The definition of the (potential) stress is 59

σ = (1/V) ∂U(ε)/∂ε |_{ε=0} ,

with U(ε) denoting the potential energy after a strain transformation r → (1 + ε) r, with the 3 × 3 symmetric tensor ε acting on R_all. While computing this derivative for arbitrary potentials and periodic systems has required 'much effort' 31 in the past 33,60, it is straightforward with AD. The simplest approach, followed for instance by schnetpack 61,62 and nequip 54,63, is to inject the strain transformation explicitly into the construction of the GLP. This can be done at different points: One can transform R_sc and B before constructing G, directly transform atom-pair vectors, or transform all contributing positions R_unf. Alternatively, as the inner derivative of the inputs with respect to ε is simply the input itself, the derivative of U with respect to its inputs can be obtained with AD, and the stress computed analytically from the result. This avoids modifying the forward computation of U entirely.
These approaches yield, with ⊗ denoting an outer product, expressions of the form

σ = (1/V) ∑_{(i,j) ∈ E} (∂U/∂r_ij) ⊗ r_ij ,

recovering previous formulations given by Louwerse and Baerends 31 and Thompson 33. As seen in Tables I and III, all such forms of the stress are equivalent, provided the strain transformation is applied consistently to all used inputs 64. In all cases, as U is differentiated with respect to its inputs, the asymptotic cost remains linear.
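The equivalence of strain injection and the analytical contraction of position derivatives can be sketched for a toy potential under free boundary conditions (in the periodic case the transformation must be applied consistently to all inputs; names are illustrative):

```python
import jax
import jax.numpy as jnp

def potential(positions):
    diffs = positions[:, None, :] - positions[None, :, :]
    r2 = jnp.sum(diffs ** 2, axis=-1) + jnp.eye(len(positions))
    return jnp.sum(jnp.triu(1.0 / r2, k=1))

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.5, 0.1, 0.0],
                       [0.2, 2.0, 0.3]])
volume = 10.0  # placeholder cell volume

# Route 1: inject the strain r -> (1 + eps) r into the forward pass and
# differentiate U with respect to eps at eps = 0.
def strained_energy(eps):
    return potential(positions @ (jnp.eye(3) + eps).T)

stress_injected = jax.grad(strained_energy)(jnp.zeros((3, 3))) / volume

# Route 2: differentiate U with respect to the positions only, then contract
# analytically, sigma = (1/V) sum_i (dU/dr_i) (x) r_i; the forward pass is untouched.
grad_pos = jax.grad(potential)(positions)
stress_direct = jnp.einsum('ia,ib->ab', grad_pos, positions) / volume
```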
A more complex situation arises if strain derivatives of atomic energies, i.e., atomic stresses

σ_i = (1/V) ∂U_i(ε)/∂ε |_{ε=0} ,   (16)

are required. Their calculation requires either one backward pass per U_i, or one forward pass for each entry in ε. If only reverse-mode AD is available, its evaluation therefore scales quadratically with N. Linear scaling is retained with forward mode. For GLPs with M=1, linear scaling in reverse mode can be recovered by using Eq. (14): Every edge can be uniquely assigned to one U_i, and therefore the edge derivatives can be used to construct atomic stresses. For M>1, this is not possible; similar to the observations of the previous section, atomic stresses take a semi-local form.
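The forward- versus reverse-mode trade-off for atomic stresses can be illustrated with jax.jacfwd and jax.jacrev (toy per-atom energies standing in for a GLP; names are illustrative):

```python
import jax
import jax.numpy as jnp

# Hypothetical per-atom energies U_i, standing in for a GLP's output.
def atomic_energies(positions):
    diffs = positions[:, None, :] - positions[None, :, :]
    r2 = jnp.sum(diffs ** 2, axis=-1) + jnp.eye(len(positions))
    return 0.5 * jnp.sum(1.0 / r2 - jnp.eye(len(positions)), axis=1)

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.5, 0.1, 0.0],
                       [0.2, 2.0, 0.3]])
volume = 10.0

def strained_atomic_energies(eps):
    return atomic_energies(positions @ (jnp.eye(3) + eps).T)

eps0 = jnp.zeros((3, 3))
# Forward mode: 9 passes over the strain components, independent of N.
sigma_fwd = jax.jacfwd(strained_atomic_energies)(eps0) / volume
# Reverse mode: one backward pass per atomic energy U_i, i.e. N passes.
sigma_rev = jax.jacrev(strained_atomic_energies)(eps0) / volume
```

Both yield the same (N, 3, 3) array of atomic stresses; only the cost differs.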
C. Heat flux

The fundamental definition of the heat flux for MLPs was originally derived by Hardy 68 for periodic quantum systems. It reads 42

J = ∑_{i,j} r_ji ( ∂U_i/∂r_j · v_j ) + ∑_i E_i v_i =: J_pot + J_conv ,   (17)

where v_i denote velocities, m_i masses, and E_i = U_i + (1/2) m_i v_i^2 is the total energy per atom. Intuitively, the 'potential' term J_pot describes how the total instantaneous change in U_i can be attributed to interactions with other atoms, with energy flowing between them, while the second, 'convective', term J_conv describes energy being carried by the individual atoms. In the present setting, J_conv can be computed directly, as the E_i are available. J_pot, however, presents a challenge in an AD framework: In principle, J_pot could be computed directly, obtaining the required partial derivatives with AD. However, as J_pot is neither a Jacobian-vector nor a vector-Jacobian product, this requires repeated evaluations over the input or output dimension. Even when restricting j ∈ R_sc, which can be achieved by introducing the MIC for r_ji (see Supp. Mat. for details), the computational cost of a direct implementation with AD scales quadratically with N, rendering the system sizes and simulation times required for the GK method inaccessible 42. We therefore consider approaches that restore linear scaling in the following.
For M=1, edges can be uniquely assigned to atomic energy contributions, as discussed for atomic stresses in Sec. IV B. In this case,

∂U_i/∂r_j = ∂U_i/∂r_ij   for j ∈ N(i) ,   (20)

so that

J_pot = ∑_{(i,j) ∈ E} r_ji ( ∂U_i/∂r_ij · v_j ) ,   (21)

which requires a single evaluation of reverse-mode AD.
We note that the terms appearing in front of v_j also appear in the stress in Eq. (14). However, for a given j, the prefactor cannot be identified with the atomic stress as defined in Eq. (16): the atomic energy being differentiated is not U_j, but U_i. The indices can only be exchanged for additive pairwise potentials; this inequivalence was recently corrected in the LAMMPS code 40.
This approach is not applicable for M>1, since the relation in Eq. (20) no longer holds, and the mapping between stress contributions and heat flux contributions becomes invalid. By using the unfolded construction, however, linear scaling can be restored regardless 42. Introducing auxiliary positions r_i^aux, which are numerically identical to the positions r_i but not used to compute U, and defining the energy barycenter B = ∑_{i ∈ R_sc} r_i^aux U_i, the heat flux can be written as

J_pot = ∑_{j ∈ R_unf} (∂B/∂r_j) v_j − ∑_{j ∈ R_unf} (∂U/∂r_j · v_j) r_j .   (22)

The first term requires three reverse-mode evaluations, or one forward-mode evaluation, the latter a single backward- or forward-mode evaluation. Since the overhead introduced by explicitly constructing R_unf scales as N^{2/3}, overall linear scaling is restored, albeit with a prefactor due to the higher number of positions to be considered.
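The barycenter trick can be sketched for a toy set of per-atom energies under free boundary conditions (so the unfolded positions coincide with the positions themselves; the per-atom energies and all names are illustrative). The first term of Eq. (22) is a single Jacobian-vector product, and the result can be checked against the quadratically-scaling Hardy form:

```python
import jax
import jax.numpy as jnp

# Hypothetical per-atom energies U_i, standing in for a GLP (free boundaries,
# so the 'unfolded' set of positions is just the positions themselves).
def atomic_energies(positions):
    diffs = positions[:, None, :] - positions[None, :, :]
    r2 = jnp.sum(diffs ** 2, axis=-1) + jnp.eye(len(positions))
    return 0.5 * jnp.sum(1.0 / r2 - jnp.eye(len(positions)), axis=1)

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.5, 0.1, 0.0],
                       [0.2, 2.0, 0.3]])
velocities = jnp.array([[0.1, 0.0, 0.0],
                        [0.0, -0.2, 0.0],
                        [0.0, 0.0, 0.3]])
positions_aux = positions  # auxiliary copies: numerically identical, not fed into U

# Energy barycenter B = sum_i r_i^aux U_i; only the positions entering U are differentiated.
def barycenter(pos):
    return jnp.einsum('ia,i->a', positions_aux, atomic_energies(pos))

# First term: sum_j (dB/dr_j) v_j in ONE forward-mode pass (a JVP).
_, term1 = jax.jvp(barycenter, (positions,), (velocities,))

# Second term: sum_j (dU/dr_j . v_j) r_j from a single gradient of the total energy.
grad_U = jax.grad(lambda pos: jnp.sum(atomic_energies(pos)))(positions)
term2 = jnp.einsum('jb,jb,ja->a', grad_U, velocities, positions)
j_pot = term1 - term2

# Reference: the quadratically-scaling 'Hardy' form via the full Jacobian dU_i/dr_j.
jac = jax.jacobian(atomic_energies)(positions)          # shape (N, N, 3)
r_ji = positions[:, None, :] - positions[None, :, :]    # r_ji[i, j] = r_i - r_j
j_pot_hardy = jnp.einsum('ija,ijb,jb->a', r_ji, jac, velocities)
```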
To summarize, we have introduced two forms of the heat flux that can be implemented efficiently with AD: Equation (21), which applies to GLPs with M=1, and Eq. (22), which applies for M≥1, but introduces some additional overhead. Both are equivalent to the general, quadratically-scaling form given in Eq. (17), as seen in Tables II and IV.

V. EXPERIMENTS

A. Lennard-Jones Argon
The stress formulas in Eqs. (10) to (15) and the heat flux formulas in Eqs. (19), (21) and (22) have been implemented in the glp package 44 using jax 43. As a first step, we numerically verify this implementation.
To this end, the Lennard-Jones potential 45 is employed, for which analytical derivatives, including those required for the heat flux, are readily available, and implementations are included in many packages, for example the Atomic Simulation Environment (ASE) 69. In the GLP framework, the Lennard-Jones potential can be seen as an extreme case of an M=1 GLP, where U_i is composed of a sum of pair terms,

U_i = (1/2) ∑_{j ∈ N(i)} 4ε [ (σ/r_ij)^12 − (σ/r_ij)^6 ] ,

where in practice each pair term is multiplied by a smooth cutoff function (see Supp. Mat.). For this experiment, parameters approximating elemental argon are used 70. 100 randomly displaced and distorted geometries, based on the 512-atom 8 × 8 × 8 supercell of the face-centered cubic primitive cell with lattice parameter 3.72 Å and angle 60°, are used. Random velocities to evaluate a finite heat flux are sampled from the Boltzmann distribution corresponding to 10 K. Additional computational details are discussed in the Supp. Mat.
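A minimal jax version of the bare pair term (without the smooth cutoff of the ase variant used in this work; the argon-like ε and σ values are illustrative) makes the analytical comparison concrete for a dimer:

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

# Bare Lennard-Jones pair energy; epsilon and sigma are illustrative argon-like values.
EPS, SIG = 0.0104, 3.405  # eV, Angstrom

def lj_energy(positions):
    diffs = positions[:, None, :] - positions[None, :, :]
    r = jnp.sqrt(jnp.sum(diffs ** 2, axis=-1) + jnp.eye(len(positions)))
    pair = 4.0 * EPS * ((SIG / r) ** 12 - (SIG / r) ** 6)
    return jnp.sum(jnp.triu(pair, k=1))

# A dimer: the AD force must match the analytical derivative of the pair term.
positions = jnp.array([[0.0, 0.0, 0.0], [3.9, 0.0, 0.0]])
forces = -jax.grad(lj_energy)(positions)

r = 3.9
dU_dr = 4.0 * EPS * (-12.0 * SIG ** 12 / r ** 13 + 6.0 * SIG ** 6 / r ** 7)
# The force on atom 0 points toward atom 1 with magnitude |dU/dr| in the
# attractive regime, i.e. its x component equals +dU/dr.
assert jnp.allclose(forces[0, 0], dU_dr, atol=1e-12)
```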
Table I compares the stress formulations in Eqs. (10) to (15) with finite differences. We report 'best-case' results for finite differences, choosing the step size that minimises the error. In the table, the mean absolute error (MAE) and mean absolute percentage error (MAPE) with respect to the analytical ground truth are reported. All given formulations are found to be equivalent. In single-precision arithmetic, the AD-based implementations slightly outperform finite differences; in double precision, errors are similar.
For the heat flux, finite-difference approaches are not feasible. Therefore, only AD-based implementations are shown in Table II. In the case of the Lennard-Jones potential, where M=1, Eqs. (19), (21) and (22) are found to be identical.

VI. TIN SELENIDE WITH SO3KRATES
To investigate stress and heat flux in a practical setting, we now study tin selenide (SnSe) using the state-of-the-art SO3KRATES GLP 46. In contrast to other equivariant MLPs, for instance NEQUIP 54, SO3KRATES replaces shared equivariant feature representations by separate branches for invariant and equivariant information, whose information exchange is handled by an equivariant self-attention mechanism. By doing so, it achieves data efficiency and extrapolation quality competitive with state-of-the-art GLPs at reduced time and memory complexity. As non-local interactions are not modeled in the GLP framework introduced in this work, global interactions are disabled in the SO3KRATES models used here.
For these experiments, SO3KRATES models with M=1, 2, 3 were trained on approximately 3000 reference calculations, comprising a number of thermalization trajectories at different volumes at 300 K. These calculations were performed as part of a large-scale ab initio Green-Kubo (aiGK) benchmark study by Knoop et al. 71,72

Implementation of stress and heat flux
While no analytical derivatives are available for SO3KRATES, the implementation of the stress can be verified with finite differences, and the heat flux can be checked for consistency between different implementations. Similar to Sec. V A, we use 100 randomly displaced and distorted 4 × 8 × 8 supercells of the 0 K primitive cell of SnSe for this experiment, sampling velocities from the Boltzmann distribution at 10 K to evaluate a finite heat flux.
Table III compares the stress implementations in Eqs. (10) to (15) with finite differences, confirming the equivalence of all implemented formulations.
Table IV compares the heat flux formulations in Eq. (21) and Eq. (22) with the baseline in Eq. (19), which implements the quadratically-scaling 'Hardy' heat flux with the MIC. For M=1, all formulations are precisely equivalent. For M>1, the semi-local case, only Eq. (22) is equivalent to the 'Hardy' heat flux; Eq. (21) does not apply and consequently is not equivalent, displaying large deviations.

Equation of state and pressure
To assess the capability of SO3KRATES to predict stress- and pressure-related materials properties, we calculate energy-volume and pressure-volume curves for SnSe to obtain an equation of state (EOS) of the Vinet form 73,74. The experiment is performed for unit cells that were homogeneously strained by up to ±2 % starting from the fully relaxed geometry, relaxing the internal degrees of freedom afterwards. The energy-volume curves for SO3KRATES with M=1, 2, 3 interaction steps are shown in Fig. 1 in comparison to the DFT reference using the PBEsol exchange-correlation functional with 'light' default basis sets in FHI-aims 75,76. SO3KRATES with three interaction steps (M=3) yields the best visual agreement for the energy-volume curve in Fig. 1, and the best equilibrium volume. To quantify the agreement, we evaluate the Vinet EOS and extract the cohesive properties: equilibrium volume V_0, isothermal bulk modulus B_0, and its pressure derivative B'_0 (functional forms are given in the Supp. Mat.). Results are listed in Table V. For SO3KRATES with three interaction steps (M=3), the predicted volume deviates by −0.16 % from the DFT reference, and the deviation of the bulk modulus is −3.14 %. A larger error is seen for the pressure derivative of the bulk modulus, B'_0, which deviates by 27.7 %, indicating worse agreement further away from the training region. Overall, the agreement between DFT and SO3KRATES when predicting cohesive properties can be considered satisfactory. The energy-volume predictions are very good, and transfer even to volumes that are larger or smaller than the ones seen during training.
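Such a fit can be sketched with scipy on synthetic data, using one common parametrization of the Vinet pressure-volume relation (the exact functional forms used in this work are given in its Supp. Mat.; all parameter values below are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

# One common form of the Vinet pressure-volume relation, with equilibrium
# volume V0, bulk modulus B0, and its pressure derivative B0p.
def vinet_pressure(V, V0, B0, B0p):
    x = (V / V0) ** (1.0 / 3.0)
    eta = 1.5 * (B0p - 1.0)
    return 3.0 * B0 * (1.0 - x) / x ** 2 * np.exp(eta * (1.0 - x))

# Synthetic p-V data from known parameters, standing in for pressures
# computed for homogeneously strained cells.
V0_true, B0_true, B0p_true = 100.0, 0.5, 4.5
volumes = np.linspace(96.0, 104.0, 9)
pressures = vinet_pressure(volumes, V0_true, B0_true, B0p_true)

popt, _ = curve_fit(vinet_pressure, volumes, pressures, p0=[99.0, 0.4, 4.0])
```

Fitting the same form to both energy-volume (via its integral) and pressure-volume data is what allows the internal-consistency check described below.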
Finally, we check the internal consistency of energy and stress predictions with SO3KRATES by fitting energy-volume and pressure-volume curves and verifying that they yield identical parameters for the EOS. Results are shown in Fig. 2. As seen there, results are in perfect agreement, which verifies that the stress and the resulting pressure are consistent with the underlying energy function, as expressed in Eq. (8).

Thermal Conductivity
Finally, we proceed to GK calculations, following the approach outlined in our previous work 42,81. Finding that simulation cells with N=864 atoms and a simulation duration of t_0=1 ns yield converged results (see Supp. Mat.), we run 11 MD simulations using the glp package and SO3KRATES, computing J at every step.
Table VI shows the results: For M=2, 3, SO3KRATES is in excellent agreement with the work by Brorsson et al. 78, which uses a force constant potential (FCP). A larger difference is observed with the work by Liu et al. 79, who, however, use the PBE exchange-correlation functional, as opposed to the PBEsol functional used in the present work. The observed thermal conductivity is consistent with experiments, as well as with the size-extrapolated DFT result of Knoop et al. 71. The anisotropy of κ in SnSe is captured as well. Overall, SO3KRATES with more than one interaction step is able to capture the converged thermal conductivity of SnSe, using only training data from the thermalization step of a full aiGK workflow; long-running equilibrium ab initio MD simulations, the bottleneck of the GK method, are avoided.

VII. CONCLUSION
We demonstrated that the stress and heat flux can be computed efficiently with AD for potentials based on a graph of atom-pair vectors, which we termed GLPs, and provided example implementations in the glp package 44. Numerical experiments for Lennard-Jones argon and for tin selenide with the SO3KRATES GLP verified that these quantities are computed correctly and consistently. The equivariant SO3KRATES GLP was shown to predict cohesive properties and thermal conductivity of SnSe in good agreement with DFT, other MLPs, and experiments, confirming the practical relevance of computational access to stress and heat flux.
This work enables the use of a large class of recently developed MLPs, those which can be described in the GLP framework, in computational materials science, and in particular for the calculation of thermal conductivities using the GK method. For GLPs implemented with jax, the glp package is provided to enable the calculation of stress and heat flux without requiring further implementation efforts.
SUPPLEMENTARY MATERIAL

A. Lennard-Jones potential

The Lennard-Jones potential used in this work is based on the 'smooth' implementation in ase 1, where the standard pair potential is multiplied with a cutoff function ensuring that energies and forces decay continuously to zero as atoms approach r_c. This was implemented in glp, and energy predictions were verified to ensure that glp and ase yield identical results.

B. Test of stress and heat flux
In order to verify the approach described in this work, and to test the glp framework, stress and heat flux were compared with the implementation in ase, where derivatives are computed analytically. The experiment consists of computing these properties for 100 randomly perturbed simulation cells of Lennard-Jones argon, starting from an 8 × 8 × 8 supercell of the face-centered cubic primitive cell with lattice parameter 3.72 Å and angle 60°.
Positions are then modified with perturbations drawn from a normal distribution with σ = 0.01 Å; a random strain with each component drawn from a uniform distribution over [−1 %, 1 %] is also applied. Velocities, required for the heat flux, are drawn from a Boltzmann distribution corresponding to 10 K. For the heat flux, only J_pot is computed, as J_conv is identical for all implementations.

A. Implementation of stress and heat flux
For this experiment, the relaxed 0 K primitive cell of SnSe, obtained from the work by Knoop et al. 2, was extended to a 4 × 8 × 8 supercell, ensuring that r_c^eff does not exceed half the length of the smallest lattice vector. Otherwise, the experiment proceeded identically to the one described in Section II B.
Tables I and II contain comparisons of the glp stress with finite differences for M=1 and M=3. Results are similar to the M=2 case.

B. Evaluating the potential
This section presents additional results with SO3KRATES and SnSe, which were used to assess the quality of the machine-learning potentials (MLPs) used in the work.
Figure 1 shows the phonon band structure for SO3KRATES with different M compared with the density-functional theory (DFT) reference. Agreement between DFT and SO3KRATES increases systematically with M; satisfactory agreement is obtained from M=2 onwards.
The vibrational density of states can be seen in Fig. 2. Overall features are captured for all values of M. Performance is slightly improved for M=2, 3, which behave similarly.
In line with observations in the main text, M=2 can be considered sufficient to model vibrational properties. While slight improvements are obtained with M=3, the impact on thermal conductivity is negligible. Energy-volume curves, on the other hand, are improved by M=3.

C. Green-Kubo workflow and convergence
As the Green-Kubo (GK) method is formally valid in the thermodynamic limit, the convergence of κ with respect to simulation size and duration must be considered. Additional considerations arise from the noise reduction applied to the resulting heat flux autocorrelation function (HFACF). In this work, following previous work 3,4, a lowpass filter with frequency 1 THz is applied to the integrated HFACF, which is then differentiated via finite differences to yield a smooth HFACF, from which the integration cutoff is determined as the first zero crossing. No further noise reduction is performed; no gauge terms are removed, and the full heat flux J = J_pot + J_conv is computed at all timesteps.
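The cutoff logic can be sketched on a synthetic, exponentially correlated signal (no lowpass filtering here, and the V/(k_B T^2) prefactor is omitted; all names and values are illustrative):

```python
import numpy as np

# Synthetic, exponentially correlated 'flux' standing in for one component of J(t).
rng = np.random.default_rng(0)
n_steps, dt = 20000, 4e-15  # 4 fs timestep, as in this work
tau = 100                   # correlation time in steps
a = np.exp(-1.0 / tau)
flux = np.empty(n_steps)
flux[0] = rng.standard_normal()
for t in range(1, n_steps):
    flux[t] = a * flux[t - 1] + np.sqrt(1.0 - a ** 2) * rng.standard_normal()

# HFACF <J(0) J(t)> by averaging over time origins.
max_lag = 2000
hfacf = np.array([np.mean(flux[: n_steps - lag] * flux[lag:]) for lag in range(max_lag)])

# Running GK integral; the cutoff is taken at the first zero crossing of the
# (here unfiltered) HFACF. Prefactors V / (k_B T^2) are omitted in this sketch.
running_integral = np.cumsum(hfacf) * dt
cutoff = int(np.argmax(hfacf < 0))
kappa_unnormalized = running_integral[cutoff]
```

Truncating at the first zero crossing avoids integrating the long noise tail of the autocorrelation function.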
Eleven trajectories with timestep ∆t = 4 fs are used throughout, initialised from snapshots of a 0.2 ns NVT thermalization trajectory using the Langevin thermostat implemented in ase 1. The primitive cell at 300 K from the work by Knoop et al. 2 was used, creating n × 2n × 2n supercells to obtain simulation cells with different N = 32n^3. Figure 3 displays the convergence of κ with respect to simulation cell size N and simulation duration t_0, for SO3KRATES with M=2. No significant increases in κ are observed for simulation cell sizes above N=864 and simulation durations above t_0=2 ns. We therefore choose (864, 2 ns) as 'production' settings for this work.

FIG. 2. Vibrational density of states for tin selenide at 300 K for SO3KRATES models with M=1, 2, 3 compared with FHI-aims. Results are reported for a trajectory with 30 ps duration in a supercell with 256 atoms, started from identical initial configurations. Vertical lines indicate prominent peaks in the FHI-aims result.

D. Model and training
Here we use the SO3KRATES message-passing neural network (MPNN), which is implemented in the mlff package 5. The number of interaction steps M is varied from M = 1 up to M = 3, whereas we fix the cutoff radius to r_c = 5 Å, the embedding dimension to F = 132, and the maximal degree in the equivariant branch to l_max = 3. Non-local corrections are not used. After M message-passing updates, the final embeddings are passed through a two-layer feed-forward neural network (FFNN) with SiLU (sigmoid linear unit) non-linearity to obtain per-atom energies. The total potential energy is given as the sum of the per-atom energies.
The model is trained by minimizing a joint loss of the potential energy and per-atom forces with loss weightings of 0.01 and 0.99, respectively, using the Adam optimizer 6. In total, 3000 reference structures of tin selenide are used for training, of which 600 are reserved for validation. The training is stopped after 2500 epochs, with a batch size of 10. After each epoch, the performance of the current model is evaluated on the validation data, and the best-performing model is saved for production. The initial learning rate is set to 10^−3 and is reduced every 100k steps using exponential learning rate decay with a decay factor of 0.7. No early stopping is employed. Training times on a single NVIDIA A100 40GB GPU range from 2 h 54 min for M = 1 up to 6 h 46 min for M = 3.
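The joint energy/force loss can be sketched as follows, with a trivial one-parameter model standing in for the SO3KRATES forward pass and plain gradient descent replacing Adam (everything here is illustrative, not the mlff implementation):

```python
import jax
import jax.numpy as jnp

# Stand-in 'model': a single parameter theta scaling a quadratic energy.
def model_energy(theta, positions):
    return theta * jnp.sum(positions ** 2)

def loss(theta, positions, e_ref, f_ref, w_e=0.01, w_f=0.99):
    e_pred = model_energy(theta, positions)
    f_pred = -jax.grad(model_energy, argnums=1)(theta, positions)  # forces via AD
    return w_e * (e_pred - e_ref) ** 2 + w_f * jnp.mean((f_pred - f_ref) ** 2)

positions = jnp.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
theta_ref = 2.0  # ground-truth parameter generating the reference data
e_ref = model_energy(theta_ref, positions)
f_ref = -jax.grad(model_energy, argnums=1)(theta_ref, positions)

theta = 0.2
for _ in range(200):
    theta = theta - 0.05 * jax.grad(loss)(theta, positions, e_ref, f_ref)
```

Note the nested differentiation: the force prediction is itself a gradient, and the outer optimization differentiates through it.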

FIG. 1. EOS (energy vs. volume) computed with PBEsol-DFT (black dots) compared to SO3KRATES with different numbers of interaction steps M. The connecting lines have been obtained by fitting the Vinet EOS. Inset: Zoom into the region of volumes covered during training, indicated by the gray shading.

FIG. 2. Comparison of EOS obtained by fitting energy-volume (red dashed) and pressure-volume (black solid) data. The results are in perfect agreement.

FIG. 1. Phonon band structure and density of states for tin selenide for SO3KRATES models with M=1, 2, 3 compared with FHI-aims. Results are reported for a supercell with 256 atoms.
FIG. 3. κ for tin selenide at 300 K for different choices of simulation cell size N and simulation duration t_0, for SO3KRATES with M=2. Error bars indicate the standard error across trajectories. N is shown as N^{1/3}, which is proportional to the length scale of the simulation cell. For each choice of N, t_0 from 0.1 ns to 4.0 ns are shown with a horizontal offset. 'Production' settings (864, 2 ns) are indicated; the associated standard error is shown as a shaded band.

TABLE II .
Error in heat flux for Lennard-Jones argon, comparing different formulations to analytical derivatives. Results are shown for both single and double precision arithmetic.

TABLE III .
Error in stress for tin selenide, comparing different formulations to finite differences, for SO3KRATES with M=2. Results are shown for both single and double precision arithmetic, and for σ · V in place of σ. Results for other M, which are similar to the ones shown here, can be found in the Supp. Mat.

TABLE V .
Cohesive properties of SnSe for PBEsol-DFT and SO3KRATES with different numbers of interaction steps M, obtained via the Vinet EOS. Best values are highlighted. Values are in good agreement with existing literature 77. Additional details on the MLP training can be found in the Supp. Mat.

TABLE II .
Error in stress for tin selenide, comparing different formulations to finite differences, for SO3KRATES with M=3. Results are shown for both single and double precision arithmetic, and for σ · V in place of σ.