Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, which are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), which are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes that gather information on the relationship between neighboring atoms using "message-passing" ideas cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates, and provide a coherent foundation to systematize our understanding of both atom-centered and message-passing schemes, and of both invariant and equivariant machine-learning models.

The understanding that ν-point correlations of the atom density provide a complete linear basis to expand any property of these atom-centered environments1–3 sheds light on the relations between different machine-learning (ML) frameworks for atomistic modeling4–6 and is driving much progress toward more accurate and performant schemes.7 This systematic approach is somewhat disconnected from efforts that combine ideas from geometric ML8,9 and point clouds,10,11 which look at molecules as graphs in which the atoms play the role of nodes and the interatomic distances decorate the edges that connect them12–17—a point of view that has a long-standing tradition in the field of cheminformatics.18,19 Message-passing (MP) concepts that describe the propagation of information about the geometric relations between nodes in terms of messages propagated along the edges provide a general framework that encompasses popular architectures, such as graph-convolutional neural networks (NNs), as special cases. With the rise of equivariant neural networks,10,20–24 the similarity between atom-centered and message-passing schemes has become more and more apparent, since both rely on symmetry-adapted constructions based on O(3)-preserving operations. We introduce a formalism to define representations that incorporate graph-based and message-passing ideas, that provide a linear basis to expand symmetric functions of the relative positions of atoms (or points in a 3D cloud), and that correspond, in the appropriate limit, to a wide variety of deep learning schemes based on graphs. We give examples of the performance of these "message-passing" representations in describing short- and long-range structure–property relations and discuss how this framework provides a unified formalism to describe geometric machine-learning approaches that rely on structured representations,9 incorporating inductive biases in the form of atom-centered contributions that are local and equivariant.

The basic ingredient in the atom-centered density correlation (ACDC) framework2 involves localized functions (e.g., Gaussians $g$ or Dirac $\delta$ distributions) centered at the atomic positions $\mathbf{r}_i$. Following the notation formalized in Sec. 3.1 of Ref. 4, we indicate these atomic contributions as $\langle\mathbf{x}|\mathbf{r}_i\rangle \equiv g(\mathbf{x}-\mathbf{r}_i)$ [or $\delta(\mathbf{x}-\mathbf{r}_i)$]. The notation is summarized in Table I, and some of the key quantities are illustrated schematically in Fig. 1. Repeated symmetrization and summation over the neighbors of the central atom yield a family of $(N+1)$-center and $(\nu+1)$-body-order equivariant representations,25

$$|\overline{\rho_{i i_1 \ldots i_N}^{\otimes\nu};\sigma;\lambda\mu}\rangle \equiv \int_{O(3)} \mathrm{d}\hat{R}\; \hat{R}\,|\lambda\mu\rangle \otimes \hat{R}\,|\sigma\rangle \otimes \hat{R}\,|\rho_{i i_1 \ldots i_N}^{\otimes\nu}\rangle. \tag{1}$$

In this expression, $|\lambda\mu\rangle$ is the vector that tracks the irreducible spherical [SO(3)] representation26 and $|\sigma\rangle$ is the parity label. $|\rho_{i i_1 \ldots i_N}^{\otimes\nu}\rangle$ indicates the tensor product of pair terms $|\mathbf{r}_{i_k i}\rangle$, which determine the relative positions $\mathbf{r}_{i_k i} = \mathbf{r}_{i_k} - \mathbf{r}_i$ of the $N$ centers around the central atom, and of $\nu$ of the neighbors found within the $i$-centered environment $A_i$ [for the derivation, see Ref. 25, where the following is presented as Eq. (5)],

$$|\rho_{i i_1 \ldots i_N}^{\otimes\nu}\rangle = |\mathbf{r}_{i_1 i}\rangle \otimes\cdots\otimes |\mathbf{r}_{i_N i}\rangle \otimes \sum_{j_1 \ldots j_\nu \in A_i} |\mathbf{r}_{j_1 i}\rangle \otimes\cdots\otimes |\mathbf{r}_{j_\nu i}\rangle. \tag{2}$$

Equation (2) is a highly abstract formulation of the construction (represented schematically in Fig. 1), which can be evaluated, in practice, by projection on any complete basis $|q\rangle$. By combining the resulting coefficients with appropriately fitted weights $\langle y|q\rangle$, one can approximate any $(N+1)$-center quantity $y_{i i_1 \ldots i_N}$ that is symmetric with respect to permutations of the labels of the neighbors and equivariant with respect to rigid translations and rotations, e.g.,

$$y_{i i_1 \ldots i_N}^{\sigma;\lambda\mu}(A) \approx \sum_q \langle y;\sigma;\lambda|q\rangle\, \langle q|\overline{\rho_{i i_1 \ldots i_N}^{\otimes\nu};\sigma;\lambda\mu}\rangle. \tag{3}$$

To see how, one can consider a real-space basis and a δ-like atom density. For an invariant (scalar) property,

$$y_{i i_1 \ldots i_N}(A) = \int \mathrm{d}\{x\}\, \langle y|\{x\}\rangle\, \langle\{x\}|\overline{\delta_{i i_1 \ldots i_N}^{\otimes\nu}}\rangle = \int_{O(3)} \mathrm{d}\hat{R} \sum_{j_1 \ldots j_\nu \in A_i} \tilde{y}\big(\hat{R}\mathbf{r}_{j_1 i},\ldots,\hat{R}\mathbf{r}_{j_\nu i},\hat{R}\mathbf{r}_{i_1 i},\ldots,\hat{R}\mathbf{r}_{i_N i}\big), \tag{4}$$

where $\{x\}$ is a shorthand for the collection of the continuous indices associated with each interatomic degree of freedom.27

TABLE I.

Notation key for different types of density-based representations.

Density peaked at the interatomic spacing $\mathbf{r}_{i_k} - \mathbf{r}_i$:
$|\mathbf{r}_{i_k i}\rangle$, with $\langle\mathbf{x}|\mathbf{r}_{i_k i}\rangle = g(\mathbf{x} - \mathbf{r}_{i_k i})$

Basis of spherical harmonics and radial functions:
$\langle nlm| \equiv \int \mathrm{d}\mathbf{x}\, \langle n|x\rangle\, \langle lm|\hat{\mathbf{x}}\rangle\, \langle x\hat{\mathbf{x}}|$

Neighbor density around the $i$th atom:
$|\rho_i\rangle \equiv \sum_{i_1} |\mathbf{r}_{i_1 i}\rangle$

$\nu$-neighbor ACDC around the $i$th atom:
$|\rho_i^{\otimes\nu}\rangle \equiv |\rho_i\rangle \otimes |\rho_i\rangle \otimes \cdots \otimes |\rho_i\rangle$

Symmetrized $\nu$-neighbor ACDC invariant:
$|\overline{\rho_i^{\otimes\nu}}\rangle \equiv \int \mathrm{d}\hat{R}\, \hat{R}\,|\rho_i^{\otimes\nu}\rangle$

Symmetrized $\nu$-neighbor ACDC equivariant with SO(3) character $\lambda$ and inversion character $\sigma$:
$|\overline{\rho_i^{\otimes\nu};\sigma;\lambda\mu}\rangle \equiv \int \mathrm{d}\hat{R}\, \hat{R}\,|\sigma\rangle \otimes \hat{R}\,|\lambda\mu\rangle \otimes \hat{R}\,|\rho_i^{\otimes\nu}\rangle$

Multi-center ACDC features describing a cluster $i, i_1, \ldots, i_N$ of $N+1$ atoms:
$|\rho_{i i_1 \ldots i_N}^{\otimes\nu\nu_1\ldots\nu_N}\rangle \equiv |\rho_i^{\otimes\nu}\rangle \otimes |\rho_{i_1}^{\otimes\nu_1}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes \cdots$

Simplest form of an $i$-centered message-passing ACDC representation:
$|\rho_i^{[\nu\leftarrow\nu_1]}\rangle \equiv \sum_{i_1} |\rho_{i i_1}^{\otimes\nu\nu_1}\rangle$

Higher-order MP contraction:
$|\rho_i^{[\nu\leftarrow(\nu_1,\nu_2)]}\rangle \equiv \sum_{i_1 i_2} |\rho_{i i_1 i_2}^{\otimes\nu\nu_1\nu_2}\rangle$

Iterating the MP construction:
$|\rho_i^{[\nu\leftarrow[\nu_1\leftarrow\nu_2]]}\rangle \equiv \sum_{i_1} |\rho_i^{\otimes\nu}\rangle \otimes |\rho_{i_1}^{[\nu_1\leftarrow\nu_2]}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle$
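As a concrete illustration of the first few entries of Table I, the following minimal sketch evaluates the coefficients $\langle nlm|\rho_i\rangle$ of a δ-like neighbor density on a discrete basis. The Gaussian radial basis $R_n$ is a hypothetical placeholder (any well-behaved radial basis could be substituted), and the sum over neighbors anticipates the "density trick" discussed below.

```python
import numpy as np
from scipy.special import sph_harm

def density_coefficients(r_ij, n_max, l_max):
    """Expansion coefficients <nlm|rho_i> of a delta-like neighbor density,
    summed over the neighbors of atom i (the 'density trick').
    r_ij: array of shape (n_neighbors, 3) of displacements r_j - r_i."""
    r = np.linalg.norm(r_ij, axis=1)
    azim = np.arctan2(r_ij[:, 1], r_ij[:, 0])   # azimuthal angle
    polar = np.arccos(r_ij[:, 2] / r)           # polar angle
    coeffs = {}
    for l in range(l_max + 1):
        # Y[m + l, j] = Y_lm(r_hat_j); scipy's convention is sph_harm(m, l, azim, polar)
        Y = np.array([sph_harm(m, l, azim, polar) for m in range(-l, l + 1)])
        # hypothetical Gaussian radial basis R_n(r), one row per n
        R = np.array([np.exp(-2.0 * (r - 0.5 * (n + 1)) ** 2)
                      for n in range(n_max)])
        coeffs[l] = np.einsum("nj,mj->nm", R, Y.conj())  # shape (n_max, 2l+1)
    return coeffs
```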
FIG. 1.

A few examples of ACDC representations and their construction by summing pair contributions over atomic neighbors.


Explicitly evaluating the sum over neighbors entails a cost that scales exponentially with $\nu$. However, a density trick eliminates the steep scaling with the number of neighbors by first computing an atom-centered density, $|\rho_i\rangle \equiv \sum_j |\mathbf{r}_{ji}\rangle$, and then performing the tensor products.4 Similarly, one does not need to compute the integral over rotations explicitly. By expressing the equivariant features in an irreducible coupled basis and discretizing each term in the tensor product on a basis of radial functions $\langle x|nl\rangle$ and spherical harmonics $\langle\hat{\mathbf{x}}|lm\rangle$, the full set of symmetry-adapted equivariants (1) can be evaluated by combining symmetry-adapted terms in an iterative fashion,28

$$\langle q_1 l_1; q_2 l_2|\overline{A_1;A_2;\big(\sigma_1\sigma_2(-1)^{l_1+l_2+\lambda}\big);\lambda\mu}\rangle = \sum_{m_1 m_2} \langle q_1|\overline{A_1;\sigma_1;l_1 m_1}\rangle\, \langle q_2|\overline{A_2;\sigma_2;l_2 m_2}\rangle\, \langle l_1 m_1; l_2 m_2|\lambda\mu\rangle. \tag{5}$$

The key insight from Eq. (5) is that equivariant features can be combined to yield higher-order equivariants that provide a more complete basis to express structure–property relations. As shown graphically in Fig. 2, the tensor product of two features with symmetry $l_1$ and $l_2$ generates equivariant components ranging from $\lambda = |l_1 - l_2|$ to $\lambda = l_1 + l_2$. One can obtain the same rotational behavior ($\lambda$) with different combinations of $l_1$ and $l_2$. For instance, the figure shows the $\lambda = 1$ blocks that can be obtained by combinations of $l_1 = 1, l_2 = 1$ and $l_1 = 0, l_2 = 1$ in shades of red, indicating the different symmetry blocks involved in the coupling. Similarly, $\lambda = 0$ blocks can arise from the combination of invariants only ($l_1 = 0, l_2 = 0$) but also from combinations of equivariant components ($l_1 = 1, l_2 = 1$) and ($l_1 = 2, l_2 = 2$), shown in darker shades of gray. Expressions such as (2) provide a notation to describe a linear basis to approximate properties of $N+1$ atomic centers that depend simultaneously on the positions of $\nu$ neighbors, and expressions such as (5) provide a strategy to compute such features iteratively while keeping the irreducible equivariant components separate. References 2, 3, and 29 discuss the universality of this construction based on its equivalence to a complete basis of symmetric polynomials (see also Sec. 4.1 of Ref. 4 for a pedagogic discussion). Similar arguments about approximating equivariant functions with a basis of equivariant polynomials have been used in Ref. 30 to prove the universality of tensor field networks.10
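To make the iteration concrete, here is a minimal sketch of Eq. (5) for a single output channel $\lambda$, assuming the parity index is dropped and that each feature block is stored as a dict mapping the angular index $l$ to an array of shape (n_features, 2l + 1) in the complex spherical-harmonic basis; sympy supplies the Clebsch–Gordan coefficients.

```python
import numpy as np
from sympy.physics.quantum.cg import CG

def cg_coupling(A1, A2, lam):
    """One Clebsch-Gordan iteration, Eq. (5), for a single output channel.
    A1, A2: dicts mapping l to arrays of shape (n_features, 2l + 1).
    Returns the stacked lam-equivariant blocks, enumerated by (q1 l1; q2 l2)."""
    blocks = []
    for l1, x1 in A1.items():
        for l2, x2 in A2.items():
            if not abs(l1 - l2) <= lam <= l1 + l2:
                continue  # triangle inequality: (l1, l2) yields no lam channel
            # cg[m1 + l1, m2 + l2, mu + lam] = <l1 m1; l2 m2 | lam mu>
            cg = np.zeros((2 * l1 + 1, 2 * l2 + 1, 2 * lam + 1))
            for m1 in range(-l1, l1 + 1):
                for m2 in range(-l2, l2 + 1):
                    mu = m1 + m2
                    if abs(mu) <= lam:
                        cg[m1 + l1, m2 + l2, mu + lam] = float(
                            CG(l1, m1, l2, m2, lam, mu).doit())
            # contract over m1, m2; keep the product feature index (q1, q2)
            block = np.einsum("qm,pn,mnu->qpu", x1, x2, cg)
            blocks.append(block.reshape(-1, 2 * lam + 1))
    return np.concatenate(blocks, axis=0)
```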

FIG. 2.

Equivariant features [color-coded based on the SO(3) character and enumerated by the index q] can be combined to generate equivariants of higher order by Clebsch–Gordan (CG) iterations (black arrows). The figure only shows a small part of the equivariants that can be computed but emphasizes that linearly independent terms with the same SO(3) character arise from the combination of different characters, giving rise to a richer product space $\langle q_1 l_1; q_2 l_2|$.


As a concrete example, consider the expansion of the atom-centered density in radial functions $\langle r|nl\rangle$ and spherical harmonics $\langle\hat{\mathbf{r}}|lm\rangle$, $\langle nlm|\rho_i\rangle$, which can also be interpreted as the $\nu = 1$ equivariant ACDC $\langle n|\overline{\rho_i^{\otimes 1}; lm}\rangle$. The two-point correlation can be simply indexed as $\langle nlm; n'l'm'|\rho_i^{\otimes 2}\rangle$, but its elements do not transform as irreducible representations of O(3). We can obtain a more transparent equivariant formulation by using the iteration (5),

$$\langle nl; n'l'|\overline{\rho_i^{\otimes 2};(-1)^{l+l'+\lambda};\lambda\mu}\rangle = \sum_{m m'} \langle n|\overline{\rho_i^{\otimes 1}; l m}\rangle\, \langle n'|\overline{\rho_i^{\otimes 1}; l' m'}\rangle\, \langle l m; l' m'|\lambda\mu\rangle, \tag{6}$$

which yields the equivariant smooth overlap of atomic positions (λ-SOAP) features31 with a well-defined parity and SO(3) character. The $(\sigma = 1, \lambda = 0)$ scalar term corresponds to the popular invariant SOAP representation,32

$$\langle n n' l|\overline{\rho_i^{\otimes 2}}\rangle = \sum_m \langle n|\overline{\rho_i^{\otimes 1}; l m}\rangle\, \langle n'|\overline{\rho_i^{\otimes 1}; l m}\rangle. \tag{7}$$
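In terms of the density coefficients sketched after Table I, the invariant contraction of Eq. (7) is a one-line sum over $m$; the $1/\sqrt{2l+1}$ normalization used below is an assumed convention, not prescribed by the equation.

```python
import numpy as np

def power_spectrum(coeffs):
    """Invariant SOAP power spectrum of Eq. (7): contract the nu = 1
    equivariants over m for each angular channel l. `coeffs` maps l to
    <nlm|rho_i> arrays of shape (n_max, 2l + 1)."""
    blocks = []
    for l, c in coeffs.items():
        p = np.einsum("nm,pm->np", c, c.conj()) / np.sqrt(2 * l + 1)
        blocks.append(p.real.flatten())  # real for a real-valued density
    return np.concatenate(blocks)
```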

To better understand the broader significance of this construction and its relationship to other geometric ML schemes, it is worth recalling that SOAP invariants encode information on the distances between the central atom and two neighbors, $r_{ji}$ and $r_{j'i}$, as well as on the angle between them, $\theta_{jj'i}$, and could be equally well computed as2,4

$$\langle n n' l|\overline{\rho_i^{\otimes 2}}\rangle = \sum_{j,j' \in A_i} G_{n n' l}(r_{ji}, r_{j'i}, \theta_{j j' i}), \tag{8}$$

where the $G_{n n' l}$ are functions that depend on the basis used for the neighbor density expansion and that could, in principle, be chosen to mimic the usual Behler–Parrinello symmetry functions.33 Similar expressions can be derived for any scalar contraction of two atom-centered density terms: for instance, the three-center, zero-neighbor representation can be evaluated as

$$\langle n n' l|\overline{\rho_{i i_1 i_2}^{\otimes 0}}\rangle = \sum_m \langle n l m|\mathbf{r}_{i_1 i}\rangle\, \langle n' l m|\mathbf{r}_{i_2 i}\rangle = G_{n n' l}(r_{i_1 i}, r_{i_2 i}, \theta_{i_1 i_2 i}), \tag{9}$$

which we will use later to discuss the connection with geometric ML schemes based on angular information.
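For a separable basis $\langle nlm|\mathbf{r}\rangle = R_n(r) Y_{lm}(\hat{\mathbf{r}})$, the sum over $m$ in Eq. (9) collapses, by the spherical-harmonic addition theorem, to a Legendre polynomial in the enclosed angle. A sketch, with a hypothetical Gaussian radial basis standing in for $R_n$:

```python
import numpy as np
from scipy.special import eval_legendre

def G(n_max, l, r1, r2, cos_theta):
    """Three-center invariant of Eq. (9): the sum over m reduces, via the
    addition theorem, to (2l + 1)/(4 pi) P_l(cos theta), times a product
    of radial functions. The Gaussian R_n is a placeholder basis."""
    R = lambda n, r: np.exp(-2.0 * (r - 0.5 * (n + 1)) ** 2)
    pref = (2 * l + 1) / (4 * np.pi) * eval_legendre(l, cos_theta)
    return pref * np.array([[R(n, r1) * R(m, r2)
                             for m in range(n_max)] for n in range(n_max)])
```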

The ACDC construction is entirely based on terms centered around a single atom and can be seen as an alternative to message-passing (MP) architectures, which are based on the idea that information on multiple nodes (atoms in this context) and their connectivity (interatomic distance vectors) can be combined to provide higher descriptive power. The essential ideas behind MP frameworks can be summarized in a few steps, illustrated in Fig. 3: (a) a description of the connectivity of a selected node i with each of its neighbors j is built by combining information on the neighbor attributes and on the geometry of the connection; (b) information on all the neighbors of the selected node is combined in a way that is independent of the order in which they are considered; (c) the compound description of the neighborhood is assigned as the new attributes of the central node. Iterating the process propagates information on the neighbors' neighbors to every node. Graph-convolutional frameworks can be seen as a special case.8,9 We refer the reader to Refs. 12 and 34 for an overview of the methodology applied to chemical modeling and to Sec. II E for an explicit comparison with existing MP architectures.
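A minimal sketch of one such iteration, with steps (a)–(c) marked; the `message` and `update` callables stand in for whatever (learnable) functions a specific architecture prescribes.

```python
import numpy as np

def mp_step(h, edges, e, message, update):
    """One message-passing iteration, following steps (a)-(c) of Fig. 3.
    h: (n_atoms, n_feat) node attributes; edges: iterable of (i, j)
    neighbor pairs; e: dict mapping (i, j) to an edge descriptor."""
    m = np.zeros_like(h)
    for i, j in edges:
        m[i] += message(h[i], h[j], e[i, j])  # (a) node + edge info,
                                              # (b) permutation-invariant sum
    return np.array([update(h[i], m[i])       # (c) update the central node
                     for i in range(h.shape[0])])
```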

FIG. 3.

Schematic representation of the functioning of message-passing schemes. The three panels illustrate the different steps of a message-passing architecture, as discussed in the text where they are enumerated with letters matching those in the figure.


The ACDC framework can be extended to provide a basis for the construction of arbitrarily complex equivariant MP frameworks. First, we introduce the possibility of decorating an (N + 1)-center representation with information on any of the centers. Disregarding the symmetry indices and the integral over rotations for simplicity—as discussed above, symmetry-adapted equivariants can be obtained by using Eq. (5) on the tensor-product pattern—we define the decorated atom-density representations,

$$|\rho_{i i_1 \ldots i_N}^{\otimes\nu\nu_1\ldots\nu_N}\rangle \equiv |\rho_i^{\otimes\nu}\rangle \otimes |\rho_{i_1}^{\otimes\nu_1}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes\cdots\otimes |\rho_{i_N}^{\otimes\nu_N}\rangle \otimes |\mathbf{r}_{i_N i}\rangle. \tag{10}$$

The $(N+1)$-center representations (2) are a special case, with $\nu_{k>0} = 0$. Given that we encode information on the vectors connecting the central atom $i$ with its neighbors, $\mathbf{r}_{i_k i}$, the information on the position of the neighbors relative to each other is also formally retained because $\mathbf{r}_{i_k i_{k'}} = \mathbf{r}_{i_k i} - \mathbf{r}_{i_{k'} i}$. As we will show in Sec. III A, for a finite basis, the performance of representations that formally describe the same order of interatomic correlations can depend, in practice, on which pairs of atoms are included explicitly in the description. Taking tensor products of $(N+1)$-center features amounts to increasing their body order,

$$|\rho_{i i_1 \ldots i_N}^{\otimes\nu\nu_1\ldots\nu_N}\rangle \otimes |\rho_{i i_1 \ldots i_N}^{\otimes\nu'\nu_1'\ldots\nu_N'}\rangle \simeq |\rho_{i i_1 \ldots i_N}^{\otimes(\nu+\nu')(\nu_1+\nu_1')\ldots(\nu_N+\nu_N')}\rangle, \tag{11}$$

where we use an approximate equality because on the left-hand side each $|\mathbf{r}_{i_k i}\rangle$ appears twice. The tensor product $|\mathbf{r}_{i_k i}\rangle \otimes |\mathbf{r}_{i_k i}\rangle$ increases the dimensionality of the left-hand side even though it does not incorporate information on more neighbors or centers.

Simple MP-like representations can be obtained starting from decorated pair (two-center) features and summing over the pair index (Fig. 4),

$$|\rho_i^{[\nu\leftarrow\nu_1]}\rangle \equiv \sum_{i_1 \in A_i} |\rho_{i i_1}^{\otimes\nu\nu_1}\rangle = |\rho_i^{\otimes\nu}\rangle \otimes \sum_{i_1 \in A_i} |\rho_{i_1}^{\otimes\nu_1}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle. \tag{12}$$

For the case of $\nu_1 = 0$, the sum simplifies to $|\rho_i\rangle$; thus, the expression simply evaluates ACDC features of order $\nu + 1$. By expanding the tensor product and in analogy with (4), one sees that these features provide a linear basis to expand a symmetrized function of the neighbors of $i$ and of each of their neighbors, e.g.,

$$\sum_q \langle y|q\rangle\, \langle q|\rho_i^{[1\leftarrow 1]}\rangle = \sum_{i_1, j \in A_i;\, j_1 \in A_{i_1}} \tilde{y}(\mathbf{r}_{j i}, \mathbf{r}_{i_1 i}, \mathbf{r}_{j_1 i_1}). \tag{13}$$

A basis for O(3)-equivariant predictions can be obtained by computing the tensor product iteratively in the coupled basis, following (5). Except for the (important) subtlety that one of the indices extends over the neighborhood of $i_1$, these features contain the same information as the $\nu = 3$ ACDC (the bispectrum). Indeed, one can verify that the $|\overline{\rho_i^{[1\leftarrow 1]}}\rangle$ features discriminate between structures that are degenerate35 for the on-site power spectrum $|\overline{\rho_i^{\otimes 2}}\rangle$ but cannot discriminate between environments that have the same $|\overline{\rho_i^{\otimes 3}}\rangle$.

FIG. 4.

Construction of MP features from the contraction of pair and neighbor-density terms. The process can be iterated in many different ways (see the supplementary material for details).


Equation (12) can be generalized in different ways. For example, one could compute N = 2 decorated features and contract them,

$$|\rho_{i i_1}^{[\nu\leftarrow\nu_2]\nu_1}\rangle \equiv \sum_{i_2} |\rho_{i i_1 i_2}^{\otimes\nu\nu_1\nu_2}\rangle. \tag{14}$$

These features describe atomic pairs as a sum over descriptors of triplets. They could be used as features to characterize edges and seen as a building block of MP architectures that use bonds as nodes and angles as edges.36 

By also summing over $i_1$, one obtains message-passing ACDC features that contain joint information on two decorated neighbors,

$$|\rho_i^{[\nu\leftarrow(\nu_1,\nu_2)]}\rangle = \sum_{i_1 i_2} |\rho_i^{\otimes\nu}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes |\rho_{i_1}^{\otimes\nu_1}\rangle \otimes |\mathbf{r}_{i_2 i}\rangle \otimes |\rho_{i_2}^{\otimes\nu_2}\rangle. \tag{15}$$

This corresponds to an overall correlation order of $\nu + \nu_1 + \nu_2 + 2$ neighbors around the $i$ atom. More generally, representations that incorporate joint information on multiple neighbors can be built starting from pair MP features—see the supplementary material for more information,

$$|\rho_i^{[\nu\leftarrow\nu_1]}\rangle \otimes |\rho_i^{[\nu'\leftarrow\nu_1']}\rangle = |\rho_i^{[(\nu+\nu')\leftarrow(\nu_1,\nu_1')]}\rangle. \tag{16}$$

Note that the left-hand side avoids the double sum over neighbors in Eq. (15), similar to how the density trick avoids summation over tuples of neighbors by first expanding a neighbor density and then increasing the body order by taking tensor products rather than by explicit higher-order summations.

Given that all of these contracted expressions yield representations that are atom-centered, the process can be iterated. For example, one could first compute a low-order $|\rho_i^{\otimes\nu}\rangle$ descriptor, use it to evaluate the MP representation $|\rho_i^{[\nu\leftarrow\nu']}\rangle$, and then repeat. Depending on how one describes the central atom and the neighbors, the procedure yields different types of representations. Combining atom-centered features on the $i$ atom with MP representations on the neighbors yields

$$|\rho_i^{[\nu\leftarrow[\nu'\leftarrow\nu'']]}\rangle = \sum_{i_1} |\rho_i^{\otimes\nu}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes |\rho_{i_1}^{[\nu'\leftarrow\nu'']}\rangle, \tag{17}$$

while combining MP representations on i and atom-centered neighbor features yields

$$|\rho_i^{[[\nu\leftarrow\nu']\leftarrow\nu'']}\rangle = \sum_{i_1} |\rho_i^{[\nu\leftarrow\nu']}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes |\rho_{i_1}^{\otimes\nu''}\rangle, \tag{18}$$

which can be shown to be equivalent to $|\rho_i^{[\nu\leftarrow(\nu',\nu'')]}\rangle$. Finally, one can combine MP representations on both atoms,

$$|\rho_i^{[[\nu\leftarrow\nu']\leftarrow[\nu''\leftarrow\nu''']]}\rangle = \sum_{i_1} |\rho_i^{[\nu\leftarrow\nu']}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle \otimes |\rho_{i_1}^{[\nu''\leftarrow\nu''']}\rangle, \tag{19}$$

which can also be rewritten in terms of (17). A full expansion of these expressions is given in the supplementary material. Each of these descriptors—as well as countless variations in which primitive features of different body orders, describing atoms, bonds, or general clusters, are combined together—is associated with a tensor product string in the form of Eq. (2). This can be manipulated to improve the efficiency of the calculation or simply used to characterize the overall body order and the type of correlations included in the architecture.

The discussion thus far focuses on the case of equivariant MP representations that aim to provide a complete basis for symmetric functions of the neighbor positions. Arguments similar to those exposed in Eq. (13) make it possible to determine the precise form of the function of interatomic distance vectors that is associated with each type of feature. Even though it is convenient to manipulate these representations in their irreducible O(3)-equivariant form, it may not always be the most convenient form in which to work with them. As long as one exclusively uses tensor products to construct them, the higher-order terms can always be written in terms of these irreducible equivariant representations, which, for instance, are exploited by moment tensor potentials.1 If, however, one takes an arbitrary non-linear function of the equivariant components, the result cannot be separated into irreducible terms. The case of scalar (invariant) components is an exception in that any non-linear function of invariant features is itself invariant. Thus, one can enrich the architecture by applying non-linear functions to the scalar components of the equivariants of any type and order. It should be noted that doing so eliminates the link of ACDCs with a body-ordered linear expansion of the target property. As a simple example, Behler–Parrinello neural networks37 compute atom-centered symmetry functions that are equivalent to $|\overline{\rho_i^{\otimes 1}}\rangle$ and $|\overline{\rho_i^{\otimes 2}}\rangle$ and use them as inputs to a feed-forward neural network, which is essentially a non-linear function with adjustable parameters. This corresponds to a non-linear functional form that can incorporate some, but not necessarily all, higher-order correlations.35

To see what these non-linear terms do, we start from a vector of invariants $\boldsymbol{\xi}(A_i) = \{\xi_q(A_i)\} = \{\langle q|A_i\rangle\}$ (e.g., $\{\langle q|\overline{\rho_i^{\otimes 1}}\rangle\}$, $\{\langle q|\overline{\rho_i^{\otimes 2}}\rangle\}$, or any other representation such as $\{\langle q|\overline{\rho_i^{[1\leftarrow 1]}}\rangle\}$) and apply Eq. (5) with all terms truncated to $\lambda = 0$. This generates all possible products of the initial features, $\langle q_1; q_2|A_i^{\otimes 2}\rangle \equiv \langle q_1|A_i\rangle \langle q_2|A_i\rangle$. In terms of the feature vector, this corresponds to $\{\xi_1^2, \xi_1\xi_2, \xi_2^2, \ldots\}$. Repeating the iteration $k$ times generates all monomials of total order $k$ that combine powers of $\{\xi_q\}$, e.g., $\xi_1^3 \xi_2^2 \xi_9^{k-5}$. Comparing with the power series expansion of an analytical function of $\boldsymbol{\xi}(A)$,

$$f(\boldsymbol{\xi}) = \sum_{k_1 k_2 \ldots} \frac{f^{(k_1,k_2,\ldots)}(0)}{k_1!\, k_2! \cdots}\, \xi_1^{k_1} \xi_2^{k_2} \cdots, \tag{20}$$

one sees that computing $f(\boldsymbol{\xi})$ is equivalent to repeating the $\lambda = 0$-truncated iteration to infinity and projecting the resulting features on the expansion coefficients. Given that $f$ (and hence the coefficients) usually depends on adjustable parameters, architectures that include scalar functions escape the systematic classification that we propose for equivariant frameworks that are entirely based on the angular momentum iteration. Still, the spirit is clear: each non-linear scalar function brings in infinite body-order correlations, which are, however, restricted to $\lambda = 0$ iterations and projected along a single adjustable direction in the infinite body-order feature space. In other terms, scalar non-linearities provide a computationally efficient way to incorporate high-order products of the scalar components of the different types of ACDCs. However, they are not always more expressive than lower-order equivariant models because (1) each non-linear function picks up a single linearly independent component of the high-order correlation space and (2) they entirely miss all the components of the correlation space that arise from the combinations of $\lambda > 0$ equivariants. Figure 5 extends the scheme presented in Fig. 2 to illustrate the hierarchical construction of higher-order features by combining $\nu = 2$ features from one Clebsch–Gordan iteration (5) with $\nu = 1$ features to further increase the correlation order. It also visualizes the process of computing a non-linear function of the first-order invariants $\{\xi_q(A_i)\} = \{\langle q|\overline{\rho_i^{\otimes 1}; 00}\rangle\}$ (light gray). Given that the terms in Eq. (20) correspond to products of $\{\xi_q(A_i)\}$, the function can be seen as a weighted combination (represented with green shading) of the invariants generated as combinations of the first-order invariants (light gray blocks). Evaluating a scalar function cannot incorporate any of the invariants that arise from combinations of $\lambda > 0$ equivariants (darker shades of gray).
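The equivalence can be checked directly: repeating the $\lambda = 0$-truncated product $k$ times generates (redundantly) every degree-$k$ monomial of the input invariants, which is exactly the basis on which the Taylor coefficients of $f$ act in Eq. (20). A sketch:

```python
import numpy as np

def lambda0_products(xi, k):
    """Repeat the lambda = 0-truncated CG iteration k times: each pass
    multiplies the current invariants by the first-order ones, so the
    result enumerates (redundantly) all monomials of total degree k in
    the xi_q. A scalar non-linearity f(xi) then corresponds to one fixed
    linear projection of the full stack of such monomial features."""
    feats = xi.copy()
    for _ in range(k - 1):
        feats = np.outer(feats, xi).flatten()
    return feats
```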

FIG. 5.

The figure extends the scheme in Fig. 2 to illustrate the construction of a hierarchy of invariant features of order ν = 1, 2, 3, … by repeated application of the Clebsch–Gordan (CG) iteration (5). Linearly independent invariants can also arise from intermediate representations with a vectorial, $\lambda > 0$ character. A non-linear function of the ν = 1 invariants selects a single direction within the subspace that is generated exclusively by tensor products of the λ = 0 terms (green dotted line), combined with weights that depend on the details of the non-linear function (green shading).


During the last few years, a class of NNs has been developed that uses rotationally equivariant expressions as its fundamental building blocks, operating on node and edge descriptors that are computed on a basis of spherical harmonics. Models of this type rely heavily on CG iterations analogous to Eq. (5) and therefore have the most direct relation with the MP-ACDC formalism—up to a one-to-one correspondence for frameworks that do not introduce any kind of scalar non-linearity. The essential ingredient is the construction of contracted message-passing terms, akin to Eq. (12). In most frameworks, this operation is split into two parts: the construction of a neighbor sum of messages,

$$|\rho_i^{[0\leftarrow\nu_1]}\rangle = \sum_{i_1} |\rho_{i_1}^{\otimes\nu_1}\rangle \otimes |\mathbf{r}_{i_1 i}\rangle, \tag{21}$$

and the interaction of the contracted message with the central atom,

$$|\rho_i^{[\nu\leftarrow\nu_1]}\rangle = |\rho_i^{\otimes\nu}\rangle \otimes |\rho_i^{[0\leftarrow\nu_1]}\rangle. \tag{22}$$

Given that symmetry considerations greatly restrict the range of permissible operations, the various equivariant frameworks are remarkably similar. The most significant differences lie in the strategy used to avoid the exponential increase in the dimensionality of the tensor-product space and make the NN practically computable. To establish a direct connection, therefore, it is necessary to explicitly describe the basis used to express the various equivariant terms. Both $|\rho_i^{\otimes\nu}\rangle$ and $|\rho_{i_1}^{\otimes\nu}\rangle$ can be written in their irreducible form (we neglect the parity index, for simplicity) and enumerated by a feature index $q$, e.g., $\langle q|\overline{\rho_i^{\otimes\nu}; lm}\rangle$. For $\nu = 1$, the $q$ index corresponds to radial functions that discretize the spherically averaged distribution of neighbors. The pair vector term can also be discretized in an irreducible equivariant form as $\langle q|\overline{\mathbf{r}_{i_1 i}; lm}\rangle = \langle r_{i_1 i}|q\rangle\, \langle\hat{\mathbf{r}}_{i_1 i}|lm\rangle$. Thus, Eq. (21) can be written as

$$\langle q_1 l_1; q_2 l_2|\overline{\rho_i^{[0\leftarrow\nu_1]};\lambda\mu}\rangle = \sum_{i_1} \sum_{m_1 m_2} \langle q_1|\overline{\rho_{i_1}^{\otimes\nu_1}; l_1 m_1}\rangle\, \langle q_2|\overline{\mathbf{r}_{i_1 i}; l_2 m_2}\rangle\, \langle l_1 m_1; l_2 m_2|\lambda\mu\rangle, \tag{23}$$

following Eq. (5). In practical implementations, one can either perform the sum over the neighbors first (as done in tensor field networks10) or perform the CG product first, as we do here. The interaction layer (22) adds one more feature index and two more angular momentum indices, e.g., $\langle q_1 l_1; q_2 l_2; q_3 l_3|\overline{\rho_i^{[1\leftarrow 1]};\lambda\mu}\rangle$. Iterating Eqs. (21) and (22) exponentially increases the size of the feature vector. This is precisely the same issue that is observed in high-order ACDC frameworks2,3 and is usually addressed by a contraction/selection of the features based on either heuristic38 or data-driven28,39 considerations.
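Reusing the `cg_coupling` helper sketched after Eq. (5), the neighbor sum of Eq. (23) becomes a few lines; this assumes that all neighbors share the same block structure (same l channels and feature counts).

```python
import numpy as np
# `cg_coupling` is the Eq. (5) helper sketched earlier in the text.

def message_sum(rho_neighbors, pair_terms, lam):
    """Neighbor sum of Eq. (23): couple each neighbor's equivariants
    with the corresponding discretized pair term and accumulate over i1.
    Both arguments are lists (one entry per neighbor) of dicts keyed by l,
    as in `cg_coupling`."""
    return np.sum([cg_coupling(rho, rij, lam)
                   for rho, rij in zip(rho_neighbors, pair_terms)], axis=0)
```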

In the context of equivariant NNs, there are two main approaches to alleviate the exponential increase in the dimensionality of descriptors. The first, delayed approach21,28 involves linear contraction steps applied after each CG iteration. In practice, Eq. (23) is followed by

$$\langle\tilde{q}|\overline{A;\lambda\mu}\rangle = \sum_{q_1 l_1 q_2 l_2} \langle\tilde{q};\lambda|q_1 l_1; q_2 l_2\rangle\, \langle q_1 l_1; q_2 l_2|\overline{A;\lambda\mu}\rangle, \tag{24}$$

where the contraction coefficients $\langle\tilde{q};\lambda|q_1 l_1; q_2 l_2\rangle$ depend on the equivariant channel $\lambda$ and can be determined in an unsupervised manner [e.g., through principal component analysis (PCA)28] or are simply coefficients in the network, optimized by backpropagation. The second, anticipated approach involves applying a linear transformation to the descriptors before performing the CG iteration. First, one computes

$$\langle\tilde{q};t|\overline{\rho_i^{\otimes\nu};\lambda\mu}\rangle = \sum_q \langle\tilde{q};\lambda;t|q\rangle\, \langle q|\overline{\rho_i^{\otimes\nu};\lambda\mu}\rangle \tag{25}$$

and

$$\langle\tilde{q};t|\overline{\mathbf{r}_{i_1 i};\lambda\mu}\rangle = \sum_q \langle\tilde{q};\lambda;t|q\rangle'\, \langle q|\overline{\mathbf{r}_{i_1 i};\lambda\mu}\rangle. \tag{26}$$

Different transformations can be applied at different points in the network, as emphasized by the additional index $t$ in the contraction coefficients $\langle\tilde{q};\lambda;t|q\rangle$ and $\langle\tilde{q};\lambda;t|q\rangle'$, and the operation projects the two terms into spaces of equal dimensionality even if initially they are not. Then, Eq. (23) can be expressed in a more efficient way that does not involve a full tensor product,

$$\langle\tilde{q};l_1;l_2;t|\overline{\rho_i^{[0\leftarrow\nu_1]};\lambda\mu}\rangle = \sum_{i_1} \sum_{m_1 m_2} \langle l_1 m_1; l_2 m_2|\lambda\mu\rangle\, \langle\tilde{q};t|\overline{\rho_{i_1}^{\otimes\nu_1}; l_1 m_1}\rangle\, \langle\tilde{q};t|\overline{\mathbf{r}_{i_1 i}; l_2 m_2}\rangle. \tag{27}$$

Note that, in principle, this form does not entail loss of generality because one could write the linear transformations (25) and (26) as a projection in a higher dimensional space, reproducing the full tensor product, which would, however, defeat the purpose of an anticipated transformation.
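Both contraction styles reduce to a linear projection per equivariant channel. A minimal sketch of the unsupervised (PCA-based) variant of Eq. (24), applied to one $\lambda$ channel flattened over its $\mu$ components; the choice of PCA is one option among those mentioned above, and the projection is left uncentered so that it preserves the equivariance of the projected features.

```python
import numpy as np

def contract_channel(X, n_keep):
    """Linear contraction of one equivariant channel, as in Eq. (24).
    X: (n_samples, n_features * (2 lam + 1)), flattened over mu.
    The projection uses the leading right-singular vectors of the batch;
    in a network, the same coefficients would instead be optimized by
    backpropagation."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_keep].T  # keep the n_keep leading components
```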

Having expressed the basic building blocks of equivariant neural networks in our notation, we can draw a direct correspondence to different frameworks. In the language used in tensor field networks,10 Eq. (27) describes a pointwise convolution layer, also incorporating a self-interaction layer that corresponds to (25). The contracted pair term $\langle\tilde{q};t|\overline{\mathbf{r}_{i_1 i};\lambda\mu}\rangle$ corresponds to the rotation-equivariant filter, given that an adaptive linear combination of a fixed radial basis has the same expressive power as a learned radial function—which is used in NequIP,23 an atom-based implementation of TFN with an architecture optimized for the construction of interatomic potentials.

All the on-site operations discussed up until now [both the delayed (24) and anticipated (25)] correspond to a linear change in basis and do not increase the expressive power of the descriptor. One can, however, design on-site terms that increase the correlation order relative to a tagged atom, e.g.,

$$\langle q_1 l_1; q_2 l_2|\overline{\rho_i^{\otimes(\nu_1+\nu_2)};\lambda\mu}\rangle = \sum_{m_1 m_2} \langle l_1 m_1; l_2 m_2|\lambda\mu\rangle\, \langle q_1|\overline{\rho_i^{\otimes\nu_1}; l_1 m_1}\rangle\, \langle q_2|\overline{\rho_i^{\otimes\nu_2}; l_2 m_2}\rangle. \tag{28}$$

This kind of CG iteration—which is the same as Eq. (5) that underlies ACDCs—is used in Clebsch–Gordan nets,21 where it is combined with delayed contractions to keep the computational cost manageable. Other architectures, such as Cormorant,22 combine all three discussed layers. On top of these, SE(3) equivariant transformers11 modify the pointwise convolutional layer by incorporating an attention mechanism,

$$\langle q|\overline{\rho_i^{[0\leftarrow\nu]};\lambda\mu};\alpha\rangle = \sum_{m m'} \langle l m; l' m'|\lambda\mu\rangle \sum_{i_1} \alpha_{i_1 i}\, \langle q|\overline{\rho_{i_1}^{\otimes\nu}; l m}\rangle\, \langle q; l|\overline{\mathbf{r}_{i_1 i}; l' m'}\rangle, \tag{29}$$

where the scalar terms $\alpha_{i_1 i}$ indicate the cross-attention coefficients.40 These terms are built by first constructing invariant combinations of the covariant query vector related to the central atom $i$ and the key vectors associated with the neighbor atoms $i_1$, which are then combined with a non-linear softmax operation. Since the attention mechanism is not linear, each layer incorporates terms of up to infinite body order, as discussed in Sec. II C, and so, the indices $\nu$ cannot be rigorously interpreted as trackers of the body order of the descriptor.

To understand the role of non-linearities and how they can—at least, in principle—be understood in the language of MP-ACDC, consider first the case of neural networks based on invariant, scalar attributes of the nodes and edges of the graph, which constitute the earlier, and better established, message-passing and graph-convolutional neural networks for molecules. A general invariant message-passing architecture, as formalized by Gilmer et al.,12 amounts to steps in which node features $h_i$ and edge features $e_{i i_1}$ are combined based on the construction of messages,

$$m_i^{t+1} = \sum_{i_1 \in A_i} M_t\big(h_i^t, h_{i_1}^t, e_{i i_1}\big), \tag{30}$$

and their aggregation to update the node features,

$$h_i^{t+1} = U_t\big(h_i^t, m_i^{t+1}\big). \tag{31}$$

The linear limit of this architecture can be described as a special case of the MP-ACDC. The node features $h_i^t$ can be taken to be any set of scalar features. At the simplest level, one can take $\nu = 1$ invariants, i.e., projections of the interatomic distances on a radial basis, which can represent any function of the distance. Thus, the linear message function $M_t$ corresponds to the two-center ACDC, restricted to a tensor product of site and edge invariants,

$$|\overline{\rho_{i i_1}^{\otimes 11}};\, l_{\max}{=}0\rangle = |\overline{\rho_i^{\otimes 1}}\rangle \otimes |\overline{\rho_{i_1}^{\otimes 1}}\rangle \otimes |r_{i i_1}\rangle, \tag{32}$$

appropriately discretized on a radial basis. The contraction over $i_1$ yields the MP-ACDC (again, restricted to invariant features),

$$|\overline{\rho_i^{[1\leftarrow 1]}};\, l_{\max}{=}0\rangle = \sum_{i_1} |\overline{\rho_{i i_1}^{\otimes 11}};\, l_{\max}{=}0\rangle, \tag{33}$$

while the node update simply increases the body order of the on-site descriptor,

$$|\rho_i^{\otimes\nu}\rangle \otimes |\rho_i^{[\nu_1\leftarrow\nu_2]}\rangle = |\rho_i^{[(\nu_1+\nu)\leftarrow\nu_2]}\rangle. \tag{34}$$

The general, non-linear case amounts to using the pair features $\xi_j(i i_1) = \langle j|\overline{\rho_{i i_1}^{\otimes 11}};\, l_{\max}{=}0\rangle$ as the inputs for a set of scalar functions. If one wants to keep a linear formalism, an arbitrary non-linear message function $M_t$ can be approximated by a linear fit using tensor products that approximate the high-dimensional scalar correlation space, as discussed in Sec. II C. The entries of $m_i^{t+1}$ could be written explicitly as

$$\langle q|\overline{\rho_i^{[1\leftarrow 1]}};\, l_{\max}{=}0; M_t\rangle = \sum_{i_1} M_{q,t}\big(\boldsymbol{\xi}(i i_1)\big) = \sum_{i_1} \sum_{k_1 k_2 \ldots} M_{q,t}^{k_1,k_2,\ldots}\, [\xi_1(i i_1)]^{k_1} [\xi_2(i i_1)]^{k_2} \cdots, \tag{35}$$

setting $M_{q,t}^{k_1,k_2,\ldots}$ to the appropriate values to match the series expansion of $M_{q,t}$. Even though Eq. (35) shows how to formulate the message in terms of MP-ACDC, it is clearly an impractical way of evaluating a prescribed non-linear function of $\boldsymbol{\xi}(i i_1)$. Similar considerations apply to the update functions, Eq. (31). In the non-linear case, $U_t$ also introduces higher products of $|\rho_i^{[\nu\leftarrow\nu']};\, l_{\max}{=}0\rangle$ features and therefore is akin to the calculation of representations that combine joint information of multiple neighbors, e.g., Eq. (15). The restriction to $l_{\max} = 0$, however, reduces the expressive power of the combination relative to a full tensor product.

We have demonstrated the direct connections between the MP-ACDC formalism and many established invariant and equivariant neural networks. There are several directions along which the framework can be extended with qualitative connections to recently proposed neural network architectures that attempt to achieve a better balance between the simplicity of invariant networks and the accuracy of their equivariant counterparts.

One obvious way to increase the expressive power of invariant deep learning models is to start from higher-order scalar descriptors—or to incorporate them at later stages of the iteration. This has long been known for NN potentials based on atom-centered symmetry functions,37,41 which have included three-body descriptors from the very beginning. Angular information can be readily included in an ACDC through the discretized two-center invariant $|\overline{\rho_{i i_1 i_2}^{\otimes 0}}\rangle$ [see Eq. (9)] as well as through its contracted pair $|\overline{\rho_{i i_1}^{\otimes 1}}\rangle$ or atom-centered $|\overline{\rho_i^{\otimes 2}}\rangle$ counterparts—the latter corresponding to the SOAP power spectrum. Several works36,42 have reported substantial improvements in the performance of ML models when incorporating information on interatomic angles into message-passing frameworks. REANN43 is a particularly interesting example that iteratively combines three-body invariant features weighted by coefficients describing the corresponding pair of neighbors—essentially the scalar equivalent of the MP contraction over two decorated neighbors (15)—overcoming some of the limitations of both distance-based MP frameworks and atom-centered three-body invariants.

Returning to equivariant NNs, non-linearities can be included in a spirit similar to that of invariant message-passing schemes, increasing the effective correlation order without the cost of a full CG iteration. A simple way to introduce a non-linear term without interfering with the linear nature of O(3) covariance involves combining equivariant features with a scaling computed from invariant non-linear terms, e.g.,

$$\langle q|\overline{A;\lambda\mu}; f\rangle = f\big(\{\langle q'|\overline{A;00}\rangle\}\big)\, \langle q|\overline{A;\lambda\mu}\rangle, \tag{36}$$

an idea that was already discussed in the context of ACDCs.2 Most of the recent architectures that extend equivariant NNs incorporate some kind of non-linearity, such as the attention mechanism discussed for SE(3) transformers in Eq. (29). Other noteworthy, recent attempts to improve on equivariant NNs include GemNet44—which introduces a two-hop message-passing scheme that is rotationally equivariant through directional representations of interatomic separations defined on the S2 sphere rather than on the SO(3) group, resulting in spherical convolution layers that depend on angles between interatomic separation vectors; PaiNN45—which propagates directional information in Cartesian space, reducing the complexity associated with the construction of a complete equivariant basis set, and can be mapped to MP-ACDCs with inputs restricted to $l_{\max} = 1$; and SEGNN46—which extends the equivariant (steerable) node features to include spherical decompositions of tensors representing physical/geometric quantities, performing contractions akin to Eq. (5) with the messages in addition to incorporating non-linearities of the form of Eq. (36) in the message and node-update functions. Further developments of the MP-ACDC formalism may be needed to characterize these extensions systematically, particularly in regard to the use of non-linear terms.
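A minimal sketch of the gated non-linearity of Eq. (36), assuming features are stored per $\lambda$ channel with a matching feature index, so that each equivariant vector is scaled by a non-linear function of its own invariant channel (one simple choice among those the equation allows):

```python
import numpy as np

def gated_equivariants(feats, f=np.tanh):
    """Eq. (36): multiply each lambda > 0 block by a non-linear function
    of the invariant (lambda = 0) channel, which preserves O(3)
    covariance. `feats` maps lambda to arrays of shape (n_q, 2*lam + 1);
    a shared feature index across channels is assumed here."""
    gate = f(feats[0][:, 0])  # invariant channel, shape (n_q,)
    return {lam: x if lam == 0 else gate[:, None] * x
            for lam, x in feats.items()}
```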

We choose three simple examples to illustrate the behavior of the MP representations in the context of linear models (details are given in the supplementary material). We emphasize that we deliberately chose simple examples and small datasets as we do not intend to perform a benchmark exercise. In the following examples, we compare the behavior of different types of representations and gather some insights on their strengths and shortcomings when used in a controlled setting.

To test the convergence of the body-order expansion, we use a dataset of 40 000 random CH4 configurations48 with energies computed at the density functional theory (DFT) level. We discretize the neighbor density on $n_{\max} = 6$ optimal radial functions49 and $l_{\max} = 4$ angular channels. We set $r_{\text{cut}} = 3.5$ Å so that all atoms are included in each neighborhood.50 To reduce the complexity of the tests and eliminate possible confounding factors, we compute the full invariant correlation features up to ν = 3 and the $|\overline{\rho_i^{[1\leftarrow 1]}}\rangle$ MP representations without any data-driven28 or heuristic contraction. An interesting observation in Ref. 28 is the fact that linear models based on C-centered ν = 4 features—which should be complete in terms of expanding a symmetric function of the coordinates of the four H atoms—show saturation of the test error at around 4% RMSE (root mean square error). Models based on ν = 3 features saturate at 8% RMSE. This saturation is related to the truncation of the basis and is even more evident here, where we use smaller ($n_{\max}$, $l_{\max}$) to enumerate the full one-center-three-neighbor correlations, and the ν = 3 model cannot improve beyond 16% RMSE (Fig. 6).

FIG. 6.

Learning curves for the atomization energy of random CH4 configurations for different C-centered features (solid lines) and for multi-center models using both C and H-centered features (dashed lines) averaged over five random subselections of train-test sets. Shaded areas (error bars) indicate the standard deviation of the five subselections for C (C+H) models.


MP representations [1 ← 1] perform dramatically better, reaching a 7% RMSE. This appears to be due to a better convergence of the discretization rather than to a modulation of the weight of contributions at different distances, which does change the accuracy of the model (see the supplementary material) but not the relative performance of ν = 3 and MP features. Using a multi-center model (Fig. 6, dashed curves) almost entirely eliminates the advantage of MP features over the bispectrum, consistent with the interpretation that the H-centered density provides a faster-converging description of the interaction between pairs of hydrogen atoms that are far from C.28 However, simply having H-centered features is not sufficient to obtain a very accurate linear model. Even though ν = 2 features also yield more accurate predictions when using a multi-center model, the improvement is less dramatic than for ν = 3 ACDCs, which in the multi-center case are almost equivalent to the MP features. Even in this very simple test problem, the order of the representation, the convergence of the basis, and the use of single-centered or multi-centered models are interconnected.

An alternative test to quantify the relative information content of different representations, independent of the regression target, involves computing the error one makes when using one type of features ξ to predict a second type ξ′ for the same structures. This construction, recently introduced as the global feature-space reconstruction error (GFRE),47 yields a number close to one when the target features ξ′ are linearly independent of ξ and close to zero when they can be predicted well. The GFRE is not symmetric: if GFRE(ξ, ξ′) ≪ GFRE(ξ′, ξ), one can say that ξ is more informative than ξ′, and vice versa. Figure 7 demonstrates that all four-body representations allow predicting, almost perfectly, ν = 2 features and that message-passing ACDCs [1 ← 1] are more informative than their single-center ν = 3 counterpart even though the bispectrum does contain a small amount of complementary information, given that GFRE([1 ← 1], ν = 3) is small but non-zero. These results provide further evidence to support the hypothesis that—for a finite basis—MP representations may provide a more converged description of interatomic correlations.
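A minimal sketch of the GFRE under these definitions; the ridge regularization and the omission of a train/test split are simplifications relative to Ref. 47, which also standardizes the features.

```python
import numpy as np

def gfre(xi, xi_prime, ridge=1e-8):
    """Global feature-space reconstruction error GFRE(xi, xi'):
    linearly (ridge-)regress xi' on xi and report the normalized
    residual -- close to 0 when xi' is linearly recoverable from xi,
    close to 1 when it is linearly independent."""
    X = xi - xi.mean(axis=0)
    Y = xi_prime - xi_prime.mean(axis=0)
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return np.linalg.norm(Y - X @ W) / np.linalg.norm(Y)
```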

FIG. 7.

GFRE(ξ, ξ′) between different types of C-centered features, i.e., the fractional error made when using features ξ(A) to linearly predict features ξ′(A) for the same configurations.47 


A separate issue is the ability of MP features to incorporate long-range correlations, which we investigate by considering a dataset of 2000 random NaCl bulk configurations51 with energies computed as the bare Coulomb interactions between ±e charges. This system entails very long-ranged, but perfectly two-body, interactions. Thus, $|\overline{\rho_i^{\otimes 1}}\rangle$ invariant features can learn the electrostatic energy perfectly well although a very large cutoff is necessary to reduce the error below 10% RMSE [Fig. 8(a)]. Frameworks that explicitly incorporate a long-range component51,52 provide a more efficient approach to achieve high-accuracy models.
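The two-body statement can be illustrated in a few lines: a ν = 1 linear model is a fit of a pair potential on a radial basis, so it can reproduce 1/r essentially exactly inside the cutoff while contributing nothing beyond it; the Gaussian bins below are a hypothetical stand-in for the optimized radial functions used in the text.

```python
import numpy as np

# Fit 1/r on a Gaussian radial basis over the range sampled by the pairs.
r = np.linspace(2.5, 6.0, 500)                  # pair distances (angstrom)
centers = np.linspace(2.5, 6.0, 32)
basis = np.exp(-((r[:, None] - centers[None, :]) / 0.25) ** 2)
w, *_ = np.linalg.lstsq(basis, 1.0 / r, rcond=None)
print(np.abs(basis @ w - 1.0 / r).max())        # small residual inside r_cut
```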

FIG. 8.

(a) Convergence of the ntrain = 1800 validation error for the electrostatic energy of randomized NaCl structures as a function of rcut. The curves correspond to different types of features, following the key in the legend. Shading indicates the standard deviation over three random splits. (b) Predicted energy of a single NaCl pair surrounded by inert atoms as a function of the distance for linear models built on rcut = 6 Å features. The line and the shading indicate the conditionally averaged mean of predictions for a given distance and a one-standard-deviation range around it. The various models yield a test-set RMSE of 0.028 a.u. (NaCl, ν = 1), 0.025 a.u. (ν = 1), 0.019 a.u. (ν = 2), 0.017 a.u. (ν = 3), and 0.016 a.u. ([1 ← 1]).


It is often stated that ν ≥ 2 features contain information on atoms that are up to $2r_{\text{cut}}$ apart.53,54 This is formally true if the two atoms are simultaneously neighbors of a third [Fig. 9(a)]. Similarly, [1 ← 1] MP features can, in principle, describe correlations up to $3r_{\text{cut}}$. Message-passing schemes are often cited as a way to incorporate long-range physics in atomistic ML, and some promising results have been shown for condensed-phase applications.43 However, Fig. 8(a) clearly shows that—at least in combination with a linear model—increasing the range of interactions indirectly is much less effective than just using a large $r_{\text{cut}}$ with two-body features (ν = 1).

FIG. 9.

(a) Schematic of a configuration for which a ν = 2 (or [1 ← 1]) representation can encode information on a distance $r > r_{\text{cut}}$; (b) to have information on the pair, the central (and the decorated neighbor) atoms must lie in the very narrow regions highlighted in the figure. (c) An example of how the same distance can be represented by multiple three-body terms. Regressing E(r) in terms of the χ-centered contributions, ϵ1 and ϵ2, leads to an over-determined system.


To better understand this observation, we modify the structures by (1) removing periodic boundary conditions and (2) leaving only one NaCl pair, turning all other atoms into inert dummy particles, which we label χ and assume not to contribute to the target energy. Thus, the energy of each structure is just $E(A) = -1/r_{\text{NaCl}}$. A model that uses ν = 1 features that describe Na and Cl, completely ignoring the spectator atoms, yields the expected behavior, fitting the target almost exactly until $r = r_{\text{cut}}$ and then becoming constant [ν = 1, NaCl in Fig. 8(b)].

One may then investigate what happens when dummy atoms are included in the model. At first, this may seem unnecessary: χ particles do not contribute to the energy, and in this dataset, their positions are almost random. However, dummy particles do provide additional information about the Na and Cl atoms that are in their environments—e.g., reporting on the relative positions of the ions when they are farther apart than $r_{\text{cut}}$. Simply including dummy atoms within ν = 1 features gives the predicted E(r) curve a small slope for $r > r_{\text{cut}}$: this is because the atom distribution is not entirely random; there is an "exclusion zone" of 2.5 Å around each atom, and so, information on the relative position of χ around Na indirectly indicates the possible presence of a Cl atom outside the cutoff. Even though the validation-set RMSE decreases overall—as χ-centered features do indeed incorporate usable information—the quality of the fit at short r degrades. Features with ν ≥ 2 (which in theory contain enough information to precisely determine the relative position of Na and Cl up to $2r_{\text{cut}}$) improve the accuracy beyond $r_{\text{cut}}$, but there is a large spread around the target even though all these results are obtained in the large $n_{\text{train}}$ limit.

We believe there are two reasons why the usual argument that three-body features extend the range of interactions is too simplistic. First, to have some information on $r_{\text{NaCl}}$, the dummy particles must be placed in a very precise region, which becomes increasingly unlikely as r approaches $2r_{\text{cut}}$ [Fig. 9(a)]. Similar arguments apply to [1 ← 1] MP features [Fig. 9(b)], which indeed only marginally increase the accuracy of predictions for large r. Second, using a three-body term to describe the Na–Cl interactions may lead to contradictory scenarios [Fig. 9(c)]. The idea of the counterexample is similar to one of the cases discussed in Ref. 35. Consider three structures having the same Na–Cl distance, larger than $r_{\text{cut}}$, and one or more χ atoms that are sufficiently close to see the two ions within the cutoff while the ions remain outside each other's cutoff. It is then possible to assign the same energy contribution to different Na − χ − Cl geometries. However, if a structure has two dummy atoms simultaneously close to the ion pair, with the same relative distances and angles as in the previous two structures, one arrives at a contradictory, overdetermined regression. Note that, for simplicity, we consider only χ-centered descriptors: adding Na and Cl features would introduce χ − Na − χ′ terms and break the tie, but one can design a more complicated set of structures that would be similarly contradictory. Overall, the analysis of these two toy models suggests that a fully equivariant MP construction is not necessarily an efficient route to describe long-range/low-body-order interactions even though it provides, in principle, a systematically convergent basis to express them.

As a final example, we demonstrate the extrapolative prediction of molecular dipole moments of small organic molecules. The dataset is taken from Ref. 55, where a similar exercise was performed to compare the performance of a model based on atom-centered equivariant contributions with one based on the prediction of atomic point charges. The purpose of this example is two-fold. First, it serves as a demonstration of the symmetry-adapted prediction of a covariant property. Second, given that we train the model on small molecules taken from the QM7 dataset56 and make predictions on slightly larger structures from the QM9 set,57 the example gives an indication of differences in transferability between the various representations.58

Figure 10 shows the learning curves of linear models built on ν = 2, ν = 3, and [1 ← 1] features, both for a validation subset of QM7 and for 1000 larger QM9 molecules. We also compare two different levels of discretization of the density expansion (as for CH4, we do not truncate the expansion, to keep the setup as simple as possible). ν = 2 features perform worse than observed in previous work, which is explained both by the use there of a larger basis and (more importantly) of a non-linear kernel, while the four-body representation performs much better both for QM7 and for extrapolation to QM9, with performance comparable to the non-linear kernel model in Ref. 55. The [1 ← 1] MP-ACDC features are remarkably well-behaved. Even with the very coarse ($n_{\max} = 4$, $l_{\max} = 2$) discretization, learning curves on QM7 do not saturate, and the QM9 extrapolation error also decreases steadily with $n_{\text{train}}$. Despite having restricted our tests to a linear model, the accuracy reached on QM9 is better than that achieved in Ref. 55 with non-linear kernels and a hybrid model using both atomic dipoles and partial charges—even though it must be stressed that the models in Ref. 55 also included sulfur-containing molecules while here we restricted ourselves to CHNO compositions.

FIG. 10.

Learning curves for the prediction of molecular dipole models using a linear equivariant model trained on QM7 structures (restricted to CHNO composition) using different types of representations. (Top) Validation error on 800 hold-out QM7 structures. (Bottom) Error for extrapolative prediction on 1000 larger molecules taken from QM9. All learning curves are averaged over five random train-test partitions of the dataset. Shaded areas (error bars) indicate the standard deviation of the five subselections for nmax, lmax = 6, 3 (4, 2) models.


The formalism we introduce in this work extends the atom-centered density-correlation representations and their link with linear models of functions of interatomic displacements to the case in which information on multiple environments is combined in a similar symmetry-adapted construction. The scheme captures the key ingredients of equivariant message-passing ML architectures, which, however, usually include additional scalar terms that make them more flexible but prevent establishing a clear correspondence with body-ordered expansions. We illustrate the behavior of the simpler MP representations on three elementary examples—the first focusing on the description of short-range body-ordered interactions, the second focusing on long-range properties, and the third focusing on covariant targets and extrapolative power. The examples suggest that MP representations can incorporate information on the position of atoms more efficiently than single-atom-centered features even though this advantage is reduced when using a multi-center decomposition of a global target property. The analysis of a purely electrostatic learning problem provides insight into the ability of MP features to incorporate long-range correlations. The formal observation that correlations between multiple neighbors extend the range of interactions beyond the cutoff appears to be of limited practical relevance. MP-ACDCs also perform very well in a more realistic application to small organic molecules and confirm their better behavior in the limit of a heavily truncated discretization compared to single-center features of the same body order.

On a formal level, we show how invariant and equivariant deep learning frameworks can be interpreted in terms of time-tested atom-centered descriptors. We expect that this formalism will help rationalize and classify geometric message-passing machine-learning schemes, which are often touted as “feature-free” approaches but usually entail the ad hoc construction of heavily engineered architectures. From this point of view, the main challenge is to study systematically the role played by scalar terms—which correspond to infinite correlation order but are restricted to powers of the input invariant features—and the iteration of the MP step, which also requires introducing contractions to limit the size of the feature vectors. A better understanding of the link between different kinds of ML architectures and systematically converging representations and between representations and descriptions of physically motivated components of the targets will help in the construction of more accurate, efficient, and transferable machine-learning models of atomic-scale problems as well as all applications that rely on a geometric description of point clouds.

The supplementary material contains additional derivations and full details on the illustrative models presented in the main text.

J.N. and M.C. acknowledge support by the NCCR MARVEL, funded by the Swiss National Science Foundation (Grant No. 182892), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 101001890-FIAMMA). G.F. and S.P. acknowledge support by the Swiss Platform for Advanced Scientific Computing (PASC). Discussions with Alexander Goscinski and Andrea Grisafi are gratefully acknowledged.

The authors have no conflicts to disclose.

The data that support the findings of this study are openly available in Zenodo.59 All data and scripts used for the examples in Sec. III are available online on the Materials Cloud Archive.60 

1. A. V. Shapeev, Multiscale Model. Simul. 14, 1153 (2016).
2. M. J. Willatt, F. Musil, and M. Ceriotti, J. Chem. Phys. 150, 154110 (2019).
3. R. Drautz, Phys. Rev. B 99, 014104 (2019).
4. F. Musil, A. Grisafi, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, Chem. Rev. 121, 9759 (2021).
5. M. Uhrin, Phys. Rev. B 104, 144110 (2021).
6. M. F. Langer, A. Goeßmann, and M. Rupp, npj Comput. Mater. 8(1), 41 (2022).
7. Y. Zuo, C. Chen, X. Li, Z. Deng, Y. Chen, J. Behler, G. Csányi, A. V. Shapeev, A. P. Thompson, M. A. Wood, and S. P. Ong, J. Phys. Chem. A 124, 731 (2020).
8. M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, arXiv:2104.13478 (2021).
9. P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner et al., arXiv:1806.01261 (2018).
10. N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, arXiv:1802.08219 (2018).
11. F. B. Fuchs, D. E. Worrall, V. Fischer, and M. Welling, arXiv:2006.10503 (2020).
12. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, in International Conference on Machine Learning (PMLR, 2017), pp. 1263–1272.
13. K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nat. Commun. 8, 13890 (2017).
14. T. Xie and J. C. Grossman, Phys. Rev. Lett. 120, 145301 (2018).
15. N. Lubbers, J. S. Smith, and K. Barros, J. Chem. Phys. 148, 241715 (2018).
16. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, J. Chem. Phys. 148, 241722 (2018).
17. O. T. Unke and M. Meuwly, J. Chem. Theory Comput. 15, 3678 (2019).
18. L. David, A. Thakkar, R. Mercado, and O. Engkvist, J. Cheminf. 12, 56 (2020).
19. S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, J. Comput.-Aided Mol. Des. 30, 595 (2016).
20. R. Kondor, H. T. Son, H. Pan, B. Anderson, and S. Trivedi, arXiv:1801.02144 (2018).
21. R. Kondor, Z. Lin, and S. Trivedi, in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2018), Vol. 31, p. 10117.
22. B. Anderson, T.-S. Hy, and R. Kondor, in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2019), pp. 14510–14519.
23. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, arXiv:2101.03164 (2021).
24. Z. Qiao, A. S. Christensen, M. Welborn, F. R. Manby, A. Anandkumar, and T. F. Miller III, arXiv:2105.14655 (2021).
25. J. Nigam, M. J. Willatt, and M. Ceriotti, J. Chem. Phys. 156, 014115 (2022).
26. M. A. Parker and G. A. Parker, Aust. J. Phys. 40, 465 (1987).
27. A derivation for the N = 1 case, which follows closely the ideas in Ref. 1, can be found in Sec. 4.1 of Ref. 4.
28. J. Nigam, S. Pozdnyakov, and M. Ceriotti, J. Chem. Phys. 153, 121101 (2020).
29. A. Glielmo, C. Zeni, and A. De Vita, Phys. Rev. B 97, 184307 (2018).
30. N. Dym and H. Maron, arXiv:2010.02449 (2020).
31. A. Grisafi, D. M. Wilkins, G. Csányi, and M. Ceriotti, Phys. Rev. Lett. 120, 036002 (2018).
32. A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013).
33. J. Behler, J. Chem. Phys. 134, 074106 (2011).
34. K. T. Schütt, M. Gastegger, A. Tkatchenko, and K.-R. Müller, Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer, 2019), pp. 311–330.
35. S. N. Pozdnyakov, M. J. Willatt, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, Phys. Rev. Lett. 125, 166001 (2020).
36. K. Choudhary and B. DeCost, npj Comput. Mater. 7(1), 185 (2021).
37. J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
38. C. van der Oord, G. Dusson, G. Csányi, and C. Ortner, Mach. Learn.: Sci. Technol. 1, 015004 (2020).
39. G. Imbalzano, A. Anelli, D. Giofré, S. Klees, J. Behler, and M. Ceriotti, J. Chem. Phys. 148, 241730 (2018).
40. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2017), Vol. 30.
41. J. S. Smith, O. Isayev, and A. E. Roitberg, Chem. Sci. 8, 3192 (2017).
42. J. Klicpera, J. Groß, and S. Günnemann, arXiv:2003.03123 (2020).
43. Y. Zhang, J. Xia, and B. Jiang, Phys. Rev. Lett. 127, 156002 (2021).
44. J. Klicpera, F. Becker, and S. Günnemann, arXiv:2106.08903 (2021).
45. K. T. Schütt, O. T. Unke, and M. Gastegger, arXiv:2102.03150 (2021).
46. J. Brandstetter, R. Hesselink, E. van der Pol, E. Bekkers, and M. Welling, arXiv:2110.02905 (2021).
47. A. Goscinski, G. Fraux, G. Imbalzano, and M. Ceriotti, Mach. Learn.: Sci. Technol. 2, 025028 (2021).
48. S. Pozdnyakov, M. Willatt, and M. Ceriotti (2020). "Randomly-displaced methane configurations," Materials Cloud, Dataset, https://archive.materialscloud.org/record/2020.110; accessed May 11, 2020.
49. A. Goscinski, F. Musil, S. Pozdnyakov, J. Nigam, and M. Ceriotti, J. Chem. Phys. 155, 104106 (2021).
50. The H atoms are randomly distributed in a sphere of 3 Å around the central carbon.
51. A. Grisafi and M. Ceriotti, J. Chem. Phys. 151, 204105 (2019).
52. A. Grisafi, J. Nigam, and M. Ceriotti, Chem. Sci. 12, 2078 (2021).
53. N. Artrith, T. Morawietz, and J. Behler, Phys. Rev. B 83, 153101 (2011).
54. V. L. Deringer, A. P. Bartók, N. Bernstein, D. M. Wilkins, M. Ceriotti, and G. Csányi, Chem. Rev. 121, 10073 (2021).
55. M. Veit, D. M. Wilkins, Y. Yang, R. A. DiStasio, and M. Ceriotti, J. Chem. Phys. 153, 024113 (2020).
56. G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, New J. Phys. 15, 095003 (2013).
57. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, Sci. Data 1(1), 140022 (2014).
58. We consider exclusively the 6874 QM7 structures that contain only C, H, N, and O atoms and make predictions on 1000 randomly selected QM9 structures, similarly limited to four elemental components.
59. J. Nigam and M. Ceriotti (2022). "Zenodo record," Zenodo.
60. J. Nigam, S. Pozdnyakov, G. Fraux, and M. Ceriotti, "Unified theory of atom-centered representations and message-passing machine-learning schemes," Materials Cloud Archive 2022, 44.
