Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, which are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), which are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes that gather information on the relationship between neighboring atoms using "message-passing" ideas cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates, and provide a coherent foundation to systematize our understanding of both atom-centered and message-passing schemes, in their invariant and equivariant flavors.

## I. INTRODUCTION

The understanding that *ν*-point correlations of the atom density provide a complete linear basis to expand any property of these atom-centered environments^{1–3} sheds light on the relations between different machine-learning (ML) frameworks for atomistic modeling^{4–6} and is driving much progress in obtaining more accurate and performant schemes.^{7} This systematic approach is somewhat disconnected from efforts that combine ideas from geometric ML^{8,9} and point clouds^{10,11} that look at molecules as graphs in which the atoms play the role of nodes and the interatomic distances decorate the edges that connect them^{12–17}—a point of view that has a long-standing tradition in the field of cheminformatics.^{18,19} Message-passing (MP) concepts that describe the propagation of information about the geometric relations between nodes in terms of messages propagated along the edges provide a general framework that encompasses popular architectures, such as graph convolutional neural networks (NN), as special cases. With the rise of equivariant neural networks,^{10,20–24} the similarity between atom-centered and message-passing schemes has become more and more apparent, since both rely on symmetry-adapted constructions based on O(3)-preserving operations. We introduce a formalism to define representations that incorporate graph-based and message-passing ideas, that provide a linear basis to expand symmetric functions of the relative position of atoms (or points in a 3D cloud), and that correspond, in the appropriate limit, to a wide variety of deep learning schemes based on graphs. 
We give examples of the performance of these “message-passing” representations in describing short- and long-range structure–property relations and discuss how this framework provides a unified formalism to describe geometric machine-learning approaches that rely on structured representations,^{9} incorporating inductive biases in the form of atom-centered contributions that are local and equivariant.

## II. THEORY

### A. Atom-centered density correlations

The basic ingredient in the atom-centered density correlation (ACDC) framework^{2} involves localized functions (e.g., Gaussians *g* or Dirac *δ* distributions) centered at the atomic positions **r**_{i}. Following the notation formalized in Sec. 3.1 of Ref. 4, we indicate these atomic contributions as ⟨**x**|**r**_{i}⟩ ≡ *g*(**x** − **r**_{i}) [or *δ*(**x** − **r**_{i})]. The notation is summarized in Table I, and some of the key quantities are illustrated schematically in Fig. 1. Repeated symmetrization and summation over the neighbors of the central atom yield a family of (*N* + 1)-center, (*ν* + 1)-body-order equivariant representations,^{25}

In this expression, |*λμ*⟩ is the vector that tracks the irreducible spherical [SO(3)] representation^{26} and |*σ*⟩ is the parity label. $|\rho_{i i_1 \cdots i_N}^{\otimes\nu}\rangle$ indicates the tensor product of pair terms $|\mathbf{r}_{i_k i}\rangle$, which encode the relative positions $\mathbf{r}_{i_k i} = \mathbf{r}_{i_k} - \mathbf{r}_i$ of the *N* centers around the central atom, and of *ν* of the neighbors found within the *i*-centered environment *A*_{i} [for the derivation, see Ref. 25, where the following is presented as Eq. (5)],

Equation (2) is a highly abstract formulation of the construction (represented schematically in Fig. 1), which can be evaluated, in practice, by projection on any complete basis |*q*⟩. By combining the resulting coefficients with appropriately fitted weights ⟨*y*|*q*⟩, one can approximate any (*N* + 1)-center quantity $y_{i i_1 \ldots i_N}$ that is symmetric with respect to permutations of the labels of the neighbors and equivariant with respect to rigid translations and rotations, e.g.,

To see how, one can consider a real-space basis and a *δ*-like atom density. For an invariant (scalar) property,

where {**x**} is the shorthand for the collection of the continuous indices associated with each interatomic degree of freedom.^{27}

**TABLE I.** Summary of the notation for atom-centered and message-passing density-correlation features.

| Quantity | Definition |
| --- | --- |
| Density peaked at the interatomic spacing $\mathbf{r}_{i_k} - \mathbf{r}_i$ | $\lvert\mathbf{r}_{i_k i}\rangle$, with $\langle\mathbf{x}\rvert\mathbf{r}_{i_k i}\rangle = g(\mathbf{x} - \mathbf{r}_{i_k i})$ |
| Basis of spherical harmonics and radial functions | $\langle nlm\rvert \equiv \int \mathrm{d}\mathbf{x}\,\langle n\rvert x\rangle\langle lm\rvert\hat{\mathbf{x}}\rangle\langle x\hat{\mathbf{x}}\rvert$ |
| Neighbor density around the $i$th atom | $\lvert\rho_i\rangle \equiv \sum_{i_1} \lvert\mathbf{r}_{i_1 i}\rangle$ |
| $\nu$-neighbor ACDC around the $i$th atom | $\lvert\rho_i^{\otimes\nu}\rangle \equiv \lvert\rho_i\rangle \otimes \lvert\rho_i\rangle \otimes \cdots \otimes \lvert\rho_i\rangle$ |
| Symmetrized $\nu$-neighbor ACDC invariant | $\lvert\overline{\rho_i^{\otimes\nu}}\rangle \equiv \int \mathrm{d}\hat{R}\,\hat{R}\,\lvert\rho_i^{\otimes\nu}\rangle$ |
| Symmetrized $\nu$-neighbor ACDC equivariant with SO(3) character $\lambda$ and inversion character $\sigma$ | $\lvert\overline{\rho_i^{\otimes\nu};\sigma;\lambda\mu}\rangle \equiv \int \mathrm{d}\hat{R}\,\hat{R}\lvert\sigma\rangle \otimes \hat{R}\lvert\lambda\mu\rangle \otimes \hat{R}\lvert\rho_i^{\otimes\nu}\rangle$ |
| Multi-center ACDC features describing a cluster $i, i_1, \ldots, i_N$ of $N+1$ atoms | $\lvert\rho_{i i_1 \ldots i_N}^{\otimes\nu\nu_1\ldots\nu_N}\rangle \equiv \lvert\rho_i^{\otimes\nu}\rangle \otimes \lvert\rho_{i_1}^{\otimes\nu_1}\rangle \otimes \lvert\mathbf{r}_{i_1 i}\rangle \otimes \cdots$ |
| Simplest form of an $i$-centered message-passing ACDC representation | $\lvert\rho_i^{\otimes[\nu\leftarrow\nu_1]}\rangle \equiv \sum_{i_1} \lvert\rho_{i i_1}^{\otimes\nu\nu_1}\rangle$ |
| Higher-order MP contraction | $\lvert\rho_i^{\otimes[\nu\leftarrow(\nu_1,\nu_2)]}\rangle \equiv \sum_{i_1 i_2} \lvert\rho_{i i_1 i_2}^{\otimes\nu\nu_1\nu_2}\rangle$ |
| Iterating the MP construction | $\lvert\rho_i^{\otimes[\nu\leftarrow[\nu_1\leftarrow\nu_2]]}\rangle \equiv \sum_{i_1} \lvert\rho_i^{\otimes\nu}\rangle \otimes \lvert\rho_{i_1}^{\otimes[\nu_1\leftarrow\nu_2]}\rangle \otimes \lvert\mathbf{r}_{i_1 i}\rangle$ |


Explicitly evaluating the sum over neighbors entails a cost that scales exponentially with *ν*. However, a density trick eliminates the steep scaling with the number of neighbors by first computing an atom-centered density, |*ρ*_{i}⟩ ≡ *∑*_{j}|**r**_{ji}⟩, and then performing the tensor products.^{4} Similarly, one does not need to compute the integral over rotations explicitly. By expressing the equivariant features in an irreducible coupled basis and discretizing each term in the tensor product on a basis of radial functions ⟨*x*|*nl*⟩ and spherical harmonics $\langle\hat{x}|lm\rangle$, the full set of symmetry-adapted equivariants (1) can be evaluated by combining symmetry-adapted terms in an iterative fashion,^{28}
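The density trick can be illustrated with a minimal numpy sketch. All names and the Gaussian "basis" on a grid of centers are our own illustrative choices, not a basis used in the paper: the point is only that the explicit double loop over neighbor pairs (exponential in *ν*) and the single density expansion followed by one tensor product give identical *ν* = 2 correlations.

```python
import numpy as np

def density_coeffs(neighbors, centers, sigma=0.5):
    """Project the neighbor density sum_j g(x - r_ji) onto a toy grid of
    Gaussian basis functions (a stand-in for a proper discretized basis)."""
    diffs = neighbors[:, None, :] - centers[None, :, :]
    return np.exp(-np.sum(diffs**2, axis=-1) / (2 * sigma**2)).sum(axis=0)

def nu2_naive(neighbors, centers, sigma=0.5):
    """nu=2 correlations by an explicit double sum over neighbor pairs."""
    out = 0.0
    for rj in neighbors:
        for rk in neighbors:
            cj = density_coeffs(rj[None, :], centers, sigma)
            ck = density_coeffs(rk[None, :], centers, sigma)
            out = out + np.outer(cj, ck)
    return out

def nu2_density_trick(neighbors, centers, sigma=0.5):
    """Same quantity via the density trick: expand once, then tensor product."""
    c = density_coeffs(neighbors, centers, sigma)
    return np.outer(c, c)
```

The equality holds exactly because the sum over tuples of neighbors factorizes into a product of single sums.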

The key insight from Eq. (5) is that equivariant features can be combined to yield higher-order equivariants that provide a more complete basis to express structure–property relations. As shown graphically in Fig. 2, the tensor product of two features with symmetry *l*_{1} and *l*_{2} generates equivariant components ranging from *λ* = |*l*_{1} − *l*_{2}| to *l*_{1} + *l*_{2}. One can obtain the same rotational behavior (*λ*) with different combinations of *l*_{1} and *l*_{2}. For instance, the figure shows the *λ* = 1 blocks that can be obtained by combinations of *l*_{1} = 1, *l*_{2} = 1 and *l*_{1} = 0, *l*_{2} = 1 in shades of red, indicating the different symmetry blocks involved in the coupling. Similarly, *λ* = 0 blocks can arise not only from the combination of invariants (*l*_{1} = 0, *l*_{2} = 0) but also from combinations of equivariant components (*l*_{1} = 1, *l*_{2} = 1) and (*l*_{1} = 2, *l*_{2} = 2), shown in darker shades of gray. Expressions such as Eq. (2) provide a notation for a linear basis to approximate properties of (*N* + 1) atomic centers that depend simultaneously on the positions of *ν* neighbors, while Eq. (5) provides a strategy to compute such features iteratively while keeping the irreducible equivariant components separate. References 2, 3, and 29 discuss the universality of this construction based on its equivalence to a complete basis of symmetric polynomials (see also Sec. 4.1 of Ref. 4 for a pedagogic discussion). Similar arguments about approximating equivariant functions with a basis of equivariant polynomials have been used in Ref. 30 to prove the universality of tensor field networks.^{10}
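A single coupling step of this kind can be made concrete with a short sympy/numpy sketch. The helper names are ours, and we assume the standard Condon–Shortley convention for the complex spherical components of a vector; under these assumptions, coupling two *l* = 1 equivariants into a *λ* = 0 block recovers, up to a constant factor, the rotationally invariant dot product of the underlying Cartesian vectors.

```python
import numpy as np
from sympy.physics.quantum.cg import CG

def spherical_components(v):
    """Complex spherical components (m = -1, 0, +1) of a Cartesian vector,
    in the Condon-Shortley convention (an assumption of this sketch)."""
    x, y, z = v
    return {-1: (x - 1j * y) / np.sqrt(2), 0: z + 0j, +1: -(x + 1j * y) / np.sqrt(2)}

def couple(a, b, l1, l2, lam, mu):
    """One Clebsch-Gordan coupling step: combine two equivariants with
    angular momenta l1 and l2 into the (lam, mu) component."""
    out = 0j
    for m1 in range(-l1, l1 + 1):
        m2 = mu - m1
        if abs(m2) <= l2:
            out += float(CG(l1, m1, l2, m2, lam, mu).doit()) * a[m1] * b[m2]
    return out
```

For *l*_{1} = *l*_{2} = 1 and *λ* = 0, the coupled scalar equals −(**a** · **b**)/√3, which is manifestly rotation-invariant.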

As a concrete example, consider the expansion of the atom-centered density in radial functions ⟨*r*|*nl*⟩ and spherical harmonics $\langle\hat{r}|lm\rangle$, ⟨*nlm*|*ρ*_{i}⟩, which can also be interpreted as the *ν* = 1 equivariant ACDC $\langle n|\overline{\rho_i^{\otimes 1}; lm}\rangle$. The two-point correlation can be simply indexed as $\langle nlm; n'l'm'|\rho_i^{\otimes 2}\rangle$, but its elements do not transform as irreducible representations of O(3). We can obtain a more transparent equivariant formulation by using the iteration (5),

which yields the equivariant smooth overlap of atomic positions (*λ*-SOAP) features^{31} with a well-defined parity and SO(3) character. The (*σ* = 0, *λ* = 0) scalar term corresponds to the popular invariant SOAP representation,^{32}

To better understand the broader significance of this construction and its relationship to other geometric ML schemes, it is worth recalling that SOAP invariants encode information on the distances between the central atom and two neighbors *r*_{ji}, *r*_{j′i} as well as on the angle between them *θ*_{jj′i} and could be equally well computed as^{2,4}

where the *G*_{nn′l} are functions that depend on the basis used for the neighbor density expansion and that could, in principle, be chosen to mimic the usual Behler–Parrinello symmetry functions.^{33} Similar expressions can be derived for any scalar contraction of two atom-centered density terms: for instance, the three-center, zero-neighbor representation can be evaluated as

which we will use later to discuss the connection with geometric ML schemes based on angular information.
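The rotational invariance of such scalar contractions can be checked numerically. The sketch below restricts itself to *l* = 1, where the Cartesian components of the unit separation vectors play the role of (real) spherical harmonics, and uses an arbitrary Gaussian radial basis; both choices are illustrative assumptions, not the bases used in the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def l1_coeffs(neighbors, n_max=3):
    """Toy nu=1 equivariants c_n = sum_j R_n(r_j) rhat_j, using the Cartesian
    components of rhat_j as real l=1 spherical harmonics (up to normalization).
    The Gaussian radial functions R_n are an arbitrary illustrative choice."""
    r = np.linalg.norm(neighbors, axis=1)
    rhat = neighbors / r[:, None]
    radial = np.exp(-((r[:, None] - 0.8 * np.arange(1, n_max + 1)) ** 2))
    return radial.T @ rhat                      # shape (n_max, 3)

def power_spectrum(c):
    """p_{nn'} = sum_m c_{nm} c_{n'm}: the l=1 block of a SOAP-like
    power spectrum, invariant under rigid rotations of the environment."""
    return c @ c.T
```

Rotating every neighbor by the same rotation matrix leaves the power spectrum unchanged, since the *m*-index transforms unitarily and is fully contracted.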

### B. Equivariant message-passing representations

The ACDC construction is entirely based on terms centered around a single atom and can be seen as an alternative to message-passing (MP) architectures, which are based on the idea that information on multiple nodes (atoms in this context) and their connectivity (inter-atomic distance vectors) could be combined to provide higher descriptive power. The essential ideas behind MP frameworks can be summarized in terms of a few essential steps, illustrated in Fig. 3: (a) A description of the connectivity of a selected node *i* with each of its neighbors *j* is built by combining information on the neighbor attributes and on the geometry of the connection. (b) Information on all the neighbors of the selected node is combined in a way that is independent of the order by which they are considered. (c) The compound description of the neighborhood is assigned as the new attributes of the central node. Iterating the process propagates information on the neighbors’ neighbors to every node. Graph-convolutional frameworks can be seen as a special case.^{8,9} We refer the reader to Refs. 12 and 34 for an overview of the methodology applied to chemical modeling and to Sec. II E for an explicit comparison with existing MP architectures.

The ACDC framework can be extended to provide a basis for the construction of arbitrarily complex equivariant MP frameworks. First, we introduce the possibility of decorating an (*N* + 1)-center representation with information on any of the centers. Disregarding the symmetry indices and the integral over rotations for simplicity—as discussed above, symmetry-adapted equivariants can be obtained by using Eq. (5) on the tensor-product pattern—we define the decorated atom-density representations,

The (*N* + 1)-center representations (2) are a special case, with *ν*_{k>0} = 0. Given that we encode information on the vectors connecting the central atom *i* with its neighbors $\mathbf{r}_{i_k i}$, the information on the position of the neighbors relative to each other is also formally retained because $\mathbf{r}_{i_{k'} i_k} = \mathbf{r}_{i_{k'} i} - \mathbf{r}_{i_k i}$. As we will show in Sec. III A, for a finite basis, the performance of representations that formally describe the same order of interatomic correlations can depend, in practice, on which pairs of atoms are included explicitly in the description. Taking tensor products of (*N* + 1)-center features amounts to increasing their body order,

where we use an approximate equality because on the left-hand side each $|\mathbf{r}_{i_k i}\rangle$ appears twice. The tensor product $|\mathbf{r}_{i_k i}\rangle \otimes |\mathbf{r}_{i_k i}\rangle$ increases the dimensionality of the left-hand side even though it does not incorporate information on more neighbors or centers.

Simple MP-like representations can be obtained starting from decorated pair (two-center) features and summing over the pair index (Fig. 4),

For the case of *ν*_{1} = 0, the sum simplifies to |*ρ*_{i}⟩; thus, the expression simply evaluates ACDC features of order *ν* + 1. By expanding the tensor product and in analogy with (4), one sees that these features provide a linear basis to expand a symmetrized function of the neighbors of *i* and of each of their neighbors, e.g.,

A basis for O(3)-equivariant predictions can be obtained by computing the tensor product iteratively in the coupled basis, following (5). Except for the (important) subtlety that one of the indices extends over the neighborhood of *i*_{1}, these features contain the same information as the *ν* = 3 ACDC (the bispectrum). Indeed, one can verify that $|\overline{\rho_i^{\otimes[1\leftarrow 1]}}\rangle$ discriminates between structures that are degenerate^{35} for the on-site power spectrum $|\overline{\rho_i^{\otimes 2}}\rangle$ but cannot discriminate between environments that have the same $|\overline{\rho_i^{\otimes 3}}\rangle$.
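Restricted to invariants (and to a toy radial basis, an assumption of this sketch), the message-passing contraction of Eq. (12) is just a neighbor-summed triple product of center, neighbor, and edge features. The sketch also checks the remark above that for *ν*_{1} = 0 (a trivial neighbor decoration) the sum collapses to the density coefficients, simply raising the body order of the on-site descriptor.

```python
import numpy as np

def radial_expansion(rij, n_max=4):
    """Toy radial basis <q|r_i1i>: Gaussians on a distance grid (assumption)."""
    d = np.linalg.norm(rij, axis=-1)
    centers = np.linspace(0.5, 3.0, n_max)
    return np.exp(-((d[:, None] - centers) ** 2))

def mp_feature(center_feats, neighbor_feats, edge_vectors):
    """Invariant-restricted analog of Eq. (12): sum over neighbors i1 of
    center (x) neighbor (x) edge feature products."""
    edge = radial_expansion(edge_vectors)          # (n_neigh, q_edge)
    # sum_{i1} center_q1 * neighbor_{i1,q2} * edge_{i1,q3}
    return np.einsum("a,ib,ic->abc", center_feats, neighbor_feats, edge)
```

With `neighbor_feats` set to a constant (the *ν*_{1} = 0 case), the result factorizes into the center features times the summed edge expansion, i.e., the neighbor density.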

Equation (12) can be generalized in different ways. For example, one could compute *N* = 2 decorated features and contract them,

These features describe atomic pairs as a sum over descriptors of triplets. They could be used as features to characterize edges and seen as a building block of MP architectures that use bonds as nodes and angles as edges.^{36}

By also summing over *i*_{1}, one obtains message-passing ACDC features that contain joint information on two decorated neighbors,

This corresponds to an overall correlation order of *ν* + *ν*_{1} + *ν*_{2} + 2 neighbors around the *i*-atom. More generally, representations that incorporate joint information on multiple neighbors can be built starting from pair MP features—see the supplementary material for more information,

Note that the left-hand side avoids the double sum over neighbors in Eq. (15), similar to how the density trick avoids summation over tuples of neighbors by first expanding a neighbor density and then increasing the body order by taking tensor products rather than by explicit higher-order summations.

Given that all of these contracted expressions yield representations that are *atom-centered*, the process can be iterated. For example, one could first compute a low-order $|\rho_i^{\otimes\nu}\rangle$ descriptor, use it to evaluate the MP representation $|\rho_i^{\otimes[\nu\leftarrow\nu]}\rangle$, and then repeat. Depending on how one describes the central atom and the neighbors, the procedure yields different types of representations. Combining atom-centered features on the *i* atom with MP representations on the neighbors yields

while combining MP representations on *i* and atom-centered neighbor features yields

which can be shown to be equivalent to $|\rho_i^{\otimes[\nu\leftarrow(\nu,\nu)]}\rangle$. Finally, one can combine MP representations on both atoms,

which can also be rewritten in terms of (17). A full expansion of these expressions is given in the supplementary material. Each of these descriptors—as well as countless variations in which primitive features of different body orders, describing atoms, bonds, or general clusters, are combined together—is associated with a tensor-product string in the form of Eq. (2). This can be manipulated to improve the efficiency of the calculation or simply used to characterize the overall body order and the type of correlations included in the architecture.

### C. Scalar nonlinearities

The discussion thus far has focused on equivariant MP representations that aim to provide a complete basis for symmetric functions of the neighbor positions. Arguments similar to those presented for Eq. (13) make it possible to determine the precise form of the function of interatomic distance vectors that is associated with each type of feature. Even though it is convenient to manipulate these representations in their irreducible O(3)-equivariant form, this is not always the most convenient form in which to work with them. As long as one exclusively uses tensor products to construct them, the higher-order terms can always be written in terms of these irreducible equivariant representations, which are exploited, for instance, by moment tensor potentials.^{1} If, however, one takes an arbitrary non-linear function of the equivariant components, the result cannot be separated into irreducible terms. The case of scalar (invariant) components is an exception in that any non-linear function of invariant features is itself invariant. Thus, one can enrich the architecture by applying non-linear functions to the scalar components of the equivariants of any type and order. It should be noted that doing so severs the link between ACDCs and a body-ordered *linear* expansion of the target property. As a simple example, Behler–Parrinello neural networks^{37} compute atom-centered symmetry functions that are equivalent to $|\overline{\rho_i^{\otimes 1}}\rangle$ and $|\overline{\rho_i^{\otimes 2}}\rangle$ and use them as inputs to a feed-forward neural network, which is essentially a non-linear function with adjustable parameters. This corresponds to a non-linear functional form that can incorporate some, but not necessarily all, higher-order correlations.^{35}

To see what these non-linear terms do, we start from a vector of invariants $\boldsymbol{\xi}(A_i) = \{\xi_q(A_i)\} = \{\langle q|A_i\rangle\}$ (e.g., $\{\langle q|\overline{\rho_i^{\otimes 1}}\rangle\}$, $\{\langle q|\overline{\rho_i^{\otimes 2}}\rangle\}$, or any other representation, such as $\{\langle q|\overline{\rho_i^{\otimes[1\leftarrow 1]}}\rangle\}$) and apply Eq. (5) with all terms truncated to *λ* = 0. This generates all possible products of the initial features, $\langle q_1; q_2|A_i^{\otimes 2}\rangle \propto \langle q_1|A_i\rangle\langle q_2|A_i\rangle$. In terms of the feature vector, this corresponds to $\{\xi_1^2, \xi_1\xi_2, \xi_2^2, \ldots\}$. Repeating the iteration *k* times generates all monomials of total order *k* that combine powers of $\{\xi_q\}$, e.g., $\xi_1^3 \xi_2^2 \xi_9^{k-5}$. Comparing with the power series expansion of an analytical function of $\boldsymbol{\xi}(A_i)$, one sees that computing $f(\boldsymbol{\xi})$ is equivalent to repeating the *λ* = 0-truncated iteration to infinity and projecting the resulting features on the expansion coefficients. Given that *f* (and hence the coefficients) usually depends on adjustable parameters, architectures that include scalar functions escape the systematic classification that we propose for equivariant frameworks that are entirely based on the angular momentum iteration. Still, the spirit is clear: each non-linear scalar function brings in infinite body-order correlations, which are, however, restricted to *λ* = 0 iterations and projected along a single adjustable direction in the infinite body-order feature space. In other terms, scalar nonlinearities provide a computationally efficient way to incorporate high-order products of the scalar components of the different types of ACDCs. However, they are not always more expressive than lower-order equivariant models because (1) each non-linear function picks up a single linearly independent component of the high-order correlation space and (2) they entirely miss all the components of the correlation space that arise from combinations of *λ* > 0 equivariants. Figure 5 extends the scheme presented in Fig. 2 to illustrate the hierarchical construction of higher-order features, combining *ν* = 2 features from one Clebsch–Gordan iteration (5) with *ν* = 1 features to further increase the correlation order. It also visualizes the process of computing a non-linear function of the first-order invariants $\{\xi_q(A_i)\} = \{\langle q|\overline{\rho_i^{\otimes 1}; 00}\rangle\}$ (light gray). Given that the terms in Eq. (20) correspond to products of $\{\xi_q(A_i)\}$, the function can be seen as a weighted combination (represented with green shading) of the invariants generated as combinations of the first-order invariants (light-gray blocks). Evaluating a scalar function cannot incorporate any of the invariants that arise from combinations of *λ* > 0 equivariants (darker shades of gray).
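The claim that the *λ* = 0-truncated iteration enumerates all monomials can be checked directly with a minimal sketch (the function name is ours): each step multiplies the running feature vector by every first-order invariant, so after *k* steps every monomial of total order *k* + 1 in the $\xi_q$ appears, ready to be combined with fixed power-series weights.

```python
import numpy as np

def scalar_iteration(xi, features):
    """One lambda=0-truncated 'CG iteration' on invariants: all products of
    the running features with the first-order invariants xi."""
    return np.outer(features, xi).ravel()
```

For instance, starting from $\boldsymbol{\xi} = (2, 3)$ and iterating twice generates the eight order-3 products (with repetitions), including $\xi_1^2 \xi_2 = 12$ and $\xi_2^3 = 27$.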

### D. Relation to equivariant networks

During the last few years, a class of NNs has been developed that is based on the use of rotationally equivariant expressions as their fundamental building blocks, operating on node and edge descriptors that are computed on a basis of spherical harmonics. Models of this type rely heavily on the use of CG iterations analogous to Eq. (5) and therefore have the most direct relation with the MP-ACDC formalism up to a one-to-one correspondence when considering frameworks that do not introduce any kind of scalar non-linearities. The essential ingredient is the construction of contracted message-passing terms, akin to Eq. (12). In most frameworks, this operation is split into two parts: the construction of a neighbor sum of messages,

and the interaction of the contracted message with the central atom,

Given that symmetry considerations greatly restrict the range of permissible operations, the various equivariant frameworks are remarkably similar. The most significant differences lie in the strategy used to avoid the exponential increase in the dimensionality of the tensor-product space and make the NN practically computable. To establish a direct connection, therefore, it is necessary to explicitly describe the basis used to express the various equivariant terms. Both $|\rho_i^{\otimes\nu}\rangle$ and $|\rho_{i_1}^{\otimes\nu}\rangle$ can be written in their irreducible form (we neglect the parity index, for simplicity) and enumerated by a feature index *q*, e.g., $\langle q|\overline{\rho_i^{\otimes\nu}; lm}\rangle$. For *ν* = 1, the *q* index corresponds to radial functions that discretize the spherically averaged distribution of neighbors. The pair vector term can also be discretized in an irreducible equivariant form as $\langle q|\overline{\mathbf{r}_{i_1 i}; lm}\rangle = \langle r_{i_1 i}|q\rangle\langle\hat{r}_{i_1 i}|lm\rangle$. Thus, Eq. (21) can be written as

following Eq. (5). In practical implementations, one can either perform the sum over the neighbors first (as done in tensor field networks^{10}) or, as we do here, perform the CG product first. The interaction layer (22) adds one more feature index and two more angular momentum indices, e.g., $\langle q_1 l_1; q_2 l_2; q_3 l_3 k|\overline{\rho_i^{\otimes[1\leftarrow 1]}; \lambda\mu}\rangle$. Iterating Eqs. (21) and (22) exponentially increases the size of the feature vector. This is precisely the same issue that is observed in high-order ACDC frameworks^{2,3} and is usually addressed by a contraction/selection of the features based on either heuristic^{38} or data-driven^{28,39} considerations.
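A data-driven contraction of this kind can be sketched in a few lines: an unsupervised PCA projection, in the spirit of the contractions cited above (the function name and the toy data are ours). When the product features actually live on a low-dimensional subspace, the contraction is lossless.

```python
import numpy as np

def pca_contraction(features, n_keep):
    """Unsupervised contraction of a (too) large product-feature space:
    project each sample onto its n_keep leading principal components."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    proj = vt[:n_keep]          # contraction coefficients <q~|q1 l1; q2 l2>
    return (features - mean) @ proj.T, proj
```

In a network, the projection matrix would instead be a set of learnable coefficients optimized by backpropagation; the linear-algebra structure is the same.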

In the context of equivariant NNs, there are two main approaches to alleviate the exponential increase in the dimensionality of the descriptors. The first, *delayed* approach^{21,28} involves linear contraction steps applied after each CG iteration. In practice, Eq. (23) is followed by

where the contraction coefficients $\langle\tilde{q};\lambda|q_1 l_1; q_2 l_2\rangle$ depend on the equivariant channel *λ* and can be determined in an unsupervised manner [e.g., through principal component analysis (PCA)^{28}] or are simply coefficients in the network, optimized by backpropagation. The second, *anticipated* approach involves applying a linear transformation to the descriptors *before* performing the CG iteration. First, one computes

and

Different transformations can be applied at different points in the network, as emphasized by the additional index *t* in the contraction coefficients $\langle\tilde{q};\lambda;t|q\rangle$ and $\langle\tilde{q};\lambda;t|q'\rangle$, and the operation projects the two terms into a space of equal dimensionality even if their dimensionalities initially differ. Then, Eq. (23) can be expressed in a more efficient way that does not involve a tensor product,

Note that, *in principle*, this form does not entail loss of generality because one could write the linear transformations (25) and (26) as a projection in a higher dimensional space, reproducing the full tensor product, which would, however, defeat the purpose of an anticipated transformation.
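The relation between the two strategies can be sketched for scalar channels (names and shapes are ours): the anticipated form projects both factors to a common channel index and multiplies elementwise, and any such product can be reproduced by a delayed contraction whose weight matrix has the corresponding rank-one-per-channel structure, illustrating the "no loss of generality, in principle" remark above.

```python
import numpy as np

def full_tensor_then_contract(a, b, W):
    """'Delayed' route: full tensor product over (q1, q2), then a linear
    contraction with coefficients W."""
    tp = np.einsum("p,q->pq", a, b).ravel()
    return W @ tp

def anticipated(a, b, Wa, Wb):
    """'Anticipated' route: project both factors to a common channel index
    first, then take an elementwise product; no full tensor product needed."""
    return (Wa @ a) * (Wb @ b)
```

The anticipated route is cheaper because the tensor-product space is never materialized; the price is that only the structured subset of delayed contractions is reachable.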

Having expressed the basic building blocks of equivariant neural networks in our notation, we can draw a direct correspondence with different frameworks. In the language used in tensor field networks,^{10} Eq. (27) describes a pointwise convolution layer, incorporating also a self-interaction layer that corresponds to (25). The contracted pair term $\langle\tilde{q};t|\overline{\mathbf{r}_{i_1 i};\lambda\mu}\rangle$ corresponds to the rotation-equivariant filter, given that an adaptive linear combination of a fixed radial basis has the same expressive power as a learned radial function—an approach that is used in NequIP,^{23} an atom-based implementation of TFN with an architecture optimized for the construction of interatomic potentials.

All the on-site operations discussed up until now [both the delayed (24) and anticipated (25)] correspond to a linear change in basis and do not increase the expressive power of the descriptor. One can, however, design on-site terms that increase the correlation order relative to a tagged atom, e.g.,

This kind of CG iteration—which is the same as Eq. (5) that underlies ACDCs—is used in Clebsch–Gordan nets,^{21} where it is combined with delayed contractions to keep the computational cost manageable. Other architectures, such as Cormorant,^{22} combine all three discussed layers. On top of these, SE(3) equivariant transformers^{11} modify the pointwise convolutional layer by incorporating an attention mechanism,

where the scalar terms $\alpha_{i_1 i}$ indicate the cross-attention coefficients.^{40} These terms are built by first constructing invariant combinations of the covariant query vector related to the central atom *i* and key vectors associated with the neighbor atoms *i*_{1}, which are then combined through a non-linear soft-max operation. Since the attention mechanism is not linear, each layer incorporates terms up to infinite body order, as discussed in Sec. II C, and so the indices *ν* cannot be rigorously interpreted as trackers of the body order of the descriptor.
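Because the attention coefficients are invariant scalars, the mechanism can be sketched in a few lines. This is a simplified, hypothetical stand-in (dot-product attention over scalar channels, with names of our choosing) rather than the exact SE(3)-transformer construction, but it shows the essential point: the weights are a soft-max over invariant query–key products, hence themselves invariant.

```python
import numpy as np

def attention_coefficients(query, keys):
    """Invariant attention weights alpha_{i1 i}: soft-max over scalar
    query-key products, one score per neighbor i1 (illustrative assumption)."""
    logits = keys @ query                 # one invariant score per neighbor
    w = np.exp(logits - logits.max())     # numerically stable soft-max
    return w / w.sum()
```

Multiplying equivariant messages by these invariant weights preserves their O(3) transformation behavior while making the aggregation non-linear.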

### E. Non-linearities in deep-learning frameworks

To discuss the role of non-linearities and how they can—at least, in principle—be understood in the language of MP-ACDC, let us first consider the case of neural networks based on invariant, scalar attributes of the nodes and edges of the graph, which constitute the earlier, and better established, message-passing and graph-convolution neural networks for molecules. A general invariant message-passing architecture, as formalized by Gilmer *et al.*,^{12} amounts to steps in which node features **h**_{i} and edge features $\mathbf{e}_{ii_1}$ are combined based on the construction of messages,

and their aggregation to update the node features,

The linear limit of this architecture can be described as a special case of the MP-ACDC. The node features $\mathbf{h}_i^t$ can be taken to be any set of scalar features. At the simplest level, one can take *ν* = 1 invariants, i.e., projections of the interatomic distances on a radial basis, which can represent any function of the distance. Thus, the linear message function **M**_{t} corresponds to the two-center ACDC, restricted to a tensor product of site and edge invariants,

appropriately discretized on a radial basis. The contraction over *i*_{1} yields the MP-ACDC (again, restricted to invariant features),

while the node update simply increases the body order of the on-site descriptor,

The general, non-linear case amounts to using the pair features $\xi_j^{(ii_1)} = \langle j|\overline{\rho_{ii_1}^{\otimes 11}}; l_{\max} = 0\rangle$ as the inputs for a set of scalar functions. If one wants to keep a linear formalism, an arbitrary non-linear message function *M*_{t} can be approximated by a linear fit using tensor products that span the high-dimensional scalar correlation space, as discussed in Sec. II C. The entries of $\mathbf{m}_i^{t+1}$ could be written explicitly as

setting $M_{q,t}^{k_1,k_2,\ldots}$ to the appropriate values to match the series expansion of *M*_{q,t}. Even though Eq. (35) shows how to formulate the message in terms of MP-ACDC, it is clearly an impractical way of evaluating a prescribed non-linear function of $\boldsymbol{\xi}^{(ii_1)}$. Similar considerations apply to the update functions, Eq. (31). In the non-linear case, *U*_{t} also introduces higher products of $|\rho_i^{\otimes[\nu\leftarrow\nu']}; l_{\max} = 0\rangle$ features and is therefore akin to the calculation of representations that combine joint information on multiple neighbors, e.g., Eq. (15). The restriction to *l*_{max} = 0, however, reduces the expressive power of the combination relative to a full tensor product.
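The invariant message/update steps can be condensed into a toy implementation (tiny tanh networks as the adjustable functions *M*_{t} and *U*_{t}; all names, shapes, and weights are our own illustrative choices). The only structural requirement inherited from the formalism is that the neighbor aggregation be independent of the order in which neighbors are visited, which the sketch verifies.

```python
import numpy as np

def mlp(x, W1, W2):
    """Tiny feed-forward net: a stand-in for the adjustable M_t / U_t."""
    return np.tanh(x @ W1) @ W2

def mp_step(h, edges, neighbors_of, Wm1, Wm2, Wu1, Wu2):
    """One invariant message-passing step in the style of Gilmer et al.:
    m_i = sum_{i1 in N(i)} M([h_i, h_i1, e_ii1]);  h_i <- U([h_i, m_i])."""
    new_h = []
    for i, nbrs in enumerate(neighbors_of):
        msgs = [mlp(np.concatenate([h[i], h[j], edges[(i, j)]]), Wm1, Wm2)
                for j in nbrs]
        m = np.sum(msgs, axis=0)                       # permutation-invariant
        new_h.append(mlp(np.concatenate([h[i], m]), Wu1, Wu2))
    return np.array(new_h)
```

Reordering the neighbor lists leaves the updated node features unchanged, the discrete analog of the permutation symmetry built into the ACDC sums.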

### F. Extending the framework

We have demonstrated the direct connections between the MP-ACDC formalism and many established invariant and equivariant neural networks. There are several directions along which the framework can be extended with qualitative connections to recently proposed neural network architectures that attempt to achieve a better balance between the simplicity of invariant networks and the accuracy of their equivariant counterparts.

One obvious way to increase the expressive power of invariant deep learning models is to start from higher-order scalar descriptors—or to incorporate them at later stages of the iteration. This has long been known for NN potentials based on atom-centered symmetry functions,^{37,41} which have included three-body descriptors from the very beginning. Angular information can be readily included in an ACDC through the discretized two-center invariant $|\overline{\rho_{ii_1i_2}^{\otimes 0}}\rangle$ [see Eq. (9)] as well as its contracted pair $|\overline{\rho_{ii_1}^{\otimes 1}}\rangle$ or atom-centered $|\overline{\rho_i^{\otimes 2}}\rangle$ counterparts—the latter corresponding to the SOAP power spectrum. Several works^{36,42} have reported substantial improvements in the performance of ML models when incorporating information on interatomic angles into message-passing frameworks. REANN^{43} is a particularly interesting example that iteratively combines three-body invariant features weighted by coefficients describing the corresponding pair of neighbors—essentially the scalar equivalent of the MP contraction over two decorated neighbors (15)—overcoming some of the limitations of both distance-based MP frameworks and atom-centered three-body invariants.

Returning to equivariant NNs, non-linearities can be included in a similar spirit as for invariant message-passing schemes, increasing the effective correlation order without the cost of a full CG iteration. A simple way to introduce a non-linear term without interfering with the linear nature of O(3) covariance involves combining equivariant features with a scaling computed from invariant non-linear terms, e.g.,

an idea that was already discussed in the context of ACDCs.^{2} Most of the recent architectures that extend equivariant NNs incorporate some kind of non-linearity, such as the attention mechanism discussed for SE(3) transformers in Eq. (29). Other noteworthy recent attempts to improve on equivariant NNs include GemNet^{44}—which introduces a two-hop message-passing scheme that is rotationally equivariant through directional representations of interatomic separations defined on the *S*^{2} sphere rather than on the SO(3) group, resulting in spherical convolution layers that depend on angles between interatomic separation vectors; PaiNN^{45}—which propagates directional information in Cartesian space, reducing the complexity associated with the construction of a complete equivariant basis set, and can be mapped to MP-ACDCs with inputs restricted to *l*_{max} = 1; and SEGNN^{46}—which extends the equivariant (steerable) node features to include spherical decompositions of tensors representing physical/geometric quantities, performing contractions akin to Eq. (5) with the messages, in addition to incorporating non-linearities of the form of Eq. (36) in the message and node-update functions. Further developments of the MP-ACDC formalism may be needed to characterize these extensions systematically, particularly in regard to the use of non-linear terms.
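For *λ* = 1 (vector) features expressed in the Cartesian basis, the gated construction can be sketched numerically; the channel count and the tanh gate below are arbitrary illustrative choices, not those of any specific architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Equivariant (lambda = 1, vector) features: n_channels x 3 Cartesian components.
feats = rng.normal(size=(8, 3))

def gated(feats):
    # Channel norms are unchanged by rotations, so any non-linear function
    # of them yields a rotation-invariant gate; scaling by it preserves
    # the equivariant (vector) transformation behavior.
    invariants = np.linalg.norm(feats, axis=1)
    gate = np.tanh(invariants)       # non-linear, invariant scaling
    return gate[:, None] * feats     # still transforms as a vector

# Random proper rotation (QR decomposition, determinant fixed to +1).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1

# Equivariance check: gating the rotated features equals rotating the gated ones.
lhs = gated(feats @ q.T)
rhs = gated(feats) @ q.T
print("max deviation:", np.abs(lhs - rhs).max())
```

The check passes because the gate depends on the features only through rotationally invariant combinations, so the non-linearity never mixes with the O(3) transformation.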

## III. EXAMPLES

We choose three simple examples to illustrate the behavior of the MP representations in the context of linear models (details are given in the supplementary material). We emphasize that we deliberately chose simple examples and small datasets as we do not intend to perform a benchmark exercise. In the following examples, we compare the behavior of different types of representations and gather some insights on their strengths and shortcomings when used in a controlled setting.

### A. Body-ordered interactions

To test the convergence of the body-order expansion, we use a dataset of 40 000 random CH_{4} configurations^{48} with energies computed at the density functional theory (DFT) level. We discretize the neighbor density on *n*_{max} = 6 optimal radial functions^{49} and *l*_{max} = 4 angular channels. We set *r*_{cut} = 3.5 Å so that all atoms are included in each neighborhood.^{50} To reduce the complexity of the tests and eliminate possible confounding factors, we compute the full invariant correlation features up to *ν* = 3 and the $|\overline{\rho_i^{\otimes[1\leftarrow 1]}}\rangle$ MP representations without any data-driven^{28} or heuristic contraction. An interesting observation in Ref. 28 is the fact that linear models based on C-centered *ν* = 4 features—which should be complete in terms of expanding a symmetric function of the coordinates of the four H atoms—show saturation of the test error at around 4% RMSE (root mean square error). Models based on *ν* = 3 features saturate at 8% RMSE. This saturation is related to the truncation of the basis and is even more evident here (Fig. 6), where we use smaller (*n*_{max}, *l*_{max}) to enumerate the full one-center-three-neighbor correlations, and the *ν* = 3 model cannot improve beyond 16% RMSE.

MP representations [1 ← 1] perform dramatically better, reaching a 7% RMSE. This appears to be due to the better convergence of the discretization and not to the modulation of the weight of contributions at different distances, which does change the accuracy of the model (see the supplementary material) but not the relative performance of *ν* = 3 and MP features. Using a multi-center model (Fig. 6, dashed curves) almost entirely eliminates the advantage of MP features over the bispectrum, consistent with the interpretation that the H-centered density provides a faster-converging description of the interaction between pairs of hydrogen atoms that are far from C.^{28} However, simply having H-centered features is not sufficient to obtain a very accurate linear model. Even though *ν* = 2 features also yield more accurate predictions when using a multi-center model, the improvement is less dramatic than for *ν* = 3 ACDCs, which in the multi-center case are almost equivalent to the MP features. Even in this very simple test problem, the order of the representation, the convergence of the basis, and the use of single-centered or multi-centered models are interconnected.

An alternative test to quantify the relative information content of different representations, independent from the regression target, involves computing the error one makes when using one type of features **ξ** to predict a second type **ξ′** for the same structures. This construction, recently introduced as the global feature space reconstruction error (GFRE),^{47} yields a number close to one when the target features **ξ′** are linearly independent from **ξ** and close to zero when they can be predicted well. The GFRE is not symmetric: if GFRE(**ξ**, **ξ′**) ≪ GFRE(**ξ′**, **ξ**), one can say that **ξ** is more informative than **ξ′**, and vice versa. Figure 7 demonstrates that all four-body representations allow predicting, almost perfectly, the *ν* = 2 features and that message-passing ACDCs [1 ← 1] are more informative than their single-center *ν* = 3 counterpart, even though the bispectrum does contain a small amount of complementary information, given that GFRE([1 ← 1], *ν* = 3) is small but non-zero. These results provide further evidence to support the hypothesis that—for a finite basis—MP representations may provide a more converged description of interatomic correlations.
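A minimal sketch of the GFRE construction, assuming a simple ridge regression with a 50/50 train/test split (the regularization strength and splitting protocol are illustrative choices, not the exact recipe of Ref. 47):

```python
import numpy as np

rng = np.random.default_rng(2)

def gfre(F, F_prime, reg=1e-8):
    """Regress the target features F' linearly from F on a training split and
    return the normalized reconstruction error on a held-out split."""
    n = F.shape[0]
    train, test = np.arange(n // 2), np.arange(n // 2, n)
    # standardize both feature sets using training-set statistics
    F = (F - F[train].mean(0)) / F[train].std(0)
    Fp = (F_prime - F_prime[train].mean(0)) / F_prime[train].std(0)
    # ridge solution for the linear map F -> F'
    W = np.linalg.solve(F[train].T @ F[train] + reg * np.eye(F.shape[1]),
                        F[train].T @ Fp[train])
    return np.linalg.norm(F[test] @ W - Fp[test]) / np.linalg.norm(Fp[test])

X = rng.normal(size=(400, 10))
Y_dependent = X @ rng.normal(size=(10, 5))    # exact linear function of X
Y_independent = rng.normal(size=(400, 5))     # unrelated random features

print("GFRE(X, linear fn of X):", gfre(X, Y_dependent))
print("GFRE(X, independent):   ", gfre(X, Y_independent))
```

The first value is close to zero (the target is linearly reconstructable), while the second is close to one, reproducing the two limiting behaviors described above.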

### B. Electrostatic interactions

A separate issue is the ability of MP features to incorporate long-range correlations, which we investigate considering a dataset of 2000 random NaCl bulk configurations^{51} with energies computed as the bare Coulomb interactions between ±*e* charges. This system entails very long-ranged, but perfectly two-body, interactions. Thus, $|\overline{\rho_i^{\otimes 1}}\rangle$ invariant features can learn the electrostatic energy perfectly well, although a very large cutoff is necessary to reduce the error below 10% RMSE [Fig. 8(a)]. Frameworks that explicitly incorporate a long-range component^{51,52} provide a more efficient approach to achieve high-accuracy models.

It is often stated that *ν* ≥ 2 features contain information on atoms that are up to 2*r*_{cut} apart.^{53,54} This is formally true if the two atoms are simultaneously neighbors of a third [Fig. 9(a)]. Similarly, [1 ← 1] MP features can, in principle, describe correlations up to 3*r*_{cut}. Message-passing schemes are often cited as a way to incorporate long-range physics in atomistic ML, and some promising results have been shown for condensed-phase applications.^{43} However, Fig. 8(a) clearly shows that—at least in combination with a linear model—increasing the range of interactions indirectly is much less effective than just using a large *r*_{cut} with two-body features (*ν* = 1).

To better understand this observation, we modify the structures by (1) removing periodic boundary conditions and (2) leaving only one NaCl pair, turning all other atoms into inert dummy particles, which we label *χ* and assume not to contribute to the target energy. Thus, the energy of each structure is just *E*(*A*) = −1/*r*_{NaCl}. A model that uses *ν* = 1 features that describe Na and Cl, completely ignoring the spectator atoms, yields the expected behavior, fitting the target almost exactly up to *r* = *r*_{cut} and then becoming constant [*ν* = 1, NaCl in Fig. 8(b)].
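The target of this stripped-down exercise can be generated by a direct pair sum, since the configurations are finite and non-periodic; a minimal sketch, in units where e²/(4πε₀) = 1 and with an arbitrary illustrative geometry:

```python
import numpy as np

def coulomb_energy(positions, charges):
    """Bare Coulomb energy of a finite (non-periodic) set of point charges,
    in units where e^2 / (4 pi eps0) = 1."""
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            energy += charges[i] * charges[j] / r
    return energy

# One Na(+)/Cl(-) pair plus neutral dummy ("chi") particles that carry no
# charge and hence do not contribute to the target, as in the toy dataset.
r_nacl = 6.0
pos = np.array([[0.0, 0.0, 0.0],      # Na (+1)
                [r_nacl, 0.0, 0.0],   # Cl (-1)
                [2.0, 3.0, 1.0],      # chi (0)
                [4.0, -2.0, 2.5]])    # chi (0)
q = np.array([1.0, -1.0, 0.0, 0.0])

print(coulomb_energy(pos, q))  # equals -1 / r_NaCl
```

Because the dummy particles are neutral, the sum reduces to the single ion–ion term, matching the target *E*(*A*) = −1/*r*_{NaCl} exactly.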

One may then investigate what happens when dummy atoms are included in the model. At first, this may seem unnecessary: *χ* particles do not contribute to the energy, and in this dataset, their positions are almost random. However, dummy particles do provide additional information about the Na and Cl atoms that are in their environments—e.g., reporting on the relative positions of the ions when they are farther apart than *r*_{cut}. Simply including dummy atoms within *ν* = 1 features gives the predicted *E*(*r*) curve a small slope for *r* > *r*_{cut}: this is because the atom distribution is not entirely random; there is an “exclusion zone” of 2.5 Å around each atom, and so information on the relative position of *χ* around Na indirectly indicates the possible presence of a Cl atom outside the cutoff. Even though the validation-set RMSE decreases overall—as *χ*-centered features do indeed incorporate usable information—the quality of the fit at short *r* degrades. Features with *ν* ≥ 2 (which in theory contain enough information to precisely determine the relative position of Na and Cl up to 2*r*_{cut}) improve the accuracy beyond *r*_{cut}, but there is a large spread around the target, even though all these results are obtained in the large *n*_{train} limit.

We believe there are two reasons why the usual argument that three-body features extend the range of interactions is too simplistic. First, to have some information on *r*_{NaCl}, the dummy particles should be placed in a very precise region, which is increasingly unlikely as *r* approaches 2*r*_{cut} [Fig. 9(a)]. Similar arguments apply to [1 ← 1] MP features [Fig. 9(b)], which indeed only marginally increase the accuracy of predictions for large *r*. Second, using a three-body term to describe the Na–Cl interactions may lead to contradictory scenarios [Fig. 9(c)]. The idea of the counterexample is similar to one of the cases discussed in Ref. 35. Consider three structures having the same Na–Cl distance that is larger than *r*_{cut} and one or more *χ* atoms that are sufficiently close to see the two ions within the cutoff but outside each other’s cutoff. It is then possible to assign the same energy contribution to different Na − *χ* − Cl geometries. However, if a structure has two dummy atoms simultaneously close to the ion pair with the same relative distances and angles as the previous two structures, one arrives at a contradictory, overdetermined regression. Note that for simplicity, we consider only *χ*-centered descriptors: adding Na and Cl features would introduce *χ* − Na − *χ*′ terms and break the tie, but one can design a more complicated set of structures that would be similarly contradictory. Overall, the analysis of these two toy models suggests that a fully equivariant MP construction is not necessarily an efficient route to describe long-range/low-body-order interactions even though it provides, in principle, a systematically convergent basis to express them.
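The first of these arguments can be made quantitative: the volume of the "lens" region in which a single *χ* particle lies within *r*_{cut} of both ions shrinks rapidly as *r* approaches 2*r*_{cut}. A minimal Monte Carlo sketch (the cutoff value is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
r_cut = 3.5  # illustrative cutoff, in Angstrom

def common_neighbor_fraction(r, n_samples=200_000):
    """Fraction of the cutoff sphere around Na in which a chi particle would
    also see Cl (placed at distance r), estimated by Monte Carlo."""
    # sample points uniformly in the ball of radius r_cut around Na (origin):
    # random directions scaled by radii drawn with the correct r^3 law
    pts = rng.normal(size=(n_samples, 3))
    pts *= (r_cut * rng.random(n_samples) ** (1 / 3) /
            np.linalg.norm(pts, axis=1))[:, None]
    cl = np.array([r, 0.0, 0.0])
    return np.mean(np.linalg.norm(pts - cl, axis=1) < r_cut)

for r in (r_cut, 1.5 * r_cut, 1.9 * r_cut, 2.0 * r_cut):
    print(f"r = {r:5.2f} A -> common-neighbor fraction {common_neighbor_fraction(r):.4f}")
```

The fraction decreases monotonically and vanishes at *r* = 2*r*_{cut}, which is why configurations carrying information on a distant ion pair become vanishingly rare near the formal limit of the three-body range.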

### C. Equivariant predictions of molecular dipoles

As a final example, we demonstrate the extrapolative prediction of molecular dipole moments of small organic molecules. The dataset is taken from Ref. 55, where a similar exercise was performed to compare the performance of a model based on atom-centered equivariant contributions with one based on the prediction of atomic point charges. The purpose of this example is two-fold. First, it serves as a demonstration of the symmetry-adapted prediction of a covariant property. Second, given that we train the model on small molecules taken from the QM7 dataset^{56} and make predictions on slightly larger structures from the QM9 set,^{57} the example gives an indication of differences in transferability between the various representations.^{58}

Figure 10 shows the learning curves of linear models built on *ν* = 2, *ν* = 3, and [1 ← 1] features, both for a validation subset of QM7 and for 1000 larger QM9 molecules. We also compare two different levels of discretization of the density expansion (as for CH_{4}, we do not truncate the expansion, to keep the setup as simple as possible). *ν* = 2 features show performance inferior to that observed in previous work, which is explained both by the use there of a larger basis and (more importantly) of a non-linear kernel, while the four-body representation performs much better both for QM7 and for extrapolation to QM9, with performance comparable to the non-linear kernel model in Ref. 55. The [1 ← 1] MP-ACDC features are remarkably well-behaved. Even with the very coarse (*n*_{max} = 4, *l*_{max} = 2) discretization, learning curves on QM7 do not saturate, and the QM9 extrapolation error also decreases steadily with *n*_{train}. Despite having restricted our tests to a linear model, the accuracy reached on QM9 is better than that achieved in Ref. 55 with non-linear kernels and a hybrid model using both atomic dipoles and partial charges—even though it must be stressed that the models in Ref. 55 also included sulfur-containing molecules, while here we restrict ourselves to CHNO compositions.

## IV. CONCLUSIONS

The formalism we introduce in this work extends the atom-centered density-correlation representations and their link with linear models of functions of interatomic displacements to the case in which information on multiple environments is combined in a similar symmetry-adapted construction. The scheme captures the key ingredients of equivariant message-passing ML architectures, which, however, usually include additional scalar terms that make them more flexible but prevent establishing a clear correspondence with body-ordered expansions. We illustrate the behavior of the simpler MP representations on three elementary examples—the first focusing on the description of short-range body-ordered interactions, the second on long-range properties, and the third on covariant targets and extrapolative power. The examples suggest that MP representations can incorporate information on the position of atoms more efficiently than single-atom-centered features even though this advantage is reduced when using a multi-center decomposition of a global target property. The analysis of a purely electrostatic learning problem provides insight into the ability of MP features to incorporate long-range correlations. The formal observation that correlations between multiple neighbors extend the range of interactions beyond the cutoff appears to be of limited practical relevance. MP-ACDCs also perform very well in a more realistic application to small organic molecules and confirm their better behavior in the limit of a heavily truncated discretization compared to single-center features of the same body order.

On a formal level, we show how invariant and equivariant deep learning frameworks can be interpreted in terms of time-tested atom-centered descriptors. We expect that this formalism will help rationalize and classify geometric message-passing machine-learning schemes, which are often touted as “feature-free” approaches but usually entail the *ad hoc* construction of heavily engineered architectures. From this point of view, the main challenge is to study systematically the role played by scalar terms—which correspond to infinite correlation order but are restricted to powers of the input invariant features—and the iteration of the MP step, which also requires introducing contractions to limit the size of the feature vectors. A better understanding of the link between different kinds of ML architectures and systematically converging representations and between representations and descriptions of physically motivated components of the targets will help in the construction of more accurate, efficient, and transferable machine-learning models of atomic-scale problems as well as all applications that rely on a geometric description of point clouds.

## SUPPLEMENTARY MATERIAL

The supplementary material contains additional derivations and full details on the illustrative models presented in the main text.

## ACKNOWLEDGMENTS

J.N. and M.C. acknowledge support by the NCCR MARVEL, funded by the Swiss National Science Foundation (Grant No. 182892), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 101001890-FIAMMA). G.F. and S.P. acknowledge support by the Swiss Platform for Advanced Scientific Computing (PASC). Discussions with Alexander Goscinski and Andrea Grisafi are gratefully acknowledged.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

## REFERENCES

The H atoms are randomly distributed in a sphere of 3 Å around the central carbon.

We consider exclusively the 6874 QM7 structures that contain only C, H, N, and O atoms and make predictions on 1000 randomly selected QM9 structures, similarly limited to four elemental components.