Machine-learning models based on a point-cloud representation of a physical object are ubiquitous in scientific applications and particularly well-suited to the atomic-scale description of molecules and materials. Among the many different approaches that have been pursued, the description of local atomic environments in terms of their discretized neighbor densities has been used widely and very successfully. We propose a novel density-based method, which involves computing “Wigner kernels.” These are fully equivariant and body-ordered kernels that can be computed iteratively at a cost that is independent of the basis used to discretize the density and grows only linearly with the maximum body-order considered. Wigner kernels represent the infinite-width limit of feature-space models, whose dimensionality and computational cost instead scale exponentially with the increasing order of correlations. We present several examples of the accuracy of models based on Wigner kernels in chemical applications, for both scalar and tensorial targets, reaching an accuracy that is competitive with state-of-the-art deep-learning architectures. We discuss the broader relevance of these findings to equivariant geometric machine-learning.
Machine-learning techniques are widely used to perform tasks on 3D objects, from pattern recognition and classification to property prediction.1–4 In particular, different flavors of geometric machine learning5 have widely been used in applications to chemistry, biochemistry, and condensed-matter physics.6–8 Given the coordinates and types of atoms seen as a decorated point cloud, ML models act as a surrogate for accurate electronic-structure simulations, predicting all types of atomic-scale properties that can be obtained from quantum mechanical calculations.9–14 These include not only scalars, such as the potential energy, but also vectors and tensors, which require models that are covariant to rigid rotations of the system.15,16
In this context, body-ordered models have emerged as an elegant and accurate way of describing how the behavior of a molecule or a crystal arises from a hierarchy of interactions between pairs of atoms, triplets, and so on—a perspective that has also widely been adopted in the construction of traditional physics-based interatomic potentials.17–20 By only modeling interactions up to a certain body order, these methods generally achieve low computational costs. Furthermore, since low-body-order interactions are usually dominant, focusing machine-learning models on their description also leads to excellent accuracy and data-efficiency. Several body-ordered models have been proposed for atomistic machine learning; while most work has focused on simple linear models,21–23 it has been shown that several classes of equivariant neural networks24–26 can be interpreted in terms of the systematic construction of hidden features that are capable of describing body-ordered symmetric functions.27,28 Kernel methods are also popular in the field of atomistic chemical modeling,9,14,29–32 as they provide a good balance between the simplicity of linear methods and the flexibility of non-linear models. In most cases, they are used in an invariant setting, and the kernels are manipulated to incorporate higher-order terms in a non-systematic way.33 Even though the body-ordered construction is particularly natural for chemical applications, the formalism can be applied equally well to any point cloud,26 and so it is also relevant for more general applications of geometric learning.
In this work, we present an approach to build body-ordered equivariant kernels in an iterative fashion. Crucially, the iterations only involve kernels themselves, entirely avoiding the definition of a basis to expand the radial and chemical descriptors of each atomic environment and the associated scaling issues. The resulting kernels can be seen as the infinite-basis or infinite-width limit of many of the aforementioned equivariant linear models and neural networks. We demonstrate the excellent accuracy that is exhibited by these “Wigner kernels” in the prediction of scalar and tensorial properties, including the cohesive energy of transition metal clusters and high-energy molecular configurations, and both energetics and molecular dipole moments of organic molecules.
The definition of local equivariant representations of a point cloud is a problem of general relevance for computer vision,34 but it is particularly important for atomistic applications, where the overwhelming majority of frameworks relies on the representation of atom-centered environments. Such atom-centered representations are often computed starting from the definition of local atomic densities around the atom of interest, which makes the predictions invariant with respect to the permutation of atoms of the same chemical element. The locality of the atomic densities is often enforced via a finite cutoff radius within which they are defined, and it results in models whose cost scales linearly with the size of the system. The use of discretized atomic densities has also been linked to much increased computational efficiency in the evaluation of high-order descriptors, as they allow us to compute them while avoiding sums over clusters of increasing order. This is sometimes referred to as the density trick.35,36
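The density trick can be illustrated with a small numerical sketch. The example below is a toy under stated assumptions: `phi` is a hypothetical scalar basis function of the neighbor distance (not the basis used in this work), and the neighbor positions are random. It checks that correlations between pairs of neighbors, which would naively require a double sum over neighbors, are recovered from the outer product of density coefficients that each need only a single sum.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical neighbor positions relative to a central atom (toy data).
neighbors = rng.normal(size=(20, 3))

def phi(r, n):
    """Toy radial basis function (an assumption, not the basis of the paper)."""
    d = np.linalg.norm(r, axis=-1)
    return np.exp(-0.5 * (d - 0.5 * n) ** 2)

n_max = 4
# Density coefficients: a single sum over neighbors per basis function.
c = np.array([phi(neighbors, n).sum() for n in range(n_max)])

# Two-neighbor correlations via the density trick: outer product of coefficients.
trick = np.outer(c, c)

# Same quantity from an explicit sum over all ordered pairs (j, k), self-pairs included.
explicit = np.zeros((n_max, n_max))
for rj in neighbors:
    for rk in neighbors:
        for n1 in range(n_max):
            for n2 in range(n_max):
                explicit[n1, n2] += phi(rj, n1) * phi(rk, n2)

print(np.allclose(trick, explicit))
```

The cost of the coefficient route grows linearly with the number of neighbors, whereas the explicit route grows with the number of neighbor tuples; the price is that the product now lives in a space whose size depends on the basis.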
Smooth overlap of atomic positions and symmetry-adapted GPR. The oldest model to employ the density trick is kernel-based SOAP-GPR,37 which evaluates a class of three-body invariant descriptors and builds kernels as their scalar products. Higher-body-order invariant interactions are generally included, although not in a systematic way, by taking integer powers of the linear kernels. This model has been used in a wide variety of applications (see the work of Deringer et al.33 for a review). SA-GPR is an equivariant generalization of SOAP-GPR, which aims to build equivariant kernels from “λ-SOAP” features.32 However, these kernels are built as products of a linear low-body-order equivariant part and a non-linear invariant kernel that incorporates higher-order correlations. As a result, these models are not strictly body-ordered, and they offer no guarantees of behaving as universal approximators.38
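As a rough sketch of how such kernels are assembled, the snippet below builds a linear kernel as the scalar product of two descriptor vectors and raises it to an integer power. The descriptors are random stand-ins, and both the normalization and the function name `polynomial_kernel` are assumptions made for illustration rather than the SOAP-GPR implementation.

```python
import numpy as np

def polynomial_kernel(p, p_prime, zeta=2):
    """Normalized scalar-product kernel raised to an integer power zeta."""
    linear = p @ p_prime / (np.linalg.norm(p) * np.linalg.norm(p_prime))
    return linear ** zeta

rng = np.random.default_rng(0)
# Stand-in invariant descriptors for two environments (random, illustration only).
p_a = rng.random(50)
p_b = rng.random(50)

k_linear = polynomial_kernel(p_a, p_b, zeta=1)   # body order fixed by the descriptor
k_squared = polynomial_kernel(p_a, p_b, zeta=2)  # mixes in higher-order terms
print(k_linear, k_squared)
```

Raising the kernel to a power introduces products of the underlying features, which is why higher-order terms appear, but only in the particular combinations dictated by the power.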
N-body kernel potentials. In contrast, Glielmo et al.31 introduced density-based body-ordered kernels, proposing their analytical evaluation for low body orders. Nonetheless, these kernels are exclusively rotationally invariant, and the paper proposes a strategy based on approximate symmetrization as the only viable strategy to compute kernels of arbitrarily high body orders.
MTP, ACE, and NICE. The moment tensor potential (MTP)39 and the more recent, closely related atomic cluster expansion (ACE)21 and N-body iterative contraction of equivariant (NICE)40 schemes consist of linear models based on a systematic hierarchy of equivariant body-ordered descriptors. These are obtained as discretized and symmetrized atomic density correlations, which are themselves simply tensor products of the atomic densities. Although several contraction and truncation schemes have been proposed,22,41,42 the full feature space of these constructions grows exponentially with the maximum body-order of the expansion.
Equivariant neural networks. In recent years, equivariant neural networks24–26,43 have become ubiquitous, and they represent the state of the art on many atomic-scale datasets. Most, but not all,44 incorporate message-passing schemes. Equivariant architectures can be seen as a way to efficiently contract the exponentially large feature space of high-body-order density correlations.28 Even though the target-specific optimization of the contraction weights gives these models great flexibility, they still rely on an initial featurization based on the expansion of the neighbor density on a basis and can only span a heavily contracted portion of the high-order correlations.
Point-edge transformer. Even though the method we discuss here falls squarely within the class of equivariant ML models, it is also interesting to compare it against the point-edge transformer (PET) architecture, a model that is not rotationally equivariant and that achieves a high-body-order description by applying general-purpose deep-learning architectures to the Cartesian coordinates of the atoms. In doing so, PET avoids the explicit discretization of the neighbor density on a basis.
This work proposes a kernel method capable of generating a high-order representation of interatomic correlations, similarly to MTP, ACE, and NICE. In contrast to these, however, it achieves a complete description of the chemical and radial dimensions while avoiding the exponential scaling with the body-order. Compared to previous kernel methods, such as SOAP-GPR, Wigner kernels can describe point clouds at arbitrarily high body-orders and do so in a systematic, controllable way.
Here, j runs over all neighbors in Ai, g is a three-dimensional Gaussian function, and fcut is a cutoff function, which satisfies fcut(r ≥ rcut) = 0, so that the Ai neighborhoods are effectively restricted by a cutoff radius rcut while maintaining continuity. $\mathbf{r}_{ji}$ is the position of atom j relative to atom i, and rji is their distance. The expansion coefficients express the discretization of the density on a basis of nmax radial functions Rnl and spherical harmonics; they are the basic building blocks of the equivariant models described in Sec. II. It should be noted that a different density ρi,a(x) is defined for each of the amax chemical elements in the neighborhood and that the sum only includes neighboring atoms whose chemical element aj matches a.
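A minimal sketch of how such an element-resolved density could be evaluated at a point x is given below. The cosine form of the cutoff, the Gaussian width `sigma`, the absence of a normalization factor, and the function names are all assumptions made for illustration; the text only requires that the cutoff function vanish for r ≥ rcut.

```python
import numpy as np

def f_cut(r, r_cut):
    """Cosine cutoff (assumed form): smooth, and zero for r >= r_cut."""
    return np.where(r < r_cut, 0.5 * (1.0 + np.cos(np.pi * r / r_cut)), 0.0)

def density(x, r_ji, species, a, sigma=0.3, r_cut=4.0):
    """Evaluate the element-a neighbor density of a central atom at point x.

    r_ji:    (N, 3) positions of the neighbors relative to the central atom.
    species: (N,) chemical element of each neighbor.
    """
    r = np.linalg.norm(r_ji, axis=1)
    mask = species == a  # only neighbors of element a contribute
    # Unnormalized Gaussian centered on each neighbor.
    g = np.exp(-0.5 * np.sum((x - r_ji) ** 2, axis=1) / sigma**2)
    return np.sum(mask * g * f_cut(r, r_cut))

# Toy neighborhood: one hydrogen inside the cutoff, one outside.
r_ji = np.array([[1.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
species = np.array([1, 1])
x = np.array([1.0, 0.0, 0.0])
rho_h = density(x, r_ji, species, a=1)
rho_c = density(x, r_ji, species, a=6)
print(rho_h, rho_c)
```

The neighbor beyond the cutoff contributes nothing, and the carbon density is zero because no neighbor matches that element.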
Wigner kernels through Wigner iterations. As detailed in Sec. S3, symmetry-adapted kernels of the form given in (5) can be computed by first evaluating body-ordered equivariant representations [in the form of discretized correlations of the neighbor density (4)] and then computing their scalar products. These are the same representations that underlie MTP, ACE, and NICE feature-space models and that are very closely related to the representations that are implicitly generated by equivariant neural networks.27,28 Such a formulation highlights the positive-semi-definiteness of the kernels; however, performing such computations is impractical for ν > 2: on one hand, kernel regression is then equivalent to linear regression on the starting features; on the other hand, the number of features one needs to compute to evaluate the kernel without approximations grows exponentially with ν.
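The equivalence between kernel regression with a scalar-product kernel and linear regression on the underlying features can be checked numerically. The sketch below uses random stand-in features (not density correlations) and a hypothetical regularization strength `reg`; it compares kernel ridge regression with a linear kernel against ridge regression on the same features.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 8))   # stand-in feature vectors (hypothetical)
y_train = rng.normal(size=30)
X_test = rng.normal(size=(5, 8))
reg = 1e-2                            # ridge regularization strength (assumed)

# Kernel ridge regression with a linear kernel K = X X^T.
K = X_train @ X_train.T
alpha = np.linalg.solve(K + reg * np.eye(len(K)), y_train)
pred_kernel = (X_test @ X_train.T) @ alpha

# Ridge regression directly on the features.
n_feat = X_train.shape[1]
w = np.linalg.solve(X_train.T @ X_train + reg * np.eye(n_feat), X_train.T @ y_train)
pred_linear = X_test @ w

print(np.allclose(pred_kernel, pred_linear))
```

The two routes give the same predictions, so the kernel form offers no extra flexibility in this setting; what changes is whether the cost is governed by the number of training points or by the number of features.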
In order to initialize the iterations in (6), only the ν = 1 equivariant kernels are needed. Their expression, which follows immediately from (4) and (5), is given in Sec. S4, along with details of its cost-effective evaluation. Equivariance with respect to inversion is discussed in Sec. S5, and it results in the incorporation of a parity index σ into the full notation for an O(3)-equivariant kernel. Finally, we also define one-body, ν = 0 kernels, which describe the similarity of two environments exclusively based on the chemical elements of the central atoms ai and ai′.
Scaling and computational cost. The calculation of the atom-centered density correlations that underlie linear and non-linear equivariant point cloud models entails an exponential scaling of the equivariant feature set size as a function of νmax,22,23 which is a consequence of the use of a radial-element basis of size amax nmax, out of which one effectively computes a sequence of outer products, so that the number of features grows as (amax nmax)^νmax. Computing Wigner kernels as scalar products of such equivariant features (see Sec. S3) would present the same problems and require aggressive truncation of the basis. The calculation through a Wigner iteration can be understood as a tensor contraction strategy to compute the very same quantity, while avoiding the intermediate evaluation of these outer products (see the schematics in Fig. 1), so that it is possible to use a converged basis while achieving a linear scaling with respect to νmax. The scaling of the Wigner iteration with respect to its hyperparameters is discussed in Sec. S6. Only the angular basis has to be truncated at a maximum angular momentum order λmax, and the scaling with λmax is steeper than that of traditional SO(3)-symmetrized products. Fortunately, as we shall see in Sec. IV A, Wigner kernels exhibit excellent performance even with low λmax.
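To make the contrast concrete, the following sketch counts radial-element channels after repeated outer products and compares them with the number of kernel iterations. The values of `a_max` and `n_max` are illustrative assumptions, the count ignores angular indices and any symmetry-related reduction, and the iteration count assumes that each Wigner iteration raises ν by one starting from ν = 1.

```python
a_max, n_max = 5, 8          # illustrative values (assumptions)
basis_size = a_max * n_max   # size of the radial-element basis

def outer_product_channels(nu):
    """Radial-element channels after nu outer products of the density coefficients."""
    return basis_size ** nu

def wigner_iterations(nu_max):
    """Iterations needed to reach nu_max from nu = 1, assuming one order per iteration."""
    return nu_max - 1

for nu in range(1, 5):
    print(nu, outer_product_channels(nu), wigner_iterations(nu))
# 1 40 0
# 2 1600 1
# 3 64000 2
# 4 2560000 3
```

Even for this modest basis, the channel count exceeds two million at ν = 4, while the number of kernel-space iterations grows by one per order.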
Having discussed the formulation and the theoretical scaling of Wigner kernels, we now proceed to assess their behavior in practical regression tasks, focusing on applications to atomistic machine learning. We refer the reader to Sec. S7 for a discussion of the implementation details and to Sec. S9 for a list of the hyperparameters of the models. We consider four cases that allow us to show the accuracy of our framework and to highlight its relationship with other geometric ML schemes: a system that is expected to exhibit strong many-body effects, one that requires high-resolution descriptors, and two classical benchmark datasets for organic molecules, including regression of a tensorial target. We focus on molecular systems, as the smaller number of atoms per structure reduces the computational effort of performing systematic ablation studies with full-kernel regression. Wigner kernels can be computed in an analogous way for bulk systems, although efficient implementation of regression and inference—particularly when including target gradients—would then benefit greatly from a sparse kernel approximation.
A. Ablation studies: Gold cluster and random methane datasets
In the first instance, we test the behavior of the Wigner kernels on two datasets that fully display the relative importance of their different body-ordered and angular components, respectively, while comparing the proposed model to its most closely related counterparts. It is clear that the scaling properties of the Wigner kernel model discussed in Sec. III make it especially advantageous for systems requiring a high-body-order description of the potential energy surface. Metallic clusters often exhibit non-trivial finite-size effects due to the interplay between surface and bulk states,52 and they have therefore been used in the past as prototypical benchmarks for many-body ML models.53 As a particularly challenging test case, we consider a publicly available dataset54 of MD trajectories of gold clusters of different sizes.55 From these trajectories, we select 105 092 uncorrelated structures for use in this work.
The need for high-body-order terms is clear when comparing results for models based on exponential WKs truncated at different orders of ν (Fig. 2). ν = 2 and (to a lesser extent) ν = 3 models result in saturating learning curves. A comparison with SOAP-based models reveals the likely source of the increased performance of the Wigner kernels. Indeed, linear SOAP, which is a νmax = 2 model, shows very similar performance to its ν = 2 WK analog. The same is true for squared-kernel SOAP-GPR, which closely resembles the learning curve of a Wigner kernel construction for which νmax = 2 and the resulting kernels are squared—the difference probably due to the different functional form of the two kernels and the presence of higher-l components in the density for SOAP-GPR. A true νmax = 4 kernel, which incorporates all five-body correlations, significantly outperforms both squared-kernel learning curves, demonstrating the advantages of explicit body-ordering. We conclude with a comparison between the νmax = 6 WKs and νmax = 6 Laplacian-eigenbasis (LE) ACE models. For the latter, we used the same radial transform presented in Ref. 49, and we optimized its single hyperparameter. Although it might be possible to further tune the performance of LE-ACE by changing the functional form of the radial transform, the comparison with the Wigner kernel learning curve suggests that the kernel-space basis employed in the Wigner kernels might be advantageous in geometrically inhomogeneous datasets, such as this.
As a second example, we test the Wigner kernels on a random gas-phase CH4 dataset,38,56 which we expect to be very challenging for the proposed Wigner kernel model, as it is intrinsically limited in body-order and almost random in its configurations, so that using a training-set kernel basis provides close to no advantages. More importantly, this dataset requires very careful convergence of the angular basis,28,49 which is problematic in view of the steep λmax scaling of Wigner iterations. With all these potential problems, Wigner kernels achieve a remarkable level of accuracy, outperforming SOAP-GPR and NICE, and being competitive with LE-ACE despite using only λmax = 3.
In light of the aforementioned points, comparing against the PET architecture, which has unlimited angular resolution and body order due to its unconstrained nature, is especially challenging. Despite this, we observe that Wigner kernels outperform it by a large margin in the low-data regime. Similar low-λmax effects have been noticed in many recent efforts to machine-learn interatomic potentials.25,27,44,57 By providing a functional form that spans the full space of density correlations at a given level of angular truncation, Wigner kernels can help rationalize why low-λmax models can perform well. Indeed, due to the form of the Wigner iterations, k(ν) does not report exclusively on (ν + 1)-body correlations, but also on all lower-order ones, and the tensor-product form of the kernel space incorporates higher frequency components in their functional form, much like sin²(ωx) contains components with frequency 2ω. We investigate and confirm this hypothesis in Sec. S10 by decomposing the angular dependence of high-ν kernels into their frequency components. This explains why aggressively truncated equivariant ML models25,58 can achieve high accuracy in the prediction of interatomic potentials.
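The trigonometric analogy can be verified directly, since sin²(ωx) = [1 − cos(2ωx)]/2. The sketch below is a one-dimensional illustration only (it does not decompose actual kernels): it takes the discrete Fourier transform of sin(ωx) and of its square on a periodic grid and lists the frequencies with non-negligible amplitude. The grid size, tolerance, and helper name `dominant_frequencies` are assumptions.

```python
import numpy as np

omega = 3   # integer frequency, so that the signal is periodic on [0, 2*pi)
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

def dominant_frequencies(signal, tol=1e-8):
    """Return the integer frequencies with non-negligible Fourier amplitude."""
    amp = np.abs(np.fft.rfft(signal)) / len(signal)
    return [int(k) for k in np.nonzero(amp > tol)[0]]

freq_single = dominant_frequencies(np.sin(omega * x))
freq_product = dominant_frequencies(np.sin(omega * x) ** 2)
print(freq_single, freq_product)  # [3] [0, 6]
```

The product of two functions band-limited at ω contains a component at 2ω, in the same way that products of low-λmax kernels can carry angular components beyond the truncation of each factor.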
B. rMD17 dataset
We continue our investigation with the rMD17 dataset,59 which assesses the accuracy achieved by models when learning potential energy surfaces of small organic molecules. When using the derivative learning scheme by Chmiela et al.,29 Wigner kernels are shown to systematically outcompete the ACE implementation that performs best on this benchmark (LE-ACE, Bigi et al.49), showing the advantages of a fully converged radial-chemical description (Table I). The proposed model is also competitive in accuracy with equivariant neural networks, such as NequIP25 and MACE,43 while operating at a reduced computational cost. Using atomic neighborhoods as support points rather than full structures would further reduce the cost of the Wigner kernels by roughly an order of magnitude [by eliminating the sum over i′ in (1)] while causing little to no deterioration in accuracy. Exploiting the sparsity of the Clebsch–Gordan matrices, as is done in MACE, would also improve the efficiency of the proposed model. Finally, it is also worth mentioning that, similar to ACE and Allegro,44 but unlike NequIP and MACE, Wigner kernels are entirely local, as they do not incorporate message-passing operations. This greatly simplifies the parallelization of inference for large-scale calculations.
C. QM9 dataset
Wigner kernels avoid the unfavorable scaling of traditional body-ordered models with respect to the number of chemical elements in the system. This property is particularly useful when dealing with chemically diverse datasets. An example is that of the popular QM9 dataset,71 which contains five elements (H, C, N, O, and F). We build Wigner kernel models for two atomic-scale properties within this dataset, and to illustrate the transferability of our model, we use the same hyperparameters for both fits (see Sec. S9).
Molecular dipoles. We begin the investigation with a covariant learning exercise. This consists of learning the dipole moment vectors μ of the molecules in the QM9 dataset.66 In the small-data regime, Wigner kernels perform similarly to the optimized λ-SOAP kernels in the work of Veit et al.,66 but they completely avoid the saturation observed for larger training-set sizes (Fig. 3). The improved performance of the Wigner kernels is a clear indication of the higher descriptive power that is afforded by the use of a full body-ordered equivariant kernel, as opposed to the combination of linear covariant ν = 2 kernels and a non-linear scalar kernel that is used in current applications of SA-GPR. The need for a high-body-order and high-basis-resolution framework is also clear in the performance of the PET model, which despite not being exactly equivariant outperforms all other models, including Wigner kernels in the large-dataset regime.
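Table I. rMD17 benchmark (Sec. IV B): energy (E) and force (F) columns for LE-ACE, NequIP, MACE, and Wigner kernels (WK).
Molecule | LE-ACE E | LE-ACE F | NequIP E | NequIP F | MACE E | MACE F | WK E | WK F |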
Aspirin | 22.4 | 59.1 | 19.5 | 52.0 | 17.0 | 43.9 | 17.0 | 50.2 |
Azobenzene | 9.9 | 27.5 | 6.0 | 20.0 | 5.4 | 17.7 | 7.9 | 25.6 |
Benzene | 0.135 | 1.44 | 0.6 | 2.9 | 0.7 | 2.7 | 0.131 | 1.31 |
Ethanol | 6.6 | 32.0 | 8.7 | 40.3 | 6.7 | 32.6 | 5.9 | 30.8 |
Malonaldehyde | 11.3 | 50.9 | 12.7 | 52.5 | 10.0 | 43.3 | 8.9 | 43.8 |
Naphthalene | 2.9 | 13.9 | 2.1 | 10.0 | 2.1 | 9.2 | 2.5 | 12.5 |
Paracetamol | 14.3 | 45.1 | 14.3 | 39.7 | 9.7 | 31.5 | 10.2 | 37.2 |
Salicylic acid | 8.3 | 36.7 | 8.0 | 35.0 | 6.5 | 28.4 | 6.8 | 31.9 |
Toluene | 4.1 | 18.4 | 3.3 | 15.1 | 3.1 | 12.1 | 3.4 | 16.4 |
Uracil | 5.7 | 30.7 | 7.3 | 40.1 | 4.4 | 25.9 | 5.1 | 27.8 |
Energies. Finally, we test the Wigner kernel model on the ground-state energies of the QM9 dataset. The corresponding learning curves are shown in Fig. 3. Wigner kernels significantly improve on other kernel methods, such as SOAP and FCHL in the low-data regime. As in the case of CH4, the WK model is truncated at a low angular threshold (λmax = 3). However, the corresponding learning curve shows no signs of saturation, possibly for the same reasons we highlighted in Sec. IV A. Similarly, a relatively low maximum body-order (νmax = 4) does not seem to impact the accuracy of the model, most likely because stable organic molecules have, with few exceptions, atoms with only up to four nearest neighbors. On the full QM9 dataset, Wigner kernels also achieve state-of-the-art accuracy, as shown in the last point of the WK learning curve and in Table II. The remarkable performance of the Wigner kernels on this exercise shows the suitability of the proposed model to, for instance, screening of pharmaceutical targets or prediction of chemical shifts from single equilibrium configurations. This stands in contrast to the other datasets we have investigated, which are better suited to assess the quality of a model in approximating a property surface for atomistic simulations.
Table II. QM9 benchmark: results for the U0, U, H, and G targets.
Model | U0 | U | H | G |
NoisyNodes (Ref. 60) | 7.3 | 7.6 | 7.4 | 8.3 |
SphereNet (Ref. 61) | 6.3 | 6.4 | 6.3 | 7.8 |
DimeNet++ (Ref. 62) | 6.3 | 6.3 | 6.5 | 7.6 |
ET (Ref. 63) | 6.2 | 6.4 | 6.2 | 7.6 |
PaiNN (Ref. 58) | 5.9 | 5.8 | 6.0 | 7.4 |
Allegro (Ref. 44) | 4.7 (0.2) | 4.4 | 4.4 | 5.7 |
MACE (Ref. 64) | 4.1 | 4.1 | 4.7 | 5.5 |
TensorNet (Ref. 65) | 3.9 (0.3) | 3.9 (0.1) | 4.0 (0.2) | 5.7 (0.1) |
Wigner Kernels | 4.2 (0.3) | 4.1 (0.3) | 4.1 (0.3) | 5.8 (0.3) |
In this work, we have presented the Wigner iteration as a practical tool to construct rotationally equivariant “Wigner kernels” for use in symmetry-adapted Gaussian process regression on 3D point clouds. We have then applied them to machine learn the atomistic properties of molecules and clusters. The proposed kernels are explicitly body-ordered (i.e., they provide explicit universal approximation capabilities22 for properties that simultaneously depend on the correlations between the positions of ν + 1 points) and can be thought of as the kernels corresponding to the infinite-width limit of several families of body-ordered models. This extends the well-known equivalence between infinitely wide neural networks and Gaussian processes72–74 from a statistical context to that of geometric representations. Whereas the full feature-space evaluation of body-ordered models leads to an exponential increase of the cost with ν, a kernel-space evaluation is naturally adapted to the training structures, and it avoids the explosion in the number of equivariant features that arises from the use of an explicit radial-chemical basis. The scaling properties of the Wigner iterations make the new model particularly suitable for datasets that are chemically diverse, that are expected to contain strong high-body-order effects, and/or that involve a very inhomogeneous distribution of molecular geometries.
Our benchmarks demonstrate the excellent performance of kernel ridge regression (KRR) models based on Wigner iterations on a variety of different atomistic problems. The ablation studies on gold clusters and gas-phase methane molecules fully reveal the strengths and weaknesses of the proposed model. In particular, the results for a random CH4 dataset suggest that Wigner kernels incorporate high-resolution basis functions even when they are built with a moderate angular momentum threshold, which is reassuring, given the steep scaling of the computational cost with λmax. The chemically diverse rMD17 and QM9 datasets allow us to show the state-of-the-art performance of the proposed model when learning energies, forces, and vectorial dipole moments. The fact that a kernel model can match the performance of extensively tuned equivariant neural networks testifies to the importance of understanding the connection between body-ordered correlations, the choice and truncation of a feature-space basis, and the introduction of scalar non-linearities in equivariant models.
Besides this fundamental role in testing the complete-basis limit of density-correlation models, it is clear that Wigner iterations can be incorporated into practical applications. Our model achieves high efficiency on small molecules, and using a sparse kernel formalism will allow us to further reduce its computational cost and apply the model to much larger systems. Finally, the Wigner iteration could also be applied outside a pure kernel regression framework: from the calculation of non-linear equivariant functions, to the use in Gaussian process classifiers,75 to the inclusion as a layer in an equivariant architecture, the ideas we present here open up an original research direction in the construction of symmetry-adapted, physically inspired models for chemistry, materials science, and, more generally, any application whose inputs can be conveniently described in terms of a 3D point cloud.
The supplementary material contains several detailed derivations for the results reported in the main text, details of the models, and further benchmarks.
The authors would like to thank Jigyasa Nigam and Kevin Huguenin-Dumittan for stimulating discussions. M.C. and F.B. acknowledge support from the NCCR MARVEL, funded by the Swiss National Science Foundation (SNSF, Grant No. 182892). M.C. and S.N.P. acknowledge support from the Swiss Platform for Advanced Scientific Computing (PASC).
The code used to generate the results for the Wigner kernel model is available online. Hyperparameters for all the numerical experiments are given in Sec. S9. The data used here are available from the cited references.