Machine-learning models based on a point-cloud representation of a physical object are ubiquitous in scientific applications and particularly well-suited to the atomic-scale description of molecules and materials. Among the many different approaches that have been pursued, the description of local atomic environments in terms of their discretized neighbor densities has been used widely and very successfully. We propose a novel density-based method, which involves computing “Wigner kernels.” These are fully equivariant and body-ordered kernels that can be computed iteratively at a cost that is independent of the basis used to discretize the density and grows only linearly with the maximum body order considered. Wigner kernels represent the infinite-width limit of feature-space models, whose dimensionality and computational cost instead scale exponentially with the increasing order of correlations. We present several examples of the accuracy of models based on Wigner kernels in chemical applications, for both scalar and tensorial targets, reaching an accuracy that is competitive with state-of-the-art deep-learning architectures. We discuss the broader relevance of these findings to equivariant geometric machine learning.

Machine-learning techniques are widely used to perform tasks on 3D objects, from pattern recognition and classification to property prediction.1–4 In particular, different flavors of geometric machine learning5 have widely been used in applications to chemistry, biochemistry, and condensed-matter physics.6–8 Given the coordinates and types of atoms seen as a decorated point cloud, ML models act as a surrogate for accurate electronic-structure simulations, predicting all types of atomic-scale properties that can be obtained from quantum mechanical calculations.9–14 These include not only scalars, such as the potential energy, but also vectors and tensors, which require models that are covariant to rigid rotations of the system.15,16

In this context, body-ordered models have emerged as an elegant and accurate way of describing how the behavior of a molecule or a crystal arises from a hierarchy of interactions between pairs of atoms, triplets, and so on—a perspective that has also widely been adopted in the construction of traditional physics-based interatomic potentials.17–20 By only modeling interactions up to a certain body order, these methods generally achieve low computational costs. Furthermore, since low-body-order interactions are usually dominant, focusing machine-learning models on their description also leads to excellent accuracy and data-efficiency. Several body-ordered models have been proposed for atomistic machine learning; while most work has focused on simple linear models,21–23 it has been shown that several classes of equivariant neural networks24–26 can be interpreted in terms of the systematic construction of hidden features that are capable of describing body-ordered symmetric functions.27,28 Kernel methods are also popular in the field of atomistic chemical modeling,9,14,29–32 as they provide a good balance between the simplicity of linear methods and the flexibility of non-linear models. In most cases, they are used in an invariant setting, and the kernels are manipulated to incorporate higher-order terms in a non-systematic way.33 Even though the body-ordered construction is particularly natural for chemical applications, the formalism can be applied equally well to any point cloud,26 and so it is also relevant for more general applications of geometric learning.

In this work, we present an approach to build body-ordered equivariant kernels in an iterative fashion. Crucially, the iterations only involve kernels themselves, entirely avoiding the definition of a basis to expand the radial and chemical descriptors of each atomic environment and the associated scaling issues. The resulting kernels can be seen as the infinite-basis or infinite-width limit of many of the aforementioned equivariant linear models and neural networks. We demonstrate the excellent accuracy that is exhibited by these “Wigner kernels” in the prediction of scalar and tensorial properties, including the cohesive energy of transition metal clusters and high-energy molecular configurations, and both energetics and molecular dipole moments of organic molecules.

The definition of local equivariant representations of a point cloud is a problem of general relevance for computer vision,34 but it is particularly important for atomistic applications, where the overwhelming majority of frameworks relies on the representation of atom-centered environments. Such atom-centered representations are often computed starting from the definition of local atomic densities around the atom of interest, which makes the predictions invariant with respect to the permutation of atoms of the same chemical element. The locality of the atomic densities is often enforced via a finite cutoff radius within which they are defined, and it results in models whose cost scales linearly with the size of the system. The use of discretized atomic densities has also been linked to a marked increase in the computational efficiency of evaluating high-order descriptors, as it allows one to compute them while avoiding explicit sums over clusters of increasing order; this is sometimes referred to as the density trick.35,36

Smooth overlap of atomic positions and symmetry-adapted GPR. The oldest model to employ the density trick is kernel-based SOAP-GPR,37 which evaluates a class of three-body invariant descriptors and builds kernels as their scalar products. Higher-body-order invariant interactions are generally included, although not in a systematic way, by taking integer powers of the linear kernels. This model has been used in a wide variety of applications (see the work of Deringer et al.33 for a review). SA-GPR is an equivariant generalization of SOAP-GPR, which aims to build equivariant kernels from “λ-SOAP” features.32 However, these kernels are built as products of a linear low-body-order equivariant part and a non-linear invariant kernel that incorporates higher-order correlations. As a result, these models are not strictly body-ordered, and they offer no guarantees of behaving as universal approximators.38 

N-body kernel potentials. In contrast, Glielmo et al.31 introduced density-based body-ordered kernels, proposing their analytical evaluation for low body orders. Nonetheless, these kernels are exclusively rotationally invariant, and approximate symmetrization is proposed as the only viable route to kernels of arbitrarily high body order.

MTP, ACE, and NICE. The moment tensor potential (MTP)39 and the more recent, closely related atomic cluster expansion (ACE)21 and N-body iterative contraction of equivariants (NICE)40 schemes consist of linear models based on a systematic hierarchy of equivariant body-ordered descriptors. These are obtained as discretized and symmetrized atomic density correlations, which are themselves simply tensor products of the atomic densities. Although several contraction and truncation schemes have been proposed,22,41,42 the full feature space of these constructions grows exponentially with the maximum body order of the expansion.

Equivariant neural networks. In recent years, equivariant neural networks24–26,43 have become ubiquitous, and they represent the state of the art on many atomic-scale datasets. Most, but not all,44 incorporate message-passing schemes. Equivariant architectures can be seen as a way to efficiently contract the exponentially large feature space of high-body-order density correlations.28 Even though the target-specific optimization of the contraction weights gives these models great flexibility, they still rely on an initial featurization based on the expansion of the neighbor density on a basis and can only span a heavily contracted portion of the high-order correlations.

Point-edge transformer. Even though the method we discuss here falls squarely within the class of equivariant ML models, it is also interesting to compare it against the point-edge transformer (PET) architecture,51 a model that is not rotationally equivariant by construction and that achieves a high body order through the application of general-purpose deep-learning architectures to the Cartesian coordinates of the atoms. In doing so, PET avoids the explicit discretization of the neighbor density on a basis.

This work proposes a kernel method capable of generating a high-order representation of interatomic correlations, similarly to MTP, ACE, and NICE. In contrast to these, however, it achieves a complete description of the chemical and radial dimensions while avoiding the exponential scaling with body order. Compared to previous kernel methods, such as SOAP-GPR, Wigner kernels can describe point clouds at arbitrarily high body order, and they do so in a systematic, controllable way.

(Symmetry-adapted) kernel ridge regression. Throughout this work, we will employ kernel ridge regression (KRR) to fit atomistic properties. In this context, kernel functions are defined between any two atomic-scale structures so that the kernel k(A, A′) represents a similarity measure between structures A and A′, where each structure is described via the set of positions and chemical elements of its atoms. As mentioned in Sec. II, it is common practice—rooted in physical approximations45 and usually beneficial to the transferability of the model35—to use atom-centered decompositions of the physical properties of a structure. This physical ansatz implies a kernel-mean-embedding46 form for the structure-wise kernels, which are decomposed into atom-pair contributions,47 
$$k(A, A') = \sum_{i \in A} \sum_{i' \in A'} k(A_i, A_{i'}), \qquad (1)$$
where i runs over all atoms in structure A, i′ runs over all atoms in structure A′, and $A_i$, $A_{i'}$ denote the atomic environments around atoms i and i′, respectively. These are spherical neighborhoods of the central atom under consideration with radius $r_{\text{cut}}$, so that $A_i \equiv \{(a_j, \mathbf{r}_{ji})\}_{r_{ji} < r_{\text{cut}}}$ is a shorthand for the Cartesian positions $\mathbf{r}_{ji}$ relative to the center, and the chemical element labels $a_j$, of the atoms within the cutoff radius $r_{\text{cut}}$.
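As a concrete illustration, the decomposition in Eq. (1) amounts to a double loop over atomic environments. A minimal sketch in Python, assuming a hypothetical `environment_kernel` function and structures stored as lists of atom-centered environments:

```python
def structure_kernel(A, A_prime, environment_kernel):
    """Kernel between two structures as a sum over all pairs of
    atom-centered environments, k(A, A') = sum_{i,i'} k(A_i, A'_{i'})."""
    return sum(
        environment_kernel(env_i, env_ip)
        for env_i in A          # environments A_i, one per atom in A
        for env_ip in A_prime   # environments A'_{i'}, one per atom in A'
    )
```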
As shown in the works of Glielmo et al.16 and Grisafi et al.,32 KRR can be extended to the prediction of atomistic properties that are equivariant with respect to symmetry operations in SO(3) (3D rotations $\hat{R}$). In order to build a symmetry-adapted model that is suitable for the regression of a property $y^{\lambda}_{\mu}$ (one that transforms like the set $\{Y^{\mu}_{\lambda}\}_{\mu=-\lambda}^{\lambda}$ of spherical harmonics of degree λ > 0 and order $-\lambda \le \mu \le \lambda$), it is sufficient to employ tensorial kernels $k^{\lambda}_{\mu\mu'}$ and a symmetry-adapted ansatz,
$$\tilde{y}^{\lambda}_{\mu}(B) = \sum_{A} \sum_{\mu'} k^{\lambda}_{\mu\mu'}(B, A)\, c_{A\mu'}, \qquad (2)$$
where $c_{A\mu'}$ are regression coefficients, A is a structure in the training set, B is a structure whose rotationally equivariant property $\tilde{y}^{\lambda}_{\mu}(B)$ is to be predicted, and the $k^{\lambda}_{\mu\mu'}$ kernels must obey
$$k^{\lambda}_{\mu\mu'}(\hat{R}A_i, \hat{R}'A_{i'}) = \sum_{m m'} D^{\lambda}_{\mu m}(\hat{R})\, k^{\lambda}_{m m'}(A_i, A_{i'})\, D^{\lambda}_{\mu' m'}(\hat{R}')^{*}. \qquad (3)$$
Here, $D^{\lambda}(\hat{R})$ is the Wigner D-matrix associated with the rotation $\hat{R}$, i.e., the matrix representation of the rotation operator $\hat{R}$ in the basis of the irreducible representations of the SO(3) group. In practice, most established invariant models use low-rank approximations of the kernel matrix, which result in a more favorable scaling with system size in training and predictions. See the work of Deringer et al.33 for a recent review on kernel methods applied to atomistic problems.
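For concreteness, a minimal sketch of the symmetry-adapted prediction of Eq. (2), assuming the tensorial kernels between the target structure and the training set have already been computed; the array shapes and names are illustrative:

```python
import numpy as np

# K: shape (n_train, 2*lambda+1, 2*lambda+1) -> k^lambda_{mu mu'}(B, A)
# c: shape (n_train, 2*lambda+1)             -> coefficients c_{A mu'}
def predict_equivariant(K, c):
    """y~^lambda_mu(B) = sum_{A, mu'} k^lambda_{mu mu'}(B, A) c_{A mu'}."""
    return np.einsum("Aij,Aj->i", K, c)  # one component per mu
```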
Atomic densities and body-ordered kernels. As discussed in Sec. II, a broad class of atomistic ML frameworks can be formulated in terms of discretized correlations of an atomic neighbor density defined within each environment.21,28,35,48 These are defined as scalar fields in real space $\rho_{i,a}(\mathbf{x})$, with $\mathbf{x} \in \mathbb{R}^3$, given by
$$\rho_{i,a}(\mathbf{x}) = \sum_{j \in A_i} \delta_{a a_j}\, f_{\text{cut}}(r_{ji})\, g(\mathbf{x} - \mathbf{r}_{ji}) = \sum_{nlm} c^{a}_{nlm}\, R_{nl}(x)\, Y^{m}_{l}(\hat{\mathbf{x}}). \qquad (4)$$

Here, j runs over all neighbors in $A_i$, g is a three-dimensional Gaussian function, and $f_{\text{cut}}$ is a cutoff function, which satisfies $f_{\text{cut}}(r \ge r_{\text{cut}}) = 0$, so that the $A_i$ neighborhoods are effectively restricted by a cutoff radius $r_{\text{cut}}$ while maintaining continuity. $\mathbf{r}_{ji} \in \mathbb{R}^3$ is the position of atom j relative to atom i, and $r_{ji}$ is their distance. The coefficients $c^{a}_{nlm}$ express the discretization of the density on a basis of $n_{\max}$ radial functions $R_{nl}$ and spherical harmonics $Y^{m}_{l}$ that are the basic building blocks of the equivariant models described in Sec. II. It should be noted that a different density $\rho_{i,a}(\mathbf{x})$ is defined for each of the $a_{\max}$ chemical elements in the neighborhood and that the sum only includes neighboring atoms whose chemical element $a_j$ matches a.
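To make the discretization concrete, the following sketch evaluates the coefficients $c^{a}_{nlm}$ in the narrow-Gaussian limit, where g becomes δ-like and each coefficient reduces to the basis functions evaluated at the neighbor positions. The Gaussian radial basis and shifted-cosine cutoff are illustrative choices, not necessarily those used in this work:

```python
import numpy as np
from scipy.special import sph_harm

def f_cut(r, r_cut):
    # shifted-cosine cutoff, smooth and zero at r >= r_cut
    return 0.5 * (np.cos(np.pi * r / r_cut) + 1.0) * (r < r_cut)

def R_nl(n, l, r, r_cut):
    # hypothetical radial basis: Gaussians centered along [0, r_cut]
    # (independent of l here, for simplicity)
    centers = np.linspace(0.0, r_cut, 8)
    return np.exp(-4.0 * (r - centers[n]) ** 2)

def density_coefficient(positions, species, a, n, l, m, r_cut):
    """c^a_{nlm} = sum_j delta_{a a_j} f_cut(r_j) R_nl(r_j) Y_lm(r^_j)."""
    c = 0.0
    for r_vec, a_j in zip(positions, species):
        if a_j != a:
            continue  # only neighbors of chemical element a contribute
        r = np.linalg.norm(r_vec)
        polar = np.arccos(r_vec[2] / r)           # polar angle
        azim = np.arctan2(r_vec[1], r_vec[0])     # azimuthal angle
        # scipy convention: sph_harm(m, l, azimuthal, polar)
        c += f_cut(r, r_cut) * R_nl(n, l, r, r_cut) * sph_harm(m, l, azim, polar)
    return c
```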

These densities can be used to define kernels that fulfill the equivariance condition (3),
$$k^{\nu,\lambda}_{\mu\mu'}(A_i, A_{i'}) = \int_{SO(3)} \mathrm{d}\hat{R}\; D^{\lambda}_{\mu\mu'}(\hat{R})^{*} \left( \sum_{a} \int \mathrm{d}\mathbf{x}\; \rho_{i,a}(\mathbf{x})\, \rho_{i',a}(\hat{R}\mathbf{x}) \right)^{\!\nu}, \qquad (5)$$
where ν will be referred to as the correlation order of the kernel and the other symbols carry the same meaning as in (3). The ν = 2 special case has been used to machine learn tensorial properties of atomistic systems in the work of Grisafi et al.32 The kernels in (5) contain correlated information about at most ν neighbors in each atomic neighborhood ($A_i$ and $A_{i'}$). This is because the density expansion in (4) is a simple sum over neighbors, and it is raised to the power of ν, while all other operations (the inner integral and the rotation) are linear. As a result, these kernels are intrinsically body-ordered: $k^{\nu,\lambda}_{\mu\mu'}$ can describe physical interactions up to body order ν + 1 (the center of the representation and ν neighbors), but not higher.
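The body-order argument can be made explicit with a one-line expansion. Writing the rotated-overlap contribution of neighbor j in (5) schematically as $f_j$, the ν-th power of the neighbor sum reads
$$\Big(\sum_{j \in A_i} f_j\Big)^{\nu} = \sum_{j_1 \in A_i} \cdots \sum_{j_\nu \in A_i} f_{j_1} f_{j_2} \cdots f_{j_\nu},$$
so each term correlates the positions of at most ν (not necessarily distinct) neighbors, and the remaining linear operations cannot increase this count.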

Wigner kernels through Wigner iterations. As detailed in Sec. S3, symmetry-adapted kernels of the form given in (5) can be computed by first evaluating body-ordered equivariant representations [in the form of discretized correlations of the neighbor density (4)] and then computing their scalar products. These are the same representations that underlie MTP, ACE, and NICE feature-space models and that are very closely related to the representations that are implicitly generated by equivariant neural networks.27,28 Such a formulation highlights the positive-semi-definiteness of the kernels; however, performing such computations is impractical for ν > 2: on one hand, kernel regression is then equivalent to linear regression on the starting features; on the other hand, the number of features one needs to compute to evaluate the kernel without approximations grows exponentially with ν.
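The identity that makes the kernel-space route possible is easiest to see for plain, unsymmetrized tensor products, where $\langle f^{\otimes\nu}, f'^{\otimes\nu}\rangle = \langle f, f'\rangle^{\nu}$; the numerical check below (with illustrative sizes) evaluates the same quantity without ever materializing the exponentially large outer products. The equivariant case additionally requires the SO(3) symmetrization of Eq. (5), which is precisely what the Wigner iteration handles:

```python
import numpy as np

rng = np.random.default_rng(0)
f, f_prime = rng.normal(size=100), rng.normal(size=100)

nu = 3
# feature-space route: build the nu-fold tensor products (100**3 entries each)
t, t_prime = f, f_prime
for _ in range(nu - 1):
    t = np.tensordot(t, f, axes=0)
    t_prime = np.tensordot(t_prime, f_prime, axes=0)
k_features = np.sum(t * t_prime)

# kernel-space route: a single scalar product, raised to the power nu
k_kernel = np.dot(f, f_prime) ** nu

assert np.allclose(k_features, k_kernel)
```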

Our main result, which we will refer to as a Wigner iteration, is that high-ν kernels can be computed following an alternative route by combining lower-order kernels iteratively,
$$k^{\nu_1+\nu_2,\lambda}_{\mu\mu'}(A_i, A_{i'}) = \sum_{l_1 l_2} \sum_{m_1 m_2} \sum_{m'_1 m'_2} \langle l_1 m_1 l_2 m_2 | \lambda\mu \rangle \langle l_1 m'_1 l_2 m'_2 | \lambda\mu' \rangle\; k^{\nu_1,l_1}_{m_1 m'_1}(A_i, A_{i'})\; k^{\nu_2,l_2}_{m_2 m'_2}(A_i, A_{i'}), \qquad (6)$$
where $\langle l_1 m_1 l_2 m_2 | \lambda\mu \rangle$ are Clebsch–Gordan coefficients. The proof (shown in Sec. S2) involves expressing the kernel in Eq. (5) in terms of two lower-order terms and recasting the result in terms of lower-order kernels using known relationships between Wigner D-matrices and Clebsch–Gordan coefficients. Therefore, Eq. (6) can be derived simply from the definition of the kernels, without invoking radial-chemical indices at any point. Nevertheless, we also provide an alternative proof (see the supplementary material, Sec. S3), which might be easier to follow for readers who are familiar with feature-space models. Although truncated in its angular parameters, this formulation of the high-order kernels is entirely lossless in terms of the radial basis and the dimension of composition (chemical element) space. Indeed, Sec. S3 shows how the Wigner kernel formulation corresponds to the infinite-width limit of equivariant feature-space linear models and neural networks.
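A minimal dense implementation of Eq. (6) is sketched below; it ignores the parity index σ introduced later, kernels are stored as dictionaries mapping l to (2l+1) × (2l+1) blocks, and the Clebsch–Gordan coefficients are taken from sympy. A production implementation would precompute the coefficients and exploit their sparsity:

```python
import numpy as np
from sympy.physics.quantum.cg import CG

def cg(l1, m1, l2, m2, lam, mu):
    """Clebsch-Gordan coefficient <l1 m1 l2 m2 | lam mu>."""
    return float(CG(l1, m1, l2, m2, lam, mu).doit())

def wigner_iteration(k1, k2, lambda_max):
    """Combine kernels of correlation orders nu1 and nu2 into nu1 + nu2.
    Kernels: {l: array of shape (2l+1, 2l+1)}, k[l][m+l, m'+l]."""
    k_new = {lam: np.zeros((2 * lam + 1, 2 * lam + 1))
             for lam in range(lambda_max + 1)}
    for lam in range(lambda_max + 1):
        for l1 in k1:
            for l2 in k2:
                if not abs(l1 - l2) <= lam <= l1 + l2:
                    continue  # triangle inequality for SO(3) coupling
                for m1 in range(-l1, l1 + 1):
                    for m2 in range(-l2, l2 + 1):
                        mu = m1 + m2  # CG vanishes unless mu = m1 + m2
                        if abs(mu) > lam:
                            continue
                        c_left = cg(l1, m1, l2, m2, lam, mu)
                        for m1p in range(-l1, l1 + 1):
                            for m2p in range(-l2, l2 + 1):
                                mup = m1p + m2p
                                if abs(mup) > lam:
                                    continue
                                c_right = cg(l1, m1p, l2, m2p, lam, mup)
                                k_new[lam][mu + lam, mup + lam] += (
                                    c_left * c_right
                                    * k1[l1][m1 + l1, m1p + l1]
                                    * k2[l2][m2 + l2, m2p + l2]
                                )
    return k_new
```

Starting from the ν = 1 kernels, `k2 = wigner_iteration(k1, k1, lambda_max)` gives ν = 2 and `k4 = wigner_iteration(k2, k2, lambda_max)` gives ν = 4; since the cost of each iteration is independent of ν, high body orders are reached in a few doubling steps.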

In order to initialize the iterations in (6), only the ν = 1 equivariant kernels $k^{1,\lambda}_{\mu\mu'}$ are needed. Their expression, which follows immediately from (4) and (5), is given in Sec. S4, along with details of its cost-effective evaluation. Equivariance with respect to inversion is discussed in Sec. S5, and it results in the incorporation of a parity index σ, so that the full notation for an O(3)-equivariant kernel is $k^{\nu,\lambda\sigma}_{\mu\mu'}$. Finally, we also define one-body, ν = 0 kernels as $k^{0,\lambda\sigma}_{\mu\mu'}(A_i, A_{i'}) = \delta_{\lambda 0}\,\delta_{\sigma 1}\,\delta_{a_i a_{i'}}$, which describe the similarity of two environments exclusively based on the chemical elements of the central atoms $a_i$ and $a_{i'}$.

Scaling and computational cost. The calculation of the atom-centered density correlations that underlie linear and non-linear equivariant point-cloud models entails an exponential scaling of the equivariant feature-set size as a function of $\nu_{\max}$,22,23 which is a consequence of using a radial-chemical basis of size $(a_{\max} n_{\max})$ out of which one effectively computes a sequence of outer products, resulting in a scaling of $\mathcal{O}((a_{\max} n_{\max})^{\nu})$. Computing Wigner kernels as scalar products of such equivariant features (see Sec. S3) would present the same problems and require aggressive truncation of the basis. The calculation through a Wigner iteration can be understood as a tensor-contraction strategy to compute the very same quantity while avoiding the intermediate evaluation of these outer products (see the schematics in Fig. 1), so that it is possible to use a converged basis while achieving a linear scaling with respect to $\nu_{\max}$. The scaling of the Wigner iteration with respect to its hyperparameters is discussed in Sec. S6. Only the angular basis has to be truncated, at a maximum angular momentum order $\lambda_{\max}$, and the scaling is steeper ($\lambda_{\max}^{7}$) relative to traditional SO(3)-symmetrized products ($\lambda_{\max}^{5}$). Fortunately, as we shall see in Sec. IV A, Wigner kernels exhibit excellent performance even with low $\lambda_{\max}$.
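The contrast between the two scalings can be made concrete with a short back-of-the-envelope script; the basis sizes below are hypothetical but representative:

```python
a_max, n_max, lambda_max = 5, 25, 3  # illustrative hyperparameters

for nu in (2, 4, 6):
    n_features = (a_max * n_max) ** nu  # radial-chemical feature channels
    print(f"nu = {nu}: ~{n_features:.1e} feature-space components")
# -> 1.6e+04, 2.4e+08, 3.8e+12: exponential growth with the body order

# one Wigner iteration never touches (a_max, n_max); its per-pair cost
# scales as lambda_max**7 and is the same at every body order
print(f"Wigner iteration: O(lambda_max^7) = O({lambda_max**7}) per pair")
```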

FIG. 1.

A schematic comparison of the calculation of body-ordered representations and kernels through Clebsch–Gordan products (black arrows) and through Wigner iterations (blue arrows), illustrated for the case of the ν = 2 invariant kernel $k^{2,0}$. The former entails a scaling of representation size and computational cost with the square of the basis size, $(a_{\max} n_{\max})^2$. The latter immediately computes a kernel and is therefore independent of basis size. Further feature-space iterations exponentially increase the complexity, whereas kernel-space iterations are independent of the body order ν.


Having discussed the formulation and the theoretical scaling of Wigner kernels, we now proceed to assess their behavior in practical regression tasks, focusing on applications to atomistic machine learning. We refer the reader to Sec. S7 for a discussion of the implementation details and to Sec. S9 for a list of the hyperparameters of the models. We consider four cases that allow us to show the accuracy of our framework and to highlight its relationship with other geometric ML schemes: a system that is expected to exhibit strong many-body effects, one that requires high-resolution descriptors, and two classical benchmark datasets for organic molecules, including regression of a tensorial target. We focus on molecular systems, as the smaller number of atoms per structure reduces the computational effort of performing systematic ablation studies with full-kernel regression. Wigner kernels can be computed in an analogous way for bulk systems, although efficient implementation of regression and inference—particularly when including target gradients—would then benefit greatly from a sparse kernel approximation.

In the first instance, we test the behavior of the Wigner kernels on two datasets that probe, respectively, the relative importance of their body-ordered and angular components, while comparing the proposed model to its most closely related counterparts. It is clear that the scaling properties of the Wigner kernel model discussed in Sec. III make it especially advantageous for systems requiring a high-body-order description of the potential energy surface. Metallic clusters often exhibit non-trivial finite-size effects due to the interplay between surface and bulk states,52 and they have therefore been used in the past as prototypical benchmarks for many-body ML models.53 As a particularly challenging test case, we consider a publicly available dataset54 of MD trajectories of gold clusters of different sizes.55 From these trajectories, we select 105 092 uncorrelated structures for use in this work.

The need for high-body-order terms is clear when comparing results for models based on exponential WKs truncated at different orders of ν (Fig. 2). The ν = 2 and (to a lesser extent) ν = 3 models result in saturating learning curves. A comparison with SOAP-based models reveals the likely source of the increased performance of the Wigner kernels. Indeed, linear SOAP, which is a νmax = 2 model, shows very similar performance to its ν = 2 WK analog. The same is true for squared-kernel SOAP-GPR, which closely resembles the learning curve of a Wigner kernel construction in which νmax = 2 and the resulting kernels are squared; the residual difference is probably due to the different functional form of the two kernels and the presence of higher-l components in the density for SOAP-GPR. A true νmax = 4 kernel, which incorporates all five-body correlations, significantly outperforms both squared-kernel learning curves, demonstrating the advantages of explicit body-ordering. We conclude with a comparison between the νmax = 6 WKs and νmax = 6 Laplacian-eigenbasis (LE) ACE models. For the latter, we used the same radial transform presented in Ref. 49, and we optimized its single hyperparameter. Although it might be possible to further tune the performance of LE-ACE by changing the functional form of the radial transform, the comparison with the Wigner kernel learning curve suggests that the kernel-space basis employed in the Wigner kernels might be advantageous in geometrically inhomogeneous datasets such as this one.

FIG. 2.

Left: Learning curves for the electronic free energy of gold clusters. Different curves correspond to invariant Wigner kernels of increasing body-order, as well as a construction where a linear combination of Wigner kernels up to νmax = 2 is squared. A linear SOAP37 model, a SOAP-GPR33 model built with a squared kernel, and a LE-ACE49 model are also shown. The hyperparameters for all models are discussed in Sec. S9. Right: Learning curves for the energy of random CH4 configurations, comparing different models. Wigner kernels have a maximum correlation order of νmax = 4. The LE-ACE and NICE curves are from the works of Bigi et al.49 and Nigam et al.,23 respectively. Hyperparameters for all other models are discussed in Sec. S9. All results are the average of ten random train/test splits within the respective datasets, where the number of test structures is kept constant at 1000. Figures with error bars relative to these random splits are provided in Sec. S11. We note that REANN50 and PET51 achieved higher accuracy on this dataset by also learning from forces.


As a second example, we test the Wigner kernels on a random gas-phase CH4 dataset,38,56 which we expect to be very challenging for the proposed model: the dataset is intrinsically limited in body order, and its configurations are almost random, so that a kernel basis adapted to the training set provides close to no advantage. More importantly, this dataset requires very careful convergence of the angular basis,28,49 which is problematic in view of the steep $\lambda_{\max}$ scaling of Wigner iterations. Despite all these potential problems, Wigner kernels achieve a remarkable level of accuracy, outperforming SOAP-GPR and NICE, and remaining competitive with LE-ACE despite using only $\lambda_{\max} = 3$.

In light of the aforementioned points, comparing against the PET architecture, which has unlimited angular resolution and body order due to its unconstrained nature, is especially demanding. Despite this, we observe that Wigner kernels outperform it by a large margin in the low-data regime. Similar low-$\lambda_{\max}$ effects have been noticed in many recent efforts to machine-learn interatomic potentials.25,27,44,57 By providing a functional form that spans the full space of density correlations at a given level of angular truncation, Wigner kernels can help rationalize why low-$\lambda_{\max}$ models can perform well. Indeed, due to the form of the Wigner iterations, $k^{(\nu)}$ does not report exclusively on (ν + 1)-body correlations, but also on all lower-order ones, and the tensor-product form of the kernel space incorporates higher-frequency components in its functional form, much like $\sin^2(\omega x)$ contains components with frequency 2ω. We investigate and confirm this hypothesis in Sec. S10 by decomposing the angular dependence of high-ν kernels into their frequency components. This explains why aggressively truncated equivariant ML models25,58 can achieve high accuracy in the prediction of interatomic potentials.
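The frequency-doubling analogy is easy to verify numerically; the snippet below confirms that squaring a band-limited signal introduces a component at twice the original frequency:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
omega = 3
signal = np.sin(omega * x) ** 2          # sin^2(wx) = (1 - cos(2wx)) / 2
spectrum = np.abs(np.fft.rfft(signal)) / len(x)
print(np.nonzero(spectrum > 1e-10)[0])   # -> [0 6]: constant + 2*omega term
```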

We continue our investigation with the rMD17 dataset,59 which assesses the accuracy achieved by models when learning the potential energy surfaces of small organic molecules. When using the derivative learning scheme of Chmiela et al.,29 Wigner kernels systematically outcompete the ACE implementation that performs best on this benchmark (LE-ACE, Bigi et al.49), showing the advantages of a fully converged radial-chemical description (Table I). The proposed model is also competitive in accuracy with equivariant neural networks, such as NequIP25 and MACE,43 while operating at a reduced computational cost. Using atomic neighborhoods as support points rather than full structures would further reduce the cost of the Wigner kernels by roughly an order of magnitude [by eliminating the sum over i′ in (1)] while causing little to no deterioration in accuracy. Exploiting the sparsity of the Clebsch–Gordan matrices, as is done in MACE, would also improve the efficiency of the proposed model. Finally, it is also worth mentioning that, similar to ACE and Allegro,44 but unlike NequIP and MACE, Wigner kernels are entirely local, as they do not incorporate message-passing operations. This greatly simplifies the parallelization of inference for large-scale calculations.

Wigner kernels avoid the unfavorable scaling of traditional body-ordered models with respect to the number of chemical elements in the system. This property is particularly useful when dealing with chemically diverse datasets. An example is that of the popular QM9 dataset,71 which contains five elements (H, C, N, O, and F). We build Wigner kernel models for two atomic-scale properties within this dataset, and to illustrate the transferability of our model, we use the same hyperparameters for both fits (see Sec. S9).

Molecular dipoles. We begin the investigation with a covariant learning exercise, which consists of learning the dipole moment vectors μ of the molecules in the QM9 dataset.66 In the small-data regime, Wigner kernels perform similarly to the optimized λ-SOAP kernels of Veit et al.,66 but they completely avoid the saturation at larger training-set sizes (Fig. 3). The improved performance of the Wigner kernels is a clear indication of the higher descriptive power that is afforded by the use of a fully body-ordered equivariant kernel, as opposed to the combination of a linear covariant ν = 2 kernel and a non-linear scalar kernel that is used in current applications of SA-GPR. The need for a high-body-order and high-basis-resolution framework is also clear in the performance of the PET model, which, despite not being exactly equivariant, outperforms all other models in the large-dataset regime, including Wigner kernels.

FIG. 3.

Left: Learning curves for the prediction of molecular dipole moments in the QM9 dataset. Different curves correspond to FCHL kernels,30 the dipole models presented by Veit et al.,66 and Wigner kernels. It should be noted how the models that use atomic charges can account for the macroscopic component of the dipole moment that arises due to charge separation, while the others can only predict dipole moments as a sum of local atom-centered contributions. The dashed line in the WK learning curve represents a change in the fitting procedure: the points before the dashed line are obtained as highlighted in Sec. S7, while the points after the dashed line are obtained with the same cross-validation procedure, but using a less expensive two-dimensional grid search over the kernel mixing parameters (Sec. S7 B). The accuracy of the model does not seem to be affected by this change. Right: Selection of the best QM9 literature models for which learning curves are available: FCHL,30 SOAP-GPR,42 aSLATM,67 PhysNet,68 SchNet,69 NICE,23 MTP,39 and GM-sNN.70 A few selected neural networks whose learning curves are not available are also shown on the full QM9 dataset (right-most isolated points); more are available in Table II. The dashed line in the WK learning curve represents a change in the fitting procedure. The points to its left are obtained by averaging ten runs with random train/test splits, where 1000 test structures are employed, and cross-validation is conducted within the training set as described in Sec. S7. Instead, for consistency with the literature models trained on the full QM9 dataset (Table II), the last point is averaged over 16 random train/validation/test splits where validation is conducted on a dedicated validation set via a grid search, as discussed in the caption of Table II. Figures with error bars relative to these random splits are provided in Sec. S11.

TABLE I.

Performance comparison of the Wigner kernel model with the best literature models on the rMD17 dataset59 in its smaller version (50 randomly selected training structures). Accuracies of energies (E) and forces (F) are given as MAE in meV and meV/Å, respectively.

Molecule          LE-ACE        NequIP        MACE          WK
                  E      F      E      F      E      F      E      F
Aspirin           22.4   59.1   19.5   52.0   17.0   43.9   17.0   50.2
Azobenzene        9.9    27.5   6.0    20.0   5.4    17.7   7.9    25.6
Benzene           0.135  1.44   0.6    2.9    0.7    2.7    0.131  1.31
Ethanol           6.6    32.0   8.7    40.3   6.7    32.6   5.9    30.8
Malonaldehyde     11.3   50.9   12.7   52.5   10.0   43.3   8.9    43.8
Naphthalene       2.9    13.9   2.1    10.0   2.1    9.2    2.5    12.5
Paracetamol       14.3   45.1   14.3   39.7   9.7    31.5   10.2   37.2
Salicylic acid    8.3    36.7   8.0    35.0   6.5    28.4   6.8    31.9
Toluene           4.1    18.4   3.3    15.1   3.1    12.1   3.4    16.4
Uracil            5.7    30.7   7.3    40.1   4.4    25.9   5.1    27.8
Energies. Finally, we test the Wigner kernel model on the ground-state energies of the QM9 dataset. The corresponding learning curves are shown in Fig. 3. Wigner kernels significantly improve on other kernel methods, such as SOAP and FCHL, in the low-data regime. As in the case of CH4, the WK model is truncated at a low angular threshold ($\lambda_{\max} = 3$). However, the corresponding learning curve shows no signs of saturation, possibly for the same reasons we highlighted in Sec. IV A. Similarly, a relatively low maximum body order (νmax = 4) does not seem to impact the accuracy of the model, most likely because stable organic molecules have, with few exceptions, atoms with only up to four nearest neighbors. On the full QM9 dataset, Wigner kernels also achieve state-of-the-art accuracy, as shown in the last point of the WK learning curve and in Table II. The remarkable performance of the Wigner kernels on this exercise shows that the proposed model is well suited to tasks such as the screening of pharmaceutical targets or the prediction of chemical shifts from single equilibrium configurations. This stands in contrast to the other datasets we have investigated, which are better suited to assessing the quality of a model in approximating a property surface for atomistic simulations.

TABLE II.

Performance comparison of the Wigner kernel model with a selection of the best literature models on the full QM9 dataset, as presented by Musaelian et al.44 Accuracies are given as MAE in meV. The WK values are the mean and standard deviation of 16 runs on different random train/validation/test splits. In particular, the training set contains 110 000 random structures, the validation set contains another 10 000, and all the remaining QM9 structures constitute the test set, for consistency with the work of Musaelian et al.44

Model             U0          U           H           G
NoisyNodes60      7.3         7.6         7.4         8.3
SphereNet61       6.3         6.4         6.3         7.8
DimeNet++62       6.3         6.3         6.5         7.6
ET63              6.2         6.4         6.2         7.6
PaiNN58           5.9         5.8         6.0         7.4
Allegro44         4.7 (0.2)   4.4         4.4         5.7
MACE64            4.1         4.1         4.7         5.5
TensorNet65       3.9 (0.3)   3.9 (0.1)   4.0 (0.2)   5.7 (0.1)
Wigner Kernels    4.2 (0.3)   4.1 (0.3)   4.1 (0.3)   5.8 (0.3)

In this work, we have presented the Wigner iteration as a practical tool to construct rotationally equivariant “Wigner kernels” for use in symmetry-adapted Gaussian process regression on 3D point clouds. We have then applied them to machine learn the atomistic properties of molecules and clusters. The proposed kernels are explicitly body-ordered (i.e., they provide explicit universal approximation capabilities22 for properties that simultaneously depend on the correlations between the positions of ν + 1 points) and can be thought of as the kernels corresponding to the infinite-width limit of several families of body-ordered models. This extends the well-known equivalence between infinitely wide neural networks and Gaussian processes72–74 from a statistical context to that of geometric representations. Whereas the full feature-space evaluation of body-ordered models leads to an exponential increase of the cost with ν, a kernel-space evaluation is naturally adapted to the training structures, and it avoids the explosion in the number of equivariant features that arises from the use of an explicit radial-chemical basis. The scaling properties of the Wigner iterations make the new model particularly suitable for datasets that are chemically diverse, that are expected to contain strong high-body-order effects, and/or that involve a very inhomogeneous distribution of molecular geometries.

Our benchmarks demonstrate the excellent performance of KRR models based on Wigner iterations on a variety of different atomistic problems. The ablation studies on gold clusters and gas-phase methane molecules fully reveal the strengths and weaknesses of the proposed model. In particular, the results for a random CH4 dataset suggest that Wigner kernels incorporate high-resolution basis functions even when they are built with a moderate angular momentum threshold, which is reassuring, given the steep scaling of the computational cost with λmax. The chemically diverse rMD17 and QM9 datasets allow us to show the state-of-the-art performance of the proposed model when learning energies, forces, and vectorial dipole moments. The fact that a kernel model can match the performance of extensively tuned equivariant neural networks testifies to the importance of understanding the connection between body-ordered correlations, the choice and truncation of a feature-space basis, and the introduction of scalar non-linearities in equivariant models.

Besides this fundamental role in testing the complete-basis limit of density-correlation models, it is clear that Wigner iterations can be incorporated into practical applications. Our model achieves high efficiency on small molecules, and a sparse kernel formalism will allow us to further reduce its computational cost and apply the model to much larger systems. Finally, the Wigner iteration could also be applied outside a pure kernel regression framework: from the calculation of non-linear equivariant functions, to use in Gaussian process classifiers,75 to inclusion as a layer in an equivariant architecture, the ideas we present here open up an original research direction in the construction of symmetry-adapted, physically inspired models for chemistry, materials science, and, more generally, any application whose inputs can be conveniently described in terms of a 3D point cloud.

The supplementary material contains several detailed derivations for the results reported in the main text, details of the models, and further benchmarks.

The authors would like to thank Jigyasa Nigam and Kevin Huguenin-Dumittan for stimulating discussions. M.C. and F.B. acknowledge support from the NCCR MARVEL, funded by the Swiss National Science Foundation (SNSF, Grant No. 182892). M.C. and S.N.P. acknowledge support from the Swiss Platform for Advanced Scientific Computing (PASC).

The authors have no conflicts to disclose.

Filippo Bigi: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (lead); Software (lead); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). Sergey N. Pozdnyakov: Conceptualization (equal); Formal analysis (equal); Investigation (supporting); Software (supporting); Writing – original draft (supporting). Michele Ceriotti: Conceptualization (equal); Funding acquisition (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

The code used to generate the results for the Wigner kernel model is available at https://doi.org/10.5281/zenodo.7952084. Hyperparameters for all the numerical experiments are given in Sec. S9. The data used here are available from the cited references.

1. S. Gumhold, X. Wang, R. S. MacLeod et al., “Feature extraction from point clouds,” in Proceedings of the 10th International Meshing Roundtable (Sandia National Laboratories, 2001), pp. 293–305.
2. Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep learning for 3D point clouds: A survey,” IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4338–4364 (2021).
3. Y. Li, L. Ma, Z. Zhong, F. Liu, M. A. Chapman, D. Cao, and J. Li, “Deep learning for LiDAR point clouds in autonomous driving: A review,” IEEE Trans. Neural Networks Learn. Syst. 32(8), 3412–3432 (2021).
4. W. Wu, Z. Qi, and L. Fuxin, “PointConv: Deep convolutional networks on 3D point clouds,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, Long Beach, 2019), pp. 9613–9622.
5. M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,” arXiv:2104.13478 (2021).
6. G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, “Machine learning and the physical sciences,” Rev. Mod. Phys. 91(4), 045002 (2019).
7. M. Ceriotti, C. Clementi, and O. Anatole von Lilienfeld, “Introduction: Machine learning at the atomic scale,” Chem. Rev. 121(16), 9719–9721 (2021).
8. P. Gainza, F. Sverrisson, F. Monti, E. Rodolà, D. Boscaini, M. M. Bronstein, and B. E. Correia, “Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning,” Nat. Methods 17(2), 184–192 (2020).
9. A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons,” Phys. Rev. Lett. 104(13), 136403 (2010).
10. J. Behler and M. Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Phys. Rev. Lett. 98(14), 146401 (2007).
11. F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K. R. Müller, “Bypassing the Kohn-Sham equations with machine learning,” Nat. Commun. 8(1), 872 (2017).
12. M. Ceriotti, “Beyond potentials: Integrated machine learning models for materials,” MRS Bull. 47(10), 1045–1053 (2022).
13. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in International Conference on Machine Learning (The International Machine Learning Society, 2017), pp. 1263–1272.
14. M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast and accurate modeling of molecular atomization energies with machine learning,” Phys. Rev. Lett. 108(5), 058301 (2012).
15. T. Bereau, D. Andrienko, and O. A. von Lilienfeld, “Transferable atomic multipole machine learning models for small organic molecules,” J. Chem. Theory Comput. 11(7), 3225–3233 (2015).
16. A. Glielmo, P. Sollich, and A. De Vita, “Accurate interatomic force fields via machine learning with covariant kernels,” Phys. Rev. B 95(21), 214302 (2017).
17. M. W. Finnis and J. E. Sinclair, “A simple empirical N-body potential for transition metals,” Philos. Mag. A 50(1), 45–55 (1984).
18. A. P. Horsfield, A. M. Bratkovsky, M. Fearn, D. G. Pettifor, and M. Aoki, “Bond-order potentials: Theory and implementation,” Phys. Rev. B 53(19), 12694–12712 (1996).
19. G. R. Medders, A. W. Götz, M. A. Morales, P. Bajaj, and F. Paesani, “On the representation of many-body interactions in water,” J. Chem. Phys. 143(10), 104102 (2015).
20. J. Sanchez, F. Ducastelle, and D. Gratias, “Generalized cluster description of multicomponent systems,” Physica A 128(1-2), 334–350 (1984).
21. R. Drautz, “Atomic cluster expansion for accurate and transferable interatomic potentials,” Phys. Rev. B 99(1), 014104 (2019).
22. G. Dusson, M. Bachmayr, G. Csányi, R. Drautz, S. Etter, C. van der Oord, and C. Ortner, “Atomic cluster expansion: Completeness, efficiency and stability,” J. Comput. Phys. 454, 110946 (2022).
23. J. Nigam, S. Pozdnyakov, and M. Ceriotti, “Recursive evaluation and iterative contraction of N-body equivariant features,” J. Chem. Phys. 153(12), 121101 (2020).
24. B. Anderson, T. S. Hy, and R. Kondor, “Cormorant: Covariant molecular neural networks,” in NeurIPS (ACM, 2019), p. 10.
25. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials,” Nat. Commun. 13(1), 2453 (2022).
26. N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, “Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds,” arXiv:1802.08219 (2018).
27. I. Batatia, S. Batzner, D. P. Kovács, A. Musaelian, G. N. C. Simm, R. Drautz, C. Ortner, B. Kozinsky, and G. Csányi, “The design space of E(3)-equivariant atom-centered interatomic potentials,” arXiv:2205.06643 (2022).
28. J. Nigam, S. Pozdnyakov, G. Fraux, and M. Ceriotti, “Unified theory of atom-centered representations and message-passing machine-learning schemes,” J. Chem. Phys. 156(20), 204115 (2022).
29. S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, “Machine learning of accurate energy-conserving molecular force fields,” Sci. Adv. 3(5), e1603015 (2017).
30. F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, “Alchemical and structural distribution based representation for universal quantum machine learning,” J. Chem. Phys. 148(24), 241717 (2018).
31. A. Glielmo, C. Zeni, and A. De Vita, “Efficient nonparametric n-body force fields from machine learning,” Phys. Rev. B 97(18), 184307 (2018).
32. A. Grisafi, D. M. Wilkins, G. Csányi, and M. Ceriotti, “Symmetry-adapted machine learning for tensorial properties of atomistic systems,” Phys. Rev. Lett. 120(3), 036002 (2018).
33. V. L. Deringer, A. P. Bartók, N. Bernstein, D. M. Wilkins, M. Ceriotti, and G. Csányi, “Gaussian process regression for materials and molecules,” Chem. Rev. 121(16), 10073–10141 (2021).
34. M. Marcon, R. Spezialetti, S. Salti, L. Silva, and L. D. Stefano, “Unsupervised learning of local equivariant descriptors for point clouds,” IEEE Trans. Pattern Anal. Mach. Intell. 44(12), 9687–9702 (2022).
35. F. Musil, A. Grisafi, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, “Physics-inspired structural representations for molecules and materials,” Chem. Rev. 121(16), 9759–9815 (2021).
36. C. van der Oord, G. Dusson, G. Csányi, and C. Ortner, “Regularised atomic body-ordered permutation-invariant polynomials for the construction of interatomic potentials,” Mach. Learn.: Sci. Technol. 1(1), 015004 (2020).
37. A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Phys. Rev. B 87(18), 184115 (2013).
38. S. N. Pozdnyakov, M. J. Willatt, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, “Incompleteness of atomic structure representations,” Phys. Rev. Lett. 125, 166001 (2020).
39. A. V. Shapeev, “Moment tensor potentials: A class of systematically improvable interatomic potentials,” Multiscale Model. Simul. 14(3), 1153–1173 (2016).
40. J. Nigam, M. J. Willatt, and M. Ceriotti, “Equivariant representations for molecular Hamiltonians and N-center atomic-scale properties,” J. Chem. Phys. 156(1), 014115 (2022).
41. J. P. Darby, D. P. Kovács, I. Batatia, M. A. Caro, G. L. W. Hart, C. Ortner, and G. Csányi, “Tensor-reduced atomic density representations,” arXiv:2210.01705 (2022).
42. M. J. Willatt, F. Musil, and M. Ceriotti, “Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements,” Phys. Chem. Chem. Phys. 20(47), 29661–29668 (2018).
43. I. Batatia, D. P. Kovacs, G. N. C. Simm, C. Ortner, and G. Csanyi, “MACE: Higher order equivariant message passing neural networks for fast and accurate force fields,” in Advances in Neural Information Processing Systems, edited by A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho (The NeurIPS Foundation, 2022).
44. A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky, “Learning local equivariant representations for large-scale atomistic dynamics,” Nat. Commun. 14(1), 579 (2023).
45. E. Prodan and W. Kohn, “Nearsightedness of electronic matter,” Proc. Natl. Acad. Sci. U. S. A. 102(33), 11635–11638 (2005).
46. K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, “Kernel mean embedding of distributions: A review and beyond,” Found. Trends Mach. Learn. 10(1-2), 1–141 (2017).
47. S. De, A. P. Bartók, G. Csányi, and M. Ceriotti, “Comparing molecules and solids across structural and alchemical space,” Phys. Chem. Chem. Phys. 18(20), 13754–13769 (2016).
48. M. J. Willatt, F. Musil, and M. Ceriotti, “Atom-density representations for machine learning,” J. Chem. Phys. 150(15), 154110 (2019).
49. F. Bigi, K. K. Huguenin-Dumittan, M. Ceriotti, and D. E. Manolopoulos, “A smooth basis for atomistic machine learning,” J. Chem. Phys. 157(23), 234101 (2022).
50. Y. Zhang, J. Xia, and B. Jiang, “REANN: A PyTorch-based end-to-end multi-functional deep neural network package for molecular, reactive, and periodic systems,” J. Chem. Phys. 156(11), 114801 (2022).
51. S. N. Pozdnyakov and M. Ceriotti, “Smooth, exact rotational symmetrization for deep learning on point clouds,” arXiv:2305.19302 (2023).
52. L. Li, A. H. Larsen, N. A. Romero, V. A. Morozov, C. Glinsvad, F. Abild-Pedersen, J. Greeley, K. W. Jacobsen, and J. K. Nørskov, “Investigation of catalytic finite-size-effects of platinum metal clusters,” J. Phys. Chem. Lett. 4(1), 222–226 (2013).
53. C. Zeni, K. Rossi, A. Glielmo, Á. Fekete, N. Gaston, F. Baletto, and A. De Vita, “Building machine learning force fields for nanoclusters,” J. Chem. Phys. 148(24), 241739 (2018).
54. B. Goldsmith and L. Ghiringhelli, Dataset: Gold Clusters REMD 5-14 Atoms (2016).
55. B. R. Goldsmith, J. Florian, J.-X. Liu, P. Gruene, J. T. Lyon, D. M. Rayner, A. Fielicke, M. Scheffler, and L. M. Ghiringhelli, “Two-to-three dimensional transition in neutral gold clusters: The crucial role of van der Waals interactions and temperature,” Phys. Rev. Mater. 3(1), 016002 (2019).
56. S. Pozdnyakov, M. Willatt, and M. Ceriotti, Dataset: Randomly-Displaced Methane Configurations (2020).
57. I. Batatia, D. P. Kovács, G. N. Simm, C. Ortner, and G. Csányi, “MACE: Higher order equivariant message passing neural networks for fast and accurate force fields,” arXiv:2206.07697 (2022).
58. K. Schütt, O. Unke, and M. Gastegger, “Equivariant message passing for the prediction of tensorial properties and molecular spectra,” in International Conference on Machine Learning (PMLR, 2021), pp. 9377–9388.
59. A. S. Christensen and O. A. von Lilienfeld, “On the role of gradients for machine learning of molecular energies and forces,” Mach. Learn.: Sci. Technol. 1(4), 045018 (2020).
60. J. Godwin, M. Schaarschmidt, A. Gaunt, A. Sanchez-Gonzalez, Y. Rubanova, P. Veličković, J. Kirkpatrick, and P. Battaglia, “Simple GNN regularisation for 3D molecular property prediction and beyond,” arXiv:2106.07971 (2021).
61. Y. Liu, L. Wang, M. Liu, X. Zhang, B. Oztekin, and S. Ji, “Spherical message passing for 3D graph networks,” arXiv:2102.05013 (2021).
62. J. Klicpera, S. Giri, J. T. Margraf, and S. Günnemann, “Fast and uncertainty-aware directional message passing for non-equilibrium molecules,” arXiv:2011.14115 (2020).
63. P. Thölke and G. De Fabritiis, “Equivariant transformers for neural network based molecular potentials,” in International Conference on Learning Representations (2022).
64. D. P. Kovacs, I. Batatia, E. S. Arany, and G. Csanyi, “Evaluation of the MACE force field architecture: From medicinal chemistry to materials science,” arXiv:2305.14247 (2023).
65. G. Simeon and G. De Fabritiis, “TensorNet: Cartesian tensor representations for efficient learning of molecular potentials,” arXiv:2306.06482 (2023).
66. M. Veit, D. M. Wilkins, Y. Yang, R. A. DiStasio, and M. Ceriotti, “Predicting molecular dipole moments by combining atomic partial charges and atomic dipoles,” J. Chem. Phys. 153(2), 024113 (2020).
67. B. Huang and O. A. von Lilienfeld, “Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity,” J. Chem. Phys. 145(16), 161102 (2016).
68. O. T. Unke and M. Meuwly, “PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges,” J. Chem. Theory Comput. 15(6), 3678–3693 (2019).
69. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, “SchNet – A deep learning architecture for molecules and materials,” J. Chem. Phys. 148(24), 241722 (2018).
70. V. Zaverkin and J. Kästner, “Gaussian moments as physically inspired molecular descriptors for accurate and scalable machine learning potentials,” J. Chem. Theory Comput. 16(8), 5410–5421 (2020).
71. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Sci. Data 1(1), 140022 (2014).
72. J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “Deep neural networks as Gaussian processes,” arXiv:1711.00165 (2017).
73. R. M. Neal, “Priors for infinite networks,” in Bayesian Learning for Neural Networks (Springer, 1996), pp. 29–53.
74. C. Williams, “Computing with infinite networks,” in Advances in Neural Information Processing Systems (The NeurIPS Foundation, 1996), Vol. 9.
75. C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006).