Molecular fingerprints, i.e., feature vectors describing atomistic neighborhood configurations, are an important abstraction and a key ingredient for data-driven modeling of potential energy surfaces and interatomic forces. In this paper, we present the density-encoded canonically aligned fingerprint algorithm, which is robust and efficient, for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed through the superimposition of smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior to principal component analysis-based methods, especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the “distance” between the density fields be measured using a volume integral of their pointwise difference. This can be efficiently computed using optimal quadrature rules, which require discrete sampling at only a small number of grid points. We also experiment with the choice of weight functions for constructing the density fields and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.
I. INTRODUCTION
Molecular Dynamics (MD) simulations have been widely used for studying atomistic systems, e.g., proteins and catalysts, due to their ability to precisely capture transient events and to predict macroscopic properties from microscopic details.1,2 In its most prevalent implementation, the trajectory of an atomistic system is integrated in time according to Newton’s laws of motion, using forces calculated as the negative gradient of a Hamiltonian, whose functional form and parameters are collectively referred to as a force field.3–5 Traditionally, the pairwise and many-body terms that comprise a force field are derived empirically by fitting to quantum mechanical (QM) calculations and experimental data.
Three properties directly relate to the applicability of a force field: accuracy, transferability, and complexity.6,7 Over the years, a large number of force fields have been developed, each placing a particular emphasis on these three properties. However, the combinatorial complexity of atomistic systems can easily outpace force field development efforts, the difficulty of which explodes following the curse of dimensionality.8 A deceptively simple system that demonstrates the situation is water, a triatomic molecule with a well-characterized molecular structure. Indeed, common water models, such as SPC/E, TIP3P, and TIP4P, have only succeeded in reproducing a small number of structural and dynamical properties of water due to the difficulty in modeling strong intermolecular many-body effects such as hydrogen bonding and polarization.9,10
In lieu of a force field, QM calculations can be employed straightforwardly to drive molecular dynamics simulations. The method achieves significantly better accuracy and transferability by solving for the electronic structure of the system. However, the computational complexity of QM methods is at least cubic in the number of electrons, and consequently the time and length scales accessible by QM-driven molecular dynamics are severely constrained.
Assuming that there is smoothness in the potential energy surface of the atomistic system, one possible strategy to accelerate QM-driven molecular dynamics is to use QM calculations on only a subset of the time steps and to interpolate between similar atomic configurations.11–14 A schematic overview of the process is given in Fig. 1; the strategy is enabled by the recent development of high-dimensional nonlinear statistical learning and regression techniques such as Gaussian process regression (GPR)15 and artificial neural networks.16
In the pipeline of machine learning-driven molecular computations, atomistic neighborhood configurations are transformed into feature vectors, called fingerprints, and used to train non-linear regression models.
This paper focuses on a particular aspect of the machine-learning-driven molecular computation pipeline, i.e., fingerprint algorithms, whose importance arises naturally from the aforementioned regression protocol. A fingerprint is an encoding of an atomistic configuration that can facilitate regression tasks such as similarity comparison across structures consisting of variable numbers of atoms and elements. As has been pointed out previously,12 a good fingerprint should possess the following properties:
It can be encoded as a fixed-length vector so as to facilitate regression (particularly for artificial neural networks).
It is complete, i.e., different atomistic neighborhood configurations lead to different fingerprints and vice versa, and the “distance” between the fingerprints should be proportional to the intrinsic difference between the atomistic neighborhood configurations.
It is continuous with regard to atomistic coordinates, and the change in fingerprint should be approximately proportional to the structural variation as characterized by, for example, some internal coordinates.
It is invariant under permutation, rotation, and translation.
It is computationally feasible and straightforward to implement.
Before we proceed to the details of our work, we will first briefly review several fingerprints that are closely related to our work, i.e., the Smooth Overlap of Atomic Positions (SOAP) kernel,12 the Coulomb matrix,17 and the Graph Approximated Energy (GRAPE) kernel.18
Smooth Overlap of Atomic Positions (SOAP): The SOAP kernel is built on the idea of representing atomistic neighborhoods as smoothed density fields using Gaussian kernels, each centered at a neighbor atom. Similarity is measured as the inner product between density fields, while rotational invariance is achieved by integrating over all possible 3D rotations, which can be performed analytically using the power spectrum of the density field. In fact, our fingerprint algorithm is inspired by this idea of treating atoms as smoothed density fields. However, we take a different approach to endow the fingerprint with rotational invariance and use the Euclidean distance instead of the inner product as a distance metric.
Coulomb matrix: The practice of using graphs to represent atomistic neighbor configurations was first implied by the Coulomb matrix and later further formulated in the GRAPE kernel, where the diffusion distance was proposed as a similarity measure between different local chemical environments.19 The idea is to construct an undirected, unlabeled graph G = (V, E) with atoms serving as the vertices and pairwise interactions weighting the edges. For example, the Coulomb matrix can be treated as a physically inspired Laplacian matrix,20 where the degree matrix D encodes a polynomial fit of atomic energies to the nuclear charge, while the adjacency matrix A corresponds to the Coulombic interactions between all pairs of atoms. Because only relative positions between atoms enter the adjacency matrix, the Coulomb matrix is automatically invariant under translation and rotation. However, the matrix itself is not invariant under permutation, as swapping the order of two atoms exchanges the corresponding rows and columns. To address this, the sorted list of eigenvalues of the Coulomb matrix can be used as a feature vector instead, with an ℓp norm serving as a distance metric. In practice, because the number of neighbor atoms may vary, the shorter eigenvalue list is padded with zeros in a distance computation.
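To make the eigenvalue-based recipe concrete, the sketch below builds the standard Coulomb matrix (with the commonly used 0.5 Z^2.4 diagonal), extracts its sorted eigenvalue spectrum, and compares two spectra with zero padding. The helper names are our own, and the sketch is illustrative rather than the reference implementation.

```python
import numpy as np

def coulomb_eigen_fingerprint(Z, X):
    """Sorted eigenvalue spectrum of the Coulomb matrix (illustrative helper)."""
    Z = np.asarray(Z, dtype=float)
    X = np.asarray(X, dtype=float)
    # Pairwise distances; guard the diagonal against division by zero.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    M = np.outer(Z, Z) / np.where(D > 0, D, 1.0)
    # Diagonal: polynomial fit of atomic energy to nuclear charge (0.5 * Z^2.4).
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    # Eigenvalues are invariant under simultaneous row/column permutation.
    return np.sort(np.linalg.eigvalsh(M))[::-1]

def fingerprint_distance(f1, f2, p=2):
    """lp distance between spectra; the shorter list is zero-padded."""
    n = max(len(f1), len(f2))
    a = np.pad(f1, (0, n - len(f1)))
    b = np.pad(f2, (0, n - len(f2)))
    return np.linalg.norm(a - b, ord=p)
```

Because the spectrum is invariant under simultaneous row/column permutation, reordering the atoms leaves the fingerprint unchanged, which is exactly the property sorting is meant to deliver.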
Graph Approximated Energy (GRAPE): The GRAPE kernel evaluates simultaneous random walks on the direct product of the two graphs representing two atomistic neighborhood configurations. Permutational invariance is achieved by choosing a uniform starting and stopping distribution across the nodes of both graphs. However, the distance computation between two graphs carries a substantial polynomial cost, in addition to a one-time per-graph diagonalization cost.
In Secs. II–V, we present our new fingerprint algorithm, namely, the Density-Encoded Canonically Aligned Fingerprint (DECAF). The paper is organized as follows: in Sec. II, we introduce a robust algorithm that can determine canonical coordinate frames for obtaining symmetry-invariant projections; in Sec. III, we present numerical recipes to use smoothed atomistic density fields as a fingerprint for molecular configuration; in Sec. IV, we demonstrate the capability of the fingerprint via examples involving the regression of atomistic potential energy surfaces; in Sec. V, we discuss the connection between our algorithm and previously proposed ones; and we conclude with a discussion in Sec. VI.
II. LOCALIZED CANONICAL COORDINATE FRAME FOR ROTATIONALLY INVARIANT DESCRIPTION OF ATOMISTIC NEIGHBORHOOD
A. Kernel minisum approach
To improve model generalization while minimizing data redundancy, a fingerprint algorithm should be able to recognize atomistic structures that differ only by a rigid-body transformation or a permutation of atoms of the same element and to extract feature vectors invariant under these transformations. As summarized in Table I, a variety of strategies have been successfully employed by common fingerprint algorithms to achieve rotational invariance.
TABLE I. Comparison of strategies used by fingerprint algorithms to obtain feature vectors which are invariant under translation, permutation, and rotation.

| Fingerprint | Translation | Permutation | Rotation |
| --- | --- | --- | --- |
| Coulomb matrix17 | Relative distance | Sorting eigenvalues | All vs. all graph |
| Behler11 | Relative distance | Summation | Ignoring angular information |
| SOAP12 | Relative distance | Summation | Integrating over all rotations |
| GRAPE18 | Relative distance | Uniform distribution | Uniform distribution |
However, these approaches do not provide a means for acquiring vector-valued quantities in a rotationally invariant form. One approach is to acquire and interpolate only the potential energy, a scalar quantity, and then take the derivative of the regression model. This approach, however, triggers the need for methods to decompose the total energy among the atoms, since energy is a property of the entire system rather than of individual atoms.21
Another approach proposed by Li et al.13,22 is to project vector quantities onto a potentially overcomplete set of non-orthogonal basis vectors obtained from a weighted sum of the atomic coordinate vectors,

Vk = Σ_{i=1}^{N} pk(ri) xi,

where the pk(·) are radial weight functions and ri = ∥xi∥.
However, the approach may suffer from robustness issues. For example, all of the Vk generated with different pk will point in the same direction if the radial distances of the atoms are all equal. Further, the configuration with 4 atoms at (r cos ε, r sin ε), (0, r), (−r, 0), and (0, −r) leads to

Vk = pk(r) r (cos ε − 1, sin ε).
Thus, as ε approaches zero, Vk will always point toward either (0, 1) or (0, −1), even though the vector quantity of interest may point in other directions.
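The degeneracy can be verified numerically. In the sketch below, pk is taken as a hypothetical exponential radial weight, pk(r) = exp(−r/pk), purely for illustration; since all four atoms sit at the same radius, every choice of pk yields exactly the same direction, which approaches (0, 1) as ε → 0.

```python
import numpy as np

def basis_vector(X, p):
    # Hypothetical radial weight p_k(r) = exp(-r / p); for illustration only.
    r = np.linalg.norm(X, axis=1)
    return (np.exp(-r / p)[:, None] * X).sum(axis=0)

eps, r = 1e-3, 1.0
X = np.array([[r * np.cos(eps), r * np.sin(eps)],
              [0.0, r],
              [-r, 0.0],
              [0.0, -r]])

directions = []
for p in (0.5, 1.0, 2.0):
    v = basis_vector(X, p)
    directions.append(v / np.linalg.norm(v))
# All three weightings collapse onto the same direction, close to (0, 1).
```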
Here, we present a robust kernel principal component analysis (PCA)-inspired algorithm for the explicit determination of a canonical coordinate frame, within which the projection of the atomistic neighborhood is invariant under rigid-body rotation. Furthermore, the canonical coordinate frame can be directly used to capture vector-valued quantities in a rotationally invariant form. Given N atoms with positions xi, i = 1, 2, …, N, we first formulate the Lp PCA algorithm as an optimization problem in which we seek a unit vector w* that maximizes the sum of the projections,

w* = arg max_{∥w∥=1} Σ_{i=1}^{N} |wᵀxi|^p = arg max_{∥w∥=1} Σ_{i=1}^{N} ri^p |wᵀei|^p,
where ri = ∥xi∥ is the distance from the origin to atom i and ei = xi/ri is the unit vector pointing toward atom i. The optimization can only determine the orientation of a projection vector up to a line, because |wᵀei|^p = |(−w)ᵀei|^p. As a consequence, further heuristics are needed to identify a specific direction for the PCA vectors.
To overcome this difficulty, we generalize the term ri^p into a weight function g(r) of the radial distance and the term |wᵀei|^p into a bivariate kernel function κ(w, e) between two unit vectors. We then seek a unit vector w* that minimizes the kernel summation,

w* = arg min_{∥w∥=1} A(w), A(w) ≐ Σ_{i=1}^{N} g(ri) κ(w, ei).
We will hereafter refer to this formulation as kernel minisum.
In particular, we have found that a square angle (SA) kernel and an exponentiated cosine (EC) kernel perform well in practice,

κSA(w, e) ≐ arccos²(wᵀe),  κEC(w, e) ≐ exp(−wᵀe).
As shown in Fig. 2, both kernels are minimal when w and e are parallel and increase monotonically to a maximum when w and e are antiparallel. Intuitively, optimizing the minisum objective function generated by the SA kernel yields a vector that, loosely speaking, bisects the sector occupied by the atoms. The EC kernel exhibits very similar behavior but leads to a smoother objective function. This allows for the determination of a projection vector without ambiguity, even if the atom configuration contains perfect symmetry.
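For concreteness, the following sketch evaluates the two kernels and the minisum objective, assuming the functional forms κSA(w, e) = arccos²(wᵀe) and κEC(w, e) = exp(−wᵀe), which are consistent with the qualitative behavior described above (minimal when parallel, maximal when antiparallel); any scaling constants used in the actual implementation are omitted.

```python
import numpy as np

def kappa_sa(w, e):
    # Square angle: squared angle between unit vectors w and e.
    return np.arccos(np.clip(np.dot(w, e), -1.0, 1.0)) ** 2

def kappa_ec(w, e):
    # Exponentiated cosine: smooth, minimal when w and e are parallel.
    return np.exp(-np.dot(w, e))

def minisum_objective(w, E, g, kappa):
    # A(w) = sum_i g(r_i) * kappa(w, e_i)
    return sum(gi * kappa(w, ei) for gi, ei in zip(g, E))
```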
Shown here is an illustration of the minisum algorithm that determines projection vectors for a rotationally invariant description of atomistic neighbor configurations. Black dots represent atoms, which all carry equal importance. The vectors that point from the origin to the atoms are used as the input to bivariate kernels to compute the minisum objective function, drawn in solid lines. For reference, the negated values of the PCA objective function are drawn in dashed lines. Projection vectors are obtained by finding the unit vector w* that minimizes the objective function.
A major advantage of the kernel minisum approach over Lp norm-based PCA lies in (1) its robustness in the presence of structural symmetry and (2) the continuity of the resulting principal axes with respect to angular movement of the input data. As shown in Fig. 3, kernel minisum is particularly suitable for atomistic systems, where strong symmetries are common and continuity against angular movement is desired. The minisum framework can also be used with other customized kernels to suit the characteristics of specific application scenarios.
A comparison of the orthogonal bases obtained using principal components analysis (PCA) and kernel minisum with the square-angle (MSA) kernel. (a) The PCA algorithm, used in conjunction with the L2 norm, fails to extract a principal axis that rotates with a system that exhibits planar C4 symmetry. Both MSA and PCA with the L1 norm can accommodate this scenario. (b) Both L1 and L2 principal axes change orientation abruptly when the system undergoes a slight angular motion. In contrast, the MSA output is continuous with regard to this movement as it always bisects the angle formed by the atoms and the origin. (c) Loosely speaking, the MSA axis points at the majority direction of the atoms if only a single cluster is present within the cutoff distance but bisects the angle between two atom clusters. This is different from PCA-based results and should deliver robust rotational invariance as well as continuity. Arrows are drawn at different lengths to improve visual clarity in situations of overlapping. They are to be understood as unit vectors. De-trending is not performed because the radial distances of the atoms carry physically significant information.
B. Solving the kernel minisum optimization problems
The optimization problem can be solved very efficiently using a gradient descent algorithm as detailed below.
Square Angle: The objective function of the minisum problem using the square angle (SA) kernel is

A_SA(w) = Σ_{i=1}^{N} g(ri) arccos²(wᵀei). (12)

The gradient of A_SA with respect to w is

∇wA_SA = −Σ_{i=1}^{N} g(ri) [2 arccos(wᵀei)/√(1 − (wᵀei)²)] ei. (13)

Note that ∇wA_SA is singular when w∥ei. This can be treated numerically by replacing the removable singularity of the bracketed factor at wᵀei = 1 with its left limit, 2, while truncating the gradient at a finite threshold near the poles at wᵀei = −1.
A local minimum can be iteratively searched for with gradient descent while renormalizing w after each iteration. Moreover, due to the locally quadratic nature of the objective function, we have found that the Barzilai-Borwein algorithm23 can significantly accelerate the convergence at a minimal cost. The algorithm is presented in Algorithm 1.
ALGORITHM 1. Gradient descent for solving the square angle minisum problem.

1: function MinSquareAngle(E = [e1, e2, …, eN], G = [g1, g2, …, gN], w0)
2:   w ← w0
3:   repeat
4:     Compute gradient ∇w using Eq. (13).
5:     Obtain tangential component of the gradient: ∇w⊥ ← ∇w − (wᵀ∇w) w
6:     if at step 0 then
7:       α ← 0.01 ▹ Small initial step size for bootstrapping
8:     else
9:       α ← |Δwᵀ Δ∇w⊥| / ∥Δ∇w⊥∥², with Δw ≐ w − w−1, Δ∇w⊥ ≐ ∇w⊥ − ∇w⊥,−1 ▹ Adaptive subsequent steps by Barzilai-Borwein
10:    end if
11:    Save w as w−1, save ∇w⊥ as ∇w⊥,−1
12:    Update w ← w − α∇w⊥ and normalize to unit length
13:  until ∥w − w−1∥ < ε
14:  return w
15: end function
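A minimal numerical realization of Algorithm 1 might look as follows; the particular Barzilai-Borwein variant, the singularity threshold, and the gradient truncation value are our own illustrative choices, not the authors' exact settings.

```python
import numpy as np

def min_square_angle(E, g, w0, tol=1e-10, max_iter=500):
    """Projected gradient descent with a Barzilai-Borwein step size (sketch).

    E: (N, 3) unit vectors toward the atoms; g: (N,) weights; w0: initial guess.
    """
    w = np.asarray(w0, dtype=float) / np.linalg.norm(w0)
    w_prev = grad_prev = None
    for step in range(max_iter):
        u = np.clip(E @ w, -1.0, 1.0)
        # d/du arccos(u)^2 = -2*arccos(u)/sqrt(1-u^2); removable limit -2 at u=1.
        with np.errstate(divide="ignore", invalid="ignore"):
            coef = -2.0 * np.arccos(u) / np.sqrt(1.0 - u * u)
        coef = np.where(u > 1.0 - 1e-9, -2.0, coef)
        coef = np.clip(coef, -1e6, 0.0)      # truncate near the pole u = -1
        grad = (g * coef) @ E
        grad_t = grad - (w @ grad) * w       # tangential component on the sphere
        if np.linalg.norm(grad_t) < 1e-14:
            break
        if step == 0:
            alpha = 0.01                     # small bootstrap step
        else:
            dw, dg = w - w_prev, grad_t - grad_prev
            alpha = abs(dw @ dg) / max(dg @ dg, 1e-30)  # |BB| step size
        w_prev, grad_prev = w, grad_t
        w = w - alpha * grad_t
        w = w / np.linalg.norm(w)
        if np.linalg.norm(w - w_prev) < tol:
            break
    return w
```

In practice the search would be restarted from several seeds and the solution with the lowest objective kept, as prescribed in the text; for a pair of atoms the minimizer should land on the bisector of the two directions.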
Exponentiated Cosine: The objective function of the minisum problem using the exponentiated cosine (EC) kernel is

A_EC(w) = Σ_{i=1}^{N} g(ri) exp(−wᵀei). (14)

The gradient of A_EC with respect to w is

∇wA_EC = −Σ_{i=1}^{N} g(ri) exp(−wᵀei) ei. (15)
The gradient contains no singularity. However, it is not always locally quadratic or convex. This can cause the Barzilai-Borwein algorithm to generate negative step sizes and consequently divert the search towards a maximum. Luckily, this can be easily overcome by always using the absolute value of the step size generated by the Barzilai-Borwein algorithm. Such enforcement prevents the minimization algorithm from going uphill. The complete algorithm is given in Algorithm 2.
ALGORITHM 2. Gradient descent for solving the exponentiated cosine minisum problem.

1: function MinExpCosine(E = [e1, e2, …, eN], G = [g1, g2, …, gN], w0)
2:   w ← w0
3:   repeat
4:     Compute gradient ∇w using Eq. (15).
5:     Obtain tangential component of the gradient: ∇w⊥ ← ∇w − (wᵀ∇w) w
6:     if at step 0 then
7:       α ← 0.01 ▹ Small initial step size for bootstrapping
8:     else
9:       α ← |Δwᵀ Δ∇w⊥| / ∥Δ∇w⊥∥², with Δw ≐ w − w−1, Δ∇w⊥ ≐ ∇w⊥ − ∇w⊥,−1 ▹ Adaptive subsequent steps by Barzilai-Borwein
10:    end if
11:    Save w as w−1, save ∇w⊥ as ∇w⊥,−1
12:    Update w ← w − α∇w⊥ and normalize to unit length
13:  until ∥w − w−1∥ < ε
14:  return w
15: end function
As shown in Table II, both Algorithms 1 and 2 converge quickly and consistently across a wide range of representative point configurations commonly found in molecular systems. However, the gradient descent method can only find local optima. Thus, multiple trials should be performed using different initial guesses to ensure that a global minimum can be located.
TABLE II. Listed here are the numbers of iterations and initial guesses used by the gradient descent algorithm to find a local optimum of the kernel minisum problems. The numbers are averaged over 500 repetitions, and the convergence criterion is 10−14. In cases where the iterative algorithm does not converge within 64 iterations, the optimization is restarted with a new guess.

| Configuration | Square angle: Itrs./guess | Square angle: Guesses | Exponentiated cosine: Itrs./guess | Exponentiated cosine: Guesses |
| --- | --- | --- | --- | --- |
| Single point | 8.8 | 1 | 7.9 | 1 |
| Two points, angle < π/2 | 7.1 | 1 | 7.7 | 1 |
| Two points, angle ≥ π/2 | 6.1 | 1 | 6.3 | 1 |
| Two points, angle = π | 6.0 | 1 | 5.6 | 1 |
| Planar C3 | 6.2 | 1 | 6.3 | 1 |
| Planar C4 | 5.6 | 1 | 6.9 | 1 |
| Tetrahedra | 10.6 | 1 | 7.7 | 1 |
| Octahedra | 11.7 | 1 | 9.4 | 1 |
| Improper S4 | 17.9 | 1 | 23.1 | 1 |
| Improper S6 | 14.7 | 1 | 18.0 | 1 |
| 2D random 10 points | 7.2 | 1.1 | 8.9 | 1 |
| 3D random 50 points | 16.9 | 1.2 | 14.4 | 1 |
C. Complete set of orthogonal projection vectors as a canonical coordinate frame
In 3D, a complete set of orthogonal bases can be found greedily using the protocol described in Algorithm 3. Specifically, we use the globally optimal solution of the minisum optimization problem as the first basis bα and the constrained optimal solution in the plane orthogonal to bα as the second basis bβ. Special care must be taken in determining the third basis bγ, as its only remaining degree of freedom is its sign due to the orthogonality constraint. The straightforward approach of choosing the direction that gives the smaller objective function value may fail, for example, when the system contains improper rotational symmetry. In that case, bα and bβ are interchangeable and both are perpendicular to the rotation-reflection axis. As a result, the two candidates for bγ will both align with the rotation-reflection axis and are thus indistinguishable by kernel minisum. However, the projections of the atoms onto the two seemingly equivalent coordinate frames are not identical but rather mirror images of each other. Fortunately, this can be addressed by choosing the direction of the half-space, as created by the plane spanned by bα and bβ, that yields the smaller kernel objective function between the bisector of bα and bβ and the points that lie in that half-space. This rule also handles general situations with or without symmetry.
ALGORITHM 3. The procedure for determining a canonical coordinate frame using kernel minisum.

1: function GetCanonicalProjection3D(E = [e1, e2, …, eN], G = [g1, g2, …, gN])
2:   bα ← global minimum of Minisum(E, G) using multiple runs of Algorithm 1 or Algorithm 2
3:   if E contains only 1 point then
4:     construct arbitrary bβ ⊥ bα
5:     bγ ← bα × bβ
6:   else
7:     bβ ← global minimum of Minisum(E, G) using multiple runs of Algorithm 1 or Algorithm 2, subject to the constraint that the probe vector and the gradient are all ⊥ bα
8:     h ← (bα + bβ)/∥bα + bβ∥ ▹ Bisector of bα and bβ
9:     if the atoms in the half-space along bα × bβ yield a smaller kernel objective against h than those in the opposite half-space then
10:      bγ ← bα × bβ
11:    else
12:      bγ ← −bα × bβ
13:    end if
14:  end if
15:  return bα, bβ, bγ
16: end function
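The sign rule for bγ can be sketched as follows, under one possible reading of the half-space criterion: project the atoms onto the candidate axis bα × bβ, and keep the sign whose half-space contains atoms sitting closer (in the kernel sense) to the bisector of bα and bβ. The tie-breaking details here are assumptions, not the authors' exact procedure.

```python
import numpy as np

def kappa_sa(w, e):
    # Square-angle kernel, assumed form arccos^2(w.e).
    return np.arccos(np.clip(np.dot(w, e), -1.0, 1.0)) ** 2

def third_axis_sign(b_alpha, b_beta, E, g, kappa=kappa_sa):
    """Choose the sign of b_gamma (hypothetical reading of the half-space rule)."""
    n = np.cross(b_alpha, b_beta)
    n = n / np.linalg.norm(n)
    h = b_alpha + b_beta
    h = h / np.linalg.norm(h)            # bisector of b_alpha and b_beta
    side = E @ n                         # signed half-space membership per atom
    obj_pos = sum(gi * kappa(h, ei) for gi, ei, s in zip(g, E, side) if s > 0)
    obj_neg = sum(gi * kappa(h, ei) for gi, ei, s in zip(g, E, side) if s < 0)
    # Keep the half-space whose atoms sit closer to the bisector.
    return n if obj_pos <= obj_neg else -n
```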
It is difficult to prove global uniqueness of the kernel minisum solution given the non-convex nature of the exponentiated cosine and square angle kernels. In fact, it seems that the only kernel that allows an analytical proof of solution uniqueness is κCOM(w, e) ≐ −wᵀe, whose solution simply corresponds to the weighted center of mass of the neighbor atoms. However, this simple kernel is not robust against reflectional and rotational symmetry. Luckily, the rare cases where two globally optimal solutions coexist can be safely captured by the repeated searching procedure starting from different seeds. In such cases, a fingerprint can be extracted using each of the resulting coordinate frames. This may mildly increase the size of the training set, which can even be advantageous when training data are scarce.
III. DENSITY-ENCODED CANONICALLY ALIGNED FINGERPRINT
A. Density field and approximation of volume integral
The local density field ρs(r) around a point s, as shown in Fig. 4, is formulated as a superimposition of smoothing kernel functions, each centered at a neighbor atom i = 1, 2, …, N with relative displacement xi with regard to s and within a cutoff distance Rc,

ρs(r) = Σ_{i=1}^{N} Wc(∥xi∥) Wd(∥r − xi∥). (17)
This density field, as has been pointed out previously,12 may be used as a fingerprint of the local atomistic environment. Here, we assume that the smoothing kernel Wd(r) takes the form of a stationary Gaussian, Wd(r) = exp(−r²/(2σ²)). We also assume that the density scaling function Wc(r), which ensures the continuity of the density field when atoms enter or exit the cutoff distance, is a bell-shaped function with compact support. Further discussion of Wc(r) and Wd(r) can be found in Secs. III B and III C, respectively.
(a) Two 1D density profiles, ρ1 and ρ2, are generated from two different atomistic configurations using atom-centered smoothing kernel functions. The “distance” between them is measured as the L2 norm of their pointwise difference, which corresponds to the orange area in the middle plot. (b) Shown here is a 2D density field using smoothing kernels whose widths depend on the distances of the atoms from the origin. Darker shades indicate higher density.
To achieve rotational invariance, we project the atom coordinates into the canonical coordinate frame R ≐ [bα, bβ, bγ], as determined by the kernel minisum algorithm, when generating the density field,

ρs(r) = Σ_{i=1}^{N} Wc(∥xi∥) Wd(∥r − Rᵀxi∥). (18)
Depending on the specific application, s may not necessarily overlap with any of the xi. Scalar properties can be acquired directly from the target atom, while vector-valued properties, such as force, can be acquired and interpolated in the local orthogonal coordinates as ỹ = Rᵀy.
We define the distance between two density fields ρi and ρj as a weighted volume integral of their pointwise difference,

D(ρi, ρj) ≐ [∫ w(r) (ρi(r) − ρj(r))² dV]^{1/2}. (19)
The weight function w(r) provides additional flexibility for emphasizing particular regions of the atomistic neighborhood. It can play an important role in fitting properties with steep gradients, e.g., the repulsive part of the Lennard-Jones potential.
We now introduce an optimal quadrature rule to approximate the integral in Eq. (19) in a computationally tractable manner. A quadrature rule is a numerical recipe of the form ∫ f(x) dx ≈ Σ_k νk f(xk), which approximates a definite integral using only discrete evaluations of the integrand. To determine the quadrature nodes and weights, we decompose the volume integral in Eq. (19), with integrand f(r) ≐ w(r)(ρi(r) − ρj(r))², into a surface integral over spherical shells and a 1D integral along the radial direction,

∫ f(r) dV = ∫_0^∞ r² [∮_{∥q∥=1} f(r q) dΩ(q)] dr. (20)
The surface integral can be optimally approximated using the Lebedev quadrature rule,24

∮_{∥q∥=1} f(q) dΩ(q) ≈ 4π Σ_{m=1}^{Na} βm f(qm), (21)
where βm, qm ≐ (xm, ym, zm), and Na are the weights, positional unit vectors, and number of the Lebedev nodes, respectively. The radial integral fits well into the generalized Laguerre-Gauss quadrature formula with weight function r²e^{−r},25

∫_0^∞ h(r) r² e^{−r} dr ≈ Σ_{n=1}^{Nr} αn h(rn), (22)
where αn, rn, and Nr are the weights, nodes, and number of the Laguerre nodes, respectively. Combining Eqs. (20)–(22), a composite quadrature rule can be generated consisting of several spherical layers of nodes. As shown in Fig. 5, the radial coordinates of the quadrature nodes are determined by the Laguerre quadrature nodes, while the angular positions are determined by the Lebedev quadrature nodes. This composite quadrature formula translates the 3D volume integral into a summation over discrete grid points,

∫ f(r) dV ≈ 4π Σ_{n=1}^{Nr} αn e^{rn} Σ_{m=1}^{Na} βm f(rn qm). (23)
Using the right-hand side of Eq. (23) to replace the integral in Eq. (19) and using the multi-index notation k = (n, m), 1 ≤ n ≤ Nr, 1 ≤ m ≤ Na(n), to enumerate over the quadrature nodes located at rk = rn ⋅ qm with weights νk ≐ 4π αn βm e^{rn}, we obtain the final discretized formula for computing the distance between the fingerprints,

D(ρi, ρj) ≈ [Σ_k νk w(rk) (ρi(rk) − ρj(rk))²]^{1/2}. (24)
For quick reference, we tabulated in the Appendix the values for rn and αn in the Laguerre quadrature of up to 6 points and the values for qm, βm in the Lebedev quadrature of up to 50 points.
(a) Laguerre quadrature nodes of orders 1–5, normalized by the reciprocal of the largest node onto the unit interval. (b) The a1, a2, a3, and bk classes of Lebedev grid points on a unit sphere. (c) The DECAF molecular fingerprint essentially comprises a grid of quadrature nodes that optimally samples the density field induced by the neighbor atoms. Shown here is an example of one such composite quadrature grid, which combines a 3-node Laguerre quadrature rule with three layers of Lebedev quadrature nodes of 3rd, 5th, and 7th order, respectively.
In addition, the quadrature nodes could be radially scaled such that the outermost nodes lie at a radius R* within some cutoff distance Rc. This allows us to fit a Laguerre quadrature of any order within an arbitrary cutoff distance. The scaled quadrature rule is given by
Since the scaling is simply a constant among all nodes, it can be safely omitted in many regression tasks where only the relative proximity between fingerprints is of significance.
B. Radial weight functions
In this section, we examine two radial weight functions that can be used to fine-tune the density field fingerprint: the density scaling function Wc(r), which appears in Eq. (17), and the weight of integral w(r), which appears in Eq. (19).
Driven by the interest of reducing computational cost, we would like to use a cutoff distance to select the atoms involved in constructing the density field. However, it is important to ensure that atoms enter and exit the neighborhood smoothly. This naturally requires that the contribution of an atom be zero outside the cutoff and increase continuously and smoothly as the atom enters the neighborhood. Correspondingly, Wc(r) should (1) become unity at the origin, (2) smoothly approach zero at the cutoff distance, and (3) be twice-differentiable to ensure the differentiability of regression models based on the fingerprint. Candidates satisfying the above conditions include, for example, a tent-like kernel
and a bell-shaped polynomial kernel with compact support
as detailed in the Appendix.
The approximation of the radial integral using a Laguerre quadrature requires that the integrand, i.e., the pointwise difference between the atomistic density fields, decays sufficiently fast beyond the outermost quadrature nodes in order to achieve acceptable convergence. In addition, the steeply repulsive yet flatly attractive interatomic short-range interactions prompt that the sensitivity of the fingerprint be adjusted correspondingly in order to avoid numerical difficulties in training a regression model. The weight of the integral, w(r), provides a convenient means for achieving this purpose. Different from Wc(r), w(r) should instead satisfy the following conditions: (1) it should be normalized such that ∫w(r)dV(r) = 1; (2) it should decay sufficiently fast, but not necessarily to 0, beyond the outermost quadrature node; and (3) it should be sufficiently smooth beyond the outermost quadrature node. Candidates for w(r) include the tent-like kernel and the bell-shaped kernel proposed for Wc(r), albeit with a different normalization factor. The Laplacian kernel
with a properly sized length scale l also appears to be a candidate due to its similarity with the e−r factor of the weight function of the Laguerre quadrature. Note that the constant kernel
may also be a choice as long as the density field already decays fast enough due to the density scaling function Wc(r).
In Fig. 6, we demonstrate the effect of the density scaling function and the weight of integral on the distance matrices between fingerprints obtained from a pair of atoms. The comparison between panels (a) and (b) shows that a bell-shaped integration weight allows the distance between fingerprints to change more rapidly when the atoms are closer but more slowly when the atoms are farther apart. The visible discontinuity in the second row clearly demonstrates the importance of the density scaling function when only atoms within a finite cutoff distance are used to compute the fingerprint.
Shown here are examples of the distance matrices between fingerprints sampled from a biatomic system. As manifested by the difference between (a) against (b), a bell-shaped weight of integral helps to emphasize the near field. Meanwhile, the obvious discontinuities in the second row of matrices demonstrate the importance of the density scaling function when the fingerprint algorithm uses only atoms within a finite cutoff distance.
We further examine the impact of the weight functions on the performance of Gaussian process regression (GPR) of the interatomic force in a minimal system containing two nitrogen atoms. Despite the simplicity of the system, this case is of fundamental importance because of its ubiquity and because the fast-growing repulsive regime of the Lennard-Jones potential could cause difficulty, as a small change in the system configuration can trigger large changes in the regression target function. In Fig. 7, we compare the performance among combinations of four weights of integral and two density scaling functions. The initial training set consists of two samples collected at rN–N = 1.0 and 6.0. The regression is then refined using a greedy strategy that consecutively learns the point with the largest posterior variance until the largest uncertainty, defined as twice the posterior standard deviation, is less than 0.1 eV/Å. The active learning scheme is able to deliver a GPR model, for each and every combination of the weight functions, that closely fits the target function. However, the numbers of refinement iterations and the achieved accuracy do vary. Therefore, it is important to evaluate and choose the weight functions in the context of specific application scenarios.
Gaussian process regression of the force between two nitrogen atoms as a function of interatomic distance using different combinations of radial weight functions. Inset figures are plots of the regression function using distances from the feature space.
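The greedy refinement loop described above can be sketched in a few lines. The snippet below is an illustration rather than the paper's implementation: it fits a one-dimensional Lennard-Jones-like force curve with a plain numpy Gaussian process, repeatedly learning the candidate point of largest posterior variance until twice the posterior standard deviation falls below 0.1 everywhere; the kernel hyperparameters, noise level, and target function are assumptions.

```python
import numpy as np

def target_force(r):
    # Toy Lennard-Jones-like force curve standing in for the N-N force.
    return 24.0 * (2.0 / r ** 13 - 1.0 / r ** 7)

def k_se(a, b, length=0.5, var=25.0):
    # Squared exponential covariance between 1D inputs.
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-4):
    # Standard GP posterior mean and variance via a Cholesky solve.
    K = k_se(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = k_se(x_tr, x_te)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.maximum(k_se(x_te, x_te).diagonal() - np.sum(v * v, axis=0), 0.0)
    return mean, var

grid = np.linspace(1.0, 6.0, 200)          # candidate configurations
train = [1.0, 6.0]                         # initial training set
while True:
    x_tr = np.array(train)
    mean, var = gp_posterior(x_tr, target_force(x_tr), grid)
    worst = int(np.argmax(var))
    if 2.0 * np.sqrt(var[worst]) < 0.1:    # largest uncertainty below threshold
        break
    train.append(float(grid[worst]))       # greedily learn the worst point
```

Because the GP posterior variance depends only on the input locations, the acquisition behaves as a space-filling strategy; the weight functions of the fingerprint then determine how "distance" in configuration space maps to distance in feature space.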
C. Quadrature resolution and density kernel
Despite the formal convergence of the composite quadrature in DECAF, sampling a density field generated by N atoms at M quadrature nodes requires O(NM) distance calculations and kernel evaluations. A less prominent cost is associated with the L2 distance calculation between two fingerprints, which comes at a cost of O(M) floating point operations. Thus, in practice, it is often desirable to use as few nodes as possible to capture only information of the density field within a certain band limit.26 Accordingly, the integral cutoff Rc, the number of quadrature nodes, and the width of the density kernel need to be tuned to obtain an optimal balance between resolution and computational speed.
When designing the composite quadrature rule, we chose the Laguerre quadrature for the radial direction because its nodes are denser near the origin but sparser farther away. This is consistent with our physical intuition that the near field generally has a stronger influence than the far field in an atomistic neighborhood. For example, the van der Waals potential grows rapidly when atoms are in direct contact but flattens out beyond the first coordination shell. Accordingly, it may be possible to use sparser outer-layer grids to reduce the total number of quadrature nodes, while still keeping enough nodes in the inner layers to maintain the sensitivity of the quadrature toward close neighbors. In concert, we can also use non-stationary Gaussian density kernels whose width depends on the distance from the atom to the origin. In this way, the sparser nodes should still sufficiently sample the smoother far field. Wider kernels at remote atoms also reduce the total difference between the far fields of two fingerprints in a statistical sense. Thus, the contribution of the far field to the integral can be effectively tuned even though the weights on the quadrature nodes remain the same.
In Fig. 8, we demonstrate how a variable-resolution quadrature can be combined with a widening smoothing density kernel to simultaneously reduce the computational complexity and preserve the quality of the fingerprint. In column A, a dense grid is used to sample density fields generated by a wide smoothing length. By examining the distance matrices of fingerprints sampled during bond stretching and angular stretching movements, we note that the radial similarity decreases monotonically while the angular similarity changes nearly linearly. In column B, the number of quadrature nodes is kept the same, but the smoothing length is reduced in an attempt to increase the fingerprint sensitivity. Better response in the near field of the radial direction is obtained, but the linearity in the far field of the angular direction is compromised. In column C, the fingerprint performs even worse due to the combination of a sparser quadrature grid and a small smoothing length. In column D, the performance is recovered by letting the smoothing length parameter σ of the Gaussian density kernels Wd(r) depend on the distance from the origin to each atom, while simultaneously adjusting the quadrature node density to match.
A comparison of fingerprint distance matrices corresponding to bond stretching and angular stretching movements. (a) Dense grid + large smoothing length: radial similarity decreases monotonically while angular similarity changes nearly constantly. (b) Dense grid + smaller smoothing length: better fingerprint sensitivity in the near field for bond stretching, compromised linearity in the far field for angular stretching. (c) Sparser grid + smaller smoothing length: compromised far-field performance for both bond and angular movements. (d) Variable-resolution grid + radially dependent smoothing length: good resolution and linearity in both near and far fields.
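The sampling scheme of column D can be sketched as follows, assuming a widening law of the form σ(∥r∥) = σ0 + c∥r∥, as used for the carbon kernel in Sec. IV D; the octahedral grid and parameter values are illustrative.

```python
import numpy as np

def layered_grid(radii):
    # Octahedral (6-point Lebedev) directions, one layer per radius.
    q = np.array([[ 1, 0, 0], [-1, 0, 0],
                  [ 0, 1, 0], [ 0, -1, 0],
                  [ 0, 0, 1], [ 0, 0, -1]], dtype=float)
    return np.concatenate([r * q for r in radii])

def density_fingerprint(atoms, nodes, sigma0=0.5, c=0.2):
    # Superimpose one non-stationary Gaussian per atom; the smoothing
    # length grows with the atom's distance from the fingerprint center,
    # so sparser outer nodes still sample the smoother far field adequately.
    fp = np.zeros(len(nodes))
    for x in atoms:
        sigma = sigma0 + c * np.linalg.norm(x)
        d2 = np.sum((nodes - x) ** 2, axis=1)
        fp += np.exp(-0.5 * d2 / sigma ** 2)
    return fp

nodes = layered_grid([1.0, 2.0])
fp = density_fingerprint(np.array([[0.0, 0.0, 0.0]]), nodes)
```

For an atom at the origin, all nodes of a given layer are equidistant from it, so the sampled values are constant within each layer and decay from the inner to the outer layer.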
IV. DEMONSTRATION
A. Method
Regression tasks throughout this work are performed using Gaussian process regression (GPR), a nonlinear kernel method that treats training data points as discrete observations from an instantiation of a Gaussian process. Predictions are made using the posterior mean and variance of the joint distribution between the test data and the training data. One particularly interesting property of the Gaussian process is that the posterior variance may be interpreted as a measure of prediction uncertainty, which can be exploited to design active learning algorithms for sampling expensive functions. The actual computation used our own software implementation, which was made publicly available on Zenodo.27 We use the squared exponential covariance kernel to compute the covariance, i.e., correlation, between the samples,
where x and x′ are the DECAF fingerprints, and d(x, x′) is the distance between the fingerprints as computed by Eq. (24) or Eq. (26). The kernel is stationary, meaning that the covariance depends only on the relative distance between two samples and not on their absolute positions. The training process searches for the hyperparameters, i.e., the output variance σ and the length scale l, that maximize the likelihood of the training data. A detailed tutorial on GPR can be found in Ref. 15. An illustration of the complete workflow of using the density field fingerprint to perform regression tasks is given in Fig. 9.
Shown here is the workflow of regression using the density field fingerprint. The key stages, i.e., kernel minisum optimization, density field generation, and quadrature-based distance computation, are covered in detail in Secs. II and III. The fingerprints can also be readily used to train artificial neural networks.
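The covariance computation and the likelihood-based hyperparameter search can be sketched as follows; the grid search stands in for whatever optimizer one prefers, and the toy data, candidate values, and noise level are illustrative. The matrix `d2` of pairwise squared distances stands in for the fingerprint distances d(x, x′)² of Eq. (24).

```python
import numpy as np

def sq_exp(d2, sigma, length):
    # k(x, x') = sigma^2 exp(-d(x, x')^2 / (2 l^2)), a stationary kernel.
    return sigma ** 2 * np.exp(-0.5 * d2 / length ** 2)

def log_marginal_likelihood(d2, y, sigma, length, noise=1e-4):
    # log p(y | X, sigma, l) for a zero-mean GP with Gaussian noise.
    K = sq_exp(d2, sigma, length) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2.0 * np.pi))

# Toy 1D data standing in for fingerprints and a scalar target.
x = np.linspace(0.0, 6.0, 20)
y = np.sin(x)
d2 = (x[:, None] - x[None, :]) ** 2

# "Training" = pick the hyperparameters that maximize the likelihood.
candidates = [(s, l) for s in (0.3, 1.0, 3.0) for l in (0.03, 0.3, 1.0, 3.0)]
best = max(candidates, key=lambda p: log_marginal_likelihood(d2, y, *p))
```

The likelihood trades data fit against model complexity through its determinant term, so degenerate length scales that merely memorize the data are penalized automatically.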
B. Potential energy surface
First, we attempt to fit the potential energy surface of a protonated water dimer system, in a head-to-head configuration, as a function of the oxygen-oxygen distance rO–O and the dihedral angle φ between the two planes each formed by a water molecule. As shown in Fig. 10(a), the system contains an improper rotational symmetry, which we wish to capture with the kernel minisum algorithm. A GPR model was seeded with 8 training points corresponding to the combinations of rO–O = 2.2, 2.4, 2.6, 2.8 and φ = 0, π/2. Subsequently, an active learning protocol was used to greedily absorb points with the highest uncertainty into the training set. Although we restricted all training data to be within the subdomain φ ≤ π/2, as shown by Figs. 10(b) and 10(c), we are able to accurately reproduce the target function over the entire parameter space after a few active learning steps.
Gaussian process regression is carried out to fit the potential energy surface of a protonated water dimer as a function of two internal variables, i.e., the oxygen-oxygen distance rO–O and the dihedral angle φ between the planes determined by the two water molecules. This system contains an improper rotational symmetry, which can be correctly recognized by the kernel minisum-based algorithm given in Algorithm 3. Only a quarter of the domain was used to train the GP, yet the model can accurately predict the energy of the entire parameter space thanks to symmetry detection.
The DECAF fingerprint used here is constructed with 3 spherical layers within a cutoff distance Rc of 6.0 Å, consisting of 14, 26, and 38 Lebedev quadrature nodes, respectively. The weight of integral was chosen as w(r) = W6,4(1 − r/Rc), where W6,4 is the bell-shaped polynomial as defined in the Appendix, Eq. (A1). The density scaling function, where r is the vector from the atom to the fingerprint center, is the tent-like kernel as defined in Eq. (27) with t = 3. The density kernel that sits on the oxygen atoms assumes the form of a non-stationary Gaussian as discussed in Sec. III C, with r being the vector from the atom to the fingerprint center and r′ being the vector from the atom to the quadrature node. The density kernel for the hydrogen atoms has a different weight and width to ensure discriminability.
C. Geometry optimization and vibrational analysis
Next, we demonstrate the usability of the fingerprint for fitting vector-valued quantities by performing geometry optimization and vibrational analysis on a single water molecule. The process involves the simultaneous regression of (1) energy, a molecular scalar quantity; (2) force, a per-atom vector quantity; and (3) dipole, a molecular vector quantity. Correspondingly, we performed GPR of energy and dipole using fingerprints extracted from the center of mass of the molecule, and GPR of force using fingerprints extracted from each atom. Each component of the vector properties is modeled independently as a scalar Gaussian process. The training set consists of 45 configurations uniformly covering the range rO–H = 0.93, 0.95, 0.97, 0.99, 1.05 Å and θH–O–H = 101°, 105.5°, 111°. As shown in Table III, the GPR model can successfully drive calculations of the infrared spectrum of the molecule from randomly perturbed initial structures in arbitrary orientations. The fingerprint configuration is the same as in Sec. IV B.
Geometry optimization and vibrational analysis of a single water molecule using GPR and our proposed fingerprint algorithm. 256 independent trials were performed using coordinates of water perturbed from equilibrium by Gaussian noise followed by a randomly chosen rigid-body rotation.
|  | GPR | DFT |
|---|---|---|
| Zero-point energy | 0.591 ± 0.003 eV | 0.583 eV |
| Static dipole | 2.1580 ± 0.0001 D | 2.159 D |
| Residual force | 0.0016 ± 0.0005 eV/Å | 0.0003 eV/Å |
| Mode | GPR frequency (cm−1) | DFT frequency (cm−1) | GPR intensity ((D/Å)2 amu−1) | DFT intensity ((D/Å)2 amu−1) |
|---|---|---|---|---|
| 0 | 1576.5 ± 1.4 | 1602.4 | 1.5726 ± 0.0005 | 1.5767 |
| 1 | 3819.3 ± 0.9 | 3817.5 | 0.2516 ± 0.0005 | 0.2159 |
| 2 | 3916.7 ± 1.6 | 3922.6 | 1.3349 ± 0.0028 | 1.3401 |
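For reference, the vibrational analysis above follows the standard procedure of diagonalizing the mass-weighted Hessian of the energy model. The sketch below uses a harmonic diatomic surrogate (spring constant and masses are illustrative, not the paper's GPR model) and recovers the textbook result ω² = k(1/m₁ + 1/m₂) plus one zero-frequency translational mode; unit conversion to cm−1 is omitted.

```python
import numpy as np

def energy(x, k=30.0, r0=1.0):
    # Harmonic surrogate, standing in for the GPR energy model, for a
    # diatomic stretched along one axis.
    return 0.5 * k * (abs(x[1] - x[0]) - r0) ** 2

def hessian(f, x, h=1e-4):
    # Central finite-difference Hessian of f at x.
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            xpp = x.copy(); xpp[i] += h; xpp[j] += h
            xpm = x.copy(); xpm[i] += h; xpm[j] -= h
            xmp = x.copy(); xmp[i] -= h; xmp[j] += h
            xmm = x.copy(); xmm[i] -= h; xmm[j] -= h
            H[i, j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * h * h)
    return H

masses = np.array([14.0, 14.0])             # two nitrogen-like masses
x0 = np.array([0.0, 1.0])                   # equilibrium geometry
H = hessian(energy, x0)
Hw = H / np.sqrt(np.outer(masses, masses))  # mass-weighted Hessian
omega2 = np.linalg.eigvalsh(Hw)             # squared angular frequencies
```

Replacing `energy` by the trained GPR posterior mean gives exactly the workflow used for Table III; the differentiability conditions on the weight functions in Sec. III B are what make this Hessian well defined.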
D. Molecular dynamics trajectory
As shown in Fig. 11, here we attempt to fit the forces experienced by the atoms in a benzene molecule along MD trajectories obtained from a sibling database of QM7.28,29 The density kernel for the carbon atoms assumes the same functional form as that of the oxygen atoms, but uses a different smoothing length function σC(r) = 1.2 + 0.2∥r∥. The rest of the parameters are inherited from the previous examples. The training configurations were chosen adaptively in an iterative process using the sum of the GPR posterior variance and the prediction error as the acquisition function.
The force exerted on the carbon atoms in a benzene molecule was fit using 100 out of the 1200 configurations in a 100-ns molecular dynamics trajectory at 0.5 ns time step size. Color coding corresponds to carbon atom ID, while crosses indicate the atomistic configurations used in the GPR training.
V. CONNECTION TO OTHER FINGERPRINT ALGORITHMS
In Fig. 12, we compare the ability of our fingerprint to distinguish atomistic configurations with that of SOAP and the Coulomb matrix. Our work is inspired by the SOAP descriptor,12 which proposes the use of smoothed densities to represent atomistic neighborhoods. However, instead of converting the density field into the frequency domain using spherical harmonics, we perform density field sampling and comparison directly in real space. This is enabled by the availability of the canonical coordinate frame as computed through the kernel minisum optimization. We have mainly used the L2 norm to compute the distance between atomistic neighborhoods. However, our fingerprint exhibits very similar behavior to SOAP when used together with an inner product formula
as demonstrated in Fig. 12(a). Thus, our fingerprint could be used in conjunction with a wide variety of covariance functions based on either the Euclidean distance or the inner product similarity.
A comparison between the distances measured using DECAF, SOAP, and the Coulomb matrix. Fingerprint distances Δφ shown in the plots are measured against rij = 1.0 in (a) and a randomly chosen initial state in (b).
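Both measures operate on the same sampled density values. The sketch below contrasts the two; the normalized-inner-product form and the exponent ζ are assumptions modeled on SOAP-type similarity kernels rather than the paper's exact formula, and the weights and fingerprint vectors are illustrative.

```python
import numpy as np

def l2_distance(x, y, w):
    # Weighted L2 distance between two sampled density fields, Eq. (24) style.
    return np.sqrt(np.sum(w * (x - y) ** 2))

def inner_product_similarity(x, y, w, zeta=2):
    # SOAP-like normalized inner product of the sampled fields (assumed form).
    xy = np.sum(w * x * y)
    return (xy / np.sqrt(np.sum(w * x * x) * np.sum(w * y * y))) ** zeta

w = np.full(12, 1.0 / 12.0)                # illustrative quadrature weights
x = np.exp(-np.linspace(0.0, 2.0, 12))     # two toy fingerprint samples
y = np.exp(-np.linspace(0.0, 2.0, 12) ** 2)
```

Identical fingerprints give zero distance and unit similarity; by the Cauchy-Schwarz inequality, the similarity of non-proportional fingerprints is strictly below one, so covariance functions built on either quantity remain well behaved.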
At first sight, DECAF is very different from the Coulomb matrix fingerprint and GRAPE, which are both graph-based algorithms.17,18 However, instead of trying to capture the overall density field, if we measure the contribution from each individual atom on the quadrature nodes at z1, z2, …, zM as a row vector and stack up the results to yield the matrix
then E can be regarded as an incidence matrix30 between the atoms and the quadrature nodes. As shown in Fig. 13, this is similar to the graph-based abstraction as seen in the Coulomb matrix and the GRAPE kernel. However, in both cases, the vertices in the graph represent atoms, while the edges represent pairwise interatomic interactions. Here, the density-based incidence matrix adopts the opposite pattern and constructs a graph with the quadrature nodes being vertices and atoms being edges. The adjacency matrix in this case is computed as the inner product ETE,
The weight on the edges, as represented by the elements of the adjacency matrix A, can be interpreted as the total flux contributed by all paths, each bridged by an atom k. We have numerically found that the smallest N eigenvalues (except for the 0 eigenvalue) of the symmetric normalized Laplacian
are invariant under rotation up to a certain noise level, even if the quadrature nodes do not rotate with the atoms. Nonetheless, this detour appears to be of purely theoretical rather than practical interest.
A comparison between graph-based molecular fingerprints. The Coulomb matrix and the GRAPE kernel construct graphs where nodes correspond to atoms, while the weights on the edges are determined by some pairwise inter-atomic interactions. In contrast, in the density-based incidence matrix, we construct a graph on a set of quadrature nodes whose connectivity is weighted by a sum of contributions from individual atoms.
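The construction can be sketched directly; the Gaussian contribution function and the geometry below are illustrative. The smallest eigenvalue of the symmetric normalized Laplacian is exactly zero by construction, which the snippet verifies; the approximate rotational invariance of the next-smallest eigenvalues is the empirical observation noted above and is not asserted here.

```python
import numpy as np

def incidence_matrix(atoms, nodes, sigma=1.0):
    # E[j, k]: contribution of atom j to the density at quadrature node k.
    d2 = np.sum((atoms[:, None, :] - nodes[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * d2 / sigma ** 2)

# Illustrative geometry: 3 atoms, 12 quadrature nodes on two octahedral layers.
q = np.array([[ 1, 0, 0], [-1, 0, 0], [ 0, 1, 0],
              [ 0, -1, 0], [ 0, 0, 1], [ 0, 0, -1]], dtype=float)
nodes = np.concatenate([1.0 * q, 2.0 * q])
atoms = np.array([[0.5, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 1.1]])

E = incidence_matrix(atoms, nodes)
A = E.T @ E                                # adjacency over quadrature nodes;
                                           # A[p, q] sums atom-bridged path fluxes
deg = A.sum(axis=1)
Dmh = np.diag(1.0 / np.sqrt(deg))
L_sym = np.eye(len(A)) - Dmh @ A @ Dmh     # symmetric normalized Laplacian
eigs = np.linalg.eigvalsh(L_sym)
```

Note the inverted roles relative to the Coulomb matrix and GRAPE: here the quadrature nodes are the vertices and the atoms act as edges bridging them.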
VI. CONCLUSION
In this paper, we presented the Density-Encoded Canonically Aligned Fingerprint (DECAF) by exploring the idea of using smoothed density fields to represent and compare atomistic neighborhoods. One of the key enabling techniques in DECAF is a kernel minisum algorithm, which allows the unambiguous identification of a canonically aligned coordinate frame that can be used for rotationally invariant projection of the density field as well as any associated vector quantities. We have performed detailed analysis to study the behavior of the fingerprint under changes to various parameters, such as the resolution, the smoothing length, and the choice of weight functions. We demonstrated that the fingerprint algorithm can be used to implement highly accurate regressions of both scalar and vector properties of atomistic systems, including energy, force, and dipole moment. It could thus be a useful building block for constructing next-generation data-driven force fields that accelerate molecular mechanics calculations while retaining an accuracy comparable to those driven by quantum mechanical theories and calculators.
ACKNOWLEDGMENTS
This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). This work was also supported by the Army Research Laboratory under Cooperative Agreement No. W911NF-12-2-0023.
APPENDIX: SOME NUMERICAL AND PRACTICAL DETAILS
1. Polynomial smoothing functions with compact support
As candidates for the weight of integral and density scaling functions (Sec. III B), a class of compact polynomials that satisfy the criteria31

(1) should be compactly supported,
(2) should be strictly positive within some cutoff distance rc,
(3) should decrease monotonically,
(4) should be at least twice continuously differentiable,

with a minimal number of non-zero terms are
where s = 1 − r/h is the normalized complementary coordinate within the span h of the kernel, and
is an optional normalization factor to ensure that the integral of the kernel in a 3D ball of radius h is unity. The free parameters a and b can be used to adjust the smoothness and width of the kernel and can take any real numbers satisfying the condition a > b > 2. A series of instantiations of the kernel with various a, b values is illustrated in Fig. 14. Note that the kernel W4,3 is equivalent to the Lucy kernel commonly used in Smoothed Particle Hydrodynamics simulations.32 The kernel can be evaluated very efficiently using only multiplication and addition when both a and b are integers.
Visualization of the polynomial kernels as given in Eq. (A1) with a unit support radius. The kernels are bell-shaped with a derivative of 0 at the origin. Both the first and second derivatives of the kernels transition smoothly to 0 at its support radius. In contrast, the Gaussian kernel and its derivatives do not decay to zero at any finite distance, while the second derivative of the cosine kernel as mentioned in previous work11,12 is not zero at the cutoff distance.
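Assuming the two-term form Wa,b(s) = a s^b − b s^a for Eq. (A1), an assumption consistent with the stated identification of W4,3 with the Lucy kernel, the kernel and a numerical check of the criteria read as follows; the normalization factor is omitted.

```python
import numpy as np

def W(a, b, r, h=1.0):
    # Assumed two-term compact polynomial kernel (normalization omitted);
    # s = 1 - r/h, clipped so the kernel vanishes outside its support.
    s = np.clip(1.0 - r / h, 0.0, None)
    return a * s ** b - b * s ** a

def lucy(r, h=1.0):
    # Lucy kernel used in SPH, for comparison with W(4, 3, .).
    return (1.0 + 3.0 * r / h) * (1.0 - r / h) ** 3

r = np.linspace(0.0, 1.0, 101)
```

Under this form, W(4, 3, r) reproduces the Lucy kernel identically, the kernel vanishes at its support radius, and it decreases monotonically, in line with criteria (1) to (3); the condition b > 2 makes the second derivative vanish at the support radius as well.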
2. Table of quadrature nodes and weights
In Table IV, we list the nodes and weights of the Laguerre quadrature rules up to Nr = 6, using notations from Eq. (22). In Table V, we list the nodes and weights of the Lebedev quadrature rules up to Na = 50, using notations from Eq. (21). The Laguerre and Lebedev quadrature nodes can be combined using Eqs. (23)–(26) into composite grids for sampling the atomistic density field.
Laguerre quadrature nodes and weights up to 6 points.
|  |  | n = 0 | n = 1 | n = 2 | n = 3 | n = 4 | n = 5 |
|---|---|---|---|---|---|---|---|
Nr = 2 | r_n | 2.0000 | 6.0000 | ||||
a_n | 1.5000 | 0.5000 | |||||
Nr = 3 | r_n | 1.5174 | 4.3116 | 9.1710 | |||
a_n | 1.0375 | 0.9058 | 0.0568 | ||||
Nr = 4 | r_n | 1.2268 | 3.4125 | 6.9027 | 12.4580 | ||
a_n | 0.7255 | 1.0634 | 0.2067 | 0.0044 | |||
Nr = 5 | r_n | 1.0311 | 2.8372 | 5.6203 | 9.6829 | 15.8285 | |
a_n | 0.5209 | 1.0667 | 0.3835 | 0.0286 | 0.0003 | ||
Nr = 6 | r_n | 0.8899 | 2.4331 | 4.7662 | 8.0483 | 12.6004 | 19.2620 |
a_n | 0.3844 | 0.9971 | 0.5361 | 0.0795 | 0.0029 | 0.0000 |
Lebedev quadrature nodes and weights up to 50 points.
| Na = 6 | x_m | y_m | z_m | b_m |
|---|---|---|---|---|
m = 0,1 | ±1.000 00 | 0.000 00 | 0.000 00 | 0.166 67 |
m = 2,3 | 0.000 00 | ±1.000 00 | 0.000 00 | 0.166 67 |
m = 4,5 | 0.000 00 | 0.000 00 | ±1.000 00 | 0.166 67 |
| Na = 14 | x_m | y_m | z_m | b_m |
|---|---|---|---|---|
m = 0,1 | ±1.000 00 | 0.000 00 | 0.000 00 | 0.066 67 |
m = 2,3 | 0.000 00 | ±1.000 00 | 0.000 00 | 0.066 67 |
m = 4,5 | 0.000 00 | 0.000 00 | ±1.000 00 | 0.066 67 |
m = 6−13 | ±0.577 35 | ±0.577 35 | ±0.577 35 | 0.075 00 |
| Na = 26 | x_m | y_m | z_m | b_m |
|---|---|---|---|---|
m = 0,1 | ±1.000 00 | 0.000 00 | 0.000 00 | 0.047 62 |
m = 2,3 | 0.000 00 | ±1.000 00 | 0.000 00 | 0.047 62 |
m = 4,5 | 0.000 00 | 0.000 00 | ±1.000 00 | 0.047 62 |
m = 6−9 | 0.000 00 | ±0.707 11 | ±0.707 11 | 0.038 10 |
m = 10−13 | ±0.707 11 | 0.000 00 | ±0.707 11 | 0.038 10 |
m = 14−17 | ±0.707 11 | ±0.707 11 | 0.000 00 | 0.038 10 |
m = 18−25 | ±0.577 35 | ±0.577 35 | ±0.577 35 | 0.032 14 |
| Na = 38 | x_m | y_m | z_m | b_m |
|---|---|---|---|---|
m = 0,1 | ±1.000 00 | 0.000 00 | 0.000 00 | 0.009 52 |
m = 2,3 | 0.000 00 | ±1.000 00 | 0.000 00 | 0.009 52 |
m = 4,5 | 0.000 00 | 0.000 00 | ±1.000 00 | 0.009 52 |
m = 6−13 | ±0.577 35 | ±0.577 35 | ±0.577 35 | 0.032 14 |
m = 14−17 | ±0.459 70 | ±0.888 07 | 0.000 00 | 0.028 57 |
m = 18−21 | ±0.888 07 | ±0.459 70 | 0.000 00 | 0.028 57 |
m = 22−25 | ±0.459 70 | 0.000 00 | ±0.888 07 | 0.028 57 |
m = 26−29 | ±0.888 07 | 0.000 00 | ±0.459 70 | 0.028 57 |
m = 30−33 | 0.000 00 | ±0.459 70 | ±0.888 07 | 0.028 57 |
m = 34−37 | 0.000 00 | ±0.888 07 | ±0.459 70 | 0.028 57 |
| Na = 50 | x_m | y_m | z_m | b_m |
|---|---|---|---|---|
m = 0,1 | ±1.000 00 | 0.000 00 | 0.000 00 | 0.012 70 |
m = 2,3 | 0.000 00 | ±1.000 00 | 0.000 00 | 0.012 70 |
m = 4,5 | 0.000 00 | 0.000 00 | ±1.000 00 | 0.012 70 |
m = 6−9 | 0.000 00 | ±0.707 11 | ±0.707 11 | 0.022 57 |
m = 10−13 | ±0.707 11 | 0.000 00 | ±0.707 11 | 0.022 57 |
m = 14−17 | ±0.707 11 | ±0.707 11 | 0.000 00 | 0.022 57 |
m = 18−25 | ±0.577 35 | ±0.577 35 | ±0.577 35 | 0.021 09 |
m = 26−33 | ±0.301 51 | ±0.301 51 | ±0.904 53 | 0.020 17 |
m = 34−41 | ±0.301 51 | ±0.904 53 | ±0.301 51 | 0.020 17 |
m = 42−49 | ±0.904 53 | ±0.301 51 | ±0.301 51 | 0.020 17 |