Transferring particle charges to and from a grid plays a central role in the particle–mesh algorithms widely used to evaluate the electrostatic energy in molecular dynamics (MD) simulations. The computational cost of this transfer process represents a substantial part of the overall time required for simulation and is primarily determined by the size of the support (the set of grid nodes at which the transfer function is evaluated). The accuracy of the resulting approximation depends on the form of the transfer function, of which several have been proposed, as well as the size and shape of its support. Here, we show how to derive the transfer function that yields maximal asymptotic accuracy for a given support in the limit of fine grid resolution, finding that all such functions are splines, and we determine these functions (which we refer to as midtown splines) for a variety of choices of support to find optimally efficient transfer functions at accuracy levels relevant to MD simulations. We describe midtown splines that achieve fourth- and sixth-order accuracy in the grid spacing while requiring a support size of 32 and 88 grid nodes, respectively, compared to the 64 and 216 nodes required by the most widely used transfer functions (B-splines). At accuracy levels typically used in MD simulations, the use of midtown splines thus cuts the time required for charge spreading by roughly a factor of two.
I. INTRODUCTION
The inclusion of electrostatic interactions improves the accuracy of modern molecular models, but the long-range nature of these interactions makes them prohibitively expensive to evaluate directly by pairwise summation. A commonly used, more efficient alternative is splitting the interactions into short-range and long-range components (by means of Ewald splitting1–3 or the u-series decomposition4). The short-range interactions, which include the Coulomb singularity, are effectively range-limited and are evaluated by direct pairwise summation.5 The long-range interactions are typically first mapped to a rectangular grid, where they are evaluated (a process that often involves transforming to Fourier space, where the long-range components become range-limited and straightforward to compute), and then energies and/or forces are interpolated back to the continuum.6–10 The accuracy and efficiency of the continuum-to-grid mapping (a procedure also called “charge assignment” or “charge spreading”) and its reverse are important factors in determining the performance of molecular dynamics (MD) simulations. Although our focus is on MD, we note that continuum-to-grid mapping is a problem encountered in other applications, such as the evaluation of quantum effects by path integral calculations,11 smoothing in medical image processing,12 and particle hydrodynamics calculations.13
Continuum-to-grid mapping consists of approximating a charge normally located at a point r in the continuum by a number n of charges located on the nearest nodes of a regular grid (with spacing Δ). For simplicity, consider a one-dimensional case: A charge at point x is spread to all grid nodes iΔ within a radius rs = (n/2)Δ of x. The amount of charge assigned to iΔ (more generally referred to as the “weight” assigned to that grid node) is a function of the relative position x − iΔ. Due to its functional dependence on x, we term this weight the transfer function; the set of points where the transfer function is non-zero is called its support. Because the size of the support is finite and the mesh size Δ is non-zero, the original point charge is only approximately represented by the weights; there are thus inherent errors in energies evaluated using the continuum-to-grid mapping. These errors can be reduced by increasing the support size, but at increased computational cost. Alternatively, for a given support size, one can try to reduce the errors—and thus obtain a better trade-off between accuracy and computational cost—by judicious choice of the transfer function.
The first transfer function to see widespread use in MD simulations was the triangular-shaped cloud (TSC), which spreads a charge to the nearest three grid nodes in the one-dimensional case and 27 grid nodes in the three-dimensional case. TSC is used in the context of the particle–particle particle–mesh (P3M) method of Hockney and Eastwood,6 who have demonstrated that the error in the total energy is O(Δ2) in the limit Δ → 0. A widely used related method known as smooth particle mesh Ewald (SPME)7 uses cardinal B-splines for charge assignment to increase the convergence order beyond 2: For a support size of n in one dimension, the convergence order is also n. In three dimensions, one B-spline is used for each dimension to form an outer product (i.e., a separable function) that spreads a point charge to n3 grid nodes. (B-splines have more recently been replaced by Kaiser–Bessel functions14,15 and further optimized to dedicated Hermite splines.16) The simplest transfer function is perhaps the truncated Gaussian used in Gaussian split Ewald (GSE).9 Although less accurate for a given support size, this function is used on the specialized supercomputer Anton,17 which takes advantage of the function’s simple form to evaluate it rapidly.
In this work, we show how to derive the optimal transfer functions (i.e., the sets of weights that minimize the error in the limit Δ → 0) for a given support, finding that these functions are splines in the position r. These optimal functions, which we term midtown splines, are continuous and piecewise smooth in the argument r for all even integers n and are thus suitable for MD, where differentiability is needed for the evaluation of the forces. We first derive the optimal transfer functions for one-dimensional systems and do so in two steps. In the first step, we address the related more general problem of evaluating the Gaussian smoothing of a function that is known only on a grid. In the continuum, Gaussian smoothing is an integral of the product between the function and a Gaussian of mean x. On a grid, this integral is approximated by a finite weighted sum known as a “quadrature rule” (see Refs. 18 and 19). The effect of the Gaussian that multiplies the function is captured by way of x-dependent quadrature weights, which, upon minimization of the error, are found to be splines in x: the midtown quadrature splines (QuadS). In the second step, QuadS are shown to minimize the asymptotic error in the electrostatic potential energy and are thus demonstrated to be the optimal transfer functions for evaluating long-range interactions in one dimension. Similar to B-splines, QuadS can be extended to three dimensions by way of an outer product, with the error being O(Δn) in the limit Δ → 0, but with the minimum possible proportionality constant among all separable transfer functions.
Separable spreading functions, being products of one-dimensional functions, are restricted to a cubic support, but further optimization is possible by considering non-separable spreading functions, which do not have this restriction on the shape of the support. Optimizing directly for non-cubic supports in higher dimensions entails higher-dimensional integration, which requires the determination of what is known as a “cubature rule.” The optimal weights are again found to be splines, which we call the midtown cubature splines (CubeS). CubeS achieve the same order of accuracy as QuadS but with smaller support sizes: Examples include one CubeS transfer function of accuracy order 4 with a support size of 32 grid nodes (compared to 64 nodes for QuadS) and two functions of order 6 with a support size of 80 or 88 nodes (compared to 216 nodes for QuadS). This makes CubeS about two times faster for charge spreading than QuadS—and thus all separable transfer functions, such as the widely used B-splines—at accuracy levels typically used in MD simulations. (We note that CubeS is slightly less accurate than QuadS for a given grid spacing Δ at the same order of convergence, but this is typically not important in practice. On most computational architectures, the spreading component dominates the cost of computing the long-range electrostatics. The savings from the smaller support size of CubeS is thus much larger than the marginal increase in the computational cost that results from the decrease in the grid spacing for CubeS to yield the same accuracy as QuadS.)
We provide practical guidance on how to determine the optimal computational setup for MD simulations using midtown splines in combination with the u-series decomposition (an alternative to the Ewald decomposition that is more computationally efficient). Using this setup, we found that simulations of water boxes on graphics processing units (GPUs) with CubeS and a support size of 88 are 30%–40% faster than those using QuadS with a support size of 216. These results illustrate the potential practical performance benefits of using CubeS in MD simulations. Finally, we note that on certain computational architectures—including on Anton—it is much more efficient to use a transfer function that is a function of the absolute distance to the grid nodes in three dimensions, such as the truncated Gaussian of the GSE algorithm. Although midtown splines are thus not directly applicable to Anton, their development yielded insight that allowed us to develop an improved version of the GSE algorithm, which we call hyperbolic sine Gaussian split (SinhGS), that can be used on such architectures.
II. MIDTOWN QUADRATURE SPLINES
The midtown quadrature splines (QuadS) are introduced in this section as an optimization result for the problem of approximately representing and manipulating one-dimensional Gaussian smoothings by means of a grid of mesh size Δ. It is only in Sec. III that we demonstrate that the same splines also optimally solve the problem of one-dimensional continuum-to-grid charge transfer. This separation in which we first consider a more general optimization problem (defined more precisely below) before showing that the resulting splines are also optimal for charge spreading makes it clearer that the domain of applicability of the splines extends beyond charge spreading (to path integrals and image filter design), even though for concreteness, we retain the language of charge spreading below. To skip the details of the optimization problem, continue with Sec. VI, which gives practical guidance for most effectively using midtown splines as transfer functions to compute the electrostatics in MD simulations. Explicit formulas for midtown splines of orders 4 and 6 are provided in the Appendix, and formulas for other orders can be computed with the Mathematica code provided in the supplementary material.
The Gaussian smoothing fσ(x) of an arbitrary function f(x) is defined by
with denoting the normalized Gaussian kernel,
In practice, the smoothing can rarely be performed exactly because f(x) is almost always known only for the grid points xi, whether by necessity (an apparatus sampling a signal may only allow for finite resolution) or by design (to speed up computations by storing and processing less information). The integral in Eq. (1) must then be replaced by a sum that only uses the samples f(xi). An initial candidate is provided by the Riemann sum,
where we introduce the scaled width ξ = σ/Δ and position s = x/Δ. The sum in Eq. (3) is actually an infinite series because the support of the Gaussian kernel is unbounded. Computationally, we require a finite sum, and to obtain it, we must replace with an approximation Sξ(s, i) that vanishes whenever |s − i| exceeds a certain finite bound ν. Given the fact that i is an integer, we restrict the bound ν to integers. The smoothing fσ(x) is approximately recovered by the interpolation formula (now effectively a finite sum),
For each , the values represent a set of weights that, together with the nodes xi, define a quadrature rule. The associated quadrature sum [i.e., the sum in the right-hand side (rhs) of Eq. (4)] is meant to approximate the continuous integral in Eq. (1), and as a first optimization problem, we could attempt to minimize the error of this approximation. Before doing so, let us point out the existence of a different way of approximating fσ(x). This approach is given by what we call the assignment formula,
which applies even when f(x) is not continuous (e.g., if it is a Dirac delta localized between the grid nodes), but only computes the values of fσ(x) on the grid. To make sense of Eq. (5), revisit Eq. (1) and simply note that has been replaced with the transfer function .
At this point, a natural question is which of the two formulas is to be optimized. It is helpful to anticipate the results: we will demonstrate that the kernel Sξ(s, i) that optimizes the assignment formula to a given order of accuracy simultaneously optimizes the interpolation formula to the same order of accuracy. In other words, the choice of objective function to be optimized is a false dilemma, but this will become clear only after several propositions are first demonstrated. The optimization problem will be stated and solved at the end of this section in Proposition 4. We will additionally find that the optimal transfer functions are at least continuous and piecewise differentiable in the argument x.
It is important to note that, although the variables x and x′ in are interchangeable, they perform different roles in Eq. (1): x′ is an integration variable, whereas x is an interpolation variable. The difference is more conspicuous in , since the variable i is now discrete. We distinguish between the discrete representation , which we call the assignment view, and the continuous representation , which we call the interpolation view. The interpolation and assignment views are named so because they showcase the action of the interpolation and assignment formulas, respectively, on singular charges. We obtain the assignment view by applying the assignment formula to a unit charge located at x. Since the charge density is then the Dirac function δ(x′ − x), we calculate
To obtain the interpolation view, we apply the interpolation formula to a unit charge located on the grid at node xi to obtain
with δij denoting the Kronecker symbol and modeling the singular charge distribution on the grid.
We now provide the details needed to both define the optimization problem and identify the optimal . We will start by requiring that the interpolation formula is exact for all polynomials f(x) of degree strictly less than 2ν. This requirement defines the kernel , which turns out to have the form , with being a spline that is uniquely determined except for the value of the nonlinear parameter ξ. We then show that with this kernel, the assignment formula is also exact for all polynomials of order strictly less than 2ν. The optimal value of the non-linear parameter ξ is determined, predictably, by minimizing the error in the moment of order 2ν for the assignment formula.
A. Accuracy of the interpolation formula
To ensure an order of accuracy equal to 2ν, we will define the assignment view such that the interpolation formula is exact for all functions f(x) that are polynomials of degree strictly less than 2ν. Put differently, we require the assignment view to represent the weights for a Newton–Cotes quadrature rule of order 2ν. As noted in Ref. 18, such a rule might be better said to be of algebraic degree 2ν − 1. Nevertheless, it is also noted in Ref. 18 that the rule invariably leads to error terms with asymptotic decay of the form const · Δ2ν in the limit Δ → 0 for expressions in which the exact integral is replaced by the quadrature sum. Expressions with error decay of the form const · Δ2ν in the limit Δ → 0 are said to be of accuracy order 2ν, and the error is said to be O(Δ2ν) as Δ → 0. The quadrature rule itself is then rightfully said to be of accuracy order 2ν, a nomenclature that is more widespread than the concept of algebraic degree and that we adopt here to characterize the accuracy of both the interpolation and assignment formulas. We only consider even orders of accuracy, corresponding to integer values for ν, but provide a short discussion on the properties of midtown splines of odd orders in Appendix F. The nodes of the quadrature rule are the grid nodes xi within a distance νΔ of x because, as already mentioned, the assignment view is set equal to 0 outside this range.
When dealing with quadrature rules, it appeared to be easier to work in terms of the pair (, θ) instead of just s, with and θ(s) = s − denoting the integer and fractional parts of s, respectively. θ(s) takes values in the interval [0, 1). The decomposition s = + θ(s) establishes a one-to-one correspondence between s and the pair (, θ). Owing to this bijection, we can drop the explicit dependence of θ(s) on s and think of θ as a variable taking values in the range [0, 1). We will do so most of the time, except when readability can be improved by explicitly specifying the dependence θ(s).
The support of the assignment view is made up of the grid nodes in the set {, , …, }. They are indicated by the shaded ellipses in Fig. 1. Let us denote the quadrature weight associated with the node as ci(θ, ξ) for −ν < i ≤ ν. We consider the following approximation to fσ(x) defined as a standard Newton–Cotes quadrature sum,
Comparison with Eq. (4) shows that
The interpolation formula becomes
As we have already mentioned, the quadrature weights ci(θ, ξ) are obtained by demanding that the Newton–Cotes rule integrate exactly all polynomials of degree at most 2ν − 1. That is, we require that
for some and for all 0 ≤ k < 2ν. The value of a is immaterial because the set of polynomials in x′ spanned by the monomials {(x′ − a)k; 0 ≤ k < 2ν} is independent of a. Picking a = ( + 1/2)Δ for convenience of calculation, we deduce that the weights ci(θ, ξ) satisfy the linear system of equations,
for all 0 ≤ k < 2ν. In Eq. (12), μk is the kth moment of a standard normal variable,
with (k − 1)‼ denoting the product of every odd number from 1 to k − 1 if k ≥ 2 and 1 otherwise. For now, ξ = σ/Δ represents a free parameter, the value of which will be decided later. The system specified by Eq. (12) is uniquely determined because the matrix {(i − 1/2)k; 0 ≤ k < 2ν, −ν < i ≤ ν} is invertible.
Assignment of grid values for point charges placed at different locations x. A darker shade for the ellipsis under a given grid node means that more charge is assigned to that node. In this example, 2ν = 4, non-zero charges are assigned only to the grid nodes that are within 2Δ of x. In the continuum, the charges are spread as Gaussians of standard deviation σ centered on x. The interpolation formula approximates fσ(x) with a weighted average of the values of f at the nodes that were assigned non-zero weights.
Assignment of grid values for point charges placed at different locations x. A darker shade for the ellipsis under a given grid node means that more charge is assigned to that node. In this example, 2ν = 4, non-zero charges are assigned only to the grid nodes that are within 2Δ of x. In the continuum, the charges are spread as Gaussians of standard deviation σ centered on x. The interpolation formula approximates fσ(x) with a weighted average of the values of f at the nodes that were assigned non-zero weights.
It is useful to extend the domain on which the weights ci(θ, ξ) are defined from [0, 1) × (0, ∞) to [0, 1] × (0, ∞). The extension is done by taking limθ↑1c(θ, ξ) or, equivalently, by solving Eq. (12) for θ = 1. On the extended range, we have the symmetry property
The symmetry is demonstrated in Appendix B.
The linear system given by Eq. (12) always has a unique solution, and the user can find the explicit solutions for orders 4 and 6 in the Appendix. Solutions exist even when the mesh Δ is so large that it cannot accurately represent the features of a Gaussian of spread σ or, at the other end, when Δ is so small that the diameter of the support set is shorter than the distances on which the Gaussian density is important. Clearly then, ξ = σ/Δ cannot be arbitrary: Only values at or near an optimal ratio ξ○ are numerically useful. The value of the optimal ratio is established in Sec. II B.
B. Accuracy of the assignment formula
Similar to how the assignment view controls the accuracy of the interpolation formula, the interpolation view controls the accuracy of the assignment formula. In this subsection, we demonstrate that, with the weights determined above, the assignment formula is also exact for all functions f(x) that are polynomials of degree 2ν − 1 or less. We additionally demonstrate that the interpolation view is a continuous spline and then proceed to determine the optimal value of ξ by minimizing the error of the assignment formula for a polynomial of order 2ν. It will be shown that the assignment formula becomes exact for polynomials of order 2ν and 2ν + 1 if ν is odd and the interpolation view becomes C2-continuous if ν is even.
The interpolation view models the dependence of the amount of charge transferred to a grid node xi = iΔ with the location of the point charge. Referring to Fig. 1 as a visual aid, imagine placing an observation apparatus at the grid node xi, measuring the intensity of the quadrature weight. When the point charge is away from the node by more than νΔ, the signal registered by the apparatus is 0. As the point charge continues to approach the node from the left, the signal recorded by the apparatus becomes cν(θ, ξ), with θ increasing from 0 to 1 as x − xi increases from −νΔ to (1 − ν)Δ. When x − xi becomes exactly (1 − ν)Δ, the signal is no longer given by cν(1, ξ), as one would infer by continuity from the left, but by cν−1(0, ξ). Unless the two values are equal, the signal is discontinuous. The signal is then given by the weight cν−1(θ, ξ) as long as x − xi lies between (1 − ν)Δ and (2 − ν)Δ. Then, the measured weight changes to cν−2(θ, ξ), and so on, until the entire set of weights {cj(θ, ξ); −ν < j ≤ ν} is exhausted. Once x − xi exceeds νΔ, the apparatus registers zero because the quadrature rule has compact support.
The signal intensity measured by the apparatus is given by the x-dependence of the transfer function or, equivalently, by the s-dependence of . The dependence is a piecewise polynomial, owing to the definition of . As can be inferred from Eq. (9), the integers j with |j − i| ≤ ν may constitute points of discontinuity for the s-dependence of and its derivatives. often needs to be differentiable (in order to compute forces in MD simulations, for example), and in such cases, must be continuous at integer values of s; more precisely, we must have cν(0, ξ) = 0, c1−ν(1, ξ) = 0, and cj+1(1, ξ) = cj(0, ξ) for −ν < j < ν. The following proposition shows that the dependence of with s is a continuous piecewise polynomial (a proper spline).
The interpolation view can be decomposed as the sum between a continuous piecewise linear spline and a C2-continuous spline. The piecewise linear component vanishes if dcν(θ, ξ)/dθ = 0 at θ = 0.
The proof of the proposition is given in Appendix C. A graphical illustration for order 2ν = 6 is given in Fig. 2 for different values of ξ.
Interpolation views of order 2ν = 6 (solid lines) decomposed as sums between C2-continuous (dotted lines) and piecewise linear (dashed lines) parts for different values of ξ. The piecewise linear part is almost negligible when ξ = ξ○, as shown in the right panel, where the interpolation view resembles a Gaussian. The piecewise linear part can be entirely eliminated for orders 2ν of the form 4k.
Interpolation views of order 2ν = 6 (solid lines) decomposed as sums between C2-continuous (dotted lines) and piecewise linear (dashed lines) parts for different values of ξ. The piecewise linear part is almost negligible when ξ = ξ○, as shown in the right panel, where the interpolation view resembles a Gaussian. The piecewise linear part can be entirely eliminated for orders 2ν of the form 4k.
For each 0 ≤ k < 2ν, the k-th moment equals the k-th central moment of a Gaussian of standard deviation ξ and mean 0.
The proof is presented in Appendix C.
If 2ν = 4k + 2 for some integer k, there exists a unique ξ, to be denoted by ξ○, such that the interpolation view matches two more additional central moments (the moments of order 2ν and 2ν + 1). There exists no C2-continuous parameterization.
If 2ν = 4k for some integer k, there exists a unique ξ, to be denoted by ξ○, such that the interpolation view is C2 continuous. There exists no parameterization for which the interpolation view matches the moment of order 2ν.
The optimal parameters ξ○ for the first few quadrature splines of even order. Also shown are the continuity classes of the splines.
Order 2ν . | ξ○ . | Continuity class . |
---|---|---|
4 | C2 | |
6 | 0.728 794 88… | C0 |
8 | 0.835 436 00… | C2 |
10 | 0.946 570 66… | C0 |
12 | 1.030 918 80… | C2 |
14 | 1.122 896 72… | C0 |
Order 2ν . | ξ○ . | Continuity class . |
---|---|---|
4 | C2 | |
6 | 0.728 794 88… | C0 |
8 | 0.835 436 00… | C2 |
10 | 0.946 570 66… | C0 |
12 | 1.030 918 80… | C2 |
14 | 1.122 896 72… | C0 |
For 2ν = 4k + 2, the error expressed by Eq. (19) vanishes when ξ = ξ○. For 2ν = 4k, the error does not vanish but attains its non-zero minimum for the same ξ○ that produces the unique C2-continuous parameterization. We define QuadS through and use it by way of the formula .
Asymptotic dependence of the optimal parameter ξ○ with the half-order ν for sufficiently large ν.
Asymptotic dependence of the optimal parameter ξ○ with the half-order ν for sufficiently large ν.
Illustration of the agreement between the interpolation view and the Gaussian kernel it approximates when ξ = ξ○. The agreement improves rapidly as 2ν increases.
Illustration of the agreement between the interpolation view and the Gaussian kernel it approximates when ξ = ξ○. The agreement improves rapidly as 2ν increases.
III. APPLICATIONS OF THE MIDTOWN QUADRATURE SPLINES TO ELECTROSTATICS
In this section, we show that QuadS are the optimal transfer functions to use when computing the long-range component of the electrostatic energy in one dimension (also in higher dimensions among separable transfer functions). As mentioned in the Introduction, the optimal transfer functions in higher dimensions are non-separable, and these CubeS functions are described in Sec. IV; practical guidance on choosing appropriate parameters for both QuadS and CubeS electrostatic calculations is given in Sec. VI.
For electrostatics calculations, the function f(x) represents a one-dimensional charge distribution density vanishing outside a bounded interval. The long-range component of the electrostatic energy is given by the formula
with representing the long-range interaction kernel between two unit charges located at x and x′. We consider to be a linear superposition of Gaussian kernels , with the standard deviation σ′ larger than or equal to a minimum value denoted by σ. Practically useful examples of such superpositions include the u-series and Ewald kernels for the long-range component of the electrostatics, the forms of which will be given later in this section. We mention from the outset that it is the narrowest Gaussian in the superposition, the one of width σ, that ultimately dictates the maximum mesh size Δ of a common grid used to handle all Gaussians in the superposition. We note here that due to linearity, there is no loss of generality in our arguments establishing the optimality of QuadS by restricting the class of interaction kernels to either for non-periodic charge distributions or
for periodic charge distributions with period a. This restriction is made solely to avoid carrying around an extra exterior integral or sum denoting the superposition.
The starting point for designing grid-based methods is the splitting
where ranges between 0 and σ. As a result, σs is allowed to vary between 0 and , and its determination for typical problems is discussed extensively in Sec. VI. The splitting is a direct application of the Gaussian convolution theorem, which states that the repeated convolution of Gaussians centered about the origin results in a Gaussian with variance equal to the sum of the variances of the convolving factors. If the integrals in the previous equation are replaced by the infinite Riemann sums,
then the interaction energy is approximated by
Although not directly usable in practice, Eq. (25) is suggestive of a wide class of implementable approximations for the interaction energy. Members of this class are obtained by modifying the on-grid interaction kernel and/or replacing the Gaussians entering Eq. (24) with arbitrary transfer functions vanishing for all |x − xi| ≥ rs > 0. Our optimization is over a subclass of models that keeps the on-grid interaction kernel unchanged and sets the spreading radii rs in the form νΔ, with fixed but arbitrary ν. We note here (partially anticipating the results of the optimization) that the error for the optimal model in the subclass decays only polynomially fast. It follows then that the optimal transfer functions remain optimal for all models for which the on-grid interaction kernel takes the general form , as long as it converges to in the limit Δ → 0 at a rate faster than any polynomial. Mimicking what many other P3M-inspired methods do, the model we ultimately parameterize in Sec. VI uses a related spectral kernel built by restricting the Fourier transform of to the first Nyquist zone, according to Eq. (74). This kernel converges to superexponentially in the limit Δ → 0 and is preferred in practice because it permits the use of the fast Fourier transform (FFT) algorithm.
With these clarifications in mind, we now set out to find the optimal
To further simplify the analysis, we use the splitting
with , to write the interaction energy as
A little thought shows that it is enough to optimize the rate of the convergence,
since is either or the periodic extension of it. We will refer to the lhs of Eq. (28) as the half-potential. The total energy is then the convolution of two such half-potentials.
The QuadS transfer function is and turns out to be the optimal transfer function if σs = ξ○Δ, as established by the following proposition.
For piecewise continuous charge distributions f(x), the QuadS transfer function is asymptotically optimal among all possible assignment kernels that vanish for |x − xi| ≥ νΔ.
As noted above, there are likely to be other fruitful applications of QuadS in addition to evaluation of the long-range component of electrostatics on which we have focused here; we present another example application of QuadS (to path integrals) in Appendix E. The main result is Proposition 6, which states that QuadS of order 4k represent the optimal transfer functions for grid-to-grid assignment.
IV. MIDTOWN CUBATURE SPLINES
QuadS are only the optimal functions in higher dimensions if we restrict ourselves to separable functions, which require cubic support shapes. In this section, we consider non-separable functions, which provide opportunities for further optimization by changing the shape of the support. (We note that the most useful ones of such functions for MD simulations are given explicitly in the Appendix, and practical guidance on their use in MD simulations is given in Sec. VI.)
It is instructive to first consider why a product of one-dimensional midtown splines does not result in an optimal cubature scheme absent a request for separability. The reason is that the central moments
exactly match those for a Gaussian of covariance matrix , not only for all multi-indices α = (α1, α2, α3) with |α| ≡ α1 + α2 + α3 < 2ν but also for many multi-indices with |α| ≥ 2ν. In d dimensions, the number of moments of degree smaller than 2ν (the bare minimum that needs to be matched) is equal to the number of weak compositions of 2ν − 1 in precisely d + 1 parts (the number of ways in which 2ν − 1 can be written as a sum of d + 1 non-negative integers). It is given by the formula
A d-dimensional integration scheme based on QuadS uses (2ν)d nodes and matches (2ν)d moments. In three dimensions, for large enough 2ν, QuadS matches 3! = 6 times more moments than the minimum necessary. For order 2ν = 6, the minimum number of moments is 56, yet the cubature based on QuadS matches 6 × 6 × 6 = 216 moments. Some of these moments, for example, the one of multi-index α = (5, 4, 3), have total order |α| = 12. Since most other moments of order 12 are not matched, the accuracy order remains 6. We will show that there exist cubature splines (CubeS) of order 2ν that use much fewer than (2ν)d nodes. In effect, we formulate and study a cubature problem that would be typical if not for the additional requirement of continuity of the interpolation view.
As for QuadS, the CubeS assignment kernel is given by the rhs of Eq. (38), with the following d-dimensional extension of Eq. (9):
satisfying an analog of Eq. (15), namely,
The vector-valued functions and θ(s) are component-wise extensions of their one-dimensional counterparts. The support set D is equal to the integer interval [(1 − ν)⋯ν] for QuadS in one dimension and to [(1 − ν)⋯ν]d for QuadS in d dimensions (defined as outer products of one-dimensional midtown splines). For CubeS, the support set D is a proper subset of [(1 − ν)⋯ν]d. CubeS must integrate exactly all monomials of order less than 2ν, that is,
for all multi-indices α with |α| < 2ν. This system of equations is the multidimensional analog of Eq. (12). The system is overdetermined unless the number of cubature nodes is larger than the number of moments of total order less than 2ν. Equation (43) thus provides a lower bound on the minimum number of elements D must contain.
The selection of the proper cubature nodes is a rather delicate problem, made challenging by the large amount of interesting mathematical information encoded by a multidimensional Gaussian measure. We have studied two kinds of sets D, both defined by
for some norm ∥·∥p. The two important norms for our study are the L1 or Manhattan norm ∥x∥1 = |x1| + ⋯ + |xd| and the L2 or Euclidean norm ∥x∥2 = (|x1|2 + ⋯ + |xd|2)1/2. The L∞ or Chebyshev norm ∥x∥∞ = max{|x|1, …, |x|d} is also important, but this choice results precisely in (outer products of) QuadS. The nodes singled out by Dp are those within the Lp distance of ν − 1 from the unit cell [0, 1]d, the place where the vectors θ live. We have D∞ = [1 − ν⋯ν]d and D1 ⊆ D2 ⊆ D∞. The inclusions follow from the inequalities ∥·∥∞ ≤ ∥·∥2 ≤ ∥·∥1. The family of cubature splines of support Dp will be denoted by CubeSp, and we note that CubeS∞ is alternatively denoted by QuadS.
Driven by practical considerations, we will set d = 3 for the remainder of this section and assess the relative numerical advantages of different CubeSp schemes by simply examining the sizes and shapes of their supports. For large enough ν, the cardinality of Dp, denoted by |Dp|, is given roughly by the volume of the set. The ratios |D1|/|D∞| and |D2|/|D∞| approach 1/6 and π/6, respectively. |D1| thus scales with ν in the same way as the lower bound given by Eq. (43), and it ought to result in cubature splines that have nearly minimal numbers of nodes. Although scaling less favorably, |D2| is not much larger than |D1| for the small orders 2ν that are relevant to practical applications, as shown in Table II. The support D2 (a solid sphere for large 2ν) more closely captures the rotational symmetry of the spreading Gaussian than either D∞ (a cube) or D1 (an octahedron). The support D∞ is the only one that captures the separability of the spreading Gaussian, which sometimes can be exploited computationally and may partially compensate for the larger support size.
The second column shows the minimum number of moments that must be matched for the cubature rule to achieve the convergence order stated in the first column. The other columns show the actual number of moments matched for the listed cubature splines.
Order 2ν . | Lower bound . | CubeS1 . | CubeS2 . | QuadS (CubeS∞) . |
---|---|---|---|---|
4 | 20 | 32 | 32 | 64 |
6 | 56 | 80 | 88 | 216 |
8 | 120 | 160 | 232 | 512 |
10 | 220 | 280 | 432 | 1024 |
Order 2ν . | Lower bound . | CubeS1 . | CubeS2 . | QuadS (CubeS∞) . |
---|---|---|---|---|
4 | 20 | 32 | 32 | 64 |
6 | 56 | 80 | 88 | 216 |
8 | 120 | 160 | 232 | 512 |
10 | 220 | 280 | 432 | 1024 |
If the cardinality of Dp is larger than the minimum number of moments necessary to achieve the convergence order 2ν (henceforth also referred to as the minimal list of moments), the linear system given by Eq. (46) is underdetermined. In order to make the system uniquely determined, we must enlarge the list of matched moments. For QuadS, we know that we must match all moments of multi-index α with ∥α∥∞ < 2ν. There are (2ν)3 such moments, and their total equals the cardinality of D∞. This maximal list of moments also provides the candidate moments for CubeS1 and CubeS2 by the following prescription. The maximal list is first partially ordered as follows: We let the total order |α| increase from 0 to 3(2ν − 1). For each such |α|, we generate its integer partitions in at most 3 parts, with the partitions listed in reverse lexicographical order—e.g., (5, 0, 0) comes before (4, 1, 0). A permutation of such an integer partition defines a multi-index in the maximal list, and all elements of the maximal list are so enumerated. To summarize, in our ordering of the maximal list, multi-indices α with smaller |α| come first, integer partitions with same |α| are in the reverse lexicographical order, and permutations of the same integer partition are equivalent.
The minimal list of moments corresponds to all multi-indices from the maximal list that has a total order less than 2ν. Starting with the total order of 2ν, we provisionally add one group of equivalent permutations at a time, the group being taken in the order described in the previous paragraph. We form the linear system from Eq. (46) and try to solve it symbolically in Mathematica. If the system becomes inconsistent, the group of permutations is removed, and the next group is provisionally added. The process is stopped when the number of moments equals |Dp|. Of course, it is not trivial that the process stops before the maximal list is exhausted. We do not have a proof, but we have verified the success of the procedure for all three distances up to order 2ν = 10. We have additionally verified that the solutions are C0 continuous, which is again a non-trivial statement. Hence, at least for the cases presented in Table II, cubature splines exist.
We added entire groups of permutations to the list of matched moments so that the cubature rules possess some of the basic symmetries of the Gaussian measure they replace. All CubeS constructed above feature reflection and permutation symmetry. Imagine a coordinate system with the origin at the center of the unit cell [0, 1]3, as depicted in Fig. 5. Each octant (±1, ±1, ±1) contains an equal number of cubature nodes, which can be obtained from the nodes of the first octant (+1, +1, +1) by a sequence of reflections with respect to the main planes of the coordinate system. For each i ∈ Dp and θ ∈ [0, 1]3, the reflected node i† in the first octant and the location θ† are
with the vector operations being component-wise. The reflection symmetry implies
which generalizes Eq. (14) to CubeS. The cubature weights additionally relate to each other by way of permutation symmetry. Let σ denote a permutation of {1, 2, 3}. The permutation acts on the arbitrary vector x = (x1, x2, x3)⊤ in such a way that σ·x is the vector of entries xσ(i) for 1 ≤ i ≤ 3. The permutation symmetry has that
Together with Eq. (49), the last identity implies that it is enough to specify the cubature weights for those nodes i for which i1 ≥ i2 ≥ i3 ≥ 1. Figure 5 shows the nodes that are assigned charges for fourth-order CubeS1 and CubeS2. (At order 4, these cubature splines are identical). Owing to permutation and reflection symmetry, only the two nodes labeled (1, 1, 1) and (2, 1, 1) need to have their weights specified.
The balls mark the grid nodes that are assigned a charge by carriers located in the unit cube centered at the origin for fourth-order CubeS2. (Note that, at order 4, CubeS1 and CubeS2 are identical.) We use scaled coordinates (units of Δ) to illustrate the almost spherical distribution of the nodes of the cubature rule. The grid nodes located in the first octant are colored, with the permutationally equivalent nodes sharing the same color. The weights for all nodes are uniquely determined once the weights for the two nodes labeled (1, 1, 1) and (2, 1, 1) are specified [see Eqs. (49) and (50)].
The balls mark the grid nodes that are assigned a charge by carriers located in the unit cube centered at the origin for fourth-order CubeS2. (Note that, at order 4, CubeS1 and CubeS2 are identical.) We use scaled coordinates (units of Δ) to illustrate the almost spherical distribution of the nodes of the cubature rule. The grid nodes located in the first octant are colored, with the permutationally equivalent nodes sharing the same color. The weights for all nodes are uniquely determined once the weights for the two nodes labeled (1, 1, 1) and (2, 1, 1) are specified [see Eqs. (49) and (50)].
Owing to the use of the Euclidean distance in the definition of the support, CubeS2 better capture the spherical symmetry of the spreading Gaussian in scaled coordinates. For all CubeS, the optimal ξ○ is determined by minimizing the absolute error in the spherical moment of order 2ν, which is given by
For orders 2ν = 4k + 2, this absolute error vanishes for a unique ξ○, for which the weights of CubeS2 are all positive. For orders 2ν = 4k, the error does not vanish. The unique minimum is realized for a value ξ○ for which the CubeS2 have one weight (not counting its symmetry-induced variants) that can take small negative values. The CubeS1 family is decidedly worse with respect to the positivity of the cubature weights. Many weights take negative values for 2ν ≥ 8. For order 4, CubeS1 are identical to CubeS2. For order 6, although both CubeS1 and CubeS2 have positive weights, CubeS2 have smaller cost for evaluating the cubature weights and their positional derivatives, despite the slightly larger number of cubature nodes (88 vs 80). For these reasons, in the Appendix, we only present CubeS2 of order 4 and 6, as CubeS1 of order 6 does not seem to present any supplementary computational advantage.
V. HYPERBOLIC SINE GAUSSIAN SPLIT METHOD
As mentioned in the Introduction, Anton’s specialized hardware allows for fast computation of transfer functions that depend solely on the absolute distance to the grid nodes in three dimensions. The midtown splines do not have this property and so are not suitable to be used directly on Anton. Their development, however, gives insight into how to improve upon the GSE method used on Anton, while retaining the ability to take advantage of its specialized hardware. In this section, we describe such an improved method.
GSE uses a very simple transfer function: a Gaussian truncated to a certain finite spreading range ∥r − ri∥ < rs. As a consequence, GSE can only increase its accuracy by increasing the spreading range rs while simultaneously decreasing the mesh size Δ. In the original method, the optimal value for σs is determined empirically, but Lindbo and Tornberg10 have demonstrated the optimal scaling σs/Δ = κ(rs/Δ)1/2, with . (While the precise value of C is undetermined, it is argued to be a number slightly smaller than 1; with the value C = 0.976 used by Lindbo and Tornberg, we obtain κ ≈ 0.409.) Their scaling can also be inferred from our research on quadrature splines. In Sec. I B, we determined that σs/Δ ≈ κ(rs/Δ)1/2 for QuadS, with κ ≈ 0.427. From Eq. (20), we deduce that the QuadS transfer function is well approximated by
The last equation used with ∥Δ−1r − i∥ truncated to ν gives rise to Spectral Ewald (SE), the improved version of GSE studied by Lindbo and Tornberg. However, further improvements are possible, and in the remainder of this section, we use the asymptotic form expressed by Eq. (52) as a starting point for developing an improved transfer function suitable for Anton.
Equation (52) already differs from its original GSE counterpart in that it uses scaled coordinates s = Δ−1r. The new spreading function thus behaves differently for anisotropic systems or for simulations for which Δ fluctuates in time (for example, simulations in the isobaric ensemble). Additionally, rather than simply abruptly truncating the Gaussian kernel in Eq. (52) as in GSE, we modify it such that it vanishes continuously at the boundary of the support (the set determined by the inequality ∥s − i∥ < ν) while remaining very nearly a Gaussian inside the boundary. Although this requirement does not uniquely determine a transfer function, we propose the transfer function
where
The Gaussian split method using the hyperbolic sine transfer function will be labeled SinhGS.
is a normalization constant such that
The covariance matrix of the transfer function is
implying
The covariance matrix is needed for the evaluation of , which participates in the definition of the grid-to-grid interaction kernel entering Eq. (39). As a first guess, the constant κ can be set to 0.427, the value determined for QuadS, but empirical studies for charged systems of biological importance have shown that the value
leads to slightly smaller errors in the interaction energy.
Let us illustrate some of the improvements SinhGS provides over the original GSE using ions in aqueous solution as an example system. For SinhGS, we use ν = 3, which corresponds to the spreading radius rs = 3Δ for GSE; for GSE, this spreading radius is too small for most purposes, as will become clear (rs = 3.8Δ is generally used with GSE in typical biomolecular simulations). For reference, we also display results computed with QuadS of order 2ν = 6. The system is a cubic box of water of edge length equal to 126.7 Å, containing 125 protonated arginine dipeptides neutralized by an equal number of chlorine anions and modeled by the CHARMM force field.21 The electrostatic energy is computed with the help of the u-series decomposition, which is itself a source of errors. In Fig. 6, though, we only display the error that is solely due to the grid discretization. This error is computed by reference to an effectively exact calculation performed using QuadS of order 2ν = 8 and a very small mesh size. The cutoff for the near electrostatics is kept constant at rc = 9 Å, but the mesh size Δ is progressively decreased so that the value rc/Δ used in the abscissa increases. QuadS converges rapidly, consistent with a sixth order method, and is the only method to converge. Neither SinhGS nor GSE converges as Δ → 0, but the errors for SinhGS stay closer to 0 than the errors for GSE. As we already mentioned, SinhGS and GSE can be made to converge by letting ν become large, although this mode of convergence is more expensive, with the number of grid nodes to which a charge is spread being roughly 4πν3/3.
Energy error solely due to grid discretization for a typical configuration of the arginine dipeptide system plotted against the multiple of the inverse mesh rc/Δ. All calculations used ν = 3, the short-range electrostatics cutoff rc = 9 Å, and the u-series decomposition with base b = 2. Also reported are the exact (QuadS) or average (GSE and SinhGS) numbers of grid nodes to which a charge is spread.
Energy error solely due to grid discretization for a typical configuration of the arginine dipeptide system plotted against the multiple of the inverse mesh rc/Δ. All calculations used ν = 3, the short-range electrostatics cutoff rc = 9 Å, and the u-series decomposition with base b = 2. Also reported are the exact (QuadS) or average (GSE and SinhGS) numbers of grid nodes to which a charge is spread.
As exemplified in Fig. 7, MD simulations using GSE with ν = 3 exhibit strong energy drift for systems with large charges. The drift can be ascribed to the lack of continuity of the transfer function (the truncated Gaussian) at the cutoff radius. The drift can be decreased to acceptable values by increasing the spreading radius from 3 (in units of Δ) to the typically used value of ∼3.8, but there is a computational price to be paid since the number of grid nodes to which a charge is spread doubles. Under similar simulation conditions, SinhGS with ν = 3 performs better, and switching to this method is a computationally more efficient way to decrease the drift. Still, even though its energy drift is smaller, SinhGS must increase ν when more accuracy is needed, since modifying ν is the only accuracy control. From this point of view, the method is less flexible than QuadS or SPME on programmable computers, since the latter methods may alternatively increase the accuracy by decreasing Δ at constant ν. SinhGS is useful for Anton, where the cost of charge transfer is lessened by the utilization of dedicated hardware.
Energy drift in the microcanonical ensemble for the arginine dipeptide system. The Desmond/GPU simulation uses the mesh Δ = 3.6 Å and the Verlet integrator with a time step of 2 fs. The long-range component of the electrostatics is multistepped and evaluated every 6 fs.
Energy drift in the microcanonical ensemble for the arginine dipeptide system. The Desmond/GPU simulation uses the mesh Δ = 3.6 Å and the Verlet integrator with a time step of 2 fs. The long-range component of the electrostatics is multistepped and evaluated every 6 fs.
VI. USAGE OF MIDTOWN SPLINES WITH THE u-SERIES DECOMPOSITION
In earlier sections, we have derived the midtown splines (also the SinhGS method). In this section, we turn to more practical issues of how to use these splines in MD simulations. Both Midtown splines and SinhGS can be used to evaluate the interaction energy for all potentials that can be represented as a superposition of Gaussians with standard deviations larger than a certain minimum value σ. Many interactions can be expressed this way, including the long-range component of the Coulomb interaction or the two-body part of the dispersion interactions. The superposition can be continuous, as is the case for the Ewald decomposition, or discrete, as is the case for the u-series decomposition. Here, we consider the u-series decomposition because it has been shown to be more computationally efficient than the Ewald decomposition.4 We succinctly describe the u-series decomposition and explain how it is used to evaluate the long-range component of the electrostatic energy. We then point out the existence of one parameter that has to be determined: ϕ = Δ/rc, with rc denoting the cutoff for the short-range electrostatics. We empirically determine and tabulate the maximal value of ϕ consistent with the requirement that the grid discretization errors, which increase as ϕ increases, are smaller than the intrinsic errors of the u-series decomposition (those errors still present as Δ → 0).
With r = ∥r′ − r∥, the long-range component of the u-series decomposition for the Coulomb interaction 1/r is
and the short-range component is
The coefficients wl are equal to 1 for all l ≥ 1. Given b > 1, rc and w0 are such that rc is the smallest value for which is C1 continuous at r = rc. This requirement makes the ratio rc/σ a function of b. Using lower values of b leads to more accurate decompositions, but these decompositions are computationally more expensive because the ratio rc/σ increases as b decreases. We note that for special values of b, the short-range component becomes C2 continuous at r = rc. The order of continuity o at the cutoff is important because, for constant b, the error in the total electrostatic energy can be decreased as fast as by increasing rc.
The long-range component of the electrostatic energy for N particles located at the positions Rn and carrying the charges Qn is obtained as a sum over interacting grid points carrying grid charges by simply inserting Eq. (39) in place of in Eq. (59). The result is
The matrix σk,l is such that and the grid charges qi are
The transfer function is given by Eq. (53) for SinhGS or by
for midtown splines. The splines that are useful in MD are presented in the Appendix. For a periodic system, we can choose the kernel to be given by either Eq. (31) or by a spectral version10 of Eq. (31) obtained by transporting the kernel into the reciprocal space and restricting the summation over the reciprocal vectors to the first Nyquist zone. We provide more details below.
In practice, the calculations are organized as follows: (i) the grid charges qi are prepared by way of the assignment formula given by Eq. (62), (ii) the grid electrostatic potential
is evaluated, and, then, (iii) the electrostatic potential U(r) in the continuum is obtained by way of the interpolation formula,
The electrostatic potential U(r) is used to compute the force
acting on particle n and/or to evaluate the total energy
The evaluation of the on-grid electrostatic potential at step (ii) requires further clarifications for periodic systems. Defining x = r′ − r and the matrix Ω such that Ωn = n1a1 + n2a2 + n3a3, we have
The sum over p is a Fourier series, reflecting that the preceding sum defines a periodic function in x with unit cell Ω defined by the column vectors of Ω. The Fourier coefficients are
We can re-order the summation in Eq. (64) and introduce the influence function
to obtain
For the Ewald kernel, a similar arrangement holds except that the influence function becomes
The sum over i in Eq. (71) is finite: It extends only over those i that are within a distance ∥i − s∥p of a point s = Δ−1r with r in the unit cell Ω. This statement follows from Eq. (62). We now request that the scaled unit box is a rectangular prism of integer side lengths, that is,
for some positive integers N1, N2, and N3. Periodicity arguments allow the charges qi to be replaced by the set obtained by coalescing all charges qi from images outside Ω onto the unique equivalent in Ω. The sum inside the parentheses of Eq. (71) becomes a discrete Fourier transform to be handled by FFT. After multiplication by the influence function Ψ(p), the data can be transformed back using inverse FFT, provided that the exterior sum in Eq. (71) is restricted to the first Nyquist zone. The restriction could have been introduced explicitly in the last sum of Eq. (68) and is equivalent to using the kernel
for computing the on-grid interactions.
We now quantify the intrinsic errors in the u-series approximation using parameters we have previously found to be appropriate for biomolecular MD simulations. This will later allow us to compare the magnitude of these intrinsic errors—obtained, in practice, by performing the mesh computation with a fine mesh and high-order QuadS—to the errors that arise when more efficient but less accurate mesh computation is used, facilitating the selection of mesh computations with an appropriate balance of accuracy and efficiency. The two parameterizations we consider are as follows: (i) a C1-continuous parameterization with b = 2 and rc/σ ≈ 1.989 and (ii) a C2-continuous parameterization with b ≈ 1.630 and rc/σ ≈ 2.752. (Parameter values specified to more than 3 digits are available in Ref. 11.) The parameterization with b = 2 is computationally more efficient and recommended for most simulations. The parameterization with b ≈ 1.630 is useful when more accurate simulations are needed. There exist C2-continuous parameterizations exceeding any desired accuracy. They are not, however, expected to be useful in MD simulations, so we did not include them here.
Figure 8 displays the error in the electrostatic energy that is intrinsic to the u-series method, as measured for the arginine dipeptide system. These errors decay as for the C2 parameterization and as for the C1 parameterization. For comparison, we also display the errors due to the truncation of the van der Waals interactions. Even though we include a dispersion correction that corrects for the truncation in an average sense (reducing the error by three orders of magnitude for this system), the remaining errors, which decay as fast as with increasing rc, are overall larger than the errors in the electrostatic energies for cutoffs in the range we studied (rc ∈ 7 Å–16 Å).
Intrinsic errors for two parameterizations of the u-series decomposition applied to the arginine dipeptide system. The errors decrease exponentially as b approaches 1, but they also decrease inverse polynomially with increasing rc. Also plotted as a reference is the error caused by the truncation of the van der Waals interactions, with the dispersion correction included.
Intrinsic errors for two parameterizations of the u-series decomposition applied to the arginine dipeptide system. The errors decrease exponentially as b approaches 1, but they also decrease inverse polynomially with increasing rc. Also plotted as a reference is the error caused by the truncation of the van der Waals interactions, with the dispersion correction included.
We now empirically determine the maximum mesh spacing below which the grid discretization error is smaller than the intrinsic error. The result depends on the decomposition scheme (here, the u-series decomposition), the charge assignment rule (here, CubeS or SinhGS), the grid-to-grid kernel (here, the Fourier spectral kernel), and the class of physical problems (here, condensed-phase systems in aqueous solutions near room temperature). Scaling arguments (for a given u-series parameterization σ∝rc and for a given Midtown spline ) suggest that the maximum Δ will be proportional to rc, so we will empirically determine the maximum allowable values for the ratio ϕ = Δ/rc. The maximum values of ϕ were indeed observed to depend little with rc in the range 7 Å–16 Å.
Table III lists the maximum allowable values of ϕ recommended for several QuadS and CubeS2 of orders 4 and 6. These values have been determined so that the discretization errors are smaller than the intrinsic errors on the entire range 7 Å–16 Å of cutoff radii considered. Graphs similar to Fig. 9 have been determined for several systems thought to be representative of biological systems, and the most conservative values for ϕmax have been retained. A separate study has been carried out for the u-series with b ≈ 1.630. For SinhGS with ν = 3, owing to the non-decaying asymptotic error in the limit Δ → 0, there is no value of ϕ for which the error decays below the intrinsic error of the u-series with b ≈ 1.630. We must use SinhGS with ν = 4 to reach the accuracy afforded by the C2-continuous parameterization of the u-series decomposition. (For SinhGS, the parameter ν is continuous, and it is just by coincidence that the optimal values were found to be nearly integers.) The other results are also consistent with the theoretical analysis. For the same order, CubeS2 are slightly less accurate than QuadS, and this is reflected in their smaller values of ϕmax.
Maximum allowable values of ϕ = Δ/rc recommended for the u-series method when applied to charged molecules in aqueous solution near room temperature and pressure.
ϕmax | |||
Method | Nr nodes | b = 2 | b ≈ 1.630 |
CubeS2, ν = 2 | 32 | 0.23 | 0.065 |
QuadS, ν = 2 | 64 | 0.26 | 0.075 |
CubeS2, ν = 3 | 88 | 0.35 | 0.160 |
QuadS, ν = 3 | 216 | 0.43 | 0.180 |
SinhGS, ν = 3 | 113 | 0.43 | … |
SinhGS, ν = 4 | 268 | … | 0.190 |
ϕmax | |||
Method | Nr nodes | b = 2 | b ≈ 1.630 |
CubeS2, ν = 2 | 32 | 0.23 | 0.065 |
QuadS, ν = 2 | 64 | 0.26 | 0.075 |
CubeS2, ν = 3 | 88 | 0.35 | 0.160 |
QuadS, ν = 3 | 216 | 0.43 | 0.180 |
SinhGS, ν = 3 | 113 | 0.43 | … |
SinhGS, ν = 4 | 268 | … | 0.190 |
Errors in the electrostatic energy of the arginine dipeptide system. The errors result from a combined effect of the intrinsic error of the u-series decomposition (with b = 2) and the errors due to grid discretization. The various grid assignment rules have used the maximum values of φ recommended in Table III. Also listed are the number of grid nodes to which a single charge is spread. The corresponding values of ν are listed in Table III.
Errors in the electrostatic energy of the arginine dipeptide system. The errors result from a combined effect of the intrinsic error of the u-series decomposition (with b = 2) and the errors due to grid discretization. The various grid assignment rules have used the maximum values of φ recommended in Table III. Also listed are the number of grid nodes to which a single charge is spread. The corresponding values of ν are listed in Table III.
Table III facilitates the specification of different parameterizations of comparable accuracy, which is an important step toward determining the fastest method able to achieve a desired level of accuracy on a given computational platform. As an illustrative example, Fig. 10 shows the results of a study for the u-series with b ≈ 1.630 using Desmond/GPU on an RTX 2080 GPU. All systems were cubic water boxes at experimental densities with interactions given by the TIP3P water model. The distant calculation was executed at each time step, and a cutoff radius of 9 Å was used for both electrostatics and van der Waals interactions. The grid sizes were chosen according to equation Δ = ϕmaxrc. We found that the optimal transfer functions for this level of accuracy is provided by CubeS2 of order 6, which spreads to 88 grid nodes. The gain in overall simulation speed of about 40% when measured against QuadS of order 6 is due to the more compact spreading support, which more than compensates for the moderate increase in the FFT sizes. The high accuracy demands of b ≈ 1.630 are difficult to meet with midtown splines of order 4, the usage of which leads to substantial increases in the FFT sizes and consequently large slowdowns. A separate study for b = 2 has shown that the midtown splines of order 4 are nevertheless faster at this lower accuracy. CubeS2 of order 4 were found to lead to overall simulation speeds about 10% faster than with QuadS of order 4.
Speed-up in the total simulation rate relative to simulations run with QuadS of order 6. The data are for cubic water boxes with increasing number of atoms. The distant electrostatics is evaluated using the u-series decomposition with b ≈ 1.630.
Speed-up in the total simulation rate relative to simulations run with QuadS of order 6. The data are for cubic water boxes with increasing number of atoms. The distant electrostatics is evaluated using the u-series decomposition with b ≈ 1.630.
VII. SUMMARY AND CONCLUSIONS
We have introduced the midtown splines defined as a set of cubature Newton–Cotes weights varying continuously with the location of a charge carrier. The splines have been developed for Gaussian kernels, which have immediate applications in chemical physics, but similar splines can be defined for other measures that possess reflection symmetry. The splines are equipped with assignment and interpolation formulas that allow sampling of smoothings on a grid given the charge densities and, conversely, continuous reconstruction of smoothings given charge samples on a grid. In the present work, the mappings provided a way to perform the charge assignment (spreading) and the electrostatic potential interpolation steps in particle–mesh algorithms for electrostatics computations. These mappings may find other uses, for example, to switch between grids of different resolutions, as is done in the multilevel summation method,8 or as an optimized approximation to an ideal Gaussian filter in signal processing.
The one-dimensional splines, the quadrature splines, have been characterized completely and uniquely. Their mathematical properties are highly interesting because optimality considerations for the assignment view simultaneously result in maximal smoothness of the interpolation view. Each accuracy order of the form 4k + p with p ∈ {1, 2, 3, 4} results in optimal quadrature splines of continuity Cp−2. The splines of order 4k + 2 are particularly recommended for MD applications, although all continuous splines are usable. The splines of orders 4k optimally implement path integrals on a grid.
In three dimensions, cubature splines can be built as products of quadrature splines by exploiting the factorization of Gaussians. This construction, which can be named either QuadS or CubeS∞, is similar to the way the cardinal B-splines are used in SPME. The construction is not optimal, absent a request for separable transfer functions, and more economical cubature splines of given order can be built. We described the Manhattan and Euclidean splines, denoted by CubeS1 and CubeS2, respectively, and completely characterized the CubeS2 of order 4 and 6. These cubature splines spread any given charge to a number of 32 and 88 neighboring grid nodes, respectively, which compare favorably to the 64 or 216 nodes needed by both QuadS and B-splines of the same order. Measured against QuadS, CubeS2 require slightly more operations per assignment and larger grid sizes, but fewer memory transactions. Speed measurements using Desmond/GPU have shown that the cubature splines outperform their QuadS counterparts on single-GPU platforms.
The spline-based transfer functions are less useful for Anton or other platforms possessing dedicated hardware for the fast evaluation of pairwise interactions. Informed by the asymptotic properties of QuadS, we have developed a variation of GSE called SinhGS, which is both simpler and more accurate than the original method. It is simpler especially in its use because the exact relationship between the standard deviation of the spreading Gaussian and the mesh size, namely, , eliminates the reliance on an ad hoc optimizer and allows for simpler virial expressions obtained by formal differentiation. SinhGS has better accuracy and smaller energy drift than GSE.
SUPPLEMENTARY MATERIAL
See the supplementary material for the following appendixes and table: Appendix B: Proof of the reflection symmetry, Appendix C: Proofs of Propositions 1 and 2, Appendix D: Proof of Proposition 5, Appendix E: Path integrals in imaginary time on the grid, Appendix F: Quadrature splines of odd order, and Table IV: The optimal parameters ξ○ for the first few quadrature splines of odd order.
ACKNOWLEDGMENTS
The authors thank Michael Eastwood for helpful discussions and Berkman Frank for editorial assistance.
This study was conducted and funded internally by D. E. Shaw Research, of which D.E.S. is the sole beneficial owner and Chief Scientist and with which all the authors are affiliated.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.
APPENDIX: RECOMMENDED MIDTOWN SPLINES FOR MOLECULAR DYNAMICS SIMULATIONS
Explicit formulas for those midtown splines that we recommend for molecular dynamics simulations are provided here. These are the CubeS2 splines of order 4 and 6. We first provide explicit formulas for the one dimensional midtown splines (QuadS) of order 4 and 6; by means of an outer product, the QuadS functions can be used to construct the optimal separable transfer functions in three dimensions. We note that practical advice on how to determine an appropriate grid spacing is given in Sec. VI.
The one-dimensional midtown splines (QuadS) of order 2ν have the form
Here, and θ(s) represent the integer and fractional parts of s, respectively. The optimal values for ξ are denoted by ξ○ and presented in Table I. QuadS possess the translational property and the reflection symmetry . We only report the quadrature weights ci(θ, ξ) with 1 ≤ i ≤ ν because those with −ν < i ≤ 0 can be obtained from the reflection symmetry or equivalently from Eq. (14).
For order 2ν = 4, the quadrature weights are
For order 2ν = 6, the quadrature weights are
Many-dimensional midtown splines are generically called CubeS. They have the form
which generalizes Eq. (A1). Here, and θ(s) are the component-wise extensions of s and θ(s) to many dimensions. The splines possess the properties and . Here, s† denotes the unique point of positive coordinates that is obtained from s through a sequence of reflections with respect to the main planes of the coordinate system. Additionally, is invariant under any permutation of the components of s. The domain D represents the support of for any s ∈ (0, 1)d, where d is the number of dimensions. In three dimensions, it is invariant under any of the eight reflections,
and any permutation of the coordinates of i. D is thus completely specified by the list of nodes i ∈ D with i1 ≥ i2 ≥ i3 ≥ 1, which bear explicit labels in Figs. 5 and 11. Due to these symmetries or equivalently Eqs. (49) and (50), it suffices to report the cubature weights ci(θ, ξ) for these nodes only.
The largest domain is D∞ = [(1 − ν)⋯ν]d, which is the domain for outer products of QuadS,
These functions, also called CubeS∞, are optimal for the cubic domain, D∞. D∞ is, however, part of a class of domains Dp defined by Eq. (47), with the associated midtown splines denoted by CubeSp. In three dimensions, we have found CubeS2 of order 4 with the support size |D2| = 32 and of order 6 with the support size |D2| = 88, which we recommend for use in molecular dynamics simulations.
For order 4, we first define the auxiliary functions
Then,
The value of ξ recommended for electrostatics calculations is . In applications where positive weights are required, they can be obtained by using .
For order 6, we define the additional auxiliary functions
Then,
The optimal value of ξ is ξ○ = 0.650 399 8…. (With this choice, the cubature weights are positive).
We note that various parallelization strategies are possible. Desmond/GPU uses an outer loop to iterate over all charges in the system. For each such charge, an inner loop iterates over all grid points to which portions of the charge are assigned. If the charge is located at s, the charge is transferred to all grid points i such that . Equation (A4) is used directly as given. Another possible parallelization strategy is to iterate over all grid points in the outer loop. Then, for each such grid point i, an inner loop iterates over all particles that can transfer a non-zero charge to the grid point. In other words, for each i, we need to list all charges located in the support of the interpolation view . To do this, it may be computationally advantageous to prefilter the charge positions s by using only if ∥i − s∥p < ν. (The Lp norm ∥·∥p is fast to compute.)