Transferring particle charges to and from a grid plays a central role in the particle–mesh algorithms widely used to evaluate the electrostatic energy in molecular dynamics (MD) simulations. The computational cost of this transfer process represents a substantial part of the overall time required for simulation and is primarily determined by the size of the support (the set of grid nodes at which the transfer function is evaluated). The accuracy of the resulting approximation depends on the form of the transfer function, of which several have been proposed, as well as the size and shape of its support. Here, we show how to derive the transfer function that yields maximal asymptotic accuracy for a given support in the limit of fine grid resolution, finding that all such functions are splines, and we determine these functions (which we refer to as *midtown splines*) for a variety of choices of support to find optimally efficient transfer functions at accuracy levels relevant to MD simulations. We describe midtown splines that achieve fourth- and sixth-order accuracy in the grid spacing while requiring a support size of 32 and 88 grid nodes, respectively, compared to the 64 and 216 nodes required by the most widely used transfer functions (B-splines). At accuracy levels typically used in MD simulations, the use of midtown splines thus cuts the time required for charge spreading by roughly a factor of two.

## I. INTRODUCTION

The inclusion of electrostatic interactions improves the accuracy of modern molecular models, but the long-range nature of these interactions makes them prohibitively expensive to evaluate directly by pairwise summation. A commonly used, more efficient alternative is splitting the interactions into short-range and long-range components (by means of Ewald splitting^{1–3} or the *u*-series decomposition^{4}). The short-range interactions, which include the Coulomb singularity, are effectively range-limited and are evaluated by direct pairwise summation.^{5} The long-range interactions are typically first mapped to a rectangular grid, where they are evaluated (a process that often involves transforming to Fourier space, where the long-range components become range-limited and straightforward to compute), and then energies and/or forces are interpolated back to the continuum.^{6–10} The accuracy and efficiency of the continuum-to-grid mapping (a procedure also called “charge assignment” or “charge spreading”) and its reverse are important factors in determining the performance of molecular dynamics (MD) simulations. Although our focus is on MD, we note that continuum-to-grid mapping is a problem encountered in other applications, such as the evaluation of quantum effects by path integral calculations,^{11} smoothing in medical image processing,^{12} and particle hydrodynamics calculations.^{13}

Continuum-to-grid mapping consists of approximating a charge normally located at a point **r** in the continuum by a number *n* of charges located on the nearest nodes of a regular grid (with spacing Δ). For simplicity, consider a one-dimensional case: A charge at point *x* is spread to all grid nodes *i*Δ within a radius *r*_{s} = (*n*/2)Δ of *x*. The amount of charge assigned to *i*Δ (more generally referred to as the “weight” assigned to that grid node) is a function of the relative position *x* − *i*Δ. Due to its functional dependence on *x*, we term this weight the *transfer function*; the set of points where the transfer function is non-zero is called its support. Because the size of the support is finite and the mesh size Δ is non-zero, the original point charge is only approximately represented by the weights; there are thus inherent errors in energies evaluated using the continuum-to-grid mapping. These errors can be reduced by increasing the support size, but at increased computational cost. Alternatively, for a given support size, one can try to reduce the errors—and thus obtain a better trade-off between accuracy and computational cost—by judicious choice of the transfer function*.*

The first transfer function to see widespread use in MD simulations was the triangular-shaped cloud (TSC), which spreads a charge to the nearest three grid nodes in the one-dimensional case and 27 grid nodes in the three-dimensional case. TSC is used in the context of the particle–particle particle–mesh (P3M) method of Hockney and Eastwood,^{6} who have demonstrated that the error in the total energy is *O*(Δ^{2}) in the limit Δ → 0. A widely used related method known as smooth particle mesh Ewald (SPME)^{7} uses cardinal B-splines for charge assignment to increase the convergence order beyond 2: For a support size of *n* in one dimension, the convergence order is also *n*. In three dimensions, one B-spline is used for each dimension to form an outer product (i.e., a separable function) that spreads a point charge to *n*^{3} grid nodes. (B-splines have more recently been replaced by Kaiser–Bessel functions^{14,15} and further optimized to dedicated Hermite splines.^{16}) The simplest transfer function is perhaps the truncated Gaussian used in Gaussian split Ewald (GSE).^{9} Although less accurate for a given support size, this function is used on the specialized supercomputer Anton,^{17} which takes advantage of the function’s simple form to evaluate it rapidly.

In this work, we show how to derive the optimal transfer functions (i.e., the sets of weights that minimize the error in the limit Δ → 0) for a given support, finding that these functions are splines in the position **r**. These optimal functions, which we term *midtown splines,* are continuous and piecewise smooth in the argument **r** for all even integers *n* and are thus suitable for MD, where differentiability is needed for the evaluation of the forces. We first derive the optimal transfer functions for one-dimensional systems and do so in two steps. In the first step, we address the related more general problem of evaluating the Gaussian smoothing of a function that is known only on a grid. In the continuum, Gaussian smoothing is an integral of the product between the function and a Gaussian of mean *x*. On a grid, this integral is approximated by a finite weighted sum known as a “quadrature rule” (see Refs. 18 and 19). The effect of the Gaussian that multiplies the function is captured by way of *x*-dependent quadrature weights, which, upon minimization of the error, are found to be splines in *x*: the *midtown quadrature splines* (QuadS). In the second step, QuadS are shown to minimize the asymptotic error in the electrostatic potential energy and are thus demonstrated to be the optimal transfer functions for evaluating long-range interactions in one dimension. Similar to B-splines, QuadS can be extended to three dimensions by way of an outer product, with the error being *O*(Δ^{n}) in the limit Δ → 0, but with the minimum possible proportionality constant among all separable transfer functions.

Separable spreading functions, being products of one-dimensional functions, are restricted to a cubic support, but further optimization is possible by considering non-separable spreading functions, which do not have this restriction on the shape of the support. Optimizing directly for non-cubic supports in higher dimensions entails higher-dimensional integration, which requires the determination of what is known as a “cubature rule.” The optimal weights are again found to be splines, which we call the *midtown cubature splines* (CubeS). CubeS achieve the same order of accuracy as QuadS but with smaller support sizes: Examples include one CubeS transfer function of accuracy order 4 with a support size of 32 grid nodes (compared to 64 nodes for QuadS) and two functions of order 6 with a support size of 80 or 88 nodes (compared to 216 nodes for QuadS). This makes CubeS about two times faster for charge spreading than QuadS—and thus all separable transfer functions, such as the widely used B-splines—at accuracy levels typically used in MD simulations. (We note that CubeS is slightly less accurate than QuadS for a given grid spacing Δ at the same order of convergence, but this is typically not important in practice. On most computational architectures, the spreading component dominates the cost of computing the long-range electrostatics. The savings from the smaller support size of CubeS is thus much larger than the marginal increase in the computational cost that results from the decrease in the grid spacing for CubeS to yield the same accuracy as QuadS.)

We provide practical guidance on how to determine the optimal computational setup for MD simulations using midtown splines in combination with the *u*-series decomposition (an alternative to the Ewald decomposition that is more computationally efficient). Using this setup, we found that simulations of water boxes on graphics processing units (GPUs) with CubeS and a support size of 88 are 30%–40% faster than those using QuadS with a support size of 216. These results illustrate the potential practical performance benefits of using CubeS in MD simulations. Finally, we note that on certain computational architectures—including on Anton—it is much more efficient to use a transfer function that is a function of the absolute distance to the grid nodes in three dimensions, such as the truncated Gaussian of the GSE algorithm. Although midtown splines are thus not directly applicable to Anton, their development yielded insight that allowed us to develop an improved version of the GSE algorithm, which we call *hyperbolic sine Gaussian split* (SinhGS), that can be used on such architectures.

## II. MIDTOWN QUADRATURE SPLINES

The midtown quadrature splines (QuadS) are introduced in this section as an optimization result for the problem of approximately representing and manipulating one-dimensional Gaussian smoothings by means of a grid $xi\u2261i\Delta :i\u2208Z$ of mesh size Δ. It is only in Sec. III that we demonstrate that the same splines also optimally solve the problem of one-dimensional continuum-to-grid charge transfer. This separation in which we first consider a more general optimization problem (defined more precisely below) before showing that the resulting splines are also optimal for charge spreading makes it clearer that the domain of applicability of the splines extends beyond charge spreading (to path integrals and image filter design), even though for concreteness, we retain the language of charge spreading below. To skip the details of the optimization problem, continue with Sec. VI, which gives practical guidance for most effectively using midtown splines as transfer functions to compute the electrostatics in MD simulations. Explicit formulas for midtown splines of orders 4 and 6 are provided in the Appendix, and formulas for other orders can be computed with the Mathematica code provided in the supplementary material.

The Gaussian smoothing *f*_{σ}(*x*) of an arbitrary function *f*(*x*) is defined by

with $G\sigma x,x\u2032$ denoting the normalized Gaussian kernel,

In practice, the smoothing can rarely be performed exactly because *f*(*x*) is almost always known only for the grid points *x*_{i}, whether by necessity (an apparatus sampling a signal may only allow for finite resolution) or by design (to speed up computations by storing and processing less information). The integral in Eq. (1) must then be replaced by a sum that only uses the samples *f*(*x*_{i}). An initial candidate is provided by the Riemann sum,

where we introduce the scaled width *ξ* = *σ*/Δ and position *s* = *x*/Δ. The sum in Eq. (3) is actually an infinite series because the support of the Gaussian kernel is unbounded. Computationally, we require a finite sum, and to obtain it, we must replace $G\xi s,i$ with an approximation *S*_{ξ}(*s*, *i*) that vanishes whenever |*s* − *i*| exceeds a certain finite bound *ν*. Given the fact that *i* is an integer, we restrict the bound *ν* to integers. The smoothing *f*_{σ}(*x*) is approximately recovered by the *interpolation formula* (now effectively a finite sum),

For each $x\u2208R$, the values $S\sigma /\Delta x/\Delta ,i:i\u2208Z$ represent a set of weights that, together with the nodes *x*_{i}, define a quadrature rule. The associated quadrature sum [i.e., the sum in the right-hand side (rhs) of Eq. (4)] is meant to approximate the continuous integral in Eq. (1), and as a first optimization problem, we could attempt to minimize the error of this approximation. Before doing so, let us point out the existence of a different way of approximating *f*_{σ}(*x*). This approach is given by what we call the *assignment formula*,

which applies even when *f*(*x*) is not continuous (e.g., if it is a Dirac delta localized between the grid nodes), but only computes the values of *f*_{σ}(*x*) on the grid. To make sense of Eq. (5), revisit Eq. (1) and simply note that $G\sigma x,xi$ has been replaced with the transfer function $\Delta \u22121S\xi x/\Delta ,i$.

At this point, a natural question is which of the two formulas is to be optimized. It is helpful to anticipate the results: we will demonstrate that the kernel *S*_{ξ}(*s*, *i*) that optimizes the assignment formula to a given order of accuracy simultaneously optimizes the interpolation formula to the same order of accuracy. In other words, the choice of objective function to be optimized is a false dilemma, but this will become clear only after several propositions are first demonstrated. The optimization problem will be stated and solved at the end of this section in Proposition 4. We will additionally find that the optimal transfer functions $\Delta \u22121S\xi x/\Delta ,i$ are at least continuous and piecewise differentiable in the argument *x*.

It is important to note that, although the variables *x* and *x*′ in $G\sigma x,x\u2032$ are interchangeable, they perform different roles in Eq. (1): *x*′ is an integration variable, whereas *x* is an interpolation variable. The difference is more conspicuous in $S\xi s,i$, since the variable *i* is now discrete. We distinguish between the discrete representation $S\xi s,\u22c5$, which we call the *assignment view*, and the continuous representation $S\xi \u22c5,i$, which we call the *interpolation view*. The interpolation and assignment views are named so because they showcase the action of the interpolation and assignment formulas, respectively, on singular charges. We obtain the assignment view by applying the assignment formula to a unit charge located at *x*. Since the charge density is then the Dirac function *δ*(*x*′ − *x*), we calculate

To obtain the interpolation view, we apply the interpolation formula to a unit charge located on the grid at node *x*_{i} to obtain

with *δ*_{ij} denoting the Kronecker symbol and modeling the singular charge distribution on the grid.

We now provide the details needed to both define the optimization problem and identify the optimal $S\xi s,i$. We will start by requiring that the interpolation formula is exact for all polynomials *f*(*x*) of degree strictly less than 2*ν*. This requirement defines the kernel $S\xi s,i$, which turns out to have the form $S\xi s\u2212i,0$, with $S\xi s,0$ being a spline that is uniquely determined except for the value of the nonlinear parameter *ξ*. We then show that with this kernel, the assignment formula is also exact for all polynomials of order strictly less than 2*ν*. The optimal value of the non-linear parameter *ξ* is determined, predictably, by minimizing the error in the moment of order 2*ν* for the assignment formula.

### A. Accuracy of the interpolation formula

To ensure an order of accuracy equal to 2*ν*, we will define the assignment view $S\xi x/\Delta ,\u22c5$ such that the interpolation formula is exact for all functions *f*(*x*) that are polynomials of degree strictly less than 2*ν*. Put differently, we require the assignment view to represent the weights for a Newton–Cotes quadrature rule of order 2*ν*. As noted in Ref. 18, such a rule might be better said to be of algebraic degree 2*ν* − 1. Nevertheless, it is also noted in Ref. 18 that the rule invariably leads to error terms with asymptotic decay of the form *const* · Δ^{2ν} in the limit Δ → 0 for expressions in which the exact integral is replaced by the quadrature sum. Expressions with error decay of the form *const* · Δ^{2ν} in the limit Δ → 0 are said to be of accuracy order 2*ν*, and the error is said to be *O*(Δ^{2ν}) as Δ → 0. The quadrature rule itself is then rightfully said to be of accuracy order 2*ν*, a nomenclature that is more widespread than the concept of algebraic degree and that we adopt here to characterize the accuracy of both the interpolation and assignment formulas. We only consider even orders of accuracy, corresponding to integer values for *ν*, but provide a short discussion on the properties of midtown splines of odd orders in Appendix F. The nodes of the quadrature rule are the grid nodes *x*_{i} within a distance *ν*Δ of *x* because, as already mentioned, the assignment view is set equal to 0 outside this range.

When dealing with quadrature rules, it appeared to be easier to work in terms of the pair ($\u230as\u230b$, *θ*) instead of just *s*, with $\u230as\u230b$ and *θ*(*s*) = *s* − $\u230as\u230b$ denoting the integer and fractional parts of *s*, respectively. *θ*(*s*) takes values in the interval [0, 1). The decomposition *s* = $\u230as\u230b$ + *θ*(*s*) establishes a one-to-one correspondence between *s* and the pair ($\u230as\u230b$, *θ*). Owing to this bijection, we can drop the explicit dependence of *θ*(*s*) on *s* and think of *θ* as a variable taking values in the range [0, 1). We will do so most of the time, except when readability can be improved by explicitly specifying the dependence *θ*(*s*).

The support of the assignment view $S\xi s,\u22c5$ is made up of the grid nodes in the set {$x\u230as\u230b\u2212\nu +1$, $x\u230as\u230b\u2212\nu +2$, …, $x\u230as\u230b+\nu $}. They are indicated by the shaded ellipses in Fig. 1. Let us denote the quadrature weight associated with the node $x\u230as\u230b+i$ as *c*_{i}(*θ*, *ξ*) for −*ν* < *i* ≤ *ν*. We consider the following approximation to *f*_{σ}(*x*) defined as a standard Newton–Cotes quadrature sum,

Comparison with Eq. (4) shows that

The interpolation formula becomes

As we have already mentioned, the quadrature weights *c*_{i}(*θ*, *ξ*) are obtained by demanding that the Newton–Cotes rule integrate exactly all polynomials of degree at most 2*ν* − 1. That is, we require that

for some $a\u2208R$ and for all 0 ≤ *k* < 2*ν*. The value of *a* is immaterial because the set of polynomials in *x*′ spanned by the monomials {(*x*′ − *a*)^{k}; 0 ≤ *k* < 2*ν*} is independent of *a*. Picking *a* = ($s$ + 1/2)Δ for convenience of calculation, we deduce that the weights *c*_{i}(*θ*, *ξ*) satisfy the linear system of equations,

for all 0 ≤ *k* < 2*ν*. In Eq. (12), *μ*_{k} is the *k*th moment of a standard normal variable,

with (*k* − 1)‼ denoting the product of every odd number from 1 to *k* − 1 if *k* ≥ 2 and 1 otherwise. For now, *ξ* = *σ*/Δ represents a free parameter, the value of which will be decided later. The system specified by Eq. (12) is uniquely determined because the matrix {(*i* − 1/2)^{k}; 0 ≤ *k* < 2*ν*, −*ν* < *i* ≤ *ν*} is invertible.

It is useful to extend the domain on which the weights *c*_{i}(*θ*, *ξ*) are defined from [0, 1) × (0, ∞) to [0, 1] × (0, ∞). The extension is done by taking lim_{θ↑1}*c*(*θ*, *ξ*) or, equivalently, by solving Eq. (12) for *θ* = 1. On the extended range, we have the symmetry property

The symmetry is demonstrated in Appendix B.

The linear system given by Eq. (12) always has a unique solution, and the user can find the explicit solutions for orders 4 and 6 in the Appendix. Solutions exist even when the mesh Δ is so large that it cannot accurately represent the features of a Gaussian of spread *σ* or, at the other end, when Δ is so small that the diameter of the support set is shorter than the distances on which the Gaussian density is important. Clearly then, *ξ* = *σ*/Δ cannot be arbitrary: Only values at or near an optimal ratio *ξ*_{○} are numerically useful. The value of the optimal ratio is established in Sec. II B.

### B. Accuracy of the assignment formula

Similar to how the assignment view controls the accuracy of the interpolation formula, the interpolation view controls the accuracy of the assignment formula. In this subsection, we demonstrate that, with the weights determined above, the assignment formula is also exact for all functions *f*(*x*) that are polynomials of degree 2*ν* − 1 or less. We additionally demonstrate that the interpolation view is a continuous spline and then proceed to determine the optimal value of *ξ* by minimizing the error of the assignment formula for a polynomial of order 2*ν*. It will be shown that the assignment formula becomes exact for polynomials of order 2*ν* and 2*ν* + 1 if *ν* is odd and the interpolation view becomes *C*^{2}-continuous if *ν* is even.

The interpolation view models the dependence of the amount of charge transferred to a grid node *x*_{i} = *i*Δ with the location of the point charge. Referring to Fig. 1 as a visual aid, imagine placing an observation apparatus at the grid node *x*_{i}, measuring the intensity of the quadrature weight. When the point charge is away from the node by more than *ν*Δ, the signal registered by the apparatus is 0. As the point charge continues to approach the node from the left, the signal recorded by the apparatus becomes *c*_{ν}(*θ*, *ξ*), with *θ* increasing from 0 to 1 as *x* − *x*_{i} increases from −*ν*Δ to (1 − *ν*)Δ. When *x* − *x*_{i} becomes exactly (1 − *ν*)Δ, the signal is no longer given by *c*_{ν}(1, *ξ*), as one would infer by continuity from the left, but by *c*_{ν−1}(0, *ξ*). Unless the two values are equal, the signal is discontinuous. The signal is then given by the weight *c*_{ν−1}(*θ*, *ξ*) as long as *x* − *x*_{i} lies between (1 − *ν*)Δ and (2 − *ν*)Δ. Then, the measured weight changes to *c*_{ν−2}(*θ*, *ξ*), and so on, until the entire set of weights {*c*_{j}(*θ*, *ξ*); −*ν* < *j* ≤ *ν*} is exhausted. Once *x* − *x*_{i} exceeds *ν*Δ, the apparatus registers zero because the quadrature rule has compact support.

The signal intensity measured by the apparatus is given by the *x*-dependence of the transfer function $\Delta \u22121S\xi x/\Delta ,i$ or, equivalently, by the *s*-dependence of $S\xi s,i$. The dependence is a piecewise polynomial, owing to the definition of $S\xi s,i$. As can be inferred from Eq. (9), the integers *j* with |*j* − *i*| ≤ *ν* may constitute points of discontinuity for the *s*-dependence of $S\xi s,i$ and its derivatives. $S\xi s,i$ often needs to be differentiable (in order to compute forces in MD simulations, for example), and in such cases, $S\xi s,i$ must be continuous at integer values of *s*; more precisely, we must have *c*_{ν}(0, *ξ*) = 0, *c*_{1−ν}(1, *ξ*) = 0, and *c*_{j+1}(1, *ξ*) = *c*_{j}(0, *ξ*) for −*ν* < *j* < *ν*. The following proposition shows that the dependence of $S\xi s,i$ with *s* is a continuous piecewise polynomial (a proper spline).

*Proposition 1.*

*The interpolation view can be decomposed as the sum between a continuous piecewise linear spline and a* C^{2}*-continuous spline. The piecewise linear component vanishes if* dc_{ν}(θ, ξ)/dθ = 0 *at* θ = 0*.*

The proof of the proposition is given in Appendix C. A graphical illustration for order 2*ν* = 6 is given in Fig. 2 for different values of *ξ*.

*θ*(

*s*−

*i*),

*ξ*) = $ci\u2212s$(

*θ*(

*s*),

*ξ*). The second equality is demonstrated for non-integer values of

*s*by first using the identities

*θ*(

*s*−

*i*) +

*θ*(

*i*−

*s*) = 1 and $i\u2212s$ + $s\u2212i$ = −1 to infer $c0\u2212s\u2212i$(

*θ*(

*s*−

*i*),

*ξ*) = $c1+i\u2212s$(1 −

*θ*(

*i*−

*s*),

*ξ*) and then using Eq. (14) to conclude $c1+i\u2212s$(1 −

*θ*(

*i*−

*s*),

*ξ*) = $c0\u2212i\u2212s$(

*θ*(

*i*−

*s*),

*ξ*). The equality is extended to integer values of

*s*by continuity of the interpolation view of the spline.

*s*∉ [

*i*−

*ν*,

*i*+

*ν*] or, equivalently,

*x*∉ [

*x*

_{i}−

*ν*Δ,

*x*

_{i}+

*ν*Δ]. The assignment formula defined by Eq. (5) is alternatively given by

*Proposition 2.*

*For each* 0 ≤ *k* < 2*ν, the k-th moment* $\u222bR(s\u2212i)kS\xi s,ids$ *equals the k-th central moment of a Gaussian of standard deviation ξ and mean* 0*.*

The proof is presented in Appendix C.

*ξ*

_{○}for the nonlinear parameter

*ξ*=

*σ*/Δ. Both Proposition 1 and Proposition 2 offer a potential candidate. We could try for the

*C*

^{2}-continuous parameterization, which is determined by solving the following equation:

*ν*be exactly reproduced (the moment of order 2

*ν*+ 1 is also reproduced by symmetry, since it is 0). That is, we try to solve the following equation:

*Proposition 3.*

*If*2*ν*= 4k + 2*for some integer*k*, there exists a unique*ξ*, to be denoted by*ξ_{○}*, such that the interpolation view matches two more additional central moments (the moments of order*2*ν and*2*ν*+ 1*). There exists no*C^{2}*-continuous parameterization.**If*2*ν*= 4k*for some integer*k*, there exists a unique*ξ*, to be denoted by*ξ_{○}*, such that the interpolation view is*C^{2}*continuous. There exists no parameterization for which the interpolation view matches the moment of order*2*ν.*

^{20}for all orders up to 2

*ν*= 40. The first few values of

*ξ*

_{○}are presented in Table I. The Mathematica investigations have also shown that the quadrature weights

*c*

_{i}(

*θ*,

*ξ*

_{○}) are non-negative and that the following optimality recharacterization of ξ

_{○}holds true.

Order 2ν
. | ξ_{○}
. | Continuity class . |
---|---|---|

4 | $1/3$ | C^{2} |

6 | 0.728 794 88… | C^{0} |

8 | 0.835 436 00… | C^{2} |

10 | 0.946 570 66… | C^{0} |

12 | 1.030 918 80… | C^{2} |

14 | 1.122 896 72… | C^{0} |

Order 2ν
. | ξ_{○}
. | Continuity class . |
---|---|---|

4 | $1/3$ | C^{2} |

6 | 0.728 794 88… | C^{0} |

8 | 0.835 436 00… | C^{2} |

10 | 0.946 570 66… | C^{0} |

12 | 1.030 918 80… | C^{2} |

14 | 1.122 896 72… | C^{0} |

*Proposition 4.*

*The value*ξ

_{○}

*of Proposition*3

*is alternatively obtainable by minimizing the absolute error in the moment of order*2

*ν for the interpolation view*,

*against*ξ.

For 2*ν* = 4*k* + 2, the error expressed by Eq. (19) vanishes when *ξ* = *ξ*_{○}. For 2*ν* = 4*k*, the error does not vanish but attains its non-zero minimum for the same *ξ*_{○} that produces the unique *C*^{2}-continuous parameterization. We *define* QuadS through $S\xi \u25cbs,0$ and use it by way of the formula $S\xi \u25cbs,i=S\xi \u25cbs\u2212i,0$.

*ν*→ ∞, with

*κ*≈ 0.427. Remember that

*ξ*

_{○}=

*σ*/Δ and

*ν*=

*r*

_{s}/Δ. As

*ν*increases, the mesh size Δ thus becomes progressively smaller and the truncation radius

*r*

_{s}progressively larger than the standard deviation

*σ*of the Gaussian kernel. Consequently, the Riemann sum appearing in the rhs of Eq. (3) accurately approximates

*f*

_{σ}(

*x*) in the limit

*ν*→ ∞ even when the sum is truncated to only include the grid nodes

*x*

_{i}within a distance

*r*

_{s}of

*x*. The quadrature sum expressed by the rhs of Eq. (4) also becomes increasingly accurate in the limit

*ν*→ ∞ because the quadrature is exact for all polynomials of degree at most 2

*ν*− 1. Since

*f*(

*x*) is arbitrary, the quadrature weights for the two techniques must thus converge to each other as

*ν*→ ∞. To avoid clutter, our notation withholds the explicit dependence of

*ξ*

_{○}and $S\xi s,j$ with

*ν*but, after taking note of it, the expected convergence is

## III. APPLICATIONS OF THE MIDTOWN QUADRATURE SPLINES TO ELECTROSTATICS

In this section, we show that QuadS are the optimal transfer functions to use when computing the long-range component of the electrostatic energy in one dimension (also in higher dimensions among separable transfer functions). As mentioned in the Introduction, the optimal transfer functions in higher dimensions are non-separable, and these CubeS functions are described in Sec. IV; practical guidance on choosing appropriate parameters for both QuadS and CubeS electrostatic calculations is given in Sec. VI.

For electrostatics calculations, the function *f*(*x*) represents a one-dimensional charge distribution density vanishing outside a bounded interval. The long-range component of the electrostatic energy is given by the formula

with $K\sigma x,x\u2032$ representing the long-range interaction kernel between two unit charges located at *x* and *x*′. We consider $K\sigma x,x\u2032$ to be a linear superposition of Gaussian kernels $G\sigma \u2032x,x\u2032$, with the standard deviation *σ*′ larger than or equal to a minimum value denoted by *σ*. Practically useful examples of such superpositions include the *u*-series and Ewald kernels for the long-range component of the electrostatics, the forms of which will be given later in this section. We mention from the outset that it is the narrowest Gaussian in the superposition, the one of width *σ*, that ultimately dictates the maximum mesh size Δ of a common grid used to handle all Gaussians in the superposition. We note here that due to linearity, there is no loss of generality in our arguments establishing the optimality of QuadS by restricting the class of interaction kernels to either $K\sigma x,x\u2032=G\sigma x,x\u2032$ for non-periodic charge distributions or

for periodic charge distributions with period *a*. This restriction is made solely to avoid carrying around an extra exterior integral or sum denoting the superposition.

The starting point for designing grid-based methods is the splitting

where $\sigma k=(\sigma 2\u22122\sigma s2)1/2$ ranges between 0 and *σ*. As a result, *σ*_{s} is allowed to vary between 0 and $\sigma /2$, and its determination for typical problems is discussed extensively in Sec. VI. The splitting is a direct application of the Gaussian convolution theorem, which states that the repeated convolution of Gaussians centered about the origin results in a Gaussian with variance equal to the sum of the variances of the convolving factors. If the integrals in the previous equation are replaced by the infinite Riemann sums,

then the interaction energy is approximated by

Although not directly usable in practice, Eq. (25) is suggestive of a wide class of implementable approximations for the interaction energy. Members of this class are obtained by modifying the on-grid interaction kernel $K\sigma kxi,xj$ and/or replacing the Gaussians $G\sigma sx,xi$ entering Eq. (24) with arbitrary transfer functions $T\sigma sx,xi$ vanishing for all |*x* − *x*_{i}| ≥ *r*_{s} > 0. Our optimization is over a subclass of models that keeps the on-grid interaction kernel unchanged and sets the spreading radii *r*_{s} in the form *ν*Δ, with fixed but arbitrary *ν*. We note here (partially anticipating the results of the optimization) that the error for the optimal model in the subclass decays only polynomially fast. It follows then that the optimal transfer functions $T\sigma sx,xi$ remain optimal for all models for which the on-grid interaction kernel takes the general form $K\u0303\sigma k(\Delta )x,x\u2032$, as long as it converges to $K\sigma kx,x\u2032$ in the limit Δ → 0 at a rate faster than any polynomial. Mimicking what many other P3M-inspired methods do, the model we ultimately parameterize in Sec. VI uses a related spectral kernel $K\u0303\sigma k(\Delta )x,x\u2032$ built by restricting the Fourier transform of $K\sigma kx,x\u2032$ to the first Nyquist zone, according to Eq. (74). This kernel converges to $K\sigma kx,x\u2032$ superexponentially in the limit Δ → 0 and is preferred in practice because it permits the use of the fast Fourier transform (FFT) algorithm.

With these clarifications in mind, we now set out to find the optimal $T\sigma sx,xi.$

To further simplify the analysis, we use the splitting

with $\sigma h=\sigma k/2$, to write the interaction energy as

A little thought shows that it is enough to optimize the rate of the convergence,

since $K\sigma h$ is either $G\sigma h$ or the periodic extension of it. We will refer to the lhs of Eq. (28) as the half-potential. The total energy is then the convolution of two such half-potentials.

The QuadS transfer function is $T\sigma sx,xi=\Delta \u22121S\sigma s/\Delta x/\Delta ,i$ and turns out to be the optimal transfer function if *σ*_{s} = ξ_{○}Δ, as established by the following proposition.

*Proposition 5.*

*For piecewise continuous charge distributions* f(x)*, the QuadS transfer function* $\Delta \u22121S\xi \u25cbx/\Delta ,i$ *is asymptotically optimal among all possible assignment kernels* $T\sigma sx,xi$ *that vanish for* |x − x_{i}| ≥ νΔ*.*

*O*(Δ

^{2ν}), as a consequence of Eq. (D4), which produces the following asymptotic upper bound for the error:

*ξ*

_{s}=

*σ*

_{s}/Δ. [We note that for piecewise-continuous charge distributions, the convergence may be better than

*O*(Δ

^{2ν}). Equation (D10) together with Proposition 4 and Proposition 3 (used in this order) reveals that the convergence is

*o*(Δ

^{2ν}) in the limit Δ → 0, if

*ν*is an odd number.]

**Σ**, let

**σ**be a 3 × 3 matrix such that

**Σ**=

**σσ**

^{⊤}. The interaction kernel $K\sigma r,r\u2032$ is either

**a**

_{1},

**a**

_{2},

**a**

_{3}. As a notational convenience, $K\sigma r,r\u2032$ with scalar

*σ*is defined as $K\sigma r,r\u2032$ with

**σ**=

*σI*

_{3}and

*I*

_{3}equal to the identity matrix in three dimensions.

**σ**

_{k}is such that $\sigma k\sigma k\u22a4=\sigma 2I3\u22122\sigma s\sigma s\u22a4$, a relationship that can be easily understood by remembering that the repeated convolution of multi-dimensional Gaussians results in a Gaussian with the covariance matrix equal to the sum of the covariances of the participating factors. The grid node

**r**

_{i}indexed by the vector

**i**≡ (

*i*

_{1},

*i*

_{2},

*i*

_{3})

^{⊤}is defined as

**Δ**to be the 3 × 3 matrix with the

*k*th column formed by the coordinates of the primitive vector

**b**

_{k}of the grid. Then,

*V*

_{p}= |det(

**Δ**)| is the volume of the primitive cell of the grid, and

**r**

_{i}=

**Δi**.

**σ**

_{s}and the grid mesh

**Δ**is. If we scale the integration coordinates in Eq. (32) by

**Δ**

^{−1}, we end up with the equivalent problem of spreading a Gaussian charge of covariance (

**Δ**

^{−1}

**σ**

_{s})(

**Δ**

^{−1}

**σ**

_{s})

^{⊤}onto a grid having the mesh

*I*

_{3}. Different from the one-dimensional case, the asymptotic error for the interaction energy is no longer a product between a spreading-dependent factor and a charge/kernel-dependent factor, as it appears in Eq. (D10). We desire a concept of optimality that is independent of the charge density

*f*(

**r**) though, and we interpret this desiderate to mean that the constraint

*ξ*

_{s}that is yet to be determined. Equation (36) asserts a relationship expected to hold for MD simulation in the liquid phase for which the typical charge distributions are fairly isotropic (as in a cubic box of water). On the grounds of isotropy, the optimal

**Δ**must be of the form

**Δ**= Δ

*I*

_{3}, with Eq. (36) then asserting that the optimal

**σ**

_{s}must also be an isotropic matrix (a multiple of the identity matrix).

**σ**

_{s}=

*ξ*

_{s}

**Δ**implied by Eq. (36), $det\Delta G\sigma s\Delta s,\Delta i$ is separable and can thus be approximated by the following outer product of QuadS:

*ξ*

_{○}listed in Table I. The interaction energy defined by Eq. (32) is approximated by the expression

**i**.

*u*-series kernel, which defines the long-range component of the electrostatics by way of the infinite sum

*a*

_{k}> 0 and base

*b*> 1. Another example is the popular Ewald kernel for the long-range electrostatics, which is the continuous superposition $2\pi \u222b\sigma 2\u221eGu1/2r,r\u2032du$. The optimality results implied by Proposition 5 extend to all such more general interaction kernels.

As noted above, there are likely to be other fruitful applications of QuadS in addition to evaluation of the long-range component of electrostatics on which we have focused here; we present another example application of QuadS (to path integrals) in Appendix E. The main result is Proposition 6, which states that QuadS of order 4*k* represent the optimal transfer functions for grid-to-grid assignment.

## IV. MIDTOWN CUBATURE SPLINES

QuadS are only the optimal functions in higher dimensions if we restrict ourselves to separable functions, which require cubic support shapes. In this section, we consider non-separable functions, which provide opportunities for further optimization by changing the shape of the support. (We note that the most useful ones of such functions for MD simulations are given explicitly in the Appendix, and practical guidance on their use in MD simulations is given in Sec. VI.)

It is instructive to first consider why a product of one-dimensional midtown splines does not result in an optimal cubature scheme absent a request for separability. The reason is that the central moments

exactly match those for a Gaussian of covariance matrix $\xi \u25cb2I3$, not only for all multi-indices **α** = (*α*_{1}, *α*_{2}, *α*_{3}) with |**α**| ≡ *α*_{1} + *α*_{2} + *α*_{3} < 2*ν* but also for many multi-indices with |**α**| ≥ 2*ν*. In *d* dimensions, the number of moments of degree smaller than 2*ν* (the bare minimum that needs to be matched) is equal to the number of weak compositions of 2*ν* − 1 in precisely *d* + 1 parts (the number of ways in which 2*ν* − 1 can be written as a sum of *d* + 1 non-negative integers). It is given by the formula

A *d*-dimensional integration scheme based on QuadS uses (2*ν*)^{d} nodes and matches (2*ν*)^{d} moments. In three dimensions, for large enough 2*ν*, QuadS matches 3! = 6 times more moments than the minimum necessary. For order 2*ν* = 6, the minimum number of moments is 56, yet the cubature based on QuadS matches 6 × 6 × 6 = 216 moments. Some of these moments, for example, the one of multi-index **α** = (5, 4, 3), have total order |**α**| = 12. Since most other moments of order 12 are not matched, the accuracy order remains 6. We will show that there exist cubature splines (CubeS) of order 2*ν* that use much fewer than (2*ν*)^{d} nodes. In effect, we formulate and study a cubature problem that would be typical if not for the additional requirement of continuity of the interpolation view.

As for QuadS, the CubeS assignment kernel is given by the rhs of Eq. (38), with the following *d*-dimensional extension of Eq. (9):

satisfying an analog of Eq. (15), namely,

The vector-valued functions $s$ and **θ**(**s**) are component-wise extensions of their one-dimensional counterparts. The support set *D* is equal to the integer interval [(1 − *ν*)⋯*ν*] for QuadS in one dimension and to [(1 − *ν*)⋯*ν*]^{d} for QuadS in *d* dimensions (defined as outer products of one-dimensional midtown splines). For CubeS, the support set *D* is a proper subset of [(1 − *ν*)⋯*ν*]^{d}. CubeS must integrate exactly all monomials of order less than 2*ν*, that is,

for all multi-indices **α** with |**α**| < 2*ν*. This system of equations is the multidimensional analog of Eq. (12). The system is overdetermined unless the number of cubature nodes is larger than the number of moments of total order less than 2*ν*. Equation (43) thus provides a lower bound on the minimum number of elements *D* must contain.

The selection of the proper cubature nodes is a rather delicate problem, made challenging by the large amount of interesting mathematical information encoded by a multidimensional Gaussian measure. We have studied two kinds of sets *D*, both defined by

for some norm ∥·∥_{p}. The two important norms for our study are the *L*^{1} or Manhattan norm ∥*x*∥_{1} = |*x*_{1}| + ⋯ + |*x*_{d}| and the *L*^{2} or Euclidean norm ∥*x*∥_{2} = (|*x*_{1}|^{2} + ⋯ + |*x*_{d}|^{2})^{1/2}. The *L*^{∞} or Chebyshev norm ∥*x*∥_{∞} = max{|*x*|_{1}, …, |*x*|_{d}} is also important, but this choice results precisely in (outer products of) QuadS. The nodes singled out by *D*_{p} are those within the *L*^{p} distance of *ν* − 1 from the unit cell [0, 1]^{d}, the place where the vectors **θ** live. We have *D*_{∞} = [1 − *ν*⋯*ν*]^{d} and *D*_{1} ⊆ *D*_{2} ⊆ *D*_{∞}. The inclusions follow from the inequalities ∥·∥_{∞} ≤ ∥·∥_{2} ≤ ∥·∥_{1}. The family of cubature splines of support *D*_{p} will be denoted by CubeS_{p}, and we note that CubeS_{∞} is alternatively denoted by QuadS.

Driven by practical considerations, we will set *d* = 3 for the remainder of this section and assess the relative numerical advantages of different CubeS_{p} schemes by simply examining the sizes and shapes of their supports. For large enough *ν*, the cardinality of *D*_{p}, denoted by |*D*_{p}|, is given roughly by the volume of the set. The ratios |*D*_{1}|/|*D*_{∞}| and |*D*_{2}|/|*D*_{∞}| approach 1/6 and *π*/6, respectively. |*D*_{1}| thus scales with *ν* in the same way as the lower bound given by Eq. (43), and it ought to result in cubature splines that have nearly minimal numbers of nodes. Although scaling less favorably, |*D*_{2}| is not much larger than |*D*_{1}| for the small orders 2*ν* that are relevant to practical applications, as shown in Table II. The support *D*_{2} (a solid sphere for large 2*ν*) more closely captures the rotational symmetry of the spreading Gaussian than either *D*_{∞} (a cube) or *D*_{1} (an octahedron). The support *D*_{∞} is the only one that captures the separability of the spreading Gaussian, which sometimes can be exploited computationally and may partially compensate for the larger support size.

Order 2ν . | Lower bound . | CubeS_{1}
. | CubeS_{2}
. | QuadS (CubeS_{∞})
. |
---|---|---|---|---|

4 | 20 | 32 | 32 | 64 |

6 | 56 | 80 | 88 | 216 |

8 | 120 | 160 | 232 | 512 |

10 | 220 | 280 | 432 | 1024 |

Order 2ν . | Lower bound . | CubeS_{1}
. | CubeS_{2}
. | QuadS (CubeS_{∞})
. |
---|---|---|---|---|

4 | 20 | 32 | 32 | 64 |

6 | 56 | 80 | 88 | 216 |

8 | 120 | 160 | 232 | 512 |

10 | 220 | 280 | 432 | 1024 |

If the cardinality of *D*_{p} is larger than the minimum number of moments necessary to achieve the convergence order 2*ν* (henceforth also referred to as the minimal list of moments), the linear system given by Eq. (46) is underdetermined. In order to make the system uniquely determined, we must enlarge the list of matched moments. For QuadS, we know that we must match all moments of multi-index **α** with ∥**α**∥_{∞} < 2*ν*. There are (2*ν*)^{3} such moments, and their total equals the cardinality of *D*_{∞}. This maximal list of moments also provides the candidate moments for CubeS_{1} and CubeS_{2} by the following prescription. The maximal list is first partially ordered as follows: We let the total order |**α**| increase from 0 to 3(2*ν* − 1). For each such |**α**|, we generate its integer partitions in at most 3 parts, with the partitions listed in reverse lexicographical order—e.g., (5, 0, 0) comes before (4, 1, 0). A permutation of such an integer partition defines a multi-index in the maximal list, and all elements of the maximal list are so enumerated. To summarize, in our ordering of the maximal list, multi-indices **α** with smaller |**α**| come first, integer partitions with same |**α**| are in the reverse lexicographical order, and permutations of the same integer partition are equivalent.

The minimal list of moments corresponds to all multi-indices from the maximal list that has a total order less than 2*ν*. Starting with the total order of 2*ν*, we provisionally add one group of equivalent permutations at a time, the group being taken in the order described in the previous paragraph. We form the linear system from Eq. (46) and try to solve it symbolically in Mathematica. If the system becomes inconsistent, the group of permutations is removed, and the next group is provisionally added. The process is stopped when the number of moments equals |*D*_{p}|. Of course, it is not trivial that the process stops before the maximal list is exhausted. We do not have a proof, but we have verified the success of the procedure for all three distances up to order 2*ν* = 10. We have additionally verified that the solutions are *C*^{0} continuous, which is again a non-trivial statement. Hence, at least for the cases presented in Table II, cubature splines exist.

We added entire groups of permutations to the list of matched moments so that the cubature rules possess some of the basic symmetries of the Gaussian measure they replace. All CubeS constructed above feature reflection and permutation symmetry. Imagine a coordinate system with the origin at the center of the unit cell [0, 1]^{3}, as depicted in Fig. 5. Each octant (±1, ±1, ±1) contains an equal number of cubature nodes, which can be obtained from the nodes of the first octant (+1, +1, +1) by a sequence of reflections with respect to the main planes of the coordinate system. For each **i** ∈ *D*_{p} and **θ** ∈ [0, 1]^{3}, the reflected node **i**^{†} in the first octant and the location **θ**^{†} are

with the vector operations being component-wise. The reflection symmetry implies

which generalizes Eq. (14) to CubeS. The cubature weights additionally relate to each other by way of permutation symmetry. Let *σ* denote a permutation of {1, 2, 3}. The permutation acts on the arbitrary vector **x** = (*x*_{1}, *x*_{2}, *x*_{3})^{⊤} in such a way that *σ*·**x** is the vector of entries *x*_{σ(i)} for 1 ≤ *i* ≤ 3. The permutation symmetry has that

Together with Eq. (49), the last identity implies that it is enough to specify the cubature weights for those nodes **i** for which *i*_{1} ≥ *i*_{2} ≥ *i*_{3} ≥ 1. Figure 5 shows the nodes that are assigned charges for fourth-order CubeS_{1} and CubeS_{2}. (At order 4, these cubature splines are identical). Owing to permutation and reflection symmetry, only the two nodes labeled (1, 1, 1) and (2, 1, 1) need to have their weights specified.

Owing to the use of the Euclidean distance in the definition of the support, CubeS_{2} better capture the spherical symmetry of the spreading Gaussian in scaled coordinates. For all CubeS, the optimal *ξ*_{○} is determined by minimizing the absolute error in the spherical moment of order 2*ν*, which is given by

For orders 2*ν* = 4*k* + 2, this absolute error vanishes for a unique *ξ*_{○}, for which the weights of CubeS_{2} are all positive. For orders 2*ν* = 4*k*, the error does not vanish. The unique minimum is realized for a value *ξ*_{○} for which the CubeS_{2} have one weight (not counting its symmetry-induced variants) that can take small negative values. The CubeS_{1} family is decidedly worse with respect to the positivity of the cubature weights. Many weights take negative values for 2*ν* ≥ 8. For order 4, CubeS_{1} are identical to CubeS_{2}. For order 6, although both CubeS_{1} and CubeS_{2} have positive weights, CubeS_{2} have smaller cost for evaluating the cubature weights and their positional derivatives, despite the slightly larger number of cubature nodes (88 vs 80). For these reasons, in the Appendix, we only present CubeS_{2} of order 4 and 6, as CubeS_{1} of order 6 does not seem to present any supplementary computational advantage.

## V. HYPERBOLIC SINE GAUSSIAN SPLIT METHOD

As mentioned in the Introduction, Anton’s specialized hardware allows for fast computation of transfer functions that depend solely on the absolute distance to the grid nodes in three dimensions. The midtown splines do not have this property and so are not suitable to be used directly on Anton. Their development, however, gives insight into how to improve upon the GSE method used on Anton, while retaining the ability to take advantage of its specialized hardware. In this section, we describe such an improved method.

GSE uses a very simple transfer function: a Gaussian $G\sigma sr,ri$ truncated to a certain finite spreading range ∥**r** − **r**_{i}∥ < *r*_{s}. As a consequence, GSE can only increase its accuracy by increasing the spreading range *r*_{s} while simultaneously decreasing the mesh size Δ. In the original method, the optimal value for *σ*_{s} is determined empirically, but Lindbo and Tornberg^{10} have demonstrated the optimal scaling *σ*_{s}/Δ = *κ*(*r*_{s}/Δ)^{1/2}, with $\kappa =1/C2\pi $. (While the precise value of *C* is undetermined, it is argued to be a number slightly smaller than 1; with the value *C* = 0.976 used by Lindbo and Tornberg, we obtain *κ* ≈ 0.409.) Their scaling can also be inferred from our research on quadrature splines. In Sec. I B, we determined that *σ*_{s}/Δ ≈ *κ*(*r*_{s}/Δ)^{1/2} for QuadS, with *κ* ≈ 0.427. From Eq. (20), we deduce that the QuadS transfer function $det\Delta \u22121S\xi \u25cb\Delta \u22121r,i$ is well approximated by

The last equation used with ∥**Δ**^{−1}**r** − **i**∥ truncated to *ν* gives rise to Spectral Ewald (SE), the improved version of GSE studied by Lindbo and Tornberg. However, further improvements are possible, and in the remainder of this section, we use the asymptotic form expressed by Eq. (52) as a starting point for developing an improved transfer function suitable for Anton.

Equation (52) already differs from its original GSE counterpart in that it uses scaled coordinates **s** = **Δ**^{−1}**r**. The new spreading function thus behaves differently for anisotropic systems or for simulations for which **Δ** fluctuates in time (for example, simulations in the isobaric ensemble). Additionally, rather than simply abruptly truncating the Gaussian kernel in Eq. (52) as in GSE, we modify it such that it vanishes continuously at the boundary of the support (the set determined by the inequality ∥**s** − **i**∥ < *ν*) while remaining very nearly a Gaussian inside the boundary. Although this requirement does not uniquely determine a transfer function, we propose the transfer function

where

The Gaussian split method using the hyperbolic sine transfer function will be labeled SinhGS.

$N$ is a normalization constant such that

The covariance matrix of the transfer function is

implying

The covariance matrix is needed for the evaluation of $\sigma k\sigma k\u22a4=\sigma 2I3\u22122\sigma s\sigma s\u22a41/2$, which participates in the definition of the grid-to-grid interaction kernel $K\sigma kri,rj$ entering Eq. (39). As a first guess, the constant *κ* can be set to 0.427, the value determined for QuadS, but empirical studies for charged systems of biological importance have shown that the value

leads to slightly smaller errors in the interaction energy.

Let us illustrate some of the improvements SinhGS provides over the original GSE using ions in aqueous solution as an example system. For SinhGS, we use *ν* = 3, which corresponds to the spreading radius *r*_{s} = 3Δ for GSE; for GSE, this spreading radius is too small for most purposes, as will become clear (*r*_{s} = 3.8Δ is generally used with GSE in typical biomolecular simulations). For reference, we also display results computed with QuadS of order 2*ν* = 6. The system is a cubic box of water of edge length equal to 126.7 Å, containing 125 protonated arginine dipeptides neutralized by an equal number of chlorine anions and modeled by the CHARMM force field.^{21} The electrostatic energy is computed with the help of the *u*-series decomposition, which is itself a source of errors. In Fig. 6, though, we only display the error that is solely due to the grid discretization. This error is computed by reference to an effectively exact calculation performed using QuadS of order 2*ν* = 8 and a very small mesh size. The cutoff for the near electrostatics is kept constant at *r*_{c} = 9 Å, but the mesh size Δ is progressively decreased so that the value *r*_{c}/Δ used in the abscissa increases. QuadS converges rapidly, consistent with a sixth order method, and is the only method to converge. Neither SinhGS nor GSE converges as Δ → 0, but the errors for SinhGS stay closer to 0 than the errors for GSE. As we already mentioned, SinhGS and GSE can be made to converge by letting ν become large, although this mode of convergence is more expensive, with the number of grid nodes to which a charge is spread being roughly 4*πν*^{3}/3.

As exemplified in Fig. 7, MD simulations using GSE with *ν* = 3 exhibit strong energy drift for systems with large charges. The drift can be ascribed to the lack of continuity of the transfer function (the truncated Gaussian) at the cutoff radius. The drift can be decreased to acceptable values by increasing the spreading radius from 3 (in units of Δ) to the typically used value of ∼3.8, but there is a computational price to be paid since the number of grid nodes to which a charge is spread doubles. Under similar simulation conditions, SinhGS with *ν* = 3 performs better, and switching to this method is a computationally more efficient way to decrease the drift. Still, even though its energy drift is smaller, SinhGS must increase *ν* when more accuracy is needed, since modifying *ν* is the only accuracy control. From this point of view, the method is less flexible than QuadS or SPME on programmable computers, since the latter methods may alternatively increase the accuracy by decreasing Δ at constant *ν*. SinhGS is useful for Anton, where the cost of charge transfer is lessened by the utilization of dedicated hardware.

## VI. USAGE OF MIDTOWN SPLINES WITH THE *u*-SERIES DECOMPOSITION

In earlier sections, we have derived the midtown splines (also the SinhGS method). In this section, we turn to more practical issues of how to use these splines in MD simulations. Both Midtown splines and SinhGS can be used to evaluate the interaction energy for all potentials that can be represented as a superposition of Gaussians with standard deviations larger than a certain minimum value *σ*. Many interactions can be expressed this way, including the long-range component of the Coulomb interaction or the two-body part of the dispersion interactions. The superposition can be continuous, as is the case for the Ewald decomposition, or discrete, as is the case for the *u*-series decomposition. Here, we consider the *u*-series decomposition because it has been shown to be more computationally efficient than the Ewald decomposition.^{4} We succinctly describe the *u*-series decomposition and explain how it is used to evaluate the long-range component of the electrostatic energy. We then point out the existence of one parameter that has to be determined: *ϕ* = Δ/*r*_{c}, with *r*_{c} denoting the cutoff for the short-range electrostatics. We empirically determine and tabulate the maximal value of *ϕ* consistent with the requirement that the grid discretization errors, which increase as *ϕ* increases, are smaller than the intrinsic errors of the *u*-series decomposition (those errors still present as Δ → 0).

With *r* = ∥**r**′ − **r**∥, the long-range component of the *u*-series decomposition for the Coulomb interaction 1/*r* is

and the short-range component is

The coefficients *w*_{l} are equal to 1 for all *l* ≥ 1. Given *b* > 1, *r*_{c} and *w*_{0} are such that *r*_{c} is the smallest value for which $N1,br$ is *C*^{1} continuous at *r* = *r*_{c}. This requirement makes the ratio *r*_{c}/*σ* a function of *b*. Using lower values of *b* leads to more accurate decompositions, but these decompositions are computationally more expensive because the ratio *r*_{c}/*σ* increases as *b* decreases. We note that for special values of *b*, the short-range component $N1,br$ becomes *C*^{2} continuous at *r* = *r*_{c}. The order of continuity *o* at the cutoff is important because, for constant *b*, the error in the total electrostatic energy can be decreased as fast as $rc\u2212o\u22121$ by increasing *r*_{c}.

The long-range component of the electrostatic energy for *N* particles located at the positions **R**_{n} and carrying the charges *Q*_{n} is obtained as a sum over interacting grid points carrying grid charges by simply inserting Eq. (39) in place of $Gbl\sigma r\u2032,r$ in Eq. (59). The result is

The matrix **σ**_{k,l} is such that $\sigma k,l\sigma k,l\u22a4=b2l\sigma 2I3\u22122\sigma s\sigma s\u22a4$ and the grid charges *q*_{i} are

The transfer function $T\sigma sr,ri$ is given by Eq. (53) for SinhGS or by

for midtown splines. The splines $S\xi \u25cbs,i$ that are useful in MD are presented in the Appendix. For a periodic system, we can choose the kernel $K\sigma k,lri,rj$ to be given by either Eq. (31) or by a spectral version^{10} of Eq. (31) obtained by transporting the kernel $K\sigma k,lr,r\u2032$ into the reciprocal space and restricting the summation over the reciprocal vectors to the first Nyquist zone. We provide more details below.

In practice, the calculations are organized as follows: (i) the grid charges *q*_{i} are prepared by way of the *assignment formula* given by Eq. (62), (ii) the grid electrostatic potential

is evaluated, and, then, (iii) the electrostatic potential *U*(**r**) in the continuum is obtained by way of the *interpolation formula*,

The electrostatic potential *U*(**r**) is used to compute the force

acting on particle *n* and/or to evaluate the total energy

The evaluation of the on-grid electrostatic potential at step (ii) requires further clarifications for periodic systems. Defining **x** = **r**′ − **r** and the matrix **Ω** such that **Ωn** = *n*_{1}**a**_{1} + *n*_{2}**a**_{2} + *n*_{3}**a**_{3}, we have

The sum over **p** is a Fourier series, reflecting that the preceding sum defines a periodic function in **x** with unit cell Ω defined by the column vectors of **Ω**. The Fourier coefficients are

We can re-order the summation in Eq. (64) and introduce the influence function

to obtain

For the Ewald kernel, a similar arrangement holds except that the influence function becomes

The sum over **i** in Eq. (71) is finite: It extends only over those **i** that are within a distance ∥**i** − **s**∥_{p} of a point **s** = **Δ**^{−1}**r** with **r** in the unit cell Ω. This statement follows from Eq. (62). We now request that the scaled unit box is a rectangular prism of integer side lengths, that is,

for some positive integers *N*_{1}, *N*_{2}, and *N*_{3}. Periodicity arguments allow the charges *q*_{i} to be replaced by the set $q\u0303i$ obtained by coalescing all charges *q*_{i} from images outside Ω onto the unique equivalent in Ω. The sum inside the parentheses of Eq. (71) becomes a discrete Fourier transform to be handled by FFT. After multiplication by the influence function Ψ(**p**), the data can be transformed back using inverse FFT, provided that the exterior sum in Eq. (71) is restricted to the first Nyquist zone. The restriction could have been introduced explicitly in the last sum of Eq. (68) and is equivalent to using the kernel

for computing the on-grid interactions.

We now quantify the intrinsic errors in the *u*-series approximation using parameters we have previously found to be appropriate for biomolecular MD simulations. This will later allow us to compare the magnitude of these intrinsic errors—obtained, in practice, by performing the mesh computation with a fine mesh and high-order QuadS—to the errors that arise when more efficient but less accurate mesh computation is used, facilitating the selection of mesh computations with an appropriate balance of accuracy and efficiency. The two parameterizations we consider are as follows: (i) a *C*^{1}-continuous parameterization with *b* = 2 and *r*_{c}/*σ* ≈ 1.989 and (ii) a *C*^{2}-continuous parameterization with *b* ≈ 1.630 and *r*_{c}/*σ* ≈ 2.752. (Parameter values specified to more than 3 digits are available in Ref. 11.) The parameterization with *b* = 2 is computationally more efficient and recommended for most simulations. The parameterization with *b* ≈ 1.630 is useful when more accurate simulations are needed. There exist *C*^{2}-continuous parameterizations exceeding any desired accuracy. They are not, however, expected to be useful in MD simulations, so we did not include them here.

Figure 8 displays the error in the electrostatic energy that is intrinsic to the *u*-series method, as measured for the arginine dipeptide system. These errors decay as $1/rc3$ for the *C*^{2} parameterization and as $1/rc2$ for the *C*^{1} parameterization. For comparison, we also display the errors due to the truncation of the van der Waals interactions. Even though we include a dispersion correction that corrects for the truncation in an average sense (reducing the error by three orders of magnitude for this system), the remaining errors, which decay as fast as $1/rc3$ with increasing *r*_{c}, are overall larger than the errors in the electrostatic energies for cutoffs in the range we studied (*r*_{c} ∈ 7 Å–16 Å).

We now empirically determine the maximum mesh spacing below which the grid discretization error is smaller than the intrinsic error. The result depends on the decomposition scheme (here, the *u*-series decomposition), the charge assignment rule (here, CubeS or SinhGS), the grid-to-grid kernel (here, the Fourier spectral kernel), and the class of physical problems (here, condensed-phase systems in aqueous solutions near room temperature). Scaling arguments (for a given *u*-series parameterization *σ*∝*r*_{c} and for a given Midtown spline $\Delta =\sigma s/\xi \u25cb\u2264\sigma /2\xi \u25cb$) suggest that the maximum Δ will be proportional to *r*_{c}, so we will empirically determine the maximum allowable values for the ratio *ϕ* = Δ/*r*_{c}. The maximum values of *ϕ* were indeed observed to depend little with *r*_{c} in the range 7 Å–16 Å.

Table III lists the maximum allowable values of *ϕ* recommended for several QuadS and CubeS_{2} of orders 4 and 6. These values have been determined so that the discretization errors are smaller than the intrinsic errors on the entire range 7 Å–16 Å of cutoff radii considered. Graphs similar to Fig. 9 have been determined for several systems thought to be representative of biological systems, and the most conservative values for *ϕ*_{max} have been retained. A separate study has been carried out for the *u*-series with *b* ≈ 1.630. For SinhGS with *ν* = 3, owing to the non-decaying asymptotic error in the limit Δ → 0, there is no value of *ϕ* for which the error decays below the intrinsic error of the *u*-series with *b* ≈ 1.630. We must use SinhGS with *ν* = 4 to reach the accuracy afforded by the *C*^{2}-continuous parameterization of the *u*-series decomposition. (For SinhGS, the parameter *ν* is continuous, and it is just by coincidence that the optimal values were found to be nearly integers.) The other results are also consistent with the theoretical analysis. For the same order, CubeS_{2} are slightly less accurate than QuadS, and this is reflected in their smaller values of *ϕ*_{max}.

ϕ_{max} | |||

Method | Nr nodes | b = 2 | b ≈ 1.630 |

CubeS_{2}, ν = 2 | 32 | 0.23 | 0.065 |

QuadS, ν = 2 | 64 | 0.26 | 0.075 |

CubeS_{2}, ν = 3 | 88 | 0.35 | 0.160 |

QuadS, ν = 3 | 216 | 0.43 | 0.180 |

SinhGS, ν = 3 | 113 | 0.43 | … |

SinhGS, ν = 4 | 268 | … | 0.190 |

ϕ_{max} | |||

Method | Nr nodes | b = 2 | b ≈ 1.630 |

CubeS_{2}, ν = 2 | 32 | 0.23 | 0.065 |

QuadS, ν = 2 | 64 | 0.26 | 0.075 |

CubeS_{2}, ν = 3 | 88 | 0.35 | 0.160 |

QuadS, ν = 3 | 216 | 0.43 | 0.180 |

SinhGS, ν = 3 | 113 | 0.43 | … |

SinhGS, ν = 4 | 268 | … | 0.190 |

Table III facilitates the specification of different parameterizations of comparable accuracy, which is an important step toward determining the fastest method able to achieve a desired level of accuracy on a given computational platform. As an illustrative example, Fig. 10 shows the results of a study for the *u*-series with *b* ≈ 1.630 using Desmond/GPU on an RTX 2080 GPU. All systems were cubic water boxes at experimental densities with interactions given by the TIP3P water model. The distant calculation was executed at each time step, and a cutoff radius of 9 Å was used for both electrostatics and van der Waals interactions. The grid sizes were chosen according to equation Δ = *ϕ*_{max}*r*_{c}. We found that the optimal transfer functions for this level of accuracy is provided by CubeS_{2} of order 6, which spreads to 88 grid nodes. The gain in overall simulation speed of about 40% when measured against QuadS of order 6 is due to the more compact spreading support, which more than compensates for the moderate increase in the FFT sizes. The high accuracy demands of *b* ≈ 1.630 are difficult to meet with midtown splines of order 4, the usage of which leads to substantial increases in the FFT sizes and consequently large slowdowns. A separate study for *b* = 2 has shown that the midtown splines of order 4 are nevertheless faster at this lower accuracy. CubeS_{2} of order 4 were found to lead to overall simulation speeds about 10% faster than with QuadS of order 4.

## VII. SUMMARY AND CONCLUSIONS

We have introduced the midtown splines defined as a set of cubature Newton–Cotes weights varying continuously with the location of a charge carrier. The splines have been developed for Gaussian kernels, which have immediate applications in chemical physics, but similar splines can be defined for other measures that possess reflection symmetry. The splines are equipped with assignment and interpolation formulas that allow sampling of smoothings on a grid given the charge densities and, conversely, continuous reconstruction of smoothings given charge samples on a grid. In the present work, the mappings provided a way to perform the charge assignment (spreading) and the electrostatic potential interpolation steps in particle–mesh algorithms for electrostatics computations. These mappings may find other uses, for example, to switch between grids of different resolutions, as is done in the multilevel summation method,^{8} or as an optimized approximation to an ideal Gaussian filter in signal processing.

The one-dimensional splines, the quadrature splines, have been characterized completely and uniquely. Their mathematical properties are highly interesting because optimality considerations for the assignment view simultaneously result in maximal smoothness of the interpolation view. Each accuracy order of the form 4*k* + *p* with *p* ∈ {1, 2, 3, 4} results in optimal quadrature splines of continuity *C*^{p−2}. The splines of order 4*k* + 2 are particularly recommended for MD applications, although all continuous splines are usable. The splines of orders 4*k* optimally implement path integrals on a grid.

In three dimensions, cubature splines can be built as products of quadrature splines by exploiting the factorization of Gaussians. This construction, which can be named either QuadS or CubeS_{∞}, is similar to the way the cardinal B-splines are used in SPME. The construction is not optimal, absent a request for separable transfer functions, and more economical cubature splines of given order can be built. We described the Manhattan and Euclidean splines, denoted by CubeS_{1} and CubeS_{2}, respectively, and completely characterized the CubeS_{2} of order 4 and 6. These cubature splines spread any given charge to a number of 32 and 88 neighboring grid nodes, respectively, which compare favorably to the 64 or 216 nodes needed by both QuadS and B-splines of the same order. Measured against QuadS, CubeS_{2} require slightly more operations per assignment and larger grid sizes, but fewer memory transactions. Speed measurements using Desmond/GPU have shown that the cubature splines outperform their QuadS counterparts on single-GPU platforms.

The spline-based transfer functions are less useful for Anton or other platforms possessing dedicated hardware for the fast evaluation of pairwise interactions. Informed by the asymptotic properties of QuadS, we have developed a variation of GSE called SinhGS, which is both simpler and more accurate than the original method. It is simpler especially in its use because the exact relationship between the standard deviation of the spreading Gaussian and the mesh size, namely, $\sigma s=0.40\nu \Delta $, eliminates the reliance on an ad hoc optimizer and allows for simpler virial expressions obtained by formal differentiation. SinhGS has better accuracy and smaller energy drift than GSE.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the following appendixes and table: Appendix B: Proof of the reflection symmetry, Appendix C: Proofs of Propositions 1 and 2, Appendix D: Proof of Proposition 5, Appendix E: Path integrals in imaginary time on the grid, Appendix F: Quadrature splines of odd order, and Table IV: The optimal parameters ξ_{○} for the first few quadrature splines of odd order.

## ACKNOWLEDGMENTS

The authors thank Michael Eastwood for helpful discussions and Berkman Frank for editorial assistance.

This study was conducted and funded internally by D. E. Shaw Research, of which D.E.S. is the sole beneficial owner and Chief Scientist and with which all the authors are affiliated.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material.

### APPENDIX: RECOMMENDED MIDTOWN SPLINES FOR MOLECULAR DYNAMICS SIMULATIONS

Explicit formulas for those midtown splines that we recommend for molecular dynamics simulations are provided here. These are the CubeS_{2} splines of order 4 and 6. We first provide explicit formulas for the one dimensional midtown splines (QuadS) of order 4 and 6; by means of an outer product, the QuadS functions can be used to construct the optimal separable transfer functions in three dimensions. We note that practical advice on how to determine an appropriate grid spacing is given in Sec. VI.

The one-dimensional midtown splines (QuadS) of order 2*ν* have the form

Here, $\u230as\u230b$ and *θ*(*s*) represent the integer and fractional parts of *s*, respectively. The optimal values for *ξ* are denoted by ξ_{○} and presented in Table I. QuadS possess the translational property $S\xi s,i=S\xi s\u2212i,0$ and the reflection symmetry $S\xi s,0=S\xi \u2212s,0$. We only report the quadrature weights *c*_{i}(*θ*, *ξ*) with 1 ≤ *i* ≤ *ν* because those with −*ν* < *i* ≤ 0 can be obtained from the reflection symmetry or equivalently from Eq. (14).

For order 2*ν* = 4, the quadrature weights are

For order 2*ν* = 6, the quadrature weights are

Many-dimensional midtown splines are generically called CubeS. They have the form

which generalizes Eq. (A1). Here, $\u230as\u230b$ and **θ**(**s**) are the component-wise extensions of *s* and *θ*(*s*) to many dimensions. The splines possess the properties $S\xi s,i=S\xi s\u2212i,0$ and $S\xi s,0=S\xi s\u2020,0$. Here, **s**^{†} denotes the unique point of positive coordinates that is obtained from **s** through a sequence of reflections with respect to the main planes of the coordinate system. Additionally, $S\xi s,0$ is invariant under any permutation of the components of **s**. The domain *D* represents the support of $S\xi s,\u22c5$ for any **s** ∈ (0, 1)^{d}, where *d* is the number of dimensions. In three dimensions, it is invariant under any of the eight reflections,

and any permutation of the coordinates of **i**. *D* is thus completely specified by the list of nodes **i** ∈ *D* with *i*_{1} ≥ *i*_{2} ≥ *i*_{3} ≥ 1, which bear explicit labels in Figs. 5 and 11. Due to these symmetries or equivalently Eqs. (49) and (50), it suffices to report the cubature weights *c*_{i}(**θ**, *ξ*) for these nodes only.

The largest domain is *D*_{∞} = [(1 − *ν*)⋯*ν*]^{d}, which is the domain for outer products of QuadS,

These functions, also called CubeS_{∞}, are optimal for the cubic domain, *D*_{∞}. *D*_{∞} is, however, part of a class of domains *D*_{p} defined by Eq. (47), with the associated midtown splines denoted by CubeS_{p}. In three dimensions, we have found CubeS_{2} of order 4 with the support size |*D*_{2}| = 32 and of order 6 with the support size |*D*_{2}| = 88, which we recommend for use in molecular dynamics simulations.

For order 4, we first define the auxiliary functions

Then,

The value of *ξ* recommended for electrostatics calculations is $\xi \u25cb=2/15$. In applications where positive weights are required, they can be obtained by using $\xi \u25cb=1/3$.

For order 6, we define the additional auxiliary functions

Then,

The optimal value of *ξ* is *ξ*_{○} = 0.650 399 8…. (With this choice, the cubature weights are positive).

We note that various parallelization strategies are possible. Desmond/GPU uses an outer loop to iterate over all charges in the system. For each such charge, an inner loop iterates over all grid points to which portions of the charge are assigned. If the charge is located at **s**, the charge is transferred to all grid points **i** such that $i\u2212s\u2208Dp$. Equation (A4) is used directly as given. Another possible parallelization strategy is to iterate over all grid points in the outer loop. Then, for each such grid point **i**, an inner loop iterates over all particles that can transfer a non-zero charge to the grid point. In other words, for each **i**, we need to list all charges located in the support of the interpolation view $S\xi \u22c5,i$. To do this, it may be computationally advantageous to prefilter the charge positions **s** by using $i\u2212s\u2208D$ only if ∥**i** − **s**∥_{p} < *ν*. (The *L*^{p} norm ∥·∥_{p} is fast to compute.)