The evaluation of electrostatic energy for a set of point charges in a periodic lattice is a computationally expensive part of molecular dynamics simulations (and other applications) because of the long-range nature of the Coulomb interaction. A standard approach is to decompose the Coulomb potential into a near part, typically evaluated by direct summation up to a cutoff radius, and a far part, typically evaluated in Fourier space. In practice, all decomposition approaches involve approximations—such as cutting off the near-part direct sum—but it may be possible to find new decompositions with improved trade-offs between accuracy and performance. Here, we present the u-series, a new decomposition of the Coulomb potential that is more accurate than the standard (Ewald) decomposition for a given amount of computational effort and achieves the same accuracy as the Ewald decomposition with approximately half the computational effort. These improvements, which we demonstrate numerically using a lipid membrane system, arise because the u-series is smooth on the entire real axis and exact up to the cutoff radius. Additional performance improvements over the Ewald decomposition may be possible in certain situations because the far part of the u-series is a sum of Gaussians and can thus be evaluated using algorithms that require a separable convolution kernel; we describe one such algorithm that reduces communication latency at the expense of communication bandwidth and computation, a trade-off that may be advantageous on modern massively parallel supercomputers.
I. INTRODUCTION
Evaluation of the electrostatic energy for a set of point charges in a periodic lattice is required for a number of applications, including molecular dynamics (MD) simulations, electronic structure calculations, crystallographic calculations, and hydrodynamic simulations. The long-range nature of the Coulomb interaction leads to well-known computational challenges arising from the inefficiency of the direct summation of all pairwise interactions.1–3 For MD simulations of biophysical systems in an aqueous solution with an explicitly modeled solvent4—the application that motivates us here—the unscreened Coulomb interactions are non-negligible even at distances as large as 100 Å and thus extend over the entirety of most simulated systems. Furthermore, when using spatial-domain decompositions5 on massively parallel computers, the long range of the Coulomb interaction leads to communication latency that limits parallel scaling.6,7
Many efficient approaches for evaluating the total electrostatic energy are based on Ewald’s decomposition,2,8 which divides the Coulomb potential function 1/r into short-range and long-range parts, also referred to as the “near” and “far” parts, respectively. The near part is effectively range-limited and is computed by direct summation up to a cutoff radius rc. The far part is typically computed as a sum in k-space (also known as reciprocal or Fourier space), where it too is effectively range limited, and can be efficiently computed by methods such as optimized Ewald summation,9,10 the particle-mesh Ewald (PME) algorithm,11,12 the particle-particle particle-mesh method,13 the fast Fourier Poisson method,14 or Gaussian split Ewald (GSE).15
The near interactions in the Ewald decomposition decay rapidly but do not vanish at rc, leading to various undesirable artifacts depending on the specific implementation. The typical approach of abruptly setting the near forces to zero beyond rc is equivalent to truncating the near electrostatic energy and shifting it downward so that it vanishes continuously at the cutoff. This is known as the “cut-and-shift” approach,16 and it introduces a small, constant error in the electrostatic energy at distances up to rc. More sophisticated techniques based on switching functions have been applied to the Ewald near term to increase its smoothness at rc; such techniques eliminate the error in the electrostatic energy up to the distance at which the switching function takes effect, thus localizing the error near rc, but have not been found to measurably reduce errors in actual biomolecular simulations.17 There has also been some work exploring decompositions based on screening charge distributions,18–20 which can be made continuous at rc, but which still exhibit errors at distances less than rc.
In the present work, we introduce the u-series, a new decomposition of the Coulomb potential into near and far parts that is constructed to be exact up to the cutoff and continuous at the cutoff. The far part of the u-series is essentially the long-range portion of a bilateral series (a series whose summation index runs from −∞ to ∞) presented by Beylkin and Monzón,21,22 who observed that the function 1/r can be well approximated with a uniform relative error by a bilateral series of suitably scaled Gaussians. (The name “u-series” was chosen to reflect this property of uniform relative error.) The near part is 1/r minus the far part, truncated at rc (hence the near and far parts sum to exactly 1/r at distances up to rc). We construct the decomposition in such a way that the near part vanishes and is continuously differentiable at rc, without recourse to approximations that perturb the total energy (e.g., shifting or switching). As a result, the magnitude of the Coulomb interaction error near rc is significantly decreased relative to the Ewald decomposition.
Our approach was motivated by the hypothesis that moving the electrostatic energy error from before rc (as in the Ewald decomposition) to beyond rc would improve overall accuracy due to self-cancellation of long-range errors in a charge-neutral system. This intuition has been exploited in a different way by the Isotropic Periodic Sum (IPS) method, introduced by Wu and Brooks23,24 and further developed by Takahashi.25 Through the inclusion of a particular model for the distant charge distribution surrounding any given particle, IPS dispenses with the need for the k-space sum, provided that the cutoff radius rc is sufficiently large. For the u-series, the intuition is borne out empirically by numerical experiments in which we find a substantial improvement in the accuracy of simulation observables for a lipid membrane system. Alternatively, the u-series can match the accuracy of the Ewald decomposition with a reduced computational effort; based on our numerical tests, we estimate that u-series needs only about half the computational effort of Ewald for the same accuracy. In practice, these computational savings are effected by using a smaller cutoff radius for the near part and/or a coarser grid when employing a mesh-based method to evaluate the far part.
In certain situations, the u-series may have certain additional advantages: Its far part is a sum of Gaussians, which are separable functions [that is, products of one-dimensional functions; e.g., g(x, y, z) = f(x) f(y) f(z)], allowing its far part—unlike the far part of the Ewald decomposition—to be evaluated by algorithms that take advantage of this separability. Three-dimensional convolutions of separable kernels are computationally less expensive than convolutions of non-separable kernels,26 and we will show that they also admit implementations with lower communication latency on parallel machines. In particular, we present a real-space algorithm that evaluates the far part with a lower communication latency than standard Fourier space methods, at the expense of some overhead in communication bandwidth and computation. Such a trade-off may be beneficial on massively parallel supercomputers, providing an additional potential performance advantage for the u-series beyond that which is illustrated in our numerical experiments.
II. THE PROBLEM OF EVALUATING ELECTROSTATIC INTERACTIONS
We begin by reviewing the problem of evaluating the electrostatic energy for periodic systems. We briefly discuss Ewald’s method and refer the reader to Refs. 1–3 for additional details. For an in-depth treatment of the structure of the electrostatic potential for periodic charge distributions, we refer to the reader the original analysis of de Leeuw et al.27
Let ρ(r) be a charge density that satisfies periodic boundary conditions ρ(r) = ρ(r + n), where n denotes a vector (mxLx, myLy, mzLz) with mx, my, mz integers and Lx, Ly, Lz the side lengths of an orthorhombic unit cell of volume V = LxLyLz. The unit cell, denoted by Ω, is assumed to be charge neutral (that is, ). The charge density ρ(r) can be continuous, discrete, or a combination of the form
where ρc(r) is the continuous part of the distribution, qi denotes the magnitude of the point charge at position ri within the unit cell, and δ is the Dirac delta function. The electrostatic potential Φ(r) generated by the infinite, periodic distribution of charge is formally given by
The potential defines the electrostatic energy of the unit cell by the formula
As is customary, the prime in the integral of Eq. (3) means that the potential has the Coulomb singularity at ri removed when interacting with the point charge qi (which itself was the source of the singularity). It can be shown that Φ(r) can be expressed as
where k runs over what are referred to as vectors of the reciprocal lattice, k = (2πmx/Lx, 2πmy/Ly, 2πmz/Lz), with mx, my, mz integers and
By the Fourier inversion theorem, the series
recovers the periodic charge density.
It is not straightforward to evaluate the electrostatic energy from Eq. (4) because of the need to remove the self-interaction energy. Ewald provides a resolution to this problem by dividing the Coulomb potential into near and far parts,2,8
where σ > 0 is an arbitrary length scale. The splitting can be physically rationalized18,19 by canceling the charge distribution ρ(r) in Eq. (2) with a smoother version of itself obtained by convolution with a Gaussian of width (standard deviation) σ. The second term in Eq. (6) (the far part) then represents the electrostatic potential generated by a unit charge that is distributed as a Gaussian of width σ centered about the origin. It can be verified that
which shows that the far part in the Ewald decomposition is a continuous superposition of Gaussians of width at least σ. Similarly, the near part is a superposition of Gaussians of width at most σ.
The near and far parts of the Ewald decomposition are substituted for the Coulomb interaction in Eq. (2), resulting in the near and far potentials, respectively. The integral involving the near potential converges absolutely in real space, where it is truncated at a suitable cutoff radius rc and computed as an explicit sum of pairwise interactions.
The far part of the electrostatic potential,
generates an absolutely convergent sum in k-space, which takes the form
The far part of the electrostatic energy is then defined by
where the constant term subtracted in Eq. (10) is the self-interaction energy.
The cost of computing the near potential is proportional to the number of particles and to the volume of interaction 4πrc3/3 for each particle. The cost of evaluating the k-space sum depends on the specific method used. For direct Ewald summation, a k-space cutoff kc is set to a fixed multiple of 1/σ, and the number of k-space lattice points in the ball of radius kc is proportional to the volume 4πkc3/3. For grid-based techniques such as PME and GSE, the mesh spacing of the grid is proportional to σ (for a given amount of accuracy), and the computational effort scales at least linearly with the number of grid points. For both methods, the cost of computing the far potential is thus at least proportional to 1/σ3.
III. THE u-SERIES
In this section, we introduce the u-series, a new decomposition of the Coulomb potential for Ewald-like electrostatics computations. Our starting point is a previously known21,22 bilateral series approximation for 1/r, which we study in Sec. III A. In Sec. III B, we show how this bilateral series approximation leads directly to a decomposition of 1/r. In Sec. III C, we make small but important modifications to this decomposition to improve its accuracy and to make it more suitable for use in MD simulations, resulting in the decomposition we call the u-series.
A. The bilateral series approximation for 1/r
The function 1/r can be approximated by a bilateral infinite series of Gaussians of geometrically spaced widths, which we refer to as the bilateral series approximation (BSA).
Let be a Gaussian of width σ, and let b > 1 be a positive constant (the base). The bilateral series approximation for 1/r is defined as
The BSA is motivated by the identity
which follows from the change of variables u = rt. Rearranged, this identity provides a well-known integral expression for 1/r,
A quadrature of Eq. (13) with geometrically spaced quadrature points (i.e., t = b−x) leads directly to the BSA for 1/r in Eq. (11). It may seem unusual that the starting point of an improved method is an approximation, but the alternating distribution of the errors (illustrated in Fig. 1) is a feature specific to the BSA, and one that we will exploit algorithmically.
We are not the first to observe that 1/r may be represented by the series of geometrically scaled Gaussian terms given in Eq. (11). Notably, Beylkin and Monzon21,22 have previously made use of the same observation as a starting point to develop approximations to 1/r with bounds on the relative error for a finite range δ ≤ r ≤ 1, where δ > 0. Such approximations have been used in the context of electronic structure calculations,28 and other approximations that use a sum of Gaussians to represent an electrostatic potential have found use in that context29 or in quantum mechanics/molecular mechanics calculations.29–31 Our goal is to develop a decomposition of 1/r that is particularly suitable for use in MD simulations, and as a first step, we establish some important properties of the BSA that to our knowledge have not been previously reported.
We remind the reader that if f(b) and g(b) ≥ 0 are functions of b, then g(b) asymptotically bounds f(b) in the limit b → 1 if limsupb→1| f(b)|/g(b) ≤ M for some M > 0. The case with M = 1 is succinctly represented by | f(b)| ≲ g(b) as b → 1. The relative error of the BSA has the asymptotic bound
as b → 1 (see supplementary material, Appendix B for details). We denote this asymptotic error bound by Mb and remark that it is independent of σ, being strictly a function of the base b. Numerical investigations show that Mb(14) is within 1% of the exact bound for b ≤ 2. The relative error for the BSA with b = 2 is illustrated in Fig. 1, with the asymptotic error bound M2 plotted in red.
The BSA features the scaling property
which follows immediately from the substitution j = j′ + 1 in Eq. (11). Consequently, the relative error has the property
Owing to Eq. (16), the relative error is uniquely determined as a function of r by its values on the interval [1, b). These values are repeated in the interval [bk, bk+1) for each integer k. The relative error is thus uniform across the entire real axis, being no worse than it is on the interval [1, b). This uniform approximation property is shown in Fig. 1, which also illustrates that the BSA relative error has infinitely many roots that accumulate both to the origin and to infinity. The scaling invariance expressed by Eq. (16) implies this statement provided that there is at least one root in the interval [1, b), which we prove in supplementary material, Appendix D.
B. The BSA decomposition
Ewald’s method evaluates the near part in real space and the far part in k-space, where they are, respectively, localized. The Gaussian terms in the BSA decay super-exponentially (i.e., faster than exponentially) in real space and k-space as functions of r/(b jσ) and kσ/b j, respectively (see supplementary material, Appendix C for the k-space form of the BSA). The BSA thus provides a natural approximate decomposition of 1/r into terms with positive and negative j,
We refer to this new decomposition as the BSA decomposition and illustrate its components in Fig. 2.
The BSA decomposition has several desirable properties. The near and far terms have a fast decay in real-space and k-space, respectively. The relative error of the BSA series converges exponentially in the limit b → 1, as shown in Eq. (14). Finally, the Gaussian components of the BSA are separable; in fact, Gaussians are the only separable spherically symmetric functions. As will be discussed in Sec. V, this enables the use of lower-latency parallel algorithms if the infinite series can be truncated.
C. The u-series
Two properties of the BSA decomposition make it not ideally suitable for direct use in MD simulations. First, the near part of the BSA decomposition is not range limited, which leads to truncation errors when the near term is evaluated only up to a cutoff radius rc. Second, the error of the BSA decomposition is large in absolute terms for values of r much smaller than rc, since the relative BSA error extends all the way down to r = 0. The u-series addresses these issues by retaining the far part of the BSA decomposition,
while replacing the near part with
The cutoff radius rc is chosen to be the smallest root of
The u-series is the sum of the near and far parts,
By construction, the u-series equals 1/r exactly for r < rc, and the near part has its range limited to r < rc. For r ≥ rc, the relative error of the u-series is equal to that of , thus approaching the uniformly bounded relative error of the BSA for r ≫ rc. Owing to the fast decay of Gσ(r), the relative error of the BSA and the u-series approaches each other rapidly when r ≥ rc, as illustrated in Fig. 3.
By choosing rc to be a root of , we ensure continuity of at rc, and thus along the entire real axis. As a result, the u-series does not incur the truncation errors associated with the cut-and-shift approach. The smallest root is chosen because it minimizes the O(rc3) computational work required to compute the near term.
The definition of the u-series relies on the existence of roots of . We can confirm their existence by observing again that the near part of the BSA decomposition decays super-exponentially in r, so the relative error of the far part approaches for sufficiently large r and thus has infinitely many roots away from the origin, as illustrated in Fig. 3. Because the near part is missing, the roots of do not accumulate at the origin, as they do for . Figure 3 plots the first four roots of .
Note that
so that rc is equivalently the smallest root of . For a given base b, define r0 to be the smallest root of ; then, we have the relation rc = r0σ. In practice, depending on the best approach for maximizing the computational performance on a specific platform, one can either choose σ and then set rc = r0σ or choose rc and then set σ = rc/r0.
The smoothness of the u-series at rc can be increased by modifying the definition of the far part. We use the nomenclature Ck construction for a family of u-series decompositions with k continuous derivatives at all r > 0; accordingly, the simplest form of the u-series—with the far part defined in Eq. (19)—is referred to as the C0 construction. A C1-continuous form of the u-series that we refer to as the C1 construction, and which numerical tests described in later sections suggest is particularly useful for MD applications, is obtained by varying the coefficient of the narrowest Gaussian in the far part of the series as follows:
To ensure that the resulting decomposition is C1-continuous for a given base b, we determine the pair (r0, w0) for which and its first derivative evaluated at r0 are equal to 1/r0 and −1/r02, respectively. Using Eq. (24), we can solve for w0 in terms of r0 such that ,
r0 is then obtained by solving the equation
As is true for the C0 construction, there exist multiple solutions to Eq. (26), and we pick r0 to be the smallest one. The many solutions possible for any given b give rise to the curves shown in Fig. 4. The C1 construction contains some solutions that are not optimal; that is, there is another solution with both smaller b (higher accuracy) and smaller r0 (lower computational cost). The solid red curves in Fig. 4 highlight the set of all optimal C1 constructions, which comprise a countably infinite set of intervals.
The points of discontinuity marked by bullets in Fig. 4 are points where two roots of Eq. (26) coalesce. For these special values of b, the u-series is C2 continuous. The points of C2 continuity can be found by solving the system of equations,
with unknowns b, r0, and w0, where is the C1 construction term specified by Eq. (24). Equation (27) admits countable many solutions, forming a sequence such that r0 → ∞ as b → 1. Each C2 solution is the most accurate u-series in its interval. Parameters for the first three are given in supplementary material, Table 1, along with the corresponding bounds Mb on the relative error.
Figure 5 plots the u-series relative error bound Mb against r0, directly illustrating the trade-off between computation (which increases with r0) and error (which decreases with r0). The gap in r0 corresponds to the gap between the first two red curves in Fig. 4. Within the first interval, the u-series error decays roughly exponentially with r0.
We believe that the C1 construction suffices for all purposes relevant to MD simulations, especially since it has a subset of C2-continuous solutions that can be used if such solutions are required. For even smoother approximations, additional parameters could be varied, for example, the width s0 of the narrowest Gaussian, as in
Varying the parameters defining the narrowest Gaussian is preferable, in order to prevent the appearance of large errors away from r0. We refer to Eq. (28) as the C2construction and provide parameters for b = 2 in the third row of supplementary material, Table 2, but we found that this and smoother constructions suffer from larger values of r0, as well as a larger overshoot or undershoot of the BSA relative error. With respect to the magnitude of the relative error for r > r0, the C1 construction is well behaved, as illustrated in Fig. 6 for b = 2. Although it undershoots the lower bound, M2 near r0, it blends more smoothly at r0 than the C0 construction (compare to the black solid line of Fig. 3, which does not go out of bounds, but exhibits a clear kink at r0). For b ≤ 2, numerical investigations show that the overshoot/undershoot never exceeds 20%, in part because the optimized values for the parameter w0 remain close to 1 (see Fig. 7).
IV. NUMERICAL EXPERIMENTS
We now assess the accuracy and performance of the u-series. In Sec. IV A, we calculate errors in the energy and pressure obtained using u-series C1 and C2 constructions for 100 configurations obtained from the simulation of a lipid bilayer. We compare these errors to those obtained for the same set of configurations using an Ewald decomposition. This allows us to establish parameters that give rise to similar accuracy between the two decomposition methods, and hence to estimate their relative performance. Then, in Sec. IV B, we perform a series of simulations using different u-series parameters and calculate observables that are known to be sensitive to the electrostatics. These results help us recommend u-series parameters that are useful in practice for MD simulations.
The lipid bilayer system we simulated was composed of 72 dipalmitoylphosphatidylcholine (DPPC) lipids solvated by 2189 molecules of water. The simulations were performed at a temperature of 323 K, a pressure of 1 atm, and a surface tension of 0 dyn cm−1. Under these conditions, the bilayer is in its fluid state32 and the aspect of the box fluctuates by a few percentage points. Simple truncation of Coulomb interactions is known to produce major artifacts in DPPC,33 and although smoothly tapering 1/r to zero before truncation can alleviate some of the problems, there can still be large discrepancies in properties such as the electrostatic potential difference between the center of the bilayer and the surrounding water.34 These discrepancies are thus due to the absence of the long-range component of the electrostatic interaction. Because the long-range component is precisely where the error of the u-series appears, DPPC ought to be a sensitive test case for the u-series. The 100 simulation configurations used in Sec. IV A were sampled at 1-ns intervals from a 100-ns simulation that used a very accurate parameterization of the Ewald decomposition. The simulations described in Sec. IV B used the u-series. In all our simulations, the cutoff radius for van der Waals interactions was 12 Å.
A. Comparison to the Ewald decomposition
We start by focusing on errors that are intrinsic to the decomposition scheme: those that remain even if the direct and k-space sums are performed exactly.
In Fig. 8, we show the absolute error when σ = 1 and rc ≈ 1.989 for the Ewald decomposition and the C1-continuous u-series with base b = 2. The absolute error Ewald(r) − 1/r is constant and large in the region r ≤ rc. Using a switching function to taper the near Ewald potential at rc does not decrease the maximum interaction error. Beyond the cutoff, the error decays as fast as a Gaussian of width σ. The absolute error S21(r) − 1/r for the u-series decays less rapidly, but is much smaller in magnitude than the Ewald error for small values of r.
To see how these errors in approximating 1/r translate into errors in the computation of energies and pressures for configurations obtained from MD simulations, we used 100 configurations from a simulation of the bilayer system described above. We computed the energy and pressure for these configurations using different decomposition schemes and parameterizations, including a very accurate Ewald parameterization that yields essentially exact results. The absolute errors in the energies and pressures for the various parameterizations were then calculated as differences from these essentially exact results. In order to isolate errors due to the decomposition, a very accurate grid-based method with a very fine grid (128 × 128 × 128) was used to evaluate the far part, and all computations were carried out in double precision.
The results of our computations, which all used rc = 8 Å, are shown in Fig. 9 (left). The red lines show the energy and pressure errors for the C1-continuous u-series for the same values of b = 2 and r0 = 1.989 used in Fig. 8 so that σ = 8/1.989 ≈ 4.02 Å. Such a value of σ leads to large absolute errors for the Ewald decomposition (not shown), but these errors can be decreased by decreasing σ. We find that we must decrease σ to approximately 2.50 Å for the Ewald decomposition errors to match the magnitude of the errors characteristic of the C1-continuous u-series with b = 2 [see Fig. 9 (left)]. The value of σ for the u-series is 4.02/2.50 ≈ 1.61 times larger, which results in substantial computational savings, as we discuss below. If greater accuracy is required, then a u-series with a smaller base b is a good choice. The C2-continuous u-series with b ≈ 1.63 and σ = 2.91 Å shown in Fig. 9 (left), for example, has an error that is one order of magnitude smaller than both Ewald and u-series with b = 2 and σ = 4.02 Å, yet is still more computationally efficient than Ewald (σ = 2.91 Å > 2.50 Å).
So far we have focused on the errors intrinsic to the decomposition scheme by computing the far part very accurately. In practice, however, such accurate computations would be prohibitively expensive, so faster but more approximate computations would be performed leading to another source of error. We now redo the calculations leading to Fig. 9 (left), but evaluate the far part using the level of accuracy that we typically use in MD simulations and show the results in Fig. 9 (right). Specifically, we use the GSE15 method, which was originally described in the context of the Ewald decomposition, but which is also straightforward to use to calculate the far contribution of the u-series decomposition. In GSE, charges are spread from particles to grid points within a certain radius; the trade-off between speed and accuracy is principally controlled by the ratio of this charge-spreading cutoff radius to the grid spacing. We have found through experience that a reasonable compromise between speed and accuracy is obtained when this ratio is 3.8, a value we have used extensively in simulations of a variety of systems, and which we also choose here. Other GSE parameters follow from this choice: The width of the spreading Gaussian, which does not affect speed and so can be chosen to optimize accuracy, is σs = 0.83Δ, where Δ is the grid spacing; the GSE requirement yields a maximum allowable grid spacing of Δ ≈ σ/1.17.
For our DPPC system, this ratio leads to a 24 × 24 × 32 grid for the Ewald decomposition, and 15 × 15 × 21 and 21 × 21 × 28 grids for the u-series decomposition with b = 2 and b ≈ 1.63, respectively. Figure 9 (right) plots the energy and pressure errors using these more practical computations of the far contributions. The results are nearly identical to those in Fig. 9 (left), validating our finding that the u-series is more computationally efficient than Ewald for this system. Not surprisingly, the most accurate method in the figures (u-series with b ≈ 1.63) suffers most from the error related to the coarser grids. The biggest difference between the left (L) and right (R) panels of Fig. 9 is a small but noticeable increase in the energy error for this simulation.
Based on the results of this section and the arguments about computational work scaling in Sec. II, we now estimate the ratio of total work required to compute the electrostatic interaction with a comparable error using u-series and Ewald decompositions under certain assumptions. Because they decay super-exponentially with r0 = rc/σ and only polynomially with rc for both Ewald and u-series, the errors are primarily functions of r0. In Sec. II, we argued that the minimum number of operations required to evaluate the direct and k-space sums are proportional to at least rc3 and 1/σ3, respectively; the total number of operations required is thus
for some positive constants Cn and Cf. Choosing rc to minimize Nops yields
If polynomial interpolation is used to compute the interaction functions, then the overall computational cost is largely independent of the specific near/far functional forms [Eq. (6) for Ewald vs Equations (19) and (20) for u-series]. Both Ewald and the u-series thus obey Eq. (30) with roughly the same values of Cn and Cf, and differ only in the value of r0. Under this assumption, the observed ratio r0,ewald/r0,u-series = 1.61 for rc = 8 implies that the u-series requires about 1.61−3/2 ≈ 1/2 of the computational effort for equivalent accuracy. We emphasize that this estimate of the ratio of u-series to Ewald computational work makes a number of simplifying assumptions, including perfect balancing of the computational load between the near and far terms and computational work that is strictly proportional to rc3 and 1/σ3, with equal constant factors. In practice, the relative performance of the two decompositions is dependent on the particular algorithms and computer architecture used.
B. Choosing u-series parameters for use in simulations
Although errors in the energy and pressure of DPPC are a useful way to judge the fidelity of electrostatics calculations, it is not clear, a priori, how these errors will impact the quality of the simulations themselves. To assess the accuracy required to avoid substantial simulation artifacts, we looked at two properties mentioned earlier as being sensitive to electrostatics: the surface area of the bilayer and the electrostatic potential difference between the water and the center of the membrane. We used as our baseline an Ewald simulation with a large cutoff radius (rc = 13.0 Å) to ensure high accuracy. As a threshold for acceptable accuracy, we required our observed mean properties to fall within the natural fluctuations—chosen here to be plus or minus one standard deviation—as estimated from the baseline run; this should help ensure that simulations using the electrostatic approximations still significantly overlap the target ensemble. Simulations were run for 2–4 µs each, with frames saved every 0.24 ns, and the first 120 ns of each simulation discarded from the analysis. The surface area was defined as the area of the simulation box parallel to the membrane (the x-y plane), and the electrostatic potential drop was estimated by first using the tool pmepot35 to compute a smoothed estimate of the electrostatic potential on a grid, and then averaging grid values that share the same z coordinate.34 All simulations used a 2.5 fs time step and a 64 × 64 × 64 grid to compute the far electrostatics.
Figure 10 shows the error in mean membrane properties as a function of rc for the C1-continuous u-series with b = 2, rc/σ ≈ 1.989 and the C2-continuous u-series with b ≈ 1.63, rc/σ ≈ 2.752 (refer to supplementary material, Tables 1 and 2, for more precise parameter values). DPPC simulations exhibited natural fluctuations in the surface area with a standard deviation of about 3%, and all of our simulations yielded errors less than half this value. Similarly, the mean potential drop (averaged across 40-ns windows) had a standard deviation of about 2.4%, again larger than the errors attributable to our choice of rc, suggesting that our accuracy was sufficient. By comparison, simulations that use cutoff electrostatics can have errors in the mean potential drop of well over 100%34 and are likely too inaccurate for membrane systems. Since DPPC is the most sensitive realistic system we have tested with respect to electrostatic approximations, it seems reasonable to expect that even the less accurate u-series settings shown in Fig. 10 should be conservative enough for most simulation purposes. In fact, we have used fairly aggressive parameters (b = 2, rc/σ ≈ 1.989, rc = 8 Å) in other tests—reversible folding of a villin headpiece and the temperature-dependent helicity of an Ala15 peptide, for example—and have achieved results consistent with simulations that use more accurate electrostatics. We also discuss the parameter choice in the section titled Summary and Conclusion.
V. A SEPARABLE REAL-SPACE ALGORITHM FOR EVALUATING THE ELECTROSTATIC ENERGY
In arriving at our main finding that the u-series requires roughly half the computation of the Ewald approach, we assumed that the far part of each decomposition would be calculated using the same algorithm. Since the u-series far part is well approximated by a sum of a small number of Gaussians, however, it can be computed by certain algorithms that the Ewald far part cannot, leading to an additional potential performance advantage for the u-series in certain situations. In this section, we sketch one such algorithm that may be advantageous on massively parallel machines.
On massively parallel supercomputers, the time required to evaluate the Ewald k-space sum is strongly affected by communication latency.6,7 Since the far interaction is not range limited, any parallel algorithm requires at least one global operation with the property that data from each processor affect the result of the computation on every other processor. In practice, computing the far interaction with a single global operation is rarely achieved. Many MD packages use PME11,12 to compute the k-space sum, in which a fast Fourier transform (FFT) is performed on the grid, followed by a multiplication in k-space, followed by an inverse Fourier transform. Each Fourier transform is a global operation, so PME requires two global operations.
The u-series k-space sum can alternatively be computed in real space with a single global operation by taking advantage of the dimensional separability of the Gaussian kernel,
First, we truncate the far part of the u-series at N terms, with N chosen to be large enough so that the truncation error is acceptably small (this will be quantified shortly). As long as a separable function S(r) = S0(x)S1(y)S2(z) such as a B-spline (PME) or Gaussian (GSE) is used to spread particle charges onto the grid, the k-space kernel for the resultant on-grid convolution
is a sum of separable functions, where denotes the Fourier transform of S. In real space, the on-grid potential is the circular convolution of the on-grid charge with the inverse Fourier transform of this kernel. Alternatively, we may regard the on-grid potential as the sum of N potentials, each obtained by convolving the on-grid charge with the inverse Fourier transforms of one of the Gaussians appearing in Eq. (32). The N Gaussians are separable by dimension, as are their inverse Fourier transforms. The potential associated with a given Gaussian thus represents a separable three-dimensional convolution (i.e., the composition of three one-dimensional convolutions).
Each separable convolution is performed in three sequential rounds: In the first round, for every (y, z) grid coordinate pair, grid values are broadcast along the x dimension, and the x convolution is computed. Each output point of the convolution can be computed independently with no additional communication, since the convolution is executed directly in real space. The same procedure is then repeated in the y and z dimensions. The three rounds of convolutions are performed for each of the N u-series terms in parallel, and the results are summed at the end to produce the final grid potentials. In total, three sequential rounds of communication are required, as opposed to the six sequential rounds necessary to perform both forward and backward Fourier transforms.
Computing the k-space sum in real space as described requires greater total computation and bandwidth because separate sets of convolutions must be performed for each of the N terms, whereas FFT-based approaches operate only on the sum of the terms. This trade-off of increased computation and bandwidth for reduced latency may be advantageous on massively parallel machines. Additionally, the computation and bandwidth can be reduced using other techniques if necessary. The on-grid Gaussian kernels corresponding to different terms have increasing widths and, consequently, are more suitably handled on increasingly coarser grids. The u-series can thus be implemented at a reduced computational cost by computing its far part using the multilevel summation method of Skeel et al.36–38
The practicality of the reduced-latency real-space algorithm depends critically on the value of N, because both bandwidth and computation scale in proportion to N. Fortunately, periodicity causes the Gaussians that are broader than the longest side of the unit cell to self-cancel, and we can show (supplementary material, Appendix F) that the relative error for the far part of the periodic energy when truncated to N terms is bounded by
where L = max{Lx, Ly, Lz}, generally leading to small values of N. Consider, as an example, the DPPC system, for which L = 70 Å. For the C1-continuous u-series with b = 2, we have σ ≈ 4.02 Å. The upper bound provided by Eq. (33) is 1.5 × 10−5 for N = 4 and a tiny 1.1 × 10−26 for N = 5. Given the typical accuracy required for MD simulations, N = 4 suffices for this particular system.
VI. SUMMARY AND CONCLUSION
The u-series decomposes 1/r into a sum of near and far parts such that the resulting approximation has a number of desirable properties. It is smooth on the entire real axis, is exact up to a cutoff radius rc, and has a uniformly bounded relative error for r > rc. These factors contribute to the u-series being more accurate than the Ewald decomposition for a given amount of computational effort. This in turn lends the u-series a computational advantage. In our experiments, roughly half as much work was required to achieve the same degree of accuracy as the Ewald decomposition.
Because the u-series is exact up to rc, the error for the interaction energy of a particle with all neighbors less than rc away is identically zero. Most of the error arises from the interactions with particles located immediately outside the ball r ≤ rc. The u-series method is thus potentially less accurate for physical systems (such as the lipid bilayers used in our numerical experiments) in which some charge structure is maintained on scales larger than the cutoff radius. Nonetheless, the u-series is a method with fast convergence in the limit b → 1 for any physical system. The error decays at a rate proportional to the rhs of Eq. (14) in the limit b → 1 for fixed rc, and polynomially fast in rc for fixed b.
The u-series is constructed so that its far part is a sum of separable functions, and we have shown that, with periodic boundary conditions, only a small number of summation terms need to be evaluated in order to compute the far part to machine precision. This enables the use of minimal-latency algorithms wherein the three-dimensional convolutions with each of these Gaussians are computed in parallel in real space as a product of three one-dimensional convolutions. This algorithmic advantage of the u-series for massively parallel supercomputers will likely become more important over time as computational density and communication bandwidth continue to scale more rapidly than communication latency. Section V describes only one possible algorithm for exploiting the separability of the u-series (one that has been implemented in Anton 2),39 leaving room for further algorithmic research.
Among the u-series variants that we have presented, the C1 construction appears to be the most useful in practice for MD simulations of biophysical systems. Our experience has been that most simulations are well served by the C1 construction with b = 2, which is sufficiently accurate even for electrostatic cutoff radii as low as rc = 8 Å. The parameters for the C1 construction with b = 2 are given by the second row of supplementary material, Table 2, and represent our recommended u-series parameterization for most MD simulations carried out near or above room temperature. When the accuracy of the electrostatics is the primary concern, the C2 construction may be used; even the least accurate C2 parameterization given in supplementary material, Table 1 (b = 1.629…), is more accurate than the Ewald decomposition with the same cutoff radius and has a comparable computational cost.
In this work, we have focused on the decomposition of the Coulomb potential 1/r, but the bilateral series approximations in Sec. III can be used more generally for 1/rα, as noted by Beylkin and Monzon.21,22 This in turn leads to u-series decompositions for 1/rα, which may prove useful, for example, for splitting the Lennard-Jones dispersion interaction 1/r6. Similarly, although the present work focuses exclusively on decompositions based on Gaussian series approximations, supplementary material, Appendix A shows how the BSA can be constructed based on the series of arbitrary well-behaved functions ϕ not limited to Gaussians, leading to alternative formulations. Although Gaussians have several desirable properties for our purposes, formulations based on alternative functions may be useful in other contexts.
The present work leaves open some mathematical questions about the u-series that would benefit from further investigation. We have shown that the pointwise convergence of the u-series is exponentially fast as b approaches 1 at constant σ. We have not proven, however, that this convergence for the Coulomb kernel implies similarly fast convergence for the total energy. While pointwise convergence typically entails convergence of integrals, a more precise statement and formal proof of the convergence of the total energy would be desirable. Additionally, for constant b, the u-series converges to 1/r in the limit rc → ∞. The numerical evidence presented in supplementary material, Appendix E, suggests that the total electrostatic energy converges polynomially fast, with a degree one larger than the degree of smoothness of the u-series, but we do not have a proof for this conjecture. Mathematical investigations beyond the scope of this paper are thus warranted.
SUPPLEMENTARY MATERIAL
See the supplementary material for the following appendices, tables, and figures. Appendix A: Convergence of the BSA to 1/rα; Appendix B: BSA error bounds; Appendix C: The BSA in fourier space; Appendix D: Existence of a root in Eq. (21); Appendix E: Impact of rc on u-series vs Ewald; Appendix F: Error bound on distant energy in terms of series length; Table 1: Parameters for the first three C2-continuous solutions arising from the C1 construction defined by Eq. (24), along with their relative error bound Mb; Table 2: Parameters for the C0, C1, and C2 constructions with b = 2 (Mb = 2.289 × 10−3); Fig. A1: Root-mean-square deviations (RMSDs) of the errors for the electrostatic energies as functions of the electrostatic cutoff radius rc; Fig. A2: RMSDs of energy errors for two u-series sharing the same set of parameters (rc, rc/σ), but different levels of smoothness.
ACKNOWLEDGMENTS
The authors thank Michael Eastwood and Cliff Young for helpful early discussions, and Mollie Kirk and Berkman Frank for editorial assistance.
This study was conducted and funded internally by D. E. Shaw Research, of which D.E.S. is the sole beneficial owner and Chief Scientist and with which all the authors are affiliated.