We present an approach to accelerate real-space electronic structure methods several fold, without loss of accuracy, by reducing the dimension of the discrete eigenproblem that must be solved. To accomplish this, we construct an efficient, systematically improvable, discontinuous basis spanning the occupied subspace and project the real-space Hamiltonian onto the span. In calculations on a range of systems, we find that accurate energies and forces are obtained with 8–25 basis functions per atom, reducing the dimension of the associated real-space eigenproblems by 1–3 orders of magnitude.
Over the course of the past few decades, quantum mechanical calculations have become a linchpin of materials research by virtue of the fundamental insights they provide and predictive power they afford. Of the many formulations employed, Kohn-Sham density functional theory (KS-DFT)1 has become the most widely used in practice due to its generality, relative simplicity, and high accuracy-to-cost ratio.2,3 However, while less costly than wavefunction based alternatives, the solution of the Kohn-Sham equations remains a formidable task, severely restricting the range of physical systems that can be investigated. These restrictions become even more acute in molecular dynamics investigations2 wherein the Kohn-Sham equations may be solved tens or hundreds of thousands of times to reach the time scales relevant to phenomena of interest.
The planewave pseudopotential method4 has been among the most widely used methods for the solution of the Kohn-Sham equations. By virtue of its Fourier basis, the method is accurate, simple to use since it relies on a single convergence parameter, and highly efficient on moderate computational resources employing well optimized Fast Fourier Transforms (FFTs). However, the use of a Fourier basis limits the method to periodic boundary conditions so that finite systems, such as molecules and clusters, as well as semi-infinite systems, such as nanotubes and surfaces, require the introduction of artificial periodicity with large vacuum regions between periodic replicas. This limitation also necessitates a neutralizing background density to avoid Coulomb divergences when treating charged systems. Moreover, the method’s reliance on Fourier transforms hampers scalability on parallel computing platforms, thus limiting the system sizes and time scales that can be reached.
To overcome the limitations of the planewave method, a number of approaches employing systematically improvable, localized representations have been developed over the last two decades.5–9 Of these, perhaps the most mature and widely used to date are the finite-difference methods.6,8 These methods maximize computational locality by discretizing all quantities of interest on a real-space grid. In so doing, both Dirichlet and periodic boundary conditions can be accommodated, thus enabling efficient and accurate treatment of finite, semi-infinite, and charged systems, as well as bulk 3D systems. Moreover, convergence is controlled by a single parameter and large-scale parallel computational resources can be efficiently leveraged by virtue of the method’s simplicity, locality, and freedom from communication-intensive transforms such as FFTs. However, the large number of grid points required to obtain accurate energies and forces (∼400–30 000/atom), and lack of efficient preconditioners for iterative solutions have hampered the competitiveness of real-space methods in practice.
Significant advances in recent years have accelerated real-space methods substantially. Since pioneering work in the 1990s,10–14 the number of grid points required to obtain accurate energies and forces has been notably reduced by double-grid techniques,15 ultrasoft pseudopotential16 and projector augmented wave17 formulations, high-order integration,18 and nonlocal force reformulation.19,20 Moreover, the lack of efficient preconditioners has been resolved by replacing traditional iterative solvers with Chebyshev-polynomial filtered subspace iteration (CheFSI),21 eliminating the need for preconditioning entirely. With these and other advances, real-space methods have been applied to systems containing thousands of atoms,22 and have now outperformed established planewave codes in applications to both finite19 and extended20 systems.
In this paper, we present an approach to accelerate current state-of-the-art real-space electronic structure methods several fold, without loss of accuracy, by systematically reducing the cost of the key computational step: the determination of Kohn-Sham orbitals spanning the occupied subspace.49 We do this by systematically and substantially reducing the dimension of the discrete eigenproblem that must be solved. To accomplish this, we leverage a recent breakthrough in electronic structure methodologies: discontinuous basis sets.9 By allowing the basis to be discontinuous, it can be simultaneously strictly local, orthonormal, systematically improvable, and highly efficient, requiring just tens of basis functions per atom to attain chemical accuracy (mHa/atom and mHa/bohr in energy and forces, respectively). We construct such a basis spanning the occupied subspace to the desired accuracy by solving local Kohn-Sham problems on the same grid as used for the global Kohn-Sham problem. The result is a discrete, discontinuous basis spanning the occupied subspace of the real-space Hamiltonian. We then project the large, sparse Hamiltonian onto the subspace, thus reducing its dimension substantially without loss of accuracy in the relevant subspace. Upon solving the reduced problem, the full-matrix eigenvalues and vectors are readily recovered and the remainder of the real-space calculation proceeds unmodified. The method is thus straightforward to implement and use. Given these key ingredients, we refer to the approach as discrete discontinuous basis projection (DDBP). In calculations of quasi-1D, quasi-2D, and bulk metallic systems, we find that accurate energies and forces are obtained with 8–25 projection basis functions per atom, reducing the dimension of full-matrix eigenproblems by 1–3 orders of magnitude.
As noted above, we compute and employ an adaptive local basis (ALB)9 in real space to accurately and efficiently span the occupied subspace. Localized bases are commonly employed to reduce degrees of freedom while retaining locality in linear-scaling methods, e.g., Refs. 23–28, and can be highly efficient in doing so; however, they are generally nonorthogonal, limited to systems with a band gap, and nontrivial to improve.29,30 Recent efforts47,48 have begun to explore the application of such bases to metallic systems. In contrast, as detailed in Refs. 9 and 31–34, by relaxing the constraint of continuity, ALBs can be highly efficient while strictly local, orthonormal, systematically improvable, and applicable to metallic and insulating systems alike. In addition, being determined by local eigenfunctions in each element, ALBs can be used in conjunction with the usual self-consistent field (SCF) method to efficiently solve the global Kohn-Sham problem. The filter diagonalization method35 develops a highly efficient contracted basis by decomposition of Hamiltonian and overlap matrices. It too can be used in conjunction with the SCF method and is applicable to metallic and insulating systems alike. However, the resulting contracted basis is nonorthogonal and substantially less localized than ALBs. In addition, demonstrations of the efficacy of the method have so far been limited to underlying bases that are nontrivial to improve, e.g., Gaussians36 and pseudo-atomic orbitals (PAOs).37 The divide-and-conquer (DC)38,39 and fragment40 methods employ a subdomain decomposition as well. However, rather than a fixed uniform partition, independent of atomic configuration as for ALBs, the DC method employs a configuration-dependent partition into subsystems and approximates the global solution by solutions in each subsystem. Thus, while highly efficient, it is not systematically improvable.
Hence, while a number of efficient, localized bases might be employed to span the occupied (and partially occupied) subspace, the ALB provides strict locality, orthonormality, systematic improvability, and high efficiency simultaneously; all of which we leverage in the present work.
II. DISCRETE DISCONTINUOUS BASIS PROJECTION METHOD
We consider the solution of the Kohn-Sham equations for a system of Na atoms in a domain Ω discretized on a uniform grid in real space. In each iteration of the SCF solution, the following linear eigenproblem must be solved:
$H \psi_n = \lambda_n \psi_n, \quad n = 1, 2, \ldots, N_s, \qquad (1)$

where $H$ is the discrete Hamiltonian, $\psi_n$ are the discrete Kohn-Sham orbitals with energies $\lambda_n$, and $N_s$ is the number of occupied states. This constitutes a large, sparse eigenproblem of size $N_d \times N_d$, where $N_d \gg N_s$. As the system size grows, the solution of this problem becomes the bottleneck, regardless of the discretization or solution strategy employed.
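To make the structure of eigenproblem (1) concrete, the following sketch builds a small one-dimensional finite-difference Hamiltonian and extracts its lowest states with an iterative sparse solver. All sizes and the potential are illustrative placeholders, not parameters from the calculations reported here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Illustrative parameters: Nd grid points, Ns occupied states, mesh size h.
Nd, Ns, h = 400, 8, 0.1

# Second-order central finite-difference Laplacian with periodic BCs.
main = -2.0 * np.ones(Nd)
off = np.ones(Nd - 1)
lap = sp.diags([main, off, off], [0, 1, -1], format="lil")
lap[0, -1] = lap[-1, 0] = 1.0          # periodic wrap-around terms
lap = lap.tocsr() / h**2

# Effective potential on the grid (a smooth placeholder here).
x = h * np.arange(Nd)
V = sp.diags(np.cos(2.0 * np.pi * x / (Nd * h)))

H = -0.5 * lap + V                      # discrete Hamiltonian, Nd x Nd, sparse

# Lowest Ns eigenpairs of the large sparse problem H psi_n = lambda_n psi_n.
lam, psi = eigsh(H, k=Ns, which="SA")
```

Even in this toy setting, only the lowest few of the 400 eigenpairs are needed, which is what makes projection onto a small subspace profitable.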
To reduce the dimension of the real-space eigenproblem (1), we begin by constructing an efficient, systematically improvable basis spanning the occupied subspace to desired accuracy. In order to obtain strict locality and orthonormality, we compute an ALB9 in real space. For this purpose, the domain Ω is partitioned into non-overlapping elements $E_k$, $k = 1, 2, \ldots, N_E$, with the collection of grid points encompassed by $E_k$ denoted by $\mathcal{E}_k$. In doing so, $\bigcup_{k=1}^{N_E} \mathcal{E}_k$ comprises all grid points in Ω, with $\mathcal{E}_k \cap \mathcal{E}_{k'} = \emptyset$ for k ≠ k′. The extended element $E_k^0 \supset E_k$ is defined to be the uniform extension of $E_k$ in all directions, with $E_k^0 \setminus E_k$ referred to as the buffer region.50
As illustrated in Fig. 1, the construction of the basis during an SCF iteration proceeds as follows. First, we solve the Kohn-Sham equations in each extended element $E_k^0$, yielding discrete equations

$H_k^0 v_{k,j}^0 = \lambda_{k,j}^0 v_{k,j}^0, \quad j = 1, 2, \ldots, N_{b,k}, \qquad (2)$

where $N_{b,k}$ is the number of basis functions in $E_k$ and $H_k^0$ is formed by restricting the potential in Ω to $E_k^0$, with either periodic or Dirichlet boundary conditions on the Laplacian. Second, we restrict the basis so obtained on $E_k^0$ to $E_k$ by truncating it outside $E_k$, i.e.,

$\tilde{v}_{k,j} = \left. v_{k,j}^0 \right|_{E_k}. \qquad (3)$
Third, we orthonormalize the basis in $E_k$,

$v_{k,j} = \sum_{i=1}^{N_{b,k}} c_{k,ji} \, \tilde{v}_{k,i}, \qquad (4)$

where the coefficients $c_{k,ji}$ are chosen such that the $v_{k,j}$ are orthonormal within $E_k$. Since $v_{k,j}$ has zero overlap with $v_{k',j'}$ for k ≠ k′, the basis $\{v_{k,j}\}$, $k = 1, 2, \ldots, N_E$, $j = 1, 2, \ldots, N_{b,k}$, is orthonormal on the global domain Ω.
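The three-step construction above (local solve on extended elements, truncation to the element, re-orthonormalization) can be sketched in one dimension as follows. Element counts, buffer size, and the potential are hypothetical, periodic boundary conditions are used on each extended element, and QR factorization is taken as one convenient way to realize the orthonormalization of Eq. (4).

```python
import numpy as np

# Hypothetical 1D setup: Nd grid points, NE elements, buffer of nbuf points,
# nb local basis functions kept per element. All sizes illustrative.
Nd, NE, nbuf, nb, h = 240, 6, 12, 5, 0.1
pts = Nd // NE                                   # grid points per element

def dense_H(n, v):
    """Dense FD Hamiltonian on n points with potential v, periodic BCs."""
    lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    lap[0, -1] = lap[-1, 0] = 1.0
    return -0.5 * lap / h**2 + np.diag(v)

x = h * np.arange(Nd)
Veff = np.cos(2.0 * np.pi * x / (Nd * h))        # placeholder potential

cols = []
for k in range(NE):
    # Extended element: element grid points plus buffer on both sides.
    idx = np.arange(k * pts - nbuf, (k + 1) * pts + nbuf) % Nd
    Hk = dense_H(idx.size, Veff[idx])            # local Kohn-Sham problem, Eq. (2)
    _, vk = np.linalg.eigh(Hk)
    # Truncate the lowest nb local eigenvectors to the element proper, Eq. (3)...
    trunc = vk[nbuf:nbuf + pts, :nb]
    # ...and re-orthonormalize within the element (QR factorization), Eq. (4).
    q, _ = np.linalg.qr(trunc)
    block = np.zeros((Nd, nb))
    block[k * pts:(k + 1) * pts, :] = q
    cols.append(block)

V = np.hstack(cols)                              # Nd x (NE*nb) global basis matrix
```

Because the blocks occupy disjoint rows, orthonormality within each element implies orthonormality of the assembled basis on the whole domain.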
Next, we project the Hamiltonian onto the subspace spanned by the discrete, strictly local, orthonormal basis generated above to arrive at the following reduced eigenproblem:

$\tilde{H} \tilde{\psi}_n = \lambda_n \tilde{\psi}_n, \quad n = 1, 2, \ldots, N_s, \qquad (5)$

where $\tilde{H} = V^{\mathrm{T}} H V$, with $V$ denoting the $N_d \times N_b$ matrix whose columns are the basis vectors $v_{k,j}$, $k = 1, 2, \ldots, N_E$, $j = 1, 2, \ldots, N_{b,k}$. The dimension of this reduced problem is $N_b \times N_b$ with $N_b = \sum_{k=1}^{N_E} N_{b,k}$, which is significantly smaller than the dimension of the original real-space eigenproblem (1).
Finally, upon solution of the reduced problem (5), the corresponding full-matrix vectors are readily recovered as $\psi_n = V \tilde{\psi}_n$, whereupon the remainder of the real-space calculation proceeds unmodified.
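A minimal sketch of the projection and recovery steps, using a random symmetric matrix as a stand-in Hamiltonian: when the span of the basis contains the occupied subspace, the lowest eigenpairs of the reduced problem reproduce those of the full problem exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
Nd, Ns, Nb = 300, 6, 20                 # illustrative sizes: Nb << Nd

# A stand-in symmetric "Hamiltonian" for demonstrating the mechanics.
A = rng.standard_normal((Nd, Nd))
H = 0.5 * (A + A.T)

lam_full, psi_full = np.linalg.eigh(H)

# Basis whose span contains the occupied subspace (plus extra vectors),
# mimicking an ALB that resolves the occupied states to high accuracy.
span = np.hstack([psi_full[:, :Ns], rng.standard_normal((Nd, Nb - Ns))])
V, _ = np.linalg.qr(span)               # Nd x Nb, orthonormal columns

# Project: Ht = V^T H V, an Nb x Nb problem instead of Nd x Nd.
Ht = V.T @ H @ V
lam_red, psit = np.linalg.eigh(Ht)

# Recover full-grid orbitals: psi_n = V @ psit_n.
psi = V @ psit[:, :Ns]
```

In an actual DDBP calculation the basis only spans the occupied subspace to a controllable accuracy, so the recovered eigenpairs converge to the full-grid ones as basis functions are added.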
Obtaining the reduced problem from the discrete real-space Hamiltonian rather than from the differential operator9 yields significant advantages. First, because continuity and boundary conditions are already incorporated in the discrete Hamiltonian, the discrete basis need not satisfy either: it must merely span the desired subspace. By contrast, when discretizing the differential operator in a discontinuous basis, continuity and boundary conditions are incorporated via penalty and surface terms, thus requiring surface integrals to be computed and penalty parameters to be set controlling the degree of continuity at interelement boundaries.9 Second, the reduced Hamiltonian obtained by the present approach has significantly smaller spectral width than the full Hamiltonian, thus accelerating iterative solution methods. By contrast, when discretizing the differential operator directly, the penalty parameter must be sufficiently large to ensure the stability of the approximation,9 leading to larger spectral widths than for corresponding planewave and finite-difference discretizations.33 Finally, in the present approach, all computations are carried out on a single, uniform real-space grid; whereas when discretizing the differential operator, separate discretization and quadrature grids are employed, making convergence less straightforward.
We now consider the reduction in computational cost afforded by DDBP. For this purpose, let us denote the finite-difference order by $n_o$ and the partition dimension by $D$. In addition, let $n_d = N_d/N_a$, $n_b = N_b/N_a$, and $n_a = N_a/N_E$. The asymptotic cost of the electron density calculation using iterative diagonalization21,41 is a factor of $n_d/n_b$ smaller for the reduced eigenproblem. Whereas for systems of moderate size, where matrix-vector multiplies dominate computational cost, the efficiency is determined by the number of nonzeros in the Hamiltonian, which is a factor of $(6 n_o + 1)\, n_d / [(2D + 1)\, n_a n_b^2]$ smaller for the reduced problem.51 This translates to speedups of up to $(n_d/n_b)^2$ for linear-scaling methods42 in which the density matrix is expressed directly in real space.
As we show below, both $n_a$ and $n_b$ are $\mathcal{O}(10)$ in practice. Therefore, the prefactor associated with DFT calculations for both diagonalization and linear-scaling methods is significantly reduced. This is a consequence of (i) the reduced number of operations when starting from $\tilde{H}$ compared to $H$, as discussed above, (ii) the smaller spectral width of $\tilde{H}$ compared to $H$, (iii) the block nature of $\tilde{H}$, which enables efficient dense linear algebra, and (iv) the minimal wall time for basis generation, which scales linearly with the system size and is naturally parallel. Overall, the above estimates suggest that DDBP may provide up to an order of magnitude speedup for diagonalization-based methods and up to two orders of magnitude speedup for linear-scaling methods.
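As a quick check of the dimension reductions these estimates imply, one can tabulate $n_d/n_b$ using the per-atom grid sizes reported below; the per-system basis sizes used here are assumptions within the reported 8–25 range, not values stated in the text.

```python
import math

# Per-atom grid sizes n_d from the results section; basis sizes n_b are
# assumed values within the reported 8-25 functions/atom range.
systems = {"CNT": (19763, 25), "silicene": (3488, 15), "fcc Al": (432, 8)}

for name, (nd, nb) in systems.items():
    reduction = nd / nb                       # eigenproblem dimension ratio
    orders = math.log10(reduction)
    print(f"{name:9s} n_d/n_b = {reduction:7.1f} (~{orders:.1f} orders of magnitude)")
```

The resulting ratios fall between roughly 50 and 800, consistent with the stated 1–3 orders of magnitude reduction in eigenproblem dimension.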
III. RESULTS AND DISCUSSION
To demonstrate the accuracy and efficiency of DDBP, we implemented the approach in the M-SPARC prototype code, a serial matlab implementation of the large-scale parallel real-space code, SPARC.19,20 In order to demonstrate the generality of the approach, and its particular efficiency in lower-dimensional calculations, we consider three systems with distinct electronic structure and dimensionality: a (3,0) carbon nanotube (CNT) with bond length 2.68 bohr, a buckled silicene sheet with bond length 4.25 bohr, and fcc aluminum with lattice constant 7.78 bohr. The atoms are perturbed by up to ten percent of the bond length, as is typical in relaxation and dynamics. In all calculations, we employ norm-conserving Troullier-Martins pseudopotentials,43 the local density approximation (LDA) with Perdew-Wang parametrization,44 and Fermi-Dirac smearing of 10−3 Ha to accommodate partial occupation.
In the real-space method, we employ a twelfth-order central finite-difference approximation with mesh sizes of 0.20, 0.40, and 0.65 bohr for the CNT, silicene, and aluminum systems, respectively. The resulting discretization errors in the energy and atomic forces are within mHa/atom and mHa/bohr, respectively, as is typical in applications. In the DDBP method, we employ D = 1-, 2-, and 3-dimensional partitions with ∼12, 4, and 4 atoms/element for the CNT, silicene, and aluminum systems, respectively. Equations (1), (2), and (5) are subject to periodic boundary conditions and solved via Chebyshev-polynomial filtered subspace iteration,21 with orthonormalization via Cholesky factorization.33 The SCF iteration is accelerated using the Periodic Pulay method.45
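A bare-bones sketch of the solver cycle just described (Chebyshev filtering, Cholesky orthonormalization, Rayleigh-Ritz), applied to a random symmetric stand-in Hamiltonian; this is not the SPARC implementation. For simplicity the filter bounds are taken from the exact spectrum, whereas production codes estimate them, e.g., with a few Lanczos steps.

```python
import numpy as np

def chebyshev_filter(H, X, m, a, b):
    """Degree-m Chebyshev filter damping spectral components in [a, b]
    (the unwanted interval) and amplifying those below a, via the
    standard three-term recurrence on the shifted/scaled Hamiltonian."""
    e = (b - a) / 2.0                  # half-width of damped interval
    c = (b + a) / 2.0                  # its center
    Y = (H @ X - c * X) / e
    for _ in range(2, m + 1):
        Ynew = 2.0 * (H @ Y - c * Y) / e - X
        X, Y = Y, Ynew
    return Y

rng = np.random.default_rng(1)
n, ns = 200, 10                        # illustrative matrix and subspace sizes
A = rng.standard_normal((n, n))
H = 0.5 * (A + A.T)

lam = np.linalg.eigvalsh(H)            # exact bounds, for demonstration only
a, b = lam[ns], lam[-1]                # interval enclosing unwanted spectrum

X = rng.standard_normal((n, ns))       # initial subspace guess
for _ in range(50):                    # filtered subspace iterations
    X = chebyshev_filter(H, X, m=10, a=a, b=b)
    # Orthonormalize via Cholesky factorization of the overlap matrix.
    L = np.linalg.cholesky(X.T @ X)
    X = np.linalg.solve(L, X.T).T      # X <- X L^{-T}
    # Rayleigh-Ritz: rotate the subspace toward approximate eigenvectors.
    theta, Q = np.linalg.eigh(X.T @ H @ X)
    X = X @ Q
```

The lower polynomial degree sufficient for the reduced problem (10 versus 20–40) reflects its smaller spectral width: the filter interval [a, b] is narrower, so fewer recurrence steps achieve comparable damping.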
In the construction of the DDBP basis, the buffer region must be sufficiently large to minimize boundary condition effects within elements but not so large as to introduce contributions from distant atoms into the local basis. In practice, an efficient buffer size can be determined for a given system by a series of calculations on representative smaller systems. In so doing, one generally finds that the accuracy of the basis is insensitive to the specific choice beyond ∼5 bohr. For our prototype serial calculations, however, the cost of basis construction is significant (it is not parallelized over elements), so we chose smaller buffer sizes when sufficient. In particular, we found buffer sizes of 4.02, 5.10, and 3.24 bohr (nearest grid points to 4, 5, and 3 bohr) to be sufficient for the CNT, silicene, and fcc Al systems, respectively, and used these in all calculations. In a parallel calculation, this optimization could be omitted if desired by simply using a buffer size of 5 bohr for all systems.
Figure 2 shows the convergence of DDBP total energies and forces to standard real-space results for the CNT, silicene, and aluminum systems with Na = 36, 36, and 108 atoms, respectively. We observe near exponential convergence with respect to the number of basis functions per atom for all three systems, with increasing prefactor as the dimension of the partition D is increased. This is a consequence of the ability to apply exact boundary conditions in more directions for lower-dimensional partitions. Significantly, we see that just 8–25 basis functions/atom are sufficient to achieve chemical accuracy in both energy and atomic forces. This is to be compared with the ∼19 763, 3488, and 432 grid points/atom required by the standard real-space method without projection for the CNT, silicene, and aluminum systems, respectively. Projection thus reduces the dimension of the eigenproblem that must be solved by 1–3 orders of magnitude.
Next, to get a sense of the performance gains to be expected in practice from the substantial reduction in eigenproblem dimension, corresponding reduction in spectral width, and ability to use dense algebra, we examine timings for the CNT, silicene, and aluminum systems. Specifically, we employ sufficiently many projection basis functions/atom, within the 8–25 range established above, to ensure chemical accuracy for each system. When solving the full real-space eigenproblem (1), we employ Chebyshev polynomial degrees of 40, 20, and 20 for the CNT, silicene, and aluminum systems, respectively. For the reduced eigenproblem (5), a degree of 10 is sufficient in all cases due to the significantly reduced spectral width of the reduced Hamiltonian (Table I).52 We note that the reduction in spectral width is most pronounced for finer meshes for which full-Hamiltonian spectral widths are largest. The present projection approach can thus be expected to yield the most significant gains in execution time for systems involving one or more “hard” atoms, such as first-row and transition elements.
TABLE I. Spectral widths of the full and reduced Hamiltonians for the CNT, silicene, and fcc Al systems.
Figure 3 shows the timings obtained using our serial prototype implementation with and without projection as a function of system size. We see that, even for the serial implementation, projection yields significant speedups as system sizes increase, with factors of 1.6, 2.7, and 2.4 for the largest CNT, silicene, and aluminum systems considered, respectively. However, since basis generation is both linear-scaling and naturally parallel, the time for basis generation will vanish as the problem size and/or number of processors increases. Hence, to get a better sense of the gains to be expected in a parallel implementation, we also show in Fig. 3 the execution time excluding basis generation, yielding factors of 36.3, 25.0, and 4.9 for the largest CNT, silicene, and aluminum systems considered, respectively. Of particular note is the factor of 4.9 speedup for the bulk Al system: since Al has a soft potential and correspondingly coarse mesh, speedups for more typical potentials and finer meshes may be expected to be substantially greater.
Finally, to illustrate the utility of DDBP in challenging physical applications, we compute the vacancy formation energy (VFE) in silicene. Since such calculations must resolve small differences in large energies, they pose a stringent test. For this purpose, we choose a 160-atom supercell of silicene and create a vacancy, leaving three dangling bonds. Using the same computational parameters in DDBP as in the preceding silicene calculations, we find a VFE of 2.94 eV, within 2 mHa of the full real-space result. While such accuracy is often sufficient, it can be increased further if desired by augmenting the projection basis. For example, increasing from 15 to 25 basis functions/atom brings agreement to within 0.2 mHa of the full real-space result.
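For reference, a minimal sketch of the vacancy formation energy evaluation under the standard supercell convention; the total energies below are placeholders chosen only to land at the reported 2.94 eV, not values from the calculations.

```python
# Vacancy formation energy from total energies, standard supercell convention:
#   E_vf = E(N-1 atoms, vacancy) - (N-1)/N * E(N atoms, perfect).
HA_TO_EV = 27.211386  # Hartree to eV conversion

def vacancy_formation_energy(E_defect, E_perfect, N):
    """E_defect: total energy of the (N-1)-atom defective cell (Ha);
    E_perfect: total energy of the N-atom perfect cell (Ha)."""
    return E_defect - (N - 1) / N * E_perfect

# Placeholder energies constructed so the result matches the reported value.
N = 160
E_perf = -160 * 4.0                          # hypothetical total energy, Ha
E_def = (N - 1) / N * E_perf + 2.94 / HA_TO_EV

E_vf_eV = vacancy_formation_energy(E_def, E_perf, N) * HA_TO_EV
print(f"VFE = {E_vf_eV:.2f} eV")
```

Because the formation energy is a small difference of two large totals, mHa-level consistency between the defective and perfect cells, as demonstrated for DDBP above, is essential.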
IV. CONCLUDING REMARKS
In summary, we have presented an approach to accelerate real-space electronic structure methods several fold, without loss of accuracy, by systematically reducing the dimension of the discrete eigenproblem that must be solved, via projection in a highly efficient discontinuous basis. Upon solving the reduced problem, the full-matrix eigenvalues and vectors are readily recovered and the remainder of the real-space calculation proceeds unmodified. The result is a method which retains the essential simplicity, systematic convergence, and scalability of real-space methods, while reducing computational cost substantially. In calculations of quasi-1D, quasi-2D, and bulk metallic systems, we find that accurate energies and forces are obtained with 8–25 projection basis functions per atom, reducing the dimension of full-matrix eigenproblems by 1–3 orders of magnitude.
While the prototype serial implementation employed here already demonstrates significant gains, more significant gains will come with parallel implementation since the basis construction is both linear-scaling and naturally parallel. Basis construction time will then vanish as the problem size and/or number of processors increases. This is the focus of current work. On the other hand, with basis construction and filtering costs so reduced, the Rayleigh-Ritz step in the solution of the reduced problem will become apparent at smaller system sizes than in the solution of the full problem wherein it dominates beyond a few thousand atoms.33 Avenues to address this include recently developed spectrum slicing46 and complementary subspace34 approaches which replace the dense Rayleigh-Ritz solution with better-scaling iterative alternatives. Such a compact, sparse reduced Hamiltonian is also well suited to linear-scaling solution methods. Finally, we note that the present reduction strategy might be employed to accelerate other electronic structure methods as well via transforms to and from real space. These offer a number of opportunities for further advances.
This work was performed, in part, under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344. Support for this work was provided through the Scientific Discovery through Advanced Computing (SciDAC) program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences.
Since our aim is a general method, applicable to metals and insulators alike, we consider the standard diagonalization-based formulation, though the approach here will be advantageous in the linear-scaling context as well.
If opposite faces of $E_k$ coincide with the boundary of Ω, no buffer is employed in that direction and the boundary conditions on $E_k^0$ in that direction are the same as those on Ω.
The factor of (2D + 1) arises due to the star-type stencil associated with the central finite-difference approximation of the Laplacian, which couples each element only to its 2D nearest neighbors in a D-dimensional partition. The nonlocal projectors are typically applied in matrix-free fashion, with the associated cost being a factor of $r_c\, n_d/(n_e\, n_a n_b)$ smaller for the reduced problem, where $r_c$ is the ratio of the average number of grid points within the support of each projector to $n_d$, and $n_e$ is the average number of elements spanned by the nonlocal projectors.
The spectral width of the reduced eigenproblem increases linearly with the number of basis functions/atom for values typically employed in practice.