The evaluation of the exact [Hartree–Fock (HF)] exchange operator is a crucial ingredient for the accurate description of the electronic structure in periodic systems through ab initio and hybrid density functional approaches. An efficient formulation of periodic HF exchange in a linear combination of atomic orbitals representation presented here is based on the concentric atomic density fitting approximation, a domain-free local density fitting approach in which the product of two atomic orbitals is approximated using a linear combination of fitting basis functions centered at the same nuclei as the AOs in that product. A significant reduction in the computational cost of exact exchange is demonstrated relative to the conventional approach due to avoiding the need to evaluate four-center two-electron integrals, with sub-millihartree/atom errors in absolute HF energies and good cancellation of fitting errors in relative energies. The novel aspects of the evaluation of the Coulomb contribution to the Fock operator, such as the use of real two-center multipole expansions and spheropole-compensated unit cell densities, are also described.

Recently, there has been dramatic progress in reduced-scaling many-body formalisms for molecular electronic structure problems.1–11 Such approaches allow robust simulation of the electronic structure of large systems that surpasses the accuracy of mainstream methods, i.e., the Kohn–Sham density functional theory (KS DFT), by allowing to systematically eliminate analytic and numerical approximations. Although reduced-scaling formalisms, including many-body methods, have been well established in the solid state community,12–22 their development applicable to periodic solids is a relatively unexplored frontier in quantum chemistry when compared to the molecular setting, the few pioneering efforts notwithstanding.23–30 A significant barrier to straightforward extension of the molecular methods to the periodic setting is the different numerical representations dominant in each setting. The linear combination of atomic orbitals (LCAO) approach is dominant in molecular computations with open-boundary conditions, but it is not commonly used in the periodic context where the use of plane waves is dominant; nevertheless, bases with (strictly) local support has been used in periodic solids [linear muffin-tin orbitals (LMTO),31 finite/spectral elements,32 and numerical atomic orbitals33,34]. We note that several widely used packages, including Crystal,35 Gaussian,36,37 CP2K,38 PySCF,39 ADF-BAND,40,41 and DMol,3,42 support local basis sets in periodic calculations. The primary advantage of the LCAO representation is its compactness for atomistic matter in which the atomic electronic structure is largely preserved; the compactness of representation and the quasi-local support of most important AOs are key to practical reduced-scaling formalisms for many-body methods. Thus, the use of the LCAO representation provides an appealing alternative to the many-body formalisms in solids.

This paper focuses on the mean-field problem in the LCAO representation for periodic solids as a precursor to reduced-scaling many-body methods in such a representation. Although the basic LCAO machinery in the periodic setting has long been established,43–46 several recent efforts have focused on how to improve the efficiency of operator evaluation in the LCAO representation by density fitting (DF).24,26,34,47–52 DF is key to fast computation in the LCAO representation, whether it is at the mean-field or many-body level. Long before the AO basis set approaches numerical completeness, the AO product space becomes highly redundant; density fitting is a way to eliminate the redundancy and thus to greatly reduce the cost of operator evaluation (and for AOs that are represented purely numerically,33,34 density fitting is the only practical way to evaluate operators).21 Another way to view density fitting is as a physically motivated factorization of the two- and higher-body operator representations; by separating the tensor modes of the operator representation, the density fitting reduces the cost of operator manipulations such as transformation to the basis of single-particle eigenstates, which is key to reducing the cost of integral transformation in modern reduced-scaling many-body methods.7,8,10,11,53–55

In this paper, we describe how to improve the efficiency of the exact exchange operator construction in the periodic LCAO setting using concentric atomic density fitting (CADF) that has been successfully utilized in the molecular electronic structure by us56,57 and others.21,58–60 In Sec. II, we briefly review the basics of the density fitting approximation in the electronic structure, the periodic LCAO HF method, and the implementation of the CADF periodic exact exchange algorithm (CADF-K). Computations on various one-, two-, and three-dimensional periodic systems are discussed in Secs. III and IV. Conclusions and future work are given in Sec. V.

For the purpose of making this manuscript as self-contained as reasonable, we start out by quickly recapping the basics of density fitting and periodic LCAO mean-field methods.

Density fitting, also for historical reasons known in quantum chemistry as the resolution of the identity (RI), approximates product quasi-densities by a linear combination of density-fitting (also referred to generically as auxiliary) basis functions. The modern DF formalism was introduced and analyzed by Whitten for the purpose of approximating the two-body Coulomb integrals61,62 (for earlier DF-like approximations, see Ref. 63 and references therein). At the same time, Baerends et al.64 used Eq. (1) in the context of evaluation of the Coulomb potential of a density represented as a linear combination of two-center products in terms of one-electron functions. After much subsequent work,65–68 DF entered the mainstream of quantum chemistry due to the seminal work by Feyereisen et al.69,70 and Ahlrichs et al.71–74 Careful engineering of the density fitting basis sets enables us to make the DF error of global fitting sufficiently small and largely canceling in practically relevant energy differences; the computational speedup can be as high as one to two orders of magnitude for small-to-medium sized molecules.72,75–77

To highlight the issues with straightforward application of DF to periodic systems, consider fitting two-center AO products μνϕμ*(r)ϕν(r) by its approximate representation μν̃ as a linear combination of one-center density-fitting basis functions {X},

μνDFμν̃XCμ,νXX.
(1)

The set of {X} in Eq. (1) is a μν-independent system-spanning set in global fitting and a μν-dependent set in local fitting; clearly, DF in an infinite (e.g., periodic) system must necessarily be local to keep the number of fitting coefficients finite. The fitting coefficients are determined by minimizing the “norm” of the error density, δμ,νμνμν̃, “weighted” by an arbitrary operator Ŵ. If the fitting basis is complete, then minimizing (δμ,ν|Ŵ|δμ,ν) is equivalent to solving

Ŵδμ,ν=0.
(2)

For any positive Ŵ, such as the Coulomb operator V^ff(r)|rr| dr, this is only possible by making δμ,ν=0, i.e., μν=μν̃. For a finite fitting basis X, minimizing (δμ,ν|Ŵ|δμ,ν) is equivalent to solving the linear system

X:XŴδμ,ν=0XŴμν=YCμ,νYXŴY,
(3)

which only involves two- and three-center integrals, and hence is efficient. Comparison of Eqs. (2) and (3) reveals the dual role played by the fitting basis in density fitting in a finite basis: {X} defines not only the expansion space for the fitted density but also the projection space (in the sense of the collocation method) for solving the exact fitting problem (2). For example, the Coulomb fitting (Ŵ=V^) via Eq. (3) can be viewed as ensuring that the Coulomb potential of the error density δμ,ν is zero “at” every collocation function X. In addition, unless Ŵ = 1, “density fitting” is a misnomer: in an incomplete basis, minimizing (δμ,ν|Ŵ|δμ,ν) is equivalent to minimizing the 2-norm of the “potential” generated by Ŵ1/2 from the error density δμ,ν; since V^1/2fπ3/2f(r)|rr|2 dr, the Coulomb fitting is equivalent to minimizing the error in the electric field.

The error in a four-center Coulomb integral can be easily seen as quadratic in the density fitting errors (i.e., its DF approximation is robust78,79) if (a) Ŵ=V^ and (b) the same fitting basis set is used for the bra and the ket,

ρσV^μνρσ̃V^μν̃=δρ,σV^δμ,ν+δρ,σV^μν̃+ρσ̃V^δμ,νŴ=V^,{X}ρ,σ={X}μ,ν̲̲δρ,σV^δμ,ν,
(4)

where each component of δρ,σV^μν̃ and ρσ̃V^δμ,ν vanishes according to Eq. (3). Chemists’ notation is used for the two-electron integrals here and in the rest of this paper. Condition (b) is satisfied automatically in the global Coulomb fitting, hence its ubiquitousness. For local fitting, or global fitting of an indefinite operator Ô, a manifestly robust approximation78 can be used,

ρσÔμν=robustDFρσÔμν̃+ρσ̃Ôμνρσ̃Ôμν̃.
(5)

One of the challenges for local DF is making it accurate without introducing artifacts. Fundamentally, the problem with local fitting is that the potential generated by μν̃ is only accurate where the fitting functions are located. Thus, the local DF will introduce large errors in integrals ρσV^μν with large bra–ket distances. This issue is avoided in domain-based local DF53,80 by fitting ρσ and μν using the set of X specialized to every (ρσ, μν) combination; the resulting DF approximation is robust. This makes both ρσ̃ and μν̃ optimal for approximating ρσV^μν accurately for any ρσμν distance. The domain-based local fitting has been used for periodic systems in the context of local LCAO MP2 methods by Schutz, Usvyat, and co-workers.49,50,81–83 The domain fitting greatly increases the number of fitting problems to be solved relative to the global and purely local approaches and thus pushes the crossover with respect to the global fitting to relatively large systems. Also note that the domain DF requires integrals XV^μν, where {X} span fitting sets for all possible ρσ products, i.e., potentially the entire system. Thus, purely local fitting [where {X} are “local” to each μν] would be preferred if it can be made accurate.

The accuracy of purely local fitting can be systematically tuned by several approaches:

  • by selecting sufficiently complete fitting sets, whether by brute force21,34 or by augmenting the fitting basis with plane waves as recently done by Sun and co-workers;51 note the artifactually slow decay of fitting errors that occurs in one-dimensional systems;84 

  • by improving the accuracy of long-range potential via (a) constraining the fitted density to match the leading multipole moments as the exact density or, similarly, (b) including distant functions in the collocation fitting set in Eq. (3);85 

  • by using the robust DF approximant [Eq. (5)] to reduce the error; however, this can lead to artifacts such as loss of positivity of the Coulomb operator.58 

The accuracy of the long-range tail of the potential generated by the fitted density is especially important for the periodic systems in the context of computing the Coulomb potential due to its slow decay. This means at least constraining the charge of the fitted density to ensure the exact electrical neutrality of the unit cell and thus ensure that the Coulomb potential is finite.47,48,52,86–89 Note that for the Coulomb term, other types of methods similar to density fitting exist for molecules and solids (see, for example, Refs. 90 and 91 for an efficient, potentially exact approach that exploits numerical integration of multipolar-expanded atomic densities partitioned from the total charge density).

The long-range behavior of the exact exchange operator is less problematic, at least in insulators, due to the rapid decay of the one-particle density matrix. Therefore, it is reasonable to assume that the use of purely local density fitting would be a viable strategy even in periodic systems. Here, we extend our previous efforts56,57 on fast strongly local density fitting for exchange operator construction in molecules to the periodic setting. By strong locality, we mean that the fitting space for the product μ1μ2 is composed only of the fitting basis functions on the same atoms as μi; we dub such a local DF approximation concentric atomic DF. The idea of CADF-like fitting goes back to the work of Baerends and co-workers64 who used it to fit atomic components of the one-particle density. Recently, the concentric fitting approach, referred to as pair atomic resolution of the identity (PARI), was investigated by Merlot et al.58 in the context of Coulomb fitting; the authors were the first to point out the violation of positivity by the robust PARI approximant that led to catastrophic failures. Hollman and co-workers56 showed that the accuracy of the CADF could be dramatically improved and the nonpositivity artifacts could be partially alleviated by combining the CADF approximation for most integrals with the exact evaluation of four-center integrals, dubbed semiexact CADF (seCADF),56 as well as developed a rigorously screened linear scaling version of the CADF exact exchange builder.57 Manzer et al.59 demonstrated that the nonpositivity violations do not occur when robust PARI is only used for the exchange operator.

The above applications of CADF/PARI in molecular context were recently mirrored in the molecular21and periodic34 setting by Scheffler and co-workers using the non-Gaussian LCAO representation based on numeric atomic orbitals that have strictly local support. Unlike the earlier Gaussian-based robust CADF efforts, the use of the nonrobust local RI approach for fitting numerical AO products causes the need for the construction of high-quality fitting basis sets with improved accuracy.21 Application of robust CADF to the exact exchange construction in periodic systems has not yet been accomplished to the best of our knowledge in Gaussian or non-Gaussian LCAO representations; hence, this is the primary objective of this work.

The periodic Hartree–Fock procedure implemented in the current work is rather standard and can be found in the literature since the 1980s.44,45,92,93 The Hartree–Fock crystalline orbitals (COs) in the LCAO framework are expanded,

ψi,k(r)=μCμ,kiϕμ,k(r),
(6)

in terms of Bloch AOs, which, in turn, are translation-symmetry-adapted linear combinations of the AOs,

ϕμ,k(r)ReikRχμ(rR).
(7)

The CO coefficients Cμ,ki in this work are determined by solving the canonical Hartree–Fock equations projected on the AOs in the reference cell,

FkCk=SkCkϵk,
(8)

where Fk, Sk, and ϵk are the Fock matrix, overlap matrix, and CO (band) energies, respectively. In this work, these equations were solved for a uniform grid of k points in the first Brillouin zone, without exploiting any additional symmetries. The Fock matrix in Eq. (8) is a linear combination of conventional AO (direct space) Fock matrices,

Fμ,ν,k=ReikRFμ,νR,
(9)
Fμ,νRμ|F^|νR,
(10)

where a compound index μR is used to represent the μth AO located in cell R, i.e., χμ(rR). For AOs in the reference cell, R = 0 and thus R will be omitted for simplicity. Note that the Fock matrix is Hermitian, i.e., Fμ,ν,k=Fν,μ,k*, which requires the AO Fock matrix to obey

Fμ,νR=Fν,μ(R)*.
(11)

This condition is closely related to the requirement of translational invariance: Fμ,νR = Fμ(−R),ν.

The closed-shell Fock matrix in the AO basis is expressed in terms of the usual kinetic energy, nuclear and electronic Coulomb potential, and exchange contributions,93 

Fμ,νR=Tμ,νR+Vμ,νR+2Jμ,νRKμ,νR,
(12)
Tμ,νR=12μ|r2|νR,
(13)
Vμ,νR=GA=1Nhatomμ|ZA|rAG||νR,
(14)
Jμ,νR=GLρ,σμνR|1r12|σ(G+L)ρGDρ,σL,
(15)
Kμ,νR=GLρ,σμρG|1r12|σ(G+L)νRDρ,σL,
(16)

where A denotes the nuclear position of atom A in the reference cell and Dρ,σL denotes the density matrix. Indices R, G, and L span the infinite set of lattice vectors and are associated with pseudo-overlap distribution, Coulomb operator, and density representation, respectively. Note that the translational invariance of D,

DρG,σ(G+L)=Dρ,σL,
(17)

is exploited in Eqs. (15) and (16). The AO (direct-space) density matrix Dρ,σL is constructed from the CO coefficients as

Dρ,σL=1ΩdkeikLioccCρ,kiCσ,ki*,
(18)

where Ω is the volume of the first Brillouin zone of the reciprocal lattice. The columns of Fμ,νR (and Sμ,νR) are transformed to the Bloch AO basis for each k before solving the Hartree–Fock equations [Eq. (8)]. The Hartree–Fock energy is obtained as

EHF=Rμ,νDμ,νRFμ,νR+Tμ,νR+Vμ,νR.
(19)

All contributions to the AO Fock matrix for a gapped system can be evaluated to arbitrary finite precision ϵ with finite effort despite the infinite ranges of indices R, G, and L. First, a finite support of Gaussians limits the range of R in Eqs. (13)(15) and L in Eq. (15) to polylog of precision (i.e., ∝− logkϵ). Summation over G in the Coulombic contributions [Eqs. (14) and (15)] is slow and must be carried out to sufficiently long range to make V and J Hermitian according to Eq. (11). However, fast techniques for the Coulomb potential evaluation, such as the fast multipole method (FMM), are well established and will not be discussed here; in this work, we used the multipole approximation to evaluate the Coulomb potential efficiently (see the  Appendix for technical details of the real-valued multipole expansion employed here). Finally, G and (G + LR) in Eq. (16) have polylog ranges due to the finite support of Gaussians, whereas the range of the L sum is also polylog in gapped systems due to the exponential decay of density, Dρ,σL.94–97 However, in metallic systems, convergence of the L sum in Eq. (16) will be slow (polynomial in precision).98–100 The remaining question is that of cost. The computational effort of periodic LCAO HF is primarily dominated by the evaluation of the four-center two-electron integrals, particularly in Eq. (16) (explicit evaluation of the integrals in J, denoted as 4c-J, is only needed for the near-field contribution, i.e., small values of |G|, and can be reduced further by density fitting). Section II C will describe how the computational cost of Eq. (16) can be greatly reduced by local density fitting.

The concentric atomic density fitting (CADF) is a local approximation to Eq. (1),

μν=X(μ,ν)Cμ,νX(μ,ν)X(μ,ν),
(20)

where X(μ,ν) are the density fitting functions that are concentric with functions μ or ν. The nonrobust CADF approximation

(μν|ρσ)=nonrobust CADFX(μ,ν)Y(ρ,σ)Cμ,νX(μ,ν)(X(μ,ν)|Y(ρ,σ))Cρ,σY(ρ,σ)
(21)

is positive definite, but inaccurate with fitting basis sets of practical size; hence, robust CADF is used in this work,

(μν|ρσ)=CADFX(μ,ν)Cμ,νX(μ,ν)(X(μ,ν)|ρσ)+Y(ρ,σ)(μν|Y(ρ,σ))Cρ,σY(ρ,σ)X(μ,ν)Y(ρ,σ)Cμ,νX(μ,ν)(X(μ,ν)|Y(ρ,σ))Cρ,σY(ρ,σ).
(22)

Although robust CADF can violate positivity,58 it is safe to use for K [Eq. (16)].59 Inserting Eq. (22) into Eq. (16) leads to the following expression [Einstein’s convention is assumed for Eqs. (23)–(25)]:

Kμ,νR=CADFCμ,ρGX(μ,ρG)(X(μ,ρG)|σ(G+L)νR)+(μρG|Y(σ(G+L),νR))Cσ(G+L),νRY(σ(G+L),νR)Cμ,ρGX(μ,ρG)(X(μ,ρG)|Y(σ(G+L),νR))Cσ(G+L),νRY(σ(G+L),νR)Dρ,σL
(23)
=2Cμ,ρGX(μ,ρG)(X(μ,ρG)|σ(G+L)νR)Dρ,σLCμ,ρGX(μ,ρG)(X(μ,ρG)|Y(σ(G+L),νR))Cσ(G+L),νRY(σ(G+L),νR)Dρ,σL
(24)
=Qμ,σLXGFσL,νRXG,
(25)

where

Qμ,σLXGGρCμ,ρGX(μ,ρG)Dρ,σL,
(26)
FσL,νRXG2(X(μ,ρG)|σ(G+L)νR)Y(σ(G+L),νR)(X(μ,ρG)|Y(σ(G+L),νR))Cσ(G+L),νRY(σ(G+L),νR).
(27)

Note the appearance of lattice vector indices G and L that distinguish cell indices of X and σ on the left-hand sides of Eqs. (26) and (27) from cell indices G and L on the right-hand sides. The appearance of G and L in Eq. (26) can be understood if we express Q using a translationally redundant density representation,

Qμ,σLXGGρCμ,ρGX(μ,ρG)DρG,σ(G+L).
(28)

Due to the dependence of X(μ,ρG) and σ(G + L) on G, summation over G produces nonzero elements of Q with functions X and σ spread in a range of cells, with the respective ranges labeled G and L. In contrast, in Eq. (27), indices G and L are introduced simply as notational convenience. Note that only translationally unique components of FσL,νRXG need to be evaluated,

FσL,νRXG=Fσ(LR),νX(GR).
(29)

Note that Eq. (24) assumes the equality of the first two terms on the right-hand side of Eq. (23). This assumption is satisfied, given a sufficiently large lattice range for the G and L sums. Indeed, consider product |μνR1) and density Dρ,σR2. As discussed above, the maximum range of R1 and R2 are polylog in precision due to the finite support of Gaussians and rapid decay of the density in gapped systems. The two range extents are denoted as Romax and Rdmax, respectively, i.e., |R1||Romax| and |R2||Rdmax|, respectively. In this work, Romax is determined before SCF starts, while Rdmax is updated in every SCF iteration based on the specified precision thresholds (see Sec. III). Therefore, the ranges of lattice vectors in Eqs. (23)–(27) are as follows:

  • |G||Romax| and |L||Rdmax| because G and L are involved in pseudo-overlap and density representations, respectively.

  • |R|2|Romax|+|Rdmax| because cell R is linked to cell 0 via

μ|Romax|ρG|Rdmax|σ(G+L)|Romax|νR.
(30)
  • |G||Romax|. Since {XG} spans the union of density fitting functions used to represent all non-negligible (μ, ρG) pairs, sums over G and G have the same extent.

  • |L||Romax|+|Rdmax|. Since {σL} spans the union of all possible σ(G + L), which must be connected to μ0 via Eq. (30), sums over L and (G + L) have the same extent.

The CADF coefficients are precomputed before the SCF iterations by solving the usual fitting linear system,

Y(μ,ρG)|μρG=X(μ,ρG)MX(μ,ρG)Y(μ,ρG)Cμ,ρGX(μ,ρG),
(31)

with the Coulomb fitting “metric” MX(μ,ρG)Y(μ,ρG)Y(μ,ρG)|X(μ,ρG). The fitting coefficients CνR,σ(G+L)Y(σ(G+L),νR) in Eq. (27) are then obtained by translation. Note that the number of non-negligible (μ, ρG) pairs for each AO μ is polylog in precision; hence, the fitting coefficient tensor is sparse. Tensor Q is also sparse in insulators due to the exponential decay of the density matrix. However, note that tensor F is not sparse due to its slow (polynomial) decay. Nevertheless, evaluation of Eq. (25) is fast because only a sparse subset of F contributes to the contraction with sparse Q in Eq. (25). Indeed, in Eq. (25), we can see that for a specific pair of (XG, σL), no contribution to Kμ,νR will be made if Qμ,σLXG is negligible for all μ. Thus, based solely on the sparsity of Q, a prescreening list of (XG, σL) pairs, named L, can be formed prior to the construction of F to reduce the computational cost. This list defines which blocks of F are computed (see Sec. III for the details of tensor blocking).

Of course, the CADF approximation is not sufficient to arrive at an efficient method since it is necessary to also screen out small contributions in every step of the CADF-K algorithm. Although it is possible to evaluate a block-sparse tensor expression such as the CADF-K expression [Eq. (23)] such that all contributions above the certain threshold ϵ are included, the necessary analysis is complicated (see Ref. 57 for how this can be done for the nonperiodic CADF-K). Here, we sacrifice the ability to control the precision of the exchange matrix by a single threshold and instead introduce separate thresholds for exploiting sparsity in individual steps. Since for the efficiency reasons, the AO tensors in our implementation are tiled (blocked) not by atoms but by sets of one or more atoms (up to an entire unit cell), the screening is adapted to work approximately the same no matter how the tiling is chosen. The following thresholds are utilized:

  • ϵschwarz: The magnitudes of the three- and four-center ERIs are estimated using the standard (Cauchy–Bunyakovsky–)Schwarz bound,101 

|(XG|μρG)|(X|X)1/2(μρG|μρG)1/2,
(32)
|(μνR|σ(G+L)ρG)|(μνR|μνR)1/2(σLρ|σLρ)1/2,
(33)

respectively. The bounds for the individual integrals are then straightforwardly combined to produce the bounds for the Frobenius norms of tiles of the ERI tensors. A tile of integrals is neglected if its estimated Frobenius norm is below ϵschwarz times the number of integrals of the tile. This is equivalent to skipping tiles with per-element (scaled) Frobenius norms less than ϵschwarz; in other words, the skipped tiles contain integrals with average magnitude below ϵschwarz.

  • ϵS: For any non-concentric orbital basis set (OBS) shell pair (μ, νR), if the per-element Frobenius norm of its overlap integral is below ϵS, i.e.,

Frobμμ,νν{μ|νR}<ϵS,
(34)

we will neglect integrals associated with the corresponding pseudo-overlap distributions, such as |μνR) in Eqs. (13)(15) and |μρG) in Eq. (31). R’s range limit, Romax, is determined when all (μ, νR) pairs for a specific R satisfy Eq. (34).

  • ϵD: If the per-element Frobenius norm of all tiles in density Dρ,σL for a given unit cell L is below ϵD, that unit cell’s density is considered to be negligible; this screening defines the Rdmax range.

  • ϵF: This parameter controls which tiles of F [Eq. (27)] are computed by defining the prescreening list L (see Algorithm 1).

  • ϵsparse: A tile of tensor Q [Eq. (26)] or F [Eq. (27)] is omitted if the per-element Frobenius norm is below ϵsparse.

The values of these parameters used in this work are detailed in Sec. III.

ALGORITHM 1

Density-based periodic CADF-K.a

1: procedure CADF-K 
2: before the beginning of SCF loops 
3: Read in orbital and density fitting basis sets 
4: Evaluate fitting coefficients C  Eq. (31) 
5: Evaluate two-center ERI, (XG|Y) where |G|2|Romax|+|Rdmax| 
6: Evaluate integral-direct three-center ERI, (XG|νσG
7: 
8: in each SCF loop 
9: D ← density matrix from previous iteration 
10: Evaluate Q  Eq. (26) 
11: L = ForceShape(Q See function definition below 
12: Evaluate translationally unique F via BuildF(L See function definition below 
13: Evaluate K  Eq. (25) 
14: 
15: end procedure 
16: function ForceShape(Q
17: forX in DFBS translated by G′ do 
18: forσ in OBS translated by L′ do 
19: forμ in OBS do 
20: ifQμ,σLXG¯F>ϵFthen 
21: Add (XG, σL) to L 
22: break 
23: end if 
24: end for 
25: end for 
26: end for 
27: returnL 
28: end function 
29: function BuildF(L
30: for (XG, σL) in Ldo 
31: for R within range (2|Romax|+|Rdmax|) do 
32: if|LR||Romax|then 
33: forν in OBS do 
34: Construct Fν,σ(LR)X(GR)  Eq. (27) 
35: end for 
36: end if 
37: end for 
38: end for 
39: return F 
40: end function 
1: procedure CADF-K 
2: before the beginning of SCF loops 
3: Read in orbital and density fitting basis sets 
4: Evaluate fitting coefficients C  Eq. (31) 
5: Evaluate two-center ERI, (XG|Y) where |G|2|Romax|+|Rdmax| 
6: Evaluate integral-direct three-center ERI, (XG|νσG
7: 
8: in each SCF loop 
9: D ← density matrix from previous iteration 
10: Evaluate Q  Eq. (26) 
11: L = ForceShape(Q See function definition below 
12: Evaluate translationally unique F via BuildF(L See function definition below 
13: Evaluate K  Eq. (25) 
14: 
15: end procedure 
16: function ForceShape(Q
17: forX in DFBS translated by G′ do 
18: forσ in OBS translated by L′ do 
19: forμ in OBS do 
20: ifQμ,σLXG¯F>ϵFthen 
21: Add (XG, σL) to L 
22: break 
23: end if 
24: end for 
25: end for 
26: end for 
27: returnL 
28: end function 
29: function BuildF(L
30: for (XG, σL) in Ldo 
31: for R within range (2|Romax|+|Rdmax|) do 
32: if|LR||Romax|then 
33: forν in OBS do 
34: Construct Fν,σ(LR)X(GR)  Eq. (27) 
35: end for 
36: end if 
37: end for 
38: end for 
39: return F 
40: end function 
a

All AO indices refer to the tiles of the corresponding AO dimensions. ¯F is the per-element Frobenius norm of the tensor tile.

Note that ϵsparse is used to screen out the insignificant tiles so that they are not used in subsequent computations. The difficult problem is how to predict which tiles are significant before computing them. For the AO integrals, this can be done by using Schwarz inequalities or other more sophisticated estimators,102 but for results of tensor contractions, this is less straightforward. The standard TiledArray approach for screening block-sparse tensor contractions103 estimates the norms of the result by using the submultiplicative property of the Frobenius norm; this, in effect, assumes that all contraction contributions to the result tile add up constructively (i.e., have the same sign). The insignificant result tiles whose estimated per-element Frobenius norm is less than ϵsparse are then not computed, but small contributions to significant result tiles are still computed. Here, we use a hybrid approach: for the significant result tiles, all contributions whose per-element Frobenius norm are estimated to be less than ϵsparse/ncontrib are not evaluated, where ncontrib is the total number of contributions from the tensor contraction to the particular given tile. Replacing ncontrib by ncontrib would guarantee that the omitted contributions affect the result’s norm by less than ϵsparse; the use of ncontrib does not provide strict guarantee but does account for the cancellation of small omitted contributions, assuming that they are normally distributed around zero. This allows us to reduce the number of evaluated contributions without any observable effect on the accuracy.

Our periodic CADF-K implementation is summarized in Algorithm 1. The orbital basis set and density fitting basis set will be denoted as OBS and DFBS, respectively, in the following:

Periodic Hartree–Fock with CADF-K has been implemented in the Massively Parallel Quantum Chemistry (MPQC) package (version 4.0.0).104 The implementation of the sparse tensor algebra is provided by our TiledArray tensor framework105 supported by the MADNESS task-based runtime.106 These components were compiled with the GCC 8.3.0 compiler. All computations were performed on a commodity cluster at Virginia Tech, each node of which has two 12-core Intel Xeon E5-2680 v3 processors (the theoretical peak performance of one node is 960 GFlops/s) and 128 GB of RAM, unless otherwise stated.

We chose polyacetylene, polyethylene, single-walled carbon nanotube (4, 0), hexagonal boron nitride monolayer (h-BN), urea monolayer, urea crystal, lithium hydride (LiH), and diamond as test cases representative of relevant dimensionalities and bond types. The structures were obtained from Refs. 107–113. In the present work, the Def2-SVP114 orbital basis set paired with the Def2-SVP-J115 density fitting basis set was utilized for all systems, except for LiH and diamond for which more computationally feasible basis sets were adopted; a double-ζ quality basis set derived from molecular cc-pVDZ116 by Lorenz and co-workers117 (denoted as CR-cc-pVDZ) was chosen for LiH and Pople’s 3-21G118 for diamond. To test the convergence of CADF-K with respect to density fitting basis sets, we also investigated regular cc-pVDZ116 OBS paired with cc-pVXZ-RI (X = D, T, Q, 5, 6)75,119 DFBS.

Table I lists the default and tight values for various thresholds. The former set is used in all our calculations unless specified otherwise, while the latter is recommended when one expects high-precision results.

TABLE I.

Default and tight values for screening thresholds used in this work.

ThresholdDefaultTightDescription
ϵschwarz 10–12 10–16 Schwarz threshold 
ϵS 10–10 10–14 Overlap threshold 
ϵD 10–8 10–12 Density truncation threshold 
ϵF 10–10 10–12 Force shape threshold 
ϵsparse 10–10 10–12 Tensor sparse threshold 
ThresholdDefaultTightDescription
ϵschwarz 10–12 10–16 Schwarz threshold 
ϵS 10–10 10–14 Overlap threshold 
ϵD 10–8 10–12 Density truncation threshold 
ϵF 10–10 10–12 Force shape threshold 
ϵsparse 10–10 10–12 Tensor sparse threshold 

The computational complexity of molecular electronic structure methods is traditionally expressed as the operation count (or, alternatively, time-to-solution) as a function of the system size N in the asymptotic regime of N. It is somewhat less common to discuss the complexity of molecular computations with respect to precision ϵ, defined as the difference between the exact solution for a given property (e.g., Hartree–Fock energy) and its approximate value obtained in practice. Precision is thus limited by the discretization (“basis set”) error as well as other numerical approximations such as screening. The effects of both kinds of numerical approximations cannot be easily separated for most representations.

The cost of computations on infinite systems can only be meaningfully analyzed in terms of scaling with the unit cell size and with respect to the precision.120 Here, we focus only on the latter and specifically focus on the effect on finite lattice summation ranges on precision; the effects of the CADF approximation and various screening thresholds on the precision is considered in Sec. IV B.

Without any screening, the cost of the exchange operator evaluation is O(Ncell3), where Ncell is the number of unit cells included in the lattice summations in Eq. (16) over R, G, and L. Due to the localized nature of Gaussian basis functions, the G and R series involved in the pseudo-overlap distributions {μ, ρG} and {ρR, σ(G + L)} decay exponentially to zero with the increase in G and R. Therefore, the L summation, which depends on the decay of the density matrix, usually predominates the computation of exchange. Nevertheless, with the decaying density matrix, the best possible scaling exponents with respect to Ncell is 0 (constant).

Figure 1 shows the effective scaling exponent of Ncell for the periodic CADF-K builder using the default cutoff thresholds (see Table S1 of the supplementary material for the wall times for each system). The effective scaling exponent k is defined by a “two-point fit” of the time-to-solution C(N) measured with size parameters Ni and Ni+1,

kNi1,NlogNiNi1C(Ni)C(Ni1).
(35)

Polyethylene (C2H4)n and polyacetylene (C2H2)n represent prototypical one-dimensional covalent periodic systems with large gaps. Computational costs of their exchange term quickly decay to zero after including a finite number of unit cells (20 for polyethylene and 80 for polyacetylene) due to the rapid (exponential) decay of contributions to all three lattice sums with respect to the lattice index magnitude. The slower decay in polyacetylene is due to its smaller bandgap and thus less localized density matrix, resulting in a slower convergence of the L summation.

FIG. 1.

Effective scaling exponent of the components of CADF-K, including the most expensive Q and F contractions [see Eqs. (26) and (27) for details], as well as the total K build in periodic Hartree–Fock for the five selected systems. The Def2-SVP/Def2-SVP-J basis pair is used.

FIG. 1.

Effective scaling exponent of the components of CADF-K, including the most expensive Q and F contractions [see Eqs. (26) and (27) for details], as well as the total K build in periodic Hartree–Fock for the five selected systems. The Def2-SVP/Def2-SVP-J basis pair is used.

Close modal

A single-walled nanotube (n, 0) has been reported to have finite but small bandgaps both in experiment and theory.121,122 The slow decay of its density matrix (reflected in the L summation) will dominate the calculation in long distance. As expected, the asymptotic scaling of nanotube with Ncell approaches linear after 40 unit cells in each direction, equivalent to 171.6 Å from the reference cell. Note that scaling behaviors of Q and F are different, i.e., the former stays linear, while the latter drops to zero after 10 cells. This is because Q and F have direct and indirect dependence on density D, respectively. For threshold ϵD = 10−8, the density decreases to a small value after 10 cells, but it is still above ϵD. As a result, the operation count of Q contraction [Eq. (26)] increases with D. On the other hand, the incremental elements of Q are so small that the function ForceShape in Algorithm 1 yields an unchanged shape for F, leading to a scaling exponent of zero for the computation of F. Numerical analysis indicates that we might have overcomputed the elements of Q due to greatly overestimating norms of its tiles. An improved norm-estimating approach for Q, especially in small-gap systems, is left for the future work.

The two-dimensional test cases, urea monolayer and h-BN, are prototypical molecular and covalently bonded crystals, respectively. The loosely packed structure of molecular crystals usually leads to large bandgaps and fast decay of density matrices. Figure 1 shows that for the urea monolayer, the scaling exponent quickly decays to zero after Ncell = 4. The covalently bonded h-BN has a similar structure as graphene but with a wide bandgap of about 5 eV.110,123–125 The total CADF-K time of h-BN stops growing after Ncell = 40. Similar to the nanotube case, Q decays slower than F for h-BN because we overestimated its tile norms, but finally it drops to zero.

In this section, we discuss the effect of the CADF approximation and screening thresholds on precision. The default values of thresholds (see Table I) are used in Sec. IV B 1, ensuring that the incompleteness of DFBS is the main source of error, i.e., further reduction of thresholds by a factor of 10 does not change the Hartree–Fock energy by more than 1 μEh. In Sec. IV B 2, to study the effect of cutoff thresholds individually, other parameters are chosen from the tight set.

1. Precision of CADF

Figure 2 shows the absolute HF energy error per atom due to the use of CADF-K relative to the exact evaluation using four-center ERIs (4c-K) for various OBS/DFBS pairs. When a matched OBS/DFBS pair is used, i.e., cc-pVDZ/cc-pVDZ-RI and Def2-SVP/Def2-SVP-J(K), errors per atom are close to or below 1 mEh, comparable to typical DF errors in absolute energies for common auxiliary basis sets.76 As the size of DFBS increases, the errors tend to reduce significantly. In particular, with the cc-pVDZ/cc-pV6Z-RI basis pair, CADF errors can be controlled well below 10 μEh per atom for tested systems. Note that the CADF error does not reduce monotonically, i.e., cc-pVQZ-RI yields larger errors than cc-pVTZ-RI. This is likely due to the fact that the cc-pVXZ-RI (X = D, T, Q, 5, 6) basis sets are designed to minimize the fitting errors for the matching cc-pVXZ basis and not to form a systematic series.

FIG. 2.

Errors per atom in total Hartree–Fock energies (in Eh) for CADF-K with respect to 4c-K for the selected one- and two-dimensional systems. The XZ/RI (X = D, T, Q, 5, 6) labels on the x axis denote cc-pVDZ/cc-pVXZ-RI OBS/DFBS pairs, and SVP/J(K) denotes Def2-SVP/Def2-SVP-J(K) pairs. Multipole-accelerated 4c-J is used in all calculations.

FIG. 2.

Errors per atom in total Hartree–Fock energies (in Eh) for CADF-K with respect to 4c-K for the selected one- and two-dimensional systems. The XZ/RI (X = D, T, Q, 5, 6) labels on the x axis denote cc-pVDZ/cc-pVXZ-RI OBS/DFBS pairs, and SVP/J(K) denotes Def2-SVP/Def2-SVP-J(K) pairs. Multipole-accelerated 4c-J is used in all calculations.

Close modal

Similar accuracy has been achieved for cc-pVDZ/cc-pVDZ-RI and Def2-SVP/Def2-SVP-J pairs. The latter will be used in the rest of this article for practical reasons regarding CPU time and memory usage.

2. Precision vs screening thresholds

Among the cutoff thresholds listed in Table I, ϵD for truncating density matrix and ϵF for predetermining the sparsity shape of tensor F have the most significant impact on the efficiency and accuracy of CADF-K and thus will be studied in more detail.

Figure 3 shows variation of the CADF-K error with the two screening thresholds. We can see that the CADF-K error decreases quickly with both cutoffs, yielding results with the precision close to or well below 0.01 μEh for ϵD ≤ 10−8 and ϵF ≤ 10−10. To balance the accuracy and computational cost, we use ϵD = 10−8 and ϵF = 10−10 as our default cutoffs for the following results, unless otherwise stated.

FIG. 3.

Errors per atom (in Eh) in total Hartree–Fock energy with CADF-K for various ϵD (left) and ϵF (right) values with respect to the energies obtained with CADF-K and the tight cutoff thresholds.

FIG. 3.

Errors per atom (in Eh) in total Hartree–Fock energy with CADF-K for various ϵD (left) and ϵF (right) values with respect to the energies obtained with CADF-K and the tight cutoff thresholds.

Close modal

Table II presents the average wall times per iteration for the CADF-K and 4c-K builders in the eight selected systems, including the three-dimensional ones. CADF-K demonstrates speedups of up to 13× relative to 4c-K. Among the eight model systems, the greatest speedup is observed for the urea crystal. This is partially because the urea crystal adopts a looser force shape threshold ϵF = 10−8 (see Table I for explanation) in order to satisfy the memory requirement of the cluster (it is safe to loosen ϵF in this case since the introduced error is well below the density fitting error). It is likely related to the fact that this is the lowest-density (molecular) crystal of all systems considered here.

TABLE II.

Comparison of performance and precision of CADF-K vs 4c-K h.a

Wall time (in s)
System4c-KdCADF-KSpeedupΔElattb% errorc
Polyacetylene 1.02 0.22 4.71 0.059 0.32 
Polyethylene 1.01 0.16 6.26 0.047 0.34 
Nanotube 187.58 22.11 8.48 0.028 0.07 
h-BN 182.74 29.66 6.16 0.192 0.22 
Urea monolayer 22.97 3.85 5.96 0.009 0.68 
Urea crystal 542.49 41.64f 13.03 0.026 1.06 
LiH 827.54 346.63 2.39 0.071 0.09 
Diamonde 904.72 364.30f 2.48 0.499 0.42 
Diamondg 3804.29 504.46f 7.54 0.232 0.21 
Wall time (in s)
System4c-KdCADF-KSpeedupΔElattb% errorc
Polyacetylene 1.02 0.22 4.71 0.059 0.32 
Polyethylene 1.01 0.16 6.26 0.047 0.34 
Nanotube 187.58 22.11 8.48 0.028 0.07 
h-BN 182.74 29.66 6.16 0.192 0.22 
Urea monolayer 22.97 3.85 5.96 0.009 0.68 
Urea crystal 542.49 41.64f 13.03 0.026 1.06 
LiH 827.54 346.63 2.39 0.071 0.09 
Diamonde 904.72 364.30f 2.48 0.499 0.42 
Diamondg 3804.29 504.46f 7.54 0.232 0.21 
a

Wall time is measured as the time per iteration. Multipole accelerated 4c-J is used for both 4c-K and CADF-K. Def2-SVP is used as the orbital basis set for the first six systems, CR-cc-pVDZ for LiH and 3-21G and 6-31G* for diamond. Def2-SVP-J is used as the density-fitting basis set in all cases in CADF-K. Calculations are performed on Intel Xeon E5-2680 v4 processors @ 2.4 GHz with 28 cores.

b

ΔElatt is the difference between Elatt (in kcal mol−1 atom−1) computed by CADF-K and that computed by 4c-K.

c

Percent error of Elatt computed by CADF-K with respect to 4c-K Elatt.

d

Permutational and translational symmetries of four-center ERIs are exploited in 4c-K.

e

Using the 3-21G orbital basis.

f

ϵF = 10−8 is used in the CADF-K calculation of the urea crystal and diamond (default values for other thresholds) in order to satisfy the memory requirement of the cluster.

g

Using the 6-31G* orbital basis.

To demonstrate the accuracy of our method in terms of relative energies, here we define the (electronic) lattice dissociation energy, Elatt, as the per-atom energy difference between the cell structure in a periodic form and that in an isolated form, i.e.,

Elatt=(EcellperiodicEcellisolated)/Natom,
(36)

where Natom is the number of atoms in a cell. We take polyethylene to illustrate each component in Eq. (36). In the calculation of polyethylene, where we adopt C2H4 as its unit cell, Natom thus equals 6, Ecellperiodic is the total energy of polyethylene per cell (calculated as a 1D system), and Ecellisolated is the energy of an isolated C2H4 with the same atomic coordinates of the reference cell (calculated as a molecule). Note that Elatt is not lattice energy or cohesive energy. From the last two columns of Table II, we can see that the error in lattice dissociation energy (ΔElatt) due to the CADF approximation is at most 0.5 kcal mol−1 per atom, or at most 1% of 4c-K Elatt. Among the eight systems, diamond shows the largest ΔElatt (0.499 kcal mol−1 per atom) when the 3-21G orbital basis is used. This particular computation seems to be an outlier due to the poor match of the orbital and fitting basis; all other computations have used double-zeta plus polarization orbital basis, whereas the 3-21G basis does not include any polarization functions. Indeed, the error is significantly reduced when the polarization functions are included via the use of the 6-31G* basis (use of the def2-SVP basis for both diamond and LiH unfortunately leads to excessive condition numbers). We point out that the CADF errors can be better controlled if we use a more converged DFBS (see Fig. 2 of the present work and Fig. 6 of the work by Ihrig and co-workers21). The relevant total energies of the eight systems from CADF-K and 4c-K can be found in Table S2 of the supplementary material.

Periodic Hartree–Fock based on Gaussian AOs has also been implemented in other solid-state quantum chemistry packages, including, but not limited to, Crystal,35 Gaussian,36,37 CP2K,38 and PySCF.39 A strict apple-to-apple comparison between various packages is impossible because of their diverse implementation techniques, different optimization levels, and licensing restrictions. Here, in order to show that the demonstrated speedups of our CADF-K algorithm are indeed meaningful, i.e., our 4c-K implementation in MPQC is efficient, we perform a comparison of timing between the 4c-K in MPQC and that in Crystal, as well as an accuracy comparison with Crystal and Gaussian to prove the correctness of our code (see the footnotes of Table III for the relevant computational details). Polyacetylene, polyethylene, h-BN, LiH, and diamond are selected as examples. From Table III, we can see that for all tested systems, the 4c-K implementation in MPQC is as efficient as Crystal. A consistent estimate of lattice dissociation energies has been achieved among the three packages at sub-kcal mol−1 atom−1 accuracy (10−4 kcal mol−1 atom−1 if one compares MPQC to Gaussian). Note that the timings obtained from Crystal are based on tightened integral thresholds in order to reach a similar level of accuracy as Gaussian and MPQC. The relatively low accuracy in Crystal’s results (especially for polyacetylene) could possibly be further improved by adopting more appropriate thresholds; they will not be explored in the present work, partly due to the authors’ unfamiliarity to the program.

TABLE III.

Comparison of performance and precision of the 4c-K implementations in MPQC and reference codes.a

GaussianbCrystalcMPQCd
SystemElatt TimeElattTimeElatt
Polyacetylene −26.651 05 15.3 −26.627 45 4.9 −26.651 05 
Polyethylene −13.372 71 22.9 −13.372 73 5.8 −13.372 71 
h-BN −87.804 06 2 838.7 −87.798 22 1886.6 −87.804 07 
LiH −83.293 62 25 769.9 −83.292 75 6358.8 −83.293 41 
Diamond −118.153 46 23 377.3 −118.146 71 7223.9 −118.153 78 
GaussianbCrystalcMPQCd
SystemElatt TimeElattTimeElatt
Polyacetylene −26.651 05 15.3 −26.627 45 4.9 −26.651 05 
Polyethylene −13.372 71 22.9 −13.372 73 5.8 −13.372 71 
h-BN −87.804 06 2 838.7 −87.798 22 1886.6 −87.804 07 
LiH −83.293 62 25 769.9 −83.292 75 6358.8 −83.293 41 
Diamond −118.153 46 23 377.3 −118.146 71 7223.9 −118.153 78 
a

Def2-SVP is used as the orbital basis set for the first three systems, CR-cc-pVDZ for LiH, and 3-21G for diamond. Calculations are performed on Intel Xeon E5-2680 v4 processors @ 2.4 GHz with 1 core. The wall times are given in seconds and Elatt in kcal mol−1 atom−1.

b

Gaussian09 is used with the keyword SCF = Tight.

c

Space symmetry and the bipolar approximation are turned off in Crystal. ERI cutoffs and shrink factors are optimized to achieve maximal efficiency without a significant loss of accuracy. Note that for h-BN tightened integral cutoffs, TOLINTEG 20 20 20 20 40, were used in order to converge the SCF iterations. The reported time of Crystal includes the computation of Coulomb term, which is assumed insignificant relative to the cost of exchange.

d

Default cutoffs are used in MPQC.

We have presented an efficient density-based implementation of exact exchange for periodic systems via the concentric atomic density fitting (CADF) approximation based on atom-centered Gaussian-type orbitals. Both molecular and periodic systems can be treated on an equal footing. We have shown the performance of periodic CADF-K on several systems with different dimensionalities and obtained a significant improvement relative to the 4c-K algorithm. Errors in lattice dissociation energy introduced by the CADF approximation are below the chemical accuracy. Further efficiency improvements could be conceivably achieved by using maximally localized Wannier functions126–129 in the context of an orbital-based CADF-K algorithm.59 

See the supplementary material for the performance data used to generate Fig. 1 and the absolute energies of conventional and CADF-K computations reported in Table II.

The data that support the findings of this study are available within the article and its supplementary material and from the corresponding author upon request.

This research was supported by the U.S. National Science Foundation (Award Nos. 1550456 and 1800348). We thank the Advanced Research Computing (ARC) at Virginia Tech for providing computing resources. X.W. thanks Dr. Chong Peng for insightful discussions. The Flatiron Institute is a division of the Simons Foundation.

Fast evaluation of Coulomb potential is typically performed via the Ewald method46,130–132 or the fast multipole method (FMM);133–138 however, other efficient approaches exist.32,139,140 In the current work, we implemented a straightforward multiscale multipole-accelerated approach as described in this section; this approach is similar to the method of Ref. 141, except we use (1) real142bipolar143,144 (or two-center145) multipole expansions and (2) leading multipole moments of the unit cell’s total charge density are compensated to ensure absolutely convergent values for total energy and potential as well as the equivalence with the Ewald method. This appendix documents the relevant formulas.

Our goal is to evaluate matrix representation of the total Coulomb potential, i.e., the sum of Eqs. (14) and (15). The “Coulomb” summations (over G) are first divided into the crystal near field (CNF) and far field (CFF) according to the partitioning criteria defined in Ref. 88. The Coulomb interaction (ρ0|ρG) between charge distributions of the reference unit cell 0 and a distant unit cell G is computed via 4c-J [Eq. (15)] if G is in CNF; otherwise, it is evaluated using multipole expansion. The use of complex multipole moments is common in the literature;133–136,138,146,147 however, if the basis is real, there is no reason to use the complex-valued formalism. Real spherical multipole moments and their interaction kernels are defined as142 

Ol,m(r)=|r|l(l+m)!Plm(cosθ)cos(mϕ)ifm0|r|l(l+m)!Plm(cosθ)sin(mϕ)ifm<0,
(A1)
Ml,m(r)=(lm)!|r|(l+1)Plm(cosθ)cos(mϕ)ifm0(lm)!|r|(l+1)Plm(cosθ)sin(mϕ)ifm<0,
(A2)

where (r, θ, ϕ) are spherical polar coordinates of r and Plm(cosθ) denotes the associated Legendre polynomials about cosθ. Now, we introduce functions Nl,m±(r) (N = O or M) as

Nl,m+(r)=Nl,m(r) ifm0(1)mNl,m(r) ifm<0,
(A3)
Nl,m(r)=(1)(m+1)Nl,m(r) ifm>00 ifm=0Nl,m(r) ifm<0.
(A4)

Note that Ol,m± and Ml,m± have the same expressions as Eqs. (3)–(6) in Ref. 142. The Coulomb potential created at P by a distant unit charge at r whose potential is multipole expanded around Q can be expressed as

Ll,m(Pr)=l=0m=llMl+l,m+m+(PQ)Ol,m+(rQ)+Ml+l,m+m(PQ)Ol,m(rQ) ifm0l=0m=llMl+l,m+m(PQ)Ol,m+(rQ)+Ml+l,m+m+(PQ)Ol,m(rQ) ifm<0.
(A5)

The Coulomb interaction energy between sets of point charges {qi} and {qj} (e.g., two unit cells) becomes

Ecoul=i,jqiqj|rirj|
(A6)
=l=0(1)lm=ll(2δm,0)Ōl,m(P)L¯l,m(P),
(A7)

where

Ōl,m(P)iqiOl,m(riP)
(A8)

are the total (real) multipole moments of {qi} centered at P and

L¯l,m(P)jqjLl,m(Prj)
(A9)

are the charge-including local potentials at P due to distant {qj} (centered at Q) and evaluated via Eq. (A5). For continuous charge distribution, the summation over point charges in Eqs. (A8) and (A9) becomes a trace with 1-RDM, e.g.,

Ōl,m(P)Rμ,νμ|Ol,m(rP)|νRDμ,νR.
(A10)

Evaluation of the real multipole moment integrals over the Gaussian basis was implemented in the open-source Libint148 library following the recurrence formulas in Ref. 142.

To obtain meaningful (absolutely convergent) total and orbital energies, it is mandatory to compensate the unit cell’s electric dipoles (for 2-dimensional crystals) and quadrupoles (for three-dimensional crystals). This is achieved by adding charges and point dipoles to the surface of the unit cell in such a way that the net charge density in the bulk of the crystal is not changed, while the electric dipole and quadrupole moments of the unit cell vanish. The compensation can be viewed as modifying the boundary conditions such that the net charge density in the bulk of the crystal remains unchanged; the compensating charges/dipoles would end up on the surface of a finite crystal composed of the multipole-compensated unit cells and thus help cancel the shape-dependent contributions to the energy and potential.44,46,149 Charge compensation of the dipoles in the context of FMM was introduced by Kudin and Scuseria as soon as they started to utilize (C)FMM for LCAO calculations on 3-D solids.146 Here, we compensate not only the dipoles but also the Cartesian quadrupoles. The latter automatically ensures not only the absolute convergence of the potential (e.g., orbital energies) and the equivalence to the Ewald limit of the potential due to the vanishing trace of the second moment (spheropole) of the charge density.46,150 Our approach is formulated for a monoclinic (Bravais) lattice.

Dipole moments are compensated by placing charges of ±qi at the center of opposing facets, namely, for each principal axis i (i = 0, 1, 2 for a 3D lattice), charges of qi and −qi are placed at aj/2 + ak/2 and ai + aj/2 + ak/2, respectively, where j = mod(i + 1, 3), k = mod(i + 2, 2), and ai are the unit cell vectors. The magnitudes of charges needed to compensate net dipole μ are given by

qi=μb̃i,
(A11)

with b̃ibi/(2π) being the reduced reciprocal lattice vectors,

bi2πaj×akV,
(A12)

where Va0 · (a1 × a2) is the unit cell volume.

Quadrupole moments are compensated by placing point dipoles ±μi at the center of opposing facets. Specifically, dipoles μi and −μi are placed at aj/2 + ak/2 and ai + aj/2 + ak/2, respectively. The compensating dipoles needed to compensate unit cell’s net Cartesian quadrupole tensor q (evaluated with the dipole-compensating charges) are given by

μi12b̃iqb̃iai+b̃iqb̃jaj.
(A13)

If needed, this formula can be straightforwardly generalized to higher-order multipoles. For simplicity, point dipoles were approximated by two opposite charges separated by 0.1 a.u.

1.
P. Y.
Ayala
and
G. E.
Scuseria
,
J. Chem. Phys.
110
,
3660
(
1999
).
2.
M.
Ziółkowski
,
B.
Jansík
,
T.
Kjærgaard
, and
P.
Jørgensen
,
J. Chem. Phys.
133
,
014107
(
2010
).
3.
D. G.
Fedorov
and
K.
Kitaura
,
J. Chem. Phys.
123
,
134103
(
2005
).
4.
J.
Friedrich
and
M.
Dolg
,
J. Chem. Theory Comput.
5
,
287
(
2009
).
5.
M.
Kobayashi
and
H.
Nakai
,
J. Chem. Phys.
129
,
044103
(
2008
).
6.
W.
Li
,
P.
Piecuch
, and
J. R.
Gour
,
AIP Conf. Proc.
1102
,
68
(
2009
).
7.
C.
Riplinger
and
F.
Neese
,
J. Chem. Phys.
138
,
034106
(
2013
).
8.
H.-J.
Werner
and
M.
Schütz
,
J. Chem. Phys.
135
,
144116
(
2011
).
9.
M.
Schütz
,
G.
Hetzer
, and
H.-J.
Werner
,
J. Chem. Phys.
111
,
5691
(
1999
).
10.
Q.
Ma
and
H.-J.
Werner
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1371
(
2018
).
11.
C.
Riplinger
,
P.
Pinski
,
U.
Becker
,
E. F.
Valeev
, and
F.
Neese
,
J. Chem. Phys.
144
,
024109
(
2016
).
12.
M.
Del Ben
,
J.
Hutter
, and
J.
VandeVondele
,
J. Chem. Theory Comput.
9
,
2654
(
2013
).
13.
J.
Wilhelm
,
P.
Seewald
,
M.
Del Ben
, and
J.
Hutter
,
J. Chem. Theory Comput.
12
,
5851
(
2016
).
14.
A.
Grüneis
,
G. H.
Booth
,
M.
Marsman
,
J.
Spencer
,
A.
Alavi
, and
G.
Kresse
,
J. Chem. Theory Comput.
7
,
2780
(
2011
).
15.
M.
Kaltak
,
J.
Klimeš
, and
G.
Kresse
,
J. Chem. Theory Comput.
10
,
2498
(
2014
).
16.
M.
Marsili
,
E.
Mosconi
,
F.
De Angelis
, and
P.
Umari
,
Phys. Rev. B
95
,
075415
(
2017
).
17.
P.
Umari
,
X.
Qian
,
N.
Marzari
,
G.
Stenuit
,
L.
Giacomazzi
, and
S.
Baroni
,
Phys. Status Solidi B
248
,
527
(
2011
).
18.
D.
Neuhauser
,
Y.
Gao
,
C.
Arntsen
,
C.
Karshenas
,
E.
Rabani
, and
R.
Baer
,
Phys. Rev. Lett.
113
,
076402
(
2014
).
19.
M. P.
Ljungberg
,
P.
Koval
,
F.
Ferrari
,
D.
Foerster
, and
D.
Sánchez-Portal
,
Phys. Rev. B
92
,
075422
(
2015
).
20.
G. H.
Booth
,
A.
Grüneis
,
G.
Kresse
, and
A.
Alavi
,
Nature
493
,
365
(
2013
).
21.
A. C.
Ihrig
,
J.
Wieferink
,
I. Y.
Zhang
,
M.
Ropo
,
X.
Ren
,
P.
Rinke
,
M.
Scheffler
, and
V.
Blum
,
New J. Phys.
17
,
093020
(
2015
).
22.
W.
Hu
,
L.
Lin
,
A. S.
Banerjee
,
E.
Vecharynski
, and
C.
Yang
,
J. Chem. Theory Comput.
13
,
1188
(
2017
).
23.
P. Y.
Ayala
,
K. N.
Kudin
, and
G. E.
Scuseria
,
J. Chem. Phys.
115
,
9698
(
2001
).
24.
A. F.
Izmaylov
and
G. E.
Scuseria
,
Phys. Chem. Chem. Phys.
10
,
3421
(
2008
).
25.
C.
Pisani
,
M.
Busso
,
G.
Capecchi
,
S.
Casassa
,
R.
Dovesi
,
L.
Maschio
,
C.
Zicovich-Wilson
, and
M.
Schütz
,
J. Chem. Phys.
122
,
094113
(
2005
).
26.
M.
Lorenz
,
D.
Usvyat
, and
M.
Schütz
,
J. Chem. Phys.
134
,
094101
(
2011
).
27.
C.
Pisani
,
M.
Schütz
,
S.
Casassa
,
D.
Usvyat
,
L.
Maschio
,
M.
Lorenz
, and
A.
Erba
,
Phys. Chem. Chem. Phys.
14
,
7615
(
2012
).
28.
T.
Schäfer
,
B.
Ramberger
, and
G.
Kresse
,
J. Chem. Phys.
146
,
104101
(
2017
).
29.
E.
Rebolini
,
G.
Baardsen
,
A. S.
Hansen
,
K. R.
Leikanger
, and
T. B.
Pedersen
,
J. Chem. Theory Comput.
14
,
2427
(
2018
).
30.
D.
Usvyat
,
L.
Maschio
, and
M.
Schütz
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1357
(
2018
).
31.
H. L.
Skriver
,
The LMTO Method: Muffin-Tin Orbitals and Electronic Structure
(
Springer-Verlag
,
Berlin
,
1984
).
32.
J.
Jia
,
J. C.
Hill
,
G. I.
Fann
, and
R. J.
Harrison
, in
Proceedings of the 8th International Symposium on Distributed Computing and Applications to Business Engineering and Science
(
Publishing House of Electronics Industry, Beijing
,
2009
), pp.
13
16
.
33.
V.
Blum
,
R.
Gehrke
,
F.
Hanke
,
P.
Havu
,
V.
Havu
,
X.
Ren
,
K.
Reuter
, and
M.
Scheffler
,
Comput. Phys. Commun.
180
,
2175
(
2009
).
34.
S. V.
Levchenko
,
X.
Ren
,
J.
Wieferink
,
R.
Johanni
,
P.
Rinke
,
V.
Blum
, and
M.
Scheffler
,
Comput. Phys. Commun.
192
,
60
(
2015
).
35.
R.
Dovesi
,
A.
Erba
,
R.
Orlando
,
C. M.
Zicovich-Wilson
,
B.
Civalleri
,
L.
Maschio
,
M.
Rérat
,
S.
Casassa
,
J.
Baima
,
S.
Salustro
, and
B.
Kirtman
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1360
(
2018
).
36.
M. J.
Frisch
,
G. W.
Trucks
,
H. B.
Schlegel
,
G. E.
Scuseria
,
M. A.
Robb
,
J. R.
Cheeseman
,
G.
Scalmani
,
V.
Barone
,
G. A.
Petersson
,
H.
Nakatsuji
,
X.
Li
,
M.
Caricato
,
A.
Marenich
,
J.
Bloino
,
B. G.
Janesko
,
R.
Gomperts
,
B.
Mennucci
,
H. P.
Hratchian
,
J. V.
Ortiz
,
A. F.
Izmaylov
,
J. L.
Sonnenberg
,
D.
Williams-Young
,
F.
Ding
,
F.
Lipparini
,
F.
Egidi
,
J.
Goings
,
B.
Peng
,
A.
Petrone
,
T.
Henderson
,
D.
Ranasinghe
,
V. G.
Zakrzewski
,
J.
Gao
,
N.
Rega
,
G.
Zheng
,
W.
Liang
,
M.
Hada
,
M.
Ehara
,
K.
Toyota
,
R.
Fukuda
,
J.
Hasegawa
,
M.
Ishida
,
T.
Nakajima
,
Y.
Honda
,
O.
Kitao
,
H.
Nakai
,
T.
Vreven
,
K.
Throssell
,
J. A.
Montgomery
, Jr.
,
J. E.
Peralta
,
F.
Ogliaro
,
M.
Bearpark
,
J. J.
Heyd
,
E.
Brothers
,
K. N.
Kudin
,
V. N.
Staroverov
,
T.
Keith
,
R.
Kobayashi
,
J.
Normand
,
K.
Raghavachari
,
A.
Rendell
,
J. C.
Burant
,
S. S.
Iyengar
,
J.
Tomasi
,
M.
Cossi
,
J. M.
Millam
,
M.
Klene
,
C.
Adamo
,
R.
Cammi
,
J. W.
Ochterski
,
R. L.
Martin
,
K.
Morokuma
,
O.
Farkas
,
J. B.
Foresman
, and
D. J.
Fox
, Gaussian 09, Revision A.02,
Gaussian, Inc.
,
Wallingford, CT
,
2016
.
37.
A. F.
Izmaylov
,
G. E.
Scuseria
, and
M. J.
Frisch
,
J. Chem. Phys.
125
,
104103
(
2006
).
38.
J.
Hutter
,
M.
Iannuzzi
,
F.
Schiffmann
, and
J.
VandeVondele
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
4
,
15
(
2014
).
39.
Q.
Sun
,
T. C.
Berkelbach
,
N. S.
Blunt
,
G. H.
Booth
,
S.
Guo
,
Z.
Li
,
J.
Liu
,
J. D.
McClain
,
E. R.
Sayfutyarova
,
S.
Sharma
,
S.
Wouters
, and
G. K. L.
Chan
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1340
(
2018
).
40.
G.
te Velde
and
E. J.
Baerends
,
Phys. Rev. B
44
,
7888
(
1991
).
41.
P. H. T.
Philipsen
,
G.
te Velde
,
E. J.
Baerends
,
J. A.
Berger
,
P. L.
de Boeij
,
M.
Franchini
,
J. A.
Groeneveld
,
E. S.
Kadantsev
,
R.
Klooster
,
F.
Kootstra
,
P.
Romaniello
,
M.
Raupach
,
D. G.
Skachkov
,
J. G.
Snijders
,
C. J. O.
Verzijl
,
J. A.
Celis Gil
,
J. M.
Thijssen
,
G.
Wiesenekker
,
C. A.
Peeples
,
G.
Schreckenbach
, and
T.
Ziegler
, BAND 2019.3, SCM, Theoretical Chemistry,
Vrije Universiteit
,
Amsterdam, The Netherlands
, http://www.scm.com.
42.
B.
Delley
,
J. Chem. Phys.
113
,
7756
(
2000
).
43.
J.
Ladik
and
F.
Martino
,
Acta Phys.
34
,
67
(
1973
).
44.
F. E.
Harris
, in
Theoretical Chemistry
, edited by
H.
Eyring
and
D.
Henderson
(
Elsevier
,
1975
), Vol. 1, pp.
147
218
.
45.
C.
Pisani
and
R.
Dovesi
,
Int. J. Quantum Chem.
17
,
501
(
1980
).
46.
V. R.
Saunders
,
C.
Freyria-Fava
,
R.
Dovesi
,
L.
Salasco
, and
C.
Roetti
,
Mol. Phys.
77
,
629
(
1992
).
47.
Š.
Varga
,
Int. J. Quantum Chem.
108
,
1518
(
2008
).
48.
A. M.
Burow
,
M.
Sierka
, and
F.
Mohamed
,
J. Chem. Phys.
131
,
214101
(
2009
).
49.
L.
Maschio
,
D.
Usvyat
,
F. R.
Manby
,
S.
Casassa
,
C.
Pisani
, and
M.
Schütz
,
Phys. Rev. B
76
,
075101
(
2007
).
50.
D.
Usvyat
,
L.
Maschio
,
F. R.
Manby
,
S.
Casassa
,
M.
Schütz
, and
C.
Pisani
,
Phys. Rev. B
76
,
075102
(
2007
).
51.
Q.
Sun
,
T. C.
Berkelbach
,
J. D.
McClain
, and
G. K.-L.
Chan
,
J. Chem. Phys.
147
,
164119
(
2017
).
52.
C. H.
Patterson
,
J. Chem. Phys.
153
,
064107
(
2020
).
53.
H.-J.
Werner
,
F. R.
Manby
, and
P. J.
Knowles
,
J. Chem. Phys.
118
,
8149
(
2003
).
54.
F.
Neese
,
F.
Wennmohs
, and
A.
Hansen
,
J. Chem. Phys.
130
,
114108
(
2009
).
55.
P.
Pinski
,
C.
Riplinger
,
E. F.
Valeev
, and
F.
Neese
,
J. Chem. Phys.
143
,
034108
(
2015
).
56.
D. S.
Hollman
,
H. F.
Schaefer
, and
E. F.
Valeev
,
J. Chem. Phys.
140
,
064109
(
2014
).
57.
D. S.
Hollman
,
H. F.
Schaefer
, and
E. F.
Valeev
,
Mol. Phys.
115
,
2065
(
2017
).
58.
P.
Merlot
,
T.
Kjaergaard
,
T.
Helgaker
,
R.
Lindh
,
F.
Aquilante
,
S.
Reine
, and
T. B.
Pedersen
,
J. Comput. Chem.
34
,
1486
(
2013
).
59.
S. F.
Manzer
,
E.
Epifanovsky
, and
M.
Head-Gordon
,
J. Chem. Theory Comput.
11
,
518
(
2015
).
60.
E.
Rebolini
,
R.
Izsák
,
S. S.
Reine
,
T.
Helgaker
, and
T. B.
Pedersen
,
J. Chem. Theory Comput.
12
,
3514
(
2016
).
61.
J. L.
Whitten
,
J. Chem. Phys.
58
,
4496
(
1973
).
62.
J. A.
Jafri
and
J. L.
Whitten
,
J. Chem. Phys.
61
,
2116
(
1974
).
63.
F. E.
Harris
and
R.
Rein
,
Theor. Chim. Acta
6
,
73
(
1966
).
64.
E. J.
Baerends
,
D. E.
Ellis
, and
P.
Ros
,
Chem. Phys.
2
,
41
(
1973
).
65.
H.
Sambe
and
R. H.
Felton
,
J. Chem. Phys.
62
,
1122
(
1975
).
66.
N. H. F.
Beebe
and
J.
Linderberg
,
Int. J. Quantum Chem.
12
,
683
(
1977
).
67.
B. I.
Dunlap
,
J. W. D.
Connolly
, and
J. R.
Sabin
,
J. Chem. Phys.
71
,
3396
(
1979
).
68.
B. I.
Dunlap
,
J. W. D.
Connolly
, and
J. R.
Sabin
,
J. Chem. Phys.
71
,
4993
(
1979
).
69.
M.
Feyereisen
,
G.
Fitzgerald
, and
A.
Komornicki
,
Chem. Phys. Lett.
208
,
359
(
1993
).
70.
O.
Vahtras
,
J.
Almlöf
, and
M. W.
Feyereisen
,
Chem. Phys. Lett.
213
,
514
(
1993
).
71.
K.
Eichkorn
,
O.
Treutler
,
H.
Öhm
,
M.
Häser
, and
R.
Ahlrichs
,
Chem. Phys. Lett.
240
,
283
(
1995
).
72.
K.
Eichkorn
,
F.
Weigend
,
O.
Treutler
, and
R.
Ahlrichs
,
Theor. Chem. Acc.
97
,
119
(
1997
).
73.
R.
Bauernschmitt
,
M.
Häser
,
O.
Treutler
, and
R.
Ahlrichs
,
Chem. Phys. Lett.
264
,
573
(
1997
).
74.
F.
Weigend
,
M.
Häser
,
H.
Patzelt
, and
R.
Ahlrichs
,
Chem. Phys. Lett.
294
,
143
(
1998
).
75.
F.
Weigend
,
A.
Köhn
, and
C.
Hättig
,
J. Chem. Phys.
116
,
3175
(
2002
).
76.
F.
Weigend
,
M.
Kattannek
, and
R.
Ahlrichs
,
J. Chem. Phys.
130
,
164106
(
2009
).
77.
J.
Boström
,
F.
Aquilante
,
T. B.
Pedersen
, and
R.
Lindh
,
J. Chem. Theory Comput.
5
,
1545
(
2009
).
78.
B. I.
Dunlap
,
J. Mol. Struct.: THEOCHEM
529
,
37
(
2000
).
79.
B. I.
Dunlap
,
Phys. Chem. Chem. Phys.
2
,
2113
(
2000
).
80.
M.
Schütz
and
F. R.
Manby
,
Phys. Chem. Chem. Phys.
5
,
3349
(
2003
).
81.
C.
Pisani
,
L.
Maschio
,
S.
Casassa
,
M.
Halo
,
M.
Schütz
, and
D.
Usvyat
,
J. Comput. Chem.
29
,
2113
(
2008
).
82.
L.
Maschio
and
D.
Usvyat
,
Phys. Rev. B
78
,
073102
(
2008
).
83.
L.
Maschio
,
J. Chem. Theory Comput.
7
,
2818
(
2011
).
84.
P. M. W.
Gill
,
A. T. B.
Gilbert
,
S. W.
Taylor
,
G.
Friesecke
, and
M.
Head-Gordon
,
J. Chem. Phys.
123
,
061101
(
2005
).
85.
D. P.
Tew
,
J. Chem. Phys.
148
,
011102
(
2018
).
86.
J. E.
Jaffe
and
A. C.
Hess
,
J. Chem. Phys.
105
,
10983
(
1996
).
87.
88.
R.
Łazarski
,
A. M.
Burow
, and
M.
Sierka
,
J. Chem. Theory Comput.
11
,
3029
(
2015
).
89.
M.
Franchini
,
P. H. T.
Philipsen
,
E.
van Lenthe
, and
L.
Visscher
,
J. Chem. Theory Comput.
10
,
1994
(
2014
).
90.
B.
Delley
,
J. Chem. Phys.
92
,
508
(
1990
).
91.
B.
Delley
,
J. Phys. Chem.
100
,
6107
(
1996
).
92.
C.
Pisani
,
R.
Dovesi
, and
C.
Roetti
,
Hartree-Fock Ab Initio Treatment of Crystalline Systems
, Lecture Notes in Chemistry Vol. 48 (
Springer-Verlag
,
Berlin
,
1988
).
93.
R.
Dovesi
,
R.
Orlando
,
C.
Roetti
,
C.
Pisani
, and
V. R.
Saunders
,
Phys. Status Solidi B
217
,
63
(
2000
).
95.
J. D.
Cloizeaux
,
Phys. Rev.
135
,
A698
(
1964
).
96.
L.
He
and
D.
Vanderbilt
,
Phys. Rev. Lett.
86
,
5341
(
2001
).
97.
S. N.
Taraskin
,
D. A.
Drabold
, and
S. R.
Elliott
,
Phys. Rev. Lett.
88
,
196405
(
2002
).
98.
S.
Goedecker
,
Phys. Rev. B
58
,
3501
(
1998
).
99.
S.
Ismail-Beigi
and
T. A.
Arias
,
Phys. Rev. Lett.
82
,
2127
(
1999
).
100.
S. N.
Taraskin
,
P. A.
Fry
,
X.
Zhang
,
D. A.
Drabold
, and
S. R.
Elliott
,
Phys. Rev. B
66
,
233101
(
2002
).
101.
M.
Häser
and
R.
Ahlrichs
,
J. Comput. Chem.
10
,
104
(
1989
).
102.
D. S.
Hollman
,
H. F.
Schaefer
, and
E. F.
Valeev
,
J. Chem. Phys.
142
,
154106
(
2015
).
103.
J. A.
Calvin
,
C. A.
Lewis
, and
E. F.
Valeev
, in
IA3’15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms
(
ACM Press
,
New York, NY, USA
,
2015
), pp.
1
8
.
104.
C.
Peng
,
C. A.
Lewis
,
X.
Wang
,
M. C.
Clement
,
K.
Pierce
,
V.
Rishi
,
F.
Pavošević
,
S.
Slattery
,
J.
Zhang
,
N.
Teke
,
A.
Kumar
,
C.
Masteran
,
A.
Asadchev
,
J. A.
Calvin
, and
E. F.
Valeev
,
J. Chem. Phys.
153
,
044120
(
2020
).
105.
J. A.
Calvin
,
C.
Peng
,
C. A.
Lewis
,
J.
Zhang
, and
E. F.
Valeev
, “
TiledArray: A framework for distributed-memory block-sparse tensor computation
,” https://github.com/valeevgroup/tiledarray.
106.
R. J.
Harrison
,
G.
Beylkin
,
F. A.
Bischoff
,
J. A.
Calvin
,
G. I.
Fann
,
J.
Fosso-Tande
,
D.
Galindo
,
J. R.
Hammond
,
R.
Hartman-Baker
,
J. C.
Hill
,
J.
Jia
,
J. S.
Kottmann
,
M.-J.
Yvonne Ou
,
J.
Pei
,
L. E.
Ratcliff
,
M. G.
Reuter
,
A. C.
Richie-Halford
,
N. A.
Romero
,
H.
Sekino
,
W. A.
Shelton
,
B. E.
Sundahl
,
W. S.
Thornton
,
E. F.
Valeev
,
Á.
Vázquez-Mayagoitia
,
N.
Vence
,
T.
Yanai
, and
Y.
Yokoi
,
SIAM J. Sci. Comput.
38
,
S123
(
2016
).
107.
G.
Avitabile
,
R.
Napolitano
,
B.
Pirozzi
,
K. D.
Rouse
,
M. W.
Thomas
, and
B. T. M.
Willis
,
J. Polym. Sci., Polym. Lett. Ed.
13
,
351
(
1975
).
108.
W.
Förner
,
R.
Knab
,
J.
Č
́žek
ı, and
J.
Ladik
,
J. Chem. Phys.
106
,
10248
(
1997
).
109.
M. S.
Dresselhaus
,
G.
Dresselhaus
, and
R.
Saito
,
Carbon Nanotubes
33
,
883
(
1995
).
110.
D.-H.
Kim
,
H.-S.
Kim
,
M. W.
Song
,
S.
Lee
, and
S. Y.
Lee
,
Nano Convergence
4
,
13
(
2017
).
111.
S.
Swaminathan
,
B. M.
Craven
, and
R. K.
McMullan
,
Acta Crystallogr., Sect. B: Struct. Sci.
40
,
300
(
1984
).
112.
K.
Persson
, Materials data on C (SG:227) by materials project, 2014.
113.
K.
Persson
, Materials data on LiH (SG:225) by materials project, 2014.
114.
F.
Weigend
and
R.
Ahlrichs
,
Phys. Chem. Chem. PhWys.
7
,
3297
(
2005
).
115.
F.
Weigend
,
Phys. Chem. Chem. Phys.
8
,
1057
(
2006
).
116.
T. H.
Dunning
,
J. Chem. Phys.
90
,
1007
(
1989
).
117.
M.
Lorenz
,
L.
Maschio
,
M.
Schütz
, and
D.
Usvyat
,
J. Chem. Phys.
137
,
204119
(
2012
).
118.
J. S.
Binkley
,
J. A.
Pople
, and
W. J.
Hehre
,
J. Am. Chem. Soc.
102
,
939
(
1980
).
119.
C.
Hättig
,
Phys. Chem. Chem. Phys.
7
,
59
(
2005
).
120.

Since we only focus on the most time-consuming step, namely the construction of the direct-space exchange matrix [Eq. (16)], for a chosen unit cell, we do not analyze the cost with respect to the number of sampled k points in the first Brillouin zone [Nk, see, for example, Eq. (18)]; in any case, the number of k points can always be reduced to 1 by increasing the unit cell size appropriately.

121.
M.
Ouyang
,
J.-L.
Huang
,
C. L.
Cheung
, and
C. M.
Lieber
,
Science
292
,
702
(
2001
).
122.
Y.
Matsuda
,
J.
Tahir-Kheli
, and
W. A.
Goddard
,
J. Phys. Chem. Lett.
1
,
2946
(
2010
).
123.
K. T.
Park
,
K.
Terakura
, and
N.
Hamada
,
J. Phys. C: Solid State Phys.
20
,
1241
(
1987
).
124.
M. S.
Si
,
J. Y.
Li
,
H. G.
Shi
,
X. N.
Niu
, and
D. S.
Xue
,
Europhys. Lett.
86
,
46002
(
2009
).
125.
B.
Huang
and
H.
Lee
,
Phys. Rev. B
86
,
245406
(
2012
).
126.
C. M.
Zicovich-Wilson
,
R.
Dovesi
, and
V. R.
Saunders
,
J. Chem. Phys.
115
,
9708
(
2001
).
127.
S.
Casassa
,
C. M.
Zicovich-Wilson
, and
C.
Pisani
,
Theor. Chem. Acc.
116
,
726
(
2006
).
128.
N.
Marzari
and
D.
Vanderbilt
,
Phys. Rev. B
56
,
12847
(
1997
).
129.
P.
Zeiner
,
R.
Dirl
, and
B. L.
Davies
,
J. Math. Phys.
39
,
2437
(
1998
).
130.
131.
A. Y.
Toukmaji
and
J. A.
Board
,
Comput. Phys. Commun.
95
,
73
(
1996
).
132.
C.
Kittel
,
Introduction to Solid State Physics
(
John Wiley & Sons
,
New York
,
1996
).
133.
L.
Greengard
and
V.
Rokhlin
,
J. Comput. Phys.
73
,
325
(
1987
).
134.
C. A.
White
and
M.
Head-Gordon
,
J. Chem. Phys.
101
,
6593
(
1994
).
135.
M. C.
Strain
,
G. E.
Scuseria
, and
M. J.
Frisch
,
Science
271
,
51
(
1996
).
136.
M.
Challacombe
,
C.
White
, and
M.
Head-Gordon
,
J. Chem. Phys.
107
,
10131
(
1997
).
137.
K. N.
Kudin
and
G. E.
Scuseria
,
Phys. Rev. B
61
,
16440
(
2000
).
138.
K. N.
Kudin
and
G. E.
Scuseria
,
J. Chem. Phys.
121
,
2886
(
2004
).
139.
L.
Genovese
,
T.
Deutsch
,
A.
Neelov
,
S.
Goedecker
, and
G.
Beylkin
,
J. Chem. Phys.
125
,
074105
(
2006
).
140.
G.
Beylkin
,
V.
Cheruvu
, and
F.
Pérez
,
Appl. Comput. Harmonic Anal.
24
,
354
(
2008
).
141.
M.
Sierka
,
A.
Hogekamp
, and
R.
Ahlrichs
,
J. Chem. Phys.
118
,
9136
(
2003
).
142.
J. M.
Pérez-Jordá
and
W.
Yang
,
J. Chem. Phys.
104
,
8003
(
1996
).
143.
B. C.
Carlson
and
G. S.
Rushbrooke
,
Math. Proc. Cambridge Philos. Soc.
46
,
626
(
1950
).
144.
R. J.
Buehler
and
J. O.
Hirschfelder
,
Phys. Rev.
83
,
628
(
1951
).
145.
I. A.
Solov’yov
,
A. V.
Yakubovich
,
A. V.
Solov’yov
, and
W.
Greiner
,
Phys. Rev. E
75
,
051912
(
2007
).
146.
K. N.
Kudin
and
G. E.
Scuseria
,
Chem. Phys. Lett.
283
,
61
(
1998
).
147.
K. N.
Kudin
and
G. E.
Scuseria
,
Chem. Phys. Lett.
289
,
611
(
1998
).
148.
E. F.
Valeev
, Libint: A library for the evaluation of molecular integrals of many-body operators over gaussian functions, http://libint.valeyev.net/, 2018, version 2.5.0-beta.1.
149.
S. W.
de Leeuw
,
J. W.
Perram
, and
E. R.
Smith
,
Proc. R. Soc. London, Ser. A
373
,
27
(
1980
).
150.
R. N.
Euwema
and
G. T.
Surratt
,
J. Phys. Chem. Solids
36
,
67
(
1975
).

Supplementary Material