The evaluation of the exact [Hartree–Fock (HF)] exchange operator is a crucial ingredient for the accurate description of the electronic structure in periodic systems through *ab initio* and hybrid density functional approaches. An efficient formulation of periodic HF exchange in a linear combination of atomic orbitals representation presented here is based on the concentric atomic density fitting approximation, a domain-free local density fitting approach in which the product of two atomic orbitals is approximated using a linear combination of fitting basis functions centered at the same nuclei as the AOs in that product. A significant reduction in the computational cost of exact exchange is demonstrated relative to the conventional approach due to avoiding the need to evaluate four-center two-electron integrals, with sub-millihartree/atom errors in absolute HF energies and good cancellation of fitting errors in relative energies. The novel aspects of the evaluation of the Coulomb contribution to the Fock operator, such as the use of real two-center multipole expansions and spheropole-compensated unit cell densities, are also described.

## I. INTRODUCTION

Recently, there has been dramatic progress in reduced-scaling many-body formalisms for *molecular* electronic structure problems.^{1–11} Such approaches allow robust simulation of the electronic structure of large systems that surpasses the accuracy of mainstream methods, i.e., the Kohn–Sham density functional theory (KS DFT), by allowing to systematically eliminate analytic and numerical approximations. Although reduced-scaling formalisms, including many-body methods, have been well established in the solid state community,^{12–22} their development applicable to periodic solids is a relatively unexplored frontier in quantum chemistry when compared to the molecular setting, the few pioneering efforts notwithstanding.^{23–30} A significant barrier to straightforward extension of the molecular methods to the periodic setting is the different numerical representations dominant in each setting. The linear combination of atomic orbitals (LCAO) approach is dominant in molecular computations with open-boundary conditions, but it is not commonly used in the periodic context where the use of plane waves is dominant; nevertheless, bases with (strictly) local support has been used in periodic solids [linear muffin-tin orbitals (LMTO),^{31} finite/spectral elements,^{32} and numerical atomic orbitals^{33,34}]. We note that several widely used packages, including Crystal,^{35} Gaussian,^{36,37} CP2K,^{38} PySCF,^{39} ADF-BAND,^{40,41} and DMol,^{3,42} support local basis sets in periodic calculations. The primary advantage of the LCAO representation is its compactness for atomistic matter in which the atomic electronic structure is largely preserved; the compactness of representation and the quasi-local support of most important AOs are key to practical reduced-scaling formalisms for many-body methods. Thus, the use of the LCAO representation provides an appealing alternative to the many-body formalisms in solids.

This paper focuses on the mean-field problem in the LCAO representation for periodic solids as a precursor to reduced-scaling many-body methods in such a representation. Although the basic LCAO machinery in the periodic setting has long been established,^{43–46} several recent efforts have focused on how to improve the efficiency of operator evaluation in the LCAO representation by *density fitting* (DF).^{24,26,34,47–52} DF is key to fast computation in the LCAO representation, whether it is at the mean-field or many-body level. Long before the AO basis set approaches numerical completeness, the AO product space becomes highly redundant; density fitting is a way to eliminate the redundancy and thus to greatly reduce the cost of operator evaluation (and for AOs that are represented purely numerically,^{33,34} density fitting is the only practical way to evaluate operators).^{21} Another way to view density fitting is as a physically motivated factorization of the two- and higher-body operator representations; by separating the tensor modes of the operator representation, the density fitting reduces the cost of operator manipulations such as transformation to the basis of single-particle eigenstates, which is key to reducing the cost of integral transformation in modern reduced-scaling many-body methods.^{7,8,10,11,53–55}

In this paper, we describe how to improve the efficiency of the exact exchange operator construction in the periodic LCAO setting using concentric atomic density fitting (CADF) that has been successfully utilized in the molecular electronic structure by us^{56,57} and others.^{21,58–60} In Sec. II, we briefly review the basics of the density fitting approximation in the electronic structure, the periodic LCAO HF method, and the implementation of the CADF periodic exact exchange algorithm (CADF-K). Computations on various one-, two-, and three-dimensional periodic systems are discussed in Secs. III and IV. Conclusions and future work are given in Sec. V.

## II. FORMALISM

For the purpose of making this manuscript as self-contained as reasonable, we start out by quickly recapping the basics of density fitting and periodic LCAO mean-field methods.

### A. Density fitting

Density fitting, also for historical reasons known in quantum chemistry as the resolution of the identity (RI), approximates product quasi-densities by a linear combination of *density-fitting* (also referred to generically as *auxiliary*) basis functions. The modern DF formalism was introduced and analyzed by Whitten for the purpose of approximating the two-body Coulomb integrals^{61,62} (for earlier DF-like approximations, see Ref. 63 and references therein). At the same time, Baerends *et al.*^{64} used Eq. (1) in the context of evaluation of the Coulomb potential of a density represented as a linear combination of two-center products in terms of one-electron functions. After much subsequent work,^{65–68} DF entered the mainstream of quantum chemistry due to the seminal work by Feyereisen *et al.*^{69,70} and Ahlrichs *et al.*^{71–74} Careful engineering of the density fitting basis sets enables us to make the DF error of *global* fitting sufficiently small and largely canceling in practically relevant energy differences; the computational speedup can be as high as one to two orders of magnitude for small-to-medium sized molecules.^{72,75–77}

To highlight the issues with straightforward application of DF to periodic systems, consider fitting two-center AO products $\mu \nu \u2261\varphi \mu *(r)\varphi \nu (r)$ by its approximate representation $\mu \nu \u0303$ as a linear combination of one-center density-fitting basis functions ${X}$,

The set of ${X}$ in Eq. (1) is a *μν*-independent system-spanning set in *global* fitting and a *μν*-dependent set in *local* fitting; clearly, DF in an infinite (e.g., periodic) system must necessarily be local to keep the number of fitting coefficients finite. The fitting coefficients are determined by minimizing the “norm” of the error density, $\delta \mu ,\nu \u2261\mu \nu \u2212\mu \nu \u0303$, “weighted” by an arbitrary operator *Ŵ*. If the fitting basis is complete, then minimizing (*δ*_{μ,ν}|*Ŵ*|*δ*_{μ,ν}) is equivalent to solving

For any positive *Ŵ*, such as the Coulomb operator $V^f\u2261\u222bf(r\u2032)|r\u2212r\u2032|\u2009dr\u2032$, this is only possible by making $\delta \mu ,\nu =0$, i.e., $\mu \nu =\mu \nu \u0303$. For a finite fitting basis $X$, minimizing (*δ*_{μ,ν}|*Ŵ*|*δ*_{μ,ν}) is equivalent to solving the linear system

which only involves two- and three-center integrals, and hence is efficient. Comparison of Eqs. (2) and (3) reveals the dual role played by the fitting basis in density fitting in a finite basis: ${X}$ defines not only the expansion space for the fitted density but also the projection space (in the sense of the collocation method) for solving the exact fitting problem (2). For example, the Coulomb fitting ($\u0174=V^$) via Eq. (3) can be viewed as ensuring that the Coulomb potential of the error density *δ*_{μ,ν} is zero “at” every collocation function $X$. In addition, unless *Ŵ* = 1, “density fitting” is a misnomer: in an incomplete basis, minimizing (*δ*_{μ,ν}|*Ŵ*|*δ*_{μ,ν}) is equivalent to minimizing the 2-norm of the “potential” generated by *Ŵ*^{1/2} from the error density $\delta \mu ,\nu $; since $V^1/2f\u2261\pi \u22123/2\u222bf(r\u2032)|r\u2212r\u2032|2\u2009dr\u2032$, the Coulomb fitting is equivalent to minimizing the error in the *electric field*.

The error in a four-center Coulomb integral can be easily seen as quadratic in the density fitting errors (i.e., its DF approximation is *robust*^{78,79}) if (a) $\u0174=V^$ and (b) the same fitting basis set is used for the bra and the ket,

where each component of $\delta \rho ,\sigma V^\mu \nu \u0303$ and $\rho \sigma \u0303V^\delta \mu ,\nu $ vanishes according to Eq. (3). Chemists’ notation is used for the two-electron integrals here and in the rest of this paper. Condition (b) is satisfied automatically in the global Coulomb fitting, hence its ubiquitousness. For local fitting, or global fitting of an indefinite operator *Ô*, a manifestly robust approximation^{78} can be used,

One of the challenges for local DF is making it accurate without introducing artifacts. Fundamentally, the problem with local fitting is that the potential generated by $\mu \nu \u0303$ is only accurate where the fitting functions are located. Thus, the local DF will introduce large errors in integrals $\rho \sigma V^\mu \nu $ with large bra–ket distances. This issue is avoided in *domain-based* local DF^{53,80} by fitting $\rho \sigma $ and $\mu \nu $ using the set of $X$ specialized to every (*ρσ*, *μν*) combination; the resulting DF approximation is robust. This makes both $\rho \sigma \u0303$ and $\mu \nu \u0303$ optimal for approximating $\rho \sigma V^\mu \nu $ accurately for any *ρσ*–*μν* distance. The domain-based local fitting has been used for periodic systems in the context of local LCAO MP2 methods by Schutz, Usvyat, and co-workers.^{49,50,81–83} The domain fitting greatly increases the number of fitting problems to be solved relative to the global and purely local approaches and thus pushes the crossover with respect to the global fitting to relatively large systems. Also note that the domain DF requires integrals $XV^\mu \nu $, where ${X}$ span fitting sets for all possible $\rho \sigma $ products, i.e., potentially the entire system. Thus, purely local fitting [where ${X}$ are “local” to each $\mu \nu $] would be preferred if it can be made accurate.

The accuracy of purely local fitting can be systematically tuned by several approaches:

by selecting sufficiently complete fitting sets, whether by brute force

^{21,34}or by augmenting the fitting basis with plane waves as recently done by Sun and co-workers;^{51}note the artifactually slow decay of fitting errors that occurs in one-dimensional systems;^{84}by improving the accuracy of long-range potential via (a) constraining the fitted density to match the leading multipole moments as the exact density or, similarly, (b) including distant functions in the collocation fitting set in Eq. (3);

^{85}by using the robust DF approximant [Eq. (5)] to reduce the error; however, this can lead to artifacts such as loss of positivity of the Coulomb operator.

^{58}

The accuracy of the long-range tail of the potential generated by the fitted density is especially important for the periodic systems in the context of computing the Coulomb potential due to its slow decay. This means at least constraining the charge of the fitted density to ensure the exact electrical neutrality of the unit cell and thus ensure that the Coulomb potential is finite.^{47,48,52,86–89} Note that for the Coulomb term, other types of methods similar to density fitting exist for molecules and solids (see, for example, Refs. 90 and 91 for an efficient, potentially exact approach that exploits numerical integration of multipolar-expanded atomic densities partitioned from the total charge density).

The long-range behavior of the exact exchange operator is less problematic, at least in insulators, due to the rapid decay of the one-particle density matrix. Therefore, it is reasonable to assume that the use of purely local density fitting would be a viable strategy even in periodic systems. Here, we extend our previous efforts^{56,57} on fast *strongly* local density fitting for exchange operator construction in molecules to the periodic setting. By strong locality, we mean that the fitting space for the product $\mu 1\mu 2\u2026\u2009$ is composed only of the fitting basis functions on the same atoms as *μ*_{i}; we dub such a local DF approximation *concentric atomic* DF. The idea of CADF-like fitting goes back to the work of Baerends and co-workers^{64} who used it to fit atomic components of the one-particle density. Recently, the concentric fitting approach, referred to as pair atomic resolution of the identity (PARI), was investigated by Merlot *et al.*^{58} in the context of Coulomb fitting; the authors were the first to point out the violation of positivity by the *robust* PARI approximant that led to catastrophic failures. Hollman and co-workers^{56} showed that the accuracy of the CADF could be dramatically improved and the nonpositivity artifacts could be partially alleviated by combining the CADF approximation for most integrals with the exact evaluation of four-center integrals, dubbed semiexact CADF (seCADF),^{56} as well as developed a rigorously screened *linear scaling* version of the CADF exact exchange builder.^{57} Manzer *et al.*^{59} demonstrated that the nonpositivity violations do not occur when robust PARI is only used for the exchange operator.

The above applications of CADF/PARI in molecular context were recently mirrored in the molecular^{21} *and* periodic^{34} setting by Scheffler and co-workers using the non-Gaussian LCAO representation based on numeric atomic orbitals that have strictly local support. Unlike the earlier Gaussian-based *robust* CADF efforts, the use of the *nonrobust* local RI approach for fitting numerical AO products causes the need for the construction of high-quality fitting basis sets with improved accuracy.^{21} Application of robust CADF to the exact exchange construction in periodic systems has not yet been accomplished to the best of our knowledge in Gaussian or non-Gaussian LCAO representations; hence, this is the primary objective of this work.

### B. Periodic Hartree–Fock

The periodic Hartree–Fock procedure implemented in the current work is rather standard and can be found in the literature since the 1980s.^{44,45,92,93} The Hartree–Fock crystalline orbitals (COs) in the LCAO framework are expanded,

in terms of Bloch AOs, which, in turn, are translation-symmetry-adapted linear combinations of the AOs,

The CO coefficients $C\mu ,ki$ in this work are determined by solving the *canonical* Hartree–Fock equations projected on the AOs in the reference cell,

where *F*_{k}, *S*_{k}, and *ϵ*_{k} are the Fock matrix, overlap matrix, and CO (band) energies, respectively. In this work, these equations were solved for a uniform grid of ** k** points in the first Brillouin zone, without exploiting any additional symmetries. The Fock matrix in Eq. (8) is a linear combination of conventional AO (direct space) Fock matrices,

where a compound index *μ*** R** is used to represent the

*μ*th AO located in cell

**, i.e.,**

*R**χ*

_{μ}(

**−**

*r***). For AOs in the reference cell,**

*R***=**

*R***0**and thus

**will be omitted for simplicity. Note that the Fock matrix is Hermitian, i.e., $F\mu ,\nu ,k=F\nu ,\mu ,k*$, which requires the AO Fock matrix to obey**

*R*This condition is closely related to the requirement of translational invariance: *F*_{μ,νR} = *F*_{μ(−R),ν}.

The closed-shell Fock matrix in the AO basis is expressed in terms of the usual kinetic energy, nuclear and electronic Coulomb potential, and exchange contributions,^{93}

where ** A** denotes the nuclear position of atom

*A*in the reference cell and

*D*

_{ρ,σL}denotes the density matrix. Indices

**,**

*R***, and**

*G***span the infinite set of lattice vectors and are associated with pseudo-overlap distribution, Coulomb operator, and density representation, respectively. Note that the translational invariance of**

*L***,**

*D*is exploited in Eqs. (15) and (16). The AO (direct-space) density matrix *D*_{ρ,σL} is constructed from the CO coefficients as

where Ω is the volume of the first Brillouin zone of the reciprocal lattice. The columns of *F*_{μ,νR} (and *S*_{μ,νR}) are transformed to the Bloch AO basis for each ** k** before solving the Hartree–Fock equations [Eq. (8)]. The Hartree–Fock energy is obtained as

All contributions to the AO Fock matrix for a gapped system can be evaluated to arbitrary finite precision *ϵ* with finite effort despite the infinite ranges of indices ** R**,

**, and**

*G***. First, a finite support of Gaussians limits the range of**

*L***in Eqs. (13)–(15) and**

*R***in Eq. (15) to polylog of precision (i.e., ∝− log**

*L*^{k}

*ϵ*). Summation over

**in the Coulombic contributions [Eqs. (14) and (15)] is slow and must be carried out to sufficiently long range to make**

*G***and**

*V***Hermitian according to Eq. (11). However, fast techniques for the Coulomb potential evaluation, such as the fast multipole method (FMM), are well established and will not be discussed here; in this work, we used the multipole approximation to evaluate the Coulomb potential efficiently (see the Appendix for technical details of the real-valued multipole expansion employed here). Finally,**

*J***and (**

*G***+**

*G***−**

*L***) in Eq. (16) have polylog ranges due to the finite support of Gaussians, whereas the range of the**

*R***sum is also polylog in**

*L**gapped*systems due to the exponential decay of density,

*D*

_{ρ,σL}.

^{94–97}However, in metallic systems, convergence of the

**sum in Eq. (16) will be slow (polynomial in precision).**

*L*^{98–100}The remaining question is that of cost. The computational effort of periodic LCAO HF is primarily dominated by the evaluation of the four-center two-electron integrals, particularly in Eq. (16) (explicit evaluation of the integrals in

**, denoted as 4c-J, is only needed for the near-field contribution, i.e., small values of |**

*J***G**|, and can be reduced further by density fitting). Section II C will describe how the computational cost of Eq. (16) can be greatly reduced by local density fitting.

### C. Concentric atomic density fitting for the exchange operator

The concentric atomic density fitting (CADF) is a local approximation to Eq. (1),

where *X*_{(μ,ν)} are the density fitting functions that are concentric with functions *μ or ν*. The nonrobust CADF approximation

is positive definite, but inaccurate with fitting basis sets of practical size; hence, robust CADF is used in this work,

Although robust CADF can violate positivity,^{58} it is safe to use for ** K** [Eq. (16)].

^{59}Inserting Eq. (22) into Eq. (16) leads to the following expression [Einstein’s convention is assumed for Eqs. (23)–(25)]:

where

Note the appearance of lattice vector indices ** G′** and

**that distinguish cell indices of**

*L*′*X*and

*σ*on the left-hand sides of Eqs. (26) and (27) from cell indices

**and**

*G***on the right-hand sides. The appearance of**

*L***and**

*G*′**in Eq. (26) can be understood if we express**

*L*′**using a translationally redundant density representation,**

*Q*Due to the dependence of *X*_{(μ,ρG)} and *σ*(** G** +

**) on**

*L***, summation over**

*G***produces nonzero elements of**

*G**Q*with functions

*X*and

*σ*spread in a range of cells, with the respective ranges labeled

**and**

*G*′**. In contrast, in Eq. (27), indices**

*L*′**and**

*G*′**are introduced simply as notational convenience. Note that only translationally unique components of $F\sigma L\u2032,\nu RXG\u2032$ need to be evaluated,**

*L*′Note that Eq. (24) assumes the equality of the first two terms on the right-hand side of Eq. (23). This assumption is satisfied, given a sufficiently large lattice range for the ** G** and

**sums. Indeed, consider product |**

*L**μν*

*R*_{1}) and density $D\rho ,\sigma R2$. As discussed above, the maximum range of

*R*_{1}and

*R*_{2}are polylog in precision due to the finite support of Gaussians and rapid decay of the density in

*gapped*systems. The two range extents are denoted as $Romax$ and $Rdmax$, respectively, i.e., $|R1|\u2264|Romax|$ and $|R2|\u2264|Rdmax|$, respectively. In this work, $Romax$ is determined before SCF starts, while $Rdmax$ is updated in every SCF iteration based on the specified precision thresholds (see Sec. III). Therefore, the ranges of lattice vectors in Eqs. (23)–(27) are as follows:

$|G|\u2264|Romax|$ and $|L|\u2264|Rdmax|$ because

and*G*are involved in pseudo-overlap and density representations, respectively.*L*$|R|\u22642\u2009|Romax|+|Rdmax|$ because cell

is linked to cell*R***0**via

$|G\u2032|\u2264|Romax|$. Since {

*X*} spans the union of density fitting functions used to represent all non-negligible (*G*′*μ*,*ρ*) pairs, sums over*G*and*G*′have the same extent.*G*$|L\u2032|\u2264|Romax|+|Rdmax|$. Since {

*σ*} spans the union of all possible*L*′*σ*(+*G*), which must be connected to*L**μ***0**via Eq. (30), sums overand (*L*′+*G*) have the same extent.*L*

The CADF coefficients are precomputed before the SCF iterations by solving the usual fitting linear system,

with the Coulomb fitting “metric” $MX(\mu ,\rho G)Y(\mu ,\rho G)\u2261Y(\mu ,\rho G)|X(\mu ,\rho G)$. The fitting coefficients $C\nu R,\sigma (G+L)Y(\sigma (G+L),\nu R)$ in Eq. (27) are then obtained by translation. Note that the number of non-negligible (*μ*, *ρ*** G**) pairs for each AO

*μ*is polylog in precision; hence, the fitting coefficient tensor is sparse. Tensor

**is also sparse in insulators due to the exponential decay of the density matrix. However, note that tensor**

*Q***is not sparse due to its slow (polynomial) decay. Nevertheless, evaluation of Eq. (25) is fast because only a sparse subset of**

*F***contributes to the contraction with sparse**

*F***in Eq. (25). Indeed, in Eq. (25), we can see that for a specific pair of (**

*Q**X*

**,**

*G*′*σ*

**), no contribution to**

*L*′*K*

_{μ,νR}will be made if $Q\mu ,\sigma L\u2032XG\u2032$ is negligible for all

*μ*. Thus, based solely on the sparsity of

**, a prescreening list of (**

*Q**X*

**,**

*G*′*σ*

**) pairs, named**

*L*′*L*

_{Xσ}, can be formed prior to the construction of

**to reduce the computational cost. This list defines which blocks of**

*F***are computed (see Sec. III for the details of tensor blocking).**

*F*Of course, the CADF approximation is not sufficient to arrive at an efficient method since it is necessary to also screen out small contributions in every step of the CADF-K algorithm. Although it is possible to evaluate a block-sparse tensor expression such as the CADF-K expression [Eq. (23)] such that all contributions above the certain threshold *ϵ* are included, the necessary analysis is complicated (see Ref. 57 for how this can be done for the nonperiodic CADF-K). Here, we sacrifice the ability to control the precision of the exchange matrix by a single threshold and instead introduce separate thresholds for exploiting sparsity in individual steps. Since for the efficiency reasons, the AO tensors in our implementation are *tiled* (blocked) not by atoms but by sets of one or more atoms (up to an entire unit cell), the screening is adapted to work approximately the same no matter how the tiling is chosen. The following thresholds are utilized:

*ϵ*_{schwarz}: The magnitudes of the three- and four-center ERIs are estimated using the standard (Cauchy–Bunyakovsky–)Schwarz bound,^{101}

respectively. The bounds for the individual integrals are then straightforwardly combined to produce the bounds for the Frobenius norms of tiles of the ERI tensors. A tile of integrals is neglected if its estimated Frobenius norm is below *ϵ*_{schwarz} times the number of integrals of the tile. This is equivalent to skipping tiles with *per-element* (scaled) Frobenius norms less than *ϵ*_{schwarz}; in other words, the skipped tiles contain integrals with average magnitude below *ϵ*_{schwarz}.

*ϵ*_{S}: For any*non-concentric*orbital basis set (OBS) shell pair (*μ*,*ν*), if the per-element Frobenius norm of its overlap integral is below*R**ϵ*_{S}, i.e.,

we will neglect integrals associated with the corresponding pseudo-overlap distributions, such as |*μν*** R**) in Eqs. (13)–(15) and |

*μρ*

**) in Eq. (31).**

*G***’s range limit, $Romax$, is determined when all (**

*R**μ*,

*ν*

**) pairs for a specific**

*R***satisfy Eq. (34).**

*R**ϵ*_{D}: If the per-element Frobenius norm of all tiles in density*D*_{ρ,σL}for a given unit cellis below*L**ϵ*_{D}, that unit cell’s density is considered to be negligible; this screening defines the $Rdmax$ range.*ϵ*_{F}: This parameter controls which tiles of[Eq. (27)] are computed by defining the prescreening list*F**L*_{Xσ}(see Algorithm 1).*ϵ*_{sparse}: A tile of tensor[Eq. (26)] or*Q*[Eq. (27)] is omitted if the per-element Frobenius norm is below*F**ϵ*_{sparse}.

The values of these parameters used in this work are detailed in Sec. III.

1: procedure CADF-K | |

2: before the beginning of SCF loops | |

3: Read in orbital and density fitting basis sets | |

4: Evaluate fitting coefficients C | $\u22b3$ Eq. (31) |

5: Evaluate two-center ERI, (X$G\u2033$|Y) where $|G\u2033|\u22642\u2009|Romax|+|Rdmax|$ | |

6: Evaluate integral-direct three-center ERI, (XG^{″}|νσ) G′ | |

7: | |

8: in each SCF loop | |

9: ← density matrix from previous iteration D | |

10: Evaluate Q | $\u22b3$ Eq. (26) |

11: L_{Xσ} = ForceShape() Q | $\u22b3$ See function definition below |

12: Evaluate translationally unique via BuildF(FL_{Xσ}) | $\u22b3$ See function definition below |

13: Evaluate K | $\u22b3$ Eq. (25) |

14: | |

15: end procedure | |

16: function ForceShape() Q | |

17: for X in DFBS translated by G′ do | |

18: for σ in OBS translated by L′ do | |

19: for μ in OBS do | |

20: if $\Vert Q\mu ,\sigma L\u2032XG\u2032\Vert \xafF>\u03f5F$ then | |

21: Add (X, G′σ) to L′L_{Xσ} | |

22: break | |

23: end if | |

24: end for | |

25: end for | |

26: end for | |

27: return L_{Yρ} | |

28: end function | |

29: function BuildF(L_{Xσ}) | |

30: for (X, G′σ) in L′L_{Xσ} do | |

31: for within range ($2\u2009|Romax|+|Rdmax|$) Rdo | |

32: if $|L\u2032\u2212R|\u2264|Romax|$then | |

33: for ν in OBS do | |

34: Construct $F\nu ,\sigma (L\u2032\u2212R)X(G\u2032\u2212R)$ | $\u22b3$ Eq. (27) |

35: end for | |

36: end if | |

37: end for | |

38: end for | |

39: return F | |

40: end function |

1: procedure CADF-K | |

2: before the beginning of SCF loops | |

3: Read in orbital and density fitting basis sets | |

4: Evaluate fitting coefficients C | $\u22b3$ Eq. (31) |

5: Evaluate two-center ERI, (X$G\u2033$|Y) where $|G\u2033|\u22642\u2009|Romax|+|Rdmax|$ | |

6: Evaluate integral-direct three-center ERI, (XG^{″}|νσ) G′ | |

7: | |

8: in each SCF loop | |

9: ← density matrix from previous iteration D | |

10: Evaluate Q | $\u22b3$ Eq. (26) |

11: L_{Xσ} = ForceShape() Q | $\u22b3$ See function definition below |

12: Evaluate translationally unique via BuildF(FL_{Xσ}) | $\u22b3$ See function definition below |

13: Evaluate K | $\u22b3$ Eq. (25) |

14: | |

15: end procedure | |

16: function ForceShape() Q | |

17: for X in DFBS translated by G′ do | |

18: for σ in OBS translated by L′ do | |

19: for μ in OBS do | |

20: if $\Vert Q\mu ,\sigma L\u2032XG\u2032\Vert \xafF>\u03f5F$ then | |

21: Add (X, G′σ) to L′L_{Xσ} | |

22: break | |

23: end if | |

24: end for | |

25: end for | |

26: end for | |

27: return L_{Yρ} | |

28: end function | |

29: function BuildF(L_{Xσ}) | |

30: for (X, G′σ) in L′L_{Xσ} do | |

31: for within range ($2\u2009|Romax|+|Rdmax|$) Rdo | |

32: if $|L\u2032\u2212R|\u2264|Romax|$then | |

33: for ν in OBS do | |

34: Construct $F\nu ,\sigma (L\u2032\u2212R)X(G\u2032\u2212R)$ | $\u22b3$ Eq. (27) |

35: end for | |

36: end if | |

37: end for | |

38: end for | |

39: return F | |

40: end function |

^{a}

All AO indices refer to the tiles of the corresponding AO dimensions. $\Vert \u22c5\Vert \xafF$ is the per-element Frobenius norm of the tensor tile.

Note that *ϵ*_{sparse} is used to screen out the insignificant tiles so that they are not used in subsequent computations. The difficult problem is how to predict which tiles are significant before computing them. For the AO integrals, this can be done by using Schwarz inequalities or other more sophisticated estimators,^{102} but for results of tensor contractions, this is less straightforward. The standard TiledArray approach for screening block-sparse tensor contractions^{103} estimates the norms of the result by using the submultiplicative property of the Frobenius norm; this, in effect, assumes that all contraction contributions to the result tile add up constructively (i.e., have the same sign). The *insignificant* result tiles whose estimated per-element Frobenius norm is less than *ϵ*_{sparse} are then not computed, but small contributions to *significant* result tiles are still computed. Here, we use a hybrid approach: for the significant result tiles, all contributions whose per-element Frobenius norm are estimated to be less than $\u03f5sparse/ncontrib$ are not evaluated, where *n*_{contrib} is the total number of contributions from the tensor contraction to the particular given tile. Replacing $ncontrib$ by *n*_{contrib} would *guarantee* that the omitted contributions affect the result’s norm by less than *ϵ*_{sparse}; the use of $ncontrib$ does not provide strict guarantee but does account for the cancellation of small omitted contributions, assuming that they are normally distributed around zero. This allows us to reduce the number of evaluated contributions without any observable effect on the accuracy.

Our periodic CADF-K implementation is summarized in Algorithm 1. The orbital basis set and density fitting basis set will be denoted as OBS and DFBS, respectively, in the following:

## III. COMPUTATIONAL DETAILS

Periodic Hartree–Fock with CADF-K has been implemented in the Massively Parallel Quantum Chemistry (MPQC) package (version 4.0.0).^{104} The implementation of the sparse tensor algebra is provided by our TiledArray tensor framework^{105} supported by the MADNESS task-based runtime.^{106} These components were compiled with the GCC 8.3.0 compiler. All computations were performed on a commodity cluster at Virginia Tech, each node of which has two 12-core Intel Xeon E5-2680 v3 processors (the theoretical peak performance of one node is 960 GFlops/s) and 128 GB of RAM, unless otherwise stated.

We chose polyacetylene, polyethylene, single-walled carbon nanotube (4, 0), hexagonal boron nitride monolayer (h-BN), urea monolayer, urea crystal, lithium hydride (LiH), and diamond as test cases representative of relevant dimensionalities and bond types. The structures were obtained from Refs. 107–113. In the present work, the Def2-SVP^{114} orbital basis set paired with the Def2-SVP-J^{115} density fitting basis set was utilized for all systems, except for LiH and diamond for which more computationally feasible basis sets were adopted; a double-*ζ* quality basis set derived from molecular cc-pVDZ^{116} by Lorenz and co-workers^{117} (denoted as CR-cc-pVDZ) was chosen for LiH and Pople’s 3-21G^{118} for diamond. To test the convergence of CADF-K with respect to density fitting basis sets, we also investigated regular cc-pVDZ^{116} OBS paired with cc-pV*X*Z-RI (*X* = D, T, Q, 5, 6)^{75,119} DFBS.

Table I lists the *default* and *tight* values for various thresholds. The former set is used in all our calculations unless specified otherwise, while the latter is recommended when one expects high-precision results.

Threshold . | Default . | Tight . | Description . |
---|---|---|---|

ϵ_{schwarz} | 10^{–12} | 10^{–16} | Schwarz threshold |

ϵ_{S} | 10^{–10} | 10^{–14} | Overlap threshold |

ϵ_{D} | 10^{–8} | 10^{–12} | Density truncation threshold |

ϵ_{F} | 10^{–10} | 10^{–12} | Force shape threshold |

ϵ_{sparse} | 10^{–10} | 10^{–12} | Tensor sparse threshold |

Threshold . | Default . | Tight . | Description . |
---|---|---|---|

ϵ_{schwarz} | 10^{–12} | 10^{–16} | Schwarz threshold |

ϵ_{S} | 10^{–10} | 10^{–14} | Overlap threshold |

ϵ_{D} | 10^{–8} | 10^{–12} | Density truncation threshold |

ϵ_{F} | 10^{–10} | 10^{–12} | Force shape threshold |

ϵ_{sparse} | 10^{–10} | 10^{–12} | Tensor sparse threshold |

## IV. RESULTS AND DISCUSSION

### A. Computational complexity with respect to lattice summation ranges

The computational complexity of molecular electronic structure methods is traditionally expressed as the operation count (or, alternatively, time-to-solution) as a function of the system size *N* in the asymptotic regime of *N* → *∞*. It is somewhat less common to discuss the complexity of molecular computations with respect to precision *ϵ*, defined as the difference between the exact solution for a given property (e.g., Hartree–Fock energy) and its approximate value obtained in practice. Precision is thus limited by the discretization (“basis set”) error as well as other numerical approximations such as screening. The effects of both kinds of numerical approximations cannot be easily separated for most representations.

The cost of computations on infinite systems can only be meaningfully analyzed in terms of scaling with the unit cell size and with respect to the precision.^{120} Here, we focus only on the latter and specifically focus on the effect on finite lattice summation ranges on precision; the effects of the CADF approximation and various screening thresholds on the precision is considered in Sec. IV B.

Without any screening, the cost of the exchange operator evaluation is $O(Ncell3)$, where *N*_{cell} is the number of unit cells included in the lattice summations in Eq. (16) over ** R**,

**, and**

*G***. Due to the localized nature of Gaussian basis functions, the**

*L***and**

*G***series involved in the pseudo-overlap distributions {**

*R**μ*,

*ρ*

**} and {**

*G**ρ*

**,**

*R**σ*(

**+**

*G***)} decay exponentially to zero with the increase in**

*L***and**

*G***. Therefore, the**

*R***summation, which depends on the decay of the density matrix, usually predominates the computation of exchange. Nevertheless, with the decaying density matrix, the best possible scaling exponents with respect to**

*L**N*

_{cell}is 0 (constant).

Figure 1 shows the *effective* scaling exponent of *N*_{cell} for the periodic CADF-K builder using the default cutoff thresholds (see Table S1 of the supplementary material for the wall times for each system). The effective scaling exponent *k* is defined by a “two-point fit” of the time-to-solution *C*(*N*) measured with size parameters *N*_{i} and *N*_{i+1},

Polyethylene (C_{2}H_{4})_{n} and polyacetylene (C_{2}H_{2})_{n} represent prototypical one-dimensional covalent periodic systems with large gaps. Computational costs of their exchange term quickly decay to zero after including a finite number of unit cells (20 for polyethylene and 80 for polyacetylene) due to the rapid (exponential) decay of contributions to all three lattice sums with respect to the lattice index magnitude. The slower decay in polyacetylene is due to its smaller bandgap and thus less localized density matrix, resulting in a slower convergence of the ** L** summation.

A single-walled nanotube (n, 0) has been reported to have finite but small bandgaps both in experiment and theory.^{121,122} The slow decay of its density matrix (reflected in the ** L** summation) will dominate the calculation in long distance. As expected, the asymptotic scaling of nanotube with

*N*

_{cell}approaches linear after 40 unit cells in each direction, equivalent to 171.6 Å from the reference cell. Note that scaling behaviors of

**and**

*Q***are different, i.e., the former stays linear, while the latter drops to zero after 10 cells. This is because**

*F***and**

*Q***have direct and indirect dependence on density**

*F***, respectively. For threshold**

*D**ϵ*

_{D}= 10

^{−8}, the density decreases to a small value after 10 cells, but it is still above

*ϵ*

_{D}. As a result, the operation count of

**contraction [Eq. (26)] increases with**

*Q***. On the other hand, the incremental elements of**

*D***are so small that the function ForceShape in Algorithm 1 yields an unchanged shape for**

*Q***, leading to a scaling exponent of zero for the computation of**

*F***. Numerical analysis indicates that we might have overcomputed the elements of**

*F***due to greatly overestimating norms of its tiles. An improved norm-estimating approach for**

*Q***, especially in small-gap systems, is left for the future work.**

*Q*The two-dimensional test cases, urea monolayer and h-BN, are prototypical molecular and covalently bonded crystals, respectively. The loosely packed structure of molecular crystals usually leads to large bandgaps and fast decay of density matrices. Figure 1 shows that for the urea monolayer, the scaling exponent quickly decays to zero after *N*_{cell} = 4. The covalently bonded h-BN has a similar structure as graphene but with a wide bandgap of about 5 eV.^{110,123–125} The total CADF-K time of h-BN stops growing after *N*_{cell} = 40. Similar to the nanotube case, ** Q** decays slower than

**for h-BN because we overestimated its tile norms, but finally it drops to zero.**

*F*### B. Precision analysis

In this section, we discuss the effect of the CADF approximation and screening thresholds on precision. The *default* values of thresholds (see Table I) are used in Sec. IV B 1, ensuring that the incompleteness of DFBS is the main source of error, i.e., further reduction of thresholds by a factor of 10 does not change the Hartree–Fock energy by more than 1 *μE*_{h}. In Sec. IV B 2, to study the effect of cutoff thresholds individually, other parameters are chosen from the *tight* set.

#### 1. Precision of CADF

Figure 2 shows the absolute HF energy error per atom due to the use of CADF-K relative to the exact evaluation using four-center ERIs (4c-K) for various OBS/DFBS pairs. When a matched OBS/DFBS pair is used, i.e., cc-pVDZ/cc-pVDZ-RI and Def2-SVP/Def2-SVP-J(K), errors per atom are close to or below 1 *mE*_{h}, comparable to typical DF errors in absolute energies for common auxiliary basis sets.^{76} As the size of DFBS increases, the errors tend to reduce significantly. In particular, with the cc-pVDZ/cc-pV6Z-RI basis pair, CADF errors can be controlled well below 10 *μE*_{h} per atom for tested systems. Note that the CADF error does not reduce monotonically, i.e., cc-pVQZ-RI yields larger errors than cc-pVTZ-RI. This is likely due to the fact that the cc-pV*X*Z-RI (*X* = D, T, Q, 5, 6) basis sets are designed to minimize the fitting errors for the matching cc-pV*X*Z basis and not to form a systematic series.

Similar accuracy has been achieved for cc-pVDZ/cc-pVDZ-RI and Def2-SVP/Def2-SVP-J pairs. The latter will be used in the rest of this article for practical reasons regarding CPU time and memory usage.

#### 2. Precision vs screening thresholds

Among the cutoff thresholds listed in Table I, *ϵ*_{D} for truncating density matrix and *ϵ*_{F} for predetermining the sparsity shape of tensor ** F** have the most significant impact on the efficiency and accuracy of CADF-K and thus will be studied in more detail.

Figure 3 shows variation of the CADF-K error with the two screening thresholds. We can see that the CADF-K error decreases quickly with both cutoffs, yielding results with the precision close to or well below 0.01 *μE*_{h} for *ϵ*_{D} ≤ 10^{−8} and *ϵ*_{F} ≤ 10^{−10}. To balance the accuracy and computational cost, we use *ϵ*_{D} = 10^{−8} and *ϵ*_{F} = 10^{−10} as our default cutoffs for the following results, unless otherwise stated.

### C. Performance of CADF-K vs 4c-K

Table II presents the average wall times per iteration for the CADF-K and 4c-K builders in the eight selected systems, including the three-dimensional ones. CADF-K demonstrates speedups of up to 13× relative to 4c-K. Among the eight model systems, the greatest speedup is observed for the urea crystal. This is partially because the urea crystal adopts a looser force shape threshold *ϵ*_{F} = 10^{−8} (see Table I for explanation) in order to satisfy the memory requirement of the cluster (it is safe to loosen *ϵ*_{F} in this case since the introduced error is well below the density fitting error). It is likely related to the fact that this is the lowest-density (molecular) crystal of all systems considered here.

. | Wall time (in s) . | . | . | . | |
---|---|---|---|---|---|

System . | 4c-K^{d}
. | CADF-K . | Speedup . | ΔE_{latt}^{b}
. | % error^{c}
. |

Polyacetylene | 1.02 | 0.22 | 4.71 | 0.059 | 0.32 |

Polyethylene | 1.01 | 0.16 | 6.26 | 0.047 | 0.34 |

Nanotube | 187.58 | 22.11 | 8.48 | 0.028 | 0.07 |

h-BN | 182.74 | 29.66 | 6.16 | 0.192 | 0.22 |

Urea monolayer | 22.97 | 3.85 | 5.96 | 0.009 | 0.68 |

Urea crystal | 542.49 | 41.64^{f} | 13.03 | 0.026 | 1.06 |

LiH | 827.54 | 346.63 | 2.39 | 0.071 | 0.09 |

Diamond^{e} | 904.72 | 364.30^{f} | 2.48 | 0.499 | 0.42 |

Diamond^{g} | 3804.29 | 504.46^{f} | 7.54 | 0.232 | 0.21 |

. | Wall time (in s) . | . | . | . | |
---|---|---|---|---|---|

System . | 4c-K^{d}
. | CADF-K . | Speedup . | ΔE_{latt}^{b}
. | % error^{c}
. |

Polyacetylene | 1.02 | 0.22 | 4.71 | 0.059 | 0.32 |

Polyethylene | 1.01 | 0.16 | 6.26 | 0.047 | 0.34 |

Nanotube | 187.58 | 22.11 | 8.48 | 0.028 | 0.07 |

h-BN | 182.74 | 29.66 | 6.16 | 0.192 | 0.22 |

Urea monolayer | 22.97 | 3.85 | 5.96 | 0.009 | 0.68 |

Urea crystal | 542.49 | 41.64^{f} | 13.03 | 0.026 | 1.06 |

LiH | 827.54 | 346.63 | 2.39 | 0.071 | 0.09 |

Diamond^{e} | 904.72 | 364.30^{f} | 2.48 | 0.499 | 0.42 |

Diamond^{g} | 3804.29 | 504.46^{f} | 7.54 | 0.232 | 0.21 |

^{a}

Wall time is measured as the time per iteration. Multipole accelerated 4c-J is used for both 4c-K and CADF-K. Def2-SVP is used as the orbital basis set for the first six systems, CR-cc-pVDZ for LiH and 3-21G and 6-31G^{*} for diamond. Def2-SVP-J is used as the density-fitting basis set in all cases in CADF-K. Calculations are performed on Intel Xeon E5-2680 v4 processors @ 2.4 GHz with 28 cores.

^{b}

Δ*E*_{latt} is the difference between *E*_{latt} (in kcal mol^{−1} atom^{−1}) computed by CADF-K and that computed by 4c-K.

^{c}

Percent error of *E*_{latt} computed by CADF-K with respect to 4c-K *E*_{latt}.

^{d}

Permutational and translational symmetries of four-center ERIs are exploited in 4c-K.

^{e}

Using the 3-21G orbital basis.

^{f}

*ϵ*_{F} = 10^{−8} is used in the CADF-K calculation of the urea crystal and diamond (*default* values for other thresholds) in order to satisfy the memory requirement of the cluster.

^{g}

Using the 6-31G^{*} orbital basis.

To demonstrate the accuracy of our method in terms of relative energies, here we define the (electronic) lattice dissociation energy, *E*_{latt}, as the per-atom energy difference between the cell structure in a periodic form and that in an isolated form, i.e.,

where *N*_{atom} is the number of atoms in a cell. We take polyethylene to illustrate each component in Eq. (36). In the calculation of polyethylene, where we adopt C_{2}H_{4} as its unit cell, *N*_{atom} thus equals 6, $Ecellperiodic$ is the total energy of polyethylene per cell (calculated as a 1D system), and $Ecellisolated$ is the energy of an isolated C_{2}H_{4} with the same atomic coordinates of the reference cell (calculated as a molecule). Note that *E*_{latt} is not lattice energy or cohesive energy. From the last two columns of Table II, we can see that the error in lattice dissociation energy (Δ*E*_{latt}) due to the CADF approximation is at most 0.5 kcal mol^{−1} per atom, or at most 1% of 4c-K *E*_{latt}. Among the eight systems, diamond shows the largest Δ*E*_{latt} (0.499 kcal mol^{−1} per atom) when the 3-21G orbital basis is used. This particular computation seems to be an outlier due to the poor match of the orbital and fitting basis; all other computations have used double-zeta plus polarization orbital basis, whereas the 3-21G basis does not include any polarization functions. Indeed, the error is significantly reduced when the polarization functions are included via the use of the 6-31G^{*} basis (use of the def2-SVP basis for both diamond and LiH unfortunately leads to excessive condition numbers). We point out that the CADF errors can be better controlled if we use a more converged DFBS (see Fig. 2 of the present work and Fig. 6 of the work by Ihrig and co-workers^{21}). The relevant total energies of the eight systems from CADF-K and 4c-K can be found in Table S2 of the supplementary material.

Periodic Hartree–Fock based on Gaussian AOs has also been implemented in other solid-state quantum chemistry packages, including, but not limited to, Crystal,^{35} Gaussian,^{36,37} CP2K,^{38} and PySCF.^{39} A strict apple-to-apple comparison between various packages is impossible because of their diverse implementation techniques, different optimization levels, and licensing restrictions. Here, in order to show that the demonstrated speedups of our CADF-K algorithm are indeed meaningful, i.e., our 4c-K implementation in MPQC is efficient, we perform a comparison of timing between the 4c-K in MPQC and that in Crystal, as well as an accuracy comparison with Crystal and Gaussian to prove the correctness of our code (see the footnotes of Table III for the relevant computational details). Polyacetylene, polyethylene, h-BN, LiH, and diamond are selected as examples. From Table III, we can see that for all tested systems, the 4c-K implementation in MPQC is as efficient as Crystal. A consistent estimate of lattice dissociation energies has been achieved among the three packages at sub-kcal mol^{−1} atom^{−1} accuracy (10^{−4} kcal mol^{−1} atom^{−1} if one compares MPQC to Gaussian). Note that the timings obtained from Crystal are based on tightened integral thresholds in order to reach a similar level of accuracy as Gaussian and MPQC. The relatively low accuracy in Crystal’s results (especially for polyacetylene) could possibly be further improved by adopting more appropriate thresholds; they will not be explored in the present work, partly due to the authors’ unfamiliarity to the program.

. | Gaussian^{b}
. | Crystal^{c}
. | MPQC^{d}
. | ||
---|---|---|---|---|---|

System . | E_{latt}
. | Time . | E_{latt}
. | Time . | E_{latt}
. |

Polyacetylene | −26.651 05 | 15.3 | −26.627 45 | 4.9 | −26.651 05 |

Polyethylene | −13.372 71 | 22.9 | −13.372 73 | 5.8 | −13.372 71 |

h-BN | −87.804 06 | 2 838.7 | −87.798 22 | 1886.6 | −87.804 07 |

LiH | −83.293 62 | 25 769.9 | −83.292 75 | 6358.8 | −83.293 41 |

Diamond | −118.153 46 | 23 377.3 | −118.146 71 | 7223.9 | −118.153 78 |

. | Gaussian^{b}
. | Crystal^{c}
. | MPQC^{d}
. | ||
---|---|---|---|---|---|

System . | E_{latt}
. | Time . | E_{latt}
. | Time . | E_{latt}
. |

Polyacetylene | −26.651 05 | 15.3 | −26.627 45 | 4.9 | −26.651 05 |

Polyethylene | −13.372 71 | 22.9 | −13.372 73 | 5.8 | −13.372 71 |

h-BN | −87.804 06 | 2 838.7 | −87.798 22 | 1886.6 | −87.804 07 |

LiH | −83.293 62 | 25 769.9 | −83.292 75 | 6358.8 | −83.293 41 |

Diamond | −118.153 46 | 23 377.3 | −118.146 71 | 7223.9 | −118.153 78 |

^{a}

Def2-SVP is used as the orbital basis set for the first three systems, CR-cc-pVDZ for LiH, and 3-21G for diamond. Calculations are performed on Intel Xeon E5-2680 v4 processors @ 2.4 GHz with 1 core. The wall times are given in seconds and *E*_{latt} in kcal mol^{−1} atom^{−1}.

^{b}

Gaussian09 is used with the keyword **SCF = Tight**.

^{c}

Space symmetry and the bipolar approximation are turned off in Crystal. ERI cutoffs and shrink factors are optimized to achieve maximal efficiency without a significant loss of accuracy. Note that for h-BN tightened integral cutoffs, **TOLINTEG** 20 20 20 20 40, were used in order to converge the SCF iterations. The reported time of Crystal includes the computation of Coulomb term, which is assumed insignificant relative to the cost of exchange.

^{d}

Default cutoffs are used in MPQC.

## V. CONCLUSIONS

We have presented an efficient density-based implementation of exact exchange for periodic systems via the concentric atomic density fitting (CADF) approximation based on atom-centered Gaussian-type orbitals. Both molecular and periodic systems can be treated on an equal footing. We have shown the performance of periodic CADF-K on several systems with different dimensionalities and obtained a significant improvement relative to the 4c-K algorithm. Errors in lattice dissociation energy introduced by the CADF approximation are below the chemical accuracy. Further efficiency improvements could be conceivably achieved by using maximally localized Wannier functions^{126–129} in the context of an orbital-based CADF-K algorithm.^{59}

## SUPPLEMENTARY MATERIAL

See the supplementary material for the performance data used to generate Fig. 1 and the absolute energies of conventional and CADF-K computations reported in Table II.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material and from the corresponding author upon request.

## ACKNOWLEDGMENTS

This research was supported by the U.S. National Science Foundation (Award Nos. 1550456 and 1800348). We thank the Advanced Research Computing (ARC) at Virginia Tech for providing computing resources. X.W. thanks Dr. Chong Peng for insightful discussions. The Flatiron Institute is a division of the Simons Foundation.

### APPENDIX: COULOMB POTENTIAL EVALUATION USING REAL MULTIPOLE EXPANSION

Fast evaluation of Coulomb potential is typically performed via the Ewald method^{46,130–132} or the fast multipole method (FMM);^{133–138} however, other efficient approaches exist.^{32,139,140} In the current work, we implemented a straightforward multiscale multipole-accelerated approach as described in this section; this approach is similar to the method of Ref. 141, except we use (1) *real*^{142} *bipolar*^{143,144} (or two-center^{145}) multipole expansions and (2) leading multipole moments of the unit cell’s total charge density are compensated to ensure absolutely convergent values for total energy and potential as well as the equivalence with the Ewald method. This appendix documents the relevant formulas.

Our goal is to evaluate matrix representation of the total Coulomb potential, i.e., the sum of Eqs. (14) and (15). The “Coulomb” summations (over ** G**) are first divided into the crystal near field (CNF) and far field (CFF) according to the partitioning criteria defined in Ref. 88. The Coulomb interaction (

*ρ*

_{0}|

*ρ*

_{G}) between charge distributions of the reference unit cell

**0**and a distant unit cell

**is computed via 4c-J [Eq. (15)] if**

*G***is in CNF; otherwise, it is evaluated using multipole expansion. The use of complex multipole moments is common in the literature;**

*G*^{133–136,138,146,147}however, if the basis is real, there is no reason to use the complex-valued formalism. Real spherical multipole moments and their interaction kernels are defined as

^{142}

where (r, *θ*, *ϕ*) are spherical polar coordinates of ** r** and $Plm(cos\theta )$ denotes the associated Legendre polynomials about cos

*θ*. Now, we introduce functions $Nl,m\xb1(r)$ (

*N*=

*O*or

*M*) as

Note that $Ol,m\xb1$ and $Ml,m\xb1$ have the same expressions as Eqs. (3)–(6) in Ref. 142. The Coulomb potential created at ** P** by a distant unit charge at

**whose potential is multipole expanded around**

*r***can be expressed as**

*Q*The Coulomb interaction energy between sets of point charges {*q*_{i}} and {*q*_{j}} (e.g., two unit cells) becomes

where

are the total (real) multipole moments of {*q*_{i}} centered at ** P** and

are the charge-including local potentials at ** P** due to distant {

*q*

_{j}} (centered at

**) and evaluated via Eq. (A5). For continuous charge distribution, the summation over point charges in Eqs. (A8) and (A9) becomes a trace with 1-RDM, e.g.,**

*Q*Evaluation of the real multipole moment integrals over the Gaussian basis was implemented in the open-source Libint^{148} library following the recurrence formulas in Ref. 142.

To obtain meaningful (absolutely convergent) total and orbital energies, it is mandatory to compensate the unit cell’s electric dipoles (for 2-dimensional crystals) and quadrupoles (for three-dimensional crystals). This is achieved by adding charges and point dipoles to the surface of the unit cell in such a way that the net charge density in the bulk of the crystal is not changed, while the electric dipole and quadrupole moments of the unit cell vanish. The compensation can be viewed as modifying the boundary conditions such that the net charge density in the bulk of the crystal remains unchanged; the compensating charges/dipoles would end up on the surface of a finite crystal composed of the multipole-compensated unit cells and thus help cancel the shape-dependent contributions to the energy and potential.^{44,46,149} Charge compensation of the dipoles in the context of FMM was introduced by Kudin and Scuseria as soon as they started to utilize (C)FMM for LCAO calculations on 3-D solids.^{146} Here, we compensate not only the dipoles but also the Cartesian quadrupoles. The latter automatically ensures not only the absolute convergence of the potential (e.g., orbital energies) and the equivalence to the Ewald limit of the potential due to the vanishing trace of the second moment (spheropole) of the charge density.^{46,150} Our approach is formulated for a monoclinic (Bravais) lattice.

Dipole moments are compensated by placing charges of ±*q*_{i} at the center of opposing facets, namely, for each principal axis *i* (*i* = 0, 1, 2 for a 3D lattice), charges of *q*_{i} and −*q*_{i} are placed at *a*_{j}/2 + *a*_{k}/2 and *a*_{i} + *a*_{j}/2 + *a*_{k}/2, respectively, where *j* = mod(*i* + 1, 3), *k* = mod(*i* + 2, 2), and *a*_{i} are the unit cell vectors. The magnitudes of charges needed to compensate net dipole ** μ** are given by

with $b\u0303i\u2261bi/(2\pi )$ being the reduced reciprocal lattice vectors,

where *V* ≡ *a*_{0} · (*a*_{1} × *a*_{2}) is the unit cell volume.

Quadrupole moments are compensated by placing point dipoles ±*μ*_{i} at the center of opposing facets. Specifically, dipoles *μ*_{i} and −*μ*_{i} are placed at *a*_{j}/2 + *a*_{k}/2 and *a*_{i} + *a*_{j}/2 + *a*_{k}/2, respectively. The compensating dipoles needed to compensate unit cell’s net Cartesian quadrupole tensor ** q** (evaluated with the dipole-compensating charges) are given by

If needed, this formula can be straightforwardly generalized to higher-order multipoles. For simplicity, point dipoles were approximated by two opposite charges separated by 0.1 a.u.

## REFERENCES

*ı*, and

Since we only focus on the most time-consuming step, namely the construction of the direct-space exchange matrix [Eq. (16)], for a chosen unit cell, we do not analyze the cost with respect to the number of sampled ** k** points in the first Brillouin zone [

*N*

_{k}, see, for example, Eq. (18)]; in any case, the number of

**points can always be reduced to 1 by increasing the unit cell size appropriately.**

*k*