Efficient representations of the electron repulsion integral (ERI) tensor and fast algorithms for contractions with the ERI tensor often employ a low-rank approximation of the tensor or its sub-blocks. Such representations include density fitting (DF), the continuous fast multipole method (CFMM), and, more recently, hierarchical matrices. We apply the H2 hierarchical matrix representation to the ERI tensor with Gaussian basis sets to rapidly calculate the Coulomb matrices in Hartree–Fock and density functional theory calculations. The execution time and storage requirements of the hierarchical matrix approach and the DF approach are compared. The hierarchical matrix approach has very modest storage requirements, allowing large calculations to be performed in memory without recomputing ERIs. We interpret the hierarchical matrix approach as a multilevel, localized DF method and also discuss the close relationship between the hierarchical matrix approaches with CFMM. Like CFMM, the hierarchical matrix approach is asymptotically linear scaling, but the latter requires severalfold less memory (or severalfold less computation, if quantities are computed dynamically) due to being able to efficiently employ low-rank approximations for far more blocks.

Given Nbf basis functions {ϕa}, the electron repulsion integral (ERI) tensor

(ab|cd)=ϕa(r1)ϕb(r1)1|r1r2|ϕc(r2)ϕd(r2)dr1dr2

contains Nbf4 entries, with the number of non-negligible entries scaling as O(Nbf2) with a very large prefactor. Here, we assume that the basis functions are Gaussian type functions. The ERI tensor is generally so large that it cannot be stored in memory, so its entries are either stored on disk or are recomputed whenever they are needed. Furthermore, each integral is very costly to compute, depending on irregularly structured recurrence relations. Efficient representations of the ERI tensor exploit properties of the tensor to reduce computational cost as well as to reduce storage requirements. Once an efficient representation of the ERI tensor is formed, it may be used to accelerate subsequent calculations, such as the calculation of the Coulomb matrix. Calculating the Coulomb matrix is often the computational bottleneck in density functional theory (DFT) calculations.

Although they have different physical motivations, many existing efficient representations of the ERI tensor can be viewed algebraically as “compressing” the ERI tensor or its sub-blocks into low-rank form. For example, density fitting1–9 (DF) constructs a low-rank approximation of the entire ERI tensor. The continuous fast multipole method10–13 (CFMM) and related methods14–17 compress specific blocks of the ERI tensor into low-rank form. The clustered low-rank tensor format18 (CLR), proposed for compressing the blocks in the three-index tensors of DF, can also be used to compress the ERI tensor. The different methods differ in how they exploit the block low-rank structure of the ERI tensor, i.e., which blocks to compress and how to compress them.

Previously, the H2-ERI method19 was proposed as an efficient method to calculate the Coulomb matrix. This is accomplished by compressing specific ERI blocks and representing the overall ERI tensor in the H2 matrix representation. The H2 matrix representation is a type of hierarchical block low-rank representation that has both storage and matrix-vector multiplication costs linear in the matrix dimension. In Ref. 19, a Matlab implementation was used to calculate and verify the accuracy of Coulomb matrices using the H2-ERI method. In this paper, we use an optimized multithreaded C program to perform Hartree–Fock and DFT calculations with the H2-ERI method. We also perform calculations with DF for comparison of both accuracy and computational time. In comparison with DF and CFMM, we show that H2-ERI better exploits the block low-rank structure of the ERI tensor and thus provides a more efficient representation.

The ERI tensor (ab|cd) can be unfolded into an ERI matrix where rows are indexed by ab and columns are indexed by cd. Each entry of the ERI matrix is the Coulomb interaction between two generalized electron densities, ϕaϕb and ϕcϕd. It can be justified numerically that (1) the ERI matrix has numerical rank O(Nbf); (2) a block of the ERI matrix associated with two “well-separated” (to be defined in Sec. II) sets of electron densities has numerical rank O(1) independent of the block size. Such rank-O(1) blocks form the majority of the ERI matrix. Thus, DF must have approximation rank at least O(Nbf) [equivalent to using O(Nbf) auxiliary basis functions], while H2-ERI and CFMM can compress most blocks of the ERI matrix into rank-O(1) form. As a result, CFMM and H2-ERI are asymptotically more efficient than DF.

In CFMM, a large number of ERI blocks, though numerically low-rank, are not compressed because multipole expansions cannot be used for determining low-rank expansions when electron densities overlap numerically. Such blocks must be represented in dense form, and these dense ERI blocks dominate the storage (if precomputed) or computation (if dynamically computed) in CFMM. In comparison, H2-ERI uses a compression method that works for more ERI blocks (without needing to compute these ERI blocks explicitly) than multipole expansions. As a result, H2-ERI has the same asymptotic scalability as CFMM but is far more efficient.

It is worth noting that the CLR tensor format uses a one-level partitioning of a tensor into nonoverlapping blocks as opposed to the hierarchical partitioning used in H2-ERI and CFMM. This lack of hierarchy in partitioning leads to larger asymptotic computation and storage complexities when using CLR for the ERI tensor than CFMM and H2-ERI. CLR also does not provide a way to compress blocks without computing them, as opposed to H2-ERI and CFMM.

In this paper, we refer to a continuous electron density, or a product of two basis functions, ϕaϕb, as a distribution. Since we assume that the basis functions are Gaussian type functions (GTFs), the distributions are also GTFs. Such a distribution decays exponentially and thus has a bounded numerical support outside of which the distribution is negligible. We use a ball to compactly characterize the numerical support of a distribution and will often refer to the center of the distribution as the center of this ball.

Let I denote the complete set of distributions {ϕaϕb} for a molecular system and chosen basis set. We use the notation (I|I) to denote the Nbf2×Nbf2 ERI matrix. Subsets of I can be used in this notation to denote sub-blocks of the ERI matrix.

To illustrate the block low-rank structure of the ERI matrix, consider one ideal graphene layer with the STO-3G basis set. Place a cubic box B0=[L/2,L/2]3 centered on the graphene layer [see Fig. 1(a)]. Let Bnear=[3L/2,3L/2]3\B0 be the union of 26 adjacent boxes, and Bfar=[7L/2,7L/2]3\(BnearB0) be the union of 316 nonadjacent boxes. Denote the sets of distributions with the center in these three domains, B0, Bnear, and Bfar, as I0, Inear, and Ifar, respectively. Increasing the edge length L from 3 Bohrs to 9 Bohrs, we obtain a series of sets I0, Inear, and Ifar with increasing numbers of distributions. Figure 1(b) plots the numerical ranks of the ERI blocks (I0|I0), (I0|Inear), and (I0|Ifar) vs the size of I0. The numerical rank of a matrix block is estimated by its singular values with relative threshold 10−10.

FIG. 1.

(a) 2D illustration of the graphene layer and B0, Bnear, and Bfar; (b) numerical ranks of the three ERI blocks vs |I0|.

FIG. 1.

(a) 2D illustration of the graphene layer and B0, Bnear, and Bfar; (b) numerical ranks of the three ERI blocks vs |I0|.

Close modal

We observe that the numerical ranks of the ERI blocks (I0|I0) and (I0|Inear) both increase with the size of I0, while the numerical rank of (I0|Ifar) tends to be small and independent of the sizes of I0 and Ifar. This low-rank property of blocks of the form (I0|Ifar) is the basis of the H2-ERI method.

We say that two boxes of the same size are well-separated if they are separated by at least one box of the same size. As an example, box B0 and any box of the same size in Bfar are said to be well-separated.

Let the domain containing the centers of all distributions be partitioned into boxes of equal size. Let Ii denote the set of distributions in box i. Let Ji denote the set of distributions with centers in boxes that are well-separated from box i. The ERI block (Ii|Ji) is low-rank like the block (I0|Ifar) from Sec. II B. This block can be compressed into a low-rank form

(Ii|Ji)|Ii|×|Ji|Ui|Ii|×r0(Iiid|Ji)r0×|Ji|,
(1)

where Iiid is a subset of Ii and thus (Iiid|Ji) is a subset of the rows of (Ii|Ji), Ui is a tall-and-skinny matrix with bounded values, and r0 is the approximation rank. Such a low-rank form is called an interpolative decomposition20 (ID).

One could compute an ID algebraically via the pivoted QR decomposition given a rank or an absolute/relative error threshold. However, such a procedure requires that the ERI block (Ii|Ji) is formed explicitly, i.e., requiring computation of all the ERIs in (Ii|Ji). Both CFMM and H2-ERI avoid needing (Ii|Ji) in explicit form. In CFMM, the low-rank approximation is computed via multipole expansions if the distributions in Ii and Ji do not overlap numerically. This limitation of CFMM means that the low-rank form of many ERI blocks cannot be computed and are treated as full-rank, dense matrices. In H2-ERI, the low-rank approximation is computed without this limitation via a method called the proxy point method.19 The key that allows the ERI matrix to be efficiently represented in the H2 matrix representation is this proxy point method.

To briefly explain the proxy point method, let (I*|J*) denote any above ERI block to be compressed into ID form. Referring to Fig. 2 (left), the centers of the distributions in I* are contained in a box B, and the centers of the distributions in J* are within the union of well-separated boxes F. In the proxy point method, the distributions in J* are split into two subsets, Jnear and Jfar. The set Jnear contains the distributions that numerically overlap with B or its 26 adjacent boxes. These distributions are shown in green in Fig. 2. The set Jfar = J*Jnear. The set Jnear is usually a small fraction of J* since GTFs decay exponentially, while Jfar is a large fraction.

FIG. 2.

2D illustration of the proxy point method for an ERI block (I*|J*). A circle represents the numerical support of a distribution in J*. In the proxy point method, Jfar is replaced by a set of point charges Yp in F.

FIG. 2.

2D illustration of the proxy point method for an ERI block (I*|J*). A circle represents the numerical support of a distribution in J*. In the proxy point method, Jfar is replaced by a set of point charges Yp in F.

Close modal

The challenge is how to efficiently compress the interactions between I* and Jfar. Here, by virtue of Green’s theorem, we replace Jfar by a small set of point charges Yp located in F as illustrated in Fig. 2 (right), i.e., the column space of (I*|Jfar) is replaced by the column space of the much smaller matrix (I*|Yp). The elements of (I*|Yp) are much cheaper to compute than ERIs; the elements are interactions between distributions and point charges and are thus analogous to electron–nuclear attraction integrals.

Finally, the ID approximation (I*|J*)U(I*id|J*) is computed via the “mixed” ID approximations of (I*|Jnear) and (I*|Yp), as shown in Algorithm 1. The proxy points in Yp are uniformly sampled on several layers of cubic surfaces that enclose the interior boundary of domain F, as shown in Fig. 2. The size of Yp is independent of the overall problem size |I| or the sizes of I* and J*. In most cases, a few hundreds to one thousand proxy points for Yp are sufficient to guarantee the accuracy of the proxy point method. More details on the selection of Yp can be found in Ref. 19.

ALGORITHM 1

The proxy point method for (I*|J*).

Input: I*, J*, B, F 
Output: U and I*id for the ID approximation (I*|J*)U(I*id|J*) 
1: Split J* into Jnear and Jfar 
2: Select proxy point charges Yp in F 
3: Generate random matrices Ω1R|Jnear|×|I*| and Ω2R|Yp|×|I*| 
4: Calculate A1 = (I*|Jnear1 and A2 = (I*|Yp2 
5: Normalize the columns of A1 and A2 to obtain Ã1 and Ã2 
6: Compute U and I*id by an algebraic ID approximation of [Ã1, Ã2]. 
Input: I*, J*, B, F 
Output: U and I*id for the ID approximation (I*|J*)U(I*id|J*) 
1: Split J* into Jnear and Jfar 
2: Select proxy point charges Yp in F 
3: Generate random matrices Ω1R|Jnear|×|I*| and Ω2R|Yp|×|I*| 
4: Calculate A1 = (I*|Jnear1 and A2 = (I*|Yp2 
5: Normalize the columns of A1 and A2 to obtain Ã1 and Ã2 
6: Compute U and I*id by an algebraic ID approximation of [Ã1, Ã2]. 

In Algorithm 1, a randomized linear algebra technique21 is used in steps 3 and 4 to approximate the column spaces of (I*|Jnear) and (I*|Yp) by the column spaces of the smaller matrices A1 and A2, respectively. Typically, Ω1 and Ω2 are chosen to be dense with elements from a standard normal distribution. In this paper, to reduce the cost of the multiplications in step 4, we choose Ω1 and Ω2 as sparse random matrices, with 16 nonzero entries per column with locations selected randomly and nonzero values following a standard normal distribution.

We can now briefly describe the H2 matrix representation and establish additional terminology for this paper. See Ref. 22 for additional details.

An H2 matrix representation is composed of far-field (FF) blocks stored in low-rank form and near-field (NF) blocks stored in dense matrix form. FF blocks represent the interaction between distributions centered in two boxes that are well-separated. NF blocks represent the remaining interactions. See Fig. 3 (right) for an example.

FIG. 3.

1D illustration of a recursive partitioning of the distributions in I and the associated structure of the ERI matrix (I|I). NF blocks are white, and FF blocks are colored. Yellow FF blocks are defined by nodes in level 2, and green FF blocks are defined by nodes in level 3 of the partition tree.

FIG. 3.

1D illustration of a recursive partitioning of the distributions in I and the associated structure of the ERI matrix (I|I). NF blocks are white, and FF blocks are colored. Yellow FF blocks are defined by nodes in level 2, and green FF blocks are defined by nodes in level 3 of the partition tree.

Close modal

In the example, the distributions I in a 1D domain are hierarchically partitioned into boxes. The structure is represented by a partition tree [see Fig. 3 (left)]. Let IiI denote the set of distributions centered in box i and corresponding to node i in the partition tree. The distributions centered in the finest level boxes are labeled I7, …, I14 in the example. The union of the distributions I7 and I8 is I3. Larger FF blocks are formed from merging smaller FF blocks, if possible. In the example, the large FF block (I3|I5) is due to the interactions between I3 and I5, which are in well-separated boxes.

For each node i in the partition tree, the ID approximation of (Ii|Ji) [see Eq. (1)] is computed. Each FF block (Ii|Ij) is the intersection between the low-rank ERI blocks (Ii|Ji) and (Jj|Ij). [The latter is the transpose of (Ij|Jj).] Thus, the low-rank approximation of (Ii|Ij) can be constructed directly based on the ID approximations of the two blocks as

(Ii|Ij)Ui(Iiid|Ijid)UjT.
(2)

We refer to (Iiid|Ijid) as an intermediate block.

For a nonleaf node i in the partition tree, the ID approximation of (Ii|Ji) is constructed in terms of the ID approximations associated with its children nodes in order to reduce computation and storage cost.19 

In standard H2 matrix representations, the NF and intermediate blocks are stored in the dense matrix format, and storage of these dense blocks typically constitutes a large majority of the total storage cost. For ERI matrices, however, we note from Fig. 1(b) that the numerical ranks of (I0|I0) and (I0|Inear), although not independent of |I0|, are small relative to |I0|. Thus, in H2-ERI, we also compress each NF and intermediate block into an ID form using the pivoted QR decomposition when this is beneficial. Since the total size of NF and intermediate blocks is O(|I|), this additional compression does not change the asymptotic computation cost of H2-ERI.

Let r denote an upper bound on the rank of all the ID approximations. Then, the H2 matrix representation can be formed with O(|I|r2) computation cost and requires O(|I|r) storage cost.

H2-ERI uses a relative error threshold τ to control the accuracy of each computed ID approximation. The same τ is used in the compression of the NF and intermediate blocks. The constructed representation of (I|I) in the end has relative error O(τ) in the Frobenius norm.

The H2 matrix representation, once computed, can then be used to rapidly calculate the Coulomb matrix,

Jab=c,d(ab|cd)Dcd.

This operation is equivalent to the matrix-vector multiplication J = (I|I)D with the Coulomb matrix J and density matrix D unfolded into vectors. With (I|I) represented in the H2 format, the established fast H2 matrix-vector multiplication algorithm22,23 can calculate the Coulomb matrix with O(|I|r) computation cost.

In DF, a set of Naux auxiliary basis functions {ψα} is used to fit each distribution ϕaϕb by a linear combination, i.e., ϕaϕbαCa,bαψα. This leads to the general DF approximation

(ab|cd)α,βCa,bα(α|β)Cc,dβ.

In classical DF,3 the fitting coefficients Ca,bα are computed as Ca,bα=β(ab|β)(β|α)1, and the DF approximation becomes

(ab|cd)α,β(ab|α)(α|β)1(β|cd).

This can be viewed as a rank-Naux approximation of the ERI matrix (I|I). Typically, the number of auxiliary basis functions is about five times the number of basis functions, i.e., NauxO(Nbf).

In H2-ERI, each ID approximation (Ii|Ji)Ui(Iiid|Ji) can be viewed as a localized DF. More precisely, each row of (Ii|Ji) is approximated in the ID approximation as

(φk|Ji)uk,:(Iiid|Ji)=φαIiiduk,α(φα|Ji),φkIi,

where uk,: is the kth row of Ui and uk,α is the (k, α)th entry of Ui. Each distribution φkIi is thus approximated by a linear combination of the distributions in Iiid as

φkφαIiiduk,αφα.

From the viewpoint of DF, Iiid is exactly a set of auxiliary basis functions used to approximate Iifor the interactions between Iiand Ji, and Ui contains the associated fitting coefficients. Conversely, computing an ID approximation of an ERI block can be viewed as a numerical way to construct a DF approximation for the associated Coulomb interactions.

Most existing localized DF methods4–9,24 limit the number of auxiliary basis functions used to fit a distribution by heuristically applying a local fitting domain or a local fitting metric. In contrast, H2-ERI can be considered as a distinct localized DF approach that restricts DF to only approximate the Coulomb interactions between well-separated sets of distributions.

There are two main differences between H2-ERI and DF. The first is how accuracies of the constructed representations are controlled. In DF, the accuracy depends on the choice of a precomputed auxiliary basis set. In H2-ERI, the relative error threshold τ is used to directly control the accuracy of the H2 matrix representation.

The second difference is in the resulting approximation ranks, i.e., Naux and r. The maximum approximation rank r in H2-ERI is experimentally O(1) with increasing problem sizes, while Naux scales as O(Nbf). This results in H2-ERI requiring much less storage cost than DF.

In this section, we first verify the accuracy of Hartree–Fock (HF) and Kohn–Sham DFT calculations when H2-ERI is used to calculate the Coulomb matrices. We then compare the performance of the H2-ERI approach with that of DF in terms of computation time and storage.

Both HF and DFT use self-consistent field (SCF) iterations with direct inversion of the iterative subspace. Superposition of atomic densities is used for the initial density matrix. The SCF iterations are stopped when the energy difference between two consecutive iterations is less than 10−11 Hartrees. For DFT, we use the hybrid exchange–correlation functional B3LYP.

Previously, only the accuracy of the Coulomb matrices calculated by H2-ERI had been verified.19 In addition, H2-ERI was only used for ERI matrices with primitive basis functions. In this paper, we use contracted basis functions. Two basis sets are used: cc-pVDZ and aug-cc-pVDZ. The latter is important for comparison because it contains diffuse functions. For classical DF, the corresponding auxiliary basis sets cc-pVDZ-jkfit and aug-cc-pVDZ-jkfit are used.

H2-ERI and DF both start with prescreening negligible distributions in I = {ϕaϕb}. If

(ϕaϕb|ϕaϕb)1010/maxc,d(ϕcϕd|ϕcϕd),

then the distribution ϕaϕb is relatively very small, and the corresponding rows and columns of the ERI matrix are omitted from computations. We now use I to denote the set of distributions that survive prescreening. For large enough chemical systems, the scaling |I| ∼ O(Nbf) has been justified previously.2,10,28

In H2-ERI, we partition the distributions in I adaptively according to their centers. The hierarchical partitioning is stopped when each finest box has fewer than 400 unique distribution centers. However, since many distributions share the same center, they cannot be further split into subsets. These distributions result in a finest box that may contain more than 400 distributions.

The test calculations were carried out on a dual Intel Xeon Gold 6226 CPU computer with a total of 24 cores and 1.5 TB of memory. One hyperthread per core was used. Our software was developed in the GTFock framework,26,29 which contains very efficient C language parallel implementations of HF and DF. Our software uses H2Pack23 for storing and applying the H2 matrix representation and the Simint package30 for computing ERIs and other integrals.

In this section, we report on testing the accuracy of using H2-ERI in HF and DFT ground state energy calculations. We assume that the exact energies are computed from our baseline HF and DFT calculations, which use “direct” calculation of the Coulomb matrices. In direct calculation, the ERIs that survive Cauchy–Schwarz screening are recomputed at every SCF iteration, and the Coulomb and exchange matrices are computed via the usual tensor contractions.

We first consider eight sets of test molecules from the “large system” group of the GMTKN55 benchmark database.25Table I shows the average, maximum, and standard deviation of the absolute errors (difference from the baseline) in HF and DFT ground state energy per electron for the molecules in each of the test sets when the Coulomb matrices are calculated by H2-ERI and by DF. In these calculations, the cc-pVDZ basis set was used. H2-ERI used a relative error threshold of τ = 10−7.

TABLE I.

Average, maximum, and standard deviation of the absolute errors in ground state energy per electron (in Hartrees) for test sets from GMTKN55. Results for HF and DFT calculations are shown, each using DF and H2-ERI for calculating Coulomb matrices.

HFDFT (B3LYP)
DFH2-ERIDFH2-ERI
avemaxstdavemaxstdavemaxstdavemaxstd
BSR36 2.0 × 10−6 2.4 × 10−6 2.0 × 10−7 6.7 × 10−9 1.6 × 10−8 4.5 × 10−9 1.7 × 10−6 2.0 × 10−6 1.7 × 10−7 7.7 × 10−9 1.6 × 10−8 4.6 × 10−9 
CDIE20 1.8 × 10−6 2.2 × 10−6 2.1 × 10−7 1.1 × 10−8 1.8 × 10−8 2.7 × 10−9 1.8 × 10−6 1.4 × 10−5 2.2 × 10−6 1.6 × 10−8 1.4 × 10−7 2.2 × 10−8 
DARC 1.9 × 10−6 2.6 × 10−6 4.8 × 10−7 1.0 × 10−8 1.6 × 10−8 4.6 × 10−9 1.6 × 10−6 2.2 × 10−6 4.3 × 10−7 1.1 × 10−8 2.1 × 10−8 5.6 × 10−9 
PArel 2.4 × 10−6 9.4 × 10−6 2.2 × 10−6 2.0 × 10−7 3.3 × 10−6 7.1 × 10−7 1.5 × 10−6 2.8 × 10−6 7.4 × 10−7 9.1 × 10−9 1.6 × 10−8 3.9 × 10−9 
RSE43 8.9 × 10−5 1.6 × 10−3 2.7 × 10−4 1.7 × 10−5 2.1 × 10−4 4.9 × 10−5 8.4 × 10−5 1.5 × 10−3 2.5 × 10−4 2.4 × 10−5 4.8 × 10−4 7.1 × 10−5 
ISO34 2.0 × 10−6 4.9 × 10−6 9.6 × 10−7 1.0 × 10−8 1.5 × 10−8 3.2 × 10−9 1.7 × 10−6 4.3 × 10−6 8.6 × 10−7 1.2 × 10−8 1.9 × 10−8 3.6 × 10−9 
ISOL24 1.7 × 10−6 3.2 × 10−6 6.4 × 10−7 2.5 × 10−7 9.5 × 10−6 1.5 × 10−6 2.1 × 10−5 5.9 × 10−4 1.1 × 10−4 1.4 × 10−6 4.2 × 10−5 7.7 × 10−6 
C60ISO 1.9 × 10−6 1.9 × 10−6 2.2 × 10−8 1.1 × 10−8 3.8 × 10−8 1.3 × 10−8 1.6 × 10−6 1.6 × 10−6 1.7 × 10−8 2.1 × 10−8 5.5 × 10−8 1.6 × 10−8 
HFDFT (B3LYP)
DFH2-ERIDFH2-ERI
avemaxstdavemaxstdavemaxstdavemaxstd
BSR36 2.0 × 10−6 2.4 × 10−6 2.0 × 10−7 6.7 × 10−9 1.6 × 10−8 4.5 × 10−9 1.7 × 10−6 2.0 × 10−6 1.7 × 10−7 7.7 × 10−9 1.6 × 10−8 4.6 × 10−9 
CDIE20 1.8 × 10−6 2.2 × 10−6 2.1 × 10−7 1.1 × 10−8 1.8 × 10−8 2.7 × 10−9 1.8 × 10−6 1.4 × 10−5 2.2 × 10−6 1.6 × 10−8 1.4 × 10−7 2.2 × 10−8 
DARC 1.9 × 10−6 2.6 × 10−6 4.8 × 10−7 1.0 × 10−8 1.6 × 10−8 4.6 × 10−9 1.6 × 10−6 2.2 × 10−6 4.3 × 10−7 1.1 × 10−8 2.1 × 10−8 5.6 × 10−9 
PArel 2.4 × 10−6 9.4 × 10−6 2.2 × 10−6 2.0 × 10−7 3.3 × 10−6 7.1 × 10−7 1.5 × 10−6 2.8 × 10−6 7.4 × 10−7 9.1 × 10−9 1.6 × 10−8 3.9 × 10−9 
RSE43 8.9 × 10−5 1.6 × 10−3 2.7 × 10−4 1.7 × 10−5 2.1 × 10−4 4.9 × 10−5 8.4 × 10−5 1.5 × 10−3 2.5 × 10−4 2.4 × 10−5 4.8 × 10−4 7.1 × 10−5 
ISO34 2.0 × 10−6 4.9 × 10−6 9.6 × 10−7 1.0 × 10−8 1.5 × 10−8 3.2 × 10−9 1.7 × 10−6 4.3 × 10−6 8.6 × 10−7 1.2 × 10−8 1.9 × 10−8 3.6 × 10−9 
ISOL24 1.7 × 10−6 3.2 × 10−6 6.4 × 10−7 2.5 × 10−7 9.5 × 10−6 1.5 × 10−6 2.1 × 10−5 5.9 × 10−4 1.1 × 10−4 1.4 × 10−6 4.2 × 10−5 7.7 × 10−6 
C60ISO 1.9 × 10−6 1.9 × 10−6 2.2 × 10−8 1.1 × 10−8 3.8 × 10−8 1.3 × 10−8 1.6 × 10−6 1.6 × 10−6 1.7 × 10−8 2.1 × 10−8 5.5 × 10−8 1.6 × 10−8 

We observe that the average error for H2-ERI is always smaller than 1.5 × 10−3 Hartrees (chemical accuracy). For a small number of molecules in the RSE43 test set, the maximum error is larger than chemical accuracy. In these cases, DF also shows error larger than chemical accuracy. These cases are also associated with slow convergence of the SCF iterations compared to the other cases. We note that the averages in Table I do not include cases where SCF did not converge in the baseline calculations and did not include the molecule “i12p” in the ISOL24 test set, where the HF SCF iterations in the DF case did not converge to the baseline energy. The statistics consider 301 molecules for HF and 292 molecules for DFT, out of 379 total molecules in the eight test sets.

As the molecules in the GMTKN55 benchmark database are rather small for demonstrating the efficiency of H2-ERI, we now consider three types of larger test molecules: alkanes, graphenes, and truncated protein–ligand systems. The latter are derived from the protein–ligand system “1hsg” from the protein data bank. These systems consist of a ligand and a portion of its protein environment within a given distance of the ligand (see Refs. 26 and 27 for more information).

Table II shows the ground state energy error for the larger test molecules. The results show that HF and DFT calculations using H2-ERI, which uses error tolerance τ = 10−7, achieves better than chemical accuracy for this set of test molecules. The number of SCF iterations (see the  Appendix) remained essentially the same. We also observe that the energy errors for the aug-cc-pVDZ basis set are larger than for the cc-pVDZ basis set.

TABLE II.

Signed errors (in Hartrees) of the ground state energies computed in HF and DFT (B3LYP) with cc-pVDZ and aug-cc-pVDZ basis sets. The dashed entries indicate that the baseline calculation did not converge. Nbf is the number of basis functions, Naux is the number of auxiliary basis functions in DF, ravg is the average rank of all the ERI blocks (Ii|Ji) in H2-ERI, and |I| is the number of distributions after prescreening.

HFDFT (B3LYP)
NbfNauxravg|I|DFH2-ERIDFH2-ERI
cc-pVDZ         
Alkane C60H122 1510 7 910 163 183 148 −1.0 × 10−3 −7.2 × 10−6 −8.7 × 10−4 −4.8 × 10−6 
Alkane C100H202 2510 13 150 135 310 068 −1.7 × 10−3 −1.8 × 10−5 −1.5 × 10−3 −1.0 × 10−5 
Graphene C96H24 1560 8 376 205 443 319 −4.3 × 10−4 1.3 × 10−5 −3.7 × 10−4 4.5 × 10−5 
Graphene C150H30 2400 12 900 227 744 909 −6.6 × 10−4 4.1 × 10−5 −5.7 × 10−4 9.8 × 10−5 
1hsg30 1240 6 572 109 251 261 −7.9 × 10−4 −2.1 × 10−5 −6.8 × 10−4 −3.4 × 10−5 
1hsg32 1560 8 268 105 356 063 −9.7 × 10−4 −7.5 × 10−6 −8.4 × 10−4 4.1 × 10−6 
aug-cc-pVDZ         
Alkane C60H122 2598 10 330 161 801 403 −9.6 × 10−4 8.9 × 10−6 −8.9 × 10−4 9.1 × 10−5 
Alkane C100H202 4318 17 170 176 1 373 643 −1.6 × 10−3 5.5 × 10−5 −1.5 × 10−3 2.6 × 10−4 
Graphene C54H18 1512 6 084 317 859 638 −2.2 × 10−4 2.2 × 10−5 −1.9 × 10−4 −3.5 × 10−5 
Graphene C96H24 2616 10 536 256 2 029 611 −3.9 × 10−4 2.8 × 10−4 −3.4 × 10−4 6.0 × 10−4 
1hsg30 2108 8 432 166 1 301 608 −7.0 × 10−4 1.8 × 10−4 ⋯ ⋯ 
1hsg32 2652 10 608 162 1 902 022 −8.1 × 10−4 4.6 × 10−4 ⋯ ⋯ 
HFDFT (B3LYP)
NbfNauxravg|I|DFH2-ERIDFH2-ERI
cc-pVDZ         
Alkane C60H122 1510 7 910 163 183 148 −1.0 × 10−3 −7.2 × 10−6 −8.7 × 10−4 −4.8 × 10−6 
Alkane C100H202 2510 13 150 135 310 068 −1.7 × 10−3 −1.8 × 10−5 −1.5 × 10−3 −1.0 × 10−5 
Graphene C96H24 1560 8 376 205 443 319 −4.3 × 10−4 1.3 × 10−5 −3.7 × 10−4 4.5 × 10−5 
Graphene C150H30 2400 12 900 227 744 909 −6.6 × 10−4 4.1 × 10−5 −5.7 × 10−4 9.8 × 10−5 
1hsg30 1240 6 572 109 251 261 −7.9 × 10−4 −2.1 × 10−5 −6.8 × 10−4 −3.4 × 10−5 
1hsg32 1560 8 268 105 356 063 −9.7 × 10−4 −7.5 × 10−6 −8.4 × 10−4 4.1 × 10−6 
aug-cc-pVDZ         
Alkane C60H122 2598 10 330 161 801 403 −9.6 × 10−4 8.9 × 10−6 −8.9 × 10−4 9.1 × 10−5 
Alkane C100H202 4318 17 170 176 1 373 643 −1.6 × 10−3 5.5 × 10−5 −1.5 × 10−3 2.6 × 10−4 
Graphene C54H18 1512 6 084 317 859 638 −2.2 × 10−4 2.2 × 10−5 −1.9 × 10−4 −3.5 × 10−5 
Graphene C96H24 2616 10 536 256 2 029 611 −3.9 × 10−4 2.8 × 10−4 −3.4 × 10−4 6.0 × 10−4 
1hsg30 2108 8 432 166 1 301 608 −7.0 × 10−4 1.8 × 10−4 ⋯ ⋯ 
1hsg32 2652 10 608 162 1 902 022 −8.1 × 10−4 4.6 × 10−4 ⋯ ⋯ 

Table II also shows the average rank ravg of all the ERI blocks (Ii|Ji) in the H2 matrix representation. The results show that the average rank is very small relative to the number of auxiliary basis functions Naux used in DF. Furthermore, this rank appears to be bounded and may even decrease when the molecular system size increases (for the same molecular system type). In contrast, in DF, Naux grows with the system size. Thus, we expect that the total storage cost for H2-ERI will be low compared to DF and only grow linearly with the system size, rather than superlinearly for DF.

To study possible error cancellation in quantum chemical computation using H2-ERI, we consider perturbations of the truncated protein–ligand system 1hsg32. This system consists of a ligand and the portion of its protein environment within 3.2 Å of the ligand. Perturbed systems were produced by shifting the ligand toward the protein pocket along the vector joining the pair of ligand and protein atoms that are the closest. Two perturbed systems, corresponding to shifts of 0.25 Å and 0.5 Å, were used. HF calculations were performed using the cc-pVDZ basis set.

Table III shows the ground state energies computed by direct, DF, and H2-ERI methods, for the original and the two perturbed systems. As previously observed, if the direct calculation is assumed to be exact, then H2-ERI appears to have smaller errors than DF. However, the main feature in Table III is the error differences between the original and perturbed systems. Again assuming that the direct calculation is exact, we observe that the error in the energy differences for DF are now very small and comparable to the error in the energy differences for H2-ERI.

TABLE III.

Energy (in Hartrees) of three truncated protein–ligand configurations calculated by three methods. Assuming the direct calculation to be exact, the error in energy is greater for DF than for H2-ERI. Energy differences between the shift = 0 configuration and the other two configurations are also shown. Again assuming the direct calculation to be exact, the error in the energy differences is now comparable due to beneficial error cancellation in DF.

Ligand shift (in Å)00.250.5
Energy 
Direct −3756.922 322 5 −3756.907 960 9 −3756.858 213 0 
DF −3756.923 292 7 −3756.908 928 6 −3756.859 174 6 
H2-ERI −3756.922 329 9 −3756.907 967 4 −3756.858 222 1 
Error in energy 
DF −9.70 × 10−4 −9.68 × 10−4 −9.62 × 10−4 
H2-ERI −7.49 × 10−6 −6.53 × 10−6 −9.09 × 10−6 
Energy difference 
Direct  0.014 361 6 0.064 109 5 
DF  0.014 364 1 0.064 118 1 
H2-ERI  0.014 362 5 0.064 107 9 
Error in energy difference 
DF  2.53 × 10−6 8.62 × 10−6 
H2-ERI  9.57 × 10−7 −1.60 × 10−6 
Ligand shift (in Å)00.250.5
Energy 
Direct −3756.922 322 5 −3756.907 960 9 −3756.858 213 0 
DF −3756.923 292 7 −3756.908 928 6 −3756.859 174 6 
H2-ERI −3756.922 329 9 −3756.907 967 4 −3756.858 222 1 
Error in energy 
DF −9.70 × 10−4 −9.68 × 10−4 −9.62 × 10−4 
H2-ERI −7.49 × 10−6 −6.53 × 10−6 −9.09 × 10−6 
Energy difference 
Direct  0.014 361 6 0.064 109 5 
DF  0.014 364 1 0.064 118 1 
H2-ERI  0.014 362 5 0.064 107 9 
Error in energy difference 
DF  2.53 × 10−6 8.62 × 10−6 
H2-ERI  9.57 × 10−7 −1.60 × 10−6 

DF has significant error cancellation in computing the energy differences (around two digits of accuracy improvement from ground state energies) but H2-ERI does not. This lack of error cancellation in H2-ERI is expected because H2-ERI only focuses on accurately approximating the ERI matrix blocks algebraically. Errors in the computed energies generally would not be biased toward any specific direction, e.g., H2-ERI has both positive and negative errors in Table II, but DF only has negative errors. Note that even without significant error cancellation, H2-ERI still has at least comparable accuracy with DF in computing energy differences, as shown in Table III.

The use of H2-ERI and DF both require a precomputation step. For H2-ERI, this is the construction of the H2 matrix representation. For DF, this is the construction of (ab|α) and (α|β)−1. After the precomputation step, the SCF iterations calculate the Coulomb matrix once per iteration.

For H2-ERI and DF, Table IV lists the execution timings for the precomputation and for one calculation of the Coulomb matrix. The memory storage costs for H2-ERI and DF are also listed. For reference, the timings for direct calculation of the Coulomb matrix are also shown. For direct calculation, eight-way symmetry and Schwarz screening of the ERI tensor are exploited. Precomputation and storage are not needed in direct calculation. Timings for calculating the exact exchange term and exchange–correlation term in the DFT term are provided in the  Appendix.

TABLE IV.

Execution time (in seconds) and storage cost (in GB) for one Coulomb matrix calculation using direct calculation, DF, and H2-ERI. For DF and H2-ERI, the execution time is given separately for precomputation (“precomp.”) and the calculation of Coulomb matrices (“J”). The timings for calculating Coulomb matrices are averaged over five runs.

DirectDFH2-ERI
JStoragePrecomp.JStoragePrecomp.J
cc-pVDZ        
Alkane C60H122 25.52 17.3 3.20 0.19 2.8 19.89 0.05 
Alkane C100H202 74.62 48.6 9.07 0.54 4.8 25.64 0.07 
Graphene C96H24 101.63 41.1 5.53 0.72 5.4 67.82 0.11 
Graphene C150H30 273.56 104.5 13.73 1.11 9.3 136.33 0.15 
1hsg30 32.75 17.7 2.70 0.19 4.7 34.08 0.06 
1hsg32 60.85 30.6 4.76 0.37 6.4 45.93 0.08 
aug-cc-pVDZ        
Alkane C60H122 350.29 101.3 12.39 1.43 21.7 130.60 0.30 
Alkane C100H202 1025.82 287.0 39.00 3.27 41.5 241.71 0.61 
Graphene C54H18 338.48 67.6 9.76 0.76 16.1 349.22 0.25 
Graphene C96H24 1597.58 260.2 33.98 2.73 34.4 664.28 0.53 
1hsg30 725.67 130.6 13.90 2.26 47.8 478.04 0.61 
1hsg32 1481.97 236.1 25.54 2.65 70.8 683.31 0.88 
DirectDFH2-ERI
JStoragePrecomp.JStoragePrecomp.J
cc-pVDZ        
Alkane C60H122 25.52 17.3 3.20 0.19 2.8 19.89 0.05 
Alkane C100H202 74.62 48.6 9.07 0.54 4.8 25.64 0.07 
Graphene C96H24 101.63 41.1 5.53 0.72 5.4 67.82 0.11 
Graphene C150H30 273.56 104.5 13.73 1.11 9.3 136.33 0.15 
1hsg30 32.75 17.7 2.70 0.19 4.7 34.08 0.06 
1hsg32 60.85 30.6 4.76 0.37 6.4 45.93 0.08 
aug-cc-pVDZ        
Alkane C60H122 350.29 101.3 12.39 1.43 21.7 130.60 0.30 
Alkane C100H202 1025.82 287.0 39.00 3.27 41.5 241.71 0.61 
Graphene C54H18 338.48 67.6 9.76 0.76 16.1 349.22 0.25 
Graphene C96H24 1597.58 260.2 33.98 2.73 34.4 664.28 0.53 
1hsg30 725.67 130.6 13.90 2.26 47.8 478.04 0.61 
1hsg32 1481.97 236.1 25.54 2.65 70.8 683.31 0.88 

Although the execution time for calculating the Coulomb matrix is lower for H2-ERI than for DF, the results show that the main advantage of H2-ERI over DF is that of memory storage requirements. DF requires storing (ab|α) and (α|β)−1, which can be very large. Thus, H2-ERI extends the size of the molecular systems that can be processed in memory. In particular, if the DF memory requirement exceeds that of a given machine, then H2-ERI could be used instead.

On the other hand, the precomputation cost of H2-ERI is higher than that of DF. High precomputation cost is a common issue with the use of H2 matrix representations, in general, although such precomputation costs are amortized over each time the H2 matrix representation is used (e.g., to form the Coulomb matrix).

The execution time of the H2-ERI method (both precomputation and calculating the Coulomb matrix) scales linearly with the number of distributions |I| (after prescreening), as verified in Fig. 4, for a sequence of 1hsg systems ranging from 124 to 1208 atoms with the cc-pVDZ basis set. The storage cost also scales linearly.

FIG. 4.

(a) Execution time and (b) storage cost vs |I| for calculating the Coulomb matrix via direct calculation, H2-ERI, and DF. The molecular systems are 1hsg systems of different sizes. The estimated slope of each curve in these log–log plots are marked along the curve.

FIG. 4.

(a) Execution time and (b) storage cost vs |I| for calculating the Coulomb matrix via direct calculation, H2-ERI, and DF. The molecular systems are 1hsg systems of different sizes. The estimated slope of each curve in these log–log plots are marked along the curve.

Close modal

In contrast, DF computation and storage scales with an exponent of ∼1.7 over this range. Due to better scaling of H2-ERI, the precomputation time for H2-ERI would be smaller than that of DF for large enough systems. However, if the three-index tensors of DF are stored in memory, the bottleneck for DF is memory usage. Calculations for DF were not carried out if the memory storage exceeded 1 TB.

Since the scaling of H2-ERI is linear with respect to |I|, the scaling with respect to the number of basis functions Nbf (equivalently, the number of atoms) is also linear if |I| is linear in Nbf. This holds true for large molecular system sizes. Figure 5 plots the ratio of |I| to Nbf for a range of molecular system sizes. A ratio curve turning flat indicates that |I| becomes linear in Nbf. As can be observed, for systems that are more globular and for basis sets with more diffuse functions, the point at which the ratio curve turns flat is larger.

FIG. 5.

Ratio of |I| (after prescreening) to Nbf for (a) alkanes, (b) graphenes, and (c) 1hsg systems of different sizes. Results for cc-pVDZ and aug-cc-pVDZ basis sets are given.

FIG. 5.

Ratio of |I| (after prescreening) to Nbf for (a) alkanes, (b) graphenes, and (c) 1hsg systems of different sizes. Results for cc-pVDZ and aug-cc-pVDZ basis sets are given.

Close modal

H2-ERI and CFMM share the same hierarchical partitioning of I. From an algebraic viewpoint, CFMM is equivalent to multiplication by the ERI matrix in a specific H2 matrix representation. This H2 matrix representation in CFMM differs from the H2 matrix representation constructed for H2-ERI in two important ways: the definition of FF blocks and the compression of FF blocks.

In CFMM, (Ii|Ij) is a FF block if the following two conditions hold: boxes i and j are well-separated, and the numerical supports of distributions in box i do not overlap with those in box j. The numerical support of a GTF distribution is characterized by a ball in CFMM, and the radius of this ball is referred to as the “extent” of the distribution. In comparison, H2-ERI does not require the second condition. As a result of this difference, CFMM defines far more NF blocks than H2-ERI since, for typical problems, the numerical support of a distribution usually spreads over several boxes at the leaf level of the partition tree.

Consider the 1D hierarchical partitioning in Fig. 3 as an example. The FF and NF blocks in H2-ERI plotted in Fig. 3 are independent of the actual distribution extents. Assuming that all the distributions have their extents being 0.9× (and 1.4×), the edge length of a leaf box, the corresponding FF and NF blocks defined in CFMM are plotted in Fig. 6(a) [and Fig. 6(b)]. In addition, Table V lists the total number of entries of the NF and FF blocks defined in both CFMM and H2-ERI for several examples. As can be observed from both the abstract and practical examples, far more NF blocks are defined in CFMM than in H2-ERI, especially when basis sets with diffuse functions are used and when the molecular structure is globular.

FIG. 6.

The NF (white) and FF (yellow and green) blocks defined in CFMM associated with the 1D example in Fig. 3 when the distribution extent equals (a) 0.9× and (b) 1.4×, the edge length of a leaf box. In (a), the hatched block (I7|I9) is a NF block because I7 and I9 have overlapping distributions, and similarly for the hatched block (I7|I10) in (b). In addition, (I3|I5) is a FF block in (a) but is not in (b) due to the assumption of a larger distribution extent in (b).

FIG. 6.

The NF (white) and FF (yellow and green) blocks defined in CFMM associated with the 1D example in Fig. 3 when the distribution extent equals (a) 0.9× and (b) 1.4×, the edge length of a leaf box. In (a), the hatched block (I7|I9) is a NF block because I7 and I9 have overlapping distributions, and similarly for the hatched block (I7|I10) in (b). In addition, (I3|I5) is a FF block in (a) but is not in (b) due to the assumption of a larger distribution extent in (b).

Close modal
TABLE V.

Total number of entries in the FF and NF blocks defined in H2-ERI and in CFMM. “Ratio” refers to the ratio of the number of NF block entries in CFMM to that in H2-ERI. Symmetry of the NF and FF blocks in the ERI matrix is considered.

CFMMH2-ERI
NFFFNFFFRatio
cc-pVDZ      
Alkane C60H122 2.2 × 109 1.5 × 1010 1.1 × 109 1.6 × 1010 2.0 
Alkane C100H202 3.6 × 109 4.5 × 1010 1.6 × 109 4.7 × 1010 2.3 
Graphene C96H24 1.5 × 1010 8.3 × 1010 1.8 × 109 9.7 × 1010 8.4 
Graphene C150H30 3.2 × 1010 2.5 × 1011 4.4 × 109 2.7 × 1011 7.4 
1hsg30 8.0 × 109 2.4 × 1010 1.4 × 109 3.0 × 1010 5.8 
1hsg32 1.3 × 1010 5.1 × 1010 2.1 × 109 6.1 × 1010 5.9 
aug-cc-pVDZ      
Alkane C60H122 3.5 × 1010 2.9 × 1011 4.8 × 109 3.2 × 1011 7.3 
Alkane C100H202 6.2 × 1010 8.8 × 1011 7.9 × 109 9.4 × 1011 7.8 
Graphene C54H18 1.5 × 1011 2.2 × 1011 6.1 × 109 3.6 × 1011 24.3 
Graphene C96H24 4.2 × 1011 1.6 × 1012 1.1 × 1010 2.0 × 1012 38.0 
1hsg30 2.3 × 1011 6.2 × 1011 1.1 × 1010 8.4 × 1011 21.6 
1hsg32 4.0 × 1011 1.4 × 1012 1.6 × 1010 1.8 × 1012 25.3 
CFMMH2-ERI
NFFFNFFFRatio
cc-pVDZ      
Alkane C60H122 2.2 × 109 1.5 × 1010 1.1 × 109 1.6 × 1010 2.0 
Alkane C100H202 3.6 × 109 4.5 × 1010 1.6 × 109 4.7 × 1010 2.3 
Graphene C96H24 1.5 × 1010 8.3 × 1010 1.8 × 109 9.7 × 1010 8.4 
Graphene C150H30 3.2 × 1010 2.5 × 1011 4.4 × 109 2.7 × 1011 7.4 
1hsg30 8.0 × 109 2.4 × 1010 1.4 × 109 3.0 × 1010 5.8 
1hsg32 1.3 × 1010 5.1 × 1010 2.1 × 109 6.1 × 1010 5.9 
aug-cc-pVDZ      
Alkane C60H122 3.5 × 1010 2.9 × 1011 4.8 × 109 3.2 × 1011 7.3 
Alkane C100H202 6.2 × 1010 8.8 × 1011 7.9 × 109 9.4 × 1011 7.8 
Graphene C54H18 1.5 × 1011 2.2 × 1011 6.1 × 109 3.6 × 1011 24.3 
Graphene C96H24 4.2 × 1011 1.6 × 1012 1.1 × 1010 2.0 × 1012 38.0 
1hsg30 2.3 × 1011 6.2 × 1011 1.1 × 1010 8.4 × 1011 21.6 
1hsg32 4.0 × 1011 1.4 × 1012 1.6 × 1010 1.8 × 1012 25.3 

It is worth noting that, in CFMM, distributions in each leaf-level set Ii are further grouped into “branches” according to their extents, and each NF block is then subdivided into smaller blocks, some of which can also be characterized as FF blocks and compressed. Figure 6 does not illustrate the “branch” idea for simplicity, but the number of NF block entries counted in Table V has taken this approach into account.

In both H2-ERI and CFMM, the NF blocks can be precomputed, stored in memory, and recalled when needed. Alternatively, the NF blocks can be computed when they are needed. The bottleneck in CFMM is usually the storage or computation of the NF blocks.17 Thus, the reduction in the number of NF blocks in H2-ERI compared to CFMM alleviates this bottleneck.

The reason for this restricted definition of FF blocks in CFMM is that the application of the multipole expansion technique used in CFMM is restrictive, i.e., requiring two distributions to be nonoverlapping. Using multipole expansions, a FF block (Ii|Ij) in CFMM is compressed into the low-rank form

(Ii|Ij)TiBi,jSj,

where Sj corresponds to the source-to-multipole linear operator for box j, Bi,j corresponds to the multipole-to-local linear operator from box j to box i, and Ti corresponds to the local-to-target linear operator for box i. Comparing this approximation with (Ii|Ij)Ui(Iiid|Ijid)UjT in H2-ERI, we can note that Ti, Sj, and Bi,j in CFMM correspond to Ui, UjT, and (Iiid|Ijid) in H2-ERI, respectively.

In CFMM, all operators Ti, Sj, and Bi,j can be analytically computed using Ii, Ij, and geometric information for boxes i and j. These operators can be dynamically computed when needed. In H2-ERI, however, the components Ui and Iiid must be precomputed via ID approximation of ERI blocks. Experimentally, the computation cost in H2-ERI for constructing Ui and Iiid is usually multiple times the computation cost for evaluating and compressing the NF blocks, and the storage cost for Ui and Iiid is very small compared to the storage cost of the NF blocks. Thus, compared to CFMM, the advantage of H2-ERI having far fewer NF blocks could easily offset the additional precomputation and storage cost required for Ui and Iiid.

The main advantage of H2-ERI over DF for calculating the Coulomb matrix is the dramatically reduced storage needed for an accurate representation of the ERI matrix, allowing large-scale quantum chemical computations in memory without recomputing ERIs. On the other hand, H2-ERI has a relatively expensive precomputation step compared to DF. The cost of this precomputation, however, is usually much less than the cost of forming the Coulomb matrix directly (Table IV) and can be amortized over multiple SCF iterations where the Coulomb matrix is calculated each time.

Compared to CFMM, H2-ERI reduces the number of NF blocks. This reduces the storage required for the compressed representation of the ERI tensor (if the NF blocks are precomputed and stored) or the computation required when computing the Coulomb matrix (if the NF blocks are computed dynamically when needed). We note that the precomputation time we have reported for H2-ERI includes the time for computing the required NF blocks.

Once the ERI tensor is represented in the H2 matrix form, the possibility exists for using this representation to accelerate other computations involving the ERI tensor. The H2 matrix form could also be used to efficiently represent the three-index tensor (ab|α) in DF.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

The authors would like to thank C. David Sherrill for helpful discussions. The authors gratefully acknowledge funding from the National Science Foundation (Grant No. ACI-1609842).

Corresponding to the data in Tables II and IV, Table VI shows the number of SCF iterations and the average time per iteration for direct calculation of the Coulomb matrix, direct calculation of the exchange matrix, and calculation of the DFT exchange–correlation matrix (not including time for the exchange matrix). We note that our implementation of the DFT exchange–correlation matrix calculation is not optimized.

TABLE VI.

For both HF and DFT, the number of SCF iterations when using direct calculation, density fitting, and H2-ERI. Average timings are also shown (per iteration) for direct Coulomb and exchange matrix calculation (“Direct-J” and “Direct-K,” respectively) and for XC matrix calculation (not including time for the exchange part).

HF #IterDFT (B3LYP) #IterTime (s)
DirectDFH2-ERIDirectDFH2-ERIDirect-JDirect-KXC
cc-pVDZ          
Alkane C60H122 10 10 10 12 12 12 25.5 23.2 44.3 
Alkane C100H202 10 10 10 12 12 12 74.6 67.5 166.1 
Graphene C96H24 15 15 15 13 13 14 101.6 94.1 35.9 
Graphene C150H30 17 17 17 14 14 14 273.6 253.9 103.2 
1hsg30 15 16 15 22 21 19 32.8 29.9 26.9 
1hsg32 16 16 16 17 19 19 60.9 55.5 44.3 
aug-cc-pVDZ          
Alkane C60H122 11 11 11 19 19 17 350.3 305.6 44.3 
Alkane C100H202 11 11 11 16 16 16 1025.8 892.8 166.1 
Graphene C54H18 14 14 14 13 11 11 338.5 304.3 35.9 
Graphene C96H24 15 15 15 14 13 12 1597.6 1432.5 103.2 
1hsg30 16 16 16 ⋯ ⋯ ⋯ 725.7 634.4 26.9 
1hsg32 19 19 19 ⋯ ⋯ ⋯ 1482.0 1330.3 101.6 
HF #IterDFT (B3LYP) #IterTime (s)
DirectDFH2-ERIDirectDFH2-ERIDirect-JDirect-KXC
cc-pVDZ          
Alkane C60H122 10 10 10 12 12 12 25.5 23.2 44.3 
Alkane C100H202 10 10 10 12 12 12 74.6 67.5 166.1 
Graphene C96H24 15 15 15 13 13 14 101.6 94.1 35.9 
Graphene C150H30 17 17 17 14 14 14 273.6 253.9 103.2 
1hsg30 15 16 15 22 21 19 32.8 29.9 26.9 
1hsg32 16 16 16 17 19 19 60.9 55.5 44.3 
aug-cc-pVDZ          
Alkane C60H122 11 11 11 19 19 17 350.3 305.6 44.3 
Alkane C100H202 11 11 11 16 16 16 1025.8 892.8 166.1 
Graphene C54H18 14 14 14 13 11 11 338.5 304.3 35.9 
Graphene C96H24 15 15 15 14 13 12 1597.6 1432.5 103.2 
1hsg30 16 16 16 ⋯ ⋯ ⋯ 725.7 634.4 26.9 
1hsg32 19 19 19 ⋯ ⋯ ⋯ 1482.0 1330.3 101.6 
1.
J. L.
Whitten
,
J. Chem. Phys.
58
,
4496
(
1973
).
2.
K.
Eichkorn
,
O.
Treutler
,
H.
Öhm
,
M.
Häser
, and
R.
Ahlrichs
,
Chem. Phys. Lett.
240
,
283
(
1995
).
3.
F.
Weigend
,
Phys. Chem. Chem. Phys.
4
,
4285
(
2002
).
4.
A.
Sodt
,
J. E.
Subotnik
, and
M.
Head-Gordon
,
J. Chem. Phys.
125
,
194109
(
2006
).
5.
F.
Aquilante
,
R.
Lindh
, and
T.
Bondo Pedersen
,
J. Chem. Phys.
127
,
114107
(
2007
).
6.
F.
Aquilante
,
L.
Gagliardi
,
T. B.
Pedersen
, and
R.
Lindh
,
J. Chem. Phys.
130
,
154107
(
2009
).
7.
S.
Reine
,
E.
Tellgren
,
A.
Krapp
,
T.
Kjærgaard
,
T.
Helgaker
,
B.
Jansik
,
S.
Høst
, and
P.
Salek
,
J. Chem. Phys.
129
,
104101
(
2008
).
8.
P.
Merlot
,
T.
Kjaergaard
,
T.
Helgaker
,
R.
Lindh
,
F.
Aquilante
,
S.
Reine
, and
T. B.
Pedersen
,
J. Comput. Chem.
34
,
1486
(
2013
).
9.
A. C.
Ihrig
,
J.
Wieferink
,
I. Y.
Zhang
,
M.
Ropo
,
X.
Ren
,
P.
Rinke
,
M.
Scheffler
, and
V.
Blum
,
New J. Phys.
17
,
093020
(
2015
).
10.
C. A.
White
,
B. G.
Johnson
,
P. M. W.
Gill
, and
M.
Head-Gordon
,
Chem. Phys. Lett.
230
,
8
(
1994
).
11.
C. A.
White
,
B. G.
Johnson
,
P. M. W.
Gill
, and
M.
Head-Gordon
,
Chem. Phys. Lett.
253
,
268
(
1996
).
12.
R.
Łazarski
,
A. M.
Burow
, and
M.
Sierka
,
J. Chem. Theory Comput.
11
,
3029
(
2015
).
13.
R.
Łazarski
,
A. M.
Burow
,
L.
Grajciar
, and
M.
Sierka
,
J. Comput. Chem.
37
,
2518
(
2016
).
14.
M.
Challacombe
,
E.
Schwegler
, and
J.
Almlöf
,
J. Chem. Phys.
104
,
4685
(
1996
).
15.
M.
Challacombe
and
E.
Schwegler
,
J. Chem. Phys.
106
,
5526
(
1997
).
16.
J. M.
Pérez-Jordá
and
W.
Yang
,
J. Chem. Phys.
107
,
1218
(
1997
).
17.
M. C.
Strain
,
G. E.
Scuseria
, and
M. J.
Frisch
,
Science
271
,
51
(
1996
).
18.
C. A.
Lewis
,
J. A.
Calvin
, and
E. F.
Valeev
,
J. Chem. Theory Comput.
12
,
5868
(
2016
).
19.
X.
Xing
and
E.
Chow
,
SIAM J. Sci. Comput.
42
,
A162
(
2020
).
20.
H.
Cheng
,
Z.
Gimbutas
,
P. G.
Martinsson
, and
V.
Rokhlin
,
SIAM J. Sci. Comput.
26
,
1389
(
2005
).
21.
Y.
Aizenbud
and
A.
Averbuch
,
Inf. Inference: A J. IMA
8
,
445
(
2019
).
22.
W.
Hackbusch
,
B.
Khoromskij
, and
S. A.
Sauter
,
Lectures on Applied Mathematics
(
Springer-Verlag
,
Berlin
,
2000
), pp.
9
29
.
23.
H.
Huang
,
X.
Xing
, and
E.
Chow
, “
H2Pack: High-performance H2 matrix package for kernel matrices using the proxy point method
,”
ACM Trans. Math. Software
(to be published) (2020).
24.
P.
Sałek
,
S.
Høst
,
L.
Thøgersen
,
P.
Jørgensen
,
P.
Manninen
,
J.
Olsen
,
B.
Jansík
,
S.
Reine
,
F.
Pawłowski
,
E.
Tellgren
 et al,
J. Chem. Phys.
126
,
114110
(
2007
).
25.
L.
Goerigk
,
A.
Hansen
,
C.
Bauer
,
S.
Ehrlich
,
A.
Najibi
, and
S.
Grimme
,
Phys. Chem. Chem. Phys.
19
,
32184
(
2017
).
26.
E.
Chow
,
X.
Liu
,
M.
Smelyanskiy
, and
J. R.
Hammond
,
J. Chem. Phys.
142
,
104103
(
2015
).
27.
E.
Chow
,
X.
Liu
,
S.
Misra
,
M.
Dukhan
,
M.
Smelyanskiy
,
J. R.
Hammond
,
Y.
Du
,
X.-K.
Liao
, and
P.
Dubey
,
Int. J. High Perform. Comput. Appl.
30
,
85
(
2016
).
28.
T.
Helgaker
,
P.
Jørgensen
, and
J.
Olsen
,
Molecular Electronic-Structure Theory
(
John Wiley & Sons
,
2014
).
29.
H.
Huang
,
C. D.
Sherrill
, and
E.
Chow
,
J. Chem. Phys.
152
,
024122
(
2020
).
30.
B. P.
Pritchard
and
E.
Chow
,
J. Comput. Chem.
37
,
2537
(
2016
).