Efficient representations of the electron repulsion integral (ERI) tensor and fast algorithms for contractions with the ERI tensor often employ a low-rank approximation of the tensor or its sub-blocks. Such representations include density fitting (DF), the continuous fast multipole method (CFMM), and, more recently, hierarchical matrices. We apply the $H2$ hierarchical matrix representation to the ERI tensor with Gaussian basis sets to rapidly calculate the Coulomb matrices in Hartree–Fock and density functional theory calculations. The execution time and storage requirements of the hierarchical matrix approach and the DF approach are compared. The hierarchical matrix approach has very modest storage requirements, allowing large calculations to be performed in memory without recomputing ERIs. We interpret the hierarchical matrix approach as a multilevel, localized DF method and also discuss the close relationship between the hierarchical matrix approaches with CFMM. Like CFMM, the hierarchical matrix approach is asymptotically linear scaling, but the latter requires severalfold less memory (or severalfold less computation, if quantities are computed dynamically) due to being able to efficiently employ low-rank approximations for far more blocks.

## I. INTRODUCTION

Given *N*_{bf} basis functions {*ϕ*_{a}}, the electron repulsion integral (ERI) tensor

contains $Nbf4$ entries, with the number of non-negligible entries scaling as $O(Nbf2)$ with a very large prefactor. Here, we assume that the basis functions are Gaussian type functions. The ERI tensor is generally so large that it cannot be stored in memory, so its entries are either stored on disk or are recomputed whenever they are needed. Furthermore, each integral is very costly to compute, depending on irregularly structured recurrence relations. Efficient representations of the ERI tensor exploit properties of the tensor to reduce computational cost as well as to reduce storage requirements. Once an efficient representation of the ERI tensor is formed, it may be used to accelerate subsequent calculations, such as the calculation of the Coulomb matrix. Calculating the Coulomb matrix is often the computational bottleneck in density functional theory (DFT) calculations.

Although they have different physical motivations, many existing efficient representations of the ERI tensor can be viewed algebraically as “compressing” the ERI tensor or its sub-blocks into low-rank form. For example, density fitting^{1–9} (DF) constructs a low-rank approximation of the entire ERI tensor. The continuous fast multipole method^{10–13} (CFMM) and related methods^{14–17} compress specific blocks of the ERI tensor into low-rank form. The clustered low-rank tensor format^{18} (CLR), proposed for compressing the blocks in the three-index tensors of DF, can also be used to compress the ERI tensor. The different methods differ in how they exploit the *block low-rank structure* of the ERI tensor, i.e., which blocks to compress and how to compress them.

Previously, the $H2$-ERI method^{19} was proposed as an efficient method to calculate the Coulomb matrix. This is accomplished by compressing specific ERI blocks and representing the overall ERI tensor in the $H2$ matrix representation. The $H2$ matrix representation is a type of hierarchical block low-rank representation that has both storage and matrix-vector multiplication costs linear in the matrix dimension. In Ref. 19, a Matlab implementation was used to calculate and verify the accuracy of Coulomb matrices using the $H2$-ERI method. In this paper, we use an optimized multithreaded C program to perform Hartree–Fock and DFT calculations with the $H2$-ERI method. We also perform calculations with DF for comparison of both accuracy and computational time. In comparison with DF and CFMM, we show that $H2$-ERI better exploits the block low-rank structure of the ERI tensor and thus provides a more efficient representation.

The ERI tensor (*ab*|*cd*) can be unfolded into an *ERI matrix* where rows are indexed by *ab* and columns are indexed by *cd*. Each entry of the ERI matrix is the Coulomb interaction between two generalized electron densities, *ϕ*_{a}*ϕ*_{b} and *ϕ*_{c}*ϕ*_{d}. It can be justified numerically that (1) the ERI matrix has numerical rank *O*(*N*_{bf}); (2) a block of the ERI matrix associated with two “well-separated” (to be defined in Sec. II) sets of electron densities has numerical rank *O*(1) independent of the block size. Such rank-*O*(1) blocks form the majority of the ERI matrix. Thus, DF must have approximation rank at least *O*(*N*_{bf}) [equivalent to using *O*(*N*_{bf}) auxiliary basis functions], while $H2$-ERI and CFMM can compress most blocks of the ERI matrix into rank-*O*(1) form. As a result, CFMM and $H2$-ERI are asymptotically more efficient than DF.

In CFMM, a large number of ERI blocks, though numerically low-rank, are not compressed because multipole expansions cannot be used for determining low-rank expansions when electron densities overlap numerically. Such blocks must be represented in dense form, and these dense ERI blocks dominate the storage (if precomputed) or computation (if dynamically computed) in CFMM. In comparison, $H2$-ERI uses a compression method that works for more ERI blocks (without needing to compute these ERI blocks explicitly) than multipole expansions. As a result, $H2$-ERI has the same asymptotic scalability as CFMM but is far more efficient.

It is worth noting that the CLR tensor format uses a one-level partitioning of a tensor into nonoverlapping blocks as opposed to the hierarchical partitioning used in $H2$-ERI and CFMM. This lack of hierarchy in partitioning leads to larger asymptotic computation and storage complexities when using CLR for the ERI tensor than CFMM and $H2$-ERI. CLR also does not provide a way to compress blocks without computing them, as opposed to $H2$-ERI and CFMM.

## II. $H2$-ERI METHOD

### A. Notation and terminology

In this paper, we refer to a continuous electron density, or a product of two basis functions, *ϕ*_{a}*ϕ*_{b}, as a *distribution*. Since we assume that the basis functions are Gaussian type functions (GTFs), the distributions are also GTFs. Such a distribution decays exponentially and thus has a bounded *numerical support* outside of which the distribution is negligible. We use a ball to compactly characterize the numerical support of a distribution and will often refer to the *center* of the distribution as the center of this ball.

Let *I* denote the complete set of distributions {*ϕ*_{a}*ϕ*_{b}} for a molecular system and chosen basis set. We use the notation (*I*|*I*) to denote the $Nbf2\xd7Nbf2$ ERI matrix. Subsets of *I* can be used in this notation to denote sub-blocks of the ERI matrix.

### B. Block low-rank structure of the ERI matrix

To illustrate the block low-rank structure of the ERI matrix, consider one ideal graphene layer with the STO-3G basis set. Place a cubic box $B0=[\u2212L/2,\u2009L/2]3$ centered on the graphene layer [see Fig. 1(a)]. Let $Bnear=[\u22123L/2,\u20093L/2]3\B0$ be the union of 26 adjacent boxes, and $Bfar=[\u22127L/2,\u20097L/2]3\(Bnear\u222aB0)$ be the union of 316 nonadjacent boxes. Denote the sets of distributions with the center in these three domains, $B0$, $Bnear$, and $Bfar$, as *I*_{0}, *I*_{near}, and *I*_{far}, respectively. Increasing the edge length *L* from 3 Bohrs to 9 Bohrs, we obtain a series of sets *I*_{0}, *I*_{near}, and *I*_{far} with increasing numbers of distributions. Figure 1(b) plots the numerical ranks of the ERI blocks (*I*_{0}|*I*_{0}), (*I*_{0}|*I*_{near}), and (*I*_{0}|*I*_{far}) vs the size of *I*_{0}. The numerical rank of a matrix block is estimated by its singular values with relative threshold 10^{−10}.

We observe that the numerical ranks of the ERI blocks (*I*_{0}|*I*_{0}) and (*I*_{0}|*I*_{near}) both increase with the size of *I*_{0}, while the numerical rank of (*I*_{0}|*I*_{far}) tends to be small and independent of the sizes of *I*_{0} and *I*_{far}. This low-rank property of blocks of the form (*I*_{0}|*I*_{far}) is the basis of the $H2$-ERI method.

We say that two boxes of the same size are *well-separated* if they are separated by at least one box of the same size. As an example, box $B0$ and any box of the same size in $Bfar$ are said to be well-separated.

### C. Low rank approximation for well-separated boxes

Let the domain containing the centers of all distributions be partitioned into boxes of equal size. Let *I*_{i} denote the set of distributions in box *i*. Let *J*_{i} denote the set of distributions with centers in boxes that are well-separated from box *i*. The ERI block (*I*_{i}|*J*_{i}) is low-rank like the block (*I*_{0}|*I*_{far}) from Sec. II B. This block can be compressed into a low-rank form

where $Iiid$ is a subset of *I*_{i} and thus $(Iiid|Ji)$ is a subset of the rows of (*I*_{i}|*J*_{i}), *U*_{i} is a tall-and-skinny matrix with bounded values, and *r*_{0} is the approximation rank. Such a low-rank form is called an interpolative decomposition^{20} (ID).

One could compute an ID algebraically via the pivoted QR decomposition given a rank or an absolute/relative error threshold. However, such a procedure requires that the ERI block (*I*_{i}|*J*_{i}) is formed explicitly, i.e., requiring computation of all the ERIs in (*I*_{i}|*J*_{i}). Both CFMM and $H2$-ERI avoid needing (*I*_{i}|*J*_{i}) in explicit form. In CFMM, the low-rank approximation is computed via multipole expansions if the distributions in *I*_{i} and *J*_{i} do not overlap numerically. This limitation of CFMM means that the low-rank form of many ERI blocks cannot be computed and are treated as full-rank, dense matrices. In $H2$-ERI, the low-rank approximation is computed without this limitation via a method called the proxy point method.^{19} The key that allows the ERI matrix to be efficiently represented in the $H2$ matrix representation is this proxy point method.

To briefly explain the proxy point method, let (*I*_{*}|*J*_{*}) denote any above ERI block to be compressed into ID form. Referring to Fig. 2 (left), the centers of the distributions in *I*_{*} are contained in a box $B$, and the centers of the distributions in *J*_{*} are within the union of well-separated boxes $F$. In the proxy point method, the distributions in *J*_{*} are split into two subsets, *J*_{near} and *J*_{far}. The set *J*_{near} contains the distributions that numerically overlap with $B$ or its 26 adjacent boxes. These distributions are shown in green in Fig. 2. The set *J*_{far} = *J*_{*}∖*J*_{near}. The set *J*_{near} is usually a small fraction of *J*_{*} since GTFs decay exponentially, while *J*_{far} is a large fraction.

The challenge is how to efficiently compress the interactions between *I*_{*} and *J*_{far}. Here, by virtue of Green’s theorem, we replace *J*_{far} by a small set of point charges *Y*_{p} located in $F$ as illustrated in Fig. 2 (right), i.e., the column space of (*I*_{*}|*J*_{far}) is replaced by the column space of the much smaller matrix (*I*_{*}|*Y*_{p}). The elements of (*I*_{*}|*Y*_{p}) are much cheaper to compute than ERIs; the elements are interactions between distributions and point charges and are thus analogous to electron–nuclear attraction integrals.

Finally, the ID approximation $(I*|J*)\u2248U(I*id|J*)$ is computed via the “mixed” ID approximations of (*I*_{*}|*J*_{near}) and (*I*_{*}|*Y*_{p}), as shown in Algorithm 1. The proxy points in *Y*_{p} are uniformly sampled on several layers of cubic surfaces that enclose the interior boundary of domain $F$, as shown in Fig. 2. The size of *Y*_{p} is independent of the overall problem size |*I*| or the sizes of *I*_{*} and *J*_{*}. In most cases, a few hundreds to one thousand proxy points for *Y*_{p} are sufficient to guarantee the accuracy of the proxy point method. More details on the selection of *Y*_{p} can be found in Ref. 19.

Input: I_{*}, J_{*}, $B$, $F$ |

Output: U and $I*id$ for the ID approximation $(I*|J*)\u2248U(I*id|J*)$ |

1: Split J_{*} into J_{near} and J_{far} |

2: Select proxy point charges Y_{p} in $F$ |

3: Generate random matrices $\Omega 1\u2208R|Jnear|\xd7|I*|$ and $\Omega 2\u2208R|Yp|\xd7|I*|$ |

4: Calculate A_{1} = (I_{*}|J_{near})Ω_{1} and A_{2} = (I_{*}|Y_{p})Ω_{2} |

5: Normalize the columns of A_{1} and A_{2} to obtain $\xc31$ and $\xc32$ |

6: Compute U and $I*id$ by an algebraic ID approximation of [$\xc31$, $\xc32$]. |

Input: I_{*}, J_{*}, $B$, $F$ |

Output: U and $I*id$ for the ID approximation $(I*|J*)\u2248U(I*id|J*)$ |

1: Split J_{*} into J_{near} and J_{far} |

2: Select proxy point charges Y_{p} in $F$ |

3: Generate random matrices $\Omega 1\u2208R|Jnear|\xd7|I*|$ and $\Omega 2\u2208R|Yp|\xd7|I*|$ |

4: Calculate A_{1} = (I_{*}|J_{near})Ω_{1} and A_{2} = (I_{*}|Y_{p})Ω_{2} |

5: Normalize the columns of A_{1} and A_{2} to obtain $\xc31$ and $\xc32$ |

6: Compute U and $I*id$ by an algebraic ID approximation of [$\xc31$, $\xc32$]. |

In Algorithm 1, a randomized linear algebra technique^{21} is used in steps 3 and 4 to approximate the column spaces of (*I*_{*}|*J*_{near}) and (*I*_{*}|*Y*_{p}) by the column spaces of the smaller matrices *A*_{1} and *A*_{2}, respectively. Typically, Ω_{1} and Ω_{2} are chosen to be dense with elements from a standard normal distribution. In this paper, to reduce the cost of the multiplications in step 4, we choose Ω_{1} and Ω_{2} as sparse random matrices, with 16 nonzero entries per column with locations selected randomly and nonzero values following a standard normal distribution.

### D. The $H2$ matrix representation

We can now briefly describe the $H2$ matrix representation and establish additional terminology for this paper. See Ref. 22 for additional details.

An $H2$ matrix representation is composed of *far-field* (FF) blocks stored in low-rank form and *near-field* (NF) blocks stored in dense matrix form. FF blocks represent the interaction between distributions centered in two boxes that are well-separated. NF blocks represent the remaining interactions. See Fig. 3 (right) for an example.

In the example, the distributions *I* in a 1D domain are hierarchically partitioned into boxes. The structure is represented by a *partition tree* [see Fig. 3 (left)]. Let *I*_{i} ⊂ *I* denote the set of distributions centered in box *i* and corresponding to node *i* in the partition tree. The distributions centered in the finest level boxes are labeled *I*_{7}, …, *I*_{14} in the example. The union of the distributions *I*_{7} and *I*_{8} is *I*_{3}. Larger FF blocks are formed from merging smaller FF blocks, if possible. In the example, the large FF block (*I*_{3}|*I*_{5}) is due to the interactions between *I*_{3} and *I*_{5}, which are in well-separated boxes.

For each node *i* in the partition tree, the ID approximation of (*I*_{i}|*J*_{i}) [see Eq. (1)] is computed. Each FF block (*I*_{i}|*I*_{j}) is the intersection between the low-rank ERI blocks (*I*_{i}|*J*_{i}) and (*J*_{j}|*I*_{j}). [The latter is the transpose of (*I*_{j}|*J*_{j}).] Thus, the low-rank approximation of (*I*_{i}|*I*_{j}) can be constructed directly based on the ID approximations of the two blocks as

We refer to $(Iiid|Ijid)$ as an *intermediate* block.

For a nonleaf node *i* in the partition tree, the ID approximation of (*I*_{i}|*J*_{i}) is constructed in terms of the ID approximations associated with its children nodes in order to reduce computation and storage cost.^{19}

### E. Low-rank approximation of NF and intermediate blocks

In standard $H2$ matrix representations, the NF and intermediate blocks are stored in the dense matrix format, and storage of these dense blocks typically constitutes a large majority of the total storage cost. For ERI matrices, however, we note from Fig. 1(b) that the numerical ranks of (*I*_{0}|*I*_{0}) and (*I*_{0}|*I*_{near}), although not independent of |*I*_{0}|, are small relative to |*I*_{0}|. Thus, in $H2$-ERI, we also compress each NF and intermediate block into an ID form using the pivoted QR decomposition when this is beneficial. Since the total size of NF and intermediate blocks is *O*(|*I*|), this additional compression does not change the asymptotic computation cost of $H2$-ERI.

### F. Complexity and accuracy

Let *r* denote an upper bound on the rank of all the ID approximations. Then, the $H2$ matrix representation can be formed with *O*(|*I*|*r*^{2}) computation cost and requires *O*(|*I*|*r*) storage cost.

$H2$-ERI uses a relative error threshold *τ* to control the accuracy of each computed ID approximation. The same *τ* is used in the compression of the NF and intermediate blocks. The constructed representation of (*I*|*I*) in the end has relative error *O*(*τ*) in the Frobenius norm.

### G. Using the $H2$ matrix representation

The $H2$ matrix representation, once computed, can then be used to rapidly calculate the Coulomb matrix,

This operation is equivalent to the matrix-vector multiplication *J* = (*I*|*I*)*D* with the Coulomb matrix *J* and density matrix *D* unfolded into vectors. With (*I*|*I*) represented in the $H2$ format, the established fast $H2$ matrix-vector multiplication algorithm^{22,23} can calculate the Coulomb matrix with *O*(|*I*|*r*) computation cost.

## III. RELATIONSHIP TO DF

In DF, a set of *N*_{aux} auxiliary basis functions {*ψ*_{α}} is used to fit each distribution *ϕ*_{a}*ϕ*_{b} by a linear combination, i.e., $\varphi a\varphi b\u2248\u2211\alpha Ca,b\alpha \psi \alpha $. This leads to the general DF approximation

In classical DF,^{3} the fitting coefficients $Ca,b\alpha $ are computed as $Ca,b\alpha =\u2211\beta (ab|\beta )(\beta |\alpha )\u22121$, and the DF approximation becomes

This can be viewed as a rank-*N*_{aux} approximation of the ERI matrix (*I*|*I*). Typically, the number of auxiliary basis functions is about five times the number of basis functions, i.e., *N*_{aux} ∼ *O*(*N*_{bf}).

In $H2$-ERI, each ID approximation $(Ii|Ji)\u2248Ui(Iiid|Ji)$ can be viewed as a *localized DF*. More precisely, each row of (*I*_{i}|*J*_{i}) is approximated in the ID approximation as

where *u*_{k,:} is the *k*th row of *U*_{i} and *u*_{k,α} is the (*k*, *α*)th entry of *U*_{i}. Each distribution *φ*_{k} ∈ *I*_{i} is thus approximated by a linear combination of the distributions in $Iiid$ as

From the viewpoint of DF, $Iiid$ is exactly a set of auxiliary basis functions used to *approximate I*_{i}*for the interactions between I*_{i}*and J*_{i}, and *U*_{i} contains the associated fitting coefficients. Conversely, computing an ID approximation of an ERI block can be viewed as a numerical way to construct a DF approximation for the associated Coulomb interactions.

Most existing localized DF methods^{4–9,24} limit the number of auxiliary basis functions used to fit a distribution by heuristically applying a local fitting domain or a local fitting metric. In contrast, $H2$-ERI can be considered as a distinct localized DF approach that restricts DF to only approximate the Coulomb interactions between well-separated sets of distributions.

There are two main differences between $H2$-ERI and DF. The first is how accuracies of the constructed representations are controlled. In DF, the accuracy depends on the choice of a precomputed auxiliary basis set. In $H2$-ERI, the relative error threshold *τ* is used to directly control the accuracy of the $H2$ matrix representation.

The second difference is in the resulting approximation ranks, i.e., *N*_{aux} and *r*. The maximum approximation rank *r* in $H2$-ERI is experimentally *O*(1) with increasing problem sizes, while *N*_{aux} scales as *O*(*N*_{bf}). This results in $H2$-ERI requiring much less storage cost than DF.

## IV. TEST CALCULATIONS

In this section, we first verify the accuracy of Hartree–Fock (HF) and Kohn–Sham DFT calculations when $H2$-ERI is used to calculate the Coulomb matrices. We then compare the performance of the $H2$-ERI approach with that of DF in terms of computation time and storage.

Both HF and DFT use self-consistent field (SCF) iterations with direct inversion of the iterative subspace. Superposition of atomic densities is used for the initial density matrix. The SCF iterations are stopped when the energy difference between two consecutive iterations is less than 10^{−11} Hartrees. For DFT, we use the hybrid exchange–correlation functional B3LYP.

Previously, only the accuracy of the Coulomb matrices calculated by $H2$-ERI had been verified.^{19} In addition, $H2$-ERI was only used for ERI matrices with primitive basis functions. In this paper, we use contracted basis functions. Two basis sets are used: cc-pVDZ and aug-cc-pVDZ. The latter is important for comparison because it contains diffuse functions. For classical DF, the corresponding auxiliary basis sets cc-pVDZ-jkfit and aug-cc-pVDZ-jkfit are used.

$H2$-ERI and DF both start with prescreening negligible distributions in *I* = {*ϕ*_{a}*ϕ*_{b}}. If

then the distribution *ϕ*_{a}*ϕ*_{b} is relatively very small, and the corresponding rows and columns of the ERI matrix are omitted from computations. We now use *I* to denote the set of distributions that survive prescreening. For large enough chemical systems, the scaling |*I*| ∼ *O*(*N*_{bf}) has been justified previously.^{2,10,28}

In $H2$-ERI, we partition the distributions in *I* adaptively according to their centers. The hierarchical partitioning is stopped when each finest box has fewer than 400 unique distribution centers. However, since many distributions share the same center, they cannot be further split into subsets. These distributions result in a finest box that may contain more than 400 distributions.

The test calculations were carried out on a dual Intel Xeon Gold 6226 CPU computer with a total of 24 cores and 1.5 TB of memory. One hyperthread per core was used. Our software was developed in the GTFock framework,^{26,29} which contains very efficient C language parallel implementations of HF and DF. Our software uses H2Pack^{23} for storing and applying the $H2$ matrix representation and the Simint package^{30} for computing ERIs and other integrals.

### A. Ground state energy calculations

In this section, we report on testing the accuracy of using $H2$-ERI in HF and DFT ground state energy calculations. We assume that the exact energies are computed from our baseline HF and DFT calculations, which use “direct” calculation of the Coulomb matrices. In direct calculation, the ERIs that survive Cauchy–Schwarz screening are recomputed at every SCF iteration, and the Coulomb and exchange matrices are computed via the usual tensor contractions.

We first consider eight sets of test molecules from the “large system” group of the GMTKN55 benchmark database.^{25} Table I shows the average, maximum, and standard deviation of the absolute errors (difference from the baseline) in HF and DFT ground state energy per electron for the molecules in each of the test sets when the Coulomb matrices are calculated by $H2$-ERI and by DF. In these calculations, the cc-pVDZ basis set was used. $H2$-ERI used a relative error threshold of *τ* = 10^{−7}.

. | HF . | DFT (B3LYP) . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | DF . | $H2$-ERI . | DF . | $H2$-ERI . | ||||||||

. | ave . | max . | std . | ave . | max . | std . | ave . | max . | std . | ave . | max . | std . |

BSR36 | 2.0 × 10^{−6} | 2.4 × 10^{−6} | 2.0 × 10^{−7} | 6.7 × 10^{−9} | 1.6 × 10^{−8} | 4.5 × 10^{−9} | 1.7 × 10^{−6} | 2.0 × 10^{−6} | 1.7 × 10^{−7} | 7.7 × 10^{−9} | 1.6 × 10^{−8} | 4.6 × 10^{−9} |

CDIE20 | 1.8 × 10^{−6} | 2.2 × 10^{−6} | 2.1 × 10^{−7} | 1.1 × 10^{−8} | 1.8 × 10^{−8} | 2.7 × 10^{−9} | 1.8 × 10^{−6} | 1.4 × 10^{−5} | 2.2 × 10^{−6} | 1.6 × 10^{−8} | 1.4 × 10^{−7} | 2.2 × 10^{−8} |

DARC | 1.9 × 10^{−6} | 2.6 × 10^{−6} | 4.8 × 10^{−7} | 1.0 × 10^{−8} | 1.6 × 10^{−8} | 4.6 × 10^{−9} | 1.6 × 10^{−6} | 2.2 × 10^{−6} | 4.3 × 10^{−7} | 1.1 × 10^{−8} | 2.1 × 10^{−8} | 5.6 × 10^{−9} |

PArel | 2.4 × 10^{−6} | 9.4 × 10^{−6} | 2.2 × 10^{−6} | 2.0 × 10^{−7} | 3.3 × 10^{−6} | 7.1 × 10^{−7} | 1.5 × 10^{−6} | 2.8 × 10^{−6} | 7.4 × 10^{−7} | 9.1 × 10^{−9} | 1.6 × 10^{−8} | 3.9 × 10^{−9} |

RSE43 | 8.9 × 10^{−5} | 1.6 × 10^{−3} | 2.7 × 10^{−4} | 1.7 × 10^{−5} | 2.1 × 10^{−4} | 4.9 × 10^{−5} | 8.4 × 10^{−5} | 1.5 × 10^{−3} | 2.5 × 10^{−4} | 2.4 × 10^{−5} | 4.8 × 10^{−4} | 7.1 × 10^{−5} |

ISO34 | 2.0 × 10^{−6} | 4.9 × 10^{−6} | 9.6 × 10^{−7} | 1.0 × 10^{−8} | 1.5 × 10^{−8} | 3.2 × 10^{−9} | 1.7 × 10^{−6} | 4.3 × 10^{−6} | 8.6 × 10^{−7} | 1.2 × 10^{−8} | 1.9 × 10^{−8} | 3.6 × 10^{−9} |

ISOL24 | 1.7 × 10^{−6} | 3.2 × 10^{−6} | 6.4 × 10^{−7} | 2.5 × 10^{−7} | 9.5 × 10^{−6} | 1.5 × 10^{−6} | 2.1 × 10^{−5} | 5.9 × 10^{−4} | 1.1 × 10^{−4} | 1.4 × 10^{−6} | 4.2 × 10^{−5} | 7.7 × 10^{−6} |

C60ISO | 1.9 × 10^{−6} | 1.9 × 10^{−6} | 2.2 × 10^{−8} | 1.1 × 10^{−8} | 3.8 × 10^{−8} | 1.3 × 10^{−8} | 1.6 × 10^{−6} | 1.6 × 10^{−6} | 1.7 × 10^{−8} | 2.1 × 10^{−8} | 5.5 × 10^{−8} | 1.6 × 10^{−8} |

. | HF . | DFT (B3LYP) . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

. | DF . | $H2$-ERI . | DF . | $H2$-ERI . | ||||||||

. | ave . | max . | std . | ave . | max . | std . | ave . | max . | std . | ave . | max . | std . |

BSR36 | 2.0 × 10^{−6} | 2.4 × 10^{−6} | 2.0 × 10^{−7} | 6.7 × 10^{−9} | 1.6 × 10^{−8} | 4.5 × 10^{−9} | 1.7 × 10^{−6} | 2.0 × 10^{−6} | 1.7 × 10^{−7} | 7.7 × 10^{−9} | 1.6 × 10^{−8} | 4.6 × 10^{−9} |

CDIE20 | 1.8 × 10^{−6} | 2.2 × 10^{−6} | 2.1 × 10^{−7} | 1.1 × 10^{−8} | 1.8 × 10^{−8} | 2.7 × 10^{−9} | 1.8 × 10^{−6} | 1.4 × 10^{−5} | 2.2 × 10^{−6} | 1.6 × 10^{−8} | 1.4 × 10^{−7} | 2.2 × 10^{−8} |

DARC | 1.9 × 10^{−6} | 2.6 × 10^{−6} | 4.8 × 10^{−7} | 1.0 × 10^{−8} | 1.6 × 10^{−8} | 4.6 × 10^{−9} | 1.6 × 10^{−6} | 2.2 × 10^{−6} | 4.3 × 10^{−7} | 1.1 × 10^{−8} | 2.1 × 10^{−8} | 5.6 × 10^{−9} |

PArel | 2.4 × 10^{−6} | 9.4 × 10^{−6} | 2.2 × 10^{−6} | 2.0 × 10^{−7} | 3.3 × 10^{−6} | 7.1 × 10^{−7} | 1.5 × 10^{−6} | 2.8 × 10^{−6} | 7.4 × 10^{−7} | 9.1 × 10^{−9} | 1.6 × 10^{−8} | 3.9 × 10^{−9} |

RSE43 | 8.9 × 10^{−5} | 1.6 × 10^{−3} | 2.7 × 10^{−4} | 1.7 × 10^{−5} | 2.1 × 10^{−4} | 4.9 × 10^{−5} | 8.4 × 10^{−5} | 1.5 × 10^{−3} | 2.5 × 10^{−4} | 2.4 × 10^{−5} | 4.8 × 10^{−4} | 7.1 × 10^{−5} |

ISO34 | 2.0 × 10^{−6} | 4.9 × 10^{−6} | 9.6 × 10^{−7} | 1.0 × 10^{−8} | 1.5 × 10^{−8} | 3.2 × 10^{−9} | 1.7 × 10^{−6} | 4.3 × 10^{−6} | 8.6 × 10^{−7} | 1.2 × 10^{−8} | 1.9 × 10^{−8} | 3.6 × 10^{−9} |

ISOL24 | 1.7 × 10^{−6} | 3.2 × 10^{−6} | 6.4 × 10^{−7} | 2.5 × 10^{−7} | 9.5 × 10^{−6} | 1.5 × 10^{−6} | 2.1 × 10^{−5} | 5.9 × 10^{−4} | 1.1 × 10^{−4} | 1.4 × 10^{−6} | 4.2 × 10^{−5} | 7.7 × 10^{−6} |

C60ISO | 1.9 × 10^{−6} | 1.9 × 10^{−6} | 2.2 × 10^{−8} | 1.1 × 10^{−8} | 3.8 × 10^{−8} | 1.3 × 10^{−8} | 1.6 × 10^{−6} | 1.6 × 10^{−6} | 1.7 × 10^{−8} | 2.1 × 10^{−8} | 5.5 × 10^{−8} | 1.6 × 10^{−8} |

We observe that the average error for $H2$-ERI is always smaller than 1.5 × 10^{−3} Hartrees (chemical accuracy). For a small number of molecules in the RSE43 test set, the maximum error is larger than chemical accuracy. In these cases, DF also shows error larger than chemical accuracy. These cases are also associated with slow convergence of the SCF iterations compared to the other cases. We note that the averages in Table I do not include cases where SCF did not converge in the baseline calculations and did not include the molecule “i12p” in the ISOL24 test set, where the HF SCF iterations in the DF case did not converge to the baseline energy. The statistics consider 301 molecules for HF and 292 molecules for DFT, out of 379 total molecules in the eight test sets.

As the molecules in the GMTKN55 benchmark database are rather small for demonstrating the efficiency of $H2$-ERI, we now consider three types of larger test molecules: alkanes, graphenes, and truncated protein–ligand systems. The latter are derived from the protein–ligand system “1hsg” from the protein data bank. These systems consist of a ligand and a portion of its protein environment within a given distance of the ligand (see Refs. 26 and 27 for more information).

Table II shows the ground state energy error for the larger test molecules. The results show that HF and DFT calculations using $H2$-ERI, which uses error tolerance *τ* = 10^{−7}, achieves better than chemical accuracy for this set of test molecules. The number of SCF iterations (see the Appendix) remained essentially the same. We also observe that the energy errors for the aug-cc-pVDZ basis set are larger than for the cc-pVDZ basis set.

. | . | . | . | . | HF . | DFT (B3LYP) . | ||
---|---|---|---|---|---|---|---|---|

. | N_{bf}
. | N_{aux}
. | r_{avg}
. | |I|
. | DF . | $H2$-ERI . | DF . | $H2$-ERI . |

cc-pVDZ | ||||||||

Alkane C_{60}H_{122} | 1510 | 7 910 | 163 | 183 148 | −1.0 × 10^{−3} | −7.2 × 10^{−6} | −8.7 × 10^{−4} | −4.8 × 10^{−6} |

Alkane C_{100}H_{202} | 2510 | 13 150 | 135 | 310 068 | −1.7 × 10^{−3} | −1.8 × 10^{−5} | −1.5 × 10^{−3} | −1.0 × 10^{−5} |

Graphene C_{96}H_{24} | 1560 | 8 376 | 205 | 443 319 | −4.3 × 10^{−4} | 1.3 × 10^{−5} | −3.7 × 10^{−4} | 4.5 × 10^{−5} |

Graphene C_{150}H_{30} | 2400 | 12 900 | 227 | 744 909 | −6.6 × 10^{−4} | 4.1 × 10^{−5} | −5.7 × 10^{−4} | 9.8 × 10^{−5} |

1hsg30 | 1240 | 6 572 | 109 | 251 261 | −7.9 × 10^{−4} | −2.1 × 10^{−5} | −6.8 × 10^{−4} | −3.4 × 10^{−5} |

1hsg32 | 1560 | 8 268 | 105 | 356 063 | −9.7 × 10^{−4} | −7.5 × 10^{−6} | −8.4 × 10^{−4} | 4.1 × 10^{−6} |

aug-cc-pVDZ | ||||||||

Alkane C_{60}H_{122} | 2598 | 10 330 | 161 | 801 403 | −9.6 × 10^{−4} | 8.9 × 10^{−6} | −8.9 × 10^{−4} | 9.1 × 10^{−5} |

Alkane C_{100}H_{202} | 4318 | 17 170 | 176 | 1 373 643 | −1.6 × 10^{−3} | 5.5 × 10^{−5} | −1.5 × 10^{−3} | 2.6 × 10^{−4} |

Graphene C_{54}H_{18} | 1512 | 6 084 | 317 | 859 638 | −2.2 × 10^{−4} | 2.2 × 10^{−5} | −1.9 × 10^{−4} | −3.5 × 10^{−5} |

Graphene C_{96}H_{24} | 2616 | 10 536 | 256 | 2 029 611 | −3.9 × 10^{−4} | 2.8 × 10^{−4} | −3.4 × 10^{−4} | 6.0 × 10^{−4} |

1hsg30 | 2108 | 8 432 | 166 | 1 301 608 | −7.0 × 10^{−4} | 1.8 × 10^{−4} | ⋯ | ⋯ |

1hsg32 | 2652 | 10 608 | 162 | 1 902 022 | −8.1 × 10^{−4} | 4.6 × 10^{−4} | ⋯ | ⋯ |

. | . | . | . | . | HF . | DFT (B3LYP) . | ||
---|---|---|---|---|---|---|---|---|

. | N_{bf}
. | N_{aux}
. | r_{avg}
. | |I|
. | DF . | $H2$-ERI . | DF . | $H2$-ERI . |

cc-pVDZ | ||||||||

Alkane C_{60}H_{122} | 1510 | 7 910 | 163 | 183 148 | −1.0 × 10^{−3} | −7.2 × 10^{−6} | −8.7 × 10^{−4} | −4.8 × 10^{−6} |

Alkane C_{100}H_{202} | 2510 | 13 150 | 135 | 310 068 | −1.7 × 10^{−3} | −1.8 × 10^{−5} | −1.5 × 10^{−3} | −1.0 × 10^{−5} |

Graphene C_{96}H_{24} | 1560 | 8 376 | 205 | 443 319 | −4.3 × 10^{−4} | 1.3 × 10^{−5} | −3.7 × 10^{−4} | 4.5 × 10^{−5} |

Graphene C_{150}H_{30} | 2400 | 12 900 | 227 | 744 909 | −6.6 × 10^{−4} | 4.1 × 10^{−5} | −5.7 × 10^{−4} | 9.8 × 10^{−5} |

1hsg30 | 1240 | 6 572 | 109 | 251 261 | −7.9 × 10^{−4} | −2.1 × 10^{−5} | −6.8 × 10^{−4} | −3.4 × 10^{−5} |

1hsg32 | 1560 | 8 268 | 105 | 356 063 | −9.7 × 10^{−4} | −7.5 × 10^{−6} | −8.4 × 10^{−4} | 4.1 × 10^{−6} |

aug-cc-pVDZ | ||||||||

Alkane C_{60}H_{122} | 2598 | 10 330 | 161 | 801 403 | −9.6 × 10^{−4} | 8.9 × 10^{−6} | −8.9 × 10^{−4} | 9.1 × 10^{−5} |

Alkane C_{100}H_{202} | 4318 | 17 170 | 176 | 1 373 643 | −1.6 × 10^{−3} | 5.5 × 10^{−5} | −1.5 × 10^{−3} | 2.6 × 10^{−4} |

Graphene C_{54}H_{18} | 1512 | 6 084 | 317 | 859 638 | −2.2 × 10^{−4} | 2.2 × 10^{−5} | −1.9 × 10^{−4} | −3.5 × 10^{−5} |

Graphene C_{96}H_{24} | 2616 | 10 536 | 256 | 2 029 611 | −3.9 × 10^{−4} | 2.8 × 10^{−4} | −3.4 × 10^{−4} | 6.0 × 10^{−4} |

1hsg30 | 2108 | 8 432 | 166 | 1 301 608 | −7.0 × 10^{−4} | 1.8 × 10^{−4} | ⋯ | ⋯ |

1hsg32 | 2652 | 10 608 | 162 | 1 902 022 | −8.1 × 10^{−4} | 4.6 × 10^{−4} | ⋯ | ⋯ |

Table II also shows the average rank *r*_{avg} of all the ERI blocks (*I*_{i}|*J*_{i}) in the $H2$ matrix representation. The results show that the average rank is very small relative to the number of auxiliary basis functions *N*_{aux} used in DF. Furthermore, this rank appears to be bounded and may even decrease when the molecular system size increases (for the same molecular system type). In contrast, in DF, *N*_{aux} grows with the system size. Thus, we expect that the total storage cost for $H2$-ERI will be low compared to DF and only grow linearly with the system size, rather than superlinearly for DF.

### B. Energy differences and error cancellation

To study possible error cancellation in quantum chemical computation using $H2$-ERI, we consider perturbations of the truncated protein–ligand system 1hsg32. This system consists of a ligand and the portion of its protein environment within 3.2 Å of the ligand. Perturbed systems were produced by shifting the ligand toward the protein pocket along the vector joining the pair of ligand and protein atoms that are the closest. Two perturbed systems, corresponding to shifts of 0.25 Å and 0.5 Å, were used. HF calculations were performed using the cc-pVDZ basis set.

Table III shows the ground state energies computed by direct, DF, and $H2$-ERI methods, for the original and the two perturbed systems. As previously observed, if the direct calculation is assumed to be exact, then $H2$-ERI appears to have smaller errors than DF. However, the main feature in Table III is the error differences between the original and perturbed systems. Again assuming that the direct calculation is exact, we observe that the error in the energy differences for DF are now very small and comparable to the error in the energy differences for $H2$-ERI.

Ligand shift (in Å) . | 0 . | 0.25 . | 0.5 . |
---|---|---|---|

Energy | |||

Direct | −3756.922 322 5 | −3756.907 960 9 | −3756.858 213 0 |

DF | −3756.923 292 7 | −3756.908 928 6 | −3756.859 174 6 |

$H2$-ERI | −3756.922 329 9 | −3756.907 967 4 | −3756.858 222 1 |

Error in energy | |||

DF | −9.70 × 10^{−4} | −9.68 × 10^{−4} | −9.62 × 10^{−4} |

$H2$-ERI | −7.49 × 10^{−6} | −6.53 × 10^{−6} | −9.09 × 10^{−6} |

Energy difference | |||

Direct | 0.014 361 6 | 0.064 109 5 | |

DF | 0.014 364 1 | 0.064 118 1 | |

$H2$-ERI | 0.014 362 5 | 0.064 107 9 | |

Error in energy difference | |||

DF | 2.53 × 10^{−6} | 8.62 × 10^{−6} | |

$H2$-ERI | 9.57 × 10^{−7} | −1.60 × 10^{−6} |

Ligand shift (in Å) . | 0 . | 0.25 . | 0.5 . |
---|---|---|---|

Energy | |||

Direct | −3756.922 322 5 | −3756.907 960 9 | −3756.858 213 0 |

DF | −3756.923 292 7 | −3756.908 928 6 | −3756.859 174 6 |

$H2$-ERI | −3756.922 329 9 | −3756.907 967 4 | −3756.858 222 1 |

Error in energy | |||

DF | −9.70 × 10^{−4} | −9.68 × 10^{−4} | −9.62 × 10^{−4} |

$H2$-ERI | −7.49 × 10^{−6} | −6.53 × 10^{−6} | −9.09 × 10^{−6} |

Energy difference | |||

Direct | 0.014 361 6 | 0.064 109 5 | |

DF | 0.014 364 1 | 0.064 118 1 | |

$H2$-ERI | 0.014 362 5 | 0.064 107 9 | |

Error in energy difference | |||

DF | 2.53 × 10^{−6} | 8.62 × 10^{−6} | |

$H2$-ERI | 9.57 × 10^{−7} | −1.60 × 10^{−6} |

DF has significant error cancellation in computing the energy differences (around two digits of accuracy improvement from ground state energies) but $H2$-ERI does not. This lack of error cancellation in $H2$-ERI is expected because $H2$-ERI only focuses on accurately approximating the ERI matrix blocks algebraically. Errors in the computed energies generally would not be biased toward any specific direction, e.g., $H2$-ERI has both positive and negative errors in Table II, but DF only has negative errors. Note that even without significant error cancellation, $H2$-ERI still has at least comparable accuracy with DF in computing energy differences, as shown in Table III.

### C. Computational and memory storage costs

The use of $H2$-ERI and DF both require a precomputation step. For $H2$-ERI, this is the construction of the $H2$ matrix representation. For DF, this is the construction of (*ab*|*α*) and (*α*|*β*)^{−1}. After the precomputation step, the SCF iterations calculate the Coulomb matrix once per iteration.

For $H2$-ERI and DF, Table IV lists the execution timings for the precomputation and for one calculation of the Coulomb matrix. The memory storage costs for $H2$-ERI and DF are also listed. For reference, the timings for direct calculation of the Coulomb matrix are also shown. For direct calculation, eight-way symmetry and Schwarz screening of the ERI tensor are exploited. Precomputation and storage are not needed in direct calculation. Timings for calculating the exact exchange term and exchange–correlation term in the DFT term are provided in the Appendix.

. | Direct . | DF . | $H2$-ERI . | ||||
---|---|---|---|---|---|---|---|

. | J
. | Storage . | Precomp. . | J
. | Storage . | Precomp. . | J
. |

cc-pVDZ | |||||||

Alkane C_{60}H_{122} | 25.52 | 17.3 | 3.20 | 0.19 | 2.8 | 19.89 | 0.05 |

Alkane C_{100}H_{202} | 74.62 | 48.6 | 9.07 | 0.54 | 4.8 | 25.64 | 0.07 |

Graphene C_{96}H_{24} | 101.63 | 41.1 | 5.53 | 0.72 | 5.4 | 67.82 | 0.11 |

Graphene C_{150}H_{30} | 273.56 | 104.5 | 13.73 | 1.11 | 9.3 | 136.33 | 0.15 |

1hsg30 | 32.75 | 17.7 | 2.70 | 0.19 | 4.7 | 34.08 | 0.06 |

1hsg32 | 60.85 | 30.6 | 4.76 | 0.37 | 6.4 | 45.93 | 0.08 |

aug-cc-pVDZ | |||||||

Alkane C_{60}H_{122} | 350.29 | 101.3 | 12.39 | 1.43 | 21.7 | 130.60 | 0.30 |

Alkane C_{100}H_{202} | 1025.82 | 287.0 | 39.00 | 3.27 | 41.5 | 241.71 | 0.61 |

Graphene C_{54}H_{18} | 338.48 | 67.6 | 9.76 | 0.76 | 16.1 | 349.22 | 0.25 |

Graphene C_{96}H_{24} | 1597.58 | 260.2 | 33.98 | 2.73 | 34.4 | 664.28 | 0.53 |

1hsg30 | 725.67 | 130.6 | 13.90 | 2.26 | 47.8 | 478.04 | 0.61 |

1hsg32 | 1481.97 | 236.1 | 25.54 | 2.65 | 70.8 | 683.31 | 0.88 |

. | Direct . | DF . | $H2$-ERI . | ||||
---|---|---|---|---|---|---|---|

. | J
. | Storage . | Precomp. . | J
. | Storage . | Precomp. . | J
. |

cc-pVDZ | |||||||

Alkane C_{60}H_{122} | 25.52 | 17.3 | 3.20 | 0.19 | 2.8 | 19.89 | 0.05 |

Alkane C_{100}H_{202} | 74.62 | 48.6 | 9.07 | 0.54 | 4.8 | 25.64 | 0.07 |

Graphene C_{96}H_{24} | 101.63 | 41.1 | 5.53 | 0.72 | 5.4 | 67.82 | 0.11 |

Graphene C_{150}H_{30} | 273.56 | 104.5 | 13.73 | 1.11 | 9.3 | 136.33 | 0.15 |

1hsg30 | 32.75 | 17.7 | 2.70 | 0.19 | 4.7 | 34.08 | 0.06 |

1hsg32 | 60.85 | 30.6 | 4.76 | 0.37 | 6.4 | 45.93 | 0.08 |

aug-cc-pVDZ | |||||||

Alkane C_{60}H_{122} | 350.29 | 101.3 | 12.39 | 1.43 | 21.7 | 130.60 | 0.30 |

Alkane C_{100}H_{202} | 1025.82 | 287.0 | 39.00 | 3.27 | 41.5 | 241.71 | 0.61 |

Graphene C_{54}H_{18} | 338.48 | 67.6 | 9.76 | 0.76 | 16.1 | 349.22 | 0.25 |

Graphene C_{96}H_{24} | 1597.58 | 260.2 | 33.98 | 2.73 | 34.4 | 664.28 | 0.53 |

1hsg30 | 725.67 | 130.6 | 13.90 | 2.26 | 47.8 | 478.04 | 0.61 |

1hsg32 | 1481.97 | 236.1 | 25.54 | 2.65 | 70.8 | 683.31 | 0.88 |

Although the execution time for calculating the Coulomb matrix is lower for $H2$-ERI than for DF, the results show that the main advantage of $H2$-ERI over DF is that of memory storage requirements. DF requires storing (*ab*|*α*) and (*α*|*β*)^{−1}, which can be very large. Thus, $H2$-ERI extends the size of the molecular systems that can be processed in memory. In particular, if the DF memory requirement exceeds that of a given machine, then $H2$-ERI could be used instead.

On the other hand, the precomputation cost of $H2$-ERI is higher than that of DF. High precomputation cost is a common issue with the use of $H2$ matrix representations, in general, although such precomputation costs are amortized over each time the $H2$ matrix representation is used (e.g., to form the Coulomb matrix).

### D. Linear scaling

The execution time of the $H2$-ERI method (both precomputation and calculating the Coulomb matrix) scales linearly with the number of distributions |*I*| (after prescreening), as verified in Fig. 4, for a sequence of 1hsg systems ranging from 124 to 1208 atoms with the cc-pVDZ basis set. The storage cost also scales linearly.

In contrast, DF computation and storage scales with an exponent of ∼1.7 over this range. Due to better scaling of $H2$-ERI, the precomputation time for $H2$-ERI would be smaller than that of DF for large enough systems. However, if the three-index tensors of DF are stored in memory, the bottleneck for DF is memory usage. Calculations for DF were not carried out if the memory storage exceeded 1 TB.

Since the scaling of $H2$-ERI is linear with respect to |*I*|, the scaling with respect to the number of basis functions *N*_{bf} (equivalently, the number of atoms) is also linear if |*I*| is linear in *N*_{bf}. This holds true for large molecular system sizes. Figure 5 plots the ratio of |*I*| to *N*_{bf} for a range of molecular system sizes. A ratio curve turning flat indicates that |*I*| becomes linear in *N*_{bf}. As can be observed, for systems that are more globular and for basis sets with more diffuse functions, the point at which the ratio curve turns flat is larger.

## V. RELATIONSHIP BETWEEN $H2$-ERI AND CFMM

$H2$-ERI and CFMM share the same hierarchical partitioning of *I*. From an algebraic viewpoint, CFMM is equivalent to multiplication by the ERI matrix in a specific $H2$ matrix representation. This $H2$ matrix representation in CFMM differs from the $H2$ matrix representation constructed for $H2$-ERI in two important ways: the definition of FF blocks and the compression of FF blocks.

In CFMM, (*I*_{i}|*I*_{j}) is a FF block if the following two conditions hold: boxes *i* and *j* are well-separated, and the numerical supports of distributions in box *i* do not overlap with those in box *j*. The numerical support of a GTF distribution is characterized by a ball in CFMM, and the radius of this ball is referred to as the “extent” of the distribution. In comparison, $H2$-ERI does not require the second condition. As a result of this difference, CFMM defines far more NF blocks than $H2$-ERI since, for typical problems, the numerical support of a distribution usually spreads over several boxes at the leaf level of the partition tree.

Consider the 1D hierarchical partitioning in Fig. 3 as an example. The FF and NF blocks in $H2$-ERI plotted in Fig. 3 are independent of the actual distribution extents. Assuming that all the distributions have their extents being 0.9× (and 1.4×), the edge length of a leaf box, the corresponding FF and NF blocks defined in CFMM are plotted in Fig. 6(a) [and Fig. 6(b)]. In addition, Table V lists the total number of entries of the NF and FF blocks defined in both CFMM and $H2$-ERI for several examples. As can be observed from both the abstract and practical examples, far more NF blocks are defined in CFMM than in $H2$-ERI, especially when basis sets with diffuse functions are used and when the molecular structure is globular.

. | CFMM . | $H2$-ERI . | . | ||
---|---|---|---|---|---|

. | NF . | FF . | NF . | FF . | Ratio . |

cc-pVDZ | |||||

Alkane C_{60}H_{122} | 2.2 × 10^{9} | 1.5 × 10^{10} | 1.1 × 10^{9} | 1.6 × 10^{10} | 2.0 |

Alkane C_{100}H_{202} | 3.6 × 10^{9} | 4.5 × 10^{10} | 1.6 × 10^{9} | 4.7 × 10^{10} | 2.3 |

Graphene C_{96}H_{24} | 1.5 × 10^{10} | 8.3 × 10^{10} | 1.8 × 10^{9} | 9.7 × 10^{10} | 8.4 |

Graphene C_{150}H_{30} | 3.2 × 10^{10} | 2.5 × 10^{11} | 4.4 × 10^{9} | 2.7 × 10^{11} | 7.4 |

1hsg30 | 8.0 × 10^{9} | 2.4 × 10^{10} | 1.4 × 10^{9} | 3.0 × 10^{10} | 5.8 |

1hsg32 | 1.3 × 10^{10} | 5.1 × 10^{10} | 2.1 × 10^{9} | 6.1 × 10^{10} | 5.9 |

aug-cc-pVDZ | |||||

Alkane C_{60}H_{122} | 3.5 × 10^{10} | 2.9 × 10^{11} | 4.8 × 10^{9} | 3.2 × 10^{11} | 7.3 |

Alkane C_{100}H_{202} | 6.2 × 10^{10} | 8.8 × 10^{11} | 7.9 × 10^{9} | 9.4 × 10^{11} | 7.8 |

Graphene C_{54}H_{18} | 1.5 × 10^{11} | 2.2 × 10^{11} | 6.1 × 10^{9} | 3.6 × 10^{11} | 24.3 |

Graphene C_{96}H_{24} | 4.2 × 10^{11} | 1.6 × 10^{12} | 1.1 × 10^{10} | 2.0 × 10^{12} | 38.0 |

1hsg30 | 2.3 × 10^{11} | 6.2 × 10^{11} | 1.1 × 10^{10} | 8.4 × 10^{11} | 21.6 |

1hsg32 | 4.0 × 10^{11} | 1.4 × 10^{12} | 1.6 × 10^{10} | 1.8 × 10^{12} | 25.3 |

. | CFMM . | $H2$-ERI . | . | ||
---|---|---|---|---|---|

. | NF . | FF . | NF . | FF . | Ratio . |

cc-pVDZ | |||||

Alkane C_{60}H_{122} | 2.2 × 10^{9} | 1.5 × 10^{10} | 1.1 × 10^{9} | 1.6 × 10^{10} | 2.0 |

Alkane C_{100}H_{202} | 3.6 × 10^{9} | 4.5 × 10^{10} | 1.6 × 10^{9} | 4.7 × 10^{10} | 2.3 |

Graphene C_{96}H_{24} | 1.5 × 10^{10} | 8.3 × 10^{10} | 1.8 × 10^{9} | 9.7 × 10^{10} | 8.4 |

Graphene C_{150}H_{30} | 3.2 × 10^{10} | 2.5 × 10^{11} | 4.4 × 10^{9} | 2.7 × 10^{11} | 7.4 |

1hsg30 | 8.0 × 10^{9} | 2.4 × 10^{10} | 1.4 × 10^{9} | 3.0 × 10^{10} | 5.8 |

1hsg32 | 1.3 × 10^{10} | 5.1 × 10^{10} | 2.1 × 10^{9} | 6.1 × 10^{10} | 5.9 |

aug-cc-pVDZ | |||||

Alkane C_{60}H_{122} | 3.5 × 10^{10} | 2.9 × 10^{11} | 4.8 × 10^{9} | 3.2 × 10^{11} | 7.3 |

Alkane C_{100}H_{202} | 6.2 × 10^{10} | 8.8 × 10^{11} | 7.9 × 10^{9} | 9.4 × 10^{11} | 7.8 |

Graphene C_{54}H_{18} | 1.5 × 10^{11} | 2.2 × 10^{11} | 6.1 × 10^{9} | 3.6 × 10^{11} | 24.3 |

Graphene C_{96}H_{24} | 4.2 × 10^{11} | 1.6 × 10^{12} | 1.1 × 10^{10} | 2.0 × 10^{12} | 38.0 |

1hsg30 | 2.3 × 10^{11} | 6.2 × 10^{11} | 1.1 × 10^{10} | 8.4 × 10^{11} | 21.6 |

1hsg32 | 4.0 × 10^{11} | 1.4 × 10^{12} | 1.6 × 10^{10} | 1.8 × 10^{12} | 25.3 |

It is worth noting that, in CFMM, distributions in each leaf-level set *I*_{i} are further grouped into “branches” according to their extents, and each NF block is then subdivided into smaller blocks, some of which can also be characterized as FF blocks and compressed. Figure 6 does not illustrate the “branch” idea for simplicity, but the number of NF block entries counted in Table V has taken this approach into account.

In both $H2$-ERI and CFMM, the NF blocks can be precomputed, stored in memory, and recalled when needed. Alternatively, the NF blocks can be computed when they are needed. The bottleneck in CFMM is usually the storage or computation of the NF blocks.^{17} Thus, the reduction in the number of NF blocks in $H2$-ERI compared to CFMM alleviates this bottleneck.

The reason for this restricted definition of FF blocks in CFMM is that the application of the multipole expansion technique used in CFMM is restrictive, i.e., requiring two distributions to be nonoverlapping. Using multipole expansions, a FF block (*I*_{i}|*I*_{j}) in CFMM is compressed into the low-rank form

where *S*_{j} corresponds to the source-to-multipole linear operator for box *j*, *B*_{i,j} corresponds to the multipole-to-local linear operator from box *j* to box *i*, and *T*_{i} corresponds to the local-to-target linear operator for box *i*. Comparing this approximation with $(Ii|Ij)\u2248Ui(Iiid|Ijid)UjT$ in $H2$-ERI, we can note that *T*_{i}, *S*_{j}, and *B*_{i,j} in CFMM correspond to *U*_{i}, $UjT$, and $(Iiid|Ijid)$ in $H2$-ERI, respectively.

In CFMM, all operators *T*_{i}, *S*_{j}, and *B*_{i,j} can be analytically computed using *I*_{i}, *I*_{j}, and geometric information for boxes *i* and *j*. These operators can be dynamically computed when needed. In $H2$-ERI, however, the components *U*_{i} and $Iiid$ must be precomputed via ID approximation of ERI blocks. Experimentally, the computation cost in $H2$-ERI for constructing *U*_{i} and $Iiid$ is usually multiple times the computation cost for evaluating and compressing the NF blocks, and the storage cost for *U*_{i} and $Iiid$ is very small compared to the storage cost of the NF blocks. Thus, compared to CFMM, the advantage of $H2$-ERI having far fewer NF blocks could easily offset the additional precomputation and storage cost required for *U*_{i} and $Iiid$.

## VI. CONCLUSION

The main advantage of $H2$-ERI over DF for calculating the Coulomb matrix is the dramatically reduced storage needed for an accurate representation of the ERI matrix, allowing large-scale quantum chemical computations in memory without recomputing ERIs. On the other hand, $H2$-ERI has a relatively expensive precomputation step compared to DF. The cost of this precomputation, however, is usually much less than the cost of forming the Coulomb matrix directly (Table IV) and can be amortized over multiple SCF iterations where the Coulomb matrix is calculated each time.

Compared to CFMM, $H2$-ERI reduces the number of NF blocks. This reduces the storage required for the compressed representation of the ERI tensor (if the NF blocks are precomputed and stored) or the computation required when computing the Coulomb matrix (if the NF blocks are computed dynamically when needed). We note that the precomputation time we have reported for $H2$-ERI includes the time for computing the required NF blocks.

Once the ERI tensor is represented in the $H2$ matrix form, the possibility exists for using this representation to accelerate other computations involving the ERI tensor. The $H2$ matrix form could also be used to efficiently represent the three-index tensor (*ab*|*α*) in DF.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## ACKNOWLEDGMENTS

The authors would like to thank C. David Sherrill for helpful discussions. The authors gratefully acknowledge funding from the National Science Foundation (Grant No. ACI-1609842).

### APPENDIX: SCF CONVERGENCE AND TIMING DATA

Corresponding to the data in Tables II and IV, Table VI shows the number of SCF iterations and the average time per iteration for direct calculation of the Coulomb matrix, direct calculation of the exchange matrix, and calculation of the DFT exchange–correlation matrix (not including time for the exchange matrix). We note that our implementation of the DFT exchange–correlation matrix calculation is not optimized.

. | HF #Iter . | DFT (B3LYP) #Iter . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|

. | Direct . | DF . | $H2$-ERI . | Direct . | DF . | $H2$-ERI . | Direct-J . | Direct-K . | XC . |

cc-pVDZ | |||||||||

Alkane C_{60}H_{122} | 10 | 10 | 10 | 12 | 12 | 12 | 25.5 | 23.2 | 44.3 |

Alkane C_{100}H_{202} | 10 | 10 | 10 | 12 | 12 | 12 | 74.6 | 67.5 | 166.1 |

Graphene C_{96}H_{24} | 15 | 15 | 15 | 13 | 13 | 14 | 101.6 | 94.1 | 35.9 |

Graphene C_{150}H_{30} | 17 | 17 | 17 | 14 | 14 | 14 | 273.6 | 253.9 | 103.2 |

1hsg30 | 15 | 16 | 15 | 22 | 21 | 19 | 32.8 | 29.9 | 26.9 |

1hsg32 | 16 | 16 | 16 | 17 | 19 | 19 | 60.9 | 55.5 | 44.3 |

aug-cc-pVDZ | |||||||||

Alkane C_{60}H_{122} | 11 | 11 | 11 | 19 | 19 | 17 | 350.3 | 305.6 | 44.3 |

Alkane C_{100}H_{202} | 11 | 11 | 11 | 16 | 16 | 16 | 1025.8 | 892.8 | 166.1 |

Graphene C_{54}H_{18} | 14 | 14 | 14 | 13 | 11 | 11 | 338.5 | 304.3 | 35.9 |

Graphene C_{96}H_{24} | 15 | 15 | 15 | 14 | 13 | 12 | 1597.6 | 1432.5 | 103.2 |

1hsg30 | 16 | 16 | 16 | ⋯ | ⋯ | ⋯ | 725.7 | 634.4 | 26.9 |

1hsg32 | 19 | 19 | 19 | ⋯ | ⋯ | ⋯ | 1482.0 | 1330.3 | 101.6 |

. | HF #Iter . | DFT (B3LYP) #Iter . | Time (s) . | ||||||
---|---|---|---|---|---|---|---|---|---|

. | Direct . | DF . | $H2$-ERI . | Direct . | DF . | $H2$-ERI . | Direct-J . | Direct-K . | XC . |

cc-pVDZ | |||||||||

Alkane C_{60}H_{122} | 10 | 10 | 10 | 12 | 12 | 12 | 25.5 | 23.2 | 44.3 |

Alkane C_{100}H_{202} | 10 | 10 | 10 | 12 | 12 | 12 | 74.6 | 67.5 | 166.1 |

Graphene C_{96}H_{24} | 15 | 15 | 15 | 13 | 13 | 14 | 101.6 | 94.1 | 35.9 |

Graphene C_{150}H_{30} | 17 | 17 | 17 | 14 | 14 | 14 | 273.6 | 253.9 | 103.2 |

1hsg30 | 15 | 16 | 15 | 22 | 21 | 19 | 32.8 | 29.9 | 26.9 |

1hsg32 | 16 | 16 | 16 | 17 | 19 | 19 | 60.9 | 55.5 | 44.3 |

aug-cc-pVDZ | |||||||||

Alkane C_{60}H_{122} | 11 | 11 | 11 | 19 | 19 | 17 | 350.3 | 305.6 | 44.3 |

Alkane C_{100}H_{202} | 11 | 11 | 11 | 16 | 16 | 16 | 1025.8 | 892.8 | 166.1 |

Graphene C_{54}H_{18} | 14 | 14 | 14 | 13 | 11 | 11 | 338.5 | 304.3 | 35.9 |

Graphene C_{96}H_{24} | 15 | 15 | 15 | 14 | 13 | 12 | 1597.6 | 1432.5 | 103.2 |

1hsg30 | 16 | 16 | 16 | ⋯ | ⋯ | ⋯ | 725.7 | 634.4 | 26.9 |

1hsg32 | 19 | 19 | 19 | ⋯ | ⋯ | ⋯ | 1482.0 | 1330.3 | 101.6 |