We present a tensor hypercontracted (THC) scaled opposite spin second order Møller-Plesset perturbation theory (SOS-MP2) method. By using THC, we reduce the formal scaling of SOS-MP2 with respect to molecular size from quartic to cubic. We achieve further efficiency by exploiting sparsity in the atomic orbitals and using graphical processing units (GPUs) to accelerate integral construction and matrix multiplication. The practical scaling of GPU-accelerated atomic orbital-based THC-SOS-MP2 calculations is found to be N2.6 for reference data sets of water clusters and alanine polypeptides containing up to 1600 basis functions. The errors in correlation energy with respect to density-fitting-SOS-MP2 are less than 0.5 kcal/mol for all systems tested (up to 162 atoms).

Second order Møller-Plesset perturbation theory1 (MP2) is the simplest wavefunction-based method that can describe dynamic electron correlation effects. MP2 has been used extensively in studies of hydrogen bonding networks2,3 and noncovalent interactions in biological systems.4 However, the computational cost of canonical MP2 is dominated by the need to transform the two electron Coulomb repulsion integrals (ERIs) from the atomic orbital (AO) basis to the molecular orbital (MO) basis. This operation scales as the fifth power of molecular size (more precisely, the number of basis functions) and severely restricts the size of molecules where MP2 may be applied. As a result, there have been many efforts aiming to reduce the computational cost of MP2 calculations.

Two general approaches to reducing this computational complexity are possible. First, one can reduce the prefactor, leading to a method that is less computationally costly but with the same (inconvenient) scaling behavior. Second, one can reduce the scaling exponent. Such scaling reductions often come at the cost of an increased prefactor, so there is no gain until the molecule reaches a certain size.

The density fitting (DF) approximation,5–7 also known as resolution-of-the-identity (RI), aims to reduce the computational prefactor and memory requirements of MP2. DF/RI methods introduce a set of auxiliary basis functions to describe the charge density, thereby approximating the fourth order ERI tensor as a product of lower order tensors. By optimizing the auxiliary basis functions, DF-MP2 can achieve good agreement with canonical MP2.8–10 In addition, DF-MP2 is particularly well suited for implementation on massively parallel computers,11,12 enabling its application to large molecules with up to 4000 basis functions.13 Despite the great success of DF-MP2 at reducing the computational cost of MP2, one deficiency is that calculating the exchange-like correlation term still scales as the fifth power of the system size. This exchange term therefore becomes the bottleneck for large molecules.

An alternative way to reduce the computational prefactor is to utilize highly parallel hardware such as graphical processing units (GPUs). GPUs can launch thousands of threads simultaneously and have become a powerful tool in ab initio quantum chemistry.14–17 GPUs have been applied to MP218 and also many other types of electronic structure methods such as density functional theory,17,19,20 coupled cluster,21 and plane wave basis set techniques.22 

Most approaches to reduce the scaling behavior of correlation methods rely on spatial locality. This is most straightforward when the basis functions themselves exhibit the desired locality, as in the often-used atom-centered Gaussian basis sets. Within the context of MP2, one example is atomic orbital Laplace-transformed MP2,23,24 which scales quadratically for molecules with relatively large HOMO–LUMO gaps. Some methods attempt to further exploit the locality of electron correlation by introducing localized molecular orbitals. One successful example is the local electron correlation treatment originally proposed by Pulay25 and first implemented by Saebo and Pulay for Møller–Plesset perturbation theory up to fourth order, where projected atomic orbitals (PAOs) were used as a localized basis to expand the correlated space.26 By introducing multipole approximations and efficient prescreening algorithms, the scaling of both MP227 and CCSD28 can be reduced to linear. Neese et al. proposed the use of pair natural orbitals (PNOs) to overcome the problem that large PAO domains may be needed to converge the correlation energy.29 A PNO-based local-MP2 method has recently been reported by Werner et al., which scales linearly with molecular size and achieves high parallel efficiency.30 These local correlation concepts have been formalized into the sparse map framework, which provides a systematic way to treat electron correlation with sparse tensor techniques.31 

In the case of MP2, a second approach to reduce the scaling behavior consists of neglecting the exchange-like term. This does not affect the scaling of canonical MP2 (because the integral transformation remains the computational bottleneck) but in combination with the DF/RI approximation yields a quartic scaling method. Of course, the missing exchange contribution to the correlation energy must then be approximated in a computationally inexpensive way. One of the most important methods falling into this category is the scaled opposite spin MP2 (SOS-MP2) method,32 which approximates the correlation energy by scaling the Coulomb-like opposite-spin correlation energy with an empirical parameter. In combination with density fitting, this leads to quartic scaling with system size for the DF-SOS-MP2 energy (because the quintic scaling exchange term has been neglected). By using localized orbitals and exploiting the sparsity of local integral matrices and expansion coefficients, the scaling can be further reduced to cubic.33 Very recently, GPUs have been used to accelerate SOS-MP2 calculations based on a J-engine reformulation.34 

In addition to the traditional tensor-based approach, recently, there have also been several low-scaling methods proposed based on stochastic sampling that allow MP2 calculations to be performed in grid-based representations. Neuhauser et al. proposed a stochastic scheme that replaces the full summation with random averages.35 This is achieved by replacing the energy denominator with integration over real-time and introducing random wave functions. An alternative formulation in the energy-domain was later proposed by Ge et al.36 Another example is the Monte Carlo MP2 method proposed by Willow et al.,37 which evaluates the correlation energy through Monte Carlo integration by performing a stochastic sampling in the electron coordinate space. For each Monte Carlo step in this algorithm, the computational cost formally scales quadratically with respect to system size. These stochastic approaches are easily parallelized over many processors38 but have distinct disadvantages if energy gradients are required (for example, in ab initio molecular dynamics).

Tensor hypercontraction (THC)39,40 is a recently proposed method, which, like DF, approximates the ERI tensor by factorization into lower order tensors.41 Unlike DF, THC leads to a complete unpinning of all four indices in the ERI. This provides more flexibility compared to other methods (canonical and/or density-fitting MP2) as fewer indices may need to be carried during tensor operations. In tensor hyper-contraction, the ERI tensor is approximated as

pq|rsP,QXpPXqPZPQXrQXsQ,
(1)

where p, q, r, s are orbital indices and P, Q refer to a set of auxiliary variables. These auxiliary variables can have an abstract meaning if general tensor decompositions are employed,39 but they can also be given a concrete representation as gridpoints if desired, which is related to the grid-based representation adopted in the aforementioned stochastic algorithms. This emphasizes the relationship of Eq. (1) to a renormalized numerical quadrature42 and is the viewpoint taken in least-squares THC (LS-THC).40 Previous work39,43–45 has applied Eq. (1) in the MO basis. In principle, there is no restriction to the MO basis, but the accuracy that can be achieved for a fixed number of auxiliary indices could depend on the representation. In this work, we instead use the AO basis. As we will show, this can be more efficient and does not degrade the accuracy substantially. Furthermore, an AO formulation opens the door to efficient analytic derivative formulations.

In least-squares THC (LS-THC),40 one starts with a predetermined grid and the elements of the X tensors in Eq. (1) are then constrained to be values of orbitals at gridpoints

XpP=wP4ϕprP.
(2)

The quadrature grids used for LS-THC are constructed in a manner similar to the grids used in the density functional theory. Molecular grids are constructed from the union of spherical grids centered on each atom, which are formed as the direct product of angular (i.e., Lebedev-Laikov grids46) and radial grids (i.e., the Handy radial scheme47).45 When the MO basis is used, the resulting X tensors are dense since molecular orbitals are usually delocalized. In contrast, when THC is applied in the AO basis, the X tensors become sparse due to the locality of atom-centered Gaussian atomic orbitals. This can be exploited by using sparse tensor techniques.

As shown previously,39,40,45 the flexibility of THC allows cubic scaling computation of the Coulomb-like term in MP2 and quartic scaling computation of the exchange-like term.39 This implies cubic scaling of THC-SOS-MP2, where only the opposite spin correlation term (i.e., the Coulomb-like term) is computed. However, the construction of the Z tensor needed for THC scales either quartically or quintically (see below) and this bottleneck must be addressed in order to realize a cubic scaling THC-SOS-MP2 algorithm. In this work, we describe an efficient implementation of THC-SOS-MP2 by accelerating integral construction and matrix multiplication operations with GPUs and by exploiting the locality of atomic orbitals during any matrix multiplication operations involving the X tensors. Our algorithm does not depend on localization of the molecular orbitals but instead exploits locality in the atomic orbital basis set.

The structure of this paper is as follows. First, we discuss how X and Z (the fundamental ingredients in THC) are constructed on GPUs in the AO basis. Second, the working equations for THC-SOS-MP2 are presented, emphasizing the computational scaling when the locality of the AO basis set is exploited. Next, we test the accuracy for AO-based THC-SOS-MP2 by comparing with DF-SOS-MP2. Finally, we show the timings and scaling for the method on a variety of test sets.

The indices appearing in this work are summarized as follows:

  • μ, ν, λ, σ: Atomic Orbital (AO) basis indices, which are atom-centered contracted Gaussian functions. These indices range from 1…Nμ, where Nμ is proportional to system size N.

  • A, B, C: auxiliary basis indices for density fitting (DF) approximation, which are atom-centered Gaussian functions. These indices range from 1…NA, where NAN. For satisfactory accuracy, NA is usually three to four times larger than Nμ.

  • P, Q, R, S: auxiliary basis indices for THC approximation, which we take to represent gridpoints in physical space in this work. These indices range from 1…NP, where NPN. For satisfactory accuracy, NP is usually eight to ten times larger than Nμ.

  • i, j, k: occupied MO basis indices ranging from 1…Nocc.

  • a, b, c: virtual MO basis indices ranging from 1…Nvirt. Usually, Nvirt is significantly larger than Nocc.

  • κ: Laplace quadrature point indices. The range of these indices is independent of N, and 7-10 quadrature points are usually sufficient.

In the most general formulation of THC, the X and Z tensors both need to be determined.39 In LS-THC, the X tensors are prescribed (as collocation matrices on a predetermined molecular grid) and the Z tensor is determined by least-squares fitting,40 minimizing the difference between the THC-approximated and analytical integrals. The analytical solution for the Z tensor is

ZPQ=PQμνλσSphysPP1RμνPμν|λσRλσQSphysQQ1,
(3)

where the joint collocation matrix is defined as

RμνP=XμPXνP
(4)

and the THC (“physical space”) metric matrix is defined as

SphysPP=μνRμνPRμνP.
(5)

As written in Eq. (3), the scaling of Z tensor construction with respect to molecular size is quintic. This scaling can be reduced to quartic by using the density fitting approximation for the electron repulsion integrals as follows:

ZPQ=PQμνλσABSphysPP1RμνPμν|AA|B1×B|λσRλσQSphysQQ1.
(6)

Unfortunately, this would be the bottleneck if we proceeded to compute the Z tensor in this form, since the scaling of the THC-SOS-MP2 energy calculation is only cubic (vide infra). Thus, we show below how Eq. (6) can be evaluated in cubic time for a localized basis set.

Figure 1 illustrates how the X tensors are constructed on the GPUs. In order to exploit locality in atomic orbitals, we first divide the space containing the entire molecular geometry into cubic boxes with a specified edge length. The choice of the box size is discussed below. Primitive Gaussian basis functions and gridpoints are assigned to boxes according to their positions, and a sorting operation ensures that primitive Gaussians and gridpoints belonging to the same box are stored together. For every box, we define a box center R̄Box and a box radius ρBox. The box center is the average position of all gridpoints within the box. The box radius is the maximum distance from any of its gridpoints to its center. We also introduce a user-defined threshold, Xthre, such that elements in the X tensor with absolute value less than Xthre can be safely neglected (Xthre = 1 × 10−12 atomic units is used for all the results shown in this paper). The threshold Xthre is used to compute two cutoff distances for each angular momentum: the gridpoint cutoff distance rthre, which determines whether a gridpoint is located far enough to be neglected with respect to a primitive Gaussian function; and the box cutoff distance Rthre, which determines whether an entire box of gridpoints can be neglected. The gridpoint cutoff distance rthre is determined by solving

wmax4dmaxrthreLeαminrthre2Xthre,
(7)

where L is the angular momentum of the primitive Gaussian function; wmax is the maximum absolute value of gridpoint weight over all gridpoints; dmax and αmin are the maximum absolute value of contraction normalization factor and minimum exponent, respectively, over all primitive Gaussian functions. Therefore, the value of rthre reflects the properties of the most diffuse primitive Gaussian function in the system. For water clusters with cc-pVDZ basis set, rthre is ≈10 for Xthre = 1 × 10−12 (values in atomic units). The box cutoff distance Rthre is computed as Rthre=rthre+ρmaxBox, where ρmaxBox is the maximum radius among all boxes. The three thresholds Rthre, rthre, and Xthre are passed as parameters to the GPU kernels and are applied to all GPU threads. When computing on the GPUs, a 1-thread-1-primitive strategy13 is adopted where each thread only computes the elements of the X tensor for a single primitive Gaussian function. As shown in Figure 1, we exploit locality at three levels. First, each thread neglects boxes whose center is farther than Rthre from the center of the thread’s primitive Gaussian. Second, for boxes located within the cutoff distance, each thread then neglects gridpoints located farther than rthre from the thread’s primitive Gaussian. Finally, for the remaining gridpoints, the element value is computed but is only written to global memory if its absolute value exceeds the threshold Xthre. In order to balance the workload between threads, we require that all threads within a GPU block are assigned primitive Gaussian functions belonging to the same box. After the values of primitive Gaussians on gridpoints are transferred back to the central processing unit (CPU), a post-processing procedure is invoked to construct the final X tensors as a sparse matrix. Each box constructs its own X tensor, with the first index (major index) consistent with its own gridpoints, and the second index keeping track of all atomic orbitals with nontrivial values on any of its own gridpoints. As the number of contributing boxes is roughly constant for each primitive Gaussian, the nominal scaling of X tensor construction is linear. The accuracy of the above strategy is solely controlled by Xthre. Therefore, the choice of box size has no effect on the accuracy but may affect the performance. A large box size increases the box cutoff distance Rthre, and therefore, is inefficient for exploiting locality in X tensor construction. In contrast, although a small box size is favorable for screening the gridpoints at earlier stage, the resulting X sub-tensor for each box may not be large enough to saturate the graphics card in future tensor operations to benefit from its parallel capability. To balance between these two factors, in our implementation, we set the box length equal to 5.0 a.u. in the X tensor construction. After the post-processing procedure, boxes containing small X sub-tensors with fewer than 500 gridpoints are merged with adjacent boxes.

FIG. 1.

Linear scaling construction of the X tensors on GPUs. The distance from the center of the primitive Gaussian function P to the center of the corresponding box R̄Box is denoted as PR̄Box, and PRPt represents the distance from P to the corresponding gridpoint RPt. Only values associated with gridpoints that are close enough to P are computed and written to global memory. When constructing the sparse matrix during post-processing, the number of points for each box stays the same as pre-processing, and the number of atomic orbitals (AOs) for each box includes orbitals within the box plus orbitals from other boxes that have non-negligible contributions.

FIG. 1.

Linear scaling construction of the X tensors on GPUs. The distance from the center of the primitive Gaussian function P to the center of the corresponding box R̄Box is denoted as PR̄Box, and PRPt represents the distance from P to the corresponding gridpoint RPt. Only values associated with gridpoints that are close enough to P are computed and written to global memory. When constructing the sparse matrix during post-processing, the number of points for each box stays the same as pre-processing, and the number of atomic orbitals (AOs) for each box includes orbitals within the box plus orbitals from other boxes that have non-negligible contributions.

Close modal

To exploit locality in the joint collocation matrix R defined in Eq. (4) for Z tensor construction, we adopt the same box partitioning used in the X tensor construction. Although one could reuse the X tensor to compute R, we found better performance and lower memory consumption by recomputing the elements of R in each box. Figure 2 defines the concept of neighborhoods that we use to exploit locality in R as we construct the Z tensor. Figure 2(a) illustrates the neighborhood for a given primitive pair φ(r; P), where the center of the corresponding charge distribution P is defined by the Gaussian product rule. As the primitive pair is also a Gaussian function, we can compute R in the primitive pairs on GPUs using a similar strategy as that for X tensor construction. This involves first presorting the primitive pairs to consolidate those that are in the same box (this is done as a preprocessing step on the CPU). Primitive pairs pq with Schwarz bound,48 i.e., (pq|pq)1/2, lower than a user-defined threshold (1.0 × 10−13 a.u. is used for all test results shown in this paper) are neglected. Each GPU thread then computes R values for one primitive pair, considering only boxes of gridpoints within the neighborhood of this primitive pair. When a contracted basis set is used, an atomic orbital pair ϕμ(r) ϕν(r) is represented as the contraction of several primitive pairs ϕμrϕνr=mdmφmr;Pm each with its own unique center (where dm are the contraction coefficients). The neighborhood of this contracted atomic orbital pair is then defined as the union of the neighborhoods for its component primitive pairs, as shown in Figure 2(b). Each atomic orbital pair is assigned to a box according to the average coordinates of its primitive pair centers

1MNi=1Mj=1NζiA+ζjBζi+ζj,
(8)

where A and B represent the centers of two atomic orbitals, M and N represent the number of primitive Gaussian functions in each contracted atomic orbital, and ζi/ζj represent the exponents of each primitive Gaussian. As illustrated in Figure 2(c), the neighborhood of a box of atomic orbital pairs is the union of the neighborhoods of each of the contracted atomic orbital pairs in the box. For a large enough molecule, the size of all the neighborhoods discussed here will not depend strongly on molecular size. This implies that computational effort for constructing the joint collocation matrix will scale linearly with the number of atomic orbitals.

FIG. 2.

Illustration of the neighborhoods used for exploiting sparsity during Z tensor construction (Eq. (6)). Gridpoints are populated into boxes according to their positions. The neighborhood is a set of boxes whose gridpoints will have non-negligible contribution to a quantity of interest. (a) A primitive pair is populated into a box according to the center of its corresponding charge distribution; its neighborhood is composed of boxes that contain at least one gridpoint where the value of the primitive pair is non-negligible. (b) An atomic orbital pair is the contraction of several primitive pairs and is assigned to a box according to the average coordinates of its primitive pair centers; its neighborhood is the union of those of its primitive pairs. (c) The neighborhood of a box of atomic orbital pairs is the union of neighborhoods for each atomic orbital pair in the box.

FIG. 2.

Illustration of the neighborhoods used for exploiting sparsity during Z tensor construction (Eq. (6)). Gridpoints are populated into boxes according to their positions. The neighborhood is a set of boxes whose gridpoints will have non-negligible contribution to a quantity of interest. (a) A primitive pair is populated into a box according to the center of its corresponding charge distribution; its neighborhood is composed of boxes that contain at least one gridpoint where the value of the primitive pair is non-negligible. (b) An atomic orbital pair is the contraction of several primitive pairs and is assigned to a box according to the average coordinates of its primitive pair centers; its neighborhood is the union of those of its primitive pairs. (c) The neighborhood of a box of atomic orbital pairs is the union of neighborhoods for each atomic orbital pair in the box.

Close modal

Figure 3 shows the entire workflow used to accelerate construction of the Z tensor with the density fitting approximation. As the number of boxes comprising the neighborhood for a given box of atomic orbital pairs remains roughly constant, the calculation of the THC metric matrix defined in Eq. (5), which involves contraction over atomic orbital pairs, can be computed with linearly scaling operations. By using the same strategy, the contraction of the joint collocation matrix with the three-center two-electron integrals

MQA=μνRμνQμν|A
(9)

can be computed with quadratic scaling operations. Therefore, the overall nominal scaling for the integral calculations is quadratic. When the integral calculations are completed, a series of cubic-scaling linear algebra operations (bottom two boxes in Figure 3) are invoked to construct the final Z tensor. The inversions of the DF Coulomb metric matrix (A|C) and the THC metric matrix Sphys are computed on the CPU. As the THC metric matrix Sphys is usually ill-conditioned, a pseudo-inversion is performed by first diagonalizing the matrix, and then neglecting eigenvalues smaller than a predefined threshold (1.0 × 10−12 a.u. in this work). We then perform a series of cubic scaling matrix multiplications on the GPU to compute the Z tensor as

ZPQ=PABQSphysPP1MPAA|B1MBQSphysQQ1.
(10)

From the scaling analysis, the linear algebra operations are expected to dominate the computations as the systems become larger.

FIG. 3.

Flowchart for Z tensor construction (Eq. (6)), noting nominal scaling for each step. Operations colored in green are computed on the GPU, and other operations are computed on the CPU. Sparsity is exploited in the computation of the joint collocation matrix R, as described in the text and similar to X construction depicted in Figure 1. Angular momenta of the atomic orbital pairs and the auxiliary orbitals are denoted as (La, Lb) and L, respectively. For a chosen CentralBox of atomic orbital pairs, its Neighborhood is composed of a list of boxes as illustrated in Figure 2(c).

FIG. 3.

Flowchart for Z tensor construction (Eq. (6)), noting nominal scaling for each step. Operations colored in green are computed on the GPU, and other operations are computed on the CPU. Sparsity is exploited in the computation of the joint collocation matrix R, as described in the text and similar to X construction depicted in Figure 1. Angular momenta of the atomic orbital pairs and the auxiliary orbitals are denoted as (La, Lb) and L, respectively. For a chosen CentralBox of atomic orbital pairs, its Neighborhood is composed of a list of boxes as illustrated in Figure 2(c).

Close modal

Once X and Z have been constructed, they can be used to calculate the correlation energy. The canonical expression for the SOS-MP2 correlation energy based on a restricted, closed-shell HF reference is given as

ESOSMP2=cosijabia|jbia|jbΔijab,
(11)

where Δijab is the orbital energy denominator

Δijab=1εa+εbεiεj
(12)

and cos is the empirical factor, which approximates the total MP2 correlation energy by scaling the opposite spin correlation energy (a value of cos = 1.3 was previously suggested).32 The Laplace transformation is applied to factorize the orbital energy denominator, which can then be evaluated by numerical quadrature49,50

Δijab=0eεa+εbεiεjtdtκτiκτjκτaκτbκ.
(13)

We start by writing the Coulomb-like correlation energy in terms of the MO-THC approximation and the Laplace transformed energy denominators

EOSijabPQRSκτiκτjκτaκτbκXiPXaPZPQXjQXbQXiRXaRZRSXjSXbS.
(14)

In order to exploit locality in the AO basis, we first construct the following two transformation matrices:23,51

Tμμκ,occ=iCiμτiκCiμ,ONoccNμ2,
(15)
Tμμκ,virt=aCaμτaκCaμ,ONvirtNμ2,
(16)

where the scaling of the required operations has been shown next to the equations. Given these transformation matrices, the Coulomb-like term can be written in the AO basis as

EOSμμννλλσσPQRSκTμμκ,occTννκ,virtTλλκ,occTσσκ,virtXμPXνPZPQXλQXσQXμRXνRZRSXλSXσS.
(17)

By exploiting symmetry, the expression can be simplified as

EOSκPSGκ,PSGκ,SP,ONP2,
(18)

where

Gκ,PS=μμννRTμμκ,occTννκ,virtXμPXνPXμRXνRZRS.
(19)

By applying the transformation matrices to the X matrix, we then define

Yμκ,P=μTμμκ,occXμP,ONμ2,
(20)
Wμκ,P=μTμμκ,virtXμP,ONμ2.
(21)

The nominal scaling of the operations has made use of the fact that the number of significant gridpoints for each atomic orbital is nearly constant. Now, the G matrix defined in Eq. (19) and the SOS-MP2 energy in Eq. (17) can be evaluated in O(N3) operations

Aκ,PR=μXμPYμκ,R,ONPNμ,
(22)
Bκ,PR=νWνκ,PXνR,ONPNμ,
(23)
Fκ,PR=Aκ,PRBκ,PR,ONP2,
(24)
Gκ,PS=RFκ,PRZRS,ONP3.
(25)

Thus, the calculation of the SOS-MP2 energy (given X and Z tensors in the AO basis) formally scales cubically with molecular size.

The method described above was implemented in the TeraChem14–16 quantum chemistry package (which currently supports atom-centered spd Gaussian basis sets). All calculations were performed with the cc-pVDZ basis52 without the frozen core approximation. We will use the term AO-THC-DF-SOS-MP2 to denote this work, which indicates that THC factorization is applied to AO-based integrals approximated with density fitting (Eq. (6)), instead of the exact integrals (Eq. (3)). Similarly, the term MO-THC-DF-SOS-MP2 will be used to denote our previous implementations45 where THC factorization is applied in MO-basis. We used grids previously optimized for THC-MP2 in the molecular orbital basis.45 The THC grids contain around 60 gridpoints per hydrogen atom, and around 120 gridpoints per each carbon, nitrogen, or oxygen atom. The cc-pVDZ-RI auxiliary basis set is used in the density fitting approximation.53 

The accuracy of the AO-THC-DF-SOS-MP2 correlation energy was verified by comparing to the DF-SOS-MP2 energy implemented in Q-Chem.54 Errors for AO-THC-DF-SOS-MP2 with respect to DF-SOS-MP2 are presented in Figures 4 and 5 for alanine polypeptides and water clusters, respectively. The errors introduced by using the THC approximation are less than 0.5 kcal/mol for alanine polypeptides up to 16 amino acids (162 atoms, 1610 basis functions) and less than 0.21 kcal/mol for water clusters containing up to 48 water molecules (144 atoms, 1200 basis functions). These errors correspond to less than 0.0075% of the total correlation energy and are well within chemical accuracy. In Figure S1,58 we compare the error from the THC approximation with that arising from the DF approximation (for a series of alkenes), showing that the THC error is more than an order of magnitude less than the DF error (which has already been found to be sufficiently accurate in many applications). In Figure S2,58 we show the THC error, which arises for basis sets containing diffuse functions (for a series of water clusters). As there are currently no optimized grids for basis sets with diffuse functions, we used a grid optimized for the cc-pVTZ basis set.45 Even so, the THC error is well below 1 kcal/mol. Optimized grids for diffuse basis sets are under development, and these should decrease the error substantially. Finally, it is important to show that the THC errors are systematic, i.e., that they at least partially cancel when taking energy differences. We have shown this for canonical MP2 previously. In Figure S3,58 we compare the relative DF-SOS-MP2 energies of 21 (H2O)16 water cluster structures corresponding to local minima. The RMS error of these relative energies for AO-THC-DF-SOS-MP2 compared to DF-SOS-MP2 is 0.015 kcal/mol well within chemically acceptable errors.

FIG. 4.

Absolute error of AO-THC-DF-SOS-MP2 with respect to DF-SOS-MP2 for alanine polypeptides (ALA)N. Calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set.

FIG. 4.

Absolute error of AO-THC-DF-SOS-MP2 with respect to DF-SOS-MP2 for alanine polypeptides (ALA)N. Calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set.

Close modal
FIG. 5.

Absolute error of AO-THC-DF-SOS-MP2 with respect to DF-SOS-MP2 for water clusters. Calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set.

FIG. 5.

Absolute error of AO-THC-DF-SOS-MP2 with respect to DF-SOS-MP2 for water clusters. Calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set.

Close modal

Using the water clusters as a test set, Figure 6 compares the computational scaling with respect to the number of basis functions for AO-THC-DF-SOS-MP2, MO-THC-DF-SOS-MP2, DF-SOS-MP2, and Hartree-Fock. In these tests, AO-THC-DF-SOS-MP2 and HF are computed using TeraChem55 on the GPU, MO-THC-DF-SOS-MP2 is computed on the CPU with a development version of Psi4,56 and DF-SOS-MP2 is computed on the CPU with Q-Chem v4.2.54 When comparing absolute times, one should bear in mind that these correspond to different architectures and programs. The observed scaling should not be sensitive to the variation in architectures but instead is a reflection of the underlying algorithms. The formal scaling of MO-based THC-DF-SOS-MP2 is O(N4), which is the same as DF-SOS-MP2. In contrast, the formal scaling of AO-based THC-DF-SOS-MP2 is O(N3). Figure 6 shows that DF-SOS-MP2 exhibits exactly the expected quartic scaling, while MO-THC-DF-SOS-MP2 is only cubic in this size regime. The AO-THC-SOS-MP2 method scales even more favorably with an observed scaling in this size regime of N2.6. This scaling is only slightly worse than the preceding Hartree-Fock calculations (as performed by TeraChem on the GPU). For (H2O)48, which is the largest water cluster tested, AO-THC-DF-SOS-MP2 requires about 5 times as much computing time as the prerequisite Hartree-Fock calculation. We have also performed the same tests on alanine polypeptides and obtained similar scaling behavior (up to 1610 basis functions) as shown in Figure S4.58 For the largest tested alanine polypeptide (ALA)16, AO-THC-DF-SOS-MP2 requires about 3 times longer than the preceding Hartree-Fock calculation.

FIG. 6.

The scaling of the AO-THC-DF-SOS-MP2 calculations (red) compared with DF-SOS-MP2 (green), MO-THC-DF-SOS-MP2 (blue), and Hartree-Fock (black) calculations on water clusters taken from ice crystal structures with increasing sizes. All calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. The geometry shown in the inset corresponds to the largest test cluster (H2O)48. DF-SOS-MP2 calculations are performed with Q-Chem and MO-THC-DF-SOS-MP2 calculations are performed with Psi4, both using one Intel Xeon E5-2643 CPU core. The AO-THC-DF-SOS-MP2 and Hartree-Fock calculations are performed with TeraChem using one of the same type of CPU core and one GeForce GTX 970 GPU. The SOS-MP2 timing results shown do not include time spent on the Hartree-Fock calculations. The SCF convergence threshold is set to 1.0 × 105 Hartree. For the largest test system, the time spent on AO-THC-DF-SOS-MP2 is about 5x the time spent on the Hartree-Fock.

FIG. 6.

The scaling of the AO-THC-DF-SOS-MP2 calculations (red) compared with DF-SOS-MP2 (green), MO-THC-DF-SOS-MP2 (blue), and Hartree-Fock (black) calculations on water clusters taken from ice crystal structures with increasing sizes. All calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. The geometry shown in the inset corresponds to the largest test cluster (H2O)48. DF-SOS-MP2 calculations are performed with Q-Chem and MO-THC-DF-SOS-MP2 calculations are performed with Psi4, both using one Intel Xeon E5-2643 CPU core. The AO-THC-DF-SOS-MP2 and Hartree-Fock calculations are performed with TeraChem using one of the same type of CPU core and one GeForce GTX 970 GPU. The SOS-MP2 timing results shown do not include time spent on the Hartree-Fock calculations. The SCF convergence threshold is set to 1.0 × 105 Hartree. For the largest test system, the time spent on AO-THC-DF-SOS-MP2 is about 5x the time spent on the Hartree-Fock.

Close modal

In order to better understand the performance-limiting step, we examined the timing and scaling for each component of the AO-THC-DF-SOS-MP2 calculations, as shown in Figure 7 for water clusters. Similar information for alanine polypeptides is provided in Figure S5.58 The correlation energy calculation (“SOS Energy,” Eqs. (14)-(25)) scales sub-cubically and is not the computational bottleneck for all systems tested. This indicates that Eqs. (15), (16), and (25) (which formally scale cubically) have low prefactors compared to the remaining equations for the SOS energy (which formally scale quadratically). The component that has the highest scaling is the construction of the Z tensor in Eq. (6), which becomes the computational bottleneck for large systems. Although this involves a number of matrix multiplications, the dominant step is the inversion of the DF and THC metric matrices (inner and outer terms on the right-hand side in Eq. (10)). As mentioned above, these pseudo-inversions are currently performed on the CPU by diagonalizing the full matrix. They are therefore much more expensive than matrix multiplications, which have the same cubic scaling but are greatly accelerated by the GPUs. In order to further extend the size of molecules for which our method is feasible, it will be necessary to make this matrix inversion more efficient. This work is currently in progress.

FIG. 7.

The scaling of individual steps in AO-THC-DF-SOS-MP2 calculations on water clusters. All calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. “Linear Algebra” refers to the time spent on matrix inversions and matrix multiplications, as shown by the last two boxes in Figure 3. “SOS Energy” refers to the time spent on correlation energy calculations after the X and Z tensors have been built (Eqs. (17)-(25) in the text). “Integrals” refers to the construction of the X and Z tensors as well as the auxiliary basis integrals (Eqs. (4)-(6) and (9) in the text). The inset geometry is (H2O)36 which is the molecular size where linear algebra starts to dominate the calculation.

FIG. 7.

The scaling of individual steps in AO-THC-DF-SOS-MP2 calculations on water clusters. All calculations use the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. “Linear Algebra” refers to the time spent on matrix inversions and matrix multiplications, as shown by the last two boxes in Figure 3. “SOS Energy” refers to the time spent on correlation energy calculations after the X and Z tensors have been built (Eqs. (17)-(25) in the text). “Integrals” refers to the construction of the X and Z tensors as well as the auxiliary basis integrals (Eqs. (4)-(6) and (9) in the text). The inset geometry is (H2O)36 which is the molecular size where linear algebra starts to dominate the calculation.

Close modal

By taking advantage of the THC factorization, we demonstrated a reduction in the formal scaling of SOS-MP2 from quartic to cubic with respect to the size of the system. In practice, the observed scaling is an even more favorable N2.6 in the size regime up to 160 atoms. As the method is highly efficient, it can potentially be applied to study properties of large-scale hydrogen bonding networks and to understand noncovalent interactions in biological molecules.

The implementation of GPU-accelerated X and Z tensor construction provides an efficient way to compute the THC approximated integrals and can be used in other types of THC-approximated correlation methods. Even when the X and Z tensor construction is not the computational bottleneck, the GPU implementation will still be helpful in reducing the total wall time. For example, it can be applied to previous algorithms for THC coupled cluster (THC-CC2 and THC-CCSD)43,57 and THC equation-of-motion coupled cluster (THC-EOM-CC2)43 calculations.

We have demonstrated that THC applied in the AO basis is not only efficient due to locality inherent in the AO basis but also gives accurate correlation energies. In addition to the advantages in terms of low computational cost as we have discussed, AO-based THC is more appealing than the MO-based THC for developing analytic gradients of THC approximated methods. When computing the gradients of the Z tensor (Eq. (3) or Eq. (6)) in the MO basis with respect to the atomic degrees of freedom, one should include the contributions from the gradients of molecular orbital coefficients, which requires the solution of coupled-perturbed Hartree-Fock equations. The expression of the Z tensor in the AO basis overcomes this problem and opens up the possibility of developing analytical gradients for THC and extending the applications of THC to ab initio molecular dynamics. We are currently working on these enhancements.

In the last part of the Discussion section, we have observed that inverting the metric matrices becomes the bottleneck as system sizes become large. In a forthcoming paper,59 we will explore methods that can reduce the cost of these matrix inversions. Such advances would make THC-SOS-MP2 feasible for larger biological systems such as proteins.

This work was supported by the National Science Foundation (Grant No. CHE-1565249) and used the XStream computational resource supported by the NSF MRI Program (Grant No. ACI-1428930). We thank Edward Hohenstein and Lee-Ping Wang for helpful suggestions and insightful discussions. T.J.M. is grateful to the Department of Defense (Office of the Assistant Secretary of Defense for Research and Engineering) for a National Security Science and Engineering Faculty Fellowship (NSSEFF). C.S. is grateful for a Stanford Graduate Fellowship.

1.
C.
Moller
and
M. S.
Plesset
, “
Note on an approximation treatment for many-electron systems
,”
Phys. Rev.
46
,
618
(
1934
).
2.
S. S.
Xantheas
, “
Cooperativity and hydrogen bonding network in water clusters
,”
Chem. Phys.
258
,
225
(
2000
).
3.
S.
Yoo
,
E.
Apra
,
X. C.
Zeng
, and
S. S.
Xantheas
, “
High-level ab initio electronic structure calculations of water clusters (H2O)(16) and (H2O)(17): A new global minimum for (H2O)(16)
,”
J. Phys. Chem. Lett.
1
,
3122
(
2010
).
4.
K. E.
Riley
and
P.
Hobza
, “
Assessment of the MP2 method, along with several basis sets, for the computation of interaction energies of biologically relevant hydrogen bonded and dispersion bound complexes
,”
J. Phys. Chem. A
111
,
8257
(
2007
).
5.
O.
Vahtras
,
J.
Almlof
, and
M.
Feyereisen
, “
Integral approximations for LCAO-SCF calculations
,”
Chem. Phys. Lett.
213
,
514
(
1993
).
6.
M.
Feyereisen
,
G.
Fitzgerald
, and
A.
Komornicki
, “
Use of approximate integrals in ab initio theory: An application in MP2 energy calculations
,”
Chem. Phys. Lett.
208
,
359
(
1993
).
7.
F.
Weigend
,
M.
Haser
,
H.
Patzelt
, and
R.
Ahlrichs
, “
RI-MP2: Optimized auxiliary basis sets and demonstration of efficiency
,”
Chem. Phys. Lett.
294
,
143
(
1998
).
8.
D. E.
Bernholdt
and
R. J.
Harrison
, “
Fitting basis sets for the RI-MP2 approximate second-order many-body perturbation theory method
,”
J. Chem. Phys.
109
,
1593
(
1998
).
9.
C.
Hattig
, “
Optimization of auxiliary basis sets for RI-MP2 and RI-CC2 calculations: Core-valence and quintuple-zeta basis sets for H to Ar and QZVPP basis sets for Li to Kr
,”
Phys. Chem. Chem. Phys.
7
,
59
(
2005
).
10.
M.
Tanaka
,
M.
Katouda
, and
S.
Nagase
, “
Optimization of RI-MP2 auxiliary basis functions for 6-31G** and 6-311G** basis sets for first-, second-, and third-row elements
,”
J. Comput. Chem.
34
,
2568
(
2013
).
11.
D. E.
Bernholdt
, “
Scalability of correlated electronic structure calculations on parallel computers: A case study of the RI-MP2 method
,”
Parallel Comput.
26
,
945
(
2000
).
12.
D. E.
Bernholdt
and
R. J.
Harrison
, “
Large-scale correlated electronic structure calculations: The RI-MP2 method on parallel computers
,”
Chem. Phys. Lett.
250
,
477
(
1996
).
13.
M.
Katouda
and
S.
Nagase
, “
Efficient parallel algorithm of second-order Moller-Plesset perturbation theory with resolution-of-identity approximation (RI-MP2)
,”
Int. J. Quantum Chem.
109
,
2121
(
2009
).
14.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. I. Strategies for two-electron integral evaluation
,”
J. Chem. Theory Comput.
4
,
222
(
2008
).
15.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. II. Direct self-consistent-field implementation
,”
J. Chem. Theory Comput.
5
,
1004
(
2009
).
16.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. III. Analytical energy gradients, geometry optimization, and first principles molecular dynamics
,”
J. Chem. Theory Comput.
5
,
2619
(
2009
).
17.
K.
Yasuda
, “
Acclerating density functional calculations with graphical processing unit
,”
J. Chem. Theory Comput.
4
,
1230
(
2008
).
18.
L.
Vogt
,
R.
Olivares-Amaya
,
S.
Kermes
,
Y.
Shao
,
C.
Amador-Bedolla
, and
A.
Aspuru-Guzik
, “
Accelerating resolution-of-the-identity second-order Moller-Plesset quantum chemistry calculations with graphical processing units
,”
J. Phys. Chem. A
112
,
2049
(
2008
).
19.
L.
Genovese
,
M.
Ospici
,
T.
Deutsch
,
J.-F.
Mehaut
,
A.
Neelov
, and
S.
Goedecker
, “
Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures
,”
J. Chem. Phys.
131
,
034103
(
2009
).
20.
I. S.
Ufimtsev
,
N.
Luehr
, and
T. J.
Martinez
, “
Charge transfer and polarization in solvated proteins from ab initio molecular dynamics
,”
J. Phys. Chem. Lett.
2
,
1789
(
2011
).
21.
A. E.
DePrince
and
J. R.
Hammond
, “
Coupled cluster theory on graphics processing units. I. The coupled cluster doubles method
,”
J. Chem. Theory Comput.
7
,
1287
(
2011
).
22.
M.
Hacene
,
A.
Anciaux-Sedrakian
,
X.
Rozanska
,
D.
Klahr
,
T.
Guignon
, and
P.
Fleurat-Lessard
, “
Accelerating VASP electronic structure calculations using graphic processing units
,”
J. Comput. Chem.
33
,
2581
(
2012
).
23.
P. Y.
Ayala
and
G. E.
Scuseria
, “
Linear scaling second-order Moller-Plesset theory in the atomic orbital basis for large molecular systems
,”
J. Chem. Phys.
110
,
3660
(
1999
).
24.
P. Y.
Ayala
,
K. N.
Kudin
, and
G. E.
Scuseria
, “
Atomic orbital Laplace-transformed second-order Moller-Plesset theory for periodic systems
,”
J. Chem. Phys.
115
,
9698
(
2001
).
25.
S.
Saebo
and
P.
Pulay
, “
Local treatment of electron correlation
,”
Annu. Rev. Phys. Chem.
44
,
213
(
1993
).
26.
S.
Saebo
and
P.
Pulay
, “
4th-order Moller-Plessett perturbation theory in the local correlation treatment. I. Method
,”
J. Chem. Phys.
86
,
914
(
1987
).
27.
M.
Schutz
,
G.
Hetzer
, and
H. J.
Werner
, “
Low-order scaling local electron correlation methods. I. Linear scaling local MP2
,”
J. Chem. Phys.
111
,
5691
(
1999
).
28.
M.
Schutz
and
H. J.
Werner
, “
Low-order scaling local electron correlation methods. IV. Linear scaling local coupled-cluster (LCCSD)
,”
J. Chem. Phys.
114
,
661
(
2001
).
29.
F.
Neese
,
F.
Wennmohs
, and
A.
Hansen
, “
Efficient and accurate local approximations to coupled-electron pair approaches: An attempt to revive the pair natural orbital method
,”
J. Chem. Phys.
130
,
114108
(
2009
).
30.
H.-J.
Werner
,
G.
Knizia
,
C.
Krause
,
M.
Schwilk
, and
M.
Dornbach
, “
Scalable electron correlation methods. I. PNO-LMP2 with linear scaling in the molecular size and near-inverse-linear scaling in the number of processors
,”
J. Chem. Theory Comput.
11
,
484
(
2015
).
31.
P.
Pinski
,
C.
Riplinger
,
E. F.
Valeev
, and
F.
Neese
, “
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals
,”
J. Chem. Phys.
143
,
034108
(
2015
).
32.
J.
Yousung
,
R. C.
Lochan
,
A. D.
Dutoi
, and
M.
Head-Gordon
, “
Scaled opposite-spin second order Moller-Plesset correlation energy: An economical electronic structure method
,”
J. Chem. Phys.
121
,
9793
(
2004
).
33.
J.
Yousung
,
S.
Yihan
, and
M.
Head-Gordon
, “
Fast evaluation of scaled opposite spin second-order Moller-Plesset correlation energies using auxiliary basis expansions and exploiting sparsity
,”
J. Comput. Chem.
28
,
1953
(
2007
).
34.
S. A.
Maurer
,
J.
Kussmann
, and
C.
Ochsenfeld
, “
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units
,”
J. Chem. Phys.
141
,
051106
(
2014
).
35.
D.
Neuhauser
,
E.
Rabani
, and
R.
Baer
, “
Expeditious stochastic approach for MP2 energies in large electronic systems
,”
J. Chem. Theory Comput.
9
,
24
(
2013
).
36.
Q.
Ge
,
Y.
Gao
,
R.
Baer
,
E.
Rabani
, and
D.
Neuhauser
, “
A guided stochastic energy-domain formulation of the second order Møller–Plesset perturbation theory
,”
J. Phys. Chem. Lett.
5
,
185
(
2014
).
37.
S. Y.
Willow
,
K. S.
Kim
, and
S.
Hirata
, “
Stochastic evaluation of second-order many-Body perturbation energies
,”
J. Chem. Phys.
137
,
204122
(
2012
).
38.
S. Y.
Willow
,
M. R.
Hermes
,
K. S.
Kim
, and
S.
Hirata
, “
Convergence acceleration of parallel Monte Carlo second-order many-body perturbation calculations using redundant walkers
,”
J. Chem. Theory Comput.
9
,
4396
(
2013
).
39.
E. G.
Hohenstein
,
R. M.
Parrish
, and
T. J.
Martinez
, “
Tensor hypercontraction density fitting. I. Quartic scaling second- and third-order Moller-Plesset perturbation theory
,”
J. Chem. Phys.
137
,
044103
(
2012
).
40.
R. M.
Parrish
,
E. G.
Hohenstein
,
T. J.
Martinez
, and
C. D.
Sherrill
, “
Tensor hypercontraction. II. Least-squares renormalization
,”
J. Chem. Phys.
137
,
224106
(
2012
).
41.

Throughout this article, we use the word tensor to indicate any multi-dimensional array and do not mean to imply any specific transformation properties.

42.
R. M.
Parrish
,
E. G.
Hohenstein
,
N. F.
Schunck
,
C. D.
Sherrill
, and
T. J.
Martinez
, “
Exact tensor hypercontraction: A universal technique for the resolution of matrix elements of local finite-range N-body potentials in many-body quantum problems
,”
Phys. Rev. Lett.
111
,
132505
(
2013
).
43.
E. G.
Hohenstein
,
S. I. L.
Kokkila
,
R. M.
Parrish
, and
T. J.
Martinez
, “
Quartic scaling second-order approximate coupled cluster singles and doubles via tensor hypercontraction: THC-CC2
,”
J. Chem. Phys.
138
,
124111
(
2013
).
44.
E. G.
Hohenstein
,
S. I. L.
Kokkila
,
R. M.
Parrish
, and
T. J.
Martinez
, “
Tensor hypercontraction equation-of-motion second-order approximate coupled cluster: Electronic excitation energies in O(N-4) time
,”
J. Phys. Chem. B
117
,
12972
(
2013
).
45.
S. I. L.
Kokkila Schumacher
,
E. G.
Hohenstein
,
R. M.
Parrish
,
L.-P.
Wang
, and
T. J.
Martinez
, “
Tensor hypercontraction second-order Moller-Plesset perturbation theory: Grid optimization and reaction energies
,”
J. Chem. Theory Comput.
11
,
3042
(
2015
).
46.
V. I.
Lebedev
and
D. N.
Laikov
, “
A quadrature formula for the sphere of the 131st algebraic order of accuracy
,”
Dokl. Math.
59
,
477
(
1999
).
47.
C. W.
Murray
,
N. C.
Handy
, and
G. J.
Laming
, “
Quadrature schemes for integrals of density functional theory
,”
Mol. Phys.
78
,
997
(
1993
).
48.
J. L.
Whitten
, “
Coulombic potential-energy integrals and approximations
,”
J. Chem. Phys.
58
,
4496
(
1973
).
49.
M.
Haser
and
J.
Almlof
, “
Laplace transform techniques in Moller-Plesset perturbation theory
,”
J. Chem. Phys.
96
,
489
(
1992
).
50.
J.
Almlof
, “
Elimination of energy denominators in Moller-Plesset perturbation theory by a Laplace transform approach
,”
Chem. Phys. Lett.
181
,
319
(
1991
).
51.
M.
Haser
, “
Moller-Plesset (MP2) perturbation theory for large molecules
,”
Theor. Chim. Acta
87
,
147
(
1993
).
52.
T. H.
Dunning
, “
Gaussian-basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
90
,
1007
(
1989
).
53.
F.
Weigend
,
A.
Kohn
, and
C.
Hattig
, “
Efficient use of the correlation consistent basis sets in resolution of the identity MP2 calculations
,”
J. Chem. Phys.
116
,
3175
(
2002
).
54.
Y.
Shao
,
L. F.
Molnar
,
Y.
Jung
,
J.
Kussmann
,
C.
Ochsenfeld
,
S. T.
Brown
,
A. T. B.
Gilbert
,
L. V.
Slipchenko
,
S. V.
Levchenko
,
D. P.
O’Neill
,
R. A.
DiStasio
, Jr.
,
R. C.
Lochan
,
T.
Wang
,
G. J. O.
Beran
,
N. A.
Besley
,
J. M.
Herbert
,
C. Y.
Lin
,
T.
Van Voorhis
,
S. H.
Chien
,
A.
Sodt
,
R. P.
Steele
,
V. A.
Rassolov
,
P. E.
Maslen
,
P. P.
Korambath
,
R. D.
Adamson
,
B.
Austin
,
J.
Baker
,
E. F. C.
Byrd
,
H.
Dachsel
,
R. J.
Doerksen
,
A.
Dreuw
,
B. D.
Dunietz
,
A. D.
Dutoi
,
T. R.
Furlani
,
S. R.
Gwaltney
,
A.
Heyden
,
S.
Hirata
,
C.-P.
Hsu
,
G.
Kedziora
,
R. Z.
Khalliulin
,
P.
Klunzinger
,
A. M.
Lee
,
M. S.
Lee
,
W.
Liang
,
I.
Lotan
,
N.
Nair
,
B.
Peters
,
E. I.
Proynov
,
P. A.
Pieniazek
,
Y. M.
Rhee
,
J.
Ritchie
,
E.
Rosta
,
C. D.
Sherrill
,
A. C.
Simmonett
,
J. E.
Subotnik
,
H. L.
Woodcock
III
,
W.
Zhang
,
A. T.
Bell
,
A. K.
Chakraborty
,
D. M.
Chipman
,
F. J.
Keil
,
A.
Warshel
,
W. J.
Hehre
,
H. F.
Schaefer
III
,
J.
Kong
,
A. I.
Krylov
,
P. M. W.
Gill
, and
M.
Head-Gordon
, “
Advances in methods and algorithms in a modern quantum chemistry program package
,”
Phys. Chem. Chem. Phys.
8
,
3172
(
2006
).
55.
See www.petachem.com for more information about the TeraChem quantum chemistry software, accessed 1 February 2016.
56.
J. M.
Turney
,
A. C.
Simmonett
,
R. M.
Parrish
,
E. G.
Hohenstein
,
F. A.
Evangelista
,
J. T.
Fermann
,
B. J.
Mintz
,
L. A.
Burns
,
J. J.
Wilke
,
M. L.
Abrams
,
N. J.
Russ
,
M. L.
Leininger
,
C. L.
Janssen
,
E. T.
Seidl
,
W. D.
Allen
,
H. F.
Schaefer
,
R. A.
King
,
E. F.
Valeev
,
C. D.
Sherrill
, and
T. D.
Crawford
, “
PSI4: An open-source ab initio electronic structure program
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
2
,
556
(
2012
).
57.
R. M.
Parrish
,
C. D.
Sherrill
,
E. G.
Hohenstein
,
S. I. L.
Kokkila
, and
T. J.
Martinez
, “
Communication: Acceleration of coupled cluster singles and doubles via orbital-weighted least-squares tensor hypercontraction
,”
J. Chem. Phys.
140
,
181102
(
2014
).
58.
See supplementary material at http://dx.doi.org/10.1063/1.4948438 for additional test results with other types of molecules regarding the accuracy and performance of the method described in this paper.
59.
C.
Song
and
T. J.
Martinez
, “
Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction
” (unpublished).

Supplementary Material