Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and ab initio molecular dynamics (AIMD) simulations. The resulting energy conservation in micro-canonical AIMD demonstrates that the implementation provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.

Ab initio molecular dynamics (AIMD) is an important tool to understand chemical processes. However, accurate ab initio molecular dynamics simulations are usually limited by the computational cost of evaluating the energy and the analytical gradients with electronic structure theories. The fourth-order electronic repulsion integral (ERI) tensor is the major source of complexity for most electronic structure methods. One of the simplest ways to describe dynamic correlation effects beyond the Hartree-Fock approximation is second-order Møller-Plesset perturbation theory (MP2).1 MP2 is known to be good at describing hydrogen bonding and long-range dispersion interactions2 and provides an accurate description of conformational energies of small peptides.3 Therefore, MP2 has been used with geometry optimization in search of water cluster configurations4 and for protein force field parametrization.5 A recent Monte Carlo simulation study on bulk liquid water shows that the water density obtained with MP2 is in better agreement with experiments compared with density functional theory (DFT) without dispersion corrections.6 Despite the advantages of MP2, a bottleneck to widespread use of MP2, in its conventional form, is its quintic computational scaling with increasing system size. This scaling is due to the required transformation of the fourth-order ERI tensor from the atomic orbital (AO) basis to the molecular orbital (MO) basis.

Analytic energy gradients for MP2 were first derived and implemented by Pople et al.7 and many advances have been made since then. To efficiently treat the ERI contribution to the gradient, Rice and Amos pointed out that storing and transforming the derivative of ERI can be avoided by formulating the gradient in terms of the two-particle density matrix.8 Handy et al.9 and Fitzgerald et al.10 both showed that the Handy-Schaefer Z vector method11 is applicable to the MP2 gradient and requires solving only a single coupled perturbed Hartree-Fock (CPHF) equation in order to account for the orbital response contribution. These two techniques were both used in subsequent direct and semi-direct MP2 gradient implementations.12–15 Several alternative derivations have been reported that provide insight into the structure of the orbital response contributions,16,17 such as the Lagrangian method18,19 and the density approach.20,21 MP2 gradient theories have also been developed for open shell systems,22–24 extended systems,25 and non-canonical molecular orbital formulations.26–30 More recently, the analytic gradients for methods that combine MP2 with other levels of theories have also been developed. A few examples are the analytic gradients for MP2 combined with density functional theory,31 MP2 with explicit correlation (MP2-R12),32 MP2 with implicit solvation through the polarizable continuum model,33,34 and MP2 in combination with hybrid quantum mechanics/molecular mechanics (QM/MM)35 or polarizable force fields.36 

There have been many efforts to reduce the computational cost of evaluating MP2 energies and gradients. One approach factorizes the high order ERI tensor as a product of several low-order tensors. Two such examples with implemented analytic gradients are the density-fitting (DF) approximation (also called resolution-of-identity or RI)37–39 and Cholesky decomposition (CD).40,41 However, DF and CD factorizations are unable to efficiently factorize the exchange-like correlation term, and thus do not affect the quintic scaling. A second type of approach makes use of locality. One such example is the MP2 method based on fragment molecular orbital (FMO),42 which has been applied in geometry optimization43 and ab initio molecular dynamics simulations.44 Another example is local MP2 methods.45,46 A third type of approach is represented by the scaled opposite spin MP2 (SOS-MP2) method, which approximates the exchange-like term by scaling the Coulomb-like term, thereby achieving quartic scaling for both energy47 and gradients.48,49 Finally, one can accelerate the MP2 gradient through parallelization and such studies have been reported for canonical MP2,50–52 unrestricted MP2,53 and RI-MP2 with Gaussian orbitals54 or hybrid Gaussian/plane waves approaches.55 

Unlike most previous methods that require third-order tensors in the factorization of the ERI tensor, tensor hyper-contraction (THC)56,57 is able to factorize the ERI tensor using only second-order tensors. With the THC factorization, the scaling of calculating MP2,56 MP3,56 second-order approximate coupled cluster (CC2),58 and equation-of-motion CC2 (EOM-CC2)59 energies have all been successfully reduced to O(N4). Previous work has primarily focused on using THC to accelerate correlation energy calculations. In this work, we focus on the development of analytic gradients for THC in order to enable its application to geometry optimization and AIMD simulations. We show how to evaluate analytic derivatives for the THC tensors and further provide complete expressions for analytic energy gradients of THC-MP2 and THC-SOS-MP2.

The structure of this paper is as follows. We start by discussing how the THC factorization reduces MP2/SOS-MP2 gradient evaluations to quartic/cubic scaling, respectively. Next, we discuss the analytic gradients for the THC tensors under a variety of THC approximations. A few considerations regarding the graphics processing unit (GPU) implementations will also be discussed. Finally, we present results demonstrating the performance of the methods and show applications of the methods to both geometry optimization and AIMD simulations. The correctness of the methods and the implementations will be demonstrated by considering energy conservation in microcanonical AIMD simulations.

As there are many high-order tensors involved in the methods described in this paper, we first summarize our convention for indices as follows:

  • μ,ν,λ,σ,γ: Atomic Orbital (AO) basis function indices. These are atom-centered contracted Gaussian functions. The indices range from 1…Nμ, where Nμ is proportional to system size N.

  • A,B,C,D: Auxiliary basis indices for density fitting (DF) approximation. These are also atom-centered Gaussian functions. The indices range from 1…NA, where NAN. For satisfactory accuracy, NA is usually three to four times larger than Nμ.

  • P,Q,R,S,P,Q: Auxiliary basis indices for THC approximation, which in this work represent gridpoints in physical space. These indices range from 1NP, where NPN. For satisfactory accuracy, NP is usually eight to ten times larger than Nμ.

  • i,j,k: Occupied MO basis indices ranging from 1…Nocc.

  • a,b,c: Virtual MO basis indices ranging from 1…Nvirt. Usually Nvirt is significantly larger than Nocc.

  • κ: Laplace quadrature point indices. The range of these indices is independent of N, and 7-10 quadrature points are usually sufficient.

In the THC framework, the atomic orbital (AO) based ERI tensor is factorized as

μν|λσTHCPQXμPXνPZPQXλQXσQ,
(1)

where the X tensor in this work is constructed from the values of the atomic orbitals at physical space gridpoints,

XμP=wP4ϕμrP,
(2)

and the Z tensor is often computed by least-squares (LS) as defined below and elsewhere.56–63 Other choices for the X tensor are possible, including ones that have no relationship to physical space grids.56 Most of the formalism presented here carries over to these choices, provided the derivative of X with respect to molecular geometry can be evaluated. The THC factorization of ERIs in the molecular orbital (MO) basis is obtained by keeping the Z tensor unchanged while transforming the X tensor from the AO basis to the MO basis,

XpP=μCpμXμP,
(3)

where C represents the molecular orbital coefficients and p indexes molecular orbitals.

When the X tensor is defined as in Eq. (2) and the corresponding AO basis set is localized (as chosen in this work), the X tensor will be inherently sparse. This sparsity can be utilized to accelerate tensor operations involving X. We focus in this work on least-squares based definitions of Z,57 although other choices exist.56,63 Within the least-squares approximation, there remains freedom as to which error metric is used to define Z. For example, one can minimize the errors in the ERIs in the MO57 or AO61,62 basis or one can employ a weighted least-squares procedure to minimize the errors for a subset of the MO ERIs.64 In Secs. II A and II B, we first outline the general THC-MP2 gradient theory that is applicable for all THC methods that factorize the ERIs in the form given in Eq. (1). We discuss the evaluation of gradients for various choices of the Z tensor in Sec. II C.

For simplicity, we give formulas for MP2 and SOS-MP2 gradients in the case of a restricted, closed-shell HF reference. In order to treat both MP2 and SOS-MP2 simultaneously, we define

μν||λσTHC=2μν|λσTHCμσ|λνTHC,MP2,csosμν|λσTHC,SOSMP2,
(4)

where csos = 1.3 is used in our implementation as recommended previously.47 Similarly, we define μν||λσ=2μν|λσμσ|λν as the exact ERI used in the preceding HF calculation (assuming that THC is only used for the MP2 and/or SOS-MP2 calculation). The THC-MP2 correlation energy is given as

EMP2=ijabΔijabia|jbTHCia||jbTHC,
(5)

where we use a Laplace transform and quadrature65,66 for the energy denominator,

Δijab=1εa+εbεiεj=0eεa+εbεiεjtdtκτiκτjκτaκτbκ.
(6)

Previous work56,57,60–62 has shown that within the THC approximation, the Coulomb-like term in Eq. (5) can be evaluated with O(N3) operations, and the exchange-like term (not needed in SOS-MP2) can be evaluated with O(N4) operations. This should be compared with conventional formulations of MP2 and SOS-MP2, which require O(N5) operations. For brevity, we describe the THC factorization details of Coulomb- and exchange-like terms in  Appendixes A and  B, respectively. These have been discussed previously but are reproduced here in order to make the article self-contained (and many of these operations are closely related to the factorizations needed in the analytic gradient).

The analytic gradient of the THC MP2 ground state energy is given as12 

E(ξ)=EHF(ξ)+EMP2(ξ)=μνPμνHF+Pμν(2)hμν(ξ)+μνWμνHF+Wμν(2)Sμν(ξ)+μνλσ12PμνHF+Pμν(2)PλσHFμν||λσ(ξ)+μνλσΓ^μνλσMP2μν|λσTHC(ξ),
(7)

where ξ represents the nuclear degree of freedom involved in the derivative, hμν is the core Hamiltonian matrix element (electronic kinetic energy and electron-nuclear attraction), Sμν is the overlap integral, PHF/WHF are the HF/energy-weighted density matrices, P(2) and W(2) are the second order corrections to the density and energy-weighted density matrices, and Γ^μνλσMP2 is the MP2 two-electron density matrix. Gradient contributions from P(2) and W(2) in Eq. (7) describe how the MP2 energy responds to changes in the orbital coefficients and orbital energies. The last term in Eq. (7) describes how the MP2 energy responds to changes in the THC-approximated ERI. We will focus our discussion on the terms that need to be modified in THC-based MP2/SOS-MP2 relative to a conventional implementation of the MP2/SOS-MP2 gradient. This boils down to the second-order corrections to the density and energy-weighted density matrices, i.e., P(2) and W(2), and the last term in Eq. (7) where the THC ERI appears. Other terms are treated in an analogous manner to conventional MP2 gradient theory.

As in density fitted MP2 (DF-MP2), there are two types of ERIs involved in the ground state energy—those used in the preceding HF calculation (assumed to be exact) and the approximate ERIs used in the calculation of the MP2 energy. Because of this, some blocks of P(2) and W(2) contain contributions from both the THC-approximated ERI μν|λσTHC and also the exact ERI μν|λσ. In the MO basis, the occupied-occupied and virtual-virtual blocks in P(2) are defined as

Pik2=2jabΔijabΔkjabia|jbTHCka||jbTHC,
(8)
Pac2=2ijbΔijabΔijcbai|bjTHCci||bjTHC.
(9)

The occupied-virtual block of P(2) is obtained by solving the coupled-perturbed HF equations11 as

aiMck,aiPai2=Lck2,
(10)

where

Mck,ai=δacδikεaεi+4kc|iaka|icki|ca,
(11)
Lck2=2jabΔkjabca|jbTHCka||jbTHC+2ijcΔijcbci|bjTHCki||bjTHC+bcck||abPab2+jkck||ijPij2.
(12)

The definition of W(2) in the MO-basis is

Wik2=2jabΔijabia|jbTHCka||jbTHCεiPik2pqik||pqPpq2,
(13)
Wac2=2kjcΔijabai|bjTHCci||bjTHCεaPac2,
(14)
Wck2=2jabΔkjabca|jbTHCka||jbTHCεkPck2.
(15)

Equations (8) and (9) contain products of two energy denominators, which would normally imply two separate Laplace transform quadratures. Here, we introduce a single quadrature to describe the product. This implies only a single loop over the Laplace transform quadrature points. For ΔijacΔkjac in Eq. (8), when i is not equal to k,

ΔijabΔkjab=1εiεk1εa+εbεiεj1εa+εbεkεjκτiκτkκεiεkτjκτaκτbκ.
(16)

If i is equal to k,

1εa+εbεiεj2=0tetεa+εbεiεjdtκtκτaκτbκτiκτjκ.
(17)

For the degenerate case when i is equal to k, previous works48,49 have obtained a similar expression to Eq. (17). Combining Eqs. (16) and (17), we get

ΔijabΔkjab=κωikκτjκτaκτbκ,
(18)

where

ωikκ=τiκτkκεiεk,ik,tκτiκ,i=k.
(19)

Similarly, ΔijabΔijcb in Eq. (9) can be evaluated as

ΔijabΔijcb=κω¯acκτbκτiκτjκ,
(20)

where

ω¯acκ=τaκτcκεaεc,ac,tκτaκ,a=c.
(21)

In this work, we use the same quadrature nodes and weights for evaluating denominator products as shown in Eqs. (16) and (17) and single denominators as shown in Eq. (6). These nodes and weights are chosen following the work of Takatsuka et al.67,68 Empirical tests of the accuracy of this procedure are documented in Sec. VIII of the supplementary material. In principle, the optimal quadrature nodes and weights will be different for single denominators and denominator products, and there may be ways to increase the accuracy by searching for a quadrature that can treat both cases. However, we leave this to future work, as the tests below show that the present approach is sufficiently accurate.

Equations (18) and (20) factorize the energy denominator product in a similar way as the energy denominator in Eq. (6). This allows us to efficiently evaluate Eqs. (8)–(15) by introducing the following common intermediate tensors:

ρμνκ=jabτjκτaκτbκμa|jbTHCνa||jbTHC,
(22)
ρ¯μνκ=ijbτbκτiκτjκμi|bjTHCνi||bjTHC.
(23)

We call these the OVV [occupied-virtual-virtual, Eq. (22)] and OOV [occupied-occupied-virtual, Eq. (23)] intermediate density matrices. The density matrix P(2), energy-weighted density matrix W(2), and the Lagrangian L can then be computed from ρμνκ and ρ¯μνκ as

Pik2=2κμνωikκCiμCkνρμνκ,
(24)
Pac2=2κμνω¯acκCaμCcνρ¯μνκ,
(25)
Lck2=2κμντkκCcμCkνρμνκ+2κμντcvCcμCkνρ¯μνκ+abck||abPab2+ijck||ijPij2,
(26)
Wik2=2κμντiκCiμCkνρμνκεiPik2pqik||pqPpq2,
(27)
Wac2=2κμντaκCaμCcνρ¯μνκεaPac2,
(28)
Wck2=2κμντcκCcμCkνρμνκεkPck2.
(29)

The operations in Eqs. (24)–(29) are dominated by AO to MO transformations of ρμνκ and ρ¯μνκ, scaling as O(N3). Additionally, there are Fock matrix evaluations in Eqs. (26) and (27), which in practice usually scale as O(N2) after screening with the Schwarz bound.69 The calculation of ρμνκ and ρ¯μνκ will be discussed in detail below (Sec. II B). These terms require O(N4) operations for THC-MP2 and O(N3) operations for THC-SOS-MP2.

The last term in Eq. (7) represents gradient contributions corresponding to geometry dependence of the THC tensors, where the nonseparable part of the MP2 two-electron density matrix is defined as

Γ^μνλσMP2=2ijabΔijabia||jbTHCCiμCaνCjλCbσ,
(30)

The last term in Eq. (7) can be further broken down in terms of the X and Z tensors as

μνλσΓ^μνλσMP2μν|λσTHCξ=PμDμPXμP,ξ+PQΓPQZPQ,ξ,
(31)

where

DμP=DPμocc+DPμvir=4ijabQΔijabia||jbTHCXaPZPQXjQXbQCiμ4ijabQΔijabia||jbTHCXiPZPQXjQXbQCaμ
(32)

and

ΓPQ=2iajbΔijabia||jbTHCXiPXaPXjQXbQ.
(33)

In Sec. II B, we discuss the implementation details for the computation of DμP and ΓPQ, showing that this can be accomplished with O(N4)/O(N3) operations for THC-MP2/THC-SOS-MP2. In Sec. II C, we show how analytic gradients for the X and Z tensors in Eq. (31) can be evaluated once DμP and ΓPQ have been computed.

Given the density matrices DμP, the MP2 energy can be directly calculated as

EMP2=18μPDμPXμP,ON.
(34)

Therefore, in MP2 gradient calculations, one does not need to use the equations shown in  Appendixes A and  B in order to evaluate the MP2 energy. Rather, the MP2 energy is efficiently computed as a byproduct of the MP2 gradient computation.

In this section, we will discuss how the THC factorization is applied to reduce the computational scaling associated with the four density matrices needed for gradient evaluation: ρμνκ, ρ¯μνκ, DμP, and ΓPQ. The expressions for these density matrices are closely related to the MP2 energy expression, which allows most intermediate tensor operations in the energy implementation to be reused for gradient calculations. Similar to the MP2 energy expression, the density matrix ρμνκ can be decomposed into a Coulomb-like contribution ρμνκ,J and an exchange-like contribution ρμνκ,K as ρμνκ=cJρμνκ,J+cKρμνκ,K, where

cJ=2.0,MP2csos,SOSMP2,cK=1.0,MP20.0,SOSMP2.
(35)

A similar decomposition into Coulomb-like and exchange-like terms also applies to the other three density matrices ρ¯μνκ, DμP, and ΓPQ. Diagrammatic representations illustrating the required tensor operations are provided for both Coulomb-like (Fig. 1) and exchange-like (Fig. 2) terms. In the following, we treat each of these density matrices in turn.

FIG. 1.

Diagrammatic representation of tensor operations in Coulomb-like calculations. Solid lines represent contracting the connected tensors by the index that is in color, and the contraction leads to the new tensor in the circle that the arrow is pointing to. Dashed lines represent element-wise multiplications between the tensors being connected, leading to a new tensor with the same number of indices. Tensors that are shared with previous diagrams are shaded in gray.

FIG. 1.

Diagrammatic representation of tensor operations in Coulomb-like calculations. Solid lines represent contracting the connected tensors by the index that is in color, and the contraction leads to the new tensor in the circle that the arrow is pointing to. Dashed lines represent element-wise multiplications between the tensors being connected, leading to a new tensor with the same number of indices. Tensors that are shared with previous diagrams are shaded in gray.

Close modal
FIG. 2.

Diagrammatic representation of tensor operations in the calculation of exchange-like terms. Notation is consistent with Fig. 1.

FIG. 2.

Diagrammatic representation of tensor operations in the calculation of exchange-like terms. Notation is consistent with Fig. 1.

Close modal

1. OVV intermediate density matrix ρμνκ

As shown in Fig. 1(a), the Laplace quadrature points are usually first contracted with the molecular orbital coefficients to form two transformation matrices, i.e., Tκ,(occ) defined in Eq. (A2) and Tκ,(vir) defined in Eq. (A3). Contracting the X tensor with Tκ,(occ) and Tκ,(vir) leads to the Y tensor defined in Eq. (A4) and the U tensor defined in Eq. (A5), respectively. The Coulomb-like contribution in ρμνκ can then be formulated as

ρμνκ,J=jabτjκτaκτbκμa|jbTHCνa|jbTHC=PQRSγλσXμPXνRXγPUγκ,RZRSYλκ,SXλQUσκ,SXσQZQP.
(36)

As shown in Fig. 1(c), ρμνκ,J requires one fewer contraction with Tκ,(occ) compared to the Coulomb-like energy calculation in Fig. 1(a). The expression in parentheses is also contained in the Coulomb-like energy expression in Eq. (A6) and corresponds to the F tensor defined in Eq. (A10). This allows Eq. (36) to be computed with O(N3) operations as

gκ,PR=SZRSFκ,SP,ON3,
(37)
hκ,PR=Vκ,PRgκ,PR,ON2,
(38)
ρμνκ,J=PRXμPhκ,PRXνR,ON2,
(39)

where the V tensor in Eq. (38) is defined in Eq. (A8). The calculation of ρμνκ is slightly more expensive than the Coulomb-like energy calculation, as it requires an additional dense matrix multiplication shown as in Eq. (37).

Using the Λ tensor defined in Eq. (B2) and the transformation matrix Tκ,(vir), the exchange-like contribution in ρμνκ can be expressed as

ρμνκ,K=jabτjκτaκτbκμa|jbTHCνb|jaTHC=PRQSjσσγγXμPXνRZPQΛjκ,QXσQTσσκ,virXσRZRSΛjκ,SXγSTγγκ,virXγP.
(40)

The calculation of Eq. (40) is also closely related to the exchange-like energy expression in Eq. (B4) and requires one fewer contraction with Tκ,(occ) as can be seen in the comparison of Figs. 2(a) and 2(c). The expression in parentheses is shared between the two calculations and corresponds to the H tensor defined in Eq. (B8). Thus we have

ρμνκ,K=PRXμPHκ,PRXνR,ON2.
(41)

This indicates that the calculation of ρμνκ,K also scales as O(N4) and has almost the same computational cost as the MP2 exchange-like energy.

2. OOV intermediate density matrix ρ¯μνκ

As shown in Eqs. (22) and (23), ρμνκ and ρ¯μνκ have similar expressions, differing by the interchange of occupied and virtual indices. This allows them to be computed in similar ways.

For the Coulomb-like contribution in ρ¯μνκ, we have

ρ¯μνκ,J=ijbτiκτjκτbκμi|bjTHCνi|bjTHC=PQRSγλσXμPXνRXγPYγκ,RZRSYλκ,SXλQUσκ,SXσQZQP.
(42)

As illustrated in Fig. 1(e), using the g tensor defined in Eq. (37) and the O tensor defined in Eq. (A7), it can be evaluated as

h¯κ,PR=Oκ,PRgκ,PR,ON2,
(43)
ρ¯μνκ,J=PRXμPh¯κ,PRXνR,ON2.
(44)

For the exchange-like contribution in ρ¯μνκ, using the Λ¯ tensor defined in Eq. (B3) and the transformation matrix Tκ,(occ), we have

ρ¯μνκ,K=ijbτiκτjκτbκμi|bjTHCνi||bjTHC=PRQSbσσγγXμPXνRZPQΛ¯bκ,QXσQTσσκ,occXσRZRSΛbκ,SXγSTγγκ,occXγP.
(45)

The expression in Eq. (45) does not share any intermediate tensors with the exchange-like energy term. Nevertheless, as shown in Fig. 2(e), we can compute the following tensor by following a similar procedure as Eqs. (89)–(91):

G¯bκ,RP=SγγZRSΛ¯bκ,SXγSTγγκ,occXγP.
(46)

This requires O(N4) operations, whereupon ρ¯μνκ,K can be computed as

H¯κ,PR=bG¯bκ,PRG¯bκ,RP,ON3,
(47)
ρ¯μνκ,K=PRXμPH¯κ,PRXνR,ON2.
(48)

However, as the number of virtual orbitals is usually greater than the number of occupied orbitals, computing G¯ is usually more expensive than computing G in Eq. (B7). We discuss this in more detail in Sec. III.

3. X tensor density matrix DμP

As shown in Eq. (32) of Sec. II A, since the X tensor is contracted with the occupied and virtual molecular orbital coefficients separately, the density matrix can also be divided into DPμocc and DPμvir correspondingly. The Coulomb-like contribution in the density matrix of the X tensor from the occupied orbitals is

DPμocc,J=τiκτjκτaκτbκia|jbTHCXaPZPQXjQXbQCiμ=κPQRSγλσYμRXγPUγκ,RZRSYλκ,SXλQUσκ,SXσQZQP.
(49)

As can be seen from the comparison of Figs. 1(c) and 1(d), the expression in parentheses is shared when calculating DPμocc and ρμνκ,J, which is contracted into the intermediate tensor h defined in Eq. (38). Therefore, this density matrix can be computed as

DPμocc,J=κRhκ,PRYμκ,R,ON3.
(50)

The Coulomb-like contribution in the density matrix of the X tensor from virtual orbitals is

DPμvir,J=τiκτjκτaκτbκia|jbTHCXiPZPQXjQXbQCaμ=κPQRSγλσUμRXγPYγκ,RZRSYλκ,SXλQUσκ,SXσQZQP.
(51)

As shown in Figs. 1(f) and 1(e), DPμvir,J depends on the intermediate tensor h¯ used to construct ρ¯μνκ,J, as shown in Eq. (43). Reusing this intermediate,

DPμvir,J=κRh¯κ,PRUμκ,R,ON3.
(52)

Similarly, for the exchange-like term in the density matrix of the X tensor, DPμocc,K in Fig. 2(d) and DPμvir,K in Fig. 2(f) can be computed as

DPμocc,K=κRHκ,PRYμκ,R,ON3,
(53)
DPμvir,K=κRH¯κ,PRUμκ,R,ON3,
(54)

where H and H¯, respectively, defined in Eqs. (B8) and (47), are intermediate tensors shared with ρμνκ,K and ρ¯μνκ,K. From the above discussion, we see that calculation of the X tensor density matrices requires only four additional matrix multiplications after ρμνκ and ρ¯μνκ have been calculated.

4. Z tensor density matrix ΓPQ

The Coulomb-like term in the density matrix of the Z tensor is

ΓPQ,J=κiajbτiκτjκτaκτbκia|jbTHCXiPXbPXjQXbQ=κ,μνλσRSXμPYμκ,RXγPUγκ,RZRSYλκ,SXλQUσκ,SXσQ.
(55)

As shown in Fig. 1(b), it requires one fewer contraction with the Z tensor compared to the energy evaluations in Fig. 1(a). Thus using the A tensor defined in Eq. (A9) and the F tensor in Eq. (A10), both from the Coulomb-like MP2 energy calculation, we obtain

ΓPQ,J=κRFκ,PSAκ,SQ.
(56)

Therefore, the construction of ΓPQ,J also requires one additional dense matrix multiplication compared to the Coulomb-like energy calculation.

The exchange-like term in the density matrix of the Z tensor is

ΓPQ,K=κiajbτiκτjκτaκτbκib|jaTHCXiPXaPXjQXbQ=κRSμjσσγγXμPYμκ,RΛjκ,QXσQTσσκ,virXσRZRSΛjκ,SXγSTγγκ,virXγP.
(57)

The operations in parentheses are also contained in the exchange-like energy evaluations and correspond to the G tensor in Eq. (B7). However, as shown in Fig. 2(b) because it requires one fewer contraction with the Z tensor compared to the energy evaluation, the remaining tensor cannot be contracted into another G tensor as in Fig. 2(a). Therefore, building the exchange-like density matrix of Z tensor requires some additional quartic scaling operations as

fjκ,RP=Oκ,RPGjκ,RP,ON3,
(58)
vjσκ,Q=Λjκ,QXσQ,ON2,
(59)
ujκ,QR=σvjσκ,QUσR,ON3,
(60)
ΓPQ=κjRfjκ,RPujκ,QR,ON4.
(61)

The scaling we present in some of the above equations and in the appendices has made use of the spatial sparsity in the AO-based X tensor, i.e., the number of non-negligible grid points remains roughly constant for each atomic orbital. However, we emphasize that even without exploiting spatial sparsity, the formal scaling remains O(N4) for MP2 and O(N3) for SOS-MP2, since all tensor operations involve at most four indices.

Once the density matrices have been constructed, the last remaining question is how to evaluate the analytic gradients of the THC tensors, i.e., X and Z. The X tensor defined in Eq. (2) is simply the values of the atomic orbitals at the gridpoints. Therefore, its analytic gradient can be evaluated in a similar way as density functional theory, including gradient contributions from the changes of atomic centers on the gridpoints, atomic orbitals, and Becke gridpoint weights.70,71 The analytic gradient of the Z tensor is more complicated and will depend on which type of THC approximation is being used.

For least-squares AO-THC, the expression for the Z tensor is

ZPQ=PQμνλσABSphys1PPXμPXνPμν|AA|B1×B|λσXλQXσQSphys1QQ,
(62)

where the metric matrix in the physical space is defined as

SphysPP=μνXμPXνPXμPXνP.
(63)

The Z tensor in Eq. (62) contains two matrix inversions, i.e., the inversion of the density-fitting metric matrix (A|B) and the inversion of the physical space metric matrix Sphys. For a well-conditioned matrix, the derivative of its inverse can be computed as

ddxA1=A1dAdxA1.
(64)

However, Sphys is generally ill-conditioned, and its inverse matrix is computed as a pseudo-inverse by diagonalizing the matrix and neglecting small eigenvalues below a user-defined threshold (1.0 × 10−12 a.u. in our implementation). Because of the pseudo-inversion, Eq. (64) is not applicable. We currently solve this problem by regularizing the matrix as

Sphys1Sphys+δI1,
(65)

where I represents the identity matrix and δ is a small regularization constant added to the diagonal. When the regularization constant is large enough to make Sphys well-conditioned, Eq. (64) can again be used and we assume this below. We discuss the effect of different regularization values in Sec. III.

The THC tensors with Eqs. (2) and (62) contain three contributions that must be considered: the atomic orbital density at the grid points XμP, the two-center-two-electron Coulomb repulsion integral (A|B), and the three-center-two-electron Coulomb repulsion integral μν|A. Therefore, Eq. (31) can be further broken down as the gradient contributions from each type of integral

PμDμPXμP,ξ+PQΓPQZPQ,ξ=μνAdμνA1μν|Aξ+ABdAB2A|Bξ+PμdPμ3XμP,ξ,
(66)

where dμνA1, dAB2, and dPμ3 are corresponding density matrices, and details about their definition and evaluation are provided in Eqs. (C4), (C6), and (C10), respectively, in  Appendix C.

The major computational bottleneck when computing these density matrices is the last term AνcPAA|μνXνP in Eq. (C10). Theoretically it is possible to compute this term with O(N2) operations, but the calculation then requires storage of the third-order ERI tensor. To avoid this potential storage bottleneck, we numerically integrate the density fitting integral,

μν|A=P̃ωP̃ϕμrP̃ϕνrP̃ψAr|rP̃r|dr=P̃X̃μP̃X̃νP̃ṼP̃A,
(67)
X̃μP̃=ωP̃ϕμrP̃,
(68)
ṼP̃A=ψAr|rP̃r|dr.
(69)

We will use an over-tilde to denote all quantities related to the numerical integration. By using Eq. (67), we can factorize the three-index ERI tensor as the product of two-index tensors, leading to

ZPQ=PQP̃Q̃ABμνλσSphys1PPXμPXνPX̃μP̃X̃νP̃ṼP̃AA|B1×VBQ̃X̃λQ̃X̃σQ̃XλQXσQSphys1QQ.
(70)

There are two sets of gridpoints used in Eq. (70), i.e., the sparse gridpoints (denoted as P, P′, Q, Q′) used for the least-squares THC factorization and the dense grid points (denoted as P̃ and Q̃) used for numerical integration. The gridpoints used for numerical integration are described in Sec. I of the supplementary material. Normally, this grid is at least ten times larger than that used for the least-squares fitting. To make it clear that there exist two sets of gridpoints, we call this as dual-grid AO-THC. The quantities in Eqs. (68) and (69) are much simpler to evaluate compared with μν|A and can be efficiently implemented and parallelized on graphics processing units (GPUs). Therefore, our implementation computes these values and integrals on the fly when needed without storing them in memory. Despite the large number of gridpoints needed for accurate numerical integrations, the dual-grid scheme is advantageous in terms of performance, as discussed in further detail in Sec. III.

There are four types of quantities appearing in the Z tensor of dual-grid AO-THC in Eq. (70), i.e., the two-center-two-electron Coulomb integral (A|B), the Coulomb integral on the grid ṼP̃A, XμP on the sparse grid, and X̃μP̃ on the dense grids. Therefore, the gradient contributions can be broken down as

PμDμPXμP,ξ+PQΓPQZPQ,ξ=ABd̃AB1A|Cξ+P̃Ad̃P̃A2ṼP̃Aξ+P̃μd̃P̃μ3X̃μP̃+Pμd̃Pμ4XμP,
(71)

where the density matrices of integrals d̃AB1, d̃P̃A2, d̃P̃μ3, and d̃Pμ4 are provided in Eqs. (D4), (D5), (D7), and (D9), respectively, in  Appendix D. As shown in  Appendix D, the calculations of these density matrices only require tensor operations between two-index tensors such as matrix multiplications and thus scale formally as O(N3).

For least-squares MO-THC, the Z tensor is computed as

ZPQ=PQpqrsABSphys1PPXpPXqPpq|AA|B1×B|rsXrQXsQSphys1QQ,
(72)

where p, q, r, s are molecular orbital indices, and

SphysPP=pqXpPXqPXpPXqP
(73)

Therefore, the analytic gradient of the Z tensor in Eq. (72) can be broken down into contributions from XμP,ξ, A|Bξ, μν|Aξ, and molecular orbital coefficients Cpμξ. Evaluating the analytic derivative of the Z tensor will involve contributions from the response to changes of the molecular orbital coefficients. This necessitates further solution of coupled-perturbed Hartree-Fock equations. Although conceptually simple, the algebra is tedious and we leave the implementation of analytic gradients for least-squares MO-THC and dual-grid MO-THC to future work.

Our implementation uses graphical processing units (GPUs) to accelerate three parts of the analytic gradient calculation, as described below. First, all integrals as well as their corresponding analytic gradients are evaluated on GPUs. In our previous work,61,62 we have discussed in detail how to exploit the spatial sparsity of the atomic orbitals during integral evaluations on the GPU. We arrange the data by dividing the space surrounding the molecule into boxes and then populating the gridpoints, Gaussian primitive functions, and Gaussian primitive pairs into these boxes based on the locations of their centers. These techniques are adopted in the implementation for the analytic gradients of the integrals.

Second, our previous performance tests showed that the matrix diagonalization [which was performed on the central processing unit (CPU)] needed in the pseudo-inverse of the physical space metric matrix (Sphys) became a computational bottleneck for larger systems. To reduce the computational cost of this step, we now use the Magma library,72 a dense linear algebra library for GPUs, to diagonalize matrices with dimension larger than 5000.

Finally, the cuBLAS library73 is used to accelerate matrix multiplication (dgemm) on GPUs, including both matrix multiplications between second-order tensors and also high-order tensor operations that can be formulated as dgemm. For example, Eq. (61) can be evaluated with dgemm by treating R and j as a single compound index. When the tensors cannot fit into the limited GPU memory (around 3-12 Gb depending on the particular GPU being used), the tensors are divided into sub-tensors such that dgemm between subtensors can be computed on the GPUs.

All tensor operations that cannot be formulated as dgemm are computed on the CPU. Even though CPUs usually have much larger memory than GPUs, in some cases, storing high-order tensors can still become difficult for large molecules. An example is the exchange-like energy shown in  Appendix B. When the sizes of the tensors exceed the memory limit, our current strategy is to divide the tensor over the occupied index j and loop through Eqs. (89)–(92) for each subset of j. This is not the only way to address the memory limit, and systematic optimization for high-order tensor operations is certainly an avenue for further research.

The method described above was implemented in the TeraChem74–77 quantum chemistry package (which currently supports atom-centered spd Gaussian basis sets and spdf Gaussian auxiliary basis sets). Density fitted MP2 and SOS-MP2 calculations were performed with QCHEM.78 All calculations were performed with the cc-pVDZ basis set79 and without the frozen core approximation. The THC grids optimized for THC-MP2 with cc-pVDZ basis sets60 were used for both the least-squares AO-THC (LS-AO-THC) and dual grid AO-THC (DG-AO-THC). Because the THC approximation is applied in the AO basis for all calculations in this paper, we will omit “AO” in the acronyms hereafter for simplicity. These THC grids contain around 60 gridpoints per hydrogen atom and around 120 gridpoints per each carbon, nitrogen, or oxygen atom. The grids for numerical integration in DG-THC are provided in Sec. I of the supplementary material, and these are composed of around 1200–1300 gridpoints for each atom. All timing data are collected using one CPU core and one GPU unless otherwise specified.

To ensure that THC-MP2 is valid for AIMD simulations, we start by testing its accuracy in describing potential energy surfaces. Using alanine dipeptide as an example, we computed MP2 correlation energies with both density fitting and THC on 576 different geometries generated by scanning over the φ and ψ angles. As shown in Fig. 3, LS-THC-MP2 provides a very accurate description of the potential energy surface, where the root mean square error (RMSE) with respect to the DF-MP2 is only 0.013 kcal/mol (ranging from −0.046 kcal/mol to 0.032 kcal/mol). By replacing analytical evaluations of the density fitting integrals with numerical integrations, the RMSE in the relative energies computed with DG-THC-MP2 becomes 0.050 kcal/mol (ranging from −0.182 kcal/mol to 0.146 kcal/mol). The errors introduced by numerical integration can always be decreased by increasing the number of gridpoints used for the numerical integration, as in density functional theory. When a regularization parameter of δ = 1.0 × 10−5 a.u. is applied, denoted as DG-THC-MP2 (δ = 1.0 × 10−5 a.u.), the errors in the relative energy increase (as expected), but the RMSE is still acceptable at 0.068 kcal/mol (ranging from −0.233 kcal/mol to 0.195 kcal/mol). We also computed the relative thermal probabilities of conformations at room temperature (298 K) from the relative total MP2 energies. As shown in Fig. S1 of the supplementary material, the overall impact of the energy errors on the probability density at 298 K is minor. Most of the probability density is concentrated in the (−75, +90) region of the plot, and the DG-THC-MP2 (δ = 1.0 × 10−5) distribution only varies by a few percent in this region. We have also tested DG-THC-MP2 (δ = 1.0 × 10−5 a.u.) on numerous geometric configurations of (H2O)6 water clusters. The maximum error in relative energies is less than 0.06 kcal/mol, as documented in Fig. S2 of the supplementary material. These results indicate that DG-THC-MP2 with regularization can provide an accurate description of the potential energy surface.

FIG. 3.

MP2 correlation energies with density fitting, least-squares THC (LS-THC), dual-grid THC (DG-THC), and dual-grid THC with regularization (DG-THC with δ = 1.0 × 10−5 a.u.). Density fitting MP2 calculations were performed with QCHEM. All THC calculations were performed with TERACHEM. The upper panels show the relative correlation energies computed on 576 different alanine dipeptides geometries generated by scanning over the φ and ψ angles, as indicated on the lower left. The zero of energy for each method is chosen as the average energy across all conformations at that level of theory. Details about the coordinates along with the relative energy data are provided in the supplementary material. The lower panels show the errors in the relative correlation energy for THC methods compared to DF-MP2.

FIG. 3.

MP2 correlation energies with density fitting, least-squares THC (LS-THC), dual-grid THC (DG-THC), and dual-grid THC with regularization (DG-THC with δ = 1.0 × 10−5 a.u.). Density fitting MP2 calculations were performed with QCHEM. All THC calculations were performed with TERACHEM. The upper panels show the relative correlation energies computed on 576 different alanine dipeptides geometries generated by scanning over the φ and ψ angles, as indicated on the lower left. The zero of energy for each method is chosen as the average energy across all conformations at that level of theory. Details about the coordinates along with the relative energy data are provided in the supplementary material. The lower panels show the errors in the relative correlation energy for THC methods compared to DF-MP2.

Close modal

As we have discussed in Sec. II, DG-THC-MP2 was introduced to factorize the three-index ERI as the product of two index tensors, such that storing the three-index tensor can be avoided when computing the density matrices required for the analytic derivative. It is also interesting to investigate how the insertion of numerical integration benefits the performance of energy calculations. Integral evaluations on the GPUs usually benefit from the fact that GPUs are capable of launching thousands of threads simultaneously but are restricted by the limited register resources for each GPU thread. Therefore, compared with LS-THC that requires the more complicated three-center integrals, the DG-THC scheme has clear advantages on the GPUs because the required integrals are much simpler, as shown in Eq. (69), and can be easily parallelized over GPU threads. In addition, as we have mentioned, our implementation now uses the MAGMA library so that diagonalization of the metric matrix is also accelerated by the GPUs. As shown in Fig. S4 of the supplementary material, at least four times speedup is achieved in SOS-MP2 energy calculations by introducing DG-THC and the Magma library.

In Fig. 4, we show the performance of energy and gradient calculations with both THC-SOS-MP2 and THC-MP2. The THC-SOS-MP2 energy and gradient evaluations both exhibit sub-cubic scaling for the test systems (water clusters with up to 900 basis functions), and THC-MP2 energy and gradients both show sub-quartic scaling. In Fig. S5 of the supplementary material, we also compare the scaling of the gradient calculations with THC-SOS-MP2 and DF-SOS-MP2. These results indicate that the THC factorization can effectively reduce the scaling in both energy and gradient calculations. In addition, because computations on different Laplace quadrature points are independent, the method is well suited for parallelization. Our current implementation supports multi-threading parallelism within a node, where each thread consists of one CPU core and one GPU. In Fig. S7 of the supplementary material, we present the parallel efficiency of the current implementation. The speedup for the test system is close to linear with respect to the number of GPUs and reaches 6.7× speedup with 8 GPUs. Parallelization across multiple nodes will be studied in future work. In terms of the absolute timing, THC-SOS-MP2 and THC-MP2 have quite different behaviors. The gradient of THC-SOS-MP2 takes only about twice the time needed for THC-SOS-MP2 energy. Since THC-SOS-MP2 only requires the Coulomb-like term, its efficiency is as expected since the THC factorizations of the Coulomb-like density matrices all closely resemble the factorization of the Coulomb-like energy. In contrast, the gradient of THC-MP2 takes much longer than THC-MP2 energy. Compared to the THC factorization of the exchange-like energy, the exchange-like OVV density matrix ρμνκ in Eq. (40) has a nearly identical factorization and thus takes almost the same amount of time. The factorization of the OOV density matrix ρ¯μνκ is similar but interchanges the virtual and occupied orbital indices. This implies an increased computational cost by a factor of five, corresponding to the ratio of the number of virtual and occupied orbitals. The factorization of the density matrix of the Z tensor ΓPQ requires a few additional tensor operations, Eqs. (58)–(61), and therefore takes about twice the time needed for the energy evaluations. Combining all of these effects leads to a gradient which is almost eight times more costly than the energy in THC-MP2. Numerical timings for each of these components of the THC-MP2 gradient are provided in Fig. S6 of the supplementary material.

FIG. 4.

The scaling of the THC-SOS-MP2 energy (orange), THC-SOS-MP2 gradient (green), THC-MP2 energy (red), and THC-MP2 gradient (blue) calculations on water clusters of increasing size (taken from ice crystal structures). Dual-grid THC is used for all calculations. Details about the test systems are provided in the supplementary material. All calculations used the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. The geometry shown in the inset corresponds to the largest test cluster (H2O)36. All calculations were performed with TeraChem using one Intel Xeon X5570 CPU core and one GeForce GTX TITAN GPU.

FIG. 4.

The scaling of the THC-SOS-MP2 energy (orange), THC-SOS-MP2 gradient (green), THC-MP2 energy (red), and THC-MP2 gradient (blue) calculations on water clusters of increasing size (taken from ice crystal structures). Dual-grid THC is used for all calculations. Details about the test systems are provided in the supplementary material. All calculations used the cc-pVDZ basis set and the cc-pVDZ-RI auxiliary basis set. The geometry shown in the inset corresponds to the largest test cluster (H2O)36. All calculations were performed with TeraChem using one Intel Xeon X5570 CPU core and one GeForce GTX TITAN GPU.

Close modal

To investigate the effect of the regularization parameter on stability in AIMD simulations, we carried out several microcanonical AIMD simulations with a varying regularization parameter and observed the degree to which total energy was conserved. We first ran simulations on an alanine aldehyde molecule with DG-THC-SOS-MP2. Initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K, and a velocity Verlet integrator was used with a 0.2 fs time step. We select one representative trajectory for each regularization parameter and plot the kinetic, potential, and total energies as a function of time for 1 ps in Fig. 5. The cutoff of eigenvalues during pseudo-matrix inversion is set to 1.0 × 10−12 a.u. As shown in Fig. 5, a larger regularization parameter significantly improves the total energy conservation. For both δ = 1.0 × 10−7 a.u. and δ = 1.0 × 10−8 a.u., there are notable discontinuities appearing in the trajectories, and calculations in the AIMD simulations soon fail to converge. With δ = 1.0× 10−6 a.u., both the number and size of the discontinuities become much smaller. With δ = 1.0 × 10−5 a.u., the total energy conservation becomes acceptable. To verify that the methods and implementations of THC-MP2 gradients are also correct, we have also performed similar tests on an acetic acid molecule with DG-THC-MP2 under the same initial settings (1500 K and 0.2 fs time step). Fig. S8 of the supplementary material shows energies as a function of time for 2 ps. The trajectory with δ = 1.0 × 10−7 a.u. contains notable discontinuities. With regularization δ = 1.0 × 10−5 a.u., the energy conservation is significantly improved, and the total energy drift is 0.0164 (kcal/mol)/ps, corresponding to 6.8 × 10−4 kcal/mol per degree of freedom per ps.

FIG. 5.

Energy conservation tests for DG-THC-SOS-MP2 with different regularization parameters in microcanonical ab initio molecular dynamics simulation. Simulations were performed on one alanine aldehyde molecule with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. The initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K. The velocity Verlet integrator with a 0.2 fs time step is used. The linear fit reflects the drift in the total energy (in kcal/mol) as a function of time (in picoseconds). The inset geometry at the right corner represents the initial coordinates for all MD simulations, where hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

FIG. 5.

Energy conservation tests for DG-THC-SOS-MP2 with different regularization parameters in microcanonical ab initio molecular dynamics simulation. Simulations were performed on one alanine aldehyde molecule with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. The initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K. The velocity Verlet integrator with a 0.2 fs time step is used. The linear fit reflects the drift in the total energy (in kcal/mol) as a function of time (in picoseconds). The inset geometry at the right corner represents the initial coordinates for all MD simulations, where hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

Close modal

Given that trajectories with regularization δ = 1.0 × 10−5 a.u. are able to satisfy energy conservation requirements, we proceeded to test energy conservation using THC gradients for longer AIMD simulations. Figure 6 shows the results from AIMD simulations on one alanine aldehyde molecule with DG-THC-SOS-MP2 (δ = 1.0 × 10−5 a.u.) for 10 ps with initial velocities drawn from a Maxwell-Boltzmann distribution at 1500 K and a 0.2 fs time step. As shown in Fig. 6, the trajectory remains stable for the entire 10 ps, and the fluctuation in total energy is unnoticeable on the scale of the fluctuations in the kinetic and potential energies. The drift of the total energy is 4.755 × 10−4 (kcal/mol)/ps, corresponding to 1.320 × 10−5 kcal/mol per degree of freedom per ps. Further tests are provided in the supplementary material (Fig. S9) for AIMD simulations of an acetic acid molecule with DG-THC-MP2 (δ = 1.0 × 10−5 a.u.). These results demonstrate that our gradient implementation gives the correct nuclear gradients with respect to the THC-MP2 potential energy surfaces.

FIG. 6.

Energy as a function of time in a microcanonical ab initio molecular dynamics simulation of one alanine aldehyde molecule using DG-THC-SOS-MP2(δ = 1.0 × 10−5 a.u.) with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. The initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K. The velocity Verlet integrator with a 0.2 fs time step is used. The linear regression fits the total energy (in kcal/mol) as a function of time (in picoseconds). The four snapshots from left to right are taken at 2 ps, 4 ps, 6 ps, and 8 ps, respectively. Hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

FIG. 6.

Energy as a function of time in a microcanonical ab initio molecular dynamics simulation of one alanine aldehyde molecule using DG-THC-SOS-MP2(δ = 1.0 × 10−5 a.u.) with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. The initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K. The velocity Verlet integrator with a 0.2 fs time step is used. The linear regression fits the total energy (in kcal/mol) as a function of time (in picoseconds). The four snapshots from left to right are taken at 2 ps, 4 ps, 6 ps, and 8 ps, respectively. Hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

Close modal

In addition to AIMD simulations, the THC-MP2 and THC-SOS-MP2 gradients developed can also be used in geometry optimizations, and one such result is presented in Fig. 7. The geometry optimization was carried out using the geometric optimizer,80 and the convergence criteria were set as specified in the figure caption. The starting structure was taken from PDB-ID 2ONW, an X-ray crystal structure of a fibril-forming peptide (sequence SSTSAA). Our energy minimization, which was carried out on a single molecule in the gas phase, caused multiple changes in the structure. The overall structure contracted during the minimization process, with the end-to-end distance between the two α-carbons on S1 and A6 decreasing from 16.8 to 10.6 Å. We observed the formation of one intramolecular hydrogen bond between the OH groups of the S2 and S4 side chains with an equilibrium distance of 1.72 Å. The S4 side chain also formed a hydrogen bond with the backbone CO group of the same residue with a distance of 1.82 Å. The chemical structure remained the same throughout the optimization with no proton transfer or other chemical changes observed.

FIG. 7.

Energy changes in the geometry optimization of a fibril-forming peptide (PDB ID: 2ONW) using DG-THC-SOS-MP2 with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. This peptide consists of 6 residues, 70 atoms, and 710 basis functions. The optimization is deemed converged when the following five criteria are satisfied simultaneously: (1) RMSD from the previous step is less than 1.2 × 10−3 Å; (2) maximum atomic displacement from the previous step is less than 1.8 × 10−3 Å; (3) RMS of the nuclear gradient is less than 3.0 × 10−4 hartree/bohr; (4) maximum nuclear gradient for any atom is less than 4.5 × 10−4 hartree/bohr; (5) energy change from the previous step is less than 1.0 × 10−6 hartree. Hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

FIG. 7.

Energy changes in the geometry optimization of a fibril-forming peptide (PDB ID: 2ONW) using DG-THC-SOS-MP2 with the cc-pVDZ basis set and cc-pVDZ-RI auxiliary basis set. This peptide consists of 6 residues, 70 atoms, and 710 basis functions. The optimization is deemed converged when the following five criteria are satisfied simultaneously: (1) RMSD from the previous step is less than 1.2 × 10−3 Å; (2) maximum atomic displacement from the previous step is less than 1.8 × 10−3 Å; (3) RMS of the nuclear gradient is less than 3.0 × 10−4 hartree/bohr; (4) maximum nuclear gradient for any atom is less than 4.5 × 10−4 hartree/bohr; (5) energy change from the previous step is less than 1.0 × 10−6 hartree. Hydrogen, carbon, nitrogen, and oxygen atoms are colored as white, gray, blue, and red, respectively.

Close modal

In this work, we derived and implemented analytic gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units (GPUs). We have achieved formal quartic/cubic scaling with molecular size for MP2/SOS-MP2 gradient evaluations. In practice, the observed scaling up to 1000 basis functions is even less steep—O(N3.6)/O(N2.4) for MP2/SOS-MP2, respectively. We applied the analytic gradients in the context of ab initio molecular dynamics simulations and observed good energy conservation (10−5 kcal/mol per ps per degree-of-freedom or better), demonstrating the robustness and accuracy of our formulation and its implementation.

In order to avoid difficulties in the differentiation of the inverse of an ill-conditioned matrix that is present in THC methods, we have used a regularization procedure. This introduces small errors in the correlation energies (less than 0.25 kcal/mol in the test molecules used here). Even though these errors are small and do not cause difficulties with energy conservation in dynamics (implying that there are no significant discontinuities in the resulting potential energy surface), this remains a point for further improvement that we will pursue in future work. Specifically, it would be useful to investigate alternative strategies to ensure that the matrices arising in THC remain well-conditioned.

In our current implementation, calculation of the Coulomb-like terms is significantly faster than the exchange-like terms. In part, this is because the Coulomb-like terms involve only two-index tensors, which fit nicely in the BLAS framework. Therefore, the tensor operations in the Coulomb-like terms greatly benefit from the mature BLAS libraries available both on the CPU (e.g., MKL) and also on the GPU (e.g., MAGMA and cuBLAS). In contrast, the exchange-like terms involve operations between higher-order tensors, and some of the required operations do not fit well into the existing BLAS framework. These operations provide challenging questions for optimization regarding balancing of memory usage and floating point operations and distribution of work between the CPU and GPU. Therefore, a high performance high-order tensor library that utilizes both the CPU and GPU efficiently, and that further recognizes and exploits sparsity patterns in the tensors, would be very useful in order to extend our work to large molecules and long time scales.

Now that analytic gradients are available for tensor hyper-contraction, we are currently working to develop low scaling analytic gradients for other correlation methods within the THC framework. For example, it remains to develop analytic gradients for THC-EOM-CC2,59 which can potentially enable AIMD simulations for larger molecules on the excited states. In addition, the development of THC-based methods so far is mostly within the framework of Gaussian orbitals, and the corresponding applications are primarily targeting non-periodic molecular systems. To enable applications to extended systems, plane wave based approaches are required, and such methods with high-performing implementations are already available for DF-MP2.55 Recently, it has been shown that THC is also applicable in cases with plane wave basis sets,81 and the approaches used here for analytic gradients will likely also be useful in that context.

See supplementary material for details of numerical grids, further performance evaluation including parallelization over multiple GPUs, and further energy conservation tests.

This work was supported by the National Science Foundation (Grant No. CHE-1565249) and used the XStream computational resource supported by the NSF MRI Program (Grant No. ACI-1429830). C.S. is grateful for a Stanford Graduate Fellowship.

The Coulomb-like term of the MP2 energy is

EMP2J=κijabτiκτjκτaκτbκia|jbTHCia|jbTHC=κijabPQRSτiκτjκτaκτbκXiPXaPZPQXjQXbQXiRXaRZRSXjSXbS.
(A1)

We first define the following transformation matrices and tensors:

Tμμκ,occ=iCiμτiκCiμ,ON3,
(A2)
Tμμκ,vir=aCaμτaκCaμ,ON3,
(A3)
Yμκ,P=μTμμκ,occXμP,ON2,
(A4)
Uμκ,P=μTμμκ,virXμP,ON2.
(A5)

EMP2-J can then be written in the AO basis with THC integrals as

EMP2J=κ,μνλσPQRSXμPYμκ,RXγPUγκ,RZRSYλκ,SXλQUσκ,SXσQZQP.
(A6)

This can be evaluated in O(N3) operations as follows:

Oκ,PR=μXμPYμκ,R,ON2,
(A7)
Vκ,PR=γXγPUγκ,R,ON2,
(A8)
Aκ,PR=Oκ,PRVκ,PR,ON2,
(A9)
Fκ,PS=RAκ,PRZRS,ON3,
(A10)
EMP2J=κ,PSFκ,PSFκ,SP,ON2.
(A11)

The exchange-like part of the MP2 energy is

EMP2K=κijabτiκτjκτaκτbκia|jbTHCib|jaTHC=κijabPQRSτiκτjκτaκτbκXiPXaPZPQXjQXbQXiRXbRZRSXjSXaS.
(B1)

We first define the following two tensors:

Λjκ,P=τjκXjP,
(B2)
Λ¯bκ,P=τbκXbP.
(B3)

With these definitions, the exchange-like term of the MP2 energy becomes

EMP2K=κPQRSμνjσ,σνXμPYμκ,RZPQΛjκ,QXσQTσσκ,virXσRZRSΛjκ,SXγSTγγκ,virXγP,
(B4)

which can be computed in O(N4) operations as follows:

Bjγκ,R=SZRSΛjκ,SXγS,ON3,
(B5)
Njγκ,R=γBjγκ,RTγγκ,vir,ON4,
(B6)
Gjκ,RP=σNjγκ,RXγP,ON3,
(B7)
Hκ,PR=jGjκ,PRGjκ,RP,ON3,
(B8)
EMP2K=κPRAκ,PRHκ,PR,ON2.
(B9)

For the Z tensor of LS-AO-THC defined in Eq. (62), we first define the following tensors that are both parts of the Z tensor:

ΞPP=Sphys1PP,
(C1)
MPB=PμνAΞPPXμPXνPμν|AA|B1.
(C2)

The density matrix for μν|A can then be computed as

cPA=2PQΞPPΓPQMQA,
(C3)
dμνA1=PXμPXνPcPA.
(C4)

Inserting the derivative of the metric matrices,

A|B1ξ=CDA|C1C|DξD|B1.
(C5)

The density matrix for (A|B) is then

dAB2=PQMPAΓPQMQB.
(C6)

The derivative of the physical space metric matrix is

ΞPP,ξ=2QQμΞPQΩQQXμQΞQPXμQ,ξ2QQμΞPQXμQΩQQΞQPXμQ,ξ,
(C7)

where

ΩQQ=μXμQXμQ.
(C8)

The density matrices for X can then be computed as

sPQ=2ΩPQRSΞPRΓRSZSQ,
(C9)
dPμ3=DμP+QsPQXμQ+QsQPXμQ+2AνcPAA|μνXνP.
(C10)

For the Z tensor defined in Eq. (70), which has the density fitting integral evaluated through numerical integration, we first define the following tensors that are parts of the Z tensors:

Ω̃PP̃=μXμPX̃μP̃,
(D1)
TPP̃=μνPΞPPXμPXνPX̃μP̃X̃νP̃=PΞPPΩ̃PP̃Ω̃PP̃,
(D2)
M̃PB=P̃ATPP̃ṼP̃AA|B1.
(D3)

Similar to Eq. (C6), the density matrix for (A|B) becomes

d̃AB1=PQM̃PAΓPQM̃QB.
(D4)

The density matrix for ṼP̃A can be computed as

d̃P̃A2=2PTPP̃ΓPQM̃QA.
(D5)

The density matrix for X̃μP̃ can be computed as

rPP̃=4Ω̃PP̃PQAΞPPΓPQM̃QAṼP̃A,
(D6)
d̃P̃μ3=PrPP̃XμP.
(D7)

The density matrix for XμP can be computed as

s̃PQ=4ΩPQRSΞPRΓRSZSQ,
(D8)
d̃Pμ4=DμP+Qs̃PQXμQ+Qs̃QPXμQ+P̃rPP̃X̃μP̃.
(D9)
1.
C.
Møller
and
M. S.
Plesset
, “
Note on an approximation treatment for many-electron systems
,”
Phys. Rev.
46
,
618
(
1934
).
2.
K. E.
Riley
and
P.
Hobza
, “
Assessment of the MP2 method, along with several basis sets, for the computation of interaction energies of biologically relevant hydrogen bonded and dispersion bound complexes
,”
J. Phys. Chem. A
111
,
8257
(
2007
).
3.
M. K.
Kesharwani
,
A.
Karton
, and
J. M. L.
Martin
, “
Benchmark ab initio conformational energies for the proteinogenic amino acids through explicitly correlated methods. Assessment of density functional methods
,”
J. Chem. Theory Comput.
12
,
444
(
2016
).
4.
S.
Yoo
,
E.
Apra
,
X. C.
Zeng
, and
S. S.
Xantheas
, “
High-level ab initio electronic structure calculations of water clusters (H2O)16 and (H2O)17: A new global minimum for (H2O)16
,”
J. Phys. Chem. Lett.
1
,
3122
(
2010
).
5.
L.-P.
Wang
,
K. A.
McKiernan
,
J.
Gomes
,
K. A.
Beauchamp
,
T.
Head-Gordon
,
J. E.
Rice
,
W. C.
Swope
,
T. J.
Martínez
, and
V. S.
Pande
, “
Building a more predictive protein force field: A systematic and reproducible route to AMBER-FB15
,”
J. Phys. Chem. B
121
,
4023
(
2017
).
6.
M.
Del Ben
,
M.
Schonherr
,
J.
Hutter
, and
J.
VandeVondele
, “
Bulk liquid water at ambient temperature and pressure from MP2 theory
,”
J. Phys. Chem. Lett.
4
,
3753
(
2013
).
7.
J. A.
Pople
,
R.
Krishnan
,
H. B.
Schlegel
, and
J. S.
Binkley
, “
Derivative studies in Hartree-Fock and Møller-Plesset theories
,”
Int. J. Quantum Chem.
16
,
225
(
1979
).
8.
J. E.
Rice
and
R. D.
Amos
, “
On the efficient evaluation of analytic energy gradients
,”
Chem. Phys. Lett.
122
,
585
(
1985
).
9.
N. C.
Handy
,
R. D.
Amos
,
J. F.
Gaw
,
J. E.
Rice
, and
E. D.
Simandiras
, “
The elimination of singularities in derivative calculations
,”
Chem. Phys. Lett.
120
,
151
(
1985
).
10.
G.
Fitzgerald
,
R. J.
Harrison
, and
R. J.
Bartlett
, “
Analytic energy gradients for general coupled-cluster methods and 4th order many-body perturbation theory
,”
J. Chem. Phys.
85
,
5143
(
1986
).
11.
N. C.
Handy
and
H. F.
Schaefer
, “
On the evaluation of analytic energy derivatives for correlated wave-functions
,”
J. Chem. Phys.
81
,
5031
(
1984
).
12.
M. J.
Frisch
,
M.
Head-Gordon
, and
J. A.
Pople
, “
A direct MP2 gradient method,”
,”
Chem. Phys. Lett.
166
,
275
(
1990
).
13.
M. J.
Frisch
,
M.
Head-Gordon
, and
J. A.
Pople
, “
Semi-direct algorithms for the MP2 energy and gradient
,”
Chem. Phys. Lett.
166
,
281
(
1990
).
14.
F.
Haase
and
R.
Ahlrichs
, “
Semidirect MP2 gradient evaluation on workstation programs - the MPGRAD program
,”
J. Comput. Chem.
14
,
907
(
1993
).
15.
M.
Head-Gordon
, “
An improved semidirect MP2 gradient method
,”
Mol. Phys.
96
,
673
(
1999
).
16.
E. D.
Simandiras
,
R. D.
Amos
, and
N. C.
Handy
, “
The analytic evaluation of 2nd-order Møller-Plesset (MP2) dipole moment derivatives
,”
Chem. Phys.
114
,
9
(
1987
).
17.
T. J.
Lee
,
S. C.
Racine
,
J. E.
Rice
, and
A. P.
Rendell
, “
On the orbital contribution to analytical derivatives of perturbation theory energies
,”
Mol. Phys.
85
,
561
(
1995
).
18.
P.
Jørgensen
and
T.
Helgaker
, “
Møller-Plesset energy derivatives
,”
J. Chem. Phys.
89
,
1560
(
1988
).
19.
T.
Helgaker
,
P.
Jørgensen
, and
N. C.
Handy
, “
A numerically stable procedure for calculating Møller-Plesset energy derivatives, derived using the theory of Lagrangians
,”
Theor. Chim. Acta
76
,
227
(
1989
).
20.
E. A.
Salter
,
G. W.
Trucks
,
G.
Fitzgerald
, and
R. J.
Bartlett
, “
Theory and application of MBPT(3) gradients - the density approach
,”
Chem. Phys. Lett.
141
,
61
(
1987
).
21.
G. W.
Trucks
,
E. A.
Salter
,
C.
Sosa
, and
R. J.
Bartlett
, “
Theory and implementation of the MBPT density-matrix - an application to one-electron properties
,”
Chem. Phys. Lett.
147
,
359
(
1988
).
22.
W. J.
Lauderdale
,
J. F.
Stanton
,
J.
Gauss
,
J. D.
Watts
, and
R. J.
Bartlett
, “
Restricted open-shell Hartree-Fock based many-body perturbation theory - theory and application of energy and gradient calculations
,”
J. Chem. Phys.
97
,
6606
(
1992
).
23.
D. J.
Tozer
,
J. S.
Andrews
,
R. D.
Amos
, and
N. C.
Handy
, “
Gradient theory applied to restricted (Open-Shell) Møller-Plesset theory
,”
Chem. Phys. Lett.
199
,
229
(
1992
).
24.
C. M.
Aikens
,
S. P.
Webb
,
R. L.
Bell
,
G. D.
Fletcher
,
M. W.
Schmidt
, and
M. S.
Gordon
, “
A derivation of the frozen-orbital unrestricted open-shell and restricted closed-shell second-order perturbation theory analytic gradient expressions
,”
Theor. Chem. Acc.
110
,
233
(
2003
).
25.
S.
Hirata
and
S.
Iwata
, “
Analytical energy gradients in second-order Møller-Plesset perturbation theory for extended systems
,”
J. Chem. Phys.
109
,
4147
(
1998
).
26.
S.
Saebø
,
J.
Baker
,
K.
Wolinski
, and
P.
Pulay
, “
An efficient atomic orbital based second-order Møller-Plesset gradient program
,”
J. Chem. Phys.
120
,
11423
(
2004
).
27.
S.
Schweizer
,
B.
Doser
, and
C.
Ochsenfeld
, “
An atomic orbital-based reformulation of energy gradients in second-order Møller-Plesset perturbation theory
,”
J. Chem. Phys.
128
,
154101
(
2008
).
28.
K.
Kristensen
,
P.
Jørgensen
,
B.
Jansík
,
T.
Kjærgaard
, and
S.
Reine
, “
Molecular gradient for second-order Møller-Plesset perturbation theory using the divide-expand-consolidate (DEC) scheme
,”
J. Chem. Phys.
137
,
114102
(
2012
).
29.
M.
Kobayashi
and
H.
Nakai
, “
An effective energy gradient expression for divide-and-conquer second-order Møller-Plesset perturbation theory
,”
J. Chem. Phys.
138
,
044102
(
2013
).
30.
U.
Bozkaya
and
C. D.
Sherrill
, “
Analytic energy gradients for the orbital-optimized second-order Møller-Plesset perturbation theory
,”
J. Chem. Phys.
138
,
184103
(
2013
).
31.
S.
Chabbal
,
H.
Stoll
,
H.-J.
Werner
, and
T.
Leininger
, “
Analytic gradients for the combined sr-DFT/lr-MP2 method: Application to weakly bound systems
,”
Mol. Phys.
108
,
3373
(
2010
).
32.
E.
Kordel
,
C.
Villani
, and
W.
Klopper
, “
Analytical nuclear gradients for the MP2-R12 method
,”
Mol. Phys.
105
,
2565
(
2007
).
33.
D.
Si
and
H.
Li
, “
Analytic energy gradients in combined second order Møller-Plesset perturbation theory and conductorlike polarizable continuum model calculation
,”
J. Chem. Phys.
135
,
144107
(
2011
).
34.
T.
Nagata
,
D. G.
Fedorov
,
H.
Li
, and
K.
Kitaura
, “
Analytic gradient for second order Møller-Plesset perturbation theory with the polarizable continuum model based on the fragment molecular orbital method
,”
J. Chem. Phys.
136
,
204112
(
2012
).
35.
J.
Jung
,
Y.
Sugita
, and
S.
Ten-no
, “
Møller-Plesset perturbation theory gradient in the generalized hybrid orbital quantum mechanical and molecular mechanical method
,”
J. Chem. Phys.
132
,
084106
(
2010
).
36.
H.
Li
, “
Analytic energy gradient in combined second-order Møller-Plesset perturbation theory and polarizable force field calculation
,”
J. Phys. Chem. A
115
,
11824
(
2011
).
37.
F.
Weigend
and
M.
Haser
, “
RI-MP2: First derivatives and global consistency
,”
Theor. Chem. Acc.
97
,
331
(
1997
).
38.
R. A.
Distasio
,
R. P.
Steele
,
Y. M.
Rhee
,
Y. H.
Shao
, and
M.
Head-Gordon
, “
An improved algorithm for analytical gradient evaluation in resolution-of-the-identity second-order Møller-Plesset perturbation theory: Application to alanine tetrapeptide conformational analysis
,”
J. Comput. Chem.
28
,
839
(
2007
).
39.
U.
Bozkaya
, “
Derivation of general analytic gradient expressions for density-fitted post-Hartree-Fock methods: An efficient implementation for the density-fitted second-order Møller-Plesset perturbation theory
,”
J. Chem. Phys.
141
,
124108
(
2014
).
40.
F.
Aquilante
,
R.
Lindh
, and
T. B.
Pedersen
, “
Analytic derivatives for the Cholesky representation of the two-electron integrals
,”
J. Chem. Phys.
129
,
034106
(
2008
).
41.
J.
Bostrom
,
V.
Veryazov
,
F.
Aquilante
,
T. B.
Pedersen
, and
R.
Lindh
, “
Analytical gradients of the second-order Møller-Plesset energy using Cholesky decompositions
,”
Int. J. Quantum Chem.
114
,
321
(
2014
).
42.
T.
Nagata
,
D. G.
Fedorov
,
K.
Ishimura
, and
K.
Kitaura
, “
Analytic energy gradient for second-order Møller-Plesset perturbation theory based on the fragment molecular orbital method
,”
J. Chem. Phys.
135
,
044110
(
2011
).
43.
T.
Tsukamoto
,
Y.
Mochizuki
,
N.
Watanabe
,
K.
Fukuzawa
, and
T.
Nakano
, “
Partial geometry optimization with FMO-MP2 gradient: Application to TrpCage
,”
Chem. Phys. Lett.
535
,
157
(
2012
).
44.
Y.
Mochizuki
,
T.
Nakano
,
Y.
Komeiji
,
K.
Yamashita
,
Y.
Okiyama
,
H.
Yoshikawa
, and
H.
Yamataka
, “
Fragment molecular orbital-based molecular dynamics (FMO-MD) method with MP2 gradient
,”
Chem. Phys. Lett.
504
,
95
(
2011
).
45.
M.
Schutz
,
H. J.
Werner
,
R.
Lindh
, and
F. R.
Manby
, “
Analytical energy gradients for local second-order Møller-Plesset perturbation theory using density fitting approximations
,”
J. Chem. Phys.
121
,
737
(
2004
).
46.
A.
El Azhary
,
G.
Rauhut
,
P.
Pulay
, and
H. J.
Werner
, “
Analytical energy gradients for local second-order Møller-Plesset perturbation theory
,”
J. Chem. Phys.
108
,
5185
(
1998
).
47.
Y.
Jung
,
R. C.
Lochan
,
A. D.
Dutoi
, and
M.
Head-Gordon
, “
Scaled opposite-spin second order Møller-Plesset correlation energy: An economical electronic structure method
,”
J. Chem. Phys.
121
,
9793
(
2004
).
48.
R. C.
Lochan
,
Y. H.
Shao
, and
M.
Head-Gordon
, “
Quartic-scaling analytical energy gradient of scaled opposite-spin second-order Møller-Plesset perturbation theory
,”
J. Chem. Theory Comput.
3
,
988
(
2007
).
49.
Y. M.
Rhee
,
D.
Casanova
, and
M.
Head-Gordon
, “
Quartic-scaling analytical gradient of quasidegenerate scaled opposite spin second-order perturbation corrections to single excitation configuration interaction
,”
J. Chem. Theory Comput.
5
,
1224
(
2009
).
50.
I. M. B.
Nielsen
, “
A new direct MP2 gradient algorithm with implementation on a massively parallel computer
,”
Chem. Phys. Lett.
255
,
210
(
1996
).
51.
G. D.
Fletcher
,
A. P.
Rendell
, and
P.
Sherwood
, “
A parallel second-order Møller-Plesset gradient
,”
Mol. Phys.
91
,
431
(
1997
).
52.
K.
Ishimura
,
P.
Pulay
, and
S.
Nagase
, “
New parallel algorithm for MP2 energy gradient calculations
,”
J. Comput. Chem.
28
,
2034
(
2007
).
53.
C. M.
Aikens
and
M. S.
Gordon
, “
Parallel unrestricted MP2 analytic gradients using the distributed data interface
,”
J. Phys. Chem. A
108
,
3103
(
2004
).
54.
C.
Hättig
,
A.
Hellweg
, and
A.
Köhn
, “
Distributed memory parallel implementation of energies and gradients for second-order Møller-Plesset perturbation theory with the resolution-of-the-identity approximation
,”
Phys. Chem. Chem. Phys.
8
,
1159
(
2006
).
55.
M.
Del Ben
,
J.
Hutter
, and
J.
VandeVondele
, “
Forces and stress in second order Møller-Plesset perturbation theory for condensed phase systems within the resolution-of-identity Gaussian and plane waves approach
,”
J. Chem. Phys.
143
,
102803
(
2015
).
56.
E. G.
Hohenstein
,
R. M.
Parrish
,
C. D.
Sherrill
, and
T. J.
Martínez
, “
Communication: Tensor hypercontraction. III. Least-squares tensor hypercontraction for the determination of correlated wavefunctions
,”
J. Chem. Phys.
137
,
221101
(
2012
).
57.
R. M.
Parrish
,
E. G.
Hohenstein
,
T. J.
Martínez
, and
C. D.
Sherrill
, “
Tensor hypercontraction. II. Least-squares renormalization
,”
J. Chem. Phys.
137
,
224106
(
2012
).
58.
E. G.
Hohenstein
,
S. I. L.
Kokkila
,
R. M.
Parrish
, and
T. J.
Martínez
, “
Quartic scaling second-order approximate coupled cluster singles and doubles via tensor hypercontraction: THC-CC2
,”
J. Chem. Phys.
138
,
124111
(
2013
).
59.
E. G.
Hohenstein
,
S. I. L.
Kokkila
,
R. M.
Parrish
, and
T. J.
Martinez
, “
Tensor hypercontraction equation-of-motion second-order approximate coupled cluster: Electronic excitation energies in O(N4) time
,”
J. Phys. Chem. B
117
,
12972
(
2013
).
60.
S. I. L.
Kokkila Schumacher
,
E. G.
Hohenstein
,
R. M.
Parrish
,
L.-P.
Wang
, and
T. J.
Martinez
, “
Tensor hypercontraction second-order Møller-Plesset perturbation theory: Grid optimization and reaction energies
,”
J. Chem. Theory Comput.
11
,
3042
(
2015
).
61.
C.
Song
and
T. J.
Martínez
, “
Atomic orbital-based SOS-MP2 with tensor hypercontraction: I. GPU-based tensor construction and exploiting sparsity
,”
J. Chem. Phys.
144
,
174111
(
2016
).
62.
C.
Song
and
T. J.
Martinez
, “
Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction
,”
J. Chem. Phys.
146
,
034104
(
2017
).
63.
R. M.
Parrish
,
E. G.
Hohenstein
,
N. F.
Schunck
,
C. D.
Sherrill
, and
T. J.
Martínez
, “
Exact tensor hypercontraction: A universal technique for the resolution of matrix elements of local finite-range N-body potentials in many-body quantum problems
,”
Phys. Rev. Lett.
111
,
132505
(
2013
).
64.
R. M.
Parrish
,
C. D.
Sherrill
,
E. G.
Hohenstein
,
S. I. L.
Kokkila
, and
T. J.
Martínez
, “
Communication: Acceleration of coupled cluster singles and doubles via orbital-weighted least-squares tensor hypercontraction
,”
J. Chem. Phys.
140
,
181102
(
2014
).
65.
J.
Almlof
, “
Elimination of energy denominators in Møller-Plesset perturbation-theory by a Laplace transform approach
,”
Chem. Phys. Lett.
181
,
319
(
1991
).
66.
M.
Haser
and
J.
Almlof
, “
Laplace transform techniques in Møller-Plesset perturbation theory
,”
J. Chem. Phys.
96
,
489
(
1992
).
67.
A.
Takatsuka
,
S.
Ten-no
, and
W.
Hackbusch
, “
Minimax approximation for the decomposition of energy denominators in Laplace-transformed Møller-Plesset perturbation theories
,”
J. Chem. Phys.
129
,
044112
(
2008
).
68.
D.
Braess
and
W.
Hackbusch
, “
Approximation of 1/x by exponential sums in [1, infinity)
,”
IMA J. Numer. Anal.
25
,
685
(
2005
).
69.
J. L.
Whitten
, “
Coulombic potential-energy integrals and approximations
,”
J. Chem. Phys.
58
,
4496
(
1973
).
70.
J. A.
Pople
,
P. M. W.
Gill
, and
B. G.
Johnson
, “
Kohn-Sham density functional theory within a finite basis set
,”
Chem. Phys. Lett.
199
,
557
(
1992
).
71.
B. G.
Johnson
,
P. M. W.
Gill
, and
J. A.
Pople
, “
The performance of a family of density functional methods
,”
J. Chem. Phys.
98
,
5612
(
1993
).
72.
R.
Nath
,
S.
Tomov
, and
J.
Dongarra
, “
Accelerating GPU kernels for dense linear algebra
,” in
High Performance Computing for Computational Science–VECPAR 2010
(
Springer
,
2011
), Vol. 6449, p.
83
.
73.
See http://docs.nvidia.com/cuda/cublas for details about the cuBLAS library, accessed 2/10/17.
74.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation
,”
J. Chem. Theory Comput.
4
,
222
(
2008
).
75.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation
,”
J. Chem. Theory Comput.
5
,
1004
(
2009
).
76.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. 3. Analytical energy gradients, geometry optimization, and first principles molecular dynamics
,”
J. Chem. Theory Comput.
5
,
2619
(
2009
).
77.
A. V.
Titov
,
I. S.
Ufimtsev
,
N.
Luehr
, and
T. J.
Martinez
, “
Generating efficient quantum chemistry codes for novel architectures
,”
J. Chem. Theory Comput.
9
,
213
(
2013
).
78.
Y.
Shao
,
L. F.
Molnar
,
Y.
Jung
,
J.
Kussmann
,
C.
Ochsenfeld
,
S. T.
Brown
,
A. T. B.
Gilbert
,
L. V.
Slipchenko
,
S. V.
Levchenko
,
D. P.
O’Neill
,
R. A.
DiStasio
, Jr.
,
R. C.
Lochan
,
T.
Wang
,
G. J. O.
Beran
,
N. A.
Besley
,
J. M.
Herbert
,
C. Y.
Lin
,
T.
Van Voorhis
,
S. H.
Chien
,
A.
Sodt
,
R. P.
Steele
,
V. A.
Rassolov
,
P. E.
Maslen
,
P. P.
Korambath
,
R. D.
Adamson
,
B.
Austin
,
J.
Baker
,
E. F. C.
Byrd
,
H.
Dachsel
,
R. J.
Doerksen
,
A.
Dreuw
,
B. D.
Dunietz
,
A. D.
Dutoi
,
T. R.
Furlani
,
S. R.
Gwaltney
,
A.
Heyden
,
S.
Hirata
,
C.-P.
Hsu
,
G.
Kedziora
,
R. Z.
Khalliulin
,
P.
Klunzinger
,
A. M.
Lee
,
M. S.
Lee
,
W.
Liang
,
I.
Lotan
,
N.
Nair
,
B.
Peters
,
E. I.
Proynov
,
P. A.
Pieniazek
,
Y. M.
Rhee
,
J.
Ritchie
,
E.
Rosta
,
C. D.
Sherrill
,
A. C.
Simmonett
,
J. E.
Subotnik
,
H. L.
Woodcock
 III
,
W.
Zhang
,
A. T.
Bell
,
A. K.
Chakraborty
,
D. M.
Chipman
,
F. J.
Keil
,
A.
Warshel
,
W. J.
Hehre
,
H. F.
Schaefer
 III
,
J.
Kong
,
A. I.
Krylov
,
P. M. W.
Gill
, and
M.
Head-Gordon
, “
Advances in methods and algorithms in a modern quantum chemistry program package
,”
Phys. Chem. Chem. Phys.
8
,
3172
(
2006
).
79.
T. H.
Dunning
, “
Gaussian basis sets for use in correlated molecular calculations. 1. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
90
,
1007
(
1989
).
80.
L. P.
Wang
and
C. C.
Song
, “
Geometry optimization made simple with translation and rotation coordinates
,”
J. Chem. Phys.
144
,
214108
(
2016
).
81.
F.
Hummel
,
T.
Tsatsoulis
, and
A.
Grüneis
, “
Low rank factorization of the Coulomb integrals for periodic coupled cluster theory
,”
J. Chem. Phys.
146
,
124105
(
2017
).

Supplementary Material