Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and *ab initio* molecular dynamics (AIMD) simulations. The resulting energy conservation in micro-canonical AIMD demonstrates that the implementation provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.

## I. INTRODUCTION

*Ab initio* molecular dynamics (AIMD) is an important tool to understand chemical processes. However, accurate *ab initio* molecular dynamics simulations are usually limited by the computational cost of evaluating the energy and the analytical gradients with electronic structure theories. The fourth-order electronic repulsion integral (ERI) tensor is the major source of complexity for most electronic structure methods. One of the simplest ways to describe dynamic correlation effects beyond the Hartree-Fock approximation is second-order Møller-Plesset perturbation theory (MP2).^{1} MP2 is known to be good at describing hydrogen bonding and long-range dispersion interactions^{2} and provides an accurate description of conformational energies of small peptides.^{3} Therefore, MP2 has been used with geometry optimization in search of water cluster configurations^{4} and for protein force field parametrization.^{5} A recent Monte Carlo simulation study on bulk liquid water shows that the water density obtained with MP2 is in better agreement with experiments compared with density functional theory (DFT) without dispersion corrections.^{6} Despite the advantages of MP2, a bottleneck to widespread use of MP2, in its conventional form, is its quintic computational scaling with increasing system size. This scaling is due to the required transformation of the fourth-order ERI tensor from the atomic orbital (AO) basis to the molecular orbital (MO) basis.

Analytic energy gradients for MP2 were first derived and implemented by Pople *et al.*^{7} and many advances have been made since then. To efficiently treat the ERI contribution to the gradient, Rice and Amos pointed out that storing and transforming the derivative of ERI can be avoided by formulating the gradient in terms of the two-particle density matrix.^{8} Handy *et al.*^{9} and Fitzgerald *et al.*^{10} both showed that the Handy-Schaefer *Z* vector method^{11} is applicable to the MP2 gradient and requires solving only a single coupled perturbed Hartree-Fock (CPHF) equation in order to account for the orbital response contribution. These two techniques were both used in subsequent direct and semi-direct MP2 gradient implementations.^{12–15} Several alternative derivations have been reported that provide insight into the structure of the orbital response contributions,^{16,17} such as the Lagrangian method^{18,19} and the density approach.^{20,21} MP2 gradient theories have also been developed for open shell systems,^{22–24} extended systems,^{25} and non-canonical molecular orbital formulations.^{26–30} More recently, the analytic gradients for methods that combine MP2 with other levels of theories have also been developed. A few examples are the analytic gradients for MP2 combined with density functional theory,^{31} MP2 with explicit correlation (MP2-R12),^{32} MP2 with implicit solvation through the polarizable continuum model,^{33,34} and MP2 in combination with hybrid quantum mechanics/molecular mechanics (QM/MM)^{35} or polarizable force fields.^{36}

There have been many efforts to reduce the computational cost of evaluating MP2 energies and gradients. One approach factorizes the high order ERI tensor as a product of several low-order tensors. Two such examples with implemented analytic gradients are the density-fitting (DF) approximation (also called resolution-of-identity or RI)^{37–39} and Cholesky decomposition (CD).^{40,41} However, DF and CD factorizations are unable to efficiently factorize the exchange-like correlation term, and thus do not affect the quintic scaling. A second type of approach makes use of locality. One such example is the MP2 method based on fragment molecular orbital (FMO),^{42} which has been applied in geometry optimization^{43} and *ab initio* molecular dynamics simulations.^{44} Another example is local MP2 methods.^{45,46} A third type of approach is represented by the scaled opposite spin MP2 (SOS-MP2) method, which approximates the exchange-like term by scaling the Coulomb-like term, thereby achieving quartic scaling for both energy^{47} and gradients.^{48,49} Finally, one can accelerate the MP2 gradient through parallelization and such studies have been reported for canonical MP2,^{50–52} unrestricted MP2,^{53} and RI-MP2 with Gaussian orbitals^{54} or hybrid Gaussian/plane waves approaches.^{55}

Unlike most previous methods that require third-order tensors in the factorization of the ERI tensor, tensor hyper-contraction (THC)^{56,57} is able to factorize the ERI tensor using only second-order tensors. With the THC factorization, the scaling of calculating MP2,^{56} MP3,^{56} second-order approximate coupled cluster (CC2),^{58} and equation-of-motion CC2 (EOM-CC2)^{59} energies have all been successfully reduced to *O*(*N*^{4}). Previous work has primarily focused on using THC to accelerate correlation energy calculations. In this work, we focus on the development of analytic gradients for THC in order to enable its application to geometry optimization and AIMD simulations. We show how to evaluate analytic derivatives for the THC tensors and further provide complete expressions for analytic energy gradients of THC-MP2 and THC-SOS-MP2.

The structure of this paper is as follows. We start by discussing how the THC factorization reduces MP2/SOS-MP2 gradient evaluations to quartic/cubic scaling, respectively. Next, we discuss the analytic gradients for the THC tensors under a variety of THC approximations. A few considerations regarding the graphics processing unit (GPU) implementations will also be discussed. Finally, we present results demonstrating the performance of the methods and show applications of the methods to both geometry optimization and AIMD simulations. The correctness of the methods and the implementations will be demonstrated by considering energy conservation in microcanonical AIMD simulations.

## II. METHOD

As there are many high-order tensors involved in the methods described in this paper, we first summarize our convention for indices as follows:

$\mu ,\nu ,\lambda ,\sigma ,\gamma $: Atomic Orbital (AO) basis function indices. These are atom-centered contracted Gaussian functions. The indices range from 1…

*N*_{μ}, where*N*_{μ}is proportional to system size*N*.$A,B,C,D$: Auxiliary basis indices for density fitting (DF) approximation. These are also atom-centered Gaussian functions. The indices range from 1…

*N*_{A}, where*N*_{A}$\u221d$*N*. For satisfactory accuracy,*N*_{A}is usually three to four times larger than*N*_{μ}.$P,Q,R,S,P\u2032,Q\u2032$: Auxiliary basis indices for THC approximation, which in this work represent gridpoints in physical space. These indices range from 1$\u2026$

*N*_{P}, where*N*_{P}$\u221d$*N*. For satisfactory accuracy,*N*_{P}is usually eight to ten times larger than*N*_{μ}.$i,j,k$: Occupied MO basis indices ranging from 1…

*N*_{occ}.$a,b,c$: Virtual MO basis indices ranging from 1…

*N*_{virt}. Usually*N*_{virt}is significantly larger than*N*_{occ}.$\kappa $: Laplace quadrature point indices. The range of these indices is independent of

*N*, and 7-10 quadrature points are usually sufficient.

In the THC framework, the atomic orbital (AO) based ERI tensor is factorized as

where the *X* tensor in this work is constructed from the values of the atomic orbitals at physical space gridpoints,

and the *Z* tensor is often computed by least-squares (LS) as defined below and elsewhere.^{56–63} Other choices for the *X* tensor are possible, including ones that have no relationship to physical space grids.^{56} Most of the formalism presented here carries over to these choices, provided the derivative of *X* with respect to molecular geometry can be evaluated. The THC factorization of ERIs in the molecular orbital (MO) basis is obtained by keeping the *Z* tensor unchanged while transforming the *X* tensor from the AO basis to the MO basis,

where *C* represents the molecular orbital coefficients and *p* indexes molecular orbitals.

When the *X* tensor is defined as in Eq. (2) and the corresponding AO basis set is localized (as chosen in this work), the *X* tensor will be inherently sparse. This sparsity can be utilized to accelerate tensor operations involving *X*. We focus in this work on least-squares based definitions of *Z*,^{57} although other choices exist.^{56,63} Within the least-squares approximation, there remains freedom as to which error metric is used to define *Z*. For example, one can minimize the errors in the ERIs in the MO^{57} or AO^{61,62} basis or one can employ a weighted least-squares procedure to minimize the errors for a subset of the MO ERIs.^{64} In Secs. II A and II B, we first outline the general THC-MP2 gradient theory that is applicable for all THC methods that factorize the ERIs in the form given in Eq. (1). We discuss the evaluation of gradients for various choices of the *Z* tensor in Sec. II C.

### A. Analytic gradient expressions for THC-MP2 and THC-SOS-MP2

For simplicity, we give formulas for MP2 and SOS-MP2 gradients in the case of a restricted, closed-shell HF reference. In order to treat both MP2 and SOS-MP2 simultaneously, we define

where *c*_{sos} = 1.3 is used in our implementation as recommended previously.^{47} Similarly, we define $\mu \nu ||\lambda \sigma \u2009=\u20092\mu \nu |\lambda \sigma \u2212\mu \sigma |\lambda \nu $ as the exact ERI used in the preceding HF calculation (assuming that THC is only used for the MP2 and/or SOS-MP2 calculation). The THC-MP2 correlation energy is given as

where we use a Laplace transform and quadrature^{65,66} for the energy denominator,

Previous work^{56,57,60–62} has shown that within the THC approximation, the Coulomb-like term in Eq. (5) can be evaluated with *O*(*N*^{3}) operations, and the exchange-like term (not needed in SOS-MP2) can be evaluated with *O*(*N*^{4}) operations. This should be compared with conventional formulations of MP2 and SOS-MP2, which require *O*(*N*^{5}) operations. For brevity, we describe the THC factorization details of Coulomb- and exchange-like terms in Appendixes A and B, respectively. These have been discussed previously but are reproduced here in order to make the article self-contained (and many of these operations are closely related to the factorizations needed in the analytic gradient).

The analytic gradient of the THC MP2 ground state energy is given as^{12}

where *ξ* represents the nuclear degree of freedom involved in the derivative, *h*_{μν} is the core Hamiltonian matrix element (electronic kinetic energy and electron-nuclear attraction), *S*_{μν} is the overlap integral, *P*^{HF/}*W*^{HF} are the HF/energy-weighted density matrices, *P*^{(2)} and *W*^{(2)} are the second order corrections to the density and energy-weighted density matrices, and $\Gamma ^\mu \nu \lambda \sigma MP2$ is the MP2 two-electron density matrix. Gradient contributions from *P*^{(2)} and *W*^{(2)} in Eq. (7) describe how the MP2 energy responds to changes in the orbital coefficients and orbital energies. The last term in Eq. (7) describes how the MP2 energy responds to changes in the THC-approximated ERI. We will focus our discussion on the terms that need to be modified in THC-based MP2/SOS-MP2 relative to a conventional implementation of the MP2/SOS-MP2 gradient. This boils down to the second-order corrections to the density and energy-weighted density matrices, i.e., *P*^{(2)} and *W*^{(2)}, and the last term in Eq. (7) where the THC ERI appears. Other terms are treated in an analogous manner to conventional MP2 gradient theory.

As in density fitted MP2 (DF-MP2), there are two types of ERIs involved in the ground state energy—those used in the preceding HF calculation (assumed to be exact) and the approximate ERIs used in the calculation of the MP2 energy. Because of this, some blocks of *P*^{(2)} and *W*^{(2)} contain contributions from both the THC-approximated ERI $\mu \nu |\lambda \sigma THC$ and also the exact ERI $\mu \nu |\lambda \sigma $. In the MO basis, the occupied-occupied and virtual-virtual blocks in *P*^{(2)} are defined as

The occupied-virtual block of *P*^{(2)} is obtained by solving the coupled-perturbed HF equations^{11} as

where

The definition of *W*^{(2)} in the MO-basis is

Equations (8) and (9) contain products of two energy denominators, which would normally imply two separate Laplace transform quadratures. Here, we introduce a single quadrature to describe the product. This implies only a single loop over the Laplace transform quadrature points. For $\Delta ijac\Delta kjac$ in Eq. (8), when *i* is not equal to *k*,

If *i* is equal to *k*,

For the degenerate case when *i* is equal to *k*, previous works^{48,49} have obtained a similar expression to Eq. (17). Combining Eqs. (16) and (17), we get

where

Similarly, $\Delta ijab\Delta ijcb$ in Eq. (9) can be evaluated as

where

In this work, we use the same quadrature nodes and weights for evaluating denominator products as shown in Eqs. (16) and (17) and single denominators as shown in Eq. (6). These nodes and weights are chosen following the work of Takatsuka *et al.*^{67,68} Empirical tests of the accuracy of this procedure are documented in Sec. VIII of the supplementary material. In principle, the optimal quadrature nodes and weights will be different for single denominators and denominator products, and there may be ways to increase the accuracy by searching for a quadrature that can treat both cases. However, we leave this to future work, as the tests below show that the present approach is sufficiently accurate.

Equations (18) and (20) factorize the energy denominator product in a similar way as the energy denominator in Eq. (6). This allows us to efficiently evaluate Eqs. (8)–(15) by introducing the following common intermediate tensors:

We call these the OVV [occupied-virtual-virtual, Eq. (22)] and OOV [occupied-occupied-virtual, Eq. (23)] intermediate density matrices. The density matrix *P*^{(2)}, energy-weighted density matrix *W*^{(2)}, and the Lagrangian *L* can then be computed from $\rho \mu \nu \kappa $ and $\rho \xaf\mu \nu \kappa $ as

The operations in Eqs. (24)–(29) are dominated by AO to MO transformations of $\rho \mu \nu \kappa $ and $\rho \xaf\mu \nu \kappa $, scaling as *O*(*N*^{3}). Additionally, there are Fock matrix evaluations in Eqs. (26) and (27), which in practice usually scale as *O*(*N*^{2}) after screening with the Schwarz bound.^{69} The calculation of $\rho \mu \nu \kappa $ and $\rho \xaf\mu \nu \kappa $ will be discussed in detail below (Sec. II B). These terms require *O*(*N*^{4}) operations for THC-MP2 and *O*(*N*^{3}) operations for THC-SOS-MP2.

The last term in Eq. (7) represents gradient contributions corresponding to geometry dependence of the THC tensors, where the nonseparable part of the MP2 two-electron density matrix is defined as

The last term in Eq. (7) can be further broken down in terms of the *X* and *Z* tensors as

where

and

In Sec. II B, we discuss the implementation details for the computation of $D\mu P$ and *Γ*^{PQ}, showing that this can be accomplished with *O*(*N*^{4})/*O*(*N*^{3}) operations for THC-MP2/THC-SOS-MP2. In Sec. II C, we show how analytic gradients for the *X* and *Z* tensors in Eq. (31) can be evaluated once $D\mu P$ and *Γ*^{PQ} have been computed.

Given the density matrices $D\mu P$, the MP2 energy can be directly calculated as

Therefore, in MP2 gradient calculations, one does not need to use the equations shown in Appendixes A and B in order to evaluate the MP2 energy. Rather, the MP2 energy is efficiently computed as a byproduct of the MP2 gradient computation.

### B. Calculation of the density matrices

In this section, we will discuss how the THC factorization is applied to reduce the computational scaling associated with the four density matrices needed for gradient evaluation: $\rho \mu \nu \kappa $, $\rho \xaf\mu \nu \kappa $, $D\mu P$, and *Γ*^{PQ}. The expressions for these density matrices are closely related to the MP2 energy expression, which allows most intermediate tensor operations in the energy implementation to be reused for gradient calculations. Similar to the MP2 energy expression, the density matrix $\rho \mu \nu \kappa $ can be decomposed into a Coulomb-like contribution $\rho \mu \nu \kappa ,J$ and an exchange-like contribution $\rho \mu \nu \kappa ,K$ as $\rho \mu \nu \kappa \u2009=\u2009cJ\rho \mu \nu \kappa ,J+cK\rho \mu \nu \kappa ,K$, where

A similar decomposition into Coulomb-like and exchange-like terms also applies to the other three density matrices $\rho \xaf\mu \nu \kappa $, $D\mu P$, and *Γ*^{PQ}. Diagrammatic representations illustrating the required tensor operations are provided for both Coulomb-like (Fig. 1) and exchange-like (Fig. 2) terms. In the following, we treat each of these density matrices in turn.

#### 1. OVV intermediate density matrix $\rho \mu \nu \kappa $

As shown in Fig. 1(a), the Laplace quadrature points are usually first contracted with the molecular orbital coefficients to form two transformation matrices, i.e., *T*^{κ,(occ)} defined in Eq. (A2) and *T*^{κ,(vir)} defined in Eq. (A3). Contracting the *X* tensor with *T*^{κ,(occ)} and *T*^{κ,(vir)} leads to the *Y* tensor defined in Eq. (A4) and the *U* tensor defined in Eq. (A5), respectively. The Coulomb-like contribution in $\rho \mu \nu \kappa $ can then be formulated as

As shown in Fig. 1(c), $\rho \mu \nu \kappa ,J$ requires one fewer contraction with *T*^{κ,(occ)} compared to the Coulomb-like energy calculation in Fig. 1(a). The expression in parentheses is also contained in the Coulomb-like energy expression in Eq. (A6) and corresponds to the *F* tensor defined in Eq. (A10). This allows Eq. (36) to be computed with *O*(*N*^{3}) operations as

where the *V* tensor in Eq. (38) is defined in Eq. (A8). The calculation of $\rho \mu \nu \kappa $ is slightly more expensive than the Coulomb-like energy calculation, as it requires an additional dense matrix multiplication shown as in Eq. (37).

Using the *Λ* tensor defined in Eq. (B2) and the transformation matrix *T*^{κ,(vir)}, the exchange-like contribution in $\rho \mu \nu \kappa $ can be expressed as

The calculation of Eq. (40) is also closely related to the exchange-like energy expression in Eq. (B4) and requires one fewer contraction with *T*^{κ,(occ)} as can be seen in the comparison of Figs. 2(a) and 2(c). The expression in parentheses is shared between the two calculations and corresponds to the *H* tensor defined in Eq. (B8). Thus we have

This indicates that the calculation of $\rho \mu \nu \kappa ,K$ also scales as *O*(*N*^{4}) and has almost the same computational cost as the MP2 exchange-like energy.

#### 2. OOV intermediate density matrix $\rho \xaf\mu \nu \kappa $

As shown in Eqs. (22) and (23), $\rho \mu \nu \kappa $ and $\rho \xaf\mu \nu \kappa $ have similar expressions, differing by the interchange of occupied and virtual indices. This allows them to be computed in similar ways.

For the Coulomb-like contribution in $\rho \xaf\mu \nu \kappa $, we have

As illustrated in Fig. 1(e), using the *g* tensor defined in Eq. (37) and the *O* tensor defined in Eq. (A7), it can be evaluated as

For the exchange-like contribution in $\rho \xaf\mu \nu \kappa $, using the $\Lambda \xaf$ tensor defined in Eq. (B3) and the transformation matrix *T*^{κ,(occ)}, we have

The expression in Eq. (45) does not share any intermediate tensors with the exchange-like energy term. Nevertheless, as shown in Fig. 2(e), we can compute the following tensor by following a similar procedure as Eqs. (89)–(91):

This requires *O*(*N*^{4}) operations, whereupon $\rho \xaf\mu \nu \kappa ,K$ can be computed as

#### 3. *X* tensor density matrix $D\mu P$

As shown in Eq. (32) of Sec. II A, since the *X* tensor is contracted with the occupied and virtual molecular orbital coefficients separately, the density matrix can also be divided into $DP\mu occ$ and $DP\mu vir$ correspondingly. The Coulomb-like contribution in the density matrix of the *X* tensor from the occupied orbitals is

As can be seen from the comparison of Figs. 1(c) and 1(d), the expression in parentheses is shared when calculating $DP\mu occ$ and $\rho \mu \nu \kappa ,J$, which is contracted into the intermediate tensor *h* defined in Eq. (38). Therefore, this density matrix can be computed as

The Coulomb-like contribution in the density matrix of the *X* tensor from virtual orbitals is

As shown in Figs. 1(f) and 1(e), $DP\mu vir,J$ depends on the intermediate tensor $h\xaf$ used to construct $\rho \xaf\mu \nu \kappa ,J$, as shown in Eq. (43). Reusing this intermediate,

Similarly, for the exchange-like term in the density matrix of the *X* tensor, $DP\mu occ,K$ in Fig. 2(d) and $DP\mu vir,K$ in Fig. 2(f) can be computed as

where *H* and $H\xaf$, respectively, defined in Eqs. (B8) and (47), are intermediate tensors shared with $\rho \mu \nu \kappa ,K$ and $\rho \xaf\mu \nu \kappa ,K$. From the above discussion, we see that calculation of the *X* tensor density matrices requires only four additional matrix multiplications after $\rho \mu \nu \kappa $ and $\rho \xaf\mu \nu \kappa $ have been calculated.

#### 4. *Z* tensor density matrix $\Gamma PQ$

The Coulomb-like term in the density matrix of the *Z* tensor is

As shown in Fig. 1(b), it requires one fewer contraction with the *Z* tensor compared to the energy evaluations in Fig. 1(a). Thus using the *A* tensor defined in Eq. (A9) and the *F* tensor in Eq. (A10), both from the Coulomb-like MP2 energy calculation, we obtain

Therefore, the construction of $\Gamma PQ,J$ also requires one additional dense matrix multiplication compared to the Coulomb-like energy calculation.

The exchange-like term in the density matrix of the *Z* tensor is

The operations in parentheses are also contained in the exchange-like energy evaluations and correspond to the *G* tensor in Eq. (B7). However, as shown in Fig. 2(b) because it requires one fewer contraction with the *Z* tensor compared to the energy evaluation, the remaining tensor cannot be contracted into another *G* tensor as in Fig. 2(a). Therefore, building the exchange-like density matrix of *Z* tensor requires some additional quartic scaling operations as

The scaling we present in some of the above equations and in the appendices has made use of the spatial sparsity in the AO-based *X* tensor, i.e., the number of non-negligible grid points remains roughly constant for each atomic orbital. However, we emphasize that even without exploiting spatial sparsity, the formal scaling remains *O*(*N*^{4}) for MP2 and *O*(*N*^{3}) for SOS-MP2, since all tensor operations involve at most four indices.

### C. Analytic gradients of the THC tensors

Once the density matrices have been constructed, the last remaining question is how to evaluate the analytic gradients of the THC tensors, i.e., *X* and *Z*. The *X* tensor defined in Eq. (2) is simply the values of the atomic orbitals at the gridpoints. Therefore, its analytic gradient can be evaluated in a similar way as density functional theory, including gradient contributions from the changes of atomic centers on the gridpoints, atomic orbitals, and Becke gridpoint weights.^{70,71} The analytic gradient of the *Z* tensor is more complicated and will depend on which type of THC approximation is being used.

For least-squares AO-THC, the expression for the *Z* tensor is

where the metric matrix in the physical space is defined as

The *Z* tensor in Eq. (62) contains two matrix inversions, i.e., the inversion of the density-fitting metric matrix (A|B) and the inversion of the physical space metric matrix *S*_{phys}. For a well-conditioned matrix, the derivative of its inverse can be computed as

However, *S*_{phys} is generally ill-conditioned, and its inverse matrix is computed as a pseudo-inverse by diagonalizing the matrix and neglecting small eigenvalues below a user-defined threshold (1.0 × 10^{−12} a.u. in our implementation). Because of the pseudo-inversion, Eq. (64) is not applicable. We currently solve this problem by regularizing the matrix as

where *I* represents the identity matrix and $\delta $ is a small regularization constant added to the diagonal. When the regularization constant is large enough to make *S*_{phys} well-conditioned, Eq. (64) can again be used and we assume this below. We discuss the effect of different regularization values in Sec. III.

The THC tensors with Eqs. (2) and (62) contain three contributions that must be considered: the atomic orbital density at the grid points $X\mu P$, the two-center-two-electron Coulomb repulsion integral (A|B), and the three-center-two-electron Coulomb repulsion integral $\mu \nu |A$. Therefore, Eq. (31) can be further broken down as the gradient contributions from each type of integral

where $d\mu \nu A1$, $dAB2$, and $dP\mu 3$ are corresponding density matrices, and details about their definition and evaluation are provided in Eqs. (C4), (C6), and (C10), respectively, in Appendix C.

The major computational bottleneck when computing these density matrices is the last term $\u2211A\nu cPAA|\mu \nu X\nu P$ in Eq. (C10). Theoretically it is possible to compute this term with *O*(*N*^{2}) operations, but the calculation then requires storage of the third-order ERI tensor. To avoid this potential storage bottleneck, we numerically integrate the density fitting integral,

We will use an over-tilde to denote all quantities related to the numerical integration. By using Eq. (67), we can factorize the three-index ERI tensor as the product of two-index tensors, leading to

There are two sets of gridpoints used in Eq. (70), i.e., the sparse gridpoints (denoted as *P*, *P*′, *Q*, *Q*′) used for the least-squares THC factorization and the dense grid points (denoted as $P\u0303$ and $Q\u0303$) used for numerical integration. The gridpoints used for numerical integration are described in Sec. I of the supplementary material. Normally, this grid is at least ten times larger than that used for the least-squares fitting. To make it clear that there exist two sets of gridpoints, we call this as dual-grid AO-THC. The quantities in Eqs. (68) and (69) are much simpler to evaluate compared with $\mu \nu |A$ and can be efficiently implemented and parallelized on graphics processing units (GPUs). Therefore, our implementation computes these values and integrals on the fly when needed without storing them in memory. Despite the large number of gridpoints needed for accurate numerical integrations, the dual-grid scheme is advantageous in terms of performance, as discussed in further detail in Sec. III.

There are four types of quantities appearing in the *Z* tensor of dual-grid AO-THC in Eq. (70), i.e., the two-center-two-electron Coulomb integral (A|B), the Coulomb integral on the grid $V\u0303P\u0303A$, $X\mu P$ on the sparse grid, and $X\u0303\mu P\u0303$ on the dense grids. Therefore, the gradient contributions can be broken down as

where the density matrices of integrals $d\u0303AB1$, $d\u0303P\u0303A2$, $d\u0303P\u0303\mu 3,$ and $d\u0303P\mu 4$ are provided in Eqs. (D4), (D5), (D7), and (D9), respectively, in Appendix D. As shown in Appendix D, the calculations of these density matrices only require tensor operations between two-index tensors such as matrix multiplications and thus scale formally as *O*(*N*^{3}).

For least-squares MO-THC, the *Z* tensor is computed as

where *p*, *q*, *r*, *s* are molecular orbital indices, and

Therefore, the analytic gradient of the *Z* tensor in Eq. (72) can be broken down into contributions from $X\mu P,\xi $, $A|B\xi $, $\mu \nu |A\xi $, and molecular orbital coefficients $Cp\mu \xi $. Evaluating the analytic derivative of the *Z* tensor will involve contributions from the response to changes of the molecular orbital coefficients. This necessitates further solution of coupled-perturbed Hartree-Fock equations. Although conceptually simple, the algebra is tedious and we leave the implementation of analytic gradients for least-squares MO-THC and dual-grid MO-THC to future work.

### D. GPU implementation

Our implementation uses graphical processing units (GPUs) to accelerate three parts of the analytic gradient calculation, as described below. First, all integrals as well as their corresponding analytic gradients are evaluated on GPUs. In our previous work,^{61,62} we have discussed in detail how to exploit the spatial sparsity of the atomic orbitals during integral evaluations on the GPU. We arrange the data by dividing the space surrounding the molecule into boxes and then populating the gridpoints, Gaussian primitive functions, and Gaussian primitive pairs into these boxes based on the locations of their centers. These techniques are adopted in the implementation for the analytic gradients of the integrals.

Second, our previous performance tests showed that the matrix diagonalization [which was performed on the central processing unit (CPU)] needed in the pseudo-inverse of the physical space metric matrix (*S*_{phys}) became a computational bottleneck for larger systems. To reduce the computational cost of this step, we now use the Magma library,^{72} a dense linear algebra library for GPUs, to diagonalize matrices with dimension larger than 5000.

Finally, the cuBLAS library^{73} is used to accelerate matrix multiplication (*dgemm*) on GPUs, including both matrix multiplications between second-order tensors and also high-order tensor operations that can be formulated as *dgemm*. For example, Eq. (61) can be evaluated with *dgemm* by treating *R* and *j* as a single compound index. When the tensors cannot fit into the limited GPU memory (around 3-12 Gb depending on the particular GPU being used), the tensors are divided into sub-tensors such that *dgemm* between subtensors can be computed on the GPUs.

All tensor operations that cannot be formulated as *dgemm* are computed on the CPU. Even though CPUs usually have much larger memory than GPUs, in some cases, storing high-order tensors can still become difficult for large molecules. An example is the exchange-like energy shown in Appendix B. When the sizes of the tensors exceed the memory limit, our current strategy is to divide the tensor over the occupied index *j* and loop through Eqs. (89)–(92) for each subset of *j*. This is not the only way to address the memory limit, and systematic optimization for high-order tensor operations is certainly an avenue for further research.

## III. RESULTS AND DISCUSSION

The method described above was implemented in the TeraChem^{74–77} quantum chemistry package (which currently supports atom-centered *spd* Gaussian basis sets and *spdf* Gaussian auxiliary basis sets). Density fitted MP2 and SOS-MP2 calculations were performed with Q_{CHEM}.^{78} All calculations were performed with the cc-pVDZ basis set^{79} and without the frozen core approximation. The THC grids optimized for THC-MP2 with cc-pVDZ basis sets^{60} were used for both the least-squares AO-THC (LS-AO-THC) and dual grid AO-THC (DG-AO-THC). Because the THC approximation is applied in the AO basis for all calculations in this paper, we will omit “AO” in the acronyms hereafter for simplicity. These THC grids contain around 60 gridpoints per hydrogen atom and around 120 gridpoints per each carbon, nitrogen, or oxygen atom. The grids for numerical integration in DG-THC are provided in Sec. I of the supplementary material, and these are composed of around 1200–1300 gridpoints for each atom. All timing data are collected using one CPU core and one GPU unless otherwise specified.

To ensure that THC-MP2 is valid for AIMD simulations, we start by testing its accuracy in describing potential energy surfaces. Using alanine dipeptide as an example, we computed MP2 correlation energies with both density fitting and THC on 576 different geometries generated by scanning over the *φ* and *ψ* angles. As shown in Fig. 3, LS-THC-MP2 provides a very accurate description of the potential energy surface, where the root mean square error (RMSE) with respect to the DF-MP2 is only 0.013 kcal/mol (ranging from −0.046 kcal/mol to 0.032 kcal/mol). By replacing analytical evaluations of the density fitting integrals with numerical integrations, the RMSE in the relative energies computed with DG-THC-MP2 becomes 0.050 kcal/mol (ranging from −0.182 kcal/mol to 0.146 kcal/mol). The errors introduced by numerical integration can always be decreased by increasing the number of gridpoints used for the numerical integration, as in density functional theory. When a regularization parameter of *δ* = 1.0 × 10^{−5} a.u. is applied, denoted as DG-THC-MP2 (δ = 1.0 × 10^{−5} a.u.), the errors in the relative energy increase (as expected), but the RMSE is still acceptable at 0.068 kcal/mol (ranging from −0.233 kcal/mol to 0.195 kcal/mol). We also computed the relative thermal probabilities of conformations at room temperature (298 K) from the relative total MP2 energies. As shown in Fig. S1 of the supplementary material, the overall impact of the energy errors on the probability density at 298 K is minor. Most of the probability density is concentrated in the (−75, +90) region of the plot, and the DG-THC-MP2 (δ = 1.0 × 10^{−5}) distribution only varies by a few percent in this region. We have also tested DG-THC-MP2 (*δ* = 1.0 × 10^{−5} a.u.) on numerous geometric configurations of (H_{2}O)_{6} water clusters. The maximum error in relative energies is less than 0.06 kcal/mol, as documented in Fig. S2 of the supplementary material. These results indicate that DG-THC-MP2 with regularization can provide an accurate description of the potential energy surface.

As we have discussed in Sec. II, DG-THC-MP2 was introduced to factorize the three-index ERI as the product of two index tensors, such that storing the three-index tensor can be avoided when computing the density matrices required for the analytic derivative. It is also interesting to investigate how the insertion of numerical integration benefits the performance of energy calculations. Integral evaluations on the GPUs usually benefit from the fact that GPUs are capable of launching thousands of threads simultaneously but are restricted by the limited register resources for each GPU thread. Therefore, compared with LS-THC that requires the more complicated three-center integrals, the DG-THC scheme has clear advantages on the GPUs because the required integrals are much simpler, as shown in Eq. (69), and can be easily parallelized over GPU threads. In addition, as we have mentioned, our implementation now uses the MAGMA library so that diagonalization of the metric matrix is also accelerated by the GPUs. As shown in Fig. S4 of the supplementary material, at least four times speedup is achieved in SOS-MP2 energy calculations by introducing DG-THC and the Magma library.

In Fig. 4, we show the performance of energy and gradient calculations with both THC-SOS-MP2 and THC-MP2. The THC-SOS-MP2 energy and gradient evaluations both exhibit sub-cubic scaling for the test systems (water clusters with up to 900 basis functions), and THC-MP2 energy and gradients both show sub-quartic scaling. In Fig. S5 of the supplementary material, we also compare the scaling of the gradient calculations with THC-SOS-MP2 and DF-SOS-MP2. These results indicate that the THC factorization can effectively reduce the scaling in both energy and gradient calculations. In addition, because computations on different Laplace quadrature points are independent, the method is well suited for parallelization. Our current implementation supports multi-threading parallelism within a node, where each thread consists of one CPU core and one GPU. In Fig. S7 of the supplementary material, we present the parallel efficiency of the current implementation. The speedup for the test system is close to linear with respect to the number of GPUs and reaches 6.7× speedup with 8 GPUs. Parallelization across multiple nodes will be studied in future work. In terms of the absolute timing, THC-SOS-MP2 and THC-MP2 have quite different behaviors. The gradient of THC-SOS-MP2 takes only about twice the time needed for THC-SOS-MP2 energy. Since THC-SOS-MP2 only requires the Coulomb-like term, its efficiency is as expected since the THC factorizations of the Coulomb-like density matrices all closely resemble the factorization of the Coulomb-like energy. In contrast, the gradient of THC-MP2 takes much longer than THC-MP2 energy. Compared to the THC factorization of the exchange-like energy, the exchange-like OVV density matrix $\rho \mu \nu \kappa $ in Eq. (40) has a nearly identical factorization and thus takes almost the same amount of time. The factorization of the OOV density matrix $\rho \xaf\mu \nu \kappa $ is similar but interchanges the virtual and occupied orbital indices. This implies an increased computational cost by a factor of five, corresponding to the ratio of the number of virtual and occupied orbitals. The factorization of the density matrix of the *Z* tensor $\Gamma PQ$ requires a few additional tensor operations, Eqs. (58)–(61), and therefore takes about twice the time needed for the energy evaluations. Combining all of these effects leads to a gradient which is almost eight times more costly than the energy in THC-MP2. Numerical timings for each of these components of the THC-MP2 gradient are provided in Fig. S6 of the supplementary material.

To investigate the effect of the regularization parameter on stability in AIMD simulations, we carried out several microcanonical AIMD simulations with a varying regularization parameter and observed the degree to which total energy was conserved. We first ran simulations on an alanine aldehyde molecule with DG-THC-SOS-MP2. Initial velocities were drawn from a Maxwell-Boltzmann distribution at 1500 K, and a velocity Verlet integrator was used with a 0.2 fs time step. We select one representative trajectory for each regularization parameter and plot the kinetic, potential, and total energies as a function of time for 1 ps in Fig. 5. The cutoff of eigenvalues during pseudo-matrix inversion is set to 1.0 × 10^{−12} a.u. As shown in Fig. 5, a larger regularization parameter significantly improves the total energy conservation. For both δ = 1.0 × 10^{−7} a.u. and δ = 1.0 × 10^{−8} a.u., there are notable discontinuities appearing in the trajectories, and calculations in the AIMD simulations soon fail to converge. With δ = 1.0× 10^{−6} a.u., both the number and size of the discontinuities become much smaller. With δ = 1.0 × 10^{−5} a.u., the total energy conservation becomes acceptable. To verify that the methods and implementations of THC-MP2 gradients are also correct, we have also performed similar tests on an acetic acid molecule with DG-THC-MP2 under the same initial settings (1500 K and 0.2 fs time step). Fig. S8 of the supplementary material shows energies as a function of time for 2 ps. The trajectory with δ = 1.0 × 10^{−7} a.u. contains notable discontinuities. With regularization δ = 1.0 × 10^{−5} a.u., the energy conservation is significantly improved, and the total energy drift is 0.0164 (kcal/mol)/ps, corresponding to 6.8 × 10^{−4} kcal/mol per degree of freedom per ps.

Given that trajectories with regularization δ = 1.0 × 10^{−5} a.u. are able to satisfy energy conservation requirements, we proceeded to test energy conservation using THC gradients for longer AIMD simulations. Figure 6 shows the results from AIMD simulations on one alanine aldehyde molecule with DG-THC-SOS-MP2 (δ = 1.0 × 10^{−5} a.u.) for 10 ps with initial velocities drawn from a Maxwell-Boltzmann distribution at 1500 K and a 0.2 fs time step. As shown in Fig. 6, the trajectory remains stable for the entire 10 ps, and the fluctuation in total energy is unnoticeable on the scale of the fluctuations in the kinetic and potential energies. The drift of the total energy is 4.755 × 10^{−4} (kcal/mol)/ps, corresponding to 1.320 × 10^{−5} kcal/mol per degree of freedom per ps. Further tests are provided in the supplementary material (Fig. S9) for AIMD simulations of an acetic acid molecule with DG-THC-MP2 (δ = 1.0 × 10^{−5} a.u.). These results demonstrate that our gradient implementation gives the correct nuclear gradients with respect to the THC-MP2 potential energy surfaces.

In addition to AIMD simulations, the THC-MP2 and THC-SOS-MP2 gradients developed can also be used in geometry optimizations, and one such result is presented in Fig. 7. The geometry optimization was carried out using the geometric optimizer,^{80} and the convergence criteria were set as specified in the figure caption. The starting structure was taken from PDB-ID 2ONW, an X-ray crystal structure of a fibril-forming peptide (sequence SSTSAA). Our energy minimization, which was carried out on a single molecule in the gas phase, caused multiple changes in the structure. The overall structure contracted during the minimization process, with the end-to-end distance between the two α-carbons on S1 and A6 decreasing from 16.8 to 10.6 Å. We observed the formation of one intramolecular hydrogen bond between the OH groups of the S2 and S4 side chains with an equilibrium distance of 1.72 Å. The S4 side chain also formed a hydrogen bond with the backbone CO group of the same residue with a distance of 1.82 Å. The chemical structure remained the same throughout the optimization with no proton transfer or other chemical changes observed.

## IV. CONCLUSIONS

In this work, we derived and implemented analytic gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units (GPUs). We have achieved formal quartic/cubic scaling with molecular size for MP2/SOS-MP2 gradient evaluations. In practice, the observed scaling up to 1000 basis functions is even less steep—*O*(*N*^{3.6})/*O*(*N*^{2.4}) for MP2/SOS-MP2, respectively. We applied the analytic gradients in the context of *ab initio* molecular dynamics simulations and observed good energy conservation (10^{−5} kcal/mol per ps per degree-of-freedom or better), demonstrating the robustness and accuracy of our formulation and its implementation.

In order to avoid difficulties in the differentiation of the inverse of an ill-conditioned matrix that is present in THC methods, we have used a regularization procedure. This introduces small errors in the correlation energies (less than 0.25 kcal/mol in the test molecules used here). Even though these errors are small and do not cause difficulties with energy conservation in dynamics (implying that there are no significant discontinuities in the resulting potential energy surface), this remains a point for further improvement that we will pursue in future work. Specifically, it would be useful to investigate alternative strategies to ensure that the matrices arising in THC remain well-conditioned.

In our current implementation, calculation of the Coulomb-like terms is significantly faster than the exchange-like terms. In part, this is because the Coulomb-like terms involve only two-index tensors, which fit nicely in the BLAS framework. Therefore, the tensor operations in the Coulomb-like terms greatly benefit from the mature BLAS libraries available both on the CPU (e.g., MKL) and also on the GPU (e.g., MAGMA and cuBLAS). In contrast, the exchange-like terms involve operations between higher-order tensors, and some of the required operations do not fit well into the existing BLAS framework. These operations provide challenging questions for optimization regarding balancing of memory usage and floating point operations and distribution of work between the CPU and GPU. Therefore, a high performance high-order tensor library that utilizes both the CPU and GPU efficiently, and that further recognizes and exploits sparsity patterns in the tensors, would be very useful in order to extend our work to large molecules and long time scales.

Now that analytic gradients are available for tensor hyper-contraction, we are currently working to develop low scaling analytic gradients for other correlation methods within the THC framework. For example, it remains to develop analytic gradients for THC-EOM-CC2,^{59} which can potentially enable AIMD simulations for larger molecules on the excited states. In addition, the development of THC-based methods so far is mostly within the framework of Gaussian orbitals, and the corresponding applications are primarily targeting non-periodic molecular systems. To enable applications to extended systems, plane wave based approaches are required, and such methods with high-performing implementations are already available for DF-MP2.^{55} Recently, it has been shown that THC is also applicable in cases with plane wave basis sets,^{81} and the approaches used here for analytic gradients will likely also be useful in that context.

## SUPPLEMENTARY MATERIAL

See supplementary material for details of numerical grids, further performance evaluation including parallelization over multiple GPUs, and further energy conservation tests.

## ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (Grant No. CHE-1565249) and used the XStream computational resource supported by the NSF MRI Program (Grant No. ACI-1429830). C.S. is grateful for a Stanford Graduate Fellowship.

### APPENDIX A: MP2 ENERGY COULOMB-LIKE TERM WITH AO-THC

The Coulomb-like term of the MP2 energy is

We first define the following transformation matrices and tensors:

*E*_{MP2-J} can then be written in the AO basis with THC integrals as

This can be evaluated in *O*(*N*^{3}) operations as follows:

### APPENDIX B: MP2 ENERGY EXCHANGE-LIKE TERM WITH AO-THC

The exchange-like part of the MP2 energy is

We first define the following two tensors:

With these definitions, the exchange-like term of the MP2 energy becomes

which can be computed in *O*(*N*^{4}) operations as follows:

### APPENDIX C: INTEGRAL DENSITY MATRICES FOR LEAST-SQUARES AO-THC

For the *Z* tensor of LS-AO-THC defined in Eq. (62), we first define the following tensors that are both parts of the *Z* tensor:

The density matrix for $\mu \nu |A$ can then be computed as

Inserting the derivative of the metric matrices,

The density matrix for (A|B) is then

The derivative of the physical space metric matrix is

where

The density matrices for *X* can then be computed as

### APPENDIX D: INTEGRAL DENSITY MATRICES FOR DUAL-GRID AO-THC

For the *Z* tensor defined in Eq. (70), which has the density fitting integral evaluated through numerical integration, we first define the following tensors that are parts of the *Z* tensors:

Similar to Eq. (C6), the density matrix for (A|B) becomes

The density matrix for $V\u0303P\u0303A$ can be computed as

The density matrix for $X\u0303\mu P\u0303$ can be computed as

The density matrix for $X\mu P$ can be computed as

## REFERENCES

_{2}O)

_{16}and (H

_{2}O)

_{17}: A new global minimum for (H

_{2}O)

_{16}

^{4}) time