We report an implementation of the molecular gradient using the divideexpandconsolidate resolution of the identity secondorder MøllerPlesset perturbation theory (DECRIMP2). The new DECRIMP2 gradient method combines the precision control as well as the linearscaling and massively parallel features of the DEC scheme with efficient evaluations of the gradient contributions using the RI approximation. We further demonstrate that the DECRIMP2 gradient method is capable of calculating molecular gradients for very large molecular systems. A test set of supramolecular complexes containing up to 158 atoms and 1960 contracted basis functions has been employed to demonstrate the general applicability of the DECRIMP2 method and to analyze the errors of the DEC approximation. Moreover, the test set contains molecules of complicated electronic structures and is thus deliberately chosen to stress test the DECRIMP2 gradient implementation. Additionally, as a showcase example the full molecular gradient for insulin (787 atoms and 7604 contracted basis functions) has been evaluated.
I. INTRODUCTION
Electronic energy derivatives with respect to nuclear coordinates play a central role in contemporary computational chemistry as they open a possibility of calculating a variety of properties.^{1} For instance, the first energy derivatives are used in the determination of equilibrium and saddle point geometries of molecules, and the second derivatives are used for calculating force constants, harmonic vibrational frequencies, and infrared (IR) and Raman intensities. Higher derivatives are the basis for calculating more subtle effects revealing themselves in spectroscopy as vibrationrotation constants, anharmonic constants, sextic distortion constants, etc.^{2}
The theory of selfconsistent field (SCF) energy derivatives has its roots in the work of Bratoz.^{3} More modern developments with major applications were brought forward by the work of Pulay.^{4} In his pioneering work, Pulay demonstrated that properly implemented analytic SCF gradients are not only more accurate but also more efficient than numerical derivatives. This work laid the foundation for the correlated wave function methods, which then followed for a variety of models—MøllerPlesset perturbation theory (MP2,^{5} MP3,^{6} MP4^{7}), configuration interaction (CI),^{8–11} and coupledcluster (CC)^{12–15} gradients.
In this work we focus on the MP2 molecular gradient. Formulations of the MP2 molecular gradient in the canonical molecular orbital (MO) basis^{5,16–27} suffer from a highorder computational scaling with system size and thus cannot be used for large molecular systems. Efficient implementations extend the applicability range,^{28} but for very large molecular systems MP2 gradients are still out of reach using the traditional MO formulation.
The computational cost of the conventional MP2 gradient implementation can be expressed as kN^{5}, where N is the number of the basis functions in the calculation and k is a prefactor. Obviously, the computational cost may be reduced using two different strategies: (i) reducing the prefactor k and (ii) reducing the N^{5} scaling to a lower power, ideally to a linear dependence on N. Either of these strategies requires that approximations are introduced in the conventional MP2 algorithm.
One of the most common strategies to reduce the prefactor k is to use the resolution of the identity (RI) approximation for the evaluation of twoelectron integrals. The RI method has its roots in the work of Baerends et al.^{29} It was further developed by Dunlap^{30,31} and by Almløf^{32} and Ahlrichs.^{33,34} The RI method is now routinely used for the evaluation of gradients^{35–39} and has also been extended to Hessian evaluation.^{40–42}
To reduce the N^{5} scaling of the MP2 and other correlated models it is necessary to exploit the local nature of the electron correlation. The idea of utilizing the wave function locality originates in the work on local CC methods by Pulay^{43} and Saebø and Pulay,^{44} which has been further developed and matured by the works of Hampel and Werner^{45} and Schütz and Werner.^{46–48} Many other methods for obtaining reduced scaling have been proposed, including atomicorbitalbased CC,^{49–51} the naturallinearscaling approach,^{52} the clusterinmolecule approach,^{53–55} the smooth local CC approach,^{56} the divideandconquer approach,^{57} the fragmentmolecularorbital approach,^{58} the incremental scheme,^{59,60} LaplaceMP2,^{61–64} the dynamically screened local correlation method,^{65} local pair natural orbital approximations,^{66,67} and orbitalspecific virtual orbital approximations.^{68,69}
Our group has recently proposed an alternative local scheme—the divideexpandconsolidate (DEC) method. The method allows the CC energy to be evaluated in a linearscaling and massively parallel fashion.^{70–72} In the DEC scheme the standard CC calculation is divided into a set of independent CC fragment calculations each using their own subset of the total orbital space. This is possible because we use localized occupied and localized unoccupied HF orbitals. A crucial feature of the DEC method is the precision control obtained by systematically expanding the local orbital spaces in a blackbox manner during the calculation to ensure that the fragment energies are determined to a preset threshold denoted the fragment optimization threshold (FOT).^{73} This, in turn, defines the total CC correlation energy as a sum of the fragment energies to a preset threshold compared to a standard molecular calculation. Moreover, the DEC scheme can be extended to the geometric derivatives without changing the underlying structure of the method.^{74}
Several other local correlation methods have been extended to enable the evaluation of the MP2 molecular gradient. For example, MP2 molecular gradients were implemented in the context of the local CC method by Werner and coworkers,^{75} and this approach was also extended to employ densityfitting techniques.^{76} The MP2 molecular gradient has also been presented for the fragmentmolecularorbital method.^{77} Furthermore, Schweizer et al.^{78} reformulated the MP2 gradient equations in the atomic orbital (AO) basis using Laplace transformations of the energy denominator.^{79}
In the present article, we report an implementation of the DECRIMP2 gradient as an extension of the DECMP2 molecular gradient developed in Ref. 74. The DECRIMP2 inherits the precision control as well as the linearscaling and massively parallel attributes of the DECMP2 gradient. Thus, the RI approximation does not change the formal linear scaling of the DEC method, which comes from the division of the total orbital space into many small local orbital spaces. Rather, the use of the RI approximation lowers the cost of the gradient contribution evaluations for each individual fragment. The combination of the DEC and RI approximations results in a method capable of calculating molecular gradients for very large molecular systems. The errors of the DECRIMP2 gradients are analyzed using a test set of supramolecular complexes, and the results demonstrate the general applicability of the DEC method. The test set contains molecules of complicated electronic structures and is thus deliberately chosen to stress test our DECRIMP2 implementation. Furthermore, as a showcase example the full molecular gradient for insulin is evaluated.
The paper is structured as follows. Section II describes the general DECMP2 energy and gradient theory followed by a discussion of the modifications needed for the evaluation of the RI approximated terms. Section III contains numerical results on a test set of very large molecular systems, including a molecular gradient calculation on insulin. In Section IV we give some concluding remarks and future perspectives.
II. THEORY
In Sections II A and II B we summarize the main equations defining the DECMP2 energy and DECMP2 molecular gradient,^{74} respectively. In Section II C we apply RI approximations to the DECMP2 gradient equations to obtain expressions for the DECRIMP2 gradient. Throughout the paper we use the following index conventions.

α, β, μ, ν: atomic orbital (AO) indices.

i, j, k, l: occupied molecular orbital (MO) indices.

a, b, c, d: virtual MO indices.

p, q, r, s: MO indices of unspecified occupation.

Υ, Λ, Φ: auxiliary AO indices.
A. DECMP2 energy
The MP2 correlation energy for a closedshell system may be written as
where g_{pqrs} is a twoelectron repulsion integral (ERI) using the Mulliken notation, g_{pqrs} = (pqrs). The MP2 amplitudes $ t i j a b $ are determined by solving the amplitude equations,
where F is the Fock matrix. Equations (1) and (2) are valid for any choice of optimized HF basis. In the following we assume that the HF orbitals have been localized.
In the DEC scheme, the localized HF orbitals are assigned to atomic sites P, Q, R, … (e.g., based on Löwdin atomic charges^{73}), and the calculation of the correlation energy in Eq. (1) is partitioned into many small and independent fragment calculations. The occupied and virtual MOs assigned to atomic site P are denoted $ P \xaf $ and $ P \xaf $, respectively. In this work, we consider two different partitioning schemes. In the occupied partitioning scheme, the correlation energy $ E corr o $ is written as
where the atomic fragment energy $ E P o $ and pair fragment energy $\Delta E P Q o $ are given by
and $ [ P \xaf ] $ is the set of localized virtual HF orbitals spatially close to atomic site P. The virtual summation restriction is possible due to the locality of the twoelectron integrals and MP2 amplitudes when expressed in the localized HF basis.^{71}
Alternatively, the correlation energy may be evaluated using the virtual partitioning scheme,^{72}
where $ [ P \xaf ] $ is the set of localized occupied HF orbitals spatially close to atomic site P.
The amplitudes entering Eqs. (4) and (7) are determined by solving the MP2 amplitude equations in Eq. (2) in a local orbital space where the occupied and virtual orbitals are restricted to the $ [ P \xaf ] $ and $ [ P \xaf ] $ spaces, respectively. Similarly, the MP2 amplitudes for pair fragments are determined by solving the MP2 amplitude equations in union of spaces from atomic fragment calculations. In practice, $ [ P \xaf ] $ and $ [ P \xaf ] $ are determined in a black box manner to ensure that the error in the atomic fragment energy $ E P o $ is below the fragment optimization threshold (FOT) as described in Ref. 73. The occupied and virtual partitioning schemes in Eqs. (3) and (6), respectively, are equally valid strategies for partitioning the correlation energy into a sum of atomic fragment and pair fragment contributions. Since the local orbital spaces are determined to the FOT precision in a blackbox manner, both approaches yield the total correlation energy to FOT precision.^{72} For the energy evaluation it is thus sufficient to consider either Eq. (3) or Eq. (6). However, for the gradient evaluation, the locality analysis in Ref. 74 shows that some terms are most easily evaluated using the occupied partitioning scheme, while other terms require the virtual partitioning scheme to exploit the locality of the associated MO integrals.
For atomic fragment P, the local HF orbitals are also expanded in a restricted set of atomic orbitals, the socalled atomic extent (AE) {P}, which is a subset of the full set of atomic orbitals in the vicinity of atomic site P. The AE is determined such that the approximate MO $ \varphi \u0303 r P $ assigned to atomic site P,
resembles the true MO $ \varphi r P $ as much as possible in a least squares sense.^{73}
In principle, the number of fragments to consider scales quadratically with system size. However, the pair fragment energies for distant pairs describe dispersion effects, which decay with the inverse pair distance to the sixth power.^{72} Pair fragments for distant atoms may therefore be omitted without affecting the precision of the total correlation energy. When pair screening is employed, the DECMP2 scheme becomes linearscaling with system size. In the simplest implementation a distance cutoff can be used to screen away pairs with negligible energy contributions. A more sophisticated pair screening procedure will be presented in a forthcoming publication.
B. DECMP2 molecular gradient
The DECMP2 molecular gradient expressions were derived in detail in Ref. 71. Here we summarize the main equations that define the DECMP2 molecular gradient to provide the necessary framework for discussing the RI approximations in Section II C.
The DECMP2 molecular gradient is obtained by differentiating the MP2 Lagrangian L_{MP2} with respect to the nuclear displacements x for the 3 Cartesian coordinates for each atom and subsequently applying locality approximations to simplify the evaluation of the gradient intermediates. The DECMP2 molecular gradient for a nuclear displacement x, $L MP2 ( x ) $, may be written as
In Eq. (10) and throughout the paper the notation Z^{(x)} refers to a partial derivative, i.e., the AO integrals contained in Z are differentiated with respect to nuclear displacement x, while the wave function parameters (MO coefficients and MP2 amplitudes/multipliers) contained in Z are not. The nuclear repulsion contribution to the energy is denoted $h nuc ( x ) $, and $L \Theta ( x ) $ contains the terms where MP2 multipliers $ t \u0304 i j a b $ are contracted with differentiated twoelectron integrals,
where the MP2 Lagrange multipliers can be calculated directly from the MP2 amplitudes according to
The oneelectron $L 1el ( x ) $, Coulomb/exchange $L couex ( x ) $, and reorthonormalization $L reort ( x ) $ contributions to the gradient can be expressed as
where $S \mu \nu ( x ) $ is the differentiated AO overlap matrix and $h \mu \nu ( x ) $ is a differentiated oneelectron integral in the AO basis,
with Z_{K} and r_{K} representing the nuclear charge and the nuclearelectronic distance for nucleus K, respectively. The G(A) transformation on a general matrix A is given as the sum of a Coulomb contribution J(A) and an exchange contribution K(A),
and the corresponding quantities with differentiated twoelectron AO integrals (not differentiated matrix A) are defined as
D is the HF density matrix,
and ρ^{AO} is the MP2 correlation density matrix in the AO basis, which is connected to the corresponding MO matrix ρ^{MO} via the following transformation:
ρ^{MO} is partitioned into occupied/virtual blocks,
The locality analysis in Ref. 74 shows that the X intermediate is most efficiently evaluated using the virtual partitioning scheme in the following manner:
Note that (X_{P})_{ij} is zero if i and/or j is outside the $ [ P \xaf ] $ space, and similarly for (ΔX_{PQ})_{ij}. Similarly, using the locality arguments of Ref. 74, the Y intermediate may be evaluated using the occupied partitioning scheme,
The $ \kappa \u0304 a i $ multipliers are determined by solving coupledperturbed HF equations of the form
where $ \kappa \u0304 s $ has been symmetrized and transformed to the AO basis,
and M is determined from the X and Y matrices,
$ G d l ( \kappa \u0304 s ) $ and G_{dl}(M) are determined using Eq. (20) where the first and the second index are transformed to the virtual and the occupied MO basis, respectively, e.g.,
The Φ matrix is written in block form,
where o/v denotes occupied/virtual orbital indices, and the different blocks are given by
The reorthonormalization contribution in Eq. (17) is written in terms of the intermediate matrix W defined as
where all matrices are given in the AO basis, and $ D \u0304 $ is an effective density matrix, which is obtained by allowing the indices in Eq. (26) to run over occupied as well as virtual orbitals,
The DECMP2 gradient scheme may thus be summarized as follows.

Perform HF calculation and localize occupied and virtual HF orbitals.

For each atomic fragment P, perform the following.

Calculate MP2 amplitudes $ t i j a b $ for the occupied partitioning scheme ($ij\u2208 P \xaf $ and $ab\u2208 [ P \xaf ] $) and the virtual partitioning scheme ($ab\u2208 P \xaf $ and $ij\u2208 [ P \xaf ] $).

Calculate fragment intermediates $L \Theta , P ( x ) $, X_{P}, Y_{P}, and Φ_{P} using Eqs. (12), (30), (33), (44), (41), (47), and (50).

Update full intermediates $L \Theta ( x ) $, X, Y, and Φ with calculated fragment intermediates using Eqs. (11), (29), (32), (43), (40), (46), and (49).


Repeat step 2 for the important pair fragments (the number of pair fragments to consider scales linearly with fragment size when pair screening is employed).

Solve the coupledperturbed HF equations in Eq. (35) to determine the $ \kappa \u0304 $ matrix.

Calculate the molecular gradient using Eq. (10).
C. DECRIMP2 molecular gradient
Having summarized the working equations for the DECMP2 molecular gradient, we now apply the RI approximation using the symmetric decomposition^{32,35} to write the g_{aibj} integral as
where g denotes the RI approximated twoelectron repulsion integrals (ERIs), (aiΛ) is a 3center ERI, (ΛΦ) is a 2center ERI, $ C a i \Phi = \u2211 \Lambda ( a i  \Lambda ) ( \Lambda  \Phi ) \u2212 1 / 2 $ defines the symmetrically decomposed fitting coefficients, and capital greek letters {Υ, Λ, Φ} refer to auxiliary AO indices.
Calculating the g_{aibj} integrals in Section II A according to Eq. (54) with auxiliary AOs assigned to atomic sites in the AE (Υ ∈ {P} for atomic fragment P) defines the DECRIMP2 model for the evaluation of the correlation energy.^{80} The current DECRIMP2 implementation does not use RI approximations for the initial HF calculation, and the Fock matrix has thus been calculated without employing the RI approximation. Consequently, in the MP2 amplitude equations in Eq. (2), the RI approximation in applied for the g_{aibj} integrals, but not for the Fock matrix elements.
For the gradient evaluation it is convenient to use the nonsymmetric formulation where integrals are expressed in the following manner:
with the nonsymmetric fitting coefficients $ D a i \Upsilon = \u2211 \Lambda ( \Upsilon  \Lambda ) \u2212 1 ( \Lambda  a i ) $. The formulation using the symmetrically decomposed fitting coefficients is convenient for the energy evaluation as the storage requirements are reduced by a factor of 2, since the storage of the 3center ERIs (aiΥ) is not required. However, in the case of the integral derivatives, the symmetric formulation does not lead to reduced memory requirements, and the formulation using the nonsymmetric fitting coefficients leads to simpler expressions.
As described above, the initial HF calculation does not employ the RI approximation, while the correlated part of the calculation does. For this reason, only some of the integrals of the DECRIMP2 gradient are evaluated using the RI approximation. In summary, the DECRIMP2 molecular gradient may be evaluated using the procedure summarized at the end of Section II B with the following modifications for each fragment calculation.

Determine RIMP2 amplitudes $ t i j a b $ (and hence the associated multipliers $ t \u0304 i j a b =4 t i j a b \u22122 t i j b a $) by solving the amplitude equation in Eq. (2) using RIapproximated ERIs^{80} calculated using the symmetric formulation (Eq. (54)).

Evaluate all ERIs entering the Φ matrix in Eqs. (40) to (51) using the symmetric RI formulation.
 Replace the derivative ERIs $ g a i b j ( x ) $ by $ g a i b j ( x ) $ calculated using the nonsymmetric RI formulation, i.e.,$ g a i b j ( x ) = \u2211 \Upsilon ( a i  \Upsilon ) ( x ) D b j \Upsilon \u2212 \u2211 \Lambda \Upsilon ( a i  \Lambda ) ( \Lambda  \Upsilon ) ( x ) D b j \Upsilon + \u2211 \Upsilon D a i \Upsilon ( \Upsilon  b j ) ( x ) . $
The last point results in the modified $ L \Theta ( x ) $ terms
Note that the derivative integrals in Eq. (23) are not to be replaced according to Eq. (56) as these derivative integrals originate from differentiation of Fock matrix elements, which have been calculated without employing the RI approximation (since the HF calculation does not employ the RI approximation).
D. Implementation
Before discussing the computational results we review a few aspects of the implementation with emphasis on the RI modified $ L \Theta ( x ) $ fragment terms in Eq. (57). Exploiting permutational symmetry and rearranging the terms $ L \Theta ( x ) $ (without involving any approximations) may be written as
Defining the two intermediates
we may write $ L \Theta ( x ) $ in the simple form
A similar expression may be obtained for the atomic fragments and pair fragment contributions in Eqs. (58) and (59),
where we have introduced,
Table I shows the storage requirements and the computational cost for constructing the intermediates of the $L \Theta , P ( x ) $ contribution to the DECRIMP2 gradient calculation. The computational cost is determined as follows:

The occupied MOs assigned to P ($ P \xaf $) denoted the occupied energy orbital space (EOS) of dimension O_{EOS}.

The occupied/virtual MOs that are local to P ($ [ P \xaf ] / [ P \xaf ] $) denoted the amplitude orbital space (AOS) of dimension O_{AOS}/V_{AOS}.

The number of standard AOs N_{AO} in the AE ({P}).

The number of auxiliary AOs N_{aux} in the AE.

The number of atoms N_{atoms} in the AE.
Note that these dimensions all refer to an atomic fragment P, not the full molecular system. For pair fragments we use unions of these spaces for the atoms involved (e.g., P and Q).
Intermediate .  Storage .  Computational cost . 

$ t i j a b $  $ O EOS 2 V AOS 2 $  $ N aux O AOS 2 V AOS 2 $ 
$ t \u0304 i j a b $  $ O EOS 2 V AOS 2 $  $ O EOS 2 V AOS 2 $ 
$ \Theta i a \Upsilon P $  N _{aux} O _{EOS} V _{AOS}  $ N aux O EOS 2 V AOS 2 $ 
$ G \Lambda \Upsilon P $  $ N aux 2 $  $ N aux 2 O EOS V AOS $ 
$L \Theta , P ( x ) $  3N_{atoms}  $3 N atoms N aux O EOS V AOS +3 N atoms N aux 2 $ 
(ΛΥ)^{(x)}  $3 N atoms N aux 2 $  $3 N atoms N aux 2 $ 
(akΥ)^{(x)}  3N_{atoms}N_{aux}O_{EOS}V_{AOS}  $3 N atoms N aux N AO 2 O EOS $ 
Intermediate .  Storage .  Computational cost . 

$ t i j a b $  $ O EOS 2 V AOS 2 $  $ N aux O AOS 2 V AOS 2 $ 
$ t \u0304 i j a b $  $ O EOS 2 V AOS 2 $  $ O EOS 2 V AOS 2 $ 
$ \Theta i a \Upsilon P $  N _{aux} O _{EOS} V _{AOS}  $ N aux O EOS 2 V AOS 2 $ 
$ G \Lambda \Upsilon P $  $ N aux 2 $  $ N aux 2 O EOS V AOS $ 
$L \Theta , P ( x ) $  3N_{atoms}  $3 N atoms N aux O EOS V AOS +3 N atoms N aux 2 $ 
(ΛΥ)^{(x)}  $3 N atoms N aux 2 $  $3 N atoms N aux 2 $ 
(akΥ)^{(x)}  3N_{atoms}N_{aux}O_{EOS}V_{AOS}  $3 N atoms N aux N AO 2 O EOS $ 
Naturally the storage in memory of (akΥ)^{(x)} and (ΛΥ)^{(x)} quickly becomes infeasible. However, using a batching of the auxiliary functions (and the AOs if needed) these integrals may be constructed in parts and immediately contracted with $ \Theta k a \Upsilon P $ and $ G \Lambda \Upsilon P $. The time critical part of the DECRIMP2 molecular gradient scheme is the evaluation of the (akΥ)^{(x)} derivative integral which depends critically on the size of the atomic extent of the fragment N_{atoms}.
III. NUMERICAL RESULTS
A. Computational details
All DECRIMP2 calculations were done using a development version of the LSDalton electronic structure program, which is a part of Dalton2016 suite.^{81,82} The reference numbers were obtained with ORCA suite of programs version 3.0.^{83,84} Correlation consistent basis set of doubleζ quality (ccpVDZ) was used for all calculations,^{85} and the corresponding matching ccpVDZRI basis set was used for the RI terms. The selfconsistentfield equations were converged using the “Tight” keywords for both programs, which corresponds to the energy change of 10^{−8} hartree. The coupledperturbed HF equations Eq. (35) converged to reach a residual norm of 10^{−8}. The fragment optimization threshold (FOT) in the DEC calculations was chosen based on our previous investigation^{74} to target an accuracy of 10^{−3} hartree/angstrom in the final molecular gradient.
We have employed the S12L test set of Grimme et al.^{86} to estimate the errors of the DEC approximation. The test set consists of supramolecular complexes of up to 158 atoms (the cucurbit[7]uril (CB7) host inclusion complex with the bis(trimethylammoniomethyl)ferrocene (FECP) was excluded as it has an openshell metal core). The test set contains molecules of complicated electronic structures and is thus deliberately chosen to demonstrate the general applicability of the DEC method.
The individual complexes of the test set are as follows (see Figure 1): two “tweezer” complexes with tetracyanoquinone (TCNQ) and 1,4dicyanobenzene—TCNA@tweezer and DCB@tweezer; two “pincer” complexes of organic πsystems—3c@pincer and 3d@pincer; fullerenes C60 and C70 in a socalled “buckycatcher”—C60@catcher and C70@catcher; an amide macrocycle (mcycle) that binds benzoquinone (BQ) and glycine anhydride (GLH)—BQ@mcycle and GLH@mcycle; the cucurbit[6]uril (CB6) cation complexes of butylammonium (BuNH4) and propylammonium (PrNH4)—BuNH4@CB6 and PrNH4@CB6; and the cucurbit[7]uril (CB7) host with 1hydroxyadamantane (ADOH)—ADOH@CB7. For the insulin molecule considered in Section III C, the molecular geometry was obtained as described in Ref. 87 where the original geometry was taken from Ref. 88. All molecular geometries used in this work are listed in the supplementary material.^{89}
All DECRIMP2 gradient calculations were run on the Titan supercomputer, a Cray XK7 system deployed by the US Department of Energy at the Oak Ridge National Laboratory. Each Titan node consists of a 16core AMD Opteron 6274 processor with 32 GB of main memory. Titan’s nodes are connected according to a 3D torus topology with the Cray Gemini network interfaces. The applicationlevel communication layer is provided by the Cray MPI library implementing most of the MPI 3.0 features, a derivative of MPICH2 from the Argonne National Laboratory.
B. DEC errors
The DEC errors were estimated by comparing the results of conventional RIMP2 calculations with those obtained using the DECRIMP2 method. Direct comparison with the conventional implementation is the most efficient and simple way to conduct the error analysis provided that the reference analytical molecular gradient calculation can be carried out. The DEC method is aimed at calculations on very large molecules, and we therefore consider a test set of rather large molecular systems (Figure 1). In order to enable the determination of reference numbers for this test set, we have restricted ourselves to the correlation consistent basis set of doubleζ quality. Obtaining reference gradients without the RI approximation to test the RI performance benefits and errors is impossible for the considered test set. Moreover, the RI acceleration and errors are now very well understood and documented in the literature.^{35–39} In the DEC context, introducing the RI approximation means that each fragment calculation benefits from RI approximation in the very same way as any conventional RIMP2 calculation. Thus, the RI performance benefits and errors are out of scope for the paper and the presented error analysis reflects solely errors coming from DEC approximation.
Tables II and III summarize the results of the test set calculations and associated DEC errors. As it is evident from the tables, the absolute maximum DEC errors are of the order 10^{−3} hartree/angstrom. This is in accordance with the results obtained previously^{74} for the DECMP2 gradient for smaller molecular systems. Also, the DECRIMP2 gradient errors are in general independent of the system size or type of electronic structure, even for the highly delocalized C60@catcher and C70@catcher. The full molecular gradients are given in the supplementary material^{89} for future comparison with other methods.
Complex .  Basis/AUX basis .  HF .  Correlation .  Total energy .  Gradient norm . 

TCNA@tweezer  982/3724  −2282.511 183 59  −7.7317  −2290.2429  0.1542 
DCB@tweezer  898/3388  −2022.198 430 64  −6.9032  −2029.1016  0.1174 
3c@pincer  1341/5082  −3804.668 221 82  −11.7084  −3816.3906  0.0802 
3d@pincer  1190/4500  −3697.985 014 30  −10.1780  −3708.1630  0.0976 
C60@catcher  1820/7112  −4560.549 432 22  −15.5399  −4576.0894  0.1226 
C70@catcher  1960/7672  −4939.295 517 56  −16.8315  −4956.1270  0.1147 
BQ@mcycle  1404/5208  −3272.631 842 13  −10.7488  −3283.3806  0.0709 
GLH@mcycle  1394/5180  −3238.224 661 99  −10.6522  −3248.8769  0.0743 
BuNH4@CB6  1318/4984  −3802.794 931 92  −11.5053  −3814.3002  0.0676 
PrNH4@CB6  1294/4900  −3763.755 541 33  −11.3561  −3775.1117  0.0669 
ADOH@CB7  1620/6132  −4651.225 865 58  −14.1529  −4665.3788  0.0734 
Complex .  Basis/AUX basis .  HF .  Correlation .  Total energy .  Gradient norm . 

TCNA@tweezer  982/3724  −2282.511 183 59  −7.7317  −2290.2429  0.1542 
DCB@tweezer  898/3388  −2022.198 430 64  −6.9032  −2029.1016  0.1174 
3c@pincer  1341/5082  −3804.668 221 82  −11.7084  −3816.3906  0.0802 
3d@pincer  1190/4500  −3697.985 014 30  −10.1780  −3708.1630  0.0976 
C60@catcher  1820/7112  −4560.549 432 22  −15.5399  −4576.0894  0.1226 
C70@catcher  1960/7672  −4939.295 517 56  −16.8315  −4956.1270  0.1147 
BQ@mcycle  1404/5208  −3272.631 842 13  −10.7488  −3283.3806  0.0709 
GLH@mcycle  1394/5180  −3238.224 661 99  −10.6522  −3248.8769  0.0743 
BuNH4@CB6  1318/4984  −3802.794 931 92  −11.5053  −3814.3002  0.0676 
PrNH4@CB6  1294/4900  −3763.755 541 33  −11.3561  −3775.1117  0.0669 
ADOH@CB7  1620/6132  −4651.225 865 58  −14.1529  −4665.3788  0.0734 
Complex .  ABS × 10^{−3} .  MAD × 10^{−3} .  RMSD × 10^{−3} . 

TCNA@tweezer  4.4  0.5  0.9 
DCB@tweezer  3.0  0.4  0.6 
3c@pincer  4.0  0.4  0.6 
3d@pincer  3.9  0.4  0.7 
C60@catcher  4.7  0.8  1.2 
C70@catcher  6.4  1.3  2.0 
BQ@mcycle  1.8  0.3  0.4 
GLH@mcycle  1.9  0.3  0.4 
BuNH4@CB6  1.2  0.2  0.3 
PrNH4@CB6  1.3  0.2  0.3 
ADOH@CB7  1.6  0.3  0.4 
Complex .  ABS × 10^{−3} .  MAD × 10^{−3} .  RMSD × 10^{−3} . 

TCNA@tweezer  4.4  0.5  0.9 
DCB@tweezer  3.0  0.4  0.6 
3c@pincer  4.0  0.4  0.6 
3d@pincer  3.9  0.4  0.7 
C60@catcher  4.7  0.8  1.2 
C70@catcher  6.4  1.3  2.0 
BQ@mcycle  1.8  0.3  0.4 
GLH@mcycle  1.9  0.3  0.4 
BuNH4@CB6  1.2  0.2  0.3 
PrNH4@CB6  1.3  0.2  0.3 
ADOH@CB7  1.6  0.3  0.4 
C. Example calculation—insulin
As an example of a computationally demanding system, which is not accessible with a conventional implementation, we have chosen insulin (see Figure 2). This molecule contains 787 atoms and 7604 regular and 28 148 auxiliary basis functions. All computational thresholds were kept at the same level as for the test set examples. The DECRIMP2 gradient was obtained in less than 10 h using one third of the Titan supercomputer (6000 nodes). The results of the insulin calculation are presented in Table IV.
HartreeFock energy  −21 645.424 375 24 
Correlation energy  −60.610 9 
Total RIMP2 energy  −21 706.035 3 
DECRIMP2 GRADIENT NORM  0.775 0 
DECRIMP2 GRADIENT RMS  0.015 9 
HartreeFock energy  −21 645.424 375 24 
Correlation energy  −60.610 9 
Total RIMP2 energy  −21 706.035 3 
DECRIMP2 GRADIENT NORM  0.775 0 
DECRIMP2 GRADIENT RMS  0.015 9 
A direct comparison with a conventional implementation is impossible, but we might consider the trace of the correlation density matrix divided by the number of electrons, Tr(ρ^{MO})/N_{el}, as a sizeintensive error measure.^{74} This measure is zero if no DEC approximation is invoked (FOT = 0), while it is nonzero for practical FOT values. For the DECMP2 model we have previously found Tr(ρ^{MO})/N_{el} to be −7.4 × 10^{−6} for insulin using the same geometry, basis, and FOT value as employed here. The corresponding number for DECRIMP2 is 9.6 × 10^{−6}. As expected, these numbers are thus of the same order of magnitude, since the same FOT has been used.
Finally, we note that the DECRIMP2 molecular gradient for insulin is provided in the supplementary material^{89} to enable future comparisons to lowerlevel methods, such as the atomic forces determined from classical force field calculations. In general, the DECRIMP2 gradient scheme might provide useful benchmark data for large molecular systems for semiempirical methods which are often parameterized based on CC calculations for small molecular systems.
IV. SUMMARY AND FUTURE PERSPECTIVES
We have presented the DECRIMP2 molecular gradient scheme, which combines the precision control as well as the linearscaling and massively parallel features of the DEC scheme with efficient evaluations of the gradient contributions using the RI approximation. The DECRIMP2 molecular gradient was successfully applied to a test set of large molecular systems, including complicated electronic structures, to demonstrate the general and wide applicability of the method. We have furthermore performed a largescale DECRIMP2 gradient calculation on the insulin molecule, which contains 787 atoms and 7604 regular and 28 148 auxiliary basis functions. This calculation was completed in less than 10 h using one third of the Titan supercomputer.
The overall goal of the DEC scheme is to provide a framework for treating large molecular systems at the CC level of theory, i.e., systems where the conventional implementation hits a scaling wall. Furthermore, the DEC scheme is linearscaling with system size and designed to utilize thousands of computing cores on a large high performance computer (HPC) architecture. However, the DEC method is rather inefficient for small molecular systems. As a rule of thumb, if the canonical calculation can be performed within a reasonable time frame, then the canonical calculation will use fewer computational resources than the corresponding DEC calculation. However, if one is willing to use many compute nodes, then the timetosolution provided by the DEC method will generally be much shorter than for a canonical implementation. Nonetheless, the main advantage of the DEC method is the ability to treat very large molecular systems where canonical implementations encounter a scaling wall. Such calculations are computationally demanding, but linearscaling and massively parallel. The insulin calculation presented in this work represents such a case, and it was carried out in less than 10 h. It is one of the largest correlated gradient calculations known to date.
In many aspects the comparison of DEC and canonical algorithms mirrors the rise of direct SCF, which was initially only an advantage when disk space was insufficient. However, the development of linearscaling algorithms, improved integral screening, and densityfitting technology made direct SCF competitive, even in cases of sufficient disk space. Direct SCF has also benefitted from HPC developments which favor computations rather than memory access. Similar developments might happen for the DEC scheme due the continued algorithmic and HPC architecture improvements.
Finally, an additional obvious advantage that should be emphasized is the relatively low effort required for extending the DEC scheme to the determination of molecular gradients and other molecular properties, while the linearscaling and massively parallel features of the DEC scheme are retained.
Acknowledgments
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract No. DEAC0500OR22725. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme No. (FP/20072013)/ERC Grant Agreement No. 291371. D.B. acknowledges the Marie Curie Individual Fellowship funding for “DECOS,” Project No. 657514.