An atomic-orbital (AO) reformulation of the random-phase approximation (RPA) correlation energy is presented allowing to reduce the steep computational scaling to linear, so that large systems can be studied on simple desktop computers with fully numerically controlled accuracy. Our AO-RPA formulation introduces a contracted double-Laplace transform and employs the overlap-metric resolution-of-the-identity. First timings of our pilot code illustrate the reduced scaling with systems comprising up to 1262 atoms and 10 090 basis functions.

## I. INTRODUCTION

The random phase approximation (RPA) originally introduced by Bohm and Pines^{1} has become of wide interest for describing electron-correlation both in solid-state and molecular systems offering important advantages also for vanishing electronic gap systems. However, the applicability of RPA to larger systems has always suffered from its huge computational cost. For molecular systems the currently most widespread RPA formulation employing the resolution-of-the-identity (RI) approximation was proposed by Furche *et al.*,^{2–4} on the basis of formulations by Langreth and Perdew,^{5,6} and reduces the effective scaling from $O M 6 $ to $O M 4 $ with system size *M*. This opened the way for more extensive benchmarking beyond the few atoms scale. Many authors have consequently shown the usefulness and applicability of RPA.^{7–12} Developments include analytical forces and first-order properties by Rekkedal *et al.*^{13} and Burow *et al.*,^{14} the stochastic reformulation by Gao *et al.*,^{15} and a complete basis set extrapolation by Mezei *et al.*^{16} Recently, Kállay^{17} presented an effective linear-scaling approximation using a local correlation domain-based approach. While not all of the employed approximations for preselecting domains are rigorous, Kállay presented for the tightest selection criteria deviations in relative energies on the order of 1 kcal/mol for a range of selected molecular systems. However, further increasing the accuracy, full rigorosity, and, e.g., avoiding domain changes on a potential energy surface remain difficult.

In our work, we present an effective linear-scaling RPA atomic orbital (AO) reformulation entirely avoiding ad-hoc approximations in order to recover the exact RPA results and allows for full numerical control of the accuracy by simple thresholding in the spirit of direct SCF or other rigorous AO-based methods. This opens up the possibility to study molecules of several hundreds to thousands of atoms at the RPA level on a simple desktop computer, while the accuracy remains fully numerically controlled.

Our formulation employs a contracted double-Laplace transformation and a RI-expansion in the overlap-metric formulation. While the Coulomb-metric formulation is far more common than the overlap-metric nowadays, the three-center overlap allows us directly to reformulate all time-limiting steps in sparse quantities allowing for linear scaling sparse matrix algebra. In contrast to the widespread expectation that the overlap-metric would entail the need for very large auxiliary basis sets, we do find errors much smaller than the intrinsic error of RPA already for standard auxiliary basis sets (e.g., cc-pVQZ-RI for the cc-pVQZ basis^{18–22}).

Yang and co-workers^{23} have previously suggested to use tensor hypercontraction (THC) methods instead of the RI-expansion for rank reduction within RPA. THC allows to express the electron repulsion integrals solely by rank-2 tensors, while the overlap metric RI also includes rank-3 tensors. The highest rank tensors in RI are, on the other hand, limited to linear increase of significant elements with molecular size, which is a necessity of our linear scaling reformulation. In the context of periodic systems computed with plane wave basis sets, Kresse and co-workers^{24} have recently used the simple Laplace integration together with a Fourier transformation scheme to reduce the scaling to cubic, finding that the grids originally developed for second-order Møller-Plesset (MP2) theory are also suited for use with RPA. This goes well with our finding that also for our new contracted double-Laplace scheme the original MP2 grids already perform sufficiently well.

In the following, first the derivation of our RPA reformulation is given, including a discussion on numerical stability. Then the efficiency and scaling behavior of our pilot implementation are described. Finally, a brief outlook on possible further improvements is presented.

## II. THEORY

RPA has been found to be a highly useful correlation method for going beyond approximative forms of Kohn-Sham density-functional theory (KS-DFT).^{2,7–12} Here, RPA is generally derived from the adiabatic connection^{25} as presented by Langreth and Perdew^{5,6} and the total energy can thus be expressed as

where the kinetic *E _{T}*, Coulomb

*E*, and exact exchange

_{J}*E*energies are evaluated using the Kohn-Sham orbitals from a preceding DFT calculation. Since a multitude of methods have been developed for evaluating HF and DFT in a linear-scaling fashion (see, e.g., Ref. 26 for a recent review), the first three terms are of no concern for developing a linear-scaling RPA method. Therefore, we will focus on the last term known as RPA correlation energy.

_{X}In the following, the Einstein sum convention^{27} is used for simplicity. All slashed symbols denote matrices that are sparse for sufficiently large system sizes with non-vanishing electronic gap. Indices *i*, *j*, … denote occupied orbitals from the underlying DFT calculation, *a*, *b*, … virtual orbitals, *μ*, *ν*, … basis functions, and *P*, *Q*, … auxiliary basis functions arising from the RI-expansion.

We start our reformulation from Furche’s original $O M 6 $ formulation of the RPA correlation energy (*E _{C}*)

^{8,28}

with

and

where *d*_{ia,jb} = (ε_{a} − ε_{i}) *δ*_{ia,jb} denotes a diagonal matrix with diagonal elements given by the difference of the Kohn-Sham orbital energies and $ i a | j b $ denote the electron repulsion integrals in Mulliken notation.

For evaluating the energy given by Eq. (2), we follow the basic idea of Eshuis, Bates, and Furche^{29} to employ the RI approximation.^{30} However, we do not employ the Coulomb-metric RI, but use the overlap-metric RI instead (see also Ref. 31 on the discussion of various RI approximations), given by

where the electron repulsion integral (*ia*|*jb*) is expressed via 3-center overlaps (*iaP*), 2-center overlaps *S*_{P,Q} = (*PQ*), and 2-center 2-electron integrals *C*_{P,Q} = (*P*|*Q*) over auxiliary functions.

Along the lines of Ref. 3, Eqs. (15)-(28), using the matrix square root formula by Hale *et al.*^{32} and the Woodbury matrix identity,^{33} one arrives at

where we have defined

For its computation in an atomic-orbital basis, one first needs to decouple the Kohn-Sham energy based *dD*(*w*)^{−1}. While the quadratic denominator prohibits direct use of the Laplace transformation as in AO-based second-order Møller-Plesset (AO-MP2) theory^{34–37} that allows for linear scaling by distance-including integral screening methods,^{38,39} we are able to achieve decoupling in RPA by the following contracted double-Laplace expansion:

Inserting this transformation into Eq. (9) yields

with

where ⁄*F*^{(p)} is written in the AO basis (LCAO ansatz: $|i ) = C \mu i |\mu ) $). In addition to the commonly defined pseudo-densities in AO-formulation^{37,40,41}

we have further introduced energy-weighted pseudo-densities which can also be seen as derivatives of the pseudo-densities (a fact that we will make use of below (Eq. (19)))

with the Fock matrix *F* and the occupied and virtual one-particle density matrices *P* and *Q*, respectively. As *F*, *P*, *Q* in Eqs. (15)–(18) are known to become sparse for large molecular systems,^{37,40,41} the pseudo-densities and derivative-densities are themselves sparse for such systems and can be computed by sparse-matrix algebra.

Eq. (11) can be simplified by partial integration

resulting in

However, since the cosine function does not decay at large argument values this formulation causes strong oscillations and therefore numerical instabilities occur when *w* approaches infinity. Here, the opposite partial integration is more easily applicable

where

The double weighted pseudo-densities have been defined in analogy to Eq. (18) as

This formulation, however, suffers from numerical instability where *w* tends to zero as there divisions by very small numbers occur. We thus conclude that the energy is to be computed by Eq. (7) with the matrix ⁄*Q* evaluated by Eq. (20) (denoted as “INT-formulation”) for small values of *w* and by Eq. (21) (denoted as “D-formulation”) for large values of *w*. In Fig. 1, we show the energy integrand values computed by each of the two formulations for the example case of the methane molecule. While both coincide over a large range of values, thus making the system dependent choice of the *w*-separation value a robust one, it is apparent that only by computing the small-*w*-part of the integrand in INT-formulation and the large-*w*-part in D-formulation the exact integrand can be recovered. Another choice besides the *w*-separation value is the upper limit of the quadrature in Eq. (7) which we term Λ-cutoff in analogy to the custom in the quantum-field theories. This quadrature can then be carried out by the Clenshaw-Curtis scheme which allows to adaptively augment and distribute the nodes to the desired accuracy. Even when requiring mHartree accuracy as few as 60 node points have proven to be sufficient for most molecules. Because of this small number of evaluations the matrix valued logarithm in Eq. (7) is best computed by direct diagonalization. Although, in principle, also this step could be reduced in computational effort by a series expansion, the presently chosen cubic-scaling diagonalization step will in practice hardly ever become significant due to the very small prefactor (see Fig. S1 in the supplementary material).^{43} With the boundaries of *w* now being effectively finite, one can make use of the results by Takatsuka, Ten-No, and Hackbusch^{44} who gave optimal *p*-integration-points and weights in a minimax sense for the original Laplace transformation. Although our transformation is very different from that, we still find that mHartree accuracy can be achieved when, for example, the range up to Λ = 50 is described by 15 of their nodes. We expect however, that even less points will suffice when optimized specifically for our transformation (Eq. (10)). We finally note that one could, without a change to the derivation, redefine the Coulomb matrix *C* to contain the inverse overlap matrices instead of the ⁄*B* matrices. In this way, the Laplace summation could be carried out over slightly more sparse matrices which could accelerate the computation of Eqs. (11), (20), and (21).

All tuning parameters of our reformulation are monotonically convergent in the sense, that larger Λ-cutoff values, number of Laplace points, and number of integration nodes, and smaller sparsity threshold always improve the result quality. In the limiting case the result equals the exact RI-RPA result. We therefore term our reformulation fully numerically controlled in its accuracy.

## III. RESULTS

Our pilot code of AO-RPA has been realized in the FermiONs++ program^{45,46} with thresholds chosen such that all energies are precise to below 1 mHartree.

To estimate the number of Laplace points and Λ-cutoff needed, we have conducted preliminary investigations on several molecules with medium sized to large basis sets. We have found for all tested cases that the results were essentially converged with the number of Laplace points at about 12 *p*-sampling points for a Λ-cutoff at 50. To ascertain that using the thus very conservative choice of 15 *p*-integration-points and 0.5 as the *w*-separator are sufficient choices for the desired accuracy, we first tested these settings for the computation of Grimme’s NBRC test set^{47} with cc-pVQZ/cc-pVQZ-RI as the basis, finding a mean absolute deviation of only 0.86 kcal/mol which is below the so called chemical accuracy of 1 kcal/mol. For the sake of comparison, the canonical RI-RPA implementation in TURBOMOLE results in a deviation of 0.75 kcal/mol for the same test set. Since our theory does not make use of any screening, the only source for further errors arising for larger systems would stem from the sparse-algebra thresholds from our sparse-algebra implementation, based on the block compressed sparse row format.^{48,49} To make sure no such errors arise we choose block dimensions of 50 elements and set the threshold to the tight value of 1 × 10^{−8}.

The formal scaling behavior is defined as the increase in computational time with the cardinality of the physical interactions of a system, the observed behavior with respect to the number of atoms. While the first gives the true physical boundary, the second can be easily measured by timing the method on a systematically enlarged test molecule. Only in the asymptotic limit, where the molecule is grown in one direction until the extents in the other directions are comparably negligible, the two definitions coincide. Therefore, for studying the formal scaling behavior with molecules of limited size, quasi-one-dimensional systems are ideally suited.

In Fig. 2 evaluation times for the first pilot implementation of our new RPA method are shown for linear alkanes of increasing size. While our code is not yet fully optimized, run times are already short (single core) and effective linear-scaling behavior is observed. The timings were evaluated with the def2-svp basis set. Since the currently available RI auxiliary basis sets are optimized for the Coulomb-metric RI, we employ the same basis set for both the AO and the auxiliary indices as AO-RPA-optimized basis sets have not yet been developed.

Furthermore, we tested the scaling behavior by studying polypeptide chains as models for biochemical systems. The timings show the same linear scaling behavior as the alkane chains (Fig. 2 inset).

To test the limits for the regular user without access to high performance computers, we calculated our largest system (C_{420}H_{842}) comprising 1262 atoms and 10 090 basis functions on a normal desktop computer (Intel Core i7-3770 CPU 3.40 GHz). With the same thresholds as in the other computations the evaluation time is 79 h.

## IV. CONCLUSION AND OUTLOOK

We have presented an AO-based reformulation of RPA that allows evaluating the correlation energy in linear time. Here, we have introduced a contracted double-Laplace transformation to allow for a purely AO-based formulation and employ the overlap-metric RI scheme to separate the long-ranged Coulomb interaction.

For a proof-of-concept we have further presented results obtained with a pilot implementation of our theory. With sufficiently large auxiliary basis sets, mHartree accuracy can be readily achieved. Preliminary timings on more modest basis sets illustrate the linear-scaling time-complexity and low computational effort of our pilot implementation with the largest system comprising 1262 atoms and 10 090 basis functions.

While there are many ways for improving our AO-RPA method in the future, we will focus first on aspects such as the (1) optimization of our preliminary pilot code and parallelization, (2) use of screening procedures, (3) introduction of a Cholesky decomposition for the pseudo-density matrices in order to reduce dimensions (along the lines of, e.g., Refs. 50–52), and (4) development of efficient basis sets for overlap-metric RI expansions.

Nevertheless, already our preliminary pilot AO-RPA implementation shows fairly early crossovers and effective linear time-complexity and in this way opens new possibilities to account for electron-correlation effects in large complex systems.

## Acknowledgments

The authors thank Dr. Jörg Kussmann (LMU Munich) for help in implementing AO-RPA within the framework of the FermiONs++ quantum chemistry package. Financial support by the DFG funding initiatives SFB749 (C7) and the Excellence Cluster EXC114 (CIPSM) is acknowledged.