A novel implementation of the coupled-cluster singles and doubles (CCSD) approach is presented that is specifically tailored for the treatment of large symmetric systems. It fully exploits Abelian point-group symmetry and the use of the Cholesky decomposition of the two-electron repulsion integrals. In accordance with modern CCSD algorithms, we propose two alternative strategies for the computation of the so-called particle–particle ladder term. The code is driven toward the optimal choice depending on the available hardware resources. As a large-scale application, we computed the frozen-core correlation energy of buckminsterfullerene (C60) with a polarized valence triple-zeta basis set (240 correlated electrons in 1740 orbitals).

The accurate and reliable determination of the energy and properties of a molecular system is a pivotal aspect of quantum chemistry. In this context, coupled-cluster (CC) methods1–3 are an optimal choice due to their remarkable accuracy and size-extensivity. For many applications, CC with singles and doubles excitations (CCSD)4 offers a good compromise between accuracy and computational cost. Nevertheless, its routine use is restricted to small- to medium-sized systems. The computational cost scales with the sixth power of the number of basis functions, and the storage requirements are high due to the fact that two-electron integrals involving three and four virtual orbitals need in principle to be stored.

To overcome such computational limitations, various strategies have been developed and are often combined together to get the most efficient implementation. A first class of strategies aims at reducing the overall scaling of the calculation by enforcing the local nature of electron correlation.5–12 Other low-scaling solutions are given by methods that compress the double-excitation amplitudes13 as, for example, in the tensor hypercontraction (THC) scheme,14–16 or by real-space methods.17 Using such techniques, impressive large-scale calculations can be done.18 However, reduced-scaling methods come with various difficulties, which include the use of cutoffs to enforce the locality of correlation, complicated expressions for analytical derivatives,19–23 and overall accuracy control that requires some user experience. A different family of strategies does not reduce the formal scaling of the method, but allows for very efficient implementations that can be vectorized, parallelized, and formulated mostly in terms of highly optimized matrix–matrix multiplications. This is achieved by introducing a low-rank approximation of the two-electron repulsion integrals (ERI), such as density fitting (DF)24–28 and Cholesky decomposition (CD).29–37 Such techniques can be used to reduce the prefactor in the overall method complexity, which remains O(N6), and can drastically lower integral storage and bandwith requirements and improve the parallelization potential.

Simultaneously and in parallel with this, new CC implementations also focus on deploying efficient computational realizations that leverage the full potential of modern computer architectures and utilize parallel computing. Recently, various works that combine massively parallel implementation strategies—based on the joint effort given by the message-passing interface (MPI) and OpenMP—with or without DF/CD, showed impressive calculations on large systems without exploiting local correlation.38–49 To the best of our knowledge, the largest CC calculation, actually at the CCSD(T)50 level, was done with as many as 1624 orbitals (without the use of point-group symmetry) using 288 cores.49 

In this work, we present an implementation of CCSD that exploits the CD of the ERI matrix. The main element of novelty in our implementation is that it fully exploits Abelian point-group symmetry. The implementation is highly vectorized, as it mostly relies on efficient level 3 BLAS routines to perform the bulk of the floating-point operations, and is parallelized using OpenMP. It is therefore well suited to treat, on accessible computational resources, medium-sized molecular systems, and in particular large symmetric molecules. We choose CD instead of DF because, though DF affords a more compact representation of the ERIs, its accuracy depends on the fitted auxiliary basis set that is used for the calculation and cannot be determined a priori nor improved in a systematic way. On the contrary, the accuracy of the CD representation is rigorously controlled by the threshold used to truncate the decomposition, which is the only user-defined parameter that controls the calculation. Our implementation has been realized in a development version of the cfour suite of programs.51,52

In the following, we adopt the usual convention for the indices: μ, ν, σ, … refer to atomic orbitals; p, q, r, … refer to generic molecular orbitals; i, j, k, … refer to occupied orbitals; a, b, c, … refer to virtual orbitals; and P, Q, … refer to the vectors of the Cholesky basis. To get the CCSD correlation energy for closed-shell systems
ΔE=2iatiafai+ijabτ̃ijabij|ab,
(1)
a set of projected non-linear equations needs to be solved,
Φia|H̄|0=0,
(2)
Φijab|H̄|0=0,
(3)
where fai is the ai element of the Fock matrix; τ̃ijab=2τijabτjiab is the spin-adapted tau amplitude, which is the usual combination of double-excitation (tijab) and single-excitation (tia) amplitudes and in the closed-shell case assumes the following form: τijab=tijab+tiatjb; ⟨ij|ab⟩ is a two-electron integral, written using the Dirac convention; |0⟩ is the reference Hartree–Fock (HF) determinant; |Φia and |Φijab are singly and doubly excited determinants, respectively; and H̄=eTHeT is the similarity-transformed Hamiltonian. The algebraic expressions can be found elsewhere. In particular, we have implemented the spin-restricted closed-shell equations.53 We formulated the equations using spin-adapted forms of the F and W intermediates first introduced in the spin-orbital basis by Stanton, Gauss, and co-workers.54,55
The CD of the ERIs (in Mulliken notation) reads
μν|ρσ=PLμρPLνσP,
(4)
where LμρP denotes an element of the Pth Cholesky vector (CV). In order to account for Abelian point-group symmetry, the CD is done using symmetry-adapted linear combinations of atomic orbitals. The resulting CVs are classified by means of the irreducible representation to which the index P belongs (ΓP). For the totally symmetric vectors, we store only the lower triangular block while for the others we store only the sub-blocks for which Γμ ≤ Γρ = ΓP ⊗ Γμ. The CVs are transformed to the molecular orbital basis at the beginning of the CC calculation. Furthermore, in our implementation, we always assemble and keep in core the integrals with at most two virtual indices, such as ⟨ab|ij⟩ and ⟨ij|kl⟩.
The exploitation of symmetry enables a reduction of the memory requirement and the number of floating-point operations by approximately a factor h2, h being the order of the point-group. Given a generic contraction between two four-index arrays, it is performed using the direct-product decomposition (DPD) approach,54 which means that the contraction is subdivided into h distinct operations. The rate-determining step in a CCSD code is given by the so-called particle–particle ladder (PPL) term
Zijab=efτijefWabef.
(5)
The Wabef intermediate can be written in terms of CVs as
Wabef=PLaePtaePLbfPmtmbLaePLmfP
(6)
and tabP=mtmaLbmP. To reduce the computational cost from its formal 12O2V4 scaling to 14O2V4, we exploited the symmetric/antisymmetric algorithm,53,56 which was also successfully exploited in other recent CD/DF-CC implementations.57–59  O and V denote the number of occupied and virtual orbitals, respectively. In particular, Zijab is computed as the sum of a totally symmetric (S) and an antisymmetric (A) contribution,
Zijab=Sijab+Aijab,
(7)
where
Sijab=efτijef+Wabef+,
(8)
Aijab=efτijefWabef,
(9)
and the + and − denote symmetric and antisymmetric combinations, respectively.
Moreover, we note that the Wabef intermediate can be rewritten as a single contraction
Wabef=PLaePtaePLbfPtbfP.
(10)
However, the following contribution:
Yijab=efmnτijeftmatnbPLmePLnfP
(11)
needs to be subtracted from the double-excitation equations, as it is spuriously included in Eq. (10). This is in practice not a problem, as Eq. (11) introduces a hole–hole ladder-like term, which only requires O4V2 operations and O2V2 words of memory for its computation. The price of computing such a term is compensated by not having to take into account single-excitation amplitudes in the evaluation of the PPL term, as in Eq. (6). This approach can thus be seen as a simple tool to slightly reduce the operation count and memory demand of the PPL contraction subroutine when the T1-transformed Hamiltonian is not used.60 We note that the quantity LaePtaeP is not symmetric with respect to the interchange of indices a and e.

An efficient implementation of the PPL term should take into account memory issues, and it should distribute the right amount of data to the CPU in order to retrieve its peak performance. These two considerations require in general the adoption of different strategies that are often in conflict with respect to each other. As the main cornerstone, we avoid to assemble and thus to store V4 and V3O arrays; for this reason, only batches of integrals are assembled and contracted on-the-fly. In particular, we implemented two different strategies for the PPL term. In the first version, which is outlined in Algorithm 1, we fix index a and distribute the operations over shared-memory threads using the OpenMP paradigm; therefore, the memory required to store temporary arrays is V3Nthreads. Here, we used for W the expression reported in Eq. (6). This algorithm has the advantage of computing the PPL contraction with few, large matrix–matrix multiplications that are efficiently performed using the level 3 BLAS DGEMM routine, but as the number of threads increases, it can become too demanding in terms of memory requirements. We therefore implemented as a second option a different algorithm, which is shown in Algorithm 2, where we fix both the a and b indices and again distribute the operations over OpenMP threads. In this situation, the memory required to store the intermediates scales as V2Nthreads. This time, Eq. (10) was used.

ALGORITHM 1.

PPL: LOOP OVER A.

1: Form τijab±,irr 
2: for irr2 = 1, n_irrep do ⊳ OpenMP parallelized 
3: for a = 1, n_vir(irr2) do ⊳ OpenMP parallelized 
4: Iebfa=P(LaePtaeP)LbfP 
5: Jfeba=IebfamtmbPLmfPLaeP 
6: Wfeba±=12(Jfeba±Jefba) 
7: Form Sijab and Aijab 
8: Distribute in Zijab 
9: end for 
10: end for 
1: Form τijab±,irr 
2: for irr2 = 1, n_irrep do ⊳ OpenMP parallelized 
3: for a = 1, n_vir(irr2) do ⊳ OpenMP parallelized 
4: Iebfa=P(LaePtaeP)LbfP 
5: Jfeba=IebfamtmbPLmfPLaeP 
6: Wfeba±=12(Jfeba±Jefba) 
7: Form Sijab and Aijab 
8: Distribute in Zijab 
9: end for 
10: end for 
ALGORITHM 2.

PPL: LOOP OVER A AND B.

1: Form τijab±,irr 
2: for irr2 = 1, n_irrep do ⊳ OpenMP parallelized 
3: for a = 1, n_vir(irr2) do ⊳ OpenMP parallelized 
4: for irr = 1, n_nirrep do ⊳ OpenMP parallelized 
5: irr1 = irr × irr2 
6: if (irr1 < irr2) cycle 
7: b_end = n_vir(irr1) 
8: if (irr = = 1) b_end = a 
9: for b = 1, b_end do ⊳ OpenMP parallelized 
10: Iefab=P(LaePtaeP)(LbfPtbfP) 
11: Wefab±=12(Iefab±Ifeab) 
12: Form Sijab and Aijab 
13: Distribute in Zijab 
14: end for 
15: end for 
16: end for 
17: end for 
1: Form τijab±,irr 
2: for irr2 = 1, n_irrep do ⊳ OpenMP parallelized 
3: for a = 1, n_vir(irr2) do ⊳ OpenMP parallelized 
4: for irr = 1, n_nirrep do ⊳ OpenMP parallelized 
5: irr1 = irr × irr2 
6: if (irr1 < irr2) cycle 
7: b_end = n_vir(irr1) 
8: if (irr = = 1) b_end = a 
9: for b = 1, b_end do ⊳ OpenMP parallelized 
10: Iefab=P(LaePtaeP)(LbfPtbfP) 
11: Wefab±=12(Iefab±Ifeab) 
12: Form Sijab and Aijab 
13: Distribute in Zijab 
14: end for 
15: end for 
16: end for 
17: end for 

To test the two PPL algorithms, we ran calculations on coronene (C24H12) enforcing D2h symmetry using Dunning’s cc-pVTZ basis set61 and the frozen-core (fc) approximation, that is 888 orbitals of which 24 are kept frozen. The geometry is given as the supplementary material in the Zenodo repository.62 We note in passing that for non-symmetric systems our code performs similarly to the recent implementation presented by Schnack-Petersen et al.;59 here, we focus on the performance of the implementation for symmetric molecules. All calculations were run on a single Intel Xeon Gold 6140M node equipped with 1.1 TB of RAM. The results are summarized in Table I. For both algorithms, we run the calculation using 32 OpenMP threads, as expected the peak memory required is considerably reduced for the “loop ab” algorithm, and in particular it coincides with the memory required by the PPL subroutine. However, the timings for a single iteration are approximately a factor of two slower with respect to the implementation using the first algorithm. We observe that both algorithms present an analogous speedup plot, as shown in Fig. 1, with the “loop a” algorithm performing slightly better for smaller numbers of threads but hitting a plateau sooner than the “loop ab” algorithm. This behavior is expected, as both algorithms execute in parallel a large number of DGEMM matrix–matrix multiplications, which make the code memory-bound. This in turns causes cache misses and other cache-related issue, making the cores idle while they are starving for more data to process. The “loop ab” algorithm scales slightly better because the matrix–matrix multiplications performed within the parallel loop are smaller, which reduce the memory bottleneck. Nevertheless, a somewhat satisfactory speedup factor of 8–10 can be achieved. Different parallelization strategies, as well as the deployment of the code on different architectures, will be investigated in the future.

TABLE I.

Summary of the comparison between the two implementations for the PPL contraction. The calculations are done on coronene with the cc-pVTZ basis set using up to 32 OpenMP threads. We report for each algorithm type the peak-memory consumption in GB, which coincides with the memory requested by the PPL subroutine, and the average time for a single CCSD iteration in minutes.

AlgorithmMemory (GB)Time it. (min)
Loop a 272 3.5 
Loop ab 47 
AlgorithmMemory (GB)Time it. (min)
Loop a 272 3.5 
Loop ab 47 
FIG. 1.

Speedup plot for CD based CCSD/cc-pVTZ calculations on coronene. For both algorithms, we plot the ratio between iteration time for the serial calculation and the iteration time of the parallel ones. Both axes are in the log2 scale.

FIG. 1.

Speedup plot for CD based CCSD/cc-pVTZ calculations on coronene. For both algorithms, we plot the ratio between iteration time for the serial calculation and the iteration time of the parallel ones. Both axes are in the log2 scale.

Close modal

The benefit obtained by exploiting point-group symmetry can be discerned by comparing the theoretical factor of reduction due to symmetry (FRS) with respect to the achieved FRSs. Theoretical FRS can be computed as the ratio between the total number of floating-point operations required for a specific contraction term in the CC equations with symmetry and the one without symmetry. Achieved FRSs are given by the ratio between the elapsed CPU time required for a specific contraction performed with and without enforcing Abelian symmetry. The results are summarized in Table II. As test systems and basis sets we used corenene with (fc)cc-pVTZ, buckminsterfullerene (C60) with (fc)cc-pVDZ, and a cobalt β-diketiminato oxo complex [(nacnac)CoIIIO] with cc-pVDZ whose geometry was taken from Ref. 63. The three molecular representations can be seen in Fig. 2. We considered only the most expensive contractions, namely, the ones that exhibit a scaling of O3V3 and O2V4 (PPL). We note that the achieved FRS are systematically larger for the O3V3 terms with respect to the PPL contraction. This can be explained since, even for the most efficient PPL algorithm (i.e., Algorithm 1), we need to perform two nested loops over the irreducible representations, which means that enforcing point group symmetry results in a larger number of significantly smaller matrix–matrix multiplications, with a loss of efficiency with respect to the C1 treatment. On the other hand, the O3V3 terms are implemented in a way that only requires one explicit loop over the irreducible representations, affecting thus less the overall efficiency of their evaluation. This effect is less apparent for calculations that are run sequentially (see the numbers in parentheses in Table II). Furthermore, we note that for the C2v cobalt complex, where only 4 irreducible representations are present, the achieved FRS are closer to the theoretical FRS.

TABLE II.

We report the theoretical and achieved factor of reduction due to symmetry (FRS) for D2h coronene, D2h buckminsterfullerene, and C2v cobalt complex. We ran the calculations using 32 OpenMP threads, while between parenthesis we report the achieved FRS obtained with a serial calculation.

Theoretical FRSAchieved FRS
MoleculeO3V3O2V4O3V3O2V4
C24H12 59 59 35 (47) 10 (23) 
C60 64 63 44 (57) 15 (28) 
(nacnac)CoIII16 15 12 (14) 6 (9) 
Theoretical FRSAchieved FRS
MoleculeO3V3O2V4O3V3O2V4
C24H12 59 59 35 (47) 10 (23) 
C60 64 63 44 (57) 15 (28) 
(nacnac)CoIII16 15 12 (14) 6 (9) 
FIG. 2.

Molecular systems used to test the CD-CCSD code. From the left: (nacnac)CoIIIO, buckminsterfullerene, and coronene. Carbon is represented in silver, oxygen in red, nitrogen in blue, cobalt in pink, and hydrogen in white.

FIG. 2.

Molecular systems used to test the CD-CCSD code. From the left: (nacnac)CoIIIO, buckminsterfullerene, and coronene. Carbon is represented in silver, oxygen in red, nitrogen in blue, cobalt in pink, and hydrogen in white.

Close modal

To show the full potential of our new implementation, we ran a calculation on a challenging system, namely buckminsterfullerene (C60) with the cc-pVTZ basis set,61 which consists of 1800 basis functions (1620 virtual and 180 doubly occupied orbitals). The geometry can be downloaded from the Zenodo repository.62 The computational complexity for this system is twofold since both the number of virtual orbitals and the number of electrons are high. We performed the calculation with the frozen-core approximation—thus considering 120 occupied orbitals. We used a Cholesky decomposition threshold of 10−4 and obtained 10 169 CVs, which are nearly evenly distributed among the eight irreducible representations. The full set of CVs occupies only 29 GB of memory, which can be compared with the immense—and perhaps prohibitive—amount of memory, or rather disk space, that a traditional implementation would demand, which is about 3.4 TB for the full set of integrals ⟨ab|cd⟩ as estimated with a dry run. We note here that, in our implementation, to achieve maximum efficiency, we store the ⟨ab|cd⟩ integrals with ab and no other restriction. If one wanted to store only the unique elements of the ⟨ab|cd⟩ array, one would still need slightly less than 1 TB of memory, which is still a formidable amount of memory that comes at the price of having to expand lower-triangular matrices into square matrices before performing the relevant matrix–matrix multiplications. To accelerate the convergence of the non-linear equation solver, we exploited as usual the direct inversion in the iterative subspace (DIIS) strategy, and in order to avoid the in-core storage of the past expansion and error vectors, we input/output them at each iteration. The memory required to store a set of amplitude (tijab) amounts in fact to ∼30 GB, making it a potential bottleneck, especially when large DIIS expansion spaces are used.

We first performed the computation using the “loop ab” PPL algorithm and distributing the work over 32 OpenMP threads. With this setup, the total memory requested amounts to 478 GB, and the time for a single iteration is 6 h 50 min proving the feasibility of the calculation. Furthermore, we note that using this algorithm the requested memory is almost independent of the number of threads used. We repeated the calculation using the “loop a” PPL algorithm and 12 OpenMP threads. Within this configuration, however, the amount of memory requested is 1080 GB. A single iteration took about 2 h, and the calculation converged in 1 day 19 h to 10−7 in the maximum norm of the residual. The PPL contraction is the most time consuming operation since about 65% of a single-iteration time is spent in computing this term, while the rest is mostly spent in computing the O3V3 contributions. The converged CCSD energy is −2281.300 20 Eh of which −8.915 74 Eh represents the correlation contribution.

In summary, we present a CCSD code that exploits Abelian point-group symmetry and the Cholesky decomposition of the two-electron integrals. The most expensive contribution—namely the particle–particle ladder term—has been implemented in two different versions using the symmetric/antisymmetric strategy. The two implementations differ in the memory required to store temporary arrays and the code is driven toward the best option depending on the available hardware resources. The implementation is suited to treat medium and large symmetric systems as demonstrated by performing CCSD calculations on buckminsterfullerene (C60) with a polarized valence triple-zeta basis set.

F.L. and T.N. acknowledge financial support from ICSC-Centro Nazionale di Ricerca in High Performance Computing, Big Data, and Quantum Computing, funded by the European Union—Next Generation EU—PNRR, Missione 4 Componente 2 Investimento 1.4. In Mainz, this work was supported by the Deutsche Forschungsgemeinschaft (DFG) via project B5 within the TRR 146 “Multiscale Simulation Methods for Soft Matter Systems.”

The authors have no conflicts to disclose.

Tommaso Nottoli: Data curation (equal); Investigation (equal); Methodology (equal); Software (lead); Writing – original draft (equal); Writing – review & editing (equal). Jürgen Gauss: Conceptualization (equal); Investigation (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Filippo Lipparini: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are openly available in the Zenodo repository.62 

1.
J.
Čížek
, “
On the correlation problem in atomic and molecular systems. Calculation of wavefunction components in Ursell-type expansion using quantum-field theoretical methods
,”
J. Chem. Phys.
45
,
4256
4266
(
1966
).
2.
J.
Čížek
, “
On the use of the cluster expansion and the technique of diagrams in calculations of correlation effects in atoms and molecules
,”
Adv. Chem. Phys.
14
,
35
89
(
1969
).
3.
I.
Shavitt
and
R. J.
Bartlett
,
Many-Body Methods in Chemistry and Physics: MBPT and Coupled-Cluster Theory
(
Cambridge University Press
,
2009
).
4.
G. D.
Purvis
III
and
R. J.
Bartlett
, “
A full coupled-cluster singles and doubles model: The inclusion of disconnected triples
,”
J. Chem. Phys.
76
,
1910
1918
(
1982
).
5.
C.
Hampel
and
H.-J.
Werner
, “
Local treatment of electron correlation in coupled cluster theory
,”
J. Chem. Phys.
104
,
6286
6297
(
1996
).
6.
M.
Schütz
and
H.-J.
Werner
, “
Low-order scaling local electron correlation methods. IV. Linear scaling local coupled-cluster (LCCSD)
,”
J. Chem. Phys.
114
,
661
681
(
2001
).
7.
G. E.
Scuseria
and
P. Y.
Ayala
, “
Linear scaling coupled cluster and perturbation theories in the atomic orbital basis
,”
J. Chem. Phys.
111
,
8330
8343
(
1999
).
8.
J. E.
Subotnik
,
A.
Sodt
, and
M.
Head-Gordon
, “
A near linear-scaling smooth local coupled cluster algorithm for electronic structure
,”
J. Chem. Phys.
125
,
074116
(
2006
).
9.
A. A.
Auer
and
M.
Nooijen
, “
Dynamically screened local correlation method using enveloping localized orbitals
,”
J. Chem. Phys.
125
,
024104
(
2006
).
10.
F.
Neese
,
A.
Hansen
, and
D. G.
Liakos
, “
Efficient and accurate approximations to the local coupled cluster singles doubles method using a truncated pair natural orbital basis
,”
J. Chem. Phys.
131
,
064103
(
2009
).
11.
F.
Neese
,
F.
Wennmohs
, and
A.
Hansen
, “
Efficient and accurate local approximations to coupled-electron pair approaches: An attempt to revive the pair natural orbital method
,”
J. Chem. Phys.
130
,
114108
(
2009
).
12.
Q.
Ma
and
H.-J.
Werner
, “
Explicitly correlated local coupled-cluster methods using pair natural orbitals
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1371
(
2018
).
13.
T.
Kinoshita
,
O.
Hino
, and
R. J.
Bartlett
, “
Singular value decomposition approach for the approximate coupled-cluster method
,”
J. Chem. Phys.
119
,
7756
7762
(
2003
).
14.
R. M.
Parrish
,
E. G.
Hohenstein
,
T. J.
Martínez
, and
C. D.
Sherrill
, “
Tensor hypercontraction. II. Least-squares renormalization
,”
J. Chem. Phys.
137
,
224106
(
2012
).
15.
R. M.
Parrish
,
E. G.
Hohenstein
,
N. F.
Schunck
,
C. D.
Sherrill
, and
T. J.
Martínez
, “
Exact tensor hypercontraction: A universal technique for the resolution of matrix elements of local finite-range N-body potentials in many-body quantum problems
,”
Phys. Rev. Lett.
111
,
132505
(
2013
).
16.
R. M.
Parrish
,
Y.
Zhao
,
E. G.
Hohenstein
, and
T. J.
Martínez
, “
Rank reduced coupled cluster theory. I. Ground state energies and wavefunctions
,”
J. Chem. Phys.
150
,
164118
(
2019
).
17.
J. S.
Kottmann
and
F. A.
Bischoff
, “
Coupled-cluster in real space. 1. CC2 ground state energies using multiresolution analysis
,”
J. Chem. Theory Comput.
13
,
5945
5955
(
2017
).
18.
C.
Riplinger
,
B.
Sandhoefer
,
A.
Hansen
, and
F.
Neese
, “
Natural triple excitations in local coupled cluster calculations with pair natural orbitals
,”
J. Chem. Phys.
139
,
134101
(
2013
).
19.
J.
Gauss
and
H.-J.
Werner
, “
NMR chemical shift calculations within local correlation methods: The GIAO-LMP2 approach
,”
Phys. Chem. Chem. Phys.
2
,
2083
2090
(
2000
).
20.
G.
Rauhut
and
H.-J.
Werner
, “
Analytical energy gradients for local coupled-cluster methods
,”
Phys. Chem. Chem. Phys.
3
,
4853
5862
(
2001
).
21.
D.
Datta
,
S.
Kossmann
, and
F.
Neese
, “
Analytic energy derivatives for the calculation of the first-order molecular properties using the domain-based local pair-natural orbital coupled-cluster theory
,”
J. Chem. Phys.
145
,
114101
(
2016
).
22.
P.
Pinski
and
F.
Neese
, “
Analytical gradient for the domain-based local pair natural orbital second order Møller-Plesset perturbation theory method (DLPNO-MP2)
,”
J. Chem. Phys.
150
,
164102
(
2019
).
23.
G. L.
Stoychev
,
A. A.
Auer
,
J.
Gauss
, and
F.
Neese
, “
DLPNO-MP2 second derivatives for the computation of polarizabilities and NMR shieldings
,”
J. Chem. Phys.
154
,
164110
(
2021
).
24.
J. L.
Whitten
, “
Coulombic potential energy integrals and approximations
,”
J. Chem. Phys.
58
,
4496
4501
(
1973
).
25.
B. I.
Dunlap
,
J. W. D.
Connolly
, and
J. R.
Sabin
, “
On some approximations in applications of Xα theory
,”
J. Chem. Phys.
71
,
3396
3402
(
1979
).
26.
O.
Vahtras
,
J.
Almlöf
, and
M. W.
Feyereisen
, “
Integral approximations for LCAO-SCF calculations
,”
Chem. Phys. Lett.
213
,
514
518
(
1993
).
27.
M.
Feyereisen
,
G.
Fitzgerald
, and
A.
Komornicki
, “
Use of approximate integrals in ab initio theory. An application in MP2 energy calculations
,”
Chem. Phys. Lett.
208
,
359
363
(
1993
).
28.
K.
Eichkorn
,
O.
Treutler
,
H.
Öhm
,
M.
Häser
, and
R.
Ahlrichs
, “
Auxiliary basis sets to approximate Coulomb potentials
,”
Chem. Phys. Lett.
240
,
283
290
(
1995
).
29.
N. H. F.
Beebe
and
J.
Linderberg
, “
Simplifications in the generation and transformation of two-electron integrals in molecular calculations
,”
Int. J. Quantum Chem.
12
,
683
705
(
1977
).
30.
I.
Røeggen
and
E.
Wisløff-Nilssen
, “
On the Beebe-Linderberg two-electron integral approximation
,”
Chem. Phys. Lett.
132
,
154
160
(
1986
).
31.
H.
Koch
,
A.
Sánchez de Merás
, and
T. B.
Pedersen
, “
Reduced scaling in electronic structure calculations using Cholesky decompositions
,”
J. Chem. Phys.
118
,
9481
9484
(
2003
).
32.
I.
Røeggen
and
T.
Johansen
, “
Cholesky decomposition of the two-electron integral matrix in electronic structure calculations
,”
J. Chem. Phys.
128
,
194107
(
2008
).
33.
F.
Aquilante
,
L.
Boman
,
J.
Boström
,
H.
Koch
,
R.
Lindh
,
A. S.
de Merás
, and
T. B.
Pedersen
, “
Cholesky decomposition techniques in electronic structure theory
,” in
Linear-Scaling Techniques in Computational Chemistry and Physics: Methods and Applications
, edited by
R.
Zalesny
,
M. G.
Papadopoulos
,
P. G.
Mezey
, and
J.
Leszczynski
(
Springer Netherlands
,
Dordrecht
,
2011
), pp.
301
343
.
34.
E.
Epifanovsky
,
D.
Zuev
,
X.
Feng
,
K.
Khistyaev
,
Y.
Shao
, and
A. I.
Krylov
, “
General implementation of the resolution-of-the-identity and Cholesky representations of electron repulsion integrals within coupled-cluster and equation-of-motion methods: Theory and benchmarks
,”
J. Chem. Phys.
139
,
134105
(
2013
).
35.
S. D.
Folkestad
,
E. F.
Kjønstad
, and
H.
Koch
, “
An efficient algorithm for Cholesky decomposition of electron repulsion integrals
,”
J. Chem. Phys.
150
,
194112
(
2019
).
36.
T.
Zhang
,
X.
Liu
,
E. F.
Valeev
, and
X.
Li
, “
Toward the minimal floating operation count Cholesky decomposition of electron repulsion integrals
,”
J. Phys. Chem. A
125
,
4258
4265
(
2021
).
37.
T. B.
Pedersen
,
S.
Lehtola
,
I. F.
Galván
, and
R.
Lindh
, “
The versatility of the Cholesky decomposition in electron structure theory
,” WIREs Comput. Mol. Sci. (published online
2023
).
38.
M. E.
Harding
,
T.
Metzroth
,
J.
Gauss
, and
A. A.
Auer
, “
Parallel calculation of CCSD and CCSD(T) analytic first and second derivatives
,”
J. Chem. Theory Comput.
4
,
64
74
(
2008
).
39.
E.
Prochnow
,
M. E.
Harding
, and
J.
Gauss
, “
Parallel calculation of CCSDT and Mk-MRCCSDT energies
,”
J. Chem. Theory Comput.
6
,
2339
2347
(
2010
).
40.
M.
Pitoňák
,
F.
Aquilante
,
P.
Hobza
,
P.
Neogrády
,
J.
Noga
, and
M.
Urban
, “
Parallelized implementation of the CCSD(T) method in MOLCAS using optimized virtual orbitals space and Cholesky decomposed two-electron integrals
,”
Collect. Czech. Chem. Commun.
76
,
713
742
(
2011
).
41.
A.
Asadchev
and
M. S.
Gordon
, “
Fast and flexible coupled cluster implementation
,”
J. Chem. Theory Comput.
9
,
3385
3392
(
2013
).
42.
E.
Deumens
,
V. F.
Lotrich
,
A.
Perera
,
M. J.
Ponton
,
B. A.
Sanders
, and
R. J.
Bartlett
, “
Software design of ACES III with the super instruction architecture
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
1
,
895
901
(
2011
).
43.
I. A.
Kaliman
and
A. I.
Krylov
, “
New algorithm for tensor contractions on multi-core CPUs, GPUs, and accelerators enables CCSD and EOM-CCSD calculations with over 1000 basis functions on a single compute node
,”
J. Comput. Chem.
38
,
842
853
(
2017
).
44.
T.
Janowski
and
P.
Pulay
, “
Efficient parallel implementation of the CCSD external exchange operator and the perturbative triples (T) energy calculation
,”
J. Chem. Theory Comput.
4
,
1585
1592
(
2008
).
45.
C.
Peng
,
J. A.
Calvin
,
F.
Pavošević
,
J.
Zhang
, and
E. F.
Valeev
, “
Massively parallel implementation of explicitly correlated coupled-cluster singles and doubles using TiledArray framework
,”
J. Phys. Chem. A
120
,
10231
10244
(
2016
).
46.
T.
Shen
,
Z.
Zhu
,
I. Y.
Zhang
, and
M.
Scheffler
, “
Massive-parallel implementation of the resolution-of-identity coupled-cluster approaches in the numeric atom-centered orbital framework for molecular systems
,”
J. Chem. Theory Comput.
15
,
4721
4734
(
2019
).
47.
E.
Solomonik
,
D.
Matthews
,
J. R.
Hammond
,
J. F.
Stanton
, and
J.
Demmel
, “
A massively parallel tensor contraction framework for coupled-cluster computations
,”
J. Parallel Distrib. Comput.
74
,
3176
3190
(
2014
).
48.
L.
Gyevi-Nagy
,
M.
Kállay
, and
P. R.
Nagy
, “
Integral-direct and parallel implementation of the CCSD(T) method: Algorithmic developments and large-scale applications
,”
J. Chem. Theory Comput.
16
,
366
384
(
2019
).
49.
D.
Datta
and
M. S.
Gordon
, “
A massively parallel implementation of the CCSD(T) method using the resolution-of-the-identity approximation and a hybrid distributed/shared memory parallelization model
,”
J. Chem. Theory Comput.
17
,
4799
4822
(
2021
).
50.
K.
Raghavachari
,
G. W.
Trucks
,
J. A.
Pople
, and
M.
Head-Gordon
, “
A fifth-order perturbation comparison of electron correlation theories
,”
Chem. Phys. Lett.
157
,
479
483
(
1989
).
51.
J. F.
Stanton
,
J.
Gauss
,
L.
Cheng
,
M. E.
Harding
,
D. A.
Matthews
, and
P. G.
Szalay
,
“CFOUR, coupled-cluster techniques for computational chemistry, a quantum-chemical program package
,” With contributions from
A.
Asthana
,
A. A.
Auer
,
R. J.
Bartlett
,
U.
Benedikt
,
C.
Berger
,
D. E.
Bernholdt
,
S.
Blaschke
,
Y. J.
Bomble
,
S.
Burger
,
O.
Christiansen
,
D.
Datta
,
F.
Engel
,
R.
Faber
,
J.
Greiner
,
M.
Heckert
,
O.
Heun
,
M.
Hilgenberg
,
C.
Huber
,
T.-C.
Jagau
,
D.
Jonsson
,
J.
Jusélius
,
T.
Kirsch
,
M.-P.
Kitsaras
,
K.
Klein
,
G. M.
Kopper
,
W. J.
Lauderdale
,
F.
Lipparini
,
J.
Liu
,
T.
Metzroth
,
L. A.
Mück
,
D. P.
O’Neill
,
T.
Nottoli
,
J.
Oswald
,
D. R.
Price
,
E.
Prochnow
,
C.
Puzzarini
,
K.
Ruud
,
F.
Schiffmann
,
W.
Schwalbach
,
C.
Simmons
,
S.
Stopkowicz
,
A.
Tajti
,
J. T.
Uhlirova
,
J.
Vázquez
,
F.
Wang
,
J. D.
Watts
,
P.
Yergün
,
C.
Zhang
, and
X.
Zheng
, and the integral packages MOLECULE (
J.
Almlöf
and
P. R.
Taylor
), PROPS (
P. R.
Taylor
), ABACUS (
T.
Helgaker
,
H. J. A.
Jensen
,
P.
Jørgensen
, and
J.
Olsen
), and ECP routines by
A. V.
Mitin
and
C.
van Wüllen
. For the current version, see http://www.cfour.de.
52.
D. A.
Matthews
,
L.
Cheng
,
M. E.
Harding
,
F.
Lipparini
,
S.
Stopkowicz
,
T.-C.
Jagau
,
P. G.
Szalay
,
J.
Gauss
, and
J. F.
Stanton
, “
Coupled-cluster techniques for computational chemistry: The CFOUR program package
,”
J. Chem. Phys.
152
,
214108
(
2020
).
53.
G. E.
Scuseria
,
C. L.
Janssen
, and
H. F.
Schaefer
III
, “
An efficient reformulation of the closed-shell coupled cluster single and double excitation (CCSD) equations
,”
J. Chem. Phys.
89
,
7382
7387
(
1988
).
54.
J. F.
Stanton
,
J.
Gauss
,
J. D.
Watts
, and
R. J.
Bartlett
, “
A direct product decomposition approach for symmetry exploitation in many-body methods. I. Energy calculations
,”
J. Chem. Phys.
94
,
4334
4345
(
1991
).
55.
J.
Gauss
,
J. F.
Stanton
, and
R. J.
Bartlett
, “
Coupled-cluster open-shell analytic gradients: Implementation of the direct product decomposition approach in energy gradient calculations
,”
J. Chem. Phys.
95
,
2623
2638
(
1991
).
56.
S.
Saebø
and
P.
Pulay
, “
Fourth-order Møller–Plessett perturbation theory in the local correlation treatment. I. Method
,”
J. Chem. Phys.
86
,
914
922
(
1987
).
57.
A. E.
DePrince
III
and
C. D.
Sherrill
, “
Accuracy and efficiency of coupled-cluster theory using density fitting/Cholesky decomposition, frozen natural orbitals, and a t1-transformed Hamiltonian
,”
J. Chem. Theory Comput.
9
,
2687
2696
(
2013
).
58.
U.
Bozkaya
and
C. D.
Sherrill
, “
Analytic energy gradients for the coupled-cluster singles and doubles method with the density-fitting approximation
,”
J. Chem. Phys.
144
,
174103
(
2016
).
59.
A. K.
Schnack-Petersen
,
H.
Koch
,
S.
Coriani
, and
E. F.
Kjønstad
, “
Efficient implementation of molecular CCSD gradients with Cholesky-decomposed electron repulsion integrals
,”
J. Chem. Phys.
156
,
244111
(
2022
).
60.
H.
Koch
,
O.
Christiansen
,
R.
Kobayashi
,
P.
Jørgensen
, and
T.
Helgaker
, “
A direct atomic orbital driven implementation of the coupled cluster singles and doubles (CCSD) model
,”
Chem. Phys. Lett.
228
,
233
238
(
1994
).
61.
T. H.
Dunning
, Jr.
, “
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
1007
,
4572
4585
(
1989
).
62.
T.
Nottoli
,
J.
Gauss
, and
F.
Lipparini
(
2023
), “
Supplementary material: A novel coupled-cluster singles and doubles implementation that combines the exploitation of point-group symmetry and Cholesky decomposition of the two-electron integrals
,” Zenodo https://doi.org/10.5281/zenodo.8314406.
63.
J.
Conradie
and
A.
Ghosh
, “
Electronic structure of trigonal-planar transition-metal−imido Complexes: spin-state energetics, spin-density profiles, and the remarkable performance of the OLYP functional
,”
J. Chem. Theory Comput.
3
,
689
702
(
2007
).