Novel implementations based on dense tensor storage are presented for the singlet-reference perfect quadruples (PQ) [J. A. Parkhill et al., J. Chem. Phys. 130, 084101 (2009)] and perfect hextuples (PH) [J. A. Parkhill and M. Head-Gordon, J. Chem. Phys. 133, 024103 (2010)] models. The methods are obtained as block decompositions of conventional coupled-cluster theory that are exact for four electrons in four orbitals (PQ) and six electrons in six orbitals (PH), but that can also be applied to much larger systems. PQ and PH have storage requirements that scale as the square, and as the cube of the number of active electrons, respectively, and exhibit quartic scaling of the computational effort for large systems. Applications of the new implementations are presented for full-valence calculations on linear polyenes (CnHn+2), which highlight the excellent computational scaling of the present implementations that can routinely handle active spaces of hundreds of electrons. The accuracy of the models is studied in the π space of the polyenes, in hydrogen chains (H50), and in the π space of polyacene molecules. In all cases, the results compare favorably to density matrix renormalization group values. With the novel implementation of PQ, active spaces of 140 electrons in 140 orbitals can be solved in a matter of minutes on a single core workstation, and the relatively low polynomial scaling means that very large systems are also accessible using parallel computing.
I. INTRODUCTION
Efficient handling of strong correlation is one of the still unresolved questions in computational chemistry. Outside the scope of the otherwise useful Kohn–Sham density-functional theory,1–3 approaches to strong correlation lean heavily on wave function theories. The conventional way to treat strong correlation is through multiconfigurational self-consistent field (MCSCF) theory, which scales exponentially as the size of the active space grows. As a result, MCSCF cannot be applied to active spaces significantly larger than 16 electrons in 16 orbitals, although the barrier has been recently pushed back through approximative stochastic4,5 and adaptive6–10 approaches to the full configuration interaction (FCI) problem.
Now, while MCSCF relies on a configuration interaction (CI) type ansatz for the wave function, where |Φ〉 is the single-particle reference and |Ψ〉 the true, multiconfigurational wave function, one might ask whether a coupled-cluster11 (CC) type ansatz could be used instead to formulate a MCSCF-type model. Often the CC approach is also used to describe dynamic correlation with a CI reference, as in the multireference CC (MR-CC) approach,12 but MR-CC is much more complicated than single-reference CC, and combining both strong correlation and dynamic correlation within the same machinery would be clearly advantageous. While in principle CC theory is well able to describe static correlation, the problem that arises upon its application is that the computational effort of the resulting procedure is prohibitively expensive, even if an active space is used.13 However, this problem has been recently solved by Parkhill, Lawler, and Head-Gordon, who proposed a truncation of CC for exactness within the active space for a given number of strongly interacting electron pairs,14,15 instead of the usual, global truncation based on the excitation level. For example, the perfect quadruples14 (PQ) model is formed by truncating CC theory with single through quadruple (CCSDTQ) excitations to two pairs by keeping only the amplitudes that satisfy {ak, ik} ∈ {pair1} × {pair2}. Similarly, the perfect hextuples15 (PH) model is a three-pair truncation of CC theory with single through hextuple excitations (CCSDTQ56), in which only the amplitudes that satisfy {ak, ik} ∈ {pair1} × {pair2} × {pair3} are kept. Here, the concept of a pair arises from connection to the perfect pairing16–20 (PP) theory, in which pair i is described by a quartet of orbitals: the alpha and beta orbitals i and , respectively, that are occupied in the single-particle reference, as well as the corresponding alpha and beta virtual orbitals i∗ and , respectively. In contrast to MCSCF, PQ and PH exhibit polynomial scaling with respect to the size of the active space. The storage requirements are modest, scaling as O(N2) for PQ and O(N3) for PH, N being the number of active orbitals in the calculation. The computational effort scales asymptotically as O(N4) for both models.
Another widely used approach for strong correlation is the density matrix renormalization group (DMRG) method,21–25 which is exact in principle and which also offers polynomial scaling with respect to system size. However, the method has a significant prefactor: the storage and computational costs scale as25,26 O(M2N3) and O(M3N3) + O(M2N4), respectively, where M is the number of states kept in the calculation, the exact solution being recovered at the limit M to ∞. Furthermore, as a fundamentally one-dimensional approach, DMRG has a history of being hard to use due to issues with orbital ordering, although automatic orbital ordering schemes have recently made DMRG calculations easier to perform. We note that other approaches for a compact description of strong correlation have also been suggested, such as the CC valence bond (CCVB) method,27–29 extended CC,30–32 and the multifacet graphically contracted function method.33,34
In order to apply MCSCF, DMRG, PQ, or PH to studies of chemical systems, the additional description of dynamic correlation is usually of utmost importance. While MCSCF and DMRG typically rely on a two-pronged approach to dynamic correlation through CI, CC, or perturbative approaches, dynamic correlation can be treated on the same footing as static correlation in PQ and PH,35 enabling the two types of correlation to interact. Note that perturbative approaches can be used as well in combination with PQ36 and PH.
What unites MCSCF, DMRG, PQ, and PH is that the choice of the set of active orbitals is a problem, and often the orbitals need to be optimized in order to fully capture static correlation effects. Orbital optimization is hard for PQ and PH, but earlier experience15 suggests that in many cases orbitals optimized at the much simpler PP level of theory might be sufficient for a wide variety of applications. Thus, in the present work, only localized Hartree–Fock orbitals and PP optimized orbitals will be considered. Orbitals optimized with CCVB or unrestricted Hartree–Fock natural orbitals37–39 might also prove useful. However, because the feasible size of the active space is much larger in PQ and PH than in MCSCF or even DMRG, it is likely that for full-valence calculations, the choice of the active space orbitals should not turn out to be an impediment to the use of the PQ and PH models.
While encouraging results have been achieved with PQ and PH, their existing implementations (while exhibiting the correct O(N4) scaling with system size) have a too large prefactor to allow their application to studies of chemical problems. Because the previous implementations are based on the use of sparse tensor algebra,40 it is worth asking if the prefactor could be reduced by the use of dense tensor storage, instead. Moreover, the existing sparse tensor implementation does not take advantage of the mathematical structure of the PQ and PH methods, which results in extra work for bookkeeping and prevents efficient compiler optimization from taking place.
In the present work, we describe the dense tensor implementation of the PQ and PH models and benchmark the resulting code against the previous implementation on linear polyenes. Novel applications of the PQ and PH models are demonstrated for the H50 chain, and the results are compared to DMRG. Applications of PQ and PH to the strongly correlated π space of the polyacene series are also presented. The organization of the manuscript is the following. In Sec. II, we illustrate how the truncation of CC theory is performed. Next, in Sec. III, we describe various aspects of how the procedure has been carried out in practice. In Sec. IV, we present the scaling of the new code when applied to linear polyenes, and the energies for the polyenes, the dissociation of the H50 chain, as well as the polyacenes for which natural occupation numbers are also presented. The study concludes with a summary and brief discussion.
II. THEORY
The theory behind the CC approach is well established and shall not be discussed further here.41,42 The starting point for the present truncation is the CC equations in spin-integrated form, consisting of tensors that belong to a certain spin block, such as the αβ block of the double excitation amplitudes, where α and β denote spin-up and spin-down, respectively. As a truncation of CC theory, the models in the present hierarchy inherit all the beneficial properties of CC such as size extensivity, but also its drawbacks such as non-variational behavior in cases where the excitation rank in CC is not high enough to describe the static correlation effects.
In the following, we will omit the spin indices, because the discussion applies to all spin blocks of all tensors that appear in CC theory. We will mostly be focusing on the PQ model, as it already illustrates all of the necessary aspects for a general truncation method without being overly complicated. The division of the orbital space that is used in the present work is illustrated in Figure 1. The system, in a singlet reference state, is composed of a set of inactive core orbitals, a set of N active occupied orbitals, a set of N active virtual orbitals, and a set of inactive virtual orbitals. The tensors such as the excitation amplitudes t, the two-electron integrals v, the de-excitation amplitudes λ, and the one- and two-electron density matrices γ and Γ are truncated using a block decomposition. For instance, for PQ, the double excitation tensor is expanded as
where i and j denote occupied orbitals, a and b denote virtual orbitals, P and Q are pairing labels that run over the active space, and δPQ is the Kronecker delta. It is easy to see that Equation (1) for the double excitation amplitudes agrees with the original definition of the PQ model as the set of amplitudes that satisfy {ak, ik} ∈ {pair1} × {pair2}. As this kind of notation is unusual to most chemists, we will try to clarify the meaning of Equation (1) in the following. In the convention adapted above, all the orbitals in the alpha and beta occupied and virtual spaces are numbered from 1 to N, which is also the range spanned by the pairing labels. Because we have assumed that the spin has been integrated out, all the quantities in Equation (1) are scalar numbers. The indices i, j, a, b are fixed by the left hand side of the equation, which represents the element of the excitation tensor under decomposition. Applying the decomposition to, for example, the element with k≠c will only yield a contribution from the last term of Equation (1) (which equals ), because the first term would require k = c and all the other terms would contribute only for P = Q which has been excluded in the summations.
The restriction to identically sized active occupied and virtual spaces becomes apparent here, as the same pairing labels cannot be used for spaces of different sizes. Each value of a pairing label corresponds to a quartet of orbitals: the alpha and beta occupied orbital and the alpha and beta virtual orbital. The tensor decomposition in Equation (1) is known in mathematics as a block decomposition.43,44 For PH, in addition to the 8 cases shown in Equation (1), one obtains 63 more, corresponding to all the inequivalent ways of distributing three labels to four indices: for instance, in PQ is equivalent to as the dummy summation indices P and Q can be interchanged. Having enumerated the different subtensor representations for all the necessary tensors, one proceeds by inserting them into the CC equations and by performing the summations over the Kronecker deltas. When this is done, one ends up with a formulation of PQ or PH theory in terms of the dense subtensors. For example, in the case of PH, the original hextuple excitation operator , which is a rank-12 tensor, is reduced to a set of rank-3 tensors, and any summations that are left over are only with respect to the pairing labels. While, in principle, the doubles contribution to the opposite spin correlation energy
has 8 × 8 = 64 distinct subtensor contributions in PQ (and 5041 in PH), it is seen that due to orthogonality of the Kronecker indices, the PQ expression becomes simply
The limitation to P≠Q in Equation (3) and similar equations can be easily lifted by establishing a convention in which all diagonals of the subtensors vanish.
One can identify a set of 8 subtensors in Equations (1) and (3): , , , , , , , and . The first term in Equation (1), , can be identified as the PP amplitude (singles are usually not included in the definition of PP as the orbitals are normally optimized). As the simplest truncation (exactness for two electrons in two orbitals), PP has a special place within the hierarchy of the truncated models. Because PP couples every occupied orbital to a specific virtual orbital, the amplitudes are independent and can be solved for in a closed form. Orbital optimization can then be done in a routine fashion even for large systems.45 The terms , , and in Equation (1) are the basis of the imperfect pairing (IP) model,46 which is considerably more complicated than PP but has still been originally been derived by hand, and for which orbital optimization is still a routine task.
However, in addition to the terms in Equation (1) that represent double excitations, PQ also includes single, triple, and quadruple excitations, and unlike PP and IP, PQ is not restricted to opposite-spin correlation, so the derivation of the equations for PQ would be a challenging task by hand. Fortunately, it is an ideal task for a computer.
The antisymmetry between same-spin indices can be used to eliminate redundant storage of equivalent contributions (e.g., for the αα and ββ blocks in Equation (1)). The resulting storage costs for the subtensors that appear at the PP, PQ, and PH levels of theory are shown in Table I. For instance, for PQ, the storage cost is 17N2 + 3N for the amplitudes, 113N2 + 19N for the density matrices, and 115N2 + 23N for the integrals. The figures in Table I also include the classes of integrals that are only needed for the calculation of the CC pseudoenergy; for a t amplitude-only program, the amount of integrals would be slightly smaller. Table II details the distribution of the total amplitude subtensors into the individual excitation ranks.
. | Rank-1 . | Rank-2 . | Rank-3 . |
---|---|---|---|
t/λ | 3 | 17 | 64 |
f | 10 | 6 | 0 |
v | 13 | 109 | 96 |
γ | 6 | 6 | 0 |
Γ | 13 | 107 | 96 |
. | Rank-1 . | Rank-2 . | Rank-3 . |
---|---|---|---|
t/λ | 3 | 17 | 64 |
f | 10 | 6 | 0 |
v | 13 | 109 | 96 |
γ | 6 | 6 | 0 |
Γ | 13 | 107 | 96 |
. | Rank-1 . | Rank-2 . | Rank-3 . |
---|---|---|---|
T1/Λ1 | 2 | 2 | |
T2/Λ2 | 1 | 9 | 8 |
T3/Λ3 | 5 | 29 | |
T4/Λ4 | 1 | 21 | |
T5/Λ5 | 5 | ||
T6/Λ6 | 1 |
. | Rank-1 . | Rank-2 . | Rank-3 . |
---|---|---|---|
T1/Λ1 | 2 | 2 | |
T2/Λ2 | 1 | 9 | 8 |
T3/Λ3 | 5 | 29 | |
T4/Λ4 | 1 | 21 | |
T5/Λ5 | 5 | ||
T6/Λ6 | 1 |
Due to the truncation of the amplitudes and integrals, PP, PQ, and PH are not invariant to rotations in the occupied–occupied or virtual–virtual blocks, unlike their parent CC theories. The same will also apply perfect octuples (POs) theory, which would be a truncation of CC theory with single through octuple excitations (CCSDTQ5678) based on exactness for eight electrons in eight orbitals. This is somewhat counterintuitive, because in contrast to PP, PQ, and PH, the full set of one- and two-electron integrals is used in PO, encoding in principle all there is to know about the system. But, the triple through octuple excitation operators would still not be fully described within PO, as the truncation is based on orbital labels, the used set of orbitals would still be an issue in PO. However, seeing as the issues with the choice of orbitals clearly ease up going from PP to PQ to PH, PO might not be very sensitive to the orbitals after all.
Due to the one-to-one coupling of occupied and virtual orbitals, PP (and IP) is obviously dependent on the relative ordering of the occupied and virtual orbitals. But, the same property also applies to the PQ and PH models. For instance, if one interchanges the virtual orbitals 2 and 3, the PQ doubles amplitude in the original numbering would become in the renumbered system. However, is not included in the PQ model, so the description of the system has changed. By a similar analysis, it is easy to see that the dependence on the orbital orderings holds for all tensors that are truncated in the subtensor expansion. Due to this common property, we refer to the hierarchy of the PP, PQ, and PH (as well as higher) models as the perfect pairing hierarchy to underline the pairing of the orbital quartets, formed of occupied and virtual alpha and beta orbitals, at each level of theory.
The origin of the dependence on the relative ordering of the occupied and virtual orbitals is easy to understand on physical grounds. In order to reduce the computational scaling, we wished to retain only the strongest interactions between orbitals. The pairing of occupied orbitals to virtual orbitals introduces a sense of locality in the model that results in fast convergence of the truncation, as in the usual local dynamic correlation methods.47,48 The models in the PP hierarchy are, however, invariant to swaps between pairs (i.e., quartets) of orbitals.
The models in the PP hierarchy assume a pairing of the occupied and virtual orbitals before the calculation is began. While PP optimized orbitals will produce the best possible pairing in the sense of reproducing the lowest energy at the PP level of theory, it is not clear that the PP orbitals, which usually turn out to be well-localized, will be optimal for the PQ or PH models: for instance, in polyacenes, the optimal orbitals for CCVB (which describes more strong correlation than PP) turn out to be a bit more delocalized than PP orbitals.29 In all our experience so far, the optimal orbitals have turned out to be localized. However, as stated in the Introduction, variational optimization of the orbitals for PQ and PH is hard, which means that one may need to resort to approximate methods of obtaining localized orbitals with matching virtual orbitals. But, this may not be a problem after all: while orbital optimization is necessary at the PP level of theory due to its aggressive truncation (which also makes the optimization problem tractable), it is plausible that the use of optimal orbitals is less important at higher levels of the hierarchy, because at every step the models become closer to FCI which is orbital invariant. Reasonable orbitals can be obtained from a Hartree–Fock or density-functional theory calculation by localizing the occupied orbitals using, e.g., the Foster–Boys49 (FB), Edmiston–Ruedenberg50 (ER), or Pipek–Mezey51,52 (PM) criteria, after which matching virtual orbitals can be obtained, e.g., by using the Sano procedure.53 In the present work, both PP orbitals and generalized Pipek–Mezey orbitals52 are used.
III. IMPLEMENTATION
We have written a C++ program that generates C++ source code for the PP, PQ, and PH models, starting from CC equations in spin-orbital form. In the generated code, the subtensors are stored in memory separately as arrays. Subtensor symmetry, like for the same-spin doubles excitation amplitudes, is not used in the storage. As is usual for CC theory, the implementation is based on the use of intermediates in evaluating nested tensor products, such as in the singles contribution to the doubles amplitude
Pairings are also generated for the intermediates, and the index antisymmetries of the intermediates are used to eliminate labelings that vanish by Pauli exclusion. The truncation of the intermediates and integrals is controlled separately from that of the excitation amplitudes. For PQ the integrals are truncated to 2 pairs while the intermediates can be truncated to either 2 or 3 pairs, and for PH a 3-pair truncation is always used. As a result, the solution of PQ will scale as O(N3) or O(N4), depending on which truncation of the intermediates is used, and PH will also scale as O(N4). In the present work, both PQ and PH use 3-pair intermediates, yielding asymptotic O(N4) scaling for both models.
The program performs the pairings by brute force by looping over all the necessary index permutations of the amplitude diagrams in spin-orbital form, performing the integration over spin into different spin blocks, and performing the pairing for each block separately. The generation of the PQ model was found to take roughly 3 min on a single core of a workstation, while generation of the PH model took 14.5 days on 32 cores on a cluster node. Almost all the time for the PH model was spent in pairing the hextuple t excitation amplitudes.
The equations for the subtensor contributions are saved on disk in human-readable form. OpenMP parallelization is performed over the individual contributions. For example, the PQ model has 4529 different contributions to the t amplitudes, while PH has 213 234 different contributions to the t amplitudes. Altogether, the generated PQ equations take 27 megabytes of disk space in 10 080 files, while the PH equations take four gigabytes of disk space in almost a quarter million files.
Besides merging contributions that require identical sets of intermediates, no attempt has been made to search for an optimal reduced set of contributions. This procedure reduces the amount of separate contributions by roughly one half from the numbers given above. While common subexpression elimination techniques commonly performed in conventional CC theory could clearly be used here, it appears that because of the introduction of a large variety of subtensors, more optimal sets of intermediates could be found with a thorough graph search.
Only a single copy of the integrals is kept in memory. A disk based direct inversion in the iterative subspace54 (DIIS) approach is used to accelerate the convergence of the CC amplitudes.55
In contrast to conventional CC theory, the pairing of the indices introduces elementwise (Hadamard) products in the pairwise tensor products. Now, while the number of permutations of a rank-n tensor increases as n!, the number of possible products of two rank-n tensors would increase as (n!)2, which becomes unmanageable even for relatively small n. For this reason, our implementation is based on hardcoded C++ computation kernels for each permutationally invariant type of contribution, where indices are either fixed (output index appears only on one of the two tensors), elementwise multiplied over (output index appears on both tensors), or contraction indices (index appears on both tensors but is not an output index). The kernels are machine generated in an external library, both as a for-loop-only implementation and as code calling basic linear algebra subprogram (BLAS) routines wherever applicable. When the library is generated, the two kinds of implementations are benchmarked to determine which one to use for a given kernel.
The two-electron integrals are evaluated in the program from B matrices that can be generated by the resolution of the identity56 (RI) or Cholesky decomposition57 (CD) techniques, as their integral transforms scale more favorably than that of the traditional two-electron integrals, and as the RI/CD representation allows for cherrypicking of the wanted integral elements. The Fock and RI/CD B matrices may currently be obtained from either Q-Chem58,59 or ERKALE.60,61 Both programs have been utilized in the present work: Q-Chem for calculations using PP orbitals and ERKALE for calculations with localized Hartree–Fock orbitals. In the latter case, the active occupied orbitals are localized using the generalized Pipek–Mezey method52 using the intrinsic atomic orbital partial charge estimate.62 The gradient descent method used for the localization has been described elsewhere.63 After the occupied space has been localized, corresponding virtual orbitals are obtained for each active occupied orbital using the Sano procedure.53 All calculations performed with Q-Chem used exact integrals for the formation of the Fock matrices, and RI or CD for forming the B matrices for the two-electron integrals for the CC procedure. The calculations performed with ERKALE used only Cholesky integrals for both the Fock and B matrices. To ensure near-exactness, a very small truncation threshold (10−8) was used for the Cholesky procedure both in ERKALE and Q-Chem.
IV. RESULTS
A. Scaling
To demonstrate the scaling of the novel implementations of PQ and PH with respect to the original implementation of the PQ model, we study full-valence calculations on linear all-trans polyenes CnHn+2 with the geometries described in Ref. 26. The STO-3G basis set64 and PP optimized orbitals are used, and the two-electron integrals are formed with RI using the auxiliary basis set corresponding to the correlation consistent double-ζ basis set.65 The timings shown correspond to the use of a single core on one of the authors’ (S.L.) desktop computer with an Intel Core i7-4770 processor and 32 gigabytes of memory. No data are shown for sparse PH, because due to inefficiencies in its implementation, getting sensible scaling data would have required hundreds of gigabytes of memory.
While the use of local PP orbitals in a quasi-1D molecule clearly would favor algorithms employing sparsity, the results shown in Figure 2 emphatically show the tremendous speedup achieved with the dense tensor formalism that does not use screening of small elements at all. The distribution of the computational work with the PQ model is shown in Table III. As is seen, most of the work in the model is performed in the multiplications, with permutations also contributing a nontrivial fraction of the total runtime. The compute kernels and permutations account altogether for 85% of the runtime of the model, with another 11% being spent on dynamic memory management. A small fraction of the total runtime for large systems is lost due to inefficiencies in the present implementation, which are also obvious in Figure 2 for small systems for which the time for solution is quasi-constant. (Due to the larger size of the equations, the inefficiencies are even larger for PH.) The scaling plot also shows some bumps, which are likely artifacts of background processes running simultaneously on the workstation.
The mathematical scaling of the models is determined by the individual multiplication kernels that appear in the model. The individual multiplication kernels for the three-pair intermediate PQ model are shown in Table IV. Here, all the kernels that contain a contraction use BLAS implementations. For instance, the heaviest kernel in Table IV is implemented with a dgemv call within a for-loop over x, over which there is an elementwise product. Similarly, the second-heaviest kernel is also a dgemv call inside a for-loop over the y index. The 7th heaviest kernel is just a single dgemm call. As can be seen from the kernels, PQ on C52H54 with an active space of 202 electrons in 202 orbitals is still in the cubic scaling regime, as the heaviest kernels on the list are cubic scaling. In contrast, PH on C14H16 with an active space of 72 electrons in 72 orbitals is on the verge of turning from cubic to quartic scaling, as while the heaviest kernel (3.3 processor hours) is cubic scaling, the second-heaviest kernel (2.7 processor hours) is quartic scaling.
The novel implementation of PQ is fast even for large orbital spaces, and typically much greater effort is spent in the orbital optimization with either the self-consistent field method or PP, as well as the integral transforms: detailed timings for the different steps in the calculation are shown in Table V. As can be seen from Figure 2, the PQ calculation for an active space of 140 electrons in 140 orbitals can be performed in a matter of minutes on a single core with the current version of the code that has not yet been heavily optimized. Even lower scaling could also be obtained by truncating the intermediates in PQ to two pairs, but this would also limit the accuracy of the model.14
Time spent in multiplication kernels (Table IV) | 3617.84 |
Time spent in addition kernels | 42.34 |
Time spent in permutations | 2331.36 |
Memory allocation | 761.73 |
Integrals | 9.27 |
Diagonal zeroing | 39.72 |
Denominator application | 0.17 |
Sum of above | 6802.40 |
Total elapsed wall time | 7047.52 |
Time spent in multiplication kernels (Table IV) | 3617.84 |
Time spent in addition kernels | 42.34 |
Time spent in permutations | 2331.36 |
Memory allocation | 761.73 |
Integrals | 9.27 |
Diagonal zeroing | 39.72 |
Denominator application | 0.17 |
Sum of above | 6802.40 |
Total elapsed wall time | 7047.52 |
Kernel . | Time (s) . | Scaling . | BLAS . |
---|---|---|---|
o(x)←l(x) r(x) | 0.00 | N | No |
o(x, y)←l(x, y) r(x, y) | 0.01 | N2 | No |
o(x)←l(x) r(x) | 0.05 | N | No |
o(x, y)←l(x) r(y) | 0.07 | N2 | No |
o(x)←∑yl(x, y) r(y) | 0.56 | N2 | Yes |
o(x, y, z)←l(y, z) r(x) | 1.27 | N3 | No |
o(x, y)←l(x, y) r(y) | 13.51 | N2 | No |
o(x)←∑yl(x, y) r(x, y) | 21.25 | N2 | Yes |
o(x, y)←l(x, y) r(x) | 23.31 | N2 | No |
o(x, y)←∑zl(x, z) r(y, z) | 24.90 | N3 | Yes |
o(x, y, z)←l(x, y, z) r(z) | 28.69 | N3 | No |
o(x, y)←l(x, y) r(x, y) | 47.89 | N2 | No |
o(x, y, z)←∑wl(y, z, w) r(x, w) | 83.04 | N4 | Yes |
o(x, y, z)←l(x, z) r(y, z) | 110.31 | N3 | No |
o(x, y, z)←l(x, y, z) r(x) | 125.27 | N3 | No |
o(x, y, z)←∑wl(x, y, w) r(z, w) | 127.99 | N4 | Yes |
o(x, y, z)←l(x, y, z) r(x, z) | 145.91 | N3 | No |
o(x, y, z)←l(x, y, z) r(x, y) | 151.28 | N3 | No |
o(x, y, z)←l(x, y) r(y, z) | 501.45 | N3 | No |
o(x, y, z)←l(x, y) r(x, z) | 502.06 | N3 | No |
o(x, y)←∑zl(x, y, z) r(y, z) | 521.73 | N3 | Yes |
o(x, y)←∑zl(x, y, z) r(x, z) | 1187.29 | N3 | Yes |
Kernel . | Time (s) . | Scaling . | BLAS . |
---|---|---|---|
o(x)←l(x) r(x) | 0.00 | N | No |
o(x, y)←l(x, y) r(x, y) | 0.01 | N2 | No |
o(x)←l(x) r(x) | 0.05 | N | No |
o(x, y)←l(x) r(y) | 0.07 | N2 | No |
o(x)←∑yl(x, y) r(y) | 0.56 | N2 | Yes |
o(x, y, z)←l(y, z) r(x) | 1.27 | N3 | No |
o(x, y)←l(x, y) r(y) | 13.51 | N2 | No |
o(x)←∑yl(x, y) r(x, y) | 21.25 | N2 | Yes |
o(x, y)←l(x, y) r(x) | 23.31 | N2 | No |
o(x, y)←∑zl(x, z) r(y, z) | 24.90 | N3 | Yes |
o(x, y, z)←l(x, y, z) r(z) | 28.69 | N3 | No |
o(x, y)←l(x, y) r(x, y) | 47.89 | N2 | No |
o(x, y, z)←∑wl(y, z, w) r(x, w) | 83.04 | N4 | Yes |
o(x, y, z)←l(x, z) r(y, z) | 110.31 | N3 | No |
o(x, y, z)←l(x, y, z) r(x) | 125.27 | N3 | No |
o(x, y, z)←∑wl(x, y, w) r(z, w) | 127.99 | N4 | Yes |
o(x, y, z)←l(x, y, z) r(x, z) | 145.91 | N3 | No |
o(x, y, z)←l(x, y, z) r(x, y) | 151.28 | N3 | No |
o(x, y, z)←l(x, y) r(y, z) | 501.45 | N3 | No |
o(x, y, z)←l(x, y) r(x, z) | 502.06 | N3 | No |
o(x, y)←∑zl(x, y, z) r(y, z) | 521.73 | N3 | Yes |
o(x, y)←∑zl(x, y, z) r(x, z) | 1187.29 | N3 | Yes |
Molecule . | norb . | SCF . | PP . | B matrix . | PQ ints . | PQ t . | PQ perm . | PQ flop . | PH ints . | PH t . | PH perm . | PH flop . |
---|---|---|---|---|---|---|---|---|---|---|---|---|
C4H6 | 22 | 0.23 | 0.35 | 0.68 | 0.00 | 66.63 | 1.3 | 1.8 | 0.05 | 6942.94 | 4.8 | 14.2 |
C6H8 | 32 | 0.27 | 0.73 | 0.86 | 0.01 | 75.42 | 1.8 | 3.9 | 0.15 | 12 228.76 | 8.8 | 28.2 |
C8H10 | 42 | 0.35 | 1.45 | 1.13 | 0.03 | 90.27 | 2.7 | 7.5 | 0.39 | 20 436.82 | 12.8 | 41.9 |
C10H12 | 52 | 0.44 | 2.37 | 1.65 | 0.06 | 105.08 | 4.3 | 12.7 | 0.79 | 39 056.60 | 16.8 | 48.9 |
C12H14 | 62 | 0.64 | 3.65 | 2.47 | 0.11 | 120.68 | 5.6 | 17.8 | 1.49 | 59 499.11 | 19.2 | 54.5 |
C14H16 | 72 | 1.10 | 5.05 | 3.89 | 0.18 | 144.63 | 7.1 | 21.7 | 2.72 | 83 323.67 | 20.4 | 58.4 |
C16H18 | 82 | 1.71 | 7.10 | 5.89 | 0.26 | 144.47 | 9.2 | 29.8 | ||||
C18H20 | 92 | 2.48 | 9.92 | 8.83 | 0.37 | 169.61 | 11.9 | 34.9 | ||||
C20H22 | 102 | 3.54 | 13.07 | 13.56 | 0.48 | 197.28 | 13.7 | 39.6 | ||||
C22H24 | 112 | 6.98 | 19.06 | 18.81 | 0.68 | 225.59 | 15.5 | 44.0 | ||||
C24H26 | 122 | 7.83 | 24.13 | 26.78 | 0.89 | 274.37 | 18.5 | 47.5 | ||||
C26H28 | 132 | 9.11 | 30.51 | 37.99 | 1.11 | 332.18 | 20.6 | 50.0 | ||||
C28H30 | 142 | 10.53 | 37.52 | 55.58 | 1.40 | 400.48 | 22.5 | 52.8 | ||||
C30H32 | 152 | 12.07 | 46.01 | 70.23 | 1.73 | 510.74 | 26.5 | 51.5 | ||||
C32H34 | 162 | 13.64 | 55.16 | 104.76 | 2.11 | 610.46 | 27.1 | 53.4 | ||||
C34H36 | 172 | 15.62 | 66.35 | 130.54 | 2.51 | 713.33 | 27.6 | 54.9 | ||||
C36H38 | 182 | 17.26 | 77.59 | 161.94 | 2.99 | 847.72 | 28.5 | 56.1 | ||||
C38H40 | 192 | 19.32 | 94.63 | 242.02 | 3.50 | 1055.37 | 31.7 | 55.3 | ||||
C40H42 | 202 | 21.33 | 107.16 | 339.07 | 4.14 | 1351.42 | 30.8 | 51.7 | ||||
C42H44 | 212 | 25.34 | 125.60 | 436.98 | 4.77 | 1566.79 | 32.0 | 51.7 | ||||
C44H46 | 222 | 27.52 | 143.47 | 586.43 | 5.48 | 1888.20 | 32.5 | 51.4 | ||||
C46H48 | 232 | 30.12 | 166.73 | 1108.01 | 6.28 | 2020.66 | 32.0 | 51.7 | ||||
C48H50 | 242 | 33.16 | 188.38 | 1366.80 | 7.31 | 2343.37 | 32.4 | 51.6 |
Molecule . | norb . | SCF . | PP . | B matrix . | PQ ints . | PQ t . | PQ perm . | PQ flop . | PH ints . | PH t . | PH perm . | PH flop . |
---|---|---|---|---|---|---|---|---|---|---|---|---|
C4H6 | 22 | 0.23 | 0.35 | 0.68 | 0.00 | 66.63 | 1.3 | 1.8 | 0.05 | 6942.94 | 4.8 | 14.2 |
C6H8 | 32 | 0.27 | 0.73 | 0.86 | 0.01 | 75.42 | 1.8 | 3.9 | 0.15 | 12 228.76 | 8.8 | 28.2 |
C8H10 | 42 | 0.35 | 1.45 | 1.13 | 0.03 | 90.27 | 2.7 | 7.5 | 0.39 | 20 436.82 | 12.8 | 41.9 |
C10H12 | 52 | 0.44 | 2.37 | 1.65 | 0.06 | 105.08 | 4.3 | 12.7 | 0.79 | 39 056.60 | 16.8 | 48.9 |
C12H14 | 62 | 0.64 | 3.65 | 2.47 | 0.11 | 120.68 | 5.6 | 17.8 | 1.49 | 59 499.11 | 19.2 | 54.5 |
C14H16 | 72 | 1.10 | 5.05 | 3.89 | 0.18 | 144.63 | 7.1 | 21.7 | 2.72 | 83 323.67 | 20.4 | 58.4 |
C16H18 | 82 | 1.71 | 7.10 | 5.89 | 0.26 | 144.47 | 9.2 | 29.8 | ||||
C18H20 | 92 | 2.48 | 9.92 | 8.83 | 0.37 | 169.61 | 11.9 | 34.9 | ||||
C20H22 | 102 | 3.54 | 13.07 | 13.56 | 0.48 | 197.28 | 13.7 | 39.6 | ||||
C22H24 | 112 | 6.98 | 19.06 | 18.81 | 0.68 | 225.59 | 15.5 | 44.0 | ||||
C24H26 | 122 | 7.83 | 24.13 | 26.78 | 0.89 | 274.37 | 18.5 | 47.5 | ||||
C26H28 | 132 | 9.11 | 30.51 | 37.99 | 1.11 | 332.18 | 20.6 | 50.0 | ||||
C28H30 | 142 | 10.53 | 37.52 | 55.58 | 1.40 | 400.48 | 22.5 | 52.8 | ||||
C30H32 | 152 | 12.07 | 46.01 | 70.23 | 1.73 | 510.74 | 26.5 | 51.5 | ||||
C32H34 | 162 | 13.64 | 55.16 | 104.76 | 2.11 | 610.46 | 27.1 | 53.4 | ||||
C34H36 | 172 | 15.62 | 66.35 | 130.54 | 2.51 | 713.33 | 27.6 | 54.9 | ||||
C36H38 | 182 | 17.26 | 77.59 | 161.94 | 2.99 | 847.72 | 28.5 | 56.1 | ||||
C38H40 | 192 | 19.32 | 94.63 | 242.02 | 3.50 | 1055.37 | 31.7 | 55.3 | ||||
C40H42 | 202 | 21.33 | 107.16 | 339.07 | 4.14 | 1351.42 | 30.8 | 51.7 | ||||
C42H44 | 212 | 25.34 | 125.60 | 436.98 | 4.77 | 1566.79 | 32.0 | 51.7 | ||||
C44H46 | 222 | 27.52 | 143.47 | 586.43 | 5.48 | 1888.20 | 32.5 | 51.4 | ||||
C46H48 | 232 | 30.12 | 166.73 | 1108.01 | 6.28 | 2020.66 | 32.0 | 51.7 | ||||
C48H50 | 242 | 33.16 | 188.38 | 1366.80 | 7.31 | 2343.37 | 32.4 | 51.6 |
B. Accuracy
To study the accuracy of the generated models in the perfect pairing hierarchy, in the following, we apply the PP, PQ, and PH models on the π space of the polyenes used above for the scaling study, the symmetric and asymmetric dissociation of the H50 chain, as well as the strongly correlated π space of polyacene molecules. Hydrogen chains and polyacenes are standard chemical models for testing models of strong correlation.66–69 The π space correlation energies for the polyenes are shown in Table VI, from which the accuracy of the PP hierarchy already becomes apparent: even for the largest system, PH captures 99.3% of the full correlation energy.
Molecule . | PP . | PQ . | PH . | DMRG . |
---|---|---|---|---|
C4H6 | −0.080 460 | −0.091 502 | −0.091 502 | −0.091 502 |
C8H10 | −0.143 334 | −0.172 876 | −0.176 908 | −0.177 127 |
C12H14 | −0.205 042 | −0.253 097 | −0.261 466 | −0.262 297 |
C16H18 | −0.266 492 | −0.333 088 | −0.345 899 | −0.347 403 |
C20H22 | −0.327 882 | −0.413 030 | −0.430 310 | −0.432 498 |
C24H26 | −0.389 257 | −0.492 961 | −0.514 717 | −0.517 591 |
C28H30 | −0.450 629 | −0.572 889 | −0.599 123 | −0.602 684 |
C32H34 | −0.512 000 | −0.652 816 | −0.683 528 | −0.687 777 |
C36H38 | −0.573 371 | −0.732 743 | −0.767 934 | −0.772 870 |
C40H42 | −0.634 742 | −0.812 669 | −0.852 338 | −0.857 963 |
C44H46 | −0.696 112 | −0.892 595 | −0.936 743 | −0.943 056 |
C48H50 | −0.757 483 | −0.972 521 | −1.021 147 | −1.028 149 |
Molecule . | PP . | PQ . | PH . | DMRG . |
---|---|---|---|---|
C4H6 | −0.080 460 | −0.091 502 | −0.091 502 | −0.091 502 |
C8H10 | −0.143 334 | −0.172 876 | −0.176 908 | −0.177 127 |
C12H14 | −0.205 042 | −0.253 097 | −0.261 466 | −0.262 297 |
C16H18 | −0.266 492 | −0.333 088 | −0.345 899 | −0.347 403 |
C20H22 | −0.327 882 | −0.413 030 | −0.430 310 | −0.432 498 |
C24H26 | −0.389 257 | −0.492 961 | −0.514 717 | −0.517 591 |
C28H30 | −0.450 629 | −0.572 889 | −0.599 123 | −0.602 684 |
C32H34 | −0.512 000 | −0.652 816 | −0.683 528 | −0.687 777 |
C36H38 | −0.573 371 | −0.732 743 | −0.767 934 | −0.772 870 |
C40H42 | −0.634 742 | −0.812 669 | −0.852 338 | −0.857 963 |
C44H46 | −0.696 112 | −0.892 595 | −0.936 743 | −0.943 056 |
C48H50 | −0.757 483 | −0.972 521 | −1.021 147 | −1.028 149 |
Next, while H50 is chemically uninteresting, it is an excellent system for studying models for strong correlation. When the interatomic separation is increased, the system switches from a metallic state to an insulating state with strong multireference character in the intermediate region that is not captured by CC with full single and double (CCSD) or CC with full single and double and perturbative triple excitations (CCSD(T)).26 In the symmetric dissociation, the equispaced chain is stretched apart into noninteracting H atoms. In the asymmetric dissociation, the system is composed of 25 H2 molecules with bond length R = 1.4a0, where a0 is the Bohr radius, which are placed on the z axis, and the energy is studied as a function of the intermolecular distance. Here, the STO-6G basis set64 and PP optimized orbitals are used. The two-electron integrals for PQ and PH are generated with CD. The DMRG data have been taken from Ref. 26.
The obtained total energies for asymmetric dissociation are shown in Figure 3. For large intermolecular separation, the system is essentially composed of noninteracting H2 molecules, for which already PP is exact. When the molecules are pushed closer, the differences in the performance of the models become apparent: while PP quickly starts deviating from the exact solution, PQ and PH stay more accurate on a wider range of correlation strength. This is also clearly visible in the fraction of correlation energy captured by the models shown in Figure 4. For the most strongly correlated geometry, in which the atoms are equidistant, PP, PQ, and PH capture 35.2%, 63.9%, and 87.9% of the correlation energy, respectively. PH becomes practically exact (disagreeing from the DMRG correlation energy by at most 1%) at the distance d = 2.0a0, where it captures 99.2% of the correlation energy. For PQ, practical exactness is reached at d = 2.8a0 with 99.4% of the correlation energy retained. PP reaches exactness at d = 4.2a0, where it describes 99.0% of the correlation.
The symmetric dissociation is a much harder problem, but even here the models fare seemingly well, as shown by the total and correlation energy plots in Figures 5 and 6, respectively. As before with the asymmetric stretch, a clear hierarchy is again seen between the models, with the most strongly correlated geometry showing 22.0%, 47.6%, and 80.9% of the correlation energy being captured by PP, PQ, and PH, respectively. As the chain is pulled apart, the models quickly become more accurate, PH reaching practical exactness at separation d = 2.8a0 where it captures 99.0% of the correlation energy. Unfortunately, for larger interatomic spacings, the CC iterations diverged for PQ and PH (both with the zero and PP guess for the amplitudes); thus, some data points are missing. However, because PP appears to become better and better for large separations, the divergence issue might be solved by the use of a better preconditioner.
The linear polyacenes are known to exhibit multiradical behavior in the π space.70 Next, we will try to match the DMRG energies in the STO-3G basis from Ref. 70 using their geometries. Generalized Pipek–Mezey localized active occupied orbitals paired with Sano virtuals were used. The energies obtained with the PP, PQ, and PH models compared against the DMRG reference are shown in Table VII. Here, the models capture 44%–59%, 69%–88%, and 96%–99% of the DMRG correlation energy for PP, PQ, and PH, respectively. We can also compare the natural occupation numbers produced by PP, PQ, and PH against the ones from DMRG; this is done in Figure 7. While PP and PQ successfully capture a significant amount of the correlation energy, they fail to capture the strong correlation effects in the π space of the polyacenes. For PP, the occupation number plot has an almost step function like character, while PQ reproduces some curvature but is still far from the DMRG reference. In contrast, PH reproduces a clear signal of strong correlation that is strikingly similar to the DMRG reference values. Quantitative agreement with DMRG is not, however, reached, which we primarily attribute to the use of non-optimal orbitals. Indeed, other choices for the orbital localization method (see the supplementary material) demonstrate that a different choice for the active space orbitals affects the natural orbital occupation numbers, as well as the fraction of energy captured. Furthermore, as generalized Pipek–Mezey localized Hartree–Fock orbitals were found to yield better energies than PP optimized orbitals at the UPH level of theory, we conclude that as has been seen for CCVB,29 PQ and PH presumably need orbitals that are less localized than those reproduced by PP, and the best choice for the (non-optimized) orbitals is still an open question.
Molecule . | Active space . | PP . | PQ . | PH . | DMRG . |
---|---|---|---|---|---|
2acene | (10e,10o) | −0.104 902 | −0.157 684 | −0.177 098 | −0.178 294 |
3acene | (14e,14o) | −0.144 544 | −0.213 115 | −0.249 895 | −0.254 307 |
4acene | (18e,18o) | −0.190 182 | −0.279 345 | −0.325 491 | −0.332 661 |
5acene | (22e,22o) | −0.228 839 | −0.335 088 | −0.401 098 | −0.412 627 |
6acene | (26e,26o) | −0.265 084 | −0.393 537 | −0.479 533 | −0.495 683 |
8acene | (34e,34o) | −0.322 287 | −0.495 219 | −0.641 147 | −0.668 526 |
10acene | (42e,42o) | −0.382 622 | −0.595 840 | −0.803 505 | −0.838 580 |
12acene | (50e,50o) | −0.444 834 | −0.698 165 | −0.968 521 | −1.007 378a |
Molecule . | Active space . | PP . | PQ . | PH . | DMRG . |
---|---|---|---|---|---|
2acene | (10e,10o) | −0.104 902 | −0.157 684 | −0.177 098 | −0.178 294 |
3acene | (14e,14o) | −0.144 544 | −0.213 115 | −0.249 895 | −0.254 307 |
4acene | (18e,18o) | −0.190 182 | −0.279 345 | −0.325 491 | −0.332 661 |
5acene | (22e,22o) | −0.228 839 | −0.335 088 | −0.401 098 | −0.412 627 |
6acene | (26e,26o) | −0.265 084 | −0.393 537 | −0.479 533 | −0.495 683 |
8acene | (34e,34o) | −0.322 287 | −0.495 219 | −0.641 147 | −0.668 526 |
10acene | (42e,42o) | −0.382 622 | −0.595 840 | −0.803 505 | −0.838 580 |
12acene | (50e,50o) | −0.444 834 | −0.698 165 | −0.968 521 | −1.007 378a |
DMRG value of Ref. 70 was not converged.
V. SUMMARY AND DISCUSSION
We have described the low-scaling dense tensor implementation of a family of truncated CC methods and generated efficient implementations of the PQ and PH models. High-rank tensors are expressed in terms of a considerable number of subtensors, and summations over Kronecker delta indices are performed before computer code is generated. As straightforward truncations of CC theory, the models inherit all the properties of the parent theory, such as size extensivity, and pre-existing machinery for the calculation of properties could be used.
We have demonstrated impressive speedups (of roughly 200-500 times) compared to the earlier implementation that already had the correct O(N4) scaling for PQ, and demonstrated the rapid convergence of the PP, PQ, and PH models in describing strong correlation with applications to linear, all-trans polyenes, the dissociation of the H50 chain, as well as linear polyacenes. Although only minimal basis sets have been used in the present work due to limited availability of high-level reference data, the runtime of the models after integral transformation is basis set independent due to the use of an active space (and the lack of orbital optimization that has not been pursued in the present work). While the models are already computationally fast enough to allow their use in studies of chemical problems, we believe that they can still be made even faster by a variety of approaches. Most importantly, the role of the intermediates in the paired theories is clearly different from the intermediates in full-rank CC theory. A thorough procedure for identifying common factors between different contributions, as well as a graph-type approach for determining the optimal set of intermediates, could plausibly yield a further order of magnitude speedup to the existing implementation. In the case of closed-shell molecules, spin summation of the model would reduce storage costs for the amplitudes by roughly 50% and cut down on computational costs as well. A reorganization of the way the amplitude updates are performed might yield convergence in a smaller number of iterations.71 Because of the mathematical structure of the models, the storage of the tensors is already distributed, and massively parallel implementations of the models could be pursued.
While we have demonstrated that the truncation hierarchy advocated in the present work allows for applications to large systems, it is further important to note that in systems with little static correlation, it is possible to further speed things up. While in PQ and PH the tensors that correspond to excitations of different ranks have similar storage requirements (within an order of magnitude) as shown in Table II, the runtime tends to be dominated by the highest excitations due to the larger amount of antisymmetrizations necessary in the CC diagrams. Thus, should the higher excitation operators turn out to be insignificant, disabling them can yield further speedups. For example, the PH model with only single and double (and triple and quadruple) excitations enabled is essentially a local CCSD (CCSDTQ) approach, if the treatment is based on localized orbitals.
As we have seen in the application to the acene series, the choice of orbitals may be important for the perfect pairing hierarchy of models, which we will address in future work. In addition, we wish to extend the PQ and PH models to open-shell systems, as well as to include the description of dynamic correlation. The similarities between the PP truncation hierarchy and local CC methods could also be further explored.
SUPPLEMENTARY MATERIAL
See supplementary material for results on the polyacenes with various choices for the active space orbitals.
Acknowledgments
S.L. thanks Evgeny Epifanovsky, David Small, and Joonho Lee for discussions, and Johannes Hachmann for supplying the DMRG natural occupation numbers for the polyacenes. This work was supported by the Director, Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division of the U.S. Department of Energy, under Contract No. DE-AC02-05CH11231. J.P. thanks the University of Notre Dame’s College of Science and Department of Chemistry and Biochemistry for generous start-up funding.