In this communication, an improved perturbative triples correction (T) algorithm for domain based local pair-natural orbital singles and doubles coupled cluster (DLPNO-CCSD) theory is reported. In our previous implementation, the semi-canonical approximation was used and linear scaling was achieved for both the DLPNO-CCSD and (T) parts of the calculation. In this work, we refer to this previous method as DLPNO-CCSD(T0) to emphasize the semi-canonical approximation. It is well-established that the DLPNO-CCSD method can predict very accurate absolute and relative energies with respect to the parent canonical CCSD method. However, the (T0) approximation may introduce significant errors in absolute energies as the triples correction grows up in magnitude. In the majority of cases, the relative energies from (T0) are as accurate as the canonical (T) results of themselves. Unfortunately, in rare cases and in particular for small gap systems, the (T0) approximation breaks down and relative energies show large deviations from the parent canonical CCSD(T) results. To address this problem, an iterative (T) algorithm based on the previous DLPNO-CCSD(T0) algorithm has been implemented [abbreviated here as DLPNO-CCSD(T)]. Using triples natural orbitals to represent the virtual spaces for triples amplitudes, storage bottlenecks are avoided. Various carefully designed approximations ease the computational burden such that overall, the increase in the DLPNO-(T) calculation time over DLPNO-(T0) only amounts to a factor of about two (depending on the basis set). Benchmark calculations for the GMTKN30 database show that compared to DLPNO-CCSD(T0), the errors in absolute energies are greatly reduced and relative energies are moderately improved. The particularly problematic case of cumulene chains of increasing lengths is also successfully addressed by DLPNO-CCSD(T).
The importance of triples correction for coupled cluster with singles and doubles (CCSD) was recognized early on.1 The approximated iterative or non-iterative triples correction methods are more economical than full triples.2,3 Derived from CCSDT-1,4 a non-iterative method CCSD[T] was developed by Bartlett and co-workers in 1985.5 By adding a singles term correction to CCSD[T], Pople and co-workers proposed the CCSD(T) method in 1989.6 The CCSD(T) method is the most widely used wave function method for small molecules and is considered to be the “gold standard.” However, the high computational costs limit the applicability of canonical CCSD(T) to systems with not more than about 1000 basis functions on the most powerful hardware.7 Many groups have worked on new developments in order to be able to apply the CCSD(T) method to medium- and large-sized systems.8–20 The large and fast growing literature about low order or linear scaling CCSD algorithms has been extensively reviewed in earlier publications.11–13 In this communication, we only discuss the studies that are focused on the (T) correction. Schütz and Werner implemented the first projected atomic orbitals (PAOs) based linear scaling local (T) method in 2000.8,21 In their work, different variants of local (T) were explored. Their variant using the semi-canonical (SC) approximation is the most efficient one since it neglects the couplings between different triples by the off-diagonal Fock matrix elements and hence bypasses the storage of the triples amplitudes. This SC-based local (T) method is usually referred as LCCSD(T0). Another alternative in their paper is called LCCSD(T), in which the triples amplitudes are computed iteratively. More accurate (T) correlation energies can be recovered using converged amplitudes compared to the LCCSD(T0) results. However, the triples amplitudes expressed in PAOs for the virtual space have to be stored either on disk or in memory and must be updated iteratively. In the LCCSD(T) method, a large amount of storage is needed to apply the method to medium-sized systems. To overcome this difficulty, orbital-specific virtual (OSV) orbitals were used to reduce the virtual space of each triple.22 Furthermore, Schmitz and Hättig introduced the Laplace transformation (LT) technique23 and used pair natural orbitals (PNOs)24 in their recent PNO-CCSD(T) method.25,26 The storage bottleneck of the triples amplitudes is completely avoided by using the LT technique. The explicitly correlated method (F12) was integrated into their method as well, which can reproduce very accurate absolute energies. Alternatively, Schütz also extended their LCCSD(T) method to the CCSDT-1b method,5 which can avoid the storage of triple amplitudes as well.10
Besides the above mentioned direct local correlation methods, fragment-based local CCSD(T) method development is a very active area of research.14–20 Popular approaches include the cluster-in-molecule (CIM),15 the natural linear-scaled coupled cluster,16 the divide-expand-consolidate (DEC),20 the incremental,14 and the local natural orbital (LNO)18 methods, all of which have led to low-order or linear scaling CCSD(T) methods.17,19,27–29 Most recently, Nagy and Kállay combined their LNO approach with LT CCSD(T).29 Their new scheme is capable to compute very accurate CCSD(T) correlation energies for systems with more than ten thousand basis functions. By calculating all intermediates on the fly, there is no storage bottleneck anymore in their new method. More importantly, by using the LT technique, there is no redundant calculation in the triples amplitudes’ evaluation from the overlapping fragments.
The use of pair natural orbitals (PNOs)30–32 was revived and introduced into local correlation methods in 2009.24 Since then, a series of linear scaling CCSD(T) methods for both closed-shell and open-shell systems have been developed.12,13,33,34 In all these PNO-based CCSD(T) methods, the SC approximation is used. In the following, our previous domain based local pair-natural orbital (DLPNO)-CCSD(T)13 method is denoted as DLPNO-CCSD(T0), whereas the DLPNO-CCSD method with the improved perturbative triples correction, described in this communication, is abbreviated as DLPNO-CCSD(T).
Several studies have shown that the local (T0) method is a very good approximation to compute relative or reaction energies.12,26 However, two of us (Y.M. and L.C.) found that DLPNO-CCSD(T0) can fail for a few cases, for example, the atomization-like energy of cumulene chains, because of the large deviations from the canonical results caused by the (T0) corrections. To overcome this difficulty, we here present an iterative local (T) algorithm. Our strategy is similar to Schütz’s LCCSD(T) method.8 Only a subset of triples is considered in the (T), which scales linearly with the system size. Meanwhile, by using triples natural orbitals (TNOs) instead of PAOs, the virtual spaces of surviving triples are greatly reduced. To further speed up our algorithm, three additional approximations are introduced. In the following, after a detailed description of the algorithm, a series of benchmark calculations is conducted. Finally, a range of medium-sized to large-sized molecules is studied to demonstrate the accuracy and efficiency of the new DLPNO-CCSD(T) method.
The indices i,j,k,l are used to denote the occupied MOs, and a,b,c,d denote virtual MOs (TNOs). Note that for each ijk combination, there is a different set of quasi-canonical TNOs as denoted by a subscript aijk.12 Using converged DLPNO-CCSD singles- and doubles-amplitudes, the energy contribution of each triple is computed by
The in Eq. (1) is the converged singles amplitudes from the DLPNO-CCSD calculation. The number of explicitly treated index-triples ijk increases linearly with the systems size.13 The intermediate has the same definition as in our previous work.11 The construction of TNOs was adopted from the DLPNO-CCSD(T0) method as well. Since localized, occupied MOs are used, the amplitudes have to be computed iteratively in order to reproduce canonical (T) results,
where is the Fock matrix elements and are overlap matrices between TNO spaces of two different triples. Note that the TNOs are chosen to be quasi-canonical, and therefore, no coupling terms arise in the virtual block. In the DLPNO-CCSD(T0) method, the amplitudes are approximately computed by
All intermediates and amplitudes are evaluated on the fly under the (T0) approximation. The extension from the linear scaling (T0) approach to (T) is straightforward. In the (T0) calculation, only a subset of triples is considered, the number of which increases linearly with the system size. By using TNOs, the virtual space of each triple is greatly reduced. In fact, the number of TNOs per triple approaches a constant with increasing system size (on average less than 80 TNOs per triple for calculations with triple-ζ basis sets). Hence, the storage needed for all intermediates during the iteration process is not a bottleneck anymore. After the DLPNO-CCSD(T0) calculation, the amplitudes and other intermediates can be saved on disk. Next, the amplitudes are updated according to Eq. (2) until convergence is achieved. Finally, inserting converged amplitudes into Eq. (1), the final local (T) correction energy is computed. Although the (T) algorithm is linear scaling, the central processing unit (CPU) time spent on the iteration step is at least five times more than the (T0) method for medium-sized molecules. The bottleneck in the (T) calculations is the TNO space transformations between triples. Hence, we have carefully introduced three additional approximations in order to ease the computational burden while not sacrificing any additional accuracy in the results.
Approximation 1 is based on the observation that, on average, only 20% of all triples account for more than 90% of the (T) correction. Hence, it is reasonable to focus attention on these large triples and treat them at higher accuracy, while the remaining “weak triples” are treated at lower accuracy. To this end, all surviving triples after the DLPNO-CCSD(T0) calculations are ordered by decreasing absolute value of their energy contributions. Then, these energies are summed until 90% of the (T0) energy is reached. These triples are denoted as “strong triples” and the remaining ijk index combinations as “weak triples.” For both strong and weak triples, smaller TNO spaces are constructed after the conventional (T0) calculations. These are determined by two separate scaling factors (TTNOScaleS and TTNOScaleW) relative to the overall TNO occupation number threshold (TCutTNO) that has the default value 10−9.13 with the smaller TNOs will be used in the iterative process to save computational time. Having obtained converged amplitudes in the small TNO basis, the final (T) energy is calculated as
Results with different TTNOScaleS (TTNOScaleW) scale factors for strong (weak) triples are given in Table S1 (see supplementary material). By setting TTNOScaleS as 10.0 TCutTNO and TTNOScaleW as 100.0 TCutTNO, we still recover more than 99.9% of the (T) correction energies relative to the DLPNO-CCSD(T) method without the strong/weak separation and without reduced TNO spaces. From Table S1, one can see that the DLPNO-(T) algorithm is just about 2-3 times slower than its (T0) counterpart by using the default setting for a range of medium-sized (30-50 atoms) model systems.
From Eq. (2), one notices that the couplings between triples are connected by the Fock matrix elements. For example, if the contribution from term is zero. In approximation 2, couplings between and other triples are depending on the size of the off-diagonal elements of the Fock matrix. If the absolute value of is smaller than a threshold FCut, all couplings between two triples connected by are neglected. This approximation is commonly employed in local MP2 LMP2 implementations.35,36 From Table S2 (see supplementary material), one observes that the (T) correction energies converge very fast with FCut. By setting FCut to the default value 10−3, the deviations with respect to the converged results are already below 1 μhartree. Meanwhile, the speedup is about 10% relative to the calculation without approximation 2.
In order to save computer storage, the Gauss-Seidel scheme is used for the update of triples amplitudes. The (T) energies usually converge within four iterations, which is much faster than, e.g., LMP2. Additionally, we found that the convergence speed of different triples is different. Some triples contributions converge within even fewer iterations. To further speed up the performance of the DLPNO-(T), a triple will be skipped in the next Gauss-Seidel iteration, if its energy increment with respect to the energy in the previous iteration is less than a threshold TCutIter. Results in Table S3 (see supplementary material) show that this approximation, which represents the 3rd approximation in our algorithm, still guarantees the recovery of 99.9% of the local (T) energies by setting TCutIter = 0.001. Meanwhile, triple-triple couplings are reduced substantially in the subsequent (T) iterations.
The algorithm proposed above has been implemented in a development version of ORCA.37 The closed-shell test sets in the GMTKN30 database38,39 are employed to show the accuracy of the new DLPNO-(T) method with aug-cc-pVDZ basis sets (Fig. 1). The canonical CCSD(T) results are used as references. The maximum absolute deviation (MAD) of DLPNO-(T) with respect to the canonical (T) results is less than 2 kcal/mol for all systems by using the TightPNO settings defined previously.13 By contrast, the largest MAD obtained with the previous (T0) method is more than 6.0 kcal/mol. The deviations of CCSD and overall correlation energies with respect to the corresponding references are also shown in Fig. 1. Although, the deviations introduced by the DLPNO-CCSD approximations are smaller than (T) in some cases, the deviations between DLPNO-CCSD(T) and canonical CCSD(T) correlation energies are significantly reduced compared with DLPNO-CCSD(T0). The relative energy deviations of DLPNO-CCSD(T) and DLPNO-CCSD(T0) show a similar but less pronounced trend. Both DLPNO-CCSD(T0) and DLPNO-CCSD(T) methods can reproduce the relative energies very accurately, MADs of which with respect to the canonical CCSD(T) results are below 1 kcal/mol. Nevertheless, the results computed by DLPNO-CCSD(T) are slightly better.
Absolute (a) and relative (b) energy deviations of CCSD, (T), and total correlation energies (in kcal/mol) calculated by DLPNO-CCSD(T) with respect to the canonical CCSD(T) using aug-cc-pVDZ basis set and TightPNO settings for tests sets in the GMTKN30 database. The increment MAD values produced by DLPNO-(T0) and DLPNO-CCSD(T0) were shown as blank bars. The detailed average, MAD, and standard deviation values for each test set can be found in the supplementary material S4.
Absolute (a) and relative (b) energy deviations of CCSD, (T), and total correlation energies (in kcal/mol) calculated by DLPNO-CCSD(T) with respect to the canonical CCSD(T) using aug-cc-pVDZ basis set and TightPNO settings for tests sets in the GMTKN30 database. The increment MAD values produced by DLPNO-(T0) and DLPNO-CCSD(T0) were shown as blank bars. The detailed average, MAD, and standard deviation values for each test set can be found in the supplementary material S4.
As discussed before, the SC approximation can introduce large deviations in special cases. For the atomization-like reaction of cumulene (CnH4 → nC(S = 0) + 2H2), the deviation in reaction energies between DLPNO-CCSD(T0) and canonical CCSD(T) can exceed 11.0 kcal/mol with TightPNO settings and cc-pVDZ basis set for C16H4. (The error introduced by DLPNO-CCSD part is only 1.06 kcal/mol.) With the improved triples correction, this deviation is reduced down to 2.23 kcal/mol. A detailed analysis of these results will be presented elsewhere.
In order to document the scaling behavior of the present algorithm with respect to the system size, calculations on a series of linear alkane chains are presented in Fig. 2. Since the number of triples increases linearly with the number of atoms, the wall times for both methods scale linearly. However, the (T) algorithm takes about 2-3 times longer than the (T0) algorithm. In the LT-based local (T) methods,25,29 usually 3-4 quadrature points are needed. Therefore, the CPU time of both Hättig’s and Kallay’s methods are about 3-4 times slower than their corresponding (T0) methods. Thus, the formal prefactor of our DLPNO-CCSD(T) algorithm is comparable to or even smaller than the LT-based local CCSD(T) methods.
The scaling behavior of DLPNO-CCSD(T) and DLPNO-CCSD(T0) for linear alkane chains with def2-TZVP basis set. The timings for the (T)/(T0) steps and the overall DLPNO-CCSD(T)/(T0) calculations are plotted separately. All calculations run on cluster with two Intel(R) Xeon(R) E5-2670 processors. Only one core is used for each calculation.
The scaling behavior of DLPNO-CCSD(T) and DLPNO-CCSD(T0) for linear alkane chains with def2-TZVP basis set. The timings for the (T)/(T0) steps and the overall DLPNO-CCSD(T)/(T0) calculations are plotted separately. All calculations run on cluster with two Intel(R) Xeon(R) E5-2670 processors. Only one core is used for each calculation.
Finally, using the present method, a few medium-sized molecules are studied (Table I). The penicillin molecule with the def2-TZVP basis set is the largest canonical CCSD(T) calculation that we can afford (24 days on 6 cores). One can see that the DLPNO-(T) recovers more than 95% of conventional (T) corrections, and the overall DLPNO-CCSD(T) method recovers 99.8% of the total correlation energies. Our method presents similar accuracy as the results reported by Hättig.25 The computational cost comparison for these systems between DLPNO-CCSD(T) and DLPNO-CCSD(T0) is given in Table I as well. The (T) step in the DLPNO-CCSD(T) calculation is always less than 4 times slower than its (T0) counterpart. As the basis set is enlarged, the (T) calculations become more efficient. With the def2-SVP basis set, the (T) costs about 4 times as much as (T0), whereas for the def2-QZVPP basis set the increase in computational time only costs a factor of about 0.5. In the DLPNO-(T) algorithm, the computing time of the (T0) step is much more influenced by the basis set size than the iteration step. The computing time of the iteration step is only affected by the number of TNOs, which only slightly increases with the basis set size. Thus, as the size of the basis set enlarges, the iteration step does not dominate the (T) calculation anymore. The computational cost comparison between the overall DLPNO-CCSD(T) and DLPNO-CCSD(T0) calculation is also given in Table I. As can be readily seen, the overall DLPNO-CCSD(T) calculation is only slightly more expensive than the DLPNO-CCSD(T0) calculation (with an overhead of only 20%-30% for the calculations using def2-QZVPP).
The accuracy and efficiency comparison for three medium-sized molecules between DLPNO-CCSD(T) and DLPNO-CCSD(T0) with different basis set.
. | Diclofenac . | Penicillin . | Vancomycin . | ||||||
---|---|---|---|---|---|---|---|---|---|
. | def2-SVP . | def2-TZVP . | def2-QZVPP . | def2-SVP . | def2-TZVP . | def2-QZVPP . | def2-SVP . | def2-TZVP . | def2-QZVPP . |
No. basis function | 329 | 667 | 1439 | 430 | 858 | 1921 | 1797 | 3593 | 8033 |
Accuracy | |||||||||
DLPNO-(T0) | 90.32% | 91.10% | −0.178 999 | 90.46% | 91.39% | −0.219 025 | −0.525 794 | −0.821 973 | −0.919 853 |
DLPNO-(T) | 96.09% | 96.11% | −0.188 199 | 95.75% | 96.00% | −0.229 296 | −0.555 634 | −0.862 413 | −0.962 056 |
Canonical (T) (hartree) | −0.115 737 | −0.176 850 | … | −0.140 410 | −0.214 880 | … | … | … | … |
DLPNO-CCSD(T0) | 99.60% | 99.59% | −4.851 412 | 99.60% | 99.58% | −3.688 547 | −15.988 954 | −19.422 886 | −20.804 963 |
DLPNO-CCSD(T) | 99.84% | 99.84% | −4.861 683 | 99.80% | 99.80% | −3.697 747 | −16.018 653 | −19.463 164 | −20.847 166 |
Canonical CCSD(T) | −2.834 544 | −3.459 576 | … | −3.753 983 | −4.550 853 | … | … | … | … |
(hartree) | |||||||||
Wall time (using 4 cores) | |||||||||
DLPNO-(T0) (min) | 5.3 | 41.8 | 474.3 | 5.3 | 34.2 | 353.0 | 44.9 | 218.9a | 2393.5a |
DLPNO-CCSD(T0) (min) | 10.2 | 71.8 | 1203.0 | 10.3 | 64.5 | 614.7 | 147.7 | 662.4a | 8182.9a |
Ratio of | 4.6 | 3.2 | 1.6 | 4.0 | 3.2 | 1.5 | 4.3 | 2.7a | 1.5a |
DLPNO-(T)/DLPNO-(T0) | |||||||||
Ratio of | 2.9 | 2.3 | 1.2 | 2.5 | 2.2 | 1.3 | 2.0 | 1.6a | 1.2a |
DLPNO-CCSD(T)/ | |||||||||
DLPNO-CCSD(T0) | |||||||||
Hard disk requirement | |||||||||
Additional space for | 22.49 | 74.41 | 143.11 | 21.05 | 68.10 | 132.49 | 113.86 | 342.94 | 645.66 |
intermediates in | |||||||||
DLPNO-(T) (GB) |
. | Diclofenac . | Penicillin . | Vancomycin . | ||||||
---|---|---|---|---|---|---|---|---|---|
. | def2-SVP . | def2-TZVP . | def2-QZVPP . | def2-SVP . | def2-TZVP . | def2-QZVPP . | def2-SVP . | def2-TZVP . | def2-QZVPP . |
No. basis function | 329 | 667 | 1439 | 430 | 858 | 1921 | 1797 | 3593 | 8033 |
Accuracy | |||||||||
DLPNO-(T0) | 90.32% | 91.10% | −0.178 999 | 90.46% | 91.39% | −0.219 025 | −0.525 794 | −0.821 973 | −0.919 853 |
DLPNO-(T) | 96.09% | 96.11% | −0.188 199 | 95.75% | 96.00% | −0.229 296 | −0.555 634 | −0.862 413 | −0.962 056 |
Canonical (T) (hartree) | −0.115 737 | −0.176 850 | … | −0.140 410 | −0.214 880 | … | … | … | … |
DLPNO-CCSD(T0) | 99.60% | 99.59% | −4.851 412 | 99.60% | 99.58% | −3.688 547 | −15.988 954 | −19.422 886 | −20.804 963 |
DLPNO-CCSD(T) | 99.84% | 99.84% | −4.861 683 | 99.80% | 99.80% | −3.697 747 | −16.018 653 | −19.463 164 | −20.847 166 |
Canonical CCSD(T) | −2.834 544 | −3.459 576 | … | −3.753 983 | −4.550 853 | … | … | … | … |
(hartree) | |||||||||
Wall time (using 4 cores) | |||||||||
DLPNO-(T0) (min) | 5.3 | 41.8 | 474.3 | 5.3 | 34.2 | 353.0 | 44.9 | 218.9a | 2393.5a |
DLPNO-CCSD(T0) (min) | 10.2 | 71.8 | 1203.0 | 10.3 | 64.5 | 614.7 | 147.7 | 662.4a | 8182.9a |
Ratio of | 4.6 | 3.2 | 1.6 | 4.0 | 3.2 | 1.5 | 4.3 | 2.7a | 1.5a |
DLPNO-(T)/DLPNO-(T0) | |||||||||
Ratio of | 2.9 | 2.3 | 1.2 | 2.5 | 2.2 | 1.3 | 2.0 | 1.6a | 1.2a |
DLPNO-CCSD(T)/ | |||||||||
DLPNO-CCSD(T0) | |||||||||
Hard disk requirement | |||||||||
Additional space for | 22.49 | 74.41 | 143.11 | 21.05 | 68.10 | 132.49 | 113.86 | 342.94 | 645.66 |
intermediates in | |||||||||
DLPNO-(T) (GB) |
Using 8 cores.
In this work, an improved (T) correction, based on our previous DLPNO-CCSD(T0) method, is reported. The accuracy of the new method was confirmed by studying absolute and relative energies over test sets in the GMTKN30 database. The new (T) algorithm scales linearly with the system size, as does the previous (T0) algorithm. The increase of computer time of computing (T) relative to (T0) is reasonable, but certainly not negligible. It is about an additional factor of 4 for double-ζ basis sets, a factor of 2-3 for triple-ζ basis sets, and only a factor of 0.5 for quadruple-ζ basis sets. The accuracy is definitely improved significantly for absolute energies. For relative energies, the gain in accuracy is modest, but the problematic cases for which the (T0) approximation failed are well treated with the improved (T) scheme. The promising combination of current (T) with the DLPNO-CCSD-F12 method will be reported in due course. The extension of the DLPNO-(T) from the closed-shell to the open-shell case is undergoing and will be reported in due course as well.
See supplementary material for detailed results under the three approximations and results of test sets in GMTKN30.
F.N. and Y.G. gratefully acknowledge financial support by the Max Planck Society and the cluster of excellence (RESOLV, University of Bochum, No. EXC 1069). Y.G. is thankful to Peter Pinski for helpful discussion.