The focal-point approach, combining several quantum chemistry computations to estimate a more accurate computation at a lower expense, is effective and commonly used for energies. However, it has not yet been widely adopted for properties such as geometries. Here, we examine several focal-point methods combining Møller–Plesset perturbation theory (MP2 and MP2.5) with coupled-cluster theory through perturbative triples [CCSD(T)] for their effectiveness in geometry optimizations using a new driver for the Psi4 electronic structure program that efficiently automates the computation of composite-energy gradients. The test set consists of 94 closed-shell molecules containing first- and/or second-row elements. The focal-point methods utilized combinations of correlation-consistent basis sets cc-pV(X+d)Z and heavy-aug-cc-pV(X+d)Z (X = D, T, Q, 5, 6). Focal-point geometries were compared to those from conventional CCSD(T) using basis sets up to heavy-aug-cc-pV5Z and to geometries from explicitly correlated CCSD(T)-F12 using the cc-pVXZ-F12 (X = D, T) basis sets. All results were compared to reference geometries reported by Karton et al. [J. Chem. Phys. 145, 104101 (2016)] at the CCSD(T)/heavy-aug-cc-pV6Z level of theory. In general, focal-point methods based on an estimate of the MP2 complete-basis-set limit, with a coupled-cluster correction evaluated in a (heavy-aug-)cc-pVXZ basis, are of superior quality to conventional CCSD(T)/(heavy-aug-)cc-pV(X+1)Z and sometimes approach the errors of CCSD(T)/(heavy-aug-)cc-pV(X+2)Z. However, the focal-point methods are much faster computationally. For the benzene molecule, the gradient of such a focal-point approach requires only 4.5% of the computation time of a conventional CCSD(T)/cc-pVTZ gradient and only 0.4% of the time of a CCSD(T)/cc-pVQZ gradient.
I. INTRODUCTION
The “gold standard” of quantum chemistry, coupled-cluster with single, double, and perturbative triple excitations [CCSD(T)],1 has proven to be quite reliable for predictions of molecular energies or properties, so long as there are no nearly degenerate electron configurations.2 As early as 1993, a systematic study of ten small molecules3 found that CCSD(T) paired with a TZ(2df, 2pd) basis yielded errors in equilibrium bond lengths of only 0.001–0.005 Å compared to experiment or an average absolute error of 0.17%. A number of other studies4–7 have demonstrated similarly good agreement. However, the size of the basis set can have a significant influence on the accuracy of quantum chemical results, particularly for wavefunction-based methods such as CCSD(T). For example, a study of 19 closed-shell molecules compared CCSD(T) bond lengths to experiment and found that the maximum error decreased from 0.0046 Å with a cc-pVDZ basis to 0.0012 Å with a cc-pVQZ basis.4 Unfortunately, large-basis CCSD(T) computations are computationally expensive, except for the smallest molecules, because the computational cost of canonical CCSD(T) scales as , where o and are the number of occupied and virtual (unoccupied) orbitals, respectively. Very large basis sets thus lead to a rapid rise in the computational cost due to the quartic dependence on . The problem is compounded when seeking optimized geometries because of the extra expense of computing the gradient of the energy and because of the iterative nature of geometry optimization. Hence, without additional approximations, it is very difficult to obtain high-quality, large-basis CCSD(T) geometries for molecules with more than a few atoms. However, geometries of this quality are desirable in a variety of applications, such as the determination of rotational constants for comparison with microwave spectra,8 computation of vibrational frequencies for comparison with infrared spectra,9 and the determination of accurate barrier heights in studies of chemical reactions.10
Recently, Spackman et al. carefully examined11 two different approaches to estimate large-basis CCSD(T) geometries at a lower computational cost. For comparison, they reported conventional CCSD(T) geometries obtained with a series of basis sets of increasing size, and they estimated the geometries for CCSD(T) in the complete-basis-set (CBS) limit. Due to computational expense, only a few previous studies have reported estimates of CCSD(T)/CBS geometries.12–15 For their test set, Karton co-workers11 used 122 neutral first- and second-row species from the W4-11 database.16 For benchmark values, these workers used CCSD(T) with the very large heavy-aug-cc-pV(6+d)Z basis set (and heavy-aug-cc-pV(5+d)Z for 14 larger molecules), where “heavy” denotes that diffuse functions were added only to non-hydrogen atoms and +d represents corrections to d functions for second-row elements.
First, Karton and co-workers examined the effectiveness of the explicitly correlated coupled-cluster CCSD(T)-F12 method17 with the associated F12b approximation,18 which is known to improve speed of convergence toward the CCSD(T)/CBS limit for a variety of properties.18–22 For geometries, these methods were found to be quite effective, with CCSD(T)-F12b/cc-pVXZ-F12 performing about as well as conventional CCSD(T)/cc-pV(X+2)Z; that is, when using explicitly correlated CCSD(T)-F12b in conjunction with the explicitly correlated cc-pVXZ-F12 (X = D, T) basis sets,23 one effectively gets two “free” ζ-levels of accuracy. The extra F12 terms do not add a large additional computational expense, and hence, CCSD(T)-F12 offers considerable cost savings over conventional CCSD(T) because higher quality results are obtained for smaller basis sets.
Although the quality of the CCSD(T)-F12 geometries was found to be quite good, the approach was not very practical at the time it was tested because analytic gradients of CCSD(T)-F12 had not yet been formulated (meaning that the gradients had to be estimated numerically by finite differences of energy points). Since that work, analytic CCSD(T)-F12 gradients have been published by Győrffy and Werner.24 Unfortunately, however, adding explicit correlation terms of the R12 or F12 type makes analytic gradient formulas substantially more complex, in part due to the appearance of multiple products of 4-index integrals;24–30 to our knowledge, analytic CCSD(T)-F12 gradients are currently implemented only in the molpro program.31,32 Hence, it is worthwhile to investigate whether there are alternative approximations to large-basis CCSD(T) geometries that would be easier to implement across multiple quantum chemistry programs.
In their study,11 Karton and co-workers also considered the performance of the popular two-point extrapolation scheme33
for directly estimating bond lengths at the complete basis set limit, , from their values DX and DY in adjacent basis set levels X and Y through an optimal extrapolation exponent α. Using this procedure, extrapolated CCSD(T)/cc-pV[X,Y]Z and CCSD(T)/heavy-aug-cc-pV[X,Y]Z bond lengths were seen to be of comparable accuracy to those computed with their single-point CCSD(T)/cc-pV(Y+1)Z and CCSD(T)/heavy-aug-cc-pV(Y+1)Z counterparts. This approach can be utilized with existing CCSD(T) analytic gradient programs, but Karton and co-workers found that extrapolations did not work well if double-ζ basis sets were included, meaning that very expensive triple-ζ and quadruple-ζ CCSD(T) optimizations are required.
Here, we consider an alternative path to estimating large-basis CCSD(T) geometries, namely, the focal-point technique,34,35 which we have found to be very effective in the context of non-covalent interactions.36–38 This approach capitalizes on the fact that smaller basis sets are usually sufficient to capture higher-order electron correlation effects. Thus, CCSD(T) in a large basis set may be approximated by
In place of the large basis set, one may also use an estimate of the CBS limit; two-point extrapolation of the correlation energy33 is effective for this purpose. An advantage of this approach over explicitly correlated methods is that analytic gradients are straightforward to formulate because they are just sums and differences of analytic gradients of the component methods [MP2 and CCSD(T) in the example above]. An advantage over direct extrapolation of CCSD(T) results is that the focal-point approach often gives reliable results even when the CCSD(T) portion of the computation uses only a polarized double-ζ basis set.
To date, a few previous studies have investigated focal-point methods for geometries8,13–15,39–41 by scanning along potential energy curves for diatomics,39 by fitting PES’s of small-dimensional systems with Collins interpolation,40 or by estimating the focal-point gradients by finite differences of energies.15,41 To our knowledge, very few works have employed analytic gradients in focal-point approaches. Černý et al.13 wrote custom scripts that parse gradient outputs from other programs and combine them to form the focal-point gradient. That study examined geometries of four small van der Waals dimers. A subsequent study by Řezáč and Hobza on 24 van der Waals dimers14 may also have used these scripts. Puzzarini, Heckert, and Gauss used analytic gradients in focal-point approaches that extrapolated CCSD(T) gradients to the complete basis set limit, and additionally added core-valence and beyond-CCSD(T) correlation corrections, to obtain very accurate rotational constants.8 The cfour program42 supports the automated computation of energies, gradients, and Hessians using basis set extrapolation and/or focal-point schemes. cfour allows basis set extrapolation of Hartree–Fock or correlation energies and allows for various post-CCSD(T) corrections; to our knowledge, it does not utilize approximations such as Eq. (2), which we explore here.
In this work, we present a general implementation of focal-point gradients for arbitrary combinations of methods, and we use this implementation to examine the basis set convergence of 94 molecular geometries containing first- and second-row elements (labeled W4-11-94) using combinations of perturbation theory and CCSD(T) as in Eq. (2). Our results indicate that with the focal point approach, the time-consuming CCSD(T) part of the computation can be performed in a basis that is one or two ζ-levels smaller than that needed for a conventional CCSD(T) computation with similar accuracy. That is, focal-point methods employing CCSD(T) computations with a cc-pVDZ basis set are about as accurate as conventional CCSD(T) with a cc-pVTZ or even a cc-pVQZ basis set. We have also investigated the effect of neglecting diffuse functions in the CCSD(T) component of the focal-point scheme to further speed up the computations, and surprisingly, we find that the accuracy is just as good or better for the geometries of the small molecules examined.
II. COMPUTATIONAL METHODS
A. Psi4 driver description
Focal-point methods have been quite popular for estimating high-quality energies. Although they require multiple individual energy evaluations [e.g., MP2/large, CCSD(T)/small, and MP2/small in Eq. (2)], it is straightforward for the user of a quantum chemistry program to add and subtract the required total energies to obtain the focal-point energy. In principle, a focal-point gradient is also straightforward to determine, since the focal-point energy consists of simple sums and differences of individual total energies. For example, the gradient of Eq. (2) is just
for the displacement of some Cartesian coordinate qi. Unlike the case of energies, however, it is no longer appropriate to obtain the focal-point results through post-processing of individual computations because the composite gradients are needed during the geometry optimization procedure. Hence, for routine use, focal-point optimizations require support by the quantum chemistry program itself or else a custom driver script must be written, which can perform the individual gradient computations, form the composite gradient, and intercept communications to the optimizer.
Ideally, focal-point optimizations should be supported in a generic way, where any combination of high-level and low-level methods could be employed. This is not trivial because one may wish to use multi-step procedures for one or more of the components of the focal-point procedure. For example, here we will examine, in place of MP2/large, a two-point extrapolation33 to the MP2/CBS limit using two individual computations, such as MP2/cc-pVTZ and MP2/cc-pVQZ. Alternatively, or in addition, one may also wish to apply counterpoise corrections for van der Waals clusters, which again involves multiple individual computations.
With these goals in mind, the driver of the open-source Psi4 program was rewritten to automate basis set extrapolation, counterpoise correction, and focal-point schemes with additional support for many-body expansions of molecular clusters43 (see Fig. 1). This driver provides three basic functions to compute quantum chemistry quantities: energy(), gradient(), and hessian(). Each of these functions is called with a user-defined string that specifies the level of theory to employ, such as ‘MP2/cc-pVTZ’. A focal-point approach can be specified by a string such as ‘MP2/cc-pV[T,Q]Z + D:CCSD(T)/cc-pVDZ’, which would be an extrapolation to the MP2/CBS limit based on cc-pVTZ and cc-pVQZ computations, plus a correction for higher-order electron correlation evaluated in the cc-pVDZ basis set. The optimizer works with energies and gradients returned from the gradient() function; so as long as that function can interpret the string passed to it by the user, any basic or composite method may be employed in the geometry optimization. In addition, the gradient and Hessian functions are capable of computing the desired quantity via finite-difference procedures if analytic versions are not available (or upon user request).
In the case of gradient(‘MP2/cc-pV[T,Q]Z + D:CCSD(T)/cc-pVDZ’), the gradient() function will first detect that a composite focal-point computation is requested. Both focal-point and CBS extrapolations are handled by a wrapper function named cbs(). That function will be called with the string ‘MP2/cc-pV[T,Q]Z + D:CCSD(T)/cc-pVDZ’. This wrapper then interprets the string as a focal-point computation consisting of two parts: ‘MP2/cc-pV[T,Q]Z’ and ‘D:CCSD(T)/cc-pVDZ’. The first string will be recognized as a two-point extrapolation to the CBS limit. The cbs() function knows how to perform the standard two-point Helgaker extrapolation of correlation energies33 or three-point extrapolations of SCF energies assuming exponential convergence.44 The cbs() function will then call gradient(‘MP2/cc-pVTZ’) and gradient(‘MP2/cc-pVQZ’). Now the gradient() function no longer detects a required CBS extrapolation, and so the individual gradients will be computed at this point. The cbs() wrapper now has all information required to extrapolate the MP2/cc-pV[T,Q]Z gradient and will do so. Helgaker’s two-point extrapolation formula was developed for correlation energies,44 and so we extrapolate the correlation contribution to the gradient and add it to the SCF gradient for the largest basis set to obtain the overall MP2/CBS gradient; the original extrapolation formula is used unmodified, simply replacing energies with gradients.
Next, from the original string ‘MP2/cc-pV[T,Q]Z + D:CCSD(T)/cc-pVDZ’, it can be inferred that the higher order correlation correction to be computed is /cc-pVDZ. If a focal-point energy were requested, the cbs() wrapper knows that it can obtain the required MP2/cc-pVDZ energy as a byproduct of the CCSD(T)/cc-pVDZ energy. However, this is not the case for gradients, and hence both gradient(‘MP2/cc-pVDZ’) and gradient(‘CCSD(T)/cc-pVDZ’) are called. The cbs() function then subtracts these two components to form the /cc-pVDZ gradient, which is passed back up to the calling MP2 instance of cbs(), which adds this to the MP2/cc-pV[T,Q]Z result to obtain the overall focal-point gradient, which is returned to the user. In an optimization procedure, this gradient() procedure is called repeatedly by the optimizer. The quasi-recursive organization of the driver allows for a layered approach to build up complex quantities from common primitives and can be easily extended to additional regimes.
B. Geometry optimization details: Molecular dataset, reference, and comparison methods
In this work, we utilize our flexible implementation of focal-point gradients to perform geometry optimizations on a subset of the W4-11 database16 (excluding 3 beryllium-containing, 16 multireference, and 28 open-shell species). This W4-11 subset (visualized in Fig. 2 and labeled W4-11-94) contains 94 closed-shell molecules totaling 204 symmetry-unique bonds and 172 symmetry-unique bond angles. There are 66 first-row, 5 second-row, and 23 mixed first- and second-row species, containing 172 first-row, 7 second-row, and 25 mixed first- and second-row bonds and 155 single, 33 double, and 16 triple bonds.
When computing errors for the focal-point methods explored here, we use as reference values the highest-level geometries reported by Karton and co-workers.11 For most systems, those authors used CCSD(T)/heavy-aug-cc-pV(6+d)Z as their best level of theory; however, for the 11 molecules indicated by red asterisks in Fig. 2, they used CCSD(T)/heavy-aug-cc-pV(5+d)Z. When we wish to compare specifically to the highest-quality, CCSD(T)/heavy-aug-cc-pV(6+d)Z data available for 83 of the molecules, we will refer to this as the W4-11-83 subset.
We also compare our present focal-point geometries to conventional CCSD(T) results obtained using various basis sets and also to CCSD(T)-F12 results, previously reported by Karton and co-workers.11 In these comparisons, the summary statistics (e.g., mean absolute errors, MAE) were recomputed because of our removal of open-shell and multi-reference systems from the original W4-11 test set.
Here, we considered focal-point methods utilizing the correlation consistent basis sets of Dunning and co-workers,45,46 cc-pV(X+d)Z (abbreviated XZ; X = D, T, Q, 5, 6). We also considered the addition of diffuse functions, as in the augmented correlation-consistent basis sets aug-cc-pV(X+d)Z,47 except that we added the diffuse functions only to heavy (non-hydrogen) elements, which we denote heavy-aug-cc-pV(X+d)Z (haXZ). Additional truncations to the diffuse space were considered in which subshells of diffuse functions were successively removed. Papajak and Truhlar have introduced a “calendar” naming scheme for such truncations,48 which we adopt here for basis sets more truncated than heavy-aug-cc-pV(X+d)Z [which Papajak and Truhlar call jul-cc-pV(X+d)Z]. Removing the subshell of diffuse functions with the highest angular momentum from heavy atoms yields the jun-cc-pV(X+d)Z (jXZ) basis sets, and removing the next angular momentum subshell yields the may-cc-pV(X+d)Z (mXZ) basis sets.
Our focal-point results are compared to previously published11 CCSD(T) results employing both Dunning basis sets and the def2-XZVPP (X = T, Q) basis sets of Weigend and Ahlrichs.49 We also compare to explicitly correlated CCSD(T) results11 obtained using the XZ-F12 (X = D, T) basis sets of Peterson et al.50
A two-point extrapolation of MP2 correlation energies using the cc-pVXZ and cc-pVYZ basis sets is denoted as MP2/CBS[X,Y]Z. General notations for focal-point methods, such as MP2/CBS[X,Y]Z + /NZ, are abbreviated as CCSD(T)/[XYZ; δ:NZ]. If another method is used in place of MP2 for the large-basis component (such as MP2.5),51 we use a notation such as CCSD(T)/[MP2.5/XYZ; δ:NZ].
All computations were performed with a development version of Psi4 1.1,43 where we employed analytic gradients of density-fitted (DF) MP2,52–54 DF-MP2.5,55 DF-MP3,55 and DF-CCSD(T)56 with the default Psi4 density fitting basis sets,57 namely, the (aug)-cc-pVXZ-JKFIT sets58 for Hartree–Fock and Fock-like terms and the (aug)-cc-pVXZ-RI sets59,60 for correlation terms. Core orbitals were constrained to remain doubly occupied. CCSD(T)/ha6Z reference geometries from Karton and co-workers11 were used as starting points for all optimizations with the following convergence criteria (all in internal coordinates and atomic units): maximum displacement of 6.0 × 10−6, maximum force of 2.0 × 10−6, root mean square displacement of 4.0 × 10−6, root mean square force of 1.0 × 10−6, and maximum energy change between geometry steps of 1.0 × 10−10. Total energies were converged to 1.0 × 10−10 hartrees.
We also considered the effect of truncating the space of virtual orbitals in the CCSD(T) computation by transforming to frozen natural orbitals (FNOs).61–66 These can be very effective at speeding up computations with little loss in accuracy in a variety of situations, from reaction energies66 to intermolecular interactions.65,67 In this case, we have not yet implemented analytic gradients of FNO-based DF-CCSD(T), so the FNOs gradients can only be tested numerically and do not offer a speedup with the current code. Nevertheless, it is interesting to explore their accuracy to help evaluate the benefits of implementing the analytic gradients. For FNO-CCSD(T), we used the conservative Psi4 default threshold, which deletes FNO virtual orbitals with occupation numbers less than 10−6.
While not included in the W4-11-94 test set, benzene was chosen for comparing single gradient timings of all focal-point schemes and standard CCSD(T) methods considered here [excluding CCSD(T) with the large ha5Z basis] to provide some idea about the computational costs of the methods tested here. We did not include DF-FNO-CCSD(T) in our timing comparisons because the current lack of analytic gradients for the combination of frozen natural orbitals with density fitting means the gradients must be estimated numerically, which is a slower procedure that will not be competitive with analytic gradients. For comparison, timings are reported for analytic gradients of CCSD(T)-F12b/DZ-F12 using the latest version (2019.2) of the MOLPRO program.31,32 Timings were obtained on a 6-core Intel i7-5930K at 3.5 GHz, with 64 GB of RAM, and a locally attached RAID0 array comprised of three 7200 rpm hard drives for scratch files. Unfortunately, CCSD(T)-F12 timings with the larger TZ-F12 basis set were not possible on this machine because they required more than the available 64 GB of memory.
In order to gain a better understanding of the bond length and angle errors presented in this work, it is important to discuss the magnitude of the errors introduced by Born–Oppenheimer corrections, electron correlation effects neglected by CCSD(T), and the density fitting approximation. Previous works have shown that errors due to relativistic effects for bond lengths of first-row atoms seem to be about 0.0005 Å or less.12,68,69 When compared to geometries of BH, CH+, and NH computed using full configuration interaction, CCSD(T) methods underestimated bond lengths by about 0.0002–0.0005 Å.68 That same study of diatomic hydrides68 indicates that errors from neglect of the diagonal Born–Oppenheimer approximation are in the range of 0.0003–0.0007 Å and that core-valence correlation causes changes of up to 0.0032 Å. A study of 13 small first-row compounds by Feller et al.41 shows core-valence corrections of 0.0008–0.0042 Å. The density fitting approximation was shown to incur mean absolute equilibrium bond length errors of around 0.0002 Å.55,56 However, acetylene showed an outlying maximum absolute error of 0.0028 Å for the C–C bond comparing DF-CCSD(T)/cc-pVQZ to conventional CCSD(T)/cc-pVQZ, which is an order of magnitude greater.56 While information on how these approximations affect bond angles is scarce, one study demonstrated relativistic and core correlation errors under 0.01° for the HCH angle in formaldehyde.70 In this study, we report bond lengths to a precision of 0.0001 Å and bond angles to a precision of 0.01°, and the reader should bear in mind that focal-point errors of ∼0.01° and 0.0005 Å would be comparable to most other sources of error in non-relativistic, Born–Oppenheimer CCSD(T) geometries, and would actually be significantly smaller than errors from neglect of core-valence correlation (which we do not consider here because we compare to the frozen-core W4-11 benchmark geometry data).
III. RESULTS AND DISCUSSION
A. Focal point methods save 1–2 “free” levels of accuracy compared to standard CCSD(T) methods
Table I displays error statistics, i.e., root-mean-square error (RMSE), mean absolute error (MAE), mean signed error (ME), and maximum absolute error (maxAE), for the molecular geometries of the W4-11 subset of 94 closed-shell systems (labeled W4-11-94) that are illustrated in Fig. 2. The final column gives the computational time (in minutes) to compute a single gradient for the benzene molecule, to provide a sense of relative computational cost. Molecular bond lengths (Å) and angles (°) of the W4-11-94 test set produced from focal-point, CCSD(T), and CCSD(T)-F12 methods paired with various basis set sizes are compared to those of the CCSD(T)/ha6Z reference geometries.11
. | Bond lengths . | Bond angles . | . | ||||||
---|---|---|---|---|---|---|---|---|---|
Method and basis . | RMSE . | MAE . | ME . | MaxAE . | RMSE . | MAE . | ME . | MaxAE . | Timeb . |
CCSD(T)c,d | |||||||||
[TQZ;δ:DZ] | 0.0023 | 0.0013 | 0.0012 | 0.0125 | 0.12 | 0.07 | −0.05 | 0.43 | 5.2 |
[TQZ;δ:TZ] | 0.0012 | 0.0007 | 0.0002 | 0.0055 | 0.08 | 0.05 | −0.03 | 0.29 | 116.4 |
[Q5Z;δ:TZ] | 0.0008 | 0.0005 | 0.0002 | 0.0036 | 0.04 | 0.03 | −0.00 | 0.09 | 124.5 |
[haTQZ;δ:haDZ] | 0.0014 | 0.0011 | 0.0010 | 0.0050 | 0.04 | 0.03 | −0.02 | 0.17 | 22.4 |
[haTQZ;δ:haTZ] | 0.0008 | 0.0005 | 0.0003 | 0.0033 | 0.03 | 0.02 | −0.00 | 0.14 | 365.0 |
[haQ5Z;δ:haTZ] | 0.0005 | 0.0004 | 0.0001 | 0.0019 | 0.02 | 0.02 | −0.00 | 0.09 | 379.8 |
CCSD(T)e | |||||||||
DZ | 0.0178 | 0.0158 | 0.0158 | 0.0547 | 0.86 | 0.55 | −0.31 | 3.93 | 3.3 |
TZ | 0.0042 | 0.0031 | 0.0030 | 0.0172 | 0.34 | 0.22 | −0.12 | 1.09 | 115.2 |
QZ | 0.0012 | 0.0008 | 0.0006 | 0.0059 | 0.13 | 0.09 | −0.04 | 0.42 | 1350.6 |
5Z | 0.0004 | 0.0002 | 0.0000 | 0.0017 | 0.04 | 0.03 | −0.01 | 0.13 | 11 488.4 |
haDZ | 0.0205 | 0.0185 | 0.0185 | 0.0525 | 0.48 | 0.27 | −0.16 | 3.10 | 18.1 |
haTZ | 0.0051 | 0.0042 | 0.0042 | 0.0185 | 0.13 | 0.08 | −0.05 | 0.73 | 362.3 |
haQZ | 0.0015 | 0.0011 | 0.0011 | 0.0060 | 0.04 | 0.02 | −0.01 | 0.22 | 3 621.9 |
ha5Z | 0.0004 | 0.0002 | 0.0002 | 0.0018 | 0.01 | 0.01 | −0.00 | 0.08 | n/a |
Def2-TZVPP | 0.0040 | 0.0028 | 0.0028 | 0.0181 | 0.22 | 0.15 | −0.08 | 0.88 | 122.4 |
Def2-QZVPP | 0.0012 | 0.0007 | 0.0007 | 0.0056 | 0.07 | 0.05 | −0.02 | 0.22 | 1 505.1 |
CCSD(T)-F12e | |||||||||
DZ-F12 | 0.0018 | 0.0013 | 0.0012 | 0.0081 | 0.11 | 0.06 | −0.04 | 0.72 | 71.7 |
TZ-F12 | 0.0006 | 0.0004 | 0.0004 | 0.0026 | 0.03 | 0.02 | −0.01 | 0.20 | n/a |
. | Bond lengths . | Bond angles . | . | ||||||
---|---|---|---|---|---|---|---|---|---|
Method and basis . | RMSE . | MAE . | ME . | MaxAE . | RMSE . | MAE . | ME . | MaxAE . | Timeb . |
CCSD(T)c,d | |||||||||
[TQZ;δ:DZ] | 0.0023 | 0.0013 | 0.0012 | 0.0125 | 0.12 | 0.07 | −0.05 | 0.43 | 5.2 |
[TQZ;δ:TZ] | 0.0012 | 0.0007 | 0.0002 | 0.0055 | 0.08 | 0.05 | −0.03 | 0.29 | 116.4 |
[Q5Z;δ:TZ] | 0.0008 | 0.0005 | 0.0002 | 0.0036 | 0.04 | 0.03 | −0.00 | 0.09 | 124.5 |
[haTQZ;δ:haDZ] | 0.0014 | 0.0011 | 0.0010 | 0.0050 | 0.04 | 0.03 | −0.02 | 0.17 | 22.4 |
[haTQZ;δ:haTZ] | 0.0008 | 0.0005 | 0.0003 | 0.0033 | 0.03 | 0.02 | −0.00 | 0.14 | 365.0 |
[haQ5Z;δ:haTZ] | 0.0005 | 0.0004 | 0.0001 | 0.0019 | 0.02 | 0.02 | −0.00 | 0.09 | 379.8 |
CCSD(T)e | |||||||||
DZ | 0.0178 | 0.0158 | 0.0158 | 0.0547 | 0.86 | 0.55 | −0.31 | 3.93 | 3.3 |
TZ | 0.0042 | 0.0031 | 0.0030 | 0.0172 | 0.34 | 0.22 | −0.12 | 1.09 | 115.2 |
QZ | 0.0012 | 0.0008 | 0.0006 | 0.0059 | 0.13 | 0.09 | −0.04 | 0.42 | 1350.6 |
5Z | 0.0004 | 0.0002 | 0.0000 | 0.0017 | 0.04 | 0.03 | −0.01 | 0.13 | 11 488.4 |
haDZ | 0.0205 | 0.0185 | 0.0185 | 0.0525 | 0.48 | 0.27 | −0.16 | 3.10 | 18.1 |
haTZ | 0.0051 | 0.0042 | 0.0042 | 0.0185 | 0.13 | 0.08 | −0.05 | 0.73 | 362.3 |
haQZ | 0.0015 | 0.0011 | 0.0011 | 0.0060 | 0.04 | 0.02 | −0.01 | 0.22 | 3 621.9 |
ha5Z | 0.0004 | 0.0002 | 0.0002 | 0.0018 | 0.01 | 0.01 | −0.00 | 0.08 | n/a |
Def2-TZVPP | 0.0040 | 0.0028 | 0.0028 | 0.0181 | 0.22 | 0.15 | −0.08 | 0.88 | 122.4 |
Def2-QZVPP | 0.0012 | 0.0007 | 0.0007 | 0.0056 | 0.07 | 0.05 | −0.02 | 0.22 | 1 505.1 |
CCSD(T)-F12e | |||||||||
DZ-F12 | 0.0018 | 0.0013 | 0.0012 | 0.0081 | 0.11 | 0.06 | −0.04 | 0.72 | 71.7 |
TZ-F12 | 0.0006 | 0.0004 | 0.0004 | 0.0026 | 0.03 | 0.02 | −0.01 | 0.20 | n/a |
Errors computed against the highest-level results reported in Ref. 11, CCSD(T)/ha6Z for 83 molecules and CCSD(T)/ha5Z for the remaining 11 molecules marked by red asterisks in Fig. 2.
Time (minutes) to compute a single analytic gradient for benzene (see Sec. II B for details).
For focal-point approaches, CCSD(T)/[XYZ; δ:NZ] denotes MP2/CBS[X,Y]Z + /NZ.
FNO-CCSD(T) and CCSD(T) values were identical within the precision reported, except for small differences in angle maxAEs (for more details, see the supplementary material).
Error statistics for the 94 closed-shell molecule subset of W4-11 using previously reported geometries for CCSD(T) and CCSD(T)-F12.11
First, we note that the errors in geometries are very small for most of the methods presented in Table I. For the focal-point methods tested, for explicitly correlated CCSD(T)-F12, and for conventional CCSD(T) with basis sets of quadruple-ζ quality or better, the RMSE and MAE are typically around 0.001 Å or better. The RMSE and MAE for bond angles are typically 0.1° or better. The remainder are conventional CCSD(T) with a double- or triple-ζ basis set, where errors are substantially larger [e.g., MAE 0.016 Å and 0.86° for CCSD(T)/DZ]. This demonstrates yet again that CCSD(T) with small basis sets is not capable of providing very high quality results for geometries, unless additional corrections for basis set incompleteness are applied (e.g., through focal-point procedures).
Considering the conventional CCSD(T) results, error statistics vs the estimated CCSD(T)/CBS benchmark geometries decrease smoothly as larger basis sets are used. Errors statistics are similar between cc-pVTZ and def2-TZVPP or between cc-pVQZ and def2-QZVPP. Explicitly correlated approaches give excellent geometries, with CCSD(T)-F12/cc-pVDZ-F12 performing nearly as well as CCSD(T)/cc-pVQZ and with CCSD(T)-F12/cc-pVTZ-F12 approaching the quality of CCSD(T)/cc-pV5Z. Perhaps surprisingly, adding diffuse functions does not necessarily improve results for bond lengths, although bond angles tend to be improved. This contrasts with the behavior of the focal-point methods, where diffuse functions tend to improve error statistics for both bond lengths and bond angles.
The least computationally demanding focal-point approach tested, CCSD(T)[TQZ;δ:DZ], exhibits an accuracy for bond lengths that lies between the quality of conventional CCSD(T)/TZ and CCSD(T)/QZ, while the quality of the bond angles is comparable to that of CCSD(T)/QZ. This is remarkable given that the CCSD(T) portion of the CCSD(T)[TQZ;δ:DZ] focal-point approach requires only a DZ basis set, and so the total computational cost is drastically reduced (5.2 min) compared to the canonical CCSD(T) computations of similar quality (115.2 min for TZ and 1350.6 min for QZ). Similarly, impressive results are seen for the next two focal-point approaches tested. CCSD(T)[TQZ;δ:TZ] is similar in quality to CCSD(T)/QZ for bond lengths and intermediate in quality between CCSD(T)/QZ and CCSD(T)/5Z for bond angles, while the cost is only 116.4 min compared to 1350.6 min or 11488.4 min for the conventional QZ or 5Z computations. Likewise, CCSD(T)/[Q5Z;δTZ] is between CCSD(T)/QZ and CCSD(T)/5Z in quality for bond lengths and very similar to CCSD(T)/5Z for bond angles.
The next three focal-point methods in Table I utilize “heavy-augmented” basis functions, i.e., they add diffuse functions to the heavy (non-hydrogen) atoms. Here, the error statistics for CCSD(T)/[haTQZ;δhaDZ] are very similar to those of conventional CCSD(T)/haQZ, despite using only a haDZ basis for the CCSD(T) portion of the computation, with a timing reduction from 3621.9 min to only 22.4 min. Similarly, CCSD(T)[haTQZ;δ:haTZ] has a quality between CCSD(T)/haQZ and CCSD(T)/ha5Z, and CCSD(T)[haQ5Z;δ:haTZ] is very close to CCSD(T)/ha5Z in quality.
Thus, the focal-point approaches [(ha)XYZ;δ(ha)NZ] are generally at least as accurate as conventional CCSD(T)/(ha)(N+1)Z computations and are often as accurate as CCSD(T)/(ha)(N+2)Z. By accounting for basis set effects using MP2 instead of CCSD(T), we essentially gain 1–2 “free” ζ levels of accuracy.
Table I indicates that the CCSD(T)/[haTQZ;δ:haDZ] focal-point approach is superior in its error statistics compared to CCSD(T)-F12/DZ-F12 while also requiring less time (22.4 min vs 71.7 min). In the focal-point timings, the MP2 component requires only 4.1 min out of the total of 22.4 min, with the remaining time belonging to the CCSD(T) portion of the computation. The primary reason the CCSD(T)-F12/DZ-F12 computation is slower is that the DZ-F12 basis set (i.e., cc-pVDZ-F12) comprises 234 basis functions and is significantly larger than the heavy-aug-cc-pVDZ basis set, comprising 168 basis functions, used in the coupled-cluster portion of the CCSD(T)/[haTQZ;δ:haDZ] focal-point computation. The DZ-F12 and TZ-F12 basis sets were used by Karton and co-workers11 because they are recommended for use with CCSD(T)-F12.50 Computations would be faster with smaller basis sets, although one would also expect a concomitant increase in errors, and the errors of CCSD(T)-F12/DZ-F12 are already larger than those of CCSD(T)/[haTQZ;δ:haDZ].
Similarly, CCSD(T)/[haQ5Z;δ:haTZ] focal point geometries exhibit better error statistics than CCSD(T)-F12/TZ-F12 and will again be quite a bit faster: the MP2 portion takes a mere 18.5 min out of the overall 379.8 min of the focal-point computation, while the haTZ basis set of the CCSD(T) part of the focal-point computation comprises only 360 basis functions, compared to 426 for the VTZ-F12 basis. Explicit timings for CCSD(T)-F12/TZ-F12 were unfortunately not obtainable because they required more than the 64 GB of RAM available on the machine used for testing (MOLPRO replicates the data for each core, whereas Psi4 uses a shared-memory model).
B. Error distributions
Table II shows the error distribution of bonds and bond angles involving only first-row elements (blue), or one or more second-row elements (orange). As seen graphically in the error distributions in Table II or numerically from the mean errors in Table I, the CCSD(T) approaches considered tend, on average, to overestimate bond lengths and underestimate bond angles. The error distribution is heavily skewed toward overestimated bond lengths (with very few underestimated), but it is more symmetric about zero for bond angles. The spread of the errors is quite large using conventional CCSD(T) with a double-ζ basis set, and many of the bond lengths are substantially overestimated, with errors typically between 0.01 Å and 0.02 Å for first-row atoms, and often somewhat larger for the second-row atoms (up to a maximum absolute error of 0.055 Å for Cl–F). Switching to a triple-ζ basis set greatly reduces the errors (now typically in the 0.001–0.005 Å range), but spread of the errors remains significant, and individual errors remain as large as 0.017 Å. Going finally to quadruple-ζ basis sets provides excellent mean absolute errors and a tight spread of the errors.
Standard CCSD(T) and CCSD(T)-F12 methods included for comparison. Reference geometries computed at CCSD(T)/ha6Z (or CCSD(T)/ha5Z) were obtained from the same source11 (see Sec. II B for details).
Blue represents bonds (or angles) containing strictly first-row elements, and orange represents any bond (or angle) containing second-row elements (see Fig. 2 for dataset details). Vertical bars represent errors of −0.005, −0.001, 0.000, 0.001, 0.005, 0.01, and 0.02 Å for bond lengths, and −1.0°, −0.5°, −0.1°, 0°, 0.1°, 0.5°, and 1.0° for bond angles. MAE (positive) is represented by the orange box.
For focal-point approaches, CCSD(T)/[XYZ;δNZ] denotes MP2/CBS[X,Y]Z + /NZ.
The spread of the errors for the focal-point methods in Table II is excellent. CCSD(T)/[TQZ;δDZ] has an error distribution better than that of CCSD(T)/TZ, although not quite as good as CCSD(T)/QZ. The next four focal-point methods in Table II have an error spread comparable to that of CCSD(T)/QZ. Finally, the [haQ5Z;δ:haTZ] error distribution approaches those of CCSD(T)/5Z and CCSD(T)/ha5Z in quality.
For methods better than CCSD(T)/(ha)DZ, the larger errors in the bond lengths tend to be associated with the orange lines (bonds involving one or two second-row elements). This is not surprising, as bonds involving a second-row element will be longer. A closer inspection of mean absolute percent errors for bond lengths involving only first-row atoms (Table S-6 of the supplementary material) vs those involving at least one second-row element (Table S-7 of the supplementary material) suggests that the relative errors are also somewhat higher for bonds involving second-row elements, both for the correlation-consistent basis sets and for the def2 basis sets. This suggests that basis set requirements for the second-row elements are somewhat more stringent than for first-row elements.
C. Mixed addition of diffuse functions, and MP2.5 tests
In Tables I and II, the MP2 and CCSD(T) portions of the focal-point computations consistently both used diffuse functions or both neglected diffuse functions. Here, we examine the effect of adding diffuse functions to the MP2 portion of the computation, but neglecting it for the CCSD(T) portion. We also consider basis sets using more heavily truncated diffuse functions, primarily the “jun-” truncation jun-cc-pV(X+d)Z,48 denoted here as jXZ, which removes diffuse functions from hydrogens and the highest angular momentum diffuse functions from other atoms, compared to standard aug-cc-pV(X+d)Z basis sets. Finally, we also consider the effect of using the MP2.5 method51,55 for the basis set correction, instead of MP2. The results for the error statistics and error distributions are presented in Tables III and IV, along with results for the previously discussed focal-point methods, for easy comparison.
. | Bond lengths . | Bond angles . | . | ||||||
---|---|---|---|---|---|---|---|---|---|
Method and basis . | RMSE . | MAE . | ME . | MaxAE . | RMSE . | MAE . | ME . | MaxAE . | Timeb . |
CCSD(T)c | |||||||||
[TQZ;δ:DZ] | 0.0023 | 0.0013 | 0.0012 | 0.0125 | 0.12 | 0.07 | −0.05 | 0.43 | 5.2 |
[haTQZ;δ:DZ] | 0.0022 | 0.0014 | 0.0013 | 0.0097 | 0.08 | 0.05 | −0.02 | 0.27 | 7.4 |
[jTQZ;δ:jDZ] | 0.0017 | 0.0011 | 0.0010 | 0.0070 | 0.08 | 0.05 | −0.03 | 0.30 | 10.9 |
[haTQZ;δ:haDZ] | 0.0014 | 0.0011 | 0.0010 | 0.0050 | 0.04 | 0.03 | −0.02 | 0.17 | 22.4 |
[MP2.5/TQZ;δ:DZ] | 0.0013 | 0.0008 | 0.0001 | 0.0066 | 0.09 | 0.06 | −0.03 | 0.35 | 34.4 |
[TQZ;δ:TZ] | 0.0012 | 0.0007 | 0.0002 | 0.0055 | 0.08 | 0.05 | −0.03 | 0.29 | 116.4 |
[haTQZ;δ:TZ] | 0.0010 | 0.0006 | 0.0004 | 0.0039 | 0.03 | 0.02 | −0.00 | 0.15 | 118.9 |
[haTQZ;δ:haTZ] | 0.0008 | 0.0005 | 0.0003 | 0.0033 | 0.03 | 0.02 | −0.00 | 0.14 | 365.0 |
[Q5Z;δ:TZ] | 0.0008 | 0.0005 | 0.0002 | 0.0036 | 0.04 | 0.03 | −0.00 | 0.09 | 124.5 |
[haQ5Z;δ:TZ] | 0.0007 | 0.0005 | 0.0002 | 0.0027 | 0.03 | 0.02 | −0.00 | 0.10 | 132.1 |
[haQ5Z;δ:haTZ] | 0.0005 | 0.0004 | 0.0001 | 0.0019 | 0.02 | 0.02 | −0.00 | 0.09 | 379.8 |
. | Bond lengths . | Bond angles . | . | ||||||
---|---|---|---|---|---|---|---|---|---|
Method and basis . | RMSE . | MAE . | ME . | MaxAE . | RMSE . | MAE . | ME . | MaxAE . | Timeb . |
CCSD(T)c | |||||||||
[TQZ;δ:DZ] | 0.0023 | 0.0013 | 0.0012 | 0.0125 | 0.12 | 0.07 | −0.05 | 0.43 | 5.2 |
[haTQZ;δ:DZ] | 0.0022 | 0.0014 | 0.0013 | 0.0097 | 0.08 | 0.05 | −0.02 | 0.27 | 7.4 |
[jTQZ;δ:jDZ] | 0.0017 | 0.0011 | 0.0010 | 0.0070 | 0.08 | 0.05 | −0.03 | 0.30 | 10.9 |
[haTQZ;δ:haDZ] | 0.0014 | 0.0011 | 0.0010 | 0.0050 | 0.04 | 0.03 | −0.02 | 0.17 | 22.4 |
[MP2.5/TQZ;δ:DZ] | 0.0013 | 0.0008 | 0.0001 | 0.0066 | 0.09 | 0.06 | −0.03 | 0.35 | 34.4 |
[TQZ;δ:TZ] | 0.0012 | 0.0007 | 0.0002 | 0.0055 | 0.08 | 0.05 | −0.03 | 0.29 | 116.4 |
[haTQZ;δ:TZ] | 0.0010 | 0.0006 | 0.0004 | 0.0039 | 0.03 | 0.02 | −0.00 | 0.15 | 118.9 |
[haTQZ;δ:haTZ] | 0.0008 | 0.0005 | 0.0003 | 0.0033 | 0.03 | 0.02 | −0.00 | 0.14 | 365.0 |
[Q5Z;δ:TZ] | 0.0008 | 0.0005 | 0.0002 | 0.0036 | 0.04 | 0.03 | −0.00 | 0.09 | 124.5 |
[haQ5Z;δ:TZ] | 0.0007 | 0.0005 | 0.0002 | 0.0027 | 0.03 | 0.02 | −0.00 | 0.10 | 132.1 |
[haQ5Z;δ:haTZ] | 0.0005 | 0.0004 | 0.0001 | 0.0019 | 0.02 | 0.02 | −0.00 | 0.09 | 379.8 |
Errors computed against the highest-level results reported in Ref. 11, CCSD(T)/ha6Z for 83 molecules and CCSD(T)/ha5Z for the remaining 11 molecules marked by red asterisks in Fig. 2.
Time (minutes) to compute a single analytic gradient for benzene (see Sec. II B for details).
For focal-point approaches, CCSD(T)/[XYZ; δ:NZ] denotes MP2/CBS[X,Y]Z + /NZ.
Errors with respect to CCSD(T)/ha6Z (or CCSD(T)/ha5Z) reference geometries that were provided by Karton et al. 11 (see Sec. II B for details).
Blue represents bonds (or angles) containing strictly first-row elements, and orange represents any bond (or angle) containing second-row elements (see Fig. 2 for dataset details). Vertical bars represent errors of −0.005, −0.001, 0.000, 0.001, 0.005, and 0.01 Å for bond lengths and −0.5°, −0.1°, 0°, 0.1°, and 0.5° for bond angles. MAE (positive) is represented by the orange box.
For focal-point approaches, CCSD(T)/[XYZ; δ:NZ] denotes MP2/CBS[X,Y]Z + /NZ.
If we start with the focal-point method CCSD(T)/[TQZ;δ:DZ] and add diffuse functions to the MP2 portion of the computation, yielding CCSD(T)[haTQZ;δ:DZ], it provides little discernable improvement in the error distribution for bond lengths, although there is a slight improvement in the error distribution for bond angles (Table IV). The maximum absolute errors drop from 0.0125 Å and 0.43° to 0.0097 Å and 0.27°. The next jump, adding diffuse functions to the CCSD(T) portion of the computation, yielding the CCSD(T)[haTQZ;δ:haDZ] approach, provides a more noticeable tightening of the error distribution and improved error statistics.
Moving on to triple-ζ treatments of CCSD(T), for the focal-point sequence [TQZ;δ:TZ], [haTQZ;δ:TZ], [haTQZ;δ:haTZ], adding diffuse functions only to the MP2 portion (the middle method) provides error statistics that are intermediate between the other two methods (although results for bond angles are closer to the quality of the final [haTQZ;δ:haTZ] method). For the sequence [Q5Z;δ:TZ], [haQ5Z;δ:TZ], [haQ5Z;δ:haTZ], results are very similar for any of these methods, although again the quality of the middle approach is intermediate between the two endpoints.
We tested the effect of more truncated sets of diffuse functions through the CCSD(T)/[jTQZ;δ:jDZ] focal-point approach, where “j” denotes “jun-” as discussed above. Comparing CCSD(T)/[jTQZ;δ:jDZ] to the more complete CCSD(T)/[haTQZ;δ:haDZ] approach, mean errors and mean absolute errors are very similar, although there is a larger spread in the errors for [jTQZ;δ:jDZ] than [haTQZ;δ:haDZ] (RMSE for bond lengths of 0.0017 Å vs 0.0014 Å and a maximum absolute error of 0.0070 Å vs 0.0050 Å). Bond length errors are actually reduced for [jTQZ;δ:jDZ] vs [haTQZ;δ:DZ], even though the basis set is smaller for the MP2 portion. This reinforces the trend that minor improvements to the basis set have more impact when they are applied to the CCSD(T) portion of the computation than the MP2 portion [of course, basis set enhancements to the CCSD(T) portion of the computation lead to larger increases in computational cost than basis set enhancements to the MP2 portion of the computation]. Compared to non-augmented [TQZ;δ:DZ], [jTQZ;δ:jDZ] leads to modest improvements in MAE and in error distributions. Overall, the “jun-” basis sets seem to be effective in these focal-point approaches, if larger basis sets such as heavy-aug-cc-pVXZ become computationally costly. Explorations of fully augmented basis sets, and even more truncated “may-” basis sets, are presented in the supplementary material. We find that heavy-aug-cc-pVXZ (neglecting diffuse functions on H atoms) is a very effective approximation to aug-cc-pVXZ for geometries and, not surprisingly, that may-cc-pVXZ basis sets lead to errors slightly larger than for jun-cc-pVXZ.
Finally, we also tested the performance of focal-point methods where the MP2.5 method51,55 was used in place of MP2. Specifically, we estimated the MP2.5 CBS limit using a two-point extrapolation of the correlation energies using the cc-pVTZ and cc-pVQZ basis sets, and we used the cc-pVDZ basis for ; this focal-point approach is denoted [MP2.5/TQZ;δ:DZ] in the table. Compared to using MP2 for the large-basis part of the computation, i.e., [TQZ;δ:DZ], the MP2.5 approach significantly tightens the error distribution in bond lengths and slightly improves the error distribution for bond angles. Maximum errors drop from 0.0125 Å and 0.43° to 0.0066 Å and 0.35°. Mean absolute errors are also improved. Indeed, results are nearly as good as focal-point methods using triple-ζ methods for at a substantially reduced cost (34.4 min vs 116.4 min for a [TQZ;δ:TZ] gradient for benzene). Table S-2 of the supplementary material shows that statistics using MP3 for the large-basis computations is not as good as for MP2.5, consistent with the claims of the MP2.5 method that provides a compromise between errors inherent in MP2 and MP3.51
D. Conclusions
Bearing in mind that electron correlation beyond CCSD(T) and relativistic effects can change bond lengths between first-row atoms by ∼0.0005 Å, we can consider approximations to valence CCSD(T)/CBS geometries to be of “benchmark quality” if they exhibit MAE of approximately this size. Referring back to Table I, we see that conventional CCSD(T) would require basis sets as large as (heavy-aug-)cc-pV5Z to achieve this level of accuracy. Explicitly correlated CCSD(T)-F12 achieves this using the TZ-F12 basis. Among the focal-point methods tested, essentially all of them achieve this level of accuracy on average, so long as is computed in a basis set of triple-ζ quality. Examination of Table II indicates that MAEs for bonds involving second-row atoms are a little larger than for first-row atoms (as shown in the supplementary material, these errors are somewhat larger on a percentage basis as well). To achieve this “benchmark” ∼0.0005 Å accuracy for second-row atoms requires at least a Q5Z extrapolation for the MP2 component of the computation. Examining the maximum absolute errors as well, we see that the focal-point approximation CCSD(T)/[haQ5Z;δ:haTZ] has a MaxAE of only 0.0019 Å, compared to 0.0018 Å for conventional CCSD(T)/ha5Z and 0.0026 Å for CCSD(T)-F12/cc-pVTZ-F12. This allows us to recommend the focal-point approach CCSD(T)/[haQ5Z;δ:haTZ] as a suitable replacement for computing “benchmark-quality” CCSD(T)/CBS equilibrium geometries for larger molecules where CCSD(T)/(ha)5Z computations are cost-prohibitive. The speedup for this focal-point method is tremendous, as an analytic gradient for benzene requires 380 min vs 11 488 min for CCSD(T)/cc-pV5Z. RMSE, MAE, and ME are nearly as good for CCSD(T)/[Q5Z;δTZ] as for CCSD(T)/[haQ5Z;δ;haTZ] (and the time drops from 379.8 min to only 124.5 min), but the MaxAE increases to 0.0036 Å.
If instead we can settle for an accuracy vs CCSD(T)/CBS of ∼ 0.005 Å maximum absolute error, then any of the focal-point methods tested are suitable for this purpose, except for the least expensive CCSD(T)/[TQZ;δ:DZ]. Likewise, conventional CCSD(T) using basis sets of quadruple-ζ quality or better are acceptable. However, again, the focal-point approaches are much more computationally affordable, e.g., 22 min for CCSD(T)/[haTQZ;δ:haDZ] vs 3622 min for a CCSD(T)/haQZ computation of similar accuracy.
When an efficient DF-MP2 gradient code is available, it seems advantageous to use larger basis sets for the MP2 portion of the computation, such as (ha)Q5Z extrapolations. Focal-point approaches using triple-ζ basis sets for are also significantly more accurate than the ones using double-ζ basis sets, although for many applications the double-ζ basis sets may be accurate enough. Partial augmentation of the basis sets provides geometries of a quality intermediate between the unagumented and fully augmented results.
Compared to explicitly correlated CCSD(T)-F12, we find superior error statistics for CCSD(T)/[haTQZ;δ:haDZ] vs CCSD(T)-F12/DZ-F12 and for CCSD(T)/[haQ5Z;δ:haTZ] vs CCSD(T)-F12/TZ-F12. The focal-point computations are also significantly faster than the F12 computations, due to the larger size of the XZ-F12 basis sets vs the haXZ basis sets in addition to the extra expense of the explicit correlation terms.
Figure 3 summarizes our recommendations about computational cost-vs-accuracy for the focal-point methods tested vs conventional CCSD(T). If one wishes to maximize both accuracy and computational speed, the focal-point methods examined here are superior to conventional CCSD(T). Standard CCSD(T) would only be preferable if the molecule is so small that the cost of the computation is not high or if MP2 for that system is somehow suspect (e.g., perhaps a very small HOMO–LUMO gap). Of course, optimizing geometries using focal-point methods is not convenient using most standard quantum chemistry program packages, but this technical problem is addressed with the implementation reported here, which is available as open-source software through the Psi4 program.43 These capabilities are also being migrated to the quantum chemistry common driver and databases (QCDB) project,71 which will extend them to additional quantum chemistry packages.
SUPPLEMENTARY MATERIAL
See the supplementary material for more detailed error statistics for all methods considered here, including various truncations of the diffuse functions in the basis sets.
ACKNOWLEDGMENTS
This work was supported, in part, by the U.S. National Science Foundation (Grants Nos. CHE-1566192 and ACI-1449723) and by the Scientific and Technological Research Council of Turkey (TUBITAK-118Z916). We are grateful to Professor Amir Karton (University of Western Australia) for providing the raw data from his recent study on CBS convergence of CCSD(T) geometries. We thank Professor Ryan Fortenberry for helpful discussions.
The data that support the findings of this study are available from the corresponding author upon reasonable request.