Noncovalent interactions are of fundamental importance across the disciplines of chemistry, materials science, and biology. Quantum chemical calculations on noncovalently bound complexes, which allow for the quantification of properties such as binding energies and geometries, play an essential role in advancing our understanding of, and building models for, a vast array of complex processes involving molecular association or self-assembly. Because of its relatively modest computational cost, second-order Møller-Plesset perturbation (MP2) theory is one of the most widely used methods in quantum chemistry for studying noncovalent interactions. MP2 is, however, plagued by serious errors due to its incomplete treatment of electron correlation, especially when modeling van der Waals interactions and π-stacked complexes. Here we present spin-network-scaled MP2 (SNS-MP2), a new semi-empirical MP2-based method for dimer interaction-energy calculations. To correct for errors in MP2, SNS-MP2 uses quantum chemical features of the complex under study in conjunction with a neural network to reweight terms appearing in the total MP2 interaction energy. The method has been trained on a new data set consisting of over 200 000 complete basis set (CBS)-extrapolated coupled-cluster interaction energies, which are considered the gold standard for chemical accuracy. SNS-MP2 predicts gold-standard binding energies of unseen test compounds with a mean absolute error of 0.04 kcal mol−1 (root-mean-square error 0.09 kcal mol−1), a 6- to 7-fold improvement over MP2. To the best of our knowledge, its accuracy exceeds that of all extant density functional theory- and wavefunction-based methods of similar computational cost, and is very close to the intrinsic accuracy of our benchmark coupled-cluster methodology itself. Furthermore, SNS-MP2 provides reliable per-conformation confidence intervals on the predicted interaction energies, a feature not available from any alternative method.
I. INTRODUCTION
The development of accurate and efficient quantum chemical methods for modeling intermolecular interactions is one of the primary goals of modern computational chemistry. These interactions play a central role throughout biology, materials science, and chemistry and drive a diverse set of phenomena, such as the noncovalent binding of drugs to proteins, the packing of crystals and self-assembly of nanomaterials, and the establishment of the bulk properties of liquids.1–3
Among the range of methods available, second-order Møller-Plesset perturbation (MP2) theory stands out for its principled physical description of intermolecular interactions and relatively modest computational cost.4 It is the simplest quantum chemical method that contains an ab initio description of each of the essential physical effects for accurately modeling noncovalent interactions; although Hartree-Fock (HF) and Kohn-Sham density functional theory (DFT) can account for electrostatic, polarization, and exchange interactions, neither method is currently able to capture long-range dispersion effects without additional cost or parameterization. In contrast, MP2, the most efficient wavefunction-based ab initio electronic-structure method to directly, if approximately, include dynamical correlation, describes dispersion interactions explicitly. Modern techniques such as density fitting/resolution of the identity (DF-MP2/RI-MP2), local correlation (LMP2, TRIM-MP2), and their combination (DF-LMP2, RI-TRIM) have, moreover, considerably improved the computational performance of MP2 such that systems with hundreds of atoms can now be handled routinely.5–10
Nevertheless, MP2’s incomplete treatment of electron correlation sometimes results in significant errors. In the case of intermolecular interactions, the largest errors are often found in systems with π–π stacking and those that are dominated by dispersion forces.11,12 Relative to highly accurate coupled-cluster results, for example, MP2 overestimates the binding energies of sandwich and T-stacked benzene dimers by factors of approximately 2 and 1.3, respectively.13 More accurate calculations can, however, be quite computationally expensive. The cost of benchmark-quality coupled-cluster calculations with single, double, and perturbative triple excitations [CCSD(T)], for example, formally scales as O(N⁷), where N is proportional to the system size, whereas MP2 scales only as O(N⁵). For large systems, especially when the Boys-Bernardi counterpoise correction is used, these CCSD(T) calculations can be prohibitively costly.14 Whereas a single benzene dimer interaction-energy calculation in a relatively modest augmented double-ζ basis set can, for example, take over 24 h to complete on a contemporary workstation using CCSD(T), the corresponding DF-MP2 calculation completes in under 5 min.
In order to overcome these limitations of MP2, we have developed spin-network-scaled MP2 (SNS-MP2), a semi-empirical MP2-based method for dimer interaction-energy calculations. SNS-MP2 uses a neural network to reweight terms appearing in the total MP2 interaction energy so as to reproduce benchmark-quality, complete basis set (CBS)-extrapolated CCSD(T) interaction energies, which are considered the gold standard in computational chemistry due to their high accuracy.15 The method can be interpreted as an extension of Grimme’s spin-component-scaled MP2 (SCS-MP2) method (see Sec. II).16 Instead of using two global spin-component coefficients chosen during training, as in the original SCS-MP2 and in other related approaches, in SNS-MP2, a collection of per-conformation HF-, MP2-, and SAPT0-based17,18 features is used to derive, by way of a neural network, a unique set of spin-component coefficients for the particular system and conformation under study. SNS-MP2 predicts the interaction energies of our test set of 22 643 geometries of unseen complexes with a mean absolute error (MAE) of 0.04 kcal mol−1 [root-mean-square error (RMSE) 0.09 kcal mol−1] relative to the gold-standard values, a 6–7× improvement over MP2. Moreover, in addition to providing a point estimate of the interaction energy, SNS-MP2 provides a reliable predictive interval, or an error bar, on each interaction-energy value, a feature not available from any alternative method that we are aware of.
Based on reports in the literature concerning errors in CCSD(T)/CBS methodologies, we estimate the reliability of the benchmark values in our training and test data to be between 0.05 and 0.10 kcal mol−1 RMSE. This indicates that the SNS-MP2 method is approaching the intrinsic accuracy of the CCSD(T)/CBS reference data it has been trained to predict. In further evaluation of the SNS-MP2 model on two previously published test sets from Řezáč, Riley, and Hobza19,20 and Taylor et al.21 that have been used to benchmark the accuracy of a wide variety of quantum chemical methods, we find that the accuracy of SNS-MP2 exceeds that of all previously reported O(N⁵) and faster wavefunction-based methods and all density functionals.
II. RELATED WORK
This work draws from two lines of related research in the fields of quantum chemistry and machine learning. The first area of work, in quantum chemistry, involves efforts to obtain accurate interaction energies using methods based solely on MP2 and/or DFT, thereby avoiding the additional computational cost of higher accuracy methods such as CCSD(T). Novel methods in this class include MP2C,22 MP2+Δvdw,23 SCS-MP2,16 and SCS-MP2 variants such as SCSN-MP2,24 SCS(MI)-MP2,25 SOS-MP2,26 and JMP2.27 The second area of work, chiefly from the cheminformatics, force field development, and machine learning communities, involves efforts to use deep neural networks to predict molecular properties. Our work relies on combining the respective strengths of both approaches for modeling intermolecular interactions by integrating machine learning techniques within the mathematical structure of the MP2 method.
Most MP2-based semi-empirical methods that have been developed within the quantum chemistry community involve breaking the MP2 energy into a sum of two or more physically meaningful components and then either rescaling the components empirically or replacing them with values calculated using another method. A major deficiency of MP2 for intermolecular interactions in the asymptotic limit is its implicit inclusion of the uncoupled Hartree-Fock (UCHF) dispersion energy.28,29 The MP2C method of Pitoňák and Heßelmann leverages this observation by subtracting out the UCHF dispersion energy from the MP2 interaction energy and adding back in a more accurate approximation of the dispersion energy from a coupled Kohn-Sham DFT calculation.22 MP2+Δvdw adopts a similar approach, supplementing the MP2 interaction energy with a dispersion correction based on a pairwise interatomic CₙR⁻ⁿ summation.23
SCS-MP2 is a method proposed originally by Grimme that exploits the fact that the MP2 correlation energy can be broken into two terms arising from the correlation of the same-spin and opposite-spin electron pairs.16 These contributions can then be empirically scaled and recombined to yield more accurate energies. Values of 6/5 and 1/2 were recommended for the opposite-spin and same-spin scale factors (referred to as the “SCS coefficients”), respectively, based on least-squares fitting of the coefficients to 51 quadratic configuration interaction reaction energies.16
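The SCS recombination is simply a two-parameter linear reweighting of the two spin components. A minimal sketch, using Grimme's original coefficients (the spin-component energies below are made-up illustrative values, not from Ref. 16):

```python
def scs_mp2_correlation(e_os, e_ss, c_os=6/5, c_ss=1/2):
    """Recombine the opposite-spin (e_os) and same-spin (e_ss) MP2
    correlation energies with empirical scale factors; the defaults
    are Grimme's original SCS coefficients."""
    return c_os * e_os + c_ss * e_ss

# Illustrative (hypothetical) spin components, in hartree:
e_scs = scs_mp2_correlation(-0.250, -0.080)
# Plain MP2 corresponds to unit coefficients:
e_mp2 = scs_mp2_correlation(-0.250, -0.080, c_os=1.0, c_ss=1.0)
```

The refitted variants discussed below (SCSN-MP2, SCS(MI)-MP2, etc.) differ only in the values of `c_os` and `c_ss`.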
Although SCS-MP2 has demonstrated impressive performance for reaction energies, barrier heights, and molecular geometries,16 the results have been less encouraging for interaction energies. DiStasio and Head-Gordon reported, for example, that on Hobza’s S22 test set,12 SCS-MP2 produced consistently less accurate results than standard DF-MP2.25 Fitting the SCS coefficients afresh, they obtained values of 0.40 and 1.29 for the opposite-spin and same-spin coefficients, essentially inverses of the previous values. Hill and Platts independently refitted the SCS coefficients for interaction energies of 10 stacked nucleic acid base pairs.24 They obtained values of 0 and 1.75 for the opposite-spin and same-spin scaling factors, respectively. King showed that a slight modification to Grimme’s parameters, with values of 1.28 and 0.5 for the opposite-spin and same-spin coefficients, was capable of yielding accurate results for the ethylene dimer.30
These results, taken together, suggest that the optimal SCS coefficients may depend significantly on the nature of the molecules under study as well as on the details of their interaction. Different dimer systems may be most accurately modeled with different parameter values—whereas certain values may perform well for hydrogen bonding interactions, for example, other values may be appropriate for π–π interactions, and so on. One implication of this suggestion is that more flexible models, in which the coefficients are permitted to vary across chemical (and possibly conformational) space, could yield improved accuracy. Tan, Acevedo, and Izgorodina recently explored this direction with spin-ratio-scaled MP2 (SRS-MP2), an extension of SCS-MP2 in which two different sets of SCS coefficients are used depending on the ratio of the opposite-spin to the same-spin components of the interaction correlation energy.31
If an SCS-like model is to be constructed with variable coefficients, by what criteria should the model determine which coefficient values are used in each calculation? One choice would be to fit a series of separate fixed-coefficient models for different classes of molecular interactions; this would, however, prevent us from exploiting the information shared across these classes and require difficult decisions about the level of specificity with which the classes should be defined. Another approach, drawing from MP2+Δvdw,23 molecular mechanics force fields, and dispersion-corrected DFT, could be to define the SCS coefficients in terms of parameters associated with a set of atom types in addition to the nuclear geometry. In addition to the difficulty of defining the appropriate atom-type groupings, however, this method would require the development of yet unknown geometry-dependent functional forms.
In this work we have chosen an alternative strategy, one that utilizes quantum chemical observables related to decompositions of the HF, MP2, and SAPT0 interaction energies for the particular dimer configuration under study as input features to a deep neural network that constructs a unique pair of SCS coefficients for that configuration. This approach avoids difficulties related to the division of molecules into classes and the manual selection of atom types inherent in the two approaches mentioned above. Furthermore, as we will show, the result is a method for calculating MP2-based interaction energies that is dramatically more accurate than any previously reported.
Although this approach shares many similarities with other recently developed methods that bridge quantum chemistry and machine learning, some distinctions should be noted. Many of these recently developed methods attempt to predict the atomization energy or other conformation-dependent properties of molecules directly by using connectivity or geometric information represented in a variety of forms, including through Coulomb matrices,32 scattering transforms,33 generalized symmetry functions,34,35 bags of bonds,36 and deep tensor neural networks.37 When successful, these approaches provide an alternative to quantum chemical calculations and can be thought of as molecular mechanics force fields with significantly more flexible functional forms. The strategy we take here is somewhat different—our model does not eliminate the need for quantum chemical calculations on new conformations but instead uses them as its core building blocks. It aims to build on the strengths of MP2 and provide a method for predicting benchmark-quality CCSD(T)/CBS-level dimer interaction energies without the cost of coupled-cluster calculations.
III. NEURAL NETWORK ARCHITECTURE
The role of the deep neural network in this work is to take values from a series of quantum mechanical calculations performed on the conformation under study and output a predicted energy and corresponding uncertainty. The architecture of our deep neural network is largely a generic feed-forward multilayer perceptron (MLP).38 It operates on each conformation independently, receiving a vector of 22 input features (described in Table I) and outputting three scalars, which we label as c_os, c_ss, and b. The first two values, c_os and c_ss, are the spin-component coefficients used in the construction of the final energy, and the third value, b, determines the uncertainty of the prediction. The architecture of the network is shown in Fig. 1. Although all of the quantum chemical input features depend on the molecular geometry, no direct geometric observables such as interatomic distances are employed and no auxiliary information is provided—during training the model is not informed of, for example, which inputs represent the same chemical systems or nearby molecular geometries.
TABLE I. Quantum chemical input features supplied to the neural network.

| Name | Description |
|---|---|
| ΔE_HF (large basis) | Large-basis HF interaction energy |
| ΔE_HF (small basis) | Small-basis HF interaction energy |
| ΔE_MP2 (large basis) | Large-basis MP2 interaction energy |
| ΔE_MP2 (small basis) | Small-basis MP2 interaction energy |
| ΔE_MP2^ss (large basis) | Large-basis MP2 same-spin correlation interaction energy |
| ΔE_MP2^ss (small basis) | Small-basis MP2 same-spin correlation interaction energy |
| ζ_low = 2 and ζ_low = 3 | One-hot encoding of the ζ-value of the small basis set, two binary features |
| E_elst^HF | Large-basis HF electrostatic interaction energy |
| E_elst^MP2 | Large-basis MP2 electrostatic interaction energy |
| E_HL | Large-basis Heitler-London energy |
| O_MP2 | Large-basis MP2 density-matrix overlap |
| O_HF | Large-basis HF density-matrix overlap |
| E_disp^(20) | SAPT0 second-order dispersion energy |
| E_disp^(20) (same-spin) | SAPT0 same-spin second-order dispersion energy |
| E_elst^(10) | SAPT0 first-order electrostatic energy |
| E_exch^(10)(S²) | SAPT0 first-order exchange energy (S² approximation) |
| E_exch-disp^(20) | SAPT0 second-order exchange-dispersion energy |
| E_exch-disp^(20) (same-spin) | SAPT0 same-spin second-order exchange-dispersion energy |
| E_exch^(10) | SAPT0 first-order exchange energy |
| E_exch-ind^(20) | SAPT0 second-order exchange-induction energy |
| E_ind^(20) | SAPT0 second-order induction energy |
The estimated total interaction energy is constructed from c_os and c_ss as

$$\hat{E}_{\text{SNS-MP2}} = E_{\text{HF/aVQZ}} + c_{\text{os}}\, E^{\text{os}}_{\text{MP2/CBS}} + c_{\text{ss}}\, E^{\text{ss}}_{\text{MP2/CBS}}, \qquad (1)$$

where E_HF/aVQZ is the Hartree-Fock interaction energy computed in the aVQZ basis set (a modification of the aug-cc-pVQZ basis; see Appendix A for further details), and E^os_MP2/CBS and E^ss_MP2/CBS are the CBS-extrapolated opposite-spin and same-spin components of the MP2 correlation interaction energy. The CBS limit is approximated by a two-point Helgaker extrapolation.39 Note that all of the components required for the calculation of Eq. (1) are themselves elements of the input vector used to compute c_os and c_ss (see Table I). Nevertheless, we interpret this architecture as enforcing a quasi-linearity of Ê_SNS-MP2 with respect to E^os_MP2/CBS and E^ss_MP2/CBS, and we find that this model yields much more accurate predictions than those in which the neural network directly outputs the total interaction energy without a final SCS-MP2-like resummation.
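The final energy assembly of Eq. (1) is a three-term sum over precomputed components. A minimal sketch (the component values and coefficients below are hypothetical stand-ins for real quantum chemical outputs and network outputs):

```python
def sns_mp2_energy(e_hf_avqz, e_os_cbs, e_ss_cbs, c_os, c_ss):
    """Assemble the SNS-MP2 interaction energy from the aVQZ HF
    interaction energy and the CBS-extrapolated opposite-spin and
    same-spin MP2 correlation components, using the per-conformation
    coefficients (c_os, c_ss) produced by the neural network."""
    return e_hf_avqz + c_os * e_os_cbs + c_ss * e_ss_cbs

# Hypothetical component values (kcal/mol) and network outputs:
e_hat = sns_mp2_energy(-1.20, -2.50, -0.60, c_os=0.95, c_ss=1.10)
```

Setting `c_os = c_ss = 1` recovers the plain MP2/CBS interaction energy, which makes the quasi-linearity noted above explicit.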
In employing any quantum chemical method to address scientific questions, it is essential to evaluate, either qualitatively or quantitatively, the degree of confidence which should be accorded to the results. For this reason, instead of merely predicting a single scalar, SNS-MP2 has been constructed to output a probability distribution over interaction energies, centered at ÊSNS-MP2 with some variable width, representing the range of possible values it predicts that the CCSD(T)/CBS interaction energy may take. The use of a heteroskedastic error model is particularly important because the areas of intended application of the model include both compact and extended dimers, for which the uncertainty in the interaction energy differs substantially. This information is typically summarized by a predictive interval, or an error bar, on the interaction energy.
We utilize two extensions to the basic MLP architecture, both related to the quantification of uncertainty in the model’s predicted energies. The model’s uncertainty is parameterized using the neural network’s third scalar output, b, which is interpreted as the width of a Laplace distribution,

$$p(E \mid \hat{E}_{\text{SNS-MP2}}, b) = \frac{1}{2b} \exp\!\left(-\frac{|E - \hat{E}_{\text{SNS-MP2}}|}{b}\right). \qquad (2)$$
This error distribution is incorporated into the model by the choice of the loss function used in training this neural network.40,41 The network’s weights, W, are optimized to approximately minimize ℒ(W),

$$\mathcal{L}(W) = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{|\Delta E^{\text{ref}}_{i} - \hat{E}_{i}|}{b_{i}} + \ln b_{i} \right], \qquad (3)$$
where the index i runs over all data points in the training set and N is the number of training data points. Ignoring an additive constant, this objective function is the negative mean log likelihood of the data under our Laplace error model. It incentivizes both small (scaled) absolute errors and tight error bars. The choice of an error model based on the Laplace distribution mitigates the effect of outliers when fitting the SNS-MP2 model; we note that such error models have also been widely used in the field of robust regression for similar reasons.42 The use of a softplus activation function43 on the last layer of the network ensures that the value of b will always be positive, so no constraints are required to keep Eq. (3) finite and well behaved.
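A minimal sketch of this training objective, assuming per-example widths b_i and dropping the additive ln 2 constant mentioned above:

```python
import math

def laplace_nll(e_ref, e_pred, b):
    """Per-example loss |dE|/b + ln(b), averaged over the batch.
    Up to an additive constant (ln 2), this is the negative mean log
    likelihood of the reference energies under the Laplace error
    model: small scaled errors and tight error bars are rewarded."""
    terms = [abs(r - p) / w + math.log(w)
             for r, p, w in zip(e_ref, e_pred, b)]
    return sum(terms) / len(terms)

def softplus(x):
    """softplus(x) = ln(1 + e^x), applied to the network's raw third
    output so that the predicted width b is always positive."""
    return math.log1p(math.exp(x))
```

Because b appears in the denominator of the error term and inside the logarithm, the network cannot shrink the loss by making b arbitrarily large or small; the optimum balances accuracy against calibrated error bars.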
We also use dropout, both during training and prediction.44 During training, dropout works by stochastically setting a small fraction of the weights in the neural network to zero when computing the gradient of the loss function with respect to the weights. This can be interpreted as an approximate variational Bayesian inference method or as a form of model averaging.44–46 At test time, instead of using the common weight-scaling technique in which the network weights are rescaled to their expected value given the dropout fraction, we use Monte Carlo model averaging. In this approach, when making a prediction, a moderate number (∼1000) of random neural networks are sampled using dropout, the prediction is made with each network, and the results are averaged. Each sampled network yields a separate Laplace distribution over the predicted energy—the resulting averaged prediction using Monte Carlo model averaging is a mixture of these Laplace distributions. Because it requires a number of forward passes through the network to sample the effect of the dropout, Monte Carlo model averaging is more computationally costly than the common weight-averaging approach. For our purposes, however, the incremental cost of these extra forward passes through the network is negligible with respect to the cost of the HF, MP2, and SAPT0 calculations that generate the input features. The advantage over conventional weight scaling is both formal correctness44 and a slight empirical increase in predictive accuracy.
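The Monte Carlo averaging mechanics can be sketched as follows. The single-layer "network" here is a toy stand-in for the real 10-layer MLP, and the 5% dropout rate is an arbitrary illustrative choice; only the sampling-and-averaging structure is the point:

```python
import math, random

def forward(x, weights, drop=0.05, rng=random):
    """One stochastic forward pass: each weight is zeroed with
    probability `drop` (dropout), mimicking sampling one network from
    the approximate posterior. Returns (e_hat, b): the center and
    width of one Laplace distribution."""
    w_e, w_b = weights
    e_hat = sum(wi * xi * (0.0 if rng.random() < drop else 1.0)
                for wi, xi in zip(w_e, x))
    raw_b = sum(wi * xi * (0.0 if rng.random() < drop else 1.0)
                for wi, xi in zip(w_b, x))
    b = math.log1p(math.exp(raw_b))          # softplus keeps b > 0
    return e_hat, b

def mc_dropout_predict(x, weights, n_samples=1000, seed=0):
    """Monte Carlo model averaging: the prediction is a mixture of
    Laplace distributions, one per sampled network; the mixture mean
    is the average of the component centers."""
    rng = random.Random(seed)
    samples = [forward(x, weights, rng=rng) for _ in range(n_samples)]
    e_mean = sum(e for e, _ in samples) / n_samples
    return e_mean, samples
```

In the real method the extra forward passes are negligible next to the HF, MP2, and SAPT0 calculations that produce the input features, so the ~1000 samples add essentially no cost.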
IV. DATA SET
Because we opted for a flexible neural network-based model with a large number of parameters, overfitting is a significant concern, and very large data sets for training and validation are thus essential. Accordingly, we collected 247 560 benchmark- and near-benchmark-quality CCSD(T)/CBS dimer interaction-energy data points. This data set is nearly twice as large as the QM9 data set of DFT-computed [B3LYP/6-31G(2df,p)] properties, which has been used to fit machine learning models, and significantly larger than previous CCSD(T)/CBS data sets used to fit other semi-empirical MP2-based methods and density functionals.19,47–49 The collection of CCSD(T)/CBS data used in this work is over 100 times larger than the set of 2027 CCSD(T) noncovalent interaction energies curated by Schneebeli, Bochevarov, and Friesner.49
The data set contains 3149 unique chemical systems in an average of 80 different conformations per chemical system (min: 3, median: 29, max: 2486). The systems span the range of chemical functionality found in biological systems and covered by common biomolecular force fields. The distribution of this data set by functional groups and sizes is shown in Figs. 2 and 3, respectively.77 The most prevalent molecule in the data set is water; the set features 2486 conformations of water dimer and 115 227 conformations for which at least one of the molecules is water. Due to concerns over complete charge transfer in the supermolecular interaction-energy calculations, we excluded all systems in which either fragment carried a net charge of magnitude two or greater, as well as all systems in which the two fragments carried net charges of opposite sign.
At the outset, the data set was split into three subsets for training, validation, and final testing. The unique chemical systems were split randomly into an 80%/10%/10% distribution among the three subsets. All of the geometries associated with a given chemical system were either entirely in the training set, entirely in the validation set, or entirely in the final test set. After performing this division, the training, validation, and test sets contained 204 934, 19 983, and 22 643 data points, respectively.
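A sketch of this system-level split: the 80%/10%/10% fractions are applied over chemical systems, not over individual data points, so that all conformations of a system share one subset. The system identifiers below are placeholders:

```python
import random

def split_by_system(system_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign each data point to train/val/test such that every
    conformation of a given chemical system lands in the same subset,
    mirroring the 80%/10%/10% system-level split described above."""
    systems = sorted(set(system_ids))
    rng = random.Random(seed)
    rng.shuffle(systems)
    n_train = int(fractions[0] * len(systems))
    n_val = int(fractions[1] * len(systems))
    train = set(systems[:n_train])
    val = set(systems[n_train:n_train + n_val])
    return ['train' if s in train else 'val' if s in val else 'test'
            for s in system_ids]
```

Splitting at the system level (rather than randomly over conformations) is what makes the reported test error a measure of generalization to genuinely unseen complexes.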
A. Reference energies
Counterpoise-corrected benchmark values were computed with a hybrid CCSD(T) and CBS-extrapolated MP2 method that is often referred to as the gold standard in computational thermochemistry.15 The reference energy is constructed from three components: the aVQZ HF interaction energy, E_HF/aVQZ; the MP2 interaction correlation energy extrapolated to the CBS limit using Helgaker’s inverse third-power form,39 E^corr_MP2/CBS; and a high-level correlation energy correction computed as the difference between the CCSD(T) and MP2 interaction energies in a single basis set, δ^CCSD(T)_MP2,

$$E_{\text{ref}} = E_{\text{HF/aVQZ}} + E^{\text{corr}}_{\text{MP2/CBS}(X \to \text{aVQZ})} + \delta^{\text{CCSD(T)}}_{\text{MP2}/X'}, \qquad (4)$$

$$E^{\text{corr}}_{\text{MP2/CBS}(X \to \text{aVQZ})} = \frac{4^3\, E^{\text{corr}}_{\text{MP2/aVQZ}} - X^3\, E^{\text{corr}}_{\text{MP2}/X}}{4^3 - X^3}, \qquad (5)$$

$$\delta^{\text{CCSD(T)}}_{\text{MP2}/X'} = E_{\text{CCSD(T)}/X'} - E_{\text{MP2}/X'}, \qquad (6)$$

where X in Eq. (5) denotes the cardinal (ζ) number of the small basis set.
For the two-point MP2/CBS extrapolation, all 247 470 high-basis MP2 calculations were performed in the aVQZ basis, while the low-basis MP2 calculations [basis set X in Eqs. (4) and (5)] were performed in a mixture of basis sets: 185 334 were performed at aVTZ, 47 413 at aVTZ(d/p), and 14 723 at aVDZ (see Appendix A for a specification of these basis sets). The CCSD(T) high-level correlation corrections dominate the cost of the benchmark calculation and were also performed at a mixture of basis sets [basis set X′ in Eqs. (4) and (6)]: 14 176 were performed at aVQZ, 66 652 at aVTZ, 107 315 at aVTZ(d/p), and 59 327 at aVDZ. The choice of the basis set used for each system was made based on the molecular size, with larger systems run in smaller basis sets to reduce the computational cost.
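The two-point extrapolation and the composite reference of Eq. (4) can be sketched as follows (the example energies in the test values are illustrative, in hartree):

```python
def helgaker_cbs(e_corr_low, zeta_low, e_corr_high, zeta_high):
    """Two-point inverse-cubic extrapolation of the correlation
    energy (Helgaker et al.):
    E_CBS = (X^3 E_X - Y^3 E_Y) / (X^3 - Y^3), with X > Y the
    cardinal numbers of the two basis sets."""
    x3, y3 = zeta_high ** 3, zeta_low ** 3
    return (x3 * e_corr_high - y3 * e_corr_low) / (x3 - y3)

def reference_energy(e_hf_avqz, e_corr_cbs, e_ccsdt_x, e_mp2_x):
    """Composite CCSD(T)/CBS estimate: aVQZ HF interaction energy,
    plus CBS-extrapolated MP2 correlation, plus the small-basis
    CCSD(T) - MP2 high-level correction."""
    return e_hf_avqz + e_corr_cbs + (e_ccsdt_x - e_mp2_x)
```

Note that only the correlation energy is extrapolated; the HF component converges much faster with basis-set size and is taken directly from the aVQZ calculation.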
These calculations were performed using three software packages—MOLPRO 2012,50,51 MOLPRO 2015,50,52 and Psi4 1.053—each with some in-house modifications. All calculations were performed in the C1 point group, all correlated calculations used the frozen-core approximation, and most MP2 calculations were performed with density fitting.78
The geometries were generated both by DF-LMP2/aVTZ geometry optimization and by the extraction of frames from molecular dynamics simulations. Frequently, although not exclusively, conformations were generated based on radial scans that were seeded from both optimized geometries and conformations observed in molecular dynamics simulations. The radial scans were performed by displacing the monomers along their intermolecular axis in units of 0.1 Å. Because of these procedures, a greater degree of diversity is present in the intermolecular distances and orientations of the dimer configurations than in the geometries of their constituent monomers, which tend to be at, or near, equilibrium.
B. Input features
The model does not use the geometry information or atomic composition for a given conformation directly as input features. Instead, it uses quantum chemical features computed from HF, MP2, and SAPT0 calculations on the dimer and its monomers. The features are intended to encode the nature of the interaction in a geometry-independent manner such that chemically disparate dimers can be represented within a single model. We briefly describe each feature below; refer to Table I for the complete list.
The first set of features includes those which are computed from low-basis and high-basis counterpoise-corrected interaction-energy calculations. These are the same underlying QM calculations used for the MP2/CBS component of the gold-standard reference values. The features in this category are the HF and MP2 interaction energies and the same-spin component of the MP2 interaction correlation energy. These values are computed in two basis sets, and we supply the network with both values, for a total of six features. Whereas the high basis is fixed to aVQZ, the low basis varies across the data set. We thus supplied a one-hot encoding54 of the ζ-value of the low basis as two additional binary input features.
Furthermore, we performed a series of aVQZ calculations to compute the electrostatic interaction energy, Heitler-London energy, and density-matrix overlap. The electrostatic interaction energy between fragments A and B is defined as

$$E^{X}_{\text{elst}} = \int\!\!\int \frac{\rho^{X}_{A}(\mathbf{r}_1)\, \rho^{X}_{B}(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}\, d\mathbf{r}_1\, d\mathbf{r}_2 - \sum_{b \in B} Z_b \int \frac{\rho^{X}_{A}(\mathbf{r})}{|\mathbf{r} - \mathbf{R}_b|}\, d\mathbf{r} - \sum_{a \in A} Z_a \int \frac{\rho^{X}_{B}(\mathbf{r})}{|\mathbf{r} - \mathbf{R}_a|}\, d\mathbf{r} + \sum_{a \in A} \sum_{b \in B} \frac{Z_a Z_b}{|\mathbf{R}_a - \mathbf{R}_b|}, \qquad (7)$$

where ρ^X_A(B) is the electron density associated with the relevant monomer calculated at the X = HF or MP2 level in the monomer basis set, and Z_a and R_a denote nuclear charges and positions. The Heitler-London energy provides information about the Pauli repulsion between the monomers and is defined as

$$E_{\text{HL}} = \frac{\langle \hat{\mathcal{A}}\, \Phi_A \Phi_B |\, \hat{H}\, |\, \hat{\mathcal{A}}\, \Phi_A \Phi_B \rangle}{\langle \hat{\mathcal{A}}\, \Phi_A \Phi_B \,|\, \hat{\mathcal{A}}\, \Phi_A \Phi_B \rangle} - E^{\text{HF}}_{A} - E^{\text{HF}}_{B}, \qquad (8)$$

where Â is the antisymmetrizer and Φ_AΦ_B is the Hartree product of the isolated-monomer ground-state HF wavefunctions computed in the corresponding monomer basis sets. The density-matrix overlap is

$$O_{X} = \int\!\!\int \gamma^{X}_{A}(\mathbf{r}_1, \mathbf{r}_2)\, \gamma^{X}_{B}(\mathbf{r}_2, \mathbf{r}_1)\, d\mathbf{r}_1\, d\mathbf{r}_2, \qquad (9)$$

where γ^X_A(B) is the one-particle reduced density matrix associated with the relevant monomer calculated at the X = HF or MP2 level in the monomer basis set. These quantities yield an additional five features.
Finally, we included nine features collected from SAPT0/aVTZ calculations for each data point.17,18 These features are the second-order dispersion, induction, exchange-dispersion, and exchange-induction energies, the same-spin component of the second-order dispersion and exchange-dispersion energies, the first-order electrostatic and exchange energies, and the first-order exchange energy computed in the S2 approximation.
An open-source Psi4 plugin that allows users to compute SNS-MP2 interaction energies and confidence intervals is available at http://www.deshawresearch.com/resources_snsmp2.html.
V. RESULTS
A. Training set
Extensive but non-exhaustive scans over hyperparameters were performed to choose the number of layers, the number of hidden nodes per layer, and the dropout strength. We implemented the SNS-MP2 model with the Keras library,55 using the Theano backend,56 and observed smooth training of the network on a single NVIDIA GTX 980 Ti graphics processing unit (GPU) using the RMSProp optimizer57 with default parameters over 100 000 epochs. (The training-set error statistics are shown in Table II.) The final model, with 10 hidden layers and 50 nodes per layer, was selected to minimize the validation-set error (Table III). The training curve of the selected model, which plots the error on the training and validation sets over the course of the optimization, is shown in Fig. 4. Despite the use of dropout, some evidence of overfitting is still visible in the gap between the performance on the training set and on the validation set. On a log scale, the training loss does not appear to have saturated, although the validation loss appears to have converged.
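The selected dimensions (22 inputs, 10 hidden layers of 50 nodes, 3 outputs with softplus on the width) can be sketched as a plain forward pass. The tanh hidden activation and the initialization scheme below are assumptions for illustration; this section specifies only the softplus on the final width output:

```python
import numpy as np

def init_mlp(n_in=22, n_hidden=50, n_layers=10, n_out=3, seed=0):
    """Random weights and zero biases for a 22 -> (50 x 10) -> 3 MLP,
    matching the dimensions quoted in the text."""
    rng = np.random.default_rng(seed)
    dims = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(x, params):
    """Forward pass returning (c_os, c_ss, b); the softplus on the
    third output guarantees a positive Laplace width b. Dropout
    masks (not shown) would be applied to the hidden layers both
    during training and during MC averaging at prediction time."""
    h = np.asarray(x, dtype=float)
    for w, bias in params[:-1]:
        h = np.tanh(h @ w + bias)      # hidden nonlinearity (assumed)
    w, bias = params[-1]
    c_os, c_ss, raw_b = h @ w + bias
    return c_os, c_ss, np.log1p(np.exp(raw_b))   # softplus

params = init_mlp()
c_os, c_ss, b = mlp_forward(np.zeros(22), params)
```

With roughly 10 weight matrices of size 50 x 50 plus the input and output layers, the model has on the order of 10^4 parameters, which is why the very large training set and dropout regularization described above are needed.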
TABLE II. Training-set error statistics relative to the CCSD(T)/CBS benchmarks, by chemical class. Energies in kcal mol−1.

| Class | N | MP2/CBS MAE | MP2/CBS RMSE | MP2/CBS 99th percentile | SNS-MP2 MAE | SNS-MP2 RMSE | SNS-MP2 99th percentile |
|---|---|---|---|---|---|---|---|
All | 204 934 | 0.208 | 0.556 | 2.631 | 0.030 | 0.078 | 0.340 |
Acids | 6 268 | 0.206 | 0.478 | 2.253 | 0.032 | 0.070 | 0.320 |
Alcohols | 16 035 | 0.180 | 0.441 | 2.105 | 0.031 | 0.077 | 0.344 |
Alkanes | 22 337 | 0.205 | 0.525 | 2.504 | 0.024 | 0.062 | 0.284 |
Alkenes | 10 702 | 0.292 | 0.669 | 2.982 | 0.037 | 0.102 | 0.498 |
Amides | 21 926 | 0.176 | 0.422 | 2.026 | 0.033 | 0.078 | 0.353 |
Amines | 18 487 | 0.220 | 0.534 | 2.483 | 0.030 | 0.074 | 0.350 |
Ammoniums | 6 516 | 0.296 | 0.633 | 2.776 | 0.042 | 0.094 | 0.417 |
Benzene | 4 396 | 0.927 | 1.649 | 6.123 | 0.066 | 0.146 | 0.673 |
Carboxylates | 3 942 | 0.188 | 0.404 | 2.008 | 0.057 | 0.120 | 0.563 |
Esters | 5 992 | 0.105 | 0.241 | 1.137 | 0.023 | 0.051 | 0.228 |
Ethers | 15 931 | 0.198 | 0.457 | 2.083 | 0.032 | 0.074 | 0.345 |
Guanidiniums | 3 937 | 0.425 | 0.861 | 3.522 | 0.058 | 0.120 | 0.578 |
Imidazolium | 2 659 | 0.690 | 1.212 | 4.554 | 0.078 | 0.160 | 0.719 |
Ketones | 9 820 | 0.171 | 0.466 | 2.338 | 0.025 | 0.065 | 0.295 |
Monoatomics | 14 505 | 0.246 | 0.521 | 2.332 | 0.045 | 0.099 | 0.372 |
Other | 40 876 | 0.226 | 0.601 | 2.786 | 0.037 | 0.081 | 0.358 |
Phenol | 3 463 | 0.658 | 1.319 | 5.492 | 0.061 | 0.125 | 0.538 |
Pyridine | 2 997 | 0.474 | 1.050 | 4.562 | 0.060 | 0.114 | 0.501 |
Pyrrole | 8 218 | 0.488 | 1.023 | 4.176 | 0.056 | 0.119 | 0.551 |
Sulfides | 22 213 | 0.395 | 0.842 | 3.532 | 0.040 | 0.088 | 0.407 |
Thiols | 19 161 | 0.285 | 0.678 | 3.099 | 0.031 | 0.099 | 0.339 |
Water | 96 591 | 0.087 | 0.191 | 0.902 | 0.020 | 0.055 | 0.217 |
TABLE III. Interaction-energy error statistics (kcal mol−1) on the validation set, by molecule class.

Class | N | MP2/CBS MAE | MP2/CBS RMSE | MP2/CBS 99th percentile | SNS-MP2 MAE | SNS-MP2 RMSE | SNS-MP2 99th percentile |
---|---|---|---|---|---|---|---|
All | 19 983 | 0.255 | 0.635 | 2.964 | 0.043 | 0.095 | 0.407 |
Acids | 1 061 | 0.199 | 0.469 | 2.243 | 0.030 | 0.061 | 0.284 |
Alcohols | 741 | 0.382 | 0.717 | 2.786 | 0.081 | 0.158 | 0.665 |
Alkanes | 1 751 | 0.254 | 0.599 | 2.747 | 0.029 | 0.063 | 0.281 |
Alkenes | 1 049 | 0.323 | 0.675 | 2.794 | 0.048 | 0.112 | 0.483 |
Amides | 1 677 | 0.238 | 0.528 | 2.354 | 0.043 | 0.097 | 0.356 |
Amines | 631 | 0.309 | 0.607 | 2.346 | 0.046 | 0.079 | 0.308 |
Ammoniums | 1 385 | 0.291 | 0.707 | 2.972 | 0.055 | 0.108 | 0.420 |
Benzene | 492 | 1.191 | 1.920 | 6.374 | 0.109 | 0.210 | 0.888 |
Carboxylates | 891 | 0.146 | 0.347 | 1.736 | 0.047 | 0.103 | 0.525 |
Esters | 881 | 0.081 | 0.152 | 0.700 | 0.036 | 0.074 | 0.316 |
Ethers | 2 148 | 0.206 | 0.479 | 2.181 | 0.038 | 0.082 | 0.391 |
Guanidiniums | 312 | 0.389 | 0.775 | 3.215 | 0.059 | 0.111 | 0.456 |
Imidazolium | 742 | 0.401 | 0.835 | 3.703 | 0.055 | 0.101 | 0.349 |
Ketones | 363 | 0.410 | 0.830 | 3.638 | 0.063 | 0.113 | 0.438 |
Monoatomics | 1 656 | 0.348 | 0.634 | 2.643 | 0.067 | 0.119 | 0.435 |
Other | 5 165 | 0.223 | 0.561 | 2.642 | 0.044 | 0.099 | 0.400 |
Phenol | 701 | 0.472 | 1.023 | 4.521 | 0.065 | 0.139 | 0.660 |
Pyridine | 640 | 0.435 | 1.098 | 4.689 | 0.052 | 0.094 | 0.398 |
Pyrrole | 1 179 | 0.510 | 1.091 | 4.673 | 0.079 | 0.169 | 0.766 |
Sulfides | 2 534 | 0.490 | 0.954 | 3.833 | 0.050 | 0.103 | 0.437 |
Thiols | 1 411 | 0.411 | 0.875 | 3.704 | 0.050 | 0.102 | 0.435 |
Water | 9 315 | 0.091 | 0.198 | 0.966 | 0.026 | 0.054 | 0.227 |
On the training data, we observe that the neural network model is consistently more accurate than MP2. The MAE, RMSE, and 99th percentile absolute error in the neural network’s energy predictions on our training set, 0.030 kcal mol−1, 0.078 kcal mol−1, and 0.340 kcal mol−1, respectively, are all roughly 7× to 8× lower than the corresponding MP2 error statistics. Furthermore, as shown in Table II, the model achieves error reductions of 3–14× across all of the classes of molecules in the training set, suggesting a broadly shared increase in accuracy. In contrast to MP2, which has a mean signed error on the training data of −0.15 kcal mol−1 (reflecting its tendency to overbind noncovalent complexes), the mean signed error of the SNS-MP2 model on the training data is 0.001 kcal mol−1.
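The error statistics quoted throughout this section can be computed in a few lines. The sketch below defines them explicitly; the numeric inputs are illustrative placeholders, not data from the paper.

```python
import numpy as np

def error_statistics(predicted, reference):
    """MAE, RMSE, 99th percentile absolute error, and mean signed error,
    with the signed error defined as predicted minus reference."""
    err = np.asarray(predicted) - np.asarray(reference)
    return {
        "mean_signed": err.mean(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "p99": np.percentile(np.abs(err), 99),
    }

# Illustrative values only (kcal/mol):
stats = error_statistics([1.0, -0.5, 2.0, 0.0], [1.1, -0.4, 1.7, 0.2])
```

With this sign convention, a negative mean signed error for MP2 interaction energies corresponds to overbinding (predicted interaction energies that are too attractive).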
On our training set, SNS-MP2 achieves its highest accuracy for systems containing water, an unsurprising observation for two reasons: First, these systems are already very well described by MP2 (the MP2/CBS RMSE is under 0.2 kcal mol−1 on our training set), and second, systems containing water are very highly represented in the data set. The most challenging systems for the neural network are aromatic: Predictions on systems containing pyridine, benzene, pyrrole, phenol, and imidazolium are among the most inaccurate, with MAEs close to 0.06 kcal mol−1 and RMSEs exceeding 0.11 kcal mol−1. This itself is a remarkable reduction in error—for benzene-containing systems, this is a >11× improvement compared to MP2. Much of this reduction comes from improvements on the 637 conformations in the training set of benzene interacting with another aromatic system, for which the MP2/CBS RMSE of over 2.4 kcal mol−1 is reduced by SNS-MP2 to 0.21 kcal mol−1 RMSE. In addition, the general tendency of MP2 to overbind these systems of interacting aromatics is largely corrected. Whereas the MP2 interaction energy is too attractive by an average of 1.4 kcal mol−1 on these conformations, the mean signed error of SNS-MP2 is over 20× smaller (0.07 kcal mol−1).
B. Test-set summary statistics
More relevant than the training-set error statistics is the performance on our test set, which consists of 22 643 conformations of 314 different molecular systems that were not encountered by the model during training. The most highly represented systems in the test set are the ammonia and formic acid homodimers, with 2208 and 1659 geometries, respectively. The SNS-MP2 error statistics on the test set are shown in Table IV. Over the test set, we measure an MAE of 0.038 kcal mol−1, an RMSE of 0.088 kcal mol−1, and a 99th percentile absolute error of 0.384 kcal mol−1. This is a large and definitive improvement over MP2: The MAE, RMSE, and 99th percentile absolute error all decrease by 6–7× compared to MP2. These errors are only slightly larger than the errors observed on the training set, indicating a low degree of overfitting. Consistent with our observations on the training set, MP2 overbinds these test complexes by a substantial margin (the mean signed error is −0.17 kcal mol−1), a trend almost fully corrected by SNS-MP2, which achieves a mean signed error on the test set of 9 × 10−4 kcal mol−1.
TABLE IV. Interaction-energy error statistics (kcal mol−1) on the test set, by molecule class.

Class | N | MP2/CBS MAE | MP2/CBS RMSE | MP2/CBS 99th percentile | SNS-MP2 MAE | SNS-MP2 RMSE | SNS-MP2 99th percentile |
---|---|---|---|---|---|---|---|
All | 22 643 | 0.225 | 0.593 | 2.750 | 0.038 | 0.088 | 0.383 |
Acids | 1 989 | 0.093 | 0.288 | 1.355 | 0.027 | 0.060 | 0.281 |
Alcohols | 1 274 | 0.238 | 0.548 | 2.474 | 0.049 | 0.114 | 0.482 |
Alkanes | 1 788 | 0.292 | 0.611 | 2.529 | 0.045 | 0.092 | 0.395 |
Alkenes | 2 199 | 0.268 | 0.640 | 2.918 | 0.037 | 0.083 | 0.351 |
Amides | 1 072 | 0.311 | 0.637 | 2.448 | 0.057 | 0.110 | 0.441 |
Amines | 4 338 | 0.145 | 0.424 | 2.195 | 0.023 | 0.066 | 0.338 |
Ammoniums | 480 | 0.357 | 0.647 | 2.520 | 0.073 | 0.129 | 0.568 |
Benzene | 677 | 0.665 | 1.422 | 5.418 | 0.075 | 0.184 | 0.847 |
Carboxylates | 546 | 0.229 | 0.458 | 2.088 | 0.076 | 0.153 | 0.613 |
Esters | 250 | 0.170 | 0.342 | 1.575 | 0.034 | 0.063 | 0.267 |
Ethers | 1 822 | 0.280 | 0.576 | 2.429 | 0.048 | 0.091 | 0.375 |
Guanidiniums | 310 | 0.461 | 0.866 | 3.266 | 0.074 | 0.138 | 0.556 |
Imidazolium | 1 034 | 0.318 | 0.720 | 3.171 | 0.055 | 0.092 | 0.394 |
Ketones | 446 | 0.309 | 0.654 | 2.718 | 0.050 | 0.100 | 0.490 |
Monoatomics | 1 291 | 0.318 | 0.605 | 2.480 | 0.052 | 0.089 | 0.303 |
Other | 5 501 | 0.256 | 0.705 | 3.144 | 0.048 | 0.109 | 0.476 |
Phenol | 638 | 0.425 | 1.085 | 4.575 | 0.045 | 0.086 | 0.377 |
Pyridine | 226 | 0.695 | 1.208 | 4.264 | 0.076 | 0.126 | 0.396 |
Pyrrole | 779 | 0.504 | 1.047 | 4.327 | 0.063 | 0.126 | 0.552 |
Sulfides | 2 788 | 0.498 | 1.001 | 4.199 | 0.055 | 0.111 | 0.449 |
Thiols | 491 | 0.517 | 0.933 | 3.326 | 0.063 | 0.122 | 0.566 |
Water | 9 411 | 0.101 | 0.222 | 1.077 | 0.025 | 0.057 | 0.256 |
Analyzing the test-set performance by molecule classes, we reach similar conclusions to those suggested by performance on the training set. Many of the most challenging systems are aromatic. Two of the three test-set dimer systems with the worst MAEs are acetylene-benzene and dimethylacetylene-benzene (SNS-MP2 MAE of 0.41 and 0.23 kcal mol−1, respectively), suggesting that π–π interactions remain an area of difficulty.79 Systems containing water, which account for nearly half of the test set, are among the most accurately modeled with an MAE of 0.025 kcal mol−1 (RMSE 0.057 kcal mol−1), a 4× improvement over MP2.
To what degree is the accuracy of the SNS-MP2 method approaching the intrinsic error in the benchmark interaction-energy reference values themselves? Brauer et al. estimated the uncertainty of a similar CCSD(T)/CBS methodology to be 0.05 kcal mol−1 RMSE for the S66x8 data set.20 Burns, Marshall, and Sherrill showed that the inaccuracy introduced by performing the CCSD(T) correction in the aVDZ basis set, which we do in roughly one quarter of our interaction energies, is 0.08 kcal mol−1 MAE on a composite data set of 345 interaction energies.58 Řezáč and Hobza showed that omitting post-CCSD(T) corrections introduces an error on the order of 1% on the A24 data set.15 Applying this result to our data set yields an estimated error of 0.08 kcal mol−1. Our own examination of the basis-set-related errors, drawn from comparisons to CCSD(T*)-F12a/aVQZ interaction energies on small systems, is consistent with the results above.59,60 We note that there are differences between these smaller data sets and our full benchmark data set including, but not limited to, chemical composition and choice of basis sets; nevertheless, taken together, we estimate the reliability of the benchmark values in our training and test data to be likely between 0.05 and 0.10 kcal mol−1 RMSE. Considering the SNS-MP2 test-set RMSE of 0.088 kcal mol−1, there may be some further residual signal in the data that is not accurately captured by the SNS-MP2 neural network. These data suggest, however, that particularly on the most accurately modeled systems, such as complexes containing water, SNS-MP2 may be nearly as accurate as the gold-standard CCSD(T)/CBS benchmark values themselves. Dramatic increases in the accuracy of SNS-MP2 are thus unlikely to be possible without obtaining data sets of higher quality, beyond the currently accepted gold-standard level of accuracy.
C. Typical performance across radial scans
In order to demonstrate the range of accuracy of the model on the validation set, we selected three radial scans from the 5th, 50th, and 95th percentiles of the distribution of scans by mean log likelihood. These scans are shown in Fig. 5. The first and third panels, respectively, can be interpreted as the near-worst-case and near-best-case behavior for SNS-MP2, and the second panel can be interpreted as a typical case. The log likelihood includes a contribution both from the accuracy of the energy prediction as well as the predicted error bar. In the near-worst-case radial scan, an interaction between chloromethane and imidazole, two problems are apparent: The predicted energy deviates by a substantial amount, 0.13 kcal mol−1 MAE, and the error bars are underestimated, with the reference energy falling near the edge or outside of the 95% predictive interval for some of the conformations along the scan. Nonetheless, the sign and trend of the post-MP2/CBS correction are accurately predicted. In contrast, in the near-best-case scan, an interaction between ethylamine and water, we see an improved behavior: exceptionally small errors below 0.02 kcal mol−1 MAE and reference energies that fall neatly within the error bars.
Some general trends, which apply to each of the scans in Fig. 5 and across the data set, should be highlighted. First, the model is substantially more accurate than MP2 and correctly predicts the sign and trend of the post-MP2/CBS correction in all three cases. Second, the accuracy of SNS-MP2 strongly depends on the intrinsic accuracy of MP2 itself: It performs best for systems like ethylamine-water in which MP2 is a good reference and more poorly for systems like chloromethane-imidazole in which larger corrections are required. Third, the per-conformation error bars are very responsive to the geometry of the dimer. The model is correctly able to recognize that at a long intermolecular distance, the post-MP2 correction approaches zero with extremely high confidence, while at a very short intermolecular distance, the uncertainties in the energy are much greater.
D. Calibration of error bars
In addition to assessing the quality of the point estimates, it is important to determine whether the error bars computed by the model are accurate. We anticipate that these error bars will be used by researchers for a variety of purposes, and we want them to reflect neither a systematic underconfidence (reporting a large uncertainty for calculations in which the true error is small) nor, worse, an overconfidence. In order to gauge whether the error bars are properly calibrated, we inspect the coverage of the predictive intervals. Recall that our error model is Laplacian; when the model predicts a given energy and error bar, (Ê, σ̂), it is making a prediction about the probability distribution of the unknown benchmark interaction energy. In particular, we focus on three quantiles of the error distribution: The Laplace cumulative distribution function implies that with 63.2%, 86.5%, and 95.0% probability, the SNS-MP2 energy should be, respectively, within σ̂, 2σ̂, and 3σ̂ of the gold-standard interaction energy. In fact, we find on our validation set that 65.0%, 91.9%, and 98.3% of the predictions, respectively, fall within these intervals of the benchmark CCSD(T)/CBS interaction energies. These results lead us to conclude that the model is slightly underconfident in its own accuracy but that the discrepancy is small.
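The three quantiles above follow directly from the Laplace cumulative distribution function: for a Laplace error model with scale σ̂, the probability that the true value lies within kσ̂ of the prediction is 1 − e^(−k). A two-line check reproduces the quoted coverage levels:

```python
import math

# Laplace error model: P(|E_true - E_pred| <= k * sigma_hat) = 1 - exp(-k).
coverage = {k: 1.0 - math.exp(-k) for k in (1, 2, 3)}
# coverage[1], coverage[2], coverage[3] reproduce the 63.2%, 86.5%,
# and 95.0% figures quoted in the text.
```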
To further probe this, Fig. 6 displays a histogram of the standardized errors, |ÊSNS-MP2 − ECCSD(T)/CBS|/σ̂. If our error model is accurate and the error bars, σ̂, are properly calibrated, then these values should follow a standard exponential distribution. On the other hand, if the error bars are systematically too large or too small, we would expect to be able to see this by comparing the empirical distribution of the scaled error on the test set to this expected distribution. In fact, we find that the two distributions match quite closely, suggesting that researchers employing this model should be assured that the model’s confidence in its accuracy can be trusted in downstream applications such as force field parameterization. In our test set we see, moreover, no evidence that suggests a decrease in the reliability of the error bars for larger systems. The largest systems in our test set, as measured by the number of heavy atoms, in fact exhibit somewhat higher coverage rates—the fraction of benchmark CCSD(T)/CBS values that fall within the SNS-MP2 predictive intervals—than the test data set as a whole. Within the range of system sizes examined in this study, we observe no systematic trend suggesting that researchers should have less confidence in the reliability of the SNS-MP2 error bars when studying larger, as opposed to smaller, systems.
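The calibration check itself is straightforward to simulate. The sketch below generates synthetic, perfectly calibrated Laplace errors, standardizes them by the (hypothetical) predicted error bar, and verifies that the empirical coverage matches the standard-exponential expectation of 1 − e^(−k); the σ̂ value is an arbitrary placeholder, not a figure from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
sigma_hat = 0.08                      # hypothetical predicted error bar (kcal/mol)

# Well-calibrated case: errors actually drawn from Laplace(0, sigma_hat).
errors = rng.laplace(loc=0.0, scale=sigma_hat, size=n)
z = np.abs(errors) / sigma_hat        # standardized errors

# If calibrated, z follows a standard exponential distribution, so the
# empirical coverage at k * sigma_hat should approach 1 - exp(-k).
empirical = {k: float(np.mean(z <= k)) for k in (1, 2, 3)}
```

Systematically over- or under-sized error bars would show up here as empirical coverages that deviate from 63.2%, 86.5%, and 95.0%, which is exactly the comparison made against the histogram in Fig. 6.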
VI. SUPPLEMENTAL TEST SETS
In addition to assessing the performance of SNS-MP2 on our primary test set, after finalizing the model, we evaluated it on two supplemental test sets for which benchmark-quality CCSD(T)/CBS reference energies have been publicly reported: Hobza’s S66x8, a set of 66 dimer potential energy curves,19,20 and a set of 10 dimer potential energy curves calculated by Taylor et al.,21 which we refer to as T80.
A. Validation on S66x8
We compared our neural network’s predictions to the S66x8 revised benchmark energies computed by Brauer et al., which combined CBS-extrapolated MP2-F12 energies with CCSD(Tcsc)-F12b/cc-pVDZ-F12 high-level corrections.20 Based on their assessments of yet higher-level calculations for smaller data sets, Brauer et al. estimated these benchmark values to be reliable to about 0.05 kcal mol−1 RMSE.
The accuracy of our neural network in comparison to other commonly used methods on S66x8 is shown in Fig. 7. It achieves an MAE of 0.077 kcal mol−1, an RMSE of 0.14 kcal mol−1, and a maximum absolute error of 0.69 kcal mol−1. This maximum error occurs for the benzene-uracil system. The S66x8 data set is grouped into four subsets by the type of interaction: H-bonds, π-stacks, London, and mixed. On these subsets, our model achieves MAE values of 0.14, 0.11, 0.05, and 0.06 kcal mol−1, respectively.
Across the data set, SNS-MP2 is more accurate by MAE than all of the alternative methods with a similar computational cost. The only methods benchmarked by Brauer that yielded more accurate test-set predictions than SNS-MP2 on this data set were SCS(MI)-CCSD(F12c), CCSD(T)(F12c), and CCSD(T*)(F12c), each of which is significantly more computationally costly than SNS-MP2.
B. Validation on T80
Figure 8 shows the accuracy of our model in comparison to a number of popular methods on each of the 10 chemical systems in T80. The overall MAE of our model is 0.062 kcal mol−1 and the RMSE is 0.12 kcal mol−1, which are similar to the values measured on the S66x8 data set. The maximum error is 0.37 kcal mol−1. The reference CCSD(T)/CBS energies fall within the SNS-MP2 95% predictive interval for 67 out of the 80 conformations in the T80 test set, giving an observed coverage of 84%. Our model is more accurate than any DFT method we are aware of that has been benchmarked on this data set, including the 12 methods benchmarked by Taylor et al.21 and 14 more density functionals subsequently tested on this data set by Mardirossian and Head-Gordon.61 The most accurate of these 26 density functionals is Mardirossian and Head-Gordon’s ωB97M-V, which achieves an MAE of 0.090 kcal mol−1 and an RMSE of 0.151 kcal mol−1: By these error metrics, SNS-MP2 achieves a 20%–30% improvement over this state-of-the-art density functional.
VII. DISCUSSION
A. Representation of chemical systems for machine learning algorithms
A persistent challenge in the application of machine learning to problems in quantum chemistry has been the representation of molecules and their conformations in a form that is suitable for conventional machine learning algorithms, which usually require fixed-length vectors as input. For cheminformatics and bioinformatics tasks such as bioactivity, toxicity, and binding-affinity prediction, a variety of molecular descriptors are commonly used.62 These descriptors are generally based solely on the molecular graph specifying all covalent bonds (e.g., as encoded by SMILES strings), which makes them unsuitable for use in modeling conformationally dependent properties such as atomization energies and interaction energies. No downstream machine learning algorithm is capable of describing the conformational dependence of a property when supplied only with a representation of the data that is independent of conformation.
Although chemical descriptors that depend only on covalent connectivity are thus too coarse for the purpose of modeling interaction energies, the complete set of atomic positions and nuclear charges is generally too fine a representation, possessing its own set of challenges. We know, for example, from elementary physical principles that the interaction energy should obey a number of symmetries, invariances, and asymptotic properties. These properties include invariance of the interaction energy with respect to translation, rotation, and index ordering of the atoms, as well as the asymptotic decay of the interaction energy to zero as the interfragment distance approaches infinity. Naïve input of the Cartesian coordinates of the nuclei into a conventional machine learning algorithm may yield a model that approximately obeys these symmetries, at least with a large enough data set, but the invariances will not be satisfied exactly.
For this reason, significant effort has been directed at the design of representations of chemical systems that filter out irrelevant degrees of freedom such as rotational and translational states. Rupp et al., for example, in building a system to predict the atomization energies of small organic molecules, represented each conformation using its so-called “Coulomb matrix,” a square matrix with off-diagonal elements equal to the internuclear Coulomb repulsion between pairs of nuclei. By restricting the downstream machine learning model (first kernel ridge regression and later neural networks) to depend only on the distance between pairs of atoms, invariance with respect to rotation and translation of the coordinate system is ensured, although invariance with respect to permutation of the atomic ordering is not guaranteed. More recently, a number of neural network architectures have been designed in which molecular structures are represented through a vector of nuclear charges and a matrix of interatomic distances, and the total energy is constructed as a sum over atomic contributions using shared weights to ensure permutational invariance.37,63
The representations of chemical systems discussed above require the application of only a few simple arithmetic operations to the Cartesian coordinates. This simplicity is critical for their use in methods whose intent is to predict observables, such as atomization energies, much more rapidly than would be possible with any traditional DFT- or wavefunction-based quantum chemical calculations. In contrast, since SNS-MP2 has been built to augment, rather than circumvent, traditional quantum chemical calculations (an approach referred to as Δ-machine learning by Ramakrishnan et al.64), its use of a chemical-system representation derived from quantum chemical calculations incurs only a small additional runtime cost.
There are significant advantages, we believe, to using a quantum chemical, as opposed to purely geometric, representation of the chemical system. The quantum chemical features derived from HF, MP2, and SAPT0 calculations on the interacting fragments implicitly encode all of the appropriate invariances and asymptotic behaviors, which the neural network component of the model would otherwise have to learn. These features may provide a richer and more relevant representation of the nature of the interaction between the two fragments than the raw geometry would provide, enabling the neural network to easily identify the common electronic features between geometrically and chemically distinct systems.
B. Performance
While less computationally demanding than CCSD(T), the SNS-MP2 method still incurs a substantial cost. The key factors that contribute to the computational cost of SNS-MP2 are the large number of SCF calculations required, the need for calculations in the aVQZ basis set, and the need for the MP2 reduced density matrix. Ten separate SCF calculations are required to compute all of the SNS-MP2 input features, when using aVTZ as the small basis set and aVQZ as the large basis set. These include one for each monomer in the monomer-centered aVTZ and aVQZ basis sets (in order to generate both the SAPT0 and MP2 electrostatic interaction-energy features) and one for the dimer and each monomer in the dimer-centered aVTZ and aVQZ basis sets (in order to generate the counterpoise-corrected DF-MP2 interaction energies). The costs of computing the aVQZ energies in the dimer-centered basis set and the DF-MP2 density matrix are the two largest components in the overall runtime. For large systems, especially when the fragments are of near-equal size, the aVQZ energy calculations in the dimer basis dominate, and thus the full SNS-MP2 computational cost is only negligibly larger than that of a counterpoise-corrected aVQZ DF-MP2 interaction-energy calculation. For smaller systems, however, the SNS-MP2 calculation is approximately 2–3× the cost of a DF-MP2/aVQZ interaction-energy calculation.
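The bookkeeping in the paragraph above can be checked by simply enumerating the calculations it describes (a sketch of the accounting only, not of how any SCF calculation is performed):

```python
from itertools import product

# SCF calculations required for the SNS-MP2 input features, per the text:
# each monomer in its monomer-centered basis (SAPT0 / MP2 electrostatic
# features), plus the dimer and each monomer in the dimer-centered basis
# (counterpoise-corrected DF-MP2 interaction energies), in both the small
# (aVTZ) and large (aVQZ) basis sets.
monomer_centered = list(product(["monomer A", "monomer B"],
                                ["aVTZ", "aVQZ"]))
dimer_centered = list(product(["dimer", "monomer A", "monomer B"],
                              ["aVTZ", "aVQZ"]))
scf_calculations = monomer_centered + dimer_centered
```

Four monomer-centered plus six dimer-centered calculations account for the ten SCF calculations stated in the text.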
C. Limitations and generalizations
Although SNS-MP2 improves on the accuracy of MP2, it does not improve on the computational cost. In fact, because the SNS-MP2 input features require the calculation of the MP2/aVQZ density matrix, it is more costly to compute the SNS-MP2 interaction energy than the MP2/aVQZ interaction energy. One avenue for future development may thus be to train variants of the method using different input features such that the computational requirements are reduced. A large performance increase could be easily accomplished, for example, by dropping the use of features computed with the aVQZ basis set or replacing these features with similar inputs calculated in smaller basis sets.
A more fundamental limitation of the method comes from its use of input features that are defined only in terms of interaction-energy calculations. An alternate set of input features defined instead in terms of only a single system, as opposed to two interacting subsystems, would let us train the model to predict total energies instead of interaction energies, which may broaden the applicability of the method.
Although we observe promising accuracy across the range of system sizes studied, SNS-MP2 is not size extensive.65 The lack of size extensivity is due to variation in the same-spin and opposite-spin scaling coefficients as a function of system size, and the method provides no theoretical guarantee as to how these values behave asymptotically as the number of electrons grows. We thus anticipate a reduction in accuracy when the method is applied to systems that are very dissimilar from those in the training set, for instance, due to a large size or significantly different chemistry, such as that found in transition metal complexes.
VIII. CONCLUSIONS
We have introduced SNS-MP2, a hybrid machine learning and quantum chemical method for computing the interaction energies of noncovalent complexes. The method combines MP2—an approximate and nonparametric quantum chemical method that captures the essential physics of intermolecular interactions—with modern big-data machine learning techniques that, when trained on a large data set of expensive and accurate reference values covering a broad subset of chemical space, can identify and correct the systematic errors inherent to the MP2 method. While some previously developed semi-empirical MP2-based methods have shown promise using two global spin-component coefficients to correct the MP2 correlation energy,16,26 subsequent work showed that improved accuracy can be achieved by using a different choice of spin-component coefficients for different categories of intermolecular interactions.24,25,30,31 With SNS-MP2, we take this observation one step further: The method uses a collection of per-conformation HF-, MP2-, and SAPT0-based quantum chemical features to derive, by way of a deep neural network, a specialized set of spin-component scaling coefficients for the particular chemical system and conformation under study.
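The spin-component-scaling form underlying this family of methods is simple to state. In the sketch below, the function applies coefficients to the opposite-spin and same-spin MP2 correlation contributions on top of the Hartree-Fock interaction energy; the global coefficients shown (c_os = 6/5, c_ss = 1/3) are those of conventional SCS-MP2, whereas in SNS-MP2 the neural network supplies conformation-specific values. The numeric inputs are illustrative placeholders, not data from the paper.

```python
def scaled_mp2_interaction_energy(e_hf, e_os, e_ss, c_os, c_ss):
    """Spin-component-scaled MP2 interaction energy:
    E = E_HF + c_os * E_os + c_ss * E_ss, where E_os and E_ss are the
    opposite-spin and same-spin MP2 correlation contributions."""
    return e_hf + c_os * e_os + c_ss * e_ss

# Conventional SCS-MP2 uses fixed global coefficients; SNS-MP2 instead
# predicts (c_os, c_ss) per conformation from quantum chemical features.
# Illustrative energies only (kcal/mol):
e_scs = scaled_mp2_interaction_energy(-1.20, -2.50, -0.90,
                                      c_os=6 / 5, c_ss=1 / 3)
```

Setting c_os = c_ss = 1 recovers the unscaled MP2 interaction energy, which makes explicit how SNS-MP2 reduces to a reweighting of terms already present in the MP2 result.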
SNS-MP2 has been trained on a data set of over 200 000 CBS-extrapolated CCSD(T) interaction energies to an unprecedented degree of accuracy. It predicts the interaction energies of unseen test complexes with an MAE of less than 0.05 kcal mol−1 and an RMSE of less than 0.10 kcal mol−1. This level of accuracy represents a 6- to 7-fold improvement over MP2 and approaches the intrinsic reliability of the CCSD(T)/CBS gold-standard methodology itself, especially for complexes containing water. In tests on two supplemental data sets for which gold-standard CCSD(T)/CBS reference energies have been publicly reported and a large number of alternative methods have been benchmarked, the accuracy of SNS-MP2 exceeds that of all previously reported density functionals and all wavefunction-based methods with O(N5) or lower scaling. Moreover, in addition to providing a point estimate of the interaction energy, SNS-MP2 provides a reliable conformation-specific predictive interval, or an error bar, on each interaction energy, a feature not available, to the best of our knowledge, from any alternative method. Overall, our results suggest that deeply integrating modern machine learning techniques within the mathematical frameworks given by traditional approximate ab initio electronic-structure methods may be a promising strategy for the development of moderate-cost, extremely accurate quantum chemical methods.
ACKNOWLEDGMENTS
The authors thank Panos Angelikopoulos and Brent Gregersen for helpful discussions and a critical reading of the manuscript, Paul Maragakis and David Borhani for helpful discussions, and Berkman Frank for editorial assistance.
This study was conducted and funded internally by D. E. Shaw Research, of which D.E.S. is the sole beneficial owner and Chief Scientist and with which all the authors are affiliated.
APPENDIX A: BASIS SETS
This work uses a combination of four orbital basis sets, labeled aVQZ, aVTZ, aVTZ(d/p), and aVDZ, constructed for consistent and accurate behavior across the periodic table. The elements used in the calculations in this work are H, He, Li, C, N, O, F, Ne, Na, P, S, Cl, Ar, K, Br, Kr, I, and Xe. The distribution of system sizes present in our overall data set, subdivided by the basis set used to perform the coupled-cluster high-level correlation correction, is shown in Fig. 3.
1. Hydrogen through potassium
For first-row and non-metal second-row elements (H, He, and B–Ne), we use the standard Dunning augmented correlation-consistent basis sets, aug-cc-pVXZ.66 For third-row non-metals (P–Ar), we use the revised augmented correlation-consistent basis sets, aug-cc-pV(X+d)Z, whose tight d functions yield improved convergence over the aug-cc-pVXZ series.67 For alkali metals (Li, Na, and K), we use Dunning correlation-consistent, polarized, weighted core-valence basis sets, cc-pwCVTZ.68 Furthermore, many calculations were performed with the aVTZ(d/p) basis, a modified version of the aVTZ basis in which d functions on hydrogen and f functions on heavy atoms have been omitted, following the approach of Wilson and co-workers and the Truhlar calendar basis sets.69–72
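The per-element rules above can be summarized in a small lookup. The sketch below is a hypothetical helper of our own devising (the function and set names are not from this work); it simply encodes the assignments stated in the text for elements H through K, with X the cardinal number of the basis:

```python
# Element groupings as stated in Appendix A (illustrative helper, our naming).
FIRST_SECOND_ROW = {"H", "He", "B", "C", "N", "O", "F", "Ne"}
THIRD_ROW_NONMETALS = {"P", "S", "Cl", "Ar"}
ALKALI_METALS = {"Li", "Na", "K"}

def orbital_basis(element: str, X: str = "T") -> str:
    """Return the orbital basis-set family used for a light element (H-K)."""
    if element in FIRST_SECOND_ROW:
        return f"aug-cc-pV{X}Z"        # standard Dunning augmented sets
    if element in THIRD_ROW_NONMETALS:
        return f"aug-cc-pV({X}+d)Z"    # revised sets with tight d functions
    if element in ALKALI_METALS:
        return "cc-pwCVTZ"             # weighted core-valence sets
    raise ValueError(f"no rule for element {element!r} in this sketch")
```

For example, `orbital_basis("S", "Q")` yields the tight-d set `aug-cc-pV(Q+d)Z`, while the alkali metals always receive cc-pwCVTZ regardless of the requested cardinal number.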
For density-fit calculations, the canonical Weigend aug-cc-pVXZ MP2FIT and JKFIT sets were used as auxiliary basis sets.73 For He, Li, Na, and K, no aug-cc-pVXZ-JKFIT basis has been specifically developed; for these elements, we used the larger def2-QZVPP-JKFIT auxiliary basis set.74
2. Bromine through xenon
To correct for the neglect of relativistic effects, effective core potentials (ECPs) from the Stuttgart/Cologne group were used for systems containing Br, Kr, I, and Xe in our MP2 and CCSD(T) calculations.75,76 For Br and Kr, and for I and Xe, 10 and 28 core electrons were replaced by the ECP10MDF and ECP28MDF pseudopotentials, respectively. For MP2 and CCSD(T) calculations with these elements, the aug-cc-pwCVXZ-PP orbital basis sets and def2-AQZVPP-JKFIT and aug-cc-pwCVXZ-PP-MP2FIT auxiliary basis sets were used. Because of the lack of support for ECPs in the Psi4 1.0 software package, ECPs were not used with SAPT0, and the aug-cc-pwCVXZ orbital basis set and the def2-QZVPP-JKFIT and aug-cc-pVXZ-MP2FIT auxiliary basis sets were used for these elements in SAPT0 calculations.
APPENDIX B: CORE ORBITALS
In MP2 and CCSD(T) interaction energy calculations, the choice of which core (spatial) orbitals to freeze is a complex problem, as common strategies can yield inconsistent behavior between the fragment and dimer calculations. We analyzed the orbital energies of the relevant atoms to define a set of cutoffs that both match chemical intuition and provide sufficient energy gaps between orbitals so that environmental effects do not reorder the orbitals with respect to the cutoffs. Based on this work, we elected to use 0 frozen orbitals for elements H–Li, 1 frozen orbital for elements C–Na, 5 frozen orbitals for elements P–K, and 4 frozen orbitals for Br, Kr, I, and Xe, where an ECP is present. Note that this choice differs from the default behavior implemented in many quantum chemistry software packages—using the number of orbitals in the nearest previous noble gas atom—for Li, Na, and K atoms.
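The frozen-core counts chosen above amount to a simple per-element table. The following sketch (a hypothetical helper with our own naming, restricted to the elements actually used in this work) encodes those counts and sums them over a fragment:

```python
# Frozen spatial-orbital counts per element, as chosen in Appendix B.
FROZEN_CORE = {}
FROZEN_CORE.update({el: 0 for el in ("H", "He", "Li")})
FROZEN_CORE.update({el: 1 for el in ("C", "N", "O", "F", "Ne", "Na")})
FROZEN_CORE.update({el: 5 for el in ("P", "S", "Cl", "Ar", "K")})
FROZEN_CORE.update({el: 4 for el in ("Br", "Kr", "I", "Xe")})  # ECP elements

def n_frozen(elements):
    """Total number of frozen spatial orbitals for a list of element symbols."""
    return sum(FROZEN_CORE[el] for el in elements)
```

Note how Li and Na are treated: the noble-gas-core default of many packages would freeze 0 orbitals for Li but, e.g., 5 for K; here Li freezes none, Na freezes one, and K freezes five, consistent with the orbital-energy cutoffs described above.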