Conformational polymorphs of organic molecular crystals represent a challenging test for quantum chemistry because they require careful balancing of the intra- and intermolecular interactions. This study examines 54 molecular conformations from 20 sets of conformational polymorphs, along with the relative lattice energies and 173 dimer interactions taken from six of the polymorph sets. These systems are studied with a variety of van der Waals-inclusive density functionals theory models; dispersion-corrected spin-component-scaled second-order Møller–Plesset perturbation theory (SCS-MP2D); and domain local pair natural orbital coupled cluster singles, doubles, and perturbative triples [DLPNO-CCSD(T)]. We investigate how delocalization error in conventional density functionals impacts monomer conformational energies, systematic errors in the intermolecular interactions, and the nature of error cancellation that occurs in the overall crystal. The density functionals B86bPBE-XDM, PBE-D4, PBE-MBD, PBE0-D4, and PBE0-MBD are found to exhibit sizable one-body and two-body errors vs DLPNO-CCSD(T) benchmarks, and the level of success in predicting the relative polymorph energies relies heavily on error cancellation between different types of intermolecular interactions or between intra- and intermolecular interactions. The SCS-MP2D and, to a lesser extent, ωB97M-V models exhibit smaller errors and rely less on error cancellation. Implications for crystal structure prediction of flexible compounds are discussed. Finally, the one-body and two-body DLPNO-CCSD(T) energies taken from these conformational polymorphs establish the CP1b and CP2b benchmark datasets that could be useful for testing quantum chemistry models in challenging real-world systems with complex interplay between intra- and intermolecular interactions, a number of which are significantly impacted by delocalization error.
Organic crystal polymorphism plays a key role in determining crystal properties, such as pharmaceutical solubility and bioavailability. Experimental solid form screening is employed to search for polymorphs, but knowing when all important polymorphs have been found can be difficult. The ability to predict organic crystal structures from first principles has improved substantially over the past two decades,1–7 and crystal structure prediction is increasingly used as complement to experimental polymorph screens to ensure that all important forms have been realized experimentally. A key driver in this progress has been the shift from modeling crystals with classical force fields to more accurate dispersion-corrected density functional theory (DFT) models.8–17 DFT-based crystal structure prediction has successfully predicted crystal structures for many pharmaceutical systems, for example, sometimes in concert with or even before their experimental discovery.18–23
Conformational polymorphs are systems where changes in the intramolecular conformation accommodate different intermolecular packing motifs.24 They are particularly challenging for computational models because of the difficulties in balancing the interplay between intra- and intermolecular interactions. Polymorph energy differences are very small in general—usually less than 10 kJ/mol and often only a few kJ/mol.25,26 While the energy differences for conformational polymorphs tend to be modestly larger than those for non-conformational polymorphs, achieving the necessary energy discrimination remains challenging. The inadequacies of classical potentials for describing conformational energies were recognized many years ago. Indeed, quantum chemistry methods were first incorporated into force-field-based crystal structure prediction protocols to improve the modeling of conformational energies.27,28 Since then, fully periodic DFT treatments have become routine in crystal structure prediction, especially for the late/final refinement and ranking stage(s).8,15,17
However, commonly used generalized gradient approximation (GGA) functionals, such as Perdew–Burke–Ernzerhof (PBE)29 or B86bPBE,29,30 exhibit delocalization error, which biases them against localized densities.31,32 Delocalization error can artificially stabilize structures with more extended π conjugation, for example.33,34 In molecular crystal conformational polymorphs, this behavior can spuriously stabilize selected conformations, as occurs for the prolific polymorph former ROY (so-named for its red, orange, and yellow crystals),35–38 ortho-acetamidobenzamide,37 molecule X14,37 from the third blind test of crystal structure prediction,3 the pharmaceuticals axitinib and galunisertib,37 and rubrene derivatives.39 Delocalization error has also been blamed for the spurious proton transfer in a set of acid-base co-crystals40 and poor lattice energies for halogen-bonded crystals.41
By mixing in some fraction of exact/Hartree–Fock (HF) exchange, hybrid density functionals, such as PBE0,42 should reduce (but not necessarily eliminate) delocalization error. Hybrid functionals improve the description of halogen-bonded crystals41 and reduce the erroneous proton transfer in the aforementioned co-crystals.40 They also generally lead to improved polymorph energy rankings.17,43–45 At the same time, hybrid functionals do not cure all problems—they still describe conformational energies poorly in systems when the delocalization error is significant, for example.37,39
Recognizing the problematic conformational energies, two of the present authors recently examined a “monomer correction” approach37 in which the problematic intramolecular (one-body) DFT conformational energies obtained from conventional GGA or hybrid functionals are replaced with more accurate ones obtained from correlated wavefunction methods or more advanced density functionals. For the conformational polymorph cases mentioned above, the monomer correction considerably improves the agreement of the predicted polymorph stabilities with experiment.37 Because the correction is calculated with gas-phase monomers (in their crystalline conformations), the correction is straightforward to evaluate and computationally inexpensive compared to periodic DFT geometry optimizations. It can be applied to entire crystal energy landscapes readily.37,38
These recent studies raise a number of questions. How widespread are large conformational energy errors in conformational polymorphs when using conventional density functionals? Monomer conformational energy corrections can potentially address these intramolecular errors, but that approach is predicated on the density functionals describing the intermolecular interactions accurately. How accurate is the DFT description of the intermolecular interactions? How much do hybrid functionals improve the treatment of intra- and intermolecular interactions in conformational polymorphs compared to GGA functionals? How readily can these improvements be incorporated into standard crystal structure prediction workflows? What insights can we gain into the successes and failures of crystal structure prediction energy ranking models based on benchmarking of the intra- and intermolecular interactions?
To address these questions, this study examines a number of conformational polymorphs using a mixture of density functional and correlated wavefunction models. It first examines the conformational energies across 20 sets of conformational polymorphs containing a total of 54 symmetrically unique monomer conformations. To assess the accuracy of the intermolecular interactions, the conformational polymorphs of the six smallest molecules are subsequently analyzed in much greater detail. This in-depth analysis includes benchmarking several different electronic structure methods for describing 173 short-range pairwise (two-body) intermolecular interactions extracted from the six crystals, examining relative polymorph lattice energies computed with three different GGA and hybrid density functionals, and performing one-body (monomer) and two-body (dimer) corrections to those lattice energies.
The models tested for the one-body and two-body energies include the “conventional” GGA and hybrid density functionals B86bPBE with the exchange-hole dipole moment (XDM) dispersion correction,46 PBE with either Grimme’s D4 dispersion correction47 or the many-body dispersion (MBD) treatment,48–50 and PBE0 with either D4 or MBD. These particular density functionals and dispersion treatments are frequently used in crystal structure prediction work with good success.10,12–14,17,44,45 Two more advanced models are also studied: the high-quality51 range-separated hybrid meta GGA ωB97M-V functional52 and the recently developed spin-component-scaled dispersion-corrected second-order Møller–Plesset perturbation theory (SCS-MP2D) model.53 SCS-MP2D competes well with state-of-the-art density functionals in terms of cost and accuracy.53 All of these models are benchmarked against domain local pair natural orbital coupled cluster singles and doubles with perturbative triples [DLPNO-CCSD(T)]54,55 energies. The simultaneous use of high-quality coupled cluster benchmarks and the emphasis on challenging conformational polymorph systems differentiates the present study from other recent studies56–58 that investigated the quality of DFT pairwise interactions in crystals.
In addition to the insights into the performance of DFT for crystal structure modeling provided by the current study, the conformational polymorph one-body (CP1b) and two-body (CP2b) DLPNO-CCSD(T) datasets generated here establish real-world benchmarks for testing the performance of electronic structure methods. These sets complement existing benchmark sets for molecular crystals12,59–61 and conformational energies/non-covalent interactions in general.62–66 Datasets such as CP1b and CP2b, which involve larger, more “realistic” molecules in non-equilibrium geometries, are particularly important for validating quantum chemistry models,62 especially for models parameterized against more traditional synthetic datasets. The substantial impacts of delocalization error in a number of the systems in CP1b and CP2b (as will be demonstrated below) further distinguish these benchmark datasets from most other existing sets.
II. COMPUTATIONAL METHODS
A. Test systems
A set of 20 conformationally flexible, polymorphic molecules was selected, as shown in Fig. 1. These 20 molecules include chemically interesting pharmaceuticals, sugars, color polymorphs,67 etc., that are known to exhibit large conformational energy differences between polymorphs24 and have unit cells that are amenable to geometry optimization with periodic DFT. Fifteen of these systems have two polymorphs, while the remaining five systems have three for a total of 45 unique crystals. We note that KIJBOX and QIJZOY are a pair of tautomeric as well as conformational polymorphs. Initial crystal structures for each polymorph were obtained from the Cambridge Structure Database (CSD) reference codes listed in Table I.
|RefCode .||Z′ .||RefCode .||Z′ .||RefCode .||Z′ .|
|RefCode .||Z′ .||RefCode .||Z′ .||RefCode .||Z′ .|
The atomic positions and lattice vectors for each crystal structure were fully relaxed with periodic density functional theory (computational details provided below). To study the intramolecular conformational energies, the geometry of every molecule in the asymmetric unit was then extracted from the optimized crystals for a total of 54 symmetrically unique monomers. These monomer structures and their DLPNO-CCSD(T) conformational energies constitute the CP1b benchmark dataset.
For six of the polymorph sets (ACBNZA, GLUCIT, MCHTEP, MNIAAN, UFAGIS, and ZEYBIO), the pairwise intermolecular interactions and relative crystal lattice energies were also studied in detail. These systems were chosen for the pragmatic computational reason that they are the smallest molecules in the set (23–28 atoms) and because they include four species whose monomers are strongly affected by delocalization error as well as two species which are not, which makes for an interesting comparison in the dimer interaction energies. Molecular dimers were extracted for all pairs involving one molecule in the asymmetric unit and any other molecule lying within 4 Å of that molecule (as defined by the shortest intermolecular atom–atom distance). All 173 symmetrically unique dimers and their DLPNO-CCSD(T) interaction energies were retained to establish the CP2b benchmark dataset. The CP2b dataset is arguably less evenly balanced than many synthetic datasets in the sense that the number of dimers for each species included depends on the number of polymorphs for that species and how many molecules each polymorph contains in its asymmetric unit. On the other hand, its value lies in how it samples the interplay of intra- and intermolecular interactions across a variety of dimer geometries. Moreover, while the monomer and dimer structures were extracted from equilibrium crystal geometries, the individual CP1b and CP2b structures are not necessarily minima on the monomer/dimer potential energy surfaces. In these senses, these CP datasets are similar in spirit to ones such as the SSI (amino acid side chain interactions taken from proteins)66 or 3B-69 (trimers extracted from molecular crystals).61 The complete set of structures, symmetry scaling factors, and benchmark energies for CP1b and CP2b are provided in the supplementary material.
B. Computational details
Periodic DFT simulations were performed with the Vienna Ab initio Simulation Package (VASP 5.4.4)103–106 using the PBE functional with normal projector augmented wave (PAW) pseudopotentials.107,108 Initial crystal structures were taken from the CSD reference codes listed above and fully optimized (unit cells and atomic positions) using PBE and the Grimme D2 dispersion correction.109 Next, single-point energy calculations were performed on the optimized crystal structures with periodic PBE-MBD and PBE0-MBD.48–50 All VASP DFT calculations were performed using a 520 eV cut-off energy for the planewaves, while the Brillouin zone was sampled via a Monkhorst–Pack k-point grid with spacing of at least 0.03 Å−1. Additional single-point energies were computed for these structures with B86bPBE-XDM using Quantum Espresso version 22.214.171.124 These calculations employed PAW potentials, a 50 Ry planewave energy cutoff, and reciprocal space Monkhorst–Pack k-point grid densities of 0.04 Å−1 or better.
Gas-phase monomer and dimer interaction energy calculations were performed using a variety of methods. First, benchmark DLPNO-CCSD(T) calculations55 with non-iterative semi-canonical triples [sometimes referred to as DLPNO-CCSD(T0)111] were run for each monomer and dimer using Orca 126.96.36.199 The monomer and dimer calculations were performed in the aug-cc-pVTZ and aug-cc-pVDZ basis sets,113 respectively, all using the “TightPNO” parameter settings (TCutPNO = 1 × 10−7, TCutDO = 5 × 10−3, and TCutPairs = 1 × 10−5) together with a tighter TCutMKN parameter of 1 × 10−4. The DLPNO-CCSD(T) correlation energies were extrapolated to the complete basis set (CBS) limit using the focal point approach,114 which combines DLPNO-CCSD(T) in a moderate basis set with second-order Møller–Plesset perturbation theory (MP2) results that have been extrapolated to the CBS limit using the aug-cc-pVTZ and aug-cc-pVQZ basis sets. Various convergence tests described in the supplementary material suggest that the DLPNO-CCSD(T) benchmark conformational and dimer interaction energies are probably converged to within a few kJ/mol at worst and usually considerably better.
For comparison with these benchmarks, gas-phase monomer and dimer energies were computed using the van der Waals-inclusive GGA functionals B86bPBE-XDM, PBE-D4, and PBE-MBD; hybrid functionals PBE0-D4 and PBE0-MBD; and range-separated hybrid meta-GGA ωB97M-V. The first five “conventional” functionals were chosen because of their widespread use in organic crystal structure prediction applications. The range-separated hybrid meta-GGA ωB97M-V functional was chosen as an example of more advanced functionals that are used in molecular applications and that frequently perform considerably better than conventional functionals.51 However, many other high-quality meta-GGA, hybrid, and double-hybrid functionals exist64,115,116 that have not been tested here, some of which may perform better than those examined here. Finally, energies were also computed with spin-component-scaled dispersion-corrected MP2 (SCS-MP2D), a correlated wavefunction method that competes well with state-of-the-art density functionals in terms of cost and accuracy.53
The B86bPBE-XDM monomer and dimer energies were computed using Quantum Espresso and the same settings as for the crystals described above, albeit with 20 Å vacuum spacing between the nearest atoms applied along each Cartesian direction. All other DFT calculations employ the def2-QZVP basis,117 and the dimer interaction energies were counterpoise corrected. Test calculations of dimer interaction energies in the larger aug-cc-pVQZ basis differed by 0.1 kJ/mol or less. The MP2 correlation energies entering SCS-MP2D were extrapolated to the complete basis set limit118 from calculations in the aug-cc-pVTZ and aug-cc-pVQZ basis sets and were paired with Hartree–Fock/aug-cc-pVQZ energies. The SCS-MP2D dimer calculations were also counterpoise-corrected. PBE-MBD and PBE0-MBD were computed using Q-Chem 5.4.1;119 all other MP2 and DFT calculations were done with PSI4 version 1.3.2120 and utilized density fitting. SCS-MP2D energies were obtained from MP2 calculations via the MP2D library.121
The study here also involves applying one-body conformational energy and two-body intermolecular corrections to the periodic DFT energies. One-body-corrected crystal energies37 were computed by subtracting out the gas-phase DFT energy of each monomer in the unit cell and replacing it with the corresponding DLPNO-CCSD(T) one,
The monomer sum runs over only symmetrically unique monomers in the unit cell, and the symmetry factor σi scales those energies based on the number of symmetrically equivalent monomers present in the cell. Because the final energies are reported per molecule (i.e., unit cell energies divided by the number of molecules in the cell), the symmetry factors are less than or equal to 1 in practice.
One-body- and two-body-corrected DFT crystal energies replace both the monomer and short-ranged DFT dimer interaction energies ΔEdim,j with DLPNO-CCSD(T) ones,
where again both sums run over the symmetrically unique monomers/dimers whose energies are scaled by the appropriate symmetry factors σj. The monomer conformational and dimer interaction energies are computed in the gas phase using the geometries extracted from the optimized crystals. The interaction energy for the pair of molecules A and B is computed as ΔEdim = EAB − EA − EB (with appropriate ghost basis functions on the monomers for counterpoise-corrected energies).
Equation (2) is comparable to various fragment approaches,122,123 including the hybrid many-body interaction (HMBI) approach.58,124 The key differences vs earlier HMBI studies lie in treating fewer two-body dimer terms with the higher-level method here (i.e., applying a shorter 4 Å two-body cutoff instead of the more typical ∼6–10 Å HMBI one) and using periodic DFT for the long-range and many-body terms instead of periodic HF or a force field. A few other studies have performed similar fragment calculations in simple crystals with DFT providing the long-range and many-body terms.125–127
III. RESULTS AND DISCUSSION
A. Intramolecular conformational energies
We begin by examining the gas-phase conformational energies for the 54 symmetrically unique monomers using geometries extracted from the 45 geometry optimized crystals (CP1b dataset). Conformational energies are computed relative to the most stable conformation of that species found in the set with DLPNO-CCSD(T). Error statistics are computed from the 34 conformations that remain after excluding these 20 reference conformations. The gray kernel density estimate plots in Fig. 2 show the error distributions for the various DFT functionals and SCS-MP2D compared to the DLPNO-CCSD(T) benchmarks across all systems. The conformational energy errors generally follow the expected trend: GGA functionals B86bPBE and PBE exhibit the largest root-mean square (rms) errors in the range of 4.8–5.6 kJ/mol. The hybrid PBE0 functional reduces the rms errors from the GGAs by ∼20% to 3.9–4.4 kJ/mol, depending on the dispersion treatment. Finally, the range-separated hybrid meta-GGA functionals ωB97M-V and SCS-MP2D perform much better, with rms errors of only 2.3 and 1.9 kJ/mol relative to DLPNO-CCSD(T), respectively.
A closer inspection of the conformational energies reveals that most models perform well for most of the conformational energies here. The yellow box-and-whisker plots in Fig. 2 represent the errors for the 14 polymorph sets for which the B86bPBE-XDM errors are less than 5 kJ/mol. The central line represents the median error, the boxes contain 50% of the data, and the whiskers capture the largest errors. For these 14 polymorph sets, the DFT rms errors range 1.7–2.1 kJ/mol. The box plots for the conventional density functionals are also quite similar. ωB97M-V exhibits a generally tighter distribution than the other functionals, albeit with larger errors for CELBEA, QUBPAF, and YIGPIO conformations that give it an overall rms error of 2.5 kJ/mol. SCS-MP2D exhibits a very tight error distribution with an rms error of only 1.2 kJ/mol.
The large conformational energy errors that differentiate how the various models perform arise primarily from the other six polymorph sets: MCHTEP, MNIAAN, KIJBOX/QIJZOY, UFAGIS, ACBNZA, and MIQKOM. Representative conformations of these six species are shown in Fig. 3, and the individual conformational errors are highlighted at individual points in Fig. 2. B86bPBE-XDM errors for three MCHTEP and MNIAAN conformations exceed 10 kJ/mol, and several others for the six sets are only slightly smaller. The PBE errors with either D4 or MBD dispersion are a little larger than the B86bPBE-XDM ones.
These more substantial errors are consistent with the delocalization error in the approximate density functionals, which artificially stabilizes conformations that exhibit a greater extent of π conjugation. In ACBNZA, MNIAAN, and MCHTEP, the upper conformation shown in Fig. 3 is relatively planar, allowing the π delocalization to extend from the aromatic ring to the side chains. These three conformations also exhibit intramolecular hydrogen bonds whose strength is probably overestimated, as will be discussed in Sec. III B. Other polymorphs of these species alter the intramolecular conformation to replace intramolecular hydrogen bonds with intermolecular ones. This conformational change reduces the planarity of the π system and decreases the overall π-electron delocalization. Accordingly, conventional GGA/hybrid DFT functionals artificially stabilize the more planar conformations found in ACBNZA02, MCHTEP07, MCHTEP13, MNIAAN02, and MNIAAN12 relative to the less planar ones present in ACBNZA01, MCHTEP11, and MNIAAN11. These results are consistent with earlier case studies.37,39,58
Similar behaviors occur in UFAGIS. The planar UFAGIS conformation that allows the highest-occupied molecular orbital (HOMO) to extend across both rings via the sulfur lone pair electrons is over-stabilized relative to the non-planar conformations in UFAGIS01 that localize the frontier orbitals onto one ring or the other. In the KIJBOX/QIJZOY tautomers, moving the hydrogen from the bridging nitrogen in QIJZOY to the pyridine nitrogen in KIJBOX alters the nature of the conjugation involving that ring. The origin of the large GGA error in MIQKOM is less obvious, though it may reflect how the more symmetrical MIQKOM01 polymorph conformation (lower one in Fig. 3) exhibits slightly less distortion around the cyclopropene rings compared to the MIQKOM polymorph. In contrast to these six species, none of the conformations adopted by the other 14 better-behaved species in Fig. 1 appreciably alters the extent of electron delocalization.
Given the apparent role of delocalization error in the six problematic systems, one expects that the conformational energy errors should decrease as exact exchange is incorporated into the density functional. Switching from PBE to PBE0 reduces the errors by ∼20%–25% for either the D4 or MBD dispersion treatments (Fig. 2). The one notable exception occurs for ACBNZA, for which PBE already performs up to 2–3 kJ/mol worse than B86bPBE-XDM, especially with the MBD correction, and PBE0 increases the error by another ∼1 kJ/mol. Further progression toward the range-separated ωB97M-V functional or the wave function-based SCS-MP2D largely eliminates the problems, with conformational energy rms errors for the six problematic species of only 1.8 and 2.8 kJ/mol, respectively. The fact that the SCS-MP2D rms error is ∼1 kJ/mol larger than ωB97M-V for these for six species stems primarily from the 4–5 kJ/mol errors found for the KIJBOX/QIJZOY tautomers and the UFAGIS conformations, compared to the ∼1 kJ/mol or less errors in those systems from ωB97M-V. On the other hand, the SCS-MP2D rms error is 1.3 kJ/mol smaller than ωB97M-V for the other 14 systems such that the SCS-MP2D has a slightly smaller rms error for the full dataset compared to ωB97M-V (1.9 vs 2.3 kJ/mol).
In summary, conventional GGA and hybrid density functionals generally describe the relative conformational energies in these polymorphs well. However, when conformational changes alter the degree of electron delocalization/conjugation, the quality of the conformational energies can deteriorate dramatically. In those cases, switching to electronic structure methods that are less sensitive to delocalization error, such as top-tier density functionals like ωB97M-V or wavefunction methods like SCS-MP2D and coupled cluster models, can substantially improve the accuracy of the conformational energies.
B. Two-body intermolecular energies
Next, we examine the performance of the same models for predicting dimer intermolecular interaction energies. Due to the high computational expense of the coupled cluster benchmarks for dimers and the large number of dimer interactions in each crystal, we focus on the short-ranged dimer interactions occurring in the polymorphs of six of the smallest species: ACBNZA, MCHTEP, MNIAAN, UFAGIS, GLUCIT and ZEYBIO. The first four systems had problematic conformational energies with conventional density functionals in Sec. III A, while the latter two were well-behaved and serve as “controls” for the harder systems. The resulting CP2b interaction energy test set of 173 dimers consists of all symmetrically unique dimer interactions involving molecules separated by up to 4 Å (i.e., nearest-neighbors). The average DLPNO-CCSD(T) interaction energy in the set is −22.3 kJ/mol. The vast majority of dimers (82%) have attractive interaction energies in the range of 0–40 kJ/mol, 13% are bound by an even larger 40–120 kJ/mol, and the remaining 5% are repulsive by up to 7 kJ/mol.
Figure 4 plots the interaction energy error distributions for the five conventional GGA and hybrid functionals along with ωB97M-V and SCS-MP2D. Positive errors here indicate that the dimer is under-bound, while negative errors are over-bound. B86bPBE-XDM, PBE-MBD, and PBE0-MBD exhibit the largest rms errors of 3.6–3.9 kJ/mol, with no obvious improvement for the hybrid PBE0 functional over PBE. The two MBD-corrected models also exhibit the largest individual dimer interaction energy errors of the models tested here (both 17.1 kJ/mol, for a UFAGIS dimer). The PBE-D4 and PBE0-D4 rms errors of 2.0 and 2.1 kJ/mol, respectively, are considerably better than either the MBD or B86bPBE-XDM results. Switching to the ωB97M-V functional or SCS-MP2D reduces the rms errors to 1.6 and 1.5 kJ/mol, respectively.
Looking more closely at the error distributions in Fig. 4, all five of the conventional GGA/hybrid functionals and dispersion treatments examined here exhibit systematic errors: they consistently underestimate the strength of aromatic stacking van der Waals interactions and systematically over-bind hydrogen bonds.65,128,129 The most over-bound dimer errors mostly occur for GLUCIT, which has extensive hydrogen bonding, including some dimers with multiple hydrogen bonds. The error for a single hydrogen bond is often around ∼2–5 kJ/mol, and it can be as much as ∼10 kJ/mol for dimers with 2–3 hydrogen bonds.
The most positive dimer errors occur for UFAGIS, MCHTEP, MNIAAN, and ACBNZA, all of which exhibit substantial aromatic stacking interactions. Despite having two phenyl groups per molecule, the ZEYBIO molecules do not form aromatic stacks in these polymorphs, so their dimers are not prominent among the highly under-bound dimers. While these two types of systematic errors occur for all of the conventional functionals, they are most exaggerated for PBE-MBD and PBE0-MBD (Fig. 4). These systematic errors in the dimer interaction energies play an important role in the predicted crystalline stabilities, as will be discussed below.
The dimer interaction energy errors are smaller and less systematic for ωB97M-V and SCS-MP2D. For example, some of the most negative SCS-MP2D errors do still involve hydrogen bonded GLUCIT dimers. However, others include aromatic stacked dimers from MNIAAN and UFAGIS, along with NH⋯S and NH⋯π interactions in ZEYBIO.
While the accuracy with which individual dimer interaction energies are described is an important metric for judging a model’s performance, it is even more important in molecular crystals to examine how these errors accumulate or cancel in their net contribution to the lattice energy. To do so, the errors from all dimers coming from a particular polymorph were weighted based on their symmetry factors and summed to compute a total two-body error in the lattice energy per molecule.
Figure 5 plots the errors in these two-body lattice energy contributions relative to the DLPNO-CCSD(T) benchmark results for each of the six polymorph sets. First, the accumulated two-body errors are substantial for the conventional GGA and hybrid functionals, with root-mean-square errors in the range of 8–19 kJ/mol and a maximum error of 41 kJ/mol (for B86bPBE-XDM in UFAGIS). For context, the average total two-body lattice energy contribution from these dimers is 164 kJ/mol, and energy differences between organic crystal polymorphs, in general, are usually less than 10 kJ/mol. These two-body interaction energy errors are also considerably larger than the ∼5 kJ/mol mean absolute errors in the total lattice energies found for the same functionals in small-molecule crystal benchmarks, such as the C21/X23 test sets.59,60,123 That discrepancy in magnitude reflects both the greater size and complexity of the species here, which leads to larger-magnitude lattice energies and a greater potential for problematic two-body interactions. The large accumulated errors seen for the conventional density functionals in UFAGIS and two of the MCHTEP polymorphs arise primarily from systematic under-stabilization of the extensive aromatic stacking interactions. Second, the hybrid PBE0 functional only sometimes out-performs the two GGA functionals for the accumulated two-body errors. For example, PBE0-D4 exhibits smaller net two-body errors than PBE-D4 in GLUCIT and ZEYBIO, but it has errors that are comparable or larger to PBE-D4 in the other four systems.
ωB97M-V and SCS-MP2D substantially reduce the accumulated two-body errors in Fig. 5, which is consistent with the much smaller dimer interaction energy errors seen in Fig. 4. The rms errors in the net two-body contributions for ωB97M-V and SCS-MP2D are 6.6 and 5.3 kJ/mol, respectively, which are roughly on par with the lattice energy errors typically found in C21/X23 benchmarks. The largest-magnitude net two-body errors are ∼10–12 kJ/mol for ωB97M-V (UGAFIS and GLUCIT) and ∼−8–9 kJ/mol for SCS-MP2D (GLUCIT).
For crystal structure prediction, the computed relative energy between polymorphs is arguably even more important than the total lattice energy. With this in mind, the gray numbers in Fig. 5 indicate the maximum relative error between polymorphs. This reveals an interesting behavior: despite B86bPBE-XDM having larger net two-body errors than the PBE and PBE0-based models in every polymorph set, except GLUCIT, it actually exhibits rather small relative two-body errors between polymorphs in most cases. For ACBNZA, for example, the difference in the net two-body errors for the two polymorphs is only 2.4 kJ/mol, compared to 10.3 kJ/mol with PBE0-MBD. With the exception of MCHTEP, which proves exceptionally difficult for most of the DFT models here, the relative B86bPBE-XDM two-body energy errors differ by ∼5 kJ/mol or less. PBE-D4 also performs almost as well in this regard. In contrast, the PBE0-MBD net error variations across polymorphs exceed 5 kJ/mol for every species, except GLUCIT and ZEYBIO. Finally, even better performance is once again obtained with ωB97M-V and SCS-MP2D. For SCS-MP2D, the relative two-body errors are within ∼2 kJ/mol in every case except UFAGIS. The ωB97M-V errors are more variable, with a small range of errors for UFAGIS, GLUCIT, and ZEYBIO, while exhibiting somewhat larger errors for the other three systems.
In summary, the conventional functionals systematically over-stabilize hydrogen bonds and underestimate aromatic stacking interactions in these systems. When accumulating the net dimer interaction energies for the crystals, B86bPBE-XDM and PBE-D4 appear to exhibit better error cancellation among the two-body errors between polymorphs than do the PBE0-D4 or MBD models. Once again, ωB97M-V and SCS-MP2D perform significantly better, with errors that are both generally smaller and less systematic. The net two-body errors for SCS-MP2D are particularly small in most systems.
C. Impact on the overall relative polymorph energies
Sections III A and III B have examined the one-body intramolecular conformational energies and the short-range two-body intermolecular interactions separately. However, the overall lattice energy reflects the sum of those contributions plus longer-range two-body and many-body intermolecular interactions. In this section, therefore, we assess how the interplay of the one-body and two-body errors impacts the overall relative polymorph lattice energy differences. Figure 6 plots the intramolecular errors (from Fig. 2) and the relative two-body errors i.e., (the relative error differences between polymophs in Fig. 5). Figure 7 plots the overall relative periodic DFT polymorph energies from B86bPBE-XDM, PBE-MBD, and PBE0-MBD. It shows how those relative crystal energies change when one corrects the intramolecular conformational energies (one-body correction) and short-range pairwise intermolecular interaction energies (two-body correction) with their DLPNO-CCSD(T) values according to Eqs. (1) and (2). It also includes experimental data, where available. For ACBNZA, relative enthalpy measurements between the two polymorphs have been reported by two groups,68,130 and both values are shown. Qualitative (but not quantitative) stabilities for the MNIAAN131 and MCHTEP132 polymorphs have been reported as well.
For the four systems exhibiting substantial conformational energy errors with GGA and hybrid functionals [Figs. 6(a)–6(d)], the intra- and intermolecular errors with those functionals largely exhibit opposite signs and cancel to varying degrees in the overall lattice energy. To understand this cancellation better, consider the ACBNZA system. B86bPBE-XDM over-stabilizes the planar α polymorph conformation (ACBNZA02) considerably, producing the large positive intramolecular error in Fig. 6(a). It then generally underestimates the two-body intermolecular interactions in both polymorphs to a relatively uniform extent (Fig. 5) such that the relative intermolecular energy difference is only 2.4 kJ/mol. Because the intramolecular error dominates the B86bPBE-XDM polymorph energy difference, applying a one-body monomer conformational energy correction with DLPNO-CCSD(T) via Eq. (1) improves the overall polymorph energy difference relative to experiment [Fig. 7(a)], as found previously.37 Further correcting the two-body intermolecular interactions according to Eq. (2) destabilizes ACBNZA01 slightly such that it now lies 1.4 kJ/mol above ACBNZA02. This final result lies within ∼1 kJ/mol of an earlier hybrid many-body interaction (HMBI) model study,58 which paired MP2D133 (a predecessor of SCS-MP2D) monomer and dimer descriptions with periodic HF for long-range and many-body interactions. That earlier study also found that the remaining apparent discrepancy with the experimental enthalpies shown in Fig. 7(a) can largely be accounted for via inclusion of finite-temperature effects that are neglected here.
For comparison, PBE0-MBD over-stabilizes the ACBNZA02 conformation relative to the ACBNZA01 one by 10.3 kJ/mol (4 kJ/mol more than B86bPBE-XDM). This reflects both delocalization error preferring the π-delocalization found in the planar conformation and the systematically over-estimated hydrogen bond strength for the intramolecular hydrogen bond (Fig. 3). However, the relative two-body intermolecular error between polymorphs is considerably larger with PBE0-MBD than B86bPBE-XDM, spuriously favoring ACBNZA01 by 10.6 kJ/mol. Whereas ACBNZA01 has two intermolecular hydrogen bonds per molecule whose strength PBE0-MBD significantly over-estimates, ACBNZA02 only has one. Moreover, the error for that one hydrogen bond in ACBNZA02 is largely canceled by the under-binding of the intermolecular aromatic stacking enabled by the planar ACBNZA02 conformation. This results in the large PBE0-MBD two-body error for ACBNZA01 and the near-zero error for ACBNZA02 in Fig. 5(a).
Combining everything, PBE0-MBD predicts a nearly correct ACBNZA polymorph energy difference [Fig. 7(a)], thanks to the cancellation between ∼10 kJ/mol intra- and intermolecular errors. Correcting the monomer conformational error with DLPNO-CCSD(T) cures the intramolecular energies, but it produces a far worse overall relative polymorph energy difference because it disrupts that error cancellation. Correcting both the one-body and two-body errors leads to a more reasonable relative polymorph energy difference that is in good agreement both with the original PBE0-MBD result and the one-body- and two-body-corrected B86bPBE-XDM energies. PBE-MBD behaves similarly to PBE0-MBD, albeit with moderately smaller error cancellations involved.
The fact that the monomer- and dimer-corrected ACBNZA relative polymorph energies computed from B86bPBE-XDM, PBE-MBD, and PBE0-MBD [Fig. 7(a)] vary by only ∼1 kJ/mol is reassuring. The most important interactions are being modeled identically with DLPNO-CCSD(T) in all three cases, and the functional is only used to describe the residual long-range and many-body contributions. All three functionals apparently describe these interactions similarly.
Now consider MNIAAN, which is an example of color polymorphism (like ROY and MCHTEP).67 MNIAAN11 adopts a non-planar conformation for the molecule, while MNIAAN02 and MNIAAN12 have more planar ones. In this case, the intramolecular errors arising from π-delocalization and the intramolecular hydrogen bond dominate when using the conventional functionals, though again the relative intermolecular errors are more substantial for the PBE0 models [Fig. 6(b)]. These intermolecular errors are most notably driven by the 5–9 kJ/mol underestimation of the aromatic-stacking interactions for the planar conformations in MNIAAN02 and MNIAAN12; no comparable interactions occur for the non-planar conformation in MNIAAN11.
Experimentally, the white MNIAAN11 polymorph is the most stable, followed by the yellow MNIAAN12 and the amber MNIAAN02.131 With a DFT-only treatment, all three functionals invert this stability ordering [Fig. 7(b)]. After applying DLPNO-CCSD(T) monomer and dimer corrections, however, all three functionals give quantitatively similar energetics that match the experimental stability ordering. Assuming that these final DFT + 1b + 2b relative energies reasonably represent true polymorph energetics, we find that while PBE0-MBD provided the best initial uncorrected energetics, none of the uncorrected DFT methods is particularly close to the final relative energies.
We know from the intramolecular benchmarks in Sec. III A that these density functionals artificially destabilize the non-planar MNIAAN11 conformation energy relative to the other two polymorphs. Correcting this conformational energy error with the DLPNO-CCSD(T) monomer correction appropriately makes MNIAAN11 the most stable form, with monomer-corrected B86bPBE-XDM giving the result closest to the final DFT + 1b + 2b results. Monomer-corrected PBE0-MBD, on the other hand, clearly over-stabilizes MNIAAN11. Similar to ACBNZA, applying a one-body correction to PBE0-MBD fixes the conformational energy, but it also eliminates a large error cancellation between one-body and two-body errors [Fig. 6(b)].
Applying a one-body monomer correction to MNIAAN12 has a smaller effect on its energy relative to MNIAAN02, since their conformations are more similar. Monomer-corrected PBE-MBD and PBE0-MBD give better results than B86bPBE-XDM+1b, thanks to their small intermolecular errors. The B86bPBE-XDM result is only a couple kJ/mol worse, but this is enough to reverse the qualitative ordering relative to MNIAAN02. Further refining the energies with two-body corrections addresses these issues, at which point the variations arising from the three different starting functionals once again become small.
MCHTEP represents the most difficult case here, with large intra- and intermolecular errors for the conventional functionals, especially for MCHTEP11 relative to the other forms. The one-body error stems from over-stabilization of the extended π conjugation and of the intramolecular hydrogen bonds present in the more planar MCHTEP07 and MCHTEP13 conformations, both of which are missing from the non-planar MCHTEP11. At the same time, the functionals substantially underestimate the intermolecular aromatic stacking interactions found in MCHTEP07 and MCHTEP13. The two-body error in MCHTEP11 is much smaller than in the other two polymorphs, since the key dimer involves both underestimated aromatic stacking interactions and a pair of over-estimated hydrogen bonds that largely cancel, such that the overall MCHTEP11 intermolecular error is far smaller [Fig. 5(c)]. In other words, the relative errors in Fig. 6(c) reflect the large errors for MCHTEP07 and MCHTEP13 rather than problems with MCHTEP11.
Combining these errors leads to the relative polymorph energies shown in Fig. 7(c). At room temperature, the yellow MCHTEP07 form is the most stable experimentally, followed by the pale yellow MCHTEP13 and then the least-stable white MCHTEP11 polymorph.132 MCHTEP11 becomes the most stable form above 360 K. All three DFT methods predict the room-temperature ordering without any correction, though with very different relative energetics for the hybrid vs the GGA functionals. Applying the monomer correction corrects most of the error for MCHTEP13, but it effectively over-corrects for MCHTEP11 by disrupting the cancellation between inter- and intramolecular errors. This contrasts how monomer correction substantially improved the B86bPBE-XDM results for ACBNZA and MNIAAN. After applying both monomer and dimer corrections, we once again arrive at relatively consistent polymorph energies for all three functionals. These relative energies are also consistent with the experimental stabilities, though, of course, the neglected vibrational and finite-temperature effects could impact the predicted stabilities further.
UFAGIS similarly exhibits sizable one-body and two-body errors. The intramolecular error stems from the over-stabilization of the planar UFAGIS conformation relative to the non-planar UFAGIS01 one, while the intermolecular one stems from the underestimation of the aromatic stacking interactions, which are more severe in UFAGIS. Since experimental polymorph stabilities are unavailable, we take the monomer- and dimer-corrected PBE0-MBD results as the “true” energies; these corrected energies are similar for all three functionals [Fig. 7(d)]. Once again, substantial error cancellation between the one-body and two-body errors leads to DFT relative polymorph energies that are better than the individual intra/intermolecular errors would suggest. Applying a monomer correction disrupts the error cancellation and leads to energies in worse agreement with the final result; correcting both one-body and two-body interactions is necessary to obtain a more balanced result.
Finally, consider GLUCIT and ZEYBIO to contrast the four more challenging systems. Detailed experimental stability information is unavailable for these systems. The experimental literature on GLUCIT is complicated, with many forms reported, conflicting nomenclature, and only three polymorphs available in the CSD. See, for example, Refs. 91, 93, and 134 and references therein. It has also been argued135 that the hydroxyl orientations/space groups of the commercial γ form (GLUCIT03) inferred from the powder synchrotron x-ray diffraction experiments91 are erroneous, which could impact the predicted stability for that form. No experimental stability information is available for ZEYBIO.
As shown in Figs. 6(e) and 6(f), the intra- and intermolecular errors are small for GLUCIT and ZEYBIO, with the intermolecular errors usually being a little larger. For most models and polymorphs, the intra- and intermolecular errors here also have the same sign, in contrast to the oppositely signed errors that are prevalent in the first four systems and that facilitate error cancellation. Even without one-body or two-body corrections, the relative polymorph energies are fairly similar across all DFT models [Figs. 7(e) and 7(f)]. The one-body and two-body GLUCIT corrections are relatively modest, shifting the relative energies by only 1–3 kJ/mol. The prediction of GLUCIT04 being the most stable is achieved either by applying one-body and/or two-body corrections to the GGA functionals or by using PBE0-MBD. The particularly large instability of GLUCIT03 relative to the other forms could reflect the incorrect hydrogen bonding network mentioned above. The one-body and two-body corrections are a little larger for ZEYBIO than for GLUCIT, but the predictions are again fairly similar across the three functionals. The most notable difference occurs for PBE-MBD, where the two-body corrections stabilize ZEYBIO01 more so than for the other two functionals.
Finally, we note that the one-body and two-body ωB97M-V and SCS-MP2D errors are much smaller across all six systems (Fig. 6). The worst performances occur for MCHTEP with ωB97M-V and for UFAGIS with SCS-MP2D, but the conventional DFT functionals perform even worse in those cases. The error cancellation between inter- and intramolecular interactions is perhaps less systematic for ωB97M-V and SCS-MP2D, but the smaller errors reduce the importance of having that error cancellation occur.
D. General discussion
On the whole, these results show that the performance of standard density functionals in organic crystals is complex and relies heavily on error cancellation. The largest errors involve a mix of over-stabilizing extended π conjugation, underestimating aromatic stacking interactions that tend to be more plentiful in those systems with greater π conjugation, and over-estimating hydrogen bond strengths. The particular nature of the error cancellation seems to vary for different functionals. For the conformational polymorphs studied here, B86bPBE-XDM obtains substantial error cancellation among the intermolecular two-body contributions to the lattice energy, while PBE-MBD and PBE0-MBD have larger relative two-body errors and rely more on cancellation between one-body and two-body errors.
Earlier work found that correcting the B86bPBE-XDM monomer conformational energies with higher-accuracy ones worked well in several conformational polymorph systems, ranging from smaller molecules to pharmaceuticals.37 If we take the one- and two-body-corrected PBE0-MBD energies as the true result, applying a DLPNO-CCSD(T) monomer correction improves 7 of 9 relative polymorph energies here (exceptions: MCHTEP11 and UFAGIS01). This reflects how the one-body B86bPBE-XDM errors dominate over the two-body ones for most of these systems. In contrast, PBE0-MBD exhibits larger two-body errors in most of these systems, which are somewhat mitigated by cancellation with the one-body errors. Correcting the monomer conformational energies therefore makes the PBE0-MBD energies distinctly worse in 5 of 9 cases. The fact that uncorrected PBE0-MBD predicts more accurate relative polymorph stabilities in these systems than uncorrected B86bPBE-XDM or PBE-MBD despite having larger conformational and dimer interaction energies highlights the degree to which these one-body and two-body PBE0-MBD errors cancel.
Finally, the fact that the three density functionals achieved similar relative lattice energies once monomer and local dimer corrections were applied is encouraging, since it suggests that the different functionals are describing the long-range and many-body intermolecular interactions similarly. As noted earlier, the one- and two-body corrections here amount to the same approach as has been used in HMBI and other fragment-approaches previously. While polarizable force fields or periodic Hartree–Fock have probably been applied more often to model the long-range and many-body interactions, there is precedent for using periodic DFT as well.125–127 With that said, one previous energy decomposition found considerable differences between the B86bPBE-XDM and HF long-range (10 Å) and many-body contributions for oxalyl dihydrazide.58 Those interactions should be dominated by electrostatics and polarization, which could ideally be described reasonably by either model. Further work will be needed to understand such discrepancies better, particularly with regard to the role of delocalization error in the many-body contributions.
The current work also computed far fewer dimer interactions with the high-level method compared to earlier HMBI studies. For the six sets of polymorphs considered here, the 4 Å two-body cutoff required computing an average of only 8.2 dimer interactions per molecule in the asymmetric unit at the high level of theory, compared to a few dozen dimers with the ∼9–10 Å cutoffs often employed previously.
Nevertheless, performing the two-body correction with DLPNO-CCSD(T) remains computationally demanding. As evidenced from Fig. 6, SCS-MP2D also performs quite well in most of these systems and better overall than any of the density functionals tested. Employing one-body and two-body SCS-MP2D corrections to B86bPBE-XDM gives relative lattice energies that match the DLPNO-CCSD(T)-corrected PBE0-MBD results to within an rms error of 2.1 kJ/mol. In contrast, uncorrected B86bPBE-XDM and PBE0-MBD differ by larger rms errors of 5.2 and 2.9 kJ/mol, respectively.
Given the high cost of hybrid density functionals in planewave basis sets, it could be worth investigating the computational cost and accuracy trade-offs between using hybrid functionals in periodic DFT vs performing local monomer and dimer corrections for a GGA functional like B86bPBE-XDM. Beyond using methods such as SCS-MP2D or an advanced density functional directly, it could be appealing to employ Δ-machine learning techniques136–140 to correct the important one-body and two-body interactions at lower cost, particularly if one wanted to correct the energies for a large number of structures on a crystal structure prediction energy landscape.
Conformational polymorphs represent a particularly challenging case for organic crystal lattice energy rankings. This study benchmarked intramolecular conformational energies across 20 sets of conformational polymorphs ranging from small molecules to much larger drug molecules. Detailed analysis of the dimer interactions were then carried out for six of these polymorph sets. Finally, the nature of the error accumulation and cancellation among different types of interactions for the final polymorph stability predictions was investigated. These benchmarks revealed the following:
Conventional van der Waals-inclusive density functionals generally describe intramolecular conformational energies well, except in cases where delocalization error manifests. More advanced density functionals or high-quality correlated wavefunction methods avoid those issues and perform better across the board.
The conventional GGA and hybrid functionals studied here show systematic errors in modeling non-covalent interactions: hydrogen bond strengths are over-estimated, while aromatic stacking interactions are underestimated. Again, more advanced models perform considerably better. PBE-D4 and PBE0-D4 perform well for the dimer interactions as well, contrasting to their more mediocre performance for the intramolecular conformational energies.
Error cancellation among these one-body and two-body errors plays a major role in the overall accuracy of these functionals for predicting the relative polymorph energies. For the six species studied in detail here, B86bPBE-XDM benefited more from error cancellation among the intermolecular interactions, while PBE-MBD and PBE0-MBD relied more on cancellation between intra- and intermolecular interaction errors.
Given the nature of the error cancellation, applying a monomer conformational energy correction to address intramolecular errors improved the relative B86bPBE-XDM crystal energies in most cases here. In contrast, it often made the PBE0-MBD energies considerably worse, since eliminating the one-body errors disrupts error cancellation and exposes the two-body errors. Applying both one-body and two-body corrections resolves this issue, albeit with considerably higher computational cost.
B86bPBE-XDM, PBE-MBD, and PBE0-MBD predict fairly similar long-range two-body and many-body contributions. The differences in short-range interactions are much larger. This potentially means that efforts to improve DFT descriptions of molecular crystals might focus primarily on the local interactions.
The CP1b (54 monomer conformations) and CP2b (173 dimer interactions) datasets generated here provide real-world example systems in non-equilibrium geometries, including a number of cases that are substantially impacted by delocalization error. The associated DLPNO-CCSD(T) energies can offer challenging benchmarks for testing other electronic structure methods.
Moving forward, it would be valuable to extend the investigations here to a broader array of crystals, particularly focusing on the intermolecular interactions. The first goal would be to develop better chemical intuition for when intermolecular interaction errors will be large and how those will interact with any intramolecular errors. At present, the conformational energy errors in systems such as ACBNZA, MNIAAN, and MCHTEP are easy to anticipate, but the fact that monomer-corrected B86bPBE-XDM performs well for all polymorphs of those systems except MCHTEP11 is harder to anticipate. a priori.
The second goal would be to assess the impact of delocalization error in the long range/many-body interactions. Despite the consistency of these interactions across the three functionals here, earlier work has found sizable discrepancies between DFT and periodic HF descriptions of those interactions in at least one system.58 Nevertheless, clear progress in the quantum chemical description of the energies of conformational polymorphs is being made, and this will have important implications for crystal structure prediction in pharmaceuticals and other organic materials.
See the supplementary material for Cartesian coordinate geometries for the CP1b and CP2b sets, conformational and interaction energies for each structure with each electronic structure method, and the results of convergence testing for the DLPNO-CCSD(T) benchmarks.
G.J.O.B. gratefully acknowledges funding for this work from the National Science Foundation (Grant No. CHE-1955554) and supercomputer time from XSEDE (Grant No. TG-CHE110064). S.E.W. acknowledges financial support from The Cambridge Crystallographic Data Center and The Engineering and Physical Sciences Research Council, UK. A.J.C.-C. acknowledges The Royal Society for an Industry Fellowship in AstraZeneca, UK.
Conflict of Interest
The authors have no conflicts to disclose.
The data that support the findings of this study are available within the article and its supplementary material or from the corresponding author upon reasonable request.