Molecular modeling plays an important role in the discovery of organic structure-directing agents (OSDAs) for zeolites. By quantifying the intensity of host–guest interactions, it is possible to select cost-effective molecules that maximize binding toward a given zeolite framework. Over the last few decades, a variety of methods and levels of theory have been used to calculate these binding energies. Nevertheless, there is no consensus on the best calculation strategy for high-throughput virtual screening undertakings. In this work, we compare binding affinities from density functional theory (DFT) and Dreiding force field calculations for 272 zeolite–OSDA pairs obtained from static and time-averaged simulations. Enabled by automation software, we show that Dreiding binding energies from the frozen pose method correlate best with DFT energies. They are also less sensitive to the choice of initial lattice parameters and optimization algorithms, as well as less computationally expensive than their time-averaged counterparts. Furthermore, we demonstrate that a broader exploration of the conformation space from molecular dynamics simulations does not provide significant improvements in binding energy trends over the frozen pose method despite being orders of magnitude more expensive. The code and benchmark data are open-sourced and provide robust and computationally efficient guidelines to calculating binding energies in zeolite–OSDA pairs.
I. INTRODUCTION
Zeolites are nanoporous materials widely used in catalysis, separation, sorption, and many other industrially relevant applications.1 These metastable polymorphs are typically crystallized by adding inorganic cations and organic molecules to amorphous precursor gels in hydrothermal conditions2–4 although organic-free approaches are also possible.5–12 In particular, a combination of electrostatic effects and dispersion interactions allows the molecules to act as organic structure-directing agents (OSDAs) that template the pore structure of the zeolite and determine its topology.4,13,14 By exploring a variety of OSDAs under different synthesis conditions, several new zeolites have been discovered over the last few decades.15 Nevertheless, only around 250 different topologies have been experimentally identified,16,17 a majority of which rely on OSDAs to be realized. Accessing known and new zeolites with desired pore structures and compositions through the OSDA design is still an open-ended problem, often relying on trial and error.18
The computational simulation of zeolite–OSDA interactions has been an important resource to rationalize the design of templates, explain synthesis outcomes, or locate preferential placements for OSDAs.14,19 Early molecular modeling works relied on shape-matching methods20–23 to assess the goodness-of-fit between an OSDA and a zeolite. Later, it was shown that host–guest interactions computed from atomistic simulations are good predictors of synthesis outcomes for zeolites.24–30 This descriptor has enabled the theory-driven discovery of many zeolites over the last few years.15,31–36
Yet, the interaction energy between guests and hosts is strongly dependent on the level of theory and computational pipeline employed. Methods based on quantum mechanics, such as density functional theory (DFT) calculations, offer mostly parameter-free descriptions of the total energy once a suitable exchange–correlation functional is chosen. However, the large size of typical zeolite–OSDA systems demands considerable computational resources, preventing the use of DFT for OSDA screening purposes. A more cost-effective approach is to use force fields (FFs) to compute interatomic interactions. Several parameterizations have been employed to model host–guest interactions in zeolites, including CVFF,25,28,37–39 Dreiding,36,40–43 COMPASS,44 UFF,45,46 and others.24,47–51
Even after a level of theory is appropriately chosen, different ways to calculate interaction energies between zeolites and OSDAs remain. As an example, one could calculate binding affinities by computing energies by performing structural optimizations,24,44,51 molecular dynamics (MD) simulations,36,40,41,43,47,49,50,52 or calculating van der Waals (vdW) contributions by freezing host and guest structures.25,39,45,46,53,54 Furthermore, optimizations and dynamics can be performed at constant pressure50,51 or volume.25,39,43,45,46,52 In screening approaches, calibrating strategies to obtain consistent, computationally efficient results often make up for biases in simulation methods, matching experimental results better than higher levels of theory with less systematic errors.55,56 This is especially important in a diverse, combinatorial chemical space such as the one provided by pairings of OSDAs and zeolites.
In this work, we compare methods to calculate binding energies of OSDAs in zeolites and propose guidelines to compute these interactions in high-throughput virtual screening approaches. In particular, the following contributions are put forward:
Different simulation constraints, structural optimization algorithms, and initial conditions of the hosts significantly change the trends in binding energies for all simulation methods.
Binding energies of zeolite–OSDA pairs from the Dreiding FF are best calculated by constant volume simulations with the frozen pose method, as opposed to structural optimizations or MD simulations.
A reference dataset of 272 OSDA–zeolite pairs calculated with DFT and FF approaches can enable further benchmarks with other FF parameterizations, software packages, or simulation pipelines.
An open-source Python interface to the General Utility Lattice Program (GULP), GULPy, to enable faster generation of input files, parsing of outputs, structure manipulation, and execution of FF calculations.
II. METHODS
A. Simulation details
DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP),57,58 version 5.4.4, within the projector-augmented wave (PAW) method.59,60 The Perdew–Burke–Ernzerhof (PBE) functional within the generalized gradient approximation (GGA)61 was used as the exchange–correlation functional. vdW interactions were taken into account through Grimme’s D3 corrections.62,63 Several benchmarks have shown that including dispersion corrections in DFT calculations is imperative to accurately predict trends in binding energies between zeolites and guests.53,54,64–72 Although D3 is known to overbind the guest species to zeolites,54,70–72 it provides an excellent balance between cost and accuracy for binding energies compared to higher levels of theory such as random phase approximation, Møller–Plesset perturbation theory, or many-body dispersion. The kinetic energy cutoff for plane waves was restricted to 520 eV. Integrations over the Brillouin zone were performed using Monkhorst–Pack k-point meshes73 (Γ-centered for hexagonal unit cells) with a uniform density of 64 k-points/Å−3 (see Table S1 for complete k-point meshes for each zeolite). For isolated molecules, a vacuum of 15 Å thickness was employed in all directions to avoid unphysical interactions between periodic images. A stopping criterion of 10−6 eV was adopted for the self-consistent field (SCF) cycle energy convergence. Relaxation of unit cell parameters and atomic positions was performed until the Hellmann–Feynman forces on atoms were smaller than 10 meV/Å. Ab initio MD (AIMD) simulations were performed in the NPT ensemble with Langevin dynamics74,75 within the Parrinello–Rahman method76,77 and in the NVT ensemble with the Nosé–Hoover thermostat.78 A conservative time step of 0.5 fs was used for all AIMD trajectories, with all frames saved for post-processing. The sampling temperature was fixed at 400 K to simulate typical hydrothermal conditions in zeolite synthesis.3 The fictitious lattice mass was set to 1000 atomic mass units, and all Langevin friction coefficients were set to 1 ps−1 in the NPT simulation. A Nosé mass set by the VASP tag SMASS = 0.1 was used in the NVT simulation. The external pressure has been set to 0 kbar to allow comparisons with similar works where this value is implicit.50,51 Ground-state geometries were thermalized by randomly displacing the atoms by up to 0.02 Å in each Cartesian coordinate before being used as initial configurations for AIMD calculations. AIMD simulations were performed for 500 fs, with only the last 200 fs considered for production. Despite the short time lengths, this is enough to have well-equilibrated trajectories (see Fig. S1).
FF simulations were performed using the General Utility Lattice Program (GULP), version 5.1.1,79,80 through the new GULPy package.81 MD simulations were performed in the NVT and NPT ensembles with modified Nosé–Hoover dynamics82 using the leapfrog Verlet integrator, 0.5 fs time steps, a temperature of 400 K, and 0 kbar of external pressure. Fully optimized geometries were used as the initial configurations for MD calculations. All MD trajectories consisted of a production run of 5 ps preceded by a 5 ps equilibration run. Production frames were saved every 10 fs. The Dreiding force field83 was used to model interactions between the zeolite and the OSDA. Among the many options of parameterizations, Dreiding has been widely used for OSDA screening36,41–43 and some of its predictions have been experimentally verified.32,33 Initial FF optimization of unloaded zeolites was performed with the Sanders–Leslie–Catlow (SLC) parameterization.84 Whereas several other parameterizations have been proposed for pure-silica zeolites,85–89 SLC is still widely used in the field due to the good correlation of predicted energies with experimental enthalpies of formation for zeolites.90
Initial zeolite structures were downloaded from the International Zeolite Association (IZA) database and pre-optimized using either DFT or SLC, as described above. Conformers for OSDAs were generated using RDKit91 with the MMFF94 force field92,93 and further optimized using the BP86-D3/def2-SVP94–96 level of theory as implemented in ORCA.97,98 Subsequent optimizations of these geometries with VASP did not change the final structure of the isolated molecule. Nonetheless, the energies derived from these calculations were necessary to adopt the same reference method (PBE-D3/PAW) throughout the work. Generation of OSDA–zeolite poses was performed using the VOID package.99 A complete description of the docking strategy, software, and parameters is found in Ref. 99.
B. Calculation of binding energies
Typically, host–guest interactions are described in terms of binding energies since the free energies of binding are typically unavailable from simple simulations. A general expression for the binding energy (Eb) of a molecule in a host is given by
in which Ep is the energy of the zeolite–OSDA pose, Eh is the energy of the pure-silica, unloaded zeolite host, Eg is the energy of the guest template, and n is the number of guests per pose. Whereas the interaction energy between docked guests or distortion energies for hosts and guests could be subtracted from Eq.(1)48,49 (see Fig. S2), it is often useful to include these terms when computing the stabilization provided by the OSDA toward the zeolite. It is well-known that guest–guest interactions influence the product zeolite, particularly when more than one molecule is packed into the same cavity.39,100 Furthermore, host–guest interactions should compensate for the framework or molecule distortions in favorable OSDA–zeolite pairs. Therefore, removing these contributions from the final binding energy expression may not be representative of the actual stabilization obtained in experimental settings.
The literature contains multiple examples on how to calculate each of the terms in Eq. (1). For example, the energies can be obtained by time-averaging the host–guest energies along an MD simulation,40,41,43,47,49,50,52
or by subtracting energies resulting from structural optimizations (opt),24,44,51
Moreover, the resulting geometries from structural optimizations can be frozen in their relaxed positions, and the energy components are calculated for the isolated host and guests without further relaxation,
where the summation is performed for all guests docked in the host. In this scenario, there are different values of Eh and Eg for each pose since the final atomic structure is dependent on the host–guest interactions. Here, we refer to this strategy, widely used in the zeolite literature,25,39,45,46,53,54 as the “frozen pose” method. Finally, structural relaxations and MD simulations can be performed at constant pressure or volume, adding an additional degree of freedom for each of these simulations. Figure 1 schematizes different pathways for obtaining all these energies.
(a) Workflow of calculations performed to obtain different binding energies for each of the OSDA–zeolite pairs in the dataset. Resulting energies for poses (Ep), zeolite hosts (Eh), and OSDA guests (Eg) are shown. (b) Schematic of the calculation of and energies within the frozen pose scheme. The host and guest geometries are kept fixed at the relaxed positions obtained in the pose optimization, and a single point calculation is carried for each of the isolated systems.
(a) Workflow of calculations performed to obtain different binding energies for each of the OSDA–zeolite pairs in the dataset. Resulting energies for poses (Ep), zeolite hosts (Eh), and OSDA guests (Eg) are shown. (b) Schematic of the calculation of and energies within the frozen pose scheme. The host and guest geometries are kept fixed at the relaxed positions obtained in the pose optimization, and a single point calculation is carried for each of the isolated systems.
III. RESULTS
A. Correlations between binding energy calculations
To compare the different methods of calculating binding energies, we created a dataset of 272 zeolite–OSDA poses from 164 unique complexes, which cover 60 neutral OSDAs and 55 zeolite frameworks (Fig. S3 and Tables S2 and S3). The dataset was produced by selecting well-known neutral OSDAs from the zeolite literature and pairing them to hosts with the zero-, one-, two-, and three-dimensional pore topology and small number of framework atoms for computational efficiency. Different loadings were considered for pairs in which the molecule was small compared to the zeolite cavity, following the method described in Ref. 99. Only neutral OSDAs were simulated due to ambiguities on how to calculate DFT energies for charged systems without considering the presence of heteroatoms. It is unclear whether the charge-compensating background potential added in the DFT calculation may affect the lattice parameters and energy references for the pose. Furthermore, typical approaches employ FFs without explicitly considering charges, even for cationic molecules. Therefore, this dataset of neutral molecules docked in pure-silica zeolites allows exploring variations of binding energies with a focus on OSDA/zeolite shapes and sizes.
DFT-optimized structures of zeolites and molecules were used as inputs for the docking scheme. We later repeated this docking step for some SLC-optimized zeolites as substrates to analyze the effects of different initial host lattices on the binding energies (see Fig. S4, Sec. III C). For each of the 272 poses created from DFT substrates, we carried structural optimizations at constant pressure and volume using DFT and calculated binding energies using Eq. (3). Nonetheless, the physical reality is best described by a dynamic simulation, with the temperature set to ranges of typical hydrothermal conditions. To obtain such binding energies from ab initio MD without incurring into excessive computational cost, we performed AIMD simulations for 40 different complexes whose poses contained less than 80 atoms. All other energies listed in Fig. 1 are calculated for all 272 poses (see complete values in Table S3). A summary of the calculation tree is shown in Fig. 2(a) (see Fig. S4 for a complete description).
(a) Dependency tree of the atomistic simulations performed in this work. The docking algorithm implemented in the VOID package generates poses from host structures optimized using PBE-D3 and ligands optimized with BP86-D3. Then, subsequent calculations with Dreiding and PBE-D3 are performed for each pose (see Fig. S4 for complete details). (b) Correlation matrix of binding energies calculated with different methods. The value of Spearman’s correlation coefficient is reported in each element of the matrix.
(a) Dependency tree of the atomistic simulations performed in this work. The docking algorithm implemented in the VOID package generates poses from host structures optimized using PBE-D3 and ligands optimized with BP86-D3. Then, subsequent calculations with Dreiding and PBE-D3 are performed for each pose (see Fig. S4 for complete details). (b) Correlation matrix of binding energies calculated with different methods. The value of Spearman’s correlation coefficient is reported in each element of the matrix.
After performing all calculations, binding energies obtained for the same initial docked structures were compared. If the property of interest in the benchmark was the mean absolute error (MAE) between the binding affinities, systematic shifts due to parameterization inaccuracies or energy rescaling due to dynamic effects (e.g., sampling temperature in MD simulations) would largely influence the final results (see Fig. S5). Instead, a theoretical screening method should quantify trends of host–guest binding energies to inform OSDA selection, as in many other computational screening approaches.36,41,43,101 To quantify the correlation in the ordering between two different methods, Spearman’s rank correlation coefficient (ρ) is employed as a figure of merit. It correlates two variables according to the rank of the data points and is invariant to translations and rescalings of the sets under comparison.102 An increasing, monotonic relationship between host–guest interactions from two different methods has ρ = 1. Since DFT was chosen as the reference method, a higher correlation with DFT suggests that the method is better in capturing trends in binding energies.
Figure 2(b) summarizes all pairwise correlation coefficients (see also Fig. S6). First, we observe that most binding energies from DFT correlate well with each other. In particular, optimizations at constant pressure/volume, henceforth denoted as DFT (opt, P/V), correlate well with those from DFT (MD, P/V, N = 40) simulations (ρ = 0.82 at constant pressure and ρ = 0.91 at constant volume). Since DFT can address molecular and zeolite structures accurately, these correlations set a reference level for the goodness-of-fit of other approaches. On the other hand, DFT (frz, P/V) energies correlate well with each other but not with their DFT counterparts. Because the frozen pose method removes the contribution of structural distortions toward the final binding energy (Fig. S2), DFT (frz, P/V) energies can be too low for some zeolite-OSDA pairs (see Fig. S6). In particular, when the OSDA is slightly larger than the ideal for a given zeolite cavity, thus distorting the host by a significant amount, DFT (frz, P/V) predicts favorable bindings since it does not include the energy cost of expanding the framework. As they still require costly electronic structure simulations, DFT (frz, P/V) energies do not offer an interesting cost/accuracy trade-off compared to DFT (opt, P/V) affinities.
In contrast, the best FF strategy to calculate binding energies is the frozen pose method at constant volume whose correlation coefficient with DFT (MD, P) is 0.78. It also outperforms other methods in correlations with DFT binding energies. MD-derived energies from FF simulations with the NVT ensemble have a similar correlation with DFT (MD, P) (ρ = 0.77). These values are in excellent agreement with the baseline of ρ = 0.83 between DFT (MD/opt, P/V) methods. The same trend is observed if DFT (opt, P) energies are adopted as reference, with an even higher statistical power due to the larger number and diversity of poses. FF (frz, V) outperforms other methods by achieving a correlation of ρ = 0.68 with DFT (opt, P) binding affinities, followed by FF (MD, V) (ρ = 0.55).
Although MD simulations, in principle, allow sampling a larger fraction of molecular conformations within the guest, average binding energies derived from constant volume MD simulations are extremely correlated with their frozen pose counterparts (ρ = 0.88). This suggests that further exploring the phase space beyond the local minimum does not significantly change the trends in binding energies for most cases. To ensure that this subsampling effect was not due to short trajectories, we increased the total time of the FF (MD, V) simulations to 105 ps, 5 ps of which were dedicated to an initial equilibration run. Longer trajectories enabled lower energy minima to be sampled during the simulation, thus decreasing the average binding energy of the system, as shown in Fig. S7(a). The larger time scales allowed some poses to cross potential energy barriers and achieve lower energy minima in the Dreiding PES, as depicted in Figs. S7(c)–S7(e). Nevertheless, no significant changes in trends between binding energies are found when 100 ps-long production simulations are performed to obtain FF (MD, V) affinities, as shown in the inset of Fig. S7(a). Even with longer trajectories, OSDAs are not able to diffuse through pores or undergo significant structural transitions due to confinement effects (see Fig. S7b). This result has important consequences. It demonstrates that frozen pose calculations from the Dreiding FF are slightly better predictors of reference binding energies than MD simulations while also being orders of magnitude faster to compute. These trends hold even when guest–guest interactions are removed from the binding energies (Fig. S8) or when the rank correlation is computed only across poses with favorable binding affinities (Fig. S9). In these cases, however, the reference level for the goodness-of-fit may decrease, as the Spearman’s correlation coefficient is sensitive to the relative spread and size of the ranked list.
Additionally, Fig. 2(b) shows that the correlation between FF binding energies from constant pressure calculations and DFT is much worse than the one obtained by their constant volume counterparts. Although the physical reality is, in principle, better described by a constant pressure constraint, the Dreiding FF does not correctly capture the behavior of the isolated silicate frameworks, often leading to unphysical distortions in the zeolite structure. As such, the configuration space sampled by FF calculations at constant pressure is further from the ground truth than that from constant volume calculations, leading to much poorer predictions of binding energy trends. Indeed, an analysis of the density of pure-silica zeolites shows that constant pressure FF optimizations lead to structures that are 45% denser, on average, than their experimental counterparts, as well as having shorter Si–O bond lengths and smaller O–Si–O bond angles (see Fig. S10). On the other hand, starting with DFT- or SLC-optimized substrates and optimizing the host at constant volume result in more accurate zeolite structures, which reflects on better binding energies.
B. Stability of structural optimizers for reference hosts
Even at constant volume calculations, structural optimizations of unloaded zeolites affect the binding energies of zeolite–OSDA pairs by changing the host reference energy Eh in Eq. (1). If the minimization algorithms employed in the atomic relaxation are unable to find the global energy minimum for a given structure, then all binding energies for that zeolite will be lower than the ground truth, as . While the stability of geometry optimization algorithms has been studied before,103,104 their effects are investigated here to quantify the errors in Dreiding binding energies of zeolites. We compared the Dreiding energies of pure-silica, unloaded zeolites optimized at constant volume through four algorithms: the conjugate gradient (CG) method, rational function optimization (RFO), Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, and symmetry-lowering (Lower) method, all as implemented in GULP. To accelerate the convergence of the structural relaxation, we started the optimization with the CG, BFGS, or Lower methods and switched to the RFO method when the norm of the gradient (|G|) was smaller than a threshold of choice. In principle, if the algorithms were equally effective in finding the global energy minimum, all structures would converge to the same ground state. However, the results indicate that the algorithms often disagree on the equilibrium energies and geometries. Figure 3 shows the distribution of energies for zeolites with respect to the minimum energy configuration for the same framework found among all four optimization runs. No optimization scheme outperforms the others across all zeolites. Whereas at least 75% of the energy differences tends to be smaller than 1 kJ/mol SiO2, some algorithms tend to overestimate the energy of the ground state of a zeolite by up to 100 kJ/mol SiO2. Changing the |G| threshold also leads to different energy minima. If we use BFGS as the initial optimizer, but switch to RFO at different values of |G|, the outcomes of the simulation are different. We did not find a threshold for |G| that outperforms others nor did we find a significant advantage in not using RFO altogether. In fact, avoiding switching to RFO often increases the number of steps necessary to reach a local energy minimum.
Distribution of Dreiding FF energies of unloaded zeolite frameworks (E) with respect to the minimum energy obtained among all optimization algorithms for the same framework (Emin). The shaded box represents the interquartile range, the vertical line is the median, and the whiskers span the range of the distribution. DFT-optimized frameworks were used as inputs for the Dreiding FF optimization.
Distribution of Dreiding FF energies of unloaded zeolite frameworks (E) with respect to the minimum energy obtained among all optimization algorithms for the same framework (Emin). The shaded box represents the interquartile range, the vertical line is the median, and the whiskers span the range of the distribution. DFT-optimized frameworks were used as inputs for the Dreiding FF optimization.
While optimization energies can get trapped in local minima, MD simulations allow atoms to move and avoid these higher energy traps. Thus, reference energies should be more robust to the optimization methods used to generate the initial structures. Figure 3 compares differences according to the algorithm that generated the structure used as input for the MD simulation. MD energies lower the differences between the optimizers, but the discrepancies are still as high as 10 kJ/mol SiO2 for some systems. This suggests that structural relaxations can affect even time-averaged energy references. Therefore, finding the global ground state energy of different hosts is unlikely without a thorough combination of different minimizers. These systematic shifts explain the existence of binding energies of a large absolute value, such as the ones observed in ACO zeolite (see Table S3). Rather than exposing a flaw in the calculation, they suggest that even after multiple optimization attempts, the isolated host has not converged to its ground state geometry. As a consequence, all binding energies for that particular host are biased if Eq. (3) is used to compute the binding affinity of an OSDA toward a zeolite. When OSDAs are ranked across a single zeolite framework according to their binding energy,99 this is often not a problem. Since the energy reference is shifted by the same amount for all poses, the trends of binding energies along a single host are preserved, but comparisons between different hosts are hindered. Nevertheless, non-systematic errors are undesirable in the high-throughput discovery of materials and should be avoided. The role of energy references in binding energy calculations does not alter the conclusions drawn from Fig. 2(b). If we analyze the correlations between binding energies for a single framework (see Fig. S11 for an analysis on SOD zeolite), the same trends with respect to DFT energies are observed. The only difference is that a larger correlation between FF binding energies at constant pressure and volume is found and also between FF and DFT binding energies at constant pressure. This might be a feature of sod cage, which effectively confines an OSDA without allowing the framework to collapse toward a denser structure due to an inappropriate description by the Dreiding FF.
C. Variability of binding energies according to the initial substrate parameters
Another constraint required in constant volume simulations is fixing the host lattice parameters prior to docking. Since methods such as DFT and SLC lead to different equilibrium lattice constants, often with DFT predicting less dense zeolites due to the GGA-PBE functional (see Fig. S10), host–guest interactions are also affected by the choices of unit cells. To compare the influence of the initial host parameters on the final binding energy, we performed the docking procedure a second time for 81 different complexes using either the DFT or SLC geometry of the host as an input (Fig. S3d). The VOID package generated an average of 14 poses for each host geometry and guest conformer, which has been shown to generate binding energies that qualitatively correlate with typical experimental outcomes.99 Host energies were obtained by relaxing each unloaded zeolite with the four optimization schemes shown in Fig. 3 and selecting the resulting energy minimum as the reference value. Unit cells from DFT- and SLC-optimized frameworks were assigned different reference energies since all relaxations were performed at constant volume (see Fig. S4a). Then, we selected the strongest binding affinities among all poses created with the given guests, hosts, and their lattice parameters.99 Figure 4 compares the best binding energies for each complex according to the starting host. Ideally, small changes in unit cell parameters should not significantly affect the ground state host–guest interactions, as long as the configuration space is thoroughly explored. However, we observe that optimization and MD binding energies vary significantly as a function of the initial substrate. The MAE between the best binding energies is 52.7 and 50.3 kJ/mol OSDA for opt and MD methods, respectively. In contrast, the best binding energies from the frozen pose method are consistent across different initial zeolite structures, with a MAE of 10.1 kJ/mol OSDA, five times better than the other methods. This suggests that binding energies from the frozen pose method are more robust to variations of unit cell parameters, rendering them a desirable choice for high-throughput computational workflows. Furthermore, the agreement of the final binding energies from this method across randomly initialized poses suggests that the strategy converges toward an energy minimum and could be supplemented by Boltzmann averages. Although only a subset of complexes was considered for the sake of computational efficiency, these trends should also hold for other zeolite–OSDA pairs.
Correlation between the minimum FF-based binding energy (Eb) for complexes with different initial host lattices, as calculated by the (a) optimization, (b) MD, and (c) frozen pose methods. The unloaded zeolites were independently optimized using DFT and FF-SLC before the docking [see Fig. 2(a)]. The MAE indicates that the frozen pose method is more robust to variations of the initial conditions. All values are given in kJ/mol OSDA.
Correlation between the minimum FF-based binding energy (Eb) for complexes with different initial host lattices, as calculated by the (a) optimization, (b) MD, and (c) frozen pose methods. The unloaded zeolites were independently optimized using DFT and FF-SLC before the docking [see Fig. 2(a)]. The MAE indicates that the frozen pose method is more robust to variations of the initial conditions. All values are given in kJ/mol OSDA.
IV. DISCUSSION
For decades, host–guest interactions have been modeled using a variety of methods, but there is still no reported consensus on which method is more appropriate to compute these binding affinities. Remarkably, many works do not report the exact conditions to reproduce their calculations and omit the choice of simulation ensembles, MD time steps, pressure, initial substrates, and other important parameters. Yet, high-throughput screening methods require selecting parameters that yield computationally efficient results and can be deployed robustly without manual supervision.
The results in this work clarify a number of interesting observations. First, using the Spearman’s correlation coefficient allows binding energies to be compared according to their rank instead of their magnitude. This is particularly useful when FF parameterizations are not designed to reproduce exact energy values and the composition space under analysis is diverse. According to this approach, outcomes of DFT simulation approaches (opt or MD at constant P or V) are highly correlated with one another, justifying their use as baseline values. On the other hand, DFT (frz, P/V) energies overlook framework distortions and may be misleading, particularly in view of their computational cost. Moreover, the simulation constraints on pressure/volume weakly influence the DFT results, since this level of theory accurately describes hosts, guests, and their interactions.
In contrast, Dreiding FF binding energies at the NPT ensemble do not correlate well with their DFT counterparts. Rather, FF simulations at constant volume show good correlation with all DFT optimizations and MD simulations. This is a limitation of the Dreiding FF, which, despite it widespread use in prediction of OSDA–zeolite interactions, is unable to correctly describe the unloaded zeolite framework when volume relaxation is allowed. Other less ubiquitous and more rigorous force field parameterizations may achieve a reasonable description of the zeolite structure in addition to the guest–guest and host–guest interactions.51 However, the conclusions regarding the frozen pose method may not hold for such force fields, which would have to be tested again. We propose that the metrics and ground-truth DFT data in this work can serve as open benchmarks for the community.
Even when binding energies are compared across different simulation pathways and optimization methods within a single FF parameterization, results vary drastically. We have shown that energies from structure optimizations and MD simulations are more susceptible to initialization issues than frozen pose methods. The higher correlation between the latter and the DFT binding energies is also supportive of this robustness. Furthermore, while optimization algorithms can lead to different energy minima, this might be a limitation of the Dreiding FF instead of the algorithms themselves.
Additionally, a larger sampling of the configuration space of vdW-interacting OSDAs through MD simulations does not necessarily lead to significant changes in trends of binding energies when compared to the frozen pose method, even when long trajectories are employed to calculate binding affinities. This may be another consequence of the Dreiding FF, which is unable to accurately describe the host dynamics. Methods such as DFT can display different behaviors. In practice, however, this conclusion opens an opportunity for simulating zeolite–OSDA pairs at a larger scale using the Dreiding parameterization. One of the major bottlenecks of zeolite–OSDA simulations is to simulate long MD trajectories of guests docked inside the host for a variety of loadings and initial configurations, as has been typically done in screening works of the field.36,41–43 We propose to replace MD simulations by frozen pose methods within Dreiding FF calculations, drastically reducing the time necessary to perform computations while increasing the robustness of the final binding energy with respect to the choice of optimization algorithms and unloaded zeolite geometry. Even if Dreiding FF (MD, V) binding energies are better predictors of experimental outcomes than DFT (MD, P), which has not yet been verified, the use of the frozen pose method is still justified by the high correlation between FF (MD/frz, V) affinities. Moreover, we show that, within the Dreiding FF, the absolute values of the binding energy tend to be more transferable across different initial configurations through this method, suggesting that it can also be used to compare the influence of each molecule in stabilizing different zeolite frameworks.99
It is important to note that the chemical space used in this analysis is comprised of neutral molecules. However, most known OSDAs are positively charged and direct the formation of zeolites with heteroatoms in their backbone. Theoretical studies on how OSDAs affect the position of the heteroatoms have been developed,105–110 although at a high computational cost. OSDAs are often modeled without charges in FFs since vdW interactions tend to dominate the templating effects.2,13,24,33,41–43,45 However, changes in zeolite composition may have additional effects on binding energies beyond shape matching. Disentangling the role of these additional interactions between charged OSDAs and framework heteroatoms requires more complex and expensive simulations and is left for further studies. Typical methods of charge-compensating background potentials may shift energy differences depending on the system, and combinatorial studies on heteroatom distribution are prone to be computationally expensive.
Finally, benchmarks enable the development of high-throughput computation infrastructures by providing clear guidelines for simulating materials in large scales. To support further dissemination of these ideas, we are releasing the Python interface to the GULP code used to perform these calculations as the package GULPy, as well as the data generated in this article.111 They lay down standards to test and automate calculation workflows of OSDAs and zeolites with increased reliability.
V. CONCLUSIONS
In summary, we benchmarked different methods to calculate binding energies of OSDAs in zeolites by performing DFT and FF calculations for 272 zeolite–OSDA pairs. We showed that Dreiding FF binding energies calculated with the frozen pose method correlate best with DFT energies. This method offers additional robustness to the binding energy with respect to the choice of geometry optimization algorithms and initial docking conditions. On the other hand, a larger sampling of the phase space from MD simulations does not provide significant benefits over the frozen pose method within the Dreiding FF since MD- and frozen pose-based binding affinities are well correlated. Moreover, simulations at constant volume significantly outperform those at constant pressure within the Dreiding FF. This stems from the inability of this parameterization to correctly model the unloaded, pure-silica zeolite structure. The results shown here provide reliable parameters for high-throughput computation of binding energies for zeolites and OSDAs. This benchmark, code, and data aim to enable robust, large-scale screenings of OSDAs with significantly less computational overhead.
SUPPLEMENTARY MATERIAL
See the supplementary material for the following:
Complete dataset of binding energies for all 272 poses.
Details of the computational approach, distributions of energies, correlations, and structural information for all systems.
ACKNOWLEDGMENTS
This work was supported by the MIT Energy Initiative (MITEI) and MIT International Science and Technology Initiatives (MISTI) Seed Funds. D.S.-K. was additionally funded by the MIT Energy Fellowship. The computations in this paper were executed at the Massachusetts Green High-Performance Computing Center with support from MIT Research Computing and at the Extreme Science and Engineering Discovery Environment (XSEDE)112 Comet through Allocation No. TG-DMR200068.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material and on GitHub at https://github.com/learningmatter-mit/Zeolite-Binding-Benchmark, Ref. 111. The code used to perform the simulations with GULP is available at https://github.com/learningmatter-mit/gulpy, Ref. 81.