Author Notes
Conformer–rotamer sampling tool (CREST) is an open-source program for the efficient and automated exploration of molecular chemical space. Originally developed in Pracht et al. [Phys. Chem. Chem. Phys. 22, 7169 (2020)] as an automated driver for calculations at the extended tight-binding level (xTB), it offers a variety of molecular- and metadynamics simulations, geometry optimization, and molecular structure analysis capabilities. Implemented algorithms include automated procedures for conformational sampling, explicit solvation studies, the calculation of absolute molecular entropy, and the identification of molecular protonation and deprotonation sites. Calculations are set up to run concurrently, providing efficient single-node parallelization. CREST is designed to require minimal user input and comes with an implementation of the GFNn-xTB Hamiltonians and the GFN-FF force-field. Furthermore, interfaces to any quantum chemistry and force-field software can easily be created. In this article, we present recent developments in the CREST code and show a selection of applications for the most important features of the program. An important novelty is the refactored calculation backend, which provides significant speed-up for sampling of small or medium-sized drug molecules and allows for more sophisticated setups, for example, quantum mechanics/molecular mechanics and minimum energy crossing point calculations.
I. INTRODUCTION
Scientific software development in chemical physics has always been driven by the need for greater efficiency and adaptability in the face of evolving theoretical methods and advances in computer hardware.1–4 The conventional paradigm of self-contained codes, while capable of reproducing specific functionalities, has inadvertently given rise to duplicated efforts and faces challenges in staying ahead of the rapid pace of change. Over the past couple of years, and in response to these challenges, software developers are increasingly embracing a “divide and conquer” strategy and creating interoperable software to advance chemical physics research.5–8 New developments focus on creating software tailored to specific tasks within the broader field of computational chemistry, which not only streamlines development efforts but also facilitates easier adaptation to new methodologies and hardware architectures. Simultaneously, researchers are adopting a collaborative ethos by enhancing the interoperability of their software to enable the seamless integration of features from diverse program packages. Popular examples from the wider field of chemical physics include the atomic simulation environment (ASE),9 the large-scale atomic/molecular massively parallel simulator (LAMMPS),10 and the cheminformatics software package RDKit.11 The collaborative approach enables the coordinated utilization of atomic simulation capabilities across various programs, enhancing the prospects for innovation and providing researchers with a comprehensive toolkit to address complex problems in many areas of chemical physics research.
One such field of research with continued attention is molecular chemical space exploration (CSE).12–14 Chemical space and, in particular, the low-energy molecular chemical space, e.g., featuring weak metallic interactions that require a description of the electronic structure. The need for sampling this vast space becomes quickly apparent considering that even small changes to the studied system can strongly affect the calculated properties according to the structure-activity-principle. Among the most prominent and insightful examples for members of the low-energy chemical space are molecular conformations. Exploring the conformational space is a fundamental task and often among the first steps in any computational study concerning molecules or clusters15,16 (cf. Fig. 1).
The first step in many computational chemical physics studies: Conformational sampling is necessary to find the energetically favored structures of a molecule. Typically, starting from a low-dimensional representation such as the molecular Lewis structure (left) or its SMILES17 or InChi18 code, an initial set of three-dimensional coordinates is created (center), which can then be applied in conformational sampling applications (right).
The first step in many computational chemical physics studies: Conformational sampling is necessary to find the energetically favored structures of a molecule. Typically, starting from a low-dimensional representation such as the molecular Lewis structure (left) or its SMILES17 or InChi18 code, an initial set of three-dimensional coordinates is created (center), which can then be applied in conformational sampling applications (right).
The key problem here is the rapid growth of the conformational space of a molecule with its size, which is roughly exponential with the number of “rotatable” bonds (N).14,19–21 Associated with that is a huge computational effort, giving rise to a variety of programs and algorithms specific to the sampling task. A good review of the topic was provided by Hawkins,20 to which we refer the reader at this point. Typical techniques include various types of molecular dynamics (MD) based sampling,22,23 Monte Carlo based algorithms such as basin-hopping global optimization,21,24,25 and heuristic/rule-based chemoinformatic conformer generators as implemented in Omega,26 Molassembler,27 ConfGen,28 Frog2,29 RDKit,11 and others.30–35 Similar challenges and algorithmic varieties are encountered for other members of the low-energy chemical space, including the tasks of determining protonation sites, finding other types of isomers such as tautomers, or describing the molecule’s behavior upon (explicit) solvation.
From a physical point of view, all CSE applications are tied to analysis of the potential energy landscape (PEL),14,21 imposing an important and often limiting condition on any kind of sampling: The exploration can only ever be as good (or bad) as the potential used to investigate the underlying energy landscape. While the best quality energy landscapes, i.e., the most physical ones, are usually obtained using first principles electronic structure methods, for example, density functional theory (DFT), even medium-sized pharmaceutical molecules require thousands to millions of energy and gradient evaluations to explore the chemical space, making large-scale high-throughput calculations impractical at such quantum mechanical levels.15,16 Much less costly molecular mechanics (MM) calculations like classical force-field (FF) or coarse-grained methods allow exploration of substantially larger systems but often suffer in terms of accuracy and parameter availability. A balance between computational cost and accuracy is essential, which in recent years has been addressed by machine learning (ML) methods36,37 and the revival of semiempirical quantum mechanics (SQM).38,39
In previous studies, we presented the CREST code19,40 (abbreviation for Conformer–Rotamer Ensemble Sampling Tool) as a versatile program for studying the chemical space of drug sized molecules at the semiempirical extended tight-binding (xTB) level. Being based on the GFNn-xTB methods,39 CREST offered capabilities for automated conformational sampling, conformational entropy calculations,40,41 explicit solvation studies,42,43 protonation site determination,44 and more45–48 for systems containing any element up to radon (Z ≤ 86). The code was developed in order to provide a robust, broadly applicable, and easy-to-use tool for both experts and non-experts in the field.
This paper reports on the status of the program’s latest iteration, CREST 3.0, including new implementations and a major re-factorization of the original Fortran code. In the following, Sec. II summarizes the primary algorithm implemented in CREST, followed by an overview of further technical particulars in Sec. III, which includes discussions of implemented potentials and atomic simulation methods. Information about the program distribution is provided in Sec. IV.
II. ALGORITHMS AND APPLICATIONS
A. Conformational sampling
Conformational sampling is the original and eponymous application of CREST. Historically, the program was developed with a normal mode-following approach for generating conformations in the context of nuclear magnetic resonance (NMR) spectra calculation.49 Soon after, this procedure was discarded in favor of a quicker and more robust metadynamics-based algorithm,19,50 which is the basis for current implementations. Herein, a main design aspect of CREST is the automation of all calculations. Starting from an initial user-defined configuration of the input system (nuclear charges, Cartesian coordinates, molecular charge, multiplicity/spin), the system is analyzed, and sensible simulation parameters are determined. In particular, the overall metadynamics (MTD) run times and bias potential strengths are calculated based on the system size and a flexibility measure ξf (for details on this, see Refs. 19 and 40) to allow an automatic initialization of simulations. Furthermore, an initial geometry optimization is conducted to check for any occurring changes to the molecular topology that could be caused simply by the potential and, if observed, safely interrupt the sampling process. The general workflow for CREST’s default conformational sampling routine is outlined in Fig. 2.
Outline of the iterative metadynamics and genetic crossing (iMTD-GC) workflow to generate molecular conformations.
Outline of the iterative metadynamics and genetic crossing (iMTD-GC) workflow to generate molecular conformations.
Following the automatic setup, a block of metadynamics simulations (see Sec. III B 1) is performed using the Cartesian root-mean-square-deviation (RMSD)51 as the collective variables.50 The applied metadynamics bias potential ensures a reliable generation of new molecular conformations by blocking out previously visited minima on the potential energy landscape. Since automation is the key step and different molecules require different widths and strengths of the bias potential, i.e., the prefactor and exponent employed in a Gaussian function [see Eq. (15)], several MTD simulations with various bias settings are run in parallel to allow exploration for a large variety of systems. MTDs routinely cross high energy barriers during the simulation to generate new conformations; however, small lower barrier changes such as side-chain rotations are sometimes not adequately captured. For this purpose, two additional steps are implemented. First, a block of regular MD simulations is performed using two different temperatures, starting from the lowest few (typically four) conformers found up to this point. This will generate predominantly conformations that are connected via small barriers relative to those overcome by the metadynamics and target the sampling of rotamers, which are degenerate conformers featuring nuclear permutation (see Sec. II A 2). Second, the rotamer generation is further augmented by a procedure adapted from genetic crossing (GC) algorithms.31,52 Here, the internal coordinates (Z-matrices) of two structures, systematically drawn from the current ensemble, are used to calculate a mutation for the lowest conformer. The newly generated conformations often differ primarily in the orientation of functional groups, leading to a completion of the rotamer rather than the conformer space. Furthermore, the use of internal coordinates makes the GC step less robust than the preceding molecular and metadynamics sampling. Between any of these steps, molecular geometries are optimized in a multi-level optimization approach19 using the Approximate Normal Coordinate OPTimizer (ANCOPT) algorithm (Sec. III B 2), and the ensemble is sorted via the Conformer–Rotamer Ensemble GENeration (CREGEN) procedure (Sec. II A 2) to distinguish conformers, rotamers, and prune duplicates. If, at any point, a lower-energy conformer is found, the entire procedure is iteratively restarted with this new structure, which can be interpreted as the progression of a Markov chain. The combined workflow is hence abbreviated as the iterative metadynamics-genetic crossing (iMTD-GC) method, which was designed to both reliably reproduce the global minimum of drug-like molecules and to provide good ensemble coverage as measured by an ensemble entropy, as outlined in Ref. 50. The corresponding code is optimized for single-node (OpenMP) parallelization to allow the concurrent execution of all molecular dynamics and geometry optimization processes. In theory, the algorithm is “embarrassingly parallel” up to the point where each structure is processed by a single thread.
The development of the current program version was concerned with improving the technical performance of the code. Previously designed as an OpenMP scheduler for calculations performed with the xtb program,53 the CREST 3.0 source code has since largely been refactored and extended to implement the corresponding calculators directly (cf. Sec. III). For workflows such as iMTD-GC, the change from a subprocess-based to the current calculator/API-based program provides significant computation-time improvements, as demonstrated for three typical target molecules in Fig. 3.
Computational wall times for running iMTD-GC with CREST versions 2.12 and 3.0 for the n-octane, amoxicillin, and Tamiflu molecules at the GFN2-xTB (blue) and GFN-FF (red) levels of theory. All calculations were run on the same system, each using 16 threads of Intel Xeon Gold 6326 CPUs (2.90 GHz). Timings are presented on a logarithmic scale. The revised program version 3.0 (light-colored bars) shows improved efficiency for all cases.
Computational wall times for running iMTD-GC with CREST versions 2.12 and 3.0 for the n-octane, amoxicillin, and Tamiflu molecules at the GFN2-xTB (blue) and GFN-FF (red) levels of theory. All calculations were run on the same system, each using 16 threads of Intel Xeon Gold 6326 CPUs (2.90 GHz). Timings are presented on a logarithmic scale. The revised program version 3.0 (light-colored bars) shows improved efficiency for all cases.
For the molecules provided as examples, namely n-octane, amoxicillin, and Tamiflu, notable reductions in computation time ranging from factors 1.6–4.7 are achieved when using CREST 3.0 as opposed to the earlier program version. Running iMTD-GC for these systems requires a number of energy and gradient evaluations on the order of , making the achievable performance highly dependent on the chosen level of theory, the number of parallel running CPUs, and the system size. As a rule-of-thumb, savings compared to the xtb subprocess version (CREST ≤ 2.12) will be greatest for small systems where a separate process and I/O operations would produce non-negligible overhead compared to the actual energy and gradient evaluation. Unsurprisingly, the greatest computation time reduction was achieved for the flexible but small n-octane molecule, running iMTD-GC with GFN-FF. In conclusion, the improved efficiency of CREST 3.0 renders it more suitable for high-throughput conformational sampling of small-molecule databases than previous versions.
1. Using semiempirical methods for conformational sampling
The conformational sampling of CREST was explicitly developed in combination with the (semi)empirical methods of the GFNn-xTB/-FF family.39,54–57 These methods are supposed to yield reasonably accurate PELs, associated with only little computational effort compared to density functional or wave function theory. Both computational efficiency and sufficient accuracy are essential requirements for extensive conformational sampling approaches. To answer the question of how well the approximated PELs match high-level references, conformational energy benchmarks are evaluated in the following. In Fig. 4(a), GFN2-xTB and GFN-FF are compared to competing methods with a focus on conformational energies. Comparisons are grouped according to force-field (FF), SQM, and (composite) DFT and Hartree–Fock (HF)-based approaches. For a statistically proper and fair evaluation across a multitude of different benchmarks, we chose the established WTMAD-2 averaging scheme, originally introduced with the GMTKN55 benchmark database58 (see the supplementary material for details). Besides the conformer benchmarks included in GMTKN55 (Amino20x4: amino acid conformers, BUT14DIOL: butane-1,4-diol conformers, ICONF: inorganic systems, MCONF: melatonin conformers, PCONF21: tri- and tetrapeptide conformers, SCONF: sugar conformers, UPU23: RNA-backbone conformers), further conformer benchmarks from the literature were taken into account for the illustrated WTMAD-2 values: ACONFL61 (conformers of longer n-alkane chains), MPCONF19664 (conformers of acyclic and cyclic model peptides and other macrocycles), 37CONF863 (conformers of organic drug-like molecules), TMCONF16 (subset of TMCONF40,60 conformers of transition-metal complexes), and MALT22266 (α-maltose conformers).
(a) Weighted total mean absolute deviations (WTMAD-2, see Refs. 58 or 59 for the definition) to reference relative conformational energies for various methods in kcal mol−1. The benchmark sets Amino20x4, BUT14DIOL, ICONF, MCONF, PCONF21, SCONF, UPU23 (all part of GMTKN5558), TMCONF16,60 ACONFL,61,62 37CONF8,63 MPCONF196,64,65 and MALT22266 were considered for the presented WTMAD-2. TMCONF16 (for GAFF additionally ICONF) were not taken into account for the WTMAD-2 calculation for DFTB3-D3(BJ) and GAFF since appropriate parameters were not available. (b) Mean absolute errors (MAEs, kcal mol−1) for selected benchmark sets and methods. For B3LYP-D467–70 and ωB97X-V,71 def2-QZVP(P) basis sets72 were used throughout. The benchmark data employed in the figures, including the WTMAD-2 calculation, are provided in a spreadsheet in the supplementary material.
(a) Weighted total mean absolute deviations (WTMAD-2, see Refs. 58 or 59 for the definition) to reference relative conformational energies for various methods in kcal mol−1. The benchmark sets Amino20x4, BUT14DIOL, ICONF, MCONF, PCONF21, SCONF, UPU23 (all part of GMTKN5558), TMCONF16,60 ACONFL,61,62 37CONF8,63 MPCONF196,64,65 and MALT22266 were considered for the presented WTMAD-2. TMCONF16 (for GAFF additionally ICONF) were not taken into account for the WTMAD-2 calculation for DFTB3-D3(BJ) and GAFF since appropriate parameters were not available. (b) Mean absolute errors (MAEs, kcal mol−1) for selected benchmark sets and methods. For B3LYP-D467–70 and ωB97X-V,71 def2-QZVP(P) basis sets72 were used throughout. The benchmark data employed in the figures, including the WTMAD-2 calculation, are provided in a spreadsheet in the supplementary material.
In the force-field regime (blue), two “general” force-fields were tested: the general Amber force-field73 (GAFF) is an extension of the Amber force-field for proteins and nucleic acids, next to GFN-FF,57 the generic polarizable force-field of the GFN family of methods utilized in CREST. Besides GFN2-xTB, the considered SQM methods comprise another tight-binding (TB) method (cyan), DFTB3-D3(BJ) in the 3ob-3-1 parameterization,74–77 and the Hartree–Fock-based PM6-D3H4X78–80 (red). HF-3c, a minimal basis set Hartree–Fock (HF) with specific approximations,81 has been grouped together with the HF-based PM6-D3H4X semiempirical method. Generally, the TB methods are more accurate than the FFs, whereas PM6-D3H4X is outperformed by GFN-FF. GFN-FF and GFN2-xTB stand out with noticeably lower WTMAD-2s in comparison to GAFF on the one end and PM6-D3H4X and DFTB3-D3(BJ) on the other end. Notably, neither a “proper” QM treatment of HF-3c nor a semiempirical method lead to results superior to the respective WTMAD-2, which is slightly larger than that of GFN2-xTB. The DFT methods are grouped into two categories, composite DFT methods employing smaller basis sets (coral) and DFT functionals in practically converged quadruple-ζ (QZ) basis sets (gray). Remarkably, the more efficient “3c” composite methods, r2SCAN-3c82 and ωB97X-3c,59 reach virtually the same performance for conformational energies as the (range-separated) hybrid functionals B3LYP-D467–70 and ωB97X-V71 in QZ bases. At present, state-of-the-art DFT methods are substantially more accurate in describing the subtle energy differences in different molecular conformations with four to five times lower mean absolute errors (MAEs) than SQM or FF methods but are at the same time much more expensive. In Fig. 4(b), MAEs of selected methods are shown for the non-GMTKN55 conformer energy benchmark sets. Consistent with Fig. 4(a), ωB97X-V/QZ and r2SCAN-3c exhibit very similar MAEs throughout. PM6-D3H4X is in almost all cases inferior to GFN2-xTB except for ACONFL, for which GFN2-xTB has visibly larger errors than the competing methods. Note, however, that errors for none of the tested methods exceed the target accuracy of 1 kcal mol−1 for the ACONFL set. Even though the differences in overall performance are only moderate, GFN2-xTB and GFN-FF show very different strengths and weaknesses. GFN2-xTB is distinctly more accurate for organic drug-like molecules and peptides in 37CONF8 and MPCONF196, while GFN-FF is better at describing long alkane chains in ACONFL and transition metal complexes in TMCONF16, respectively. However, with regards to the latter benchmark, we note that transition metal complexes are particularly challenging, and related studies report an opposite trend for GFN-FF and GFN2-xTB.60
For further comparison, Fig. 5 contains relative energies with respect to the energetically lowest conformer for ensembles from the MPCONF196 set, correlated with the reference relative energies. The ωB97X-V relative energies agree excellently with the reference data and are mostly within the desired accuracy of 1 kcal mol−1. GFN2-xTB and, more pronounced, GFN-FF reveal a stronger scatter of the relative energies but still show the correct qualitative trends in most cases, which is mandatory for conformational sampling. This expectation can easily be confirmed by looking at the corresponding coefficients of determination and Spearman rank correlation coefficients, which are shown in Table I.
Correlation of reference relative conformational energies of the MPCONF196 benchmark set64 (reference data taken from Ref. 65) compared to ωB97X-V (top), GFN2-xTB, and GFN-FF relative energies (bottom). The dashed lines above and below the parity diagonal indicate the desired “chemical accuracy” of 1 kcal mol−1. The RMSEs for the three methods are provided in kcal mol−1. The coefficients of determination (R2) are dimensionless.
Correlation of reference relative conformational energies of the MPCONF196 benchmark set64 (reference data taken from Ref. 65) compared to ωB97X-V (top), GFN2-xTB, and GFN-FF relative energies (bottom). The dashed lines above and below the parity diagonal indicate the desired “chemical accuracy” of 1 kcal mol−1. The RMSEs for the three methods are provided in kcal mol−1. The coefficients of determination (R2) are dimensionless.
Coefficients of determination (R2) and Spearman rank correlation coefficients (rs) computed for the data shown in Fig. 5. Both coefficients are dimensionless.
Method . | R2 . | rs . |
---|---|---|
ωB97X-V | 0.994 | 0.996 |
GFN2-xTB | 0.889 | 0.925 |
GFN-FF | 0.687 | 0.787 |
Method . | R2 . | rs . |
---|---|---|
ωB97X-V | 0.994 | 0.996 |
GFN2-xTB | 0.889 | 0.925 |
GFN-FF | 0.687 | 0.787 |
Given that the two GFN methods are multiple orders of magnitude faster than the above-discussed DFT methods, they are reasonable choices for exploring the conformational space, which is consistent with findings in related literature.60,61,63,65 However, the significant differences in accuracy may necessitate the re-ranking and optimization of conformer ensembles at the DFT level if highly accurate results are required. Furthermore, while results for the ensemble entropy (see Sec. II A 4) suggest good ensemble coverage and similar sampling fitness with both SQM and FF methods, minima on the PEL might differ and occasionally collapse into the same structure upon optimization at another level of theory. This is an issue of potential rather than sampling methodology, and the user is well advised to investigate such effects on a case-by-case basis.
2. Ensemble sorting algorithm
The generation of molecular configurations in a top-down approach necessitates the implementation of bookkeeping algorithms for tracking the ensemble. Each configuration is defined as a minimum in the potential energy landscape. The energy landscape complexity grows exponentially with degrees of freedom (DOF), and even for medium-sized systems, like a non-covalently bound cluster of 55 atoms, it is estimated to be on the order of minima.24,25 In the special case of molecular conformations, the topology of covalent bonds between atoms may not change, which employs a constraint on the overall energy landscape complexity. Still, the number of accessible minima scales roughly as 3n, with n being the number of rotatable (e.g., sp3-sp3 carbon) bonds.14 Due to local permutational symmetry, some side chains will lead to degenerate conformations referred to as rotamers. Common examples include methyl or phenyl groups, where the dihedral rotation around a single carbon bond will generate three and two indistinguishable rotamers, respectively. Accordingly, ensembles including both types of isomers can be referred to as conformer–rotamer ensembles (CREs). The CRE within a certain energy window is typically significantly smaller than the overall conformational space and the rather crude 3n approximation. This is particularly true for ensembles generated with CREST,83,84 which classifies and filters the generated structures already during the iterative sampling procedure. An a priori prediction of the expected number of conformations within such energy windows is, therefore, hardly possible and depends on the PEL shape of the individual molecule.
A numerical classification and bookkeeping algorithm for the Conformer–Rotamer Ensemble GENeration (CREGEN) is implemented in CREST to aid in the identification of conformers, their associated rotamers, and duplicated structures. The general multi-step workflow is outlined in Fig. 6. CREGEN is composed of several steps: In the initial ensemble preprocessing, first a covalent-bond topology is determined for all structures based on neighbor lists calculated from the atomic coordination number (CN) as taken from the D4 scheme.70,85 All structures that do not match the topology of the original input are discarded by default. In the next step, an energy window is applied to discard all structures with an energy higher than a given reference Ewin relative to the lowest-energy structure in the current ensemble. Finally, rotational constants Be are calculated for all structures, serving as a structural parameter for further comparisons.
Outline of the CREGEN sorting algorithm. The workflow processes an initial ensemble of M structures, each represented by a set of atomic coordinates and an associated energy, to eliminate duplicate structures and determine pairwise relations of “true” conformers and rotamers that are collected in the CRE.
Outline of the CREGEN sorting algorithm. The workflow processes an initial ensemble of M structures, each represented by a set of atomic coordinates and an associated energy, to eliminate duplicate structures and determine pairwise relations of “true” conformers and rotamers that are collected in the CRE.
The main pairwise classification of structures as conformers, rotamers, or duplicates is based on a sequential comparison of three criteria, namely the energy difference, the difference of rotational constants, and the Cartesian RMSD. If the energy difference (ΔE) between two structures surpasses a predetermined threshold (Ethr), these two candidates, defined as minima on the potential energy surface (PES), are categorized as conformers. Otherwise, a computationally expensive structure comparison becomes necessary. As an initial and cost-effective criterion, the difference between the rotational constants (|ΔBe|) can be evaluated against an empirical anisotropically corrected threshold value (Bthr). Since rotational constants remain unaffected by atom permutations, minimal differences in |ΔBe| between two rotamers or duplicates of the same conformer are expected. In contrast, true conformers should exhibit notable differences in this aspect, even if they are close in energy. Hence, to identify those structures and eliminate duplicates, an RMSD (R) between two potential rotamers is computed using a quaternion algorithm,51 and the result is compared to a third threshold (Rthr). Only structures surpassing this threshold are retained in the final Conformer Redundancy Elimination (CRE). The default threshold values for CREGEN are detailed in Table II.
Default thresholds for the energy window Ewin, energy difference Ethr, Cartesian RMSD Rthr, and rotational constant Bthr. The latter is dynamically adjusted to the system. Systems with a higher anisotropy of Be use a larger Bthr.
Parameter . | Default value . |
---|---|
Ewin | 6.0 kcal mol−1 |
Ethr | 0.05 kcal mol−1 |
Bthr | 1.0%–2.5% |
Rthr | 0.125 Å |
Parameter . | Default value . |
---|---|
Ewin | 6.0 kcal mol−1 |
Ethr | 0.05 kcal mol−1 |
Bthr | 1.0%–2.5% |
Rthr | 0.125 Å |
The CREGEN algorithm is optimized for processing large ensembles. Here, the most expensive step is the pairwise RMSD calculation. Due to the preceding energy and rotational constant cutoffs, the RMSD comparison has a complexity of approximately but, at worst, will scale as , with n being the ensemble size within Ewin. Notably, the threshold-based numerical construction of the algorithm may lead to false-positive identification of true conformers as rotamers or vice versa. This problem occurs more frequently with increasing system size (Nat), which can only partially be compensated by an adjustment of {Ethr, Rthr, Bthr}. The conformer identification may, therefore, be supplemented by post-processing via permutation invariant comparisons of atomic configurations. Since the latter is accessible only via costly methods such as finding the permutational minimum RMSD via a Hungarian algorithm86 or modern schemes such as the use of smooth atomic overlap (SOAP) kernels,87 no implementation is currently available in CREST, and the reader is referred to the literature.88–90
For the reduction of ensemble sizes or the identification of representative structures, CREST additionally implements a clustering approach based on principle component analysis (PCA) of dihedral angles, combined with k-means clustering of the associated principle component space. Very briefly, the implementation follows closely the details given in Ref. 91, and ideal cluster sizes are iteratively determined through the convergence of the ratio (SSR/SST) of the sum of squares regression (SSR) and the total sum of squares (SST), which is equal to the coefficient of determination in linear regression. By default, the clustering terminates when the SSR/SST ratio exceeds a value of 0.9. Internal coordinates for the identification of suitable dihedral angles are set up in reference to the topology measure mentioned earlier to optimally represent covalent bonds. A PCA is then conducted to provide the principle components associated with the most diverse dihedral angles from this list, excluding terminal rotamers such as methyl groups. The clustering procedure can automatically be appended to the CREGEN algorithm and is used, for example, within the configurational entropy calculation (Sec. II A 4).
3. Specialized conformational sampling applications
Opposed to heuristics-based conformer generators, general and physics-based conformational sampling in CREST offers enhanced simulation capabilities with the option to define a series of geometrical constraints (see Sec. III A 5). The latter are of particular use for sampling non-covalently bound complexes. Here, an ellipsoidal potential can be used to encapsulate the studied complex and avoid problems due to the metadynamics bias which, by maximization of the RMSD due to increasing the distance between the non-covalently bound fragments, would minimize the energy and hence only generate dissociated clusters. A dedicated non-covalent interaction (NCI) runtype, called NCI-MTD, is implemented in CREST to automatically set up the ellipsoid potential and slightly modify the bias potential parameters. Several examples of this were shown in the original publication,19 in comparison with DFT calculations and experimental references from rotational spectroscopy.
Another application is conformational sampling with constrained parts of the structure, for example, to find preferred conformations on a surface cutout. In this context, and as opposed to previous program versions, CREST 3 implements the complete freezing of atoms in all metadynamics simulations and geometry optimizations. An illustrative example combining the NCI-MTD mode with this new capability is provided in Fig. 7, which shows a cutout of the iron-based MIL53 metal organic framework (MOF) used for the drug delivery of Ibuprofen.92,93 In the corresponding calculation, all iron atoms in the cutout were frozen to closely retain the crystal structure92 geometry. Furthermore, the NCI-MTD ellipsoid potential encapsulating the entire cavity (not visualized in the figure) was applied only to the ibuprofen molecule to avoid its effusion from the MOF. With a calculation runtime of ∼38 min on a standard commercial laptop, 122 unique conformers for the complex were found within an energy window of 6 kcal mol−1 at the GFN-FF level, of which the lowest ten are shown overlaid in Fig. 7. These conformers are particularly useful for calculating the more accurate adsorption energies of the drug-in-MOF complex. Even at the inexpensive force-field level, the lowest energy conformer was found to be stabilized by 3.8 kcal mol−1 compared to the initial optimized starting structure and showed a strong preference for forming a hydrogen bond to the MOF. This example clearly demonstrates the versatility and efficacy of both CREST and GFN-FF. The machinery for constrained conformational sampling is easy to set up and provides a powerful toolset for a variety of applications.42,43,94,95 New applications, for example, conformational sampling with applied hydrostatic pressure via the extended hydrostatic compression force-field (X-HCFF) method96 (see Sec. III A 3), are currently explored by our groups.
Overlay of the ten most preferred conformations of the Ibuprofen molecule, embedded within one cell cutout of the MIL53(Fe) MOF (Cambridge Crystal Structure Database code HAMXEC),92 calculated at GFN-FF level. The most stable conformation is shown in solid color; all other conformations are shown in transparent blue.
Overlay of the ten most preferred conformations of the Ibuprofen molecule, embedded within one cell cutout of the MIL53(Fe) MOF (Cambridge Crystal Structure Database code HAMXEC),92 calculated at GFN-FF level. The most stable conformation is shown in solid color; all other conformations are shown in transparent blue.
4. Absolute molecular entropy calculation
In our work,40,41 we proposed a modification of iMTD-GC aimed at more thorough sampling, explicitly for obtaining the conformational entropy. This workflow is primarily aimed at generating complete ensembles rather than finding the global minimum and ideally should be started from the latter.40 The respective workflow is outlined in Fig. 8. The core novelty compared to the default iMTD-GC is the replacement of the MD and GC steps after an initial block of metadynamics simulations by a batch of umbrella sampling107 dynamics using the same type of RMSD bias potentials as introduced for the MTDs. This will effectively forbid the simulations to visit certain (known) regions of the energy landscape. However, instead of modifying the list of bias structures during the simulations as in the MTD, a fixed set of representative bias configurations is chosen in an extension to CREGEN via a PCA of intramolecular dihedral angles combined with k-means clustering.91 Due to the RMSD-based potential and the fixed nature of the list of reference configurations, we referred to the umbrella sampling approach as static metadynamics (sMTD) and the whole workflow as iMTD-sMTD. The block of umbrella sampling simulations is repeated iteratively until convergence with regards to a conformational entropy calculated via the Gibbs–Shannon equation is reached for the CRE. In general, this workflow is more thorough than iMTD-GC and can routinely be applied for CSE, but it is also more costly. At this point, we again note that the number of generated conformers within a given energy cutoff is much smaller than the theoretically expected overall number of minima, for example, according to the 3n approximation (see above).83,84 While this is certainly beneficial for sampling the corresponding ensembles as completely as possible, it warrants consideration of the energy window, which must be large enough to capture the cumulative configurational entropy contributions of high-energy conformations. We generally find 6–10 kcal mol−1 sufficient for the energy window, depending on the molecule size.
Outline of the iterative metadynamics–static metadynamics (iMTD-sMTD) workflow used to calculate the configurational entropy.
Outline of the iterative metadynamics–static metadynamics (iMTD-sMTD) workflow used to calculate the configurational entropy.
(a) Extrapolation of for the example of n-octadecane. The red double sided arrow shows the final correction to the conformational entropy, which in this case accounts for ∼1 cal mol−1 K−1 (4.5% of the conformational entropy). (b) Parity comparison of calculated and experimental absolute molecular entropy employing only msRRHO at the B97-3c level of theory111 for the most stable conformer (top, blue) and the same calculation corrected by obtained with CREST and GFN2-xTB (bottom, red). “eu” refers to entropy units of cal mol−1 K−1. The shaded area around the parity line refers to the domain of chemical accuracy (±3 eu). The black arrows show the case with the highest deviation from the experimental reference. All data shown in these figures were taken from Ref. 40, which also includes more detailed discussions.
(a) Extrapolation of for the example of n-octadecane. The red double sided arrow shows the final correction to the conformational entropy, which in this case accounts for ∼1 cal mol−1 K−1 (4.5% of the conformational entropy). (b) Parity comparison of calculated and experimental absolute molecular entropy employing only msRRHO at the B97-3c level of theory111 for the most stable conformer (top, blue) and the same calculation corrected by obtained with CREST and GFN2-xTB (bottom, red). “eu” refers to entropy units of cal mol−1 K−1. The shaded area around the parity line refers to the domain of chemical accuracy (±3 eu). The black arrows show the case with the highest deviation from the experimental reference. All data shown in these figures were taken from Ref. 40, which also includes more detailed discussions.
B. Explicit solvation via quantum cluster growth
Beyond the, so far, presented functionality, CREST is also able to model explicit and micro-solvation. For this purpose, the Quantum Cluster Growth (QCG) algorithm42 was recently implemented, allowing the automated generation of cluster ensembles containing a solute solvated with explicit solvent molecules. Similar strategies have been published by other groups.113–115 By using the GFNn-xTB methods, any element up to radon can be included in the calculation, making the solvation of chemically diverse solutes with almost any conceivable solvent, including organic solvents just like ionic liquids, feasible. The resulting conformers of solute–solvent clusters can directly be used to compute the properties of the solute in the liquid phase. Additionally, the QCG workflow is able to compute solvation-free energies with such ensembles. The QCG algorithm, outlined in Fig. 10, contains two essential steps that can be subsequently invoked.
The “Growth” part yields a single, low-energy solute–solvent cluster, while the subsequent ensemble generation generates different conformers from this cluster. To get clusters of equally distributed solvent molecules around the solute, two wall potentials are used in both steps. The inner wall potential, affecting only the solute, serves to keep the solute in the center of the cluster, while the outer one, applied to the whole cluster, suppresses the clustering of the solvent molecules to ensure an evenly distributed solvent shell. As the size of the wall potentials is crucial to accurately reflecting the real behavior in solution, it is carefully constructed using geometrical features of the solute and solvent, the solvent accessible surface area (SASA) of the solute, and the number of solvent molecules allowing a dynamical adjustment during the growth (see Ref. 95 for details). The cluster is grown step-wise by adding solvent molecules at low-energy interaction sites. Here, either a combination of the xtbiff program116 and GFNn-xTB/GFN-FF geometry optimizations or the recently developed and for this task recommended aISS docking algorithm43 implemented in xtb can be employed. The Growth algorithm stops if a certain amount of solvent molecules is added that is either determined by the user or automatically based on converging interaction energies between the solute and solvent, determined by a moving average [cf. Fig. 11(b)]. Thereby, the algorithm yields a solute–solvent cluster along with some properties like energy convergence and the final wall potentials for direct use with the xtb program. For the second step that generates an ensemble from this structure, the user has the choice of performing a single MD, MTD, or NCI-MTD workflow (see Sec. II A 3) for the conformer search with the wall potentials. Finally, the whole ensemble is optimized without the wall potentials and energetically ranked, including information about the Boltzmann weighting. Additionally, CREST can compute solvation free energies from this ensemble. Therefore, a reference cluster of pure solvent molecules is generated. By subtracting the Boltzmann-weighted free energies of the solvent cluster and the single solute molecule in the gas phase from the solute–solvent cluster, the energy for transferring the solute from the gas phase into the solution is obtained, which corresponds to the solvation free energy. An example of solvating the anticoagulant drug Rivaroxaban117 with 100 water molecules using the QCG algorithm is given in Fig. 11. Additionally, the interaction energy of the solute and solvent shell is shown during cluster growth.
(a) Solvation of Rivaroxaban with 100 water molecules employing the QCG algorithm. (b) Interaction energy of the solute and the solvent shell during cluster growth.
(a) Solvation of Rivaroxaban with 100 water molecules employing the QCG algorithm. (b) Interaction energy of the solute and the solvent shell during cluster growth.
When adding about 40 water molecules, the first solvent shell of Rivaroxaban is completed, which can be seen in a stagnating decrease of . Afterward, the additional water molecules lead to small variations in the interaction energy as the additional water molecules influence the ones in the first water shell.
C. Protonation site sampling
With the ability to perform inexpensive electronic structure calculations at SQM levels of theory comes access to a variety of properties that cannot be obtained with a classical force-field. For instance, molecular orbitals (MOs) are readily obtained and can be used to analyze and describe the molecule. A popular use of MOs from SQM is to employ them as an orbital guess in higher-level electronic structure calculations.118
Localized molecular orbitals (LMOs) from GFNn-xTB can be used to efficiently identify possible protonation sites of a molecule as part of the CSE.19,44 The obvious advantage of this procedure over heuristics-based or ML algorithms is its generality in that any chemical system can be treated as long as MOs can be supplied to the algorithm. Practically, the procedure is interfaced to the GFNn-xTB methods. The general protonation site screening procedure in CREST is outlined in Fig. 12 for the cytosine molecule. Starting from a calculation of LMOs via the xtb program,53 suitable lone pairs and π-centers are identified. The two criteria employed for this are either a localization degree of less than 1.03 nuclei, i.e., the LMO belongs to a single atom, or a triangular inequality greater than 1.04 if the LMO is shared between two atoms [cf. Fig. 12(b)]. After adding a proton to the identified LMO centers and optimizing the corresponding structures, the generated protomers are ranked energetically. Similar procedures based on localization of the electrostatic potential have also been proposed.119 A typical target size for the protonation site screening procedures is drug molecules. Larger systems are at risk of performing inefficiently, especially when a high degree of conformational flexibility is involved. In such cases, post-processing by conformational sampling is necessary to accurately rank the relative stabilities of the generated isomers.
(a) LMO centers of the cytosine molecule, represented by transparent blue spheres. (b) Conditions for identifying suitable π and lone pair centers for protonation. (c) Low-energy protomers of cytosine, calculated at the GFN2-xTB/ALPB(toluene) level.
(a) LMO centers of the cytosine molecule, represented by transparent blue spheres. (b) Conditions for identifying suitable π and lone pair centers for protonation. (c) Low-energy protomers of cytosine, calculated at the GFN2-xTB/ALPB(toluene) level.
A similar yet simpler strategy can be adapted for identifying deprotonation sites, where all possible deprotonated isomers can be generated by systematically removing the corresponding hydrogen atoms and optimizing the molecular structure.19 Combining both protonation and deprotonation site algorithms sequentially, CREST enables the calculation of prototropic tautomers. Here, the total number of technically accessible tautomers, n × m, is predetermined by the number of hydrogen atoms in the system n and accessible protonation sites m calculated from the LMOs. Furthermore, the extension to alkali ion addition in both the protonation and tautomerization screening algorithms is straightforward. The automated protonation and tautomerization procedures of CREST were efficiently applied in the calculation of pKa values via linear free energy relationships.16,120,121 One significant limitation of these procedures should be highlighted: the manipulation of molecules, either by adding or removing an atom directly from the input structure, ignores the naturally occurring reaction barriers. This oversight can sometimes result in atypical protonation or deprotonation patterns,122 and users are urged to carefully analyze the resulting ensembles with informed chemical insight. Enhancements to these algorithms are planned for future development.
D. Automated reaction discovery for mass spectra elucidation
Automatic reaction discovery has become an important field of modern computational chemistry.123 The MSREACT tool of CREST aids reaction exploration with a special focus on creating fragments and rearrangement products (isomers) occurring in mass spectrometry (MS) experiments, such as electron ionization (EI) or Electrospray Ionization/Collision Induced Dissociation (ESI/CID). Systematic generation of possible products for the input molecule is achieved through the application of constraining potentials between atom pairs and optimization using an efficient SQM method. Interatomic distances are deliberately extended well beyond their equilibrium values, inducing dissociation and rearrangement reactions. As a default setting, atom pairs separated by up to three covalent bonds are subjected to elongation through a repulsive harmonic potential, as given in Table III. Hereby, r0 is set to 1.5 times the sum of the covalent radii of both atoms plus the number of bonds in between, with kr set to 0.05 Eh/bohr2. The number of bonds between two atoms is identified from the molecular graph or topology (see Sec. II A 2) via the Floyd–Warshall algorithm.124 Subsequent optimization at “crude” energy convergence settings (see Sec. III B 2) is performed with GFNn-xTB at an elevated finite electronic temperature of 5000 K to favor the generation of open-shell radicals and to partially account for the multi-reference nature of the open-shell systems typically occurring in (EI-)MS.125 To produce often observed products due to hydrogen rearrangements, further optimizations are conducted with attractive potentials (kr set to −0.05 Eh/bohr2) applied between hydrogen atoms and potential protonation sites within a certain cutoff distance (by default 4 Å), which are determined by LMOs from GFNn-xTB as described in Sec. II C. For planar molecules, for which simple bond stretching may not be enough to get all important isomers, additionally, atomic displacements of the atom positions and subsequent optimizations can be performed. Duplicates of the resulting fragment-isomer ensemble showing the same topology are removed with the quantum chemistry-inspired molecular identifier MolBar.126 Next, the remaining structures are sorted with the CREGEN algorithm to remove potential remaining duplicates at the employed GFNn-xTB level and to remove unreasonable structures above a given energy threshold (default: 200 kcal mol−1). Fragmentation of a molecule is detected if the distance between all atoms of two fragments is 1.3 times larger than the sum of the covalent radii of the respective atoms.
Geometrical constraint potentials implemented in the CREST program.
Constraint . | Potential . |
---|---|
Distance/bond | |
Angle | |
Dihedral angle | |
Bond range | |
Ellipsoid potential |
Constraint . | Potential . |
---|---|
Distance/bond | |
Angle | |
Dihedral angle | |
Bond range | |
Ellipsoid potential |
As an example, Fig. 13 shows the isomers and fragments obtained for 2-pentanone with default settings in the EI-mode, i.e., positively charged open shell radical, using GFN2-xTB. A total of 81 optimizations resulted in 25 unique isomers and fragment pairs. The generated products depend on the input conformer, as specific rearrangements may require spatially close atoms, as in, for example, the important fragment 8 stemming from a McLafferty type rearrangement. Here, the lowest conformer found by CREST at the GFN2-xTB level was used as the input structure.
Generated unique fragments of positively charged 2-pentanone at the GFN2-xTB level using the MSREACT mode with default settings. The structures are numerically labeled based on their relative energies.
Generated unique fragments of positively charged 2-pentanone at the GFN2-xTB level using the MSREACT mode with default settings. The structures are numerically labeled based on their relative energies.
Subsequent fragmentation via cascade reactions can be obtained by employing the MSREACT mode iteratively on the generated fragments. The structures can be used as a starting point for automated reaction network exploration by combining them with automated transition state search to predict the relative occurrence of the fragments. An automated workflow employing this approach to compute mass spectra is currently being investigated. By combining the protonation sampling tool and using the MSREACT mode for the generated protomers or deprotomers, molecular species relevant to ESI/CID-MS can also be generated.
E. Minimum energy crossing points
Approaches to exploring the MECI space by locating MECPs via a metadynamics-based strategy have recently been presented by the groups of Lindner et al.131 and Pieri et al.132 CREST 3 allows the specification of a gap constraint Eq. (9) and the application of Eq. (8) in the previously presented workflows, like iMTD-GC. Employing this in combination with the GFNn-xTB level of theory enables rapid screening of MECPs, as we have previously demonstrated.47,48 A common example is the S0/S1 MECIs (S0/T1 MECPs) of benzene,132 which are shown in Fig. 14(a).
(a) Minimum energy crossing point geometries found for the S0/S1 crossing (S0/T1 for xTB) of benzene, via a modified GFN0-xTB-based algorithm.48 Reference hhTDA-BHLYP-D3(BJ)/def2-SV(P)133 geometries are shown in transparent cyan, and the corresponding Cartesian RMSD is given below each structure. (b) Total energy and energy gap change of the S0/T1 MECP optimization of azobenzene, calculated at the PBEh-3c level of theory. The corresponding structure is shown in solid color, overlaid by the respective GFN0-xTB MECP in transparent cyan.
(a) Minimum energy crossing point geometries found for the S0/S1 crossing (S0/T1 for xTB) of benzene, via a modified GFN0-xTB-based algorithm.48 Reference hhTDA-BHLYP-D3(BJ)/def2-SV(P)133 geometries are shown in transparent cyan, and the corresponding Cartesian RMSD is given below each structure. (b) Total energy and energy gap change of the S0/T1 MECP optimization of azobenzene, calculated at the PBEh-3c level of theory. The corresponding structure is shown in solid color, overlaid by the respective GFN0-xTB MECP in transparent cyan.
Furthermore, employing interfaces to quantum chemistry programs like ORCA134,135 (see Sec. III A 4), CREST provides capabilities to post-process the crossing points obtained with GFNn-xTB at higher DFT levels with the same technical machinery. Figure 14(b) demonstrates the latter for optimizing the S0/T1 MECP of azobenzene, using the broken-symmetry unrestricted Kohn–Sham (BS-UKS) PBEh-3c level of theory136 to describe the S0 and “regular” UKS PBEh-3c for the triplet state. These new capabilities of the CREST code provide valuable tools, for example, to aid the study of photochemical reactions and have been successfully employed for the latter.137
III. IMPLEMENTATION DETAILS
A. Calculator and implemented potentials
By default, CREST comes with implementations of the GFNn-xTB Hamiltonians39,54–56 and the GFN-FF force-field.57 Since these methods are parametrized for all elements up to radon (Z ≤ 86), a majority of the chemical space is accessible with the implemented algorithms discussed in Sec. II. From a computational viewpoint, the heart of the program is the calculator instance interfacing the different Hamiltonians, geometric constraint potentials, and storing parameters (cf. Fig. 15).
Outline of the calculator setup in CREST. The calculator processes a molecular structure with n atoms, as defined by its atomic numbers Z = {Z1, …, Zn} and Cartesian coordinates r = {r1, …, rn}, and returns an energy E with the corresponding gradient ∂E/∂r. The setup allows to define multiple potentials, for example, GFNn-xTB, and combine the respective energies and gradients via user-defined weights w. Additionally, a variety of structural constraints can be defined.
Outline of the calculator setup in CREST. The calculator processes a molecular structure with n atoms, as defined by its atomic numbers Z = {Z1, …, Zn} and Cartesian coordinates r = {r1, …, rn}, and returns an energy E with the corresponding gradient ∂E/∂r. The setup allows to define multiple potentials, for example, GFNn-xTB, and combine the respective energies and gradients via user-defined weights w. Additionally, a variety of structural constraints can be defined.
While general program usage is still possible via command line arguments, as in previous versions of CREST, the increasing complexity of implementations like the calculation container demands the option to define detailed calculation setups. For this purpose, CREST 3.0 implements a new TOML input reader based on the open-source toml-f project.138 Listing 1 shows the new TOML input for a simple GFN1-xTB + ALPB(toluene) singlepoint calculation. More complex calculations, as, for example, required in a MC-ONIOM setup (Sec. III B 5), or for minimum energy crossing point calculations (Sec. II E), support the definition of multiple [[calculation.level]] instances that are processed by the calculator.
Example of an TOML input file for setting up a GFN1-xTB + ALPB(toluene) singlepoint energy evaluation in CREST.
![]() |
![]() |
In Subsections III A 1–III A 5, a short overview of the most important interfaces of the CREST calculator instance is given. The calculator is, in turn, interfaced to different modules that contain typical atomic simulation procedures, as discussed in Sec. III B.
1. GFN1-xTB and GFN2-xTB Hamiltonians via tblite
2. GFN0-xTB and GFN-FF
3. X-HCFF
To be used within CREST, X-HCFF was implemented, similar to tblite, in the external standalone library xhcfflib, calculating the X-HCFF gradient contribution via a singlepoint call. The cavity discretization is carried out following the tessellation scheme used by ALPB.140 The details of the implementation of xhcfflib together with a comprehensive overview of application examples will be given elsewhere; however, in the following, we show the decrease in volume of small methane clusters during compression as a first example application of the use of GFN tight binding methods together with X-HCFF. The respective volumes at a fixed pressure were calculated as the average volumes during a 30 ps MD simulation using GFN2-xTB/X-HCFF for methane clusters consisting of 12, 24, 36, and 64 molecules at 300 K. It should be noted that the pressure is fixed during the X-HCFF calculation while the cavity volume changes with interatomic distances; therefore, the simulation resembles an NPT ensemble. As can be seen in Fig. 16, the cluster compressibility converges rapidly with increasing cluster size, and the model is able to reproduce experimental XRD data obtained with a diamond-anvil cell145 quite well.
Comparison of the compression of methane clusters using X-HCFF and GFN2-xTB with experimental145 values. V0 corresponds to the volume at 1.6 GPa.
Comparison of the compression of methane clusters using X-HCFF and GFN2-xTB with experimental145 values. V0 corresponds to the volume at 1.6 GPa.
4. Subprocesses
Subprocesses allow the creation of interfaces to other atomic simulation programs. Essentially, any potential can be interfaced to the calculator instance as long as energies and gradients are written in a readable format (e.g., a file containing the energy and 3Nat entries for the gradient in atomic units). By default, CREST provides a subprocess interface to the ORCA program134,135 as well as fallback subprocess calculators to the original xtb code.53 Furthermore, an interface to generic shell scripts is implemented. The corresponding calculations can be defined via [[calculation.level]] instances that are processed by the calculator, as shown in Listing 2 for the ORCA code.
Subprocess calls need to be treated with caution. For rapid energy and gradient evaluations, for example, via xtb, hosting the subprocess creates computational overhead and should generally be avoided in procedures with a high number of calculator evaluations, in particular conformational sampling.
5. Constraints
A series of geometrical constraints are available in CREST to enable targeted sampling (see Sec. II A 3). The corresponding potentials allow user-defined fixing of substructure parts and are listed in Table III. Interatomic distances rij, angles θijk, and dihedral angles ϕijkl are constrained via a simple harmonic potential that, aside from the selection of atoms i, j, k, l, only needs the definition of a reference value and a force constant k, which can be individually defined for each set of constrained atoms. The bond range constraint allows the definition of an upper and lower distance limit for a pair of atoms and employs a “logfermi” type potential. For the latter, kBT has the function of a force constant, and the parameter β is used to control the exponential strength. The ellipsoid potential, which is also of the logfermi type, is used to construct a confining ellipsoid around the entire system (relative to the coordinate origin O), which is particularly useful to avoid dissociation of non-covalently bound complexes in CREST’s NCI mode (see Sec. II A 3). Additionally, the complete freezing of atomic coordinates was implemented into CREST 3.0. The corresponding DOF of the gradient is set to zero and projected out during geometry optimization and molecular dynamics simulations.
B. Implemented atomic simulation methods
1. Molecular and metadynamics
MD is the main tool for the generation of new molecular configurations within the CSE of CREST. In this context, no properties or observables, aside from the molecular geometries, are evaluated from the MD trajectories. However, by basing the CSE on the Newtonian equations of motion, the program promotes sampling based on physical principles rather than a heuristic or chemoinformatic generation of configurations.19,20 The main advantage of this setup is that algorithms are, in principle, universally applicable to any atomistic system and level of theory, limited only by the available potentials implemented in the calculation container and the associated computational cost.
CREST employs a leap-frog (Verlet-type) algorithm146 with a simple Berendsen-type thermostat147 for NVT molecular and metadynamics simulations. Longer time steps for extended simulations are enabled by two features: first, an implementation of a SHAKE algorithm148 introducing a bond constraint into the equations of motion, and second, a modification (increase) of atomic masses, in particular hydrogen, for the calculation of the MD velocities.149
Adding new bias structures to the list in Eq. (15) will, over time, “fill” the energy landscape [cf. Fig. 17(a)] and, therefore, ensure the exploration of new regions of the conformational space of the molecule.50 The addition of a single bias configuration typically leads to an instantaneous increase in total energy and temperature during the MTD, as shown in Fig. 17(b). This change is quickly compensated for by the thermostat. For applications within conformational sampling, the bias parameters k and α are selected to maintain the molecular topology, i.e., small enough to avoid the breaking of covalent bonds or the inversion of stereo centers but still able to cross other high-energy barriers.19
Effect of the RMSD bias potential. (a) Schematic representation of sequentially adding bias configurations to the metadynamics simulation. (b) 100 ps MTD of the Tamiflu molecule at 200 K, calculated at the GFN-FF level. A new bias configuration is added to every 15 ps, resulting in a brief, sharp increase in the total energy. The total energy is plotted relative to its average value.
Effect of the RMSD bias potential. (a) Schematic representation of sequentially adding bias configurations to the metadynamics simulation. (b) 100 ps MTD of the Tamiflu molecule at 200 K, calculated at the GFN-FF level. A new bias configuration is added to every 15 ps, resulting in a brief, sharp increase in the total energy. The total energy is plotted relative to its average value.
2. Geometry optimization
A further addition extends the RFO to form the basis of ANCOPT: The entire procedure is conducted in approximate normal coordinates (ANC), which are based on the initial Hk computed from a model Hessian.164 For molecular geometries, internal coordinates are often a preferred choice.165 This is demonstrated for the taxol molecule in Fig. 18, where ANCOPT is compared to L-BFGS166,167 employing Cartesian coordinates, as implemented in the OPTIM program.168 For comparability, both programs use the same tblite backend and employ the GFN2-xTB Hamiltonian.
Comparison of the ANCOPT and L-BFGS geometry optimization algorithms for the taxol molecule (113 atoms) in terms of energy and rms force ‖∇E‖ convergence, calculated at the GFN2-xTB level. The energy convergence is shown relative to the energy of the converged minimum (E − Emin in Eh). Both quantities are plotted on a logarithmic scale.
Comparison of the ANCOPT and L-BFGS geometry optimization algorithms for the taxol molecule (113 atoms) in terms of energy and rms force ‖∇E‖ convergence, calculated at the GFN2-xTB level. The energy convergence is shown relative to the energy of the converged minimum (E − Emin in Eh). Both quantities are plotted on a logarithmic scale.
Clearly, ANCOPT shows superior performance, requiring about 200 steps less than the L-BFGS minimization to converge into the closest minimum satisfying the same energy (ΔE = 0.5 × 10−4) and root-mean-square (rms) force (‖∇E‖ = 0.5 × 10−4) convergence criteria. The overall algorithm of ANCOPT is highly optimized for the rapid geometry optimization of molecular geometries that are needed during the demanding task of CSE, where often thousands of such evaluations are required.
For CSE, CREST utilizes a multi-level optimization strategy. Here, during the sampling iterations, molecular geometries are optimized in (typically) three steps with increasingly tight convergence criteria. In between these steps, CREGEN is used to filter out the high-energy intermediates that would require a large number of optimization cycles in order to converge tightly and duplicated structures that clearly would converge to the same minimum. Different convergence criteria are predefined to simplify the selection. The corresponding settings are shown in Table IV.
Predefined optimization levels of ANCOPT in terms of the change in energy (in Hartree), rms force (‖∇E‖ in Hartree Eh per ANC α), and the maximum number of optimization cycles, which defaults to 200 in case Nat is small.
Setting . | ΔEconv/Eh . | ‖∇E‖/Ehα−1 . | Max. Cycles . |
---|---|---|---|
Crude | 5 × 10−4 | 1 × 10−2 | Nat |
Vloose | 1 × 10−4 | 6 × 10−3 | Nat |
Loose | 5 × 10−5 | 4 × 10−3 | 2Nat |
Default | 5 × 10−6 | 1 × 10−3 | 3Nat |
Tight | 1 × 10−6 | 8 × 10−4 | 5Nat |
Vtight | 1 × 10−7 | 2 × 10−4 | 20Nat |
Extreme | 5 × 10−8 | 5 × 10−5 | 20Nat |
Setting . | ΔEconv/Eh . | ‖∇E‖/Ehα−1 . | Max. Cycles . |
---|---|---|---|
Crude | 5 × 10−4 | 1 × 10−2 | Nat |
Vloose | 1 × 10−4 | 6 × 10−3 | Nat |
Loose | 5 × 10−5 | 4 × 10−3 | 2Nat |
Default | 5 × 10−6 | 1 × 10−3 | 3Nat |
Tight | 1 × 10−6 | 8 × 10−4 | 5Nat |
Vtight | 1 × 10−7 | 2 × 10−4 | 20Nat |
Extreme | 5 × 10−8 | 5 × 10−5 | 20Nat |
3. Numerical Hessian calculation
4. Effective Hessian
To show the influence of the nuclear quantum effects on the rate of state crossing, the allene molecule is shown as an example in Fig. 19. The effective Hessian was computed using the newly implemented procedures of CREST with the composite density functional theory method PBEh-3c136 via the ORCA subprocess interface. Here, restricted Kohn–Sham formalism was used for the S0 minimum and unrestricted formalism was used for the MECP and T1 geometries, and thermodynamic corrections were computed using the RRHO approximation.102
Relative energies of the minima of T1 and S0 and the MECP between the states are shown. The energies are shown relative to the respective triplet energy. The chosen level of theory for the geometry optimization and the computation of the Hessian is PBEh-3c.
Relative energies of the minima of T1 and S0 and the MECP between the states are shown. The energies are shown relative to the respective triplet energy. The chosen level of theory for the geometry optimization and the computation of the Hessian is PBEh-3c.
This case study clearly reveals that thermodynamic corrections computed via the effective Hessian at the MECP and via the Hessian at the T1 geometry lead to a near-degeneracy of the two states. An inclusion of the thermodynamic contributions would, therefore, show an increase in the intersystem crossing rate, proving their importance for describing non-radiative relaxation processes in an excited state.
It should be mentioned that the GFNn-xTB methods, due to the lack of spin-discriminating energy terms, lead to fully adiabatic states. Hence, only avoided crossings are obtained. Therefore, there is no need for the use of an effective Hessian with these methods.
5. Multicenter n-layer ONIOM
CREST implements a new standalone Fortran library called lwONIOM172,173 to allow for subtractive QM/MM (i.e., ONIOM174,175) calculations with, in principle, an arbitrary number of levels and multiple center definitions. The QM/MM methodology is especially valuable for investigating macroscopic and biomolecular systems.176–178 Computational efficiency is achieved by selectively employing high-level QM or SQM calculations in a specific region of interest while resorting to lower-level MM calculations for the rest of the system.
Schematic construction of the MC-ONIOMn dependency tree for a setup with three layers. Fragments are represented as nodes. The highest layer (I) must include a single node containing all the atoms of the system. All other layers (II, III) can include multiple non-overlapping subsystems, treated as children nodes of the original node or nodes of subsequently higher layers.
Schematic construction of the MC-ONIOMn dependency tree for a setup with three layers. Fragments are represented as nodes. The highest layer (I) must include a single node containing all the atoms of the system. All other layers (II, III) can include multiple non-overlapping subsystems, treated as children nodes of the original node or nodes of subsequently higher layers.
In a benchmarking example, MC-ONIOMn calculations (n = 2 layers) were performed for a 5,15-linked porphyrin nanoring C–C coupling reaction [Figs. 21(a) and 21(b)].180 Due to the size of the system, the nanoring allows the definition of six bis-porhyrin units that can either be set up as independent high-level ONIOM region or merged into a single such layer. Similar to previous studies in the literature, in particular Ref. 181, calculations were performed employing GFN1-xTB for the high SQM layer, together with GFN-FF for the low MM layer. Both layers employed ALPB implicit solvation140 for toluene. The conventional two-level SQM/MM ONIOM approach [ONIOM2(GFN1-xTB:GFN-FF)] was compared with the six-centered MC-ONIOM2(GFN1-xTB:GFN-FF) variant. Timings are shown in Fig. 21(c), reaction energies (per C–C bond formed), and Cartesian RMSDs compared to the GFN2-xTB starting structure are provided in Table V.
(a) Nanoring structure adapted from Refs. 180 and 181. Yellow-shaded Zn-functionalized porphyrin units are used for the high-level QM calculations within the ONIOM setup. (b) 5,15-linked porphyrin nanoring C–C coupling reaction. (c) Computation times referring to the full geometry optimization (starting from the GFN2-xTB optimized structures taken from Ref. 181) wall time on 8 CPUs. ONIOM2 refers to a conventional two-layer GFN1-xTB:GFN-FF setup,181 while MC-ONIOM2 refers to the multicenter setup with six independent bis-porphyrin units forming a single GFN1-xTB “high-level” layer.
(a) Nanoring structure adapted from Refs. 180 and 181. Yellow-shaded Zn-functionalized porphyrin units are used for the high-level QM calculations within the ONIOM setup. (b) 5,15-linked porphyrin nanoring C–C coupling reaction. (c) Computation times referring to the full geometry optimization (starting from the GFN2-xTB optimized structures taken from Ref. 181) wall time on 8 CPUs. ONIOM2 refers to a conventional two-layer GFN1-xTB:GFN-FF setup,181 while MC-ONIOM2 refers to the multicenter setup with six independent bis-porphyrin units forming a single GFN1-xTB “high-level” layer.
C–C coupling reaction energies of the 5,15-linked porphyrin nanoring. Values were calculated for the corresponding molecular geometries optimized at the specified levels of theory. Atomic RMSDs relative to the GFN2-xTB optimized structure (taken from Ref. 181) are given for the more flexible nanoring reactant.
Method . | ΔE (kcal mol−1) . | RMSD (Å) . |
---|---|---|
GFN1-xTB | −36.89 | 0.070 |
ONIOM2(GFN1-xTB:GFN-FF) | −36.74 | 0.212 |
MC-ONIOM2(GFN1-xTB:GFN-FF) | −36.71 | 0.213 |
GFN-FF | 155.20 | 0.392 |
Method . | ΔE (kcal mol−1) . | RMSD (Å) . |
---|---|---|
GFN1-xTB | −36.89 | 0.070 |
ONIOM2(GFN1-xTB:GFN-FF) | −36.74 | 0.212 |
MC-ONIOM2(GFN1-xTB:GFN-FF) | −36.71 | 0.213 |
GFN-FF | 155.20 | 0.392 |
MC-ONIOM2 results quite clearly provide a considerably greater trade-off between calculation cost and accuracy compared to the conventional ONIOM2 setup. Calculated coupling energies differ by only 0.03 kcal mol−1 between the two ONIOM variants, and both are within less than 1 kcal mol−1 of the reference GFN1-xTB value. However, the multi-center treatment speeds up the calculation by an additional factor on the same machine [cf. Fig. 21(c)] compared to the “standard” two-center ONIOM2. Expectantly, coupling energies calculated only at the GFN-FF level are not plausible due to the classical treatment.
The ONIOM model setup within the CREST input files follows conventions, as demonstrated in Listing 3. Fragments (nodes) are defined on a by-atom basis; layers are defined by fragments. Finally, each layer is tied to a level of theory from the [[calculation.level]] TOML list (cf. Listing 1).
While this setup technically allows for the use of ONIOM energy and gradient in the sampling algorithms of CREST, we recommend avoiding using it for RMSD-based metadynamics applications. Due to the strong bias potential and the weak points introduced by the mechanically embedded layers, undesirable breaking of bonds can frequently occur. In those instances, it is recommended to first perform the sampling at a cost-efficient level of theory, for example, with the force-field employed for the outer ONIOM layer, and in a second step, evaluate the generated structures within the ONIOM framework. Similar built-in strategies are discussed in Sec. III B 6.
6. “On-the-fly” multiscale modeling
In consideration of computational efficiency, it is sometimes not desirable to run certain parts of the sampling algorithms, in particular metadynamics simulations, at the more expensive SQM level rather than at a force-field level. Likewise, the post-processing of structures by singlepoint energy calculations on the optimized structures might be feasible and useful. The modular construction of CREST 3.0’s calculator allows the definition of such multiscale setups and can be used in algorithms like iMTD-GC and iMTD-sMTD. A minimal example is shown in Listing 4.
This example input will automatically run iMTD-GC in a three-stage multiscale approach: All molecular dynamics and metadynamics simulations will be conducted at the GFN-FF level, geometries will then be optimized at the GFN0-xTB level, and finally, a singlepoint energy on the optimized geometries will be calculated at the GFN2-xTB level. Only the latter will be used for the CREGEN evaluation. This framework enables sophisticated calculation setups; however, it is currently incompatible with the other multiscale models described earlier (ONIOM and MECPs). Furthermore, the user is responsible to judge whether the intended setup is sensible. For example, while a DFT singlepoint refinement via ORCA could easily be defined within the multiscale setup, integrating this into iMTD-GC would still lead to unfavorably high computation times and is advised against.
IV. PROGRAM DISTRIBUTION
CREST is developed with an open-source focus under the LGPL-3.0 license. All code is publicly accessible on GitHub at https://github.com/crest-lab/crest. The repository includes basic build instructions and provides a continuous release binary that is automatically compiled from the most recent commit on the main branch using a meson/ifort environment. A pre-built binary is also supported by the conda-forge infrastructure. Build options are available with the CMake and meson build systems for Intel and GNU compilers. We generally recommend using the Intel architecture.
Developing scientific software as open-source not only accelerates scientific progress through global collaboration but also enhances transparency, allowing fellow researchers to validate and reproduce findings.5,7 Moreover, open-source software grants the advantage of customization, enabling others to adapt the software to their specific research needs, often leading to novel applications. The long-term sustainability of open-source projects ensures that software remains relevant and useful over time, contributing to a lasting impact on the scientific community. At its core, open-source development embodies the principle of shared knowledge and aligns with the ethical responsibility of advancing science for the greater good.
The choice of the GNU Lesser General Public License version 3.0 (LGPL-3.0) reflects our commitment to preserving the open nature of the project and its benefits for both individuals and organizations. The LGPL-3.0 license permits the use of CREST in both open-source and proprietary projects, enabling wider adoption without compromising the freedom to inspect, modify, and distribute improvements to the core software. Furthermore, it ensures that any modifications made to the CREST codebase remain open and accessible to the community, promoting continued growth and improvement of the toolkit.
Besides the main repository, online documentation with example show-cases is maintained at https://crest-lab.github.io/crest-docs.
V. CONCLUSION
In this paper, we present the latest developments of the chemical space exploration software CREST. The program is designed as a straightforward solution to the automated sampling problem of molecular systems up to a size of a few hundred atoms. With a focus on efficiency and single-node parallelization, several workflows based on semiempirical electronic structure calculations are implemented and can be used even on standard desktop computers. The main features of CREST 3.0 include
implementation of the GFNn-xTB and GFN-FF methods via tblite139 and standalone calculators,141,142 providing efficient semiempirical calculations for systems containing all elements up to radon (Z ≤ 86) in standard applications such as energy and gradient calculations, geometry optimization, and molecular dynamics simulations,
capabilities to interface any atomistic simulation program providing gradients. The current code comes with interfaces to the ORCA code134,135 and an implementation of the X-HCFF method,96
automated algorithms for conformational sampling and the calculation of configurational entropy based on the aforementioned levels of theory, including automated multiscale modeling variants,
automated algorithms for the efficient generation of explicit solvation shells and the determination of molecular protonation sites and,
calculators for simultaneously handling multiple levels of theory, in particular for the calculation of MECPs and for the setup of multi-center and multi-layer ONIOM calculations.
In conclusion, the enduring relevance of CREST lies in the robust versatility of its algorithms and general backend implementation. Newly developed potentials or methods can be quickly interfaced with the existing infrastructure. Hence, as technology continues to advance, CREST’s adaptability ensures its continued efficacy, making it a reliable tool for molecular CSE, ready to meet the challenges of an evolving scientific landscape.
Future development efforts will be focused on three primary directions. First, the integration of algorithmic advances such as rapid coarse-grained generation of molecular conformations and global optimization techniques. Second, there are technical improvements to the code, for example, the inclusion of periodic boundary conditions for xTB methods,182 further acceleration of calculations through message passing interface (MPI) parallelization for high-performance computing (HPC) facilities, and the creation of a Python frontend. Finally, generally more accurate CSE will be achieved by next-generation SQM methods, which are currently explored through new tight-binding approaches183 and the novel q-vSZP basis.184 A new method (GP3-xTB) is currently being developed and will be published separately soon.
SUPPLEMENTARY MATERIAL
The supplementary material provides a spreadsheet containing detailed benchmarking results for Sec. II A 1.
ACKNOWLEDGMENTS
P.P. gratefully acknowledges support from the Alexander von Humboldt Foundation for a Feodor Lynen Research Fellowship. S.G. acknowledges the Deutsche Forschungsgemeinschaft for general funding, mainly in the framework of the Leibniz price. C.B. is grateful for funding from the Ministry of Culture and Science of the German State of North Rhine-Westphalia (MKW) via the NRW Rückkehrprogramm. G.F. acknowledges funding from RWTH Graduate Support. P.A.W. acknowledges the Engineering and Physical Sciences Research Council (EPSRC) for funding his studentship through Doctoral Training Partnership Grant No. EP/W524633/1. M.M. and S.S. acknowledge the Fonds der Chemischen Industrie (FCI) for funding under a Kekulé scholarship. The authors acknowledge David J. Wales, Andreas Hansen, Markus Bursch, and Jan-Michael Mewes for valuable discussions.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Philipp Pracht: Conceptualization (lead); Data curation (equal); Methodology (lead); Software (lead); Writing – original draft (equal); Writing – review & editing (equal). Stefan Grimme: Conceptualization (lead); Methodology (lead); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Christoph Bannwarth: Conceptualization (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Fabian Bohle: Data curation (equal); Writing – original draft (equal); Writing – review & editing (equal). Sebastian Ehlert: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Gereon Feldmann: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Johannes Gorges: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Marcel Müller: Data curation (equal); Writing – original draft (equal); Writing – review & editing (equal). Tim Neudecker: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Christoph Plett: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Sebastian Spicher: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Pit Steinbach: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Patryk A. Wesołowski: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Felix Zeller: Software (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request. The software is freely available on GitHub.
NOMENCLATURE
- ALPB
analytical linearized Poisson–Boltzmann (implicit solvation model)
- ANC
approximate normal coordinate
- ANCOPT
approximate normal coordinate optimization algorithm
- BFGS
Broyden–Fletcher–Goldfarb–Shanno
- CN
coordination number
- CRE
conformer–rotamer ensemble
- CREGEN
conformer–rotamer ensemble generation algorithm
- CREST
conformer–rotamer sampling tool
- CSE
chemical space exploration
- DFT
density functional theory
- DOF
degrees of freedom
- Eu
entropy units (cal/mol K)
- FF
force-field
- GC
genetic crossing
- GFN
geometries, frequencies, non-covalent interactions
- HPC
Hartree–Fock
- HF
high-performance computing
- iMTD-GC
iterative MTD + GC conformer sampling algorithm
- iMTD-sMTD
iterative metadynamics-static metadynamics algorithm
- LMO
localized molecular orbital
- MAD
mean absolute deviation
- MC-ONIOM
multi-center ONIOM
- MD
molecular dynamics
- MECI
minimum energy conical intersection
- ML
machine learning
- MM
molecular mechanics
- MO
molecular orbital
- MOF
metal-organic framework
- MPI
message passing interface
- msRRHO
modified and scaled rigid-rotor harmonic-oscillator approximation
- MTD
metadynamics
- NCI
non-covalent interaction
- NCI-MTD
NCI-focused metadynamics sampling algorithm
- PCA
principal component analysis
- PEL
potential energy landscape (synonym to PES)
- PES
potential energy surface (synonym to PEL)
- QCG
quantum cluster growth algorithm
- QHA
quasi harmonic analysis
- RFO
rational function optimization
- RMSD
root-mean-square-deviation (of atomic positions)
- RMSE
root-mean-square-error
- SASA
solvent accessible surface area
- sMTD
“static” metadynamics (≈umbrella sampling)
- SQM
semiempirical quantum mechanics
- SSR
sum of squares regression
- SST
total sum of squares
- WTMAD
weighted mean absolute deviation
- XC
exchange-correlation
- X-HCFF
extended hydrostatic compression force-field
- xTB
extended tight-binding
- ZPVE
zero-point vibrational energy
NOMENCLATURE
- ALPB
analytical linearized Poisson–Boltzmann (implicit solvation model)
- ANC
approximate normal coordinate
- ANCOPT
approximate normal coordinate optimization algorithm
- BFGS
Broyden–Fletcher–Goldfarb–Shanno
- CN
coordination number
- CRE
conformer–rotamer ensemble
- CREGEN
conformer–rotamer ensemble generation algorithm
- CREST
conformer–rotamer sampling tool
- CSE
chemical space exploration
- DFT
density functional theory
- DOF
degrees of freedom
- Eu
entropy units (cal/mol K)
- FF
force-field
- GC
genetic crossing
- GFN
geometries, frequencies, non-covalent interactions
- HPC
Hartree–Fock
- HF
high-performance computing
- iMTD-GC
iterative MTD + GC conformer sampling algorithm
- iMTD-sMTD
iterative metadynamics-static metadynamics algorithm
- LMO
localized molecular orbital
- MAD
mean absolute deviation
- MC-ONIOM
multi-center ONIOM
- MD
molecular dynamics
- MECI
minimum energy conical intersection
- ML
machine learning
- MM
molecular mechanics
- MO
molecular orbital
- MOF
metal-organic framework
- MPI
message passing interface
- msRRHO
modified and scaled rigid-rotor harmonic-oscillator approximation
- MTD
metadynamics
- NCI
non-covalent interaction
- NCI-MTD
NCI-focused metadynamics sampling algorithm
- PCA
principal component analysis
- PEL
potential energy landscape (synonym to PES)
- PES
potential energy surface (synonym to PEL)
- QCG
quantum cluster growth algorithm
- QHA
quasi harmonic analysis
- RFO
rational function optimization
- RMSD
root-mean-square-deviation (of atomic positions)
- RMSE
root-mean-square-error
- SASA
solvent accessible surface area
- sMTD
“static” metadynamics (≈umbrella sampling)
- SQM
semiempirical quantum mechanics
- SSR
sum of squares regression
- SST
total sum of squares
- WTMAD
weighted mean absolute deviation
- XC
exchange-correlation
- X-HCFF
extended hydrostatic compression force-field
- xTB
extended tight-binding
- ZPVE
zero-point vibrational energy