This paper is dedicated to the quantum chemical package Jaguar, which is commercial software developed and distributed by Schrödinger, Inc. We discuss Jaguar’s scientific features that are relevant to chemical research as well as describe those aspects of the program that are pertinent to the user interface, the organization of the computer code, and its maintenance and testing. Among the scientific topics that feature prominently in this paper are the quantum chemical methods grounded in the pseudospectral approach. A number of multistep workflows dependent on Jaguar are covered: prediction of protonation equilibria in aqueous solutions (particularly calculations of tautomeric stability and pKa), reactivity predictions based on automated transition state search, assembly of Boltzmann-averaged spectra such as vibrational and electronic circular dichroism, as well as nuclear magnetic resonance. Discussed also are quantum chemical calculations that are oriented toward materials science applications, in particular, prediction of properties of optoelectronic materials and organic semiconductors, and molecular catalyst design. The topic of treatment of conformations inevitably comes up in real world research projects and is considered as part of all the workflows mentioned above. In addition, we examine the role of machine learning methods in quantum chemical calculations performed by Jaguar, from auxiliary functions that return the approximate calculation runtime in a user interface, to prediction of actual molecular properties. The current work is second in a series of reviews of Jaguar, the first having been published more than ten years ago. Thus, this paper serves as a rare milestone on the path that is being traversed by Jaguar’s development in more than thirty years of its existence.
I. INTRODUCTION
More than a decade has passed since the most recent review of the quantum chemical (QC) software package Jaguar was published.1 Now is a good time to look back on over ten years of research and development associated with Jaguar and survey its usage in applications. In doing so, we demonstrate for which types of present and future research projects in life and materials science (MS), it is most suitable.
At the outset, we should say a word about terminology. In theoretical chemistry, the term “quantum mechanics” (QM)2,3 is essentially synonymous with the term “quantum chemistry” (QC).4,5 While the former is an extremely broad domain of physical science studying all quantum phenomena, from the behavior of elementary particles to quantum effects in astronomical objects,6 in theoretical chemical research, the term “QM” implies “molecular QM”7 or QM in models involving periodic boundary conditions.8 The term QM is also used to express contrast with molecular mechanics (MM) approaches. In this work dedicated to chemistry, the terms QC and QM will be used interchangeably.
The past ten years of Jaguar’s continual version updates (four releases a year starting from 2013) have been marked by the consolidation of the conceptual ideas and methodology adopted in the early stages of its development. The package has been meant to be used primarily in applications that deal with medium- and large-sized (and likely conformationally flexible) molecules, which are typically encountered in industrial molecular modeling. Consequently, we have been putting emphasis on the development of QC methods that scale favorably with molecular size. Another key idea is that Jaguar should play its scientific role (QC calculations) not only in isolation, but also as a part of the larger collection of molecular modeling solutions produced by Schrödinger, Inc. Jaguar is designed to provide the quantum mechanical functionalities that may complement the rest of computer applications, which are part of the modeling suite. These applications may be based on molecular mechanics methods, describe protein–small molecule interactions, or serve as a database of scientific information. Therefore, Jaguar must be well integrated into these software products, which means the standardized passing of data, uniformity of the commands and help messages accessible through the command line, a single graphical user interface (GUI), etc. A related concept, and one extensively exploited in the Jaguar QC package, is using Jaguar density functional theory (DFT) calculations in “pre-packaged” multistep workflows. These workflows, often comprising conformational search and filtering, perform a task or compute a molecular property, which eliminates the need to combine the results of various computational parts afterward.
As Jaguar is a piece of commercial software, its scientific development is driven not merely by knowledge, specialization, or vision of scientists and engineers, but in a balance with forces of the molecular modeling market, which may favor some QC methods and approaches and downplay or completely reject others. These forces contribute to reshaping the traditional organization of a QC package, resulting not only in a somewhat different spectrum of scientific features, but also in a greater role of components that have little scientific value per se, for instance the accessibility of options through a high quality GUI.
As can be seen from this brief characterization of Jaguar, there is a significant contrast to multiple other QC computer programs, which often focus on delivering the most recently published or most accurate QC methods, regardless of computational cost. Like in any traditional review of this kind, Secs. II–X will provide a fairly comprehensive description of the scientific features available in Jaguar. In addition, they will attempt to explain the reasons behind the selection of these features and other preferences, which form the aforementioned contrast. A particular stress will be laid on those elements and components of Jaguar that, since the time of the previous review,1 have been newly introduced, extended, or improved, or that have undergone a change in function or evaluation. Considerable attention will be devoted to the recently published applications in which Jaguar was used. This will help paint a picture of the research fields in which Jaguar users work and the types of scientific problems or tasks that Jaguar is preferred for. While the usability features and business reasons behind certain decisions in software development should normally be outside the scope of a scientific review, we believe that such aspects should be briefly mentioned since they are often closely connected with the enablement of the modern scientific research.
II. DEVELOPMENT STRATEGY
Before we delve into a discussion of individual scientific and technological components that constitute Jaguar, it is useful to examine the big picture and understand upon what grounds the decisions to work in certain directions are made. Numerous scientific and technological advances from which Jaguar might potentially benefit are published in the literature each year. Therefore, a judicious development strategy is needed in order to avoid creating a formless agglomeration of features, which would hamper specialization and be difficult to maintain.
Let us formulate a set of principles which the managers of Jaguar try to follow in order to determine what customer-facing features are the best candidates to be implemented in Jaguar. These are guidelines rather than rigid rules and exceptions are in any case possible. Some of the features present in Jaguar do not adhere to these guidelines for historical reasons, or were implemented at the request of a particular user and do not enjoy a widespread and recommended usage. Consequently, the principles described below should be viewed as an idealized scenario.
First of all, methods and workflows included in Jaguar should be applicable to solving practical computational problems, especially those that come up in industrial applications. This rules out interest in computationally expensive methods that are only suitable for tiny molecules. A boundary that, in our metrics, defines the minimally interesting molecular size for industrially relevant calculations is symbolically represented by the molecule diphenylamine (DPA) or Ph–NH–Ph, which contains 24 atoms (out of which 13 are non-hydrogen). DPA is essentially an arbitrarily selected common molecule, but it combines a non-primitive size, the presence of a torsional degree of freedom, and a common structural motif present in drug-like molecules (two common aromatic rings separated by a linker). As an example of using this boundary in a decision making process, we mention the following piece of reasoning: wave function-based methods, which feature non-perturbative triple and higher determinant excitations, for instance coupled cluster with single, double, and triple excitations (CCSDT),9–11 are hardly practical for chemically accurate calculations on DPA and, therefore, do not qualify for inclusion in Jaguar. A computationally more tractable coupled cluster method CCSD(T),12,13 in contrast, has been applied to molecules with 31–43 atoms,14 and with fairly large basis sets to boot. Therefore, CCSD(T) qualifies as a practically useful method, according to our criterion, although it has not been implemented in Jaguar. The same sort of reasoning is applicable to the decision making process with respect to quantum computing in real world applications: until this new computational paradigm is able to treat molecules of the size of DPA, it will remain outside of the realm of industrially relevant QC projects and, as such, is not a candidate for inclusion in Jaguar.
Next, the feature under consideration should contribute to solving a concrete, industrially relevant practical problem. In this respect, generating accurate energy curves for bond breaking of small molecules is an important fundamental problem,15–17 but it comes up far less often, for instance, in medicinal chemistry applications than approaches that accurately describe torsional rotation.18–20 For this reason, accurate description of torsional degrees of freedom is given priority over bond breaking. Another distinction is that between an actionable prediction of a physical observable and an easily gratifying result that can, nevertheless, be difficult to convert into a physically meaningful interpretation. Therefore, we give preference to delivering features that produce, for example, conformationally averaged spectra, which are directly comparable to experimental measurements in a quantitative way than those that yield qualitative analysis based on non-physically observable entities. Among the latter features are Fukui indices21,22 and the quantum theory of atoms in molecules (QTAIM),23,24 although these particular methods are nonetheless implemented in Jaguar. Then, we prefer to develop a computational solution, which consists of several computational parts assembled, optimized, and delivered to the user as a single whole (for examples, see Secs. VII and IX) rather than focusing on implementing disjointed components and passing the burden of connecting the parts to the user. If possible, it should be applicable to a large chemical space (and not a few favorable functional groups, for example). In particular projects intended for internal users, this aspiration may be unnecessary (see Sec. VIII).
Finally, the tool, method, or approach should require a simple setup and a minimal expertise from the human operator for effective use. Preferably, it should be a “black box,” which only requires a molecular input and produces an easily interpretable output. Examples of methods that do not pass this filter are those methods that require a “manual” selection of active space25,26 or those with less than obvious molecular fragmentation, like fragment molecular orbital (FMO) method27 or effective fragment potential (EFP) method,28 especially when applied to less common types of chemical structures. Automated algorithms for some of these selections exist,29–32 and if any of the above-mentioned methods should be considered for implementation in Jaguar, it is tandem with automations of this kind. Quantum chemical methods that at least partly depend on resonance structures33,34 (assignment of bond orders and formal atomic charges), SMARTS patterns,35 or atom types,36 while delivering a superior quality of results in some cases, have revealed certain difficulties in practical use. Uncomplicated neutral organic molecules typically have a single reasonable resonance structure, but cations and anions with unsaturated bonds often have multiple competing resonance structures, which inevitably leads to indeterminate results. Therefore, it is desirable for the methods exposed for practical use to take only atomic coordinates as a molecular input (accompanied by total charge and multiplicity), without any bond or atom type assignments.
The absence of multireference (MR) methods37 in Jaguar is consistent with the ideas delineated above. These methods tend to be computationally expensive, wave function-based (although MR DFT extensions have been discussed38,39), and oriented toward smaller molecules, all of which does not align well with the principles of Jaguar development. A significant expertise required for choosing the method and its reference determinants as well as for interpreting the results makes MR methods difficult to automate and use in routine calculations. In addition, the types of molecular geometries or phenomena that MR approaches are designed to address (nonequilibrium geometries, conical intersections, etc.40) are not of the highest priority in industry.
As we can see, a large number of well-recognized and often used QC methods and approaches are considered non-ideal for the types of uses targeted by Jaguar. This sidelining has little to do with these methods’ accuracy, which may be otherwise excellent, but rather with their applicability and usability in industrial applications.
III. INFRASTRUCTURE
The Jaguar QC package and the workflows our team builds around it are embedded in a larger suite of Schrödinger products and a company-wide software infrastructure. In other words, the Jaguar suite of programs is not developed in isolation and benefits greatly from its synergy with these other software components. For example, Jaguar workflows can seamlessly leverage a vast pool of cheminformatics tools, machine learning (ML) libraries, or other proprietary applications like MacroModel41–43 for efficient conformational searches, allowing the team to implement new QC products or features more quickly and effectively than it would otherwise be possible. In addition, Jaguar development is strongly influenced by Schrödinger’s culture of professional software engineering practices as well as its business-oriented philosophy, as described in Sec. II. Thus, Jaguar is built as part of a platform infrastructure that is dedicated to providing robust and trusted scientific solutions that the users can rely on, as we describe below.
The whole Schrödinger suite, which consists of multiple software products, including Jaguar, is released to end users on a quarterly cycle. The quarterly releases are named after the pattern year-quarter, for example, 2024-1 (as will be seen from references to releases later in the text). Compared to many other technology industries, our user base has generally conservative attitudes toward software updates, which influences the periodicity of our software updates. A three-month release cycle delivers the product four times more frequently than on an annual release cycle, which was present at the time of the previous Jaguar review.1 We find three months to be a “sweet spot,” which allows for sufficient agility in our development of new features while maintaining quality and avoids the negative impact of overly hasty inclusion of recent changes that an annual release can be susceptible to.
In contrast, internal code development follows a strategy of continuous integration (CI). CI advocates a set of sub-practices that work together to reduce the cost and risk of combining code contributions from distributed teams and multiple individual developers.44 CI practices include frequent code commits (several per day), automated testing, frequent builds, prompt action to fix broken builds, and so on.44–46 The two paradigms are brought into harmony by forking the code a number of weeks before the scheduled release date into two branches: one reserved for the upcoming release, and another which we call the main branch. After this branch point, developers may continue to work as usual on the main branch, but are subject to increasingly strict restrictions on what they are allowed to change on the release code branch. Every night, changes in the release branch are merged into the main code branch so that the latter is always the most complete version. About two weeks before release, only very small changes posing the minimum risk are permitted, allowing ample time for final quality control checks, etc., before the final version of the code is settled and released.
At Schrödinger, we currently use the Buildbot CI framework47 and version control with Git.48 We employ a framework with multiple Git repositories, with most product divisions, including Jaguar, having their own dedicated repository. There is also a central repository containing code for shared functionality, which facilitates interaction among products. Product-specific repositories are allowed to have dependencies on it, but not vice versa.
The whole suite, consisting of code from all repositories, is built at least once a day (in practice multiple times on different platforms) to allow for thorough daily testing, as described below. In case of failure, an immediate response is required from the developers. This integrated development and testing infrastructure is critical to maintaining the quality of all Schrödinger software products. In addition, we have a policy that all local code changes should be reviewed and analyzed by co-workers to suggest improvements and judge their quality before they are committed to the company-wide codebase. The process of code reviewing is reminiscent of peer reviews of publications in academic paper publishing. Code reviews have been shown to reduce the probability of bugs,49–51 increase quality,51 and make the resulting software easier to understand and maintain.
Our testing framework has multiple components. After making changes, almost immediate feedback is provided from fine-grained unit tests,52 which test individual functions or small collections of functions and are executed by the build system automatically upon changes to the codebase. Then, every night, collections of traditional higher-level system integration tests (numbering in the thousands) are automatically applied to the whole suite, which involve executing complete runs of workflows or products with various inputs. In addition, the testing infrastructure includes a number of tests, which exercise the scientific integrity of our code. In contrast to the previous two test types, these typically do not have a binary pass/fail outcome, but instead return a performance metric. An example could be the mean unsigned error of a particular prediction (such as a pKa) for a number of molecules in a test set, which is chosen to ensure the scientific validity of a particular product, or the mean runtime to track and monitor the speed of the code.
The Jaguar codebase is a mixture of Fortran, C/C++, and Python (including the major NumPy53 and SciPy54 libraries). The QC engine comprises ∼1.6 × 106 lines of Fortran code and ∼500 000 lines of C/C++. There are over 250 000 lines of Python code in the dedicated Jaguar codebase and many more in the company-wide codebase upon which Jaguar relies.
An important part of the company-wide infrastructure that Jaguar leverages is the Job Server system. In the previous Jaguar review,1 a primitive version of this system was described, but it has since been largely rewritten and extended in significant ways. Job Server is a tool for running tasks asynchronously and, in particular, on machines different from the user’s. Schrödinger developers wrote Job Server instead of using off-the-shelf tools for distributed computation in order to deal with some unusual requirements of our software, especially the size of the binaries required for our computations. One typical use of Job Server is to launch a Jaguar job or workflow from a laptop, perhaps from the Maestro graphical user interface, with the intention of executing the Jaguar job on a compute node on a remote server that utilizes multiple central processing unit (CPU) cores. Job Server takes care of transferring input files from the launch machine to the remote server, submitting the job to the native queueing system with the correct number of CPUs requested, monitoring the progress of the job, and finally returning output and log files to the user, including the automatic registration of output files to later display in the GUI, when appropriate. The Job Server infrastructure also has several capabilities that help developers implement complex workflows and has components that abstract many of the job handling needs in any given workflow. These include a mechanism for enforcing dependencies between jobs representing different stages of a computation, as well as an optimization that allows single-threaded subjobs to overlap computation with an otherwise idle workflow driver process.
Internally, Schrödinger utilizes a mixture of self-managed hardware compute clusters and cloud resources such as Google Cloud Products (GCPs)55 to perform calculations. We also anticipate an increasingly greater use of the Cloud by the users of our products. It is, therefore, important that our products run efficiently on Cloud resources, in particular preemptible nodes,56 which are typically offered at much lower prices than conventional nodes. Preemptible nodes have the characteristic feature that they are not guaranteed to be available for the whole duration of a calculation; they can be reclaimed by the host at short notice (on the order of a minute, typically). This presents new challenges to a QC engine like Jaguar, which involves complicated long-running calculations, which do not necessarily divide up into trivial (short-running) restartable steps easily. Currently, Jaguar has capabilities to create checkpoint files for use in restarts, such as those needed after a preemptive halt, but only after completed steps in a self-consistent field (SCF) or geometry optimization. For the use of preemptible nodes in particular, it would be beneficial to add checkpointing to other points in the code too, such as between expensive computing steps of a coupled-perturbed Hartree–Fock (CPHF)57,58 iterative solution. This is a work-in-progress paper, and work is also currently underway to support restarts conveniently inside the Job Server infrastructure. The backend job, such as a Jaguar job, would be responsible for periodically registering checkpoint files and an associated restart command with Job Server. Then, Job Server would be able to intercept event signals from a preemptible node, for example, and act on them automatically to restart the job appropriately.
The above discussion has highlighted some of the different ways our codebase has developed to handle single QC Jaguar jobs, like a single-point energy calculation or geometry optimization of a single molecule. This is in contrast to the implementation of high-level workflows comprised of possibly many Jaguar jobs and, potentially, other products. In the previous Jaguar review,1 the parallelization of the Jaguar QC engine was described in detail, where it was explained that Jaguar supports both shared-memory OpenMP parallelism and Message Passing Interface (MPI) parallelism. Since that report, the code’s parallel performance has remained approximately the same, except for the deprecation of MPI-based parallelism in favor of OpenMP parallelism only. The decision to remove MPI support was made for a combination of technical and business reasons and certainly influenced by the much greater importance of tasks involving large-scale workflows and high-throughput screening which users favor more at present compared to ten years ago. Such tasks are typically optimized by running many subjobs single-threaded in an “embarrassingly parallel” fashion, as opposed to accelerating individual subjobs with multiple cores, intrinsically reducing the demand for MPI-type parallelism. In addition, nodes with 16 or more cores suitable for OpenMP threading are now much more commonplace than before, and as reported previously,1 Jaguar calculations do not scale well beyond 16 CPUs anyway. Deprecating MPI support removed a non-trivial burden of code complexity and maintenance.
Jaguar currently does not support QC calculations on graphical processing units (GPUs), and we have no immediate plans to enable such calculations. While the QC community went through a notable excitement of porting QC methods and algorithms to GPUs over a decade ago,59–62 the efforts required to produce high quality software often required a code redesign from scratch.63–65 At the same time, the speedups achieved by some of GPU-based or CPU–GPU hybrid QC programs are not very large;66,67 these are on the order of the speedups, which can already be achieved by the pseudospectral (PS) method. Other implementations68 claim extraordinary and certainly attractive speedups. However, our analysis shows that it will not be a straightforward operation to switch the core SCF algorithms used in Jaguar to GPUs. Factors like the availability of important QC components, such as second derivatives of energy, and higher cost of executing QC calculations on GPUs [while the same hardware resources can be spent on molecular dynamics (MD) calculations for which GPUs are indispensable] must also be taken into account.
As a result, in recent years, the Jaguar team has focused on writing infrastructure related to the CPU parallelization of workflows, as opposed to individual QM calculations. This topic was touched on above, but in addition to the basic coding and job launching issues, there are questions of how to make the execution of parallel jobs as efficient and convenient as possible to non-expert users.
It may not be clear to users, a priori, which combination of simultaneous subjobs and OpenMP threads per subjob is optimal to accelerate a job or workflow for a given amount of compute resources. Indeed, the picture may change if the resources are available via a queue-based environment, or via a server or a multiprocessor machine dedicated to the user. In case of the queue, an additional factor to consider is whether the queue is busy or not. Then, the picture may change again if many such jobs are to be launched together, when the important question is not how quickly each job finishes per se, but how quickly all the jobs finish on aggregate. Finally, some users may be limited in their use of compute resources by their access to Schrödinger licenses. In such cases, they may prefer to launch fewer jobs simultaneously on many CPUs (to reduce license consumption) rather than many jobs simultaneously with fewer CPUs each, even though in general we would expect the latter to be more efficient in terms of overall throughput.
The Jaguar user interface, therefore, includes various input options to launch jobs and workflows in parallel with “intelligent” defaults to abstract these decisions away from the concern of average users as much as possible. The backend automatically assigns CPUs to subjobs as appropriate to optimize runtime with, for example, a given number of CPUs or simultaneous number of subjobs, depending on the workflow and user input. Knowledge of Jaguar’s parallel efficiency is included in the algorithm to throttle the use of OpenMP threads by default to prevent wasteful consumption. In addition, the backend can use knowledge of the number of CPUs per node to avoid launching OpenMP jobs that oversubscribe the shared-memory hardware. Finally, we have plans to incorporate predictions of subjob runtimes (using Jaguar Timer—see Sec. X) to optimize the distribution of CPUs further. The ultimate goal is to make it as easy as possible for a scientist to run jobs efficiently and quickly without needing to worry about details of the available compute resources.
IV. USER INTERFACE
The user can currently interact with Jaguar in two ways: in a text mode (command line interface) and via a graphical user interface (GUI). The text mode implies working with input and output files in a text editor and launching Jaguar calculations via a command line terminal, which can be invoked in all common operating systems: Linux, Mac, and Windows. Among the significant changes to the Jaguar input file format, since the time of the previous review, are: support of comments and macros (combinations of keywords defined in a configuration file), certain keywords accept vector arguments (for example, a list of atoms), pointers to files (such as a grid or a basis set file) can be either a relative or an absolute path on disk, and custom DFT functionals can be specified with a more user-friendly syntax. The relative paths are handled automatically across the divide separating the setup and run machines (for example, a personal laptop computer and a computer cluster), which can be controlled by different operating systems.
Every Jaguar panel launches the associated backend Jaguar job or workflow by a command line script. In such a way, all Jaguar keywords and settings selected in the GUI (or implicitly set by the GUI) are recorded in the command line flags and input files that the GUI generates. For most Jaguar panels, the state of the GUI for a given job specification is fully recorded in a single input file, along with molecule input files, and these input files may be edited in the GUI and imported back into the GUI at a later date to regenerate a calculation exactly. Similarly, the user who prefers working on the command line can reproduce any calculation launched from the GUI by copying and relaunching the command line script on the associated input files, if desirable.
The GUI Maestro has been significantly updated since the time of the previous Jaguar review, where the graphical interface was briefly described and showcased. The biggest transformation of Maestro happened in 2015–2016. The interface was redesigned, improved, and modernized on several levels. Its Jaguar-related functions have undergone significant changes as well. Below, we will give a brief account of the most important improvements that concern the graphical interface with Jaguar, in comparison with the interface described previously.
Most Jaguar GUI panels can now launch many individual Jaguar calculations in a single submission to the Jaguar wrapper. This functionality reflects the need to carry out multiple QC calculations in modern research projects. As the number of levels of theory (most importantly, DFT functionals) and the number of basis sets in Jaguar are measured in hundreds, there is a greater need to be able to easily find the desired level of theory and basis set in a list containing many choices. Now, Jaguar panels allow the user to type in these settings in a search box, and the matching results will appear, dynamically updated. It is also possible to set filters on certain types of settings [for example, to display only meta-generalized gradient approximation (GGA) functionals or relativistic basis sets]. Certain atom-level settings such as basis sets on individual atoms or charge constraints for the constrained DFT (CDFT) theory69 can now be edited in the GUI. Figure 1 illustrates the innovations described in this paragraph.
Several more GUI panels, which launch Jaguar calculations and analyze the results, have been implemented, and several existing panels have been updated to reflect changes and improvements in the backend code. New panels appeared such as the panel that predicts hydrogen bond basicity following the work of Kenny.70 A customizable multistep workflow, which generates and predicts the relative stabilities of conformers and tautomers (including ring-chain tautomers71), is now available. A complex panel, which supports the AutoTS [automated transition state (TS) search] functionality,72 has also been implemented. A new panel, which monitors geometry optimization convergence (by plotting the progression of energies, gradients, rms displacements, and other convergence criteria as a function of iteration number), is useful for checking the smoothness of the optimization. Setting up and launching DFT-based pKa prediction calculations (for both micro- and macro-pKas) have been significantly improved.
The new workflow action menus (WAMs) seamlessly connect various panels in Maestro without requiring that the user seek or be aware of an existence of such a connection. For example, at the end of a successful TS search, an appropriately displayed WAM invites the user to visualize the potential energy surface (PES) corresponding to the reactant, TS, and the product. A similar mechanism exists for a successfully completed geometry optimization or a dihedral angle scan. Figure 2 demonstrates how a WAM works in case of a geometry optimization and the new convergence monitor panel.
Interaction with the QC software via a GUI has traditionally been relegated to a secondary role, while the emphasis was put on using the command line interface. This is perhaps due to the perceived complexity of QC operations as well as high development costs associated with creating and testing interactive graphical elements. Our and our customers’ experience shows that QC research can be conducted entirely in a graphical interface, with Figs. 1 and 2 and serving as examples of typical computational operations. In fact, graphical, and not text-based, interface is meant to be Schrödinger’s main interface for all its customer-facing software, including Jaguar.
Jaguar calculations can be launched from both the GUI and the command line. Every Jaguar panel launches the associated backend Jaguar job or workflow by a command line script. In such a way, all Jaguar keywords and settings selected in the GUI (or implicitly set by the GUI) are recorded in the command line flags and input files that the GUI generates. For most Jaguar panels, the state of the GUI for a given job specification is fully recorded in a single input file, along with molecule input files, and these input files may be edited in the GUI and imported back into the GUI at a later date to rerun a calculation exactly. Similarly, the user who prefers working on the command line can reproduce any calculation launched from the GUI exactly by copying and relaunching the command line script at a later date, if desirable.
V. QUANTUM CHEMICAL METHODS AND FEATURES
After the sections dedicated to questions on planning, engineering, and the user interface, we are ready to embark on a traditional scientific discussion of QC methods and features implemented in Jaguar. For completeness, we will mention the features covered in the previous review only briefly and pay more attention to the features implemented in the past ten years or so. A summary of the principal improvements to methods and properties’ calculations introduced since the publication of the previous review is presented in Table I. A discussion of these improvements is given below. For a similar discussion of the Jaguar workflows, see Sec. VII as well as Sec. IX.
. | Year 2013 . | Year 2024 . |
---|---|---|
Maximum number of atoms | 1000 | 25 000 |
Maximum number of orbitals | 12 000 total | 2000 per atom |
Support of second derivatives in Gaussian basis sets | s-d functions | + f-g functions |
DFT grids | Up to 590 point Lebedev grids | Up to 5810 point Lebedev grids, |
+ SG-0, SG-1, SG-2, SG-3 | ||
Treatment of low frequency vibrational modes | Discard low frequency modes | + Quasi-harmonic approximation73 |
Relativistic basis sets | Dyall basis sets74–80 | |
Relativistic methods | ZORA Hamiltonian in HF and DFT | |
DFT | SVWN5, B3LYP, M06-2X, etc. | + ωB97X-D, ωB97M-V, r2SCAN, PBEh-3c, etc. |
Constrained DFT | Yes | |
TDDFT | TDA81 | Completely reimplemented with TDA and FLR82 |
Dispersion corrections | D3 | + D3(BJ), D4(BJ) |
MP2 | LMP2 | + RI-MP2 |
ΔSCF | Yes | |
Wave function stability analysis | Yes | |
Machine learning energy functions | QRNN19 | |
Implicit solvation | PBF, SM6, SM8 | + PCM (COSMO, GCOSMO/C-PCM, IEF-PCM) |
Polarizabilities and hyperpolarizabilities | Static α, β, γ | + Dynamic α, β |
Spectra | UV/Vis, IR, VCD, Raman | + ECD, NMR |
Mössbauer quadrupole splittings and isomer shifts | Yes83 | |
NMR characteristics | 1H, 13C chemical shifts | + Spin–spin couplings,19F chemical shifts |
. | Year 2013 . | Year 2024 . |
---|---|---|
Maximum number of atoms | 1000 | 25 000 |
Maximum number of orbitals | 12 000 total | 2000 per atom |
Support of second derivatives in Gaussian basis sets | s-d functions | + f-g functions |
DFT grids | Up to 590 point Lebedev grids | Up to 5810 point Lebedev grids, |
+ SG-0, SG-1, SG-2, SG-3 | ||
Treatment of low frequency vibrational modes | Discard low frequency modes | + Quasi-harmonic approximation73 |
Relativistic basis sets | Dyall basis sets74–80 | |
Relativistic methods | ZORA Hamiltonian in HF and DFT | |
DFT | SVWN5, B3LYP, M06-2X, etc. | + ωB97X-D, ωB97M-V, r2SCAN, PBEh-3c, etc. |
Constrained DFT | Yes | |
TDDFT | TDA81 | Completely reimplemented with TDA and FLR82 |
Dispersion corrections | D3 | + D3(BJ), D4(BJ) |
MP2 | LMP2 | + RI-MP2 |
ΔSCF | Yes | |
Wave function stability analysis | Yes | |
Machine learning energy functions | QRNN19 | |
Implicit solvation | PBF, SM6, SM8 | + PCM (COSMO, GCOSMO/C-PCM, IEF-PCM) |
Polarizabilities and hyperpolarizabilities | Static α, β, γ | + Dynamic α, β |
Spectra | UV/Vis, IR, VCD, Raman | + ECD, NMR |
Mössbauer quadrupole splittings and isomer shifts | Yes83 | |
NMR characteristics | 1H, 13C chemical shifts | + Spin–spin couplings,19F chemical shifts |
Jaguar’s most recognizable difference from other QC software is its heavy use of the pseudospectral (PS) approach developed by Friesner and co-workers for a number of QC methods81,82,84–96 and for the calculation of a range of molecular properties.81,82,97–101 This approach is an efficient numerical technique for solving self-consistent field (SCF) and Kohn–Sham equations, as well as other QC equations involving two-electron integrals. Normally, it shaves off one unit in the exponential dependence of the scaling on the number of basis functions N such that, for example, an N4-scaling method becomes an N3-scaling method under the PS method. Importantly, the technique maintains this reduced scaling for functionals containing the exact Hartree–Fock (HF) exchange component. The PS method in Jaguar can be used for HF, DFT, time dependent DFT (TDDFT), local second order Møller–Plesset perturbation theory (LMP2), and its resolution-of-the-identity analog (RI-MP2). In addition, a mixed quantum mechanics/molecular mechanics (QM/MM) method, which is included in Schrödinger’s suite of scientific solutions as the product QSite,93,102,103 naturally benefits from the PS speedup since QSite calls Jaguar for the QM part of the QM/MM calculations. Periodic boundary conditions are not supported in Jaguar.
The mathematical formalism as well as the engineering aspects of the PS method and its applications, besides being presented in many original research papers,82,84,87–90,92,94,96,97,100,104–107 have been briefly described in the previous review.1 The PS treatment of various terms in equations has also been used in the development of other QC methods not included in Jaguar.108–113
In a similar way to the PS method, the resolution-of-the identity (RI) approach114–116 reduces the N4 scaling of the Coulomb two-electron integral calculations in the conventional spectral method to the N3 scaling through factorization and fitting. The conventional RI method is more effective in calculations with pure DFT functionals in which only the Coulomb two-electron integrals are involved,117,118 although progress in computing the exchange two-electron integrals with the RI approach has been reported.118–122 A hybrid method (RIJCOSX), which uses the RI approximation to calculate the Coulomb two-electron integrals and uses a simplified PS method (called COSX, “chain of spheres exchange”) to calculate the exchange two-electron integrals on a grid, has been implemented in other computational packages.123–127 In the COSX method, the PS least square operator is replaced with a matrix formed by the values of basis functions evaluated at grid points. At the same time, the COSX method ignores the PS length scales, the PS atomic corrections, as well as the PS multigrid technique. The progress of linear scaling128,129 has also been reported for density-sparse systems, such as linear molecules. We have recently developed a hybrid method PS-RI-MP2, which is reminiscent of the techniques used in RIJCOSX. PS-RI-MP2 computes SCF and MP2 gradients with the PS method and obtains the MP2 energy with the RI method.130,131
Jaguar’s DFT grids have been updated in two key ways. First, we made it much easier to specify fixed grids in Jaguar, for easier comparison of Jaguar results and those obtained with other quantum chemistry packages. Traditionally, Jaguar DFT calculations (especially in the PS approach) were performed on a combination of custom grids of various sizes, to balance speed and accuracy. This type of grid handling complicated direct comparison of Jaguar with other codes when a nearly exact agreement in numbers was sought. Second, the variety of DFT grids has been expanded, with a number of larger radial grids of up to 5810 radial points introduced. Popular “standard” grids SG-0,132 SG-1,133 SG-2,134 and SG-3134 have also been added to Jaguar.
In a related improvement, we have implemented the quasi-harmonic approximation,73 which is now activated by default in thermodynamic free energy calculations. This approximation helps mitigate the variability of entropy and free energy as a function of rigid rotation in molecular systems, which arises in some rare cases due to the lack of perfect rotational symmetry of DFT grids.135 The variation of the results can be made negligibly small by the use of larger grids and application of the quasi-harmonic approximation or the modified rigid-rotor-harmonic-oscillator approximation.136
The memory organization of the Jaguar source code has been dramatically changed since the previous review.1 Static memory allocation has been switched to dynamic memory allocation by replacing most Fortran common blocks and removing the major parameters that were set for static memory usage. Dynamic memory allocation is preferable for parallelization with OpenMP and for using the code in applications that involve large molecular systems.
The original PS time-dependent density functional theory implemented in Jaguar using the Tamm–Dancoff approximation (TDA) for restricted singlet excitations of closed-shell systems is described in Ref. 81. In that initial work, a number of critically important techniques, such as the PS length scales, the PS Fock matrix updating with multigrid strategy, and the PS atomic corrections, were not used. Without these special techniques, the deviations between the PS method and the conventional spectral method might be unacceptably large for many key applications, such as the optimization of minimum energy crossing points (MECPs) and the design of highly efficient sensitizer dyes for photovoltaics and new materials for organic light emitting diodes (OLEDs). In such applications, the state energy gaps are usually small, and, consequently, the PS TDDFT code in Jaguar was fully rewritten82 in order to fully support increased accuracy and computational efficiency. In the new TDDFT code, the following features are implemented: TDA and full linear response (FLR), support of restricted singlets and restricted triplets for closed shell systems, and unrestricted excitations for open shell systems, for both the PS and conventional spectral methods in both serial and OpenMP-parallelized calculations. The new code shows particularly impressive speedups provided by the PS method, with direct comparisons with Q-Chem reported in Ref. 82.
The gradients of TDDFT/TDA energies play a critical role in many applications, for example, in the design of new OLED materials. Both the PS and conventional spectral analytical gradients of TDDFT have been implemented in Jaguar. TDDFT excited state geometry optimizations of the molecules from the G2 test set, organic fluorophores with large Stokes shifts, and Pt-containing complexes show that the PS method yields average errors in the range of 0.01–0.1 kcal/mol for TDDFT excited state energies, 0.02–0.06 pm for bond lengths, and 0.02°–0.12° for bond angles, when compared to the results from conventional spectral TDDFT. TDDFT gradient calculations of fullerenes (Cn, n ≤ 540) with the B3LYP functional and 6-31G** basis set show that the PS method provides 8- to 14-fold speedups in total wall clock time over the conventional spectral methods.95
The Born–Oppenheimer (BO) approximation,137 which is based on an adiabatic separation of electronic and nuclear motions, fails when the energy gap between electronic states becomes small, and the electronic states are strongly coupled and nuclear motions can induce electronic transitions.138,139 The leading-order nonadiabatic corrections to the BO approximation are the derivative couplings.140,141 Both the PS and the conventional spectral analytic nonadiabatic derivative couplings at the TDA level have been implemented in Jaguar. Benchmark calculations on small fullerenes at the B3LYP/cc-pVTZ level of theory reveal that the PS method can be faster than the conventional spectral method by a factor of ten or more, while showing negligible errors (to be reported in a following publication).
The ΔSCF method,142 which is an alternative way to converge excited state wave functions, has also been implemented in Jaguar. We are currently planning to use ΔSCF in automated workflows, which solve concrete chemical problems, for example, those related to electronic properties of optoelectronic materials.
In addition to nuclear magnetic resonance (NMR) shielding constants, which have long been available in Jaguar,107 the indirect nuclear spin–spin coupling constant is another important quantity in describing the nuclear magnetic resonance spectra. Both the PS and the conventional spectral indirect nuclear spin–spin coupling constants have now been implemented in Jaguar. Our timing benchmark study of spin–spin coupling calculations of fullerenes at the B3LYP/cc-pVTZ level of theory demonstrates that the PS method can achieve a multifold speedup with a mean absolute error of 0.1 Hz (to be presented elsewhere).
The limitation of supporting second derivatives of only up to d-functions discussed in the preceding review,1 which forced us to use custom basis sets with excluded high angular momenta denoted by the suffixes (-f), (-g), etc., has now been overcome. Up to g-functions can now be doubly differentiated in Jaguar calculations. Concurrent with this improvement is an addition of a number of basis sets with high angular momentum, such as def2-qzvppd, to the most recent version of Jaguar. Our development code supports the differentiation of functions with angular momenta larger than g, but their infrequent use in practical applications and the need for some further code optimization have not yet allowed us to release them for general use.
Another limitation in the analytical frequency calculations was the support of only the conventional spectral method in the calculations of the first derivatives of Fock matrix and the explicit second derivatives of the electronic energy. This limitation, which is usually a computational bottleneck, has also been removed. The PS method is now fully integrated into every part of the analytical frequency calculations. In order to support higher projected angular momenta and higher exponents, the effective core potential (ECP) code has been re-implemented. Jaguar has extended the largest ECP projected angular momentum from g to h and the largest exponent from 2 to 4. This extension covers all the ECP basis functions included in the Basis Set Exchange (BSE) database.143
Besides the PBF (Poisson–Boltzmann finite elements) solvation model (SM),36,144,145 which solves the Poisson–Boltzmann equation on a 3D mesh grid, SM6,146 and SM8,147 mentioned in the previous review,1 Jaguar now provides polarizable continuum models (PCMs). These include the conductor-like screening model (COSMO),148 GCOSMO/C-PCM,149,150 as well as SS(v)PE (“surface and simulation of volume polarization for electrostatics”),151 which is also known as the “integral equation formalism” (IEF-PCM).152 In our PCM models, the molecular cavities are constructed from a switching/Gaussian153 function to avoid the discontinuities on the PES. A variety of atomic radii and solvent parameters compatible with the PCM are included in the Jaguar distribution. Furthermore, Jaguar can generate .cosmo files, which store the screening charge surface and other data, and can serve as an input to COSMO-RS154,155 calculations. Besides PCM free energies, the PCM energy gradients and the second derivatives have also been implemented in Jaguar. These components have been made compatible with our PS and the conventional methods to calculate molecular solvation energies, geometry optimizations, ground state and excited state harmonic frequencies, TDDFT/TDA energies, excited state geometry optimizations with TDDFT/TDA, zeroth-order regular approximation (ZORA) energies, ZORA-TDDFT/TDA energies, frequency-dependent polarizabilities and hyperpolarizabilities, NMR shielding constants and indirect spin–spin coupling constants, and PS-RI-MP2 energies and gradients.
Nonlinear optical properties of materials are of paramount importance in a vast number of optical–electronic applications (see, for example, reviews, Refs. 156–159). In cases of static electric field, Jaguar can compute polarizabilities and hyperpolarizabilities by perturbing the system Hamiltonian or by taking derivatives of the electronic energy,100 as was reported in the previous review.1 Among many ways to compute the frequency-dependent (also called dynamic) analogs of these properties, TDDFT is often preferred due to its favorable computational cost. A PS TDDFT implementation of frequency-dependent polarizabilities and hyperpolarizabilities was made available in Jaguar several years ago.
Electronic circular dichroism (ECD) is the most widely used form of chiroptical spectroscopy and is normally combined with ECD theoretical predictions in practical applications.160,161 The signal in this spectroscopy is determined by the electronic rotatory strength. There are two rotatory strength formalisms: velocity and length.162 The velocity expression is gauge-origin independent, and the length expression requires the use of London atomic orbitals163,164 to be gauge-origin independent. Both the velocity and length ECD formalisms have been implemented in Jaguar. We currently recommend the velocity formalism implementation of Jaguar for reasons of its origin independence and because it has been better validated.
In modeling processes that involve electron transfer, it is desirable to apply constraints to the electron density in different regions of space in order to force the SCF to converge to a desirable charge density. The constrained DFT (CDFT) approach, which addresses this problem, was originally developed by the group of Van Voorhis.165,166 There are two types of constraints, namely charge constraints and spin constraints, and both of them have been implemented in Jaguar. Following the pioneering work in electron transfer by Zhang and co-workers,99 we have extended our electron transfer code to include a DFT coverage, to support the unrestricted formalism, and to integrate it into wave functions generated from CDFT.
The vibrational self-consistent field (VSCF) method167 has been implemented in Jaguar to calculate anharmonic vibrational spectra. The VSCF method solves the vibrational Schrödinger equations by taking into account the anharmonicity of the potential energy surface and the couplings between the normal modes.
OLEDs based on fluorescence (first generation), phosphorescence (second generation), and thermally activated delayed fluorescence (TADF) (third generation) have attracted great attention in academia and industry.168–170 Theoretical predictions of absorption and emission spectra are essential in the design and selection of new OLED emitter materials for use in active display and lighting applications. There are two different methodologies to calculate the vibronic spectra (either absorption or emission): the time-independent approach and the time-dependent approach.171 In the former, the vibronic spectra are calculated through overlaps between the vibronic wave functions of initial and final states and the sum over initial and final vibrational states. In the latter, the vibronic spectra are calculated through the time domain integration via a fast Fourier transform. Both the time-independent and time-dependent methods have been implemented in Jaguar. Besides the absorption and emission spectra, calculations of the radiative rates, intersystem crossing (ISC) and reverse intersystem crossing rates, and the non-radiative decay rates have been implemented in a time-dependent formalism with both Franck–Condon and Franck–Condon Herzberg–Teller approximations172 included.
The relativistic spin–orbit (SO) coupling effects cause spin-forbidden transitions and play a critical role in photochemical reactions. In particular, the SO coupling induces fast intersystem crossing (ISC) and phosphorescent decay in organometallic complexes.173,174 The relativistic SO coupling approximation has been implemented in Jaguar as part of the ZORA method175,176 in both the scalar and two-component formalisms. ZORA in the scalar formalism in Jaguar is compatible with the HF, DFT, and TDDFT levels of theory. Both SCF and TDDFT/TDA with relativistic spin–orbit coupling in two-component ZORA have been implemented in the noncollinear formalism of DFT.
A number of features mentioned above and recently implemented in Jaguar, especially those that employ the PS method, have not yet been described in detail in individual publications. We are planning to fill this gap in the near future.
The class of methods that is most frequently used in practical QC calculations involving Jaguar is, predictably, DFT. It is Jaguar’s most extensively developed method in a sense that it has the greatest variety of options (particularly, in the choice of the exchange–correlation functional), it is compatible with a comprehensive range of molecular property calculations, and it is utilized in all workflows. All major types of functionals except double hybrids177 [local-density approximation (LDA), GGA, meta-GGA, hybrid, dispersion-corrected, range-separated, non-local correlation, and composite) are currently represented in Jaguar. A recent PS implementation of range-separated functionals has been described in detail in a published study.96 Several double hybrids have been implemented in a development version of Jaguar, and they are currently undergoing validation and are being considered for release. The D3 a posteriori correction,178 together with the Becke–Johnson damping179 and Sherrill reparameterization,180 and its successor D4 correction181 have been added to Jaguar and can be combined with many functionals. Note that the D3 correction and its first and second derivatives are computed with our own code, whereas the D4 correction and its derivatives are taken from the dftd4 code.182,183 A number of composite 3c functionals184–188 as well as HF-3c189 have also been recently made available in Jaguar.
If we count the number of different functional names that can be invoked in Jaguar (with each of those functionals producing a different energy), then it stands at 178. If we exclude functionals with the suffixes -D3, -D3(BJ), -D4, -3c, and other functionals with separable dispersion corrections (B3LYP-LOC,190 B3LYP-MM,191 and PBE-ulg192), then the number of uniquely named functionals provided by Jaguar will be 88. To put this number in context, in a very extensive benchmarking study by Mardirossian and Head-Gordon,193 an assessment of “200 density functionals” was made, but that number included functionals of the structure F, F-D2, and F-D3, with various parameterizations of the D3 correction counting toward distinct F-D3 functionals. In our estimation, the number of “undecorated” functionals F in that study was only 94, which is on the same scale as the number of pure functionals accessible from Jaguar.
The default DFT functional in Jaguar is currently B3LYP-D3, which replaced the previous default B3LYP several years ago. The former typically produces more accurate results than the latter193 at the same computational cost and has essentially no drawbacks, so the switch was natural once the computational community overwhelmingly accepted the utility of the D3 correction.
The methods that utilize the PS approach require specially prepared basis sets (which we call PS basis sets). These basis sets are essentially the same familiar Gaussian basis sets, but accompanied by a set of dealiasing functions and grids, which should be in principle fine-tuned for each PS basis set (see Ref. 1 for more details). Thus, any conventional basis set Y, which we would like to use in a PS calculation, needs to be “converted” into a PS basis set Y-PS first. Y-PS may be constructed in a variety of ways, and, ideally, it should lead to a negligible loss of accuracy, in comparison with the analogous fully analytical calculation using the basis set X. Of course, the PS calculation should have a significantly lower computational cost (in practice, a speedup of a factor of 2–5 and more) if its use is to be justified.
Figure 3 shows the typical Jaguar timings obtained for single point energy calculations with the fully analytic and PS DFT calculations, at the B3LYP-D3/LACVP** level of theory. The calculations are performed on over 100 OLED and organic photovoltaic (OPV) molecules, some of which are shown in this figure. In this case, most PS speedups fall in the interval between 2 and 4.
The speedups provided by the PS method must be balanced by the loss of accuracy, which should be negligible in practice. A typical demonstration of such a balance is shown in Fig. 4 in which a relaxed scan is performed on a molecule tested for atropisomerism (see more information about the types of scans in Sec. VII B). A relaxed scan is a more challenging case in this regard (in comparison with a rigid scan), since numerous geometry optimization steps at every scan point may lead to a greater divergence between the PS and analytical energies. The figure reveals almost no visible difference between PS and analytical calculations. An average absolute energy difference for all the points was 0.03 kcal/mol. The maximum difference was 0.49 kcal/mol at 70°, which, while being larger than normally expected, produces no qualitative changes in the interpretation of the results. In addition to the accuracy of the calculation, the figure shows time savings, which are mostly a factor of 2.
Not every basis set Y, which is included in a distributed Jaguar release, has its counterpart Y-PS available in the package. The following major PS basis sets or basis set families are currently included in Jaguar releases, where the symbols in parentheses indicate the optional use of diffuse and polarization functions: 3-21(++)G(**), 6-31(++)G(**), 6-311(++)G(**), D95(**),194 (aug-)cc-pVDZ, (aug-)cc-pVTZ, (aug-)cc-pVQZ, LAV2P(++)(**), LAV3P(++)(**),195,196 LACVP(++)(**),197 LACV3P(++)(**), CSDZ(++)(**),198 ERMLER2(++)(**), MIDIX, MIDIXL, cc-pVTZ-PP, def2-SVPD, def2-TZVP, and def2-TZVPD.199 LACV3P is a triple-zeta contraction of the LACVP197 basis set developed by us in an unpublished study. This somewhat compact list of PS basis sets readily available in Jaguar, in comparison with a much longer list of provided analytic basis sets, speaks to the laborious semi-manual process with which high quality PS basis sets are constructed in our team and also to the observation that the existing list of PS basis sets is sufficient for most practically important calculations in industrial applications. An automation of constructing high quality PS basis sets would be certainly desirable and is planned for future work, along with the ideas introduced previously123,200–203 or by making use of modern ML techniques. In a separate project, we are designing a new generation of PS basis sets. These PS basis sets may offer much more attractive speedups with respect to analytic calculations than those mentioned above.
Among other improvements related to basis set support in Jaguar is an introduction of several new relativistic basis sets of Dyall and co-workers,74–80 which are suitable for relativistic calculations with the ZORA Hamiltonian. The relativistic Hamiltonian has been implemented in Jaguar in both the scalar and spin–orbit variants compatible with HF and DFT wave functions. A recent listing of the Jaguar basis set format in the popular BSE website143 also deserves a mention.
In addition to UV/Vis, IR, VCD, and Raman spectra, which Jaguar could predict at the time of the previous review, Jaguar can now compute ECD and NMR spectra (for 1H, 13C, and recently introduced 19F isotopes). For the latter, not only are chemical shifts and spin–spin couplings computed, but they can also be used in an assembled spin dynamics Hamiltonian, which represents an interaction of the nuclear spins with the internal and external magnetic fields. As a result, a whole NMR spectrum can be generated and plotted by a specialized script that acts on the results of a Jaguar NMR calculation. All these types of spectra can be computed with the use of the conventional spectral method or the PS method. Support of the nuclear Overhauser effect (NOE)204 in the form of printed interatomic H–H and H–F distances across the conformational space is available. The quadrupole splittings and isomer shifts of Mössbauer spectra have been implemented in accordance with the parameterization developed in our previous work.83
As spectra are usually sensitive to conformational effects, it is recommended that the generation of spectra for realistic molecules take these effects into account. For this reason, most of the types of spectra mentioned above can be predicted in multistep workflows, which involve conformational search and averaging. These workflows will be discussed in more detail in Sec. VII.
Recent years have seen an emergence of a large number of machine learning potentials (MLPs), the construction of which revolves around neural networks (NNs).19,205–211 MLPs are trained on QC data and output molecular energies comparable in quality to those of QC methods, but at a tiny computational expense. These NN-based models are outwardly similar to traditional QC methods: they produce absolute energies, which can be differentiated once or twice, typically only with respect to nuclear displacements. There are some significant differences too. For example, MLPs do not require a basis set and have limitations not characteristic of traditional QC methods (only some elements of the periodic table and only certain types of molecular properties may be supported). Nevertheless, these MLPs can be thought of conceptually, and treated on an engineering level, as a “level of theory.” In Jaguar, an MLP known as QRNN19,212 can be used as a level of theory. Sections VII and X discuss QRNN and its use in Jaguar and Jaguar workflows in more detail.
VI. EXTERNAL COMPUTATIONAL SOFTWARE COMPONENTS
It is natural for Jaguar users to benefit from a number of external computational modules, which are either produced by Schrödinger and have their own Maestro panels (for example, QSite or Epik), or which are licensed from third-party developers and distributed alongside Jaguar (for example, Mopac). Since some of these programs are mentioned in various parts of this paper, Table II summarizes them succinctly for the benefit of the reader. This table is not exhaustive and mentions only the most frequently used external computational software. Note that most of these components are reachable from Maestro in one way or another (for instance, by editing Jaguar input files and entering the appropriate settings), but only some of them have dedicated GUI panels.
Title . | Description . | Has dedicated Maestro panel(s)? . | References . |
---|---|---|---|
Desmond | A molecular dynamics package | Yes | 213 and 214 |
dftd4 | D4 empirical dispersion corrections | No | 181–183 |
Epik | ML-based pKa predictions | Yes | 215 and 216 |
MacroModel | Molecular mechanics calculations | Yes | 41–43 |
Mopac | Semiempirical thermochemistry methods | Yes | 217 and 218 |
NBO 7.0 | A natural bond orbital analysis | No | 219 |
Open Babel | Format conversion cheminformatics program | No | 220 |
OPLS4 | A force field | No | 18, 221, and 222 |
pyGSM | A reaction path and geometry optimization package | No | 223 and 224 |
QRNN | A machine learning potential | No | 19 |
QSite | A QM/MM program, which calls Jaguar for QM calculations | Yes | 93, 101–103 and 225 |
RDKit | A cheminformatics package | No | 226 |
xtb | Semiempirical xTB models | No | 227–229 |
Title . | Description . | Has dedicated Maestro panel(s)? . | References . |
---|---|---|---|
Desmond | A molecular dynamics package | Yes | 213 and 214 |
dftd4 | D4 empirical dispersion corrections | No | 181–183 |
Epik | ML-based pKa predictions | Yes | 215 and 216 |
MacroModel | Molecular mechanics calculations | Yes | 41–43 |
Mopac | Semiempirical thermochemistry methods | Yes | 217 and 218 |
NBO 7.0 | A natural bond orbital analysis | No | 219 |
Open Babel | Format conversion cheminformatics program | No | 220 |
OPLS4 | A force field | No | 18, 221, and 222 |
pyGSM | A reaction path and geometry optimization package | No | 223 and 224 |
QRNN | A machine learning potential | No | 19 |
QSite | A QM/MM program, which calls Jaguar for QM calculations | Yes | 93, 101–103 and 225 |
RDKit | A cheminformatics package | No | 226 |
xtb | Semiempirical xTB models | No | 227–229 |
VII. WORKFLOWS
In the text above, we alluded to workflows that combine multiple individual calculations (subjobs) such as geometry optimizations and single point calculations, but most importantly involve conformational search with subsequent conformer pruning or filtering. Our experience of interaction with multiple researchers using the Jaguar package shows that workflows have become increasingly popular in practical calculations. Before discussing the workflows available in Jaguar, we believe that it would be useful to briefly reflect on the changing landscape of practical QC calculations and the greater role that workflows play at the present.
Traditionally, the end goal of QC calculations has been an accurate and computationally tractable prediction of molecular properties.230,231 For decades, the accuracy of QC methods, whether in single- or multistep calculations,193,232,233 has been measured on test sets involving predominantly small, rigid, structurally uncomplicated molecules, as well as their relatively straightforward chemical transformations. It is perhaps assumed that the most important component determining the accuracy of the property predictions is the quantum chemical calculation itself and that the protocols developed on smaller, uncomplicated molecules would easily scale to real world molecules with a trivial addition of computational steps such as conformational search. Treatment of conformational degrees of freedom and other complications that might accompany physico-chemical phenomena in realistic molecules (tautomerism, stereochemistry, protonation states, interaction with solvent molecules, aggregation, etc.) are typically left outside traditional QC packages and are a responsibility of the end user. There have been numerous studies of large and flexible molecules in the context of the above-mentioned complications,71,234–237 although we believe that an understanding of the magnitude of errors stemming from neglect or approximate treatment of these complications is still limited. The workflows that we are developing and distributing as part of the Jaguar package are designed to automatically account for some of the physico-chemical effects not treated at the QC level. Let us discuss those workflows that have undergone substantial changes in the past ten years or so. Other workflows, such as computation of hydrogen bond238 strength or heat of formation,33 have not changed significantly since the time of the previous review and do not require additional discussion here. A number of workflows specific to materials science research have been introduced and will be discussed in Sec. IX.
A. Spectroscopy
Simulation of spectra with Jaguar workflows should be distinguished from a mere QC computation of data that are required to produce a spectrum. A spectrum is usually “assembled” from QC data in an operation, which might include line broadening and conformational averaging. In the case of NMR spectra, an interaction of the molecule’s nuclear spins with the magnetic field of a spectrometer and each other has to be calculated. With this distinction in mind, now we can mention that Jaguar does not output Mössbauer spectra although it computes quadrupole splittings and isomer shifts, which are key characteristics of these spectra.83
The types of spectra currently generated by Jaguar workflows are as follows: IR, Raman, VCD, UV/Vis, ECD, and NMR. These workflows depart from generation of conformers with user-controlled settings. The default settings include generation of up to 200 conformers within the energy window of 5.0 kcal/mol, according to the force field method. According to Chan and co-workers,239 who used the GFN2-xTB semiempirical method229 (often abbreviated to xTB) to estimate the number of low energy conformers as a function of the number of rotatable bonds within the 5.0 kcal/mol energy window, 200 conformers should suffice to treat molecules with up to 13 rotatable bonds. However, when working with our workflows, we have been able to generate far more conformers in the same energy window at the DFT level of theory, for molecules with fewer rotatable bonds. So, it is important to investigate whether Chan and co-workers’ findings hold for more accurate levels of theory. The answer to that question has significant implications for DFT-driven workflows like ours. For now, we ask our users to exercise caution and experiment with the influence of the conformational search settings on the quality of the produced spectra.
When comparing a simulated spectrum with an experimental one, it is important to properly align the spectra, if adequate conclusions are to be reached. Several spectrum alignment algorithms exist for VCD and IR spectra,240–243 have been implemented in our code base, and are accessible through the Maestro graphical interface.
The recently implemented NMR spin–spin couplings are used together with NMR chemical shifts to generate a conformationally averaged NMR spectrum. The spectra of 1H, 13C, and 19F isotopes can be output, and the coupling to spin ≠1/2 nuclei such as 2H (deuterium) is fully supported. Currently, the chemical shifts of the above-mentioned isotopes are computed on the basis of linear regression models,244 although in the near future, we are likely to switch to using machine learning models that have been recently adopted for NMR chemical shift prediction and claim superior accuracy.245,246 In generating NMR spectra, our code follows the algorithms of Kuprov et al.247,248 as implemented in the Spinach spin dynamics package.249
B. Coordinate scans
Workflows known as “scans” generate a slice of a PES as a function of a molecular geometry parameters such as bond distance, angle, or dihedral angle. Rigid and relaxed coordinate scans have been discussed before,1 but two novelties need to be mentioned with regard to relaxed scans at the present time. All the initial geometries for the data points in relaxed scans used to be generated at the time of the workflow launch. Let us call this type of scan “relaxed fixed parallel scan” because each data point is independent of another and can be computed in parallel. This approach might create problematic initial geometries (such as those with unacceptably short atom–atom distances due to atom “collisions”) for some data points. In order to address this problem, we implemented an option to generate the (N + 1)th initial geometry from the Nth converged geometry. It comes at a disadvantage of having to wait for the Nth optimization to finish until the (N + 1)th optimization can be launched, so the calculations for all data points cannot be launched in parallel (this type of scan is appositely named “relaxed fixed consecutive scan”). The second innovation is an introduction of dynamic relaxed scans. In these types of scans, the geometries of all initial data points are the same, but the values of the scan variable s(i) on the ith iteration are set to be reached dynamically, in the course of a geometry optimization. Dynamic scans allow the user to avoid interatomic collisions that might arise in the process, scan coordinates in rings, and also perform calculations for all data points in parallel. Scans can be performed in more than one dimension, and PES plots for one- and two-dimensional scans are automatically generated. Table III summarizes all these types of scans.
Iteration | Rigid scan | Relaxed fixed parallel scan | Relaxed fixed consecutive scan | Relaxed dynamic scan | ||||||
1 | scan(1) rest(1) | scan(1) rest(1) | → | scan(1) rest(2) | scan(1) rest(1) | → | scan(1) rest(10) | scan(1) rest(1) | → | scan(1) rest(100) |
2 | scan(2) rest(1) | scan(2) rest(1) | → | scan(2) rest(3) | scan(2) rest(10) | → | scan(2) rest(20) | scan(1) rest(1) | → | scan(2) rest(200) |
3 | scan(3) rest(1) | scan(3) rest(1) | → | scan(3) rest(4) | scan(3) rest(20) | → | scan(3) rest(30) | scan(1) rest(1) | → | scan(3) rest(300) |
Iteration | Rigid scan | Relaxed fixed parallel scan | Relaxed fixed consecutive scan | Relaxed dynamic scan | ||||||
1 | scan(1) rest(1) | scan(1) rest(1) | → | scan(1) rest(2) | scan(1) rest(1) | → | scan(1) rest(10) | scan(1) rest(1) | → | scan(1) rest(100) |
2 | scan(2) rest(1) | scan(2) rest(1) | → | scan(2) rest(3) | scan(2) rest(10) | → | scan(2) rest(20) | scan(1) rest(1) | → | scan(2) rest(200) |
3 | scan(3) rest(1) | scan(3) rest(1) | → | scan(3) rest(4) | scan(3) rest(20) | → | scan(3) rest(30) | scan(1) rest(1) | → | scan(3) rest(300) |
C. Solvation
The three types of calculations that can be performed by the solvation workflow are solvation energy (or the variant thereof called E-sol250), log P, and log D predictions. log P is computed following water and octanol-1 implicit solvent calculations preceded by a conformational search. For geometry optimizations, a PCM model is used, and SM8 single point energy calculations are performed on the optimized geometries of neutrally charged species (the contribution of other charges is neglected). Our testing has showed that the SM8 model, although likely not the most accurate solvation model (SM) developed by Truhlar and co-workers,251,252 gives the most balanced treatment of solvation free energy between water and octanol, among all solvation models included in Jaguar.
The calculation of log D is done along the same lines, except that, being a pH-dependent property, log D requires prediction of populations of protonation states at a given pH. The distribution of the tautomers of the charge states is predicted by either the program Epik216 based on ML methodology or Macro-pKa253 grounded in DFT calculations. More information about Macro-pKa will be given in Subsection VII E.
D. Tautomer stability
The problem of tautomerism254–256 permeates many tasks in modeling organic and, particularly, drug-like molecules.257–259 Even though it looks deceptively simple in elementary textbook examples, this phenomenon exhibits considerable complexity in practice.260 Prediction of tautomerism is related to the prediction of pKa (vide infra). Both these types of calculations are very important in practice, since protons in drug-like molecules are usually mobile, and not being able to predict the energetic penalty of a particular arrangement of protons carries a risk of modeling a wrong molecule.
Our workflow, which predicts relative tautomer stability, begins by generating tautomers of a user-specified structure, discards clearly high-energy tautomers (by using a computationally inexpensive filter), optimizes the remaining reasonable structures, and finally computes single point energies of the few low-energy tautomers with a higher level theory. This type of multistep calculation, which consists essentially of generation of structures, their filtering, subsequent geometry optimization, and single point energy calculations, is present in a number of Jaguar workflows. Details differ from workflow to workflow, but these four “stages” remain conceptually the same. In the tautomer stability workflow, we generate tautomers by a combination of algorithms, including one published recently by us,261 and use either QRNN ML-based energy function, or PM7 semiempirical method,218 or a low-level, computationally inexpensive DFT method as a structure filter. The GFN2-xTB semiempirical method could be used as a filter as well. Force fields like OPLS4222 are appropriate for ranking the energies of conformers, not tautomers. Conformational search can be optionally added to the workflow. Ring-chain tautomers can also be generated via a recently published algorithm.71
E. pKa prediction
DFT-based pKa prediction has several advantages in comparison with empirical models. DFT-based models work with three-dimensional (3D) structures and, as such, handle steric, conformational, and electronic effects directly. They also naturally account for stereochemical effects. Although DFT-based models still require empirical parameters for high pKa prediction accuracy, they are assumed to be less dependent on parameterization than fully empirical models and should be, therefore, more applicable to uncommon types of molecules and functional groups. Among the disadvantages of DFT-based models are their high computational expense and the complicated problem of conformational sampling exacerbated by inherent limitations of implicit solvent models. Both these problems become overwhelming for very large and flexible molecules, at which point it might be better to resort to empirical pKa prediction models.
In the Jaguar package, DFT-based pKa prediction is done through the workflow called Jaguar pKa. In this workflow, DFT energetics are combined with empirical corrections trained on experimental pKa data. The empirical corrections are meant to compensate for the defects inherent in the implicit solvation model and bring pKa prediction to the level of accuracy that would be useful in complex research projects such as computational drug design. Since the description of Jaguar pKa in the previous Jaguar review, the workflow has undergone a couple of conceptual improvements.1 Conformational averaging was introduced into the final pKa calculation,234 and the weighted averaging of pKa prediction models increased pKa prediction accuracy and relieved the user of having to decide on a particular pKa prediction model that depended on a subjective classification of the functional group participating in the protonation/deprotonation process.264
Jaguar pKa was initially designed for predicting micro-pKas. This type of calculation would be appropriate for molecules with a single functional group that is not prone to tautomerism, or molecules with multiple functional groups provided that the micro-pKas of these groups are sufficiently well separated on the logarithmic scale. As many molecules in realistic projects do not meet these conditions, a need arose to design a more complex workflow that would be applicable to pKa predictions of medium-sized organic molecules with any type or number of functional groups, including those that can undergo tautomerization. Such a workflow, a successor to Jaguar pKa, is now available and is called Macro-pKa. It generates the tautomers and conformers for the relevant charge states, filters off and discards high energy structures, computes all necessary micro-pKas, and assembles them into macro-pKas, which should be directly comparable to experimental pKa measurements. Macro-pKa is based on the same thermodynamic cycle as Jaguar pKa and borrows some physico-chemical ideas from the latter, although Macro-pKa and Jaguar pKa are separate codes with different training sets and parameter construction techniques. At the end of a calculation, Macro-pKa outputs not only macro-pKas but also tautomer populations of all sufficiently highly populated states, conveniently organized in a single diagram. An example of such an output is shown in Fig. 5. It is important to note that work on finding the optimal settings for the workflow is still ongoing, but here, we thought it worthwhile to highlight the type of automated complex calculations that QC methods can be routinely used for. Early versions of Macro-pKa have already been applied to a couple of published research projects,216,253 but a separate, detailed presentation of Macro-pKa is planned for the future.
F. Transition state search
TS search was an obvious candidate for automation in the form of a Jaguar workflow when we started working on it several years ago.72 If performed manually, a TS search is quite a laborious task. Our automated workflow AutoTS is designed for locating a TS of an elementary chemical reaction in which covalent bonds are broken or formed (or both). See Fig. 6 for an illustration. Given the reactants and the products as either isolated entities or complexes, a typical AutoTS calculation proceeds as follows: the correspondence between the atoms of the reactants and the products is set, and a TS guess representing a similar reaction is extracted from the library of TS templates. If the template is not available, an artificial force induced reaction (AFIR) algorithm of Maeda and co-workers265 is applied to generate a high-quality initial guess of a TS. Then, a TS search is performed, a TS is located and confirmed via an intrinsic reaction coordinate (IRC) calculation to be the correct TS, and the results are summarized and reported. Conformational search can be performed as part of the workflow266 and is recommended for many reaction types. The workflow is configurable in many ways and automatically applies several layers of fallbacks and alternative approaches if some automated steps did not finish as expected. For example, if IRC does not match either reactants or the products, the workflow enters an iterative mode, which continues to find TSs on the way until connections to the reactants and the products are found (“guardrails” are put in place to prevent an inordinately unreasonable loop).
A script called RankReactivity is an example of a “super-workflow.” It applies the identical molecular transformation to a set of similar reactants, launches AutoTS on each of the prepared reactions, and summarizes the results, ranking the original molecules by reactivity. The script supports several types of practically important molecular transformations: Michael addition, hydrogen abstraction, and aromatic hydroxylation. Michael addition is a common reaction in design of covalent inhibitors of proteins,267,268 whereas hydrogen abstractions and aromatic hydroxylation are common proxy reactions in prediction of drug metabolism.269–271
Among multiple recent additions and improvements introduced to AutoTS is its optional use of the growing string method (GSM)272–274 as implemented in the open source program pyGSM developed by the Zimmerman research group.223,224 By default, AutoTS switches to the GSM when other possibilities of locating a TS have been exhausted. AutoTS can now account for “catalytic” solvent molecules (formally not participating in the reaction but involved in the elementary reaction as a catalyst) and spectator solvent molecules (creating an electrostatic environment but not breaking or forming covalent bonds during the reaction). The user can also study isotope effects on the reaction rate, since now AutoTS can handle isotopes. The workflow is also aware of certain types of stereochemical complications and automatically applies the corresponding adjustments (see Ref. 266 for a particular case).
There is now support for “refinishing,” i.e., applying higher-level calculations to a finished, optimized PES found by AutoTS. This feature lets the user converge the geometries of the reactants, TS, and products at one (less computationally expensive) level of theory and generate single point results at another level of theory.
AutoTS has been used for finding TSs in a number of published applications.266,275–283 This type of workflow has now become very popular among other researchers, and a large number of its variations have appeared (see, for example, Refs. 284–287). The planned extensions of AutoTS include “grafting” a molecule on a reaction path obtained for another, similar molecule to rapidly assay relative reaction barriers, using an ML energy engine in place of DFT for much faster TS predictions as well as generalizing the workflow to TSs that separate conformational degrees of freedom.
VIII. APPLICATIONS IN LIFE SCIENCE
In principle, any quantum chemistry code can be used in life science (LS) projects that involve organic, medium-sized molecules. At the same time, Jaguar may be regarded as a particularly attractive choice for such applications for a few reasons. These reasons have been mentioned above (see Secs. II and IV), but it is worth summarizing them here for the purposes of this section.
As Jaguar developers have been evolving and optimizing the code, they have been concentrating efforts on those specific QM applications that gained priority during Schrödinger’s long-standing contacts with the pharmaceutical industry. At present, the use of Jaguar as a QM engine in the medicinal chemistry research conducted at large pharmaceutical companies is considerable. Only a handful of representative publications can be cited in support of the preceding statement, and in the rest of this section, for reasons of space.288–297 Popular applications of Jaguar in pharmaceutical studies involve geometry optimizations,298–300 coordinate scans,292,296,301 location of transition states,275,291,302,303 pKa predictions,294,304–308 calculations of electrostatic potential (ESP) surfaces,293,309,310 or descriptors such as HOMO and LUMO energies, ESP atomic charges, or Fukui indices,297,311–313 analysis of non-covalent interactions (NCIs),314,315 and simulation of VCD spectra.316,317
It is not possible within the scope of this paper to analyze even a small part of the mentioned applications in detail, but a couple of comments can be made. Coordinate scans can serve as an instrument for detecting atropisomerism,292,318,319 which is an important problem in drug discovery.320 Prediction of pKa and protonation states with the Jaguar pKa workflow has been used in the pKa correction approach321–323 in free energy perturbation (FEP) calculations324 to estimate relative binding affinities between small molecules and proteins, for modulating basicity of drugs325,326 and for other purposes.327,328
Schrödinger’s unifying user interface makes Jaguar closely linked to the rest of the company’s LS products. Interaction among different products or workflows, such as passing a structure from one program to another, is often necessary, and it is performed seamlessly in Maestro. For example, a drug-like molecule of interest may be initially sketched in the 2D drawing interface (2D Sketcher), passed on to the 3D Builder for any necessary modifications in 3D (such as minimizing the geometry with a force field or making sure the right atropisomer is created), and then passed on to a Jaguar workflow (for example, for establishing the population of its tautomeric forms under a specific pH). Finally, the molecule thus processed may be passed on to other modules such as FEP+324 for subsequent computations related to protein–ligand interactions. Another frequent scenario of Jaguar’s involvement in a multi-component computational enterprise starts from the opposite end. Inside Maestro, an x-ray crystal structure of a protein-sized molecule can be downloaded from the protein data bank (PDB),329 and once the associated small molecule or fragment of interest is extracted from the protein, it can be passed for further processing to a Jaguar-driven application. Alternatively, a full-sized and appropriately prepared protein structure can be treated with the QM/MM program QSite, which relies on Jaguar for the QM part of the QM/MM calculation. QSite has been used to optimize the geometries and compute the electronic structures of protein-sized molecules in a number of studies, a few of which, published very recently, may be highlighted.330–332 One more advantage of using Jaguar in LS applications consists of its focus on the PS method, which enables faster calculations and helps achieve a greater throughput in projects that deal with larger and more flexible molecules.
Schrödinger’s internal drug discovery team uses Jaguar and Jaguar-based workflows extensively, and a few observations related to the team’s experience with QM calculations in drug discovery projects can be shared. Some of these observations intersect with the comments made in Sec. II, which was dedicated to the Jaguar development strategies, but we believe that it will be valuable to present a separate viewpoint of Jaguar users.
While higher accuracy is desirable, it cannot justify a concomitant increased computational cost of calculations, if this cost exceeds some practical limits, such as 2–3 days of wall-clock time. An input–output feedback loop is important: actionable results need to be generated quickly, which should prompt swift decisions. If an accurate method is projected to require more time than the practical time limit to produce results, then a less accurate but faster method is used instead.
Computational QM models developed in drug discovery applications need not be all-encompassing or universal, in a sense that they need not be suitable for or validated on a large chemical space, or work under diverse settings. An ad hoc QM model or workflow, which provides adequate accuracy on a chemical space of interest, is often sufficient to make progress in a practical project.
Most types of atomistic simulations must be performed in solution phase. This may seem like an obvious observation, but in reality, quantum chemists may not fully realize its importance. Unless the method or workflow that they are developing is capable of modeling a solvation environment, it is likely going to end up being of little use in practical applications.
Now, we can give a few examples of the types of Jaguar calculations that see a significant use in internal drug discovery projects. Accurate solvation energies are computed with Jaguar’s implementation of the parameterized PBF implicit solvent model. These solvation energies of drug-like molecules are correlated with their blood–brain barrier penetration and unbound brain/plasma ratio (Kp,uu).250 In estimations of time dependent inhibition (TDI), when the mechanism of the reactive intermediate is known for a project series, Jaguar has been used to generate predictions for the relative reactivity. This reactivity has been shown to correlate with experimental TDI readouts with good accuracy. Accurate predictions of the energy of interconversion between two atropisomers are used to reduce the probability of ending up working on compounds of so-called type II. Such compounds are characterized by a rotation barrier of medium height (typically taken to be between 20 and 28 kcal/mol), which implies that a drug formulation can change its physico-chemical characteristics during its shelf life.
Pharmaceutical and organic chemistry are closely intertwined, but standalone organic chemistry research projects also employ Jaguar (see Refs. 333–335, among others). A valuable use of DFT calculations in pure organic chemistry research appears to be in modeling the mechanisms of chemical transformations, normally via TS search.336,337 A similar use of Jaguar is observed in bioinorganic research.338–340 We hope that in the future, such calculations will be increasingly reliant on the AutoTS module, which is designed to make TS searches routine (see Sec. VII F).
IX. APPLICATIONS IN MATERIALS SCIENCE
Molecular QC methods can provide important insights into the rationale and mechanism underlying the intrinsic and condensed phase properties critical to materials applications (see general reviews, Refs. 341 and 342). They can reveal fundamental design principles linking composition and structure to the physics and chemistry, which leads to novel materials solutions impacting diverse areas of application including electronics, automotive, aerospace, energy storage, and specialty chemicals. Jaguar, with its advantages in efficiency due to the PS method95,343,344 and integration in multistep automated simulation solutions,345–347 makes it uniquely positioned to drive the continued evolution of quantum chemistry from a mostly retrospective tool to a powerful prospective tool through high-throughput simulations and property predictions for a wide range of materials science (MS) applications.
As summarized in the earlier sections of this paper, Jaguar has capabilities to calculate energetics and an extensive set of properties for neutral and charged, closed and open shell species, in ground and excited states. Energy profiles linking atomic configurations spanning the reactant, intermediate, and product channels during chemical transformations are critical for assessing the reaction kinetics and reaction activities of materials systems and have been computed with Jaguar for a wide range of catalytic systems.348–358 Kohn–Sham orbital energies, electron affinities, ionization potentials, and the associated predictions of redox potentials are also crucial design factors for many MS applications, such as energy storage materials277,283,359–365 and materials for display applications.366–371
The advantage from Jaguar’s high efficiency increases as the number of systems to analyze increases. Aside from property predictions for a specific system, Jaguar has been used to calculate energies for varying atomic configurations over a range of chemistries to generate training sets to develop structure–energy relationships for simulation methods applicable to larger time- and length-scales. An example is the use of Jaguar to generate training data for new and improved force field parameters for classical molecular dynamics simulations (e.g., Schrödinger’s OPLS4 force field).18,222,372,373 To address shortcomings of conventional force fields, physics-informed machine-learned force fields are gaining popularity for materials research. Force fields based on neural networks374 bridge the application gap between organic and inorganic systems, treating mixed and hybrid systems with the same energy function, where traditionally organic and inorganic structures would require incompatible functional forms (e.g., valence bond force fields and embedded atom potentials, respectively). Jaguar has also been used to generate the extensive datasets needed to develop machine-learned potentials,19 in particular for accelerated accurate simulations of materials systems.
Properties for the analysis and optimization of chemical solutions require configurational conditioning and sampling before any QM calculation is performed, to account for the structural disorder in real world chemical systems. Predicting complex materials properties, in particular, often requires simulation and modeling methods that are beyond the scale of QC techniques (e.g., classical molecular dynamics). Over the past 30 years, Schrödinger has engineered, automated, and validated a suite of complex, multiphysics, multistep simulation solutions to calculate a wide spectrum of critical properties needed for chemical discovery and development, and Jaguar served as a foundation of this vast enterprise. Examples include multistep reaction energetics, chemical bond dissociation, excited-state charge transfer and charge carrier mobility, and spectral properties of thin film devices. The previous review1 described only a fraction of Jaguar’s support for MS simulation applications. In Table IV, we provide a more comprehensive list of Jaguar-based solutions for materials applications, as specifically developed for and made available to users through Schrödinger’s Materials Science Suite.375
Workflow title . | Description . | Key areas of application . | Other simulation method(s)? . |
---|---|---|---|
QM multistage | User-defined multistage molecular quantum mechanical calculations | General materials R&D | No |
Optoelectronics | Predict major optoelectronic properties of molecular materials | Electronics, display and lighting, energy storage and capture | No |
Bond and ligand dissociation | Predict bond dissociation energies varying charge and electronic state to assess stability or dissociative degradation of materials | Electronics, display and lighting, packaging, automotive and aerospace | No |
Beta elimination | Predict dissociative degradation energy via beta elimination mechanism | Semiconductor processing | No |
Excited state analysis | Assess charge transfer characteristics of molecular compounds and clusters at the electronically excited states | Electronics, display and lighting, energy harvesting | No |
TST rate calculation | Predict chemical reaction rate vs temperature based on the transition state barrier predictions | Catalysis, chemical degradation and failure, additive processing and curing | No |
Band shape | Predict band shape of absorption and emission spectrum accounting for vibronic coupling (Franck–Condon Herzberg–Teller) | Electronics, display and lighting, energy capture, dyes and colorants | No |
Auto reaction workflow | Automated high-throughput predictions for catalytic and non-catalytic reactivity and selectivity with physics-based modeling | Homogeneous catalysis, energy storage and conversion, materials synthesis and processing, aging and degradation | Macromodel, MM, xTB, semiempirical |
Transition moment order parameter | Compute transition dipole moment of a light-emitting molecule and estimate order parameter based on the distribution of molecular orientations | Electronics, display and lighting, energy harvesting | MD (Desmond) |
KMC charge mobility | Compute charge carrier mobility via kinetic Monte Carlo simulations based on charge transport parameters | Electronics, display and lighting, energy capture | MD (Desmond), QM/MM (QSite) |
Nanoreactor | Survey reaction products initial reactant(s) via semiempirical metadynamics, energy refinement, and sorting | Catalysis, chemical synthesis, degradation and processing | xTB-based metadynamics |
Amorphous dielectric properties | Compute dielectric properties of molecular and polymeric materials in the condensed phase | Electronics, display and lighting, automotive and aerospace, energy storage, semiconductor, communications | MD (Desmond) |
Molecular materials descriptors | Generate molecular materials descriptors for machine learning applications | Machine learning for diverse materials applications | Semiempirical, QM |
Active learning optoelectronics | Generate machine learning prediction models for optoelectronic properties of molecular materials using quantum chemistry generated training sets and an active learning algorithm | Electronics, automotive and aerospace, energy storage and capture, semiconductor, communications | ML |
GA optoelectronics | Genetic optimization to evolve chemical solutions in a multiobjective property space | Electronics, display and lighting, energy storage and capture; general materials R&D | ML |
Workflow title . | Description . | Key areas of application . | Other simulation method(s)? . |
---|---|---|---|
QM multistage | User-defined multistage molecular quantum mechanical calculations | General materials R&D | No |
Optoelectronics | Predict major optoelectronic properties of molecular materials | Electronics, display and lighting, energy storage and capture | No |
Bond and ligand dissociation | Predict bond dissociation energies varying charge and electronic state to assess stability or dissociative degradation of materials | Electronics, display and lighting, packaging, automotive and aerospace | No |
Beta elimination | Predict dissociative degradation energy via beta elimination mechanism | Semiconductor processing | No |
Excited state analysis | Assess charge transfer characteristics of molecular compounds and clusters at the electronically excited states | Electronics, display and lighting, energy harvesting | No |
TST rate calculation | Predict chemical reaction rate vs temperature based on the transition state barrier predictions | Catalysis, chemical degradation and failure, additive processing and curing | No |
Band shape | Predict band shape of absorption and emission spectrum accounting for vibronic coupling (Franck–Condon Herzberg–Teller) | Electronics, display and lighting, energy capture, dyes and colorants | No |
Auto reaction workflow | Automated high-throughput predictions for catalytic and non-catalytic reactivity and selectivity with physics-based modeling | Homogeneous catalysis, energy storage and conversion, materials synthesis and processing, aging and degradation | Macromodel, MM, xTB, semiempirical |
Transition moment order parameter | Compute transition dipole moment of a light-emitting molecule and estimate order parameter based on the distribution of molecular orientations | Electronics, display and lighting, energy harvesting | MD (Desmond) |
KMC charge mobility | Compute charge carrier mobility via kinetic Monte Carlo simulations based on charge transport parameters | Electronics, display and lighting, energy capture | MD (Desmond), QM/MM (QSite) |
Nanoreactor | Survey reaction products initial reactant(s) via semiempirical metadynamics, energy refinement, and sorting | Catalysis, chemical synthesis, degradation and processing | xTB-based metadynamics |
Amorphous dielectric properties | Compute dielectric properties of molecular and polymeric materials in the condensed phase | Electronics, display and lighting, automotive and aerospace, energy storage, semiconductor, communications | MD (Desmond) |
Molecular materials descriptors | Generate molecular materials descriptors for machine learning applications | Machine learning for diverse materials applications | Semiempirical, QM |
Active learning optoelectronics | Generate machine learning prediction models for optoelectronic properties of molecular materials using quantum chemistry generated training sets and an active learning algorithm | Electronics, automotive and aerospace, energy storage and capture, semiconductor, communications | ML |
GA optoelectronics | Genetic optimization to evolve chemical solutions in a multiobjective property space | Electronics, display and lighting, energy storage and capture; general materials R&D | ML |
An illustrative example of a Jaguar-based simulation solution is the Optoelectronics module from the Materials Science Suite (see Fig. 7). The module is designed to automatically, efficiently, and easily compute key optoelectronic properties that are widely considered as important design factors for molecular materials used in organic electronics and energy storage devices. The list of properties includes oxidation and reduction potentials, charge reorganization energies, corrected highest-occupied and lowest-unoccupied molecular orbital (HOMO/LUMO) energies, optical absorption and emission peaks, singlet–triplet energy gap, and molecular dipole. Making predictions for these properties requires multiple individual quantum chemical calculations and the coordination, monitoring, and curation of simulations for each individual chemical system.
Robust workflow integration enables reliable, efficient, and accessible execution of large volume property predictions with Jaguar. Multiple studies have been reported, in which the Optoelectronics module and Jaguar were applied to screen libraries of materials useful in optoelectronic applications.344–346,379–389 Automation, multistep integration, and job management have enabled Jaguar to expand its impact from retrospective analysis of a few systems, to prospective analysis of larger collections of chemical systems in the search for new materials solutions.
Work by Matsuzawa and co-workers344 showcases the use of a Jaguar-based workflow on modern-day cloud computing resources to carry out QM simulation at an unprecedented scale. Simulation driven innovation is compute-limited, with more resources allowing more simulations, making it possible to screen larger chemical spaces. Cloud computing enables chemical innovation on the megascale. In that study, the authors performed massive screening of organic semiconductor materials for application as ink-jet printed radio frequency ID tags (RFID). Charge transport through weakly interacting molecular solids occurs through a thermal hopping mechanism (Marcus theory390,391) with the reorganization energy being the dominant controlling parameter. Dipole–charge interactions can lead to trapping in the solid phase, thus reducing charge mobility. Exemplary organic materials with high charge mobility will have low reorganization energy and small permanent dipole. The calculation of the controlling parameters for organic material charge mobility requires multiple Jaguar calculations for neutral and charged species in their optimized and in a fixed geometry. From a library of over 7 × 106 candidate compounds, 250 000 candidates were selected and analyzed using the Jaguar simulation workflows. This work involved over 3 × 106 DFT calculations, carried out in 16 days on the cloud over 9336 cores, representing 52 years of compute time. The candidate library was down-selected, identifying the Pareto optimal charge transport compounds for further study. This selection is represented by systems in the lower left quadrant of Fig. 8. Single molecule gas-phase simulations can provide critical insights into the intrinsic properties of a molecular system and reveal important fundamental structure–property relationships. However, in practical applications, these materials exist in a condensed phase environment, which for flexible systems leads to a distribution of low energy conformations, and there are also environmental effects on electronic structure and reactivity. Conformational sampling, simulated deposition, and morphology equilibration is a necessary initial simulation phase, which normally employs force field-based methods before property calculations are computed with Jaguar. These multiscale simulations are used to investigate the impact of thin-film morphology on optoelectronic properties of materials,347,392 to assess thermomechanical properties coupled with chemistry,393,394 or to analyze the solvation effects of molecular materials.395,396
An example of a multistep, multiphysics simulation protocol to predict a key multiscale materials property is the calculation of solid-state charge mobility for molecular materials.398,399 As shown in Fig. 9, the calculation of bulk mobility for an amorphous organic semiconductor requires multiple simulation steps. They are realized through a combination of classical MD simulations (effected by Desmond MD) to prepare the 3D condensed phase morphology and mixed method QM/MM simulations (performed by QSite) to calculate transport parameters such as site energy and reorganization energy. Then, quantum chemical calculations are performed with Jaguar on individual dimer pairs extracted from the 3D structure. Finally, explicit charge hopping trajectories are simulated with a kinetic Monte Carlo method,400 which calculates the overall charge mobility in the amorphous condensed phase.
One of the most widespread uses of quantum chemical simulation is in studying chemical reactivity (e.g., Refs. 401–404). Using Jaguar to locate, characterize, and compare critical points defining a reaction pathway enables the prediction of relative energetics with accuracy comparable to experimental measurements. This process provides atomistic details of reaction mechanisms unavailable by other means (see examples in Refs. 405–412). Jaguar is a critical tool when it comes to applications in materials science. It helps improve the understanding of structure–property relationships, provides valuable details about growth and failure chemistries, and furnishes data required for process optimization and control.
Once the structures on the reaction path and the corresponding energies are calculated for a reference reaction of interest, the structures along the reaction coordinate can be similarly modified to evaluate the effect on the controlling energetics (kinetics and thermodynamics) with the purpose of enhancing or hindering that specific reaction channel. The automated reaction workflow (AutoRXNWF) in Schrödinger’s Materials Science Suite was developed for high-throughput chemical reaction and reactivity optimization with QM methods provided by Jaguar. See Fig. 10 for an illustration. Such an automated optimization is a powerful tool, which enables molecular catalyst design, non-catalytic reaction optimization based on stoichiometric additives, as well as reactivity and selectivity screening in a series of related reactions in a massively parallel fashion.
The following input data are required to set up and launch an AutoRXNWF calculation: (i) knowledge of the mechanism of the reference reaction, (ii) a predefined in silico library of “R” group substituents to be used in critical point structure enumeration or fragment swapping, and (iii) defined level of DFT model chemistry (the combination of exchange–correlation functional and basis set) to be used in the comparative analysis. The specific details of AutoRXNWF are presented in Fig. 10 for the example case of molecular catalyst design. In this scheme, the knowledge of the mechanism of the reference reaction is supplied in the form of 3D structures of selectivity-determining transition states and/or catalytic cycle participants.
The workflow offers optional conformational search and geometry pre-optimization with classical force fields and/or extended tight-binding. Molecular mechanics models called from MacroModel and the semiempirical method xTB are used for these purposes. The workflow then invokes a PS DFT Jaguar calculation for the final analysis.
The output is a set of Boltzmann-averaged free energy profiles calculated for each catalyst specified initially via R-group enumeration. Selectivity of each catalyst is determined kinetically via the free energy difference between the selectivity-determining transition states. If the information on 3D structures of the catalytic cycle is supplied, turnover frequency (TOF) is automatically calculated and printed on the basis of the exact and approximate formula of the energetic span model approximation.
X. ROLE OF MACHINE LEARNING
The recent explosive progress in application of ML techniques to various scientific and technological topics has made a strong impact on theoretical chemistry as well.19,205,413–422 The traditional juxtapositions of QM and MM methods,222,423–425 as well as QM and semiempirical methods,218,229,425,426 have been disrupted or blurred by the ever-increasing presence of the ML methods.417,427–429
A particularly exciting progress is anticipated in development of machine learned density functionals418,430–432 and MLPs19,205 (see more discussion and citations in Sec. V). Recently, large language models (LLMs) have demonstrated the potential to transform the way we do chemical research.433–436
Against this rapidly evolving backdrop, several ML-related research directions have been pursued by us. First, the QRNN model, which is an MLP that was mentioned above, has been incorporated into Jaguar as a level of theory. Several variants of QRNN are available. One of the more accurate ones is a transfer learning model, which was trained on DFT but uses a part of the GFN2-xTB229 as a correction. Implicit solvent effects can be optionally imported from the GFN2-xTB model. QRNN currently supports only eight “organic” elements: H, C, N, O, F, P, S, and Cl.
While this MLP is only trained on Jaguar calculations and is a standalone code, which has no connection to the main Jaguar codebase, it was logical to begin to treat it more or less like DFT or MP2 from the perspective of the user. QRNN energies and analytic gradients can be used for single point optimizations and geometry optimizations. At this time, only numerical second derivatives are available, but we have plans to implement analytic QRNN second derivatives soon. Of course, many common QC properties such as Mulliken charges or polarizabilities are disabled when the level of theory is set to QRNN. The treatment of QRNN on an equal footing with other traditional QC methods allows us to inherit all the machinery developed for geometry optimizations, transition state searches, coordinate scans, and even multistep workflows for much faster calculations that outwardly look like QC calculations.
Interestingly, the exposure of QRNN as a level of theory in Jaguar revealed a number of Jaguar code inefficiencies, which would be irrelevant in DFT and other traditional QC calculations, but which became noticeable when the bottleneck calculation (such as SCF iterations) disappeared. Steps like writing data to disk or allocating large arrays (no longer needed for an MLP) begin to compete with MLP calls. Currently, we are making sure that these inefficiencies are eliminated when QRNN is used. Another problem that had to be solved in conjunction with calling QRNN directly from Jaguar is limits that had been put in place for traditional QC calculations, such as restrictions on the number of atoms. Years ago, when the DFT code was written, the developers did not imagine that anyone would be performing a QC-like calculation on a system that exceeds one thousand atoms. When this restriction was lifted, we were able to apply QRNN to proteins, for example, to a globulin protein with the protein data bank (PDB) code 3AUP with about 22 418 atoms shown in Fig. 11. Note that the molecular structure deposited to the database contains about 12 000 heavy atoms and that hydrogen atoms were added to it by the Schrödinger software. The single point energy calculation on this particular system is likely not of much use other than serving as an illustration of the feasibility of obtaining QRNN energies with Jaguar for systems of such size. The single point energy calculation with the direct-learned QRNN method finished in 7 min on a laptop computer (only CPUs were used). Evidently, our current implementation of QRNN energy calculation within Jaguar is not as computationally fast as MM force fields, and some additional optimization of the code that would remove some unnecessary “overhead” operations is possible. However, even the demonstrated efficiency is still likely to find applications in problems for which DFT is computationally intractable and to which some force fields are not even applicable (such as reactive chemistry or tautomerization). Many types of calculations on proteins require molecular dynamics simulations to produce statistically meaningful results. However, applications to reactive chemistry might be possible under a “frozen bath” assumption,437 that is, the chemical covalent transformation is assumed to be much faster than the conformational movement of the protein, and working with a single conformation is thereby justified. Note that GFN2-xTB can be similarly called as a “level of theory” from Jaguar. However, in our experience, and as implied in the original paper,229 GFN2-xTB cannot be used on molecules as large as the 3AUP protein. When called either from Jaguar or from its native executable file, GFN2-xTB struggles to converge single point energies for proteins as small as about 3000 atoms, and even then one GFN2-xTB SCF iteration takes several minutes.
A second ML application prominently featured in the Jaguar package is known as Jaguar Timer. This NN model estimates the computational time cost to complete a Jaguar calculation given an input structure, Jaguar input file settings such as level of theory and basis set, as well as properties to be calculated. As there seems to be no slacking of computational demands in computational quantum chemistry projects, Jaguar Timer will allow Jaguar users not only to better plan time and computational resources before launching the calculation, but also to load balance their computational resources for optimal efficiency. Jaguar Timer, trained on over 17 000 individual Jaguar calculations, is capable of estimating the cost of DFT single-point optimizations and geometry optimizations. In addition, Jaguar Timer accounts for a wide range of settings, such as the number of CPU threads, DFT functional type, implicit solvent model used, PS and analytic integral calculations, unrestricted and restricted wave function references, and the need to compute vibrational frequencies. If the calculation settings are incompatible with the trained model (for example, no training has been done on RI-MP2 calculations yet), or fall outside the validated limits, Jaguar Timer will indicate that a prediction cannot be made.
A similar ML model, which predicts timings in QC calculations, was announced several years ago.438 This model of von Lilienfeld and co-workers has conceptual overlaps with Jaguar Timer, but it focuses on somewhat different practical aspects of QC calculations. For instance, among all DFT functionals, it covers only B3LYP, whereas Jaguar Timer works on all the functionals available in Jaguar.
The current performance of Jaguar Timer on a test set is depicted in Fig. 12. This figure shows what in practice might be called a close agreement between the predicted and observed walltimes over an expansive range of types of calculations and time scales—from 1 minute up to 32 h. The vast majority of the predictions in the test fall within the “off by 4” boundary, which means that the timing was mispredicted by up to a factor of 4 (either underestimated or overestimated), which in our estimation should be sufficiently accurate to make decisions in practical projects involving QC calculations.
We anticipate numerous applications of Jaguar Timer in our software. Not only the timings of individual Jaguar calculations can be estimated, but also those of the Jaguar workflows in which QC calculations are a computational bottleneck. The improvements to the model, particularly attempts to improve its prediction accuracy, are currently underway. A separate detailed presentation of Jaguar Timer is planned for the future.
ML approaches have found their way into several other Jaguar features that we are currently developing. For example, as part of the development of the Macro-pKa workflow, we have used an ML method for adjusting DFT-computed “raw pKa” values to bring them to a closer agreement with experimental pKas. Our methodology is similar to that used in the ML-based empirical pKa predictor Epik,216 except that, importantly, our ML training and input to the ML model involve QC data produced from atomistic 3D models. The results will be presented elsewhere.
XI. CONCLUDING REMARKS
We have given a comprehensive overview of the quantum chemistry package Jaguar. In addition to the scientific methods, features, and tools, which characterize the types of calculations the program is capable of, we also covered ancillary topics: the development methodology and practices, as well as the user interface. As we have shown, the outstanding features of Jaguar, in comparison with other QM packages, are: the intensive use of the PS technique, which accelerates QC methods, the availability of workflows, multistep and multiphysics solutions targeting specific chemical properties, the prominent role of the graphical user interface, and a dedicated support of and integration into life science and materials science areas of research. What remains to be mentioned is a couple of points that were not discussed in the main text and that have significance for the possible future development of Jaguar.
At present, QC methods and workflows are frequently applied to large and flexible molecules, and that is why all aspects of scientific research related to conformations acquire critical importance, if accurate results are to be expected. In our opinion, it should become the responsibility of the developers of practical QC workflows and protocols to take care of all computational steps that deal with conformations. This includes, but is not limited to: generation and filtering of conformations, accounting for inter- and intramolecular hydrogen bonds, the effects of explicit solvent, and producing conformationally averaged results. It is not enough to merely declare success for a design of a QC method, which yields rapid and accurate computational predictions when applied only to small or rigid molecules. It is also necessary to demonstrate how to use the method and what its performance will be in the face of significant conformational flexibility.
Finally, the truly marvelous capabilities of LLMs hold the promise of developing a transformative new type of user interface for atomistic simulations, in addition to the familiar command line and GUI. This new interface might resemble a “chat,” in which the user would send commands written in natural human language. In this paradigm, the human user would act as a reviewer of the input and output generated by the agent, which would operate QC software directly. A large number of associated questions remain, and they will be exciting to solve in the next few years, possibly within the Jaguar program package.
ACKNOWLEDGMENTS
We would like to acknowledge valuable contributions to Jaguar by the following former and present colleagues: Dale A. Braden, Christian Cortis, William A. Goddard III, Burnham H. Greeley, Jean-Marc Langlois, Robert B. Murphy, Dean M. Philipp, W. Thomas Pollard, Murco N. Ringnalda, Haoyu S. Yu, and Linda Yu Zhang.
AUTHOR DECLARATIONS
Conflict of Interest
All authors have financial interest in Schrödinger, Inc., and its products.
Author Contributions
All authors contributed to this work.
Yixiang Cao: Methodology (equal); Software (equal); Writing – original draft (equal). Ty Balduf: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Michael D. Beachy: Software (equal); Writing – review & editing (equal). M. Chandler Bennett: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Art D. Bochevarov: Methodology (equal); Software (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Alan Chien: Software (equal). Pavel A. Dub: Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Kenneth G. Dyall: Software (equal). James W. Furness: Software (equal); Writing – review & editing (equal). Mathew D. Halls: Conceptualization (equal); Methodology (equal); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal). Thomas F. Hughes: Software (equal). Leif D. Jacobson: Software (equal). H. Shaun Kwak: Methodology (equal); Supervision (equal); Writing – original draft (equal). Daniel S. Levine: Software (equal); Writing – review & editing (equal). Daniel T. Mainz: Software (equal). Kevin B. Moore III: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Mats Svensson: Methodology (equal); Writing – original draft (equal). Pablo E. Videla: Software (equal); Writing – review & editing (equal). Mark A. Watson: Software (equal); Supervision (equal); Writing – original draft (equal). Richard A. Friesner: Conceptualization (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.