In this contribution to the special software-centered issue, the ORCA program package is described. We start with a short historical perspective of how the project began and go on to discuss its current feature set. ORCA has grown into a rather comprehensive general-purpose package for theoretical research in all areas of chemistry and many neighboring disciplines such as materials sciences and biochemistry. ORCA features density functional theory, a range of wavefunction based correlation methods, semi-empirical methods, and even force-field methods. A range of solvation and embedding models is featured as well as a complete intrinsic to ORCA quantum mechanics/molecular mechanics engine. A specialty of ORCA always has been a focus on transition metals and spectroscopy as well as a focus on applicability of the implemented methods to “real-life” chemical applications involving systems with a few hundred atoms. In addition to being efficient, user friendly, and, to the largest extent possible, platform independent, ORCA features a number of methods that are either unique to ORCA or have been first implemented in the course of the ORCA development. Next to a range of spectroscopic and magnetic properties, the linear- or low-order single- and multi-reference local correlation methods based on pair natural orbitals (domain based local pair natural orbital methods) should be mentioned here. Consequently, ORCA is a widely used program in various areas of chemistry and spectroscopy with a current user base of over 22 000 registered users in academic research and in industry.
I. INTRODUCTION
The electronic structure package ORCA has been developed since the late 1990’s mostly in the group of the authors but with important contributions from highly esteemed collaborators around the world. Judging from citation statistics (Fig. 1), ORCA has grown to be one of the most used quantum chemistry programs worldwide with more than 22 000 registered users (January 2020)1 and a very rapidly growing user base. From the start, ORCA has been freely available for academic researchers, a feature that we always considered as essential and non-negotiable. However, given the very rapidly growing demand from industry, the company FAccTs was founded (Dr. Riplinger),2 which is licensing ORCA to commercial users. In the process, it was ensured that ORCA remains freely available to academic users in long term.
Articles citing ORCA per calendar year. The citations were counted using data from the cited by feature of Google Scholar (January 27, 2020) for the main ORCA references.
Articles citing ORCA per calendar year. The citations were counted using data from the cited by feature of Google Scholar (January 27, 2020) for the main ORCA references.
ORCA has always been a package that has been driven by desire for practical applications to real-life chemistry, biochemistry, and materials sciences, as opposed to a package that is mostly used to test new theories in a “proof of principle” fashion. Hence, ORCA very much is a “hands-on” package. A major driving force for the development of ORCA was the desire to solve concrete chemical problems where the questions to be addressed mostly originate from the experiment. This includes our own experimental problems as well as those of numerous collaborators ranging from biochemists and synthetic chemists to hard-core spectroscopists. In its development, versatility, efficiency, reliability, and ease of use have been prominent aspects. As will be elaborated below, the program development has followed strict guidelines from the start. Consequently, ORCA features a highly coherent code base that lends itself well to future development.
In this short article, we provide an overview of the features and strengths (and shortcomings) of ORCA in its current state (version 4.2, 2020/Q1)
II. HISTORY, PHILOSOPHY, AND MISSION OF ORCA
Precursor versions of ORCA were written in 1995 as part of the Ph.D. thesis of Neese3 using Turbo Pascal as the programming language. At the time, the biochemical problem to be solved was to elucidate the structure of a transition metal active site in an enzyme (nitrous oxide reductase).4 This active site had known spectroscopic properties but no crystal structure was available. Since the site was paramagnetic, there was rich information available from electron paramagnetic resonance (EPR)5 as well as from optical spectra [UV/vis absorption, circular dichroism (CD), and magnetic CD (MCD) spectroscopy].4 It was clear from the start that these spectroscopic properties cannot be interpreted correctly without the help of quantum mechanics. However, in those days, quantum chemistry was largely a science of closed-shell molecules, density functional theory (DFT) was still in its infancy in chemistry, and the computational resources available to graduate students at the University of Konstanz were far too restricted to allow ab initio [not even Hartree–Fock (HF)] methods to be applied to enzyme active sites. Hence, a pragmatic approach was chosen and ORCA’s predecessor started its life as a semi-empirical package. The main method was the spectroscopic version of the intermediate neglect of differential overlap (INDO/S6) since it had shown excellent results in spectroscopic applications. In the first version of the program, the INDO/S method was implemented for closed- and open-shells and already featured a fairly sophisticated restricted open-shell HF (ROHF) code that could handle far more than only high-spin cases.7 The second feature that was added was a configuration interaction (CI) program that was specialized to S = 1/2 states and single excitations. Since magnetic properties were at the center of interest, the program could also perform spin–orbit coupled (SOC) CI calculations. From the “relativistic” eigenstates, optical and magnetic spectra could be calculated with excellent success.3–5,8
The program was largely a tool to solve a specific problem for a Ph.D. project. However, as soon as Neese started his postdoctoral stay in the laboratory of Edward Solomon (Stanford university), it became clear that it was also useful in other contexts. Since the new project involved open-shell iron centers, the CI program was not general enough anymore and a multi-reference CI (MRCI) module based on Rumer spin-coupled functions was added.9 It could also handle SOC.10 A general theory for the EPR g- and D-tensors11 and a general theory of MCD spectroscopy12 were developed and implemented at the MRCI level.11,12 The theory stands unchanged today and has served as a basis for the interpretation of a host of related phenomena.
While the existing program was successfully used in application studies13,14 by a number of labmates at the time, it quickly became clear that DFT would take chemistry by storm and that semi-empirical quantum chemistry would rapidly loose acceptance. Motivated by the success of the older semi-empirical Turbo Pascal program, it was decided to create a more powerful and more general electronic structure program. The Turbo Pascal program mostly served as a vehicle for self-education and for solving imminent research problems. Since it was also clear that Turbo Pascal was a dying language with no future in high-performance computing, Neese decided to start from scratch and use C++ as a programming language. Thus, a complete rewrite of the program started after the postdoc time. Hence, the birth date of the ORCA project can be precisely given as September 1, 1999.15
One frequently asked question in this regard is the origin of the name ORCA or whether it is an undisclosed acronym. The answer is that there is no acronym that ORCA stands for and we also did not succeed in reverse engineering one. At the time, the decision for using “ORCA” as the program’s name (taken while being on a whale watching trip) was solely based on the desire to make the name short and strong and to avoid “MOL” in it since there are so many other packages that use “MOL” in their name.
From the start, ORCA has adhered to a number of strictly enforced programming rules that have greatly helped to keep the code as clean and as manageable as a program consisting of over 2 × 106 lines of hand-written code can be. Given by frustrations encountered in trying to even compile and install other program packages, it was decided to not allow mixed-language programming, to not allow any operating-system or hardware specific code (hence, no shell scripts), to not allow any crucial dependence on external code of uncertain future, and to minimize the dependence on external libraries. Hence, the only libraries that were (and are) allowed into ORCA are the basic linear algebra subroutines (BLAS), the linear algebra package LAPACK, and the message passing interface. In order to not be critically dependent on any specific implementation details of these three crucial libraries, a software layer has been added on top to ensure that no BLAS, LAPACK, or MPI routine is ever called directly from production level parts of the program. Rather, these libraries are interfaced in one and only one place in the code to ensure that they could be readily swapped against alternative versions, even if these have different calling conventions. It has also been considered to be essential to extensively document all written code (in English) and to provide test jobs for every feature of ORCA to ensure the program’s integrity at all times. These jobs (meanwhile, there are thousands of them) are executed by an automated procedure every night and the correctness of the results is monitored automatically. This basic philosophy has been followed throughout the ORCA development.
For several years, the code was mostly further developed by Neese after taking a position as group leader at the Max Planck Institute (MPI) for radiation chemistry. There, DFT and time-dependent DFT (TD-DFT)16 were added to the program in order to satisfy requests from several group members.
A crucial part of any quantum chemistry program is its integral package. The original ORCA integral package was written following the outstanding review article of Helgaker and Taylor17 and used the elegant McMurchie–Davidson scheme.18 During a visit to the research group of Professor Schaefer, Neese was introduced to Professor Valeev, who was a graduate student in Athens at the time. He had developed the libint library that provided access to a large variety of integrals.19 Libint turned out to be much more efficient than the ORCA intrinsic integral library. Since the latter was written as a library in an object-oriented fashion, it proved to be an easy task to adapt this code to work with other integral libraries, including libint. The friendship between the ORCA developers and Professor Valeev continues to this day and has resulted in a range of collaborative research projects.20–30 Since this article was first submitted, we have developed a new integral library (called SHARK) that significantly improves the performance of integral generation and digestion and also features an optimized code to handle generally contracted basis sets. This new functionality will be made publicly available in the upcoming release of ORCA 5.0.
During the years 2001–2005, a research group started to emerge at the Max Planck Institute, and furthermore, developers started to work on ORCA with the most noticeable contributions coming from Wennmohs who has remained the technical program director ever since. However, the decision to turn ORCA into a generally available, general purpose quantum chemistry package was not made until Neese was appointed to the chair of theoretical chemistry in Bonn in 2006. Ever since, there has been a growing group of developers and collaborators who have pushed the frontiers of ORCA. Very noticeable contributions were made by Riplinger, who implemented the first robust geometry optimizer and is responsible for all of ORCA’s quantum mechanics/molecular mechanics (QM/MM) capabilities.31 Important contributions to ORCA have also been made by the group of Professor Grimme, who has largely taken responsibility for the further DFT implementation to ORCA.20,32–40 Altogether, we estimate that about 200–300 man years of development time have gone into ORCA in our research group alone.
As mentioned above, the mission of ORCA has always been to be a real-life problem solver for chemistry. Hence, while there certainly have been “quick and dirty” implementations in ORCA, these have usually been removed after a final strategy had been decided on. It is an important part of the development philosophy to implement features completely (e.g., closed and open-shell versions and common integral approximations available throughout) and to optimize the features that are implemented to the largest extent. New methods are usually only considered if they promise to allow to either tackle completely new problems that could not be tackled before or offer concrete and significant computational, technical, or conceptual advantages over the best available alternative. While this does not necessarily imply a limited feature set, this mindset has greatly helped select research subjects.
While today ORCA is a true general-purpose program package,41,42 there has always been an emphasis on open-shell transition metals, spectroscopy, and magnetism. In our opinion, the associated problems belong to the most complicated electronic structure problems in chemistry. Hence, methods that work in these areas are expected to perform well pretty much throughout chemistry.
It goes without saying that we fully appreciate that other philosophies for quantum chemical program development are every bit as valid as ours. However, we hope that stating our guidelines clearly may be of help to younger researchers who are just starting their larger long-term development projects, even if they object to our point of view.
III. ORCA FEATURE OVERVIEW
In terms of its feature set, by now, ORCA is certainly one of the most versatile quantum chemistry packages in existence. It features extensive capabilities in HF, DFT, single-reference correlation, and multi-reference correlation methods. ORCA also contains semi-empirical quantum chemical methods (NDO methods such as MNDO as well as the more recently developed XTB method by the Grimme group) and even features classical force field calculations. Multilevel43,44 and embedding approaches45 are available to bridge the domains of applicability of the many available methods. A large number of methods are available to calculate spectroscopic properties. In addition, most of these methods can be combined with an extended set of “bells and whistles” that help in solving actual problems.
A. Basis sets, integrals, and integral approximations
ORCA (mostly) uses Gaussian basis sets and features all-electron and effective core potential (ECP) based basis sets. However, the program has some provision to handle Slater basis sets such as those used in semi-empirical calculations. A large library of basis sets is stored internally in the program. However, basis sets can also be read in Gamess-US format (as provided by the EMSL basis set library46). ORCA can handle segmented contraction as well as generally contracted [atomic natural orbital (ANO47)] basis sets,20 with the latter functionality being somewhat limited to common one-electron and electron repulsion integrals. A special family of basis sets has been developed for ORCA, which was given the name “segmented all electron relativistically contracted” (SARC) basis sets.48–53 They are available in conjunction with scalar relativistic Douglas–Kroll–Hess (DKH) and zeroth-order regular approximation (ZORA) based contractions for essentially the entire periodic table and offer an excellent price–performance ratio.
As mentioned above, most standard integrals and the integrals that occur in F12-methods are generated by the libint library. Numerous other integrals, such as one- and two-electron spin–orbit,54 two-electron spin–spin,55 relativistic property integrals,56 or the integrals that occur in gauge including atomic orbital (GIAO) based calculations,57 are generated by ORCA’s integral package.
There are a number of integral approximations that have been consistently implemented throughout the ORCA program. One of the two most important approximations is the resolution of the identity (RI) approximation.58–60 In this approximation, a product of basis functions is fitted to an auxiliary basis set,
As shown by Vahtras and co-workers,61 the expansion coefficients are best determined through minimization of the self-repulsion of the residual,
Minimization immediately leads to a linear equation system,
With two-index repulsion integrals,
and with three-index repulsion integrals,
The RI-approximation speeds up electronic structure calculations in a variety of ways. First, one only needs to compute, handle, and potentially store two- and three-index quantities, which are far more convenient than computing, handling, and storing four-index-quantities such as the original electron repulsion integrals. Second, the computational effort to compute two- and three-index repulsion integrals is far lower than the computation of four-index-repulsion integrals. Third, the RI approximation amounts to an approximate factorization of the original four-index integrals. Inserted into the expressions of interest, this may reduce the scaling of computationally demanding steps by one order of magnitude. For example, the formation of the Coulomb matrix becomes
with
While the first line in Eq. (6) involves formal O(N4) scaling, the steps implied by the following lines feature never more than O(N3) scaling. The last line implies the solution of a linear equation system that, numerically, is most readily accomplished via the Cholesky decomposition of the positive definite symmetric matrix V. Similar factorizations hold for any kind of “Coulomb type” summations, where the double sum includes the pair of basis functions carrying the same electron label. With suitable integral pre-screening, such summations scale asymptotically as O(N2) with a very low prefactor. Hence, RI based Coulomb calculations can be up to two orders of magnitude faster than the exact analytic ones. An even larger speedup can be obtained by moving some of the compute-intensive work out of the innermost loop as realized in the “Split-RI-J” algorithm that is the actual variant used in ORCA.62
The error introduced by the RI approximation is bounded63,64 and is smooth such that the errors in energy differences tend to be very small (typically <0.1 kcal/mol), which explains why the RI approximation has become a mainstay of modern quantum chemistry.
In correlated calculations, it is usually advantageous to employ a different factorization of the form65
where (pq|rs) is a two-electron repulsion integral over arbitrary (orthonormal) orbitals p, q, r, and s. Thus, a two-electron integral can be individually approximated through a summation over the “orthogonalized” (with respect to the Coulomb metric V) auxiliary basis set,
One great advantage in such calculations is that the auxiliary index does not need to be transformed from the atomic orbital (AO) into the molecular orbital (MO) basis since either AO pairs or MO pairs can be fitted to the auxiliary basis set. Inserted into various terms occurring in correlation theories, the RI approximation often drastically reduces the prefactor and storage requirements but typically does not reduce the formal scaling with system size.66
In addition to the RI approximation for repulsion integrals, ORCA also supports the RI approximation for more demanding and complicated integrals such as two-electron spin–orbit coupling,54 two-electron spin–spin coupling,55 and gauge including atomic orbital (GIAO) integrals.57
The construction of appropriate auxiliary basis sets is not trivial. Excellent optimized basis sets for Coulomb calculations have been developed by Ahlrichs, Weigend, and co-workers. Their all-purpose def2/J auxiliary basis67,68 for Coulomb fitting is of excellent quality. If the HF exchange is also fitted to the auxiliary basis set, the demands are higher leading to the larger and orbital basis specific auxiliary “/JK” basis sets.69,70 Finally, correlation calculations have yet different requirements since occupied/virtual and possibly virtual/virtual orbital pairs are fitted, and to this end, the orbital basis specific “/C” auxiliary bases have been developed.71 ORCA can simultaneously handle any number and nature of auxiliary basis sets, for example, a /JK basis for the SCF part of a calculation and a “/C” basis for a subsequent MP2 calculation. In order to be able to always take advantage of the RI approximation, we have developed an automatic procedure, “AutoAux,”72 that provides a general-purpose auxiliary basis set that is adapted to the underlying orbital basis set through the analysis of its exponent and angular momentum structure. These automatically generated auxiliary basis sets are less efficient than the purpose specific optimized ones but still lead to very significant reductions of computational effort relative to the fully analytic calculations.
The second widely used approximation throughout ORCA termed “chain of spheres exchange” (COSX)73 is intellectually based on the elegant work of Friesner and co-workers74–77 in the framework of the pseudo-spectral methodology. In the chain of spheres (COS) approximation, a given four center repulsion integral is approximately factorized as73
Here, the sum is over grid points g on a three-dimensional molecular integration grid and the quantities to be evaluated on the grid are
Here, wg, rg are the weights and positions of the grid points. The COSX algorithm has a number of attractive features. On the one hand, the analytic integration over the Coulomb singularity avoids the slow convergence that would plague a fully numerical evaluation of the double integral. On the other hand, the numeric integration exposes locality very nicely. Hence, surrounding each basis function by a sphere outside of which it is effectively zero, one can very efficiently screen out negligible quantities since spheres do not inter-penetrate, thus leading to linear or low-order scaling. Furthermore, the numeric integration parallelizes with high efficiency.73 This approximation has also been implemented for other integral types, such as SOC integrals.54
While the RI approximation is most effective for Coulomb type integrations or summations, COSX is strongest for exchange type quantities. For example, the HF exchange matrix in this approximation becomes73
The actual evaluation is, of course, more complex due to the screening steps involved. With appropriate pre-screening, the algorithm is asymptotically linear scaling with the system size. From the definitions, it is clear that the COSX algorithm slightly breaks the symmetry of the exchange matrix. However, in practice, this does not appear to be problematic. A number of improvements can be made to the algorithm78,79 including the condition that the overlap matrix is exactly integrated (“overlap fitted COSX”).78 The actual evaluation is very efficient since the analytic integrals as well as the values of the basis functions on the grid can be efficiently generated on the fly. Unlike the RI-J approximation, the error of the COSX evaluation is not bounded and not smooth (like any numerical integration scheme). Hence, one needs to carefully monitor the grid induced errors.
The combinations of the RI Coulomb (RI-J) and COS exchange (COSX) have been called RIJCOSX algorithm,73 and it is a special and widely used feature of ORCA. Both approximations have been implemented throughout the program not only in SCF energies,80 gradients,73 and Hessians23 but also in coupled perturbed SCF and time-dependent DFT,81 MP236,82 as well as SCS-MP3,83 and coupled-pair and coupled cluster84–86 calculations among other places.
B. Single-reference self-consistent field methods
ORCA features closed-shell, unrestricted, and restricted open-shell versions of HF and DFT self-consistent field (SCF) methods. In terms of DFT, a wide variety of functionals is available. The latest ORCA version is compatible with the libxc library87 that features about 400 functionals including all established and modern functionals in common use. Local-, generalized gradient approximation (GGA), meta-GGA, and double hybrid functionals (DHDFs)88 as well as range separated versions of these functionals are supported by ORCA. ORCA has, in fact, been among the earliest packages to adopt DHDFs and reported the first analytic DHDF gradient33 (note that Ref. 33 was the first to use the term double hybrid functional in this context. The term had been previously used by Truhlar and co-workers in a different context,89 but this had slipped our attention at the time) as well as the extension of DHDFs to excited states.32
All SCF methods can be run in an integral direct (integrals recalculated on the fly) or conventional (integrals stored on disk) way. DFT XC quantities are handled by a highly optimized numerical integration approach that follows the guidelines laid out by Becke90 and further refined by Treutler and Ahlrichs.91
C. Single-reference correlation methods
ORCA features a large number of single-reference correlation methods including various versions of the coupled-electron pair approximation (CEPA), the quadratic CI (QC), and, of course, coupled cluster theory (CC).92 These include implementations featuring single-, double-, and perturbative triple-excitations [QCISD(T), CCSD(T)] for closed and open shell systems. These implementations show standard O(N6) (CCSD, QCISD, CEPA) or O(N7) scaling with well optimized prefactors due to a strictly matrix operation-oriented implementation (ORCA’s MDCI = matrix driven CI module) and reasonable parallel scaling. Orbital optimized and Brückner-orbital based versions are available. Analytic gradients or Hessians for these methods are not yet available.
Second-order many body perturbation theory (MP2) is available in a number of variants including an approximation free version and using the resolution of the identity (RI). There is also an efficient implementation of orbital optimized MP2 (OOMP2) that offers significant advantages over MP2 itself, in particular, for open-shell systems.36,93 Analytic gradients and (partially) analytic Hessians exist for these methods.
F12 versions of the methods, namely, CCSD(T)-F12 and MP2-F12, for closed and open-shells have been developed in collaboration with Professor Valeev and co-workers.
D. Multi-reference methods
ORCA features a versatile complete active space self-consistent field (CASSCF) module that can be applied to large molecules. Due to recent improvements, it features a robust and very efficient first-order algorithm for simultaneous optimization of CI and orbital coefficients.94 A number of approximations can be applied to either the integrals that enter the calculations or the active space full-CI problem. ORCA’s CASSCF program has a number of features that are designed to allow the user to construct the desired active spaces. Among those features are the possibility to construct the initial guess from SCF calculations of various fragments or an algorithm to automatically locate a second d-shell in transition metal calculations.
The integral approximations are the ones also available in other parts of the program, foremost the RI approximation to calculate transformed integrals required for the calculation of the orbital gradient and the RIJCOSX or RI-JK approximation for the construction of the various Fock matrices. Thus, the program can be readily applied to large molecules containing a few hundred atoms.
The full CI approximations include an interface to the BLOCK density matrix renormalization (DMRG) code by Olivares-Amaya and co-workers,95 an interface to the quantum Monte Carlo Full CI (QMCFCI) method of Booth and co-workers,96 and that to our own iterative configuration expansion (ICE) selected CI approach. A special feature not found in many other program packages is that the ORCA CASSCF module can handle any number of different multiplicities and irreps in state averaged CASSCF (SA-CASSCF) calculations.
ORCA has featured the first large-scale implementation97 of the second-order N-electron valence perturbation theory (NEVPT2) by Angeli and Malrieu following their pioneering formulation and implementation of the method.98–100 NEVPT2 has become a widely used feature of ORCA and a major workhorse for magnetic property calculations. NEVPT2 is highly efficient since no ill-conditioned linear equation systems need to be solved, no first-order wavefunction needs to be stored, and full advantage of the RI-approximation is taken. In addition, NEVPT2 is practically intruder state free. Typically, NEVPT2 calculations are less expensive than the underlying CASSCF calculation (per state). NEVPT2 can be used in conjunction with magnetic property calculations including scalar-relativistic Hamiltonians, SOC, magnetic fields, and two-electron spin–spin coupling. We have implemented the strongly contracted (SC-NEVPT2) and the fully internally contracted (FIC-NEVPT2) version (also referred to as partially contracted, PC-NEVPT2).25,28,97 Quasi degenerate (QD) and hermitized QD (HQD) NEVPT2 variants are also available.101 The explicitly correlated variant NEVPT2-F12 has also been developed.28 A rather special technique in ORCA is the dynamic correlation dressed CAS [DCD-CAS(2)], which is a perturb-then-diagonalize type approach that aims at treating static- and dynamic-correlation as well as relativity on equal footing.101,102
CASSCF and all MR-perturbation methods combine very naturally with the Ab Initio Ligand Field Theory (AILFT).103–107 AILFT is a particular feature of ORCA that allows one to extract ligand field parameters of open-shell transition metal complexes unambiguously from the ab initio calculation. This has major applications in transition metal, f-element chemistry, and magnetochemistry.103–117
A fully featured CASPT2 implementation has been added to ORCA more recently. Recent advances indicate that the intruder state problem, which plagues many CASPT2 applications, can be largely solved by using a regularization approach.
Beyond second-order perturbation theory, ORCA features a number of methods. The early selecting MRCI program118 can handle a number of approximate methods including approximately size-consistent variants [averaged coupled pair functions (ACPF) and average quadratic coupled cluster (AQCC)] and has a large number of features pertaining to magnetic property calculations.
Newer developments in high-level MR-correlation methods are based on automatic code generation.119 The latter has been pushed to a level where the machine generated code is of similar efficiency as the hand-optimized code.119 In this way, a fully internally contracted CI program (FIC-MRCI, including division into active, inactive, and virtual orbital spaces) has been generated and tested together with various contraction schemes.120 Newer developments in this area also include a fully internally contracted coupled cluster (MRCC) implementation, following the pioneering work of Hanauer and Köhn117 as well as the work of Evangelista and Gauss.121,122
The iterative configuration expansion (ICE) is a special feature of ORCA despite the fact that to date, it has only been documented in the ORCA manual since 2016. Technically, it is a selecting CI method of the CIPSI (configuration interaction with perturbative selection through iterations)123 type. The main idea is to identify the relevant parts of a many electron wavefunction through perturbation theory and to refine it through iteration. Its premise is very similar to the MRDCI method by Buenker and Peyerimhoff124,125 that has been widely used in the interpretation of spectroscopic properties of small molecules. The method introduces two thresholds that control its accuracy. Depending on these thresholds, the method can be viewed as a selecting second-order multi-reference perturbation theory or as a full-CI approximation. Perhaps, the latter use is more important for modern uses since it can substitute for the factorially scaling full CI step in complete active space calculations. The scaling of the ICECI method is polynomial and depends on the nature of the system. Very large active spaces with more than 40 electrons and 100 orbitals have been treated with it. It lends itself very well to the treatment of electronic systems with many open-shells and almost arbitrarily complex spin couplings.
E. Local correlation methods
The desire to apply accurate first-principles methods to systems of current chemical interest in a routine manner has been a major driving force in the development of linear or low-order scaling correlation methods of the coupled-electron pair approximation (CEPA) and coupled cluster type. Our approach is based on the impressive original work of Meyer,126–130 Ahlrichs,131,132 and co-workers on pair natural orbitals (PNOs) following their introduction by England. These pioneers showed that accurate CEPA calculations can be performed by expanding the virtual space of each electron pair separately in an approximate natural expansion. These methods showed very promising results in the 1970s but subsequently fell into disuse before they were revived in 2009 in the ORCA framework.133 In fact, at the time, it was realized that the combination of local occupied orbitals (LMOs) with PNOs is an ideal basis for the development of linear scaling local correlation theories—something that was computationally far out of reach in the 1970s.
Following our proof-of-concept paper in 2009 that was based on the coupled electron pair approximation (CEPA), the first PNO based CCSD implementation followed shortly after.134 At that time, these methods [local pair natural orbital (LPNO) leading to LPNO-CEPA and LPNO-CCSD] had formal O(N5) scaling due to the very complex integral transformations that could only be efficiently realized on the basis of the RI approximation. However, the methods could be applied to systems with around 100 atoms. Since it was shown that the methods can retain about 99.8%–99.9% of the canonical correlation energy,133,134 it quickly became clear that PNO based correlation methods are extremely promising for large scale development. The next breakthroughs, with seminal contributions of Riplinger, were reported in 2013 when a (nearly) linear scaling version of CCSD and CCSD(T) approximations were reported.135,136 The latter work introduced the concept of triples natural orbitals (TNOs)136 and the methods received the name “domain based local pair natural orbital” (DLPNO)-CCSD(T).135,136 This was the first work in which PNOs were expanded in terms of projected atomic orbitals (PAO’s) in large orbital domains. Of course, the idea of using PAO’s in local correlation treatments goes back to the pioneering work of Saebo and Pulay137–140 in the 1980s that was extensively further developed by Werner, Schütz, and co-workers since the mid 1990s.141–145 Our development of DLPNO-CCSD(T) led to the first CCSD(T) level calculation of a whole protein—Crambin with almost 650 atoms (Fig. 2).135,136 True linear scaling was finally reached in 201627 through the development of the “Sparse-Map” concept24–27,29 and allowed for DLPNO-CCSD(T) calculations on systems with over 1000 atoms and in excess of 15 000 basis functions. For treating even larger systems with more than 2000 atoms, the combination of cluster in molecules (CIM)146,147 approaches with the DLPNO-CCSD(T) [e.g., CIM-DLPNO-CCSD(T)] method was developed in 2018 (Fig. 2).148 For these enormous systems, the calculation of the SCF reference becomes almost more problematic than the linear scaling coupled cluster calculation.
Crambin was the first whole protein to be treated with a CCSD(T) level calculation in 2013. The currently largest coupled cluster calculation with ORCA features the integrase protein with 2380 atoms and more than 22 000 basis functions. The latter calculation was made possible by the combination of DLPNO-CCSD(T) with cluster-in-molecules concepts.
Crambin was the first whole protein to be treated with a CCSD(T) level calculation in 2013. The currently largest coupled cluster calculation with ORCA features the integrase protein with 2380 atoms and more than 22 000 basis functions. The latter calculation was made possible by the combination of DLPNO-CCSD(T) with cluster-in-molecules concepts.
Extensive benchmark has been assembled for the DLPNO-CCSD(T) method, which indicate that it consistently outperforms all DFT methods (in terms of accuracy)149–153 and, at the same time, is more accurate and much more efficient than conventional high-accuracy thermochemistry schemes such as G4.154 For systems with low lying excited states, such as the most open shell species, it is important to avoid the semi-canonical “T0” approximation and iteratively solve the linear equations for the triples correction.153,155,156 Schmitz and Hättig have developed an alternative technique based on the Laplace transformation,157 while Ma and Werner have reported an iterative technique in conjunction with explicitly correlated PNO based CCSD(T).158 The accuracy in standard benchmark sets can be pushed to about 1 kJ/mol error relative to canonical CCSD(T) but not much further without exploding computational effort. Already progressing from 1 kcal/mol mean absolute error with what we have defined as “normal” PNO cutoffs to 1 kJ/mol with “tight” PNO cutoffs (as required for the study of intermolecular interactions) increases the computational effort by about one order of magnitude.152
DLPNO methods subsequently found widespread use in computational chemistry. Consequently, significant and excellent development efforts were reported, in particular, by the Werner and Hättig groups.158–168 Excellent work has also been reported by Kallay and co-workers who extensively explored the combination of CIM approaches with natural orbital ideas.169–171 In ORCA, the main goal has been to make PNO based methods ideally as applicable as DFT to mainstream computational chemistry in an, as much as possible, black box fashion (Fig. 3). Consequently, next to closed shell DLPNO-MP224 and DLPNO-CCSD(T), the development of open-shell variants,30,172 explicitly correlated variants (pushed forward mostly by Professor Valeev),26,29 closed-173 and open-shell174 densities, and exact analytic gradients for DLPNO-MP2175,176 was reported. A particular success was also the development of the similarity transformed equation of motion variant for excited states (DLPNO-STEOM-CCSD)177–181 as well as the first PNO based multi-reference method, DLPNO-NEVPT2.25 The former also led to the development of linear scaling versions of the equation of motion CCSD method for ionized and electron attached states (DLPNO-IP-EOM-CCSD179 and DLPNO-EA-EOM-CCSD181).
Importantly, it was shown that the DLPNO-CCSD(T) energy can be very naturally decomposed in a natural manner leading to a very powerful local energy decomposition (LED) scheme.182 The latter has been used extensively in the analysis of intermolecular interactions.183–187 The LED has inspired a highly efficient and, at the same time, surprisingly accurate scheme in which the dispersion energy is added to the HF energy (HF-LD).188 It has also inspired the development of a multi-level scheme43 in which different parts of a large system can be treated at user defined levels of accuracy. DLPNO methods also seamlessly integrate with combined quantum mechanical/molecular mechanics (QM/MM) approaches.189
Given their high accuracy and efficiency together with their wide applicability, (DL)PNO methods are having a rapidly increasing impact on computational chemistry studies worldwide. We anticipate this trend to intensify and consequently DLPNO methods will remain a major research target in ORCA.
F. Semi-empirical methods and force fields
Since ORCA started its life as a semi-empirical program package, some zero-differential overlap (ZDO) semi-empirical functionality remained after the redesign. Traditionally, this included the intermediate neglect of differential overlap (INDO/S and INDO/1) parameterizations of Zerner and co-workers.6 In addition, the traditional MNDO190 and Neglect of Diatomic Differential Overlap (NDDO) based methods are part of ORCA. These methods simply write the semi-empirical integrals on disk from where they are picked up by all parts of the program. Hence, all of the functionality including CASSCF, MRCI, and CCSD(T) are, in principle, usable together with these semi-empirical Hamiltonians. Where necessary, the underlying Slater orbital basis is automatically fitted to the 3-Gaussian expansion and other integrals that are not parameterized are calculated over these basis functions.
A more modern semi-empirical method that is based on a tight binding approach, the XTB method191 has been developed by Grimme and is available in ORCA, presently in form of an interface. It is available essentially for the entire periodic table. XTB calculations serve well as a basis for highly efficient molecular dynamics approaches or for exploratory geometry optimizations. The HF-3c approach192 by Grimme should also be mentioned. It is based on a minimal basis HF approach together with some empirical corrections. It is highly efficient and also an excellent choice for exploratory calculations or long MD runs. A more accurate method along the same lines of thinking but based on DFT is the PBEh-3c method193 that yields excellent results at a very moderate computational cost.
Finally, ORCA features its own (as of yet unpublished) force field implementation that is used in MM or QM/MM contexts. The design goals for ORCA’s force field implementation were ease-of-use and transferability. We did not aim at developing our own type of force field but wanted to provide a framework that can work with the most popular existing force fields. ORCA features its own input format, but the MM module offers conversion routines from other force field, topology, and parameter files to the ORCA-internal force field format. Users can easily manipulate individual parameters within the ORCA force field file if desired. The MM module even features options for the generation of ORCA force field files for molecules without available parameters, directly making use of ORCA’s various electronic structure methods.
IV. GEOMETRY OPTIMIZATION, TRANSITION STATES, MINIMUM ENERGY CROSSING POINTS
Finding local energy minima is essential for any QM application, and efficiently locating transition states is crucial for studying chemical reactivity. TS optimization is often found as being significantly more complicated and computer time intensive compared to finding minima. In its initial years, ORCA relied partly on external optimization routines, but since 2007, more and more sophisticated optimization routines have been developed. The initial implementation was based on redundant internal coordinates, which is the de facto standard in this field.194 In a first wave of development, options to optimize TSs, crossing points, and only hydrogen positions to impose restraints and constraints and to perform relaxed surface scans were implemented. The latter were a necessity for exploring PESs and for finding good initial structures for TS optimizations. In order to speed up the entire process of TS optimizations, various forms of approximate Hessians were included with the intention to prevent the time-consuming computation of the exact Hessian matrix at the beginning of a TS optimization.
In a second wave of implementation work, starting in 2015, additional options were included, e.g., a conical intersection optimizer and the intrinsic reaction coordinate. A particularly successful scheme for the efficient and robust optimization to TSs is the nudge-elastic band (NEB) implementation in ORCA. The NEB-TS routine automatically and efficiently finds TSs even for reactions with multiple breaking and forming bonds without extensive user intervention.195–197
The implementation of a fully integrated intrinsic QM/MM method brought ORCA’s quasi-Newton optimization routines to their limits, and two approaches for the optimization of multiscale systems were implemented in due course. Large scale optimizations of tens to hundreds of thousands of atoms are now possible using an L-BFGS optimizer provided by ORCA’s MD module. This optimizer includes not only a simple minimization feature but also options to (i) optimize specific elements only, e.g., all hydrogen atoms of a protein, (ii) freeze small and large parts of the system, and (iii) optimize the location and orientation of subsystems as rigid bodies in their environment, which becomes particularly useful if bonded parameters are not available for a molecule, but a full MM optimization is to be carried out nonetheless. All these options are available independent of whether the active atoms are part of the QM or MM region.
In the second approach, a large part of the entire system is frozen, and only a subset of the system, in maximum several hundreds of atoms, is treated as active. The reasoning behind this approach is analogous to that of QM/MM, assuming that the action of a reaction or an electronic excitation takes place only in a subset of the system. This approach can make use of ORCA’s QN optimizer, with all its advantages, from the much faster convergence, compared to an L-BFGS optimizer, to the plethora of available features, which were already discussed above. A major advantage here is the combination of the NEB-TS feature together with the multiscale model, allowing for the efficient treatment of reactivity in extended systems such as enzymes.
V. MOLECULAR DYNAMICS
Having focused mostly on the calculation and analysis of static properties, that is, on single molecular structures, it became quickly apparent that the dynamics of atomic motion under external conditions such as temperature, pressure, and volume had to be taken into account. Thanks to Dr. Brehm,198 ORCA now provides a fully featured ab initio Born–Oppenheimer molecular dynamics (AIMD) module. This includes a Berendsen thermostat,199 pressure coupling, different cell geometries, constraint dynamics,200 multiple time integrators,201 and energy minimization algorithms. Trajectories and velocities can be read and stored in different formats. Additionally, the AIMD module contains a fully featured interpreted SANscript language.202 The tight coupling of the MD module to the ORCA framework enables the calculation of trajectories using all available electronic structure methods on all levels of accuracy. In this way, dynamics using correlated methods or dynamics on excited states are possible, limited only by the available computational resources.
VI. SOLVATION
Whereas traditionally quantum chemical calculations have been done in vacuo, most chemical reactions take place in solution. Besides the possibility to include explicit solvent molecules in a calculation, ORCA also provides a range of implicit solvent models that are implemented throughout the program. While in its initial years, ORCA was relying on the COSMO solvation model203,204 (even was instrumental for the first implementation of direct COSMO-RS205), more recent ORCA versions rely entirely on a fully internal conductor-like polarizable continuum model (CPCM) implementation.
The implementation follows the CPCM principles206 in which the interaction with the surrounding dielectric environment is represented by point charges on an encompassing surface. This molecular cavity, one of the crucial elements in every implicit solvent method, is generated by a slightly modified version of the GEPOL algorithm,207 either as a solvent excluding surface (SES) or a solvent accessible surface (SAS). For general applicability, single-point energies, gradients, and geometric second derivatives can be calculated at the SCF/DFT level.208 In post-HF methods, the solvation enters as a contribution to the one-electron Hamiltonian.
More recently, ORCA also provides an alternative way of representing the solvent interaction via spherical Gaussians instead of point charges.209 In combination with a new scheme for deriving a new type of surface using Gaussians on a grid around the molecule, this “Gaussian Charge Scheme” provides smoother potential surfaces, which are important for geometry optimizations and large systems among other advantages.210 In order to compute the free energy of solvation, ORCA provides an implementation of the SMD solvation model.211
VII. SPECTROSCOPIC PROPERTIES
It is an essential part of ORCA’s design philosophy to connect the results of computations to experimental observables. Among those, spectroscopic properties have always been at the forefront of our interest. Given that spectroscopy is the experimental way to study the electronic structure, these spectroscopic properties are a precise and sensitive fingerprint of a system’s electronic structure. Consequently, spectroscopic properties also provide excellent feedback about the quality of a given calculation and therefore play an essential part in the scientific falsification process (in Popper’s sense). For recent essays explaining and discussing our philosophy of science and our approach to computational chemistry in general, see Refs. 212 and 213.
Since spectroscopy is such an essential part of ORCA’s design philosophy, major efforts have been undertaken to make the program versatile and powerful in this arena. Consequently, ORCA is the dominantly used electronic structure package in a variety of spectroscopic communities, e.g., in electron paramagnetic resonance (EPR), molecular magnetism, and x-ray spectroscopy. We emphasize that to the experimentally oriented spectroscopist, even simple calculations can be of major utility. Clearly, the usefulness of a given method is neither proportional to its intellectual complexity nor to its computational cost. With this philosophy in mind, we are going to briefly describe the main methods available in ORCA.
It is convenient to order spectroscopic phenomena according to increasing or decreasing energy of the photons involved. We follow this logic below but have decided to discuss electronically excited states first since the latter constitute a special branch in theoretical and computational chemistry.
A. Electronically excited states
The simplest and perhaps most widely used approach to electronically excited states are particle–hole theories on the basis of a single reference determinant of the HF or KS type. From the start, ORCA has featured configuration interaction with single excitations (CIS) and time-dependent DFT (TD-DFT) implementations.16 For both methods, analytic gradients are available81 and multiple ways to accelerate the calculations through the use of the RI and COSX approximations are featured. For closed shell systems, the program can also compute singlet/triplet coupling through spin–orbit coupling,214 which is an essential property for phosphorescence calculations. These methods can be used to compute absorption and circular dichroism (CD) and x-ray absorption and emission spectra for closed- and open-shell molecules.215–221 In particular, the very simple x-ray emission calculations have been found to be of major utility in practical applications.218,219,222–232 The calculations feature exact transition moments233 and their approximate decomposition in electric dipole, magnetic dipole, and electric quadrupole contributions.
A rather special feature in ORCA is Grimme’s simplified sTD-DFT method37 that is very much more efficient than regular TD-DFT calculations and can be used to treat very large molecules efficiently, albeit at some loss in accuracy.
At a higher level of sophistication and accuracy, the equation of motion (EOM) CCSD methods and the similarity transformed EOM-CCSD (STEOM-CCSD) method234 are available in ORCA. For the latter, automatic procedures have been devised in order to determine the necessary active space. Both methods are available for closed235 and open-shells.236 A particularly powerful variant is the linear-scaling DLPNO-STEOM-CCSD method178,237 that has recently been used with excellent success to study the excitation spectra of large molecules238,239 and the bandgaps of solids.237
A method that is unique to ORCA is the slightly semi-empirically parameterized restricted open-shell CIS (ROCIS) method and its variant based on DFT orbitals ROCIS/DFT.240–245 These methods are similar in spirit to the CIS/DFT methods developed by Grimme246,247 and Marian.248–250 The main objective here is to provide a method that correctly deals with the spin-coupling of unpaired electrons in excited states while staying computationally affordable enough to be applied to systems with a few hundred atoms. The ROCIS/DFT method is designed to simultaneously treat the ground state multiplicity with spin S (a single high-spin restricted open-shell determinant) and the excited states with S′ = S, S + 1, and S − 1. SOC between all excited states is included.240 The resulting eigenstates can be used to calculate a wide variety of spectroscopic properties, including x-ray absorption and emission spectra, XMCD spectra, and more complicated two-dimensional x-ray (RIXS)244 spectra. The recent development of the PNO based ROCIS method251 allows for the efficient computation of hundreds of excited states for large systems. The methodology has found widespread application in the synchrotron community.
A host of multi-reference methods provides direct access to excited states as opposed to the indirect methods based on response theory. In particular, MR methods are appropriate in many cases in which the spectra are not dominated by valence-to-valence single excitations. A major workhorse in this regard is ORCA’s large-scale CASSCF program94 in conjunction with NEVPT2 or CASPT2. However, a wide range of alternative methods is available including MRCI, a special method called spectroscopy oriented CI (SORCI),118 various approximately size consistent MR methods (MR-ACPF, MR-AQCC, and MR-CEPA/0) in either individually selecting or fully internally contracted120 variants. A special method, which was first implemented in ORCA, is Nooijen’s multi-reference equation of motion coupled cluster method (MREOM-CCSD).234 These methods can be used to compute absorption, CD, and MCD spectra. They have also been successfully used to compute x-ray absorption spectra252–254 as well as x-ray emission including the interpretation of x-ray emission mainline spectra.255
In order to interpret finer details of electronic spectra, the ORCA_ASA256,257 and ESD214,258,259 modules have been developed. The former is based on Heller’s semi-classical time-dependent wave packet propagation theory, while the latter is exploiting a path-integral approach. Both modules have largely equivalent functionality, but only the ESD module will be further developed. Based on excitations energies, ground state geometries (exact or estimated), and ground- and excited-state (exact or estimated) Hessians, the ASA/ESD modules are able to calculate vibronically resolved absorption spectra256,257,260 even for large molecules with many degrees of freedom, resonance Raman259 spectra and profiles, and fluorescence258 and phosphorescence214 spectra. Herzberg–Teller coupling and Duchinsky rotations are optionally included in the treatment. The ESD module works together with the CIS/TD-DFT, EOM- and STEOM-CCSD, and CASSCF/NEVPT2/, CASSCF/CASPT2, and MRCI methods.
B. Other spectroscopic properties
Proceeding from low to high energy, ORCA features a powerful module to compute nuclear magnetic resonance (NMR) chemical shieldings using gauge including atomic orbital (GIAO) based HF/DFT,57 RI-MP2 and double hybrid functionals.261 Spin–spin couplings can be computed at the HF/DFT level.40 This methodology works together with Grimme’s conformer sampling in order to generate approximate NMR-spectra in solution.
EPR properties have always been a focal point in ORCA’s development. A general theory of spin-Hamiltonian parameters [g-tensor, Zero-Field Splittings (ZFS)] was developed early on11,262–265 and implemented in conjunction with MRCI calculations.11,263 Subsequently, HF/DFT methods were developed for the g-tensor267 and hyperfine couplings including SOC266 and zero-field splitting.267 Unlike most other programs, ORCA has always been able to compute the two-electron spin–spin coupling terms using multi-reference or self-consistent field wavefunctions.55,264,265,268 While these have been and are extensively used in the community, the accuracy of DFT is rather limited. Hence, more practical wavefunction based methods to compute these properties have been under intense development.30,93,102,104,120,174,263,269–280 The main workhorse for transition metal calculations has been the CASSCF/NEVPT2 methodology into which SOC [calculated with a more general multi-center spin–orbit mean field (SOMF) approach54], magnetic fields, and the two-electron spin–spin coupling are introduced by quasi-degenerate perturbation theory (QDPT). From the resulting relativistic eigenstates, all spin-Hamiltonian parameters can be deduced, as well as the magnetic susceptibility.104,108,109,111–113,115 These methods have found widespread use in molecular magnetism. For simpler organic radicals, DFT can work reasonably well and has served well in many application studies. However, the recent development of the DLPNO-CCSD spin-densities promises to open-up the next level in accuracy.174,281 A “PNMR” module has been added to ORCA that combines the results of various EPR property calculations to predict paramagnetic NMR spectra.
While Mössbauer spectroscopy involves high-energy gamma-ray photons, the parameters measured are on the order of hyperfine couplings. ORCA’s simple calibration approach to isomer shifts282,283 and quadrupole couplings has, again, found widespread application in the Mössbauer community.284 More recent developments also allow for these calculations to be performed on the basis of large-scale DLPNO-CCSD wavefunctions, which partially overcome accuracy issues that DFT may have.
Standard vibrational spectra [infra-red (IR) and Raman] are generated in ORCA using standard methodology based on either semi-analytic or fully analytic Hessians. The latter are available at the HF/DFT levels. A special feature is the calculation of nuclear resonance vibrational (NRVS) spectra, a synchrotron based Mössbauer technique to measure vibrational spectra.285,286 The calculation of resonance Raman spectra and excitation profiles has been discussed above.259,260,287,288
VIII. ORCA AND EXTENDED SYSTEMS
Proteins are extended systems that are usually too large to be treated on quantum mechanics level alone. Proteins consist of tens to hundreds of thousands of atoms, but the area of interest from a chemical point of view is often confined to just dozens to hundreds of atoms. A simple approach to model such systems is to use simple cluster models. A more elaborate approach is to use different methods for different scales, i.e., QM for the small region of interest, and classical mechanics, or molecular mechanics (MM), to treat the remainder of the protein.289
From its very beginning, ORCA was used in the prediction of properties such as spectroscopy and reactivity of proteins using simple cluster models. Due to ORCA’s capabilities for efficiently treating open shell systems, which are often present in metalloenzymes, it was seen as an attractive QM engine for QM/MM calculations. Thus, for various programs that include a QM/MM infrastructure and which can drive QM/MM calculations, interfaces to ORCA were developed. These implementations were used by the community, but they were lacking the ease-of-use that is characteristic of ORCA. More importantly, the objectives of the driver programs, mostly MM programs, are usually quite different to those of a QM program, and thus, certain features such as TS optimization, scans, and optimizations are not as efficiently implemented as in ORCA or they are not implemented at all. Thus, it was decided to implement an own MM and QM/MM module for ORCA, which, since 2019, is operating efficiently with all other modules of ORCA and allows one to run QM/MM calculations from a QM-centric perspective, including reactivity and spectroscopy calculations.
Solid state systems are extended systems that can be efficiently treated exploiting their symmetry and periodicity. Various efficient quantum mechanics codes exist, which mimic an infinite system by a unit cell or super cell with periodic boundary conditions. ORCA is a molecular code lacking periodic boundary conditions. Nonetheless, ORCA’s wealth of spectroscopic and high-accuracy methods (from which the most accurate ones, such as DHDFs and linear-scaling correlation methods, are not available in the periodic codes) make it attractive to applications on solid systems. In the ORCA team, various schemes for treating solid systems have been developed. For semiconductors and insulators, a QM/MM scheme has been developed since 2013240 in which the solid is represented by an extended super cell that contains hundreds of thousands of atoms. In this extended system, a quantum region, which usually is similar in size to the unit cell, is embedded in the point charges of the remainder of the super cell, and the boundary region is mimicked by capped ECPs. The approach yields similar results compared to periodic calculations for various properties using comparable levels of theory; however, using ORCA, these properties can also be computed at a higher level of theory than possible previously.237,290,291
IX. ORCA AND MODERN COMPUTER ARCHITECTURES
Since the outset of the ORCA project, the computational landscape has changed considerably. When it was common in the mid-nineties that one personal computer had a single central processing unit (CPU), major computing resources a few CPUs and supercomputers a dozen, and compute clusters still fit on a single shelf, the same computational power is available in a single notebook today. Through all these changes from single CPU to interconnected nodes to highly parallel supercomputers, the conceptual paradigms ORCA is based on have withstood the test of time remarkably well. In this respect, one of the most important design decisions was to always keep the ORCA core functionality completely system and architecture independent. Thus, today, ORCA runs on all common platforms (Linux, MacOS, Windows) from personal sub-notebooks to CRAY supercomputers.
Due to the high computational demands of electronic structure methods, the demands to use more than single CPU has to be taken into account. It was decided early on to rely on the Message Passing Interface (MPI) to perform calculations in parallel, covering multi-core computers and interconnected compute clusters at the same time. The ORCA parallelization model follows a scheme of data replication per process in order to avoid a performance hit that might arise when trying to keep the required data distributed. All ORCA modules from SCF to DLPNO have been parallelized according to this scheme.
More recently, it became clear that the performance of single cores within a CPU does no longer increase according to Moore’s Law, that is, doubling the performance all 18 months. However, the law still holds when counting the number of transistors on a CPU, thus increasing the overall number of cores per CPU significantly. This development severely changed the core to available memory ratio. With the appearance of the MPI-3 standard, ORCA adopted its use of platform independent shared memory regions to reduce the overall memory consumption when using multiple cores in a single machine.
ORCA has significantly benefited from newer CPU generations, making use of newer instruction sets such as AVX, AVX2, and AVX-512. The performance gain via these extended single instruction multiple data (SIMD) instructions is transferred almost directly to ORCA, as many of its routines, such as the CC and DPLNO CC methods, are highly optimized matrix driven algorithms.
Even though the CPU power has increased in recent years, the world has seen an explosive growth in computational capacities via the use of non-CPU devices such as Graphic Processing Units (GPUs) or similar units (IBM Cell, Intel XEON Phi, and AMD APUs). In a pilot study, we have developed an implementation of the McMurchie–Davidson scheme using OpenCL, providing accelerated generation of 4-index and 3-index integrals via these devices.292
X. REPRESENTATIVE TIMINGS
At the suggestion of one of the referees to this paper, we have decided to document some performance data. Clearly, given the enormous breadth of functionality in ORCA, comprehensive benchmarks are not feasible. Hence, only a subset of frequently used methods will be documented here. The systems depicted in Fig. 4 were used for the benchmark calculations, covering molecules of small to medium size. All calculations were carried out on a local cluster where each compute node has two processors (2 x AMD EPYC 7302; 16 cores each) and 512 GB per node. In the calculations shown below, typically, not more than 2–4 GB of memory per core were used. Comparing timings between different program packages is notoriously difficult given the many variables that enter into the final wall-clock time, such as various cut-off parameters and convergence criteria. One of the most important parameters is the Fock matrix neglect threshold. In our calculation, the Schwartz-pre-screening threshold was 10−10 Eh throughout and SCF convergence was carried out to be 10−6 Eh. Obviously, less strict thresholds would lead to faster calculations.
In order to keep the amount of data manageable, we have decided to only document calculations that were carried out in parallel on 16 cores. In typical calculations, the speedup due to parallelism in such calculations is 12–14. Exceptions are DFT calculations in the RI-J approximation where the observed speedups are only around a factor of 8–9. We mention in passing that the documented timings refer to a development version of ORCA in which the integral handling has been improved relative to the most recent production version ORCA 4.2. These new developments will be part of ORCA 5.0 to be released in the fall of 2020.
We first turn to system 1, Vancomycin with 176 atoms. With the def2-TZVPP basis set, this leads to 4203 basis functions. The universal Coulomb fitting basis of Weigend and co-workers (“def2/J” in ORCA notation) has 5778 functions.
A calculation with the PBE functional and using the RI-J approximation converges in 14 SCF cycles. The total elapsed time is 449 s showing that even calculations with thousands of basis functions can be completed as a matter of a few minutes. Only about half of this time (232 s) is spent in the Fock matrix construction, which, in turn, is dominated by the calculation of the Coulomb term (194 s). The remaining time is attributed to the numerical exchange-correlation integration. Thus, Fock matrix formation is highly efficient and scales well with the number of processors (speedup around 13 on 16 cores). The rate limiting step is the SCF solver, which, at this point, does not scale well enough in parallel calculations.
The high efficiency of the RI-J approximation is evident by comparing this calculation to a B3LYP calculation without integral approximations (other than Schwartz pre-screening) that take 14 910 s (13 SCF cycles). Thus, the RI-J approximation accounts for a speedup of a factor of about 33 over the calculation that uses traditional 4-index integrals. However, the time required for the B3LYP calculation can be drastically reduced by approximating the Coulomb term with RI-J and calculating either the exchange term exactly (RIJ-DX) or the COSX approximation (RIJCOSX). The RIJ-DX calculations completes in 8053 s, while the RIJCOSX calculation requires only 1383 s. These timings are consistent with our general experience that B3LYP(RIJCOSX) is about three times slower than PBE(RIJ) and about ten times faster than the fully analytic calculation. RIJ-DX is typically about a factor of two faster than the fully analytic calculation. Obviously, these statements carry over to all other functionals that fall into the same families as the ones tested here. Hartree–Fock calculations behave very similar to hybrid DFT calculations.
A series of geometry optimizations were carried out on 1 and 4. Given that SCF gradients are a minor addition to the SCF calculation time, the time required for the optimization is essentially given by the number of geometry cycles times the time required for one SCF. The number of cycles required for representative calculations are collected in Table I. It is evident from these data that ORCA’s geometry optimizer works well and converges in a competitive number of cycles.
Number of geometry cycles required for convergence of the geometry optimizations. Structures are preoptimized with the XTB method. All DFT calculations are done with the def2-SVP basis set (default convergence criteria: energy change = 5 × 10−6 Eh, max. gradient = 3 × 10−4 Eh/bohrs, rms gradient = 10−4 Eh/bohrs, max. displacement = 4 × 10−3 bohrs, and rms displacement = 2 × 10−3 bohrs).
Method . | 1 . | 3 . |
---|---|---|
HF-3c | 35 | 9 |
PBE-D3 | 40 | 14 |
B3LYP-D3 | 49 | 24 |
DLPNO-B2PLYP-D3 | 37 | 20 |
Method . | 1 . | 3 . |
---|---|---|
HF-3c | 35 | 9 |
PBE-D3 | 40 | 14 |
B3LYP-D3 | 49 | 24 |
DLPNO-B2PLYP-D3 | 37 | 20 |
We now turn to an analytical Hessian calculation on system 3 (def2-TZVPP = 1007 basis functions and def2/J = 1376 auxiliary functions) using the same methods. The PBE(RIJ) calculation completes the entire Hessian calculation in only 547 s, B3LYP(RIJCOSX) requires 1501 s, and B3LYP(RIJ-DX) completes the task in 9558 s. In these calculations, the use of 4 GB per core was allowed. Again, the very high efficiency of the RIJ approximation is apparent as are the enormous benefits that the RIJCOSX approximation offers over the fully analytic calculation.
Correlated wavefunction based calculations are well-known to be computationally expensive and require large basis sets to deliver accurate results. The test system treated together with canonical CCSD(T) was system 5 (no symmetry used). The coupled cluster part of the calculation with the progression of basis sets: def2-SVP (141 basis functions), def2-TZVPP (329 basis functions), and def2-QZVPP (619 basis functions) requires 29 s, 873 s, and 22 461 s, respectively. Thus, each cardinal number in the basis set comes at an increase in computer time of about a factor of 30. The calculation of the (T) corrections requires about 20%–30% of the total time spent.
While this canonical CCSD(T) calculation is already somewhat elaborate with the def2-QZVPP basis set, it is too small to be used as a test case for the DLPNO-CCSD(T) method. The latter was applied to system 4 using the same progression of basis sets: def2-SVP (430 basis functions), def2-TZVPP (1007 basis functions), and def2-QZVPP (1921 basis functions). These calculations (“NormalPNO” settings) complete in 101 s, 731 s, and 5295 s, respectively. This demonstrates that DLPNO methods allow for coupled cluster level calculations with good quality basis sets on systems with 50–200 atoms in a routine fashion. If one stretches the limit, calculations on systems with over 1000 atoms are possible. It is a characteristic of DLPNO methods that an increase in the cardinal number of the basis set is much more affordable than in canonical calculations with the increase in computational time between successive cardinal numbers being only about a factor of seven. It is instructive to compare the DLPNO-CCSD(T)/def2-TZVPP time for 4 to the respective SCF without any approximation, which took 371 s. Thus, using DLPNO methods, the increase in wall-clock time to obtain a coupled cluster result over a Hartree–Fock calculation is on the order of 2–5. Since Hartree–Fock and hybrid DFT calculations require about the same amount of time, the same comment applies to the comparison of DLPNO-CCSD(T) and hybrid DFT. However, with tighter PNO thresholds that are required for the computation of intermolecular interaction energies, the time required for the DLPNO-CCSD(T) calculations increases quickly (e.g., with a factor of 4 for all three above reported calculations). This is the subject of ongoing research.
RI-MP2 and DLPNO-MP2 calculations were carried out on 1. The DLPNO-MP2 calculation was carried out with the basis set progression: def2-SVP (1797 basis functions and 5926 Aux functions), def2-TZVPP (4203 basis functions/10020 Aux functions), and def2-QZVPP (8033 basis functions/18493 Aux functions). These calculations completed the DLPNO-MP2 part of the calculation in 409 s, 2428 s, and 16 928 s, respectively. This is quite remarkable given that the SCF calculation without integral approximations takes about 15 000 s with the def2-TZVPP basis set and even the respective RIJCOSX calculation completed in about 1300 s. Thus, DLPNO-MP2 is a highly efficient way to compute MP2 and also double hybrid DFT energies (also gradients). While DLPNO-MP2 is genuinely linear scaling, the parent RI-MP2 method features fifth order scaling with system size albeit with a small pre-factor. Thus, fairly large RI-MP2 and double hybrid DFT calculations can also be efficiently completed with the canonical code. For example, a B2PLYP calculation on system 1 with the def2-SVP, def2-TZVPP, and even def2-QZVPP basis set is possible and took 1735 s, 19 305 s, and 183 445 s to complete, respectively. Thus, for systems of this size, the DLPNO-MP2 (or DLPNO-double hybrid) is about one order of magnitude faster than the canonical calculations.
As an example of a TD-DFT calculation, we have chosen system 2. With the def2-TZVPP basis set and the def/J auxiliary basis set, this leads to 3020 basis functions and 3950 auxiliary functions. 50 roots were calculated in 3–4 Davidson iterations. The SCF times for PBE(RIJ), B3LYP(RIJCOSX), and B3LYP(RIJ-DX) were 242 s, 636 s, and 3086 s, respectively. The TD-DFT equation solution required 3274 s, 6311 s, and 86 420 s, respectively. In these calculations, 6 GB of memory were allowed for each core and ten batches were requires in order to compute all requested vectors. Once more, these numbers demonstrate the high efficiency that can be reached with the combination of the RI-J and COSX approximations.
Finally, we document some CASSCF calculations on system 3, which is a typical Co(II) (d7) transition metal complex as encountered in the study of magnetic properties. In these calculations, only the five d-orbitals were in the active space. The basis set def2-TZVPP leads to 1380 basis functions. All ligand field roots for the quartet (10) and doublet (40) multiplicities were calculated in a state average CASSCF calculation. This calculation converged in 11 iterations that took 2237 s. The only integral approximations made were the RI approximation in the limited integral transformation required for the orbital gradient. 50 NEVPT2 roots were calculated in 28 266 s (471 min). In this calculation, a very large auxiliary basis set (AutoAux) with 7817 functions was used. Obviously, significant time could be saved with an auxiliary basis set optimized for the underlying orbital basis. The example nevertheless shows that a calculation of this kind can be routinely and efficiently performed with ORCA.
XI. SHORTCOMINGS OF ORCA
One of the shortcomings of ORCA certainly is its handling of molecular symmetry that is only elementary, given that ORCA was always written to perform calculations on large, low-symmetry systems. In its present version, ORCA can detect molecular point groups and adapt orbitals to D2h and subgroups. Symmetry can be used to a limited extent in multi-reference calculations but is not exploited to speed up SCF calculations. Activities are underway to improve this aspect of the program.
Other limitations are more detailed including relatively large memory consumption in the SCF program that leads to large memory demands in parallel calculations, which ultimately limit the scalability of the code. These aspects will receive extended attention in the future.
Since more and more non-expert users are using ORCA, the availability of user interfaces beyond text-based input becomes crucial for enabling this user group to use many methods that are available. For a long time, an in-house developed GUI for ORCA has not been available. An improvement in recent times has been an ORCA-enhanced version of Avogadro,293 which we support, as well as other open source GUIs that support ORCA.294,295
Analytic gradients and also Hessians will be available for a broader range of methods in the future.
XII. CONCLUSION
Over the course of the last 25 years, ORCA has grown from being a single-purpose, single-user program into a powerful general-purpose tool for chemistry, biochemistry, and materials sciences. ORCA is enjoying rapidly increasing popularity in the chemical user communities, which provides ample motivation for future developments. In order to keep pace with hardware developments, future developments will certainly, to some extent, feature machine generated codes of high efficiency and portability as well as expert protocols to make sophisticated and accurate calculations as user friendly as possible.
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
ACKNOWLEDGMENTS
The authors are deeply indebted to our numerous students, co-workers, and collaborators who have been instrumental in shaping ORCA into the program that it is today. Without their hard-work, enthusiasm, and dedication, we could not have succeeded. We are extremely grateful to the Max Planck Society, the German Science foundation, and the University of Bonn for generous financial support of this project over extended periods of time. The dedication of funding institutions to basic science is absolutely critical to the long-term success of a project such as ORCA. We thank Dr. Bernardo De Souza for help with the figures.