In this work, we develop a new semiempirical method, dubbed NOTCH (Natural Orbital Tied Constructed Hamiltonian). Compared to existing semiempirical methods, NOTCH is less empirical in its functional form as well as parameterization. Specifically, in NOTCH, (1) the core electrons are treated explicitly; (2) the nuclear–nuclear repulsion term is calculated analytically, without any empirical parameterization; (3) the contraction coefficients of the atomic orbital (AO) basis depend on the coordinates of the neighboring atoms, which allows the size of AOs to depend on the molecular environment, despite the fact that a minimal basis set is used; (4) the one-center integrals of free atoms are derived from scalar relativistic multireference equation-of-motion coupled cluster calculations instead of empirical fitting, drastically reducing the number of necessary empirical parameters; (5) the (AA|AB) and (AB|AB)-type two-center integrals are explicitly included, going beyond the neglect of differential diatomic overlap approximation; and (6) the integrals depend on the atomic charges, effectively mimicking the “breathing” of AOs when the atomic charge varies. For this preliminary report, the model has been parameterized for the elements H–Ne, giving only 8 empirical global parameters. Preliminary results on the ionization potentials, electron affinities, and excitation energies of atoms and diatomic molecules, as well as the equilibrium geometries, vibrational frequencies dipole moments, and bond dissociation energies of diatomic molecules, show that the accuracy of NOTCH rivals or exceeds those of popular semiempirical methods (including PM3, PM7, OM2, OM3, GFN-xTB, and GFN2-xTB) as well as the cost-effective ab initio method Hartree–Fock-3c.

Semiempirical methods are quantum chemistry methods with intermediate cost, accuracy, and degree of empiricism between density functional theory (DFT) and molecular mechanics (MM) methods and have found wide use in the study of large molecules (>1000 atoms) and the molecular dynamics/conformational search/pre-optimization of moderately sized molecules, to name a few.1,2 As approximate, parameterized versions of minimal basis Hartree–Fock (HF) or DFT calculations, semiempirical methods are more robust and describe more of the underlying physics than classical MM methods while typically requiring only a few hundredths of the time of a DFT calculation with a double-zeta basis set, even when efficient approximations, such as resolution of the identity (RI)3 and chain-of-spheres exchange (COSX),4–7 are applied to the latter. When properly parameterized, semiempirical methods can often achieve useful accuracy on a handful of properties of sufficiently “well-behaved” molecules, although like any other method with a fair amount of empiricism, it is notoriously difficult to transfer the accuracy to properties and types of molecules that are not used in the training procedure. Intriguingly, few methods are known whose timing and accuracy are between those of most semiempirical methods and RI-DFT. Thus, even if slightly more computational budget is available than what is required for a semiempirical calculation, one usually cannot take advantage of this to improve the accuracy of the computational results.

The most popular semiempirical methods mainly fall into two categories: the zero-differential overlap (ZDO) methods (e.g., the PMx8–10 and OMx11 families of methods) and tight-binding (TB) methods (e.g., the DFTB12,13 and xTB14–16 families of methods), respectively. In the ZDO family of methods,17,18 the following approximations are made (among others): (1) Only the (AA|AA)- and (AA|BB)-type two-electron integrals are retained, i.e., μAνB|κCλD=δABδCDμAνA|κCλC, which is the chief reason for the low cost of the ZDO methods; (2) consistent with the above approximation, the Roothaan–Hall equation FC = SCϵ—where F, C, S, and ϵ are the Fock, molecular orbital (MO) coefficient, overlap, and orbital energy matrices, respectively—is replaced by FC = , i.e., the overlap matrix S is set to the unit matrix. On the other hand, the TB methods12–14 usually do not approximate S as the unit matrix, while they describe the electrostatic (and exchange-correlation) interaction as a damped interaction between atomic charges (usually based on a Mulliken population analysis), thereby achieving a similar cost as the ZDO methods. However, most TB methods lack an exact exchange contribution, which introduces considerable self-interaction errors that can be difficult to remedy and makes them unable to describe finer electronic structure effects, such as electronic multiplets or spin state splittings.

The present work aims to develop a new semiempirical method, NOTCH (Natural Orbital Tied Constructed Hamiltonian), which is designed to fill up the timing and accuracy gap between the common semiempirical methods and RI-DFT. It makes fewer approximations and, at the same time, introduces less empiricism than both the ZDO and the TB methods. As its name suggests, the NOTCH atomic orbital (AO) basis functions are designed to mimic atomic natural orbitals (ANOs) as closely as possible. Even more importantly, the NOTCH one-center one-electron and two-electron integrals of free atoms, as well as those of monocationic and monoanionic atomic ions, are required to reproduce the ANO basis integrals of an effective ab initio Hamiltonian that implicitly includes dynamic correlation and scalar relativity, i.e., the second-order Douglas–Kroll–Hess (DKH2)19,20 similarity-transformed multireference equation-of-motion coupled cluster (MR-EOM-CC)21–24 Hamiltonian. In this way, the parameterization of atomic integrals is solved in a convenient, systematic, and completely nonempirical way. The nonempirical atomic integrals serve also as a starting point for building the one-electron and two-electron integrals in molecules.

A few other salient points of our construction include the following:

  1. In NOTCH, the core electrons are treated explicitly, rather than approximated by absorbing their contribution into the nuclear attraction potential, as what is done in most semiempirical methods. This obviates the need for complicated treatments of the core–core repulsion energy, which have often been “abused” to absorb the errors of other parts of the semiempirical method and contributed significantly to the highly empirical nature of semiempirical methods and a partially severe lack of transferability.

  2. In NOTCH, the radial distributions of the AOs depend on the molecular geometry, so that, e.g., an AO becomes more compact in a molecular environment than in the free atom. By this way, we mimic a double-zeta basis calculation within a minimal basis framework. Semiempirical methods employing geometry-dependent basis sets are rare, one prominent example being sTDA-xTB.25 

  3. The radial distributions of the AOs also depend on the atomic charges in NOTCH, i.e., the more positive charge an atom carries, the more compact its AOs will be. A simple way to account for this is to add charge-dependent corrections to the one-center one-electron and two-electron integrals. Similar ideas have been explored in ZDO26–28 and TB15,16,29,30 methods.

  4. The NOTCH AO two-electron integrals are first calculated analytically from the AO basis functions, which are subsequently corrected in order to incorporate correlation effects. This stands in contrast to the common practice of using crude approximations like the Dewar–Sabelli–Klopman formula31–35 to approximate the integrals and then hope to absorb dynamic correlation effects and various errors by parameterization. Note that the left–right correlation (LRC) of bonds, which is inherently nonlocal, is also explicitly accounted for.

  5. In NOTCH, we retain not only the AO basis (AA|AA) and (AA|BB)-type two-electron integrals but also the (AA|AB) and (AB|AB)-type ones.

  6. In NOTCH, the dispersion interaction is not described as a post-self-consistent field (SCF) correction to the energy but as a correction to the (AA|BB)-type two-electron integrals. The main rationale for this choice is that dispersion is inherently a correlation effect and thus represents an interaction between electrons, not between atoms. Similar ideas have been pursued in some semiempirical methods36 but have not gained wide attention.

  7. In NOTCH, a classical charge polarization term is added as a correction to the one-electron integrals to account for the underestimation of polarizability by the minimal basis set formalism. The idea of adding a classical charge polarization term has been explored before, but as a post-SCF correction to the total energy instead of a correction to the integrals.37,38 The same correction also mimics the charge polarization due to bonding and therefore improves the description of bonding as well.

It should be noted that the present work is preliminary, as only the elements from H to Ne are parameterized, and only results for diatomic molecules are reported. Nevertheless, our benchmark results on all the possible neutral diatomic molecules that can form between the elements H–Ne show that NOTCH provides equilibrium bond length, vibrational frequency, dipole moment, ionization potential (IP), electron affinity (EA), bond dissociation energy, and excitation energy results that are at least on par with, and frequently outperform, the best existing semiempirical methods. More importantly, NOTCH is characterized by a lack of severe outliers, which can at least partly be traced back to NOTCH’s greatly reduced empiricism compared to other semiempirical methods. The possible ways to extend NOTCH to multiatomic molecules will be briefly discussed at the end of the manuscript.

Throughout the manuscript, the following notations are used: Atoms are denoted by A, B, C, …, AOs and orthogonalized AOs (OAOs) are denoted by μ, ν, κ, λ, …, occupied and virtual MOs are labeled by i, j, … and a, b, …, respectively; spin indices are denoted by σ, τ, …. The notation μA stands for a primitive AO, a contracted AO, or an OAO centered on atom A.

Integrals without any superscript, e.g., μν|κλ, hμν, Tμν, or μν|A, denote analytic, uncorrected AO integrals calculated over ano-pc-0 basis functions (vide infra). Integrals with overbars [e.g., μν|κλ̄] stand for corrected integrals that are explained in detail below.

As the present manuscript deals with monoatomic and diatomic systems only, it will sometimes be assumed that the number of atoms in the system is no larger than 2, in order to simplify the expressions. Likewise, for the sake of simplicity, we may sometimes assume that all the elements in the molecule belong to the first two periods, but it is trivial to generalize the conclusions to heavier elements.

The NOTCH SCF energy is given by the following formula, where all integrals are initially evaluated analytically at the HF/ano-pc-0 level of theory. However, the actually used integrals (indicated by the overbar) are corrected to effectively include correlation and relativity as well as further modifications to overcome the limitation of the minimal basis set. The energy expression reads as follows:
ENOTCH=ENN+μνσhμν̄DDμνσ+12μνσκλτDμνσ×μν|κλ̄Dδστμκ|νλ̄DDκλτ+ELRCD,
(1)
where D is the density matrix and ELRC is the left–right correlation energy correction that will be detailed in Sec. II H. Herein ano-pc-0 is an all-electron minimal basis set with geometry-dependent contraction coefficients, whose construction will be detailed in Subsection II C. Thanks to the explicit description of the core electrons and thus the core-core repulsion, the ENN in Eq. (1) is given by the exact nuclear repulsion energy, without any empirical correction:
ENN=A>BZAZBRAB.
(2)

Another special feature of NOTCH is that the integrals depend on the atomic charges (and therefore on the density matrix D). In this way we are able to mimic the expansion/shrinking of AOs as a result of the atom acquiring negative/positive charge, respectively (which otherwise requires at least a split valence basis to describe), despite using a minimal basis set.

Like in any other SCF method, the density matrix is required to minimize the electronic energy ENOTCH, under the idempotency constraint. This leads to the familiar Roothaan–Hall equation,
FC=SCϵ,
(3)
where FμνσNOTCH=ENOTCHDμνσ. However, owing to the density matrix dependencies of the integrals, the Fock matrix consistent with Eq. (1) is not
FμνσNOTCH=hμν̄D+κλτμν|κλ̄Dδστμκ|νλ̄DDκλτ
(4)
but rather acquires additional terms arising from the density matrix element derivatives of the integrals, as a result of the variational principle, giving us
FμνσNOTCH=hμν̄D+κλτhκλ̄DDμνσDκλτ+κλτμν|κλ̄Dδστμκ|νλ̄DDκλτ+12μνσκλτμν|κλ̄DDμνσδστμκ|νλ̄DDμνσDμνσDκλτ+ELRCDDμνσ.
(5)

For the sake of simplicity, in the remaining parts of the manuscript, we use hμν̄ and μν|κλ̄ to denote the “zero-charge” core Hamiltonian and two-electron integrals, respectively, i.e., the integrals one obtains in the case that all atomic charges are exactly zero.

The basis set ano-pc-0 is obtained by a geometry-dependent re-contraction of the split valence, polarization function free basis set pc-0,39,40
ψμAanopc0r;RA,RB=νAcμAνAanopc0RA,RBϕνApc0r;RA,
(6)
where cμAνAanopc0RA,RB are the contraction coefficients which depend on the coordinates of the surrounding atoms RB, and ϕνApc0r;RA are the pc-0 primitive functions. For Li and Be, we also add the diffuse p function from the aug-pc-0 basis set41 to the ano-pc-0 primitive functions, to improve the description of the 2p orbitals. This yields a generally contracted minimal basis set with the following composition: 3s (uncontracted)/1s (contracted) for H and He, 5s2p (uncontracted)/2s1p (contracted) for Li and Be and 5s3p (uncontracted)/2s1p (contracted) for B–Ne. Note that the pc-0 primitive functions, which are optimized for partial general contraction, are more efficient than those of fully segmentally contracted basis sets like Slater-type orbitals (STO)-3G42 and MINIX,43 that possess 6 primitive s functions instead of 5.
The contraction coefficients cμAνAanopc0RA,RB are determined via the following procedure:
  1. A scalar relativistic second-order DKH2 state-averaged (SA)-complete active space self-consistent field (CASSCF) calculation with the decontracted pc-0 basis set is carried out for the neutral atoms of the elements Li–Ne, where the CAS active space includes all valence orbitals, and all states are averaged. (The DKH2 scalar relativistic treatment is of course unnecessary for the elements H–Ne, but will be necessary for the heavy elements that will be parameterized in the future; we have thus used DKH2 here as well, for consistency reasons.) For Li and Be, we consider the 2p orbitals as belonging to the valence orbitals, and furthermore we add the diffuse p function from the aug-pc-0 basis set to improve the description of the 2p orbitals. The converged inactive orbital is then taken as the 1s function of the ano-pc-0 basis set.

  2. The valence functions of the ano-pc-0 basis set are also given as contractions of the pc-0 primitive functions (plus the aug-pc-0 diffuse p functions for Li and Be). We first variationally optimized the contraction coefficients of the valence functions to minimize the DKH2 state specific (SS)-CASSCF ground state energies of the homodimers of elements H–Ne at the bond lengths 3.0, 2.9, …, 0.1 Å, under the constraint that the 2s function (if it exists) is orthogonal to the 1s function. We then interpolate the optimized contraction coefficients as a function of the bond length, with the following formula:

cμAνAanopc0RA,RB=cμAνA0+BAcμAνA1ρAPModelrRAVBPModelrRBdr+cμAνA2ρAPModelrrr=rRAVBPModelrRBdr,
(7)
where cμAνA02 are fitted coefficients, and ρAPModelr is the spherically averaged atomic density of atom A, which is calculated at the DKH2 restricted open-shell Hartree-Fock (ROHF)/UGBS44 level of theory, fitted by a sum of s-type Gaussian functions and tabulated in advance; VBPModelris the Coulomb potential generated by the atomic density ρBPModelr plus the nuclear charge of atom B. Since the atomic densities are expressed as a sum of Gaussians, the integrals in Eq. (7) can be readily evaluated analytically. The functional form of Eq. (7) is justified in  Appendix A. Although cμAνA02 are determined by fitting, they are fitted against variationally optimized contraction coefficients, instead of high-level reference data (which are usually used in existing semiempirical methods). They can be compared with the contraction coefficients of ab initio basis sets, which are often also determined by variationally minimizing the energies of atoms and simple molecules. Note also that, although cμAνA02 are fitted against homodiatomic molecules only, they are found to be well transferable to heterodiatomic molecules.

The radial parts of some selected ano-pc-0 basis functions of atoms and diatomic molecules are plotted in Fig. 1 as a function of the bond length. As expected, the basis functions are generally more compact in covalently bonded diatomic molecules than in free atoms, and become progressively more compact when the bond shortens or (in the case of H 1s) when the other atom becomes more and more electronegative. Thus, ano-pc-0 mimics a double-zeta basis set despite being a minimal basis set, thanks to the geometry dependence of the contraction coefficients. It is also interesting to note that the radial distribution of the C 2s AO is nearly invariant with respect to the chemical environment, probably because it has to remain orthogonal to the 1s AO, which restricts its ability to shrink in a molecular environment. However, there are two important effects that ano-pc-0 cannot yet capture: (1) the px, py, and pz functions of an atom can have different radial distribution profiles, a frequently cited example being linear molecules where the p orbital parallel to the bond axis (pσ) is typically more compact than those perpendicular to the bond axis (pπ); and (2) the basis functions may deform by mixing with polarization functions. These effects are crucial for an accurate prediction of equilibrium bond lengths and vibrational frequencies, and will be discussed in Sec. II G.

FIG. 1.

Radial parts of some ano-pc-0 basis functions in different diatomic molecules with different bond lengths. The C 2p (F-C, R = 1.3 Å) curve nearly coincides with the C 2p (C-C, R = 1.5 Å) one.

FIG. 1.

Radial parts of some ano-pc-0 basis functions in different diatomic molecules with different bond lengths. The C 2p (F-C, R = 1.3 Å) curve nearly coincides with the C 2p (C-C, R = 1.5 Å) one.

Close modal

To keep the empiricism of NOTCH at the lowest possible level while still obtaining decent and uniform accuracy, the core Hamiltonian and two-electron NOTCH integrals of free atoms are determined from DKH2 multireference equation-of-motion coupled cluster singles and doubles (MR-EOM-CCSD)/ANO-RCC-Full45–47 calculations of neutral atoms. We make full use of the fact that the ano-pc-0 AOs resemble the atomic natural orbitals (since they are optimized for atomic CASSCF calculations), so that for atomic calculations, the integrals of the former can be replaced by those of the latter to take advantage of the higher basis set quality and (more importantly) the implicit inclusion of dynamic correlation of the latter. This construction is at the heart of the NOTCH idea and is the reason why we insisted on using a generally contracted basis set. In addition, we require that the 1s and 2s functions of every element be orthogonal: a segmented contracted or non-orthogonal basis set may expand the correct s-space, but its basis functions generally do not resemble the exact atomic natural orbitals as orthogonal generally contracted basis sets can do.

Specifically, a DKH2 SA-CASSCF/ANO-RCC-Full calculation is first performed on the neutral atom, with the active space being all the valence orbitals, and all states being averaged. Then, the MR-EOM-T|T|SXD|U method21 is invoked, which consists of four sequential similarity transformations of the full Hamiltonian Ĥ that approximately (but quite accurately) decouple the CAS configurations from their single- and double excitations in the first-order interacting space, until only the couplings with the one-particle (1p) and one-hole (1h) configurations remain,
H̄̂=eT̂ĤeT̂,
(8)
H̃̂=eT̂H̄̂2eT̂,
(9)
F̄̂=eŜ+X̂+D̂1H̃̂2eŜ+X̂+D̂,
(10)
Ĝ=eÛF̄̂2eÛ.
(11)
Here, the curly braces denote the Kutzelnigg–Mukherjee normal ordering, and the subscript 2 denotes the truncation up to two-body operators. The definitions of the cluster operators T̂, Ŝ, X̂, D̂ and Û are given in the original references of the MR-EOM method.21,22,24 The final transformed Hamiltonian Ĝ for an atom A has the following form (here ÊνAμA and ÊνAλAμAκA are the usual spin-free excitation operators):
Ĝ=g0+μAνAgνAμAÊνAμA+μAνAκAλAgνAλAμAκAÊνAλAμAκA+
(12)
The standard MR-EOM workflow proceeds to diagonalize Ĝ in the space spanned by the CAS configurations plus the 1h and 1p configurations, giving the excitation energies of the system. In the parameterization of the NOTCH method, however, we stop here and set the atomic NOTCH one-electron and two-electron integrals as the one-body and two-body contributions of Ĝ, respectively [the superscript (A) stresses that the integrals are those of neutral atoms; the one-center integrals in charged atoms or molecules will generally have different values, see below]:
hμAνA(A)̄=gνAμA,
(13)
μAνA|κAλA(A)̄=gνAλAμAκA.
(14)

The rationale of this construction is that, a multireference configurational interaction singles (MR-CIS) calculation with NOTCH integrals will then closely reproduce the MR-EOM excitation energies (the reproduction is not exact, because the NOTCH MR-CIS calculation does not include Rydberg orbitals). This is corroborated by our benchmark results of atomic excitation energies, as shown in Table I: the NOTCH MR-CIS excitation energies of atoms reproduce the MR-EOM ones (which in turn agrees with the experimental results48 to about 0.1 eV) to within a few tenth of an eV, with only one exception (fluorine). Thus, the NOTCH two-electron integrals can be said to implicitly include dynamic correlation with close to MR-CCSD quality. While a slightly better accuracy is achieved by the INDO/S intermediate neglect of differential overlap/spectroscopy49,50 method, this is mostly due to the fact that the INDO/S integrals were fitted against this exact set of experimental spectra, unlike the NOTCH integrals that were derived from higher-level calculations without any empiricism. Semiempirical methods that were not fitted to excitation energies at all, such as PM38 and OM2,51,52 perform very poorly and yield mean absolute deviations (MADs) around 2–3 eV. The largest outlier, the 12P → 12S excitation energy of the fluorine atom, is predicted to be 17.615 and 9.474 eV too low by PM3 and OM2, respectively. This is probably due to the fact that ground state, fluorine-containing molecules generally do not have a hole in the fluorine 2s orbital, making typical ground state molecule training sets incapable of properly constraining the integrals over 2s orbitals. While PM3 is well-known to be unsuitable for excitation energies in general,53 the failure of OM2 is surprising since random phase approximation (RPA) and multireference configuration interaction (MRCI)54 calculations with OM2 integrals are routinely used to calculate the excited states of molecules. This again highlights the importance of deriving the one-center integrals from the MR-EOM Hamiltonian, instead of fitting against reference data.

TABLE I.

Atomic excitation energies (eV) of elements Li–F computed by MR-CIS using NOTCH, INDO/S, PM3 and OM2 integrals, compared with DKH2 MR-EOM-CCSD/ANO-RCC-Full and experimental results.

Atom (ground state)Excited stateNOTCHINDO/SPM3OM2MR-EOMExpt.
Li [1s22s1 (2S)] 1s22p1 (2P) 1.846 1.850 2.407 N.A.a 1.846 1.848 
Be [1s22s2 (1S)] 1s22s12p1 (3P) 2.908 3.258 N.A.a N.A.a 2.866 2.725 
1s22s12p1 (1P) 5.961 5.810 N.A.a N.A.a 5.280 5.277 
B [1s22s22p1 (2P)] 1s22s12p2 (4P) 2.623 4.154 6.998 N.A.a 3.703 3.551 
1s22s12p2 (2D) 5.706 6.789 9.988 N.A.a 5.985 5.932 
1s22s12p2 (2S) 7.054 8.043 11.758 N.A.a 8.092 7.879 
1s22s12p2 (2P) 9.168 9.555 12.428 N.A.a 8.916 8.991 
C [1s22s22p2 (3P)] 1s22s22p2 (1D) 1.217 1.082 1.754 1.240 1.148 1.260 
1s22s22p2 (1S) 3.042 2.706 3.583 2.265 2.646 2.680 
1s22s12p3 (5S) 3.114 5.168 3.847 4.516 4.310 4.179 
1s22s12p3 (3D) 7.737 9.090 8.768 8.806 7.949 7.942 
N [1s22s22p3 (4S)] 1s22s22p3 (2D) 2.278 2.325 1.421 2.085 2.251 2.384 
1s22s22p3 (2P) 2.934 3.876 2.268 2.891 3.519 3.576 
1s22s12p4 (4P) 10.731 11.180 8.593 14.026 10.870 10.927 
1s22s12p4 (2D) 15.624 15.717 10.677 18.556 14.952 15.027 
O [1s22s22p4 (3P)] 1s22s22p4 (1D) 1.808 1.657 1.248 1.540 1.867 1.958 
1s22s22p4 (1S) 4.521 4.142 3.094 2.816 4.204 4.180 
1s22s12p5 (3P) 16.385 15.896 18.336 20.669 15.546 15.650 
F [1s22s22p5 (2P)] 1s22s12p6 (2S) 23.104 20.257 3.506 11.647 21.121 21.039 
MDb  0.036 0.289 −0.485 (0.585c0.056 (0.923c  
MADb  0.519 0.420 2.861 (1.938c2.117 (1.448c  
Atom (ground state)Excited stateNOTCHINDO/SPM3OM2MR-EOMExpt.
Li [1s22s1 (2S)] 1s22p1 (2P) 1.846 1.850 2.407 N.A.a 1.846 1.848 
Be [1s22s2 (1S)] 1s22s12p1 (3P) 2.908 3.258 N.A.a N.A.a 2.866 2.725 
1s22s12p1 (1P) 5.961 5.810 N.A.a N.A.a 5.280 5.277 
B [1s22s22p1 (2P)] 1s22s12p2 (4P) 2.623 4.154 6.998 N.A.a 3.703 3.551 
1s22s12p2 (2D) 5.706 6.789 9.988 N.A.a 5.985 5.932 
1s22s12p2 (2S) 7.054 8.043 11.758 N.A.a 8.092 7.879 
1s22s12p2 (2P) 9.168 9.555 12.428 N.A.a 8.916 8.991 
C [1s22s22p2 (3P)] 1s22s22p2 (1D) 1.217 1.082 1.754 1.240 1.148 1.260 
1s22s22p2 (1S) 3.042 2.706 3.583 2.265 2.646 2.680 
1s22s12p3 (5S) 3.114 5.168 3.847 4.516 4.310 4.179 
1s22s12p3 (3D) 7.737 9.090 8.768 8.806 7.949 7.942 
N [1s22s22p3 (4S)] 1s22s22p3 (2D) 2.278 2.325 1.421 2.085 2.251 2.384 
1s22s22p3 (2P) 2.934 3.876 2.268 2.891 3.519 3.576 
1s22s12p4 (4P) 10.731 11.180 8.593 14.026 10.870 10.927 
1s22s12p4 (2D) 15.624 15.717 10.677 18.556 14.952 15.027 
O [1s22s22p4 (3P)] 1s22s22p4 (1D) 1.808 1.657 1.248 1.540 1.867 1.958 
1s22s22p4 (1S) 4.521 4.142 3.094 2.816 4.204 4.180 
1s22s12p5 (3P) 16.385 15.896 18.336 20.669 15.546 15.650 
F [1s22s22p5 (2P)] 1s22s12p6 (2S) 23.104 20.257 3.506 11.647 21.121 21.039 
MDb  0.036 0.289 −0.485 (0.585c0.056 (0.923c  
MADb  0.519 0.420 2.861 (1.938c2.117 (1.448c  
a

The semiempirical method is not parameterized for this element.

b

With respect to the MR-EOM results.

c

Statistics after the worst outlier, the 12P → 12S excitation energy of fluorine, is removed.

The NOTCH integrals of atomic monocations and monoanions, hμAνA(A±)̄ and μAνA|κAλA(A±)̄, are similarly calculated and tabulated in advance. By this way, not only the total energies and excitation energies, but also the IPs and EAs of free atoms automatically acquire MR-EOM-like accuracy. The integrals of fractionally and multiply charged atoms can then be obtained by a quadratic interpolation/extrapolation from the neutral, cationic and anionic integrals:
hμAνAĀqA=hμAνAĀ+aμAνACDI,1e(1)qÃ+aμAνACDI,1e(2)qÃ2,
(15)
qÃ=4πarctanqA,
(16)
aμAνACDI,1e(1)=12hμAνAA+̄hμAνAĀ,
(17)
aμAνACDI,1e(2)=12hμAνAA+̄+hμAνAĀhμAνAĀ.
(18)
Here, the superscript “CDI” stands for “charge dependence of integrals,” and qA is the atomic charge. The two-electron integrals of charged atoms are derived in a completely analogous manner:
μAνA|κAλA(A)̄qA=μAνA|κAλA(A)̄+aμAνAκAλACDI,2e(1)qÃ+aμAνAκAλACDI,2e(2)qÃ2,
(19)
aμAνAκAλACDI,2e(1)=12μAνA|κAλA(A+)̄μAνA|κAλA(A)̄,
(20)
aμAνAκAλACDI,2e(2)=12μAνA|κAλA(A+)̄+μAνA|κAλA(A)̄μAνA|κAλA(A)̄.
(21)
Note that qA is “regularized” via Eq. (16) before it is used in the quadratic interpolation/extrapolation [Eqs. (15) and (19)], so that qà is equal to qA when qA1,0,1, and levels off to ±2 as qA becomes very large, instead of increasing indefinitely in magnitude and potentially leading to unphysical integral values. In reality we have not observed such unphysical results even if we used qA instead of qÃ, but we nevertheless chose to use qà to ensure the robustness of NOTCH for exotic systems.
For diatomic molecules, one needs to derive the NOTCH two-center integrals, in addition to the one-center integrals mentioned above. Less obvious is the fact that the NOTCH one-center integrals in a molecule must, in general, be different from the corresponding integral in the free atom. Consider a dihydrogen molecule whose H–H bond length is compressed to zero. The system then becomes a helium atom, whose 1 s orbitals are much tighter than those of the hydrogen atom. Therefore we have
limRAB0sAsA|sAsAH2=sAsA|sAsAHe>sAsA|sAsAH.
(22)
The most straightforward approach of setting sAsA|sAsAH2̄sAsA|sAsAH̄ can thus have an error up to sAsA|sAsAHēsAsA|sAsAH̄0.400a.u. While RAB = 0 is not a realistic situation, the huge error at the RAB → 0 limit casts doubt on the validity of the approximation sAsA|sAsAH2̄sAsA|sAsAH at non-zero bond lengths. The same situation occurs for the one-electron integrals. Interestingly, none of the most popular ZDO methods seem to have made the one-center integrals dependent on the chemical environment. This can be partly rationalized by the fact that the AOs of ZDO methods correspond more closely to the OAOs (rather than the AOs) of ab initio calculations,55 and the molecular OAOs do not reduce to the atomic OAOs as RAB → 0 (e.g., the OAOs of a dihydrogen molecule do not reduce to an OAO of the helium atom in the RAB → 0 limit, but rather become linear combinations of the helium 1s and 2p orbitals). However, the environmental dependence of one-center integrals becomes a real issue in TB methods, where integrals are usually given in the AO basis, and indeed the one-center integrals were made to depend on the atomic coordination numbers in, e.g., the xTB family of methods.14–16 
A better alternative for the two-electron integrals of a diatomic molecule AB, would be to add a dynamic correlation correction to the HF/ano-pc-0 integrals of the molecule, μν|κλ(AB) [which differs from μν|κλ(A) due to the geometry dependence of the contraction coefficients of the ano-pc-0 basis set]. Since the ano-pc-0 basis set is generally more compact in the molecular environment than in the free atom, the molecular NOTCH one-center integrals calculated in this way will be generally larger in magnitude than the corresponding free atom NOTCH integrals, as expected. Without loss of generality, we may write
μν|κλAB̄=μν|κλAB+μν|κλcorr,A+μν|κλcorr,B+μν|κλdisp,AB.
(23)
Here, μν|κλcorr,A and μν|κλcorr,B are the local dynamic correlation contributions of atoms A and B to μν|κλAB̄, respectively, and μν|κλdisp,AB is a nonlocal term that describes the dispersion effect, as will be detailed below.
The local dynamic correlation contributions μν|κλcorr,A, where the AOs μ, ν, κ, λ can be on either atom A or atom B, can in principle be approximated by projecting the AOs onto the AO basis functions of atom A (taking advantage of the fact that the AO basis functions μA of a given atom A are orthonormal). Note that we have temporarily used the bra-ket notation μ|μA for the overlap matrix elements SμμA to better illustrate how the projection is applied:
μν|κλcorr,AμAμ|μAμAν|κλcorr,A
μAνAκAλAμ|μAν|νAκ|κAλ|λAμAνA|κAλAcorr,A
μAνAκAλASμμASννAμAνA|κAλAcorr,ASκκASλλA.
(24)
Thus, we have reduced the task of finding the local dynamic correlation corrections μν|κλcorr,A to finding the one-center corrections μAνA|κAλAcorr,A, which (without loss of generality) can be expressed as the correlation correction in the free atom times a factor,
μAνA|κAλAcorr,A=fμAνAκAλAcorrμAνA|κAλAĀμAνA|κAλAA.
(25)
However, Eq. (24) double-counts the corrections to two-center integrals in the large RAB limit: since the fμAνAκAλAcorr must approach 1 in this limit in order to reproduce the correct free atom results, both μν|κλcorr,A and μν|κλcorr,B now describe the full correlation, and the total local dynamic correlation correction to the two-center integrals is thus too large by a factor of 2. Thus, in the real working equations, the overlap matrix elements in Eq. (24) are scaled by a factor that approaches 1 when RAB is small, but smoothly tends to 1/2 when RAB → +, in order to remove this double-counting while not spoiling the small RAB behavior,
μν|κλcorr,A=μAνAκAλAS̃μμAS̃ννAμAνA|κAλAcorr,AS̃κκAS̃λλA,
(26)
S̃μAνB=SμAνB1+eαcorrRAB/rA22.
(27)
Here, αcorr is an empirical constant. rA is a nonempirical, element-specific parameter that correlates with the atomic radius of the atom A; its construction recipe will be defined later.
The correlation factors fμAνAκAλAcorr account for the fact that an atom in a molecule generally feels less dynamic correlation than in the free atom, and are given by
fμAνAκAλAcorr=fμAfνAfκAfλA1/4.
(28)
For the orbital specific correlation factors, we took inspiration from density functional theory and expressed them as,
fμA=ψμArψμArρAPModelrRAρABPModelrfc,PW92ρABPModelrψμArψμArdrψμArψμArfc,PW92ρAPModelrRAψμArψμArdr,
(29)
where fc,PW92ρ=2Ec,PW92ρρ2=12fααc,PW92ρ+fαβc,PW92ρ is the spin-unpolarized correlation kernel of the PW92 local density approximation (LDA) correlation functional,56 and ρABPModelr=ρAPModelrRA+ρBPModelrRB is an approximation to the molecular density. The numerator of Eq. (29) has a clear physical interpretation: it is the contribution of atom A (as defined via a Hirshfeld partitioning) to the LDA correlation kernel tensor element fμAμAμAμAc,PW92ρABPModelr in the atom AB. The denominator similarly equals the LDA correlation kernel fμAμAμAμAc,PW92ρAPModelrRA in the free atom A.
Equation (29) is further approximated by spherically averaging the basis functions, e.g., ψpxr is replaced by ψpxr2+ψpyr2+ψpzr2. Now the integrand in the denominator becomes spherically symmetric, and the integrand in the numerator becomes axially symmetric. They are then integrated on a product grid centered on atom A with 1 radial point, 4 theta points and 1 phi point, where the z axis of the product grid is aligned with the AB bond axis; the latter ensures that the numerically integrated results are rotationally invariant. The radial point of the grid, rμAA, is set to the unique value that exactly integrates the ratio
ψμArψμArfc,PW92ρAPModelrRAψμArψμArdrψμArψμArψμArψμArdr
(30)
and is tabulated in advance. The value rμAA is unique because fc,PW92ρAPModelrRA is a monotonic function of rRA; it roughly correlates with the radial extent of the AO ψμAr. The theta points are obtained from the standard Gauss-Legendre quadrature. The use of only one radial point may seem like an extremely crude approximation, but in reality it suffices to converge the equilibrium geometries to about 0.01 Å compared to a more realistic grid comprising 100 radial points and 30 theta points, and it is actually preferable to the latter due to a lack of numerical noise. The use of only 4 theta points similarly suffices to converge the equilibrium geometries to about 0.001 Å.
Finally, the dispersion term μν|κλdisp has the following form:
μν|κλdisp=μAνAκBλBS̃μμAS̃ννAn=6,8sn×ηABnrμAArνAArκBBrλBBn4RABn1+a1rμAArνAArκBBrλBB14+a2n1×μAνA|κBλBS̃κκBS̃λλB.
(31)
The n = 6 and n = 8 terms in Eq. (31) decay asymptotically as RAB6 and RAB8, respectively [noting that μAνA|κBλB decays as RAB1], and are damped to finite values when RAB is small, in direct analogy with the well-known RAB6 and RAB8-decaying terms of the Becke–Johnson (BJ) damped dispersion correction.57–59 Unlike the conventional MM-type dispersion corrections (e.g., DFT-D360), however, Eq. (31) automatically captures the dependence of the dispersion energy with respect to atomic charges, i.e., the magnitude of the dispersion energy between two atoms increases when one of the atoms acquire a negative charge, and decreases when it acquires a positive charge, since the contribution of Eq. (31) to the total energy is proportional to the density matrix elements of the atoms [cf. Eq. (1)]:
Edispcorr=12μνσκλτDμνσμν|κλdispδστμκ|νλdispDκλτ.
(32)
The charge-dependent dispersion correction Eq. (31) can be compared with established charge-dependent dispersion correction schemes, such as DFT-D4.61 However, unlike the latter, Eq. (31) has the property that when one of the atoms has no electrons, the atom’s contribution to the dispersion energy is exactly zero. The factor rμAArνAArκBBrλBBn4 in Eq. (31) further makes the core electrons contribute much less to the dispersion energy than the same number of valence electrons do.

The parameters ηABn are pre-computed for every pair of atoms AB, so that when the spherically averaged NOTCH ROHF atomic density matrices of atoms A and B are plugged into Eq. (31), the DFT-D3 CAB6 and CAB8 coefficients of the atom pair AB are reproduced. The remaining parameters are determined by fitting, except for s6 which is fixed at the theoretical value of 1.0, as will be detailed in Sec. II I.

Finally, the one-electron integrals are corrected in a similar way,
hμνAB̄=hμνAB+μAνASμμAhμAνAĀhμAνASννA+μBνBSμμBhμBνBB̄hμBνBSννB+hμνpol.
(33)
The terms hμAνAĀhμAνA and hμBνBB̄hμBνB are included to recover the free atom NOTCH integrals in the RAB → + limit. The extra correction hμνpol is detailed in Sec. II G.
Now that we have the one-electron and two-electron integrals at zero atomic charge, the charge dependence of integrals, Eqs. (15)(21), needs to be generalized to the molecular case as well. The first thing to consider is the definition of the atomic charge qA in molecules, which can, in principle, be obtained from any population analysis method. One choice that we find particularly robust and convenient is the Löwdin charges:
qA=ZAμAσDμAμAσOAO.
(34)
Now we make the approximation that only one-center OAO basis integrals are affected by the atomic charges, while the two-center ones are not. Like the dispersion correction of the two-electron integrals, the free atom charge dependence corrections hμAνAĀqAhμAνAĀ and μAνA|κAλA(A)̄qAμAνA|κAλA(A)̄ also need to be scaled down before they can be added onto the molecular integrals,
hμAνAOAŌD=hμAνAOAŌ+fACDIhμAνAĀqAhμAνAĀ+hμAνApol,OAOqBhμAνApol,OAO,
(35)
μAνA|κAλAOAŌD=μAνA|κAλAOAŌ+fACDIμAνA|κAλA(A)̄qAμAνA|κAλA(A)̄.
(36)
The last term of Eq. (35) make the one-center integrals depend on the charges of other atoms (qB|BA) too, and will be explained in Sec. II G. The damping factor has the simple form
fACDI=BA1eβCDIRAB/rB
(37)
where βCDI is an empirical parameter, and rA=maxμAArμAA is the radial extent of the most diffuse AO of atom A, which should correlate with the size of the atom itself; note that rA is also used in Eq. (27). As discussed before, the charge dependence corrections are designed to mimic the expansion and shrinking of AOs due to non-zero atomic charges, and Eq. (37) accounts for the fact that, when other atoms are bonded to atom A, the AOs of those atoms can provide some of the variational flexibility required to expand or shrink the AOs, so that there is less need to correct the integrals.

Even with the above constructions, the NOTCH method still suffers from a systematic overestimation of equilibrium bond length (by 0.03–0.1 Å), as well as IPs and EAs (by 1.0–4.0 eV). This was found to be due to fundamental restrictions of the minimal basis set that are not remedied by the geometry-dependent re-contraction, i.e., (1) the constraint that the radial distributions of the px, py and pz orbitals of a given atom must be the same, and (2) the lack of polarization functions. These effects prove to be very difficult to capture without adding more basis functions to the minimal basis set.

To overcome these remaining basis set deficiencies without deviating from a minimal basis set formalism, we first consider the case where the atoms are sufficiently far away from each other, in which case these basis set limitations manifest themselves primarily as a severe underestimation of atomic polarizabilities, a pitfall that is well-known in other semiempirical methods as well. We can thus define a “basis set correction” to the total energy as the difference between the atomic polarization energies with a nearly complete basis set and with the ano-pc-0 basis set,
ΔEpolqB=AΔEApolqB,ΔEApolqB=12αAdef2QZVPPDqÃαAanopc0qÃFAqB2.
(38)
The charge-dependent polarizabilities αAdef2QZVPPDqà and αAanopc0qà are obtained from ground state polarizabilities of the neutral and monocationic states of the atom A calculated at the DKH2 SA-CASSCF/def2-QZVPPD62,63 and DKH2 SA-CASSCF/ano-pc-0 levels of theory, respectively, and interpolated as a linear function of the regularized charge qÃ. FAqB is the electric field that all atoms other than A exert on the outermost orbital shell [whose density contribution is approximated by the Fukui function ρAFukuirRA=ρAPModelrRAρA+PModelrRA] of atom A,
FAqB=BAfABpolρAFukuirRArrrr3×ρBPModelqB̃rRBZBδrRBdrdr.
(39)
Here, ρBPModelqB̃ is obtained from a linear interpolation between the fitted densities of the spherically averaged neutral atom (ρBPModel) and monocation (ρB+PModel) of B. Thanks to the ρPModel densities being a sum of s-type Gaussian functions, the above integral can be easily evaluated analytically.
To make Eq. (39) applicable to short interatomic distances as well, we have introduced the empirical factor fABpol, which damps FAqB when RAB is small, since the in this case the atom B starts to probe the inner region of the electron cloud of atom A, which is less polarizable than what would be expected from the total atomic polarizability,
fABpol=1eβpolRAB/rA,
(40)
where βpol is an empirical parameter. The energy correction is then equally distributed among the valence electrons of A, as a correction to the diagonal one-electron integrals (where NAval is the number of valence electrons in the neutral atom of A),
hμAνApol,OAOqB=δμAνAΔEApolqBNAval,when both μA and νA are valence AOs,0,otherwise.
(41)

The only remaining important contribution to the total energy is the left–right correlation of the bonded electrons, also known as the non-dynamic correlation. It is well known that semilocal DFT correlation functionals mainly describe dynamic correlation and severely underestimate (if not completely miss) the left–right correlation contribution,64 especially in elongated bonds. This is one of the major reasons why DFT functionals with 100% HF exchange typically perform poorly. As the NOTCH correlation correction described in Sec. II E uses an LDA-inspired form, it suffers from the same lack of left–right correlation, which is primarily manifested as a systematic overestimation of vibrational frequencies and IPs for certain molecules, like N2, O2, and F2, as well as an erroneous prediction of dipole moments. Therefore, we now add a final correction, ELRCD, to approximately account for left–right correlation.

Recall that the quantity 4DμAνBσOAO2 measures the expected number of shared spin-σ electrons between the OAOs μA and νB, which is the basis of the Wiberg bond order analysis.65 Similarly, we argue that the analogous quantity constructed from the uncorrelated OAO two-body density matrix (2-RDM) element, 16ΓμAκBσνAλBτOAO2, where
ΓμAκBσνAλBτOAO=DμAκBσOAODνAλBτOAOδστDμAλBσOAODνAκBτOAO,
(42)
for single reference methods like NOTCH, should represent a generalization of the Wiberg bond order analysis to 2-RDMs and measure the expected number of pairs of shared electrons [note that the pair of electrons, as defined in this section, need not occupy the same spatial orbital; thus, for example, there are 6 × (6 − 1)/2 = 15 pairs of shared electrons in N2, instead of 3]. Specifically, it is easy to see from Eq. (42) that 16ΓμAκBσνAλBτOAO2 is close to 1 when either (1) there is a spin-σ electron shared between OAOs μA and κB, and another spin-τ electron shared between OAOs νA and λB (in which case DμAκBσOAODνAλBτOAO12 and DμAλBσOAO=DνAκBτOAO=0); (2) there is a spin-σ electron shared between OAOs μA and λB, and another spin-σ electron shared between OAOs νA and κB, and σ = τ (in which case, DμAλBσOAODνAκBτOAO12 and DμAκBσOAO=DνAλBτOAO=0).
Since the left–right correlation is primarily due to the double excitations of shared electrons, it is natural to approximate the left–right correlation (LRC) energy as proportional to ΓμAκBσνAλBτOAO2, i.e., there is a contribution from every pair of shared electrons (the factor 4 instead of 16 stems from the double-counting of the OAOs in the sum),
ELRCD=A<BστμAνBκAλBvalenceOAOs4ΓμAκBσνAλBτOAO2EμAνBκAλBLRC,AB,
(43)
where EμAνBκAλBLRC,AB is given by the configuration interaction energy of a 2 × 2 Hamiltonian including a doubly excited determinant and the ground state determinant,
EμAνBκAλBLRC,AB=ΔμAνBκAλBLRCΔμAνBκAλBLRC2+ξABLRC2.
(44)
The quantity ΔμAνBκAλBLRC describes the energy gap between the doubly excited determinant and the ground state determinant. It should reduce to the AO orbital energy difference in the RAB → + limit,
ΔμAνBκAλBLRCRAB+=maxϵμA+ϵνBϵκAϵλB,ϵμA+ϵκAϵνBϵλB,ϵμA+ϵλBϵνBϵκA,
(45)
i.e., the sum of the two higher AO orbital energies minus the sum of the two lower AO orbital energies. The orbital energies ϵμA, ϵνB, ϵκA, and ϵλB are pre-computed at the ROHF-NOTCH level for spherically averaged atoms. At finite RAB, the energy gap ΔμAνBκAλB should in general increase as a result of the resonance integral, βAB, which couples the AOs of the two atoms,
ΔμAνBκAλBLRC=ΔμAνBκAλBLRCRAB+2+4βAB2.
(46)
The resonance integrals, while usually approximated as proportional to the overlap matrix elements in semiempirical methods, are shown in the OMx methods11 to be more accurately approximated by the following functional form:
βAB=β0RABeα0RAB2.
(47)
We have thus adopted the same functional form in NOTCH. Here, β0 and α0 are parameters that traditionally depend on the element types of A and B, as well as the types of AOs involved in the bonding, but are treated as global parameters in NOTCH.
Meanwhile, the term ξABLRC in Eq. (44) describes the off-diagonal matrix element between the ground state and the doubly excited determinant, whose RAB → + asymptotic behavior is determined by the asymptotic behavior of the MO two-electron integrals (where ξAB0 and ζAB are constants),
ξABLRC=ξAB0ζABRAB+O1RAB2.
(48)
Here, the constant term ξAB0 and the first-order term ζABRAB come from the (AA|AA) and monopolar (AA|BB) integrals, respectively. On the other hand, we would like ξABLRC to vanish in the RAB → 0 limit, since in this limit all correlation effects become completely local and are thus already well described by the local correlation corrections described in Sec. II E. These suggest the following form:
ξABLRC=ξAB0ζABRAB1eξAB0RAB/ζAB.
(49)
For homodiatomic molecules, the parameters ξAB0 and ζAB can be determined by matching the asymptotic behaviors of their restricted HF (RHF) NOTCH and unrestricted HF (UHF) NOTCH dissociation curves up to O1RAB (for He2, Be2, and Ne2, where there is no shared electron in the RAB → 0 limit, we set ξAB0=ζAB=0). The parameters for a heterodiatomic molecule AB are then taken as the maximum of those of the homodiatomic molecules AA and BB.

We conclude by a few remarks of the functional form Eq. (43):

  • In the DFT community, the left–right correlation is usually modeled by using a less than 100% amount of HF exchange; the difference of DFT exchange and HF exchange approximates the left–right correlation energy because it makes the exchange–correlation hole more local (as it usually should) than if the functional comprises 100% HF exchange plus a semilocal correlation functional. However, this strategy introduces the notorious self-interaction error problem, since the so-modeled left–right correlation is nonzero even for single-electron densities. The same problem obviously also exists in all self-consistent TB methods. In NOTCH, however, ELRCD by construction vanishes for single-electron densities, since the 2-RDM matrix elements ΓμAκBσνAλBτOAO are zero when there is only one electron. The theory thus remains self-interaction free.

  • Equation (43) is a quartic functional of the density matrix, so it cannot be accurately absorbed into any ansatz where the energy only depends on the density matrix up to third order. The energy expressions of all mainstream ZDO methods are quadratic functionals of the density matrix, while some TB methods include terms cubic in the density matrix (i.e., cubic charge dependence terms of the on-site energy).15,16,29,30 This in part explains why heavily parameterized core–core repulsion potentials are ubiquitous in existing semiempirical methods because the employed functional forms are not flexible enough to describe the left–right correlation, so that many more empirical parameters have to be introduced to fit the reference data.

  • The parameterization procedure of ξAB0 and ζAB implies that NOTCH can dissociate homodiatomic molecules correctly even under a restricted HF formalism, which even many ab initio methods struggle to achieve. The dissociation is in general not exactly correct for heterodiatomic molecules or diatomic molecular ions but in general still better than if ELRCD was not present. While this may not be directly relevant in real-world calculations (since it is usually acceptable to use broken-symmetry wavefunctions, which naturally lead to the correct dissociation limits), it is responsible for curing the systematic overestimation of energy at long but finite bond lengths, leading to better vibrational frequencies at equilibrium geometries (and potentially also the systematic overestimation of barrier height, which is common for methods with 100% HF exchange).

  • The use of the uncorrelated 2-RDM [Eq. (42)] is crucial here. Take H2 as an example: The exact correlated 2-RDM element decays to zero in the RAB → + limit, giving zero correlation correction, while the uncorrelated RHF 2-RDM shows that there is still one pair of shared electrons, and thus correctly yields a correlation correction that restores the UHF dissociation limit (see Fig. S1 for the uncorrelated and correlated 2-RDM elements of H2 as a function of the bond length). Thus, Eq. (42) should not be understood as an approximation of the correlated 2-RDM. This is reminiscent of the fact that, the Wiberg bond order of, e.g., H2 calculated from the correlated density matrix goes to zero in the RAB → + limit and therefore does not reflect the number of shared electrons in the RHF wavefunction.

To illustrate the effect of the LRC correction (especially near the dissociation region), we have plotted the potential energy curves of six representative diatomic molecules (H2, N2, F2, CH, HF, LiF) with NOTCH, both with and without the LRC correction (Fig. 2). As LRC strongly stabilizes the RHF wavefunction in the dissociation region but less so near the equilibrium geometry, it corrects the energy overestimation of the other terms of the NOTCH method in the dissociation region, including the highly ionic molecule LiF, although the correction is sometimes too large (especially for F2 and CH). Perhaps more importantly, the preferential stabilization of the dissociation region also increases the equilibrium bond length and reduces the vibrational frequency at the equilibrium geometry, which is especially obvious for F2 and CH. Therefore, despite the fact that the LRC correction does not consistently improve the bond energy and UHF potential energy curves in the dissociation region, we expect that it should in general be beneficial for geometries and vibrational frequencies, as will be confirmed in Secs. III BIII C.

FIG. 2.

Potential energy curves of selected diatomic molecules, calculated by NOTCH with and without the LRC correction, compared against DKH2 UCCSD(T)/ANO-RCC-TZP results.

FIG. 2.

Potential energy curves of selected diatomic molecules, calculated by NOTCH with and without the LRC correction, compared against DKH2 UCCSD(T)/ANO-RCC-TZP results.

Close modal

Another property of the LRC correction is that the RHF instability point of the NOTCH method occurs later than if the LRC correction is not present, especially in F2 and CH; this can be compared to the well-known fact that the ground state wavefunction instabilities of DFT methods generally occur at longer bond lengths than the HF method. For H2, the restricted NOTCH wavefunction even remains stable all the way until dissociation, so that the RHF NOTCH and UHF NOTCH curves coincide. For N2 and F2, the LRC correction also makes the RHF NOTCH dissociation limit coincide with the UHF one (by construction), but an RHF instability is still observed when the molecules are not fully dissociated; thus, while the RHF NOTCH curves exhibit unphysical bumps, the UHF NOTCH ones remain qualitatively correct. In the case of the HF molecule, the RHF NOTCH dissociation limit is wrong, but again the UHF NOTCH curve is at least as well behaved as when the LRC correction is not present.

Overall, our preliminary version of NOTCH is a semiempirical method featuring minimal empiricism and explicitly describes much more physics than existing semiempirical methods: (1) the local dynamic correlation and scalar relativistic effects are included through nonempirical MR-EOM calculations (Sec. II D) and the former is properly scaled down in a molecular environment (Sec. II E); (2) the nonlocal component of dynamic correlation, i.e., dispersion, is accounted for as a correction to the (AA|BB)-type two-electron integrals (Sec. II E); (3) the non-dynamic or left–right correlation is described by a separate term inspired by Wiberg bond orders (Sec. II H); (4) the radial distortions of AOs as a function of geometry (Sec. II C) and atomic charge (Sec. II F) are accounted for by corrections to the AO contraction coefficients and one-center integrals, respectively; and (5) the angular distortions of AOs are approximated by adding a classical charge polarization term (Sec. II G). It can be seen that although our method is built upon a HF/minimal basis framework, all relativity, correlation, and basis set incompleteness effects are included in a physically transparent way, and no attempt is made to absorb any effect into terms whose functional forms are not designed to capture that effect.

The present form of NOTCH contains eight global empirical parameters: (1) the parameter αcorr responsible for removing the double-counting of dynamic correlation [Eq. (27)]; (2) the dispersion correction parameters s8, a1, and a2 [Eq. (31)]; (3) the parameter βCDI used for damping the charge dependence correction of integrals [Eq. (37)]; (4) the parameter βpol used for damping the basis set incompleteness correction [Eq. (40)]; and (5) the parameters β0 and α0 involved in the resonance integral in the LRC correction [Eq. (47)]. They were fitted via the following procedure:

  1. The parameters αcorr, s8, a1, and a2 were fitted against the domain-based local pair natural orbital (DLPNO)-coupled-cluster singles and doubles with perturbative triples [CCSD(T)]66,67 local energy decomposition (LED)68,69 dispersion energies of all 55 diatomic molecules constructible from the elements H–Ne, at bond distances that are 0.9, 1.0, 1.2, 1.5, and 2.0 times the sum of the respective van der Waals radii (for details, see  Appendix B).

  2. The parameters βpol, β0, and α0 are obtained by fitting the RHF NOTCH potential energy curve of H2 against DKH2 CCSD(T)/ANO-RCC-TZP.45–47 

  3. The parameter βCDI is fitted against the DKH2 HF/ANO-RCC-TZP dissociation energy of H2+.

Since there are some couplings between the parameters, the above procedure is iterated a few times until all the parameters converge. Although the parameters βpol, β0, α0, and βCDI are fitted against H2 and H2+ only, the benchmark results (Sec. III) do not improve noticeably when we add more molecules (such as N2 and F2) to the training set, suggesting that the parameters are well transferable among different diatomic molecules.

In addition, NOTCH has many parameters that are not derived from fitting against high-level reference data, including (1) the parameters cμAνA02 used for calculating the geometry-dependent contraction coefficients of the ano-pc-0 basis set (Sec. II C), fitted from CASSCF calculations of ground state homodiatomic molecules; (2) the Gaussian exponents and coefficients used for expanding the spherically averaged density of atoms, ρAPModel (Sec. II C), and atomic monocations, ρA+PModel (Sec. II G), fitted against spherically averaged ROHF atomic densities; (3) the MR-EOM one-electron and two-electron integrals (Sec. II D), generated from atomic MR-EOM calculations; (4) the location of the radial grid point in the numerical integration of Eq. (29) (Sec. II E), fitted from numerically integrating Eq. (30) for spherically averaged atoms; (5) the atomic polarizability correction αAdef2QZVPPDqÃαAanopc0qà (Sec. II G), obtained from polarizability calculations of atoms; and (6) the element-specific parameters ξAA0 and ζAA (Sec. II H) involved in the LRC correction, determined by matching the RHF and UHF NOTCH energies for homodiatomic molecules at the dissociation limit. These parameters, as well as detailed descriptions for the procedures to generate them, are tabulated in the supplementary material for convenience. Note that all the nonempirical parameters are derived from the calculations of either atoms or homodiatomic molecules; thus, it is easy to extend the parameterization to heavier elements. While the global, empirical parameters (Table II) may need to be re-fitted when we parameterize more elements, we expect that the reference data of a very few number of heavy element-containing molecules should suffice for the fitting.

TABLE II.

The global NOTCH parameters. All parameters are given in atomic units.

Parameterαcorrs8a1a2βCDIβpolβ0α0
Value 0.080 0.711 0.752 5.773 0.793 0.240 0.130 0.094 
Parameterαcorrs8a1a2βCDIβpolβ0α0
Value 0.080 0.711 0.752 5.773 0.793 0.240 0.130 0.094 

Finally, based on the comments of a referee to this paper, we would like to briefly discuss the meaning of the term “empirical” in semiempirical theories. Empiricism, as a deviation from pure ab initio theories, may enter a semiempirical method in various ways:

  • by fitting physically more or less well-defined parameters that are atom or even atom pair specific to a set of molecular reference data;

  • by fitting a set of element-specific parameters to experimental data obtained on free atoms and ions;

  • by replacing well-defined quantities in ab initio theory, for example, analytical AO integrals, with simplified and possibly parameterized expressions—the latter may or may not respect the known asymptotic behavior of the corresponding ab initio quantities; or

  • by inventing correction terms that have no counterpart in ab initio theory and are based on plausibility arguments.

Based on these levels of empiricism, we wish to comment on the level of empiricism in NOTCH:

  1. In NOTCH, the one-center integrals are derived from high-level calculations on atoms and ions. There is no fitting involved in this process. The parameters are taken directly from the twice similarity-transformed Hamiltonian of MR-EOM calculations in an ANO basis. This is manifestly less empirical than fitting to experimental data of free atoms and ions, let alone the experimental data of molecules. We therefore regard our current approach as nonempirical.

  2. In NOTCH, the “breathing” of basis functions due to the chemical environment is modeled by fitting them to basis functions optimized in an ab initio fashion for homodiatomic molecules. Importantly, the latter basis functions are not optimized to reproduce high-level data obtained either experimentally or computationally (like the case of most semiempirical methods, whose basis set exponents are treated as free parameters and fitted against experimental data) but are rather optimized to minimize the diatomic CASSCF energy computed using that basis set (like the case of any ab initio basis set whose basis functions are optimized variationally on atoms and possibly molecules). Thus, we regard this form of fitting also as nonempirical, even though molecular calculations are involved.

  3. In NOTCH, there are no element-specific empirical parameters that are derived from fitting to atomic or molecular reference data. There are a handful of semiempirical parameters that are fitted to molecular reference data, which is indeed an empirical element of the method. However, the total number of parameters determined in this way (8) is much smaller as for example, in PM79 (225) or GFN2-xTB16 (194) for the elements H–Ne.

  4. In NOTCH, physically motivated, semiempirical expressions are used to approximately incorporate well-defined physical phenomena that all have a counterpart in ab initio theory. This obviously is a form of empiricism, but here we stress the close ties to ab initio theory. Most of the NOTCH correction terms can be turned on or off with well-defined and physically meaningful changes in the results of the methods (Sec. III); this is in stark contrast with many other semiempirical methods or even some density functionals, where individual energy contributions do not have rigorous ties to any physical quantity, since only the sum of the contributions was constrained by the parameterization process. Our expressions reproduce most of the known asymptotic behaviors of their ab initio counterparts to the extent that semiempirical methods can possibly achieve.

Based on these criteria, we believe that NOTCH is significantly less empirical than practically all semiempirical methods in use, while at least up to this point in the development, NOTCH is characterized by a far more uniform accuracy across a large array of molecular properties, as will be shown in Sec. III. We are hopeful that this behavior will translate to larger molecules.

The NOTCH method has been implemented in a development version of ORCA.70 The program workflow of a NOTCH single point calculation closely resembles a minimal basis HF calculation and is given as follows:

  1. The ano-pc-0 basis set contraction coefficients of the molecule in question, cμAνAanopc0RA,RB, are generated on the fly, via the procedure detailed in Sec. II C.

  2. The zero-charge one-electron and two-electron integrals, hμνAB̄ and μν|κλAB̄, are calculated under the AO basis (Sec. II E), transformed to the OAO basis, and stored on disk. The transformation to the OAO basis is not necessary for diatomic calculations (since the Fock builds can be done under the AO basis instead) but may become essential for multiatomic molecules (see Sec. IV).

  3. A standard integral conventional HF calculation is then carried out. During each SCF iteration, the usual one-electron and two-electron components of the Fock matrix, FOAO, are assembled under the OAO basis using the pre-calculated, zero-charge integrals and the OAO basis density matrix DOAO. The contributions from the κλτhκλ̄DDμνσDκλτ, μνσκλτμν|κλ̄DDμνσδστμκ|νλ̄DDμνσDμνσDκλτ, and ELRCDDμνσ terms in Eq. (5) are then calculated under the OAO basis and added onto FOAO. The derivatives of the integrals with respect to the density matrix elements are given by the chain rule, e.g.,
    hκλOAŌDOAODμνσOAO=AhκλOAŌDOAOqÃqÃqAqADμνσOAO.
    (50)

    It is then elementary to derive the working formulas of the derivatives of the integrals, which are given in the supplementary material.

  4. The contribution of the charge dependence of integrals (Sec. II G) and the LRC correction (Sec. II H) to the energy are then calculated and added to the total energy [Eq. (1)].

  5. The final Fock matrix can be transformed back to the AO basis to yield the Roothaan–Hall equation (3), which is then solved in the standard way to give the molecular orbitals of the next SCF iteration. For large systems, however, it may be advantageous to directly diagonalize the Fock matrix under the OAO basis to save a few matrix operations.

In this section, we calculate a few properties of atoms and diatomic molecules using NOTCH and some other semiempirical methods (PM3,8 PM7,9 OM2,51,52 OM3,71 GFN-xTB,15 GFN2-xTB16), as well as two cost-effective ab initio methods (HF-3c43 and B3LYP-D360,72,73/def2-SVP62). All calculations at the NOTCH, PM3, GFN-xTB, GFN2-xTB, HF-3c, and B3LYP-D3/def2-SVP levels were conducted by a development version of ORCA,70 while the PM7 and OM2/OM3 calculations were performed with the MOPAC74 and MNDO75 programs, respectively. The reference values for all ground state properties were generated at the DKH2 CCSD(T)/ANO-RCC-Full (for atomic calculations) and DKH2 CCSD(T)/ANO-RCC-TZP (for molecular calculations) levels of theories. The reference excitation energies were generated at the DKH2 multireference average coupled-pair functional (MRACPF)76/ANO-RCC-TZP level of theory; herein MRACPF was used instead of the more rigorous MR-EOM-CCSD method due to the better numerical stability of the former.

As a first check, we report the atomic IPs and EAs calculated at the NOTCH level and compare them against the reference results. All IPs and EAs are calculated as differences of single-reference unrestricted SCF energies. As can be seen from Tables III and IV, the MADs of NOTCH atomic IPs and EAs are on the order of 0.4 eV and do not exhibit systematic bias. However, if the neutral atom integrals are used in the calculations of the atomic cations and anions, the MADs increase to 1.63 and 2.42 eV for IPs and EAs, respectively, with the IPs being systematically overestimated and the EAs being systematically underestimated; in particular, all the atomic monoanions are predicted to be unbound. This clearly illustrates the importance of introducing the charge dependence to the integrals. It is interesting to note that none of the other semiempirical methods outperform NOTCH for either the IPs or the EAs, and while the B3LYP-D3/def2-SVP IPs are more accurate than the NOTCH ones, the corresponding EAs are inferior to the NOTCH results. The latter shows that the charge dependence of NOTCH integrals introduces a radial flexibility in the basis set that goes beyond that of even def2-SVP. While the GFN-xTB and GFN2-xTB methods do effectively account for the charge dependence of integrals owing to an energy term that depends cubically on the atomic charges, in practice GFN-xTB and GFN2-xTB perform very poorly for the IPs and EAs of atoms. Indeed, it has been recommended to use a specially parameterized version of GFN-xTB to calculate the IPs and EAs of molecules; even then, an empirical shift on the order of 5 eV must be applied to the IPs and EAs.15 Also of note is the presence of several severe outliers in various semiempirical methods [e.g., the IPs of Ne (PM7), He, O (GFN-xTB), and the EAs of Li (PM7) and Be (GFN2-xTB)], which clearly represent corner cases that are not found in typical training sets. The nonempirical atomic integrals of NOTCH, however, ensure that NOTCH remains reliable even in these cases. It must be noted, however, that atoms (and many of the diatomic molecules that will be mentioned in the following sections) are not the target of most semiempirical methods.

TABLE III.

Atomic IPs (eV) of elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-Full results.

ElementNOTCHNOTCHaPM3PM7OM2OM3GFN-xTBGFN2-xTBHF-3cB3LYP-D3/def2-SVPCCSD(T)
13.61 13.61 13.07 11.07 12.65 12.46 17.32 17.10 12.68 13.55 13.61 
He 24.56 26.09 N.A.b 22.33 N.A.b N.A.b 43.09 34.27 24.59 24.78 24.56 
Li 5.42 5.40 5.13 4.80 N.A.b N.A.b 11.00 9.42 5.26 5.46 5.39 
Be 8.52 8.73 N.A.b 7.84 N.A.b N.A.b 14.09 N.A.c 7.74 9.00 9.31 
8.41 8.62 5.32 4.04 N.A.b N.A.b 13.54 N.A.c 6.88 8.73 8.26 
11.51 12.24 9.86 8.22 10.01 9.87 17.23 N.A.c 9.87 11.51 11.23 
14.90 16.41 13.28 14.09 13.63 13.42 19.56 18.91 13.79 14.58 14.52 
12.17 16.67 12.76 10.54 13.43 13.41 26.17 22.39 12.87 13.91 13.51 
17.99 20.91 16.39 17.25 17.06 17.29 28.58 24.84 16.90 17.44 17.34 
Ne 22.35 25.70 N.A.b 5.30 N.A.b N.A.b 24.76 31.86 21.47 21.35 21.53 
MD 0.02 1.51 −2.17 −3.38 −0.68 −0.75 7.61 6.90 −0.72 0.11  
MAD 0.45 1.63 2.17 3.38 0.68 0.75 7.61 6.90 0.72 0.22  
ElementNOTCHNOTCHaPM3PM7OM2OM3GFN-xTBGFN2-xTBHF-3cB3LYP-D3/def2-SVPCCSD(T)
13.61 13.61 13.07 11.07 12.65 12.46 17.32 17.10 12.68 13.55 13.61 
He 24.56 26.09 N.A.b 22.33 N.A.b N.A.b 43.09 34.27 24.59 24.78 24.56 
Li 5.42 5.40 5.13 4.80 N.A.b N.A.b 11.00 9.42 5.26 5.46 5.39 
Be 8.52 8.73 N.A.b 7.84 N.A.b N.A.b 14.09 N.A.c 7.74 9.00 9.31 
8.41 8.62 5.32 4.04 N.A.b N.A.b 13.54 N.A.c 6.88 8.73 8.26 
11.51 12.24 9.86 8.22 10.01 9.87 17.23 N.A.c 9.87 11.51 11.23 
14.90 16.41 13.28 14.09 13.63 13.42 19.56 18.91 13.79 14.58 14.52 
12.17 16.67 12.76 10.54 13.43 13.41 26.17 22.39 12.87 13.91 13.51 
17.99 20.91 16.39 17.25 17.06 17.29 28.58 24.84 16.90 17.44 17.34 
Ne 22.35 25.70 N.A.b 5.30 N.A.b N.A.b 24.76 31.86 21.47 21.35 21.53 
MD 0.02 1.51 −2.17 −3.38 −0.68 −0.75 7.61 6.90 −0.72 0.11  
MAD 0.45 1.63 2.17 3.38 0.68 0.75 7.61 6.90 0.72 0.22  
a

The charge dependence corrections aμAνACDI,1e(1), aμAνACDI,1e(2), aμAνAκAλACDI,2e(1), and aμAνAκAλACDI,2e(2) are set to zero.

b

The element is not parameterized.

c

The SCF calculation fails to converge.

TABLE IV.

Atomic EAs (eV) of elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-Full results. The EAs of He and Ne are omitted since their anions are either not calculable (due to the use of a minimal basis set) or unbound at all levels of theories.

ElementNOTCHNOTCHaPM3PM7OM2OM3GFN-xTBGFN2-xTBHF-3cB3LYP-D3/def2-SVPCCSD(T)
0.72 −2.32 −1.72 −3.08 −0.20 −0.39 4.53 5.91 −8.60 −0.42 0.72 
Li 0.27 −0.72 −2.17 −14.10 N.A.b N.A.b 5.41 2.75 −1.64 0.41 0.60 
Be −0.61 −1.79 N.A.b −3.95 N.A.b N.A.b 2.54 −14.87 −7.05 −0.82 −0.73 
0.32 −1.54 −1.89 2.39 N.A.b N.A.b 4.07 −1.93 −4.70 −0.30 0.14 
1.43 −0.74 1.70 −0.63 0.79 0.65 4.74 3.37 −4.53 0.38 1.24 
−1.27 −3.07 0.58 −0.03 −0.74 −0.95 6.30 5.86 −8.21 −1.71 −0.24 
1.87 −1.64 0.98 0.99 1.22 1.20 9.65 8.20 −6.23 −0.39 1.40 
4.03 −0.14 2.17 3.36 3.05 3.28 8.85 7.90 −4.78 1.21 3.37 
MD −0.06 −2.42 −0.86 −2.69 −0.47 −0.54 4.95 1.33 −6.53 −1.02  
MAD 0.38 2.42 1.36 3.31 0.47 0.54 4.95 6.31 6.53 1.02  
ElementNOTCHNOTCHaPM3PM7OM2OM3GFN-xTBGFN2-xTBHF-3cB3LYP-D3/def2-SVPCCSD(T)
0.72 −2.32 −1.72 −3.08 −0.20 −0.39 4.53 5.91 −8.60 −0.42 0.72 
Li 0.27 −0.72 −2.17 −14.10 N.A.b N.A.b 5.41 2.75 −1.64 0.41 0.60 
Be −0.61 −1.79 N.A.b −3.95 N.A.b N.A.b 2.54 −14.87 −7.05 −0.82 −0.73 
0.32 −1.54 −1.89 2.39 N.A.b N.A.b 4.07 −1.93 −4.70 −0.30 0.14 
1.43 −0.74 1.70 −0.63 0.79 0.65 4.74 3.37 −4.53 0.38 1.24 
−1.27 −3.07 0.58 −0.03 −0.74 −0.95 6.30 5.86 −8.21 −1.71 −0.24 
1.87 −1.64 0.98 0.99 1.22 1.20 9.65 8.20 −6.23 −0.39 1.40 
4.03 −0.14 2.17 3.36 3.05 3.28 8.85 7.90 −4.78 1.21 3.37 
MD −0.06 −2.42 −0.86 −2.69 −0.47 −0.54 4.95 1.33 −6.53 −1.02  
MAD 0.38 2.42 1.36 3.31 0.47 0.54 4.95 6.31 6.53 1.02  
a

The charge dependence corrections aμAνACDI,1e(1), aμAνACDI,1e(2), aμAνAκAλACDI,2e(1), and aμAνAκAλACDI,2e(2) are set to zero.

b

The element is not parameterized.

Encouraged by the success with atoms, we then move on to diatomic molecules. The first thing to look at is the ground state equilibrium bond lengths and vibrational frequencies of the molecules, which are of paramount importance since geometry optimization and molecular dynamics are perhaps the most common applications of semiempirical methods. As only 55 different diatomic molecules can possibly form between the elements H–Ne, we can do an exhaustive analysis of all these molecules to rule out most qualitative failures of NOTCH on the diatomic level. To the best of our knowledge, such a systematic analysis of diatomic molecules has never been pursued for semiempirical methods.

As can be seen from Fig. 3, the NOTCH equilibrium bond lengths agree very well with the reference values. The largest absolute deviations are on the order of 1 Å and occur in the molecules HeLi and LiNe, but since they are van der Waals complexes with very long interatomic distances [6.043 and 5.112 Å at the CCSD(T) level, respectively; see Table S2 for the complete set of equilibrium geometry benchmark results], such deviations should be tolerable for most purposes. The largest negative deviations are similarly found in van der Waals complexes (BeNe, BNe, CNe), whose equilibrium bond lengths are very long. The largest deviations (>0.05 Å) seen in covalent and ionic bonds are the multiply bonded molecules BeN, NO, and O2 (which have appreciable multireference character), and the lithium compounds LiH, LiBe, LiB, LiN, LiO, and LiF (which are highly ionic).

FIG. 3.

Errors of equilibrium bond lengths (Å) of all neutral, ground state diatomic molecules formed between the elements H–Ne (left), as well as excluding Be2 and all molecules containing He and/or Ne (right), computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. Important outliers are labeled. Method abbreviations: NOTCH a—NOTCH where the geometry dependence of the basis contraction coefficients are neglected, i.e., cμAνA1=cμAνA2=0; NOTCH b—NOTCH where αcorr is set to 0; NOTCH c—NOTCH where the dispersion correction μν|κλdisp is set to 0; NOTCH d—NOTCH where the charge dependence of integrals is neglected, i.e., fACDI is set to 0; NOTCH e—NOTCH where the basis set incompleteness correction is neglected, i.e., fABpol is set to 0; NOTCH f—NOTCH where the LRC correction ELRCD is set to 0; B3LYP—B3LYP-D3/def2-SVP.

FIG. 3.

Errors of equilibrium bond lengths (Å) of all neutral, ground state diatomic molecules formed between the elements H–Ne (left), as well as excluding Be2 and all molecules containing He and/or Ne (right), computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. Important outliers are labeled. Method abbreviations: NOTCH a—NOTCH where the geometry dependence of the basis contraction coefficients are neglected, i.e., cμAνA1=cμAνA2=0; NOTCH b—NOTCH where αcorr is set to 0; NOTCH c—NOTCH where the dispersion correction μν|κλdisp is set to 0; NOTCH d—NOTCH where the charge dependence of integrals is neglected, i.e., fACDI is set to 0; NOTCH e—NOTCH where the basis set incompleteness correction is neglected, i.e., fABpol is set to 0; NOTCH f—NOTCH where the LRC correction ELRCD is set to 0; B3LYP—B3LYP-D3/def2-SVP.

Close modal

To illustrate the effects of some key ingredients of NOTCH, we next perform benchmark studies where certain terms in NOTCH are turned off or certain parameters are set to zero. When the geometry dependence of the ano-pc-0 contraction coefficients is neglected (“NOTCH a”), we observe a systematic overestimation of bond lengths, especially for hydrogen-containing molecules; this is consistent with the well-known fact that the optimal hydrogen 1s exponents of minimal basis sets for molecular calculations are significantly larger than the free atom value 1.0 (typically around 1.242), while the optimal exponents of other elements differ significantly less from the corresponding free atom values. Meanwhile, the use of a scaled-down version of the overlap matrix elements in Eq. (27) appears essential for the correct description of a few non-covalent complexes containing He and Ne, as otherwise their bond lengths are severely underestimated and sometimes even end up at covalent bond-like values (e.g., the Li–Ne equilibrium bond length is 1.826 Å when αcorr is set to zero, as in “NOTCH b” in Fig. 3, compared to the reference value 5.112 Å). The dispersion correction Eq. (31) is of course essential for the qualitative description of most He- and Ne-containing molecules as well as Be2, which are under-bound or even unbound when the dispersion correction is turned off (“NOTCH c”); the effect is notable even for some weak covalent bonds, such as Li2 (Fig. 3). The charge dependence of integrals does not in general affect the bond lengths appreciably but is responsible for preventing the overbinding of some He- and Ne-containing molecules when the dependence is neglected (“NOTCH d”), which can undergo charge transfers on the 0.1e magnitude at compressed geometries and are thus susceptible to the treatment of the charge dependence of integrals. Neglecting the basis set incompleteness correction (“NOTCH e”) results in a systematic overestimation of bond lengths, especially for hydrogen-containing molecules and molecules containing the elements N, O and F. Finally, neglecting the LRC correction (“NOTCH f”) gives systematically underestimated bond lengths for hydrogen-containing molecules and F2, consistent with the well-known fact that HF tends to underestimate bond lengths due to the lack of left–right correlation. In general, neglecting the various terms in NOTCH results in either worse outliers, wider error spread or increased systematic error, which illustrates the importance of the terms (Fig. 3).

Subsequently, we test the accuracy of some other representative semiempirical methods, as well as HF-3c and B3LYP-D3/def2-SVP (Table S3). All methods, except for the OM2 and OM3 methods parameterized for the elements H, C, N, O, and F only, have severe outliers with qualitatively wrong bond lengths, as shown in Fig. 3 and Table S2. For example, the PM3 method systematically overestimates the bond lengths of Li- and B-containing molecules by as much as 0.452 and 0.210 Å, respectively. The PM7, GFN-xTB, and GFN2-xTB methods, as well as HF-3c and B3LYP-D3/def2-SVP, severely underestimate the bond lengths of several He- and Ne-containing molecules. In particular, the PM7 and GFN2-xTB methods predict that the bond lengths of some He- and Ne-containing molecules are even shorter than typical covalent bonds, for example, HeO [R(GFN2-xTB) = 0.808 Å], BNe [R(GFN2-xTB) = 1.213 Å], HNe [R(PM7) = 0.970 Å], and BeNe [R(PM7) = 0.743 Å]. Even the B3LYP-D3/def2-SVP method, which is widely used in production level geometry optimizations, fails dramatically for LiNe, predicting a bond length of 2.677 Å, which is even shorter than the Li–Li bond length obtained at the same level, 2.766 Å. The widespread failure of semiempirical and cost-effective ab initio methods (especially B3LYP-D3/def2-SVP) for these molecules can be partly attributed to basis set superposition error (BSSE), although the aforementioned failures of GFN2-xTB and PM7 for HeO, BNe, HNe, and BeNe must be primarily due to other deficiencies of the methods, given the magnitudes of the deviations. While the latter deficiencies presumably do not show up in non-covalent complexes in the training sets of the methods, they lead to spurious underestimations of energy near the typical interatomic distance of covalent bonds, a region that is scarcely probed by most He- and/or Ne-containing systems in typical training sets. By contrast, the use of less empirical and more physically grounded constructions in the NOTCH method helped us to avoid unprecedented failures of the method at extremely short bond distances, and the explicit consideration of the charge dependence of integrals (which reflects the radial flexibility of the basis set) makes the NOTCH AOs behave like a much more complete basis set than it would otherwise be (as evidenced by the failure of NOTCH for He- and Ne-containing molecules when fACDI is set to zero), therefore reducing the BSSE. Finally, even when non-covalent molecules are excluded from the statistics, NOTCH continues to have a narrower error spread than PM3, PM7, GFN-xTB, GFN2-xTB, and HF-3c (Fig. 3, right), which shows that NOTCH is systematically superior for covalently bonded molecules as well. The excellent performance of OM2 and OM3 can be explained by the fact that they are not parameterized for the elements He, Li, Be, B, and Ne, which are generally “difficult” elements for bond lengths, as evidenced by the outliers shown in Fig. 3.

The errors of harmonic vibrational frequencies (Fig. 4; for the complete set of data, see Table S4) generally mirror the trend of the equilibrium bond lengths, with the frequencies tending to be overestimated whenever the equilibrium bond length is underestimated, and vice versa. However, as NOTCH has a systematic tendency to overestimate vibrational frequencies, it may appear that some of the terms in NOTCH, such as the basis set incompleteness correction hμνpol, actually worsen the agreement of the vibrational frequencies with the reference values (cf. the “NOTCH e” result in Fig. 4, where hμνpol is set to 0); nevertheless, the effect is not very large (on the order of 100 cm−1) and may thus be viewed as a necessary compromise for obtaining satisfactory equilibrium bond lengths. Other terms generally improve the accuracy of the vibrational frequencies in most cases. Neglecting the geometry dependence of the AO contraction coefficients (“NOTCH a”) leads to too small vibrational frequencies for H2 and increases the overestimation of vibrational frequencies for period 2 elements. Setting αcorr to zero (“NOTCH b”) worsens the results of Li- and Be-containing molecules, again reflecting the role of Eq. (27) in preventing the overbinding of weakly bound molecules. Neglecting the charge dependence of integrals (“NOTCH d”) results in too large vibrational frequencies for highly polar molecules like HF and LiF. Finally, the LRC correction (“NOTCH f”) is found to play a major role in the correct prediction of vibrational frequencies of hydrogen-containing molecules as well as O2 and F2.

FIG. 4.

Errors of harmonic vibrational frequencies at equilibrium geometries (cm−1) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

FIG. 4.

Errors of harmonic vibrational frequencies at equilibrium geometries (cm−1) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

Close modal

Similar to the case of equilibrium geometries, the vibrational frequencies computed by other semiempirical methods and HF-3c (Table S5) invariably show severe outliers. In addition to the outliers that are already expectable from the bad equilibrium geometries (such as the HeO, BNe, HNe, and BeNe molecules mentioned above), there are a few more outliers, like the BN and O2 molecules at the PM3 level and the H2, HO, and F2 molecules at the OM2 level. One may notice that many of these molecules have significant multireference character, such that single reference methods are prone to yield erroneous results. However, the much smaller errors of NOTCH compared to other semiempirical methods as well as the fact that the B3LYP-D3/def2-SVP frequencies of these multireference molecules are not much worse than the single-reference ones suggest that the failure of the existing semiempirical methods is not an intrinsic problem of the single-reference theoretical framework but rather a problem of the functional forms and parameterization procedures of the methods.

The dipole moments are then benchmarked as a measure of the quality of the semiempirical ground state wavefunctions. We first note that there is some ambiguity in defining the NOTCH dipole moment of a molecule, since the classical polarization term Eq. (38) introduces a contribution to the dipole moment, in addition to the usual nuclear and electronic contributions,
Δμpol=AαAdef2QZVPPDqÃαAanopc0qÃFAqB.
(51)
We have thus computed the NOTCH dipole moment both including and excluding Δμpol. As shown in Fig. 5 and Table S6, when Δμpol is included (“NOTCH + p”), the dipole moments of most hydrogen-containing molecules are systematically overestimated, while those of Li-containing molecules are underestimated. When Δμpol is removed, the errors of H- and Li-containing molecules are noticeably reduced, but those of multiply bonded molecules between B, C, N, and O are somewhat increased; in particular, the dipole moment of CO (−0.355 D) is now predicted to have the wrong sign (Cδ+ − Oδ−), unlike when Δμpol is included (0.206 D, compared with the reference value 0.288 D). Whether one should include Δμpol in computing the NOTCH dipole moments is as yet unclear and is the subject of future research.
FIG. 5.

Errors of signed dipole moments (Debye) of all neutral, ground state heterodiatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results (evaluated on the linearized coupled cluster densities). The sign convention is chosen such that the signs of all DKH2 CCSD(T)/ANO-RCC-TZP dipole moments are positive. Important outliers are labeled. “NOTCH + p” stands for NOTCH dipole moments that include the contribution from Eq. (51); other results do not include the contribution from Eq. (51). For the meanings of the other method abbreviations, please refer to Fig. 3.

FIG. 5.

Errors of signed dipole moments (Debye) of all neutral, ground state heterodiatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results (evaluated on the linearized coupled cluster densities). The sign convention is chosen such that the signs of all DKH2 CCSD(T)/ANO-RCC-TZP dipole moments are positive. Important outliers are labeled. “NOTCH + p” stands for NOTCH dipole moments that include the contribution from Eq. (51); other results do not include the contribution from Eq. (51). For the meanings of the other method abbreviations, please refer to Fig. 3.

Close modal

Despite this, many things can still be said about the influence of the various terms on the dipole moment. The error spread of the dipole moments is worsened when the geometry dependence of the ano-pc-0 contraction coefficients is neglected (“NOTCH a”), suggesting that a correct description of the radial extent of the AOs is crucial for the accuracy of dipole moments (Fig. 5). Neglecting the charge dependence of integrals (“NOTCH d”) improves the dipole moments of some boron compounds but worsens the dipole moments of lithium compounds; as the boron compounds generally have strong multireference character, we believe that the former improvement is fortuitous. The basis set incompleteness correction hμνpol is likewise found to slightly deteriorate the dipole moment results for the molecules formed between the elements B, C, and N but improve the results of the hydrides of B, C, N, O, and F (Table S6). Finally, removing the LRC correction (“NOTCH f”) deteriorates the results of molecules like NO, NF, and OF but also improves the dipole moments of some other molecules, notably LiF (Table S6). Overall, many of the terms in NOTCH are not generally beneficial for the accuracy of dipole moments, unlike the case of geometries and vibrational frequencies.

Nevertheless, the NOTCH dipole moments still prove much more robust than the dipole moments calculated by many semiempirical methods and HF-3c (Table S7). The dipole moments of GFN-xTB, GFN2-xTB, and HF-3c have rather large outliers for uncommon bonds, especially Li- and Be-containing molecules; PM7 additionally performs exceptionally poorly for Ne-containing molecules, which can be traced to its strong underestimation of the IP of Ne (Table III). The PM3, OM2, and OM3 methods give accuracies that are similar to or better than NOTCH, but this may partly be attributed to the fact that beryllium is not parameterized in these methods, which seems to be a difficult case for methods that do parameterize it. Finally, B3LYP-D3/def2-SVP gives excellent predictions for most molecules, with the exception of LiBe, LiC, LiN, LiO, and LiF whose dipole moments are significantly underestimated as well as LiB and BeC whose dipole moments are overestimated (obviously owing to their multireference character). The former suggests that the errors of the NOTCH dipole moments of lithium-containing molecules (Table S6), while large in magnitude, are already remarkably small when compared with other methods. Overall, NOTCH has a smaller error range than all the studied semiempirical methods and HF-3c, except for OM2 and OM3, which are only parameterized for H, C, N, O, and F (Fig. 5).

Our next target is the IPs of diatomic molecules. As can be seen from Fig. 6 and Table S8, the errors of the NOTCH IPs generally lie within 1.5 eV, but there are a few positive outliers, namely a few molecules formed between B, C, N, and O, as well as a single negative outlier, F2. All these molecules have significant multireference character, and while one may expect that the LRC correction may play a significant role here, it is evident from Fig. 6 that the LRC correction overcorrects F2 and under-corrects most of the other multireference molecules. In general, however, the LRC correction proves to be beneficial for reducing the largest IP errors. The geometry dependence of the AO contraction coefficient also helps to reduce the systematic overestimation of the IPs, especially for molecules containing carbon and/or nitrogen. The parameter αcorr and the charge dependence of integrals do not contribute as much to the IPs although the former reduces the IPs of lithium-containing molecules by about 0.1–0.5 eV and thus helps to reduce the error.

FIG. 6.

Errors of vertical IPs (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The DKH2 CCSD(T)/ANO-RCC-TZP equilibrium geometries are used. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

FIG. 6.

Errors of vertical IPs (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The DKH2 CCSD(T)/ANO-RCC-TZP equilibrium geometries are used. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

Close modal

The performance of other methods is then briefly discussed (Table S9). The GFN-xTB and GFN2-xTB methods drastically overestimate all the IPs by about 2–10 eV with a large scatter, consistent with what is observed in atoms (Table III). The PM7 and HF-3c methods, on the other hand, systematically underestimate the IPs; while the former is well expected from the large negative mean deviation (MD) of the atomic IPs of PM7 (Table III), the latter is better explained by the tendency of HF-3c to significantly underestimate atomic EAs (Table IV), which make the molecular IPs too low when one of the atoms in the molecules carries significant negative charge (e.g., the molecules LiN, LiO and LiF in Table S9). The PM3, OM2, and OM3 methods perform reasonably well and, in particular, are free from the systematic overestimation of the IPs in B-, C-, N-, and O-containing molecules as found in NOTCH. Whether the latter is physically meaningful or due to error cancellation is unclear at the moment. Finally, we note that the B3LYP-D3/def2-SVP method, while performing excellently for all covalently bonded molecules, underestimates the IPs of He2 and Ne2 by large margins, despite NOTCH performing very well for these systems; this is clearly due to the self-interaction error of B3LYP, which spuriously stabilizes stretched radical ions like He2+ and Ne2+ and illustrates the importance of using 100% HF exchange in the NOTCH method.

The EAs of diatomic molecules computed under different levels of theories are summarized in Fig. 7 and Table S10; note that molecules whose anions are unbound at the reference level of theory are excluded from the analysis. The accuracy of the NOTCH EAs is similar to that of the IPs; there are a few outliers (Li2, F2, and boron-containing molecules) that can probably be attributed to some problems of the LRC correction. A few other contributions, such as the geometry dependence of the AO contraction coefficients, the parameter αcorr, and the charge dependence of integrals, prove notably beneficial for the accuracy of the EAs, indicating the crucial role of the radial flexibility of the AOs (and to a lesser extent, the correct treatment of correlation) in the correct prediction of the EAs.

FIG. 7.

Errors of vertical EAs (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The DKH2 CCSD(T)/ANO-RCC-TZP equilibrium geometries are used. Molecules whose DKH2 CCSD(T)/ANO-RCC-TZP EAs are negative are omitted. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

FIG. 7.

Errors of vertical EAs (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The DKH2 CCSD(T)/ANO-RCC-TZP equilibrium geometries are used. Molecules whose DKH2 CCSD(T)/ANO-RCC-TZP EAs are negative are omitted. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

Close modal

Most of the other methods studied herein (Table S11) have difficulties predicting the EAs of some of the diatomic molecules. For example, PM3 predicts much too low EAs for LiB, B2, C2, and F2, while PM7 gives systematically too low EAs for most hydrogen- and lithium-containing molecules, except for Li2 where the EA is significantly overestimated. The OM2 and OM3 methods perform well, with just a single outlier, F2; however this may be partially due to the fact that the more difficult elements Li, Be, and B are not parameterized in these methods. The EAs are systematically overestimated at the GFN-xTB and GFN2-xTB levels, except for some beryllium-containing molecules, whose EAs are predicted to be too negative at the GFN2-xTB level, in accord with the drastically underestimated EA of the beryllium atom (Table IV). The HF-3c and, to a much less extent, the B3LYP-D3/def2-SVP methods predict systematically too low EAs, which can be obviously attributed to the use of a too small basis set and the lack of corrections for the lack of diffuse functions.

The final ground state property to look at is the bond dissociation energy. As shown in Fig. 8 and Table S12, NOTCH gives bond dissociation energies that are usually accurate to about 1 eV; the worse outliers (BeF, F2, BF, N2) generally possess some multireference character. The terms that affect the NOTCH bond dissociation energies the most are the scaled overlap matrix elements [Eq. (27)] and the LRC correction [Eq. (43)]. When αcorr is set to zero (“NOTCH b”), the bond dissociation energies are overestimated by about 1 eV on average and feature a broader error distribution due to overestimation of dynamic correlation. Interestingly, removing the LRC correction (“NOTCH f”) actually improves the bond dissociation energies (the MAD is reduced from 0.813 to 0.485 eV; see Table S12). The latter suggests that the LRC correction may still have room for improvement.

FIG. 8.

Errors of bond dissociation energies (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The equilibrium geometries optimized at the respective levels of theory are used. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

FIG. 8.

Errors of bond dissociation energies (eV) of all neutral, ground state diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical and cost-effective ab initio methods, compared with DKH2 CCSD(T)/ANO-RCC-TZP results. The equilibrium geometries optimized at the respective levels of theory are used. Important outliers are labeled. For the meanings of the method abbreviations, please refer to Fig. 3.

Close modal

The bond dissociation energies of other semiempirical methods and HF-3c (Table S13) have broader error distributions than NOTCH does, again with the exception of OM2 and OM3, which are parameterized for H, C, N, O, and F only. Naturally, the biggest outliers are typically molecules possessing multireference character, like BO, C2, and N2. The PM3 and HF-3c methods, bearing 100% HF exchange, tend to underbind these multiply bonded, multireferential molecules, while the GFN-xTB and GFN2-xTB methods, which do not have HF exchange, typically overbind them. The PM7 method manages to achieve less systematic bias throughout the set of molecules studied, probably due to it being more extensively parameterized; however, like what we have seen in Sec. III B, this comes at a price that some non-covalent complexes (HeB, BeNe) are predicted to be extremely stable, covalently bonded molecules, with dissociation energies (9.246 and 8.594 eV, respectively) even exceeding that of N2 at the same level of theory (8.369 eV). B3LYP-D3/def2-SVP performs excellently for the bond dissociation energies, as expected.

Finally, we examine the excitation energies of the diatomic molecules. Since the charge dependence of integrals (Sec. II F), the basis set incompleteness correction (Sec. II G) and the LRC correction (Sec. II H) introduce additional response contributions, a straightforward configuration interaction singles (CISs) or RPA calculation using NOTCH integrals derived from ground state atomic charges is not appropriate due to neglecting, e.g., the change of the atomic charges during excitation. Therefore, we calculate the NOTCH excitation energies via the second-order response of the NOTCH energy, giving the Casida equation,77 
ABBAXY=ωI00IXY,
(52)
where the orbital Hessians A and B are defined as
Aiaσjbτ=δijδabδστϵaσϵiσ+Kiaσjbτ,
(53)
Biaσjbτ=Kiaσbjτ,
(54)
Kpqσrsτ=2ENOTCHDpqσDrsτ.
(55)
The working expressions of the second-order derivatives 2ENOTCHDpqσDrsτ are given in the supplementary material. Note that all the molecular excitation energies in this section are calculated in a single-reference manner since it is nontrivial to extend Eq. (55) to multireference calculations.

The excitation energy benchmark results of NOTCH, INDO/S, PM3, and OM2 are listed in Table V. While NOTCH performs very well for ππ* excitations (e.g., N2 and CO) and some σπ* excitations [e.g., Li2 (3Πu), HN (3Π), BC (4Π), BF, LiC (4Π), and BeF (2Π)], giving an error of within 1 eV in most cases, it systematically overestimates the energies of excitations into σ* orbitals [e.g., H2 (σσ*), F2 (π* → σ*), HN (5Σ, σσ*), HF (πσ*), LiF (π* → σ*), BeF (2Σ+, σσ*), BO (2Π, πσ*, and 2Σ+, σσ*), and CN (2Π, πσ*, and 2Σ+, σσ*)]. Neglecting the geometry dependence of the ano-pc-0 basis set improves the excitation energies of hydrogen-containing molecules by as much as 5 eV, suggesting that optimizing the AO basis set for the ground state leads to H 1s orbitals that are too compact for expanding the σ* orbital. However, the fact that a sizable error still remains even if the (very diffuse) free atom AOs are used and that the excitation energies of F2, LiF, BeF, and BO are not much affected by this change suggests that there is probably no minimal basis set that can provide a reasonable compromise between describing the ground state properties and the σ* orbital. Turning off the charge dependence of the integrals does not in general lead to appreciable changes of the excitation energies but changes the excitation energies of highly polar molecules (e.g., HF, LiF) by up to 1.6 eV; while the changes do not always improve the results, it can be said that the effect of the charge dependence of integrals on the excitation energies is large. The LRC correction is occasionally useful in curing the underestimation of triplet excitation energies (N2), which in extreme cases can lead to triplet instabilities (O2, F2), as already discussed in Sec. II H.

TABLE V.

The lowest two vertical excitation energies (eV) of neutral, diatomic molecules formed between the elements H–Ne, computed by NOTCH as well as other semiempirical methods via RPA, compared with DKH2 MRACPF/ANO-RCC-TZP results. Only those molecules that do not contain rare gas atoms, whose ground states are spatially nondegenerate, and whose ground state NOTCH wavefunctions do not have instabilities are listed. The DKH2 CCSD(T)/ANO-RCC-TZP equilibrium geometries are used.

MoleculeStateNOTCHNOTCH aNOTCH bNOTCH cINDO/SPM3OM2MRACPF
H2 3Σu+ 18.305 15.458 18.305 15.782 13.700 6.926 5.066 10.585 
1Σu+ 22.371 17.463 24.670 21.518 18.317 10.006 7.236 12.732 
Li2 3Σu+ 1.432 1.600 1.432 1.212 d d e 1.304 
3Πu 1.833 1.153 1.833 1.041 d d 1.384 
N2 3Σu+ 7.613 7.820 7.613 6.219 3.266 5.261 5.951 7.569 
3Πg 7.940 7.220 7.940 7.425 4.979 5.615 6.500 7.935 
O2 3Δu 6.503 6.673 6.503 d 2.311 4.641 6.009f 6.145 
3Σu+ 4.045 3.171 5.757 d 2.311 4.694 6.053f 6.292 
F2 3Πu 6.178 6.134 6.178 d 7.978 3.916 2.298 3.152 
1Πu 8.405 8.353 8.405 5.641 9.171 4.206 3.300 4.347 
HLi 3Σ+ 5.394 5.501 5.403 3.936 4.937 3.297 e 3.263 
1Σ+ 5.667 5.585 5.740 4.814 5.426 4.863 3.629 
HN 3Π 4.207 3.632 4.208 3.968 3.976 2.876 3.959f 3.701 
5Σ- 17.950 13.783 17.912 16.182 12.266 6.207 7.137f 8.416 
HF 3Π 16.357 15.197 17.324 19.024 14.519 8.266 9.667 10.200 
1Π 17.133 16.320 18.790 20.258 15.737 8.632 10.355 10.569 
LiC 4Π 4.445 4.763 4.311 4.462 3.442 4.102 e 3.254 
4Σ- 5.156 5.490 5.041 5.172 3.718 4.293 3.778 
LiF 3Π 8.570 8.956 7.557 9.381 9.502 7.854 e 6.734 
1Π 8.790 9.100 7.809 9.542 9.568 7.987 6.778 
BeF 2Π 5.508 5.335 5.540 5.179 3.888 e e 4.119 
2Σ+ 11.929 12.632 12.761 12.878 10.416 8.218 
BC 4Π 0.360 0.316 d 0.577 2.437 3.997 e 1.509 
4Σ- 4.430 4.541 d 4.110 2.566 4.419 2.260 
BO 2Π 8.172 8.003 7.684 8.131 3.339 2.981 e 3.496 
2Σ+ 11.040 9.395 8.181 8.699 4.777 6.366 5.561 
BF 3Π 2.638 2.521 2.464 2.134 3.216 5.073 e 3.652 
1Π 6.892 6.663 6.825 6.312 6.102 7.383 6.521 
CN 2Π 4.596 4.649 4.295 4.696 0.886 1.392 2.209f 1.329 
2Σ+ 5.800 5.665 5.454 5.597 1.962 2.170 4.541f 3.195 
CO 3Π 5.789 5.269 5.965 5.389 4.975 4.771 5.040 6.191 
3Σ+ 8.808 8.388 8.874 8.400 6.053 6.626 7.165 8.326 
NF 3Π 8.664 8.631 8.668 6.996 8.247 5.004 3.803f 7.107 
3Π 10.564 10.396 10.586 9.358 10.793 5.085 6.031f 7.264 
MDg  2.440 1.919 2.603 2.229 0.842 −0.800 −1.388  
MADg  2.723 2.354 2.724 2.608 2.238 1.672 1.649  
MDg,h  1.450 1.299 1.348 1.005 0.200 −0.101 −0.711  
MADg,h  1.807 1.845 1.504 1.494 1.804 1.228 1.125  
MoleculeStateNOTCHNOTCH aNOTCH bNOTCH cINDO/SPM3OM2MRACPF
H2 3Σu+ 18.305 15.458 18.305 15.782 13.700 6.926 5.066 10.585 
1Σu+ 22.371 17.463 24.670 21.518 18.317 10.006 7.236 12.732 
Li2 3Σu+ 1.432 1.600 1.432 1.212 d d e 1.304 
3Πu 1.833 1.153 1.833 1.041 d d 1.384 
N2 3Σu+ 7.613 7.820 7.613 6.219 3.266 5.261 5.951 7.569 
3Πg 7.940 7.220 7.940 7.425 4.979 5.615 6.500 7.935 
O2 3Δu 6.503 6.673 6.503 d 2.311 4.641 6.009f 6.145 
3Σu+ 4.045 3.171 5.757 d 2.311 4.694 6.053f 6.292 
F2 3Πu 6.178 6.134 6.178 d 7.978 3.916 2.298 3.152 
1Πu 8.405 8.353 8.405 5.641 9.171 4.206 3.300 4.347 
HLi 3Σ+ 5.394 5.501 5.403 3.936 4.937 3.297 e 3.263 
1Σ+ 5.667 5.585 5.740 4.814 5.426 4.863 3.629 
HN 3Π 4.207 3.632 4.208 3.968 3.976 2.876 3.959f 3.701 
5Σ- 17.950 13.783 17.912 16.182 12.266 6.207 7.137f 8.416 
HF 3Π 16.357 15.197 17.324 19.024 14.519 8.266 9.667 10.200 
1Π 17.133 16.320 18.790 20.258 15.737 8.632 10.355 10.569 
LiC 4Π 4.445 4.763 4.311 4.462 3.442 4.102 e 3.254 
4Σ- 5.156 5.490 5.041 5.172 3.718 4.293 3.778 
LiF 3Π 8.570 8.956 7.557 9.381 9.502 7.854 e 6.734 
1Π 8.790 9.100 7.809 9.542 9.568 7.987 6.778 
BeF 2Π 5.508 5.335 5.540 5.179 3.888 e e 4.119 
2Σ+ 11.929 12.632 12.761 12.878 10.416 8.218 
BC 4Π 0.360 0.316 d 0.577 2.437 3.997 e 1.509 
4Σ- 4.430 4.541 d 4.110 2.566 4.419 2.260 
BO 2Π 8.172 8.003 7.684 8.131 3.339 2.981 e 3.496 
2Σ+ 11.040 9.395 8.181 8.699 4.777 6.366 5.561 
BF 3Π 2.638 2.521 2.464 2.134 3.216 5.073 e 3.652 
1Π 6.892 6.663 6.825 6.312 6.102 7.383 6.521 
CN 2Π 4.596 4.649 4.295 4.696 0.886 1.392 2.209f 1.329 
2Σ+ 5.800 5.665 5.454 5.597 1.962 2.170 4.541f 3.195 
CO 3Π 5.789 5.269 5.965 5.389 4.975 4.771 5.040 6.191 
3Σ+ 8.808 8.388 8.874 8.400 6.053 6.626 7.165 8.326 
NF 3Π 8.664 8.631 8.668 6.996 8.247 5.004 3.803f 7.107 
3Π 10.564 10.396 10.586 9.358 10.793 5.085 6.031f 7.264 
MDg  2.440 1.919 2.603 2.229 0.842 −0.800 −1.388  
MADg  2.723 2.354 2.724 2.608 2.238 1.672 1.649  
MDg,h  1.450 1.299 1.348 1.005 0.200 −0.101 −0.711  
MADg,h  1.807 1.845 1.504 1.494 1.804 1.228 1.125  
a

The geometry dependence of the AO contraction coefficients is neglected, i.e., cμAνA1=cμAνA2=0.

b

The charge dependence of integrals is neglected, i.e., fACDI=0.

c

The LRC correction ELRCD is neglected.

d

The excitation energy cannot be calculated because the molecule has a triplet instability.

e

The molecule contains an unparameterized element.

f

A valence space CASSCF calculation is performed instead of RPA since the MNDO program does not support RPA calculations from an unrestricted reference.

g

Molecules that possess triplet instabilities or unparameterized elements are not included in the statistics.

h

States whose excitation energies are above 8 eV are excluded in the statistics.

By comparison, the other semiempirical methods INDO/S, PM3 and (to a lesser extent) OM2 generally underestimate the ππ* excitation energies of N2, O2, and CO, which is a well-known problem of the ZDO formalism itself. In ZDO methods, the bonding and antibonding MOs formed between two AOs are split symmetrically, while in reality the virtual orbitals are higher than what would be expected from a symmetric splitting;51 while this is partially remedied by the orthogonalization corrections in the OMx methods,54 the corrections are only carried out to low order. Thus, while these methods generally perform better than NOTCH for excitations involving σ* orbitals, it is reasonable to believe that the better performance is due to a fortunate cancellation of the negative errors due to the ZDO formalism and the positive errors due to using a minimal basis set. In some cases, the INDO/S, PM3, and OM2 methods significantly underestimate the excitation energies (e.g., H2 at the OM2 level, and NF at the PM3 and OM2 levels), sometimes even resulting in triplet instabilities (e.g., Li2 at the INDO/S and PM3 levels), when one would predict an overestimation of excitation energies due to the use of a minimal basis set. Consequently, despite that the largest outliers occur in NOTCH instead of the other semiempirical methods, we believe that NOTCH should probably be more robust than the other methods for realistically sized organic and organometallic molecules, where the virtual orbitals are less diffuse and much easier to describe by a minimal basis than in diatomic molecules. Moreover, if one desires, it should be relatively easy to improve NOTCH for excited states involving σ* orbitals, since the cause of the overestimation of the excitation energies is well understood.

The present work describes the preliminary formulation and benchmark results of a new semiempirical method, NOTCH. The method can be considered as a parameterized all-electron HF/minimal basis set calculation, where (contrary to most ZDO methods) all two-center two-electron integrals are retained and all scalar relativity, correlation (local dynamic correlation, dispersion, left–right correlation) and basis set incompleteness (the lack of radial and angular flexibility of the basis set) effects are accounted for by individual, physically motivated corrections.

In our opinion, the theory is based on well-designed and physically appealing approximations and only introduce eight semiempirical parameters which are all global rather than element or even element pair specific. This level of empiricism is even lower than that of many density functionals. Preliminary equilibrium geometry, vibrational frequency, dipole moment, IP, EA, bond dissociation energy, and excitation energy calculations on the atoms and diatomic molecules of the elements H–Ne show that NOTCH is exceptionally robust, with a lack of serious and practically relevant outliers that plague almost all the other semiempirical methods as well as a few cost-effective ab initio methods (e.g., HF-3c), such as the formation of unreasonably short covalent bonds in non-covalent complexes or IPs and EAs that are in error in excess of 5 eV. Meanwhile, the accuracy of NOTCH for the more “conventional” systems, where all methods perform reasonably well, is at least comparable with (and sometimes better than) other semiempirical methods. Moreover, in contrast to other semiempirical methods, NOTCH is balanced in the sense that the quality of the results is consistent across many different properties including the ground state as well as excited states. This feature makes us optimistic that in NOTCH one obtains the right answer for the right reasons. Taken together, the numerical results provide support for the hope that, once fully developed for polyatomic molecules, NOTCH will be useful as a robust, reliable, general-purpose semiempirical method.

While multiple ideas have contributed to the development of NOTCH, we believe that one of the most important ingredients is the parameterization of the one-center one-electron and two-electron integrals on the basis of MR-EOM reference calculations on free atoms and ions, without involving any fitting. The leading idea to approach the design of a semiempirical method from this angle is rooted in the hypothesis that, starting from a good description of the atoms and their excitation and ionization events would be a solid basis for embedding these atoms in a reasonable molecular potential, thus leading to systematically correct electronic structure descriptions. The MR-EOM construction appeared to be particularly promising in this respect given its excellent accuracy relative to experimental data and the fact that the sequence of similarity transformations leads to a form of the theory that can be expressed in terms of “dressed” integrals that can be directly taken as a basis for the parameterization. Thus, MR-EOM does not only provide reliable integrals for atomic calculations, but it also helped us tremendously to avoid large-scale fitting procedures that might have led to overfitting and fortuitously good-looking molecular results.

Insisting on not further manipulating the atomic integrals, we were then forced to identify the real physical reasons for the discrepancies of the calculated results with the reference data, and design physically motivated corrections whose functional forms are genuinely original and have never been proposed or used in existing semiempirical methods. In particular, the geometry dependence of the basis set (Sec. II C), the basis set incompleteness correction (Sec. II G), and the left–right correlation correction (Sec. II H) describe effects that most semiempirical methods do not explicitly consider. In our opinion, this is far more physically satisfying than trying to absorb all of these effects into a large number of parameters that were originally designed to capture other effects. For example, modifying the parameters that define the core–core repulsion potential to absorb, e.g., correlation errors and basis set limitations is particularly dissatisfying to us. While many semiempirical methods incorporate more and more flexible forms of the core–core repulsion potential as they are developed, we have taken the opposite route, i.e., to describe the core–core repulsion exactly using an all-electron approach. Consequently, there is no excuse for adding corrections to the core–core repulsion energy (because it is already exact), and we were forced to develop our other approximations in a way that retains the balance with this “electrostatic cornerstone.” This is clearly a more difficult task, but to us it appears to be more physically appealing than the traditional alternatives.

The obvious next step in the development is the extension of NOTCH to multiatomic molecules. In multiatomic molecules, there are a few issues that are not present in diatomic molecules and that must be properly taken care of:

  1. We would like to omit, to the largest extent possible, three- and four-center two-electron integrals in order to keep the computational performance of NOTCH close to existing semiempirical methods. While simply neglecting these integrals is an unacceptably crude approximation under the AO basis, it is far better justified in the OAO basis.78 If all surviving two-electron integrals can be conveniently derived from one-center and two-center two-electron integrals, then for sufficiently large systems, NOTCH will be at most, say, 2–5 times more expensive than existing semiempirical methods. The additional cost stems partly from the inclusion of the (AA|AB)- and (AB|AB)-type integrals and partly because of the explicit treatment of core electrons.

  2. The CAB6 and CAB8 coefficients of a pair of atoms depend on the chemical environments of the atoms. It remains to be seen if ideas from established dispersion correction methods, such as expressing CAB6 and CAB8 as functions of the atomic coordination numbers (as used in the DFT-D360 and DFT-D461 methods), can give useful accuracy in conjunction with our newly proposed dispersion correction formalism, Eq. (31).

Of course, analytic gradients and perhaps also analytic Hessians must be implemented such that geometry optimization, frequency analysis, and molecular dynamics simulations of realistically sized molecules become feasible.

Our hope is to be able to extend NOTCH to heavier elements as well. In particular, open shell transition metals appear to be a major challenge for most existing semiempirical methods. Since NOTCH is based on a formulation in which all terms are in balance by construction, and the exchange is treated in a detailed and physically correct fashion without introducing any self-interaction error, we believe that it holds significant promise in the area. A particularly exciting possibility is to combine NOTCH with a minimal active space CASSCF or MR-CIS approach, which is essential for correctly describing the spin multiplet structure of transition metal complexes. When active spaces spanning more than one atom have to be used, one can turn off the contribution of these atoms to the LRC correction in the multireference NOTCH calculation, to avoid double-counting the non-dynamic correlation between these atoms while still retaining the dynamic correlation contribution. This is a key advantage compared to most existing semiempirical methods, which do not separate the non-dynamic and dynamic contributions to correlation.

Eventually, we believe that NOTCH will become a robust, accurate, general-purpose semiempirical method, which offers intermediate cost and accuracy between existing semiempirical methods and DFT methods that allows for a very wide range of applications in (bio)chemistry, material science, and spectroscopy/photochemistry. Our efforts along this way will be reported in due course.

This paper is dedicated to the memory of Professor Walter Thiel. Walter was a wonderful and kind friend and mentor. One of us (FN) had the privilege to discuss the very early stages of this project with him shortly before his untimely passing. We deeply regret that he could not see this work to fruition, but we hope that this work may help to reinvigorate interest in semiempirical quantum chemistry, a subject that always remained close to Walter’s heart.

See the supplementary material for further computational details of the benchmark studies, derivations of the NOTCH Fock matrix elements and orbital Hessian matrix elements, raw benchmark data, and lists of the nonempirical NOTCH parameters.

The authors gratefully acknowledge generous financial support of this work by the Max Planck society.

The authors have no conflicts to disclose.

Zikuan Wang: Formal analysis (lead); Investigation (lead); Methodology (equal); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). Frank Neese: Conceptualization (lead); Funding acquisition (lead); Methodology (equal); Software (supporting); Supervision (lead); Writing – review & editing (equal).

The data that support the findings of this study are available within the article and its supplementary material.

As mentioned in the Introduction, the AOs in molecules tend to be more compact than those in the corresponding free atoms. Two possibly predominant reasons are (a) when the length of a bond AB is sufficiently short, the AOs of atom A can enjoy more nuclear attraction from atom B by shrinking and (b) the AOs of atom A may overlap more favorably with the AOs of atom B after shrinking, resulting in stronger bonding. While (a) does not necessarily dominate over (b), it is much easier to derive a formula for the contribution (a) than for (b). We have therefore decided to use a functional form that is consistent with (a) and contains a few adjustable parameters and fit the parameters against optimized molecular basis set contraction coefficients so that the effects of contribution (b) are effectively accounted for.

Assuming that the density of atom A shrinks uniformly, the new atomic density can be written in terms of the free atom density ρAPModelrRA as
ρArRA=λ3ρAPModelλrRA
(A1)
for some scale factor λ. We can then find the optimal scale factor λopt by variationally optimizing λ to minimize the total energy,
λopt=argminλEλ.
(A2)
By definition, λopt is 1 if A is a free atom. Thus, the deviation of λopt from 1 in a molecular environment, which measures the extent of shrinking of the AOs, should be proportional to the derivative of Eλ with respect to λ,
λopt1Eλλλ=1.
(A3)
If we only consider the electrostatic interactions between atoms, then Eλ can be given as
Eλ=Eλ,freeatom+BAρArRAVBPModelrRBdr.
(A4)
Taking the derivative with respect to λ yields
Eλλλ=1=BA3ρAPModelrRAVBPModelrRBdr+ρAPModelrrr=rRAVBPModelrRBdr.
(A5)
Here, the condition Eλ,freeatomλλ=1=0 is used, since Eλ is minimized at λ = 1 for a free atom. Finally, we assume that the desired contraction coefficients cμAνAanopc0RA,RB are linear functions of the scale factor λopt, which immediately leads to Eq. (7). Note that while Eq. (A5) implies that cμAνA1=3cμAνA2, in reality the fitted values of cμAνA1 and cμAνA2 usually do not satisfy this condition, probably due to the influence of the contribution (b) mentioned above. In addition, as we are scaling the contraction coefficients rather than the exponents, the AOs actually do not shrink uniformly as would be implied by Eq. (A1).

The dispersion correction parameters of existing semiempirical methods are usually fitted against non-covalent complexation energies or other properties related to the total energy, such as the equilibrium geometries. In these approaches, the dispersion correction parameters will absorb some of the errors of the intermolecular repulsion term and other inaccuracies of the non-dispersion contribution to the interaction energy. While this obviously helps to improve the accuracy of the method on the training set, it has a few drawbacks: (1) It will hide the fundamental deficiencies of the non-dispersion terms, which may be masked by error compensation in the training set but show up in systems not seen in the training set; and (2) one cannot easily extract the dispersion contribution from the total energy, since only the total energy but not its components are meaningful. Therefore, in NOTCH we would like to fit the dispersion correction parameters to reproduce the dispersion energy itself rather than the total energy.

The LED method68,69 is a robust and accurate approach for decomposing the DLPNO coupled cluster correlation energy into contributions from dispersion and various non-dispersion contributions. The method takes advantage of the fact that the DLPNO coupled cluster correlation energy can be expressed as a sum of contributions from occupied localized MO (LMO) pairs,
Ecorr=ijεij.
(B1)
Each of the pair energies ɛij can be further decomposed according to which virtual orbitals the electrons in LMOs i and j excite into
εij=ãijb̃ijiãij|jb̃ijτ̃ãijb̃ijij,
(B2)
where τ̃ãijb̃ijij are contravariant coupled cluster amplitudes. In DLPNO methods, pair natural orbitals (PNOs) are used as the virtual orbitals and the subscripts of ãij and b̃ij stress that different occupied orbital pairs ij have different sets of PNOs.
It is well-known that the correlation energy can be decomposed into a sum of contributions from all double excitations, and since any given LMO or PNO of the molecule can often be unambiguously assigned to a single molecular fragment as long as the fragments are not very strongly interacting, the sum can be regrouped according to in which fragments the occupied and virtual orbitals involved in the double excitations are localized. For example, the dispersion energy between two fragments X and Y can be defined as
EXYdisp=iX,jYãijX,b̃ijYiãij|jb̃ijτ̃ãijb̃ijij+ãijY,b̃ijXiãij|jb̃ijτ̃ãijb̃ijij,
(B3)
i.e., the contribution of double excitations where one occupied orbital and one virtual orbital are localized on X, and the other two orbitals are localized on Y. Note that the energy partition scheme is more complicated in real DLPNO-CCSD(T) calculations, since (1) the counterpoise correction is applied, (2) some occupied LMO pairs (the so-called “weak pairs”) are treated perturbatively instead of at the coupled cluster level, and (3) the total correlation energy contains contributions from the perturbative triples correction (T). The interested reader is referred to the original references for the details of the DLPNO-CCSD(T) LED approach.68,69

The DLPNO-CCSD(T) LED dispersion energy EXYdisp represents an accurate metric of the dispersion contribution to the total interaction energy that is not contaminated by non-dispersive interactions. However, its physical meaning is different from the contribution of the NOTCH dispersion correction to the total energy, Edispcorr [Eq. (32)], because the local dynamic correlation correction (Sec. II E) also describes some of the short-range dispersion. Thus, a naïve approach of fitting Edispcorr against the reference EXYdisp data would result in a drastic overestimate of the magnitude of dispersion. Instead, we have to define a NOTCH dispersion energy that is consistent with the physical picture of Eq. (B3). Herein, we have chosen the following approach:

  1. Perform a usual NOTCH calculation of the dimer XY.

  2. Perform another NOTCH calculation of XY, but all OAO-basis NOTCH two-electron integrals μν|κλOAŌ where μ, νX and κ, λY are replaced by their HF values μν|κλOAO. This roughly amounts to selectively turning off the correlation corrections that are responsible for dispersion.

  3. The NOTCH dispersion energy Edisp between the fragments X and Y is defined as the difference between the aforementioned two energies.

There are a few possible variations of this approach, for example, (a) instead of doing a full SCF calculation in step (2), one may also do a single SCF energy evaluation on the converged wavefunction of step (1); and/or (b) one can turn off the correlation under the LMO basis, instead of the OAO basis. The differences between these variations are expected to be small when the interfragment distance is not much shorter than the van der Waals distance.

Thus, we fitted the NOTCH dispersion parameters s8, a1, and a2, as well as the parameter αcorr [which controls the long-range behavior of the local dynamic correlation correction, Eq. (27)], against the diatomic LED dispersion energies at the DKH2 DLPNO-CCSD(T)/ANO-RCC-Full level of theory. When both atoms are open shell, we align their spins in parallel to each other, so as to minimize bonding interactions between the atoms. The resolution of the identity (RI) auxiliary basis sets were generated by the AutoAux method.79 All the core electrons were included in the correlation treatment and the TightPNO setting was enforced to obtain more tightly converged dispersion energies. The sum of strong pair and weak pair contributions to the dispersion energy was used as the reference data EXYdisp in the fitting.

1.
W.
Thiel
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
4
(
2
),
145
157
(
2014
).
2.
N. D.
Yilmazer
and
M.
Korth
,
Comput. Struct. Biotechnol. J.
13
,
169
175
(
2015
).
3.
K.
Eichkorn
,
O.
Treutler
,
H.
Öhm
,
M.
Häser
, and
R.
Ahlrichs
,
Chem. Phys. Lett.
240
(
4
),
283
290
(
1995
).
4.
F.
Neese
,
F.
Wennmohs
,
A.
Hansen
, and
U.
Becker
,
Chem. Phys.
356
(
1–3
),
98
109
(
2009
).
5.
R.
Izsák
and
F.
Neese
,
J. Chem. Phys.
135
(
14
),
144105
(
2011
).
6.
R.
Izsák
,
F.
Neese
, and
W.
Klopper
,
J. Chem. Phys.
139
(
9
),
094111
(
2013
).
7.
B.
Helmich-Paris
,
B.
de Souza
,
F.
Neese
, and
R.
Izsák
,
J. Chem. Phys.
155
(
10
),
104109
(
2021
).
8.
J. J. P.
Stewart
,
J. Comput. Chem.
10
(
2
),
209
220
(
1989
).
9.
J. J. P.
Stewart
,
J. Mol. Model.
13
(
12
),
1173
1213
(
2007
).
10.
J. J. P.
Stewart
,
J. Mol. Model.
19
(
1
),
1
32
(
2013
).
11.
P. O.
Dral
,
X.
Wu
,
L.
Spörkel
,
A.
Koslowski
,
W.
Weber
,
R.
Steiger
,
M.
Scholten
, and
W.
Thiel
,
J. Chem. Theory Comput.
12
(
3
),
1082
1096
(
2016
).
12.
Q.
Cui
and
M.
Elstner
,
Phys. Chem. Chem. Phys.
16
(
28
),
14368
14377
(
2014
).
13.
G.
Seifert
and
J.-O.
Joswig
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
2
(
3
),
456
465
(
2012
).
14.
C.
Bannwarth
,
E.
Caldeweyher
,
S.
Ehlert
,
A.
Hansen
,
P.
Pracht
,
J.
Seibert
,
S.
Spicher
, and
S.
Grimme
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
11
(
2
),
e1493
(
2021
).
15.
S.
Grimme
,
C.
Bannwarth
, and
P.
Shushkov
,
J. Chem. Theory Comput.
13
(
5
),
1989
2009
(
2017
).
16.
C.
Bannwarth
,
S.
Ehlert
, and
S.
Grimme
,
J. Chem. Theory Comput.
15
(
3
),
1652
1671
(
2019
).
17.
T.
Husch
,
A. C.
Vaucher
, and
M.
Reiher
,
Int. J. Quantum Chem.
118
(
24
),
e25799
(
2018
).
18.
A. A.
Voityuk
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
3
(
5
),
515
527
(
2013
).
19.
M.
Douglas
and
N. M.
Kroll
,
Ann. Phys.
82
(
1
),
89
155
(
1974
).
20.
G.
Jansen
and
B. A.
Hess
,
Phys. Rev. A
39
(
11
),
6016
6017
(
1989
).
21.
L. M. J.
Huntington
,
O.
Demel
, and
M.
Nooijen
,
J. Chem. Theory Comput.
12
(
1
),
114
132
(
2016
).
22.
D.
Datta
and
M.
Nooijen
,
J. Chem. Phys.
137
(
20
),
204107
(
2012
).
23.
M.
Nooijen
,
O.
Demel
,
D.
Datta
,
L.
Kong
,
K. R.
Shamasundar
,
V.
Lotrich
,
L. M.
Huntington
, and
F.
Neese
,
J. Chem. Phys.
140
(
8
),
081102
(
2014
).
24.
O.
Demel
,
D.
Datta
, and
M.
Nooijen
,
J. Chem. Phys.
138
(
13
),
134108
(
2013
).
25.
S.
Grimme
and
C.
Bannwarth
,
J. Chem. Phys.
145
(
5
),
054103
(
2016
).
26.
G. M.
Loubriel
and
R. G.
Selsby
,
Int. J. Quantum Chem.
8
(
4
),
547
557
(
1974
).
27.
R.
Iffert
and
K.
Jug
,
Theor. Chim. Acta
72
(
5
),
373
378
(
1987
).
28.
J. D.
Baker
and
M. C.
Zerner
,
J. Phys. Chem.
94
(
7
),
2866
2872
(
1990
).
29.
M.
Gaus
,
Q.
Cui
, and
M.
Elstner
,
J. Chem. Theory Comput.
7
(
4
),
931
948
(
2011
).
30.
Y.
Yang
,
H.
Yu
,
D.
York
,
Q.
Cui
, and
M.
Elstner
,
J. Phys. Chem. A
111
(
42
),
10861
10873
(
2007
).
31.
M. J. S.
Dewar
and
N. L.
Hojvat
,
J. Chem. Phys.
34
(
4
),
1232
1236
(
1961
).
32.
M. J. S.
Dewar
and
N. L.
Hojvat
,
Proc. R. Soc. Lond. A
264
(
1319
),
431
444
(
1961
).
33.
M. J. S.
Dewar
and
N. L.
Sabelli
,
J. Phys. Chem. A
66
(
12
),
2310
2316
(
1962
).
34.
G.
Klopman
,
J. Am. Chem. Soc.
86
(
21
),
4550
4557
(
1964
).
35.
M. J. S.
Dewar
and
W.
Thiel
,
J. Am. Chem. Soc.
99
(
15
),
4899
4907
(
1977
).
36.
D. N.
Laikov
,
J. Chem. Phys.
135
(
13
),
134120
(
2011
).
37.
T. J.
Giese
and
D. M.
York
,
J. Chem. Phys.
123
(
16
),
164108
(
2005
).
38.
T. J.
Giese
and
D. M.
York
,
J. Chem. Phys.
127
(
19
),
194101
(
2007
).
39.
F.
Jensen
,
J. Chem. Phys.
115
(
20
),
9113
9125
(
2001
).
40.
F.
Jensen
,
J. Phys. Chem. A
111
(
44
),
11198
11204
(
2007
).
41.
F.
Jensen
,
J. Chem. Phys.
117
(
20
),
9234
9240
(
2002
).
42.
W. J.
Hehre
,
R. F.
Stewart
, and
J. A.
Pople
,
J. Chem. Phys.
51
(
6
),
2657
2664
(
1969
).
43.
R.
Sure
and
S.
Grimme
,
J. Comput. Chem.
34
(
19
),
1672
1685
(
2013
).
44.
E. V. R.
de Castro
and
F. E.
Jorge
,
J. Chem. Phys.
108
(
13
),
5225
5229
(
1998
).
45.
P.-O.
Widmark
,
P.-Å.
Malmqvist
, and
B. O.
Roos
,
Theor. Chim. Acta
77
(
5
),
291
306
(
1990
).
46.
B. O.
Roos
,
V.
Veryazov
, and
P.-O.
Widmark
,
Theor. Chem. Acc.
111
(
2
),
345
351
(
2004
).
47.
B. O.
Roos
,
R.
Lindh
,
P.-Å.
Malmqvist
,
V.
Veryazov
, and
P.-O.
Widmark
,
J. Phys. Chem. A
108
(
15
),
2851
2858
(
2004
).
48.
A.
Kramida
,
Y.
Ralchenko
,
J.
Reader
, and
NIST ASD Team
, NIST Atomic Spectra Database (version 5.10),
National Institute of Standards and Technology
,
Gaithersburg, MD
,
2022
, available at: https://physics.nist.gov/asd.
49.
J. A.
Pople
,
D. L.
Beveridge
, and
P. A.
Dobosh
,
J. Chem. Phys.
47
(
6
),
2026
2033
(
1967
).
50.
J.
Ridley
and
M.
Zerner
,
Theor. Chim. Acta
32
(
2
),
111
134
(
1973
).
51.
W.
Weber
and
W.
Thiel
,
Theor. Chem. Acc.
103
(
6
),
495
506
(
2000
).
52.
W.
Weber
, “
Ein neues semiempirisches NDDO-Verfahren mit Orthogonalisierungskorrekturen: Entwicklung des Modells, Parametrisierung und Anwendungen
,” Ph.D. thesis,
Universität Zürich
,
1996
.
53.
M. R.
Silva-Junior
and
W.
Thiel
,
J. Chem. Theory Comput.
6
(
5
),
1546
1564
(
2010
).
54.
D.
Tuna
,
Y.
Lu
,
A.
Koslowski
, and
W.
Thiel
,
J. Chem. Theory Comput.
12
(
9
),
4400
4422
(
2016
).
55.
X.
Wu
,
P. O.
Dral
,
A.
Koslowski
, and
W.
Thiel
,
J. Comput. Chem.
40
(
4
),
638
649
(
2019
).
56.
J. P.
Perdew
and
Y.
Wang
,
Phys. Rev. B
45
(
23
),
13244
(
1992
).
57.
A. D.
Becke
and
E. R.
Johnson
,
J. Chem. Phys.
123
(
15
),
154101
(
2005
).
58.
E. R.
Johnson
and
A. D.
Becke
,
J. Chem. Phys.
123
(
2
),
024101
(
2005
).
59.
E. R.
Johnson
and
A. D.
Becke
,
J. Chem. Phys.
124
(
17
),
174104
(
2006
).
60.
S.
Grimme
,
J.
Antony
,
S.
Ehrlich
, and
H.
Krieg
,
J. Chem. Phys.
132
(
15
),
154104
(
2010
).
61.
E.
Caldeweyher
,
C.
Bannwarth
, and
S.
Grimme
,
J. Chem. Phys.
147
(
3
),
034112
(
2017
).
62.
F.
Weigend
and
R.
Ahlrichs
,
Phys. Chem. Chem. Phys.
7
(
18
),
3297
3305
(
2005
).
63.
D.
Rappoport
and
F.
Furche
,
J. Chem. Phys.
133
(
13
),
134105
(
2010
).
64.
N. C.
Handy
and
A. J.
Cohen
,
Mol. Phys.
99
(
5
),
403
412
(
2001
).
65.
K. B.
Wiberg
,
Tetrahedron
24
(
3
),
1083
1096
(
1968
).
66.
C.
Riplinger
,
B.
Sandhoefer
,
A.
Hansen
, and
F.
Neese
,
J. Chem. Phys.
139
(
13
),
134101
(
2013
).
67.
C.
Riplinger
,
P.
Pinski
,
U.
Becker
,
E. F.
Valeev
, and
F.
Neese
,
J. Chem. Phys.
144
(
2
),
024109
(
2016
).
68.
A.
Altun
,
F.
Neese
, and
G.
Bistoni
,
J. Chem. Theory Comput.
15
(
1
),
215
228
(
2019
).
69.
G.
Bistoni
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
10
(
3
),
e1442
(
2020
).
70.
F.
Neese
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
12
(
5
),
e1606
(
2022
).
71.
M.
Scholten
, “
Semiempirische Verfahren mit Orthogonalisierungskorrekturen: Die OM3 Methode
,” Ph.D. thesis,
Universität Düsseldorf
,
2003
.
72.
A. D.
Becke
,
Phys. Rev. A
38
(
6
),
3098
(
1988
).
73.
C.
Lee
,
W.
Yang
, and
R. G.
Parr
,
Phys. Rev. B
37
(
2
),
785
(
1988
).
74.
J. J. P.
Stewart
,
A.
Klamt
,
W.
Thiel
,
D.
Danovich
,
G. B.
Rocha
,
R. L.
Gieseking
,
J. E.
Moussa
,
H. A.
Kurtz
,
P.
Korambath
,
K. M.
Merz
, and
B.
Wang
(
2022
). “
MOPAC (22.0.0)
,” Zenodo. https://doi.org/10.5281/zenodo.6511959
75.
W.
Thiel
, Program MNDO2020,
Muelheim
,
2020
, https://mndo.kofo.mpg.de/.
76.
R. J.
Gdanitz
and
R.
Ahlrichs
,
Chem. Phys. Lett.
143
(
5
),
413
420
(
1988
).
77.
M. E.
Casida
,
Recent Advances in Density Functional Methods: (Part I)
(
World Scientific
,
1995
), pp.
155
192
.
78.
T.
Husch
and
M.
Reiher
,
J. Chem. Theory Comput.
14
(
10
),
5169
5179
(
2018
).
79.
G. L.
Stoychev
,
A. A.
Auer
, and
F.
Neese
,
J. Chem. Theory Comput.
13
(
2
),
554
562
(
2017
).

Supplementary Material