This article presents a perspective on Kohn-Sham density functional theory (KS-DFT) for electronic structure calculations in chemical physics. This theory is in widespread use for applications to both molecules and solids. We pay special attention to several aspects where there are both concerns and progress toward solutions. These include: 1. The treatment of open-shell and inherently multiconfigurational systems (the latter are often called multireference systems and are variously classified as having strong correlation, near-degeneracy correlation, or high static correlation; KS-DFT must treat these systems with broken-symmetry determinants). 2. The treatment of noncovalent interactions. 3. The choice between developing new functionals by parametrization, by theoretical constraints, or by a combination. 4. The ingredients of the exchange-correlation functionals used by KS-DFT, including spin densities, the magnitudes of their gradients, spin-specific kinetic energy densities, nonlocal exchange (Hartree-Fock exchange), nonlocal correlation, and subshell-dependent corrections (DFT+U). 5. The quest for a universal functional, where we summarize some of the success of the latest Minnesota functionals, namely MN15-L and MN15, which were obtained by optimization against diverse databases. 6. Time-dependent density functional theory, which is an extension of DFT to treat time-dependent problems and excited states. The review is a snapshot of a rapidly moving field, and—like Marcel Duchamp—we hope to convey progress in a stimulating way.

## I. INTRODUCTION TO DENSITY FUNCTIONAL THEORY

“Every technology is used before it is completely understood.”—Wieseltier.^{1} Density functional theory is no exception. Density functional theory has revolutionized the practice of modern chemistry, and we are still trying to make heads or tails of it. Wieseltier continued, “We are living in that lag, and it is a right time to keep our heads.” This perspective will be one attempt to do so.

Density functional theory is now recognized as the method of choice for quantum mechanical electronic structure calculations on the most difficult problems, which are those with the largest number of valence electrons or requiring the most ensemble averaging or following the dynamics over the longest time frames. That is because density functional theory, when it works, is the most affordable way to get reasonably reliable and useful accuracy on such problems. So when cost (which can often be translated as computer resources, such as computer time, memory, and fast cache, but which also includes human time) is a factor, and it usually is, density functional theory is the method of choice. But we are still given pause by the “when it works” qualifier. One of the issues to be discussed in this perspective is: When does it work and how well? Or, from another point of view, where would improvements be welcome?

Density functional theory is a rapidly changing field. Duchamp, who was also participating in a rapidly developing movement, said, “In the ‘Nu descendant un escalier,’ I wanted to create a static image of movement.”^{2} The resulting image he created also carried a certain sense of confusion and evoked reactions to the effect that this is not the way things should be done. We will attempt to present a snapshot in time of a rapidly advancing field, and, following Duchamp, we will try to capture the sense of movement in the field, as confusing and unsettling as that might be.

### A. What is DFT?

Density functional theory (DFT) is a new form of quantum mechanics. There used to be matrix mechanics (invented by Heisenberg), wave function theory (invented by Schrödinger, building on work of de Broglie), the phase space formulation (invented by Wigner), the path integral formulation (invented by Feynman), the density matrix formulation (introduced by von Neumann), and at least four others.^{3} Then came the Hohenberg-Kohn formulation, which we will call DFT. Translating the implications of their results into simple language (at the risk of oversimplification, which is a risk we accept throughout this perspective article), we can say that Hohenberg and Kohn^{4} proved that the sum of the kinetic and electron-electron repulsion energies of an *N*-electron system is a universal functional of its density (the first HK theorem). Furthermore they showed that, if one knew the functional, one could solve for the density and the energy variationally (the second HK theorem). Later, Levy extended the HK variational theorem to include the situations that were not covered in the original theorem, such as degeneracies.^{5} A thorough presentation of the theory would accompany the above statements by presenting the suitable mathematical conditions and caveats and by discussing the range of definition of the density functional, including such issues as *N*-representability and *v*-representability, but we refer the reader to other sources for such details.^{5–8}

Historically, most progress in electronic structure theory was made by Schrödinger wave function theory, and so electronic structure theorists often simply said “quantum mechanics” or “quantum chemistry” to refer to the wave function methods of electronic structure theory. The great success of density functional theory means that we now need a word for Schrödinger wave function theory to distinguish it from density functional theory. In his Nobel Prize lecture, Kohn^{9} used the language wave function theory (WFT) and density functional theory (DFT). We will follow this example. Just as there are different levels of approximation in WFT (for example, Hartree-Fock theory, Møller-Plesset perturbation theory, configuration interaction, multiconfiguration self-consistent field theory, coupled cluster theory, …), there are many different levels of approximation in DFT. In WFT, the exact result (equivalent to full solution of the many-electron Schrödinger equation) is usually called complete configuration interaction (CCI); it is unattainable for all but the simplest systems (H, H_{2}^{+}, H_{2}, H_{3}, He, …), but we can prove that it exists. In DFT, the exact result is obtained by using the exact density functional. It is unknown and essentially unknowable for all but the simplest systems, but we can prove that it exists, at least for some cases and some versions of DFT.

The CCI limit can in principle be attained by starting with a reference wave function and adding single excitations, then double excitation, then triple excitations, etc., until one finally adds *N*-tuple excitations for an *N*-electron system. This systematic approach is totally impractical for most systems of interest, and devising good approximations that balance the errors so one can calculate accurate relative energies for complex systems in a practical way is something of an art. There are many WFT artists making good progress developing new methods,^{10–32} and the domain of problems for which WFT can give useful answers continues to grow. Similarly, there is also no practical systematic approach to the exact result in DFT, and also similarly, many DFT artists are broadening the range of problems for which DFT can give useful answers. Although there is no practical systematic approach, the domain for which DFT can give useful answers is already much larger than the domain for WFT, and it is growing more rapidly.

DFT differs from the other formulations of quantum mechanics in a fundamental way. In the other formulations one knows the equation or equations that govern the motion of the particles, and the hard task is solving the equations. In DFT, solving the equations would be relatively easy if one knew the universal functional, but the hard part is finding the functional. That is, we do not know what equation to solve. In fact, we will probably never know it because it can be shown^{33} that finding the universal functional is “among the hardest problems in the complexity class QMA, quantum Merlin Arthur,” which is the quantum version of non-deterministic polynomial-time hard (NP-hard), which (informally) means you will not solve the problem without approximations. We can understand this as follows: finding the universal functional (also known as the exact functional) is equivalent to solving the many-body Schrödinger equation exactly, and we all know how hard that is, essentially impossible for practical purposes. So DFT, just like WFT, does not provide a shortcut to the *exact* answer for practical problems. Nevertheless, DFT—like WFT—still might be a useful way to obtain *practical* accuracy for many problems, and in fact we know that it is.

Why does one call the exact functional “universal”? The density functional theory of electronic structure considers the electrons to move in an external field. For basic applications, this field is simply the electrostatic field of the nuclei; in the DFT literature it is called the external potential. By “universal” one simply means that the functional for the sum of the kinetic and electron-electron repulsion energies is independent of the external potential, i.e., applies to all possible external potentials—for atoms, molecules, and solids.

Although the meaning of “universal” in the above discussion is well accepted in the literature on the foundations of DFT, in the rest of this perspective (and in previous publications), we and some others use the word with a different (but related) meaning. In particular, by “universal functional” we mean one that has useful accuracy for all problems in chemistry, materials science, and condensed-matter physics, or at least broader accuracy than other available functionals. The importance of the search for a universal functional and validating functionals against a wide range of problems can hardly be overemphasized. For example, in reviewing the current state of density functional methods for catalysis and photocatalysts, Pacchioni said^{34} “It should be stressed, however, that it is very important to be able to identify which method is the most appropriate for a given problem, and that the quality of the results critically depends on this specific capability. At the moment, there is no single universal solution for every chemical or physical problem.” In the rest of this article, we use “universal” as a synonym for “general purpose.” Finding a universal (or more universal) functional is the objective of many of our efforts in Kohn-Sham DFT.

### B. What is Kohn–Sham DFT?

So far, the most successful way to implement DFT is the method proposed by Kohn and Sham.^{35} In fact, this method is so widely used that many researchers simply say DFT when they really mean Kohn-Sham DFT (KS-DFT).

To understand the motivation for the KS-DFT version of DFT, it is useful to consider the historical Thomas-Fermi and Thomas-Fermi-Dirac approximations.^{36–38} These authors approximated the energy in terms of the density by a statistical theory, and their work may be considered the beginning of DFT, but it is not useful for molecules, predicting no binding of atoms to one another.^{39} A second historically important line of approach was the self-consistent-field (SCF) approach developed with prominent contributions by Hartree,^{40} Slater,^{41} and Roothaan.^{42} This method involves orbitals and a many-electron wave function, and it yields physically reasonably molecular energies, although it is not an exact theory and it is not quantitatively accurate because it takes no account of the correlated motion of the electrons, beyond that provided by the antisymmetry of the wave function. Kohn reasoned that the main improvement in the SCF approach over the Thomas-Fermi approach is the treatment of electronic kinetic energy, and he decided to construct an in-principle exact theory by combining the SCF way of treating the kinetic energy with an explicit functional of density motivated by the HK theorems.^{9} Therefore KS-DFT approximates the kinetic energy in terms of orbitals, as explained below.

Since then, progress has been made in improving density-based approximations to the kinetic energy, and direct application of the HK theorems in this way is called orbital-free DFT. Orbital-free density functional theory achieves the ultimate goal of DFT to reduce the *N*-particle problem from 3*N* dimensions to 3 dimensions, since the density depends only on the three dimensions of real space. Although progress has been made,^{43} and orbital-free density functional is useful for certain classes of problems (for example, treating very large systems containing metals), it is still less accurate in general than KS-DFT, and we will not discuss it further. Instead we refer the reader to a review^{44} and a recent research article.^{45}

By introducing orbitals, KS-DFT reduces a single 3*N*-dimensional equation not to a 3-dimensional equation, as in orbital-free DFT, but to *N* coupled 3-dimensional equations for *N* spin-orbitals, and these equations should be solved self-consistently. Because the orbitals must be orthogonal, the computational effort scales as at least *N*^{3} in the limit of large *N*. We say “at least” because some approximations to the density functional (such as nonlocal exchange or nonlocal correlation) may scale more steeply, depending on the algorithms and software.

We also note that essentially any quantum chemical computational scheme can be made to have linear scaling [i.e., scale as *AN* or *AN* ln *N*, where *A* is a prefactor] by a divide-and-conquer approach, but that often gives a large prefactor. The derivation of linear-scaling algorithms with efficiently low prefactors is a field in itself,^{46} much broader than DFT, and it is beyond the scope of the present perspective to cover it in detail. But we do note the great promise of density embedding methods, of which there are various flavors.^{47–53} However, we will not cover embedding methods further in this perspective, except to note that density embedding not only can serve as a way to speed up DFT calculations, but also it can serve as a new way to combine WFT with DFT, which is an approach that has great promise but is not fully covered in this article.

How does the introduction of orbitals fulfill the need for an accurate approximation to the kinetic energy? In KS-DFT, the density is calculated from a reference function that is a single Slater determinant and that is the wave function of a system of noninteracting electrons with the same density as the real system. The spin-orbitals of this determinant are used to evaluate an approximation to the kinetic energy of the real (interacting) system in the same way that one calculates kinetic energy in WFT. This approximation to the electronic kinetic energy is called *T*_{S}. The electron density is also calculated from the Slater determinant and is used to calculate the classical electrostatic energy^{54} of the charge distribution, which is due to electron–nucleus and electron–electron Coulomb interactions (and the interaction of electrons with an external field, if present). Note especially that the classical Coulomb energy does not contain the exchange energy, which is a quantum effect due to the antisymmetry of a many-electron wave function. We will call the classical electrostatic energy *V*_{C}. The exact quantum mechanical energy differs from the sum *T*_{S} + *V*_{C} by the correction to the kinetic energy (the difference between the exact kinetic energy and *T*_{S}), the electron exchange energy, and the electron-electron repulsion energy contribution to the electron correlation energy.^{55} The sum of these three effects is then taken to be a functional of the density. This kind of density functional is called the exchange–correlation (XC) functional. Note that the exact XC functional is nonlocal;^{56} in general it may depend on local functions of the density, such as its gradient, and on nonlocal functions of the density, such as integrals over all space of the density times other functions, and the most accurate currently available XC functionals also depend on functionals of the density, such as the occupied and unoccupied spin-orbitals of the Kohn–Sham Slater determinant that represents the exact density or of any other wave function that is functional of the density. Then KS-DFT leads to a set of coupled pseudoeigenvalue equations, called the Kohn-Sham equations, for the *N* spin-orbitals. (Like the Hartree-Fock equations, they are pseudoeigenvalue equations, rather than eigenvalue equations, because they are nonlinear in the spin-orbitals.)

A full KS-DFT calculation is self-consistent; one varies the orbitals to minimize the KS-DFT energy expression, which is itself calculated from the orbitals. However, one can also perform post-SCF calculations. For example, one could optimize the Slater determinant by WFT (which would be Hartree-Fock theory) and then evaluate the energy using the KS-DFT energy expression without minimizing it. Such calculations are in principle less accurate but can sometimes be useful.

So far we have discussed the total electron density, which is a natural variable for electronic states that are closed-shell singlets. For open-shell systems, one must consider the local magnetization as well as the density. This can be done correctly only in a relativistic framework,^{57–60} but in DFT, as is usually done in nonrelativistic WFT, we use a simpler formalism^{61} that involves the spin-up and spin-down electron densities, *ρ*_{α} and *ρ*_{β}, respectively. This formalism is precisely analogous to spin-unrestricted Hartree-Fock theory^{62} (usually just called unrestricted Hartree-Fock theory), and it optimizes the spin-orbitals of the Slater determinant under the constraint that every spin-orbital is an eigenfunction of *s _{z}*, the operator for the component of electron spin in a given quantization direction

*z*; but there is no constraint on the total electron spin

*S*. This is usually called unrestricted KS-DFT (and abbreviated UKS) in the chemistry literature and spin-polarized DFT in the physics literature. The results are independent of the choice of spin quantization direction if spin-orbit coupling is neglected. Using this formalism one can restate the above discussion by replacing “density

*ρ*” by “spin densities

*ρ*

_{α}and

*ρ*

_{β}.” In dealing with approximate functionals, it seems to be a general practice that one takes the closed-shell functionals to be special cases of the unrestricted functionals rather than using different functionals for the two versions of the theory.

Note that the spatial parts of the α spin-orbitals and the β spin-orbitals are different in UKS theory. One can consider an even more general Slater determinant in which each spin-orbital is a linear combination of α spin and β spin, with spatially varying coefficients. Recall that a linear combination of α and β is a spin function quantized along an axis other than the *z* axis. Thus the more general spin-orbitals have spin angular momenta in different directions at different points in space, and at a given point in space, the spin angular momenta of individual spin-orbitals are in more than two directions and are different for each orbital. Thus the spins are not collinear, and this is called noncollinear DFT. It has mainly been used for systems with noncollinear magnetic ordering,^{63} although it can also be used to provide a more flexible spin coupling of the spin-orbitals of the Slater determinant without regard for magnetism.^{64,65} Because it is not widely used, we shall not consider noncollinear DFT further in this perspective, and we will restrict our considerations of KS-DFT to UKS theory (including closed-shell singlets as a special case where there are an equal number of α and β electrons and the spin-orbitals are restricted to be doubly occupied pairs of spatial orbitals). A key point though is that neither UKS theory nor the noncollinear generalization necessarily represents the density by a Slater determinant with the correct spin symmetry in the general case, even if one used the unknown exact exchange-correlation functional that would give the exact density and exact energy.

### C. Broken symmetry, improved functionals, and beyond Kohn-Sham

It is often useful to divide electron correlation effects into two categories, although the border is not sharp. One kind of electron correlation is dynamic electron correlation, and it is most easily understood in two limits. In particular, it is associated with electrons avoiding one another when they are very close (this effect is often associated with the cusp in the electronic wave function when the distance between two electrons tends to zero), and it is also associated with electrons correlating their motion for two charge clouds to have favorable mutually induced multipole interactions when the two charge clouds are far apart (this effect leads to dispersion interactions between separated fragments but also plays a role for more general situations). The other kind of electron correlation is often called nondynamic, static, near-degeneracy, or strong correlation, with the different languages evoking different aspects, different contexts of discussion, or different connotations.

Systems for which a single configuration state function (CSF) provides a good zero-order description (good “reference wave function”) are called single-reference (SR) systems. (A CSF for a closed-shell state or a high-spin open-shell state with no spatial degeneracy is a single Slater determinant; for other states a CSF is a linear combination of the fewest number of Slater determinants required to make a wave function with the correct spatial and spin symmetry.) Systems for which a single CSF does not provide a good reference wave function are called multireference (MR) systems or strongly correlated systems.

KS-DFT with the exact functional would yield the correct density and energy not only for SR systems but also for MR systems, even with the single-configuration representation of the density that it employs; however existing XC functionals are not convoluted enough to make up for the unphysical nature of the reference function for MR systems. (An analogous effect in WFT is that if one describes electron correlation in an MR system by a superposition of CSFs built by excitation from a reference CSF, one requires high-order excitations (quadruple or higher) for a quantitative description.) Thus the accuracy of Kohn–Sham theory with existing functionals is typically lower for MR systems than for SR ones.^{66–70} Furthermore, in order to obtain correct energetics, the exact functional may require representing the density by a Slater determinant that is not a spin eigenfunction and that has the wrong symmetry.^{71} In other words, KS-DFT leads to the so called broken-symmetry solutions. The problem is particularly severe for spin symmetry when 2*M _{S}* is less than the number of unpaired electrons, where

*M*is the component of total electron spin along the quantization axis. Thus it is not always clear which of the nearly degenerate states is being approximated.

_{S}The problem of treating MR systems accurately is a very important one since static correlation is present (to a greater or lesser degree) in partially broken bonds (and hence bond-breaking potential curves and reactive potential energy surfaces), in most transition-metal catalytic systems, in most excited states, in conjugated radicals that are important in atmospheric and combustion chemistry, and in many other systems.

In the variational approach to KS-DFT, one assumes that the lowest energy solution with a given *M _{S}* gives the energy of the lowest-energy state with S =

*M*. This often works well for high-spin states, which we define as states where 2

_{S}*M*is equal to the number of unpaired electrons, but when 2

_{S}*M*is less than the number of unpaired electrons, one often gets a severely broken-symmetry solution, especially if the low-spin state of interest is higher in energy than (or approximately the same in energy as) a higher-spin state. Many workers have tried to use WFT concepts to make the KS-DFT calculations more manifestly approximate a specific quantum state; this has led to the development of strategies for interpreting broken-symmetry solutions.

_{S}^{72–95}Although these methods often give improved results, they do not provide improvement in general because their validity rests on the orbitals being unchanged (except for occupancy) between the low-spin and high-spin states.

^{94}Therefore, one of the unmet challenges for DFT is the proper treatment of multireference systems or, to say it another way, the treatment of nearly degenerate states. A related practical issue is whether one fully optimizes the SCF solution to the Kohn-Sham equations. Our own practice is to always try to optimize to a stable solution, even if that breaks symmetry, but it is not always clear in the literature if other workers have done this (and in fact, the literature appears to have many calculations where this was not done). A discussion of the stability of SCF solutions has been given by Schlegel and McDouall;

^{96}it is written in terms of Hartree-Fock theory but it is equally applicable to KS-DFT.

Here the development of DFT branches into two quite different approaches. In the first approach, we try to improve the XC functional of KS-DFT so that that it can treat both SR and MR systems. We cover this approach in Section II.

In another approach, one can move beyond Kohn-Sham theory and replace the Slater determinant of Kohn-Sham theory by a multiconfigurational reference function. In a collaboration of our group with Gagliardi and her group, we have recently introduced an approach of this type, which is called multiconfiguration pair-density functional theory,^{97–101} and many other workers have also attempted to do this (we limit ourselves here to a few representative references^{102–109} out of dozens possible). Such a multiconfiguration approach can provide more definitive answers for cases where KS-DFT suffers from the inability of current functionals to deliver good enough results for multireference systems and for cases where KS-DFT makes ambiguous predictions due to broken symmetry. A review of this approach is in preparation for *Accounts of Chemical Research*,^{110} and so it will not be discussed further in the present perspective.

## II. ADVANCES IN KOHN-SHAM DENSITY FUNCTIONAL THEORY

Let us remind ourselves of the elements of KS-DFT. First we make the Born-Oppenheimer approximation so that the potential function for the nuclei is the electronic energy, which is defined to include nuclear repulsion as well as electronic kinetic energy and electronic Coulomb energy, which must be treated quantum mechanically, and so the accurate Born-Oppenheimer energy includes exchange and depends on the correlated motion of individual electrons. The central property of KS-DFT is the one-electron probability density, usually just called the density, or, for open-shell systems, the two spin densities—one (*ρ*_{α}) for spin-up electrons and one (*ρ*_{β}) for spin-down electrons. As already mentioned, when the spin-up density does not equal the spin-down density one speaks of spin-polarized KS theory (also called spin-unrestricted Kohn-Sham theory or simply UKS theory, as above). Since closed-shell theory is a special case of the more general theory that works with both spin densities, we will cast our discussion in terms of the more general spin-polarized theory. In practice, one should always use spin-polarized theory.^{111} One then calculates the electronic energy as the sum of three terms: the kinetic energy of the Slater determinant, the classical Coulomb energy (which includes nucleus-nucleus interactions, nucleus-electron interactions, and electron-electron interactions and interactions with an external field, if present), and a remainder, which is called the exchange-correlation (XC) energy (one might think that the name should also mention the correction to *T _{S}*, but that correction is traditionally considered as part of the “correlation energy”). The XC energy is taken to be a functional of the density, and there is a unique XC functional that yields the exact energy if one has the exact density. Furthermore, there is a variational principle so that one could use the XC functional to find that density. So (in principle) density functional theory always works; it is an exact theory.

But we do not know the XC functional, and we almost surely never will, and so we work with approximations to it. It might be useful to remind ourselves that we are always working with an approximation to the XC functional by always saying “approximate XC functional” and “approximate density functional theory” when we do practical work, but that is cumbersome (and unnecessary since we never use the unknown exact functional for practical work). So we say density functional theory or KS-DFT or XC functional when we should say approximate density functional theory, or approximate KS-DFT, or approximate XC functional, and we will do that here simply as a matter of convention. That is, when one says DFT fails or has large errors for some problem, what one really means is that it fails or has large errors with the approximate XC functionals being used or with any XC functional that is known. The practical problem in developing KS-DFT is the determination of better and better approximations to the XC functional. The practical problem in using KS-DFT intelligently often reduces to knowing which approximate XC functional that is already available is accurate (or most reliable) for the problem to be studied and demonstrating the expected validity (for example, by citing an appropriate reference or by benchmarking on similar systems) so that others will trust the calculation—we could call this the validation problem. The development problem and the validation problem go hand in hand. When we find a problem for which currently available XC functionals are not good enough, we are motivated to find better ones. Ideally, as mentioned above, we would find a universal XC functional with good accuracy for all the problems of chemistry, molecular physics, and condensed-matter physics. In practice we are often satisfied to use XC functionals that are good for some task at hand, even though we know they have failures or larger-than-desired errors for some other problems. But many of the interesting problems in modern chemical physics, such as sustainable energy, catalysis, and environmental science, are complex in involving many different physical properties in the same problem, for example, we may need to know the thermochemistry, the polarizability, the conductance, and the optical properties all for the same system. For this reason, the quest for a universal XC functional continues.

A review^{112} of various WFT and DFT methods for treating noncovalent interactions concluded that a “promising route seems to be the combination of the strong points of WFT and DFT techniques into hybrid methods.” This brings up the complication that, not just for noncovalent interactions but also more generally, density functional theory need not be used in pure form. For example, one can combine density functional theory with molecular mechanics as in combined QM/MM methods^{113–115} or as in so-called dispersion-corrected density functionals,^{116–118} in which one adds a post-SCF damped dispersion term onto the KS-DFT energy for functionals that do not predict accurate van der Waals interactions. The latter approach was developed most systematically by Grimme and coworkers, especially leading to their D3 damped dispersion term^{116} (also called D3(0) because the add-on term is damped to zero at short range) and a later version D3(BJ)^{117} (which is damped to a constant at short range). This approach has two disadvantages: (i) the add-on terms are unphysical in that they not only include damped dispersion, but they make up in part for deficiencies of the XC functional (and hence they are different for each XC functional); (ii) like all molecular mechanics methods, they can suffer from lack of transferability. When the parameters of the add-on terms are optimized for an XC functional that already predicts reasonably accurate results for noncovalent interactions, the damping is very severe at van der Waals distances, and only the long-range tail of the dispersion interaction is added on (to account for pure dispersion effects in the region when the charge distributions do not overlap). Nevertheless, the method has been found useful when used for the kinds of systems for which the parametrization is most thorough, for example, interactions between organic molecules. A recent variation on this then is to combine damped dispersion corrections with pairwise corrections for basis set superposition error so that one may treat large systems with small basis sets.^{119}

Another complication is that the border between DFT and WFT can be ambiguous. Formally, the KS orbitals (out of which the Slater determinant is formed) are functionals of the density, and so one may use functionals of the orbitals, motivated by wave function theory, as part of the XC functional, leading to so called orbital-dependent functionals. The most popular of the orbital-dependent functionals are those that calculate the exchange energy of the Slater determinant by the same expression as used in wave function theory. This leads to the so called hybrid XC functionals. Sometimes this is treated as a particular instance of KS-DFT,^{120,121} and in other cases one treats it as a generalization^{122,123} of the original KS-DFT. Both languages are useful.

Since Hartree-Fock exchange is nonlocal, hybrid DFT loses a key feature of the original KS theory, namely that the effective potential determining the orbitals is a local potential in KS theory. One can derive an equivalent local potential, called the optimized effective potential,^{124,125} although this can be a somewhat unstable procedure. The equivalence refers to the total energy and density being the same, but the orbitals are different; with the original (local) KS theory or the equivalent local potential the unoccupied orbitals moving in the same potential as the occupied orbitals, and hence they may be considered as excited-state orbitals, but this is not true with the nonlocal Hartree-Fock potential, for which the virtual orbitals of a neutral have the interpretation of being anionic orbitals. This affects the interpretation of band gaps and orbital energy gaps.^{126,127} We will say more about band gaps below.

Just as hybrid functionals, in which exchange is calculated by a method drawn from WFT, lead to ambiguity in how to consider the combination of DFT and WFT elements, nonlocal treatments of correlation also cause ambiguity. One can think of functionals with correlation energy computed by formulas suggested by WFT (e.g., second-order perturbation theory) as multicoefficient correlation methods combining WFT and DFT or one can think of them as having nonlocal correlation functionals, in which case they are called doubly hybrid functionals or double hybrid functionals.^{128–147} One can also use many-body wave function theory to build on a KS starting point, as in GW theory^{148–151} (a many-body theory where G denotes a Green’s function and W denotes a screened Coulomb interaction).

### A. Static correlation and KS-DFT

A widely asked question is: Does KS-DFT include static correlation?^{66,152} We introduced the concept of static correlation in Section I, where we used the usual language of “static correlation energy,” but it might be better to change the language to “static correlation error.” We can define static correlation error as the error one makes when one calculates the exchange energy by Hartree-Fock theory. It was recognized already in the 1930s that Hartree-Fock theory (which has nonlocal exchange) overemphasizes ionic contributions to the wave function. Prior to the Kohn-Sham paper, Cook and Karplus^{153} analyzed self-consistent-field calculations with Slater’s local approximation^{41} to exchange (as in the Xα method^{154}) from this point of view, and showed how the local approximation does not have this static correlation error. Since Slater’s local approximation to exchange may be considered to be an early version of the Gáspár^{155} exchange approximation used by Kohn and Sham in their original KS-DFT paper, this conclusion is also applicable to KS-DFT. (Slater exchange differs from Gáspár-Kohn-Sham exchange in that Slater derived his formula by averaging the exchange energy of a uniform electron gas over the whole Fermi sea, whereas Gáspár-Kohn-Sham exchange is obtained by evaluating it at the Fermi level, which is the variational result because variations of the electron density take place at the Fermi level.) Tschinke and Ziegler^{156} provided a further analysis along a related line that showed that the Slater approximation does not have static correlation error because it “avoid[s] near-degeneracy error through a balanced description of the molecular and atomic hole functions.” This analysis would seem at first to be restricted to left–right correlation in bond breaking, but Buijse and Baerends^{157} have argued that it is more general. For example, in bonding of 3d transition metals to ligands, the bond overlap is small due to repulsion of the ligand by filled 3s and 3p shells on the metal; this implies that even at equilibrium geometries, transition metal–ligand bonds are effectively extended into the partially bond-broken region where multiconfigurational effects are significant in WFT.

As discussed in Subsection I C, broken-symmetry solutions mimic static correlation in enabling a single determinant to dissociate properly to two radicals, but the disadvantage is spin contamination of the Slater determinant. It is interesting that Hartree-Fock exchange leads to worse spin contamination in such cases than does local exchange.^{158}

It is clear from the above analyses and previous work^{153,156,159,160} that local exchange either mimics static correlation or avoids Hartree-Fock static correlation error, depending on one’s point of view. Becke summarized the situation as follows:^{161} “Local ... density-functional theory (DFT) exchange approximations ... mimic exchange + static correlation. They do so, however, in an uncontrolled manner.” However, Grüning *et al.*,^{162} Becke,^{163} Malet and Gori-Giorgi,^{164} and Kong and Proynov^{165} have tried to explicitly build static correlation effects into Kohn-Sham functionals; and MC-PDFT,^{97–101} as mentioned at the end of the Introduction, provides an alternative way to gain control of specific multiconfiguration effects by putting them into the reference wave function. Furthermore, our group has designed functionals that give improved results for MR systems even with single-configuration reference functions, as discussed further below.

### B. Parameterization or theoretical constraints?

The question of empirical versus nonempirical approaches to density functional theory attracts a lot of attention, and some workers prefer methods based on the satisfaction of formal limiting laws. There are many such constraints, but I will concentrate here on one of them, namely the model system variously known as a uniform electron gas (UEG), free-electron gas (FEG), or homogeneous electron gas (HEG). The simplest way to incorporate what is known about the UEG is to make the local spin density approximation, LSDA (for a closed-shell singlet this reduces to the local-density approximation or LDA). A UEG is seemingly described only by its spin densities (for the spin polarized case, or by its density for a closed-shell singlet). Therefore, in the LSDA, the XC functional is taken to depend only on the local spin densities, and traditionally this has meant that it has the same dependence on them as for a UEG. In addition, some researchers design more general XC functionals (those for which the local XC energy depends on more than the local spin densities) so that in the limit of vanishing spin-density gradients, the functionals tend to the UEG limit. In fact, some workers have insisted with extreme fervor on not violating this UEG limit (similarly Duchamp was told with extreme fervor that one does not paint figures descending a staircase^{166}).

Actually, no such thing as a UEG actually exists in the real physical world and more than one kind of UEG exists in the mathematical world.^{167} In order to make the electron spin density uniform, one would need a uniform positive background charge (and a high enough value of both the background charge and the electron density to avoid Wigner crystallization), but in the real world, positive charge is nonuniform, being located in the nuclei. It has been pointed out by many workers that real atoms and molecules are not well approximated by a UEG. For example, Wigner^{168} stated that a uniform positive charge distribution “will not be a lattice at all and will in many ways have very little similarity to an actual metal. On the other hand, it can readily be treated mathematically....”

Kohn and Sham^{35} pointed out that the UEG “has no validity” at “the ‘surface’ of atoms and the overlap regions in molecules,” and hence they said, “We do not expect an accurate description of chemical binding.” In other words, the UEG model is inappropriate at interfaces of the metal with vacuum, gas, liquid, or other solids, and that is where many interesting phenomena occur. To most workers it seems that the most obvious way to treat a UEG is with periodic boundary conditions in three dimensions, and that is what we mean here when we discuss a UEG without extra specification. However, since the UEG is a model system that does not actually exist, one can consider other model systems with uniform electron densities. Gill and Loos,^{167} for example, argued that a UEG is not characterized simply by its density; for example, one could consider a three-dimensional uniform electron gas confined to the surface of a four-dimensional sphere, and at a given density it has a different energy than the commonly used UEG. Following up on the work of Gill and Loos, Sun *et al.*^{169} concluded that small finite systems have their own version of the LSDA approximation, different from the UEG one. In some cases they have increased magnitude of the negative exchange energy and smaller magnitude of the negative correlation energy.

Because the UEG model is so deeply ingrained into many discussions of KS-DFT and is often considered to be the starting point for nonempirical derivation of XC functionals, it is worthwhile to continue to emphasize its limitations. Mattsson and Kohn^{170} commented that “*a priori* approximations for the exchange correlation energies of DFT such as the local density approximation ... cannot be expected to accurately describe electronic surfaces or edges, where the Kohn–Sham wave functions undergo a transition from propagating to evanescent character. The reason is that these approximations have been developed for bulk systems in which there is no evanescence.” This argument applies to molecules as well as surfaces. Pople *et al.*^{171} showed that the local density approximation incorrectly partitions the electron correlation of atoms into core and valence regions and into same-spin and opposite-spin contributions. Liu and Parr^{172} argued, “atoms and molecules are far from uniform. Indeed, their essential nature is exponential falloff from nuclei. The evidence that the LDA is the best starting point is not strong.” Handy and Cohen^{159} optimized an exchange functional that does not tend to the accepted UEG limit when the density becomes uniform, and they justified this by saying^{173} “the UEG plays no role in actual systems that are of interest to chemists.” Since many of the formal arguments about the nature of the XC functional start with the UEG, these considerations also apply to many such arguments.

However, even if one disregards all the above problems, it is hard to claim that enforcing the uniform electron gas limit is nonempirical. For example, in trying to derive more useful approximations to the XC functional, Sham^{174} made the argument that “Because the LDA has proven to account for the bulk of the interaction effects rather well, further approximations should contain the best features of the LDA.” One assumes that if the LDA did not account for the bulk of the effects of interest to them, early developers might have raised the objection that the UEG is an artificial system, and the limit of a uniform density is not a physical limit about which we should be concerned. This latter kind of reasoning has actually played a large role in the development of conventional functionals. For example, very early in the development of improved exchange-correlation functionals it was realized^{175} that enforcing the correct power series expansion of the exchange-correlation energy (expanding in derivatives of the density about the UEG limit) led to very inaccurate exchange-correlation functionals for real systems. Thus, even though the leading terms in the gradient expansion were known, they were not used as a constraint. This led to the idea of the generalized gradient approximation in which one ignores the known low-order terms in the gradient expansion and determines the form of the dependence on the gradient by other considerations. One might call this an empirical choice of constraints. Many workers like to count parameters, and because an empirical choice of constraints is not easily countable, they do not count it. It is, however, an empirical element. Based on these kinds of considerations, some workers have adopted the viewpoint that *all* known XC functionals are to some extent empirical, and we agree with that assessment. Our own point of view is that it is worthwhile to pursue more than one approach, using *both* selected mathematical constraints *and* empirical tuning, to develop useful new functionals.

An issue introduced in the previous paragraph is the question of counting parameters. A question often raised in conference discussion is “How many parameters does your density functional have?” This kind of concern is clearly motivated by the well-known problem of over fitting, a problem much more general than DFT. If one fits a limited amount of data with a large number of parameters, the fit may reproduce the data but have little predictive value for problems outside the data set because it is nonphysical. We have already mentioned one difficulty of counting parameters, but there are others. For example, some workers consider that parameters used to fit the UEG are somehow “fundamental” and should not be counted, but a parameter fitted to, for example, the energy of the helium atom (which can be calculated very accurately without any empiricism) should be counted because it is real-world data. However, even if one resolves the question of fundamental constraints vs. real-world data, and even if one employs a functional form with countable parameters, it is our experience that simply counting parameters is not the best way to assess the likelihood of over fitting. The composition of the data set and the functional form optimized (including any constraints or restraints that are imposed) have a much greater effect on the quality of a parametrized functional than the number of countable parameters. In the final analysis, we believe that developing exchange-correlation functionals requires judgment and cannot be evaluated by simple measures. Counting parameters in a density functional is a little bit like evaluating the quality of a research program by counting the publications it produces—the number of publications is hardly irrelevant, but it is far from the whole story, and usually it is not the decisive measure of quality.

The above discussion of empiricism leads into the related language issue of “*ab initio*” and “first principles” which are often used as alternatives to “nonempirical.” One could argue that there is no unambiguous way to classify theories to make this kind of language meaningful. This is an old subject; Dewar, for example, argued that^{176} “all current theoretical procedures have to be tested because none can be used in chemistry in any but an empirical sense.” Becke argued,^{177} “Whether the underlying universal functional dependence is obtained from purely theoretical arguments (very difficult) or from fits to experimental data (much more practical) is entirely irrelevant. Information on the ‘shape’ of the Kohn-Sham functional, revealed by *whatever* means, is of fundamental value and utility.” We agree; we prefer to judge XC functionals on the basis of their performance and economy rather than their method of derivation or parametrization.

### C. Exchange–correlation functionals classified by their ingredients

There are many levels of approximation in use for Kohn-Sham XC functionals. The variety is so great that Perdew^{178} titled a 1999 review “The Functional Zoo.” Now, more than 15 years later, the variety has grown many fold, and the need for phylogeny and taxonomy is even greater.

We first classify the approximations by an algorithmic criterion, namely, is the approximation to the energy density at a point a function of only the spin densities, spin-orbitals, and their derivatives at that point? If the answer is yes, we call it a local XC functional. One must be careful to distinguish local approximations from local-spin-density approximations, which are the simplest example of local approximations; in particular, local-spin-density approximations are the case where the energy density at a point is a function of only the spin densities at that point. (Many physicists use the word “semilocal” where we say “local”—reserving the word “local” for local-spin-density approximations.) One can write the KS-DFT partition of the energy as

where (...) denotes the arguments of a function, [...] denotes the arguments of a functional, the XC energy is

and

In Eq. (1), **r** is a point in space, *ρ* is the electron density, and the total energy *E* is a sum of the kinetic energy *T*_{S} of the non-interacting system with the same density, the interaction between the electron and external potential (which is usually just the potential due to the nuclei), the classical electrostatic energy *J*(*ρ*) of the electronic distribution, and *E _{XC}*. In Eq. (2), ε

_{XC}stands for the XC energy density at a given point in space; for a local XC functional, it depends only on variables evaluated at

**r**; for a nonlocal functional it depends on quantities evaluated at all points in space. The classical electrostatic energy of a charge distribution involves every electron (for example, Avogadro’s number of electrons) interacting with the total electrostatic potential.

^{54}Part of that potential is from the electron itself; it is a negligible part in macroscopic terms, but it is not negligible on the atomic scale.

^{179–181}Removing this self-interaction energy is a major part of what the exchange-correlation energy must accomplish.

The most commonly used nonlocal ingredient is Hartree-Fock exchange, which involves calculating the exchange energy from the Slater determinant by the same expression as used in Hartree-Fock theory. Hartree-Fock exchange in KS-DFT is numerically different from Hartree-Fock exchange in Hartree-Fock theory because the Kohn-Sham orbitals *ϕ _{i}* are not the same as the Hartree-Fock orbitals. (Hartree-Fock exchange is sometimes called exact exchange, but we prefer not to do this because the separation of exchange from other energy components is not well defined in WFT when one has a multiconfiguration wave function, which is necessary to get the correct answer for any system with more than one electron.) The Hartree-Fock exchange energy is an orbital-dependent quantity; it is given by (assuming real orbitals)

^{182}

where

We note that the classical electrostatic energy density is also nonlocal, with the electron-electron part being given by

where

Electron exchange removes self-interaction error, and Hartree-Fock exchange has the great advantage that full Hartree-Fock exchange completely removes self-interaction error in a Slater determinant. But Hartree-Fock exchange also has disadvantages. There is a physical disadvantage and a practical one. The physical disadvantage is that it introduces static correlation error, so if one can account for exchange effects with a local functional, one can avoid that error, and that makes it easier to optimize a density functional that gives good performance for MR systems. The algorithmic disadvantage is that Hartree-Fock exchange is more expensive to calculate, especially in plane wave codes,^{183,184} where it can be even two or more orders of magnitude more expensive.

In order to improve the performance of density functionals, the exchange-correlation functional is also allowed to depend on the gradients of the spin densities. Functionals that depend on both the spin densities *ρ*_{σ} and their gradients $\u2207\rho \sigma $ and that separately approximate the exchange and correlation^{185} are called generalized gradient approximations.^{186} The revolution in popularizing KS-DFT was initiated by the exchange GGA published by Becke^{187} in 1988 (denoted as B88 or often just as B) and by the correlation functionals of Perdew^{188} (P86), Perdew and Wang^{189} (PW91), and Lee, Yang, and Parr^{190} (LYP). The LYP correlation functional contains parameters fit to the helium atom, and it has no self-correlation error in one-electron systems. When exchange is separated from correlation, the correct scaling of exchange under a uniform expansion of all coordinates is achieved only^{191} if the exchange energy density is separable, that is, if it is a sum over spins σ of a function of *ρ*_{α} times a function of $s\sigma 2$ where *s*_{σ} is the dimensionless gradient proportional to $\u2207\rho \sigma /\rho \sigma 4/3$; standard GGAs for exchange all satisfy this. The B88 exchange functional has the correct asymptotic behavior at long range for the exchange energy density, and it incorporates a parameter optimized by fitting to the Hartree-Fock exchange energies of six noble gas atoms (from He to Rn). Popular functional forms for GGA exchange functionals are rational function expansions in the reduced gradient^{192,193} and polynomials of such functions.^{177,194}

These GGAs usually yield better results than the LSDA. More recently we proposed parameterizing functionals with the same ingredients (spin densities and their reduced gradients) but without using a separable form for exchange and without separately parameterizing exchange and correlation since only their sum is important; we called such a functional a nonseparable gradient approximation (NGA).^{195} Such a functional is more flexible, and the two functionals parameterized this way, called N12^{195} and GAM,^{196} have better performance on average than GGAs.

When exchange is not separated from correlation, it is harder to apply the uniform scaling constraint, although this could be done by projecting the exchange energy (according to one possible definition of exchange energy in KS-DFT) from the exchange-correlation energy.^{197} So far, this has not been done with NGAs.

Hartree-Fock exchange can be mixed with either GGAs or NGAs, yielding what are called hybrid GGAs^{121} and hybrid NGAs.^{198} The originally proposed way to mix Hartree-Fock exchange with local exchange mixes a certain percentage *X* of Hartree-Fock exchange with a percentage, nominally 100 − *X*, of a local exchange approximation, a practice rationalized by Becke based on the adiabatic connection^{199–202} between the non-interacting Kohn-Sham reference system of electrons and the real interacting system of electrons. Based on this connection, Becke^{121} originally suggested a half-and-half approach (*X* = 50). In later work, *X* is usually empirically optimized to improve performance, with optimum values usually in the range *X* = 5–60 when *X* is treated as a constant (more general treatments of *X* are discussed below). In 1993, Becke reported a three-parameter functional, called B3PW91,^{203} which was fitted against main-group atomization energy data. One of the parameters is *X*, which has the value 20 in this functional. By replacing the PW91 GGA correlation functional with the LYP GGA correlation functional, Stephens *et al.*^{204} obtained the B3LYP functional, which was so successful it revolutionized the practice of computational chemistry. Although it has been out of date for at least ten years, it is still widely used.

The reason for mixing a certain amount of a local exchange approximation with a complementary amount of Hartree-Fock exchange is that there is a trade-off between beneficial and deleterious effects when introducing Hartree-Fock exchange. Above we emphasized the static correlation error introduced by Hartree-Fock exchange, but there are also benefits of including it. The benefits result from the removal or lessoning of self-interaction error. Hartree-Fock exchange removes at least some self-interaction error. This has many chemical consequences. For example, self-interaction tends to make Kohn-Sham orbitals too delocalized, and Hartree-Fock exchange corrects that tendency. Self-interaction error also leads to a large error in the magnitude of the electric field felt by an electron far from the nuclei (for a neutral system, the exchange potential decays exponentially rather than by Coulomb’s law); Hartree-Fock exchange corrects this. At a higher level of approximation though, one finds that electron correlation in a solid screens the exchange,^{123,205,206} and so the local approximation to exchange may be more accurate; this again shows that one must consider exchange and correlation together.^{207} The effective local dielectric constant is the reciprocal of the fraction of retained Hartree-Fock exchange; therefore in principle the fraction of retained Hartree-Fock exchange should depend on the position through the local dielectric constant,^{208} and below we will mention some additional hybrid functionals that incorporate this position-dependence. Metals have an infinite dielectric constant, and to treat them properly one must screen the Hartree-Fock exchange completely at long distances.

On a macroscopic basis screening accounts for polarization of a medium surrounding the subsystem of interest (such as solvent surrounding a solute in Debye-Hückel theory or a solid surrounding a test charge in condensed-matter physics), and it might seem that we do not need to consider this in a molecule since the whole molecule is treated explicitly. However, if the correlation functional does not yield the necessary polarization, screening of Hartree-Fock exchange may provide a more physical treatment. This is another reason why 100% Hartree-Fock exchange (which corresponds to no dielectric screening) is not necessarily the best approximation.

The problem with excessive delocalization due to self-interaction error (resulting from too little Hartree-Fock exchange) can be very serious because it can change the nature of the system even qualitatively. For example, in our study of holes (empty orbitals produced when the battery is charged) in lithium-ion batteries,^{209} we found that local functionals predict them to be delocalized, but adding Hartree-Fock exchange yields a localized small polaron. This gives an entirely different picture of charge transport.

If one is designing or choosing an XC functional for MR systems, an overriding concern might be to minimize static correlation error and not use Hartree-Fock exchange; in designing or choosing an XC functional for an SR system, other kinds of error might be foremost in one’s consideration, and one might want a large portion of Hartree-Fock exchange. If one is trying to design a universal function, one has to weigh the trade-offs and/or design the local part of the functional so that the total correlation functional is resistant to static correlation error even when a large portion of Hartree-Fock exchange is present. (This is easier to do when one uses kinetic energy density as another ingredient, which is discussed below.)

Becke pioneered the parametrization of hybrid GGAs by optimizing polynomial coefficients against databases.^{177} The B97 global-hybrid GGA functional^{177} was parametrized to main-group atomization energies, ionization potentials, and proton affinities. The B98 global-hybrid functional^{210} was parameterized with higher-order polynomials to main-group heats of formation plus ionization potentials, electron affinities, and proton affinities. Handy and coworkers, following a similar procedure, parametrized the B97-1 global hybrid GGA and the HCTH/93, HCTH/120, and HCTH/147 GGAs with broadened training data including total energies, gradients, XC potentials at points in space, and weakly bound dimers.^{211,212} Two other popular functionals in the global-hybrid GGA class are the already-mentioned B3LYP^{204} functional and the PBE0^{213,214} functional (also called PBE1or PBE1PBE). In our tests of hybrid GGAs, we found that B97-1,^{211} B97-3,^{215} and SOGGA11-X^{216} performed especially well.

The most popular other local ingredient is the kinetic energy density

Adding this to a GGA or an NGA results in what is called a meta-GGA^{217–221} or meta-NGA,^{222} and adding it to a hybrid GGA or hybrid NGA yields a hybrid meta-GGA^{129,221,223} or a hybrid meta-NGA.^{198} There is more than one advantage of employing kinetic energy density. First is that it can distinguish between regions of space where the density is due to a single orbital from regions where it is due to multiple oribitals.^{224,225} This can allow the XC functional to recognize regions containing only a single electron where self-interaction error can be especially serious if not tamed. A second advantage of using kinetic energy density is that it allows one to distinguish regions of decaying density from bonding regions; that in turn allows the XC functional to distinguish between atoms on the surface of a molecule or solid, where the density decays in some directions, from atoms in an interior, where it is surrounded by bonding regions.^{226} A more detailed discussion of the behavior of kinetic energy density and functions of kinetic energy density has been given by Sun *et al.*, who reported a new kinetic energy density variable α, which is able to distinguish regions of covalent bonding, metallic bonding, and weak interactions.^{227}

Some notable local meta functionals are VSXC by Van Voorhis and Scuseria,^{220} τ-HCTH by Boese and Handy,^{228} which improves the performance over HCTH on atomization energy, electron affinities, ionization potentials, dissociation energies of transition metal complexes, and hydrogen bonds, revTPSS by Perdew *et al.*,^{229} M06-L^{230} which is very good for transition metal thermochemistry and geometry optimization, MN12-L,^{222} and MN15-L, which is also very good for transition metal chemistry. The latter two are meta-NGAs, and the other four are meta-GGAs. The costs of meta-GGA and meta-NGA functionals are similar to GGAs and NGAs, but they are capable of higher and more universal accuracy.

By including the new variable α, Sun *et al.*^{231} developed a new series of meta-GGAs called MGGA_MS0, MGGA_MS1, MGGA_MS2, and MGGA_MS2h. The first three use the same functional form with different values of the parameters, and the last one is a hybrid meta-GGA with 9% Hartree-Fock exchange added to the MGGA_MS2 functional. These new functionals were tested for various properties, including reaction barrier heights^{232} and the G3 thermochemistry database.^{233} Recently, Sun *et al.*^{234} developed a new meta gradient approximation, called SCAN (which stands for strongly constrained and appropriately normed), based on the function α, that satisfies 17 constraints. In our own work,^{425} we tested the use of functions of α, but they have not given us significantly improved performance over the functionals used in MN12-L and MN15-L. Further study of functional forms that best incorporate the physics would be valuable.

One can also use the second derivatives of the spin densities (the Laplacian);^{235} this incorporates similar physical effects to including kinetic energy density, but it leads to less stable calculations. (Such functionals are also called meta.)

Some functionals treat the percentage *X* of Hartree-Fock exchange as a variable rather than as a constant. When it is treated as a constant, the XC functional is called a global hybrid. Bahman *et al.*^{236} said “a constant exact-exchange admixture in a global hybrid provides only a very limited way of balancing the elimination of self-interaction (achieved best at large exact-exchange admixtures) and the simulation of nondynamical correlation by LSDA or GGA exchange.” When *X* is treated a variable, the XC functional is called a range-separated hybrid or a local hybrid.

In a range-separated hybrid, *X* depends on the interelectronic separation

where **r**_{1} and **r**_{2} are the coordinates of two electrons. This is most simply done by dividing the interelectronic interaction terms into a short-range (SR) part and a long-range (LR) part, for example by^{237}

Range separation has been applied in a variety of ways. The initial usage was to include more Hartree-Fock exchange at long range than at short range, sometimes increasing from *X* = 0 to *X* = 100, sometimes from finite *X* to *X* = 100, and in one case from *X* = 19 to *X* = 65. Functionals in which *X* increases with *r*_{12} are called long-range corrected or LC functionals.^{237–246} One motivation for this approach is to improve the description of the asymptotic exchange potential so that, for example, one obtains more accurate electronically excited states with charge transfer character without introducing too much static correlation error into ground states. Song *et al.* obtained especially good results by using a Gaussian correction to the usual attenuation factor.^{242}

Another class of range-separated functionals has *X* decreasing to zero at long range.^{198,247} This is often called screened exchange (although any functional with less than 100% Hartree-Fock exchange would qualify as having screened exchange, as discussed above). The motivation for thus eliminating Hartree-Fock exchange at long range is twofold. One motivation is that it reduces the cost of calculations in plane wave codes. A second is that (as already mentioned), in a solid electron correlation screens electron exchange at large distances because of dielectric shielding.^{205,206} One can think of this as arising from a cancellation of exchange and correlation effects at large interelectronic separation.^{123}

Screened exchange has a long history in the physics community, where it is essential for treating metals, and it was incorporated into DFT by Bylander and Kleinman.^{120,206} Heyd, Scuseria, and Ernzerhof developed a widely used screened exchange hybrid functional, and the final version of their potential^{241,247–251} is called HSE06. More recent screened exchange functionals are a Gaussian-attenuated screened exchange denoted GauPBE,^{252} a modification, called HSE12s,^{253} of HSE06 that speeds up computational costs without degrading its performance for band gaps, lattice constants, and barrier heights, a broadly optimized range-separated hybrid NGA called N12-SX,^{198} a broadly optimized range-separated hybrid meta-NGA named MN12-SX,^{198} and a version of HSE06 reparametrized against many-body calculations of band gaps.^{254} Galli and coworkers have developed a dielectric-dependent range-separated hybrid that shows good accuracy for energy gaps of inorganic and organic solids.^{255}

In order to reap the benefits of both LC and SX functionals, it has been suggested to use three-ranges, with *X* increasing from short range to middle range and then decreasing from middle range to long range.^{256}

Another way to use range separation has nothing to do with Hartree-Fock exchange. Rather it was used to employ one parametrization of a meta-GGA at short range and another parametrization at long range.^{257} This accounts for exchange and correlation effects associated with the short-range (SR) and the long-range (LR) portions of the Coulomb interactions depending differently on the independent variables.

A fifth use of dividing the electron-electron interaction into long and short ranges is to join KS-DFT at short range to multiconfiguration self-consistent field or other correlated wave function theories at long range. This is one of the strategies for multiconfiguration DFT mentioned briefly at the end of Section I.^{106,109}

In a position-dependent-hybrid functional, the admixture depends on other variables such as spin-labeled kinetic energy density^{258} spin-labeled reduced density gradient,^{259} or the spin polarization.^{260} The spin polarization is defined by

The next ingredient we will consider is a nonlocal correlation functional. Two kinds of approach have been used for this.

The first approach to nonlocal correlation was the doubly hybrid approach. In this approach one combines an XC energy calculation with one based on post-SCF WFT correlation theory using Kohn-Sham or Hartree-Fock orbitals. Since both sets of orbitals are functionals of the density, this is a form of KS-DFT, just like (singly) hybrid DFT. Most applications have used second-order perturbation theory as the WFT correlation theory,^{128–138,141} but the method has also been employed with higher-order methods.^{129} One can also use the random phase approximation^{261–264} (RPA), which can be considered (from one point of view) as exchange-free perturbation theory.

The second kind of nonlocal correlation functional is called a van der Waals functional.^{265–271} This kind of functional involves the nonlocal interaction between the densities at two points in space. Note the distinction from doubly hybrid functionals; the doubly hybrid functionals have orbital-dependent nonlocal correlation, whereas the van der Waals functionals have density-dependent nonlocal correlation.

Three examples of the van der Waals functional are the vdW-DF2 method of Langreth and coworkers,^{269} which is a further development of the pioneering effort^{265} for this kind of functional, the VV10 functional of Vydrov and Van Voorhis,^{268} and the BEEF-vdW functional of Jacobsen and coworkers^{271} (where the acronym stands for Bayesian error estimation functional with van der Waals correction). It was found^{269,270} that the vdW-DF2 correlation functional works better if used with a compatible exchange functional, and one study found that it works best if combined with the PW86R functional, which is designed^{272} to agree well with Hartree-Fock exchange and with the second order gradient expansion, thereby confirming the importance of a consistent treatment of exchange and correlation, which can be troublesome since standard exchange functionals, especially if empirically optimized, cannot be guaranteed to be innocent of mimicking correlation effects. The BEEF-vdW functional is based on a novel combination of machine learning with a Bayesian-motivated fitting procedure such that one makes predictions of best estimates and error estimates using an ensemble of XC functionals. Interestingly, in later work a subset of the authors of the BEEF-vdW functional developed a local meta functional, mBEEF, that “allows studies of larger and more complex systems than the BEEF-vdW, since the nonlocal correlation term has been eliminated” and showed “that the endured loss of accuracy, even for hydrogen-bonded systems, is rather limited.”^{273}

In 2014, Head-Gordon and coworkers developed a range-separated hybrid functional called ωB97X-V,^{274} which is fitted by choosing the most transferable parameters from a B97-type power series and a nonlocal VV10-type^{268,275} functional. The ωB97X-V is trained on a large number of thermochemistry and noncovalent data, and as a result this highly parametrized functional shows good results on weak interaction and thermochemistry properties. However the ωB97X-V training and test data do not contain any transition-metal or solid-state systems, so it does not represent an attempt to derive a universal functional. By using the same design strategy, these authors also developed a functional called B97M-V, which is a meta-GGA paired with a VV10-type nonlocal functional.^{276}

Another ingredient that can be added to KS-DFT is subshell-dependent corrections to on-site Coulomb repulsions, that is, subshell-dependent corrections to intra-atomic Coulomb and exchange interactions,^{277–284} sometimes called Hubbard-model^{285} terms. Theories including this feature are known as DFT+U. A related method (which is also related to the addition of molecular mechanics terms to improve dispersion interactions in density functionals that do not treat damped dispersion well) is the addition of empirical localized orbital corrections, as developed by Friesner and co-workers^{286–288} to improve the prediction of thermochemical quantities. However, whereas the DFT+U method is applied self-consistently and can, for example, correct delocalization errors, the empirical localized orbital corrections (and the molecular mechanics dispersion terms) are applied post-SCF and do not improve the description of the electronic structure. In some, but not all, respects, the Hubbard correction mimics Hartree-Fock exchange.^{284}

That completes our enumeration of the animals in the zoo, although a few exotic species have gone unmentioned. We have not speculated on which species may become extinct, but in Sec. II D we examine some of the more successful progeny of functional evolution.

### D. The quest for a universal functional

#### 1. Noncovalent interactions

We mentioned above that many workers add molecular mechanics terms to density functional energies because many of the simpler functionals do not predict accurate noncovalent interactions. Local density functionals predict negligible interaction between systems with negligible overlap of their charge distributions, and that is one motivation for adding nonlocal correlation terms in some recent functionals, dating back to the pioneering work of Langreth.^{265} However, at van der Waals minima the attractive force is precisely cancelled by the repulsive one arising from charge cloud overlap, which means that charge cloud overlap is appreciable. It turns out that local density functionals, especially meta functionals, can give reasonably accurate attractive noncovalent interactions by providing a good description of medium-range correlation energy.^{225,230,289,290} (We denote the region of nonoverlapping charge distributions as long range, van der Waals distances as medium range, and distances corresponding to bonded geometries or closer interactions as short range.) Because dispersion is damped and treatable at medium range, density functionals without molecular mechanics terms and without nonlocal correlation can even give reasonable descriptions of medium-range interactions even in systems for which dispersion dominates inductive and permanent-multipole interactions at long range. In recent years there has been a resurgence of interest in forces due to medium-range correlation energy because it is now realized that they play important roles in interactions of mid-sized and large molecules where previously their role had been underestimated or ignored; this is especially important for conformational analysis, isomerization, and catalysis.^{291–294} Since surfaces may be considered (from one point of view) as large molecules, medium-range correlation can also be important in heterogeneous catalysis.^{34,295} Therefore, in the quest for a universal functional, our goal is to obtain useful accuracy for attractive noncovalent interactions as well as for bonding interactions.

#### 2. Performance for atomic and molecular ground states

Progress in developing more accurate density functionals has involved adding ingredients, designing flexible forms that can best represent the important physics of the electron distribution, and parameterizing against larger, more diverse, and/or better chosen databases. We will discuss the performance mainly in terms of Database 2015,^{196} Database 2015A,^{296} and Database 2015B.^{297}

Database 2015B contains 471 atomic and molecular data that were divided into 28 subdatabases, and 83 XC functionals were tested against the entire set of data.^{297} The 28 subdatabases are single-reference main-group metal bond energies, single-reference main-group non-metal bond energies, single-reference transition metal bond energies, multireference main-group metal bond energies, multireference main-group nonmetal bond energies, multireference transition metal bond energies, isomerization energies of large molecules, ionization potentials, electron affinities, proton affinities, thermochemistry of π systems, hydrogen transfer barrier heights, a diverse set of other barrier heights, noncovalent interactions at van der Waals distances, absolute atomic energies, alkyl bond dissociation energies, hydrocarbon chemistry, 3d transition metal atomic excitation energies and the first excitation energy of Fe_{2}, 4d transition metal atomic excitation energies, p-block atomic excitation energies, difficult cases, 2p isomerization energies, 4p isomerization energies, noncovalent complexes as functions of internuclear distance, rare-gas dimers as functions of internuclear distance, transition metal homonuclear dimer bond energies, sulfur-containing-molecule atomization energies, and bond lengths of diatomic molecules. Because these data have different orders of magnitude (ranging from small noncovalent interactions to large ionization potentials), the mean unsigned error across the database is not the most useful measure of the universality of the tested functionals. Relative errors also lead to unreliable evaluations of progress because they overestimate the importance of errors on small quantities and because they cannot be applied in a reasonable way to signed quantities like isomerization energies. A better measure of universality is to rank each of the 83 functionals in order of increasing mean unsigned error on each of the 28 subdatabases and then compute the average rank of each functional across the 28 databases.

These databases were originally developed for the design and validation of three new XC functionals, namely GAM,^{196} which is an NGA, MN15-L,^{296} which is a meta-NGA, and MN15,^{297} which is a global-hybrid meta-NGA. One design principle in building these functionals was the use of large variety of databases and a flexible mathematical functional form capable of fitting, for example, transition metal bond energies and reaction barrier heights, multireference systems, and thermochemistry and noncovalent interactions.

In Table I, we present some ranking results for the ten best-performing functionals in our study,^{211,225,289,296–300} for the eight best-performing local functionals,^{193,196,222,229,230,257,296,301} and for the classic B3LYP functional.^{204} Table I shows the average rankings and the number of databases that a functional is ranked in top 10, top 20, and top 35 (out of the 83 functionals tested). We can clearly see the movement in the field as many functionals are more universal than the long-time favorite B3LYP functional (which is still used today by many workers). Furthermore, among the other functionals the newer ones have the best performance. The MN15 and MN15-L functionals have the best average rankings, namely 11 and 17; the other functionals in the top ten have average rankings from 20 to 31. The MN15, MN15-L, and MPW1B95 functionals have the most number of databases that are ranked in top 35 with respectively 28, 25, and 25 databases so ranked.

Functional . | Reference . | Type^{a}
. | Average ranking . | R10^{b}
. | R20 . | R35 . |
---|---|---|---|---|---|---|

Ten best functionals from among all 83 XC functionals | ||||||

MN15 | 297 | Global-hybrid meta-NGA | 11 | 17 | 23 | 28 |

MN15-L | 296 | Meta-NGA | 17 | 12 | 20 | 25 |

MPW1B95 | 289 | Global-hybrid meta-GGA | 20 | 10 | 17 | 25 |

M08-HX | 298 | Global-hybrid meta-GGA | 21 | 12 | 18 | 21 |

M06 | 299 | Global-hybrid meta-GGA | 22 | 10 | 16 | 19 |

M08-SO | 298 | Global-hybrid meta-GGA | 24 | 13 | 15 | 19 |

PW6B95 | 225 | Global-hybrid meta-GGA | 24 | 6 | 16 | 13 |

B97-1 | 211 | Global-hybrid GGA | 24 | 8 | 12 | 22 |

M06-2X | 299 | Global-hybrid meta-GGA | 26 | 11 | 16 | 19 |

ωB97X-D | 300 | Range-separated-hybrid GGA + MM | 26 | 8 | 14 | 19 |

Eight best functionals from among 38 local XC functionals | ||||||

MN15-L | 296 | Meta-NGA | 17 | 12 | 20 | 25 |

M06-L | 230 | Meta-GGA | 31 | 6 | 8 | 15 |

MN12-L | 222 | Meta-NGA | 36 | 4 | 9 | 17 |

GAM | 196 | NGA | 39 | 3 | 6 | 14 |

M11-L | 257 | Meta-GGA | 43 | 2 | 6 | 14 |

revTPSS | 229 | Meta-GGA | 43 | 2 | 4 | 10 |

TPSS | 301 | Meta-GGA | 43 | 1 | 5 | 9 |

PBE | 193 | GGA | 43 | 0 | 5 | 11 |

Historically important functional for comparison | ||||||

B3LYP | 204 | Global-hybrid GGA | 31 | 1 | 5 | 9 |

Functional . | Reference . | Type^{a}
. | Average ranking . | R10^{b}
. | R20 . | R35 . |
---|---|---|---|---|---|---|

Ten best functionals from among all 83 XC functionals | ||||||

MN15 | 297 | Global-hybrid meta-NGA | 11 | 17 | 23 | 28 |

MN15-L | 296 | Meta-NGA | 17 | 12 | 20 | 25 |

MPW1B95 | 289 | Global-hybrid meta-GGA | 20 | 10 | 17 | 25 |

M08-HX | 298 | Global-hybrid meta-GGA | 21 | 12 | 18 | 21 |

M06 | 299 | Global-hybrid meta-GGA | 22 | 10 | 16 | 19 |

M08-SO | 298 | Global-hybrid meta-GGA | 24 | 13 | 15 | 19 |

PW6B95 | 225 | Global-hybrid meta-GGA | 24 | 6 | 16 | 13 |

B97-1 | 211 | Global-hybrid GGA | 24 | 8 | 12 | 22 |

M06-2X | 299 | Global-hybrid meta-GGA | 26 | 11 | 16 | 19 |

ωB97X-D | 300 | Range-separated-hybrid GGA + MM | 26 | 8 | 14 | 19 |

Eight best functionals from among 38 local XC functionals | ||||||

MN15-L | 296 | Meta-NGA | 17 | 12 | 20 | 25 |

M06-L | 230 | Meta-GGA | 31 | 6 | 8 | 15 |

MN12-L | 222 | Meta-NGA | 36 | 4 | 9 | 17 |

GAM | 196 | NGA | 39 | 3 | 6 | 14 |

M11-L | 257 | Meta-GGA | 43 | 2 | 6 | 14 |

revTPSS | 229 | Meta-GGA | 43 | 2 | 4 | 10 |

TPSS | 301 | Meta-GGA | 43 | 1 | 5 | 9 |

PBE | 193 | GGA | 43 | 0 | 5 | 11 |

Historically important functional for comparison | ||||||

B3LYP | 204 | Global-hybrid GGA | 31 | 1 | 5 | 9 |

^{a}

NGA = nonseparable gradient approximation; GGA = generalized gradient approximation; MM = molecular mechanics.

^{b}

R10, R20, and R35 are respectively the number of subdatabases (out of 28) for which a given functional is ranked in the top 10, the top 20, or the top 35 out of 83 functionals tested.

Table I shows that the most universal performance is obtained with hybrid functionals, despite the static correlation error brought in by Hartree-Fock exchange. Local functionals do have the advantage of being more efficient in many computer codes though. For this reason, there is great interest in local functionals; for example, they are very convenient for geometry optimization of large systems. In this regard, the performance of the recent MN15-L functional is very noteworthy, with greater universality than any functional except the very recent hybrid MN15.

The M08-HX and M08-SO functionals in Table I are refined versions of M06-2X in which we increased the number of parameters to push the functional form to its limit. The M08-HX functional has been found to have outstanding performance for hydrogen-atom transfer barrier heights in application after application.^{302–311}

We also note the good performance of the GAM functional,^{196} which is obtained at the gradient approximation level, without kinetic energy density and without nonlocal exchange; GAM, which denotes gradient approximation for molecules, uses the same mathematical form as the N12 functional,^{195} which was the first NGA, but it is optimized with smoothness restraints against a more diverse database. The GAM functional appears to be a large improvement over the N12 functional, especially for multireference systems. Among all fifteen gradient approximations that we compared in the GAM functional paper, the GAM functional performs the best against various chemistry and physics databases. By adding kinetic energy density to the GAM functional, we developed the MN15-L functional.^{296} The MN15-L functional gives very promising results for various molecular energetic databases including transition metal bond energies, reaction barrier heights, main-group bond energies, excitation energies of p block, 3d, and 4d atoms, hydrocarbon chemistry, and isomerization energies. The MN15-L functional also gives good molecular geometries and solid-state structures like lattice constants, but the performance for noncovalent interactions is less than might be desired. In order to further improve the performance, we developed the MN15 functional, which is a global-hybrid meta-NGA that gives accurate results for a wide array of molecular properties.^{297} The MN15 functional gives the best results for 313 single-reference systems and second best for 54 multireference systems among the 83 density functionals tested on database 2015B. Moreover, the MN15 functional also gives the best results, among functionals for which comparison is available, for main-group molecular geometries, organic molecular geometries, and transition metal bond lengths.

The comparisons in this section do not include all potentially broadly applicable functionals. As just one example, we mention the ωM06-D3 functional,^{312} which was designed by adding LC range separation and molecular mechanics dispersion to an exchange-modified version of the M06 functional. It would be interesting to test this functional on our transition metal databases.

We anticipate that the field will continue to move forward, and functionals will continue to improve, for example, perhaps by taking better advantage of nonlocal correlation or perhaps by other design strategies or more robust parametrization.

#### 3. Performance for the solid state

Because hybrid functionals are very expensive in plane-wave codes, and because most calculations on solids are carried out with plane-wave codes, we have limited most of our investigation of solid-state properties to local functionals. Here we consider mean unsigned errors (MUEs) for the three solid-state subdatabases of Database 2015A: LC17, which has lattice constants for main-group metals, ionic salts, transition metals, and semiconductors, SSCE8, which has cohesive energies of semiconductors and ionic salts, and SBG31, which has semiconductor band gaps. In the tests against SBG31, we calculate band gaps (as usual) as orbital energy differences (energy of lowest unoccupied crystal orbital minus energy of highest occupied crystal orbital).

For lattice constants we tested 26 local functionals and one nonlocal functional (HSE06^{248}) and we found five functionals that give significantly better accuracy than the others, in particular (in ranked order) two GGAs, namely SOGGA^{313} and PBEsol,^{184} a meta-NGA, namely MN12-L,^{222} a meta-GGA, namely MGGA_MS2,^{231} and an NGA, namely N12.^{195} In our test, the LSDA functional ranked 12th out of 27; this is mentioned because Kohn singled out LSDA predictions of lattice constants in his Nobel Prize acceptance speech.^{9} The five functionals mentioned above have MUEs that are factors of 2.6–3.1 smaller than the MUE of the LSDA functional. One functional, namely RPBE,^{314} that was specifically developed for improved adsorption energies at gas-solid interfaces, has an MUE 1.7 times larger than LSDA; this illustrates how a functional with good performance for one property, e.g., adsorption energies, may not reach the goal of being a general-purpose functional.

For cohesive energies and semiconductor band gaps we tested 36 local functionals and three nonlocal functionals (N12-SX,^{198} HSE06, and MN12-SX^{198}), we found the best functionals were SOGGA11^{315} with an MUE of 0.07 eV, PW91^{189} and mPWPW^{316} with MUEs of 0.10 eV, and N12-SX, HSE06, MN15-L,^{296} MN12-L, and PBE^{193} with MUEs of 0.11 eV. The only repeat from the lattice constant top performers is MN12-L, which is therefore considered to be more universal.

For semiconductor band gaps, the three nonlocal functionals are clearly the best performers, with MUEs of 0.26 eV for N12-SX and HSE06 and 0.32 eV for MN12-SX. The best local functional is M11-L^{257} with an MUE of 0.54 eV, followed by MGGA_MS2 (0.66 eV), M06-L (0.73 eV), MN15-L (0.83 eV), and MN12-L (0.84 eV), all of which are meta functionals. SOGGA and PBEsol, which are GGAs that had predicted good lattice constants, each have an MUE of 1.14 eV.

Based on average rank over the three databases, the most universal functionals in this test for solid-state physics are (with average rank in parentheses) MN12-L (6), MN15-L (7), MGGA_MS2 (8), N12 (12), TPSS^{301} (12), PBE (12), M06-L^{230} (13), revTPSS^{229} (13), M11-L (13), SOGGA11 (13), and PW91 (13). Since the functionals near the top of this list are the most recent ones, this shows progress in achieving better universality.

As a side point, we note that periodic calculations can also be carried out with Gaussian basis functions, and this provides an economical way to include Hartree-Fock exchange; one powerful code available at this time for such calculations is Crystal.^{317} We may see further development in this area. We especially note that plane wave calculations are inefficient for calculations involving solid-vacuum interfaces because the vacuum must be filled with basis functions.

### E. Excited states: Time-dependent density functional theory

In the language of condensed-matter physics, molecules are finite systems, and they are distinguished from condensed matter. Molecules have discrete excited states; condensed matter has band structure. Molecules are traditionally discussed in terms of configuration state functions, but condensed mater is discussed in terms of quasiparticles (as originally introduced in the Fermi liquid theory of Landau^{318}). Density functional theory is discussed in both languages. This sometimes makes it hard to translate the methods and discussions of one community into those of the other, and in some respects the differences are more than sematic as the relative importance of various physical effects may change with phase. These differences can be especially important when one discusses excitations.^{319} In this section we will mainly consider molecules and use the language of molecular physics.

#### 1. Theory

Time-dependent density functional theory (TDDFT) is an extension of DFT to treat time-dependent problems and excited states. The first Runge-Gross theorem,^{320} which is a time-dependent counterpart of the first HK theorem, asserts that, given the initial wave function, there exists a one-to-one correspondence between a time-dependent potential (up to an additive function of time) and the time-dependent electron density in this potential. Since the time-dependent potential determines the Hamiltonian of a system with a certain number of electrons, which in turn determines the time-dependent wave function, it follows that a time-dependent system, such as a molecule being excited in an external field, can be completely described by its time-dependent electron density. The second Runge-Gross theorem^{320} provides a variational principle for determining the density.

To practically find the density, a set of time-dependent KS equations can be derived by constructing a fictitious non-interacting system analogous to that used for the ground-state KS case. If the objective is spectroscopy or the calculations of excited-state potential energy surfaces, it is not necessary to work directly with the time-dependent electron density; instead one can use linear response theory to derive convenient frequency-domain formulas for the calculation of electronic excitation energies and excited state properties.^{321,322} This yields a theory identical to time-dependent Hartree-Fock theory (sometimes called the random phase approximation with exchange)^{323–325} except that Hartree-Fock orbitals, orbital energies, and matrix elements are replaced by KS-DFT ones. In particular, one obtains the following equations, sometimes called the Casida equations:^{322}

Here *ω* is the excitation energy, **X** and **Y** are the excitation and de-excitation amplitude vectors describing the excited state, and an asterisk denotes a complex conjugate. The spin-free two-electron integrals are defined, using standard notation^{326} (sometimes called Mulliken notation), by

and *f*_{XC} is the XC kernel with matrix elements

where *ϕ _{i}* is a KS orbital given by a ground-state KS-DFT calculation,

*i*and

*j*denote occupied orbitals, and

*a*and

*b*denote virtual orbitals. One usually makes the adiabatic approximation, under which the frequency dependence of the XC kernel is ignored, and this allows the use of ground-state XC functionals.

^{321,327–331}Then one has

where *ρ* is the electron density, *E*_{xc} is the XC functional, and δ denotes functional variation. The Casida equations with the adiabatic approximation have been efficiently implemented in many electronic structure codes, and the method has gained popularity due to its modest computational cost and useful accuracy for many applications. The above formulation may be called linear-response TDDFT (LR-TDDFT), but often one simply says TDDFT.

The Casida equations can be further simplified by the Tamm-Dancoff approximation (TDA).^{332,333} This amounts to setting **B** to zero in Eq. (12), which results in a Hermitian eigenvalue problem

This equation has the same form as the configuration interaction singles (CIS) approximation in WFT, just as LR-TDDFT has the same form as time-dependent Hartree-Fock (TD-HF) theory.^{334} TDDFT-TDA usually gives similar results to LR-TDDFT, but it is more stable near state intersections.^{335}

Even if the adiabatic and linear response approximations were valid, the accuracy of TDDFT depends on the choice of XC functional. There has been a considerable amount of benchmark work assessing the strengths and weaknesses of various functionals^{336–348} as well as efforts devoted to developing better functionals for TDDFT and better formulations of TDDFT.^{335} Equation (13) shows that the orbital energy differences are a component of the excitation energies, but there are other contributions as well. Neglecting the other components is sometimes called the independent-particle approximation, and in the solid-state literature the difference of the final result from the independent particle approximation is sometimes called an excitonic effect, where an exciton is a quasiparticle interacting with a quasihole.^{150} The importance of excitonic effects is a strong function of the percentage of Hartree-Fock exchange and the screening of this exchange in screened exchange functionals, and this has been elucidated by Brothers *et al.*^{349}

Although the present perspective is mainly devoted to the time-independent linear response theory in the frequency domain, it should be kept in mind that one can also use TDDFT in the time domain. This is especially advantageous for applications involving intense laser fields^{350} and for studying broad spectral regions of complex systems with high densities of states.^{351}

One of the long-standing problems of TDDFT with currently available XC functionals is the difficulty of accurately predicting valence, Rydberg, and long-range charge-transfer excitations with the same XC functional. Such simultaneous accuracy is important because, first of all, many applications such as spectroscopy and photochemistry involve excited states of different types, and secondly, even when only one type of excitation is of interest, the inaccurate treatment of other types may still cause inaccuracies. For instance, as we will discuss below, many XC functionals underestimate Rydberg excitations, and when higher-energy Rydberg states are lowered, they can mix with the valence states and ruin their accuracy as well. TDDFT with the popular hybrid GGA and hybrid meta-GGA functionals often has useful accuracy for valence excitation energies (with errors of only a few tenths of an eV in favorable cases), and the errors can be comparable to the affordable WFT methods such as the equation-of-motion coupled cluster singles and doubles method^{352,353} (EOM-CCSD) or CASPT2.^{354} However, the errors are larger for Rydberg states, and the energies of long-range charge transfer excitations can be severely underestimated by amounts on the order of electron volts.^{299,335,339,355,356}

Other issues meriting discussion are the treatment of doubly excited states and the treatment of conical intersections. We will discuss the issues of Rydberg states, charge transfer states, conical intersections, and double excited states separately and then return to the issue of simultaneous treatment of different kinds of excitations.

#### 2. Rydberg states

The problem with Rydberg excitations can be understood in terms of orbital energies. It can be seen from Eq. (12) that the excitation energies depend on the orbital energy differences, and the accuracy can depend on how well the reference KS-DFT calculation generates differences between orbital energies. However, local functionals generate inaccurate orbital energies due to the self-interaction error. A consequence of the failure of the XC energy to exactly cancel the spurious Coulomb interaction of an electron with itself is that the derived XC potential that determines the orbitals is everywhere too shallow, so that, for example, in a neutral molecule, the potential felt by an electron far from the rest of the system (which is a cation) does not have the correct asymptotic −1/*r* form where *r* is the distance of the electron to the cation. This leads to too high orbital energies; for example, the energy of the highest occupied orbital is higher than the negative of the ionization potential although it would be equal to the ionization potential if the XC functional were exact. This turns out not to be a severe problem for valence excitations since the TDDFT excitation energies depend on the differences of orbital energies, and the valence orbitals involved in these excitations are overestimated by similar amounts. Rydberg orbitals, however, are overestimated by a smaller amount, leading to too narrow valence–Rydberg orbital energy gaps.^{357} These too narrow gaps are the direct cause of the underestimation of Rydberg excitation energies.

This problem can be directly remedied by modeling the XC potential so that it has the correct behavior.^{358–363} This enables one to produce excellent excitation energy spectra for both valence and Rydberg states, but it often requires system-dependent parameters such as accurate ionization potentials as input, making this approach inconvenient to use and decreasing its predictive value. This approach is also unable to produce total energies, since it is in general impossible to derive an XC energy functional from an XC potential model. A different approach to the Rydberg state problem is the “HOMO depopulation” method of Staroverov and co-workers.^{364–366} They showed that the orbital energies can be improved by removing a fraction of the electrons in the HOMO during the SCF cycles, and thus the Rydberg excitation energies are also improved while maintaining the accuracy in valence excitations. This method can also give excellent results for spectroscopy, but it is not size-extensive and is therefore not suitable for calculating potential energy surfaces. To avoid the inability to produce total energies and the lack of size extensivity, it is desirable to address the problem at the level of XC functionals rather than effective potentials.

The most popular functional-level improvement for this problem comes from the range-separated hybrid scheme.^{367} By making the fraction of HF exchange a function of interelectronic distance, as discussed above in Subsection II C, one can have a low fraction of HF exchange appropriate for ground-state properties and valence states at short interelectronic distances while having a high fraction of HF exchange appropriate for Rydberg states at larger interelectronic distances. In practice the scheme has been adopted to develop useful functionals and successfully applied to many problems, although it can be difficult to choose the parameters that control range separation to achieve broad accuracy for different systems,^{368,369} especially for systems of different sizes. Furthermore, improving the accuracy for Rydberg states usually makes the accuracy worse for valence excitations.

Recently, our group showed that it is also possible to improve the accuracy of Rydberg excitations within the gradient approximation framework. We proposed a scheme called exchange enhancement for large gradient (XELG)^{370} and showed that, by enhancing the local exchange energy in regions where the reduced density gradient is large, we can improve the accuracy of Rydberg excitation energies without sacrificing accuracy for valence excitations and ground-state properties. We also showed that it is the XC potential in the middle-*r* range, rather than in the asymptotic region, that determines the Rydberg excitation energies. Therefore it is possible for a well-designed functional to achieve simultaneous accuracy in both valence and Rydberg excitations even if its XC potential does not have the correct asymptotic behavior. For example, the recently published MN15 functional, with a global *X* value of 44, was shown to have the best overall accuracy for a molecular excitation dataset among a large set of functionals and wave function methods.^{297} Further progress should be possible by taking advantage of the lessons of the XELG study in future XC functional development.

Some comments on orbital energies are in order at this point. Although obtaining a reasonable set of orbital energies is crucial for predicting accurate excitation energies with TDDFT, physical interpretation of orbital energies should be done with caution. The HOMO energy given by the exact XC functional is equal to the negative of the ionization potential, but this is seldom the case for practical XC functionals. The HOMO–LUMO energy gap (the “KS gap”) given by local functionals is an approximation to the lowest (local) excitation energy (the optical gap); this should be contrasted with the HOMO-LUMO gap given by HF, which is an approximation to the ionization potential minus the electron affinity (the fundamental gap).^{371} The HOMO-LUMO gap of a hybrid functional is often between those two cases and does not have well-defined physical interpretation. In general it is better to focus attention on physical observables than on theoretical constructs like orbital energies.

One consideration to keep in mind is that, independent of the choice XC functional, diffuse basis functions are needed to give a qualitatively correct description of Rydberg excitations.^{372} The error caused by not using diffuse enough basis functions can be several eV even with a good XC functional.

#### 3. Charge transfer

The problem with charge transfer excitations has a different origin. In the limit that the charge transfer distance goes to infinity, the charge transfer process is equivalent to removing an electron from the donor and adding an electron to the acceptor. At a finite (but large) distance *R*, the excitation energy is thus equal to the sum of the ionization potential of the donor, the electron affinity of the acceptor, and the Coulomb interaction between the two charged moieties. Accordingly the excitation energy as a function of the charge transfer distance, *R*, should change as *I* + *A* − 1/*R* when *R* is large, where *I* is the ionization potential, and *A* is the electron affinity.^{373} However, the excitation energy given by TDDFT with local functionals often approaches a constant smaller than *I* + *A* as *R* increases. The reason can be seen from Eqs. (12) and (13). When the donor-acceptor separation is large, the overlap between the occupied orbitals on the donor and virtual orbitals on the acceptor is negligible, and the integrals in Eq. (13) are zero if *f*_{XC} is local. Therefore the excitation energy collapses to the orbital energy difference, which underestimates the charge transfer excitation energy.^{371} In some cases, B3LYP, with only 20% Hartree-Fock exchange, fails to predict any charge transfer bands with nonvanishing oscillator strength.^{374} Better results for intermediate-range charge transfer can be obtained both with range-separated-hybrid GGAs and with global-hybrid meta-GGAs.^{374–376}

If the spatial overlap of the initial and final orbitals of the charge transfer excitation approaches zero, only very high Hartree-Fock exchange, approaching 100%, can provide useful accuracy. Peach *et al.*^{377} presented a diagnostic called Λ that provides guidance on whether a charge transfer excitation requires very high nonlocal exchange. This diagnostic is a measure of the spatial overlap of the initial and final orbitals of the excitation (spatial overlap is an intrinsically semipositive quantity, not to be confused with signed orbital overlap). This has been found to be very useful in practice.^{378}

A different viewpoint on this problem was provided by Ziegler and co-workers.^{347,379–384} They showed that the problem can also be ascribed to linear response and can be handled by including higher-order terms, using the constricted variational DFT that they developed. A practical disadvantage of this theory, as compared to LR-TDDFT, is that one must solve separately for each excited state.

#### 4. Conical intersections

Another problem of TDDFT is the incorrect dimensionality of conical intersections between the ground (reference) state and an excited (response) state in the TDDFT formulation. In reality, conical intersection seams appear in *F*–2 dimensions, where *F* is the number of internal degrees of freedom of a molecule.^{385–387} With TDDFT, however, conical intersections appear in *F*–1 dimensions,^{388} and thus the coupled ground- and excited-state potential energy surfaces are qualitatively incorrect. To solve the problem, we proposed, within the TDA, a scheme to recover the missing coupling between the ground and any excited state. The scheme, called configuration-interaction-corrected Tamm-Dancoff approximation (CIC-TDA),^{389} gives the correct dimensionality of conical intersections while giving energies similar to TDDFT-TDA away from conical intersections. The dressed TDDFT^{390–393} and spin-flip TDDFT^{394–397} methods also have the correct dimensionality due to the inclusion of doubly excited configurations, as discussed below.

#### 5. Double excitations

In LR-TDDFT, the formally correct description of doubly excited states requires a frequency-dependent XC kernel.^{398} Burke, Maitra, and co-workers proposed a formulation called dressed TDDFT that includes explicit frequency dependence in the XC kernel, and it has subsequently been elaborated and tested,^{390–393} and its detailed implementation has been explored.^{399} It can qualitatively account for double excitations, but at the current stage of development its overall accuracy is not as good as adiabatic TDDFT with the best functionals. The problem can also be remedied in a different way by spin-flip TDDFT.^{394–397} Spin-flip TDDFT uses a triplet state as the reference state to which single excitations with spin flip are applied, resulting in both singly excited and doubly excited configurations relative to a closed-shell ground state. It can produce accurate excitation energies when used with the noncollinear kernel and some functionals, but there can be problems from spin contamination and subtleties in practical details.^{346,400}

It is important to note that the so called doubly excited states are not as straightforward as is sometimes assumed. In many cases labeled as double excitations, configuration interaction wave functions for both ground and excited states have significant multireference character, and double excitation states are better described as transitions from an MR state (the ground state itself has significant double excitation character) to another MR state that has a different mixture of two (or more) dominant CSFs.^{400,401} This complicates the analysis considerably.

#### 6. Broad applicability for molecular excitations

To treat photochemistry and other excited-state dynamical processes, one wants a functional that gives good potential energy surfaces (especially bond energies and barrier heights) while simultaneously providing accurate results for all three types of excitation energies—valence, Rydberg, and charge transfer. That is still an unmet goal for KS-DFT because long-range charge transfer excitations require 100% Hartree-Fock exchange,^{299} but other electronically excited states usually have MR character, and 100% Hartree-Fock exchange is not good for MR systems with currently available functionals. But if one omits long-range charge transfer excitations, some XC functionals show very useful accuracy. This is illustrated in Table II, which shows mean unsigned errors for a database^{344,345} of 30 valence states and 39 Rydberg states of 11 organic molecules, in particular, ethylene, isobutene, *trans*-1,3-butadiene, formaldehyde, acetaldehyde, acetone, pyridine, pyrazine, pyrimidine, pyridazine, and *s*-tetrazine. The table shows the results for the XC functionals with a mean unsigned error less than 0.35 eV, plus—for comparison—results for five popular functionals and for EOM-CCSD, which is a more expensive WFT method. We see that, except for xePBE0, the best global hybrid functionals have *X* between 41 and 54. Lowering *X* to 20–25 increases the accuracy for valence states, but makes the accuracy much worse for Rydberg states. Range-separated hybrid functionals [LC-BOP (*μ* = 0.33),^{237} LC-BLYP (*μ* = 0.33),^{237} ωB97X-D,^{244} CAM-B3LYP,^{239} ωB97X^{243}], with an appropriate control of HF exchange in different ranges of interelectronic distance, are capable of achieving good accuracy for both valence and Rydberg states. The xePBE0 functional, on the other hand, achieves simultaneous accuracy for the two types of excitations by exploiting local exchange as well as HF exchange. While it has similar accuracy to PBE0 with 25% HF exchange for valence states, its corrected local exchange functional improves the shape of the exchange potential in the range relevant to Rydberg excitations and thus it also gives good accuracy for Rydberg states.

. | . | Mean unsigned error (eV) . | ||
---|---|---|---|---|

. | X^{b}
. | 30 valence . | 39 Rydberg . | 69 total . |

Best performing | ||||

MN15 | 44 | 0.29 | 0.24 | 0.26 |

LC-BOP (μ = 0.33)^{c} | 0-100 | 0.31 | 0.22 | 0.26 |

LC-BLYP (μ = 0.33)^{c} | 0-100 | 0.31 | 0.24 | 0.27 |

M06-2X | 54 | 0.36 | 0.26 | 0.30 |

ωB97X-D | 22.2-100 | 0.32 | 0.28 | 0.30 |

xePBE0^{d} | 25 | 0.25 | 0.36 | 0.31 |

MPWKCIS1K | 41 | 0.40 | 0.27 | 0.32 |

PWB6K | 46 | 0.43 | 0.24 | 0.32 |

CAM-B3LYP | 19–65 | 0.31 | 0.35 | 0.33 |

MPW1K | 42.8 | 0.45 | 0.23 | 0.33 |

MPWB1K | 44 | 0.40 | 0.28 | 0.33 |

ωB97X | 13.77-100 | 0.40 | 0.28 | 0.33 |

For comparison—Other popular XC functionals | ||||

PBE0^{d} | 25 | 0.22 | 0.80 | 0.55 |

B3LYP^{d} | 20 | 0.20 | 1.03 | 0.67 |

LSDA | 0 | 0.45 | 1.20 | 0.88 |

PBE^{d} | 0 | 0.40 | 1.70 | 1.13 |

BLYP^{d} | 0 | 0.40 | 1.88 | 1.23 |

For comparison—WFT | ||||

EOM-CCSD | 100 | 0.47 | 0.11 | 0.27 |

. | . | Mean unsigned error (eV) . | ||
---|---|---|---|---|

. | X^{b}
. | 30 valence . | 39 Rydberg . | 69 total . |

Best performing | ||||

MN15 | 44 | 0.29 | 0.24 | 0.26 |

LC-BOP (μ = 0.33)^{c} | 0-100 | 0.31 | 0.22 | 0.26 |

LC-BLYP (μ = 0.33)^{c} | 0-100 | 0.31 | 0.24 | 0.27 |

M06-2X | 54 | 0.36 | 0.26 | 0.30 |

ωB97X-D | 22.2-100 | 0.32 | 0.28 | 0.30 |

xePBE0^{d} | 25 | 0.25 | 0.36 | 0.31 |

MPWKCIS1K | 41 | 0.40 | 0.27 | 0.32 |

PWB6K | 46 | 0.43 | 0.24 | 0.32 |

CAM-B3LYP | 19–65 | 0.31 | 0.35 | 0.33 |

MPW1K | 42.8 | 0.45 | 0.23 | 0.33 |

MPWB1K | 44 | 0.40 | 0.28 | 0.33 |

ωB97X | 13.77-100 | 0.40 | 0.28 | 0.33 |

For comparison—Other popular XC functionals | ||||

PBE0^{d} | 25 | 0.22 | 0.80 | 0.55 |

B3LYP^{d} | 20 | 0.20 | 1.03 | 0.67 |

LSDA | 0 | 0.45 | 1.20 | 0.88 |

PBE^{d} | 0 | 0.40 | 1.70 | 1.13 |

BLYP^{d} | 0 | 0.40 | 1.88 | 1.23 |

For comparison—WFT | ||||

EOM-CCSD | 100 | 0.47 | 0.11 | 0.27 |

^{a}

The basis set used for this table is 6-31(2+,2+)G(d,p) as in Ref. 344. The beyond-LSDA density functionals are from Refs. 187, 190, 193, 204, 213, 214, 225, 237, 289, 297, 299, 300, 370, 403, 426, and 427.

^{b}

*X* is the percentage of Hartree-Fock exchange. When a range is shown, the first value applies for small interelectronic excitation, and the second value applies for large interelectronic separation.

^{c}

*μ* is the parameter in the LC scheme that controls range separation in these range-separated-hybrid GGAs. These results are from.

## III. CONCLUDING REMARKS

We have reviewed recent advances in density functional theory, from our personal perspective rather than in an exhaustive way. There are many related topics that we could have covered, but did not. As just a few examples, we mention density matrix functional theory,^{402,403} solvatochromic shifts,^{404–408} efficient ways to treat core orbitals,^{409–413} and many-body WFT methods that use the KS-DFT reference determinant as a reference wave function.^{148–151,183,261–264,414–417} We also de-emphasized applications and validations for specific systems, for which other reviews are available.^{34,91,418–424}

A unifying theme of what we have presented is the quest for a universally applicable method that can handle both multireference and single-reference systems as well as being universal in other respects—applicable to both ground and excited states, including Rydberg states, to stable molecules and transition states, to covalent, ionic, charge transfer, and dispersion-dominated interactions, and so forth. We presented a summary of the current status of making XC functionals more universal. KS-DFT is quite mature—being 42 years old. The current emphasis is on obtaining improved exchange-correlation functionals. Progress is very exciting, especially the broad accuracy of the new MN15-L and MN15 functionals.

It is often said that there is no systematic path to improving density functional theory. This may be true, but nevertheless density functional theory continues to be improved at an exciting pace.

## Acknowledgments

We gratefully acknowledge the contributions of Lucas Bao, Kaining Duanmu, Xiao He, Miho Isegawa, Andy Luo, Ben Lynch, Roberto Peverati, Nate Schultz, Pragya Verma, Xuefei Xu, Wenjing Zhang, and Yan Zhao for contributions to our work on Kohn-Sham density functional theory development, Carlo Adamo, John Perdew, Giovanni Scalmani, and Jianwei Sun for helpful discussions, and Laura Gagliardi and many additional coworkers for collaboration on multiconfiguration pair-density functional theory. Work on designing functionals with higher accuracy for transition metal catalysis was supported in part as part of the Inorganometallic Catalysis Design Center, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award No. DE-SC0012702, and work on excited-state chemistry was supported in part by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award No. DE-SC0015997.

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the U.S. Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the U.S. Government or any agency thereof.