For an electronic system, given a mean field method and a distribution of orbital occupation numbers that are close to the natural occupations of the correlated system, we provide formal evidence and computational support to the hypothesis that the entropy (or more precisely −σS, where σ is a parameter and S is the entropy) of such a distribution is a good approximation to the correlation energy. Underpinning the formal evidence are mild assumptions: the correlation energy is strictly a functional of the occupation numbers, and the occupation numbers derive from an invertible distribution. Computational support centers around employing different mean field methods and occupation number distributions (Fermi–Dirac, Gaussian, and linear), for which our claims are verified for a series of pilot calculations involving bond breaking and chemical reactions. This work establishes a formal footing for those methods employing entropy as a measure of electronic correlation energy (e.g., i-DMFT [Wang and Baerends, Phys. Rev. Lett. 128, 013001 (2022)] and TAO-DFT [J.-D. Chai, J. Chem. Phys. 136, 154104 (2012)]) and sets the stage for the widespread use of entropy functionals for approximating the (static) electronic correlation.
Accounting for electronic correlation in electronic structure methods has been an active topic of research for decades. Particularly difficult is accounting for the strong, static correlation that arises when stretching and breaking covalent bonds in molecules and materials. Multireference methods,1–3 reduced density matrix functional theory,4–9 and methods based on range separation10–12 have been perhaps the most active lines of work to account for static correlation. Methods that account for electronic correlation are typically built on top of a reference mean field method. In quantum chemistry, the Hartree–Fock (HF) method is generally adopted employing a single Slater determinant as the reference wavefunction, which features aufbau occupations of the HF orbitals. However, the eigenvalues of the one-electron reduced density matrix (1-rdm) of the correlated system generally do not follow aufbau occupations. In addition, the natural orbitals (the eigenfunctions of the 1-rdm) are close to the HF orbitals when the occupation numbers are large but deviate significantly for small occupations in a system-dependent way.5,13
We remark that the definition of correlation energy in Eq. (1) differs from the definition of correlation energy found in quantum chemistry textbooks (see, for example, Szabo and Ostlund14). This is because the classical (Hartree) Coulomb and exchange energies (and correlation energy in the case of nDFT) evaluated with a non-aufbau 1-rdm and an aufbau 1-rdm differ. Such a discrepancy motivated Hammond and Mazziotti15 to introduce the concept of “connected energy” and Wang and Baerends9 to name the same energy difference “cumulant energy.” Although we do not dispute their choice, we will not adopt the same nomenclature in this work.
Several authors9,16,17 showed that the Fermi–Dirac (FD) distribution applied to nMF methods can generate occupations that are in good agreement with natural occupations from correlated wavefunction methods for such processes as bond breaking9,16 and systems displaying a diradical character.18 In addition, computational evidence shows that the Shannon entropy of the occupations derived from the FD distribution is a good approximation to the correlation energy when it is added on top of the energy of an nMF of choice. Examples are available for nHF9 and semilocal and hybrid DFT methods16,19 (nDFT).
A lingering question arises: why should the entropy be a suitable approximation to the correlation energy? In their development of the i-DMFT method, Wang et al.9,20 sought entropy-like expressions as approximations for the correlation energy (which, as mentioned before, they call cumulant energy), drawing inspiration from the studies of Collins.21 However, they do not offer a formal justification for this approach. Chai16,22 attempted to justify the use of entropy functionals, but the argument provided16 is flawed as it corrects the noninteracting kinetic energy instead of the exchange–correlation functional. TAO-DFT has recently been formulated on formal grounds;23 however, the reliance on the FD entropy, as for the i-DMFT method, has remained ad hoc. In this context, we also note alternative approaches to extend mean field methods to incorporate static correlations from non-idempotent 1-rdm’s, such as the ones from the Mazziotti group,24 which, in limiting cases, relate to the i-DMFT and TAO-DFT methods.25
Below, we provide a formal justification for using an entropy-like functional as an approximation for the correlation energy of electronic systems. We start from the following initial conditions:
We then impose the following mild approximations:The occupation numbers, {ni}, derive from an invertible distribution ρσ having width σ that relates each ni to the difference between the corresponding orbital energy, ɛi, and the Fermi energy, μ. See Eq. (3).
We note that this is a standard treatment of finite temperature systems. However, it is an approximation in the context of electron correlation.
- The correlation energy is exclusively a functional of the nMF occupation numbers,This is perhaps the most severe of the approximations introduced in this work, and it is shared with the i-DMFT work of Wang and Baerends.9(6)
We wish to prove the following theorem.
Let the electronic energy be given as the sum of two terms. The first is the energy of a non-aufbau mean field method, EnMF[{ni}, {ϕi}], employing an invertible occupation number distribution, ρσ. The second is a pure functional of the occupation numbers, Ec[{ni}]. We establish that Ec = −σSρ, where σ represents the width of the distribution and Sρ denotes the entropy associated with the distribution ρσ.
where is the inverse function of ρσ(ɛi − μ). Such an inverse typically exists for physically behaved distributions. For example, it exists for FD, G, and L distributions employed here and listed in Table I as long as σ > 0.
Occupation number distributions, ρσ(ɛi − μ), and the corresponding entropy density expressions for the Fermi–Dirac (FD), Gaussian (G), and linear (L) distributions. Here, the occupation numbers satisfy 0 ≤ ni ≤ 1.
X . | ρσ(ɛi − μ) . | Entropy density (si) . |
---|---|---|
FD | −ni ln ni − (1 − ni) ln(1 − ni) | |
G | ||
L |
X . | ρσ(ɛi − μ) . | Entropy density (si) . |
---|---|---|
FD | −ni ln ni − (1 − ni) ln(1 − ni) | |
G | ||
L |
To evaluate the integral above, the lower limit of integration must be chosen such that the occupations correspond to a state of null electron correlation, i.e., aufbau occupations. Meanwhile, the upper limit is determined by the occupations obtained from the distribution ρσ or {ni}.
Equation (11) yields the entropy energy contribution for the distribution ρσ, which, it follows, is also equal to the correlation energy functional concluding the proof of this theorem.
The use of the FD distribution and its associated entropy has proven to yield accurate energy curves along the bond-breaking coordinate for numerous molecules. This approach has been employed with both mainstream non-aufbau DFT methods16,19 and nHF9 as nMF methods. Chai and co-workers showed that this method predicts improved reaction barrier heights, thermochemistry,22 and other properties for various classes of molecules and materials.
A reasonable approach would involve exploring the collection of distributions such that 0 ≤ ni ≤ 1 and ∑ini = N to identify the one that most accurately represents the correlation energy across a wide range of processes and molecular systems. The computational demands of such an undertaking are significant. Thus, we defer this task to future work. In Table I, we present the expressions for three commonly used distributions and their respective entropy density expressions, denoted by ec(ni) = −σsρ(ni); see Eq. (11).
We proceed to compare the three distributions, examining whether they yield significantly different occupation numbers and correlation energies. Figure 1 displays the three distributions alongside their corresponding correlation energy densities. In order to facilitate a clear comparison, we select σ values for each distribution that yield occupation numbers most similar to the FD distribution: σG = 2.4σFD and σL = 4.1σFD. The resulting occupation numbers and correlation energies exhibit a high degree of similarity. Nevertheless, Fig. 1 clearly reveals slight disparities in both the occupation numbers (∼4% for linear and 2% for Gaussian) and entropy densities (around 4 mHa for linear and 3 mHa for Gaussian). Considering that σFD = 1 Ha, we can foresee that these differences, although minor in comparison with the errors caused by MF methods in regimes characterized by a strong static correlation,10,26 will have a noticeable, however, mild impact on the computed correlation energies.
Occupation numbers (top) and the corresponding correlation energy densities [bottom, ec(ni) in Eq. (11)] for the G and L distributions (see Table I) referenced to the FD result. We use widths that minimize the difference between the occupations produced by the three distributions (σG = 2.4σFD and σL = 4.1σFD).
Occupation numbers (top) and the corresponding correlation energy densities [bottom, ec(ni) in Eq. (11)] for the G and L distributions (see Table I) referenced to the FD result. We use widths that minimize the difference between the occupations produced by the three distributions (σG = 2.4σFD and σL = 4.1σFD).
The calculations presented in this work are performed using PySCF.27 In order to ensure full reproducibility of our results, we have made the Jupyter notebooks available on Zenodo.28 These notebooks can be utilized to reproduce all calculations and figures described in this Communication. The list of benchmark methods employed, along with the corresponding basis set, nMF method used, and σ values, is given in Table II. For the Diels–Alder reaction, B3LYP/6-31G* was employed to predict the reaction coordinate via nudged elastic band calculations, and the transition structure was obtained from Ref. 29 (also accessible on Zenodo using the aforementioned link).
Correlated wavefunction methods, basis sets, and the corresponding σFD in Ha used in this work. CAS(n, m) stands for CASSCF with an active space of n electrons and m holes.
System . | Basis set . | Benchmark . | nMF . | σFD . |
---|---|---|---|---|
H2 | cc-pVDZ | Full-CI | nHF | 0.095 |
nPBE | 0.029 | |||
nPBE0 | 0.047 | |||
Ethane | cc-pVDZ | CAS(6,6) | nHF | 0.093 |
Benzene | 6-31G* | CAS(6,6) | nHF | 0.096 |
Diels–Alder | 6-31G* | CAS(6,6) | nHF | 0.070 |
Retinal | cc-pVDZ | CAS(15,22)+NEVPT2 | nHF | 0.065 |
System . | Basis set . | Benchmark . | nMF . | σFD . |
---|---|---|---|---|
H2 | cc-pVDZ | Full-CI | nHF | 0.095 |
nPBE | 0.029 | |||
nPBE0 | 0.047 | |||
Ethane | cc-pVDZ | CAS(6,6) | nHF | 0.093 |
Benzene | 6-31G* | CAS(6,6) | nHF | 0.096 |
Diels–Alder | 6-31G* | CAS(6,6) | nHF | 0.070 |
Retinal | cc-pVDZ | CAS(15,22)+NEVPT2 | nHF | 0.065 |
While Theorem 1 provides no link between the nMF occupation numbers and the natural occupation numbers, we have determined that it is advantageous to select a σ that allows for close matching of the natural occupations and the MF occupations. We noted that the distribution width ratios determined for the data presented in Fig. 1 are generally optimal. Thus, it is enough to find an optimal σFD to then infer on the optimal σL or σG. We tested the single bond stretching process for H2, benzene (C–H bond), and ethane (C–C bond). Figure 2 shows the bond energy curves. The optimal σFD used in this work is reported in Table II. While we detect some (expected) differences in the results from the three distributions, they all substantially improve the nHF energies. We can conclude that the correlation energy functional is able to capture the static correlation in the bond breaking of molecules.
Bond energy curves referenced to the equilibrium energy for the distributions listed in Table I for the following bond breaking processes: (a) molecular hydrogen (H2); (b) benzene, breaking one of the C–H bonds; and (c) ethane (CH3–CH3), breaking the C–C bond.
Bond energy curves referenced to the equilibrium energy for the distributions listed in Table I for the following bond breaking processes: (a) molecular hydrogen (H2); (b) benzene, breaking one of the C–H bonds; and (c) ethane (CH3–CH3), breaking the C–C bond.
In their work, Wang and Baerends9 employed the i-DMFT correlation energy functional given by Ec = −σS + b, where b represents a system-dependent shift added to the nHF energy. We observe that the total electronic energies of the systems investigated in this study exhibit a shift compared to the reference energies. Therefore, incorporating an energy shift, b, on a case-by-case basis, could enhance the quality of the results. However, we have been unable to identify a theorem similar to our Theorem 1 that could provide an explanation or justification for employing such an energy shift. Speculatively, based on the ongoing debate and understanding of the i-DMFT method, relaxing approximation (a2), i.e., introducing a functional dependence of Ec on the nMF orbitals as well, Ec[{ni}, {ϕi}], may address this issue. It is evident that the true correlation energy must encompass dynamic correlation effects as well, which are unlikely to be captured by the entropy functional alone. In addition, in a comment on the i-DMFT paper,30 Ding et al. showed that the 1-rdm predicted by i-DMFT does not quantitatively agree with those obtained from accurate wavefunction methods (such as MCSCF).
These considerations suggest two potential avenues for improvement: (1) exploring alternative nMF methods that offer more favorable error cancelation, as nHF may not be the optimal or “best” choice of the nMF method; and (2) acknowledging that the correlation energy should also exhibit a dependence on the nMF orbitals. While we will analyze point (2) (orbital dependence) later, to address point (1), we note that Theorem 1 is not specific to nHF. It can be applied to any nMF method provided that the nMF energy lies above the energy of the electronic ground state. This realization opens an avenue for improving the accuracy of the total energies as well as the electron densities by searching for the “best” nMF methods. In fact, the TAO-DFT method16,19,22 effectively utilizes Eq. (11) in conjunction with DFT methods. Chai showed22 that changing the amount of HF exchange in the DFT functional in TAO-DFT can improve the description of reaction barrier heights. Problematic is the fact that often approximate xc functionals lead to electronic energies below the variational energy minimum.
In Fig. 3, we compare the nMF dependence of the energy functional (EnMF + Ec, where Ec is the one from Eq. (11) evaluated with an nMF-specific σ) and the electron density against full-CI (FCI) for H2. This figure shows that the 1-rdm’s obtained with nDFT methods, such as nPBE0, are closer to FCI than nHF; the nHF 1-rdm error is slashed by 50%. The total electronic energy is also closer to FCI, reducing the shift (which is captured by the −b constant in i-DMFT) from 27 mHa for nHF to barely 3 mHa for nPBE0. A contrast is HF (aufbau occupations), which provides the most deviated densities and energies. These results provide us with barely an indication for improving the i-DMFT method, ridding it of the −b constant in compliance with Theorem 1. We should expect the performance of nPBE0 to be system dependent but generally to outperform nHF for bond dissociations. Thus, further inspection in the dependence on the reference nMF method is needed and will be tackled in a future study.
1-rdm (left) and total electronic energy (right) differences between the nMF methods listed and the full-CI (FCI) along the dissociation path of H2. For the 1-rdm, we plot the following metric: .
1-rdm (left) and total electronic energy (right) differences between the nMF methods listed and the full-CI (FCI) along the dissociation path of H2. For the 1-rdm, we plot the following metric: .
So far, we have only considered bond dissociation curves. More interesting are barrier heights of chemical reactions. In Fig. 4, we show the energy profile for the prototypical Diels–Alder reaction between 1,3-butadiene and ethene computed by HF (EHF), nHF (EnHF), and EnHF + Ec. The reaction coordinate has been determined by nudged elastic band calculations carried out with B3LYP. The results are quite convincing in that the barrier height is overestimated by HF, underestimated by B3LYP, and well reproduced by EnHF + Ec. Specifically, the overestimation by HF is about 20 kcal/mol and the underestimation by B3LYP is of similar size. This result shows that the good performance of EnHF + Ec is not solely relegated to bond stretching, but it ventures into the prediction of barrier heights and, potentially, many other processes where static correlation is important.
Energy (left) and occupations (right) along a reaction coordinate for the Diels–Alder reaction between butadiene and ethene. We used σFD = 0.07 Ha.
Energy (left) and occupations (right) along a reaction coordinate for the Diels–Alder reaction between butadiene and ethene. We used σFD = 0.07 Ha.
An example of such processes is the cis–trans isomerization of alkenes, where the reaction coordinates is a dihedral angle that twists a double bond. In doing so, the double bond is strained and eventually broken when the dihedral angle reaches 90°. A particularly important class of molecules where cis–trans isomerization is important is retinal derivatives. In Fig. 5, we showcase the 9-trans to 9-cis isomerization of all-trans retinal. B3LYP, HF, nHF, and nHF+Ec energies are compared against reference NEVPT2 [using a large CAS(15,22)] along the rotation of angle α for unrelaxed geometries.
Energy of retinal along the all-trans to 9-cis and back to all-trans dihedral angle coordinate, α, for unrelaxed geometries. We used σFD = 0.065 Ha.
Energy of retinal along the all-trans to 9-cis and back to all-trans dihedral angle coordinate, α, for unrelaxed geometries. We used σFD = 0.065 Ha.
The trans–cis isomerization reaction barrier in Fig. 5 delivers a picture consistent with the behavior for Diels–Alder reactions as well as for single-bond dissociation. In addition, we see that appropriate σFD depends on the particular bond being stretched with a value nearing 0.09 Ha for single σ bonds and 0.06–0.07 Ha for single π bonds.
In conclusion, in this work, we formalize and provide computational support to the hypothesis that the entropy is a good approximation to the correlation energy. We show that if the total energy functional is approximated by the sum of the energy of a non-aufbau mean field method, EnMF, and a correlation energy functional purely dependent on the occupation numbers, Ec[{ni}], the correlation energy is given by −σSρ, where Sρ is the entropy associated with the occupation number distribution employed. This formal result validates the i-DMFT method9 (which uses non-aufbau HF as the nMF) and TAO-DFT16 (which uses approximate DFT methods as the nMF). The computational support of this result shows that the excellent performance of i-DMFT (which uses the Fermi–Dirac distribution) can be matched with other reasonable occupation number distributions, such as Gaussian and linear. The entropy used in the evaluation of the correlation energy is the one associated with the chosen distribution, showing that there is nothing special about the Shannon entropy, and entropy expressions derived from other distributions (such as Gaussian and linear) are just as good in approximating the correlation energy as the Shannon entropy. The computational work includes considering several bond dissociation systems as well as the barrier height of a Diels–Alder reaction where the static correlation is known to negatively affect the results of mean field methods. In addition to our results, we can also leverage the good results from the i-DMFT and TAO-DFT methods to provide further computational evidence that such a strategy is effective.
A question, however, lingers: why should the occupation number distributions considered here (FD, G, and L) reproduce so well the natural occupations along bond breaking, double bond twisting, and certain chemical reactions? We have not provided an answer to this question, which, to the best of our knowledge, remains open.
Despite the satisfactory results, it is imperative that the method be improved in future work. This can be achieved in several ways. As hinted from Fig. 3, potential research avenues include choosing judiciously the non-aufbau mean field method to be paired with Ec. For H2, we notice a clear improvement for the 1-rdm when using nPBE0 compared to nHF. In addition to considering improvements from the mean field method, concurrent improvements to the entropy functional itself, Ec = −σSρ, should be formulated. Wang and Baerends proposed making σ a density (or 1-rdm) functional.31 While this approach is, indeed, feasible (potentially by relating σ to the energy gap19), we also recognize that the entropy functional itself will likely need to be modified when incorporating orbital dependence.
To clarify this aspect, an approach can be derived from the method of spectral decomposition of correlation energy by Savin and co-workers.32,33 They formally showed that Ec = ∑iɛiDi, where , with the integration path λ following the Hamiltonian , in which represents a mean field N-electron Hamiltonian and denotes the Coulomb molecular Hamiltonian. The correlation energy is defined for a given value of λ as (where ), and the diagonals of the natural occupations expressed in the mean field basis are denoted by . It is evident that λ and are interrelated and can be used interchangeably as the primary integration variable, yielding , where formally . Notably, by realizing that , we observe that the entropy functional can be retrieved by approximating . In other words, to recover the entropy as Ec, the occupations along the connection path λ follow a distribution function with the scaled width, λσ. However, in reality, the width scaling may not be simply linear and may, instead, be an orbital functional. Thus, as one might envision, when including orbital dependence in the definition of the correlation energy, the actual correlation energy functional bears some resemblance to the entropy, but, in general, it will differ. This analysis may guide future developments of correlation energy functionals.
This material is based upon the work supported by the U.S. National Science Foundation under Grant Nos. CHE-2154760, OAC-2117429, and OAC-1931473. M.P. acknowledges discussions with Andreas Savin.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jessica A. Martinez B: Data curation (equal); Investigation (equal); Software (equal). Xuecheng Shao: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Software (equal); Supervision (equal). Kaili Jiang: Investigation (equal); Software (equal). Michele Pavanello: Conceptualization (equal); Supervision (equal); Writing – original draft (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.8240064, Ref. 28.