A step toward density benchmarking – the energy-relevant “mean field error”

Since the development of generalized gradient approximations in the 1990s, approximations based on density functional theory have dominated electronic structure theory calculations. Modern approximations can yield energy differences that are precise enough to be predictive in many instances, as validated by large-and small-scale benchmarking efforts. However, assessing the quality of densities has been the subject of far less attention, in part because reliable error measures are difficult to define. To this end, this work introduces the mean-field error that directly assesses the quality of densities from approximations. The mean-field error is contextualised within existing frameworks of density functional error analysis and understanding, and shown to be part of the density-driven error. It is demonstrated on several illustrative examples. Its potential use in future benchmarking protocols is discussed, and some conclusions drawn.


I. INTRODUCTION
Modern density functional theory (DFT) was introduced in the mid 1960s through the pioneering work of Hohenberg, Kohn and Sham. 1,2Since the 1990s it has come to dominate computational electronic structure theory, with tens of thousands of works published each year.Calculations based on DFT are routinely used to understand chemical and solid state problems.Increasingly, they are used to predict the properties of novel molecules and materials.
The reason for DFTs dominance can mostly be explained by two important elements of the approach: 1) approximations based on the one-body density (density functional approximations) can perform remarkably well as a tool for prediction and analysis of quantum chemical problems; 2) the use of one-body densities as the key variable of interest dramatically reduces computational demands compared to many-body wave functions, and makes the study of large systems numerically tractable.With these elements in mind, it is time to move on to the theoretical basis of DFT and its approximations.
Consider a molecular or solid state system defined by a nuclear potential, v sys .The Hohenberg-Kohn and Kohn-Sham theories 1,2 dictate that its ground state energy may be written as a functional, of its optimal electron (one-body) density, n sys .Eq. ( 1) somewhat unconventionally divides the usual DFT energy into a mean-field part, All the ingredients of F MF are known in an orbital formalism, so may be computed exactly.The remarkable practical success of DFT comes from the fact that E xc [n]  may be replaced by a density functional approximation (DFA), E DFA xc [n], yet still yield useful energetic predictions.The ground state energy from the DFA is, where all terms are easily computable.The density, n DFA sys , is found by self-consistently solving, Given the importance of energies in physical systems, typically the quality of any DFA is assessed on how well (or not) it reproduces energy properties (usually energy differences).The magnitude of energy errors, has therefore been the subject of significant scrutiny through large (e.g.Refs 3-6 and references therein) and boutique (e.g.Refs 7-9) benchmarking efforts.Thirty years of work on generalized gradient approximations (GGAs) and higher rungs of Jacob's ladder 10 have led to DFAs that can (semi-) 11 reliably predict reaction energies to within 10 kcal/mol. 4,5In many cases, better predictions have been enabled by heavily empirical optimization strategies that eschew physical constraints for lower errors on training sets.However, comparing Eqs (3) and (1) reveals that the DFA involves two approximations: one for the energy, E xc → E DFA xc , and one for the density, n sys → n DFA sys .One might expect that as DFAs improve in quality, and as they climb Jacob's ladder, 10 that their energies and densities should both get better as they navigate the path toward exactness.However, in ground breaking work, Medvedev et al 12 reported that the empricial strategy seems to have led to better energies at the expense of worse densities -leading DFAs to "stray from the path" to exactness.
This apparent decoupling of the quality of energies and densities has consequently led to increasing interest (e.g.Refs 13-20) in the quality of DFA densities.That is, on the magnitude of errors, Err[n DFA sys − n sys ], according to some quality measure.
Despite increasing interest, the scope of efforts to benchmark densities is dwarfed by traditional analyses based on energies (the author has seen fewer papers on density benchmarking than there are unique benchmarking subsets in Ref. 4 or 5).A major outstanding problem hampering better understand of density errors is the lack of reliable (i.e.useful and meaningful expressions) measures for Err[n DFA sys − n sys ].This work seeks to address this problem by motivating and introducing the "mean-field error" that assesses densities according to their DFA-independent effect on energies.
The rest of the manuscript proceeds as follows.First, the challenge of assessing densities is discussed, and the mean-field error motivated.Then, methodology is introduced and used to investigate mean-field (and other) measures of density errors on same exemplar systems.Next, some the future of density benchmarking is discussed, wherein results from this work are used to highlight some promissing leads and highlight challenges that most be overcome for routine benchmarking of densities.Finally, some conclusions are drawn.

II. ASSESSING ERRORS IN DFAS
Assessing energies of DFAs is straightforward, albeit not uncontentious. 11,21One typically decides on a relevant set (typically between 20 and a couple of thousand examples ) of energy differences -such as a dissociation curve, a reaction energy, the interaction between dimers or the atomisation energy of a solid -and then evaluates how accurately one or more DFAs reproduce them on average.By choosing energy differences one is able to separate systematic errors in energies (which cancel out in differences) from non-systematic errors, which are relevant for practical computational studies.
The simplest example of an energy difference metric is, Here, is an energy difference in the true system, and, is its DFA counterpart, where A indicates a subsystem of sys, whether that be constituent atoms are other chemically relevant divisions.A more general expression is where w sys R is the weight (generally a positive or negative integer) assigned to some reactant, sys R .
By contrast, densities are non-local quantities and there are a multitude of potential metrics and measures (here meaning useful expressions that do not meet the requirements of a metric) for assessing them.Some examples include: • The mean absolute density metric, that has units of inverse volume; • The root mean square density metric, that also has units of inverse volume; • The dipole difference measure, that has units of length; • The Hartree metric, that has units of energy.
These definitions may also be extended to differences, e.g., the Hartree error equivalent of eq. ( 6) is, which similarly avoids systematic errors; while, is the more general equivalent.

A. Using energy decomposition to assess densities
Kim et al 22 proposed a rather more practical way of assessing densities, that has important consequences for functional development.They highlighted that the DFA error, E −E DFA may be decomposed into two more useful sources of error -which they called functional-driven, ∆E F and density-driven, ∆E D , errors.These are defined to take the form, where the functional-driven error reflects the the use of a DFA at the exact density; and the density-driven error reflects the use of the wrong density within the DFA.Eqs. ( 1) and (3) reveal that, is the difference between the exact and approximate xc energies evaluated at the exact density.∆E DFA D,sys is undeniably a useful practical measure of the quality of densities -especially once Hartree-Fock densities are used as a substitute (n sys :≈ n HF sys ) for exact ones.However, its value does depend on the specific DFA chosen which makes it non-fundamental.It is therefore useful to use eqs.( 1) and ( 3) to further decompose the energy as, That is, ∆E D is the sum of a DFA-dependent xc error, and a DFA-independent (fundamental) mean-field error, Again, it is useful to avoid systematic errors by looking at errors in energy differences.To this end, is a useful "double ∆" expression that investigates the change in the mean field error across a process.Here, A indicates subsystems (e.g., atoms) of sys, while . . .indicates F , DMF or Dxc.
An alternative decomposition, [eq.(45) which includes the MF error as well as the corresponding xc term.

B. Mean-field error
The mean-field error of eq. ( 19) forms part of the density-driven error.Importantly, ∆E DMF,sys , offers a pragmatic way to assess the quality of DFA densities entirely independently from the DFA energies.∆E DMF has the following (mostly useful) properties: The first two points follow directly from the definition in eq. ( 19).The third point follows from the fact that F MF involves linear (T s + nv sys (r)dr), and quadratic (E H ) functionals of the density so can have multiple zeros.
The fourth point may be understood by considering the case that the DFA density is close to the true density, i.e. ∆n DFA := n DFA − n → 0. Eq. ( 19) then yields, involving the approximate xc potential.Taking the sum of Eqs ( 23) and (24) gives an error, (v xc,sys − v DFA xc,sys )∆n DFA sys dr, which Vuckovic et al 23 showed is second order in the ∆n DFA sys , meaning the exact and approximate potentials approximately differ (up to a constant) in proportion to ∆n DFA sys .Finally, Eq. ( 20) becomes, as the MF energy difference for small changes to densities.Eq. ( 25) reveals that systematic errors compensation between v xc,sys and v xc,A or ∆n DFA sys and ∆n DFA A will also cancel out.
The fifth point warrants some additional discussion.Formally, it follows from the fact that |E should either be close to the density that minimizes F MF , or give a value of F MF that is close to the minima.Therefore, variational principles dictate that F MF will be relatively insensitive to numerical errors in n sys or n DFA sys .To illustrate this in practice, consider the case of atomic Hydrogen, whose properties (n = 1 π e −2r , T s = 1 2 = 0.5, nvdr = −1 and E H = −E xc = 5 16 = 0.3125, all in Hartree units) are known exactly.The orbital (ϕ = e −r / √ π) may be approximated using one-, twoand three-Gaussian models, with coefficients chosen to minimize the true energy.The error in T s is −0.076, −0.014 and −0.003 Ha, for one-, two-and three-Gaussian models.But, they are balanced by errors of the opposite sign in nvdr.The error in F MF is therefore substantially smaller -being −0.012 Ha for one-, and well under 0.001 Ha for two-and three-Gaussian models.

Density
The sixth point follows from the fact that, of the ingredients of F MF [n], only T s [n] is not trivially computable given the density.But, given the exact KS potential, v s [n], one may determine the exact KS orbitals, φ k , which obey, [ Taken together, these qualities yield an exact measure that is reasonably easy to compute, reasonably insensitive to basis errors, yet provides a meaningful measure of density quality, despite not being a true metric.The next section will explore how well it works in practice, on a series of selected examples, and with the goal of illustrating its usefulness on practical chemical problems.

III. RESULTS
All computation in this section is done using customized psi4-based code. 24,25Most calculations use moderate (cc-pvtz, 26 tz) quality basis, although sometimes lower (cc-pvdz, dz) quality sets are used to test the impact of basis set on results.
F MF [n DFA sys ] is computed as part of any DFT calculation.For the work here orbitals are obtained from KS theory (restricted for even and unrestricted for odd numbers of electrons) and used directly to evaluate T s and nv sys dr.E H is computed using density-fitted electron repulsion integrals (DF-ERIs). 26This ensures helps to improve consistency with the reference results, which use DF-ERIs as part of the inversion procedure.
Reference mean-field energies, F MF [n sys ], are obtained from coupled cluster singles and doubles (CCSD) densities.The resulting density-matrix can be used to directly evaluate nv sys dr and E H in the same way as above.Reference values for T s are obtained using the method described in Ref. 27, also with the same DF-ERI basis used to evaluate E H .For systems with odd numbers of electrons reference values are at the restricted-orbital Kohn-Sham level, where ↑ and ↓ electrons share the same spatial orbitals.
Results are reported for (up to) fifteen DFAs covering multiple rungs of Jacob's ladder; 10 and with a mix of minimally empirical (where only a few 'free' parameters are optimized over energies) and empirical (where most parameters are optimized over energies) construction.The investigated DFAs are: • The local density approximation (LDA) using VWN 28 correlation; • The minimally empirical GGAs PBE, 29 BLYP, 30 and empirical N12 31 and SOGGA11; 32 • The minimally empirical meta-GGAs TPSS 33 and SCAN; 34 and more empirical M06-L 35 and MN15-L; 36 • Hartree-Fock theory; the minimally empirical hybrids GGAs PBE0 37 and B3LYP; 38 and the more empirical M06; 39 • The empirical range-separated hybrid, ωB97X-V, 40 and the minimally empirical double hybrid B2PLYP 41 (implemented in the usual way with MP2 contributions to the energies, but not to the densities).

A. C and H systems
The first test of density-related properties involves 56 random molecules and ions containing only C and H -17 CH 3 anions, 19 neutral CH 4 , and 20 CH 5 cations. 42The structures are random, but obey the following rules: 1) CH distances are between 1 and 1.8 Å at 0.1 Å steps; 2) angles are random but all HH distances must be greater than 0.7 Å.Using these random systems controls for elemental effects on energies (since only C and H are involved) but provides a reasonable diversity of chemical interactions.
To begin, consider the usefulness of E DMF .Figure 1 compares different measures of density errors, for dz and tz basis sets, for all 56 molecules and for all 15 DFAs.The top row of plots compares results for tz and dz, to reveal that, despite the low quality of a dz basis, most errors correlate rather closely in both basis sets -although there are some exceptions.tz is thus chosen for subsequent work.
The bottom row indicates that Hartree and density errors are closely correlated.But, it also reveals something rather alarming.There are many systems with large mean field errors but small density or Hartree errors.That is, there are many systems where the DFA density has a more substantial impact on the DFA energy than is captured by common metrics.This suggests that typical metrics may miss important features of densities.Next, consider the total density-driven error.Figure 2 shows the MF and xc parts of this error, which cancel almost perfectly in all cases, as expected and shown in Ref. 23.As shall become apparent later, this nearly perfect cancellation is not univeral.Most likely, it at least partly reflects the fact that DFAs become popular because they work on organic chemistry -and especially on CH bonds.The exceptional case is MN15-L (in yellow) which has noticeably less cancellation of errors than other DFAs.
Finally, consider how ∆E DMF varies with different DFAs, and how this compares with more traditional metrics for density errors.Figure 3 shows the error distribution (as violin plots whose width indicates the probability of errors) over all 56 molecules for each of the 15 considered DFAs.DFAs are grouped according to rung/complexity.
The most obvious conclusion to be drawn is that most functionals yield very low Hartree errors -less than 1 kcal/mol in the vast majority of cases.Variation in density errors is a little greater, but is broadly consistent with the Hartree errors.Surprisingly, Hartree-Fock (HF) and B2PLYP (with a high fraction of Hartree-Fock) densities are rather poor -despite B2PLYP giving excellent energies.This supports the cautionary note in recent work 19 on using density-corrected DFT (i.e. using HF densities and orbitals to compute DFA energies) as a general purpose tool.
The MF error provides a more complex picture of DFAs.Firstly, it reveals that the best MF errors are from SOGGA11, SCAN and B3LYP, which give ∆E DMF within ±5 kcal/mol in the majority of cases.Notably, SCAN outperforms PBE0 despite not using any exact Hartree-Fock exchange, perhaps reflecting its satisfaction of many exact limits. 34Secondly, Hartree-Fock is the least consistent performer -all DFAs have a narrower distribution of errors which suggests that inclusion of even approximate correlation generally improves densities.Thirdly, of the remaining heavily parameterized DFAs N12, M06-L, MN15L, M06 and ωB97X-V, only M06 and ωB97X-V (both hybrid DFAs) do a good job on densities across all three errors.It is also notable that SOGGA11 is also not as good as SCAN or B3LYP on the other errors, despite its success on ∆E DMF .
Finally, Figure 3 reveals an interesting result pertaining to the "path to exactness".LDA, PBE, TPSS and SCAN (all except LDA from Perdew and co-workers) are clearly staying on the path to exactness.Each step leads to satisfaction of more exact constraints, and each step yields an improvement to the mean-field error.

B. Cl-OH − system
Having demonstrated the usefulness of the mean-field error, let us now turn to problems where densities can be a genuine problem.The OH-Cl − dimer has been shown to benefit from density-corrected (DC-)DFT, in which HF densities are used together with standard DFAs for energies to avoid non-systematic cancellation of errors as ClOH angles are increased.By using DC-DFT, the zero angle structure is (correctly) predicted to be optimal, versus 25 • or so for self-consistent DFAs. 19This success has been justified by the fact that HF densities avoid the delocalization errors of GGAs.
Figure 4 illustrates that DC-DFT also benefits from cancellation of errors.In fact, as the top plot shows, the HF densities are quite bad according to ∆E DMFonly N12 is worse.But, the HF error increases only slightly as angles are increased, whereas all GGAs and all meta-GGAs bar MN15-L predict a much larger increase.The xc contribution to the density-driven error (bottom plot) is much more consistent across different DFAs, which means that using the HF density can lead to signficant changes in energy ordering.
C. He + 2 system The last example to be investigated is He + 2 , which represents a stringent test of densities as the spare electron is shared equally between the two cations when they are close together, but represents a difficult dissociation into He+He + = 2×He +0.5 when they are pulled apart.Figure 5 shows mean-field and density-driven errors relative to the error in He and He + -which should be (but isn't) the limit of large bond length.
Previous work 43 has shown that density-driven errors do not have much effect on this system, which is confirmed by this work.Nevertheless, HF is the standout performer for densities, with systematic mean field errors of -10 kcal/mol.By contrast, all DFAs start with similar errors to HF, but get worse as the molecule as dissociated.The total density-driven error reveals that the mean-field errors are almost entirely cancelled by the corresponding density-driven errors -with the exception of HF theory.Breaking apart the density-driven error into its mean-field and xc components therefore provides a picture of what is going on in this difficult exemplar case.

IV. THE FUTURE OF DENSITY BENCHMARKING
Due to their direct connection to energies, ∆E DMF and ∆ 2 E DMF are promising tools for analysing errors in densities independent of any approximations As can be seen in the previous section, both offer insights that aren't available in traditional analysis and assessment of densities.The increasing quality and reliability of general purpose Kohn-Sham inversion algorithms, 27,[44][45][46][47] and relative insensitivity of ∆E DMF to basis set effects make future routine use quite feasible.Both measures thus provide a crucial first step to systematic benchmarking of densities.The author outlines several promising directions, and remaining challenges, below.
Assessment of DFAs: The results from Sec III illustrate that all DFAs benefit from cancellation of densityand energy-based errors, as can be expected (but not guaranteed) from minimization principles.For the systems tested here, only N12 and HF have consistently poor densities, as shown by large systematic errors (N12) or highly variable errors (HF).Results also suggest that the sequence of LDA, PBE, TPSS, SCAN is staying on the exact path.A deeper analysis involving a wider range of exemplar chemical physics would be required to properly assess whether highly empirical DFAs are truly straying from the path to exactness, or merely taking a side road.
DFAs for densities: The work suggests that DFAs might be optimized to improve densities, at the expense of energies.Factoring densities into optimization strate-gies may offer improvements on DC-DFT, 19 which already improves on self-consistent calculations when used sensibly.For example, despite being optimized on energies, B3LYP clearly out-performs its "ingredients" (LDA, BLYP and HF) on densities.Rigorous optimization of DFAs to improve densities may offer a "best of both worlds" route to DFAs, in which densities are treated accurately by one approach and energies by another.
Benchmarking the benchmarks: One important aspect that was brushed over in the previous section is the reliability of benchmark data.The author has followed tradition in using CCSD densities as a reference.But this begs several questions: How accurate are CCSD densities?When do they become inaccurate?How do we assess inaccuracies?We need to benchmark the benchmarks to answer these questions.[Note, Kanungo et al 48 make the case that Brueckner orbital based coupled cluster theory may be more appropriate.] Similarly, the author has assumed that the cc-pvtz basis set is sufficiently accurate for benchmarking densities.For present illustrative purposes this is almost certainly the case -indeed, cc-pvdz was probably sufficient.But, benchmark energies have benefited from substational development and refinement of techniques to extrapolate to the basis set limit, especially via composite protocols (e.g.Weizmann-4 theory 49 ).Can (and should?) something similar be done for densities?
V. CONCLUSIONS DFAs serve a major role in computational electronic structure theory.Addressing the growing need for accurate DFAs has pushed human-optimized DFAs to their limits -SCAN 34 obeys an unprecedented 17 constraints.Empiricism and data-driven (machine learned) DFAs can fill in gaps.But, data-driven approaches can optimize for cancellation of errors in the training set (interpolation), which can lead to problems when they are applied to novel chemical physics (extrapolation).There is thus an urgent need to use more physical constraints in their construction.Otherwise, there is a risk of developing DFAs that have outstanding interpolative abilities, but are poorer than the current state-of-art at extrapolation -which is precisely where they are needed.
Testing, improving and even optimizing on the quality of densities is a promising way to incorporate additional physical data into DFAs.The mean-field error, ∆E DMF , [eq.(19)] presented here offers several advantages in this regard, as detailed in Section II B. Its usefulness as a measure is illustrated in this work using selected examples in Section III.
The present work presents multiple future challenges and directions [see also Section IV].Firstly, new benchmarking strategies need to be developed to generate useful benchmark data for densities.Secondly, those strategies need to be turned into useful high-quality reference data that can be used to test and develop DFAs.Finally, a more diverse range of systems needs to be studied, to offer insights into which DFAs do better, and worse, at predicting densities.
Once these challenges have been met, the resulting benchmark sets of ∆F MF will allow data-driven DFA to incorporate densities into their optimization, and so ensure they stay on the path to exactness, by design.
) and the usual exchange-correlation (xc) energy, E xc [n].The mean-field part is a semi-classical approximation to the physics of the system, in which the only quantum effects are in the KS kinetic energy, T s [n], whereas interactions of electrons with nucleii, nv sys dr, and each other, E H [n] = 1 2 n(r)n(r ′ ) drdr ′ |r−r ′ | , are treated classically.All other quantum contributions are in the unknown exchange correlation (xc) energy, E xc [n].
is the Hartree potential and v DFA xc = δE DFA xc /δn is the xc potential.
MF | meaning that the density that minimizes F MF + E (DFA) xc

5 FIG. 2 .
FIG. 2. Deviations of xc and mean-field energies for different DFAs.Colours identify DFAs and are the same as in Figure 3. Dotted lines indicate y = −x and thus perfect cancellation of errors.

4 FIG. 3 .
FIG. 3. Distribution of different density errors for selected DFAs.Note, Hartree and density errors below zero are an artefact of the violin plot model.

FIG. 4 .
FIG. 4.Mean-field and density-driven errors for HO-Cl − with various HOCl angles.Mean-field errors (top) are shown for 1 • and 25 • , as well as the difference between the two.Both components of density-driven errors (bottom) are shown for all angles, for a more limited set of DFAs -xc errors are solid lines and MF errors are dotted lines.0 • is excluded due to issues in treating the higher-symmetry structure.LDA is excluded as it did not converge for angles < 5 • .
FIG. Mean-field (top) and density-driven (bottom) errors for He + 2 at various bond lengths.
1.It does not contain E xc or E DFA . 1. Comparisons between various error measures across the full suite of random molecules, with all errors normalized by their average value.Here, 'Hartree' indicates eq.(12), 'Density' indicates eq.(9) and 'MF' indicates eq.(2).'dz'/'tz' indicate the cc-pvdz/cc-pvtz basis set.Dark colours indicate many points, yellow indicates few points, and white indicates no points.Dotted lines show y = x, so indicate perfect linear correlation. FIG