With density functional theory (DFT), it is possible to calculate the formation energy of charged point defects and in turn to predict a range of experimentally relevant quantities, such as defect concentrations, charge transition levels, or recombination rates. While prior efforts have led to marked improvements in the accuracy of such calculations, comparatively modest effort has been directed at quantifying their uncertainties. However, in the broader DFT research space, the development of Bayesian Error Estimation Functionals (BEEF) has enabled uncertainty quantification (UQ) for other properties. In this paper, we investigate the utility of BEEF as a tool for UQ of defect formation energies. We build a pipeline for propagating BEEF energies through a formation-energy calculation and test it on intrinsic defects in several materials systems spanning a variety of chemistries, bandgaps, and crystal structures, comparing to prior published results where available. We also assess the impact of aligning to a deep-level transition rather than to the VBM (valence band maximum). We observe negligible dependence of the estimated uncertainty upon a supercell size, though the relationship may be obfuscated by the fact that finite-size corrections cannot be computed separately for each member of the BEEF ensemble. Additionally, we find an increase in estimated uncertainty with respect to the absolute charge of a defect and the relaxation around the defect site without deep-level alignment, but this trend is absent when the alignment is applied. While further investigation is warranted, our results suggest that BEEF could be a useful method for UQ in defect calculations.

## I. INTRODUCTION

Point defects are an unavoidable consequence of nature’s insistence upon maximizing entropy. Whether intentional or inherent, point defects often determine the optical,^{1,2} electronic,^{3–5} magnetic,^{6–8} and mechanical^{9–11} behavior of materials. Therefore, whether in service of designing new materials or engineering existing ones, first-principles computational modeling of defects is crucial. Since accessible simulation sizes in electronic structure techniques, such as density functional theory (DFT), are $ O( 10 2\u2212 10 3)$ atoms, we rely on a variety of well-established methods for correcting finite-size effects (including electrostatic interactions of charged defects with their periodic images and Burstein–Moss shifts due to band filling effects) to approximate more realistic concentrations of dilute defects, such as dopants.

However, assessing the true accuracy of these calculations in comparison with experiment remains a challenge. Part of this challenge arises from the difficulty of doing sufficiently well-controlled experiments (e.g., knowing that we are truly measuring an equilibrium system or enumerating all defects present in a system that could impact a signal). Another part is due to the fact that many of the quantities most directly accessible from simulation (e.g., formation energies or charge transition levels) are related to experimental observables through exponential scaling (e.g., Boltzmann or Shockley–Read–Hall^{12,13} relations), making uncertainty propagation especially unforgiving.

This work seeks to address this second issue by investigating the utility of Bayesian Error Estimation Functionals (BEEF)^{14} to generate uncertainty estimates of defect formation energies for charged point defects as calculated with DFT within the supercell approach. Specifically, we develop a pipeline enabling the BEEF framework to be used to generate uncertainty estimates for defect formation energies and assess trends of these estimates across a variety of physical and computational parameters. With BEEF, atomic relaxation and self-consistent field calculations are completed with an “optimal” functional (comprising the bulk of the computational effort). Next, a large ensemble of functionals is generated that deviates from the optimal functional in a manner that has been optimized to produce reliable error estimates relative to experiment for a variety of properties used in fitting the functional. Using the converged charge density from the optimal functional, the ensemble of functionals is used to calculate an ensemble of non-self-consistent energies (computed via Kohn–Sham, not Harris–Foulkes), and the standard deviation of this distribution forms the error estimate.

It should be noted that the BEEF functional used in this study^{15} was not created with defect data (nor does there currently exist a BEEF variant that includes defect-specific properties in its fitting data). In addition, it is a GGA functional, which is a lower level of theory than is often used for modern defect calculations. As such, while we do compare to prior published results, we are primarily interested in trends in uncertainty across relevant variables, including supercell size, charge state, chemistry, and crystal structure in order to assess the conceptual utility of a BEEF-derived uncertainty metric in the context of charged point defect calculations.

Finally, a note on terminology: rigorously, *error* and *uncertainty* mean different things. Error refers to the difference between a single measurement and the “true” value of a quantity, while uncertainty refers to a range of values that the true value could fall into based on that measurement. BEEF is explicitly framed in the context of error estimation, because the ensemble is fitted on properties that *do* have broadly accepted experimental values. However, what we are doing in this work is attempting to quantify uncertainty, especially since there are not even widely accepted “true” values for many of the properties we are computing. As such, in this work, we will describe results as the estimation of uncertainty unless explicitly describing details of BEEF.

## II. METHODS

### A. Bayesian error estimation functionals (BEEF)

The Bayesian Error Estimation Functionals (BEEF) framework is used to generate an ensemble of exchange-correlation functionals that enable the calculation of an equal number of DFT energies, for which the standard deviation is calibrated to be representative of experimental error.^{14} In the selected implementation of BEEF,^{15} the authors optimized and regularized functionals for each experimental dataset of interest by parameterizing the exchange enhancement factor in terms of a Legendre expansion of the reduced density gradient. They then generated an optimal compromise functional that minimized the product of the cost function of each dataset’s individual functional. This “optimal” functional thus minimizes the errors for a variety of material properties for a wide range of materials. The authors proceeded to generate an ensemble of functionals by varying the Legendre expansion coefficients, where the probability of a parameter taking on any “sub-optimal” value decays exponentially away from the optimal value. The lengthscale of decay was chosen such that the spread of energies from the ensemble is consistent with the error relative to the training data. As a result, the ensemble of functionals is explicitly created to provide reliable error estimates for the defined materials properties.

In this work, we adopt the BEEF-vdW functional of Wellendorff *et al.*^{15} The functional employs the generalized gradient approximation (GGA) with a van der Waals (vdW) correlation. The datasets used to create the functional included a variety of molecular, solid-state, and surface properties.

### B. Defect formation energy

^{16}as follows:

In Eq. (1), term 1 represents the “raw” energy difference between a supercell with a defect $D$ and charge $q$ ( $ E D , q$) and a pristine (or “host”) supercell ( $ E H$). Term 2 represents the change in energy due to adding or removing atoms at a given chemical potential, where $ n i>0$ when atom $i$ is added to the defect system and $ n i<0$ when atom $i$ is removed. These first two terms are the ones we are able to propagate through the BEEF ensemble and are discussed in more detail in Secs. II C and II D).

Term 3 is analogous to term 2 but for the electrons added or removed from the system with Fermi energy $ E F$ to charge the defect. The Fermi energy is defined with respect to the valence band maximum (VBM) of the self-consistent functional. Finally, term 4 represents corrections due to finite-size effects and potential misalignment, for which we adopt the approach of Freysoldt, Neugebauer, and Van de Walle^{17} (the so-called FNV scheme) and, in particular, adapt the implementation provided in the program Corrections for Formation Energy and Eigenvalues (CoFFEE).^{18}

### C. Raw defect formation energy

As discussed above, the raw defect formation energy, term 1 in Eq. (1), is the difference between the energy of the defect supercell ( $ E D , q$) and the pristine supercell ( $ E H$). In a non-BEEF calculation, this is computed by simply subtracting the energies of each respective DFT calculation. In the BEEF framework, this simply becomes an element-wise subtraction of the vectors of energies for the defect and pristine supercells. Critically, the BEEF implementation employs consistent indexing across the ensemble of functionals; thus, it is possible to compute raw defect formation energy for a particular functional within the ensemble by selecting the energy value within each DFT calculation of the same index.

### D. Phase stability

To determine chemical potential ranges within which our target compounds are stable, we follow the process set out by Zhang and Northrup^{19} and implemented by Buckeridge *et al.*^{20} The chemical potential position (e.g., Ga-rich, I-rich) within each envelope is selected to be consistent with prior published results.

To account for the comparison of materials in different states (e.g., crystalline solids and gases, where the typical cancellation between entropic terms in free energies across solids does not generally hold), we adopt the method described by Hine *et al.,*^{21} wherein reference states for gaseous elemental phases are not computed via DFT but are calculated with respect to the experimental formation energies of the crystalline host. This technique was used to compute the chemical potential for elemental nitrogen. For gallium, lead, and iodine, the elemental forms were computed directly with DFT (using starting structures of mp-142, mp-20483, and mp-639751, respectively, in the Materials Project^{22} database).

Notably, for some members of the BEEF ensemble, there may be no predicted-stable region in chemical potential space (i.e., the compound is not on the convex hull at all); in these cases, that functional was excluded from the analysis as a full defect formation energy cannot be calculated. For the materials considered in this study, this behavior was only observed in GaN, where 5 of the 2000 of the chemical potential envelopes produced by the BEEF ensemble were not stable.

### E. Final terms and trends in uncertainty estimates

In theory, different functionals would result in different values of a self-consistent bandgap/Fermi energy (and potential alignment) and finite-size corrections. However, to make BEEF computationally tractable, the ensemble of functionals is used only at the very end of the process to generate *non-*self-consistent total energies and not separate sets of Kohn–Sham eigenvalues or electron densities. As a result, the final two terms of Eq. (1) are generated only with reference to the self-consistent result produced by the optimal functional. As such, these terms do not affect the trends in uncertainty estimates of defect formation energy.

As the Fermi energy and finite-size effects do not affect trends in the standard deviation of defect formation energy with respect to the variables under consideration here (i.e., they shift every energy in the BEEF ensemble by the same amount), any change in the *spread* of defect formation energies (examined specifically in Figs. 4 and 5) is purely a reflection of the change in the spread resulting from the raw defect formation energy and the chemical potential.

Material . | Supercell sizes (number of atoms) . | Defect . | Charge states . |
---|---|---|---|

GaN | 3 × 3 × 2 (72), 4 × 4 × 3 (192), 5 × 5 × 3(300) | Ga_{N}N _{Ga}V _{Ga}V _{N} | −2, −1, 0, 1, 2, 3 −1, 0, 1, 2 −3, −2, −1, 0 −3, −2, −1, 0, 1, 2 |

PbI_{2} | 3 × 3 × 2(54), 4 × 4 × 2 (96), 4 × 4 × 3 (144) | I_{Pb}Pb _{I}V _{Pb}V _{I} | −2, −1, 0, 1, 2 0, 1, 2, 3 −2, −1, 0 −1, 0, 1 |

Diamond | 2 × 2 × 2 (64), 3 × 3 × 3 (216), 4 × 4 × 4 (512), 5 × 5 × 5 (1000) | VC | 2, 1, 0, 1, 2 |

Material . | Supercell sizes (number of atoms) . | Defect . | Charge states . |
---|---|---|---|

GaN | 3 × 3 × 2 (72), 4 × 4 × 3 (192), 5 × 5 × 3(300) | Ga_{N}N _{Ga}V _{Ga}V _{N} | −2, −1, 0, 1, 2, 3 −1, 0, 1, 2 −3, −2, −1, 0 −3, −2, −1, 0, 1, 2 |

PbI_{2} | 3 × 3 × 2(54), 4 × 4 × 2 (96), 4 × 4 × 3 (144) | I_{Pb}Pb _{I}V _{Pb}V _{I} | −2, −1, 0, 1, 2 0, 1, 2, 3 −2, −1, 0 −1, 0, 1 |

Diamond | 2 × 2 × 2 (64), 3 × 3 × 3 (216), 4 × 4 × 4 (512), 5 × 5 × 5 (1000) | VC | 2, 1, 0, 1, 2 |

### F. Non-dimensionalization

In this work, we examine the relationship between BEEF uncertainty estimates and both supercell size and atomic relaxation near a charged defect site. To compare these relationships across different chemical systems, we define nondimensional versions of these quantities.

### G. Energy alignment by a charge transition level

In the standard calculation method as described above, defect states are implicitly aligned to the VBM position in each supercell. These are presumed to be aligned with each other across charge states via the potential alignment correction; however, as discussed above, we cannot compute separate values of this correction for each member of the BEEF ensemble since we only have access to the ensemble of total energy values and not full sets of Kohn–Sham eigenvalues and occupations. To explore a way to address this, we take inspiration from the work of Freysoldt *et al.,*^{23} in which they made use of the observation that frequently, a well-localized deep defect can serve as a better energy reference than the VBM position and lead to more favorable energy cancellation, also known as the Langer–Heinrich rule.^{24} This trend was originally observed in transition-metal impurities^{25} and later also for interstitial hydrogen.^{26} In particular, we explore explicitly aligning to a transition level within the bandgap (e.g., +1/0 for the diamond vacancy as illustrated in Fig. 2). In this way, we can compute a separate alignment value for each member of the BEEF ensemble, since while we do not have separately resolved VBM positions, we *do* have transition levels, as these come only from the total DFT energy. Interestingly, this suggests that one could in principle compute an “implied” ensemble of VBM positions in BEEF via such a procedure and potentially even propagate that to ionization energies via Janak’s theorem, though an exploration of the rigorous validity of this is beyond the scope of this study.

We note that while in Ref. 23, they used a hybrid functional as the “ground truth” to which they aligned the other transitions, here we do not have a higher level of theory per se, and therefore, we align to the self-consistent “optimal” functional. We observed qualitatively similar behavior when aligning to the various transitions within the gap and, therefore, selected the particular transition for alignment according to transitions that existed in previously reported data to enable consistent comparisons (see the supplementary material for more details on this).

### H. Computational settings

All calculations were conducted with the PSL 1.0.0^{27} pseudopotentials and the BEEF-vdW functional^{15} (with an ensemble size of 2000) using the Quantum ESPRESSO^{28,29} DFT package. Defect structures were created with the aid of the Atomic Simulation Environment,^{30} Pymatgen,^{31} and the Materials Project.^{32} Convergence tests were conducted with respect to cutoff energies (in Quantum ESPRESSO, the corresponding parameters are ecutwfc and ecutrho), k-point density, and supercell size. Following these tests, ecutwfc was set to 70 Ry, and ecutrho was set to 350 Ry. The k-point meshes for the primitive cells of diamond, $ GaN$, and $ PbI 2$ were $4\xd74\xd74$, $6\xd76\xd74$, and $4\xd74\xd72$, respectively. The convergence threshold for estimated energy error, conv_thr, was set to $ 10 \u2212 7$ eV.

## III. RESULTS AND DISCUSSION

Three different materials were studied: diamond, $ GaN$, and $ PbI 2$. In total, nine intrinsic defects were considered in a variety of charge states and supercell sizes, as detailed in Table I.

Figure 3 shows defect formation energies as a function of Fermi level position, including the BEEF-optimal values, the averages of the ensemble energies, and the uncertainty estimates [standard deviations $ \sigma D( E F)$] for each defect. We note that the value of $ \sigma D$ at a particular Fermi level can in general contain contributions from more than one charge state since different members of the ensemble may have charge transition levels at different points. In each case, we have shown data from the second-largest supercell size examined for that material.

In Fig. 3, we also compare these data to prior published results for intrinsic defects in these materials.^{33–37} Noting that there are some variations in the level of theory used (which lead to bandgap discrepancies, among other differences), we see broad agreement between the results shown here and prior work. In addition, in cases where multiple prior results have been published, we see a general trend toward a broader spread among those results when the uncertainty estimate is larger, which serves as further validation (albeit somewhat anecdotal due to small statistics) of the uncertainty estimate.

With uncertainty estimates in hand, it is also interesting to examine how these values vary with different attributes of the calculation. For this, we choose to consider $ \sigma D , q$ for each charge state $q$ of each defect $D$ separately. In contrast to $ \sigma D$ as visualized in Fig. 3, which may contain contributions from multiple charge states, this corresponds directly to the standard deviation of the ensemble of energies arising from the calculation in Eq. (1).

In Figs. 4(a) and 4(b), we plot $ \sigma D , q$ against the nondimensional supercell size $ L ~$. In general, we see virtually no dependence of $ \sigma D , q$ on a supercell size regardless of alignment (though the alignment does generally reduce $ \sigma D , q$). This suggests that the uncertainty estimates behave as they should in this respect since so long as the actual values of formation energy are converged with respect to supercell size, $ \sigma D , q$ should not vary either. However, it is important to recognize that the finite-size correction is only a function of the optimal functional and that each member of the BEEF ensemble for a defect at a given charge has the same finite-size correction.

One marked exception to this trend is the $ N Ga$ defect in the +1 and +2 charge states in GaN. This is quite a high formation-energy defect that also relaxed to markedly different structures in different-sized supercells, and so some “uncertainty about uncertainty” seems warranted. See below for a more detailed discussion and the Data Availability statement for a link to download the structures and all other files associated with the calculations performed in this work.

In Figs. 4(c) and 4(d), we plot $ \sigma D , q$ against this average relaxation $\u27e8|\Delta r ~|\u27e9$. All else being equal, we would expect greater estimated uncertainty with larger relaxation and also with larger absolute charge, $|q|$. This is because larger relaxation and/or absolute charge will generally mean larger changes in local charge density, meaning the corresponding Legendre coefficients for the BEEF ensemble will vary more. Thus, when ensemble members are subtracted element-wise between the pristine and defective supercell, there will be less systematic correlation/cancellation and hence wider variation. In the unaligned case [Fig. 4(c)], this trend broadly holds, albeit with several exceptions. In some cases (most clearly those where both positive and negative charge states were considered), we can observe that competition between relaxation and $|q|$ leads to plots with a “tilted-U” shape, as can be seen in the $ I Pb$ and $ V I$ defects in $ PbI 2$. While it is difficult to discern on the plot because the data points are quite close together, this is actually also the case for $ V C$ in diamond.

Note that Fig. 4 includes *all* charge states calculated, not only those that are thermodynamically stable at some Fermi energy. For example, in the case of the $ V N$ defect in GaN, we see in Fig. 3(c) that charge state q = +2 does not appear on the hull of the BEEF optimal functional but is included in Fig. 4.

Returning to the case of $ N Ga$ in GaN mentioned above, we can also see clearly here the dramatically different relaxation at different supercell sizes in the “swing” from the light purple squares to the darker purple squares, whereas for most other defects, the lines for different supercell sizes are much more strongly correlated. In effect, this is a case where structures (and hence uncertainty estimates) converge more slowly with supercell size for one defect than for the others in that material.

When the “deep-level” alignment is applied to defect systems, there is a substantial change in the nature of the uncertainty band produced with BEEF [as made clear by a comparison between Figs. 3(i) and 2(c)], often making the uncertainty band smaller and more symmetric. An aligned version of Fig. 3 is available in the supplementary material. In Fig. 4, we see that the alignment collapses the value of $ \sigma D , q$ for different charge states as well as removing any clear relationship between uncertainty and relaxation.

An intuitive expectation/hope of an uncertainty estimate for point defects is that it should be larger the less “representative” of the bulk our defective supercell is. Above, we explored quantifying this difference via geometric metrics of supercell size and local relaxation. An *energetic* metric for this difference could be the magnitude of the computed finite-size correction, which directly approximates energetic artifacts due to the interactions of the defect with its periodic images. We explore this in Fig. 5, again for both the unaligned case and with the application of a deep-level alignment. As a reminder, the finite-size correction is calculated as a function of the self-consistent BEEF functional for a given defect type/charge and the supercell size. Both from the intuition outlined above and from the specific mechanism by which BEEF estimates uncertainty as described previously, we would expect the behavior in Fig. 5(a) (the unaligned case), where the uncertainty estimate generally increases as a function of the absolute finite-size correction. However, when the deep-level alignment is applied [Fig. 5(b)], the normalized uncertainty typically does not change with respect to the absolute finite-size correction. This observation is consistent with prior reports that a deep-level alignment results in better error cancellation.

## IV. CONCLUSION AND FUTURE WORK

Through our examination of intrinsic defects in a diverse set of materials, trends in uncertainty with a supercell size and degree of relaxation, and comparison to prior publications, our results suggest that BEEF may be a promising approach for uncertainty quantification (UQ) in defect energetics. There are several natural avenues for future investigation. First, parameterizing a new BEEF ensemble at a higher level of theory, such as a hybrid functional, would allow for better quantitative agreement on properties, such as bandgap, and also allow incorporation of more experimental data of relevance to the systems and properties of interest. For example, the solid-state data used in fitting BEEF-vdW includes only FCC, BCC, diamond cubic, and HCP crystal structures. Second, extending this work to other types of point defects (i.e., interstitials and extrinsic defects) and yet more materials would enable further tests of its robustness.

While BEEF is a powerful technique for getting uncertainty estimates at minimal added computational cost compared to a single calculation, it is limited in the sense that it works only with the self-consistent density from the BEEF-optimal functional and the ensemble is generated only over the total energies. As discussed in Sec. II, this precludes propagating uncertainty through all the terms in Eq. (1). Tools such as automatic differentiation, which already has partial support in the DFTK package,^{38} open the door to utilizing traditional sensitivity analysis to propagate uncertainty through such calculations.

Finally, in this work, we have restricted ourselves to the FNV scheme for finite-size corrections. However, a variety of other schemes exist, including that due to Lany and Zunger,^{39} the extension to FNV by Kumagai and Oba^{40} that allows for anisotropic screening, and specialized schemes, such as that by Gake *et al.* for handling vertical transitions.^{41} A systematic examination of sensitivity to choice of correction scheme would be a valuable contribution to this research space.

## SUPPLEMENTARY MATERIAL

See the supplementary material that includes a discussion of a version of the relaxation calculation used for Fig. 4 that includes defect site atoms in the relaxation calculation and also a version of Fig. 3 with deep-level alignment applied.

## ACKNOWLEDGMENTS

This research was conducted using the Tartan Research Advanced Computing Environment (TRACE). The authors would like to gratefully acknowledge the College of Engineering at Carnegie Mellon University for making this shared high-performance computing resource available to its community. A.T. also gratefully acknowledges support from the Bradford and Diane Smith Graduate Fellowship in Engineering at Carnegie Mellon University.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Andrew Timmins:** Conceptualization (supporting); Data curation (lead); Formal analysis (lead); Funding acquisition (supporting); Investigation (lead); Methodology (equal); Project administration (equal); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (supporting). **Rachel C. Kurchin:** Conceptualization (lead); Funding acquisition (lead); Methodology (equal); Project administration (equal); Resources (lead); Supervision (lead); Writing – original draft (supporting); Writing – review & editing (lead).

## DATA AVAILABILITY

The repository of all calculations and intermediate scripts can be found at https://github.com/ACME-group-CMU/BEEF_Charged_Defects.

## REFERENCES

*ab initio*finite-size corrections for charged-defect supercell calculations

*et al.*, “

*Proceedings of the JuliaCon Conferences*(The Open Journal, 2021), Vol. 3, p. 69.