First-principles computations reliably predict the energetics of point defects in semiconductors but are constrained by the expense of using large supercells and advanced levels of theory. Machine learning models trained on computational data, especially ones that sufficiently encode defect coordination environments, can be used to accelerate defect predictions. Here, we develop a framework for the prediction and screening of native defects and functional impurities in a chemical space of group IV, III–V, and II–VI zinc blende semiconductors, powered by crystal Graph-based Neural Networks (GNNs) trained on high-throughput density functional theory (DFT) data. Using an innovative approach of sampling partially optimized defect configurations from DFT calculations, we generate one of the largest computational defect datasets to date, containing many types of vacancies, self-interstitials, anti-site substitutions, impurity interstitials and substitutions, as well as some defect complexes. We applied three types of established GNN techniques, namely crystal graph convolutional neural network, materials graph network, and Atomistic Line Graph Neural Network (ALIGNN), to rigorously train models for predicting defect formation energy (DFE) in multiple charge states and chemical potential conditions. We find that ALIGNN yields the best DFE predictions with root mean square errors around 0.3 eV, which represents a prediction accuracy of 98% given the range of values within the dataset, improving significantly on the state-of-the-art. We further show that GNN-based defective structure optimization can take us close to DFT-optimized geometries at a fraction of the cost of full DFT. The current models are based on the semi-local generalized gradient approximation-Perdew–Burke–Ernzerhof (PBE) functional but are highly promising because of the correlation of computed energetics and defect levels with higher levels of theory and experimental data, the accuracy and necessity of discovering novel metastable and low energy defect structures at the PBE level of theory before advanced methods could be applied, and the ability to train multi-fidelity models in the future with new data from non-local functionals. The DFT-GNN models enable prediction and screening across thousands of hypothetical defects based on both unoptimized and partially optimized defective structures, helping identify electronically active defects in technologically important semiconductors.

## I. INTRODUCTION

Semiconductors are critical for a variety of technologies, such as consumer electronics, healthcare and biotechnology, information technology, communication and connectivity, automotive manufacturing, renewable energy, and industrial automation.^{1} With the signing of the CHIPS Act,^{2} there has been a massive influx of funding into semiconductor R&D, resulting in the establishment of several manufacturing facilities and research centers across the United States as well as many global partnerships between companies and universities. Developing next-generation semiconductor materials is crucial for addressing global energy needs and the demands of the electronics industry, and this process begins at the atomistic scale with enhanced structure–property relationships that can scale up to device performance and aid data-driven materials’ design and improvement.^{3}

The electronic structure of a semiconductor is heavily dependent on the presence of point defects in its crystal lattice, which range from intrinsic vacancies, self-interstitials, and anti-site substitutions, to impurities at different lattice sites.^{4} Defects can introduce energy levels within the bandgap, affecting carrier concentration and mobility, and often acting as traps that lead to a non-radiative recombination of holes and electrons in optoelectronic devices.^{5–10} Defects also play a crucial role in dopant activation, diffusion, and segregation, which are vital factors in device fabrication processes. Even at low concentrations, unwanted point defects or impurities can have a significant impact on the electronic, optical, and transport properties of semiconductors, making it important to be able to accurately predict their stability and electronic signatures.^{11}

One of the applications where the effect of defects is felt most is solar cells, where semiconductors such as Si and CdTe are commonly used as absorbers.^{7,12} Undesirable defects and functional dopants in semiconductor absorbers will, respectively, decrease and increase the optical absorption and thus the power conversion efficiency of single-junction, tandem, and bifacial solar cells.^{13} Similar effects are felt in applications such as transistors, photodiodes, lasers, sensors, and quantum information sciences.^{14–18} Canonical group IV, III–V, and II–VI semiconductors are some of the most important materials used in these applications, either as binary compounds or as alloys, typically in the Zinc Blende (ZB) or wurtzite (WZ) phases. In addition to Si and CdTe, compounds such as GaAs, SiC, CdSe, and CdS have been used in photovoltaics (PVs). GaAs, GaN, ZnO, and InP are employed in optoelectronic devices such as light-emitting diodes (LEDs), laser diodes, quantum dots, and quantum wells. GaN, AlN, and related compounds are desirable wide bandgap (WBG) semiconductors for power electronics.^{19} Point defects negatively or positively impact the performance of each of these materials; in general, semiconductors with intrinsic defect tolerance and possible n-type or p-type dopability are desired for optoelectronic applications. Furthermore, defect levels in semiconductors (e.g., NV-centers in diamond) have also been suggested as qubits for quantum computing.^{20}

Experimentally, defect levels are measured using techniques such as cathodoluminescence and deep-level transient spectroscopy.^{21} However, these methods face difficulties in sample preparation and assigning measured levels to particular defects; e.g., it is not trivial to determine whether a captured peak is from a specific vacancy or self-interstitial, or from an unwanted substitutional or interstitial impurity.^{12} First-principles-based density functional theory (DFT) computations have thus been widely used to calculate the formation energy (E^{f}) of point defects as a function of Fermi level (E_{F}), net charge in the system (*q*), and chemical potential (*μ*).^{22,23} Such computations help reliably identify the lowest energy donor and acceptor defects, all possible shallow and deep defect levels, the equilibrium conductivity as pinned by the lowest energy defects (p-type, intrinsic, or n-type), defect concentrations, electron/hole capture rates, and other related properties. When using an appropriate level of theory, DFT-computed charge transition levels have been shown to match remarkably well with measured levels.^{12} Although semi-local generalized gradient approximation (GGA) functionals suffer from a lack of accuracy in predicting the bandgap and band edges,^{24} the discovery of low energy defect structures from GGA is still crucial as it provides a base for further computations at advanced levels of theory as well as the use of multi-fidelity learning,^{25} and computed properties often correlate well with experiments or higher theory,^{12,26} thus providing a means to a first-level screening or weeding out of unfavorable defects. Despite the successes of DFT, large-supercell charged calculations are rather computationally expensive even at the GGA level of theory, which makes it prohibitive to perform defect calculations across a very large chemical space.

Predicting defect properties can be significantly accelerated by combining DFT data with state-of-the-art machine learning (ML) models.^{12,26} Some recent efforts, including our own past work, have shown that regression models trained on a DFT dataset, using vectors encoding the identity and elemental properties of the coordinating atoms involved in creating a defect, can yield accurate prediction and screening over tens of thousands of defects and impurities.^{27–29} In published work from 2022,^{12} we trained ML models to predict the formation energies and charge transition levels of point defects in 34 ZB semiconductors with a roughly 90% prediction accuracy, which enabled qualitatively reliable screening of consequential impurities from across a list of >12 000. However, these models suffer from the following limitations: (a) for a wide chemical space, composition-based models^{30} require a significant amount of training data to accurately capture the complex relationships between the material and target properties, (b) all predictions are made for a supposed ground state defect structure, which means no competing defective structures could be sampled and no lower energy defect structures could theoretically be found, (c) the errors are larger than desired, presumably due to lack of information in the model inputs about the defective geometry and how local coordination changes in the presence of different defects, and (d) the predictive models cannot trivially be applied to related but “out-of-sample” defects, such as complexes and alloys within the same material classes. A potential approach to tackle these issues arises in the form of a “crystal graph,” which is the most general representation of a crystalline structure, automatically accounting for different supercell sizes, types of ionic species, mixing or added atoms, and metastable configurations.

Graph-based Neural Networks (GNNs) have been rising in popularity over the past few years and are widely applied at present for adequately representing and predicting properties of molecules, polymers, and inorganic crystals.^{27,31–37} GNNs can work directly with graph-structured data, converting crystal structures into crystal graphs where the nodes are atomic positions and the edges are chemical bonds. They are capable of learning internal representations of crystals useful for predicting properties ranging from formation or decomposition energy to bandgap to defect formation energy. ML models based only on vectors encoding composition and/or elemental properties are typically not suited to deal with crystalline polymorphs of a given material, often requiring hand-crafted features that are not generalizable. GNNs are known to be much more flexible than composition-based models, as they can be normalized with respect to system size or number of atoms and have the ability to capture important structure/chemistry information that contributes to properties of interest. By learning the global representation of crystals, GNNs can incorporate physical laws and phenomena on larger scales and be used to predict properties that are affected by long-range interactions. Predictive models trained using GNNs show a much better accuracy than models that lack structural/geometry information.

In this article, we present one of the most comprehensive efforts undertaken to date for predicting defect properties in semiconductors, by combining a massive high-throughput (HT) DFT dataset of defect formation energies (DFEs) with state-of-the-art GNNs. We utilize our previously published dataset,^{12,26} bolstered by the inclusion of thousands of partially optimized and unoptimized structures in addition to optimized structures, along with several new computations, and train predictive models using three types of established GNN schemes: Crystal Graph Convolutional Neural Network (CGCNN),^{38} Materials Graph Network (MEGNET),^{39} and Atomistic Line Graph Neural Network (ALIGNN).^{40} We note that there are a few other high-throughput DFT defect datasets reported in the literature; in Table S3, we compare our dataset with others in terms of the number of data points and types of defects studied. We train GNN models on datasets ranging from a few thousand to nearly 15 000 data points for point defects in different charge states, across 40 or so unique ZB compounds. We present a description of model optimization and visualization of the prediction results for different defect types and show how (a) ALIGNN predictions significantly improve upon previous DFE estimates, with a root mean square error (RMSE) of ∼0.3 eV, (b) predictions can be made for defect complexes and alloyed compositions by including a subset of them in the training data, and (c) GNN predictions for new systems can be used both for screening based on upper-bound energies and for stabilizing any defect by changing the defect coordination environment until the GNN-predicted formation energy minimizes. We believe that this provides a novel and promising approach toward predicting defect energetics and screening important defects from large chemical spaces. Considerations of the level of theory and future prospects of this work are additionally discussed. The DFT + GNN workflow applied for predicting defect properties is presented in Fig. 1.

## II. COMPUTATIONAL DATASET

The semiconductor + defect chemical space considered in this work is pictured in Fig. S1 in terms of the group IV, III–V, and II–VI binary ZB compounds (referred to henceforth as AB, with A being the cation and B being the anion) that serve as hosts to defects, elements selected from across the Periodic Table as possible defects (henceforth referred to as M for any substitutional or interstitial defect, whereas vacancies will use V), and five possible symmetry inequivalent defect sites, namely A-site (M_{A}), B-site (M_{B}), tetrahedral interstitial site with four neighboring A atoms (M_{i,A}), tetrahedral interstitial site with four neighboring B atoms (M_{i,B}), and octahedral interstitial site with 3 A and 3 B atoms in the neighborhood (M_{i,oct}). Here, we utilize four types of datasets: dataset 1 (all possible single native defects in 34 binary ZB semiconductors), dataset 2 (hundreds of substitutional and interstitial impurities across all ZB compounds), dataset 3 (native defects and impurities in a variety of CdSe_{x}Te_{1−x} alloys), and dataset 4 (some defect complexes simulated in CdTe). Datasets 3 and 4 arise from a parallel study on dopant activation in CdTe-based solar cells^{41} and are used here to evaluate the effectiveness of GNN-based defect predictions for alloys and complexes. All datasets contain DFEs calculated for at least 5 distinct charged states (q = 0, +2, +1, −2, −1) under two extreme chemical potential conditions, namely A-rich and B-rich. Figure 2 shows the violin plots capturing the distribution of DFEs (only for neutral defects under A-rich chemical potential conditions) for all four datasets, with inset box plots showing the median, lower quartile, and upper quartile for each.

The DFT details, including specific Vienna *Ab initio* Simulation Package (VASP) input parameters, level of theory information, reciprocal space sampling, references for chemical potentials, and equations used for DFE calculation, are present in our past publication.^{12} All data are from the semi-local GGA-PBE functional, which generally reproduces lattice parameters and relative stabilities quite well but is not reliable for electronic band edges and bandgaps, which is likely to affect computed defect levels as well. The non-local hybrid HSE06 functional^{42} is preferred for electronic and defect properties but is much more expensive and often requires a tuning of the mixing parameter [which determines the fraction in which exact exchange from Hartree–Fock is combined with the exchange–correlation energy from Perdew–Burke–Ernzerhof (PBE)], which is very system-specific and not trivially applied across large datasets.^{43} Beyond-DFT methods, such as GW approximation, which expands the self-energy in terms of the single-particle Green’s function G and the screened Coulomb interaction W,^{43} are more reliable but too prohibitively expensive to be applied high-throughput.

In past work,^{12} we showed that PBE computed defect charge transition levels (CTLs), when plotted to span the experimentally known bandgap of the semiconductor, match rather well with measured defect levels for a dataset of ∼80 defects in binary ZB compounds collected from the literature, showing a PBE vs experiment RMSE of 0.21 eV. This approach of moving the conduction band edge to match the experimental bandgap starting from the computed valence band edge is similar to corrections successfully applied by other researchers as well.^{27,44} A natural question arises here: how accurate are the defect properties computed using PBE as compared to a frequently used higher level method, namely the HSE06 functional? Ideally, one would perform HSE06 (with the appropriate mixing parameter determined for each material individually) or GW computations for all the defects in the dataset, but, clearly, this is not computationally tractable without expending years of effort. We reiterate here that we expect our PBE computed properties to correlate (in a highly complex and non-linear composition/structure-dependent manner) with the higher levels of theory, which will facilitate multi-fidelity learning efforts,^{25} but, more importantly, we trust the low energy geometries and relative energetics of different defect structures from PBE and also highlight the exciting opportunity of the optimized ground-state structures being fed as an input to high-throughput HSE06, GW, or other calculations in the near future.

Nevertheless, we performed some additional computations to benchmark our PBE results against HSE06. We considered a wide range of native point defects and selected dopants, namely, As_{Te}, As_{Se}, and Cu_{Cd}, in both CdTe and CdSe, leading to a total of 34 defects. The PBE-optimized structures were fed as an input to HSE (32% mixing parameter) calculations in five different charge states with spin–orbit coupling (SOC) turned on in order to capture the interaction of an electron’s spin with its orbital motion around the nucleus (we refer to this as an HSE + SOC calculation).^{45,46} The specific choice of the 32% mixing parameter (or *α* = 0.32) is motivated by studies in the literature^{47} and by the fact that it closely reproduces the experimental bandgaps of CdTe (1.47 eV vs the experimental value of 1.5 eV) and CdSe (1.64 eV vs the experimental value of 1.7 eV).^{12,48} Computed DFEs and CTLs from PBE and HSE + SOC are plotted against each other in Fig. 3. We find that there is some scatter in the DFE comparison plot in Fig. 3(a), which we owe to the slight differences in energies of reference phases from different levels of theory, but the major deviations arise for higher energy defects and screening of low energy defects using a certain PBE-DFE threshold should still work well. Meanwhile, we find remarkable agreement between the defect CTLs computed from PBE and HSE + SOC as pictured in Fig. 3(b) with a few discrepancies mainly coming from −1/−2 and +2/+1 transitions. This is consistent with our previous assertions and is primarily attributed to the fact that energy level differences are often commensurate between different levels of theory.^{49,50} This correlation in defect levels is particularly noteworthy (with the caveat that this testing is done for a small dataset), as it underscores the robustness of the PBE approach in accurately capturing the electronic structure with defect states in these semiconductors to identify the shallow or deep nature of these defects.

Thus, PBE-level DFEs and transition levels may be sufficient for a first-level screening of low-energy defects. Inaccuracies will still persist from incorrectly locating VBM and conduction-band minimum (CBM), but appropriate corrections can be applied afterward using different higher-accuracy bulk calculations once PBE-level DFEs are predicted for multiple *q* and *μ* conditions. Two such possible correction methods include using the modified band alignment approach based on PBE and HSE06 bandgap values^{51} and shifting both band edge positions using GW quasiparticle energies.^{52} The focus of the present work is to demonstrate the accelerated prediction of PBE-level defect energetics, and the aforementioned corrections will be considered in the future work. In the next few subsections, a detailed description is provided for the four datasets generated at the PBE level.

### A. Dataset 1

Dataset 1 contains DFEs for all possible native defects in each AB compound, namely V_{A} (vacancy at A-site), V_{B}, A_{i,A} (A self-interstitial at A-coordinated tetrahedral site), A_{i,B}, A_{i,oct} (A self-interstitial at octahedral site), B_{i,A}, B_{i,B}, B_{i,oct}, A_{B} (A on B anti-site substitution), and B_{A}. All AB compounds, simulated in the cubic ZB structure with A atoms occupying an FCC lattice and B atoms occupying tetrahedral sites, are listed as follows: 8 II–VI compounds (ZnO, ZnS, ZnSe, ZnTe, CdO, CdS, CdSe, and CdTe), 16 III–V compounds (AlN, AlP, AlAs, AlSb, BN, BP, BAs, BSb, GaN, GaP, GaAs, GaSb, InN, InP, InAs, and InSb), and 10 group IV compounds (SiC, GeC, SnC, SiGe, SiSn, GeSn, C, Si, Ge, and Sn). There are a total of 312 possible native defects across the 34 compounds, and DFEs were computed for all under both A-rich and B-rich conditions. From each of the 312 (×5 *q* states) PBE geometry optimization calculations, we collected all possible “intermediate structures,” that is, geometries generated during the course of the optimization, all the way from pristine structure (which is simply the ground state semiconductor bulk supercell structure with a defect introduced) to the fully optimized structure; also collected were the total DFT energies corresponding to each structure. The shell script used to collect intermediate structures (*IS*) for every defect from XDATCAR and the corresponding energies from OUTCAR (typical output files in VASP^{53}) is added to the supplementary material and available on GitHub. The DFE corresponding to every *IS* is estimated as E^{f}(*IS*) = E_{DFT}(*IS*) − E_{DFT}(optimized structure) + E^{f}(optimized structure).

This approach helps swell the DFT native defect dataset to 3071 data points for q = 0, and between ∼1500 and ∼2000 points for the other q values, as reported in Table I. The lower number of structures for the charged systems as compared to the neutral defects comes from the fact that most of the geometry optimization takes place during neutral calculation, whereas the charged calculations typically use the optimized neutral defect geometry as their starting point. All the *IS* serve as energetically accessible but not ground state defective structures, which can not only play an important role in understanding the dynamics and behavior of the crystal but also provide an energetically and structurally diverse dataset for a GNN model to be trained on. This also ensures that the GNN “knows” what an unoptimized, partially optimized, and fully optimized defect structure looks like, meaning that it will yield the correct energy corresponding to any hypothetical new structure and potentially help reduce the energy by subtly modifying the defect coordination environment.

Dataset . | Data points . |
---|---|

Native defects in 34 compounds (dataset 1) | 2053 (q = +2), 1840 (q = +1), 3071 (q = 0), 1966 (q = −1), 1498 (q = −2) |

Impurities in 34 compounds (dataset 2) | 5326 (q = +2), 3990 (q = +1), 13 966 (q = 0), 3568 (q = −1), 4628 (q = −2) |

Native defects in CdSe_{x}Te_{1−x} (dataset 3) | 291 (q = +2), 322 (q = +1), 734 (q = 0), 305 (q = −1), 329 (q = −2) |

Defect complexes in CdTe (dataset 4) | 47 (q = 0) |

Dataset . | Data points . |
---|---|

Native defects in 34 compounds (dataset 1) | 2053 (q = +2), 1840 (q = +1), 3071 (q = 0), 1966 (q = −1), 1498 (q = −2) |

Impurities in 34 compounds (dataset 2) | 5326 (q = +2), 3990 (q = +1), 13 966 (q = 0), 3568 (q = −1), 4628 (q = −2) |

Native defects in CdSe_{x}Te_{1−x} (dataset 3) | 291 (q = +2), 322 (q = +1), 734 (q = 0), 305 (q = −1), 329 (q = −2) |

Defect complexes in CdTe (dataset 4) | 47 (q = 0) |

### B. Dataset 2

Dataset 2 contains hundreds of impurities or extrinsic defects (M) at various sites, namely M_{A}, M_{B}, M_{i,A}, M_{i,B}, and M_{i,oct}, across each of the 34 unique AB compounds. The five distinct defect sites are considered in 30 binary compounds, and three defect sites (A-site, one tetrahedral interstitial site, and one octahedral interstitial site) are considered in the elemental systems (C, Si, Ge, and Sn). This dataset encompasses a wide range of singular impurity atoms, including groups I to VII, all transition metals, and all lanthanides, leading to a total of 77 species, as shown in Fig. S1. The total number of possible impurities resulting from this can be calculated as 77 × 5 × 30 + 77 × 3 × 4 = 12 474 (many of these would coincide with the native defects described earlier). Out of this dataset of 12 474 defects, 1566 were chosen for complete neutral-state geometry optimization and ∼1000 were subjected to charged calculations as well; points in the DFT dataset exhibit sufficient chemical diversity in terms of semiconductor type, element type, and defect site type, to adequately represent the entire chemical space. Once again, we collected all IS from the DFT optimization runs for each impurity in five different q states, leading to nearly 14 000 data points for q = 0 and between 3500 and 5300 data points for the other q values, as reported in Table I.

### C. Dataset 3

This dataset includes several possible native defects in a series of CdSe_{x}Te_{1−x} alloys (*x* = 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, and 0.875), namely V_{Cd}, V_{Te}, V_{Se}, Cd_{i}, Te_{i}, Se_{i}, Cd_{Te}, Cd_{Se}, Te_{Cd}, and Se_{Cd}. This results in a total of 82 unique defects across the seven mixed compositions, which are of interest in CdTe solar cells where Se is often mixed at the anion site.^{54–58} DFEs are computed for each defect in 5 q states under Cd-rich and Te/Se-rich conditions to obtain the two extreme ranges of energies, and all the *IS* are collected from the optimization runs. Total datasets of a few hundred points are thus compiled, as shown in Table I. This dataset will help examine whether GNN models trained on defects in pure unmixed compositions (CdTe and CdSe) are applicable to alloyed compositions in the same chemical space and how many new alloy data points might need to be added to the training set to achieve satisfactory accuracy.

### D. Dataset 4

Finally, we posit that crystal graphs should be capable of representing defect complexes as well in addition to the single-point defects described above. For exhaustively investigating defect tolerance and dopability of a semiconductor, it is vital to consider likely defect complexes, which are typically multiple point defects or impurities that form simultaneously in the lattice and interact with each other. Examples include Schottky and Frenkel defects, and compensating vacancies or interstitials that form along with dopants. The V_{Ga}–O_{N}–2H triple complex was found to have a very low energy in GaN,^{59} and it has recently been suggested that As–O based complexes may form in CdTe.^{41} Thus, we simulated a series of complexes, such as V_{Cd} + As_{Te} and V_{Te} + Cu_{Cd} in CdTe, resulting in a small dataset of 47 points of neutral state defects, for both Cd-rich and Te/Se-rich conditions, including all the *IS*.

## III. DFT OPTIMIZED vs UNOPTIMIZED FORMATION ENERGY

Before training GNN models, we analyzed the DFT datasets to determine the scale of the differences between DFEs from full DFT-optimization and from pristine, unoptimized defect structures. For any hypothetical defect, an unoptimized pristine structure could be trivially generated simply by inserting the defect of interest in an optimized bulk supercell, but obtaining the ground state DFE is a matter of optimizing this structure, which would involve a PBE calculation that runs for minutes, hours, or days, depending on the nature of the defect. The purpose of GNN-based on-demand predictions is to reduce this time drastically. Since any new GNN predictions would likely be made using pristine defect structures, it is informative to examine, for a given type of defect, how low the energy could go starting from the unoptimized DFE if full optimization were to be performed.

Figure 4 shows unoptimized DFE plotted against the fully optimized DFE, for the dataset of 312 native defects across 34 AB compounds (dataset 1), under both A-rich and B-rich conditions. The unoptimized DFEs are obtained by performing fixed-volume single ionic step calculations on the pristine defect-introduced structures. The dataset is divided into vacancies, anti-site substitutions, and self-interstitials. It can be seen that the amount of geometry relaxation happening in vacancy structures is minimal, with the two types of DFEs almost always being very similar to each other. Meanwhile, interstitials and anti-sites often undergo major atomic rearrangement such that the optimized defect coordination environment may look starkly different from the starting structure, thus leading to DFE differences ranging from 1 eV to nearly 8 eV. The large differences for interstitials could be understood in terms of the unfavorability of introducing an extra cation or anion in a tetrahedral or octahedral void; the larger the ionic radii, the greater this unfavorability. Substitutions depend on the size differences between A and B and may thus show either a low or high energy variability. These trends roughly inform the threshold that must be applied upon unoptimized DFEs (say, from GNN predictions) to determine their likelihood of stability upon full optimization. It should be noted that the intermediate structures collected from each “optimization run” essentially span the range of the unoptimized to optimized DFE, yielding hundreds of structures for some defects and only a handful for others.

## IV. GRAPH NEURAL NETWORK ARCHITECTURE

Here, we briefly describe the technical details behind the three GNN schemes chosen in this study, namely CGCNN, MEGNET, and ALIGNN.

### A. Crystal graph convolutional neural network (CGCNN)

CGCNN, developed by Xie *et al.*,^{38} is a deep learning architecture that takes a crystal graph as an input and applies a series of convolution and pooling operations to extract features that capture the underlying properties of the crystal. These features are subsequently fed into a fully connected neural network (FCNN) to make predictions of the properties of interest.^{60} The CGCNN framework is pictured in Fig. S2(a), and its operation is described as follows:

^{61,62}

*i*at layer

*t*+ 1, $vi(t)$ represents the feature vector of atom

*i*at the previous layer

*t*,

*∑*

_{j,k}indicates a summation over all neighbors

*j*of atom

*i*and over all bond types

*k*between atoms

*i*and

*j*,

*σ*is a non-linear activation function, $z(i,j)k(t)$ represents the edge feature vector between atoms

*i*and

*j*with bond type

*k*at layer

*t*, $Wf(t)$ and $Ws(t)$ are trainable weight matrices at layer $t,Wf(t)$ is used for filtering (hence the subscript

*f*) and $Ws(t)$ is used for scaling (hence the subscript

*s*), $bf(t)$ and $bs(t)$ are bias vectors at layer

*t*corresponding to the weight matrices $Wf(t)$ and $Ws(t)$, respectively, ⊙ denotes element-wise multiplication, and

*g*is another non-linear activation function. Pooling layer: The output of the last convolutional layer is passed through a global pooling layer (e.g., min, max pooling), which aggregates the features of all nodes in the graph into a single vector.

^{61,62}This vector contains information about the entire crystal structure, including all atomic coordinates, bond distances, and well-known elemental properties of each atom, such as ionization energy and electronegativity,

*h*

_{pool}is the output feature vector of the pooling layer,

*N*is the total number of nodes in the graph, and $hi(L)$ is the feature vector of node

*i*in the last layer

*L*. Fully connected neural network (FCNN): Finally, the output of the pooling layer is fed into an FCNN, which is trained like a regular NN-regression model to make predictions,

*y*is the predicted property,

*W*

_{fc}is the weight matrix of the FCNN,

*b*

_{fc}is the bias vector,

*h*

_{pool}is the output feature vector of the pooling layer, and

*f*is a non-linear activation function such as ReLU or sigmoid.

### B. Materials graph network (MEGNET)

The MEGNET framework was developed and released by Chen *et al.* in 2019^{39} and is pictured in Fig. S2(b). MEGNET uses elemental embeddings to encode periodic chemical trends that can be used to improve the performance of models with limited training data. Elemental embeddings are vector representations of elements that capture their chemical properties that are typically learned from a large dataset of crystals. When a new crystal is encountered, the embeddings can be used to predict the properties of interest. The MEGNET methodology is described as follows:

Graph representation of materials: MEGNET represents the crystal as a graph where the atoms are the nodes, and the edges represent the connections between the atoms. Each atom is associated with a set of features, such as its atomic number, coordinates, and chemical environment.

Message passing algorithm: MEGNET uses a message-passing algorithm to capture the interactions between atoms in the crystal. Each atom sends a message to its neighboring atoms, which is a function of the node and edge features. The messages are then aggregated at each node and the resulting feature vector is used as an input to a neural network.

Readout layer: The output of the message-passing algorithm is passed through a readout layer that maps the learned node and edge features to target properties, and a loss function is calculated to capture the difference between the predicted and actual values.

### C. Atomistic line graph neural network (ALIGNN)

ALIGNN is a novel approach developed by Choudhary *et al.*,^{40} which differs from CGCNN and MEGNET in terms of considering three-body interactions (bond angles) as well in addition to two-body terms (bond lengths). ALIGNN leverages both graph convolutional layers and line graph convolutional^{63} layers to capture both short-range and long-range correlations in the crystal. The framework [pictured in Fig. S2(c)] is described as follows:

Atomic feature extraction: ALIGNN takes as an input a graph representing the atomic structure of the crystal. Each node (atom) in the graph is associated with a set of atomic features, which includes properties such as electronegativity, group number, covalent radius, valence electrons, ionization energy, electron affinity, atomic block, and atomic volume. Each edge (bond) in the graph is associated with both the bond angle and bond distance.

Graph convolutional layers: ALIGNN uses graph convolutional layers to update the feature vectors of each node based on the features of its neighboring nodes. In each layer, the feature vectors are updated using a weighted sum of the features of neighboring nodes, similar to other models. This step captures short-range interactions in the structure.

Line graph construction: To capture long-range correlations, ALIGNN constructs a line graph on top of the original crystal graph. The line graph has nodes that represent unique bonds between atoms, corresponding to edges in the crystal graph. The line graph edges connect pairs of bonds that share a central atom in the crystal graph, effectively capturing bond angle information. ALIGNN then applies another set of graph convolutional layers to the line graph, which updates the feature vectors of each bond based on the features of neighboring bonds. The information from the line graph is then propagated back to the original crystal graph to update the bond features in combination with the node features.

Feature refinement: After the line graph convolution, ALIGNN refines the feature vectors for each edge using a set of learnable transformations that help capture more complex interactions between atoms and bonds.

Graph pooling: ALIGNN aggregates the refined bond-level feature vectors into a single graph-level feature vector using a graph pooling operation that summarizes the relevant information from the entire crystal graph.

Output prediction: Finally, the graph-level feature vector is fed to an FCNN for making predictions.

## V. RESULTS AND DISCUSSION

### A. Testing GNN models on dataset 1

As a first step, we tested the performance of CGCNN, MEGNET, and ALIGNN for predicting q = 0 E^{f} for dataset 1 only. For each model, the data are split 60:20:20 into training, validation, and test sets. The CGCNN training protocol has several important hyperparameters that must either be kept fixed at recommended values or rigorously tuned over a set of possible values, such as the number of properties used in the atom feature vector, the number of convolutional layers, the length of the learned atom feature vector, the number of hidden layers, the regularization term, the scaling factor of the Gaussian initialization of weights, the step size of the Adam optimizer,^{64} dropout fraction, batch size, the number of epochs, and the cutoff distance r_{cut} for constructing the crystal graph. Here, we optimized only the batch size, epochs, and r_{cut}, keeping the rest of the hyperparameters the same as in the original CGCNN publication.^{38} A parity plot for the best CGCNN predictions on dataset 1 (for A-rich conditions) is pictured in Fig. 5(a), showing an RMSE of 0.19, 0.41, and 0.35 eV, respectively, on the training, validation, and test sets. These errors are already far lower than our previously published DFE test prediction errors of ∼0.6 eV for defects in Cd-based chalcogenides^{26} and ∼1 eV across many group IV, II–VI, and III–V semiconductors,^{12} as well as being highly competitive with the state of the art for such predictions. Learning curves showing how the CGCNN errors converge as the epochs, batch size, and r_{cut} increase are presented in Fig. S3.

In the context of GNN for materials science and engineering, the r_{cut} parameter typically refers to the cutoff radius for considering atomic interactions, and convolutional layers are used to aggregate information from the neighbors within this cutoff radius. If r_{cut} is too small, important interactions between atoms or defects that are slightly farther apart may not be included, potentially missing critical information about the defect properties of interest. Meanwhile, a larger r_{cut} allows the model to consider a broader neighborhood, which could include interactions between defects, especially if they are within that distance, which is unwanted. In DFT calculations, we utilized a 2 × 2 × 2 supercell configuration, and as a result, the dimensions of these supercells for various IV–IV, III–V, II–VI semiconductors range from 10 to 14 Å. The periodic images of point defects are generally ∼12 Å apart, a span greater than the r_{cut} used in generating the crystal graph. Consequently, given that our r_{cut} is shorter than the separation between two periodic point defects, the larger convolution layers in our model are unlikely to erroneously aggregate information from neighboring point defects. This careful setup helps in maintaining the accuracy and specificity of the interaction data within our model.

Next, we trained a MEGNET model as shown in Fig. 5(b) following the same strategy. Notable hyperparameters include the number of interaction blocks, number of hidden layers, hidden layer size, learning rate, regularization strength, dropout rate, batch size, activation function, number of features assigned to each bond in the input graph, and r_{cut}. Here, we only optimized the number of epochs, batch size, and r_{cut}, and the rest of the parameters are directly adopted from the MEGNET publication.^{39} We find that MEGNET once again performs much better than previous composition-based models but shows slightly larger errors than CGCNN with an RMSE of 0.36, 0.42, and 0.40 eV on the training, validation, and test sets, respectively. The test error is close enough to the CGCNN error of 0.35 eV to justify the use of MEGNET over CGCNN, especially since MEGNET significantly corrects any possible overfitting in the CGCNN models by yielding roughly similar training, validation, and test errors. MEGNET has a more complex model architecture than CGCNN and includes elemental embeddings encoding periodic chemical trends, which may allow better generalization to unseen data.

Finally, we trained an ALIGNN model on dataset 1 and found that it yields a better performance than both CGCNN and MEGNET, with a slight concern of model overfitting alleviated by the remarkably low values of validation and test RMSEs. As shown in Fig. 5(c), the test RMSE is 0.15 eV, which represents a 99% accuracy considering the DFE values range from 0 to 15 eV. The reason for the improvement from CGCNN and MEGNET could be attributed to the line graph convolution step that captures long-range interactions, which may be important for particular point defects whose presence affects atomic arrangements beyond just the first nearest neighbor shell, causing larger lattice distortions. For training ALIGNN models, we use r_{cut} = 8 Å, 12 nearest neighbors, four graph convolutional layers, four line graph layers, learning rate = 0.001, batch size = 16, an AdamW optimizer, and 150 epochs. We tested a few different choices for training–validation–test splits of the entire dataset, and the test RMSE obtained for each is shown in Table S1. We find that the error does not change beyond the 60–20–20 split and, thus, use this choice for all the final optimized models in this work. We also performed fivefold cross-validation (CV) dividing the dataset from Fig. 4 into five equal parts. Each fold served as the validation set once, while the remaining four folds collectively formed the training set. This cycle was repeated five times, ensuring that every segment of the dataset was used for validation exactly once. In each iteration, the model was trained on the training set and then evaluated on the validation set. The RMSE was calculated for each validation phase, providing insights into the model’s performance on different data segments as shown in Table S2 with an average RMSE of 0.15 eV over five cycles. This outcome aligns closely with the validation RMSE of 0.16 eV obtained without the application of a fivefold CV, affirming the model’s consistency and reliability. The standard scikit-learn package conveniently includes a built-in module for k-fold CV, which automates the process of segmenting the dataset into multiple folds and systematically iterating through these folds for training and validation. However, in our case, working with GNN models, we did not have access to a comparable, integrated module for k-fold CV. As a result, we were required to implement this process manually, which entailed meticulously dividing our dataset into k distinct folds, sequentially using each fold as the validation set, while the remaining folds constituted the training set, and then repeating this process k times—once for each fold. This manual approach, although more labor-intensive, was crucial for ensuring the thorough evaluation and validation of our GNN models, particularly given their unique architecture and the specific nature of our dataset. We also trained the ALIGNN model using only unoptimized configurations (see Fig. S17), which shows much larger test errors owing to a smaller dataset size, validating our approach of expanding the defect geometries to $\u223c15,000$ structures. The results of hyperparameter optimization with ALIGNN are presented in Fig. S4, which is a more computationally expensive step than for the other two methods, motivating the use of much of the same hyperparameters as in past work.^{40} However, the training time is still reasonable and the accuracy payoff is immense; thus, we use ALIGNN as the GNN scheme of choice going forward and discuss prediction performances for datasets 2, 3, and 4 in Subsection V B. Table II lists the optimized training, validation, and test set RMSE values for CGCNN, MEGNET, and ALIGNN models trained on dataset 1. It is important to clarify that during the initial phase of our study, we experimented with various schemes, including CGCNN, MEGNET, and ALIGNN. However, after preliminary testing, we decided to exclusively focus on the ALIGNN scheme for all subsequent ML training. The significant hyperparameters of the ALIGNN model, which were critical to our study, are comprehensively listed in the supplementary material for reference.

### B. Extending ALIGNN to datasets 2, 3, and 4

To determine whether the GNN models optimized so far could directly be applied to impurities, complexes, and alloys, we first tested the ALIGNN model trained on dataset 1 for their predictive power on datasets 2, 3, and 4, before suitably re-optimizing the models by adding more training data points. Figure 6 shows the prediction performance of the ALIGNN model trained only on dataset 1 for dataset 2 (a), dataset 3 (b), and dataset 4 (c), along with the improvement in the predictions when 50% of any new dataset is added to the training set and the ALIGNN model is retrained. The RMSE for the dataset 1-trained ALIGNN model is as high as 2.17 eV for dataset 2, 2.68 eV for dataset 3, and 2.98 eV for dataset 4, showing very poor performances that become worse going from native defects to impurities to alloys to complexes. The structure–property relationships learned from native defects alone cannot be generalized to extrinsic species or non-binary compounds, or indeed, the presence of multiple simultaneous defects that will inevitably cause far higher distortions and atomic rearrangements compared to single defects.

Upon combining 50% of each dataset (chosen randomly) with dataset 1 and re-optimizing the model using the same approach as before, and performing necessary hyperparameter optimization anew, RMSE values improve drastically to 0.36 eV for dataset 2, 0.70 eV for dataset 3, and 0.18 eV for dataset 4. Upon adding 50% training data, the performance improvement is much better in dataset 4 than dataset 3, which can be attributed to two things: (a) the much smaller dataset of complexes (47 points) compared to defects in alloys (734 points in the neutral state), and the similarity of some of these complexes with one another, which makes it easier to extend predictions from the training to test sets, with the small data size also skewing predictions, and (b) the amount of structural complexity is much higher in the alloy dataset than the complex dataset, as the former contains several new atoms dispersed in the lattice (e.g., half the Te atoms in a CdTe supercell being replaced by Se to make CdSe_{0.5}Te_{0.5}) as opposed to just one or two new atoms in the case of a defect complex. These errors will go further down as more data are added to the training set, showing that each type of defect data needs to be adequately represented during the training process for generalizing the ALIGNN predictive power. This exercise provides insights into the challenges associated with representing and predicting the energetics of defects in different defect categories with a limited training dataset and demonstrates the importance of training ML models on comprehensive datasets to improve their performance across various defect types.

Next, we trained predictive models by combining all four datasets, for all charge states and chemical potential conditions. For charged defects, the DFE value is taken to be E^{f}(E_{F} = 0), because by predicting this value for each charge state, the E^{f} vs E_{F} plot can be extended across the bandgap region by using straight lines with slope = q. Thus, a total of ten different ALIGNN models are trained for the combined dataset, for E^{f}(q = +2, E_{F} = 0), E^{f}(q = +1, E_{F} = 0), E^{f}(q = 0, E_{F} = 0), E^{f}(q = −1, E_{F} = 0), and E^{f}(q = −2, E_{F} = 0), under A-rich and B-rich chemical potential conditions. As seen in Fig. S5, there are a handful of outliers in the parity plots, especially in the case of E^{f}(q = 0, E_{F} = 0), which may result from some structures getting stuck in local minima during DFT optimization, and possible inaccuracies in the GNN model that may be fixed with more data and/or hyperparameter optimization. We removed a few notable outliers and trained the models again, to obtain the best ALIGNN predictive models that may be applied for new data points. The q = +1, q = 0, and q = −1 ALIGNN models under A-rich conditions are pictured in Figs. 7(a)–7(c), respectively, and the remaining seven models are presented in Figs. S6 and S7. These models show very respectable RMSE values, suggesting that ALIGNN is capable of effectively learning the structure–property correlations in each dataset and each charge state, and generalizing them across all data types. The test prediction errors for q = +2, q = +1, q = 0, q = −1, and q = −2 are found to be 0.30, 0.23, 0.32, 0.25, and 0.26 eV, respectively, representing a 98% accuracy. The slightly larger errors for the neutral defects arise from the larger structural diversity for q = 0 defect structures compared to the charged defect structures, also manifesting in much larger numbers of q = 0 data points (e.g., 13 966 in dataset 2) than q = +1 (3990 in dataset 2) or q = −1 (3568 in dataset 2). The training, validation, and test errors for the best ALIGNN models for different charge states under A-rich conditions are listed in Table III.

Charge . | Train RMSE (eV) . | Validation RMSE (eV) . | Test RMSE (eV) . |
---|---|---|---|

q = +2 | 0.10 | 0.25 | 0.30 |

q = +1 | 0.07 | 0.27 | 0.23 |

q = 0 | 0.20 | 0.30 | 0.32 |

q = −1 | 0.11 | 0.26 | 0.25 |

q = −2 | 0.06 | 0.26 | 0.26 |

Charge . | Train RMSE (eV) . | Validation RMSE (eV) . | Test RMSE (eV) . |
---|---|---|---|

q = +2 | 0.10 | 0.25 | 0.30 |

q = +1 | 0.07 | 0.27 | 0.23 |

q = 0 | 0.20 | 0.30 | 0.32 |

q = −1 | 0.11 | 0.26 | 0.25 |

q = −2 | 0.06 | 0.26 | 0.26 |

To investigate the variation in the RMSE of dataset 1 when dataset 2, dataset 3, and dataset 4 are successively integrated with it, we expanded our analysis by incorporating datasets 2, 3, and 4, in addition to dataset 1, to train four distinct ALIGNN models at q = 0 under A-rich conditions. The corresponding RMSE values for these models on the test sets are included in Table S6. Our analysis revealed that the lowest RMSE was achieved when the ALIGNN model was trained exclusively on dataset 1. The introduction of additional datasets (datasets 2, 3, and 4) does not notably change the RMSE of dataset 1 as shown in Table S6, and the overall RMSE on the combined dataset is 0.32 eV, which is a satisfactory error given the range of values and the fact that it works for each dataset.

In addition, to examine the effect of different types of point defects on the GNN performance, we trained separate ALIGNN models for vacancy, substitutional, and interstitial defects, for the same dataset used for the models in Fig. 5(c). An ALIGNN vs DFT parity plot showing predictions for different defect types is presented in Fig. S13. Primarily, the high test RMSE of 1 eV for vacancy defects suggests that the limited data (705 points) available for this type might not be adequate for effective training. Any deep learning model like a GNN requires substantial data to accurately learn and predict complex patterns that are less pronounced for substitution (∼4553) and interstitial (∼12 558) defects, where the data volume is sufficient, reflected in lower RMSEs of 0.36 and 0.42 eV, respectively. Furthermore, the combined model likely benefits from a more generalized approach, leveraging patterns across all defect types, leading to a lower overall RMSE of 0.32 eV. In contrast, training separate models for each defect type focuses on the specific patterns of each category, potentially exposing limitations in data quantity or complexity handling, thus resulting in higher RMSEs for individual defect types. We observe that interstitial defects, particularly those involving larger cations/anions, require rigorous DFT optimization. This complexity can challenge the generalization capabilities of ALIGNN models. Consequently, despite interstitial defects having the largest dataset volume, their performance is relatively constrained compared to substitutional defects. This limitation underscores the model’s difficulty in adapting to the intricate variations and structural changes associated with these specific defect types. We reiterate that the model trained on all the defect types combined yields a very good performance for all, with test RMSEs of 0.40 eV for vacancies, 0.29 eV for interstitials, and 0.38 eV for substitutional defects.

### C. ALIGNN-unoptimized vs DFT-optimized energies

The next objective is to utilize the best ALIGNN models to make predictions for new defects and perform screening of potentially low-energy defects based on predicted DFEs. The caveat here is that for any new defect, one could only really generate an “unoptimized” defect structure, and thus, the only ALIGNN prediction that can be made is the unoptimized DFE. As described earlier, a full DFT optimization of any defect structure is obviously a time-consuming step, involving structural relaxation until atomic forces become close to zero and the energy of the crystal reaches a minimum. In contrast, ALIGNN prediction of unoptimized DFE can be performed in seconds and thus used to estimate the unoptimized energies of hundreds of thousands of defect structures, which could then be used as a surrogate for screening based on some upper bound values.^{65,66} However, ALIGNN predictions could also, in theory, be used to replace DFT-based energy estimates within a gradient descent-type optimization process,^{67–69} or using brute-force structure generation and energy evaluation, and thus quickly yield low energy structures at near-DFT accuracy for any hypothetical crystalline defects.

To test the correspondence between ALIGNN-unoptimized energies and DFT-optimized energies, we plotted the ALIGNN-predicted E^{f}(q = 0) (ALIGNN-unopt) on pristine structures of all defects across datasets 1, 2, 3, and 4, against the DFT-optimized E^{f}(q = 0) (DFT-opt), in Fig. 8(a). We expect the ALIGNN-unopt DFE values to always be higher than the DFT-opt values, and this is, indeed, the case of $>$ 95% of the data points. However, we do find the opposite to be true for several defects, which is, perhaps, a consequence of the statistical nature of ML predictions, which are very accurate on average but may show large errors for certain outliers. Importantly, notable differences between ALIGNN-unopt and DFT-opt values are seen for many defects where large structural changes are expected upon full relaxation. Some examples include V_{Si} in SiC (8.28 eV difference), V_{In} in InN (7.63 eV difference), and Cs_{i,oct} in SiC (7.17 eV difference). By examining the 300 defects out of this total list of 1747 defects [plotted in Fig. 8(a)], which show the largest ALIGNN-unopt vs DFT-opt energy differences, we find an average difference of ∼1 eV; thus, we could apply a rough general equivalence of DFT-opt = ALIGNN-unopt − 1 eV and use this information to perform a high-throughput screening of likely low energy defects. Similar trends are expected to hold for ALIGNN-unopt vs DFT-opt energies of charged defects as well.

We further performed a thorough analysis comparing DFT-optimized and unoptimized energies. Specifically, we focused on evaluating the error distribution, which represents the energy differences between optimized and unoptimized structures. This analysis, detailed in Fig. S11 and the supplementary material, encompasses dataset 1. The findings reveal that for a large majority of defects, the error variably spans from 0 to 1.2 eV. This range validates our proposed correlation between DFT-opt and ALIGNN-unopt (assuming ALIGNN-unopt matches well with DFT-unopt), demonstrating that the approximation of a ∼1 eV difference is not only statistically significant but also practically applicable in the context of our research. This lends robust support to the reliability of the ALIGNN model in simulating point defects with an acceptable level of accuracy, thereby reinforcing the effectiveness of our computational approach. We note that this approach will be inaccurate for a small subset of defects, but such defects are generally unfavorable to begin with and, despite the large DFT-opt vs DFT-unopt difference, typically do not result in notably low energy defect structures. This can also be seen from Fig. 3, where the largest differences occur for formation energies $>\u223c4eV$ and should thus not affect eventual screening of low energy defects using the “ALIGNN-unopt 1 eV” criterion. A plot between the unoptimized DFEs computed from DFT and predicted from ALIGNN (for dataset 1) is presented in Fig. S12. It can be seen that there is a very high correlation between the two, showing an R^{2} value of 0.99 and an RMSE of 0.29 eV, meaning that the ALIGNN predictions made on pristine/unoptimized defect structures can, at minimum, be used as a surrogate for initial DFT energies, and suitable corrections can be made on top of the ALIGNN-unopt values to get closer to the optimized values.

Looking at only the DFT-opt values, we find that 170 defects have DFE < 0 eV, with some examples including N_{S} in ZnS (−3.76 eV), N_{As} in GaAs (−3.2 eV), and N_{Te} in CdTe (−2.64 eV). Whereas a look at the ALIGNN-unopt values yields 351 defects with DFE < 1 eV, which includes all 170 low energy defects from DFT, meaning that the suggested upper-bound energy screening procedure should help account for all potentially low energy defects in addition to a few unstable defects that may be eliminated subsequently. On average, the computational cost for optimizing a single-point defect in a 64-atom supercell amounts to ∼2500 core hours for the neutral state and 1000 core hours each for charged calculations. Running only static, single-shot calculations on pristine structures requires around 400 core hours in total. Meanwhile, ALIGNN-unopt predictions are made in seconds with minimal computing expense, and it is imperative to use these predictions as descriptors for low energy defects. Visualizing the ALIGNN-unopt vs DFT-opt values for q = +2, +1, −1, and −2, in Fig. S8 shows that there are more cases where the unoptimized energy is unexpectedly higher than the optimized energy. This may be a factor of the charged models never encountering pristine structures, as those are typically only utilized in neutral calculations and training sets. The charged models are only trained on partially optimized structures close to the ground state, or simply the ground state defect structures, making it such that q ≠ 0 ALIGNN predictions for pristine structures are less reliable than the neutral state predictions, even though the best charged models still have very low test RMSE values.

Finally, we examine the accuracy of predicting charge transition levels (CTLs) from ALIGNN compared to optimized DFT predictions. For any possible defect, a pristine unoptimized structure is created as described earlier and their DFE values are predicted at E_{F} = 0 eV for q = 0, +2, +1, −1, and −2. Using these values, E^{f} vs E_{F} plots are produced based on the lowest energies and locations where the defect transitions from one stable charge state to another, referred to as *ɛ*(q1/q2), are identified. This would effectively yield ALIGNN-unopt values for *ɛ*(+2/+1), *ɛ*(+1/0), *ɛ*(0/−1), and *ɛ*(−1/−2), which may then be compared with corresponding DFT-opt values available for datasets 1, 2, and 3. Figure 8(b) shows the ALIGNN-unopt *ɛ*(+2/+1) plotted against the DFT-opt *ɛ*(+2/+1), revealing substantial scatter and a lack of overall correlation. A similar behavior is observed for other CTLs as well, as shown in Fig. S9. This is not surprising and indicates that the relative stability of different charged defects and thus their transitions are sensitive to defect geometry and would thus require some level of optimization. Thus, we conclude that although ALIGNN-unopt E^{f}(E_{F} = 0) predictions within a threshold will provide some idea about the likelihood of formation of defects in a compound and its possible defect tolerance and dopability, exact CTLs will be harder to determine without optimization, which means ALIGNN-unopt alone cannot reveal the shallow or deep level nature of important low energy defects.

### D. ALIGNN-based defect structure optimization

We employed our trained ALIGNN model to optimize a crystal containing point defects using an iterative distortion approach. This gradient-free optimization method entailed subjecting the initial defective crystal structure to a series of systematic atomic displacements. Each atom in the crystal was displaced based on a normal distribution, and the displacements were carefully adjusted to ensure no overall translation of the system. The magnitude of these displacements ranged from zero (representing no displacement) to a maximum value specified as a user input. The best ALIGNN models are applied to predict the E^{f} of all generated crystals in all 5 q states. This procedure is iteratively repeated, and the applied distortions are adjusted until the predicted DFE becomes as low as possible. This approach allows for the efficient exploration of the energy landscape of the defective crystal, seeking configurations that approach the optimal structure. Another advantage of this gradient-free approach is that it does not rely on explicit training for atomic forces, which significantly increases the memory cost of GNNs at both training and prediction time.

For a demonstration of this procedure, we pick two defects in the neutral state, namely Re_{Zn} and La_{Zn}, both in ZnO. The ALIGNN-based defective structure optimization scheme is pictured in Fig. 9(a), and the results of the optimization procedure for Re_{Zn} and La_{Zn} in ZnO are presented in Figs. 9(b) and 9(c), respectively. After applying 644 consecutive distortions on the Re_{Zn} geometry, the ALIGNN-predicted DFE comes down from 5.93 to 5.31 eV, which is very close to the DFT-optimized DFE of 5.30 eV. For La_{Zn}, applying a total of 2407 distortions helps reduce the ALIGNN-predicted DFE from 5.23 to 3.35 eV, which is more than 1 eV higher than the DFT-optimized DFE of 2.20 eV. ALIGNN optimization required ∼12 min for Re_{Zn} and 40 min for La_{Zn} using a standard laptop to generate defect configurations and make instant ALIGNN predictions, which is a vast improvement on the ∼2500 core hours each DFT optimization would require. Thus, this procedure will efficiently yield lower energy defective structures, although an exact match with DFT may not occur for every system. Finally, examining the lowest energy ALIGNN and DFT structures shows remarkable similarities between the two, as pictured in Fig. S10. The decision to halt the optimization depends on whether additional iterations significantly reduce the energy. If generating more configurations does not lead to a notable decrease in energy, it suggests that the crystal structures have attained relative stability. Therefore, we suppose the optimization process is satisfactorily concluded, as further iterations are unlikely to result in major improvements or energy changes. For clarity on the stopping criteria, we generated and analyzed the energy of an additional 200 configurations while optimizing Re_{Zn} in ZnO at q = 0. Alongside Fig. 8(a), we have included a corresponding figure in the supplementary material (Fig. S18), where we demonstrated that an additional 200 configurations do not significantly reduce the energy notably (the model previously obtained an optimized energy of 5.3 eV, which remains the same). In addition, we note the potential use of algorithms such as Monte Carlo (MC),^{70} genetic algorithms, and Bayesian optimization for optimizing the defect structures. We acknowledge that MC approaches can be an effective optimization method and can provide a clear stopping criterion based on the acceptance probability. To integrate the MC approach, we need to dynamically integrate our code to generate a new configuration with the GNN model’s checkpoint (files where the model’s parameters are stored), which is under development; this will be part of ongoing and future work.

Next, we applied the same optimization scheme to six selected defects across three compounds for different charge states and plotted their ALIGNN-optimized E^{f} vs E_{F} in Fig. 10 alongside the corresponding DFT-optimized plots. We find that ALIGNN-optimization produces defect formation energy plots for Co_{Zn} and Si_{Zn} in ZnS (II–VI semiconductor), Rh_{i} and B_{i} in AlP (III–V semiconductor), and Li_{Si} and C_{i} in Si (group IV semiconductor), which match almost perfectly with DFT DFEs for all cases. The most stable charge states, transition levels, and E^{f} magnitudes are predicted to be very similar to both ALIGNN and DFT. We note that there is some discrepancy between the DFT and ALIGNN formation energies of the B_{i} defect in AlP closer to the conduction band region, owing to the fact that ML models are not perfect and will tend to show larger errors in some places than others. Such differences are typically not harmful when they occur for unstable defects or in regions of the bandgap less relevant to growth conditions. The transition level is found to be 0.34 eV away from the CBM from DFT and 0.51 eV from the CBM from ALIGNN, but the observations are qualitatively similar, showing that the B_{i} will transition from the +1 state to neutral state to a negatively charged state with three deep levels in the bandgap. Figure 11 further shows the CTLs for a few selected defects plotted from ALIGNN and DFT, showing both ALIGNN-unopt and ALIGNN-opt values. It can be seen that ALIGNN optimization brings down the DFT vs ALIGNN RMSE from 0.37 to 0.17 eV, which is a very respectable CTL prediction error that is far better than previous composition-based models and also very commensurate with the DFT-experiment RMSE of 0.21 eV established for the same chemical space. We further selected five distinct defects in CdTe, namely V_{Cd}, As_{Te}, Cu_{Cd}, Cu_{i}, and Te_{Cd}, and utilizing our GNN based optimization scheme, we methodically optimized the geometry of these defects. Subsequently, we compared the defect levels obtained from ALIGNN against those derived from experimental data.^{12,26,71,72} The outcomes of this comprehensive comparison are visually represented in Fig. S14, and we find that all the ALIGNN-optimized defect levels match very well with experimental values.

Our results demonstrate the effectiveness of GNNs in guiding crystal structure optimization and highlight their potential for accelerating materials discovery and design. It should be noted that the geometry optimization process is not very clean, and there is no clear answer on how many distortions or atomic displacements must be applied for a given defect, although the unoptimized vs optimized energy visualization provides some insights into different types of defects. It is not easy to determine when to stop the optimization process, other than when the GNN-predicted energy does not reduce anymore, which does not negate the risk of local minima trapping. This process can also get expensive when applying for hundreds of thousands of defects, especially depending on the values of hyperparameters such as r_{cut}; nevertheless, they are still meaningfully faster than complete DFT optimization.

It is important to note that the DFEs can vary depending on the size of the supercell. This variation is due to the interaction of defects with their periodic images in smaller supercells, which can lead to significant energy differences, although we have employed charge correction to compensate for this interaction. The final reported defect properties are always assumed to be converged with respect to supercell size, which should be large enough to regard the point defect as dilute in concentration. To test the supercell size effect, we simulated some of the dominant point defects in CdTe, including V_{Cd}, V_{Te}, Cd_{i}, AsTe_{Te}, Cl_{Te}, As_{i}, and Cl_{i}, using a 3 × 3 × 3 supercell and used the same ALIGNN based geometry optimization to make DFE prediction. Figure S15 shows that the DFT-computed DFEs from 2 × 2 × 2 and 3 × 3 × 3 supercells match very well with each other, showing the reliability of limiting the defect supercell size to 2 × 2 × 2. Figure S16 shows a comparison between the ALIGNN-predicted and DFT-computed DFEs of the same defects in the 3 × 3 × 3 supercell. A discrepancy exists between the ALIGNN and DFT data, indicating that our current GNN models are not yet adept at making reliable predictions for 3 × 3 × 3 supercells. To address this limitation, it is essential to adapt our model to better understand and interpret 3 × 3 × 3 supercell structures. This can be achieved by incorporating additional DFT-derived DFEs from 3 × 3 × 3 supercells into our training dataset, thereby enhancing the model’s familiarity and predictive accuracy with this specific supercell configuration.

### E. High-throughput screening of defects

The best ALIGNN models were finally applied to predict the E^{f} (E_{F} = 0 eV) of all 12 474 possible single defects and impurities across the entire chemical space, in all 5 q states, under A-rich chemical potential conditions. These predictions were then used to generate E^{f} vs E_{F} plots for all defects spanning the experimental bandgap of the semiconductor. To screen for potentially low energy defects, we look for E^{f} becoming negative for any portion of the bandgap, using a stringent threshold of 1 eV for neutral defects and 0 eV for charged defects. This yields a total of 1281 defects that are very likely to be stable based on the ALIGNN-unopt predictions, although many more such low-energy defects may exist once ALIGNN-optimization is performed. Table IV contains a few examples of low-energy defects identified by ALIGNN. The prediction of DFE by ALIGNN for all possible 12 474 defects under A-rich chemical potential conditions is added to the supplementary material. This provides a great starting point for future studies and a quick identification of n-type or p-type dopants in any compound of interest.

Semiconductor . | Group . | Defect . |
---|---|---|

AlN | III–V | Mn_{N} |

AlP | III–V | Cr_{i,B} |

AlAs | III–V | V_{i,B} |

AlSb | III–V | N_{Sb} |

BN | III–V | Al_{B} |

BP | III–V | V_{P} |

BAs | III–V | Cr_{As} |

BSb | III–V | Cr_{B} |

CdO | II–VI | Ti_{Cd} |

CdS | II–VI | Mn_{i,B} |

CdSe | II–VI | Cr_{Cd} |

CdTe | II–VI | Sc_{Cd} |

Semiconductor . | Group . | Defect . |
---|---|---|

AlN | III–V | Mn_{N} |

AlP | III–V | Cr_{i,B} |

AlAs | III–V | V_{i,B} |

AlSb | III–V | N_{Sb} |

BN | III–V | Al_{B} |

BP | III–V | V_{P} |

BAs | III–V | Cr_{As} |

BSb | III–V | Cr_{B} |

CdO | II–VI | Ti_{Cd} |

CdS | II–VI | Mn_{i,B} |

CdSe | II–VI | Cr_{Cd} |

CdTe | II–VI | Sc_{Cd} |

## VI. FUTURE OUTLOOK

The present article is primarily aimed at creating a robust framework for predicting defect structures and energetics in semiconductors, focusing on identifying likely defects and their impact on the properties of interest. We utilized GNNs to advance the accurate prediction of DFEs in various semiconductors, which is vital in semiconductor research for efficiently screening harmful defects and useful dopants and assessing low energy defect configurations, which may be found experimentally. Although understanding defect behavior and mechanisms is important, our paper concentrated more on refining the predictive models rather than exploring the interpretative aspects of defect behavior, which involves an understanding of why certain defects behave the way they do and the underlying mechanisms that govern these behaviors.

Furthermore, while it is true that defects are temperature-dependent phenomena, the qualitative nature of the DFEs obtained from 0 K DFT calculations is expected to remain consistent even at finite temperatures, such as 300 K. The rationale behind this is based on the fact that the primary contributions to DFEs at finite temperatures come from vibrational free energy, which often does not drastically alter the relative stability of different defects. Therefore, the trends and insights obtained from 0 K calculations are likely to remain valid when extrapolated to finite temperatures, especially in the context of screening and predicting defects across a wide range of materials.

In addition, E_{F} can also be understood as the chemical potential of electrons in a solid. It represents the energy level at which adding or removing an electron does not change the total energy of the system. This means that at this specific energy, the system is at a kind of equilibrium with respect to electron exchange. In a pure, intrinsic semiconductor, the E_{F} is located approximately midway between the conduction band and the valence band, although this also depends on the hole and electron effective mass and the temperature. In doped semiconductors, impurities introduce additional energy levels, and the E_{F} shifts toward the conduction band in n-type semiconductors (with electrons majority carriers) or toward the valence band in p-type semiconductors (with holes majority carriers). The precise position of the E_{F} is critical in semiconductor physics and technology, as it influences the behavior of electronic devices by determining how electrons and holes are distributed across the energy bands of the material. We emphasize that our DFT-GNN approach helps predict DFEs as a function of *E*_{F}, where the *E*_{F} goes from the PBE-computed VBM to the CBM obtained from the experimental bandgap. Our models can easily be extended to include further shifts in the VBM or CBM as necessary.

Finally, we make a few notes to emphasize the strength and limitations of the present work. Recent publications have shown that DFT computations for point defects in solids often get stuck in high-symmetry local minima,^{73,74} potentially leading to incorrect energies and defect levels. Lower energy defect structures may be found by performing additional geometry optimization after inducing intentional distortions in high-symmetry structures. We hypothesize that our GNN models could accelerate the search for such previously unknown lower energy or metastable defect configurations, by enabling instant energy estimates on hundreds of possible structures with distortions or atomic displacements, similar to the approach in this work for performing ALIGNN-optimization. Despite the inaccuracy of semi-local GGA functionals in predicting the electronic bandgap and band edges, the predicted defect structures are expected to be highly reliable, and the predicted energetics and defect levels would be useful for initial screening. Low energy defect geometries and DFE values predicted from the DFT-GNN models will serve the following purposes:

High-throughput screening of low energy native defects, impurities/dopants, and defect complexes across compounds of interest.

As an input for higher-fidelity calculations, such as using HSE06 or GW,

^{43,46}as well as suitable band edge corrections^{51,52}that may be applied to shift the predicted DFE (E_{F}= 0 eV) values for more accurate E^{f}vs E*F*plots.As an input for assigning experimental phases in real-life semiconductor applications, such as via linear combination analysis using simulated and measured x-ray absorption spectra.

^{75,76}As the impetus for continuous model improvement by adding more higher accuracy data and training multi-fidelity models, as well as expanding the model’s reach to other semiconductor classes and defect types.

## VII. CONCLUSIONS

In this work, we used state-of-the-art crystal graph-based neural networks to develop predictive models for defect formation energies in a chemical space of zinc blende semiconductors, by learning from a substantial computational dataset containing optimized and partially optimized geometries. Three established GNN techniques, namely CGCNN, MEGNET, and ALIGNN, are tested in this work. The ALIGNN scheme shows the best prediction performance and is capable of high-accuracy prediction for native defects, impurities, complexes, and defects in alloys. While ALIGNN predictions made on hypothetical pristine defect structures deviate significantly from DFT-optimized defect formation energies, we demonstrate an ALIGNN-based defective geometry optimization approach, which helps bridge the gap and bring down errors in predicting charge transition levels. The ALIGNN-unoptimized predictions made for the entire chemical space of $>$12 000 possible defects are released with this article, along with necessary code and training data. We believe that the DFT-GNN approach presented in this work will be highly consequential for screening across optoelectronically active point defects and functional dopants in technologically important semiconductors, even being applicable to all kinds of defect complexes. The next steps would involve developing a package to perform ALIGNN-based defect optimization, expanding the models to other semiconductor classes and higher levels of theories, and testing alternative ML and GNN approaches for further improvement.

## SUPPLEMENTARY MATERIAL

The supplementary material provided contains detailed descriptions of the DFT data and the functioning of GNN models, along with extra figures and tables for supplementary reference.

## ACKNOWLEDGMENTS

A.M.-K. acknowledges the support from the School of Materials Engineering at Purdue University under Account No. F.10023800.05.002, as well as the support from Argonne National Laboratory under Subcontract Nos. 21090590 and 22057223. A.M.-K. also acknowledges insightful discussions with Dr. Mariana Bertoni at Arizona State University, Dr. Prashun Gorai at Colorado School of Mines, and Dr. Maria K. Y. Chan at Argonne National Laboratory. This research used resources from the National Energy Research Scientific Computing Center (NERSC), the Laboratory Computing Resource Center (LCRC) at Argonne National Laboratory, and the Rosen Center for Advanced Computing (RCAC) clusters at Purdue University. P.G. acknowledges IIT Madras for providing financial assistance through the “International Immersion Experience Travel Award” to visit Purdue University. Please note that commercial software is identified to specify procedures. Such identification does not imply a recommendation by the National Institute of Standards and Technology (NIST).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

A.M.-K. conceived and planned the research project. DFT computations and GNN model training were performed by M.H.R., P.G., P.M., and A.M.-K.; S.K.Y., G.P., B.D., and K.C. provided constant guidance and software support for the project and for editing the manuscript. M.H.R. and A.M.-K. took the lead on writing and editing.

**Md. Habibur Rahman**: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing – original draft (equal). **Prince Gollapalli**: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal). **Panayotis Manganaris**: Software (equal); Visualization (equal). **Satyesh Kumar Yadav**: Software (equal). **Ghanshyam Pilania**: Software (equal). **Brian DeCost**: Software (equal). **Kamal Choudhary**: Software (equal). **Arun Mannodi-Kanakkithodi**: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Visualization (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

All DFT data and GNN predictions are included with the supplementary material as .csv files. supplementary material available: https://github.com/msehabibur/defect_GNN_gen_1.

## REFERENCES

*Defect Migration and Diffusion*

*Te*

_{x}_{1–}

_{x}*International Encyclopedia of Education*