We present a new version of the Ogre open source Python package with the capability to perform structure prediction of epitaxial inorganic interfaces by lattice and surface matching. In the lattice matching step, a scan over combinations of substrate and film Miller indices is performed to identify the domain-matched interfaces with the lowest mismatch. Subsequently, surface matching is conducted by Bayesian optimization to find the optimal interfacial distance and in-plane registry between the substrate and the film. For the objective function, a geometric score function is proposed based on the overlap and empty space between atomic spheres at the interface. The score function reproduces the results of density functional theory (DFT) at a fraction of the computational cost. The optimized interfaces are pre-ranked using a score function based on the similarity of the atomic environment at the interface to the bulk environment. Final ranking of the top candidate structures is performed with DFT. Ogre streamlines DFT calculations of interface energies and electronic properties by automating the construction of interface models. The application of Ogre is demonstrated for two interfaces of interest for quantum computing and spintronics, Al/InAs and Fe/InSb.

## I. INTRODUCTION

Epitaxial inorganic interfaces play a crucial role in a wide range of modern day electronics, including semiconductor, spintronic, and quantum devices. For example, heterostructures of superconductors and semiconductors are considered as promising materials for the realization of fault tolerant quantum computing schemes based on Majorana Fermions.^{1–10} Other systems of interest are interfaces between ferromagnetic and semiconducting materials, which have been used in spintronic devices, such as spin-filters and spin-valves.^{11–16} The functionality of such devices derives from the electronic and magnetic properties of the interface, which strongly depend on its structure at the atomistic scale. Different interface configurations may produce different properties.^{15,17–19} Moreover, defects and disorder may be detrimental to the functionality of a device. In particular, quantum devices may be extremely sensitive. Therefore, high-quality interfaces with precisely controlled structure and properties are required.

To grow high-quality interfaces, molecular beam epitaxy is often employed.^{20} If the lattice parameters and symmetries of the materials are closely matched, the film will strain to match the substrate and grow pseudomorphically. Three growth modes are possible:^{12,21,22} two-dimensional, mono-layer by mono-layer growth via the Frank–van der Merwe (FM) mode, three-dimensional island growth via the Volmer–Weber (VW) mode, and two-dimensional growth followed by three-dimensional growth via the Stranski–Krastanov (SK) mode.^{23} The balance between the surface energy of the film, *γ*_{film}, the surface energy of the substrate, *γ*_{sub}, and the interface energy, *σ*, determines which growth mode is favored.^{12,21,23} For the preferred FM growth to occur, the sum of the surface energy of the film and the interface energy must be smaller than the substrate surface energy,

This will always be true for homoepitaxy because *γ*_{sub} = *γ*_{film} and *σ* = 0. In the case of heteroepitaxy, where *γ*_{sub} ≠ *γ*_{film} and *σ* ≠ 0, the film is strained and the excess strain energy accumulates in the *σ* term as the film grows. If the FM condition is not met initially, VW growth will take place, and if the FM condition is met initially but is broken after a certain number of layers, SK growth will occur.^{23}

An example of psuedomorphic growth can be found in the closely lattice matched heteroepitaxial interface of CdTe and HgTe.^{24,25} Both materials assume the cubic zinc-blende structure and the $F4\u03043m$ space group, with lattice parameters of 6.45 and 6.48 Å, respectively, amounting to a lattice mismatch of 0.465%.^{26} For such a heterostructure, determining the orientation of the film is trivial because the film will grow in the same direction as the substrate to minimize the lattice strain. If the lattice parameters of the film and substrate do not closely match, the film may assume different orientations. Domain matching epitaxy may occur, where the lattices are matched via domains that contain integer multiples of film and substrate lattice plane spacings.^{21,22,27–31} For example, if the film has a lattice parameter that is 75% of the substrate, then four lattice planes of the film will match with three lattice planes of the substrate for so-called cube-on-cube growth in the [001] direction. The effective strain, *ε*_{eff}, produced by domain matching is given by^{21,22,27–31}

where *d*_{f} and *d*_{s} are the film and substrate lattice plane spacings and *m* and *n* are the integer multiples required to match the domains. For many heterostructures, *ε*_{eff} may be small for several orientations of the film, giving rise to the coexistence of multiple domains.^{17,28,30}

For example, one of the most studied superconductor/semiconductor interfaces for Majorana-based quantum computing is Al deposited on InAs nanowires.^{1–4,8,10} Al and InAs are not lattice matched. When wurtzite InAs nanowires grown in the ⟨0001⟩ direction serve as the substrate, face-centered cubic Al has been observed to grow on the ${11\u030400}$ InAs facets with both ⟨111⟩ and $\u27e8112\u0304\u27e9$ out of plane directions. For the ⟨111⟩ interface, there is a 7:5 match between the lattice planes in the Al $[112\u0304]$ direction and the InAs [0001] direction, as well as a 3:2 match in the lattice planes between the Al $[11\u03040]$ and the InAs $[112\u03040]$ directions, resulting in effective strains of −0.5% and 0.3%, respectively. The lattice matched domains for the $\u27e8112\u0304\u27e9$ interface are a 1:1 match between the lattice planes in the Al [111] direction and the InAs [0001] direction, as well as a 3:2 match in the lattice planes between the Al $[11\u03040]$ direction and the InAs $[112\u03040]$ direction, both resulting in effective strains of 0.3%.^{30} Because both growth directions of Al result in low effective lattice strain via domain matching epitaxy, they are both likely candidates for the interface structure. The $\u27e8112\u0304\u27e9$ orientation is preferred at larger thicknesses of Al and forms a faceted surface, whereas the ⟨111⟩ orientation is preferred at smaller thicknesses and forms a planar surface. At an intermediate thickness range, domains of the two orientations coexist.

The structure of interfaces may be characterized experimentally by observing a cross section with high resolution electron microscopy.^{17,22,30,32,33} However, the exact alignment of the substrate and film may be difficult to determine precisely, especially if there are multiple domains present at the interface. Computer simulations may help interpret experiments and assist in the structural characterization of experimentally grown interfaces.^{17,33,34} Beyond interpreting experiments, computer simulations may aid in the search for new materials systems for semiconductor, spintronic, and quantum devices. The structure and properties of interfaces comprising various materials combinations may be predicted theoretically to guide synthesis efforts in the most promising directions. For example, simulations can be useful for epitaxial stabilization of metastable crystal structures with desirable properties.^{35–38} Therefore, it is imperative to develop accurate and efficient methods for structure prediction of epitaxial inorganic interfaces.

While structure prediction of inorganic crystals is fairly well-established,^{11,39–42} there has been less work done on structure prediction of interfaces.^{43–49} Several of the codes developed for interface structure prediction,^{45,46,50,51} including the one introduced here, rely on implementations of the lattice matching algorithm by Zur and McGill,^{31} which works by generating domain-matched superlattices between the film and the substrate within user-defined mismatch and area tolerances. Once candidate interface structures are generated, their stability must be evaluated. Classical force fields are often used for this purpose,^{43,44,47} however, general-purpose force fields may lack the accuracy to resolve small energy differences between interface configurations with similar stability.^{52,53} To achieve the required accuracy, *ab initio* density functional theory (DFT) can be utilized to calculate the interface energies.^{54–57} However, the increase in accuracy entails an increase in computational cost. Owing to quantum size effects, a large number of layers of each material must be included in the interface model to converge its properties.^{19,58,59} Moreover, domain-matched interfaces may require large supercells. This often amounts to models containing hundreds of atoms. Depending on the mismatch and area tolerances, many candidate interface structures may be produced by the lattice matching algorithm. Additionally, once a commensurate interface is identified, the separation between the substrate and the film at the interface, as well as the registry in the plane of the interface, may still be varied. Therefore, surface matching should be performed to find the optimal configuration(s). This may require sampling hundreds of points in the 3D search space. Therefore, a fast pre-screening method is useful in reducing the number of candidate interface structures to be considered by DFT.^{45,46} A solution proposed by Raclariu *et al.*^{45} is a score function that ranks structures based on nearest neighbor distances and electronegativity differences at the interface, where bond lengths closer to the ideal bond length of a material and larger electronegativity differences between neighboring atoms lead to better scores.

Here, we introduce a new version of the open source Python package, Ogre, which was previously developed to generate surface models of molecular crystals and streamline the calculation of surface energies and Wulff shapes.^{60} We have implemented in Ogre a new functionality of predicting the structure of epitaxial inorganic interfaces by lattice and surface matching. We note that Ogre does not take into account growth conditions and kinetics, which may lead to the formation of different structures.^{61,62} In addition, Ogre does not take into account interdiffusion, which may lead to the formation of substitutional impurities and interface phases.^{15,33} Similar to Refs. 45 and 46, the workflow of Ogre begins by using Zur and McGill’s lattice matching algorithm and proceeds to perform surface matching. A new score function is proposed to optimize and rank candidate interface structures from purely geometric considerations without performing any energy evaluations. The score function is based on the overlap and empty space between atomic spheres at the interface. It is efficiently implemented using tensor algebra and demonstrated to correctly reproduce the extrema of the DFT potential energy surface. Bayesian optimization (BO) is then performed to explore the 3D space of interfacial distance and registry and find the most stable interface configurations. The optimal configurations of all domain-matched interfaces are ranked using a geometric score function. In the final stage, a small subset of the most promising candidate structures are evaluated with DFT. Ogre streamlines the convergence of the interface thickness and the evaluation of interface energies with DFT. The application of Ogre is demonstrated for interfaces of interest for quantum computing and spintronics, including Al/InAs and Fe/InSb.

## II. CODE DESCRIPTION

Ogre is written in Python 3 and utilizes the Python Materials Genomics (pymatgen)^{50} and Atomic Simulation Environment (ASE)^{51} libraries. The package is available for download from www.noamarom.com under a BSD-3 license. The inputs to the code are the bulk structures of the substrate and film materials, as well as a configuration file that contains user-specified settings. Ogre supports several common formats of input structure files, including crystallographic information files (CIFs), the geometry.in format of the FHI-aims code,^{63} and the POSCAR format of the Vienna *ab initio* Simulation Package (VASP).^{64–68} An overview of the workflow of interface structure prediction with Ogre is shown in Fig. 1. The hierarchical screening workflow of Ogre comprises three main steps: lattice matching (Sec. II A), surface matching (Sec. II B), and final ranking with DFT (Sec. II C).

The lattice matching step identifies all domain-matched supercells of the substrate and film within the user-defined misfit and area tolerances. If there are several possible orientations of the substrate and film, a Miller index scan may be performed. The input parameters for lattice matching are the maximum interface area, the misfit tolerance, and the Miller indices to be considered for the substrate and film. The output of the lattice matching step is a list of structures sorted by their super-cell area misfit values. We note that the area misfit is a more stringent criterion than the effective strain along one direction. From an experimental perspective, robust epitaxial growth usually occurs with misfit values below 1%.^{69} Therefore, the default criteria for selecting the interfaces that proceed to the surface matching step are supercell area misfit, lattice vector length misfit, and angle misfit below 1%. The selection criteria may be modified by user input.

The surface matching step uses a geometric score function based on the overlap and empty space between atomic spheres at the interface to find the optimal distance in the *z* direction and registry in the *xy* plane between the substrate and the film. For surface matching and ranking with the score function, the Hirshfeld radii of each chemical species must be specified in the configuration file. To this end, the Ogre radii calculation module streamlines the calculation of the Hirshfeld^{70} radii using the FHI-aims code. If no Hirshfeld radii are provided, Ogre will automatically use the van der Waals radii tabulated in the pymatgen periodic table module. The score function serves as the objective function for Bayesian optimization to explore the 3D space above the substrate by shifting the film in *x*, *y*, and *z* directions to optimize the structure of generated lattice matched interfaces. Subsequently, the optimized structures are ranked based on the deviation of the overlap between species at the interface from the respective bulk structures. The optimized interfaces are sorted based on their predicted stability, and users can select a certain percentage to output.

For the most promising structures, the interface energy is calculated with DFT. Ogre includes a module for streamlining interface energy calculations with DFT. Interface slab models are constructed with a user-defined number of layers and a vacuum region. Automatic passivation with pseudo-hydrogen atoms can be applied to terminate dangling bonds at the surfaces. An option of generating periodic heterostructure models without a vacuum region is also available. The number of layers of each material is converged by adding layers of either material to the interface model. For surface energy calculations, the linear method has been found to exhibit good convergence behavior.^{71,72} Here, we use a modified linear approach for the calculation of interface energies. Finally, for the most stable interface structures, further analysis may be performed with DFT, including structural relaxation and calculation of electronic properties.

### A. Lattice matching

The workflow of Ogre’s lattice matching module is illustrated in Fig. 2. The algorithm proposed by Zur and McGill^{31} is utilized to identify matching supercells of the film and substrate. First, bulk structures are cleaved along the specified Miller planes using ASE to determine the basis vectors of the substrate and film surface slab models. The resulting surface basis vectors are reduced to a pair of primitive basis vectors using pymatgen to obtain a unique representation of two-dimensional lattices, which is necessary for the comparison of substrate and film lattice properties. Second, the pymatgen substrate analyzer module (which follows the Zur and McGill algorithm) is utilized to generate all the transformation matrices that would produce lattice matched super-cells with a low misfit of lattice vector lengths and angles. Third, a reduction scheme is used to find all unique commensurate interfaces of a given system. We note that the algorithm of Zur and McGill is not restricted to interfaces between cubic lattices.^{31} Finally, the interface structure is constructed. By default, the lattice parameter of the substrate is fixed and the film layer is strained to match the substrate to simulate an epitaxial growth experiment (the user may choose to apply strain to the substrate). Ogre creates the matching substrate and film super-cells and aligns the atomic coordinates to build a lattice matched interface with specified structural properties, including the interfacial distance in the *z* direction, the shifts in the *xy* plane, the number of layers, and the amount of vacuum.

Ogre’s lattice matching module takes three sets of input parameters that determine the matching criteria and interface structural properties: the substrate and film Miller indices; maximum super-cell area; and misfit tolerances for the unit cell area, length, and angle. The default for all misfit tolerances is one percent, and the default maximum super-cell area is 500 Å^{2}. Specifying a larger maximum area or higher misfit tolerances would generate additional candidate interfaces; however, they are likely to be less stable. The user may also specify parameters for constructing the interface model, such as the amount of vacuum, the interfacial distance, the shift in the *xy* plane, and the number of layers. The default values of the vacuum region and interfacial distance are 40 and 2 Å, respectively. By default, no *xy* shifts are applied to the initial interface structure. The optimal registry is later found by Ogre’s surface matching module. The substrate and film thickness can be modified by specifying the number of layers or a range of values to calculate. The surface termination may be defined by the user. If a surface termination is not set by the user, Ogre can identify all possible terminations and automatically generate the corresponding interfaces. For example, for InAs(111), the user may specify an As-terminated or In-terminated surface; otherwise, Ogre will generate both.

If the orientation of the substrate and film is known, the user may specify the corresponding Miller indices in the configuration file. In some cases, several substrate orientations are possible and/or the film orientation is not known. For example, when a superconductor is grown on top of a semiconductor nano-wire, growth on different facets may result in different orientations.^{30,73} In such cases, a Miller index scan may be conducted, where lattice matching is performed for each combination of substrate and film Miller indices. The Miller index search module takes the maximum single index as input and finds all possible symmetrically unique Miller indices as described in Ref. 60. For example, Fig. 3 shows the results of a Miller index scan for an Al/InAs interface with a maximal Miller index of 2. The results were generated using a maximum interface area of 500 Å^{2} and a misfit tolerance of 2%. Figure 3(a) displays a histogram of the number of interfaces generated for each combination of Miller indices. In total, 116 candidate lattice matched interfaces are generated with these settings. Figure 3(b) shows the minimum misfit percentage obtained for each interface orientation. In this case, seven of the ten possible orientations found in the Miller index scan have a misfit under 1%. Further screening of these structures is performed in Sec. IV.

### B. Surface matching

Once commensurate interface supercells are identified via lattice matching, surface matching is performed to determine the optimal distance in the *z* direction and registry in the *xy* plane between the substrate and the film.^{45,46} Owing to the size of the configuration space to be searched (the number of lattice matched candidates multiplied by the number of displacements to be sampled in the *x*, *y*, and *z* directions for each interface), it is desirable to avoid computationally expensive DFT calculations. Therefore, we have developed a geometric score function. In Ref. 74, it has been shown that a simple geometric model based on the overlap of circles captures the main features of the DFT potential energy surface for the interlayer sliding of hexagonal boron nitride. Here, we define a score function based on the overlap and empty space between atomic spheres at the interface.

The workflow of Ogre’s surface matching module is shown in Fig. 4. After importing the interface structures constructed in the lattice matching step, the user may choose to utilize the geometric score function or DFT to perform surface matching. Bayesian optimization is then used to efficiently scan the 3D parameter space to find the optimal position of the film above the substrate. Finally, if the geometric score function is selected, a geometry-based metric is used to rank the optimized structures based on their predicted stabilities. If DFT is selected, the structures are ranked based on their DFT energy. The optimized interface structures are written in the appropriate geometry file format (e.g., POSCAR for VASP).

#### 1. Geometric score function

The score, *S*, is defined as

where the scaled overlap is defined as the overlap volume divided by the total volume occupied by atoms at the interface,

and the scaled empty space is defined as the volume of empty space divided by the unit cell volume,

We note that the unit cell volume changes when the interfacial distance is changed along the *z* direction. The volume of the vacuum region is subtracted from the total unit cell volume. The number of substrate/film atomic layers considered in the calculation of the interface overlap and empty space depends on the surface roughness of both slabs. Ogre determines the required substrate and film thickness by increasing the number of layers until the surface roughness is converged. The effective atomic volume of each element, used for the calculation of the score function, is a system dependent parameter. Here, it is obtained using Hirshfeld partitioning of the bulk material’s DFT charge density. Ogre’s volume determination module streamlines the calculation of Hirshfeld volumes with the FHI-aims code and converts them into scaled Hirshfeld radii. The scaled Hirshfeld radius of species M, *R*_{M}, is given by

where *D*_{MX} is the minimum distance between species M and X in the corresponding bulk crystal structure and *H*_{V,M/X} is the Hirshfeld volume of species M/X in the bulk crystal structure. The species radii for the substrate and film are calculated separately, so if a certain species exists in both the substrate and film slabs (e.g., Te in the SnTe/CaTe interface), two different radii are considered. If there are more than two species in the bulk crystal, the resulting Hirshfeld radii for each element are averaged.

To calculate overlap and empty space at the interface, Ogre applies an efficient vectorized method based on mesh voxelization, similar to the method of calculating molecular volumes in Ref. 75. Voxel representations of the substrate and film slabs are generated and stored as two 3D binary matrices, in which voxels occupied by atoms are assigned a value of one and empty voxels are assigned a value of zero. The default voxel volume is 0.001 Å^{3}. This produces a sufficiently fine grid to sample the atomic surface accurately. The user may set a different voxel size. For the calculation of empty space and overlap, logical NOR and logical AND operations are performed on the two binary matrices, which results in overlap and empty space matrices. To calculate the overlap volume and empty space in Å^{3}, the voxel volume is multiplied by the number of “one” entries in the overlap and empty space matrices, respectively. Illustrations of the voxel tensors are shown in the supplementary material.

In Eq. (3), the scaled overlap and empty space terms effectively serve the function of repulsive and attractive terms, respectively. The effective repulsion term is squared to reflect the short-ranged nature of repulsive forces. The coefficient *c* balances the weights of the overlap and empty space terms. The value of *c* significantly affects the performance of the score function. For example, Fig. 5 shows the score as a function of the interfacial distance obtained with different values of *c* for interfaces of Al(100)/InAs(100) and SnTe(111)/CaTe(111). DFT total energy curves are shown for comparison with the minimum energy of each curve referenced to zero (for DFT settings; see Sec. III). The interface distance is calculated by subtracting the height (z value in Cartesian coordinates) of the top atom of the substrate slab from the height of the bottom atom in the film slab.

We have developed a procedure for finding the optimal value of *c* without relying on DFT calculations. We define Δ_{S}(*c*) as the difference between the asymptotic and minimal score values obtained with a given value of *c*, *δ*_{S}(*c*), divided by the difference between the maximal and minimal score values obtained with *c* = 0, *δ*_{S}(0), as illustrated in Fig. 5,

Increasing *c* leads to an increase in Δ_{S}(*c*) and a decrease in the optimal interface distance produced by the score function. We have found empirically for several orientations of four different interfaces (Al/InAs, SnTe/CaTe, NiMnIn/InAs, and Fe/InSb) that the optimal distance produced by the score function is closest to the DFT result when Δ_{S}(*c*) is 0.1. Figure 5 shows examples for the Al(100)/InAs(100) interface and the SnTe(111)/CaTe(111) interface. Additional examples are shown in Sec. IV and in the supplementary material. Therefore, to find the optimal *c* coefficient for a certain interface, Ogre starts from *c* = 0 and incrementally increases *c* until Δ_{S}(*c*) reaches 0.1. As the film is shifted in the *xy* plane, the optimal *z*-distance and *c* parameter may also vary. Examples are shown in the supplementary material. This requires Ogre to find an optimal *c* parameter that performs well over the entire search space. To this end, Ogre first generates a preliminary 2D score contour at the initial user-defined interface distance to locate the positions of the minimum and maximum. The *c* coefficient optimization process is then performed for the structures with the minimal and maximal score values. Finally, the value of the two *c* coefficients obtained for extrema of the score contour is averaged and used for the 3D surface matching. The averaged value of *c* is found to be 0.43 for the Al(100)/InAs(100) interface and 0.54 for the SnTe(111)/CaTe(111) interface.

To validate the score function, we compare its results to DFT. Figure 6 shows contour plots of the score obtained with the optimal value of *c*, compared with the DFT potential energy surface as a function of the displacement in the *xy* plane at a fixed interfacial distance of 2.2 Å for the Al(100)/InAs(100) interface and 2.0 Å for the SnTe(111)/CaTe(111) interface. In both cases, the score function reproduces well the features of the DFT potential energy surface and the positions of the extrema. An additional example for a ternary compound is provided in the supplementary material.

#### 2. Bayesian optimization

Once the atomic radii and the *c* parameter of the score function are determined, a three-dimensional search is performed to find the (*x*, *y*, *z*) coordinates of the global minimum structure. To this end, we use Bayesian optimization (BO).^{76} BO is a machine learning algorithm that efficiently samples points from a black-box objective function to find the global optimum. The objective function is first estimated by a Bayesian statistical model using a Gaussian process prior.^{77} An acquisition function is then used to predict the optimal position for the next point to be sampled, and the prior is updated with the new information to create a new surrogate function called the posterior. Once the posterior is calculated, it becomes the new prior. The process is repeated for a specified number of steps or until convergence is achieved. BO is a superior optimization technique compared to a grid search because it maximizes the information gained about the black-box function while sampling a minimal number of points.

To perform Bayesian optimization, Ogre utilizes the Bayesian-optimization Python package.^{78} Here, the BO objective function for surface matching is defined as the negative of the score value or the negative of the DFT total energy such that the score/energy value is minimized by maximizing the objective function. If the DFT total energy is used as the objective function, Ogre is compatible with FHI-aims and VASP. The DFT calculation settings are read from an input file provided by the user. Ogre automatically performs the necessary DFT calculations and analyzes the results.

The parameter space to be searched is defined by the default bounds for shifts in the *x*, *y*, and *z* directions, which are (0, *a*), (0, *b*), and (*d* − 1 Å, *d* + 1 Å), respectively, where *a* and *b* are the interface unit cell lattice parameters in the *xy* plane and *d* is the initial interface distance. To determine the next point ($r\u20d7$) to be sampled in each iteration, the upper confidence bound acquisition function^{76,77,79} is used,

where *μ* and *σ* are the mean and standard deviation at each point and *κ* is a hyperparameter that controls the trade-off between exploration and exploitation. By default, *κ*, is set to 5 and the number of iterations, *N*, is set to 100. Both parameters can be modified through the surface matching settings file. Once the maximal number of iterations is reached, the code outputs the most stable structure, as well as any structures whose score or DFT energy is within a user-defined tolerance of the minimum.

The computer time required for BO is the product of the number of points sampled and the cost of function evaluation, which may be the cost of a single DFT calculation or a single score function evaluation. Evaluation of the geometric score function is extremely fast because it requires no quantum mechanical calculations. Scaling plots for the score function calculation are provided in the supplementary material. Although BO is the default and recommended optimization method in Ogre, a grid search option is available, e.g., to generate potential energy surfaces or binding energy curves. BO is significantly more efficient than a grid search. For example, for the Fe/InSb interface (see Sec. IV B) with a lattice parameter of 10.3 Å, a grid search with a step size of 0.2 Å and a 1 Å range for the *z* axis would require evaluating more than 12 500 points. In comparison, the BO algorithm converged to the optimal interface within 200 iterations. After the structure optimization is completed, the optimized interface structures may be exported to input geometry files for DFT calculations or proceed to Ogre’s structure ranking module.

#### 3. Preliminary ranking

Surface matching is performed for every candidate structure passed from the lattice matching step, which may still amount to a large number of structures. Therefore, it is desirable to perform preliminary ranking in order to select a smaller number of the most promising candidate structures for the final evaluation with DFT. The geometric score function used for surface matching cannot be used for structure ranking because the score function parameter, *c*, is optimized for each interface such that interfaces between the same materials with different orientations may have different *c* parameters. A ranking score function, *R*, is formulated based on the assumption that a stable interface is more likely to form when the chemical environment at the interface is similar to that of the bulk materials. The scaled overlap, defined in Eq. (4), is used as a measure of similarity between the bulk and the interface (int) environment for the film and substrate (sub),

$O\u0304slab$ is calculated for unstrained film and substrate slab structures with the same orientation as the interface such that the interface strain is taken into account in the ranking. $O\u0304int$ is calculated for the first few layers of the film and substrate at the interface, as shown in Fig. 7. Atoms that are not in the vicinity of the interface do not affect the surface contours (see the supplementary material for an example of the voxel representation of the surface contour). Ogre automatically finds the minimum number of layers required for full representation of the substrate and film surface contours at the interface. A smaller value of *R* means that the chemical environment at the interface is more similar to that of the bulk materials. Hence, the structure with the lowest value of *R* is expected to be the most stable.

In Fig. 8, the ranking score is compared to DFT interface energies (calculated as described in Sec. II C) for the six most stable candidate structures of the Al(011)/InAs(001) interface. The ranking score, which is based on purely geometric considerations, successfully reproduces the ranking order obtained with DFT and identifies the most stable candidate structures. The ranking score is found to perform similarly well for several additional interfaces, as shown in Sec. IV. Based on the ranking score, the list of structures selected to proceed to the final DFT ranking stage may be narrowed down.

### C. DFT ranking

#### 1. Automated surface passivation

In DFT calculations of surface and interface slab models, dangling bonds on the surface are terminated by pseudo-hydrogen atoms with appropriate fractional charges to avoid surface states.^{80–82} Ogre automates the surface passivation during the construction of interface models, as illustrated in Fig. 9. First, one additional layer is added to the top and bottom surfaces of the slab. Second, vectors in the directions of the dangling bonds are constructed by performing a nearest neighbor search for the atoms on the surface to obtain the proper coordination. Third, the relative positions of each neighboring atom are converted from Cartesian to spherical coordinates and the pseudo-hydrogen is inserted between the covalent radii of the two atoms. The charge of the pseudo-hydrogen (*q*_{H}) is calculated using the number of valence electrons of the atom being replaced by pseudo-hydrogen (*N*_{v}) and the coordination of that atom (*X*),

The spherical coordinates are then converted back to Cartesian coordinates. Finally, the top and bottom atomic layers are removed, leaving only the pseudo-hydrogen passivation layer on the surface. After the passivated structure is generated, the pseudo-hydrogen positions should be relaxed using DFT. For computational efficiency, this is performed using a structure with a small number of layers and the relaxed pseudo-hydrogen positions are subsequently transferred to larger slab models.

#### 2. Interface energy evaluation

The interface energy, *σ*, is defined as the energy of eliminating two surfaces and creating an interface,^{83–86}

where *W*_{ad} is the adhesive energy of the interface and *γ*_{sub/film} are the surface energies of the substrate and film, calculated with the OgreSwamp module of Ogre,^{60} using the linear method,^{71,72}

where *E*_{slab} and *E*_{bulk} are the DFT total energies of a surface slab with *N* layers and a bulk unit cell, respectively, and *A* is the cross-section area of the interface. We note that *γ*_{sub/film} must be recalculated if the respective lattice is strained (by default, only the film is strained in Ogre). Examples are provided in the supplementary material. The adhesive energy of an interface is given by^{54–57,85,86}

where *E*_{sub}, *E*_{film}, and *E*_{int} are the DFT total energies of the substrate slab, the film slab, and the interface slab, respectively. Because *E*_{sub}, *E*_{film}, and *E*_{int} are dependent on the number of layers in the system, *W*_{ad} must be converged with respect to the number of layers used in the interface slab construction. Often, the number of layers of the substrate and film are converged separately and then used to build the interface model.^{57,83} However, we find that owing to the electronic and magnetic interactions at the interface, the number of substrate/film layers required to converge the interface energy may be different from the number of layers required to converge the surface energy of either material. Similar to the calculation of surface energy, we may define a linear method for the calculation of the interface energy by substituting Eqs. (13) and (12) into Eq. (11),

where *E*_{int} is the total energy of the interface, *N*_{sub} and *N*_{film} are the number of layers of the substrate and film in the interface model, and $Ebulksub$ and $Ebulkfilm$ are the total bulk energies of the substrate and film. Using Eq. (14), the interface energy, *σ*, can be obtained by performing two linear regressions: one where the number of slab layers is held constant and the number of film layers is increased and one where the number of film layers is held constant and the number of slab layers is increased. In each case, after a user-defined number of layers of the film/substrate are added to the interface, linear regression is performed, and *σ* can be extracted from the intercept, *b*,

In these regressions, a straight line is fitted to all the interface total energy data vs the number of layers except for the structures that contain only one atomic layer of the film/substrate, following the method of Refs. 71 and 72. The interface energy is taken as the average of the *σ*_{film/sub} values obtained from both regressions,

The default convergence criterion is that the relative deviation of the averaged interface energy should be within 0.3% with the addition of one layer. We note that the interface energy enables comparing interfaces with different Miller indices, different areas, and different numbers of atoms on the same footing. Hence, this is the preferred metric for the final ranking of interfaces.

Figure 10 shows the interface energy as a function of the number of layers for two structures of the Al(111)/InAs(001) interface. The curves are obtained as described above, using ten layers for the fixed substrate/film in each regression. As expected, both lines converge to a similar value. Additional interface energy convergence plots for Al(111)/InAs(111) are provided in the supplementary material. The workflow of calculating converged interface energies is fully automated in Ogre, and Eq. (1) is used to estimate which growth mode will dominate based on the relative interface and surface energies.

## III. COMPUTATIONAL DETAILS

DFT calculations were performed using the VASP code^{64–68} with the projector-augmented wave (PAW) method.^{64,87} The generalized gradient approximation (GGA) of Perdew, Burke, and Ernzerhof^{88} (PBE) was employed for the description of the exchange–correlation interactions between electrons. For InAs and InSb, a Hubbard-U correction within the Dudarev formalism^{89} was applied to the *p* orbitals of In, As, and Sb. The values of *U*_{eff} were determined by Bayesian optimization^{90} to be *U*_{eff,In} = −0.5 eV and *U*_{eff,As} = −7.5 eV for InAs and *U*_{eff,In} = −0.2 eV and *U*_{eff,Sb} = −6.1 eV for InSb. These U values have produced band structures in good agreement with angle-resolved photoemission spectroscopy (ARPES) for InAs and InSb surfaces.^{19} The Tkatchenko–Scheffler (TS)^{91} pairwise dispersion method was used to account for the van der Waals interactions at the interface. A plane-wave cutoff of 450 eV was adopted. The convergence criterion used in the structural relaxation was for the Hellman–Feynman forces acting on ions to be below 0.001 eV/Å. A k-point mesh of 5 × 5 × 1 was used for self-consistent field (SCF) calculations; and a k-point mesh of 7 × 7 × 1 was used for density of states (DOS) calculations. Key VASP INCAR file tags for convergence were ALGO = Fast, AMIN = 0.01, and BMIX = 3. In all calculations, dipole corrections were applied along the *z* axis^{92} and spin–orbit coupling^{93} was applied with the *z* spin quantization axis. For the Fe/InSb interface, spin-polarized calculations were performed to study the local spin polarization induced in the InSb. The lattice parameters of InAs, InSb, Al, Fe, CaTe, and SnTe were 6.0584, 6.4794, 4.038 93, 2.866, 6.4010, and 6.4002 Å, respectively. For slab models, a vacuum region of 40 Å was added. DFT calculations for Hirshfeld analysis were performed with the all-electron electronic structure code FHI-aims^{63} using the PBE exchange–correlation functional with the Tkatchenko–Scheffler (TS) pairwise dispersion correction,^{91} the light numerical settings, tier 1 basis sets, and a k-point mesh of 8 × 8 × 1. The tag vdw_correction_hirshfeld was used for exporting Hirshfeld analysis results.

## IV. APPLICATIONS

### A. Al/InAs

Based on Fig. 3, the Al(111)/InAs(111), Al(011)/InAs(111), and Al(012)/InAs(012) interfaces have the lowest area misfit values of 0.158%, 0.172%, and 0.158%, respectively. The (111) orientation is more commonly used for InAs substrates in experiments than the (012) orientation. With this substrate orientation, Al(111) results in a minimal interface area of 47.68 Å^{2}, whereas Al(011) results in an interface area of 197.30 Å^{2}. Therefore, we proceed with the Al(111)/InAs(111) interface orientation. As-terminated InAs is chosen based on the experiment reported in Ref. 30.

We obtain three Al(111)/InAs(111) interface structures with interface areas of 47.68, 111.25, and 63.57 Å^{2}, respectively, and an area misfit value of 0.158%. The structures also differ in terms of the film rotation with respect to the substrate. Surface matching was performed for these three structures to find their optimal interface distance and *xy* registry. The score function was validated by comparing to DFT results for a representative structure. As shown in Fig. 11, good agreement is obtained for the optimal registry in the *xy* plane.

The optimized interface structures were ranked using Ogre’s ranking score, and subsequently, their interface energies were calculated with DFT. The interface energies converged with 12 layers of InAs and eight layers of Al. Convergence plots are provided in the supplementary material. Figure 12 shows that the Ogre ranking score successfully predicts the DFT ranking. The lowest energy interface structure is similar to the one observed by tunneling electron microscopy (TEM) in Ref. 30.

Finally, electronic structure calculations were performed for the most stable interface configuration. A user may choose to perform a similar analysis for a larger number of interface structures to study the effect of the interface configuration on the electronic properties. To eliminate the effect of the surfaces, these calculations were performed for a periodic heterostructure with seven atomic layers of Al and 31 atomic layers of InAs. Geometry relaxation was performed for the four atomic layers of InAs and the three atomic layers of Al closest to the interface. The atoms were constrained in the *x* and *y* directions and allowed to relax along the *z* direction. Figure 13 shows the variation of the InAs local DOS as a function of the distance from the interface. The Fermi level position is at the edge of the InAs conduction band (it is possible that the Fermi level would shift into the conduction band for a thicker InAs slab). Close to the interface (e.g., four atomic layers from the interface), a significant density of metal-induced gap states (MIGSs)^{94–96} is found in the gap of InAs. The MIGSs decay gradually with the distance from the interface. 16 layers away from the interface, the bulk DOS of InAs is recovered. We note that due to the quantum size effect, the bandgap of InAs is 0.45 eV, which is 0.14 eV larger than the bulk PBE + U(BO) value of 0.31 eV.^{19}

### B. Fe/InSb

The Fe/InSb ferromagnet/semiconductor interface has been shown to achieve spin-filtering and spin-valve effects.^{11} The Fe/InSb interface is similar to the Fe/GaAs interface, in which spin injection has also been observed.^{15,97} The Fe and GaAs lattice parameters are matched nearly 2:1, leading to coherent “cube-on-cube” growth. In contrast, the lattice parameters of Fe, 2.866 Å, and InSb, 6.4794 Å, result in a lattice mismatch of 56%, making domain-matched epitaxy more likely. To find possible domain-matched interfaces, a Miller index scan was performed with a maximal Miller index of 2, a maximum interface area of 500 Å^{2}, and an area misfit tolerance of 1%. Only Sb-terminated InSb was considered based on the experiment reported in Ref. 11. The results are shown in Fig. 14. Three Miller index pairs are identified to have an area misfit below 1%. The Fe(012)/InSb(012) interface is not an obvious choice from a practical perspective because (012) is not a common orientation for the InSb substrate. The Fe(111)/InSb(111) interface generates only one possible interface structure with the given inputs, making it a trivial example and thus not interesting for demonstration purposes. The Fe(001)/InSb(001) interface produces four possible interface structures with the same cross-sectional area of 104.96 Å^{2} and effective strain of 0.86%. The four configurations differ only by relative rotations of the Fe film on top of the InSb substrate.

For the four Fe(001)/InSb(001) candidate structures, surface matching was performed using the Ogre score function to find their optimal interface distance and *xy* registry. The geometric score function was validated by comparison to DFT for a Fe(001)/InSb(001) interface structure with a smaller area of 45.55 Å^{2} and a mismatch of 1.49%. Figure 15 shows that the score function is in good agreement with the DFT potential energy surface with respect the positions of the extrema and the optimal registry in the *xy* plane. Figure 16 shows that the Ogre ranking score successfully predicts the DFT ranking. The four configurations are very close in energy and may co-exist.

Finally, electronic structure calculations were performed for the most stable interface configuration. To eliminate the effect of the surfaces, these calculations were performed for a periodic heterostructure with 11 atomic layers of Fe and 35 atomic layers of InSb. Geometry relaxation was performed for the four atomic layers of InSb and the three atomic layers of Fe closest to the interface. The atoms were constrained in the *x* and *y* directions and allowed to relax along the *z* direction. Figure 17 shows the variation of the InSb local DOS as a function of the distance from the interface. The Fermi level position is at the edge of the InSb valence band. Close to the interface (e.g., six atomic layers from the interface), a significant density of MIGS is observed in the bandgap of Insb. The MIGSs decay gradually with the distance from the interface. About 18 layers away from the interface, the bulk DOS of InSb is recovered. We note that due to the quantum size effect, the bandgap of InSb is 0.3 eV, which is 0.16 eV larger than the bulk PBE + U(BO) value of 0.14 eV.^{19}

A small magnetic moment of −0.052 *μ*B is induced in the first InSb layer directly in contact with the Fe. The induced magnetic moment decays rapidly and completely vanishes beyond five atomic layers from the interface. Figure 17(c) shows the local spin polarization (the difference between the majority DOS and the minority DOS) as a function of the distance from the interface. Despite the small overall magnetization, the DOS of the InSb is spin-polarized in the same region where the significant presence of MIGS is observed. In this region, the DOS around the Fermi level is dominated by the majority spin channel, which may produce spin-polarized transport in agreement with Ref. 11. At 0.1–0.3 eV below the Fermi level, the DOS is dominated by the minority spin channel. This indicates that spin switching may be achieved by applying a bias.

## V. CONCLUSION

In summary, we have presented a new version of the Ogre Python package with the capability to predict the structure of epitaxial inorganic interfaces by lattice and surface matching. In the lattice matching step, all possible domain-matched interface structures are found within user-defined tolerances for the interface area and lattice mismatch. A Miller index scan is performed to determine the substrate and film orientations that would lead to the most favorable interface. In the surface matching step, Bayesian optimization (BO) is used to find the optimal configuration of each domain-matched interface, in terms of the interfacial distance in the *z* direction and the registry in the *xy* plane. For the BO objective function, we have formulated a geometric score function based on the overlap and empty space between atomic spheres at the interface. We have demonstrated that the geometric score function reproduces the DFT results for the dependence of the energy on the interfacial distance, as well as the features of the DFT potential energy surface in the *xy* plane at a fraction of the computational cost. For preliminary ranking of the optimized interfaces, we have formulated a ranking score based on the similarity between the overlap of atomic spheres at the interface and the overlap in the respective bulk crystal structures. We have demonstrated that the ranking score reproduces the DFT ranking and correctly predicts the order of stability of interface structures. After the lattice and surface matched structures are ranked based on the ranking score, DFT simulations can be performed for a small number of the most promising candidate structures. Ogre streamlines the evaluation of interface energies and calculation of electronic properties with DFT by automating the creation of interface models with a user-defined number of layers. For slab models, Ogre also automates the passivation of dangling bonds at the surface and adds a vacuum region.

We have demonstrated the application of Ogre for two interfaces of interest in relation to quantum computing and spintronics: Al on As-terminated InAs and Fe on Sb-terminated InSb. Based on the results of a Miller index scan in the lattice matching step, the (111) orientation was selected for Al/InAs and the (001) orientation was selected for Fe/InSb. For Al/InAs (111), the top ranked structure produced by Ogre is in agreement with the experimentally observed structure. For Fe/InSb (001), whose structure has not been characterized experimentally, Ogre produces four structures that differ by the relative rotation of the Fe film on top of the InSb substrate. We have investigated the electronic structure of the most stable structures of both interfaces. In both cases, a significant density of metal-induced gap states is found in the semiconductor in the vicinity of the interface, which gradually decays within about 16 atomic layers. For Fe/InSb, although the induced magnetic moment is small and decays rapidly, the MIGSs around the Fermi level are spin-polarized, which may produce spin-polarized transport.

Ogre may be used to interpret the results of experiments conducted on epitaxial inorganic interfaces by identifying the most likely interface configurations and correlating the structures with the observed electronic properties and/or spectroscopic signatures. Moreover, Ogre may be used to predict the structure and properties of putative interfaces and guide synthesis efforts in promising directions. Ogre may be incorporated into an automated materials discovery workflow. Thus, Ogre can advance the understanding of the structure and properties of epitaxial inorganic interfaces, as well as the computational design and discovery of new interfaces for various applications, such as quantum computing and spintronic devices.

## SUPPLEMENTARY MATERIAL

See the supplementary material for additional details on the score function, surface and interface energy convergence, and additional illustrations of interface structures.

## ACKNOWLEDGMENTS

We thank Christian Ratsch from UCLA for helpful discussions on lattice matching. We thank Imanuel Bier from CMU for helpful discussions on the voxel tensor method. This research was funded by the Department of Energy through Grant No. DE-SC0019274. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request. The Ogre code is available for download from www.noamarom.com.

## REFERENCES

_{2}MnSi/GaAs (001) interface

_{2}) and rutile polymorphs of SnO

_{2}on all orientations of columbite CoNb

_{2}O

_{6}substrates

_{2}on (Sr,Ba)TiO

_{3}

_{2}oxide polymorphs for epitaxial growth candidates

_{2}interfaces

_{2}heterogeneous nucleation interface

_{2}heterogeneous nucleation interface

_{2}(110)

_{2}Se

_{3}film on Si(111) with atomically sharp interface

_{2}interface

_{1−x}Co

_{x}alloys