The perovskite insulator SrTiO3 (STO) is expected to be applied to the next generation of electronic and photonic devices as high-k capacitors and photocatalysts. However, reproducible growth of highly insulating stoichiometric STO films remains challenging because precise stoichiometry control is difficult in perovskite oxide films. Here, to grow stoichiometric STO thin films by fine-tuning multiple growth conditions, we developed a new Bayesian optimization (BO)-based machine learning method that encourages exploration of the search space by varying the prior mean to escape suboptimal growth condition parameters. Using simulated data, we demonstrate the efficacy of the new BO method, which reproducibly reaches the global best conditions. With the BO method implemented in machine-learning-assisted molecular beam epitaxy (ML-MBE), a highly insulating stoichiometric STO film with no absorption in the bandgap was obtained in only 44 MBE growth runs. The proposed algorithm provides an efficient experimental design platform that is less dependent on the experience of individual researchers and will accelerate not only oxide electronics but also various material syntheses.
I. INTRODUCTION
The perovskite insulator SrTiO3 (STO) (cubic structure with a lattice constant of 3.905 Å), which has a bandgap of 3.2 eV, is one of the most promising materials for oxide electronics.1–4 It is expected to be applied to high-k capacitors4,5 and photocatalysts6–8 owing to its high dielectric constant of 100–200 at room temperature,4,9 chemical stability,1 almost 100% quantum efficiency of photocatalytic water splitting under ultraviolet (UV) light,6,10 and compatibility with other perovskite oxides.2,11–16 In addition, when it is doped by cation substitution or by introducing oxygen or cation vacancies, many interesting physical properties and phenomena emerge, such as superconducting states,17–20 ferroelectricity,21 high-mobility carriers,22,23 and blue light emission.24 However, mid-gap states originating from off-stoichiometry defects, such as oxygen and cation vacancies, are known to cause leakage current in STO capacitors25,26 and also cause mid-gap absorption that may decrease the photocatalytic activity of STO.27,28 Therefore, to utilize the potential of STO as a high-k capacitor or photocatalyst, it is essential to grow stoichiometric STO epitaxial films without mid-gap states. Since the growth of stoichiometric STO entails fine-tuning of multiple growth conditions, including the supplied flux ratio of Ti to Sr, the growth temperature, and the oxidation strength in the case of molecular beam epitaxy (MBE), only a few papers have reported highly insulating stoichiometric STO films having the same lattice constants as bulk STO and no absorption in the bandgap.29,30
The conventional trial-and-error approach to optimizing the growth conditions is time-consuming and costly, and the reproducibility of the optimization depends on the individual researcher. In contrast, data-driven decision-making approaches have attained high throughput in experiments where machine learning models, such as Bayesian optimization (BO) and artificial neural networks, are incrementally updated with newly measured data.31–41 BO is a sample-efficient approach for global optimization,42 which has proven useful for streamlining the optimization of thin film growth conditions.43–46 However, a technical challenge for growth optimization has remained. Specifically, the search procedure needs to reliably find suitable parameters since the experiments are costly in terms of time, labor, and expense. To this end, exploration needs to be encouraged, especially when the suitable parameter region has a complex shape in the search space. Such a complex-shaped growth parameter space may cause the search method to become stuck at suboptimal growth condition parameters.
In this study, to obtain highly insulating stoichiometric STO films, we develop a new BO method that encourages exploration by adapting the hyperparameter of the prediction model to escape suboptimal parameters. We demonstrate the efficacy of our adaptation first by using simulated data and then through implementation for real materials growth: machine-learning-assisted molecular beam epitaxy (ML-MBE) of STO films. Results of MBE growth and crystallographic analyses of grown samples are accumulated to produce the next growth conditions with BO (Fig. 1). As a result, we obtained highly insulating stoichiometric STO films with lattice constants identical to those of bulk STO in only 44 MBE growth runs. Visible-to-UV spectroscopy shows no optical absorption in the bandgap. The reproducible growth of highly insulating stoichiometric STO films will contribute to the development of the next generation of electronic and photonic devices.
Flow of ML-MBE growth using BO. (a) Schematic illustration of our multisource oxide MBE setup. EIES: Electron Impact Emission Spectroscopy. (b) X-ray diffraction (XRD) θ–2θ scan of the STO film with a Δc of 0.045 Å, as an example. STO sub. means the peak from the substrate. (c) Growth conditions for four samples, as an example. (d) Two-dimensional plots of EI values at the O3-nozzle-to-substrate distance of 65.5 mm obtained from the collected data for 27 samples, as an example.
II. METHODS
A. Bayesian optimization with adaptive prior mean
This section outlines how BO tackles the optimization problem and how we adapt its prior mean function. Detailed formulations are presented in Appendix A. BO is a method for optimizing a black box function of the form y = f(x), where the function f is unknown and expensive to evaluate, given a D-dimensional input x in a specified search space. In materials growth optimization, x and y represent the growth parameters and the physical properties used to evaluate grown materials, respectively. Examples of physical properties include electrical resistance and x-ray diffraction intensity. BO searches the parameter space by repeating the following steps: (i) construct a prediction model based on the Gaussian process (GP)47 given the past n observations (x1, y1), ..., (xn, yn); (ii) evaluate an acquisition function to find a promising x′ that is likely to give a good function output; and (iii) evaluate the new point xn+1 = x′ and acquire its function value yn+1 to update the prediction model using the resulting n + 1 observations.
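Steps (i)–(iii) can be sketched as a short loop. The following is a minimal illustration on a one-dimensional toy problem, not the implementation used in this work: it assumes a zero prior mean (the baseline setting), a squared-exponential kernel with a hand-picked length scale, a maximization objective, and a discrete grid of candidate points. The names `rbf`, `gp_predict`, `expected_improvement`, and `bayes_opt` are hypothetical.

```python
import numpy as np
from math import erf, sqrt, pi


def rbf(a, b, length=0.2):
    # Squared-exponential kernel between two 1D arrays (assumed kernel choice).
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)


def gp_predict(x_obs, y_obs, x_grid, noise_var=1e-4):
    # Step (i): GP posterior mean and standard deviation with zero prior mean.
    K = rbf(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, y_best):
    # Step (ii): expected improvement over the best value so far (maximization).
    z = (mean - y_best) / std
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (mean - y_best) * cdf + std * pdf


def bayes_opt(f, x_grid, n_init=3, n_total=15, seed=0):
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(x_grid), size=n_init, replace=False))
    y = [f(x_grid[i]) for i in idx]
    while len(idx) < n_total:
        mean, std = gp_predict(x_grid[idx], np.array(y), x_grid)
        ei = expected_improvement(mean, std, max(y))
        ei[idx] = -np.inf              # never re-evaluate a visited grid point
        nxt = int(np.argmax(ei))
        idx.append(nxt)                # step (iii): evaluate and add to the data
        y.append(f(x_grid[nxt]))
    return x_grid[idx], np.array(y)


# Toy objective with a single maximum at x = 0.7 (purely illustrative).
xs, ys = bayes_opt(lambda x: -(x - 0.7) ** 2, np.linspace(0.0, 1.0, 101))
print("best x found:", xs[np.argmax(ys)])
```

In a growth experiment, the call to `f` would correspond to one growth run followed by its characterization, which is why the number of loop iterations is the quantity to minimize.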
The GP uses a prior mean function η(x) for predicting the value of f(x′) at an unseen x′. This function is typically fixed as η(x) = 0 or a certain constant42,48,49 throughout the BO iterations because of the black box nature of f. Our method still employs a constant function of the form η(x) = m0, but it adaptively calculates the hyperparameter m0 using past observations. Figure 2 visualizes an example of how different choices of m0 change the predicted outcome at unseen parameters (see also Subsection 2 of Appendix A). This example uses the two-dimensional Ackley function f(x) [Fig. 2(a)], where the search so far has focused on the left-top region. Since this function has four isolated peaks with different heights, BO needs to explore the search space instead of persisting at one of the peaks. In the example in Fig. 2, the m0 = 0 case keeps searching the skirt of the suboptimal left-top peak [Fig. 2(d)]. In contrast, the m0 ≈ 0.2 case jumps into the unexplored left-bottom area [Fig. 2(e)]. While a greater m0 encourages exploration, fixing m0 at a large value may not always be efficient: a large m0 tends to produce an optimistic prediction in unexplored parts of the parameter space. This can trigger unnecessary exploration, leaving the best observed value stuck on a suboptimal plateau. Therefore, the choice of m0 needs to balance exploration and exploitation in the parameter search.
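The role of m0 can be seen directly in the standard GP posterior equations: far from any observation, the predicted mean reverts to m0 and the predicted variance reverts to the prior variance. The sketch below assumes a Matérn kernel with smoothness 5/2, unit prior variance, and a hand-picked length scale; the text only states that a Matérn kernel was used, so these choices and the function names are assumptions.

```python
import numpy as np


def matern52(a, b, length=0.3):
    # Matérn-5/2 kernel with unit variance; a and b have shape (n, D).
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)) / length
    return (1.0 + np.sqrt(5.0) * d + 5.0 * d ** 2 / 3.0) * np.exp(-np.sqrt(5.0) * d)


def gp_posterior(X, y, Xq, m0=0.0, noise_var=0.01, length=0.3):
    # Posterior mean and variance at query points Xq for a constant prior mean m0.
    y = np.asarray(y, dtype=float)
    K = matern52(X, X, length) + noise_var * np.eye(len(X))
    Ks = matern52(Xq, X, length)
    mean = m0 + Ks @ np.linalg.solve(K, y - m0)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)


# Two observations near one corner and one query point far away from both.
X = np.array([[0.0, 0.0], [0.5, 0.5]])
y = np.array([0.1, 0.3])
far = np.array([[10.0, 10.0]])
for m0 in (0.0, 0.2):
    mean, var = gp_posterior(X, y, far, m0=m0)
    print(f"m0 = {m0}: predicted mean far from data = {mean[0]:.3f}")
```

Because the acquisition function rewards a high predicted mean, raising m0 makes unexplored regions look more attractive, which is the mechanism behind the jump seen in Fig. 2(e).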
(a) Ackley function in two-dimensional parameter space [x1, x2]. Among the four peaks, the left-bottom one at [0,0]⊤ gives the largest value. (b), (c) Predicted mean m(x) and (d), (e) acquisition function aEI(x) with 20 observations depicted with × marks. In (d) and (e), the ⋆ mark represents the position of maximum of aEI(x). (b), (d) Results with m0 = 0. (c), (e) Results with m0 ≈ 0.2.
To mitigate this dilemma, we developed three methods called adaptive leveling (AL), empirical Bayes (EB), and empirical Bayes uniform (EBu). The AL method draws m0 from the uniform distribution between the minimum and maximum of the past observations, balancing exploitation and exploration through a stochastic choice. The EB and EBu methods were developed to give a statistical justification for adjusting the hyperparameter, based on a prior and the observations. Further descriptions of these methods are deferred to Subsection 3 of Appendix A.
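The AL rule as described here amounts to a single random draw per BO iteration. A minimal sketch follows; the function name and the use of NumPy's random generator are assumptions, and any sign convention needed when the objective is minimized is not covered.

```python
import numpy as np


def adaptive_leveling_m0(y_observed, rng):
    # Draw the prior mean m0 uniformly between the smallest and largest
    # observation made so far.
    y = np.asarray(y_observed, dtype=float)
    return rng.uniform(y.min(), y.max())


rng = np.random.default_rng(0)
y_so_far = [0.12, 0.31, 0.05, 0.27]   # hypothetical past observations
m0 = adaptive_leveling_m0(y_so_far, rng)
print("m0 for this iteration:", m0)
```

A draw near the maximum makes the model optimistic about unexplored regions (exploration), while a draw near the minimum makes it pessimistic (exploitation), so repeated draws alternate between the two behaviors.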
B. ML-MBE growth and sample characterizations
Epitaxial STO films with a thickness of 60 nm were grown on STO (001) substrates in a custom-designed MBE system with multiple e-beam evaporators [Fig. 1(a)]. Detailed information about the MBE system is described elsewhere.50–54 We precisely controlled the Sr and Ti elemental fluxes by monitoring the flux rates with an electron-impact-emission-spectroscopy sensor and feeding the results back to the power supplies for the e-beam evaporators. The oxidation during growth was carried out with ozone (O3) gas ( O3 + 85% O2) introduced through an alumina nozzle pointed at the substrate. For stoichiometric STO growth, it is important to fine-tune the growth conditions (the ratio of the Ti flux to the Sr flux, the growth temperature, and the local ozone pressure at the growth surface).28,55–57 To systematically change the ratio of the Ti flux to the Sr flux, we changed the Ti flux while keeping the Sr flux at 0.98 Å/s. The growth temperature was controlled by the heater shown in Fig. 1(a). We adjusted the local ozone pressure at the growth surface by changing the O3-nozzle-to-substrate distance [Fig. 1(a)] while keeping the flow rate of O3 gas at sccm.
We executed the BO algorithm in a three-dimensional space. The search windows for the Ti flux rate, growth temperature, and O3-nozzle-to-substrate distance were 0.20–0.33 Å/s, 600–900 °C, and 10–80 mm, respectively. We searched equally spaced grid points for each parameter, with 100 points per parameter. Since the three-dimensional parameter space consisted of 1 000 000 (100^3) points, testing the entire space point by point is unrealistic, as only several runs can be carried out per day with a typical MBE system. To evaluate the stoichiometry of the films, we measured x-ray diffraction (XRD) θ–2θ scans [Fig. 1(b)] since the increase in the lattice constant is a good indicator of the magnitude of the off-stoichiometry of STO caused by changes in the cation and/or oxygen concentration.55,56,58,59 Therefore, we adopted the difference between the c-axis lattice constants of the film and the substrate (Δc) as the evaluation value. Thermal conductivity might be considered a useful metric for evaluating the crystalline quality of STO films since high-quality films of sufficient thickness reproduced the thermal conductivity observed in bulk single crystals.60,61 However, to make the thermal conductivity of an STO film distinguishable from that of the STO substrate, a film thickness of several hundred nm is required,60,61 and, therefore, thermal conductivity is not suitable for evaluating the 60-nm-thick STO films used in this study. If the films are thick enough and the thermal conductivity measurements are reliable, optimization using BO methods with thermal conductivity as the evaluation value should also be possible.
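The size of this search space can be made concrete by building the three axes explicitly. The sketch below assumes that both end points of each search window are included in the 100 grid values, which the text does not state; the variable and function names are hypothetical.

```python
import numpy as np

# One axis per growth parameter, 100 equally spaced values each
# (end points assumed to be included).
ti_flux = np.linspace(0.20, 0.33, 100)       # Ti flux rate, Å/s
temperature = np.linspace(600.0, 900.0, 100)  # growth temperature, °C
distance = np.linspace(10.0, 80.0, 100)       # O3-nozzle-to-substrate distance, mm

shape = (ti_flux.size, temperature.size, distance.size)
n_points = int(np.prod(shape))


def grid_point(flat_index):
    # Map a flat index in [0, n_points) back to one growth condition.
    i, j, k = np.unravel_index(flat_index, shape)
    return np.array([ti_flux[i], temperature[j], distance[k]])


print("number of candidate growth conditions:", n_points)
print("first candidate:", grid_point(0))
```

Keeping only the three axes in memory and decoding flat indices on demand avoids storing all candidate conditions as coordinates until the acquisition function actually needs them.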
When XRD peaks from the STO phase were indiscernible and/or peaks from SrO, TiO2, or Sr_{n+1}Ti_nO_{3n+1} (n: integer; n ≠ ∞) Ruddlesden–Popper series62 precipitates (impurity phases) appeared, we defined the evaluation value of those samples to be the worst experimental Δc value at that time. This imputation of the missing data, which arise when the designated phase is not formed, enabled a direct search of the wide three-dimensional parameter space.46
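The imputation rule can be sketched as follows. The sketch assumes that failed runs are recorded as NaN and that, each time the model is refitted, every failed run is assigned the largest Δc measured so far among the successful runs; the exact bookkeeping used in this work may differ, and the function name is hypothetical.

```python
import numpy as np


def impute_failures(delta_c):
    # Replace NaN entries (runs without a discernible STO phase) with the
    # worst, i.e. largest, Δc among the successful runs recorded so far.
    d = np.asarray(delta_c, dtype=float)
    failed = np.isnan(d)
    if failed.all():
        raise ValueError("at least one successful run is needed for imputation")
    out = d.copy()
    out[failed] = np.nanmax(d)
    return out


# Hypothetical record of four runs (Δc in Å); NaN marks a failed growth.
record = [0.045, np.nan, 0.036, np.nan]
print(impute_failures(record))
```

With this rule, failed conditions are treated as bad but finite observations, so the GP can still be fitted over the whole search window.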
Here, the black box function Δc = f(x) is the target function specific to our STO films, and x represents the growth parameters (Ti flux rate, growth temperature, and O3-nozzle-to-substrate distance). We used the data [Fig. 1(c)] obtained from the past n MBE growths and XRD measurements [Fig. 1(b)] of STO films to construct a model to predict the value of f(x) at an unseen x. To this end, we used the GP to estimate the mean m and variance s2 at an arbitrary parameter value x (see Subsection 1 of Appendix A for details). Specifically, the GP predicts the value of f(x) as a Gaussian-distributed variable with mean m and variance s2, where m and s2 depend on x and the past observations. In short, m(x) and s2(x) represent the expected value and uncertainty of Δc at x. To account for the inherent noise in the Δc of STO films grown under nominally the same conditions, the variance of the observation noise of the GP model was set to 0.01. In our implementation, we used the Matérn kernel since it is good at fitting functions with steep gradients.42 We iterated the routine after the initial MBE growth with five random initial growth parameters and XRD measurements. First, the GP was updated using the dataset available at the time [Fig. 1(c)]. Subsequently, to assign the growth parameters for the next run, we calculated the expected improvement (EI) [Fig. 1(d)].63
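For a Gaussian prediction with mean m(x) and standard deviation s(x), EI has a closed form. The sketch below writes it for minimization, since a smaller Δc is better; the formulation in Appendix A may use a different sign convention, and the function name is an assumption.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf)


def expected_improvement_min(mean, std, y_best):
    # EI(x) = (y_best - m) * Phi(z) + s * phi(z), with z = (y_best - m) / s,
    # where Phi and phi are the standard normal CDF and PDF.
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    improve = y_best - mean
    safe_std = np.where(std > 0, std, 1.0)   # avoid division by zero
    z = improve / safe_std
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    ei = improve * cdf + std * pdf
    # With zero uncertainty, EI reduces to the plain improvement (if any).
    return np.where(std > 0, ei, np.maximum(improve, 0.0))


# Hypothetical predictions at three candidate conditions (Δc in Å).
m = np.array([0.040, 0.030, 0.050])
s = np.array([0.005, 0.020, 0.001])
print(expected_improvement_min(m, s, y_best=0.036))
```

The first term rewards candidates whose predicted Δc lies below the current best, and the second rewards candidates with large uncertainty, so the next growth condition is the grid point where the sum is largest.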
III. RESULTS AND DISCUSSION
A. Experiments with simulated data
This section investigates the optimization performance on simulated functions using five methods: the baseline with m0 = 0; a simple adaptation of m0 that takes the average of the observed data, referred to as DA (data averaging); and the three proposed methods that adjust m0: AL, EB, and EBu. We use two functions, the Ackley function64,65 and the Rosenbrock function.65,66 The boundaries of the search space were set at −0.5 and 2 for each element. These functions allow us to set an arbitrary number of dimensions D. In this study, we used D = 2, 4, and 6. Generally speaking, a larger dimensionality D makes black box optimization more challenging. The Ackley function with D = 2 is illustrated in Fig. 2(a), and the Rosenbrock function with D = 2 is displayed in Fig. 3. For each configuration, all methods (baseline, DA, AL, EB, and EBu) were iterated until 100 observations were obtained. These optimization processes were repeated five times with five randomly chosen initial observations. Each evaluation contained noise with . These functions are described in Appendix B.
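For orientation, the widely used textbook forms of the two benchmark functions are sketched below, negated so that the optimum is a maximum. These are not the definitions from Appendix B: the variants used in this study are evidently scaled or shifted differently (for example, their peak heights differ from those of the textbook forms), so the constants and function names here are assumptions.

```python
import numpy as np


def ackley_neg(x, a=20.0, b=0.2, c=2.0 * np.pi):
    # Negated textbook Ackley function: many local maxima, global maximum 0 at x = 0.
    x = np.asarray(x, dtype=float)
    term1 = -a * np.exp(-b * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(c * x)))
    return -(term1 + term2 + a + np.e)


def rosenbrock_neg(x):
    # Negated textbook Rosenbrock function: curved ridge, global maximum 0 at x = 1.
    x = np.asarray(x, dtype=float)
    return -np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)


rng = np.random.default_rng(0)
for D in (2, 4, 6):
    x = rng.uniform(-0.5, 2.0, size=D)   # random point inside the search bounds
    print(D, ackley_neg(x), rosenbrock_neg(x))
```

Both functions accept an input of any length, which is what allows the same benchmark to be run at D = 2, 4, and 6.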
Rosenbrock function with D = 2 employed for the simulated optimization experiment.
Figure 4 shows the optimization results for the Ackley and Rosenbrock functions with D = 2, 4, and 6. Each curve indicates the best observation value averaged over five runs as a function of the number of observations n. The shaded area indicates the best and worst observations among the five runs. A curve that rises with fewer observations indicates a better search algorithm, one that requires fewer resources before reaching a high value of y. A vertically narrow shaded area means that the method performs robustly against randomness in the search process, such as the initial choice of parameters and random seeds.
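The curves and shaded bands described above can be computed from the raw observation sequences with a running maximum. A minimal sketch follows, assuming the observations of each run are stored row-wise in evaluation order; the function name and the example values are hypothetical.

```python
import numpy as np


def best_so_far_summary(y_runs):
    # y_runs has shape (n_runs, n_observations), in evaluation order.
    y = np.asarray(y_runs, dtype=float)
    best = np.maximum.accumulate(y, axis=1)   # best value found so far, per run
    # Mean curve plus the lower and upper edges of the shaded band.
    return best.mean(axis=0), best.min(axis=0), best.max(axis=0)


# Two short hypothetical runs with three observations each.
mean_curve, worst_curve, best_curve = best_so_far_summary(
    [[0.1, 0.3, 0.2],
     [0.2, 0.1, 0.4]]
)
print(mean_curve, worst_curve, best_curve)
```

Because the running maximum never decreases, each curve is monotonic, and the gap between the worst and best curves measures how sensitive a method is to its random initialization.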
(a)–(c) Optimization results for the Ackley function with (a) D = 2, (b) D = 4, and (c) D = 6. (d)–(f) Optimization results for the Rosenbrock function with (d) D = 2, (e) D = 4, and (f) D = 6. Solid lines indicate the best evaluation value found so far as a function of the number of observations averaged over five trials. Shaded areas represent the best and worst performances among the five trials.
For the Ackley function with D = 2 [Fig. 4(a)], all methods reached the top peak on average. In particular, the baseline, DA, and AL found the optimal value after 50 observations in all five trials of the optimization process. Since EB and EBu were occasionally stuck at the second-best peak, their average performance was slightly inferior to that of the baseline, DA, and AL. At the higher dimensionalities D = 4 and 6 for the Ackley function [Figs. 4(b) and 4(c)], the baseline method struggled to improve. This is because the baseline tends to be trapped at one of the suboptimal peaks. In contrast, DA, AL, EB, and EBu with adaptive m0 clearly outperformed the baseline. This supports the efficacy of the variable prior mean for BO. Nevertheless, none of the methods attained the maximum value of 0.5. This reflects the general difficulty of black box optimization in a high-dimensional search space, especially when the objective function has multiple disconnected peaks. The performance of DA was worse for D = 6 [Fig. 4(c)]. Since the value of m0 in the DA method tends to stabilize as observations accumulate, the algorithm failed to explore new regions.
Figures 4(d)–4(f) show the optimization results for the Rosenbrock function with D = 2, 4, and 6. Unlike the case of the Ackley function, all methods performed comparably well under all conditions. These results suggest that the optimization of the Rosenbrock function is easy owing to its concave surface (Fig. 3). Since the search space was restricted to a limited region between −0.5 and 2, a high evaluation value f(x) ≥ 0.3 was present in a large portion of the search space. This allowed all methods to work reasonably well in our experiment. That said, the DA method showed slower improvements when D = 4 [Fig. 4(e)] and D = 6 [Fig. 4(f)]. This is because an occasional observation of low yi decreases the m0 of DA, which causes conservative and pessimistic predictions that lead to slower improvements.
Among AL, EB, and EBu, which change m0 at each step, the performance was similar in most configurations. That said, some trials of EB and EBu were inferior to those of AL, for example, in the optimization of the Ackley function with D = 2 and 6 and the Rosenbrock function with D = 4. Owing to the stability and reliability of its performance, we adopted the AL method for the ML-MBE growth of STO films.
B. Application to ML-MBE of STO film
To obtain stoichiometric STO films with no absorption in the bandgap, we grew STO films by the AL method implemented in ML-MBE. Figure 5 shows how the BO algorithm predicts Δc values at unseen parameter configurations and acquires new data points. The process starts with five random initial growth parameters and gains experimental Δc values for the updated GP model with 10 [Fig. 5(a)], 27 [Fig. 5(b)], and 44 [Fig. 5(c)] observations. Two-dimensional plots of the predicted Δc, s(x), and EI values at the O3-nozzle-to-substrate distance at which the highest EI value was obtained are shown in the lower panels [Figs. 5(d)–5(l)]. Within the first ten samples, the STO phase had not formed under two growth conditions [green spheres in Fig. 5(a)]. Therefore, we defined the Δc value of these samples as the worst experimental one at that time. This imputation of experimental failure enabled a direct search of the wide three-dimensional parameter space.46 According to the GP prediction from the ten samples, the highest EI was obtained at a Ti flux of 0.26 Å/s, a growth temperature of 777 °C, and an O3-nozzle-to-substrate distance of 39.5 mm [Fig. 5(a)]. This predicted growth condition yielded a Δc of 0.036 Å, smaller than the minimum value of 0.045 Å at that time [Fig. 5(b)]. The more surrounding data points there are, the smaller s becomes, and the fewer there are, the larger it becomes. Therefore, the region with relatively small s of the predicted Δc became larger as the number of experimental samples increased from 10 to 44 [Figs. 5(g)–5(i)], meaning that the prediction accuracy had increased, which resulted in lower EI values [Figs. 5(j)–5(l)]. Through this optimization process, in which exploration is encouraged by varying the prior mean, the lowest Δc value decreased and reached the ideal value of 0 in only 44 MBE growth runs (Fig. 6).
The ideal Δc of 0 was achieved at the Ti flux = 0.32 Å/s, growth temperature = 852 °C, and O3-nozzle-to-substrate distance = 13.5 mm [Fig. 5(c)]. The achievement of the target material with the desired properties in such a small number of optimizations demonstrates the efficacy of the AL method for high-throughput materials growth.
(a)–(c) Experimental Δc values in the three-dimensional growth parameter space for 10 (a), 27 (b), and 44 (c) samples. The green spheres indicate the NaN points at which the STO phase was not obtained. The red spheres indicate the most promising conditions with the highest EI values, which should be examined in the next growth run. The red planes indicate the cutting plane of the O3-nozzle-to-substrate distance, at which the highest EI value was obtained. (d)–(l): Two-dimensional plots of predicted Δc values (d)–(f), s values (g)–(i), and EI values (j)–(l) at O3-nozzle-to-substrate distances of 39.5 mm [(d), (g), and (j)], 65.5 mm [(e), (h), and (k)], and 10 mm [(f), (i), and (l)], which were obtained from the collected data for 10 [(d), (g), and (j)], 27 [(e), (h), and (k)], and 44 [(f), (i), and (l)] observations, respectively. The O3-nozzle-to-substrate distance was that at which the highest EI value was obtained.
Actual Δc values and the lowest experimental Δc plotted as a function of growth number.
The optimized Ti flux (0.32 Å/s) corresponds to a Ti/Sr ratio of 1.026, which is near the stoichiometric value of 1. This may be due to the low vapor pressure of the most stable strontium oxide (SrO) and titanium oxide (TiO2),67 for which the sticking coefficients of both Sr and Ti become almost 1: even if SrO and TiO2 are concomitantly formed during the growth of SrTiO3, they do not desorb from the growth surface and are eventually transformed to SrTiO3. Generally, in complex oxide films, if the sticking coefficient of each constituent cation were known as a function of growth temperature and oxidation strength, the optimum supplied flux ratio could be predicted, at least in principle. However, one cannot optimize the full set of growth conditions by prediction or by reasoning from a crystal growth perspective alone. Instead, the combined impact of the growth temperature, the oxidation strength, and the supplied flux ratio on the oxide film growth can only be obtained empirically through experiments. This is because actual crystal growth dynamics are complicated and cannot be fully understood even when thermodynamic phase diagrams are available. The proposed method enables high-quality and efficient materials growth, independent of the researcher’s knowledge and experience, even in cases where such prior knowledge is limited.
C. Crystallographic and optical properties of stoichiometric STO films
We experimentally characterized the physical properties of the stoichiometric STO film with a Δc of 0. For comparison, we also examined the physical properties of the off-stoichiometric STO film with a Δc of 0.045 Å grown under one of the first random growth conditions (Ti flux = 0.29 Å/s, growth temperature = 796 °C, and O3-nozzle-to-substrate distance = 52 mm). The sheet resistance was measured by a standard two-point method with Ag electrodes deposited on the STO surface. The stoichiometric STO is highly insulating, exceeding the measurable range (sheet resistance MΩ), while the off-stoichiometric STO film shows a relatively small sheet resistance of 1.4 kΩ. This result indicates that precise stoichiometry adjustment is necessary to obtain highly resistive STO. The crystallinity of the STO films was examined by XRD, atomic force microscopy (AFM), and scanning transmission electron microscopy (STEM). Figure 7(a) shows XRD θ–2θ scans of the stoichiometric STO film around the (002) STO Bragg peak. The XRD θ–2θ scans of the STO film grown on (001) (LaAlO3)0.3(SrAl0.5Ta0.5O3)0.7 (LSAT) under the same growth conditions are also shown. It has been reported that non-stoichiometric STO films have larger lattice constants than stoichiometric ones.28,55–58,68 The XRD pattern for the stoichiometric STO on STO shows a good overlap between the film and substrate peaks without XRD fringes. The lack of fringes in the XRD data (fringes would be observed for a finite repetition of the unit cell) is a typical feature of stoichiometric STO films28,58 since the films merge with the substrate and become indistinguishable from it. In contrast, the fringes are clearly observed for the film heteroepitaxially grown on the LSAT substrate, which allows for thickness estimation of the STO films. The film thickness estimated from the period of the Laue fringes (62 nm) agrees very well with that calculated from the Sr flux rate (60 nm), assuming a Sr sticking coefficient of 1.
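Both XRD quantities used here follow from Bragg's law: the c-axis lattice constant from the (00l) peak position, and the film thickness from the spacing of adjacent Laue fringes. The sketch below assumes Cu Kα1 radiation (wavelength 1.5406 Å), which the text does not specify, and uses synthetic fringe positions; the function names are hypothetical.

```python
import numpy as np

CU_KA1 = 1.5406  # X-ray wavelength in Å (assumed Cu Kα1 source)


def c_axis_from_two_theta(two_theta_deg, l=2, wavelength=CU_KA1):
    # Bragg's law for a (00l) reflection: c = l * wavelength / (2 sin(theta)).
    theta = np.radians(two_theta_deg / 2.0)
    return l * wavelength / (2.0 * np.sin(theta))


def thickness_from_fringes(two_theta_deg, wavelength=CU_KA1):
    # Adjacent Laue fringes are separated by delta(sin theta) = wavelength / (2 t).
    s = np.sort(np.sin(np.radians(np.asarray(two_theta_deg, dtype=float) / 2.0)))
    return wavelength / (2.0 * np.mean(np.diff(s)))


# (002) peak position expected for bulk STO (c = 3.905 Å) under this wavelength.
two_theta_bulk = 2.0 * np.degrees(np.arcsin(2 * CU_KA1 / (2 * 3.905)))
print("2theta of STO (002), deg:", two_theta_bulk)

# Synthetic fringe positions generated for a 620 Å (62 nm) thick film.
fringes = 2.0 * np.degrees(np.arcsin(0.39 + np.arange(6) * CU_KA1 / (2 * 620.0)))
print("thickness from fringes, Å:", thickness_from_fringes(fringes))
```

Δc is then the difference between the value returned by `c_axis_from_two_theta` for the film peak and that for the substrate peak.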
Figures 7(b) and 7(c) show the AFM images of the stoichiometric STO film. The root-mean-square roughness is 0.25 nm, indicating that the stoichiometric STO film has smooth surfaces.
(a) XRD θ–2θ scans of the stoichiometric STO film grown on (001) STO and LSAT substrates. Inset shows the XRD θ–2θ scan of the off-stoichiometric STO film. (b) AFM image of the stoichiometric STO film on (001) STO. (c) Magnified image of (b).
Figure 8 shows high-angle annular dark-field (HAADF)- and annular bright-field (ABF)-STEM images of the stoichiometric and off-stoichiometric STO films taken with a JEOL JEM-ARM 200F microscope. Since the intensity in the HAADF-STEM image is approximately proportional to Z^n (n ∼ 1.7–2.0, where Z is the atomic number),69 the brighter and darker spheres in Figs. 8(a) and 8(c) are assigned to Sr- (Z = 38) and Ti- (Z = 22) occupied columns, respectively. The ABF-STEM images [Figs. 8(b) and 8(d)] represent the atomic arrangement of oxygen since oxygen is emphasized in ABF-STEM images.70 The film and the substrate are nearly indistinguishable in the HAADF-STEM image for the stoichiometric film [Fig. 8(a)], indicating the ideal cationic arrangement at the interface. In contrast, threading dislocations perpendicular to the film surface are observed in the ABF-STEM image for the off-stoichiometric film [Fig. 8(d)]; they are barely visible in the HAADF-STEM image as well [Fig. 8(c)]. Such threading dislocations have been reported in Sr-rich STO films and are thought to be Ruddlesden–Popper planar faults.68 In addition, the magnified ABF-STEM image [inset in Fig. 8(d)] reveals strong contrast due to local atomic dechanneling.71,72 Since oxygen is emphasized in ABF-STEM images, unlike in HAADF-STEM images [Fig. 8(c)], the contrast in the ABF-STEM image should come from oxygen vacancies. The oxygen vacancies in the off-stoichiometric STO film are consistent with the growth conditions, in which the oxidation strength was lower than that for the stoichiometric STO film (the O3-nozzle-to-substrate distances are 13.5 and 52 mm for the stoichiometric and off-stoichiometric STO films, respectively).
(a), (c) HAADF-STEM and (b), (d) ABF-STEM images of the stoichiometric [(a) and (b)] and off-stoichiometric [(c) and (d)] STO films along the [100] direction. Dashed lines indicate the interfaces between the grown STO layers and the substrates. Insets show magnified images at the center of the films.
To determine the Ti/Sr composition ratios, we carried out energy dispersive x-ray spectroscopy (EDS) measurements, also with the JEOL JEM-ARM 200F microscope. Figure 9 shows the EDS spectra for the stoichiometric and off-stoichiometric STO films. Only Sr, Ti, and O peaks are observed in both films, indicating no observable impurities in the films. The Ti/Sr composition ratio was estimated from the Ti Kα/Sr Lα integrated intensity ratio normalized by that of the STO substrate, i.e., the Ti/Sr ratio of the STO substrate is assumed to be 1. The estimated Ti/Sr ratios in the stoichiometric and off-stoichiometric films are 1.00 and 0.94, respectively. Note that the typical accuracy of the EDS for the Ti Kα and Sr Lα integrated intensities is ±0.01–±0.04.73,74
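The normalization described above is a ratio of ratios, which cancels the unknown detection sensitivities of the two lines under the assumption that the substrate is exactly stoichiometric. A minimal sketch with made-up integrated intensities (the function name and all numbers are hypothetical):

```python
def ti_sr_ratio(film_ti_ka, film_sr_la, sub_ti_ka, sub_sr_la):
    # Ti/Sr composition ratio of the film, taking the substrate's
    # Ti Kα / Sr Lα intensity ratio as the reference for Ti/Sr = 1.
    return (film_ti_ka / film_sr_la) / (sub_ti_ka / sub_sr_la)


# Hypothetical integrated intensities in arbitrary units.
ratio = ti_sr_ratio(film_ti_ka=940.0, film_sr_la=1000.0,
                    sub_ti_ka=1000.0, sub_sr_la=1000.0)
print("estimated Ti/Sr ratio:", ratio)
```

Given the quoted accuracy of ±0.01–±0.04, a ratio of 0.94 lies outside the uncertainty around 1, whereas 1.00 is indistinguishable from the substrate.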
EDS spectra for the stoichiometric and off-stoichiometric STO films. The inset shows a magnified view around the Ti Kα peak. The spectra are normalized to the Sr Lα peak intensities for easy comparison.
Figure 10 shows the optical absorption of the stoichiometric and off-stoichiometric STO films at room temperature. The sudden increase of the absorption at 3.2 eV originates from the O 2p-to-Ti 3d charge transfer transition in STO films and substrates, indicating the bandgap of STO.7 The absorption spectrum of the stoichiometric STO is identical to that of the STO substrate, indicating that the stoichiometric STO could be an ideal mother material for photocatalysis applications. In contrast, non-stoichiometric STO shows the Drude (free-electron-carrier) absorption (photon energy eV) and absorption from deep-impurity states (1 eV photon energy eV).28 The absorptions at 2.4 and 2.9 eV may originate from the excitation of electrons trapped by oxygen vacancies since they are widely observed in reduced STO.69
Optical absorptions of the STO substrate and the stoichiometric and non-stoichiometric STO films on STO at room temperature.
IV. CONCLUSION
We demonstrated the stoichiometric growth of STO films via Bayesian optimization with an adaptive hyperparameter of a prior mean function. To obtain highly insulating stoichiometric STO films, we developed a new BO method that encourages exploration by adjusting the prior mean to escape suboptimal parameters. Using simulated data, we confirmed that all the methods that vary the prior mean value are effective, reproducibly reaching the global best conditions. Among these methods, we employed the AL method for ML-MBE. In only 44 MBE growth runs, our approach attained highly insulating stoichiometric STO films having no absorption in the bandgap, which will contribute to the next generation of electronic and photonic devices. The proposed algorithm provides an efficient experimental design platform that is not as dependent on the experience and skills of individual researchers. It will enhance the efficiency of not only oxide electronics but also various material syntheses and autonomous syntheses.75–77
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Y.K.W. and T.O. contributed equally to this work.
Yuki K. Wakabayashi: Conceptualization (equal); Investigation (lead); Methodology (equal); Software (supporting); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). Takuma Otsuka: Conceptualization (equal); Investigation (supporting); Methodology (equal); Software (lead); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). Yoshiharu Krockenberger: Investigation (supporting); Writing – review & editing (supporting). Hiroshi Sawada: Writing – review & editing (supporting). Yoshitaka Taniyasu: Writing – review & editing (supporting). Hideki Yamamoto: Writing – review & editing (supporting).
DATA AVAILABILITY
The data and code that support the findings of this study are openly available on GitHub at https://github.com/nttcslab/adaptive-leveling-BO.
APPENDIX A: BAYESIAN OPTIMIZATION WITH ADAPTIVE PRIOR MEAN FUNCTION
1. GP prediction and acquisition function
2. Impact of prior mean function on optimization
The choice of η(x) will affect the acquisition function as well as the parameter search efficiency. The typical choice of the prior mean function is η(x) = 0, particularly because we lack knowledge of the underlying black box function f(x). We replace the prior mean with a constant function in the form of η(x) = m0 since a more flexible functional form of η to potentially approximate f is inaccessible.
Figure 2 shows the difference in the prediction mean and the acquisition function for the two-dimensional Ackley function with m0 = 0 and m0 ≈ 0.2. See Appendix B and Eq. (B1) for the description of the Ackley function. As shown in Fig. 2(a), the Ackley function f(x) has four peaks, with the bottom-left one at x = [0,0]⊤ being the highest. In this example, the top-left suboptimal peak has been intensively searched [Figs. 2(b) and 2(c)]. A comparison between m0 = 0 [Fig. 2(b)] and m0 ≈ 0.2 [Fig. 2(c)] shows that the predictive mean m(x′) with m0 ≈ 0.2 is larger than that with m0 = 0 in the bottom half of the region. This difference in the predictive mean results in a difference in the acquisition function [Figs. 2(d) and 2(e)]. The use of m0 > 0 increases the predictive mean of an unexplored region and leads to finding other peaks.
While a greater m0 worked favorably in the example in Fig. 2, a large m0 may not always be efficient; a large m0 tends to produce an optimistic prediction in unexplored regions. This can trigger unnecessary explorations leading to a plateau at the suboptimal function value. Thus, the choice of hyperparameter m0 needs to take account of the balance between exploration and exploitation in the parameter search.
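The effect of m0 on the prediction in unexplored regions can be reproduced with a few lines of NumPy. The following is a minimal sketch, not the implementation used in this work: the squared-exponential kernel, its length scale, the noise level, and the one-dimensional toy observations are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior_mean(x_obs, y_obs, x_new, m0=0.0, noise=1e-6):
    """Posterior mean of a GP with constant prior mean m0."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_new, x_obs)
    # The GP models the residual y - m0; m0 is added back afterwards.
    alpha = np.linalg.solve(K, y_obs - m0)
    return m0 + k_star @ alpha


# Hypothetical observations clustered around one suboptimal peak.
x_obs = np.array([0.1, 0.2, 0.3])
y_obs = np.array([0.5, 0.6, 0.5])
x_far = np.array([0.9])  # far from every observation

mean_zero = gp_posterior_mean(x_obs, y_obs, x_far, m0=0.0)
mean_high = gp_posterior_mean(x_obs, y_obs, x_far, m0=0.2)
print(mean_zero, mean_high)
```

Far from the data the kernel terms vanish, so the predictive mean reverts to m0, whereas near the observations it follows the data for either choice of m0. In this sketch, raising m0 therefore lifts the prediction only where nothing has been observed, which is the mechanism behind both the additional exploration and the risk of over-optimistic predictions discussed above.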
3. Methods for adapting prior mean function hyperparameter
The discussion in Subsection 2 of Appendix A motivates us to adapt m0. A simple way is to set m0 to the average of the observed data. We call this approach DA (data averaging). Despite its simplicity, DA works well in practice (Fig. 4), but m0 tends to saturate at a certain value as more observations are accumulated. This can compromise the exploration in the parameter search. In the following, we present three methods that vary m0 for efficient BO.
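The saturation of m0 under DA follows directly from the behavior of a running average. The following is a minimal sketch with a hypothetical sequence of observed values; the function name is ours and does not refer to the published code.

```python
import numpy as np


def data_averaging_prior_mean(y_obs):
    """DA: set the constant prior mean m0 to the average of the observations."""
    return float(np.mean(y_obs))


# Hypothetical sequence of observed objective values.
y_history = [0.10, 0.30, 0.20, 0.25, 0.22, 0.21]

# Recompute m0 each time a new observation arrives.
m0_trace = [
    data_averaging_prior_mean(y_history[:n])
    for n in range(1, len(y_history) + 1)
]
print(m0_trace)
```

Each new observation changes the average by a factor of roughly 1/n, so the updates to m0 shrink as n grows, and the prediction in unexplored regions stops changing.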
Indeed, EB and EBu may choose . This is because some elements in can be negative, and thus their weighted average can produce the extrapolation of observations in yn. Since EBu uses an extended interval between 0 and , m0 of EBu has more chance to be outside of the past observations.
APPENDIX B: OBJECTIVE FUNCTIONS USED IN SIMULATED DATA EXPERIMENTS
APPENDIX C: OPTIMIZATION OF DEVIATION IN LATTICE CONSTANT
Our experiments used the Ackley and Rosenbrock functions with D = 2 and set the target as t = 0. Namely, we minimized the absolute value of these functions. Figures 11(a) and 11(b) show the value of |f(x) − t| for the Ackley function and Rosenbrock function, respectively. Figures 11(c) and 11(d) show the optimization results for each function. For both functions, the optimization process with each method was repeated five times until 100 observations were acquired. Solid lines indicate the smallest difference between the function value and target as a function of the number of observations averaged over the five runs. Shaded areas indicate the best and worst performances among the five runs. The results in Figs. 11(c) and 11(d) were given by the AL method for GP prediction. In the case of the Ackley function [Fig. 11(c)], all approaches eventually reached f(x) = 0 after 40 observations. In contrast, the truncation approach failed once out of five trials [Fig. 11(d)]. The rest of the methods showed stable performance for finding f(x) = 0. Figures 11(e) and 11(f) investigate the effect of the choice of m0 (AL vs baseline) on the chi2 approach. While AL made little difference for the Rosenbrock function [Fig. 11(f)], AL accelerated the optimization in the case of the Ackley function [Fig. 11(e)]. The observation in an unexplored region was encouraged when m0 ≈ 0 was chosen. This behavior enhanced the chance of finding f(x) = 0 on the wavy surface of the absolute Ackley function [Fig. 11(a)].
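The curves described above (the smallest |f(x) − t| found so far, averaged over repeated runs, with the best and worst runs as a band) can be computed as follows. This is a minimal sketch under the absolute-value formulation; the function name and the example values are hypothetical and are not results from this study.

```python
import numpy as np


def best_so_far_summary(values, target=0.0):
    """Summarize repeated optimization runs.

    values: array of shape (n_runs, n_observations) holding f(x) at each
    observation of each run. Returns the mean, best (min), and worst (max)
    across runs of the smallest |f(x) - target| found so far.
    """
    deviation = np.abs(np.asarray(values, dtype=float) - target)
    # Running minimum along the observation axis of each run.
    best_so_far = np.minimum.accumulate(deviation, axis=1)
    return (
        best_so_far.mean(axis=0),
        best_so_far.min(axis=0),
        best_so_far.max(axis=0),
    )


# Hypothetical function values from two short runs.
runs = [[0.8, -0.3, 0.5, 0.1],
        [-0.6, 0.9, -0.2, 0.4]]
mean_curve, best_curve, worst_curve = best_so_far_summary(runs, target=0.0)
print(mean_curve, best_curve, worst_curve)
```

The mean curve corresponds to the solid line and the best and worst curves to the edges of the shaded area in a plot of this kind.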
(a) Absolute value of the Ackley function with D = 2. The original function is displayed in Fig. 2(a). (b) Absolute value of the Rosenbrock function with D = 2. The original function is displayed in Fig. 3. (c) Optimization results of minimizing the difference between the function value and 0, i.e., the absolute value of the Ackley function. (d) Optimization results of minimizing the absolute value of the Rosenbrock function. (c), (d) Solid lines indicate the best evaluation value found so far as a function of the number of observations averaged over five trials. Shaded areas represent the best and worst performances for the five runs. (e) Optimization results of the absolute Ackley function using the chi2 approach with m0 = 0 (base) and AL. (f) Optimization results of the absolute Rosenbrock function using the chi2 approach with m0 = 0 (base) and AL.
Our results showed that all the approaches except the truncation one gave comparable performance for both functions. While chi2 reduced the difference between the function and target values slightly more quickly than the other approaches, we adopted the simplest absolute approach for the STO film growth experiment, expecting it to behave robustly and to make progress easier to interpret under real experimental conditions. Since the MBE growth experiment requires extensive time, we leave the use of chi2 for optimizing the growth parameters for future work.