Reflection high energy electron diffraction (RHEED) information is critical for the growth of thin films; however, only a small percentage of the data from RHEED videos is typically used. The use of full videos in machine learning can require dimension reduction techniques. In this paper, three dimension reduction techniques, principal component analysis (PCA), non-negative matrix factorization (NMF), and kmeans clustering, are compared to investigate their benefits to the analysis of RHEED data. Three different heterostructures with different growth modes, all deposited on Ti-terminated strontium titanate by pulsed laser deposition, were used for the analysis: lanthanum aluminate with layer-by-layer growth, lithium cobalt oxide with island growth, and strontium ruthenate with a transition from layer-by-layer to step-flow growth. A phase shift in intensity fluctuations of different RHEED spots was discovered and discussed in terms of their sensitivity to the film growth characterization. The diffraction spots that were more sensitive to the growth were differentiated from the spots that are affected by the substrate as a function of film thickness. It was concluded that NMF provides the analysis that is easiest to interpret without the loss of detailed physical information due to its non-negativity constraint and lack of forced orthogonality such as in PCA. Analysis of the full RHEED videos enables a more detailed understanding of growth characteristics and control of growth processes as aided by dimension reduction.
I. INTRODUCTION
Machine learning is becoming increasingly important to the field of microscopy. Specifically, techniques such as principal component analysis (PCA) and kmeans clustering can statistically compress large datasets such as a set of images from scanning transmission electron microscopy using the convergent beam technique1 or videos taken of reflection high energy electron diffraction (RHEED) during the growth of thin films.2 PCA has also been used to distinguish unique features within the data3 as well as to identify and filter noise.4,5 Both PCA and non-negative matrix factorization (NMF) have been used to analyze and model data such that there is no need for prior knowledge of the samples and there is no human bias during the interpretation of the data.4,6
This work applies several unsupervised machine learning techniques to RHEED, which is a critical in situ technique containing two spatial dimensions and time (x,y,t). RHEED is a tool for characterizing growth characteristics of thin films, particularly in thin film devices based on epitaxial heterostructures and superlattices, which require high accuracy in the number of layers, crystallinity, and surface roughness. During layer-by-layer deposition, the oscillation of average RHEED specular spot intensity with time portrays the number of layers grown.7 This intensity can also be used to indirectly determine where a transition in the growth mode may be occurring,8–10 the relative amount of material ablated per layer,11,12 and an understanding of the diffusion mechanisms of adatoms on the surface of the substrate via intensity recovery of RHEED,8,12 as well as the composition ratio of elements during growth with molecular beam epitaxy.11 Additionally, the shape and spacing of the RHEED spots can help identify how crystalline a material is,8,10,13 what the surface structure looks like, and what the lattice parameters13 and the crystal symmetry are.7,13,14 More generally, RHEED patterns are examined qualitatively while only utilizing the spacing of the spots and the oscillation pattern quantitatively.2,14 This leaves a plethora of data from the remaining video untapped. In addition, different choices are made during quantitative analysis such as which reflection to use for intensity analysis or what fraction of the reflection streak to use, as well as whether to use the total intensity changes or just the change of intensity of the brightest (specular) spot. It is assumed that these differences in choice do not make a difference in analysis. With the rise of data science techniques and more powerful computational technology, full RHEED videos of in situ growth can be captured and analyzed. 
RHEED patterns could be compared to other characterization techniques such as x-ray diffraction (XRD), atomic force microscopy (AFM), and transmission electron microscopy (TEM) to better understand exactly how the patterns relate to the physical features of a sample.
Previous work has been done with PCA and kmeans to identify growth modes of a particular film-substrate combination as well as to identify noise and its source.2 A recent paper examined multiple materials, both homo- and heteroepitaxial, using PCA and kmeans to better understand the stoichiometry, crystallinity, and relaxation of the MOCVD-grown films.15 A third set of authors looked at homoepitaxial growth of GaAs/GaAs using PCA and density-based spatial clustering of applications with noise (DBSCAN) to distinguish orientations and surface structures of each sample.16 This same group had previously used convolutional neural networks (CNNs) to make these distinctions but had to label each image by knowledge of the qualitative data, introducing bias.17 Another group of authors input the first several principal components into a CNN in order to determine how the input parameters of molecular beam epitaxy affect the structure by comparing the images to x-ray diffraction, photoluminescence, and defect density data.18 Finally, NMF has been used on scanning precession electron diffraction data.19,20 These studies all differ from the current study, which by the nature of RHEED includes a temporal aspect.
The present study will compare multiple materials, each with a different known growth mode, via multiple unsupervised learning techniques to elucidate how each technique can enable an understanding of growth modes and how these techniques could be best applied for future analyses. Three different film materials were chosen in this study, LaAlO3 (LAO), LiCoO2 (LCO), and SrRuO3 (SRO), as they provide different growth characteristics on the SrTiO3 (STO) substrate: layer-by-layer growth, polycrystalline growth, and layer-by-layer followed by step-flow growth, respectively. In comparison to the other studies listed, a variety of known growth modes will assist in the determination of how easily interpretable and advantageous each learning technique is in the numerous situations encountered when using RHEED.
The unsupervised learning techniques of interest are PCA, NMF, and kmeans clustering. PCA has been used for over a century to statistically reduce high dimensional data. Combinations of the original dimensions of data are returned such that the first, principal component 1 (PC1), captures the most variability, or the most defined shape, of the data.21 The subsequent components (PC2, PC3, etc.) are mutually orthogonal, meaning that the directions of variance do not overlap from component to component.21 The data structure built for the analysis contains the coordinates (x,y) of the RHEED videos as columns, with the time (t) at which a frame occurs in the rows. The loadings explain how much each variable, in this case each coordinate, contributes to a principal component. These are returned as a new image for each principal component. The score at a particular time in this structure is the projection of the mean-centered frame at that time onto the principal component, i.e., how strongly that frame expresses the loading pattern of the component relative to the average frame.21
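The data layout described above can be illustrated with a minimal sketch. It uses a synthetic stand-in for a RHEED video (two hypothetical spots oscillating out of phase); the array sizes, spot positions, and oscillation period are assumptions for illustration only and are not taken from the measurements in this work.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a RHEED video: n_frames images of height x width pixels.
rng = np.random.default_rng(0)
n_frames, height, width = 120, 16, 20
t = np.arange(n_frames) / 4.0  # seconds, assuming 4 frames per second

# Two hypothetical "spots" whose intensities oscillate out of phase.
video = 0.02 * rng.random((n_frames, height, width))
osc = 0.3 * np.cos(2 * np.pi * t / 10.0)[:, None, None]
video[:, 4:7, 9:12] += 0.5 + osc
video[:, 4:7, 3:6] += 0.5 - osc

# Rows are time (t); columns are the flattened pixel coordinates (x, y).
X = video.reshape(n_frames, height * width)

pca = PCA(n_components=5)
scores = pca.fit_transform(X)                          # one score trace per PC
loadings = pca.components_.reshape(5, height, width)   # one loading image per PC

# Scores are the projections of the mean-centered frames onto the components.
manual_scores = (X - pca.mean_) @ pca.components_.T
```

Each row of `loadings` can be displayed as an image, and each column of `scores` can be plotted against `t` and overlaid on a spot intensity trace.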
NMF is similar to PCA, as both are decomposition methods; however, NMF has the constraint that the supplied matrix and returned values are non-negative.22 This has the additional result that the components are no longer orthogonal, meaning information within the components can overlap leading to soft clustering (i.e., components can cluster with more than one basis) and the data are more sparse than in PCA.22 Furthermore, since images are non-negative to begin with, the data are more interpretable.22,23 However, for NMF, the calculation has to be repeated to determine the best rank, making NMF a computationally more intensive technique in comparison with PCA, which returns all components in a single run and determines the relevant number using a scree plot.
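The contrast between the two decompositions can be sketched on synthetic data. The example below is only an illustration under assumed data (random non-negative patterns, not RHEED frames): it checks that both NMF factors are non-negative, whereas PCA components contain negative entries, and that NMF components are free to overlap.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)

# Hypothetical non-negative data: 100 "frames" mixed from two overlapping patterns.
patterns = rng.random((2, 50))
weights = rng.random((100, 2))
X = weights @ patterns + 0.01 * rng.random((100, 50))

nmf = NMF(n_components=2, init="random", max_iter=500, random_state=0)
W = nmf.fit_transform(X)   # time-dependent weights, shape (100, 2)
H = nmf.components_        # pixel maps, shape (2, 50)

pca = PCA(n_components=2).fit(X)

# NMF factors are non-negative; PCA components mix positive and negative values.
nmf_min = min(W.min(), H.min())
pca_min = pca.components_.min()

# Overlap between the two NMF maps (zero would mean orthogonal components).
nmf_overlap = float(H[0] @ H[1])
```

Because every entry of `H` is non-negative, `nmf_overlap` cannot be negative, and it is zero only if the two maps share no pixels.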
Kmeans clustering is a different approach from the previous two in that the images are grouped into individual clusters rather than components being formed based on how the pixels within each image change with time. The number of clusters in kmeans is defined by the user, and the algorithm then seeks the least amount of variation within each cluster, i.e., the smallest squared Euclidean distance between the points in a cluster and its centroid. In other words, the closest points are within each cluster and each cluster should be as far as possible from the other clusters. The data points are initially randomly assigned to clusters, which is followed by an iterative process in which a centroid is determined for each cluster and the points are reassigned to the closest centroid. This occurs until the centroids stop moving. Because the first step is random, repeating kmeans can give different results.21 In this analysis, the number of clusters was chosen by a silhouette plot, which determines the cluster number for which the clusters are sufficiently separated. Here, PCA, NMF, and kmeans were applied to the time dependent RHEED patterns of three different materials with different growth modes to understand how each unsupervised learning technique can be useful in understanding the kinetics of growth in pulsed laser deposition.
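Choosing the number of clusters from silhouette values can be sketched as follows. The data here are three well-separated synthetic groups standing in for flattened frames; the group centers, noise level, and the range of cluster numbers scanned are assumptions made for illustration. The mean silhouette score is used as a compact summary in place of a full silhouette plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical flattened frames: three well-separated groups of 40 "frames" each.
centers = np.array([[0.0] * 6, [5.0] * 6, [10.0] * 6])
X = np.vstack([c + 0.3 * rng.standard_normal((40, 6)) for c in centers])

# Mean silhouette score for each candidate number of clusters.
sil = {}
for k in range(2, 7):
    # n_init repeats the random start several times and keeps the best run.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil[k] = silhouette_score(X, labels)

best_k = max(sil, key=sil.get)  # cluster number with the best separation
```

Fixing `random_state` makes the run repeatable; leaving it unset reproduces the run-to-run variation caused by the random first step.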
II. MATERIALS AND METHODS
Pulsed laser deposition (PLD) using the Pioneer 120 Advanced PLD from Neocera with a KrF excimer laser was used to grow LaAlO3 (LAO), LiCoO2 (LCO), and SrRuO3 (SRO). All samples were grown on titanium terminated (001) SrTiO3 (STO) substrates. Epi-polished STO substrates [MTI (100) with edge (100)] were etched with hydrofluoric acid buffered with ammonium bifluoride at pH = 4 and were annealed in air atmosphere at 950°C for 1 h. Targets of LCO and SRO were purchased from the Kurt J. Lesker Company. The LAO target was prepared in house with conventional ceramic processing. LAO films were grown at 770 °C, at a fluence of 1.5 J/cm2, a pulse frequency of 2 Hz, a background oxygen pressure of 0.013 Pa (0.1 mTorr), and 150 pulses. LCO was grown at 620 °C, a fluence of 0.53 J/cm2, a pulse frequency of 5 Hz, a background oxygen pressure of 0.667 Pa (5 mTorr), and 1510 pulses. Two different samples of SRO were grown; both samples were grown at a temperature of 670 °C, a fluence of 1.5 J/cm2, a frequency of 2 Hz, and 300 pulses, but were at different background oxygen pressures, 6.67 Pa and 13.3 Pa (50 and 100 mTorr). The k-Space Associates Inc. (kSA) 400 software was used to capture RHEED (Staib Inc.) movies as well as intensity oscillations. RHEED was operated with a 30 kV electron beam, and an angle of about 2° to the substrate. The intensity of various parts of the pattern was captured by placing boxes over the regions of interest for which the software would record the peak intensity on an intensity vs time plot. Videos were converted to .avi file types before being made into frames at four frames per second by ffmpeg and analyzed in Python version 3.9. All data were normalized by dividing by 255. The Scikit learn decomposition PCA package was used to perform principal component analysis using their default conditions. The Scikit learn cluster KMeans package was used to perform kmeans clustering. 
The Scikit learn metrics silhouette_samples and silhouette_score were used to make a silhouette plot, which was used to choose the number of clusters for kmeans. Scikit learn decomposition NMF was used with random initialization, coordinate descent as the numerical solver, the Frobenius norm as the beta loss function, and 100 maximum iterations. The rank was chosen based on the location of an elbow in a plot of the Euclidean distances vs rank for each sample. Analyses are example cases applied to one sample for each material. All machine learning techniques were repeated twice on separate samples grown with identical conditions to verify reproducibility. Slight changes in the azimuthal angle or the incidence angle can create intensity variations;24 therefore, unless extreme care is given to the substrate placement on the heater, sample to sample comparison will be complicated when machine learning techniques are applied. However, for a given growth, the incidence angle and the azimuthal angle are constant. The results of this work have been verified for more than one sample, indicating that while we cannot combine several samples due to such variations, the conclusions are universally applicable (see the supplementary material).
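The normalization and rank scan described above can be sketched in a few lines. The frames below are synthetic 8-bit arrays built from three hypothetical patterns, and the range of ranks scanned is an assumption; only the NMF settings (random initialization, coordinate descent, Frobenius loss, 100 maximum iterations) follow the text.

```python
import warnings
import numpy as np
from sklearn.decomposition import NMF
from sklearn.exceptions import ConvergenceWarning

rng = np.random.default_rng(1)
n_frames, height, width = 80, 12, 12

# Hypothetical 8-bit frames built from three non-negative patterns (true rank 3).
patterns = rng.random((3, height * width))
weights = rng.random((n_frames, 3))
frames_uint8 = np.clip(60.0 * weights @ patterns, 0, 255).astype(np.uint8)

# Flatten each frame into a row and normalize by dividing by 255.
X = frames_uint8.reshape(n_frames, -1) / 255.0

# Reconstruction error (Frobenius norm of X - WH) for each candidate rank.
errors = {}
with warnings.catch_warnings():
    # 100 iterations may stop before full convergence; silence that warning here.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for rank in range(1, 7):
        model = NMF(n_components=rank, init="random", solver="cd",
                    beta_loss="frobenius", max_iter=100, random_state=0)
        model.fit(X)
        errors[rank] = model.reconstruction_err_
```

Plotting `errors` against rank and looking for the elbow gives the rank to use; for these synthetic frames the error is expected to flatten near the true rank of three.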
III. RESULTS AND DISCUSSION
A. LaAlO3 (LAO)
LAO provides a well-defined layer-by-layer growth characteristic over a broad processing range. The conditions chosen were shown to yield single crystalline epitaxial heterostructures in the current processing system.25–29 The intensity oscillations and the RHEED pattern during deposition can be seen in Figs. 1(a) and 1(b), respectively, which indicate layer-by-layer growth. Only the (00) and (0) curves are shown for simplicity, but a comparison of (0) and (01) is shown in Fig. S1 in the supplementary material. The amplitude of the intensity oscillations in several figures has been rescaled to aid in comparison to scores and basis plots; the rescaling magnitude is provided in the captions of the images and, if not noted, is the same as in Fig. 1(a). In general, the intensity fluctuation is related to the growth where the maximum intensity indicates full coverage (θ = 1) of the layer and the minimum intensity indicates the largest roughness when the layer has half the coverage (θ = 0.5). The intensity plot in Fig. 1(a) starts at the third oscillation due to uneven fluctuations at the start of growth as the mobility of adatoms changes between the substrate and the first couple of layers of the film. AFM shows an atomically smooth film surface replicating the terraces of the Ti-terminated substrate [Fig. 1(c)]. High angle annular dark field scanning transmission electron microscopy (HAADF STEM)29 images confirm the single crystalline nature of the film and sharpness of the interface [Fig. 1(d)].
The first five principal components of the LAO RHEED video add up to 70% of the variance; the scree plot can be seen in Fig. S2 in the supplementary material. Figures 2(a)–2(e) show the first five loading plots (eigenvectors) for the first five principal components and (f)–(j) show their corresponding score plots. The score plots in Figs. 2(f), 2(g), and 2(i), corresponding to PC1 (28.5%), PC2 (20.4%), and PC4 (7.0%), respectively, oscillate in a manner similar to that of the intensity oscillations (Fig. 1), which are overlaid on those score plots. The PC1 scores [Fig. 2(f)] are half a wavelength off that of the (0) RHEED spot intensity oscillations where the minimum of one corresponds to the maximum of the other. Positive scores are correlated with the minimum in the (0) RHEED spot in the early parts of the deposition, while the negative scores, that are due to the forced orthogonality of PCA, are related to the maximum in the (0) RHEED spot in the later parts of the deposition [Fig. 2(f)]. Positive loadings are associated with the specular (00) spot [Fig. 2(a)]. Since the score maxima decrease with increasing film thickness (0–25 s), it suggests that the specular (00) spot is sensitive to the reflection from the substrate layers, whose contribution decreases with increasing film thickness. Negative loadings are associated with the (0) specular spots and some of the streaks around them. Since the (0) and (01) spots are associated with negative scores, and the peaks of the negative scores become larger with time, and thus thickness, this indicates that the (0) RHEED spot is more sensitive to the film thickness and growth. Additionally, because these negative scores align with the maximum of the (0) intensity, it suggests that the intensity of this spot can be correlated to the completion of the layers in the film. 
The magnitude of positive loadings is much lower than that of the negative loadings, which is likely due to the phase shift between the (00) and (0) RHEED spots [Fig. 1(a)]. This relationship, thus, is better observed in PC2. The loadings map for PC2 [Fig. 2(b)] shows that it is strongly and positively related to the (00) RHEED spot. The score oscillations align with the intensity oscillations of the (00) RHEED spot [Fig. 2(g)] and the magnitude of the positive scores decreases with increasing film thickness. This is consistent with the suggestion that the (00) spot is more sensitive to the substrate, whereas the (0) RHEED spot, we argue based on these observations, is more sensitive to the film growth. PC4, the only other PC with oscillations, is correlated positively with the (00) RHEED specular spot, and a prominent non-specular spot higher up on the (00) streak, and negatively (to a lesser magnitude) with the streaks of the (01) and (0) RHEED spots [Fig. 2(d)]. It has been seen that prominent non-specular spots can appear in the diffraction patterns due to diffusely scattered electrons being reflected into Kikuchi lines, which are enhanced as they cross the RHEED streak.24,30 Diffuse scattering is also known as incoherent scattering.24 PC4 seems to introduce a correction to the overall intensity of PC1 as the average intensity decreases for PC1 and increases for PC4 with time, while the average intensity oscillations do not show a change with time. If there is a decrease in the overall quality of the surface as the film gets thicker, one would expect a decrease in the overall intensity of the reflections and an increasing contribution of the diffuse scattering. Thus, the intensity trends between PC1 and PC4 suggest that PC4 is more sensitive to diffuse scattering effects, which is consistent with the observation of the non-specular spot on (00).
PC3 (8.8%) and PC5 (4.4%) scores do not have any distinct pattern and have a similar percent variance as the noise observed for STO sitting in the deposition chamber at the respective temperatures and pressures [Figs. 2(h) and 2(j)]. Therefore, this noise is likely from vibrations in the room that contains the electron beam for RHEED and thermal fluctuations from the heater in the PLD. PCA of a substrate sitting in the chamber at the deposition conditions, which accounts for this noise, can be seen in Fig. S6 in the supplementary material.
As a result of PCA, we recognize that there are differences in response between the (00) and (0) RHEED spots, which have historically been chosen arbitrarily for intensity analysis. When the intensity fluctuations were compared [Fig. 1(a)] for both (00) and (0) RHEED spots, a clear shift was observed, indicating that the point identified as full layer coverage would be different if it was based on different diffraction spots. There was no conclusion on exactly which location best demonstrates full coverage at maximum intensity. However, both PC1, which describes a larger amount of variance, and PC4, which is related to diffuse scattering, suggest that intensity fluctuations of the (0) and (01) RHEED spots are more sensitive to changes associated with growth. The phase shift between the intensity fluctuations of the two RHEED spots does not show any dependence on deposition time, and thus thickness (Fig. S3 in the supplementary material). This shift in the oscillation pattern for different RHEED spots was uniformly observed for all the intensity maxima and was reproducible for different materials, as shown in Fig. S4 in the supplementary material for STO deposited on STO and La0.57Li0.29TiO3 deposited on STO.
Phase shifts between intensity oscillations have been observed previously in RHEED studies. However, these studies were based on different samples grown under identical growth conditions and were related to different diffraction conditions such as the incidence and azimuthal angles.24,30–32 Those studies concluded that RHEED intensity does not always directly correlate to monolayer completion.30,32 In this study, we observe a phase shift between reflections of the same diffraction pattern, thus the diffraction conditions related to sample and beam geometry are the same. Based on models built to describe the phase shift, the kinematic model alone is not sufficient to describe phase shifts and incoherent scattering can significantly affect the RHEED intensity oscillations.30,32 The phase shift between different diffraction spots, i.e., (00) and (01) may be due to the effect of Kikuchi lines dominating one of the signals.24 In fact, in Fig. S5 in the supplementary material, the Kikuchi line is overlaying the specular region of the (00) diffraction spot, but not the specular regions of the (0) and (01) spots, indicating that incoherent scattering is interfering with the diffraction spots.
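One simple way to quantify a phase shift between the intensity traces of two spots is to compare their Fourier phases at the dominant oscillation frequency. The sketch below is an illustration on synthetic traces, not the procedure used in this work; the period, the imposed shift, and the helper name `phase_lag` are hypothetical.

```python
import numpy as np

fps = 4.0                      # frames per second of the extracted frames
t = np.arange(0, 60, 1 / fps)  # 240 samples covering 60 s
period = 12.0                  # hypothetical oscillation period in seconds
true_shift = np.pi / 3         # hypothetical phase lag of spot B behind spot A

# Synthetic intensity traces for two diffraction spots.
spot_a = 1.0 + 0.5 * np.cos(2 * np.pi * t / period)
spot_b = 1.0 + 0.5 * np.cos(2 * np.pi * t / period - true_shift)

def phase_lag(a, b):
    """Phase (radians) by which b lags a at the dominant frequency of a."""
    fa = np.fft.rfft(a - a.mean())
    fb = np.fft.rfft(b - b.mean())
    k = np.argmax(np.abs(fa))          # bin of the strongest oscillation in a
    return np.angle(fa[k] * np.conj(fb[k]))

lag = phase_lag(spot_a, spot_b)          # radians
lag_seconds = lag / (2 * np.pi) * period
```

A lag of π corresponds to traces that are half a wavelength apart. Measured traces with drift, damping, or a non-integer number of periods would need detrending or windowing before this estimate is meaningful.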
Next, NMF was applied to the same dataset (Fig. 3). Despite similarities in the overall look of the basis maps [Figs. 3(e)–3(h)] with score plots in PCA and the coefficient plots [Figs. 3(a)–3(d)] with loading plots in PCA, NMF provides a much clearer description of the observations when compared with PCA due to its non-negativity constraint. A plot of the Euclidian distances for each rank that was used to choose rank 4 is shown in Fig. S7 in the supplementary material. Cluster 1 is clearly related to the average intensity change of the whole RHEED pattern which is very minute. Cluster 2 and cluster 3 separate the intensity fluctuations of the (00) and (0) RHEED spots, respectively. The basis of cluster 2 maximizes at times where the intensity of the (00) spot minimizes [Fig. 3(f)]. The coefficient plot has higher values for the streaks of all the spots, which correspond to a disordered surface during growth of the monolayer [Fig. 3(b)]. Additionally, similar to PC4, there is a prominent non-specular spot on the (00) diffraction streak, which is most likely due to diffuse scattering. As there is no relation to the specular spots in the cluster 2 coefficient plot, this cluster explains the diffuse contribution to the diffraction pattern, caused by changes in growth [Fig. 3(b)].
Cluster 3 perfectly aligns with the intensity fluctuations of the (0) spot [Figs. 3(c) and 3(g)] and cluster 4 is directly related to the intensity fluctuations of the (00) spot [Figs. 3(d) and 3(h)].
Kmeans was performed on LAO using three clusters based on a silhouette plot (Fig. S8 in the supplementary material). The clusters are periodic over time [Fig. 4(a)] and when the clusters are plotted on the (00) RHEED spot intensity, they refer to specific points in the growth of a layer [Fig. 4(b)]. Figures 4(c)–4(e) are averages of the images in each cluster for reference. A random selection of images from each cluster was also inspected for any notable differences. Kmeans cluster 3 is associated with the specular spots, particularly (00), and corresponds to its maximum intensity where the layers are complete and smooth (θ ≈ 1). The association of maximum intensity of the (00) RHEED spot with a narrow specular spot agrees with the results from PCA and theory.24,30 Although this intensity distribution [Fig. 4(b)] suggests that cluster 1 corresponds to low coverage of the layer (0 < θ < 0.5) and cluster 2 to higher, but not fully complete, coverage (0.5 < θ < 1), the clustering is in reality due to the phase shift in the intensity fluctuations between the two RHEED spots. When the same clusters are overlaid on the intensity fluctuations of the (0) spot, cluster 2 becomes associated with the maximum intensity (Fig. S9 in the supplementary material). The average RHEED plot for cluster 2 also shows a stronger intensity for the specular (0) spot [Fig. 4(d)]. Cluster 1, regardless of the RHEED spot chosen, corresponds to regions of lower coverage and the streak regions are emphasized in comparison with the specular spots of (0) for cluster 2 and the (00) specular spot for cluster 1. Such emphasis on streaks for lower than monolayer coverage is expected to occur.24
Kmeans cluster 3 gets narrower and cluster 2 gets wider with each layer [Fig. 4(b), Fig. S9 in the supplementary material]. As the films get thicker, the (0) spot intensity associated with cluster 2 differentiates changes in growth better than the (00) spot, which is more dominant early on due to the contribution of the substrate to the diffraction. This is consistent with our conclusion from PCA analysis that the (0) type spots are more sensitive to growth.
B. LiCoO2 (LCO)
Figure 5(a) shows the RHEED pattern several seconds into the deposition while Fig. 5(b) shows the pattern toward the end of the deposition. The pattern transitions from streaks to an array of spots, which have been known to represent island growth.7 As seen in Fig. 5(c), the intensity is not oscillating with time in a way that would indicate layer-by-layer growth. As with LAO, the amplitude of the intensity oscillations of LCO has been rescaled to aid in comparison to scores and basis plots; the magnitude of rescaling is provided in the captions of the images and, if not noted, is the same as in Fig. 5(c). Additionally, there are large features on the surface consistent with a three-dimensional growth mode and the formation of a polycrystalline film [Fig. 5(d)].
PCA analysis of RHEED data showed that most of the variance can be represented by two principal components, PC1 (63.0%) and PC2 (29.4%) (Fig. S10 in the supplementary material). PC1 is strongly correlated with earlier times of the deposition from the very large scores in the first 10 s [Fig. 6(c)]. The corresponding loadings are associated with the specular spots and streaks, which were the features that were observed in the early RHEED patterns [Fig. 5(a)]. As films switch to island growth at later stages, the scores are negative and the negative loadings correspond to the spotty pattern such as those at the later stage of growth in the RHEED pattern [Fig. 5(b)]. Since the specular spots are observed for both the initial growth regime and the later islands, PC1 differentiates the two based on the width of the diffraction streak. Roughness of the surface in general results in broadening of the streaks and specular spots and since the later parts of the growth result in increased roughness, its effect is observed within the negative loadings as a halo around the specular spot, emphasizing the information hidden within the diameter of the specular spot. PC2 is harder to identify because while there are positive scores early on, the end of the deposition also returns to positive scores. Therefore, to understand PC2, we look to the negative scores that seem to emphasize the transition region. The associated loadings are the different parts of the halos around the specular spots and some of the streaks. However, a physical understanding is hard to deduce.
Once again, NMF shows a much clearer identification of the regions by removing negative values. A plot of the Euclidian distance for each rank is seen in Fig. S11 in the supplementary material. The second cluster is associated with the initial parts of the growth based on the basis plot and the associated coefficient map, which shows a dependence on the narrow specular spots [Figs. 7(b) and 7(e)]. Cluster 1 then identifies the transition region [Fig. 7(d)] and cluster 3 identifies the island growth [Fig. 7(f)] as seen from its coefficient map which has an idealized spotty pattern [Fig. 7(c)]. The coefficient map of cluster 1, however, shows that the transition region is represented by broadening of the specular spots (seen from the darker shell around the three lighter specular spots) [Fig. 7(a)]. These observations suggest an initial roughening of the surface finished with island growth. Thus, the transition region does not necessarily mean a transition from one mode of growth to another, but a region between the low thickness of the initial growth, where the reflections are still dominated by the substrate, and the broadening of the streaks due to increasing roughness.
When kmeans was examined with two clusters (the silhouette plot in Fig. S12 in the supplementary material), the clusters remain in blocks of time, separating the growth into two parts combining clusters 1 and 3 of NMF into cluster 1 of kmeans. Cluster 1 of kmeans is representing the island growth and is associated with the spots in the pattern. Cluster 2 represents the beginning stages of the growth and is associated with narrower specular spots. However, NMF clustering more clearly isolates the idealized pattern of the island growth as it was able to separate the transition region (Fig. 8).
C. SrRuO3 (SRO)
Now that cases of layer-by-layer and island growth have been demonstrated, we will look at a third material, SRO, which exhibits a more complicated growth mechanism. Two different partial pressures of background oxygen were used, 13.3 and 6.67 Pa (Fig. 9). The sample grown at 13.3 Pa has intensity oscillations that fade, whereas they remain for the sample grown at 6.67 Pa. This indicates that in the sample grown with a background pressure of 13.3 Pa, there is a transition in growth, either to step-flow growth, which is two-dimensional, or to island growth, which is three-dimensional.30,33–35 Again, the amplitude of the intensity oscillations in Fig. 9(b) has been rescaled to aid in comparison to scores and basis plots; the magnitude of rescaling is provided in the captions of the images and, if not noted, is the same as in Fig. 9(b). X-ray diffraction (XRD) showed the presence of secondary phases, Sr2RuO4 and Sr3Ru2O7, in addition to the epitaxial SRO phase [Fig. 10(a)]. Samples grown at 13.3 Pa had lower amounts of these secondary phases in comparison with the samples grown at 6.67 Pa, based on relative intensity ratios [with (003) peak]. Both samples showed relatively smooth terraces with around two unit-cell size steps [Figs. 10(c) and 10(d)]. The sample grown at 13.3 Pa also exhibited a much sharper interface and surface as determined by the x-ray reflectivity (XRR) [Fig. 10(b)]. These measurements indicate that the sample grown at 13.3 Pa switched to step-flow mode and provided a better-quality film than the one grown at 6.67 Pa, which retained layer-by-layer growth. Step-flow growth takes place when there is a high diffusion coefficient of the adatoms, and rather than nucleating and growing in areas across the film, the adatoms nucleate and grow out from the steps.30,33 Therefore, there is not a fluctuation of roughness of the films that would lead to intensity oscillations when the growth is layer-by-layer. 
This transition from layer-by-layer to step-flow for the growth of SRO/STO heterostructure has also been observed elsewhere.36
PCA was executed on the SRO sample grown at 13.3 Pa to better understand step-flow growth via RHEED patterns. The first four principal components add up to 83.6% of the variance (Fig. S14 in the supplementary material). Figure 11 shows the first four principal component loading plots as well as their corresponding score plots. Based on only the intensity vs time plot, the transition from layer-by-layer to step-flow growth appears to occur at 65 s for the (00) RHEED spot, which shows three oscillations, but around 45 s for the (0) spot, which only shows two oscillations [shown in Fig. 9(b) and overlaid in Figs. 11(e)–11(h)]. PC1 (43.1%) shows a relationship to the first 46 s, where the oscillations are stronger, suggesting a mixed transition into step-flow [Fig. 11(e)]. This section of growth, interestingly, is not associated with the specular spots. Instead, it showed a stronger relationship to the halo around the specular region of the (00) spot and, to a lesser extent, to the streaks of (01) and (0) [Fig. 11(a)]. Both the broadening of the (00) specular spot and the emphasis of streaks (of the (01) and (0) spots) relate to less than full coverage during layer-by-layer growth, indicating greater overall roughness in the layer-by-layer region of growth. However, in the step-flow growth region (>45 s), the (01) and (0) specular spots are emphasized [Fig. 11(a)]. PC2 (16.65%) is strongly related to the start of the growth and thus to the substrate itself [Fig. 11(f)]. This principal component is related to the specular regions of the (01) and (0) spots, thus indicating that the high-quality surface is defined by these specular spots in this sample rather than the (00) spot [Fig. 11(b)]. This corresponds to the same regions identifying the step-flow growth in PC1, where one would expect high-quality surfaces. PC3 (13%) shows score oscillations half a period out of phase with the intensity fluctuations in the layer-by-layer growth region [Fig. 11(g)], similar to those observed for LAO, which was growing layer-by-layer for the whole range. The maximum of these scores identifies where the roughness is greatest. However, for PC3, negative loadings are stronger [Fig. 11(c)], which relate to the highest intensity points in the layer-by-layer growth. Thus, PC3 introduces a modification to PC1 that defines the fluctuations within the layer-by-layer growth regime. The positive loadings of PC1 and the negative loadings of PC3 are associated with similar regions of the RHEED pattern. Finally, PC4 (10.8%) has scores that show one large fluctuation from positive scores to negative scores within the layer-by-layer growth regime. The relationship to intensity fluctuations is more complicated, and the most significant feature in the loadings map [Fig. 11(d)] is the halo around the specular (00) spot, which shows negative loadings. Once again, the higher principal components (lower contribution to the variance) are more difficult to relate to physical descriptions of the underlying behavior in PCA.
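The relationship between loading maps, score traces, and explained variance discussed above can be illustrated with a minimal sketch. It assumes the video is arranged as a frames-by-pixels matrix and computes PCA through an SVD of the mean-centered matrix; the function name `pca_video` and the synthetic video are hypothetical stand-ins, not the authors' notebook or data.

```python
import numpy as np

def pca_video(frames, n_components=4):
    """PCA of a video via SVD of the mean-centered (frames x pixels) matrix."""
    n_frames, h, w = frames.shape
    X = frames.reshape(n_frames, h * w).astype(float)
    Xc = X - X.mean(axis=0)                      # subtract the time-averaged pattern
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    scores = U[:, :n_components] * S[:n_components]            # one time trace per PC
    loadings = Vt[:n_components].reshape(n_components, h, w)   # one image per PC
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for a RHEED video (an assumption, NOT the authors' data):
# a narrow spot and a diffuse halo whose weights oscillate in antiphase.
rng = np.random.default_rng(0)
t = np.arange(120.0)
yy, xx = np.mgrid[0:16, 0:16]
r2 = (yy - 8) ** 2 + (xx - 8) ** 2
spot = np.exp(-r2 / 4.0)                         # narrow specular-like spot
halo = np.exp(-r2 / 40.0) - spot                 # diffuse ring around the spot
osc = np.exp(-t / 60.0) * np.cos(2 * np.pi * t / 20.0)   # damped oscillation
frames = (
    (2.0 + osc)[:, None, None] * spot
    + (1.0 - osc)[:, None, None] * halo
    + rng.normal(0.0, 0.02, size=(t.size, 16, 16))
)

scores, loadings, explained = pca_video(frames, n_components=4)
print("explained variance fractions:", np.round(explained, 3))
print("cumulative fraction:", round(float(explained.sum()), 3))
```

In this arrangement each loading is an image and each score column is a time trace. The sign of any principal component is arbitrary (flipping both the loading and its scores leaves the reconstruction unchanged), so positive and negative loadings are only meaningful relative to the sign of the corresponding scores.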
A rank of four was used for NMF of SRO (Fig. S15 in the supplementary material). In NMF, the basis for the first cluster [Fig. 12(e)] lines up with the intensity oscillations, particularly of the diffuse region of the (0) spot, and shows a correlation with the same regions [see the magenta box in Fig. 9(a)] in the coefficient maps [Fig. 12(a)]. However, a stronger relationship is observed with the (00) specular spot. This is interesting as the intensity was saturated throughout the growth, and therefore cannot be compared to the other three intensities. Figures 12(b) and 12(f), cluster two in NMF, correspond to the step-flow region of growth, which has a narrower (00) specular spot and darker (0) and (01) specular spots, consistent with a higher quality surface as is observed for the original substrate. Cluster 3 is the transition region between layer-by-layer and step-flow growth, indicating that the (0) and (01) specular spots are becoming more prominent while the (00) specular spot is starting to narrow before moving into step-flow (the variance is seen as a halo effect). Similar to PC2, cluster 4 [Figs. 12(d) and 12(h)] corresponds to the beginning of growth when the surface is smoothest, and this is represented in the coefficient plot by dark (0) and (01) specular spots. It is important to note that the (00) specular spot was not associated with this smooth surface. Due to saturation, little variance can be observed for this spot, and it was not highlighted in any of the loading plots of the principal components. However, it was categorized with cluster 1 in NMF.
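A minimal sketch of how a video can be factorized into non-negative time traces (the basis) and non-negative maps (the coefficients) is given below. The source does not state which NMF solver was used; this sketch assumes the Lee-Seung multiplicative updates for the Frobenius norm, and `nmf_video`, the rank of two, and the synthetic video are hypothetical choices made only for illustration.

```python
import numpy as np

def nmf_video(frames, rank=4, n_iter=300, seed=0, eps=1e-9):
    """Approximate V (frames x pixels) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates (an assumed solver choice).
    """
    n_frames, h, w = frames.shape
    V = frames.reshape(n_frames, h * w).astype(float)
    if V.min() < 0:
        raise ValueError("NMF requires non-negative intensities")
    rng = np.random.default_rng(seed)
    W = rng.random((n_frames, rank)) + eps       # time traces ("basis")
    H = rng.random((rank, h * w)) + eps          # flattened maps ("coefficients")
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # update maps with W fixed
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)   # update traces with H fixed
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    return W, H.reshape(rank, h, w), rel_err

# Synthetic stand-in (an assumption, NOT the authors' data): a broad pattern
# that fades out while a narrow pattern fades in, plus non-negative noise.
rng = np.random.default_rng(1)
t = np.arange(120.0)
yy, xx = np.mgrid[0:16, 0:16]
r2 = (yy - 8) ** 2 + (xx - 8) ** 2
broad = np.exp(-r2 / 40.0)
narrow = np.exp(-r2 / 4.0)
a = np.clip(1.0 - t / 60.0, 0.0, 1.0)            # weight of the broad pattern
frames = (
    a[:, None, None] * broad
    + (1.0 - a)[:, None, None] * narrow
    + np.abs(rng.normal(0.0, 0.02, size=(t.size, 16, 16)))
)

W, H, rel_err = nmf_video(frames, rank=2)
print("relative reconstruction error:", round(float(rel_err), 4))
```

Because both factors are constrained to be non-negative, every map can be read directly as an intensity pattern and every trace as the weight of that pattern over time, without the sign bookkeeping that PCA loadings require.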
Kmeans was performed on the SRO RHEED video with two clusters (Fig. 13 with the silhouette plot in Fig. S16 in the supplementary material). As seen in PCA and NMF, cluster 1, which corresponds to the step-flow region, has darker (0) and (01) spots and a narrower central spot than cluster 2. The broadening in the layer-by-layer region (cluster 2) is associated with a less than perfect surface, and once step-flow growth starts to occur, a quality comparable to the initial surface quality is achieved. Additionally, the clusters are separated at around 45 s. This transition is better aligned with that observed for the intensity fluctuations of the (0) RHEED spot, again supporting the conclusion reached for LAO that the intensity fluctuations of the (0) RHEED spot are more sensitive to the layer-by-layer film growth than those of the (00) spot.
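The clustering of frames into growth regimes and the use of a silhouette value can be sketched as follows. This is a minimal illustration assuming Lloyd's algorithm with a deterministic farthest-first initialization (chosen here only for reproducibility) and Euclidean distances between flattened frames; `kmeans_frames`, `mean_silhouette`, and the synthetic video with a pattern change at frame 45 are hypothetical and are not the authors' implementation or data.

```python
import numpy as np

def kmeans_frames(frames, k=2, n_iter=100):
    """Lloyd's k-means on flattened frames with farthest-first initialization."""
    n_frames, h, w = frames.shape
    X = frames.reshape(n_frames, -1).astype(float)
    centers = [X[0]]                              # start from the first frame
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])           # frame farthest from all centers
    centers = np.array(centers)
    labels = np.full(n_frames, -1)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                 # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)   # cluster-average frame
    return labels, centers.reshape(k, h, w), X

def mean_silhouette(X, labels):
    """Mean silhouette coefficient (requires at least two clusters)."""
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() < 2:
            continue                              # singleton cluster scores 0
        a = D[i, same].sum() / (same.sum() - 1)   # mean distance within cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic stand-in (an assumption, NOT the authors' data): a broad central
# spot for frames 0-44, then a narrow spot with two side spots from frame 45.
rng = np.random.default_rng(2)
yy, xx = np.mgrid[0:16, 0:16]
r2 = (yy - 8) ** 2 + (xx - 8) ** 2
rough = np.exp(-r2 / 40.0)
side = np.exp(-((yy - 8) ** 2 + (xx - 3) ** 2) / 2.0) \
     + np.exp(-((yy - 8) ** 2 + (xx - 13) ** 2) / 2.0)
smooth = np.exp(-r2 / 4.0) + side
frames = np.concatenate([np.repeat(rough[None], 45, axis=0),
                         np.repeat(smooth[None], 45, axis=0)])
frames = frames + rng.normal(0.0, 0.02, size=frames.shape)

labels, centers, X = kmeans_frames(frames, k=2)
transition = int(np.flatnonzero(np.diff(labels))[0]) + 1
sil = mean_silhouette(X, labels)
print("first frame of second cluster:", transition)
print("mean silhouette:", round(float(sil), 3))
```

Each cluster center is simply the average of the frames assigned to it, which is why the method summarizes a growth regime with a single image and cannot show variation within that regime.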
IV. CONCLUSIONS
PCA, NMF, and kmeans are powerful machine learning tools that aid in the interpretation of microscopy data. In the analysis of RHEED, where the data are often analyzed qualitatively, these techniques can distinguish points of interest within the data. Three samples, each with a different growth mode, were compared to understand the application of PCA, NMF, and kmeans to thin film growth. For LAO, which demonstrated layer-by-layer growth, it was determined that the (00) specular spot has contributions due to the substrate. In comparison, the (0) and (01) spots are more sensitive to the growth of the films and suggest a greater correspondence to the completion of the layers. Diffuse scattering from the overlap of the RHEED streaks with Kikuchi lines was also distinguished in PCA and NMF. From this machine learning analysis, it was discovered that there is a phase shift between the intensity oscillations of different RHEED spots. This is crucial as the location for intensity collection may be arbitrarily chosen and used as the determining factor for layer completion, which has a critical influence on the properties of thin films. When LCO, which demonstrated island growth, was examined, it was found that, as for LAO, the initial pattern could be dominated by the substrate. Additionally, the broadening of streaks occurs as the film roughens. Finally, SRO had a transition from layer-by-layer to step-flow growth. The features of the substrate could be identified as dark (0) and (01) spots for this sample, which were also present in the step-flow region of growth, indicating that step-flow growth leads to smoother layers. This was verified by the broader (00) spot in the layer-by-layer region as compared to the step-flow region. Although both PCA and NMF condense the data and aid in a quick understanding of the changes that are occurring during growth, NMF was a more straightforward method for interpretation of the data due to its non-negativity constraint. However, if an understanding of the noise is desired, PCA may be better at gathering this information. Kmeans determines which frames are the most similar and was able to identify different regions of growth for all three materials, but it relies on averaging the frames that belong to each cluster and could lose vital information in this process. Therefore, NMF is recommended for use on RHEED videos because its results are the most straightforward to interpret. Machine learning techniques are capable of distinguishing important features and identifying where transitions in growth are occurring and how the formation of layers changes with time.
SUPPLEMENTARY MATERIAL
See the supplementary material for the plots that were used to determine the number of components, ranks, and clusters used for PCA, NMF, and kmeans, respectively. Further information including an additional trial for repeatability is also contained in this material.
ACKNOWLEDGMENTS
Kim Gliebe thanks the National Defense Science and Engineering Graduate Fellowship (No. F-6692990251) for funding this work. This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. We would also like to thank the SDLE center for helpful discussions, Professor Steven Eppell for access to the AFM, and Elahe Farghadany for help with PLD.
DATA AVAILABILITY
The Jupyter Notebook that supports the findings of this study is openly available in GitHub at http://doi.org/10.17605/OSF.IO/H6J7X (Ref. 37), and the videos are available from the corresponding author upon reasonable request.