Materials discovery and design require characterizing material structures at the nanometer and sub-nanometer scale. Four-Dimensional Scanning Transmission Electron Microscopy (4D-STEM) resolves the crystal structure of materials, but many 4D-STEM data analysis pipelines are not suited for the identification of anomalous and unexpected structures. This work introduces improvements to the iterative Non-Negative Matrix Factorization (NMF) method by implementing consensus clustering for ensemble learning. We evaluate the performance of models during parameter tuning and find that consensus clustering improves performance in all cases and is able to recover specific grains missed by the best performing model in the ensemble. The methods introduced in this work can be applied broadly to materials characterization datasets to aid in the design of new materials.

Acceleration of materials discovery and design necessitates versatile and robust characterization tools that can resolve nm and sub-nm structures. Four-Dimensional Scanning Transmission Electron Microscopy (4D-STEM) is a technique that allows structural maps to be created over micrometer-scale fields of view at high spatial resolution.1 This is achieved by scanning a converged electron probe over a two-dimensional (2D) region of a sample and collecting a diffraction pattern at each position. Thus, 4D-STEM datasets contain a 2D diffraction pattern for each probe position over the sample scan region [Fig. 1(a)]. These diffraction patterns contain a rich set of information regarding the underlying structure of the sample, which can be quantified from the intensities and positions of Bragg disks for crystalline materials or from the diffuse scattering signal for disordered and amorphous materials. Crystallographic orientations,2–5 phases,6,7 and properties8–11 have been extracted from 4D-STEM datasets, but automated and semi-automated analysis pipelines have not been universally established for novel material systems.12

FIG. 1.

Overview of 4D-STEM with the visual representation of datasets and results. (a) Diagram of the 4D-STEM dataset. (b) Maximum diffraction patterns and (c) true cluster labels for the three simulated datasets, Ag1 (top), Ag2 (middle), and Ag3 (bottom). (d) Example of a successful model for each dataset. (e) Example of a failed model for each dataset.


Real space maps of the sample are commonly reconstructed from 4D-STEM datasets using virtual apertures, i.e., by summing the diffracted intensity within specific regions of diffraction space at each probe position to build a real space intensity map. While this technique has been used in the past to understand crystallographic and phase distributions within a sample,13–15 it requires manual placement of masks, which can bias the analysis. Researchers have designed semi-automated and automated protocols for data analysis using template matching procedures6,16–19 and machine learning.20–24 Template-based techniques can be useful when the phases present in the material are known and the classification problem is simple, but complex or anomalous structures often arise during the design of new materials. Deep learning approaches often depend on simulated data for training, which may not reflect the complexity of new materials, where structures or properties outside the training set may arise. Supervised methods cannot capture information that deviates from current knowledge, which can prevent them from aiding materials discovery and design pipelines. Thus, unsupervised learning is a practical alternative for rapidly identifying regions of self-similarity within a dataset.25
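
The virtual aperture summation mentioned at the start of this paragraph can be expressed compactly with array operations. The following is a minimal NumPy sketch, not the py4DSTEM implementation; the (Rx, Ry, Qx, Qy) array layout, the function names, and the annular aperture geometry are illustrative assumptions.

```python
import numpy as np

def annular_mask(shape, center, r_inner, r_outer):
    """Boolean virtual aperture selecting r_inner <= r < r_outer in diffraction space."""
    qx, qy = np.indices(shape)
    r = np.hypot(qx - center[0], qy - center[1])
    return (r >= r_inner) & (r < r_outer)

def virtual_image(datacube, mask):
    """Real-space map formed by summing the masked diffraction intensity at each probe position.

    datacube: ndarray of shape (Rx, Ry, Qx, Qy); mask: boolean array of shape (Qx, Qy).
    """
    return np.tensordot(datacube, mask.astype(datacube.dtype), axes=([2, 3], [0, 1]))

# Illustrative usage on a random stand-in datacube (array sizes are arbitrary)
rng = np.random.default_rng(0)
datacube = rng.random((32, 32, 128, 128)).astype(np.float32)
mask = annular_mask((128, 128), center=(64, 64), r_inner=20, r_outer=30)
dark_field = virtual_image(datacube, mask)  # shape (32, 32)
```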

Unsupervised learning pipelines for 4D-STEM data analysis have been introduced previously.5,20,26,27 For crystalline materials, these approaches primarily focus either on detecting unique Bragg reflections and clustering on that representation5 or on virtual imaging.14,27 For amorphous materials, the Pair Distribution Function (PDF) and similar statistical operations in the azimuthal direction in reciprocal space are frequently used.9,28 Techniques that operate on the full, multidimensional dataset have only shown success in simple systems.23 Here, we present guidelines for the implementation of unsupervised learning and new approaches for improved performance. We first discuss the impact of parameter selection on performance to guide the implementation process for different feature sets. We then apply consensus clustering25,29,30 to improve the performance of unsupervised pipelines and evaluate the stability of the parameter sets.

Analyzing 4D-STEM datasets requires solving a multi-class problem. Several different crystal orientations or phases can be present within a dataset, as shown by the various colors in Fig. 1(a). These classes can overlap within a single pattern, requiring a soft clustering model to resolve patterns that contain multiple crystal structures. We simulated three datasets containing Ag grains (referred to as Ag1, Ag2, and Ag3) with varying orientations and grain sizes [(35 × 41 × 40 Å3), (52 × 62 × 60 Å3), and (70 × 82 × 80 Å3)] to evaluate the performance of our models against different extents of grain overlap. To simulate the 4D-STEM datasets, we used custom MATLAB scripts that implement the multislice algorithm of Cowley and Moodie,31 the methods defined by Kirkland,32 and the plane wave reciprocal space interpolated scattering matrix (PRISM) algorithm,33 with a 300 keV electron beam energy, a probe semiconvergence angle of 1.05 mrad, a 5 Å pixel size in real space, and a 0.01 Å−1 pixel size in reciprocal space. Labels were generated for scoring purposes but were not used during modeling.

The maximum diffraction patterns and true cluster labels for the three datasets are shown in Figs. 1(b) and 1(c), respectively. These datasets were featurized using a mean virtual imaging method referred to as the Angular Average (AA)27 and the detected Bragg Disk (BD) intensities and positions5 using methods available in py4DSTEM.34 We used a binning of 3 × 3 for the BD representation and an averaging step of 5 pixels for the AA representation because these settings maintained the integrity of the individual disk positions and intensities while reducing the size of the input data. Visual representations of these feature sets are shown in Fig. S1. The sizes of the feature sets are 288, 7056, and 63 504 for the AA, BD, and raw patterns, respectively. These selections are not universally optimal, and different binning and averaging steps are needed for datasets acquired under different imaging conditions. Prior work has shown the efficacy of performing dimensionality reduction, such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), on data featurized using statistical operations in the azimuthal direction in reciprocal space, most notably the Pair Distribution Function (PDF).9,28 While the PDF and similar featurization protocols have been successful at identifying distinct regions in amorphous multi-phase materials, prior work has shown that similar methods tend to fail when applied to crystalline data.27 Therefore, we omit these featurization methods from our analysis, as our datasets contain purely crystalline samples. Additional postprocessing steps are described in the supplementary material.
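
As a rough illustration of how an azimuthally averaged feature vector can be built per pattern, the sketch below computes a radial mean in diffraction space with NumPy. It is a simplified stand-in for the AA featurization performed with py4DSTEM, not the code used in this work; the pattern center, the number of radial bins (set to 288 only to match the AA feature length quoted above), and the helper names are assumptions.

```python
import numpy as np

def angular_average(pattern, center, n_bins):
    """Mean intensity in concentric radial shells of a single diffraction pattern."""
    qx, qy = np.indices(pattern.shape)
    r = np.hypot(qx - center[0], qy - center[1])
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=pattern.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)

def featurize_aa(datacube, center, n_bins=288):
    """Stack per-pattern angular averages into an (n_patterns, n_bins) matrix V for NMF."""
    rx, ry = datacube.shape[:2]
    return np.array([angular_average(datacube[i, j], center, n_bins)
                     for i in range(rx) for j in range(ry)])
```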

Iterative Non-Negative Matrix Factorization (NMF) was introduced as a computational method for 4D-STEM data analysis in the work of Allen et al.5 and used in the work of Bruefach et al.27 NMF (minimizing ‖V − WH‖F subject to W ≥ 0 and H ≥ 0) reduces the feature matrix (V, n × m) into a set of weighted linear combinations.35,36 We first apply NMF to the feature set and then merge the columns of the reduced component matrix (W, n × c) that are correlated above a defined merge threshold. This is repeated until no components are correlated above the merge threshold. The final columns in the component matrix become the clusters, with each pattern (n) having a weight associated with each cluster identity. This method mitigates the need to define the number of components, but the selection of the maximum number of components and the merge threshold can greatly impact the performance of the model. The initial number of input components can be estimated by first performing Principal Component Analysis (PCA) and setting the initial number of clusters to five times the number of components indicated by the scree plot.5,27 In our experience, this step does not add significant time to the modeling step, but the exact number of clusters is not always straightforward to read from the scree plot. If the number of expected clusters is known by the user, the maximum number of components could similarly be set to five times that value. We find that the merge threshold is strongly dependent on both the dataset and the input feature set. Optimal values typically range from 0.15 to 0.65 and become larger as the similarity between structures in the dataset decreases. We investigated the impact of the input parameters by running 25 models with different initialization conditions for each of four parameter sets (P1–P4, Table I) for the AA and BD featurizations, leading to a total of 200 trained models.
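
A minimal sketch of the iterative merging scheme described above, built on scikit-learn's NMF, is shown below. It is not the implementation used in this work (which follows Allen et al.5 and Bruefach et al.27); the choice to sum merged columns, the iteration cap, and the use of Pearson correlation between the columns of W are assumptions consistent with the text.

```python
import numpy as np
from sklearn.decomposition import NMF

def iterative_nmf(V, max_components, merge_threshold, seed=0):
    """Factorize V (n_patterns x n_features) and iteratively merge correlated components.

    Columns of W whose pairwise correlation exceeds merge_threshold are combined
    until no such pair remains; the surviving columns are the cluster weights.
    """
    model = NMF(n_components=max_components, init="random",
                max_iter=500, random_state=seed)
    W = model.fit_transform(V)                        # (n_patterns, max_components)
    reconstruction_error = model.reconstruction_err_  # ||V - WH||_F, flags failed models

    while W.shape[1] > 1:
        corr = np.corrcoef(W, rowvar=False)           # correlation between component columns
        np.fill_diagonal(corr, -1.0)
        i, j = np.unravel_index(np.nanargmax(corr), corr.shape)
        if corr[i, j] <= merge_threshold:
            break
        W[:, i] += W[:, j]                            # merge the most correlated pair (summing is one choice)
        W = np.delete(W, j, axis=1)
    return W, reconstruction_error
```

For example, W, err = iterative_nmf(V, max_components=50, merge_threshold=0.35, seed=k) would produce one ensemble member for an AA parameter set (50 input components, 0.35 threshold), with a different seed k for each of the 25 models in the set.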

TABLE I.

Parameter sets for NMF models.

Feature   Parameter set   Input components   Correlation threshold
AA        P1              50                 0.35
AA        P2              50                 0.40
AA        P3              60                 0.35
AA        P4              60                 0.40
BD        P1              60                 0.20
BD        P2              60                 0.25
BD        P3              75                 0.20
BD        P4              75                 0.25

While many models successfully segmented the data [Fig. 1(d)], we observed three distinct types of model failure [Fig. 1(e)]. The first is the inability of the model to retain information from the input feature set, leading to little or no detection of individual clusters and large values of ‖V − WH‖F, referred to as the NMF reconstruction error. In effect, the input feature columns are lost during the NMF optimization, and the final factorization does not reproduce the inputs. This type of error is easy to detect both computationally and visually [Fig. 1(e), middle row]. Failure to retain information is likely due to poor initialization conditions and typically does not affect an entire parameter set, except for the P3 models using the BD feature for the Ag2 dataset. The AA feature also had several models per parameter set that failed in this way. The second type of failure can be observed in the Ag1 BD model results shown in Fig. 1(e) (top row). It is associated with too high a merge threshold, which prevents the model from forming a single cluster for each individual grain. These two failure cases underscore the need to test parameter sets for each dataset, as proper parameter selection is crucial for optimal performance.

The last failure case is the inability to detect overlapping grains as two distinct clusters, shown in Fig. 1(e) (bottom row). We observe this failure primarily in models using the BD feature. Previous work has shown that the BD feature tends to overfit the overlapping regions by clustering them as separate classes rather than as regions containing two distinct grains.27 Thus, for classification problems in which datasets have more than one orientation or phase per pattern, the BD representation may not provide accurate grain sizes or shapes. In addition to these failures, none of the models detect all the grains in the datasets. This can be observed in the Ag1 cluster map in Fig. 1(d) (top row), where the model omits some grains present in the true cluster maps [Fig. 1(c)]. More examples of successful and failed models can be found in Fig. S2.

Since 4D-STEM data analysis requires solving a multi-class problem, there are several ways to report performance. We first calculated the sum of the weights in the intersection of the class labels and the clusters in each model to determine the best-match grain. After finding the best match, we calculated the True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), False Negative Rate (FNR), and F1 score for each pair within a model. The average of each of these metrics was calculated for each model and reported as a single value. The average TPRs for the models trained on Ag1, Ag2, and Ag3 are represented by the black markers in Figs. 2(a), 2(b), and 2(c), respectively. All other average values are presented in Table S1.
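
The best-match scoring described above can be sketched as follows. The weight threshold used to binarize cluster membership and the handling of empty denominators are assumptions; the per-grain label masks correspond to the simulation labels mentioned earlier.

```python
import numpy as np

def best_match_metrics(weights, labels, t=0.1):
    """Average TPR, FPR, and F1 over best-match (grain, cluster) pairs.

    weights : (n_patterns, n_clusters) soft cluster weights from one model
    labels  : (n_patterns, n_grains) binary ground-truth masks, one column per grain
    t       : threshold used to binarize cluster weights (assumed value)
    """
    per_grain = []
    for g in range(labels.shape[1]):
        truth = labels[:, g].astype(bool)
        best = int(np.argmax(weights[truth].sum(axis=0)))  # cluster with largest summed weight on this grain
        pred = weights[:, best] > t
        tp = np.sum(pred & truth)
        fp = np.sum(pred & ~truth)
        fn = np.sum(~pred & truth)
        tn = np.sum(~pred & ~truth)
        tpr = tp / max(tp + fn, 1)
        fpr = fp / max(fp + tn, 1)
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        per_grain.append((tpr, fpr, f1))
    return np.mean(per_grain, axis=0)  # average TPR, FPR, F1 across grains
```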

FIG. 2.

Comparison of consensus cluster and TPR measurements. Measurements and consensus cluster maps for Ag1 (a), Ag2 (b), and Ag3 (c) trained using four different parameter sets for the AA (top) and BD (bottom) representations.


We found that the model set using P1 performs well across all three datasets using the BD representation. We believe that the success of P1 is due to the selection of both the merge threshold and the number of initial components. First, the lower merge threshold improves performance for Ag1 by preventing individual grains from being split into separate clusters. In Ag3, the enhanced performance of P1 may be due to the lower number of components. All parameter sets applied to Ag2 overfit the overlapping grain regions, but the lower number of components may prevent some of the overlapping regions from being clustered as distinct from the parent grains. We find that even when datasets are very similar, parameter tuning is a crucial step in optimizing model performance.

The models developed using the AA feature with P4 perform consistently well across all datasets. We believe that the higher number of components is important for Ag1, while the higher merge threshold leads to the strong performance of P4 on the Ag2 and Ag3 datasets. It is possible that a lower merge threshold tends to merge dissimilar grains when applied to the AA feature in the presence of some grain overlap. If little is known about the content of the dataset, using the AA feature with P4 is a good choice, as these models have the best overall performance across the datasets, with an average overall TPR of 0.80 across the 75 models and three datasets.

We observed that the Ag2 dataset is more sensitive to input parameters for the BD feature. All sets using the BD feature for the Ag1 and Ag3 datasets have narrow distributions (average variance 0.001), yet the models using the BD feature for Ag2 tend to have broader distributions (average variance 0.04). We believe that the small regions of overlapping grains are the cause of the variability in the Ag2 dataset. Since the overlapping regions are small relative to the grain size in Ag2, only a few models per parameter set may resolve these regions as distinct. This interpretation is supported by the reproducibly poor performance of all the models created using the BD feature for Ag3 (TPRbest = 0.55). The iterative NMF method may have difficulty identifying minority phases if they are present in only a few patterns per dataset, especially if there is significant overlap with majority crystallographic features in the sample. The BD feature becomes more sensitive to these regions as they grow larger. The cause of this difference may be the size of the feature sets and their ability to retain information relevant to the clustering task. Since the AA feature set has only 288 features, it may have more difficulty detecting these distinctions, while the BD feature set holds 7056 features and is more likely to contain information that distinguishes these regions. This may be beneficial or detrimental depending on the classification problem. For example, the BD representation tends to fail at identifying overlapping regions correctly, but for the same reason it is more likely to properly identify distinct precipitates in a matrix or minority phases in a material system.

Consensus clustering allows the information from the 25 models per parameter set to be consolidated into one model. Unlike in supervised learning, the cluster order does not correspond to class labels or specific regions, and the number of clusters is not consistent across models. To address these complications, we performed label correspondence to determine which clusters should be combined in the consensus. We selected the model with the maximum number of clusters out of the 25 trained models and used its clusters as the foundation set. The remaining clusters were matched to the foundation set (A) using the sum of the weights in the intersection of the current cluster (Bj) and the foundational cluster (Ai) in the initialized bins (i), according to bin(Bj) = argmax( Σ_{i=1}^{n} (Ai ∩ Bj) ). We created a new bin if no match was found. All bins that contained fewer than two clusters were dropped. After matching the clusters, the average was taken in each bin and used as the consensus. A visual representation of this workflow can be found in Fig. S3. We calculated the True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), False Negative Rate (FNR), and F1 score for the independent and consensus cluster models (Table S3).
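
A compact sketch of this label-correspondence and averaging procedure is given below. It follows the steps described in the text but is not the released implementation; the weight threshold defining a bin's support and the treatment of unmatched clusters are assumptions.

```python
import numpy as np

def consensus_clustering(models, t=0.0, min_members=2):
    """Consolidate an ensemble of soft-clustering results into consensus clusters.

    models : list of (n_patterns, n_clusters_k) weight matrices, one per trained model
    t      : weight threshold defining the spatial support of a cluster (assumption)
    """
    # Foundation set A: the model with the maximum number of clusters
    foundation_idx = int(np.argmax([W.shape[1] for W in models]))
    A = models[foundation_idx]
    bins = [[A[:, i]] for i in range(A.shape[1])]
    supports = [A[:, i] > t for i in range(A.shape[1])]

    for k, W in enumerate(models):
        if k == foundation_idx:
            continue
        for j in range(W.shape[1]):
            B_j = W[:, j]
            # Summed weight of B_j inside each foundation bin's support
            overlaps = np.array([B_j[s].sum() for s in supports])
            if overlaps.size and overlaps.max() > 0:
                bins[int(np.argmax(overlaps))].append(B_j)
            else:                          # no match found: open a new bin
                bins.append([B_j])
                supports.append(B_j > t)

    # Drop bins with fewer than two member clusters, then average each remaining bin
    return [np.mean(b, axis=0) for b in bins if len(b) >= min_members]
```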

FIG. 3.

Select consensus agreement maps for the BD P1 models trained on (a) Ag1, (b) Ag2, and (c) Ag3 with plots showing the relationship between average TPR within a cluster and average agreement fraction (agreementavg). We find no obvious trend between agreementavg and performance.


The consensus clustering approach mitigates the impact of parameter selection on performance. We observe that the performance of the consensuses is relatively stable across parameter sets, as the consensus cluster TPR (TPRconsensus) has a narrower distribution than both the parameter-set average (TPRavg) and the best model (TPRbest), reported in Table II. In all cases, TPRconsensus (Fig. 2, diamond markers) improves above TPRavg, and in several cases it exceeds TPRbest within a parameter set. The largest performance enhancement occurs in sets containing some models that failed to retain information. However, consensus clustering did not improve the clustering in parameter sets where no models retained information, such as P3 using the BD feature for the Ag2 dataset. This is a potentially dangerous outcome because the consensus TPR still improves significantly over the set average; however, the FPR of this consensus (Table S4) increases from 0.01 for the set average to 0.20, which can serve as an indicator of its poor performance. The highest FPR increase among the other consensuses was 5×, leading to an average FPR of 0.05, which is low in comparison. In addition to improved performance, we observe from the cluster maps that information from individual grains omitted by the top performing models can be recovered using consensus clustering. For example, the top model for Ag1 using the AA feature in Fig. 1(d) omits some grains, but these grains appear in the consensus cluster map for this parameter set. Thus, more grains can be accurately detected by taking the consensus than by relying on the best performing model alone.

TABLE II.

Average, best, and consensus TPR and FPR for select parameter sets. The FPRbest value is the FPR associated with the model achieving TPRbest, not the best FPR in the parameter set.

Value           Ag1, AA P1   Ag2, AA P3   Ag3, AA P3   Ag1, BD P4   Ag2, BD P4   Ag3, BD P3
TPRavg          0.27         0.28         0.62         0.55         0.60         0.44
TPRbest         0.75         0.89         0.73         0.56         0.72         0.44
TPRconsensus    0.91         0.92         0.77         0.65         0.89         0.64
FPRavg          0.00         0.00         0.00         0.00         0.01         0.00
FPRbest         0.01         0.02         0.00         0.00         0.01         0.00
FPRconsensus    0.02         0.03         0.02         0.00         0.03         0.01

Performing label correspondence and consensus clustering also allows the stability across models within a parameter set to be quantified. We calculated both the agreement maps (agreementm) and the average agreement fraction (agreementavg) for each cluster in the consensus. We determined agreementm by calculating the agreement fraction for each observation: the binary clusters (c) in each consensus bin were averaged as agreementm = (1/n) Σ_{c=1}^{n} {0 if obs ≤ t; 1 if obs > t}. Each consensus bin has a set number of clusters (n), and each pixel in agreementm is associated with a single observation (obs), which is binarized using a predefined threshold (t). This threshold can be selected by visually comparing the spatial distribution of weights before and after applying the threshold or by inspecting the histogram of weights in each class. If spatial separation as described in the work of Bruefach et al.27 is performed prior to consensus clustering, t can be set to zero, as weight filtering has already been performed. Values of agreementm close to 1.0 indicate a high degree of agreement across the clusters in a consensus bin, while values close to 0.0 indicate a low degree of agreement. A higher degree of agreement within a parameter set indicates higher stability across random seeds. The agreementavg was calculated by taking the average of each agreementm.
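
A sketch of the agreement calculation for one consensus bin is shown below. Averaging agreementm over all scan positions (rather than, for example, only over the cluster's support) is an assumption.

```python
import numpy as np

def agreement(bin_members, t):
    """Agreement map and average agreement fraction for one consensus bin.

    bin_members : list of (n_patterns,) weight vectors grouped into the bin
    t           : binarization threshold (set to 0 if weight filtering was already applied)
    """
    binary = np.stack([w > t for w in bin_members])   # (n_members, n_patterns); obs > t -> 1
    agreement_m = binary.mean(axis=0)                 # fraction of members that include each position
    agreement_avg = float(agreement_m.mean())         # per-bin stability measure
    return agreement_m, agreement_avg
```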

Select agreementm maps for the BD feature with P1 are shown in Figs. 3(a), 3(b), and 3(c) for Ag1, Ag2, and Ag3, respectively. We observe relatively high agreementavg for Ag1 and Ag2 and a wider range of agreement values for Ag3. Even the clusters in Fig. 3(c) that overfit the overlapping regions tend to have agreementavg values that do not deviate from the distribution. The scatter plots show no obvious trend between agreementavg and the TPR of the selected consensus clusters, further indicating that feature representations and parameters must be chosen carefully. This lack of trend is consistent across all feature representations and parameter sets. Failure of the consensus cluster models to appropriately capture trends in the data is not linked to instability within the parameter sets and is more likely due to the choice of feature representation or parameters. The implementation of consensus clustering applied in this work does not address the overfitting problem plaguing the models trained on the BD representation and instead implies that improved performance is most likely to come from the proper selection of model inputs.

Application of this method requires the ability to accurately featurize experimental datasets. While the Ag datasets simulated for this work are a relatively simple use case and do not contain a support film, the methods applied here have been successful on more complicated experimental datasets of twinned polycrystalline gold nanoparticles5 and of double-helical multimetallic nanowires with a preferred orientation along the nanowire growth direction.18,27 Typically, we find that the background signal from a support in experimental data is segmented independently by the model into one or two clusters, which can be removed manually. Alternatively, a background subtraction step can be applied during feature preprocessing, but it is not required.

The challenges introduced by sample thickness are more difficult to address. Strides have been made in this area, particularly by Munshi et al.,24 who demonstrated the ability to accurately detect and extract Bragg disk intensities from thicker samples using deep learning. The detected disks from this model would be suitable for our unsupervised learning pipeline, while the angular average feature from thicker samples may not be easily interpretable.

This work introduces the nuances of parameter selection and performance variability in iterative NMF applied to 4D-STEM datasets using engineered features as model inputs. We find that datasets with equally sized grains are less sensitive to input parameters. The models have more difficulty resolving minority regions within a dataset, as indicated by the high model variance for the Ag2 dataset. There is no universally superior feature set or parameter set. In general, we find that the AA representation performs better as the number of unique structures per pattern increases, whereas the BD feature is a better choice when attempting to identify minority phases. Consensus clustering consistently reduces error, recovers information lost in individual cluster sets, and reduces the impact of parameter selection on performance. We introduce a method to evaluate stability across models in a consensus and highlight how it can provide insights into failures in either feature or parameter selection. The approaches and findings presented here can be extended broadly to other materials characterization techniques, where experts can design relevant dataset featurization protocols.

While this work does not address the direct interpretation of the orientations or phases of the clusters within a dataset, it can serve as a necessary intermediate step toward understanding large, complex diffraction datasets when little prior knowledge about the sample is available. When the phase(s) of the material is known, it will almost always be more efficient to apply template matching protocols, such as those reported by Ophus et al.18 and Cautaerts et al.,19 directly to the preprocessed data. Unsupervised learning workflows, such as the one reported in this work, can complement these efforts by reducing large 4D-STEM datasets to a more manageable size such that the individual clusters can be analyzed for a best-match structure. This type of data analysis pipeline is crucial in underpinning materials discovery and design efforts, particularly when evaluating the homogeneity of materials produced by new synthesis protocols for electrocatalysts.

See the supplementary material for additional details on the dataset calibration and postprocessing; figures to demonstrate the featurization protocol, model performance distributions, and consensus clustering workflow; and tables reporting model performances.

A.B. and the py4DSTEM development were supported by the Toyota Research Institute. Work at the Molecular Foundry was supported by the Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

The authors have no conflicts to disclose.

A.B. was responsible for developing the methodology, analysis, visualization, and writing. M.C.S. was responsible for project conceptualization and paper revisions. C.O. was responsible for project conceptualization, data curation, and paper revisions.

Alexandra Bruefach: Formal analysis (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Colin Ophus: Conceptualization (supporting); Data curation (lead); Supervision (supporting); Writing – review & editing (equal). M. C. Scott: Conceptualization (lead); Resources (lead); Supervision (lead); Writing – review & editing (equal).

The data that support the findings of this study are openly available in the Demonstration of Consensus Clustering for 4D-STEM at https://doi.org/10.5281/zenodo.7195135.37 

1. C. Ophus, "Four-dimensional scanning transmission electron microscopy (4D-STEM): From scanning nanodiffraction to ptychography and beyond," Microsc. Microanal. 25, 563–582 (2019).
2. E. D. Grimley, S. Frisone, T. Schenk, M. H. Park, C. M. Fancher, T. Mikolajick, J. L. Jones, U. Schroeder, and J. M. LeBeau, "Insights into texture and phase coexistence in polycrystalline and polyphasic ferroelectric HfO2 thin films using 4D-STEM," Microsc. Microanal. 24, 184 (2018).
3. D. Mukherjee, J. T. L. Gamler, S. E. Skrabalak, and R. R. Unocic, "Lattice strain measurement of core@shell electrocatalysts with 4D scanning transmission electron microscopy nanobeam electron diffraction," ACS Catal. 10, 5529–5541 (2020).
4. A. Londoño-Calderon, D. J. Williams, M. M. Schneider, B. H. Savitzky, C. Ophus, S. Ma, H. Zhu, and M. T. Pettes, "Intrinsic helical twist and chirality in ultrathin tellurium nanowires," Nanoscale 13, 9606–9614 (2021).
5. F. I. Allen, T. C. Pekin, A. Persaud, S. J. Rozeveld, G. F. Meyers, J. Ciston, C. Ophus, and A. M. Minor, "Fast grain mapping with sub-nanometer resolution using 4D-STEM with grain classification by principal component analysis and non-negative matrix factorization," Microsc. Microanal. 1, 1–10 (2021).
6. A. K. Shukla, Q. M. Ramasse, C. Ophus, D. M. Kepaptsoglou, F. S. Hage, C. Gammer, C. Bowling, P. A. H. Gallegos, and S. Venkatachalam, "Effect of composition on the structure of lithium- and manganese-rich transition metal oxides," Energy Environ. Sci. 11, 830–840 (2018).
7. E. F. Rauch and M. Véron, "Methods for orientation and phase identification of nano-sized embedded secondary phase particles by 4D scanning precession electron diffraction," Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 75, 505–511 (2019).
8. T. C. Pekin, J. Ding, C. Gammer, B. Ozdol, C. Ophus, M. Asta, R. O. Ritchie, and A. M. Minor, "Direct measurement of nanostructural change during in situ deformation of a bulk metallic glass," Nat. Commun. 10, 2445 (2019).
9. X. Mu, L. Chen, R. Mikut, H. Hahn, and C. Kübel, "Unveiling local atomic bonding and packing of amorphous nanophases via independent component analysis facilitated pair distribution function," Acta Mater. 212, 116932 (2021).
10. N. Yang, C. Ophus, B. H. Savitzky, M. C. Scott, K. Bustillo, and K. Lu, "Nanoscale characterization of crystalline and amorphous phases in silicon oxycarbide ceramics using 4D-STEM," Mater. Charact. 181, 1111512 (2021).
11. E. Thornsen, J. Frafjord, J. Friis, C. Marioara, S. Wenner, S. Andersen, and R. Holmestad, "Studying GPI zones in Al-Zn-Mg alloys by 4D-STEM," Mater. Charact. 185, 111675 (2021).
12. A. Ponce, J. A. Aguilar, J. Tate, and M. J. Yacamán, "Advances in the electron diffraction characterization of atomic clusters and nanoparticles," Nanoscale Adv. 3, 311–325 (2021).
13. C. Gammer, V. Burak Ozdol, C. H. Liebscher, and A. M. Minor, "Diffraction contrast imaging using virtual apertures," Ultramicroscopy 155, 1–10 (2015).
14. A. Nalin Mehta, N. Gauquelin, M. Nord, A. Orekhov, H. Bender, D. Cerbu, J. Verbeeck, and W. Vandervorst, "Unravelling stacking order in epitaxial bilayer MX2 using 4D-STEM with unsupervised learning," Nanotechnology 31, 445702 (2020).
15. K. Reidy, G. Varnavides, J. D. Thomsen, A. Kumar, T. Pham, A. M. Blackburn, P. Anikeeva, P. Narang, J. M. LeBeau, and F. M. Ross, "Direct imaging and electronic structure modulation of moiré superlattices at the 2D/3D interface," Nat. Commun. 12, 1290 (2021).
16. E. F. Rauch, J. Portillo, S. Nicolopoulos, D. Bultreys, S. Rouvimov, and P. Moeck, "Automated nanocrystal orientation and phase mapping in the transmission electron microscope on the basis of precession electron diffraction," Z. Kristallogr. Cryst. Mater. 225, 103–109 (2010).
17. T. Meng and J.-M. Zuo, "Improvements in electron diffraction pattern automatic indexing algorithms," Eur. Phys. J. Appl. Phys. 80, 107901 (2017).
18. C. Ophus, S. E. Zeltmann, A. Bruefach, A. Rakowski, B. H. Savitzky, A. M. Minor, and M. Scott, "Automated crystal orientation mapping in py4DSTEM using sparse correlation matching," Microsc. Microanal. 28, 390–403 (2022).
19. N. Cautaerts, P. Crout, H. W. Anes, E. Prestat, J. Jeong, G. Dehm, and C. H. Liebscher, "Free, flexible and fast: Orientation mapping using the multi-core and GPU-accelerated template matching capabilities in the Python-based open source 4D-STEM analysis toolbox Pyxem," Ultramicroscopy 237, 113517 (2022).
20. B. H. Martineau, D. J. Johnstone, A. T. van Helvoort, P. A. Midgley, and A. S. Eggeman, "Unsupervised machine learning applied to scanning precession electron diffraction data," Adv. Struct. Chem. Imaging 5, 1–14 (2019).
21. A. Zintler, R. Eilhardt, S. Wang, M. Krajnak, P. Schramowski, W. Stammer, S. Petzold, N. Kaiser, K. Kersting, L. Alff, and L. Molina-Luna, "Machine learning assisted pattern matching: Insight into oxide electronic device performance by phase determination in 4D-STEM datasets," Microsc. Microanal. 26, 1908 (2020).
22. R. Yuan, J. Zhang, L. He, and J.-M. Zuo, "Training artificial neural networks for precision orientation and strain mapping using 4D electron diffraction datasets," Ultramicroscopy 231, 113256 (2021).
23. C. Shi, M. Cao, S. M. Rehn, S.-H. Bae, J. Kim, M. R. Jones, D. A. Muller, and Y. Han, "Uncovering material deformations via machine learning combined with four-dimensional scanning transmission electron microscopy," npj Comput. Mater. 8, 114 (2022).
24. J. Munshi, A. Rakowski, B. H. Savitzky, S. E. Zeltmann, J. Ciston, M. Henderson, S. Cholia, A. M. Minor, M. K. Chan, and C. Ophus, "Disentangling multiple scattering with deep learning: Application to strain mapping from electron diffraction patterns," arXiv:2202.00204 (2022).
25. H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognit. 43, 1943–1953 (2010).
26. F. Uesugi, S. Koshiya, J. Kikkawa, T. Nagai, K. Mitsuishi, and K. Kimoto, "Non-negative matrix factorization for mining big data using four-dimensional scanning transmission electron microscopy," Ultramicroscopy 221, 113168 (2021).
27. A. Bruefach, C. Ophus, and M. C. Scott, "Analysis of interpretable data representations for 4D-STEM using unsupervised learning," Microsc. Microanal. 28, 1998–2008 (2022).
28. J. E. M. Laulainen, D. N. Johnstone, I. Bogachev, S. M. Collins, L. Longley, T. D. Bennett, and P. A. Midgley, "Mapping non-crystalline nanostructure in beam sensitive systems with low-dose scanning electron pair distribution function analysis," Microsc. Microanal. 25, 1636–1637 (2019).
29. A. Strehl and J. Ghosh, "Cluster ensembles—A knowledge reuse framework for combining multiple partitions," J. Mach. Learn. Res. 3, 583–617 (2002).
30. T. Boongoen and N. Iam-On, "Cluster ensembles: A survey of approaches with recent extensions and applications," Comput. Sci. Rev. 28, 1–25 (2018).
31. J. M. Cowley and A. F. Moodie, "The scattering of electrons by atoms and crystals. I. A new theoretical approach," Acta Crystallogr. 10, 609–619 (1957).
32. E. J. Kirkland, Advanced Computing in Electron Microscopy, 3rd ed. (Springer Science and Business Media, 2020).
33. C. Ophus, "A fast image simulation algorithm for scanning transmission electron microscopy," Adv. Struct. Chem. Imaging 3, 13 (2017).
34. B. H. Savitzky, S. E. Zeltmann, L. A. Hughes, H. G. Brown, S. Zhao, P. M. Pelz, T. C. Pekin, E. S. Barnard, L. Rangel DaCosta, E. Kennedy, Y. Xie, M. T. Janish, M. M. Schneider, P. Herring, C. Gopal, A. Anapolsky, R. Dhall, K. C. Bustillo, P. Ercius, M. C. Scott, H. Ciston, A. M. Minor, and C. Ophus, "py4DSTEM: A software package for four-dimensional scanning transmission electron microscopy data analysis," Microsc. Microanal. 27, 1–32 (2021).
35. P. Paatero and U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values," Environmetrics 5, 111–126 (1994).
36. D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature 401, 788–791 (1999).
37. A. Bruefach, "Demonstration of consensus clustering for 4D-STEM," Zenodo (2022).
