Over the last two decades, electron energy loss spectroscopy (EELS) imaging with a scanning transmission electron microscope has emerged as a technique of choice for visualizing chemical, electronic, plasmonic, and phononic phenomena in complex materials and structures. The availability of such EELS data necessitates the development of methods to analyze multidimensional data sets with complex spatial and energy structures. Traditionally, these data sets have been analyzed one spectrum at a time, whereas the spatial structure and correlations between spatial pixels, which carry much of the information on the underpinning physical processes, have generally been ignored or examined only through visualization as 2D maps. Here, we develop a machine learning-based approach and workflows for the analysis of spatial structures in 3D EELS data sets using a combination of dimensionality reduction and multichannel rotationally invariant variational autoencoders. This approach is illustrated for the analysis of both plasmonic phenomena in a system of nanowires and core excitations in functional oxides, using low-loss and core-loss EELS, respectively. The code developed in this manuscript is open source and freely available as a Jupyter notebook for the interested reader.

Electron energy loss spectroscopy (EELS) in a scanning transmission electron microscope (STEM) has become an enabling tool for studying electronic behaviors at the atomic scale and nanoscale. In a modern instrument, properties such as the bandgap, elemental identity at the atomic level, and the chemical nature of the specimen are all accessible.1–4 Recently, monochromator technology has improved to the point where energy resolutions around 5 meV can be achieved, allowing vibrational spectroscopy to be conducted with unprecedented spatial resolution5,6 in the near-infrared regime.7,8 Effectively, this allows the electron microscope to explore atomic resolution phonon imaging, including single-atom and single-defect phonons, as well as identification of isotopes, plasmon–phonon coupling, and interfacial vibrational effects.9–16

Aside from vibrational spectroscopy, localized surface plasmon resonances (LSPRs) are also detectable and mappable in space.17 Even more spatial detail can be realized by acquiring EELS images at a set of tilt angles, allowing a three-dimensional reconstruction of the plasmon modes, which can serve to distinguish, for instance, modes localized to a surface from those at an edge.18 Compared to traditional optical excitation schemes,19,20 the electron beam simultaneously serves as an excellent excitation source and probe, in that it is a well-localized optically white light source whose energy transfer (e.g., to plasmon modes) can be spatially mapped with nanoscale precision. Given that LSPRs are typically supported in dimensionally confined systems like nanoparticles and nanowires, STEM-EELS emerges as the ideal tool to explore the local plasmonic behavior of these systems.

EELS may be performed either as a single-point spectroscopy or as a so-called “spectrum image,” in which an EEL spectrum is collected at an array of equally spaced spatial coordinates. The latter is a three-dimensional data set (a “data cube”) containing information on energy dispersion and the spatial distributions of relevant behaviors.21 Correspondingly, the modern proliferation of EELS imaging necessitates the development of methods to decipher physically meaningful information from this type of data set. Note that this requirement is not unique to EELS and also emerges in the context of techniques such as energy-dispersive x-ray spectroscopy (EDS).22,23

Initial successes in exploring multidimensional EELS data sets were achieved with unsupervised linear unmixing methods, including principal component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF).24 In these methods, the data set is represented as a linear combination of components defined in the spectral domain and weights (or loading maps) defined in real 2D or 3D space as
$$I(x, y, E) = \sum_{i=1}^{N} A_i(x, y)\, r_i(E). \tag{1}$$

Here, $A_i(x, y)$ are the loading maps that capture the spatial variability of each component across the image and $r_i(E)$ are the spectral endmembers. The number of components, N, is chosen based on the anticipated physics and decomposition quality, among other factors.
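As a concrete illustration of Eq. (1), the data cube can be assembled from the loading maps and endmembers with a single tensor contraction; the shapes and arrays below are illustrative stand-ins rather than any particular experimental data set.

```python
import numpy as np

# Illustrative shapes: N components, ny x nx spatial pixels, n_E energy channels
N, ny, nx, n_E = 4, 64, 64, 500
A = np.random.rand(N, ny, nx)    # loading (abundance) maps A_i(x, y)
r = np.random.rand(N, n_E)       # spectral endmembers r_i(E)

# Eq. (1): I(x, y, E) = sum_i A_i(x, y) * r_i(E)
I = np.einsum('iyx,ie->yxe', A, r)
print(I.shape)                   # (64, 64, 500)
```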

Different linear decomposition methods impose different constraints on the weights and components. For example, PCA orders the components by significance and requires them to be orthogonal; in NMF, the components are non-negative, while techniques such as Bayesian linear unmixing impose an additional sum-to-one constraint.25–27 Owing to the non-negativity constraint, which is consistent with a counting spectroscopy, NMF has been used extensively in the electron microscopy community to spectrally deconvolve EELS data cubes,28–30 while PCA and other algorithms without a non-negativity constraint tend to produce abstract, unphysical, and generally uninterpretable loadings.31 Additionally, more significant constraints on sparsity, closeness to a chosen functional form, etc., can be imposed. In general, these constraints can be chosen to match the physics of the explored system, with the hope that the resultant components and loading maps will represent that physics.

Linear decomposition methods are particularly well suited for situations where the corresponding physical mechanisms are linear,32,33 for example, in a thin sample where the total core-loss EELS signal can be reasonably approximated as a linear combination of the signals from the atomic species in the probed volume. However, it is well recognized that in many cases this linear approximation does not hold, such as for multiple scattering in thick samples or for features that change due to structural or bonding differences. Correspondingly, numerous methods have been developed for non-linear unmixing or so-called “manifold learning.” These methods ultimately seek to discover a low-dimensional manifold containing the data based on similarity between the data points, statistical properties of the distribution, or partially known functional forms. Well-established techniques such as self-organized feature maps (SOFM),34,35 Gaussian mixture modeling (GMM),36 and simple and variational autoencoders35,37,38 have existed for years in the computer vision field but have only recently gained traction in the context of physics extraction. Beyond analysis of the data set to discover low-dimensional representations, these approaches can be extended to establish structure–property relationships, e.g., to predict spectral responses given a geometric input image and vice versa.39 Generally, it is assumed that if the manifold containing the data is discovered, it provides a basis for the construction of generative models and can provide insight into the physics of the system.

However, a common limitation of both linear and manifold learning methods as applied to imaging data is that they operate on the pixel-by-pixel level. In particular, we note that a permutation of spatial pixels does not change the components and merely leads to an equivalent permutation of the loading maps. In this manner, the information contained in the spatial correlations within the image is generally lost. While a number of methods have been suggested to impose spatial correlations in spectral analysis,40–42 there has not been a universal approach to explore these in hyperspectral imaging methods. At the same time, spatial correlations in EELS and other hyperspectral techniques are a crucial aspect of the experiment, in which physics is naturally encoded, and are arguably a key motivating factor behind collecting a spectrum image.

To address this issue, we previously implemented invariant variational autoencoders43 as a method for structural analysis of real-space images to disentangle rotation, translation, and scale from structural building blocks, allowing for a definition of local symmetry.44–48 Here, we introduce the multichannel rotationally invariant variational autoencoder (rVAE), which pairs classical dimensionality reduction with a rotationally invariant VAE such that the model inputs are still image-based but are derived from a chosen spectral decomposition technique (e.g., NMF). The multichannel rVAE is demonstrated for EELS spectrum images, and its applicability and performance in different EELS regimes are explored.

As a first model system, we have chosen semiconductor nanowires. Nanowire systems support multiple plasmon modes and give rise to complex plasmonic interactions mediated by proximity effects. The energies of these modes depend on both the geometry and the effective dielectric properties (of both the material and its environment), and in an extreme case, plasmons can even exist in molecular nanowires only a few atoms in length.49 Moreover, the spatial localization of modes tends to decrease at lower energies, meaning there is a natural spatial structure related to the energy of the plasmon. As a second model system, we have chosen an interface on a strontium titanate substrate for which we collected core-loss EELS spectra to explore the performance for atomically resolved data.

Indium antimonide nanowires with a nominal diameter of 100 nm were electrochemically synthesized50 via a template-assisted method (see the supplementary material), and monochromated STEM-EELS was performed to acquire spectrum images of various configurations of nanowires. More details about sample growth, sample preparation, and STEM-EELS measurements can be found in the supplementary material. A pair of nanowires with different lengths is selected, allowing for variability of the plasmon energies. Shown in Fig. 1 is the standard exploratory approach using NMF to spectrally deconvolve the data. At an array of probe positions enclosed within the red bounding box in Fig. 1(a), EEL spectra were collected, and the resulting 3D data are flattened into two dimensions (one of space, one of energy). Figures 1(b)–1(d) illustrate that despite the lack of spatial details provided to the algorithm, NMF performs well in terms of extracting the different spectral features contained within the data, making it an excellent tool for spectral exploration within an EELS spectrum image, specifically in the case of nanoparticles and nanowires.51 Several different plasmon modes are revealed by this deconvolution, where lower energy modes are more delocalized—the degree of localization to the nanowire is clearly depicted in the abundance maps in Fig. 1(d).
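As a minimal sketch of this workflow, the data cube can be flattened and factorized with scikit-learn's NMF; the stand-in array, number of components, and solver settings below are illustrative assumptions and not the exact parameters used for Fig. 1.

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for the measured data cube (ny, nx, n_E) of non-negative EELS counts
cube = np.random.rand(64, 64, 500)
ny, nx, n_E = cube.shape
X = cube.reshape(ny * nx, n_E)           # flatten space: one spectrum per row

n_components = 9                         # cf. the nine components inspected in the text
nmf = NMF(n_components=n_components, init='nndsvda', max_iter=500)
W = nmf.fit_transform(X)                 # per-pixel weights -> abundance maps A_i(x, y)
H = nmf.components_                      # spectral endmembers r_i(E)

abundance_maps = W.reshape(ny, nx, n_components)   # cf. Fig. 1(d)
endmembers = H                                      # cf. Fig. 1(b)
```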

FIG. 1.

NMF deconvolution of an EELS spectrum image. A HAADF image is shown in (a) with the area used for the spectrum image outlined in red; (b) the NMF spectral endmembers; (c) PCA scree plot, representing the variability of the data; and (d) the corresponding abundance maps. Scale bars in (d) are all 100 nm.

In a different approach, we make use of a latent variable (nonlinear) technique—the VAE—to attempt to map the spectral features to a low-dimensional manifold. VAEs have previously been applied to hyperspectral imaging in the remote sensing field52,53 and have recently been used with STEM imaging,45 where in both cases linear unmixing fails to capture the physics of the system. In contrast to linear unmixing methods, the autoencoder compresses the data set to a small number of latent variables via a convolutional or fully connected neural network and then expands it back to match the original signal with a different neural network in such a way as to retain the maximum original information. In this process, the relevant traits of the signal are encoded in continuous latent variables, whereas noise is rejected. In variational autoencoders, the additional constraint is the minimization of the Kullback–Leibler divergence between the encoded latent variables and a prior distribution.54 Practically, this results in a much smoother distribution in the latent space. However, for both AEs and VAEs, the general principle is that the data set is described in terms of continuous latent variables, with the encoder and decoder serving as functions establishing the relationship between the signal and the latent vector. The distribution of the data in the low-dimensional latent space provides information on the variability of the behaviors within the system.
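For readers less familiar with the mechanics, the following is a minimal PyTorch sketch of a fully connected VAE acting on per-pixel spectra, with a mean-squared reconstruction term and a Kullback–Leibler penalty toward a standard normal prior. The layer sizes, training loop, and variable names are placeholder assumptions and not the implementation used by the authors (who cite the open-source pyroVED package, Ref. 43).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralVAE(nn.Module):
    """Minimal VAE that compresses a 1D spectrum into a few latent variables."""
    def __init__(self, n_energy, latent_dim=2, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_energy, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)       # posterior mean
        self.logvar = nn.Linear(hidden, latent_dim)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_energy))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_rec, mu, logvar):
    rec = F.mse_loss(x_rec, x, reduction='sum')                    # reconstruction term
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence to N(0, I)
    return rec + kld

# Usage sketch: spectra is an (n_pixels, n_energy) float tensor of normalized EEL spectra
# model = SpectralVAE(n_energy=spectra.shape[1])
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for epoch in range(200):
#     x_rec, mu, logvar = model(spectra)
#     loss = vae_loss(spectra, x_rec, mu, logvar)
#     opt.zero_grad(); loss.backward(); opt.step()
```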

Figure 2 shows the information gleaned from application of the VAE to the same data. A two-dimensional latent space was chosen for mapping the spectral features, where each encoded feature is represented as a latent variable. The latent variables elucidate the physical factors of variation in the data. A high-dimensional latent space is usually not necessary, as the prominent spectral features can be represented by a small number of latent variables; visualizing higher-dimensional latent spaces is also difficult in practice.

FIG. 2.

1D VAE analysis. (a) and (b) represent the encoded latent feature images, while (c) shows the distribution of data points in the (2D) latent space overlaid with kernel density estimate (KDE). Spectral predictions for selected pixels shown in HAADF-STEM (j) are presented in (g)–(i). The hyperspectral data averaged over all energy is shown in (d), with the reconstructed image using the trained VAE model in (e), and the error surface in (f), with the MSE calculated for each pixel. Scale bars are 100 nm. Color bar is for MSE in (f) only.

The trained VAE model is used to encode the original data points into the latent space. Shown in Figs. 2(a) and 2(b) are the two latent feature maps, corresponding to the spatial distribution of the variability factors discovered by the VAE. The distribution of points in latent space is shown in Fig. 2(c), overlaid with the kernel density estimate (KDE), which effectively visualizes a smoothed probability density using a Gaussian kernel.
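The KDE overlay in Fig. 2(c) can be reproduced with standard tools; below is a short sketch using scipy, with a random stand-in for the encoded latent coordinates (an array of shape (n_pixels, 2)).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Stand-in for the (n_pixels, 2) array of encoded latent coordinates from the trained VAE
z = np.random.randn(2000, 2)

kde = gaussian_kde(z.T)                   # Gaussian-kernel density estimate of the latent points
g1, g2 = np.mgrid[z[:, 0].min():z[:, 0].max():100j,
                  z[:, 1].min():z[:, 1].max():100j]
density = kde(np.vstack([g1.ravel(), g2.ravel()])).reshape(g1.shape)

plt.contourf(g1, g2, density, levels=20)
plt.scatter(z[:, 0], z[:, 1], s=2, c='w', alpha=0.3)
plt.xlabel('$z_1$')
plt.ylabel('$z_2$')
plt.show()
```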

The performance of the model is assessed in Figs. 2(d)–2(f), where the model is used first to encode all spectra into the 2D latent space and then to decode back to spectral space at each spatial coordinate, which effectively reconstructs the original data. We note that it is possible to have a favorable disentanglement (encoding) and a poor reconstruction (decoding).55 Nevertheless, for the sake of simplicity, we assess performance by considering these processes together. The energy-averaged original and decoded data are shown for comparison in Figs. 2(d) and 2(e), respectively, and Fig. 2(f) shows the mean squared error (MSE) calculated for each pixel over all energies. The latter shows that the model has performed well, with errors mostly well below 1% at any given position, which also reinforces our choice of two latent dimensions. Together, $z_1$ and $z_2$ contain the total—but compressed—spectral variability of the system.
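The error surface of Fig. 2(f) amounts to a per-pixel mean squared difference between the original and decoded cubes; a minimal sketch with stand-in arrays (replace them with the measured cube and the VAE reconstruction) is given below.

```python
import numpy as np

# Stand-ins for the original cube and its VAE reconstruction (same shape)
cube = np.random.rand(60, 60, 400)
decoded = cube + 0.01 * np.random.randn(60, 60, 400)

mse_map = np.mean((cube - decoded) ** 2, axis=-1)   # MSE at every pixel, over all energies
rel_err = mse_map / np.mean(cube ** 2, axis=-1)     # relative error per pixel, cf. "<1%"
print(f"median relative error: {100 * np.median(rel_err):.2f}%")
```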

By virtue of the compression strategy, noise is rejected during training, and as a direct result, the data are denoised. The denoising can be seen when comparing the original and predicted energy-averaged spectrum images in Figs. 2(d) and 2(e), respectively. For a clearer view of the spectral differences, we compare the spectra from a few pixels in Figs. 2(g)–2(i), where the model-reconstructed spectra clearly contain less noise than the originals. In NMF and other linear methods, this noise is unavoidable. It is noted that the predicted spectra in Figs. 2(g)–2(i) generally possess a slightly lower intensity than the raw spectra. The intensity underestimation may be a result of the denoising by the VAE, but it may also be caused by slight over-compression or over-training, as it does not occur in the other spectra investigated in subsequent figures.

It is instructive to further compare the variational autoencoder with linear decomposition methods such as NMF and PCA. Like NMF, the VAE explores each spectrum without regard for its coordinate in space. Here, our intent is primarily to discover spatial structures, but a challenge up until now has been the treatment of a feature's rotation, as an object of interest may have any orientation in the image plane. To solve this problem, the rotationally invariant VAE (rVAE) encodes rotations as a separate latent variable (Fig. 3).45,56

FIG. 3.

Multichannel rVAE design. A hyperspectral data cube is deconvolved to generate abundance maps based on spectral features. Using selected maps, sets of image patches are formed from each pixel in the parent map and then tiled into small windows (stacks of local channels). These multichannel tiles are sent as stacks to the rVAE network, which encodes all input to a latent space, from which latent features are subsequently decoded, iterating the process to best match output to input. The network shown here contains a latent space of three dimensions (red and yellow circles) where one dimension (red) is forced to represent the encoded angle.

In the multichannel rVAE architecture, a stack of images is used as the input (and output). Images derived from NMF abundance maps are used here; therefore, each channel is associated with a particular NMF component and is based on a spectral feature intended to be explored. PCA maps or other spectral deconvolution outputs can just as easily be used; however, we find that NMF generally performs rather well for these EELS data sets and carries the non-negativity constraint, which physically agrees with a counting spectroscopy. We choose nine NMF components—out of a total of 30—that best represent the system behavior as input channels; these are the same components shown in Fig. 1. It is important to note that the choice of components falls in the realm of the domain expert, and while biased, it is necessary to produce a solvable problem. More NMF components are computed than features expected to be contained in the data (i.e., 30 components, when only ∼9 are inspected) so that the domain expert can select physically reasonable components, e.g., a single-peaked spectrum corresponding to a single physical effect, and ignore other, noisier components.
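A sketch of how a handful of NMF abundance maps might be selected as rVAE channels and normalized channel by channel, as described above; the component indices and array shapes are placeholders chosen for illustration.

```python
import numpy as np

# Stand-in for the (ny, nx, 30) NMF abundance maps computed earlier; replace with real maps
abundance_maps = np.random.rand(120, 160, 30)

selected = [0, 1, 2]                      # expert-chosen components (placeholder indices)
channels = abundance_maps[..., selected]  # (ny, nx, n_channels) stack fed to the rVAE

# Normalize each channel independently to [0, 1]; joint normalization would instead divide
# all channels by a single global maximum and preserve the relative NMF weights.
mins = channels.min(axis=(0, 1), keepdims=True)
maxs = channels.max(axis=(0, 1), keepdims=True)
channels = (channels - mins) / (maxs - mins + 1e-12)
```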

With the aim of discovering spatial correlations that have a spectral foundation, a sub-image of size $d_x \times d_y$ is created at each pixel of each channel of a “reduced” spectroscopic image (pixels for which half the window size exceeds the boundary of the image are excluded). The window size plays a large role in the performance of the model and should be carefully chosen. We find that the model performs well if the window size is commensurate with the characteristic length scale of the feature of interest—Fig. 4 shows the effect of window size on the model performance. The stacks of image patches for all channels are sent together to the rVAE algorithm, and, in a similar way to the VAE, the algorithm attempts to minimize the difference between the decoded and original channels (rather than spectra), so that at this stage the approach is purely image-based. The choice of the number of channels and which channels to use is also left to the domain expert—in the case of this plasmonic nanowire system, one may choose to investigate the spatial structure of specific plasmon modes, e.g., a low-energy delocalized mode and a high-energy strongly localized mode, rather than all deconvolved components. The extracted spatial features can then be linked back to the spectral component itself to gain understanding of spatial correlations based on specific spectral signatures.
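The patch-extraction step can be sketched as follows: a window of size $d_x \times d_y$ is cut around every interior pixel of every selected channel, producing the stack of multichannel tiles fed to the rVAE. The function below is an illustrative implementation with an assumed 10-pixel window, not the exact code of the accompanying notebook.

```python
import numpy as np

def extract_patches(channels, window=10):
    """Cut a (window x window x n_channels) patch around every interior pixel.

    channels: (ny, nx, n_channels) stack of normalized abundance maps.
    Pixels whose half-window exceeds the image boundary are excluded.
    Returns the patch stack and the pixel coordinates of each patch center.
    """
    ny, nx, nc = channels.shape
    half = window // 2
    patches, coords = [], []
    for i in range(half, ny - half):
        for j in range(half, nx - half):
            patches.append(channels[i - half:i + half, j - half:j + half, :])
            coords.append((i, j))
    return np.stack(patches), np.array(coords)

# patches, coords = extract_patches(channels, window=10)  # cf. the 10 x 10 window used here
```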

FIG. 4.

Image patch relation to feature size. Window sizes of 6, 10, 16, and 30 pixels are shown in (a)–(d), respectively, for the rVAE model architecture using the same three input channels (the first three NMF components) and training parameters. Panel (d) illustrates that the latent features are dramatically smeared out (note the evenly distributed points in latent space and the encoded latent feature maps), while at the other extreme of small window sizes, (a) shows much higher localization of the latent features (including the rotational feature), as well as more localized clustering within the latent space.

In Fig. 5, we demonstrate the use of a three-channel rVAE based on the first three NMF components, which incidentally include both the lowest and highest energy plasmon modes that were observed. For consistency, we keep the number of latent dimensions the same as in the 1D VAE case (two) and choose a window size of 10 × 10 pixels. The model used in Fig. 5 reveals several features. First, there is variation in both latent space dimensions [(e) and (f)], while the encoded angle in (d) indicates edge behaviors—it delineates the edges of the nanowires and the variability of behaviors there. Figure 5(c) also indicates that the latent space does not collapse, as evidenced by a wide distribution of points in both dimensions, as opposed to a partially or fully collapsed latent space where points are highly concentrated in one or both latent dimensions, respectively. The distributions and occurrences of the latent features are also shown as histograms in Figs. 5(g)–5(i). The character of the encoding can be represented by the latent space representations in Figs. 5(a) and 5(b). Here, a rectangular grid of points in the latent space is created, and the corresponding image patches are decoded from these latent variable pairs. A three-channel model is convenient in that each channel can be assigned a color from the RGB scheme—channel 1, red; channel 2, green; and channel 3, blue. Hence, the manifold maps in (a) and (b) represent the latent space distribution of effects relating to each of the three channels, where the colors signify the strength of that channel in the latent space. For example, the manifolds show a red color on the left-hand side and a blue color on the rightmost side, meaning the first channel's effects are strongest in the left region of latent space and the third channel is strongest in the right region. Green is hardly present, which implies there is little strength and variation of the second channel. This component is extremely localized to the nanowire relative to the other two provided channels, so in most image patches its total intensity is considerably lower, meaning that as far as the rVAE model is concerned, its effect is much smaller. Note that each channel is separately normalized; therefore, the relative weights of the NMF components are intentionally lost. We typically found that if the set of channels is normalized together, there is substantially less variation in latent space, which generally provides less insight, likely caused by an imbalance in the strength of the components. On the other hand, one may artificially increase the strength of this highly localized component (e.g., normalize it to [0, 2]) to attempt to balance the component strengths, resulting in the green component—because it is the second channel—becoming visible in the manifold, as well as one of the latent features favoring edge behavior (Fig. S1 in the supplementary material). A general strategy for balancing under- and over-represented components could instead be to normalize based on the spatial representation of the features—i.e., based on the total summed intensity of each of the normalized components, normalize again with respect to the component with the maximum summed intensity. Having learned the variation, the model adequately predicts the three channels [Fig. 5(j)], even with only two latent features describing the system consisting of the chosen three channels.
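A sketch of how manifold maps such as those in Figs. 5(a) and 5(b) can be generated: a rectangular grid of latent coordinates is decoded into image patches, which are tiled into one image with the three channels rendered as RGB. The decoder below is a stand-in for the trained rVAE decoder, and the grid extent and patch size are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

window = 10                      # patch size used during training (assumed)
# Stand-in decoder: replace with the trained rVAE decoder mapping (z1, z2) -> 3-channel patch
decoder = nn.Sequential(nn.Linear(2, window * window * 3), nn.Sigmoid())

n_grid, extent = 12, 2.0
z1_vals = np.linspace(-extent, extent, n_grid)
z2_vals = np.linspace(-extent, extent, n_grid)

rows = []
for a in z1_vals:
    row = []
    for b in z2_vals:
        z = torch.tensor([[a, b]], dtype=torch.float32)
        with torch.no_grad():
            patch = decoder(z).numpy().reshape(window, window, 3)
        row.append(patch)                          # three channels rendered as RGB
    rows.append(np.concatenate(row, axis=1))
manifold = np.concatenate(rows, axis=0)            # tiled (n_grid*window, n_grid*window, 3) image
```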

FIG. 5.

Multichannel rVAE performance for a window size of 10 × 10 pixels, two latent dimensions, and three channels. (a) and (b) Latent space representations as 2D manifolds, showing different voxel extremes. (c) Latent space distribution of the data overlaid with the kernel density estimate (KDE). (d) Encoded angle, (e) latent feature map for $z_1$, and (f) latent feature map for $z_2$. Distributions of the encoded angular features (g) and latent features $z_1$ (h) and $z_2$ (i) indicate the spatial variability. The selected three input channels are shown in (j) along with their predictions.

One important model parameter that should be discussed in more detail is the size of the window into which the input channels are divided. Like the number of latent variables and NMF components, this parameter should be chosen to coincide with the relevant spatial features. For strongly localized features, e.g., core-loss EELS features that are localized to atoms or atomic columns, the window size should essentially be the size of an atom. However, for phenomena that tend to be more delocalized, the choice of window size needs to be explored more carefully. For instance, in the case of plasmons, a dipole mode may be much more delocalized than a higher order mode. Similarly, if the data can be well represented in two latent dimensions $z_1$ and $z_2$, there is no need to increase the complexity and extend to three or higher dimensions; however, this behavior is also data-dependent and should be explored.

As the number of channels provided to the multichannel rVAE is increased, the latent space distribution begins to take on a unique form. Reconstruction (prediction) of each channel also begins to improve, while the encoded rotation remains essentially unaffected. An analysis of the latent space distribution with an increasing number of channels is presented in Fig. S2 in the supplementary material. The latent variables themselves change with almost every additional channel, which is not surprising given that each channel represents a different spectral feature (extracted via NMF), such that the information being compressed to the latent space changes with each channel added to the model. Despite the increase of information, the overall form of the latent space distribution and the encoded latent angles remain largely unchanged, which is striking.

It is important to emphasize that the input to the multichannel rVAE is strictly image-based, but the images themselves have spectral origins from the spectral deconvolution method of choice. For this reason, the choice of input channels determines the spatial features that will be learned. For example, choosing a three-channel model with a different set of NMF abundance maps may produce a different result—this is studied further in Fig. S3 in the supplementary material. Consequently, when more channels are added, they represent more of the spectral (and spatial) variability contained in the data, and the output begins to converge and becomes less dependent on which channels are provided to the model. Rather than supplying the model with all components in an attempt to account for all features of the system, it is generally preferable to isolate specific features and unravel their physical impact and spatial correlations.

Switching energy regimes, we shift our focus to the performance of the VAE models on atomically resolved core-loss EELS. Namely, we seek to understand whether this method can ultimately be applied to spectral features that vary spatially on the atomic scale. First, Fig. 6 shows the 1D VAE—again with two latent variables—as applied to an atomically resolved spectrum image at a SrTiO3–LaGaO3 interface.63 The model extracts two distinct latent features, $z_1$ and $z_2$, in Figs. 6(a) and 6(b), which clearly separate the different interfaces, as supported by the spread of points in latent space in (d). Comparison of the original and reconstructed average spectrum images in Figs. 6(e) and 6(f) shows an almost identical match, albeit with slightly less noise, resulting in a low MSE, as shown in Fig. 6(g). Two example pixels are selected to show individual spectral reconstructions and, as before, the channel-to-channel noise has been reduced. From these considerations and in terms of faithfully reconstructing the original data, we conclude that the VAE model performs well. However, the problem remains that the 1D VAE ignores the spatial structure.

FIG. 6.

1D VAE analysis for an atomically resolved core-loss EELS spectrum image of a SrTiO3–LaGaO3 interface. (a) and (b) are the encoded latent feature maps $z_1$ and $z_2$, respectively, with the HAADF-STEM image shown in (c). The distribution of points in latent space overlaid with the KDE is shown in (d). The ground truth spectrum image averaged over all energy and the reconstruction from the trained 1D VAE model are shown in (e) and (f), respectively, where (g) is the spatial map of the MSE between (e) and (f). Selected spectra indicated in (c) and their VAE predictions are shown in (h) and (i). Color bar is for the MSE in (g) only.

To address this challenge, we apply a multichannel rVAE model with two latent variables and three input channels, as shown in Fig. 7. The original and reconstructed channels are shown in (a)–(c), which, despite the presence of dense atomic features and a non-periodic structure, show that the atomic resolution contrast is still visible in the reconstructions. The encoded latent angle in (f) again appears to behave as an edge filter, but it also reveals atomic contrast in the central part of the image that is not as clear in the input channels (or even the HAADF image). Comparing the latent variable maps from the rVAE [Figs. 7(g) and 7(h)] to those from the ordinary VAE [Figs. 6(a) and 6(b)] reveals their different performance. The VAE maps essentially show a bulk feature and a surface feature, while the rVAE distinguishes individual atomic columns. Finally, the color spread in RGB space in the manifold maps in Figs. 7(d) and 7(e) indicates that each of the three inputs is utilized and relevant, despite the second channel consisting mostly of low intensity. The distribution of points in latent space [Fig. 7(i)] takes on a peculiar structure in which the KDE is clustered in two regions that represent the two predominant compositions present at the interface.

FIG. 7.

Multichannel (three-channel) rVAE analysis for the atomically resolved core-loss EELS spectrum image, using the shown NMF components as channels. Original channels are shown in (a)–(c) with their reconstructions in the panels directly below. Encoded latent rotation and latent features $z_1$ and $z_2$ are shown in (f)–(h), with distributions of occurrences. Manifold representations of the latent space for different voxel extremes [(d) and (e)] and the distribution of points in latent space (i) reveal distinct features.

Nanoparticle systems are known to exhibit a rich plasmonic structure, which increases in complexity as more particles are positioned very close to one another (within a few nanometers).57–59 While the spectral behavior has been well studied with EELS,39,60,61 the spatial structure in the measurements has again not been considered. In Fig. 8, we present three separate fluorine-doped indium oxide (FIO) nanoparticle hyperspectral EELS data sets: a single particle, a dimer, and a chain of seven particles resembling a nanowire. Both the VAE and the multichannel rVAE analyses are performed for each data set, in addition to the standard NMF feature extraction, where the latter is used as input for the multichannel rVAE as before. Three primary modes exist in these nanoparticle systems: a corner mode, an edge mode, and a face or bulk mode—these are evident in the three NMF components in the last row of Fig. 8(a). When multiple particles come together, as in Figs. 8(b) and 8(c), these modes still exist, but they also couple to produce additional modes in space and potentially at different observed energies. Using a trained VAE model, we show that each nanoparticle system can be represented by two main components. The model is then used to predict the spectrum at any coordinate with less than 1% error; the prediction error maps for all systems are shown in the supplementary material. This indicates that the VAE can learn the system dynamics for a variety of particle geometries.

FIG. 8.

Plasmonic nanoparticle analysis using NMF, VAE, and three-channel rVAE analysis for three separate particle distributions: (a) single, isolated nanocube, (b) isolated nanocube dimer, and (c) isolated chain of nanocubes. Top row shows the structural ADF image for each set. Second row depicts the latent features extracted via VAE. Third row shows the disentangled rotational component using a three-channel rVAE model (shown in the last row as three NMF components). Fourth row shows the latent features extracted here, now separate from the rotational features. Fifth row presents the three NMF components used in the multichannel rVAE model, along with the prediction of these components by the trained rVAE model. All scalebars are 10 nm.

We highlight that in all demonstrations of these particle systems, the encoded latent rotation features disentangled by the rVAE model are similar for each nanoparticle configuration. In each case, the shape and orientation of the nanoparticles are clearly visible in the encoded angle maps, and moreover, the two latent feature maps are now free from rotational effects. This implies that by choosing appropriate abundance maps that correspond to specific plasmon modes, the encoded angular representation can be learned and extracted from these data. In other words, the corner, edge, and face modes (their NMF abundance maps) were used to train the multichannel rVAE model, which allows the rotational effect of these modes to be visualized. Here, the effect is qualitative but clearly aligned with the facets of the nanoparticle cube structure—it is intriguing that three separate rVAE models (one for each data set) all extract very similar encoded angle representations, indicating a consistent spatial feature among single and multiple particle configurations. In this manner, we discover equivalent structural features in the hyperspectral EELS images independent of their orientation.

To summarize, EELS and similar hyperspectral imaging methods yield data sets containing a wealth of information on materials structure and properties. Traditional approaches to extracting features from hyperspectral images use linear or non-linear dimensionality reduction to explore the signal variability in the energy domain, while spatial details are extracted via visual examination of the loading or latent variable maps. Here, we develop an approach that enables analysis of the spatial structures naturally embedded in hyperspectral images via rotationally invariant autoencoders.64 This approach allows identification of the elementary structures contained in the EELS data set while taking into account rotational invariances that are inevitably present in microscopic data.

This approach can be universally applied to hyperspectral imaging data in techniques such as continuous imaging tunneling spectroscopy (CITS) in scanning tunneling microscopy, electron energy-dispersive spectroscopy and cathodoluminescence in scanning electron microscopy, and multiple other hyperspectral and multichannel imaging techniques in electron, scanning probe, and optical imaging. We further note that development of these methods can proceed via configuring the latent space of the VAE, i.e., introducing the joint VAE concept48 and control of the latent representations.62 

Materials and methods are provided in the supplementary material. This includes the material synthesis details for the InSb nanowires and FIO nanoparticles, as well as details for the SrTiO3–LaGaO3 interface sample. STEM-EELS experimental details are provided. Additional figures are also included, as well as a link to a Jupyter notebook containing the experimental data, which allows the analysis described herein to be reproduced.

This effort (ML and STEM) is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Basic Energy Sciences (BES), Materials Sciences and Engineering Division (K.M.R., S.V.K., A.R.L.) and was performed and partially supported (M.Z.) at the Oak Ridge National Laboratory’s Center for Nanophase Materials Sciences (CNMS), a U.S. Department of Energy, Office of Science User Facility. The authors acknowledge Shin-Hum Cho and Delia J. Milliron for supplying semiconducting nanoparticles.

This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

The authors have no conflicts to disclose.

Kevin M. Roccapriore: Data curation (equal); Formal analysis (equal); Investigation (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Maxim Ziatdinov: Conceptualization (equal); Methodology (equal); Project administration (equal); Software (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). Andrew R. Lupini: Data curation (equal); Investigation (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Abhay P. Singh: Investigation (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Usha Philipose: Investigation (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Sergei V. Kalinin: Conceptualization (equal); Funding acquisition (equal); Resources (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are openly available in Zenodo at http://doi.org/10.5281/zenodo.10449467, Ref. 64.

1. J. Park et al., “Bandgap measurement of thin dielectric films using monochromated STEM-EELS,” Ultramicroscopy 109, 1183–1188 (2009).
2. K. Kimoto et al., “Element-selective imaging of atomic columns in a crystal using STEM and EELS,” Nature 450, 702–704 (2007).
3. E. C. Cosgriff, M. P. Oxley, L. J. Allen, and S. J. Pennycook, “The spatial resolution of imaging using core-loss spectroscopy in the scanning transmission electron microscope,” Ultramicroscopy 102, 317–326 (2005).
4. H. Tan, J. Verbeeck, A. Abakumov, and G. Van Tendeloo, “Oxidation state and chemical shift investigation in transition metal oxides by EELS,” Ultramicroscopy 116, 24–33 (2012).
5. O. L. Krivanek et al., “Vibrational spectroscopy in the electron microscope,” Nature 514, 209–212 (2014).
6. J. A. Hachtel, A. R. Lupini, and J. C. Idrobo, “Exploring the capabilities of monochromated electron energy loss spectroscopy in the infrared regime,” Sci. Rep. 8, 5637 (2018).
7. A. Agrawal et al., “Localized surface plasmon resonance in semiconductor nanocrystals,” Chem. Rev. 118, 3121–3207 (2018).
8. S. H. Cho et al., “Syntheses of colloidal F:In2O3 cubes: Fluorine-induced faceting and infrared plasmonic response,” Chem. Mater. 31, 2661–2676 (2019).
9. J. A. Hachtel et al., “Identification of site-specific isotopic labels by vibrational spectroscopy in the electron microscope,” Science 363, 525–528 (2019).
10. L. H. G. Tizei et al., “Tailored nanoscale plasmon-enhanced vibrational electron spectroscopy,” Nano Lett. 20, 2973–2979 (2020).
11. K. Venkatraman, B. D. A. Levin, K. March, P. Rez, and P. A. Crozier, “Vibrational spectroscopy at atomic resolution with electron impact scattering,” Nat. Phys. 15, 1237–1241 (2019).
12. F. S. Hage, G. Radtke, D. M. Kepaptsoglou, M. Lazzeri, and Q. M. Ramasse, “Single-atom vibrational spectroscopy in the scanning transmission electron microscope,” Science 367, 1124–1127 (2020).
13. X. Yan et al., “Single-defect phonons imaged by electron microscopy,” Nature 589, 65–69 (2021).
14. X. Tian et al., “Capturing 3D atomic defects and phonon localization at the 2D heterostructure interface,” Sci. Adv. 7, eabi6699 (2021).
15. E. R. Hoglund et al., “Nanoscale phonon spectroscopy reveals emergent interface vibrational structure of superlattices,” arXiv:2105.10030 [cond-mat] (2021).
16. K. Reidy et al., “Direct imaging and electronic structure modulation of moiré superlattices at the 2D/3D interface,” Nat. Commun. 12, 1290 (2021).
17. J. Nelayah et al., “Mapping surface plasmons on a single metallic nanoparticle,” Nat. Phys. 3, 348–353 (2007).
18. O. Nicoletti et al., “Three-dimensional imaging of localized surface plasmon resonances of metal nanoparticles,” Nature 502, 80–84 (2013).
19. E. Kretschmann and H. Raether, “Notizen: Radiative decay of non radiative surface plasmons excited by light,” Z. Naturforsch. A 23, 2135–2136 (1968).
20. A. Otto, “Excitation of nonradiative surface plasma waves in silver by the method of frustrated total reflection,” Z. Phys. 216, 398–410 (1968).
21. C. Colliex, “An illustrated review of various factors governing the high spatial resolution capabilities in EELS microanalysis,” Ultramicroscopy 18, 131–150 (1985).
22. D. Rossouw et al., “Multicomponent signal unmixing from nanoheterostructures: Overcoming the traditional challenges of nanoscale x-ray analysis via machine learning,” Nano Lett. 15, 2716–2720 (2015).
23. B. R. Jany, A. Janas, and F. Krok, “Retrieving the quantitative chemical information at nanoscale from scanning electron microscope energy dispersive x-ray measurements by machine learning,” Nano Lett. 17, 6520–6525 (2017).
24. M. R. Keenan and P. G. Kotula, “Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images,” Surf. Interface Anal. 36, 203–212 (2004).
25. N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and A. O. Hero, “Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery,” IEEE Trans. Signal Process. 57, 4355–4368 (2009).
26. C. T. Nelson et al., “Exploring physics of ferroelectric domain walls via Bayesian analysis of atomically resolved STEM data,” Nat. Commun. 11, 6361 (2020).
27. R. K. Vasudevan, M. Ziatdinov, L. Vlcek, and S. V. Kalinin, “Off-the-shelf deep learning is not enough, and requires parsimony, Bayesianity, and causality,” npj Comput. Mater. 7, 1–6 (2021).
28. M. Bosman, M. Watanabe, D. T. L. Alexander, and V. J. Keast, “Mapping chemical and bonding information using multivariate analysis of electron energy-loss spectrum images,” Ultramicroscopy 106, 1024–1032 (2006).
29. M. Shiga et al., “Sparse modeling of EELS and EDX spectral imaging data by nonnegative matrix factorization,” Ultramicroscopy 170, 43–59 (2016).
30. Y. Nomura et al., “Quantitative operando visualization of electrochemical reactions and Li ions in all-solid-state batteries by STEM-EELS with hyperspectral image analyses,” Nano Lett. 18, 5892–5898 (2018).
31. P. Potapov, “Why principal component analysis of STEM spectrum-images results in ‘abstract,’ uninterpretable loadings?,” Ultramicroscopy 160, 197–212 (2016).
32. E. Strelcov et al., “Deep data analysis of conductive phenomena on complex oxide interfaces: Physics from data mining,” ACS Nano 8, 6449–6457 (2014).
33. S. V. Kalinin et al., “Big, deep, and smart data in scanning probe microscopy,” ACS Nano 10, 9068–9086 (2016).
34. T. Kohonen, “Self-organized formation of topologically correct feature maps,” Biol. Cybern. 43, 59–69 (1982).
35. T. Kohonen, “The self-organizing map,” Proc. IEEE 78, 1464–1480 (1990).
36. G. Yu, G. Sapiro, and S. Mallat, “Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity,” IEEE Trans. Image Process. 21, 2481–2499 (2012).
37. C. E. Rasmussen, “The infinite Gaussian mixture model,” in Advances in Neural Information Processing Systems 12, edited by S. A. Solla, T. K. Leen, and K. Müller (MIT Press, 2000), pp. 554–560.
38. D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv:1312.6114 [cs, stat] (2014).
39. K. M. Roccapriore, M. Ziatdinov, S. H. Cho, J. A. Hachtel, and S. V. Kalinin, “Predictability of localized plasmonic responses in nanoparticle assemblies,” Small 17, 2100181 (2021).
40. M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” arXiv:1506.02025 [cs] (2016).
41. J. F. Henriques and A. Vedaldi, “Warped convolutions: Efficient invariance to spatial transformations,” in International Conference on Machine Learning (PMLR, 2017), pp. 1461–1469.
42. S. V. Kalinin, A. R. Lupini, R. K. Vasudevan, and M. Ziatdinov, “Gaussian process analysis of electron energy loss spectroscopy (EELS) data: Parallel reconstruction and kernel control,” arXiv:2005.10507 [cond-mat, physics:physics] (2020).
43. M. Ziatdinov, see https://github.com/ziatdinovmax/pyroVED for “pyroVED” (2021).
44. M. P. Oxley et al., “Probing atomic-scale symmetry breaking by rotationally invariant machine learning of multidimensional electron scattering,” npj Comput. Mater. 7, 1–6 (2021).
45. S. V. Kalinin, O. Dyck, S. Jesse, and M. Ziatdinov, “Exploring order parameters and dynamic processes in disordered systems via variational autoencoders,” Sci. Adv. 7, eabd5084 (2021).
46. S. V. Kalinin et al., “Disentangling rotational dynamics and ordering transitions in a system of self-organizing protein nanorods via rotationally invariant latent representations,” ACS Nano 15, 6471–6480 (2021).
47. M. Ziatdinov, M. Y. Yaman, Y. Liu, D. Ginger, and S. V. Kalinin, “Semi-supervised learning of images with strong rotational disorder: Assembling nanoparticle libraries,” arXiv:2105.11475 [cond-mat, physics:physics] (2021).
48. M. Ziatdinov and S. Kalinin, “Robust feature disentanglement in imaging data via joint invariant variational autoencoders: From cards to atoms,” arXiv:2104.10180 [physics] (2021).
49. T. Yu et al., “Electron-beam-induced molecular plasmon excitation and energy transfer in silver molecular nanowires,” J. Phys. Chem. A 125, 74–87 (2021).
50. A. Singh, Z. Algarni, and U. Philipose, “Template-assisted electrochemical synthesis of p-type InSb nanowires,” ECS J. Solid State Sci. Technol. 6, N39 (2017).
51. Y. Wu, G. Li, and J. P. Camden, “Probing nanoparticle plasmons with electron energy loss spectroscopy,” Chem. Rev. 118, 2994–3031 (2018).
52. Y. Su et al., “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens. 57, 4309–4321 (2019).
53. Z. Wang et al., “FusionNet: An unsupervised convolutional variational network for hyperspectral and multispectral image fusion,” IEEE Trans. Image Process. 29, 7565–7577 (2020).
54. D. P. Kingma and M. Welling, “An introduction to variational autoencoders,” FNT Mach. Learn. 12, 307–392 (2019).
55. C. P. Burgess et al., “Understanding disentangling in β-VAE,” arXiv:1804.03599 [cs, stat] (2018).
56. T. Bepler, E. D. Zhong, K. Kelley, E. Brignole, and B. Berger, “Explicitly disentangling image content from translation and rotation with spatial-VAE,” arXiv:1909.11663 [cs, q-bio] (2019).
57. E. Prodan and P. Nordlander, “Plasmon hybridization in spherical nanoparticles,” J. Chem. Phys. 120, 5444–5454 (2004).
58. P. Nordlander, C. Oubre, E. Prodan, K. Li, and M. I. Stockman, “Plasmon hybridization in nanoparticle dimers,” Nano Lett. 4, 899–903 (2004).
59. E. Prodan, C. Radloff, N. J. Halas, and P. Nordlander, “A hybridization model for the plasmon response of complex nanostructures,” Science 302, 419–422 (2003).
60. S. H. Cho et al., “Spectrally tunable infrared plasmonic F,Sn:In2O3 nanocrystal cubes,” J. Chem. Phys. 152, 014709 (2020).
61. S. V. Kalinin et al., “Separating physically distinct mechanisms in complex infrared plasmonic nanostructures via machine learning enhanced electron energy loss spectroscopy,” Adv. Opt. Mater. 9, 2001808 (2021).
62. M. Ziatdinov, C. Y. Wong, and S. V. Kalinin, “Finding simplicity: Unsupervised discovery of features, patterns, and order parameters via shift-invariant variational autoencoders,” arXiv:2106.12472 [cond-mat] (2021).
63. P. Perna et al., “Conducting interfaces between band insulating oxides: The LaGaO3/SrTiO3 heterostructure,” Appl. Phys. Lett. 97, 152111 (2010).
64. K. M. Roccapriore and A. R. Lupini, STEM-EELS Hyperspectral Data: Nanowires, Nanoparticles, Interface (Zenodo, 2024).