Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe_{0.55}Se_{0.45} (T_{c} = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe_{1−x}Se_{x} structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

^{1}in early 1980s has enabled direct atomic-level observation of many surface types, ranging from semiconductors and graphene to metals and oxides. This resulted in a rapid expansion of a wide spectrum of scientific areas from condensed matter physics to surface and catalysis science.

^{2,3}Early on, it was realized that the STM image is a convolution of the electronic structure of the probe and (position dependent) electronic structure of the sample. The relationship between the tunneling current and the electronic structures of the tip and the surface can be approximated as

*ρ*and

_{s}*ρ*are the density of states (DOSs) in the sample and tip, respectively, and

_{T}*M*

_{μν}is the tunneling matrix element between the modified wave-functions of the tip and the sample surface.

^{4}In STM imaging, the tip scans the surface maintaining constant tunneling current, i.e., mapping iso-current contours directly related to the local electronic structure. However, Eq. (1) contains additional information that can be obtained from the series of the STM images obtained at different bias levels; with one of the first of such examples being visualization of filled and empty states on the GaAs surface.

^{2}This multi-bias imaging approach has been developed into current imaging tunneling spectroscopy (CITS),

^{5}an experiment in which STM spectra are acquired on a grid to yield spectroscopic information at a point on the surface. The analysis of spectra at each point, using a predetermined physical model, allows parameters of interest to be extracted and represented as 2D maps. Among prominent examples of this technique is superconductor gap mapping in high temperature superconductors

^{6}or band gap and band edge mapping in oxide heterostructures.

^{7,8}

CITS mapping of the electronic structure is a laborious task that brings about a dual challenge: acquisition of high-quality data and interpretation in light of relevant material properties. The first challenge stems from the fact that a CITS dataset contains 2-3 orders of magnitude more of information than a single STM image; with commensurate increase in acquisition time and hence higher microscope stability requirements—to allow for atomic resolution. The first problem is continuously addressed with specialized low-noise, high stability, and low temperature STM platforms. The second challenge is the interpretation of the STM data set. With the wealth of additional information transpiring from the CITS method, it’s imperative to come up with the framework capable of handling, organizing, and divulging the internal structure of physical data to be interpreted. Traditionally, the analysis is performed by choosing an appropriate physics-driven model and fitting the scanning tunneling spectroscopy (STS) data to the model, e.g., using current thresholding to map band gaps or using Dyne’s formula for mapping superconductive gaps,^{9} etc. However, given that the functional form of STS is generally unavailable, this approach is fundamentally limited. The tunneling spectra often contain complex features beyond simple analytical description, and fitting with improper function can give rise to noise, parameter interdependence, and multiple artifacts.

Here, we report direct visualization of the spatial variability in the electronic structure of surfaces using multivariate analysis of the CITS data. As a model system, we have chosen layered FeTe_{0.55}Se_{0.45} compound, a prototypical layered, high-temperature superconductor.^{10} Fe_{1+y}Te_{x}Se_{1−x} family has a simple tetragonal P4/nmm space group, with layers of Fe atoms making up square planes that are tetrahedrally coordinated with chalcogen; the excess *y* sits in the planes of chalcogen.^{11} Their physical properties are very sensitive to *x* and can depend on excess Fe; crystals grow from Bridgman method.^{10,12} Crystals with optimal compositions of *x* = 0.55 and 0.5 and *y* ± 0.01 show bulk superconductivity at T_{c} = 14 K.^{10} The superconducting samples show spin excitations closest to ($ 1 2 $, $ 1 2 $, 0) wavevector^{13} and also exhibit a strong spin resonance in the spin excitation spectrum below T_{c}.^{14} Scanning tunneling microscopy topography shows that optimally-doped superconductor shows nanoscale Te and Se chemical inhomogeneity, but local electronic properties revealed by tunneling spectroscopy is homogeneous.^{15} For *x* = 0.66 and *y* = 0.04, angle-resolved photoemission spectroscopy and density functional calculations show one inner closed Fermi pocket and two outer cylindrical Fermi surfaces near **Γ** and two electron-like Fermi surfaces near the M point.^{16} It was found that small amount of chemical disorder (few percent) in the Fe magnetic ion sublattice, by partial replacement of Fe with Cu, Ni, or Co, is detrimental to superconducting state.^{17} The experimental data presented here were collected on a home-built low temperature STM system above the superconducting transition temperature.^{9,18} Sample was cleaved at room temperature *in situ* and then cooled down to 82 K; imaging and spectroscopy were performed at 82 K with a Pt-Ir tip at 50 mV in imaging mode. CITS data were collected in grid mode with the Z-feedback used for moving the tip from point to point and Z-height recorded at each pixel, prior to the Z-feedback being turned off during the bias sweep. An example STM image of this surface is shown in Fig. 1(a). The image illustrates typical topographical features observed on the (001) FeTe_{0.55}Se_{0.45}. Spectroscopic measurements were made within the imaged area, delineated by a white rectangle in Fig. 1(a).

In CITS mode, spectroscopy is performed at an individual point (x, y) with the current *I* recorded for a given applied voltage *U*. Thus the resulting data object is a 3D spectral image *I* (x, y, *U*), where *I* is the detected current, *U* is the tip bias, and x, y are spatial surface coordinates at which the measurement was done. Here, CITS imaging was performed on a 80 × 80 grid at −0.5 V to 0.5 V bias, sampled over 256 points. These data were collected on a 7.2 × 7.2 nm^{2} area, i.e., each pixel corresponds to 1 × 1 Å^{2}, with the lattice constant of the material being 3.8 Å. *Z* position of the piezo was recorded on a separate channel prior to the Z-feedback disengage resulting in a 80 × 80 pixel topographic map, shown in Fig. 1(b), which can be loosely correlated to the selected area in the Fig. 1(a). Approximate acquisition time for a CITS map was 5-6 h which resulted in some drift, apparent from the topography channel, Fig. 1(b).

The classical CITS analysis typically includes examination of the voltage segments *I* (x, y, *U*_{i}), profiles *I* (x, y_{i}, *U*), or *I* − *V* curves at selected locations *I* (x_{i}, y_{i}, *U*). Shown in Fig. 1(c) are examples of *I* − *V* curves at pixels labeled with a red and purple markers in Fig. 1(b). Note that while such representation allows insight into variability of electronic behavior across the sample surface, standalone data segments have a higher noise level, and hence, unreliable. In Fig. 1(c), *IV* curves show small variations in current behavior at −0.4 to 0 V, but these variations are subtle and are hard to classify and, therefore, quantify. As a result, these types of data tend to be generally ignored or analyzed only semiqualitatively. Fig. 1(d) illustrates a spectrogram of |*I*| vs. *V* curves along a black dotted line in Fig. 1(b). Note that while the local atomic environment along the line differs strongly based on the STM image in Fig. 1(a), the deviation in the current mostly dominates the left and right edges of the spectrogram, areas outside of the gap. Overall, spatial variation in individual *IV* profile contains additional information about the local chemical environment and their effects on the variation in the electronic structure. Currently this information content remains largely untapped, as it cannot be adequately addressed by conventional methods.

^{19}In PCA, a spectroscopic image of

*N*×

*M*pixels formed by spectra containing

*P*points, in our case 256 points of current measurement at −0.5 to 0.5 V, is converted into a linear superposition of orthogonal, linearly uncorrelated eigenvectors

*w*such that

_{k}*IV*curve at a selected pixel, and

*U*are the discrete bias values at which current is measured. The eigenvectors $ w k U $ and the corresponding eigenvalues

_{j}*λ*are found from the singular value decomposition of covariance matrix,

_{k}**C**=

**AA**

^{T}, where

**A**is the matrix of all experimental data points

**A**

_{ij}, i.e., the rows of

**A**correspond to individual grid points (

*i*= 1, …,

*N*⋅

*M*) and columns correspond to voltage points,

*j*= 1, …,

*P*. The eigenvectors $ w k U j $ are orthogonal and are arranged such that corresponding eigenvalues are placed in descending order,

*λ*

_{1}>

*λ*

_{2}> ⋯ by variance. In other words, the first eigenvector $ w 1 U j $ contains the most information within the spectral image data; the second contains the most common response after the subtraction of variance from the first one, and so on. In this manner, the first 0 −

*P*maps, $ a k x , y $, contain the majority of information within the dataset, while the remaining

*P*−

*p*sets are dominated by noise. The number of significant components,

*p*, can be chosen based on the overall shape of

*λ*

_{k(i)}dependence or from correlation analysis of loading maps, that correspond to each of the eigenvectors, $ a i k \u2261 a k x , y $. Additionally, Scree plots are used to correlate variance in each component as a function of the component’s number. For the Scree plot of this data set, see supplementary material. See supplementary material for Figure 1.

^{20}

Results of the PCA analysis of FeTe_{0.55}Se_{0.45} CITS data set are illustrated in Fig. 2. Here, Figs. 2(a), 2(b), 2(d), and 2(e) are loading maps for eigenvectors 1-3 and 6, respectively, that are shown in Figs. 2(c) and 2(f). Note that the eigenvectors and loading maps are defined unambiguously, in the statistical sense they represent levels of in the signal variance, mapped onto the sample surface, but as a result may not necessarily have rigorous physical interpretation. Hence their intensities are given in arbitrary units. However, in many cases, the eigenvectors can interpreted qualitatively based on characteristic shapes, as will be discussed below. The first principal component is the average *I* vs. *V* curve, based on the fact that raw data were presented to the PCA without mean subtraction, and the corresponding loading map visualizes the information on the chalcogen surface segregation comparable to that could be observed directly from the STM image in Fig. 1(a). Second and third eigenvectors illustrate band gap behavior, such as conduction band broadening in the 2nd eigenvector and parallel shifting of the band edges in the 3rd eigenvector. The loading map of the second eigenvector, Fig. 2(b), is highly similar to the first, but with a smoother gradient around the chalcogen rich areas and interfaces. The third loading map showcases types of behavior that are not immediately apparent from STM images. A smaller conduction gap, current decrease in the negative bias range and current increase for the positive bias range, Fig. 2(f), are immediately visible. Sixth component has even more complex structure, suggestion complex changes in conduction behavior. Interestingly, in the corresponding loading map (Fig. 2(e)), this behavior is manifested at the boundaries of the more prominent chalcogen regions shown in Figs. 2(a) and 2(b). All of the first nine eigenvectors with their corresponding loading maps are shown in the supplementary material. See supplementary material for Figures 2 and 3.^{20}

PCA sorting of the data into eigenvectors and corresponding loading maps allows for specific examination of where variations of the electronic structure take place on the surface, simultaneously it allows for mapping of key types of *IV* behavior and regions that exhibit this behavior. For this system, PCA suggest at least three major types of behavior, those associated with chalcogen clusters of Se and Te and the interface between the two. The fact that we see real contrast in more than just the first two eigenvectors suggests that the observed effects are not merely linear superpositions but rather reflect spatial variability of electronic structure which is non-additive. This information can be correlated to the real topographical locations shown in Fig. 1(a); where PCA categorizes areas of different types of behavior corresponding to clusters of different chalcogens and possibly other types of defects.

^{21}by k-means clustering. K-means algorithm divides

*M*points in

*N*dimensions into

*K*clusters so that the within cluster sum of squares

*μ*is the mean of points in

_{i}*S*, is minimized.

_{i}^{22}Here, we have used a Matlab k-means algorithm that minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. As a measure of distance (minimization parameter), we have used sum of absolute differences with each centroid being the component wise median of the points in a given cluster. We applied a k-means clustering method on the first 20 meaningful principal components (clustering based on a different number of principal components (Figure 4) as well as clustering based on the Mahalanobis distance and a range of principal components (Figure 5) is included in the supplementary material.

^{20}), which capture all but 5 × 10

^{−5}% of the variation in the CITS data (as can be seen from the Scree Plot, see supplementary material for Figure 3

^{20}), in order to try to distinguish mean cluster behavior from noise and assign variation to phenomena at localized regions. The k-means result shown in Fig. 3(a) illustrates the data separated into three clusters as suggested by a top-down agglomerative hierarchical cluster tree (dendrogram) shown in Fig. 3(b). Figure 3(c) shows mean

*IV*curves from, respectively, color coded results in Fig. 3(a) as well as highlighting the variation in the cluster specific

*IV*curves (Fig. 3(d)) by subtracting them from the overall mean

*IV*curve for the data set.

This analysis captures significant deviation in the *IV* composition of red and yellow areas with boundaries between the two types of behavior showing up in blue. Similar to PCA, an additional benefit of this method is that it allows us to distinguish different surface behaviors due to clustering of the current response spatially. Here, the clusters II and III seem to be associated with the dissimilar chalcogen atoms, whereas cluster I is the interfacial region between dissimilar chalcogens. The corresponding variations in electronic behaviors compared to average are visualized in Fig. 3(d). Cluster II and cluster III have anti-symmetric behavior with respect to bias, whereas in II conduction is higher than average for negative biases and lower for positive, whereas reverse is true in cluster III. In cluster I, conductance in the positive bias range is identical to cluster II, but in the negative bias range the gap conductance in enhanced at a lower bias than in clusters II and III. Note that arriving at this conclusion without having the spatial mapping of the *IV* and, PCA results, would have been impossible. If the number of clusters is increased to five, six, and so on; these additional degrees of separation dominate the intra-cluster space, the blue region. Further clustering of the data highlights data outliers that lack any apparent spatial correlation. This finding suggests that we can distinguish variability based on the types of surface atoms, along with the boundaries in between.

It’s important to underline the qualitative agreement between the PCA and k-means, entirely different techniques. Both methods indicated a presence of two separate regions of behavior with an interface between them. In PCA, first and second loadings show a large enveloping region. It is quite similar to cluster III in k-means, which shows the same exact region. Third PCA loading highlights large interstitial areas, dissimilar to the first and simultaneously, k-means cluster II highlights an identical region. Since k-means cluster number was set to three, the presence of a separate interspatial region was not confirmed; however, PCA’s sixth component specifically features the interspatial behavior in Fig. 2(e).

To corroborate this interpretation and understand the theoretical limits in distinguishing (1) chemically separated regions, i.e., Se and Te rich regions as well as and (2) separating differences in local compositional variations, i.e., variations due to nearest neighbors, we apply the PCA and k-means clustering analysis to a theoretical DOS vectors obtained for each atom in a pre-determined atomic configuration. By performing such a study, we aim to explore whether variations in the electronic structure due to chemical separation and near-neighbor local compositional changes can be seen within our experimental resolution. Density functional theory based calculations were performed for a 4 × 4 × 1 supercell of FeTe_{(1−x)}Se_{(x)} at a fixed experimental lattice parameter corresponding to Vegards law for x = 0.5. Several different atomic arrangements were chosen, including (a) pure FeTe (x = 0) and FeSe (x = 1) (b) a checkerboard pattern of Se/Te for x = 0.5 (c) cluster configurations, one with Se and another with Te as shown in Figures 4(a)–4(c) with some Fe atoms omitted for clarity. Note that finite number of elements and computational constraints of the DFT model does not allow for all possible configurations to be explored, hence we selected cases that would be easy to interpret and are reasonably close the real system. Because we only want to test the multivariate statistical method to separate chalcogenide atoms/configurations, a spin non-polarized calculation was performed with semilocal Perdew-Burke-Ernzerhof generalized gradient approximation (PBE-GGA) for the exchange-correlation functional within the projector augmented wave (PAW) method as implemented in the *Vienna Ab Initio Simulation* package.^{23} PBE-GGA shows good agreement of the internal parameters with experiments as was shown in a previous study by Shi.^{24} Forces were relaxed down to 0.001 eV/Å using a 2 × 2 × 2 Γ centered **k**-point mesh. Brillouin zone integration was performed using a tetrahedron method. Local density of states was obtained by projecting onto the PAW spheres for individual atoms and then was smoothed with a Gaussian of width 0.03 eV.

As an emulation of approximate experimental conditions, we reduced the length of the calculated DOS spectrogram to the range of −2 eV to 1.82 eV before applying Gaussian smoothing of 0.03 eV. Results of the k-means clustering for the checkerboard pattern FeSeTe as well as Se and Te cluster configurations are shown in Figures 4(d)–4(f). For Se and Te clusters, Fe, Figs. 4(e) and 4(f), atoms were omitted for clarity. It is immediately clear that even though most of the spectrogram of the DOS information has been removed and jitter type deviations smoothed away, k-means allows differentiating all three types of atoms in the checkerboard case, Fig. 4(d), and separating Te and Se atoms in cluster configurations, Figs. 4(e) and 4(f). To reiterate, a DOS computational data set made to emulate experimental results was subjected to the k-means clustering algorithm that has successfully differentiated chemical species as well as differences in local environments. Furthermore, PCA analysis (see the supplementary material for Figure 6^{20}) as well as more in depth presentation of the k-means (see supplementary material for Figure 7^{20}) indicates that local bonding configuration can be extracted from this type of the analysis as well. Hence, in addition to differentiating the identity of a given atom, effects of the local environment of the atom including the type and number of nearest neighbors can be deduced. This directly confirms the interpretation of PCA and k-means data as credible and confirms the use of statistical methods as a novel and valid tool for chemical exploration of the spectroscopic STM data.

In the case of the FeTeSe system, understanding the surface layer and its termination allows a possible link between the structure factors and superconductivity to be established. Furthermore, due to the chemical sensitivity observed with the k-means on the DOS data, it could be possible to explore the FeTeSe surface for chemical impurities, as even seemingly insignificant chemical disorder in the Fe ion sublattice, by partial replacement of Fe with Cu, Ni, or Co, is detrimental to superconducting state. While the presented study is currently limited to the realm of theoretical data, with the aid of image registration techniques as well as supervised learning multivariate methods, it is possible to map the CITS signal to an imaged area with high levels of confidence and use more sophisticated clustering and unmixing techniques. These techniques used in conjunction will both improve the veracity of the experimental data and, through use of penalty functions related to the surface (such as spatial crystallographic constraints controlling atom distributions), provide results that are statistically significant and physically relevant.

To summarize, we demonstrate, for the first time, a direct multivariate statistical analysis of CITS data as an approach to local mapping of electronic structure and probing its correlation with local composition. Tools to complete this task for both experimental and theoretical data analyses were PCA and k-means clustering algorithm through the Matlab statistical toolbox. Using a combined theory and experimental approach, our results indicate that the spatial variability of the *IV* curves in experimental data pinpoints areas that show segregation behavior of chalcogen terminations on the surface. These segregated areas can be mapped individually along with interfaces between them. Hence, in our work we present a method that correlates spatial variability in the spectroscopic data with the local bonding environment captured by topography. This method is very powerful in correlating chemical and ionic functionalities at the atomic scale to large-scale correlated electronic behavior, such as superconductivity.

Research for A.B., W.L., S.V.K., B.S., and A.S. was supported by the U.S. Department of Energy, Basic Energy Sciences, Materials Sciences and Engineering Division. Research by G.P., S.J., and M.P. was conducted at the Center for Nanophase Materials Sciences, which is sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy.

## REFERENCES

*Introduction to Scanning Tunneling Microscopy*

*Scanning Tunneling Microscopy*

*Scanning Probe Microscopy and Spectroscopy: Methods and Applications*

*Surface Science: An Introduction*