Two dimensional (2D) peak finding is a common practice in data analysis for physics experiments, which is typically achieved by computing the local derivatives. However, this method is inherently unstable when the local landscape is complicated or the signal-to-noise ratio of the data is low. In this work, we propose a new method in which the peak tracking task is formalized as an inverse problem, which thus can be solved with a convolutional neural network (CNN). In addition, we show that the underlying physics principle of the experiments can be used to generate the training data. By generalizing the trained neural network on real experimental data, we show that the CNN method can achieve comparable or better results than traditional derivative based methods. This approach can be further generalized in different physics experiments when the physical process is known.
Recent advances on experimental techniques in condensed matter physics boost the generation of large volume high-quality data. In order to present these data, 2D (and even 3D) representations of the data become more and more popular, such as in angular resolved photo-emission spectroscopy (ARPES),1 scanning tunneling microscopy (STM),2 and resonant inelastic x-ray scattering (RIXS).3 In these experiments, as the data quality is often limited by the instrumental resolution and different intrinsic physical processes, the retrieval of physical quantities with high precision can therefore benefit from effective data analysis methods.
As an example, a typical 2D ARPES experiment dataset is shown in Fig. 1. Ideally, the ARPES spectra should follow the theoretical energy–momentum dispersion shown in Fig. 1(a). However, intrinsic broadening effects1 such as electronic correlation and extrinsic factors such as crystal defects can broaden the spectrum in both energy and momentum dimensions. Together with the resolution limitation, sometimes it is difficult to resolve the energy bands in the 2D measurements [e.g., in Fig. 1(b)]. To enhance the features in the 2D band dispersions, several derivative based methods have been proposed, such as the Maximum Curvature (MC) method4 and the Minimum Gradient (MG) method.5 The MC method calculates the local curvature and assumes that the pixels with a large curvature are the positions of the energy bands; the MG method calculates the average gradient and assumes that the positions with a small average gradient represent the energy band location. However, these methods can only give reasonable results in high signal-to-noise ratio data and tend to fail in complicated situations when multiple bands are close to each other or when the data are too noisy.
On the other hand, recent development in machine learning and deep learning techniques such as convolutional neural network (CNN) provides great improvement in 2D data qualities by solving a series of inverse problems, such as super-resolution, denoising, and patching.6–8 As high-quality 2D data (i.e., images) are subjected to various degrading transformations in experiments, the objective of our data analysis becomes finding an appropriate inverse transformation that can best recover the original images—which can be formulated as solving an inverse problem.
Motivated by this insight, we consider the energy band extraction problem in a 2D ARPES image (broadened by the intrinsic and extrinsic processes as discussed above) as a problem of inversing the physical processes that blurs the spectral peaks along the band dispersions, and thus, it can be treated using CNN. In other words, we effectively look for a map between the “broadened experimental dispersions” and the “original dispersions.” Moreover, leveraging the existing knowledge of the intrinsic and extrinsic broadening processes, massive simulated data can be generated for the training purpose before we apply the trained model to the real experimental data.
In this work, we use the simulated ARPES data to train a modified Super-Resolution Convolutional Neural Network (SR-CNN) that fits the spectral intensity map of ARPES experiments with the extrema feature map, thus visualizing the positions of the energy bands. SR-CNN was originally proposed for mapping low resolution image patches to corresponding high resolution patches, which is an inverse problem of down sampling in image processing.9 Compared to other recent neural network architectures in solving this problem that uses very deep net,7,8 the SR-CNN has only three layers and the function of each layer can be well understood. We demonstrate that this method can resolve complicated features in ARPES data and outperforms previous traditional algorithms in noise-resilience, sharpness, and accuracy.
METHOD AND RESULTS
Results from different algorithms
For reference, the theoretically calculated energy band is presented at Fig. 1(a) to compare with the experiment data in Fig. 1(b). The key feature shown in the calculation is that there are three energy bands crossing the Fermi energy (EF) and band No. 2 terminates around EF and merges with other faint bands. In addition, band No. 1 has a degenerate point at k = 0 about 60 meV below the EF. Ideally, a successful data analysis algorithm should resolve all these features from the ARPES intensity map.
Due to the experiment restriction, the number of pixels in the current data along the k-direction is only 29. The low resolution makes it even more difficult to directly track the features from the experiment data. The MC and MG methods show significant enhancement of the energy band features [as shown in Figs. 1(c) and 1(d)] but are either noisy (MC) or still blurry (MG). The result of our method is shown in Fig. 1(e), demonstrating all three EF-crossing bands and the crossing below the Fermi surface (FS). It also shows that the band in the middle (No. 2) ends at about −15 meV below the FS, which matches the theoretical calculation.
Super-resolution convolutional neural network
The design of our CNN method is modified from SR-CNN, a three-layer fully convolutional neural network proposed by Dong, Loy, He and Tang for the image super-resolution.9
The architecture of our modified SR-CNN is shown in Fig. 2(a). In SR-CNN, each layer is understood as an operation that is comparable to traditional image restoration methods:6 The first layer transforms the raw experimental image patches to an n1-dimensional representation under a trained basis set. The second layer non-linearly maps the new representation to the n2-dimensional sharp energy band feature space. At the final stage, the image of energy band is reconstructed by averaging all the features from the adjacent area.
Assuming that Y is the input image, Wi and Bi are the weights and bias in the ith layer, this three-layer neural network can be mathematically written as
The “*” denotes a convolution. The result of the convolution in the first two layers is passed to a rectified linear unit [ReLU(x) = max(0, x)] to produce non-linearity.10 The output layer is with a sigmoid activation function, ().
Note that although the architecture we are using is almost the same as SR-CNN, the problem we are trying to solve is different from what SR-CNN was originally designed for: In the original SR-CNN, the super-resolution problem was formulated as a regression problem, where no non-linear activation function was applied in the last layer, and the mean squared error (MSE) loss between the network output and the ground truth was minimized. In our design, the problem was formulated as hout × wout classification problems. The sigmoid activation function is applied in the third layer to obtain the confidence for the existence of the energy band, giving an output ranging from 0 to 1 naturally. We minimize the standard loss function for classification, i.e., the cross entropy, also known as the negative log likelihood of the ground truth, which is defined as
where Xn denotes each pixel in the label, denotes the SR-CNN output from the training data, and θ denotes the parameters in the SR-CNN (i.e., bias and weights).
The setting we used is = 64, = 9 for size(W1) = [n1, , , 1]; n2 = 32 for size(W2) = [n2, 1, 1, n1]; = 5 for size(W3) = [1, , , n2], which is default, as shown in the previous SR-CNN work.6 The output size can be derived as
The receptive field of each output pixel is 13 × 13 (i.e., each output pixel is determined locally by the neighboring 13 × 13 pixels).
The training processes
For a low-level vision task such as super-resolution, the training process follows the paradigm of self-supervised learning, which does not require to manually label the training data. The low-resolution inputs Y are generated by down-sampling the high-resolution image patches X in the training set. The output generated from the forward propagation process is then compared with the original high-resolution image X to compute the loss function . Backpropagation is used to minimize the loss function with respect to θ iteratively.9 In this process, the high-resolution image patches are used as the label, and the down-sampled ones are used as the input of the model.
For the task of the energy band feature extraction from the ARPES experiment data, the training data and the label should be chosen more carefully. For any experiment data, it is not possible to label the energy band position automatically and it is not accurate to label it by human guess. To solve this dilemma, simulated data are used for the training and testing process, while the experiment data are used to evaluate the result.
Figure 2(b) shows the data simulation and the training process. The energy band ϵ(k) is generated through a tight binding model with randomized parameters. Since the SR-CNN output is only affected locally by the neighboring 13 × 13 pixels, the global property such as symmetry of the energy band does not affect the model. The energy band image is then generated by discretizing ϵ(k) through putting 1’s in the occupied pixels and 0’s in the unoccupied pixels. The generated image is finalized by super-sampling for anti-aliasing11 and then used as the label.
To simulate the detected experimental data corresponding to the label, the ARPES spectrum broadening model is used. Assuming that we have an energy band with a shift due to self-energy, E = ϵ(k) + Σ′(k, E), the one-particle spectral function is written as1
which means that the energy band is broadened and shifted by the self-energy that comes from the electronic interactions. The observed data, the photon count, follow
where FD(E) is the Fermi–Dirac distribution, M(k) is the matrix element, is the intrinsic noise such as the shot noise, ⊗ denotes convolution, R(Δk, ΔE) is the instrument resolution, and is the extrinsic noise such as circuit noise.
When the variation of self-energy cannot be ignored within the convolution window in the targeting experiment data, then this effect should be considered in the simulation process. However, as a proof-of-concept design, we currently assume that this part is slow-variate and can be ignored in our model due to the localized, all-convolution network design. To generate the simulated data, Σ′ is not considered since the result is interpreted as the local peak positions in the experimental data. The energy band image is convolved with the one-particle spectral of randomized self-energy Σ″ and is added with a Poisson noise. Gaussian blurring is then applied to simulate the instrument resolution. A Gaussian noise is added at the final step for the circuit noise. The finalized broadened and noisy simulated image is used as the training data.
The optimization is done iteratively. Before the training process, multiple (∼20 in the presented study) simulated 3D band structures are generated, and 1000 2D-slices are cut from the volume as training data. The initial value of the filters and bias are from the result of the SR-CNN in natural images.6 During one training step, multiple patches randomly selected from the training data are passed through the forward propagation to produce the output. The loss function is then calculated between the output and the label to produce a gradient flow to determine the changing direction of parameters. An Adam optimizer is used for the backpropagation12 to update the parameters. The training is iterated for ∼30 × 106 steps until the parameters of the model performance saturate.
Benchmark of the performance
|.||Gap size .||Gap size .||Peak width .||.|
|.||(meV) .||error (%) .||(meV) .||Contrast .|
|.||Gap size .||Gap size .||Peak width .||.|
|.||(meV) .||error (%) .||(meV) .||Contrast .|
Figures 3(a) and 3(b) compare the sensitivity to bandgaps for different methods. We applied each method to a simulated dataset with a small energy gap of ∼30 meV. The gap is indistinguishable either from the visual inspection of the image or from the energy distribution curve (EDC). The results of different methods are summarized in Table I. The MC method increases the visibility of the gap feature with an inferred gap size of 25 meV from fitting, but the output is noisy [Figs. 3(a-iii) and 3(b-iii)]. The minimum gradient resolves most of the band clearly but falsely produces high intensity values inside the gap [Figs. 3(a-iv) and 3(b-iv)]. This is because the local intensity landscape is a saddle point (minima along the E direction but maxima along the k direction), where the average gradient reaches minimum as it does at a peak point or at the ridge. This saddle-point problem prevents the MG method from resolving the small gap feature in general. As shown in Figs. 3(a-v) and 3(b-v), CNN resolves the gap with a size of 24 meV. Although MC and CNN provide the comparable results in gap size, the contrast of the CNN result is much larger, and the resolved bands are sharper. The result is also more noise resilient in CNN.
To evaluate the performance of our algorithm in experiment data, an example of the 2D experimental dispersion (from sample PtSe2 in Ref. 13) is used in Fig. 3(c). A degeneracy point is expected in the PtSe2 band structure [Fig. 3(c-i)]. The experiment data are too blurry to directly delineate the two bands [Fig. 3(c-ii)], and neither the MC method nor the MG method can resolve the two bands clearly [see Figs. 3(c-ii) and 3(c-iii)]. However, the CNN method can produce a clear result showing the two bands crossing with less noise and artifacts [Fig. 3(c-v)]. This result shows that the CNN method can achieve comparable or better performance than the traditional methods, especially in the resilience to the noise and sharpness of the resulting features.
While the CNN method extracts the main feature in Fig. 3(c-v) successfully, one can note an unexpected up-bending feature at k ∼ 0.06 Å−1 near ED, which is not shown in the computational result. To test whether this is a new feature or an artifact, we corrupt the data with Gaussian noise in Fig. 3(d). While the CNN results with the corrupted data show consistent features of the band crossing, there is no sign of the up-bending band. Thus, we can infer that the unexpected feature may be an artifact accidentally introduced by experimental noise or pre-processing steps.
The consistent results with noise-corrupted data show the robustness of the CNN method. However, the result quality reduces qualitatively as the noise goes up. Therefore, one should be cautious when interpreting the CNN results for experiments with a high signal-to-noise ratio and repeat the experiment when necessary.
Understanding the CNN model
To understand the mechanism of the CNN method, the filters and responses from the first layer are visualized in Fig. 4. Although not all filters have clearly understandable meanings, some of the filters can be identified with specific functions. Some of the filters are sensitive to the inclined ridges (e.g., filters No. 1 and 2); some of the filters are sensitive to flat bands (e.g., filter No. 3); some of the filters act like a ‘background detector’, which separate the band and the noisy background (e.g., filter No.4); and some of the filters act as a convolution along the horizontal axis (e.g., filter No. 5), which is similar to the traditional energy–momentum curve peak finding method.4 In the second layer, these feature maps are averaged with weights and finally used to reconstruct the sharp-feature map in the output layer.
The output of the CNN method marks the position of the peaks, which stands for the position of the energy band. However, the information on the self-energy, which is of interest in analyzing the behavior of the correlated system,14,15 is not extracted. In the future study, this part of information could be included by adding a branch in the neural network.
One issue of using neural network is over-fitting,16 when the training set is too small, and the model fails to generalize the other datasets. This issue should be tackled carefully when using the simulated data, as shown in Fig. 5. Since the experimental data are always noisy, a model trained with the noise-free simulated data can treat the experimental noise as separated peaks, resulting in the discontinuity of the output energy band, as shown in Fig. 5(b) (marked with red arrow 1). Smoothing is commonly used as a pre-processing step to suppress the noise and enhance the continuity of the output energy bands in both MC and MG methods, and it can also be used before applying CNN, as shown in Fig. 5(c). However, smoothing also causes loss of small features [as marked with red arrow 2 in Figs. 5(b) and 5(c)]. To avoid the smoothing step and enhance the model’s stability under noise, random artificial noise is added in the training data. As a result, the output of the model for the experiment data shows a continuous energy band without the necessity of smoothing and thus keeps the detailed features [Fig. 5(d)].
The training data generation process only considers the one-particle spectral function and the Fermi–Dirac distribution. The self-energy is randomized and fixed as constant for any single simulated data and is varied between 10 meV and 45 meV in the training set. Although this range only covers limited values for the self-energy, the experimental data can be resampled in the analysis so that the peak width matches the training set in the unit of pixel. In this case, we deliberately choose to have this proof-of-principle model free from any other physical effect explicitly. However, if the self-energy variation is slow enough (i.e., near constant within 13 × 13 pixels), the model is still expected to robustly work as an approximation due to the independent nature among each patch fed into the CNN convolutional window. The detected peak position in such systems may be systematically drifted by what one would expect from EDC/MDC fitting.
We demonstrate this effect by evaluating our CNN model in the Bi2212 system (Fig. 6). The nodal spectra we use here is chosen from the deeply underdoped side of the phase diagram (p = 0.063, Tc = 22 K),17,18 where both electron–electron correlation and electron–phonon coupling effects are universally accepted as strong. In particular, the electron self-energy has both a highly binding energy-dependent component due to correlations and a characteristic phonon-kink at around EB ∼ 70 meV. A traditional way of deciding the dispersion involves either EDC or MDC fitting, which tends to have artifacts when the dispersion is heavily interaction-broadened and too steep/flat. One can see that the CNN successfully extracted the “kink” in the band structure, which, even better, lies right in between the MDC (red line)/EDC (red cross) results. This shows the advantage of the 2D nature of the CNN method, which overcomes unwanted biases introduced in unidirectional analysis such as 1D EDC/MDC fitting or the unidirectional second derivative method. For future studies, one could expand the training assumptions and incorporate these physical processes in training data generation to achieve more accurate results.
Our model shows good performance in the presented datasets. However, one would expect for more complex datasets (e.g., with large noise, complex local geometry, or unmodelled physical properties), well-designed preprocessing steps will be needed, and the model may need further improvement.
In summary, in this work, we demonstrate that the neural network method shows good sensitivity to the dispersive features in real ARPES data and its resilience to noise. Our results show that the feature extraction in 2D data analysis of the physics experiment can be treated as an image restoration problem in computer vision and can be tackled with a convolutional neural network. The training process that uses simulated data provides an example that the neural network can be used to solve the inverse problem when the physical model is known. Moreover, this methodology can also be used in general peak hunting from the multidimensional data and can be potentially used in other physical processes.
The work at the Department of Physics, University of Oxford is supported by the EPSRC (UK) Platform Grant (Grant No. EP/M020517/1). Y.C. acknowledges the support from the Oxford-ShanghaiTech collaboration project and the Shanghai Municipal Science and Technology Major Project (Grant No. 2018SHZDZX02). The work at the Stanford Institute for Materials and Energy Sciences is supported by the Department of Energy, Office of Basic Energy Sciences, Division of Material Sciences and Engineering, under Contract No. DE-AC02-76SF00515.