Ptychographic imaging is a powerful means of imaging beyond the resolution limits of typical x-ray optics. Recovering images from raw ptychographic data, however, requires the solution of an inverse problem, namely, phase retrieval. Phase retrieval algorithms are computationally expensive, which precludes real-time imaging. In this work, we propose PtychoNN, an approach to solve the ptychography data inversion problem based on a deep convolutional neural network. We demonstrate how the proposed method can be used to predict real-space structure and phase at each scan point solely from the corresponding far-field diffraction data. Our results demonstrate the practical application of machine learning to recover high fidelity amplitude and phase contrast images of a real sample hundreds of times faster than current ptychography reconstruction packages. Furthermore, by overcoming the constraints of iterative model-based methods, we can significantly relax sampling constraints on data acquisition while still producing an excellent image of the sample. Besides drastically accelerating acquisition and analysis, this capability has profound implications for the imaging of dose sensitive, dynamic, and extremely voluminous samples.
Ptychography has emerged as a versatile imaging technique that is used with optical, x-ray, and electron sources in scientific fields as diverse as cell biology, materials science, and electronics. X-ray ptychography is well developed and widely used, with multiple beamlines dedicated to the technique at synchrotron sources across the world. With the ability to conduct high resolution imaging of large volumes in thick samples with little sample preparation, x-ray ptychography has provided unprecedented insight into countless material and biological specimens. Examples include a few nanometer resolution imaging of integrated circuits,1 high resolution imaging of algae2 and stereocilia actin,3 strain imaging of nanowires,4 and semiconductor heterostructures with Bragg ptychography.5 Analogously, electron ptychography has led to remarkable breakthroughs, including achieving deep sub-angstrom resolution6 and nanoscale 3D imaging.7
Ptychographic imaging is performed by scanning a coherent beam across the sample while measuring the scattered intensities in the far-field. Subsequently, the image is recovered by algorithmically inverting the measured coherent diffraction patterns. Inversion (or image reconstruction) of ptychographic imaging data requires the solution of an inverse problem, which is the problem of recovering lost phase information from measured intensities alone (commonly referred to as phase retrieval). Currently, in ptychography, the phase retrieval problem is solved using model-based iterative methods that are computationally expensive, precluding real-time imaging.8 In addition, the convergence of these iterative reconstruction algorithms is often sensitive to phase retrieval parameters, such as the choice of algorithms and the initial image and probe guess. The choice of these parameters tends to be subjective. Iterative model-based methods also require a large degree of overlap of measurement area to converge; i.e., adjacent measured scan points need to overlap by at least 50%. As the overlap is in 2D, this constraint can drastically limit the area or volume of the sample that can be scanned in a given amount of time.
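The loss of phase information can be illustrated with a minimal sketch, which is not part of the original work. It assumes a toy 1D model in which the detector records only the squared magnitude of a discrete Fourier transform of the exit wave. The function names `dft` and `far_field_intensity` and the toy amplitude and phase values are hypothetical. The sketch shows that an exit wave and its conjugated, reversed "twin" are different objects that nevertheless produce identical intensities, so a single diffraction pattern does not determine the object uniquely.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a list of complex numbers."""
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]

def far_field_intensity(exit_wave):
    """Toy detector model: only |F|^2 is recorded, the phase of F is discarded."""
    return [abs(f) ** 2 for f in dft(exit_wave)]

# Toy complex exit wave (made-up values): amplitude and phase both vary.
amplitudes = [1.0, 0.8, 0.5, 0.9, 0.3, 0.7, 0.2, 0.6]
phases = [0.0, 0.4, 1.1, -0.7, 2.0, 0.3, -1.5, 0.9]
wave = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]

# "Twin" wave: conjugated and reversed (circularly). It is a different object.
twin = [wave[-j % len(wave)].conjugate() for j in range(len(wave))]

# Both waves give the same measured intensities up to rounding error.
intensity_wave = far_field_intensity(wave)
intensity_twin = far_field_intensity(twin)
max_difference = max(abs(a - b) for a, b in zip(intensity_wave, intensity_twin))
waves_differ = any(abs(a - b) > 1e-6 for a, b in zip(wave, twin))

print("waves differ:", waves_differ)
print("largest intensity difference:", max_difference)
```

Iterative methods rely on the redundancy provided by overlapping scan positions to resolve ambiguities of this kind, which is why the overlap requirement discussed above matters.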
Neural networks have been described as universal approximators that can represent complex and abstract functions and relationships.9 As a result, neural networks and deep neural networks, in particular, have been applied to a variety of problems in computer vision, natural language processing, and autonomous control.10 Specific to the problem of image reconstruction, deep neural networks have been used to invert magnetic resonance imaging (MRI) data,11 coherent imaging data in the far-field,12 and holographic imaging data.13
In this Letter, we present PtychoNN, a deep convolutional neural network that learns a direct mapping from far-field coherent diffraction data to real-space image structure and phase. We also demonstrate its training and application to experimental x-ray ptychographic data. Our results show that, once trained, PtychoNN is hundreds of times faster than Ptycholib,14 a production-ready high performance ptychography package. In addition, since PtychoNN learns a direct relation between diffraction data and image structure and phase, overlap constraints are no longer required for data inversion, further accelerating data acquisition and reconstruction by a factor of 5.
Ptychography measurements were obtained at the x-ray nanoprobe beamline at 26ID of the Advanced Photon Source. A tungsten test pattern etched with random features was scanned across a 60 nm coherent beam that was focused by a Fresnel zone plate. The sample was a custom designed calibration chart fabricated in 1.5 μm tungsten by Zoneplates Ltd. A scan of 161 × 161 points was acquired in steps of 30 nm, which corresponds to a 50% spatial overlap. At each scan point, coherently scattered data were acquired in the far-field using a Medipix3 area detector with a 55 μm pixel size placed 900 mm downstream of the sample. The counting time at each scan point was 1 s. Real-space images of amplitude and phase were subsequently recovered from the set of coherent diffraction patterns using 400 iterations of ePIE, as implemented in the Ptycholib package.14
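The scan parameters quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes a simple 1D linear definition of overlap (one minus the ratio of step size to beam size), which reproduces the quoted 50% but is not necessarily the definition used by the authors, and it ignores motor and readout overheads when estimating the counting time.

```python
def linear_overlap(step_nm, beam_nm):
    """Fractional 1D overlap of adjacent beam footprints (0 if they do not touch)."""
    return max(0.0, 1.0 - step_nm / beam_nm)

beam_nm = 60.0          # focused beam size quoted in the text
step_nm = 30.0          # scan step quoted in the text
points_per_side = 161   # scan grid quoted in the text
dwell_s = 1.0           # counting time per scan point quoted in the text

overlap = linear_overlap(step_nm, beam_nm)
n_points = points_per_side ** 2
scan_extent_um = (points_per_side - 1) * step_nm / 1000.0
counting_time_h = n_points * dwell_s / 3600.0   # counting time only, no overheads

print(f"overlap: {overlap:.0%}")
print(f"scan points: {n_points}")
print(f"scan extent: {scan_extent_um:.1f} um per side")
print(f"counting time: {counting_time_h:.1f} h")
```

Under these assumptions the full scan contains 25 921 diffraction patterns and requires roughly 7 h of counting time, which gives a sense of why relaxing the overlap requirement is attractive.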
Figure 1 shows the structure of PtychoNN, a deep convolutional neural network that takes as input the raw diffraction patterns and outputs both structure and phase. PtychoNN does not use any knowledge of the x-ray probe but, instead, learns a direct mapping from the reciprocal space data to the sample amplitude and phase. The neural network architecture consists of three parts: an encoder arm that learns a representation (encoding) in the feature space of the input x-ray diffraction data and two decoder arms that learn to map from the encoding of the input data to real-space amplitude and phase, respectively. The encoder arm consists of convolutional and max pooling layers and is designed to learn representations of the data at different hierarchical levels. Conversely, the decoder arms contain convolutional and upsampling layers that are designed to generate real-space amplitude and phase from the feature representation of the data provided by the encoder arm. This network architecture was chosen so that a single network can predict the amplitude and phase, which minimizes the number of network weights that need to be learnt. The number of network weights was also kept to a minimum by using only convolutional and down/up sampling layers (no dense layers). Among other advantages, networks with fewer weights are faster to train and to make predictions.15
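The effect of these design choices on the number of weights can be estimated with a short sketch. It is not the published architecture: the 64 × 64 input size, the 3 × 3 kernels, the two convolutions per stage, and the filter counts are all assumptions chosen for illustration, and the helper names are hypothetical. The sketch tracks the feature-map size through one encoder and one decoder arm and compares a shared-encoder network with two independent networks and with a single dense layer acting on the encoding.

```python
def conv_weights(in_ch, out_ch, kernel=3):
    """Trainable parameters of one 2D convolution (weights plus biases)."""
    return kernel * kernel * in_ch * out_ch + out_ch

def encoder(size, channels, filters):
    """Assumed stage: two 3x3 convolutions followed by 2x2 max pooling."""
    weights = 0
    for f in filters:
        weights += conv_weights(channels, f) + conv_weights(f, f)
        channels = f
        size //= 2          # max pooling halves the spatial size
    return size, channels, weights

def decoder(size, channels, filters):
    """Assumed stage: two 3x3 convolutions followed by 2x2 upsampling."""
    weights = 0
    for f in filters:
        weights += conv_weights(channels, f) + conv_weights(f, f)
        channels = f
        size *= 2           # upsampling doubles the spatial size
    weights += conv_weights(channels, 1)   # final single-channel map
    return size, weights

INPUT_SIZE = 64                      # assumed diffraction pattern size in pixels
ENCODER_FILTERS = (32, 64, 128)      # assumed filter counts
DECODER_FILTERS = (128, 64, 32)      # assumed filter counts

code_size, code_channels, enc_w = encoder(INPUT_SIZE, 1, ENCODER_FILTERS)
out_size, dec_w = decoder(code_size, code_channels, DECODER_FILTERS)

shared_total = enc_w + 2 * dec_w          # one encoder feeding two decoder arms
separate_total = 2 * (enc_w + dec_w)      # two independent single-output networks
# One dense layer mapping the flattened encoding to itself (biases ignored).
dense_layer = (code_size ** 2 * code_channels) ** 2

print("output size:", out_size)
print("shared encoder, two decoders:", shared_total)
print("two separate networks:", separate_total)
print("single dense layer on the encoding:", dense_layer)
```

With these assumed sizes, sharing the encoder saves one encoder's worth of weights, and a single dense layer on the encoding would by itself hold far more weights than the entire convolutional network.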
To train the network, we used the reconstruction obtained through the iterative phase retrieval from the first 100 lines of the experimental scan (supplementary material Fig. 4 shows the portions of the scan used for training and testing). Hence, the training set consisted of 16 100 triplets of raw coherent diffraction data, real-space amplitude, and real-space phase images. The training data were split 90–10 into training and validation, and the weights of the network were updated to minimize the per-pixel mean absolute error (MAE). Weight updates were made using adaptive moment estimation (ADAM) with a starting learning rate of 0.001.16 The learning rate was halved whenever training performance reached a plateau, i.e., when validation loss did not decrease over five training epochs. Each epoch represents one entire pass over the training data. Training continued for several epochs until a minimum was observed in the validation loss. Supplementary material Fig. 5 shows the evolution of the training and validation losses during training. Once trained, we evaluated the performance of the network on the remaining portion of the scan, i.e., on the last 61 lines of the scan (see the supplementary material Fig. 4).
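The training recipe described above can be sketched in plain Python. This is an illustration rather than the authors' training code: the class name `HalveOnPlateau`, the strict-improvement criterion, and the validation losses are assumptions made for the example. The sketch covers the per-pixel MAE, the 90–10 split of the 16 100 examples, and a learning rate that starts at 0.001 and is halved after five epochs without improvement in validation loss.

```python
def mean_absolute_error(predicted, target):
    """Per-pixel MAE between two equally sized flat lists of pixel values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

class HalveOnPlateau:
    """Halve the learning rate after `patience` epochs without improvement."""

    def __init__(self, learning_rate=0.001, patience=5):
        self.learning_rate = learning_rate
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def step(self, validation_loss):
        """Record one epoch's validation loss and return the updated rate."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.learning_rate *= 0.5
                self.stale_epochs = 0
        return self.learning_rate

n_examples = 100 * 161              # first 100 scan lines, 161 points each
n_validation = n_examples // 10     # 10% held out for validation
n_training = n_examples - n_validation

# Made-up validation losses: three improving epochs, then a five-epoch plateau.
fake_val_losses = [0.30, 0.20, 0.15, 0.15, 0.16, 0.15, 0.17, 0.15, 0.14]
scheduler = HalveOnPlateau()
rates = [scheduler.step(loss) for loss in fake_val_losses]

print("training / validation examples:", n_training, n_validation)
print("learning rate per epoch:", rates)
```

Deep learning frameworks provide equivalent utilities; for example, the Keras `ReduceLROnPlateau` callback with `factor=0.5` and `patience=5` implements a comparable schedule, although the text does not state which implementation was used.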
Figure 2 shows single-shot examples of the performance of PtychoNN on data from the test region of the experimental scan, i.e., data that the network did not see during its training. The figure shows the original diffraction data, as well as the reconstructions achieved by ePIE and PtychoNN's predictions, over a 640 nm field of view. The results demonstrate how PtychoNN is able to predict real-space amplitude and phase from input x-ray diffraction data alone. We also note that even though the full width at half maximum (FWHM) of the beam was ∼60 nm, PtychoNN can reproduce a 640 nm field of view from a single diffraction data point. This is a result of PtychoNN's ability to take advantage of information in the diffraction patterns produced by the tails of the beam, which extend several hundred nanometers from the central bright focus (supplementary material Fig. 2). Supplementary material Fig. 6 shows more examples of PtychoNN's single-shot predictions along with the differences to the ground truth. To recover the entire test scan, we averaged PtychoNN's predictions from each scan point (which were spaced 30 nm apart). Figure 3 shows PtychoNN's averaged predictions of amplitude and phase over the test area (1.8 × 1.8 μm), as well as the respective reconstruction using ePIE. As discussed previously, the test area was taken from a region of the scan that was not shown to the network during training. Supplementary material Fig. 4 shows the region of the scan used for testing, i.e., the portion of the full scan that is depicted in Fig. 3. The results demonstrate how PtychoNN's predictions are remarkably accurate when compared to those obtained by ePIE, while also being ∼300 times faster. The time taken to reconstruct the test area of the scan (61 × 61 points) using Ptycholib14 was 310 s, while PtychoNN took < 1 s.
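The averaging of overlapping single-shot predictions can be sketched as follows. This is a minimal illustration, not the authors' stitching code. It assumes that each prediction is a 64 × 64 pixel patch spanning the 640 nm field of view (10 nm per pixel), so that the 30 nm scan step corresponds to 3 pixels. The function name `stitch_by_averaging` and the toy patches are hypothetical.

```python
def stitch_by_averaging(patches, positions, patch_size, canvas_size):
    """Average overlapping square patches placed at (row, col) pixel offsets."""
    total = [[0.0] * canvas_size for _ in range(canvas_size)]
    count = [[0] * canvas_size for _ in range(canvas_size)]
    for patch, (row0, col0) in zip(patches, positions):
        for r in range(patch_size):
            for c in range(patch_size):
                total[row0 + r][col0 + c] += patch[r][c]
                count[row0 + r][col0 + c] += 1
    # Pixels never covered by any patch are left at 0.
    return [
        [t / n if n else 0.0 for t, n in zip(total_row, count_row)]
        for total_row, count_row in zip(total, count)
    ]

PIXEL_NM = 10                 # assumed: 640 nm field of view over 64 pixels
STEP_PX = 30 // PIXEL_NM      # 30 nm scan step expressed in pixels
PATCH = 64                    # assumed patch size in pixels
N_SCAN = 4                    # small toy scan, not the 61 x 61 test region
CANVAS = PATCH + (N_SCAN - 1) * STEP_PX

# Toy "predictions": patch k is filled with the constant value k.
positions = [(i * STEP_PX, j * STEP_PX) for i in range(N_SCAN) for j in range(N_SCAN)]
patches = [[[float(k)] * PATCH for _ in range(PATCH)] for k in range(len(positions))]

stitched = stitch_by_averaging(patches, positions, PATCH, CANVAS)

print("canvas size:", len(stitched), "x", len(stitched[0]))
print("corner pixel:", stitched[0][0], "center pixel:", stitched[36][36])
```

In this toy case the corner pixels are covered by a single patch, while the central pixels are the mean of all sixteen patches, which mirrors how dense scanning lets many independent predictions contribute to each pixel of the final image.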
Since PtychoNN learns a direct mapping from the reciprocal space data to real-space amplitude and phase without benefiting at all from the 50% overlap condition, we can now explore the possibility of using PtychoNN to invert sparse-sampled ptychography data. Figure 4 shows a comparison of the performance of PtychoNN and iterative phase retrieval for different overlap conditions. The results show that with a 30 nm step size, which satisfies the overlap condition necessary for a standard ptychographic reconstruction, ePIE is able to retrieve accurate amplitude and phase images from the diffraction data. However, if we attempt to perform iterative phase retrieval with data that has less than 50% overlap, we begin to notice the presence of artifacts, first in the retrieved amplitude image [Fig. 4(a)] and then also in the phase image [Fig. 4(c)]. In contrast, the amplitude and phase predicted by PtychoNN remain remarkably accurate even when the position overlap is sub-sampled by a factor of 5 [Figs. 4(b) and 4(d)]. Similar results have been previously demonstrated on simulated ptychography data by Guan et al.17
This capability of PtychoNN to invert drastically sparse-sampled ptychographic data can be extremely beneficial, especially when dealing with dose-sensitive, dynamic, or extremely large samples. By reducing the density of points that need to be sampled, PtychoNN can significantly reduce the radiation dose needed to image at a given resolution. Similarly, while it is always desirable to minimize the data acquisition time, this is particularly vital in the case of dynamic samples in order to capture transient phenomena. We note that this sparse-sampled approach is achieved without changing the focal condition of the optic, which is currently the only means of switching between different fields of view (at the cost of resolution).
Finally, we turn our attention to the question of how much training data are needed in order to obtain reasonably accurate results. Typically, the training of deep neural networks requires millions of training examples and enormous computational resources, leading to days or weeks of training.18 In the results presented so far, we used 16 100 experimental training examples to train the network; below, we evaluate the performance of PtychoNN when fewer training data are available. Figure 5 shows the performance of PtychoNN when trained on progressively fewer training examples (from left to right). The results show how PtychoNN can generate reasonable predictions when trained on as few as 800 experimental samples. This corresponds to an area that is only ∼3% of the total scan area. Training on this small set was achieved in less than a minute on a single NVIDIA V100 GPU. The robustness of the network even when employing very reduced training sets potentially allows us to train PtychoNN on-the-fly, with limited computational resources. We note that if there is sufficient variation between scan points, the amount of training data required is only dependent on the number of scan points, i.e., how many triplets of diffraction, amplitude, and phase are available for training. Once trained, PtychoNN only takes ∼1 ms to make a prediction of the amplitude and phase from the diffraction data at each scan point.
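The ∼3% figure follows directly from the scan dimensions given earlier, as the short check below illustrates. It uses only numbers stated in the text: the 161 × 161 point scan, the first 100 scan lines used for the full training set, and the 800-example reduced set. The variable names are illustrative.

```python
total_points = 161 * 161          # full scan grid
full_training_set = 100 * 161     # first 100 scan lines
small_training_set = 800          # smallest training set discussed

small_fraction = small_training_set / total_points
reduction = full_training_set / small_training_set

print(f"fraction of scan used for training: {small_fraction:.1%}")
print(f"reduction relative to the full training set: {reduction:.1f}x")
```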
In conclusion, we have demonstrated an end-to-end machine learning solution to solve the ptychography data inversion problem on experimental x-ray diffraction data. We note that a similar approach has been demonstrated on simulated ptychographic data under the assumption of a known probe function.17 Our work extends the results presented by Guan et al. by making a key modification to the training of the network. By using a portion of the scan to train the network, our approach is agnostic to the number or nature of the probe modes. Furthermore, the use of a subset of the scan to train PtychoNN ensures that the network is trained on data that are representative of the experimental conditions (detector noise, vibrations, etc.). Our experimental results show that once the network has been trained, PtychoNN is up to 300 times faster than Ptycholib (a high performance GPU-accelerated iterative phase retrieval solution) at inverting diffraction data. We emphasize that this speedup is only in the data inversion step. A further speedup can be obtained by scanning sparsely, which could potentially lead to 5× faster scanning from reduced data acquisition (Fig. 4). Additionally, unlike many deep learning models that require enormous computational resources to train, PtychoNN can be trained and deployed on a single GPU on the edge.19 Finally, in addition to remarkably faster experimental feedback, PtychoNN's ability to recover accurate real-space images from sparse-sampled ptychographic data has the potential to revolutionize ptychography on dose sensitive, dynamic, and extremely voluminous samples.
In general, to ensure maximum accuracy in machine learning models, the training and test sets need to be drawn from the same sample space. In effect, this means that the portion of the scan used to train PtychoNN must have similar features to the remaining portions of the scan. This can be ensured by training on patches drawn from different portions of the sample, as opposed to the single contiguous block used in this Letter. In some situations, such as a dynamic sample undergoing significant structural changes, PtychoNN may not be able to generalize to the new sample state. Extending the ability of deep learning models to generalize to test samples far removed from the training set is an active area of computer science research, and future versions of PtychoNN will incorporate these ideas into its architecture.20
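The difference between training on one contiguous block and training on patches spread across the scan can be sketched as follows. This is a hypothetical selection scheme, not one described in the text: the helper names, the 10-line patch height, and the fixed random seed are assumptions. As a crude proxy for how far test data can lie from any training data, the sketch reports the longest run of consecutive scan lines that contains no training line.

```python
import random

def contiguous_block(n_train_lines):
    """Train on the first `n_train_lines` scan lines (one contiguous block)."""
    return set(range(n_train_lines))

def scattered_patches(n_lines, n_train_lines, patch_lines, seed=0):
    """Train on randomly chosen, non-overlapping groups of consecutive lines."""
    rng = random.Random(seed)
    starts = list(range(0, n_lines - patch_lines + 1, patch_lines))
    chosen = rng.sample(starts, n_train_lines // patch_lines)
    return {start + k for start in chosen for k in range(patch_lines)}

def largest_untrained_gap(train_lines, n_lines):
    """Longest run of consecutive scan lines that contains no training line."""
    longest = current = 0
    for line in range(n_lines):
        current = 0 if line in train_lines else current + 1
        longest = max(longest, current)
    return longest

N_LINES = 161                     # scan lines in the experiment
block = contiguous_block(100)     # selection used in the text
patches = scattered_patches(N_LINES, 100, patch_lines=10)

block_gap = largest_untrained_gap(block, N_LINES)
patch_gap = largest_untrained_gap(patches, N_LINES)

print("training lines (block, patches):", len(block), len(patches))
print("largest untrained gap (block, patches):", block_gap, patch_gap)
```

Both selections use the same number of training lines; the scattered selection can never leave a longer untrained stretch than the contiguous block, and for most random draws it leaves a shorter one.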
We believe that the results in this Letter have widespread ramifications for both x-ray and electron ptychographic imaging experiments, especially in light of the increased data rates as a result of faster detectors and brighter sources.21,22 Coherent imaging techniques, including ptychography, are one of the primary drivers for several major upgrades to synchrotron sources across the world, including the Advanced Photon Source Upgrade (APS-U), the European Synchrotron Radiation Facility Extremely Brilliant Source (ESRF-EBS), and PETRA-IV. These upgrades are designed to increase not just the photon flux but also its coherence by factors of hundreds or thousands. When faced with a 100–1000× increase in coherent imaging data at these upgraded sources, traditional iterative phase retrieval methods will not be able to keep pace with the experimental data. Further development of machine learning methods for data inversion, such as PtychoNN, will be vital to unlock the full potential of these vast infrastructure upgrades and to keep pace with the data generated by coherent imaging experiments.
See the supplementary material for (1) a comparison between the transmission contrast and ptychographic reconstruction, (2) an image of the probe and its radial profile, (3) Fourier ring correlation to compute the resolution achieved through ptychography, (4) a figure showing the region of the scan used to train and test PtychoNN, (5) network training and validation loss as a function of training epoch, and (6) further examples of single-shot prediction along with difference maps.
This work was performed, in part, at the Center for Nanoscale Materials. The use of the Center for Nanoscale Materials and Advanced Photon Source, both Office of Science user facilities, was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract No. DE-AC02-06CH11357. This work was also supported by Argonne LDRD No. 2018-019-N0: A.I C.D.I: Atomistically Informed Coherent Diffraction Imaging.
The data that support the findings of this study are openly available in Jupyter notebooks at https://github.com/mcherukara/PtychoNN.