Optofluidic time-stretch quantitative phase imaging (OTS-QPI) is a potent tool for biomedical applications as it enables high-throughput QPI of numerous cells for large-scale single-cell analysis in a label-free manner. However, a few critical limitations, such as its costly instrumentation and inherent phase-unwrapping errors, hinder OTS-QPI from being widely applied in diverse fields. Here, to overcome these limitations, we present a QPI-free OTS-QPI method that generates “virtual” phase images from their corresponding bright-field images by using a deep neural network trained with numerous pairs of bright-field and phase images. Specifically, our trained generative adversarial network model generated virtual phase images with high similarity (structural similarity index >0.7) to their corresponding real phase images. This was further supported by our successful classification of various types of leukemia cells and white blood cells via their virtual phase images. The virtual OTS-QPI method is highly reliable and cost-effective and is therefore expected to enhance the applicability of OTS microscopy in diverse research areas, such as cancer biology, precision medicine, and green energy.
I. INTRODUCTION
Optofluidic time-stretch quantitative phase imaging (OTS-QPI) is a potent tool for biomedical applications as it enables high-throughput imaging flow cytometry of numerous single cells at >100 000 cells/s in a label-free manner.1–6 OTS-QPI reconstructs the bright-field and quantitative phase images of flowing cells from the spectral interferograms of temporally stretched optical pulses that contain cellular profiles, such as morphology, refractive index, and thickness.7–13 It inherits the merit of OTS to image a large heterogeneous population of flowing cells while bypassing the need for a high-speed camera3,14–25 and also takes advantage of QPI to obtain biologically relevant structural information by measuring the refractive index and thickness of each cell. By combining these powerful capabilities, OTS-QPI is a highly promising tool for large-scale single-cell analysis and has been utilized in a diverse range of biomedical applications, including evaluating microalgal culture conditions,1,2 screening blood cells,4 investigating spleen tissue,5 and characterizing cellular protein concentrations.6
Unfortunately, a few critical limitations of OTS-QPI must be addressed before its wide deployment in various fields. First, OTS-QPI inevitably requires complex optical instrumentation and fine optical alignment because it employs an interferometer, which is prone to mechanical vibrations and misalignment, to generate phase images. Second, since the phase information on target objects (e.g., cells) normally lies in the high-frequency region of the temporal interferograms, a costly wide-bandwidth photodetector and a high-speed analog-to-digital converter (ADC) are needed to fully acquire the high-frequency signal. Correspondingly, the high sampling rate of the ADC results in a large data volume that is hard to transfer and store. Third, the efficiency of recovering phase images from the temporal interferograms is significantly degraded by inevitable high-frequency noise,2 which induces phase-unwrapping errors that are difficult to correct. In addition, special algorithms are usually required for phase image recovery, leading to a significant increase in computational cost. These limitations hinder OTS-QPI from being widely applied to diverse applications.
In this paper, to bypass the above limitations, we present a QPI-free OTS-QPI method that generates “virtual” phase images from their corresponding bright-field images by using a deep neural network trained with numerous pairs of bright-field and phase images. The training process of the method, which we call virtual OTS-QPI, and the process of generating virtual phase images are shown in Figs. 1(a) and 1(b), respectively. Consequently, the virtual OTS-QPI method can perform OTS-QPI without the need for actual phase measurements. This concept was motivated by the rapid development of deep learning and convolutional neural networks (CNNs) in the commercial sector, which offers a practical solution to these limitations. Specifically, our trained generative adversarial network (GAN) model enables the generation of virtual phase images with high similarity (structural similarity index >0.7) to their corresponding real phase images. We also demonstrate the high-accuracy (>96%) classification of three types of leukemia cells (HL-60, Jurkat, and K562 cells) and white blood cells based on their bright-field and virtual phase images. The virtual OTS-QPI method is highly reliable and cost-effective and is therefore expected to enhance the applicability of OTS microscopy in diverse research areas such as cancer biology, precision medicine, and green energy.
II. MATERIALS AND METHODS
A. Optofluidic time-stretch quantitative phase imaging (OTS-QPI)
A frequency-shifted OTS-QPI setup was used for the simultaneous acquisition of bright-field and quantitative phase images of single live cells, which were used to train the GAN. The experimental setup is schematically shown in Fig. 2(a). A home-built mode-locked ytterbium-doped fiber laser was used as the light source, with a repetition rate of 33.97 MHz, an average output power of 20 mW, a center wavelength of 1030 nm, and a spectral bandwidth of 23.7 nm. Each optical pulse from the laser was temporally stretched by a 20-km-long dispersive fiber (Nufern, 1060-XP) with a total group-velocity dispersion (GVD) of −380 ps/nm. The stretched pulse was then amplified by an ytterbium-doped fiber amplifier (YDFA) and split at a 50:50 ratio into the signal arm (upper path) and the reference arm (lower path). In the signal arm, the incident pulse was spatially dispersed by a diffraction grating (Thorlabs, GR25-1210, 1200 grooves/mm) into a one-dimensional (1D) rainbow pattern and focused by an objective lens (Olympus, LCPlan N, 50×, NA 0.65) onto the microfluidic channel, shown in Fig. 2(b), such that the spatial profile of each flowing cell was encoded onto the amplitude and phase of the transmitted pulse. The information-carrying pulse was then recombined into a single beam by a symmetric optical layout. In the reference arm, the optical frequency was up-shifted by one-fourth of the repetition rate of the laser (i.e., 8.49 MHz) by two acousto-optic modulators (AOMs) controlled by a home-built phase-locked loop. The pulses from both arms were combined in another 50:50 fiber coupler to form a beat note, which was subsequently detected by a photodetector, digitized by an oscilloscope (Tektronix, DPO71604B) at a sampling rate of 50 GS/s, and recovered into bright-field and phase images with programs written in MATLAB.
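As an illustration of this digital recovery step, the following Python sketch shows how one line scan might be demodulated from the recorded beat note. The Hilbert-transform approach and all variable names are our assumptions (the actual processing was implemented in MATLAB, as noted above), and calibration details such as background subtraction are omitted.

```python
# A minimal sketch (not the authors' MATLAB code) of demodulating one
# temporally stretched pulse recorded as a beat-note interferogram.
# Assumes `trace` is a 1D real-valued record sampled at 50 GS/s with the
# carrier at the 8.49-MHz frequency shift between the two arms.
import numpy as np
from scipy.signal import hilbert

fs = 50e9    # oscilloscope sampling rate (samples/s)
fc = 8.49e6  # frequency shift of the reference arm (Hz)

def demodulate_line(trace):
    t = np.arange(trace.size) / fs
    analytic = hilbert(trace)                           # complex analytic signal
    baseband = analytic * np.exp(-2j * np.pi * fc * t)  # shift carrier to DC
    amplitude = np.abs(baseband)                        # envelope -> bright-field line
    phase = np.unwrap(np.angle(baseband))               # 1D phase unwrapping
    phase -= np.polyval(np.polyfit(t, phase, 1), t)     # remove residual linear trend
    return amplitude, phase

# Stacking the line scans of successive pulses yields the 2D images:
# bright_field = np.stack([demodulate_line(tr)[0] for tr in traces])
# phase_image  = np.stack([demodulate_line(tr)[1] for tr in traces])
```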
B. Generative adversarial network (GAN)
We used the GAN26–28 to transform bright-field images into virtual phase images. The GAN has a unique design composed of two networks: a generator and a discriminator. The two networks were trained adversarially: the generator produced virtual phase images from bright-field images, while the discriminator tried to distinguish the generated images from the real ones. Guided by the discriminator's feedback, the generator was gradually optimized to generate high-quality virtual phase images that would eventually be indistinguishable from the real images to the discriminator. The training process optimized not only the generator but also the discriminator; the steadily improving discriminator forced the generator to focus on increasingly essential and abstract features, which in turn enabled the generator to output high-quality images. By taking advantage of this generator–discriminator structure, the GAN is superior to traditional CNN-based image translators: because it learns its similarity measure from the training data, it bypasses the problems (e.g., blurring) that originate from the fixed image-comparison loss functions of other CNNs.29,30 Therefore, the overall quality of GAN-generated images generally surpasses that of images generated by other CNNs.
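For concreteness, the sketch below shows one adversarial training step in the pix2pix style, assuming `G` and `D` are the generator and discriminator described in the following paragraphs. The discriminator targets (1 for real pairs, 0 for generated pairs) follow the description below, whereas the added L1 term, its weight, and the use of binary cross-entropy are our assumptions rather than the exact training recipe.

```python
# A minimal pix2pix-style training step (a sketch under the stated
# assumptions, not the authors' exact recipe).
import torch

bce = torch.nn.BCELoss()
l1 = torch.nn.L1Loss()

def train_step(bf, real_phase, G, D, opt_G, opt_D, l1_weight=100.0):
    fake_phase = G(bf)

    # Discriminator: label real (bright-field, phase) pairs 1, generated pairs 0.
    opt_D.zero_grad()
    d_real = D(bf, real_phase)
    d_fake = D(bf, fake_phase.detach())  # detach so G is not updated here
    loss_D = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    loss_D.backward()
    opt_D.step()

    # Generator: fool the discriminator while staying close to the real phase image.
    opt_G.zero_grad()
    d_fake = D(bf, fake_phase)
    loss_G = (bce(d_fake, torch.ones_like(d_fake))
              + l1_weight * l1(fake_phase, real_phase))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```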
We constructed the generator based on the U-Net architecture,31 which is illustrated in Fig. 2(c). The generator contained an encoder and a decoder, designed as a symmetric structure with eight convolutional blocks in each path. Each block comprised a 4 × 4 convolutional layer, a batch normalization layer, and an activation layer with the leaky ReLU (LReLU) activation function. Skip connections were added between symmetric layers to carry low-level features from the encoder to the decoder and thereby prevent information loss. In the generation process, a bright-field image of size 256 × 256 × 3 was input to the encoder and compressed into a 1 × 1 × 512 tensor at the end of the encoding path. This tensor was input to the decoder and reconstructed into a 256 × 256 × 3 virtual phase image at the end of the decoding path. Note that the pixel values of the generated phase image were normalized to the range of 0 to 1. The generated virtual phase image was then evaluated by the discriminator.
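A compact PyTorch rendering of this generator might look as follows; the channel widths (borrowed from the common pix2pix configuration), the transposed convolutions in the decoder, and the final Sigmoid that produces the normalized [0, 1] pixel values are our assumptions. Note that batch normalization at the 1 × 1 bottleneck requires a batch size larger than 1.

```python
# A sketch of the U-Net generator of Fig. 2(c): eight 4x4 stride-2 blocks
# per path with batch normalization, LReLU activations, and skip connections.
import torch
import torch.nn as nn

def block(cin, cout, transpose=False):
    Conv = nn.ConvTranspose2d if transpose else nn.Conv2d
    return nn.Sequential(Conv(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

class UNetGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        w = [3, 64, 128, 256, 512, 512, 512, 512, 512]      # assumed widths
        self.enc = nn.ModuleList(block(w[i], w[i + 1]) for i in range(8))
        r = w[::-1]
        self.dec = nn.ModuleList(
            block(r[i] if i == 0 else 2 * r[i], r[i + 1], transpose=True)
            for i in range(7))
        self.out = nn.Sequential(
            nn.ConvTranspose2d(2 * r[7], 3, 4, stride=2, padding=1),
            nn.Sigmoid())                                   # pixel values in [0, 1]

    def forward(self, x):                                   # x: (N, 3, 256, 256)
        skips = []
        for e in self.enc:                                  # encoding path
            x = e(x)
            skips.append(x)                                 # bottleneck: (N, 512, 1, 1)
        skips = skips[:-1][::-1]                            # deepest skip first
        for d, s in zip(self.dec, skips):                   # decoding path
            x = torch.cat([d(x), s], dim=1)                 # skip connection
        return self.out(x)                                  # (N, 3, 256, 256)
```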
We adapted the PatchGAN architecture to construct the discriminator,32 which is shown in Fig. 2(d). The network contained a concatenation layer and five encoding blocks. Batch normalization was applied to all blocks except the first and the last. The LReLU function was used as the activation function for the first four blocks, while the Sigmoid function was used for the last one. The bright-field and phase images, each of size 256 × 256 × 3, were merged into a 256 × 256 × 6 tensor in the concatenation layer and input to the encoding blocks, which eventually compressed the merged tensor to a 30 × 30 × 1 tensor of patch-wise probabilities between 0 and 1 given by the Sigmoid function. The ideal output was 0 for the merged tensor of bright-field and virtual phase images and 1 for the merged tensor of bright-field and real phase images.
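A corresponding sketch of this discriminator is given below; the strides and channel widths that compress the 256 × 256 × 6 input to the 30 × 30 × 1 output are assumptions consistent with the common 70 × 70 PatchGAN.

```python
# A sketch of the PatchGAN discriminator of Fig. 2(d): five encoding
# blocks, batch norm in all but the first and last, LReLU for the first
# four and Sigmoid for the last.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride, norm=True, act='lrelu'):
            layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2) if act == 'lrelu' else nn.Sigmoid())
            return nn.Sequential(*layers)
        self.net = nn.Sequential(
            block(6, 64, 2, norm=False),                   # -> 128 x 128
            block(64, 128, 2),                             # -> 64 x 64
            block(128, 256, 2),                            # -> 32 x 32
            block(256, 512, 1),                            # -> 31 x 31
            block(512, 1, 1, norm=False, act='sigmoid'))   # -> 30 x 30 x 1

    def forward(self, bright_field, phase):
        # Channel-wise concatenation: two (N, 3, 256, 256) -> (N, 6, 256, 256)
        x = torch.cat([bright_field, phase], dim=1)
        return self.net(x)   # per-patch real/fake probabilities in [0, 1]
```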
C. Evaluation of image similarity
We used the structural similarity index (SSIM) to evaluate the quality of the generated images, as it is a common tool for assessing images generated by GAN models.33–35 The SSIM is a well-established metric that evaluates the similarity between two images in terms of correlation, luminance, and contrast. Mathematically, the SSIM is determined by the means, standard deviations, and covariance of the two images. Denoting the means as μ, the standard deviations as σ, and the real and virtual images as x and y, respectively, the SSIM is defined as

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$

where $\sigma_{xy}$ is the covariance between the two images, and $c_1$ and $c_2$ are two constants that are used to increase the stability of the evaluation. The SSIM function scores the similarity of two images with a positive value between 0 and 1, where 0 indicates that the generated image is totally different from the real one and 1 indicates that the generated and real images are identical.
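In practice, the SSIM can be computed with an off-the-shelf routine. The snippet below is a minimal sketch using scikit-image, assuming each real/virtual image pair is stored as 2D float arrays normalized to [0, 1].

```python
# SSIM scoring of real vs virtual phase image pairs (a sketch; the exact
# window size and constants used in this work are not specified here).
import numpy as np
from skimage.metrics import structural_similarity

def ssim_score(real, virtual):
    # data_range=1.0 because the images are normalized to [0, 1];
    # c1 and c2 then default to (0.01)**2 and (0.03)**2, respectively.
    return structural_similarity(real, virtual, data_range=1.0)

# Histogram statistics over a test set, as in Fig. 4:
# scores = [ssim_score(r, v) for r, v in zip(reals, virtuals)]
# print(f"SSIM = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```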
D. Autoencoder (AE)
To examine the quality of the generated virtual phase images, we constructed an autoencoder (AE) for high-accuracy image-based cell classification.36,37 The AE shares the same structure as the one used in our previous work.38 Specifically, the AE is composed of an encoder, a decoder, and a classifier.39,40 The encoder consisted of four hidden layers, and the decoder shared the same structure symmetrically. Here, to eliminate the bias induced by the different value ranges of the real phase images, which contained true phase-shift values, and the virtual phase images (ranging from 0 to 1), all phase images were normalized to the range of 0 to 1 before being input into the encoder. During training, the encoder compressed the images into 1024-dimensional vectors containing essential cellular information at the bottleneck layer, from which the decoder reconstructed the images. The classifier was connected to the bottleneck layer to classify the images directly from the 1024-dimensional vectors. When the loss of the AE stopped decreasing for six consecutive epochs, training was terminated and all parameters were fixed. The optimized AE was then used to evaluate the images in the test set and generate a confusion matrix. The corresponding t-distributed stochastic neighbor embedding (t-SNE) plot was also generated as a two-dimensional projection of the 1024-dimensional feature space.41
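A minimal PyTorch sketch of such an AE-based classifier is given below. Only the four-layer encoder, the 1024-dimensional bottleneck, the symmetric decoder, and the bottleneck classifier are taken from the description above; the hidden-layer widths, the flattened input size (here, images assumed downsampled to 64 × 64), the output activations, and the loss weighting are our assumptions.

```python
import torch
import torch.nn as nn

def mlp(sizes, final_act):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        layers.append(nn.ReLU() if i < len(sizes) - 2 else final_act)
    return nn.Sequential(*layers)

class AEClassifier(nn.Module):
    def __init__(self, in_dim=64 * 64, n_classes=4):
        super().__init__()
        widths = [in_dim, 4096, 2048, 1536, 1024]       # assumed layer widths
        self.encoder = mlp(widths, nn.ReLU())
        self.decoder = mlp(widths[::-1], nn.Sigmoid())  # reconstructions in [0, 1]
        self.classifier = nn.Linear(1024, n_classes)

    def forward(self, x):                   # x: flattened, normalized images
        z = self.encoder(x)                 # 1024-d bottleneck features
        return self.decoder(z), self.classifier(z), z

# Joint objective: reconstruction + classification; training stops when this
# loss has not decreased for six consecutive epochs (early stopping).
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()
def ae_loss(x, labels, model, alpha=1.0):
    recon, logits, _ = model(x)
    return mse(recon, x) + alpha * ce(logits, labels)

# The t-SNE plot is a 2D projection of the bottleneck vectors, e.g.,
# from sklearn.manifold import TSNE; emb = TSNE(n_components=2).fit_transform(Z)
```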
E. Microfluidic chip fabrication
The microfluidic chip used in this work was designed and fabricated to form a stable, high-speed, linear flow of single cells in the microchannel for image acquisition by the OTS-QPI system. Specifically, two sheath flows and a sample flow were injected simultaneously into the channel to form a laminar flow such that the cells in the sample flow were confined to a single line on the focal plane by hydrodynamic focusing. The microfluidic chip was fabricated by standard soft lithography.42 Polydimethylsiloxane (PDMS, Dow Corning) was poured onto the master mold on which the channel patterns had been developed. After pre-curing at 80 °C for 15 min, a small piece of coverslip was placed on the PDMS layer directly above the observation area of each microfluidic channel to withstand the pressure inside the channel at high flow speed. The PDMS was solidified by another 1-h heating step and cut into small pieces; the sample inlet, sheath inlet, and outlet were opened with a 25 G needle. The PDMS blocks and glass slides were treated with a plasma cleaner (Harrick Plasma) for permanent bonding. The microchannel in the imaging area was 80 μm wide and 40 μm high.
F. Cell preparation
The leukemia cell lines used in this work, including HL-60, Jurkat, and K562, were purchased from the RIKEN Cell Bank, cultured under standard conditions, and harvested for the OTS-QPI measurements. Specifically, the cells were incubated in 75-cm2 culture flasks (Corning, 430641U) with 5% carbon dioxide (CO2) at 37 °C. RPMI-1640 medium (Sigma-Aldrich, R8758) containing 10% fetal bovine serum (FBS, Sigma-Aldrich) and 1% penicillin–streptomycin solution was used as the culture medium. For the flow cytometric measurements, the cells were centrifuged at 400 g for 3 min, harvested, and resuspended in phosphate-buffered saline (PBS) to a final concentration of 1 × 107 cells/ml.
The target cells in the blood samples were separated by density gradient centrifugation. For WBCs, 5 ml of blood was drawn from a healthy donor with ethylenediaminetetraacetic acid (EDTA) as the anticoagulant. The blood sample was carefully layered above 3 ml of the density gradient medium Lymphoprep (STEMCELL, ST07851) in a 15-ml centrifuge tube. The sample was then centrifuged at 800 g for 20 min at room temperature.43 The WBC band was collected into a centrifuge tube and diluted with 0.9% NaCl solution for further measurement. For single platelets and agonist-induced platelet aggregates, 5 ml of blood was drawn from a healthy donor with citric acid as the anticoagulant. 500 μl of the blood was transferred into a 2-ml Eppendorf tube and incubated for 10 min with 50 μl of PBS for the sample containing single platelets or with 50 μl of an agonist solution containing 14 μM U46619 (Cayman Chemical, 16450) for the sample containing platelet aggregates. The sample was diluted with 5 ml of 0.9% NaCl solution and layered above 3 ml of Lymphoprep, followed by centrifugation at 800 g for 20 min. Then, 1 ml of the sample was taken around the WBC band, to which 1 ml of 2% paraformaldehyde (Wako) was added for fixation.
III. RESULTS
A. Demonstration of virtual OTS-QPI
To demonstrate the capability of the GAN to generate virtual phase images, we compared the generated images with the real phase images. The bright-field, real phase, and virtual phase images of leukemia cells, WBCs, single platelets, and platelet aggregates are shown in Figs. 3(a) and 3(b). Specifically, the bright-field and real phase images were obtained with the frequency-shifted OTS-QPI system at a throughput of 15 000 cells/s. We trained two GAN models for the generation of virtual phase images. The first model was trained on a dataset containing the bright-field and phase images of 3908 WBCs, 5000 HL-60 cells, 5000 Jurkat cells, and 5000 K562 cells. The second model was trained on another dataset containing the images of 4000 WBCs and 4000 platelets or platelet aggregates. The training process took approximately 12 h for 120 epochs. The trained GAN models were then used to generate virtual phase images from the bright-field images in test datasets that were independent of the training datasets. With the first model, 2000 phase images of each cell type were generated from their corresponding bright-field images, while the virtual phase images of 3500 WBCs and 3509 platelets or platelet aggregates were generated with the second model. It took about 10 min to generate 10 000 virtual phase images. The generated virtual phase images closely resemble the corresponding real phase images in shape and size, regardless of the cell type. Biases in the bright-field images, such as brightness variations induced by nonuniform illumination and defocusing, were also eliminated in the virtual phase images. For platelet aggregates, whose morphology is distinct from that of WBCs, the GAN likewise generated virtual phase images with only minor discrepancies from the real phase images, indicating that the GAN was not simply applying the same pattern to all cell types. Notably, the characteristic lobed nucleus of a polymorphonuclear cell (i.e., a subtype of WBCs) was reproduced in the virtual phase image even though the nucleus was nearly invisible in the bright-field image, presumably because the GAN model recognized patterns in the bright-field images that represented the nuclear regions but were indiscernible to the human eye. However, additional noise can be observed in the virtual phase images, especially those of single platelets, which indicates that the generated images are not perfectly identical to the real images. The influence of these differences therefore needs to be further evaluated to establish the practical applicability of the virtual phase images.
B. Evaluation of virtual phase images
To quantitatively evaluate the quality of the generated virtual phase images, we calculated the SSIM between pairs of real and virtual phase images of the three types of leukemia cells, WBCs, and U46619-induced platelet aggregates, evaluating 2000 image pairs for each cell type. The SSIM scores are plotted as histograms in Figs. 4(a)–4(e), and their means and standard deviations are summarized in Fig. 4(f). The average SSIM scores for HL-60 cells, Jurkat cells, K562 cells, WBCs, and platelet aggregates were 0.808, 0.846, 0.872, 0.791, and 0.710, respectively. We also segmented the cell region from the background in both the real and virtual phase images and calculated the average SSIM scores and standard deviations of each region, as shown in Figs. 4(g) and 4(h). The average SSIM scores of the cell region for HL-60 cells, Jurkat cells, K562 cells, WBCs, and platelet aggregates were 0.961, 0.971, 0.914, 0.916, and 0.690, respectively, while those of the background region were 0.759, 0.776, 0.945, 0.739, and 0.650, respectively. These scores agree with the subjective observation that most of the virtual phase images are visually similar to their corresponding real phase images. Additionally, the relatively small standard deviations indicate the uniform quality of the generated images, demonstrating the robustness of the GAN. Nevertheless, among the five cell types, the images of platelet aggregates received the lowest SSIM scores. One possible reason is that, in these images, the GAN captured similar patterns in both the cell region and the background and transformed them into high phase-shift areas in the same way. Another possible reason is the relatively low signal-to-noise ratio of both the bright-field and phase images of single platelets owing to their small size (2–5 μm in diameter); in such cases, it is hard for the GAN to extract sufficient information to correlate the bright-field images with the real phase images, which lowers the quality of the transformation. Overall, the evaluation indicates that our GAN models perform the pixel-to-pixel transformation from bright-field to phase images well. Moreover, the comparison between the cell region and the background suggests that the relatively low SSIM scores of the whole images are attributable to random background noise rather than to dissimilarities in the cell region. Therefore, the generated virtual phase images are expected to be as useful as real phase images in biomedical applications.
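The region-wise scores above can be reproduced with a masked variant of the SSIM computation. The sketch below assumes a boolean segmentation mask (here obtained by thresholding the real phase image, which is an assumption about the segmentation step) and averages the local SSIM map returned by scikit-image over each region separately.

```python
# Region-wise SSIM: average the local SSIM map over the cell region and
# the background separately (a sketch; the segmentation method is assumed).
import numpy as np
from skimage.metrics import structural_similarity

def region_ssim(real, virtual, threshold=0.1):
    mask = real > threshold                     # assumed cell/background split
    _, ssim_map = structural_similarity(real, virtual,
                                        data_range=1.0, full=True)
    return ssim_map[mask].mean(), ssim_map[~mask].mean()  # cell, background
```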
C. Application of virtual OTS-QPI to the classification of leukemia cells
To demonstrate the practical utility of the virtual OTS-QPI method, we classified WBCs and the three types of leukemia cells using their bright-field and virtual phase images. We first compared the classification accuracy of two AE models: one trained with only the bright-field images and the other trained with both the bright-field and phase images from the same training dataset used to train the first GAN model. For both models, 90% of the images (randomly picked) were used for training, and the remaining 10% were used as the test dataset to plot the confusion matrices and t-SNE plots shown in Figs. 5(a)–5(d). Training took around 40 min for 100 epochs for each model, and both models achieved an average classification accuracy of over 96%. We further tested the two models on a mixed dataset containing the images of the three types of leukemia cells and WBCs (2000 cells of each type, i.e., a ground-truth ratio of 25% per type); the bright-field images of these 8000 cells were used to generate the virtual phase images with the GAN. For the AE model trained with only the bright-field images, the test dataset contained only the bright-field images of the 8000 cells. For the AE model trained with both bright-field and phase images, we performed two classifications: one on the bright-field and real phase images of the 8000 cells and the other on the bright-field and virtual phase images of the same cells. Classifying the images of the 8000 cells took around 1 min. Although the two models showed no obvious discrepancy in classification accuracy on the original test datasets, the population ratios predicted by the model trained with bright-field images only [Fig. 5(e); 21.68%, 11.73%, 35.63%, and 30.96%] deviate more from the ground truth than those predicted by the model trained with both bright-field and real phase images [Fig. 5(f); 22.00%, 27.49%, 27.56%, and 22.95%], which indicates that combining bright-field and phase images dramatically improves the reliability of cell classification. We then examined whether the virtual phase images provide the same enhancement by inputting the bright-field and virtual phase images of the same dataset into the model trained with both bright-field and real phase images. The predicted ratios [Fig. 5(g); 24.75%, 21.94%, 27.70%, and 25.61%] achieved the same level of consistency with the ground truth as the combination of the bright-field and real phase images. This is partly attributed to the fact that the phase images provided intracellular information, such as nuclear shape and refractive index, that is essential for high-accuracy classification; in addition, the phase images were less affected by noise and biases than the bright-field images, making the classification robust. Furthermore, the equivalently high classification consistency indicates that our virtual phase images are reliable, as the essential cellular information was extracted from the bright-field images and correctly transferred into the phase images by the GAN.
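For reference, the predicted population ratios quoted above can be obtained by classifying every cell in the mixed dataset and tallying the predicted labels; the short sketch below reuses the hypothetical AEClassifier from Sec. II D.

```python
# Predicted class ratios on the mixed dataset (a sketch; `model` is the
# hypothetical AEClassifier and `images` a flattened, normalized batch).
import numpy as np
import torch

@torch.no_grad()
def predicted_ratios(model, images, n_classes=4):
    _, logits, _ = model(images)                # AE-classifier forward pass
    preds = logits.argmax(dim=1).cpu().numpy()  # predicted label per cell
    return np.bincount(preds, minlength=n_classes) / len(preds)

# e.g., the ratios for [HL-60, Jurkat, K562, WBC] should be close to
# 0.25 each for the mixed dataset of 2000 cells per type.
```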
IV. DISCUSSION
In this work, we demonstrated a method to generate high-quality virtual phase images that resemble the real phase images, regardless of the cell type, from the corresponding bright-field images using a trained GAN model. Both the real and virtual phase images yielded similar improvements in the AE-based cell classification compared with the model trained with only the bright-field images. From the technical perspective, we achieved virtual QPI of cells based on OTS microscopy while bypassing the need for a costly QPI module and eliminating the drawbacks of phase image recovery. In terms of utility, by providing essential intracellular information such as the nuclear shape and refractive index, the virtual OTS-QPI method significantly improved the robustness of image-based cell classification and is thus highly promising for rapid, reliable, and accurate cell detection and identification.
Although the advantages of the virtual OTS-QPI method were experimentally demonstrated in this work, it can be further improved as follows. First, in addition to phase imaging, different imaging modules (e.g., fluorescence, Raman, and second-harmonic generation) can be added to the optical setup to obtain the ground truth for training the GAN to generate various types of virtual images of cells. Second, since the GAN cannot generate images of cell types that were not included in the training process, a larger image library containing numerous cell types can be acquired to train a more comprehensive GAN model. Third, a more thorough investigation should be conducted on the features extracted by the GAN and the AE, which should provide more clues about how cellular morphology and refractive index relate to cell identity. Understanding such correlations is expected to help interpret the working mechanism of deep learning in cell image classification, especially for images obtained with QPI.44 These improvements are expected to dramatically increase the capability of the GAN to generate different types of highly reliable virtual images for high-accuracy cell classification, which will fuel further applications.
In light of our results, the virtual OTS-QPI method opens up new possibilities in a wide range of areas. For example, because virtual OTS-QPI bypasses phase image recovery, it requires far fewer computational resources and thus holds great promise for the real-time image acquisition and analysis desirable for image-activated cell sorting.45–47 In addition, since virtual OTS-QPI can be performed with a simple optical setup that is less prone to mechanical vibrations and misalignment than conventional OTS-QPI, it can be employed to build compact, portable devices for cell detection and analysis in applications such as field studies and point-of-care clinical tests. Furthermore, similar to other deep-learning-based or -assisted imaging techniques,48 the optimized GAN can be generalized and transferred to other bright-field imaging systems, such as conventional bright-field microscopes, to bridge the gaps between various imaging modalities and enable broader applications. In summary, the GAN-based virtual OTS-QPI method is a powerful tool with great potential due to its unique traits and is thus expected to lead to breakthroughs in diverse fields.
AUTHORS’ CONTRIBUTIONS
H. Yan and Y. Wu contributed equally to this work.
ACKNOWLEDGMENTS
This work was supported by the ImPACT program of the Council for Science, Technology, and Innovation (Cabinet Office, Government of Japan), JSPS Core-to-Core Program, JSPS Postdoctoral Fellowship, JSPS KAKENHI (Grant No. 19H05633), White Rock Foundation, and Progetto Bandiera, La fabbrica del Futuro.