Digital holography (DH) is a powerful imaging modality that captures the object wavefront information, making it very valuable for diverse scientific research applications. Generally, it requires ample illumination to enable good fringe visibility and a sufficient signal-to-noise ratio. As such, in situations such as probing live cells with minimal light interaction and high-speed volumetric tracking in flow cytometry, the holograms generated with a limited photon budget suffer from poor pattern visibility. While it is possible to use photon-counting detectors to improve the hologram quality, the long recording procedure coupled with the need for mechanical scanning means that real-time, extremely low-light holographic imaging remains a formidable challenge. Here, we develop a snapshot DH system that operates at an ultra-low photon level (less than one photon per pixel). This is achieved by leveraging a quanta image sensor to capture a stack of binary holographic frames and then computationally reconstructing the wavefront by integrating the mathematical imaging model with data-driven processing, an approach that we term PSHoloNet. The robustness and versatility of our DH system are demonstrated on both synthetic and experimental holograms with two common DH tasks, namely, particle volumetric reconstruction and phase imaging. Our results demonstrate that it is possible to expand DH to the photon-starved regime, and our method will enable more advanced holography applications in various scientific imaging systems.

## I. INTRODUCTION

Digital holography (DH) is a versatile optical imaging technique that records an interference pattern, known as a hologram, on an electronic sensor, which can then be computationally processed to retrieve wavefront information.^{1} Because of this unique capability, it is used in a wide range of applications, such as biomedical imaging,^{2,3} environmental research,^{4,5} and particle field reconstruction.^{6,7} Typically, good illumination and a sufficient exposure time are needed to provide clear holographic patterns; such requirements unfortunately restrict the further use of this powerful modality under more challenging imaging conditions. For instance, for some biological samples or photo-materials vulnerable to light irradiation, there is strict control of the illumination power in consideration of the phototoxicity of the specimen.^{8} Similarly, for high-speed particle tracking, the exposure time has to be very short in order to capture fast dynamics.^{9} There is a clear need to detect such weak signals amidst a noise level that is possibly significantly higher.^{10} In DH, we can observe in Fig. 1 that the low signal-to-noise ratio (SNR) causes the hologram quality to deteriorate substantially, thus posing great challenges for conventional reconstruction algorithms to extract meaningful information from the holograms containing highly contaminated fringe patterns.

These situations necessitate research in *photon-starved holography*, since there are scarce photons to form conventional holographic patterns at the sensor plane. Various approaches have been developed to enable DH at a low signal power.^{11–13} Without introducing extra hardware, purely computational methods are mostly based on noise removal and signal enhancement, such as angular matching^{12} and deep-learning transformation.^{13} However, conventional image sensors, such as the CCD and CMOS image sensors used for registering holograms, have a minimum photon flux requirement of around 10^{4} photons per pixel (PPP).^{14} As one pushes the capability of DH to extremely low photon flux situations, i.e., one PPP or lower, a new hologram formation mechanism with single-photon detection has to be considered.

Yamamoto *et al.* first attempted to tackle this challenging problem by using a photon-counting detector (PCD).^{15} Benefiting from continuous progress in sensor development, Demoli *et al.*^{16} then used a single-photon avalanche photodiode (SPAD) to record the holographic information of signals overshadowed by noise. More recently, Kang *et al.* proposed a deep neural network for retrieving phase information at low photon counts using random phase modulation.^{17} In these methods, the long recording time for the mechanical scanning process (as long as 30 h^{16}) and the extra hardware for phase generation^{17} restrict their practical deployment. Therefore, we further aim to make photon-starved holography a *snapshot imaging system*, which can enable the *in vitro* capture of live cells with minimal sample interaction as well as high-speed volumetric imaging.

To make photon-starved snapshot holography possible, an image detector with photon-level sensitivity and high resolution is necessary. Among various photon-counting image sensors reported over the past decade,^{18–22} the quanta image sensor (QIS) stands out with its small pixel size (e.g., ≤1 *μ*m), high photon detection efficiency, and ultra-low readout noise. This leads to various applications in the low quanta flux regimes, including security, night vision, and quantum computing.^{23–25}

As reported in Ref. 26, a QIS can be realized using a high-gain jot based on existing CMOS techniques^{18–20} or using a SPAD.^{21,22} In this paper, we primarily focus on the CMOS-based QIS, although our mathematical formulation can be extended to a SPAD-based QIS. Among the many properties of these sensors, we consider the one-bit mode, where the measured signal is either 1 or 0. This allows the fastest sampling rate for moving scenes. Yet, integrating this photodetector into DH is not a trivial task, owing to its binary signal detection mechanism and very different noise characteristics, which require specifically designed reconstruction algorithms.

In this work, we overcome several difficulties to present a photon-starved snapshot holography system, which integrates a QIS to take advantage of its low-light detection, together with a physics-informed wavefront reconstruction network called PSHoloNet that we have designed to process the captured signals. Through numerical simulations, we demonstrate the generalization capability of the proposed PSHoloNet from a simplified sensor model during training to a more advanced sensor model during inference. To the best of our knowledge, this is the first time that a QIS has been integrated into a coherent imaging system. To demonstrate the feasibility through synthetic experiments, we show two DH applications, namely, 3D particle volumetric reconstruction and phase imaging.

## II. METHOD

### A. System design and image forward modeling

Consider an object with complex amplitude $\vartheta(x, y) = A(x, y)e^{j\phi(x, y)}$ placed at the axial position $z = 0$, with $A$ and $\phi$ representing its transmittance and phase response, respectively. Suppose that we have an incident plane wave with wavelength $\lambda$. Based on the diffraction theory,^{27} the transmitted wave received at the sensor plane is

$$u(x, y) = \vartheta(x, y) \ast h_z(x, y), \tag{1}$$

where $\ast$ denotes 2D convolution and $h_z$ denotes the impulse response of free-space propagation over a distance $z$. It is represented using the Fresnel approximation as

$$h_z(x, y) = \frac{e^{j 2\pi z/\lambda}}{j \lambda z} \exp\!\left[\frac{j\pi}{\lambda z}\left(x^2 + y^2\right)\right]. \tag{2}$$

For a volume containing $L$ objects $[\vartheta_1, \vartheta_2, \ldots, \vartheta_L]$, assuming that there is no inter-object interference and treating the nonlinear terms (e.g., the twin image and the DC term) as noise, the intensity pattern (hologram) captured by the image sensor is

$$I(x, y) = \sum_{\ell=1}^{L} \vartheta_\ell(x, y) \ast h_{z_\ell}(x, y) + \varepsilon(x, y), \tag{3}$$

where $\varepsilon$ collects the neglected nonlinear terms and noise.

The hologram is discretized into $M$ unit areas, within each of which the light exposure remains constant over an exposure time ($M$ degrees of freedom). It can be represented by a 1D vector (i.e., $\mathbf{I} = [i_1, i_2, \ldots, i_M]$) through raster scanning, and each element is expressed in compact matrix notation as

$$i_m = \left[\mathbf{H}\mathbf{v}\right]_m, \tag{4}$$

where $\mathbf{v} \in \mathbb{C}^{Q}$ stacks the discretized object fields with $Q = LM$, each column of $\mathbf{H} \in \mathbb{C}^{M \times Q}$ is a discretized representation of the impulse response kernel $h_{z_\ell}$ defined in Eq. (2) at distance $z_\ell$, and $[\cdot]_m$ denotes the $m$-th element of a vector.
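As a concrete illustration, the discretized forward model can be sketched in NumPy for a single depth plane. All parameter values below (grid size, pitch, wavelength, distance, disk radius) are illustrative assumptions rather than the exact system configuration:

```python
import numpy as np

def fresnel_kernel(n, pitch, wavelength, z):
    """Discrete Fresnel impulse response h_z on an n-by-n grid (cf. Eq. (2))."""
    x = (np.arange(n) - n // 2) * pitch
    X, Y = np.meshgrid(x, x)
    return np.exp(1j * np.pi / (wavelength * z) * (X**2 + Y**2)) / (1j * wavelength * z)

def propagate(field_in, pitch, wavelength, z):
    """Convolve the input field with h_z via the FFT (circular boundaries)."""
    h = fresnel_kernel(field_in.shape[0], pitch, wavelength, z)
    return np.fft.ifft2(np.fft.fft2(field_in) * np.fft.fft2(np.fft.ifftshift(h)))

# Toy Gabor geometry: a plane wave passing an opaque amplitude disk,
# recorded 5 cm downstream (illustrative parameters).
n, pitch, wavelength = 256, 10e-6, 660e-9
yy, xx = np.mgrid[:n, :n] - n // 2
disk = (xx**2 + yy**2 < 10**2).astype(float)          # small opaque disk
field = propagate(1.0 - disk, pitch, wavelength, z=0.05)
hologram = np.abs(field)**2                           # intensity at the sensor
```

The hologram here retains the full nonlinear intensity; the linearized model of Eq. (4) corresponds to treating the quadratic terms as noise, as stated in the text.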

A QIS consists of a massive array of tiny single-photon-sensitive photodetectors called jots.^{20} Because of the small jot size and fast temporal response, a QIS is regarded as an oversampling device in both the spatial and temporal dimensions, with sampling ratios $K_s$ and $K_T$, respectively.^{28,29} We use the overall upsampling factor $K = K_s K_T$ for simplicity of notation. Therefore, the output of a QIS can be considered as a spatial–temporal cube with $N = KM$ units. For the set of units responding to the $m$-th discrete hologram area with light exposure $i_m$, the accumulated light flux at a single unit is

$$\varphi_n = \frac{i_m}{K}, \quad n = 1, 2, \ldots, N. \tag{5}$$

Given the light flux $\varphi_n$, each jot collects a number of photons impinging on its surface, represented by $Y_n$. The relationship between $\varphi_n$ and the photon count $Y_n$ is stochastic and is described by a Poisson–Gaussian distribution,^{28}

$$Y_n \sim \mathrm{Poisson}\left(\varphi_n + \eta_{\mathrm{dark}}\right) + \mathcal{N}\left(0, \sigma^2\right), \tag{6}$$

where $\eta_{\mathrm{dark}}$ is the dark current and $\sigma$ is the standard deviation of the read noise. Since the dark noise of a QIS is negligible,^{30} the probability distribution of $Y_n$ is

$$Y_n \sim \mathrm{Poisson}\left(\varphi_n\right) + \mathcal{N}\left(0, \sigma^2\right). \tag{7}$$
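This sensing model is straightforward to simulate. The sketch below samples jot counts from the Poisson–Gaussian model and applies one-bit quantization; the flux level and array size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def qis_measure(flux, sigma=0.0, eta_dark=0.0, q=1):
    """Sample Y_n ~ Poisson(flux + eta_dark) + N(0, sigma^2) at each jot
    and quantize with threshold q (one-bit mode)."""
    y = rng.poisson(flux + eta_dark).astype(float)
    y += rng.normal(0.0, sigma, size=y.shape)   # additive read noise
    return (y >= q).astype(np.uint8)

# A mean flux of 0.5 photons per jot: with negligible read noise, the
# fraction of "1" bits should approach 1 - exp(-0.5), about 0.393.
flux = np.full((512, 512), 0.5)
bits = qis_measure(flux, sigma=0.0)
```

The empirical ones-rate of `bits` is the quantity that the reconstruction stage later inverts back to a flux estimate.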

Let $q$ be the threshold; the quantized binary output $B_n$ at each unit is then

$$B_n = \begin{cases} 1, & Y_n \geq q, \\ 0, & Y_n < q, \end{cases} \tag{8}$$

where $Y_n$ is drawn from the distribution described in Eq. (7). The goal of reconstruction is to recover the object light response $\mathbf{v}$ from the binary measurements $B_n$, which are linked through the discretized light flux $\varphi_n$ at the hologram plane. Thus, the probability distribution of $B_n$ is first expressed as

$$\Pr(B_n = 0) = \Pr(Y_n < q), \qquad \Pr(B_n = 1) = \Pr(Y_n \geq q). \tag{9}$$

In this work, we set $q = 1$. While optimal threshold design is an emerging topic that could potentially improve the performance of QIS devices, especially in high dynamic range imaging,^{31} we follow the standard configuration for the sake of consistency and comparability with the existing literature.^{18–20} Although this research direction is not pursued in this study, it is certainly worthy of further investigation.

Previous computational methods for QIS signal processing focus on the Poisson component and ignore the read noise.^{28,31} We follow the same assumption in this study; nevertheless, we test our algorithms on the realistic model that includes different read noise levels (Sec. III C). In our implementation (during training), we assume $\sigma = 0$. This gives us

$$\Pr(Y_n = y) = \frac{\varphi_n^{\,y} e^{-\varphi_n}}{y!}, \quad y = 0, 1, 2, \ldots. \tag{10}$$

Relating $\mathbf{v}$ and $\varphi_n$ through Eq. (5), the probability distribution at each sensing unit is further expressed as

$$\Pr(B_n = 0 \mid \mathbf{v}) = \exp\!\left(-\frac{[\mathbf{H}\mathbf{v}]_m}{K}\right), \qquad \Pr(B_n = 1 \mid \mathbf{v}) = 1 - \exp\!\left(-\frac{[\mathbf{H}\mathbf{v}]_m}{K}\right), \tag{11}$$

where $K$ is the overall sampling ratio as stated previously. Therefore, owing to the independent measurements at each sensing unit, a QIS outputs a signal $\mathbf{b} = [b_1, b_2, \ldots, b_N]$ with the joint distribution

$$f(\mathbf{b} \mid \mathbf{v}) = \prod_{n=1}^{N} \Pr(B_n = b_n \mid \mathbf{v}). \tag{12}$$

Since each $b_n$ is a binary value, introducing two functions that collect the units measuring 0 and 1, respectively, allows this joint distribution to be rearranged into the compact likelihood form $f(\mathbf{b} \mid \mathbf{v})$ used in Eq. (17).
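For the threshold $q = 1$, the negative log-likelihood of a binary stack given a candidate flux map has a simple closed form, since $\Pr(B=0) = e^{-\varphi}$ and $\Pr(B=1) = 1 - e^{-\varphi}$. A NumPy sketch (the flux values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def neg_log_likelihood(bits, flux, eps=1e-12):
    """-log f(b | flux) for one-bit QIS data with threshold q = 1."""
    flux = np.clip(np.asarray(flux, float), eps, None)
    p1 = -np.expm1(-flux)                      # stable 1 - exp(-flux)
    return float(np.sum((1 - bits) * flux - bits * np.log(p1 + eps)))

# Bits generated at a true flux of 0.8 photons per jot should score a
# lower NLL at the true flux than at mismatched values.
true_flux = 0.8
bits = (rng.poisson(true_flux, size=(200, 200)) >= 1).astype(float)
nll_true = neg_log_likelihood(bits, true_flux * np.ones_like(bits))
nll_low = neg_log_likelihood(bits, 0.1 * np.ones_like(bits))
nll_high = neg_log_likelihood(bits, 3.0 * np.ones_like(bits))
```

This likelihood is the data-fidelity term that the MAP formulation of Sec. II B minimizes.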

### B. Physics-informed network

Given the likelihood $f(\mathbf{b} \mid \mathbf{v})$ in Eq. (17), the optimal object is estimated by the *maximum a posteriori* (MAP) solution of a Bayesian framework with the prior constraint $p(\mathbf{v})$,

$$\hat{\mathbf{v}} = \arg\max_{\mathbf{v}} \; f(\mathbf{b} \mid \mathbf{v})\, p(\mathbf{v}), \quad p(\mathbf{v}) \propto e^{-\mu s(\mathbf{v})}, \tag{18}$$

where $\mu$ is used to control the relative effect of the regularizer $s(\cdot)$ on the reconstruction. Substituting Eq. (17) into Eq. (18) and taking the negative logarithm to transform the product into a summation, the MAP estimate of the proposed imaging modality is derived as

$$\hat{\mathbf{v}} = \arg\min_{\mathbf{v}} \; -\log f(\mathbf{b} \mid \mathbf{v}) + \mu s(\mathbf{v}). \tag{19}$$

Following Ref. 25, the forward and prior terms are decoupled by introducing two independent auxiliary variables $\boldsymbol{\alpha}$ and $\boldsymbol{\varphi}$, which reformulate the original minimization problem into the constrained format

$$\min_{\mathbf{v}, \boldsymbol{\varphi}, \boldsymbol{\alpha}} \; -\log f(\mathbf{b} \mid \boldsymbol{\varphi}) + \mu s(\boldsymbol{\alpha}) \quad \text{subject to} \quad \boldsymbol{\varphi} = \mathbf{H}\mathbf{v}, \;\; \boldsymbol{\alpha} = \mathbf{v}. \tag{20}$$

Here, $s(\boldsymbol{\alpha})$ represents the task-oriented prior in an implicit form, which will be introduced later. The augmented Lagrangian function for this problem is given by

$$\mathcal{L}(\mathbf{v}, \boldsymbol{\varphi}, \boldsymbol{\alpha}, \mathbf{u}_1, \mathbf{u}_2) = -\log f(\mathbf{b} \mid \boldsymbol{\varphi}) + \mu s(\boldsymbol{\alpha}) + \frac{\rho_1}{2} \left\| \mathbf{H}\mathbf{v} - \boldsymbol{\varphi} - \mathbf{u}_1 \right\|_2^2 + \frac{\rho_2}{2} \left\| \mathbf{v} - \boldsymbol{\alpha} - \mathbf{u}_2 \right\|_2^2, \tag{21}$$

where $\mathbf{u}_1, \mathbf{u}_2$ are the scaled Lagrangian multipliers and $\rho_1, \rho_2$ are the penalty parameters associated with the two constraints in Eq. (20). The minimizer of Eq. (20) is the saddle point of Eq. (21),^{32} which is found by repeatedly performing the following alternating direction method of multipliers (ADMM)^{32} steps until convergence:

$$\mathbf{v}^{t+1} = \arg\min_{\mathbf{v}} \; \frac{\rho_1}{2} \left\| \mathbf{H}\mathbf{v} - \boldsymbol{\varphi}^{t} - \mathbf{u}_1^{t} \right\|_2^2 + \frac{\rho_2}{2} \left\| \mathbf{v} - \boldsymbol{\alpha}^{t} - \mathbf{u}_2^{t} \right\|_2^2, \tag{22}$$

$$\boldsymbol{\varphi}^{t+1} = \arg\min_{\boldsymbol{\varphi}} \; -\log f(\mathbf{b} \mid \boldsymbol{\varphi}) + \frac{\rho_1}{2} \left\| \mathbf{H}\mathbf{v}^{t+1} - \boldsymbol{\varphi} - \mathbf{u}_1^{t} \right\|_2^2, \tag{23}$$

$$\boldsymbol{\alpha}^{t+1} = \arg\min_{\boldsymbol{\alpha}} \; \mu s(\boldsymbol{\alpha}) + \frac{\rho_2}{2} \left\| \mathbf{v}^{t+1} - \boldsymbol{\alpha} - \mathbf{u}_2^{t} \right\|_2^2, \tag{24}$$

$$\mathbf{u}_1^{t+1} = \mathbf{u}_1^{t} - \left( \mathbf{H}\mathbf{v}^{t+1} - \boldsymbol{\varphi}^{t+1} \right), \tag{25}$$

$$\mathbf{u}_2^{t+1} = \mathbf{u}_2^{t} - \left( \mathbf{v}^{t+1} - \boldsymbol{\alpha}^{t+1} \right). \tag{26}$$

Instead of hand-tuning the parameters of this iteration, we adopt the deep unrolling strategy, which maps the iterative algorithm onto a trainable network.^{33–35}
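The ADMM structure can be sketched on a toy 1D deconvolution problem. This is a simplified stand-in, not the authors' solver: the binary-likelihood term is replaced by a quadratic data term, the implicit prior by an explicit L1 penalty, and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def soft(x, thr):
    """Soft-threshold (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

# Toy problem: recover a sparse 1D signal from its Gaussian-blurred version.
n = 256
v_true = np.zeros(n)
v_true[rng.choice(n, 8, replace=False)] = 1.0
h = np.exp(-0.5 * ((np.arange(n) - n // 2) / 2.0) ** 2)
h /= h.sum()
H = np.fft.fft(np.fft.ifftshift(h))            # convolution as an FFT diagonal
b = np.fft.ifft(H * np.fft.fft(v_true)).real

mu, rho = 0.01, 1.0
v = np.zeros(n); alpha = np.zeros(n); u = np.zeros(n)
for _ in range(100):
    # v-update: closed-form least squares, solved in the Fourier domain
    rhs = np.conj(H) * np.fft.fft(b) + rho * np.fft.fft(alpha + u)
    v = np.fft.ifft(rhs / (np.abs(H) ** 2 + rho)).real
    # alpha-update: proximal step on the sparsity prior
    alpha = soft(v - u, mu / rho)
    # dual update
    u = u - (v - alpha)
```

The unrolled network replaces the fixed `mu` and `rho` with per-layer trainable parameters and the soft-threshold prior with a learned one.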

The conceptual illustration of such a method is shown in Fig. 3(a). The iterative process (left) is unfolded into an end-to-end trainable network (right). Specifically, each updating step in the iteration is transferred into one layer that is cascaded in the deep network. Passing through the network is equivalent to executing the iterative algorithm a finite number of times. In this way, the algorithm parameters **θ** (such as the model parameters and regularization coefficients) are transferred to the trainable network parameters $\theta^{1}, \theta^{2}, \ldots$, which are automatically tuned via back-propagation under supervised learning.

Following this scheme, we design an end-to-end trainable network for the photon-starved holographic imaging system (PSHoloNet), which is the unrolled version of the iterative process in Eqs. (22)–(26). Its *backbone* structure is shown in Fig. 3(b), which consists of several blocks. Each block contains three layers corresponding to the subproblems in Eqs. (22)–(24), respectively. The network input is the snapshot QIS recordings, while the output differs for different inference applications.

The first layer in each block performs the $\mathbf{v}$-update defined in Eq. (22). It is a least-squares estimation problem with the closed-form solution

$$\mathbf{v}^{t+1} = \left( \rho_1 \mathbf{H}^{\mathrm{H}} \mathbf{H} + \rho_2 \mathbf{I} \right)^{-1} \left[ \rho_1 \mathbf{H}^{\mathrm{H}} \left( \boldsymbol{\varphi}^{t} + \mathbf{u}_1^{t} \right) + \rho_2 \left( \boldsymbol{\alpha}^{t} + \mathbf{u}_2^{t} \right) \right]. \tag{27}$$

To avoid explicit matrix inversion, the fast Fourier transform is used for an efficient implementation, since $\mathbf{H}$ is built from convolution kernels and is diagonalized in the Fourier domain.

The $\boldsymbol{\varphi}$-update described in Eq. (23) is a proximal mapping of $\boldsymbol{\varphi} = [\varphi_1, \varphi_2, \ldots, \varphi_N]$ associated with the negative likelihood of a truncated Poisson distribution. Defining $\tilde{\boldsymbol{\varphi}}^{t} = \mathbf{H}\mathbf{v}^{t+1} - \mathbf{u}_1^{t}$ transforms Eq. (23) into

$$\boldsymbol{\varphi}^{t+1} = \arg\min_{\boldsymbol{\varphi}} \; -\log f(\mathbf{b} \mid \boldsymbol{\varphi}) + \frac{\rho_1}{2} \left\| \boldsymbol{\varphi} - \tilde{\boldsymbol{\varphi}}^{t} \right\|_2^2, \tag{28}$$

which is separable and can be solved pixel by pixel.^{36} For the $m$-th pixel, the optimum is acquired by finding the root of the equation where the gradient is set to zero,

$$\rho_1 \left( \varphi_m - \tilde{\varphi}_m^{t} \right) - \frac{\partial}{\partial \varphi_m} \log f(\mathbf{b} \mid \boldsymbol{\varphi}) = 0. \tag{29}$$
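For intuition, such a per-pixel root can be found by bisection, because the stationarity condition is strictly increasing in the flux. The sketch below assumes $q = 1$, so a pixel with $n_1$ one-bits and $n_0$ zero-bits contributes $n_0 \varphi - n_1 \log(1 - e^{-\varphi})$ to the negative log-likelihood; the counts and parameters are illustrative, and this is not necessarily the authors' exact solver:

```python
import numpy as np

def prox_binary_poisson(lam_tilde, n1, n0, rho, lo=1e-8, hi=50.0, iters=100):
    """Solve rho*(lam - lam_tilde) + n0 - n1*exp(-lam)/(1 - exp(-lam)) = 0
    by bisection; the left-hand side is strictly increasing in lam."""
    def g(lam):
        return rho * (lam - lam_tilde) + n0 - n1 * np.exp(-lam) / -np.expm1(-lam)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A pixel whose K = 100 jots returned 40 ones and 60 zeros, with the
# ADMM anchor lam_tilde = 0.5 and penalty rho = 100 (illustrative values).
lam = prox_binary_poisson(lam_tilde=0.5, n1=40, n0=60, rho=100.0)
```

With a large penalty, the solution stays near the anchor; as the penalty shrinks, it approaches the maximum-likelihood rate $\ln(1 + n_1/n_0)$.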

The $\boldsymbol{\alpha}$-update layer at the $t$-th block is the solution of Eq. (24), which is the proximal point of $\tilde{\boldsymbol{\alpha}}^{t} = \mathbf{v}^{t+1} - \mathbf{u}_2^{t}$ with respect to the prior term $s(\boldsymbol{\alpha})$. Therefore, it is customized to make PSHoloNet adaptive for various inference tasks.

For 3D particle volumetric reconstruction, the particles are sparsely distributed in the volume.^{6,37} Therefore, the $\boldsymbol{\alpha}$-update is designed to introduce sparsity regularization. We consider a transform $\mathcal{T}(\cdot)$ that sparsifies the layer input $\tilde{\boldsymbol{\alpha}}^{t}$. To avoid handcrafted parameters for constructing such a sparsity basis, convolutional neural network (CNN) blocks are used, exploiting their generalized approximation property.^{38} The sparsity of the transformed signal $\mathcal{T}(\tilde{\boldsymbol{\alpha}}^{t})$ is encouraged by a soft-threshold operation in the transformed domain. The inverse of $\mathcal{T}(\cdot)$, denoted by $\mathcal{T}^{-1}(\cdot)$, is then introduced to map the signal back to its original domain. It takes the same structure as $\mathcal{T}(\cdot)$ for structural symmetry; this symmetry constraint is further emphasized by incorporating it into the loss function during network training (see Sec. II C). Therefore, $\boldsymbol{\alpha}^{t}$ can be efficiently computed as

$$\boldsymbol{\alpha}^{t} = \mathcal{T}^{-1}\!\left( \mathrm{soft}\!\left( \mathcal{T}(\tilde{\boldsymbol{\alpha}}^{t}), \theta^{t} \right) \right), \tag{31}$$

where $\theta^{t}$ is the shrinkage threshold and is a trainable parameter at the $t$-th block. The detailed structures are presented in the Appendix.
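The transform-threshold-inverse pattern can be demonstrated with a fixed orthonormal transform in place of the learned CNN pair; the transform, signal, noise level, and threshold below are all illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)

def soft(x, thr):
    """Soft-threshold (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

n = 256
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthonormal stand-in for T
T = lambda x: Q @ x
T_inv = lambda x: Q.T @ x                          # exact inverse: Q is orthonormal

# A signal that is sparse in the T-domain, observed with additive noise.
coeffs = np.zeros(n)
coeffs[:4] = [5.0, -4.0, 3.0, 2.0]
clean = T_inv(coeffs)
noisy = clean + 0.1 * rng.standard_normal(n)

# Transform, shrink, and map back, mirroring the alpha-update structure.
alpha = T_inv(soft(T(noisy), 0.3))
```

In PSHoloNet, both the transform pair and the threshold are learned, so the shrinkage adapts to the particle statistics rather than being fixed by hand.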

For phase imaging, $\boldsymbol{\alpha}^{t}$ from the $\boldsymbol{\alpha}$-update layer at the $t$-th block can be interpreted as a denoising process applied to the layer input $\tilde{\boldsymbol{\alpha}}^{t}$. Such an image-to-image mapping can be rephrased as

$$\boldsymbol{\alpha}^{t} = \mathcal{D}\!\left( \tilde{\boldsymbol{\alpha}}^{t} \right),$$

where $\mathcal{D}(\cdot)$ is a denoiser with a U-Net-style architecture.^{39} It is composed of a set of downsampling and upsampling operations, each of which has a skip connection between the resulting feature blocks of the same size. This reduces the semantic gap between the encoder and decoder features. Residual blocks are included within each operator to boost the network's ability to overcome vanishing gradients. The detailed structures are presented in the Appendix. To increase the network flexibility, the penalty parameters $\rho_1$ and $\rho_2$ are varied across different phases and are determined in one-time training.

### C. Network implementation

The network is trained in a supervised, end-to-end manner. For 3D particle volumetric reconstruction, the fidelity term of the loss is the mean-square error (MSE),

$$\mathcal{L}_{\mathrm{MSE}} = \sum_{t=1}^{T} \left\| \boldsymbol{\alpha}^{t} - \mathbf{v}^{\mathrm{gt}} \right\|_2^2,$$

where $\boldsymbol{\alpha}^{t}$ stands for the output of the $t$-th phase and $\mathbf{v}^{\mathrm{gt}}$ stands for the ground truth. For phase imaging, the negative Pearson correlation coefficient (NPCC) loss^{40} is included for preserving the global similarity between the output and its target equivalents, which is defined as

$$\mathcal{L}_{\mathrm{NPCC}} = -\frac{\mathrm{cov}\left( \boldsymbol{\alpha}^{t}, \mathbf{v}^{\mathrm{gt}} \right)}{\sigma_{\boldsymbol{\alpha}^{t}} \, \sigma_{\mathbf{v}^{\mathrm{gt}}}},$$

where $\mathrm{cov}(\cdot, \cdot)$ is the covariance between the prediction $\boldsymbol{\alpha}^{t}$ at the $t$-th stage and the ground truth phase map $\mathbf{v}^{\mathrm{gt}}$, and $\sigma_{\boldsymbol{\alpha}^{t}}$ and $\sigma_{\mathbf{v}^{\mathrm{gt}}}$ are their standard deviations, respectively. Here, $T$ stands for the total number of phases in the unrolled network. In addition, a symmetry constraint on the learned transform pair, $\gamma \sum_{t} \| \mathcal{T}^{-1}(\mathcal{T}(\tilde{\boldsymbol{\alpha}}^{t})) - \tilde{\boldsymbol{\alpha}}^{t} \|_2^2$, is added to the total loss, where $\gamma$ is the regularization parameter. In our experiments, $\gamma$ is empirically set to 0.01 and $T = 5$, considering the convergence speed and computational cost. In both applications, the network is implemented using Python, version 3.7.0, with the PyTorch framework, version 1.11.1. We use a single NVIDIA GeForce RTX 3070 GPU to accelerate the training process. The Adam optimizer^{41} is used with an initial learning rate of 0.0001, which can handle sparse gradients and realize efficient computation.
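The NPCC loss is compact enough to state directly; a NumPy sketch with illustrative inputs:

```python
import numpy as np

def npcc(pred, target, eps=1e-12):
    """Negative Pearson correlation coefficient between two images:
    -1 for perfectly correlated structure, +1 for anti-correlated."""
    p = pred - pred.mean()
    t = target - target.mean()
    return -np.sum(p * t) / (np.sqrt(np.sum(p**2)) * np.sqrt(np.sum(t**2)) + eps)

x = np.random.default_rng(4).standard_normal((32, 32))
```

A useful property, visible from the centering and normalization, is invariance to affine intensity changes of either argument, which is why NPCC emphasizes structure rather than absolute pixel values.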

## III. RESULTS AND DISCUSSION

We demonstrate photon-starved snapshot holography using a simple Gabor holographic setup equipped with a QIS as the recording device. To enable 3D particle volumetric reconstruction and phase imaging, we implement two variants of a modular network, which we term PSHoloNet, as described in Sec. II B. We first mathematically model the imaging system and investigate its performance via simulation. Then, we verify the system using experimental holograms of particle volumes and analyze the influence of read noise, which is unavoidable in real sensors. The results are presented in Secs. III A–III C.

### A. Reconstruction of synthetic 3D particle volume

A photon-starved condition with less than one photon per pixel is created by setting the mean rate of Poisson statistics of photon arrival to vary between 0 and 1. The QIS, which consists of an array of photodetectors called jots, detects photon arrivals following the “uniform-binning” scheme. This device is considered an oversampling mechanism with an overall ratio *K*, which is comprised of spatial oversampling *K*_{s} and temporal oversampling *K*_{T} (refer to Sec. II A for more details). In this study, we set *K*_{s} = 4 and *K*_{T} = 100, in accordance with the established practice.^{24,31} This configuration is equivalent to an integration time of 0.1 s under a 1 kHz frame rate.^{21} By utilizing this setup, we are able to obtain accurate and reliable measurements of photon arrivals with a larger field-of-view that supports a snapshot capture of the whole hologram without the mechanical scanning process.
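The oversampling bookkeeping above can be made concrete: with $K = K_s K_T$ binary units per hologram pixel, the per-unit rate is the pixel exposure divided by $K$, and the exposure can be recovered from the ones-rate through the inverse of $p = 1 - e^{-\lambda}$. A sketch with an illustrative exposure of 0.8 photons per pixel:

```python
import numpy as np

rng = np.random.default_rng(5)

K_s, K_T = 4, 100            # spatial and temporal oversampling ratios
K = K_s * K_T                # overall oversampling factor, as in Sec. II A

# Simulate 2000 hologram pixels, each observed through K one-bit units.
i_m = 0.8                                            # photons per pixel
per_unit_rate = i_m / K
bits = rng.poisson(per_unit_rate, size=(2000, K)) >= 1

# Invert the binary rate: p = 1 - exp(-lambda)  =>  lambda = -log(1 - p).
p_hat = bits.mean(axis=1)                            # ones-rate per pixel
i_hat = -K * np.log1p(-p_hat)                        # exposure estimate per pixel
```

Averaged over many pixels, the estimate recovers the underlying exposure even though each individual unit carries at most one bit of information.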

We now consider a volume of 3D particles as the target object. The particles are assumed to be pure amplitude disks with a diameter of 20 *µ*m, randomly spread in a volume located 5–7 cm from the hologram plane. The volume is discretized to 256 × 256 × 7 voxels, with an *xy* resolution of 10 *µ*m. Following the image formation model of the QIS-DH system, the holograms generated on the sensor plane are normalized and set as the mean rate of the Poisson arrival, in accordance with the low-flux photon condition $(<1ppp)$. We incorporate this physical model into an end-to-end training framework to construct the PSHoloNet, in order to detect the volumetric distribution of the 3D particles. The physical model provides *a priori* knowledge of the underlying system, which constrains the network and improves its learning effectiveness, thereby supporting a small training dataset. Considering the complexity of the network and the size and diversity of the datasets, we generate 500 holograms and follow a typical data-splitting arrangement with a ratio of 7:2:1 for the training, validation, and testing datasets. We train the network for 300 epochs, which takes 4 h in total; at that point, the loss function converges to a stable value and further training does not improve the performance significantly. We find this sufficient to effectively train the network and achieve high accuracy in our tests. After the network training phase, which only needs to be done once, the trained PSHoloNet is capable of processing a single snapshot QIS recording in just 0.04 s.

We conduct a comparison of our system with a state-of-the-art method called OSNet,^{7} which does not take into account the physical model. The visual results are given in Fig. 4, where the green circles are the ground truth and the red triangles represent the prediction results from the tested algorithms. As expected, under sufficient illumination where the interference patterns are recorded with high visibility [Fig. 4(b)], OSNet succeeds in restoring the 3D particles distribution in Fig. 4(e). However, under low light [Fig. 4(c)], the captured hologram is heavily corrupted by the intrinsic read noise in the sensor (i.e., *σ* = 2*e*^{−}), which results in the barely recognizable patterns and, in turn, a poor performance of OSNet [Fig. 4(f)]. In contrast, the binary recording nature of QIS leads to a negligible amount of read noise, and the dominant factor affecting fringes quality is the randomness of the photon arrivals [Fig. 4(d)]. By carefully taking this into account, PSHoloNet successfully reconstructs the particle volume as shown in Fig. 4(g). Numerically, it has a small position error of only 1.26 voxels, compared to the very significant errors shown in Fig. 4(f).

We further examine the performance of PSHoloNet as the particle density increases from 1 × 10^{−4} to 6 × 10^{−4} particles per volume (ppv). Since the target distribution (ground truth) is available in the simulation, the quantitative evaluation can be made by considering two common metrics: (1) recall, which is calculated as

$$\mathrm{Recall} = \frac{N_t}{N_t + N_m},$$

where $N_t$ is the number of correct predictions and $N_m$ is the number of missed ground truths, and (2) precision, which is calculated as

$$\mathrm{Precision} = \frac{N_t}{N_t + N_f},$$

where $N_f$ is the number of wrong inferences.

The reconstruction results presented in Fig. 5 and Table I provide compelling evidence for the robustness of PSHoloNet in the face of particle density variation. In the 3D plot shown in Fig. 5, because the number of wrong and missed predictions is relatively low compared to the number of correct predictions, they may be obscured by the large number of correct detections in the plot and hardly discernible. Therefore, we use the precision and recall metrics in Table I to take into account the number of both correct and incorrect predictions quantitatively made by our system. As indicated, our system is able to accurately identify particles with a high degree of confidence. In addition, despite a sixfold increase in the number of particles from 50 to 300, PSHoloNet continues to deliver accurate predictions of particle locations. Across different density levels, PSHoloNet achieves an average recall of 93.6% and a precision of 96.73%. In order to make a quantitative assessment of the particle location prediction, we follow the common practice in object detection tasks and calculate the deviation. First, a buffer area is pre-defined around each ground-truth particle. Then, the predicted particle from the network that falls within this area is considered a matched particle, and the difference between their coordinates is taken as the prediction error in spatial distance. In our study, the buffer area is defined as a sphere with a five-voxel radius, and the results are shown in Table I. Specifically, for the 300-particle case, which is the densest situation, the maximum prediction error in correctly matched particle pairs is 24.4 *µ*m along the xy-plane and 0.137 mm along the z axis. These values account for only 0.697% and 0.685% of the total range, respectively.
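The buffer-based matching described above can be sketched as a simple greedy one-to-one assignment; the coordinates below are illustrative, and the exact matching rule used by the authors may differ in tie-breaking details:

```python
import numpy as np

def score_particles(pred, gt, radius=5.0):
    """Match predictions to ground-truth particles inside a spherical
    buffer of the given radius (in voxels); returns (recall, precision)."""
    pred = np.asarray(pred, float)
    gt = np.asarray(gt, float)
    if len(pred) == 0:
        return (1.0 if len(gt) == 0 else 0.0), 1.0
    used = np.zeros(len(pred), bool)
    n_t = 0
    for g in gt:
        d = np.linalg.norm(pred - g, axis=1)
        d[used] = np.inf                    # each prediction matches once
        j = int(np.argmin(d))
        if d[j] <= radius:
            used[j] = True
            n_t += 1
    n_m = len(gt) - n_t                     # missed ground truths
    n_f = len(pred) - n_t                   # wrong inferences
    recall = n_t / (n_t + n_m) if len(gt) else 1.0
    precision = n_t / (n_t + n_f) if len(pred) else 1.0
    return recall, precision

gt = [(10, 10, 3), (50, 80, 5), (120, 40, 2)]
pred = [(11, 9, 3), (50, 80, 5), (200, 200, 6)]    # one spurious detection
recall, precision = score_particles(pred, gt)
```

Here two of the three ground-truth particles are matched within the buffer, so both recall and precision evaluate to 2/3.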

| Particle density (ppv) | Recall (%) | Precision (%) | XY error (*μ*m) | Z error (mm) |
|---|---|---|---|---|
| 1 × 10^{−4} | 99.6 | 98.5 | 23.3 | 0.031 |
| 2 × 10^{−4} | 95.3 | 98.2 | 23.4 | 0.039 |
| 4 × 10^{−4} | 87.2 | 99.5 | 23.6 | 0.085 |
| 6 × 10^{−4} | 92.3 | 90.7 | 24.4 | 0.137 |

### B. Phase imaging of synthetic objects

We investigate the application of the proposed network in photon-starved phase imaging by replacing the object field with pure phase objects. They are simulated by mapping the intensity patterns from some public natural images^{42} to a phase modulation ranging from −*π* to *π*, as shown in Fig. 6(a). The transmissive phase object is considered at a distance of 2 cm in front of the sensor plane with the same configuration for the illumination source and QIS. We simulate 300 holograms and follow the same training strategy as Sec. III A.

Figures 6(b) and 6(c) show examples of phase imaging under photon-starved situations in a conventional DH system. Again, the randomness of photon arrivals and intrinsic read-noise result in highly contaminated interference patterns. Under this situation, even a state-of-the-art iterative reconstruction algorithm called deep compressed object decoder (DCOD)^{43} fails to recover the phase information. On the other hand, considerable phase information is successfully captured in the QIS-recorded holograms, and PSHoloNet is able to recover the phase images.

Such adaptability of this system is achieved by designing and incorporating task-oriented priors and loss functions. In the previous 3D particle volumetric reconstruction task, since the particle field is inherently real-valued due to the relatively small particle size (micro-/nano-scale), the use of the mean-square-error (MSE) loss is straightforward, with fast convergence and efficiency. However, for phase imaging, since the object contains much richer spatial (or spectral) detail, the selection of the loss function has a significant influence on the reconstruction performance. To investigate such an effect, we train PSHoloNet separately on the same synthetic phase object dataset while using different loss functions for back-propagation and parameter updating. From the visual inspection shown in Fig. 7, undesired checkerboard artifacts are observed in the output images from the network trained purely with the MSE loss. This reveals the insufficiency of retrieving the phase maps from a pixel-wise comparison alone. In contrast, the negative Pearson correlation coefficient (NPCC) loss preserves the spatial correlations between neighboring pixels, which increases the structural similarity between the network prediction and the target feature map. As shown in Fig. 7, it leads to better reconstructed phase maps with more informative and distinguishable features. To quantify this comparison, we plot the structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) values between the reconstructed results and their ground-truth equivalents along the training process in Fig. 7(b). As indicated, using only the MSE loss results in poor structure recovery (an SSIM of 0.1). By introducing the NPCC loss, more structural similarity is preserved in the predicted phase maps, a more than threefold improvement (an SSIM of 0.32).

### C. Experimental particle reconstruction with read noise analysis

We now proceed to verify the above experimentally. To prepare the particle volume sample, we disperse 50 *µ*m polystyrene particles in a physiological saline solution, spanning a distance of 10–60 mm from the sensor plane. This involves thoroughly mixing the plastic particles with the saline solution until the particles are evenly distributed throughout the solution. Because the densities of the particles and the liquid are similar, the particles diffuse in the solution and exhibit Brownian motion. This sample preparation method allows us to study the behavior and trajectories of the plastic particles from consecutive holograms. The illumination source is a 660 nm laser that is collimated to create the planar wave. The photon level is controlled by inserting a neutral density filter between the light source and the sample. A Sony IMX264 CMOS sensor with a pixel pitch of 3.45 *µ*m is used to record the holograms, which are then computationally converted to QIS data, following the same mathematical modeling as in Sec. III A, a model that has been experimentally verified extensively in the literature.^{20,26,30}

Since there is no ground truth, the inference produced by the state-of-the-art OSNet hologram reconstruction method under the ideal imaging condition is used as the reference for performance evaluation. An example prediction is shown in Fig. 8, with an 82.6% recall and an 88.1% precision on average. The slight decrease in accuracy compared with the simulation is acceptable, given the inevitable misalignment errors of the optical components in the real experimental setup. Based on the volumetric reconstruction of consecutive frames, particle tracking is conducted to obtain the trajectories shown in Fig. 8(e). The mean lateral and axial errors are 11.2 *µ*m and 0.137 mm, respectively. This precision is competitive with that achieved by OSNet under the ideal lighting condition.

In the results reported so far, we have verified the feasibility of the proposed QIS-DH system for recovering wavefront information under photon-starved scenarios, studying the possibility of snapshot DH at an extremely low signal budget $(<1ppp)$. Under such conditions, the signal is so weak that the Poisson randomness of photon arrival dominates, and in traditional sensors this weak signal is further contaminated by the intrinsic readout noise. In contrast, a QIS offers a prominent reduction in read noise and provides more faithful access to the Poisson randomness of the photon arrival. Nevertheless, it cannot be entirely free of read noise, since there is no *perfect* sensor.

To investigate the influence of read noise on our system, we simulate QIS-recorded holograms of particle volumes under various levels of read noise. The mean photon flux is kept below 1 ppp, and the read noise standard deviation *σ* increases from 0.1*e*^{−} to 2*e*^{−}. The quantitative metrics (i.e., recall and precision) are shown in Fig. 9 as box-and-whisker plots. For each metric at each noise level, the box extends from the Q1 to the Q3 quartile values with a line at the median (Q2), and the whiskers show the range. We observe that, for read noise levels up to *σ* = 1*e*^{−}, our system demonstrates robust and competitive prediction performance in both recall and precision (over 95%). As the noise level rises further, the number of missed particles increases, which results in a decreased recall (average of 18% under *σ* = 2*e*^{−}). Its dispersion also reveals unsteady performance for particle volumetric reconstruction under the photon-starved situation when the read noise is too large, as is the case in traditional sensors. By contrast, a QIS has a read noise as small as *σ* = 0.19*e*^{−}, which is sufficient to guarantee its insensitivity to read noise.^{44}
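The effect of read noise on the binary statistics can be illustrated directly: Gaussian noise added to the photon count flips bits across the quantizer threshold, biasing the ones-rate away from the ideal $1 - e^{-\varphi}$. In the sketch below, the comparator threshold placed midway between the 0- and 1-count levels is an assumption, and the flux value and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def qis_bits(flux, sigma, size=100_000):
    """One-bit readout: Poisson photon count plus Gaussian read noise,
    compared against a threshold midway between counts 0 and 1."""
    y = rng.poisson(flux, size) + rng.normal(0.0, sigma, size)
    return y >= 0.5

flux = 0.5
ideal = 1 - np.exp(-flux)                             # noiseless P(B = 1)
err_qis = abs(qis_bits(flux, sigma=0.19).mean() - ideal)   # QIS-level noise
err_cmos = abs(qis_bits(flux, sigma=2.0).mean() - ideal)   # CMOS-level noise
```

With sigma around 0.19 electrons, a zero-count jot almost never crosses the threshold, so the binary statistics remain nearly ideal; at 2 electrons, roughly 40% of zero-count jots flip, which is consistent with the degradation observed for traditional sensors.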

## IV. SUMMARY

In this study, we demonstrate the integration of a QIS in a snapshot photon-starved DH system. Powered by the physics-informed PSHoloNet, our approach achieves 3D particle volumetric reconstruction and phase imaging under a limited photon budget using single-bit holograms. This facilitates more advanced holography applications in situations where the scarcity of photons is unavoidable or even deliberate, with relevance to fields such as microscopy, biomedical imaging, and materials science. In addition, we have shown that the QIS can serve as an effective recording device in scientific and life science imaging applications. Our findings open up new possibilities for the QIS in this area, and we anticipate that they will encourage new research initiatives and collaborations, leading to the development of more advanced and efficient imaging techniques that benefit the scientific community.

## ACKNOWLEDGMENTS

This work was supported by the Research Grants Council of Hong Kong (Grant Nos. GRF 17200019, GRF 17201620, and GRF 17201822).

The authors would like to thank Dr. Ni Chen (University of Arizona, USA) for her comments on an early version of this project and suggestions on the experimental setup.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Yunping Zhang:** Conceptualization (lead); Data curation (lead); Formal analysis (lead); Methodology (lead); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). **Stanley H. Chan:** Methodology (supporting); Writing – review & editing (supporting). **Edmund Y. Lam:** Conceptualization (supporting); Funding acquisition (lead); Supervision (lead); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX: STRUCTURE OF *α*-UPDATE BLOCK

As introduced in Sec. II, the **α**-update can be treated as a Gaussian denoiser equipped with prior information, which is task-oriented and integrated by using the powerful representation of CNNs with different structures. The different structures used in this project are presented in the following.

#### 1. *α*-update for the reconstruction of 3D particles distribution

Figure 10 shows the architecture details of the task-oriented prior for the reconstruction of the 3D particle distribution. The sparsifying transform $\mathcal{T}(\cdot)$ is constructed from CNN blocks.^{38} The symmetric inverse transform $\mathcal{T}^{-1}(\cdot)$ from the sparsity domain is designed with the same structure. Specifically, each CNN block contains consecutive convolutional layers separated by a batch normalization layer and a leaky rectified linear unit (ReLU) activation layer, which is a widely used structure with promising performance in various vision tasks.^{6,38,42} In particular, 3 × 3 convolutional layers are used with the same volume depth. A skip connection is made from the input directly to the final activation layer. Overall, the update process in Eq. (31) is rephrased as

$$\boldsymbol{\alpha}^{t} = \mathcal{T}^{-1}\!\left( \mathrm{soft}\!\left( \mathcal{T}(\tilde{\boldsymbol{\alpha}}^{t}), \theta^{t} \right) \right),$$

where $\theta^{t}$ is the shrinkage threshold, which is a trainable parameter at the $t$-th phase.

#### 2. *α*-update for the reconstruction of the phase map

Figure 11 shows the architecture details of the task-oriented prior in the phase map reconstruction. The input of the **α**-update block is the revised signal $\tilde{\boldsymbol{\alpha}}^{t}$, and the output is the phase prediction $\boldsymbol{\alpha}^{t}$.

In alignment with the typical setting of U-Net, four sets of downsampling and upsampling operations are included, of which each has a skip connection between the resulting feature blocks of the same size. The number of channels for each scale is set as (64, 128, 256, 512), respectively. The residual blocks are included in the downsampling and upsampling operators, of which each is composed of a *Conv–ReLU–Conv* unit with an extra adding operation from its input to its output.

## REFERENCES

*Digital Holography and Wavefront Sensing*

*μ*m pitch in 3D-stacked CMOS technology, in *2016 IEEE International Electron Devices Meeting*

*Introduction to Fourier Optics*

*e*^{−} rms read noise 16.7 Mpixel stacked quanta image sensor with 1.1 *μ*m-pitch backside illuminated pixels