The identification of light sources represents a task of utmost importance for the development of multiple photonic technologies. Over the last decades, the identification of light sources as diverse as sunlight, laser radiation, and molecule fluorescence has relied on the collection of photon statistics or the implementation of quantum state tomography. In general, this task requires an extensive number of measurements to unveil the characteristic statistical fluctuations and correlation properties of light, particularly in the low-photon flux regime. In this article, we exploit the self-learning features of artificial neural networks and the naive Bayes classifier to dramatically reduce the number of measurements required to discriminate thermal light from coherent light at the single-photon level. We demonstrate robust light identification with tens of measurements at mean photon numbers below one. In terms of accuracy and number of measurements, the methods described here dramatically outperform conventional schemes for characterization of light sources. Our work has important implications for multiple photonic technologies such as light detection and ranging, and microscopy.

## I. INTRODUCTION

The underlying statistical fluctuations of the electromagnetic field have been widely utilized to identify diverse sources of light.^{1,2} In this regard, the Mandel parameter constitutes an important metric to characterize the excitation mode of the electromagnetic field and consequently to classify light sources.^{3} Similarly, the degree of optical coherence has also been extensively utilized to identify light sources.^{3–6} Despite the fundamental importance of these quantities, they require large amounts of data, which impose practical limitations.^{6–10} This problem has been partially alleviated by incorporating statistical methods, such as bootstrapping, to predict unlikely events that are hard to measure experimentally.^{6,8–11} Unfortunately, the constraints of these methods severely impact the realistic implementation of photonic technologies for metrology, imaging, remote sensing, and microscopy.^{10,12–17}

The potential of machine learning has motivated novel families of technologies that exploit self-learning and self-evolving features of artificial neural networks to solve a large variety of problems in different branches of science.^{18,19} Conversely, quantum mechanical systems have provided new mechanisms to achieve quantum speedup in machine learning.^{20,21} In the context of quantum optics, there has been an enormous interest in utilizing machine learning to optimize quantum resources in optical systems.^{22–26} As a tool to characterize quantum systems, machine learning has been successfully employed to reduce the number of measurements required to perform quantum state discrimination, quantum separability, and quantum state tomography.^{25–32}

In this article, we demonstrate the potential of machine learning to perform discrimination of light sources at extremely low light levels. This is achieved by training single artificial neurons with the statistical fluctuations that characterize coherent and thermal states of light. The self-learning features of artificial neurons enable the dramatic reduction in the number of measurements and the number of photons required to perform identification of light sources. For the first time, our experimental results demonstrate the possibility of using tens of measurements to identify light sources with mean photon numbers below one. In addition, we demonstrate similar experimental results using the naive Bayes classifier, which are outperformed by our single neuron approach. Finally, we present a discussion on how a single artificial neuron based on an ADAptive LINear Element (ADALINE) model can dramatically reduce the number of measurements required to discriminate signal photons from ambient photons. These results are validated through the Helstrom and Chernoff bounds. Our work has strong implications for realistic implementation of light detection and ranging (LiDAR), remote sensing, and microscopy.

## II. EXPERIMENTAL SETUP AND MODEL

As shown in Fig. 1(a), we utilize a continuous-wave (CW) laser beam that is divided by a 50:50 beam splitter. The transmitted beam is focused onto a rotating ground glass, which is used to generate pseudo-thermal light with super-Poissonian statistics. The beam emerging from the ground glass is collimated using a lens and attenuated by neutral-density (ND) filters to mean photon numbers below one. The attenuated beam is then coupled into a single-mode fiber (SMF). The fiber directs photons to a superconducting nanowire single-photon detector (SNSPD). Furthermore, the beam reflected by the beam splitter is used as a source of coherent light. This beam, characterized by Poissonian statistics, is also attenuated, coupled into a SMF and detected by another SNSPD. The SNSPDs' bias voltages are set to achieve high-efficiency photon counting with less than five dark counts per second. The mean photon number of the coherent beam is matched to that of the pseudo-thermal beam of light.

In order to perform photon counting with our SNSPDs, we use the surjective photon counting method described in Ref. [33]. In this case, the transistor-transistor logic (TTL) pulses produced by our SNSPDs were detected and recorded by an oscilloscope. The data were divided into time bins of 1 *μ*s, which corresponds to the coherence time of our CW laser. Moreover, the 20 ns recovery time of our SNSPDs ensured that we perform measurements on a single-temporal-mode field. Voltage peaks above ∼0.5 V were considered as one photon event. The number of photons (voltage peaks) in each time bin was counted to retrieve photon statistics. The effect of dark counts was omitted, since the average number of dark counts per measurement is less than $5\times10^{-6}$. These events were then used for training and testing our ADALINE neuron and naive Bayes classifier.
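The binning step described above can be illustrated with a short sketch. This is a simplified thresholding routine, not the surjective photon counting method of Ref. [33]: the function name `count_photons`, the list-based trace format, and the rising-edge rule are assumptions made for illustration. Only the 1 *μ*s bin width and the ∼0.5 V threshold are taken from the text.

```python
def count_photons(times_s, volts, bin_width_s=1e-6, threshold_v=0.5):
    """Count threshold crossings (rising edges) in consecutive time bins.

    times_s : sample times of the recorded trace, in seconds (ascending)
    volts   : sampled voltages, same length as times_s
    Returns a list with the number of detected pulses in each time bin.
    """
    if not times_s:
        return []
    t0 = times_s[0]
    n_bins = int((times_s[-1] - t0) / bin_width_s) + 1
    counts = [0] * n_bins
    above = False  # True while the trace stays above the threshold
    for t, v in zip(times_s, volts):
        is_above = v > threshold_v
        if is_above and not above:
            # A new pulse starts here: assign it to the bin of its rising edge.
            counts[int((t - t0) / bin_width_s)] += 1
        above = is_above
    return counts
```

The resulting list of per-bin counts plays the role of the photon-number sequence used to build the histograms *P*(*n*).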

The probability of finding *n* photons in coherent light is given by $P_{\mathrm{coh}}(n)=e^{-\bar{n}}\bar{n}^{n}/n!$, where $\bar{n}$ denotes the mean photon number of the beam. Furthermore, the photon statistics of thermal light is given by $P_{\mathrm{th}}(n)=\bar{n}^{n}/(\bar{n}+1)^{n+1}$. It is worth noting that the photon statistics of thermal light is characterized by random intensity fluctuations with a variance greater than the mean number of photons in the mode. For coherent light, the maximum photon-number probability sits around $\bar{n}$. For thermal light, the maximum is always at vacuum. However, when the mean photon number is low, the photon number distribution for both kinds of light becomes similar. Consequently, it becomes extremely difficult to discriminate one source from the other. The conventional approaches to discriminate light sources make use of millions of measurements.^{7,9,34,35} Unfortunately, these methods are not only time consuming, but also impose practical limitations.
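To make the similarity of the two distributions at low mean photon numbers concrete, the following sketch evaluates $P_{\mathrm{coh}}(n)$ and $P_{\mathrm{th}}(n)$ and a simple distance between them. The function names and the choice of the total variation distance as a measure of similarity are assumptions for illustration and are not part of the experimental analysis.

```python
import math

def p_coherent(n, nbar):
    """Poissonian photon-number probability of coherent light."""
    return math.exp(-nbar) * nbar**n / math.factorial(n)

def p_thermal(n, nbar):
    """Bose-Einstein photon-number probability of single-mode thermal light."""
    return nbar**n / (nbar + 1.0) ** (n + 1)

def total_variation(nbar, n_max=60):
    """Half the L1 distance between the two distributions, truncated at n_max."""
    return 0.5 * sum(abs(p_coherent(n, nbar) - p_thermal(n, nbar))
                     for n in range(n_max + 1))

if __name__ == "__main__":
    for nbar in (0.40, 0.53, 0.67, 0.77):
        coh = [round(p_coherent(n, nbar), 3) for n in range(4)]
        th = [round(p_thermal(n, nbar), 3) for n in range(4)]
        print(nbar, coh, th, round(total_variation(nbar), 3))
```

The distance shrinks as $\bar{n}$ decreases, which is the reason why discrimination at low light levels requires many measurements with conventional methods.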

In order to dramatically reduce the number of measurements required to identify light sources, we make use of an ADALINE neuron. ADALINE is a single-neuron network model based on a linear processing element, proposed by Bernard Widrow,^{36} for binary classification. In general, neural networks undergo two stages: training and test. In the training stage, ADALINE is capable of learning the correct outputs (named as output labels or classes) from a set of inputs, so-called features, by using a supervised learning algorithm. In the test stage, this neuron produces the outputs of a set of inputs that were not in the training data, taking as reference the acquired experience in the training stage. Although we tested architectures far more complex than a single neuron for the identification of light sources, we found that a simple ADALINE offers a perfect balance between accuracy and simplicity (for more details, see Appendix A). The structure of the ADALINE model is shown in Fig. 1(b). The neuron input features are denoted by *P*(*n*), which corresponds to the probability of detecting *n* photons, in a single measurement event, for a given light source. Furthermore, the parameters *ω*_{i} are the synaptic weights and *b* represents a bias term. During the training period, these parameters are optimized through the learning rule by using the error between the target output and the neuron's output as reference. For binary classification (coherent or thermal), the neuron's output is fed into the identity activation function and, subsequently, into the threshold function.

To train the ADALINE, we make use of the so-called delta learning rule,^{37} in combination with a database of experimentally measured photon-number distributions, considering different mean photon numbers: $\bar{n}=0.40,\,0.53,\,0.67,\,0.77$. The database for each mean photon number was divided into subsets comprising $10, 20, \ldots, 150, 160$ data points. The ADALINE neurons are thus prepared by using one hundred thousand of those subsets, where 70% are devoted to training and 30% to testing. In all cases, the training was stopped after 50 epochs.
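The training procedure can be sketched as follows. This is a minimal illustration under several assumptions, not the code used in this work: the data are simulated rather than measured, the learning rate `lr`, the number of subsets, and all function names are hypothetical, and the labels are encoded as +1 (coherent) and −1 (thermal). The seven features *P*(0), ..., *P*(6), the 70/30 split, and the 50 epochs follow the text.

```python
import math
import random

N_FEATURES = 7  # P(0) ... P(6)

def sample_coherent(nbar, rng):
    """Draw one Poissonian photon count (Knuth's multiplication method)."""
    limit, k, prod = math.exp(-nbar), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_thermal(nbar, rng):
    """Draw one Bose-Einstein (geometric) photon count by inverse-CDF sampling."""
    q = nbar / (1.0 + nbar)
    return int(math.log(1.0 - rng.random()) / math.log(q))

def histogram(counts):
    """Normalized photon-number histogram truncated to N_FEATURES bins."""
    feats = [0.0] * N_FEATURES
    for c in counts:
        if c < N_FEATURES:
            feats[c] += 1.0 / len(counts)
    return feats

def make_dataset(nbar, k, n_per_class, rng):
    """Simulated subsets of k data points, labeled +1 (coherent) or -1 (thermal)."""
    data = []
    for sampler, label in ((sample_coherent, 1.0), (sample_thermal, -1.0)):
        for _ in range(n_per_class):
            data.append((histogram([sampler(nbar, rng) for _ in range(k)]), label))
    rng.shuffle(data)
    return data

def train_adaline(train, epochs=50, lr=0.1):
    """Delta (Widrow-Hoff) rule on the linear output w.x + b."""
    w, b = [0.0] * N_FEATURES, 0.0
    for _ in range(epochs):
        for x, target in train:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # identity activation
            err = target - y
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    """Threshold function applied to the linear output."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1.0

def accuracy(w, b, data):
    return sum(predict(w, b, x) == t for x, t in data) / len(data)

def run(nbar=0.77, k=160, n_per_class=500, seed=0):
    rng = random.Random(seed)
    data = make_dataset(nbar, k, n_per_class, rng)
    split = int(0.7 * len(data))  # 70% training, 30% testing
    w, b = train_adaline(data[:split])
    return accuracy(w, b, data[split:])

if __name__ == "__main__":
    print("test accuracy:", run())
```

Calling `run` with smaller values of `k` or `nbar` gives a rough feeling for how the accuracy degrades when fewer data points or fewer photons are available.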

We have established the baseline performance for our ADALINE neuron by using a naive Bayes classifier. This is a simple classifier based on Bayes's theorem.^{38} Throughout this article, we assume that each measurement is independent. Moreover, we represent the measurement of the photon number sequence as a vector $x=(x_1,\ldots,x_k)$. Then, the probability that this sequence was generated by coherent or thermal light is given by $p(C_j|x_1,\ldots,x_k)$, where *C*_{j} could denote either coherent or thermal light. Using Bayes's theorem, the conditional probability can be decomposed as $p(C_j|x)=p(C_j)p(x|C_j)/p(x)$. By using the chain rule for conditional probability together with the independence of the measurements, we have $p(C_j|x_1,\ldots,x_k)\propto p(C_j)\prod_{i=1}^{k}p(x_i|C_j)$, where the normalization $p(x)$ is common to both hypotheses. Since our light source is either coherent or thermal, we assume $p(C_j)=0.5$. Thus, it is easy to construct a naive Bayes classifier, where one picks the hypothesis with the highest conditional probability $p(C_j|x)$. We used theoretically generated photon-number probability distributions as the class-conditional probabilities $p(x_i|C_j)$ and used the experimental data as the test data.
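The decision rule above can be written compactly in terms of log-probabilities. The sketch below is illustrative: the function names are hypothetical, and it assumes that the mean photon number $\bar{n}$ is known and that the theoretical Poissonian and Bose-Einstein distributions serve as $p(x_i|C_j)$, as stated in the text.

```python
import math

def log_p_coherent(n, nbar):
    """log of the Poissonian probability exp(-nbar) * nbar**n / n!."""
    return -nbar + n * math.log(nbar) - math.lgamma(n + 1)

def log_p_thermal(n, nbar):
    """log of the Bose-Einstein probability nbar**n / (nbar + 1)**(n + 1)."""
    return n * math.log(nbar) - (n + 1) * math.log(nbar + 1.0)

def classify(counts, nbar, prior_coherent=0.5):
    """Pick the hypothesis with the largest posterior for a photon-count sequence."""
    score_coh = math.log(prior_coherent) + sum(log_p_coherent(n, nbar) for n in counts)
    score_th = math.log(1.0 - prior_coherent) + sum(log_p_thermal(n, nbar) for n in counts)
    return "coherent" if score_coh >= score_th else "thermal"
```

For instance, `classify([1, 1, 1, 0, 1, 1, 0, 1, 1, 1], 0.77)` favors the coherent hypothesis, whereas a sequence dominated by vacuum events with occasional bunched detections, such as `[0, 0, 0, 4, 0, 0, 3, 0, 0, 0]`, favors the thermal one. Summing logarithms instead of multiplying probabilities avoids numerical underflow for long sequences.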

## III. RESULTS

In Fig. 2, we compare the histograms for the theoretical and experimental photon number distributions for different mean photon numbers $\bar{n}=0.40$, 0.53, 0.67, and 0.77. The bar plots are generated by experimental data with one million measurements for each source; the curves in each of the panels represent the expected theoretical photon number distributions for the corresponding mean photon numbers. Figure 2 shows excellent agreement between theory and experiment, which demonstrates the accuracy of our surjective photon counting method. Furthermore, from Figs. 2(a)–2(d), we can also observe the effect of the mean photon number on the photon number probability distributions. As shown in Fig. 2(a), it is evident that millions of measurements enable one to discriminate light sources. On the other hand, Fig. 2(d) shows a situation in which the source mean photon number is low. In this case, the discrimination of light sources becomes cumbersome, even with millions of measurements. In Fig. 3, we illustrate the difficulty of using limited sets of data to discriminate light sources at a mean photon number of $\bar{n}=0.77$. The histograms in this figure are generated with data points that represent 10, 20, 50, 100, and 100 000 realizations of the experiment. Thus, each data point is equivalent to a photon number-resolving measurement of the light source. As shown in Fig. 3, the photon number distributions obtained with a limited number of measurements do not resemble those in the histograms shown in Fig. 2(a), for either coherent or thermal light beams.

To evaluate the performance of the ADALINE and naive Bayes classifiers, we calculate the accuracy $P_{\mathrm{acc}}$, which is defined as the ratio of the number of correct predictions *C*_{P} to the total number of input samples *T*_{N}, that is, $P_{\mathrm{acc}}(\%)=C_P/T_N\times100$. Note that the accuracy is computed from the test dataset. These datasets are unseen by the algorithm in the training stage. Here, the number of input samples belonging to each class is equal. Since the accuracy only quantifies the successful events from a balanced test dataset, this is equivalent to the probability of discrimination. Therefore, we can also measure the probability of misclassification by defining the discrimination probability error *P*_{e} as $P_e=1-P_{\mathrm{acc}}$.

In Fig. 4, we show the overall accuracy for light discrimination using a naive Bayes classifier. The accuracy increases with the number of data points. For example, when $\bar{n}=0.40$, the accuracy of discrimination increases from approximately 61% to 90% as we increase the number of data points from 10 to 160. It is worth noting that even with a small increase in the number of measurements, the naive Bayes classifier starts capturing the characteristic features of different light sources, given by distinct sequences of photon number events. This is expected, since larger sets of data contain more information pertaining to the probability distribution. Furthermore, the mean photon number of the light field significantly changes the discrimination accuracy profile. As the mean photon number increases, the overall accuracy converges faster toward 100%, as expected. This is due to the fact that the photon number probability distributions become more distinct at a higher mean photon number.

The overall accuracy of light-source discrimination with respect to the number of data points using ADALINE is shown in Fig. 5. In this case, the information provided by only 10 data points leads to an average accuracy of around 63% for $\bar{n}=0.40$, whereas for 160 data points, the accuracy is greater than 90%. The comparison between Figs. 4 and 5 reveals that ADALINE and the naive Bayes classifier exhibit similar accuracy levels. As one might expect, in both cases, the accuracy increases with the number of data points and mean photon numbers. However, ADALINE requires far fewer computational resources than a naive Bayes classifier. Indeed, the execution time of ADALINE is one order of magnitude smaller than that of the naive Bayes classifier. The identification was performed using a computer with an Intel Core i7-4710MQ CPU (@2.50 GHz) and 32 GB of RAM with MATLAB 2019a. However, the convergence rate for the naive Bayes classifier is slightly higher than that observed for the ADALINE classifier. This implies that at low mean photon numbers ADALINE outperforms the naive Bayes classifier in the sense that the former requires fewer computational resources than the latter.

To understand why a single ADALINE neuron is enough for light discrimination, we first note that ADALINE is a linear classifier. Therefore, the decision surface is expressed by a hyperplane in a seven-dimensional feature space. This space is defined by the seven features described by *P*(*n*) (with $n=0,1,\ldots,6$). In our case, the range of [0,6] is selected based on the extremely low probability of observing a seven-photon event when the number of data points is 100. Interestingly, one can find that the datasets are linearly separable in the space of probability-distribution values. This can be seen from Fig. 6, where we plot the projection of the feature space on a three-dimensional subspace defined by [*P*(0), *P*(1), *P*(2)] for different mean photon numbers. In all cases, each point or star is obtained from a probability distribution generated with 60 data points. Within this subspace, the photon statistics for thermal (red stars) and coherent (blue points) light sources show a separation from each other that increases with $\bar{n}$. This effect is more evident when the number of data points used to generate one photon probability distribution is increased, and the mean photon number remains fixed at $\bar{n}=0.77$ (see Fig. 7). Evidently, the fact that both thermal and coherent light form two linearly separated classes makes ADALINE a good classifier for identification of coherent and thermal light sources.

## IV. CONCLUSION

For more than 20 years, there has been an enormous interest in reducing the number of photons and measurements required to perform imaging, remote sensing, and metrology at extremely low light levels.^{12,17} In this regard, photonic technologies operating at low-photon levels utilize weak photon signals that make them vulnerable to the detection of environmental photons emitted from natural sources of light. Indeed, this limitation has made the realistic implementation of this family of technologies unfeasible.^{6,10,13} So far, this vulnerability has been tackled through conventional approaches that rely on the measurement of coherence functions, *p*-values, the implementation of thresholding, and quantum state tomography.^{6,10,13,44} Unfortunately, these approaches to characterizing photon fluctuations rely on the acquisition of a large number of measurements, which imposes constraints on the identification of light sources. Here, for the first time, we have demonstrated a smart protocol for discrimination of light sources at mean photon numbers below one. Our work demonstrates a dramatic improvement in both the number of photons and measurements required to identify light sources.^{6,10,13,44} Furthermore, our results indicate that a single artificial neuron outperforms the naive Bayes classifier at low light levels. Interestingly, this neuron has simple analytical and computational properties that enable low-complexity and low-cost implementations of our technique. We are certain that our work has important implications for multiple photonic technologies, such as LiDAR and microscopy of biological materials.

## ACKNOWLEDGMENTS

We thank the Department of Physics and Astronomy at Louisiana State University for providing startup funding to perform this experimental work. C.Y. would like to acknowledge support from the National Science Foundation. N.B. would like to thank the Army Research Office (ARO) for the funding. R.J.L.M. and M.A.Q.J. thankfully acknowledge financial support by CONACYT under the Project No. CB-2016–01/284372 and by DGAPA-UNAM under the Project No. UNAM-PAPIIT IN102920. A.P.L. acknowledges financial support by the Deutsche Forschungsgemeinschaft within the priority program SPP 1839 “Tailored Disorder” (PE2602/2-2). We all thank K. Sharma, S. Khatri, J. P. Dowling, X. Wang, L. Cohen, and H. S. Eisenberg for helpful discussions.

### APPENDIX A: IMPLEMENTATION DETAILS

For the sake of completeness, we provide additional details of our calculations as well as comparisons among different methods for light identification.

Since the naive Bayes classifier and the ADALINE methods presented in Sec. II were trained on experimental data, it is important to study the role of imperfections and their impact on the performance of our scheme for light identification. In Fig. 8, we report the overall accuracy of our naive Bayes classifier and our single neuron trained with numerically generated data and tested with experimental data. Naturally, the training is performed with ideal coherent and thermal statistics. A comparison among Figs. 4, 5 and 8 indicates a good agreement between theory and experiment. These results demonstrate the robustness of our method to identify light sources.

In addition to naive Bayes and ADALINE, we evaluate two additional machine-learning algorithms, namely a one-dimensional convolutional neural network (1D CNN) and a multilayer neural network (MNN). Despite the fact that both algorithms are effective to identify light sources, they are analytically and computationally more sophisticated than the simple ADALINE model. Nevertheless, their recognition rates do not present substantial differences. Figures 9(a) and 9(b) show the structure of the 1D-CNN and MNN, respectively.

A convolutional neural network is a deep learning algorithm that automatically extracts relevant features of the input.^{45} Here, our one-dimensional convolutional neural network is composed of two 1D-convolutional layers that extract the low- and high-level features of the input. Outcomes from these two layers are subsequently fed into a convolutional layer sandwiched between two max-pooling layers. The pooling layers downsample the input representation, and therefore its dimensionality, leading to a computational simplification by removing redundant and unnecessary information. The activation function, implemented in all layers, is the rectified linear unit function (ReLU). Finally, a fully connected and a flattening layer precede the output layer consisting of two softmax functions, whose outputs are the probability distributions over labels.

On the other hand, the multilayer neural network belongs to a classical machine learning algorithm, where the feature vector should be manually determined.^{46} In our case, this vector is given by the probabilities of the photon number distribution, *P*(*n*). As depicted in Fig. 9(b), the model corresponds to a two-layer feed-forward network: the hidden layer contains ten sigmoid neurons and the output layer consists of a softmax function. To determine a suitable neuron number in the hidden layer of the MNN, we trained different MNNs by changing the neuron number in the hidden layer and followed the accuracy values for each net. Figures 10(a) and 10(b) show the overall accuracy for light discrimination vs the number of neurons in the hidden layer for different mean photon numbers, $\bar{n}=0.40$ and $\bar{n}=0.77$, respectively. Note that in both cases, the accuracy becomes lower as the number of neurons increases. This is because many neurons lead to over-parameterization, causing poor generalization of the test-stage data. Additionally, as the number of neurons increases, the training becomes computationally more intensive. Figures 11(a) and 11(b) show the overall accuracy for light discrimination vs the size of the dataset for different mean photon numbers, $\bar{n}=0.40$ and $\bar{n}=0.77$, respectively. Note that in both cases, the accuracy exhibits similar performances as the size of the dataset increases, irrespective of the number of neurons in the hidden layer. Importantly, for large size datasets, additional computational time is required. Our results indicate that comparable accuracy can be achieved with smaller sized datasets. All the MNNs were trained by using the scaled conjugate gradient backpropagation method, where the cross-entropy was employed as the cost function. Since the output of sigmoid neurons is ranged in the interval [0,1], the cross-entropy function is ideal for the classification task. The network training was stopped after 200 epochs.

1D-CNNs and MNNs were trained with the same training set described in the main manuscript. Despite the fact that deep neural networks should be trained with a larger amount of data, we use 70% of the dataset for the training and the rest for testing both networks. Note that the same procedure was used for the ADALINE model. Figures 12(a) and 12(b) show the overall light-discrimination accuracy for an increasingly larger number of data points for (a) 1D-CNNs and (b) MNNs. In both cases, the accuracy increases with the number of data points, because larger sets of data contain more information about the probability distribution. Interestingly, the accuracies of 1D-CNNs for $\bar{n}=0.67$ and $\bar{n}=0.77$ are almost the same; this indicates that in the low mean photon-number regime, the peak performance for the 1D-CNN saturates much faster than that of the MNN classifier. As one might expect, this fast accuracy convergence carries the cost of a much more complex computation as compared to the one needed for the MNN classifier.

### APPENDIX B: COMPARISON WITH $g^{(2)}$ AND *P*-VALUE CLASSIFICATION

Additionally, we have calculated the degree of second-order correlation functions for both sources. In this case, the correlation function $g^{(2)}(\tau)$ was measured using the definition $g^{(2)}(\tau)=1+(\langle(\Delta\hat{n})^{2}\rangle-\langle\hat{n}\rangle)/\langle\hat{n}\rangle^{2}$, where $\langle\cdots\rangle$ denotes the statistical average of the input dataset. The second-order correlation functions at *τ* = 0 for different sizes of datasets and mean photon numbers are presented in Figs. 13(a) and 13(b). In both cases, the $g^{(2)}(0)$ calculation presents large standard deviations due to the limited data points. These large variations impose important difficulties in the identification of light sources. In addition, these large variations make it hard to obtain a reliable $g^{(2)}(0)$ for identifying coherent and thermal sources. To provide visual evidence of the last statement, we compare the $g^{(2)}(0)$ results with the statistical fluctuations that characterize coherent and thermal sources. Inspired by the level of significance commonly used in statistical hypothesis testing, we established tolerance bands of 5% around the theoretical values of $g^{(2)}(0)$. The overall accuracy obtained for both sources is reported in Fig. 13(c). Notably, the accuracy is not greater than 25%, even when the mean photon number is $\bar{n}=0.77$. Remarkably, the ADALINE neuron reaches an accuracy of about 95% for the same mean photon number.
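As an illustration of the estimator defined above, the following sketch evaluates $g^{(2)}(0)$ from a list of photon counts. The function name `g2_zero` and the use of the population variance (division by the number of data points) are assumptions made for this example.

```python
def g2_zero(counts):
    """Estimate g2(0) = 1 + (variance - mean) / mean**2 from photon counts."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        # With only vacuum events the estimator is undefined.
        raise ValueError("g2(0) is undefined when no photons were detected")
    variance = sum((c - mean) ** 2 for c in counts) / n
    return 1.0 + (variance - mean) / mean**2
```

For ideal statistics, this quantity equals 1 for coherent light (variance equal to $\bar{n}$) and 2 for thermal light (variance equal to $\bar{n}+\bar{n}^{2}$). With tens of data points and $\bar{n}<1$, both the sample mean and the sample variance fluctuate strongly, and the division by the squared mean amplifies these fluctuations.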

Moreover, the so-called *p*-value is a measure of the probability of an observed value, assuming that the null hypothesis is true. The null hypothesis is rejected if the *p*-value is less than or equal to the level of significance *α*, which by convention is set to $\alpha=0.05$.^{39} Then, inspired by the previous results, we take our estimator for the *p*-value to be the variance of the photon probability distribution. Note that the variance for a coherent state is given by the mean photon number $\bar{n}$, whereas the variance for a thermal distribution is given by $\bar{n}+\bar{n}^{2}$. This difference in the variance makes it a perfect candidate for an estimator of the *p*-value. For this purpose, we set the null hypothesis *H*_{0} to be a Poissonian distribution and the alternative hypothesis *H*_{1} to be a thermal distribution. Thus, the *p*-value is defined by $p_{\mathrm{value}}=\Pr(s^{2}(X)\leq p_{\alpha}|H_{0})$. Here, $p_{\alpha}$ is the critical bound which limits the decision region; furthermore, $s^{2}(X)$ corresponds to the sample variance with *X* representing a random variable for the observed data. It is worth mentioning that the variances $s^{2}(X)$ follow a normal distribution. Therefore, $p_{\alpha}$ can be calculated as $\Pr(s^{2}(X)\geq p_{\alpha}|H_{0})=\alpha$. Assuming that $\alpha=0.05$, then $p_{\alpha}=1.64\sigma+\bar{n}$, where *σ* denotes the standard deviation of the variance distribution.
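The critical bound $p_{\alpha}=1.64\sigma+\bar{n}$ can be evaluated with a short sketch. The text does not specify how *σ* was obtained; the example below assumes the standard expression for the variance of the unbiased sample variance of *N* independent Poissonian counts, $\sigma^{2}=\bar{n}/N+2\bar{n}^{2}/(N-1)$. This formula and all function names are assumptions for illustration.

```python
import math

def sample_variance(counts):
    """Unbiased sample variance s^2(X) of a list of photon counts."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / (n - 1)

def critical_bound(nbar, n_points, z=1.64):
    """p_alpha = z * sigma + nbar, with sigma computed under the Poissonian H0.

    Assumes Var[s^2] = nbar / N + 2 * nbar**2 / (N - 1) for N Poissonian counts.
    """
    sigma = math.sqrt(nbar / n_points + 2.0 * nbar**2 / (n_points - 1))
    return nbar + z * sigma

def reject_poissonian(counts, nbar, z=1.64):
    """Reject H0 (Poissonian) when the sample variance reaches the critical bound."""
    return sample_variance(counts) >= critical_bound(nbar, len(counts), z)
```

Under this assumption, the bound moves away from $\bar{n}$ as the number of data points decreases, so the thermal variance $\bar{n}+\bar{n}^{2}$ is harder to distinguish from Poissonian fluctuations for small datasets.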

Figures 14(a) and 14(b) show the *p*-value for different dataset sizes for sources with various mean photon numbers. Note that the *p*-values shown in Fig. 14(a) for a coherent source are greater than the level of significance for any mean photon number. Consequently, the null hypothesis *H*_{0} must be accepted, and therefore, the data are produced by a source with a Poissonian distribution. For the case of a thermal source [Fig. 14(b)], we expected to find *p*-values less than $\alpha=0.05$ to reject the null hypothesis and to accept the alternative hypothesis. However, it is worth noticing that for any mean photon number, the *p*-values are beyond the level of significance for datasets with sizes smaller than 150. Thus, we cannot reject *H*_{0}, and one concludes that the data were produced by a source with a Poissonian distribution, which is clearly wrong. Finally, Fig. 14(c) shows the accuracy for different mean photon numbers using the *p*-value as a measurement to discriminate light sources. For the lowest mean photon number, the accuracy shows high performance for data sizes beyond 500, whereas for $\bar{n}=0.77$, the accuracy reaches the best performance with 180 data points. In any case, the ADALINE and naive Bayes classifiers offer a better performance with fewer data points when compared to the calculated *p*-value.

### APPENDIX C: COMPARISON WITH SIGNAL-DISCRIMINATION THEORETICAL BOUNDS

Finally, since our scheme for light identification is inherently related to signal discrimination, it is important to compare our results with the theoretical bounds predicted by the Helstrom^{40,41} and the Chernoff bounds.^{42,43} If two quantum states are described by the density matrices *ρ* and *σ*, then the Helstrom bound is given by $P_{H}=\frac{1}{2}[1+D_{\mathrm{Tr}}(\rho,\sigma)]$, where $D_{\mathrm{Tr}}(\rho,\sigma)=\frac{1}{2}\mathrm{Tr}|\rho-\sigma|$ is the trace distance. Thus, the Helstrom bound indicates the lower bound on the error probability $P_{e}=1-P_{H}$ in a single realization of the experiment. In the limit of repeating the experiment *n* times, the error probability *P*_{e} decreases exponentially as $P_{e,n}\sim\exp(-n\xi_{\mathrm{QCB}})$, where $\xi_{\mathrm{QCB}}=-\log\min_{0\leq s\leq1}\mathrm{Tr}(\rho^{s}\sigma^{1-s})$ is the quantum Chernoff bound. As shown in Fig. 15, indeed, in the limit of a single data point, our accuracy is lower than that dictated by the Helstrom bound. However, in the limit of increasing the number of data points, our accuracy approaches the Chernoff bound quickly. These results validate the performance and accuracy of our method for light identification.
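For orientation, both bounds can be evaluated numerically in a simplified setting. The sketch below assumes that both states are diagonal in the photon-number basis (for example, a phase-averaged coherent state and a thermal state), so that the trace distance reduces to the total variation distance between the photon-number distributions and $\mathrm{Tr}(\rho^{s}\sigma^{1-s})$ reduces to $\sum_{n}p_{n}^{s}q_{n}^{1-s}$. This assumption, the truncation at `n_max`, the grid search over *s*, and the function names are choices made for this example and need not coincide with the calculation behind Fig. 15.

```python
import math

def pmf_coherent(nbar, n_max=40):
    """Poissonian photon-number distribution truncated at n_max."""
    return [math.exp(-nbar) * nbar**n / math.factorial(n) for n in range(n_max + 1)]

def pmf_thermal(nbar, n_max=40):
    """Bose-Einstein photon-number distribution truncated at n_max."""
    return [nbar**n / (nbar + 1.0) ** (n + 1) for n in range(n_max + 1)]

def helstrom_success(p, q):
    """P_H = (1 + D_Tr) / 2 for commuting (diagonal) states with equal priors."""
    trace_distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return 0.5 * (1.0 + trace_distance)

def chernoff_exponent(p, q, grid=1000):
    """xi = -log min_s sum_n p_n**s * q_n**(1 - s), minimized on a grid of s values."""
    best = min(
        sum(a**s * b ** (1.0 - s) for a, b in zip(p, q))
        for s in (i / grid for i in range(grid + 1))
    )
    return -math.log(best)

if __name__ == "__main__":
    for nbar in (0.40, 0.53, 0.67, 0.77):
        p, q = pmf_coherent(nbar), pmf_thermal(nbar)
        xi = chernoff_exponent(p, q)
        # Asymptotic error estimate after 160 data points, up to a prefactor.
        print(nbar, helstrom_success(p, q), xi, math.exp(-160 * xi))
```

In this simplified picture, a single data point yields a success probability only slightly above 50%, while the exponential decay of the error with the number of data points explains why tens to hundreds of measurements suffice for reliable discrimination.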