Estimation of the number of single-photon emitters for multiple fluorophores with the same spectral signature

Fluorescence microscopy is of vital importance for understanding biological function. However, most fluorescence experiments are only qualitative inasmuch as the absolute number of fluorescent particles can often not be determined. Additionally, conventional approaches to measuring fluorescence intensity cannot distinguish between two or more fluorophores that are excited and emit in the same spectral window, as only the total intensity in a spectral window can be obtained. Here we show that, by using photon number resolving experiments, we are able to determine the number of emitters and their probability of emission for a number of different species, all with the same measured spectral signature. We illustrate our ideas by showing the determination of the number of emitters per species and the probability of photon collection from that species, for one, two and three otherwise unresolvable fluorophores. The convolution binomial model is presented to represent the counted photons emitted by multiple species. Then, the expectation-maximization (EM) algorithm is used to match the measured photon counts to the expected convolution binomial distribution function. In applying the EM algorithm, to leverage the problem of being trapped in a sub-optimal solution, the moment method is introduced to yield an initial guess for the EM algorithm. Additionally, the associated Cramér–Rao lower bound is derived and compared with the simulation results.


I. INTRODUCTION
Understanding complex biological function often requires precise localization of biological molecules in space and time to better understand their interactions and relevant chemical reactions and transformations.Cryogenic electron microscopy is capable of defining positions of atoms in complex molecules with angstrom accuracy [1] but is often limited in capturing complex interactions of molecules in dynamic biological environment.Fluorescence and Raman microspectroscopies can provide structural and functional information about biological molecules; however, special measures are needed to extend the spatial resolution of optical imaging beyond the traditional diffraction limited spatial resolution defined by the excitation wavelength of light.The new generation of optical imaging methods based on super-resolution optical imaging, for which Nobel Prize in Chemistry was awarded in 2014, are gradually being adopted by the research community, and these techniques are now indispensable for fundamental understanding of biological function at the molecular level.However, many biological molecules are performing their function not along but in a coherent ensemble with other molecules.A typical example of such synergistic interaction is electrontransfer complex in mitochondrial membrane, where several cytochromes are involved in the ultimate production of ATP molecules which serve as a major energy fuel for a cell.To understand the function of mitochondrion and to assess its ability to efficiently produce ATP molecules, one needs to quantify the number of such cytochromes in membrane which is close to impossible using conventional methods.There are several challenges which are related to the size of the focal spot and internal dynamics of mitochondria which make it difficult to localize those different cytochromes within the membrane.Traditional approaches based on classical photon statistics have significant limitations due to either invasive nature, such as induced photobleaching which affects the electronic structure of biological fluorophores and amends its function, or complexity of signal collection and analysis [2].On the other hand, quantum spectroscopy based on quantum statistics of detecting photons using photon resolved detectors and cameras [3,4] (see also qCMOS by Hamamatsu [5]) as we have shown recently [6], makes it feasible to count individual emitters in a focal volume.
Techniques of fluorescence microscopy have become amongst the most used techniques for understanding biological function [7][8][9][10][11][12].This typically involves measurement of the uptake of functionalised fluorophores, or observation of the expression of fluorescent proteins in response to some stimulus [8][9][10][11][12].Fluorescence experiments are usually qualitative, or at least relative, as the total number of fluorophores is often not knowable because a conventional intensity measurement is unable to distinguish few bright emitters from many dim emitters [8].Moreover, if the fluorophores are excited and emit in the same spectral windows, then they may be impossible to distinguish with intensity only measurements.In [13], the average number of emitters in each species and the brightness ratio between multiple species are investigated and evaluated using high-order image correlation spectroscopy.
Earlier, we showed that the problem of quantitative determination of the number of emitters and the probability of photon detection could be solved for a single species of emitters with assumed identical properties [6].Our approach there used the binomial distribution of the number of photons emitted in a pulsed fluorescence experiment.By considering photon number resolving detectors (PNRD) [4,[14][15][16][17][18][19][20][21] we showed that the distribution of photons arriving in each detected pulse is uniquely specified by the number of emitters and the photon detection probability, so that these parameters can be determined with some confidence given a particular measurement record.
Here we show that PNRD detection techniques can, in principle, be used to discriminate between an arbitrary number of fluorescent species.We assume that each species is defined by two parameters, the number of emitters, M , and the probability of detection p, which is assumed to be constant for each member of the species, for instance, by each species having a different transition dipole moment.We assume no further ability to distinguish the species.This paper is organised as follows.We first introduce the measurement model and maximum likelihood estimation (MLE) approach.We then discuss how the expectationmaximization (EM) technique is applied.Finally we present simulations and Cramér-Rao lower bounds (CRLB) for the cases of one, two and three species.

II. MEASUREMENT MODEL AND MLE
We consider an fluorescence experiment where there are m distinct fluorophores (species); species j has M j emitters with probability of photon detection from each member of that species p j .A single excitation source (e.g.laser through microscope objective) excites the sample, and the fluorescence photons are collected by a PNRD.Schematic with four species emitters is shown in Fig. 1.This experiment is repeated a large number of times to build up statistics about the system.The number of fluorescence photons detected from each pulse, y are counted at the outputs of the emitters; this is modeled as the sum of m random variables sampled from The probability mass function (PMF) of Y = y, y = 0, curve or bar chart is always one, indicating that in a practical measurement the photon counts or the intensity from each species is the same, therefore they cannot be identified by conventional intensity-only based microscopy.However they generate distinguishing PNRD signals, building on which we introduce the MLE approach to tell them apart.0, 1, • • • , M j , and so can be derived as where Y y is actually the collection of partition of the integer y into exactly m parts.
The fundamental problem of interest is to estimate θ based on the measurements.
We show that this problem can be formulated as an MLE.In [22], it is shown that the probability of the sum of m independent integer-valued random variables (not necessarily identically distributed), i.e. pr(Y = y|θ), may be calculated using a recurrence relation.Furthermore, the PMF of (1) can be approximated by a Gaussian distribution, N m j=1 M j p j , m j=1 M j p j (1 − p j ) .In [23][24][25], other more accurate approximation methods, such as saddlepoint approximation, Kolmogorov approximation, or Krawtchouk polynomial approximation, are provided.Since all the mentioned methods are either in the form of a recurrence formula or otherwise have no closed form, they cannot be directly used in deriving the MLE for θ.Accordingly, we use Eq. ( 2) for investigating the MLE.
In each experiment we record the peak corresponding photon number from the PNRD signal as i, i = 0, ...N and N <= M .We count the occurrence of i in a series of experiments as C i , then the data from a series of experiments can be given by the frequency distribution , and ν = N i=0 C i is the total number of experiments.The log-likelihood function can then be expressed as where, similar to Y y , Y i is the collection of partition of the integer i into m parts.Furthermore, we assume that f (y|θ) = m j=1 pr j (y j |M j , p j ).
where is the Cartesian product.From ( 2) to ( 4), the underlying estimation problem is formulated as a parameterised MLE problem; that is, seeking the set of parameters θ in the parameter space which yield maximum likelihood The solution to (4) when m = 1 was provided in [6], where the MLE is proved to be an effective estimator and the associated CRLB is derived.However, when m > 1, solving (4) directly is very inefficient and even computationally impossible since the dimension of the problem, i.e. the number of the parameters to be estimated, is 2m.This rules out grid based methods to find the MLE.With the increasing number of parameters, the existence of multiple local extrema confounds most optimization methods for finding the global extremum.
To investigate the determination of θ, we generated synthetic data using the PMF, Eq. ( 2), with total number of experiments ν.These synthetic data yield a histogram of events, as illustrated in Fig. 2. The synthetic data was generated on the basis of ν = 100 experiments, with parameter θ = [8, 0.1, 10, 0.2, 12, 0.3].Also shown in Fig. 2a, the expected PMF given by Eq. ( 2), the histogram of synthetic data when ν = 100, the Gaussian approximation to Eq. ( 2) are given.As the number of experiments increases, the synthetic data should converge to the expected PMF as shown in Fig. 2b.
Remark 1 In the estimation of the parameters of fluorescent species, one may be more interested in the number of emitters of each species rather than the probabilities, so that, ideally, more consideration should be given to that problem.However, in the estimation of [M j , p j ], their accuracies are correlated; that is, reduced accuracy of estimation of p j will lead to a worse estimation performance for M j .Additionally, because of the integral nature of M j , the estimator Mj may be well away from ground truth, corresponding to a small change of pj .Accordingly, we will not focus on a particular parameter in what follows.

III. EXPECTATION-MAXIMIZATION ALGORITHM
The EM algorithm is an effective method for finding the MLE (or local extremum of the likelihood) iteratively by a simplification of a complicated likelihood function.By carefully choosing the initial guess of the EM, the MLE is approached with high probability.
In this section, the likelihood function ), is firstly reformulated and simplified to an equivalent problem under the EM framework using the sum-of-log-of-sums method [26].As a result, the 2m dimension MLE problem is converted into m independent 2-dimension optimization problems for which local extrema can be found iteratively, see (11).
Since the likelihood has many local minima, the initial guess has a significant impact on the final estimate of the EM algorithm, i.e. the EM algorithm may converge to a local critical point with an "incorrect" initial guess.It should be noted that EM becomes more sensitive to the initial guess with increasing m, as the number of local extrema increases for larger m.Seeding the EM algorithm with a limited number of random initial guesses typically provides convergence to the MLE when m = 2, but the number of initial seeds becomes unacceptably large when m > 2. To overcome this problem, we combined the  moment estimator with EM algorithm; that is, the results of moment estimator are used as the initial guesses.

A. Reformulating the likelihood function for EM algorithm
We introduce an intermediate variable θ⟨s⟩ that is the estimate of θ at the s-th iteration of the EM algorithm.Given θ⟨s⟩ , from (3), we have log = where (6) follows by Jensen's inequality [26], with w ⟨s⟩ y = f (y| θ⟨s⟩ ) L(y| θ⟨s⟩ ) and y∈Y i w ⟨s⟩ y = 1.

B. Choice of the initial guess for the EM algorithm
In the EM algorithm, the log likelihood is guaranteed to increase at each EM iteration, and it converges to a maximum of the likelihood under mild conditions [27].However, it is unnecessary to be the global optimizer.
As an "always improving" algorithm [28], EM is, of course, sensitive to the initial guess of θ, i.e., θ⟨0⟩ when the likelihood function contains multiple critical points.We observe that the number of local critical points increases dramatically with the increasing number of species.
Algorithm 1: The Structure of the EM algorithm to Estimate θ Calculate p⟨s⟩ j using ( 12) and ( 13) When m > 1, ( 13) may converge to a local minimum that is not the MLE.To relieve this problem, a search procedure can be adopted into the EM algorithm.In (13), it can be observed that p ⟨s+1⟩ j is updated at each iteration step using M ⟨s⟩ j .By taking into account that p ⟨s+1⟩ j has a better chance to converge to p j when M ⟨s⟩ j converges to M j , one can fix M ⟨s⟩ j = M j , ∀s and j = 1, • • • , m, in (13) and then find the optimized p⟨s⟩  for k > 1.Then the moment estimator, θmom , can be obtained by solving μi = µ i for i = 1, • • • , m, and this can be used to seed the EM algorithm: θ⟨0⟩ = θmom .Since we may find multiple θmom from the moment estimator,the EM can be run in parallel with different initial guesses.
As an example, the moments µ i of the sum of m = 4 Binomial distributed random variables are, [24], which can be simplified to Applying the method of moments, we replace Algorithm 2: The algorithm to estimate θ combining searching strategy Calculating p⟨s⟩ j,l using ( 13) by replacing M ⟨s⟩ j with Mj,l ;

IV. CR ÁMER-RAO LOWER BOUND
In this section, we calculate the Fisher Information Matrix (FIM) and then the Cramér-Rao lower bound (CRLB) [29].The CRLB provides a lower bound for the variance of an unbiased estimator.The underlying likelihood function (3) contains continuous as well as discrete components so that the conventional method to derive CRLB may not be applicable.
A Cramér-Rao type bound for discrete likelihood function is proposed in [30].However, it cannot handle the distribution containing discrete and continuous components.In this paper, following the calculation in [6], an approximated CRLB is derived by approximating the discrete component x! by a continuous function xΓ(x).On the other hand, the MLE is typically asymptotically unbiased under mild conditions [31].From the simulation given in Section V, the proposed MLE asymptotically approaches the derived approximated CRLB.
Then the FIM, I(θ), can now be calculated by inserting (17) and ( 18) into ( 16).The CRLB is just C(θ) = I(θ) −1 .For ν experiments, the CRLB at ground truth θ 0 is Here, the algorithm is evaluated via Monte Carlo (MC) simulations.The metrics used to evaluate performance are aMAPE( X1:nMC,1:4 , X where X ∈ {M, P }, Xi,j is the estimate of X j at the i-th MC simulation, nM C is the number of MC simulations, and RMSE(•) is the Root Mean Square Error and aMAPE(•) is the averaged Mean Absolute Percentage Error.
To explore the convergence of our algorithm, in the following subsections we illustrate the performance for the one, two, and three distinct species.

A. One species
The case of determining the number and detection probability for one species is highly analogous to the case that we presented in ref. [6].We concentrate on the small number regime as this is more pertinent for the case that we are concentrating on with few emitters.
Here we assume that the true parameters are [M 1 , p 1 ] = [8, 0.1].Fig. 3 shows the convergence of the determination of the number of emitters and detection probability.

VI. CONCLUSION
In this paper, we have formulated, mathematically, the estimation of the parameters of an arbitrary number of fluorescent species.Specifically, the convolution binomial model We also found that the values of the parameters, the number of emitters and their probabilities, have impact on the performance of the estimation: the closer the probabilities are, the worse the performance is, and a larger summed numbers of emitters degrades the performance.
Our work provides a preliminary study and demonstrates the possibility for the estimation of an arbitrary number of fluorescent species and gives insights into the relationship between performance of the estimator and the values of the parameters, which improves understanding of the problem.In future work, we will seek (1) to reduce the required number of experiments while maintaining estimator performance and (2) to improve computational efficiency.

FIG. 2 :
FIG. 2: The comparison between the relative occurrence obtained from synthetic measurement from photons with different ν, (a) ν = 100 and (b) ν = 1e7, where θ = [8, 0.1, 10, 0.2, 12, 0.3].(c) shows the Binomial PMF for each species.As an additional comparison, the Normal approximation for the expected PMF is given.It can be seen that the histogram obtained from the synthetic data converges to the expected distribution of photons with the increasing of the ν.
unknown and to be estimated.However, since M 1 , . . ., M m are (bounded) integers so that their possible values are finite and listable by enumeration.Suppose that M j ≤ M , ∀j and M ∈ Z + , then a set,M = {M 1 , • • • , M L }, containing all possible combinations for [M 1 , • • • , M m ]can be constructed by m-combinations from the integer set {1, 2, • • • , M } without repetition, order does not matter and it can be verified that L = M n .In other words, M l , l = 1, • • • , L, corresponds to a possible solution (combination) to [M 1 , • • • , M m ].The EM algorithm is, therefore, implemented L times.At the l-th implementation, the guess [ M ⟨0⟩ 1 , • • • , M ⟨0⟩ m ] is chosen to be M l and fixed for all s.For simplicity, we denote [ Mm,l , • • • , Mm,l ] as the guess of [M 1 , • • • , M m ] at l-th implementation of EM algorithm.

2 .
Choice of initial guesses for [p 1 , • • • , p m ] To obtain initial guesses for the estimator of [p 1 , • • • , p m ], we use the estimates from the moment estimator.By using the data, it is straightforward to calculate the sample mean, μ1 = N i=0 C i ν i, and k-th sample central moments, μk = N i=0 C i ν (i − μ1 ) k for k > 1.The associated population mean and central moments are µ 1 = E[x] and µ k = E[(x − µ 1 ) k ] 15) and find the real solutions [p 1,l , • • • , pm,l ] to provide the initial guess of [p 1 , • • • , p m ] at the l-th implementation of EM algorithm.After obtaining multiple candidate estimates with different initial guesses, the estimated MLE among these candidates is the one with the largest value of the likelihood function (3).The algorithm is summarized in Algorithm 2.
of √ CRLB and Std of p 1 .

6 (b) The sampled standard deviation of M 1 .FIG. 3 : 1 =
FIG. 3: Given different number of experiments ν, the comparison between the √ CRLB and sampled standard deviation when the number of species is m = 1.
FIG. 5: Given different value of ν, the comparison between the √ CRLB and sampled standard deviation when the number of species is j = 2.

6 :
The required number of experiments, ν exp , to attain CRLB(M 2 )/M 2 = 1% while [M 1 , p 1 ] = [8, 0.1] as well as M 2 varies from 1 to 20 and p 2 from 0.05 to 0.95.Please note that ν is plotted in log 10 scale and the white pixel corresponds to the point that M 1 = M 2 = 8 and p 1 = p 2 = 0.1 so that the FIM is singular (the CRLB does not exist).

C
. Three species In the case of two species, assume that [M 1 , p 1 ] = [8, 0.1], [M 2 , p 2 ] = [10, 0.2] and [M 2 , p 2 ] = [12, 0.3].The number of experiments, ν, is set to different numbers, as shown in the figures.For each ν, the Monte Carlo simulation is implemented 100 times.The comparison of the simulated standard deviation and the computed squared root of the CRLB of p 1 for all ν is shown in Fig. 7, while the comparison of the estimates and the standard deviation of the estimated [M 1 , M 2 , M 3 ] is shown in Fig. 7 for the different species.
of √ CRLB and Std of p 1 .

FIG. 7 :
FIG. 7: Given different value of ν, the comparison between the √ CRLB and sampled standard deviation when the number of species is m = 2.