Fluorescence microscopy is of vital importance for understanding biological function. However, most fluorescence experiments are only qualitative inasmuch as the absolute number of fluorescent particles can often not be determined. Additionally, conventional approaches to measuring fluorescence intensity cannot distinguish between two or more fluorophores that are excited and emit in the same spectral window, as only the total intensity in that window can be obtained. Here we show that, by using photon number resolving experiments, we are able to determine the number of emitters and their probability of emission for a number of different species, all with the same measured spectral signature. We illustrate our ideas by showing the determination of the number of emitters per species and the probability of photon collection from that species, for one, two, and three otherwise unresolvable fluorophores. A convolution binomial model is presented to represent the counted photons emitted by multiple species, and the expectation-maximization (EM) algorithm is used to match the measured photon counts to the expected convolution binomial distribution. In applying the EM algorithm, to mitigate the problem of being trapped in a sub-optimal solution, the method of moments is introduced to provide an initial guess for the EM algorithm. Additionally, the associated Cramér–Rao lower bound is derived and compared with the simulation results.

## I. INTRODUCTION

Understanding complex biological function often requires precise localization of biological molecules in space and time to better understand their interactions and the relevant chemical reactions and transformations. Cryogenic electron microscopy is capable of defining the positions of atoms in complex molecules with angstrom accuracy^{1} but is often limited in capturing complex interactions of molecules in a dynamic biological environment. Fluorescence and Raman microspectroscopies can provide structural and functional information about biological molecules, though special measures are needed to extend the spatial resolution of optical imaging beyond the traditional diffraction limit defined by the excitation wavelength of light. The new generation of optical imaging methods based on super-resolution optical imaging, for which the Nobel Prize in Chemistry was awarded in 2014, is gradually being adopted by the research community, and these techniques are now indispensable for a fundamental understanding of biological function at the molecular level. However, many biological molecules perform their function not alone but in a coherent ensemble with other molecules. A typical example of such synergistic interaction is the electron-transfer complex in a mitochondrial membrane, where several cytochromes are involved in the ultimate production of ATP molecules, which serve as a major energy fuel for a cell. To understand the function of a mitochondrion and to assess its ability to efficiently produce ATP molecules, one needs to quantify the number of such cytochromes in a membrane, which is close to impossible using conventional methods. Several challenges, related to the size of the focal spot and the internal dynamics of mitochondria, make it difficult to localize those different cytochromes within the membrane.
Traditional approaches based on classical photon statistics have significant limitations, because of either their invasive nature, such as induced photobleaching, which affects the electronic structure of biological fluorophores and alters their function, or the complexity of signal collection and analysis.^{2} On the other hand, quantum spectroscopy based on the quantum statistics of detecting photons using photon-number-resolving detectors and cameras^{3,4} (see also the qCMOS camera by Hamamatsu^{5}), as we have shown recently,^{6} makes it feasible to count individual emitters in a focal volume.

Fluorescence microscopy has become one of the most widely used techniques for understanding biological function.^{7–12} Typical experiments involve measurement of the uptake of functionalized fluorophores, or observation of the expression of fluorescent proteins in response to some stimulus.^{8–12} Fluorescence experiments are usually *qualitative* or, at most, relative, as the total number of fluorophores is often not observable because a conventional intensity measurement is unable to distinguish a few bright emitters from many dim emitters.^{8} Moreover, if the fluorophores are excited and emit in the same spectral windows, then they may be impossible to distinguish with intensity-only measurements. In Ref. 13, the average number of emitters in each species and the brightness ratio between multiple species are investigated and evaluated using high-order image correlation spectroscopy.

Quantum correlation techniques use two-photon and higher coincidences to gain information about an incoming light field. As such, they have found application in imaging and localization,^{14–19} characterization,^{6,20} and the identification of non-classical states of light.^{21,22} The traditional method of measuring coincidences is the Hanbury Brown and Twiss experiment,^{23} where single photon avalanche detectors (SPADs) monitor the light received via a beamsplitter, or a sequence of beamsplitters in the case of higher-order coincidences. Systems that monitor higher-order coincidences in this fashion are sometimes referred to as photon number resolving detectors (PNRD) (see, for example, Ref. 22), although here we reserve this terminology for devices that are inherently photon resolving, such as those we mention above. A comparison of high-order Hanbury Brown and Twiss measurements with photon number resolving detectors can be found in Ref. 6.

Earlier, we showed that the problem of quantitative determination of the number of emitters and the probability of photon detection could be solved for a single species of emitters with assumed identical properties.^{6} Our approach there used the binomial distribution of the number of photons emitted in a pulsed fluorescence experiment. By considering photon number resolving detectors (PNRD),^{4,24–31} we showed that the distribution of photons arriving in each detected pulse is uniquely specified by the number of emitters and the photon detection probability, so that these parameters can be determined with some confidence given a particular measurement record.
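The distinguishing power of photon statistics over intensity alone can be seen directly from the binomial model: two configurations with identical mean intensity, a few bright emitters versus many dim ones, have different photon-number distributions. A minimal stdlib sketch with illustrative parameters of our own choosing (M = 2, p = 0.5 versus M = 10, p = 0.1; these are not values from the paper):

```python
from math import comb

def binom_pmf(n, M, p):
    """P(N = n) for N ~ Binomial(M, p): n detected photons from M emitters."""
    return comb(M, n) * p**n * (1 - p) ** (M - n)

# Two hypothetical configurations with identical mean photon number M * p = 1.
few_bright = [binom_pmf(n, 2, 0.5) for n in range(3)]    # 2 emitters, p = 0.5
many_dim = [binom_pmf(n, 10, 0.1) for n in range(11)]    # 10 emitters, p = 0.1

mean = lambda pmf: sum(n * q for n, q in enumerate(pmf))
var = lambda pmf: sum(n * n * q for n, q in enumerate(pmf)) - mean(pmf) ** 2

print(mean(few_bright), mean(many_dim))  # equal means: indistinguishable by intensity
print(var(few_bright), var(many_dim))    # unequal variances: distinguishable by statistics
```

An intensity measurement sees only the identical means; the photon-number distribution (here, the variance M p (1 − p)) separates the two cases.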

Here, we show that PNRD detection techniques can, in principle, be used to discriminate between an arbitrary number of fluorescent species. We assume that each species is defined by two parameters, the number of emitters, *M*, and the probability of detection, *p*, which is assumed to be constant for each member of the species, for instance, by each species having a different transition dipole moment. We assume no further ability to distinguish the species. The determination of multiple species is a highly non-trivial extension of our earlier work.^{6} The additional complexity arises from the multi-dimensional state space, which leads to multiple local extrema, confounding convergence to the ground truth. To mitigate these issues, we introduce a two-stage estimator, which we find improves convergence significantly. Our treatment assumes that an upper bound on the number of species is known *a priori* (some species could have either *p* = 0 or *M* = 0), but for computational simplicity here we will assume that the number of distinct species is known.

This paper is organized as follows. We first introduce the measurement model and maximum likelihood estimation (MLE) approach. We then discuss how the expectation-maximization (EM) technique is applied. Finally, we present simulations and Cramér–Rao lower bounds (CRLB) for the cases of one, two, and three species.

## II. MEASUREMENT MODEL AND MLE

We consider a fluorescence experiment where there are *m* distinct fluorophores (species); species *j* has *M _{j}* emitters with probability of photon detection from each member of that species *p _{j}*. A single excitation source (e.g., a laser focused through a microscope objective) excites the sample, and the fluorescence photons are collected by a PNRD. A schematic with three species of emitters is shown in Fig. 1.

Due to the numerical complexity of the current scheme, we make the simplifying assumption that each emitter within a species has the same detection probability. The point spread function of a microscope system means that as the emitters' positions vary, the collection efficiency also varies. For some spatial distribution of the emitters, the $i$th emitter from group $j$ will have collection probability $p_{i,j} = p_{i,j}(r_{i,j})$, where $r_{i,j}$ is the distance of the emitter from the focal point of the objective. If these detection probabilities overlap, then without additional information (for example, an *ansatz* describing the spatial distribution of the emitters) it will not be possible to distinguish bright emitters with large $r$ from less bright emitters with small $r$, which in turn implies that estimates of the $M_j$ will be problematic. These restrictions mean that we require the emitters to be confined to a region much smaller than the point spread function, or to use uniform illumination and an integrating sphere to collect the emission.

In each experiment, the total number of detected photons, $y$, is counted at the output; this is modeled as the sum of $m$ random variables sampled from binomial distributions with parameters $[M_j, p_j], \; j = 1, \ldots, m$, i.e.,

$$Y = \sum_{j=1}^{m} Y_j, \qquad Y_j \sim \mathrm{Binomial}(M_j, p_j). \tag{1}$$

The probability of detecting $y$ photons, $y = 0, \ldots, M$ with $M = \sum_{i=1}^{m} M_i$, given parameters $\theta = [M_1, p_1, M_2, p_2, \ldots, M_m, p_m]$, can be seen, by the law of total probability, to be a convolution of $m$ binomial distributions, $\mathrm{pr}_j(y_j | M_j, p_j)$ for $j = 1, \ldots, m$ and $y_j = 0, 1, \ldots, M_j$, and so can be derived as

$$\mathrm{pr}(Y = y \,|\, \theta) = \sum_{y_1 + \cdots + y_m = y} \; \prod_{j=1}^{m} \mathrm{pr}_j(y_j \,|\, M_j, p_j), \tag{2}$$

where the sum is over all compositions of $y$ into exactly $m$ parts. It should be noted that background noise, such as stray light, objective autofluorescence, and camera noise, is not considered in our model. These noise sources may be modeled by a Poisson or mixed Poisson–Gaussian model,^{32} and researchers have proposed methods to eliminate or suppress such noise.^{33–35} Although treating this noise is practically important, it represents a significant complication to our model, greatly increasing the space that must be explored and thereby distracting from the main goals of this paper. We, therefore, omit background noise in the model (2), leaving this as a future problem.
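Synthetic measurement records under this model can be drawn directly from Eq. (1); a stdlib-only sketch (the function name `sample_counts` is ours), here using the three-species parameters from the simulations below:

```python
import random

def sample_counts(theta, nu, seed=0):
    """Simulate nu pulses from Eq. (1): each pulse detects
    Y = sum_j Binomial(M_j, p_j) photons; returns the frequencies [C_0, ..., C_M]."""
    rng = random.Random(seed)
    M_total = sum(M for M, _ in theta)
    counts = [0] * (M_total + 1)
    for _ in range(nu):
        # one Bernoulli(p) trial per emitter, summed over all species
        y = sum(sum(rng.random() < p for _ in range(M)) for M, p in theta)
        counts[y] += 1
    return counts

# Three species, theta = [8, 0.1, 10, 0.2, 12, 0.3], over 1000 pulses.
C = sample_counts([(8, 0.1), (10, 0.2), (12, 0.3)], nu=1000)
print(sum(C))  # total number of experiments nu
```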

The fundamental problem of interest is to estimate $\theta$ based on the measurements. We show that this problem can be formulated as an MLE. In Ref. 36, it is shown that the probability of the sum of *m* independent integer-valued random variables (not necessarily identically distributed), i.e., $\mathrm{pr}(Y = y | \theta)$, may be calculated using a recurrence relation. Furthermore, the PMF of (1) can be approximated by a Gaussian distribution, $\mathcal{N}[\sum_{j=1}^{m} M_j p_j, \; \sum_{j=1}^{m} M_j p_j (1 - p_j)]$. In Refs. 37–39, other more accurate approximation methods, such as the saddlepoint approximation, Kolmogorov approximation, or Krawtchouk polynomial approximation, are provided. Since all the mentioned methods are either in the form of a recurrence formula or otherwise have no closed form, they cannot be directly used in deriving the MLE for $\theta$. Accordingly, we use Eq. (2) for investigating the MLE.
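Because Eq. (2) is a convolution, it can be evaluated numerically by convolving the individual binomial PMFs, which also allows the Gaussian-approximation moments to be checked. A stdlib sketch (helper names are ours):

```python
from math import comb

def binom_pmf(M, p):
    """PMF vector [P(0), ..., P(M)] of Binomial(M, p)."""
    return [comb(M, k) * p**k * (1 - p) ** (M - k) for k in range(M + 1)]

def mixture_pmf(theta):
    """Eq. (2): PMF of Y = sum_j Binomial(M_j, p_j) via discrete convolution."""
    pmf = [1.0]
    for M, p in theta:
        b = binom_pmf(M, p)
        out = [0.0] * (len(pmf) + M)
        for i, ai in enumerate(pmf):
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
        pmf = out
    return pmf

theta = [(8, 0.1), (10, 0.2), (12, 0.3)]
pmf = mixture_pmf(theta)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
# These match the Gaussian-approximation moments
# sum_j M_j p_j = 6.4 and sum_j M_j p_j (1 - p_j) = 4.84.
print(mean, var)
```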

Denote the number of photons detected in an experiment by $i$, $i = 0, \ldots, N$ with $N \le M$. We count the number of occurrences of $i$ in a series of experiments as $C_i$; then, the data from a series of experiments can be given by the frequency distribution $[C_0, C_1, \ldots, C_N]$, and $\nu = \sum_{i=0}^{N} C_i$ is the total number of experiments. The log-likelihood function can then be expressed as

$$\ell(C_0, \ldots, C_N \,|\, \theta) = \sum_{i=0}^{N} C_i \log \sum_{y \in \mathcal{Y}_i} f(y \,|\, \theta), \tag{3}$$

where $\mathcal{Y}_i$ is the set of all compositions of $i$ into $m$ parts. Furthermore, we assume that $f(y | \theta) = \prod_{j=1}^{m} \mathrm{pr}_j(y_j | M_j, p_j)$ and therefore $L(y | \theta) = \sum_{y \in \mathcal{Y}_i} f(y | \theta)$. The MLE is then

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \; \ell(C_0, \ldots, C_N \,|\, \theta). \tag{4}$$
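The log-likelihood (3) of a frequency distribution follows directly from the convolution PMF; a self-contained sketch (helper names are ours, and the count vector is purely illustrative):

```python
from math import comb, log, isfinite

def mixture_pmf(theta):
    """PMF of Y = sum_j Binomial(M_j, p_j) by discrete convolution (Eq. (2))."""
    pmf = [1.0]
    for M, p in theta:
        b = [comb(M, k) * p**k * (1 - p) ** (M - k) for k in range(M + 1)]
        out = [0.0] * (len(pmf) + M)
        for i, ai in enumerate(pmf):
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
        pmf = out
    return pmf

def log_likelihood(counts, theta):
    """Eq. (3): sum_i C_i log pr(Y = i | theta) over the frequency distribution."""
    pmf = mixture_pmf(theta)
    return sum(C * log(pmf[i]) for i, C in enumerate(counts) if C > 0)

# Toy frequency distribution [C_0, ..., C_3] (illustrative values only).
counts = [40, 35, 20, 5]
ll = log_likelihood(counts, [(8, 0.1), (10, 0.2)])
print(ll)
```

Maximizing this function over all $2m$ parameters is the MLE (4); the sections below address how to do so without an exhaustive grid search.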

The solution to (4) when *m* = 1 was provided in Ref. 6, where the MLE is proved to be an efficient estimator and the associated CRLB is derived. However, when *m* > 1, solving (4) directly is very inefficient and can be computationally infeasible, since the dimension of the problem, i.e., the number of parameters to be estimated, is 2*m*. This rules out grid-based methods for finding the MLE. With an increasing number of parameters, the existence of multiple local extrema confounds most optimization methods for finding the global extremum.

To investigate the determination of $\theta$, we generated synthetic data using the PMF, Eq. (2). The synthetic data were generated on the basis of *ν* = 100 experiments, with parameter $\theta = [8, 0.1, 10, 0.2, 12, 0.3]$, and yield a histogram of events, as illustrated in Fig. 2. Figure 2(a) shows the expected PMF given by Eq. (2) together with the histogram of synthetic data for *ν* = 100. As the number of experiments increases, the synthetic data converge to the expected PMF, as shown in Fig. 2(b).

**Remark 1.** *In the estimation of the parameters of fluorescent species, one may be more interested in the number of emitters of each species than in the probabilities, so that, ideally, more consideration should be given to that problem. However, in the estimation of $[M_j, p_j]$, their accuracies are correlated; that is, reduced accuracy in the estimation of $p_j$ will lead to worse estimation performance for $M_j$. Additionally, because of the integer nature of $M_j$, the estimator $\hat{M}_j$ may be well away from the ground truth, corresponding to only a small change in $\hat{p}_j$. Accordingly, we will not focus on a particular parameter in what follows.*

## III. EXPECTATION-MAXIMIZATION ALGORITHM

The EM algorithm is an effective method for finding the MLE (or a local extremum of the likelihood) iteratively by simplifying a complicated likelihood function. With a carefully chosen initial guess, the EM algorithm approaches the MLE with high probability.

In this section, the likelihood function $\ell(C_0, \ldots, C_N | \theta)$, i.e., (3), is first reformulated and simplified to an equivalent problem under the EM framework using the sum-of-log-of-sums method.^{40} As a result, the 2*m*-dimensional MLE problem is converted into *m* independent two-dimensional optimization problems whose local extrema can be found iteratively; see (12).

Since the likelihood has many local extrema, the initial guess has a significant impact on the final estimate of the EM algorithm; i.e., the EM algorithm may converge to a local critical point from an “incorrect” initial guess. It should be noted that EM becomes more sensitive to the initial guess with increasing *m*, as the number of local extrema increases for larger *m*. Seeding the EM algorithm with a limited number of random initial guesses typically provides convergence to the MLE when *m* = 2, but the number of initial seeds required becomes unacceptably large when *m* > 2. To overcome this problem, we combine the moment estimator with the EM algorithm; that is, the results of the moment estimator are used as the initial guesses.

**Remark 2.** *We mention that it is non-trivial to find the global optimum, i.e., the exact MLE, of the likelihood function* (3), *since the likelihood function is highly complex and exhibits multiple local extrema. Some existing optimization methods, such as simulated annealing,*^{41} *particle swarm optimization,*^{42} *and stochastic gradient descent,*^{43,44} *are available to find the global optimum for simple cost functions, but there is no guarantee that these methods can find the global optimum for highly nonlinear functions with multiple local optima, such as arise here in the case of multiple species.*

### A. Reformulating the likelihood function for EM algorithm

Let $\hat{\theta}^{\langle s \rangle}$ denote the estimate of $\theta$ at the $s$th iteration of the EM algorithm. Given $\hat{\theta}^{\langle s \rangle}$, from (3), we apply the sum-of-log-of-sums reformulation^{40} with $w_y^{\langle s \rangle} = f(y | \hat{\theta}^{\langle s \rangle}) / L(y | \hat{\theta}^{\langle s \rangle})$ and $\sum_{y \in \mathcal{Y}_i} w_y^{\langle s \rangle} = 1$, which yields the standard EM auxiliary function $Q(\theta | \hat{\theta}^{\langle s \rangle})$.^{45} The problem then becomes to maximize $Q(\theta | \hat{\theta}^{\langle s \rangle})$ over $\theta$, i.e., (12), which separates into $m$ independent two-dimensional problems; maximizing the $j$th term of (12), we obtain the update equations (13) and (14).

Now, the EM algorithm can be implemented using an initial guess $\hat{\theta}_j^{\langle 0 \rangle}$ for $j = 1, \ldots, m$, iterating until $\hat{\theta}_j^{\langle s \rangle}$ converges to a (local) estimate. The structure of the EM algorithm is listed in Algorithm 1.

Data: $C_0, \ldots, C_N$

Result: $\hat{\theta} \leftarrow [\hat{M}_1, \hat{p}_1, \ldots, \hat{M}_m, \hat{p}_m]$

$s \leftarrow 0$;

Choose initial guesses $[\hat{M}_1^{\langle s \rangle}, \ldots, \hat{M}_m^{\langle s \rangle}]$ and $[\hat{p}_1^{\langle s \rangle}, \ldots, \hat{p}_m^{\langle s \rangle}]$;

repeat

$s \leftarrow s + 1$;

Calculate $\hat{p}_j^{\langle s \rangle}$ using (13) and (14) for $j = 1, \ldots, m$;

until converged;

$\hat{\theta} \leftarrow [\hat{M}_1^{\langle s \rangle}, \hat{p}_1^{\langle s \rangle}, \ldots, \hat{M}_m^{\langle s \rangle}, \hat{p}_m^{\langle s \rangle}]$;

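The per-iteration structure can be sketched numerically for fixed $[M_1, \ldots, M_m]$. The code below is a standard complete-data EM step for a sum of binomials, written by us as an illustration (it is not a verbatim transcription of Eqs. (13) and (14)): the E-step weights each composition $y$ of an observed total $i$ by $w_y = f(y|\theta)/L(y|\theta)$, and the M-step sets each $\hat{p}_j$ to the expected fraction of successes.

```python
from math import comb, prod
from itertools import product

def bpmf(k, M, p):
    """Binomial PMF: probability of k detections from M emitters."""
    return comb(M, k) * p**k * (1 - p) ** (M - k)

def em_step(counts, Ms, ps):
    """One EM iteration with the emitter numbers Ms held fixed.
    E-step: expected per-species counts given each observed total i.
    M-step: p_j <- expected successes / (nu * M_j)."""
    m, nu = len(Ms), sum(counts)
    expected = [0.0] * m
    for i, C in enumerate(counts):
        if C == 0:
            continue
        # all compositions y of i into m parts with y_j <= M_j
        comps = [y for y in product(*(range(M + 1) for M in Ms)) if sum(y) == i]
        fs = [prod(bpmf(y[j], Ms[j], ps[j]) for j in range(m)) for y in comps]
        L = sum(fs)
        for y, f in zip(comps, fs):
            for j in range(m):
                expected[j] += C * (f / L) * y[j]
    return [expected[j] / (nu * Ms[j]) for j in range(m)]

# Fixed-point check with exact expected counts for a toy ground truth:
Ms, ps = [2, 3], [0.5, 0.2]
pmf = [sum(prod(bpmf(y[j], Ms[j], ps[j]) for j in range(2))
           for y in product(range(3), range(4)) if sum(y) == i)
       for i in range(6)]
counts = [1000 * q for q in pmf]
print(em_step(counts, Ms, ps))  # the true ps are a fixed point of the update
```

Seeded at the ground truth with exactly distributed counts, the update returns the same probabilities, as an EM fixed point should.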

### B. Choice of the initial guess for the EM algorithm

In the EM algorithm, the log likelihood is guaranteed to increase at each EM iteration, and it converges to a maximum of the likelihood under mild conditions.^{45} However, it is not guaranteed to be the global optimum.

As an “always improving” algorithm,^{46} EM is, of course, sensitive to the initial guess of $\theta $, i.e., $ \theta \u0302 \u27e8 0 \u27e9$ when the likelihood function contains multiple critical points. We observe that the number of local critical points increases dramatically with the increasing number of species.

#### 1. Choosing initial guesses for $ [ M 1 , \u2026 , M m ]$

When *m* > 1, (14) may converge to a local extremum that is not the MLE. To mitigate this problem, a search procedure can be adopted within the EM algorithm. In (14), it can be observed that $\hat{p}_j^{\langle s+1 \rangle}$ is updated at each iteration step using $\hat{M}_j^{\langle s \rangle}$. Taking into account that $\hat{p}_j^{\langle s+1 \rangle}$ has a better chance of converging to $p_j$ when $\hat{M}_j^{\langle s \rangle}$ converges to $M_j$, one can fix $\hat{M}_j^{\langle s \rangle} = M_j, \; \forall s$ and $j = 1, \ldots, m$, in (14) and then find the optimized $\hat{p}_j^{\langle s \rangle}$ given $[M_1, \ldots, M_m]$ iteratively.

In practice, $[M_1, \ldots, M_m]$ is unknown and must be estimated. However, since $M_1, \ldots, M_m$ are (bounded) integers, their possible values are finite and can be listed by enumeration. Suppose that $M_j \le M, \; \forall j$ with $M \in \mathbb{Z}^+$; then a set $\mathcal{M} = \{\mathcal{M}_1, \ldots, \mathcal{M}_L\}$ containing all possible combinations of $[M_1, \ldots, M_m]$ can be constructed from the *m*-combinations of the integer set $\{1, 2, \ldots, M\}$ (without repetition, order immaterial), and it can be verified that $L = \binom{M}{m}$. In other words, each $\mathcal{M}_l, \; l = 1, \ldots, L$, corresponds to a possible solution (combination) for $[M_1, \ldots, M_m]$. The EM algorithm is, therefore, implemented *L* times. At the *l*th implementation, the guess $[\hat{M}_1^{\langle 0 \rangle}, \ldots, \hat{M}_m^{\langle 0 \rangle}]$ is chosen to be $\mathcal{M}_l$ and fixed for all *s*. For simplicity, we denote by $[\hat{M}_{1,l}, \ldots, \hat{M}_{m,l}]$ the guess of $[M_1, \ldots, M_m]$ at the *l*th implementation of the EM algorithm.
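The enumeration of candidate emitter numbers is the standard combinations routine; a short sketch with an illustrative bound of our own choosing ($M = 6$, $m = 2$):

```python
from itertools import combinations
from math import comb

M_bound, m = 6, 2  # illustrative upper bound on M_j and number of species
# m-combinations of {1, ..., M_bound}, without repetition, order immaterial
candidates = list(combinations(range(1, M_bound + 1), m))
print(len(candidates))  # L = C(6, 2) = 15 candidate assignments of [M_1, M_2]
```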

#### 2. Choice of initial guesses for $ [ p 1 , \u2026 , p m ]$

To obtain initial guesses for the estimator of $ [ p 1 , \u2026 , p m ]$, we use the estimates from the moment estimator. By using the data, it is straightforward to calculate the sample mean, $ \mu \u0302 1 = \u2211 i = 0 N ( C i / \nu ) i$, and *k*th sample central moments, $ \mu \u0302 k = \u2211 i = 0 N ( C i / \nu ) ( i \u2212 \mu \u0302 1 ) k$ for *k* > 1. The associated population mean and central moments are $ \mu 1 = E [ x ]$ and $ \mu k = E [ ( x \u2212 \mu 1 ) k ]$ for *k* > 1. Then, the moment estimator, $ \theta \u0302 mom$, can be obtained by solving $ \mu \u0302 i = \mu i$ for $ i = 1 , \u2026 , m$, and this can be used to seed the EM algorithm: $ \theta \u0302 \u27e8 0 \u27e9 = \theta \u0302 mom$. Since we may find multiple $ \theta \u0302 mom$ from the moment estimator, the EM can be run in parallel with different initial guesses.
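For the single-species case the moment equations have a closed form, which illustrates how sample moments seed the EM algorithm: for $Y \sim \mathrm{Binomial}(M, p)$, the mean is $Mp$ and the variance is $Mp(1-p)$, so $\hat{p} = 1 - \hat{\mu}_2/\hat{\mu}_1$ and $\hat{M} = \hat{\mu}_1/\hat{p}$. A sketch of that special case (the multi-species system is solved numerically instead):

```python
def binomial_moments_inverse(mu1, mu2):
    """Single-species method of moments: mu1 = M*p (mean),
    mu2 = M*p*(1 - p) (variance). Returns (M_hat, p_hat)."""
    p_hat = 1 - mu2 / mu1
    M_hat = mu1 / p_hat
    return M_hat, p_hat

# Exact moments for ground truth [M, p] = [8, 0.1]: mean 0.8, variance 0.72.
M_hat, p_hat = binomial_moments_inverse(0.8, 0.72)
print(M_hat, p_hat)
```

With exact moments the ground truth is recovered; with sample moments the output is only an initial guess, to be refined by EM.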

For example, the moments $\mu_i$ of the sum of $m = 4$ binomially distributed random variables are given by Eq. (16).^{38}

Applying the method of moments, we replace $[M_1, \ldots, M_m]$ by $[\hat{M}_{1,l}, \ldots, \hat{M}_{m,l}]$ and $[\mu_1, \ldots, \mu_m]$ by $[\hat{\mu}_{1,l}, \ldots, \hat{\mu}_{m,l}]$ in (16) and find the real solutions $[\hat{p}_{1,l}, \ldots, \hat{p}_{m,l}]$ to provide the initial guess of $[p_1, \ldots, p_m]$ at the *l*th implementation of the EM algorithm. After obtaining multiple candidate estimates with different initial guesses, the estimated MLE among these candidates is the one with the largest value of the likelihood function (3). The algorithm is summarized in Algorithm 2.

Data: $C_0, \ldots, C_N$

Result: $\hat{\theta} \leftarrow [\hat{M}_1, \hat{p}_1, \ldots, \hat{M}_m, \hat{p}_m]$

$\{\mathcal{M}_1, \ldots, \mathcal{M}_L\} \leftarrow \mathrm{nchoosek}(N, m)$;

$k \leftarrow 1$;

for $l = 1 : L$ do

$sol = \mathrm{solve}(\sum_{j=1}^{m} \hat{M}_{j,l} \, p_j^i = \hat{\mu}_i, \; i = 1, \ldots, m)$;

if $sol$ is real then

$s \leftarrow 0$;

$\hat{p}_{1,l}^{\langle s \rangle}, \ldots, \hat{p}_{m,l}^{\langle s \rangle} \leftarrow sol$;

repeat

$s \leftarrow s + 1$;

Find $\hat{p}_{j,l}^{\langle s \rangle}$ using (14) by replacing $\hat{M}_j^{\langle s \rangle}$ with $\hat{M}_{j,l}$;

until converged;

$\hat{\theta}_k = [\hat{M}_{1,l}, \hat{p}_{1,l}^{\langle s \rangle}, \ldots, \hat{M}_{m,l}, \hat{p}_{m,l}^{\langle s \rangle}]$;

$e_k = \ell(\hat{\theta}_k \,|\, C_0, \ldots, C_N)$;

$k \leftarrow k + 1$;

$I \leftarrow \arg\max_i \{e_i\}$;

$\hat{\theta} \leftarrow \hat{\theta}_I$;


## IV. CRAMÉR–RAO LOWER BOUND

In this section, we calculate the Fisher information matrix (FIM) and then the Cramér–Rao lower bound (CRLB).^{47} The CRLB provides a lower bound on the variance of an unbiased estimator. The underlying likelihood function (3) contains continuous as well as discrete components, so the conventional method for deriving the CRLB may not be applicable. A Cramér–Rao-type bound for discrete likelihood functions is proposed in Ref. 48; however, it cannot handle distributions containing both discrete and continuous components. In this paper, following the calculation in Ref. 6, an approximate CRLB is derived by approximating the discrete component $x!$ by the continuous function $x\Gamma(x)$. On the other hand, the MLE is typically asymptotically unbiased under mild conditions.^{49} In the simulations of Sec. V, the proposed MLE asymptotically approaches the derived approximate CRLB.
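As a sanity check on the general calculation, the single-species case with $M$ known reduces to the textbook binomial Fisher information $I(p) = M/\big(p(1-p)\big)$ per experiment, giving $\mathrm{CRLB}(p) = p(1-p)/(\nu M)$. A sketch of that special case only (not of the paper's full Eqs. (17)–(19)):

```python
def crlb_p_single(M, p, nu):
    """CRLB for p with M known: Binomial(M, p) has Fisher information
    I(p) = M / (p * (1 - p)) per experiment, so over nu experiments
    Var(p_hat) >= p * (1 - p) / (nu * M)."""
    return p * (1 - p) / (nu * M)

# Ground truth [M_1, p_1] = [8, 0.1] used in the simulations below.
bound = crlb_p_single(8, 0.1, nu=10**6)
print(bound)  # lower bound on the variance of an unbiased estimate of p
```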

The $(k, l)$th element of the FIM for (2), $I(\theta)_{k,l}$, is given by (17). By applying the chain rule, we are able to calculate $\partial f(y | \theta) / \partial \theta_k$ using (18) and (19). The FIM, $I(\theta)$, can then be calculated by inserting (18) and (19) into (17). The CRLB is simply $C(\theta) = I(\theta)^{-1}$. For $\nu$ experiments, the CRLB at the ground truth $\theta_0$ is $C_\nu(\theta_0) = (1/\nu) \, C(\theta)|_{\theta = \theta_0}$.

## V. SIMULATION

In this section, the performance of the proposed estimator is evaluated via Monte Carlo (MC) simulation, where $\hat{X}_j^i$ denotes the estimate of a parameter $X_j$ at the $i$th MC simulation, $n_{MC}$ is the number of MC simulations, $\mathrm{RMSE}(\cdot)$ is the root mean square error, and $\mathrm{aMAPE}(\cdot)$ is the averaged mean absolute percentage error.
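These error measures, as we use them here (a sketch of our definitions; the paper's Eq. (21) is not reproduced in this excerpt), are:

```python
def rmse(estimates, truth):
    """Root mean square error of a set of Monte Carlo estimates."""
    n = len(estimates)
    return (sum((x - truth) ** 2 for x in estimates) / n) ** 0.5

def amape(estimates, truth):
    """Absolute percentage error, averaged over Monte Carlo runs."""
    n = len(estimates)
    return sum(abs(x - truth) / truth for x in estimates) / n

# Toy Monte Carlo record: three estimates of a parameter whose truth is 8.
print(rmse([7, 8, 9], 8), amape([7, 8, 9], 8))
```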

### A. One species

The case of determining the number and detection probability for one species is highly analogous to the case that we presented in Ref. 6. We concentrate on the small-number regime, as this is most pertinent to the few-emitter applications on which we focus.

Here, we assume that the true parameters are $ [ M 1 , p 1 ] = [ 8 , 0.1 ]$. Figure 3 shows the convergence of the determination of the number of emitters and detection probability.

Additionally, in Fig. 4, the (expected) required number of experiments, *ν _{exp}*, to attain a given estimation performance, $\mathrm{CRLB}(M_1)/M_1 = 1\%$, is plotted for $M_1 = 1, \ldots, 20$ and $p_1 \in [0.05, 0.95]$. It can be seen that, to achieve the fixed performance, a small value of *M*_{1} or a large value of *p*_{1} requires fewer experiments, which gives some insight into how the parameter values relate to the estimation performance.

### B. Two species

We now consider the case of two species. As before, we set $[M_1, p_1] = [8, 0.1]$ and introduce $[M_2, p_2] = [10, 0.2]$ as the ground truth for the second species. We vary the number of experiments *ν* and examine the reduction in the RMSE of these parameters as *ν* increases, as shown in Fig. 5. Note that our simulations estimate both species simultaneously. For each *ν*, the Monte Carlo simulation is implemented 100 times. The comparison of the simulated RMSE and the computed square root of the CRLB of *p*_{1} for all *ν* is shown in Fig. 5(a) (the trends for *p*_{2} are not shown since they are similar to those for *p*_{1}), while the comparison of the estimates and the RMSE of the estimated $[M_1, M_2]$ is shown in Fig. 5(b) for both species.

The number of experiments required to obtain $\mathrm{CRLB}(M_1)/M_1 = 1\%$ and $\mathrm{CRLB}(M_2)/M_2 = 1\%$ as a function of *M*_{2} and *p*_{2} is shown in Fig. 6, for fixed $[M_1, p_1] = [8, 0.1]$. Though $[M_1, p_1]$ are fixed, they are treated as unknown parameters in the simulation and estimated simultaneously with $[M_2, p_2]$. The trends here are highly analogous to those of Fig. 4, albeit with a larger number of experiments for the specified precision. These results show that typically fewer experiments are required for lower *M*_{2}, except where *p*_{2} approaches 1, although the high-*p*_{2} limit is not expected to be achievable in standard microscopy-type setups.

### C. Three species

We now illustrate our general approach by considering the simultaneous determination of the properties of three species. As before, we maintain the same ground truth for species 1 and 2, such that $[M_1, p_1] = [8, 0.1], \; [M_2, p_2] = [10, 0.2]$, and introduce $[M_3, p_3] = [12, 0.3]$. As before, we explore the estimation as a function of the number of experiments, with 100 Monte Carlo simulations per condition (i.e., for each value of *ν*). Convergence of the *p* is similar for all species, so we only show the RMSE of *p*_{1} as a function of *ν*, which is plotted in Fig. 7(a). The overall trends here are similar to those observed for single-species and two-species identification, although the three-species estimation requires around four orders of magnitude more experiments than the two-species estimation to achieve comparable RMSE in the *p*, with convergence to the square root of the CRLB occurring after around 10^{14} experiments.

We show the estimation of *M*_{1}, *M*_{2}, and *M*_{3} for the same ground truth in Fig. 7(b). These follow similar trends, with the estimation of *M*_{2} being slightly worse than *M*_{1} or *M*_{3} for the same number of experiments. The results show that the estimation requires approximately four orders of magnitude more experiments to achieve similar precision as two-species estimation.

Due to the increasing complexity with increasing number of species, we were unable to simulate the case of more than three species, nor to explore the state space for three species in the form of Figs. 4 and 6.

Comparing Figs. 3, 5, and 7, it is noticeable that the *ν* required for *p*_{1} to attain the CRLB increases rapidly with increasing *m*: $\nu \approx 10^6$ for *m* = 1, $\nu \approx 10^{10}$ for *m* = 2, and $\nu \approx 10^{14}$ for *m* = 3. Furthermore, the aMAPEs of $[\hat{M}_1, \ldots, \hat{M}_m]$ [see Eq. (21)] are plotted in Fig. 8, where $[M_1, p_1] = [8, 0.1], \; [M_2, p_2] = [10, 0.2]$, and $[M_3, p_3] = [12, 0.3]$. One can see, from Figs. 3, 5, 7, and 8, that the performance of estimating *p _{j}* and *M _{j}* is correlated, with roughly four orders of magnitude more experiments needed for each additional species estimated simultaneously.

It can also be seen from the simulation results that, to achieve a given value of $\mathrm{CRLB}(M_1)/M_1$, the required number of experiments is significantly large, roughly 10^{12}, when the probabilities to be estimated are small. This is largely because, when the probabilities *p _{i}* are small, the empirical distribution $[C_0, \ldots, C_N]$ is difficult to obtain to a given accuracy from a limited number of experiments. As a result, achieving the 1% level for $\mathrm{CRLB}(M_1)/M_1$ requires a large number of experiments. These results may improve if *a priori* information is accessible, and so they should be considered worst-case scenarios.

## VI. CONCLUSION

In this paper, we have formulated, mathematically, the estimation of the parameters of an arbitrary number of fluorescent species. Specifically, the convolution binomial model is presented for the underlying problem, and the exact MLE is then derived. To resolve the intractability of the MLE, which is manifest in the convolutional structure and the high dimensionality of the parameters, a version of the EM algorithm incorporating the method of moments for choosing the initial guess is proposed. Simulation results with different numbers of species demonstrate the efficiency of the algorithm by comparison with the derived CRLB.

We also found that the values of the parameters, the number of emitters and their probabilities, affect the performance of the estimation: the closer the probabilities are, the worse the performance is, and a larger total number of emitters degrades the performance. Although at present we have assumed that the number of species is known *a priori*, future work could compare models, for example, via the Akaike information criterion,^{50} to ease this restriction.

Our work provides a preliminary study, demonstrates the possibility of estimating an arbitrary number of fluorescent species, and gives insights into the relationship between the performance of the estimator and the values of the parameters, improving understanding of the problem. In future work, we will seek (1) to reduce the required number of experiments while maintaining estimator performance, (2) to improve computational efficiency, (3) to introduce experimental noise signatures, and (4) to identify how prior information may help to constrain the estimation problem in particular situations of interest.

## ACKNOWLEDGMENTS

The authors would like to thank Brett C. Johnson for his great help in the visualization of the study. This work was funded by the Air Force Office of Scientific Research (No. FA9550-20-1-0276). A.D.G. also acknowledges funding from the Australian Research Council (Nos. CE140100003 and FT160100357). V.V.Y. acknowledges partial support from the Air Force Office of Scientific Research (AFOSR) (Nos. FA9550-20-1-0366 and FA9550-23-1-0599), the National Institutes of Health (NIH) (Nos. 1R01GM127696, 1R21GM142107, and 1R21CA269099). This material is also based upon work supported by the NASA, BARDA, NIH, and USFDA, under Contract/Agreement No. 80 ARC023CA002E.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Wenchao Li:** Data curation (equal); Formal analysis (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Shuo Li:** Formal analysis (equal); Investigation (equal); Visualization (equal); Writing – review & editing (equal). **Timothy C. Brown:** Formal analysis (equal); Methodology (equal); Writing – review & editing (equal). **Qiang Sun:** Formal analysis (equal); Investigation (equal); Writing – review & editing (equal). **Xuezhi Wang:** Formal analysis (equal); Methodology (equal); Writing – review & editing (equal). **Vladislav V. Yakovlev:** Conceptualization (equal); Funding acquisition (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Allison Kealy:** Funding acquisition (equal); Methodology (equal); Writing – review & editing (equal). **Bill Moran:** Formal analysis (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Andrew Greentree:** Conceptualization (equal); Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Fluorescence Microscopy in Life Sciences*

*Handbook of Mixture Analysis*

*Theory of Point Estimation*

*Breakthroughs in Statistics: Foundations and Basic Theory*