Emerging techniques now allow for precise quantification of distributions of biological molecules in single cells. These rapidly advancing experimental methods have created a need for more rigorous and efficient modeling tools. Here, we derive new bounds on the likelihood that observations of single-cell, single-molecule responses come from a discrete stochastic model, posed in the form of the chemical master equation. These strict upper and lower bounds are based on a finite state projection approach, and they converge monotonically to the exact likelihood value. The bounds allow one to discriminate rigorously between models with a minimal level of computational effort. In practice, they can be incorporated into stochastic model identification and parameter inference routines, improving the accuracy and efficiency of endeavors to analyze and predict single-cell behavior. We demonstrate the applicability of our approach using simulated data for three example models as well as experimental measurements of a time-varying stochastic transcriptional response in yeast.

## I. INTRODUCTION

Many physical, chemical, and biological processes are characterized by discrete particles that randomly fluctuate in space, time, or number. These microscopic fluctuations often provide the key to understanding and modifying the mechanisms that control macroscopic phenomena. As one example, stochastic fluctuations in the discrete numbers of specific genes, RNAs, or proteins across genetically identical populations of cells play an increasingly important role in the understanding of gene regulation.^{1} Emerging experimental techniques, such as flow cytometry, single-cell RNA sequencing, and single-molecule fluorescence *in situ* hybridization (smFISH),^{2–4} allow for the precise quantification of these fluctuations at the single-cell level. Several approaches have been developed to fit models to the statistical moments,^{5–7} stochastic trajectories,^{8} or full probability distributions^{4} of data collected with these experimental techniques. However, substantial work remains to establish more rigorous and efficient approaches to integrate stochastic analyses with single-cell experimental data.

A little over ten years ago, the finite state projection (FSP^{9}) approach was introduced to approximate the solution of the Chemical Master Equation (CME^{10,11}) and to capture the dynamics of discrete molecular events that control single-cell gene regulation. Since that time, the FSP has received substantial attention, has seen numerous computational improvements, and has become a benchmark tool in the analysis of stochastic gene regulation. Most recently, the FSP has been used to fit and predict experimental measurements of RNA transcription in yeast, bacteria, and human cells.^{12} The main utility of the FSP is to provide precise bounds on the accuracy of its approximation as well as a systematic approach to improve that accuracy. However, improved accuracy comes with increased computational cost, and no attention has been given to how one could optimize this tradeoff. Careful evaluation of this tradeoff is needed to improve the rigor and efficiency with which FSP models can be matched to experimentally measured data. In this work, we develop new FSP-based bounds on the likelihood of single-cell data given a stochastic model. We show how these bounds can be used to reduce computational costs without compromising accuracy. Finally, we use a combination of simulated and experimentally collected single-cell data to demonstrate how the co-design of FSP tools and experimental data can lead to efficient inference of discrete stochastic models.

## II. MATHEMATICAL BACKGROUND

Like many single-molecule kinetic events, gene expression is often modeled as a Markov process, where each discrete state $\mathbf{x}_i = [\xi_1, \xi_2, \xi_3, \ldots, \xi_N]_i^T$ corresponds to the integer numbers of *N* chemical species (e.g., RNA or protein). Transition events between states are different reactions, such as transcription, translation, or degradation, and these reactions can be indexed by *μ* ∈ {1, 2, …, *M*}. Each reaction occurs with a propensity *w*_{μ}(**x**_{i}) *dt*, which is the probability that the *μ*th reaction will occur in the next infinitesimal time step (*t*, *t* + *dt*) given the current state **x**_{i}. State transitions are described as **x**_{i} → **x**_{j} = **x**_{i} + **s**_{μ}, where **s**_{μ} is the stoichiometry vector that describes the change in population after the *μ*th reaction. In such models, each node has a continuous-valued probability *P*(**x**_{i}, *t*) that evolves in time according to the linear ODE known as the chemical master equation (CME),^{10,11}

$$\frac{d}{dt} P(\mathbf{x}_i, t) = \sum_{\mu=1}^{M} \left[ w_\mu(\mathbf{x}_i - \mathbf{s}_\mu)\, P(\mathbf{x}_i - \mathbf{s}_\mu, t) - w_\mu(\mathbf{x}_i)\, P(\mathbf{x}_i, t) \right]. \qquad (1)$$

By enumerating all possible states {**x**_{1}, **x**_{2}, …} ∈ **X** and their corresponding probabilities, $\mathbf{P} = [P(\mathbf{x}_1, t), P(\mathbf{x}_2, t), \ldots]^T$, the CME can be posed in matrix form as $\frac{d}{dt}\mathbf{P}(t) = \mathbf{A}\mathbf{P}(t)$, where **A** is known as the *infinitesimal generator* (examples of **A** are provided in Secs. IV and V).
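As a concrete sketch, the generator **A** and the matrix-exponential solution of $\frac{d}{dt}\mathbf{P} = \mathbf{A}\mathbf{P}$ can be assembled for a simple birth-death process. The rates, truncation size, and time below are illustrative choices, not values from the text:

```python
import numpy as np
from scipy.linalg import expm

def birth_death_generator(k, gamma, n_max):
    """Truncated infinitesimal generator for a birth-death process.

    A[j, i] holds the rate of the transition x_i -> x_j; the diagonal holds
    minus the total outflow of each state, so every column sums to <= 0
    (mass produced past n_max is simply lost, as in an FSP truncation)."""
    n = n_max + 1
    A = np.zeros((n, n))
    for x in range(n):
        A[x, x] = -(k + gamma * x)          # total outflow from state x
        if x + 1 <= n_max:
            A[x + 1, x] = k                 # production: x -> x + 1
        if x >= 1:
            A[x - 1, x] = gamma * x         # degradation: x -> x - 1
    return A

# Solve dP/dt = A P from P(0) concentrated at x = 0.
k, gamma, t = 10.0, 1.0, 2.0
A = birth_death_generator(k, gamma, n_max=60)
P0 = np.zeros(61)
P0[0] = 1.0
P = expm(A * t) @ P0
```

With the truncation far above the mean copy number k/γ, almost no probability escapes and **P** sums to nearly one; shrinking `n_max` makes the lost mass visible.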

The CME dimension is often infinite, making it impossible to solve directly for most systems. The finite state projection (FSP) approach allows one to approximate the CME solution within strict error bounds.^{4,9,13} In its formulation, the FSP approach selects a finite set of indices, *J* = {*j*_{1}, …, *j*_{L}}, with which it separates the full state space **X** into two exhaustive and disjoint sets, **X**_{J} = {**x**_{j_1}, …, **x**_{j_L}} and its complement **X**_{J′}. Under this reorganization, the full master equation can be written in matrix form as

$$\frac{d}{dt} \begin{bmatrix} \mathbf{P}_J \\ \mathbf{P}_{J'} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{JJ} & \mathbf{A}_{JJ'} \\ \mathbf{A}_{J'J} & \mathbf{A}_{J'J'} \end{bmatrix} \begin{bmatrix} \mathbf{P}_J \\ \mathbf{P}_{J'} \end{bmatrix}. \qquad (2)$$

To approximate the CME, the FSP approach then forms a finite-state Markov process in which all states in **X**_{J′} are aggregated into one or more sink states, denoted by *g*, which record the probability mass that leaves **X**_{J}. However, all probability mass within *g* is restricted to remain in *g* as time proceeds. The new, reduced FSP-CME becomes

$$\frac{d}{dt} \begin{bmatrix} \mathbf{P}_J^{\mathrm{FSP}} \\ g \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{JJ} & \mathbf{0} \\ -\mathbf{1}^T \mathbf{A}_{JJ} & 0 \end{bmatrix} \begin{bmatrix} \mathbf{P}_J^{\mathrm{FSP}} \\ g \end{bmatrix}. \qquad (3)$$

The resulting approximation in Eq. (3) provides three key insights into the exact CME solution. First, it provides a lower bound on the true solution,

$$\mathbf{P}_J^{\mathrm{FSP}}(t) \le \mathbf{P}_J(t) \quad \text{(elementwise)}. \qquad (4)$$

This can be interpreted easily by noting that probability can only leave **X**_{J} in the FSP-CME (Eq. (3)) but can return from **X**_{J′} to **X**_{J} in the original CME (Eq. (2)). Second, the FSP provides an exact measure of the approximation error,

$$g(t) = 1 - \left| \mathbf{P}_J^{\mathrm{FSP}}(t) \right|_1, \qquad (5)$$

where | ⋅ |_{1} denotes the absolute sum of a vector. Finally, as states are added to the set **X**_{J}, the error *g*(*t*) decreases monotonically. Proofs of these three results are provided in Refs. 13 and 14.
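The three results above can be checked numerically. The following sketch augments a truncated birth-death generator with a single absorbing sink (the rates are illustrative, not a system from the text) and verifies that probability is conserved, that the error shrinks as states are added, and that the truncated solution is an elementwise lower bound:

```python
import numpy as np
from scipy.linalg import expm

def fsp_with_sink(k, gamma, n_max, t):
    """FSP for a birth-death process with one absorbing sink state.

    The sink (last index) collects every transition that exits the
    projection, so its content is exactly the FSP error g(t) of Eq. (5)."""
    n = n_max + 1
    A = np.zeros((n + 1, n + 1))
    for x in range(n):
        A[x, x] = -(k + gamma * x)
        if x + 1 <= n_max:
            A[x + 1, x] = k
        else:
            A[n, x] = k                 # boundary production -> sink
        if x >= 1:
            A[x - 1, x] = gamma * x
    P0 = np.zeros(n + 1)
    P0[0] = 1.0
    P = expm(A * t) @ P0
    return P[:n], P[n]                  # (FSP solution on X_J, g(t))

P_small, g_small = fsp_with_sink(10.0, 1.0, n_max=8, t=2.0)
P_large, g_large = fsp_with_sink(10.0, 1.0, n_max=40, t=2.0)
```

The small projection sits well below the process mean, so `g_small` is large; enlarging the projection drives the error toward zero while every retained probability can only grow.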

The state space of the FSP, **X**_{J}, can be defined through use of polynomial projection shapes,^{13}

$$\mathbf{X}_J = \left\{ \mathbf{x}_i : f_k(\mathbf{x}_i) \le c_k \ \text{for all } k \right\}. \qquad (6)$$

Here, {*f*_{k}(**x**_{i})} is a set of polynomials of the population counts, and the constraints {*c*_{k}} are weights on these polynomials that may be increased (decreased) to include more (fewer) states. In practice, each *k*th constraint is associated with its own sink, *g*_{k}, which aggregates all states that satisfy the {1, …, (*k* − 1)}th constraints but not the *k*th constraint. The value of *g*_{k}(*t*) then quantifies the probability of violating the *k*th constraint, which in turn guides the systematic increase of *c*_{k}. With this expansion, the FSP algorithm presented in Refs. 9 and 13 can be used to select an **X**_{J} that makes $g(t) = \sum_k g_k(t)$ small over a specified finite time. However, lower error requires more states and greater computational expense. Until now, little attention has been given to how small *g*(*t*) should be, especially for comparison to a given set of experimental data. In this article, we show how the combination of experimental data and the FSP results in Eqs. (4) and (5) reveals a natural requirement for the accuracy of *g*(*t*).

## III. FSP-DERIVED BOUNDS ON THE LIKELIHOOD

To quantify how experimental data affect the accuracy requirement for the FSP, we consider single-molecule, single-cell data, such as those obtained using the technique of smFISH. This technique allows experimentalists to count the number of specific RNA molecules in individual cells. Let **D** = {*d*_{1}, *d*_{2}, …} be the set of data corresponding to *d*_{i} cells with measured populations corresponding to each state **x**_{i}. Define $I_D$ as the subset of all *i* such that *d*_{i} ≠ 0 (i.e., the support of the observed data). Assuming independence of individual cells, the logarithm of the likelihood of observing the data **D** is

$$\log L(\mathbf{D}) = \sum_{i \in I_D} d_i \log P(\mathbf{x}_i). \qquad (7)$$

In many cases, only partial state observations are available (e.g., only one mRNA species is measured); with no loss of generality, **P** may be projected onto the observable dimensions prior to the summation in Eq. (7). With these definitions in place, we now explore how the FSP bounds affect the computation of the likelihoods of experimental data.
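A minimal sketch of the likelihood computation in Eq. (7), using hypothetical cell counts and model probabilities:

```python
import numpy as np

def log_likelihood(d, P):
    """Log-likelihood of histogram data d given model distribution P, Eq. (7):
    log L(D) = sum over the data support I_D of d_i * log P(x_i)."""
    d = np.asarray(d, dtype=float)
    P = np.asarray(P, dtype=float)
    support = d > 0                        # the index set I_D
    with np.errstate(divide="ignore"):
        # A state observed in the data but given zero model probability
        # drives log L to -inf, which is why the FSP support matters.
        return np.sum(d[support] * np.log(P[support]))

# Hypothetical histogram of 500 cells against a hypothetical model.
d = np.array([100, 250, 150])              # cells observed with x = 0, 1, 2
P = np.array([0.2, 0.5, 0.3])              # model probabilities for x = 0, 1, 2
logL = log_likelihood(d, P)
```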

The FSP solution $\mathbf{P}_J^{\mathrm{FSP}}$ is guaranteed to be a lower bound on the model's true solution **P** = [*P*(**x**_{1}), *P*(**x**_{2}), …] by Eq. (4). The log-likelihood in Eq. (7) is monotonic in each *P*(**x**_{i}); therefore, $\mathbf{P}_J^{\mathrm{FSP}}$ provides a lower bound on the log-likelihood of **D** given the model,

$$\mathrm{LB}_J(\mathbf{D}) = \sum_{i \in I_D} d_i \log P_J^{\mathrm{FSP}}(\mathbf{x}_i) \le \log L(\mathbf{D}). \qquad (8)$$

However, in Eq. (3) the FSP also provides the exact error in the solution of a particular model described by the CME. By redistributing the known FSP error back onto the CME solution in an optimal manner, an upper bound on log *L*(**D**) can be derived,

$$\mathrm{UB}_J(\mathbf{D}) = \max_{\epsilon_i \ge 0,\; \sum_i \epsilon_i = g} \; \sum_{i \in I_D} d_i \log\!\left( P_J^{\mathrm{FSP}}(\mathbf{x}_i) + \epsilon_i \right) \ge \log L(\mathbf{D}), \qquad (9)$$

where ε_{i} is the probability error redistributed to state **x**_{i}. To optimize the redistribution of *g* and determine UB_{J}(**D**), we developed a modified water-filling algorithm similar to those used to allocate power among channels in communication systems.^{15,16} To simplify notation, we define the FSP probability for each state as $p_i \equiv P_J^{\mathrm{FSP}}(\mathbf{x}_i)$ and the corresponding portion of the total objective due to state **x**_{i} as *f*_{i} ≡ *d*_{i} log(*p*_{i} + ε_{i}). Clearly, adding probability mass to the **x**_{i} with the greatest ∂*f*_{i}/∂ε_{i} produces the largest increase in $\sum_i f_i$. When two or more of the {∂*f*_{i}/∂ε_{i}} are maximal and equal, probability mass must be redistributed to increase all members of that set equally. At the optimal redistribution, all **x**_{i} to which probability mass has been added have identical values of ∂*f*_{i}/∂ε_{i}. This approach, where extreme values are first equalized before adjusting more moderate values, is analogous to raising the water level over a rough surface, filling holes to equal levels before covering hills.

The derivative of *f*_{i} with respect to ε_{i} is computed from Eq. (9) to get

$$\frac{\partial f_i}{\partial \epsilon_i} = \frac{d_i}{p_i + \epsilon_i}, \qquad (10)$$

and we define *N*_{D} as the number of distinct observations (i.e., the size of $I_D$). These values are then ranked in decreasing magnitude according to

$$\frac{\tilde{d}_1}{\tilde{p}_1 + \tilde{\epsilon}_1} \ge \frac{\tilde{d}_2}{\tilde{p}_2 + \tilde{\epsilon}_2} \ge \cdots \ge \frac{\tilde{d}_{N_D}}{\tilde{p}_{N_D} + \tilde{\epsilon}_{N_D}}, \qquad (11)$$

where the notation $(\tilde{\cdot})$ refers to the data-ordered state space $\tilde{\mathbf{X}}_D = \{\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_r, \ldots, \tilde{\mathbf{x}}_{N_D}\}$. An optimal redistribution of *g* equalizes the first *n* terms of Eq. (11) and satisfies the linear constraints

$$\frac{\tilde{d}_r}{\tilde{p}_r + \tilde{\epsilon}_r} = \frac{\tilde{d}_1}{\tilde{p}_1 + \tilde{\epsilon}_1} \quad \text{for } r = 2, \ldots, n, \qquad \sum_{r=1}^{n} \tilde{\epsilon}_r = g. \qquad (12)$$

For example, if the *n* = 4 greatest ∂*f*_{i}/∂ε_{i} can be equalized with ∑ε_{i} = *g*, the optimal redistribution can be found from the linear algebraic Eq. (12), which can be written in matrix form as follows:

$$\begin{bmatrix} \tilde{d}_2 & -\tilde{d}_1 & 0 & 0 \\ \tilde{d}_3 & 0 & -\tilde{d}_1 & 0 \\ \tilde{d}_4 & 0 & 0 & -\tilde{d}_1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \tilde{\epsilon}_1 \\ \tilde{\epsilon}_2 \\ \tilde{\epsilon}_3 \\ \tilde{\epsilon}_4 \end{bmatrix} = \begin{bmatrix} \tilde{d}_1 \tilde{p}_2 - \tilde{d}_2 \tilde{p}_1 \\ \tilde{d}_1 \tilde{p}_3 - \tilde{d}_3 \tilde{p}_1 \\ \tilde{d}_1 \tilde{p}_4 - \tilde{d}_4 \tilde{p}_1 \\ g \end{bmatrix}. \qquad (13)$$

In this formulation, the number of states to which probability is redistributed, *n*, is the largest dimension for which the solution of Eq. (13) is strictly positive for all ε_{i}. If the states **X**_{J} used by the approximation do not span the support of the data distribution, the corresponding values of *p*_{i} are zero, and the derivatives $\partial f_i/\partial \epsilon_i |_{\epsilon_i = 0}$ are infinite. Therefore, the optimal redistribution of {ε_{i}} will always include some mass for those states. Algorithm 1 provides pseudocode for the proposed error redistribution approach. At most, the algorithm requires *N*_{D} iterations, and in practice the computation of the upper bound takes only a fraction of the time needed for the FSP solution itself, especially when the data correspond to partial state observations.

**Algorithm 1.** Water-filling redistribution of the FSP error *g*.

1. Rank $\partial f_i / \partial \epsilon_i |_{\epsilon_i = 0}$ for all *i* in decreasing order.
2. Set *n* = 1 and ε_{1} = *g*.
3. **while** $\partial f/\partial \epsilon_{n+1} > \partial f/\partial \epsilon_{n}$ and *n* < *N*_{D} **do**
4. &nbsp;&nbsp;&nbsp;&nbsp;*n* → *n* + 1; solve for ε_{1}, ε_{2}, …, ε_{n} using Eq. (12).
5. **end while**
6. **ε** = [ε_{1}, …, ε_{n}] is the optimal redistribution of *g*.
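A compact sketch of Algorithm 1 follows. Rather than re-solving Eq. (12) as a linear system at each step, it uses the equivalent closed form obtained by equalizing *d*_{i}/(*p*_{i} + ε_{i}) over the top-ranked states; the counts and probabilities below are hypothetical:

```python
import numpy as np

def redistribute_error(d, p, g):
    """Water-filling redistribution of the FSP error g (a sketch of Algorithm 1).

    Chooses eps_i >= 0 with sum(eps) = g so that the upper-bound objective
    sum_i d_i*log(p_i + eps_i) is maximized: the largest marginal gains
    d_i/(p_i + eps_i) are equalized, like water filling holes in a surface."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    r = d / np.maximum(p, 1e-300)          # marginal gain at eps = 0
    idx = np.argsort(-r)                   # rank in decreasing order
    eps = np.zeros_like(p)
    for n in range(1, len(idx) + 1):
        top = idx[:n]
        # Equalizing d_i/(p_i + eps_i) = lam over the top-n states with
        # sum(eps) = g gives lam and the candidate eps in closed form.
        lam = d[top].sum() / (g + p[top].sum())
        cand = d[top] / lam - p[top]
        if np.all(cand > 0) and (n == len(idx) or r[idx[n]] <= lam):
            eps[top] = cand
            return eps
    raise RuntimeError("no feasible redistribution found")

# Example: an FSP solution missing g = 0.3 of the probability mass.
d = np.array([30.0, 50.0, 20.0])           # observed cell counts
p = np.array([0.2, 0.4, 0.1])              # FSP lower-bound probabilities
eps = redistribute_error(d, p, 0.3)
```

In this example *g* happens to equal 1 − ∑*p*_{i} exactly, so the redistributed distribution *p* + ε lands on the empirical frequencies *d*/∑*d*, the best case any model could achieve.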


The FSP-derived bounds on likelihoods have several important implications for the comparison of stochastic models to single-cell data. Let **Λ** denote a particular combination of a model and its parameters, and let *L*(**D**|**Λ**) denote the likelihood of **D** given **Λ**. In the case when **X**_{J} = ∅ (i.e., the FSP set is empty), all of the probability mass must be redistributed, and the FSP-derived upper bound is given by

$$\mathrm{UB}_{\varnothing}(\mathbf{D}) = \sum_{i \in I_D} d_i \log \frac{d_i}{\sum_j d_j} \ge \log L(\mathbf{D}|\mathbf{\Lambda}) \quad \text{for every } \mathbf{\Lambda}. \qquad (14)$$

This result is understood easily: the maximum of the log-likelihood function occurs when the distribution of the data exactly matches the model, and in this case the FSP upper bound describes the best that any potential model could ever do. To interpret the bounds for non-trivial FSP projections, we make use of the facts that (i) the FSP lower bound *p*_{i} increases monotonically as **X**_{J} is expanded^{9} and (ii) the likelihood increases monotonically in each *p*_{i}. As a result, LB_{J}(**D**) and UB_{J}(**D**) are guaranteed to be monotonically increasing and decreasing functions of the projection size, respectively. Figure 1 illustrates the converging upper and lower bounds on the likelihoods of two FSP models as the size of the index set *J*, or equivalently the size of **X**_{J}, is increased.

### A. Using FSP-derived bounds for model discrimination

For any two models and their parameter sets, **Λ**_{i} and **Λ**_{j}, we define the set of sufficient discriminating projections, **Φ**(**Λ**_{i}, **Λ**_{j}), as any pair of projection index sets, *J*_{i} and *J*_{j}, that guarantees the correct ranking of the likelihoods for the two models.

Intuitively, these are any two projections such that the worst possible likelihood of one model is greater than the best possible likelihood of the other. In Fig. 1, the red and green circles denote pairs of projections sufficient to guarantee that parameter set **Λ**_{2} is more likely than **Λ**_{1}. Because **Φ**(**Λ**_{1}, **Λ**_{2}) can contain an infinite number of such pairs with varying projection sizes, we define a minimal symmetric discriminatory projection, *ϕ*_{s}(**Λ**_{i}, **Λ**_{j}), as the smallest single projection that suffices for both models.

In Fig. 1, the blue circle denotes *ϕ*_{s}(**Λ**_{1}, **Λ**_{2}). Finally, in many sequential parameter searches, previous FSP models may already have been computed to high accuracy, and it may not be necessary to demand the same accuracy of subsequent models. For this case, we define a minimum non-symmetric projection size, *ϕ*_{i}(**Λ**_{i}, **Λ**_{j}), as the smallest projection for the new model that suffices given the existing computation.

The utility of this particular discriminatory projection size becomes important in parameter search problems, where it enables sequential likelihood evaluations to be conducted at minimal projection sizes. In Fig. 1, the green circles denote *ϕ*_{2}(**Λ**_{1}, **Λ**_{2}), a particular combination in which the size of **X**_{J2} is minimized given a previous computation for **X**_{J1}. As the examples below demonstrate, use of these minimal discriminating projections can substantially reduce the computational effort of parameter inference by eliminating much of the potential parameter space with smaller projection sizes.
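The discrimination logic can be sketched generically. The bound functions below are toy stand-ins that converge to fixed log-likelihoods; in practice they would come from FSP solutions and Algorithm 1:

```python
def discriminating_projection(lb, ub, sizes):
    """Grow the projection until the two models' bounds separate.

    lb(m, s) and ub(m, s) return lower/upper likelihood bounds for model m
    at projection size s; the FSP guarantees both are monotone in s.
    Returns the first size at which one model's UB drops below the other's
    LB (the symmetric case, phi_s)."""
    for s in sizes:
        if ub(1, s) < lb(2, s):
            return s, 2                 # model 2 is certainly more likely
        if ub(2, s) < lb(1, s):
            return s, 1                 # model 1 is certainly more likely
    return None, None

# Toy stand-in bounds converging to log-likelihoods -100 and -90.
lb = lambda m, s: (-100.0 if m == 1 else -90.0) - 50.0 / s
ub = lambda m, s: (-100.0 if m == 1 else -90.0) + 50.0 / s
size, winner = discriminating_projection(lb, ub, range(1, 200))
```

The search stops as soon as the intervals stop overlapping, which is exactly the point marked by the blue circle in Fig. 1.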

### B. Relationship of FSP bounds to other CME truncations

The FSP upper and lower bounds present an opportunity to better understand the relationships between the FSP and other CME approximations in the context of single-cell data. For example, several groups have imposed limits on individual species^{17} or total molecule populations,^{18} which result in truncated master equations with reflecting boundary conditions (in contrast to the FSP's absorbing boundary condition). In a similar vein, one could renormalize the FSP solution to make use of the original FSP computation. For this renormalization, the CME can be written in terms similar to the FSP formulation,

where $\alpha = |\mathbf{A}_{J'J} \mathbf{P}_J^{\mathrm{renorm}}|_1$ is the rate of flow of probability out of **X**_{J}, which is now redistributed back into **X**_{J} according to the probability $\mathbf{P}_J^{\mathrm{renorm}}$. However, the solution of this non-Markovian system is identical to simply renormalizing $\mathbf{P}_J^{\mathrm{FSP}}$ at all times,

$$\mathbf{P}_J^{\mathrm{renorm}}(t) = \frac{\mathbf{P}_J^{\mathrm{FSP}}(t)}{\left| \mathbf{P}_J^{\mathrm{FSP}}(t) \right|_1}. \qquad (19)$$

Because the FSP likelihood bounds provide the best- and worst-case redistributions of the probability absorbed in *g*, likelihoods computed by reflection, renormalization, or any other arbitrary strategy are guaranteed to lie between the computed FSP bounds. Therefore, it is possible that reflecting boundaries may provide an improved approximation of the true likelihood for a particular combination of data and model. Unfortunately, the likelihoods of reflected or renormalized solutions are not necessarily monotonic in the projection size, and the likelihoods of renormalized solutions for two different parameter sets or models may change rank depending on the projection size. These issues are addressed further in Sec. IV C.
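The renormalization of Eq. (19) is a one-line operation; the sketch below uses a hypothetical FSP solution and data to confirm that, since every entry can only grow, the renormalized likelihood sits at or above the FSP lower bound:

```python
import numpy as np

def renormalize_fsp(P_fsp):
    """Renormalized FSP solution in the sense of Eq. (19): the lost mass g
    is spread proportionally by dividing by the retained mass |P|_1."""
    P_fsp = np.asarray(P_fsp, dtype=float)
    return P_fsp / P_fsp.sum()

# A hypothetical FSP solution that has lost g = 0.2 of its mass.
P = np.array([0.1, 0.3, 0.4])
Pr = renormalize_fsp(P)
# Hypothetical cell counts for the same three states.
d = np.array([10.0, 30.0, 60.0])
```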

## IV. APPLICATION OF FSP-DERIVED BOUNDS

To demonstrate the application of the FSP bounds, this section uses simulated data and three examples of stochastic gene regulation: an unregulated birth-death model, a genetic toggle switch, and a non-linear self-activating gene.

### A. Birth-death model

The variability in mRNA copy numbers for housekeeping genes is well captured by the standard single-species birth and death model, where **x**_{i} = *x _{i}* is the discrete number of RNA. This model consists of two reactions that describe transcription and degradation as shown in Fig. 2(a),

where the propensity functions, **w** = {*w*_{1}, *w*_{2}}, are

Table I shows three different parameter sets for this example. A well-known analytical solution for this model, assuming *x*(0) = 0, is the time-varying Poisson distribution

$$P(x, t) = \frac{\lambda(t)^x e^{-\lambda(t)}}{x!}, \qquad \lambda(t) = \frac{k}{\gamma} \left( 1 - e^{-\gamma t} \right).$$
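This analytical solution, a Poisson distribution with mean λ(t) = (k/γ)(1 − e^{−γt}), is straightforward to evaluate; here we use the $\hat{\Lambda}$ row of Table I (k = 45, γ = 0.5) at an arbitrary illustrative time t = 1:

```python
import numpy as np
from scipy.stats import poisson

def birth_death_pmf(k, gamma, t, x):
    """Analytical CME solution for the birth-death model with x(0) = 0:
    Poisson with time-varying mean lam(t) = (k/gamma)*(1 - exp(-gamma*t))."""
    lam = (k / gamma) * (1.0 - np.exp(-gamma * t))
    return poisson.pmf(x, lam)

x = np.arange(200)
P = birth_death_pmf(45.0, 0.5, 1.0, x)   # Lambda-hat row of Table I, t = 1
```

Evaluating this exact solution gives the reference value of log *L*(**D**|**Λ**) against which the converging FSP bounds in Fig. 3 can be compared.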

The FSP formulation for this model can be written from Eq. (21) as

$$\mathbf{A}_J = \begin{bmatrix} -k & \gamma & & & \\ k & -(k+\gamma) & 2\gamma & & \\ & k & -(k+2\gamma) & \ddots & \\ & & \ddots & \ddots & N_m \gamma \\ & & & k & -(k + N_m \gamma) \end{bmatrix},$$

where *N*_{m} is the size of the FSP truncation (i.e., **X**_{J} = {*x*_{0}, *x*_{1}, …, *x*_{N_m}}). In this case, we used a single constraint function in Eq. (6), *f*_{1}(*x*_{i}) = *x*_{i} ≤ *c*_{1}, with *c*_{1} = *N*_{m}. To expand the state space for this model, *c*_{1} is simply increased by one in each iteration.

For this model and the parameters provided in the bottom row of Table I, we simulated 500 trajectories of the stochastic simulation algorithm^{19} and plotted the resulting "data" in Fig. 2(b) (black). For a projection size defined by *N*_{m} = 50, Fig. 2(b) also shows the FSP lower bound computed using Eq. (4) in red and the FSP upper bound from Eq. (9) in blue. Figure 3 demonstrates the convergence of the upper and lower FSP bounds as *c*_{1} is increased; results are plotted for two different parameter sets. Increasing *c*_{1} adds more states to **X**_{J} and monotonically decreases the error *g*. In turn, less error is available to be redistributed back onto the FSP solution, and UB_{J}(**Λ**) and LB_{J}(**Λ**) converge monotonically to the analytical value of log *L*(**D**|**Λ**).

### B. Toggle model

We next explore the application of the FSP bounds to a model of the genetic toggle switch^{20} in which two genes, *λcI* and *lacI*, mutually inhibit one another, as illustrated in Fig. 4(a). Here, we consider a simple model similar to that presented by Tian and Burrage.^{21} In this model, the system state is defined by the discrete populations of the two proteins, **x** = [*λcI*, *LacI*]^T. The four reactions are

where the propensity functions **w** = {*w*_{1}, *w*_{2}, *w*_{3}, *w*_{4}}, are given by

Table II provides the toggle model parameters that have been used to generate simulated data shown in black in Fig. 4(b).

The infinitesimal generator for this toggle model can be written as

To apply the FSP to truncate the toggle model CME, we consider three constraint functions from Eq. (6), where *c*_{1}, *c*_{2}, and *c*_{3} define the projection as

These constraints are illustrated in Fig. 5. By monotonically increasing the {*c*_{k}}, more states are included, and the error *g*(*t*) decreases. For simplicity of presentation, *c*_{2} and *c*_{3} were initially set at the high values of 95 and 55, respectively, and the expansion only modifies the criterion *c*_{1}. Similar results can be obtained with more general expansion routines, provided that each *c*_{k} is constant or monotonically increasing. Figure 4(b) shows the marginal probability distribution for LacI with *c*_{1} = 150, where the FSP lower and upper bounds are shown in red and blue, respectively. Although Algorithm 1 distributes the error onto the joint probability distribution of both species, results are plotted only for the marginal distribution. Figure 6 shows the converging likelihood bounds for two parameter sets as *c*_{1} is increased to expand the number of states captured by the projection. Because these bounds are monotonic, the analysis reveals a minimal symmetric projection size, *ϕ*_{s}, at which one can guarantee that the model in green has a greater likelihood than the model in blue.
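Constraint-based selection of a two-species state space can be sketched by brute force. The constraint forms below (a bound on each species plus a bound on their product) are illustrative stand-ins, not the exact shapes used for the toggle model:

```python
from itertools import product

def enumerate_projection(c, f_list, n_guess):
    """List all states x = (x1, x2) in [0, n_guess)^2 satisfying
    f_k(x) <= c_k for every constraint; growing any c_k only adds states."""
    return [x for x in product(range(n_guess), repeat=2)
            if all(f(x) <= ck for f, ck in zip(f_list, c))]

# Illustrative constraints: each species bounded, plus their product.
f_list = [lambda x: x[0], lambda x: x[1], lambda x: x[0] * x[1]]
states = enumerate_projection((10, 10, 25), f_list, n_guess=12)
bigger = enumerate_projection((10, 10, 40), f_list, n_guess=12)
```

Relaxing any single constraint produces a strict superset of states, which is the monotonic expansion property the FSP bounds rely on.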

### C. Comparing FSP bounds to other CME truncation approaches

To illustrate potential issues that arise through application of other CME truncation approaches, we consider a simple gene regulation model with nonlinear self-activation. In this model, the propensity of production is given by the positive feedback function

$$w_1(x) = k_1 + k_2 \frac{x^n}{m^n + x^n},$$

and the propensity of degradation is a first order process given by *w*_{2} = *γx*. In this formulation, *k*_{1} is the rate of production for small values of *x*, *k*_{1} + *k*_{2} is the rate of production for large values of *x*, *m* is the value of *x* at which the rate of production is halfway between *k*_{1} and *k*_{1} + *k*_{2}, and the cooperativity factor *n* determines the steepness of the activation function. In the deterministic regime, this model of self-regulated gene expression can lead to one or two stable equilibria as illustrated in Fig. 7. In the stochastic regime, this corresponds to bimodal distributions in some parameter regimes, and unimodal distributions in other regimes. Table III provides three possible parameter sets for this model. We simulated data from the first of these models, **Λ**_{True}, which admits a single stable point (see the dashed line in Fig. 7) and yields a unimodal distribution of data as shown in black in Fig. 8.
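The self-activating production propensity described above can be written directly; the defaults are the **Λ**_{True} values of Table III:

```python
import numpy as np

def w_production(x, k1=20.0, k2=40.0, m=70.0, n=4):
    """Self-activating production propensity: k1 at low copy number, rising
    with Hill cooperativity n to k1 + k2 at high copy number (defaults are
    the Lambda_True values of Table III)."""
    x = np.asarray(x, dtype=float)
    return k1 + k2 * x**n / (m**n + x**n)
```

At x = m the rate is exactly halfway between k1 and k1 + k2, matching the definition of m in the text.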

**TABLE I.** Parameters for the birth-death model.

| Λ | *k* (N_{r} t^{−1}) | *γ* (t^{−1}) |
|---|---|---|
| Λ_{1} | 50.0 | 0.5 |
| Λ_{2} | 49.65 | 0.5 |
| $\hat{\Lambda}$ | 45.0 | 0.5 |

**TABLE II.** Parameters for the toggle model.

| Λ | *b*_{λcI} (s^{−1}) | *k*_{λcI} (s^{−1}) | *α*_{LacI} (N^{−η_{LacI}}) | *η*_{LacI} (−) | *γ*_{λcI} (N^{−1} s^{−1}) | *b*_{LacI} (s^{−1}) | *k*_{LacI} (s^{−1}) | *α*_{λcI} (N^{−η_{λcI}}) | *η*_{λcI} (−) | *γ*_{LacI} (N^{−1} s^{−1}) |
|---|---|---|---|---|---|---|---|---|---|---|
| Λ_{1} | 6.8 × 10^{−5} | 1.6 × 10^{−2} | 6.1 × 10^{−3} | 2.1 | 6.7 × 10^{−4} | 2.2 × 10^{−3} | 1.7 × 10^{−2} | 2.6 × 10^{−3} | 3.0 | 3.8 × 10^{−4} |
| Λ_{2} | 6.8 × 10^{−5} | 1.6 × 10^{−2} | 6.1 × 10^{−3} | 2.1 | 8.0 × 10^{−4} | 2.2 × 10^{−3} | 1.7 × 10^{−2} | 2.6 × 10^{−3} | 3.0 | 3.8 × 10^{−4} |
| $\hat{\Lambda}$ | 6.8 × 10^{−5} | 1.4 × 10^{−2} | 6.1 × 10^{−3} | 2.1 | 6.7 × 10^{−4} | 2.2 × 10^{−3} | 1.6 × 10^{−2} | 2.6 × 10^{−3} | 3.0 | 3.8 × 10^{−4} |

**TABLE III.** Parameters for the self-activating gene model.

| Λ | *k*_{1} (1/s) | *k*_{2} (1/s) | *m* (N) | *n* (−) | *γ* (1/s) |
|---|---|---|---|---|---|
| Λ_{True} | 20 | 40 | 70 | 4 | 1 |
| Λ_{A} | 22.5 | 40 | 70 | 4 | 1 |
| Λ_{B} | 20 | 125 | 70 | 4 | 1 |

Next, we consider two perturbations of this true parameter set,

where **Λ**_{A} and **Λ**_{B} correspond to the red and green lines, respectively, in Figs. 7 and 8, and their parameters are given in Table III. For **Λ**_{B}, the system has a bimodal response, and for **Λ**_{A}, the response is unimodal. It is interesting to explore how the application of the renormalization scheme in Eq. (19) would affect the comparison of these two models to simulated data. For a projection size of 45, Fig. 8(a) suggests that **Λ**_{B} provides a better match to the data than **Λ**_{A}. Moreover, the likelihood of the data given **Λ**_{A} and **Λ**_{B} appears to be nearly constant over a substantial portion of the projection space, as shown by the dashed lines and the shaded region of Fig. 8(b). Based on this information, it would be easy to conclude that **Λ**_{B} is the more appropriate parameter set. However, only at large projections that include the second peak in the distribution for **Λ**_{B} does it become apparent that **Λ**_{A} is the better choice. This scenario illustrates how the renormalization scheme can complicate parameter discrimination for certain combinations of models and data. Similar cautions also apply to reflecting boundary approximations of the CME (see dotted lines in Fig. 8). The strict upper and lower bounds provided by the FSP offer a rigorous approach to eliminate this ambiguity as a function of projection size.

It should also be noted that likelihood computations using reflection- or renormalization-based truncations require the support of the truncated CME to include that of the experimental data. Otherwise, these approaches will match the FSP lower bound, which suggests that the data are infinitely unlikely. Such a lower bound may appear useless at first, but as we will see in Sec. V, the FSP upper bound may still be sufficient for rigorous and efficient model selection.

## V. FSP LIKELIHOOD BOUNDS IN PARAMETER SEARCHES

The FSP's tightening upper and lower bounds on the likelihood enable rigorous discrimination between two parameter sets, **Λ**_{1} and **Λ**_{2}, without resorting to unnecessarily strict error tolerances. The following examples demonstrate the utility of the sufficient discriminatory projections **Φ**(**Λ**_{i}, **Λ**_{j}), *ϕ*_{s}(**Λ**_{i}, **Λ**_{j}), and *ϕ*_{i}(**Λ**_{i}, **Λ**_{j}).

### A. Parameter search for the birth-death and toggle models

We return to the simulated data presented in Figs. 2 and 4, but this time we apply the FSP for many different parameter combinations and many different projections. Figure 9 illustrates the practical strength of the minimal symmetric discriminating projection, *ϕ*_{s}(**Λ**_{i}, **Λ**_{j}). For the birth-death model in Figs. 9(a) and 9(b), the parameter **Λ** = *k*_{r} is allowed to vary. Figure 9(a) shows the likelihood of the data as a function of **Λ**, and Fig. 9(b) shows the size of the sufficient symmetric projection, $\phi_s(\mathbf{\Lambda}, \hat{\mathbf{\Lambda}})$, needed to discriminate between **Λ** and a fixed parameter set $\hat{\mathbf{\Lambda}}$. Similarly, for the toggle model, both maximum production rates were varied, $\mathbf{\Lambda} = [k_{\lambda cI}, k_{LacI}]$. In this case, Fig. 9(c) shows contour plots of the likelihoods, and Fig. 9(d) shows the corresponding contours of the size of $\phi_s(\mathbf{\Lambda}, \hat{\mathbf{\Lambda}})$. For models whose likelihoods are substantially better or worse than that of $\hat{\mathbf{\Lambda}}$, Figs. 9(b) and 9(d) show that the comparison can be made with smaller projection sizes. Considering that the solution of Eq. (3) has a complexity of order $O(n^2)$ or worse, depending on the solution scheme,^{22} such reductions can lead to substantial computational savings. In past studies, the FSP has been solved to uniformly strict error tolerances, such as 10^{−6} in Neuert *et al.*;^{4} consideration of the FSP bounds allows these tolerances to be relaxed by several orders of magnitude.

### B. Using FSP bounds to compare CME models using experimental data

To demonstrate the application of the FSP bounds on real data, we examine a recent model and single-cell experimental data for Mitogen Activated Protein Kinase (MAPK) control of *STL1* gene regulation in *S. cerevisiae* (budding yeast). Yeast activate a variety of regulatory pathways to mitigate the osmotic pressure difference that arises from solute imbalance across the cell membrane. One such mechanism is the high-osmolarity glycerol (HOG) pathway.^{23} Upon osmotic shock, the Hog1 kinase is phosphorylated and localizes to the nucleus of the cell, where it initiates transcription of several genes, including *STL1*. After cells adapt to the new condition, the kinase leaves the nucleus, and the transcription pathways turn back off. Interestingly, while the nuclear localization and transcription initiation follow a largely deterministic temporal signal, transcript abundance varies considerably between isogenic cells. This variability has been quantified in detail using the smFISH technique to measure transcript abundances in single cells at sixteen different time points from zero to 60 min after osmotic shock.^{4}

The current time-varying CME model of the *STL1* regulation process allows the gene to switch among four possible states with different transcription rates, as shown in Fig. 10(a). Reactions that change the gene from state *i* to state *j* occur with propensities {*k*_{ij}}, and the transcription rate is *k*_{ri} in the *i* = {1, 2, 3, 4}th gene state. In this model, one particular transition rate, *k*_{21}, varies as a function of the Hog1p kinase level in the nucleus as

where the temporal signal profile for Hog1p was measured experimentally^{4} and is reproduced in Fig. 10(b). As a result of this dependence on a time-varying parameter, the infinitesimal generator for the CME is a function of time. The FSP truncation of the CME can be written as

where N_{m} is the maximum number of RNA in the FSP approximation, and the matrices **S**, **T**, and **Γ** are given by

Figure 11 shows examples of distributions for six time points during the osmotic shock response. Experimental data^{4} (black) were collected using smFISH, and the FSP lower bound for a moderate projection size is shown in red. At each experimentally measured time point during the dynamic process, the FSP error *g*(*t*) is computed based on the FSP truncation, and the FSP upper bound on log*L*(**D**|**Λ**) (shown in blue) is computed using Algorithm 1. In this illustrative example, we note that the projection size is substantially smaller than the support of the experimental data, yet the reduced FSP adequately captures the distribution, especially for the earlier time points. Because a good model must capture early and late time points, this observation suggests that smaller projections may be quite informative for model discrimination.

To explore the impact that experimental data have on the required FSP projection, we use the non-symmetric minimal projection to determine the projection size needed for parameter discrimination. In this case, each sequential comparison of two models begins with a previous FSP model that has already been solved to a known precision. For example, in Fig. 12, we plot the FSP error bounds versus the projection size for two Hog-*STL1* model parameter sets, **Λ**_{N} and **Λ**_{N+1}. If the likelihood is already known for the Nth parameter set (horizontal dashed line in Figs. 12(a) and 12(b)), then the FSP model for the (N + 1)th parameter set need only be solved with a projection size corresponding to *ϕ*_{N+1}. Figure 12 marks this minimal non-symmetric projection size with black circles. In many cases, such as that shown in Fig. 12(b), the new parameter set is worse than the previous one, and the necessary discriminating projection size can be much smaller than the support of the experimental data. Such situations, in which the next parameter set is worse than the current one, are the norm in a typical parameter search.
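The non-symmetric comparison logic above can be sketched as a simple loop. Here `bounds_for_size` is a hypothetical callable that solves the candidate FSP at a given projection size and returns the resulting (lower, upper) bounds on log *L*:

```python
def discriminating_projection(known_loglik, bounds_for_size, sizes):
    """Grow the candidate FSP until it can be ranked against a model whose
    likelihood is already known (the non-symmetric minimal projection)."""
    for n in sizes:
        lower, upper = bounds_for_size(n)
        if upper < known_loglik:
            return n, "reject"    # candidate provably worse at this size
        if lower > known_loglik:
            return n, "accept"    # candidate provably better at this size
    return sizes[-1], "undetermined"
```

For a clearly worse candidate the loop exits at a small projection, which is exactly why worse-than-current proposals, the common case in a parameter search, are cheap to discard.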

Figure 13(a) shows the likelihood of the experimental data versus two parameters in the Hog-*STL1* model, and Fig. 13(b) shows the size of the necessary discriminating projection for the new **Λ**_{N+1} given that the old **Λ**_{N} is at the black circle. For new parameters that give smaller likelihoods than **Λ**_{N} (i.e., those outside of the halo), parameter discrimination can be achieved with projection sizes that are a fraction of the support of the data. In fact, the median projection size needed to compare the old and new models is 70, compared to a data support size of 107.

| **Λ** | k_{12} (1/s) | α (1/s) | β (1/(s·[A.U.])) | k_{23} (1/s) | k_{32} (1/s) | k_{34} (1/s) | k_{43} (1/s) | k_{r1} (N_{r}/s) | k_{r2} (N_{r}/s) | k_{r3} (N_{r}/s) | k_{r4} (N_{r}/s) | γ (1/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Λ_{N−1} | 2.096 | 5406.3 | 25 116.6 | 0.009 79 | 0.008 68 | 0.0448 | 0.465 | 9.16 × 10^{−4} | 0.012 32 | 0.1372 | 1.953 | 5.53 × 10^{−3} |
| Λ_{N} | 2.096 | 5406.3 | 18 116.6 | 0.007 79 | 0.006 68 | 0.0448 | 0.465 | 9.16 × 10^{−4} | 0.012 32 | 0.1072 | 1.953 | 5.53 × 10^{−3} |
| Λ_{N+1} | 2.096 | 5406.3 | 30 116.6 | 0.013 79 | 0.010 68 | 0.0448 | 0.465 | 9.16 × 10^{−4} | 0.012 32 | 0.1372 | 1.953 | 5.53 × 10^{−3} |
| Λ_{fixed} | 2.096 | 5406.3 | 18 116.6 | 0.02 | 0.006 68 | 0.0448 | 0.465 | 9.16 × 10^{−4} | 0.012 32 | 0.1072 | 1.953 | 5.53 × 10^{−3} |


## VI. SUMMARY AND CONCLUSIONS

In recent years, substantial interest has arisen in integrating discrete stochastic models with single-cell experimental data. This interest has motivated many approaches to solve the chemical master equation, including stochastic simulations, moment closure analyses, and the finite state projection approach. Continued progress in this arena will open new biochemical processes to discrete, stochastic computational analysis. However, until now there has been little discussion of how accurate CME models need to be in order to adequately interpret experimental data. In this article, we have explored how careful consideration of experimental data can reduce computational complexity and enable more efficient and rigorous comparison of multiple models. We have shown that this advance can substantially reduce the complexity of model identification for single-cell gene regulation models using real data, and we believe this approach opens new doors for modeling gene regulation in many pathways and organisms.

In light of the above results, it would be interesting to re-examine other approaches to fit stochastic models to single-cell data. For example, a common and highly flexible tool for this task is the stochastic simulation algorithm (SSA^{19}). As one runs more and more SSA trajectories, the collected statistics converge to the solution of the CME, and the computed likelihood of the data given the model also converges to the correct value. Unlike the FSP approach derived here, the convergence of the SSA or other kinetic Monte Carlo schemes is not monotonic, and long distribution tails can be very difficult to estimate. However, although the SSA does not provide a direct computation of, or bounds on, its computational error, one can estimate the rate of convergence with increasing numbers of trajectories. With these estimates, one could imagine adapting the insight gained from the optimal redistribution of the FSP error to explore similar *ad hoc* redistribution methods for the SSA. Such analyses provide intriguing paths for future theoretical, computational, and experimental investigations.
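To illustrate the contrast, the toy sketch below estimates a data likelihood by Monte Carlo for a simple birth-death process (a hypothetical stand-in model, not one from this article). The estimate converges as more trajectories are run, but with no computable error bound and no monotonicity:

```python
import numpy as np

rng = np.random.default_rng(1)

def ssa_birth_death(k=10.0, g=1.0, t_end=10.0):
    """One Gillespie (SSA) trajectory of a birth-death process: x -> x+1 at
    rate k, x -> x-1 at rate g*x. Returns the copy number at t_end."""
    t, n = 0.0, 0
    while True:
        birth, death = k, g * n
        rtot = birth + death
        t += rng.exponential(1.0 / rtot)      # time to the next reaction
        if t > t_end:
            return n
        n += 1 if rng.random() * rtot < birth else -1

data = [8, 10, 12, 9, 11]                     # hypothetical observed counts
samples = np.array([ssa_birth_death() for _ in range(2000)])
# Empirical pmf of the simulated ensemble -> Monte Carlo log-likelihood.
pmf = np.bincount(samples, minlength=max(data) + 1) / samples.size
loglik = float(np.sum(np.log(np.maximum(pmf[np.array(data)], 1e-12))))
```

The floor inside the logarithm hides a real difficulty: any observation falling in an unsampled tail gets zero empirical probability, which is precisely where the FSP bounds remain informative.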

In our examples, we assumed perfect measurements of the number of RNA per cell. While the RNA-FISH approach provides a more precise quantification of molecule numbers than flow cytometry or single-cell sequencing, all experiments are subject to some measurement error. Drawing insight from past efforts to predict flow cytometry data in the presence of background fluorescence,^{24} the analyses described above can be extended to account for experimental uncertainties. Specifically, we are currently exploring experimental controls to estimate how variations in the experiment or in image analysis affect the probability of under- or over-counting RNA spots. Such analyses will provide a baseline to calibrate how the distribution of *counted* spots is conditioned on the *actual* spot number. Assuming the measurement error is independent and identically distributed for each cell, models could be convolved with the calibrated conditional distribution to predict the probability distribution of observed spot counts, and the FSP error bounds derived above would hold with no loss of generality.
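Assuming such a calibrated error kernel were available, the convolution step is a single matrix-vector product; here `C` is a hypothetical kernel with `C[c, n]` the probability of counting `c` spots when the actual number is `n`:

```python
import numpy as np

def observed_pmf(p_model, C):
    """Predicted distribution of *counted* spots, given a model distribution
    over *actual* spot numbers and an i.i.d. measurement-error kernel C,
    where C[c, n] = P(count = c | actual = n) and each column sums to 1."""
    return np.asarray(C) @ np.asarray(p_model, dtype=float)
```

Because this map is linear and maps probability vectors to probability vectors, applying it to the FSP lower-bound solution preserves the bounding structure, which is why the error bounds carry over unchanged.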

In conclusion, it is now well established that stochastic models can help to better understand single-cell gene regulatory responses. Here, we have complemented this fact by showing how single-cell data may inform the design of rigorous and yet more efficient computational analyses. Together, these insights offer further motivation for tighter integration and co-design of computational and experimental investigations of biological phenomena.

## Acknowledgments

This study was partially supported by the NIH-NIGMS under Award No. R25GM105608 and by the W.M. Keck Foundation. Z.F. was funded under NSF-NRT Grant No. DGE-1450032. G.N. was supported by the NIH under Award No. DP2GM114849. Contributions were as follows: B.M. designed the study; Z.F. and B.M. developed the computational approaches; Z.F. performed the numerical analyses; G.N. performed the experimental studies and advised on interpretation of the experimental data; Z.F. and B.M. wrote the manuscript.