In this study, we obtain an exact time-dependent solution of the chemical master equation (CME) of an extension of the two-state telegraph model describing bursty or non-bursty protein expression in the presence of positive or negative autoregulation. Using the method of spectral decomposition, we show that the eigenfunctions of the generating function solution of the CME are Heun functions, while the eigenvalues can be determined by solving a continued fraction equation. Our solution generalizes and corrects a previous time-dependent solution for the CME of a gene circuit describing non-bursty protein expression in the presence of negative autoregulation [Ramos *et al.*, Phys. Rev. E **83**, 062902 (2011)]. In particular, we clarify that the eigenvalues are generally not real as previously claimed. We also investigate the relationship between different types of dynamic behavior and the type of feedback, the protein burst size, and the gene switching rate.

## I. INTRODUCTION

It is well known that often the expression of a gene is related to that of other genes. These gene–gene interactions are at the heart of gene regulatory networks.^{1} Perhaps the simplest type of such interactions is autoregulation, whereby a protein influences its own transcription, leading to positive or negative feedback. This type of regulation is common, e.g., it is estimated that about 40% of *Escherichia coli*’s transcription factors self-regulate, mostly engaging in autorepression.^{2,3}

Over the past 20 years, many stochastic models of autoregulation have been developed and their steady-state behavior has been studied using both simulations and theory. Within the continuous-time Markovian framework, some of these studies make headway by modelling gene state changes implicitly and assuming that protein numbers are real and positive (a continuum assumption).^{4–8} Other studies have tackled the more realistic problem where the changes in molecule numbers of both gene and protein, when reactions occur, are integers.^{9–12} Of the latter class of models, some assume that protein expression occurs one at a time,^{9,10} while others assume that expression occurs in stochastic bursts.^{11,12} Another distinguishing feature is that some models assume that there is no change in the protein number when a protein copy binds to a gene or when it unbinds,^{9,11} while others model the change explicitly.^{10,12} We note that the majority of these stochastic models have been solved exactly only in steady-state conditions. The exact time-dependent solution, being a much harder mathematical problem, has received considerably less attention—in Ref. 13, a model of non-bursty protein expression, including negative autoregulation, was purportedly solved exactly. A recent review of the various types of stochastic models of autoregulatory genetic circuits that also compares and contrasts their predictions can be found in Ref. 14.

In the present paper, we construct and exactly solve in time a stochastic model of bursty or non-bursty protein expression in the presence of positive or negative autoregulation, where both gene and protein numbers are modelled discretely. In Sec. II, the reaction scheme and the chemical master equation (CME) describing the stochastic dynamics of the set of reactions are introduced. In Sec. III, by means of the method of spectral decomposition, we show that the eigenfunctions of the time-dependent generating function solution of the CME are Heun functions, while the eigenvalues obey a continued fraction equation that is obtained by considering the holomorphism of the generating function. The accuracy of the solution is verified by stochastic simulations. Crucially, we also show that a previous time-dependent solution for the CME of a gene circuit describing non-bursty protein expression in the presence of negative autoregulation^{13} is incorrect because the eigenvalues are generally not real as previously claimed. In Sec. IV, we investigate the relationship between five different types of dynamic behavior and the type of feedback, the protein burst size, and the gene switching rate. We conclude with a discussion in Sec. V.

## II. MODEL

Here, we consider stochastic gene expression dynamics in a minimal coupled positive-plus-negative feedback loop with gene state switching, protein synthesis, and protein decay (Fig. 1). Let *G* and *G** denote the inactive and active states of the gene, and let *P* denote the corresponding protein. In the active state *G**, we assume that the protein is produced in a non-bursty (constitutive) or bursty manner. Both non-bursty and bursty gene expression are commonly observed in naturally occurring systems. Bursty protein synthesis is often due to rapid translation of protein from a single, short-lived mRNA molecule;^{15,16} if the mRNA lifetime is quite long (as common in mammalian cells^{17}), then protein synthesis may appear non-bursty.

*n*be the copy number of the protein. In the bursty case, the reaction scheme underlying the coupled feedback loop is as follows [Fig. 1(a)]:

*n*will directly or indirectly affect the switching rates between the two gene states. Here,

*a*and

*b*are the spontaneous switching rates,

*μ*and

*ν*characterize the strengths of positive and negative feedback loops, respectively, and

*d*is the decay rate of the protein either due to protein degradation or due to dilution during cell division.

^{18}Specifically, the protein decay rate can be represented as

*d*= log 2/

*T*

_{p}+ log 2/

*T*

_{c}, where

*T*

_{p}is the protein half-life and

*T*

_{c}is the cell cycle time. When the gene is in the active state

*G**, the synthesis of the protein occurs in bursts with frequency

*ρ*and random size

*k*sampled from a geometric distribution with parameter

*p*, in agreement with experiments.

^{19}Note that the model considered here is more general than the classical model of autoregulatory feedback loops proposed by Kumar

*et al.*

^{11}In particular, the model describes a positive autoregulatory loop if the negative feedback strength

*ν*vanishes, and it describes a negative autoregulatory loop if the positive feedback strength

*μ*vanishes. In a coupled feedback loop, both

*μ*and

*ν*are nonzero. Coupled gene circuits widely exist in nature, and they have been shown to be a crucial network motif to produce robust and tunable biological oscillations.

^{20}Biological examples of coupled feedback loops can be found in Refs. 20 and 21.

*i*,

*n*), where

*i*is the gene state with

*i*= 0, 1 corresponding to the inactive and active states, respectively, and

*n*is the protein number. Let

*p*

_{i,n}(

*t*) denote the probability of having

*n*protein molecules in an individual cell when the gene is in state

*i*. Then, the stochastic gene expression dynamics can be described by the Markov jump process shown in Fig. 1(b). The evolution of the Markovian dynamics is governed by the CMEs,

*ρ*on the right-hand side represents protein synthesis, the terms involving

*d*represent protein decay, and the terms involving

*a*,

*b*,

*μ*, and

*ν*represent gene state switching and feedback control.

*s*when the gene is active. The Markovian dynamics for this model is illustrated in Fig. 1(d). Note that the non-bursty model described by Eq. (3) is a limiting case of the bursty model described by Eq. (1).

^{22,23}Since the protein burst size is geometrically distributed, its expected value is given by $B=\u2211k=1\u221ekpk(1\u2212p)=p/(1\u2212p)$. It is easy to see that when

*ρ*→

*∞*and

*B*→ 0, while keeping

*ρB*=

*s*as constant, we have

*p*→ 0 and

We emphasize that the exact time-dependent distributions for the non-bursty model have been discussed in Ref. 13 in the special case of negative autoregulation (*μ* = *b* = 0). However, we find that the solution given in Ref. 13 is both mathematically inconsistent and incomplete (the detailed reasons will be explained later). Here, we generalize and correct the results obtained in previous studies by taking translational bursting and coupled feedback into account.

## III. SOLVING THE TIME-DEPENDENT MASTER EQUATION

### A. Time-dependent solution for the bursty model

*p*

_{n}(

*t*) =

*p*

_{0,n}(

*t*) +

*p*

_{1,n}(

*t*) denote the probability of having

*n*protein molecules in an individual cell and let

*f*(

*t*,

*z*) =

*f*

_{0}(

*t*,

*z*) +

*f*

_{1}(

*t*,

*z*) denote its generating function. Then, Eq. (2) can be converted into the following set of partial differential equations (PDEs):

*f*

_{i}have the variable separation form of $fi(t,z)=e\lambda tf\u0303i(z)$. Here,

*λ*is called an

*eigenvalue*of the PDEs given in Eq. (6) and $f\u0303i(z)$,

*i*= 0, 1, are called the corresponding

*eigenfunctions*. Inserting it into Eq. (6) yields

*p*,

*d*/(

*μ*+

*ν*+

*d*), and

*∞*. The crucial idea is to transform it into a Heun differential equation.

^{13}In fact, the regular singularities of the standard Heun differential equation are 0, 1,

*ξ*, and

*∞*for some constant

*ξ*(Ref. 24, Sec. 31.2). Thus, we need to make a variable transformation that maps {

*d*/(

*μ*+

*ν*+

*d*), 1, 1/

*p*,

*∞*} onto {0, 1,

*ξ*,

*∞*}. To this end, we introduce a new variable

*h*be the function with the variable

*x*associated with $f\u0303$, i.e.,

*ξ*= (

*μ*+

*ν*+

*d*−

*dp*)/[(

*μ*+

*ν*)

*p*] is a constant and

*x*. To proceed, we define a new function $h\u0303(x)=(x\u22121)\lambda /dh(x)$. It is easy to check that $h\u0303(x)$ satisfies the following standard Heun differential equation:

^{25,26}that the two general solutions of Eq. (11) are given by

*Hl*is the

*local Heun function*(it is also referred to as the general Heun function

^{25}) and

*z*

_{0}=

*d*/(

*μ*+

*ν*+

*d*).

*p*

_{n}(

*t*) must decay exponentially with respect to the protein number

*n*when

*n*≥ 1 (see Sec. 1 of the supplementary material for the proof). In other words,

*p*

_{n}(

*t*) must have the following approximation:

*K*(

*t*) is a constant depending on

*t*and

*γ*> 0 describes the decay rate of

*p*

_{n}(

*t*) with respect to

*n*. This shows that

^{27}the convergence radius of the power series given in Eq. (5) must be greater than 1. Since

*z*

_{0}< 1, the generating function

*f*(

*t*,

*z*), as well as the function $f\u0303(z)$, must be holomorphic at both

*z*=

*z*

_{0}and

*z*= 1. Recall that $f\u0303(z)$ has two linearly independent solutions. In Sec. 2 of the supplementary material, we prove that the solution given in Eq. (15) cannot be holomorphic at both

*z*=

*z*

_{0}and

*z*= 1. Hence, we only need to consider the other solution given in Eq. (14).

*Hl*(

*ξ*,

*q*

_{1};

*α*

_{1},

*β*

_{1},

*γ*, 2 −

*δ*;

*x*) is holomorphic in the unit disk |

*x*| < 1, it is clear that the solution given in Eq. (14) must be holomorphic at

*z*=

*z*

_{0}. On the other hand, the solution given in Eq. (14) is holomorphic at

*z*= 1 if and only if the function

*Hl*(

*ξ*,

*q*

_{1};

*α*

_{1},

*β*

_{1},

*γ*, 2 −

*δ*;

*x*) is holomorphic at

*x*= 1. Note that the parameters

*q*

_{1},

*α*

_{1},

*β*

_{1},

*γ*, and

*δ*are all functions of the eigenvalue

*λ*[see Eqs. (12) and (13)]. It is an important property of the local Heun function (Ref. 24, Sec. 31.4) that

*Hl*(

*ξ*,

*q*

_{1};

*α*

_{1},

*β*

_{1},

*γ*, 2 −

*δ*;

*x*) is holomorphic at

*x*= 1 if and only if

*λ*satisfies the following continued fraction equation (also refer to Ref. 28):

*λ*

_{n},

*n*= 1, 2, …, and these values are actually all nonzero eigenvalues of Eq. (6). Recall that the local Heun function

*Hl*(

*ξ*,

*q*

_{1};

*α*

_{1},

*β*

_{1},

*γ*, 2 −

*δ*;

*x*) is called a

*Heun function*, denoted by

*Hf*, if it is holomorphic at

*x*= 1 (Ref. 24, Sec. 31.4). Then, the eigenfunctions corresponding to all nonzero eigenvalues are given by

*q*

_{1},

*α*

_{1},

*β*

_{1},

*γ*, and

*δ*are all functions of eigenvalue

*λ*

_{n}.

*λ*

_{0}= 0, it follows from Eq. (12) that

*δ*= 0 and

*αβx*−

*q*=

*aρ*(

*x*− 1)/[

*d*(

*μ*+

*ν*+

*d*)]. In this case, the Heun differential equation given in Eq. (11) reduces to the hypergeometric differential equation,

*z*) denotes the real part of

*z*.

*λ*. To solve these solutions numerically, we need to truncate Eq. (17) at a large integer

*N*, with

*N*being the number of layers of the continued fraction equation. The truncated equation is actually a polynomial equation of degree 2

*N*+ 2, and thus, it must have 2

*N*+ 2 roots, denoted by $\lambda 1N,\lambda 2N,\u2026,\lambda 2N+2N$. These 2

*N*+ 2 roots could be arranged so that

*N*increases, $\lambda nN$ rapidly converges to the true eigenvalue

*λ*

_{n}, and thus, we can use $\lambda nN$ as an accurate approximation of

*λ*

_{n}when

*N*is large. According to our simulations, for a fixed

*N*, the first

*N*/2 roots of the truncated polynomial equation are very close to the true eigenvalues (supplementary material, Table 1).

Recall that the stochastic dynamics of the coupled feedback loop is described by the Markov jump process illustrated in Fig. 1(b). In fact, all nonzero eigenvalues determined by solving Eq. (17) are exactly the same as the all nonzero eigenvalues of the generator matrix for the Markov jump process. Table I compares the first ten solutions of the continued fraction equation (truncated at *N* = 20 with *N* being the number of layers of the continued fraction equation) and the first ten nonzero eigenvalues of the generator matrix (truncated at *N* = 200, with *N* being the maximum protein number). Clearly, they coincide with each other perfectly and some eigenvalues may be complex numbers since the system is far from equilibrium.

Eigenvalues . | λ_{1}
. | λ_{2}
. | λ_{3}
. | λ_{4}
. | λ_{5}, λ_{6}
. | λ_{7}, λ_{8}
. | λ_{9}, λ_{10}
. |
---|---|---|---|---|---|---|---|

Continued fraction | −0.89 | −1.82 | −2.81 | −3.66 | −4.52 ± 0.33i | −5.76 ± 0.50i | −6.99 ± 0.57i |

method | |||||||

Generator matrix | −0.89 | −1.82 | −2.81 | −3.66 | −4.52 ± 0.33i | −5.76 ± 0.50i | −7.00 ± 0.55i |

method |

Eigenvalues . | λ_{1}
. | λ_{2}
. | λ_{3}
. | λ_{4}
. | λ_{5}, λ_{6}
. | λ_{7}, λ_{8}
. | λ_{9}, λ_{10}
. |
---|---|---|---|---|---|---|---|

Continued fraction | −0.89 | −1.82 | −2.81 | −3.66 | −4.52 ± 0.33i | −5.76 ± 0.50i | −6.99 ± 0.57i |

method | |||||||

Generator matrix | −0.89 | −1.82 | −2.81 | −3.66 | −4.52 ± 0.33i | −5.76 ± 0.50i | −7.00 ± 0.55i |

method |

We emphasize that when the protein number is large, the eigenvalues obtained using the generator matrix method may not be accurate since we need to compute the eigenvalues of a very large sparse matrix, which may lead to numerical instability. However, the continued fraction method still works in this case. Moreover, due to the rapid convergence of $\lambda nN$ to *λ*_{n}, the truncation size for the continued fraction method is much smaller than that for the generator matrix method.

*λ*

_{n},

*n*≥ 0, and the corresponding eigenfunctions $f\u0303\lambda n$ are given by

*f*

_{i}and

*f*have the following series form:

*f*at

*z*= 0, i.e.,

*C*

_{n}in Eq. (22) based on the initial conditions. To this end, it follows from Eq. (22) that for any complex numbers

*z*

_{0},

*z*

_{1}, …,

*z*

_{M},

*i*= 0, 1, are determined by Eqs. (20) and (21) and $fi(0,z)=\u2211n=0\u221epi,n(0)zn$,

*i*= 0, 1, are determined by the initial conditions. Note that this is an infinite dimensional system of linear equations with variables

*C*

_{n}. In order to compute

*C*

_{n}, we need to truncate the system at a large integer

*M*. To do this, we use an approach similar to the inverse discrete Fourier transform. Let

*M*be a positive integer, and let

*z*

_{m}=

*e*

^{2πmi/M}, 0 ≤

*m*≤

*M*− 1, be all

*M*th roots of unity. Clearly, when

*M*≫ 1, the coefficients

*C*

_{n}approximately satisfy the following set of linear equations:

*M*variables, which can be solved either analytically or numerically (the analytical expression can be obtained using Cramer’s rule). However, the traditional numerical methods are not suitable for solving these equations since the coefficient matrix in Eq. (24) is too singular. Numerical computations indicate that traditional methods behave poorly when

*M*≥ 20. To overcome this difficulty, note that the solution

**x**to the linear equations

*A*

**x**=

**b**is also the solution to the optimization problem,

*λ*

_{n}and the coefficients

*C*

_{n}, there are numerical steps involved, and hence, our solution is only semi-analytical. In the special case where the gene is unregulated (

*μ*=

*ν*= 0), both the eigenvalues and coefficients can be computed exactly, allowing us to obtain a full analytical expression of the time-dependent distribution (see the Appendix).

To test our semi-analytical solution, we compare it with the numerical one obtained from the finite state projection algorithm (FSP).^{29} The results are shown in Fig. 2. When using FSP, we truncate the state space at a large protein number *N* and solve the truncated master equation numerically using the MATLAB function “ODE45.” The truncation size is chosen as *N* = 5*ρB*/*d*. Since *ρB*/*d* is the typical protein number in the active gene state, the probability that the protein number is outside this truncation size is very small and practically can always be ignored. As expected, the two solutions agree perfectly at all times. According to our simulations, for most sets of model parameters, the semi-analytical solution is computationally faster than FSP when the mean protein number is larger than $\u223c500$; this is because the protein number needs to be truncated in FSP but does not in our solution.

### B. Convergence to the steady-state solution

*t*→

*∞*. We have seen that when the system is ergodic, one eigenvalue is

*λ*

_{0}= 0 and the other eigenvalues

*λ*

_{n},

*n*≥ 1, have negative real parts. Hence, the only term in Eq. (22) independent of time

*t*is the first term, and all other terms tend to zero exponentially fast as

*t*→

*∞*. Then, the steady-state generating function is given by

*f*(

*t*, 1) = 1. Taking the derivatives of the generating function

*f*at

*z*= 0 yields the steady-state protein distribution,

*a*)

_{n}=

*a*(

*a*+ 1)⋯(

*a*+

*n*− 1) denotes the Pochhammer symbol. Note that this is exactly the steady-state solution obtained previously in Ref. 22, which is a generalization of the special case of pure autoregulatory feedback studied earlier in Ref. 11.

### C. Time-dependent solution for the non-bursty model

*f*

_{i}and

*f*still have the form of

*ρ*→

*∞*and

*B*→ 0 while keeping

*ρB*=

*s*as constant. In this limit, for the zero eigenvalue

*λ*

_{0}= 0, the corresponding eigenfunction reduces to [Ref. 24, Eq. (13.18.2)]

*Hf*has the following limit (see Sec. 3 of the supplementary material for the proof):

*Hc*is the confluent Heun function

^{30}and

*λ*

_{n},

*n*≥ 1, can be obtained by solving Eq. (28), and the corresponding eigenfunctions reduce to

*λ*

_{n}and eigenfunctions $f\u0303\lambda n$. Similarly, by solving the optimization problem given in Eq. (25), we can compute all the coefficients

*C*

_{n}. This gives the complete time-dependent solution for the non-bursty model.

*μ*=

*b*= 0). However, the solution derived in that paper is questionable due to the following two reasons. First, the authors did not show how to compute all the coefficients

*C*

_{n}based on the initial conditions. Second, the eigenvalues computed in Ref. 13 are incorrect. The authors claimed that the system has two families of eigenvalues (these eigenvalues were first reported in an earlier paper

^{31}by the same authors),

*k*=

*a*/(

*d*+

*ν*) +

*sν*/(

*d*+

*ν*)

^{2}. However, the correct eigenvalues should be all the solutions to the continued fraction equation. Table II compares the correct eigenvalues obtained by using the continued fraction and generator matrix methods and the incorrect eigenvalues given in Eq. (29). We observe significant deviations between them; the former can be complex numbers, while the latter are always real. In Sec. 4 of the supplementary material, we briefly explain how the false eigenvalues come from and further compare the true and false eigenvalues. Hence, the solution derived in Ref. 13 is both incomplete and incorrect. In the present paper, we addressed the above two points.

Eigenvalues . | λ_{1}
. | λ_{2}
. | λ_{3}
. | λ_{4}
. | λ_{5}
. | λ_{6}
. | λ_{7}
. | λ_{8}
. |
---|---|---|---|---|---|---|---|---|

Correct (CFM) | −1.36 | −2.97 | −4.05 + 0.81i | −4.05 − 0.81i | −5.10 + 1.13i | −5.10 − 1.13i | −6.15 + 1.37i | −6.15 − 1.37i |

Correct (GMM) | −1.36 | −2.97 | −4.05 + 0.81i | −4.05 − 0.81i | −5.10 + 1.13i | −5.10 − 1.13i | −6.15 + 1.37i | −6.15 − 1.37i |

Incorrect | −1 | −2 | −3 | −4 | −5 | −5.82 | −6 | −6.92 |

Eigenvalues . | λ_{1}
. | λ_{2}
. | λ_{3}
. | λ_{4}
. | λ_{5}
. | λ_{6}
. | λ_{7}
. | λ_{8}
. |
---|---|---|---|---|---|---|---|---|

Correct (CFM) | −1.36 | −2.97 | −4.05 + 0.81i | −4.05 − 0.81i | −5.10 + 1.13i | −5.10 − 1.13i | −6.15 + 1.37i | −6.15 − 1.37i |

Correct (GMM) | −1.36 | −2.97 | −4.05 + 0.81i | −4.05 − 0.81i | −5.10 + 1.13i | −5.10 − 1.13i | −6.15 + 1.37i | −6.15 − 1.37i |

Incorrect | −1 | −2 | −3 | −4 | −5 | −5.82 | −6 | −6.92 |

## IV. DYNAMICAL PHASE DIAGRAMS

We next investigate the shape of the time-dependent distribution for the bursty model. In what follows, we assume that the initial protein number is zero and the gene is initially in the inactive state.^{2} This mimics the situation where the gene has been silenced by some repressor over a period of time such that all protein molecules have been removed via degradation. At time *t* = 0, the repressor is removed, and we study how gene expression recovers. Under this initial condition, it is easy to see that *f*_{0}(0, *z*) = 1 and *f*_{1}(0, *z*) = 0.

In experiments, there are three commonly observed patterns for the protein distribution: a unimodal distribution with a zero peak (decaying distribution), a unimodal distribution with a nonzero peak (bell-shaped distribution), and a bimodal distribution with both a zero and a nonzero peak.^{32,33} Our model is capable of producing all these three distribution patterns (Fig. 2). Among the three patterns, the bimodal one attracts the most attention since it separates isogenic cells into two distinct phenotypes.^{34–36} To further understand the shape of the time-dependent protein distribution, we classify the dynamic behavior of our model into five different phases: (i) the distribution is decaying at all times [Fig. 2(a)], (ii) the distribution is decaying at small times and is bell-shaped at large times [Fig. 2(b)], (iii) the distribution is decaying at small and large times and is bimodal at intermediate times [Fig. 2(c)], (iv) the distribution is decaying at small times, is bimodal at intermediate times, and is bell-shaped at large times [Fig. 2(d)], and (v) the distribution is unimodal at small times and is bimodal at large times [Fig. 2(e)]. To distinguish between them, we refer to (i) as type-I of unimodality (U1), to (ii) as type-II of unimodality (U2), to (iii) as type-I of transient bimodality (TB1), to (iv) as type-II of transient bimodality, and to (v) as stationary bimodality (SB). The semi-analytical solution and numerical solution obtained from FSP indicate that all the five dynamical phases can appear when model parameters are appropriately chosen (Fig. 2).

To determine the regions for the five phases in the parameter space, we illustrate the *a*–*b* phase diagrams of our model in Figs. 3 and 4. In all phase diagrams, there is a unique point separating the five phases; this is analogous to the triple point in physics and chemistry where all three phases (gas, liquid, and solid) of a substance coexist in thermodynamic equilibrium.^{37} Intriguingly, we find that the regions for TB1 and TB2 are often not adjacent in the phase diagram and are separated by the SB region (see, e.g., the middle row of Fig. 4). This is why we distinguish type-I from type-II transient bimodality in the present paper. Figure 3 shows the phase diagrams under different gene switching rates (slow switching, intermediate fast switching, and fast switching) and different feedback controls (positive feedback, coupled feedback, and negative feedback). We find that a positive feedback network fails to produce TB1, while a negative or coupled feedback network can produce all the five types of dynamical behavior. The region for transient and stationary bimodality (TB1, TB2, and SB) shrinks significantly as the switching rates increase. When the switching rates are relatively fast, the region for SB totally disappears in negative feedback networks, which agrees with the results found in Refs. 12 and 38, and the region for TB1 also disappears in all three types of networks. This indicates that TB1 can only appear when the switching rates are relatively slow. In particular, in the regime of relatively slow switching, positive feedback promotes the occurrence of SB and restrains the occurrence of TB1, while negative feedback promotes the occurrence of TB1 and restrains the occurrence of SB.

We next focus on Fig. 4, which illustrates the phase diagrams under different protein burst sizes and feedback controls when the switching rates are not too small. Again, TB1 is not found in the positive feedback case and SB is not found in the negative feedback case (note that SB can be found when gene switching is very slow, as previously shown in Fig. 3). From the phase diagram, as the burst size *B* increases, the region for SB shrinks in positive and coupled feedback networks; the region for TB2 enlarges significantly in all three types of networks; the region for TB1 remains almost unchanged in negative feedback networks, while the region for SB converts to that for TB1 in coupled feedback networks. In summary, we find that TB1 is mostly likely to occur in negative and coupled feedback networks when gene switching is not too fast; TB2 is mostly affected by the burst size; and SB is sensitive to many factors and is mostly likely to occur in positive and coupled feedback networks when gene switching is relatively slow and the burst size is small.

Among the five dynamical phases, TB1 is of particular interest because it can never occur when the gene is unregulated (*μ* = *ν* = 0). In fact, for the two-state telegraph model of stochastic gene expression, it has been shown that it can produce U1, U2, TB2, and SB but fails to produce TB1.^{39,40} In fact, TB1 has been observed recently in single-cell experiments [see Fig. 2(c) in Ref. 41]. Previous studies have used a four-state gene expression model to fit the data but the model does not involve feedback.^{41} Our results show that the presence of negative or coupled feedback loops is possibly another important origin for the experimentally observed TB1 dynamics.

## V. CONCLUSION

In this work, we semi-analytically solved the CMEs and obtained the full time-dependence of a minimal coupled positive-plus-negative feedback loop with gene state switching, protein synthesis, and protein decay. This coupled gene circuit includes positive and negative autoregulatory feedback loops as special cases. In the active gene state, our model assumes that the protein is produced in a non-bursty or bursty manner. Following previous work,^{42} we transformed the CMEs satisfied by the protein distribution into the PDEs satisfied by the generating function. By using the method of spectral decomposition, we represented the protein distribution as a weighted sum of eigenvalue terms, which only depends on time, and eigenfunction terms, which only depends on the spatial variable. We then make nontrivial spatial and functional transformations to obtain a Heun differential equation satisfied by the eigenfunctions.

Interestingly, the eigenfunctions are all local Heun functions, while the eigenvalues are those values such that these functions are holomorphic on the unit disk and can be determined by solving a continued fraction equation. In general, these eigenvalues cannot be computed in closed form. However, in the special case where the gene is unregulated, the eigenvalues can be solved exactly and the eigenfunctions reduce to Gaussian hypergeometric functions. We finally use a method similar to the inverse discrete Fourier transform to compute the weights of these eigenvalue and eigenfunction terms based on the initial conditions. In particular, our time-dependent solution generalizes the one obtained in Ref. 38, which can be applied only in fast switching conditions, and also generalizes and modifies the incomplete and incorrect solution obtained in Ref. 13 for the non-bursty case. The eigenvalues obtained in Ref. 13 are incorrect since they are all real, while the true eigenvalues can be complex numbers. The complex eigenvalues may be related to the oscillatory^{43,44} and adaptation^{45,46} phenomena observed in negative feedback networks. Finally, we investigated the dynamical phase diagram of the coupled feedback loop by categorizing regions of parameter space according to five distinct types of time-evolution. This allowed us to deduce relationships between these phases and the type of feedback loop (positive, coupled, and negative), the burst size, and the speed of gene switching.

Our analytical theory also has mathematical importance. In the classical functional analysis textbooks,^{47} spectral decomposition for unbounded operators is only well developed when the operator is self-adjoint or normal. For a self-adjoint operator, the eigenvalues are all real and the eigenfunctions are orthogonal. In this case, the eigenvalues and eigenfunctions can sometimes be determined analytically^{48,49} using, e.g., the Sturm–Liouville theory. However, the unbounded operator studied in this paper, i.e., the differential operator in Eq. (6), is neither self-adjoint nor normal (see Table III for the comparison between the three types of operators). For this operator, the eigenvalues can be complex and the eigenfunctions are non-orthogonal. While we have showed that the eigenfunctions satisfy an second-order differential equation [see Eq. (10)], this equation cannot be transformed into a classical Strum–Liouville problem, and thus, the conventional method fails. Here, we obtain the complete spectral decomposition for a special non-self-adjoint and non-normal operator. We anticipate that a new spectral theory could be developed for general non-self-adjoint and non-normal operators in the near future.

Type of operators . | Eigenvalues . | Eigenfunctions . |
---|---|---|

Self-adjoint operators | Real | Orthogonal |

Normal operators | Complex | Orthogonal |

Differential operator in Eq. (6) | Complex | Non-orthogonal |

Type of operators . | Eigenvalues . | Eigenfunctions . |
---|---|---|

Self-adjoint operators | Real | Orthogonal |

Normal operators | Complex | Orthogonal |

Differential operator in Eq. (6) | Complex | Non-orthogonal |

Our model has two notable limitations: (i) We assume that there is no change in the protein number during gene activation and inactivation. However, in reality, the protein number decreases by one when a protein copy binds to a gene and increases by one when unbinding occurs.^{10,12} In other words, the present model ignores the protein–gene binding fluctuations, and hence, it may not be accurate when the protein number is very small or when the feedback strength is very strong.^{12,50} The reason why we make this approximation is that if we ignore the binding fluctuations, then the differential equation satisfied by the eigenfunctions has four regular singularities and, thus, can be transformed into a Heun differential equation. However, if we take the binding fluctuations into account, then the differential equation will have five regular singularities and, thus, does not allow an existing special function representation. (ii) The model does not have an explicit cell cycle description. In reality, most proteins are not continuously degraded at some positive rate, but rather because their half-lives are often much longer than the cell cycle duration itself,^{17,51,52} they tend to accumulate during the cell cycle and then (approximately) half at cell division. Under some conditions, the modelling of protein degradation via a first-order reaction (as in our current model) well approximates the dilution due to cell division.^{53} Note that our model with *d* = 0 can also be seen as predicting the time-dependent protein distribution within a cell cycle where *t* = 0 corresponds to the birth of a cell, but since there is no explicit modelling of cell division, it cannot predict how the distributions change from one generation to another. In the forthcoming part II of this work, we will address some of these issues.

## SUPPLEMENTARY MATERIAL

The supplementary material contains detailed proofs and calculations of some key results in the main text.

## ACKNOWLEDGMENTS

The authors thank Dr. Youming Li for stimulating discussion. J.H. acknowledges support from National Science Foundation Grant Nos. EF-2133863 and DMR-191073. R.G. acknowledges support from the Leverhulme Trust (Grant No. RPG-2020-327). C.J. acknowledges support from the National Natural Science Foundation of China with Grant Nos. U2230402 and 12271020.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Bingjie Wu**: Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). **James Holehouse**: Formal analysis (equal); Investigation (equal); Writing – review & editing (equal). **Ramon Grima**: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). **Chen Jia**: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

### APPENDIX: TIME-DEPENDENT SOLUTION FOR UNREGULATED GENES

*μ*=

*ν*= 0. We emphasize that for an unregulated gene, the analytical time-dependent distributions of protein numbers have been derived in Refs. 54 and 55. Here, we will provide a different explicit expression of the time-dependent solution.

*μ*=

*ν*= 0, the expressions of the eigenvalues

*λ*

_{n}, eigenfunctions $f\u0303\lambda n$, and coefficients

*C*

_{n}in Eq. (22) can be simplified to a great extent. In this case, the related parameters in Eq. (18) reduce to

*λ*

_{1n}= −

*nd*, the corresponding eigenfunctions are given by

*λ*

_{2n}= −(

*a*+

*b*+

*nd*), the corresponding eigenfunctions are given by

*a*

_{1},

*b*

_{1}, and

*c*

_{1}are given in Eq. (A1). Hence, for an unregulated gene, the generating function

*f*has the form of

*C*

_{1n}and

*C*

_{2n}. Let $Fni$ be the unnormalized

*n*th factorial moment of the protein number at time

*t*= 0 when the gene is in state

*i*, i.e.,

*C*

_{10}= 1, and the remaining coefficients can be computed inductively as follows (see Sec. 5 of the supplementary material for details):

*f*

_{0}(0,

*z*) = 1 and

*f*

_{1}(0,

*z*) = 0. This clearly shows that $F00=1$ and $Fn0=0$ for all

*n*≥ 1 and $Fn1=0$ for all

*n*≥ 0. Inserting these values of $Fni$ into Eq. (A3) gives the values of all the coefficients

*C*

_{1n}and

*C*

_{2n}.

## REFERENCES

*The Regulatory Genome: Gene Regulatory Networks in Development and Evolution*

*Escherichia coli*

*Heun’s Differential Equations*

*Compendium of Chemical Terminology*

*Functional Analysis*

*International Series in Pure and Applied Mathematics*

*S. cerevisiae*and

*S. pombe*