Biochemical processes typically involve many chemical species, some in abundance and some in low molecule numbers. We first identify the rate constant limits under which the concentrations of a given set of species will tend to infinity (the abundant species) while the concentrations of all other species remains constant (the non-abundant species). Subsequently, we prove that, in this limit, the fluctuations in the molecule numbers of non-abundant species are accurately described by a hybrid stochastic description consisting of a chemical master equation coupled to deterministic rate equations. This is a reduced description when compared to the conventional chemical master equation which describes the fluctuations in both abundant and non-abundant species. We show that the reduced master equation can be solved exactly for a number of biochemical networks involving gene expression and enzyme catalysis, whose conventional chemical master equation description is analytically impenetrable. We use the linear noise approximation to obtain approximate expressions for the difference between the variance of fluctuations in the non-abundant species as predicted by the hybrid approach and by the conventional chemical master equation. Furthermore, we show that surprisingly, irrespective of any separation in the mean molecule numbers of various species, the conventional and hybrid master equations exactly agree for a class of chemical systems.
I. INTRODUCTION
The chemical master equation (CME) is the accepted stochastic description of chemical reaction systems.1 Since intrinsic noise roughly scales as the inverse square root of the mean number of molecules,1 it follows that the CME provides a more accurate description than deterministic rate equations (REs), when species are in low concentrations. However, exact solution of the CME has proved to be impossible for all but the simplest systems (see for example Refs. 2–5), and Monte Carlo simulations using the stochastic simulation algorithm (SSA)6 are also time-consuming in many cases of interest. One way to bypass these issues is to use a hybrid model which treats some parts of the system using the SSA and the rest using a simulation method which is computationally more efficient. A common example of such hybrid modelling utilises time scale separation whereby some reactions occur on a fast time scale and are modelled using continuous approaches such as REs or chemical Langevin equations, while the rest of reactions occur on slow time scales and are modelled using the standard SSA.7–9 Other methods which enable a considerable improvement over the SSA when time scale separation is present are: the nested SSA,10 a coarse-grained equation-free approach,11 the constrained multiscale algorithm,12,13 an approach based on finite-state projection,14,15 the slow-scale SSA16 and the slow-scale linear-noise approximation.17,18
In this paper, we consider a different type of hybrid model, one which uses a separation in the abundance of species (abundance separation) rather than time scale separation. In particular, we no longer split reactions into fast and slow but rather categorise species based on how abundant they are. These methods utilise a continuous approach to model the abundant species and a discrete approach to model the less abundant species. While less popular than time scale separation, some hybrid algorithms have been developed to take advantage of this idea (see, for example, Refs. 19–22). Stochastic simulations verify that these hybrid models can capture important features of the fully stochastic model. In particular, the model by Hellander and Lötstedt19 has been shown by Jahnke21 to be exact for monomolecular systems, i.e., the marginal distributions of non-abundant species in the hybrid model are exactly the same as the same obtained from the full stochastic model. More sophisticated (and computationally expensive) approaches have been postulated21,22 to deal with systems which are not well described by the Hellander and Lötstedt hybrid model.
The advantages of methods using abundance separation over time scale separation are that: (i) the time scales of reactions are often unknown while the abundances are readily measurable; (ii) there is evidence suggesting that abundance separation is at least as significant, if not more, than time scale separation for biochemical networks inside cells. For example it has been shown that the mean number of proteins per E. Coli cell is roughly a thousand times that of the mean number of mRNA molecules per cell23—in contrast, the ratio of protein to mRNA lifetime in E. Coli is expected to be above 1 but in the single digits.3 For mammalian cells, the same has been found; the median number of protein per cell is roughly 3000 times that of median number of mRNA per cell—in contrast, the ratio of median protein to median mRNA lifetime is about 5.24 Clearly in these cases, abundance separation is significant while time scale separation is weak, and thus a method which takes advantage of the former appears to be ideal as a means to infer information about the stochastic dynamics of mRNA and of other proteins present in low copy numbers.
In this paper, we postulate a novel simple hybrid model based on abundance separation consisting of a CME for the non-abundant species coupled to REs for all species. Subsequently, we identify the rate constant limits under which the concentrations of a given set of species will tend to infinity (the abundant species) while the concentrations of all other species remain constant (the non-abundant species). This limit we shall refer to as the abundance or abundant limit. We show that, in this limit, the marginal distributions of the non-abundant species given by the hybrid model converge to the same distributions given by the CME of the full system. This fact is particularly useful when the hybrid model can be solved analytically, which is the case in several examples that we study. We illustrate the accuracy of our hybrid model by comparing the exact stochastic simulations of its reduced CME with exact stochastic simulations of the full CME. We also show that there are several chemical systems for which our hybrid model is exact even without abundance separation. In Section IV we offer an error analysis which provides an easy means to estimate the error incurred by the use of the hybrid model when the ratios of abundant to non-abundant species concentrations are finitely large. We conclude with a summary and discussion in Section V.
II. A REDUCED CHEMICAL MASTER EQUATION
In this section, we first propose a reduced CME which constitutes our hybrid model, and subsequently rigorously prove that it converges to the CME of the full system (that describing all species) in the abundance limit.
A. A heuristic reduction of the CME
The CME for a system of N chemical species which interact through R reactions has the form
where Ω is the volume in which the reactions occur, is the step operator which replaces ni with ni + x, the entries of the state vector n = (n1, …, nN) are the number of molecules of each species, P(n, t) is the probability of the system being in state n at time t, and the stoichiometric matrix elements Sij are given by the net change in the number of molecules of species i when the jth reaction occurs. The probability that a reaction j occurs in a time interval [t, t + dt) is given by , where is a function of elements of the state vector and reaction rates. The REs are defined by
where are the concentrations of the N chemical species, and .
We wish to reduce the number of species in this CME from N to M with M < N. Without loss of generality, we will keep species 1 to M and remove species M + 1 to N, which we consider to be “abundant.” We will do this by first summing the CME over nM+1, …, nN to leave us with an equation for the time evolution of the exact marginal distribution P∗(n′, t), where n′ = (n1, …, nM), then subsequently we use a mean-field assumption to obtain a time evolution equation for the approximate marginal distribution .
We will be considering the different possible forms that can take assuming elementary reactions, specifically up to bimolecular reactions (reactions involving more than three molecules are rare in a biological setting). We will first investigate what happens to the CME if we sum over, say, nh, using the notation that n−h is the state vector n without the hth entry, in other words, . In what follows, we will use to refer to the vector of chemical species, and to refer to the vector of random variables which give the number of molecules of each species. The state vector therefore refers to a particular realisation of the random vector Y.
If reaction j does not feature Xh amongst its reactants, then has no nh dependence and the corresponding term in the CME remains unchanged.
If reaction j is a unimolecular reaction of the type Xh → … then , and we will have
If reaction j is a bimolecular reaction of the type Xh + Xh → … then and we will have
Finally, if reaction j is a bimolecular reaction of the type Xh + Xi → … then and we will have
These results follow from the definition of conditional expectation,
Given the above results, we can now see what will happen to each if we sum over all h = M + 1, …, N. The result of this summation leads to new propensities involving conditional expectations which we call . It then follows that the exact marginal CME is given by
where n′ = (n1, …, nM). In theory, we have simplified the CME while keeping it exact, but we should be careful because the dependence of the conditional expectations on n′ is currently unknown.
We proceed by making the heuristic mean-field assumption that these conditional expectations can be replaced by deterministic concentrations ϕi as defined earlier in Eq. (2), for example,
where we have approximated away all conditional dependence. We can correspondingly update the exact effective propensities with the approximate effective propensities . A general recipe for converting to is given in Table I.
Propensity . | Reduced propensity . |
---|---|
kj | kj |
kjniΩ−1 | kjniΩ−1 |
kjninrΩ−2 | kjninrΩ−2 |
kjni(ni − 1)Ω−2 | kjni(ni − 1)Ω−2 |
kjninhΩ−2 | kjniΩ−1ϕh(t) |
kjnhΩ−1 | kjϕh(t) |
kjnh(nh − 1)Ω−2 | |
kjnhnvΩ−2 | kjϕh(t) ϕv(t) |
Propensity . | Reduced propensity . |
---|---|
kj | kj |
kjniΩ−1 | kjniΩ−1 |
kjninrΩ−2 | kjninrΩ−2 |
kjni(ni − 1)Ω−2 | kjni(ni − 1)Ω−2 |
kjninhΩ−2 | kjniΩ−1ϕh(t) |
kjnhΩ−1 | kjϕh(t) |
kjnh(nh − 1)Ω−2 | |
kjnhnvΩ−2 | kjϕh(t) ϕv(t) |
The approximate marginal CME is now
An alternative way of summarising our reduction method is that it consists of approximating a general chemical system,
by the reduced chemical system,
when species XM+1, …, XN are abundant.
B. The abundant limit
We wish to show that the approximate marginal CME given in Eq. (9) is a good approximation to the exact marginal CME given in Eq. (7) when species XM+1, …, XN are abundant. To do this, we will need to define an abundance limit. Precisely, we want to know which parameters we should tweak in order that some species concentrations should go to infinity, while others stay constant. We will assume, without loss of generality, that we want to take the abundant limit of species XN. For systems with multiple abundant species, we can just repeat the below process for each one in turn.
The convention we use for numbering reactions is given in Table II. We will introduce the functions a(i) and b(i) so that we can say that the bimolecular reaction with rate ki has species Xa(i) and Xb(i) as its reactants, where a(i)≠b(i). We will have N input reactions with rate ki, i = − 1, …, − N which lead to the production of each species, monomolecular reactions with rates ki, i = 1, …, N, bimolecular reactions between different species with rates ki, i = N + 1, …, L (for some L ∈ ℕ), and bimolecular reactions between the same species with rates ki, i = L + 1, …, L + N.
Reaction index . | Reaction type . | Reaction rate . |
---|---|---|
i = − 1, …, − N | 0̸ → … | ki |
i = 1, …, N | Xi → … | kiϕi |
i = N + 1, …, L | Xa(i) + Xb(i) → … | kiϕa(i)ϕb(i) |
i = L + 1, …, L + N | 2Xi−L → … |
Reaction index . | Reaction type . | Reaction rate . |
---|---|---|
i = − 1, …, − N | 0̸ → … | ki |
i = 1, …, N | Xi → … | kiϕi |
i = N + 1, …, L | Xa(i) + Xb(i) → … | kiϕa(i)ϕb(i) |
i = L + 1, …, L + N | 2Xi−L → … |
Now the rate equation for the concentration of Xr is
where Sri is the net change in the number of molecules of species Xr when reaction i occurs.
An intuitive means to obtain an abundant species XN is to make the rate constants of the reactions which remove this species, to be very small, whilst the rest of the rate constants remain at their constant value. In particular, if exactly one molecule of XN is a reactant, then we let ; if two molecules of XN are reactants, then we let , where x → ∞. This means that , , for j such that a(j) or b(j) equal N and j = N + 1, …, L. In what follows we shall refer to these rate constant limits as the abundance or abundant limit. Plugging in the aforementioned rates constant scalings and the trial solution,
Note also that we have here implicitly assumed that there is no chemical conservation law which involves an abundant species (chemical conservation laws which involve only non-abundant species are however allowed). This is since this law necessitates a finite upper bound on the concentrations which is contrary to the manner in which we have here defined the abundant limit.
For systems with multiple abundant species, the above recipe implies that where q is the total number of reactant molecules of abundant species involved in reaction j. For example, for the reaction Xh + Xv → … where both species Xh and Xv are abundant, the rate constant of the reaction scales as .
The limits here derived assume a steady-state rate equation description for all species. This derivation is here presented to simplify the presentation and since it is very intuitive. However , as we show in Sec. II C 3, the limits elucidated here, also constitute abundance limits for the time-dependent stochastic description.
C. Proof of N species abundant convergence
1. Taylor expansion of exact marginal distribution
A full N species CME with R reactions has the form of Eq. (1). We will expand the solution P(n, t) as a Taylor series in time about t = 0. We assume deterministic initial conditions, , where denotes the initial value of ni. We can write the Taylor expansion as
where P(k) is the kth time derivative of P. Since the full CME is a coupled set of first-order ordinary differential equations with constant coefficients, the Taylor series above is guaranteed to have an infinite radius of convergence by Fuchs’s theorem.25
From this series, we can compute the marginal distribution,
We already know , so our first problem is the second term of the expansion, which is the first time derivative. This is given by CME (1) itself,
and thus we can compute the kth derivative,
where we are careful to note that , since the E operators do not commute with the propensities . Now, we can compute the marginal distribution P⋆, which is made much simpler by the presence of the Kronecker-deltas,
where now acts on the initial conditions rather than the variable ni.
2. Taylor expansion of the approximate reduced distribution
The approximate marginal distribution is defined by the reduced CME,
Its Taylor expansion about t = 0 is
Fuchs’s theorem guarantees that a first-order ordinary differential equation with time-dependent coefficients will have a radius of convergence at least as large as the minimum of the radius of convergence of the time-dependent parameters.25 The reduced CME is a set of coupled first-order equations with time-dependent coefficients determined by the solution of the REs. Hence, if the REs admit a Taylor series solution with an infinite radius of convergence then the Taylor series of the reduced CME also does.
We already know , so our first problem is the second term of the expansion. Again, this is given by approximate CME (19),
where we note that the propensities will in general depend on t as well as n′. This will cause complications, for instance, the second derivative has the form:
We now have an extra term in our sum which depends on the time derivative of the , and if we take higher order Taylor coefficients, we get higher time derivatives of the . We get, using the notation used in Subsection II C 1,
3. Convergence of full and reduced Taylor series
The absolute difference between the kth terms of the two Taylor CMEs is given by the difference between Eqs. (18) and (22),
This will tend to zero in the abundant limit if we can prove two claims,
- For any αi ∈ ℤ, and j ∈ 1, …, R,in the abundant limit.
The time derivatives of the tend to zero in the abundant limit.
To prove (I), we must recall the different possible forms that and can take, which are given in Table III. As before, we say that the indices i, r ≤ M correspond to non-abundant species, while the indices h, v > M correspond to abundant species.
Reaction type . | . | . |
---|---|---|
0̸ → … | kj | kj |
Xi → … | kj(ni + αi)Ω−1 | kj(ni + αi)Ω−1 |
Xh → … | kjϕh(0) | |
Xi + Xr → … | kj(ni + αi)(nr + αr)Ω−2 | kj(ni + αi)(nr + αr)Ω−2 |
Xi + Xi → … | kj(ni + αi)(ni + αi − 1)Ω−2 | kj(ni + αi)(ni + αi − 1)Ω−2 |
Xh + Xv → … | kjϕh(0) ϕv(0) | |
Xh + Xh → … | ||
Xi + Xh → … | kj(ni + αi) ϕh(0)Ω−1 |
Reaction type . | . | . |
---|---|---|
0̸ → … | kj | kj |
Xi → … | kj(ni + αi)Ω−1 | kj(ni + αi)Ω−1 |
Xh → … | kjϕh(0) | |
Xi + Xr → … | kj(ni + αi)(nr + αr)Ω−2 | kj(ni + αi)(nr + αr)Ω−2 |
Xi + Xi → … | kj(ni + αi)(ni + αi − 1)Ω−2 | kj(ni + αi)(ni + αi − 1)Ω−2 |
Xh + Xv → … | kjϕh(0) ϕv(0) | |
Xh + Xh → … | ||
Xi + Xh → … | kj(ni + αi) ϕh(0)Ω−1 |
Convergence is trivial for reactions 0̸ → …, Xi → …, Xi + Xr → …, and Xi + Xi → …, since and agree. For the other reactions, we have to decide how the initial conditions should scale in the abundant limit. We will suppose that the initial concentration for species XM + 1, …, XN should tend to infinity O(x) (since they are the abundant species), while the initial concentration for species 1, …, M should stay constant.
For the Xh → … reaction, we are interested in
However by definition, , so the absolute error becomes which tends to zero since kj → 0 in accordance with our abundant limit elucidated in Section II B.
For the Xh + Xv → … reaction, we are interested in
which simplifies to
and which again tends to zero, since in the abundant limit while and are proportional to x and x → ∞.
For the Xh + Xh → … reaction, we are interested in
which simplifies to
and which again tends to zero since while .
Finally, for the Xi + Xh → … reaction we consider
which we write as
and which also tends to zero since kj → 0 in the abundance limit. We have therefore proved claim (I).
To prove claim (II), we will use the convention for reactions introduced in Table II. We will say that k0 corresponds to the rate of a null reaction, k1, …, kN correspond to the rates of monomolecular reactions, kN+1, …, kL correspond to the rates of bimolecular reactions involving distinct species with reaction kj involving species Xa(j) and Xb(j), and kL+1, …, kL+N correspond to homodimerisation reactions.
We will prove claim (II) for the reaction Xh + Xv → 0, with reduced propensity . Note that generally we have but because the reaction involves two abundant species there is no explicit dependence on Ω and n′ and hence in what follows we use the less cumbersome notation . Using Leibniz’s rule,
For this reaction, and also ϕh(0), ϕv(0) ∝ x → ∞. If we can prove that each derivative is bounded in the abundant limit, k > 0, then the above expression for will tend to zero in that limit, owing to the limiting prefactor kj. First, the definition of the first derivative of ϕh is given by the rate equation
Wherever we have a concentration which tends to infinity, (ϕj(0), j = M + 1, …, N) it is by definition cancelled out by the corresponding parameter kj. So this expression remains constant, and therefore bounded, in the abundant limit. For the purposes of mathematical induction, assume now that all derivatives up to remain bounded in that limit. Then, using Leibniz’s rule,
The only terms here that have a possibility of being unbounded are those involving zeroth derivatives of the ϕj, specifically,
But these ϕj(0) will only go to infinity if j > M, and in that case the reaction rate kj will go to zero at least fast enough to counteract the limiting concentrations. So therefore we have shown that each time derivative of the concentrations is bounded in the abundant limit, and therefore each time derivative of the goes to zero in that limit. The proofs for the other reactions are very similar. Therefore we have proved claim (II), and consequently we have proved the convergence of the full marginal CME and the reduced CME for all times, in the abundant limit.
III. THE ACCURACY OF THE REDUCTION FOR FINITELY LARGE ABUNDANT CONCENTRATIONS AND SPECIAL CASES
The abundant limit, as stated previously, is the limit that the rate constants of the reactions removing the abundant species go to zero (in a particular manner) which ensures that the ratios of the abundant to non-abundant concentrations go to infinity. It is in this limit that we have proved that the difference between the reduced and full marginal CME goes to zero. Generally, we are interested in the case where the ratio of the abundant to non-abundant concentrations is finitely large, not infinite. In this case, the reduced CME is approximate. Given two identical copies of a chemical system, one placed in a small volume and the other in a much larger volume, and given they have the same finite large ratio of abundant to non-abundant concentrations, we expect the reduced CME to be a better approximation for the system confined in the larger volume. The reason is that the number of molecules of abundant and non-abundant species is larger in the system confined in the larger volume and hence the REs in this case provide a good approximation to the dynamics of the abundant species,26,27 the key principle behind our reduced CME. Another equivalent point of view is that in the larger volume the reactions occur on faster time scales than the reactions in low volumes, due to the larger number of interacting molecules and hence the dynamics of the abundant species in the larger volume are more amenable to being modelled by a continuous approach like the chemical Langevin equation or REs.28
Of course, as Gillespie pointed out in Ref. 28, it is difficult to tell how large should the compartment volume be so that a macroscopic approach becomes a feasible approximation. Hence, in this section, we explore the accuracy of the reduced CME, for finite ratios of abundant to non-abundant concentrations, by means of stochastic simulations. In particular we will use the SSA6 to sample the probability distribution of full CME (1) and the Extrande algorithm29 to sample the probability distribution of reduced CME (9) for the case where the rate constants of the reactions removing the abundant species scale as (where q is the total number of reactant molecules of abundant species involved in reaction j) and x is a finite real number. Note that the reduced CME cannot be sampled using the SSA since the propensities are generally time-dependent and hence the use of Extrande. An alternative to the use of the latter algorithm would be to use a method involving the numerical integration of reaction propensities.30 We also show in this section that curiously, for some chemical systems, the exact and approximate marginals are identical even without taking abundance limit.
A. Stochastic simulations
1. Homodimerisation
We will investigate an open homodimerisation reaction studied in Ref. 26,
Species X1 is produced with rate k1, and is either consumed with rate k3 or else forms a dimer X2 with rate k2. The dimer X2 is then consumed with rate k4. We consider the case where X1 is abundant and X2 is not. The RE for the concentration of X1 is
which has the solution
where
Therefore, approximate reduced CME (9) for non-abundant species X2 is given by
The approximate equilibrium distribution of this system is therefore Poisson . In Figs. 1 and 2, we compare the exact marginal distribution of X2 from the full CME and the approximate marginal distributions of the reduced CME. These are obtained by means of an ensemble average of SSA and Extrande trajectories, respectively. In Fig. 1, we plot the distributions for four different relative abundances of X1 and X2 at the same time point t = 0.01. The abundance is adjusted by choosing the rate constants to scale as and (in accordance with the limits delineated in Section II B) and varying x over a certain finite range (see caption of Fig. 1). The distance between the two distributions clearly becomes smaller as the ratio of the abundant to non-abundant species concentrations increases, in line with the proof in Sec. II; the two distributions are practically indistinguishable when this ratio is of the order of 100. Nevertheless, we find that some salient features of the two distributions are fairly similar (namely, the position of mode and the width of distribution) over a large range of the ratio of the abundant to non-abundant species concentrations (0.4–263). In Fig. 2, we show that the high accuracy of the reduced CME also extends to predicting the whole time-evolution of the distribution. Both of these figures indicate that the results of simulations using the reduced CME bear a significant closeness to those obtained using the full CME under a wide range of abundances and hence point towards the utility of the reduced CME as a low dimensional approximation of the full CME.
2. Genetic feedback loop
We next investigate a negative genetic feedback loop studied in Ref. 2,
The “on” promoter Don produces proteins P with rate v0. These proteins bind to the promoter with rate d1 to generate the inactive “off” promoter Doff, which can then unbind back into active promoter and protein with rate v1. Furthermore, the protein P is consumed by a unimolecular reaction with rate d0. We consider the case where P is abundant.
The REs for this system are given by
where ϕ1 is the concentration of Doff, ϕ2 is the concentration of P, Ω is the volume, and is the total (fixed) gene concentration (equivalent to one gene). Unlike the previous example, these equations do not admit an analytical solution, so we must solve them numerically, and then use the numerical solution in the reduced CME,
where n1 is the number of molecules of Doff. Since at any one time, the gene is either on or off, the distribution of Doff is Bernoulli. In particular, since the parameter of the Bernoulli distribution, θ(t), is equal to , we can say
where is obtained by setting n1 = 1 in the above reduced CME,
Our reduction method therefore provides us with an “exact” solution at all times in this case, since we do not need to perform any stochastic simulations sampling the reduced CME, but rather just numerically solve the ordinary differential equation above.
Figure 3 shows how the approximate expression for θ given in Eq. (45) compares with the true θ (obtained by computing 〈n1〉 from an ensemble average of SSA trajectories of the full CME) for different relative abundances at t = 50. The relative abundance is controlled by choosing the rate constants to scale as (in accordance with the limits delineated in Section II B) and varying x over a certain finite range (see caption of Fig. 3). For the parameter set chosen, we find that the approximation for θ using the reduced CME is in good qualitative agreement with that calculated from the full CME when the ratio of abundant to non-abundant concentrations varies over the range 103–107. In particular both the full and reduced CME predict that the probability of the gene being in the off state increases monotonically, in a step-like manner, as the protein concentration increases at constant gene concentration (consistently with a negative feedback loop). For low relative abundances (ratios less than a few hundreds), the approximate θ is almost double the true value implying that the reduced CME in this case over-estimates the strength of the negative feedback.
3. Metabolic network
We consider an arbitrarily large, sequential enzyme reaction network which has been previously associated with metabolism.31,32 The network consists of N + 1 enzymes, each converting a substrate into a product which then serves as the substrate for the next enzyme in the cascade. The first substrate is produced by a zeroth-order reaction, and the final substrate is converted into a product which we ignore. We seek the approximate distribution of the enzymes, which we expect to be exact in the limit where substrates are abundant. The full chemical system is described by the scheme,
It is hence clear that the molecule numbers of each enzyme species are binomially distributed in steady-state conditions,
where ϕSi is the steady-state solution of the REs of the full system, Eq. (46), given by
For a time-dependent description, the reduced CME corresponding to reduced chemical system (47) cannot be exactly solved and stochastic simulations are required. In Fig. 4, we plot both the approximate and exact distributions (using Extrande for the reduced system and the SSA for the full system) of the enzyme E1 at a fixed time for different abundances of substrate S1. It is clear that the approximation improves as the substrate becomes more abundant than enzyme and is essentially exact in the bottom right panel where the relative abundance is . It is also remarkable that the approximation is good even when there is essentially no clear separation in abundance, i.e., . Indeed even though the approximation suffers quantitatively when the relative abundance is not high, yet it captures the main distinctive qualitative feature, namely, that the distribution changes from positive to negative skewness as a function of the relative abundance (the switch happens at a relative abundance between 1 and 10).
For this system, we have the added benefit that the distribution of the number of molecules of each enzyme Ei is independent in the approximate description. This means that if we are interested in the distribution of the number of molecules of a given enzyme, say, E1, then we only need to simulate the three reactions involving that particular enzyme, rather than the 3N + 4 reactions of the full system. There is therefore a marked reduction in computational time for our reduced SSA, particularly for large N, as shown in Fig. 5, where the approximate SSA is roughly 3 times faster than the exact SSA when N = 20. We note, however, that the computational time of the approximate method does increase slightly with N, owing to the need to solve a coupled system of 2N + 2 rate equations.
4. Genetic oscillator with transcriptional feedback
We consider an arbitrarily large gene regulatory network which has been previously studied as a model of a circadian oscillator.33,34 The mechanism is as follows. A protein P1 is translated by mRNA, M, which is itself transcripted by a gene in the on-state, Don. Subsequently, the protein P1 generates P2, and P2 generates P3, etc., until a final protein PN is generated. The latter can bind to Don to deactivate it as Doff, which can reversibly unbind into PN and Don. We seek the approximate distribution of the number of molecules of Don and M, which we expect to be accurate when the proteins are abundant. The full chemical system is described by the scheme,
We note that the distribution of Don is independent of M, and is therefore simply . The steady-state distribution of M can be straightforwardly obtained using the method in Ref. 2 or else a fast implementation of the finite-state projection algorithm is equally effective.15 For a time-dependent description, however, we must use stochastic simulations to determine the accuracy of our method. In Fig. 6, we plot the time development of the distribution of the number of mRNA molecules, M, for parameters such that the steady state is characterised by a fixed large abundance of proteins, in particular, , which is a physically realistic ratio for some cells.23 Remarkably the approximate distribution provides an excellent match to the exact distribution for all times, reproducing even the transition from unimodality to bimodality and back to unimodality as a function of time.
In Fig. 7, we plot the computational time taken to simulate an individual trajectory of length 10 time units with the SSA for full system (50) and Extrande for approximate system (51). For the approximate system, only 4 reactions must be simulated, while the full system has N + 5 reactions. On the other hand, the approximate system requires the time-dependent solution of an N + 2 dimensional system of REs. This trade-off implies that for N = 1, 2, the full system is slightly faster, but for any N > 3, the approximate system is faster.
B. Exact reductions
We have already shown that our approximation is exact in the limit of infinite concentration of the abundant species, but as we now show, surprisingly, there is also a wide class of systems where the method is exact, regardless of abundance separation.
1. Systems in detailed balance
We now show that for systems in detailed balance,1 where the species we remove from the system (what we would previously call “abundant” species) are not involved in a chemical conservation law, the approximation is exact. Note that detailed balance is a property of some systems in steady-state conditions, and hence necessarily, the exactness mentioned does not apply to finite times, rather it applies only in the limit of infinitely long times.
Consider a detailed balance system of R reversible reactions, and N chemical species, X1, …, XN. Let us denote reaction j as
where we allow a conservation law on the species X1, …, XM of the form
where αi and k are time-independent constants. Application of the method described by van Kampen in Ref. 35 leads to an explicit expression for the steady-state solution of the CME of this chemical system which is given by a constrained multivariate Poisson distribution of the form
where ϕi are the steady-state rate equation solutions, δ(…, …) is a Kronecker delta, and C is a normalisation constant. The marginal distribution is obtained by summing over nM+1, …, nN. The exact marginal is therefore,
where C′ is a normalisation constant.
Now the approximate reduction introduced in Section II is equivalent to approximating chemical system (52) by reduced chemical system,
with the same conservation law as above. Application of the method in Ref. 35 to the reduced CME describing the above system immediately leads to a steady-state solution which is exactly the same as Eq. (55) since the RE solution of the reduced system is the same as that of the full system. Hence the approximation is exact for this class of chemical systems in detailed balance.
2. Open Michaelis-Menten reaction with one enzyme molecule
In the following example, we show that the approximation can be exact in steady-state conditions without taking any abundant limits, even if the system is not in detailed balance.
The open Michaelis-Menten reaction is given by
where substrate molecules S are input into the system, they reversibly bind with enzyme E to form a complex C which in turn irreversibly decays into the original enzyme E and product molecules P.
We will consider the case with the conservation law nE + nC = 1, that is where there is just one enzyme molecule in the compartment. The CME describing the above reaction system is
This equation has been solved exactly in steady-state conditions in Appendix G of Schnoerr et al.36 In particular therein it was shown that the average enzyme molecule number in steady-state condition is given by . This together with the fact that a single enzyme molecule, at any given time, can be in only one of two states, implies that the steady-state marginal distribution of enzyme number fluctuations is
Next, we show that our reduction gives exactly the same distribution, regardless of the abundance of substrate. The reduced CME describing enzyme fluctuations is given by
In steady state, setting nE = 0 gives us
Therefore, with the condition , we find that
The REs for this system are
As by arguments before, the steady-state distribution is Bernoulli and hence it follows that
which is equal to exact solution Eq. (59).
By similar arguments, it can be easily deduced that the marginal distribution of any species which exists in two states and for which the average number of molecules predicted by the REs is the same as the CME, is exactly predicted by the reduced CME. The second criterion on average molecule numbers is bound to generally be the limiting one since it is typically not the case that the REs exactly agree with the mean concentrations calculated from the CME (see, for example, Ref. 26). For example for genetic feedback loop (41) the marginal distributions of the gene in the on or off state cannot be exactly predicted by the reduced CME because as shown in Ref. 2, the average number of genes in each state (equivalently the fraction of time spent in each state) predicted by the CME does not equal that of the REs.
IV. ESTIMATING THE APPROXIMATION ERROR OF THE HYBRID MODEL
As we have shown for most systems, the reduction is exact only in the limit of infinite concentrations of certain species, and the reduction is therefore an approximation if concentrations are finitely large.
We now investigate the use of the Linear Noise Approximation (LNA) to obtain an estimate of the error made by the use of the reduced CME. By comparing this estimate with that obtained from stochastic simulations of both the full and approximate systems, we demonstrate that the LNA’s estimate is accurate for a wide range of parameters and systems. Since the LNA is obtained by solving a system of coupled ordinary differential equations, our results suggest the use of the LNA as a computationally efficient means of estimating the error which bypasses lengthy stochastic simulations using stochastic simulations of the full and reduced CMEs.
The LNA is an approximation which assumes the fluctuations in each chemical species are normally distributed. More precisely, it is the leading order approximation of the system-size expansion of the CME1 in the limit of large volumes. The general formulation of the LNA is as follows (see, for example, Ref. 37 for more details).
Consider a system of N chemical species, with R reactions, where the jth reaction is
The REs for the system are then given by
where we remind the reader that S is the stochiometric matrix with elements Sij = rij − sij and is the macroscopic propensity vector defined as
The Jacobian matrix J is the derivative of Eq. (68) with respect to ,
The diffusion matrix D is given by the matrix product,
The time-evolution of the second moments of the fluctuations is then approximately given by the Lyapunov differential equation,
where ΩCij is the LNA estimate for the CME’s prediction of the covariance in the number fluctuations of species Xi and Xj.
Now, proposed reduction approximates reaction scheme (67) by
when species XM+1, …, XN are the abundant species. Note that, for the reduced system with M species, the abundant concentrations no longer function as concentrations, but instead as parameters like the reaction rates kj. The REs remain unchanged, however. The Jacobian and diffusion matrices for this system, and , are hence the upper left M × M blocks of J and D previously defined in Eqs. (70) and (71), respectively. Thus, the LNA leads to an estimate of the reduced CME’s prediction of the covariance of fluctuations, , which is the solution of the Lyapunov equation,
Hence, it follows that the LNA’s estimate of the absolute relative difference in the variance predictions of the full and reduced CME’s for species i is given by
Of course, one can also calculate this quantity as a function of time, by solving the Lyapunov equations of the full and reduced CMEs numerically; however, in what follows we shall assume steady-state conditions to simplify the presentation.
Though its generally impossible to obtain a closed-form simple analytical solution to the LNA equations, one can show that the error Ri is approximately proportional to the inverse of the ratio of the abundant to non-abundant species concentrations in the abundant limit. The proof is as follows.
Referring to Table II, if species XN is abundant, then the abundant limit consists of the rate constant scalings: , , for j such that a(j) or b(j) equal N (j denotes a bimolecular reaction which involves XN and another species) and the steady-state concentration scalings: ϕi = ci for i≠N, where ci are constants independent of x and ϕN ∝ x. It is easy to verify using this limit and the REs given by Eq. (12) that the Jacobian matrix can be written as J = J0 + yJ1, where y = 1/x and Ji are matrices to be determined from the REs. In particular, J0 is the Jacobian of the REs with the terms describing the removal of the abundant species set to zero. The same scaling form for the Jacobian is obtained for any number of abundant species.
On the other hand, the diffusion matrix D is unchanged under the abundance limit. This is since by Eq. (71), the elements of the D are linear functions of the macroscopic rate functions fj (see Eq. (69)) which are unchanged by the abundance limit since each limit of kj tending to zero will be counterbalanced by the opposite limit of a concentration of an abundant species tending to infinity.
Hence, in the abundance limit, Lyapunov Eq. (72) can be written as
The form of this equation suggests a solution of the type C = C(0) + C(1)y + C(2)y2 + O(y3). Indeed plugging this ansatz in the above equation, one transforms it into a coupled set of equations for the matrices C(i) which can be solved iteratively, i.e., the equation for C(i) depends on C(j) where j < i except for C(0) which is a function of J0 and D only. Now the abundance limit is the limit y = 1/x → 0 and hence the relative error in the variance can be written as
Now in the abundance limit, the ratios of abundant to non-abundant concentrations are proportional to x = 1/y and hence it follows that in this limit, the relative error Ri is proportional to the inverse of these ratios.
Next we demonstrate the accuracy of the LNA’s estimate of the relative error in the predictions of the reduced CME, i.e., Eq. (75). This is done by comparison of the LNA estimate with the relative error directly computed from the SSA of the full CME and the steady-state analytical solution of the reduced CME for three examples of biochemical relevance.
A. Open Michaelis-Menten reaction with multiple enzyme molecules
We consider open Michaelis Menten system (57) with multiple number of enzyme molecules, i.e., (nE + nC) = ET, where ET is the total number of enzyme molecules. We consider the case where the substrate is much more abundant than the enzyme. Computing the LNA of the full CME and of the reduced CME for the non-abundant enzyme species, we find that the relative error in the variance of the enzyme number fluctuations, as given by Eq. (75), is
where and is the Michaelis-Menten constant.
The steady-state substrate concentration solution of the REs for this system is
We define as a measure of the relative abundance of substrate S. It then follows that R can be written as
The condition 0 < a < 1 is a requirement for the existence of a steady state, and R is a monotonically increasing function of a, so the maximum possible value of R is at a = 1, in other words,
That is, if the substrate concentration is ten times the total enzyme concentration, then the percentage relative error in the reduced CME’s estimate of the variance of enzyme number fluctuations will be less than ten percent.
The reduced CME can in this case be exactly solved in steady-state conditions and one obtains a binomial distribution with parameters ET and 1 − a describing the fluctuations in enzyme molecule numbers; indeed for the case ET = 1, the binomial distribution reduces to the Bernoulli distribution found earlier for the open Michaelis Menten system with one enzyme molecule (see Eq. (59)). In Fig. 8, we use the variance calculated from this solution together with the variance calculated from time-averages of SSA (for the full CME) to compute the true error in the reduced CME’s variance of enzyme number fluctuations for the open Michaelis-Menten system. This is done for two different volumes, Ω = 1 and Ω = 103. The true error is also compared in the same figure with the LNA estimate given by Eq. (80). The relative concentrations of substrate and enzyme are controlled by setting the rate constant k1 proportional to 1/x and varying x (in accordance with the abundance limits discussed earlier; see caption of Fig. 4 for details). The LNA estimates are reasonably good for both volumes but practically indistinguishable from the true error for the larger volume of Ω = 103. This is to be expected since the LNA becomes exact in the limit of large volumes.
B. Open homodimerisation reaction
Next, we use the LNA to estimate the errors in the reduced CME description for homodimerisation example (36). We consider the case in which species X1 is abundant compared to species X2. Choosing the scalings k2 = c1/x2 and k3 = c2/x (where ci are proportionality constants), it follows by the considerations of Section II B that the steady-state concentration of X1 is proportional to x while that of X2 is a constant; hence, by varying x we have a convenient way to control the ratio of the two concentrations. In particular, in steady-state, the solution of the REs is given by
where , , are constants. Note that the ratio of the abundant to the non-abundant species concentrations is proportional to x. Hence, by Eq. (83), it follows that the relative error has a maximum equal to λ0/λ1 and decreases monotonically as X1 becomes more abundant relative to X2. Next we test the accuracy of the LNA estimate.
The reduced CME for this system can be exactly solved in steady-state conditions and one obtains a Poisson distribution for the fluctuations in the number of molecules of X2 with parameter . In Fig. 9, we use the variance calculated from this solution together with the variance calculated from time-averages of SSA (for the full CME) to compute the true error in the reduced CME’s variance of number of X2 fluctuations. This is done for two different volumes, Ω = 1 and Ω = 103. The true error is also compared in the same figure with the LNA estimate given by Eq. (83). As for the previous example, the LNA accuracy is good across a wide range of volumes and becomes particularly accurate in the limit of large volumes. It is also noteworthy that the LNA estimate of the relative error is good over a wide range of relative abundances; in particular it even provides an accurate value (about 0.3) for the maximum relative error which occurs in the limit of small relative abundance of X1 compared to X2.
C. Genetic feedback loop
Here, we use the LNA to estimate the error in the reduced CME description for genetic feedback loop (41), where the gene concentration is fixed to 1/Ω, i.e., a single gene. We consider the case where the protein P is much more abundant than the gene. Choosing the scalings d0 = c0/x and d1 = c1/x (where ci are proportionality constants), it follows by the considerations of Section II B that the steady-state concentration of X2 (the protein) is proportional to x while that of X1 (the gene) is a constant; hence, by varying x we have a convenient way to control the ratio of the two concentrations. In particular, in steady-state, the solution of the REs is given by
The LNA relative error in the variance of the fluctuations of the non-abundant gene as given by Eq. (75) is
where λ = ϕ2/ϕ1x, λ0 = − c1(−1 + ϕ1Ω)(c1(−1 + ϕ1Ω)(−1 + 2ϕ1Ω) v0v1 − c0ϕ1Ω2v1(v0 + v1) + c0c1ϕ1Ω((−1 + ϕ1Ω) v0 − ϕ1v1) λ), and are constants. Note that the ratio of the abundant to the non-abundant species concentrations is proportional to x. Hence, by Eq. (86), it follows that the relative error has a maximum equal to λ0/λ1 and decreases monotonically as X2 becomes more abundant relative to X1. Note that the form of the LNA estimate of the error in this example is the same as that in the previous example.
In Fig. 10, we plot the true error in the variance of the fluctuations of the non-abundant gene computed using time-averages of SSA (for the full CME) and the analytical solution of the reduced CME in steady-state conditions (a Bernoulli distribution with parameter given by the steady-state solution of Eq. (45)) at Ω = 1. This is compared with LNA’s estimate Eq. (86) which is found to be particularly accurate, as found previously for the enzyme and dimerisation examples. However, unlike the previous examples, for the gene system, in the limit of large Ω, the LNA’s estimate does not become more accurate. The reason is that the LNA is accurate in the deterministic limit in which all species molecule numbers increase to infinity at constant concentration, whereas, in this example, the gene molecule number is fixed to one and only the protein molecule number is taken to infinity.
D. Genetic oscillator
Finally, we use the LNA to estimate the error in the reduced CME description for genetic oscillator (50). We consider the case where the proteins Pi are more abundant than the mRNA, while the gene is restricted to a maximum concentration of , i.e., a single molecule. Choosing the scalings d1 = c1/x and k2 = ⋯ = kN+1 = c2/x (where ci are proportionality constants), it follows by the considerations of Section II B that the steady-state concentrations of Pi (the proteins) is proportional to x while that of Don (the gene) and M (the mRNA) are constant; hence, by varying x, we have a convenient way to control the ratio of the two concentrations. In particular, in steady-state, the solution of the REs is given by
Given the arbitrarily large number of species, there is no compact analytic expression for the LNA relative error in the variance of the fluctuations of the non-abundant mRNA as given by Eq. (75); however, the error can be calculated by numerical solution of the REs and the Lyapunov equations of the full and reduced systems.
In Fig. 11, we plot the true error in the variance of the fluctuations of the non-abundant mRNA computed using time-averages of SSA (for the full CME) and the solution of the reduced CME in steady-state conditions (computed with the finite-state projection algorithm) at Ω = 1. This is compared with the LNA estimate which is found to be accurate for physically realistic abundances (ratio of protein to mRNA concentrations are commonly larger than a hundred in bacteria; see for example Ref. 23). However, unlike some of the previous examples, for the gene system, in the limit of large Ω, the LNA’s estimate does not become more accurate. The reason is that the LNA is accurate in the deterministic limit in which all species molecule numbers increase to infinity at constant concentration, whereas, p in this example, the gene molecule number is fixed to one and only the protein molecule number is taken to infinity.
V. SUMMARY AND DISCUSSION
Summarising, in this paper, we have introduced a novel reduced stochastic description of chemical systems in which some species are abundant. The key intuitive idea is to replace the conditional expectation of the number of molecules of abundant species in the propensities of the exact marginal CME by the solutions of the deterministic REs and hence obtain a reduced CME for the non-abundant species only. Therefore, our method is a hybrid approach. We note that our method is different and simpler than that presented in Refs. 19, 21, and 22 since the latter postulate a more complicated approach than REs to model the abundant species. This relative simplicity indeed leads to three major strengths of our approach over existing approaches: (i) it is easier to implement and computationally more efficient than present approaches; (ii) our reduced CME can be explicitly solved for a number of biochemically relevant examples; (iii) simple rational expressions can be derived which estimate the errors inherent in the hybrid approximation relative to the fully stochastic description. Curiously we also found that the reduced CME at the heart of our hybrid method is exact for some chemical systems, i.e., without requiring the necessity of abundance separation or without restricting the system to purely monomolecular systems (as was found to be the case in Ref. 21 to ensure exactness for the Hellander and Lotstedt model). The major disadvantage of our approach is that its unlikely that it will be able to capture as many features of the fully stochastic model as the more sophisticated approaches mentioned above.
The present work also suggests some new avenues of research. The first is finding exact error bounds for the reduced CME, which could provide useful reassurance for mathematical modellers using this method. A second interesting direction would be to develop a more refined reduction of the CME by replacing the conditional expectation of the number of molecules of abundant species in the propensities of the exact marginal CME by the solutions of effective mesoscopic rate equations (EMREs)26 instead of REs. EMREs have been demonstrated to be more accurate than REs in the sense that the difference between their mean concentration solution and that of the CME is considerably smaller than the difference between the RE solution and that of the CME.27 Hence, a CME reduction based on EMREs is highly likely to be more accurate than the one developed, in this paper, particularly for cases where the compartment volume is small such that even though there is a large ratio of abundant to non-abundant concentrations, the number of molecules of abundant species is small. Finally another interesting area for future work is the relationship between time scale separation and abundance separation. It is clear that latter does not typically imply the type of time scale separation typically used to obtain reduced CMEs (see for example Ref. 16) because it does not lead to a partitioning of reactions into fast and slow types; this is since within our abundance separation method, each limit of a rate constant tending to zero is counterbalanced by the opposite limit of a concentration of an abundant species tending to infinity. Yet it is not difficult to show that our abundance separation limit does lead to a separation of the eigenvalues of the Jacobian and hence point to time scale separation of concentration transients on the deterministic level. Thus, a deeper investigation into the relationship between abundance separation and time scale separation could improve understanding of both types of separation and as well lead to a clearer picture regarding when CMEs can be effectively reduced.