Noise in genetic networks has been the subject of extensive experimental and computational studies. However, very few of these studies have considered noise properties using mechanistic models that account for the discrete movement of ribosomes and RNA polymerases along their corresponding templates (messenger RNA (mRNA) and DNA). The large size of these systems, which scales with the number of genes, mRNA copies, codons per mRNA, and ribosomes, is responsible for some of the challenges. Additionally, one should be able to describe the dynamics of ribosome exchange between the free ribosome pool and those bound to mRNAs, as well as how mRNA species compete for ribosomes. We developed an efficient algorithm for stochastic simulations that addresses these issues and used it to study the contribution and trade-offs of noise to translation properties (rates, time delays, and rate-limiting steps). The algorithm scales linearly with the number of mRNA copies, which allowed us to study the importance of genome-scale competition between mRNAs for the same ribosomes. We determined that noise is minimized under conditions maximizing the specific synthesis rate. Moreover, sensitivity analysis of the stochastic system revealed the importance of the elongation rate in the resultant noise, whereas the translation initiation rate constant was more closely related to the average protein synthesis rate. We observed significant differences between our results and the noise properties of the most commonly used translation models. Overall, our studies demonstrate that the use of full mechanistic models is essential for the study of noise in translation and transcription.

## INTRODUCTION

Translation and transcription are two of the most important cellular processes. An important aspect of these processes is template polymerization, which leads into mechanistic volume-exclusion kinetics that must take into account the ribosome movement along the messenger RNA (mRNA) chain. Protein translation is a multi-step process, and its mechanistic details have been reviewed by Rodnina and Wintermeyer^{1} (Fig. 1). Although most mathematical models ignore such mechanistic details, recent experimental and computational studies point to the importance of polysome formation along the mRNA.^{2–9}

Mathematical models that account for ribosome movement and the kinetics of transition between individual codons have been used to study protein synthesis.^{5–7,10–14} Earlier studies analyzed the properties of a single gene in isolation, examining, for example, the rate-limiting steps of translation or the effect of different codon elongation rate constants along the mRNA chain. These studies were based on different types of analyses: ordinary differential equations (ODE) to estimate the deterministic evolution of the system, on a mean-field statistical approach termed TASEP (for Totally Asymmetric Exclusion Process), or on stochastic simulations. These modeling studies have been comprehensively reviewed by von der Haar.^{15} Recent work has increasingly considered the competition between multiple mRNAs to study the effect of ribosome pool depletion. Such analyses were first performed using a TASEP-based approach for a few genes in competition,^{16} whereas subsequent studies employed stochastic simulations that encompassed a realistic pool of genes.^{17,18}

However, these computational studies (of isolated genes or competing genes) considered the resultant average properties of the system and not the properties of noise due to translation. There is nevertheless large *in vivo* phenotypic variability within a given cell population owing to the intrinsic or extrinsic noise that arises, for example, from the small number of mRNA copies or from the stochastic production and degradation of mRNAs.^{19–23}

Therefore, to evaluate the impact of noise in the process of protein synthesis, we must model and study the system using a stochastic framework. Efficient stochastic simulations of chemical systems have been initiated by Gillespie.^{24} Protein synthesis and translation have later been studied under a stochastic framework for simplified systems.^{17,18,25–27}

The two main challenges in the stochastic simulation of protein translation that accounts for the ribosomes are the conservation of ribosomes (a ribosome leaves the pool of free ribosomes, translates a mRNA for some time, and is then recycled back to the pool of free ribosomes) and the coupling of two types of processes: 3D and 1D kinetic processes. The commonly used algorithms for the stochastic simulation of chemical reactions are based on the assumptions of a well-mixed reaction system, and the kinetic laws are derived from 3D collision theory. While translation initiation can be described as a 3D kinetic process, elongation and termination are 1D processes (i.e., ribosomes moving irreversibly in one dimension). Another challenge common to the stochastic simulation of biological and physical systems is the scalability of the simulations: the mRNA copies should not be observed simply in isolation because they compete for the same resources (e.g., ribosomes and tRNA) and a cell-sized system consists of multiple tens of thousands of simultaneous reactions. Moreover, to perform sensitivity or parametric studies and to elucidate the influence of various parameters on the system, simulations must be repeated multiple times and for various sets of parameters, thus requiring efficient algorithms. Additionally, in view of the recent increase in interest in the importance of delays arising from protein synthesis,^{26,28–30} we designed our algorithm to follow the individual motion of each ribosome, thus allowing the computation of such delays as well as their distribution, although such a capability further increases the complexity of the system.

In this study, we introduce a new and efficient stochastic translation algorithm to alleviate the constraint of studying a single gene in isolation, thereby allowing us to estimate the effects of competition among mRNAs for a common pool of ribosomes. We further characterize in detail the noise properties and propagation in the system when taking translation elongation into account and we characterize how the noise and translation properties are influenced by various parameters.

We first compare our algorithm with models based on ODE for a single mRNA copy. Because the algorithm enables efficient simulations of pools of ribosomes and mRNAs, we subsequently study the implications of genome-scale competition for common resources. We further quantify the influence of the model parameters on the distribution of the outputs from the system via sensitivity analysis. We also study the propagation of the noise through protein translation and observe that ribosome elongation reduces noise. The results are further analyzed to estimate the time delays in the system. Finally, we compare our full mechanistic model with reduced models commonly used in the literature and demonstrate that such lumped models cannot capture the true noise present in the system.

## METHODS

### Model of protein translation

To study the effects of stochasticity in translation, we base our model on the formulation by Heinrich and Rapoport.^{12} The reactions of the system are as follows (see also Fig. 1):

where *R _{free}* represents a free ribosome value obtained via Eq. (4) below (note that the various intermediate steps of initiation have been incorporated into the initiation rate constant $ k I j $ and that we consider here the entire ribosome binding to the mRNA initiation codon as one element); $ m I j $ is a free initiation codon of mRNA

*j*; and $R m i j $ is the complex “ribosome bound to codon No.

*i*of mRNA

*j*.” Similar to what is observed in prokaryotes and eukaryotes,

^{4,31–34}the ribosome is assumed to occupy

*L*= 12 codons of the mRNA and to then slide along by one codon at a time. Consequently, the system has a 1D constraint forbidding the translocation of a ribosome if another ribosome is on any of the

*L*codons in front of it. Once a ribosome is free to translocate, its complex is in the activated form $R m i j , * $. Similarly, the initiation site of the mRNA is not available for initiation as long as a ribosome lies on the

*L*first codons of it. For elongation, the effective elongation rate constant, $ k E , i eff $, depends on the various intermediate steps of elongation and the concentrations of the corresponding aminoacyl-tRNA (aa-tRNA) and elongation factors.

^{10,11}However, in this study, we assumed a uniform elongation rate constant (i.e., identical for each codon), of 10 amino acids/s (similar to the value observed in

*Saccharomyces cerevisiae*

^{35}and

*Escherichia coli*

^{36}). Moreover, a gene is usually some hundreds of codons long, and the elongation reaction occurs on each codon. Finally, the new protein is released from the stop codon of the mRNA (codon No.

*n*, value depending on the gene species) with the termination rate constant $ k T j $.

_{j}### Reversible translation initiation term

In contrast to the description provided in Eq. (1) above, the binding of ribosomes to the mRNA can be reversible. As we demonstrate in this paper, this reversibility of ribosome binding does not have any considerable influence on the protein synthesis rate or on most properties of translation; thus, the majority of the simulations are performed using the model described in Eq. (1). For simulations including a reversible initiation step, the model includes, in addition to the steps of Eq. (1), the following step:

where $ k \u2212 I j $ represents the reverse initiation rate constant for the ribosome bound to the first codon, $ m 1 j $.

When the reverse initiation step is omitted, the model has a lower triangular structure. Consequently, the *linear noise approximation* method could be used to explore analytical solutions for the noise of the system near its steady state by solving Lyapunov equations.^{37}

### Definitions and additional parameters

We define the *specific protein synthesis* rate (or *translation rate*) as the rate at which new proteins appear in the system per mRNA copy of the gene, which is also equal to the *translation termination rate* (which corresponds to the rate at which the last reaction of Eq. (1) occurs per mRNA copy and is computed in our simulations as the inverse of the time interval between two occurrences of this reaction). Similarly, the *translation initiation rate* is defined as the rate at which ribosomes effectively bind to a free initiation site of a given mRNA copy (which corresponds to the rate at which the first reaction of Eq. (1) occurs per mRNA copy).

Multiple ribosomes can simultaneously synthesize proteins on a single mRNA copy. The number of ribosomes on the same mRNA copy is called the *polysome size*, *P*, and we define the *ribosomal density* of the mRNA copy as

This density takes values between 0 (the mRNA is completely empty of ribosomes) and 1 (the open reading frame (ORF) of the mRNA is completely covered by ribosomes).

In our simulations with various gene species, each gene species is present in multiple copies (the number of mRNA copies per species is kept constant, i.e., transcription and mRNA degradation are not considered), and each gene species can have different characteristics: number of codons and initiation and termination rate constants. However, the elongation rate constant depends on the codon species but not the gene species. To focus on the properties of the translation process, this study assumes constant abundances of mRNA copies and ribosomes, but our formulation can easily be extended to consider dynamic changes in the mRNA and ribosome populations.

At any time point, the number of free ribosomes is given by

where *R _{tot}* is the total number of ribosomes in the cell and is assumed to be constant. The summation of the polysome sizes from all mRNA copies corresponds to the number of ribosomes bound to mRNAs of any species. However, in most of the simulations, we study one mRNA copy in isolation, and as we will show, the depletion of ribosomes by a single mRNA copy is negligible. Therefore, we will assume for these simulations of a single gene, without loss of generality, that the number of free ribosomes is constant in the system, i.e.,

*R*≈ 17 670. (This value was computed based on data for polysome size in

_{free}*S. cerevisiae*

^{2}and considering that 80% of the total ribosomes are active in translation;

^{2}similar values have been observed in

*E. coli*: between 1000 and 14 400 ribosomes are free at various growth rates.

^{36})

The model described here differs from the original models of Heinrich-Rapoport^{12} and Gibbs-MacDonald^{38} in two main respects: first, the pool of free ribosomes is allowed to vary in our model, and second, as continuous deterministic models, these other models must include an approximate expression describing the probability that codon *i* is free, knowing that a ribosome is on codon *i* − *L*. This approximation only holds under low ribosomal crowding conditions, whereas our model, which is described under discrete stochastic terms, better reflects the true distribution of ribosomes on mRNAs.

The parameter values used for the simulations are provided in Tables S1–S3 in the supplementary material.^{52} The definitions of uncommon terms used throughout this document are summarized in Table I.

Name . | Definition . |
---|---|

Ribosomal density and crowding | The ribosomal density describes the average coverage of the mRNA by ribosomes (see the Methods section), whereas ribosomal crowding refers to high local coverage (many ribosomes near each other), possibly slowing down translation. |

Specific protein synthesis rate | Rate at which new proteins are synthesized per mRNA copy of the gene. |

Optimal synthesis rate profile | Profile of the maximum synthesis rate as a function of ribosomal density; this profile corresponds to the maximum achievable specific synthesis rate for each ribosomal density of the mRNA. |

Critical density | Ribosomal density above which an increase in the number of ribosomes bound to the mRNA results in a decrease in the specific protein synthesis rate. |

Opto-critical density | Critical density of the optimal synthesis profile, resulting in the overall maximal protein synthesis rate. |

Initiation and termination event | The event when a ribosome binds the initiation site of the mRNA (initiation) and the event when a ribosome finishes synthesizing a new protein (termination). |

Initiation (i) and translation (ii) delay | Time it takes for a ribosome between the initiation event and (i) when it clears the initiation blocking region (codon L), allowing space for another ribosome to bind to this mRNA; (ii) the termination event (i.e., total time the ribosome is bound to the mRNA). |

Name . | Definition . |
---|---|

Ribosomal density and crowding | The ribosomal density describes the average coverage of the mRNA by ribosomes (see the Methods section), whereas ribosomal crowding refers to high local coverage (many ribosomes near each other), possibly slowing down translation. |

Specific protein synthesis rate | Rate at which new proteins are synthesized per mRNA copy of the gene. |

Optimal synthesis rate profile | Profile of the maximum synthesis rate as a function of ribosomal density; this profile corresponds to the maximum achievable specific synthesis rate for each ribosomal density of the mRNA. |

Critical density | Ribosomal density above which an increase in the number of ribosomes bound to the mRNA results in a decrease in the specific protein synthesis rate. |

Opto-critical density | Critical density of the optimal synthesis profile, resulting in the overall maximal protein synthesis rate. |

Initiation and termination event | The event when a ribosome binds the initiation site of the mRNA (initiation) and the event when a ribosome finishes synthesizing a new protein (termination). |

Initiation (i) and translation (ii) delay | Time it takes for a ribosome between the initiation event and (i) when it clears the initiation blocking region (codon L), allowing space for another ribosome to bind to this mRNA; (ii) the termination event (i.e., total time the ribosome is bound to the mRNA). |

### Stochastic translation algorithm

We implement an exact stochastic algorithm in continuous time, adapted from the exact kinetic Monte Carlo algorithm of Gillespie,^{24} which allows us to perform stochastic simulations of our system composed of a large number of species. However, to cope with the large number of components present, we need a refined algorithm that addresses two major issues: (1) a crude kinetic Monte Carlo would follow the evolution of every constituent separately as needed but would result in an unacceptable number of reaction propensities and (2) Gillespie’s algorithm^{24} is a more efficient exact stochastic algorithm established for 3D reactions kinetics under the assumption of a well-mixed system but does not distinguish “particle identities” of the same type, which is necessary here. Once bound to the mRNA, ribosomes follow a 1D trajectory along the mRNA and are therefore subject to constraints, for example, a ribosome cannot overpass another ribosome on the same mRNA. Therefore, we need to know exactly which ribosomes and mRNAs are able to react at each time point.

We thus extend Gillespie’s algorithm as follows (see Fig. 2 and Fig. S1 in the supplementary material^{52} for a detailed flowchart of this algorithm; the numbers below refer to annotations in the figures):

We first set the initial conditions. Usually, we start with all mRNAs empty and all ribosomes free and then let the system evolve to reach a quasi-steady state before recording the steady state values. At each step, all ribosomes and mRNAs that are not blocked are grouped according to the type and rate of their current reaction. For example, the free ribosomes are grouped together; the ribosomes bound to a mRNA are grouped according to the codon on which their A-site currently lies (because the elongation rate can depend on codon species), and so on. This grouping limits the number of reaction propensities present in the system to only a few reactions and thus results in more efficient computations.

For each iteration, we start by evaluating the next reaction and time step in the same way as in Gillespie’s continuous time algorithm, drawing two random numbers: one giving the time step value and the other determining the current reaction.

Then, depending on the current reaction, we eventually draw an additional random number, in a second Monte Carlo step, to determine the exact identity of the currently reacting ribosome and mRNA: even knowing which is the current reaction (through the Gillespie’s type step from point 2), a subset of ribosomes may still be able to perform this reaction, and we therefore randomly select one ribosome from this subset.

We next update the propensities only for the reactions that are affected by the change of state that just occurred. If the end time is not yet reached, this cycle (starting at point 2) is repeated for the next iteration.

Because the exact identity of the reacting mRNA and ribosome is obtained in each iteration, a detailed tracking of the trajectories of each ribosome is possible with this algorithm. An important improvement with this algorithm is the grouping of reactions sharing the same rate constants. The aim of this grouping is similar to the aim of the *sorting direct method* proposed by McCollum and colleagues:^{39} to reduce the large cost in computational time due to the search depth needed to find the current reaction. In their algorithm, McCollum and colleagues propose to dynamically sort propensities so that the reactions more likely to occur are scanned first. However, in the specific system of translation, many reactions are equally probable (the same codon species are present multiple times on the genes, with the same elongation rate constants), and thus the sorting of reactions would not be able to efficiently order the propensities. In our case, we decrease the search depth by decreasing the number of possible propensities to scan. These two algorithms can work together, and thus further improvement could be achieved if we integrated the two algorithms to dynamically sort the grouped reactions.

Our proposed algorithm is an exact stochastic algorithm and thus reproduces the properties of the noise in the system. Step 2 in our algorithm corresponds to the *direct method* from Gillespie’s algorithm and has been previously demonstrated to be exact,^{24} and the second extra Monte Carlo step in our algorithm (step 3 above) simply groups the reactions with identical rate constants, which accelerates the search of the current reaction but retains the same possible reactions and related propensities in the system as the *direct method*. To validate our algorithm, we determined that our algorithm achieved the same results as the *direct method* (not shown) but with a considerable increase in speed (Fig. S2(a) in the supplementary material).^{52}

To illustrate this point, consider a small system with five possible reactions, each with a propensity of *a _{i}* (

*i*= 1, …, 5), and assume that reactions 2 to 5 all have the same propensity, which we define as

*A*

_{2_5}:

*a*

_{2}=

*a*

_{3}=

*a*

_{4}=

*a*

_{5}= :

*A*

_{2_5}. In the standard Gillespie algorithm, the total propensity would be $ a tot = \u2211 k = 1 5 a k $; two random numbers,

*r*

_{1}and

*r*

_{2}, would be drawn to determine the next time interval,

and determine the next reaction, reaction *i*, by

For the purpose of illustration, we denote the propensities as *b _{j}* in our algorithm. The propensities that are equal to each other (

*a*

_{2}to

*a*

_{5}in our illustration) are first grouped into an extended propensity:

*b*

_{2}=

*a*

_{2}+

*a*

_{3}+

*a*

_{3}+

*a*

_{5}= 4 ⋅

*A*

_{2_5}, and the other propensity is retained (

*a*

_{1}in the illustration):

*b*

_{1}=

*a*

_{1}. Despite the grouping of propensities, the total propensity is equal to the original one:

*b*=

_{tot}*b*

_{1}+

*b*

_{2}=

*a*

_{1}+ 4 ⋅

*A*

_{2_5}=

*a*. As in Gillespie’s algorithm, we first draw two random numbers

_{tot}*r*

_{1}and

*r*

_{2};

*r*

_{1}preserves the same time interval: $d t b =\u2212 1 b tot \u22c5log r 1 =\u2212 1 a tot \u22c5log r 1 =dt$. The current reaction is determined similarly to Eq. (6): $ \u2211 k = 1 i \u2212 1 b k < r 2 \u22c5 b tot \u2264 \u2211 k = 1 i b k $. If the current reaction is reaction

*i*= 2, which corresponds to

*b*

_{2}, then a third random number is drawn,

*r*

_{3}, which is a uniformly distributed random integer with possible values 2, 3, 4, and 5. This random number determines which of the four “sub-reactions”

*a*

_{2}to

*a*

_{5}occur. Because these four reactions all have the same propensity (

*A*

_{2_5}), they are equally probable; thus, the probability that each reaction will occur is identical to that in Gillespie’s original algorithm, i.e.,

where equality (1) in this last equation corresponds to the computed steps with our algorithm and equality (2) is from the standard Gillespie algorithm.

As we have demonstrated that the time interval and reaction probabilities are exactly the same as in the standard Gillespie method, our method will give exactly the same results. The proof of this illustration is generalizable to a system of any size. Thus, the results are the same as with Gillespie’s algorithm, but because of the grouping of reactions, the search for the current reaction (Eq. (6) in Gillespie’s algorithm) will be much faster using our algorithm when the system consists of many possible reactions. This difference results in an improved computational efficiency of the algorithm when dealing with a large number of species (Fig. S2 in the supplementary material):^{52} with our algorithm, the runtime needed per iteration is nearly independent of the number of species.

## RESULTS AND DISCUSSION

It has been previously shown that the specific protein synthesis rate depends on the polysome size and density. More specifically, using deterministic models of translation, we and others have determined that for given codon elongation rate constants, there exists a unique pair of initiation and termination rate constants (inputs of the model) that results in a given ribosomal density, ribosome distribution along the mRNA chain, and specific protein synthesis rate^{10,12,38,40} (see definitions in Table I). Note that because the elongation rate constants are fixed in the model, not all of the output states of ribosome distribution, ribosomal density, and specific protein synthesis rate can be achieved. In addition, for each value of ribosomal density, there exists a range of reachable specific protein synthesis rates, and it is possible to determine the set of initiation/termination rate constants that corresponds to the optimal synthesis rate (i.e., for a given value of ribosomal density, we define the optimal synthesis rate as the maximum synthesis rate at this ribosomal density). We define the optimal synthesis rate profile as the curve that describes the optimal (maximum) synthesis rate as a function of the ribosomal density (the specific synthesis rates below this curve but not above it can be achieved). As the ribosomal density increases (by increasing the initiation rate constant), the corresponding optimal synthesis rate increases monotonically until a critical value beyond which it declines. The highest ribosomal density states, in which the synthesis rate decreases after the critical value, are only reached when the termination rate constants decrease.

### Comparison of the stochastic system to the deterministic system and experimental data

Previously, we developed a deterministic model based on ODEs giving steady-state values of protein synthesis rates.^{10} This model allowed us to efficiently explore the parameter space of initiation and termination rate constants. However, deterministic models are based on approximations and assume many copies of each component, and the results obtained from deterministic models do not necessarily hold in the more general stochastic framework. Therefore, we first compare the results from the deterministic system with the time-averaged values obtained in the stochastic simulations using the same input values (initiation and termination rate constants, gene length, number of ribosomes, and mRNA copies).

The stochastic and deterministic models of the system are compared with respect to the synthesis rate profiles, which display the specific protein synthesis rate as a function of the ribosomal density, *ρ* (Fig. 3 and Fig. S3 in the supplementary material^{52}). We examine three characteristic cases: the optimal synthesis profile (Fig. 3), i.e., the maximal reachable specific synthesis rate for each ribosomal density, and two suboptimal profiles with the same ribosomal densities (Fig. S3).^{52} Comparing the results from the stochastic simulations of these three cases with the profiles obtained from the ODE simulations reveals very good agreement. Furthermore, we observe a critical density in both the deterministic and stochastic simulations: when more ribosomes are present on the mRNA than this critical density, then the specific synthesis rate decreases. This effect is due to the crowding of ribosomes on the same mRNA, which creates a “traffic jam” effect (see below). Because the results of both types of simulations, deterministic and stochastic, are in perfect agreement, we can estimate the parameter values that optimize the synthesis rate using the deterministic model and use them as parameters in our stochastic system in subsequent analyses.

In deterministic simulations, the results correspond to the averaged values of the system, whereas stochastic simulations contain more information because the full population evolution can be followed, enabling the observation of the distribution of possible values of the system. Using the stochastic simulations, we can determine the variability of various observables (e.g., rates, time delays, and ribosomal density), thus providing information about the noise present during the translation of proteins (Fig. 3).

A commonly used measure for the noise in the system is the coefficient of variation (CV, defined as *CV* = *σ*/*μ*, where *σ* is the standard deviation and *μ* is the mean). We report the values of CV for the synthesis rate in Fig. 3 and Fig. S3 in the supplementary material.^{52} However, to our knowledge, no experimental study has measured the amount of noise in the synthesis rate at the genome scale, and thus we cannot directly compare these estimates to experimental measures. Instead, noise has been measured experimentally for the protein abundances: in *E. coli*, the distribution of noise in protein abundances is between 0.3 and 3, with a median value of 0.47 (when the noise is expressed as CV for protein abundance);^{41} similar values prevail in *S. cerevisiae*, with a distribution of observed CV values between 0.08 and 0.5.^{19} We thus extend our stochastic model with a protein degradation term to estimate the noise on protein abundances in our simulations and for comparison with the experimental values (see Appendix A for details).

The noise we observe in protein abundances depends on the ribosomal density and protein half-lives (Fig. S4 in the supplementary material).^{52} Importantly, the amount of noise in our model is consistent with experimentally observed values. For most parameter sets used in our simulations, the noise we obtain is slightly lower than the experimentally observed noise, but this discrepancy can be explained by the following: (1) experimental observations have indicated that most mRNAs are translated with a low ribosome density in prokaryotes and eukaryotes,^{2,6} which corresponds to the regime in which our simulations encompass a higher amount of noise for protein abundances as well (Fig. S4(b) in the supplementary material)^{52} and (2) our model of protein abundance encompasses a detailed model of protein translation and a simplified term for protein degradation but does not consider, for example, mRNA synthesis and degradation, mRNA transport (for eukaryotes), or possible feedback mechanisms, all processes that could add some noise to the system.

### The maximal protein synthesis rate corresponds to minimum noise and uniform ribosome distribution along the mRNA

At the opto-critical density (defined in Table I), at which the maximal specific protein synthesis rate is achieved over the whole range of ribosomal density, the variability in all four properties of ribosomal densities, rate of ribosome binding, synthesis rate, and protein abundance are also the lowest (Fig. 3 and Fig. S4(b) in the supplementary material,^{52} as indicated by a smaller coefficient of variation and by smaller interquartile ranges). This simultaneous reduction in noise could be the result of evolutionary forces as it implies that, as the cellular machinery may have evolved toward high specific synthesis rates for a given protein, it also evolved in parallel toward reduced variability for the processes and properties associated with the synthesis rate.

In the suboptimal synthesis rate profiles (Fig. S3 in the supplementary material),^{52} we observe a maximum specific synthesis rate plateau over a wide range of ribosomal densities that correspond to the identical specific synthesis rates. This result implies that the system could drift toward inefficient utilization of resources because a larger number of ribosomes will be engaged without improving the synthesis rate. The noise in the ribosomal densities is also higher under these suboptimal conditions. In fact, there is an essential difference between the ribosomal densities: for such a plateau, the ribosomal density varies widely between the extremes of the plateau, whereas for the optimal synthesis profile, the density only lightly oscillates around its mean value (Fig. S5).^{52}

### Ribosomal motion and ribosome distribution along the mRNA

The simulations allow the evolution of the ribosome distribution and the motion of each ribosome on the mRNA to be followed under different conditions (Fig. 4). We can observe that for the cases corresponding to the optimal ribosome profile (Figs. 4(a)-4(c)), the ribosome distribution along the mRNA is nearly uniform, in contrast to a suboptimal condition (Fig. 4(d)), in which the ribosome moves faster on some codons but is slowed on other codons due to a queuing effect. Additionally, at low ribosomal densities, the ribosomes exhibit regular motion on the gene, colliding only rarely (Fig. 4(a)).

Despite this quite regular motion of ribosomes at low density, we observe that the time-evolution of the ribosomal density oscillates, leading to an irregular rate of protein synthesis and thus to some bursts of proteins followed by periods of time in which fewer proteins are synthesized, which might reflect an inefficient process. Similar bursts of transcription have been reported in the literature.^{25,42,43} Note that such oscillations are reduced at higher ribosomal density (Figs. 4(b) and 4(c)) and that similar oscillations are also observed for suboptimal profiles, even at high ribosomal density (Fig. 4(d)).

Moreover, we note that the system displays some behavior similar to that of traffic jams: at high density, the ribosomes queue, and their average motion slows (Fig. 4(c)). This phenomenon is also observed at lower densities in the suboptimal conditions (Fig. 4(d)). Similar to traffic problems,^{44} we observe *shock waves* propagating in front of a hindrance (here, *hindrance* corresponds to a ribosome that takes more time than usual to leave its current codon); queues of ribosomes then accumulate on codons before this hindrance, and the queue formation lasts longer than the hindrance (the ribosome that provoked the hindrance has left, but other ribosomes remain queued downstream of it) (Figs. 4(c) and 4(d)). This behavior is observed, for example, in Fig. 4(d) in the region shaded in gray (region marked “*A*”): a ribosome provokes a hindrance by blocking the termination site starting at 108 s, which provokes a queue that propagates to prior codons; the ribosome that provoked the queuing terminates translation at 115 s, but the queue starts receding only at approximately 135 s. In this analysis, the elongation rate constant is the same along the whole mRNA. Therefore, there is no bias for a limiting codon along the chain, and the hindrance is the last codon because the termination rate constant is lower than the elongation rate constant in this case. In the presence of a codon with an elongation rate constant lower than the elongation rate constants of the neighboring codons, we could have observed similar queuing upstream of this codon.

### Background pool of genes

Models of genetic networks are commonly studied in isolation, that is, usually only a few genes are considered to interact simultaneously. However, the synthesis of individual proteins competes for the same resources, including the same free ribosomes to initiate translation, the same aa-tRNA molecules, and the same elongation factors.

Our algorithm scales linearly with the total number of mRNA copies present (Fig. S2(b) in the supplementary material).^{52} We can, therefore, study the effect of a genome-size pool of mRNAs competing in a bath of ribosomes. Indeed, due to competition between mRNAs, the number of free ribosomes (available to bind mRNAs) is constantly varying: ribosomes are trapped on a mRNA for some time during their translation and then recycled back to be available for new binding events, which could affect the results.

We study this system under two conditions. The first condition encompasses multiple gene species with different characteristics (length and initiation/termination rate constants) representing a total number of 12 900 mRNA copies with a low average ribosomal density of 0.21 (*mixed pool*). In the second condition (*uniform pool*), only a single background gene species is present, with 2450 copies under a high ribosomal density condition (*ρ* = 0.8); see Appendix B for details. In both conditions, we observe a rapid decrease in the number of free ribosomes to its mean value, followed by only small oscillations around the mean (Fig. 5 and Fig. S6(a) in the supplementary material).^{52} The oscillations in free ribosomes are small because the large number of mRNA copies present in the cell means that the variations present in isolated genes are smoothed out by averaging over the whole mRNA population. Indeed, in both conditions, changing the values of a “marker” gene present as a single mRNA copy on top of the background did not significantly influence the rest of the system, as indicated by a similar evolution of the free ribosomes (Fig. 5).

Then, for the stochasticity present for the “marker” gene, we observe that the differences between observing this gene in isolation or embedded in a background gene pool are negligible (Fig. S6(b) in the supplementary material).^{52} Therefore, the rest of this study considers an isolated gene, which allows the gene-intrinsic translation characteristics to be decoupled from eventual extrinsic noise that could come from another gene of the background.

### Dependence on gene length

We next investigated the influence of the gene length on the noise during translation. In the deterministic model, when we consider the synthesis rate profile to be a function of the ribosomal density and not of the polysome size explicitly, all such profiles of synthesis rates match, independently of the gene length. Simulations of various genes of different lengths yield synthesis profiles similar to that in Fig. 3. In stochastic simulations, the average values are also independent of the gene length, and no real difference is observed in the distribution of specific protein synthesis rates (Fig. 6). Similarly, we observe no difference in the variability in polysome sizes: the polysome size varies by 5-10 bound ribosomes for all gene lengths. However, due to different gene lengths, the variation in ribosomal densities is greater for shorter genes (Fig. 6).

### Sensitivity analysis

Sensitivity analysis aims at understanding the influence of various parameters on the output of a system. In the case of deterministic models, such analysis is well established and consists of computing the log-sensitivities (also known as the control coefficients in various fields of study^{45}), which describes the percent change of an output as a result of a percent change in a parameter. For stochastic simulations, the result of a simulation is a random trajectory, and repetitions of such simulations yield a distribution function, which complicates the sensitivity analysis and thus has been less studied. However, the results given by a distribution function also provide new perspectives.^{46} Instead of focusing only on how the average value of an output changes with a parameter, we can observe how the distribution of this output changes. In some cases, the full distribution of the output will be shifted by the same amount as the average value, but in other cases, a change in a parameter will modify the shape of the distribution of the output, which would not be reflected solely by the average value of the output.^{46}

We perform stochastic simulations for changes in the initiation, elongation, or termination rate constants and compute the distributions of synthesis rates and ribosomal densities. We next characterize quantitatively how these distributions change in response to the changes in the various parameters (Figs. S7 and S8 in the supplementary material for a change of 50% in the parameters and Fig. S9 for different amounts of change).^{52} Interestingly, not all parameters have the same effect on the resultant distribution of synthesis rates: for example, at a medium density of $ \rho \u0304 =0.37$, an increase in the initiation rate constant, *k _{I}*, shifts the distribution to a higher value, while a change in the mean elongation rate,

*k*, maintains the peak of the distribution at the same value but increases the tail of this distribution (Figs. S7 and S9(a)).

_{E}^{52}At values up to ∼0.75, we observe that an increase in

*k*results in a shift of the distribution of the ribosomes toward higher densities, whereas an increase in

_{I}*k*results in a shift toward lower ribosomal densities, the distributions always maintaining approximately the same shape (Figs. S8 and S9(b)).

_{E}^{52}We quantify these changes by computing the sensitivities on the various moments of these distributions (sensitivity on mean, variance, skewness, and kurtosis; Fig. 7 and Figs. S10 and S11 in the supplementary material;

^{52}see also Appendix C). The sensitivity on the first moment can also be obtained in the deterministic model, and the results obtained in the stochastic and deterministic cases are in good agreement (Fig. 7(a) and Fig. S11(a) in the supplementary material).

^{52}

The common hypothesis is that most genes are limited by translation initiation. This limitation is reflected in our results: the initiation log-sensitivity on the synthesis rate is the highest up to a ribosomal density of approximately 0.45 (Fig. 7(a)). This log-sensitivity describes the effect of changes of the initiation rate constant on the average synthesis rate of the system. Two remarkable points, however, emerge from this study: (1) even though the elongation log-sensitivity is lower than the initiation log-sensitivity, it is not negligible, meaning that elongation also has an impact on the synthesis rate and (2) already starting below a ribosomal density of 0.2, the sensitivity of the variance of the synthesis rate resulting from the elongation rate constant is higher than from the initiation rate constant (Fig. 7(b)), highlighting the important role of translation elongation. Note that the variance directly relates to the noise in the protein synthesis, demonstrating the crucial importance of elongation in the noise of the system.

These predictions could be tested, for example, in an *in vitro* experiment: a fluorescent protein could be attached to specific genes representing diverse polysome sizes, and the distribution of the fluorescent protein synthesized in cells would be followed. Then, by changing the composition of the medium (e.g., changing the abundances of ribosomes, initiation factors, elongation factors, or tRNAs), one could observe whether the variation in the distribution of the fluorescence reflects the results obtained here using our model.

### Initiation noise versus termination noise

The synthesis of proteins is intrinsically a stochastic process leading to noise and high variability of protein concentrations between cells.^{20–22} Such noise in protein concentrations has multiple sources, such as the mRNA abundance variability, the translation process, and protein degradation. To extract the amount of noise that is due to the translation steps and queuing of ribosomes, we investigate differences between the noise in the translation initiation and termination rates to identify possible noise amplification or buffering due to the translation elongation process.

We compute the distribution of the instantaneous rates of initiation or termination, which are the inverse of the time between two consecutive translation initiation or translation termination events, respectively. From the comparison of the interquartile ranges of the two instantaneous rates, we can identify differences at higher ribosomal densities (Fig. 3). Upon closer investigation of the distributions of these two rates (Fig. 8), we observe that at low ribosomal densities, the elongation process does not have a significant impact, and the synthesis (termination) rates correspond directly to the initiation rates. By contrast, at high ribosomal densities, the distribution differs between initiation and termination, and the results suggest that the rate of initiation does not correspond to the ultimate rate of protein synthesis. However, the average values of the initiation and termination rates are equal to each other in all the conditions. Interestingly, the distributions of both the initiation and termination rates are narrower at a ribosomal density near the opto-critical value (*ρ* = 0.75, i.e., near the maximal achievable synthesis rate for the gene), suggesting again that the maximum specific synthesis rates correspond to optimal noise reduction. Additionally, under very high crowding, we observe a bimodal distribution of the initiation rate (Fig. 8(b) with *ρ* = 0.98): the slower rate corresponds to the peak synthesis rate, whereas the most frequent mode corresponds to the inverse of the median time for the ribosomes to empty the initiation site.

### Time delays

Time delays arise in protein synthesis due to the elongation process, which introduces a time lag between the initiation and termination steps.^{28} This aspect is important because time delays can have a substantial impact on the output of the system.^{26,28–30} For example, Bratsun *et al.* demonstrated that time delays can cause a system to oscillate or that a delay can reduce the ability of a negative feedback loop to reduce the noise in the system.^{26} However, such delay models consider a constant delay value (i.e., two given reactions are always spaced in time by the same delay), but the results of these studies might be amplified or decreased if the true distribution of delays is broad. Therefore, knowledge of the value and the distribution of the delay emerging from translation is necessary, and this knowledge can be obtained through our stochastic simulations. Indeed, the delays that we observe in our analysis are directly derived from the translation process itself, and no *a priori* distribution of a delay is needed. In addition to the delay for complete translation, we can also observe the initiation delay, i.e., how long the translation initiation site is blocked by a ribosome.

We observe an increase in the time delays with increasing ribosomal densities (Fig. 9). This increase could be expected because increased ribosomal hindrance would slow the general motion of the ribosomes. Under low ribosomal crowding, when the ribosomes can move quite freely, we find that the initiation site is free for new ribosome binding approximately 1 s after the previous initiation event. This time interval can be much longer under high crowding, for which the distribution shows a mean value of approximately 6 s. In addition, at the highest ribosomal density, the distribution of initiation delays displays a long tail, so that the initiation site can be blocked much longer before a new ribosome can bind the mRNA copy (Fig. 9(a)).

We observe a very narrow distribution of translation delays up to a density of $ \rho \u0304 \u22480.75$ (Fig. 9(b)). Note that at the opto-critical density, despite having the highest specific protein synthesis rate, the translation delay is longer than at lower densities. Thus, the ribosomes take longer to move on the mRNA, which could help enhance the protein folding efficiency. Indeed, excessively high elongation rates can lead to increased amounts of misfolded proteins.^{47} Finally, under very high ribosomal densities, the ribosomes need approximately 5 times as long to translate a protein, and the distribution of time delays is broad.

### Initiation step

In our above simulations, we assume the ribosome cannot unbind from the initiation site (i.e., translation initiation is always followed by elongation). However, in reality, the first step can be reversible (see the Methods section). To investigate the effect of reversible ribosome binding under comparable conditions, we perform simulations in which we vary the translation initiation and reverse initiation rate constants while maintaining the same final protein synthesis rates (Fig. S12 in the supplementary material).^{52} Despite observing differences in the noise in the ribosome binding events as a function of the ribosome binding rate constant (even at a fixed ribosomal density), we note that the final noise in the protein synthesis rate remains the same when the parameters result in the same ribosomal density and synthesis rate (Figs. S12(a) and S12(c) in the supplementary material).^{52} Considering only the ribosome binding events that further lead to protein synthesis (i.e., not counting the binding events followed by reverse unbinding), we observe that the noise on these events is independent of the ribosome binding rate constant when the final ribosomal density and synthesis rate remain the same.

### Evaluation of mathematical models

The model we use here, while it captures the mechanistic details of the real system, encompasses many reactions. This broadness can make it difficult to use in large systems because it can be quite computationally intensive and, most importantly, many parameters of these reactions have not yet been well characterized. Consequently, such model has not been used extensively in computational systems’ biology studies. Therefore, we examine whether it is possible to reduce the full model and obtain similar results; thus, we compare the full model with the reduced models most commonly used in the literature. The common feature of these models is the description of the synthesis rate using a single rate expression, which is usually a function of the mRNA concentration and free ribosomes. Therefore, these models mechanistically describe only the initiation process and ignore the steps of elongation and termination. In essence, these models implicitly assume that protein synthesis is initiation limited, and thus they are based on an *ad hoc* model reduction, which could be invalid even when translation is predominantly initiation limited.^{10,28,48} Although these models represent a simplification of the system, they are based on some phenomenological assumptions and statistics from experimental measurements of protein variability in bacteria, and they are useful for the purposes of many analyses.

Here, we test three such models against the mechanistic model to observe the differences in the predicted noise of protein synthesis from the system. Descriptions of the models and the computational settings are provided in Appendix D. We compare the models under different regimes (corresponding to different ribosomal densities of the full model), and the parameters of the various models are uniquely matched to output the same average protein synthesis rates as in the full model. Thus, what can differ between the models is the noise in synthesis. We cannot compare other data because these alternative models, for example, do not include ribosomal density or cannot estimate time delays. For parameters corresponding to a low ribosomal density or to a very high one, all models (the mechanistic model and the three reduced models) give very similar distributions in protein synthesis (Fig. 10 and Fig. S13 in the supplementary material).^{52} However, at middle to high densities, the distributions are very different between the reduced models and the more complete model: all three alternative models tested overestimate the amount of noise in the system. Recalling that CV is a measure of noise in the system, we note that the difference in noise between the models can be considerable with, for example, an approximately 8-fold higher CV for the lumped models than the full model at middle-high ribosomal densities (Fig. 10). Additionally, we observe in the full model that the noise decreases significantly with increasing ribosomal density up to the density of the maximal synthesis rate (Figs. 3 and 10), whereas the amount of noise predicted in the lumped models remains approximately constant (Fig. 10).

## CONCLUDING REMARKS

Our study suggests that translation elongation reduces the noise present in the cell. The lumped models do not reproduce the qualitative and quantitative behaviors of the full model, and consequently, a full model of translation must be considered to incorporate translation into a larger system of protein synthesis and interactions and to fully characterize the noise of the system. This present analysis focused on the noise emerging from the translation process only, and transcription and degradation would add additional noise. However, the conclusion that the alternative, simpler, models compared here overestimate the noise from translation still holds. Interestingly, Bratsun *et al.*, in their study on the impacts of delays,^{26} used a simplified model for protein synthesis (similar to the lumped *model* 2 discussed above) and observed that adding a delay to protein synthesis could induce an amplification of the noise in a negative feedback loop. In contrast, our present study might instead indicate that considering the full elongation process could reduce the noise. However, we did not implement negative feedback in our study; thus, this hypothesis would need to be confirmed through further analysis.

As discussed here, and also in view of the fact that elongation has a definite influence on translation efficiency, as observed from the sensitivity analysis performed in this study, it is important to consider that lumped models only reveal a biased view of the noise from the system and that full mechanistic models are needed to obtain a complete view. However, even the full mechanistic models could be used in simplified terms: our study showed that the variability in free ribosomes could be neglected when the total number of ribosomes is not subject to perturbations; thus, in such a case, one could simply consider the few mRNAs of interest in isolation.

## Acknowledgments

The authors would like to thank Olivier Burri and Jan Overney for early ideas supporting this research. J.R. and V.H. were supported by the SNSF (Swiss National Science Foundation), the SystemsX.ch (The Swiss Initiative in Systems Biology), and the Ecole Polytechnique Fédérale de Lausanne (EPFL).

### APPENDIX A: SIMULATING THE PROTEIN ABUNDANCES

As discussed in the main text, experiments measuring noise in the protein synthesis system have concentrated on measuring the noise in protein abundances. Therefore, to compare our results with experimental values, we considered a simple extension of our translation model that includes protein degradation. The model is represented by the same reactions described in Eq. (1), with the addition of protein degradation modeled as a first-order reaction,

where $ k deg j $ is the degradation rate constant for protein *j* and is related to the half-life ($ t HL j $) of this protein as

As in the rest of the translation model, we consider the case here of constant mRNA abundance, with the mRNA corresponding to the gene of interest present as a single copy. This approximation is important, and some additional noise could be added when considering the evolution of the mRNA abundance in a more complete model that includes transcription and mRNA degradation.

To reflect the conditions of an *in vitro*/*in vivo* experiment, the protein abundance from the gene of interest was not continuously followed but instead *measured* in the simulation every 3000 s. (This is a longer time scale than the protein half-lives to avoid possible auto-correlation effects on consecutive measurements; shorter or longer time intervals between the measures gave the same results.) The mean and coefficient of variation of this protein abundance were then obtained for these measures.

### APPENDIX B: SIMULATIONS WITH A BACKGROUND POOL OF GENES

In the simulations in which we studied a mRNA copy competing for free ribosomes with a background pool of genes, we followed the evolution of the system under two different conditions for the background:

*Mixed pool:* The background consisted of a virtual pool of 10 different gene species of various lengths and mean polysome sizes, each species being present in multiple copies: the species were selected to provide good representation of the distribution of the data observed by Arava *et al.*^{2} The background had a mean ribosomal density of *ρ* = 0.21, and the total number of mRNA copies in the background was ∼12 900. Finally, the total number of ribosomes was taken as 88 350. See Table S2 in the supplementary material for parameter values.^{52}

*Uniform pool:* This background No. 2 comprised a single gene species, with a mean polysome size giving a high ribosomal density of *ρ* = 0.8, corresponding to the values observed, for example, for fast growing cells of *E. coli*.^{36} A total of 2450 copies of this gene species composed this background. See Table S3 in the supplementary material for parameter values.^{52}

### APPENDIX C: SENSITIVITY OF THE MOMENTS

The main outputs of the stochastic simulations are the protein synthesis rate and ribosomal density as a function of time. It is then possible to compute, for these and other outputs, the probability density functions (*pdf*), as obtained in Figs. 8 and 9, for example. However, if we want to determine how a distribution changes with respect to a perturbation of a parameter (Figs. S7–S9),^{52} we need to compute the various moments of the corresponding *pdf* and then compute the sensitivities on these moments. We, therefore, define the following quantities:

- 1st moment: the average of the distribution. The associated log-sensitivity corresponds to the
*control coefficients*,where$ S 1 , p v = C p v = d \u2009 v \u0304 / v \u0304 d p / p , $*v*is the output (e.g., synthesis rate or ribosomal density) for which we want to compute the sensitivity coming from parameter*p*(e.g., initiation rate constant); $ v \u0304 $ is the weighted average on*v*(weight coming from how long the system was in each state). - 2nd central moment: the variance of the distribution. This variance can be defined aswhere the averages are again weighted by the time spent in each state. Then, the sensitivity of this 2nd moment is given by$ m 2 v = v \u2212 v \u0304 2 \xaf , $$ S 2 , p v = d m 2 v / m 2 v d p / p . $
- 3rd standardized central moment: skewness of the distribution. It is defined by$ a 3 v = v \u2212 v \u0304 3 \xaf m 2 v 3 / 2 . $
- 4th standardized central moment: kurtosis of the distribution. It is defined by$ a 4 v = v \u2212 v \u0304 4 \xaf m 2 v 2 . $

The sensitivities of the 3rd and 4th standardized central moments are defined in a similar manner to the sensitivities of the 1st and 2nd moments, i.e.,

Note that in the definition of the various sensitivities of the moments (Eqs. (C1), (C3), and (C6)), a derivative is used; however, in the application to the simulations, the derivative is approximated by a small perturbation of the various parameters. These sensitivities are shown in Fig. 7 and Figs. S10 and S11 in the supplementary material.^{52}

### APPENDIX D: DESCRIPTION OF THE SIMPLER MODELS

To describe protein synthesis, the translation process is often modeled in the literature as a one- or two-step process, assuming implicitly an initiation-limited process and therefore neglecting all elongation steps. To compare the results of the presented full model with such simpler ones, we define different models that can be found in the literature. Here is a brief description of these models and the corresponding equations:

Model 1: our full mechanistic model described in the paper.

Model 2: in this model, protein translation is considered as a simple second-order reaction, where the ribosomes,

*R*, directly translate the mRNA in one step with a rate of synthesis_{f}*k*, as schematically presented in Fig. 11(a)._{s}Model 3: the ribosomes first bind the mRNA with rate

*k*to form a complex written as_{I}*RmRNA*, and this complex then synthesizes a protein with rate*k*. Here, no limitation exists on the number of_{T}*RmRNA*complexes per mRNA copy, i.e., the ribosomes are assumed to instantaneously leave the initiation region, directly allowing other ribosomes to bind this mRNA with the same rate (Fig. 11(b)).Model 4: similar to model 3, but here, it is only possible to have one ribosome in the

*RmRNA*state per mRNA copy, i.e., the bound ribosome blocks the initiation site (Fig. 11(c)).

Fixing different rate constants of initiation and termination for model 1, we obtain the resultant specific protein synthesis rate in various cases. We can then fix the parameter values of the other models (*k _{s}*,

*k*,

_{I}*k*) so that the specific protein synthesis rate is the same in all 4 cases; what will then change is the amount of noise between the models, which we can compare.

_{T}