Templated copolymerization, in which information stored in the sequence of a heteropolymer template is copied into another polymer product, is the mechanism behind all known methods of genetic information transfer. A key aspect of templated copolymerization is the eventual detachment of the product from the template. A second key feature of natural biochemical systems is that the template-binding free energies of both correctly matched and incorrect monomers are heterogeneous. Previous work has considered the thermodynamic consequences of detachment and the consequences of heterogeneity for polymerization speed and accuracy, but the interplay of both separation and heterogeneity remains unexplored. In this work, we investigate a minimal model of templated copying that simultaneously incorporates both detachment from behind the leading edge of the growing copy and heterogeneous interactions. We first extend existing coarse-graining methods for models of polymerization to allow for heterogeneous interactions. We then show that heterogeneous copying systems with explicit detachment do not exhibit the subdiffusive behavior observed in the absence of detachment when near equilibrium. Next, we show that heterogeneity in correct monomer interactions tends to result in slower, less accurate copying, while heterogeneity in incorrect monomer interactions tends to result in faster, more accurate copying, due to an increased roughness in the free energy landscape of either correct or incorrect monomer pairs. Finally, we show that heterogeneity can improve on known thermodynamic efficiencies of homogeneous copying, but these increased thermodynamic efficiencies do not always translate to increased efficiencies of information transfer.
I. INTRODUCTION
Genetic information transfer, through the processes of translation, transcription, and replication, underpins all life.1 Underlying these processes is the same fundamental motif, wherein a long information-carrying heteropolymer, known as a template, has the information stored in its sequence of monomers copied into another polymer, known as the copy. The copy itself may or may not be composed of the same types of monomers. As demonstrated by the sheer breadth of biological form and function, this templated polymer copying motif is capable of specifically assembling a massive space of structures from a small handful of subunits. To give a brief idea of the scale of the challenge, consider the converse approach of self-assembly without an underlying template. In this counterfactual biology, a cell would need to obtain all its proteins (tens of thousands in humans), in their correct proportions, by mixing together the 20 natural amino acids. This task would be impossible as there is not enough information stored in the interactions between the 20 amino acids to create anything but a mess of partially formed or malformed assemblies.2
Given the central role of templated polymer copying in biology, understanding its fundamental principles will likely play a crucial role in attempts at building artificial life3 or understanding life’s origins.4 A key test of our understanding of these principles is our ability to build synthetic copying systems without relying on highly evolved enzymes such as polymerases. Our progress here has been far slower than in template-free self-assembly,5–10 highlighting the difficulties that come with engineering templated copying systems. The earliest attempts at building synthetic copiers resulted in copies that remained stuck to the template, rendering the template unable to catalyze the production of more copies.11 Later designs tended to utilize time-varying external conditions such as temperature and pressure to induce separation.12–15
The need to separate copies from templates is, in fact, the crux of the difficulty in engineering synthetic polymer copying systems. Accurately producing a state where the correct copy monomers are bound to their correct positions along the template is relatively straightforward, simply requiring a large free-energy advantage for correct monomer pairings.5–10 However, the copy and template will then tend to remain bound due to cooperative interactions, preventing the production of further copies. Cells are able to supply the free energy needed to separate chemically, without requiring changes in conditions external to the cell, through the use of ATP-consuming enzymes. This separation occurs after the completion of copying in the case of replication16 or during copying in the case of transcription17 and translation.18 Mimicking this autonomy without enzymes is challenging, but approaches based on DNA nanotechnology19–21 and organic chemistry22,23 that use the free energy of backbone formation to drive the product off the template have shown some promise.
Limited progress in the development of artificial copying systems highlights a lack of understanding of the basic theory of polymer copying. To remedy this knowledge gap, high-level models have been devised that emphasize the thermodynamic consequences of separation from the template.24–27 The fundamental challenge in templated copying is the production of a polymer or ensemble of polymers with low sequence entropy. Although specific bonds form during the production of a copy, they are transient and, thus, cannot supply the free energy to compensate for this drop in entropy relative to random sequences. Hence, extra chemical work is required to produce low entropy polymer ensembles, and an efficiency can be defined by comparing this chemical work to the free energy stored in the low entropy ensemble.24,25 Furthermore, maintaining such an ensemble in a non-equilibrium, low entropy steady state imposes thermodynamic constraints on the network that maintains the steady state.26,27
More detailed models have also been made that focus on the polymerization mechanism, typically discrete state Markov models. The simplest ones involve one step per added monomer;28–34 more complex ones have multiple steps35 and sometimes added cycles.36–43 Some models explicitly consider microscopic reversibility,28–34,43,44 while others incorporate irreversible steps.36–38,40–42 Another distinction is whether they assume a symmetry between all correct pairings and, separately, all incorrect pairings (homogeneous)28,29,31,33,35–44 or consider explicit differences between them (heterogeneous).32,34 Finally, some approaches are not preoccupied with copy–template separation.28,30,31,34,36–40,42,44 On the other hand, other works consider models in which the copy explicitly separates from the template,41,45 including some that attempt to understand the thermodynamic consequences of separation.29,33,35,43
Although there are multiple theoretical frameworks to treat heterogeneity in the context of separating copolymerization systems32,45–47 (in particular, Ref. 45 explicitly applies its framework to RNA polymerase), existing work has not considered both separation and heterogeneity in models with thermodynamic self-consistency. Thermodynamically self-consistent models have been constructed for homogeneous systems,29,33,35,43 but it remains unclear if and how the effects of heterogeneity interplay with the thermodynamic consequences of separation. We fill this gap by incorporating heterogeneity into a known thermodynamically consistent model of polymer copying that incorporates separation from behind the growing tip of the product, analogous to transcription and translation.33 The addition of each monomer in our model consists of multiple steps, following Ref. 35. We first introduce a method of coarse-graining the model to a single-step description, allowing existing methods of analysis (Refs. 32 and 43) to be used. Using these methods, we demonstrate qualitative differences between heterogeneous copying with and without separation. We then identify regimes where heterogeneity facilitates or hinders fast, accurate, and efficient copying.
We organize our work as follows: In Secs. II A and II B, we introduce our coarse-grained and fine-grained Markov copying models with separation. We briefly describe Gaspard’s32 and Qureshi et al.’s43 analysis methods in Sec. II C and our extended iteration for state visits in Sec. II D. In Sec. II E, we describe the parameter sets we initially consider. We begin our results in Sec. III A by comparing velocities of heterogeneous copying with and without separation. In Sec. III B, we investigate the effect of heterogeneity on error and completion time for our limiting regimes. Finally, in Sec. III C, we show that in the low driving regime, the thermodynamic efficiency of copying can be improved by heterogeneity. We conclude our paper in Sec. IV.
II. METHODS
A. General coarse-grained model of templated polymer copying with heterogeneity and product separation
We consider the coarse-grained model of polymer copying in Fig. 1(a), well-known in the literature.30,32,33,44 In this section, we will define the model in a general sense, without enforcing separation or thermodynamic consistency (we shall define a parameterization of this model that fulfills these conditions in Sec. II E). The state of the model (x, y) is defined by the two sequences x = n1n2 … nL of the template polymer and y = m1m2 … ml of the copy polymer (note that L and l are the indices of the last monomer in x and y, respectively). The sequences draw their elements from finite template and copy alphabets {1, …, N} and {1, 2, …, M}, respectively. We list a few notational conventions before we proceed. First, for both x and y, we use ni:i′ and mi:i′ to refer to the subsequences of x and y between indices i and i′, inclusive. Second, we will often use & as a shorthand for m1:l−2 for convenience, writing y = &ml−1ml.
Polymer copying with continuous separation of the product from the template. (a) Illustration of the coarse-grained model for a two-monomer system. Each monomer type is assigned a unique number (1 or 2 in this case) and color (light blue for 1 and red for 2). Square monomers correspond to the template polymer, while round monomers correspond to the copy polymer. A coarse-grained state is represented by the full sequence of the copy and the template polymers. Purple dotted arrows represent the coarse-grained state transitions corresponding to either monomer addition (forward arrow) or monomer removal (backward arrow). Each coarse-grained transition may be further subdivided into fine-grained steps. (b) Illustration of the fine-grained steps that occur between coarse-grained states. In particular, the fine-grained steps corresponding to the leftmost purple dot in subfigure (a) are depicted (only monomers in the final 3 positions of the copy and template are depicted). First, a new monomer binds to the template with both a generic (dashed line) and sequence-specific (solid line) bond. Then, the newly bound monomer is covalently incorporated into the growing polymer, breaking the generic bond in the process. Finally, the previously added monomer unbinds from the template. (c) Labeling of fine-grained states for a given coarse-grained state, with indices f denoted by bold bracketed numbers. Arrows here correspond to the fine-grained state transitions. Transitions to other states of index f = 0 (denoted here by the dotted arrows) change the coarse-grained state (x, y). Note that these fine-grained state indices f are not valid when returning from a different coarse-grained state (via a dashed arrow) as f is dependent on the last completed state visited. For simplicity, general and sequence-specific interactions are not depicted separately in subfigures (a) and (c).
From each state (x, &ml−1ml), two types of transitions are allowed. Monomer addition at a propensity of Φ+(ml+1, x, &ml−1ml) results in the state (x, &ml−1mlml+1), while monomer removal at a propensity of Φ−(x, &ml−1ml) results in the state (x, &ml−1) (we use the term propensity at the coarse-grained level to distinguish Φ± from the rates of the fine-grained process introduced in Sec. II B). Hence, changes to the template are forbidden in the model, as are monomer additions and removals away from the tip of the copy. Beyond this generic framework, the only additional constraint imposed is that the propensities of monomer addition and removal depend only on copy monomers within a small locality of the copy tip, such that Φ+(ml+1, x, &ml−1ml) = Φ+(ml+1, x, ml−1ml, l) and Φ−(x, &ml−1ml) = Φ−(x, ml−1ml, l).30,32,33,44 These forms are physically justified by our mechanism in Sec. II E, but for now it is sufficient to note that they are an appropriate first-order approximation for processes such as transcription and translation, which are composed of reactions that occur in a small neighborhood of the tip of the growing polymer.30,32 Hence, local transitions and propensities from (x, y) are fully determined by (x, ml−1ml, l), known as the tip state.
For finite L, once the copy reaches the length of the template l = L, the copy has a certain propensity of detaching from the template. Detachment terminates copying, and hence a detached copy is an absorbing state of the model. Formally, we need to distinguish between a complete copy still attached to the template and one completely detached; purely as a trick of notation, we append the integer 0 to the end of a detached copy sequence (resulting in l = L + 1).
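The coarse-grained dynamics described above can be illustrated with a small stochastic simulation. The following is a minimal sketch, not the analysis method used in this work: the function name `simulate_copy`, its argument layout, and the single `detach_rate` for a full-length copy are hypothetical choices made for illustration, and the propensity functions are assumed to depend only on the template and the tip state.

```python
import random


def simulate_copy(template, phi_plus, phi_minus, detach_rate, alphabet, rng):
    """Gillespie-style simulation of the coarse-grained copying model.

    phi_plus(m, template, tip, l): addition propensity of monomer m (assumed form).
    phi_minus(template, tip, l): removal propensity of the tip monomer (assumed form).
    tip is a tuple holding the last (up to) two copy monomers; l is the copy length.
    Returns the detached copy and the elapsed time.
    """
    copy, t = [], 0.0
    L = len(template)
    while True:
        l = len(copy)
        tip = tuple(copy[-2:])
        events, rates = [], []
        if l < L:
            # Any monomer of the copy alphabet may be added at the tip.
            for m in alphabet:
                events.append(("add", m))
                rates.append(phi_plus(m, template, tip, l))
        else:
            # A full-length copy may detach, which terminates copying.
            events.append(("detach", None))
            rates.append(detach_rate)
        if l > 0:
            events.append(("remove", None))
            rates.append(phi_minus(template, tip, l))
        total = sum(rates)
        if total <= 0.0:
            raise ValueError("no transition is possible from this state")
        t += rng.expovariate(total)  # exponential waiting time
        kind, m = rng.choices(events, weights=rates)[0]
        if kind == "add":
            copy.append(m)
        elif kind == "remove":
            copy.pop()
        else:
            return copy, t
```

For example, with an addition propensity that is nonzero only for the matching monomer and a vanishing removal propensity, the simulation returns an exact copy of the template; heterogeneous or error-prone copying corresponds to other choices of the two propensity functions.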
B. Fine-grained reaction steps
The monomer addition and removal steps in Sec. II A are not elementary chemical reactions, and in a general model will be described through a series of smaller, fine-grained steps. Properties of the fine-grained system can be preserved in the coarse-grained system if an appropriate coarse-graining procedure is followed (we cover this procedure for our case in Secs. II C and II D; refer to Qureshi et al.43 for more detail). Fine-grained steps for our system follow from earlier work35 on a copying system that separates as it is being copied; in brief, monomer binding, polymerization, and unbinding of the copy from the template are each treated as separate steps. Note that although we use the term fine-grained, these states should still be understood as biochemical macrostates.48
To define the state space, we assume that no more than two monomers are bound to the template at any given time, and fully incorporating a new monomer onto the copy requires breaking the existing template-copy bond. The exact process of monomer addition, summarized in Fig. 1(b) and Table I, consists of three steps. First, a monomer binds to the next empty site on the template. Then, a covalent bond is formed between the new monomer and the tip of the copy. Finally, the previous tip detaches from the template. The reverse processes occur for monomer removal. Copy–template bonds have a generic component and a sequence-specific component. The generic bond is broken during polymerization; this mechanism was found to reduce product inhibition in Ref. 35, and so we incorporate it here.
A summary of the fine-grained events and their associated rates.
Event | Forward rate | Backward rate |
---|---|---|
Binding a monomer to the template | | |
Polymerization of the tip monomer to the newly bound monomer | | |
Detachment of the previous tip monomer (tail unbinding) | | |
We use the notation (x, y, f) to specify the fine-grained states, consisting of the coarse-grained state as well as an index f ∈ {0, …, 6} that specifies the fine-grained state given the coarse-grained state [Fig. 1(c)]. We treat the fine-grained process as a series of sub-processes in which the system transitions between “completed states” with f = 0 and “transitory states” with f ≠ 0, as in Ref. 43. The coarse-grained label (x, y) is given by the completed state that was visited most recently, with f then defined relative to that state. Hence, biochemical macrostates with f ≠ 0 have two different fine-grained state descriptors, y and f, depending on whether they were approached “from the back” or “from the front.” For example, starting from (x, &ml, 0), binding and then polymerization of a type 1 monomer would yield (x, &ml, 3). On the other hand, the same state would be labeled (x, &ml1, 1) if (x, &ml1, 0) was the last completed state visited.
C. Solving for distributions of complete polymers
Here and in Sec. II D, we cover the analysis methods used in this work. Readers interested only in our physical results may skip to Sec. II E. To be able to discuss our methods and results precisely, we must first distinguish between the various stochastic processes that arise from our models and clarify the relationships between them. Let t refer to (continuous) time, and k refer to the discrete number of steps taken (number of state transitions). For a fixed template X = x, the stochastic processes Y(t) and F(t), giving the coarse-grained and fine-grained index of the copy, respectively, together define a Markov process. From this process, we can extract an embedded discrete-time Markov chain representing the sequence of states visited. If we remove any information about the fine-grained index F(t) and only record Y(t), we obtain a continuous, non-Markovian stochastic process and its embedded discrete sequence of states. If we are only interested in the distribution of final copies, it is sufficient to consider this embedded discrete sequence, since it records any change in the composition of the copy.
There are many alternative ways in which the fine-grained process can be coarse-grained with a view to avoiding the need to handle non-Markovian processes. Qureshi et al.’s coarse-graining procedure43 can be applied if the full state space of the system can be partitioned into transitory and completed states such that every transitory state is only encountered during a transition between two unique completed states. In such systems, between any two completed states is a “petal” of transitory states. Each petal between completed states can then be analyzed independently.
Having argued that quantities of the fine-grained model can be obtained from solutions of a Markovian coarse-grained model, we now proceed to discuss the method we use to actually solve for quantities of the coarse-grained model. Gaspard and Andrieux showed that the distribution of consecutive copy monomers, conditioned on the entire template, is Markovian if transition rates are only dependent on tip states.30,32 Moreover, the conditional probability distributions Px,l(ml|ml−1) for complete polymers given a fixed template x and position indices l can be obtained through the solution of an iterated function system. We give here a brief intuition for Gaspard’s iterated function system32 in terms of absorbing probabilities of Markov chains and perform a full derivation for our system in Appendix A (refer to Ref. 32 for a full treatment).
Markov chain absorption probability problem for the calculation of complete polymer distributions with a length l product polymer as an initial state. Single arrows (without corresponding reverse counterparts) represent the transitions to absorbing states (monomer removal to a copy of length l − 1 and copy completion to some complete polymer for some given monomer at position l + 1), and the probability of absorption to each absorbing state is considered. The length of the copy for each state is given at the top of the figure (note the jump from l + 1 to L, indicating absorption into a complete and fully detached polymer). The px,l(ml+1|ml−1ml) variables are local transition probabilities that can be obtained by considering the coarse-grained propensities Φ+ or Φ−. The Q variables are local probabilities of absorption (into some arbitrary complete polymer) occurring before monomer removal. The absorption probabilities Rx,l(ml+1|ml−1ml) follow from values of Q for the next position index, Qx,l+1(mlml+1). The Q values (since these are the probabilities of absorption before removal) for the current iteration step can then be calculated by summing over all the non-backward absorption probabilities; in this case, Qx,l(21) = Rx,l(1|21) + Rx,l(2|21). With this current Q value, the next step of the backward iteration may proceed.
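The backward iteration for Q and R can be sketched with a first-step analysis of the embedded Markov chain. This is a simplified illustration rather than the full derivation of Appendix A: the function name `backward_iteration` and the signatures of `phi_plus` and `phi_minus` are hypothetical, the template is assumed to be captured inside those functions, and the boundary condition Q = 1 at full length (a complete copy always detaches) is an assumption made to keep the example short.

```python
from itertools import product


def backward_iteration(L, alphabet, phi_plus, phi_minus):
    """Compute Q[l][tip] and R[l][tip][m] for copy lengths l = L-1, ..., 2.

    tip = (m_{l-1}, m_l). Q is the probability that the copy is completed
    before the tip monomer is removed; R splits Q by the next monomer m.
    First-step analysis gives
        R(m | tip) = phi_plus(m) * Q_next(m) / (phi_minus + sum_m' phi_plus(m') * Q_next(m')).
    """
    tips = list(product(alphabet, repeat=2))
    # Assumed boundary condition: a full-length copy is always absorbed.
    Q = {L: {tip: 1.0 for tip in tips}}
    R = {}
    for l in range(L - 1, 1, -1):
        Q[l], R[l] = {}, {}
        for tip in tips:
            # Weight of adding m and then completing before returning to this state.
            weights = {
                m: phi_plus(m, tip, l) * Q[l + 1][(tip[1], m)] for m in alphabet
            }
            denom = phi_minus(tip, l) + sum(weights.values())
            R[l][tip] = {m: w / denom for m, w in weights.items()}
            Q[l][tip] = sum(R[l][tip].values())
    return Q, R
```

In this sketch, the ratio R[l][tip][m] / Q[l][tip] plays the role of the conditional probability of finding monomer m at position l + 1 in a complete polymer. For a single monomer type with constant propensities, the result reduces to the classical gambler's-ruin probability, which provides a convenient sanity check.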
D. Recovering further properties of the fine-grained system from the coarse-grained system
Markov chain from which expected visitation counts per visit of the previous tip state can be calculated. Solid arrows represent the local transition probabilities that can be calculated from the coarse-grained propensities Φ+ or Φ−, while dotted arrows represent the Q variables obtained via Gaspard’s iteration32 (Fig. 2). To avoid over-counting, the tip states whose visitation counts are of interest are not permitted to move back into the initial state; rather, they point to a virtual absorbing state at a rate equal to the rate of monomer removal. Complete polymers are again treated as absorbing states. Starting from the initial state, we can now count, through standard methods, the average number of times either tip state of interest is visited before either the initial tip state is revisited or polymerization terminates.
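One standard way to count expected visits in an absorbing Markov chain is to sum the powers of the transition matrix restricted to the transient states, which is equivalent to using the fundamental matrix (I − T)^−1. The sketch below assumes a hypothetical helper `expected_visits` and a matrix `T` whose rows may sum to less than one, the missing probability being the flow into absorbing states (including the virtual absorbing state described above).

```python
def expected_visits(T, start, tol=1e-12, max_iter=100000):
    """Expected number of visits to each transient state before absorption.

    T[i][j] is the probability of stepping from transient state i to
    transient state j. The initial visit to `start` is included in the count.
    """
    n = len(T)
    occupancy = [0.0] * n
    occupancy[start] = 1.0
    visits = [0.0] * n
    for _ in range(max_iter):
        # Accumulate the probability of being in each state at this step.
        visits = [v + o for v, o in zip(visits, occupancy)]
        # Propagate one step of the embedded chain.
        occupancy = [
            sum(occupancy[i] * T[i][j] for i in range(n)) for j in range(n)
        ]
        if sum(occupancy) < tol:
            break
    return visits
```

Multiplying each visit count by the mean waiting time of the corresponding state then gives the expected time spent there.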
E. Thermodynamically consistent parameterization for heterogeneous separating copiers
Refer back to Fig. 1(b) and Table I for the rates of our fine-grained system. In this section, we will assign parameters to these fine-grained rates such that thermodynamic consistency is maintained. We first assume that monomer concentrations [M] are equal and chemostatted (unchanging over the copy process). We assume that binding of a monomer to a template obeys mass-action kinetics, so the binding rate is kbind[M]. The rate constants for monomer binding are set to be invariant between the different tip states, as they are chemically difficult to tune, so a single constant kbind applies to every tip state. Tail binding is similarly difficult to tune and obeys a pseudo-mass-action rule where rates are proportional to an effective local concentration of the tail monomer around the tip of the growing polymer, so the tail-binding rate is ktail[M]eff. Monomer polymerization rates, on the other hand, can conceivably be engineered, and here we envision a tip-state-dependent parameterization kpol(nl−1nl, ml−1ml).
As in Ref. 35, we assume the binding free energy between monomers nl and ml can be divided into a specific monomer-dependent free energy ΔGTT(nl, ml) and a generic monomer-independent free energy ΔGgen [Fig. 1(b)]. Coupling the breaking of this generic bond to backbone formation was found to reduce product inhibition and allow detachment.35 Monomer binding incurs a free-energy change of −ln[M] − ΔGTT(nl, ml) − ΔGgen, polymerization incurs a free-energy change of −ΔGBB + ΔGgen, and tail unbinding results in a net free-energy change of ΔGTT(nl−1, ml−1) + ln[M]eff.
Due to the need to separate, copy–template interactions must be transient. Observe that ΔGgen cancels during the incorporation of a single monomer. Similarly, the −ΔGTT(nl, ml) free-energy change from monomer binding is offset by the ΔGTT(nl, ml) change from tail unbinding during the incorporation of the next monomer. The persistent net polymerization free-energy change per incorporated monomer is then −ΔGBB − ln([M]/[M]eff), free of any ΔGTT or ΔGgen terms and, hence, consistent with transient copy–template interactions. The resulting forward and backward rates, with the ratio determined by generalized local detailed balance, are summarized in Table II.
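The free-energy bookkeeping of the three fine-grained steps can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, free energies are in units of kBT, and the rate relation used, k_forward / k_backward = exp(−ΔG), is the usual statement of local detailed balance, assumed here rather than taken from Table II.

```python
import math


def step_free_energies(dG_TT_new, dG_TT_prev, dG_gen, dG_BB, conc, conc_eff):
    """Free-energy changes (in kBT) of binding, polymerization, tail unbinding."""
    binding = -math.log(conc) - dG_TT_new - dG_gen
    polymerization = -dG_BB + dG_gen
    tail_unbinding = dG_TT_prev + math.log(conc_eff)
    return binding, polymerization, tail_unbinding


def backward_rate(forward_rate, dG):
    """Backward rate implied by the assumed local detailed balance relation."""
    return forward_rate * math.exp(dG)


# When the new and previous pairs have the same specific free energy, the
# three steps sum to -dG_BB - ln(conc / conc_eff): the specific and generic
# bond terms drop out of the net change.
steps = step_free_energies(2.0, 2.0, 3.0, 1.0, conc=0.5, conc_eff=2.0)
net_change = sum(steps)
```

With the illustrative values above, `net_change` equals −1 − ln(0.25), independent of the values chosen for the specific and generic bond free energies.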
Parameterized forms of each of the fine-grained reaction steps after local detailed balance is imposed and further simplifications are made.
Forward reaction | Forward rate | Parameterized form | Backward rate | Parameterized form |
---|---|---|---|---|
Binding | kbind[M] | | | |
Polymerization | kpol(nl−1nl, ml−1ml) | | | |
Tail unbinding | ktail[M]eff | | | |
1. Slow binding and unbinding of the free monomers, [M] → 0 and ΔGgen, ΔGBB → ∞. For simplicity, we let and we normalize rates such that kbind[M] = 1. Then, (10) (11)
2. Slow polymerization, kpol ≪ kbind. Here, rates are normalized so that kpol[M] = 1. Then, (12) (13)
Case 1 corresponds to a discrimination on backward propensities (consistent with the “temporary thermodynamic discrimination” described in Ref. 33). Case 2, on the other hand, corresponds to a thermodynamically driven discrimination on forward propensities (consistent with “combined kinetic and thermodynamic discrimination,” as described in Ref. 33).
III. RESULTS
A. Consequences of separation for the velocity and energy landscapes of heterogeneous copying
Fine-grained steps for a model of copying where the copy does not separate from the template. Only monomers in the vicinity of the copy tip are depicted. (a) New monomers bind and polymerize in separate steps, but there is no tail unbinding of the previously added monomer. (b) Labeling of fine-grained states associated with a given coarse-grained state. Indices f are denoted by the bold bracketed numbers. Transitions to other states of index f = 0 (denoted here by the dotted arrows) change the coarse-grained state (x, y).
Parameterized forms of each of the fine-grained reaction steps for a model of copying where the copy does not separate from the template.
Forward reaction | Forward rate | Backward rate |
---|---|---|
Binding | kbind[M] | |
Polymerization | kpol(nl−1nl, ml−1ml) | |
We begin by considering how the velocity profiles for STC change with increasing nonequilibrium drive ΔGpol. As with NTC, we consider backward propensity discrimination (that is, the slow binding limit). The binding free energies of incorrect monomer pairs are kept constant, such that ΔGTT(1, 2) = ΔGTT(2, 1) = 0 (we remind readers that a correct pair is defined by ml = nl). The free energy of a correct monomer 1 pair is held at ΔGTT(1, 1) = 2, while ΔGTT(2, 2) is varied from 2 to 10. Gaspard’s iterated function system32 was applied to obtain the absorption probabilities Q for the copying of a polymer sequence of length L = 10 000. Then, the methods in Sec. II D and Ref. 43 were used to find τx,l, defined as the average amount of total time spent at position l for a template sequence x before a complete polymer forms and detaches. τx,l is calculated by multiplying the number of visits to each state by the waiting time for each visit to said state and then summing over all states of copy length l. The average completion time tc = Σlτx,l can then be calculated, and the average copying velocity was calculated as L/tc. As in Ref. 33, the value of ΔGpol at equilibrium is given by ΔGpol,eq = −ln 2, and we report copy velocities as a function of the difference in ΔGpol from this equilibrium value up to ΔGpol = 2 in Fig. 5(b).
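The arithmetic behind τx,l, tc, and the velocity can be written out in a few lines. This is a minimal sketch with hypothetical names and toy inputs: `visits[l][s]` and `waiting_times[l][s]` are assumed to hold the expected visit count and mean waiting time of state s at copy length l, and the velocity is taken as the number of positions divided by the completion time.

```python
def completion_time_and_velocity(visits, waiting_times):
    """Return (tau, t_c, velocity) from per-state visit counts and waiting times."""
    # tau[l]: total expected time spent at copy length l.
    tau = [
        sum(n * w for n, w in zip(visits_l, waits_l))
        for visits_l, waits_l in zip(visits, waiting_times)
    ]
    t_c = sum(tau)  # average completion time
    velocity = len(tau) / t_c  # positions copied per unit time
    return tau, t_c, velocity


# Toy input: two positions, with two states at the first and one at the second.
tau, t_c, velocity = completion_time_and_velocity(
    [[1.0, 2.0], [3.0]], [[0.5, 0.25], [1.0]]
)
```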
Comparison of velocity profiles and free-energy landscapes for STC and NTC. Graphs of normalized average velocity as a function of ΔGpol − ΔGpol,eq are shown for NTC in (a) and STC in (b). Equilibrium Gx,l − ⟨G⟩ free-energy landscapes (at ΔGpol = ΔGpol,eq) are shown for NTC in (c) and STC in (d). A zoom inset is provided in (d). The long tails of zero velocity for copying without separation, caused by a rougher free-energy profile, do not occur for copying with simultaneous separation, where the velocity profile is linear close to ΔGpol − ΔGpol,eq = 0.
However, the models approach zero velocity in different ways. As noted by Gaspard,32 the heterogeneous model of NTC displays a long tail of near-zero velocity as it approaches equilibrium. Such behavior is not observed in our model of heterogeneous copying [Fig. 5(b)]. This difference can be explained by considering sample free-energy landscapes in the heterogeneous NTC model and the STC model. We plot free-energy deviation Gx,l − ⟨G⟩ for a fixed sample template at various copy lengths l for both heterogeneous NTC [Fig. 5(c)] and heterogeneous STC [Fig. 5(d)].
For heterogeneous STC, the free-energy deviations switch stochastically between two levels, determined by the template monomer at position l, and Gx,l − ⟨G⟩ is, therefore, constrained to be close to zero. By contrast, the free-energy deviations for heterogeneous NTC are large. On scales of length ld we expect to see barriers of height scaling as $\sqrt{l_d}$ (Ref. 50) that trap the system, resulting in slow sub-diffusive motion close to equilibrium driving.32,50 Hence, we observe a region of zero velocity in Fig. 5(a). These tall barriers do not occur for separating copiers, as the length of the product interacting with the template at any given time is bounded. In our system, only two monomers may be bound at a given time, but this intuition generalizes to other parameter sets and even to completely different models that include separation as the copy grows (including realistic models of transcription and translation). If only a finite number of copy monomers bind to the template simultaneously, the roughness of the free-energy landscape associated with sequence heterogeneity is inherently limited.
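The contrast between the two landscapes can be illustrated with a toy calculation. The sketch below is not the model of this paper: it assumes each template monomer contributes a fixed, hypothetical free energy `g[n]`, that in the non-separating case all contributions accumulate along the copy, and that in the separating case only the tip contribution remains.

```python
import random

def landscape_deviations(template, g, separating):
    """Free-energy deviation from the mean trend at each copy length."""
    mean_g = sum(g[n] for n in template) / len(template)
    devs, running = [], 0.0
    for n in template:
        if separating:
            # Only the tip interacts: deviation is one of two fixed levels.
            devs.append(g[n] - mean_g)
        else:
            # All monomers stay bound: deviations accumulate like a random walk.
            running += g[n] - mean_g
            devs.append(running)
    return devs

random.seed(0)
template = [2 if random.random() < 0.5 else 1 for _ in range(10_000)]
g = {1: -2.0, 2: -6.0}  # hypothetical per-monomer free energies
ntc = landscape_deviations(template, g, separating=False)
stc = landscape_deviations(template, g, separating=True)
ntc_range = max(ntc) - min(ntc)
stc_range = max(stc) - min(stc)
```

In this toy picture `stc_range` equals the gap between the two levels regardless of template length, while `ntc_range` grows with the length of the template.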
B. Effects of separation and heterogeneity on error probability and average time spent per site
We now investigate the effect of heterogeneity on two key indicators of a copy system: the error probability, ϵx,l, and the total time spent at a site l before the monomer is permanently incorporated, τx,l (contrast with θx,l, the waiting time for each visit of a tip state). We also consider visitations to a given position Vx,l for reasons that will become apparent. As indicated by the subscripts, these three quantities depend on the positional index l and the underlying template x. We assume these properties are self-averaging if the underlying template distribution is stationary,32 such that in the limit L → ∞, the site averages ⟨ϵ⟩, ⟨V⟩, and ⟨τ⟩ are well-defined. Throughout this section, we will calculate these averages for a single long template x of length L = $10^5$ whose monomers are drawn from a Bernoulli distribution B(L, pt) with pt representing the average proportion of monomers of type 2, similar to Ref. 32.
The quantities ⟨ϵ⟩, ⟨V⟩, and ⟨τ⟩ are then dependent on pt, the proportion of monomers of type 2. Consider now random variables ϵ1 and ϵ2. ϵ1 is the error rate (in the large L limit) for copying a homogeneous template with monomers of type 1, while ϵ2 is the error rate for copying a homogeneous template with monomers of type 2. Let ϵI,x,l be a random variable that takes on a value ϵ1 if the monomer at position l for template x is of type 1, and ϵ2 if the monomer at position l for template x is of type 2. In the large L limit, for a given probability pt of monomer 2, ⟨ϵI⟩ = (1 − pt)ϵ1 + ptϵ2. Similarly, we can apply analogous definitions to τ and V such that ⟨VI⟩ = (1 − pt)V1 + ptV2 and ⟨τI⟩ = (1 − pt)τ1 + ptτ2. If the copying of subsequent monomers were independent of each other, we would expect ⟨ϵ⟩ = ⟨ϵI⟩, ⟨V⟩ = ⟨VI⟩, and ⟨τ⟩ = ⟨τI⟩. However, these equalities generally do not hold due to inter-monomer correlations in the product.30,32,33 To infer whether heterogeneity tends to improve or worsen copying performance for a particular parameter set, we now consider, for specific parameter regimes, the sign of the log-ratio-averages $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle)$, $\ln(\langle V\rangle/\langle V_I\rangle)$, and $\ln(\langle\tau\rangle/\langle\tau_I\rangle)$. As an example, $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle) > 0$ would imply that for a given parameter set, heterogeneity in monomer interactions tends to increase average errors.
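The comparison against the independent-site baseline can be written out as a short sketch. The function names and the numbers below are hypothetical and chosen only for illustration; the per-site values would in practice come from the copying model itself.

```python
import math

def independent_baseline(p_t, q1, q2):
    """<q_I> = (1 - p_t) * q1 + p_t * q2 for homogeneous values q1 and q2."""
    return (1.0 - p_t) * q1 + p_t * q2

def log_ratio_of_means(site_values, template, q1, q2):
    """ln(<q> / <q_I>): positive if heterogeneity raises the site average."""
    p_t = sum(1 for n in template if n == 2) / len(template)
    mean_het = sum(site_values) / len(site_values)
    return math.log(mean_het / independent_baseline(p_t, q1, q2))

# Hypothetical numbers, for illustration only.
template = [1, 2, 1, 2]
eps_sites = [0.03, 0.05, 0.03, 0.05]  # per-site errors, heterogeneous system
deviation = log_ratio_of_means(eps_sites, template, q1=0.01, q2=0.05)
```

The same function applies unchanged to visitation counts or times per site by passing the corresponding per-site values and homogeneous references.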
We will separately consider both limits mentioned in Sec. II E, that is, the slow binding limit with backward propensity discrimination and the slow kpol limit with forward propensity discrimination. Where parameters tend to ∞, a multiplier of $e^{12}$ is used for the numerics. For the backward propensity discrimination case, [M] = $e^{-12}$ and kbind = $e^{12}$. In Sec. III B 1, we consider the case where the interactions between correct monomer pairs (1, 1) and (2, 2) are made heterogeneous, while in Sec. III B 2, we consider the case where the interactions between incorrect pairs (2, 1) and (1, 2) are made heterogeneous. For each section, we plot the log-ratio-averages as a function of pt, and we provide a framing for our findings in Sec. III B 3.
1. Heterogeneity in correct monomer interactions
Consider the coarse-grained dynamics given in Eqs. (10)–(13). To investigate the effects of heterogeneous correct monomer interactions, we fix ΔGTT(2, 2) = 2 and ΔGTT(1, 2) = ΔGTT(2, 1) = 0, while varying ΔGTT(1, 1) from 2 to 10. Keep in mind that with backward propensity discrimination, ΔGTT modifies the rates of monomer unbinding, while in the case of forward propensity discrimination, ΔGTT modifies the rate of monomer binding (Sec. II E). Throughout, ΔGpol is arbitrarily held at 0, which corresponds to relatively weak but non-zero driving.
a. Backward propensity discrimination.
Plots are shown in Figs. 6(a)–6(c). We see that $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle) \geq 0$, implying that in this regime errors tend to be increased by heterogeneity. The trend in $\ln(\langle\tau\rangle/\langle\tau_I\rangle)$ is more complex, as there appears to be a small dip below 0 for high pt. Note that the time spent per site is obtained by modulating the total number of state visits with the average waiting time at each tip state (Sec. II D). If we instead turn our attention to the visitation counts, we once again obtain $\ln(\langle V\rangle/\langle V_I\rangle) \geq 0$, implying that heterogeneity in this regime tends to increase the number of tip states visited before completion.
Deviations in ϵ, τ, and V due to heterogeneity in correct monomer interactions, as a function of the monomer 2 content pt, relative to homogeneous copying. Log-ratio-means $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle)$, $\ln(\langle\tau\rangle/\langle\tau_I\rangle)$, and $\ln(\langle V\rangle/\langle V_I\rangle)$ are plotted for backward propensity discrimination in (a)–(c) and forward propensity discrimination in (d)–(f).
b. Forward propensity discrimination.
Plots are shown in Figs. 6(d)–6(f). Again, $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle) \geq 0$, so errors tend to increase in this regime due to heterogeneity. Here, both the time spent per site τ and the number of state visits tend to increase as a result of heterogeneity.
2. Heterogeneity in incorrect monomer interactions
We now consider the case where the binding strengths of incorrect monomers are heterogeneous, ΔGTT(1, 2) ≠ ΔGTT(2, 1) while ΔGTT(1, 1) = ΔGTT(2, 2) = 6. ΔGpol is again arbitrarily held at 0. ΔGTT(2, 1) = 2 is kept constant and ΔGTT(1, 2) is varied from 0 to 6.
a. Backward propensity discrimination.
Plots are shown in Figs. 7(a)–7(c). We see that $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle) \leq 0$, implying that in this regime errors tend to be decreased by heterogeneity. Unlike in the case where heterogeneity is applied to the binding free energies of correct pairs, both $\ln(\langle\tau\rangle/\langle\tau_I\rangle) \leq 0$ and $\ln(\langle V\rangle/\langle V_I\rangle) \leq 0$. Hence, heterogeneity here tends to decrease visitation counts. The graphs have similar shapes, and hence waiting time modulation does not result in qualitative differences in the total time spent per site.
Deviations in ϵ, τ, and V due to heterogeneity in incorrect monomer interactions, as a function of the monomer 2 content pt, relative to homogeneous copying. Log-ratio-means $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle)$, $\ln(\langle\tau\rangle/\langle\tau_I\rangle)$, and $\ln(\langle V\rangle/\langle V_I\rangle)$ are plotted for backward propensity discrimination in (a)–(c) and forward propensity discrimination in (d)–(f).
b. Forward propensity discrimination.
Plots are shown in Figs. 7(d)–7(f). We see that there is no clear trend in $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle)$. Let us turn to a different measure, $\langle\ln(\epsilon/\epsilon_I)\rangle$. This average-log-ratio of errors is a measure of changes in relative error at each site that occur due to heterogeneity in monomer interactions. For example, $\langle\ln(\epsilon/\epsilon_I)\rangle > 0$ if the average factor of error increase for one monomer is greater than the average factor of error decrease for the other monomer. We will argue the significance of this error measure in Subsection III B 3. For now, observe that $\langle\ln(\epsilon/\epsilon_I)\rangle \leq 0$ in Fig. 8, implying that relative errors on each site are, on average, reduced. We continue to observe $\ln(\langle\tau\rangle/\langle\tau_I\rangle) \leq 0$ and $\ln(\langle V\rangle/\langle V_I\rangle) \leq 0$.
Mean-log-ratios of error for forward propensity discrimination with heterogeneity on incorrect monomers. $\langle\ln(\epsilon/\epsilon_I)\rangle$ is negative for all values of pt and ΔGTT,12 with heterogeneous interactions.
3. Discussion
Heterogeneity on the correct monomers tends to make copying both slower (up to modifications due to waiting time) and more error-prone, while heterogeneity on incorrect monomers tends to make copying faster and more accurate (in some cases, only the average relative error is made better). We now attempt to explain these results. We can divide the overall coarse-grained state-space into two: one in which the correct monomer is bound at the tip and one in which the incorrect monomer is bound at the tip. Due to the detachment of the copy behind the tip, this division is enough to specify the chemical free energy at each step (the same does not apply to models of NTC). Copying can then be thought of as a special example of 1D diffusion with a choice of the free-energy landscape at each step (Fig. 9). The addition of a correct monomer corresponds to moving through a more favorable landscape (the “correct landscape”), while the addition of an incorrect monomer involves moving through a less favorable landscape (the “incorrect landscape”). To understand the effects of heterogeneity, we only need to consider transitions with free-energy changes modified by heterogeneity (purple lines in Fig. 9), as all other transitions would be found in equivalent forms in corresponding homogeneous copying systems. Observe that the introduction of heterogeneity in the interaction of correct monomers results in an increased roughness of the correct landscape, while the introduction of heterogeneity in the interaction of incorrect monomers results in an increased roughness of the incorrect landscape. This relative roughness is particularly evident as the overall variability in ΔGx,l is constrained within finite bounds [Fig. 5(d)].
Conceptual diagram of the “choice of landscape” model. Three cases are considered: (a) Heterogeneous correct monomer interactions $\Delta G_{TT} = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$. (b) Heterogeneous incorrect monomer interactions $\Delta G_{TT} = \begin{pmatrix} 4 & 0 \\ 2 & 4 \end{pmatrix}$. (c) Homogeneous copying $\Delta G_{TT} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. The dashed red line represents the landscape that must be traversed to form incorrect pairings and the dashed blue line represents the landscape that must be traversed to form correct pairings. The dashed green line is a sample path including both correct and incorrect monomers. This sample path is colored purple instead when transitions occur within a landscape with free-energy changes altered as a result of heterogeneity. The dotted lines in graphs (a) and (b) represent the landscapes of homogeneous analogs (i.e., free energy landscapes of the two homogeneous systems that involve a template with only monomer 1 and only monomer 2). A template x = 1112212112 is used throughout, and the sample path taken by the dashed green line corresponds to y = 1212122121.
Upward slopes tend to slow down motion more than downward slopes tend to speed up motion, and this asymmetry drives the well-documented observation that rough landscapes tend to be harder to traverse than smoother ones.50 A “choice of landscape” model, therefore, explains why heterogeneity in the interaction of correct monomers tends to increase errors: traversing the incorrect landscape becomes more favorable relative to the correct one (compared to a copying system with homogeneous interactions). Similarly, incorrect monomer interaction heterogeneity tends to reduce errors, as the incorrect landscape becomes harder to traverse. This effect is most robust when viewed in terms of averaging over relative errors at each site rather than average absolute errors. A sweep over copying systems with various heterogeneous discrimination factors for both backward and forward discrimination at ΔGpol = 0 (we expect the effect to be stronger with weaker driving) uniformly shows $\langle\ln(\epsilon/\epsilon_I)\rangle \geq 0$ when correct monomer interactions are heterogeneous and $\langle\ln(\epsilon/\epsilon_I)\rangle \leq 0$ when incorrect monomer interactions are heterogeneous (figures are presented in Appendix E).
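The claim that rough landscapes are harder to traverse can be checked on a toy one-dimensional hopping model. The sketch below is a generic illustration and not the copying model of this paper: it assumes hops between neighboring sites i and i ± 1 with rates exp(−(G_{i±1} − G_i)/2), a reflecting boundary at the first site, and an absorbing boundary at the last site, and it uses the standard closed-form mean first-passage time for such a birth-death chain. The landscapes are hypothetical.

```python
import math

def mean_first_passage_time(G):
    """Mean time to reach the last site of landscape G from site 0.

    Assumes hop rates exp(-(G[j] - G[i]) / 2) between neighbors i -> j,
    a reflecting boundary at site 0, and absorption at the last site.
    """
    total, partition = 0.0, 0.0
    for i in range(len(G) - 1):
        partition += math.exp(-G[i])  # sum over j <= i of exp(-G[j])
        total += math.exp(0.5 * (G[i] + G[i + 1])) * partition
    return total

N = 20
flat = [0.0] * (N + 1)
# Hypothetical rough landscape: alternating levels with the same endpoints.
rough = [2.0 if i % 2 else 0.0 for i in range(N + 1)]
t_flat = mean_first_passage_time(flat)
t_rough = mean_first_passage_time(rough)
```

Both landscapes start and end at the same free energy, so any difference between `t_flat` and `t_rough` comes from roughness alone.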
The sign of $\langle\ln(\epsilon/\epsilon_I)\rangle$ influences the sign of $\ln(\langle\epsilon\rangle/\langle\epsilon_I\rangle)$, but the latter may experience a sign reversal, for instance, if monomer 2 is the monomer that experiences an error reduction and ϵ2 ≪ ϵ1 already. This argument suggests that, generally, the potential benefits of heterogeneity on copying error rates are more limited than the potential drawbacks. Heterogeneity will not tend to increase the probability of finding a correct monomer after another correct monomer but instead can reduce the probability of finding an incorrect monomer placed after another incorrect monomer by making the landscape of incorrect monomers more difficult to traverse. However, finding an incorrect monomer after another incorrect monomer is usually a rare event for a good copying system, and reducing the occurrences of such events would have a smaller impact on the average error rates of a copying system. Conversely, deleterious effects would be expected to be larger as correct monomer pairs would be dominant in a good copying system, and there is more room to reduce the probability of consecutive correct monomers. This reasoning is consistent with the fact that the deterioration factors in Fig. 6 tend to be larger than the improvement factors in Fig. 7.
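A small worked example shows how the mean of per-site log error ratios and the log of the ratio of mean errors can take opposite signs. All numbers are hypothetical: monomer 2 already has a much lower error than monomer 1, its error is halved by heterogeneity, and the error of monomer 1 rises by 10%.

```python
import math

p_t = 0.5                          # proportion of monomer 2 in the template
eps_hom = {1: 1e-1, 2: 1e-3}       # hypothetical homogeneous error rates
eps_het = {1: 1.1e-1, 2: 5e-4}     # hypothetical heterogeneous error rates
weights = {1: 1.0 - p_t, 2: p_t}

# Mean over sites of the log of the per-site error ratio.
mean_log_ratio = sum(weights[n] * math.log(eps_het[n] / eps_hom[n])
                     for n in (1, 2))

# Log of the ratio of site-averaged errors.
log_ratio_mean = math.log(
    sum(weights[n] * eps_het[n] for n in (1, 2))
    / sum(weights[n] * eps_hom[n] for n in (1, 2))
)
```

Here the relative measure is negative because the halving dominates on a log scale, while the absolute measure is positive because the average error is dominated by monomer 1.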
The observed trends in visitation counts also follow from this “choice of landscape” model after a few additional considerations. When correct monomer interactions are made heterogeneous, moving through the correct landscape becomes more difficult, and hence more state visits are required. This trend is again very robust (Appendix E). On the other hand, when incorrect monomer interactions are made heterogeneous, barriers to movement in the incorrect landscape are increased. Here, we need to consider two separate effects. First, traversal of the incorrect landscape gets harder, increasing the number of visits to states on average. Second, the system preferentially traverses the correct landscape. The expected number of steps of the coarse-grained model along the correct landscape will be less than that along the incorrect landscape since the correct tip will be harder to remove on average, thus decreasing state visits. The second effect is usually dominant (Appendix E), but exceptions occur when the correct and incorrect landscapes have a small free-energy gap (implying a small benefit to visitation counts when traversing the correct landscape). The sign of the deviation in visitation counts feeds forward into the sign of the deviation in time spent per site, but the latter may experience a sign reversal depending on the waiting times at each template monomer.
We perform a sweep over parameters to identify regimes where heterogeneity results in the greatest improvement in error rate (results in Appendix F). Interestingly, our parameter sweep revealed parameter sets where a heterogeneous system has a lower mean error than either of its constituent monomers used in isolation. Consider, for instance, the plot in Fig. 10. There is a clear minimum in ⟨ϵ⟩ at pt = 0.4, implying that some combination of the two considered monomers performs better than either one on its own. Thus, it is generally untrue that the error performance of a heterogeneous copying system is bounded by the error performance of its constituent monomers.
Graph of ⟨ϵ⟩ against pt showing ⟨ϵ⟩ dipping below the values of ⟨ϵ⟩ at both pt = 0 and pt = 1. Parameters are ΔGpol = −0.3, ΔGTT,11 = ΔGTT,22 = 6.0, ΔGTT,12 = 5.0, and ΔGTT,21 = 0.0.
C. Entropy, mutual information, and efficiency in the low ΔGpol regime
For our investigations here, we average entropy over the middle 80% of a long template to mitigate edge effects. Similarly to the error rate, the entropy rate captures uncertainty in the identity of successive monomers in the copy. It is, however, the more thermodynamically relevant parameter, as it is directly bounded by the drive ΔGpol. For ΔGpol > 0, copying can, in principle, be arbitrarily accurate. However, if ΔGpol < 0, then there is a fundamental limit on how accurate a copying system can be when operating in the limit of long polymer length.33 Given that generally [M] < [M]eff, this regime corresponds to one where the chemical free-energy drop due to backbone formation cannot compensate for the entropic drop that results from attaching a monomer to the polymer tail.
Setting L = $10^4$, and using the coarse-grained model defined by the propensities in Eqs. (10)–(13) with ΔGTT(1, 1) = ΔGTT(2, 2) = 6, ΔGTT(2, 1) = 0, and varying ΔGTT(1, 2) from 0 (the homogeneous case) to 6, we plot ηTherm in the low ΔGpol region as a function of ΔGpol for copying with backward propensity and forward propensity discrimination in Figs. 11(a) and 11(c), respectively. We see quite significant increases in efficiency relative to the homogeneous case. The source of this increased efficiency is the mechanism identified in Sec. III B: roughness in the free-energy landscape of incorrect monomers makes correct monomers more favorable. Interestingly, these increases in efficiency occur despite combining a monomer with another monomer having worse discrimination than itself; hence (congruent with our observations of error performance in Sec. III B), the thermodynamic performance of a heterogeneous copying system is not bounded by the performance of its constituent monomers. Note that because of finite size effects and sampling, h(Y|X) does not quite achieve ln 2 at ΔGpol = −ln 2, so the leftmost data points are calculated at ΔGpol = −ln 2 + 0.0001 and L = $10^5$ to prevent numerical instabilities.
Thermodynamic (ηTherm) and (estimated) information efficiencies as a function of ΔGpol when incorrect monomer interactions are made heterogeneous by varying ΔGTT(1, 2) from 0. Plots for backward propensity discrimination are shown in (a) and (b), and plots for forward propensity discrimination are shown in (c) and (d). For both types of discrimination, the entropy rate is reduced by heterogeneity, and hence ηTherm is increased in (a) and (c). The estimated information efficiency for backward propensity discrimination is increased by heterogeneity up to about ΔGpol = −0.2 to −0.3, after which it starts decreasing. For forward propensity discrimination, heterogeneity tends to reduce ηInf except for very low values of ΔGpol < −0.55.
In Eq. (28), the p(x, m1, m2, …, ml−1) term means that we cannot assume p(ml|m1, m2, …, ml−1) = p(ml|ml−1). In order to proceed, we first make the assumption that Y, having marginalized over X, has finite-length, template-mediated correlations and can be approximated as an ith-order Markov process (Appendix G). Remarkably, numerical evaluations show that the change in estimated entropy going from i = 1 to i = 8 is not significant for the parameters we consider when sampling the middle 80% of a length L = $10^4$ template. Furthermore, by sampling length L = $10^5$ and L = $10^6$ templates at select parameter values, we found that a significant proportion of this change (at least for some regimes) is likely attributable to sampling issues instead of genuine long-range correlations (Appendix G), and hence we are justified in estimating entropies by treating Y as a first-order Markov chain. Formally, this assumption means that the information efficiency we calculate is an upper bound of some true information efficiency ηInf for Bernoulli templates. However, Appendix G suggests that our estimate is a very tight upper bound. On the other hand, ηInf may be higher when considering non-Bernoulli templates.
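The idea of treating the copy as an ith-order Markov process can be illustrated with a plug-in entropy-rate estimate from symbol counts. This is a generic sketch under stated assumptions, not the estimator of Appendix G: the function names are hypothetical, entropies are in nats, and the toy sequence is a perfectly alternating copy.

```python
import math
from collections import Counter

def middle_fraction(seq, keep=0.8):
    """Keep the central part of a sequence to reduce edge effects."""
    drop = round(len(seq) * (1.0 - keep) / 2.0)
    return seq[drop:len(seq) - drop]

def markov_entropy_rate(seq, order):
    """Plug-in estimate of H(Y_l | previous `order` symbols), in nats."""
    ctx_counts, joint_counts = Counter(), Counter()
    for i in range(order, len(seq)):
        ctx = tuple(seq[i - order:i])
        ctx_counts[ctx] += 1
        joint_counts[ctx + (seq[i],)] += 1
    n = sum(joint_counts.values())
    # Sum over (context, symbol) pairs of -p(ctx, y) * ln p(y | ctx)
    return -sum(c / n * math.log(c / ctx_counts[key[:-1]])
                for key, c in joint_counts.items())

copy = middle_fraction([1, 2] * 500)    # toy, perfectly alternating copy
h0 = markov_entropy_rate(copy, order=0)  # ignores correlations
h1 = markov_entropy_rate(copy, order=1)  # first-order Markov estimate
```

Comparing estimates at increasing `order` on the same sample is one way to judge whether a first-order description is adequate, keeping in mind that high orders need far more data per context.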
In Figs. 11(b) and 11(d), we plot our estimated information efficiency as a function of ΔGpol for the backward propensity and forward propensity discrimination regimes, respectively. As expected, we observe that the information efficiency does not exceed the thermodynamic efficiency, $\eta_{\mathrm{Inf}} \leq \eta_{\mathrm{Therm}}$. Increases in efficiency relative to the homogeneous case ΔGTT(1, 2) = 0 are better preserved with backward propensity discrimination compared to forward propensity discrimination. P(Y) is, in essence, more skewed in the forward propensity discrimination case, which is detrimental to mutual information.
IV. CONCLUSION
Using minimal models with fine-grained steps,35 we have investigated how heterogeneity interacts with separation to affect copying error, velocity, and thermodynamic efficiency in STC. We have thus far investigated heterogeneity in monomer binding energies and kinetics; the effects of heterogeneous monomer concentrations, which would naturally manifest as heterogeneity in binding rates, will be an interesting topic for future research. Our first contribution is an approach for extending Qureshi et al.’s coarse-graining method43 to generalize to heterogeneous systems. We have thus far used this method to find polymer completion times for our coarse-grained model. A natural extension would be to use the method to investigate a heterogeneous version of kinetic proofreading,36,43 where it can be used to find the average expected free-energy consumption per polymer copy.
Using this method, we were able to characterize the velocity profiles of a heterogeneous separating templated copolymerization system. In contrast to non-separating templated copolymerization systems,32 we do not observe a long tail of zero velocity as equilibrium is approached. This absence makes sense in light of the template-dependent free-energy profiles of partial copies, which have significantly higher barriers (scaling as $\sqrt{l_D}$ for a length scale lD) in the case of NTC. Put another way, each monomer type in heterogeneous NTC has a different pseudo-equilibrium point, and copying with a drive lower than a monomer’s pseudo-equilibrium point makes traversing a long stretch of said monomer more difficult. This effect is totally absent in the case of STC since the sequence-specific interactions with the template are transient. We expect that this observation would generalize to more realistic models of transcription or translation, as long as the copy continuously separates from the template and the number of copy monomers interacting with the template at a given time is effectively bounded.
Our results on the effect of heterogeneity on error rates are more surprising. It was not initially obvious that discriminating on correct monomer interactions, arguably the more natural form of heterogeneous discrimination, would tend to increase errors. There is evidence that in protein translation, the ribosome grips on tRNA have identical strengths.52 This grip is analogous to our transient copy–template bond, as it is not persistent, and so it is plausible that the homogeneity of this grip was selected due to similar mechanisms that increase heterogeneous error rates (consider that in our case, error increase factors of up to $e^{1.25} \approx 3.5$-fold were observed). In the context of artificial systems, our results would suggest that it would be wise to minimize interaction heterogeneity on correct pairs of monomers.
The relative error reduction observed when discriminating on incorrect monomers is equally surprising. Applying heterogeneity to incorrect monomer interactions could be a useful design motif for the design of accurate copiers, and we have made some attempts toward this goal by scanning over parameter space. We find that having backward and forward propensity discrimination on separate monomers, with roughly equal effect sizes, tends to produce the best results. However, we emphasize that the usefulness of this design motif depends on the space of accessible parameter sets. Under some chemical restrictions (in particular, if we had to operate in the low ΔGpol regime, or if our copying mechanism only permits backward propensity discrimination), it may make sense to apply heterogeneity to incorrect monomer interactions for the roughly 10%–20% improvement in error rates. Note that operating under low ΔGpol may be necessary for synthetic systems. Reassuringly, there does not appear to be a trade-off between using heterogeneity to reduce error or using it to speed up copying, as regimes that tend to reduce error tend to reduce copying time as well. Both performance measures prefer heterogeneity on incorrect monomer interactions as opposed to correct monomer interactions, as heterogeneity on incorrect pairs makes it harder to move through the landscape of incorrect monomers.
Our final result relates to the entropy drop and mutual information in heterogeneous copying systems. In homogeneous copying, the entropy drop from equilibrium is exactly the mutual information when template monomers are equally distributed. In contrast, we showed that there is a meaningful difference between this entropy drop and mutual information in the case of heterogeneous copying due to the skewing of the copy polymer distributions. For a given template distribution, we can evaluate a channel capacity, the mutual information maximized over input distributions. Channel capacity can in turn be used to define an information efficiency measure. We showed that both thermodynamic and information efficiencies can be improved by heterogeneity in the incorrect monomer interactions at low ΔGpol, but that information efficiency is always less than or equal to thermodynamic efficiency. Per Shannon’s channel coding theory, channel capacity represents the minimal bitrate above which arbitrarily accurate decoding is possible, and hence some of the thermodynamic entropy drop in heterogeneous copying systems does not contribute to reducing this bitrate. On the other hand, for homogeneous copying systems, there is always a template distribution where the thermodynamic entropy drop in the copy (relative to equilibrium) fully contributes to the reduction of this bitrate.
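The distinction between mutual information at a fixed template distribution and its maximum over template distributions can be illustrated with a toy binary channel. The sketch below assumes a memoryless channel in which each template monomer is miscopied independently with hypothetical probabilities `e1` and `e2`; this ignores the inter-monomer correlations present in the copying model and is meant only to show the definitions at work. The capacity is found by a simple grid search over pt.

```python
import math

def mutual_information(p_t, e1, e2):
    """I(X;Y) in nats for a memoryless binary channel (toy assumption)."""
    px = {1: 1.0 - p_t, 2: p_t}
    pyx = {1: {1: 1.0 - e1, 2: e1}, 2: {1: e2, 2: 1.0 - e2}}
    py = {y: sum(px[x] * pyx[x][y] for x in (1, 2)) for y in (1, 2)}
    info = 0.0
    for x in (1, 2):
        for y in (1, 2):
            joint = px[x] * pyx[x][y]
            if joint > 0.0:
                info += joint * math.log(joint / (px[x] * py[y]))
    return info

def capacity(e1, e2, grid=1001):
    """Maximum of I(X;Y) over template compositions p_t on a uniform grid."""
    return max(mutual_information(k / (grid - 1), e1, e2)
               for k in range(grid))

c_sym = capacity(0.1, 0.1)    # symmetric errors: maximum at p_t = 0.5
c_asym = capacity(0.2, 0.01)  # asymmetric errors: maximum shifts away from 0.5
```

For asymmetric error probabilities the maximizing template composition is generally not pt = 0.5, which is one way a skewed copy distribution can separate mutual information from the entropy drop.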
ACKNOWLEDGMENTS
J.E.B.G. was supported by an Imperial College President’s Ph.D. Scholarship, B.J.Q. by the European Research Council under the European Union’s Horizon 2020 Research and Innovation Program (Grant Agreement No. 851910), and T.E.O. by a Royal Society University Research Fellowship.
AUTHOR DECLARATIONS
Conflict of Interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Author Contributions
J.E.B.G., B.J.Q., and T.E.O. planned the research. J.E.B.G. performed the research. J.E.B.G., B.J.Q., and T.E.O. wrote the manuscript.
Jeremy E. B. Guntoro: Conceptualization (equal); Investigation (lead); Methodology (lead); Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). Benjamin J. Qureshi: Conceptualization (equal); Methodology (supporting); Supervision (equal); Writing – review & editing (equal). Thomas E. Ouldridge: Conceptualization (equal); Methodology (supporting); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
Code and data are available at doi.org/10.5281/zenodo.14003309.
APPENDIX A: GASPARD’S METHOD VIA ABSORBING PROBABILITIES OF MARKOV CHAINS
Consider again Fig. 2 in the main text. We wish to calculate the probability Rx,l(ml+1|ml−1ml) of being absorbed into a complete polymer with ml+1 after ml starting from the initial state (x, &ml−1ml), without going back a step. To aid in our calculations, we introduce the absorption probabilities Rx,l(ml+1|ml−1mlmr), the probability of completing polymerization with ml+1 after ml starting from a system state (x, &ml−1mlmr) in the context of the Markov chain given in Fig. 2 (ml+1 may be the same or different from mr). To clarify, this probability includes the probability that we move backward from (x, &ml−1mlmr) to (x, &ml−1ml) and then, after an arbitrary non-absorbing set of moves, eventually absorb into (x, &ml−1mlml+1 … 0), where 0 indicates a detached, complete polymer. Note: moving back two consecutive times from mr always results in the backward absorbing state. In addition, if ml+1 = mr, then absorption without moving back is included in the absorption probability Rx,l(ml+1|ml−1mlmr).
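Absorption probabilities of this kind are solutions of a small linear system. The sketch below shows the generic calculation for a discrete-time chain with transient states and one target absorbing state; it is not the specific network of Fig. 2, and the three-state toy chain, function names, and jump probabilities are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def absorption_probabilities(Q, r):
    """Q[i][j]: jump probabilities between transient states.
    r[i]: probability of jumping from i directly into the target state.
    Returns u with u[i] = P(absorbed in target | start in i), (I - Q) u = r."""
    n = len(Q)
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, r)

# Toy chain: three transient states in a line; stepping off the left end is
# the "backward" absorbing state, stepping off the right end is the target.
Q = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
r = [0.0, 0.0, 0.5]
u = absorption_probabilities(Q, r)
```

In this toy chain the probability of reaching the target grows linearly with the starting position, as expected for an unbiased walk between two absorbing ends.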
APPENDIX B: DERIVATION OF COARSE-GRAINED PROPENSITIES
Enumeration of spanning trees for the calculation of coarse-grained propensities. (a) The modified network for the calculation of A. (b) Spanning trees of the network rooted at the initial coarse-grained state.
APPENDIX C: DERIVATION OF AVERAGE TIP WAITING TIMES
We apply the method in Ref. 43 to calculate the expected waiting times at a given tip state of the coarse-grained model. Note the waiting times will be dependent on the template as well as copy monomers near the tip; hence, in this section, a tip state will refer to (nl−1nlnl+1, ml−1ml). For each coarse-grained tip state, we wish to consider the network of reactions into and out of the completed state corresponding to f = 0 [Fig. 1(c)]. Refer to Fig. 13 for the first passage network from a completed state, with all transitions to other completed states redirected back to the initial completed state. As our propensities are dependent only on local template and copy monomers, for our case. As per,43 where J(nl−1nlnl+1, ml−1ml) is the flux into completed states corresponding to distinct coarse-grained states. To find this flux, we will need to calculate the stationary probability distribution pss(x, &ml−1ml, f) of fine-grained states in the network in Fig. 13.
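The link between a stationary flux and a mean waiting time can be illustrated on a toy network. The sketch assumes the standard renewal argument: if every exit transition is redirected back to the initial state, the mean time per cycle equals the inverse of the stationary flux through the redirected transitions. The three-state network and its rates are hypothetical, and the stationary distribution is found by simple power iteration rather than by the spanning-tree method of the text.

```python
def stationary_distribution(rates, n_states, iters=20000):
    """Stationary distribution of a continuous-time Markov chain.

    rates maps (i, j) -> rate of the transition i -> j. Uses power iteration
    on the small-time-step chain I + dt * generator, which shares the same
    stationary distribution.
    """
    out = [0.0] * n_states
    for (i, _), k in rates.items():
        out[i] += k
    dt = 0.5 / max(out)
    p = [1.0 / n_states] * n_states
    for _ in range(iters):
        new = [p[s] * (1.0 - out[s] * dt) for s in range(n_states)]
        for (i, j), k in rates.items():
            new[j] += p[i] * k * dt
        p = new
    return p

# Hypothetical network: 0 = completed tip state, 1 = monomer bound,
# 2 = polymerized. The exit 2 -> (next tip state) is redirected to state 0.
rates = {(0, 1): 2.0, (1, 0): 1.0, (1, 2): 3.0, (2, 0): 4.0}
p_ss = stationary_distribution(rates, 3)
flux = p_ss[2] * rates[(2, 0)]  # stationary flux through the redirected exit
waiting_time = 1.0 / flux       # mean time per visit to the tip state
```

For these rates the mean first-passage time can also be obtained by hand from the usual first-step equations, which gives the same value as the flux calculation.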
Fine-grained network for calculating first passage between coarse-grained states. Transitions leading to other coarse-grained tip states are redirected back to the initial coarse-grained state. Rates for the bottom arm are omitted but are analogous to those for the top arm with (12, 12) in place of (12, 11).
APPENDIX D: ITERATION FOR THE EXPECTED NUMBER OF VISITS TO A TIP STATE
Let a tip state refer to a class of states with template x and a copy of length l + 1 ending in ml, ml+1. This definition extends to longer tails; for example, would refer to states ending in ml−1, ml, ml+1. In this section, we aim to calculate , the expected number of visits to for each visit to . We argue that this quantity can be calculated by considering visits to transient tip states in the Markov chain in Fig. 3. We will use abbreviations of the form for convenience, noting that we always refer to the Markov process in Fig. 3, and hence the conditioned tip ml−1, ml, and template x are defined.
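To make the notion of counting visits to transient states concrete, the following sketch iterates the fixed-point relation v = e + vQ for a generic absorbing Markov chain, where v[j] is the expected number of visits to transient state j, e marks the starting state, and Q contains the transition probabilities among transient states. The three-state matrix is a hypothetical example rather than the chain of Fig. 3, and the ratio printed at the end is only one illustrative way of expressing "visits to one state per visit to another".

```python
def expected_visits(q, start, tol=1e-12, max_iter=10_000):
    """Iterate v <- e_start + v Q until the update is smaller than tol."""
    n = len(q)
    visits = [0.0] * n
    for _ in range(max_iter):
        updated = [
            (1.0 if j == start else 0.0)
            + sum(visits[i] * q[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(u - v) for u, v in zip(updated, visits)) < tol:
            return updated
        visits = updated
    raise RuntimeError("iteration did not converge")


# Hypothetical transition probabilities among three transient tip states.
# Each row sums to less than one; the remainder is the absorption probability.
Q_TRANSIENT = [[0.0, 0.5, 0.2],
               [0.3, 0.0, 0.3],
               [0.1, 0.4, 0.0]]

visits = expected_visits(Q_TRANSIENT, start=0)
print("expected visits per transient state:", visits)
print("visits to state 1 per visit to state 0:", visits[1] / visits[0])
```

The iteration converges because every row of Q sums to less than one, so probability leaks into the absorbing states at each step.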
APPENDIX E: PARAMETER SWEEP OF RELATIVE ERROR AND TOTAL VISITATIONS
Heat map illustrating the resulting relative error change from parameter sweeps for heterogeneity, having extremized over pt. We consider heterogeneity on correct [backward propensity discrimination: (a) and forward propensity discrimination: (b)] and incorrect [backward propensity discrimination: (c) and forward propensity discrimination: (d)] monomers. In the case of heterogeneity on correct monomer interactions, this relative error change is minimized over pt; for heterogeneity on incorrect monomer interactions, it is maximized over pt. Only the upper triangle ΔGTT,H ≥ ΔGTT,L is plotted. For heterogeneity on correct monomers, , so relative error decreases do not occur, and for heterogeneity on incorrect monomers, , so relative error increases do not occur.
Heat map illustrating the resulting log heterogeneous change in state visits from parameter sweeps for heterogeneity, having extremized over pt. We consider heterogeneity on correct [backward propensity discrimination: (a) and forward propensity discrimination: (b)] and incorrect [backward propensity discrimination: (c) and forward propensity discrimination: (d)] monomers. In the case of heterogeneity on correct monomer interactions, is minimized over pt; for heterogeneity on incorrect monomer interactions, it is maximized over pt. Only the upper triangle ΔGTT,H ≥ ΔGTT,L is plotted. For heterogeneity on correct monomers, , so decreases in state visits do not occur. For heterogeneity on incorrect monomers, for the vast majority of parameters, so increases in state visits do not occur. However, deviations are observed in the small discrimination regime for heterogeneity on incorrect monomer interactions.
APPENDIX F: INVESTIGATIONS ON ERROR REDUCTION THROUGH HETEROGENEITY FOR SEPARATING COPIERS
Section III B revealed that heterogeneity on correct monomers tends to increase error rates, while heterogeneity on incorrect monomers tends to decrease them. It is therefore illustrative to look for levels of forward and backward heterogeneity that are (in a heuristic sense) optimal for extracting benefits from heterogeneity.
Based on Sec. III B, we anticipate that error reduction is maximized when correct monomer pairs are kept homogeneous while the heterogeneity in the incorrect pairs is maximized, and so we make that assumption. We first consider the limit where one monomer is purely backward propensity discriminated, while the other is purely forward propensity discriminated. Then, we gradually shift the discrimination until one monomer has half the forward propensity discrimination (and the other has half the backward propensity discrimination) of the other.
We list our constrained parameter sets in Table IV. For each constrained set, we allow for baseline amounts of forward and backward propensity discrimination, and , to vary independently. Two values of ΔGpol are considered, 0 and −0.3 (note that −0.3 is weaker driving). For each parameter set, we consider the average error ratio (here, the expectation is taken over a uniform distribution of the parameter pt), a measure of how much heterogeneity helps decrease errors averaged over the content of monomer 2 for Bernoulli templates. Sweeping over different baseline values and , we obtain heat maps of , plotted in Fig. 16.
Parameter sets investigated in Appendix F.
Index | ΔGTT | ΔGK |
---|---|---|
i | | |
ii | | |
iii | | |
Identifying regimes where heterogeneity has large impacts on error. is plotted for parameter sets indexed (i), (ii), and (iii) in Table IV, for ΔGpol = 0 (a)–(c) and ΔGpol = −0.3 (d)–(f). When ΔGpol = 0, the greatest decreases in error tend to occur when for parameter set (i), and for parameter sets (ii) and (iii). When ΔGpol = −0.3, the greatest decreases in error tend to occur when or .
Consider first ΔGpol = 0. We observe that in the limit of pure opposite discrimination (i.e., one monomer is purely forward propensity discriminated, and the other purely backward propensity discriminated), the largest decreases in error tend to be obtained when backward and forward propensity discrimination is present at roughly similar amounts. However, as discrimination is shifted in parameter sets (ii) and (iii), tends to increase in this region [Figs. 16(a)–16(c)]. This response is consistent with what we saw in Sec. III B, since shifting the discrimination results in a smoothing of the incorrect monomer potential. In the best cases, heterogeneity tends to decrease errors by about 10–20 percent (averaged over all pt) relative to the naïve, uncorrelated estimate of the error probability ϵI. On the other hand, for the lower value of ΔGpol = −0.3, beneficial effects tend to peak when one monomer is purely backward or forward propensity-discriminated, while the other experiences no discrimination at all [Figs. 16(d)–16(f)].
APPENDIX G: MARKOV APPROXIMATIONS FOR ENTROPY AND INFORMATION
Differences in estimates of h(Y) assuming a first order Markov chain, h1(Y), and an eighth order Markov chain, h8(Y), for ΔGTT,21 = 0 and ΔGTT,11 = ΔGTT,22 = 6, maximized over pt for each point, are presented in Fig. 17. The differences are not significant, suggesting that h1(Y) is a tight estimate of h(Y). Counterintuitively, errors seem to be largest in the homogeneous case ΔGpol = 0 and ΔGTT,12 = 0 at pt = 0.5, where Y should be Markov and h(Y) should be exactly ln 2 due to symmetry. For this parameter set, however, sampling errors appear to be the primary contributor to the difference h1(Y) − h8(Y). To better illustrate the effect of sampling, we plot the estimation error h(Y) − hi(Y) for both L = 10⁴ and L = 10⁶ in Fig. 18, for a homogeneous backward propensity-discriminated copying system with pt = 0.5 [note h(Y) is exactly calculable: h(Y) = ln 2 in this regime]. Here, hi(Y) is the entropy estimated by assuming an ith order Markov process. The estimation error increases with increasing i, which is at first surprising. However, as the overwhelming majority of the difference vanishes for L = 10⁶, this estimation error is likely due to sampling (genuine errors due to correlations in Y should persist in the L → ∞ limit).
Heat maps illustrating the difference in estimated entropy h(Y) assuming a first vs eighth order Markov Y for (a) backward propensity and (b) forward propensity discrimination, showing insignificant differences throughout.
Entropy estimation error for h(Y) − hi(Y) for a homogeneous backward propensity-discriminated copying system. Estimation error is plotted for (a) L = 10⁴ and (b) L = 10⁶. Parameters are ΔGTT,11 = ΔGTT,22 = 6, and pt = 0.5. We observe an error increase due to taking higher order i. However, the error decreases massively going from L = 10⁴ to L = 10⁶, implying that sampling is the source of this error.
Note that we estimate entropy by considering the probability of follow-up monomers conditioned on previous length i strings [Eq. (G2)]. As i increases for a fixed L, we expect the estimate of this conditional probability to become worse as there are fewer samples of each length i string, leading to undersampling. This homogeneous case is unique in that we can attribute all errors to sampling; for most parameters, the error in entropy is likely some combination of sampling errors (these errors likely get worse with increasing i) and genuine correlations in Y (these errors likely get better with increasing i).
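The undersampling effect described above can be reproduced with a short numerical sketch. It assumes a standard plug-in estimator of the conditional entropy, built from counts of length-i contexts and their follow-up symbols, which may differ in detail from Eq. (G2). As a stand-in for the homogeneous case with pt = 0.5, the sequence Y is modeled as independent fair coin flips, so that h(Y) = ln 2 exactly. The function name, the random seed, and the sequence lengths (10⁴ and 10⁵, chosen shorter than in the text to keep the run time small) are illustrative choices.

```python
import math
import random
from collections import Counter


def markov_entropy_estimate(seq, order):
    """Plug-in entropy rate (nats) assuming an order-`order` Markov chain."""
    context_counts = Counter()
    joint_counts = Counter()
    for n in range(order, len(seq)):
        context = tuple(seq[n - order:n])
        context_counts[context] += 1
        joint_counts[context + (seq[n],)] += 1
    total = len(seq) - order
    h = 0.0
    for joint, count in joint_counts.items():
        # Empirical probability of the follow-up symbol given its context.
        p_cond = count / context_counts[joint[:-1]]
        h -= (count / total) * math.log(p_cond)
    return h


rng = random.Random(0)  # fixed seed so the sketch is reproducible
errors = {}
for length in (10**4, 10**5):
    seq = [rng.randint(0, 1) for _ in range(length)]
    # Estimation error h(Y) - h_i(Y) with h(Y) = ln 2 for fair coin flips.
    errors[length] = {
        order: math.log(2) - markov_entropy_estimate(seq, order)
        for order in (1, 4, 8)
    }

for length, by_order in errors.items():
    for order, err in by_order.items():
        print(f"L = {length:>6}, i = {order}: h - h_i = {err:.3e}")
```

For a fixed L, the error grows with the order i because each of the 2^i contexts is observed fewer times; increasing L shrinks the error, consistent with attributing it to sampling rather than to correlations in Y.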