Polymers consisting of more than one type of monomer, known as copolymers, are vital to both living and synthetic systems. Copolymerization has been studied theoretically in a number of contexts, often by considering a Markov process in which monomers are added or removed from the growing tip of a long copolymer. To date, the analysis of the most general models of this class has necessitated simulation. We present a general method for analyzing such processes without resorting to simulation. Our method can be applied to models with an arbitrary network of sub-steps prior to addition or removal of a monomer, including non-equilibrium kinetic proofreading cycles. Moreover, the approach allows for a dependency of addition and removal reactions on the neighboring site in the copolymer and thermodynamically self-consistent models in which all steps are assumed to be microscopically reversible. Using our approach, thermodynamic quantities such as chemical work; kinetic quantities such as time taken to grow; and statistical quantities such as the distribution of monomer types in the growing copolymer can be directly derived either analytically or numerically from the model definition.

Copolymers are polymers consisting of more than one type of monomeric unit; the order of these monomers in the chain defines the copolymer sequence. In general, copolymerization mechanisms can be classified into two main categories: free copolymerization that does not rely on a template,1 as shown in Fig. 1(a); and templated copolymerization, in which a template (usually another copolymer) is used to bias the distribution of sequences produced, as shown in Figs. 1(b) and 1(c). Polymers produced via both types of mechanism are of relevance to both biological and industrial systems. In living systems, O-glycans are sequences of monosaccharides that grow by free copolymerization from serine or threonine amino acids.2 They play a key role as a physical protective barrier for cells from pathogens, as well as participating in other cellular processes.2,3 Free copolymerization is also a common method for producing plastics and rubbers in commercial and industrial systems.4,5 In addition, there have been recent experimental designs for free copolymerization systems that produce specific products utilizing DNA-nanotechnology-based reaction schemes.6,7

FIG. 1.

Comparison of the different types of copolymerization mechanism with three types of monomer (blue, red, and yellow). (a) shows free copolymerization, (b) templated self-assembly, and (c) templated copolymerization with autonomous separation. In (b) and (c) the template is shown with squares and the growing polymer with circles. (d) Example of a more detailed reaction scheme used to select the next monomer. In each of these sub-figures, different colors represent different monomer types, with bonds colored accordingly when their strength might depend on the monomer. In (d), different activation states of the monomer undergoing incorporation are represented by different shapes. The dashed bubble indicates how the arbitrary set of reactions in (d) may replace the simple reaction surrounded by a dashed bubble in (c) for a more complex model.


Templated copolymerization is the mechanism by which DNA, RNA, and polypeptides are produced in DNA replication, RNA transcription, and protein translation, respectively. These processes are at the heart of the central dogma of molecular biology8 and are the basis of the informational and biochemical complexity of life. In DNA replication, DNA templates the production of copies of itself; in transcription, DNA templates the production of RNA; and, in translation, messenger RNA (mRNA) is the template for the production of a polypeptide.9 Inspired by these biological templated copolymerization mechanisms, there has been recent interest in designing synthetic systems that can produce other sequence-controlled molecules via templated copolymerization.10–16 

Free polymerization can be modeled as a Markovian growth process under which monomers bind to the end of a growing polymer at a certain rate. Early free copolymerization models17–19 built on this framework to allow for copolymerization via the incorporation of multiple types of monomeric unit, as shown in Fig. 1(a), albeit with irreversible polymerization reactions. In particular, Mayo and Lewis19 emphasized that in polymerization models, if the monomer binding events are irreversible and their rates are conditional on the terminal monomer type, then intra-sequence correlations are generated within the copolymer.

Although the use of models with irreversible transitions is reasonable in many contexts, thermodynamically self-consistent models require all transitions to be microscopically reversible.20 In particular, if a transition from state A to state B is possible, then transitions from B to A must also be possible. Models with fully microscopically reversible polymerization reactions, as in Fig. 1(a), are more challenging to analyze but can be interpreted in a thermodynamic sense.1,21,22

Templates can affect the rate at which monomers are added or removed from a growing copolymer, and hence, templated copolymerization models can be more complex than free copolymerization models. When the template consists of just one type of templating monomer (homopolymer), a templated copolymerization process can be mapped onto a free copolymerization model. Furthermore, if one assumes some symmetries regarding interactions between monomers in the growing copolymer and those in the template (such as all complementary bonds have equal strength and all non-complementary bonds have equal strength), models of sequence-bearing templates may be mapped onto models with homopolymeric templates and, hence, to models of free copolymerization.23–27 

Templated copolymerization models can be further divided into two main categories: templated self-assembly Fig. 1(b),1,27–41 and autonomously separating mechanisms Fig. 1(c).23,24,42 Templated self-assembly models are those in which all the monomers in the growing copolymer remain bound to the template. In autonomously separating models, the growing copolymer detaches as it extends.23–25 There has been recent interest in explicitly modeling autonomous separation in templated growth in an attempt to understand models that give a better description of transcription or translation.23–25 In autonomously separating models, the simultaneous growth and separation of the copolymer and template mean that the copy-template interactions are not permanent, and therefore, free energy released from such interactions cannot be part of the driving force of polymerization. In addition, since these copy-template bonds are temporary, they cannot stabilize the accurate copy directly in the long time limit. Furthermore, an ensemble of accurate polymers is a lower entropy state than an ensemble of random polymers. These conditions mean that non-equilibrium driving is required to generate accurate copies of the template if the copies are to spontaneously detach.42 Moreover, the separation of the lagging tail from the template as the copolymer grows naturally causes intra-sequence correlations within the product.23 

The models described above are maximally coarse-grained, in that they treat the binding of monomers to the growing tip of the copolymer as a simple, usually single-step, process. However, more generally, one may wish to study models in which polymerization occurs via a more detailed series of steps, as in Fig. 1(d). For instance, in order to explain the high accuracy observed in biological polymer copying systems, Hopfield43 and Ninio44 independently introduced the concept of kinetic proofreading: a reaction motif in which a monomer undergoes a free energy consuming activation reaction before it is polymerized into the copolymer. The introduction of kinetic proofreading reaction motifs presaged the investigation of more complex copolymerization mechanisms.35,45

In summary, models that allow for multiple monomer types, intra-sequence correlations, reversible reactions, and general, multi-step monomer inclusion reactions represent a wide class of copolymerization processes. Previous techniques17–19,22,23,27–35,39–41,46–51,66–69 have not allowed analysis of thermodynamically self-consistent models of generalized free copolymerization processes in which monomer addition is given by an arbitrarily complex network of reversible reactions with rates that may depend on the terminal monomer type, and templated copolymerization models with a high symmetry that can be mapped to these free processes. Investigating the most general type of model in this class would require simulation.

In this paper, we present a universal method for studying this large class of copolymerization models. Drawing on the work of Gaspard and Andrieux52 for analyzing linear copolymerization processes, and Hill53,54 for analyzing absorbing Markov processes, we present analytical methods for extracting: explicit expressions for the probability of inclusion of a given monomer; the growth rate of a copolymerization process; and the chemical work done by the process. Our method removes the need to extract the same features by simulation and often produces simple, analytic results.

In Sec. II A, we review and refine methods relating to absorbing Markov chains that are crucial to understanding our approach. In Sec. II B 1, we present our method. In Sec. III, we apply the method to a few example processes to demonstrate its use and power when considering models with certain features. First, we apply the method to models for which the rate of adding new monomers depends only on the monomer type being added. Next, we apply the method to templated copolymerization systems with autonomous separation that do not have non-equilibrium kinetic proofreading cycles. Finally, we solve a generalized version of Hopfield’s kinetic proofreading model applied to a templated copolymerization system with an autonomously separating product.

We begin by reviewing and adapting some diagrammatic techniques introduced by Hill to analyze absorbing Markov chains.53,54 An absorbing Markov chain is a Markov chain for which any trajectory through its state space, with arbitrary initial conditions, will reach an absorbing state in finite time almost surely.55 We can decompose the state space of an absorbing Markov chain into absorbing states 𝒜 and transient states 𝒳, such that the state space is V = 𝒜 ∪ 𝒳. Let us denote the rate function that describes the chain as K : V × V → ℝ₊, such that K(x, y) is the rate of the transition from state x to state y. Then, we denote a Markov process as the tuple (V, K).

Throughout this section, we shall refer to the absorbing Markov chain given in Fig. 2(a), which possesses two absorbing states and non-trivial cycles, for illustrative purposes.

FIG. 2.

Graphical representations of an absorbing Markov process to illustrate the methodology outlined in Sec. II A. (a) Example absorbing Markov process (𝒳 ∪ 𝒜, K), with two absorbing states, 𝒜 = {A, B}, and four transient states 𝒳 = {1, 2, 3, 4}. (b) The closed process starting at state 1, (𝒳, K_1). (c) The cycle process ({c} ∪ 𝒳∖{1, 2, 3} = {c, 4}, K_{1,C}) for the cycle C = 1 → 2 → 3 → 1 or C′ = 1 → 3 → 2 → 1.


1. Expectations of an absorbing process are steady state averages of a “closed process”

We will derive expressions for four main quantities: the probabilities of reaching certain absorbing states, the expected time taken to absorption, the expected net number of times a given edge is traversed before absorption, and the expected number of times that a trajectory goes around a cycle before absorption. These quantities depend on the starting (transient) state σ ∈ 𝒳 and can be found in terms of the "closed" process.54 The closed process is a modified version of an absorbing Markov chain in which transitions to the absorbing states are redirected to the starting state. Figure 2(b) shows the closed process starting at state 1 of our example absorbing chain of Fig. 2(a). The closed process for a Markov process (𝒳 ∪ 𝒜, K) starting at state σ is a new Markov process (𝒳, K_σ) with a rate function given by

$$K_\sigma(x, \sigma) = K(x, \sigma) + \sum_{A \in \mathcal{A}} K(x, A)$$
(1)

for x ∈ 𝒳 and agreeing with K on 𝒳 × (𝒳∖{σ}).

The closed process has a unique stationary distribution for the following reasons. From the definition of an absorbing Markov chain, there exists a path from any state to an absorbing state, taking finite time. Thus, in the closed process, there is a path from any state to the starting state, taking finite time. The set of states including the starting state and all those that may be reached from the starting state is, therefore, positive recurrent and furthermore, this set is the only recurrent set of states and will be reached from any other state. Since there is only one recurrent set of states, there is a unique stationary distribution.55 

Expected quantities of an absorbing Markov chain, such as the expected probability that a particular absorbing state is reached, can be found in terms of steady state quantities in the closed process. Whenever a trajectory of the original process reaches an absorbing state, in the closed process that same trajectory would have been reset back to the starting state. Hence, running the closed process for long times is equivalent to generating many independent trajectories to absorption for the original chain. Thus, averaging quantities in the steady state of the closed process is equivalent to taking expectations over independent trials of quantities in the absorbing chain. It is worth noting that the dependence of expected quantities on the starting state is encoded in the definition of the closed process. Finally, we can see that the definition of the closed process may permit self-transitions σσ, which, for continuous time Markov processes, have little meaning. However, for the purposes of calculating steady state probabilities of the closed process they may be ignored.
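The construction of the closed process is purely mechanical. The sketch below, in Python, builds K_σ per Eq. (1) for a graph with the topology of the example of Fig. 2(a); the dictionary representation and all numerical rate values are arbitrary choices for illustration, not values taken from the text.

```python
# Build the closed process of Eq. (1): every transition into an absorbing
# state is redirected to the starting state sigma.  The graph topology
# mirrors Fig. 2(a); all rate values are arbitrary illustrative choices.
absorbing = ["A", "B"]

K = {(1, 2): 1.0, (1, 3): 2.0,
     (2, 1): 0.5, (2, 3): 1.5, (2, 4): 1.0,
     (3, 1): 0.7, (3, 2): 0.3, (3, 4): 2.0,
     (4, 2): 0.4, (4, 3): 0.6,
     (3, "A"): 1.2, (4, "B"): 0.8}   # kA = 1.2, kB = 0.8

def closed_process(K, absorbing, sigma):
    """Rate function K_sigma of the closed process started at sigma."""
    Ks = {}
    for (x, y), rate in K.items():
        target = sigma if y in absorbing else y   # Eq. (1): redirect to sigma
        Ks[(x, target)] = Ks.get((x, target), 0.0) + rate
    return Ks

K1 = closed_process(K, absorbing, sigma=1)
```

Note that the redirected rate folds onto any pre-existing edge into σ (here, 3 → 1 carries r31 + kA), and the total outflow from every transient state is unchanged.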

2. Steady state averages of the closed process are calculated using the Markov chain tree theorem

Given that we can turn the calculation of expectations of absorbing processes into steady state averages over closed processes, we can make use of tools developed for analyzing the steady state of Markov processes, such as the Markov chain tree theorem (MCTT).56 The MCTT states that the steady state distribution of a Markov chain with a unique stationary distribution may be found by summing over rooted spanning trees of the process, where the transition rates are taken as weights on the edges of the graph. Explicitly, let G be a directed weighted graph, with weight K(e) for an edge e of G. A spanning tree of G, rooted at a vertex ν, is a subgraph of G with no cycles that connects all the vertices of G and in which the out-degree of every vertex, except ν, is one. The sets of spanning trees rooted at nodes 3 and 4 of the closed process given in Fig. 2(b) are shown in Fig. 3. Denote the set of all spanning trees rooted at x by 𝒯(x). The MCTT states that the steady state probability π(x) of being in state x is given by

$$\pi(x) = \frac{\sum_{T \in \mathcal{T}(x)} \prod_{e \in T} K(e)}{\sum_{v \in \mathcal{X}} \sum_{T \in \mathcal{T}(v)} \prod_{e \in T} K(e)},$$
(2)

with e ∈ T running over the edges of the tree. The denominator here is simply a normalization constant.
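The spanning-tree sums in Eq. (2) need not be enumerated by hand: by the directed matrix-tree theorem, the sum over spanning trees rooted at v equals a minor of the weighted Laplacian. A minimal numerical sketch for the closed process of Fig. 2(b), with arbitrary illustrative rates (and the absorbing rates kA = 1.2 and kB = 0.8 already folded into the redirected edges 3 → 1 and 4 → 1):

```python
import numpy as np

# Steady state of the closed process via the Markov chain tree theorem.
# Rather than enumerating rooted spanning trees, use the matrix-tree
# theorem: the tree sum rooted at v is the determinant of the weighted
# out-degree Laplacian with row and column v deleted.
states = [1, 2, 3, 4]
K1 = {(1, 2): 1.0, (1, 3): 2.0,
      (2, 1): 0.5, (2, 3): 1.5, (2, 4): 1.0,
      (3, 1): 0.7 + 1.2, (3, 2): 0.3, (3, 4): 2.0,   # r31 plus redirected kA
      (4, 1): 0.8, (4, 2): 0.4, (4, 3): 0.6}         # redirected kB

def tree_sum(K, states, root):
    """Sum of weights of spanning trees rooted at `root` (matrix-tree)."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    L = np.zeros((n, n))
    for (x, y), w in K.items():
        L[idx[x], idx[x]] += w    # total out-rate on the diagonal
        L[idx[x], idx[y]] -= w
    keep = [i for i in range(n) if i != idx[root]]
    return np.linalg.det(L[np.ix_(keep, keep)])

weights = np.array([tree_sum(K1, states, v) for v in states])
pi = weights / weights.sum()      # Eq. (2)
```

The resulting vector is the stationary distribution of the closed process, which can be checked directly against the generator.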

FIG. 3.

Spanning trees of the closed processes rooted at (a) node 3 and (b) node 4 derived from Fig. 2(b), with nodes labeled in the first spanning tree and all other trees following the same positioning. The spanning trees have been arranged in terms of the self-avoiding walk between nodes 1 and 3 for the trees rooted at node 3 and arranged in terms of the self-avoiding walks between nodes 1 and 4 for the trees rooted at node 4. More details on the relationship between self-avoiding walks and spanning trees are given in  Appendix A.


We can define steady state currents from the steady state distribution of the closed process that correspond to expected currents of the absorbing chain. Let a subscript σ denote quantities in the closed process starting at state σ; then, π_σ is the steady state probability distribution and K_σ the rate function. The current along a given edge, e = x → y, is given by the probability π_σ(x) of being in state x multiplied by the rate along that edge. We can, therefore, write the total steady state current along all edges that originally led to absorbing states as

$$J_{\text{Tot}}(\sigma) = \sum_{A \in \mathcal{A}} \sum_{x \in \mathcal{X}} \pi_\sigma(x) K(x, A),$$
(3)

where, as before, 𝒳 is the set of transient states and 𝒜 the set of absorbing states. J_Tot(σ) is the expected total current into absorbing states starting from state σ, and, therefore, its reciprocal is the expected time to absorption.

For the example process shown in Fig. 2, we present the spanning trees of the corresponding closed process rooted at nodes 3 and 4 in Fig. 3. Given the spanning trees, we can directly write down the total current into the absorbing states as

$$J_{\text{Tot}}(1) = \frac{1}{N}\Big\{k_A\big[r_{12}r_{24}r_{43} + r_{12}r_{23}(r_{42} + r_{43} + k_B) + r_{13}\big((r_{43} + k_B)(r_{21} + r_{23} + r_{24}) + r_{42}(r_{21} + r_{23})\big)\big] + k_B\big[r_{12}r_{23}r_{34} + r_{12}r_{24}(r_{31} + k_A + r_{32} + r_{34}) + r_{13}r_{34}(r_{21} + r_{23} + r_{24}) + r_{13}r_{32}r_{24}\big]\Big\},$$
(4)

where N is the normalization term, given in Appendix B. The terms multiplied by k_A give the partial current into absorbing state A, i.e., the current along the transition 3 → A, which comes from the trees rooted at node 3; equivalently, the terms multiplied by k_B give the partial current into state B, i.e., the current along the transition 4 → B, which comes from the trees rooted at node 4.
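The total current, and hence the expected absorption time, can also be obtained numerically without enumerating trees, by solving for the stationary distribution of the closed process directly. The sketch below uses the Fig. 2(a) topology with arbitrary illustrative rates; the cross-check against the standard first-step (fundamental-matrix) result is performed in a separate test.

```python
import numpy as np

# Expected time to absorption from the closed process: the total
# steady-state current into the redirected absorbing edges is J_Tot
# (Eq. (3)), and its reciprocal is the expected absorption time.
transient = [1, 2, 3, 4]
absorbing = ["A", "B"]
K = {(1, 2): 1.0, (1, 3): 2.0,
     (2, 1): 0.5, (2, 3): 1.5, (2, 4): 1.0,
     (3, 1): 0.7, (3, 2): 0.3, (3, 4): 2.0,
     (4, 2): 0.4, (4, 3): 0.6,
     (3, "A"): 1.2, (4, "B"): 0.8}
idx = {s: i for i, s in enumerate(transient)}

def closed_steady_state(K, sigma):
    """Stationary distribution of the closed process started at sigma."""
    n = len(transient)
    Q = np.zeros((n, n))
    for (x, y), r in K.items():
        tgt = sigma if y in absorbing else y   # Eq. (1): redirect to sigma
        Q[idx[x], idx[tgt]] += r
        Q[idx[x], idx[x]] -= r
    A = np.vstack([Q.T, np.ones(n)])           # pi Q = 0 with sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

pi1 = closed_steady_state(K, sigma=1)
J_tot = sum(pi1[idx[x]] * r for (x, y), r in K.items() if y in absorbing)
mean_absorption_time = 1.0 / J_tot             # reciprocal of Eq. (3)
```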

3. Absorbing probabilities

Given a Markov chain with multiple absorbing states, we can ask for the probability of absorption in each absorbing state in the long time limit. The probability that a trajectory eventually ends in a specific absorbing state can be calculated from the closed process, by dividing the expected current along transitions that originally led to the absorbing state in question by JTot(σ) [Eq. (3)]. Therefore, the absorption probabilities can be written

$$P[\sigma \to A] = \frac{\sum_{x \in \mathcal{X}} \pi_\sigma(x) K(x, A)}{\sum_{B \in \mathcal{A}} \sum_{x \in \mathcal{X}} \pi_\sigma(x) K(x, B)},$$
(5)

using the notation P[σ → A] to denote the probability of being absorbed at A given that the trajectory started in state σ. It is worth noting that, since this quantity is a ratio of currents, a factor of π_σ appears in both the denominator and the numerator of the expression. In practice, the normalization factor from the MCTT [Eq. (2)] cancels out, which simplifies the quantities in the calculation.

For our example process shown in Fig. 2, we can use the partial currents to absorbing states A and B to write down the absorbing probabilities

$$P[1 \to A] = \frac{k_A}{N J_{\text{Tot}}(1)}\big[r_{12}r_{24}r_{43} + r_{12}r_{23}(r_{42} + r_{43} + k_B) + r_{13}\big((r_{43} + k_B)(r_{21} + r_{23} + r_{24}) + r_{42}(r_{21} + r_{23})\big)\big],$$
$$P[1 \to B] = \frac{k_B}{N J_{\text{Tot}}(1)}\big[r_{13}r_{34}(r_{21} + r_{23} + r_{24}) + r_{13}r_{32}r_{24} + r_{12}r_{23}r_{34} + r_{12}r_{24}(r_{31} + k_A + r_{32} + r_{34})\big].$$
(6)

The normalization factor, N, propagated through from Eq. (2), conveniently cancels out with the 1/N implicit in J_Tot.
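Eq. (5) can be evaluated numerically from the stationary distribution of the closed process. The sketch below again uses the Fig. 2(a) topology with arbitrary illustrative rates; the test verifies the result against the textbook first-step analysis for absorption probabilities.

```python
import numpy as np

# Absorption probabilities from the closed process (Eq. (5)): the partial
# current into a chosen absorbing state divided by the total current.
transient = [1, 2, 3, 4]
absorbing = ["A", "B"]
K = {(1, 2): 1.0, (1, 3): 2.0,
     (2, 1): 0.5, (2, 3): 1.5, (2, 4): 1.0,
     (3, 1): 0.7, (3, 2): 0.3, (3, 4): 2.0,
     (4, 2): 0.4, (4, 3): 0.6,
     (3, "A"): 1.2, (4, "B"): 0.8}
idx = {s: i for i, s in enumerate(transient)}

def closed_steady_state(K, sigma):
    """Stationary distribution of the closed process started at sigma."""
    n = len(transient)
    Q = np.zeros((n, n))
    for (x, y), r in K.items():
        tgt = sigma if y in absorbing else y   # Eq. (1): redirect to sigma
        Q[idx[x], idx[tgt]] += r
        Q[idx[x], idx[x]] -= r
    A = np.vstack([Q.T, np.ones(n)])           # pi Q = 0 with sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def absorption_probability(pi, target):
    """Eq. (5): ratio of the current into `target` to the total current."""
    num = sum(pi[idx[x]] * r for (x, y), r in K.items() if y == target)
    den = sum(pi[idx[x]] * r for (x, y), r in K.items() if y in absorbing)
    return num / den

pi1 = closed_steady_state(K, sigma=1)
pA = absorption_probability(pi1, "A")
pB = absorption_probability(pi1, "B")
```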

4. Counting edge and cycle transitions

In this subsection, we shall calculate the expected net number of times traversing a given edge of an absorbing Markov process before absorption. In addition, we shall calculate the expected number of times a non-recurrent cycle of an absorbing Markov process is traversed before absorption. Both of these will be of use later in defining a notion of chemical work.

To calculate the net number of times a given edge is crossed, we find the expected current along the transition x → y between states x and y of an absorbing process, (𝒳 ∪ 𝒜, K), as in Sec. II A. The expected current through this edge, denoted J_{xy}(σ) for a trajectory starting in state σ ∈ 𝒳, can be calculated from the closed process, (𝒳, K_σ), as in Eq. (1), as the difference between the steady state probability of being in state x multiplied by the rate from x → y and the steady state probability of being in state y multiplied by the rate from y → x,

$$J_{xy}(\sigma) = \pi_\sigma(x) K(x, y) - \pi_\sigma(y) K(y, x).$$
(7)

The net number of times this edge is traversed (the number of observed transitions x → y minus the number of observed transitions y → x) before absorption is, then, just the ratio between this current and the total current into absorbing states,

$$N_{xy}(\sigma) = \frac{J_{xy}(\sigma)}{J_{\text{Tot}}(\sigma)}.$$
(8)
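Eqs. (7) and (8) are straightforward to evaluate from the closed-process stationary distribution. In the sketch below (Fig. 2(a) topology, arbitrary illustrative rates), flow conservation provides a consistency check: the net flow into state 4 must equal the probability of absorbing at B, since 4 → B is the only absorbing exit from state 4.

```python
import numpy as np

# Net number of edge traversals before absorption (Eqs. (7) and (8)).
transient = [1, 2, 3, 4]
absorbing = ["A", "B"]
K = {(1, 2): 1.0, (1, 3): 2.0,
     (2, 1): 0.5, (2, 3): 1.5, (2, 4): 1.0,
     (3, 1): 0.7, (3, 2): 0.3, (3, 4): 2.0,
     (4, 2): 0.4, (4, 3): 0.6,
     (3, "A"): 1.2, (4, "B"): 0.8}
idx = {s: i for i, s in enumerate(transient)}

def closed_steady_state(K, sigma):
    """Stationary distribution of the closed process started at sigma."""
    n = len(transient)
    Q = np.zeros((n, n))
    for (x, y), r in K.items():
        tgt = sigma if y in absorbing else y   # Eq. (1): redirect to sigma
        Q[idx[x], idx[tgt]] += r
        Q[idx[x], idx[x]] -= r
    A = np.vstack([Q.T, np.ones(n)])           # pi Q = 0 with sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

pi1 = closed_steady_state(K, sigma=1)
J_tot = sum(pi1[idx[x]] * r for (x, y), r in K.items() if y in absorbing)

def net_traversals(pi, x, y):
    """Eqs. (7) and (8): net crossings of edge x -> y before absorption."""
    J_xy = pi[idx[x]] * K.get((x, y), 0.0) - pi[idx[y]] * K.get((y, x), 0.0)
    return J_xy / J_tot

N24 = net_traversals(pi1, 2, 4)
N34 = net_traversals(pi1, 3, 4)
```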

The current, given in Eq. (7), is intimately linked to the notion of cycles, as pointed out by Wachtel et al.57 and detailed in  Appendix C. Thus, we also wish to find the expected number of times a non-recurrent cycle is traversed. We define a non-recurrent cycle of a Markov chain to be a cycle of states in which each state, aside from the originating state, appears no more than once. For example, the cycle A → B → C → D → A is non-recurrent, but A → B → C → D → B → A is recurrent. Note that the originating state is arbitrary, so A → B → C → D → A is equivalent to B → C → D → A → B. For a stationary process, the expected frequency with which a cycle is completed can be calculated from the one-way cycle current,54,58 which is the probability current going around the cycle. For a chosen non-recurrent cycle, the one-way cycle current can be calculated diagrammatically from three terms. First, a cycle term, given by the product of rates around the cycle in the chosen direction. Second, a spanning tree term, found by collapsing the nodes in the cycle into a single node [in Fig. 2(c), the cycle 1 → 2 → 3 → 1 has been collapsed in this way] and summing the spanning trees of this new graph rooted at the collapsed cycle node. Finally, there is a normalization factor, which is the same normalization factor, N, as for the current. Explicitly, consider an absorbing Markov chain G = (𝒳 ∪ 𝒜, K) and its closed process starting at σ ∈ 𝒳, G_σ = (𝒳, K_σ). Let C denote both the set of edges and the set of nodes of a cycle in the closed process. To calculate the spanning tree term for the one-way cycle current, construct a new Markov chain, the cycle process G_{σ,C} = ({c} ∪ (𝒳∖C), K_{σ,C}), where {c} ∪ (𝒳∖C) is the set of transient states of the original process with the states in the cycle replaced by the single node c, and K_{σ,C} is given by

$$K_{\sigma,C}(x, c) = \sum_{i \in C} K_\sigma(x, i), \quad K_{\sigma,C}(c, x) = \sum_{i \in C} K_\sigma(i, x), \quad K_{\sigma,C}(c, c) = 0$$
(9)

for x ∈ 𝒳∖C, and agreeing with K_σ elsewhere. The cycle process for the cycle C = 1 → 2 → 3 → 1 (or C′ = 1 → 3 → 2 → 1) of the example system in Fig. 2(a) is shown in Fig. 2(c). Let 𝒯_σ(x) and 𝒯_C(x) be the sets of spanning trees rooted at x of the closed process, G_σ, and the cycle process, G_{σ,C}, respectively. Then, the cycle current is given by58

$$J_{\text{Cyc}}(\sigma, C) = \frac{\overbrace{\prod_{e \in C} K(e)}^{\text{cycle}}\ \overbrace{\sum_{T \in \mathcal{T}_C(c)} \prod_{e \in T} K_{\sigma,C}(e)}^{\text{spanning trees}}}{\underbrace{\sum_{x \in \mathcal{X}} \sum_{T \in \mathcal{T}_\sigma(x)} \prod_{e \in T} K_\sigma(e)}_{\text{normalization}}}.$$
(10)

Note that, for the cycle term, the edges are taken from the original process rather than the closed process. Given the cycle current for the closed process, the expected number of circulations of the cycle before absorption, NCyc(σ, C), is the ratio of the cycle current to the total current to absorbing states

$$N_{\text{Cyc}}(\sigma, C) = \frac{J_{\text{Cyc}}(\sigma, C)}{J_{\text{Tot}}(\sigma)}.$$
(11)

For our example process, the expected number of circulations of C = 1 → 2 → 3 → 1 is

$$N_{\text{Cyc}}(1, C) = \frac{(r_{12} r_{23} r_{31})(r_{42} + r_{43} + k_B)}{N J_{\text{Tot}}(1)},$$
(12)

with the same implicit cancelation of the normalization as before, since J_Tot ∝ 1/N.

For an absorbing process starting at a given state, we may divide the cycles into internal and external cycles. External cycles are those which appear in the closed process and involve edges which were absorbing edges in the original process. The set of all cycles, sorted into internal and external, for the example process of Fig. 2 is shown in  Appendix D. The external cycles correspond to the pathways from the starting state to an absorbing state. Therefore, the expected number of times an external cycle is traversed before absorption will be at most one and corresponds to the probability of following a given path to absorption. Furthermore, the sum of Eq. (11) over all external cycles will be one.
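The cycle-current construction of Eqs. (9)-(11) can likewise be evaluated with the matrix-tree theorem. The sketch below does this for the cycle C = 1 → 2 → 3 → 1 of the Fig. 2 example with arbitrary illustrative rates; the test checks the general machinery against the explicit form printed in Eq. (12).

```python
import numpy as np

# One-way cycle current (Eq. (10)) for the cycle C = 1 -> 2 -> 3 -> 1,
# with tree sums evaluated via Laplacian minors (matrix-tree theorem).
r = {"12": 1.0, "13": 2.0, "21": 0.5, "23": 1.5, "24": 1.0,
     "31": 0.7, "32": 0.3, "34": 2.0, "42": 0.4, "43": 0.6}
kA, kB = 1.2, 0.8

# Closed process started at state 1 (Eq. (1)): 3 -> A and 4 -> B redirect to 1.
K1 = {(1, 2): r["12"], (1, 3): r["13"], (2, 1): r["21"], (2, 3): r["23"],
      (2, 4): r["24"], (3, 1): r["31"] + kA, (3, 2): r["32"],
      (3, 4): r["34"], (4, 1): kB, (4, 2): r["42"], (4, 3): r["43"]}

def tree_sums(K, states):
    """Rooted spanning-tree sums via Laplacian minors (matrix-tree)."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    L = np.zeros((n, n))
    for (x, y), w in K.items():
        L[idx[x], idx[x]] += w
        L[idx[x], idx[y]] -= w
    sums = {}
    for v in states:
        keep = [i for i in range(n) if i != idx[v]]
        sums[v] = np.linalg.det(L[np.ix_(keep, keep)])
    return sums

norm = sum(tree_sums(K1, [1, 2, 3, 4]).values())   # MCTT normalization N

# Collapse the cycle C = {1, 2, 3} into a single node c (Eq. (9)).
C = {1, 2, 3}
Kc = {}
for (x, y), w in K1.items():
    xc = "c" if x in C else x
    yc = "c" if y in C else y
    if xc == yc:
        continue          # K(c, c) = 0: internal cycle edges are dropped
    Kc[(xc, yc)] = Kc.get((xc, yc), 0.0) + w

tree_term = tree_sums(Kc, ["c", 4])["c"]   # trees rooted at the collapsed node
cycle_term = r["12"] * r["23"] * r["31"]   # rates from the *original* process
J_cyc = cycle_term * tree_term / norm      # Eq. (10)
```

For this small example, the tree term reduces to r42 + r43 + kB, reproducing the structure of Eq. (12).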

1. Philosophy of coarse-graining complex underlying copolymerization reaction networks

Armed with the techniques for solving absorbing Markov chains, here, we set out the method for the analysis of copolymerization processes. Gaspard and Andrieux52 presented a method to analyze Markov polymerization processes in which each monomer is added in a single step [i.e., if the internal reaction network shown in Fig. 1(d) were trivial], assuming long polymers. We shall present a method for mapping more complex models for the individual polymerization step onto coarse-grained descriptions that can be analyzed using this framework and, subsequently, show how to get the behavior of the full model from the results.

Consider a growing copolymer with M monomer types, which are assumed to be present in the environment at fixed concentrations. At a coarse-grained level, we can define a state space of finite-length sequences {x_1x_2⋯x_l | x_i ∈ {1, 2, …, M}, l ≥ 0}, where l is the length of the sequence. Let us refer to the coarse-grained states in this state space as completed states. On this coarse-grained level, a sequence of length l may increase in length by one unit by polymerizing one of M units at the growing tip (x_1x_2⋯x_l → x_1x_2⋯x_l x_{l+1}), or it may decrease in length by one unit (x_1x_2⋯x_l → x_1x_2⋯x_{l−1}). Such a coarse-grained model is depicted in Figs. 1(a)–1(c) for free polymerization, templated self-assembly, and templated polymerization with simultaneous separation.
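This coarse-grained state space can be made concrete with a minimal jump-chain sampler over completed states, in which addition and removal propensities depend only on the monomers at the tip. All numerical propensities below are arbitrary illustrative choices, and the sampler is for illustration only: the method developed in this paper extracts the same statistics without simulation.

```python
import random

# Jump-chain sampler of the coarse-grained state space {x1...xl}: a
# sequence grows by one of M monomer types or loses its final monomer.
# Propensities are arbitrary illustrative values.
random.seed(0)

M = 2
w_plus = [[1.0, 0.4], [0.5, 1.2]]   # w_plus[y][x]: add y when the tip is x
w_minus = [[0.2, 0.2], [0.3, 0.1]]  # w_minus[y][x]: remove tip y preceded by x

def step(seq):
    """One jump of the embedded chain: add a monomer or remove the tip."""
    x = seq[-1]
    moves = [(seq + [y], w_plus[y][x]) for y in range(M)]
    if len(seq) > 1:                # never remove the very first monomer
        moves.append((seq[:-1], w_minus[x][seq[-2]]))
    total = sum(w for _, w in moves)
    u, acc = random.random() * total, 0.0
    for new_seq, w in moves:
        acc += w
        if u < acc:
            return new_seq
    return moves[-1][0]

seq = [0]
for _ in range(2000):
    seq = step(seq)
```

With these values, addition dominates removal, so the sequence grows steadily, mirroring the growth regime considered in the text.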

In general, copolymerization processes may be best described by models in which the underlying copolymerization reaction networks are complex, featuring multiple sub-steps in arbitrarily complex networks connecting the completed states, as suggested in Fig. 1(d). Hence, overall, we could consider a copolymerization process as having a tree-like structure with networks of reactions connecting completed states, as in Fig. 4. Such a class of models is wide-reaching, with many examples from the literature included in this class.17–19,22,23,27,28,30–35,39–41,45–51

FIG. 4.

(a) The step-wise process for an arbitrary model with three monomer types. This step-wise process is for a copolymer &xy. The edges colored red are the completion edges. The flower-like structure of the step-wise process can be seen, with four petals, each connected at the starting state &xy. (b) One of the petals of the step-wise process, which we use to define Λ_±(z, y). Λ_+(z, y) is defined as the sum of spanning trees rooted at the rightmost state &xyz, and Λ_−(z, y) is defined as the sum of the trees rooted at the leftmost state &xy. (c) One of the petals [connecting &xy to &xyz, as in (b)] that has been linked back to the starting state. This graph is used to define Q(z, y) as the sum of spanning trees rooted at the leftmost state &xy.


We will define a Markov process at the level of the coarse-grained completed states that, by construction, preserves the probabilities of the transitions between the completed states of the fine-grained process and, therefore, preserves the statistics of the sequences produced. The coarse-grained Markov process does not preserve the distribution of the transition times between the completed states implied by the fine-grained model, which will, in general, be non-Markovian. Moreover, it does not provide fine-grained information on trajectories between the coarse-grained completed states. However, the temporal details and information about the fine-grained dynamics can be added back in at a later stage, once the statistics have been analyzed at the coarse-grained level.

2. Identifying propensities in the coarse-grained model

We find the transition rates of the coarse-grained model (hereafter labeled as propensities to avoid confusion with the underlying rates of the fine-grained process) by considering first passage problems between completed states. From a given completed state, there are M + 1 completed states that may be reached, corresponding to the M possible additions of a monomer and the removal of the monomer currently at the tip of the copolymer. For a first passage problem, we can convert each of these reachable completed states into an absorbing state by removing the transitions out of said states, as in Fig. 4(a), in the same vein as Cady and Qian.59 Let us refer to this absorbing Markov process as the step-wise process and define step to mean the addition/removal of a monomer.

We shall work with the assumption that the transition rates depend on, at most, the two monomers at the growing tip of the copolymer, following previous work.1,17–19,21–23,36–38,52,60 There will, therefore, be M² flavors of this process, corresponding to the possible combinations of the two terminal monomers of the copolymer in the central state &xy (here, & represents an arbitrary sequence). We wish to find the absorbing probabilities P[&xy → &xyz], z ∈ {1, …, M}, and P[&xy → &x], given an initial condition of the central state &xy. As outlined in Sec. II A 3 [Eq. (5)], we can find these probabilities by constructing the closed process and finding sums over spanning trees rooted at different states. The step-wise process has M + 1 petal-like graphs, each connected to the central state but disconnected from the others. Due to this structure, any sum over spanning trees of the full process factorizes into a product of sums over spanning trees of the petals. Thus, we find that the absorbing probabilities take the following form:

$$P[\&xy \to \&xyz] = \frac{1}{N}\,\Lambda_+(z, y) \prod_{z' \neq z} Q(z', y)\, Q(y, x),$$
$$P[\&xy \to \&x] = \frac{1}{N}\,\Lambda_-(y, x) \prod_{z} Q(z, y).$$
(13)

Here, z ∈ {1, …, M} and N is the normalization factor from Eq. (2); Λ_+(z, y) is the sum over spanning trees of the petal connecting completed states &xy and &xyz, rooted at the forward completed state &xyz; Λ_−(y, x) is the sum over spanning trees of the petal connecting completed states &x and &xy, rooted at the backward completed state &x, Fig. 4(b); and Q(y, x) is the sum over spanning trees of the petal connecting states &x and &xy, linked back to the central state and rooted at the central state, i.e., with edges redirected to the starting state as in the closed process, as in Fig. 4(c). Since Q is a sum over spanning trees rooted at the node to which edges have been redirected, the sum takes the same form for both the forward and backward petals, depending only on which two completed states it connects.

From these probabilities, we see that choosing propensities $\omega_{\pm yx}$ for the transitions $\&x \xrightarrow{\omega_{+yx}} \&xy$ and $\&xy \xrightarrow{\omega_{-yx}} \&x$ such that

$$\omega_{\pm yx} = \frac{\Lambda_\pm(y,x)}{Q(y,x)}$$
(14)

not only preserves the ratios of probabilities of transitions to completed states, but also ensures that $\omega_{\pm yx}$ depends only on monomers x and y.
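The absorption probabilities underlying this construction can be computed numerically for any given petal network. The following sketch uses a toy step-wise process with hypothetical rates (k01, k10, kf, kb are illustrative, not from the paper): one intermediate state on the forward petal and a direct backward transition. Any propensities assigned per Eq. (14) must reproduce the resulting ratio of forward to backward absorption.

```python
import numpy as np

# Toy "petal" step-wise process with hypothetical rates (not from the paper):
# transient states: 0 = central state (&xy), 1 = intermediate on the forward
# petal; absorbing states: F (= &xyz, monomer added), B (= &x, monomer removed).
k01, k10, kf, kb = 2.0, 1.0, 3.0, 0.5

# Generator restricted to the transient states
Gt = np.array([[-(k01 + kb), k01],
               [k10, -(k10 + kf)]])
# Rates from the transient states into the absorbing states, columns [F, B]
R = np.array([[0.0, kb],
              [kf, 0.0]])

# Absorption probabilities: P_abs = (-Gt)^{-1} R (fundamental-matrix identity)
Pabs = np.linalg.solve(-Gt, R)
p_forward, p_backward = Pabs[0]          # starting from the central state
# Mean time to absorption from each transient state: (-Gt)^{-1} 1
t_abs = np.linalg.solve(-Gt, np.ones(2))

assert abs(p_forward + p_backward - 1.0) < 1e-12
```

For these illustrative rates, p_forward evaluates to 0.75; the first-passage time t_abs[0] is the quantity used later when extracting dynamic properties.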

We note here that this coarse graining process is different from lumping,55,61 in which the state space is reduced while attempting to retain trajectory dynamics. In our approach, the coarse-grained process does not reproduce the dynamics of the fine-grained process, only the statistics of the completed states that are visited. However, dynamic quantities may be extracted exactly from the step-wise process, as we show in Sec. II B 4.

3. Solving the coarse-grained model

We now use the methods developed by Gaspard and Andrieux52 to solve the coarse-grained Markov model over the completed states with propensities ω±yx. Gaspard and Andrieux’s approach considers a frame of reference that is comoving with the tip of the growing polymer and assumes that the state of the tip and nearby monomers reaches a stationary distribution to derive quantities at this steady state, such as the set of tip incorporation velocities vx (the rates of adding monomers to a copolymer &x), the tip probabilities μ(x) (the probability at a given time that the growing polymer is in state &x), and the pair tip probabilities μ(x, y) (the probability of being in state &xy). However, we note that the time-dependent information is not physical at this stage due to the coarse-graining process. The above quantities are found from solving the following equations:52 

$$v_x = \sum_{y=1}^{M} \omega_{+yx}\,\frac{v_y}{\omega_{-yx} + v_y},$$
(15)
$$\mu(x) = \sum_{y=1}^{M} \frac{\omega_{+xy}}{\omega_{-xy} + v_x}\,\mu(y),$$
(16)
$$\mu(x,y) = \frac{\omega_{+yx}}{\omega_{-yx} + v_y}\,\mu(x).$$
(17)

Using μ and v, we can calculate the statistics of the copolymer sequence far behind the growing tip.52 We note that the distribution of monomers at the tip μ(x) is different from the distribution of monomers at sites behind the tip; we assume that this distribution reaches some limit far behind the growing tip in the bulk of the copolymer. This limiting distribution describes the probability that a monomer in the bulk of the copolymer takes a value x. Using ɛ(x) to denote the frequency of monomer x in the bulk of the copolymer,52 

$$\varepsilon(x) = \frac{\mu(x)\,v_x}{\sum_y \mu(y)\,v_y}.$$
(18)

We may similarly define ɛ(y|x) as the probability that in the bulk of the copolymer, a monomer y is observed given a monomer x behind it. ɛ(x) and ɛ(y|x) fully characterize the statistics of the bulk copolymer since under our assumptions, transitions only depend on the two monomers at the tip; the completed copolymer sequence is itself a Markov chain.23 
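The self-consistent equations (15)–(18) can be solved numerically. The sketch below uses hypothetical propensities (the ω values are illustrative, not from any model in the paper): Eq. (15) is iterated to a fixed point for the tip velocities, μ(x) is taken as the dominant eigenvector of the matrix implied by Eq. (16), and the bulk frequencies follow from Eq. (18).

```python
import numpy as np

# Hypothetical propensities: w_plus[y, x] = omega_{+yx}, the propensity to
# add monomer y onto a copolymer ending in monomer x.
M = 2
w_plus = np.array([[1.5, 0.8],
                   [0.6, 1.2]])
w_minus = np.full((M, M), 0.2)           # weak removal => growth regime

# Eq. (15): fixed-point iteration for the tip velocities v_x
v = np.ones(M)
for _ in range(5000):
    v = np.array([sum(w_plus[y, x] * v[y] / (w_minus[y, x] + v[y])
                      for y in range(M)) for x in range(M)])

# Eq. (16): mu taken as the dominant eigenvector of A_{xy}
A = np.array([[w_plus[x, y] / (w_minus[x, y] + v[x]) for y in range(M)]
              for x in range(M)])
evals, evecs = np.linalg.eig(A)
mu = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
mu /= mu.sum()

# Eq. (18): bulk monomer frequencies
eps = mu * v / np.dot(mu, v)
assert v.min() > 0 and abs(eps.sum() - 1.0) < 1e-10
```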

4. Extracting properties of the fine-grained model from the solution of the coarse-grained model

The easiest quantities to extract are the frequencies of monomers in the bulk of the copolymer. These quantities are identical in the coarse-grained and fine-grained models, since the coarse-graining preserves the statistical distribution of the sequences produced. Therefore ɛ(x), as defined in Eq. (18), and ɛ(y|x) apply directly to the fine-grained process.

The tip probabilities μ above give the fraction of the time spent in each tip state in the coarse-grained model. However, the coarse-grained model will not reproduce the time series of the fine-grained model, only the sequence of completed states visited. We therefore divide out the lifetime τ(x, y) of the tip state (x, y) to obtain the frequency with which the tip states are visited in the coarse-grained model,

$$\xi(x,y) = \frac{\mu(x,y)/\tau(x,y)}{\sum_{x',y'=1}^{M} \mu(x',y')/\tau(x',y')},$$
(19)
$$\tau(x,y) = \frac{1}{\omega_{-yx} + \sum_{z=1}^{M} \omega_{+zy}}.$$
(20)

This frequency defines a new tip distribution ξ: ξ(x, y) is the frequency with which a given pair of monomers x, y is observed at the tip of the growing copolymer in the sequence of transitions. This distribution applies to both the coarse-grained model and the sequence of completed states visited in the full fine-grained model. It can, therefore, be used to find averages of the key dynamic properties.

For example, we can calculate the probability P that a growing copolymer increases in length at each step of the step-wise process. P is calculated by averaging the probability of adding a monomer over the possible states &xy:

$$P = \sum_{x,y=1}^{M} \xi(x,y)\,\frac{\sum_{z=1}^{M} \omega_{+zy}}{\omega_{-yx} + \sum_{z=1}^{M} \omega_{+zy}}.$$
(21)

Upon averaging out the sequence information, we may treat the growth of a polymer as a random walk with probability P of stepping forward and (1 − P) of stepping back. We can find the expected number of monomer inclusion/removal steps per net forward step as 1/(2P − 1) (for proof, see Appendix E). A number of quantities scale with the total number of steps rather than the net number of steps, making the number of steps per net forward step a necessary quantity. For example, in order to find the expected time taken per net forward step, one can find the expected time to absorption of the step-wise process, T(x, y), for a copolymer in state &xy by calculating $1/J_{\mathrm{Tot}}(\&xy)$ for the step-wise process using Eq. (3). Then, the expected time per net forward step is

$$\tau_{\mathrm{step}} = \frac{1}{2P-1}\sum_{x,y=1}^{M} \xi(x,y)\,T(x,y).$$
(22)

$1/\tau_{\mathrm{step}}$ is, therefore, the physical average growth rate of the copolymer in the fine-grained model.
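Given a solved coarse-grained model, the dynamic averages above follow mechanically. The sketch below, again with hypothetical propensities, assembles the visit distribution ξ from Eqs. (17), (19), and (20) and the forward-step probability P from Eq. (21).

```python
import numpy as np

# Hypothetical propensities: w_plus[y, x] = omega_{+yx}
M = 2
w_plus = np.array([[1.5, 0.8],
                   [0.6, 1.2]])
w_minus = np.full((M, M), 0.2)

# Solve Eq. (15) for v (fixed point) and Eq. (16) for mu (dominant eigenvector)
v = np.ones(M)
for _ in range(5000):
    v = np.array([sum(w_plus[y, x] * v[y] / (w_minus[y, x] + v[y])
                      for y in range(M)) for x in range(M)])
A = np.array([[w_plus[x, y] / (w_minus[x, y] + v[x]) for y in range(M)]
              for x in range(M)])
evals, evecs = np.linalg.eig(A)
mu = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
mu /= mu.sum()

# Eq. (17): pair tip probabilities; Eq. (20): lifetimes; Eq. (19): xi
mu_pair = np.array([[w_plus[y, x] * mu[x] / (w_minus[y, x] + v[y])
                     for y in range(M)] for x in range(M)])   # index [x, y]
S = w_plus.sum(axis=0)                     # S[y] = sum_z omega_{+zy}
tau = 1.0 / (w_minus.T + S[None, :])       # tau[x, y]
xi = (mu_pair / tau) / (mu_pair / tau).sum()

# Eq. (21): probability that a step increases the length
P = sum(xi[x, y] * S[y] / (w_minus[y, x] + S[y])
        for x in range(M) for y in range(M))
steps_per_net_forward = 1.0 / (2.0 * P - 1.0)
assert 0.5 < P <= 1.0 and steps_per_net_forward >= 1.0
```

With weak removal propensities, this toy model sits well above its stall point, so P is close to 1 and only slightly more than one step is taken per net forward step.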

We may also calculate the chemical work done by the system in producing the copolymer. In a purely chemical system, with no time-varying externally applied protocols, the entropy increase of the universe is given by the decrease in the generalized free energy of the chemical system, including any coupled reservoirs of fuel molecules.20 Since the total free energy must decrease, any increase in one contribution must be paid for by a decrease of at least the same magnitude in another contribution. It is common to describe the latter subsystem as doing work on the former.

For the polymerization systems analyzed here, the generalized free energy can be split into a term corresponding to the chemical free energy of the system averaged over the uncertain state of the system and a term related to the entropy arising due to the uncertainty of the state occupied.62 

$$G = \sum_{a} p(a)\,G_{\mathrm{chem}}(a) + \sum_{a} p(a)\ln p(a),$$
(23)

where we use natural units such that kBT = 1. Here, a is a chemical state of the system as a whole, Gchem(a) is the chemical free energy of state a, and p(a) is the probability that the system occupies the state a. Gchem(a) = −ln Za, where Za is the partition function of the system (explicitly including any large chemical buffers) restricted to the chemical state a. Za represents the contribution of concentrations and bond strength to the favorability of a molecular state. The principle of detailed balance20 states that the chemical free energy change associated with a transition from a to b is given by

$$G_{\mathrm{chem}}(b) - G_{\mathrm{chem}}(a) = -\ln\frac{K(a,b)}{K(b,a)}.$$
(24)

The second term in Eq. (23) is information theoretic in character; it is equal to the negative of the Shannon entropy associated with the distribution over chemical states. For the systems studied here, in which we consider infinitely long copolymers that have reached steady state growth, the only relevant contribution to this term is the increase in Shannon entropy of the copolymer sequence produced as the polymer gets longer. Since the copolymer sequence is itself a discrete time Markov chain,23 the additional entropy per net forward step (the entropy rate) can be readily calculated,63

$$H = -\sum_{x,y=1}^{M} \varepsilon(x)\,\varepsilon(y|x)\ln\varepsilon(y|x),$$
(25)

with x, y representing the monomer types. Since the purpose of a copolymerization system is often to produce a low entropy (or “accurate”) sequence, it is reasonable to think of the chemical free-energy decrease per net forward step as the chemical work done to reduce the information entropy of Eq. (25) below that of a uniform, random polymer. Extending the definition provided by Poulton et al.,23 we may define the efficiency of copolymerization as

$$\eta = \frac{\ln M - H}{\ln M + W_{\mathrm{chem}}} \leq 1,$$
(26)

where ln M is the entropy per monomer (or entropy rate) of a uniform, random copolymer with M monomer types and Wchem is the average decrease in chemical free energy per net forward step. This efficiency is, then, the ratio between the entropy drop due to the accuracy of the copolymer relative to a random one (ln M − H) and the chemical work used to drive the system (Wchem) above that required to make a random copolymer in equilibrium, which is −ln M.64
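As an illustration of Eqs. (25) and (26), the sketch below computes the entropy rate and efficiency for an M = 2 copolymer with an assumed bulk conditional distribution ε(y|x); the numbers, including Wchem, are hypothetical, and ε(x) is chosen to be the stationary distribution of ε(y|x), as required for a bulk sequence.

```python
import numpy as np

# Hypothetical M = 2 bulk statistics: eps_cond[x, y] = eps(y|x) is the
# probability of monomer y given monomer x behind it; eps(x) is the
# stationary distribution of this transition matrix.
M = 2
eps = np.array([0.9, 0.1])
eps_cond = np.array([[0.95, 0.05],
                     [0.45, 0.55]])
assert np.allclose(eps @ eps_cond, eps)        # stationarity check

# Entropy rate of the sequence, Eq. (25)
H = -np.sum(eps[:, None] * eps_cond * np.log(eps_cond))

# Efficiency, Eq. (26), with an assumed chemical work per net forward step
Wchem = 1.5
eta = (np.log(M) - H) / (np.log(M) + Wchem)
assert 0.0 < H < np.log(M) and eta < 1.0
```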

The expected work done during a transition adding or removing a monomer given starting in completed state &xy can be calculated by summing the contribution from Eq. (24) multiplied by the expected net current along the edge ab prior to absorption over all the edges in the step-wise process,

$$w_{\mathrm{chem}}(x,y) = -\Delta G_{\mathrm{chem}}(x,y) = \sum_{b>a} \ln\!\left[\frac{K(a,b)}{K(b,a)}\right] N_{ab}(\&xy),$$
(27)

where $N_{ab}(\&xy)$ is the expected net number of times edge ab is traversed before absorption, given that the process started in the central state &xy, as in Eq. (8). This sum will also require contributions from edges that lead to absorbing states. For such edges, the rate for the reverse transition in the logarithm of Eq. (27) is the rate from the full process.

Equivalently, however, as outlined in  Appendix C, we may find this chemical work by considering the non-recurrent cycles of the process.57 For a given internal cycle, C, we may define the affinity20 

$$A(C) = \ln\frac{\prod_{e \in C} K(e)}{\prod_{e \in C'} K(e)},$$
(28)

where the products are over the edges e composing the cycle C and the reversed cycle C′, in which the edges are traversed in the opposite direction. For external cycles, we may define the affinity in the same way, inferring the rate for the reversed edge of the transition to absorbing states from the full process. The expected work done before absorption, having started in the state σ = &xy, is

$$w_{\mathrm{chem}}(x,y) = -\Delta G_{\mathrm{chem}}(x,y) = \sum_{C} A(C)\,\frac{J_{\mathrm{Cyc}}(\sigma,C) - J_{\mathrm{Cyc}}(\sigma,C')}{J_{\mathrm{Tot}}(\&xy)}.$$
(29)

Averaging wchem(x, y) with ξ and multiplying by the expected number of steps per net forward step gives the expected chemical work done per net forward step,

$$W_{\mathrm{chem}} = \frac{1}{2P-1}\sum_{x,y=1}^{M} \xi(x,y)\,w_{\mathrm{chem}}(x,y).$$
(30)

Furthermore, the forms of Eqs. (22) and (30) may be applied to an arbitrary quantity for which one can find the expected value in the step-wise process starting in state &xy. Let this arbitrary quantity be A(x, y). One can then average this quantity using the distribution ξ to obtain the expected value of the quantity per step. Then, if appropriate, multiplying by 1/(2P − 1) gives the expected value of the quantity per net forward step. In practice, as shall be seen in Sec. III C, since the quantities we wish to calculate may be written in terms of sums over spanning trees, the quantities for the step-wise process may be written as a sum over the terms per petal, with the quantity for a given petal factorizing into some quantity that depends on the petal multiplied by Q’s for the other petals.

5. Stalled growth

Explicit simulation of copolymer growth is particularly challenging in regimes where P ≳ 0.5, since many backward and forward steps are taken per net forward step. At P = 0.5, the process will not reliably produce copolymers; for P < 0.5, polymers will tend to shrink. In general, for P = 0.5, we say the model has stalled. Our approach is particularly beneficial in this case; indeed, it is possible to check whether a model is at the stall point by considering the M × M dimensional matrix of the ratios of forward to backward propensities,52 $Z_{yx} = \omega_{+yx}/\omega_{-yx} = \Lambda_+(y,x)/\Lambda_-(y,x)$. The model is at the stall point if and only if

$$\det\left(Z - \mathbb{1}_M\right) = 0,$$
(31)

where $\mathbb{1}_M$ is the M × M identity matrix. The polymer will shrink if $\det(Z - \mathbb{1}_M) < 0$. Since Z gives the ratios of adding a monomer to removing one, this condition essentially says that a model will stall if the total rate of adding a monomer is equal to the total rate of removing one.

In a typical model, there exists at least one parameter that controls the driving. Often this parameter is related to the backbone strength of the polymer produced: e.g., the free energy drop associated with the formation of a generic backbone bond ΔGpol. This parameter will be present in the rates of each external cycle so that by tuning it, the model can be moved all the way from stalling to irreversible growth, whereby monomers cannot be removed once polymerized. If such a parameter exists, we may rephrase the stall condition, given in Eq. (31), in terms of this parameter. For example, for the case of the parameter being ΔGpol, we may find some threshold value Γ such that the model will stall for ΔGpol = Γ.
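For a concrete check of the stall criterion, the sketch below takes the simple factorized form $Z_{yx} = e^{\Delta G_y + \Delta G_{\mathrm{pol}}}$ with hypothetical free energies and locates the root of $\det(Z - \mathbb{1}_M)$ in ΔGpol by bisection; the result matches the closed form $\Gamma = -\ln\sum_y e^{\Delta G_y}$ [cf. Eq. (40)].

```python
import numpy as np

# Stall condition, Eq. (31), for a toy factorized model with hypothetical
# binding free energies: Z_yx = exp(dG[y] + dGpol), independent of x.
dG = np.array([1.0, -0.5, 0.3])
M = len(dG)

def det_stall(dGpol):
    Z = np.exp(dG + dGpol)[:, None] * np.ones((1, M))   # Z[y, x]
    return np.linalg.det(Z - np.eye(M))

# det(Z - 1_M) is monotonic in dGpol for this rank-one Z; bisect for the root
lo, hi = -5.0, 2.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if det_stall(mid) < 0.0:
        lo = mid
    else:
        hi = mid
Gamma = 0.5 * (lo + hi)

# Agrees with the closed form Gamma = -ln(sum_y exp(dG_y))
assert abs(Gamma + np.log(np.exp(dG).sum())) < 1e-8
```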

6. Limiting behavior

We shall note two limits for which we may give analytic expressions for the frequency of monomer types in the copolymer bulk for all models. First, consider the case in which the system is at the stall point [Eq. (31)]. In general, entropy production can still occur within cycles in the step-wise process; therefore, these frequencies cannot be determined from equilibrium arguments and are non-trivial. Nonetheless, at the stall point, we may express the monomer frequencies in the bulk relatively simply. The frequency ɛstall(x) of monomer x is proportional, up to normalization, to the cofactor of the diagonal element corresponding to monomer x of the matrix $\mathbb{1}_M - Z$, as proven in Appendix F. For example, for M = 2,

$$\varepsilon_{\mathrm{stall}}(1) \propto 1 - \frac{\omega_{+22}}{\omega_{-22}}, \qquad \varepsilon_{\mathrm{stall}}(2) \propto 1 - \frac{\omega_{+11}}{\omega_{-11}},$$
(32)

and for M = 3, we have

$$\begin{aligned}
\varepsilon_{\mathrm{stall}}(1) &\propto \left(1 - \frac{\omega_{+22}}{\omega_{-22}}\right)\left(1 - \frac{\omega_{+33}}{\omega_{-33}}\right) - \frac{\omega_{+23}}{\omega_{-23}}\,\frac{\omega_{+32}}{\omega_{-32}},\\
\varepsilon_{\mathrm{stall}}(2) &\propto \left(1 - \frac{\omega_{+11}}{\omega_{-11}}\right)\left(1 - \frac{\omega_{+33}}{\omega_{-33}}\right) - \frac{\omega_{+13}}{\omega_{-13}}\,\frac{\omega_{+31}}{\omega_{-31}},\\
\varepsilon_{\mathrm{stall}}(3) &\propto \left(1 - \frac{\omega_{+11}}{\omega_{-11}}\right)\left(1 - \frac{\omega_{+22}}{\omega_{-22}}\right) - \frac{\omega_{+12}}{\omega_{-12}}\,\frac{\omega_{+21}}{\omega_{-21}}.
\end{aligned}$$
(33)

On the other end of the spectrum, we can also solve for the monomer bulk frequencies in the irreversible limit, where $\omega_{-yx} = 0$ for all x, y. Intuitively, we can consider the Markov process on the state space {1, …, M}, representing copolymers with a given monomer at the tip, with transition rates $K_{\mathrm{irrev}}(x \to y) = \omega_{+yx}$. The steady state of this process gives the fraction of time with a given monomer at the tip of the copolymer. Therefore, dividing by the time spent in each state will give the bulk frequencies. A convenient way to write out these frequencies in the style of the methods described thus far is as a sum over the spanning trees of the complete graph on M vertices with rate functions $K_{\mathrm{irrev}}(x \to y) = \omega_{+yx}$. Explicitly, we may write these frequencies (up to normalization) as

$$\varepsilon_{\mathrm{irrev}}(x) \propto \left[\sum_{T \in \mathcal{T}(x)} \prod_{e \in T} K_{\mathrm{irrev}}(e)\right] \sum_{y=1}^{M} \omega_{+yx},$$
(34)

where $\mathcal{T}(x)$ is the set of spanning trees of the complete graph on M vertices rooted at x. This expression is derived formally in Appendix G. For example, with M = 2,

$$\varepsilon_{\mathrm{irrev}}(1) = \frac{\omega_{+12}\left(\omega_{+11} + \omega_{+21}\right)}{\omega_{+12}\left(\omega_{+11} + \omega_{+21}\right) + \omega_{+21}\left(\omega_{+12} + \omega_{+22}\right)}, \qquad \varepsilon_{\mathrm{irrev}}(2) = \frac{\omega_{+21}\left(\omega_{+12} + \omega_{+22}\right)}{\omega_{+12}\left(\omega_{+11} + \omega_{+21}\right) + \omega_{+21}\left(\omega_{+12} + \omega_{+22}\right)}.$$
(35)
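As a numerical cross-check of Eq. (35), the sketch below (with hypothetical ω₊ values) builds the tip-monomer chain, extracts its steady state from the null space of the generator, weights by the total exit rates, and compares against the closed form.

```python
import numpy as np

# Irreversible limit, Eq. (34): tip-monomer Markov chain with rates
# K(x -> y) = omega_{+yx}; bulk frequency ∝ pi(x) * (total exit rate of x).
w_plus = np.array([[1.0, 0.4],
                   [0.7, 2.0]])          # w_plus[y, x] = omega_{+yx}, hypothetical
M = 2

K = w_plus.T                              # K[x, y] = rate x -> y
L = K - np.diag(K.sum(axis=1))            # generator; self-loops cancel here
evals, evecs = np.linalg.eig(L.T)
pi = np.abs(np.real(evecs[:, np.argmin(np.abs(evals))]))
pi /= pi.sum()                            # steady state of the tip chain

eps = pi * K.sum(axis=1)                  # weight by exit rate (incl. y = x)
eps /= eps.sum()

# Compare with the closed form of Eq. (35) for M = 2
e1 = w_plus[0, 1] * (w_plus[0, 0] + w_plus[1, 0])
e2 = w_plus[1, 0] * (w_plus[0, 1] + w_plus[1, 1])
assert np.allclose(eps, [e1 / (e1 + e2), e2 / (e1 + e2)])
```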

7. Simplification for factorizable propensities

The presented method applies to arbitrarily complex copolymerization models obeying the structure of Fig. 4. However, if we make some further common assumptions, much of the analysis simplifies. For example, consider the case in which the ratios of propensities may be factored,

$$\frac{\omega_{+yx}}{\omega_{-yx}} = \frac{\Lambda_+(y,x)}{\Lambda_-(y,x)} = Y(y)\,X(x),$$
(36)

where Y is a function of monomer y only and X is a function of monomer x only. Intuitively, such a condition holds in the cases where there are no direct, type-dependent interactions between monomers in the growing polymer, such as when monomers only interact with a template.1,23,24,27–38 Under such an assumption, multiple calculations simplify, see  Appendix H. For example, the stall condition becomes simply that the model will stall at

$$\sum_{x} X(x)\,Y(x) = 1.$$
(37)

The bulk frequencies at stall are then simply

$$\varepsilon_{\mathrm{stall}}(x) = X(x)\,Y(x).$$
(38)

We shall now consider some exemplar classes of models to: provide examples of how to utilize the methods; validate their accuracy; and to show the types of quantities and information that may be extracted.

A useful initial classification of these models is whether or not they are balanced. We shall refer to a model as balanced if its petals [see Fig. 4(b)] are detailed balanced. Such models are useful baseline checks, as their internal cycles all have zero affinity, meaning no chemical work is done internally and, hence, the only contributions to chemical work are from external cycles. Furthermore, these models exhibit a proper equilibrium at the stall point and allow for equilibrium arguments to validate the method at this point. It is worth noting that, although related to the notion of detailed balance, the full model with its infinite state space is not detailed balanced.

We shall start with the simplest case, where the propensities in the coarse-grained model only depend on the monomer type being added/removed: ω±yx = ω±y, such as in a simple model for templated self-assembly, shown in Fig. 1(b). Assume that there exists a backbone free energy ΔGpol controlling the driving as described in Sec. II B 5. Any spanning tree in Λ± must involve at least one incidence of ΔGpol, since it appears in every external cycle. Therefore, we can split the ratio of propensities as follows:

$$\frac{\omega_{+y}}{\omega_{-y}} = e^{\Delta G_y}\,e^{\Delta G_{\mathrm{pol}}},$$
(39)

where ΔGy encompasses the rest of the details about the models. We note, in general, that ΔGy may be a function of ΔGpol, however, in many cases, it is not. These cases include when there is only one completion reaction [highlighted in red in Fig. 4(a)] that contains the dependence on ΔGpol or if the model is balanced. We may then interpret −ΔGy as an effective binding free energy of monomer y. If we think of ΔGpol as the free energy drive of the model away from stalling, we look for a threshold value ΔGpol = Γ above which the model will not stall. Using Eq. (37), we see that

$$\Gamma = -\ln\sum_{y} e^{\Delta G_y} = -\ln Z,$$
(40)

where Z is the partition function for a system with one state for each monomer type, each state labeled by y and with free energy −ΔGy. Furthermore, using Eq. (38), the bulk frequencies at the stall point may be written as

$$\varepsilon_{\mathrm{stall}}(y) = \frac{e^{\Delta G_y}}{\sum_{x} e^{\Delta G_x}} = \frac{1}{Z}\,e^{\Delta G_y},$$
(41)

which is the probability of selecting a state y with free energy −ΔGy as predicted by equilibrium statistical mechanics. In these results, −ΔGy plays the role of an equilibrium free energy, and the results follow fairly directly when the stall point is an equilibrium. However, these results hold even if the process involves fuel-consuming cycles: entropy may still be produced at stall. In such cases, the effect of breaking equilibrium is to change the effective free energies of selecting a given monomer type.

Next, we shall consider a class of models where the ratio of propensities may be written as

$$\frac{\omega_{+yx}}{\omega_{-yx}} = e^{\Delta G_y}\,e^{-\Delta G_x}\,e^{\Delta G_{\mathrm{pol}}}.$$
(42)

As before, ΔGpol, coming from the polymerization reactions, represents the driving of this process. Such a class of models includes, most notably, balanced models of templated polymerization with autonomous separation.23 In these cases, the breaking of the previous copy-template bond every time a new bond is formed enforces the structure in Eq. (42). We shall assume, as in Ref. 23, that ΔGy is independent of ΔGpol.

Using Eqs. (37) and (38), we find the stall point to be ΔGpol = Γ = −ln M and the bulk frequencies at stall $\varepsilon_{\mathrm{stall}}(y) = 1/M$, where M is the number of monomer types. Physically, we can understand these results by considering balanced models of templated polymerization with autonomous separation. For such models, by definition, there is no entropy production in internal cycles and, therefore, the stall point must be thermodynamic equilibrium. In such models, the only driving comes from the polymerization ΔGpol and the entropic effect of having M monomers to choose from. These two effects balance at equilibrium.64

Next, let us consider the limit in which the completion reactions highlighted in red in Fig. 4(a) are much slower than the other reactions. Explicitly, let k be a rate constant on the same order of magnitude as the rates of the process other than the completion transitions indicated in red in Fig. 4(a). Write the completion rates as $k_{\mathrm{com}} R^+_{\mathrm{com}}(y,x)$, where $k_{\mathrm{com}} \ll k$ is a rate constant controlling the overall speed of the completion reactions and $R^+_{\mathrm{com}}(y,x)$ provides any sequence dependence. Similarly, the reverse transitions along the completion edges have the rate $k_{\mathrm{com}} R^-_{\mathrm{com}}(y,x)$. Furthermore, let there be $n_{\mathrm{com}}$ such completion reactions in a given petal of the step-wise process (we shall assume this number is the same for all pairs of monomers x, y).

Assume for simplicity that all completion reactions, Rcom±(y,x), take the same form in a given petal. Then, we can write the sum over spanning trees Q(y, x) as

$$Q(y,x) = \frac{1}{n_{\mathrm{com}}}\,\frac{\Lambda_-(y,x)}{k_{\mathrm{com}} R^-_{\mathrm{com}}(y,x)}\left[1 + O\!\left(\frac{k_{\mathrm{com}}}{k}\right)\right],$$
(43)

since $\Lambda_-(y,x)$ is first order in $k_{\mathrm{com}}/k$. This fact can be seen by noting that the leading order terms in Q(y, x) are the trees with no completion reactions, and the leading order terms in $\Lambda_-(y,x)$ are those same leading order trees of Q(y, x), except with one completion reaction added in. There are $n_{\mathrm{com}}$ such completion reactions, and each adds the same leading order term to $\Lambda_-(y,x)$. With Q(y, x) taking this form, and recalling Eq. (42), the propensities take the following form:

$$\omega_{+yx} = n_{\mathrm{com}} k_{\mathrm{com}} R^-_{\mathrm{com}}(y,x)\,e^{\Delta G_y - \Delta G_x + \Delta G_{\mathrm{pol}}} + O\!\left(\frac{k_{\mathrm{com}}^2}{k}\right), \qquad \omega_{-yx} = n_{\mathrm{com}} k_{\mathrm{com}} R^-_{\mathrm{com}}(y,x) + O\!\left(\frac{k_{\mathrm{com}}^2}{k}\right).$$
(44)

The $n_{\mathrm{com}} k_{\mathrm{com}}$ prefactor cancels in ratios of the $\omega_{\pm yx}$ and, therefore, does not affect the sequence statistics. Thus, in the slow completion limit, such models are only affected by the binding free-energy differences (ΔGy − ΔGx), the driving (ΔGpol), and the nature of the final completion step ($R^-_{\mathrm{com}}$). The finer details do not affect the statistics of the polymers.

Assuming that all completion edges are associated with the same free energy change −ΔGpol, so that $R^-_{\mathrm{com}}(y,x) = e^{-\Delta G_{\mathrm{pol}}}$, we may solve for the statistics explicitly. For the case of two monomer types, M = 2, we find the bulk frequency to be (Appendix I)

$$\varepsilon(1) = \left[1 - \frac{1}{2}\left(e^{-\Delta G_{\mathrm{pol}}} - 1\right)\left(e^{-DG} - 1\right) + \frac{1}{2}\sqrt{\left(e^{-\Delta G_{\mathrm{pol}}} - 1\right)^2\left(e^{-DG} - 1\right)^2 + 4e^{-DG}}\,\right]^{-1},$$
(45)

where DG = ΔG1 − ΔG2. This expression is plotted in Fig. 5 for DG = 4. From this expression, we can confirm explicitly, by substituting in the stall driving ΔGpol = −ln 2, that the bulk frequency indeed becomes ε(1) = 1/2. Furthermore, taking the irreversible limit, ΔGpol → ∞, we find that the bulk frequency becomes

$$\varepsilon(1) = \frac{e^{\Delta G_1}}{e^{\Delta G_1} + e^{\Delta G_2}},$$
(46)

which is the equilibrium statistical mechanics probability of choosing state 1 with free energy −ΔG1, given state 2 has free energy −ΔG2. Since the completion reactions are slow and irreversible, in this limit, the process of selecting the monomers is allowed to equilibrate. Therefore, copolymerization is simply sampling from the equilibrium distribution of this process and, hence, tends to the result predicted by equilibrium statistical mechanics.
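The limits quoted above can be verified directly from Eq. (45); the sketch below evaluates the expression and checks the stall (ΔGpol = −ln 2) and irreversible (large ΔGpol) limits for DG = 4.

```python
import numpy as np

def eps1(dGpol, DG):
    """Bulk frequency of monomer 1 in the slow-completion limit, Eq. (45)."""
    a = np.exp(-dGpol) - 1.0
    b = np.exp(-DG) - 1.0
    return 1.0 / (1.0 - 0.5 * a * b
                  + 0.5 * np.sqrt(a * a * b * b + 4.0 * np.exp(-DG)))

DG = 4.0
# Stall driving: the bulk frequency collapses to 1/2
assert np.isclose(eps1(-np.log(2.0), DG), 0.5)
# Strong driving approaches the equilibrium selection probability, Eq. (46)
assert np.isclose(eps1(50.0, DG), np.exp(DG) / (np.exp(DG) + 1.0))
```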

FIG. 5.

Plots of the frequency of the less stably bound (incorrect) monomer, i.e., the one with the smallest binding free energy, labeled 2, for on-rate and off-rate discrimination in balanced models with kcom = 100 and kcom → 0. The binding free-energy difference for these models is DG = ΔG1 − ΔG2 = 4. The models are topologically the Hopfield model as in Fig. 6(a), with ΔGact = 0 and Min = Mact = 1. However, for on-rate discrimination, the free-energy terms are in the binding reactions instead of the unbinding ones. The specific models are given in Appendix J.


Equation (44) shows that in the slow completion limit, the finer details of the reaction network leading to the selection of a specific monomer become unimportant, and the models collapse onto a single accuracy curve determined by DG, ΔGpol, and $R^-_{\mathrm{com}}$. Moreover, if we fix all parameters except kcom, we observe that the bulk frequencies tend monotonically to their limits as kcom/k → 0, either from above or below.

We can use this fact to compare bulk frequencies for certain types of models. For example, we may compare on-rate discrimination,26 where incorrect monomers bind more slowly, to off-rate discrimination,26 where incorrect monomers unbind more quickly. An example comparing on-rate and off-rate discrimination is plotted in Fig. 5 for a model defined in Appendix J. Consider the bulk frequency of an incorrect monomer. On-rate discrimination benefits from fast polymerization and, therefore, tends to its slow polymerization limit from below, whereas off-rate discrimination benefits from allowing the monomer-selection process to equilibrate and, hence, tends to its slow copolymerization limit from above. This sets up a hierarchy for a given set of parameters and moderate or strong driving: for the bulk frequency of incorrect monomers, off-rate discrimination > slow copolymerization > on-rate discrimination. This observation is consistent with the results of Sartori and Pigolotti26 and Poulton et al.23 for kinetic (on-rate) and energetic (off-rate) discrimination.

For our final example, we shall consider an explicit model of copolymerization, in which Hopfield's kinetic proofreading mechanism is incorporated into a templated copolymerization system with an autonomously separating product in a thermodynamically consistent way. This setup provides a fully worked example and demonstrates the power of the method for analyzing sequences of models with recursive structure, as we generalize Hopfield's proofreading within a model of templated polymerization with autonomous separation.

Explicitly, we first consider the one-loop model of kinetic proofreading shown in Fig. 6(a). There are two monomer types, the right ones x = r and wrong ones x = w. Note that we have already transformed the model so that the sequence of the copy is defined relative to that of the template.35 These monomer types exist in inactive and active states with concentrations Min and Mact, respectively, relative to some reference concentration, with each monomer type having the same concentration. As previously, we shall assume the environment is sufficiently large such that these concentrations remain constant.

FIG. 6.

Reaction rates of the (a) 1-loop and (b) N-loop Hopfield kinetic proofreading models implemented in a templated polymerization model with autonomous separation. Each of these subfigures represents a single petal of the step-wise process as in Fig. 4, going from completed state &x → &xy. In both cases, the template is represented by red squares. In (a) the inactive monomer is represented by a white circle and the activated monomers by a dark blue circle. In (b), different levels of activation are represented by increasingly dark shades of blue circles. Furthermore, in (b) the numbers by the states represent the activation level of the monomer. In each case, the desired pathway is highlighted with red arrows.


The monomers may bind to the template either in an active or inactive state with binding free energies −ΔGx for monomer type x. Inactive monomers may be activated on the template with a free-energy change of ΔGact. Finally, active monomers may be polymerized into the copolymer chain, with a free-energy change −ΔGpol. Subsequently, the penultimate monomer of the copolymer unbinds from the template. Each of these reactions is assigned a forward and reverse reaction rate consistent with the thermodynamic model; the full model is illustrated in Fig. 6(a). Conceptually, the proofreading motif functions by providing two opportunities to reject the unwanted monomer w: first, when the unactivated monomer binds and second, after it has been activated. To be effective, a non-zero affinity is required to drive the system around the cycle of states in the correct order: unbound template site → unactivated monomer bound → activated monomer bound.20,43 We emphasize that this model differs from Hopfield’s original description in two important ways: first, we consider a full, microscopically reversible polymerization process, rather than a single incorporation step with irreversible polymerization; and second, we embed the proofreading motif into a non-trivial polymerization process involving autonomous detachment from the template.

Given the model as described in Fig. 6(a), we first identify the propensities $\omega_{\pm yx}$ connecting completed states. Due to the petal-like structure, we can follow Eq. (13) and simply consider spanning trees of the petal sub-processes illustrated in Fig. 6: $\Lambda_-(y,x)$ rooted at &x, $\Lambda_+(y,x)$ rooted at &xy, and Q(y, x) for a petal connecting &x and &xy. Explicitly writing out the sums of spanning trees, we obtain

$$\Lambda_1^+(y,x) = \left[k_1 k_{\mathrm{act}} M_{\mathrm{in}} + k_{\mathrm{KP}} M_{\mathrm{act}}\left(k_1 e^{-\Delta G_y} + k_{\mathrm{act}}\right)\right] k_{\mathrm{pol}}\,e^{-\Delta G_x},$$
(47)
$$\Lambda_1^-(y,x) = \left[k_1 k_{\mathrm{act}}\,e^{\Delta G_{\mathrm{act}} - \Delta G_y} + k_{\mathrm{KP}}\,e^{-\Delta G_y}\left(k_1 e^{-\Delta G_y} + k_{\mathrm{act}}\right)\right] k_{\mathrm{pol}}\,e^{-\Delta G_{\mathrm{pol}}},$$
(48)
$$Q_1(y,x) = k_1 k_{\mathrm{act}}\,e^{\Delta G_{\mathrm{act}} - \Delta G_y} + \left(k_{\mathrm{KP}}\,e^{-\Delta G_y} + k_{\mathrm{pol}}\,e^{-\Delta G_x}\right)\left(k_1 e^{-\Delta G_y} + k_{\mathrm{act}}\right).$$
(49)

Here, we add a subscript 1 to denote these quantities as applying to the simple, “1-loop,” Hopfield model, which we shall later extend to allow more loops. We note that the ratio $\Lambda_+(y,x)/\Lambda_-(y,x)$ factorizes as in Eq. (36), and so we can easily write down the stall condition as ΔGpol = Γ with

$$\Gamma = -\ln\!\left[\frac{k_1 k_{\mathrm{KP}} M_{\mathrm{act}}\,e^{-\Delta G_r} + k_{\mathrm{act}}\left(k_{\mathrm{KP}} M_{\mathrm{act}} + k_1 M_{\mathrm{in}}\right)}{k_1 k_{\mathrm{KP}}\,e^{-\Delta G_r} + k_{\mathrm{act}}\left(k_1 e^{\Delta G_{\mathrm{act}}} + k_{\mathrm{KP}}\right)} + \frac{k_1 k_{\mathrm{KP}} M_{\mathrm{act}}\,e^{-\Delta G_w} + k_{\mathrm{act}}\left(k_{\mathrm{KP}} M_{\mathrm{act}} + k_1 M_{\mathrm{in}}\right)}{k_1 k_{\mathrm{KP}}\,e^{-\Delta G_w} + k_{\mathrm{act}}\left(k_1 e^{\Delta G_{\mathrm{act}}} + k_{\mathrm{KP}}\right)}\right].$$
(50)

Note that by setting Min = Mact = 1, ΔGact = 0 in Eq. (50), Γ collapses to −ln 2 as these conditions reduce the system to a balanced one with a stall point at equilibrium, as in Sec. III B.
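Equation (50) is straightforward to evaluate numerically; the sketch below codes the stall threshold as written above and confirms the collapse to −ln 2 in the balanced limit Min = Mact = 1, ΔGact = 0 (the specific parameter values passed in are illustrative).

```python
import numpy as np

def Gamma(dGr, dGw, dGact, Min, Mact, k1, kact, kKP):
    """Stall threshold of Eq. (50) for the 1-loop proofreading model."""
    def term(dG):
        num = k1 * kKP * Mact * np.exp(-dG) + kact * (kKP * Mact + k1 * Min)
        den = k1 * kKP * np.exp(-dG) + kact * (k1 * np.exp(dGact) + kKP)
        return num / den
    return -np.log(term(dGr) + term(dGw))

# Balanced limit (Min = Mact = 1, dGact = 0) collapses to -ln 2,
# independent of the binding free energies and rate constants
assert np.isclose(Gamma(2.0, -2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0),
                  -np.log(2.0))
```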

The frequency of the right and wrong monomers ɛ(x = r, w) may be calculated from Eq. (18) (the calculation is implemented in the supplementary material). We plot the copying error, as represented by ɛ(w), in Fig. 7(a), and demonstrate that it agrees well with the results found from a Gillespie simulation65 of the same model. We also compare to a “0-loop” version of the model, in which the inactivated monomers and the inactivated monomer bound state are omitted. As can be seen, the proofreading motif generally improves accuracy when driven above its stall point ΔGpol = Γ. Indeed, we may write down expressions for the bulk frequency in the irreversible limit (ΔGpol → ∞) using Eq. (34). In this irreversible limit, we recover Hopfield's classic argument by taking some further limits consistent with his analysis. Namely, letting Mact, kact, kpol → 0, we find $\varepsilon(w)/\varepsilon(r) = e^{2(\Delta G_w - \Delta G_r)}$. In this limit, the ratio of incorrect monomers to correct ones involves the square of the binding free-energy difference, reflecting the fact that two steps of discrimination have occurred.

FIG. 7.

Analytical method applied to a 1-loop proofreading model [Fig. 6(a)], compared to Gillespie simulation of the same model and a simpler 0-loop model. For these data, the following parameters were used: ΔGr = 2, ΔGw = −2, ΔGact = −1, Min = 1, Mact = 0.01, and k1 = kact = kKP = 1. The stall point Γ is marked on each of the plots. The Gillespie simulations used a template of length 2000 and were run until completion with the first monomer being chosen as either r or w with probability 0.5. The statistics were averaged over 2000 copolymers per data point. The chemical work was calculated from the simulation as (Inactive monomers)*(ΔGact − ln(Mact/Min)) + L*(ΔGpol + ln Mact), where “Inactive monomers” is the number of inactive monomers taken out of the environment and L is the length of the template.


We may also write down expressions for the expected chemical work done per net step of the process. This quantity will involve the total current to absorbing states of the step-wise process starting with a copolymer &xy, which we may write as

$J_{\mathrm{Tot}}(y,x) = \frac{1}{N(y,x)}\left[\Lambda_1^{+}(r,y)\,Q(w,y)\,Q(y,x) + \Lambda_1^{+}(w,y)\,Q(r,y)\,Q(y,x) + \Lambda_1^{-}(y,x)\,Q(r,y)\,Q(w,y)\right],$
(51)

where N is a normalization factor that will cancel out of calculations. In order to track each of the terms here, we shall break down the contributions to the chemical work done into three parts, one for each of the petals present in the step-wise process. These three petals correspond to adding a monomer type r, adding a monomer type w, or removing a monomer type y. Let us label each of these contributions to the chemical work with a subscript, Gr(y,x) for the transition &xy → &xyr, Gw(y,x) for the transition &xy → &xyw, and Gq(y,x) for the transition &xy → &x. From the r petal, we have

$G_r(y,x) = \Bigg\{\left[\Delta G_{\mathrm{act}} + \ln\frac{M_{\mathrm{in}}}{M_{\mathrm{act}}}\right] k_1 k_{\mathrm{act}} k_{\mathrm{KP}}\, e^{-\Delta G_r}\left(M_{\mathrm{in}} + M_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}}\right) + \left[\Delta G_{\mathrm{pol}} + \Delta G_r - \Delta G_y + \ln M_{\mathrm{in}} - \Delta G_{\mathrm{act}}\right] k_1 k_{\mathrm{act}} k_{\mathrm{pol}} M_{\mathrm{in}}\, e^{-\Delta G_y} + \left[\Delta G_{\mathrm{pol}} + \Delta G_r - \Delta G_y + \ln M_{\mathrm{act}}\right] k_{\mathrm{KP}} k_{\mathrm{pol}} M_{\mathrm{act}}\, e^{-\Delta G_y}\left(k_1 e^{-\Delta G_r} + k_{\mathrm{act}}\right)\Bigg\}\, \frac{Q(w,y)\,Q(y,x)}{N(y,x)\,J_{\mathrm{Tot}}(y,x)}.$
(52)

The first line of Eq. (52) corresponds to the chemical work associated with the internal cycle (inactive monomer binds, gets activated, and the activated monomer unbinds). The second line corresponds to an external cycle: an inactive monomer binds to the template, is activated, and is polymerized into the chain with the previous monomer y, detaching from the template. The third line corresponds to the alternative external cycle: an active monomer binds to the template and is polymerized with monomer y, unbinding from the template. We may similarly write down Gw(y,x) as Eq. (52), except swapping r and w. Finally, the contribution to the chemical work from the petal for removing monomer y may be written as

$G_q(y,x) = \Bigg\{\left[\Delta G_{\mathrm{pol}} + \Delta G_y - \Delta G_x + \ln M_{\mathrm{in}} - \Delta G_{\mathrm{act}}\right] k_1 k_{\mathrm{act}} k_{\mathrm{pol}}\, e^{-\Delta G_{\mathrm{act}} - \Delta G_y - \Delta G_{\mathrm{pol}}} + \left[\Delta G_{\mathrm{pol}} + \Delta G_y - \Delta G_x + \ln M_{\mathrm{act}}\right] k_{\mathrm{KP}} k_{\mathrm{pol}}\, e^{-\Delta G_{\mathrm{pol}} - \Delta G_y}\left(k_1 e^{-\Delta G_r} + k_{\mathrm{act}}\right)\Bigg\}\, \frac{Q(r,y)\,Q(w,y)}{N(y,x)\,J_{\mathrm{Tot}}(y,x)}.$
(53)

Here, only external cycles are possible. The first line corresponds to monomer x rebinding to the template, monomer y being depolymerized, this monomer being deactivated and an inactive monomer y unbinding from the template; and the second line to x rebinding, y being depolymerized, and an active monomer y unbinding from the template. The distribution ξ(y, x) may be calculated from Eq. (20) and P from Eq. (21) (both demonstrated in the supplementary material), letting the chemical work done per net step of the 1-loop model be written as

$\Delta G = \frac{1}{2P - 1}\sum_{x,y\in\{r,w\}} \xi(y,x)\left[G_r(y,x) + G_w(y,x) + G_q(y,x)\right].$
(54)

This chemical work is plotted for a certain set of parameters in Fig. 7(b) and is also compared to both the results of direct simulation and the simpler “0-loop” model, which has chemical work ΔGpol. The free-energy cost of the proofreading mechanism diverges as ΔGpol → Γ, since there is a finite chemical work done per monomer addition/removal step due to the proofreading internal cycle, and the number of addition/removal steps per net step diverges. Furthermore, for large ΔGpol, the work tends to be dominated by ΔGpol, albeit very slowly, as shown by the orange line gradually approaching ΔGpol (the blue line) in Fig. 7(b).

In addition, we can find an expression for the time taken per net step forwards, Eq. (22). For this quantity, we need the explicit expression for the normalization, N. Similar to the chemical work, we can split this term into contributions from the petal adding an r, $N_r(y,x)$; from the petal adding a w, $N_w(y,x)$; from the petal removing monomer y, $N_q(y,x)$; and a contribution from the central node. These normalization terms come from the sums of the spanning trees directed to the individual nodes in the closed step-wise process. We see that

$N_r(y,x) = \left[k_1 M_{\mathrm{in}}\left(k_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}} + k_{\mathrm{KP}} e^{-\Delta G_r} + k_{\mathrm{pol}} e^{-\Delta G_y}\right) + k_{\mathrm{KP}} k_{\mathrm{act}} M_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}} + k_1 k_{\mathrm{act}} M_{\mathrm{in}} + k_{\mathrm{KP}} k_{\mathrm{act}} M_{\mathrm{act}} + k_1 k_{\mathrm{KP}} e^{-\Delta G_r}\right] Q(y,x)\,Q(w,y),$
(55)

with a similar result for Nw(y,x) except swapping r and w. Finally, for the monomer removal petal, we have

$N_q(y,x) = k_{\mathrm{pol}} e^{-\Delta G_{\mathrm{pol}}}\left(k_1 e^{-\Delta G_y} + k_{\mathrm{act}} + k_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}}\right) Q(r,y)\,Q(w,y).$
(56)

The total normalization is, then,

$N(y,x) = N_r(y,x) + N_w(y,x) + N_q(y,x) + Q(y,x)\,Q(r,y)\,Q(w,y),$
(57)

with the last term being the contribution from the starting, central node. This normalization can be used in Eq. (51) to give the current to absorbing states, which can be used in Eq. (22) to find the expected time per net step. This time is plotted in Fig. 7(c), alongside a simulation of the same model and the simplified 0-loop model for comparison. Like the chemical work in Fig. 7(b), the time per net step diverges as ΔGpol → Γ, since each monomer addition/removal step will take finite time, but the number of such steps required for a net forward step diverges. Unsurprisingly, the time taken for a given driving for the Hopfield model is longer than that of the simple model, due to the proofreading cycle.

Hopfield’s model for proofreading may be naturally extended to include N activation stages instead of just one.48,50 We shall call these extensions the N-loop Hopfield models. These models can be solved recursively to write down expressions for the sums over spanning trees $\Lambda_N^{\pm}(y,x)$ and $Q_N(y,x)$ as a function of the number of loops N. We shall consider the model as in Fig. 6(b). A detailed derivation of the sums over the spanning trees is given in  Appendix K. From these sums over spanning trees, we calculate the bulk frequencies, the time taken per net step, and the chemical work done per net step using recursive relations (see  Appendix K).

For simplicity, we shall discuss the case where the monomer binding free energy is only dependent on monomer type, not on the activation stage; each activation stage is associated with a free energy change of ΔGact; each active monomer is present in the environment at a concentration Mact except for the inactive monomers at concentration Min; and the overall rate constants are k1 for binding of inactive monomer, kKP for binding of active monomers, and kact for activation of monomers. Under these assumptions, the corresponding rates are given in  Appendix K.

To reduce the frequency of incorrect monomers in the product, we wish to have a low concentration Mact of active monomers in solution to force the system into utilizing the proofreading cycles. Indeed, the bulk error probability in the irreversible limit [calculated using Eq. (34) and plotted in Fig. 8(a)] shows a strong improvement with loop number for low Mact, but larger values of Mact lead to a much worse performance and limited (or negative) returns to increasing the number of loops.

FIG. 8.

Many-loop models have a limited efficacy for finite Mact for the proofreading model introduced in Fig. 6. Plots of the error in the irreversible limit ɛirrev(w) and the stall point driving Γ, for different values of the active monomer concentrations Mact and other parameters: ΔGr = 2, ΔGw = −2, Min = 1, ΔGact = −1, and ks = 1. The N = 0 error is not shown for clarity, but is 0.5 for all Mact.


However, for finite driving strength ΔGpol, we cannot allow this concentration to be arbitrarily small. To see why, consider the stall point Γ(N) derived in  Appendix K and plotted for a certain set of parameters in Fig. 8(b). It is observed that the stall point driving increases monotonically with N and that this increase is faster and tends to a higher limit for smaller Mact. We find that the limiting Γ scales approximately linearly with −ln(Mact). Intuitively, introducing more monomer states at a low concentration in the environment destabilizes the polymer. For a small Mact and driving ΔGpol, the depolymerization of the polymer into these activated states competes with its tendency to grow by binding to and activating the inactive monomers.

One drawback of proofreading with a large number of loops is, therefore, that the tendency to disassemble the growing polymer increases. A second effect is a tendency to introduce errors by alternate pathways if Mact is non-zero. In particular, for Mact ≠ 0, we observe in Fig. 8(a) a minimum in ɛirrev(w) for a relatively small value of N. This minimum can be explained by splitting the pathways by which a monomer can go from solution to being incorporated into the polymer into two: either starting from a fully inactive monomer or from a partially activated one. The pathway starting with an inactive monomer will have the highest discrimination between the right and wrong monomers and will improve exponentially with more loops, as demonstrated by the exponential decrease in error for Mact = 0. However, the probability that a monomer taking this path will reach polymerization falls exponentially with loop number at the same time. On the other hand, the pathway from partially active monomers will give an error that reaches some non-zero limit as the number of loops N increases. Furthermore, the rate with which activated monomers bind to an available template site and subsequently get incorporated into the polymer will also tend to a constant. As such, the error will initially decrease exponentially with N, but for non-zero Mact, will eventually become dominated by the less discriminating, partially active monomer pathways through which monomers are more likely to be incorporated into the polymer.

Having calculated the error probability ɛ(w) at finite driving, plotted in Fig. 9(a); having used ɛ(x, y) to calculate the entropy rate; and having calculated ΔG, we can evaluate the efficiency η, as in Eq. (26) (see the supplementary material for demonstrations). This efficiency is plotted in Fig. 9 for N = 0, 1, 5, 10 and a certain set of parameters. Although accuracy is generally increased above the stall point, we see that in this particular model, kinetic proofreading requires much more work than the minimum required to generate information and as such is inefficient. In addition, the gradient of the efficiency at minimum driving ΓN is zero for N > 0, reflecting how at minimum driving, the number of monomer addition/removal steps diverges, but the chemical work done per such step remains finite.

FIG. 9.

Plots of (a) the error and (b) efficiency of the N-loop proofreading model (Fig. 6) for a range of N, with the same parameters as in the one-loop Hopfield case, shown in Fig. 7, as a function of driving ΔGpol. Proofreading is observed to generally increase the accuracy above its stall point, but in a thermodynamically inefficient way. The enhanced plot in the second graph shows the efficiencies near the stall point for each of the loop numbers on a non-logarithmic scale to emphasize the decreasing gradient at stall.


We have presented a method for analyzing copolymerization models with complex networks of reactions leading to the incorporation or removal of monomers. By coarse graining, a model may be transformed into a simpler model that may be solved and, thereafter, information from the fine-grained process may be put back into the model to extract thermodynamic or kinetic quantities such as chemical work done, molecule exchange, or time taken. The approach allows for complex incorporation motifs to be considered alongside the nearest neighbor interactions in a thermodynamically well-defined model of polymerization with microscopic reversibility. We note that all of these features were present in the kinetic proofreading example in Sec. III C. Moreover, phenomena such as the shift in stall point with loop number and the non-monotonicity of the error rate with the loop number rely on these features being present in the model.

In general, this method provides a way to extract model predictions numerically quickly and without the need for simulations. Doing so is particularly useful when simulating polymer growth is slow, either due to the details of the incorporation process or because the polymer is near its stall point. In addition, the approach makes screening of a large parameter space for a given model topology feasible.

In addition to the numerical performance, this approach allows for analytic results in simpler models or those with helpful symmetries and in certain limits for more complex models. The process of summing over the spanning trees is particularly well suited to identifying the structure of the process and providing simplified results.

Moving forward, it is an open question as to whether components of the techniques developed here can be applied outside of the context of infinitely long polymers whose tips have reached a steady state. An obvious goal would be a simplified way to analyze finite-length “oligomers.”24 More generally, we believe that the key equation of this paper, Eq. (14), may be applied more broadly to the coarse graining of Markov processes. In particular, if a set of states is enclosed between two boundary states, in the sense that any path from one of the enclosed states to the outside must pass through one of the boundary states, then this set of states may be replaced by a pair of edges, analogously to Eq. (14), in a way that preserves the steady state properties of the Markov process.

This framework could be applied to explore models of copolymerization processes such as those presented in Refs. 17–19, 22, 23, 27, 28, 30–35, 39–41, and 45–51, more straightforwardly or more thoroughly. Alternatively, the method would allow for more complex reaction steps to be included in such models. The framework presented here is particularly useful when backward steps are relevant, either when the system is weakly driven and, thus, operating near the stall point or when thermodynamics is of importance or interest. We also predict that it will be useful to guide design principles for synthetic copolymerization systems, which are often particularly well-described by the class of models studied here.

The supplementary material contains a C++ script implementing the Gillespie algorithm that reproduces the data for the 1-loop Hopfield kinetic proofreading model presented in Fig. 7, and a MATLAB script for numerically calculating quantities of the 1-loop and N-loop Hopfield kinetic proofreading models presented in Sec. III C and shown in the solid lines of Fig. 7 and the points of Figs. 8 and 9.

This work was part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program (Grant Agreement No. 851910). T.E.O. was supported by a Royal Society University Fellowship. J.J. was supported by a Royal Society Ph.D. studentship.

The authors have no conflicts to disclose.

All authors conceived of the project. B.Q. produced the methodology and analysis and wrote the initial draft. All authors interpreted results and reviewed and edited this paper.

Benjamin Qureshi: Conceptualization (equal); Formal analysis (lead); Methodology (equal); Writing – original draft (lead). Jordan Juritz: Conceptualization (equal); Visualization (equal); Writing – review & editing (equal). Jenny M. Poulton: Conceptualization (equal); Writing – review & editing (equal). Adrian Beersing-Vasquez: Conceptualization (equal); Writing – review & editing (equal). Thomas E. Ouldridge: Conceptualization (equal); Funding acquisition (lead); Supervision (lead); Writing – review & editing (equal).

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.7271702.

We note here that sums of spanning trees can be factorized in terms of self-avoiding walks (SAWs), a result which is both useful for generating sets of spanning trees and allows us to make statements about ratios of propensities of balanced models. For a given process, $\mathcal{G} = (X, K)$, for which we wish to find the sum of spanning trees rooted at $x_1 \in X$, we may factorize this sum in terms of SAWs between two vertices in the graph. Select some other arbitrary vertex $x_2 \in X \setminus \{x_1\}$ and let $\mathcal{S}(x_2, x_1)$ be the set of SAWs from $x_2$ to $x_1$. For each $S \in \mathcal{S}(x_2, x_1)$, we can construct $\mathcal{G}_S = (\{s\} \cup (X \setminus S), K_S)$ analogously to Eq. (9), whereby we collapse the nodes of the SAW $S$ into the single node $s$. The sum over spanning trees rooted at $x_1$ may then be written as

$\sum_{T\in\mathcal{T}(x_1)}\prod_{e\in T}K(e) = \sum_{S\in\mathcal{S}(x_2,x_1)}\underbrace{\prod_{e\in S}K(e)}_{\text{SAW term}}\;\underbrace{\sum_{T\in\mathcal{T}_S(s)}\prod_{e\in T}K_S(e)}_{\text{spanning tree term}},$
(A1)

where $\mathcal{T}(x)$ and $\mathcal{T}_S(x)$ are the sets of spanning trees directed to x for the original process $\mathcal{G}$ and the new process $\mathcal{G}_S$. For example, in Fig. 3(a), the spanning trees are arranged in terms of SAWs from node 1 to node 3, with the first row for SAW: 1 → 2 → 4 → 3; the second row for 1 → 2 → 3; and the last three rows for 1 → 3. Similarly, for Fig. 3(b), the trees are arranged in terms of SAWs from node 1 to node 4 with row one for 1 → 2 → 3 → 4; row two for 1 → 2 → 4; row three for 1 → 3 → 4; and row four for 1 → 3 → 2 → 4.

The normalization constant for the closed example process, given in Fig. 2(b), can be found by considering the spanning trees rooted at each of the nodes. Factorizing these in terms of SAWs, we write

$N = \big[r_{34}r_{42}r_{21} + r_{32}r_{24}k_B + r_{32}r_{21}(r_{43} + r_{42} + k_B) + r_{34}k_B(r_{24} + r_{23} + r_{21}) + (r_{31} + k_A)(r_{21}r_{43} + r_{21}k_B + r_{42}r_{21} + r_{23}r_{43} + r_{23}k_B + r_{42}r_{23} + r_{24}r_{43} + r_{24}k_B)\big] + \big[r_{13}r_{34}r_{42} + r_{13}r_{32}(r_{43} + r_{42} + k_B) + r_{12}\big(r_{34}r_{42} + r_{43}r_{32} + r_{32}k_B + r_{32}r_{42} + r_{43}(r_{31} + k_A) + k_B(r_{31} + k_A) + r_{42}(r_{31} + k_A) + r_{34}k_B\big)\big] + \big[r_{12}r_{24}r_{43} + r_{12}r_{23}(r_{42} + r_{43} + k_B) + r_{13}\big((r_{43} + k_B)(r_{21} + r_{23} + r_{24}) + r_{42}(r_{21} + r_{23})\big)\big] + \big[r_{13}r_{34}(r_{21} + r_{23} + r_{24}) + r_{13}r_{32}r_{24} + r_{12}r_{23}r_{34} + r_{12}r_{24}(r_{31} + k_A + r_{32} + r_{34})\big].$
(B1)

The first square bracket corresponds to the trees rooted at node 1, organized by SAWs from node 3; the second, to trees rooted at 2 organized by SAWs from 1; the third, to trees rooted at 3 organized by SAWs from 1; and the fourth, to trees rooted at 4 organized by SAWs from 1.

Here, we shall show the equivalence of chemical work for a process calculated by summing over edges vs summing over cycles. For this comparison, consider a process (X,K) without any absorbing states (for simplicity) and such that every edge is microscopically reversible, and let π(x) be the steady state probability to be in state x. For an edge xy, as described in Sec. II A 4, the net current through this edge is

$J_{xy} = \pi(x)\,K(x,y) - \pi(y)\,K(y,x).$
(C1)

We can write π(x) in terms of the spanning trees by MCTT, and by  Appendix A, we may expand the sum over the spanning trees by SAWs from y to x. For π(y), we may expand by SAWs from x to y such that the spanning tree terms of both expansions are the same and only the direction of the edges in the SAW terms is flipped. The net current may then be written as

$J_{xy} = \frac{1}{N}\sum_{S\in\mathcal{S}(y,x)}\left[K(x,y)\prod_{e\in S}K(e) - K(y,x)\prod_{e\in S}K(e')\right]\sum_{T\in\mathcal{T}_S(s)}\prod_{e\in T}K_S(e),$
(C2)

where $\mathcal{S}(y,x)$ is the set of SAWs from node y to node x; N is the normalization as in Eq. (2); e′ is the edge in the opposite direction to e, i.e., if e = xy, then e′ = yx; and the last bracketed term is the spanning tree part for SAW S, as in Eq. (A1). One of the SAWs from y to x will simply be the single transition y → x; however, this term will cancel out of the sum, leaving just the non-trivial SAWs. Taking a non-trivial SAW from y to x and multiplying by the rate K(x, y) gives a cycle containing the edge xy. Therefore, the current may be written as a sum over cycle currents, as in Sec. II A 4, of cycles which contain the edge xy minus those which contain yx. Each of the edges contributes $\ln\frac{K(x,y)}{K(y,x)}$ to the chemical work. The total chemical work before absorption is the sum over all edges of these contributions,

$W_{\mathrm{chem}} = \sum_{xy}\ln\frac{K(x,y)}{K(y,x)}\,\frac{J_{xy}}{J_{\mathrm{Tot}}}.$
(C3)

Since Jxy may be split up as a sum over cycles in this sum, we may collect the parts of this sum corresponding to given cycles and convert the sum over edges into a sum over cycles. By performing these operations, we find the contribution to the chemical work from cycle C to be $\ln\big[A(C)/A(\bar{C})\big]$, where $\bar{C}$ denotes the cycle traversed in the reverse direction, i.e., the cycle affinities, as we might expect. Hence, the sum over cycles is equivalent to the sum over edges.

We divide the cycles of the example process, shown in Fig. 2(a), into internal cycles, external cycles to absorbing state A, and external cycles to absorbing state B. First, the internal cycles are

formula

where the cycle is written out below in the clockwise direction. Similarly, we find the external cycles to state A,

formula

Finally, the external cycles to absorbing state B are

formula

Here, we shall derive the number of steps per net forward step of a random walk. Let us set up a random walk as follows: let the state space be the nodes {0, 1, …, L}, where L is the length of the walk (polymer). Let the transition 0 → 1 have probability 1, let i → i + 1 for i = 1, …, L − 1 have probability p and i → i − 1 for i = 1, …, L − 1 have probability q = 1 − p, and let state L be an absorbing state as in Fig. 10.

FIG. 10.

Graphical representation of the random walk process considered.


Then, we wish to find the expected number of steps to absorption given we start from state 0, for which we can utilize the spanning tree methods with Eq. (22). Since the total rate out of any state sums to one, the expected number of steps equals the expected time to absorption. Thus, we can form the closed process starting at 0. Let f(n) be the sum over spanning trees rooted at node n for the closed process. f(n) is given by

$f(n) = \begin{cases} \sum_{i=0}^{L-1} p^{i} q^{L-1-i} & \text{for } n = 0, \\[4pt] p^{n-1}\sum_{i=0}^{L-1-n} p^{i} q^{L-1-n-i} & \text{for } n = 1,\ldots,L-1. \end{cases}$
(E1)

From this equation, the expected number of steps before absorption is

$\mathbb{E}[\text{steps}] = \frac{\sum_{n=0}^{L-1} f(n)}{p\, f(L-1)}.$
(E2)

By utilizing the formulas for finite geometric series, we can find the expected number of steps to be

$\mathbb{E}[\text{steps}] = \frac{1}{2p-1}\left[L - 1 - \frac{q}{p^{L}}\,\frac{p^{L} - q^{L}}{p - q} + \frac{q^{L}}{p^{L}} + \frac{p^{L} - q^{L}}{p^{L-1}}\right].$
(E3)

All terms of this expression except the leading one are sub-linear in L and as such,

$\lim_{L\to\infty}\frac{\mathbb{E}[\text{steps}]}{L} = \frac{1}{2p-1},$
(E4)

which is the number of steps per net forward step.

We wish to show that, at the stall point, the frequency with which a monomer appears in the bulk of the copolymer is proportional to the cofactor of the corresponding diagonal element of a matrix,

$\varepsilon(x) \propto A_{xx},$
(F1)

where $A_{ij}$ is the cofactor of element $i, j$ of the matrix $\mathbb{1} - Z$. To show this relation, we will rely on the relationship between the cofactors and vectors of the nullspace of a matrix. Let M be an arbitrary matrix with a one dimensional nullspace, and let A be its matrix of cofactors. Recall that

$M A^{T} = \det(M)\,\mathbb{1} = 0.$
(F2)

Thus, any column of AT is in the nullspace of M. In anticipation, let μ be a vector in the nullspace of M and v be a vector in the nullspace of MT. Since M has a one dimensional nullspace, then,

$\frac{\mu_x}{\mu_y} = \frac{A_{ix}}{A_{iy}},$
(F3)

for some arbitrary i. Similarly,

$\frac{v_x}{v_y} = \frac{A_{xj}}{A_{yj}},$
(F4)

for arbitrary j.

Looking at Eqs. (15) and (16), noting that, near the stall point, $v_z \ll \omega^{\pm}_{y,x}$, we see that the tip probabilities μ(x) form a vector in the nullspace of $\mathbb{1}_M - Z$ and the tip velocities $v_x$ form a vector in the nullspace of $\mathbb{1}_M - Z^{T}$. Hence, we have that

$\frac{\mu(y)\,v_y}{\mu(x)\,v_x} = \frac{A_{jy}A_{yi}}{A_{jx}A_{xi}},$
(F5)

for arbitrary i, j. Thus, we may choose j = y and i = x leading to cancellation such that

$\frac{\mu(x)\,v_x}{\mu(y)\,v_y} = \frac{A_{xx}}{A_{yy}}.$
(F6)

Since

$\varepsilon(x) = \frac{\mu(x)\,v_x}{\sum_y \mu(y)\,v_y} = \frac{A_{xx}}{\sum_y A_{yy}},$
(F7)

we get the required result.

We wish to find an expression for the frequency with which monomer x appears in the bulk of the copolymer in the irreversible limit. This limit is such that the backward propensities $\omega^{-}_{yx} = 0$. With this assumption, from Eq. (15), we have

$v_x = \sum_y \omega^{+}_{yx}.$
(G1)

With this form for the velocities, we may manipulate Eq. (16),

$\mu(x) = \frac{\sum_y \omega^{+}_{xy}\,\mu(y)}{\sum_z \omega^{+}_{zx}},\qquad \sum_{z\neq x}\omega^{+}_{zx}\,\mu(x) + \omega^{+}_{xx}\,\mu(x) = \sum_{y\neq x}\omega^{+}_{xy}\,\mu(y) + \omega^{+}_{xx}\,\mu(x).$
(G2)

The last line is the equation for the steady state of a Markov process with probability μ(x) to be in state x and rate $\omega^{+}_{yx}$ of transition from state x to state y. Thus, set μ(x) to be the steady state probability distribution of the Markov process on M states with transition rates from state x to y given by $\omega^{+}_{yx}$ and $v_x = \sum_y \omega^{+}_{yx}$. Then, calculating

$\varepsilon(x) \propto \mu(x)\,v_x,$
(G3)

gives the required result. Finding the distribution μ(x) in terms of the spanning trees of the complete graph on M elements gives Eq. (36).

We shall show that, if the ratio of propensities factorizes as in Eq. (38), then we may simplify the stall condition and frequency of monomers at stall. Thinking of the functions X and Y as column vectors, since they have a discrete domain, the matrix Z may be written,

$Z = XY^{T}.$
(H1)

By a well-known result,70 

$\det(\mathbb{1}_M - XY^{T}) = 1 - Y^{T}X = 1 - \sum_x X(x)\,Y(x).$
(H2)

Here, rearranging gives Eq. (39). As shown, the frequency of monomer x in the bulk of the copolymer is given by the cofactor of the diagonal elements of $\mathbb{1}_M - XY^{T}$. The cofactor $A_{xx}$ may be written as

$A_{xx} = \det\!\big(\mathbb{1}_{M-1} - X_{[x]}Y_{[x]}^{T}\big) = 1 - \sum_{y\neq x}X(y)\,Y(y) = X(x)\,Y(x),$
(H3)

using the stall condition, where $X_{[x]}$ is the vector X missing element X(x), i.e., $X_{[x]} = (X(1),\ldots,X(x-1),X(x+1),\ldots,X(M))^{T}$. In addition, because of the stall condition $\sum_x X(x)Y(x) = 1$,

$\varepsilon_{\mathrm{stall}}(x) = X(x)\,Y(x)$
(H4)

is already normalized.

We shall derive the frequency of monomer x in the bulk of the copolymer with propensities given by Eq. (46), canceling $n_{\mathrm{com}}k_{\mathrm{com}}$, with $R_{\mathrm{com}} = e^{-\Delta G_{\mathrm{pol}}}$, and with M = 2. With these propensities, Eq. (15) becomes

$v_1 = \frac{v_1}{e^{-\Delta G_{\mathrm{pol}}} + v_1} + \frac{e^{-DG}\,v_2}{e^{-\Delta G_{\mathrm{pol}}} + v_2},$
(I1)
$v_2 = \frac{e^{DG}\,v_1}{e^{-\Delta G_{\mathrm{pol}}} + v_1} + \frac{v_2}{e^{-\Delta G_{\mathrm{pol}}} + v_2},$
(I2)

where DG = ΔG1 − ΔG2. These equations may be solved by the following form for the velocities:

$v_x = e^{\Delta G_y - \Delta G_x}\,v_y.$
(I3)

Doing so reduces Eq. (15) to a quadratic equation,

$0 = e^{DG}v_1^{2} + (1 + e^{DG})(e^{-\Delta G_{\mathrm{pol}}} - 1)\,v_1 + e^{-\Delta G_{\mathrm{pol}}}(e^{-\Delta G_{\mathrm{pol}}} - 2),$
(I4)

with one positive root,

$v_1 = \frac{1}{2}\left[(1 - e^{-\Delta G_{\mathrm{pol}}})(e^{-DG} + 1) + \sqrt{(e^{-\Delta G_{\mathrm{pol}}} - 1)^{2}(e^{-DG} - 1)^{2} + 4e^{-DG}}\right],$
(I5)

when the system is not stalling. $v_2$ can be found in terms of $v_1$ as $v_2 = e^{DG}v_1$. A quick check confirms that $v_1 = 0$ if $\Delta G_{\mathrm{pol}} = -\ln 2$. With $v_y$ known, Eq. (16) is a simple linear equation, and μ can be found as the eigenvector of the matrix

$\begin{pmatrix} \dfrac{1}{e^{-\Delta G_{\mathrm{pol}}} + v_1} & \dfrac{e^{DG}}{e^{-\Delta G_{\mathrm{pol}}} + v_1} \\[8pt] \dfrac{e^{-DG}}{e^{-\Delta G_{\mathrm{pol}}} + v_2} & \dfrac{1}{e^{-\Delta G_{\mathrm{pol}}} + v_2} \end{pmatrix},$
(I6)

with eigenvalue 1 and normalized to sum to 1. Combining the solutions for μ and v, using Eq. (18), gives Eq. (47).

Figure 11 shows the on-rate and off-rate discrimination models used to produce Fig. 5.

FIG. 11.

Reaction rates of the (a) off-rate and (b) on-rate discrimination models used to produce the results of Fig. 5. These reactions represent a single petal of the step-wise process (Fig. 4) between the completed states &x and &xy. For the results in Fig. 5 for the off-rate and on-rate curves, the following parameters were used: ΔG1 = 2, ΔG2 = −2, k1 = kKP = kact = 1, and kcom = 100.


The sum over the spanning trees of the N-loop model can be written in terms of sums over spanning trees of the lower loop number models. We label the reaction rates for the N-loop process as shown in Fig. 6. The N-loop model has one more node and two more edges than the (N − 1)-loop model. Let a subscript N denote the sums over the spanning trees for the N-loop models. Tracking the spanning trees, we see

$\Lambda_N^{+} = R_{\mathrm{act}}^{+}(N)\,\Lambda_{N-1}^{+} + R_{\mathrm{KP}}^{+}(N)\,\frac{R_{\mathrm{pol}}^{+}}{R_{\mathrm{pol}}^{-}}\sum_{i=0}^{N}\prod_{j=0}^{i-1}R_{\mathrm{act}}^{+}(N-j)\,\Lambda_{N-1-i}^{-},$
(K1)
$\Lambda_N^{-} = R_{\mathrm{act}}^{-}(N)\,\Lambda_{N-1}^{-} + R_{\mathrm{KP}}^{-}(N)\sum_{i=0}^{N}\prod_{j=0}^{i-1}R_{\mathrm{act}}^{+}(N-j)\,\Lambda_{N-1-i}^{-},$
(K2)
$Q_N = \frac{1}{R_{\mathrm{pol}}^{-}}\left[\Lambda_N^{-} + R_{\mathrm{pol}}^{+}\sum_{i=0}^{N}\prod_{j=0}^{i-1}R_{\mathrm{act}}^{+}(N-j)\,\Lambda_{N-1-i}^{-}\right],$
(K3)

with the initial conditions

$\Lambda_{-1}^{\pm} = R_{\mathrm{pol}}^{\pm},\qquad \Lambda_{0}^{\pm} = R_{\mathrm{pol}}^{\pm}R_{\mathrm{in}}^{\pm},\qquad Q_0 = R_{\mathrm{pol}}^{+} + R_{\mathrm{in}}^{-}.$
(K4)

The sum-product can be eliminated by subtracting terms proportional to $\Lambda_{N-1}^{\pm}$ and $Q_{N-1}$, leaving just

$\Lambda_N^{+} = \left[R_{\mathrm{act}}^{+}(N) + \frac{R_{\mathrm{KP}}^{+}(N)\,R_{\mathrm{act}}^{+}(N)}{R_{\mathrm{KP}}^{+}(N-1)}\right]\Lambda_{N-1}^{+} - \frac{R_{\mathrm{KP}}^{+}(N)\,R_{\mathrm{act}}^{+}(N)\,R_{\mathrm{act}}^{+}(N-1)}{R_{\mathrm{KP}}^{+}(N-1)}\,\Lambda_{N-2}^{+} + R_{\mathrm{KP}}^{+}(N)\,\frac{R_{\mathrm{pol}}^{+}}{R_{\mathrm{pol}}^{-}}\,\Lambda_{N-1}^{-},$
(K5)
$\Lambda_N^{-} = \left[R_{\mathrm{act}}^{-}(N) + R_{\mathrm{KP}}^{-}(N) + \frac{R_{\mathrm{KP}}^{-}(N)\,R_{\mathrm{act}}^{+}(N)}{R_{\mathrm{KP}}^{-}(N-1)}\right]\Lambda_{N-1}^{-} - \frac{R_{\mathrm{KP}}^{-}(N)\,R_{\mathrm{act}}^{+}(N)\,R_{\mathrm{act}}^{-}(N-1)}{R_{\mathrm{KP}}^{-}(N-1)}\,\Lambda_{N-2}^{-},$
(K6)
$Q_N = R_{\mathrm{act}}^{+}(N)\,Q_{N-1} + \frac{\Lambda_N^{-} + \big(R_{\mathrm{pol}}^{+} - R_{\mathrm{act}}^{+}(N)\big)\Lambda_{N-1}^{-}}{R_{\mathrm{pol}}^{-}},$
(K7)

with the same initial conditions as above. This system of recursion relations may be used to generate the terms of the spanning tree sums quickly.

In certain simple cases, Eqs. (K5)–(K7) can be solved as a function of N. For example, when the reaction rates are not a function of N, such as

$R_{\mathrm{in}}^{+} = k_1 M_{\mathrm{in}},\quad R_{\mathrm{in}}^{-} = k_1 e^{-\Delta G_y},\quad R_{\mathrm{act}}^{+}(n) = k_{\mathrm{act}},\quad R_{\mathrm{act}}^{-}(n) = k_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}},\quad R_{\mathrm{KP}}^{+}(n) = k_{\mathrm{KP}} M_{\mathrm{act}},\quad R_{\mathrm{KP}}^{-}(n) = k_{\mathrm{KP}} e^{-\Delta G_y},$
(K8)

for n ∈ {1, …, N} for the step-wise process with monomer tip &xy. Here, the $k_i$ are some overall rates, $M_{\mathrm{in}}$ and $M_{\mathrm{act}}$ represent the concentrations of inactive and active monomers, respectively, and $\Delta G_{\mathrm{act}}$ represents the chemical work upon moving a monomer up one activation stage. The rates in Eq. (K8) are used for the numeric results in Figs. 8 and 9. In this case, the sums over spanning trees are

$\Lambda_N^{+}(y,x) = k_{\mathrm{pol}} e^{-\Delta G_x}\Bigg\{k_{\mathrm{act}}^{N}\Big[k_1 M_{\mathrm{in}} - k_1 M_{\mathrm{act}} + k_{\mathrm{act}} e^{\Delta G_y} M_{\mathrm{act}}\big(e^{-\Delta G_{\mathrm{act}}} - 1\big)\Big] + \frac{k_{\mathrm{KP}} M_{\mathrm{act}} e^{-\Delta G_y}}{\sqrt{\Delta}}\Bigg[\frac{\big(k_1\lambda_{+} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{+}^{N+1}}{(\lambda_{+} - k_{\mathrm{act}})^{2}} - \frac{\big(k_1\lambda_{-} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{-}^{N+1}}{(\lambda_{-} - k_{\mathrm{act}})^{2}}\Bigg]\Bigg\},$
(K9)
$\Lambda_N^{-}(y,x) = \frac{k_{\mathrm{pol}} e^{-\Delta G_{\mathrm{pol}}} e^{-\Delta G_y}}{\sqrt{\Delta}}\Big[\big(k_1\lambda_{+} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{+}^{N} - \big(k_1\lambda_{-} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{-}^{N}\Big],$
(K10)
$Q_N(y,x) = \frac{k_{\mathrm{pol}} e^{-\Delta G_x}}{\sqrt{\Delta}}\Big[\big(k_1 e^{-\Delta G_y} + k_{\mathrm{act}} - \lambda_{-}\big)\lambda_{+}^{N} - \big(k_1 e^{-\Delta G_y} + k_{\mathrm{act}} - \lambda_{+}\big)\lambda_{-}^{N}\Big] + \frac{e^{-\Delta G_y}}{\sqrt{\Delta}}\Big[\big(k_1\lambda_{+} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{+}^{N} - \big(k_1\lambda_{-} + k_{\mathrm{act}}(k_{\mathrm{KP}} - k_1)\big)\lambda_{-}^{N}\Big],$
(K11)

where

$\lambda_{\pm} = \frac{1}{2}\left[k_{\mathrm{act}} + k_{\mathrm{KP}} e^{-\Delta G_y} + k_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}} \pm \sqrt{\Delta}\right],$
(K12)
$\Delta = \left(k_{\mathrm{act}} + k_{\mathrm{KP}} e^{-\Delta G_y} + k_{\mathrm{act}} e^{-\Delta G_{\mathrm{act}}}\right)^{2} - 4\,k_{\mathrm{act}}^{2}\, e^{-\Delta G_{\mathrm{act}}}.$
(K13)

From Eqs. (K9) and (K10), we may write the stall condition. Noting that $\Lambda_N^{+}(y,x)$ is independent of $\Delta G_{\mathrm{pol}}$ and $\Lambda_N^{-}(y,x)$ is proportional to $e^{-\Delta G_{\mathrm{pol}}}$, we may write the stall point as

$\Gamma(N) = -\ln\left[\frac{e^{\Delta G_{\mathrm{pol}}}\,\Lambda_N^{-}(r,r)}{\Lambda_N^{+}(r,r)} + \frac{e^{\Delta G_{\mathrm{pol}}}\,\Lambda_N^{-}(w,w)}{\Lambda_N^{+}(w,w)}\right],$
(K14)

such that the dependence on ΔGpol in the logarithm is canceled out.

References

1. P. Gaspard, Philos. Trans. R. Soc., A 374, 20160147 (2016).
2. A. P. Corfield, Biochim. Biophys. Acta, Gen. Subj. 1850, 236 (2015).
3. S. Pinzón Martín, P. H. Seeberger, and D. Varón Silva, Front. Chem. 7, 710 (2019).
4. M. Chanda, Introduction to Polymer Science and Chemistry: A Problem-Solving Approach (CRC Press, Boca Raton, FL, 2013).
5. C. G. Overberger, J. Polym. Sci., Polym. Symp. 72, 67 (1985).
6. W. Meng, R. A. Muscat, M. L. McKee, P. J. Milnes, A. H. El-Sagheer, J. Bath, B. G. Davis, T. Brown, R. K. O’Reilly, and A. J. Turberfield, Nat. Chem. 8, 542 (2016).
7. H. Zhang, Y. Wang, H. Zhang, X. Liu, A. Lee, Q. Huang, F. Wang, J. Chao, H. Liu, J. Li et al., Nat. Commun. 10, 1006 (2019).
9. B. Alberts, A. Johnson, J. Lewis, D. Morgan, and M. Raff, Molecular Biology of the Cell (Garland Science, Taylor & Francis Group, New York, 2014).
10. J. Niu, R. Hili, and D. R. Liu, Nat. Chem. 5, 282 (2013).
11. D. Kong, W. Yeung, and R. Hili, ACS Comb. Sci. 18, 355 (2016).
12. A. E. Stross, G. Iadevaia, D. Núñez-Villanueva, and C. A. Hunter, J. Am. Chem. Soc. 139, 12655 (2017).
13. J.-F. Lutz, Sequence-Controlled Polymers (Wiley-VCH, Weinheim, Germany, 2018).
14. D. Núñez-Villanueva, M. Ciaccia, G. Iadevaia, E. Sanna, and C. A. Hunter, Chem. Sci. 10, 5258 (2019).
15. D. Núñez-Villanueva and C. A. Hunter, Acc. Chem. Res. 54, 1298 (2021).
16. J. Cabello-Garcia, W. Bae, G.-B. V. Stan, and T. E. Ouldridge, ACS Nano 15, 3272 (2021).
17. F. T. Wall, J. Am. Chem. Soc. 63, 1862 (1941).
18. F. T. Wall, J. Am. Chem. Soc. 66, 2050 (1944).
19. F. R. Mayo and F. M. Lewis, J. Am. Chem. Soc. 66, 1594 (1944).
21. S. Whitelam, R. Schulman, and L. Hedges, Phys. Rev. Lett. 109, 265506 (2012).
22. M. Nguyen and S. Vaikuntanathan, Proc. Natl. Acad. Sci. U. S. A. 113, 14231 (2016).
23. J. M. Poulton, P. R. ten Wolde, and T. E. Ouldridge, Proc. Natl. Acad. Sci. U. S. A. 116, 1946 (2019).
24. J. M. Poulton and T. E. Ouldridge, New J. Phys. 23, 063061 (2021).
25. J. Juritz, J. M. Poulton, and T. E. Ouldridge, J. Chem. Phys. 156, 074103 (2022).
26. P. Sartori and S. Pigolotti, Phys. Rev. Lett. 110, 188101 (2013).
27. P. Sartori and S. Pigolotti, Phys. Rev. X 5, 041039 (2015).
28. M. Sahoo, N. Arsha, P. R. Baral, and S. Klumpp, Phys. Rev. E 104, 034417 (2021).
29. Y.-S. Song, Y.-G. Shu, X. Zhou, Z.-C. Ou-Yang, and M. Li, J. Phys.: Condens. Matter 29, 025101 (2016).
30. Y. Song and C. Hyeon, J. Phys. Chem. Lett. 11, 3136 (2020).
31. Q.-S. Li, P.-D. Zheng, Y.-G. Shu, Z.-C. Ou-Yang, and M. Li, Phys. Rev. E 100, 012131 (2019).
32. S. Pigolotti and P. Sartori, J. Stat. Phys. 162, 1167 (2016).
33. F. Wong, A. Amir, and J. Gunawardena, Phys. Rev. E 98, 012420 (2018).
34. W. D. Piñeros and T. Tlusty, Phys. Rev. E 101, 022415 (2020).
38. D. Andrieux and P. Gaspard, J. Chem. Phys. 130, 014901 (2009).
39. R. Rao and L. Peliti, J. Stat. Mech.: Theory Exp. 2015, P06001.
40. K. Banerjee, A. B. Kolomeisky, and O. A. Igoshin, Proc. Natl. Acad. Sci. U. S. A. 114, 5183 (2017).
41. D. Chiuchiù, Y. Tu, and S. Pigolotti, Phys. Rev. Lett. 123, 038101 (2019).
42. T. E. Ouldridge and P. Rein ten Wolde, Phys. Rev. Lett. 118, 158103 (2017).
43. J. J. Hopfield, Proc. Natl. Acad. Sci. U. S. A. 71, 4135 (1974).
45. J. D. Mallory, O. A. Igoshin, and A. B. Kolomeisky, J. Phys. Chem. B 124, 9289 (2020).
46. A. Murugan, D. A. Huse, and S. Leibler, Phys. Rev. X 4, 021016 (2014).
47. A. Murugan, D. A. Huse, and S. Leibler, Proc. Natl. Acad. Sci. U. S. A. 109, 12034 (2012).
48. Q. Yu, A. B. Kolomeisky, and O. A. Igoshin, J. R. Soc., Interface 19, 20210883 (2022).
49. M. Sahoo and S. Klumpp, J. Phys.: Condens. Matter 25, 374104 (2013).
50. M. Ehrenberg and C. Blomberg, Biophys. J. 31, 333 (1980).
51. V. Galstyan and R. Phillips, J. Phys. Chem. B 123, 10990 (2019).
52. P. Gaspard and D. Andrieux, J. Chem. Phys. 141, 044908 (2014).
54. T. L. Hill, Proc. Natl. Acad. Sci. U. S. A. 85, 2879 (1988).
55. J. G. Kemeny and J. L. Snell, Finite Markov Chains (Springer, New York, 1983).
56. V. Anantharam and P. Tsoucas, Stat. Probab. Lett. 8, 189 (1989).
57. A. Wachtel, R. Rao, and M. Esposito, New J. Phys. 20, 042002 (2018).
58. H.-H. Kohler and E. Vollmerhaus, J. Math. Biol. 9, 275 (1980).
59. F. Cady and H. Qian, Phys. Biol. 6, 036011 (2009).
61. M. Esposito, Phys. Rev. E 85, 041125 (2012).
62. T. E. Ouldridge, R. A. Brittain, and P. R. ten Wolde, in The Energetics of Computing in Life and Machines, edited by C. Kempes, D. H. Wolpert, P. F. Stadler, and J. A. Grochow (SFI Press, Santa Fe, NM, 2019), pp. 307–351.
63. T. M. Cover and J. A. Thomas, Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ, 2006).
64. M. Esposito, K. Lindenberg, and C. Van den Broeck, J. Stat. Mech.: Theory Exp. 2010, P01008.
65. D. T. Gillespie, J. Comput. Phys. 22, 403 (1976).
66. D. Chiuchiù, J. Ferrare, and S. Pigolotti, Phys. Rev. E 100, 062502 (2019).
67. M. Das and H. Kantz, Phys. Rev. E 103, 032110 (2021).
68. V. Galstyan, K. Husain, F. Xiao, A. Murugan, and R. Phillips, eLife 9, e60415 (2020).
70. D. A. Harville, Matrix Algebra from a Statistician’s Perspective (Springer, New York, 1998).

Supplementary Material