The steepest-entropy-ascent quantum thermodynamic (SEAQT) framework is used to explore the influence of heating and cooling on polymer chain folding kinetics. The framework predicts how a chain moves from an initial non-equilibrium state to stable equilibrium along a unique thermodynamic path. The thermodynamic state is expressed by occupation probabilities corresponding to the levels of a discrete energy landscape. The landscape is generated using the Replica Exchange Wang–Landau method applied to a polymer chain represented by a sequence of hydrophobic and polar monomers with a simple hydrophobic-polar amino acid model. The chain conformation evolves as energy shifts among the levels of the energy landscape according to the principle of steepest entropy ascent. This principle is implemented via the SEAQT equation of motion. The SEAQT framework has the benefit of providing insight into structural properties under non-equilibrium conditions. Chain conformations during heating and cooling change continuously without sharp transitions in morphology. The changes are more drastic along non-equilibrium paths than along quasi-equilibrium paths. The SEAQT-predicted kinetics are fitted to rates associated with the experimental intensity profiles of cytochrome c protein folding with Rouse dynamics.
I. INTRODUCTION
There has been considerable interest in recent decades in computational models of polymer folding. Folding models have been used to provide information about (i) the initial collapse of polymer chains and the formation of transient structures,1–3 (ii) the behavior of chains of various lengths and morphologies,4–6 (iii) the effects of geometric constraints,7–9 (iv) monomer site mutations,10 and (v) protein adsorption to surfaces.11 Studies of single polymer chain folding have explored transitions from crystalline or globular conformations at low temperatures to extended or coil conformations at high temperatures.
These transitions are sometimes correlated with maxima or minima in plots of the specific heat vs temperature.4,5,12,13 However, since the specific heat is an equilibrium property, it necessarily applies to quasi-equilibrium conditions (i.e., infinitesimally slow temperature changes), so specific heat is not likely to be a reliable predictor of conformation transitions during realistic heating and cooling rates. In addition, folding kinetics are difficult to determine from computations because polymer chain relaxation times range between milliseconds and seconds. These times are beyond the accessible range of atomistic models, such as molecular dynamics.1,2,14 Although Monte Carlo simulations can model folding kinetics,15–19 they are susceptible to becoming trapped in local metastable conformations, and they are sensitive to the starting conditions. Consequently, it is usually necessary to average many Monte Carlo simulations to analyze even small conformation changes.
The steepest-entropy-ascent quantum thermodynamic (SEAQT) framework20–46 is an alternative modeling strategy that can address these simulation challenges. For a given system, this framework predicts a thermodynamically unique kinetic path from any occupiable initial state to stable equilibrium. It utilizes a discrete energy spectrum generated by a Monte Carlo method, but folding kinetics is predicted separately in thermodynamic state space from a deterministic equation of motion—not a Monte Carlo simulation. The kinetic path found by the SEAQT equation of motion proceeds from an initial state to a global stable state; it cannot remain stuck in metastable configurations nor is its path prediction subject to statistical uncertainties that would arise from predicting such a path from averaged Monte Carlo simulations. The sole statistical uncertainties that arise are those from the Monte Carlo method used to generate the energy spectrum.
The SEAQT equation of motion is derived from the principle of steepest-entropy-ascent, which has been suggested by Beretta47 as a fourth law of thermodynamics. This equation of motion predicts the non-equilibrium occupation probability distribution at every instance of time along the kinetic path. This leads to a straightforward description of the energy and entropy evolution in time. Moreover, tracking the change in the probability distribution makes it possible to calculate the non-equilibrium evolution of structural parameters, such as radius of gyration, tortuosity, and end-to-end distance.27 The time evolution of such structural parameters provides an average physical representation of the chain at each point along the non-equilibrium kinetic path predicted by the equation of motion. Unlike properties commonly discussed in the literature, properties along the kinetic path are not limited by the usual quasi-equilibrium assumption.
In this contribution, we employ the Replica-Exchange Wang–Landau algorithm to generate an energy landscape for a 58-monomer protein chain48 and apply the SEAQT equation of motion to predict polymer chain folding along different non-equilibrium paths associated with either heating or cooling. These paths are then compared with quasi-equilibrium paths, which require no equation of motion and are simply a set of consecutive global stable equilibrium states. Once a given path is known, chain folding is characterized by the expected chain energy, radius of gyration, tortuosity, and end-to-end distance.
The remainder of this paper is organized into the following sections. In Sec. II, we begin with a description of the overall SEAQT framework. A non-Markovian Monte Carlo approach used to generate the energy landscape is presented as is the SEAQT equation of motion. This is followed by a description of how the state space results generated by the equation of motion are linked to the polymer chain conformations. Section III presents how different structural parameters (i.e., the radius of gyration, the tortuosity, and the end-to-end distance) evolve during cooling and heating along quasi-equilibrium and non-equilibrium paths. The evolution of polymer conformations is presented as well. A discussion of these results is then given in Sec. IV followed by a set of conclusions in Sec. V.
II. METHOD
Applying the SEAQT framework involves two essential steps. First, a discrete energy landscape is generated using an appropriate model: in this case, it is a lattice model.49 The system is taken to be a single protein chain of fixed length, and each of the possible arrangements of the chain constitutes a conformation with discrete energy, Ej. Borrowing terminology from quantum mechanics, we call Ej an “energy eigenlevel” (or simply “eigenlevel” for short); the combination of a chain conformation and its corresponding energy eigenlevel is an “eigenstate.” Many chain conformations (or configurations) have the same energy. The number of eigenstates an eigenlevel has is its degeneracy, g. The energy dependence of the degeneracy, g(Ej), is sometimes called a density of states; here, we refer to it as the energy degeneracy. The combination of the energy spectrum and associated degeneracies is referred to as the energy landscape.
Second, once the energy landscape is established, an initial state is chosen, and the SEAQT equation of motion is solved over the landscape. This equation of motion provides the non-equilibrium path through thermodynamic state space from the initial state to that of global stable equilibrium (or stable equilibrium for short). The “states” in this context are thermodynamic states, which are not the same as eigenstates. Continuing with the quantum mechanical convention, the energy of a thermodynamic state is an expectation value calculated from the eigenlevel energies and their associated occupation probabilities. The time-dependence of the occupation probabilities is obtained from the SEAQT equation of motion.
The details behind these two steps are provided in the subsequent sections. Section II A describes the Hamiltonian used to calculate the eigenlevels of a particular polymer chain, and Sec. II B describes the Wang–Landau algorithm used to construct the energy landscape. This is followed in Sec. II C by a description of the equation of motion. Section II D then provides an explanation of how the predictions from the thermodynamic state space through which the SEAQT equation moves are linked to the expected conformations of the polymer chain.
A. Energy landscape
The eigenlevels that make up the energy landscape of a 58–monomer chain48 are calculated with the hydrophobic-polar (HP) model formulated by Lau and Dill.49 In this three-dimensional cubic lattice model, the polymer chain is composed of amino acids with either hydrophobic (nonpolar) or hydrophilic (polar) monomers. The energy of a particular chain conformation is calculated from the sum of the attractive interactions at hydrophobic contacts along folded segments of the chain. This energy, or Hamiltonian, is expressed by
where the quantity ɛn,m represents an interaction energy between two monomers on nearest-neighbor sites of the cubic lattice. This term alternates between 0 and ɛHH according to a Boolean switch, bn,m, which is either 0 or 1 depending upon the identities of the n and m monomers. When n and m are (a) both polar monomers, (b) a polar–hydrophobic pair, or (c) consecutive monomers along the chain (i.e., covalently bonded monomers), or (d) there is no monomer neighbor at m, then bn,m = 0 and ɛn,m is set to 0. When n and m are two hydrophobic monomers that are not consecutive monomers along the chain (i.e., n and m are hydrophobic monomers brought into contact by chain folding), then bn,m = 1. In this circumstance, ɛn,m is set equal to the nonzero interaction energy, ɛHH. For the computations here, ɛHH is given the value −1 for convenience. The first summation in Eq. (1) is over the N = 58 monomers of the polymer chain while the second summation is over the Nc first-nearest-neighbor sites to the nth monomer in the chain. For a cubic lattice, Nc = 6. Only adjacent, non-bonded, hydrophobic monomers contribute to the Hamiltonian.
To generate the energy landscape of possible eigenstates (eigenenergies and conformations), a starting conformation is modified through a non-Markovian Monte Carlo procedure using the Wang–Landau algorithm.4,5 The algorithm performs a random walk through possible conformations and maps both the system’s spectrum of eigenenergies and their respective degeneracies. Following previous investigators,4,5,50 allowable chain modifications in the Monte Carlo procedure include pull moves, reptation moves, rebridging moves, and pivot moves. The chosen distribution of movements is 75% pull and reptation moves, 23% rebridging moves, and 2% pivot moves. These percentages were chosen by validating the degeneracy estimated with the Replica Exchange Wang–Landau algorithm against the exact degeneracy for the 14-monomer chain studied by Bachmann and Janke.51
Once the energy landscape is established, thermodynamic states are specified by the occupation probabilities of the energy eigenlevels at a given instant of time. These occupation probabilities can, for example, be predicted for a non-equilibrium path by the SEAQT equation of motion. Each thermodynamic state has energy ⟨E ⟩ = ∑jpj Ej and entropy ⟨S ⟩ = ∑jpj Sj where the angle brackets represent expectation values, Ej and Sj are the energy and entropy of the jth eigenlevel, respectively, and pj represents the occupation probability of eigenlevel j. If the set of all pj’s satisfies a canonical distribution, then they represent stable equilibrium state; if they do not, then they represent a non-equilibrium state.
B. Wang–Landau algorithm
The Wang–Landau algorithm employs entropic sampling to establish a system’s discrete energy spectrum and degeneracies. The algorithm52–54 performs a random walk through the energy eigenlevels and estimates the degeneracies from the fact that the sampling probability of the Ej level approaches the inverse of the degeneracy if the eigenlevels are sampled uniformly: . The random walk uses two counters. The first generates a histogram of the visits to each energy eigenlevel and the second provides a means for updating the degeneracy of the eigenlevels. With each visit to a level, the histogram counter is incremented by 1, and the degeneracy is updated by a modification factor, f. Once all of the eigenlevels have been visited uniformly (a flat histogram), the modification factor is reduced, the histogram is reset, and the process is repeated until the modification factor reaches a small, predetermined threshold. Flatness is achieved when the minimum histogram value is greater than or equal to the average histogram value multiplied by a flatness criterion, which typically ranges between 80% and 100%. In this work, the flatness criterion is set to 99.2% based upon computational accuracy and efficiency considerations.4 The threshold criterion for the modification factor52–54 is chosen as ln(f) < 10−8.
Although the Wang–Landau algorithm is applicable to any microcanonical ensemble, the characteristics of some physical systems slow the method’s convergence. In polymer systems, the number of difficult-to-reach eigenlevels makes it hard to accurately estimate the degeneracy. This problem is mitigated by a parallelized version of the algorithm called the Replica Exchange Wang–Landau algorithm. Replica Exchange Wang–Landau divides the energy landscape into smaller energy ranges, or windows, with multiple Monte Carlo walkers moving independently in each energy window. To ensure convergence of the degeneracy for the whole energy landscape, the energy windows partially overlap each other, and information is shared periodically among walkers from different energy windows.52–55
Figure 1 shows the degeneracy of the 58-monomer chain estimated with the Replica Exchange Wang–Landau algorithm. The algorithm precision is indicated at regular energy intervals by error-bars representing ±1 standard deviation of the degeneracy determined from six independent simulations. The algorithm was implemented with three energy windows: one from 0 to −35 (units of ɛHH) containing one walker, a second from −3 to −40 containing one walker, and a third from −6 to −44 containing eight walkers. A large number of random walkers were used in the third energy window to accelerate the search for the lowest eigenlevel. Figure 2 represents the equilibrium specific heat calculated from the degeneracies of Fig. 1. The error-bars are the upper and lower values of specific heat calculated from an “upper” and a “lower” degeneracy curve. These two degeneracy curves are obtained from the mean curve of Fig. 1 ±1 standard deviation.
C. SEAQT equation of motion
The thermodynamic state of the polymer system is specified by a set of occupation probabilities associated with the energy eigenlevels of the system. These probabilities, pj, change with time as the system evolves from a thermodynamic state to a thermodynamic state. Given the system’s time-independent energy landscape (its discrete energy spectrum and associated set of degeneracies established with the Replica-Exchange Wang–Landau algorithm, Fig. 1), the equation of motion of the SEAQT framework, which is deterministic and not stochastic, predicts the time evolution of a set of non-equilibrium probability distributions along a unique thermodynamic path starting from some arbitrary non-equilibrium state and ending in stable equilibrium. It does so without any a priori assumption of equilibrium. In other words, these non-equilibrium distributions are not equilibrium-derived. To the authors’ knowledge, the Wang–Landau algorithm does not have any implicit dependence upon the equilibrium characteristics of the simulated system. In fact, the canonical distribution associated with equilibrium is unavailable until after the algorithm finishes estimating the energy landscape.
Another distinct feature of the equation of motion is that it requires no mechanistic detail about the system in order to calculate a kinetic path. The equation of motion can be applied to any energy landscape—including a fictitious one with random eigenlevels and degeneracies—because the equation of motion only depends upon the energy eigenlevels and their degeneracies. It is not affected by how the landscape is derived or even what physical system it describes. Thus, the SEAQT framework is able to predict a polymer chain’s morphological evolution solely on the basis of the principle of the steepest entropy ascent. This principle reflects the redistribution of energy among available eigenlevels at every instant of time consistent with the laws of thermodynamics and mechanics. The steepest-entropy-ascent postulate in the framework is a variational principle that leads to the SEAQT equation of motion.39 The question, of course, arises as to why the variational extremum should be a maximum rather than a minimum.
It has been shown previously56–60 that both linear and nonlinear non-equilibrium thermodynamics can be deduced from Ziegler’s maximum entropy production principle. This cannot be done with Prigogine’s minimum entropy production principle,61 which is a special case of the Onsager–Gyarmati principle of linear non-equilibrium thermodynamics. Furthermore, much evidence suggests that processes in nature maximize entropy production at each instant of time,32,62 which is consistent with Ziegler’s principle as well as the steepest-entropy-ascent principle of Beretta.39–42,47,63,64 In addition, as noted by Cahn and Mullins,65 assuming that entropy production is minimized fails in very simple cases such as that of (i) 1D steady state heat conduction and (ii) 1D steady state mass diffusion in the presence of an externally maintained gradient. In contrast, as shown in Appendix B of Li, von Spakovsky, and Hin24 and by Li and von Spakovsky in Ref. 21, the SEAQT equation of motion leads to the correct steady state linear temperature and concentration profiles, respectively. This is done using the steepest-entropy-ascent principle only. No assumption is made of a particular kinetic mechanism, such as Fourier’s law for transient thermal transport or Fick’s law for transient mass transport.27
Originally introduced and applied to small quantum systems in the 1970s and 1980s,41–46 the SEAQT framework has been extended over the past decade and a half to make it a practical tool for predicting non-equilibrium thermodynamic paths at all spatial and temporal scales. As a result, it has been applied to a wide variety of chemical and material problems.20–37
The SEAQT equation of motion for a simple quantum system41 (or a model that mimics a quantum system such as an Ising or a Potts model) is
where is the probability density operator representing the thermodynamic state of the system, t is the time, ℏ is the modified Planck constant, and is the Hamiltonian operator. If the system is classical, becomes the set of occupation probabilities, pj. The left side of this equation and the first term on the right side constitutes the von Neumann form of the time-dependent part of the Schrödinger equation of motion. The state evolution of this part of the equation is strictly reversible. The final term on the right, in which τ is a relaxation parameter and is a dissipation operator, is the SEAQT addition. This term adds the nonlinear dynamics of irreversible state evolution that are absent from the von Neumann or Schrödinger equation. Equation (2) is valid for any irreversible process20,39,40,64 and although originally simply postulated by Beretta et al.,41 it subsequently has been derived via a variational principle along the direction of steepest-entropy-ascent by a constrained gradient descent in Hilbert space while preserving the energy and occupation probabilities.39
Given the aforementioned HP model for the polymer chain (Sec. II A), the system is strictly classical (no quantum correlations). As a result, and commute since they are diagonal in the energy eigenvalue basis.20–23,40,64 For the case when the only generators of the motion (those operators that must be conserved) are the Hamiltonian and identity operator, the equation of motion can be expressed as
where
and pj, Ej, and gj are the occupation probability, energy, and degeneracy, respectively, of the jth energy eigenlevel. Quantities in angle brackets, ⟨·⟩, represent expectation values of properties, such as the energy ⟨E⟩, entropy ⟨S⟩, and their product ⟨ES⟩. In this formulation, the von Neumann definition of entropy is used so that . This form is chosen because it is the only one that satisfies all the characteristics required by the entropy of thermodynamics.66
Equation (3) is only applicable to an isolated system, but it can be extended to account for multiple subsystems interacting within a larger isolated system.20 In doing so, mass and heat transfer between sets of subsystems can be modeled. For example, Eq. (3) can be modified to account for a heat interaction between two subsystems A and B so that the equation of motion of subsystem A becomes20
Using the co-factors C1, , and C3 determined from the first line of the determinant of the numerator and assuming the hypo-equilibrium condition developed by Li and von Spakovsky,20 this equation of motion for subsystem A reduces to the following form20
with the variables t and τ replaced by a dimensionless time defined as . The relaxation parameter, τ, can either be assumed constant or taken to be a function of the time-dependent occupation probabilities, pj, represented by the vector . In what follows: τ is assumed to be a constant that scales the dimensionless time t* to real time. An equation of motion for subsystem B is written in a similar fashion. However, if subsystem B represents a large temperature reservoir, its state does not change in time so that its equation of motion is not needed, and the co-factor ratio in Eq. (6) reduces to a function of the temperature of the reservoir, TR, such that20,21
Now, in order to establish the initial condition used by the SEAQT equation of motion for the heating or cooling cases considered here, a procedure is needed to generate the occupation probabilities of this initial thermodynamic state. Probabilities for an initial non-equilibrium thermodynamic state can be generated using a canonical distribution (for a different temperature), a partially canonical distribution, and a perturbation function.
The equation for the stable equilibrium probabilities, , of a canonical distribution is given by
where βse = 1/(kBTse), and Tse is the temperature of the canonical, i.e., stable equilibrium, state. For a chosen Tse, Eq. (8) provides all of the . The expected energy for the stable equilibrium state with these occupation probabilities is
For a partially canonical state, the equation for the partial equilibrium probabilities, , is written as
where βpe has units of reciprocal energy and could be interpreted as being inversely proportional to a local temperature associated with the partially canonical state (metastable equilibrium state). The δj takes values of 1 or 0 depending upon whether or not the energy eigenlevel is assumed to be occupied. Each set of 0 and 1 values defines a vector of values with the occupations of a particular partially canonical state. The expected energy for the partial equilibrium state with these occupation probabilities is
Once the and the are known for a given system energy, ⟨E⟩, the following perturbation function can be used to generate a non-equilibrium probability distribution for time t = t0:
Here, λ is a number between 0 and 1. The closer it is to 1, the further the initial non-equilibrium state is from stable equilibrium. The non-equilibrium probability distribution of Eq. (12) provides the thermodynamic state used for the initial condition of the SEAQT equation of motion, Eq. (6).
The properties and conformations resulting from the non-equilibrium distributions predicted by the SEAQT equation of motion along non-equilibrium paths are compared in Sec. III below to a corresponding set of properties and conformations resulting from quasi-equilibrium paths that consist only of a set of neighboring stable equilibrium states. The quasi-equilibrium paths and properties are based on the same system energy landscape used by the SEAQT equation of motion but are not predicted with the equation of motion; they consist only of stable equilibrium, i.e., canonical, states that can be generated using Eq. (8).
D. Linking state space to chain conformations
Previous Monte Carlo simulations of polymer chains provide quasi-equilibrium properties.4,55,67 The SEAQT equation of motion allows one to predict properties along any non-equilibrium thermodynamic path. Its predictions are based exclusively on the way energy shifts among eigenlevels according to the steepest-entropy-ascent principle. The computational overhead for the equation of motion is very modest since it is a first-order ordinary differential equation in time. In addition, when the energy landscape is topographically rough, there is no danger of the system becoming trapped in a metastable state (i.e., a partially canonical state)2,27 because the time derivatives of all the eigenlevel occupation probabilities do not become zero until the expectation values of the system energy and entropy—the thermodynamic state—satisfies the global stability criterion.
Nevertheless, in order to describe changes in chain conformations or changes in specific physical properties with time, it is necessary to work backward from the occupied eigenlevels and the expected values of thermodynamic states to representations (conformations) of the eigenstates. Because there are an astronomical number of chain conformations associated with each energy eigenlevel, the procedure outlined in Ref. 27 is used to link the thermodynamic states to expected conformations through microstructural descriptors. The descriptors are parameters calculated from a chain conformation. Those used here are the radius of gyration (Rg), the tortuosity (ζ), and the end-to-end distance (RE).4,55,67
The radius of gyration is calculated via the expression,
where N is the number of monomers in the system, ri is the three-dimensional location vector of a given monomer, and rcm is the calculated location of the center of mass of the chain. All the monomers are assumed to have the same mass. The radius of gyration provides a quantitative assessment of the distribution of mass of the polymer chain.
The tortuosity employed here is defined as
where si is a local vector at the ith monomer that reflects the cumulative turns along the chain and is the mean of these vectors. The vector si has components,
with the “turn vector,” wj, given by
To calculate si, the cross product of vectors from the current monomer to the next two monomers in the sequence (i.e., rj,j+1 = rj+1 − rj and rj,j+2 = rj+2 − rj) are determined, and their x, y, and z components summed. Whenever the polymer chain bends, it produces a non-zero wj that is oriented perpendicular to the plane of the turn. The tortuosity is determined from the difference between the cumulative mean turns along the entire length of the chain, and it represents a measure proportional to the number of turns present along the chain segments. It is zero for a straight chain.
The end-to-end distance, RE, given by
is the distance from the head of the chain, with coordinates r0 = {x0, y0, z0}, to the tail of the chain, with coordinates rN = {xN, yN, zN}.
A direct link between state space and chain conformations would involve associating the structural parameters with the conformations in each energy eigenlevel, but this approach is impractical because the degeneracy of each Ej is far too large to record but some small fraction of the chain conformations.27,51 Instead, a strategy is employed to identify and record only those conformations that lie along the kinetic path. First, at each energy eigenlevel sampled by the Replica Exchange Wang–Landau algorithm, the arithmetic average of the descriptors of unique visits is found for Rg, τ, and RE. These averages are determined from over 1011 unique visits to each eigenlevel (degeneracies are in the range 108 < g < 1030). Wust and Landau4 also used this re-sampling technique and noted that it “cover[s] conformational space more uniformly and faster (including conformational regions of low energy) than with standard multicanonical sampling.”
Next, the SEAQT equation of motion is solved for the occupation probabilities of the energy eigenlevels along a given kinetic path. Once these predicted probabilities are known, they are multiplied by the arithmetic averages of Rg, τ, and RE at each energy eigenlevel and summed to obtain expectation values of these descriptors along the kinetic path. Finally, to determine representative chain conformations along the path, the Replica Exchange Wang–Landau algorithm is run a second time to record specific conformations. During this second Monte Carlo simulation, conformations are recorded only for occupied eigenlevels, and descriptor values close to the expected values obtained from the SEAQT equation of motion. Since the number of occupied eigenlevels is much less than the total eigenlevels, and matching expected descriptor values eliminate all but a few of the degenerate conformations, the procedure yields a small set of conformations that evolve smoothly with time along the kinetic path.
Scaling the rate with which the simulated system traverses the kinetic path to real-time can be accomplished by linking τ to physical quantities like a diffusion coefficient or other experimental data. How this is done is explained in Sec. IV.
III. RESULTS
The evolution of the 58-monomer chain is tracked for two types of thermal interactions: an exchange of heat with a low-temperature reservoir (cooling) and an exchange of heat with a high-temperature reservoir (heating). For each of these cases, two kinetic paths are considered: (I) a quasi-equilibrium path and (II) a non-equilibrium path. The latter evolves according to the equation of motion from some initial non-equilibrium thermodynamic state to stable equilibrium with the thermal reservoir. The quasi-equilibrium path is an idealized reversible path for which equilibrium is maintained continually until thermal equilibrium with the thermal reservoir is achieved. The equation of motion is not needed to describe this path because occupation probabilities obey a canonical distribution [Eq. (8)] at each temperature along the path. For the non-equilibrium path, however, the initial non-equilibrium thermodynamic state is established using Eqs. (8)–(12), and the final stable equilibrium state is set by the choice of the reservoir temperature. The occupation probabilities at each instant of time along the path between these two states must be found from the equation of motion. The final thermodynamic state predicted by the equation of motion coincides with the equilibrium canonical distribution at the reservoir temperature.
Results for the quasi-equilibrium path (I) and non-equilibrium path (II) for cooling are shown in Figs. 3–5 and for heating in Figs. 6–8. The thermodynamic initial and final states are indicated by subscripts “i” and “f,” respectively. For the two cooling paths seen in Fig. 3, the initial thermodynamic states for the quasi-equilibrium and non-equilibrium paths are different: Ii ≠ IIi. The low-temperature reservoir is set to the same temperature for both paths so that their final states, If and IIf, are the same stable equilibrium state. For the heating processes of Fig. 6, the initial states for the quasi-equilibrium and non-equilibrium paths have the same expected energy but different probability distributions: (Ii ≠ IIi), so they are different thermodynamic states. The high-temperature reservoir is set to the same temperature for both paths, so again their final states, If and IIf, are the same.
In both Figs. 3 and 6, the solid and dotted curves together represent the curve of stable equilibria; the slope of this curve is the thermodynamic temperature, . The ⟨E⟩ vs ⟨S⟩ curve of stable equilibria can be calculated from the canonical probability distribution over a range of temperatures, Tse. The solid curve (red online) represents the quasi-equilibrium path chosen for the cooling case. The dashed gray curve is the non-equilibrium path predicted by the SEAQT equation of motion. For both paths, the black circles and squares are states at selected times along the paths from the initial state to the final equilibrium state. During cooling along either path in Fig. 3, the energy and entropy of the polymer decrease as energy is extracted from the polymer and dumped into the reservoir. During heating (see Fig. 6), the energy and entropy of the polymer increase as energy flows from the reservoir into the polymer.
To compute a single non-equilibrium path, the SEAQT equation of motion (a system of first-order ordinary differential equations in time) is solved using Matlab’s ODE45 numerical differential equation solver. Solving the system requires about a minute of the central processing unit (CPU) time on a processing thread of a Mac computer using an Intel i7 processor 8750H clocked at 2.2 GHz. The solution speed can be adjusted by varying the mass matrix through Matlab’s odeset options with the parameters RelTol and AbsTol set to 1−10 and 1−20, respectively. Of course, the total computational time to calculate SEAQT kinetics must also include the time needed to build the energy landscape. This is the most resource-intensive step in the calculations. For the degeneracy shown in Fig. 1, the arithmetic averages of the descriptors and representative chain conformations required approximately one week of CPU time, although the additional simulations needed to generate the error-bars extended the computational time to about one month of CPU time.
A. Interactions with a low-T reservoir
The final states, If and IIf, in Fig. 3 correspond to a low-temperature reservoir at a non-dimensional temperature of kBTR/ɛHH = 0.125 where ɛHH is the negative of the interaction energy in Eq. (1) and kB = 1.4 The two paths through state space in Fig. 3 indicate the difference between an ideal, reversible path, i.e., the quasi-equilibrium path, and a real path, i.e., the non-equilibrium path, during cooling. Next, we explore how the parameters associated with the chain descriptors and conformation evolve along these two kinetic paths.
The time evolution of the descriptors Rg, τ, and RE are shown in Fig. 4. The SEAQT equation of motion is a deterministic equation that does not introduce error into the calculation of descriptor properties, so the error-bars in Fig. 4 reflect the range of values expected from uncertainty in the degeneracies of Fig. 1. The upper error-bars in Fig. 4 were calculated using the upper limit of the degeneracies in Fig. 1 [i.e., ln g(Ej) + σ], and the lower error-bars in Fig. 4 were calculated using the lower limit of the degeneracies [ln g(Ej) − σ].
The expectation values for the radius of gyration, Rg, nearly overlap during quasi-equilibrium cooling and non-equilibrium cooling. Although the curves are similar, the initial value of Rg for the non-equilibrium curve is significantly larger, suggesting it has a less-pronounced hydrophobic core and, thus, a wider distribution of mass around its core than the initial conformation for the quasi-equilibrium path. The expectation values for Rg along both paths decrease rapidly at the beginning of the evolutions and then very gradually as the stable equilibrium value at the reservoir temperature is approached.
For the tortuosity descriptor, there is an initial rapid increase in τ during non-equilibrium cooling followed by a more gradual change to the final equilibrium value. There is a somewhat less rapid initial change followed by more moderate evolution along the quasi-equilibrium path, although the total change in tortuosity along the quasi-equilibrium path is about half the change along the non-equilibrium path. It is not clear why initially the quasi-equilibrium tortuosity exhibits a non-monotonic increase and then decrease, but similar changes in the slope of tortuosity vs temperature curves have been reported in other studies.4
The end-to-end distance, RE, during cooling follows a trend similar to the radius of gyration for the non-equilibrium path. For the quasi-equilibrium path, there is a small initial rapid decrease followed by a gradual increase to the stable equilibrium value. The difference between the RE values of non-equilibrium and quasi-equilibrium cooling is initially large but sharply decreases with time. For both RE paths, there is little change after t* ∼ 0.5 at which time RE essentially has reached the stable equilibrium value at the low-temperature reservoir.
The expected chain conformations along the non-equilibrium and quasi-equilibrium paths of Fig. 3 are presented in Fig. 5. The upper row shows six expected conformations along quasi-equilibrium path I in Fig. 3 at the points indicated by squares in that figure. The solid arrow in Fig. 5 indicates the direction of evolution in time from a representative initial configuration on the left to stable equilibrium on the right. The lower row shows six expected conformations along non-equilibrium path II in Fig. 3 at the points indicated by circles in that figure. The dashed arrow in Fig. 5 indicates the direction of evolution in time from a representative initial configuration on the left to stable equilibrium on the right. These conformations are chosen from expected values for the energy and the structural descriptors, Rg, τ, and RE, for individual states along the calculated paths. The conformations shown are not simply snapshots taken from a continuous Monte Carlo simulation but rather are structures that represent expected conformations of the chosen thermodynamic states along the calculated path. For the quasi-equilibrium path, the initial chain conformation (left-most image) has a well-formed hydrophobic core (the darker hued orange spheres represent the hydrophobic monomers) with only a few free monomers apart from the center of mass. The initial state for the non-equilibrium path has a smaller hydrophobic core, and segments of non-interacting monomers extending away from the core are more apparent. The two paths have different initial states, but the right-most image on both rows are same because the final stable equilibrium state for the quasi-equilibrium and non-equilibrium paths are identical. Generally speaking, the quasi-equilibrium conformations begin with a relatively compact core that becomes slightly denser as the chain cools to the reservoir temperature, whereas the non-equilibrium path begins with a more open conformation that eventually collapses to the same compact core.
As eigenlevel occupation evolves differently with time along the quasi-equilibrium and non-equilibrium paths, chain conformations must also evolve differently. Conformation changes along the two kinds of paths are wholly distinct even though the paths have similar descriptor values because the values do not all change at the same speed. Since only expected descriptor values are considered in this state-based method, the claim made here is not that an actual system will adopt the exactly predicted chain conformations, but instead that the chains will appear notably similar. Moreover, despite the fact that Wang–Landau chain movements—such as re-bridging moves—are random and unrelated to the sequence of changes in a particular chain, the SEAQT framework can predict chain evolution when a continuous set of descriptors are selected along the kinetic path to ensure conformations evolve smoothly.
B. Interactions with a high-T reservoir
To investigate the behavior of the 58-monomer chain during a heating process, the SEAQT equation of motion is solved with a higher reservoir temperature. In this case, heat flows from the reservoir into the polymer until the system and high-temperature reservoir equilibrate. The final states, If and IIf, in Fig. 6 correspond to a high-temperature reservoir at a non-dimensional temperature of kBTR/ɛHH = 1.8. The quasi-equilibrium (I) and non-equilibrium (II) heating paths are shown in Fig. 6. During heating, the polymer chain is expected to transition from a globular conformation at its initial state to an uncoiled conformation at the high reservoir temperature. Because the degeneracy of the eigenlevels greatly increases at higher energies (Fig. 1), the difference between the quasi- and non-equilibrium paths is also expected to be more pronounced for the heating case.
The evolution of the structural parameters associated with the quasi-equilibrium and non-equilibrium heating paths are shown in Fig. 7. The error-bars were calculated in the same way as those in Fig. 4. The expectation values for the radius of gyration, Rg, and end-to-end distance, RE, exhibit periods of rapid unfolding for both paths but are much more pronounced for the non-equilibrium path. This is then followed by a gradual increase as the system approaches the equilibrium values. The tortuosity, ζ, follows behavior similar to that of Rg and RE with the primary difference being that the slope for ζ for the two paths is negative. For all three parameters, the change along the non-equilibrium path is initially rapid with a steep slope for a relatively brief amount of time followed by a very gradual increase or decrease to the stable equilibrium value. This contrasts with the quasi-equilibrium path behavior, which initially sees a less rapid increase or decrease that, however, lasts well along the path before eventually transitioning smoothly to a very gradual asymptotic approach to the stable equilibrium value. The peculiar inflection or initial decrease seen in the tortuosity and the end-to-end distance curves during quasi-equilibrium cooling is not seen here.
During heating, the energy and entropy of the polymer increase as energy flows into the system from the reservoir. Consequently, the values of the Rg, τ, and RE descriptors evolve along the heating paths (Fig. 7) in opposite directions (with the exception of the tortuosity for the quasi-equilibrium path) to the corresponding values along the cooling paths (Fig. 4).
The expected chain conformations along the non-equilibrium and quasi-equilibrium heating paths of Fig. 6 are shown in Fig. 8. The chain conformations along the quasi-equilibrium path (upper row) and the non-equilibrium path (lower row) begin with moderately compact hydrophobic cores (left-most images) and quickly uncoil as heating takes place. As the trends in the descriptor values suggest, the chain conformations in Fig. 8 uncoil most quickly along the non-equilibrium path during heating. In addition, the differences between the chain conformations along the quasi-equilibrium and non-equilibrium paths are much more apparent during heating (Fig. 8) than during cooling (Fig. 5).
For both cooling and heating interactions with a thermal reservoir, the expected values of the property descriptors all evolve faster along the non-equilibrium paths than along quasi-equilibrium paths, especially during the initial portion of the kinetic paths. Thus, transitions in chain conformations are initially more drastic along the non-equilibrium path than along the quasi-equilibrium path.68
IV. DISCUSSION
A direct comparison of chain properties and conformations with experimental data from the literature is challenging. Sensitive measurements of structural parameters are difficult to make, and the sequence and length of monomers are difficult to control precisely. Nonetheless, even without direct comparisons, polymer folding models can elucidate aspects of chain folding that are not readily accessible to experiments. For example, it is known polymers form near-native structures through an initial collapse over a period of a few milliseconds to tens of milliseconds. This initial collapse is followed by a gradual transition toward the true native structure over hundreds of milliseconds.2,69–73 The rapid collapse and the start of the gradual transition are recovered in Figs. 4 and 5, which show significant changes in the evolution of the structural properties followed by a period of gradual change as the system approaches stable equilibrium. In fact, qualitative comparisons of calculated results for the low temperature folding path follow a trend similar to the ultra fast mixing derived experimental intensity profiles for a 103 monomer chain of cytochrome c.74 Assuming similar conformation evolution, these experimental results make it possible to scale the non-dimensional times employed here to real times and predict the kinetics of property changes. The procedure for converting the non-dimensional kinetics to dimensional kinetics is described in the Appendix, Sec. VI.
In general, structural properties during quasi-equilibrium and non-equilibrium cooling and heating differ, particularly during the initial stages of heating or cooling, and especially during heating. In addition, under high-temperature conditions or denaturing of the chain, the chain is expected to retain some of the hydrophobic core as it uncoils. This behavior is also recovered in our predictions. From Fig. 8, the representative chains unravel from the central core, which remains distinct along both paths until the system reaches the highest energy state.69
Finally, the reported descriptors could be used to optimize the statistical search through conformation space by creating bounds for chain conformations. The range of calculated values could be used in Monte Carlo simulations to reduce or constrain the search range to a small set of available conformations and potentially improve the computational time by avoiding structures prone to trapping.2,69 It is worth emphasizing, though, that once the energy landscape is generated, conformation evolution predicted with the SEAQT framework is made without reference to any kinetic folding mechanism. In addition, as mentioned previously, there is no risk of the system becoming kinetically trapped in local or metastable equilibrium conformations because the steepest-entropy-ascent principle ensures the system moves in state space toward the global maximum entropy (stable equilibrium) during either cooling or heating interactions with a thermal reservoir.
Of course, some discussion is warranted here on the efficiencies of the Wang–Landau-SEAQT approach, which has been presented. For example, Wust and Landau (e.g., Ref. 4) have shown that Wang–Landau sampling has similar to as much as 2 orders of magnitude faster operational speeds at locating the lowest eigenlevels when compared to multi-canonical methods. Furthermore, those authors have shown that the Wang–Landau convergence speed for a system with an 80% flatness criterion produces an estimated degeneracy result within a time similar to that of a single lowest-energy-configuration search by multi-canonical methods. They have also shown that Wang–Landau sampling is between an order of magnitude slower and three times faster for a range of chain sizes4 than multi-canonical methods. As to the Wang–Landau implementation used here for determining the energy landscape of the 58-monomer chain, our estimate is that it is at least an order of magnitude slower for calculating the degeneracies than what is reported in Ref. 4. However, our implementation of moves is somewhat different and the conditions we used for convergence are more stringent. Thus, for ln(f) < 3 × 10−8 and a flatness criterion of 99.2%, convergence required 5–8 days using a Ryzen 95800X processor in a desktop personal computer (PC) vs the 2.23 h with a flatness criterion of 80% reported by Landau et al., using a processor which was probably two to three times slower than that used in this work. As alluded to above, we speculate that in part the difference in computational efficiency may arise from implementation differences such as how the polymer movement options are validated although clearly the much more stringent convergence criterion also plays a significant role. As to the computational costs of solving the SEAQT equation of motion for a single non-equilibrium path, it is, as outlined at the beginning of Sec. III, insignificant compared to the time required to execute the Replica Exchange Wang–Landau algorithm.
Whatever the computational efficiencies of the Replica Exchange Wang–Landau algorithm are, the authors believe that there are real benefits to combining this algorithm with the SEAQT framework. The methodology laid out in this paper leverages the path-independent nature of the Replica Exchange Wang–Landau algorithm plus the path predictions of the SEAQT equation of motion to provide for relatively free movement across the energy space. If tens and hundreds of paths are needed, the only significant computational cost is incurred once with the Replica Exchange Wang–Landau algorithm whose results are valid for all the paths predicted by the SEAQT equation of motion. In contrast, using a simple Metropolis search for minimized structures constrained by Boltzmann temperature probabilities would, at a minimum, require several separate Monte Carlo simulations to deduce even a single thermodynamic path and thus, a nontrivial amount of time to reach the require minimized energy structures from some set of initial high energy conformations. Furthermore, a Metropolis simulation of a system is intrinsically slower, as movement from any given initial structure does not guarantee the system will encounter or progress through the necessary structures to reach the minimized energy configuration. Any simulation would require repeated fluctuations of the system structure between higher and lower energies, which may not be possible due to the low probability of energy increase given by the Boltzmann temperature constraint. Additionally, a Metropolis approach has no means of fundamentally distinguishing the thermodynamic state of one given initial random structure from another. In other words, the majority of possible paths producible by the SEAQT equation of motion simply cannot be generated with a Metropolis algorithm without prior knowledge of the degeneracy and expected descriptor values with which to place the initial system’s thermodynamic state within the likely non-equilibrium region.
In short, with a reasonable investment of time, the thermodynamically relevant information derivable from the Wang–Landau algorithm permits the calculation of stable equilibrium information and, via the SEAQT equation of motion, as many thermodynamic non-equilibrium paths as there are initial states. This combination is, thus, able to generate all the relevant expected extensive thermodynamic and structural information needed to describe a system’s conformational evolution along any non-equilibrium thermodynamic path.
V. CONCLUSIONS
Folding transitions of a simple polymer chain are studied using the principle of steepest-entropy-ascent. In conjunction with a path-independent energy landscape, the steepest-entropy-ascent principle is employed to describe chain conformations and chain properties along non-equilibrium cooling and heating paths. These calculations are performed with no prior assumption about the kinetic mechanism(s) governing folding. The Replica Exchange Wang–Landau algorithm is utilized to generate the energy landscape, which includes the necessary degeneracies and structural parameters associated with each energy eigenlevel. The kinetic path through state space is found by solving the SEAQT equation of motion. When applied to a 58-monomer chain with an amino acid sequence taken from Dill et al.,48 the following conclusions are drawn:
Chain conformations and the properties Rg, τ, and RE change more drastically along non-equilibrium paths than along quasi-equilibrium paths.
The kinetics predicted by the SEAQT equation of motion agree qualitatively with the radius of gyration and intensity data of a coiling cytochrome c protein.
The simulated folding kinetics can be made physically realistic by fitting the SEAQT relaxation parameter, τ, to experimental data.
Representative chain conformations can be constructed from expected values of the system energy and structural descriptors along any kinetic path.
SEAQT extends the current application of energy landscapes describing polymer folding by linking a verifiable and thermodynamically bound equation of motion to unique entropy-driven paths through state space.
ACKNOWLEDGMENTS
J.M. acknowledges support from the Department of Education through the Graduate Assistance in Areas of National Need Program (Grant No. P200A180016).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jared McDonald: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (supporting); Investigation (equal); Methodology (supporting); Project administration (equal); Resources (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Michael R. von Spakovsky: Formal analysis (equal); Methodology (equal); Resources (equal); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). William T. Reynolds, Jr.: Data curation (supporting); Formal analysis (equal); Funding acquisition (lead); Methodology (equal); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX: CONVERTING DIMENSIONLESS TIME
To transform the dimensionless time (t*) scale used for the figures presented in Sec. III to real time (t), the predicted results are compared to the experimental sub-millisecond folding results for a 103-monomer cytochrome c protein found in Ref. 74. As indicated earlier, . For a constant relaxation parameter value, this reduces to t* = t/τ where it is assumed that t is proportional to the experimental value, tex, for the evolution of the cytochrome system in Ref. 74. The experimental results for this system are qualitatively comparable to the radius of gyration, Rg, results75 for the quasi-equilibrium, low-temperature reservoir path up to a dimensionless time of t* = 1 in Fig. 4. The experimental time that corresponds to this is tex = 5 × 10−4 s. The proportionality constant linking t with tex is then found using Rousian dynamics, which indicates that with the diffusion coefficient given by D ∝ 1/N. Thus, t ∝ N, where N is the total number of monomers and t is found from t = (Nsim/Nex)tex = 0.0028 s or 2.8 ms. With t* = 1, the relaxation parameter τ has a value of 2.8 ms as well.
Now, if no experimental results are available, then tex can be replaced with tR determined from Rousian dynamics since or , where C is a proportionality constant. Rousian dynamics are utilized since they are the means generally used in Monte Carlo simulations to determine time evolutions. Furthermore, since the radius of gyration reported here (e.g., in Fig. 4) is dimensionless, it must be dimensionalized. This can be done using a lattice parameter, a, defined as the point in the potential well formed by a Lennard–Jones potential, i.e.,
where the attractive force changes to a repulsive one or vice versa. This effectively simplifies the first nearest neighbor hydrophobic–hydrophobic interaction as an energy well Van der Waals interaction. Mao et al.74 provide a value of σ = 3.042 Å from which a = σ(21/6) = 3.415 Å. In addition, the dimensionless energy, , in Eq. (1) can be scaled using Vα−β and a value for ɛαβ = 1.34 meV.74 However, it should be noted that this value is an order of magnitude less than what would be expected for molecular Lennard–Jones potentials since it does not consider all atoms in a single monomer.74
Returning now to Fig. 4, the change in the radius of gyration for the quasi-equilibrium, low-temperature reservoir path is about ΔRg = (ΔRg/N)N = 0.174, which when scaled to the 103 monomer cytochrome c protein becomes . Squaring and scaling using the calculated lattice parameter value gives a of about 1.11 Å2. Then, using an experimental value for the diffusion coefficient for the protein of D = 1 × 10−6 cm2/s found in Ref. 76 and the experimental time from Ref. 74 used above, the dimensionless proportionality constant for scaling becomes C ≈ 4.5 × 106. This constant can, of course, depend on a number of factors, including the temperature, the modeled system’s environment, and the available degrees of freedom.7,8,70,72 However, this value does provide a reasonable constant for scaling the present as well as future results where the real time t is unknown.
REFERENCES
“Faster” or “more drastic” here refers to the change in a property with an incremental change along the kinetic path in state space. This is not the same as the time rate with which the kinetic path is traversed.
Although the experimental results are measured using intensity profiles, the rate of change of the generated curves map directly to morphological changes due to the reduced separation of functional groups during the initial collapse. The experimental results are most comparable to the radius of gyration since it reflects the significant cluster formation seen in Fig. 4, while the tortuosity is expected to gradually increase beyond the initial collapse as the system moves closer to the actual native conformation.