We study the conformational transitions of proteins by using the hydrophobic-polar (HP) model on a square lattice. In contrast with previous studies that relied on sampling techniques, we conducted an exhaustive enumeration of all possible conformations to obtain the density of states so that exact physical quantities could be computed. We study the conformational transitions of three sequences with varying lengths and observe both the collapse and folding transitions. The transitions exhibit distinct characteristics that depend on the sequence.
I. INTRODUCTION
To function properly, proteins must fold into unique native structures, and the specific sequence of amino acids of a protein is designed for the free energy to have a unique minimum.1 In addition to the folding transition from ensemble of unfolded conformations to the native state, there can be a collapse transition from random coil to a compact globule due to non-specific hydrophobic interactions between the residues, which is a generic property of polymers.2–11
Various coarse-graining methods have been employed to obtain simple protein models to study the general physical properties of proteins, including their conformational transitions,9–11 because a computational study of proteins with all-atom models is too computationally intensive. The hydrophobic-polar (HP) protein model12–18 is obtained through a drastic simplification whereby 20 amino acids are classified into either hydrophobic (H) or polar (P) and the conformations are only allowed to be on a lattice.
Because proteins are heteropolymers with a finite-size, conformational transitions are not phase transitions that are defined in terms of singular physical quantities. Instead, they are pseudo-transitions that are defined in terms of the maxima of the corresponding physical quantities, such as the change in the radius of gyration, specific heat, etc. In order to compute these quantities, the density of states, which contains all the relevant information in the equilibrium property of the system, must first be computed.
In this work, we compute the exact density of states by enumerating all possible conformations of the lattice protein, and this stands in contrast with previous studies that relied on various sampling methods. In particular, we study the conformational transitions of the Fibonacci sequences with lengths of 13, 21, and 34 by computing various geometric and thermodynamic quantities, such as the fraction of the native contacts, the radius of gyration, the specific heat, and the zeros of the partition function. Both collapse and folding transitions are observed, and these exhibit distinct characteristics depending on the sequence.
II. MODELS AND METHODS
Random sequences have in general multiple conformations with a lowest energy, and in order to observe the folding transition, one must choose a sequence with a unique ground state,18 which is referred to as a foldable sequence. It is also preferable for there to be a large difference between the energy of the native structure and the second lowest energy. Because it is rather time-consuming to find such a sequence, we restrict ourselves to the Fibonacci sequence, which has often been used in studies for the two-dimensional off-lattice model and found to form hydrophobic core which is essential for the protein folding.20–30 We then adjusted the energy parameters so that the Fibonacci sequences become foldable, as will be elaborated below. The Fibonacci sequence is defined recursively by
where the asterisk denotes the concatenation operator. The first few sequences are Λ2 = HP, Λ3 = PHP, Λ4 = HPPHP, and so on. The length of a sequence is thus given by the sum of the lengths of the two preceding sequences, i.e., Ni = Ni−2 + Ni−1. The Λ6, Λ7, and Λ8 sequences have lengths 13, 21, and 34. In order to make their lengths be more visible, from now on we will use the notation S13, S21, and S34 instead of Λ6, Λ7, and Λ8.
The conformations of a polymer chain are modelled as a self-avoiding chain on a square lattice, and the position of a monomer i is given by ri = (k, l), where k and l are integers that represent the Cartesian coordinates relative to an arbitrary origin. The distance between the bonded monomers, the bond length, is fixed, i.e., |ri − ri+1| = 1. The condition that only one monomer is allowed to occupy one lattice site, ri ≠ rj for i ≠ j, realizes the excluded volume. The attractive interaction works only for the contact between the non-bonded nearest-neighboring monomers. The Hamiltonian is
where
and the interaction energy ϵσiσj is set as negative to represent the attractive interaction between two monomers, σi and σj. The energy of the system is then
where KHH, KHP, and KPP are the number of contacts between the non-bonded nearest-neighboring H-H, H-P, and P-P monomers, respectively.
The simplest version of the HP model is the one where ϵHH < 0 and ϵHP = ϵPP = 0, but other variants have also been studied. It is usually assumed that
implying that hydrophobic interactions are the dominant driving force in protein folding,31,32 and
corresponds to the condition where different types of monomers tend to be segregated.33–36 We tune the values of ϵHH, ϵHP, and ϵPP under the constraints of Eqs. (5) and (6) for the 3 Fibonacci sequences to be foldable, and we obtain a parameter set ϵHH : ϵHP : ϵPP = − 7 : − 1 : 0 that yields a unique ground state for both S21 and S34, and two very similar ground states for S13. In fact, we found that there is no parameter set that makes S13 be a foldable sequence, probably as a result of its short length. The native structure of S34 for this parameter set is shown in Fig. 1 with the lowest energy Emin = − 84.
We first compute the density of states to obtain various quantities. An efficient parallel algorithm is employed to enumerate the exact lattice polymer conformations,19 and the states are classified according to the variables of interest. For example, the expectation value of a physical quantity f(KHH, KHP, KPP) is a function of three variables KHH, KHP, and KPP, so the density of states for these three variables, Ω(KHH, KHP, KPP), needs to be computed to obtain the average value of f as
The CPU time needed to generate the density of states for S34 is about 3 days with 7 Intel Core i7-3770 CPUs.
III. GEOMETRIC ANALYSIS
Structural quantities can provide us with insight into the conformational transitions of the polymers and the proteins. The most commonly used quantity to characterize the degree of the folding transition is the fraction of the native contacts,11,37–43 which we denote as Q. In the case of the sequence S13 that has two native structures, the native contacts are defined as the contacts that exist in both of these two native structures.
The average value of the fraction of the native contacts, 〈Q〉, is plotted in Fig. 2(a) as a function of the temperature for S13, S21, and S34. As shown in this figure, the fraction of the native contacts increases sharply at lower temperatures, and as the length of the sequence increases, the fraction of the native contacts changes more sharply at lower temperatures. In addition, we evaluate the derivatives of the fractions of the native contacts with respect to the temperature in order to find the temperatures where the change in 〈Q〉 is maximal. Figure 2(b) shows the derivative of the average value of the fraction of native contacts, d〈Q〉/dT, as a function of the temperature. In accordance with Fig. 2(a), the minimum of d〈Q〉/dT becomes deeper, and the corresponding temperature moves to a lower temperature as the length of the sequence increases. Table I shows the folding temperature Tf, which is defined as the temperature with the maximum value of d〈Q〉/dT, and the value of 〈Q〉 at Tf for each of these sequences. As shown in the table, 〈Q〉 ≈ 0.8 at Tf.
Sequence . | Tf . | 〈Q〉(Tf) . |
---|---|---|
S13 | 1.825 | 0.769 |
S21 | 0.915 | 0.881 |
S34 | 0.589 | 0.837 |
Sequence . | Tf . | 〈Q〉(Tf) . |
---|---|---|
S13 | 1.825 | 0.769 |
S21 | 0.915 | 0.881 |
S34 | 0.589 | 0.837 |
Another important structural quantity is the square of the radius of gyration,
where ri is the position vector of the ith monomer and is the center of mass of the conformation on the assumption that all monomers have an equal mass. In general, the square of the radius of gyration provides us with information on the collapse transitions of polymers and proteins.
The average value of the square of the radius of gyration, , is plotted in Fig. 3(a) as a function of the temperature for S13, S21, and S34. According to the figure, the value of approaches that of the native structure for S13 and S21 at around T = 1. Also, the overall behavior of for S13 and S21 is very similar. However, in the case of S34, does not approach that of the native structure around T = 1, even though it decreases quickly above T = 1. Again, it decreases quickly in 0.5 < T < 1 and then approaches of the native structure below T = 0.5 for S34.
Next, we evaluate the derivatives of with respect to the temperature to find the temperature with a maximal change in . Figure 3(b) shows the derivatives of the radius of gyration, . The collapse temperatures can be identified from the maxima of .3–5,9 The S13 and S21 sequences show one maximum (T1 = 2.133 for S13 and T1 = 2.611 for S21) while the S34 sequence shows two maxima (T1 = 0.615 and T2 = 2.567).
In the case of the sequence S13, T1 = 2.133 is close to the folding temperature Tf = 1.825, implying that the rapid decrease in is a result of folding into the native structure. Therefore, no separate collapse transition is observed for this sequence, and T1 = 2.133 may be considered as an alternative definition of the folding temperature, which we denote as from here on.
On the other hand, for the S21 sequence, T1 = 2.611 is well separated from Tf = 0.915, indicating that the transition that occurs at T1 should be considered as the collapse transition from a swollen state to an unfolded globule, which precedes the folding transition.2–11 For this sequence, the radius of gyration does not change after the collapse transition, so the unfolded globule is as compact as the folded conformation. In contrast to the results for the other two sequences, two local maxima can be observed in the graph of for S34. One of the corresponding temperatures, T1 = 0.615, is close to the folding temperature Tf = 0.589 defined in terms of d〈Q〉/dT, so it can be identified as a folding transition. The second transition temperature T2 = 2.567 corresponds to the collapse transition from the swollen state to the unfolded globule, and we have 〈Q〉 = 0.471 at T2, indicating that the conformation is far from being folded.
For the S34 sequence, we have compared the size of the unfolded globule and the fully folded conformation. The size of the unfolded globule was taken at the temperature where the change in the size is minimal, at T = 1.17. We find that at T = 1.17 is about 13% larger than that at T = 0, which is more or less consistent with the previous results of an increase of 10%,6,7 considering the fact that is in general larger than 〈Rg〉 used in these studies due to fluctuations. However, note that the size of the unfolded globule and the folded conformation is almost the same in the case of the S21 sequence. Therefore, we see that the characteristics of the transition depends on the sequence. In particular, the analysis shows that for the short S13 sequence there is no separate collapse transition that precedes the folding. From here on, we will denote the collapse transition temperature as Tc for S21 and S34.
IV. THERMODYNAMIC ANALYSIS
The specific heat is a thermodynamic quantity that is most commonly analyzed to study phase transitions, and is defined as
Figure 4 shows the specific heat per monomer as a function of the temperature for S13, S21, and S34. As shown in the figure, the specific heat shows two peaks for S21 and S34 (T1 = 0.961 and T2 = 2.210 for S21, and T1 = 0.620 and T2 = 1.867 for S34) but one peak (T = 1.939) for S13.
For the S13 sequence, the transition temperature from the specific heat, T = 1.939, lies between Tf = 1.825 and . Therefore, the thermodynamic transition corresponds to the folding transition. In the case of the S21 sequence, the first transition temperature T1 = 0.961 is very close to the folding temperature (Tf = 0.915) and the second transition temperature T2 = 2.210 is near the collapse temperature (Tc = 2.611). For the S34 sequence, again the first transition temperature T1 = 0.620 is very close to the folding temperature (Tf = 0.589 and ) and the second transition temperature T2 = 1.867 is near the collapse temperature (Tc = 2.567). The two-peak structure of the specific heat for other HP sequences was also obtained in the literature.9,11
The partition function zeros (PFZs) are one of the most sensitive quantities to identify various transitions. The concept of the PFZs was first introduced by Yang and Lee44 as a tool to study phase transitions by identifying the zeros in the complex plane of the fugacity and the magnetic field. The concept was later extended by Fisher45 to the zeros in the complex temperature plane. PFZs are known to be a much more sensitive indicator of a phase transition than the specific heat and have been applied to various systems, including simple models of polymers and proteins.10,46–49 The partition function zeros are defined as the solution of the polynomial equation
in the complex plane of exponentiated temperature y ≡ exp(−β), where {σ} denotes the sum over all possible conformations, and E is defined in Eq. (4). The physical region of positive temperatures 0 ≤ T < ∞ is mapped into the interval 0 ≤ y < 1. In the thermodynamic limit, the PFZs ideally fall on a simple locus. In such an event, the point where the locus crosses the positive real axis corresponds to the transition temperature. For finite-size systems, the PFZs do not cross the positive real axis, but we can define the transition temperature by the real values of the first zeros,10,47,48 which are the zeros closest to the positive real axis. It has been suggested that even for a heterogeneous and intrinsically finite-size system, such as a protein, the distribution of PFZs provides information on the nature of the conformational transition.49
Figure 5 shows the partition function zeros of the S13, S21, and S34 sequences in the complex temperature (y = e−β) plane. For comparison, we also show the partition function zeros for homopolymers of the same lengths in the figure. As shown in the figure, the PFZs of the homopolymers show only one transition, which corresponds to the collapse transition and is independent of the sequence length. In the cases of the HP sequences, the PFZs show one transition (T1 = 2.507) for S13 and two transitions for S21 and S34 (T1 = 0.888 and T2 = 3.069 for S21 and T1 = 0.529 and T2 = 1.701 for S34). The existence of two transitions is more clearly visible than with the previous results10 where the sequence with a length of 25 was considered. The transition temperatures, estimated from the PFZs, are consistent with the temperatures that were obtained in the previous sections from the fraction of the native contacts, the radius of gyration, and the specific heat.
The distribution of PFZs shown in Fig. 5(a), 5(b), 5(c) also exhibits an approximate 7-fold symmetry in the angular direction due to the fact that ϵHH is 7 times that of ϵHP and ϵPP = 0, leading to an approximate 7-fold symmetry, as was also observed for the case of the homopolymer with nearest- and next-nearest-neighbor interactions.48
V. DISCUSSIONS
We computed the exact density of states of an HP protein model for Fibonacci sequences with lengths of 13, 21, and 34. The length of the longest sequence is much longer than those discussed in a previous work based on exact enumeration, which is 25.10 The exact density of states was used to compute the geometric and thermodynamic quantities as a function of the temperature. The transition temperatures for collapse and folding can be defined in terms of the maxima of these quantities or their derivatives, as summarized in Tables I, II, and III. We observed that the collapse and folding transitions show distinct characteristics depending on the sequence.
Sequence . | . | Tc . |
---|---|---|
S13 | 2.133 | - |
S21 | - | 2.611 |
S34 | 0.615 | 2.567 |
Sequence . | . | Tc . |
---|---|---|
S13 | 2.133 | - |
S21 | - | 2.611 |
S34 | 0.615 | 2.567 |
. | Specific heat . | Partition function zeros . | ||
---|---|---|---|---|
Sequence . | T1 . | T2 . | T1 . | T2 . |
S13 | - | 1.939 | - | 2.507 |
S21 | 0.961 | 2.210 | 0.888 | 3.069 |
S34 | 0.620 | 1.867 | 0.529 | 1.701 |
. | Specific heat . | Partition function zeros . | ||
---|---|---|---|---|
Sequence . | T1 . | T2 . | T1 . | T2 . |
S13 | - | 1.939 | - | 2.507 |
S21 | 0.961 | 2.210 | 0.888 | 3.069 |
S34 | 0.620 | 1.867 | 0.529 | 1.701 |
Only the longest sequence, S34, followed the common scenario where the collapse transition from the swollen state to a compact globule preceded the folding transition to its native state, with the overall size decreasing by roughly 10% when folding. Although there are separate collapse and folding transitions for the intermediate-sized sequence S21, the folding transition did not involve a decrease in size, indicating that the size of the unfolded globule and that in the folded state are nearly the same. For the shortest sequence, S13, there was only a folding transition and no separate collapse transition. Whether these effects are indeed a result of the chain size or the intrinsic properties of the specific sequences, is yet to be determined. We also note that depending on sequences, there can be transitions other than folding, such as changes of the surface morphology or structural rearrangements.15
Although two-dimensional HP model is the simplest model that can be used for studying folding transitions, it still has many limitations in describing the folding of real proteins, in particular, in the lack of two-state cooperativity.16,50 The study of more realistic three-dimensional models will be able to overcome such shortcomings.
ACKNOWLEDGMENTS
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2014R1A1A2056127).