Random polyampholytes (PAs) contain positively and negatively charged monomers that are distributed randomly along the polymer chain. The interaction between charges is assumed to be given by the Debye-Huckel potential. We show that the size of the PA is determined by an interplay between electrostatic interactions, giving rise to the polyelectrolyte effect due to net charge per monomer (σ) and an effective attractive PA interaction due to charge fluctuations, δσ. The interplay between these terms gives rise to non-monotonic dependence of the radius of gyration, Rg, on the inverse Debye length, κ, when PA effects are important (). In the opposite limit, Rg decreases monotonically with increasing κ. Simulations of PA chains, using a charged bead-spring model, further corroborate our theoretical predictions. The simulations unambiguously show that conformational heterogeneity manifests itself among sequences that have identical PA parameters. A clear implication is that the phases of PA sequences, and by inference intrinsically disordered proteins (IDPs), cannot be determined using only the bare PA parameters (σ and δσ). The theory is used to calculate the changes in Rg on N, the number of residues for a set of IDPs. For a certain class of IDPs, with N between 24 and 441, the size grows as Rg ∼ N0.6, which agrees with data from small angle X-ray scattering experiments.
I. INTRODUCTION
The shapes and dynamics of polyampholytes (PAs), which are polymers with monomers that carry both positive and negative charges, have been extensively studied.1–8 Polyampholytes naturally occur in an aqueous solution if the monomers contain acidic and basic groups. In this sense, all proteins are PAs in which charged residues are interspersed between hydrophobic and hydrophilic residues. Because of the simultaneous presence of positive and negative charges, the conformations of the PAs are determined by an interplay of electrostatic interactions, charge fluctuation effects (see below), and the stiffness of the backbone. In simple terms, we expect that repulsion between like-charges would stretch the chain, whereas attraction would tend to make the polymer compact. Of course, in random PAs, this balance is determined on an average performed over an ensemble of sequences (see below). If the number, N, of monomers is large, then the PA is predicted to adopt compact conformations if the polymer is overall neutral (the number of positive and negative charges nearly cancel). On the other hand, if there is residual charge on the PA, it is likely to be extended. It should be noted that there are differences in the behavior of the dependence of the radius of gyration, Rg, on N depending on whether the chain is globally neutral (plus and minus charges exactly cancel) or statistically neutral9 (residual charge when averaged over a large number of sequences scales as with N ≫ 1). Thanks to several insightful theoretical studies,2–4,10 the complex phase behavior of PAs as a function of salt concentration and temperature has been elucidated.
More recently, there has been renewed interest in PAs in the biophysics community because many eukaryotic proteins contain an unusually large fraction of charged residues.11–13 As a consequence, the favorable hydrophobic interactions cannot overcome the residual electrostatic interactions. For this reason, this class of proteins does not adopt globular structures unless it is in complex with another partner protein. Polypeptide sequences with this characteristic are referred to as intrinsically disordered proteins or IDPs because they do not have stable ordered structures under physiological conditions. It is also the case that there are protein sequences in which only certain regions are disordered under nominal conditions. Because of the preponderance of such sequences and their roles in a variety of cellular functions and the potential role they play in diseases,13,14 there is heightened interest in understanding their structural and dynamical properties.15–18 The IDPs, whose backbone is relatively flexible [persistence length in the range (0.6–1.0) nm], are low complexity sequences containing a large fraction of charged residues and smaller fraction of hydrophobic residues compared to their counterparts that adopt well-defined structures in isolation. As a consequence, water is likely to be a good or at best a Θ solvent, which means that Rg ≈ Nν, where ν is approximately 0.6 or 0.5. There are differences between IDPs and random PAs. (i) The sequences of IDPs are quenched, thus making it necessary to understand the conformations of a specific sequence. In other words, two sequences with identical charge composition could have drastically different structural characteristics. Of course, this could be the case for random PAs as well, although this aspect has not been investigated as much. (ii) Unlike the case of PAs for which N ≫ 1, which allows one to develop analytical and scaling type arguments using well-developed methods in polymer physics, typically studied IDPs have finite N, at best on the order of a few hundred residues. (iii) IDPs also contain uncharged amino acids, which are not usually considered when treating PAs using theory and simulations. Despite these differences, concepts from polyelectrolytes (PEs) and PAs have been used to envision the conformations of IDPs using the difference between the positive and negative charge (σ) and net charge as appropriate variables.19
The importance of sequence effects on the Rg of PAs was first illustrated in a key note by Srivastava and Muthukumar.3 Using Monte Carlo simulations, with N = 50, they showed that there are substantial variations in Rg in PAs (containing only charged monomers) for a globally neutral chain. This study showed that the location of charges (sequence specificity) plays a crucial role in determining the conformational properties. More recently, Firman and Ghosh20 used the Edwards model for charged polymers, encoding for the precise sequence in order to calculate Rgs for small N. Their theory successfully accounted for simulations of synthetic IDPs21 containing only a mixture of positive and negative charged residues.
Here, we develop a theory to investigate the effects of charge fluctuations on the shapes of random PAs. In our model, there is a probability, p+, (p−) that a monomer at location s is positively (negatively) charged. The probabilities, p+ and p−, should be calculated as follows. The number of sequences, M, of a PA containing N monomers is M = 3N because each monomer can either have a + or a − charge or is neutral. We assume that there are no correlations between charges along a given sequence, which implies that the magnitude of the charge of monomer s does not affect the value of s′. Thus, p+ and p− are independent of s. Let N+(s) (N−(s)) be the number of sequences with a + (−) charge at position s. Then, and . Because N+(s) + N−(s) + N0(s) = M, where N0(s) is the number of sequences in which the sth monomer is neutral, the probability that the sth monomer in an ensemble of M sequences is neutral is 1 − p+ − p−.
The fluctuations in the ensemble of PA sequences arise because the normalized charge distribution is taken to be stochastic and is given by
The charges are measured in units of e−. Because some monomers do not carry a charge (like in IDPs) (p+ + p−) ≠ 1, the mean ⟨σ(s)⟩ gives the net charge, p+ − p−, and the expression for the square of the charge fluctuations is . We refer to ⟨σ(s)⟩ and ⟨δσ(s)⟩, both of which are independent of s, as PA variables. We show that due to ⟨δσ2(s)⟩ the Rg is altered substantially and could even induce a coil-globule transition even when the total charge on the PA is not globally neutral. Because of the opposing behavior of polyelectrolyte (σ ≠ 0) and PA effects arising from charge fluctuations (), the dependence of Rg on the Debye screening length could be non-monotonic. The phase diagram in the [⟨σ(s)⟩, ⟨δσ(s)⟩] plane is rich. We also apply the theory to calculate Rg of specific IDP sequences. Remarkably, the theory reproduces quantitatively the Rg values for the wild type Tau protein and various fragments obtained from the wild type Tau, which have been measured by Small Angle X-ray Scattering (SAXS) experiments.22 In Tau, and other IDPs, charge fluctuations arise because of conformational heterogeneity, which we demonstrate explicitly elsewhere23 for IDPs and here for PAs using simulations. From now on, we drop the angular brackets in both ⟨σ⟩ and ⟨δσ⟩.
II. THEORY
We begin by considering the Edwards Hamiltonian for a polymer chain,
where is the position of the monomer s, a0 is the monomer size, and N is the number of monomers. The first term in the Eq. (2) accounts for chain connectivity, and the second term represents the sum of excluded volume interactions, electrostatic interactions, and effects of charge fluctuations (see below) due to the random values of charges in different positions in the ensemble of sequences. The expression for is
The first term in Eq. (3) accounts for the non-specific two body excluded volume interactions. It differs insignificantly from the usual δ function potential used in the standard Edwards model. Of course, when a0 is small compared to Rg, the precise form of this term is irrelevant, as long as it is short-ranged. In a good solvent ( > 0), the polymer chain swells with Rg ∼ a0Nν (ν ≈ 0.6), whereas in a poor solvent ( < 0), the size of the polymer is Rg ∼ a0Nν (ν ≈ 1/3). Here, we consider a PA in a good solvent ( > 0).
From Eq. (3), one may obtain an effective interaction term between charges on the PA chain. By following the theory developed previously,24 we use the Hubbard-Stratonovich transformation to decouple the product of charges σ(s)σ(s′) in Eq. (3). The partition function may be written as
where and . If we assume that the charge distribution [Eq. (1)] is annealed, it suffices to average Zψ over the sequence of charges. With an assumption that the charges σ(s) at distant sites are not correlated, the partition function averaged over a sequence of charges to second order in ψ becomes24
where the average value of the charge on the chain σ = ⟨σ(s)⟩ = p+ − p−, the charge fluctuation , and the local monomer density . The term involving (δσ)2, arising from the charge fluctuations, gives rise to the so called PA effect, which is manifested as an effective attractive interaction of the screened Coulomb potential. Using Eqs. (4) and (5), we perform the needed integration over to obtain the following expression for the effective two body interaction term between charges on the PA,
We neglect the three body interactions in the effective Hamiltonian in Eq. (6), which would be important if the PA were in a poor solvent. In the work of Higgs and Joanny,2 the variational type calculation (see below) was done directly using Eq. (3). In this case, upon expansion to second order in , the electrostatic potential [second term in Eq. (3)] generates a term ∝ σ(s)σ(s′)σ(s″)σ(s‴)σ(s′′′′), which is random. When averaged over the ensemble of sequences, the coefficient of the third term is in Ref. 2. By contrast, we carry out averaging first, as shown in Eq. (4), and hence obtain a different prefactor for the charge fluctuation (δσ) induced attraction term in Eq. (6).
The screened Coulomb potential, the second term in Eq. (6), accounts for the interactions between charges separated by a distance . The strength of the unscreened electrostatic interactions is characterized by the Bjerrum length lB = e2/ϵkBT. The Debye screening length, κ−1, determines the range of the electrostatic interactions. By changing the value of κ, and hence the range of charge interactions, the PA chain could undergo a coil-to-globule transition. The value of κ may be the changed by decreasing or increasing the salt concentration. The dimensionless parameter, σ, determines the net charge per residue on the polyelectrolyte chain. For a particular sequence, fraction p = p+ + p− of the monomers is charged with the charge on each monomer being ±e. Therefore, the net charge per monomer is σ = ∣p+ − p−∣. The third term in Eq. (6) is the attractive interaction term that is proportional to charge fluctuations (δσ). The PA effect arises due to the interaction between the charge and dipoles formed between the sequence of positive and negative charges. The charge-dipole interaction term decays as and it is effectively screened (with a screening length 1/2κ) due to the presence of other dipoles. In the absence of the third term, the Hamiltonian would describe a polyelectrolyte, whose phases as a function of temperature and κ have been previously described using the methods used here.25
In order to obtain Rg, we adopt the Edwards-Singh (ES) type variational calculation,26 which has been extensively used in the polymer literature.2,25,27–29 More recently, the method was used to study the sequence dependence of collapse of polypeptide chains30 and polyelectrolytes20 with application to a special class of synthetic IDPs. In developing the theory, we assume that the interactions between charges exist only between specific monomers described by the second and third term in Eq. (6). The sum is over the set of specific contacts between pairs {si, sj}. We use the contact maps of IDP, generated in coarse-grained (CG) simulations of IDPs,23 in order to assign the specific interactions. The contact map from the simulation is computed by using a cutoff of 8 Å. The contacts are included between all side chain beads. In the two bead CG model,23 the charges are positioned on the center of masses of the side chain beads, and therefore the contact map includes charge-charge contacts.
The ES method is a variational type (referred to as the uniform expansion method) calculation that represents the exact Hamiltonian by a Gaussian chain with an effective monomer size, which is determined as follows. Consider a virtual chain without excluded volume interactions, whose radius of gyration is ,26 described by the Hamiltonian,
The monomer size in the trial Hamiltonian is a. We split the deviation between the virtual chain Hamiltonian and the real Hamiltonian as
where
The radius of gyration is , with the average being , where, ⟨⋯⟩ denotes the average over .
Assuming that the deviation is small, we can calculate the average to first order in . The result is and the radius of gyration becomes
If we choose the effective monomer size a in such that the first order correction [second and third terms in the right hand side of Eq. (10)] vanishes, then the size of the chain is . This is an estimate of the exact and is only an approximation as we have neglected and higher powers of . Thus, in the ES theory, we determine a using Eq. (10),
The equation given above leads to a self-consistent equation for a and is given by26
By calculating the averages in the Fourier space [, , and ], we obtain
From Eq. (13), we can calculate the effective monomer size a and hence the chain size . However, without having to solve Eq. (13) numerically, we can define the Θ-like point, which signals the onset of a potential transition from a coil to a globule state in the PA. At the θ-point, the repulsive terms exactly balance the PA term. Since at the Θ-point, the PA behaves as a Gaussian chain, with a = a0, we substitute this value for a in Eq. (13) to determine the condition for the Θ-point. Thus, from Eq. (13), the critical charge fluctuation value, at which the PA term equals the excluded volume and PE terms, is
The numerator in Eq. (14) is a consequence of the repulsion containing excluded volume interactions and the polyelectrolyte term. The denominator encodes the PA effect, determining the extent to which the size of the polymer changes due to charge fluctuations. Using Eq. (14), we obtain the dependence of δσθ on N. Scaling n by N, it can be shown that and . From these results, we obtain . The implication is that for N ≫ 1, charge fluctuations have to be extremely large to drive coil to globule transitions unless the PA is globally neutral. Because even for statistically neutral PA the PE term would not be irrelevant, we surmise that a genuine coil to globule transition may not be easily realizable in long PAs, which is in accord with the results in a previous study.9 By implication, our theory suggests that maximally compact IDPs would be difficult to obtain for generic IDP sequences if the fractions of + and − charged residues is on the order of (0.4–0.5). Of course, to quantitatively describe the conformational ensemble of IDPs or PAs adopt as the PA parameters and salt concentration (σ, δσ, and κ) are varied, it will require detailed calculations, as was previously done for polyelectrolytes.31
III. SIMULATIONS
A. Model
To provide further insights into some aspects of our theoretical predictions and to highlight the nature of the heterogeneous ensembles that are sampled, we carried out simulations of PA chains, with N = 50. We consider sequences having the same net charge, σ, but different charge distributions to elucidate the role of sequence in determining the size of PAs. The PA chain is modeled using a standard bead-spring model for charged polymers, with the total potential energy, Utot, given by
Here, Uch describes the chain connectivity between the beads and is modeled using the FENE (finitely extensible nonlinear elastic) potential,
In Eq. (16), k = 20 kcal mol−1 Å−2 denotes the spring constant, l0 = 3.8 Å is the equilibrium bond length between the connected PA beads, and R0 = 2 Å controls the maximum allowable deformation of the covalent bonds.
The excluded volume interactions between pairs of beads are described by a truncated and shifted Lennard-Jones potential,
Based on the previous work,32–34 we set ϵ = kBT and σ = l0. The pairwise interactions between the beads are ignored if the distance is greater than 21/6σ. This cutoff ensures that the excluded volume term is purely repulsive.
The interactions between charged beads are taken into account via the screened Coulomb potential,
In Eq. (18), ε and κ are the inverse Debye length and the dielectric constant, respectively. We consider only unit charges, i.e., q = ±e.
B. Simulations
The conformational space of each PA chain is explored using Langevin dynamics. For each PA bead, the stochastic equation is given by , where m is the mass, Fi is the conservative force acting on each bead, and γi is the drag coefficient. The Gaussian random force, gi, satisfies ⟨gi(t)gj(t′)⟩ = 6kBTγδijδ(t − t′). The drag coefficient γ is given by γ = m/τeff, where τeff = σ(m/ϵ)1/2 is the effective time scale. We used a variant of the velocity Verlet scheme35 to integrate the equations of motion, using a time step of Δt = 0.01 τeff. Each simulation was carried out for 1.2 × 109 steps to ensure proper equilibration and to obtain meaningful statistics.
C. Analysis
Following Eq. (6), we can estimate the charge fluctuations for each PA chain from simulations using an approximate expression,
where δUelec = Uelec − ⟨Uelec⟩ is the fluctuation in the electrostatic energy [Eq. (18)] about its mean, and δσc denotes the charge fluctuation computed from the ensemble of sequence-specific conformations generated from simulations (see below).
To characterize the structural heterogeneity of the PA ensembles and to identify the most populated equilibrium conformations, we carried out hierarchical clustering of the simulation trajectories using a pairwise distance metric, Dij, defined as
where and are the pairwise distances between the PA beads a and b in snapshots i and j, respectively. Distinct geometric clusters were identified using the Ward variance minimization algorithm,36 as implemented within the scipy module. The hierarchical organization of conformations into distinct families was visualized in the form of dendrograms.
IV. RESULTS
A. Theoretical predictions
From Eq. (6), it is easy to show that the size of the PA should be determined by , which can be written as , where , which in the IDP literature is referred to as the charge asymmetry parameter. Figure 1, displaying the dependence of the radius of gyration for a PA chain with randomly distributed charges on the screening length, shows that Rg changes non-monotonically as κ increases when . In this charge-fluctuation dominated regime, the behavior can be explained by noting that at small values of κ, Rg increases due to the PA term until κlb ≈ 0.19. In this range of κ, the effective attractions between monomers, the PA effect, decrease by adding ions to the solvent. As a result, the size of the chain increases. For the PA whose Rg is shown in Fig. 1, at κlB = 0.19, the PA and PE effects balance each other, and the chain becomes a random coil. Upon further increase in κ, the decrease in Rg (but the chain is not a globule) is due to the dominance of the PA term. In the opposite limit when , both the dimensions of the chain are dominated by the PE term, and Rg decreases with increasing κ. We expect that at sufficiently large values of κlB the consequences of PE and PA effects are negligible, and hence, Rg would have the value expected for a Flory random coil (ν = 0.6). Interestingly, these trends are qualitatively similar to the experiments on two IDPs (the N-terminal domain of HIV-1 integrase and human prothymosin-α).37
Non-monotonic increase in the radius of gyration, Rg, with an increasing value of the inverse screening length κ for a PA chain (N = 150).
Non-monotonic increase in the radius of gyration, Rg, with an increasing value of the inverse screening length κ for a PA chain (N = 150).
B. Predictions for a Tau-like IDP
The generality of the theory allows us to predict the dependence size of an IDP. In Fig. 2, we plot the κ dependence of radius of gyration of a Tau protein fragment (K17Tau with N = 145). To perform the calculations, we used the contact map generated in simulations based on the Self-Organized Polymer (SOP)-IDP model, which captures accurately the measured structure factors for a variety of IDPs.23 Using input from simulations,23 which accounts for the heterogeneity of the conformational ensembles of IDP, we find that Rg of K17Tau changes non-monotonically with increasing value of κ (Fig. 2). The size of K17Tau protein increases with κ until it reaches a maximum at κlB ≈ 0.28 (ion density for a monovalent salt is ≈30.5 mM), where the protein behaves like a polymer in a Θ-solvent. The peak is broader when compared to the chain with random sequences. With further increase in κ, Rg decreases just as for the random PA chain (Fig. 1). From these results, we conclude that charge fluctuations are substantial in K17Tau.
The size of the chain (Rg) changes non-monotonically with an increasing value of κ for K17Tau (N = 145) protein. The parameters used to generate the plot are δσ = 0.95 and σ = 0.01.
The size of the chain (Rg) changes non-monotonically with an increasing value of κ for K17Tau (N = 145) protein. The parameters used to generate the plot are δσ = 0.95 and σ = 0.01.
C. Phase diagram
The 3D plots in Figs. 3 and 4 for two different values of κ show the phase diagram for different values of σ, and δσ. The plot in Fig. 3 shows that for small values of σ, the change in Rg is significant at a particular value of δσ. For a large value of net charge, say σ = 0.8, the change in Rg is small over a range of values of δσ indicating that PE effects dominate. The value of δσθ increases with σ for a PA chain. In Fig. 4, for κ = 0.4 nm−1 and for a high value of net charge σ, the change in Rg is significant at a particular value of δσ indicating that PA effects dominate. The phase diagrams show that by changing the salt concentrations, the sizes of random PAs can be altered dramatically.
Phase diagram for different values of net charge σ and the charge fluctuation parameter δσ. The chain size decreases monotonically as δσ increases. The parameters for the plot are N = 150 and κ = 0.2 nm−1.
Phase diagram for different values of net charge σ and the charge fluctuation parameter δσ. The chain size decreases monotonically as δσ increases. The parameters for the plot are N = 150 and κ = 0.2 nm−1.
Phase diagram for different values of net charge σ and the charge fluctuation parameter δσ. The chain size decreases monotonically as δσ increases. The parameters for the plot are N = 150 and κ = 0.4 nm−1.
Phase diagram for different values of net charge σ and the charge fluctuation parameter δσ. The chain size decreases monotonically as δσ increases. The parameters for the plot are N = 150 and κ = 0.4 nm−1.
D. Application to IDPs
In order to calculate Rg for several IDPs (see Fig. 5), with N ranging from 24 (HISTATIN5) to 441 (hTau40), we used the average contact maps from simulations,23 which restrict the summation in Eq. (6) to specific sites on the IDP. The dependence of Rg on the chain length for a set of IDPs (listed in the caption to Fig. 5) is shown in Fig. 5. The theory shows that Rg ∼ N0.6, implying that these IDPs behave as self-avoiding polymers, similar to the results in the simulations for PAs.9 The scaling in Fig. 5 has a weaker N dependence than that predicted by the renormalization group argument (Rg ∼ N) for long PAs.38 The N dependence in Fig. 5 has higher power than the result in Ref. 2 (Rg ∼ N1/3). It appears that for values of σ observed in this set of IDPs the random coil behavior is the apt description.
The size Rg increases with N as Rg ∼ 0.26N0.6 for specific contacts. The symbols denote the Rg values calculated from the theory [using Eq. (6)], and the green line is the power law fit. The scaling is unchanged even when the restriction to specific interactions in Eq. (6) is removed. The orange squares correspond to the Rg values for the IDP sequences: ACTR (N = 65), An16 (N = 185), aSyn (N = 140), ERMnTAD (N = 122), HISTATIN5 (N = 24), hNHE1 (N = 131), NUP153 (N = 81), p53 (N = 93), ProtA (N = 111), SH4UDsrc (N = 85), and Sic1 (N = 90). The Rg values for the hTau40 protein and other constructs derived from it are denoted as blue circles. The various Tau protein constructs are as follows: K18Tau (N = 130), K17Tau (N = 145), K27Tau (N = 167), K10Tau (N = 168), K32Tau (N = 198), K44Tau (N = 283), K23Tau (N = 254), K25Tau (N = 185), hTau23 (N = 352), hTau40 (N = 441), and K16Tau (N = 176). The parameters used to generate the plot are σ = 0.01, δσ = 0.8, and κlB = 0.324.
The size Rg increases with N as Rg ∼ 0.26N0.6 for specific contacts. The symbols denote the Rg values calculated from the theory [using Eq. (6)], and the green line is the power law fit. The scaling is unchanged even when the restriction to specific interactions in Eq. (6) is removed. The orange squares correspond to the Rg values for the IDP sequences: ACTR (N = 65), An16 (N = 185), aSyn (N = 140), ERMnTAD (N = 122), HISTATIN5 (N = 24), hNHE1 (N = 131), NUP153 (N = 81), p53 (N = 93), ProtA (N = 111), SH4UDsrc (N = 85), and Sic1 (N = 90). The Rg values for the hTau40 protein and other constructs derived from it are denoted as blue circles. The various Tau protein constructs are as follows: K18Tau (N = 130), K17Tau (N = 145), K27Tau (N = 167), K10Tau (N = 168), K32Tau (N = 198), K44Tau (N = 283), K23Tau (N = 254), K25Tau (N = 185), hTau23 (N = 352), hTau40 (N = 441), and K16Tau (N = 176). The parameters used to generate the plot are σ = 0.01, δσ = 0.8, and κlB = 0.324.
E. Sequence effects and conformational heterogeneity
To illustrate the effect of charge fluctuations on the conformational ensemble of PAs, we performed simulations of PA chains using a simple off-lattice model. The PA variables, p+ and p−, as well as the other simulation parameters are kept fixed. Hence, any variations in the size or the underlying conformational heterogeneity of the PA sequences is entirely due to the different charge distributions. In a recent study, Firman and Ghosh20 identified combinations of p+ and p−, for which coil to globule transitions are expected to be extremely sensitive to the charge decoration along the PA chain. Taking a cue from their work, we consider PA sequences with a net charge of +6, with p+ = 0.280 and p− = 0.160.
Twenty different realizations of charge distributions were generated by randomly permuting the positions of the neutral, positive, and negative beads along the chain. The ensemble averaged Rg values fall in the range from 1.77 to 2.02 nm. The spread in Rgs is interesting considering that in all these sequences N = 50, and the PA variables, σ and δσ, which are often used to analyze data in the IDP community, are identical. To explain these differences, in terms of fluctuations in the conformations linked to δσ, we consider three representative examples (Seq1, Seq2, and Seq3, in Fig. 6). The peak of the Rg distribution progressively shifts toward lower values in going from Seq1 to Seq3 (Fig. 6), clearly indicating that standard PA variables are not sufficient to fully describe the equilibrium properties.
Top: The cartoon representations of the PA sequences having different charge decorations along the chain with p+ = 0.280 and p− = 0.160. The beads are color coded according to charge: the neutral beads are colored green, positively charged beads are colored red, and negatively charged beads are colored blue. Middle: The distribution of Rg for Seq1 (brown), Seq2 (cyan), and Seq3 (orange). Bottom: The distributions of δUelec = Uelec − ⟨Uelec⟩ for the PA sequences, shown with the same color coding.
Top: The cartoon representations of the PA sequences having different charge decorations along the chain with p+ = 0.280 and p− = 0.160. The beads are color coded according to charge: the neutral beads are colored green, positively charged beads are colored red, and negatively charged beads are colored blue. Middle: The distribution of Rg for Seq1 (brown), Seq2 (cyan), and Seq3 (orange). Bottom: The distributions of δUelec = Uelec − ⟨Uelec⟩ for the PA sequences, shown with the same color coding.
Insights into the relative populations of the coil-like and the globule-like states for the three PA sequences can be obtained from the hierarchical arrangement of structural clusters (Fig. 7). The structural ensemble of Seq1 is clearly dominated by extended conformations, which accounts for 64.1% of the equilibrium population. Compact structures, on the other hand, have a lower occupation probability (35.9%). For Seq2, the relative populations of extended and compact structures are approximately the same, being 49.5% and 50.5%, respectively. As is evident from the dendrogram, the equilibrium ensemble of Seq3 is primarily dominated by compact structures (net population of 73.4%), which is in complete contrast to Seq1. The contrasting heterogeneity of the conformational ensembles for the three PA sequences, with identical N, σ, and δσ, readily explains the differences in Rg.
The conformational heterogeneity of PA sequences depicted in the form of dendrograms (Top: Seq1, Middle: Seq2, and Bottom: Seq3). The olive branches lead to extended configurations, and magenta branches lead to collapsed structures. Representative snapshots corresponding to the different clusters are also depicted. The relative cluster populations are marked near the appropriate dendrogram branches.
The conformational heterogeneity of PA sequences depicted in the form of dendrograms (Top: Seq1, Middle: Seq2, and Bottom: Seq3). The olive branches lead to extended configurations, and magenta branches lead to collapsed structures. Representative snapshots corresponding to the different clusters are also depicted. The relative cluster populations are marked near the appropriate dendrogram branches.
The distributions of δUelec (Fig. 6), together with the approximate values of δσc computed using Eq. (19) (Table I), provide a clear-cut evidence of the key role of the charge fluctuations in determining chain dimensions. For Seq1, which has a high propensity to form extended structures, the δUelec distribution is narrow, and the charge fluctuation is minimal. In fact, the variance to mean ratio (VMR) of the electrostatic energy suggests that the charge distribution of Seq1 would correspond to the theoretical limit, , where PE effects dominate. For Seq3, the δUelec distribution is quite broad, and charge fluctuation effects are the most dominant, in perfect harmony with the clustering analysis, which revealed that Seq3 is mostly associated with compact structures. Furthermore, the VMR suggests that in contrast to Seq1, the appropriate theoretical limit would be , the regime where PA effects dominate. Seq2, for which the equilibrium populations of compact and extended structures are approximately equal, presents an interesting scenario. As expected, the estimated charge fluctuation falls between the two extremities. The VMR ≈1 implies that the corresponding theoretical limit would be . Therefore, the random coil like behavior is predicted from structural clustering, which manifests itself due to a balancing act of the PA and PE effects.
The values of charge fluctuation, δσc, for the different sequences. Note that δσc is computed from the ensemble of sequence-specific conformations using Eq. (19). Also shown are the variance to mean ratios (VMR) for the electrostatic energy.
Seq . | ⟨δ2Uelec(kcal/mol)⟩ . | ⟨Rg⟩ (nm) . | δσc . | |⟨δ2Uelec⟩/⟨Uelec⟩| . |
---|---|---|---|---|
Seq1 | 0.23 | 2.03 | 1.53 | 0.83 |
Seq2 | 1.04 | 1.87 | 2.14 | 1.02 |
Seq3 | 4.07 | 1.78 | 2.93 | 9.49 |
Seq . | ⟨δ2Uelec(kcal/mol)⟩ . | ⟨Rg⟩ (nm) . | δσc . | |⟨δ2Uelec⟩/⟨Uelec⟩| . |
---|---|---|---|---|
Seq1 | 0.23 | 2.03 | 1.53 | 0.83 |
Seq2 | 1.04 | 1.87 | 2.14 | 1.02 |
Seq3 | 4.07 | 1.78 | 2.93 | 9.49 |
We draw two important conclusions. (i) The scaling of Rg with N, which for a certain class of IDPs, obeys the Flory scaling law and can be used to accurately determine Rg without the need for experiments or simulations. The class of IDPs for which Rg can be computed using Rg ∼ Nν (ν ≈ 0.6) can be discerned using σ and δσ. (ii) However, quantitative description of the equilibrium properties of IDPs as a function of salt concentration or denaturants requires a complete characterization of the conformational ensemble, as simulations explicitly demonstrate. The analyses of charge fluctuations show that δσ, which can be readily calculated for a specific sequence (in our simulations they are identical), is substantially modified when weighted by the energies associated with the conformational ensemble (denoted as δσc). Thus, only by understanding the details of the conformations can the properties of IDPs be correctly described. Alas, the average Rg masks such subtle but important effects.
V. CONCLUSIONS
We developed a theory to quantitatively predict the effect of charge fluctuations (δσ) on the size of flexible PAs that is in a good solvent (excluded volume interactions are positive) as a function of the inverse Debye length and the net charge per monomer on the chain (σ). Interestingly, when charge fluctuations are non-negligible ( is greater than unity), the radius of gyration increases non-monotonically as κ increases. When is less than unity, Rg decreases with increasing κ. The generality of the theory allows us to predict Rg for a number of IDPs. For a certain class of IDPs, we find the usual scaling of Rg ∼ Nν with ν = 0.6, which coincides with the behavior expected for Flory random coils. Remarkably, our theory gives accurate estimates of the size of the Tau protein and various fragments derived from it. This class of IDPs behaves as an ideal chain. The differences in the two scaling behavior between these IDPs can be rationalized in terms of the interplay between charge fluctuations and the net charge per monomer.
What could be the origin of charge fluctuations in an IDP in which σ (more precisely the precise sequence) is fixed? Even if σ is fixed, the ensemble of conformations that a typical IDP or PA samples is heterogeneous. In sampling a large number of conformations, the spatial distances between charged residues could vary greatly. Therefore, the effective charge of each conformation is different. In some of the conformations, positively and negatively charged residues would be close together, whereas in others they would be spatially well-separated. This gives rise to conformation-dependent effective attraction, which is quantified in our theory in terms of the average quantity δσ. Of course, the effective value of δσ cannot be computed for a quenched charged sequence for a specific IDP without suitable simulations (see Fig. 7 which illustrate this important point using three specific PA sequences). Therefore, it is difficult to construct phase diagrams of IDPs solely in terms of σ or the differences between the number of positively and negatively charged residues. Construction of phase diagrams requires the use of physical order parameters, which necessarily involves quantitatively characterizing the conformational ensembles of IDPs, an exercise requiring simulations using models that reproduce experimental measurements, such as scattering profiles.
ACKNOWLEDGMENTS
We are indebted to Upayan Baul for providing the simulation results and for useful discussions. We thank Professor D. Svergun for providing us SAXS data on the Tau protein. This work was supported by the National Science Foundation and the National Institutes of Health (Grant Nos. CHE 16-36424 and R01 GM107703) and the Collie-Welch Foundation (No. F-0019).