Random polyampholytes (PAs) contain positively and negatively charged monomers that are distributed randomly along the polymer chain. The interaction between charges is assumed to be given by the Debye-Huckel potential. We show that the size of the PA is determined by an interplay between electrostatic interactions, giving rise to the polyelectrolyte effect due to net charge per monomer (*σ*) and an effective attractive PA interaction due to charge fluctuations, *δσ*. The interplay between these terms gives rise to non-monotonic dependence of the radius of gyration, *R*_{g}, on the inverse Debye length, *κ*, when PA effects are important ($\delta \sigma \sigma >1$). In the opposite limit, *R*_{g} decreases monotonically with increasing *κ*. Simulations of PA chains, using a charged bead-spring model, further corroborate our theoretical predictions. The simulations unambiguously show that conformational heterogeneity manifests itself among sequences that have identical PA parameters. A clear implication is that the phases of PA sequences, and by inference intrinsically disordered proteins (IDPs), cannot be determined using only the bare PA parameters (*σ* and *δσ*). The theory is used to calculate the changes in *R*_{g} on *N*, the number of residues for a set of IDPs. For a certain class of IDPs, with *N* between 24 and 441, the size grows as *R*_{g} ∼ *N*^{0.6}, which agrees with data from small angle X-ray scattering experiments.

## I. INTRODUCTION

The shapes and dynamics of polyampholytes (PAs), which are polymers with monomers that carry both positive and negative charges, have been extensively studied.^{1–8} Polyampholytes naturally occur in an aqueous solution if the monomers contain acidic and basic groups. In this sense, all proteins are PAs in which charged residues are interspersed between hydrophobic and hydrophilic residues. Because of the simultaneous presence of positive and negative charges, the conformations of the PAs are determined by an interplay of electrostatic interactions, charge fluctuation effects (see below), and the stiffness of the backbone. In simple terms, we expect that repulsion between like-charges would stretch the chain, whereas attraction would tend to make the polymer compact. Of course, in random PAs, this balance is determined on an average performed over an ensemble of sequences (see below). If the number, *N*, of monomers is large, then the PA is predicted to adopt compact conformations if the polymer is overall neutral (the number of positive and negative charges nearly cancel). On the other hand, if there is residual charge on the PA, it is likely to be extended. It should be noted that there are differences in the behavior of the dependence of the radius of gyration, *R*_{g}, on *N* depending on whether the chain is globally neutral (plus and minus charges exactly cancel) or statistically neutral^{9} (residual charge when averaged over a large number of sequences scales as $N$ with *N* ≫ 1). Thanks to several insightful theoretical studies,^{2–4,10} the complex phase behavior of PAs as a function of salt concentration and temperature has been elucidated.

More recently, there has been renewed interest in PAs in the biophysics community because many eukaryotic proteins contain an unusually large fraction of charged residues.^{11–13} As a consequence, the favorable hydrophobic interactions cannot overcome the residual electrostatic interactions. For this reason, this class of proteins does not adopt globular structures unless it is in complex with another partner protein. Polypeptide sequences with this characteristic are referred to as intrinsically disordered proteins or IDPs because they do not have stable ordered structures under physiological conditions. It is also the case that there are protein sequences in which only certain regions are disordered under nominal conditions. Because of the preponderance of such sequences and their roles in a variety of cellular functions and the potential role they play in diseases,^{13,14} there is heightened interest in understanding their structural and dynamical properties.^{15–18} The IDPs, whose backbone is relatively flexible [persistence length in the range (0.6–1.0) nm], are low complexity sequences containing a large fraction of charged residues and smaller fraction of hydrophobic residues compared to their counterparts that adopt well-defined structures in isolation. As a consequence, water is likely to be a good or at best a Θ solvent, which means that *R*_{g} ≈ *N*^{ν}, where *ν* is approximately 0.6 or 0.5. There are differences between IDPs and random PAs. (i) The sequences of IDPs are quenched, thus making it necessary to understand the conformations of a specific sequence. In other words, two sequences with identical charge composition could have drastically different structural characteristics. Of course, this could be the case for random PAs as well, although this aspect has not been investigated as much. (ii) Unlike the case of PAs for which *N* ≫ 1, which allows one to develop analytical and scaling type arguments using well-developed methods in polymer physics, typically studied IDPs have finite *N*, at best on the order of a few hundred residues. (iii) IDPs also contain uncharged amino acids, which are not usually considered when treating PAs using theory and simulations. Despite these differences, concepts from polyelectrolytes (PEs) and PAs have been used to envision the conformations of IDPs using the difference between the positive and negative charge (*σ*) and net charge as appropriate variables.^{19}

The importance of sequence effects on the *R*_{g} of PAs was first illustrated in a key note by Srivastava and Muthukumar.^{3} Using Monte Carlo simulations, with *N* = 50, they showed that there are substantial variations in *R*_{g} in PAs (containing only charged monomers) for a globally neutral chain. This study showed that the location of charges (sequence specificity) plays a crucial role in determining the conformational properties. More recently, Firman and Ghosh^{20} used the Edwards model for charged polymers, encoding for the precise sequence in order to calculate *R*_{g}s for small *N*. Their theory successfully accounted for simulations of synthetic IDPs^{21} containing only a mixture of positive and negative charged residues.

Here, we develop a theory to investigate the effects of charge fluctuations on the shapes of random PAs. In our model, there is a probability, *p*_{+}, (*p*_{−}) that a monomer at location *s* is positively (negatively) charged. The probabilities, *p*_{+} and *p*_{−}, should be calculated as follows. The number of sequences, *M*, of a PA containing *N* monomers is *M* = 3^{N} because each monomer can either have a + or a − charge or is neutral. We assume that there are no correlations between charges along a given sequence, which implies that the magnitude of the charge of monomer *s* does not affect the value of *s*′. Thus, *p*_{+} and *p*_{−} are independent of *s*. Let *N*_{+}(*s*) (*N*_{−}(*s*)) be the number of sequences with a + (−) charge at position *s*. Then, $p+=N+(s)M$ and $p\u2212=N\u2212(s)M$. Because *N*_{+}(*s*) + *N*_{−}(*s*) + *N*_{0}(*s*) = *M*, where *N*_{0}(*s*) is the number of sequences in which the *s*th monomer is neutral, the probability that the *s*th monomer in an ensemble of *M* sequences is neutral is 1 − *p*_{+} − *p*_{−}.

The fluctuations in the ensemble of PA sequences arise because the normalized charge distribution is taken to be stochastic and is given by

The charges are measured in units of *e*^{−}. Because some monomers do not carry a charge (like in IDPs) (*p*_{+} + *p*_{−}) ≠ 1, the mean ⟨*σ*(*s*)⟩ gives the net charge, *p*_{+} − *p*_{−}, and the expression for the square of the charge fluctuations is $\u27e8\delta \sigma 2(s)\u27e9=p++p\u2212\u2212(p+\u2212p\u2212)2$. We refer to ⟨*σ*(*s*)⟩ and ⟨*δσ*(*s*)⟩, both of which are independent of *s*, as PA variables. We show that due to ⟨*δσ*^{2}(*s*)⟩ the *R*_{g} is altered substantially and could even induce a coil-globule transition even when the total charge on the PA is not globally neutral. Because of the opposing behavior of polyelectrolyte (*σ* ≠ 0) and PA effects arising from charge fluctuations ($\u27e8\delta \sigma 2(s)\u27e9\u22600$), the dependence of *R*_{g} on the Debye screening length could be non-monotonic. The phase diagram in the [⟨*σ*(*s*)⟩, ⟨*δσ*(*s*)⟩] plane is rich. We also apply the theory to calculate *R*_{g} of specific IDP sequences. Remarkably, the theory reproduces quantitatively the *R*_{g} values for the wild type Tau protein and various fragments obtained from the wild type Tau, which have been measured by Small Angle X-ray Scattering (SAXS) experiments.^{22} In Tau, and other IDPs, charge fluctuations arise because of conformational heterogeneity, which we demonstrate explicitly elsewhere^{23} for IDPs and here for PAs using simulations. From now on, we drop the angular brackets in both ⟨*σ*⟩ and ⟨*δσ*⟩.

## II. THEORY

We begin by considering the Edwards Hamiltonian for a polymer chain,

where $r\u2192(s)$ is the position of the monomer *s*, *a*_{0} is the monomer size, and *N* is the number of monomers. The first term in the Eq. (2) accounts for chain connectivity, and the second term represents the sum of excluded volume interactions, electrostatic interactions, and effects of charge fluctuations (see below) due to the random values of charges in different positions in the ensemble of sequences. The expression for $V(r\u2192(s))$ is

The first term in Eq. (3) accounts for the non-specific two body excluded volume interactions. It differs insignificantly from the usual *δ* function potential used in the standard Edwards model. Of course, when *a*_{0} is small compared to *R*_{g}, the precise form of this term is irrelevant, as long as it is short-ranged. In a good solvent ($v0$ > 0), the polymer chain swells with *R*_{g} ∼ *a*_{0}*N*^{ν} (*ν* ≈ 0.6), whereas in a poor solvent ($v0$ < 0), the size of the polymer is *R*_{g} ∼ *a*_{0}*N*^{ν} (*ν* ≈ 1/3). Here, we consider a PA in a good solvent ($v0$ > 0).

From Eq. (3), one may obtain an effective interaction term between charges on the PA chain. By following the theory developed previously,^{24} we use the Hubbard-Stratonovich transformation to decouple the product of charges *σ*(*s*)*σ*(*s*′) in Eq. (3). The partition function may be written as

where $Z\psi =\u222bd[r\u2192]exp\u2212V0\u2212i\u222bds\sigma (s)\psi (r\u2192(s))$ and $N=\u222bd[\psi (r\u2192)]exp[\u221212\u222bdr\u2192dr\u2032\u2192\psi (r\u2192)V1\u22121(\u2223r\u2192(s)\u2212r\u2192(s\u2032)\u2223)\psi (r\u2032\u2192)]$. If we assume that the charge distribution [Eq. (1)] is annealed, it suffices to average *Z*_{ψ} over the sequence of charges. With an assumption that the charges *σ*(*s*) at distant sites are not correlated, the partition function averaged over a sequence of charges to second order in *ψ* becomes^{24}

where the average value of the charge on the chain *σ* = ⟨*σ*(*s*)⟩ = *p*_{+} − *p*_{−}, the charge fluctuation $(\delta \sigma )2=\u27e8\sigma 2(s)\u2212\u27e8\sigma (s)\u27e92\u27e9=p++p\u2212\u2212(p+\u2212p\u2212)2$, and the local monomer density $c(r\u2192)=\u222bds\delta (r\u2192(s)\u2212r\u2192)$. The term involving (*δσ*)^{2}, arising from the charge fluctuations, gives rise to the so called PA effect, which is manifested as an effective attractive interaction of the screened Coulomb potential. Using Eqs. (4) and (5), we perform the needed integration over $\psi (r\u2192)$ to obtain the following expression for the effective two body interaction term between charges on the PA,

We neglect the three body interactions in the effective Hamiltonian in Eq. (6), which would be important if the PA were in a poor solvent. In the work of Higgs and Joanny,^{2} the variational type calculation (see below) was done directly using Eq. (3). In this case, upon expansion to second order in $V(r\u2192(s))$, the electrostatic potential [second term in Eq. (3)] generates a term ∝ *σ*(*s*)*σ*(*s*′)*σ*(*s*″)*σ*(*s*‴)*σ*(*s*′′′′), which is random. When averaged over the ensemble of sequences, the coefficient of the third term is $\u221d(p++p\u2212)2$ in Ref. 2. By contrast, we carry out averaging first, as shown in Eq. (4), and hence obtain a different prefactor for the charge fluctuation (*δσ*) induced attraction term in Eq. (6).

The screened Coulomb potential, the second term in Eq. (6), accounts for the interactions between charges separated by a distance $\u2223r\u2192(s)\u2212r\u2192(s\u2032)\u2223$. The strength of the unscreened electrostatic interactions is characterized by the Bjerrum length *l*_{B} = *e*^{2}/*ϵk*_{B}*T*. The Debye screening length, *κ*^{−1}, determines the range of the electrostatic interactions. By changing the value of *κ*, and hence the range of charge interactions, the PA chain could undergo a coil-to-globule transition. The value of *κ* may be the changed by decreasing or increasing the salt concentration. The dimensionless parameter, *σ*, determines the net charge per residue on the polyelectrolyte chain. For a particular sequence, fraction *p* = *p*_{+} + *p*_{−} of the monomers is charged with the charge on each monomer being ±*e*. Therefore, the net charge per monomer is *σ* = ∣*p*_{+} − *p*_{−}∣. The third term in Eq. (6) is the attractive interaction term that is proportional to charge fluctuations (*δσ*). The PA effect arises due to the interaction between the charge and dipoles formed between the sequence of positive and negative charges. The charge-dipole interaction term decays as $\u2223r\u2192(s)\u2212r\u2192(s\u2032)\u2223\u22122$ and it is effectively screened (with a screening length 1/2*κ*) due to the presence of other dipoles. In the absence of the third term, the Hamiltonian would describe a polyelectrolyte, whose phases as a function of temperature and *κ* have been previously described using the methods used here.^{25}

In order to obtain *R*_{g}, we adopt the Edwards-Singh (ES) type variational calculation,^{26} which has been extensively used in the polymer literature.^{2,25,27–29} More recently, the method was used to study the sequence dependence of collapse of polypeptide chains^{30} and polyelectrolytes^{20} with application to a special class of synthetic IDPs. In developing the theory, we assume that the interactions between charges exist only between specific monomers described by the second and third term in Eq. (6). The sum is over the set of specific contacts between pairs {*s*_{i}, *s*_{j}}. We use the contact maps of IDP, generated in coarse-grained (CG) simulations of IDPs,^{23} in order to assign the specific interactions. The contact map from the simulation is computed by using a cutoff of 8 Å. The contacts are included between all side chain beads. In the two bead CG model,^{23} the charges are positioned on the center of masses of the side chain beads, and therefore the contact map includes charge-charge contacts.

The ES method is a variational type (referred to as the uniform expansion method) calculation that represents the exact Hamiltonian by a Gaussian chain with an effective monomer size, which is determined as follows. Consider a virtual chain without excluded volume interactions, whose radius of gyration is $\u27e8Rg2\u27e9=Na2/6$,^{26} described by the Hamiltonian,

The monomer size in the trial Hamiltonian is *a*. We split the deviation $W$ between the virtual chain Hamiltonian and the real Hamiltonian as

where

The radius of gyration is $Rg2=1N\u222b0N\u27e8r\u21922(s)\u27e9ds$, with the average being $\u27e8r\u21922(s)\u27e9=\u222br2e\u2212Hv/kBTe\u2212W\delta r\u2192\u222be\u2212Hv/kBTe\u2212W\delta r\u2192=\u27e8r\u21922(s)e\u2212W\u27e9v\u27e8e\u2212W\u27e9v$, where, ⟨⋯⟩_{$v$} denotes the average over $Hv$.

Assuming that the deviation $W$ is small, we can calculate the average to first order in $W$. The result is $\u27e8r\u21922(s)\u27e9\u2248\u27e8r\u21922(s)(1\u2212W)\u27e9v\u27e8(1\u2212W)\u27e9v\u2248\u27e8r\u21922(s)(1\u2212W)\u27e9v\u27e8(1+W)\u27e9v$ and the radius of gyration becomes

If we choose the effective monomer size *a* in $Hv$ such that the first order correction [second and third terms in the right hand side of Eq. (10)] vanishes, then the size of the chain is $\u27e8Rg2\u27e9=Na2/6$. This is an estimate of the exact $\u27e8Rg2\u27e9$ and is only an approximation as we have neglected $W2$ and higher powers of $W$. Thus, in the ES theory, we determine *a* using Eq. (10),

The equation given above leads to a self-consistent equation for *a* and is given by^{26}

By calculating the averages in the Fourier space [$r\u2192n=1N\u222b1Ncos\pi nsNr\u2192(s)ds$, $r\u2192(s)=2\u2211n=1Ncos\pi nsNrn\u0303\u2192$, and $Rg2=2\u2211n\u27e8|rn\u0303\u21922|\u27e9$], we obtain

where $C1ss\u2032=\u2211n=1N1\u2212cos[n\pi (s\u2212s\u2032)/N]n4$ and $C2ss\u2032=\u2211n=1N1\u2212cos[n\pi (s\u2212s\u2032)/N]n2$. In obtaining Eq. (13), we have used $v0=4\pi a033$ in Eq. (3).

From Eq. (13), we can calculate the effective monomer size *a* and hence the chain size $\u27e8Rg2\u27e9=a2N6$. However, without having to solve Eq. (13) numerically, we can define the Θ-like point, which signals the onset of a potential transition from a coil to a globule state in the PA. At the *θ*-point, the repulsive terms exactly balance the PA term. Since at the Θ-point, the PA behaves as a Gaussian chain, with *a* = *a*_{0}, we substitute this value for *a* in Eq. (13) to determine the condition for the Θ-point. Thus, from Eq. (13), the critical charge fluctuation value, at which the PA term equals the excluded volume and PE terms, is

The numerator in Eq. (14) is a consequence of the repulsion containing excluded volume interactions and the polyelectrolyte term. The denominator encodes the PA effect, determining the extent to which the size of the polymer changes due to charge fluctuations. Using Eq. (14), we obtain the dependence of *δσ*_{θ} on *N*. Scaling *n* by *N*, it can be shown that $C1ss\u2032\u223c1N2$ and $C2ss\u2032\u223c1$. From these results, we obtain $\delta \sigma \theta \u223cN$. The implication is that for *N* ≫ 1, charge fluctuations have to be extremely large to drive coil to globule transitions unless the PA is globally neutral. Because even for statistically neutral PA the PE term would not be irrelevant, we surmise that a genuine coil to globule transition may not be easily realizable in long PAs, which is in accord with the results in a previous study.^{9} By implication, our theory suggests that maximally compact IDPs would be difficult to obtain for generic IDP sequences if the fractions of + and − charged residues is on the order of (0.4–0.5). Of course, to quantitatively describe the conformational ensemble of IDPs or PAs adopt as the PA parameters and salt concentration (*σ*, *δσ*, and *κ*) are varied, it will require detailed calculations, as was previously done for polyelectrolytes.^{31}

## III. SIMULATIONS

### A. Model

To provide further insights into some aspects of our theoretical predictions and to highlight the nature of the heterogeneous ensembles that are sampled, we carried out simulations of PA chains, with *N* = 50. We consider sequences having the same net charge, *σ*, but different charge distributions to elucidate the role of sequence in determining the size of PAs. The PA chain is modeled using a standard bead-spring model for charged polymers, with the total potential energy, *U*_{tot}, given by

Here, *U*_{ch} describes the chain connectivity between the beads and is modeled using the FENE (finitely extensible nonlinear elastic) potential,

In Eq. (16), *k* = 20 kcal mol^{−1} Å^{−2} denotes the spring constant, *l*_{0} = 3.8 Å is the equilibrium bond length between the connected PA beads, and *R*_{0} = 2 Å controls the maximum allowable deformation of the covalent bonds.

The excluded volume interactions between pairs of beads are described by a truncated and shifted Lennard-Jones potential,

Based on the previous work,^{32–34} we set *ϵ* = *k*_{B}*T* and *σ* = *l*_{0}. The pairwise interactions between the beads are ignored if the distance is greater than 2^{1/6}*σ*. This cutoff ensures that the excluded volume term is purely repulsive.

The interactions between charged beads are taken into account via the screened Coulomb potential,

In Eq. (18), *ε* and *κ* are the inverse Debye length and the dielectric constant, respectively. We consider only unit charges, i.e., *q* = ±*e*.

### B. Simulations

The conformational space of each PA chain is explored using Langevin dynamics. For each PA bead, the stochastic equation is given by $mr\u0308i=\u2212\gamma r\u0307i+Fi+gi$, where *m* is the mass, *F*_{i} is the conservative force acting on each bead, and *γ*_{i} is the drag coefficient. The Gaussian random force, *g*_{i}, satisfies ⟨*g*_{i}(*t*)*g*_{j}(*t*′)⟩ = 6*k*_{B}*Tγδ*_{ij}*δ*(*t* − *t*′). The drag coefficient *γ* is given by *γ* = *m*/*τ*_{eff}, where *τ*_{eff} = *σ*(*m*/*ϵ*)^{1/2} is the effective time scale. We used a variant of the velocity Verlet scheme^{35} to integrate the equations of motion, using a time step of Δ*t* = 0.01 *τ*_{eff}. Each simulation was carried out for 1.2 × 10^{9} steps to ensure proper equilibration and to obtain meaningful statistics.

### C. Analysis

Following Eq. (6), we can estimate the charge fluctuations for each PA chain from simulations using an approximate expression,

where *δU*_{elec} = *U*_{elec} − ⟨*U*_{elec}⟩ is the fluctuation in the electrostatic energy [Eq. (18)] about its mean, and *δσ*_{c} denotes the charge fluctuation computed from the ensemble of sequence-specific conformations generated from simulations (see below).

To characterize the structural heterogeneity of the PA ensembles and to identify the most populated equilibrium conformations, we carried out hierarchical clustering of the simulation trajectories using a pairwise distance metric, *D*_{ij}, defined as

where $ra,bi$ and $ra,bj$ are the pairwise distances between the PA beads *a* and *b* in snapshots *i* and *j*, respectively. Distinct geometric clusters were identified using the Ward variance minimization algorithm,^{36} as implemented within the *scipy* module. The hierarchical organization of conformations into distinct families was visualized in the form of dendrograms.

## IV. RESULTS

### A. Theoretical predictions

From Eq. (6), it is easy to show that the size of the PA should be determined by $\delta \sigma \sigma $, which can be written as $(1\u2212\sigma A)/\sigma $, where $\sigma A=(p+\u2212p\u2212)2\sigma $, which in the IDP literature is referred to as the charge asymmetry parameter. Figure 1, displaying the dependence of the radius of gyration for a PA chain with randomly distributed charges on the screening length, shows that *R*_{g} changes non-monotonically as *κ* increases when $\delta \sigma \sigma =10$. In this charge-fluctuation dominated regime, the behavior can be explained by noting that at small values of *κ*, *R*_{g} increases due to the PA term until *κl*_{b} ≈ 0.19. In this range of *κ*, the effective attractions between monomers, the PA effect, decrease by adding ions to the solvent. As a result, the size of the chain increases. For the PA whose *R*_{g} is shown in Fig. 1, at *κl*_{B} = 0.19, the PA and PE effects balance each other, and the chain becomes a random coil. Upon further increase in *κ*, the decrease in *R*_{g} (but the chain is not a globule) is due to the dominance of the PA term. In the opposite limit when $\delta \sigma \sigma =0.1$, both the dimensions of the chain are dominated by the PE term, and *R*_{g} decreases with increasing *κ*. We expect that at sufficiently large values of *κl*_{B} the consequences of PE and PA effects are negligible, and hence, *R*_{g} would have the value expected for a Flory random coil (*ν* = 0.6). Interestingly, these trends are qualitatively similar to the experiments on two IDPs (the N-terminal domain of HIV-1 integrase and human prothymosin-*α*).^{37}

### B. Predictions for a Tau-like IDP

The generality of the theory allows us to predict the dependence size of an IDP. In Fig. 2, we plot the *κ* dependence of radius of gyration of a Tau protein fragment (K17Tau with *N* = 145). To perform the calculations, we used the contact map generated in simulations based on the Self-Organized Polymer (SOP)-IDP model, which captures accurately the measured structure factors for a variety of IDPs.^{23} Using input from simulations,^{23} which accounts for the heterogeneity of the conformational ensembles of IDP, we find that *R*_{g} of K17Tau changes non-monotonically with increasing value of *κ* (Fig. 2). The size of K17Tau protein increases with *κ* until it reaches a maximum at *κl*_{B} ≈ 0.28 (ion density for a monovalent salt is ≈30.5 mM), where the protein behaves like a polymer in a Θ-solvent. The peak is broader when compared to the chain with random sequences. With further increase in *κ*, *R*_{g} decreases just as for the random PA chain (Fig. 1). From these results, we conclude that charge fluctuations are substantial in K17Tau.

### C. Phase diagram

The 3D plots in Figs. 3 and 4 for two different values of *κ* show the phase diagram for different values of *σ*, and *δσ*. The plot in Fig. 3 shows that for small values of *σ*, the change in *R*_{g} is significant at a particular value of *δσ*. For a large value of net charge, say *σ* = 0.8, the change in *R*_{g} is small over a range of values of *δσ* indicating that PE effects dominate. The value of *δσ*_{θ} increases with *σ* for a PA chain. In Fig. 4, for *κ* = 0.4 nm^{−1} and for a high value of net charge *σ*, the change in *R*_{g} is significant at a particular value of *δσ* indicating that PA effects dominate. The phase diagrams show that by changing the salt concentrations, the sizes of random PAs can be altered dramatically.

### D. Application to IDPs

In order to calculate *R*_{g} for several IDPs (see Fig. 5), with *N* ranging from 24 (HISTATIN5) to 441 (hTau40), we used the average contact maps from simulations,^{23} which restrict the summation in Eq. (6) to specific sites on the IDP. The dependence of *R*_{g} on the chain length for a set of IDPs (listed in the caption to Fig. 5) is shown in Fig. 5. The theory shows that *R*_{g} ∼ *N*^{0.6}, implying that these IDPs behave as self-avoiding polymers, similar to the results in the simulations for PAs.^{9} The scaling in Fig. 5 has a weaker *N* dependence than that predicted by the renormalization group argument (*R*_{g} ∼ *N*) for long PAs.^{38} The *N* dependence in Fig. 5 has higher power than the result in Ref. 2 (*R*_{g} ∼ *N*^{1/3}). It appears that for values of *σ* observed in this set of IDPs the random coil behavior is the apt description.

### E. Sequence effects and conformational heterogeneity

To illustrate the effect of charge fluctuations on the conformational ensemble of PAs, we performed simulations of PA chains using a simple off-lattice model. The PA variables, *p*_{+} and *p*_{−}, as well as the other simulation parameters are kept fixed. Hence, any variations in the size or the underlying conformational heterogeneity of the PA sequences is entirely due to the different charge distributions. In a recent study, Firman and Ghosh^{20} identified combinations of *p*_{+} and *p*_{−}, for which coil to globule transitions are expected to be extremely sensitive to the charge decoration along the PA chain. Taking a cue from their work, we consider PA sequences with a net charge of +6, with *p*_{+} = 0.280 and *p*_{−} = 0.160.

Twenty different realizations of charge distributions were generated by randomly permuting the positions of the neutral, positive, and negative beads along the chain. The ensemble averaged *R*_{g} values fall in the range from 1.77 to 2.02 nm. The spread in *R*_{g}s is interesting considering that in all these sequences *N* = 50, and the PA variables, *σ* and *δσ*, which are often used to analyze data in the IDP community, are identical. To explain these differences, in terms of fluctuations in the conformations linked to *δσ*, we consider three representative examples (*Seq1*, *Seq2*, and *Seq3*, in Fig. 6). The peak of the *R*_{g} distribution progressively shifts toward lower values in going from *Seq1* to *Seq3* (Fig. 6), clearly indicating that standard PA variables are not sufficient to fully describe the equilibrium properties.

Insights into the relative populations of the coil-like and the globule-like states for the three PA sequences can be obtained from the hierarchical arrangement of structural clusters (Fig. 7). The structural ensemble of *Seq1* is clearly dominated by extended conformations, which accounts for 64.1% of the equilibrium population. Compact structures, on the other hand, have a lower occupation probability (35.9%). For *Seq2*, the relative populations of extended and compact structures are approximately the same, being 49.5% and 50.5%, respectively. As is evident from the dendrogram, the equilibrium ensemble of *Seq3* is primarily dominated by compact structures (net population of 73.4%), which is in complete contrast to *Seq1*. The contrasting heterogeneity of the conformational ensembles for the three PA sequences, with identical *N*, *σ*, and *δσ*, readily explains the differences in *R*_{g}.

The distributions of *δU*_{elec} (Fig. 6), together with the approximate values of *δσ*_{c} computed using Eq. (19) (Table I), provide a clear-cut evidence of the key role of the charge fluctuations in determining chain dimensions. For *Seq1*, which has a high propensity to form extended structures, the *δU*_{elec} distribution is narrow, and the charge fluctuation is minimal. In fact, the variance to mean ratio (VMR) of the electrostatic energy suggests that the charge distribution of *Seq1* would correspond to the theoretical limit, $\delta \sigma \sigma <1$, where PE effects dominate. For *Seq3*, the *δU*_{elec} distribution is quite broad, and charge fluctuation effects are the most dominant, in perfect harmony with the clustering analysis, which revealed that *Seq3* is mostly associated with compact structures. Furthermore, the VMR suggests that in contrast to *Seq1*, the appropriate theoretical limit would be $\delta \sigma \sigma >1$, the regime where PA effects dominate. *Seq2*, for which the equilibrium populations of compact and extended structures are approximately equal, presents an interesting scenario. As expected, the estimated charge fluctuation falls between the two extremities. The VMR ≈1 implies that the corresponding theoretical limit would be $\delta \sigma \sigma \u22481$. Therefore, the random coil like behavior is predicted from structural clustering, which manifests itself due to a balancing act of the PA and PE effects.

Seq . | ⟨δ^{2}U_{elec}(kcal/mol)⟩
. | ⟨R_{g}⟩ (nm)
. | δσ_{c}
. | |⟨δ^{2}U_{elec}⟩/⟨U_{elec}⟩|
. |
---|---|---|---|---|

Seq1 | 0.23 | 2.03 | 1.53 | 0.83 |

Seq2 | 1.04 | 1.87 | 2.14 | 1.02 |

Seq3 | 4.07 | 1.78 | 2.93 | 9.49 |

Seq . | ⟨δ^{2}U_{elec}(kcal/mol)⟩
. | ⟨R_{g}⟩ (nm)
. | δσ_{c}
. | |⟨δ^{2}U_{elec}⟩/⟨U_{elec}⟩|
. |
---|---|---|---|---|

Seq1 | 0.23 | 2.03 | 1.53 | 0.83 |

Seq2 | 1.04 | 1.87 | 2.14 | 1.02 |

Seq3 | 4.07 | 1.78 | 2.93 | 9.49 |

We draw two important conclusions. (i) The scaling of *R*_{g} with *N*, which for a certain class of IDPs, obeys the Flory scaling law and can be used to accurately determine *R*_{g} without the need for experiments or simulations. The class of IDPs for which *R*_{g} can be computed using *R*_{g} ∼ *N*^{ν} (*ν* ≈ 0.6) can be discerned using *σ* and *δσ*. (ii) However, quantitative description of the equilibrium properties of IDPs as a function of salt concentration or denaturants requires a complete characterization of the conformational ensemble, as simulations explicitly demonstrate. The analyses of charge fluctuations show that *δσ*, which can be readily calculated for a specific sequence (in our simulations they are identical), is substantially modified when weighted by the energies associated with the conformational ensemble (denoted as *δσ*_{c}). Thus, only by understanding the details of the conformations can the properties of IDPs be correctly described. Alas, the average *R*_{g} masks such subtle but important effects.

## V. CONCLUSIONS

We developed a theory to quantitatively predict the effect of charge fluctuations (*δσ*) on the size of flexible PAs that is in a good solvent (excluded volume interactions are positive) as a function of the inverse Debye length and the net charge per monomer on the chain (*σ*). Interestingly, when charge fluctuations are non-negligible ($\delta \sigma \sigma $ is greater than unity), the radius of gyration increases non-monotonically as *κ* increases. When $\delta \sigma \sigma $ is less than unity, *R*_{g} decreases with increasing *κ*. The generality of the theory allows us to predict *R*_{g} for a number of IDPs. For a certain class of IDPs, we find the usual scaling of *R*_{g} ∼ *N*^{ν} with *ν* = 0.6, which coincides with the behavior expected for Flory random coils. Remarkably, our theory gives accurate estimates of the size of the Tau protein and various fragments derived from it. This class of IDPs behaves as an ideal chain. The differences in the two scaling behavior between these IDPs can be rationalized in terms of the interplay between charge fluctuations and the net charge per monomer.

What could be the origin of charge fluctuations in an IDP in which *σ* (more precisely the precise sequence) is fixed? Even if *σ* is fixed, the ensemble of conformations that a typical IDP or PA samples is heterogeneous. In sampling a large number of conformations, the spatial distances between charged residues could vary greatly. Therefore, the effective charge of each conformation is different. In some of the conformations, positively and negatively charged residues would be close together, whereas in others they would be spatially well-separated. This gives rise to conformation-dependent effective attraction, which is quantified in our theory in terms of the average quantity *δσ*. Of course, the effective value of *δσ* cannot be computed for a quenched charged sequence for a specific IDP without suitable simulations (see Fig. 7 which illustrate this important point using three specific PA sequences). Therefore, it is difficult to construct phase diagrams of IDPs solely in terms of *σ* or the differences between the number of positively and negatively charged residues. Construction of phase diagrams requires the use of physical order parameters, which necessarily involves quantitatively characterizing the conformational ensembles of IDPs, an exercise requiring simulations using models that reproduce experimental measurements, such as scattering profiles.

## ACKNOWLEDGMENTS

We are indebted to Upayan Baul for providing the simulation results and for useful discussions. We thank Professor D. Svergun for providing us SAXS data on the Tau protein. This work was supported by the National Science Foundation and the National Institutes of Health (Grant Nos. CHE 16-36424 and R01 GM107703) and the Collie-Welch Foundation (No. F-0019).