Fully understanding biomolecular function requires detailed insight into the systems’ structural dynamics. Powerful experimental techniques such as single molecule Förster Resonance Energy Transfer (FRET) provide access to such dynamic information yet have to be carefully interpreted. Molecular simulations can complement these experiments but typically face limits in accessing slow time scales and large or unstructured systems. Here, we introduce a coarse-grained simulation technique that tackles these challenges. While requiring only few parameters, we maintain full protein flexibility and include all heavy atoms of proteins, linkers, and dyes. We are able to sufficiently reduce computational demands to simulate large or heterogeneous structural dynamics and ensembles on slow time scales found in, e.g., protein folding. The simulations allow for calculating FRET efficiencies which quantitatively agree with experimentally determined values. By providing atomically resolved trajectories, this work supports the planning and microscopic interpretation of experiments. Overall, these results highlight how simulations and experiments can complement each other leading to new insights into biomolecular dynamics and function.

Life at the molecular level is determined by the interplay of structure and dynamics. While a biomolecular structure can be determined by sophisticated experimental techniques such as NMR or X-ray, detailed insight into its dynamics is more difficult to access. A particularly powerful technique is FRET (Förster Resonance Energy Transfer), where specific sites in biomolecules are labeled by fluorescent dyes. FRET can provide time-resolved information about, e.g., protein-folding dynamics,1 folding intermediates,2,3 unfolded or disordered proteins,4–6 or conformational transitions.7 Both planning and interpretation of such experiments can be nontrivial, as FRET does not directly provide quantitative distance information within the protein. Instead, it rather measures the transfer efficiency between the dyes which—besides the distance between the dyes—depends on their orientation and dynamics and hence also on their shapes. A useful complement for such experiments is molecular simulations of dye-labeled biomolecules. Depending on their complexity, additional information up to the full dynamics of all atoms becomes accessible.

To date, many simulation approaches are available. A straightforward approach is predicting the inter-dye distance by estimating the accessible volume of FRET dyes.8–12 This does, however, not account for dynamics and the mutual dye orientation, reflected by the orientation factor κ2. An approach that has been used extensively for unfolded and intrinsically disordered proteins is the interpretation of FRET data in terms of analytical polymer models,13 but such models are not applicable to folded proteins. Molecular dynamics simulations are computationally considerably more demanding and aim at modeling the system dynamics accurately. In this approach, the dyes are typically parametrized for a specific biomolecular force field.14 Then, simulations on the ns to μs time scale are run and compared to experimentally accessible properties, such as fluorescence anisotropy15 or mean FRET efficiencies.14 As one might imagine, dye and linker fluctuations play an important role.16 Similarly, the specific dye shape and mutual orientation are crucial and are not fully captured by simple accessible volume calculations.17 Molecular dynamics simulations also allow testing assumptions such as approximating the mutual dye-dye orientation factor κ2 with 2/3, which is questioned in several studies,18,19 in particular, for low linker flexibilities.20 Many simulation protocols have computationally prohibitive demands to reach sufficiently long simulation times for accurate calculations of experimentally accessible properties.21 In particular, it is challenging to account for slow or large-scale conformational transitions, structurally diverse ensembles such as unfolded proteins, very large systems, or intrinsically disordered proteins.

Here, we want to introduce a new method for simulating the dynamics of FRET-labeled proteins on a coarse-grained description level22 which maintains full protein flexibility and all heavy atoms of proteins and dyes. This considerably lowers computational costs and allows us to eliminate many challenges regarding the mentioned insufficient sampling of large conformational ensembles. Our model allows us to access and quantitatively analyze these complex dynamic ensembles in atomic detail. The model requires only few and robust parameters to achieve a realistic description of the system. We validate our approach by direct and quantitative comparison against experimental FRET efficiency histograms. As an example, we highlight simulating large and diverse structural ensembles by comparing simulations and measured FRET data of both folded and unfolded proteins. Having established the precision of our model, we then compare our model to descriptions of FRET by the accessible volume approach and analytical polymer models.

Förster Resonance Energy Transfer (FRET)23 is a non-radiative energy transfer between a donor and an acceptor fluorophore. It has a high sensitivity to distances in the nanometer range and is therefore a powerful “spectroscopic ruler”.24 For example, one can label specific residues of proteins with suitable donor and acceptor fluorophores to distinguish different protein conformations and conformational changes directly through distance changes between the dyes.

FRET experiments measure the FRET efficiency E, which is related to the inter-dye distance RDA by25 

E=11+RDAR06.
(1)

The Förster radius R0 determines the range of measurable distances and is given by25 

R06=9(ln10)κ2QDJ(λ)128π5n4NA,
(2)

where QD is the fluorescence quantum yield of the donor in the absence of the acceptor, κ2 is the dipole orientation factor, n is the refractive index of the medium, NA is the Avogadro constant, and J(λ) is the spectral overlap integral of donor emission and acceptor absorption spectra. The orientation factor κ2 describes the relative orientation of the transition dipole moments for emission of the donor, μD, and absorption of the acceptor, μA, and can be calculated via25 

κ2=(sinθDsinθAcosϕ2cosθDcosθA)2,
(3)

where θD and θA are the angles between μD and μA and the connecting vector RDA between the two dye centers, respectively. ϕ denotes the angle between the planes defined by μA, RDA and μD, RDA, respectively.

Most experimental studies assume fast rotational diffusion of the dyes with respect to the excited state lifetime, resulting in an averaging over all possible orientations. This is referred to as the “isotropic averaging regime,” in which case a constant value of κ2¯=2/3 can be used.25 The assumption of κ2¯=2/3 has been questioned in several studies12,18–20 but can be tested experimentally, e.g., by measuring time-dependent fluorescence anisotropy decays.14 

A sample of randomly oriented dyes illuminated by polarized light depolarizes over time as a consequence of rotational diffusion. Observing the time dependent fluorescence anisotropy r(t) leads to a measure of the flexibility and rotational speed of the dyes, the rotational correlation time τrot. The time-resolved fluorescence anisotropy can be measured experimentally and calculated from simulations using the normalized transition dipole moment vector μ^ via26 

r(t)=r0P2μ^(s)μ^(s+t),
(4)

where r0 is the fundamental anisotropy and P2 is the second-order Legendre polynomial. r(t) allows a direct comparison of experimental observations with simulations. Here, we assume that the absorption and emission dipole moments are collinear for our dyes, so the fundamental anisotropy is given by r0 = 0.4, close to the experimental value.27 

In the analysis of experimental data, often a double-exponential decay is required to account for the contributions of rotational motion of both the fluorophore and the protein.14,26,28 In the systems we investigate here, the proteins are large in comparison to the dyes and the measured dye rotational correlation times are one order of magnitude smaller than the expected rotational correlation time of the proteins. In the simulations, we focus on the rotational motion of the dye in the inertial system of the protein, where we find it sufficient to use only one exponential and fit with

r(t)=r0exptτrot.
(5)

For our simulations, we use native Structure Based Models (SBMs), also referred to as Go¯-type models.29 They are based on energy landscape theory and the principle of minimal frustration.30–33 SBMs are used successfully to study a wide range of phenomena34 ranging from, e.g., protein folding,35,36 misfolding,37,38 structure prediction,39 and conformational dynamics40,41 to large biomolecules such as the ribosome42 or RNA.43 SBM simulations show good agreement with experimental results, e.g., they are used to reproduce transition state ensembles and “en-route” intermediates,44 and folding rates comparable to experimental measurements.35 By design, they are limited to unfrustrated systems with negligible non-native interactions.29–33 With their high computational efficiency, SBMs facilitate simulations of several folding and unfolding transitions on regular desktop computers while offering full flexibility for all parts of the protein. The SBM potential employs the native structure of the protein as its ground state but also allows one to model systems with multiple stable conformations by using potentials with multiple minima.7,40,41,45

We use an all-atom SBM22 implemented in eSBMTools,46 which takes into account all heavy atoms of the protein and uses a simplified potential of the form47 

V=bondsKb(rr0)2+anglesKa(θθ0)2+impr.d.Ki(χχ0)2+prop. d.Kd1cos(ϕϕ0)+121cos(3(ϕϕ0))+contactsKcCG(rij,r0ij)+non-native contactsKncσ̃rij12.
(6)

The potential includes harmonic potentials for bonds, angles, and improper dihedral angles and a potential for proper dihedral angles which allows the occupation of isomeric conformations. The minima are set to r0, θ0, χ0, and ϕ0, the values for bond lengths, angles, improper dihedral angles, and proper dihedral angles in the native state, respectively. The contact potential CG(rij,r0ij) stabilizes the native structure by introducing attractive potentials for atom pairs forming contacts in the native state, whereas an overall repulsive term is included for the excluded volume of all atoms. r0ij and rij are the native and the actual distances of the atom pair (i, j) and σ̃ represents the excluded volume for Pauli repulsion with the value σ̃=2.5 Å. The energetic weights for bonds, angles, improper dihedral angles, and non-native contacts are set to the values Kb = 20 000 ϵ/nm2, Ka = 40 ϵ/deg, Ki = 40 ϵ/deg, and Knc = 0.01 ϵ, where deg refers to degree and ϵ is the reduced energy unit.48 The energetic weights for the contact potential Kc and for the proper dihedral angle potential Kd are determined as described in Ref. 48. The Gaussian contact potential is given by47,49

CG(rij,r0ij)=1+σ̃rij121exp(rijr0ij)22σ21,
(7)

with σ2=(r0ij)2/(50ln2) for each native contact pair (i, j) determined by the shadow map algorithm.47 All atoms are handled equally with identical parameters for the excluded volume and a unit mass of m = 1.0.

As SBMs do not have an inherent temperature scale comparable to physical temperature, the temperatures are typically measured in reduced units. Accordingly, the folding temperature in the SBMs is about 1.0, corresponding to around T = 120 in GROMACS units.48 Here, we report the temperature in GROMACS units.

We want to find a minimal and robust set of parameters describing the dyes which is sufficient to replicate and predict experimental data.

SBM simulations require an initial structure of the system. For many dyes, only chemical structures are available. As we want a systematic and consistent workflow, we perform quantum-chemical calculations to get three-dimensional structures of the dyes. Details and the structures of the Alexa Fluor dyes with available chemical structures are given in the supplementary material (see SI 1). Subsequently we add the respective linkers and maleimide which are used to bind the dyes to the protein. In this study, we use two dye pairs—the Alexa Fluor 488 dye with C5-linker (AF488) and Alexa Fluor 594 dye with C5-linker (AF594), and the Alexa Fluor 546 dye with C5-linker (AF546) and Alexa Fluor 647 dye with C2-linker (AF647). In general, every dye and linker with an available chemical structure can be utilized accordingly.

For the initial dye parameters, we choose the same parametrization as for the protein in the SBM. We determine the angles and dihedral angles automatically from the bond information given in the chemical structure. A detailed description can be found in the supplementary material (see SI 2). The only interaction between both dyes and also between dyes and protein is the excluded volume repulsion term.

SBMs do not have an inherent temperature scale. To adjust the dye parameters independently of the protein in a consistent way, we set the dye temperature Tdye separately from the protein temperature T. In the simulation, we treat dyes and protein as different groups and couple them to separate temperature baths. This has the advantage of uncoupling the behavior of dyes and protein, so we can use the same dye temperature regardless of investigating, e.g., a folded or an unfolded protein, realized by low and high protein temperatures, respectively. We tested the effect of the dye temperature on the protein dynamics and found only a slight increase of the fluctuations of the respective residues the dyes are attached to, whereas the dynamics of the adjacent residues remain unaltered. Due to numerical instabilities,50 we cannot sufficiently increase the temperature to account for the high dye mobility. Hence, we additionally change the mass of the dye atoms to 0.2, rather than the unit atom mass of 1.0 the SBM assigns to the protein.

The protein structures used for the simulations are taken from the Protein Data Bank (PDB).51 The three used proteins are the chymotrypsin inhibitor 2 (CI-2, PDB: 2CI252) as a test system, the tenth type III module of fibronectin (10FNIII, PDB: 1TTG53), and the cold-shock protein from the hyperthermophilic bacterium Thermotoga maritima (CspTm, PDB: 1G6P54).

The Alexa Fluor dyes are typically attached to cysteine residues of proteins via a maleimide group. In our simulations, the respective residue in the protein is replaced by a cysteine and the dye-maleimide structure is attached to the sulfur atom of the cysteine. This is done by attaching the dye preferably orthogonally to the protein surface while avoiding steric clashes between atoms. As an example, 10FNIII with AF546 and AF647 attached to residues 11 and 86, respectively, is shown in Fig. 1. Details of the procedure can be found in the supplementary material (see SI 2).

FIG. 1.

Structures of 10FNIII (left) and AF546 and AF647 dyes55 (middle). The dyes are attached with the respective linkers via a maleimide bound to residues 11 and 86 of 10FNIII. The merged structure is shown with the assumed transition dipole moments (black arrows) for donor (blue, μD) and acceptor (red, μA) dyes as well as the distance between the dyes’ centers (RDA) (right).

FIG. 1.

Structures of 10FNIII (left) and AF546 and AF647 dyes55 (middle). The dyes are attached with the respective linkers via a maleimide bound to residues 11 and 86 of 10FNIII. The merged structure is shown with the assumed transition dipole moments (black arrows) for donor (blue, μD) and acceptor (red, μA) dyes as well as the distance between the dyes’ centers (RDA) (right).

Close modal

We perform our simulations using GROMACS56 with the SBM potential [see Eq. (6)] and the Langevin dynamics simulation protocol.56 As we use CI-2 as a simple test system, we choose a protein temperature of T = 50 below the folding temperature for the simulation of the folded state.

For the proteins CspTm and 10FNIII, we want to achieve a quantitative comparison against experimental measurements. To identify the SBM temperature for the folded states corresponding to the experimental setup, we perform an initial all-atom simulation in the AMBER99 force field:57 We choose the SBM protein temperature by comparing the native-basin fluctuations of the proteins simulated with the AMBER99 force field at the physiological temperatures used in the experiments [T = 296/295 K (10FNIII/CspTm)] with SBM simulations of varying temperatures. We compare the root mean square fluctuations of the Cα-atoms and choose the SBM temperature which results in the least deviation from the AMBER99 simulation, according to Ref. 58. For the results, see Table I. For the unfolded state, we choose a temperature well above the folding temperature so that the protein is unfolded throughout the simulation. We check the influence of choosing different temperatures above the folding temperature, which does not alter the results. Future work will focus on the challenges of simulating conformational transitions (cf. Ref. 59).

TABLE I.

Simulation parameters for different protein/dye combinations. The protein temperatures for simulation of the folded TF and the unfolded TU states, the dye pair, the dye-labeled residues, and the dye temperature(s) are given.

Dye pair Labeled residues Tdye
Protein TF TU Donor/acceptor Donor/acceptor Donor/acceptor
CI-2  50  170  AF546/AF647  20/78  250 
CI-2  50  170  AF488/AF594  20/78  190/250 
CspTm  90  150  AF488/AF594  2/68, 68/2, 11/68, 68/11, 23/68  190/250 
10FNIII  60  200  AF546/AF647  11/86  250 
Dye pair Labeled residues Tdye
Protein TF TU Donor/acceptor Donor/acceptor Donor/acceptor
CI-2  50  170  AF546/AF647  20/78  250 
CI-2  50  170  AF488/AF594  20/78  190/250 
CspTm  90  150  AF488/AF594  2/68, 68/2, 11/68, 68/11, 23/68  190/250 
10FNIII  60  200  AF546/AF647  11/86  250 

To determine appropriate dye temperatures, we perform simulations with different dye temperatures and determine the respective rotational correlation times by using Eqs. (4) and (5). The temperature directly affects the dye mobility and the rotational correlation time τrot. Higher temperatures lead to smaller τrot. To obtain a high dye mobility, we choose the temperature for the faster rotating dye at Tdye = 250 and for the slower dye such that the ratio of both rotational correlation times from experiments is matched (see Table S1 in the supplementary material). For AF647, we do not have an experimental value of τrot and use the same temperature as for AF546. Varying the dye temperature slightly does not alter the final results.

We then perform simulations with 109 steps and dt = 0.0005 for CspTm and with 2.5 × 109 steps and dt = 0.0002 for CI-2 and 10FNIII. The temperature coupling constant is set to τT = 0.1. From the simulations, we extract the inter-dye distance RDA(t) and the transition dipole moments of donor μD(t) and acceptor μA(t) (see Fig. 1, for details see SI 3 of the supplementary material).

SBMs do not have a time scale directly comparable to the physical time scale. To evaluate simulated trajectories and obtain FRET efficiency histograms, we introduce a time scale based on a comparison of the experimental and theoretical rotational correlation times τrot of the dyes. We calculate the fluorescence anisotropy decay r(t) from simulations according to Eq. (4) and fit it with Eq. (5). From the comparison, we get a conversion factor to adjust the trajectory time scale.

This time scale is necessary as we want to calculate FRET efficiency histograms directly comparable to experiments. Photon statistics are gained from Monte Carlo photon simulations (as described in Refs. 8, 60, and 61) and fully account for shot-noise. The simulation parameters are given in Table S1 in the supplementary material. We generate donor and acceptor photons for the whole simulated trajectory, corresponding to around 100 μs on the physical dye time scale, until a specified number of photons (the burst size) is collected. Each of these bursts is used to calculate a single FRET efficiency value. Details are given in the supplementary material (see SI 3).

We perform simulations with three different proteins (CspTm, CI-2, and 10FNIII) and two different dye pairs (AF488 and AF594, AF546 and AF647). Because of the dependence of the Förster radius on the refractive index of the medium, R0 changes slightly for higher denaturant concentrations in the experiments. Hence, for evaluation of the unfolded state, we use the changed Förster radii (see Table S1 in the supplementary material).

As an example, we simulate the test system CI-2. The resulting distributions for CI-2 with two different dye pairs, attached at residues 20 and 78, respectively, are shown in Fig. 2. The inter-dye distances RDA [see Figs. 2(a) and 2(b)] and the orientation factors κ2 [see Figs. 2(c) and 2(d)] for folded and unfolded conformations along with the respective mean values are depicted. We have also carefully investigated the effect of sampling (see SI 4 of the supplementary material).

FIG. 2.

Simulation results for CI-2 for two different dye pairs—AF546 and AF647 (left) and AF488 and AF594 (right)–attached to residues 20 and 78, respectively. The results are shown for simulations of the folded state (blue) and the unfolded state (orange). [(a) and (b)] Distributions of the inter-dye distances RDA and the average distances for folded (R¯F) and unfolded states (R¯U) and respective Förster radii R0 (dashed black lines). Further, the respective distributions of Cα-distances between residues 20 and 78 for folded (green) and unfolded (red) states are shown. [(c) and (d)] Distributions of the orientation factors κ2 and the average orientation factors for folded (κ2¯F) and unfolded (κ2¯U) states. [(e) and (f)] Distributions of the FRET efficiencies with a Gaussian fit (black lines) and the positions of the peaks for folded (⟨EF) and unfolded (⟨EU) states.

FIG. 2.

Simulation results for CI-2 for two different dye pairs—AF546 and AF647 (left) and AF488 and AF594 (right)–attached to residues 20 and 78, respectively. The results are shown for simulations of the folded state (blue) and the unfolded state (orange). [(a) and (b)] Distributions of the inter-dye distances RDA and the average distances for folded (R¯F) and unfolded states (R¯U) and respective Förster radii R0 (dashed black lines). Further, the respective distributions of Cα-distances between residues 20 and 78 for folded (green) and unfolded (red) states are shown. [(c) and (d)] Distributions of the orientation factors κ2 and the average orientation factors for folded (κ2¯F) and unfolded (κ2¯U) states. [(e) and (f)] Distributions of the FRET efficiencies with a Gaussian fit (black lines) and the positions of the peaks for folded (⟨EF) and unfolded (⟨EU) states.

Close modal

Clearly, the unfolded state has a much broader distance distribution, whereas the folded state is narrower. Comparison of the distances of the Cα-atoms with the inter-dye distances shows that the distribution of RDA in the folded state is dominated by dye dynamics. For both dye pairs, the distance distributions are similar. The ideal Förster radius to get best-separated states would be between the average distances of folded and unfolded states. Considering the Förster radii of the two dye pairs used here, the distance distributions show that the pair AF488/AF594 is better suitable for studying CI-2 experimentally.

The distributions of the orientation factors in the folded and unfolded states are similar, and the mean values κ2¯U are in good agreement with the approximation of κ2¯=2/3. As expected, in the case of unfolded proteins, the dyes can rotate freely and their orientations are almost unrestricted. However, we observe slightly larger deviations of 2%-5% for the mean values in the folded state κ2¯F, probably due to steric restrictions imposed by the protein.

The resulting FRET efficiency histograms along with the mean efficiency values calculated by Gaussian fits are illustrated in Figs. 2(e) and 2(f). As an alternative, log-normal distributions or a straightforward median can be used, but this does only have negligible effects on the quantitative results. Different hypothetical Förster radii can be tested with this model (see SI 5 of the supplementary material).

Another advantage of the efficient simulation approach we propose is that different labeling positions can readily be implemented and investigated. As an example, we perform simulations for CspTm with AF488 and AF594 at different labeling positions. The resulting FRET efficiency histograms are shown in Fig. 3. The labeling sites are the residue pairs C2/C68, C68/C2, C11/C68, C68/C11, and C23/C68 for donor/acceptor,62,63 respectively (for the numbering scheme, see SI 6 of the supplementary material). As expected, the peaks for the unfolded state shift to higher efficiencies with shorter sequence separation. Also, permutations of the dyes with respect to the attachment points have, unsurprisingly, no relevance.

FIG. 3.

FRET efficiency distributions for CspTm with AF488 (donor) and AF594 (acceptor) at different labeling sites. The labeling positions of donor (D) and acceptor (A) are color coded in (a)–(e). Colors refer to the positions marked in (f), the residues 2 (red), 11 (green), 23 (blue), and 68 (yellow). The residue numbering within the protein is given in the supplementary material (see SI 6). The distributions are shown for the folded (blue) and unfolded (orange) states and fitted with Gaussians. The positions of the peaks are given for the folded (⟨EF) and unfolded (⟨EU) states.

FIG. 3.

FRET efficiency distributions for CspTm with AF488 (donor) and AF594 (acceptor) at different labeling sites. The labeling positions of donor (D) and acceptor (A) are color coded in (a)–(e). Colors refer to the positions marked in (f), the residues 2 (red), 11 (green), 23 (blue), and 68 (yellow). The residue numbering within the protein is given in the supplementary material (see SI 6). The distributions are shown for the folded (blue) and unfolded (orange) states and fitted with Gaussians. The positions of the peaks are given for the folded (⟨EF) and unfolded (⟨EU) states.

Close modal

Both results (see Figs. 2 and 3) show the strength of the simulation method as it establishes a way to directly relate distance distributions to FRET efficiency distributions and allows one to vary and test different parameters such as the Förster radius, dye pair, linker length, or labeling sites. This facilitates improvement in planning, interpreting, and validating experimental results.

As a validation of our model, we compare the resulting FRET efficiency histograms against experimental data. FRET efficiency histograms from experiments and simulations for four different systems are shown in Fig. 4. Figure 4(a) shows the results for 10FNIII with AF546 and AF647, and Figs. 4(b)–4(d) show the results for CspTm with AF488 and AF594 at different labeling positions.

FIG. 4.

Comparison of FRET efficiency histograms from simulations and experiments. (a) Results for 10FNIII with AF546 and AF647 at residues 11 and 86. [(b)–(d)] Results for CspTm with AF488 and AF594 at (b) residues 68 and 2, (c) residues 11 and 68, and (d) residues 23 and 68, respectively.63 The distributions are shown for the folded (blue) and unfolded (orange) states. The experimental FRET efficiency values below 0.0 and above 1.0 are not shown. The distributions are fitted with Gaussians, and the peak positions for folded (⟨EF) and unfolded (⟨EU) states together with the respective standard deviations σF and σU can be found in Table II.

FIG. 4.

Comparison of FRET efficiency histograms from simulations and experiments. (a) Results for 10FNIII with AF546 and AF647 at residues 11 and 86. [(b)–(d)] Results for CspTm with AF488 and AF594 at (b) residues 68 and 2, (c) residues 11 and 68, and (d) residues 23 and 68, respectively.63 The distributions are shown for the folded (blue) and unfolded (orange) states. The experimental FRET efficiency values below 0.0 and above 1.0 are not shown. The distributions are fitted with Gaussians, and the peak positions for folded (⟨EF) and unfolded (⟨EU) states together with the respective standard deviations σF and σU can be found in Table II.

Close modal

The experimental data for the respective protein in 0.0 M GdmCl (folded) and in 4.63 M and 7.0 M GdmCl (unfolded) for 10FNIII and CspTm63 are shown, respectively. The experimental data are already corrected for background, different quantum yields of donors and acceptors, different detection efficiencies, cross talk, and direct acceptor excitation. From these errors, only the different quantum yields are reflected in our simulation protocol and are already corrected for in the Monte Carlo photon simulations.

The simulated data agree well with the experimental results except in the case of CspTm C11/C68 [see Fig. 4(c)], where the transfer efficiency of the folded state in the simulation is shifted to lower values. In this case, the dyes are attached on opposite sides of the protein, which would explain a lower FRET efficiency compared to the other two labeling schemes. The deviation between simulations and experiments may be caused by residual attractive interactions between the fluorophores and protein surface, an aspect that could be tested by detailed time-resolved fluorescence anisotropy measurements of the different variants.

The mean and standard deviations of the Gaussian fits in Fig. 4 are summarized in Table II. The widths of the FRET efficiency distributions from the simulations are dominated by shot-noise, caused by the limited number of photons collected for each burst.

TABLE II.

Parameters of the Gaussian fits of the FRET efficiency histograms from simulations (Sim) and experiments (Expt.). The peak positions of the Gaussian fit for the folded ⟨EF and unfolded ⟨EU states are given with the respective standard deviations (σF, σU). (a) 10FNIII with AF546 and AF647 at residues 11 and 86. [(b)–(d)] CspTm with AF488 and AF594 at (b) residues 68 and 2, (c) residues 11 and 68, and (d) residues 23 and 68.

Folded Unfolded
System EF σF EU σU
(a) 10FNIII  Expt.  0.84  0.07  0.45  0.12 
  Sim  0.90  0.03  0.41  0.06 
(b) CspTm C68/C2  Expt.  0.95  0.04  0.36  0.08 
  Sim  0.93  0.03  0.34  0.06 
(c) CspTm C11/C68  Expt.  0.95  0.06  0.37  0.08 
  Sim  0.81  0.04  0.40  0.06 
(d) CspTm C23/C68  Expt.  0.94  0.06  0.45  0.07 
  Sim  0.90  0.03  0.46  0.06 
Folded Unfolded
System EF σF EU σU
(a) 10FNIII  Expt.  0.84  0.07  0.45  0.12 
  Sim  0.90  0.03  0.41  0.06 
(b) CspTm C68/C2  Expt.  0.95  0.04  0.36  0.08 
  Sim  0.93  0.03  0.34  0.06 
(c) CspTm C11/C68  Expt.  0.95  0.06  0.37  0.08 
  Sim  0.81  0.04  0.40  0.06 
(d) CspTm C23/C68  Expt.  0.94  0.06  0.45  0.07 
  Sim  0.90  0.03  0.46  0.06 

In experiments, dye photophysics, e.g., donor and acceptor quenching or blinking, additional deviations through the necessity to correct for background, cross talk, and detection efficiencies, lack of site-specific labeling or other chemical heterogeneity, and other experimental artifacts can lead to additional broadening of the FRET efficiency distribution.25,64 Incomplete labeling or photobleaching can lead to a donor-only peak near zero FRET efficiency which is also not present in the simulations.

For CspTm, the width of the Gaussian from the simulations is only slightly lower than the ones from the experiment, which indicates that the width of the experimental histograms is already close to the shot-noise limit.

To compare the simulation results with different simple models for data analysis, we plot the inter-dye distance RDA for the folded state and RDA in relation to the Cα-distance (the distance between the Cα-atoms of the respective residues the dyes are attached to) in the unfolded state in Fig. 5. This is shown for three different protein-dye systems.

FIG. 5.

Distributions of inter-dye distances RDA and distances of the Cα-atoms of the respective residues for CspTm (with AF488 and AF594 at residues 2 and 68, left), 10FNIII (with AF546 and AF647 at residues 11 and 86, middle), and CI-2 (with AF546 and AF647 at residues 20 and 78, right). For the folded state [(a)–(c)], distance distributions (red), mean distances (crosses), and standard deviation (error bars) from simulations (black) and from accessible volume calculations (blue) are depicted. For the unfolded state [(d)–(f)], the histograms of the inter-dye distances and the corresponding Cα-distances are shown. The expected dependency for equality of both values (dotted lines) and for the expectation corrected by an effective segment length of the chain (white crosses) is shown. The histogram count is scaled according to the maximum of each histogram separately.

FIG. 5.

Distributions of inter-dye distances RDA and distances of the Cα-atoms of the respective residues for CspTm (with AF488 and AF594 at residues 2 and 68, left), 10FNIII (with AF546 and AF647 at residues 11 and 86, middle), and CI-2 (with AF546 and AF647 at residues 20 and 78, right). For the folded state [(a)–(c)], distance distributions (red), mean distances (crosses), and standard deviation (error bars) from simulations (black) and from accessible volume calculations (blue) are depicted. For the unfolded state [(d)–(f)], the histograms of the inter-dye distances and the corresponding Cα-distances are shown. The expected dependency for equality of both values (dotted lines) and for the expectation corrected by an effective segment length of the chain (white crosses) is shown. The histogram count is scaled according to the maximum of each histogram separately.

Close modal

For the folded state, we compare the mean and standard deviation from the distance distribution of the simulation (black crosses and error bars) to the respective values of an accessible volume calculation11,12 (blue). The accessible volume approach calculates all sterically possible dye positions within the linkage length and considers them as equally probable. Then, it calculates the mean distance of the dyes RDA¯.

Both methods agree very well, with the simulations having a lower standard deviation. This originates from the dynamics in the simulations, which entropically disfavors dye states close to the protein surface, whereas the accessible volume method assumes all states to be equally probable.

For the unfolded state, the accessible volume approach faces the challenge of properly treating an unfolded ensemble of diverse conformations with dye distributions. By contrast, our model can directly simulate the whole ensemble with dyes.

Proteins in the unfolded state can be described as polymer chains. At high temperatures in the unfolded state, the SBM results in an excluded volume polymer chain.65 To test this, we consider the equation

r21/2=C×Nν
(8)

with the spatial distance r between two chain elements, the sequence separation between two elements N, the length scaling exponent ν, and a constant C. We fit the mean Cα-distance (RCα)ij21/2 between residues i and j, averaged over the trajectory, as a function of the sequence separation N = |ij| with Eq. (8). We obtain values for the fit parameters ν which correspond well with the expected value for an excluded volume chain of ν = 3/565 (see SI 7 of the supplementary material). Now we determine an effective segment length Neff = N + L with a “length” L of the dye pair which was already used to interpret experimental FRET measurements.66,67 We take the dyes’ mean square separation RDA21/2 and assign an Neff from the respective fit. An example fit and details of the calculation are given in the supplementary material (see Fig. S12). It turns out that for the AF488/AF594 dye pair, the length is about L = 11.3 ± 1.3 residues, which is in the same range as found experimentally in Ref. 67. For the AF546/AF647 dye pair, it is 14.15 ± 0.65 residues. With this effective length, a correction factor m = ((N + L)/N)ν for the relation between RDA21/2 and the distance of the respective Cα-atoms RCα21/2 can be calculated. It is given by m = 1.10 for CspTm (with AF488/AF594), m = 1.14 for CI-2 (with AF546/AF647), and m = 1.11 for 10FNIII (with AF546/AF647) (for details see SI 7 of the supplementary material).

For the unfolded state, the histogram for the inter-dye distance and the corresponding Cα-distance is shown in Fig. 5. It can be seen clearly that the values are not identical (which would be indicated by the dotted lines). By correcting this for the effective chain length, we expect a different relation between RDA and the Cα-distance, as indicated by the white crosses, providing a good approximation of the dependence observed in the simulations.

In summary, we achieve our aim of capturing both folded and unfolded states in the same simulation method, paving the way to simulate complex large-scale conformational transitions between multiple states in the future.

In this work, we introduce a new simulation method for dye-labeled biomolecules with only a few robust parameters. The approach suggested here is most useful in the absence of pronounced non-native interactions or structure formation not included in the structure-based model. We are able to achieve a realistic description of the system as evidenced by direct comparison and quantitative agreement of FRET efficiency histograms with experimental data. This establishes a way to directly complement experimental FRET efficiency distributions with atomically resolved structural ensembles from simulations, which provides novel and detailed insights into biomolecular processes. Even complex scenarios involving large structural ensembles, such as unfolded proteins, intrinsically disordered proteins, conformational transitions, and folding intermediates, can be sufficiently sampled with modest computational resources.

The model can improve planning experimental measurements as one can vary and test different parameters such as the Förster radius, linker length, and labeling sites in silico. On the basis of these simulations, one can then choose the dye combinations which would, e.g., best distinguish the conformational states of interest. Further, we found that the approximation of κ2¯=2/3 is mostly appropriate for the tested typical systems. Similarly, we could probe estimations of dye distance distributions based on simple models.

Having established the simulation techniques, many new scenarios are now accessible. These include simulating strongly restricted or slow dye motions and very short or inflexible linkers to test their effect on κ2. The low computational costs make investigating conformational transitions between multiple stable conformations and other large scale motions accessible. In addition, it can be used to simulate more complex FRET measurements, such as three-color FRET,68–70 where the interpretation of the data is even more challenging.

See supplementary material for additional information about the used methods and the three-dimensional structures of the Alexa Fluor dyes.

I.R., C.S., and A.S. received funding from the Impuls-und Vernetzungsfond of the Helmholtz association. We would like to thank Pascal Friederich for fruitful discussions. G.U.N. acknowledges funding by the Karlsruhe Institute of Technology in the context of the Helmholtz Program Science and Technology of Nanosystems (STN). B.S. was supported by the Swiss National Science Foundation.

1.
B.
Schuler
and
H.
Hofmann
,
Curr. Opin. Struct. Biol.
23
,
36
(
2013
).
2.
R.
Rieger
,
A.
Kobitski
,
H.
Sielaff
, and
G. U.
Nienhaus
,
ChemPhysChem
12
,
627
(
2011
).
3.
R.
Rieger
and
G. U.
Nienhaus
,
Chem. Phys.
396
,
3
(
2012
).
4.
S.
Muller-Spath
,
A.
Soranno
,
V.
Hirschfeld
,
H.
Hofmann
,
S.
Ruegger
,
L.
Reymond
,
D.
Nettels
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
107
,
14609
(
2010
).
5.
D.
Nettels
,
S.
Muller-Spath
,
F.
Kuster
,
H.
Hofmann
,
D.
Haenni
,
S.
Ruegger
,
L.
Reymond
,
A.
Hoffmann
,
J.
Kubelka
,
B.
Heinz
,
K.
Gast
,
R. B.
Best
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
106
,
20740
(
2009
).
6.
E.
Sherman
and
G.
Haran
,
Proc. Natl. Acad. Sci. U. S. A.
103
,
11539
(
2006
).
7.
Y.
Gambin
,
A.
Schug
,
E. A.
Lemke
,
J. J.
Lavinder
,
A. C. M.
Ferreon
,
T. J.
Magliery
,
J. N.
Onuchic
, and
A. A.
Deniz
,
Proc. Natl. Acad. Sci. U. S. A.
106
,
10153
(
2009
).
8.
Z.
Wang
and
D. E.
Makarov
,
J. Phys. Chem. B
107
,
5617
(
2003
).
9.
A.
Muschielok
,
J.
Andrecka
,
A.
Jawhari
,
F.
Brückner
,
P.
Cramer
, and
J.
Michaelis
,
Nat. Methods
5
,
965
(
2008
).
10.
A.
Muschielok
and
J.
Michaelis
,
J. Phys. Chem. B
115
,
11927
(
2011
).
11.
S.
Sindbert
,
S.
Kalinin
,
H.
Nguyen
,
A.
Kienzler
,
L.
Clima
,
W.
Bannwarth
,
B.
Appel
,
S.
Müller
, and
C. A. M.
Seidel
,
J. Am. Chem. Soc.
133
,
2463
(
2011
).
12.
S.
Kalinin
,
T.
Peulen
,
S.
Sindbert
,
P. J.
Rothwell
,
S.
Berger
,
T.
Restle
,
R. S.
Goody
,
H.
Gohlke
, and
C. A. M.
Seidel
,
Nat. Methods
9
,
1218
(
2012
).
13.
B.
Schuler
,
A.
Soranno
,
H.
Hofmann
, and
D.
Nettels
,
Annu. Rev. Biophys.
45
,
207
(
2016
).
14.
R. B.
Best
,
H.
Hofmann
,
D.
Nettels
, and
B.
Schuler
,
Biophys. J.
108
,
2721
(
2015
).
15.
G. F.
Schröder
,
U.
Alexiev
, and
H.
Grubmüller
,
Biophys. J.
89
,
3757
(
2005
).
16.
R. B.
Best
,
K. A.
Merchant
,
I. V.
Gopich
,
B.
Schuler
,
A.
Bax
, and
W. A.
Eaton
,
Proc. Natl. Acad. Sci. U. S. A.
104
,
18964
(
2007
).
17.
M. J.
Shoura
,
R. U.
Ranatunga
,
S. A.
Harris
,
S. O.
Nielsen
, and
S. D.
Levene
,
Biophys. J.
107
,
700
(
2014
).
18.
M. M.
Reif
and
C.
Oostenbrink
,
J. Comput. Chem.
35
,
2319
(
2014
).
19.
E. V.
Kuzmenkina
,
C. D.
Heyes
, and
G. U.
Nienhaus
,
Proc. Natl. Acad. Sci. U. S. A.
102
,
15471
(
2005
).
20.
D. B.
VanBeek
,
M. C.
Zwier
,
J. M.
Shorb
, and
B. P.
Krueger
,
Biophys. J.
92
,
4168
(
2007
).
21.
B.
Corry
and
D.
Jayatilaka
,
Biophys. J.
95
,
2711
(
2008
).
22.
P. C.
Whitford
,
J. K.
Noel
,
S.
Gosavi
,
A.
Schug
,
K. Y.
Sanbonmatsu
, and
J. N.
Onuchic
,
Proteins: Struct., Funct., Bioinf.
75
,
430
(
2009
).
23.
25.
E.
Sisamakis
,
A.
Valeri
,
S.
Kalinin
,
P. J.
Rothwell
, and
C. A.
Seidel
, in
Single Molecule Tools, Part B: Super-Resolution, Particle Tracking, Multiparameter, and Force Based Methods
, Methods in Enzymology, edited by
N. G.
Walter
(
Academic Press
,
2010
), Vol. 475, pp.
455
514
.
26.
G.
Lipari
and
A.
Szabo
,
Biophys. J.
30
,
489
(
1980
).
27.
F.
Hillger
,
D.
Hänni
,
D.
Nettels
,
S.
Geister
,
M.
Grandin
,
M.
Textor
, and
B.
Schuler
,
Angew. Chem., Int. Ed.
47
,
6184
(
2008
).
28.
D.
Nettels
,
A.
Hoffmann
, and
B.
Schuler
,
J. Phys. Chem. B
112
,
6137
(
2008
).
29.
J. D.
Bryngelson
,
J. N.
Onuchic
,
N. D.
Socci
, and
P. G.
Wolynes
,
Proteins: Struct., Funct., Genet.
21
,
167
(
1995
).
30.
J. D.
Bryngelson
and
P. G.
Wolynes
,
Proc. Natl. Acad. Sci. U. S. A.
84
,
7524
(
1987
).
31.
P. E.
Leopold
,
M.
Montal
, and
J. N.
Onuchic
,
Proc. Natl. Acad. Sci. U. S. A.
89
,
8721
(
1992
).
32.
H.
Frauenfelder
,
S.
Sligar
, and
P.
Wolynes
,
Science
254
,
1598
(
1991
).
33.
J. N.
Onuchic
and
P. G.
Wolynes
,
Curr. Opin. Struct. Biol.
14
,
70
(
2004
).
34.
C.
Sinner
,
B.
Lutz
,
S.
John
,
I.
Reinartz
,
A.
Verma
, and
A.
Schug
,
Isr. J. Chem.
54
,
1165
(
2014
).
35.
L. L.
Chavez
,
J. N.
Onuchic
, and
C.
Clementi
,
J. Am. Chem. Soc.
126
,
8426
(
2004
).
36.
C.
Sinner
,
B.
Lutz
,
A.
Verma
, and
A.
Schug
,
J. Chem. Phys.
143
,
243154
(
2015
).
37.
M. B.
Borgia
,
A.
Borgia
,
R. B.
Best
,
A.
Steward
,
D.
Nettels
,
B.
Wunderlich
,
B.
Schuler
, and
J.
Clarke
,
Nature
474
,
662
(
2011
).
38.
P.
Tian
and
R. B.
Best
,
PLoS Comput. Biol.
12
,
e1004933
(
2016
).
39.
A.
Schug
,
M.
Weigt
,
J. N.
Onuchic
,
T.
Hwa
, and
H.
Szurmant
,
Proc. Natl. Acad. Sci. U. S. A.
106
,
22124
(
2009
).
40.
P. C.
Whitford
,
O.
Miyashita
,
Y.
Levy
, and
J. N.
Onuchic
,
J. Mol. Biol.
366
,
1661
(
2007
).
41.
A.
Schug
,
P. C.
Whitford
,
Y.
Levy
, and
J. N.
Onuchic
,
Proc. Natl. Acad. Sci. U. S. A.
104
,
17674
(
2007
).
42.
P. C.
Whitford
and
K. Y.
Sanbonmatsu
,
J. Chem. Phys.
139
,
121919
(
2013
).
43.
P. C.
Whitford
,
A.
Schug
,
J.
Saunders
,
S. P.
Hennelly
,
J. N.
Onuchic
, and
K. Y.
Sanbonmatsu
,
Biophys. J.
96
,
L7
(
2009
).
44.
C.
Clementi
,
H.
Nymeyer
, and
J. N.
Onuchic
,
J. Mol. Biol.
298
,
937
(
2000
).
45.
J. K.
Noel
,
A.
Schug
,
A.
Verma
,
W.
Wenzel
,
A. E.
Garcia
, and
J. N.
Onuchic
,
J. Phys. Chem. B
116
,
6880
(
2012
).
46.
B.
Lutz
,
C.
Sinner
,
G.
Heuermann
,
A.
Verma
, and
A.
Schug
,
Bioinformatics
29
,
2795
(
2013
).
47.
J. K.
Noel
,
P. C.
Whitford
, and
J. N.
Onuchic
,
J. Phys. Chem. B
116
,
8692
(
2012
).
48.
J. K.
Noel
,
P. C.
Whitford
,
K. Y.
Sanbonmatsu
, and
J. N.
Onuchic
,
Nucleic Acids Res.
38
,
W657
(
2010
).
49.
H.
Lammert
,
A.
Schug
, and
J. N.
Onuchic
,
Proteins: Struct., Funct., Bioinf.
77
,
881
(
2009
).
50.
M.
Abraham
,
D.
van der Spoel
,
E.
Lindahl
,
B.
Hess
, and
GROMACS Development Team
, GROMACS User Manual version 5.0, URL: http://www.gromacs.org (
2014
), pp. 78–79, 84.
51.
H. M.
Berman
,
J.
Westbrook
,
Z.
Feng
,
G.
Gilliland
,
T. N.
Bhat
,
H.
Weissig
,
I. N.
Shindyalov
, and
P. E.
Bourne
,
Nucleic Acids Res.
28
,
235
(
2000
).
52.
C. A.
McPhalen
,
I.
Svendsen
,
I.
Jonassen
, and
M. N.
James
,
Proc. Natl. Acad. Sci. U. S. A.
82
,
7242
(
1985
).
53.
A. L.
Main
,
T. S.
Harvey
,
M.
Baron
,
J.
Boyd
, and
I. D.
Campbell
,
Cell
71
,
671
(
1992
).
54.
W.
Kremer
,
B.
Schuler
,
S.
Harrieder
,
M.
Geyer
,
W.
Gronwald
,
C.
Welker
,
R.
Jaenicke
, and
H. R.
Kalbitzer
,
Eur. J. Biochem.
268
,
2527
(
2001
).
55.
See URL: http://www.thermofisher.com for information about the Alexa Fluor dyes.
56.
D.
van der Spoel
,
E.
Lindahl
,
B.
Hess
,
A. R.
van Buuren
,
E.
Apol
,
P. J.
Meulenhoff
,
D. P.
Tieleman
,
A. L. T. M.
Sijbers
,
K. A.
Feenstra
,
R.
van Drunen
, and
H. J. C.
Berendsen
, Gromacs User Manual version 4.5.6, URL: http://www.gromacs.org (
2010
).
57.
J.
Wang
,
P.
Cieplak
, and
P. A.
Kollman
,
J. Comput. Chem.
21
,
1049
(
2000
).
58.
B.
Lutz
,
M.
Faber
,
A.
Verma
,
S.
Klumpp
, and
A.
Schug
,
Nucleic Acids Res.
42
,
2687
(
2014
).
59.
J.
Jackson
,
K.
Nguyen
, and
P.
Whitford
,
Int. J. Mol. Sci.
16
,
6868
(
2015
).
60.
M.
Hoefling
,
N.
Lima
,
D.
Haenni
,
C. A. M.
Seidel
,
B.
Schuler
, and
H.
Grubmüller
,
PLoS One
6
,
e19791
(
2011
).
61.
M.
Hoefling
and
H.
Grubmüller
,
Comput. Phys. Commun.
184
,
841
(
2013
).
62.
A.
Hoffmann
,
A.
Kane
,
D.
Nettels
,
D. E.
Hertzog
,
P.
Baumgartel
,
J.
Lengefeld
,
G.
Reichardt
,
D. A.
Horsley
,
R.
Seckler
,
O.
Bakajin
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
104
,
105
(
2007
).
63.
A.
Soranno
,
B.
Buchli
,
D.
Nettels
,
R. R.
Cheng
,
S.
Muller-Spath
,
S. H.
Pfeil
,
A.
Hoffmann
,
E. A.
Lipman
,
D. E.
Makarov
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
109
,
17800
(
2012
).
64.
B.
Schuler
, “
Application of single molecule Förster resonance energy transfer to protein folding
,” in
Protein Folding Protocols
, edited by
Y.
Bai
and
R.
Nussinov
(
Humana Press
,
Totowa, NJ
,
2006
), pp.
115
138
.
65.
P. J.
Flory
,
J. Chem. Phys.
17
,
303
(
1949
).
66.
H.
Hofmann
,
A.
Soranno
,
A.
Borgia
,
K.
Gast
,
D.
Nettels
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
109
,
16155
(
2012
).
67.
M.
Aznauryan
,
L.
Delgado
,
A.
Soranno
,
D.
Nettels
,
J.-r.
Huang
,
A. M.
Labhardt
,
S.
Grzesiek
, and
B.
Schuler
,
Proc. Natl. Acad. Sci. U. S. A.
113
,
E5389
(
2016
).
68.
S.
Hohng
,
C.
Joo
, and
T.
Ha
,
Biophys. J.
87
,
1328
(
2004
).
69.
S.
Lee
,
J.
Lee
, and
S.
Hohng
,
PLoS One
5
,
e12270
(
2010
).
70.
S.
Milles
,
C.
Koehler
,
Y.
Gambin
,
A. A.
Deniz
, and
E. A.
Lemke
,
Mol. BioSyst.
8
,
2531
(
2012
).

Supplementary Material