Molecular Dynamics (MD) simulations of proteins implicitly contain the information connecting the atomistic molecular structure and proteins’ biologically relevant motion, where large-scale fluctuations are deemed to guide folding and function. In the complex multiscale processes described by MD trajectories, it is difficult to identify, separate, and study those large-scale fluctuations. This problem can be formulated as the need to identify a small number of collective variables that guide the slow kinetic processes. The most promising method among the ones used to study the slow leading processes in proteins’ dynamics is the time-structure based on time-lagged independent component analysis (tICA), which identifies the dominant components in a noisy signal. Recently, we developed an anisotropic Langevin approach for the dynamics of proteins, called the anisotropic Langevin Equation for Protein Dynamics or LE4PD-XYZ. This approach partitions the protein’s MD dynamics into mostly uncorrelated, wavelength-dependent, diffusive modes. It associates with each mode a free-energy map, where one measures the spatial extension and the time evolution of the mode-dependent, slow dynamical fluctuations. Here, we compare the tICA modes’ predictions with the collective LE4PD-XYZ modes. We observe that the two methods consistently identify the nature and extension of the slowest fluctuation processes. The tICA separates the leading processes in a smaller number of slow modes than the LE4PD does. The LE4PD provides time-dependent information at short times and a formal connection to the physics of the kinetic processes that are missing in the pure statistical analysis of tICA.

Large-scale fluctuations and global structural rearrangements play an essential role in the biological functions of biopolymers. DNA transcription and replication involve the self-assembly of large multiprotein complexes that spontaneously form through step-by-step processes where binding of proteins is facilitated by the molecular flexibility.1 At the single molecule level, folding of the proteins to their most probable conformation involves large-scale molecular fluctuations and slow global structural rearrangements that are guided by cooperative dynamics.2–5 Proteins’ slow fluctuations may have important biological implication in the mechanism of protein binding and function.6–13 Following the hypothesis of the Monod–Wyman–Changeux model, a protein’s spontaneous fluctuations can lead the conformation selection mechanism where the binding partner selects the most favorable conformation among the ones made available by fluctuations.14–17 Thus, fluctuations in the unbound protein can signal the regions that are involved in protein binding and function.

Molecular dynamics (MD) simulations of proteins in solvent are a powerful method to identify fluctuations and investigate the role that the chemical structure, or primary sequence, of a protein play in multiscale dynamics. However, the information contained in the simulation trajectory is difficult to analyze because dynamical processes are often coupled on multiple length scales. Therefore, it is crucial to devise statistical procedures that conveniently separate the multidimensional trajectory of a simulation into a set of independent dynamical processes that, when added together, form the observed data. These different contributions should be as independent as possible for one to be able to analyze and classify their dynamical response separately. Traditionally, this issue has been addressed by adopting statistical tools from signal processing to extract from a noisy response the most critical information, which is usually a slowly fluctuating signal or a collection of slowly fluctuating signals.

The most widely used analysis method for simulation trajectories is the principal component analysis or PCA method based on the definition of a covariance matrix of the selected variables.18 Correlation is a linear association measure, and uncorrelated processes are defined as having the cross terms in the covariance matrix equal to zero. However, the goal is to isolate from the trajectory independent fluctuations. Independent processes and uncorrelated processes are different from the mathematical point of view. Independent processes are defined as having a joint probability distribution that can be separated into a product of individual distributions.19 Independence includes not only linear (uncorrelated) but also nonlinear relationships.20 In a nutshell, linear models, such as PCA, in general, cannot decompose the dynamics into independent motions because those motions can be nonlinear. Independent, nonlinear, mode dynamics can be identified by an Independent Component Analysis (ICA).19 

In 2011, Naritomi and Fuchigami introduced to the field of protein dynamics the time-structure based independent component analysis or tICA, which is a specific type of ICA method.21 In tICA, the dynamics is defined by a time-lagged covariance matrix and its limit at zero lag time, which is the conventional covariance matrix. By solving the generalized eigenvalue problem, the protein’s dynamics are separated by tICA into modes that are uncorrelated at both zero lag time and at τtICA.22 These constraints substitute for the stringent independence criteria normally required; i.e., the mode independence at τtICA substitutes for the independence of nonlinear zero-lag correlations.19,23 The time-structure based method was later revisited by Pande and co-workers and with the name of “time-lagged independent component analysis” by Noe and co-workers.24,25 We will use the terms “time-structure based” and “time-lagged” interchangeably. When tICA is paired with a Markov State Model (MSM) of the kinetics of transition between modes, it accurately detects dominant slow modes of motion.24,25 The tICA modes identify the slowest dynamical decorrelation and thus are considered to be the optimal linear coordinates to represent the slow dynamics:26 for example, the tICA modes have been used as variationally optimal collective coordinates for enhanced sampling in metadynamics.27,28

While tICA remains a rigorous statistical analysis of the multidimensional simulation trajectory, it still does not provide a physical interpretation of the slow dynamics or the connection between slow motions and protein’s atomistic structure and interactions. Naritomi and Fuchigami partially addressed this issue by examining the slow dynamics of protein’s domains and backbone using tICA. Until now, an equation of motion that related the structure and interaction potential to the dynamics was missing in their study.21,29 The degrees of freedom or “features” input to the tICA are chosen based on their ability to predict the slowest dynamics but are not necessarily connected to an equation of motion for describing the time evolution of the input coordinates. One determines the efficacy of the chosen features a posteriori using cross-validation methods.30,31 In this study, we relate the tICA formalism to our Langevin Equation for Protein Dynamics (LE4PD).32–36 In the limit in which the tICA lag time is zero, the two are formally equivalent.32 Furthermore, the LE4PD is a formal extension of the equation of motion for a macromolecule, obtained from the first-principles Liouville equation using Mori–Zwanzig projection operators.37–40 Thus, albeit involved, there is a formal connection between the tICA fluctuations and the fundamental equation of motion for the dynamics of a protein.

The LE4PD formalism is based on the Rouse–Zimm equation for the dynamics of a polymer in solution,41–43 which we extended to include physical characteristics that are specific to folded proteins. Typically (i) inside the hydrophobic core of a protein, where atoms are not exposed to the solvent, the hydrodynamic interaction is screened, but atoms still experience friction; thus, the LE4PD adopts a residue-dependent friction coefficient calculated from the extent that each amino acid is solvent-exposed; (ii) protein dynamics are non-linear and molecular rearrangements of the protein during fluctuations involve the crossing of energy barriers that play a major role in protein dynamics and folding. The LE4PD approximately accounts a posteriori for the nonlinearities in the dynamics through the construction of free-energy landscapes for each mode and the rescaling of the timescale of barrier-crossing.32,34,44

Through normal mode diagonalization, the LE4PD separates the dynamics sampled in a long MD simulation, or in a set of short MD simulations, into a set of diffusive normal modes that are largely independent. These modes directly depend on real-space information, as the amino acid-specific local frictions, the water’s viscosity, the potential of mean force, and the internal energy barriers are included. From the mode-dependent free-energy landscape, one can identify the energy minima and the pathways that cross energy barriers, thus isolating the mode-dependent local fluctuations. Along the pathway, one can sample the protein conformation, thus depicting the conformational transitions during barrier-crossing. A simple Kramers’ rescaling, or applying a MSM analysis to the mode-dependent free-energy surface (FES), provides the transition times of the mode-dependent fluctuations. Relaxation dynamics of time correlation functions predicted by LE4PD have been shown to be accurate when compared with experimental data of T1, T2, and nuclear Overhauser effect (NOE) nuclear magnetic resonance (NMR) relaxation,34,35 as well as to short-time Debye–Waller factors from x-ray scattering experiments.33 Recently, Beyerle and Guenza extended the LE4PD approach to describe anisotropic fluctuations in the LE4PD-XYZ model.32 

In this study, we focus on tICA, given that at zero lag time, the covariance matrix is formally consistent with the inverse of the LE4PD-XYZ force matrix. We compare the position, timescale, and pathway of slow fluctuations measured by tICA with the ones described by the LE4PD-XYZ. The question we aim to address is if a Langevin-mode decomposition can be as effective as tICA in isolating the leading dynamical processes from a protein MD trajectory.

We analyze an extensive, 1-µs long MD simulation of the protein ubiquitin in a solution of sodium chloride at physiological conditions. Ubiquitin is a regulatory protein in eukaryotic cells, known for its role as a post-translational modifier of other proteins through mono- and poly-ubiquitination processes. It can bind to substrates either covalently45,46 or non-covalently.47 By identifying the slowest fluctuations in the isolated protein, we can locate important regions for ubiquitin’s binding where a partner’s selection can be guided by the thermodynamics and the kinetics of the different fluctuation processes.

This study addresses several relevant questions related to protein fluctuations and the tICA method: Are the tICA’s slow fluctuations similar to the ones that LE4PD-XYZ identifies? How are the results of this study dependent on the choice of the tICA lag time? Are there different but compatible procedures to correctly identify the best tICA lag time? Can a Langevin equation that adopts the tICA covariance matrix to build the intramolecular force matrix describe the correct dynamics of the system as measured by time correlation functions? How important is the role of hydrodynamics in tICA? How significant are the internal energy barriers that are only approximately accounted for by tICA?

By comparing LE4PD-XYZ to tICA, this study formally connects the tICA method to a Langevin equation of motion. Along similar lines of thinking, Takano and co-workers recently proposed the Relaxation Mode Analysis (RMA), which is similar to tICA.48–51 Both RMA and tICA maximize the time-dependent correlation matrix of the fluctuations at a given lag time, τtICA, and at an initial time, t0, while dynamics faster than t0 are averaged out.51 The difference between the two approaches is that RMA calculates the covariance matrix at a time t0 ≠ 0, while tICA is a particular case of RMA, where t0 = 0.51 The RMA also has some similarities with our LE4PD approach as both accurately model with a Langevin equation of motion the slow dynamics of the protein, even if the details of the two dynamical equations are different.

This paper is structured as follows: Sec. II briefly summarizes the anisotropic LE4PD-XYZ approach and the calculation of the LE4PD mode-dependent free-energy surfaces. Section III presents the tICA method while also proposing the calculation of a single-mode-dependent free-energy map, which MSM analyzes. Section IV illustrates the comparison between LE4PD-XYZ’s and tICA’s slowest fluctuations and also includes a biological interpretation of the observed fluctuations. Section V discusses the compatibility of the FES for the slowest tICA mode and the slowest LE4PD-XYZ modes with and without hydrodynamic interactions. The calculation of time correlation functions with the two approaches is in Sec. VI. Because the results of the tICA method depend on the tICA lag time selected, we analyze the dependence of the tICA single-mode FES on this parameter in Sec. VII. Finally, we assess the advantages and disadvantages of the proposed single-mode tICA method in comparison with the more traditional two-mode tICA analysis in Sec. VIII. A brief discussion with conclusions summarizes the findings of this study in Sec. IX.

In recent years, we have developed a coarse-grained model to describe protein fluctuations in the amino acid positions, called the Langevin Equation for Protein Dynamics (LE4PD).33–36,44,52 The original LE4PD is isotropic and is presented in Sec. S6 of the supplementary material. Beyerle and Guenza recently extended it to the related anisotropic formalism, called the LE4PD-XYZ method.32 The anisotropic LE4PD directly connects the PCA fluctuations to an equation of motion that contains the covariance matrix in the amino acids positions.32 We briefly review the LE4PD-XYZ model here, while we refer for more details on both LE4PD models to the aforementioned original manuscripts.

The first step in developing the anisotropic LE4PD is to define as the leading variables the deviations of the position of the protein’s alpha carbons from their average values, ΔRi(t)=Ri(t)Ri(t).32 Here, a(t)=1Mt=1Ma(t) denotes the usual static average calculated over a trajectory of length M frames.

Each component of the position vector fluctuation follows the anisotropic LE4PD equation of motion,

(1)

where α,β,γx,y,z. Furthermore, kB is the Boltzmann constant, T is the temperature in kelvin, and vi(t) is a stochastic velocity. The average residue friction coefficient is ζ̄=1Niζi, where ζi is the friction coefficient of amino acid i. The matrix Hijαβ describes the hydrodynamic interaction between the α component of residue i and the β component of residue j,

(2)

where 1rij is the average inverse distance between residues i and j and r̄w=1Nirw,i is the average residue radius exposed to the solvent.

In this equation, the dynamics is defined in a body-fixed system of coordinates, where both translation and rotation dynamics have been eliminated. The trajectory of the protein, analyzed to build the H′ and A′ matrices, for example, is also in a body-fixed reference system, where translation and rotation are absent.53–56 The equation is solved by applying the fluctuation–dissipation condition, as described in our previous publication.32 

The matrix Ajkβγ describes the inverse of the covariance between the β component of residue j and the γ component of residue k as

(3)

where U1=Δl(t)Δl(t)T is the matrix of correlations of the bond fluctuations in Cartesian coordinates with Δl(t)=aIΔR(t), Δliα(t)=jaIijδαβΔRjβ(t). I is the 3 × 3 identity matrix, and a is the N − 1 × N matrix of the amino acid connectivity (with i = 1, …, N − 1 and j = 1, …, N),

(4)

Here, δαβ is the Kronecker delta, and the “⊗” symbol denotes the Kronecker product.57 

From the simulation trajectory, we calculate (i) the average fluctuations in the α-carbon positions, which enter through the U matrix the inverse of the covariance matrix [Eq. (3)]; (ii) the average inverse distance between the residues, which enter the hydrodynamic interaction matrix [Eq. (2)]; (iii) the friction coefficient of each residue, ζi, and the average residue radius exposed to the solvent, r̄w, which also enter Eq. (2). The simulation trajectory is also used to test the quality of the theoretical predictions of time correlation functions in Sec. VI.

More details on the anisotropic LE4PD model, and how it is formally related to the isotropic LE4PD, are given in Ref. 32. Equation (1) is solved using the eigenvalue decomposition of the HA′ matrix product, Q−1HAQ′ = Λ′, which gives the equation of motion for the evolution of the LE4PD-XYZ modes, Δξa(t),

(5)

with σa=kBTλa/ζ̄ being the characteristic diffusive rate of mode a58 and Δva(t) being the stochastic velocity in mode coordinates.

Using the decomposition of Q′ for the anisotropic HA′ matrix, the mode coordinate ξa(t) of the anisotropic LE4PD can be separated into its x–, y–, and z– components as

(6)

where x̂,ŷ,andẑ are the unit vectors in the x-, y-, and z- directions, and the spherical mode coordinates and free-energy surfaces can be defined as

(7)

where the probability for the protein of adopting, in mode a, a conformation with angles θa,ϕa is

The linear combination of all the anisotropic modes leads to structural and time-dependent properties, directly comparable to simulations or experimental data. From the anisotropic free-energy surfaces, we calculate fluctuations in the three spatial directions. As an example, Fig. 1 shows the LE4PD-XYZ analysis of a 1-µs long MD simulation of the protein ubiquitin (for details on the simulation, see Sec. S8 in the supplementary material). The same trajectory will be analyzed with the tICA to provide a comparison between the two methods. Figure 1 shows in panel (a) the FES in the mode coordinates for the seventh LE4PD-XYZ mode without hydrodynamics (mode seventh is the slowest one in this formalism). The FES displays two minima separated by a small energy barrier. The protein’s conformations along the transition pathway between these two minima are displayed in panel (b). Panels (c) and (d) report data from a MSM analysis of the mode trajectory (for details on the MSM, see Sec. S9 in the supplementary material).

FIG. 1.

Analysis of the seventh LE4PD-XYZ mode without hydrodynamics. Panel (a) shows the free-energy landscape of the seventh LE4PD-XYZ mode in the two spherical coordinate reference system. The pathway of crossing the energy barrier between the two minima is identified with a rubber band using a variant of the string method.44 Panel (b) shows ubiquitin’s conformations that correspond to the pathway shown in panel (a) with the red conformation referring to the energy minimum at the top of the map and the blue conformation corresponding to the energy minimum at the bottom of the map. The blue arrow points to the region of ubiquitin experiencing the largest amplitude fluctuation for this mode corresponding to the 50 s loop. Panel (c) displays the second eigenvector resulting from the diagonalization of the transition matrix defined in the Markov State Model (MSM) procedure for this mode, which identifies the two minima in the FES. The projection of ψ2 onto the discrete states of the MSM has colors that correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. Panel (d) shows how the transition time for the second MSM eigenvector changes when we select a different lag time in the calculation of the MSM transition matrix. The black, vertical line demarcates the lag time corresponding to the second eigenvector mapping the two minima, as reported in panel (c).

FIG. 1.

Analysis of the seventh LE4PD-XYZ mode without hydrodynamics. Panel (a) shows the free-energy landscape of the seventh LE4PD-XYZ mode in the two spherical coordinate reference system. The pathway of crossing the energy barrier between the two minima is identified with a rubber band using a variant of the string method.44 Panel (b) shows ubiquitin’s conformations that correspond to the pathway shown in panel (a) with the red conformation referring to the energy minimum at the top of the map and the blue conformation corresponding to the energy minimum at the bottom of the map. The blue arrow points to the region of ubiquitin experiencing the largest amplitude fluctuation for this mode corresponding to the 50 s loop. Panel (c) displays the second eigenvector resulting from the diagonalization of the transition matrix defined in the Markov State Model (MSM) procedure for this mode, which identifies the two minima in the FES. The projection of ψ2 onto the discrete states of the MSM has colors that correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. Panel (d) shows how the transition time for the second MSM eigenvector changes when we select a different lag time in the calculation of the MSM transition matrix. The black, vertical line demarcates the lag time corresponding to the second eigenvector mapping the two minima, as reported in panel (c).

Close modal

More specifically, panel (c) shows the projection of the second MSM eigenvector, ψ2, onto the FES. The second eigenvector of the MSM transition matrix identifies the two deepest minima and the top of the energy barrier between them in an FES.59 When the second MSM eigenvector matches the population in the two minima and at the top of the barrier, the MSM lag time identified by this process provides the transition time across the barrier.44 Panel (d) shows the calculation of the transition time as a function of the MSM lag time. When a process fulfills the Chapman–Kolmogorov (CK) equation, the process follows Markovian statistics, and the transition time becomes independent of the lag time, i.e., the transition time becomes constant.59 Panel (d) shows that the transition to Markovian dynamics (when the blue line becomes flat) is reasonably close to the transition time calculated from the second MSM eigenvector (the vertical dashed line). Thus, the two procedures to evaluate the transition time give similar values for the seventh mode. Note that Fig. 1 displays results for the LE4PD-XYZ theory without hydrodynamic interactions (see for a discussion Sec. V). An identical calculation performed for the slowest LE4PD-XYZ mode with hydrodynamic interactions (mode six) gives similar free-energy maps and MSM analyses. This result suggests that the hydrodynamics interaction does not affect the dynamics of the slowest fluctuations. Note that the calculations performed take into account hydrodynamics and also include residue-dependent friction coefficients.

The procedure briefly presented here was proposed in our recent publications.32,44 The same well-tested procedure will be applied to the analysis of the tICA modes in Sec. III.

The time-lagged independent component analysis is a method extensively used in the field of signal processing, information theory, and artificial neural networks to identify hidden factors that are shared and underlie the observed multivariate data.23 This technique has been applied in several fields, including the analysis of protein dynamics to identify the prevalent large-scale motion inside a simulation trajectory. By introducing a time lag in the covariance matrix, one effectively includes the temporal dimension in the analysis of the leading fluctuations, making it possible to model kinetic processes. The time-lagged ICA is an extension of the principal component analysis (PCA) method, where one takes care of isolating the most slowly decorrelating dynamics while including the time dependence of the data as an explicit variable in the analysis. The tICA method has been reviewed in several recent publications and will be only summarized here.21,24,29,60–62

While tICA is a general approach that applies to any set of coordinates, here, we are interested in performing a tICA of the alpha-carbon trajectory of a protein with N residues. We define as tICA coordinates the ΔR(t)T=R1(t)R1(t),R2(t)R2(t),,Rn(t)Rn(t), where ΔRi(t)=Ri(t)Ri(t) represents the fluctuations out of the equilibrium structure of the position of the space coordinates Ri(t), with Ri(t)=xi(t),yi(t),zi(t) and i = 1, …, N, with N being the number of amino acids in the protein. The time-lagged covariance matrix is defined, for a lag time τtICA, as

(8)

and for τtICA = 0, the covariance matrix recovers the static, structural matrix that is used in PCA, as Cr(0)=ΔR(t0)TΔR(t0). Here, a(t+τ)b(t)τ=1Mτt=1Mτa(t+τ)b(t) denotes an average over the time-lagged trajectory containing M frames.

The tICA modes, or tICs, are found by solving the following generalized eigenvalue equation:19,24

(9)

where Ω is the matrix of right eigenvectors of Cr(τtICA) and ΛIC(τtICA) is the diagonal matrix of the related eigenvalues.

From the solution of the generalized eigenvalue problem, one has that the eigenvector matrix, Ω, diagonalizes both Cr(τtICA) and Cr(0),

(10)

where I is an identity matrix of the same dimensions as Cr(τtICA) and Cr(0). The tICA modes, z(t), are determined by transforming the input coordinates ΔR(t) as z(t) = ΩTΔR(t).

Given the definition of the LE4PD-XYZ A matrix of Eq. (3), it is straightforward to show that

(11)

where the tICA matrix is defined in Eq. (8). A more detailed calculation of this relation is reported in Ref. 32.

It follows that in the limit of zero lag time, the tICA eigenvectors are identical to the eigenvectors of the LE4PD-XYZ matrix when hydrodynamics is discarded, i.e., the hydrodynamic matrix H = I. The tICA eigenvalues, instead, are equivalent to the inverse of the LE4PD-XYZ eigenvalues. For the hydrodynamic matrix to become equal to the identity matrix, one needs to assume that there are not solvent-mediated hydrodynamic interactions between amino acids and that the friction coefficient of each amino acid is set equal to the average friction coefficient. Note that the formal equivalence between the tICA dynamics at lag time zero and the LE4PD-XYZ approach represented by Eq. (5), with H = I, implies that the fluctuations are harmonic and the internal energy barriers are also discarded. Thus, both the LE4PD-XYZ and the tICA dynamics should include in the equation of motion the correction due to the energy barriers calculated from the mode-dependent free-energy landscapes and the hydrodynamic interaction.

In this section, we show how a free-energy landscape can be associated with each tICA mode. This step is important because the mode-specific free-energy barriers allow one to study the details of conformation fluctuations, thus determining the pathway of transition and the height of the energy barriers in the fluctuations associated with tICA modes. We follow here a procedure similar to the one we established for the LE4PD-XYZ modes.

The elements of the tICA eigenvectors Ω can be decomposed into their x–, y–, and z– projections,

(12)

where x̂,ŷ, and ẑ are the unit vectors in the x-, y-, and z-directions and ⊗ denotes the Kronecker product.57 This decomposition is useful as it allows for the creation of tIC-dependent free-energy surfaces, which can be compared directly with the LE4PD free-energy surfaces (see Sec. V).

To define a Free-Energy Surface (FES) for each of the tICA mode coordinates, we start by projecting the space coordinates of the fluctuations onto tICA modes using the tICA eigenvectors. For the tICA modes, the eigenvector matrix ΩT, which transforms the ΔR(t) into the z(t) tIC coordinate system, can be decomposed into its contributions from the x–, y–, and z– components of ΔR(t),

(13)

which allows for the decomposition of each tIC za(t) into its contributions from the x–, y–, and z– components of the input coordinates ΔR(t),

(14)
(15)
(16)

This decomposition can be used to describe each tIC in a new spherical coordinate system,

(17)
(18)
(19)

With the definitions of θa(t), ϕa(t), and Ra(t), one can create two-dimensional free-energy surfaces in (θa, ϕa) by averaging over the radial coordinate Ra(t),

(20)

The main advantages of constructing the free-energy surfaces in this manner are that (1) each surface is tIC-specific because the dynamics among tICs are largely decoupled (an evaluation of the extent of coupling in tICA modes is reported in Sec. S1 of the supplementary material), and (2) energetic pathways and fluctuations along this surfaces are easy to visualize for each tIC. As with previous LE4PD analyses, a variant of the string method is utilized to find minimum free-energy pathways between energy wells on the surface.32,44,63 A MSM analysis can provide the time scale associated with each tICA mode. Note that the tICA can be applied once we have selected a lag time, τtICA. In this study, the tICA lag time is 2 ns; we present the procedure to establish this value in Sec. VII.

As an example of the information inherent in F(θa, ϕa) for the tICs, Figs. 2 and 3 show the results of the analysis in the θa,ϕa coordinate space for the two slowest tICA modes extracted from the 1-µs simulation of ubiquitin.

FIG. 2.

Analysis of the free-energy map of the first tICA mode. (a) Free-energy surface along the θa,ϕa coordinates for the slowest tIC. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. Arrows point to the three regions of ubiquitin showing the largest amplitude fluctuations: the C-terminal tail (black arrow), the 50 s loop (blue arrow), and the Lys11 loop (red arrow). (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) Implied timescales of the MSM as a function of MSM lag time. The black vertical line demarcates the lag time selected when constructing the MSM, τMSM = 4 ns.

FIG. 2.

Analysis of the free-energy map of the first tICA mode. (a) Free-energy surface along the θa,ϕa coordinates for the slowest tIC. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. Arrows point to the three regions of ubiquitin showing the largest amplitude fluctuations: the C-terminal tail (black arrow), the 50 s loop (blue arrow), and the Lys11 loop (red arrow). (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) Implied timescales of the MSM as a function of MSM lag time. The black vertical line demarcates the lag time selected when constructing the MSM, τMSM = 4 ns.

Close modal
FIG. 3.

Analysis of the free-energy map for the second tICa mode. (a) Free-energy surface along the θa,ϕa coordinates for the slowest tIC. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. Movement along the pathway in (a) correspond to fluctuations mostly in the Lys11 loop (red arrow) and C-terminal tail (black arrow) of ubiquitin, as well as smaller amplitude motions in the 50 s loop (blue arrow). (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) Implied timescales of the MSM as a function of MSM lag time. The black vertical line demarcates the lag time selected when constructing the MSM, τMSM = 1.6 ns.

FIG. 3.

Analysis of the free-energy map for the second tICa mode. (a) Free-energy surface along the θa,ϕa coordinates for the slowest tIC. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. Movement along the pathway in (a) correspond to fluctuations mostly in the Lys11 loop (red arrow) and C-terminal tail (black arrow) of ubiquitin, as well as smaller amplitude motions in the 50 s loop (blue arrow). (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) Implied timescales of the MSM as a function of MSM lag time. The black vertical line demarcates the lag time selected when constructing the MSM, τMSM = 1.6 ns.

Close modal

Figure 2(a) shows the free-energy map, F(θ1, ϕ1), for the first tICA mode, z1(t), with a pathway drawn between the two prominent minima on the surface. Figure 2(b) displays the fluctuations along the alpha-carbon backbone of ubiquitin when moving along the pathway given in Fig. 2(a); the colors of the structures in Fig. 2(b) correspond to the colors of the images along the pathway in Fig. 2(a). Movement along the minimum energy pathway for z1(t) shows fluctuations in the 50 s loop (blue arrow), the C-terminal tail (black arrow), and the Lys11 loop (red arrow), each of which is a known binding region of ubiquitin to other proteins.45,46,64

Figure 2(c) shows the projection of the most slowly decaying eigenfunction, ψ2, from the MSM transition matrix constructed on this surface starting from the MD trajectory. Note that by assuming different MSM lag times, one obtains different eigenvectors ψ2 and different projections onto the surface. By selecting τMSM = 4.0 ns, we see that the most positive projection of ψ2 lies in the minimum in the bottom half of the surface, and the maximum projection of ψ2 lies in the minimum in the top half of the surface. Thus, with τMSM = 4.0 ns selected, the spectrum indicates that the slowest process described by the MSM corresponds to transitions between the two minima on the surface, whose fluctuations should be described well by the extracted structures from the pathway given in Fig. 2(b).

To test the validity of τMSM = 4.0 ns found with this procedure, Fig. 2(d) shows the implied timescale of t2, i.e., the timescale of the process described by ψ2, as a function of MSM lag time τMSM. The vertical dashed line marks the lag time used in the construction of the MSM shown in Fig. 2(c). We see that at the selected MSM lag time, the dynamics are Markovian, and the t2 time is constant, confirming the validity of the selected τMSM. Thus, for ψ2 shown in Fig. 2(c), the MSM transition matrix, T, is constructed by sampling the trajectory with an interval of τMSM = 4.0 ns, and the predicted timescale is t2τMSM=4.0ns=52.6 ns. In summary, combining the tIC free-energy surface in θa,ϕa with the Markov state modeling analysis predicts that the timescale of movement between the two minima in Fig. 2(a) is ∼53 ns. The corresponding dynamics along the alpha-carbon backbone during this event are illustrated in Fig. 2(b).

Figure 3 illustrates the analogous analysis for the θa,ϕa surface spanned by the second-slowest tIC. Drawing a transition pathway between the two minima on the surface [Fig. 3(a)] and extracting the structures along that pathway from the MD simulation show that this tIC describes fluctuations in the Lys11 loop and C-terminal tail regions of ubiquitin [Fig. 3(b)].45,46 Again, using the decomposition of ψ2 from the MSM on this surface to choose the lag time of the MSM [Fig. 3(c)], the process of transitioning between the minima on the surface is predicted to occur over a timescale of 6.7 ns [Fig. 3(d)]. Thus, the θa,ϕa surface for the second-slowest tIC predicts mainly motion in the tail and Lys11 loop, occurring over a timescale of 6.7 ns.

We repeat the evaluation of the t2 times for all the tICs, and we present these results for the first ten tICs in comparison with the first ten LE4PD-XYZ modes in Tables I and II. Further information on the timescales associated with the first and second tICA modes and on the amplitude and position of their fluctuations are presented in Secs. IV and IV A, respectively.

TABLE I.

Comparing the slowest timescales from the isotropic LE4PD, the LE4PD-XYZ (ansiotropic LE4PD), and tICA for the 1-µs simulation of ubiquitin at the MSM lag time where the spectrum of ψ2 on the free-energy surface is optimized (τMSM).44 For the LE4PD-XYZ modes, the table reports data for the approach with (w/HI) and without hydrodynamic (w/o HI) interaction included. The isotropic LE4PD modes are indexed by the internal mode number (see explanation in the text).

LE4PD-XYZ
LE4PDw/HIW/o HItICA
Modet2τMSM (ns)t2τMSM (ns)t2τMSM (ns)t2τMSM (ns)
3.9(1.05) 8.0(3.2) 6.5(2.8) 52.6 (4.0) 
0.7(0.1) 3.7(1.1) 4.6(1.5) 6.7 (1.6) 
0.9(0.35) 4.3(2.5) 4.3(2.0) 4.8(1.6) 
2.4(0.5) 6.4(4.0) 1.0(0.3) ⋯ 
0.1(0.01) 4.8(4.0) 5.5(4.9) 2.5(1.0) 
11.0(0.9) 3.3(1.0) 3.1(2.0) ⋯ 
0.5(0.25) 3.6(2.0) 7.4(3.0) 2.5(1.6) 
0.4(0.11) 0.6(0.2) ⋯ 6.1(5.0) 
0.24(0.1) 0.3(0.1) 1.3(0.5) 0.8(0.3) 
10 0.35(0.3) 0.4(0.1) 0.4(0.2) 7.0(5.0) 
LE4PD-XYZ
LE4PDw/HIW/o HItICA
Modet2τMSM (ns)t2τMSM (ns)t2τMSM (ns)t2τMSM (ns)
3.9(1.05) 8.0(3.2) 6.5(2.8) 52.6 (4.0) 
0.7(0.1) 3.7(1.1) 4.6(1.5) 6.7 (1.6) 
0.9(0.35) 4.3(2.5) 4.3(2.0) 4.8(1.6) 
2.4(0.5) 6.4(4.0) 1.0(0.3) ⋯ 
0.1(0.01) 4.8(4.0) 5.5(4.9) 2.5(1.0) 
11.0(0.9) 3.3(1.0) 3.1(2.0) ⋯ 
0.5(0.25) 3.6(2.0) 7.4(3.0) 2.5(1.6) 
0.4(0.11) 0.6(0.2) ⋯ 6.1(5.0) 
0.24(0.1) 0.3(0.1) 1.3(0.5) 0.8(0.3) 
10 0.35(0.3) 0.4(0.1) 0.4(0.2) 7.0(5.0) 
TABLE II.

Comparing the slowest timescales from the isotropic LE4PD, the LE4PD-XYZ (ansiotropic LE4PD), and tICA for the 1-µs simulation of ubiquitin in the long-lag time regime (τMSM) where the dynamics best satisfy the Chapman–Kolmogorov condition.66 For the LE4PD-XYZ modes, the table reports data for the approach with (w/HI) and without hydrodynamic (w/o HI) interaction included. The isotropic LE4PD modes are indexed by the internal mode number (see explanation in the text).

LE4PD-XYZ
LE4PDw/HIw/o HItICA
Modet2τMSM (ns)t2τMSM (ns)t2τMSM (ns)t2τMSM (ns)
5.3(1.8) 14.6(12.0) 16.2(12.0) 54.0(5.0) 
3.3(1.6) 14.4(10.0) 16.6(12.0) 12.6(5.0) 
1.9(1.2) 9.6(8.0) 9.2(8.0) 10.5(5.0) 
4.7(1.6) 7.2(6.0) 9.5(8.0) 9.1(5.0) 
3.6(1.6) 4.8(4.0) 7.7(6.0) 9.3(5.0) 
33.7(25.0) 4.6(4.0) 21.5(12.0) 6.6(5.0) 
1.2(1.0) 19.9(12.0) 12.6(10.0) 5.7(5.0) 
3.0(1.6) 2.4(2.0) 4.6(4.0) 6.1(5.0) 
0.5(0.4) 4.0(3.5) 1.8(1.5) 6.4(5.0) 
10 0.35(0.3) 1.3(1.0) 3.7(3.0) 7.0(5.0) 
LE4PD-XYZ
LE4PDw/HIw/o HItICA
Modet2τMSM (ns)t2τMSM (ns)t2τMSM (ns)t2τMSM (ns)
5.3(1.8) 14.6(12.0) 16.2(12.0) 54.0(5.0) 
3.3(1.6) 14.4(10.0) 16.6(12.0) 12.6(5.0) 
1.9(1.2) 9.6(8.0) 9.2(8.0) 10.5(5.0) 
4.7(1.6) 7.2(6.0) 9.5(8.0) 9.1(5.0) 
3.6(1.6) 4.8(4.0) 7.7(6.0) 9.3(5.0) 
33.7(25.0) 4.6(4.0) 21.5(12.0) 6.6(5.0) 
1.2(1.0) 19.9(12.0) 12.6(10.0) 5.7(5.0) 
3.0(1.6) 2.4(2.0) 4.6(4.0) 6.1(5.0) 
0.5(0.4) 4.0(3.5) 1.8(1.5) 6.4(5.0) 
10 0.35(0.3) 1.3(1.0) 3.7(3.0) 7.0(5.0) 

In this section, we perform a quantitative comparison of the transition times predicted for the slow tICA modes and for the LE4PD-XYZ modes, starting from the same MD trajectory of ubiquitin in solution (for the MD simulation method, see Sec. S8 in the supplementary material).

For both the LE4PD-XYZ and the tICA free-energy surfaces, we evaluate the MSM times following the procedure presented in Secs. II A and III C. To calculate the transition times, t2, we construct the MSM for each mode and estimate the timescales, t2, using either the mapping of the second MSM eigenvector onto the FES or the Markovian criterion (i.e., the Chapman–Kolmogorov [CK] condition) for the mode trajectories (for details on the MSM, see Sec. S9 in the supplementary material). Table I presents the values of t2 calculated using the second MSM eigenvector ψ2, while Table II reports the values of the transition times calculated using the Chapman–Kolmogorov criterion for a Markovian process.59,65 In Tables I and II, we also report the values of transition times calculated for the internal modes of the isotropic LE4PD equation with the hydrodynamic interaction included (the LE4PD theory is briefly summarized in Sec. S6 in the supplementary material). The data for the LE4PD-XYZ approach are reported both with and without hydrodynamic interaction included. From these data, we can assess the importance of hydrodynamics in the time scale of the protein’s fluctuations. Note that the calculations are performed while including hydrodynamics and also account for residue-dependent friction coefficients.

Note that the isotropic LE4PD is an equation of motion in the lab reference system, while the LE4PD-XYZ starts from a body-centered trajectory where translation and rotation have been eliminated.53–56 Thus, we report the internal motion of LE4PD after the first three rotational modes are discarded. In Tables I and II, all the dashed entries denote free-energy surfaces where the extreme projections of ψ2 are never located in minima on the surface and are thus not suited for Markov state modeling in the manner desired here.

Crossing the energy barriers may slow down differently different modes. Some internal modes may have larger energy barriers than the first mode. This is, in fact, the case for ubiquitin, as one can see from reading Tables I and II. When t2 is calculated using the second MSM eigenvector (Table I), the slowest mode for the isotropic LE4PD method, with hydrodynamic interaction included, is mode 6, while the slowest mode for the anisotropic LE4PD-XYZ model without hydrodynamics is mode 7. If the Markovianity criteria are enforced (Table II), the slowest fluctuations are in mode 7 for the LE4PD-XYZ with hydrodynamics and in mode 6 for the LE4PD-XYZ without hydrodynamics.

From the timescales listed in Table I, all the LE4PD methods give roughly the same timescales for the slowest motions of the system. The first tICA mode, however, displays dynamics that are five times slower than LE4PD. The first tIC corresponds to the contemporary motion in the three flexible binding regions of ubiquitin, as shown in Fig. 2, and predicts that this motion occurs almost ten times slower than the roughly analogous motion predicted by the isotropic LE4PD mode 6 and LE4PD-XYZ mode 7 with hydrodynamics, respectively. However, when the MSM lag time is selected using the CK condition, which does not always coincide with the lag time selected by optimizing the projection of ψ2 from the MSM,44 the gap between the predicted timescales of the slow LE4PD and tICA modes is reduced, as shown in Table II.

Overall, the timescales presented are similar in magnitude, with the tICA modes being generally slower than the LE4PD-XYZ. Note that the plots to calculate t2 are in all cases on a logarithmic scale and that small changes in the selected τMSM can give large differences in the value of t2 [see Figs. 1(d) and 2(d)]. Except for the first tICA mode, all the other tICA and LE4PD modes do not show an evident Markovian nature of the dynamics, and one should take the exact values of t2 with some reservations.

To compare the dynamics predicted by the slow tICs and LE4PD modes, we calculate the mode-dependent fluctuation profiles as a function of amino acid sequence along the backbone of the protein for the first ten modes of the LE4PD-XYZ theory without hydrodynamic interaction and of the tICA approach. These correspond to the last two columns on the right of Tables I and II. Local fluctuations are well represented by the local mode lengthscale (LML).36,44 In the anisotropic formalism of LE4PD-XYZ, where ΔRiΔRi=Δxi2+Δyi2+Δzi2, the eigenvectors are partitioned into their x–, y–, and z– components, thus isolating the x–, y–, and z–projections of LMLia2 as

(21)
(22)
(23)

where μa,LE4PD-XYZ are the eigenvalues of A′.32 

For the tICA, fluctuations are derived from the definition of the modes, za(t)=iΩaiTΔRi(t), and the Moore–Penrose generalized inverse67 of ΩT, Ω−1T as

The mean-square fluctuations of residue i, given by ⟨Δxi(txi(t)⟩ + ⟨Δyi(tyi(t)⟩ + ⟨Δzi(tzi(t)⟩, can be written in terms of the tICs as

(24)

where the definition ⟨za(t)zb(t)⟩ = δab is used to obtain Eq. (24). The tICA LMLs are reported in Fig. 5.

Thus, Figs. 4 and 5 show the mode-dependent fluctuations calculated from the one-microsecond ubiquitin simulation using the anisotropic LE4PD without hydrodynamics and the tICA, respectively, for the first ten modes of each method. For tICA, the slowest tIC describes concerted fluctuations in the tail, Lys11, and 50 s loops of the protein. For the LE4PD-XYZ approach, most of the low-index modes describe fluctuations in the C-terminal tail of the protein. One needs to look at the fourth mode to find slow fluctuations in the Lys11 loop and to the sixth and seventh modes to find slow fluctuations in the 50 s loop. Neither of the LE4PD approaches gives a single mode describing simultaneous motion in the three important regions of the protein. However, we observe that there is good correspondence between the slowest tIC and the anisotropic LE4PD-XYZ mode 7, which is the slowest mode for the anisotropic LE4PD when hydrodynamic effects are neglected.

FIG. 4.

Mode-dependent fluctuations or local mode lengthscale (LML) for the ten slowest modes captured from the anisotropic LE4PD-XYZ analysis, without hydrodynamics, of the 1-µs simulation of ubiquitin. Each panel shows the fluctuations’ amplitude as a function of the protein’s primary sequence. For example, the first LE4PD-XYZ mode shows fluctuations mostly in the C-terminal tail. One finds the slowest fluctuations corresponding to the 50 s loop in mode 6.

FIG. 4.

Mode-dependent fluctuations or local mode lengthscale (LML) for the ten slowest modes captured from the anisotropic LE4PD-XYZ analysis, without hydrodynamics, of the 1-µs simulation of ubiquitin. Each panel shows the fluctuations’ amplitude as a function of the protein’s primary sequence. For example, the first LE4PD-XYZ mode shows fluctuations mostly in the C-terminal tail. One finds the slowest fluctuations corresponding to the 50 s loop in mode 6.

Close modal
FIG. 5.

Mode-dependent fluctuations or local mode lengthscale (LML) for the ten slowest modes captured from the tICA of the 1-µs simulation of ubiquitin, with a tICA lag time of 2 ns. Each panel shows the fluctuations’ amplitude as a function of the protein’s primary sequence. For example, the first tICA mode shows fluctuations in the Lys11 loop, the 50 s loop, and the C-terminal tail.

FIG. 5.

Mode-dependent fluctuations or local mode lengthscale (LML) for the ten slowest modes captured from the tICA of the 1-µs simulation of ubiquitin, with a tICA lag time of 2 ns. Each panel shows the fluctuations’ amplitude as a function of the protein’s primary sequence. For example, the first tICA mode shows fluctuations in the Lys11 loop, the 50 s loop, and the C-terminal tail.

Close modal

Because the three slowest processes all appear in the first tICA mode, while they are partitioned in six different LE4PD-XYZ modes, we observe that the tICA procedure can group the slowest, important, dynamics in a smaller number of modes than the LE4PD, which, instead, partitions the protein’s slow dynamics into several leading modes with different time and length scales. When the goal is identifying the slowest fluctuations in one mode, tICA appears to be more efficient than the LE4PD in isolating the slow fluctuations if the lag time, τtICA, is selected appropriately. However, suppose the ultimate goal is the accurate analysis of the protein’s slow dynamics. In that case, the LE4PD approach has a more desirable outcome because it maintains the information on the fast dynamics and provides better resolution at short times. As shown in Sec. VI, the LE4PD can predict the dynamics as measured by time correlation functions with higher accuracy than the tICA modes.

The quantitative comparisons of the mode-dependent fluctuations calculated using the isotropic LE4PD with hydrodynamics, the anisotropic LE4PD-XYZ without hydrodynamics, and the tICA are presented in the supplementary material (Sec. S4).

Following the conformational selection model, a protein in the absence of its binding partner samples all the energetically available states, and among those states are some that can bind to the substrates of interest.14–17 Thus, the residue fluctuations, observed in the tICA and LE4PD analyses, may provide useful information on the timescales and length scales of relevant binding modes. By direct inspection of the LMLs calculated with the LE4PD-XYZ and with tICA, we observe that there are three important regions of slow fluctuation dynamics in ubiquitin. Fluctuations in the C-terminal tail and in the flexible loop containing Lys11, which are visible in most of the slow modes, are implicated in the covalent association to other proteins and in covalent binding to lysine in polyubiquitination.45–47 Given that ubiquitin binds covalently to numerous proteins of different sizes and flexibilities, it is perhaps not surprising that these fluctuations cover a wide range of length scales and timescales. The third region of important fluctuations involves the 50 s loop, which is known to participate in the hydrophobic binding to the A20 zinc-finger motif of Ras guanine exchange factor Rabex-5, where the residue Y25 of Rabex-5 forms a hydrogen bond with residue E51 of ubiquitin, which has the largest fluctuations in the 50 s loop.64,68

Dynamics in the 50 s loop region of ubiquitin is correlated with the breaking of a hydrogen bond between G53 and E24, which helps maintain the protein’s folded structure.69 Thus, breaking this hydrogen bond serves as a “gatekeeper” to selecting different conformations. For example, in Ref. 69, the authors found that only 29% of 155 x-ray structures examined showed ubiquitin with the same hydrogen bonding pattern between G53 and E24 found in the folded structure from Ref. 70, which is also the starting structure in this study. Furthermore, in Ref. 71, the authors demonstrated that the interchange between hydrogen bonding patterns in the 50 s loop modulate large-scale conformational changes (contraction and expansion) along the entire primary sequence of ubiquitin. This affects the protein’s ability to bind to a set of ubiquitinases known as ubiquitin-specific proteases and marks the 50 s loop as a potential site of allosteric inhibition. Thus, experimental evidence indicates that local conformational changes in the 50 s loop are required for global conformational transitions in ubiquitin.

In this section, we quantify the agreement between the energy maps for the slowest modes identified by tICA and LE4PD-XYZ. Figure 6 displays in each row the comparison between the LE4PD slowest modes and the tICA slowest mode for the two LE4PD models we study, namely, the isotropic LE4PD and the anisotropic LE4PD-XYZ theory.

FIG. 6.

(a) From left to right: free-energy surface of the isotropic LE4PD internal mode 6 with hydrodynamics from the one-microsecond ubiquitin simulation; projection of ψ2 from the MSM of the trajectory on the θ,ϕ surface; and projection of the first tIC z1(t) onto the θ,ϕ surface. (b) Same as (a), but the displayed free-energy surface is for the LE4PD-XYZ mode 7 without hydrodynamics. (c) Same as (a) and (b), except for the LE4PD-XYZ mode 5 without hydrodynamics, with the projection in the right-most panel being the third tIC z3(t) onto the surface.

FIG. 6.

(a) From left to right: free-energy surface of the isotropic LE4PD internal mode 6 with hydrodynamics from the one-microsecond ubiquitin simulation; projection of ψ2 from the MSM of the trajectory on the θ,ϕ surface; and projection of the first tIC z1(t) onto the θ,ϕ surface. (b) Same as (a), but the displayed free-energy surface is for the LE4PD-XYZ mode 7 without hydrodynamics. (c) Same as (a) and (b), except for the LE4PD-XYZ mode 5 without hydrodynamics, with the projection in the right-most panel being the third tIC z3(t) onto the surface.

Close modal

In the first column, Fig. 6 shows the FES of the LE4PD projected trajectory, which displays energy minima for the most populated conformations of the protein. For this FES, the second column of Fig. 6 presents the second eigenvector obtained from the Markov State Model (MSM) analysis of the FES. The superposition of the second MSM eigenvector to the LE4PD energy map indicates which transition represents the slowest fluctuation for the given LE4PD mode.65,72,73 The third column in Fig. 6 shows the comparison between a tICA mode and the LE4PD mode. The superposition is accomplished by projecting the first tIC onto the LE4PD free-energy map and testing if the most extreme tICA conformations are the ones that correspond to the minima in the LE4PD FES. To perform this comparison, we assign each conformation in the tICA mode trajectory to the closest MSM microstate in the LE4PD-mode FES surface using the root mean-square distance from each MSM microstate as the assignment metric. Then, the tICA mode trajectory populates the FES, giving a projection of the tICA mode that is completely analogous, in both meaning and interpretation, to the projection of an eigenvector ψi from the MSM onto the LE4PD FES (see the second column of Fig. 6). The approach of projecting a tICA mode onto a free-energy surface has been previously applied by Sultan and Pande27 to verify the interpretation for the slowest tIC from a simulation of alanine dipeptide.

When projecting the tICs, z(t), onto the θa,ϕa surfaces, the average of z(t) within each MSM LE4PD microstate i, Si, is calculated as

with Mi being the total number of frames the z(t) trajectory resides in the Si LE4PD microstate over the course of the simulation. This local average of z(t) within each of the discrete states is what is reported in Fig. 6.

The slowest tIC is the optimal linear approximation to the full-space Markov propagator of the system.24 The ψ2 from the MSM on the slowest LE4PD modes are also estimators of the slowest processes of the system; a high similarity between the projected spectra of the slow tICs and ψ2 indicates high similarity between the predicted dynamics from the two models. That is, if the slow dynamics predicted in each approach are consistent with each other, then the spectra of both the slow tICs and ψ2 should predict probability flow between the deep minima on the θa,ϕa surfaces of the slowest LE4PD modes. The ψ2 are already parameterized to do so,44 but the slow tICs are, in principle, ignorant of the LE4PD θa,ϕa surface. We use this technique to confirm that the slow LE4PD modes can extract the slow dynamics compatible with tICA modes.8 

The three rows in Fig. 6 represents, from top to bottom, the following calculations. In the first row, the isotropic LE4PD (mode 6) with hydrodynamics agrees with the first tICA mode. In the second row, the anisotropic LE4PD-XYZ (mode 7) without hydrodynamics compares well with the first tICA mode. The third row shows a comparison between the third tICA mode (mode 3) and the fifth mode of the anisotropic LE4PD-XYZ without hydrodynamics. It is clear from these results that the slow dynamics detected by tICA and anisotropic LE4PD-XYZ are similar, even if the slow dynamics can be distributed differently in the LE4PD, LE4PD-XYZ, and tICA modes (see Figs. 4 and 5).

Note that the technique used here of projecting the tICs onto the θa,ϕa surfaces of the LE4PD modes is analogous to the technique used in Refs. 74–77 to model experimental observables using Markov state models. Like an experimental observable, the separation of two minima of the θa,ϕa surfaces into “high z” and “low z” states indicates that transitions on the θa,ϕa surface correspond to transitions between a high z state and a low z state, similar to how fluorescence experiments on a protein search for transitions between a high fluorescence state, indicating that the protein is sampling conformations where the fluorophores are far apart, and a low fluorescence state, where the protein is sampling conformations where the fluorophors are close together.78,79

In conclusion, Fig. 6 demonstrates that both LE4PD approaches are able to capture the same slow motion as the tICA. The correlation between the time series of z1 and ψ2 from the MSM of the slowest isotropic LE4PD mode is high (ρ = 0.92), indicating that both z1 and ψ2 are predictive of the slow dynamics in ubiquitin. The correlation coefficient between the time series of z1 and ψ2 from the MSM of the slowest anisotropic LE4PD-XYZ mode is ρ = 0.73, which is still acceptable. The correlation coefficient between the time series of z3 and the ψ2 for the fifth LE4PD-XYZ mode is ρ = 0.54.

The analysis of the amplitude, location, and timescale of the slow fluctuations for ubiquitin with the three methods (tICA, LE4PD, and LE4PD-XYZ) show that they correctly identify the regions in the protein where slow fluctuations occur. However, the slow fluctuations are partitioned in different modes for the two methods. To gain further details on the capabilities of the two approaches, we introduce, as the ultimate test of the tICA’s and LE4PD’s ability to predict with accuracy slow time dynamics, the comparison of their time correlation functions (tcfs) to the tcfs directly calculated from the simulation trajectory.

The normalized autocorrelation function for the fluctuations of each residue is defined as C(t)=ΔR(t)ΔR(0)ΔR(0)ΔR(0). For the LE4PD approaches, the autocorrelation function is calculated by including for each mode the slowing down of the dynamics due to the presence of an energy barrier in the FES.32,44 Recently, we have shown that neglecting the hydrodynamic interaction modifies the LE4PD-XYZ curves, leading to a (moderately) worse agreement with the simulation data.32Figure 7 shows the fluctuation decay of the tcfs for residues sampled along the primary sequence of ubiquitin. This figure compares the LE4PD-XYZ results with hydrodynamics included to the tcfs from the simulations: the agreement is remarkable. It also shows the tcfs for the LE4PD-XYZ without hydrodynamic interactions, which are less in agreement, at least for the residues presented in the figure.

FIG. 7.

Comparison of the time correlation functions (tcfs) for a sampling of residues along the primary sequence of ubiquitin. The black curves in each subplot show the tcf calculated from the simulation trajectory; the blue curves show the tcfs predicted from the LE4PD-XYZ theory with HI, the red curves show the tcfs predicted from the LE4PD-XYZ without HI, and the yellow curves show the tcfs predicted from the tICA with a lag time of 2000 ps.

FIG. 7.

Comparison of the time correlation functions (tcfs) for a sampling of residues along the primary sequence of ubiquitin. The black curves in each subplot show the tcf calculated from the simulation trajectory; the blue curves show the tcfs predicted from the LE4PD-XYZ theory with HI, the red curves show the tcfs predicted from the LE4PD-XYZ without HI, and the yellow curves show the tcfs predicted from the tICA with a lag time of 2000 ps.

Close modal

At a given lag time, C(t) can be written in terms of the tICA eigenspectra by inverting the relationship za(t)=iΩaiTΔRi(t) as ΔRi(t)=aΩia1Tza(t) and using the (near) independence of the tICs za(t)zb(0)za(0)zb(0)expt/τa=δabexpt/τa as

(25)

The decay timescales for each tICA mode, τa, are calculated empirically by the integration of the autocorrelation function za(t)zb(0)/za(0)zb(0)=et/τa obtained from the simulations21 and assuming that each mode is represented by a single exponential decay. This procedure should account for the barriers present along each tICA coordinate in, at least, a coarse manner.32,80 This time, τa, is, in general, different from the inverse of the eigenvalues λIC [Eq. (10)] because that time does not include the mode-dependent energy barrier. If one adopted the inverse of the eigenvalues λIC as the timescale of decay, the tcfs calculated from tICA would display an even faster and more unphysical decay than the one observed when including the mode-dependent energy barrier for tICA (see Fig. 7). Once a lag time is selected, we build the matrix Cr(τtICA) [Eq. (10)], and by diagonalization, we derive the eigenvectors and eigenvalues that enter Eq. (25).

The time correlation functions calculated from the tICs [Eq. (25)] are directly compared to the one from the simulation trajectory in Fig. 7. For each residue shown and for most residues across the primary sequence of ubiquitin, the tcfs predicted from the LE4PD-XYZ with hydrodynamics are in better agreement with the simulated tcfs than those predicted from the tICA or the LE4PD-XYZ without hydrodynamics.

The accuracy of the two approaches is quantified in Fig. 8, which shows the mean absolute error [⟨MAE(t)⟩] between the simulated and predicted C(t) for all the residues in ubiquitin. This metric of quantifying the distance between the “true” [simulated, C(t)] and “estimated” [predicted, Ĉ(t)] is defined as

(26)

which is the average distance between the two autocorrelation functions (acf). M0 is the number of frames before an acf cutoff time, which is the time at which the acf first decays to a specified value. For example, using a cutoff of C(t) = 0.0 we calculate Eq. (26) over all points of the acf until the acf attains a value of 0.0 for the first time.

FIG. 8.

Mean absolute error [⟨MAE(t)⟩] between the simulated autocorrelation function (acf), C(t), and the predicted autocorrelation functions from the tICA (purple) and from the LE4PD-XYZ with hydrodynamics (blue). The error is reported for all amino acids as a function of the primary sequence of the protein ubiquitin and at increasing acf cutoff. The error is found by calculating the right-hand side of Eq. (26) for all values of t until C(t) reaches the given cutoff value for the first time t′. On average, across all residues and for all cutoff values, the anisotropic LE4PD with hydrodynamics out-performs the tICA predictions.

FIG. 8.

Mean absolute error [⟨MAE(t)⟩] between the simulated autocorrelation function (acf), C(t), and the predicted autocorrelation functions from the tICA (purple) and from the LE4PD-XYZ with hydrodynamics (blue). The error is reported for all amino acids as a function of the primary sequence of the protein ubiquitin and at increasing acf cutoff. The error is found by calculating the right-hand side of Eq. (26) for all values of t until C(t) reaches the given cutoff value for the first time t′. On average, across all residues and for all cutoff values, the anisotropic LE4PD with hydrodynamics out-performs the tICA predictions.

Close modal

Figure 8 shows that for most amino acids, the error metric is lower for the anisotropic LE4PD with hydrodynamics compared to the tICA estimator of C(t). This figure shows data for four different cutoff times, indicating that the result is robust and is not affected by the choice of the cutoff time. Thus, using the anisotropic LE4PD with hydrodynamics gives a better prediction, on average, of the C(t) autocorrelation function compared to tICA (lag time 2.0 ns) across all the residues in ubiquitin.

One may assume that the disagreement of tICA with the simulated tcfs observed in Fig. 7 is related to the choice of the lag time and that choosing either a longer or shorter tICA lag time may give a better agreement in the tcf of specific bonds. This is, in fact, the case. Figure 9 shows how using a shorter lag time (2 ps) yields tcfs in good agreement with residues’ tcfs in the highly flexible Lys11 loop, especially at timescales less than 10 ns. Similarly, using a longer lag time (20 ns) gives tICs that agree well with the simulated tcfs of several residues in the 50 s loop, where the slowest fluctuations of the protein occur. This analysis supports the heterogeneity of ubiquitin dynamics and the concept that there may be a different optimal lag time for each different tICA mode, since one can locally optimize the residues’ relaxation in different regions by varying the tICA lag time.

FIG. 9.

Left column: two residues in the Lys11 loop of ubiquitin whose tcfs from the simulation (black) are well approximated at timescales less than 10 ns by the tICs predicted using a lag time of 2 ps (cyan). Right column: two residues in the 50 s loop of ubiquitin whose tcfs from the simulation (black) are well approximated at timescales less than 10 ns by the tICs predicted using a lag time of 20 ns (magenta).

FIG. 9.

Left column: two residues in the Lys11 loop of ubiquitin whose tcfs from the simulation (black) are well approximated at timescales less than 10 ns by the tICs predicted using a lag time of 2 ps (cyan). Right column: two residues in the 50 s loop of ubiquitin whose tcfs from the simulation (black) are well approximated at timescales less than 10 ns by the tICs predicted using a lag time of 20 ns (magenta).

Close modal

The tICA approach is general and applies to any time-dependent set of coordinates and any lag time, τtICA. After selecting the input coordinates to the tICA, which in this study are the coordinates of the fluctuations away from the average structure calculated over the MD trajectory, ΔR, there remains a single adjustable parameter: the observation lag time or τtICA. This time parameter is used to construct the time-lagged covariance matrix [see Eq. (9)]. The tICA modes illustrate the dynamics taking place over a timescale longer than τtICA, while dynamical phenomena that are faster are averaged out and cannot be detected. Thus, only selecting the proper lag time can lead to the correct sampling of the dynamical phenomena that one desires to study.21,24,25

The selection of τtICA is usually accomplished by performing the MSM analysis of multi-dimensional free-energy maps at different lag times. This step is followed by testing the results a posteriori to verify which lag time leads to the longest possible timescale and to the most efficient separation of those timescales. Here, we propose a procedure to select a priori the optimal tICA lag time. However, note that the traditional a posteriori verification of the optimal tICA lag time agrees with the 2 ns value used in our calculations (see Sec. S5 in the supplementary material).

We start here from the one-dimensional mode-dependent free-energy surface constructed as described in Sec. III B and select the optimum τtICA lag time as the one that samples the highest energy barrier in the FES. To illustrate how the choice of τtICA affects the tICA modes, Figs. 10 and 11 show how an increase in the lag time modifies the dynamics that the first tICA mode samples. The energy landscape displays two deep minima and two possible paths that connect the two minima at all the lag times studied. Both Figs. 10 and 11 display the two pathways in the two FES at the top and the bottom of each panel, respectively. The protein configurations that populate the two pathways are shown in the middle of the panel: the configurations on the left (blue-white-red) represent the path in the top FES (blue-white-red path). In contrast, the configurations on the right (pink-yellow-light blue) represent the path in the bottom FES (pink-yellow-light blue path). While at all the time lags studied, the dynamics of the protein involves mostly fluctuations in the C-terminal tail, at an increasing lag time, the fluctuations in the tails become less pronounced, and new fluctuations start to appear in the Lys11 loop and in the 50 s loop.

FIG. 10.

Effect of changing the tICA lag time on the first tICA mode free-energy surface (FES) and the associated fluctuations. Note that each FES has two possible pathways to transition between the two energy minima, depicted in the panels above and below the protein fluctuations pictures. In the protein cartoon, the configurations on the left (blue-white-red) represent the path in the top FES (blue-white-red path). In contrast, the configurations on the right (pink-yellow-light blue) represent the path in the bottom FES (pink-yellow-light blue path). As one increases the lag time, the FES detects different internal energy barriers.

FIG. 10.

Effect of changing the tICA lag time on the first tICA mode free-energy surface (FES) and the associated fluctuations. Note that each FES has two possible pathways to transition between the two energy minima, depicted in the panels above and below the protein fluctuations pictures. In the protein cartoon, the configurations on the left (blue-white-red) represent the path in the top FES (blue-white-red path). In contrast, the configurations on the right (pink-yellow-light blue) represent the path in the bottom FES (pink-yellow-light blue path). As one increases the lag time, the FES detects different internal energy barriers.

Close modal
FIG. 11.

Effect of changing the tICA lag time on the first tICA mode free-energy surface (FES) and the associated fluctuations. Note that each FES has two possible pathways to transition between the two energy minima, depicted in the panels above and below the protein fluctuations pictures. In the protein cartoon, the configurations on the left (blue-white-red) represent the path in the top FES (blue-white-red path). In contrast, the configurations on the right (pink-yellow-light blue) represent the path in the bottom FES (pink-yellow-light blue path). As one increases the lag time, the FES detects different internal energy barriers. While the system crosses the barrier, it samples fluctuations in the tail and in the important loops, but as the lag time increases, the predicted motion moves from the C-terminal tail and Lys11 loop into the 50 s loop.

FIG. 11.

Effect of changing the tICA lag time on the first tICA mode free-energy surface (FES) and the associated fluctuations. Note that each FES has two possible pathways to transition between the two energy minima, depicted in the panels above and below the protein fluctuations pictures. In the protein cartoon, the configurations on the left (blue-white-red) represent the path in the top FES (blue-white-red path). In contrast, the configurations on the right (pink-yellow-light blue) represent the path in the bottom FES (pink-yellow-light blue path). As one increases the lag time, the FES detects different internal energy barriers. While the system crosses the barrier, it samples fluctuations in the tail and in the important loops, but as the lag time increases, the predicted motion moves from the C-terminal tail and Lys11 loop into the 50 s loop.

Close modal

The mode-dependent FES look qualitatively similar at τtICA smaller than 20 ps, and at τtICA larger than 2 ns. If we report the barrier height of the red-white-blue pathway between minima in Figs. 10 and 11 as a function of the lag time (see Fig. 12), we observe that when the FES is calculated at increasing τtICA, the barrier height increases until τtICA ≈ 2.0 ns when it starts decreasing. Figure 12 also reports the calculated Markov State Model (MSM) time, t2, which is given by the projection of the second MSM eigenvector as described in Sec. II. t2 is the time needed by the system to cross the barrier and shows a nice correlation with the barrier height for increasing tICA lag times. This analysis agrees with the concept that the best tICA lag time is the one that leads to the slowest dynamics, and hence the longest timescales, in a MSM analysis.

FIG. 12.

Correlation between the barrier surmounted by the red-white-blue pathway between minima in Figs. 10 and 11 (red markers) and the t2 timescale of the MSM constructed on the surface (black markers), as a function of tICA lag time. The correlation coefficient between the two sets of data, ρ, is 0.56. Dotted lines between markers are a guide to the eye.

FIG. 12.

Correlation between the barrier surmounted by the red-white-blue pathway between minima in Figs. 10 and 11 (red markers) and the t2 timescale of the MSM constructed on the surface (black markers), as a function of tICA lag time. The correlation coefficient between the two sets of data, ρ, is 0.56. Dotted lines between markers are a guide to the eye.

Close modal

Intuitively, the non-homogeneity of ubiquitin’s dynamics when changing the tICA lag time, observed in Fig. 12, seems associated with the well-known hierarchical energy landscape of proteins in the folded state.3,9 At short lag times, the tICA is sampling faster dynamics than at large lag times. Fast fluctuations cross small barriers along the pathway while sampling the energy landscape. As the tICA lag time is increased, the analysis picks up slower fluctuations, with a corresponding increase in predicted timescales and barrier heights. The fall-off in t2 (and in barrier heights) at longer tICA lag times is likely due to a loss of statistics as the lag time is made large and the system makes direct “hops” between deep minima, thus avoiding the sampling of the barriers. Because the t2 from the MSM is being reported as the timescale of the slowest processes found by the tICA, and at both long and short τtICA there are no large barriers sampled, the tICA coordinates, which are unit-free and do not encode lengthscales, return a similar quadratic or barrier-free surface to the MSM analysis.

To conclude this section, we observe that for each mode, our procedure identifies the optimal mode-dependent tICA lag time using the height of the energy barriers. Thus, different tICA modes are likely to have different optimal lag time so that the definition of an optimal lag time cannot be unique. We adopt for the optimal lag time the one measured in the slowest tICA mode.

In this section, we compare the outcome of this study with the results of the analysis performed using the conventional procedure, which combines a two-dimensional, or multidimensional, tICA free-energy map with an MSM analysis of the kinetics.8,24,25,81–84 In what has become a fairly typical workflow for the analysis of MD simulations of biomolecules using Markov state models, the MD trajectory is projected onto not just one mode but a number n of the slowest tICA modes. This procedure reduces the high dimensionality of the original free-energy landscape by identifying the slowest dominant modes. One then performs an MSM analysis on the reduced subspace to parse the slowest dynamics and corresponding timescales of the system.22,24,25,62,84–86 Usually, one selects the two slowest modes, but in some cases, one considers instead more than two tICA modes: the latter procedure may lead to even slower measured kinetic timescales.25 This is because the transitions among the selected modes can become even less probable, while statistical insufficiencies in the necessarily finite simulation data can also play a role. Note that if the tICA modes were fully uncoupled, as we would like them to be, there would not be transitions between the modes (or the time for the transition would be infinite). Transitions between tICA slow modes are rare and represent the slowest dynamics in the MD trajectory. See also Sec. S1 in the supplementary material for an analysis of the independence of the tICA modes.

Here, we report the results of this type of “traditional” tICA-MSM approach for the 1-µs ubiquitin simulation. We build the MSMs on the space spanned by the first two tICs calculated from the ubiquitin simulation at a tICA lag time of τtICA = 2 ns, the same as that used in the θa,ϕa surfaces presented in Sec. III C. Then, we compare the results of this model to the analysis performed on the single-mode projections, namely, the FES of a tIC, and the LE4PD-XYZ mode-dependent FES, presented earlier in this paper.

Figure 13 shows that the free-energy surface spanned by the first two tICs has two “lobes” having two minima each. When a MSM is constructed on this surface, it predicts that the slowest motion spanned in this two-dimensional space is a transition between the two lobes, i.e., the transition between the two tICA modes, as can be seen by an examination of the spectrum of ψ2 projected onto the free-energy surface (Fig. 13, top left panel). The MSM predicts that the transition between the two lobes occurs over a timescale of ∼200 ns. Tracing a pathway between the two deepest minima in each lobe, using the same method we utilized for the θa,ϕa surfaces shows that inter-lobe transitions correspond to dynamics in the 50 s loop of ubiquitin. The second-slowest relaxation processes on the surface spanned by the first two tICs correspond to movement between the intra-lobe minima on the right-hand lobe (Fig. 14). The MSM predicts that this transition occurs over a timescale of ∼70 ns and that the transition causes motions in the Lys11 of ubiquitin. Note that the same slow fluctuations are identified by the single mode analysis of tICA and by the LE4PD models.

FIG. 13.

Results for the MSM of the two slowest tICs. (a) Free-energy surface for the first two tICs. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) The two slowest implied timescales, t2 (blue curve) and t3 (red curve), of the MSM as a function of MSM lag time, which is completely unrelated to the lag time used in the prior tICA step. The black vertical line demarcates the lag time selected when constructing the MSM, 7.5 ns.

FIG. 13.

Results for the MSM of the two slowest tICs. (a) Free-energy surface for the first two tICs. (b) Structures of ubiquitin from the trajectory along the free-energy surface given in (a). The colors of the structures correspond to the given colored marker along the transition pathway. (c) Projection of ψ2 onto the discrete states of the MSM; colors correspond to the scaled-and-shifted value of ψ2 at that discrete state, ψ2=ψ2min(ψ2)max(ψ2)min(ψ2)0.5. (d) The two slowest implied timescales, t2 (blue curve) and t3 (red curve), of the MSM as a function of MSM lag time, which is completely unrelated to the lag time used in the prior tICA step. The black vertical line demarcates the lag time selected when constructing the MSM, 7.5 ns.

Close modal
FIG. 14.

Same as Fig. 13, except examining the second-slowest process of the MSM, which is described by ψ3 in (c), where ψ3 is scaled and shifted in the same manner as ψ2 is in Fig. 13.

FIG. 14.

Same as Fig. 13, except examining the second-slowest process of the MSM, which is described by ψ3 in (c), where ψ3 is scaled and shifted in the same manner as ψ2 is in Fig. 13.

Close modal

Thus, although the timescales predicted using the space spanned by the first two tICs are slightly slower than using the θa,ϕa surfaces for the first 2 tICs individually (∼40 vs 24.0 and ∼20 vs 10.7, respectively), the timescales are still within a factor of 2 in both cases, and the qualitative dynamics are predicted to be similar from both methods (comparing Figs. 2 and 3 with Figs. 13 and 14). Thus, the single-mode analysis of tICA, the two-mode analysis of tICA, and the single-mode analysis of LE4PD and LE4PD-XYZ give consistent results when identifying, in one simulation trajectory, the local fluctuations for the slowest processes.

We report in Sec. S7 of the supplementary material the analysis of the two-dimensional map of the two slowest LE4PD-XYZ modes. Analysis of this map confirms the previous calculations for the single-mode LE4PD-XYZ analysis. Because the LE4PD modes are slightly more coupled than the tICA modes, the energy barriers between the two slowest modes are smaller than in the tICA case.

Atomistic MD simulations of proteins have been shown to describe with accuracy relevant biological processes. However, the leading behavior that guides the properties in biological systems, including the slow cooperative dynamics involved in protein binding and function, is often hidden by the broad spectrum of phenomena and the multitude of atomistic details displayed in simulations. Many studies have begun to rely on machine learning techniques to distill the essential leading kinetic information from MD trajectories. The most straightforward and widely used analytical tool in machine learning is the time-lagged independent analysis or tICA, where the covariance matrix samples fluctuations at a given lag time, τtICA. One can obtain a careful analysis of the kinetics of slow fluctuations by combining the tICA with the MSM analysis.24,25 Similarly to tICA, the LE4PD and LE4PD-XYZ approaches accurately model nonlinear protein motions by partitioning the MD dynamics into a quasi-independent diffusive mode while including the mode-dependent free-energy surfaces.34–36,44,52 The LE4PD is an isotropic representation of protein dynamics in a lab reference frame, analogous to the celebrated Rouse–Zimm equation for synthetic polymer dynamics,41,42 extended to consider the protein’s hydrophobic core and the sequence-dependent hydrodynamic interaction. The LE4PD-XYZ is the anisotropic version of the previous equation of motion in the protein’s center-of-mass reference system, where rotation and translation are removed. The LE4PD-XYZ represents the anisotropic fluctuating dynamics of the proteins around its average structure.

Among deep learning methods, more sophisticated approaches than tICA have been proposed to model nonlinear effects in protein motions, such as kernel tICA,87 state-free reversible VAMPnets,88 and time-lagged autoencoders (TAEs).81,89 However, the use of such deep learning approaches to modeling nonlinearities in dynamics often comes with an increased computational cost, paired with a loss of physical intuition for the system under study. Thus, the tICA coordinates are considered, in general, the optimal linear approximation to the order parameters for relevant slow processes in proteins’ dynamics.21,24,60–62 However, the tICA modes and other slow processes identified by machine learning lack information on their physical origin, having no associated equation of motion.

In this study, we compare tICA predictions with both the isotropic LE4PD and with the anisotropic LE4PD or LE4PD-XYZ. To do so, we associate with each tICA mode a free-energy landscape obtained by the eigenvector projection of the simulation trajectory onto the tIC modes. This representation is convenient because it allows one to analyze the tICA’s predictions based on the time evolution of fluctuations onto the mode-dependent free-energy landscape. Because this representation depends on a single mode, it is free from the need to decide the number of modes to consider when building a free-energy map, as one usually does.

LE4PD and tICA agree in identifying the regions in the protein’s primary sequence that undergo slow dynamics. There are three regions of slow dynamics for ubiquitin, namely, the 50 s loop, the Lys 11 loop, and the C-terminal tail. All of them are known to be involved in ubiquitin’s multiple biological functions.45–47,64 Because the primary sequence of ubiquitin is highly conserved in the family of proteins with a similar function, we expect the processes identified in our study to be kinetically and thermodynamically robust and that similar mechanisms are likely to guide the binding of other proteins that perform the same function.

Both tICA and LE4PD consistently identify the leading slow modes. However, while tICA tends to collect all the slow processes in the first or a few modes, the LE4PD provides a more detailed picture of the time- and length-dependence of the slow dynamics, which are partitioned into a larger number of modes. Thus, if one aims at identifying the slowest fluctuations in one mode, tICA may be more efficient than the LE4PD if τtICA is opportunely selected. A detail to note is that when the tICs are sorted in descending order of decorrelation time (i.e., in the order of the tICA eigenvalues), the relative timescale may change after the free-energy barriers are accounted for through the Markov state modeling so that the slowest tICA mode could be different from the first tICA mode.32,34,44

In general, the tICA captures the slow fluctuations that occur at a timescale longer than the given tICA lag time, while faster dynamics are averaged out. The LE4PD method, instead, which is based on the solution of a “bead-and-spring” model of macromolecular dynamics, provides detailed information on the dynamics at the different length scales. It follows that the LE4PD is accurate in reproducing the time decay of amino acid fluctuations at all timescales when the dynamics is represented by the time correlation functions calculated from the simulation trajectory (Figs. 7 and 8). The similar calculation performed using tICA modes is, with a few exceptions, much less accurate (Figs. 7 and 9).

The tICA’s lack of accuracy in the description of the time dependence of the fluctuation decorrelation as described by the simulated tcfs is not surprising because the tICA averages out the information at times shorter than the lag time. Setting a lag time for tICA affects the modality of sampling the dynamics in the free-energy landscape. These considerations lead us to perform a study on how the tICA lag time affects the properties analyzed by tICA, namely, the slow fluctuations and the calculation of the time correlation functions. For example, if the lag time is too short or too long, the tICA cannot properly sample the free-energy barriers. Thus, we propose and test several methods to evaluate a priori an optimal lag time. The optimal τtICA calculated with the different methods is fairly consistent.

We also observed an almost quantitative agreement between the time correlation functions directly calculated from the simulation and the ones obtained by solving the LE4PD-XYZ equation when hydrodynamics is included. This result confirms the importance of hydrodynamics in the Langevin dynamics of proteins in solution, which is not surprising given that the Langevin is an equation of motion in the protein’s reduced coordinates, where the effect of the solvent enters through friction, random forces, and hydrodynamic interactions. Thus, the hydrodynamic forces that enter the LE4PD equations result from the projection of the forces due to the solvent and the protein’s atomistic fast degrees of freedom onto the reduced coordinates of the alpha carbons. Finally, hydrodynamics is more important for modes that are local, while large-scale fluctuations and slow modes are less affected.

In conclusion, if a rapid identification of the leading slow dynamics is required, the tICA analysis is a practical and valuable strategy to collect that information. However, suppose the time propagation of the slow leading dynamics is of interest. In that case, the LE4PD-XYZ with hydrodynamics provides a more accurate representation of the slow processes based on its superior ability to reproduce the protein’s dynamics at all times.

Note that different tICA modes are likely to have different optimal lag times. Thus, in principle, one cannot find one lag time that is optimal for the whole multiscale dynamics of the protein. In fact, in general, one observes that the barrier height tends to decrease with the increasing mode number, as the mode-dependent dynamics become increasingly more local and less cooperative.44 Because defining the lag time in tICA implies that dynamics at shorter lag time are not detected, the dynamics on the more local modes may not be correctly represented.

The supplementary material presents in Sec. S1 a study of the independence of the tICA and LE4PD modes, while Sec. S2 shows the stability of the tICA modes as a function of the selected lag time. To most efficiently identify the LE4PD-XYZ modes that best overlap to tICA modes, Sec. S3 in the supplementary material presents some additional methods to the ones described in Sec. V of the text, including the projection of the slowest tICA mode onto the LE4PD-XYZ modes with and without hydrodynamics interactions. Section S4 presents the quantitative comparison of the mode-dependent fluctuations for the different LE4PD methods and tICA. A selection of alternative methods to calculate the optimal tICA lag time is summarized in Sec. S5. Section S6 presents a brief overview of the isotropic LE4PD: the calculations for this model compare in several sections of the text together with the anisotropic LE4PD-XYZ’s data. Section S7 shows the two-dimensional free-energy maps for the two slowest modes of the anisotropic LE4PD-XYZ. Finally, the supplementary material document concludes in Sec. S8 with a presentation of the molecular dynamics methodology we used to simulate ubiquitin and the post-processing of the trajectory, followed by Sec. S9 with a brief overview of the Markov state model method.

E.R.B. was supported by the John Keana Graduate Student Fellowship from the University of Oregon and by the National Science Foundation through Grant Nos. CHE-1665466 and CHE-2133464. The computational work was performed on the supercomputer Comet at the San Diego Supercomputer Center, with the support of XSEDE90 (Allocation Grant No. TG-CHE100082) (XSEDE is a program supported by the National Science Foundation under Grant No. ACI-1548562).

The authors have no conflicts of interest to disclose.

The codes used to perform the isotropic LE4PD and the anisotropic LE4PD-XYZ analyses described here are available on GitHub (https://github.com/GuenzaLab). The processed MD trajectory and θa(t),ϕa(t),|ξa(t)| trajectories for the first ten LE4PD-XYZ modes both with and without hydrodynamics included are available on Zenodo (https://doi.org/10.5281/zenodo.4312224).

1.
B.
Alberts
,
A.
Johnson
,
J.
Wilson
,
J.
Lewis
,
T.
Hunt
,
K.
Roberts
,
M.
Raff
, and
P.
Walter
,
Molecular Biology of the Cell
(
Garland Science
,
2008
).
2.
N. D.
Socci
,
J. N.
Onuchic
, and
P. G.
Wolynes
, “
Diffusive dynamics of the reaction coordinate for protein folding funnels
,”
J. Chem. Phys.
104
,
5860
5868
(
1996
).
3.
S.
Wu
,
P. I.
Zhuravlev
, and
G. A.
Papoian
, “
High resolution approach to the native state ensemble kinetics and thermodynamics
,”
Biophys. J.
95
,
5524
5532
(
2008
).
4.
G. G.
Maisuradze
,
A.
Liwo
, and
H. A.
Scheraga
, “
Principal component analysis for protein folding dynamics
,”
J. Mol. Biol.
385
,
312
329
(
2009
).
5.
R.
Hegger
,
A.
Altis
,
P. H.
Nguyen
, and
G.
Stock
, “
How complex is the dynamics of peptide folding?
,”
Phys. Rev. Lett.
98
,
028102
(
2007
).
6.
A.
Kitao
and
N.
Go
, “
Investigating protein dynamics in collective coordinate space
,”
Curr. Opin. Struct. Biol.
9
,
164
169
(
1999
).
7.
A.
Amadei
,
A. B. M.
Linssen
, and
H. J. C.
Berendsen
, “
Essential dynamics of proteins
,”
Proteins: Struct., Funct., Bioinf.
17
,
412
425
(
1993
).
8.
F.
Sittel
and
G.
Stock
, “
Perspective: Identification of collective variables and metastable states of protein dynamics
,”
J. Chem. Phys.
149
,
150901
(
2018
).
9.
P. I.
Zhuravlev
and
G. A.
Papoian
, “
Protein functional landscapes, dynamics, allostery: A tortuous path towards a universal theoretical framework
,”
Q. Rev. Biophys.
43
,
295
332
(
2010
).
10.
A. R.
Atilgan
,
S. R.
Durell
,
R. L.
Jernigan
,
M. C.
Demirel
,
O.
Keskin
, and
I.
Bahar
, “
Anisotropy of fluctuation dynamics of proteins with an elastic network model
,”
Biophys. J.
80
,
505
515
(
2001
).
11.
M. C.
Demirel
,
A. R.
Atilgan
,
I.
Bahar
,
R. L.
Jernigan
, and
B.
Erman
, “
Identification of kinetically hot residues in proteins
,”
Protein Sci.
7
,
2522
2532
(
1998
).
12.
M. M.
Tirion
, “
Large amplitude elastic motions in proteins from a single-parameter, atomic analysis
,”
Phys. Rev. Lett.
77
,
1905
1908
(
1996
).
13.
I.
Bahar
and
R. L.
Jernigan
, “
Cooperative fluctuations and subunit communication in tryptophan synthase
,”
Biochemistry
38
,
3478
3490
(
1999
).
14.
J.
Monod
,
J.
Wyman
, and
J.-P.
Changeux
, “
On the nature of allosteric transitions: A plausible model
,”
J. Mol. Biol.
12
,
88
118
(
1965
).
15.
D. D.
Boehr
,
R.
Nussinov
, and
P. E.
Wright
, “
The role of dynamic conformational ensembles in biomolecular recognition
,”
Nat. Chem. Biol.
5
,
789
796
(
2009
).
16.
P.
Csermely
,
R.
Palotai
, and
R.
Nussinov
, “
Induced fit, conformational selection and independent dynamic segments: An extended view of binding events
,”
Trends Biochem. Sci.
35
,
539
546
(
2010
).
17.
U.
Kahler
,
A. S.
Kamenik
,
F.
Waibl
,
J.
Kraml
, and
K. R.
Liedl
, “
Protein-protein binding as a two-step mechanism: Preselection of encounter poses during the binding of BPTI and trypsin
,”
Biophys. J.
119
,
652
666
(
2020
).
18.
I.
Jolliffe
,
Principal Component Analysis
, Springer Series in Statistics (
Springer
,
2002
).
19.
A.
Hyvärinen
,
J.
Karhunen
, and
E.
Oja
,
Independent Component Analysis, Adaptive and Cognitive Dynamic Systems: Signal Processing, Learning, Communications and Control
(
Wiley
,
2001
).
20.
I.
Goodfellow
,
Y.
Bengio
, and
A.
Courville
,
Deep Learning
(
MIT Press
,
2016
).
21.
Y.
Naritomi
and
S.
Fuchigami
, “
Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: The case of domain motions
,”
J. Chem. Phys.
134
,
065101
(
2011
).
22.
S.
Klus
,
F.
Nüske
,
P.
Koltai
,
H.
Wu
,
I.
Kevrekidis
,
C.
Schütte
, and
F.
Noé
, “
Data-driven model reduction and transfer operator approximation
,”
J. Nonlinear Sci.
28
,
985
1010
(
2018
).
23.
L.
Molgedey
and
H. G.
Schuster
, “
Separation of a mixture of independent signals using time delayed correlations
,”
Phys. Rev. Lett.
72
,
3634
3637
(
1994
).
24.
G.
Pérez-Hernández
,
F.
Paul
,
T.
Giorgino
,
G.
De Fabritiis
, and
F.
Noé
, “
Identification of slow molecular order parameters for Markov model construction
,”
J. Chem. Phys.
139
,
015102
(
2013
).
25.
C. R.
Schwantes
and
V. S.
Pande
, “
Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9
,”
J. Chem. Theory Comput.
9
,
2000
2009
(
2013
).
26.
P.
Tiwary
and
B. J.
Berne
, “
Spectral gap optimization of order parameters for sampling complex molecular systems
,”
Proc. Natl. Acad. Sci. U. S. A.
113
,
2839
2844
(
2016
).
27.
M. M.
Sultan
and
V. S.
Pande
, “
tICA-metadynamics: Accelerating metadynamics by using kinetically selected collective variables
,”
J. Chem. Theory Comput.
13
,
2440
2447
(
2017
).
28.
J.
McCarty
and
M.
Parrinello
, “
A variational conformational dynamics approach to the selection of collective variables in metadynamics
,”
J. Chem. Phys.
147
,
204109
(
2017
).
29.
Y.
Naritomi
and
S.
Fuchigami
, “
Slow dynamics of a protein backbone in molecular dynamics simulation revealed by time-structure based independent component analysis
,”
J. Chem. Phys.
139
,
215102
(
2013
).
30.
M. K.
Scherer
,
B. E.
Husic
,
M.
Hoffmann
,
F.
Paul
,
H.
Wu
, and
F.
Noé
, “
Variational selection of features for molecular kinetics
,”
J. Chem. Phys.
150
,
194108
(
2019
).
31.
R. T.
McGibbon
,
K. A.
Beauchamp
,
M. P.
Harrigan
,
C.
Klein
,
J. M.
Swails
,
C. X.
Hernández
,
C. R.
Schwantes
,
L.-P.
Wang
,
T. J.
Lane
, and
V. S.
Pande
, “
MDTraj: A modern open library for the analysis of molecular dynamics trajectories
,”
Biophys. J.
109
,
1528
1532
(
2015
).
32.
E. R.
Beyerle
and
M. G.
Guenza
, “
Comparison between slow anisotropic LE4PD fluctuations and the principal component analysis modes of ubiquitin
,”
J. Chem. Phys.
154
,
124111
(
2021
).
33.
E.
Caballero-Manrique
,
J. K.
Bray
,
W. A.
Deutschman
,
F. W.
Dahlquist
, and
M. G.
Guenza
, “
A theory of protein dynamics to predict NMR relaxation
,”
Biophys. J.
93
,
4128
4140
(
2007
).
34.
J.
Copperman
and
M. G.
Guenza
, “
Coarse-grained Langevin equation for protein dynamics: Global anisotropy and a mode approach to local complexity
,”
J. Phys. Chem. B
119
,
9195
9211
(
2015
).
35.
J.
Copperman
and
M. G.
Guenza
, “
Predicting protein dynamics from structural ensembles
,”
J. Chem. Phys.
143
,
243131
(
2015
).
36.
J.
Copperman
and
M. G.
Guenza
, “
Mode localization in the cooperative dynamics of protein recognition
,”
J. Chem. Phys.
145
,
015101
(
2016
).
37.
R.
Zwanzig
,
Nonequilibrium Statistical Mechanics
(
Oxford University Press
,
Oxford
,
2001
).
38.
M.
Guenza
, “
Many chain correlated dynamics in polymer fluids
,”
J. Chem. Phys.
110
,
7574
7588
(
1999
).
39.
K. S.
Schweizer
, “
Microscopic theory of the dynamics of polymeric liquids: General formulation of a mode–mode-coupling approach
,”
J. Chem. Phys.
91
,
5802
(
1998
).
40.
I.
Lyubimov
and
M. G.
Guenza
, “
First-principle approach to rescale the dynamics of simulated coarse-grained macromolecular liquids
,”
Phys. Rev. E
84
,
031801
(
2011
); arXiv:1103.3047.
41.
M.
Doi
and
S.
Edwards
,
The Theory of Polymer Dynamics
(
Clarendon Press
,
Oxford
,
1986
).
42.
R.
Bird
,
C.
Curtiss
,
R.
Armstrong
, and
O.
Hassager
,
Dynamics of Polymeric Liquids, Volume 2: Kinetic Theory
(
Wiley
,
1987
).
43.
R.
Zwanzig
, “
Diffusion in a rough potential
,”
Proc. Natl. Acad. Sci. U. S. A.
85
,
2029
2030
(
1988
); arXiv:1409.4581.
44.
E. R.
Beyerle
and
M. G.
Guenza
, “
Kinetics analysis of ubiquitin local fluctuations with Markov state modeling of the LE4PD normal modes
,”
J. Chem. Phys.
151
,
164119
(
2019
).
45.
D.
Komander
, “
The emerging complexity of protein ubiquitination
,”
Biochem. Soc. Trans.
37
,
937
953
(
2009
).
46.
D.
Komander
and
M.
Rape
, “
The ubiquitin code
,”
Annu. Rev. Biochem.
81
,
203
229
(
2012
).
47.
Z.
Lv
,
K. M.
Williams
,
L.
Yuan
,
J. H.
Atkison
, and
S. K.
Olsen
, “
Crystal structure of a human ubiquitin E1–ubiquitin complex reveals conserved functional elements essential for activity
,”
J. Biol. Chem.
293
,
18337
18352
(
2018
).
48.
H.
Takano
and
S.
Miyashita
, “
Relaxation modes in random spin systems
,”
J. Phys. Soc. Jpn.
64
,
3688
3698
(
1995
).
49.
A.
Mitsutake
,
H.
Iijima
, and
H.
Takano
, “
Relaxation mode analysis of a peptide system: Comparison with principal component analysis
,”
J. Chem. Phys.
135
,
164102
(
2011
).
50.
A.
Mitsutake
and
H.
Takano
, “
Relaxation mode analysis and Markov state relaxation mode analysis for chignolin in aqueous solution near a transition temperature
,”
J. Chem. Phys.
143
,
124111
(
2015
).
51.
A.
Mitsutake
and
H.
Takano
, “
Relaxation mode analysis for molecular dynamics simulations of proteins
,”
Biophys. Rev.
10
,
375
389
(
2018
).
52.
J.
Copperman
,
M.
Dinpajooh
,
E. R.
Beyerle
, and
M. G.
Guenza
, “
Universality and specificity in protein fluctuation dynamics
,”
Phys. Rev. Lett.
119
,
158101
(
2017
).
53.
C.
Eckart
, “
The kinetic energy of polyatomic molecules
,”
Phys. Rev.
46
,
383
387
(
1934
).
54.
A.
Sayvetz
, “
The kinetic energy of polyatomic molecules
,”
J. Chem. Phys.
7
,
383
389
(
1939
).
55.
G. R.
Kneller
, “
Eckart axis conditions, Gauss’ principle of least constraint, and the optimal superposition of molecular structures
,”
J. Chem. Phys.
128
,
194101
(
2008
).
56.
G.
Chevrot
,
P.
Calligari
,
K.
Hinsen
, and
G. R.
Kneller
, “
Least constraint approach to the extraction of internal motions from molecular dynamics trajectories of flexible macromolecules
,”
J. Chem. Phys.
135
,
084110
(
2011
).
57.
R.
Horn
and
C.
Johnson
,
Topics in Matrix Analysis
(
Cambridge University Press
,
1994
).
58.
P. G.
de Gennes
,
Scaling Concepts in Polymer Physics
(
Cornell University Press
,
1979
).
59.
G.
Bowman
,
V.
Pande
, and
F.
Noé
,
An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation
, Advances in Experimental Medicine and Biology (
Springer
,
The Netherlands
,
2013
).
60.
M. M.
Sultan
,
H. K.
Wayment-Steele
, and
V. S.
Pande
, “
Transferable neural networks for enhanced sampling of protein dynamics
,”
J. Chem. Theory Comput.
14
,
1887
1894
(
2018
); arXiv:1801.00636.
61.
F.
Noé
and
C.
Clementi
, “
Kinetic distance and kinetic maps from molecular dynamics simulation
,”
J. Chem. Theory Comput.
11
,
5002
5011
(
2015
); arXiv:1506.06259.
62.
M. K.
Scherer
,
B.
Trendelkamp-Schroer
,
F.
Paul
,
G.
Pérez-Hernández
,
M.
Hoffmann
,
N.
Plattner
,
C.
Wehmeyer
,
J.-H.
Prinz
, and
F.
Noé
, “
PyEMMA 2: A software package for estimation, validation, and analysis of Markov models
,”
J. Chem. Theory Comput.
11
,
5525
5542
(
2015
).
63.
W.
E
,
W.
Ren
, and
E.
Vanden-Eijnden
, “
String method for the study of rare events
,”
Phys. Rev. B
66
,
052301
(
2002
).
64.
L.
Penengo
,
M.
Mapelli
,
A. G.
Murachelli
,
S.
Confalonieri
,
L.
Magri
,
A.
Musacchio
,
P. P.
Di Fiore
,
S.
Polo
, and
T. R.
Schneider
, “
Crystal structure of the ubiquitin binding domains of rabex-5 reveals two modes of interaction with ubiquitin
,”
Cell
124
,
1183
1195
(
2006
).
65.
L.
Reichl
,
A Modern Course in Statistical Physics
(
Wiley
,
1998
).
66.
W. C.
Swope
,
J. W.
Pitera
, and
F.
Suits
, “
Describing protein folding kinetics by molecular dynamics simulations. 1. Theory
,”
J. Phys. Chem. B
108
,
6571
6581
(
2004
).
67.
R. A.
Horn
and
C. R.
Johnson
,
Matrix Analysis
(
Cambridge University Press
,
New York
,
1986
).
68.
S.
Lee
,
Y. C.
Tsai
,
R.
Mattera
,
W. J.
Smith
,
M. S.
Kostelansky
,
A. M.
Weissman
,
J. S.
Bonifacino
, and
J. H.
Hurley
, “
Structural basis for ubiquitin recognition and autoubiquitination by Rabex-5
,”
Nat. Struct. Mol. Biol.
13
,
264
271
(
2006
).
69.
A.
Sidhu
,
A.
Surolia
,
A. D.
Robertson
, and
M.
Sundd
, “
A hydrogen bond regulates slow motions in ubiquitin by modulating a β-turn flip
,”
J. Mol. Biol.
411
,
1037
1048
(
2011
).
70.
S.
Vijay-Kumar
,
C. E.
Bugg
, and
W. J.
Cook
, “
Structure of ubiquitin refined at 1.8 Å resolution
,”
J. Mol. Biol.
194
,
531
544
(
1987
).
71.
C. A.
Smith
,
D.
Ban
,
S.
Pratihar
,
K.
Giller
,
M.
Paulat
,
S.
Becker
,
C.
Griesinger
,
D.
Lee
, and
B. L.
de Groot
, “
Allosteric switch regulates protein–protein binding through collective motion
,”
Proc. Natl. Acad. Sci. U. S. A.
113
,
3269
3274
(
2016
).
72.
F.
Noé
,
I.
Horenko
,
C.
Schütte
, and
J. C.
Smith
, “
Hierarchical analysis of conformational dynamics in biomolecules: Transition networks of metastable states
,”
J. Chem. Phys.
126
,
155102
(
2007
).
73.
J.-H.
Prinz
,
H.
Wu
,
M.
Sarich
,
B.
Keller
,
M.
Senne
,
M.
Held
,
J. D.
Chodera
,
C.
Schütte
, and
F.
Noé
, “
Markov models of molecular kinetics: Generation and validation
,”
J. Chem. Phys.
134
,
174105
(
2011
).
74.
S.
Olsson
,
H.
Wu
,
F.
Paul
,
C.
Clementi
, and
F.
Noé
, “
Combining experimental and simulation data of molecular processes via augmented Markov models
,”
Proc. Natl. Acad. Sci. U. S. A.
114
,
8265
8270
(
2017
).
75.
B. G.
Keller
,
J.-H.
Prinz
, and
F.
Noé
, “
Markov models and dynamical fingerprints: Unraveling the complexity of molecular kinetics
,”
Chem. Phys.
396
,
92
107
(
2012
).
76.
F.
Noé
,
S.
Doose
,
I.
Daidone
,
M.
Lollmann
,
M.
Sauer
,
J. D.
Chodera
, and
J. C.
Smith
, “
Dynamical fingerprints for probing individual relaxation processes in biomolecular dynamics with simulations and kinetic experiments
,”
Proc. Natl. Acad. Sci. U. S. A.
108
,
4822
4827
(
2011
).
77.
J. D.
Chodera
and
F.
Noé
, “
Probability distributions of molecular observables computed from Markov models. II. Uncertainties in observables and their time-evolution
,”
J. Chem. Phys.
133
,
105102
(
2010
).
78.
M. J.
Comstock
,
K. D.
Whitley
,
H.
Jia
,
J.
Sokoloski
,
T. M.
Lohman
,
T.
Ha
, and
Y. R.
Chemla
, “
Direct observation of structure-function relationship in a nucleic acid–processing enzyme
,”
Science
348
,
352
354
(
2015
).
79.
H.
Mazal
and
G.
Haran
, “
Single-molecule FRET methods to study the dynamics of proteins at work
,”
Curr. Opin. Biomed. Eng.
12
,
8
17
(
2019
).
80.
J.
Chan
,
K.
Takemura
,
H.-R.
Lin
,
K.-C.
Chang
,
Y.-Y.
Chang
,
Y.
Joti
,
A.
Kitao
, and
L.-W.
Yang
, “
An efficient timer and sizer of biomacromolecular motions
,”
Structure
28
,
259
269.e8
(
2020
).
81.
C.
Wehmeyer
and
F.
Noé
, “
Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics
,”
J. Chem. Phys.
148
,
241703
(
2018
).
82.
F.
Noé
,
R.
Banisch
, and
C.
Clementi
, “
Commute maps: Separating slowly mixing molecular configurations for kinetic modeling
,”
J. Chem. Theory Comput.
12
,
5620
5630
(
2016
).
83.
A. M.
Razavi
and
V. A.
Voelz
, “
Kinetic network models of tryptophan mutations in β-hairpins reveal the importance of non-native interactions
,”
J. Chem. Theory Comput.
11
,
2801
2812
(
2015
).
84.
Z.-G.
Wang
, “
50th anniversary perspective: Polymer conformation—A pedagogical review
,”
Macromolecules
50
,
9073
9114
(
2017
).
85.
G.
Pérez-Hernández
and
F.
Noé
, “
Hierarchical time-lagged independent component analysis: Computing slow modes and reaction coordinates for large molecular systems
,”
J. Chem. Theory Comput.
12
,
6118
6129
(
2016
).
86.
B. E.
Husic
,
R. T.
McGibbon
,
M. M.
Sultan
, and
V. S.
Pande
, “
Optimized parameter selection reveals trends in Markov state models for protein folding
,”
J. Chem. Phys.
145
,
194103
(
2016
).
87.
C. R.
Schwantes
and
V. S.
Pande
, “
Modeling molecular kinetics with tICA and the kernel trick
,”
J. Chem. Theory Comput.
11
,
600
608
(
2015
).
88.
W.
Chen
,
H.
Sidky
, and
A. L.
Ferguson
, “
Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets
,”
J. Chem. Phys.
150
,
214114
(
2019
).
89.
W.
Chen
,
H.
Sidky
, and
A. L.
Ferguson
, “
Capabilities and limitations of time-lagged autoencoders for slow mode discovery in dynamical systems
,”
J. Chem. Phys.
151
,
064123
(
2019
).
90.
J.
Towns
,
T.
Cockerill
,
M.
Dahan
,
I.
Foster
,
K.
Gaither
,
A.
Grimshaw
,
V.
Hazlewood
,
S.
Lathrop
,
D.
Lifka
,
G. D.
Peterson
,
R.
Roskies
,
J. R.
Scott
, and
N.
Wilkins-Diehr
, “
XSEDE: Accelerating scientific discovery
,”
Comput. Sci. Eng.
16
,
62
74
(
2014
).

Supplementary Material