The generalized master equation (GME) provides a powerful approach to study biomolecular dynamics via non-Markovian dynamic models built from molecular dynamics (MD) simulations. Previously, we have implemented the GME, namely the quasi Markov State Model (qMSM), where we explicitly calculate the memory kernel and propagate dynamics using a discretized GME. qMSM can be constructed with much shorter MD trajectories than the MSM. However, since qMSM needs to explicitly compute the time-dependent memory kernels, it is heavily affected by the numerical fluctuations of simulation data when applied to study biomolecular conformational changes. This can lead to numerical instability of predicted long-time dynamics, greatly limiting the applicability of qMSM in complicated biomolecules. We present a new method, the Integrative GME (IGME), in which we analytically solve the GME under the condition when the memory kernels have decayed to zero. Our IGME overcomes the challenges of the qMSM by using the time integrations of memory kernels, thereby avoiding the numerical instability caused by explicit computation of time-dependent memory kernels. Using our solutions of the GME, we have developed a new approach to compute long-time dynamics based on MD simulations in a numerically stable, accurate and efficient way. To demonstrate its effectiveness, we have applied the IGME in three biomolecules: the alanine dipeptide, FIP35 WW-domain, and Taq RNA polymerase. In each system, the IGME achieves significantly smaller fluctuations for both memory kernels and long-time dynamics compared to the qMSM. We anticipate that the IGME can be widely applied to investigate biomolecular conformational changes.

Biological molecules are often large and flexible, and their conformational changes are important for understanding their biological functions. Molecular dynamics (MD) simulations have been widely adopted in recent decades to investigate the conformational dynamics of biomolecules at the atomic level. However, the long timescale of biomolecular processes poses a challenge for MD simulation. Some processes reach milliseconds or even seconds, and atomistic simulations typically only reach sub-milliseconds for large biomolecules.

The Markov State Model (MSM)1–27 is a powerful approach used to bridge the timescale gap between atomistic MD simulations and biological processes. In MSMs, conformations are grouped into states according to their structural or dynamical similarities, and the conformational dynamics are modelled as Markovian transitions between these stable and metastable states at a discrete time sequence t = T (τT is the lag time). The Markovian transitions between each pair of states can be computed from many short MD trajectories, and the long-time dynamics can be predicted with the first-order master equation. To achieve Markovian dynamics, one often needs to construct an MSM with many small states, so that the relaxation within each state is easier to achieve, hence the transitions between pairs of states become Markovian within the affordable lag time (bound by length of MD trajectories). However, these MSMs usually contain a large number of (e.g., hundreds to thousands) states,28 which are useful to make quantitative predictions to be attested with experiments, but often hinder the comprehension of biological insights.29 

To address the above-mentioned challenge facing MSMs, we have developed a non-Markovian framework30,31 based on the generalized master equation (GME),32,33 in which we explicitly consider the memory kernels to predict the long-time dynamics of biomolecular conformational changes. In particular, we developed a straightforward implementation of the GME, the quasi-MSM (qMSM),30 to explicitly consider the memory kernels with the GME. The qMSM is constructed from short-time dynamics within the memory kernel relaxation time τK, and it can predict the long-time dynamics with the GME. The qMSM has been successfully applied in two large biomolecular systems: the gate opening dynamic of RNA polymerase34 and the conformational dynamics of messenger RNA (mRNA) recognition of RNA Induced Silencing Complex.35 To fit the single-molecular force spectroscopy data, Dill and co-workers36,37 have also developed a non-Markovian memory kernel (NMMK) method based on GME, where they applied the maximum entropy principle to obtain the memory kernels from experimental data.

In addition to the GME, the generalized Langevin equation (GLE) has also been recently applied to study biomolecular dynamics.38–46 The GLE is a useful tool for investigating the non-Markovian dynamics of biomolecules along one or several collective variables.43,44 Unlike the Langevin equation, friction in the GLE becomes a time integral of historical dynamics that involves memories. The memory effect in friction can significantly affect the conformational dynamics of biomolecules as revealed in the Grote–Hynes theory.44,45 Furthermore, it has been shown that the decay of the memory kernels in GLE heavily rely on the choice of collective variables. If good collective variables are selected, the memory kernel decay time is significantly shorter than the slowest timescale of the system (e.g., for the folding of Ala944). In addition to GME and GLE, Nóe and co-workers have developed HMM47 that provides an alternative way to model long-timescale biomolecular dynamics using a few states at relatively short lag times. HMMs can effectively approximate the projected Markov models containing hundreds of small states (microstates) using only a few hidden states. Rather than explicitly considering memory kernels as in GME or GLE, HMMs allow fuzzy state boundaries, where a MD conformation has a probability to be assigned to multiple states.

The initial success demonstrates that GME represents a promising approach to study conformational dynamics of complexed biomolecules. However, there still exists a major challenge in the robustness of the computed memory kernels when applying GME to study complex conformational changes. The qMSM often suffers from the numerical instability. For example, small numerical fluctuations of the short-time dynamics (the input of the qMSM) may lead to large fluctuations or even failures when predicting the long-time dynamics in many large biomolecular systems.34,35 To mitigate these fluctuations, we previously adopted a smoothing technique for the qMSM when dealing with large biomolecular systems.34,35 While the smoothing technique helps prevent failure of the qMSM, it still suffers from large uncertainty for predicted dynamics of complexed conformational changes (see Fig. 5). To improve the robustness of algorithms, the integrations often provide a useful approach by reducing the numerical fluctuations. For example, Dinner and co-workers48 have previously developed a method by integrating over multiple lag times when applying the variational approach to MSMs. They show that it can significantly reduce numerical fluctuations and yield more robust results than the variational approach.

We here introduce a new method, the Integrative GME (IGME), to overcome the above-mentioned challenge of the qMSM. The IGME utilizes the time integrals of memory kernels instead of the memory kernel tensor as used in the qMSM, thus the numerical fluctuation of the IGME is much smaller than that of the qMSM. In this paper, we first derive the analytical solution of the GME under the condition where memory kernels have decayed to zero. We then develop a two-step method to build IGME models and obtain parameters of this analytical solution by fitting to the MD simulations. Three biomolecular systems are presented to demonstrate the strength of the IGME: the alanine dipeptide, the FIP35 WW domain, and the gate opening dynamic of a bacterial RNA polymerase (RNAP). We show that the IGME has comparable performance with the qMSM for small alanine dipeptide that have sufficient samplings; while the IGME displays much less uncertainty in calculating the memory kernels and predicting long-time dynamics than the qMSM for the dynamics of WW domain and RNAP that contain larger fluctuations.

The dynamics of a molecular system satisfy the Liouville equation. The Liouville equation describes the dynamics in a high dimensional phase space, which is difficult to analyze or visualize. As a result, many tools have been developed to analyze the MD simulations in a low dimensional space, e.g., collective variables. In most cases, the dynamics of interest lie on a low dimensional manifold.

We can consider the dynamics of the low-dimensional manifold as a projection from the phase space dynamics. We here employ a generalized Hummer–Szabo projection operator29 to project the continuous phase space onto discreate states. In this projection operator, the projection from the phase space to the state space would satisfy the following form:29,49
P=χjxπχjx=jρeqΓχjxπj1χjx
(1)
Where χx is an indicator function that defines the states [χjx=1 indicates x belongs to state j, and χjx=0 indicates x doesn’t belong to state j]. x is a vector containing the position of all atoms in the system. χj and χk are orthogonal when jk: χjxχkx=δjkχjx. We then define the bracket operator as an ensemble average over equilibrium populations [ρeqx] for the entire system:
χjxχjxρeqxχjxχkxρeqxdx=δjkπj
(2)
The full dynamics in the phase space [ρt] satisfies the Liouville equation, ρt/t=Lρt, where L is the Liouville operator. Upon the projection, the dynamics follows the Nakajima-Zwanzig equation (which can be derived from the Liouville’s equation):50,51
tPρt=PLPρt+PLeQLtQρ0+0tPLeQLtsQLPρsds
(3)
Where Q=IP (I is an identity matrix). The second term on the right-hand side of Eq. (3) is called the inhomogeneous term, which is zero when ρ0 is initiated from the equilibrium sampling. As demonstrated in our previous work,30 the above Nakajima–Zwanzig equation can be rewritten as the GME:
Ṫt=TtṪ00mint,τKTtsKsds
(4)
τK is the memory kernel decay time: KtτK=0. The GME given by Eq. (4) determines the evolution of the transition probability matrix (TPM) at any time point. In Eq. (4), Tt is referred to as the TPM, and Tijt represents the probability of finding the system in state j after time t, given that it started from state i:
Tijt=χjxeLtχixρeqxπi1
(5)
Here T is a row-normalized matrix: jTij=1. Ṫ0 is defined as follows:
Ṫij0=χjxLχixρeqxπi1=dTijtdtt0
(6)
And the memory kernel Kt is as below:
Kijt=χjxLeQLtQLχixρeqxπi1
(7)
In our previous work,30 we have developed a numerical implementation of the GME, qMSM, to compute the memory kernel tensor Kij(t) directly from the GME. In the qMSM, the equation used to compute memory kernels contains the first order time derivative of Tt, which may induce numerical fluctuations in the memory kernels. Consequently, the qMSM is heavily affected by the numerical fluctuations of Tt, and generates substantial numerical fluctuations when predicting long-time dynamics. Previously, we have developed a smoothing scheme to reduce the numerical fluctuations,34 but the numerical fluctuations can still be large for large biomolecular systems (see Fig. 5). In this work, we introduce a new theory to utilize the integrals of memory kernels to solve the GME.
The IGME is an extension of the GME that utilizes integrals of memory kernels instead of a memory kernel tensor to model conformational dynamics. The original GME [Eq. (4)] contains a time convolution term to incorporate memory kernels, which is difficult to solve analytically. Here we introduce the Taylor expansion of the convolution term to rewrite the GME as below:
0tTtsKsds=0tTt+n=11n!dnTtdtnsnKsds=TtM0t+n=11nn!dnTtdtnMnt
(8)
Then we can have:
Tt1ddtTttτK=Ṫ0M0tn=1T1t1nn!dndtnTtMnt
(9)
Where Mnt are the time integrals of memory kernels:
Mnt=0tKssnds
(10)
The above Taylor expansion form of the GME [Eq. (9)] still cannot be analytically solved since it is a differential equation of both T(t) and Mn(t). However, we notice that there often exist timescale separations for biomolecular dynamics, where the memory kernels decay to zero at significantly shorter timescale τK than the slowest dynamics of the system: KtτK0. Under this conditions (tτK), the integrals of memory kernels become constants: Mnt>τK=MnτKMn. This subsequently allows us to analytically solve the Taylor expansion form of the GME, which can be rewritten as:
Tt1ddtTttτK=Ṫ0M0n=1T1t1nn!dndtnTtMn
(11)
Equation (11) is a differential equation of T(t) alone, which can be analytically solved in a self-consistent manner. The self-consistent form of Eq. (11) is:
Tmt1ddtTmttτK=Ṫ0M0n=1Tm1t11nn!×dnTm1tdtnMn
(12)
Where Tmt corresponds to the solution of the m-th iteration. First, we need to choose an initial guess m=0) to begin the iterative process. We here adopted an initial guess [T0(t)] as the solution of the zeroth order truncation of the Taylor-expansion form of GME [Eq. (11)]:
T0t1ddtT0ttτK=Ṫ0M0lnA01T0tτK=tṪ0M0tlnT̂0
(13)
Where A0 is the integration constant. As both Ṫ0 and M0 are constant matrices, lnT̂0Ṫ0M0 is also a constant matrix. Next, we move on to the next iteration (m = 1). In this iteration, Eq. (12) becomes:
T1t1ddtT1ttτK=Ṫ0M0n=1T0t11nn!dndtnT0tMn
(14)
Put the solution of m = 0 [i.e., T0t] on the right-hand side of Eq. (14),
T1t1ddtT1ttτK=Ṫ0M0+lnT̂0M1+12lnT̂02M2+=Ṫ0M0n=11nn!lnT̂0nMn
(15)
We then perform time integration on both sides of Eq. (15):
lnA11T1tτK=tlnT̂1lnT̂1=Ṫ0M0n=11nn!lnT̂0nMn
(16)
Where A1 is the integration constant at m = 1. For further interactions, we can perform a similar operation to obtain the general form for the solution at the m-th iteration:
lnAm1TmtτK=tlnT̂mlnT̂m=Ṫ0M0n=11nn!lnT̂m1nMn
(17)
When we approach infinite number of iterations m → ∞, the solution should be converged: Tmt=Tm1t. We can then obtain the solution of Tt at m-th iteration [i.e., T(m)t] can be derived:
lnA1TtτK=tlnT̂TtτK=AT̂t
(18)
Where A=Am matrix is the integration constant, and lnT̂=lnT̂m is a constant matrix satisfying the following self-consistent equation of T̂:
lnT̂=Ṫ0M0n=11nn!lnT̂nMn
(19)
In the above solutions [Eqs. (18) and (19)], Ṫ0, A, T̂ and Mn are all N × N matrices, where N corresponds to the number of states. Our self-consistent solutions are based on the initial guess [T0(t) in Eq. (13)]. In fact, this initial guess corresponds to the physical systems containing memory kernels as delta functions as discussed in our previous work.30 Under this condition, the memory kernel decays infinitely fast in comparison to the time scale of interest, representing the limit of extreme timescale separation. In this section, we use the self-consistent approach to derive an analytical solution for the GME at tτK, and we obtain the converged solution at an infinite number of interactions [T(m)t]. We note that Eqs. (18) and (19) can also be obtained via an alternative approach by solving the characteristic equations of the delayed differential equations (see  Appendix A for details).

Matrices Ṫ0, A, T̂ and Mn in Eq. (19) can be obtained by numerically fitting them to MD simulations. Specifically, Eq. (18) enables us to determine A and T̂, and Eq. (19) would allow us to obtain the time integrals of memory kernels at various orders: Mn. Interestingly, with A and T̂, we can already obtain the long-timescale dynamics [T(tτK)] using Eq. (18) without explicitly knowing the integrals of memory kernels, Mn. Indeed, in our numerical implementations of the IGME model (Sec. II C), we fit Eq. (18) to MD simulation data to obtain two constant matrices A and T̂, and subsequently propagate the dynamics using Eq. (18). Equation (18) also indicates that a Markovian model (i.e., TnτT=TτTn) can only be reached when the lag time is long enough so that ln A (containing the contributions from the fast dynamics) becomes negligible, i.e., lnAτTlnT̂. Therefore, τK in IGME is usually shorter than the Markovian lag time τT in an MSM. We also notice that when t in Eq. (18), limtlnTt=tlnT̂, and then T̂=limtTt1/t. As T̂ describes the dynamics at the infinite long lag time, it provides an efficient way to estimate the timescales of the slowest dynamic modes.

Numerically solving Mn using Eq. (19) is not a trivial task. In particular, obtaining higher-order memory kernels is only possible with the knowledge of the mathematical forms of the memory kernels. However, we show that it is possible to estimate the time integral of the memory kernels at the zeroth order, M0=0τKKsds. We notice that the last two terms in the right-hand side of Eq. (19) is originated from the Taylor expansion of the convolution term in the GME [Eq. (11)]. Specifically, the zeroth order term is M0, while the higher-order terms are: lnT̂M1+12lnT̂2M2+. As discussed in the previous paragraph, the constant matrix T̂ is related to the MSM at the infinite long lag time [lnT̂=limtlnTt/t]. The implied timescales (ITS) of this MSM [Tt] are defined as ITSi=limtt/lnλit, and the eigenvalues of T̂ thus satisfy lnλ̂i=1/ITSi. For biomolecular systems, the implied timescales correspond to the slowest dynamics of the system, and the values of −1/ITSi are typically close to zero. As a result, lnλ̂i approaches zero and lnT̂ is close to the zero matrix. Therefore, the zero-order term (M0) would represent a good approximation of this Taylor expansion. If we only keep the zero-order term, we could have:
M0Ṫ0lnT̂
(20)

For all three biomolecular systems tested in this work, we show that this approximation provides an efficient way to estimate the integrals of the memory kernels (M0) in reasonable accuracy, which may benefit future studies that depend on this quantity (see Sec. II D and Fig. 6).

Using alanine dipeptide as the example, we demonstrate how Eq. (18) works for biomolecular dynamics (see Fig. 1). We generate a four-state model for the conformational dynamics of alanine dipeptide, and the four states are shown in green, blue, red, and magenta regions of the Ramachandran plot in Fig. 1(a). At tτK (in Fig. 1, τK = 0.5 ps), a linear relationship between lnTt and t following Eq. (18) could be seen as the red lines [Fig. 1(b)]. The intercepts between the red lines and the vertical axis (i.e., ln A) denote the fast dynamics that fully relax at tτK.

FIG. 1.

An example of IGME: the four-state model of alanine dipeptide. (a) The Ramachandran plot of the four-state model of alanine dipeptide. (b) The IGME model of four representative elements of lnTt, which becomes linear to t when t ≥ 1 ps.

FIG. 1.

An example of IGME: the four-state model of alanine dipeptide. (a) The Ramachandran plot of the four-state model of alanine dipeptide. (b) The IGME model of four representative elements of lnTt, which becomes linear to t when t ≥ 1 ps.

Close modal

An IGME model follows Eq. (18) and can be constructed by finding the two constant matrices, A and T̂. Once these two constant matrices are obtained, the time dependent dynamics of the system can be predicted by computing Tt from Eq. (18). In particular, the IGME model is constructed from TPMs at multiple lag times: TτK,T(τK+Δt),T(τK+2Δt),TτK+L. Here, two hyper parameters are involved for building an IGME: τK and L. We need to choose τK to be sufficient long so that the memory kernels decay to zero at t > τK and integrals of memory kernels become constant matrices. The L of IGME is another hyper-parameter that requires an additional approach to optimize. For well sampled data, any values of L will lead to identical IGMEs. However, when the sampling is insufficient, the TPMs counted from MD trajectories may contain numerical fluctuations, and this can significantly affect the IGME. To address this situation, we use a range of TPM data (spanning the lag time from τK to τK + L) to reduce fluctuations of the input TPMs. Below, we have outlined our procedure for obtaining A and T̂ for a given set of hyperparameters: τK and L. In Sec. II D, we will introduce our protocol to select the optimal values for τK and L.

  1. Compute TPMs from MD simulation trajectories at lag times within τK,τK+L: TMDτK, TMDτK+Δt, …,TMDτK+L.

  2. Obtain the initial values of A and T̂ using the least square fitting (LSF) method. First, compute the logarithms of TPMs from the previous step (i.e., lnTMDτK,…, lnTMDτK+L). Then, perform the least square fitting for each element of the logarithm of TPMs: lnTtij=αij+βijt to obtain the initial estimates: Ainit=expα and T̂init=expβ.

  3. Optimize A and T̂ using the steepest descent method. In each optimization step, we evaluate the loss function [Eq. (22)] and employ the steep descent to optimize all elements in the matrices A and T̂. We then output the optimized Aopt and T̂opt, along with the rms error between the prediction of IGMEAopt,T̂opt and MD data.

(In Step 2), we employ the LSF method to derive the initial estimates for A and T̂. While the LSF method offers a straightforward and efficient approach for fitting A and T̂, it experiences several numerical issues. LSF utilizes the linear relationship of Eq. (18), which consists of N2 linear equations (N is the number of states) that can be fitted separately. To compute the logarithm of lnTt, we have implemented a spectrum-based approach, which involves combing the original eigenvectors with the logarithms of eigenvalues as below:
lnX=n1IXnnn1VIΛnnV1=VlnΛV1lnX=VlnΛV1
(21)
Where X is an arbitrary N × N matrix, Λ is a diagonal matrix of eigenvalues of X, I is the identity matrix, and V is a matrix of right eigenvectors of X. lnΛij=δijlnΛii. This LSF implementation may suffer from several numerical issues. Firstly, the eigenvalues of Tt are required to be positive in Eq. (18), as negative eigenvalues will induce complex numbers in lnTt. As a result, LSF is not applicable when Tt contains negative eigenvalues. Secondly, the TPMs directly obtained by LSF, TLSFt, may not satisfy the normalization requirement of the TPM [i.e., jTijt=1]. Thirdly, the TPMs obtained by LSF is not greatened to satisfy the detailed balance (i.e., πiTij = πjTji).
(In Step 3), we utilize the steepest descent method to further optimize the values of A and T̂ obtained from the LSF method. In our steepest descent optimization, we also introduce two necessary constraints to avoid the negative eigenvalues. First, we employed the exponential form in Eq. (18) to avoid the negative eigenvalue issue. Then, we enforced jTijt=1 and detailed balance to the IGME via a steepest descent method with the following loss function:
Lloss=t=τKτGTMDtAT̂tF2+αijTIGMEtij1+βπTIGMEtTIGMEtTπTF2
(22)
Where the subscript F represent the Frobenius norm. The second term of Eq. (22) corresponds to the jTijt=1 requirement of TPM, and the third term of Eq. (22) corresponds to the requirement of detailed balance πTt=TtTπT.
To obtain optimal τK and L, we adopted a systematic search approach by scanning all possible values of τK and L to find the best IGME models. A good IGME should be able to well reproduce the TPMs, Tt. Therefore, we can use the Root Mean Square Error (RMSE) between the predicted TPMs [TIGMEt=elnA+tlnT̂] and the original TPMs from MD data [Tt] to evaluate the quality of the IGME:30 
RMSE=0Lmaxi,j=1NπiTijMDtπiTijIGMEt2dtN20Ltrajdt0.5
(23)
Where Lmax is the maximum length of lag time. In Eq. (23), TIGME is built from TPMs between τK and τK + L (0 < τK < τ + KLLmax), while the RMSE is computed from t = 0 to t = Lmax.

1. Alanine dipeptide

AMBER99SB force field52 was used for alanine dipeptide, and the TIP3P model53 was used for water. After energy minimization, the box size was equilibrated with a 20 ns NPT simulation. Next, a 20 ns NVT trajectory was performed to generate 100 initial conformations. Finally, 100 MD trajectories started with the 100 initial conformations were performed, each containing 10 ns trajectories that saved with 0.01 ps interval between frames. The simulation was performed with GROMACS 4.5.3.54 Other details of the MD setup can be found in our previous paper.30 

The four-state model was generated from a splitting-and-lumping approach. In this approach, we first split all MD conformations into 1000 microstates by K-Centers clustering,55–57 then we lumped the microstates into four microstates with the PCCA+ (Perron Cluster-Cluster Analysis).58 The error bars of the residence probability of MD data were obtained from a 100-round bootstrapping. In each bootstrapping we randomly chose 100 trajectories to compute TPMs.30 When computing the TPMs, the detailed balance was enforced by directly symmetrizing the transition count matrix.

2. WW domain

The simulation dataset of WW domain was provided by Shaw et al. research.59 The dataset contains two 100 µs trajectories saved in 200 ps interval between frames. We first performed tICA (Time-lagged Independent Components)60–62 using a tICA lagtime of 18 ns based on the pairwise distances of the 35 alpha carbon atoms of the protein backbone to obtain three collective variables (i.e., three tICs) that can describe the slowest dynamics. Next, we adopted a split-and-lumping approach to obtain the four-state model. We first split all conformations into 800 microstates with the K-Centers algorithm,55–57 then lumped them into four metastable states with the PCCA+ algorithm.58 The error bars of the residence probability of MD data were obtained from a ten-round bootstrapping. In the bootstrapping, we first divided each trajectory into five segments, then in each bootstrapping we randomly chose ten segments of the trajectories to compute TPMs.30 When computing the TPMs, the detailed balance was enforced by directly symmetrizing the transition count matrix.

3. Taq RNAP

The MD simulation dataset of this system was taken from our previous study.34 This dataset consists of 306 MD simulation trajectories, and each has 200 ns with the saving interval of 100 ps. The TICA analysis was performed to reduce the number of features from 1,770 pairwise distances (involving alpha Carbon atoms of the clamp, β-lobe, β-protrusion, Switch regions, and active site) to three TICs containing the slowest dynamics of the system. Then the four-state model34 was built with a splitting-and-lumping approach, where a 100-state microstate model was built from the K-Centers clustering in the TICA space, and the four-state model was obtained with PCCA+ algorithm. The error bars of the residence probability of MD data were obtained from a 200-round bootstrapping. In each bootstrapping we randomly chose 100 trajectories to compute TPMs.34 When computing the TPMs, the detailed balance was enforced by the maximum likelihood estimator63 method.

In this section, we demonstrate the performance of the IGME in three systems: dynamics of alanine dipeptide, folding of FIP35 WW domain, and the gate opening of a bacterial RNAP.

We first applied the MSM, qMSM and IGME to a four-state model of the conformational dynamics of alanine dipeptide in explicit solvent. The four states are shown in green, blue, red, and magenta regions of the Ramachandran plot in Fig. 2(a). The MD simulation dataset of the alanine dipeptide is a well-sampled one; thus, the numerical fluctuation for the memory kernels is small, and both the IGME and qMSM yield good models that can perfectly match with MD simulations [Fig. 2(b)]. In particular, the Root Mean Square Error (RMSE) of the qMSM could reach very small values below 1.4 × 10−3 when τK ≥ 1.5 ps, while the IGME yields even better models [RMSE is below 3.4 × 10−4 when τK ≥ 1.5 ps, see Fig. 2(b)]. The Chapman–Kolmogorov test in Fig. 2(c) also shows that the qMSM and IGME built at τK = 1.5 ps perfectly match with the MD simulations. In contrast, the accuracy of the MSM is significantly lower in τ = 0 ∼ 8 ps, as the RMSEs of the MSM are much larger than those of the qMSM or IGME.

FIG. 2.

The IGME model of alanine dipeptide. (a) The Ramachandran plot of the four-state model of alanine dipeptide. (b) The RMSE heatmap of the IGME, qMSM and MSM. For the MSM, the lag time τ is the Markovian lag time; while for the qMSM and IGME, τ = τK. The triangular panel is the result of the IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman-Kolmogorov test of the MSM (green), qMSM (blue) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 1.5 ps. (d) The bar graph of ITS bounds predicted by the qMSM (blue) and IGME (red).

FIG. 2.

The IGME model of alanine dipeptide. (a) The Ramachandran plot of the four-state model of alanine dipeptide. (b) The RMSE heatmap of the IGME, qMSM and MSM. For the MSM, the lag time τ is the Markovian lag time; while for the qMSM and IGME, τ = τK. The triangular panel is the result of the IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman-Kolmogorov test of the MSM (green), qMSM (blue) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 1.5 ps. (d) The bar graph of ITS bounds predicted by the qMSM (blue) and IGME (red).

Close modal

As shown in Fig. 2(b), the extensive scanning of the hyperparameters of τK and L can provide highly accurate results and help build IGME models with small RMSEs. In Fig. 2(b), the purple region marks the top 5% IGME models with the smallest RMSEs (<8.1×105). These top 5% IGME models can give the most accurate predictions of dynamics. For example, in Fig. 2(d), the top 5% IGME models can correctly predict the variational bound of the slowest implied timescales (compared to the qMSM), with even smaller numerical uncertainties.

To test the performance of the IGME in a more complex system, we examined the conformational dynamics of FIP35 WW domain. Figure 3(a) shows a four-state model of the folding dynamics of FIP35 WW domain. In our previous work, we have shown that a qMSM constructed at τK = 15 ns was able to accurately predict long-timescale dynamics, while the MSM could only be built at the Markovian lag time of τ = 200 ns for this four-state model of WW domain.30 This can also be seen in Fig. 3(c), where the qMSM would work at τK = 15 ns but the MSM has significant deviations at τ = 15 ns. Figure 3(b) shows that the MSM is generally less accurate than the qMSM. In Fig. 3(b), we can see the RMSEs of the MSM are much larger than those of the qMSM or IGME.

FIG. 3.

The IGME model of FIP35 WW domain. (a) The four-state model of FIP35 WW domain. (b) The RMSE heatmap of the IGME, qMSM and MSM. The triangular panel is the result of IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman–Kolmogorov test of the MSM (green), qMSM (blue) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 15 ns.

FIG. 3.

The IGME model of FIP35 WW domain. (a) The four-state model of FIP35 WW domain. (b) The RMSE heatmap of the IGME, qMSM and MSM. The triangular panel is the result of IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman–Kolmogorov test of the MSM (green), qMSM (blue) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 15 ns.

Close modal

The simulation data of WW domain contains larger fluctuations than the alanine dipeptide dataset. In Fig. 2(c), the uncertainty of the residence probability of alanine dipeptide is around 4%, while in Fig. 3(c), the uncertainty could be 40%. Consequently, the qMSM of FIP35 WW domain is less stable than alanine dipeptide. In the short-time dynamics (shorter than length of trajectories) shown in Figs. 2(b) and 3(b), the RMSE of FIP35 WW domain already contains larger fluctuation than alanine dipeptide at short time scales (shorter than length of trajectories). While in the long-time dynamics as shown in Figs. 2(d) and 4(a), the uncertainty of qMSM predicted dynamics for WW domain contains large uncertainties (∼17% at τK ∼ 70 ns), which is significantly larger than alanine dipeptide (<1.7%).

FIG. 4.

The IGME outperforms in predicting slowest implied timescales and MFPT of FIP35 WW domain. (a) The bar graph of ITS bounds predicted by the qMSM (blue) and IGME (red). (b) The conformation of folded and unfolded states. (d) The bar graph of unfolding time, calculated with the qMSM (blue) and IGME (red). (c) The distributions of predicted unfolding time, calculated with the qMSM (blue) and IGME (red).

FIG. 4.

The IGME outperforms in predicting slowest implied timescales and MFPT of FIP35 WW domain. (a) The bar graph of ITS bounds predicted by the qMSM (blue) and IGME (red). (b) The conformation of folded and unfolded states. (d) The bar graph of unfolding time, calculated with the qMSM (blue) and IGME (red). (c) The distributions of predicted unfolding time, calculated with the qMSM (blue) and IGME (red).

Close modal

The IGME can provide models with much greater numerical stability than the qMSM for the dynamics of WW domain. In Fig. 3(b), the scanning of τK and L shows smooth distributions of RMSE in the τK,L space, significantly less fluctuating than the qMSM. In certain regions of τK,L, the IGME can reach even much smaller errors than qMSM. The numerical stability of IGME also greatly facilitates the prediction of long-time dynamics. In Figs. 4(a) and 4(c), the top 5% IGME models [selected from the purple region of Fig. 3(b)] can generate much more accurate results for the variational bound of the slowest implied timescales (<2%) as well as the MFPT of unfolding process (<2%), when compared with the qMSM (17% at τK ∼ 70 ns). Therefore, the IGME is better than the qMSM when computing the long-time dynamics of a simulation system which contains less sufficient sampling.

Finally, we demonstrate the power of the IGME in a large biomolecular system, the gate opening of Taq RNAP, which is shown in Fig. 5(a). Here we adopt the same four-state models as our previous paper.34 For this system, the qMSM at certain values of τK fail to converge, while the converged qMSMs contain large fluctuations. Therefore, we have introduced a smoothing technique,34,35 the qMSMsm, to reduce the fluctuation of the TPMs used to build qMSMs. This smoothing technique could greatly improve the convergence of the qMSM from 51% (the original qMSM at τK0,90 ns) to 92% (qMSMsm). For the converged data as shown in Fig. 5(b), the smoothing technique could further reduce the error of predicted TPMs from 1.6‰ (averaged RMSE of the qMSM within τK30,90 ns) to 0.7‰ (qMSMsm).

FIG. 5.

The IGME model of the gate opening dynamics of Taq RNAP. (a) The cartoon of the bacterial holoenzyme and its domains (yellow), clamp (magenta), β-lobe (yellow) and Switch two region (orange). (b) The RMSE heatmap of the IGME, qMSM, qMSMsm and MSM. The triangular panel is the result of the IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman–Kolmogorov test of the MSM (green), qMSMsm (cyan) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 30 ns. (d) The bar graph of ITS bounds predicted by the qMSMsm (cyan) and IGME (red).

FIG. 5.

The IGME model of the gate opening dynamics of Taq RNAP. (a) The cartoon of the bacterial holoenzyme and its domains (yellow), clamp (magenta), β-lobe (yellow) and Switch two region (orange). (b) The RMSE heatmap of the IGME, qMSM, qMSMsm and MSM. The triangular panel is the result of the IGME at different τK and L. The purple color shows the regions of the top 5% most accurate IGME models. (c) The Chapman–Kolmogorov test of the MSM (green), qMSMsm (cyan) and IGME (red) compared against MD simulations (grey). In this panel, τMSM = τK = 30 ns. (d) The bar graph of ITS bounds predicted by the qMSMsm (cyan) and IGME (red).

Close modal

However, for complex systems like Taq RNAP, qMSMs with the smoothing scheme still contain large fluctuations that hinder accurate predictions of long-time dynamics. The uncertainty of the variational bound of slowest implied timescales of the qMSMsm varies between 23% at τK ∼ 40 ns and 108% at τK ∼ 10 ns, as shown in the blue bars of Fig. 5(d). In addition, another issue of the smoothing scheme can be seen in Fig. 6(c), where the smoothed TPMs will also reduce the short-time memory kernels and lead to divergence of memory kernels at long lag times.

FIG. 6.

IGME can correctly compute the integral of memory kernels. (a) The MIK of alanine dipeptide. (b) The MIK of FIP35 WW domain. (c) The MIK of gate opening dynamics of Taq RNAP. The blue and red curves are computed from the qMSM and IGME, respectively. All qMSMs are generated with original TPMs, and the smoothing scheme is not applied in this figure.

FIG. 6.

IGME can correctly compute the integral of memory kernels. (a) The MIK of alanine dipeptide. (b) The MIK of FIP35 WW domain. (c) The MIK of gate opening dynamics of Taq RNAP. The blue and red curves are computed from the qMSM and IGME, respectively. All qMSMs are generated with original TPMs, and the smoothing scheme is not applied in this figure.

Close modal

In sharp contrast, the IGME can generate significantly more accurate results with smaller numerical fluctuations. As seen from the accuracy of the IGME in Fig. 5(b), the best qMSMsm has RMSE = 0.4‰, which is larger than the RMSE of the green, cyan, blue, and magenta regions of IGME that containing 30% of all IGME models in the exhaustive scan. The purple region consisting of the top 5% IGME models could reach an even smaller RMSE (0.23‰–0.28‰) than the qMSM. The improvement of numerical stability can be seen Fig. 5(d): when τK ≥ 20 ns, the uncertainty of ITS bound calculated by IGME varies from 1% at τK ∼ 30 ns to 7% at τK ∼ 50 ns.

As discussed in Sec II B, we can also evaluate the integral of the memory kernels using the zero-order term of the Eq. (19) (M0Ṫ0lnT̂). M0 represents the integral matrix of memory kernels, and the mean integral of memory kernels MIKτK can be simply computed from M0 via MIKτK=M0F/N. M0F is the Frobenius norm of M0, and N is the number of states. We show that the MIKs computed from the qMSM and IGME are in reasonable agreement for Alanine Dipeptide [Fig. 6(a)], WW-domain [Fig. 6(b)], and Taq RNAP [Fig. 6(c)]. When the MIK reaches a plateau, meaning the memory kernels have completely relaxed and the MIK will not change when extending the lag time, the MIK computed from the IGME could perfectly match with the qMSM. The MIK computed from the IGME is also much smoother than from the qMSM. Even at long lag times, fluctuations always exist even for the alanine dipeptide system. While in the IGME, the MIK is numerically more stable, as the numerical fluctuations are much smaller than in the qMSM.

In the IGME, we have employed the Taylor series to expand the convolution term of the GME. Previously, Laplacian transform was used in many papers33,37,43 to compute memory kernels from the time dependent transition matrixes [e.g., Tt]. For a dynamical system, the Laplacian transform at frequency s → 0 provides the Markovian approximation to the GME, while the Laplacian transform at s → ∞ describes the short-time dynamics. In systems with timescale separations, there should exist a characteristic frequency sK that corresponds to τK in our IGME theory, where the memory kernel decays to zero at tτK or ssK. By employing the Laplacian transform, one could potentially develop an analogous method to IGME in the future by considering all contributions of frequencies within ssK.

Constructing IGMEs requires longer MD trajectories compared to GME (or qMSMs). This is because the validity of IGME’s solution relies on the time integrals of memory kernels becoming constant matrices (tτK) and obtaining a numerically stable estimation of memory kernel integrals generally necessitates slightly longer simulations than τKqMSM (required for qMSM). Let’s consider the WW-domain as an example. Our previous work30 has shown that qMSMs outperform MSMs by predicting correct long-timescale dynamics of the WW-domain folding, while being built from significantly shorter MD simulations: τKqMSM=25ns v.s. τTMSM=200ns. For this WW-domain system, our IGME requires 33 ns (τK = 31 ns and L = 2 ns) vs τKqMSM=25ns to build models with comparable RMSEs of ∼1.5 × 10−3 [see Fig. 3(b)]. For the alanine dipeptide, the IGME requires 1.6 ps (τK = 1.3 ps and L = 0.3 ps) vs τKqMSM=1.5ps vs τTMSM=12.6ps to build models with comparable errors (RMSE ∼ 1.4 × 10−3). In all three systems, the sum of τK + L in our IGME is still significantly shorter than the Markovian lag times: τTMSM.

Recently, the time convolutionless GME theory U-GME64 has been developed and it is shown that the rate matrix [Rt] becomes time invariant when the lag time is longer than a characteristic time. This is also consistent with our IGME theory. In particular, the solutions of IGME (currently as a function of TPMs) could be re-written as a function of the rate matrix and we show that this rate matrix becomes constant when the memory kernel fully decays (tτK, see  Appendix B for details). IGME and UGME also exhibit comparable performance in predicting the slowest implied timescales for the WW-domain and RNAP systems.31 Different from GME-based methods, HMM47 allows fuzzy state boundaries rather than considering memory kernels. We have demonstrated that GME-based methods (e.g., IGME) are numerically more robust than the existing implementation of HMM, because IGME is fitted to several TPMs at lag times longer than τK while HMM maximizes the likelihood function defined by entire input trajectories.31 

In this paper, we present a new method, the Integrative GME (IGME). This method adopts a Taylor expansion of memory term in the GME and utilizes the time integrals of memory kernels to reduce the complexity of the GME. Under the condition (tτK) where the memory kernels have decayed to zero (KtτK0), the time integrals of memory kernels of different moments become constant matrices, allowing for the analytical solution of the GME [Eq. (18)] through a self-consistent approach. This analytical solution can be fitted with the MD simulation data, providing accurate and numerically stable predictions of the long-time dynamics. The two hyperparameters, τK and L, are obtained with an exhaustive scan approach. Choosing the best IGME models allows for more accurate and numerically more stable predictions compared to qMSMs. Beside the analytical solution of Tt [Eq. (18)], the self-consistent equation of T̂, Eq. (19), can provide an alternative way of computing integrals of memory kernels. The memory kernel computed from Eq. (19) is much more stable than in the qMSM. To demonstrate the power of IGME, we have applied it to study the dynamics of the alanine dipeptide, the folding of FIP35 WW domain, and the gate opening of Taq RNAP. For all three systems, we show that the IGME achieves significantly smaller fluctuations for both memory kernels and long-time dynamics compared to the qMSM. Biomolecular dynamics often exhibit timescale separations, where the memory kernels decay to zero at significantly shorter timescales than the slowest dynamics of the system. Therefore, we expect that the IGME holds great promise to be widely applied to study complex conformational changes of biomolecules.

We would like to thank Dr. Jianfeng Lu for helpful discussions. We would also like to express our gratitude to an anonymous reviewer for bringing to our attention an alternative approach to solving the GME. We acknowledge the support from the NIH/NIGMS under Award No. 1R01GM147652-01A1, the support from the Office of the Vice-Chancellor for Research and Graduate Education at the University of Wisconsin-Madison with funding from the Wisconsin Alumni Research Foundation, and the support from the Hirschfelder Professorship Fund.

The authors have no conflicts to disclose.

Siqin Cao: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Methodology (lead); Software (equal); Validation (lead); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). Yunrui Qiu: Methodology (supporting); Software (equal); Writing – original draft (supporting); Writing – review & editing (supporting). Michael L. Kalin: Writing – original draft (equal); Writing – review & editing (equal). Xuhui Huang: Conceptualization (supporting); Funding acquisition (lead); Investigation (supporting); Methodology (supporting); Resources (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

The source code of IGME is available at https://github.com/xuhuihuang/IGME. Other simulation datasets are available from the corresponding authors upon request.

In this appendix, we provide an alternative approach to solve the GME [Eq. (4)] under the condition [tτK and KtτK=0] via the characteristic equation. We first consider the condition of t → ∞, and adopt a nontrivial solution of the homogeneous Delayed Differential Equation: TtAeBt, where A and B are constant matrices. The characteristic equation of Eq. (4) is:
B=Ṫ00min,τKeBsKsds=Ṫ00τKeBsKsds
(A1)
We then perform a Taylor expansion on the exponential term eBs. In addition, when tτK, it is given that the integrals of memory kernels become constants: Mnt>τK=MnτKMn.
B=Ṫ0M0n=11nn!Bn0τKKssnds=Ṫ0M0n=11nn!BnMn
(A2)
The general solution of Eq. (A2) is as below and contains multiple exponential terms:
Tt=A1eB1t+A2eB2t+
(A3)
For biomolecules, when the lag time t → ∞, there should be a distinctive model [i.e., T(t)consisted of a set of unique dynamic modes] to describe their conformational dynamics. In fact, when the lag time t > τTτK (τT is the Markovian lag time),30 such a unique MSM already emerges to describe the system’s conformational dynamics. Therefore, the solution of the Eq. (A2) contains a single term:
Tt=AeBt
(A4)
If we define B=lnT̂, where T̂ is also a constant matrix. Equation (A1) becomes Eq. (17) at t → ∞: Tt=AT̂t. We next solve the GME at tτK. Similar to the solution at t → ∞, we can also substitute the non-trivial solution TtAeBt into Eq. (4) and obtain the corresponding characteristic equation at tτK,
B=Ṫ0M0n=11nn!BnMn
(A5)
The general solution of the above characteristic equation [Eq. (A5)] could contain multiple terms at finite time (tτK):
TtτK=A1eB1t+A2eB2t+
(A6)
However, we notice that the solution at tτK [Eq. (A6)] should be consistent with the solution at t → ∞ [Eq. (A4)]. Therefore, Eq. (A6) contains only one exponential term ALeBLt. If we also define BL=lnT̂, we can obtain the same solution as Eqs. (18) and (19) in the main text:
TtτK=AT̂tlnT̂=Ṫ0M0n=11nn!lnT̂nMn
(A7)
In parallel to the GME derived from the Nakajima–Zwanzig equation, an empirical form of the GME:
ddtTt=TtRt
(B1)
Where Rt is also called the transition rate matrix. For a Markovian model, Rt is invariant of time, thus TMarkovt=AeRt. But in general, Rt for a non-Markovian process should be written as:
Rt=ddtlnTt
(B2)
With Eq. (9), the rate matrix can be rewritten as:
Rt=lnT̂+M0M0t+n=11nn!×T1tTntMnMntn=11nn!T1tTntlnT̂n
(B3)
After the memory kernel fully decays (tτK), all the memory kernel terms become invariant of the lag time: MntMnτK=Mn. Therefore, the second and third term in the right-hand side of the above equation vanish when tτK. In addition, according to Eqs. (B3) and (18) at tτK, T1tTnt=lnT̂n, the fourth term also vanishes. As a result, the rate matrix Rt becomes a constant matrix at tτK and is related toT̂:
RtτK=lnT̂
(B4)
1.
G. R.
Bowman
,
V. S.
Pande
, and
F.
Noe
,
An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation
(
Springer
,
Netherlands
,
2014
).
2.
J. H.
Prinz
,
H.
Wu
,
M.
Sarich
,
B.
Keller
,
M.
Senne
,
M.
Held
,
J. D.
Chodera
,
C.
Schutte
, and
F.
Noe
, “
Markov models of molecular kinetics: Generation and validation
,”
J. Chem. Phys.
134
(
17
),
174105
(
2011
).
3.
J. D.
Chodera
,
N.
Singhal
,
V. S.
Pande
,
K. A.
Dill
, and
W. C.
Swope
, “
Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics
,”
J. Chem. Phys.
126
(
15
),
155101
(
2007
).
4.
A. C.
Pan
and
B.
Roux
, “
Building Markov state models along pathways to determine free energies and rates of transitions
,”
J. Chem. Phys.
129
(
6
),
064107
(
2008
).
5.
B. W.
Zhang
,
W.
Dai
,
E.
Gallicchio
,
P.
He
,
J. C.
Xia
,
Z. Q.
Tan
, and
R. M.
Levy
, “
Simulating replica exchange: Markov state models, proposal schemes, and the infinite swapping limit
,”
J. Phys. Chem. B
120
(
33
),
8289
8301
(
2016
).
6.
F.
Morcos
,
S.
Chatterjee
,
C. L.
McClendon
,
P. R.
Brenner
,
R.
Lopez-Rendon
,
J.
Zintsmaster
,
M.
Ercsey-Ravasz
,
C. R.
Sweet
,
M. P.
Jacobson
,
J. W.
Peng
, and
J. A.
Izaguirre
, “
Modeling conformational ensembles of slow functional motions in Pin1-WW
,”
PLoS Comput. Biol.
6
(
12
),
e1001015
(
2010
).
7.
X. H.
Huang
,
G. R.
Bowman
,
S.
Bacallado
, and
V. S.
Pande
, “
Rapid equilibrium sampling initiated from nonequilibrium data
,”
Proc. Natl. Acad. Sci. U. S. A.
106
(
47
),
19765
19769
(
2009
).
8.
R. D.
Malmstrom
,
C. T.
Lee
,
A. T.
Van Wart
, and
R. E.
Amaro
, “
Application of molecular-dynamics based Markov state models to functional proteins
,”
J. Chem. Theory Comput.
10
(
7
),
2648
2657
(
2014
).
9.
N. V.
Buchete
and
G.
Hummer
, “
Coarse master equations for peptide folding dynamics
,”
J. Phys. Chem. B
112
(
19
),
6057
6069
(
2008
).
10.
F.
Noe
,
C.
Schutte
,
E.
Vanden-Eijnden
,
L.
Reich
, and
T. R.
Weikl
, “
Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations
,”
Proc. Natl. Acad. Sci. U. S. A.
106
(
45
),
19011
19016
(
2009
).
11.
G. R.
Bowman
,
V. A.
Voelz
, and
V. S.
Pande
, “
Taming the complexity of protein folding
,”
Curr. Opin. Struct. Biol.
21
(
1
),
4
11
(
2011
).
12.
I.
Buch
,
T.
Giorgino
, and
G.
De Fabritiis
, “
Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations
,”
Proc. Natl. Acad. Sci. U. S. A.
108
(
25
),
10184
10189
(
2011
).
13.
M.
Lawrenz
,
D.
Shukla
, and
V. S.
Pande
, “
Cloud computing approaches for prediction of ligand binding poses and pathways
,”
Sci. Rep.
5
,
7918
(
2015
).
14.
D. A.
Silva
,
G. R.
Bowman
,
A.
Sosa-Peinado
, and
X. H.
Huang
, “
A role for both conformational selection and induced fit in ligand binding by the LAO protein
,”
PLoS Comput. Biol.
7
(
5
),
e1002054
(
2011
).
15.
N.
Plattner
and
F.
Noe
, “
Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models
,”
Nat. Commun.
6
,
7653
(
2015
).
16.
L. T.
Da
,
F.
Pardo-Avila
,
L.
Xu
,
D. A.
Silva
,
L.
Zhang
,
X.
Gao
,
D.
Wang
, and
X. H.
Huang
, “
Bridge helix bending promotes RNA polymerase II backtracking through a critical and conserved threonine residue
,”
Nat. Commun.
7
,
11244
(
2016
).
17.
L. T.
Da
,
E.
Chao
,
B. G.
Duan
,
C. B.
Zhang
,
X.
Zhou
, and
J.
Yu
, “
A jump-from-cavity pyrophosphate ion release assisted by a key lysine residue in T7 RNA polymerase transcription elongation
,”
PLoS Comput. Biol.
11
(
11
),
e1004624
(
2015
).
18.
L. T.
Da
,
D.
Wang
, and
X. H.
Huang
, “
Dynamics of pyrophosphate ion release and its coupled trigger loop motion from closed to open state in RNA polymerase II
,”
J. Am. Chem. Soc.
134
(
4
),
2399
2406
(
2012
).
19.
D. A.
Silva
,
D. R.
Weiss
,
F.
Pardo Avila
,
L. T.
Da
,
M.
Levitt
,
D.
Wang
, and
X. H.
Huang
, “
Millisecond dynamics of RNA polymerase II translocation at atomic resolution
,”
Proc. Natl. Acad. Sci. U. S. A.
111
(
21
),
7665
7670
(
2014
).
20.
R. D.
Malmstrom
,
A. P.
Kornev
,
S. S.
Taylor
, and
R. E.
Amaro
, “
Allostery through the computational microscope: cAMP activation of a canonical signalling domain
,”
Nat. Commun.
6
,
7588
(
2015
).
21.
P. D.
Dixit
and
K. A.
Dill
, “
Building Markov state models using optimal transport theory
,”
J. Chem. Phys.
150
(
5
),
054105
(
2019
).
22.
D.
Nagel
,
A.
Weber
,
B.
Lickert
, and
G.
Stock
, “
Dynamical coring of Markov state models
,”
J. Chem. Phys.
150
(
9
),
094111
(
2019
).
23.
A.
Kells
,
Z. E.
Mihalka
,
A.
Annibale
, and
E.
Rosta
, “
Mean first passage times in variational coarse graining using Markov state models
,”
J. Chem. Phys.
150
(
13
),
134107
(
2019
).
24.
E.
Hruska
,
J. R.
Abella
,
F.
Nuske
,
L. E.
Kavraki
, and
C.
Clementi
, “
Quantitative comparison of adaptive sampling methods for protein dynamics
,”
J. Chem. Phys.
149
(
24
),
244119
(
2018
).
25.
G. R.
Bowman
,
X.
Huang
, and
V. S.
Pande
, “
Network models for molecular kinetics and their initial applications to human health
,”
Cell Res.
20
(
6
),
622
630
(
2010
).
26.
H.
Jiang
,
F. K.
Sheong
,
L.
Zhu
,
X.
Gao
,
J.
Bernauer
, and
X.
Huang
, “
Markov state models reveal a two-step mechanism of miRNA loading into the human argonaute protein: Selective binding followed by structural re-arrangement
,”
PLoS Comput. Biol.
11
(
7
),
e1004404
(
2015
).
27.
C. Y.
Son
,
A.
Yethiraj
, and
Q.
Cui
, “
Cavity hydration dynamics in cytochrome c oxidase and functional implications
,”
Proc. Natl. Acad. Sci. U. S. A.
114
(
42
),
E8830
E8836
(
2017
).
28.
Q.
Qiao
,
G. R.
Bowman
, and
X. H.
Huang
, “
Dynamics of an intrinsically disordered protein reveal metastable conformations that potentially seed aggregation
,”
J. Am. Chem. Soc.
135
(
43
),
16092
16101
(
2013
).
29.
W.
Wang
,
S. Q.
Cao
,
L. Z.
Zhu
, and
X. H.
Huang
, “
Constructing Markov state models to elucidate the functional conformational changes of complex biomolecules
,”
Wires Comput. Mol. Sci.
8
(
1
),
e1343
(
2018
).
30.
S. Q.
Cao
,
A.
Montoya-Castillo
,
W.
Wang
,
T. E.
Markland
, and
X. H.
Huang
, “
On the advantages of exploiting memory in Markov state models for biomolecular dynamics
,”
J. Chem. Phys.
153
(
1
),
014105
(
2020
).
31.
A. J. D.
Dominic
,
S. Q.
Cao
,
A.
Montoya-Castillo
, and
X. H.
Huang
, “
Memory unlocks the future of biomolecular dynamics: Transformative tools to uncover physical insights accurately and efficiently
,”
J. Am. Chem. Soc.
145
(
18
),
9916
9927
(
2023
).
32.
Q.
Shi
and
E.
Geva
, “
A new approach to calculating the memory kernel of the generalized quantum master equation for an arbitrary system-bath coupling
,”
J. Chem. Phys.
119
(
23
),
12063
12076
(
2003
).
33.
A.
Kelly
,
A.
Montoya-Castillo
,
L.
Wang
, and
T. E.
Markland
, “
Generalized quantum master equations in and out of equilibrium: When can one win?
,”
J. Chem. Phys.
144
(
18
),
184105
(
2016
).
34.
I. C.
Unarta
,
S. Q.
Cao
,
S.
Kubo
,
W.
Wang
,
P. P. H.
Cheung
,
X.
Gao
,
S.
Takada
, and
X. H.
Huang
, “
Role of bacterial RNA polymerase gate opening dynamics in DNA loading and antibiotics inhibition elucidated by quasi-Markov state model
,”
Proc. Natl. Acad. Sci. U. S. A.
118
(
17
),
e2024324118
(
2021
).
35.
L. Z.
Zhu
,
H. L.
Jiang
,
S. Q.
Cao
,
I. C.
Unarta
,
X.
Gao
, and
X. H.
Huang
, “
Critical role of backbone coordination in the mRNA recognition by RNA induced silencing complex
,”
Commun. Biol.
4
(
1
),
1345
(
2021
).
36.
S.
Presse
,
J.
Peterson
,
J.
Lee
,
P.
Elms
,
J. L.
MacCallum
,
S.
Marqusee
,
C.
Bustamante
, and
K.
Dill
, “
Single molecule conformational memory extraction: P5ab RNA hairpin
,”
J. Phys. Chem. B
118
(
24
),
6597
6603
(
2014
).
37.
S.
Presse
,
J.
Lee
, and
K. A.
Dill
, “
Extracting conformational memory from single-molecule kinetic data
,”
J. Phys. Chem. B
117
(
2
),
495
502
(
2013
).
38.
L.
Ferrari
, “
Test particles in a gas: Markovian and non-Markovian Langevin dynamics
,”
Chem. Phys.
523
,
42
51
(
2019
).
39.
G. R.
Haynes
,
G. A.
Voth
, and
E.
Pollak
, “
A theory for the activated barrier crossing rate constant in systems influenced by space and time dependent friction
,”
J. Chem. Phys.
101
(
9
),
7811
7822
(
1994
).
40.
Y. J.
Yan
and
R. X.
Xu
, “
Quantum mechanics of dissipative systems
,”
Annu. Rev. Phys. Chem.
56
,
187
219
(
2005
).
41.
M.
Berkowitz
,
J. D.
Morgan
, and
J. A.
Mccammon
, “
Generalized Langevin dynamics simulations with arbitrary time-dependent memory kernels
,”
J. Chem. Phys.
78
(
6
),
3256
3261
(
1983
).
42.
A. D.
Baczewski
and
S. D.
Bond
, “
Numerical integration of the extended variable generalized Langevin equation with a positive Prony representable memory kernel
,”
J. Chem. Phys.
139
(
4
),
044107
(
2013
).
43.
R.
Satija
and
D. E.
Makarov
, “
Generalized Langevin equation as a model for barrier crossing dynamics in biomolecular folding
,”
J. Phys. Chem. B
123
(
4
),
802
810
(
2019
).
44.
C.
Ayaz
,
L.
Tepper
,
F. N.
Brunig
,
J.
Kappler
,
J. O.
Daldrop
, and
R. R.
Netz
, “
Non-Markovian modeling of protein folding
,”
Proc. Natl. Acad. Sci. U. S. A.
118
(
31
),
e2023856118
(
2021
).
45.
R. F.
Grote
and
J. T.
Hynes
, “
The stable states picture of chemical reactions. II. Rate constants for condensed and gas phase reaction models
,”
J. Chem. Phys.
73
(
6
),
2715
2732
(
1980
).
46.
B. A.
Dalton
,
C.
Ayaz
,
H.
Kiefer
,
A.
Klimek
,
L.
Tepper
, and
R. R.
Netz
, “
Fast protein folding is governed by memory-dependent friction
,”
Proc. Natl. Acad. Sci. U. S. A.
120
(
31
),
e2220068120
(
2023
).
47.
F.
Noe
,
H.
Wu
,
J. H.
Prinz
, and
N.
Plattner
, “
Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules
,”
J. Chem. Phys.
139
(
18
),
184114
(
2013
).
48.
C.
Lorpaiboon
,
E. H.
Thiede
,
R. J.
Webber
,
J.
Weare
, and
A. R.
Dinner
, “
Integrated variational approach to conformational dynamics: A robust strategy for identifying eigenfunctions of dynamical operators
,”
J. Phys. Chem. B
124
(
42
),
9354
9364
(
2020
).
49.
G.
Hummer
and
A.
Szabo
, “
Optimal dimensionality reduction of multistate kinetic and Markov-state models
,”
J. Phys. Chem. B
119
(
29
),
9029
9037
(
2015
).
50.
S.
Nakajima
, “
On quantum theory of transport phenomena: Steady diffusion
,”
Prog. Theor. Phys.
20
(
6
),
948
959
(
1958
).
51.
R.
Zwanzig
, “
Ensemble method in the theory of irreversibility
,”
J. Chem. Phys.
33
(
5
),
1338
1341
(
1960
).
52.
V.
Hornak
,
R.
Abel
,
A.
Okur
,
B.
Strockbine
,
A.
Roitberg
, and
C.
Simmerling
, “
Comparison of multiple amber force fields and development of improved protein backbone parameters
,”
Proteins
65
(
3
),
712
725
(
2006
).
53.
W. L.
Jorgensen
,
J.
Chandrasekhar
,
J. D.
Madura
,
R. W.
Impey
, and
M. L.
Klein
, “
Comparison of simple potential functions for simulating liquid water
,”
J. Chem. Phys.
79
(
2
),
926
935
(
1983
).
54.
B.
Hess
,
C.
Kutzner
,
D.
van der Spoel
, and
E.
Lindahl
, “
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
,”
J. Chem. Theory Comput.
4
(
3
),
435
447
(
2008
).
55.
W.
Wang
,
T.
Liang
,
F. K.
Sheong
,
X. D.
Fan
, and
X. H.
Huang
, “
An efficient Bayesian kinetic lumping algorithm to identify metastable conformational states via Gibbs sampling
,”
J. Chem. Phys.
149
(
7
),
072337
(
2018
).
56.
D. S.
Hochbaum
and
D. B.
Shmoys
, “
A best possible heuristic for the k-center problem
,”
Math. Oper. Res.
10
(
2
),
180
184
(
1985
).
57.
Y. T.
Zhao
,
F. K.
Sheong
,
J.
Sun
,
P.
Sander
, and
X. H.
Huang
, “
A fast parallel clustering algorithm for molecular simulation trajectories
,”
J. Comput. Chem.
34
(
2
),
95
104
(
2013
).
58.
P.
Deuflhard
,
W.
Huisinga
,
A.
Fischer
, and
C.
Schutte
, “
Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains
,”
Linear Algebra Appl.
315
(
1-3
),
39
59
(
2000
).
59.
D. E.
Shaw
,
P.
Maragakis
,
K.
Lindorff-Larsen
,
S.
Piana
,
R. O.
Dror
,
M. P.
Eastwood
,
J. A.
Bank
,
J. M.
Jumper
,
J. K.
Salmon
,
Y. B.
Shan
, and
W.
Wriggers
, “
Atomic-level characterization of the structural dynamics of proteins
,”
Science
330
(
6002
),
341
346
(
2010
).
60.
Y.
Naritomi
and
S.
Fuchigami
, “
Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: The case of domain motions
,”
J. Chem. Phys.
134
(
6
),
02B617
(
2011
).
61.
C. R.
Schwantes
and
V. S.
Pande
, “
Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9
,”
J. Chem. Theory Comput.
9
(
4
),
2000
2009
(
2013
).
62.
G.
Perez-Hernandez
,
F.
Paul
,
T.
Giorgino
,
G.
De Fabritiis
, and
F.
Noe
, “
Identification of slow molecular order parameters for Markov model construction
,”
J. Chem. Phys.
139
(
1
),
015102
(
2013
).
63.
H. B.
Wan
and
V. A.
Voelz
, “
Adaptive Markov state model estimation using short reseeding trajectories
,”
J. Chem. Phys.
152
(
2
),
024103
(
2020
).
64.
A. J.
Dominic
III
,
T.
Sayer
,
S. Q.
Cao
,
T. E.
Markland
,
X. H.
Huang
, and
A.
Montoya-Castillo
, “
Building insightful, memory-enriched models to capture long-time biochemical processes from short-time simulations
,”
Proc. Natl. Acad. Sci. U. S. A.
120
,
e2221048120
(
2022
).