Biomolecular dynamics play an important role in numerous biological processes. Markov State Models (MSMs) provide a powerful approach to study these dynamic processes by predicting long time scale dynamics based on many short molecular dynamics (MD) simulations. In an MSM, protein dynamics are modeled as a kinetic process consisting of a series of Markovian transitions between different conformational states at discrete time intervals (called “lag time”). To achieve this, a master equation must be constructed with a sufficiently long lag time to allow interstate transitions to become truly Markovian. This imposes a major challenge for MSM studies of proteins since the lag time is bound by the length of relatively short MD simulations available to estimate the frequency of transitions. Here, we show how one can employ the generalized master equation formalism to obtain an exact description of protein conformational dynamics both at short and long time scales without the time resolution restrictions imposed by the MSM lag time. Using a simple kinetic model, alanine dipeptide, and WW domain, we demonstrate that it is possible to construct these quasi-Markov State Models (qMSMs) using MD simulations that are 5–10 times shorter than those required by MSMs. These qMSMs only contain a handful of metastable states and, thus, can greatly facilitate the interpretation of mechanisms associated with protein dynamics. A qMSM opens the door to the study of conformational changes of complex biomolecules where a Markovian model with a few states is often difficult to construct due to the limited length of available MD simulations.

## I. INTRODUCTION

Biological molecules often need to dynamically change their shapes or conformations in order to perform their functions. Conformational dynamics, thus, play an important role in many biological processes, such as protein misfolding and aggregation, protein–ligand recognition, and numerous other functional conformational changes. By modeling dynamics at the atomistic level, molecular dynamics (MD) simulations provide a powerful tool to study biomolecular dynamics that complement experimental techniques. However, while biologically relevant processes often occur on time scales of milliseconds or longer, individual MD simulations of large proteins are usually limited to microseconds or shorter time scales.

In recent years, Markov State Models (MSMs) have become a popular approach to bridge this time scale gap by modeling long time scale dynamics by combining the information contained in many short MD simulations.^{1–24} In MSMs, dynamic processes are modeled as a series of Markovian transitions among metastable conformational states (or free energy minima) at discrete time intervals, i.e., lag times, *τ*_{T}. A key feature of MSMs is that the lag time has to be long enough to allow transitions among states to become memoryless or Markovian, where the memory of these transitions is mainly determined by dynamic relaxation within each state. In practice, however, it is challenging to obtain a Markovian model for complex systems with only a few states, as the computationally feasible lag times are bound by the length of the MD simulations available to estimate the frequency of these transitions. To achieve a partitioning of the system with a truly Markovian evolution, one often needs to construct MSMs containing a large number of states so that each state is sufficiently well localized in a configurational space and has relatively fast internal relaxation dynamics that support affordable lag times. For example, to model the millisecond folding of the NTL9 peptide, an MSM containing 2000 states (with a lag time of 12 ns) is required.^{25} Furthermore, our work on a 37-residue intrinsically disordered peptide has shown that to allow for Markovian evolution, an MSM with as many as 10 000 states is needed.^{26} While MSMs containing thousands of states are useful to make quantitative predictions to be tested against experiments, the large number of states often prevents intuitive interpretations and obscures biological insights.^{27} In contrast, MSMs built using only a few states facilitate understanding and interpretation, but require very long individual simulations to reach the lag times that enable a Markovian description of the dynamics.^{27}

These limitations of MSMs evince the need for an alternative approach that allows for the accurate and efficient simulation of the dynamics of a few states, which allows for intuitive interpretation while circumventing the high computational cost associated with long MSM lag times. The generalized master equation (GME) framework provides a way to achieve this by exactly encoding the non-Markovian dynamics of a given choice of states. The GME approach, which is based on the projection operator formalism,^{28,29} has been widely applied to compute various properties of liquids,^{30–34} polymers,^{35,36} and glass formation^{37–39} and can offer a particularly efficient representation of the dynamics when the system of interest contains a separation of time scales, enabling one to derive equations of motion for the slow degrees of freedom while exactly encapsulating the influence of all fast degrees of freedom in a generally time-dependent memory kernel.^{40,41} While directly calculating the memory kernel has traditionally posed a difficult problem due to the presence of *projected dynamics*, recent advances have shown how to calculate the memory kernel directly from the *unprojected dynamics* of the system,^{42–48} allowing for the development of accurate and efficient methods to calculate the classical^{49,50} and quantum^{46,51–54} dynamics of complex systems, suggesting that similar benefits might be obtained in the context of the classical dynamics of proteins and other biomolecules. In particular, it is known that certain protein motions are slow, including transitions among metastable conformational states enabled by, for example, protein backbone folding and domain closure, while transitions within the same metastable state are fast, such as those facilitated by side-chain rotations. In such cases, one can expect a GME-based description of slow protein motions to provide an efficient means to study its long time scale dynamics.

Here, we show how one can use GMEs to construct a quasi-Markov State Model (qMSM) that encodes the non-Markovian dynamics of a small number of configurational states for several model systems, including a simple kinetic model, alanine dipeptide, and the Fip35 WW domain, and contrast the properties of the resulting dynamics to those obtained via MSMs. In particular, we demonstrate that to construct qMSMs, one needs MD simulations that are 5–10 times shorter than those required by the analogous MSMs to describe the dynamics of these model systems, suggesting that the GME framework can provide an efficient approach to interrogate the long-time dynamics associated with protein folding.

## II. THEORY

Employing projection operators, one can derive a GME and an MSM that describe the non-Markovian and Markovian dynamics, respectively, of a subset of degrees of freedom in a system of an originally much higher dimension. While the GME is in principle exact, MSMs, especially when the lag time is not sufficiently long, generally represent an approximation to the dynamics of the chosen states. In this section, we first derive the GME for projected dynamics using the projection operator formalism in Sec. II A and then discuss how to numerically obtain the memory kernels for protein dynamics in Sec. II B. We then present the algorithm of our qMSM approach in Sec. II C.

### A. Generalized master equations and Markov state models

The dynamics of molecular systems follow the Liouville equation, which is defined in a high-dimensional phase space. However, for the conformational dynamics of proteins, the dynamic modes of interest often lie in a low-dimensional manifold, i.e., a few collective variables are often sufficient to describe the slow dynamics of protein conformational changes. To study the dynamics of these modes, one can construct an equation of motion for them that exists in a much lower dimensional space than the phase space describing all molecular motions. The projection operator technique^{55,56} provides a formally exact and effective framework to study a system’s dynamics by projecting it onto a low-dimensional space.

The configurational projection operator is defined as^{27,56,57}

where $\rho eqx,p$ is the equilibrium probability distribution and *π*_{j} is the equilibrium population associated with state *j* of the projected space [see Eq. (3) below]. **x** and **p** are the vectors containing the positions and momenta of all atoms in the simulation, respectively,

correspond to the indicator functions that label the different configurational states supported by the protein, and are orthogonal such that *χ*_{j}(*x*)*χ*_{k}(*x*) = *δ*_{j,k}*χ*_{j}(*x*). The bracket operation is defined as an ensemble average over the equilibrium distribution for the entire system,

One can derive the equation of motion for the probability of finding the protein in the *k*th configurational state at time *t* given that it was initially in the *j*th configurational state by starting from the equation of motion for the propagator,^{45}

where $L$ is the classical Liouville operator. The above equation of motion for the propagator can be used to obtain both the Nakajima–Zwanzig^{58,59} and Mori^{60} approaches. Next, one proceeds by closing on the right- and left-hand sides with members of the projector, yielding

The GME given by Eq. (5)*exactly* determines the evolution of the probability of finding the protein in a particular configurational state at any point in time. Here,

is the transition probability matrix (TPM), i.e., the probability of finding the protein in the *k*th conformation given that it started in the *j*th conformation, $T\u0307k,j(0)=\u2212\u27e8\chi k(x)|L|\rho eq\chi j(x)\u27e9\pi j\u22121$ is the time derivative of the TPM at *t* = 0, and the memory kernel takes the form $Kk,j(t)=\u27e8\chi k(x)|Le\u2212QLtQL|\rho eq\chi j(x)\u27e9\pi j\u22121$. The so-called inhomogeneous term^{29,60} arising from the second term on the right-hand side of Eq. (4) is identically zero when the GME is constructed from equilibrium MD simulations starting from any of the states included in the projector. Even when the initial condition corresponds to an equilibrium one with a statistical combination of more than one state, the inhomogeneous term can be shown to be identically zero if the set of states in the projection operator spans the configurations sampled by the initial condition. Finally, since the TPM contains all dynamical information about the protein in this reduced state space, it can be used to obtain the probability of finding the protein in the *k*th state at time *t*, *P*_{k}(*t*), given an initial condition corresponding to a combination of configurational states, **P**(0),

where *T*(0) = **1** by virtue of the idempotency of the projection operator, i.e., $P2=P$, and **P**(0) is a vector with positive entries and unit norm describing the relative contributions of the different states to the initial condition of interest.

Importantly, when the projected degrees of freedom included in the GME, corresponding to the set of configurational states of the protein in this case, evolve on a much slower time scale than all other motions and correlations in the entire system, such as vibrations, librations, and torsions, the memory kernel decays on a faster time scale, *τ*_{K}, than the characteristic time scale of the projected motions. This physical insight guides the choice of states included in the projection operators used to address physical processes ranging from electron and energy transfer in the condensed phase^{45,61–64} to density fluctuations and their currents in glasses,^{38,65} and the configurational states in protein dynamics.^{27,56} In such cases, one can rewrite the GME to explicitly account for the finite lifetime of the memory kernel,

where the upper limit of the integral is ≤*τ*_{K}. This suggests that the MD simulations required to construct the memory kernel for the GME need to at least span the time scale defined by *τ*_{K}.

In the limit of extreme time scale separation, the memory kernel decays infinitely fast in comparison to the time scale of interest, i.e., $K(t)\u2192KM\delta (t)$ , where the M superscript denotes the Markovian condition. This fast decay leads to a Markovian equation of motion for the projected states, $T\u0307M(t)=(T\u0307(0)\u2212KM)TM(t)$ , that can be formally integrated to provide an analytic expression for the evolution transformation that evolves the TPM over an arbitrary time interval Δ*t*,

where $UM(\Delta t)=exp(T\u0307(0)\u2212KM)\Delta t$ and forms the basis of the purely kinetic description in MSMs. In Eq. (9), *T*(0) = **1** and *U*^{M}(Δ*t*) = *U*^{M}(Δ*t*)*T*(0) = *T*^{M}(Δ*t*), which allows one to rewrite Eq. (9) in the common MSM notation,

The Markovian TPM, $Tk,jM$, represents the transition probability from state *j* to *k*, which can also be computed from the transition count matrix, *C*, as

where *C*_{k,j} counts the number of transitions coming from state *j* to *k* in the unbiased simulation trajectories. The transition count matrix, *C*, is symmetric, i.e., the number of transitions from state *j* to *k* equals the number of backward transitions, whereas the TPM is not.

While a strict separation of time scales leading to a purely Markovian evolution is not always possible, it is, nevertheless, possible to coarse grain the time step such that the evolution of the TPM from one discrete time step to the next *appears* effectively Markovian, i.e., its evolution can be described by an evolution transformation similar to that in Eq. (9), except that Δ*t* must be larger than some critical time scale, known as the Markovian lag time, *τ*_{T}, in the MSM literature. The Markovian lag time, *τ*_{T}, therefore, defines the time scale that MD simulations need to sample in order to allow the construction of the MSM. Efforts to construct an expression for this evolution transformation *U*^{MSM}(Δ*t* ≥ *τ*_{T}) = *T*^{MSM}(Δ*t* ≥ *τ*_{T}) from the time-dependent memory kernel dates back to the work of Zwanzig^{56} and has recently received renewed attention.^{67} Ultimately, the memory kernel lifetime, *τ*_{K}, and the lag time, *τ*_{T}, determine the minimum time scales necessary from MD simulations to construct the corresponding qMSM and MSM to describe a protein’s configurational dynamics.

### B. Obtaining $K(t)$ for the qMSM via direct inversion of the GME

Here, we provide a straightforward approach to calculate the memory kernel from the dynamics of the TPM itself, as it is calculated via MD simulation. In particular, we employ the left Riemann sum to discretize the convolution of the memory kernel $K$ and the TPM in the GME given by Eq. (5),

where Δ*t* is the time step. Using this discretization scheme, one can then obtain the following expression for the discretized memory kernel $K(n\Delta t)$:

for *n* ≥ 1, where we have explicitly used the fact that *T*(0) = **1**. Thus, $K(n\Delta t)$ can be determined from *T*(*t*) and its derivative $T\u0307$(*n*Δ*t*) ≈ [*T*((*n* + 1)Δ*t*) − *T*(*n*Δ*t*)]/Δ*t* at time points *t* = 0, Δ*t*, …, *n*Δ*t*, as well as the memory kernel at all previous time points. Since this choice of discretization for the convolution of the memory kernel and the TPM in Eq. (5) leads to an indeterminate value for the memory kernel at *t* = 0, it is essential that one ensures internal consistency by using the same level of approximation for the convolution integral in the evolution of the GME. Finally, we also note that the transfer tensor method,^{66} which is equivalent to the generalized (quantum) master equation formulation, can also be used in the present context to evolve the configurational dynamics of biomolecules. To obtain TPMs, here we apply a straightforward approach that normalizes the symmetric transition count matrix, given by Eq. (11). We note that other statistical estimators that maintain the detailed balance constraint required by the projection operator in Eq. (6) that defines the reduced (MSM and qMSM) dynamics can also be adopted to compute TPMs. This includes, for example, the maximum likelihood estimator (MLE)^{67} and observable operator model (OOM).^{67}

To characterize the memory kernel lifetime, *τ*_{K}, we define the mean integral of the memory kernel (MIK),

where *N* is the number of configurational states in the projection operator. The MIK function plateaus to a constant value when the memory kernel reaches its relaxation time, *τ*_{K}, at which point all kernel elements have decayed to zero. We use this measure as a visual means to determine whether the memory kernel elements have decayed. However, we emphasize that the most reliable test of whether the memory kernel has decayed to a stable value sufficiently close to zero is to examine the variation of the resulting GME dynamics with respect to the choice of kernel cutoff time, *τ*_{K}. In the context of quantum dynamics, this cutoff is chosen to be early in the plateau of stability of the memory kernel, defined as the span of time over which the GME dynamics cease to change with the choice of kernel cutoff time.^{45}

### C. qMSM algorithm

Figure 1 illustrates our implementation of the qMSM, which consists of two steps: the calculation of the memory kernel and the prediction of long time scale dynamics. To calculate the memory kernel, one first obtains *T*(*t*) at the discrete time steps *t* = 0, Δ*t*, …, *n*Δ*t*, from which one can calculate the memory kernel *K*(*n*Δ*t*) using Eq. (13). Using the calculated memory kernel, one can then propagate the GME dynamics by numerically integrating the GME in Eq. (5) where the upper limit of the convolution of the memory kernel and the TPM is explicitly truncated at the kernel lifetime, *τ*_{K} when *t* > *τ*_{K}. We determine *τ*_{K} as the time beyond which the MIK, given by Eq. (14), plateaus to a constant value, and employs Heun’s method^{68} to evolve the GME.

### D. Simulation details

**Simple kinetic model**: To illustrate the construction of the qMSM, we employ a simple Markovian kinetic network model containing six nodes whose evolution is defined by a transition count matrix with a lag time of one time step (step),

where the equilibrium populations of the different nodes (microstates) are given by the vector ** π** = (0.674, 0.226, 0.0675, 0.0227, 0.006 81, 0.002 26). One can then compute the TPM for this six-state model:

*T*

_{micro}at the lag time of one step by normalizing

*C*via the column sum, and, subsequently, obtain

*T*

_{micro}at a larger lag time of

*n*steps by the direct propagation: $Tmicro(n)=Tmicron(1)$ . As Fig. 3(a) shows, the size of each node in the original six-node model is proportional to its equilibrium population, and the thickness of lines connecting pairs of nodes is proportional to transition probabilities. From this kinetic model, we then generate two three-state models by grouping these six nodes based on their kinetic connectivity. The first reduced model, shown in Fig. 2(a), is obtained by connecting the states in a pairwise manner such that the states with the largest connectivity, which leads to a fast interstate transfer, are lumped together. The second reduced model, shown in Fig. 2(b), arises from random pairwise association.

The state assignments corresponding to what we, henceforth, call good and bad lumpings are listed below:

Here, *A*_{Ij} represents the mapping between nodes and states, where *j* ∈ {*a*, *b*, …, *f*} corresponds to the node index and *I* ∈ {1, 2, 3} denotes the state index. For these two cases, the projection operator is of the Hummer–Szabo form,^{57} $P=DAT\Pi \u22121A$, where *D* and Π are diagonal matrices of stationary populations of the three-state and the six-state models, respectively, and the superscript *T* denotes the matrix transpose.

**Alanine dipeptide**: The alanine dipeptide adopted in this study is terminally blocked by the NMe and Ace groups. The initial structure was prepared using the AMBER tools 1.5.^{69} After energy minimization, the alanine dipeptide was solvated in a simulation box and a 7 ns NPT simulation was performed, which was then followed by a 20 ns NVT simulation for equilibriation. We performed 100 10 ns production NVT simulations with initial conformations sampled from the 20 ns NVT simulation every 150 ps in the 5 ns–18.5 ns interval. The snapshots of these MD trajectories were stored every 0.01 ps. All MD simulations were performed using the GROMACS software version 4.5.3.^{70} We used the AMBER99SB^{71} force field for the simulations of alanine dipeptide and the flexible TIP3P model^{72} for the water molecules. All chemical bonds were constrained using the LINCS algorithm.^{73} The electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method^{74} with a short-range cutoff at 13 Å. The Lennard-Jones (LJ) interactions were calculated by the switch method with a distance cutoff of 12 Å, with the switch cutoff of the LJ interaction set at 10 Å. The temperature of the simulations was maintained at 310 K via the velocity rescaling thermostat^{75} with 1 ps coupling time. For the equilibrium NPT simulation, the pressure was maintained at 1 bar via the Parrinello–Rahman barostat^{76} with 1 ps coupling. The size of the cubic box for our production NVT simulations was 30.0263 Å^{3}. To construct the discrete state model for the MSM and qMSM, we generated a four-state model using the splitting-and-lumping algorithm by MSMBuilder.^{77} Specifically, we first split all the available MD conformations into 1000 microstates using the k-center clustering algorithm^{78–80} and then lumped these microstates into four macrostates using the Perron–Cluster Cluster Analysis (PCCA).^{81} The lag time of the 1000-state MSM is ∼2 ps.^{78} To compute error bars for the MD, MSM, and qMSM dynamics shown in Figs. 5(c) and 5(e), we performed the bootstrapping process by random sampling with replacement from 100 MD trajectories to generate a bootstrap sample also containing 100 trajectories and repeated this process 100 times. We then used the resulting 100 bootstrap samples to compute error bars for the MD, MSM, and qMSM dynamics.

**WW domain Fip35**: The WW domain Fip35 is a fast folding protein that has been intensively investigated both theoretically and experimentally and commonly serves as a test bed for methods to elucidate protein folding. In this study, we performed our analysis on the simulation dataset provided by D. E. Shaw research.^{82} This simulation dataset consists of two 100 microsecond MD trajectories, each of which is sufficiently long to capture multiple independent folding and unfolding events.^{82}

Following the procedure outlined in our previous work,^{27} we grouped the MD-derived conformations into four metastable conformational states. In particular, we first projected all the conformations onto three major tICs (Time-lagged Independent Components) obtained by tICA (Time-lagged Independent Component Analysis),^{83–85} where the pairwise distances of all 35 *C*_{α} atoms (595 distances in total) and a time lag of 0.018 *µ*s were chosen to identify the tICs. Using these three tICs, we performed the clustering using the K-Center algorithm^{79,80} to generate 800 microstates, leading to an MSM with a lag time of 50 ns. The distances between pairs of MD conformations are defined as the Euclidian distances between vectors containing the values of the three major tICs. Finally, the microstates are further lumped into four macrostates using the PCCA+ algorithm.^{86} To report error bars for the dynamics of the Fip35 WW domain, we applied a bootstrapping process similar to that for alanine dipeptide. In particular, we divided each of the two MD trajectories equally into five segments and bootstrapped these 10 MD segments with replacement to obtain a bootstrap sample also containing 10 MD segments. We repeated the bootstrapping process 100 times and used the resulting samples to compute error bars for the MD, MSM, and qMSM dynamics, as shown in Figs. 6(c)–6(e).

## III. RESULTS AND DISCUSSIONS

### A. Simple kinetic model

Using the simple Markovian chain containing six nodes shown in Fig. 2(a), we first demonstrate how to use the MSM and qMSM approaches to obtain the intermediate- and long-time dynamics of the (reduced) three-state models obtained from two different pairwise groupings of the six nodes.

The first reduced three-state model (with states 1, 2, and 3) arises from a grouping procedure that connects pairs of states connected by the strongest transition probabilities, ensuring that the time scale of intrastate equilibration is shorter than that associated with interstate equilibration.

The MSM for this three-state model is fully characterized by its Markovian transition count matrix, which only becomes well defined for times equal to or longer than the lag time, *τ*_{T}, which sets the minimal time step that allows for the dynamics of the reduced model to be described as a Markovian process. The lag time for this reduced model, by which time the implied time scale (ITS) curves in Fig. 3(a) reach a plateau, is *τ*_{T} ≈ 200 steps. The implied time scale (ITS) for an MSM is defined as

where *λ*_{i}(*τ*) is the *i*th eigenvalue of the TPM at lag time *τ*. When the model becomes Markovian, i.e., $T(n\tau T)=T(\tau T)n$, the ITS reaches a plateau. In Fig. 3(a), the Markovian lag time is *τ*_{T} = 200 steps as all ITS curves reach a plateau at this time scale. The MSM dynamics for this model are shown in Fig. 3(d). The limited temporal resolution (discrete points) of the MSM dynamics arises from the fact that *τ*_{T} sets the minimal time step for the prediction of the reduced dynamics.

At the heart of the qMSM approach is the memory kernel, $K(t)$. Figure 3(c) shows the different elements of $K(t)$ associated with the GME dynamics of this three-state model, computed using Eq. (13). Memory kernels with short lifetimes [in comparison to the equilibration time scales of the reduced dynamics, such as those shown in Fig. 3(d)] indicate a substantive separation of time scales between the equilibration of intra- vs interstate dynamics. Indeed, while most elements of $K(t)$ decay quickly (in less than 100 steps), the elements of $K(t)$ corresponding to transitions out of state 3 (e.g., $K13$, $K23$ and $K33$) decay substantially more slowly than other elements, suggesting a less well defined separation of time scales between the intrastate relaxation dynamics and the interstate equilibration dynamics associated with state 3. Nevertheless, all elements of $K(t)$ decay to zero by *t* = 200 steps, as evidenced by the fact that the MIK, shown in Fig. 3(b), plateaus by this time, suggesting that *τ*_{K} = 200 steps. As an additional check of the validity of the memory kernel truncation at *τ*_{K} = 200 steps, Fig. 3(e) shows that the time-averaged root mean squared error (RMSE) in the qMSM dynamics associated with different kernel cutoff times similarly reaches a plateau at this time scale. The time-integrated RMSE for the MSM and qMSM dynamics as a function of the lag time *τ*_{T} and kernel cutoff time *τ*_{K}, respectively, considers the cumulative average over all time-dependent TPM matrix elements and is given by

where *x* ∈ {MSM, qMSM} such that *τ*_{MSM} ≡ *τ*_{T} and *τ*_{qMSM} ≡ *τ*_{K}. Because the lag time, *τ*_{T}, determines the temporal resolution of the MSM dynamics, *δt*_{MSM} = *τ*_{MSM}, and therefore, a more limited number of data points contribute to the RMSE for the MSM for a particular value of the lag time, whereas *δt*_{qMSM} is the time step used to evolve the GME. This implies that for an MSM, *L*_{x} = *t*_{sim}/*τ*_{T} since *τ*_{T} determines the MSM time step, whereas for a qMSM, *L*_{x} = *t*_{qMSM}/*δt*_{qMSM}, where *δt*_{qMSM} does not depend on *τ*_{K} and is determined by the resolution of the TPM used to obtain the memory kernel, $K$.

To assess the reliability of the qMSM and MSM approaches, Fig. 3(d) presents the results of the Chapman–Kolmogorov test of the TPM dynamics, which compares the TPM evolution predicted by the qMSM and MSM (red and green curves, respectively) with the TPMs obtained via direct simulations (dashed circles). In this example, the memory kernel lifetime, *τ*_{K} = 200 steps, serves as the upper limit of the convolution integral used to integrate the GME in Eq. (8) for *t* ≥ *τ*_{k}. The dynamics offer two important observations emerge. First is the finer time resolution of the qMSM dynamics, which arises from the fact that the GME can exactly describe the non-Markovian dynamics of the reduced system, whereas the resolution of the MSM dynamics is constrained by *τ*_{T}. Second is the fact that for this three-state model, where there exists a relatively clean separation of time scales, both the MSM lag time and the qMSM kernel lifetime agree, i.e., *τ*_{T} = *τ*_{K} = 200 steps, suggesting that the only benefit the qMSM approach can afford in such cases is to provide a finer time resolution that can elucidate potentially complex intermediate-time dynamics missed by the limited resolution of the MSM.

In contrast, when a clean separation of time scales between the intra- and interstate dynamics does not exist, the MSM and qMSM approaches can lead to markedly different behaviors. To test this possibility, we have created a second three-state model by randomly grouping pairs of nodes, as shown in Fig. 2(b). This poor grouping results in kinetic barriers within states, and thus, the implied time scales in Fig. 4(a) fail to reach a plateau even beyond *t* = 2000 steps, implying that a Markovian model cannot be built over the time scales of interest. Indeed, as Fig. 4(d) demonstrates, the MSM built from this bad lumping procedure predicts dynamics that drastically deviate from the exact dynamics. In contrast, as Figs. 4(b) and 4(c) show, the memory kernel elements associated with the qMSM decay by *τ*_{K} ≈ 1000 steps, thereby allowing for the exact determination of the intermediate- and long-time reduced dynamics of this three-state model, as shown in Fig. 4(d). To confirm the validity of our choice of *τ*_{K} in this example, the error analysis in Fig. 4(e) demonstrates that by *τ*_{K} = 1000 steps, the error in the qMSM becomes negligible. Hence, this simple model illustrates the fact that the qMSM approach can be applied even for cases where the use of an MSM would be inappropriate.

### B. Alanine dipeptide

We next applied the MSM and qMSM approaches to alanine dipeptide in an aqueous solution. As described in Sec. II D, we first grouped MD conformations from 100 10-ns MD simulations into four metastable conformational states using the splitting-and-lumping algorithm in MSMBuilder.^{77} These states can be defined using the two backbone torsion angles in Fig. 5(a). This configurational partitioning provides the basis for the construction of both the MSM and qMSM approaches below.

As the simple kinetic model based on bad lumping in Sec. III A suggests, one can construct an accurate qMSM of a system with a significant non-Markovian behavior more efficiently than the analogous MSM. In particular, in those cases, the required MSM lag time, *τ*_{T}, greatly exceeds the qMSM memory kernel lifetime, *τ*_{K}. Here, we assess the applicability of this observation in the case of alanine dipeptide, by testing the quality of MSM and qMSM dynamics one would obtain when using a limited amount of MD simulation time. The MIK curve in Fig. 5(b) plateaus at ∼1 ps–2 ps, suggesting that a qMSM kernel cutoff time *τ*_{K} of that order will be able to accurately capture the dynamics. Indeed, Fig. 5(c) shows that the qMSM dynamics (in red) obtained for *τ*_{K} = 1.5 ps accurately capture the dynamics of the TPM. In contrast, limiting the lag time of the MSM approach to the same value, i.e., *τ*_{T} = 1.5 ps, results in an overly fast relaxation of the population dynamics.

The poor performance of the MSM approach motivates asking what is the minimum MSM lag time, *τ*_{T}, that is required to extract the Markovian rates that determine the MSM since this also determines the minimum MD simulation times needed. The lag time can be determined from the time by which the ITS curves in Fig. 5(d) plateau. The onset of the plateau starts around *τ*_{T} = 10 ps, which leads to the TPM dynamics in Fig. 5(e). These MSM dynamics accurately capture the true dynamics and confirm that this choice of lag time is, indeed, appropriate. Finally, to more rigorously test the veracity of our observation that the qMSM can more accurately capture the TPM dynamics than an analogous MSM when using the same amount of simulation time to construct both dynamical schemes, we turn to the RMSE of the qMSM and MSM dynamics for the TPM evolution as a function of the memory kernel cutoff and lag times, respectively, as shown in Fig. 5(f). As the figure shows, the qMSM curve is always below its MSM counterpart. To achieve an RMSE of 0.5%, the qMSM requires *τ*_{K} ≈ 1 ps, whereas the MSM would necessitate *τ*_{T} ≈ 10 ps. Thus, these results demonstrate the utility of the qMSM approach in efficiently capturing the dynamics associated with these simple peptide fluctuations.

### C. WW domain

We now turn to the more complex problem of folding the Fip35 WW domain, a small protein containing 35 residues, and demonstrate the utility of the qMSM approach in describing its dynamics. As described in Sec. II D, we divide the MD conformations from two 100-*µ*s simulations generated by the D. E. Shaw research^{82} into four metastable conformational states. As Fig. 6(a) shows, state 1 corresponds to the folded state consisting of two *β*-hairpins, state 2 is an unfolded state, while states 3 and 4 correspond to an on-pathway intermediate and an off-pathway state, respectively.

Here, we are interested in determining, given the four conformational states described above, whether the qMSM scheme can provide a more efficient description of the configurational dynamics of complex protein folding processes than an analogous MSM. We first determine an estimate of the minimum allowed memory kernel cutoff time, *τ*_{K}, which determines the minimum MD simulation time required to construct the qMSM, from the plateau time in the MIK, which occurs around 25 ns, as shown in Fig. 6(b). In contrast to the previous examples in Secs. III A and III B, the MIK for the folding of the Fip35 WW domain is much more oscillatory. Especially, in such cases, it is imperative to test the convergence of the GME dynamics with respect to the choice of the memory kernel cutoff, as shown in Fig. 7. Figures 6(c)–6(e) show the qMSM dynamics for three choices of the memory kernel cutoff, as well as the MSM dynamics for the same model where the lag time is chosen to be equal to the choice of memory kernel cutoff, i.e., *τ*_{K} = *τ*_{T} = {15 ns, 25 ns, 200 ns}, to permit a fair comparison between the two approaches. The second of these choices, *τ*_{K} = 25 ns, shown in Fig. 6(d), corresponds to the onset of the plateau region of the MIK and demonstrates that the qMSM approach is able to capture the correct dynamics of the TPM, while the analogous MSM with *τ*_{T} = 25 ns leads to an overly fast relaxation of the configurational populations. The fact that the fluctuations of the MIK associated with the plateau region in Fig. 6(b) are commensurate with the value of the MIK at *t* = 15 ns motivates the question of whether such a kernel cutoff time is able to capture the correct qMSM dynamics. As Fig. 6(c) demonstrates, the quality of the qMSM dynamics when *τ*_{K} = 15 ns is still satisfactory, capturing almost quantitatively the decay of all population dynamics, whereas the analogous MSM with *τ*_{T} = 15 ns again predicts overly fast relaxation. In fact, as Fig. 6(e) illustrates, an MSM that correctly describes the dynamics of these four conformational states requires a lag time that is about an order of magnitude longer than the minimum memory kernel cutoff required by an analogous qMSM, i.e., *τ*_{T} = 200 ns, and still suffers from a poor temporal resolution over the early relaxation time scale of the protein folding process. The slow plateau time of the ITS curves in Fig. 6(f) supports this choice of lag time, showing that the variations of ITS_{2} and ITS_{3} have ceased by *τ*_{T} = 200 ns, whereas ITS_{1} continues to vary, albeit only slightly. Furthermore, confirming this choice of lag time, as well as the validity of choices of much smaller memory kernel cutoffs, appears in the RMSE curves associated with the choice of *τ*_{K} and *τ*_{T} for the qMSM and MSM approaches. As Fig. 6(g) demonstrates, the *τ*_{K} required for the qMSM RMSE (in red) to fall below 0.1% is about an order of magnitude lower than the *τ*_{T} required for the MSM (in green). One should also note that for *T*_{44}(*t*) in Fig. 7, the qMSM dynamics yields anomalous behavior when *τ*_{K} = 200 ns. This likely reflects the more noisy data for the Fip35 WW domain that make converging the long-time correlations needed to construct the memory kernel more challenging.

While the TPM dynamics in Figs. 6(c)–6(e) occur on the order of nanoseconds, where the existing MD simulations can be used to assess their accuracy, the folding process in the Fip35 WW domain occurs on the microsecond time scale, i.e., three orders of magnitude longer. Even on this time scale, the qMSM approach can capture the correct long-time dynamics, as shown in Fig. 8, which could otherwise only be accessed via an MSM with a converged lag time. Using the MSM with a long lag time *τ*_{T} = 200 ns as the baseline for the long time scale dynamics, one can see in Fig. 8 that the qMSM approach even with *τ*_{K} = 25 ns captures these TPM dynamics. Their slight disagreement at intermediate time scales for *T*_{41}(*t*), *T*_{42}(*t*), and *T*_{44}(*t*) might arise from either small inaccuracies in the memory kernel for the qMSM, arising from the statistical noise due to limited sampling or from the fact that MSM dynamics, even with a fully converged lag time, are, nonetheless, approximate. The direct MD provides a benchmark for short and intermediate times, where the large lag time MSM provides insufficient resolution. Here, we again observe in Fig. 8 that even with *τ*_{K} = 25 ns, the qMSM quantitatively captures the short-time dynamics in contrast to the MSM using a lag time of *τ*_{T} = 25 ns. To further compare the efficiency of qMSMs and MSMs, one can also assess how many states an MSM with a lag time equal to 25 ns (as small as the kernel cutoff time for the qMSM with four states) requires to correctly capture the dynamics. Figure 9 shows the slowest ITS associated with MSMs for the Fip35 WW domain with increasingly larger numbers of states and demonstrates that, for the WW domain, an MSM needs at least 1000 states to yield the correct slowest ITS. Hence, in this system, qMSMs allow one to obtain accurate reduced dynamics with 250 times fewer states than an MSM. In addition, we observed that the RMSE of a qMSM declines at a faster rate than that of an MSM for the simple model as well as peptide systems. To further explore this efficiency advantage of a qMSM over an MSM as a function of number of states, we have built a series of models containing different numbers of states based on a 1D potential (see the supplementary material for details). As shown in Fig. S1, the RMSE of a qMSM declines at a faster rate than that of an MSM, indicating that the observed efficiency advantage of a qMSM over an MSM is robust. Altogether, the above results demonstrate that the qMSM approach can provide a highly accurate and efficient representation of the configurational dynamics of biomolecules on the short, intermediate, and long time scales even for complex polypeptides such as the Fip35 WW domain.

From the above simple model and peptide systems, we have shown that qMSMs can effectively capture the configurational dynamics using models containing only a few states and built from affordable lengths of MD simulations. We note that two previously developed methods also aim at addressing this issue: hidden Markov models^{87,88} and core-set MSMs.^{89,90} Specifically, the hidden Markov model adopts a soft state partitioning scheme and, thus, allows states to overlap, while the core-set MSM does not perform a full partitioning of the phase space but only focuses on core regions with high densities. Both methods can efficiently generate Markovian models with a small number of states. However, hidden Markov models have overlapping states (i.e., one protein conformation can belong to multiple states), which can lead to ambiguity in the understanding of biological mechanisms, especially when there are substantial overlaps between states. For the core-set MSMs, it can be challenging to correctly identify the core regions in order to build core-set MSMs.^{90} An algorithm has been recently developed to identify core sets by optimizing the metastability index, which has been shown to effectively improve the accuracy of MSMs.^{91}

## IV. CONCLUSIONS

MSM descriptions of protein dynamics with small numbers of states often require long lag times, *τ*_{T}, which in turn necessitate the use of long MD simulations for their construction. In contrast, qMSMs encode the non-Markovian dynamics of such states in a generally time-dependent memory kernel, whose characteristic decay time scale corresponds to the kernel lifetime, *τ*_{K}. Here, we have shown that qMSMs can be used to capture short-, intermediate-, and long-time dynamics of a variety of systems, including alanine dipeptide and the WW Domain Fip35, with a memory kernel lifetime, *τ*_{K}, that is 5–10 times shorter than the lag time, *τ*_{T}, needed to obtain accurate MSM dynamics. Since *τ*_{K} and *τ*_{T} determine the length of MD simulations necessary to construct the qMSM and MSM descriptions, respectively, this means that qMSMs can provide a more efficient means to capture the reduced dynamics of biomolecules. Equivalently, for a given length of MD simulations, which sets an upper limit on the attainable memory kernel lifetime and lag time, one could construct a qMSM with significantly fewer states than those in an MSM. Indeed, both qMSMs and MSMs benefit from the numerous schemes that have been developed to partition the space optimally to select the slowest evolving time scales.^{92,93} By reducing hundreds or even thousands of states necessary to describe biological events of interest, such as protein folding and misfolding^{25,26} and other conformational changes,^{94–96} to a handful of larger and metastable states, qMSMs provide a powerful approach to facilitate interpretation of biological mechanisms in addition to two existing models containing few states: the hidden Markov model^{87,88} and core-set MSM.^{89,90} qMSMs, thus, hold promise to be widely applied to study biomolecular conformational changes. Finally, the analysis of the qMSM kernels may offer an appealing route to analyzing the origins of challenging biological phenomena, such as allosteric effects.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the analysis of the scaling of the RMSE with the number of states in qMSMs and MSMs using a 1D potential.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## ACKNOWLEDGMENTS

X.H. and S.C. were supported by the Hong Kong Research Grant Council (Grant Nos. 16303919 and AoE/P-705/16), the Shenzhen Science and Technology Innovation Committee (Grant No. JCYJ20170413173837121), and the Innovation and Technology Commission (Grant No. ITCPD/17-9). T.E.M. and A.M.C. were supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award No. DE-SC0020203. T.E.M. also acknowledges support from the Camille Dreyfus Teacher-Scholar Awards Program. X.H. and S.C. would like to thank Dr. Fu Kit Sheong for helpful discussions.