Molecular dynamics simulations have the potential to provide atomic-level detail and insight to important questions in chemical physics that cannot be observed in typical experiments. However, simply generating a long trajectory is insufficient, as researchers must be able to transform the data in a simulation trajectory into specific scientific insights. Although this analysis step has often been taken for granted, it deserves further attention as large-scale simulations become increasingly routine. In this perspective, we discuss the application of Markov models to the analysis of large-scale biomolecular simulations. We draw attention to recent improvements in the construction of these models as well as several important open issues. In addition, we highlight recent theoretical advances that pave the way for a new generation of models of molecular kinetics.

Simulations have the hope to shed light on many areas in chemical physics. For example, molecular dynamics (MD) simulations, which numerically integrate Newton's equations to simulate physical dynamics, can provide high-resolution physics-based models of emergent biological phenomena.1–4 While promising, there are three central challenges which limit the application of MD for studying questions in biophysics. The first is the development of simplified physical models (called force fields), which avoid the intractability of solving the full, electronic Schrödinger equation. Second, while simulations are required to use a discrete time step on the order of one femtosecond, processes that are characterized by slow, large-scale, collective motions, such as protein folding, protein-ligand binding, and conformational change can take milliseconds or longer to occur. This separation in timescales requires a simulation of length at least 1012 time steps in order to observe one such event, which is difficult using current hardware. Finally, the analysis of MD simulations is itself non-trivial since the result—a set of trajectories tracking the Cartesian coordinates of every particle—can contain millions of data points in tens of thousands of dimensions. Despite these challenges, a number of recent innovations have expanded the scope of molecular simulation. In fact, because of improvements in forcefield accuracy,5–10 and the development of novel computing platforms,11–15 we believe that quantitative analysis has increasingly become a limiting factor in the application of MD.16,17

With routine MD datasets now comprising terabytes of data, the direct visualization of raw MD trajectories is neither scalable nor quantitative. Instead, we suggest that MD trajectories should not be viewed as ends in and of themselves, but instead a means of parameterizing a quantitative statistical model of the structure and dynamics of the system of interest.

These models should have the following properties:

  1. The model should be oriented toward quantitatively describing the long-timescale processes in the data.

  2. The model should be suitably complex, capable of smoothly adapting to—rather than assuming—the structure of the data.

  3. The model should be interpretable and provide answers to specific scientific questions that may be difficult to answer via experiment alone.

Towards this end, we survey recent developments in the theory and application of Markov modeling for the analysis of MD simulations, which have all of these desired properties.

The transfer operator provides a theoretical foundation for the description of the slow dynamical processes in stochastic systems, and is the basis for the approaches discussed herein. Indeed, at its essence, the methods described below all work to build some sort of numerical model of the transfer operator. Here we highlight some important details; we refer the interested reader to Schütte, Huisinga, and Deuflhard.18 

Consider a time-homogeneous, ergodic Markov processes, xt, in phase space, Ω, which is reversible with respect to a positive stationary density μ(x). At time t, an ensemble of such processes can be described by a probability distribution, pt(x), which can change as a function of time. The evolution of this probability density over a time interval τ can be described by the action of an operator, T ( τ ) , the transfer operator. For each time t, define the equilibrium-weighted probability density, ut(x) = pt(x)μ(x)−1. Then T ( τ ) is defined as the operator that evolves ut(x) to ut + τ(x),
(1)
T ( τ ) is bounded and self-adjoint with respect to the scalar product
(2)
and can therefore be decomposed into terms corresponding to each of its eigenfunctions,
(3)
The eigenvalues, λi, are real, and the eigenpairs (λi, ψi(x)) can be taken to be sorted in decreasing order by eigenvalue. According to the Perron-Frobenius theorem, λ1 is equal to one and corresponds to the stationary eigenfunction, ψ1(x) = 1. The remaining eigenvalues lie in (−1, 1), and are associated with eigenfunctions which describe directions of collective motion in phase space—dynamical processes—along which the system relaxes towards equilibrium. Equation (3) can therefore be rewritten as
(4)
where ti are the characteristic relaxation timescales of each dynamical mode, defined in terms of the associated eigenvalues,
(5)
At long times (large n), terms in Eqs. (3) and (4) corresponding to large eigenvalues (long relaxation timescales) dominate the summation, as small-eigenvalue modes equilibrate and decay quickly. The focus for accurate numerical models of T ( τ ) is thus to resolve these dominant eigenmodes.

The fundamental approach we will take is to use computational methods to build an approximate representation of the transfer operator, T ( τ ) , from molecular dynamics simulations in a way which is systematic, automated, and statistically rigorous. One major upshot of this approach is that it allows a wide range of simulation data to be useful, ranging from many relatively short trajectories (compared to the longest timescales of interest) to a few long trajectories. Below, we lay out the challenges, current status, and future of this paradigm.

Markov State Models (MSMs) describe the stochastic dynamics of molecules as a Markovian jump process between a finite number of conformational states. Because of the Markov property, the probability of jumping to a new state only depends on the current state. As such, the dynamics are completely determined by a transition matrix, T, such that
(6)
where { x t } t = 1 N is a trajectory in Ω and the states, S = { s i } i = 1 k , are a set of non-overlapping subsets of Ω that partition the space: si ⊆ Ω ∀ i, i = 1 k s i = Ω , sisj = ∅ ∀ ij. T is a discretization of T ,
(7)
where
(8)
There is a rich history of describing the dynamics of biophysical processes in terms of finite state models, including work based on enumeration of potential energy minima,19,20 study of simplified physical models,21,22 analysis of long-timescale processes from many short trajectories,23–29 and transition path and spectral analysis of the resulting network models.30–33 For more specific details on MSMs, we refer the interested reader to Prinz et al.34 and Pande, Beauchamp, and Bowman.35 

Though the theory is relatively simple, the construction of an MSM is not a solved problem. Given a set of MD simulations, we need to define a state decomposition, S, as well as estimate the transition probability matrix, T. Straightforward maximum likelihood estimators for T, given S exist,34,36 but the determination of S itself is more challenging. A wide variety of strategies have been proposed for constructing S, including grid-based discretization,27 and clustering with many algorithms36,37 and distance metrics.38–43 However, no consensus has emerged in the literature on the preferred strategy, largely because these clustering schemes rest on heuristic foundations built primarily by physical intuition. No solid theoretical justifications exist for choosing one of these schemes over another, especially when statistical errors in the parameterization of T are non-negligible.

On the other hand, Sarich, Noé, and Schütte44 showed that in the absence of statistical uncertainty in the parameterization of T, an MSM's error, measured in terms of the maximum possible difference between the true and MSM-predicted probability density at a later time, is bounded. The state decomposition-dependent component of this bound quantifies the difference between the transfer operator's continuous eigenfunctions, ψi(x), and the MSM's piecewise constant approximate eigenfunctions, given by the eigenvectors of T. This result implies that a “good” state decomposition is one that finely partitions phase space particularly in the regions where the dominant eigenfunctions, ψi(x), change rapidly, such as transition regions or energy barriers. Unfortunately, these barrier regions are precisely the areas that are most poorly sampled in MD. Finely subdividing these regions into many MSM states can lead to large statistical errors in the associated transition probabilities.

In practice, even the seemingly simple problem of deciding the correct number of states given a clustering algorithm and distance metric is difficult.39,45 When the number of states is low, the discretization in Eq. (7) is very coarse, which leads to systematic underestimation of the relaxation timescales.44,46 However, as the number of states increases, we have to estimate more state to state transition probabilities. With the amount of data fixed, increasing the number of parameters leads to a corresponding increase in the statistical uncertainty of the model. Some recent work, however, has made progress toward balancing these sources of errors by defining a likelihood function for evaluating and comparing models.39,45 These likelihood functions are useful for simple systems, but can be difficult to employ in most MSM applications.

To illustrate the difficulty of building a state space more concretely, consider a simulation of a protein and a ligand where we are interested in calculating the on and off rates of the binding reaction. In this simulation, the ligand is very mobile when it is not bound to the protein, and so a typical geometric clustering will produce many unbound states. These states contribute to the statistical error since we are essentially forcing the model to describe the diffusion of the ligand in water. However, since we are only interested in the binding reaction, we do not care about this diffusion, and could instead build a more statistically robust MSM by ignoring these motions in the unbound state. This behavior is not unique to protein-ligand simulations, and in fact has been an issue when analyzing protein folding simulations. Typically these models must use tens of thousands of small states—termed microstates—in order to accurately describe the folding reaction's timescale.47–49 Not only does this increase the statistical error, but it raises major challenges for the interpretation of the resulting model. One suggested resolution has been to lump these microstate models into coarser macrostate models by grouping together states which rapidly interconvert.32,33,50,51

In our view, this microstate-to-macrostate lumping strategy can be dangerous, and we discourage its use. The lumping procedure relies directly on the quality of the microstate model, which is purposely estimated subject to large statistical errors. For example, the PCCA and PCCA+ algorithms for lumping MSMs are designed to preserve the estimates of the slow eigenfunctions as identified by the microstate model,33,50 without regard to the fact that these estimates are, by design, approximated subject to large errors. Although other methods like BACE partially remedy this problem,32,52 in our view a more direct solution is to construct accurate and interpretable models from the start.

To be able to build simultaneously accurate and interpretable MSMs, clustering must be able to focus on the important (slowly equilibrating) degrees of freedom, while ignoring the unimportant ones. One suitable method is thus to perform dimensionality reduction before clustering, explicitly removing quickly decorrelating coordinates. With this in mind, Schwantes and Pande43 and Pérez-Hernández et al.42 introduced time-structure based Independent Component Analysis (tICA) for building MSMs. The tICA method identifies the slowest decorrelating linear projections in the observed data. This technique has made it possible to build macrostate MSMs from the very beginning that better resolve long-timescale processes.45,53 In addition, these models have identified new slow motions that eluded detection previously, such as near-native register shift dynamics in β-sheet proteins.43 

Part of the difficulty in constructing the MSM state space is that there is no single quantitative criterion for comparing alternative state spaces which is suitable for practical applications. Therefore, one approach is to generalize the MSM method such that the state decomposition can be optimized simultaneously with the transition probabilities in a unified way. Hidden Markov models (HMMs) are one such extension that relax the constraint that states correspond to a discrete partition of Ω. Like MSMs, HMMs characterize the system using a Markov jump process over a finite number of states; however in the HMM, these states are not assumed to be directly observed. Instead, each hidden state is equipped with an emission distribution defined on Ω, describing the per-state conditional probability of observing a conformation at a particular point in phase space. HMMs have been widely used in a variety of fields, including signal processing, speech, and bioinformatics.54,55

For example, using Gaussian emissions, if { S t { 1 ... k } } t = 1 N is the Markov chain over hidden states with transition probability matrix T, { X t Ω } t = 1 N is the observed process, π is a probability vector for the initial distribution over hidden states, { μ i Ω } i = 1 k are the per-state means, and { Σ i } i = 1 k are the per-state covariance matrices, the model can be specified probabilistically as
(9)
(10)
(11)
Constructing an HMM requires estimating the transition probability matrix T as well as the emission distribution parameters θi = (μi, Σi). Using a maximum-likelihood approach, T and θi can be estimated jointly.56 In the HMM, the θi's are analogous to the MSM's state decomposition, but unlike the MSM state space, the θi's are determined simultaneously with the elements of T—the two are jointly optimized with respect to the same objective function. In our view, this is the key advantage of the HMM, as the optimal construction of the MSM state space remains an unsolved problem.

One caveat in applications of HMMs to analysis of MD is that without conditioning on S t , the HMM's observed process Xt is strictly non-Markovian when the emission distributions have nonzero overlap. While it is precisely these overlaps that make the model (particularly θi) optimizable, they also hamper direct interpretation of the HMM as a model for the Markovian transfer operator.57,58

Markov models have been heavily applied to many different types of systems, ranging from protein folding,48,49,59 RNA folding,60 protein-ligand binding,61 and protein conformational change.4,58,62 Below, we summarize two recent highlights from the literature, which demonstrate the unique utility of Markov modeling. In particular, these approaches allow researchers to connect many short MD trajectories into a single coherent description of the dynamics. Moreover, MSMs are able to provide an interpretable picture of the kinetics that can lead to specific scientific insight.

β2AR is a G-protein coupled receptor (GPCR) involved in many trans-membrane signaling pathways. GPCRs are critical in medicine, representing the target for 30% of top-selling drugs on the market.63 Despite extensive experimental studies,64–67 the mechanism of β2AR activation is elusive. Kohlhoff et al.68 used Google exacycle, a cloud computing platform that enables scientific calculations to be run on Google's datacenter infrastructure, to collect over 2 ms of aggregate MD simulations of β2AR in a lipid bilayer and explicit solvent using tens of thousands of individual MD trajectories, each with a mean length of ∼10 ns. Although no individual MD trajectory traversed the full activation pathway, Kohlhoff et al.68 were able to use MSMs to understand the full activation landscape using these many short trajectories. This is a particularly powerful property of MSMs, because collecting many short trajectories is substantially easier than generating a few long trajectories.11,69

In addition, Kohlhoff et al.68 simulated these trajectories in three waves. The starting conformations for the second and third wave were selected by uniformly sampling states from an MSM built on the previous waves. Known as adaptive sampling, this technique provides a framework for efficiently covering all of phase space with many, relatively short simulations, which is only possible because of the use of MSMs.70,71

Kohlhoff et al.68 used the MSM to show that the activation of β2AR proceeded along multiple paths. Many of these pathways featured metastable intermediates that displayed unique chemotype selectivity in docking studies, suggesting that computational drug design can be enhanced by considering important intermediates.

While typical structure-based computational drug design pipelines use a single protein conformation to screen for potential therapeutics,72 proteins are dynamic systems and an ensemble of structures exist in solution. Accounting for this heterogeneity could expand the range of possible methods for modulating protein activity. With this in mind, Bowman and Geissler73 set out to find cryptic druggable binding sites in β-lactamase using MD simulations and MSMs.

Using one hundred microseconds of aggregate simulation time distributed over hundreds of trajectories, each no longer than 500 ns, Bowman and Geissler73 were able to characterize slow near-native conformational changes in β-lactamase. Their MSM was able to describe the kinetics of the opening and closing of various cryptic binding sites on distal regions of the protein. In particular, MSMs allowed the researchers to study the lifetimes and equilibrium opening probabilities of these pockets. They first observed that a known cryptic binding site was predicted to be open 53% of the time according to the MSM. Furthermore, they identified 50 novel sites, which were predicted to be open more than 10% of the time, suggesting new avenues for drug design.

While these results highlight that MSMs have been useful for analyzing MD simulations, the definition of the state space remains a substantial hurdle in MSM construction. Although recent work has provided significant improvements in this area, one way to side-step this challenge is to build a Markov model (i.e., provide an estimate of the transfer operator) without using a discrete state decomposition. Towards this end, Noé and Nüske74 and Nüske et al.75 established a variational principle that applies to the estimation of the transfer operator's eigenspectrum. This is analogous to the variational principle for the Hamiltonian in quantum mechanics and guarantees that, when the operator is discretized using a finite basis set, its eigenvalues are underestimated in the limit of vanishing statistical errors.

Given a basis set, the variational approach searches for linear combinations of basis set elements which approximate the transfer operator eigenfunctions by solving a particular generalized eigenvalue problem, or equivalently maximizing a series Rayleigh quotients that can be interpreted as autocorrelation functions. In fact, the MSM approach is a specific instance of this method when the basis set consists of indicator functions which partition Ω, but the method more broadly can be applied to any set of basis functions including those which have nonzero overlap. Furthermore, tICA is also an instance of this approach, with a basis set instead consisting of linear functions of the input coordinates. In the HMM, however, the process observed in Ω is not strictly Markovian, and thus its interpretation in terms of the Markovian transfer operator (or its eigenspectrum) is less straightforward.

This theoretical advance suggests a tension between two distinct but related perspectives for building simplified models of molecular kinetics:

  1. Parameterize a probability distribution over trajectories.

  2. Find the first m eigenfunctions of T .

The HMM and tICA methods are certainly different, but to what extent are probabilistically- and variationally-inspired models distinct? These views are, at least, not always different since an MSM is both a probabilistic model and one that estimates the slowest eigenfunctions of T . But, outside of MSMs, when is a probabilistic model more desirable than a variational model, or vice versa? Is one model missing something that can only be found in the other? If so, is it possible to unify these views with a model other than an MSM? We do not know the answers to these questions, but both perspectives have enriched our understanding of molecular kinetics, and we anticipate future developments will clarify the apparent tension and lead to new classes of Markov models.

The future of Markov modeling is bright. Ten years ago, conventional MD simulations could only access timescales in the nanosecond regime.76 A revolution in Graphical Processing Unit (GPU) computing,13,69,77 however, has enabled researchers around the world to perform simulations that were once impossible. But, MD trajectories alone are not enough. Quantitative analysis, which can turn a simulation into scientific insight, is a necessary component of the research process that has often been taken for granted.

Indeed, the combination of improvements in simulation and analysis has allowed for simulations of phenomena on the hundreds of microseconds to millisecond timescale;17 as a high end GPU can today simulate over 100 ns/day for a 30 000 atom system, a cluster of 100 GPUs can simulate an aggregate of 10 μs/day, yielding a millisecond of aggregate simulation in three months. While this cluster cannot simulate one long millisecond trajectory in three months, MSMs allow the use of these 10-μs trajectories to describe phenomenon on the millisecond timescale.

Markov modeling is a technology that has been rapidly maturing for MD analysis. At least two open-source software packages allow non-experts to construct these models routinely,36,37 and tutorials on their use are available.78 Nevertheless, many challenges remain in their theory and application. In particular, parameter selection continues to be difficult—e.g., how fine a discretization best balances competing sources of error? Recent work has benefited from a synthesis of concepts from machine learning and mathematical physics, and we anticipate that the future development of Markov modeling will continue to be enhanced by a cross-pollination of ideas from these communities.

Looking to the next 10 years, we expect that these approaches will become more automated, with statistical methods replacing empirical tests of model quality and adaptive sampling approaches maturing to the point where they can automatically tackle a wide range of sampling challenges. Moreover, in 10 years, GPUs (which double in speed every year) will likely be 1000 times more powerful. The combination of these technologies suggests that MSM simulations on the millisecond timescale will become routine and require only a day on a modest 10-GPU cluster, and simulations on the seconds timescale will become within reach. At that point, the field of bimolecular simulation will likely be in a golden period, where sampling is no longer the principle challenge for many biological questions of interest.

The authors would like to acknowledge Thomas J. Lane for useful discussions during the preparation of this manuscript. In addition, the authors acknowledge funding from NSF-MCB-0954714 and NIH-R01-GM062868.

1.
A.
Ivetac
and
J. A.
McCammon
, “
Elucidating the inhibition mechanism of HIV-1 non-nucleoside reverse transcriptase inhibitors through multicopy molecular dynamics simulations
,”
J. Mol. Biol.
388
,
644
658
(
2009
).
2.
K.
Lindorff-Larsen
,
S.
Piana
,
R. O.
Dror
, and
D. E.
Shaw
, “
How fast-folding proteins fold
,”
Science
334
,
517
520
(
2011
).
3.
J.
Ostmeyer
,
S.
Chakrapani
,
A. C.
Pan
,
E.
Perozo
, and
B.
Roux
, “
Recovery from slow inactivation in K+ channels is controlled by water molecules
,”
Nature
501
,
121
124
(
2013
).
4.
D.
Shukla
,
Y.
Meng
,
B.
Roux
, and
V. S.
Pande
, “
Activation pathway of src kinase reveals intermediate states as targets for drug design
,”
Nat. Commun.
5
,
3397
(
2014
).
5.
S.
Piana
,
K.
Lindorff-Larsen
, and
D. E.
Shaw
, “
How robust are protein folding simulations with respect to force field parameterization?
Biophys. J.
100
,
L47
L49
(
2011
).
6.
K.
Lindorff-Larsen
,
S.
Piana
,
K.
Palmo
,
P.
Maragakis
,
J. L.
Klepeis
,
R. O.
Dror
, and
D. E.
Shaw
, “
Improved side-chain torsion potentials for the amber ff99SB protein force field
,”
Proteins Struct. Funct. Bioinf.
78
,
1950
1958
(
2010
).
7.
L.-P.
Wang
,
T.
Head-Gordon
,
J. W.
Ponder
,
P.
Ren
,
J. D.
Chodera
,
P. K.
Eastman
,
T. J.
Martinez
, and
V. S.
Pande
, “
Systematic improvement of a classical molecular model of water
,”
J. Phys. Chem. B
117
,
9956
9972
(
2013
).
8.
L.-P.
Wang
,
T. J.
Martinez
, and
V. S.
Pande
, “
Building force fields: An automatic, systematic, and reproducible approach
,”
J. Phys. Chem. Lett.
5
,
1885
1891
(
2014
).
9.
G.
Lamoureux
,
A. D.
MacKerell
Jr.
, and
B.
Roux
, “
A simple polarizable model of water based on classical Drude oscillators
,”
J. Chem. Phys.
119
,
5185
5197
(
2003
).
10.
Y.
Shi
,
Z.
Xia
,
J.
Zhang
,
R.
Best
,
C.
Wu
,
J. W.
Ponder
, and
P.
Ren
, “
Polarizable atomic multipole-based AMOEBA force field for proteins
,”
J. Chem. Theory Comput.
9
,
4046
4063
(
2013
).
11.
M.
Shirts
and
V. S.
Pande
, “
Screen savers of the world unite!
Science
290
,
1903
1904
(
2000
).
12.
D. E.
Shaw
,
M. M.
Deneroff
,
R. O.
Dror
,
J. S.
Kuskin
,
R. H.
Larson
,
J. K.
Salmon
,
C.
Young
,
B.
Batson
,
K. J.
Bowers
,
J. C.
Chao
et al, “
Anton, a special-purpose machine for molecular dynamics simulation
,”
Commun. ACM
51
,
91
97
(
2008
).
13.
P.
Eastman
,
M. S.
Friedrichs
,
J. D.
Chodera
,
R. J.
Radmer
,
C. M.
Bruns
,
J. P.
Ku
,
K. A.
Beauchamp
,
T. J.
Lane
,
L.-P.
Wang
,
D.
Shukla
,
T.
Tye
,
M.
Houston
,
T.
Stich
,
C.
Klein
,
M. R.
Shirts
, and
V. S.
Pande
, “
OpenMM 4: A reusable, extensible, hardware independent library for high performance molecular simulation
,”
J. Chem. Theory Comput.
9
,
461
469
(
2013
).
14.
A. W.
Götz
,
M. J.
Williamson
,
D.
Xu
,
D.
Poole
,
S.
Le Grand
, and
R. C.
Walker
, “
Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized born
,”
J. Chem. Theory Comput.
8
,
1542
1555
(
2012
).
15.
S.
Pronk
,
S.
Páll
,
R.
Schulz
,
P.
Larsson
,
P.
Bjelkmar
,
R.
Apostolov
,
M. R.
Shirts
,
J. C.
Smith
,
P. M.
Kasson
,
D.
van der Spoel
,
B.
Hess
, and
E.
Lindahl
, “
GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit
,”
Bioinformatics
29
,
845
854
(
2013
).
16.
P. L.
Freddolino
,
C. B.
Harrison
,
Y.
Liu
, and
K.
Schulten
, “
Challenges in protein-folding simulations
,”
Nat. Phys.
6
,
751
758
(
2010
).
17.
T. J.
Lane
,
D.
Shukla
,
K. A.
Beauchamp
, and
V. S.
Pande
, “
To milliseconds and beyond: Challenges in the simulation of protein folding
,”
Curr. Opin. Struct. Biol.
23
,
58
65
(
2013
).
18.
C.
Schütte
,
W.
Huisinga
, and
P.
Deuflhard
,
Transfer Operator Approach to Conformational Dynamics in Biomolecular Systems
(
Springer
,
2001
).
19.
R.
Czerminski
and
R.
Elber
, “
Reaction path study of conformational transitions and helix formation in a tetrapeptide
,”
Proc. Natl. Acad. Sci. U.S.A.
86
,
6963
6967
(
1989
).
20.
D. A.
Evans
and
D. J.
Wales
, “
Folding of the GB1 hairpin peptide from discrete path sampling
,”
J. Chem. Phys.
121
,
1080
1090
(
2004
).
21.
H.
Grubmüller
and
P.
Tavan
, “
Molecular dynamics of conformational substates for a simplified protein model
,”
J. Chem. Phys.
101
,
5047
5057
(
1994
).
22.
A.
Scala
,
L. A. N.
Amaral
, and
M.
Barthélémy
, “
Small-world networks and the conformation space of a short lattice polymer chain
,”
Europhys. Lett.
55
,
594
(
2001
).
23.
F.
Rao
and
A.
Caflisch
, “
The protein folding network
,”
J. Mol. Biol.
342
,
299
306
(
2004
).
24.
A. K.
Faradjian
and
R.
Elber
, “
Computing time scales from reaction coordinates by milestoning
,”
J. Chem. Phys.
120
,
10880
10889
(
2004
).
25.
M.
Andrec
,
A. K.
Felts
,
E.
Gallicchio
, and
R. M.
Levy
, “
Protein folding pathways from replica exchange simulations and a kinetic network model
,”
Proc. Natl. Acad. Sci. U.S.A.
102
,
6801
6806
(
2005
).
26.
J.
Chodera
,
W.
Swope
,
J.
Pitera
, and
K.
Dill
, “
Long-time protein folding dynamics from short-time molecular dynamics simulations
,”
Multiscale Model. Simul.
5
,
1214
1226
(
2006
).
27.
N.-V.
Buchete
and
G.
Hummer
, “
Coarse master equations for peptide folding dynamics
,”
J. Phys. Chem. B
112
,
6057
6069
(
2008
).
28.
A. C.
Pan
and
B.
Roux
, “
Building Markov state models along pathways to determine free energies and rates of transitions
,”
J. Chem. Phys.
129
,
064107
(
2008
).
29.
F.
Noé
,
C.
Schütte
,
E.
Vanden-Eijnden
,
L.
Reich
, and
T. R.
Weikl
, “
Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations
,”
Proc. Natl. Acad. Sci. U.S.A.
106
,
19011
19016
(
2009
).
30.
A.
Berezhkovskii
,
G.
Hummer
, and
A.
Szabo
, “
Reactive flux and folding pathways in network models of coarse-grained protein dynamics
,”
J. Chem. Phys.
130
,
205102
(
2009
).
31.
E.
Weinan
and
E.
Vanden-Eijnden
, “
Towards a theory of transition paths
,”
J. Stat. Phys.
123
,
503
523
(
2006
).
32.
G. R.
Bowman
, “
Improved coarse-graining of Markov state models via explicit consideration of statistical uncertainty
,”
J. Chem. Phys.
137
,
134111
(
2012
).
33.
P.
Deuflhard
,
W.
Huisinga
,
A.
Fischer
, and
C.
Schütte
, “
Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains
,”
Linear Algebra Appl.
315
,
39
59
(
2000
).
34.
J.-H.
Prinz
,
H.
Wu
,
M.
Sarich
,
B.
Keller
,
M.
Senne
,
M.
Held
,
J. D.
Chodera
,
C.
Schütte
, and
F.
Noé
, “
Markov models of molecular kinetics: Generation and validation
,”
J. Chem. Phys.
134
,
174105
(
2011
).
35.
V. S.
Pande
,
K.
Beauchamp
, and
G. R.
Bowman
, “
Everything you wanted to know about Markov state models but were afraid to ask
,”
Methods
52
,
99
105
(
2010
).
36.
K. A.
Beauchamp
,
G. R.
Bowman
,
T. J.
Lane
,
L.
Maibaum
,
I. S.
Haque
, and
V. S.
Pande
, “
MSMBuilder2: Modeling conformational dynamics on the picosecond to millisecond scale
,”
J. Chem. Theory Comput.
7
,
3412
3419
(
2011
).
37.
M.
Senne
,
B.
Trendelkamp-Schroer
,
A. S.
Mey
,
C.
Schütte
, and
F.
Noé
, “
EMMA: A software package for Markov model building and analysis
,”
J. Chem. Theory Comput.
8
,
2223
2238
(
2012
).
38.
Y.
Mu
,
P. H.
Nguyen
, and
G.
Stock
, “
Energy landscape of a small peptide revealed by dihedral angle principal component analysis
,”
Proteins Struct. Funct. Bioinf.
58
,
45
52
(
2005
).
39.
E. H.
Kellogg
,
O. F.
Lange
, and
D.
Baker
, “
Evaluation and optimization of discrete state models of protein folding
,”
J. Phys. Chem. B
116
,
11405
11413
(
2012
).
40.
T.
Zhou
and
A.
Caflisch
, “
Distribution of reciprocal of interatomic distances: A fast structural metric
,”
J. Chem. Theory Comput.
8
,
2930
2937
(
2012
).
41.
R. T.
McGibbon
and
V. S.
Pande
, “
Learning kinetic distance metrics for Markov state models of protein conformational dynamics
,”
J. Chem. Theory Comput.
9
,
2900
2906
(
2013
).
42.
G.
Pérez-Hernández
,
F.
Paul
,
T.
Giorgino
,
G.
De Fabritiis
, and
F.
Noé
, “
Identification of slow molecular order parameters for Markov model construction
,”
J. Chem. Phys.
139
,
015102
(
2013
).
43.
C. R.
Schwantes
and
V. S.
Pande
, “
Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9
,”
J. Chem. Theory Comput.
9
,
2000
2009
(
2013
).
44.
M.
Sarich
,
F.
Noé
, and
C.
Schütte
, “
On the approximation quality of Markov state models
,”
Multiscale Model. Simul.
8
,
1154
1177
(
2010
).
45.
R. T.
McGibbon
,
C. R.
Schwantes
, and
V. S.
Pande
, “
Statistical model selection for Markov models of biomolecular dynamics
,”
J. Phys. Chem. B
118
,
6475
6481
(
2014
).
46.
N.
Djurdjevac
,
M.
Sarich
, and
C.
Schütte
, “
Estimating the eigenvalue error of Markov state models
,”
Multiscale Model. Simul.
10
,
61
81
(
2012
).
47.
G. R.
Bowman
,
V. A.
Voelz
, and
V. S.
Pande
, “
Atomistic folding simulations of the five-helix bundle protein λ6−85
,”
J. Am. Chem. Soc.
133
,
664
667
(
2011
).
48.
T. J.
Lane
,
G. R.
Bowman
,
K.
Beauchamp
,
V. A.
Voelz
, and
V. S.
Pande
, “
Markov state model reveals folding and functional dynamics in ultra-long MD trajectories
,”
J. Am. Chem. Soc.
133
,
18413
18419
(
2011
).
49.
V. A.
Voelz
,
M.
Jäger
,
S.
Yao
,
Y.
Chen
,
L.
Zhu
,
S. A.
Waldauer
,
G. R.
Bowman
,
M.
Friedrichs
,
O.
Bakajin
,
L. J.
Lapidus
et al, “
Slow unfolded-state structuring in Acyl-CoA binding protein folding revealed by simulation and experiment
,”
J. Am. Chem. Soc.
134
,
12565
12577
(
2012
).
50.
M. W.
Peter Deuflhard
, “
Robust perron cluster analysis in conformation dynamics
,”
Linear Algebra Appl.
398
,
161
184
(
2005
).
51.
S.
Bacallado
,
J. D.
Chodera
, and
V.
Pande
, “
Bayesian comparison of Markov models of molecular dynamics with detailed balance constraint
,”
J. Chem. Phys.
131
,
045106
(
2009
).
52.
G. R.
Bowman
,
L.
Meng
, and
X.
Huang
, “
Quantitative comparison of alternative methods for coarse-graining biological networks
,”
J. Chem. Phys.
139
,
121905
(
2013
).
53.
L. J.
Lapidus
,
S.
Acharya
,
C. R.
Schwantes
,
L.
Wu
,
D.
Shukla
,
M.
King
,
S. J.
DeCamp
, and
V. S.
Pande
, “
Complex pathways in folding of protein G explored by simulation and experiment
,”
Biophys. J.
107
,
947
955
(
2014
).
54.
L.
Rabiner
, “
A tutorial on hidden Markov models and selected applications in speech recognition
,”
Proc. IEEE
77
,
257
286
(
1989
).
55.
M.
Stanke
and
S.
Waack
, “
Gene prediction with a hidden Markov model and a new intron submodel
,”
Bioinformatics
19
,
ii215
ii225
(
2003
).
56.
A. P.
Dempster
,
N. M.
Laird
,
D. B.
Rubin
et al, “
Maximum likelihood from incomplete data via the EM algorithm
,”
J. R. Stat. Soc. B
39
,
1
38
(
1977
).
57.
F.
Noé
,
H.
Wu
,
J.-H.
Prinz
, and
N.
Plattner
, “
Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules
,”
J. Chem. Phys.
139
,
184114
(
2013
).
58.
R. T.
McGibbon
,
B.
Ramsundar
,
M. M.
Sultan
,
G.
Kiss
, and
V. S.
Pande
, “
Understanding protein dynamics with L1-regularized reversible hidden Markov models
,” in
Proceedings of the 31st International Conference on Machine Learning
(
Omnipress
,
Beijing, China
,
2014
), Vol.
32
, pp.
1197
1205
.
59.
C. R.
Baiz
,
Y.-S.
Lin
,
C. S.
Peng
,
K. A.
Beauchamp
,
V. A.
Voelz
,
V. S.
Pande
, and
A.
Tokmakoff
, “
A molecular interpretation of 2D IR protein folding experiments with Markov state models
,”
Biophys. J.
106
,
1359
1370
(
2014
).
60.
X.
Huang
,
Y.
Yao
,
G. R.
Bowman
,
J.
Sun
,
L. J.
Guibas
,
G.
Carlsson
, and
V. S.
Pande
, “
Constructing multi-resolution Markov state models (MSMs) to elucidate RNA hairpin folding mechanisms
,” in
Proceedings of the Pacific Symposium on Biocomputing
(
World Scientific Publishing
,
2009
), pp.
228
239
.
61.
I.
Buch
,
T.
Giorgino
, and
G.
De Fabritiis
, “
Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations
,”
Proc. Natl. Acad. Sci. U.S.A.
108
,
10184
10189
(
2011
).
62.
S. K.
Sadiq
,
F.
Noé
, and
G.
De Fabritiis
, “
Kinetic characterization of the critical step in HIV-1 protease maturation
,”
Proc. Natl. Acad. Sci. U.S.A.
109
,
20449
20454
(
2012
).
63.
D. T.
Chalmers
and
D. P.
Behan
, “
The use of constitutively active GPCRs in drug discovery and functional genomics
,”
Nat. Rev. Drug Discovery
1
,
599
608
(
2002
).
64.
D. M.
Rosenbaum
,
V.
Cherezov
,
M. A.
Hanson
,
S. G.
Rasmussen
,
F. S.
Thian
,
T. S.
Kobilka
,
H.-J.
Choi
,
X.-J.
Yao
,
W. I.
Weis
,
R. C.
Stevens
et al, “
GPCR engineering yields high-resolution structural insights into β2-adrenergic receptor function
,”
Science
318
,
1266
1273
(
2007
).
65.
V.
Cherezov
,
D. M.
Rosenbaum
,
M. A.
Hanson
,
S. G.
Rasmussen
,
F. S.
Thian
,
T. S.
Kobilka
,
H.-J.
Choi
,
P.
Kuhn
,
W. I.
Weis
,
B. K.
Kobilka
et al, “
High-resolution crystal structure of an engineered human β2-adrenergic G protein-coupled receptor
,”
Science
318
,
1258
1265
(
2007
).
66.
S. A.
Laporte
,
R. H.
Oakley
,
J.
Zhang
,
J. A.
Holt
,
S. S.
Ferguson
,
M. G.
Caron
, and
L. S.
Barak
, “
The β2-adrenergic receptor/β-arrestin complex recruits the clathrin adaptor AP-2 during endocytosis
,”
Proc. Natl. Acad. Sci. U.S.A.
96
,
3712
3717
(
1999
).
67.
B.
Jordan
,
N.
Trapaidze
,
I.
Gomes
,
R.
Nivarthi
, and
L.
Devi
, “
Oligomerization of opioid receptors with β2-adrenergic receptors: A role in trafficking and mitogen-activated protein kinase activation
,”
Proc. Natl. Acad. Sci. U.S.A.
98
,
343
348
(
2001
).
68.
K. J.
Kohlhoff
,
D.
Shukla
,
M.
Lawrenz
,
G. R.
Bowman
,
D. E.
Konerding
,
D.
Belov
,
R. B.
Altman
, and
V. S.
Pande
, “
Cloud-based simulations on google exacycle reveal ligand modulation of GPCR activation pathways
,”
Nat. Chem.
6
,
15
21
(
2014
).
69.
I.
Buch
,
M. J.
Harvey
,
T.
Giorgino
,
D. P.
Anderson
, and
G.
De Fabritiis
, “
High-throughput all-atom molecular dynamics simulations using distributed computing
,”
J. Chem. Inf. Model.
50
,
397
403
(
2010
).
70.
N.
Singhal
and
V. S.
Pande
, “
Error analysis and efficient sampling in Markovian state models for molecular dynamics
,”
J. Chem. Phys.
123
,
204909
(
2005
).
71.
J. K.
Weber
and
V. S.
Pande
, “
Characterization and rapid sampling of protein folding Markov state model topologies
,”
J. Chem. Theory Comput.
7
,
3405
3411
(
2011
).
72.
S.
Kalyaanamoorthy
and
Y.-P. P.
Chen
, “
Structure-based drug design to augment hit discovery
,”
Drug Discovery Today
16
,
831
839
(
2011
).
73.
G. R.
Bowman
and
P. L.
Geissler
, “
Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites
,”
Proc. Natl. Acad. Sci. U.S.A.
109
,
11681
11686
(
2012
).
74.
F.
Noé
and
F.
Nüske
, “
A variational approach to modeling slow processes in stochastic dynamical systems
,”
Multiscale Model. Simul.
11
,
635
655
(
2013
).
75.
F.
Nüske
,
B. G.
Keller
,
G.
Pérez-Hernández
,
A. S. J. S.
Mey
, and
F.
Noé
, “
Variational approach to molecular kinetics
,”
J. Chem. Theory Comput.
10
,
1739
1752
(
2014
).
76.
M.
Karplus
and
J. A.
McCammon
, “
Molecular dynamics simulations of biomolecules
,”
Nat. Struct. Mol. Biol.
9
,
646
652
(
2002
).
77.
M. S.
Friedrichs
,
P.
Eastman
,
V.
Vaidyanathan
,
M.
Houston
,
S.
Legrand
,
A. L.
Beberg
,
D. L.
Ensign
,
C. M.
Bruns
, and
V. S.
Pande
, “
Accelerating molecular dynamic simulation on graphics processing units
,”
J. Comput. Chem.
30
,
864
872
(
2009
).
78.
G. R.
Bowman
, “
A tutorial on building Markov state models with MSMBuilder and coarse-graining them with BACE
,” in
Protein Dynamics
, edited by
D. R.
Livesay
(
Springer
,
2014
), pp.
141
158
.