Kinetic models such as Markov state models (MSMs) are widely used to analyze atomistic biomolecular simulations, and their construction has seen many notable algorithmic advances in recent years. The variational principle has opened the door to a nearly fully automated toolkit for selecting models that predict long-time-scale kinetics from molecular dynamics simulations. However, one step of the pipeline remains unoptimized: choosing the features, or collective variables, from which the model is constructed. To build intuitive models, these collective variables are usually chosen to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. Previous approaches for evaluating a feature choice rely on constructing a full MSM, which requires additional hyperparameters to be chosen and therefore makes the framework computationally expensive. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of 12 fast-folding protein simulations and show that our procedure leads to more efficient model selection.
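
The abstract does not spell out how a feature set is scored without building an MSM. As an illustration only, the sketch below assumes a VAMP-2-style score of the kind developed in Ref. 38 and touched on in notes 54, 56, and 61: the features are whitened so that the instantaneous covariances become identity matrices, and the score is one plus the squared Frobenius norm of the whitened time-lagged covariance. The function name `vamp2_score`, its parameters, and the synthetic data are hypothetical and not taken from the paper; this is not the authors' implementation.

```python
import numpy as np


def vamp2_score(features, lag, eps=1e-10):
    """Hypothetical VAMP-2-style score for a (n_frames, n_features) array.

    `lag` is the lag time in frames. The constant 1.0 added at the end
    accounts for the stationary process, which is removed by mean-centering.
    """
    x0 = features[:-lag]          # instantaneous frames
    x1 = features[lag:]           # time-lagged frames
    x0 = x0 - x0.mean(axis=0)
    x1 = x1 - x1.mean(axis=0)
    n = x0.shape[0]

    c00 = x0.T @ x0 / n           # instantaneous covariance
    c11 = x1.T @ x1 / n           # covariance of the lagged frames
    c01 = x0.T @ x1 / n           # time-lagged cross-covariance

    def whitening(c):
        # Columns of the result map the data to unit covariance;
        # near-singular directions (eigenvalue <= eps) are discarded.
        w, v = np.linalg.eigh(c)
        keep = w > eps
        return v[:, keep] / np.sqrt(w[keep])

    k_whitened = whitening(c00).T @ c01 @ whitening(c11)
    singular_values = np.linalg.svd(k_whitened, compute_uv=False)
    # Squared Frobenius norm = sum of squared singular values.
    return 1.0 + float(np.sum(singular_values ** 2))


# Usage on synthetic data: a slowly decorrelating feature versus white noise.
rng = np.random.default_rng(0)
n_frames = 5000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
noise = rng.normal(size=n_frames)

print("slow feature :", vamp2_score(slow[:, None], lag=10))
print("white noise  :", vamp2_score(noise[:, None], lag=10))
```

Under these assumptions, competing feature sets can be ranked by comparing their scores at a common lag time, with higher scores indicating features that capture more of the slow dynamics. In practice the score would be cross-validated on held-out trajectories.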

1. P. Hänggi and P. Talkner, “Memory index of first-passage time: A simple measure of non-Markovian character,” Phys. Rev. Lett. 51, 2242 (1983).
2. D. Shalloway, “Macrostates of classical stochastic systems,” J. Chem. Phys. 105, 9986–10007 (1996).
3. R. Du, V. S. Pande, A. Y. Grosberg, T. Tanaka, and E. S. Shakhnovich, “On the transition coordinate for protein folding,” J. Chem. Phys. 108, 334–350 (1998).
4. M. A. Rohrdanz, W. Zheng, and C. Clementi, “Discovering mountain passes via torchlight: Methods for the definition of reaction coordinates and pathways in complex macromolecular reactions,” Annu. Rev. Phys. Chem. 64, 295–316 (2013).
5. J.-H. Prinz, J. D. Chodera, and F. Noé, “Spectral rate theory for two-state kinetics,” Phys. Rev. X 4, 011020 (2014).
6. F. Noé and C. Clementi, “Collective variables for the study of long-time kinetics from molecular trajectories: Theory and methods,” Curr. Opin. Struct. Biol. 43, 141–147 (2017).
7. C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, “A direct approach to conformational dynamics based on hybrid Monte Carlo,” J. Comput. Phys. 151, 146–168 (1999).
8. W. C. Swope, J. W. Pitera, and F. Suits, “Describing protein folding kinetics by molecular dynamics simulations. 1 Theory,” J. Phys. Chem. B 108, 6571–6581 (2004).
9. F. Noé, I. Horenko, C. Schütte, and J. C. Smith, “Hierarchical analysis of conformational dynamics in biomolecules: Transition networks of metastable states,” J. Chem. Phys. 126, 155102 (2007).
10. J. D. Chodera, N. Singhal, V. S. Pande, K. A. Dill, and W. C. Swope, “Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics,” J. Chem. Phys. 126, 155101 (2007).
11. N. Singhal, C. D. Snow, and V. S. Pande, “Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a tryptophan zipper beta hairpin,” J. Chem. Phys. 121, 415–425 (2004).
12. J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte, and F. Noé, “Markov models of molecular kinetics: Generation and validation,” J. Chem. Phys. 134, 174105 (2011).
13. B. E. Husic and V. S. Pande, “Markov state models: From an art to a science,” J. Am. Chem. Soc. 140, 2386–2396 (2018).
14. S. Sriraman, I. G. Kevrekidis, and G. Hummer, “Coarse master equation from Bayesian analysis of replica molecular dynamics simulations,” J. Phys. Chem. B 109, 6479–6484 (2005).
15. N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics,” J. Phys. Chem. B 112, 6057–6069 (2008).
16. M. A. Rohrdanz, W. Zheng, M. Maggioni, and C. Clementi, “Determination of reaction coordinates via locally scaled diffusion map,” J. Chem. Phys. 134, 124116 (2011).
17. B. Peters, “Using the histogram test to quantify reaction coordinate error,” J. Chem. Phys. 125, 241101 (2006).
18. W. E and E. Vanden-Eijnden, “Towards a theory of transition paths,” J. Stat. Phys. 123, 503 (2006).
19. G. Hummer, “Position-dependent diffusion coefficients and free energies from Bayesian analysis of equilibrium and replica molecular dynamics simulations,” New J. Phys. 7, 34 (2005).
20. J. D. Chodera, W. C. Swope, J. W. Pitera, and K. A. Dill, “Long-time protein folding dynamics from short-time molecular dynamics simulations,” Multiscale Model. Simul. 5, 1214–1226 (2006).
21. S. Kube and M. Weber, “A coarse graining method for the identification of transition rates between molecular conformations,” J. Chem. Phys. 126, 024103 (2007).
22. A. Altis, P. H. Nguyen, R. Hegger, and G. Stock, “Dihedral angle principal component analysis of molecular dynamics simulations,” J. Chem. Phys. 126, 244111 (2007).
23. F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich, and T. R. Weikl, “Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations,” Proc. Natl. Acad. Sci. 106, 19011–19016 (2009).
24. G. Jayachandran, M. R. Shirts, S. Park, and V. S. Pande, “Parallelized-over-parts computation of absolute binding free energy with docking and molecular dynamics,” J. Chem. Phys. 125, 084901 (2006).
25. S. Yang and B. Roux, “Src kinase conformational activation: Thermodynamics, pathways, and mechanisms,” PLoS Comput. Biol. 4, e1000047 (2008).
26. S. Muff and A. Caflisch, “Kinetic analysis of molecular dynamics simulations reveals changes in the denatured state and switch of folding pathways upon single-point mutation of a β-sheet miniprotein,” Proteins: Struct., Funct., Bioinf. 70, 1185–1195 (2008).
27. M. Sarich, F. Noé, and C. Schütte, “On the approximation quality of Markov state models,” Multiscale Model. Simul. 8, 1154–1177 (2010).
28. F. Noé and F. Nüske, “A variational approach to modeling slow processes in stochastic dynamical systems,” Multiscale Model. Simul. 11, 635–655 (2013).
29. F. Nüske, B. G. Keller, G. Pérez-Hernández, A. S. Mey, and F. Noé, “Variational approach to molecular kinetics,” J. Chem. Theory Comput. 10, 1739–1752 (2014).
30. L. Molgedey and H. G. Schuster, “Separation of a mixture of independent signals using time delayed correlations,” Phys. Rev. Lett. 72, 3634 (1994).
31. G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, “Identification of slow molecular order parameters for Markov model construction,” J. Chem. Phys. 139, 015102 (2013).
32. F. Noé and C. Clementi, “Kinetic distance and kinetic maps from molecular dynamics simulation,” J. Chem. Theory Comput. 11, 5002–5011 (2015).
33. F. Noé, R. Banisch, and C. Clementi, “Commute maps: Separating slowly mixing molecular configurations for kinetic modeling,” J. Chem. Theory Comput. 12, 5620–5630 (2016).
34. G. Pérez-Hernández and F. Noé, “Hierarchical time-lagged independent component analysis: Computing slow modes and reaction coordinates for large molecular systems,” J. Chem. Theory Comput. 12, 6118–6129 (2016).
35. C. R. Schwantes and V. S. Pande, “Modeling molecular kinetics with tICA and the kernel trick,” J. Chem. Theory Comput. 11, 600–608 (2015).
36. S. Harmeling, A. Ziehe, M. Kawanabe, and K.-R. Müller, “Kernel-based nonlinear blind source separation,” Neural Comput. 15, 1089–1124 (2003).
37. H. Wu, F. Nüske, F. Paul, S. Klus, P. Koltai, and F. Noé, “Variational Koopman models: Slow collective variables and molecular kinetics from short off-equilibrium simulations,” J. Chem. Phys. 146, 154104 (2017).
38. H. Wu and F. Noé, “Variational approach for learning Markov processes from time series data,” preprint arXiv:1707.04659 (2017).
39. R. T. McGibbon and V. S. Pande, “Variational cross-validation of slow dynamical modes in molecular kinetics,” J. Chem. Phys. 142, 124105 (2015).
40. M. K. Scherer, B. Trendelkamp-Schroer, F. Paul, G. Pérez-Hernández, M. Hoffmann, N. Plattner, C. Wehmeyer, J.-H. Prinz, and F. Noé, “PyEMMA 2: A software package for estimation, validation, and analysis of Markov models,” J. Chem. Theory Comput. 11, 5525–5542 (2015).
41. B. E. Husic, R. T. McGibbon, M. M. Sultan, and V. S. Pande, “Optimized parameter selection reveals trends in Markov state models for protein folding,” J. Chem. Phys. 145, 194103 (2016).
42. A. Mardt, L. Pasquali, H. Wu, and F. Noé, “VAMPnets for deep learning of molecular kinetics,” Nat. Commun. 9, 5 (2018).
43. J. Kubelka, J. Hofrichter, and W. A. Eaton, “The protein folding speed limit,” Curr. Opin. Struct. Biol. 14, 76–88 (2004).
44. K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E. Shaw, “How fast-folding proteins fold,” Science 334, 517–520 (2011).
45. K. A. Beauchamp, R. McGibbon, Y.-S. Lin, and V. S. Pande, “Simple few-state models reveal hidden complexity in protein folding,” Proc. Natl. Acad. Sci. 109, 17807–17813 (2012).
46. A. Dickson and C. L. Brooks III, “Native states of fast-folding proteins are kinetic traps,” J. Am. Chem. Soc. 135, 4729–4734 (2013).
47. J. K. Weber, R. L. Jack, and V. S. Pande, “Emergence of glass-like behavior in Markov state models of protein folding dynamics,” J. Am. Chem. Soc. 135, 5501–5504 (2013).
48. F. Paul, H. Wu, M. Vossel, B. L. de Groot, and F. Noé, “Identification of kinetic order parameters for non-equilibrium dynamics,” J. Chem. Phys. 150, 164120 (2019).
49. C. Schütte and M. Sarich, “A critical appraisal of Markov state models,” Eur. Phys. J. 224, 2445–2462 (2015).
50. For more details on this estimation, see Ref. 37, Sec. II C.
51. Reference 38, Theorem 1.
52. To see the equivalence, as an intermediate step write $E_{t+\tau}(\xi_i) = p_{ij}(\tau)\,E_t(\xi_j)$.
53. While the Koopman reweighting estimator introduced in Ref. 37 removes bias, it has a relatively large variance.
54. See Ref. 38, Appendix F, for details. When the whitening as suggested in Ref. 38 is used, $C_{00}$ and $C_{11}$ become identity matrices when calculated from the whitened data.
55. E. K. Gross, L. N. Oliveira, and W. Kohn, “Rayleigh-Ritz variational principle for ensembles of fractionally occupied states,” Phys. Rev. A 37, 2805 (1988).
56. When the stationary process is modeled as in Ref. 38, the score is bounded by $m + 1$, where $m$ is the number of dynamical (i.e., nonstationary) processes scored.
57. The folded structures were chosen visually, before analysis, to replicate a naïve choice of reference frame. For WW domain, residues 5–30 were used for the aligned Cartesian coordinate feature, but the results are comparable for the full system.
58. A. Shrake and J. Rupley, “Environment and exposure to solvent of protein atoms. Lysozyme and insulin,” J. Mol. Biol. 79, 351–371 (1973).
59. R. T. McGibbon, K. A. Beauchamp, M. P. Harrigan, C. Klein, J. M. Swails, C. X. Hernández, C. R. Schwantes, L.-P. Wang, T. J. Lane, and V. S. Pande, “MDTraj: A modern open library for the analysis of molecular dynamics trajectories,” Biophys. J. 109, 1528–1532 (2015).
60. T. F. Chan, G. H. Golub, and R. J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances,” in COMPSTAT 1982 5th Symposium held at Toulouse 1982 (Springer, 1982), pp. 30–41.
61. This is because the Frobenius norm is equivalent to the $r$-Schatten norm for $r = 2$; see Ref. 38, Sec. 3.2.
62. B. Trendelkamp-Schroer, H. Wu, F. Paul, and F. Noé, “Estimation and uncertainty of reversible Markov models,” J. Chem. Phys. 143, 174101 (2015).
63. C. Spearman, “The proof and measurement of association between two things,” Am. J. Psychol. 15, 72–101 (1904).
64. W. Humphrey, A. Dalke, and K. Schulten, “VMD: Visual molecular dynamics,” J. Mol. Graph. 14, 33–38 (1996).
65. C. R. Schwantes, D. Shukla, and V. S. Pande, “Markov state models and tICA reveal a nonnative folding nucleus in simulations of NuG2,” Biophys. J. 110, 1716–1719 (2016).
66. We also constructed MSMs with cross-validated state decompositions and still observed this time-scale.
67. H. Wan, G. Zhou, and V. A. Voelz, “A maximum-caliber approach to predicting perturbed folding kinetics due to mutations,” J. Chem. Theory Comput. 12, 5768–5776 (2016).
68. B. E. Husic and V. S. Pande, “Ward clustering improves cross-validated Markov state models of protein folding,” J. Chem. Theory Comput. 13, 963–967 (2017).
69. G. Zhou, G. A. Pantelopulos, S. Mukherjee, and V. A. Voelz, “Bridging microscopic and macroscopic mechanisms of p53-MDM2 binding with kinetic network models,” Biophys. J. 113, 785–793 (2017).
70. A. M. Razavi, G. Khelashvili, and H. Weinstein, “A Markov state-based quantitative kinetic model of sodium release from the dopamine transporter,” Sci. Rep. 7, 40076 (2017).
71. P. V. Banushkina and S. V. Krivov, “Nonparametric variational optimization of reaction coordinates,” J. Chem. Phys. 143, 184108 (2015).
72. T. Krivobokova, R. Briones, J. S. Hub, A. Munk, and B. L. de Groot, “Partial least-squares functional mode analysis: Application to the membrane proteins AQP1, Aqy1, and CLC-ec1,” Biophys. J. 103, 786–796 (2012).
73. G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, “Progress and challenges in the automated construction of Markov state models for full protein systems,” J. Chem. Phys. 131, 124101 (2009).
