The accurate definition of suitable metastable conformational states is fundamental for the construction of a Markov state model describing biomolecular dynamics. Following dimensionality reduction of a molecular dynamics trajectory, these microstates can be generated by a recently proposed density-based geometrical clustering algorithm [F. Sittel and G. Stock, J. Chem. Theory Comput. 12, 2426 (2016)], which by design cuts the resulting clusters at the energy barriers and allows for a data-based identification of all parameters. Nevertheless, projection artifacts due to the inevitable restriction to a low-dimensional space, combined with insufficient sampling, often lead to a misclassification of sampled points in the transition regions. This typically causes intrastate fluctuations to be mistaken for interstate transitions, which leads to artificially short lifetimes of the metastable states. As a simple but effective remedy, dynamical coring requires that the trajectory spend a minimum time in the new state for the transition to be counted. Adopting molecular dynamics simulations of two well-established biomolecular systems (alanine dipeptide and villin headpiece), dynamical coring is shown to considerably improve the Markovianity of the resulting metastable states, which is demonstrated by Chapman-Kolmogorov tests and increased implied time scales of the Markov model. Providing high structural and temporal resolution, the combination of density-based clustering and dynamical coring is particularly suited to describe the complex structural dynamics of unfolded biomolecules.
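The dynamical coring rule stated above (a transition counts only if the trajectory stays in the new state for a minimum time) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `dynamical_coring`, the parameter `min_frames` (the minimum time expressed in frames), and the choices that the first frame defines the initial state and that a too-short visit at the end of the trajectory is assigned to the previous state are all assumptions made for illustration.

```python
import numpy as np

def dynamical_coring(states, min_frames):
    """Hypothetical helper: relabel a discrete state trajectory so that a
    transition is accepted only if the new state persists for at least
    `min_frames` consecutive frames; shorter visits keep the previous label."""
    states = np.asarray(states)
    cored = states.copy()
    n = len(states)
    current = states[0]  # assumption: the first frame defines the initial state
    i = 0
    while i < n:
        if states[i] == current:
            i += 1
            continue
        # Candidate transition: find the end of the uninterrupted visit.
        j = i
        while j < n and states[j] == states[i]:
            j += 1
        if j - i >= min_frames:
            current = states[i]  # visit is long enough, accept the transition
        cored[i:j] = current     # otherwise the frames stay in the old state
        i = j
    return cored

# Toy trajectory: the single-frame excursions are treated as fluctuations.
raw = [1, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2]
print(dynamical_coring(raw, min_frames=3).tolist())
# prints [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

In this toy example the short excursions to the other state are absorbed into the surrounding state, so only one transition remains, which is the mechanism by which coring lengthens the apparent state lifetimes.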

67. This can be tested by projecting the data onto one or two coordinates and comparing the original MD distribution to the number of neighbors the points have within R (normalized to 1) as calculated for this projection. If R is chosen too large, the latter yields blurred features such that details in the point distribution cannot be recovered.
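The check described in this note can be sketched for a one-dimensional projection as follows. The data are synthetic (two Gaussian peaks standing in for a projected MD distribution), the helper name `neighbor_fraction_1d` is hypothetical, and "normalized to 1" is interpreted here as scaling the largest neighbor count to 1; all of these are assumptions rather than details taken from the paper.

```python
import numpy as np

def neighbor_fraction_1d(x, radius):
    """Hypothetical helper: for every point, count the points within `radius`
    (the point itself included) and scale so that the largest count is 1."""
    x = np.asarray(x, dtype=float)
    xs = np.sort(x)
    upper = np.searchsorted(xs, x + radius, side="right")  # points <= x + R
    lower = np.searchsorted(xs, x - radius, side="left")   # points <  x - R
    counts = upper - lower
    return counts / counts.max()

# Synthetic bimodal sample plus one point placed on the barrier at x = 0.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.2, 5000),
                    rng.normal(1.0, 0.2, 5000),
                    [0.0]])
barrier = len(x) - 1

# Reference: histogram of the projected data, scaled to a maximum of 1.
hist, edges = np.histogram(x, bins=100)
hist = hist / hist.max()
hist_at_barrier = hist[np.searchsorted(edges, 0.0, side="right") - 1]

small = neighbor_fraction_1d(x, 0.1)[barrier]  # radius below the peak spacing
large = neighbor_fraction_1d(x, 1.0)[barrier]  # radius comparable to it
print(hist_at_barrier, small, large)
```

With the small radius the neighbor count at the barrier stays close to the histogram value, whereas with the large radius the barrier point collects almost as many neighbors as the peaks, so the two-peak structure is blurred away.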

68. We note that, due to the low dimension in the case of alanine dipeptide (AD), there is a large number of points that are geometrically isolated from the main cluster of points, e.g., the αL-helical region. However, this region forms a cluster of more than 0.1% of the data and is therefore not defined as noise.

