Markov State Models (MSMs) have become a popular approach for studying the conformational dynamics of complex biological systems in recent years. Built from a large number of short molecular dynamics simulation trajectories, an MSM can predict the long-timescale dynamics of complex systems. However, to achieve Markovianity, an MSM often contains hundreds or thousands of states (microstates), which hinders human interpretation of the underlying mechanism. One way to reduce the number of states is to lump kinetically similar microstates together, thereby coarse-graining the microstates into macrostates. In this work, we introduce a probabilistic lumping algorithm, the Gibbs lumping algorithm, which uses Bayesian inference to assign a probability to any given kinetic lumping. In our algorithm, the transitions among kinetically distinct macrostates are modeled by Poisson processes, which reflect the separation of time scales in the underlying free energy landscape of biomolecules. Furthermore, to facilitate the search for the optimal kinetic lumping (i.e., the lumped model with the highest probability), we introduce a Gibbs sampling algorithm. To demonstrate the power of our new method, we apply it to three systems: a 2D potential, alanine dipeptide, and a WW protein domain. In comparison with six other popular lumping algorithms, we show that our method consistently produces the lumped macrostate model with the highest probability as well as the largest metastability. We anticipate that our Gibbs lumping algorithm holds great promise for wide application in investigating conformational changes in biological macromolecules.
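As an illustration of what lumping microstates into macrostates involves, the following minimal sketch sums a microstate transition count matrix into macrostate counts and scores a lumping by its metastability, taken here as the sum of the macrostate self-transition probabilities. It is not the Gibbs lumping algorithm itself: the function names `lump_counts` and `metastability`, the toy count matrix, and the two candidate assignments are all assumptions made for illustration, and the Bayesian scoring and Gibbs sampling steps are omitted.

```python
def lump_counts(counts, assignment, n_macro):
    """Sum microstate transition counts into macrostate transition counts.

    counts[i][j]  : observed transitions from microstate i to microstate j
    assignment[i] : index of the macrostate that microstate i belongs to
    """
    macro = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            macro[assignment[i]][assignment[j]] += c
    return macro


def metastability(macro_counts):
    """Sum of self-transition probabilities of the row-normalized macrostate matrix."""
    total = 0.0
    for a, row in enumerate(macro_counts):
        total += row[a] / sum(row)
    return total


# Hypothetical 4-microstate system: states 0 and 1 interconvert quickly,
# as do states 2 and 3, while transitions between the two pairs are rare.
counts = [
    [90, 10, 0, 0],
    [10, 88, 2, 0],
    [0, 2, 88, 10],
    [0, 0, 10, 90],
]

kinetic_lumping = [0, 0, 1, 1]  # groups the fast-interconverting pairs
mixed_lumping = [0, 1, 0, 1]    # splits each pair across macrostates

q_kinetic = metastability(lump_counts(counts, kinetic_lumping, 2))
q_mixed = metastability(lump_counts(counts, mixed_lumping, 2))
print(q_kinetic, q_mixed)
```

In this toy example, the lumping that groups kinetically similar microstates gives a metastability of about 1.98, compared with about 1.78 for the lumping that mixes them, so the kinetically consistent grouping is preferred under this criterion.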

