Bottom-up coarse-grained (CG) molecular dynamics models are parameterized using complex effective Hamiltonians. These models are typically optimized to approximate high-dimensional data from atomistic simulations. However, human validation of these models is often limited to low-dimensional statistics that do not necessarily distinguish the CG model from the atomistic simulations it is meant to reproduce. We propose that classification can be used to variationally estimate high-dimensional error and that explainable machine learning can help convey this information to scientists. We demonstrate this approach using Shapley additive explanations and two CG protein models. This framework may also be valuable for ascertaining whether allosteric effects at the atomistic level are accurately propagated to a CG model.
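The central idea can be sketched as follows: train a classifier to tell CG samples from mapped atomistic samples, then read a divergence estimate off its held-out loss. This is a minimal sketch under stated assumptions, not the authors' implementation. The feature matrices are synthetic stand-ins for per-configuration CG features (for example, inter-site distances), plain logistic regression stands in for whatever classifier is used in practice, and the Jensen-Shannon divergence (JSD) is assumed as the error measure. With equal class counts, the best achievable expected log loss equals log 2 minus the JSD between the two distributions. Any trained classifier therefore gives an estimate of a lower bound, and a better classifier tightens it. The helper `linear_shap` is hypothetical. It uses the closed-form Shapley values of a linear log-odds model under a feature-independence assumption in place of a SHAP library.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression by full-batch gradient descent on the mean log loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # last weight is the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def log_odds(w, X):
    return X @ w[:-1] + w[-1]


def mean_log_loss(y, logit):
    # Numerically stable binary cross-entropy in terms of the logit:
    # loss = log(1 + e^z) - y * z
    return np.mean(np.logaddexp(0.0, logit) - y * logit)


def jsd_lower_bound(X_ref, X_cg, seed=0):
    """Estimate a variational lower bound on JSD(P_ref, P_cg) in nats.

    With equal class counts, any classifier's expected held-out log loss is at
    least log 2 - JSD, so log 2 - (test log loss) estimates a lower bound.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X_ref), len(X_cg))
    h = n // 2
    # Subsample both ensembles to n rows so train and test sets stay balanced.
    ref = X_ref[rng.permutation(len(X_ref))[:n]]
    cg = X_cg[rng.permutation(len(X_cg))[:n]]
    X_tr = np.vstack([ref[:h], cg[:h]])
    X_te = np.vstack([ref[h:], cg[h:]])
    y_tr = np.concatenate([np.zeros(h), np.ones(h)])  # 0 = reference, 1 = CG
    y_te = np.concatenate([np.zeros(n - h), np.ones(n - h)])
    w = fit_logistic(X_tr, y_tr)
    loss = mean_log_loss(y_te, log_odds(w, X_te))
    return max(0.0, float(np.log(2.0) - loss)), w


def linear_shap(w, X, X_background):
    """Exact Shapley values of the log-odds for a linear model with independent features."""
    return w[:-1] * (X - X_background.mean(axis=0))


# Synthetic stand-ins: rows are configurations, columns are CG features.
rng = np.random.default_rng(1)
X_ref = rng.normal(size=(2000, 2))                        # "atomistic, mapped to CG"
X_cg = rng.normal(size=(2000, 2)) + np.array([1.0, 0.0])  # CG model biased in feature 0
bound, w = jsd_lower_bound(X_ref, X_cg)
phi = linear_shap(w, X_cg, np.vstack([X_ref, X_cg]))
print(f"JSD lower-bound estimate: {bound:.3f} nats (max {np.log(2):.3f})")
print("mean |SHAP| per feature:", np.abs(phi).mean(axis=0))
```

Features with a large mean |SHAP| value are the ones the classifier relies on to separate the two ensembles. That is the kind of information the approach aims to convey to scientists. A more expressive classifier, such as gradient-boosted trees, would generally tighten the bound, at the cost of a less trivial explanation step.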

