Gradient-domain machine learning (GDML) is an accurate and efficient approach for learning a molecular potential and the associated force field based on the kernel ridge regression algorithm. Here, we demonstrate its application to learning an effective coarse-grained (CG) model from all-atom simulation data in a sample-efficient manner. The CG force field is learned by following the thermodynamic consistency principle, i.e., by minimizing the error between the predicted CG forces and the all-atom mean forces in the CG coordinates. Solving this problem with GDML directly is impractical because coarse-graining requires averaging over many training data points, which leads to prohibitive memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative: using ensemble learning and stratified sampling, we devise a 2-layer training scheme that enables GDML to learn an effective CG model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a CG variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and comparably high accuracy when the training set is sufficiently large.
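The scheme described above can be illustrated with a minimal sketch, not the authors' actual sGDML implementation: a first layer of small kernel ridge regression models, each trained by force matching on a stratified batch of instantaneous all-atom forces projected onto CG coordinates, and a second layer that averages their predictions so that the noise around the mean force cancels. All class names, the toy harmonic system, and the hyperparameters (`sigma`, `lam`, the number of strata) are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # pairwise Gaussian kernel between configuration feature vectors
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class KRRForceModel:
    """First-layer learner (illustrative stand-in for one GDML model):
    ridge regression from CG coordinates to instantaneous forces."""
    def __init__(self, sigma=1.0, lam=1e-3):
        self.sigma, self.lam = sigma, lam
    def fit(self, X, F):
        K = gaussian_kernel(X, X, self.sigma)
        self.X = X
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), F)
        return self
    def predict(self, X):
        return gaussian_kernel(X, self.X, self.sigma) @ self.alpha

def ensemble_predict(models, X):
    # Second layer: the average over batch models approximates the mean
    # force, since instantaneous forces fluctuate around it.
    return np.mean([m.predict(X) for m in models], axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 2))        # toy CG coordinates
F_true = -2.0 * X                                 # mean force of a harmonic well
F_noisy = F_true + rng.normal(0.0, 0.5, X.shape)  # instantaneous forces

# Stratified sampling: sort along one coordinate and deal points out
# round-robin, so every batch covers all regions of the landscape.
order = np.argsort(X[:, 0])
models = [KRRForceModel(sigma=0.5).fit(X[idx], F_noisy[idx])
          for idx in (order[i::10] for i in range(10))]

err = np.abs(ensemble_predict(models, X) - F_true).mean()
```

In this toy setting the ensemble average recovers the mean force of the harmonic well well below the noise level of any single instantaneous-force sample, which is the point of the second layer: each small model stays cheap to train and store, while the averaging performs the thermodynamic-consistency average that a single monolithic kernel model would need an intractably large kernel matrix to achieve.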

