Free energies govern the behavior of soft and liquid matter, and improving their predictions could have a large impact on the development of drugs, electrolytes, or homogeneous catalysts. Unfortunately, it is challenging to devise an accurate description of the effects governing solvation, such as hydrogen bonding, van der Waals interactions, or conformational sampling. We present a Free energy Machine Learning (FML) model that is applicable throughout chemical compound space and is based on a representation that employs Boltzmann averages to account for an approximate sampling of configurational space. On the FreeSolv database, FML’s out-of-sample prediction errors for experimental hydration free energies decay systematically with training set size, and experimental uncertainty (0.6 kcal/mol) is reached after training on 490 molecules (80% of FreeSolv). The corresponding FML model errors are on par with state-of-the-art physics-based approaches. To generate the input representation for a new query compound, FML requires short, approximate molecular dynamics runs. We showcase its usefulness through an analysis of solvation free energies for 116k organic molecules (all force-field compatible molecules in the QM9 database), identifying the most and least solvated systems and rediscovering quasi-linear structure–property relationships in terms of simple descriptors such as hydrogen-bond donors, number of NH or OH groups, number of oxygen atoms in hydrocarbons, and number of heavy atoms. FML’s accuracy is maximal when the molecular dynamics temperature used to generate the averaged input representations of the training compounds is the same as that used for the query compounds. The prediction error converges rapidly with respect to the sampling time used for the representation.
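The idea of an ensemble-averaged representation can be illustrated with a minimal sketch. It is not the authors' implementation: the per-snapshot feature vectors are synthetic random numbers, the regressor is a plain Gaussian-kernel ridge regression chosen here as an assumption, and the names `ensemble_average`, `gaussian_kernel`, `fit_krr`, and `predict_krr` are hypothetical. The only point it shows is that, for snapshots drawn from a constant-temperature trajectory, a simple mean over snapshots estimates the Boltzmann average, and that one averaged vector per molecule can then be fed to a standard regression model.

```python
import numpy as np

def ensemble_average(snapshot_reps):
    """Mean of per-snapshot feature vectors along the trajectory axis.

    For snapshots sampled from a canonical (constant-T) simulation, this
    arithmetic mean is an estimate of the Boltzmann average.
    """
    return np.mean(np.asarray(snapshot_reps, dtype=float), axis=0)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def fit_krr(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_query, sigma):
    """Predict labels for query rows from the trained weights."""
    return gaussian_kernel(X_query, X_train, sigma) @ alpha

# Synthetic stand-in data (assumption): 40 "molecules", 50 snapshots each,
# 5 features per snapshot, fluctuating around a molecule-specific center.
rng = np.random.default_rng(0)
n_mol, n_snap, dim = 40, 50, 5
centers = rng.normal(size=(n_mol, dim))
snapshots = centers[:, None, :] + 0.1 * rng.normal(size=(n_mol, n_snap, dim))

# One averaged representation per molecule; the label is a toy property.
X = np.array([ensemble_average(s) for s in snapshots])
y = centers.sum(axis=1)

# Train on the first 30 molecules, evaluate on the remaining 10.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]
alpha = fit_krr(X_train, y_train, sigma=2.0, lam=1e-6)
y_pred = predict_krr(X_train, alpha, X_test, sigma=2.0)
mae = np.mean(np.abs(y_pred - y_test))
```

Repeating the last four lines for increasing training set sizes and recording `mae` each time would give a learning curve of the kind the abstract refers to, here only for the toy data.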

