Policy-guided Monte Carlo is an adaptive method to simulate classical interacting systems. It adjusts the proposal distribution of the Metropolis–Hastings algorithm to maximize the sampling efficiency, using a formalism inspired by reinforcement learning. In this work, we first extend the policy-guided method to deal with a general state space, comprising, for instance, both discrete and continuous degrees of freedom, and then apply it to a few paradigmatic models of glass-forming mixtures. We assess the efficiency of a set of physically inspired moves whose proposal distributions are optimized through on-policy learning. Compared to conventional Monte Carlo methods, the optimized proposals are two orders of magnitude faster for an additive soft sphere mixture but yield a much more limited speed-up for the well-studied Kob–Andersen model. We discuss the current limitations of the method and suggest possible ways to improve it.
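As a rough illustration of what adjusting the proposal distribution of a Metropolis–Hastings sampler means, the sketch below tunes the width of a Gaussian random-walk proposal on the fly. It is a minimal toy under stated assumptions, not the policy-guided method itself. The one-dimensional Gaussian target, the names `log_target` and `adaptive_metropolis`, the target acceptance rate of 0.44, and the Robbins–Monro update are all illustrative choices. In particular, the acceptance-rate rule stands in for the policy-gradient update described above.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def adaptive_metropolis(n_steps, sigma=0.1, target_accept=0.44, seed=0):
    """Random-walk Metropolis with a proposal width tuned during the run.

    The Robbins-Monro update of log(sigma) is an assumed stand-in for a
    policy-gradient update; it is not the algorithm of the paper.
    """
    rng = random.Random(seed)
    x = 0.0
    log_sigma = math.log(sigma)
    samples = []
    n_accepted = 0
    for t in range(1, n_steps + 1):
        # Symmetric Gaussian proposal: the Hastings ratio reduces to the
        # ratio of target densities.
        x_new = x + rng.gauss(0.0, math.exp(log_sigma))
        alpha = math.exp(min(0.0, log_target(x_new) - log_target(x)))
        if rng.random() < alpha:
            x = x_new
            n_accepted += 1
        # Diminishing adaptation: widen the proposal when moves are accepted
        # too often, narrow it otherwise, with a step size decaying as t^-0.6.
        log_sigma += (alpha - target_accept) * t ** -0.6
        samples.append(x)
    return samples, math.exp(log_sigma), n_accepted / n_steps


if __name__ == "__main__":
    samples, sigma, rate = adaptive_metropolis(20000)
    print(f"tuned proposal width: {sigma:.2f}, acceptance rate: {rate:.2f}")
```

Because the proposal changes during the run, the chain is not strictly Markovian. The decaying step size is one common way to keep the adaptation from spoiling the stationary distribution.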

1. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).
2. J. Batoulis and K. Kremer, J. Phys. A: Math. Gen. 21, 127 (1988).
3.
4. R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).
5. J.-S. Wang, R. H. Swendsen, and R. Kotecký, Phys. Rev. Lett. 63, 109 (1989).
7. C. Dress and W. Krauth, J. Phys. A: Math. Gen. 28, L597 (1995).
8. J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004).
9. C. H. Mak and A. K. Sharma, Phys. Rev. Lett. 98, 180602 (2007).
10. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms To Applications, in Computational Science Series, 2nd ed. (Academic Press, San Diego, 2002).
11. D. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge University Press, Cambridge, 2005).
12. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York, NY, 2016).
13.
14. K.-W. Zhao, W.-H. Kao, K.-H. Wu, and Y.-J. Kao, Phys. Rev. E 99, 062106 (2019).
15. M. Gabrié, G. M. Rotskoff, and E. Vanden-Eijnden, Proc. Natl. Acad. Sci. U. S. A. 119, e2109420119 (2022).
16. H. Christiansen, F. Errica, and F. Alesiani, J. Chem. Phys. 159, 234109 (2023).
17. S. Asghar, Q.-X. Pei, G. Volpe, and R. Ni, “Efficient rare event sampling with unsupervised normalising flows,” arXiv:2401.01072 (2024).
18. D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett. 122, 080602 (2019).
19. F. Noé, S. Olsson, J. Köhler, and H. Wu, Science 365, eaaw1147 (2019).
20. T. Marchand, M. Ozawa, G. Biroli, and S. Mallat, Phys. Rev. X 13, 041038 (2023).
21. S. Ciarella, J. Trinquier, M. Weigt, and F. Zamponi, Mach. Learn. Sci. Technol. 4, 010501 (2023).
22. G. Jung, G. Biroli, and L. Berthier, “Normalizing flows as an enhanced sampling method for atomistic supercooled liquids,” arXiv:2404.09914 (2024).
23. L. Berthier and D. R. Reichman, Nat. Rev. Phys. 5, 102 (2023).
24. L. Berthier and G. Biroli, Rev. Mod. Phys. 83, 587 (2011).
25. L. Berthier and W. Kob, J. Phys. Condens. Matter 19, 205130 (2007).
26. T. S. Grigera and G. Parisi, Phys. Rev. E 63, 045102 (2001).
27. A. Ninarello, L. Berthier, and D. Coslovich, Phys. Rev. X 7, 021039 (2017).
28. A. D. S. Parmar, M. Ozawa, and L. Berthier, Phys. Rev. Lett. 125, 085505 (2020).
29. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. (The MIT Press, Cambridge, MA, 2018).
30.
31. P. Billingsley, Probability and Measure, 3rd ed. (Wiley-Interscience, New York, 1995).
32.
33. C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Verlag, 2004).
34. W. K. Hastings, Biometrika 57, 97 (1970).
35. We note that this holds even when q and p are “generalized” densities, as long as they are defined with respect to the same measure. We also point out that, in some applications, the Radon–Nikodym derivative cannot be expressed as a ratio of q and p. This occurs, for instance, in spatial point processes where the support of the proposal distribution changes dimension at each step (Refs. 30 and 67).
36. A. Gelman, W. R. Gilks, and G. O. Roberts, Ann. Appl. Probab. 7, 110 (1997).
37. H. Haario, E. Saksman, and J. Tamminen, Comput. Stat. 14, 375 (1999).
38. K. Latuszynski, G. Roberts, and J. Rosenthal, Ann. Appl. Probab. 23, 66 (2013).
39. C. Andrieu and J. Thoms, Stat. Comput. 18, 343 (2008).
40. C. Pasarica and A. Gelman, Stat. Sin. 20, 343 (2010).
41. L. Rall, Automatic Differentiation: Techniques and Applications (Springer, Berlin, Heidelberg, 1981).
42. S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth, Phys. Lett. B 195, 216 (1987).
43. This method is analogous to hybrid Monte Carlo, well-known in the context of the simulation of liquids.
44. J. P. Nilmeier, G. E. Crooks, D. D. L. Minh, and J. D. Chodera, Proc. Natl. Acad. Sci. U. S. A. 108, E1009 (2011).
45. S. Tamagnone, A. Laio, and M. Gabrié, “Coarse grained molecular dynamics with normalizing flows,” arXiv:2406.01524 (2024).
46. S. Kullback and R. A. Leibler, Ann. Math. Stat. 22, 79 (1951).
47. C. Andrieu and E. Moulines, Ann. Appl. Probab. 16, 1462 (2006).
48. B. Bernu, J. P. Hansen, Y. Hiwatari, and G. Pastore, Phys. Rev. A 36, 4891 (1987).
49. W. Kob and H. C. Andersen, Phys. Rev. E 51, 4626 (1995).
50. S. Sastry, P. G. Debenedetti, and F. H. Stillinger, Nature 393, 554 (1998).
51. W. Moses and V. Churavy, in Advances in Neural Information Processing Systems, edited by H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Curran Associates Inc., 2020), Vol. 33, p. 12472.
52. P. J. Rossky, J. D. Doll, and H. L. Friedman, J. Chem. Phys. 69, 4628 (1978).
53. Y. S. Elmatad, D. Chandler, and J. P. Garrahan, J. Phys. Chem. B 113, 5563 (2009).
54. L. Berthier, P. Charbonneau, A. Ninarello, M. Ozawa, and S. Yaida, Nat. Commun. 10, 1508 (2019).
55. D. Coslovich, M. Ozawa, and W. Kob, Eur. Phys. J. E 41, 62 (2018).
56. H. Tanaka, H. Tong, R. Shi, and J. Russo, Nat. Rev. Phys. 1, 333 (2019).
57. C. Wang, W. Chen, H. Kanagawa, and C. J. Oates, “Reinforcement learning for adaptive MCMC,” arXiv:2405.13574 (2024).
58. F. Ghimenti, L. Berthier, and F. van Wijland, “Irreversible Monte Carlo algorithms for hard disk glasses: From event-chain to collective swaps,” Phys. Rev. Lett. 133, 028202 (2024).
60. G. Carleo and M. Troyer, Science 355, 602 (2017).
61. L. L. Viteritti, R. Rende, and F. Becca, Phys. Rev. Lett. 130, 236401 (2023).
62.
64. S. M. Kakade, in Advances in Neural Information Processing Systems, edited by T. Dietterich, S. Becker, and Z. Ghahramani (MIT Press, 2001), Vol. 14.
65. J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel, in Proceedings of the 32nd International Conference on Machine Learning, ICML’15 (JMLR.org, 2015), Vol. 37, p. 1889, https://dl.acm.org/doi/proceedings/10.5555/3045118.
66. A. B. Bhatia and D. E. Thornton, Phys. Rev. B 2, 3004 (1970).
67. C. J. Geyer and J. Møller, Scand. J. Stat. 21, 359 (1994).