A central challenge in atomistic machine learning (ML) is ensuring that the training data are suitable for the system being simulated; this is especially difficult for systems with large numbers of atoms. Most atomistic ML approaches rely on the nearsightedness principle (“all chemistry is local”), using information about the positions of an atom’s neighbors to predict a per-atom energy. In this work, we develop a framework that exploits the nearsighted nature of ML models to systematically produce an appropriate training set for large structures. We use a per-atom uncertainty estimate to identify the most uncertain atoms and extract chunks centered on these atoms. It is crucial that these small chunks are large enough both to satisfy the ML model’s nearsightedness principle (that is, to fill the cutoff radius) and to be converged with respect to the electronic structure calculation. We present data indicating when the electronic structure calculations are converged with respect to the structure size, which fundamentally limits the accuracy of any nearsighted ML calculator. These new atomic chunks are calculated with electronic structure methods and, crucially, only a single force, that of the central atom, is added to the growing training set, preventing the noisy and irrelevant information from the chunk’s boundary from interfering with ML training. The resulting ML potentials are robust, despite requiring single-point calculations on only small reference structures and never seeing large training structures. We demonstrate our approach via structure optimization of a 260-atom structure and extend the approach to clusters with up to 1415 atoms.
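The chunk-selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_chunk` and `central_force_only`, the `chunk_radius` parameter, and the toy coordinates are all hypothetical, and the per-atom uncertainties are assumed to be supplied by some external estimate. The sketch only shows the bookkeeping: pick the most uncertain atom, gather every atom within a radius of it, and keep only the central atom's force from the reference calculation.

```python
import numpy as np


def extract_chunk(positions, uncertainties, chunk_radius):
    """Return (center_index, member_indices) for the most uncertain atom.

    member_indices holds every atom within chunk_radius of the center,
    with the center itself placed first. (Hypothetical helper.)
    """
    positions = np.asarray(positions, dtype=float)
    center = int(np.argmax(uncertainties))
    distances = np.linalg.norm(positions - positions[center], axis=1)
    members = np.flatnonzero(distances <= chunk_radius)
    # Put the central atom first so its force is easy to pick out later.
    members = np.concatenate(([center], members[members != center]))
    return center, members


def central_force_only(chunk_forces):
    """Keep only the force on the central atom (row 0) of a chunk.

    Forces on the other atoms are discarded because they sit closer to
    the chunk boundary. (Hypothetical helper.)
    """
    return np.asarray(chunk_forces, dtype=float)[0].copy()


# Toy example: five atoms on a line, one length unit apart.
positions = np.array(
    [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], dtype=float
)
uncertainties = np.array([0.1, 0.2, 0.1, 0.9, 0.3])  # assumed, made-up values
center, members = extract_chunk(positions, uncertainties, chunk_radius=1.5)
```

In this toy case atom 3 is the most uncertain, so the chunk contains atoms 3, 2, and 4. In practice the radius would have to be chosen so that the chunk fills the model's cutoff radius and is converged with respect to the electronic structure calculation, as the abstract emphasizes; the value used here is arbitrary.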

