Physically motivated and mathematically robust atom-centered representations of molecular structures are key to the success of modern atomistic machine learning. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules, and to explore and visualize their chemical structures and compositions. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: they can all be expressed as a discretization of n-body correlation functions of the local atom density, which suggests the opportunity of standardizing and, more importantly, optimizing their evaluation. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping of new rotationally equivariant atomistic representations. As an example, we discuss smooth overlap of atomic positions (SOAP) features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to reduce the total computational cost by a factor of up to 4 without affecting the model’s symmetry properties and without significantly impacting its accuracy.
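The SOAP power spectrum, the invariant built from the expansion of the local density, can be illustrated in a few lines. The sketch below is not librascal code: it assumes a point-like (delta-function) neighbour density, so that the expansion coefficients are c_nlm = Σ_j R_n(r_j) Y_lm(r̂_j), and a hypothetical radial basis made of Gaussians on an even grid times a cosine cutoff. Instead of evaluating spherical harmonics explicitly, it contracts over m with the addition theorem, Σ_m Y_lm(û) Y*_lm(v̂) = (2l+1)/(4π) P_l(û·v̂), which makes the rotational invariance of p_nn'l = Σ_m c_nlm c*_n'lm explicit. The cutoff `R_CUT`, the basis widths, and all function names are illustrative assumptions.

```python
import numpy as np

R_CUT = 4.0  # assumed cutoff radius


def legendre(l, x):
    """Legendre polynomial P_l(x) from the three-term recurrence (works elementwise on arrays)."""
    p_prev, p = np.ones_like(x), x
    if l == 0:
        return p_prev
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def radial_basis(r, n_max):
    """Hypothetical radial basis: Gaussians on an even grid times a cosine cutoff.
    Returns an array of shape (n_max, n_neighbours)."""
    centres = np.linspace(0.0, R_CUT, n_max)[:, None]
    fcut = 0.5 * (np.cos(np.pi * r / R_CUT) + 1.0) * (r < R_CUT)
    return np.exp(-((r[None, :] - centres) ** 2) / 0.5) * fcut[None, :]


def power_spectrum(neighbours, n_max=4, l_max=4):
    """SOAP-like invariants p[n, n', l] = sum_m c_nlm conj(c_n'lm), flattened.

    neighbours: (N, 3) positions relative to the central atom.
    Uses c_nlm = sum_j R_n(r_j) Y_lm(u_j) (delta-function density) and contracts
    over m with sum_m Y_lm(u) conj(Y_lm(v)) = (2l + 1) / (4 pi) * P_l(u . v)."""
    r = np.linalg.norm(neighbours, axis=1)
    unit = neighbours / r[:, None]
    R = radial_basis(r, n_max)                  # (n_max, N)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)     # cosines between neighbour directions
    p = np.empty((n_max, n_max, l_max + 1))
    for l in range(l_max + 1):
        angular = (2 * l + 1) / (4 * np.pi) * legendre(l, cos)
        p[:, :, l] = R @ angular @ R.T          # sum over neighbour pairs (j, k)
    return p.reshape(-1)
```

Rotating the neighbours (for example `neighbours @ q.T` with an orthogonal matrix `q`) or permuting them leaves the output unchanged up to round-off. The pair sum used here costs O(N²) per channel and is chosen only for clarity; accumulating the coefficients c_nlm first, as density-expansion codes typically do, scales linearly with the number of neighbours.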
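One way to make the density expansion cheap for an arbitrary radial basis is to tabulate its radial integral once and interpolate it afterwards. The sketch below illustrates that general idea under stated assumptions rather than reproducing librascal's implementation. Projecting a Gaussian of width σ centred at distance r_ij onto R_n(r) Y_lm(r̂) gives the radial factor I_nl(r_ij) = 4π ∫ r² R_n(r) exp[-(r² + r_ij²)/(2σ²)] i_l(r r_ij/σ²) dr, where i_l is the modified spherical Bessel function of the first kind. The code evaluates this by brute-force quadrature on a grid of distances and fits one cubic spline per (n, l), so that later evaluations cost a spline lookup regardless of how expensive the basis is to integrate. The GTO-like basis, σ, the cutoff, the grid sizes, and the truncation of the integral at the cutoff are all hypothetical choices.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline
from scipy.special import spherical_in

SIGMA = 0.5  # assumed Gaussian smearing of each neighbour density
R_CUT = 5.0  # assumed cutoff radius (the integral is truncated here)


def radial_basis(n, r, n_max):
    """Hypothetical GTO-like radial function r^n exp(-r^2 / (2 s_n^2)), unnormalised."""
    s_n = R_CUT * (n + 1) / n_max
    return r**n * np.exp(-r**2 / (2.0 * s_n**2))


def radial_integral(n, l, r_ij, n_max, n_quad=400):
    """Direct (expensive) quadrature of
    I_nl(r_ij) = 4 pi * int_0^R_CUT r^2 R_n(r) exp(-(r^2 + r_ij^2) / (2 sigma^2)) i_l(r r_ij / sigma^2) dr."""
    r = np.linspace(0.0, R_CUT, n_quad)
    integrand = (r**2 * radial_basis(n, r, n_max)
                 * np.exp(-(r**2 + r_ij**2) / (2.0 * SIGMA**2))
                 * spherical_in(l, r * r_ij / SIGMA**2))
    return 4.0 * np.pi * trapezoid(integrand, r)


def tabulate_radial_integrals(n_max, l_max, n_grid=64):
    """Evaluate the quadrature once on a distance grid and fit one cubic spline per (n, l)."""
    grid = np.linspace(0.0, R_CUT, n_grid)
    return {
        (n, l): CubicSpline(grid, [radial_integral(n, l, x, n_max) for x in grid])
        for n in range(n_max)
        for l in range(l_max + 1)
    }
```

After `splines = tabulate_radial_integrals(4, 3)`, calling `splines[(n, l)](distances)` returns interpolated values of I_nl for any distance, or array of distances, in [0, R_CUT]. Changing the radial basis only changes the one-off tabulation step, not the per-neighbour cost.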
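Data reduction in feature space can be sketched as column selection on the feature matrix followed by kernel ridge regression with the polynomial kernel k(x, x') = (x̂·x̂')^ζ that is commonly paired with SOAP. Here the selection uses farthest point sampling (FPS) over features; FPS is only one possible scheme and is assumed for illustration, as are ζ = 2, the regularizer, and all function names. Because every column of a matrix of invariant features is itself invariant, any subset of columns gives a model with the same symmetry, while the cost of building the kernel falls roughly in proportion to the number of retained features.

```python
import numpy as np


def fps_columns(X, n_select, first=0):
    """Farthest point sampling over the columns (features) of X, shape (n_samples, n_features).

    Greedily adds the feature whose column is farthest (Euclidean) from every feature
    selected so far, so the retained subset spans the feature space evenly."""
    cols = X.T
    selected = [first]
    min_dist = np.sum((cols - cols[first]) ** 2, axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.sum((cols - cols[nxt]) ** 2, axis=1))
    return np.array(selected)


def polynomial_kernel(A, B, zeta=2):
    """k(x, x') = (x_hat . x'_hat)^zeta on L2-normalised feature vectors."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta


def krr_fit(X, y, reg=1e-6, zeta=2):
    """Kernel ridge regression weights w = (K + reg * I)^-1 y."""
    K = polynomial_kernel(X, X, zeta)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)


def krr_predict(X_train, weights, X_new, zeta=2):
    """Predictions k(X_new, X_train) @ w."""
    return polynomial_kernel(X_new, X_train, zeta) @ weights
```

For a feature matrix `X` of shape (n_samples, n_features), `cols = fps_columns(X, n_features // 2)` followed by `krr_fit(X[:, cols], y)` trains the same kind of model on half the features; predictions must then use the same columns, `krr_predict(X[:, cols], w, X_new[:, cols])`.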

