Almost every machine-learning algorithm that targets the properties of matter at the atomic scale takes as input a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density and differ mainly in the choice of basis. Considerable effort has been dedicated to optimizing the basis set, typically driven by heuristic considerations about the behavior of the regression target. Here, we take a different, unsupervised viewpoint and aim to determine the basis that encodes, in the most compact way possible, the structural information relevant for the dataset at hand. For each training dataset and number of basis functions, one can build a unique basis that is optimal in this sense and that, once approximated with splines, can be computed at no additional cost relative to the primitive basis. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly when working with representations that correspond to high-body-order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.
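
As a rough illustration of a data-driven basis contraction, the sketch below treats "most compact" as "retains the most variance" and obtains the contraction from a principal component analysis of primitive-basis expansion coefficients. This reading is an assumption, since the abstract does not spell out the construction. The function name `optimal_contraction`, the synthetic coefficients, and the Gaussian primitive functions are all hypothetical stand-ins, and the Gaussians are not orthonormalized.

```python
import numpy as np


def optimal_contraction(coeffs, n_opt):
    """Return an (n_primitive, n_opt) contraction matrix and its variances.

    Assumption: optimality is measured by retained variance, so the columns
    are the leading eigenvectors of the coefficient covariance matrix.
    """
    # Covariance of the primitive-basis coefficients over the dataset
    # (not mean-centred here; centring is an optional design choice).
    cov = coeffs.T @ coeffs / coeffs.shape[0]
    # eigh returns eigenvalues in ascending order, so sort them descending.
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_opt]
    return eigvec[:, order], eigval[order]


# Synthetic stand-in for real data: 500 environments, 12 primitive functions,
# with almost all of the variance concentrated in 3 hidden directions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 12))
coeffs = hidden + 1e-3 * rng.normal(size=(500, 12))

U, variances = optimal_contraction(coeffs, n_opt=3)
contracted = coeffs @ U  # coefficients expressed in the contracted basis

# Fraction of the total variance kept by the 3 contracted functions.
retained = variances.sum() / np.trace(coeffs.T @ coeffs / coeffs.shape[0])

# The contracted radial functions are linear combinations of the primitive
# ones, so they can be tabulated once on a grid (illustrative Gaussians).
r = np.linspace(0.0, 5.0, 200)
centers = np.linspace(0.0, 5.0, 12)
primitive = np.exp(-0.5 * ((r[:, None] - centers[None, :]) / 0.5) ** 2)
optimized = primitive @ U  # shape (200, 3)
```

The tabulated `optimized` array could then be passed to any cubic-spline interpolator. Evaluating each contracted function would then cost the same as evaluating a splined primitive function, which is the sense in which the contraction adds no cost.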

