The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF), and the selection of the training set therefore determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS); thus, choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data toward common configurations, effectively widening the applicability range of the MLFF to the fullest capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions that are similar in terms of geometry and energetics. We then iteratively test the performance of a given MLFF on each subregion and add representatives of the most inaccurately predicted parts of the CS to the training set of the model. The proposed approach has been applied to a set of small organic molecules and alanine tetrapeptide, demonstrating an up to twofold decrease in the root mean squared errors for force predictions on non-equilibrium geometries of these molecules. Furthermore, our ML models demonstrate superior stability over the default training approaches, allowing reliable study of processes involving highly out-of-equilibrium molecular configurations. These results hold for both kernel-based methods (sGDML and GAP/SOAP models) and deep neural networks (SchNet model).
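The cluster-then-refine loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional toy "configurations", the toy energy function, plain k-means as the clustering step, a nearest-neighbor predictor standing in for the MLFF, energy errors in place of force errors, and the hypothetical helper names `kmeans`, `predict`, `cluster_errors`, and `refine`.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its closest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def predict(train, xs, es, x):
    """Stand-in model (assumption): energy of the nearest training point."""
    nearest = min(train, key=lambda i: abs(xs[i] - x))
    return es[nearest]

def cluster_errors(train, xs, es, labels, k):
    """Mean absolute prediction error of the stand-in model per cluster."""
    sums, counts = [0.0] * k, [0] * k
    for x, e, l in zip(xs, es, labels):
        sums[l] += abs(predict(train, xs, es, x) - e)
        counts[l] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]

def refine(xs, es, labels, k, n_init=5, n_add=5, rounds=4, seed=0):
    """Grow the training set with the worst-predicted points of the
    cluster that currently has the largest mean error."""
    rng = random.Random(seed)
    train = rng.sample(range(len(xs)), n_init)  # random initial training set
    for _ in range(rounds):
        errs = cluster_errors(train, xs, es, labels, k)
        worst = max(range(k), key=errs.__getitem__)
        candidates = [i for i in range(len(xs))
                      if labels[i] == worst and i not in train]
        candidates.sort(key=lambda i: abs(predict(train, xs, es, xs[i]) - es[i]),
                        reverse=True)
        train.extend(candidates[:n_add])
    return train

# Toy, inhomogeneously sampled data: many near-equilibrium points, few far away.
rng = random.Random(1)
xs = ([rng.gauss(0.0, 0.3) for _ in range(180)]
      + [rng.uniform(1.5, 3.0) for _ in range(20)])
es = [x ** 2 + 0.5 * math.sin(3 * x) for x in xs]  # hypothetical toy energy

# Cluster on (geometry, energy) features, then refine the training set.
labels = kmeans(list(zip(xs, es)), k=4)
train = refine(xs, es, labels, k=4)
print(len(train), "training points selected")
```

In this toy setting, clustering on both the coordinate and the energy mirrors the idea of grouping configurations by geometry and energetics, and evaluating the error per cluster rather than over the whole dataset is what keeps the densely sampled near-equilibrium region from dominating the selection.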
28 March 2021
Research Article | March 22 2021
Improving molecular force fields across configurational space by combining supervised and unsupervised machine learning
Gregory Fonseca; Igor Poltavsky; Valentin Vassilev-Galindo; Alexandre Tkatchenko a)
Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
a) Author to whom correspondence should be addressed: [email protected]
J. Chem. Phys. 154, 124102 (2021)
Article history
Received: October 29 2020
Accepted: March 01 2021
Citation
Gregory Fonseca, Igor Poltavsky, Valentin Vassilev-Galindo, Alexandre Tkatchenko; Improving molecular force fields across configurational space by combining supervised and unsupervised machine learning. J. Chem. Phys. 28 March 2021; 154 (12): 124102. https://doi.org/10.1063/5.0035530