Nowadays, computer simulations have become a standard tool in essentially all fields of chemistry, condensed matter physics, and materials science. In order to keep up with state-of-the-art experiments and the ever-growing complexity of the investigated problems, there is a constantly increasing need for simulations of more realistic, i.e., larger, model systems with improved accuracy. In many cases, the availability of sufficiently efficient interatomic potentials providing reliable energies and forces has become a serious bottleneck for performing these simulations. To address this problem, a paradigm change is currently taking place in the development of interatomic potentials. Since the early days of computer simulations, simplified potentials have been derived using physical approximations whenever the direct application of electronic structure methods has been too demanding. Recent advances in machine learning (ML) now offer an alternative approach for the representation of potential-energy surfaces by fitting large data sets from electronic structure calculations. In this perspective, the central ideas underlying these ML potentials, solved problems, and remaining challenges are reviewed along with a discussion of their current applicability and limitations.
I. INTRODUCTION
Machine learning (ML) methods1 have a long tradition in chemistry and physics.2,3 While for many years they have been used almost exclusively as a classification tool to assist, e.g., the analysis of spectra,4 the prediction of binding sites of biomolecules5 or the derivation of quantitative structure-activity relationships (QSARs),6 they have now started to enter the heart of theoretical chemistry and computational materials science in manifold ways. This process was initially slow because of the understandable scepticism of theoretical chemists and physicists, who are used to well-controlled hierarchies of physical approximations, towards these seemingly obscure black-box methods. Further, only in recent years have ML methods reached a mature state, driven mainly by commercial needs and resulting in a variety of new applications.
The border between ML, which is a subtopic of artificial intelligence, and other purely mathematical regression methods is difficult to draw and cannot be defined without some degree of arbitrariness. Still, several definitions of ML have been proposed in the literature over the years, starting from the famous classical definition given in 1959 by Arthur Samuel, a pioneer in ML research: “[ML is a] Field of study that gives computers the ability to learn without being explicitly programmed,” to a modern definition by Tom M. Mitchell:1 “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with the experience E.”
Nowadays, ML affects everyday life in manifold ways, like character and voice recognition software, fingerprint identification, e-mail spam filtering, self-driving cars, computer game opponents, credit card fraud detection, and too many others to mention. For these purposes, ML offers a whole toolbox of techniques, like artificial neural networks,7 Gaussian processes,8 and support vector machines.9
Driven by the desire for a more rational design of materials, in recent years ML has also established a new trend in computational materials science,10,11 which shows many similarities to the traditional field of QSAR and drug design. In both cases the goal is to identify systems, molecules or materials, exhibiting certain properties. To reach this goal the machinery of ML is used to search large databases for hidden relationships between the atomic structure, which can be encoded directly or indirectly using a huge number of possible descriptors, and the property of interest.12
The idea to employ information from large databases is not new, and repositories of experimental data like the Cambridge Structural Database13 and the Protein Data Bank14 have existed for decades. What is new is the availability of enormous amounts of high-quality theoretical data from electronic structure calculations, which are inaccessible to manual inspection and analysis. Still, in spite of the high performance of modern supercomputers, it will remain impossible for a long time to screen all candidate structures directly by electronic structure calculations due to the exponentially increasing complexity of configuration space with chemical composition and system size. ML now offers a very exciting tool to fill this gap, enabling the analysis of the available data, the discovery of hitherto unknown relationships, and the identification of new promising materials.
The topic of this perspective is a different application of ML methods, i.e., the accurate representation of atomic interactions for applications in atomistic computer simulations of problems in chemistry, physics and materials science. In contrast to the classification and pattern recognition applications discussed above, the task to be performed for this purpose is the fitting of a complicated function using a set of known function values. This function, the potential energy surface (PES), is a multidimensional real-valued function providing the potential energy of a system as a function of the atomic positions.
The concept of a PES is founded on the Born-Oppenheimer approximation15 of quantum mechanics. If the atomic positions, the nuclear charges and the total charge are known, the potential energy of a system is fully defined by its electronic Hamiltonian. Electronic structure methods enable, in principle, the calculation of the potential energy for arbitrary atomic configurations, corresponding to individual points on the PES, which can be either a ground-state or an excited-state surface. For applications like molecular dynamics (MD) simulations, the energies and forces for a large number of atomic configurations are required. They can be obtained either on-the-fly, by performing the electronic structure calculations as needed while the trajectory evolves, or by evaluating an analytic expression, which provides a direct functional relation between the atomic configuration and the energy. This analytic expression, which has to be defined before running the simulation, is often called “interatomic potential,” “force field,” or potential-energy surface.
Electronic structure calculations are computationally very demanding, even if relatively efficient methods like density functional theory (DFT) are used, and DFT-based “ab initio MD simulations”16 are thus restricted to a few hundred atoms and simulation times significantly shorter than 1 ns. Analytic potentials, which provide a much simpler, direct relation between the structure and its energy in closed form, can be evaluated many orders of magnitude faster, but they are usually derived by introducing physical approximations. While in most cases the essential features of the atomic interactions are still reasonably described, these potentials necessarily represent a compromise between efficiency and accuracy. Interatomic potentials employing ML techniques aim to combine the advantages of both approaches: the accuracy of first principles calculations and the efficiency of a simple functional form. Much progress has been made in recent years towards this goal, but the field is still young and developing very dynamically.
In principle, the requirements to be met by ML potentials are very similar to those for conventional empirical potentials. The evaluation should be fast to enable large-scale MD and Monte Carlo simulations. The computational effort to construct the potential should be significantly lower than the costs of a direct application of electronic structure methods. Minimal human effort and manual intervention should be needed to develop the potential, and systematic improvements should be possible. Analytic energy gradients for the calculation of the forces, the stress tensor, and other gradient-dependent quantities should be available. The information required to evaluate the potential should not include any classification of atoms into types beyond the specification of the nuclear charge, as chemical environments and bonding situations should be allowed to change in the course of simulations. Consequently, the use of predefined atomic connectivities and bonds should also be avoided to obtain a reactive potential enabling the making and breaking of bonds. Ideally, potentials should be generally applicable and not restricted to certain types of atomic interactions, like covalent, ionic, or metallic bonding, and due to their purely mathematical, unbiased form, ML methods are particularly promising candidates to reach this goal. Finally, the most important requirement is the accuracy of the potential, i.e., the energies and forces should be as close as possible to the underlying first principles data. Aiming for predictive simulations, the most stringent test concerns the transferability to structures that are very different from the data used in the construction of the potential.
Based on these requirements, a definition of a ML potential can be given. A ML potential
(1) employs a ML method to construct a direct functional relation between the atomic configuration and its energy;
(2) does not contain any physical approximations apart from the chosen reference electronic structure method used in its construction; and
(3) is developed using a consistent set of electronic structure data.
The application of ML potentials in computer simulations in chemistry, physics, and materials science involves several steps. When using a ML potential, e.g., in MD simulations, in the first step the atomic positions need to be transformed to a set of coordinates suitable as input for the ML method. The complexity of this crucial step represented a significant challenge in the early days of ML potentials. The problem emerged rather unexpectedly, since in most empirical potentials feeding the structure into the functional form is quite trivial, and often, as in classical force fields, readily available internal coordinates like interatomic distances, bond angles, and dihedral angles are well suited. The difficulty of finding appropriate structural descriptors slowed down the development of ML potentials substantially during their first ten years, but nowadays several options are available. After the transformation of the coordinates onto a suitable set of descriptors, which must have certain properties as discussed in Section II, a ML method is used to evaluate the energy and, if required, the atomic forces. In principle, these can further be validated to obtain an estimate of their reliability. They are then passed to the propagator of the MD code and a new atomic configuration is obtained, which then undergoes the same cycle again.
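As an illustration, the following minimal sketch shows this cycle as a single step of an MD loop driven by a ML potential; the model object and its methods (compute_descriptors, energy_and_forces, is_extrapolating) are hypothetical placeholders for the components discussed in this perspective.

```python
import numpy as np

def md_step(positions, velocities, masses, dt, model):
    """One schematic MD step driven by a ML potential."""
    # 1. Transform atomic positions to invariant descriptors.
    descriptors = model.compute_descriptors(positions)
    # 2. Evaluate the ML energy and analytic forces (negative gradients).
    energy, forces = model.energy_and_forces(descriptors, positions)
    # 3. Optionally validate, e.g., flag descriptors outside the training range.
    if model.is_extrapolating(descriptors):
        print("Warning: structure outside training set; result may be unreliable.")
    # 4. Propagate to the next configuration (first half of velocity Verlet;
    #    the second velocity half-step would follow after new forces).
    velocities = velocities + 0.5 * dt * forces / masses[:, None]
    positions = positions + dt * velocities
    return positions, velocities, energy
```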
In this perspective, the construction and application of ML potentials for computer simulations with first principles accuracy are reviewed and discussed. A special focus is placed on the two central components of any ML potential: the transformation of the atomic positions onto suitable descriptors and the subsequent assignment of an energy to this structure using a functional form provided by a ML method. Since there is no strict border between ML potentials and other available mathematical fitting approaches for representing PESs, the list of methods covered in this perspective cannot be complete. Several important related approaches, like splines,17 modified Shepard interpolation,18 interpolating moving least squares,19 and permutation invariant polynomial fits,20 which are very similar in spirit in that they do not employ a physically derived functional form and to some extent face similar challenges, will not be discussed here.
II. STRUCTURAL DESCRIPTION
A. The role of the descriptor
Many different names have been used in the literature for the effective coordinates, i.e., the descriptors, representing the structural inputs for the ML algorithms. The set of descriptors plays a central role not only in the construction of PESs,21 but also in applications beyond, like structure-property relationships for property prediction in materials science22 and the closely related, older field of QSAR.23 Further applications with similar requirements regarding the structural description are the analysis of MD trajectories, the screening of large numbers of atomic configurations for certain features, and structure identification in general.24 Even the collective variables used in simulation techniques like metadynamics have similar requirements.25
The substantial challenge of finding suitable descriptors for ML potentials was recognized with the advent of the first NN potentials about 20 years ago,26 which were initially developed for rather low-dimensional systems like small molecules and diatomic molecules interacting with frozen surfaces. As NNs, like all other ML methods, process ordered vectors of numbers, the output is not invariant with respect to a permutation of equivalent atoms in the structure. Further, directly available descriptions of the atomic positions like Cartesian coordinates are not invariant with respect to translation and rotation of the system, while the resulting ML potential must possess these properties. Satisfactory customized solutions have been found for specific molecular27 and surface systems,28,29 also beyond the field of ML potentials, e.g., for permutation invariant polynomials.20 Still, initially the lack of suitable input descriptors was the main obstacle to constructing ML potentials for high-dimensional systems including hundreds or thousands of atoms.
In general, a transformation of the atomic coordinates onto a suitable set of descriptors must provide the same numerical descriptor values for any equivalent atomic configuration, including translation, rotation, permutation, and any applicable point group symmetry. If this is ensured, then all these equivalent representations of the system have the same effective coordinate values, and consequently they are considered as the same structure with necessarily the same energy by the ML method. For simple potentials like classical force fields, meeting these requirements is straightforward, because internal coordinates are translationally and rotationally invariant, and the total energy is constructed as a sum of very low-dimensional terms, which maintain permutation invariance. For more accurate potentials employing ML, many-body descriptors are needed that are much more complicated than additive low-dimensional functions.
It is of utmost importance that the descriptors do not contain any artificial symmetries, because in this case two different structures would be mapped onto the same coordinates and become indistinguishable. Ideally, there would be a one-to-one correspondence between the structure and the set of descriptor values, but this would only be guaranteed if there were one descriptor per degree of freedom in the system. In this case the number of descriptors would depend on the system size, which should be avoided to obtain generally applicable ML potentials as discussed below. It should also be noted that the transformation from the atomic positions to the set of descriptors does not need to be inverted, and rather complicated functional forms can therefore be used.
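A toy example illustrates the danger of an over-symmetric, too coarse description: the plain sum of all interatomic distances, used here as a hypothetical one-dimensional descriptor, maps two geometrically distinct triatomic structures onto the same value, so a ML potential would be forced to assign them the same energy.

```python
import itertools
import numpy as np

def triangle(a, b, c):
    """Coordinates of a triangle with side lengths |BC|=a, |AC|=b, |AB|=c."""
    x = (b**2 + c**2 - a**2) / (2 * c)
    y = np.sqrt(b**2 - x**2)
    return np.array([[0.0, 0.0, 0.0], [c, 0.0, 0.0], [x, y, 0.0]])

def sum_of_distances(pos):
    """A deliberately too coarse descriptor: the sum of all pair distances."""
    return sum(np.linalg.norm(pos[i] - pos[j])
               for i, j in itertools.combinations(range(len(pos)), 2))

s1 = triangle(1.0, 2.0, 2.5)  # pair distances {1.0, 2.0, 2.5}
s2 = triangle(1.5, 1.8, 2.2)  # pair distances {1.5, 1.8, 2.2}
print(sum_of_distances(s1), sum_of_distances(s2))  # both 5.5: indistinguishable
```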
Apart from these formal requirements there are also practical aspects concerning the choice of descriptors. Since the transformation onto the descriptors has to be carried out for every structure, the evaluation should be fast, and the descriptors need to be differentiable with respect to the atomic positions to enable the calculation of analytic gradients for the forces. To obtain very accurate ML potentials, the descriptors must provide a very detailed structural description and typically multidimensional functions are required to allow for fitting the many-body interactions in electronic structure calculations with high precision.
In summary, the choice of the descriptors is of vital importance because it is the set of descriptors to which the energy is assigned. The number of descriptors has an immediate impact on the feasibility of any ML potential, as it does not only restrict the efficiency of the energy evaluation but also determines the dimensionality of the coordinate space to be mapped by the training sets. On the other hand, the number of descriptors must be sufficiently large to enable an unambiguous distinction of different structures. In Secs. II B 1–II B 4, several descriptors which have been developed for ML potentials are discussed. This list is far from complete, and there are plenty of interesting new approaches that have been proposed recently, like moment tensors30 and scattering transforms,31 among many others.
B. Descriptors for machine learning potentials
1. Atom centered symmetry functions
The angular ACSF (Eq. (4)) provides an angular fingerprint of the environment using the angles θijk formed with each pair of neighbors j and k. The angular resolution can be controlled by the parameter ζ, while λ = ± 1 defines the positions of the extrema of the cosine function.
A number of potentials have been constructed using these ACSFs,35 and typically 50–100 symmetry functions are used per atom, differing in the specific values of η, ζ, Rs, and λ. The cutoff Rc is a convergence parameter, which needs to be increased until all energetically relevant interactions are included; in most cases values between 6 and 9 Å have been found to be sufficient. All ACSFs provide a rotationally and translationally invariant description of the environment because they depend on the internal coordinates Rij and θijk. Further, because of the sum over all neighbors, they are invariant with respect to any permutation of chemically equivalent atoms in the environment. The number of ACSFs grows with the number of elements in the system, because radial functions are needed for each element in the environment, and the angular functions must be constructed for all possible element combinations involved in the angles θijk. Further details and a discussion of the properties of these and other ACSFs can be found elsewhere.36
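As a minimal sketch, the following implements the two standard ACSF types36 for a single element and without periodic boundary conditions: a radial function combining a Gaussian in Rij with the cosine cutoff function fc, and the angular function, which multiplies a (1 + λ cos θijk)^ζ term by Gaussians and cutoff functions of all three distances of the atom triple.

```python
import numpy as np

def fc(r, rc):
    """Cosine cutoff function: decays smoothly to zero at Rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def acsf_radial(pos, i, eta, rs, rc):
    """Radial symmetry function of atom i (single-element sketch)."""
    g = 0.0
    for j in range(len(pos)):
        if j == i:
            continue
        rij = np.linalg.norm(pos[j] - pos[i])
        g += np.exp(-eta * (rij - rs) ** 2) * fc(rij, rc)
    return g

def acsf_angular(pos, i, eta, zeta, lam, rc):
    """Angular symmetry function of atom i; lam (lambda) is +1 or -1."""
    g = 0.0
    for j in range(len(pos)):
        for k in range(j + 1, len(pos)):
            if i in (j, k):
                continue
            vij, vik = pos[j] - pos[i], pos[k] - pos[i]
            rij, rik = np.linalg.norm(vij), np.linalg.norm(vik)
            rjk = np.linalg.norm(pos[k] - pos[j])
            cos_t = np.dot(vij, vik) / (rij * rik)
            g += ((1.0 + lam * cos_t) ** zeta
                  * np.exp(-eta * (rij**2 + rik**2 + rjk**2))
                  * fc(rij, rc) * fc(rik, rc) * fc(rjk, rc))
    return 2.0 ** (1.0 - zeta) * g
```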
2. Bispectrum of the neighbor density
An alternative approach to describe the atomic environments based on Eq. (1) has been proposed in 2010 by Csányi and coworkers38 for constructing ML potentials for high-dimensional systems employing Gaussian processes. In this very systematic approach, the selection of parameters defining the spatial shape of the descriptors, which is required in the case of ACSFs, is avoided by expanding the environment in a series of spherical harmonics.
Like in the case of ACSFs, the structural description by the bispectrum has all required properties, i.e., translational, rotational, and permutational invariance. An advantage is the possibility to refine the description of the environments very systematically by including more and more spherical harmonics.
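The full bispectrum couples expansion coefficients of four-dimensional hyperspherical harmonics and is beyond a short example; as a simplified sketch of how rotational invariants arise from such an expansion, the following computes the related power spectrum p_l = Σ_m |c_lm|² of the neighbor density around one atom, which is rotationally invariant because only the magnitudes of the expansion coefficients enter.

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, l, azimuth, polar)

def power_spectrum(pos, i, l_max, rc):
    """Rotationally invariant power spectrum of the neighbor density of atom i
    (a simplified stand-in for the bispectrum, for illustration only)."""
    p = []
    for l in range(l_max + 1):
        c = np.zeros(2 * l + 1, dtype=complex)
        for j in range(len(pos)):
            if j == i:
                continue
            v = pos[j] - pos[i]
            r = np.linalg.norm(v)
            if r >= rc:
                continue
            polar = np.arccos(v[2] / r)             # angle from the z-axis
            azimuth = np.arctan2(v[1], v[0])        # angle in the xy-plane
            w = 0.5 * (np.cos(np.pi * r / rc) + 1)  # smooth cutoff weight
            for m in range(-l, l + 1):
                c[m + l] += w * sph_harm(m, l, azimuth, polar)
        p.append(float(np.sum(np.abs(c) ** 2)))
    return np.array(p)
```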
3. Smooth overlap of atomic positions
4. Coulomb matrix
Several of the drawbacks of the original Coulomb matrix have been removed by introducing local Coulomb matrices41 depending on the neighboring atoms inside a cutoff radius. In this way, the description of many-atom systems becomes possible, while the locality of many properties is exploited as in the other methods developed for high-dimensional systems. Still, access to analytic derivatives is complicated, as the eigenvalues are not continuous and differentiable when the number of neighbors in the environment changes.21
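For reference, a minimal sketch of the original global Coulomb matrix40 and the sorted eigenvalue spectrum commonly used as its permutation-invariant form; the 0.5 Z^2.4 diagonal is the fitted self-interaction term of the original formulation.

```python
import numpy as np

def coulomb_matrix(Z, pos):
    """Coulomb matrix: M_ii = 0.5 Z_i^2.4, M_ij = Z_i Z_j / |R_i - R_j|."""
    n = len(Z)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(pos[i] - pos[j])
    return M

def cm_descriptor(Z, pos):
    """Sorted eigenvalues: invariant to permutation, rotation, and translation,
    but not smoothly differentiable where eigenvalues cross."""
    return np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, pos)))[::-1]
```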
To date, the Coulomb matrix has not been applied to the construction of PESs, but several energy-related applications have been reported, like the prediction of atomization energies of small organic molecules40 and of atomic forces as a function of the environment.41 Further, new related approaches like the “bag of bonds” method have also been proposed recently, with promising first results.42
III. A BRIEF SURVEY OF ML POTENTIALS
A. Overview
The development of ML potentials has taken place in two steps, which are intimately related to the progress made in the derivation of suitable structural descriptors. In the initial phase, ML potentials have been available only for low-dimensional systems, since ML approaches based on the decomposition of the total energy into atomic contributions according to Eq. (1) had not yet been devised. These early potentials almost exclusively relied on artificial neural networks43,44 and have been applicable to systems containing up to about six atoms. Nevertheless, several interesting and quite successful attempts have also been made in this early stage to extend the applicability of NNs to systems including larger numbers of atoms.45,46 Starting with NNs32 in 2007, several different ML approaches suitable for high-dimensional systems containing thousands of atoms have now become available38,40 and are still evolving rapidly.10,47,48 An overview of the increasing use of different ML techniques for the construction of PESs is shown in Fig. 1.
In this section, several ML methods and their use in the construction of PESs are outlined; in principle, these methods could be combined almost arbitrarily with the descriptors discussed above. Currently, there are a few dominant combinations of descriptors and ML methods, but this is not because of any inherent compatibility or incompatibility of the different approaches; it is mainly a consequence of the personal preferences of the developers.
B. Neural networks
Artificial neural networks are the oldest ML method49 and were first developed to understand signal processing in the brain. They also represent the first ML method used to fit a DFT PES, by Doren and coworkers in 1995,26 and to date a large number of NN potentials have been developed for many molecular and condensed systems.27,28,50
By using a feed-forward NN (FFNN) for each atom in the system and employing Eq. (1), an extension to high-dimensional NN potentials (NNPs) applicable to very large systems has been proposed by Behler and Parrinello in 2007.32 For this purpose, the atomic environments have been described by ACSFs. Further details about this method can be found elsewhere.35,51 In 2013, Geiger and Dellago52 explicitly demonstrated that this scheme, i.e., the combination of atomic NNs and ACSFs, can also be used to accurately identify amorphous and crystalline structures in MD simulations. They found this approach to be superior to conventional methods for structure identification like Steinhardt parameters,53 which supports the suitability of ACSFs for the distinction of different structural motifs. This application, which is very useful on its own, can be viewed as an intermediate step in the construction of a NN PES, since a proper characterization of the atomic environments is a mandatory ingredient for obtaining a reliable structure-energy relation in NNPs.
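A schematic sketch of this architecture, assuming the ACSF vectors have already been computed: each atom's descriptor vector is passed through a small FFNN whose weights are shared among all atoms of the same element, and the atomic energies are summed according to Eq. (1). The weight handling is a placeholder, and the force evaluation via analytic gradients is omitted.

```python
import numpy as np

def atomic_nn(g, weights):
    """Feed-forward NN mapping a descriptor vector g to an atomic energy.
    `weights` is a list of (W, b) layer pairs; tanh hidden activations."""
    x = g
    for W, b in weights[:-1]:
        x = np.tanh(W @ x + b)
    W, b = weights[-1]
    return (W @ x + b).item()  # linear output node

def total_energy(descriptors, elements, weights_per_element):
    """High-dimensional NN potential: E = sum of atomic energies (Eq. (1)).
    All atoms of the same element share one set of NN weights."""
    return sum(atomic_nn(g, weights_per_element[el])
               for g, el in zip(descriptors, elements))
```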
Finally, it should be noted that NNs are currently experiencing an enormous boost in development under the name “deep learning,” which is essentially based on very large, multilayer NNs.
C. Gaussian approximation potentials and kernel methods
D. Support vector machines
Support Vector Machines (SVMs)9 are among the most important ML methods, but to date they have rarely been used for the construction of PESs.57,58 Originally, SVMs have been developed for the classification of data into two groups with different properties by introducing a multidimensional hyperplane separating the data. As a linear separation is usually not possible in the original space, the data is processed by a kernel substitution, which is in essence a mapping of the original data onto a higher-dimensional feature space, in which a linear separation becomes possible (see Fig. 3). In order to assign the data to real-valued properties like energies, instead of the original SVMs an extension in the form of support vector regression (SVR) is used.57
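As a minimal illustration of SVR applied to energy fitting, the following trains scikit-learn's SVR on a one-dimensional Morse-like toy curve; a real application would use invariant many-body descriptors as inputs, and the hyperparameter values below are arbitrary examples.

```python
import numpy as np
from sklearn.svm import SVR

# Toy one-dimensional "PES": a Morse-like curve sampled at reference points.
r = np.linspace(0.8, 4.0, 60)[:, None]             # "bond lengths"
e = (1.0 - np.exp(-1.5 * (r.ravel() - 1.2))) ** 2  # reference "energies"

model = SVR(kernel="rbf", C=100.0, epsilon=1e-3, gamma=2.0)
model.fit(r, e)                                    # fit to the reference data

r_test = np.array([[1.0], [2.5]])
print(model.predict(r_test))                       # interpolated energies
```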
FIG. 3. Operation mode of support vector machines (SVMs):9 The data to be classified into two categories cannot be separated linearly by a hyperplane (a). This linear separation becomes possible after the transformation onto a higher-dimensional space as shown in (b).
E. Spectral neighbor analysis potential
IV. DISCUSSION AND OUTLOOK
Starting from the two main components discussed in Secs. II and III, ML potentials can be constructed by combining a descriptor to convert the structure into a representation fulfilling the required invariances and a ML method to relate the set of descriptor values to an energy. In principle this approach is quite modular and many combinations of descriptors and ML techniques are possible, although only a few have been used so far.
The obtained ML potentials have a solid theoretical foundation. Concerning the physics, this is the Born-Oppenheimer approximation. Mathematically, it has been proven for several ML approaches, e.g., for NNs,60,61 that very accurate representations of potential energy functions are possible, although for practical applications these proofs are of limited use. Therefore, the application of ML to PES fitting is more rigorous than the prediction of other physical properties, although in the end most properties are related to the quantum mechanical Hamiltonian, i.e., to the structure. This relation, however, might be very indirect and thus more complicated for ML to identify.
ML potentials have many advantages. They offer a very fast evaluation of the energies and forces, although they are usually one or two orders of magnitude slower than very simple force fields. High-dimensional ML potentials employing Eq. (1) can easily be parallelized and scale close to linearly with system size. Therefore, they enable simulations of very large systems much beyond the realm of ab initio MD. They all provide very accurate energies close to the underlying electronic structure method, but they cannot and must not provide better results than this chosen reference method, because they do not rely on a physical functional form themselves but must learn the shape of the PES from the provided example data.
Still, there are also some remaining challenges. The construction of ML potentials currently requires very large reference data sets from electronic structure calculations, making the construction computationally very demanding and time consuming. Reducing the size of the reference sets should therefore be one of the primary goals, and all available information about the PES, like forces, should also be used. The minimum size of the reference sets strongly depends on the complexity of the PES, which must be fully mapped, and research in this direction is just in its infancy. In any case, the identification of missing relevant structures is closely related to the analysis of structural similarities.24,62,63 Alternatively, the flexibility of the ML methods can also be used to identify structures that are too far away from the known reference set.51 Another interesting approach is to test ML forces on-the-fly in MD simulations and to add more data if necessary while running the simulations.55 In any case, ML potentials require very careful testing and validation, because they can fail spectacularly if they have not been constructed properly. ML potentials are a tool to speed up simulations, but they are unable to provide insights that are not already contained in the training set; however, these insights might be complex, hidden, and very difficult to extract without the use of ML techniques.
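One simple criterion for detecting structures far from the reference set, used, e.g., in high-dimensional NN potentials, is to flag descriptor values outside the range spanned by the training set; the following range check is a minimal sketch of this idea, not the only possible choice.

```python
import numpy as np

def extrapolation_flags(g, g_min, g_max):
    """Flag descriptor components of a new structure lying outside the range
    [g_min, g_max] covered by the training set; predictions for such
    structures amount to extrapolation and should be distrusted."""
    return (g < g_min) | (g > g_max)

# g_train: (n_structures, n_descriptors) training descriptor values
# if extrapolation_flags(g_new, g_train.min(0), g_train.max(0)).any():
#     ... recompute this structure with the reference method and refit ...
```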
Another current limitation is caused by the exponentially growing complexity of configuration space with chemical composition, which drastically increases both the number of descriptors and the amount of reference data. Therefore, many ML potentials are restricted to only a few chemical elements, a limitation that they share with many other potentials in materials science. Only if the configuration space is reduced, either by using only very low-dimensional terms like in classical force fields or by a restriction to local minima of the PES,40 can a higher chemical complexity be dealt with.
A future trend to overcome this limitation will certainly be the combination of ML methods with physically meaningful energy terms. The first step in this direction has already been taken for the most obvious choice, the electrostatic energy contribution. Already in 2007, Popelier and coworkers demonstrated that NNs can be used to represent atomic electrostatic multipoles64 to obtain improved electrostatic energy terms in classical force fields. In 2011, environment-dependent atomic charges have been expressed as an extension to the short-range energy part of high-dimensional NN potentials to avoid the truncation of the atomic interactions at the cutoff radius defining the atomic environments.65 Both approaches rely on reference charges from electronic structure calculations, which are mathematically well-defined but physically not unique. A possible solution to overcome this limitation in a ML framework has been proposed by Ghasemi et al.66
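A schematic of such a combination, in the spirit of the environment-dependent charge approach:65 a short-range ML energy is supplemented by an explicit Coulomb term over ML-predicted atomic charges. Both models and the unit handling are placeholders (atomic units, no Ewald summation).

```python
import numpy as np

def coulomb_energy(q, pos):
    """Bare Coulomb pair sum (atomic units; no periodic/Ewald treatment)."""
    e = 0.0
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            e += q[i] * q[j] / np.linalg.norm(pos[i] - pos[j])
    return e

def energy_with_electrostatics(descriptors, pos, short_range_model, charge_model):
    """Short-range ML energy (e.g., atomic NNs, Eq. (1)) plus long-range
    electrostatics from environment-dependent, ML-predicted charges."""
    q = charge_model(descriptors)  # one charge per atom
    return short_range_model(descriptors) + coulomb_energy(q, pos)
```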
V. CONCLUSIONS
A lot of progress has been made in recent years in the development of interatomic potentials based on the combination of ML techniques and data from electronic structure calculations. For this purpose, a toolbox of structural descriptors and ML methods is now ready for use, and several new techniques can be expected to enter the stage soon. First applications have demonstrated that potentials of very high accuracy can be obtained for realistic systems containing many atoms and complex atomic configurations. Apart from benchmark studies, state-of-the-art simulations can now be performed, which make it possible to solve problems in chemistry, physics, and materials science that have hitherto been inaccessible due to the lack of suitable potentials with first-principles accuracy.67–73
In spite of this success, the construction of ML potentials is still very demanding, and the potentials need to be tested very carefully due to the lack of a physical basis of their functional form. Consequently, the distribution of potentials from developers to users does not proceed as quickly as for other types of potentials, which is certainly one of the main reasons for the only slowly increasing usage of these methods. Further efforts are also required regarding the applicability to systems containing more than a few elements. This bottleneck, which ML potentials share with many conventional potentials in materials science, is one of the most urgent problems to be addressed in the near future. Finally, it would be desirable to compare different approaches for well-defined benchmark cases to identify and possibly combine the advantages of these methods. To date, only the tip of the iceberg of the field of ML has been exploited in the construction of interatomic potentials, and many new approaches will be developed to further extend the applicability of ML-based potentials in atomistic simulations.
Acknowledgments
The author thanks the DFG for financial support (Heisenberg Fellowship No. Be3264/6-1). This work was supported by the Cluster of Excellence RESOLV (EXC 1069) funded by the Deutsche Forschungsgemeinschaft.