To design new materials and understand their novel phenomena, it is imperative to predict the structure and properties of materials, a task that typically relies on first-principles theory. However, such methods are computationally demanding and limited to small systems. This topical review investigates machine learning (ML) approaches, specifically non-parametric sparse Gaussian process regression (SGPR), to model the potential energy surface (PES) of materials, while starting from the basics of ML methods for a comprehensive review. SGPR can efficiently represent a PES with minimal ab initio data, significantly reducing the computational costs by bypassing the need for inverting massive covariance matrices. SGPR rank reduction accelerates density functional theory calculations by orders of magnitude, enabling accelerated simulations. An optimal adaptive sampling algorithm is utilized for on-the-fly regression with molecular dynamics, extending to interatomic potentials through a scalable SGPR formalism. By merging quantum mechanics with ML methods, the universal first-principles SGPR-based ML potential can create a digital twin capable of predicting phenomena arising from static and dynamic changes as well as inherent and collective characteristics of materials. These techniques have been applied successfully to materials such as solid electrolytes, lithium-ion batteries, electrocatalysts, solar cells, and macromolecular systems, reproducing their structures, energetics, dynamics, properties, phase changes, materials performance, and device efficiency. This review discusses the built-in library of universal first-principles SGPR-based ML potentials, showcasing its applications and successes, and offering insights into the development of future ML potentials and their applications in advanced materials, catering to both educational and expert readers.

Breakthroughs in functional materials are driving innovation across various technological fields, spanning from clean energy to information processing. Historically, the discovery of new materials has been hindered by costly trial-and-error methods. Meanwhile, advancements in deep learning, particularly in language, vision, and biology, have demonstrated remarkable predictive capabilities with the accumulation of data and computational power. Leveraging these trends, researchers have developed various machine learning (ML) methods. In particular, for the development of new functional materials, the structure and properties of desired materials can be predicted by applying various ML approaches to candidate materials. One noteworthy achievement is the recent success of ML-based design in creating new solid-state materials.1,2

To achieve accurate predictions for diverse material systems, conducting simulations using ab initio calculations such as first-principles theory is the most desirable. Consequently, the most comprehensive approach to designing new functional materials involves employing simulation methods that range from simple to complex material systems, operating at the quantum mechanical level. However, ab initio calculations demand substantial computational resources and are often constrained to systems comprising only a few hundred atoms. To overcome these limitations, numerous efforts have been directed toward utilizing ML approaches to generate first-principles level interatomic potentials (IAPs) and forces for materials. These endeavors span a wide array of ML approaches to generate the first-principles level IAPs of large-scale materials.3–35

Diverse ML methods allow us to extend the high quality of ab initio level calculations to large systems and long time periods. Indeed, density functional theory (DFT) combined with ML has enabled fast and realistic computer experiments, which correlate very well with the real world. Notable methods are neural network (NN) representations,36–42 Bayesian linear regression (BLR)43–46 and Gaussian process regression (GPR),25,47–49 gradient-domain50 and symmetrized gradient-domain ML,51 deep potential,52 compressed sensing,53 sparse Gaussian process regression (SGPR),54 etc.

Based on these various ML methods, there have been diverse applications for materials simulations such as inorganic crystals,55–57 metal particles,58,59 solid electrolytes,54,60–63 batteries,64–66 electrocatalysts,67–70 perovskite solar cells,71–73 molecular liquids,74–78 organic/bioorganic molecular systems,79–81 macromolecules,82 supramolecules, and polymers.83 In particular, the SGPR rank reduction84,85 was able to accelerate a DFT calculation of ∼1000 atoms by a factor of 10^4, resulting in accelerated simulations of silicon crystals and sulfide solid electrolytes, as well as other materials.54

In this context, this topical review outlines ML representations of the potential energy surface (PES) of materials using non-parametric Bayesian and GPR algorithms, which have the advantage of estimating the possible errors in the predicted potential and force fields. These representations can effectively capture the PES with a minimal amount of ab initio data, contrasting with NNs that typically require extensive datasets for training their hyper-parameters. The computational efficiency of the GPR algorithm is primarily determined by the inversion of the covariance matrix between data components, such as energy and forces. Addressing this computational bottleneck, SGPR representations have been developed for PES, wherein large datasets are projected into smaller inducing sets, bypassing the need for inverting massive covariance matrices and significantly reducing computational costs. Several orders of magnitude improvement in computational cost is achieved. Moreover, an optimal adaptive sampling algorithm has been devised to facilitate on-the-fly data generation for regression, in conjunction with molecular dynamics (MD) simulations. Through hierarchical combinations of expert models, universal potentials have been generated, enabling the modeling of large-scale complexity via a combinatorial approach.63 

As a demonstration, this review presents examples such as the reproduction of melting, glass crystallization, and phase change for silicon materials and solid electrolytes. Ab initio level MD simulations using on-the-fly ML of IAPs have been conducted to survey Li diffusivity in hundreds of crystals, potentially applicable as electrolytes for all-solid-state batteries. The models generated for these crystals can be amalgamated to create more general and transferable models through the hierarchical combination of expert models, which generates universal potentials, facilitating the simulation of all these materials and their analogs. This leads to applications for various systems, including batteries, solar cells, catalysts, and hydrocarbon macromolecular systems, paving the way for the eventual evolution of ab initio-level force fields for general materials systems. These universal potentials can be useful in designing novel functional molecular/materials systems as well as predicting their structures and properties. The continuous accumulation of these universal potentials could eventually evolve toward super-AI through a self-learning process of diverse IAPs, which would lead to the prediction of diverse undiscovered materials. This comprehensive review offers in-depth insights into the evolution of ML potentials and their practical implementations in advanced functional materials. It is crafted to accommodate readers at both introductory and advanced levels of comprehension, ensuring accessibility for a wide range of audiences.

Ab initio simulations are an integral part of our understanding of materials physics and chemistry alongside experiments. The major bottleneck for the application of ab initio simulations is the computational cost, which scales steeply with the size of the system. If N_e is the number of electrons, the computational cost of different ab initio methods scales from O(N_e^3) to higher powers. DFT is one of the most popular and efficient ab initio methods, with a good trade-off between accuracy and computational cost.86 The cost of DFT calculations based on the Kohn–Sham scheme, which requires diagonalization to obtain the Kohn–Sham orbitals, scales as O(N_e^3) or at best O(N_e^2 log N_e), unless one uses fragmental calculations based on divide-and-conquer algorithms.87,88 From a practical point of view, DFT calculations are affordable only for ∼100 atoms with reasonable computational resources. For simulating large systems, force fields are often used, which express the potential energy in terms of distances, bonds, angles, dihedrals, etc. Force fields are especially popular in simulating biological systems, where large macromolecules such as proteins need to be simulated. The parameters in a force field are often set by fitting to limited experimental data or ab initio results. However, these force fields are of varying transferability between different thermodynamic environments and sometimes between the calculation of different properties within the same environment. In many cases, a force field cannot even satisfactorily fit ab initio data because of many-body quantum effects, which cannot be reduced to bonds, angles, etc.89,90

As an alternative to empirically fitting force fields with simple analytical forms, ML techniques have recently emerged as a powerful tool for representing the mathematical entities used in ab initio modeling.15 Using ML, the potential energy is learned from a set of n ab initio data points. ML reduces the scaling of the computational cost for energy and force calculations to linear in the number of atoms in the system. This allows us to apply a potential, which may be trained with data obtained for small systems, to arbitrarily large systems. Currently, the most widely used ML methods are variations of NNs,36 Gaussian approximation potentials (GAPs),47 and BLR methods.43,53,91 In spite of their success stories, each method has shortcomings. NNs and linear methods need large datasets for training. GAPs can be used for robust modeling with small datasets, but their computational cost for training scales as n^3 with the size n of the data. Here we discuss the SGPR algorithm, which shows very attractive traits in comparison to the previous ML methods. Moreover, we address the issue of optimal data generation, which may be even more important than the regression algorithm. Combined with SGPR, we also introduce an adaptive sampling algorithm that can build robust potentials with a surprisingly small number of DFT data.

The capabilities of this methodology are demonstrated with applications to superionic conducting solid electrolytes. Li diffusivity is studied in dozens of solid electrolytes. For a certain sulfide based electrolyte, the melting, melt-quench, and diffusivity in the glass phase and glass crystallization are simulated, which are found to be consistent with experimental observations. Additionally, since some of the simulated electrolytes had similar chemical elements, the ML models generated for these materials can be combined to give a universal potential for that subset of elements.

In many simulations, even a computationally fast potential is not enough. For instance, slow dynamics or rare events may require inconceivably long molecular dynamics or Monte Carlo simulations in order to achieve statistical certainty. In general, there are two scales that determine the size of the simulation cell and the period of simulation. The size of the simulation cell must be much larger than the length scale of the spatial correlations. Similarly, the period of the simulation must be much longer than the decay time of the temporal autocorrelation functions. In these situations, enhanced sampling algorithms are necessary in addition to fast potentials. Although temporal autocorrelations often decay slowly in chemistry (compared with ∼1 ps for simple liquids92), the spatial correlation length is often small (compare, for example, with the Debye length of ∼10 Å for salt concentrations in biological fluids93). Two exceptions exist where the spatial correlation length is divergent: the first is at the critical point of continuous phase transitions, and the second is when a system exhibits quasi-long-range order. In this regard, fast and accurate ML models for the first-principles potential energy are critical for large-scale materials simulations. Thus, in this section, we address the ML representations of the potential energy surface (PES), including NNs, GPR, and BLR, and then discuss SGPR along with the adaptive sampling algorithms.

As for the ML representations of the PES, the inputs are the atomic coordinates and the targets are the potential energy and forces obtained from ab initio calculations such as DFT. A set of (inputs, targets) constitutes the data that will be used for modeling. Assembling the data is an important topic by itself. The data largely determine the domain of validity for the generated model and the computational cost. Therefore, the generation of optimal data might be the most important step for wider transferability and acceptance of the simulation results. Here, we will assume that the data have been generated beforehand and that we wish to model the existing data. We will discuss the general aspects of ML algorithms and briefly introduce the state-of-the-art methods. Optimal generation of the data by algorithms such as on-the-fly sampling or active learning will be discussed later.

The notation defined in the Nomenclature is followed hereafter. Note that symbols like D(·) and K(·,·) are used to distinguish functions and kernels from coefficients, vectors, etc.

Let R = {r_i}_{i=1}^N represent the coordinates of N atoms in a physical system. A set of descriptor functions {D_α} is defined, which maps R into a surrogate space called the descriptor set {d_α}, where d_α = D_α(R) with a descriptor index α. The essential property of the descriptor functions is that they should be continuous and differentiable with respect to the atomic coordinates. Ideally, there should be a one-to-one mapping between symmetrically distinct physical systems and their associated descriptor sets. Symmetry in this context refers to operations such as rotations and translations, which leave the potential energy invariant. In ML terminology, the descriptors are often referred to as feature vectors. The potential energy of the system is then expressed as
E(R) = Σ_α E_α(d_α)    (1)
where {E_α} is the associated set of latent energy functions (LEFs) for the descriptors, D_α → E_α. The main goal of ML is to find representations for the LEFs using a dataset of ab initio potential energies and forces. For this, the LEFs are represented by regression algorithms such as linear regression, NNs, and GPR.
A subcategory of ML representations, which we are most concerned with, is the expression of the potential energy in terms of local chemical environments (LCEs), which represent the local atomic structure around an atomic position. In this case, it is useful to introduce a cutoff radius r_c as a measure of locality for practical efficiency. The force on atom i depends only on the relative coordinates of the other atoms whose distance is less than r_c. This set of neighbors forms the LCE of atom i, formally defined by
ρ_i = { r_ij : |r_ij| < r_c, j ≠ i }    (2)
where r_ij = r_j − r_i. Then, the descriptor for atom i depends on its LCE, d_i = D(ρ_i), and the potential energy is expressed as
E(R) = Σ_{i=1}^{N} E(d_i)    (3)
Additionally, the force on atom k is
f_k^μ = −∂E/∂r_k^μ    (4)
where r_k^μ are the coordinates of atom k. Derivatives of E with respect to the atomic coordinates are obtained using the chain rule
∂E/∂r_k^μ = Σ_i (∂E/∂d_i)·(∂d_i/∂r_k^μ)    (5)
In the following, we will assume the representation given by Eq. (3), but switching to the generalized form given by Eq. (1) is often as simple as reinterpretation of the indices in the given formulas.

The central assumption in the above formulation is that of locality. This assumption is exercised in practice by choosing a finite cutoff radius rc. There is a trade-off in choosing the cutoff. Choosing a large cutoff dilutes the features of LCEs and increases the computational cost. On the other hand, by choosing a small cutoff, the effect of atoms beyond the cutoff will appear as random forces that cannot be explained. Therefore, the cutoff should be chosen by balancing the errors of the regression algorithm and the random noise due to the atoms beyond the cutoff. Such a principled approach has been taken for ML of the potential energy for amorphous carbon.94 
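To make the notion of an LCE with a finite cutoff concrete, here is a minimal numpy sketch (the function name is illustrative, and periodic boundary conditions are omitted for brevity) that collects, for each atom, the relative coordinates of its neighbors within r_c:

```python
import numpy as np

def local_chemical_environments(R, rc):
    """Collect, for each atom i, the relative vectors r_ij = r_j - r_i
    of all neighbors j within the cutoff radius rc (no periodic images)."""
    N = len(R)
    lces = []
    for i in range(N):
        rij = R - R[i]                          # relative coordinates
        dist = np.linalg.norm(rij, axis=1)
        mask = (dist < rc) & (np.arange(N) != i)
        lces.append(rij[mask])
    return lces

# toy configuration: four atoms on a line, 1 A apart
R = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0]])
lces = local_chemical_environments(R, rc=1.5)
print([len(env) for env in lces])  # neighbor counts: [1, 2, 2, 1]
```

Increasing rc to 2.5 would enlarge each LCE, illustrating the trade-off described above: more atoms per environment means higher descriptor cost, while a too-small rc discards physically relevant neighbors.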

The complexity of an ML model is roughly proportional to the number of parameters present in the model. We are interested in models that can describe the training data with sufficient accuracy, but the main goal is to use the model for reliable predictions of the latent function at unseen inputs. If this is the case for a model, it is said that the model generalizes well. Otherwise, if a model describes the training data well but fails to predict the latent function at unseen inputs, then it is said that the model over-fits the training data. Therefore, the available data are often split into training and testing sets. Only the training set is used for optimizing the parameters of the model. The testing set is used only to estimate the generalization error. Over-fitting can be viewed as a model memorizing the training data instead of finding patterns and relationships among the data. As a rule of thumb, a model with more complexity is more likely to over-fit. Therefore, systematic methods exist to reduce the number of parameters used in the model. This task is called regularization and is often carried out by defining loss functions that aim at minimizing or nullifying some of the parameters in the model. If w = [w_1, …] are the parameters/weights in a model, the most common loss functions for regularization are
L1(w) = ||w||_1    (6)
and
L2(w) = ||w||_2^2    (7)
For the definition of the norms, see the Nomenclature. The important distinction between the L1 and L2 regularization losses is that L1 encourages some of the weights to be exactly zero, even if other weights remain large, while L2 shrinks all of the weights to small but finite values. Therefore, an L1 term can be included to nullify some of the weights. L1 regularization is called the least absolute shrinkage and selection operator (LASSO). L2 regularization is called ridge regression or Tikhonov regularization. Mixing L1 and L2 yields elastic net regression.
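The contrasting behavior of the two losses can be demonstrated with a small numpy sketch. The proximal-gradient (ISTA) solver below is one standard way to handle the non-smooth L1 term; it is an illustrative choice here, not a method prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])   # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=50)

# L2 (ridge): closed-form solution w = (X^T X + lam I)^-1 X^T y
lam_ridge = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam_ridge * np.eye(5), X.T @ y)

# L1 (LASSO): proximal gradient descent (ISTA) with soft-thresholding
def lasso_ista(X, y, lam, n_iter=2000):
    w = np.zeros(X.shape[1])
    eta = 1.0 / np.linalg.norm(X, 2) ** 2        # step from the largest singular value
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)                    # gradient of 0.5*||Xw - y||^2
        w = w - eta * g
        w = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)  # prox of lam*||w||_1
    return w

w_lasso = lasso_ista(X, y, lam=5.0)
print(np.round(w_ridge, 3))   # all weights shrunk but generally non-zero
print(np.round(w_lasso, 3))   # irrelevant weights typically driven exactly to zero
```

The soft-thresholding step is what makes L1 produce exact zeros, whereas the ridge solution only shrinks the irrelevant weights toward zero.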
A NN consists of stacked layers of weights and biases for the propagation of a descriptor vector by a sequence of transformations. Algebraically,
x_{l+1} = a(w_l x_l + b_l)    (8)
where w and b are the weights and biases, respectively. The first input is the descriptor vector, x_1 = d, and the final output is a scalar value for the latent function E. a(·) is called the activation function, a non-linear transformation that enables the NN to model non-linear data. The layers between the input and the output are called hidden layers (Fig. 1). NNs are used extensively for representing the PES.36,37,52 Here we will briefly discuss the main characteristics of NNs.
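A minimal forward pass of Eq. (8) can be sketched in numpy as follows (the layer sizes and the tanh activation are arbitrary illustrative choices):

```python
import numpy as np

def mlp_energy(d, weights, biases):
    """Forward pass of Eq. (8): x_{l+1} = a(w_l x_l + b_l),
    with tanh hidden activations and a linear output for the latent energy."""
    x = d
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(w @ x + b)                       # hidden layers
    return float(weights[-1] @ x + biases[-1])       # scalar latent energy E

rng = np.random.default_rng(1)
sizes = [8, 16, 16, 1]        # descriptor dim -> two hidden layers -> scalar E
weights = [rng.normal(size=(m, n)) / np.sqrt(n) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

d = rng.normal(size=8)        # a descriptor vector
E = mlp_energy(d, weights, biases)
print(E)
```

In a real NN potential the weights would of course be obtained by minimizing a loss such as Eq. (9) rather than drawn at random; this sketch only shows the layer-by-layer propagation.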
FIG. 1.

Neural network representation of the latent energy function E.

For training, the parameters (i.e., weights and biases) in a NN are optimized using a flavor of the gradient descent algorithm. A loss function is defined whose minimization yields the optimal parameters. A common loss function is the squared error
L_E = Σ_{i=1}^{n} (E_i − Ẽ_i)^2    (9)
where {E_i}_{i=1}^n and {Ẽ_i}_{i=1}^n are the energies predicted by the NN and the energies in the training data, respectively. A loss function L_f can be similarly defined with the force data. An important issue to keep in mind is that NNs are massively parameterized; therefore, they can easily over-fit the training data. For regularization, L1 or L2 loss functions are often included in the total loss function alongside L_E and L_f.

NNs are especially efficient for big data, since the computational cost of training an NN on data of size n scales as O(n). On the other hand, it is not easy to train an NN model using a small training set without over-fitting. Therefore, NNs are very useful when large datasets are already available. However, in the context of ML of the PES, it should be noted that the data are obtained from expensive ab initio calculations and are often scarce.

ML potentials based on NNs have greatly improved due to advancements in graph-based algorithms.1,17,40,95–100 When the predicted energies are constrained to be invariant under rotation, translation, and permutation, and the predicted forces are constrained to be equivariant under rotation as the NN parameters are updated, the overfitting issues are significantly reduced.

Graph neural networks (GNNs) are useful for modeling systems where data can naturally be represented as a graph, such as atomic systems in materials and molecular simulations. In these models, nodes represent atoms, and edges represent interactions like bonds or forces between them. Two important categories of GNNs in this domain are invariant models (M3GNet,96 CHGNet97) and equivariant models (NequIP,40 MACE17). Invariant models ensure the output properties are invariant to the rotation, translation, and reflection of the input system, while equivariant models enforce the predictions change in a way that follows the symmetry transformations (like rotations) of the input system. Invariant models are typically simpler and faster, suitable for scalar outputs like total energy, while equivariant models that incorporate geometric symmetry are ideal for tasks requiring directional dependencies (such as forces and torques).
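The invariance idea can be illustrated with a toy descriptor. The sketch below uses sorted interatomic distances, a deliberately simple stand-in for the learned invariant features of models such as M3GNet or CHGNet, and checks that a rotation, translation, and permutation of the atoms leave it unchanged:

```python
import numpy as np

def pairdist_descriptor(R):
    """A simple invariant descriptor: the sorted list of interatomic distances.
    Rotations, translations, and permutations of the atoms leave it unchanged."""
    diff = R[:, None, :] - R[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(R), k=1)
    return np.sort(d[iu])

rng = np.random.default_rng(2)
R = rng.normal(size=(5, 3))

# random rotation via QR decomposition, plus a translation and a permutation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R2 = (R @ Q.T + np.array([1.0, -2.0, 0.5]))[rng.permutation(5)]

same = np.allclose(pairdist_descriptor(R), pairdist_descriptor(R2))
print(same)  # True
```

An equivariant quantity such as a force vector would instead rotate together with the input, which is exactly the constraint that equivariant GNNs build into their architectures.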

A Gaussian process is a type of stochastic process that is a generalization of the multivariate normal (MVN) distribution to a continuous index set. A stochastic process is a set of random variables {f(x) : x ∈ X}, where X is an index set (for instance, X = R^d, where d is the dimension). A Gaussian process is a stochastic process such that any finite subset of the random variables has a MVN distribution. A MVN distribution with mean vector m and covariance matrix K is defined by
N(f | m, K) = (2π)^{−n/2} |K|^{−1/2} exp[−(1/2)(f − m)^T K^{−1} (f − m)]    (10)
where f = [f_1, …, f_n]^T. Symbolically,
f ∼ N(m, K)    (11)
A Gaussian process with mean function M(·) and covariance kernel K(·,·) is defined such that, for any finite set of elements x_1, …, x_n ∈ X, the associated random variables f(x_1), …, f(x_n) have a MVN distribution
[f(x_1), …, f(x_n)]^T ∼ N(m, K),  m_i = M(x_i),  K_ij = K(x_i, x_j)    (12)
or symbolically
f ∼ GP(M(·), K(·,·))    (13)
The covariance kernel K(x, x′) is a positive definite function, given by
K(x, x′) = ⟨(f(x) − M(x))(f(x′) − M(x′))⟩    (14)
which results in a symmetric positive definite covariance matrix K such that
K_ij = K(x_i, x_j),  v^T K v > 0 for all v ≠ 0    (15)
Positive definiteness ensures that all of the eigenvalues of the resulting covariance matrix K are positive. For intuition, let
f(x) = M(x) + Σ_j ϕ_j(x) ϵ_j    (16)
where Φ(x) = {ϕ_j(x)} is the mapping to decoupled components such that the ϵ_j are independent random variables with ⟨ϵ_i ϵ_j⟩ = σ_j^2 δ_ij. Then, for the covariance of f at x and x′, we have
K(x, x′) = Σ_j σ_j^2 ϕ_j(x) ϕ_j(x′)    (17)
The reverse argument is Mercer's theorem, which states that a given kernel can be expressed in terms of its eigenvalues and eigenfunctions (∫ K(x, x′) ϕ(x′) dμ(x′) = λ ϕ(x), where dμ(x) may be dx or p(x) dx)
K(x, x′) = Σ_j λ_j ϕ_j(x) ϕ_j(x′)    (18)
As an example for the covariance kernel, the radial basis function (RBF) kernel is defined by
K(x, x′) = exp(−|x − x′|^2 / 2ξ^2)    (19)
where ξ is a hyper-parameter.
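A short numpy sketch of the RBF kernel of Eq. (19) shows that the resulting covariance matrix is indeed positive definite, so its Cholesky factor exists (the data and kernel width here are arbitrary):

```python
import numpy as np

def rbf_kernel(X1, X2, xi=1.0):
    """RBF covariance kernel of Eq. (19): K(x, x') = exp(-|x - x'|^2 / (2 xi^2))."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * xi ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))
K = rbf_kernel(X, X)

# positive definiteness: all eigenvalues positive, so Cholesky succeeds
eigvals = np.linalg.eigvalsh(K)
L = np.linalg.cholesky(K + 1e-10 * np.eye(6))   # tiny jitter for numerical safety
print(eigvals.min() > 0, np.allclose(L @ L.T, K, atol=1e-8))
```

The tiny jitter added before the factorization anticipates the numerical-stability point discussed later for hyper-parameter optimization.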
For representing the PES, the latent energy function E is taken as a Gaussian process. A detailed formulation of the regression algorithm will be discussed in a later section. Before proceeding, it is useful to also consider the covariances between derivatives of the latent function, since we need to consider forces in addition to energies:
⟨f(x) ∂f(x′)/∂x′_ν⟩ = ∂K(x, x′)/∂x′_ν    (20)
⟨∂f(x)/∂x_μ f(x′)⟩ = ∂K(x, x′)/∂x_μ    (21)
⟨∂f(x)/∂x_μ ∂f(x′)/∂x′_ν⟩ = ∂^2 K(x, x′)/∂x_μ ∂x′_ν    (22)

1. Multivariate normal distributions

Here we review some of the most important properties of MVN distributions defined by Eq. (10). Let f and f′ be random vectors with MVN distributions
f ∼ N(m, K)    (23)
f′ ∼ N(m′, K′)    (24)
If f and f′ are independent and have the same size, then f + f′ also has a MVN distribution given by
f + f′ ∼ N(m + m′, K + K′)    (25)
Let f be partitioned into two parts fA,fB such that
f = [f_A; f_B],  m = [m_A; m_B],  K = [K_AA K_AB; K_BA K_BB]    (26)
Then, the marginal density is defined by
p(f_A) = ∫ p(f_A, f_B) df_B    (27)
One can show that p(fA) is a MVN distribution given by
f_A ∼ N(m_A, K_AA)    (28)
Similarly, the conditional density defined by
p(f_A | f_B) = p(f_A, f_B) / p(f_B)    (29)
is a MVN distribution (see the  Appendix for the derivation) given by
f_A | f_B ∼ N(m_A + K_AB K_BB^{−1}(f_B − m_B), K_AA − K_AB K_BB^{−1} K_BA)    (30)
Since the covariance matrix is symmetric positive definite, it can be decomposed as K = L L^T, where L is a lower triangular matrix. This factorization is called the Cholesky decomposition. Any symmetric matrix has an eigen-decomposition K = Q Λ Q^T, where the columns of Q are the eigenvectors of K and Λ is a diagonal matrix whose entries are the eigenvalues of K. The inverse of the covariance matrix can be expressed as K^{−1} = Q Λ^{−1} Q^T. Then the vector α = Q^T(f − m) has a distribution that factorizes into a product of independent Gaussians
p(α) = Π_j (2πλ_j)^{−1/2} exp(−α_j^2 / 2λ_j)    (31)
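These decompositions are easy to verify numerically. The sketch below factorizes a random covariance matrix, draws samples f ∼ N(m, K) through the Cholesky factor, and checks that the components of α = Q^T(f − m) are decorrelated with variances given by the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))
K = A @ A.T + 4 * np.eye(4)            # a symmetric positive definite covariance
m = np.zeros(4)

L = np.linalg.cholesky(K)              # K = L L^T
lam, Q = np.linalg.eigh(K)             # K = Q diag(lam) Q^T

# sample f ~ N(m, K) via the Cholesky factor, then decorrelate: alpha = Q^T (f - m)
f = m[:, None] + L @ rng.normal(size=(4, 100000))
alpha = Q.T @ (f - m[:, None])
cov_alpha = np.cov(alpha)              # approaches diag(lam) for many samples

print(np.allclose(L @ L.T, K), np.allclose(Q @ np.diag(lam) @ Q.T, K))
print(np.round(np.diag(cov_alpha) / lam, 2))   # ratios near 1
```

Sampling through L is also how MVN random vectors are generated in practice, since it turns independent standard normals into correlated ones with covariance K.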

2. Gaussian process regression

In GPR, for given data {(x_i, y_i)}_{i=1}^l, we assume
y_i = f(x_i) + ϵ    (32)
where ϵ ∼ N(0, σ^2) is random noise, x_i is a columnar (feature) vector of size d, y_i is a scalar (label), and f is the latent function. For brevity,
X = [x_1, …, x_l],  y = [y_1, …, y_l]^T    (33)
Considering a valid covariance kernel K(·,·), resulting in the covariance matrix K_ij = K(x_i, x_j), we assume a Gaussian process prior for f as
f = [f(x_1), …, f(x_l)]^T ∼ N(0, K)    (34)
Then, a MVN distribution for the labels will be assumed
y ∼ N(0, K + σ^2 I)    (35)
We can split the data into training and test sets as follows:
(36)
Then, Eq. (35) becomes
[y; y_*] ∼ N(0, [K + σ^2 I, K_*; K_*^T, K_** + σ^2 I])    (37)
where K = K(X, X), K_* = K(X, X_*), and K_** = K(X_*, X_*) for training inputs X and test inputs X_*.
By removing the noise from the testing data, we get
[y; f_*] ∼ N(0, [K + σ^2 I, K_*; K_*^T, K_**])    (38)
Then, using the conditional density formula [Eq. (30)], we can find the predictive distribution for f as
f_* | y ∼ N(μ_*, Σ_*)    (39)
where the predictive mean and variance are given by
μ_* = K_*^T (K + σ^2 I)^{−1} y,  Σ_* = K_** − K_*^T (K + σ^2 I)^{−1} K_*    (40)
In contrast to NNs, which are massively parameterized, Gaussian processes are (almost) non-parametric. Often, only a few hyper-parameters are included in the covariance kernel, which need to be optimized. Therefore, GPR works well with small datasets and is largely immune to over-fitting. The schematics for GPR and for the covariance and kernel are shown in Fig. 2.
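The predictive formulas of Eq. (40) fit in a few lines of numpy; the toy latent function, kernel width, and noise level below are illustrative choices:

```python
import numpy as np

def rbf(a, b, xi=0.5):
    """1-D RBF kernel matrix between point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * xi ** 2))

# noisy samples of a latent function f(x) = sin(x)
rng = np.random.default_rng(5)
x = np.linspace(0, 2 * np.pi, 15)
y = np.sin(x) + 0.01 * rng.normal(size=15)
sigma2 = 1e-4

xs = np.array([1.0, 2.5, 4.0])                 # test inputs
K = rbf(x, x) + sigma2 * np.eye(15)
Ks = rbf(x, xs)
Kss = rbf(xs, xs)

mean = Ks.T @ np.linalg.solve(K, y)            # predictive mean, Eq. (40)
cov = Kss - Ks.T @ np.linalg.solve(K, Ks)      # predictive covariance, Eq. (40)
print(np.round(mean, 3))                       # close to sin([1.0, 2.5, 4.0])
print(np.round(np.diag(cov), 6))               # small predictive variances
```

Note that only 15 training points are used, illustrating the data efficiency emphasized above; the predictive variance additionally quantifies the expected error at each test input.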
FIG. 2.

Schematic features of Gaussian process regression and of covariance and kernel.


Meanwhile, for GPR, we need to invert the covariance matrix on the training data, which is computationally costly. If the size of the covariance matrix is n, the computational cost of matrix inversion is O(n^3). To reduce this unfavorable scaling, several approximation methods have emerged that reduce it to O(n). We will discuss one of them in a later section.

3. Hyper-parameters

Any selected covariance kernel K(·,·) has a set of hyper-parameters ϑ. In training, we tune these parameters to maximize the probability of training labels (marginal likelihood)
p(y | ϑ) = N(y | 0, K + σ^2 I)    (41)
In practice, we maximize the logarithm of this probability (log-marginal-likelihood)
log p(y | ϑ) = −(1/2) y^T Γ^{−1} y − (1/2) log|Γ| − (l/2) log 2π    (42)
where Γ = K + σ^2 I. For gradient ascent algorithms, the partial derivatives are
∂ log p(y | ϑ)/∂ϑ = (1/2) tr[(αα^T − Γ^{−1}) ∂Γ/∂ϑ]    (43)
where α = Γ^{−1} y.

If the inverse of the matrix is not explicitly required in the calculation of quantities such as Γ^{−1} y, we can bypass the matrix inversion and calculate the result α as the solution of the linear system Γα = y. If Γ is symmetric and positive definite, the Cholesky decomposition can be used to significantly reduce the computational complexity. An important point is worth noting here about σ^2, which is added to the diagonal elements of the covariance matrix. It should not be set to exactly zero, even if our data are noiseless. A small non-zero σ is needed to ensure the numerical stability of the Cholesky factorization and matrix inversion. The reason is that, although the covariance matrix is in principle positive definite, an eigenvalue can become zero (or slightly negative) due to numerical errors. This makes the determinant zero and the inverse non-existent.
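The recommended practice, solving Γα = y through a jittered Cholesky factorization instead of forming Γ^{−1}, can be sketched as follows (data and jitter size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                      # RBF covariance matrix
y = rng.normal(size=20)

# Gamma = K + sigma^2 I: a small non-zero sigma keeps the factorization stable
sigma2 = 1e-6
Gamma = K + sigma2 * np.eye(20)
L = np.linalg.cholesky(Gamma)              # fails if Gamma is not positive definite

# solve Gamma alpha = y via two triangular solves, never forming Gamma^-1
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

ok = np.allclose(Gamma @ alpha, y)
print(ok)  # True
```

Without the jitter, `np.linalg.cholesky` can raise `LinAlgError` on a covariance matrix whose smallest eigenvalue has drifted to zero or slightly below through round-off, which is exactly the failure mode described above.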

4. Covariance matrix for energies and forces

For GPR of the potential energy defined by Eq. (3), the latent energy function E(·) is assumed to be a Gaussian process with the covariance kernel K(·,·)
E(·) ∼ GP(0, K(·,·))    (44)
Explicit forms for K(·,·) will be discussed later. For two sets of atomic positions R and R′, the covariance between the respective potential energies is obtained by
⟨E(R) E(R′)⟩ = Σ_{i=1}^{N} Σ_{j=1}^{N′} K(d_i, d′_j)    (45)
where d_i and d′_j are the descriptors for the LCEs in R and R′. The covariance between a force component and a total energy is expressed as
⟨f_k^μ(R) E(R′)⟩ = −∂/∂r_k^μ Σ_{ij} K(d_i, d′_j) = −Σ_{ij} (∂K(d_i, d′_j)/∂d_i)·(∂d_i/∂r_k^μ)    (46)
where r_k^μ are the coordinates of atom k in R and the chain rule is applied in the last step. Similarly, for two force components, we can write
⟨f_k^μ(R) f_l^ν(R′)⟩ = ∂^2/∂r_k^μ ∂r′_l^ν Σ_{ij} K(d_i, d′_j)    (47)
Let the number of atoms in R and R′ be N and N′, respectively. Then the computational cost for Eq. (45) is
C ∝ N N′
For Eq. (46), since ∂d_i/∂r_k^μ is non-zero only if i and k are neighbors, the computational cost is
C ∝ n_c N′
where n_c is the number of atoms in the neighborhood of atom k within the cutoff radius r_c. Similarly, for Eq. (47), the cost becomes
C ∝ n_c^2
The total cost for all covariance calculations, considering that there are 3N and 3N′ force components in R and R′, becomes
C ∝ N N′ n_c^2    (48)
Here we discuss the algebra for GPR of the ab initio potential energy and forces using a dataset of such calculations. For two sets of atomic positions R and R′, we can calculate the covariance matrix as
K_RR′ = [⟨E E′⟩, ⟨E f′_l^ν⟩; ⟨f_k^μ E′⟩, ⟨f_k^μ f′_l^ν⟩]    (49)
where the blocks are given by Eqs. (45)–(47) and r_k^μ, r′_l^ν run over the coordinates of the atoms in R and R′. If the numbers of atoms in R and R′ are N and N′, the shape of K_RR′ is (1 + 3N) × (1 + 3N′). In addition, let
y_R = [E, f_1^x, f_1^y, …, f_N^z]^T    (50)
be a vector of length (1 + 3N) containing the potential energy and forces obtained from ab initio calculations for R.
With the above definitions, let R_1, …, R_n be n distinct ab initio calculations. Using this dataset, we wish to predict the ab initio potential energy and forces for a new configuration R. Using Eq. (40), the predictive mean is
ỹ_R = K_Rn w    (51)
w = (K_nn + σ^2 I)^{−1} y_n    (52)
where
K_Rn = [K_RR_1, …, K_RR_n],  (K_nn)_{ab} = K_{R_a R_b},  y_n = [y_{R_1}; …; y_{R_n}]    (53)
The predictive variance becomes
Σ(R) = K_RR − K_Rn (K_nn + σ^2 I)^{−1} K_Rn^T    (54)
For improving the inference, we need to maximize
log p(y_n) = −(1/2) y_n^T (K_nn + σ^2 I)^{−1} y_n − (1/2) log|K_nn + σ^2 I| + const    (55)
with respect to the hyper-parameters in the covariance kernel K(·,·) and possibly the descriptor function D(·).
If all of the systems in the dataset have the same number of atoms N, the size of the covariance matrix that needs to be inverted is (1+3N)n. The computational cost for calculating w is
C_w ∝ [(1 + 3N) n]^3    (56)
Using Eq. (48), the computational cost for the construction of the covariance matrix K_nn is
C_cov ∝ n^2 N^2 n_c^2    (57)
With N ≈ 100 atoms and reasonable computational resources, in practice only a few hundred ab initio data can be used for full-rank GPR with acceptable efficiency. Therefore, approximations that reduce the cost scaling are crucial. Variations of this formalism are used for regression of the ab initio potential energy.44,47,49–51,101
Let z = {χ_j}_{j=1}^m be a set of descriptors. This set can be a subset of all of the LCEs in the n ab initio data R_1, …, R_n. For a given descriptor d, the latent function is expressed as
E(d) = Σ_{j=1}^{m} w_j K(d, χ_j)    (58)
where w = [w_1, …, w_m]^T are the weights for the descriptors in z and K(·,·) is a covariance kernel similar to that of GPR. For atomic positions given by R, we define
β(R, z) = [Σ_{i=1}^{N} K(d_i, χ_1), …, Σ_{i=1}^{N} K(d_i, χ_m)]    (59)
where the d_i are the descriptors in R, N is the number of atoms, and β(R, z) is a 1 × m matrix. The potential energy for R becomes
E(R) = β(R, z) w    (60)
Then, the forces acting on the kth atom are obtained from
f_k^μ = −∂E(R)/∂r_k^μ = −Σ_j w_j Σ_i (∂K(d_i, χ_j)/∂d_i)·(∂d_i/∂r_k^μ)    (61)
The instantaneous pressure tensor Pμν is defined as
(62)
where T_μν is the instantaneous temperature tensor. To predict the pressure tensor, the internal virial tensor W_μν is obtained by
(63)
The weights w can be calculated by a linear fit of E(R), f_k^μ, and W_μν to the energies, forces, and virial pressures from ab initio calculations for the data R_1, …, R_n. For this, the following matrices are defined:
(64)
Then, the linear system to be solved is
K_nm w = y_n    (65)
where y_n is the vector of the associated ab initio energies, forces, and virial pressures in the data [see Eqs. (50) and (53)]. The formal solution of this linear system, with ridge regression regularization, becomes
w = (K_nm^T K_nm + σ^2 I)^{−1} K_nm^T y_n    (66)
where σ is a small regularization parameter [Eq. (66) is the vector that minimizes ||K_nm w − y_n||_2^2 + σ^2 ||w||_2^2], but computationally it is better to solve Eq. (65) directly. The predictive variance can be calculated from
(67)
The computational cost for w, with the same assumptions that we made in Sec. II G, becomes
(68)
which is significantly smaller than the inversion of the full covariance matrix since typically mn. The cost of calculation of the covariance matrices scales as
(69)
Determination of the set z is the most challenging step. Naively, z can be obtained by collecting all of the LCEs in the data. However, for efficiency, an algorithm can be designed to collect only significantly distinct environments. This method has been successfully used for studies of the melting and stability of hybrid metal–organic perovskites.43,91
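A schematic of the linear fit of Eqs. (65) and (66) on synthetic matrices (the shapes, noise level, and regularization strength are arbitrary assumptions, with random numbers standing in for the kernel entries) illustrates the two solution routes:

```python
import numpy as np

# hypothetical shapes: rows = ab initio targets (energies/forces/virials),
# columns = m inducing descriptors in z
rng = np.random.default_rng(7)
n_rows, m = 200, 10
Knm = rng.normal(size=(n_rows, m))
w_true = rng.normal(size=m)
yn = Knm @ w_true + 0.01 * rng.normal(size=n_rows)

# Eq. (66): ridge-regularized normal-equations solution of Knm w = yn
sigma2 = 1e-6
w = np.linalg.solve(Knm.T @ Knm + sigma2 * np.eye(m), Knm.T @ yn)

# solving the overdetermined system Eq. (65) directly is better conditioned,
# since forming Knm^T Knm squares the condition number
w_lstsq, *_ = np.linalg.lstsq(Knm, yn, rcond=None)
print(np.allclose(w, w_lstsq, atol=1e-4))   # the two routes agree here
```

For well-conditioned matrices the two routes coincide, but the least-squares solver is the safer default, which is the computational point made after Eq. (66).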
Compressed sensing formalism53 is similar to the linear regression discussed in Sec. II H. The only difference is that instead of using the ridge regression regularization in solving Eq. (65), the elastic net regression regularization is applied when w is calculated by minimizing the following loss function
(70)
which reduces to the ridge regression if β = 0. A non-zero β causes some of the wj to vanish. This can be viewed as a systematic method for compressed sampling of the LCEs present in the data.
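The sparsifying effect of the L1 term in Eq. (70) can be demonstrated with a small self-contained sketch. The proximal-gradient (ISTA) solver below is a generic textbook method chosen for illustration (it is not taken from Ref. 53), and all variable names are ours; it shows that with β > 0 most weights are driven exactly to zero, mimicking compressed sampling of LCEs.

```python
import numpy as np

def elastic_net_ista(K, y, alpha=1e-2, beta=1e-1, steps=2000):
    """Proximal-gradient (ISTA) sketch of elastic-net regression:
    minimizes ||K w - y||^2 + alpha ||w||^2 + beta ||w||_1.
    The L1 term zeroes out weights of redundant descriptors."""
    n, m = K.shape
    w = np.zeros(m)
    # step size from the Lipschitz constant of the smooth part
    L = 2 * (np.linalg.norm(K, 2) ** 2 + alpha)
    for _ in range(steps):
        grad = 2 * K.T @ (K @ w - y) + 2 * alpha * w
        z = w - grad / L
        # soft-thresholding: proximal operator of the L1 penalty
        w = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)
    return w

rng = np.random.default_rng(1)
K = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]   # sparse ground truth
y = K @ w_true
w = elastic_net_ista(K, y)
# most entries of w vanish; the three true descriptors survive
```

Setting β = 0 in this sketch recovers plain ridge regression, as stated above.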

In Secs. II G and II H, we discussed two types of Bayesian regression algorithms: full-rank GPR and BLR methods. In full-rank GPR, the cost for calculation of the covariance matrix scales as Ccov ∝ n2 for a dataset of size n. The cost for evaluation of the regression weights is dominated by inversion of the covariance matrix, which scales as Cw ∝ n3. In linear regression methods, both of these costs are reduced to O(n) scaling. However, linear regression methods fall short in data efficiency: GPR can represent the potential energy surface at the same accuracy with much less data. In this section, we explore a low-rank approximation of the covariance matrix in GPR that reduces the cost scaling to that of the linear regression methods while maintaining the data efficiency and expressiveness of GPR. This novel algorithm is called SGPR modeling. We also address on-the-fly data sampling methods that are useful to maximize the efficiency of SGPR.

The key for SGPR is a low-rank approximation of the covariance matrix in GPR (Figs. 3 and 4). If z={χj}j=1m is a reduced set of LCEs, which is a sufficient statistic for the LCEs in the n ab initio data R1, …, Rn, we can approximate
(71)
where Knn is the full covariance matrix among the data and Knm is the cross covariance matrix between the data and z. See Eqs. (53) and (64) for explicit forms of these matrices. Kmm is the covariance matrix among z, which is given by
(72)
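The low-rank (Nyström-type) surrogate of Eq. (71) can be illustrated with a toy numpy example, using an RBF kernel as a stand-in for the LCE kernel and random vectors as stand-ins for descriptors (all assumptions of this sketch). It also checks a structural property of the approximation: the residual Knn − K̃nn is positive semi-definite.

```python
import numpy as np

def rbf(A, B, ell=1.5):
    """Squared-exponential kernel between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))              # stand-in for n descriptors
Z = X[rng.choice(200, 30, replace=False)]  # m inducing points, m << n

K_nn = rbf(X, X)
K_nm = rbf(X, Z)
K_mm = rbf(Z, Z)

# low-rank surrogate of Eq. (71): K_nn ~ K_nm K_mm^{-1} K_nm^T
# (a tiny jitter keeps the solve stable -- an implementation choice)
K_tilde = K_nm @ np.linalg.solve(K_mm + 1e-8 * np.eye(30), K_nm.T)

rel_err = np.linalg.norm(K_nn - K_tilde) / np.linalg.norm(K_nn)
# the rank-30 surrogate captures most of the 200 x 200 matrix
```

Only K_nm and K_mm are ever formed, which is the source of the cost reduction discussed below.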
FIG. 3.

Schematic of rank reduction of the covariance matrix of the kernel by SGPR (Knn ≈ KnmKmm−1KnmT), which avoids the inversion of Knn and improves the scalability, with data-efficient adaptive sampling toward optimal generation of training data and global exploration.

FIG. 4.

Schematic comparison of GPR (left) and SGPR (right). Squares represent atomic configurations. Circles represent compilation of configurations into LCEs around each atom. In GPR, the potential energy for x is predicted by calculating its covariance matrix with the data X. In SGPR, only the covariance matrix with the inducing LCEs z is needed. Figure taken from Ref. 54. Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.

The set z is called an inducing set of LCEs. Similarly, for covariance of the test system R with the data, we approximate
(73)
where Km is the covariance matrix between R and z. Using the approximations given by Eqs. (71) and (73) for the predictive mean in Eq. (51), we can write
(74)
(75)
where yn is the vector of ab initio energies, forces, and virial pressures in the data [see Eqs. (53) and (65); refer to Eq. (A34) in the  Appendix]. An interesting observation is that if Kmm in the above equation is replaced by I, the predictive mean reduces to that of the BLR method in Eq. (66). Therefore, except for the calculation of Kmm and its inverse, the computational cost is similar to that of the Bayesian linear regression method. Another point is that if we define w = Kmm−1e and substitute it into Eq. (74), it looks like a regression based on (z,e) instead of the full data. For this reason, the above approximation is also called the projected process. The predictive variance becomes
(76)
Finally, w can be viewed as the solution of the following linear system:
(77)
where L is the Cholesky factor of Kmm (Kmm = LLT, where L is lower triangular) and 0 is a column vector of zeros with length m.
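The stacked linear system of Eq. (77) can be sketched directly in numpy. The helper name `sgpr_weights` and the synthetic data are our assumptions for illustration; the point is that minimizing the stacked residual reproduces the projected-process normal equations (KmnKnm + σ2Kmm)w = Kmnyn.

```python
import numpy as np

def sgpr_weights(K_nm, K_mm, y_n, sigma=1e-2):
    """Solve the stacked least-squares system of Eq. (77):
        [ K_nm      ]       [ y_n ]
        [ sigma L^T ] w  =  [  0  ],   with K_mm = L L^T (Cholesky).
    Its normal equations are (K_mn K_nm + sigma^2 K_mm) w = K_mn y_n,
    i.e. the projected-process weights behind Eq. (74)."""
    m = K_mm.shape[0]
    L = np.linalg.cholesky(K_mm + 1e-10 * np.eye(m))  # jitter for stability
    A = np.vstack([K_nm, sigma * L.T])
    b = np.concatenate([y_n, np.zeros(m)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

rng = np.random.default_rng(3)
K_nm = rng.normal(size=(40, 10))          # synthetic cross covariance
B = rng.normal(size=(10, 10))
K_mm = B @ B.T + np.eye(10)               # synthetic PSD inducing block
w_true = rng.normal(size=10)
y = K_nm @ w_true                         # noiseless synthetic targets
w = sgpr_weights(K_nm, K_mm, y, sigma=1e-4)
```

Solving the stacked system with `lstsq` is better conditioned than inverting σ2Kmm + KmnKnm directly, which mirrors the remark above about solving Eq. (65) directly.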
From another point of view, the SGPR algorithm is just a normal GPR only with the kernel redefined as
(78)
where Kρim is the covariance matrix between the LCE ρi and the inducing set z. That is, the covariance kernel between two LCEs is indirectly calculated using the inducing set z. The covariance loss due to sparsification can be quantified by the spilling factor102 
(79)
which can also be viewed as the predictive variance of a GPR based on (z,e). Therefore, s(ρ) can be utilized in sampling LCEs.
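A compact sketch of the spilling factor of Eq. (79) follows; the RBF kernel and the function names are illustrative stand-ins (not the descriptor kernel of the review). It shows the two limiting behaviors: s(ρ) ≈ 0 for an LCE already in z, and s(ρ) ≈ k(ρ,ρ) for an LCE far from every inducing point.

```python
import numpy as np

def spilling_factor(k_rr, k_rm, K_mm):
    """Covariance loss of Eq. (79) for one LCE rho:
        s(rho) = k(rho, rho) - k_rm K_mm^{-1} k_rm^T,
    i.e. the variance of rho left unexplained by the inducing set z.
    A small jitter keeps the solve stable (an implementation choice)."""
    v = np.linalg.solve(K_mm + 1e-10 * np.eye(len(K_mm)), k_rm)
    return float(k_rr - k_rm @ v)

def rbf_vec(a, B, ell=1.0):
    """RBF covariances between one descriptor a and the rows of B."""
    return np.exp(-0.5 * ((B - a) ** 2).sum(-1) / ell**2)

rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 2))                       # inducing descriptors
K_mm = np.exp(-0.5 * ((Z[:, None] - Z[None]) ** 2).sum(-1))
s_in = spilling_factor(1.0, rbf_vec(Z[0], Z), K_mm)          # rho in z
s_out = spilling_factor(1.0, rbf_vec(Z[0] + 10.0, Z), K_mm)  # far rho
# s_in ~ 0: nothing is lost for an LCE already in z;
# s_out ~ 1: a distant LCE is entirely unexplained by z
```

Ranking on-the-fly LCEs by s(ρ) is exactly how the importance-ordered insertion trials described later in this section prioritize candidates.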

Given that the kernel in SGPR only needs to be calculated between the data and the inducing LCEs, a substantial improvement is achieved compared with full GPR. In a naive comparison, the improvement is O(n2) → O(n), while a more rigorous analysis shows a much better improvement. For example, with 100 configurations (100 atoms each), the improvement is a factor of O(103).

In full-rank GPR, the hyper-parameters are optimized by maximizing log p(yn), where p is an MVN distribution with mean 0 and covariance matrix Knn. In SGPR, Knn is approximated by Eq. (71). In the following, we will refer to the approximated Knn as K̃nn. A straightforward approximation for p(yn) is to just use K̃nn instead of Knn. This approximate distribution will be denoted by p̃. Instead of this approximation, Titsias103 found a variational lower bound for the true marginal log-likelihood, which is given by
(80)
Therefore, for optimizing the hyper-parameters in the model, F should be maximized. Adding an LCE to the inducing set z can only increase F. This is very useful for selecting the inducing LCEs: among several candidates, the one that increases F the most can be selected.
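Greedy inducing-point selection can be sketched with a cheap proxy for F: the trace tr(Knn − KnzKzz−1Kzn) (the total "spill"), which appears as a penalty in Titsias's bound, so shrinking it increases F, all else being equal. This toy (RBF kernel, two synthetic clusters, our function names) is an assumption-laden stand-in for the full variational score, not the selection rule of Ref. 103.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def total_spill(X, Z):
    """tr(K_nn - K_nz K_zz^{-1} K_zn): the trace penalty in Titsias's
    bound. Adding an inducing point can only shrink this term."""
    K_nz, K_zz = rbf(X, Z), rbf(Z, Z)
    V = np.linalg.solve(K_zz + 1e-10 * np.eye(len(Z)), K_nz.T)
    return float(len(X) - np.sum(K_nz.T * V))   # k(x, x) = 1 for RBF

def greedy_add(X, Z, candidates):
    """Pick the candidate whose insertion into Z lowers the spill most."""
    scores = [total_spill(X, np.vstack([Z, c[None]])) for c in candidates]
    return candidates[int(np.argmin(scores))]

rng = np.random.default_rng(5)
X = np.vstack([0.3 * rng.normal(size=(30, 2)),         # cluster A
               0.3 * rng.normal(size=(30, 2)) + 8.0])  # cluster B, far
Z = X[:1]                                  # one inducing point, in A
best = greedy_add(X, Z, candidates=X[[5, 40]])
# X[40] (cluster B) uncovers far more variance than another A point
```

The greedy step picks the far-cluster candidate because the near cluster is already well covered, which is the intuition behind "the one that increases F the most can be selected."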

The most important step, regardless of the regression algorithm, is the generation of the data. Early methods36,47 prepared the data beforehand, which is not efficient; for instance, they used existing ab initio calculations or generated samples by random displacement of atoms. On-the-fly learning methods43,44,91,101 estimate the accuracy of the potential during molecular dynamics (MD) simulation and use a criterion for data generation. In GPR-based algorithms, the predictive variance is a suitable criterion, while in NNs the uncertainty can be emulated by training identical NNs with different initial random seeds. Generally, GPR-based algorithms are better suited for on-the-fly learning since they have only a few hyper-parameters. However, the size of these potentials and their computational cost grow with time; therefore, data generated earlier need to be discarded to keep the potential affordable. Thus, for globally accurate potentials, it is critical to devise an optimal adaptive sampling algorithm. We tackle this issue by replacing the predictive variance with another criterion, which we call the geometric criterion. This is motivated by the observation that the predictive variance does not take advantage of the energy and forces data. Assume that a SGPR model is generated using n ab initio data X={Rj}j=1n and m inducing LCEs z={χj}j=1m. Aside from hyper-parameter optimization, the potential can be modified in two ways: adding an LCE χm+1 to the inducing set z or adding a new ab initio data point Rn+1 to X. Instead of calculating the predictive variance, we calculate the change in the predictive mean of the potential by directly constructing the potential both with and without Rn+1 (and similarly for χm+1). If the change in the predictive mean is more than a threshold ϵ, the addition is accepted. Inclusion of the forces data in the regression is critical for this algorithm; otherwise, the geometric criterion may become only marginally better than the variance criterion.
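The geometric criterion can be sketched with a tiny kernel ridge regressor standing in for the SGPR model (a deliberate simplification; none of these names belong to the AutoForce API). The model is rebuilt with and without a candidate point, and the shift of the predictive mean on a reference grid decides acceptance: a redundant point barely moves the mean, while a novel point moves it strongly.

```python
import numpy as np

def krr_fit(X, y, ell=1.0, lam=1e-8):
    """Tiny kernel ridge regressor standing in for the SGPR model."""
    K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1) / ell**2)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    def predict(Xs):
        Ks = np.exp(-0.5 * ((Xs[:, None] - X[None]) ** 2).sum(-1) / ell**2)
        return Ks @ alpha
    return predict

def mean_shift(X, y, x_new, y_new, X_ref):
    """Geometric criterion: rebuild the model with and without the
    candidate point and measure the largest change of the predictive
    mean on a reference set; accept only if it exceeds a threshold."""
    f0 = krr_fit(X, y)(X_ref)
    f1 = krr_fit(np.vstack([X, x_new]), np.append(y, y_new))(X_ref)
    return np.max(np.abs(f1 - f0))

X = np.linspace(-3.0, 3.0, 7)[:, None]
y = np.sin(X[:, 0])
X_ref = np.linspace(-6.0, 6.0, 60)[:, None]
shift_dup = mean_shift(X, y, X[:1], y[0], X_ref)             # redundant
shift_new = mean_shift(X, y, np.array([[5.0]]), 1.0, X_ref)  # novel
# shift_dup is negligible; shift_new is large, so only the novel
# point would pass the threshold test
```

Including forces in the real fit makes the with/without difference far more sensitive, which is the point made above about the forces data being critical.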

For adaptive sampling of the data and inducing LCEs, the model is built on-the-fly with MD, where the system evolves with the SGPR model and ab initio calculations are carried out actively to correct the model based on the following criterion. The data and unique LCEs in the first step are automatically included in X and z. In the following steps, we try to insert on-the-fly LCEs into z. An LCE χm+1 is added to z only if ΔE(χm+1)>ϵ, where ΔE is the change in the predictive energy resulting from the inclusion of χm+1 in z and ϵ is a predetermined threshold. Since it is too costly to try out all LCEs, in the spirit of importance sampling, we try insertion of LCEs based on their spilling factor [see Eq. (79)] in descending order and terminate at the first unsuccessful insertion trial. If at least one LCE is added to z, it means that the system has potentially crossed into an unfamiliar region. At this stage, in one algorithm (FAST), we calculated the exact ab initio energy and forces and included the snapshot Rn+1 in X only if ΔE(Rn+1)>ϵ. Thus, sometimes the exact ab initio data were regarded as redundant and were rejected by the model. This was devised to keep the number of sampled data (and thus the computational cost in future steps) as small as possible. In another algorithm (ULTRAFAST), in order to preempt unwanted ab initio calculations, predictions of the model were used as fake ab initio data, and if the insertion was accepted, they were corrected by exact calculations. This algorithm is similar to the leave-one-out cross-validation method for active learning.104 

We have developed the Python package AutoForce for generating SGPR models and on-the-fly learning of the PES.105 This package is coupled with the atomic simulation environment (ASE),106 through which it can be coupled to various ab initio software packages, including the Vienna Ab initio Simulation Package (VASP),107 which implements the projector augmented wave (PAW)108 approach to DFT with the Perdew–Burke–Ernzerhof (PBE) generalized-gradient approximation (GGA) functional.109 

For reporting the accuracy of ML potentials, we use root mean squared error (RMSE) and coefficient of determination, which is denoted as R2 in statistics and is defined by
(81)
where {fi} are the ab initio forces, f¯ is their average, and {f̃i} are the ML forces. R2 indicates the fraction of explained variance.
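The two reported accuracy metrics are one-liners; the sketch below (function name ours) computes both for a toy force array and illustrates the two reference points of R2: a perfect model gives 1, and a model that only predicts the mean force explains no variance and gives 0.

```python
import numpy as np

def rmse_and_r2(f_ref, f_ml):
    """RMSE and coefficient of determination R^2 [Eq. (81)] for forces:
    R^2 is the fraction of the ab initio force variance explained."""
    rmse = np.sqrt(np.mean((f_ref - f_ml) ** 2))
    ss_res = np.sum((f_ref - f_ml) ** 2)
    ss_tot = np.sum((f_ref - f_ref.mean()) ** 2)
    return rmse, 1.0 - ss_res / ss_tot

f = np.array([1.0, -2.0, 0.5, 3.0])           # toy ab initio forces
rmse_perfect, r2_perfect = rmse_and_r2(f, f)  # perfect model
rmse_mean, r2_mean = rmse_and_r2(f, np.full(4, f.mean()))  # mean model
```

In practice f_ref and f_ml would be the flattened Cartesian force components over all test snapshots.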

For a benchmark simulation, we created a SGPR potential for bulk Si, since it has been studied by several other ML methods.36,44,47,91,101 A cubic cell with 64 atoms was chosen with a 2×2×2 k-point grid for the DFT calculations. The calculations are non-spin-polarized, and a kinetic energy cutoff of 500 eV is applied. The potential is generated by on-the-fly sampling with MD simulations at 300 K (for 20 ps) and 1000 K (for 30 ps). Using the ULTRAFAST algorithm, only 13 DFT calculations were performed for data generation and 36 LCEs were sampled as the inducing set. This entire simulation took less than 1 h. A total of 100 snapshots were chosen from the trajectories for testing, which resulted in RMSE = 0.07 eV/Å and R2=0.994 for the forces. The phonon spectra with a 5×5×5 supercell are shown in Fig. 5. This performance is better than all previous ML methods, which used hundreds to thousands of DFT data for generating a potential. The closest competitor is on-the-fly active learning,44 which reached an RMSE of 0.08 eV/Å with 133 DFT calculations for 10 ps of MD at 620 K. The SGPR algorithm, although it used the much more expensive smooth overlap of atomic positions (SOAP) descriptor and smaller computational resources, was found to be more than ∼300 times faster than the full GPR algorithm.

FIG. 5.

Phonon spectra of bulk Si with a SGPR potential generated using only 13 DFT data and 36 inducing LCEs. Figure taken from Ref. 54. Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.


The RMSE of an ML potential usually increases with temperature because of larger forces and more disorder. In the Si simulation above, the RMSE increases from 0.03 to 0.07 eV/Å as the temperature rises from 300 to 700 K. However, at higher temperatures, the larger RMSE can be tolerated because of random forces due to thermal fluctuations. In Ref. 36, a silicon melt simulation with NN errors well above 0.2 eV/Å accurately reproduced the DFT radial distribution function. A minimal requirement is that the distribution of force errors should be Gaussian centered at 0. As such, they act as random forces due to thermal coupling with a heat bath and do not significantly alter the thermodynamics.
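The "errors act like thermal noise" requirement can be checked with simple statistics. The sketch below is our own minimal diagnostic (names and tolerances are assumptions, not a published test): it flags force errors as benign when their mean is negligible relative to their spread and they are uncorrelated with the reference forces, since a correlated or offset error would bias the dynamics rather than average out.

```python
import numpy as np

def errors_look_thermal(f_ref, f_ml, tol=0.05):
    """Minimal sanity check that force errors act like random thermal
    noise: near-zero mean relative to their spread, and no correlation
    with the reference forces."""
    err = np.asarray(f_ml) - np.asarray(f_ref)
    unbiased = abs(err.mean()) < tol * err.std()
    uncorrelated = abs(np.corrcoef(f_ref, err)[0, 1]) < tol
    return bool(unbiased and uncorrelated)

rng = np.random.default_rng(6)
f_ref = rng.normal(scale=1.0, size=20000)                  # "DFT" forces
f_noisy = f_ref + rng.normal(scale=0.07, size=20000)       # Gaussian error
f_biased = f_ref + 0.05 + rng.normal(scale=0.01, size=20000)  # offset
ok_noisy = errors_look_thermal(f_ref, f_noisy)     # passes
ok_biased = errors_look_thermal(f_ref, f_biased)   # fails: systematic bias
```

A full check would also inspect the error histogram for Gaussianity, as the text requires.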

When we consider materials with various compositions but with the same elements, an "expert" model can be generated for each composition based on the SGPR algorithm. Then, a general expert model can be obtained for a material system having diverse compositions. As an example, let us consider the Li–P–S materials system (Lix–Py–Sz) having four different compositions (Li3PS4, Li7PS6, Li48P16S61, Li7P3S11), which are composed of the same set of elements Li, P, and S. Then, an expert model is generated for each material. By combining these four expert models, one universal expert model can be generated. Three approaches could be utilized:

  • Product of experts (POE): Average of all of the expert models is calculated for making predictions.

  • Merging of experts (MOE): The data and inducing LCEs of all expert models are combined by importance sampling using the geometric criterion.

  • Fusion of experts (FOE): The data and inducing LCEs of all expert models are fully utilized, without importance sampling.

A SGPR model is generated for each material of a certain composition. We may refer to such a model as a local or expert model. Expert model i is obtained from its mi inducing descriptors and ni data that are sampled on-the-fly with MD. Then, the covariance matrices Kmimi and Knimi are built, which yield the weights wi needed for predictions [see Eq. (77)]. Each expert model can be viewed as a block of the universal SGPR model, which is defined by the global covariance matrices KMM and KNM (with M = Σimi and N = Σini):
(82)
and
(83)
where Kmimj and Knimj (for i ≠ j) are inter-expert cross covariance matrices. Inserting these matrices into Eq. (77) with Y = [Y1, Y2, …]T results in the global weights W. Such a model would be able to describe all the materials with the same elements but different compositions simultaneously and would therefore be more transferable. However, building such a universal model is computationally difficult and, even if possible, would be too slow for predictions. The assumption that the off-diagonal blocks are zero results in W = [w1, w2, …]T, and the prediction of the universal model reduces to simple averaging over expert models. This approximation is referred to as POE. POE becomes reasonable when the expert models are built for completely dissimilar materials. A better approach is to use the expert models as a library. Given a new material x that is not available in the library, we can choose only a subset of expert models, which are similar to x. Algebraically, if KMM is built from a subset of experts and
(84)
then the chosen subset is a sufficient model. We refer to such a combination as FOE. If such a subset cannot be constructed, then the expert model for x should be added to the library. Since similar inducing descriptors may be repeated in several expert models, a FOE can be further compressed to increase efficiency. This compressed model is referred to as MOE. MOE can be obtained by importance sampling of the FOE data and inducing descriptors using a threshold criterion. Here, the adaptive sampling algorithm described in Ref. 54 is used. In simple terms, POE is obtained by neglecting the off-diagonal blocks (cross terms) of the global covariance matrix, MOE by importance sampling of the rows/columns, and FOE by forceful inclusion of all elements.
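The block structure of Eqs. (82) and (83) can be made concrete with a toy assembly routine (RBF kernel and random descriptors as stand-ins; the function names are ours). The POE approximation corresponds to zeroing the inter-expert blocks, while a FOE-style global matrix keeps them.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def global_K(experts, cross=True):
    """Assemble the global K_MM from per-expert inducing sets
    [Eq. (82)].  cross=True fills the inter-expert blocks (FOE-style
    full model); cross=False zeroes them (the POE approximation)."""
    sizes = [len(z) for z in experts]
    K = np.zeros((sum(sizes), sum(sizes)))
    off = np.cumsum([0] + sizes)
    for i, zi in enumerate(experts):
        for j, zj in enumerate(experts):
            if i == j or cross:
                K[off[i]:off[i + 1], off[j]:off[j + 1]] = rbf(zi, zj)
    return K

rng = np.random.default_rng(7)
z1 = rng.normal(size=(4, 2))          # inducing descriptors, expert 1
z2 = rng.normal(size=(3, 2)) + 0.5    # expert 2, overlapping chemistry
K_foe = global_K([z1, z2], cross=True)
K_poe = global_K([z1, z2], cross=False)
# POE discards exactly the off-diagonal (inter-expert) covariance
```

When the experts describe overlapping chemistries, those discarded blocks are far from zero, which is why POE degrades in Table I while FOE does not.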

As an example, here we provide the coefficients of determination R2 for the above combinatorial models, i.e., the aforementioned four different crystals of the ternary (Li–P–S) solid electrolyte with the same set of three elements, as listed in Table I. As expected, POE is poorer than the expert models. It is included here because in most non-Bayesian ML methods, averaging over the expert models is the only available mechanism for combining ML models, which is equivalent to POE in Bayesian methods. The current study provides a quantitative comparison. Thus, although POE can provide a rough ML-IAP without any further ML procedures, it does not generalize well; ML models trained for different domains cannot simply be averaged. MOE is as good as the expert models, even though its size (data/inducing) = (102/573) is much smaller than the combined size of the experts (181/1418).63 FOE has the same size as the combined experts but is better than the experts in almost every case. Thus, FOE could be used when affordable; otherwise, MOE offers a scalable solution that can handle more complex systems much faster.

TABLE I.

R2 of expert models and their combinations for Li–P–S and Li–Sb–S systems. Adapted from Ref. 63. Reproduced with permission from Hajibabaei and Kim, J. Phys. Chem. Lett. 12, 8115 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

Li–P–S Expert POE MOE FOE
Li3PS4  0.972  0.962  0.973  0.980 
Li7PS6  0.972  0.813  0.966  0.978 
Li48P16S61  0.953  0.697  0.952  0.969 
Li7P3S11  0.958  0.864  0.955  0.969 

Similar combinations can also be made for another ternary system, such as the Li–Ge–S solid electrolyte, by training a model sequentially in all domains. Then, as a more interesting and useful example, by combining the two expert models for the two ternary systems Li–P–S and Li–Ge–S, it is possible to generate the expert model for the quaternary Li–P–Ge–S solid electrolyte, once the cross term between P and Ge, i.e., the expert model for the P–Ge system, is properly provided. If there is no direct contact between these two elements, the cross term could be reliable enough after adding only the long-range interactions between the two elements. As an approximation, such a small cross-term effect could even be ignored within a certain error range. However, if the two elements are directly in contact, the cross term needs to be taken into account. In the case of MOE, the cross term between the two elements can be calculated more easily and simply than in the three-element case. Overall, the MOE approach can be utilized with a minimal loss of accuracy in energy. Thus, the combinatorial approach utilizing MOE opens the door for strategies like "high-level parallelism" or "a library of SGPR potentials," which can be combined by simple cross-covariance calculations to describe increasingly complex materials. Such a combinatorial approach is not trivial with other ML methods.

Indeed, the SGPR models for Li3PS4 and Li4GeS4 were simply combined and tested on Li10Ge(PS6)2 (both configurations in Table II). Due to the presence of the different species P and Ge, the combinatorial model cannot be significantly compressed, and thus MOE and FOE are equivalent. The combined model of the two ternary crystals described the forces of the quaternary crystal with R2=0.961, which is very close to the R2 of the expert model. This is understandable in view of the similarity of the LCEs for the cations P and Ge in these crystals (see Fig. 6). In many promising solid electrolytes, they are present as (XS4)q− poly-anionic moieties with X = P, Ge, etc., in the sea of Li+ ions and do not directly interact with each other. By combining the ML models for the simpler crystals, we can simulate the composite ones with minimal or even no additional training.

TABLE II.

22 fastest Li-conducting ternary solid electrolytes obtained after filtering about 300 candidates. Diffusivities are obtained with MLMD in the NPT ensemble. The first column is the ID of the crystal in the Materials Project111 repository. The last column σRT is the extrapolated ionic conductivity at 300 K. The quaternary Li10Ge(PS6)2 is appended as a special case.63 Adapted from Ref. 63. Reprinted with permission from Hajibabaei and Kim, J. Phys. Chem. Lett. 12, 8115 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

mp-ID Formula ML Diffusion coefficient (Å2/ps) at given temperature (K) Ea σRT
R2 400 500 600 700 800 900 1000 1200 (eV) (mS/cm)
mp-1211324  Li7PS6  0.972      0.15    0.29    0.54  0.83  0.18  65.2 
mp-1097036  Li3PS4  0.972    0.068  0.14  0.28  0.46        0.23  20.2 
mp-641703  Li7P3S11  0.958      0.12  0.2  0.32    0.61    0.21  20.1 
mp-768197  Li5SbS4  0.934      0.1  0.18  0.31        0.23  18.4 
mp-1147621  Li3InBr6  0.924  0.018  0.057  0.13            0.2  16.1 
mp-28327  LiGaBr3  0.903    0.049  0.11            0.21  7.6 
mp-756427  Li5SbS4  0.933        0.086  0.19  0.26  0.37    0.26  6.0 
mp-685863  Li4SiO4  0.958          0.093  0.2  0.32  0.53  0.3  3.1 
mp-1195718  Li4SnS4  0.973          0.18  0.32  0.37  0.74  0.29  2.3 
mp-766409  Li3BiS3  0.946      0.034  0.077  0.12    0.29    0.27  2.1 
mp-29009  Li2MgBr4  0.944      0.045  0.096  0.18        0.28  1.4 
mp-38684  Li2MgCl4  0.958      0.047  0.095  0.21    0.46    0.3  1.3 
mp-760415  Li3SbS4  0.966          0.2    0.57  0.98  0.33  0.83 
mp-1194700  Li4SnSe4  0.962          0.15  0.3  0.41  0.8  0.33  0.6 
mp-1222582  Li4GeS4  0.97          0.14  0.31  0.4  0.93  0.38  0.24 
mp-1001069  Li48P16S61  0.953      0.027  0.083  0.17        0.38  0.22 
mp-1177520  Li3SbS3  0.929      0.018  0.058  0.18    0.39    0.38  0.2 
mp-775806  Li3SbS3  0.944          0.084  0.16  0.27  0.49  0.36  0.19 
mp-38487  Li7BiO6  0.987          0.041  0.079  0.12  0.25  0.37  0.17 
mp-753720  Li3BiS3  0.939          0.08  0.14  0.21  0.51  0.39  0.079 
mp-29008  Li6MgBr8  0.965      0.019  0.052  0.13  0.23      0.41  0.077 
mp-757150  Li5BiS4  0.929      0.02  0.12  0.2  0.41      0.46  0.049 
mp-696128  Li10Ge(PS6)2  0.967      0.16    0.32    0.55  0.76  0.17  84.2 
mp-696138  Li10Ge(PS6)2  0.97      0.14    0.33    0.55  0.85  0.19  46.6 
FIG. 6.

In all of the above SEs, XS4 poly-anionic moieties are present with X = P or Ge. Figure taken from Ref. 63. Reproduced with permission from Hajibabaei and Kim, J. Phys. Chem. Lett. 12, 8115 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.


In summary, extensive MD simulations can be carried out with on-the-fly ML for the potential energy using the SGPR algorithm. The models generated with kernel-based methods such as SGPR can be easily combined for more generality and transferability. Therefore, using the previously trained models in the database, it is possible to model new crystals without further training. This approach was demonstrated by creating universal potentials for two ternary systems. Additionally, combinations of two ternary crystals resulted in a realistically accurate potential for a quaternary crystal. With the introduced algorithm, the models generated in this study can be reused, with limited or no training at all, to describe more complex systems (such as quaternary electrolytes) by a combinatorial approach.

Recently, universal potentials have been widely developed using graph-based models trained on large databases like the Materials Project.1,17,40,95–100 These models aim to cover all elements in the periodic table and handle various configurations, expanding the term "universal potential" to encompass not only elements but also configurational spaces. While the current review focuses mainly on compositional universality across elements, nothing in the model formulation limits its applicability to different configurations. The uncertainty threshold of the model can be adjusted to accommodate a broader training configuration space, thereby enhancing its predictive capabilities across diverse materials. The term "universal potential" was first used to refer to extended subsystems within typical machine learning potentials (MLPs), which were tailored for specific materials with fixed compositions.34,63 Kernel methods are able to address a wide range of material systems, regardless of differences in phase or stoichiometry. These subsystems can be combined into supersystems, enabling them to handle even more diverse materials. A built-in library system, stored in memory, allows for the selection of appropriate kernels for specific systems. Once fully constructed, this library can support various material systems by incorporating cross terms between expert systems, calculated from local structure descriptors. These cross terms are then added to the library, making it more general and versatile over time. Although the compositionally universal potential is currently somewhat limited, the library is continuously expanding and will eventually provide greater generality and applicability for a wide range of materials.

The SGPR algorithm can have difficulty with a large number m of inducing LCEs, since obtaining the solution w of Eq. (74) involves the inversion of a kernel matrix and scales as O(nm2) for large training datasets X and inducing sets z of LCEs. To mitigate this challenge, the Bayesian committee machine (BCM)110 is employed to combine SGPR models, each trained independently on a different subset as a local expert (Fig. 7). When building MLPs of ice polymorphs, rather than utilizing all ice phases as a single training dataset, each ice phase is assigned its own training subset. Each local expert model is then trained on its corresponding subset. These individually trained local expert SGPR models are combined into a universal BCM MLP. This approach not only simplifies the training process but also overcomes the computational challenges associated with inverting large kernel matrices, enabling an efficient and accurate universal MLP. The BCM combines the predictions from P distinct local expert models, each trained on a different data subset, which reduces the computational scaling to O(nm2/P3) for training.

FIG. 7.

Schematic of active sparse Bayesian committee machine (BCM) potential. (a) The entire dataset is divided into P distinct subsets. Within each subset, a SGPR MLP is trained. Subsequently, a combined version of MLP for the entire dataset is built through the BCM algorithm. The weights βα/σα2 ensure that the predictions from confident models are more outspoken than those from less confident models. (b) Illustration of active BCM algorithm. Each local expert SGPR model undergoes training up to a predefined kernel size. Before surpassing the kernel size limit, our algorithm initiates training for the next SGPR model. This process is sequentially executed, resulting in the assembly of P local SGPR MLPs, BCM(1:P), forming a Bayesian committee. Throughout the MD simulation, this ensemble of P committee MLPs predicts the energy, forces, and virial tensor of the system. Figure taken from Ref. 112. Reproduced with permission from Willow et al., Phys. Chem. Chem. Phys. 26, 22073–22082 (2024). Copyright 2024 Royal Society of Chemistry.

For given subsets of the training dataset {n1, …, nP} and inducing LCEs {m1, …, mP}, the BCM110 potential energy is
(85)
where the potential energy Eα of the α-th SGPR local expert (with data Rnα and inducing set zα = {χj}j∈α) is given by
(86)
where wα are the training weights of the α-th local SGPR model.
The forces fiμ and internal virial Wμν of the BCM potential are obtained from
(87)
(88)
where fi,αμ and Wαμν represent the forces and the internal virial of the α-th SGPR local expert, respectively, which are given by
The maximum of the covariance loss over the α-th subset of LCEs, denoted as σα2, is used to weight the α-th committee prediction. σα2 is estimated by
(89)
where K*m is the covariance matrix between a test configuration R and the α-th inducing set zα. Furthermore, we weight the prediction with βα following the concept of the robust BCM, where the individual committee prediction is weighted by the differential entropy, βα = −log(σα2). In the BCM, the local expert with the highest similarity to a test configuration R contributes the most to the energy, force, and pressure predictions. The final normalization factor Ŝ is thus given as
Ŝ = Σ_{α=1}^{P} β_α/σ_α².  (90)

The key advantage of using the SGPR-based BCM is that it not only provides sparsification but also allows us to invert approximate (m/P)-dimensional matrices rather than a full m-dimensional one (m = Σ_{α=1}^{P} m_α). This reduces the computational cost of the matrix inversion from O(m³) to O(m³/P³). Compared to the "product of experts" (POE), where the predicted potential energy is given by E = (1/P) Σ_{α=1}^{P} E_α, the BCM employs a weighting factor β_α/σ_α² to ensure that models with higher confidence contribute more significantly to the predicted energy than less confident models.
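The weighted committee combination described above can be sketched in a few lines. This is a minimal illustration, assuming the robust-BCM differential-entropy weight β_α = ½(log σ_prior² − log σ_α²) with a hypothetical prior variance; the source's exact definitions follow Eqs. (85)–(90).

```python
import math

def bcm_combine(energies, variances, prior_var=1.0):
    """Combine local-expert energy predictions with robust-BCM-style weights.

    Each expert alpha contributes with weight beta_alpha / sigma_alpha^2,
    so confident experts (small variance) dominate the prediction.
    Note: an expert with variance equal to prior_var gets zero weight.
    """
    weights = []
    for var in variances:
        # differential-entropy weight (assumed robust-BCM convention)
        beta = 0.5 * (math.log(prior_var) - math.log(var))
        weights.append(beta / var)
    s_hat = sum(weights)  # normalization factor S-hat, cf. Eq. (90)
    return sum(w * e for w, e in zip(weights, energies)) / s_hat
```

With two experts of very different confidence, the prediction is pulled toward the confident one, e.g. `bcm_combine([1.0, 2.0], [0.01, 0.9])` is close to 1.0.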

It has been validated that the BCM potential accurately estimates the melting temperatures of ice II, III, and V under pressure conditions of 0.2–0.6 GPa.112 To achieve this, on-the-fly active learning SGPR NpT MD simulations were performed to create a BCM MLP for each coexisting system: ice II–water, ice III–water, and ice V–water. Three separately trained SGPR local expert MLP models were combined into one sparse robust BCM potential, which was used for NpH MD simulations of the coexisting ice–liquid systems. Non-active NpH MD simulations were performed to determine the melting temperatures of ice II, III, and V. During these MD simulations, the conservation of enthalpy (H) was ensured while monitoring fluctuations in the total energy, which includes both potential and kinetic energy.

Figure 8 illustrates the melting points of ice II, III, and V as determined from our NpH MD simulations using the MB-pol water model. The ice–water phase diagram predicted with the MB-pol potential closely aligns with that obtained from enhanced-coexistence simulations, as reported by Bore and Paesani.113 The BCM MLP stabilizes ice V over ice II, consistent with the phase diagram predicted by Zhang et al.114 However, using a sophisticated deep NN potential, Bore and Paesani113 observed that ice V became a metastable phase, absent from the phase diagram. This discrepancy is attributed to the failure to incorporate high-model-uncertainty configurations of the water–ice system in the training dataset. In contrast, the active BCM potential adeptly learns model uncertainties on-the-fly, enhancing model accuracy and thereby stabilizing ice V in the phase diagram (Fig. 8). However, the melting points predicted by the BCM MLP exhibit a slight upward shift in temperature relative to the benchmark MB-pol potential (Fig. 8). This deviation primarily stems from the noise present in the energy and force predictions of the BCM potential. Notwithstanding the shifts in melting temperatures, we emphasize the BCM potential's capability to accurately replicate the physical and thermodynamic properties of the reference MB-pol water model. While the discrepancies in melting temperatures suggest room for further improvement in model accuracy, the significance lies in the BCM potential's reproduction of the reference phase diagram. This affirms the BCM potential's utility in capturing the essence of water's behavior under various conditions while reducing the computational scaling of training SGPR MLPs for large datasets.

FIG. 8.

Ice–water phase diagram. Ice–water melting points are calculated through NpH MD simulations of coexisting ice–liquid systems using the MB-pol (circle lines) and BCM (star dashed lines). Solid black line represents the experimental melting line.117 Figure taken from Ref. 112. Reproduced with permission from Willow et al., Phys. Chem. Chem. Phys. 26, 22073–22082 (2024). Copyright 2024 Royal Society of Chemistry.


Figure 9 compares GNNs and SGPR during training and testing. The Materials Project Trajectory (MPTrj)97 serves as an example of a large training dataset, containing about 1 500 000 compounds. The NNs95–98 used the entire dataset directly, employing an embedding layer, interaction layers, and a readout layer to predict output values such as energy, with optimization. In contrast, SGPR organizes the data into subsets based on the similarity of LCEs, creating the local expert models. Each local expert is built from its unique LCEs, which serve as an inducing dataset and a solution (w) to predict energy. These local expert models form the basis for built-in SGPR kernels. To estimate the potential energy of a test configuration x* ({ρ_i*}), NNs rely on the trained parameters, but SGPR selects local experts (h, j, k) and quantifies uncertainty based on the similarity measure k(ρ_i*, ρ_j^g) between the LCEs (ρ_i*) of the test configuration and the existing inducing LCEs {ρ_j^g}_{j=1}^{m}.

FIG. 9.

Comparison between GNNs and SGPR during training and testing. Multiple stacked layers symbolize a large dataset covering materials such as ice/water, batteries, perovskites, zeolites, metal–organic frameworks (MOFs), molten salts, and heterogeneous catalysis. Each stack corresponds to configurations with the same atomic numbers. The GNN model consists of an embedding layer, interaction layers, and a readout layer for predicting output values. During training, node features for all atomic numbers are initialized through the embedding layer and updated via interaction layers based on the edge information. The readout layer then converts the updated node features into the output values, in particular energy values. All model weights are optimized using the Adam optimizer. During testing, the trained model predicts the energy of a test configuration, labeled as x* in a green box. For SGPR, during training, configurations in the dataset are classified into subsets based on their similarity. Yellow boxes labeled "expert" refer to SGPR local expert models, each of which includes an inducing set (z) and a solution (w), as described in Eqs. (74) and (77). In testing, the predicted energy value E* is obtained from the combination of the experts' kernels (h, j, k) using the similarity k(ρ_i*, ρ_j^g) between the LCEs ρ_i* in x* and those ρ_j^g in the inducing sets z. To build one universal expert model by combining these local experts, we use POE, MOE, or FOE as described in Sec. II O, as well as BCM as described in Sec. II P.


For uncertainty quantification (UQ), SGPR estimates uncertainty from the similarity measure k(ρ_i*, ρ_j^g) between the LCEs of the test configuration and the existing inducing LCEs, providing UQ values [Eqs. (79) and (89)] even in regions far from the training data. NNs, on the other hand, face difficulties in producing robust UQ estimates.115,116 Methods such as Bayesian NNs, including Bayes by Backprop,118,119 Monte Carlo dropout,120,121 and deep ensembles,122 have been proposed to estimate uncertainties in the predicted total energy, but not for local energy predictions. Other UQ approaches use feature space,123 latent space,124 or a GP-based kernel on latent space,125 with the Euclidean norm distance in latent features showing some success. Recently, UQ has been improved by analyzing variations in predicted forces rather than predicted energies,126 although such force-based estimates remain less reliable in data-sparse regions. In this regard, SGPR stands out for its principled approach, providing closed-form uncertainty estimates, data-dependent variance, and the flexibility of a non-parametric model, which adapts its complexity to the data.

As for scalability, GNNs are able to handle large datasets and many atoms efficiently through local message-passing mechanisms. However, significant disadvantages of GNN models arise from their large training-data requirements, limited UQ, poor interpretability, and computational complexity during training. GNNs typically require large datasets to reach optimal performance. If training data are limited, they may not perform as well as GPR-based models, which are better at handling small data regimes. The need for extensive data can be a drawback in situations where high-quality labeled data (e.g., quantum mechanical calculations) are expensive or time-consuming to generate. GNNs are generally deterministic models that do not provide UQ, which makes them less useful for tasks where knowing the confidence in a prediction is crucial. GNNs are often seen as black-box models, making it difficult to interpret why certain predictions are made; this lack of transparency can be a concern in applications where interpretability is important. The underlying physics captured by the model is also harder to analyze than in GPR, where the kernels can often be mapped to physical processes. Although GNNs are scalable during inference, training them can still be computationally expensive, especially for equivariant models, such as NequIP and MACE, that require careful enforcement of rotational symmetries and use large datasets.

GPR models typically rely on SOAP descriptors, which become computationally intensive as the number of elements increases, because the descriptor must account for the diverse chemical environments and interactions of each additional element. The computational cost of GPR thus grows significantly as more atomic species are involved, making it less scalable for systems with many different elements. GNN models, on the other hand, use graph-based embeddings that capture structural information about the molecules without scaling as severely with the number of elements, and they generalize molecular structures through learned embeddings, making them more flexible and efficient for large systems.

On the other hand, GPR is a non-parametric Bayesian approach used for regression tasks. SGPR is a variant of GPR that reduces the computational cost by selecting a subset of inducing points while retaining a natural way to quantify uncertainty in the predictions. This is useful in materials discovery or simulations where understanding the reliability of predictions is critical. The Bayesian nature of SGPR allows for the prediction of both mean and variance, which is a unique strength over many GNNs. SGPR models tend to perform well with relatively small and moderate datasets, as they rely on a kernel-based approach that implicitly embeds prior knowledge about smoothness in the data. SGPR is particularly useful when training data are scarce or expensive to obtain. GPR models are highly interpretable, as the choice of kernels can be directly related to physical principles, and the model predictions can be understood as weighted averages of the training data. The ability to choose specific kernel functions provides flexibility to capture particular characteristics of atomic interactions. The predictions of GPR are generally smooth and conservative, which can be beneficial in applications where physical smoothness constraints are essential, as for energy landscapes. In contrast, GPR has disadvantages. As the number of atoms and interactions increases, even sparse approximations become computationally expensive for very large systems, and GPR relies heavily on the choice of kernel function to model interactions. To mitigate the scaling issue of GPR, feature compression, sparse representations, and partitioning strategies can be exploited. Techniques such as dimensionality reduction on SOAP descriptors (e.g., principal component analysis) can help reduce the complexity of the feature space without losing essential chemical information. Using a sparser representation of the SOAP descriptor can lessen the computational load by focusing on the most relevant atomic environments. Breaking down the chemical space into smaller regions (e.g., clustering molecules based on their element types) may allow the model to focus on subsets of elements at a time.
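The sparsification strategy of focusing on the most relevant atomic environments can be illustrated with a simple greedy farthest-point selection of representative descriptors. This is a hedged sketch, not the selection rule used in the source; `farthest_point_sample` is a hypothetical helper.

```python
def farthest_point_sample(descriptors, m):
    """Greedily pick m representative descriptor vectors (a candidate inducing set).

    Points are chosen to be maximally spread out in descriptor space, so a
    few environments approximately cover the sampled chemical space.
    """
    def dist2(a, b):
        # squared Euclidean distance between two descriptor vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [0]  # start from the first environment (arbitrary seed)
    while len(chosen) < m:
        best, best_d = None, -1.0
        for i in range(len(descriptors)):
            if i in chosen:
                continue
            # distance of candidate i to the nearest already-chosen point
            d = min(dist2(descriptors[i], descriptors[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen
```

For example, among the 2D points `[[0, 0], [0.1, 0], [5, 5], [10, 0]]`, selecting three representatives skips the near-duplicate `[0.1, 0]`.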

Overall, GNNs excel in scalability, can model complex interactions, and are better suited for large datasets, but they require more data and computational resources and lack built-in uncertainty quantification. SGPR is an excellent choice for small datasets, interpretable models, and applications where uncertainty quantification is critical. While GPR struggles with scalability and expressiveness in complex systems, SGPR mitigates such issues; furthermore, the built-in-library SGPR potentials/force fields would maintain the accuracy of complex materials model systems without significant loss of computational speed, enhance the power to predict the structure, energetics, and properties of unknown novel materials, and aid in understanding the underlying physical concepts.

There is no definitive answer for the exact maximum number of elements SGPR can handle, as it depends on the specific implementation and the available computational resources. However, most SGPR models may struggle beyond 6–10 distinct elements, especially when high accuracy and rich descriptors are required. Meanwhile, GNNs are known for their strong extrapolation capabilities due to the generality of their learned embeddings. In contrast, GPR-based models are generally more interpolative, meaning they excel at making predictions within the bounds of the data they were trained on. While GPR models show limitations in scalability and extrapolation, particularly when the number of elements grows, GNN models provide a more robust framework for handling diverse chemical systems. In particular, SGPR-based models offer significant advantages in small to moderate-sized, well-defined systems where interpretability and fine-grained control over descriptors like SOAP are essential. Overall, the choice between SGPR and GNN-based methods depends on the specific application needs, data size, and computational resources: SGPR is preferred for smaller datasets requiring interpretable models and theoretically grounded uncertainty estimates, while GNNs are favored for large datasets and scalable predictions, although they often rely on approximations or ensembles that may be less reliable in data-sparse regions. In this regard, scalable built-in-library SGPR could be a useful approach for moderately larger datasets. Future research may explore ways to combine the strengths of both SGPR and GNN approaches for broader applicability, particularly for larger datasets in materials science.

In Sec. II, we discussed ML representations of the ab initio PES. Two of the most important branches are NNs36 and GAPs.47 NNs are computationally efficient because they scale as O(n) with the size of the data. However, since they contain many hyper-parameters, they are more likely to over-fit the training data and not generalize well; therefore, NNs are often used when large datasets are available. On the other hand, Gaussian process based methods are almost non-parametric: only a few hyper-parameters are present in the covariance kernel. They work well even with small datasets, but their computational cost grows fast with the size of the data. The cost of building the covariance matrix scales as O(n²) and that of its inversion as O(n³). Therefore, approximate inference methods for GPR, which transform these unfavorable cost scalings to O(n), are crucial. The BLR method43,91 is one possible solution; however, a better approach than linear regression is sparse approximation.

Another important issue is on-the-fly modeling and active learning43,44,91,101 where the data required for building the model is generated alongside molecular dynamics (MD) simulations. For this purpose, Gaussian process based regression methods are much more suitable than NN because the predicted variance can be used as a proxy for on-the-fly accuracy of the model and a criterion for data generation by ab initio calculations. Although the uncertainty can be emulated with NN by training identical networks with different initial random seeds, updating an existing Gaussian process model is much easier.

Currently, Gaussian process based active learning algorithms use the predictive variance [see Eq. (40)] as a criterion for on-the-fly data generation. A common objective of these methods is to show that the predictive variance correlates well with the actual ML errors. However, we noticed that the predictive variance is blind to the energy and forces data; we therefore devised a new criterion that includes these data as well. As a result, during on-the-fly sampling, the model is generated with much smaller data while maintaining or improving the accuracy, which dramatically increases the efficiency. This is accomplished by a low-rank approximation of the full covariance matrix in normal GPR. Using SGPR, the computational cost is effectively reduced to that of linear regression methods while maintaining the accuracy. In addition, we discussed a novel adaptive sampling algorithm. The excellent performance of this formalism (SGPR + adaptive sampling) is demonstrated with benchmark simulations: applied to molecular dynamics simulations of bulk Si, the SGPR potential, generated using only 13 ab initio calculations, reproduced phonon spectra almost identical to DFT. The combinatorial approach of generating a more universal expert system dealing with more elements from several expert systems dealing with fewer elements opens the door to strategies like "high-level parallelism" or "a library of SGPR potentials," which can be combined by simple cross-covariance calculations (based on the MOE approach) to describe increasingly complex materials. Therefore, this approach can be useful for modeling wide-ranging composite materials for tailoring performance. Since the computational cost of SGPR is proportional to the number of atoms N in the system, much larger systems can be efficiently simulated compared with DFT, where the computational cost scales as N³.
This allows us to study large and complex systems.

The kernel regression methods in Sec. II assumed that a latent energy function E(d_i) exists for the contribution of atom i to the potential energy of the system, where d_i = D(ρ_i) is the descriptor for the local chemical environment of atom i. Then, using an appropriate covariance kernel K(d_i, d_j) = ⟨E(d_i) E(d_j)⟩, representations based on linear regression or Gaussian process regression are found for E using a set of ab initio data. Here we discuss the kernels and descriptors in more detail.

In all of the previous derivations, we assumed that the latent energy function (LEF) is related to the potential energy by Eq. (3), where every atom i has its own descriptor di. Although this can be applied everywhere, it is not the only possible approach. For instance, a group of atoms can be described with a single descriptor and the associated LEF. This is specially useful for molecular systems where coordinates of a group of atoms are strongly correlated. In such cases, the generalized expression in Eq. (1) can be assumed. The regression equations can still be applied with simple adjustments: ρi represents the translationally invariant degrees of freedom in the group of atoms indexed by i.

Ideally, the descriptor should be invariant with respect to the symmetry operations that do not change the potential energy. The symmetries that we are most concerned with are translations, rotations, and permutations of identical atoms. In some cases, certain symmetries can be ignored in the descriptor design and enforced manually. For instance, in a system with strong chemical bonds in which permutations of atoms never occur, one can ignore this symmetry. Instead, the indices chosen for the atoms should be kept fixed.

In designing the descriptors for LCEs, since ρ_i defined by Eq. (2) explicitly depends on the cutoff radius r_c, one must make sure that the descriptor function D(ρ_i) is continuous and differentiable with respect to the coordinates of the atoms in the vicinity of r_c:
(91)
The above behavior can be enforced by appropriate inclusion of smooth cutoff functions in the descriptor. The simplest choice for the cutoff function is
(92)
Another commonly used cutoff function is36

Θ(r) = (1/2)[cos(πr/r_c) + 1] for r ≤ r_c, and Θ(r) = 0 for r > r_c.  (93)
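The continuity and differentiability requirement at r_c can be checked numerically. A minimal sketch using the cosine-type cutoff, whose value and slope both vanish at r_c; `cutoff_cos` and `numderiv` are illustrative helpers, not names from the source.

```python
import math

def cutoff_cos(r, rc):
    """Cosine cutoff: (1/2)[cos(pi r / rc) + 1] inside rc, zero outside.

    Both the value and the first derivative go smoothly to zero at rc.
    """
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def numderiv(f, x, h=1e-6):
    """Central finite-difference derivative, used to verify smoothness."""
    return (f(x + h) - f(x - h)) / (2 * h)
```

Evaluating the numerical derivative just below r_c gives a value near zero, while at r = r_c/2 it matches the analytic slope of −π/(2 r_c).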
The LEF can be split into several parts, for instance, to two-body (2b), three-body (3b), and many-body (mb), where each has its own descriptor. Formally,
(94)
(95)
where α represents the index of different types of descriptors. Then for the covariance kernel, we can write
(96)
With the assumption that ⟨E^α(d_i^α) E^β(d_j^β)⟩ is zero unless α = β, the kernels become additive
(97)
(98)
In describing the LCEs, we are interested in quantities that remain invariant with respect to rotations. Any function of the type f(θ, φ), where θ and φ are the polar and azimuthal angles in spherical coordinates, respectively, can be represented as

f(θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} c_{lm} Y_{lm}(θ, φ),  (99)
where Y_{lm} are the spherical harmonics. Therefore, it suffices to know the transformations of the spherical harmonics under applied rotations. If (θ, φ) → (θ′, φ′) under a rotation R, then

Y_{lm}(θ′, φ′) = Σ_{m′=−l}^{l} D^l_{m′m}(R) Y_{lm′}(θ, φ),  (100)
where D^l_{m′m}(R) are the elements of the Wigner D-matrix, a unitary block-diagonal matrix.
Equation (100) in matrix notation (Y_l is a column vector of length 2l+1, and D^l a square matrix of shape (2l+1)×(2l+1)) becomes
(101)
Considering that (D^l)†D^l = I_l, it is easy to see that Y_l†(θ, φ) Y_l(θ, φ) remains invariant with respect to rotations. Thus, if we define f_l = [c_{l,−l} Y_l^{−l}(θ, φ), …, c_{l,l} Y_l^{l}(θ, φ)]^T, the quantity q_l defined below is invariant under rotations:
(102)
In fact, this is used in defining the bond orientational order parameter, which has been used as a descriptor. More generally, Y_l†(θ_1, φ_1) Y_l(θ_2, φ_2) is invariant with respect to rotations. The following relations will be useful:
(103)
(104)
where C^{lm}_{λμ,αβ} are the Clebsch–Gordan coefficients, and the integration over all rotations can be done using the Euler angles, i.e., ∫dR = ∫_0^{2π} dα ∫_0^{2π} dγ ∫_0^{π} sin β dβ.
Application of the two-body (2b) descriptor is straightforward, since distances are invariant with respect to rotations and permutations of atoms. If the RBF kernel is chosen as the covariance kernel, then the 2b kernel becomes
(105)
where Θ is a smooth cutoff function such as Eq. (92) or (93). For a three-body (3b) descriptor, one has to construct an array that is invariant with respect to the permutation of atoms. A possible choice is48
(106)
where atoms j and k are present in the environment of atom i. The above array is invariant with respect to jk permutation.
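A two-body RBF kernel with smooth cutoffs, in the spirit of Eq. (105), can be sketched as follows; the cutoff radius and length scale are illustrative assumptions, and the environments are represented simply as lists of neighbor distances.

```python
import math

def cutoff(r, rc=6.0):
    """Cosine-type smooth cutoff, cf. Eq. (93); rc = 6.0 is an assumed value."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def k2b(env_a, env_b, ell=0.5):
    """Two-body RBF kernel between two neighbor-distance lists.

    Sums Gaussian similarities over all pairs of distances, each damped
    by the cutoff so far-away neighbors contribute nothing.
    """
    total = 0.0
    for ra in env_a:
        for rb in env_b:
            total += (cutoff(ra) * cutoff(rb)
                      * math.exp(-((ra - rb) ** 2) / (2.0 * ell ** 2)))
    return total
```

The kernel is symmetric in its two arguments, and the self-similarity of a single distance reduces to the squared cutoff value.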
Another descriptor that has been widely used is the Coulomb matrix,127 which is defined by

M_ij = 0.5 Z_i^{2.4} for i = j, and M_ij = Z_i Z_j Θ(r_ij)/r_ij for i ≠ j,  (107)

where Θ is an optional smooth cutoff function. M_ij is not invariant with respect to permutations of atoms. Therefore, a fingerprint is often generated using the sorted eigenvalues of the Coulomb matrix
(108)
Still, discontinuities may occur if a level crossing happens in the eigenvalues of the Coulomb matrix. This descriptor is often used for studies of molecular systems.50,51
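The eigenvalue fingerprint of the Coulomb matrix can be sketched for a diatomic molecule, where the 2×2 eigenvalues have a closed form; the 0.5 Z^2.4 diagonal follows the standard Coulomb-matrix convention assumed here, and the cutoff is omitted.

```python
import math

def coulomb_matrix(zs, coords):
    """Coulomb matrix, Eq. (107)-style, without the optional cutoff."""
    n = len(zs)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = 0.5 * zs[i] ** 2.4
            else:
                r = math.dist(coords[i], coords[j])
                M[i][j] = zs[i] * zs[j] / r
    return M

def eig2(M):
    """Eigenvalues of a symmetric 2x2 matrix, sorted descending (closed form)."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    t = 0.5 * (a + d)
    s = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return [t + s, t - s]
```

For H2 at unit separation (Z = 1), the matrix is [[0.5, 1], [1, 0.5]] with eigenvalue fingerprint [1.5, −0.5], which is unchanged if the two atom indices are swapped.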
A many-body descriptor for the LCEs, which is often used in neural networks, is the Behler–Parrinello symmetry functions.36 Radial symmetry functions are defined by

G_i^rad = Σ_{j≠i} exp[−η(r_ij − r_s)²] Θ(r_ij),  (109)
and angular symmetry functions by

G_i^ang = 2^{1−ζ} Σ_{j,k≠i} (1 + λ cos θ_ijk)^ζ exp[−η(r_ij² + r_ik² + r_jk²)] Θ(r_ij) Θ(r_ik) Θ(r_jk),  (110)
where

cos θ_ijk = r_ij · r_ik/(r_ij r_ik),  (111)
and ω=(η,rs,λ,ζ) are the free parameters. The symmetry functions with different values of ω are packed into a descriptor vector as
(112)
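The symmetry functions above can be sketched as follows; the cutoff radius and the parameter values ω = (η, r_s, λ, ζ) are illustrative assumptions, and the angular geometry (distances and cosine) is assumed to be precomputed.

```python
import math

RC = 6.0  # cutoff radius (assumed value)

def fc(r, rc=RC):
    """Cosine cutoff in the style of Eq. (93): smooth decay to zero at rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def g_radial(neighbor_dists, eta=1.0, r_s=0.0):
    """Radial symmetry function: Gaussian shells summed over neighbor distances."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * fc(r) for r in neighbor_dists)

def g_angular(triples, eta=0.01, lam=1.0, zeta=1.0):
    """Angular symmetry function over precomputed (r_ij, r_ik, r_jk, cos_theta)."""
    total = 0.0
    for rij, rik, rjk, cos_t in triples:
        total += ((1.0 + lam * cos_t) ** zeta
                  * math.exp(-eta * (rij ** 2 + rik ** 2 + rjk ** 2))
                  * fc(rij) * fc(rik) * fc(rjk))
    return 2.0 ** (1.0 - zeta) * total
```

Evaluating the functions for several ω values and concatenating the results gives the descriptor vector of Eq. (112).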
For a similarity/covariance kernel between two LCEs, the SOAP is defined as128 
(113)
where R is the 3D rotation operator, ϱ_i(r) is the atomic density defined as

ϱ_i(r) = Σ_{j∈i} exp(−α|r − r_ij|²/2),  (114)
and α is a hyper-parameter. Such a kernel is invariant with respect to all rotations
(115)
The above integrals can be carried out analytically, which ignoring a constant multiplier, results in
(116)
where
(117)
r_j = |r_ij|, n̂_j = r_ij/r_j, and ι_l is the modified Bessel function of the first kind (Bessel-i). Unfortunately, these equations are not computationally efficient because the I^l_{mm′} identities depend on both environments. In the study by Bartók et al.,128 this is remedied by expanding the atomic density in an orthonormal radial basis set {g_n}
(118)
which transforms the Imml identities into inner products
(119)
which in turn transforms the kernel into an inner-product
(120)
Here we achieve the same goal by considering the series expansion of the Bessel-i functions ι_l,129
(121)
where a_nl = (α/2)^{2n+l}/[(n+l)! n!]. Using this expansion, we get the same expression for I^l_{mm′} as Eq. (119), with the expansion coefficients already determined:
(122)
For elimination of discontinuities due to the application of a cutoff, we need to modify the descriptor, for instance, by the substitution e^{−αr_j²/2} → Θ(r_j) e^{−αr_j²/2}, where Θ is a smooth cutoff function with a continuous derivative at r_c. The a_nl multipliers decrease extremely fast with increasing n and l; therefore, it suffices to calculate only a few orders of the expansion. We choose α = 1, r_c = 8, and l, n ≤ 3 for reasonable accuracy.
Extension to multi-species environments is straightforward.130 If the coefficients c_nlm are calculated for each atomic species in the LCE, the total descriptor vector is
(123)
where α and β denote the species. Then, the kernel may be generalized to
(124)
where χ_{αβ} is an alchemical similarity kernel between the atomic species in LCEs, and μ_{s_i s_i′} is a kernel that depends on the species of the central atoms i, i′ and quantifies the covariance between local energies. Here we have chosen χ_{αβ} = δ_{αβ} and μ_{s_i s_i′} = δ_{s_i s_i′}. The latter corresponds to a summation of independent Gaussian processes defined for each atomic species. The exponent η controls the sensitivity of the kernel and is fixed at 4 in our work. Finally, the kernel can be normalized by
k̂(ρ_i, ρ_j) = k(ρ_i, ρ_j)/[k(ρ_i, ρ_i) k(ρ_j, ρ_j)]^{1/2}.  (125)

In this section, we discussed combinations of covariance kernels such as two-body, three-body, and many-body. As examples of many-body descriptors, we presented the Coulomb matrix, the Behler–Parrinello symmetry functions, and the smooth overlap of atomic positions (SOAP). We found analytical expressions for the components of the SOAP descriptor using the series expansion of the Bessel-i functions, which can be used for fast numerical evaluation of the descriptors. The SOAP kernel is almost non-parametric: only a few parameters are present in the kernel, and they have simple, intuitive interpretations. Therefore, we can often skip iterative refinement using the gradient descent algorithm, which is a computationally expensive task. The SOAP descriptor with the series expansion of the Bessel-i functions (SE-SOAP) is used in this study.

In this section, we discuss diffusivity in solids and the methods used to estimate ionic conductivity.

Resistance R of a conductor with length l and cross-sectional area A is given by

R = ρl/A,  (126)

where the resistivity (or specific resistance) ρ has the SI unit Ω·m. Conductivity (or specific conductance) is the reciprocal of resistivity,

σ = 1/ρ.  (127)
The SI unit of conductivity is S/m, where the siemens (S) is 1/Ω.
Solid electrolytes that allow fast ion conduction may lead to smaller and safer portable energy storage devices than current batteries with liquid electrolytes.131 Cations such as Li+ diffuse in the solid within the framework of anions or poly-anionic moieties. Many fast-ion-conductor solids have a body-centered-cubic arrangement of anions.132 In crystals, ionic conductivity arises from the formation and hopping of defects such as vacancies and interstitials, whose concentration is temperature dependent. If defects are uncorrelated, the conductivity can be described by an Arrhenius-type relationship

σ = qnu ∝ T^m exp(−E_a/k_B T),  (128)

where q, n, u, and E_a are the charge, concentration, mobility, and activation energy of the mobile ions, respectively, and m is typically −1. E_a entails the energies required for the creation and hopping of mobile charges. In the simplest case, where an ion jumps from one lattice site to a neighboring vacancy, we can write
(129)
where z, ΔS_m, α_0, and ν_0 are the geometric factor, entropy of migration, jump distance, and attempt frequency, respectively. ν_0 is related to the character of the lattice vibrations.
Diffusion is proportional to conductivity through the Nernst–Einstein relationship

D = H_R (k_B T/nq²) σ,  (130)
where the Haven ratio H_R is the correlation factor that depends on the diffusion mechanism.133 For uncorrelated ion hopping, H_R = 1, and for highly correlated ionic motion observed in fast-ion conductors, H_R > 2. It has been suggested that the concerted migration of multiple ions with low energy barriers is the origin of fast ionic diffusion in superionic conductors.134
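The Nernst–Einstein relation with the Haven ratio can be evaluated directly. A minimal sketch with illustrative numbers (the number density, charge, and diffusion coefficient below are not from the source):

```python
KB = 1.380649e-23  # Boltzmann constant, J/K

def conductivity(n_density, q, d_tracer, temp, haven=1.0):
    """Nernst-Einstein estimate: sigma = n q^2 D / (H_R k_B T).

    n_density: mobile-ion number density (1/m^3)
    q: ionic charge (C)
    d_tracer: tracer diffusion coefficient (m^2/s)
    temp: temperature (K); haven: Haven ratio H_R
    """
    return n_density * q ** 2 * d_tracer / (haven * KB * temp)

# illustrative values: n ~ 2e28 m^-3, monovalent ion, D ~ 1e-12 m^2/s, 300 K
sigma = conductivity(2e28, 1.602e-19, 1e-12, 300.0)
```

With these assumed inputs the estimate comes out near 0.1 S/m, a typical order of magnitude for a decent solid-state ionic conductor.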

Nudged elastic band (NEB) calculations are used to calculate the energy barrier along a specific diffusion path if the dominant diffusion pathways are known. For a concerted diffusion mechanism, this becomes even more complicated as the number of possible pathways becomes increasingly large. Ab initio molecular dynamics (AIMD) simulations, on the other hand, are able to realistically determine the diffusion coefficient and the conductivity of materials as well as the diffusion pathways. However, due to the huge computational resources required for the electronic structure calculations, these simulations are often applied only to small systems (∼100 atoms) and short intervals of time (∼100 ps). Therefore, classical MD simulations that use empirical potentials or force fields are sometimes utilized to scale up the calculations.

Another important aspect of studies of diffusion in solids is that, because of the Arrhenius-type temperature dependence of diffusion events, they occur too slowly at room temperature to be sampled with statistical certainty using AIMD,135 even in the fastest ion conductors. Therefore, simulations are often carried out at elevated temperatures, and the diffusion coefficient at room temperature is approximated by extrapolation. However, this is not possible when the system of interest is unstable at higher temperatures; in fact, such a case is encountered in the present study.

The ionic conductivity σ is given by the Green–Kubo relation in linear response theory136 
\sigma = \frac{q^2}{d V k_B T} \int_0^{\infty} \langle \mathbf{J}(t) \cdot \mathbf{J}(0) \rangle \, dt \qquad (131)
where d is the dimensionality, V is the volume, N is the number of mobile ions, q is the charge of the ions, kB is the Boltzmann constant, T is the temperature, and \mathbf{J}(t) = \sum_{i=1}^{N} \mathbf{v}_i(t) is the total particle flux. This can be written in the form of the Nernst–Einstein equation as
\sigma = \frac{N q^2}{V k_B T} D_\sigma, \quad D_\sigma = \frac{1}{d N} \int_0^{\infty} \langle \mathbf{J}(t) \cdot \mathbf{J}(0) \rangle \, dt \qquad (132)
where Dσ is called the jump diffusion coefficient and is related to the diffusion of the center of mass. The tracer diffusion coefficient is
D = \frac{1}{d N} \sum_{i=1}^{N} \int_0^{\infty} \langle \mathbf{v}_i(t) \cdot \mathbf{v}_i(0) \rangle \, dt \qquad (133)
One can see that
D_\sigma = D + \frac{1}{d N} \sum_{i \neq j} \int_0^{\infty} \langle \mathbf{v}_i(t) \cdot \mathbf{v}_j(0) \rangle \, dt \qquad (134)
With the Haven ratio defined as H_R = D/D_\sigma,137 the ionic conductivity is often expressed in terms of the tracer diffusion coefficient as
\sigma = \frac{N q^2}{V k_B T} \frac{D}{H_R} \qquad (135)
If the motions of ions are uncorrelated, then HR = 1.
The diffusion coefficient can be calculated from molecular dynamics simulations. The mean squared displacement (MSD) is defined as
\mathrm{MSD}(t) = \frac{1}{N} \sum_{i=1}^{N} \langle |\Delta \mathbf{r}_i(t)|^2 \rangle \qquad (136)
where N is the number of mobile ions and \Delta \mathbf{r}_i(t) = \mathbf{r}_i(t) - \mathbf{r}_i(0). The tracer diffusion coefficient can be calculated from the MSD using the relation
D = \lim_{t \to \infty} \frac{\mathrm{MSD}(t)}{2 d t} \qquad (137)
where d is the number of dimensions and \langle \cdot \rangle denotes the ensemble average. The squared mean displacement (SMD), which is related to the displacement of the center of mass, is
\mathrm{SMD}(t) = \frac{1}{N} \left\langle \Big| \sum_{i=1}^{N} \Delta \mathbf{r}_i(t) \Big|^2 \right\rangle \qquad (138)
Similarly, the jump diffusion coefficient can be obtained from SMD as
D_\sigma = \lim_{t \to \infty} \frac{\mathrm{SMD}(t)}{2 d t} \qquad (139)
However, the diffusion of the center of mass is often too slow to be calculated with statistical certainty using AIMD.
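The MSD/SMD route to the tracer and jump diffusion coefficients can be sketched as follows. The synthetic random-walk trajectory and the array layout r[t, i, xyz] are assumptions for illustration, standing in for actual MD data.

```python
import numpy as np

# Sketch of the MSD/SMD route to the tracer and jump diffusion coefficients.
# The trajectory r[t, i, xyz] is a synthetic 3D random walk standing in for
# the mobile-ion positions from an MD run.
def msd_smd(r):
    dr = r - r[0]                                       # Delta r_i(t)
    msd = (dr**2).sum(axis=2).mean(axis=1)              # (1/N) sum_i |dr_i|^2
    smd = (dr.sum(axis=1)**2).sum(axis=1) / r.shape[1]  # (1/N) |sum_i dr_i|^2
    return msd, smd

rng = np.random.default_rng(0)
T, N, d = 400, 200, 3
traj = np.cumsum(rng.normal(size=(T, N, d)), axis=0)    # unit-variance steps
msd, smd = msd_smd(traj)
D_tracer = msd[-1] / (2 * d * (T - 1))   # slope estimate; expected value 0.5
```

For uncorrelated walkers, the SMD has the same expectation as the MSD but far larger statistical noise, which is why the jump diffusion coefficient is so much harder to converge.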
Consider an infinite 1D lattice where p_j = \exp(-\varepsilon_j/k_B T) is the probability that a random walker jumps across the edge connecting vertices j and j + 1 (p_j = p_{j \to j+1} = p_{j+1 \to j}) and \varepsilon_j is the energy barrier. If p_j = p for all j, the diffusion coefficient for random walkers is D = \nu a^2 p, where a is the lattice spacing and \nu is the frequency of jump trials. In the general case where p_j is non-uniform, the diffusion coefficient becomes D = \nu a^2 p_f, and \varepsilon_f = -k_B T \log(p_f) is the effective energy barrier, which can be calculated (in analogy to resistors in series) from the following:
\frac{1}{p_f} = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{p_j} \qquad (140)
As a simple model for a crystal with grain boundaries, assume that one out of every l edges has a different energy barrier \varepsilon_g, while all other edges have the energy barrier \varepsilon_a. The effective energy barrier becomes
\varepsilon_f = k_B T \log\!\left[ \left(1 - \frac{1}{l}\right) e^{\varepsilon_a/k_B T} + \frac{1}{l} e^{\varepsilon_g/k_B T} \right] \qquad (141)
If \varepsilon_g - \varepsilon_a \ll k_B T and 1/l \ll 1, then
\varepsilon_f \approx \varepsilon_a + \frac{\varepsilon_g - \varepsilon_a}{l} \qquad (142)
which readily shows how the activation energy for long-range diffusion is related to the spacing of the grain boundaries.
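The series-resistor picture of the effective barrier can be verified numerically; the sketch below uses illustrative barrier values and checks the dilute-grain-boundary approximation against the exact harmonic mean.

```python
import numpy as np

# Numerical check of the series-resistor picture of long-range diffusion:
# the effective hop probability is the harmonic mean of the local ones, and
# a dilute high barrier (one per l edges) shifts the activation energy by
# roughly (eps_g - eps_a)/l.  Barrier values are illustrative.
kB_T = 0.02585  # eV at 300 K

def effective_barrier(barriers_eV):
    p = np.exp(-np.asarray(barriers_eV) / kB_T)   # local hop probabilities
    p_eff = 1.0 / np.mean(1.0 / p)                # harmonic mean
    return -kB_T * np.log(p_eff)                  # effective barrier [eV]

eps_a, eps_g, l = 0.20, 0.205, 10                 # one high barrier per l edges
barriers = [eps_g if j % l == 0 else eps_a for j in range(10_000)]
eps_f = effective_barrier(barriers)
approx = eps_a + (eps_g - eps_a) / l              # dilute-limit approximation
```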

In molecular dynamics simulations of a pure crystal, often only the tracer diffusivity is calculated; this can be used to estimate the short-range diffusivity. Grain boundaries and the long-range diffusivity are difficult to treat because of the size limitations of ab initio simulations.

The diffusion coefficient (diffusivity) is defined as
D = \lim_{t \to \infty} \frac{\langle \mathrm{MSD}(t) \rangle}{2 d t} \qquad (143)
where d = 3 is the number of dimensions and \langle \cdot \rangle denotes the ensemble average. The directional MSD along a unit vector \hat{n} is
\mathrm{MSD}_{\hat{n}}(t) = \frac{1}{N} \sum_{i=1}^{N} \langle [\Delta \mathbf{r}_i(t) \cdot \hat{n}]^2 \rangle \qquad (144)
The activation energy Ea is calculated with the assumption of an Arrhenius temperature dependence for diffusivity as
D(T) = D_0 \exp(-E_a/k_B T) \qquad (145)
The Nernst–Einstein relationship for conductivity σ is given by
\sigma = \frac{N q^2}{V k_B T} D \qquad (146)
where N is the number of diffusing atoms, V is the volume, q is the ions' electric charge, and T is the temperature. Alternatively, an Arrhenius dependence for σT can be assumed and the conductivity obtained by extrapolation. The numerical difference between the two approaches is small [within O(1%)].
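The Arrhenius extraction of Ea amounts to a linear fit of ln D versus 1/T; the sketch below checks this on synthetic diffusivities generated with a known activation energy.

```python
import numpy as np

# Sketch of the Arrhenius analysis: the activation energy is minus the slope
# of ln D versus 1/T times kB.  The diffusivities are synthetic, generated
# with Ea = 0.20 eV so the fit can be verified.
kB = 8.617333262e-5  # Boltzmann constant [eV/K]

def arrhenius_Ea(T_K, D):
    slope, _ = np.polyfit(1.0 / np.asarray(T_K), np.log(D), 1)
    return -slope * kB                     # slope = -Ea/kB

T = np.array([450.0, 500.0, 600.0, 700.0])
D = 1e-3 * np.exp(-0.20 / (kB * T))        # illustrative units
Ea_fit = arrhenius_Ea(T, D)
```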

Here we have described dynamic properties such as the ionic conductivity and diffusion, along with linear response theory, molecular dynamics simulations, and the roles of grain boundaries and long-range diffusivity in materials simulations.

In this section, we discuss Li diffusivity in sulfide solid electrolytes, which are among the most promising candidates for future battery systems. Li7P3S11 has shown high ionic conductivity, but even better solid electrolytes with remarkable ionic conductivity need to be developed for more promising battery systems. Here we investigate these materials using the SGPR algorithm and study their properties using large-scale ML MD simulations. We also discuss the glass phase via a melt-quench simulation, the Li diffusivity in the glass phase, and the crystallization.

FIG. 10.

Training data: for forces, the RMSE is 0.13 eV/Å and R2=0.967; for energies, the MAE is 3.817×10−3 eV per atom.54 Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.


1. Li2S-P2S5 system

We consider sulfide-based superionic conductors in the (1−z) Li2S-zP2S5 system. The most stable compound in this system is considered to be Li3PS4, which exists as γ-Li3PS4 below 195 °C and as β-Li3PS4 at higher temperatures. Depending on the experimental preparation method, ionic conductivities as high as 10−4 S/cm are reported for this material.138 A much more conductive compound is the Li7P3S11 glass-ceramic, which owes its highly conductive phase to the formation of a metastable crystal structure.139 This crystal structure is a triclinic cell made of PS4 tetrahedra and P2S7 ditetrahedra.140 With heat treatment to reduce the grain-boundary resistance, the conductivity increases to 1.7×10−2 S/cm.141 Some of the other fastest ion conductors are closely related to this system in that they are obtained by substituting some of the P or S atoms with other atoms (M, for example, Ge), namely, Li2S-P2S5-GeSn, Li10GeP2S12,142 or the more complicated Li9.54Si1.74P1.44S11.7Cl0.3.143 Argyrodite-type solid-state electrolytes (SSEs) are another family of electrolytes. Thus, the main degrees of freedom are the choice of ternary/quaternary system, the composition ratios, and the route to the final product, such as the heat-treatment temperature and duration. First-principles calculations have been applied extensively in theoretical studies of solid electrolytes,144 shedding light on the lithium self-diffusion mechanism in Li10GeP2S12,145,146 the effects of anion/cation substitutions,147 the cause of the different ionic conductivities of tetragonal and cubic Li7La3Zr2O12,148,149 and the effect of composition in Li3OCl1−xBrx lithium-rich antiperovskite150 and (1−z)Li4SiO4-(z)Li3PO4 solid solutions.151

2. Training

A sparse Gaussian process potential was trained with the on-the-fly adaptive sampling algorithm for the well-known crystal structures of γ-Li3PS4, β-Li3PS4, and Li7P3S11.63 For sampling data, the systems were quenched from ∼1500 to 700 K in ∼5 ps by velocity re-scaling and kept at this temperature for ∼45 ps using a time step of 2 fs. In quenching, there is no time for the crystal to melt (since the atoms do not have time to displace far, the crystal becomes only locally disordered), but this local distortion feeds a stream of diverse LCEs to the sampling algorithm, which helps stabilize the model faster at the beginning of training. In total, only 456 ab initio calculations were performed, from which only 111 were sampled by the model as the data X and 705 LCEs were sampled as the inducing set z.
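The on-the-fly adaptive sampling logic described above can be caricatured as an uncertainty-gated loop: ab initio data are requested only when the model's predictive uncertainty for a configuration exceeds a threshold. All callables below are hypothetical stubs, not the actual SGPR implementation.

```python
import random

# Schematic of an uncertainty-gated on-the-fly sampling loop.  All callables
# are hypothetical stubs standing in for the MD engine, the SGPR predictor,
# the DFT code, and the model update.
def on_the_fly_md(steps, threshold, predict, ab_initio, update):
    data = []
    for _ in range(steps):
        config = random.random()             # stand-in for an MD snapshot
        energy, uncertainty = predict(config)
        if uncertainty > threshold:          # model unsure: request DFT
            data.append(ab_initio(config))
            update(data)                     # retrain / add inducing points
    return data

random.seed(0)
sampled = on_the_fly_md(
    steps=100, threshold=0.5,
    predict=lambda c: (0.0, c),   # toy model: uncertainty = coordinate
    ab_initio=lambda c: c,        # toy "DFT" returns the config itself
    update=lambda data: None)
```

In the toy run, only the minority of snapshots with high "uncertainty" trigger the expensive call, mirroring how only 111 of 456 ab initio calculations were retained as data.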

3. Testing

Evaluated on all 456 ab initio data generated during training, the model fits the ab initio forces with an RMSE of 0.13 eV/Å and a coefficient of determination of R2=0.967 (Fig. 10). To verify the model with independent data, ML MD simulations at five temperatures in the range 300–700 K were performed for 0.8–2.2 ns, and 103 snapshots were randomly selected. The accuracy of the model on these samples is shown in Fig. 11: RMSE = 0.14 eV/Å and R2=0.944. The RMSE is almost the same for the training and testing sets, which shows that the model generalizes very well.

FIG. 11.

Independent testing set: for forces, the RMSE is 0.139 eV/Å and R2=0.944; for energies, the MAE is 3.822×10−3 eV per atom.54 Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.


In addition, the liquid and glass phases of Li7P3S11 are simulated with the SGPR model. 50 + 70 samples are drawn from machine learning molecular dynamics (MLMD) simulations of liquid-phase Li7P3S11 at 1200 K and of the glass phase (obtained after melt-quenching) at 600 K for ∼0.2 ns. The number of atoms in these samples is 672.

To show that the model is energy-conserving, an MD simulation was carried out in the NVE ensemble. The total energy is conserved up to small fluctuations of 10−5 eV per atom, which are related to the finite time step of 2 fs. Since the model is trained at high temperatures, it is intriguing to see how well it describes properties close to 0 K. Specifically, NEB simulations are used in many ab initio studies to estimate the activation energy for Li diffusion, since lengthy AIMD simulations are computationally very demanding. We carried out NEB calculations with both the ML potential and direct DFT calculations; as shown in Fig. 12, the trained potential describes the NEB path with acceptable accuracy.

FIG. 12.

Nudged elastic band calculations for Li hopping to a neighboring vacancy (created by Li removal) in γ-Li3PS4 with DFT and ML potentials. A (2×2×2) k-point grid is applied in the DFT calculations.54 Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.


4. Computational performance

For a system with N atoms, the computational cost of predicting the energy and forces with the SGPR model is O(N), while the complexity of ab initio DFT calculations is O(N3), or at best O(N2 log N), unless a fragment-based approach is employed. The elapsed time for the ab initio calculation of the energy and forces for a system with 672 atoms was 9718 s using VASP with 80 cores, while the same calculation with SGPR took 5.3 s using only 20 cores. The cost, defined as elapsed time × cores/N, is 0.15 s for SGPR, better than the ab initio calculation by a factor of O(104). This gain grows further as the number of atoms increases.
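The per-atom cost figures quoted above follow directly from the definition cost = elapsed time × cores/N:

```python
# Reproducing the per-atom cost comparison quoted above, with
# cost = elapsed_time * cores / N_atoms.
def cost_per_atom(elapsed_s, cores, n_atoms):
    return elapsed_s * cores / n_atoms

dft_cost = cost_per_atom(9718.0, 80, 672)   # direct DFT (VASP)
sgpr_cost = cost_per_atom(5.3, 20, 672)     # SGPR prediction
speedup = dft_cost / sgpr_cost              # roughly 7e3, i.e., O(1e4)
```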

5. α and β phases

Large-scale ML MD simulations of Li7P3S11 (4×2×2 supercell, 672 atoms) were performed in the isothermal-isobaric (NPT) ensemble at several temperatures in the range 300–1200 K and an external pressure of 105 Pa. For this size, ML MD is O(104) times faster than DFT MD. A phase transition was detected by ML MD (at T ≈ 450 K), which occurred through rotations of the P2S7 double-tetrahedra into a new orientational order. The initial and new structures are referred to as the α and β phases. The ground-state energies of the α and β phases obtained from exact DFT calculations are −4.413 and −4.416 eV per atom, respectively, which shows that they are almost iso-energetic and that the β phase is not an artifact of the SGPR potential. Moreover, examples of the β phase are encountered during training and testing; 0.14 eV/Å is the lumped RMSE for both phases. Two steps appeared in the MSD of the S atoms, corresponding to the delayed rotations of P2S7 double-tetrahedra in different layers; the delay was ∼0.5 ns at 450 K with a 4×2×2 supercell. Specifically, in the range 450–600 K (Fig. 13), the system remained in the α phase for a few hundred picoseconds before the transition to the β phase, which allows estimation of the Li diffusivity in both phases. In agreement with experimental reports,139,140 at 900 K the P and S atoms also started to diffuse, which indicates melting or decomposition into more stable components.

From the diffusion coefficients (Fig. 14), with the assumption of an Arrhenius temperature dependence for the diffusivity, the activation energies Ea in the α and β phases were 0.20 and 0.30 eV, respectively. The ionic conductivity of the α phase in NPT at RT (=300 K) becomes 3.5×10−2 S/cm, in reasonable agreement with previous DFT results of 5.7×10−2 S/cm152 in NVT and 4.5×10−2 S/cm153 within a limited box size. For the β phase, the RT conductivity is obtained by extrapolation of D and by approximating V (which is T-dependent) by the volume of the relaxed cell. This yields a conductivity of 2.3×10−3 S/cm, more than an order of magnitude lower than that of the α phase. Alternatively, an Arrhenius dependence for σT can be assumed; the numerical difference is O(1%), since the volume of the α phase expands by less than 1% (300–600 K) and that of the β phase by about 4% (450–800 K).

The diffusivity was largest along the a axis in both phases. The main difference is that the diffusivities along the b and c axes are almost equal in the α phase, while in the β phase this symmetry is clearly broken. In Ref. 152, it is pointed out that the one-dimensional (1D) MSD of Li atoms along one of the axes in Li7P3S11 is nearly a factor of 2 higher than along the other two axes, which is consistent with the directional MSD.

6. Glass phase

Glass-phase Li7P3S11 is created by a melt-quench simulation from 1200 to 300 K. A Li conductivity of 9.3×10−3 S/cm is then obtained for the glass phase (with ML MD for 3 ns), which is higher than the conductivity of the β phase. Equilibration of the glass at higher temperatures shows that the P atoms start to diffuse at 600 K, which indicates crystallization below this temperature (Fig. 13), in agreement with experiment.139–141 For the liquid and glass phases, the model shows RMSE = 0.21 eV/Å and R2=0.917 (see Fig. 15).

FIG. 13.

MSD of P atoms in the glass phase of Li7P3S11. Diffusion of these atoms at 600 K allows the glass to slowly relax and eventually crystallize. Figure taken from Ref. 54. Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.

FIG. 14.

Li diffusivity (D) in α (blue) and β (red) phases of Li7P3S11. Figure taken from Ref. 54. Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.

FIG. 15.

Liquid and glass phases: for forces, the RMSE is 0.21 eV/Å and R2=0.917; for energies, the MAE is 6.35×10−3 eV per atom.54 Reprinted with permission from Hajibabaei et al., Phys. Rev. B 103, 214102 (2021). Copyright 2021 American Physical Society.


Experimentally, Li7P3S11 is often prepared in the glass-ceramic phase by controlled crystallization and heat treatment of the glass phase. Depending on the heat-treatment conditions, ionic conductivities of 3.2×10−3 and 1.7×10−2 S/cm are reported.140,141 The experimental conductivity was shown to correlate with the degree of crystallization, which depends on the heat-treatment method.154 Experimental measurements of the ionic conductivity with techniques such as impedance spectroscopy probe the long-range dynamics of Li, which is influenced by properties such as grain boundaries and the degree of amorphism. Wohlmuth et al.155 probed the short-range diffusivity of Li in glass-ceramic Li7P3S11 using a range of nuclear magnetic resonance (NMR) methods and showed that the activation energy of bulk Li7P3S11 is 0.20 eV, in agreement with our simulation. It was shown in Sec. IV F that grain boundaries modify the activation energy as E′a = Ea + (Eg − Ea)/l, where Eg is the energy barrier at the grain boundary and l is the grain size, which can often be measured experimentally.

7. Li–X–Y ternary systems

We studied Li diffusivity in Li7P3S11 with extensive MD simulations using an ML potential created with the SGPR algorithm. These simulations described the diffusion properties and the melting temperature in agreement with experimental measurements. We simulated systems with ∼700 atoms for several nanoseconds. A new crystal structure was found for Li7P3S11, referred to as the β phase, which shows much lower ionic conductivity than the α phase. These phases are almost isoenergetic according to exact DFT calculations. Therefore, it is important to identify the β phase experimentally and fine-tune the preparation method to avoid it. Additionally, we studied diffusivity in the glass phase and showed that the ML potential predicts the crystallization temperature consistently with experimental observations.

In this section, we turn our focus to known Li–X–Y ternary systems, which are available online (materials-project111). Any crystal with a considerable concentration of Li was considered a candidate solid electrolyte. For each material that satisfied certain criteria, an ML potential was generated using the SGPR algorithm and its Li diffusivity was calculated. We then discuss how the ML models for similar materials can be combined to generate global/universal potentials.

a. Searching algorithm

For all of the Li–X–Y ternary systems, excluding those containing transition metals, a sparse GP potential was trained using the on-the-fly adaptive sampling algorithm for the known crystal structures in the materials-project111 repository. In the first filtering step, systems with lithium making up less than 10% of the atoms were dropped. As the next filter, systems with an electronic bandgap of less than 1 eV were eliminated. After these two filters, about 300 structures remained for further investigation. Then, on-the-fly ML MD simulations of the potential energy were triggered for these materials. The supercell size for the ML MD simulations was chosen such that the k-spacing Δk in reciprocal space satisfied Δk < 2π/10 Å−1; therefore, only the Γ point was used as the k-point grid for the DFT calculations. The number of atoms in the supercell was in the range of 100–300. If the Li diffusivity of a material was estimated to be too small (e.g., less than ∼0.1 Å2/ps at 1000 K), its simulation was stopped. The other condition for continuing a simulation was stability: if a crystal melted at temperatures lower than ∼600 K, the simulation was terminated unless a very high diffusivity was observed at lower temperatures. The ML MD simulations were carried out in the isothermal-isobaric (NPT) ensemble for ∼100 ps. The temperature-dependent diffusivity and room-temperature ionic conductivity were calculated for this set of materials and are listed in Table S1 of Ref. 63.
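The two screening filters can be sketched as follows; the records, formulas, and bandgap values below are illustrative placeholders, not actual Materials Project data.

```python
# Sketch of the two screening filters described above: a candidate must have
# at least 10% Li atoms and an electronic bandgap of at least 1 eV.  The
# records are illustrative placeholders, not Materials Project data.
def passes_filters(entry, min_li_frac=0.10, min_gap_eV=1.0):
    li_frac = entry["n_li"] / entry["n_atoms"]
    return li_frac >= min_li_frac and entry["bandgap_eV"] >= min_gap_eV

candidates = [
    {"formula": "Li3PS4", "n_li": 3, "n_atoms": 8, "bandgap_eV": 2.5},
    {"formula": "LiXO2",  "n_li": 1, "n_atoms": 4, "bandgap_eV": 0.5},
    {"formula": "LiX6",   "n_li": 1, "n_atoms": 7, "bandgap_eV": 0.0},
]
kept = [e["formula"] for e in candidates if passes_filters(e)]
```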

In the end, 22 promising crystals were detected (Table II). As an exemplary case, 4 of the 22 materials had the same set of elements (Li–P–S), for which the training set consisted of crystal structures known for their relevance to Li-ion conductivity, focusing primarily on three materials: γ-Li3PS4 with (1×1×2), β-Li3PS4 with (2×1×1), and Li7P3S11 with (2×1×1) cells.138–140 This sampling procedure was discussed in the earlier Subsection V C. As such, the combinatorial approach above opens the door to strategies like "high-level parallelism" or "a library of SGPR potentials," which can be combined by simple cross-covariance calculations to describe increasingly complex materials. Such a combinatorial approach is not trivial with other methods.

b. Universal ML interatomic potentials for surveying ternary solid electrolytes

With on-the-fly ML of interatomic potentials using the SGPR algorithm, MD simulations were applied to survey the Li diffusivity in hundreds of ternary crystals as potential electrolytes for all-solid-state batteries. The models generated for these crystals can easily be combined to create more general and transferable models, which can potentially be used to simulate new materials without further training. As an example, universal potentials are created for the Li–P–S and Li–Ge–S systems by combining the expert models of the crystals containing the same set of elements. Combinatorial models of different ternary crystals can be directly applied to model composite quaternary ones (e.g., Li–Ge–P–S). This hierarchical approach paves the way for modeling large-scale complexity by a combinatorial approach.63

NNs are suitable for large datasets, while kernel-based methods excel at training with smaller data and at on-the-fly learning.43,44,54,91,101 These methods have found a wide range of applications in physical chemistry. Concerning solid electrolytes (SEs), Li3N has been studied using the electrostatic spectral neighbor analysis potential (eSNAP) method,156 Li3PO4 using NNs,157 and Li7P3S11 using SGPR.54

An important issue for the future of ML IAPs is transferability and systematic model reusability. Currently, models are generated for specific tasks and often discarded afterward. The ability to build on prior work is highly sought after and imperative for the efficient exploration of functional materials. Although gradual improvement of trained models is possible with all ML methods, kernel-based methods such as SGPR offer natural frameworks for libraries of ML models, which can be combined with little effort to describe increasingly complex materials. NNs memorize the data in a set of weights of fixed size and thus have to be re-optimized entirely when new data are added. In contrast, in kernel-based methods, models are represented by a set of LCEs that can be easily categorized and encapsulated in libraries. These ideas can be explored in the context of Li-conducting crystals, which are one of the essential components of all-solid-state storage devices. In an unsupervised fashion, hundreds of ternary crystals are simulated by on-the-fly ML MD and the fastest Li conductors are identified. Later, we discuss combining the SGPR models into more transferable potentials and a combinatorial approach for modeling new crystals.

The Li7P3S11 superionic conductor was extensively studied using SGPR above; similar simulations are applied here to hundreds of potential SEs. We focus on Li-containing ternary systems (composed of three elements) in the materials-project111 repository. Crystals containing transition metals are not considered here. Li concentration and bandgap are used as initial filters, since an SE should contain a considerable number of Li atoms and should be an electronic insulator. About 300 crystals passed the initial filters. The quaternary system Li10Ge(PS6)2142 is included in the list as an extension of the ternary systems. Then NPT-MD simulations with on-the-fly SGPR are performed for ∼100 ps for each crystal at standard pressure. Three or four temperatures are chosen for the MD simulations, depending on the stability against melting and the Li diffusion rate.

Figure 16 and Table II show the Li diffusivity of the 22 fastest Li-conducting crystals. In NPT-MD simulations, the volume expands with increasing temperature. This temperature dependence is ignored in reporting the ionic conductivities in Table II, since it modifies the values only by O(1%) (see Ref. 54). Nevertheless, the effect of the NPT ensemble is directly reflected in the diffusivity, and significantly faster ionic conduction is observed in comparison with the NVT ensemble.

FIG. 16.

Extrapolated ionic conductivity at 300 K (blue) and activation energy (red) for the 22 fastest Li-conducting crystals. Figure taken from Ref. 63. Reprinted with permission from Hajibabaei and Kim, J. Phys. Chem. Lett. 12, 8115 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.


Fast Li diffusion is one of the most important factors for a promising solid electrolyte; stability against Li metal and a wide electrochemical window are other important factors. Additionally, experimental reports often measure the long-range diffusivity, which is affected by grain boundaries, while the values in Table II reflect only the short-range diffusivity; special techniques are required to probe the short-range diffusivity in experimental samples.155 Among the 22 Li-conducting SEs reported in Table II, 14 are sulfides, five are halides, two are oxides, and one is a selenide. Here we briefly discuss the experimental results for selected SEs. The top three fastest ionic conductors belong to the Li2S-P2S5 system. The most stable compound in this system is (γ-/β-) Li3PS4. Li7PS6 is a lithium argyrodite with an orthorhombic/cubic (low/high temperature) phase, where the cubic phase has the higher conductivity. The highest experimental conductivity reported in this system is for Li7P3S11, which has a triclinic cell made of PS4 tetrahedra and P2S7 double-tetrahedra.140 Experimental room-temperature ionic conductivities (mS/cm) of 0.16 for Li3PS4,158 0.11 for Li7PS6,159 and 17 for Li7P3S11141 are reported; the latter is one of the highest reported to date for an SE.138 As a halide, a conductivity of ∼4 mS/cm (at 330 K) is reported for Li3InBr6.160 Li4SiO4 has also been considered as a parent for SE solid solutions.151 The experimental conductivities are often improved, even by several orders of magnitude,158 depending on the preparation method. Our simulations suggest that the experimental preparation methods could be improved, especially for Li7PS6.

8. Universal potentials: Application to quaternary systems

In the previous simulations, an "expert" model was generated for the potential energy of each material. In the end, 22 materials showing significant Li diffusivity were selected. For instance, 4 of the 22 materials have the same set of elements (Li–P–S), and it is intriguing to combine these four models into a universal potential for the Li–P–S system. The combinatorial approach based on the mixture-of-experts (MOE) model opens the door to strategies like "high-level parallelism" or "a library of SGPR potentials," which can be combined by simple cross-covariance calculations to describe increasingly complex materials.

As a simpler case, the SGPR models for Li3PS4 and Li4GeS4 are simply combined and tested on Li10Ge(PS6)2 (both configurations in Table II). Due to the presence of the different species P and Ge, the combined model cannot be significantly compressed. The combined model of the two ternary crystals described the forces of the quaternary crystal with R2=0.961, which is very close to the R2 of the expert model. This is understandable in view of the similarity of the local chemical environments of the cations P and Ge in these crystals (see Fig. 6). In many promising solid electrolytes, they are present as (XS4)q− poly-anionic moieties, with X = P, Ge, etc., in a sea of Li+ ions and do not directly interact with each other. By combining the ML models for the simpler crystals, we can simulate the composite ones with minimal or even no additional training. Thus, by combining the two ternary systems Li3PS4 and Li4GeS4, the quaternary system Li10Ge(PS6)2 was studied, because models generated with kernel-based methods such as SGPR can easily be combined with greater generality and transferability. Using previously trained models in the database, it is possible to model new crystals without further training, as demonstrated by creating universal potentials for two ternary systems. The combination of two ternary crystals resulted in a realistically accurate potential for a quaternary crystal. With the introduced algorithm, the models generated in this study can be reused, with limited or no additional training, to describe more complex systems (such as quaternary electrolytes) in a combinatorial approach.
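Why kernel models merge so easily can be illustrated with a toy 1D sparse regression, where each "expert" is just a set of inducing points and the merged model refits its weights on the union of those sets. The 1D data, RBF kernel, and regularized least-squares fit below are schematic assumptions, not the actual SGPR code.

```python
import numpy as np

# Toy illustration of merging kernel models: each "expert" is a set of
# inducing points, so a merged model refits weights on the union of the
# inducing sets.  1D data and the RBF kernel are schematic assumptions.
def rbf(X, Z, ls=0.5):
    return np.exp(-(X[:, None] - Z[None, :])**2 / (2 * ls**2))

def fit_weights(Z, X, y, jitter=1e-6):
    K_zx = rbf(Z, X)                          # inducing-point/data covariance
    A = K_zx @ K_zx.T + jitter * np.eye(len(Z))
    return np.linalg.solve(A, K_zx @ y)       # regularized least squares

X = np.linspace(-3, 3, 60)
y = np.sin(X)
Z1 = np.linspace(-3, 0, 8)                    # "expert" 1 inducing points
Z2 = np.linspace(0, 3, 8)                     # "expert" 2 inducing points
Z = np.concatenate([Z1, Z2])                  # merged model: union of the sets
w = fit_weights(Z, X, y)
rmse = np.sqrt(np.mean((rbf(X, Z) @ w - y)**2))
```

Neither expert alone covers the full domain, yet the merged inducing set fits the whole function accurately; only the small weight vector has to be recomputed, which is the essence of the cross-covariance combination.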

9. Summary for SGPR simulations of solid electrolytes

The SGPR-based ML formalism was applied to studies of ionic diffusion in dozens of solid electrolytes. Specifically for Li7P3S11, these simulations described the diffusion properties and the melting temperature in agreement with experiments. We simulated systems with ∼700 atoms for several nanoseconds, which is unimaginable with direct ab initio molecular dynamics (AIMD). We could easily have simulated even larger systems, because the computational cost of calculating the energy and forces is proportional to the number of atoms N, unlike DFT, which scales as N3 or N2 log N. For 700 atoms, using 100 CPUs, it took less than 1 s to predict the ab initio potential energy and forces with the SGPR potential, faster than direct DFT by a factor of O(104). A new crystal structure was found for Li7P3S11, referred to as the β phase, which shows much lower ionic conductivity than the α phase. These phases are almost isoenergetic according to exact DFT calculations; therefore, it is important to identify the β phase experimentally and fine-tune the preparation method to avoid it. Additionally, we studied diffusivity in the glass phase and showed that the ML potential predicts the crystallization temperature consistently with experimental observations.

We showed that SGPR models for materials with the same set of elements can easily be merged into a single universal potential. This allows high-level parallelism in generating ML potentials, which is not easily possible with other ML algorithms. That is, instead of training a potential sequentially over different domains, we can train potentials for all of the domains in parallel and merge them into a single global potential at the end. This is a critical step forward for applying ML potentials to increasingly wide domains and truly complex materials.

1. Al-doping in Li-ion-battery cathode materials

Al-doping has been employed in the cathode materials of lithium-ion batteries (LIBs) to control the anion redox reaction and thereby prevent the decline in energy-storage capacity and voltage. Anion redox chemistry in lithium-rich cathodes has opened avenues for cathode materials with greater energy capacity. Yet the prolonged redox reaction of these anions reduces the battery's storage capacity and voltage upon repeated cycling. This occurs through irreversible changes of the oxygen anions, which destabilize the structure. To enhance the durability and stability of layered oxides and lithium-rich variants such as lithium nickel–cobalt–aluminum oxide (NCA) and lithium nickel–cobalt–manganese oxide (NCM), Al-doping has been applied intermittently. However, prior studies on Al-doping have mostly focused on NCA or NCM, which predominantly contain 3d transition-metal (TM) elements. The chemical behavior differs significantly when 4d or 5d TM elements are used, leading to distinct characteristics: cathodes comprising solely 3d TMs exhibit unique structural arrangements, whereas the addition of 4d or 5d TMs to oxide-based cathodes introduces novel chemical properties. Due to the intricate behavior of these materials, SGPR-based DFT potentials/forces were employed to identify their most stable structures, presenting a new approach to designing fully reversible layered cathode materials. By employing this method, the impact of Ni- and Al-doping on the decline in capacity and voltage of high-capacity LIB cathode materials containing 4d elements was investigated.
An exhaustive and dependable analysis of this complex system was conducted using SGPR.65 The stability of LRO (Li2RuO3; Li1.33Ru0.67O2) was significantly enhanced by Ni-doping (LRNO: Li1.22Ru0.61Ni0.17O2), and further stability was achieved in LRNO through additional Al-doping. The resulting Al-doped LRNO (Al-LRNO: Li1.22Ru0.61Ni0.11Al0.06O2) materials exhibit exceptional stability and high capacity. The structures of LRNO and Al-LRNO were investigated with SGPR-based DFT potentials by incorporating a diverse genetic algorithm starting from the known LRO structure. The effects of Ni-doping and Al-doping on the materials' stability under high-voltage conditions were investigated.

2. DFT calculations

To generate training/test sets for SGPR potential energies, spin-polarized DFT calculations were conducted to derive potential energies for both discharged and charged phases of LRNO and Al-LRNO configurations, using the projector augmented wave (PAW) plane-wave method with an energy cutoff of 600 eV. The calculations employed the PBE+U functional with the Tkatchenko–Scheffler (TS)161 dispersion correction, where U = 7.5 eV and J = 0.6 eV for Ru, and U = 7.7 eV and J = 1.0 eV for Ni. The structures were fully relaxed until the interatomic forces were smaller than 0.03 eV/Å.

3. Structure prediction

To identify the most favorable arrangements of cations in the LRNO and Al-LRNO systems, the lowest-energy structures were searched by altering atomic positions within the unit-cell formula to mirror the effect of doping. A modified genetic algorithm was employed, starting with a handful of randomly generated parent structures. These parents represented changes in atomic positions, swapping three Li/Ru sites within the 3 × 2 × 1 Li2RuO3 supercell with either three Ni atoms or a combination of two Ni and one Al atom, thereby forming the LRNO or Al-LRNO reference systems, respectively. At each step, numerous child structures were randomly generated from each parent, and the lowest-energy structures were selected from the combined parent and child generation for the next iteration. The search continued until minimal changes occurred within the parent structures over a few steps, while multiple independent searches were conducted. To calculate the energy of each structure, the ML potential was generated on the fly using the SGPR methodology, employing a variation of SOAPs, a many-body descriptor, as the core kernel. During each energy calculation step, if the ML model fell short, precise DFT calculations were performed to update the model. An accuracy of 3 meV per atom was achieved by retaining the structures that could potentially possess the minimum energy within this margin for exact DFT calculations. Geometry optimization was performed to determine the lowest-energy structure. Upon identifying stable cation ordering in Al-LRNO, a Li vacancy was introduced to balance the additional charge, and its placement was determined by the defect formation energy. To model the charging phase, each state of charge was handled separately, thus reducing the number of sampled geometries. Commencing with the pre-established cation ordering, ∼500 random delithiated geometries were generated for each model, with their energies predicted by the ML potential.
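The search loop can be caricatured with a one-dimensional toy problem, where `ml_energy` and `ml_uncertainty` stand in for the SGPR prediction and its variance and `dft_energy` for the exact calculation; all names, thresholds, and parameters here are hypothetical, not the published workflow:

```python
import random

def ml_energy(x):          # stand-in for the SGPR energy prediction
    return (x - 3.0) ** 2

def ml_uncertainty(x):     # stand-in for the SGPR predictive variance
    return 0.001

def dft_energy(x):         # stand-in for an exact DFT evaluation
    return (x - 3.0) ** 2

def genetic_search(n_parents=4, n_children=8, steps=30, seed=0):
    rng = random.Random(seed)
    parents = [rng.uniform(-5.0, 5.0) for _ in range(n_parents)]
    for _ in range(steps):
        # mutate each parent into children, keep parents in the pool (elitism)
        children = [p + rng.gauss(0.0, 0.5)
                    for p in parents for _ in range(n_children // n_parents)]
        pool = parents + children
        # fall back to "DFT" whenever the surrogate is too uncertain
        scored = [(dft_energy(x) if ml_uncertainty(x) > 0.003 else ml_energy(x), x)
                  for x in pool]
        scored.sort()
        parents = [x for _, x in scored[:n_parents]]
    return parents[0]

best = genetic_search()
print(abs(best - 3.0) < 0.5)  # the loop converges toward the minimum at x = 3
```

The real search mutates cation orderings rather than a scalar, and the uncertainty test triggers genuine DFT single points that update the SGPR model, but the select-mutate-verify cycle is the same.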
The SGPR model recognized unique environments from around 11 samples among the prepared structures, learning the atomic potential from DFT results. Again, the model identified the Li ordering with the lowest energy, validated subsequently by DFT. The trained model underwent validation by comparing predicted and calculated formation energies of six randomly selected samples not used in training. For the training/test sets, each model exhibited an R2 of 0.96/0.92, an RMSE of 10.2/10.4 meV, and an MAE of 8.2/6.4 meV (Fig. 17).
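The quoted metrics follow the standard definitions; the formation-energy values in this sketch are synthetic, chosen only to exercise the formulas:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE, and MAE with their textbook definitions."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae

# Toy formation energies in meV (synthetic, not the published data)
e_dft = np.array([-120.0, -80.0, -40.0, 0.0, 40.0, 80.0])
e_ml  = e_dft + np.array([8.0, -6.0, 5.0, -4.0, 7.0, -9.0])

r2, rmse, mae = regression_metrics(e_dft, e_ml)
print(round(r2, 3), round(rmse, 1), round(mae, 1))  # 0.99 6.7 6.5
```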

FIG. 17.

Predicted formation energy (EformML) vs DFT formation energy (EformDFT) of train set and test set at each delithiation. (a) 50% delithiated LRNO, (b) 75% delithiated LRNO, (c) 50% delithiated Al-LRNO, and (d) 75% delithiated Al-LRNO.65 Figure taken from Ref. 65. Reprinted with permission from Ha et al., Adv. Energy Mater. 12, 2201497 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


4. Structures of LRNO and Al-LRNO

The structure features a honeycomb-like Li-transition metal layer reminiscent of LRO, showcasing an edge-shared section composed of RuO6-NiO6 or RuO6-AlO6 configurations [Fig. 18(a)]. A resemblance is observed in their respective calculated x-ray diffraction (XRD) spectra [Fig. 18(b)]. When comparing the optimized LRO, predicted LRNO, and experimental XRD results, no significant visual disparities were noted.162,163 This alignment significantly bolsters the reliability of the SGPR-based ML-driven simulations rooted in first principles for the structures of LRO, LRNO, and Al-LRNO. A notable observation is that in LRNO, Ni tends to form clusters, whereas in Al-LRNO, Ni and Al coalesce into alloy-type clusters, predominantly bonding with each other rather than exclusively with Ru. This inclination is supported by the cluster expansion coefficients (energies) derived from pair atoms, obtained through the least absolute shrinkage and selection operator (LASSO) regression model.164 This intriguing pattern can be comprehended through the concept of immiscibility or miscibility between Al, Ni, and Ru.166,167 DFT calculations on the alloying or mixing energies of these metals suggest that Ni and Ru tend to be immiscible, leading to the clustering of Ni atoms adjacent to Ru in LRNO. Meanwhile, the calculations indicate that Al–Ni mixing is marginally more favorable than Al–Ru mixing, resulting in a preference for Al atoms to bond primarily with Ni atoms, especially when a small quantity of Al atoms is present.
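The LASSO-based pair cluster expansion mentioned above can be sketched with a minimal coordinate-descent implementation; the pair types, counts, and energies below are synthetic stand-ins, not the published coefficients:

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=500):
    """Coordinate-descent LASSO: min ||y - Xw||^2 / (2n) + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]              # residual excluding feature j
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
# Columns: counts of hypothetical Ru-Ni, Ni-Ni, Ni-Al, Al-Ru pairs per structure
X = rng.integers(0, 6, size=(40, 4)).astype(float)
true_J = np.array([0.08, -0.05, -0.12, -0.03])          # assumed pair energies (eV)
y = X @ true_J + rng.normal(0.0, 0.005, size=40)        # noisy "DFT" energies

J_fit = lasso_cd(X, y, alpha=1e-3)
print(np.round(J_fit, 2))
```

A negative fitted coefficient marks a pair whose formation lowers the energy, which is how the Ni-Al clustering tendency is read off from such a fit.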

FIG. 18.

(a) Simulated atomic structures (top/side views) of LRO, LRNO, and Al-LRNO crystals generated by SGPR-based ML methods along with genetic algorithm approach and (b) their XRD patterns. The XRD patterns are calculated using λ = 1.541 Å. The arrangements of various atoms are shown in color code: green Li; skyblue Ru; orange Ni; deep-blue Al; red O.65 Figure taken from Ref. 65. Reprinted with permission from Ha et al., Adv. Energy Mater. 12, 2201497 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


5. Stability fading of Al-LRNO due to delithiation at high-voltages

The delithiated structures of LRNO and Al-LRNO were investigated to scrutinize the impact of Al-doping on stability, particularly in high-voltage environments. Initially, the fully (100%) delithiated structures of LRNO and Al-LRNO were optimized. Delithiation in LRO triggers significant distortion, indicated by a short O–O distance of approximately 2.39 Å resulting from anionic activity.167 This leads to under-coordination of the oxygen atoms, inducing a structural distortion that relieves strain energy through a reductive coupling mechanism. In LRO, oxidized oxygen prompts adjacent oxygen atoms to approach Ru along the O–O axis, causing O–O dimerization and eventual O2 release, which contributes to poor anionic cycling stability, as highlighted in prior studies. After Ni substitution in LRNO, however, the distortion is somewhat alleviated, evident from a larger O–O distance (∼2.42–2.58 Å) compared to LRO. This suggests suppression of oxygen oxidation. Subsequent Al-doping in LRNO relaxes the delithiated structure further, resulting in a significantly increased O–O distance (∼2.50–2.62 Å). The O atoms attached to Al, exhibiting strongly ionic–covalent character, hinder O–O dimerization, thereby suppressing O2 release. Additionally, the O defect formation energy was calculated to understand the suppression of structural distortion due to O2 release. The calculations reveal a higher O defect formation energy in Al-LRNO > LRNO > LRO at any delithiated state. Recognizing that 100% delithiation during charging is unrealistic, the 50% and 75% delithiated structures of LRNO and Al-LRNO were examined to determine their formation energies (meV/f.u., where f.u. is formula unit) using the SGPR model-based DFT-level potential. The results confirm the higher stability of the delithiated Al-LRNO structures over the corresponding delithiated LRNO structures, evidenced by more negative formation energies [Fig. 19(a)].
Despite similar destabilization from structural reconstruction in both Al-LRNO and LRNO, the heightened stability in delithiated Al-LRNO stems from Al's stronger reduction effect compared to Ni, as validated by the Bader charges. Specifically, the Bader charges (in au) of O around Ni in LRNO for 50% and 75% delithiated phases are −1.01 and −0.87, respectively. Conversely, around Al in Al-LRNO for 50% and 75% phases, the values become more negative, registering at −1.23 and −1.12, respectively. These findings imply Al effectively reinforces oxygen binding energy in Al-LRNO, leading to greater suppression of oxygen evolution. Furthermore, a notable difference is observed in delithiation around the Al atom compared to LRNO, where delithiation is significantly curtailed [Figs. 19(b)–19(g)]. At highly delithiated states, Li-ions farther from Al tend to diffuse out first, whereas no specific Li position preference is noted around Ni.
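The O-defect formation energies compared above follow the standard vacancy bookkeeping, E_f = E(defective) + ½E(O2) − E(pristine); the sketch below uses hypothetical total energies purely to illustrate the expression:

```python
# Hypothetical total energies (eV); the formula, not the values, is the point.
E_pristine  = -520.00   # supercell without the defect
E_defective = -512.30   # same cell with one O atom removed
E_O2        = -9.86     # isolated O2 molecule (the removed O goes to 1/2 O2)

E_form = E_defective + 0.5 * E_O2 - E_pristine
print(round(E_form, 2))  # 2.77 eV: a larger value means O2 release is harder
```

The ordering Al-LRNO > LRNO > LRO in this quantity is what signals the suppression of oxygen evolution by Al-doping.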

FIG. 19.

Machine learning predicted structures and formation energy of LRO, LRNO, and Al-LRNO at discharged and charged states. (a) Formation energy (meV/f.u.) with delithiation process in LRNO (red line) and Al-LRNO (blue line). The most stable geometry (filled circle) is found among sampled configurations (empty triangle) by the sparse Gaussian process regression (SGPR) model-based on-the-fly potential learning algorithm. (b) and (c) The SGPR model-based machine learning predicted pristine structures of (b) LRNO and (c) Al-LRNO at the discharged phase. (d)–(g) Charging phases (50% and 75%) (Li: green, Ru: skyblue, Ni: orange, Al: blue, O: red).65 Figure taken from Ref. 65. Reprinted with permission from Ha et al., Adv. Energy Mater. 12, 2201497 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


6. Enhanced diffusion by Al-doping

Delving into the diffusion coefficient via AIMD poses challenges due to the fluctuating lithium concentration during the charging and discharging cycles. In particular, at room temperature the diffusion of a Li atom is an extremely rare event. To expedite the computations, diffusion was simulated at elevated temperatures of 700, 800, 900, 1000, and 1200 K. To assess the impact of Al-doping, LRNO and Al-LRNO structures were examined by employing (2 × 1 × 1) cells with a 50% Li concentration, comprising 122 and 120 atoms in the simulation cells, respectively. The SGPR-based ML DFT potential was trained dynamically during the simulations. MD simulations in the NVT ensemble with a Nosé–Hoover thermostat were conducted for 0.5 ns (except for 1 ns at 700 K), using a time step of 2 fs. Each atom's mean squared displacement (MSD) from its initial position surpassed 50 Å within the simulation timeframe. Typically, the temperature dependence of the diffusion constant D is approximated using the Arrhenius relation D = D0 exp(−Ea/kBT), where Ea represents the activation energy. The calculated activation energies for LRNO and Al-LRNO with 50% Li concentration stand at 0.48 and 0.40 eV, respectively (Fig. 20). Simulations based on the SGPR ML DFT potential project a higher diffusion coefficient for the Al-doped material compared to the pristine one, suggesting an approximate doubling in Li+ mobility within the cathode material, especially around ∼700 K, where Li+ diffusion favors Al-LRNO over LRNO.
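The Arrhenius analysis amounts to a linear fit of ln D against 1/T; the diffusion coefficients below are synthetic, generated with an assumed Ea of 0.40 eV so the fit should recover that value:

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant, eV/K

# Hypothetical diffusion coefficients (cm^2/s) at the simulated temperatures
T = np.array([700.0, 800.0, 900.0, 1000.0, 1200.0])
D = 1e-3 * np.exp(-0.40 / (kB * T))   # synthetic data, Ea = 0.40 eV by construction

# ln D = ln D0 - Ea/(kB*T): a linear fit in 1/T yields the activation energy
slope, intercept = np.polyfit(1.0 / T, np.log(D), 1)
Ea = -slope * kB
D0 = np.exp(intercept)
print(round(Ea, 2))  # 0.4
```

With real MD data, D at each temperature comes first from the slope of the MSD versus time (Einstein relation) before this fit is applied.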

FIG. 20.

Diffusion coefficient of Li ions (with 50% concentration) in LRNO (blue bullets) and Al-LRNO (red bullets) calculated with SGPR-based machine-learned DFT potential MD simulations.65 Figure taken from Ref. 65. Reprinted with permission from Ha et al., Adv. Energy Mater. 12, 2201497 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


7. Electronic origin of stabilization and Discussion

In the realm of cathode materials containing 4d-elements, the introduction of Al through doping has shown remarkable efficacy in mitigating capacity and voltage deterioration. This intervention not only addresses persistent challenges related to capacity fading but also curtails the decline in average discharge voltage, thereby paving the way for high-performance LIBs. Al-doping fosters the formation of Ni–Al bonding, bolstering the characteristics of bonding orbitals within Al–O bonds. This fortified Al–O bonding acts as a deterrent to oxygen oxidation, significantly enhancing structural stability and alleviating safety concerns. The capability of Al-doping to suppress capacity fading and voltage decay is pivotal in the development of stable, reversible layered cathode materials. Furthermore, the reported material possesses the potential to serve as a protective coating over cathode materials, substantially enhancing their electrochemical performance by imbuing them with heightened stability. The exceptional performance of this material can be attributed to several factors. The bonding orbital characteristics of Al–O, situated deep below the Fermi level in Al-doped material, effectively restrain excessive oxygen oxidation even during profound delithiation. This phenomenon stabilizes crystal structures by notably suppressing oxygen defect formation compared to LRNO. Al-LRNO demonstrates higher structural stability within the high voltage region due to the less oxidized state of Ni, a characteristic observed in this Al-doped material. In essence, the strategic utilization of Al-doping stands as a promising avenue for addressing key challenges in LIBs, fostering improved stability and performance.

Notably, this SGPR ML-based computational approach underscores the effective suppression of cationic migration within these high-voltage redox systems. Al-doping emerged as a highly impactful strategy, influencing the bonding dynamics to stabilize TM octahedrons within the high-voltage redox framework. The Al-doping not only enhances diffusion rates but also reduces Li-ion charge transfer resistance, bolstering the overall performance. Drawing from these fundamental insights and preceding experimental results, conventional cathode materials such as NCMA could potentially be harnessed for long-term development in high-energy-density batteries. The analogical relationship between Mn and Ru in the periodic table holds promise for this purpose. The exceptional attributes of Al-LRNO, including its high capacity, diminished voltage hysteresis, and remarkable structural integrity, owe credit to the induced Al–O bonding orbital characteristics. The ionic covalent nature of Al–O bonding orbitals, situated significantly below the Fermi level within the doped material, plays a pivotal role in restraining excessive oxygen oxidation, even under deep delithiation conditions, effectively stabilizing the crystal structure. We posit that the reported material holds potential as a protective coating for commercial cathode materials, fostering high-temperature stability and substantially augmenting their electrochemical performance. This strategic use of the material may pave the way for advancements in battery technology.

8. Sodium-substituted disordered rock salt cathode: Reduced energy barrier for oxygen evolution

Cation-disordered rock salt (DRX) cathodes stand as promising candidates for future high-energy-density materials, surpassing conventional layered cathodes in LIB technology. An intriguing discovery emerged when Na-doped DRX exhibited efficient electrocatalytic properties toward the OER. This OER electrocatalyst demonstrated a current density of 10 mA cm−2 at an overpotential (η) of 270 mV, displaying a Tafel slope of 67.5 mV dec−1 and impressive long-term stability exceeding 5.5 days, surpassing the benchmark IrO2 (η = 330 mV and Tafel slope = 74.8 mV dec−1). This intriguing electrochemical behavior finds strong support both experimentally and through ML-based searches for the minimum-energy structure. Na-doping notably facilitates O2 evolution by reducing the oxygen binding energy (OBE) on the surface, showing a decrease from 6.51 eV in pristine DRX to 5.45 eV with Na-doping. To comprehend Na's role, the scenario where 1/16 of the Li atoms in Li2RuO3 were substituted by Na was examined. Relaxations of the atomic positions and cell structure demonstrated that the system's potential energy is minimized when the Na atom occupies the Li layer rather than the Ru layer, owing to the compatibility between Na and Li. Subsequent discharging/delithiation was simulated using AIMD. SGPR MD was performed to construct an on-the-fly potential energy surface of the spin-polarized PBE+U calculations.66 The ML model predicted the DFT forces with an RMSE of ∼0.2 eV/Å, which can be considered favorable at high temperatures. Simulations were conducted with a time step of 2 fs in the isothermal-isobaric (NPT) ensemble to monitor volume and shape changes during discharging at an elevated temperature of 700 K for enhanced dynamics. Two supercells were created, one with 32 undoped Li atoms and another with two Li atoms replaced by Na atoms in their minimum-energy positions. Both were simulated using MLMD to discern differences.
During these simulations, the delithiation potential was calculated based on energy changes and volume variations in the crystal structure. The findings highlighted that while Na's presence did not notably affect the potential, it played a pivotal role in controlling crystal volume. At lower delithiation stages (<50%), the Na-doped structure exhibited slightly higher volume than the undoped crystal. However, in higher delithiation percentages (50%–80%), the presence of Na prevented excessive expansion along the stacking direction of the Ru layers, thereby enhancing the system's mechanical stability (Fig. 21).
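The delithiation potential extracted from the MD energies is conventionally the average-voltage expression V = −[E(Li_x1 H) − E(Li_x2 H) − (x1 − x2)E(Li)] / (x1 − x2)e; a sketch with hypothetical total energies (the expression is standard, the numbers are placeholders):

```python
# Hypothetical total energies (eV) for one delithiation step
E_x1 = -250.00   # Li_x1 host (more lithiated), x1 = 16 Li in the cell
E_x2 = -215.00   # Li_x2 host (less lithiated), x2 = 8 Li in the cell
E_Li = -1.90     # energy per atom of Li metal (the Li reservoir)
n = 16 - 8       # number of Li transferred

# One electron per Li, so dividing eV by n gives volts directly
V = -(E_x1 - E_x2 - n * E_Li) / n
print(round(V, 3))  # 2.475 V average over this step
```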

FIG. 21.

Effect of Na doping in DRX. (Left) Supercell of Na-doped Li2RuO3 used for delithiation (discharging) simulations with SGPR-based PBE+D3 first-principles MD simulation. After every 5 ps, a Li atom was removed from the system. (Right) Spacing of Ru layers and delithiation potential averaged over every 5 ps. All calculations were spin-polarized, and a maximum k-spacing of |Δk| < π/10 Å−1 was applied for generating the k-point grid. A kinetic energy cutoff of 500 eV was applied. Due to the large size of the unit cells, a Γ-centered k-point grid of (1, 2, 3) was chosen. The (U, J) values of (4.5, 0.6) eV for Ru and (4.7, 1.0) eV for Ni were used for the PBE+U calculations.66


1. High-performing atomic electrocatalysts

There is an urgent need for developing efficient and environmentally friendly electrocatalysts for various electrochemical reactions including hydrogen/oxygen evolution/reduction, carbon dioxide reduction, ammonia reduction, hydroxide evolution, and chlorine evolution. Traditionally, these catalysts relied on expensive metals like platinum, ruthenium, or iridium. Recent efforts focus on using minimal amounts of these costly metals or finding cheaper alternatives, embedding them into graphene or other 2D materials to maximize surface area and catalytic activity. However, while this embedding enhances the active surface area, it can decrease electrical conductivity, impacting the catalyst's efficiency. To optimize these catalysts, researchers turn to high-throughput computer screening, analyzing different metal configurations onto/into 2D materials and their impact on stability, electronic properties, and electrochemical reduction/evolution activities. Their findings highlight specific configurations as displaying superior durability and high conductivity for efficient water splitting, crucial for clean energy applications like hydrogen-powered vehicles and electricity generation. The study also explores ML approaches and AIMD simulations to characterize high-performance catalysts.168–170 Detailed examinations have unveiled that single atom catalysts (SACs)161,172 follow distinct catalytic pathways compared to their nanocluster counterparts, resulting in differing selectivity in catalytic reactions. This means that by precisely adjusting the coordination environment of active sites, specific catalytic reaction pathways and product preferences can be achieved. Notably, many carbon-based materials have proven to be effective hosts for anchoring single transition metal atoms, serving as a favorable environment for these catalysts.67 

Lately, there has been a surge of interest in atomically dispersed catalysts featuring isolated transition metal atoms supported on a variety of materials such as metal oxides, borides, carbides, nitrides, graphene, graphitic carbon nitride (g-C3N4), hexagonal boron nitride (h-BN), black phosphorus (BP), layered double hydroxides (LDHs), transition metal dichalcogenides (TMDs), MXenes, and MBenes.173 These catalysts have garnered attention due to their notably high utilization of atoms, exposed active sites, customizable electronic structure properties, and robust interaction between the metal and the supporting material. Known as single-atom catalysts (SACs),161,172 catalysts formed by embedding atoms onto or into 2D materials are being widely explored and recognized as promising candidates for a wide range of catalytic reactions such as the hydrogen evolution reaction (HER),165,174–177 oxygen evolution/reduction (OER/ORR),166,178–180 nitrogen reduction (NRR),173,181–184 CO/CO2 reduction (CORR/CO2RR),69 and chlorine evolution reaction (ChER).185 Ultimately, this research contributes to a clearer understanding of these catalysts, particularly for critical applications in clean energy environments like hydrogen fuel cells and electric power generation.

2. HER/OER/ORR

Tiny yet mighty, nanoparticle (NP)-sized noble-metal electrocatalysts have been harnessed for a range of electrochemical reactions, particularly in advancing an environmentally friendly hydrogen-based economy through processes like water splitting. Researchers have leveraged minuscule quantities of single atoms (SAs) to amplify the active surface area, fine-tuning catalytic abilities by situating these atoms within defect sites on nitrogen-doped graphene (GN). Developing highly effective catalysts that serve multiple purposes, such as the HER, OER, and ORR, has become a focal point in advancing clean and sustainable energy technologies.171 Focusing on these reactions, top-tier performances of 3d–5d TM SACs have been unveiled by employing DFT and ML. The investigation delved into the stability and activity of TM–GN, scrutinizing structure/coordination, formation energy, structural/electrochemical stability, electronic traits, conductivity, and reaction mechanisms. Among the array of –NnCm configurations, the –N2C2 compositions stand out for their ease of formation, showcasing superior electrochemical catalytic prowess and prolonged durability, in particular for the HER, notably avoiding aggregation or dissolution, compared to the extensively studied –C4/C3 and –N4/N3 configurations.65

Introducing non-noble TMs such as Fe/Ni/Cu onto the surface of MXenes as individual SAs has been shown to alter their mechanical and electrical properties, although such systems have rarely been examined. Through DFT simulations, various combinations of MXenes were assessed for their stability with intermediates and their adsorption free energies to gauge their efficacy as catalysts for these reactions. The majority of the considered MXenes remain thermodynamically stable upon doping with Ni/Cu/Fe, showing optimal performance for the OER/ORR/HER, respectively. The ML potential was generated using on-the-fly sampling and the SGPR algorithm.70 The kinetic stability of the material was confirmed by using the SGPR ML potential for the catalyst. The surprisingly low overpotentials observed in these reactions make these catalysts appealing for water splitting (overall potential of 0.48 V) and metal–air batteries (overall potential of 0.70 V), surpassing even the commercially utilized IrO2(111) and Pt(111) for the OER and HER/ORR, respectively.70,186 The strong correlations between intermediates (scaling relations) and their interactions with the catalytic surface predominantly dictate performance. Doped metals not only serve as active sites but also act as conduits for charge transfer between the surface and the adsorbate. The notably low reaction barriers of Ni–Cr2ScC2O2 for all three reactions, coupled with robust reaction kinetics, position it as an efficient catalyst serving three functions simultaneously.
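The overpotentials quoted above follow from the standard computational analysis of the four proton-electron OER steps, η = max(ΔG_i)/e − 1.23 V; the step free energies below are illustrative values constrained to sum to 4.92 eV, not the published ones:

```python
# Hypothetical free-energy changes (eV) of the four OER proton-electron steps
# (* -> *OH -> *O -> *OOH -> O2 + *), evaluated at U = 0 V
dG = [1.50, 1.40, 1.71, 0.31]

# Thermodynamic consistency: the four steps must sum to 4 x 1.23 eV
assert abs(sum(dG) - 4.92) < 1e-9

# The potential-limiting step sets the overpotential relative to 1.23 V
eta = max(dG) - 1.23
print(round(eta, 2))  # 0.48 V with these illustrative step energies
```

The same bookkeeping with the hydrogen adsorption free energy ΔG_H gives the HER descriptor, which is why scaling relations between intermediates dominate the predicted performance.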

Here, we describe the heterogeneous catalysis dynamics of H/Pt reported by Vandermause et al.45 They validated an ML sparse Gaussian process (SGP) force field model for simulating Pt, H2, and H/Pt systems, comparing its performance with DFT and ReaxFF187,188 calculations. The SGP model was trained on energy, forces, and stress data from MD simulations and tested on various properties not included in the training set. The SGP model predicted the lattice constant within 0.1% and the bulk modulus and elastic constants within 6% of DFT values. It performed significantly better than ReaxFF, particularly for the C44 elastic constant. The model accurately predicted energies and properties for structures not encountered during training, showing low uncertainties and excellent agreement with DFT, especially for the bulk Pt equilibrium volume and H2 bond lengths. It generalized well for different Pt surface facets. The SGP model showed higher fidelity to DFT, avoiding non-physical behaviors like the surface sublimation of Pt hydride species observed in ReaxFF. The model's predictions of H2 dissociative adsorption and atomic H diffusion on Pt(111) closely matched DFT results. MD simulations using SGP yielded an apparent activation energy and diffusion prefactor consistent with experimental values. Overall, the SGP model demonstrated superior accuracy and reliability in comparison to ReaxFF and showed promising generalization capabilities for unseen configurations (Fig. 22).
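The equilibrium-volume and bulk-modulus checks can be sketched as a fit to an energy-volume curve near equilibrium; here a harmonic toy curve stands in for the SGP or DFT data (in practice a Birch-Murnaghan equation of state would typically be fitted):

```python
import numpy as np

# Synthetic E(V) data near equilibrium; assumed V0 = 15.0 A^3, B = 1.8 eV/A^3
V0_true, B_true = 15.0, 1.8
V = np.linspace(13.5, 16.5, 11)
E = 0.5 * (B_true / V0_true) * (V - V0_true) ** 2   # harmonic toy curve

c = np.polyfit(V, E, 2)          # quadratic fit: E = c0*V^2 + c1*V + c2
V0 = -c[1] / (2.0 * c[0])        # minimum of the parabola -> equilibrium volume
B = V0 * 2.0 * c[0]              # bulk modulus B = V0 * d2E/dV2 at the minimum
B_GPa = B * 160.2176             # unit conversion: 1 eV/A^3 = 160.2176 GPa
print(round(V0, 2), round(B_GPa, 1))
```

Comparing V0 and B from the ML model against the DFT curve is the kind of held-out test behind the sub-percent and sub-6% agreements quoted above.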

FIG. 22.

Validation of the H/Pt SGP force field. In each plot, shaded regions indicate 99% confidence regions of the SGP model. (a) Energy vs volume of bulk Pt. (b) Energy vs H–H bond distance of a single H2 molecule. (c) Energy vs intermolecular distance of two H2 molecules oriented perpendicular (blue) and parallel (green) to each other. (d) Surface energies of Pt for the (111), (110), (100), and (211) facets. (e) H adsorption energies at different binding sites and Pt facets.45 Figure taken from Ref. 45. Vandermause et al., Nat. Commun. 13, 5183 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


3. CO2RR

Carbon capture, storage, and conversion technologies are seen as viable strategies to tackle excessive CO2 emissions. The electrocatalytic reduction of CO2 into carbon-based fuels and chemicals using renewable electricity is a promising method to mitigate CO2 levels in the atmosphere and address energy concerns. While carbon monoxide (CO) and formic acid (HCOOH) are common products of CO2 reduction, further transformations can yield valuable outputs like methane, methanol, ethylene, and ethanol. However, CO2's chemical inertness leads to high overpotentials and low selectivity for desired products, making the process challenging. High voltages may trigger hydrogen evolution and hinder CO2 reduction efficiency. While carbon-based materials effectively host TM SAs, stability during catalysis remains an issue. Efforts are focused on identifying non-carbon 2D materials with strong metal-support interaction and structural tunability. For instance, ultrathin gallium nitride (GaN) monolayers have been synthesized as promising hosts for TM SAs, showcasing exceptional electronic and optical properties. GaN-based SACs have demonstrated efficacy in various catalytic reactions such as nitrogen reduction and oxygen evolution/reduction, with multifunctionality achieved through precise modulation of the coordination environment of the active sites.69

AIMD simulations based on the SGPR ML potential at 500 K using the Nosé–Hoover thermostat tested the kinetic stability of the proposed structures. The representative examples of (a) Ir-SAC and (b) Os-SAC confirm the dynamic stability of these structures (Fig. 23).

FIG. 23.

Structural stabilities of proposed TM-SACs: representative structure of a GaN monolayer surface with a TM embedded at the Ga single-vacancy (Ga-SV) defect site. Snapshots of the ab initio molecular dynamics simulation at different times; the temperature is controlled at 500 K. The representative Ir-SAC and Os-SAC are dynamically stable (Ga: light green, Os: light pink, N: blue).69 Figure adapted from Ref. 69. Reproduced with permission from Umer et al., J. Mater. Chem. A 10, 24280–24289 (2022). Copyright 2022 Royal Society of Chemistry.


4. NRR

Nitrogen (N2) is abundant but inert, requiring transformation into bio-available forms like ammonia (NH3) and other nitrogen compounds through a process called nitrogen fixation. This process is crucial for sustaining life, impacting agriculture, soil fertility, ecosystem health, and even potential energy sources. NH3, a key product of nitrogen fixation, is vital in fertilizer production, pharmaceuticals, and energy storage due to its high energy density and hydrogen content. As the global population grows, the demand for NH3 escalates, necessitating cost-effective production methods that are environmentally friendly. Current industrial methods, like the Haber–Bosch process, though pivotal for food production, contribute significantly to greenhouse gas emissions. Efforts are underway to explore alternatives to the energy-intensive and environmentally impactful industrial processes. The electrochemical nitrogen reduction reaction (NRR) shows promise as a cleaner, more sustainable method for NH3 synthesis, drawing inspiration from biological nitrogenase enzymes. However, challenges persist in finding efficient catalysts that can enhance the selectivity and yield of NH3 production while competing with hydrogen evolution reactions. Researchers are investigating diverse materials, including transition metals, borides, sulfides, oxides, single-atom catalysts, MBenes, and defect-engineered substrates, to improve NRR efficiency. Various studies aim to elucidate mechanisms, control parameters, and catalyst performance using experimental, theoretical, and computational approaches and to explore innovative solutions to improve electrochemical NRR performance in the coming years.

SACs for the effective NRR, with various transition metals implanted on boron arsenide (BAs), boron phosphide (BP), and boron antimonide (BSb), were investigated using DFT. The kinetic stability of the catalyst with the lowest potential-determining step (PDS), W-BAs, was examined by simulations based on the SGPR machine-learned potential. W-BAs showed high catalytic activity and excellent selectivity, with an insignificant barrier of only 0.05 eV along the distal pathway and a surmountable kinetic barrier of 0.34 eV. W-BSb and Mo-BSb also exhibit high performance, with limiting potentials of −0.19 and −0.34 V, respectively.

To check the dynamic stability of single- and double-vacancy systems, MD simulations with the SGPR-based ML DFT potential/force were performed at 500 K. The simulations indicated the dynamical stability of the (a) Pd@2DCP, (b) Fe@h-BN, and (c) Fe@h-B2N2 structures (Fig. 24),181–183 confirming the stability of mono- and dual-type nonmetal-doped TM-SACs that can be utilized for NRR.

FIG. 24.

Temperature and energy profiles from 6 ps ab initio molecular dynamics simulations at 500 K with a time step of 2 fs. Top and side views of the (a) Pd@2DCP, (b) Fe@h-BN, and (c) Fe@h-B2N2 structures are presented. Color code: Pd, light gray; Fe, golden brown; B, light pink; N, blue; C, gray; O, red; H, cyan.68 Figure adapted from Ref. 68. Reproduced with permission from Umer et al., J. Mater. Chem. A 10, 6679–6689 (2022). Copyright 2022 Royal Society of Chemistry.


5. ClER

The chlor-alkali process stands as one of the fundamental chemical industries globally, with an annual chlorine production surpassing 75 × 10⁶ tons. The production of each ton of chlorine demands approximately 2500 kW·h of electricity, contributing to an overall requirement of around 200 TW·h annually for the chlor-alkali sector alone. This consumption constitutes roughly 1% of the world's total energy production. Given these substantial energy demands, it becomes paramount to explore novel, energy-efficient, and eco-friendly methods for synthesizing Cl2 at a lower cost. Furthermore, achieving selectivity in the oxygen evolution reaction over the chlorine evolution reaction is crucial for industries like pharmaceuticals, cosmetics, and computing. A promising and cost-effective technique to produce pure Cl2 involves the use of a membrane-separated device within the chlor-alkali process. Presently, electrocatalysts based on RuO2 and IrO2 dominate the chlorine evolution reaction (ClER). However, these metal oxides suffer from inadequate catalytic activity, selectivity, and durability due to concurrent oxygen generation during Cl2 production. For highly efficient ClER, a catalyst capable of producing Cl2 with minimal overpotential in the chlor-alkali process is imperative. In this regard, ML potential/force-field simulations were carried out in the search for more efficient electrocatalysts. The optimal configurations of the complex systems were obtained by a simulation/optimization process utilizing the SGPR-based ML DFT potential. The free energy profiles of ClERs for Pt–C4, Pt–N4, and three Pt–C2N2 configurations are shown in Fig. 25. In this context, the Pt–C2N2 moiety-steered electrocatalyst (referred to as Pt-1) exhibits promising ClER features. Comparative studies demonstrate that Pt-1 displays an onset potential of only 5 mV at 80 °C, in line with industrial Cl2 production temperatures.
At a current density of 10 mA/cm², it exhibits a minimal overpotential (η) of only 39 mV, contrasting with ∼100 mV for the best industrial electrocatalyst. These results align closely with the accompanying DFT study. Pt-1 showcases superior selectivity in Cl2 production (up to 100%, compared to 90% for industrial electrocatalysts), exceptionally high mass catalytic activity (∼2.12 A/mgPt against 1.49 × 10⁻⁵ A/mgIrRu at 70 mV overpotential), and long-term durability in acidic electrolytes. Notably, the reported ClER mass activity for Pt-1 in acidic media surpasses that achieved by any other catalyst to date. The efficient Cl2 generation capability of Pt–C2N2 at remarkably low applied voltages in acidic conditions signifies a groundbreaking advancement toward eco-friendly and economical pure Cl2 production, potentially saving substantial energy. The as-prepared Pt–C2N2 catalyst exhibits superior electrochemical performance for chlorine production in acidic electrolytes. These catalysts can be synthesized industrially at low temperatures, demonstrating exceptional activity, durability, and selectivity (100%), outperforming commercial and membrane catalysts commonly adopted in industry. The industrial electrodes (IE) contain substantial amounts of precious metals (0.84 mg/cm²). Their deactivation arises from weak adhesion of the coating to the substrate, leading to the separation of electrocatalyst species from the electrode surface. Additionally, oxygen penetration through cracks during electrolysis causes electrode passivation. This process reduces the electroactive sites due to the oxidation of RuO2 and IrO2 to RuO4 and IrO3 at high potentials. However, the Pt–C2N2 species in our electrode material circumvent these issues, resulting in heightened stability (Fig. 25).

FIG. 25.

DFT free energy profiles of ClERs for SGPR-based ML-DFT optimized models of Pt–C4, Pt–N4, Pt–C2N2, and PtO2 (110). Pt–C2N2 shows 0.05 eV (and −0.02 eV at 80 °C) at U = Ueq(ClER) under acidic conditions. (H: white, C: gray, N: deep blue, O: red, Cl: green, Pt: light blue).185 Figure adapted from Ref. 185. Reproduced with permission from Ha et al., Small 19, 2300240 (2023). Copyright 2023 Authors, licensed under a Creative Commons Attribution (CC BY) license.


For perovskite solar cell systems, a number of theoretical studies associated with ML and AIMD simulations have been reported.189,190 The links connecting the perovskite and layers responsible for carrying electrical charge contain significantly more imperfections compared to those found within the perovskite layer itself—roughly 100 times greater. These imperfections, especially the deep-level ones, noticeably reduce the effectiveness of these devices in converting power.72,73,191–193 Recent efforts to reduce these flaws at the connections have primarily focused on enhancing the surface.194 However, improving the surface of the perovskite that connects with the layer responsible for transporting electrons presents challenges, as the agents used for treating the electron-transporting layer might dissolve during application to the perovskite thin film. Conversely, if a seamless connection between the electron-transporting and perovskite layers could be established, concerns about flaws at the connection might become less significant.

The development of an intermediary layer between a SnO2 layer that transports electrons and a halide perovskite layer that absorbs light involves combining Cl-bonded SnO2 with a Cl containing precursor for the perovskite. This intermediary layer demonstrates coherence at the atomic level, improving the extraction and movement of charges from the perovskite layer while exhibiting fewer imperfections at the connection. This coherent intermediary layer has facilitated the creation of perovskite solar cells achieving an efficiency of 25.8% under standard light exposure. Furthermore, even after continuous exposure to light for 500 h, uncovered devices maintained 90% of their initial efficiency. These findings provide valuable insights into crafting interfaces that minimize imperfections between metal halide perovskites and layers responsible for transporting electrons.

The experimental analysis suggested the ability to form stable FA(MA)SnClx structures between the FAPbI3 perovskite and SnO2 without significant distortion in the lattice. However, when TiO2 was utilized as an electrode, establishing a stable interface proved challenging due to the substantial difference in electronegativity between Ti and Sn [Fig. 26(b)]. As per DFT simulations, the interface between the perovskite and TiO2 did not demonstrate stability in the TiO2/FAPbI3 scenario [Fig. 26(c)]. Conversely, the perovskite and SnO2 interface exhibited the creation of FASnCl3 interlayers at the SnO2/FAPbI3 interface [Fig. 26(d)], resulting in a favorable alignment of energy bands between SnO2 and FAPbI3 [Fig. 26(a)]. The conduction band minimum (CBM) of SnO2 was slightly lower (by 0.21 eV) than that of FAPbI3 due to the presence of short strong hydrogen bonding (SSHB) between interfacial oxygen and the FA molecule, stabilizing the dangling bond of the interfacial oxygen.195 

FIG. 26.

Interlayer formation from Cl-BSO and Cl-CPP. (a) (left) Layered stack of the ETL, perovskite, and HTL with an interlayer between the perovskite absorber and ETL, and (right) the energy diagrams of the ETL, perovskite, and HTL. (1): carrier transport to the contact interfaces via the interlayer. (2): back recombination of the extracted carriers, which remained in the interface region, will be inhibited due to the coherent interlayer. (b) Analysis of the Cl ions contained in the SnO2 and TiO2 electrodes. (c) Initial SGPR-based ML DFT and then refined DFT simulation of the formation of the interlayer between the perovskite and SnO2. (d) and (e) DFT optimized geometries of (d) SnO2/FAPbI3 with stable interfacial FASnCl3 perovskite monolayer and (e) TiO2/FAPbI3 without stable interface. [Pb (black), I (purple), Cl (green), C (brown), N (light blue), H (white), Sn (dark blue), Ti (blue), and O (red)]. Figure adapted from Ref. 191. Reproduced with permission from Min et al., Nature 598, 444 (2021). Copyright 2021 Authors, under exclusive license to Springer Nature Limited.


Section V E presents the development of universal ML IAPs for a wide range of hydrocarbons, encompassing alkanes, alkenes, and aromatic compounds. Leveraging the SGPR algorithm and on-the-fly adaptive sampling, these potentials were generated at the PBE+D3 level of DFT. The ML models were trained on diverse hydrocarbon molecules and their clusters, encompassing intra- and inter-molecular interactions. Notably, the combined potentials accurately capture the intricate potential energy surfaces of various hydrocarbon systems, demonstrating excellent transferability from small molecules to long alkanes, polyenes, and complex ring-chain molecular systems. Furthermore, the ML-IAPs successfully predicted the structures and energies of the β-carotene monomer and dimer. Simulation outcomes for liquid ethylene and toluene crystals showcased the ML IAPs' ability to reproduce molecular volumes and the stability differences between crystalline phases. These findings signify the potential of such ab initio-level force fields to advance toward modeling universal organic, polymeric, and biomolecular systems. Classical force-fields like OPLS,196–200 AMBER,207 GROMACS,202 and CHARMM203 have long served as essential tools for simulating organic, biological, polymeric, and macro-molecular systems. Though these force-fields reproduce some experimental data such as conformations reasonably well, they fall short in describing non-covalent inter-molecular interactions194–199 as well as helicity.210–214 It is particularly difficult for them to describe both structures and properties accurately.215,216 Furthermore, their reliance on descriptors such as bond stretching and angle bending limits their applicability to non-reactive systems, hindering their ability to capture diverse molecular behaviors accurately.

In contrast, ML IAPs offer non-parametric representations, but their efficacy depends on comprehensive descriptors that account for many-body correlations while remaining invariant to system transformations. ML-IAPs rely on "descriptors" derived from atomic positions for their expression. While bonds, bond angles, and torsional angles can serve as useful descriptors, their utility is limited to non-reactive systems. Consequently, more versatile descriptors are needed that capture many-body correlations while remaining unchanged under translations and rotations of the physical system and permutations of identical atoms.128 Ideally, a seamless correspondence should exist between the coordinate space and the descriptor space. If the transformation to descriptors lacks injectivity, it imposes a lower bound on the achievable accuracy of ML-IAPs.217 Thus, the ongoing challenge involves developing more comprehensive descriptors.
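As a toy illustration of these invariances (not the SOAP-type descriptor used in this work), the sketch below builds a descriptor from sorted pairwise distances. It is unchanged under translations, rotations, and permutations of identical atoms, though, unlike more comprehensive descriptors, it is not guaranteed to be injective:

```python
import numpy as np

def pairwise_distance_descriptor(positions):
    """Toy descriptor: sorted list of all pairwise distances.

    Invariant under translation, rotation, and permutation of atoms,
    but not guaranteed injective for all configurations.
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    d = [np.linalg.norm(positions[i] - positions[j])
         for i in range(n) for j in range(i + 1, n)]
    return np.sort(np.asarray(d))

# A random 5-atom configuration and a rotated/translated/permuted copy:
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
t = np.deg2rad(30.0)  # rotation about z by 30 degrees
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
x2 = x[::-1] @ R.T + np.array([1.0, -2.0, 0.5])  # permute, rotate, translate

assert np.allclose(pairwise_distance_descriptor(x),
                   pairwise_distance_descriptor(x2))
```

A descriptor of this kind maps very different Cartesian coordinate lists to the same vector whenever the configurations are physically equivalent, which is exactly the property the text requires.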

Sampling data for ML-IAPs poses a significant challenge, necessitating training data from various domains of the potential energy surface. Most ML-IAPs are therefore generated on-the-fly with MD simulations.43,44,54,91,101 Active or on-the-fly learning requires assessing the model's uncertainty at given input points. Bayesian inference methods predict uncertainty straightforwardly, while NNs emulate it as the variance among predictions from a committee of NNs trained with different architectures or random seeds. However, generating the initial NNs demands substantial effort. While NNs are advantageous when abundant ab initio data are available, the challenge remains in sampling these data across all relevant domains of the potential energy surface. Conversely, kernel-based or Bayesian methods enable on-the-fly model generation. The difficulties in utilizing NNs for on-the-fly training stem from their lack of uncertainty prediction comparable to Bayesian methods and their extensive hyper-parameterization. Re-optimizing all hyper-parameters upon sampling new data becomes cumbersome in high-dimensional spaces and is less conducive to on-the-fly training. Among Bayesian methods, exact GPR offers optimal accuracy but scales computationally as O(n³) with the size n of the training data. In contrast, SGPR scales more favorably as O(n) while retaining the favorable characteristics of GPR. Hence, SGPR emerges as an ideal choice for on-the-fly sampling of optimal datasets in generating ML IAPs.

The focus of this study lies in developing ML IAPs using SGPR for a broad spectrum of hydrocarbons, encompassing alkanes and alkenes with different structures and degrees of unsaturation. Though triple bonds have not yet been included, the work considered not only saturated, olefinic, and aromatic hydrocarbons, including branched alkane, alkene, alkadiene, and alkatriene molecules (having zero to an arbitrary number of C=C double bonds), but also cyclic saturated/unsaturated and aromatic hydrocarbons, and further their combined structures.81,82 This research aims to lay the groundwork for a universal ab initio force-field specific to this subset of hydrocarbons. The goal is to bridge the accuracy gap observed in conventional force-fields and create a more versatile model capable of accurately describing diverse organic and bioorganic systems as well as polymers and macromolecular systems.

1. Machine learning potentials

ML IAPs adopt a holistic approach by encapsulating all atoms within a specified cutoff radius as descriptors. For any configuration of N atoms represented as x, a set of descriptors {ρ_i}_{i=1}^N is created. Each ρ_i depends on the relative coordinates of all atoms in the LCE of atom i within the cutoff radius. The potential energy, E(x), is then computed as
E(x) = Σ_{i=1}^{N} ε(ρ_i),  (147)
where ε is a fictitious per-atom energy representing the interaction of atom i with its neighboring atoms. Kernel-based regression methods define ε as
ε(ρ_i) = Σ_{j=1}^{m} w_j K(ρ_i, χ_j),  (148)
where matrices are written in bold font, z = {χ_j}_{j=1}^m is the set of reference/inducing descriptors, w is the vector of weights for the inducing descriptors, and K is a similarity/covariance kernel. Namely, the potential energy E(x) is a sum over atoms i, involving a similarity/covariance kernel K operating on the descriptors ρ_i and inducing descriptors χ_j, weighted by w_j. These weights are determined to accurately replicate the potential energies and forces of an ab initio dataset X = {x_k}_{k=1}^n. In SGPR,54
w = (σ²k_mm + k_nm^T k_nm)^{-1} k_nm^T Y,  (149)
where k_mm and k_nm are the inter-inducing and data-inducing covariance matrices, respectively, Y collects the data potential energies (and forces), and σ is the noise hyper-parameter. The noise scale σ and other hyper-parameters in the kernel K are optimized to maximize the likelihood of the data energies. For the kernel similarity, a variation of SOAP128 defined in Ref. 54 is employed. The inducing descriptors are selected from existing data or sampled during on-the-fly learning in MD simulations, aiding in uncertainty prediction and facilitating the learning process. With Bayesian methods, the predictive variance serves as a threshold for sampling new data or inducing descriptors. In SGPR, the predictive variance for ε(ρ) is given by
var[ε(ρ)] = K(ρ,ρ) − k_m^T k_mm^{-1} k_m + σ² k_m^T (σ²k_mm + k_nm^T k_nm)^{-1} k_m,  (150)
where k_m = [K(ρ,χ_1), …, K(ρ,χ_m)]^T. For the calculation of w from Eq. (149), the matrix inversion (σ²k_mm + k_nm^T k_nm)^{-1} can be bypassed by converting the equation into a linear system.54,217 However, in Eq. (150), this matrix inversion can be computationally burdensome218 for on-the-fly learning, prompting the use of approximations for computational efficiency despite occasionally slightly sub-optimal sampling outcomes.
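As a schematic numerical illustration of Eqs. (148) and (149), the sketch below fits a sparse kernel model with NumPy, using scalar stand-in descriptors and a Gaussian kernel in place of the SOAP kernel. The variable names and the toy sin target are illustrative assumptions, not the AutoForce implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel(a, b, scale=1.0):
    """Gaussian similarity kernel between 1-D descriptors (stand-in for SOAP)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * scale ** 2))

# Toy setup: n data descriptors with known "energies", m inducing descriptors.
n, m = 200, 10
rho = rng.uniform(-3, 3, n)      # data descriptors {rho_i}
chi = np.linspace(-3, 3, m)      # inducing descriptors z = {chi_j}
Y = np.sin(rho)                  # toy stand-in for ab initio energies

sigma = 0.05                     # noise hyper-parameter
k_mm = kernel(chi, chi)          # inter-inducing covariance
k_nm = kernel(rho, chi)          # data-inducing covariance

# Eq. (149): w = (sigma^2 k_mm + k_nm^T k_nm)^{-1} k_nm^T Y,
# solved as a linear system instead of forming an explicit inverse.
w = np.linalg.solve(sigma ** 2 * k_mm + k_nm.T @ k_nm, k_nm.T @ Y)

# Eq. (148): energy of new descriptors as a weighted kernel sum over chi.
rho_test = rng.uniform(-3, 3, 50)
E_pred = kernel(rho_test, chi) @ w
print(np.max(np.abs(E_pred - np.sin(rho_test))))  # residual of the sparse fit
```

Note that only an m × m system is solved, which is the source of the favorable scaling with the data size n discussed above.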

In this work, all first-principles calculations used the Vienna ab initio simulation package (VASP)107 with the Perdew–Burke–Ernzerhof (PBE) generalized-gradient approximation (GGA) functional109 and the D3 correction of Grimme et al.219 for van der Waals interactions. The Python package AutoForce105 was used for on-the-fly generation of SGPR models.54 For the accuracy of ML IAPs, the coefficient of determination R² = 1 − Σ_i (f_i − f̃_i)² / Σ_i (f_i − f̄)² is reported, where {f_i} are the first-principles (PBE+D3) forces, f̄ is their average, and {f̃_i} are the ML forces.
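The coefficient of determination above is straightforward to compute; a minimal sketch (array names are illustrative):

```python
import numpy as np

def r2_score(f_true, f_pred):
    """R^2 = 1 - sum_i (f_i - f~_i)^2 / sum_i (f_i - f_bar)^2."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    ss_res = np.sum((f_true - f_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((f_true - f_true.mean()) ** 2)  # total variance about f_bar
    return 1.0 - ss_res / ss_tot

# Perfect prediction gives R^2 = 1; predicting the mean gives R^2 = 0.
f = np.array([0.1, -0.3, 0.5, 0.2])
assert r2_score(f, f) == 1.0
assert abs(r2_score(f, np.full_like(f, f.mean()))) < 1e-12
```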

Here, training with a particular molecule means MD simulations with an adaptive sampling algorithm similar to Ref. 54 for on-the-fly ML of the potential energy surface. Accurate variance prediction is central for efficient on-the-fly training. All available conformers of a given molecule are used for training, and each conformer is simulated for 3–6 ps of MD at 300 K. Molecules are queued for training in order of increasing complexity. Owing to the abundance of relevant molecules, the trained molecules are split into several groups, and independent SGPR models are trained in parallel using the AutoForce package. In this study, expert models were considered for linear or branched systems (C1–C8 alkane, C3–C10 isoalkane, C2–C6 alkene, C3–C6 branched alkene, C3–C8 alkadiene, C7–C9 alkatriene), cyclic systems (C3–C10 cycloalkane, C3–C8 cycloalkene, C4–C10 methyl-/dimethyl-cycloalkane, bicycloalkane), and aromatic systems (benzene, toluene, xylene, trimethylbenzene, mesitylene) to build the universal model. To account for intermolecular interactions, expert models for liquid phases of alkanes (methane, ethane, propane, butane), an alkene (ethylene), and an aromatic (benzene) are also trained, denoted alkane-inter, alkene-inter, and aromatic-inter, respectively. The alkane model is further extended by simulations of small clusters and liquid phases of methane, ethane, propane, and butane to learn inter-molecular interactions; this extended model is the one referred to below as "alkane." Later, independent MD simulations are repeated to sample independent testing sets for all conformers of the above molecules.
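The on-the-fly adaptive-sampling idea can be caricatured with a one-dimensional toy model: propagate a trajectory (here, a random walk standing in for MD), evaluate the Bayesian predictive variance at each new point, and call the expensive reference (here, a cheap analytic stand-in for a DFT call) only when the variance exceeds a threshold. This is a hedged sketch of the concept, not the actual algorithm of Ref. 54:

```python
import numpy as np

rng = np.random.default_rng(2)

def kernel(a, b, scale=0.5):
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2 * scale ** 2))

def reference(x):
    """Cheap stand-in for an ab initio single-point calculation."""
    return np.sin(3 * x)

sigma = 1e-2
X, Y = [0.0], [reference(0.0)]  # training set seeded with one point

def predict(x):
    """Exact-GP mean and predictive variance at scalar x, given current data."""
    K = kernel(np.array(X), np.array(X)) + sigma ** 2 * np.eye(len(X))
    k = kernel(np.array([x]), np.array(X))[0]
    mean = k @ np.linalg.solve(K, np.array(Y))
    var = 1.0 - k @ np.linalg.solve(K, k)   # prior variance k(x,x) = 1
    return mean, max(var, 0.0)

n_calls, x = 0, 0.0
for step in range(300):            # toy "MD" trajectory: a random walk
    x += rng.normal(scale=0.05)
    mean, var = predict(x)
    if var > 1e-3:                 # uncertain -> call the reference, retrain
        X.append(x)
        Y.append(reference(x))
        n_calls += 1

print(f"reference calls: {n_calls} / 300 steps")
```

Because the variance collapses near previously sampled configurations, only a fraction of the trajectory triggers reference calls, which is the mechanism by which on-the-fly learning reduces the number of ab initio calculations.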

2. Macromolecular systems

a. Expert models

The expert SGPR model for each category of molecules is generated on-the-fly by short MD simulations (3–6 ps) of the associated molecules at 300 K. As an example, training for the elementary alkane molecules is shown in Fig. 27. The model trained after each MD simulation is used as the initial model for the next. In this way, the forces for larger molecules are inferred from the data for smaller ones more frequently, and fewer ab initio calculations are performed during on-the-fly training. Expert models are also trained for cyclic hydrocarbons (cycloalkane, cycloalkene, and bicycloalkane) and aromatic systems and are then combined with the previously generated models for alkane, isoalkane, alkene, and alkadiene. Each generated expert model is validated against DFT calculation results. Similar MD simulations are repeated without on-the-fly training to generate independent testing sets. The performance of the expert models on their testing sets is shown in Table III in terms of mean absolute error (MAE) and R². These SGPR models closely approximate the ab initio PBE+D3 potential energy surface in their respective domains.

FIG. 27.

Initial steps for on-the-fly ML MD for elementary alkane molecules. Red bullets indicate the ab initio potential energy of the configurations sampled by the SGPR model. Green bullets are for single-point ab initio tests.81 Figure taken from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

TABLE III.

Testing the expert models in their domains. Ntest is the number of test samples. Energy* is the potential energy per-atom.81 Adapted from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

Group  Ntest  Energy* MAE (meV)  Force MAE (eV/Å)  Force R² (testing/training)
Alkane  448  4.9  0.076  0.978/0.993 
Isoalkane  308  3.0  0.080  0.977/0.989 
Alkene-1  142  8.1  0.110  0.966/0.985 
Alkene-2  88  5.3  0.110  0.955/0.98 
Alkene-3  238  4.0  0.087  0.975/0.991 
Alkadiene  374  4.5  0.110  0.959/0.978 
Alkatriene  238  3.2  0.108  0.992/0.966 
Cycloalkane  253  2.3  0.074  0.983/0.990 
Cycloalkene  77  3.4  0.110  0.963/0.984 
Methyl/dimethyl-cycloalkane  946  5.2  0.092  0.970/0.989 
Bicycloalkane  594  2.1  0.075  0.980/0.987 
Aromatic  83  2.1  0.074  0.997/0.983 
b. Long alkane molecules

The alkane SGPR model trained using only short molecules fails to reproduce the potential energy surface for longer molecules when they fold and form a coil. When a long molecule folds, different parts of the same molecule interact with each other, which cannot be learned from short molecules alone. Instead, the alkane model extended for inter-molecular interactions using small alkane molecules is applicable. Here, this model was used for MD simulations (without on-the-fly training) of long n-alkane molecules for 100 ps at 300 K. Snapshots of C64H130 are shown in Fig. 28. A testing set was generated by collecting snapshots every 1 ps. The performance of the alkane model is summarized in Table IV, which shows a high degree of transferability. The table also reconfirms that the computational cost scales roughly as N³ for DFT but linearly in N for the ML model.

FIG. 28.

Folding of C64H130 at 300 K. Snapshots are shown every 2 ps.81 Figure taken from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

TABLE IV.

Testing alkane SGPR model on long alkane molecules. T indicates the elapsed time (seconds) for a single-point energy and force calculation on a node with 20 CPUs. Energy* is the potential energy per-atom.81 Adapted from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

Molecule  T_ML (s)  T_DFT (s)  Energy* MAE (meV)  Force MAE (eV/Å)  R² (testing)
C8H18  0.44  82.07  3.3  0.070  0.985 
C16H34  0.65  282.36  4.7  0.085  0.977 
C32H66  1.08  1836.88  4.2  0.091  0.973 
C64H130  1.92  12310.9  4.8  0.095  0.970 
c. Universal model

The SGPR models for alkane, isoalkane, alkene-(1, 2, 3), and alkadiene are combined into a single universal model for linear/branched hydrocarbons with an arbitrary number of double bonds. The model was rebuilt using all of the ab initio data and inducing descriptors sampled during training of the expert models.63 As shown in Table V, the universal model is more accurate than the expert models even in their own domains. The correlation of the predicted energies with the exact ab initio energies for all testing sets is shown in Fig. 29.
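Schematically, building the universal model amounts to pooling the expert models' ab initio data and inducing descriptors and re-solving Eq. (149) once over the combined set. The NumPy sketch below illustrates this with scalar toy descriptors and a Gaussian kernel; all names and the sin target are illustrative assumptions, not the actual rebuild procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def kernel(a, b, scale=1.0):
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2 * scale ** 2))

def fit_weights(rho, Y, chi, sigma=0.05):
    """Solve the Eq. (149)-style linear system for one model's weights."""
    k_mm, k_nm = kernel(chi, chi), kernel(rho, chi)
    return np.linalg.solve(sigma ** 2 * k_mm + k_nm.T @ k_nm, k_nm.T @ Y)

# Two "expert" datasets covering different descriptor domains:
rho_a, rho_b = rng.uniform(-3, 0, 150), rng.uniform(0, 3, 150)
chi_a, chi_b = np.linspace(-3, 0, 6), np.linspace(0.5, 3, 6)
f = np.sin  # toy stand-in for the ab initio target

# Universal model: pool all data and all inducing descriptors, refit once.
rho_u = np.concatenate([rho_a, rho_b])
chi_u = np.concatenate([chi_a, chi_b])
w_u = fit_weights(rho_u, f(rho_u), chi_u)

x = np.linspace(-3, 3, 101)
err_u = np.max(np.abs(kernel(x, chi_u) @ w_u - f(x)))
w_a = fit_weights(rho_a, f(rho_a), chi_a)
err_a = np.max(np.abs(kernel(x, chi_a) @ w_a - f(x)))
print(err_u, err_a)  # pooled model covers the full domain; expert A does not
```

The pooled model inherits every expert's coverage, mirroring the observation in Table V that the universal model can match or exceed the experts even within their own domains.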

TABLE V.

Testing the universal model in all domains. The testing sets are identical to those in Table III.81 Adapted from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

Group  Energy* MAE (meV)  Force MAE (eV/Å)  Force R²
Alkane  3.4  0.055  0.989 
Isoalkane  3.3  0.060  0.986 
Alkene-1  5.6  0.075  0.98 
Alkene-2  5.1  0.079  0.973 
Alkene-3  4.1  0.071  0.983 
Alkadiene  4.4  0.081  0.977 
FIG. 29.

Predicted energies of the universal model for all testing sets.81 Figure taken from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.


As a case study, several linear and branched isomers of C11H18 were considered. These isomers contain three C=C double bonds (see Fig. 30), whereas the training molecules contained at most two C=C double bonds (alkadiene). The structures were relaxed using the universal ML-IAP, PBE+D3, B3LYP+D3, and the OPLS-AA force field by minimizing the forces below 0.01 eV/Å. B3LYP+D3 is included because it is a popular functional for molecular systems, and it is instructive to check how the error of the ML-IAP compares with the disagreement between two widely used density functionals. The energies of the optimized structures, depending on the method used for relaxation, are shown in Fig. 31. For structures optimized with PBE+D3, single-point ML-IAP and OPLS-AA energies are compared. The energies of structures optimized with ML-IAP, PBE+D3, and B3LYP+D3 agree reasonably well, while OPLS-AA fails to reproduce the trend. This is not surprising: OPLS-AA is an empirical potential designed to statistically reproduce certain experimental observables rather than the energetics of individual relaxed structures.
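The relaxation criterion used here (forces below 0.01 eV/Å) can be illustrated with a minimal steepest-descent sketch on a one-dimensional toy Morse potential; this is our illustration, not the actual ML-IAP workflow, and the Morse parameters D, a, and r0 are hypothetical.

```python
import math

# Toy sketch: steepest-descent relaxation of a 1-D Morse "dimer" until the
# force drops below the 0.01 eV/A threshold quoted for the C11H18 isomers.
# D, a, r0 are hypothetical parameters, not fitted to any real molecule.
D, a, r0 = 4.5, 1.9, 1.1           # well depth (eV), width (1/A), minimum (A)

def force(r):                      # F = -dE/dr for E = D*(1 - exp(-a*(r - r0)))**2
    e = math.exp(-a * (r - r0))
    return -2.0 * D * a * e * (1.0 - e)

r, step = 1.4, 0.01                # start stretched; small fixed step
while abs(force(r)) > 0.01:        # same convergence criterion as in the text
    r += step * force(r)           # move along the force (downhill in energy)
```

With a step small compared to the inverse curvature of the well, the geometry converges monotonically to the minimum; production codes use the same stopping rule with better optimizers (e.g., BFGS or FIRE).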

FIG. 30.

Isomer structures of C11H18 for testing the universal model. 15 different isomers are prepared for C11H18.81 Figure taken from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

FIG. 31.

Relative energies ΔE = Ei − E1 (i = 1–15, see Fig. 30) of C11H18 isomers relaxed with PBE+D3, ML-IAP, B3LYP+D3, and OPLS-AA. (sp) denotes single-point energies for configurations relaxed by PBE+D3.81 Figure taken from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

d. Rotation of CH3 group in ethane

The accuracy of the universal ML-IAPs generated by combining expert models is tested first. As the simplest case, a nudged elastic band (NEB) calculation is conducted to predict the rotation barrier of the CH3 group in ethane. The barrier to rotate the CH3 group by 120° is calculated using the universal ML-IAP and compared with the PBE+D3 result. The barrier is predicted to be 0.12 eV, in close agreement with the PBE+D3 value of 0.11 eV.
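The 120° periodicity of this rotation can be illustrated with a simple three-fold cosine torsional profile; the functional form and the scan below are our toy construction (not the ML-IAP or the NEB path itself), with the barrier height set to the quoted 0.12 eV.

```python
import math

# Toy three-fold torsional profile for ethane's CH3 rotation:
# V(phi) = (V3/2) * (1 + cos(3*phi)), an assumed model form,
# with the barrier V3 set to the ML-IAP/NEB value from the text.
V3 = 0.12  # eV

def torsion_energy(phi_deg):
    return 0.5 * V3 * (1.0 + math.cos(math.radians(3.0 * phi_deg)))

# Scan one 120-degree period: staggered (60) -> eclipsed (120) -> staggered (180)
energies = [torsion_energy(phi) for phi in range(60, 181)]
barrier = max(energies) - min(energies)
```

The maximum of the scan sits at the eclipsed conformation (120°), so the recovered barrier equals V3, mirroring how the NEB images trace the minimum-energy rotation path.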

e. Dimerization of β-carotene

The universal ML-IAPs are used to predict the structures of molecular systems containing various hydrocarbon groups. Here, the structures of the β-carotene monomer and dimer are investigated using the universal potential. β-carotene is an organic molecule composed of eight isoprene units and two trimethyl-cyclohexene rings. The expert models covering C–C single bonds, C=C double bonds, and cyclic hydrocarbons are used to construct the universal model. The relative energies of 20 different conformers of the β-carotene monomer agree with the PBE+D3 results within an RMSE of 0.12 eV.

f. Liquid ethane and ethylene
The alkane model is used for simulations of liquid ethane at the experimental boiling temperature. In all, 216 molecules are used for MD simulations in the NPT ensemble at a constant pressure of 1 bar. The simulation comprised 100 ps of equilibration followed by 200 ps of averaging. The heat of vaporization is calculated as
$$\Delta H = H_g - H_l = (u_g + p\,v_g) - (u_l + p\,v_l), \tag{151}$$
where $u_g/v_g$ and $u_l/v_l$ are the internal energy/volume per molecule in the gas and liquid phases, respectively, p is the pressure, and $H_g$ is calculated within the ideal-gas approximation ($p v_g = RT$). The results are compared with the experimental data in Table VI. Although the ML-IAP yields $v_l$ in good agreement with experiment, it slightly overestimates ΔH. However, unlike empirical force fields, this ML-IAP is truly a first-principles potential whose accuracy is limited by the PBE+D3 level of DFT. Note that in this case a direct test of DFT, if not impossible, would be extremely difficult; surrogate ML-IAP models can thus be used to test expensive, lower-level first-principles potentials against experimental observations.
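Eq. (151) with the ideal-gas closure can be evaluated numerically; the inputs below are illustrative placeholders chosen near the liquid-ethane conditions of Table VI (in particular, the internal-energy difference du is not an actual simulation average).

```python
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)
KCAL_PER_L_ATM = 0.024217  # conversion for the small p*v_l term

def heat_of_vaporization(du, v_l, p, T):
    """Eq. (151) with p*v_g = RT:  dH = (u_g - u_l) + R*T - p*v_l.

    du  : u_g - u_l in kcal/mol (illustrative placeholder below)
    v_l : liquid molar volume in L/mol
    p   : pressure in atm;  T : temperature in K
    """
    return du + R_KCAL * T - p * v_l * KCAL_PER_L_ATM

# Placeholders near Table VI: 90.3 A^3/molecule ~ 0.0544 L/mol at 184.52 K, 1 atm
dH = heat_of_vaporization(du=3.64, v_l=0.0544, p=1.0, T=184.52)
```

The RT term contributes about 0.37 kcal/mol at the boiling point, while the p·v_l term is negligible, which is why the liquid-phase volume barely affects ΔH.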
TABLE VI.

Simulation of liquid ethane at 184.52 K and 1 bar. vl is the volume per molecule in the liquid phase. ΔH is the heat of vaporization. Adapted from Ref. 81. Reproduced with permission from Hajibabaei et al., J. Phys. Chem. A 125, 9414 (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC BY) license.

      Units       ML (alkane)  Experimental
vl    (Å3)        90.3(4)      91.5
ΔH    (kcal/mol)  4.01(2)      3.52

Also, the molecular volume of liquid ethylene is predicted using the ML-IAP. NPT MD simulations of 34 ethylene molecules are conducted for 30 ps at a constant pressure of 1 bar and temperature of 169.5 K. The volume per molecule vl is predicted to be 76 ± 6 Å3, in good agreement with the experimental value220 of 81.5 Å3. The residual discrepancy from the experimental value likely arises from the inherent error of the DFT-D3 description.

g. Toluene crystal

The prediction of crystal structures is accelerated by using the ML potential trained with DFT potential energies/forces. Here, the universal model predicts the stable toluene crystal structure (Fig. 32). The α-phase is predicted to be the most stable crystal phase, while the β-phase is a metastable state.221

FIG. 32.

Crystal structures of (a) α-toluene and (b) β-toluene.82 Figure taken from Ref. 82. Reproduced with permission from Hajibabaei et al., ACS Phys. Chem. Au 2, 260–264 (2022). Copyright 2022 Authors, licensed under a Creative Commons Attribution (CC BY) license.


3. Discussion

This study outlines the generation of a universal ML-IAP designed to predict molecular and crystal structures of hydrocarbons. The model encapsulates various hydrocarbon types, including alkanes, alkenes with varying numbers of C=C double bonds, aromatic compounds, and open-chain and cyclic compounds. Using the SGPR algorithm and on-the-fly learning, the model emerges from fewer than 500 atomic configurations of elementary molecules, along with clusters and liquid phases of small molecules to capture intermolecular interactions.

Expert models tailored to distinct hydrocarbon groups are initially developed through on-the-fly ML and then consolidated into the universal ML-IAPs. These universal models, originating from the expert models trained for 90 molecules (350 conformers) and five bulk phases, exhibit exceptional transferability across varied hydrocarbon systems. They effectively handle complex scenarios like the folding and coiling of long alkane molecules, showcasing reliability even under such structural changes. Simulations of liquid ethane at its boiling temperature yield results reasonably aligned with experimental observations.

The model's prowess extends to predicting the minimum-energy structure of C11H18, displaying impressive agreement with direct ab initio simulations while outperforming traditional force fields like OPLS-AA. Although OPLS-AA is empirically successful in replicating certain experimental observations, ML-IAPs, being surrogates for expensive first-principles potentials, provide more adaptable avenues for studying reaction paths involving bond breaking/forming/rearrangement and atom transfer, transition states, and phase transitions. The universal ML-IAPs accurately predict the structure and interaction energies of complex compounds like the β-carotene dimer and effectively describe molecular volume and phase transitions, as exemplified in the investigation of the toluene crystal's phases.

The significance of this work lies in efficiently generating a universal model from separately trained atomic potentials for various subgroups, enabling its broad applicability across multi-component systems, a feat unattainable for most other ML methods. It showcases one of the most extensive explorations of hydrocarbon phase space, employing fewer than 1000 configurations as the basis for the SGPR potential. Employing a Bayesian criterion during on-the-fly MD simulations allows automatic discarding of configurations with redundant information.

This emphasizes that data quality, in terms of diversity and completeness, impacts the accuracy and transferability of ML potentials more than data quantity. Moreover, the study explores more comprehensive descriptor schemes for improved accuracy, introduces concepts like "recursively embedded atom NNs" for incorporating higher-order correlations, and highlights the importance of balancing global exploration and local exploitation in data sampling for developing accurate ML-IAPs.

The SGPR algorithm emerges as a robust and rapid tool for on-the-fly ML of potential energy surfaces, offering excellent transferability despite being based on a relatively small number of configurations. It introduces a modular approach through expert models, allowing the combination of diverse models into a singular global model. The study underscores the significance of building efficient yet globally diverse datasets, a crucial step in the automation and optimization of ML-IAPs.

ML has gained recognition for its precision in modeling quantum potential surfaces and forces. While NNs are frequently employed for this purpose, they require frequent updates from numerous trajectories. GPR provides an alternative. However, traditional GPR models still necessitate large datasets for training due to extensive covariance matrix elements. This complicates continuous learning, particularly during phase transitions and chemical reactions. SGPR addresses this issue by significantly reducing the number of parameters, which allows for smaller training datasets. SGPR's ability to perform rank reduction improves the computational performance by several orders of magnitude (see Table VII). Moreover, by utilizing path-independent feature parameters through active learning instead of traditional atomic coordinates, the SGPR-ML based algorithm facilitates the creation of universal quantum mechanical potentials for complex systems, establishing a foundational platform for virtual material modeling and broadening applicability across various quantum mechanical systems. Path-independent parameters can be compiled into a built-in library, enabling their reuse in related systems without the need for re-training. The transferability of SGPR's covariance matrix means it remains effective without updates. This facilitates the creation of a comprehensive covariance matrix library through cumulative learning, enhancing the efficiency and speed of simulations for complex material systems. This makes SGPR versatile for the simulations of a range of small to complex quantum mechanical systems at the levels of advanced theories like coupled cluster theory with perturbative triple excitations [CCSD(T)],222 matrix product states (MPS),223,224 and quantum Monte Carlo (QMC) simulations225,226 alongside DFT.
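The cost contrast between full GPR and SGPR can be sketched with a minimal projected-process (inducing-point) approximation in NumPy. This is our illustrative reconstruction of the generic sparse-GP predictive mean, not the authors' code; the RBF kernel, hyperparameters, and 1-D toy data are arbitrary choices.

```python
import numpy as np

# Sketch of the SGPR idea: full GPR inverts an n x n covariance matrix,
# while the projected-process approximation only solves m x m systems
# built from m << n inducing points (hence the ~10^3 speedups cited).
rng = np.random.default_rng(0)

def rbf(A, B, ell=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

n, m, sigma2 = 200, 20, 1e-4
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 1e-2 * rng.standard_normal(n)
Z = np.linspace(-3, 3, m)[:, None]        # inducing inputs (cf. inducing LCEs)

Kmm = rbf(Z, Z) + 1e-8 * np.eye(m)        # m x m
Kmn = rbf(Z, X)                           # m x n
# SGPR predictive weights: an m x m solve instead of an n x n inverse
alpha = np.linalg.solve(sigma2 * Kmm + Kmn @ Kmn.T, Kmn @ y)

Xs = np.array([[0.7]])
mu_sgpr = (rbf(Xs, Z) @ alpha)[0]

# Full GPR reference (n x n solve) for comparison
Knn = rbf(X, X) + sigma2 * np.eye(n)
mu_gpr = (rbf(Xs, X) @ np.linalg.solve(Knn, y))[0]
```

With well-placed inducing points, the sparse mean tracks the full GPR mean closely while the dominant cost drops from O(n³) to O(nm²), the same scaling contrast summarized in Table VII.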

TABLE VII.

Computational cost of normal GPR vs SGPR. n is the number of snapshots in the data, N is the number of atoms in each snapshot, nc is the average number of atoms inside a sphere with cutoff radius rc, and m is the number of inducing LCEs sampled by SGPR. In the last column, an exemplary ratio in computational cost for GPR over SGPR is estimated for the case where n = N = 100, nc = 33, and m = 1000, similar to the cases that we have often encountered.

Cost                 GPR             SGPR               GPR/SGPR
Matrix inversion     (1 + 3N)³n³     (1 + 3N)nm²        ∼10³
Kernel construction  (1 + 3nc)²N²n²  (1 + 3nc)Nnm + m²  ∼10³

SGPR's rank reduction, empowered by selecting LCE features via similarity measures in active learning, replaces atomic coordinates with characteristic geometric features. This approach achieves chemical accuracy across various spatial configurations of quantum mechanical potentials. By aggregating covariance elements from subsystems and analyzing their interrelationships, SGPR can generate universal quantum mechanical potentials, enabling integrated first-principles calculations and ML simulations on a comprehensive platform for virtual materials reality, a capability generally unachievable with traditional ML methods. Meanwhile, the covariance matrices created through SGPR's rank reduction are transferable: they do not require updates even during events such as phase transitions, molecular transformations, or chemical bond formation/breaking. This allows systems to be combined, with inter-system covariance matrices integrated gradually into a comprehensive library. The library then facilitates rapid and precise simulations of complex systems by simply retrieving the relevant datasets, effectively realizing their structural and phenomenological characteristics. By merging SGPR with active learning, meta-dynamics, and quantum mechanical calculations, it would be possible to develop a self-improving AI and digital twin capable of innovating new materials and devices. SGPR thus reduces computational complexity and bias while ensuring quicker, more accurate simulations for sizable and intricate systems.
Its unique attributes, such as linear scalability and adaptability to varied systems, represent a substantial leap over conventional ML methodologies, encouraging creative and effective approaches in materials science simulations and design. The strengths of SGPR include (i) fast training due to reduced parameter needs; (ii) high-speed computation suitable for large and complex systems (linear scaling); (iii) chemical accuracy and the ability to produce universal potentials; and (iv) scalability through cumulative learning, forming a base for a self-learning AI or Digital Twin.

This approach has been demonstrated in the simulation of superionic materials for fast-charging batteries and electro-catalytic reactions, showing significant improvements over traditional methods. SGPR's low-rank approximation of the covariance matrix boosts computational performance dramatically, which we have quantified in comparative studies. Additionally, SGPR integrates seamlessly with on-the-fly ML methods, eliminating the need for constant retraining of ML models during simulations. This makes it particularly suited for extensive materials studies involving solid electrolytes, catalysts, solar cells, and even complex biological molecules, where it enables rapid simulation and exploration of potential materials without the computational overhead typically associated with such tasks. Thus, SGPR represents a transformative step forward in the use of ML for quantum mechanical simulations, providing a robust platform for the development of new materials and the detailed study of existing ones under various conditions, all while reducing the computational expense and increasing the scope of potential applications.

In this work, we reviewed the applications of SGPR to extensive materials classes involving bulk materials, solid electrolytes, batteries, catalysts, solar cells, and macromolecular systems. Leveraging this tool makes it possible to bypass traditional computational constraints, enabling extensive exploration of material compositions. This approach is expected to contribute significantly to the development of safer and more efficient solid electrolytes for future energy storage applications. Here, as an exemplary application, we outline a technical strategy and the expected outcome of SGPR as a comprehensive virtual lab capable of simulating and predicting properties of electrolytes across various systems, significantly advancing our capability to design efficient, stable, and safe solid electrolytes for future energy storage solutions.

This innovative approach pushes the boundaries of materials science by integrating advanced ML techniques with traditional chemical physics, a combination crucial for designing next-generation functional materials.

  1. SGPR-based ab initio molecular dynamics (AIMD): AIMD excels in simulating complicated systems. However, AIMD's computational demands scale cubically with the number of atoms, making it feasible only for smaller systems and shorter durations. SGPR-enhanced AIMD can handle larger atom counts over extended periods, demonstrating its potential for larger, more complex simulations.

  2. Integration of quantum mechanics and ML: This SGPR-based approach ushers in a new era for modeling with significant applications such as:

    • Learning from existing data to develop strategies, similar to how AlphaGo learned to master Go toward AlphaGo-0.

    • Generating substantial data on unknown material properties using quantum mechanical computations.

    • Creating a universal quantum mechanics potential that combines quantum mechanics and ML, enabling rapid computations.

    • Exploring structural space and phase transitions through meta-dynamics.

    • Developing self-learning systems that evolve into Super AI and Digital Twins, capable of innovating new materials and devices.

  3. Challenges with conventional ML methods: Traditional GPR methods require inverting large covariance matrices, which is impractical for systems with thousands of trajectories due to computational constraints. SGPR addresses this by reducing feature dimensions, which simplifies the learning process and enhances the speed and scalability of simulations.

  4. Key features of SGPR:

    • Fast training and superfast computing capabilities suitable for extensive systems due to linear scaling.

    • Ability to generate transferable universal potentials that facilitate rapid, large-scale quantum calculations.

    • Minimal data requirements for accurate potential generation, leveraging the cumulative learning of key features.

To conclude, by merging computational power with innovative ML strategies, this research stands to significantly impact the field of energy storage, contributing to safer and more reliable battery technologies and other novel functional materials including catalysts and solar cells that are essential for a range of modern applications.

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT: Ministry of Science and ICT) (No. RS-2024-00346451) and the Program for Cultivating the Next Generation of Outstanding Scientists (No. 1.240033.01) by UNIST Basic Science Institute.

The authors have no conflicts to declare.

Soohaeng Yoo Willow, Amir Hajibabaei, Miran Ha, David ChangMo Yang, Chang Woo Myung, Seung Kyu Min, and Geunsik Lee contributed equally to this work.

Soohaeng Yoo Willow: Writing – original draft. Amir Hajibabaei: Writing – original draft. Miran Ha: Writing – original draft. David ChangMo Yang: Writing – review & editing. Chang Woo Myung: Writing – review & editing. Seung Kyu Min: Writing – review & editing. Geunsik Lee: Writing – review & editing. Kwang S. Kim: Conceptualization; Funding acquisition; Project administration; Writing – original draft; Writing – review & editing.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

1. Mathematical expressions
a. Matrix algebra

Matrix inversion lemma:

The matrix inversion lemma shows
$$(Z + UWV^{\top})^{-1} = Z^{-1} - Z^{-1}U\,(W^{-1} + V^{\top}Z^{-1}U)^{-1}V^{\top}Z^{-1}. \tag{A1}$$
Therefore,
$$(A + UV^{\top})^{-1} = A^{-1} - A^{-1}U\,(I + V^{\top}A^{-1}U)^{-1}V^{\top}A^{-1},$$
or
$$(A - UV^{\top})^{-1} = A^{-1} + A^{-1}U\,(I - V^{\top}A^{-1}U)^{-1}V^{\top}A^{-1}.$$
Consequently, the Woodbury, Sherman, and Morrison formula shows
$$(A + uv^{\top})^{-1} = A^{-1} - \frac{A^{-1}u\,v^{\top}A^{-1}}{1 + v^{\top}A^{-1}u}. \tag{A2}$$
A useful expression is
$$(I + AB)^{-1}A = A\,(I + BA)^{-1}. \tag{A3}$$
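As a quick numerical sanity check (ours, not from the reference), the rank-one Sherman–Morrison form of (A2) can be verified directly against a brute-force inverse:

```python
import numpy as np

# Verify (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d)) + d * np.eye(d)  # shifted diagonal -> well conditioned
u = rng.standard_normal((d, 1))
v = rng.standard_normal((d, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + u @ v.T)                              # brute force
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / (1.0 + float(v.T @ Ainv @ u))
```

The two sides agree to machine precision whenever 1 + vᵀA⁻¹u ≠ 0, which is the condition for A + uvᵀ to remain invertible.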
Matrix determinant lemma:

For the dyadic product $uv^{\top}$ of a column vector $u$ and a row vector $v^{\top}$, the matrix determinant lemma satisfies
$$\det(A + uv^{\top}) = (1 + v^{\top}A^{-1}u)\,\det(A). \tag{A4}$$

Therefore,