Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make molecular simulations with the accuracy of quantum mechanical density functional theory possible at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range interactions and fail to describe properties that derive from the Coulombic tail of the forces. To overcome this limitation, we extend the DP model by approximating the long-range electrostatic interaction between ions (nuclei + core electrons) and valence electrons with that of distributions of spherical Gaussian charges located at ionic and electronic sites. The latter are rigorously defined in terms of the centers of the maximally localized Wannier distributions, whose dependence on the local atomic environment is modeled accurately by a deep neural network. In the DP long-range (DPLR) model, the electrostatic energy of the Gaussian charge system is added to short-range interactions that are represented as in the standard DP model. The resulting potential energy surface is smooth and possesses analytical forces and virial. Missing effects in the standard DP scheme are recovered, improving on accuracy and predictive power. By including long-range electrostatics, DPLR correctly extrapolates to large systems the potential energy surface learned from quantum mechanical calculations on smaller systems. We illustrate the approach with three examples: the potential energy profile of the water dimer, the free energy of interaction of a water molecule with a liquid water slab, and the phonon dispersion curves of the NaCl crystal.
I. INTRODUCTION
Modeling materials in the electronic ground state requires an accurate description of the Born–Oppenheimer potential energy surface (PES). First-principles models based on density functional theory (DFT) have good predictive power, but high computational complexity. Empirical force fields are more efficient but often less accurate than DFT and are typically not applicable when chemical bonds change within a system of interest. PES models based on machine learning (ML) offer a promising route to overcome the dilemma of accuracy vs efficiency.1–16 These models combine force field efficiency with first-principles accuracy and have been successful in a number of applications that could not be tackled with direct first-principles studies. Here, we focus on one such method, the deep potential (DP) scheme,13,14 which, upon training with DFT data, can achieve accuracy comparable to that of ab initio molecular dynamics (AIMD). On high-performance computer platforms, deep potential molecular dynamics (DPMD) simulations can model millions of atoms on timescales of tens of nanoseconds or even longer.17,18 Successful applications to date include studies of the signatures of a liquid–liquid phase transition in supercooled water,19 nucleation of crystalline Si from the melt,20 phase diagrams of Ga21 and water,22 structural motifs in the growth of quasicrystals,23 the reactive uptake of N2O5 in interfacial processes,24 reactive processes in combustion,25 and the dissociative chemisorption of water at the water–titania interface26 and of the 1D cooperative diffusion in a simple cubic crystal.27
Most ML models, including DP, assume that the PES can be represented by a sum of atomic terms describing the many-body interactions among the atoms within some fixed cutoff distance. This construction is computationally advantageous as its complexity scales linearly with system size. On physical grounds, however, it entails a severe approximation because electrons and nuclei are charged particles that experience long-range Coulomb forces. Despite that, local effective interaction models work well in many cases. In metals, this can be attributed to complete electronic screening. In insulators, screening is incomplete, yet the interatomic interactions are often represented satisfactorily by short-range ML models. These models are trained on DFT data, which include long-range electrostatics, suggesting that some features of the long-range interactions may be mimicked by short-range models. While this treatment is sufficient in many cases, particularly when dealing with systems that are, on average, homogeneous and neutral at the molecular scale, failures of the short-range models in cluster, interfacial, and vapor phase properties are well known.28,29 A quintessential example of a property that cannot be simulated by short-range models is the longitudinal-transverse splitting of the long-wavelength optical phonon modes in polar crystals. Long-range electrostatics should be even more important in heterogeneous systems, where charge separation at large scale may be induced by external fields or by chemical potentials. Examples include electrolyte solutions in contact with electrodes,30 dipolar surfaces,31,32 protein folding problems,33 and the binding affinity of ligands to proteins.34
A possible way of incorporating long-range electrostatics within ML models is to assume that the PES has two contributions. One accounts for short-range interactions and is constructed as in standard ML models. The other accounts for long-range interactions and is approximated by the electrostatic energy of a system of partial point charges located at the atom sites, with the added constraint of global charge conservation. The effective charges either can be defined empirically, as commonly done in the context of force field models,35 or can be found by machine learning,11,16,36–38 in which case the partial charges are matched to reference charges extracted from DFT calculations by partitioning the electronic charge density among the atoms.36–38 A difficulty with this approach is that an atom in a molecule, or a material, does not have a well-defined charge because the charge densities of neighboring atoms overlap. As a consequence, different partitioning schemes may lead to different results.39,40 To reduce this ambiguity, charge partitioning schemes have been combined with the concepts of electronegativity and hardness,41 which, however, also lack a rigorous definition. Interestingly, the accuracy and transferability of molecular ML models with effective atomic charges were found to improve significantly when the molecular dipole moment was added to energy and forces in the training data.41–43 This condition is implemented in PhysNet,16 a model in which the partial charges are trained to best reproduce the total energy and dipole of a system. A more flexible representation than the point charge models is provided by the long-distance equivariant (LODE) framework, in which the charge density is approximated by a sum of atom centered spherical Gaussians or other localized functions.44,45 LODE successfully describes the mutual interaction of charged molecular species at large separation, a property beyond the reach of short-range ML models.
The above schemes can be seen as different realizations of a coarse-graining transformation that approximates the electronic density with a sum of atom centered spherical contributions. This representation can provide a good overall fit of the charge density and of the total dipole of a molecule, but, in general, does not correctly describe the dipolar fluctuations that couple with the electric field generated by distant charges in extended systems within periodic boundary conditions. The reason is that the dipole of a periodic crystal can only be defined modulo a quantum, and the polarization fluctuations typically include dipolar fluxes across the cell boundary, originating from the delocalized nature of the quantum mechanical electronic charge distribution. These fluxes are related to the phase of the wavefunction and cannot be described, in general, by partial atomic charges (see, e.g., Ref. 46). Interestingly, an exception to this rule occurs when a crystal is made of nonoverlapping molecular units because, in this case, the total dipole is the sum of the molecular dipoles.
In this work, we propose an alternative deep learning model for insulating systems that overcomes this limitation. As in previous approaches, we assume that the PES has short- and long-range contributions. The short-range contribution has the standard form of the DP model. The long-range contribution is the electrostatic energy of a system of spherical Gaussian charges, associated with the ions (nuclei + frozen core electrons) and the valence electrons, located, respectively, at the ionic and electronic sites. The latter are defined by the averages of the positions of the maximally localized Wannier centers (MLWCs)47 associated with specific atoms.48 In general, ionic and electronic sites do not coincide. The electronic site coordinates are not independent dynamical variables, but are fixed by the atomic configuration, as required by the Born–Oppenheimer adiabatic separation of ionic and electronic dynamics. We assume that the electronic sites, hereafter also called Wannier centroids, depend only on the atoms in their neighborhood, a conjecture that can be rationalized in terms of the so-called nearsightedness of electronic matter.49 We found that this condition is well satisfied in all systems we have studied to date. Thus, the environmental dependence of the centroids can be modeled accurately by a deep neural network (DNN), such as deep Wannier (DW), which was introduced in Ref. 48 to describe the dielectric response of insulators. By combining DP and DW, we construct a model of the PES that includes explicit long-range electrostatic interactions. The new scheme, called deep potential long-range (DPLR), has several advantages relative to atom centered models. First, the centroid distributions have integer charges and automatically conserve the total charge of the system. Second, their positions derive from a unitary transformation in the subspace of the occupied Kohn–Sham orbitals. By construction, this representation rigorously describes molecular dipoles in finite systems and dipolar fluctuations in condensed media (see, e.g., Ref. 46). The spherical Gaussian distributions adopted in DPLR differ from the maximally localized Wannier distributions in the quadrupole and higher moments. This results in a potential energy error that converges well with size. In contrast, the errors of atom centered distributions start at the dipole level, and the potential energy error is only conditionally convergent with size. The DPLR model is physically meaningful, symmetry preserving, and smooth so that forces and virial can be analytically calculated, preserving the conservative properties of the adopted molecular dynamics scheme. The approach is also computationally efficient, as fast algorithms for calculating Ewald sums for Gaussian charges under periodic boundary conditions are well established.50–53 DPLR describes polarization fluctuations in finite and extended systems. Thus, it also allows us to model the response of a system to an externally applied field.
While preparing this article, we became aware of a recent preprint by Gao and Remsing,54 in which the electronic structure information encoded in the Wannier centers is used to construct a ML model of the PES, including long-range electrostatics. This model, called self-consistent field neural network (SCFNN), differs from DPLR because the separation between short- and long-range contributions is not done in the way in which the training data are handled, but in the way in which they are generated. The model requires short-range DFT data obtained, in principle, from calculations with a truncated Coulomb potential. A drawback of this formulation is that it introduces a self-consistency condition for the Wannier center positions. As a consequence, the simplicity of the ML construction is lost, and the dynamics of the model is no longer conservative, unless formulations such as those proposed by Car and Parrinello are introduced.55 In practice, separating standard DFT data into data for a truncated Coulomb potential and data for its long-range counterpart is not straightforward, and the authors overcame this difficulty by invoking linear response conditions.
In this article, we focus on DPLR and how it compares to the original DP model. Theory and method are presented in Sec. II. In Sec. III, we construct the DP, DW, and DPLR models for water that we use in applications discussed in Sec. IV. In that section, we compare DPLR, DP, and direct DFT calculations for the binding energy curve of the water dimer and for the free energy profile of a water molecule as a function of the distance from a liquid water slab. In Sec. V, we construct DP and DPLR models for small atomic displacements around the classical ground state in crystalline NaCl and use these models to compute the phonon dispersion curves. We show that the non-analytic contribution to the dynamical matrix, arising from long-range dipole–dipole interactions, can be calculated with DW and its polarizability extension.56 We then study the evolution of the analytical phonon dispersion curves calculated with finite supercells of increasing size. While the modes calculated with the DP model do not exhibit size dependence, those calculated with DPLR depend on size and partially recover the longitudinal-transverse splitting of the optical modes, within the limitations of the finite supercells used in the calculations. In Sec. VI, we discuss how the model can be used to study the response of a system to an externally applied electric field. Finally, Sec. VII is devoted to our conclusions.
II. THEORY AND METHOD
We focus on extended systems modeled with a periodically repeated supercell. Finite systems can be investigated in this way, provided that the supercell is large enough that interactions between periodic images can be neglected. With periodic boundary conditions, the system must be electrically neutral. However, the scheme could be easily extended to different choices of boundary conditions.
A. Electrostatic energy
In the reference DFT model, we adopt a pseudopotential framework. Thus, in what follows, electrons stand for valence electrons and ions stand for nuclei plus frozen core electrons. However, the method would be applicable to all-electron DFT methods as well, in which case ions would stand for nuclei and electrons would include valence and core. We indicate the charge density of ions and electrons by ρion(r) and ρe(r), respectively. When the system is in the ionic configuration {RI}, ρion(r) is given by
where RI indicates ion coordinates and ρI(r) = qIδ(r), in terms of qI, the charge of the ion I, and of δ(r), the Dirac delta distribution. In insulators, the density of the electrons, ρe(r), can be represented by a sum of local distributions via a unitary transformation of the occupied Bloch orbitals onto maximally localized Wannier functions.47 The centers of the local distributions are called maximally localized Wannier centers (MLWCs). Often, during molecular evolution, the same MLWCs can be uniquely assigned to the same atom along a trajectory. For example, in water systems, the same four MLWCs are always nearest to the same oxygen atom irrespective of liquid diffusion and molecular dissociation. It is then convenient to define the nth Wannier centroid (WC), Wn, as the average of the MLWCs assigned to a given atom, as illustrated in Fig. 1 for a water molecule. Lumping into a single centroid associated with the O atom, the four Wannier centers of a water molecule have the following advantages: (i) it makes straightforward to satisfy, within the deep neural network (DNN) model, the Born–Oppenheimer constraint of parametric dependence of the ground-state electronic properties on the atomic coordinates; (ii) it eliminates the need for imposing permutational symmetry between equivalent Wannier centers in a molecule; and (iii) it is computationally efficient. We note, however, that a unique association of the Wannier centers with specific atoms is not possible in the presence of electron transfer between different atoms. Dealing with these events requires a significant generalization of the method. Ignoring this possibility, ρe(r) takes the following form:
where ρn is the sum of the maximally localized Wannier distributions associated with the nth WC. For simplicity, we consider spin saturated systems, in which each maximally localized Wannier distribution carries a charge of two electrons (−2e). Thus, ρn carries an integer charge qn, equal to −2e times the number of Wannier functions associated with the nth centroid. For instance, in the water examples of this article, qn = −8e. The charge neutrality condition implies that the total charge density, i.e., ρt(r) = ρion(r) + ρe(r), satisfies
Thus, the total electrostatic energy is well defined and given by
Here, v(r) = 1/|r| is the Coulomb interaction, and it is understood that the (infinite) self-energy of the ionic point charges should be subtracted from the integral. The energy in Eq. (4) is calculated exactly in the DFT reference model, while it is approximated, in the DPLR model, by the electrostatic energy of a system of spherical Gaussian charge distributions gI and gn located at ionic (RI) and WC (Wn) sites, respectively. In DPLR, one adds to this energy a short-range contribution described by the DP part of the model, which takes care of all the additional many-body effects within rc, the cutoff radius of the model. Thus, all residual errors of the approximation for the electrostatic energy should be associated with long-range effects, for which r > rc. The Gaussian distributions gI and gn integrate to qI and qn, respectively, but are assumed to have the same spread , with β being an adjustable parameter. The true distributions, ρI and ρn, differ from gI and gn according to the following equations:
The distributions ΔI and Δn, in Eqs. (5) and (6), integrate to zero. ΔI is known, having all its mutipole moments equal to zero, while Δn has zero dipole moment, but is otherwise unknown. We choose the spread parameter β so that the Gaussian distributions are at the same time smooth on the atomic scale and well localized within rc. More details on the choice of β will be given when discussing specific examples later in this article. While the electrostatic potential generated by ΔI, which has spherical symmetry, decays to 0 exponentially for r > rc, the potential generated by Δn decays to 0 algebraically for r > rc because, in general, the distribution Δn has non-zero quadrupole and higher multipole moments. Thus, the potential generated by Δn decays at least as fast as r−3 for r > rc and has magnitude controlled by the quadrupole moments specific to the system. Typically, we expect quadrupoles on the order of 1 a.u., which would correspond to Δn(r = rc) being approximately equal to 1 meV for rc = 6 A. Given that the DP part of the model takes care of the effects within rc, the error of the above procedure is associated with effects beyond rc, which are controlled primarily by Δn.
The total ionic and electronic Gaussian distributions are given by
The corresponding total Gaussian charge distribution is Gt(r) = Gion(r) + Ge(r). The electrostatic energy associated with Gt is easily calculated in Fourier space,50
where L is the cutoff in Fourier space and S(m), the structure factor, is given by
Due to the smoothness of the distributions, the Fourier sum in Eq. (9) converges rapidly. Fast algorithms for calculating the electrostatic energy are available, such as the smooth particle–mesh Ewald (SPME)52,53 and the particle–particle–particle–mesh (PPPM)51 methods. The latter is adopted in the applications discussed in this article. The electrostatic energy in Eq. (9) also contains short-range contributions. To avoid double counting, is subtracted from the total potential energy when training the DP part of the DPLR model.
The error , due to neglecting the contribution of Δn to the electrostatic energy, is the sum of two terms, and , given, respectively, by
The two integrals above are well defined for extended systems as the corresponding charge distributions are neutral. Since the lowest non-zero multipole of Δn is the quadrupole, in Eq. (12) is (at worst) a sum of quadrupole–quadrupole interactions, decaying as r−5. Only the part of that sum involving terms with (roughly) r > rc cannot be learned by the DP model. These terms contribute to the error. Taking 6 Å for a typical rc value, the sum would include terms smaller than 1 meV. Due to the fast rate of decay, the double sum should converge rapidly. We could estimate in Eq. (11) in a similar fashion by writing Gt as a sum of local dipole contributions, resulting in a sum of dipole–quadrupole interactions that decay as r−4, i.e., still rapidly but less rapidly than the terms in . This estimate is, however, affected by the arbitrariness of the local dipole definition. A better way of estimating is the following. Let EΔ(r) be the electric field generated by the sum of the Δn distributions,
To leading order, the electric field in Eq. (13) is a sum of quadrupole contributions that decay like r−4 for r > rc. This field is essentially uniform on the molecular scale. Let us call the corresponding average field . Then, the average energy error can be written in terms of the cell dipole (polarization times volume) M as
In extended systems, M is defined modulo a quantum.46 Thus, the error in Eq. (14) is only well defined for a change of ionic coordinates that leads to a change of M.
In summary, the error in the LR contribution to the electrostatic energy, when the true electronic distribution ρe is replaced by the Gaussian distribution Ge, is given by a sum of small terms that decay at most as r−4 for r > rc. It would be possible to further reduce these errors by representing Ge as a sum of non-spherical Gaussian distributions having the spreads of the Wannier centroid distributions, which could be learned by an extended DW approach. In this way, the lowest non-zero multipole of Δn would be the octupole. At rc = 6 Å, such treatment would reduce the error by approximately an order of magnitude. However, this formulation would also result in a significant increase of the computational complexity, which was deemed unnecessary based on our tests.
We remark that formulations of LR electrostatics based on effective atomic charges cannot reproduce the fluctuations of M in the condensed phase. In these formulations, the atom centered electronic charge distribution differs locally from the true distribution ρe at the dipole rather than at the quadrupole level. Therefore, the corresponding error would be associated with dipole–dipole interactions that are only conditionally convergent. It is worth noting that, in finite systems, the error is reduced by adopting partial charge models trained to reproduce the molecular dipole.
B. Deep potential long-range model
In DPLR, the PES is given by
where the electrostatic energy was introduced in Eq. (9) and Esr is a short-range contribution constructed as in the standard DP model (see the Appendix for more details). However, Esr is not equal to the standard DP energy because the Gaussian electrostatic energy is subtracted from the DFT training data to avoid double counting.
The forces and the virial are derivatives of the energy with respect to the atomic positions and the simulation cell tensor, respectively. Within DPLR, the energy E depends on the atomic coordinates, both explicitly and implicitly, via the dependence of the WCs on the atomic coordinates. The implicit dependence makes the calculations nontrivial.
The force on atom I is given by
Here, the environmental dependence of Wn is given by the DW model. We recall that there is a one-to-one correspondence between the Wannier centroid n and its nearest atom, the atom I. This correspondence establishes a bijective mapping between the indices of the WCs and the indices of a subset of the ions, which we indicate, equivalently, by n = n(I) or by I = I(n). Due to the nearsightedness of the electronic matter, the position of the WC associated with atom I depends on the local environment of that atom within a cutoff radius rc that can be safely assumed to be equal to the cutoff radius of the DP model. Indicating with Dn the displacement of Wn relative to RI, we have
which relates the environmental dependence of Dn to that of Wn. The latter is smooth by construction in the DW model. Thus,
where δI,J is the Kronecker delta and the force on atom I in Eq. (16) becomes
The first term on the right-hand side (RHS) of Eq. (19) is the standard DP force, and the second and third terms give the electrostatic force on atom I and its associated WC, respectively, while the fourth term originates from the environmental dependence of the WC. The derivatives ∂Dn/∂RI are calculated by back-propagating the DW model.
We indicate the cell tensor by h = {hαβ}, with hαβ being the βth component of αth cell vector. The virial is defined by
where summation over repeated indices is assumed. By using Eq. (15), we have
The first term on the RHS of Eq. (21) is the short-range virial contribution, , which is calculated as in the standard DP model. The remaining term is the electrostatic virial contribution, whose calculation is non-trivial, as we detail below. First, it is convenient to define the reciprocal cell tensor , in which is the βth component of αth reciprocal cell vector. The direct and reciprocal cell tensors define linear scaling transformations in direct space and reciprocal space that are useful in MD simulations with a variable cell. In direct space, a scaled coordinate s is related to its non-scaled counterpart R by R = sh, and in reciprocal space, the lattice vector m is related to its scaled counterpart k by m = h−1k. As a consequence, the scalar product mR = sk is independent of the cell tensor h, and so is the factor in Eq. (10). However, the factor in the same equation does depend on the cell vector h because, in general, the positions of the WCs do not change linearly with a cell deformation. Then, we can write for the electrostatic contribution to the virial,
with
where we used the definition . in Eq. (23) is the standard reciprocal virial contribution of the Ewald method, while the correction term in Eq. (24) accounts for the nonlinear dependence of the WCs on the cell tensor. The correction term is calculated as follows:
where the terms in parentheses are given by
Thus, the correction virial term in Eq. (25) becomes
Given that
the correction virial simplifies to
We checked that the above formulas are correct by comparing the analytical derivatives in the formulas to the numerical derivatives calculated with finite differences. As a consequence, in molecular dynamics simulations, the conservation laws associated with the equations of motion should be satisfied within the accuracy of the adopted numerical implementation. We find, for instance, that in NVE trajectories of liquid water with 128 molecules, the total energy shows a small drift of ∼0.4 meV/H2O (∼1 K) per 100 ps with the DPLR approach, when using a mesh spacing η = 0.98 Å in the calculation of the Ewald sum. As shown in Fig. 2, this is small but greater than the drift of a DP trajectory for the same system, which is almost not observable in a 100 ps trajectory. The total energy drift in the DPLR trajectory is controlled by the numerical accuracy of the fast algorithm for computing the Ewald sum, here the PPPM method, and can be further reduced by using a finer mesh. As shown in Fig. 2, the drift of DPLR is not observable when the mesh spacing η is set equal to 0.49 Å.
III. DEEP POTENTIAL LONG-RANGE MODEL FOR WATER
We construct a DPLR model for water based on the Perdew–Burke–Ernzerhof (PBE) functional approximation of DFT.57 It is well known that PBE substantially overestimates the strength of the hydrogen bonds in water. As a consequence, it fails to describe correctly some important thermodynamic properties, such as the relative density of ice and water at ambient pressure.58 However, here, we are not interested in constructing a state-of-the-art water model, but rather we intend to demonstrate that DPLR can capture long-range electrostatic effects missing in the DP model. For this purpose, the choice of the exchange–correlation functional is not a critical issue. It would be straightforward to adopt more accurate functionals than PBE by simply replacing PBE with any other functional of choice, in the labeling steps whereby energy, force, and virial tensor are calculated at selected atomic configurations. In Subsections III A–III C, we construct DP, DW, and DPLR models based on the PBE functional approximation.
A. Training data and DP model
We use the concurrent learning scheme Deep Potential GENerator (DP-GEN).59,60 DP-GEN iteratively enlarges the training dataset of labeled DFT data and refines a representative ensemble of DP models by learning from a growing dataset. In this work, the representative ensemble consists of four models that differ in the random initialization of the network parameters but are otherwise trained on the same data. In the DP-GEN protocol, the four DP models are used to sample the relevant thermodynamic space with deep potential molecular dynamics (DPMD) trajectories. The DP construction includes three stages. In the first one (iteration 1–7), bulk configurations in the thermodynamics space 200 ≤ T ≤ 400 K and 1 ≤ P ≤ 104 bars are explored with isothermal–isobaric (NPT) DPMD. The initial configurations for bulk NPT DPMD simulations are snapshots of DPMD simulations with 128 molecules in the liquid state obtained with the DP model developed in Ref. 13. The second stage extends from iteration 8 to iteration 15. In this stage slab, configurations are explored with canonical (NVT) DPMD simulations in the temperature range 200 ≤ T ≤ 400 K. The initial configurations are prepared by adding a vacuum region on top of bulk liquid configurations. Different surface structures with thickness of the vacuum region ranging from 0.5 to 7.0 Å are considered. In the third stage, from iterations 16–23, low density bulk configurations are explored by NVT simulations for 200 ≤ T ≤ 400 K. Initial configurations are created by randomly removing 64, 96, 112, and 120 molecules from the initial configurations of stage 1. At each stage, the time span of the DPMD simulations is gradually increased over the iterations, from 1 to 10 ps. The maximal deviation of the forces predicted by the ensemble of DP models is monitored along the trajectories and used to classify the explored configurations. Configurations with deviation between 0.15 and 0.30 eV/Å are labeled training data, following the protocol detailed in Ref. 60. Single-shot DFT calculations are performed at these configurations.
DFT calculations are carried out with the Vienna ab initio simulation package (VASP) version 5.4.461,62 using the PBE exchange–correlation energy. The kinetic energy cutoff for the plane wave basis is set to 1500 eV. The self-consistent field procedure is stopped when the energy difference between the last two iterations is less than 10−6 eV.
The DP models are trained with the DeePMD-kit package.63 The cutoff radius rc for the atomic environments is set to 6 Å. The size of the embedding and that of the fitting network are set to (25, 50, 100) and (240, 240, 240), respectively. The DP models are trained with the Adam stochastic gradient descent scheme64 using 106 steps, during which the learning rate exponentially decays from 10−3–3.5 × 10−8. At each training step, a subset of the training dataset, referred to as the mini-batch, is used to calculate the gradient of the loss function. The size of the minibatch is 1, i.e., only one configuration is used for calculating the gradient.
At the end of each training stage, the fraction of configurations with gradient deviation larger than 0.15 eV/Å is reduced to less than 0.1%, indicating satisfactory convergence of the DP-GEN cycle. A total of 448 configurations are selected as training data during the DP-GEN protocol. They include 323 bulk, 94 surface, and 31 low density configurations. The dataset used for training the initial DP models has 135 configurations; thus, the total training data include 583 configurations.
B. DW model
For each water configuration, the DW model gives the position of the WC associated with a given molecule relative to the O atom of that molecule. The training data are the same data used to generate the DP model. Each WC is uniquely associated with a particular O atom and depends on the atomic environment of that atom within a cutoff radius, which is set here to 6 Å, i.e., it is the same cutoff rc of the DP model. We use the standard DP descriptor (see the Appendix for details). The sizes of the embedding and fitting networks are (25, 50, 100) and (100, 100, 100), respectively. The model is trained with the Adam stochastic gradient descent method64 using 106 steps. The batch size is set equal to 1. The learning rate exponentially decays from 1.0 × 10−2 to 5.6 × 10−8.
The training accuracy of the DW model is 1.7 × 10−3 Å, which is much smaller than the typical distance D of a WC from its reference O atom. The test accuracy is 1.9, ×, 10−3 Å. Thus, the generalization gap between test and training errors is almost negligible. The correlation between labeled and predicted D is graphically illustrated in Fig. 3.
C. DPLR model
The DPLR model is trained on the dataset generated by DP-GEN. In DPLR, the spread parameter β, i.e., the inverse spread of the Gaussian charge distribution, needs to be fixed. In the limit of β going to 0, the Gaussian width is infinite and DPLR reduces to the standard DP model. On the other hand, for β going to infinity, the Gaussian charges become point charges. In this limit, the magnitude of the intramolecular Coulomb force between the oxygen ion (qI = 6e) and its associated WC (qn = −8e), for a typical separation distance (D ≈ 0.07 Å), is about 104 eV/Å, which is about four orders of magnitude larger than the average atomic force. This situation would create serious numerical problems due to the difficulty in resolving a labeled force from the strong intramolecular Coulomb force. Therefore, there should be an optimal β, neither too small to avoid reduction to the standard DP model nor too large to avoid distributions too close to point charges.
To study the effect of the spread parameter on the accuracy of the DPLR models, we train different DPLR models with different spread parameters (β = {0.1, 0.2, …, 0.6} Å−1). To ensure proper convergence of the Ewald sum in Eq. (9), the mesh spacing η is set equal to the largest value compatible with the constraint η · β ≤ 0.4. In DP and in the short-range part of the DPLR models, we adopt a hybrid descriptor (see the Appendix for details) , and the cutoff of is set to Å, while the cutoff of is set to rc = 6 Å. The size of the embedding net of is (25, 50, 100), and that of is (10, 20, 40). Training and test errors, as a function of β, are shown in Fig. 4. The training dataset is divided into two parts, one consisting only of bulk configurations and the other consisting only of surface configurations. The test dataset includes surface configurations with 128 molecules with a vacuum region of 7 or 12 Å, generated along a 100 ps DPLR molecular dynamics (DPLRMD) trajectory for each of the two widths of the vacuum region. Configurations are recorded every 10 ps and are labeled with ab initio DFT calculations. Thus, the test dataset includes 20 surface configurations in total.
In Fig. 4, one can see that the energy training error for the bulk configurations is smaller than the corresponding error for the surface configurations in both DP and DPLR models, reflecting the more difficult task of interpolating the PES for surface than for bulk configurations due to the larger variance of the local environments in the former case. For β < 0.5 Å−1, the DPLR training errors are close to those of DP, but for β ≥ 0.5 Å−1, they gradually increase and become notably larger than the corresponding DP errors, a behavior that we attribute to numerical difficulties with Gaussian charges that are too localized.
The test dataset is composed of independent configurations not included in the training dataset and serves to assess the capability of the models to deal with unforeseen situations. In the case of the standard DP model, the test error (2.0 meV/H2O) is larger than the surface training error (1.2 meV/H2O). The gap, usually called generalization gap, between test and training errors indicates that a model overfits the training data and loses accuracy when generalized to cases not included in the training dataset. The DPLR model with a small β has a generalization gap comparable to that of the standard DP model, as expected because for β → 0 Å−1, DPLR reduces to DP. However, for β ≈ 0.4 Å−1, the generalization gap is minimal. In this case, the training error and the test error are 1.3 and 1.5 meV/H2O, respectively, indicating that inclusion of long-range electrostatics contributes to the model’s generalization ability. For β ≥ 0.5 Å−1, the test error increases with β, as expected from numerical difficulties with charge distributions that are too strongly localized.
Based on the above analysis, the optimal spread parameter β should have a value of about 0.4 Å−1. It is interesting to note that, with this value of the spread parameter, the radial extent of the Gaussian distribution gn(r) about the oxygen atom is close to the physical extent of the (valence) electron density of a water molecule, as measured by the radius of the surface on which the electron density is approximately one order of magnitude smaller than its maximum value. Thus, with the optimal β, the spherical Gaussian distribution gn(r) approximates (roughly) the physical WC distribution ρn(r) and the error Δn(r) in Eq. (6) is minimized.
In the rest of this article, we set β = 0.4 Å−1, unless stated otherwise. We note that simply increasing the cutoff radius of the standard DP model to 8 Å only marginally reduces training and test errors. Indeed, when using rc = 8 Å in place of rc = 6 Å, the bulk and surface training errors are reduced from 1.02 and 1.26 to 1.01 and 1.18 meV/H2O, respectively, while the test error is reduced from 2.14 to 2.08 meV/H2O. This indicates that the long-range effect is non-trivial and can only be captured by including explicitly the long-range electrostatic interaction.
IV. APPLICATION TO TWO WATER SYSTEMS
We compare the performance of DPLR and DP in two test cases. In one, we compute the potential energy of interaction of two water molecules as a function of the relative distance. In the other, we compute the free energy profile of a water molecule vs its distance from a liquid water slab. The second example requires the calculation of a thermal property of a relatively complex system. In both cases, long-range electrostatic interactions among the electric dipoles of the water molecules play a role. These interactions, absent in the DP model, are described with sufficient accuracy by DPLR.
A. Water dimer
The potential energy curve of a water dimer as a function of the separation distance dO between the two molecules is reported in Fig. 5. Both DP and DPLR models describe equally well the potential energy at short distance. The DP model does not correctly predict the energy curve at a distance beyond the cutoff radius. Due to the finite range of the DP model, the corresponding potential energy curve misses the tail of the DFT reference, which is due to dipole–dipole interactions. By contrast, the long-range Coulomb tail is recovered accurately by the DPLR model, as shown in both panels of Fig. 5. The accuracy at intermediate distance improves by increasing the cutoff in all the models, as expected. Although DPLR has a higher ability of representing the energy surface, it is marginally less accurate than the DP model between 3.5 and 4.0 Å. This can be explained by the fact that dimer configurations were not included in the training dataset. From Fig. 5, one concludes that DPLR is a superior model with both cutoffs.
B. Free energy profile of a molecule at varying distance from a liquid slab
In this subsection, we test the performance of DPLR in a calculation of the free energy profile of a water molecule as a function of its distance from a liquid water slab. The model has cutoff radii of Å and rc = 6 Å, a spread parameter of 0.4 Å−1, and an Ewald mesh spacing of 1 Å. We simulate the slab and the adjacent vacuum region on a periodic box of size 50 × 11 × 11 Å3. The slab contains 64 molecules, let to equilibrate with a long canonical MD run at a temperature of 300 K starting from an initial bulk liquid configuration. After equilibration, the half width of the slab is Å and the adjacent vacuum region has a width of Å. We insert an additional molecule to this system, initially near the center of the vacuum region where it interacts weakly with the slab. Then, we vary the distance of the molecule from the slab and we calculate the free energy profile as a function of the distance, while keeping the whole system in equilibrium at 300 K. The distance of the molecule from the slab is defined by the distance of the respective centers of mass. A typical configuration is illustrated in the inset of Fig. 6 for a distance s between the molecule and the liquid slab. In the atomic configuration {R}, the distance of the molecule from the slab is represented by the collective variable σ({R}). In MD simulations, an holonomic constraint, such as σ({R}) = s, is easily imposed by a Lagrange multiplier. We set the initial distance to sfar = 17 Å. At this distance, the molecule experiences weak long-range dipolar interactions with the slab. We indicate by F the projection of the force on the molecule along the direction connecting the center of mass of the molecule and that of the slab. When the molecule is displaced from sfar to s, the corresponding free energy change, A(s), is given by
Here, H({R}) and Z are the Hamiltonian and the partition function of the system. For each value of τ, the ensemble average in Eq. (30) is calculated with canonical MD at 300 K for configurations that satisfy the constraint σ({R}) = τ. We compute the ensemble average at a discrete set of τ values using 1000 ps long MD trajectories, the first 200 ps of which serve for equilibration and are not included in the observation time. During observation, configurations are recorded every 10 ps. By varying s from sfar to snear = 9 Å, we obtain the blue free energy profile reported in panel (a) of Fig. 6 for the DPLR model. We divide the interval [snear, sfar] into eight equal intervals and calculate the corresponding free energy changes by evaluating the integrals in Eq. (30) numerically with the trapezoidal rule. For s in the vicinity of snear, the tagged molecule forms hydrogen bonds with neighboring molecules in the slab, suggesting that incorporation into the slab is taking place.
We use a simple approximation to estimate the deviation of the calculated DPLR profile from the DFT reference. Instead of re-weighting with the DFT energy the configurations along DPLR trajectories, which would be expensive given the large number of configurations needed for statistical accuracy, we keep the DPLR weights and approximate the ensemble average with the following expression, which only requires DFT calculations at a relatively small number of configurations:
This procedure can be justified because DPLR reproduces accurately DFT energies with an average error of meV/H2O. We find that 80 configurations, equally spaced in time along an equilibrated 800 ps long trajectory, are sufficient to compute the average force at each τ value. The results are reported in panel (b) of Fig. 6, where the baseline DFT reference corresponds to ΔA(s) = 0. The deviations of DPLR from the baseline, calculated at discrete distances in the interval [snear, sfar], appear in Fig. 6(b) as blue points connected by blue lines. The blue points represent the average of four independent DPLR models trained on the same data with same hyper-parameters but different random initialization of the model parameters. The blue error bars are the corresponding standard deviations of the predictions of the four models. Overall, the deviation of DPLR from DFT is quite small and the uncertainty of the model, estimated from the standard deviation of the four models, is even smaller.
We now consider a DP model with same cutoff radius of DPLR but without explicit long-range Coulomb interactions. To estimate the deviation of this model from DPLR, we adopt a similar approximation to the one adopted in Eq. (31), i.e., we assume that
and use the same configurations as before to calculate the ensemble averages and the free energy differences. The deviations of DP from DPLR and from the baseline DFT reference are reported as red points with error bars connected by red lines in panel (b) of Fig. 6. As before, the DP points are averages of four independent models trained on the same data with different random initialization. The average free energy profile obtained in this way is reported as a red line in panel (a) of Fig. 6, showing that when the molecule approaches the slab, the gain in free energy is smaller for DP than for DPLR, a result consistent with the missing attractive dipolar interactions in the DP model. However, the most important outcome of this analysis is that the DP model is affected by a large uncertainty. When evaluated along DPLR trajectories, the standard deviation of the independent DP models grows rapidly as the tagged molecule approaches the slab to become almost an order of magnitude larger than the uncertainty of the corresponding DPLR model when the tagged molecule is near the slab. We attribute this behavior to the absence of long-range electrostatic interactions in the DP model as opposed to DPLR and DFT.
V. PHONONS IN CRYSTALLINE SODIUM CHLORIDE
In this section, we consider the phonon dispersion curves in the sodium chloride (NaCl) crystal, as a further example to illustrate the importance of long-range electrostatic interactions. For the reference DFT model, we adopt the PBE exchange–correlation functional and use pseudopotentials for the electron–ion interactions. DP and DPLR models for NaCl are trained on DFT data generated in a 200 ps DPMD NPT trajectory of a 2 × 2 × 2 supercell (64 atoms) at 100 K and 1 bar. In the DFT data, we use Brillouin Zone (BZ) sampling with a 4 × 4 × 4 Monkhorst Pack mesh to calculate the interatomic potential and the maximally localized Wannier distributions. The hyper-parameters of the DP and DPLR models are the same as in the water models discussed earlier in this article. Using the same arguments of Sec. III C, we find that the optimal value of the spread parameter for NaCl is β = 0.2 Å−1. The smaller β of NaCl compared to water reflects the fact that the electron density distribution about the Cl atom in NaCl is more delocalized than the one about the O atom in the water molecule. The training accuracy of the DP model is 7.6 × 10−4 eV/atom for energy and 2.0 × 10−3 eV/Å for atomic forces, similar to that of the DPLR model, which is 7.7 × 10−4 eV/atom for energy and 2.6 × 10−3 eV/Å for atomic forces.
To compute the phonons, we calculate numerically the force constants from the second derivatives of the PES with respect to atomic displacements. We then obtain the phonon frequencies at a generic point q of the BZ of the crystal with the phonopy package,66 by diagonalizing the dynamical matrix given by a Fourier expansion of the force constants at wavevector q. This approach would be adequate in non-polar materials where the force constants have a finite range. In polar materials, such as NaCl, long-range electrostatic interactions yield a non-analytic contribution to the dynamical matrix, which should be added to the analytic contribution calculated with the above procedure. We report in panel (a) of Fig. 7 the phonon dispersion curves along the Γ–X segment of the BZ obtained from the analytical part of the dynamical matrix by using a 2 × 2 × 2 supercell for the force constants. DP and DPLR curves coincide within the accuracy of the calculation and are very close to the corresponding DFT curves, as one could have expected, given that a 2 × 2 × 2 supercell was used for training the models. Experimental frequencies are also reported in the same panel, from which we see that, while the calculated acoustic and transverse optical (TO) modes agree with the experiment, the calculated longitudinal optical (LO) modes deviate considerably from the experiment, particularly in the long-wavelength limit, where the calculated modes fail to exhibit an LO–TO splitting. Adding the non-analytic contribution to the dynamical matrix restores agreement between theory and experiment, as illustrated in panel (b) of Fig. 7. The non-analytic contribution to the dynamical matrix is given by67
Here, Ω is the volume, κ, κ′ are atom indices, α, β, γ are Cartesian directions, Mκ, Mκ′ are atomic masses, are Born dynamical charge tensors, and ϵ∞ is the dielectric tensor that describes the static response of the electrons at fixed ions. We recall that the dynamical Born charges are the derivatives of the polarization with respect to atomic displacements and are therefore directly accessible from the DW network within DPLR. The dielectric tensor is related to the dielectric susceptibility or polarizability tensor χ via ϵ∞ = 1 + 4πχ. The susceptibility is defined by the derivatives of the polarization with respect to an applied electric field and is accessible within a DW extension introduced to model the polarizability surface.56 Thus, all the quantities needed for calculating the phonons in polar materials are accessible within the extended DP methodology.
It is interesting to monitor the evolution of the analytical phonon modes calculated within DP and DPLR when the size of the supercell used to compute the force constants is increased. This is shown in panels (c)–(e) of Fig. 7 for a 3 × 3 × 3 supercell (216 atoms), a 5 × 5 × 5 supercell (1000 atoms), and a 6 × 6 × 6 supercell (1728 atoms), respectively. The DP modes are essentially independent of size, indicating that, with the chosen cutoff radius (rc = 6 Å), the force constants for atomic distances larger than those probed in a 2 × 2 × 2 supercell are negligible. By contrast, the LO mode of DPLR is strongly size dependent, reflecting the appearance of not negligible force constants on larger supercells due to long-range dipolar interactions. With larger supercells, the analytical spectra from DPLR recover a larger portion of the correct LO modes. However, the convergence is slow as the non-analytic behavior for q → 0 cannot be recovered from numerical Fourier interpolation. At small wavevectors, we observe an upward bump in the LO and TO modes that becomes sharper and moves to smaller q values as the size of the supercell increases. The bump is a manifestation of the Gibbs oscillations that are expected to occur in place of the LO–TO splitting when attempting to reproduce the non-analytic behavior of the dynamical matrix with a Fourier sum.68
The long-range contributions that are responsible for the spectral changes in panels (c)–(e) of Fig. 7 derive from the Ewald contribution to the PES within DPLR. This example illustrates an essential advantage of DPLR over DP in materials where long-range electrostatic effects are important. While the DP model can describe well such materials on the supercell used for training, it is unable to extrapolate the PES to larger supercells, a drawback that is absent in the DPLR model.
VI. EXTERNAL FIELDS
In this article, we have considered the electrostatic fields originating from internal charges, but, in the linear regime, it is straightforward to include the effect of an external time dependent field that couples with the polarization P({R}). In this situation, the PES takes the following form:
In Eq. (34), the PES in the absence of an external field can be provided either by DPLR or by DP depending on whether the size dependence of the electrostatic energy should be included explicitly or not. Equation (34) makes possible non-equilibrium MD simulations of an insulating system driven by an external electric field. The response of the system to the external field can also be studied with equilibrium MD using the Kubo formalism, as done in Ref. 48 to study the infrared absorption spectra of liquid water. In that work, the standard DP model for the PES was used, which was adequate because at the infrared frequencies, long-range electrostatic effects can be ignored as the dominant correlations are between neighboring molecular dipoles.69 Long-range correlations among the molecular dipoles become important in the static ionic limit.70 In that limit, which is relevant to study the static dielectric constant of water, one should use the DPLR model for the PES in Eq. (34).
Equation (34) can be further extended by including the quadratic response of the electrons via the susceptibility tensor χ,
This approach was used in the context of the Kubo formalism to compute the Raman spectra of water in Ref. 56, a study that required the environmental dependence of the electric susceptibility. The latter was modeled by a DNN extending the DW approach to the polarizability tensor. The polarizability tensor is defined by the derivatives (i.e., the response) of the Wannier centroid positions with respect to an applied electric field in the limit of zero field. These derivatives are described by an extension of the DW model trained on DFT calculations in the presence of a small, i.e., numerically infinitesimal, electric field, as discussed in Ref. 56. The Raman scattering response is dominated by short-range correlations between molecular polarizabilities, and the standard DP model was sufficient for the PES.
VII. CONCLUSIONS
In this article, we examined a fundamental limitation of the local representation of the PES common to most ML models. These models are trained on quantum mechanical data, typically DFT data, on relatively small systems. They are then used in molecular simulations on larger systems that become accessible in view of the computational efficiency of the models. While successful in many cases, this procedure may lead to errors for physical properties affected by long-range electrostatic interactions. We have shown that these effects can be modeled accurately by the DPLR model, which extends the DP methodology using information on the centers of the electronic charge. The latter is encoded in the location of the Wannier centroids, which are averages of the Wannier centers uniquely associated to specific atoms. The environmental dependence of the Wannier centroids is described by DW, a DNN model, which we introduced in previous work to describe the dielectric polarization of insulators. Using the information from DW, DPLR augments the local representation of the PES of the standard DP model with the long-range electrostatic energy of ions and electrons, modeled by spherical Gaussian charge distributions centered at ions and Wannier centroids, respectively. The width of the Gaussian distributions is fixed by the value of the spread parameter β. Interestingly, the optimal β that minimizes the generalization gap in the ML procedure is close to the physical value expected from the spatial delocalization of the reference Wannier centroid distribution within DFT, as illustrated in our water and NaCl examples. The variation in size of the long-range electrostatic contribution, ignored in the standard DP model, is approximated accurately in the DPLR model under the assumption that the electrostatic fields from which this contribution originates are a weak perturbation that can be treated within linear response theory. Because of that, it is possible to learn the environmental dependence of the Wannier centroids, ignoring the size dependence of the electrostatic energy as done in the DW scheme.
DPLR uses two DNN models to represent the PES: the standard DP network for the short-range interactions and the DW network for computing long-range electrostatic interactions with the Ewald method. Because of the complex architecture of the DPLR network, whereby force calculations require backward propagation within both DP and DW, and because of the need for Ewald calculations, DPLR simulations are more expensive than standard DP calculations. Based on our experience, an increase of the computational burden of approximately a factor of 5 should be expected when using DPLR in place of DP. The decision on which of the two models should be used in a given application should be guided by the physical properties of interest, whether long-range correlations originating by Coulomb forces are important or not.
In developing the DPLR model, we assumed that long-range electrostatic effects beyond the cutoff radius of the DP model originate a weak dependence on size that can be described within the linear response regime. This seems a reasonable assumption based on our experience and on the examples discussed in this article. We cannot exclude, however, that effects of long-range electrostatics beyond the linear response regime may manifest in some circumstances. In these cases, the assumption made here that the environmental dependence of the Wannier centroids is not affected by size should be revisited and one should envision a self-consistent condition for the centroids under the action of the long-range electrostatic field along lines similar to those developed in Ref. 54. We expect that a better understanding of all these issues should emerge from widespread application of the DPLR model.
The major simplifying assumption of the current DPLR model is that the WCs are uniquely associated with specific atoms. Thus, they cannot split or recombine along molecular dynamics trajectories. This assumption limits the capability of the model to deal with chemical reactions: proton transfer reactions such as those discussed, for example, in Refs. 26 and 48, are allowed, but not, in general, electron transfer reactions in which an electron is donated by one atomic entity to another. The centroid assumption shall be relaxed, within the limits of adiabatic ground-state dynamics, in a future generalization of the DPLR model.
ACKNOWLEDGMENTS
This work was supported by the Computational Chemical Sciences Center “Chemistry in Solution and at Interfaces” funded by the U.S. Department of Energy under Award No. DE-SC0019394. The work of H.W. was supported by the National Science Foundation of China under Grant Nos. 11871110 and 12122103.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
All data and codes needed to reproduce this work, including DP, DW, and DPLR models, are publicly available at https://doi.org/10.5281/zenodo.6024644.
APPENDIX: MODELING THE SHORT-RANGE PART
In this appendix, we briefly recall how the short-range DP model is constructed. The short-range part Esr is given by a sum of atomic contributions,
where is an environment matrix whose rows record the positions, relative to atom I, of the neighbors of that atom, within a specified cutoff radius rc. We consider two sets of neighboring atoms and , where and rc denote two cutoff radii, typically satisfying . The number of neighbors of atom I in the two sets is correspondingly denoted by Na(I) and N(I). The two sets are used for constructing descriptors with angular and radial information.
The environment matrix is derived from the neighbor set and has dimension equal to Na(I) × 4, with each row being a four-dimensional vector s(RIJ) × (1, xIJ/|RIJ|, zIJ/|RIJ|, zIJ/|RIJ|). Here, the relative positions are defined as RIJ = RI − RJ, and their Cartesian components are denoted by RIJ = (zIJ, yIJ, zIJ). s(RIJ) is defined as w(|RIJ|)/|RIJ|, with w being a switch function that decays smoothly from 1 to 0 at the cutoff radius. The embedding matrix has dimension equal to Na(I) × Ma, with each row mapping s(RIJ) to an Ma dimensional vector via a feed forward DNN. A submatrix of that includes its first M1 columns is denoted by . The descriptor defined by contains angular and radial information, and one proves that it provides a symmetry preserving representation of the local environment of atom I.
The environment matrix includes only radial neighbor information. It is set up from the neighbor set and has dimension equal to N(I) × 1, with each row being s(RIJ). The corresponding embedding matrix is denoted by and has dimension equal to N(I) × M. It should be noted that the embedding net used to map s(RIJ) to is usually different from the one used to generate . The descriptor with only radial information is defined as , i.e., it is the average of the embedding matrix with respect to its first index. The descriptor contains radial and angular information on the environment, while contains only radial information. Typically, combined angular and radial information is more important for close neighbors, while radial information alone is sufficient to describe the more distant neighbors. We thus define a hybrid descriptor as the concatenation of the radial and angular descriptors, i.e., . Relative to the descriptor , the hybrid descriptor has usually a stronger ability of generalization.
The hybrid descriptor is proved to be a symmetry preserving representation of the local environment of atom I. in (A1) denotes the fitting network, which is implemented by a feed forward DNN with skip connections. The DNNs used in the standard DP construction are trained in an end-to-end way by stochastic gradient descent schemes such as Adam.64