Many state-of-the-art machine learning (ML) interatomic potentials are based on a local or semi-local (message-passing) representation of chemical environments. They, therefore, lack a description of long-range electrostatic interactions and non-local charge transfer. In this context, there has been much interest in developing ML-based charge equilibration models, which allow the rigorous calculation of long-range electrostatic interactions and the energetic response of molecules and materials to external fields. The recently reported kQEq method achieves this by predicting local atomic electronegativities using Kernel ML. This paper describes the q-pac Python package, which implements several algorithmic and methodological advances to kQEq and provides an extendable framework for the development of ML charge equilibration models.

## I. INTRODUCTION

Atomistic machine learning (ML) methods and interatomic potentials, in particular, have had an enormous impact in the fields of molecular and materials simulation.^{1–5} One of the key innovations that made this possible was the idea of decomposing the total energy of a system into atomic contributions, which could be learned as a function of each atom’s chemical environment within a certain cutoff radius using Neural Networks (NNs)^{6} or Gaussian Process Regression (GPR).^{7} This locality assumption has enabled the construction of highly accurate, computationally efficient, and size-extensive potentials that approach first-principles accuracy at a fraction of the cost.^{8–11}

At the same time, locality ultimately limits the achievable accuracy of a potential, since information beyond the cutoff radius is not taken into account.^{12–14} Indeed, long-range interactions can be substantial in bulk systems, most prominently due to the Coulomb interaction, which decays slowly ($\sim 1/r$) with interatomic distance. These electrostatic interactions are often screened in practice, so that local potentials can still effectively describe polar solids and liquids with surprising accuracy.^{9,10,14} This cannot always be relied upon, however. For example, non-local charge transfer at heterogeneous interfaces or through molecular wires cannot be adequately captured in this manner.^{15} Similarly, the relative stability of molecular crystal polymorphs sensitively depends on a balance of long-ranged electrostatic and dispersion interactions, precluding a purely local description.^{16,17}

Due to these limitations, the inclusion of long-range interactions in ML potentials has been an active field of study, with several different approaches in use. These, for example, include the use of global or non-local descriptors.^{18–20} In many cases, physical baseline models can also provide the correct long-range physics at affordable computational cost (Δ-ML).^{16,17,21–23} Finally, message-passing neural networks can extend the range of local interatomic potentials by a multiple of the employed cutoff, although without including the full long-range interactions present in a periodic system.^{24}

In this contribution, we focus on approaches that tackle the problem of long-range electrostatics by describing the charge distribution of molecules or materials within the ML model itself (e.g., via partial charges). This has the advantage that different total charge states and the response to external fields can be incorporated rigorously. The simplest approach to this end is to directly learn suitable reference charges, e.g., from Hirshfeld decomposition.^{25,26} While this, in principle, affords a reasonable description of long-range electrostatics, it does not resolve the issue of non-local charge transfer since the charges are, in this case, themselves functions of a local ML model.

To overcome this, Goedecker and co-workers proposed the Charge Equilibration via Neural Network Technique (CENT),^{27–29} where the charges are obtained by minimizing a charge dependent energy function. Specifically, CENT uses the classical Charge Equilibration (QEq) model^{30} as a basis, replacing fixed elemental electronegativities with environment-dependent ones predicted by a NN. This approach was subsequently developed further by Behler, Goedecker, and co-workers into the Fourth Generation High Dimensional Neural Network Potentials (4GHDNNP).^{1,15,31} Here, CENT and local NN potentials are combined, and partial charges are fitted to reproduce those obtained from Hirshfeld partitioning.^{32} Similarly, Xie *et al.* reported a self-consistent NN potential, where charges are obtained through the gradient-based minimization of a coupled local and electrostatic energy function.^{33} Here, charges from the Becke population analysis were used as a reference for the partial charges.

Our recently reported Kernel Charge Equilibration (kQEq) method is in the same spirit as these approaches but uses Kernel ML instead of NNs.^{34} Kernel methods are frequently used for interatomic potentials, as they are highly data-efficient, depend on few hyperparameters, and can be trained through a closed-form linear algebra expression.^{4} Furthermore, kQEq avoids the ambiguity of charge partitioning schemes by training directly on electrostatic observables, such as the dipole moment.

In this paper, we introduce the q-pac Python package. q-pac provides a modular framework for implementing machine-learned charge equilibration methods, with a particular focus on kQEq. We review the kQEq methodology and describe several new algorithmic and methodological advances in q-pac. In particular, the Kernel Ridge Regression (KRR) approach of the original kQEq paper is replaced by a sparse GPR formulation, which provides better computational scaling of training and prediction as a function of the training set size. Furthermore, additional fitting targets and the possibility to fit multiple properties at the same time have been implemented. Notably, this includes energies, which allow the development of fully long-ranged ML interatomic potentials based on kQEq. Finally, some example applications of these new capabilities are showcased.

## II. THEORY

To provide a consistent account of the methodology, the kQEq working equations are rederived in this section, starting from the classical QEq approach of Rappe and Goddard.^{30,35} Differences and new features relative to the original implementation presented in Ref. 34 are highlighted where appropriate. Atomic units are used in all equations.

### A. Charge equilibration

The core idea of QEq^{30,35} and related methods is to define a simple energy expression that depends on the charge distribution within a system. The ground-state charge distribution for a given geometry is then obtained by minimizing this energy, under the constraint that the total charge is conserved.

The charge distribution in a molecule or solid is rigorously described by the electron density *ρ*(**r**) and the location of the nuclei. Since the electron density is a complex three-dimensional distribution, it is computationally convenient to work with a simplified representation such as atomic partial charges. To this end, we can split the total electron density into a reference density *ρ*_{0}(**r**) and a fluctuation term *δρ*(**r**), where the former is typically the superposition of electron densities of the corresponding isolated spherical atoms (see Fig. 1). Together with the corresponding nuclei, these atomic reference densities are charge neutral and, therefore, do not contribute to the long-range electrostatic interactions, leaving the fluctuation density as the object of interest.

To describe the charge distribution at atomic resolution, the fluctuation density is partitioned into atomic contributions,

$$\delta\rho(\mathbf{r}) = \sum_i^N \delta\rho_i(\mathbf{r}), \tag{1}$$

where *δρ*_{i}(**r**) is the local fluctuation density around atom *i*. This allows us to define partial charges as

$$q_i = -\int \delta\rho_i(\mathbf{r})\, d\mathbf{r}. \tag{2}$$

Like any partitioning of *δρ*(**r**), the partial charges are also to some extent arbitrary, although canonical choices like Hirshfeld partitioning exist.

Within charge equilibration methods, the total energy is written as

$$E = E_0 + E_{\mathrm{QEq}} = E_0 + \sum_i^N \left( \chi_i q_i + \frac{1}{2} J_i q_i^2 \right) + \frac{1}{2}\sum_i^N \sum_{j\neq i}^N \iint \frac{\delta\rho_i(\mathbf{r})\,\delta\rho_j(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}', \tag{3}$$

where *E*_{0} is a charge-independent reference energy. This could, e.g., be defined as the sum of the energies of the isolated neutral atoms, or it could stem from a local (charge-independent) interatomic potential. The second term, *E*_{QEq}, collects all charge-dependent terms and will be our primary focus in the following.

*E*_{QEq} can itself further be divided into two terms. The first of these is a site-energy term that sums over all *N* atoms *i* and represents the second-order Taylor expansion of the atomic energy with respect to the partial charges. In this context, the expansion coefficient *χ*_{i} is usually termed the electronegativity, while *J*_{i} is the electronic hardness. In the original QEq scheme, both of these coefficients are element-dependent parameters. The second term in *E*_{QEq} is the classical Coulomb energy of the fluctuation density.

To evaluate *E*_{QEq}, we now need to define a mathematical expression for the fluctuation density and its partitioning. Specifically, we will assume that the fluctuation density *δρ*(**r**) can approximately be expressed as a superposition of spherically symmetric atom-centered Gaussians. Each of these Gaussians is normalized to the corresponding atomic partial charge *q*_{i} according to Eq. (2) and has an inverse distribution width $\beta_i = 1/(\sqrt{2}\alpha_i)$, where *α*_{i} can be interpreted as an atomic radius. This leads to the expression

$$\delta\rho_i(\mathbf{r}) = -q_i \frac{\beta_i^3}{\pi^{3/2}} \exp\left(-\beta_i^2\,|\mathbf{r}-\mathbf{r}_i|^2\right), \tag{4}$$

where each Gaussian is centered on the corresponding atomic position **r**_{i}. Using this definition, the Coulomb integral in Eq. (3) can be evaluated analytically as

$$\iint \frac{\delta\rho_i(\mathbf{r})\,\delta\rho_j(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}' = q_i q_j\, \frac{\operatorname{erf}\!\left(r_{ij}/(\sqrt{2}\gamma_{ij})\right)}{r_{ij}}, \tag{5}$$

with $\gamma_{ij} = \sqrt{\alpha_i^2 + \alpha_j^2}$ and *r*_{ij} being the distance between atoms *i* and *j*. With this, *E*_{QEq} from Eq. (3) can be rewritten as

$$E_{\mathrm{QEq}} = \sum_i^N \left[ \chi_i q_i + \frac{1}{2}\left(J_i + \frac{1}{\alpha_i\sqrt{\pi}}\right) q_i^2 \right] + \frac{1}{2}\sum_i^N\sum_{j\neq i}^N q_i q_j\, \frac{\operatorname{erf}\!\left(r_{ij}/(\sqrt{2}\gamma_{ij})\right)}{r_{ij}}. \tag{6}$$

In this form, we can interpret *J*_{i} as a non-classical contribution to the hardness, while the classical contribution is given by the self-energy of the Gaussians.

To obtain the optimal charges *q*_{i}, *E*_{QEq} must now be minimized. Due to the chosen functional form of the site energy, this expression is quadratic in *q*_{i}, so that the optimal charges can be computed in closed form. To this end, we take the derivative of Eq. (6) with respect to each partial charge and set it to zero. This leads to the linear system of equations,

$$\frac{\partial E_{\mathrm{QEq}}}{\partial q_i} = \chi_i + \sum_j^N A_{ij}\, q_j = 0, \tag{7}$$

with *A*_{ij} being the elements of the hardness matrix **A** defined as

$$A_{ij} = \begin{cases} J_i + \dfrac{1}{\alpha_i\sqrt{\pi}} & i = j, \\[6pt] \dfrac{\operatorname{erf}\!\left(r_{ij}/(\sqrt{2}\gamma_{ij})\right)}{r_{ij}} & i \neq j. \end{cases} \tag{8}$$

Adding a Lagrange multiplier *λ* to conserve the total charge *q*_{tot}, we obtain a linear system of equations that can be expressed in matrix notation as

$$\bar{\mathbf{A}}\,\bar{\mathbf{q}} = \begin{pmatrix} \mathbf{A} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{pmatrix} \begin{pmatrix} \mathbf{q} \\ \lambda \end{pmatrix} = \begin{pmatrix} -\boldsymbol{\chi} \\ q_{\mathrm{tot}} \end{pmatrix} = \bar{\boldsymbol{\chi}}, \tag{9}$$

where **1** is a vector of ones and **q** and ***χ*** are *N*-dimensional arrays.

The charges (and *λ*) can now easily be computed as

$$\bar{\mathbf{q}} = \bar{\mathbf{A}}^{-1}\,\bar{\boldsymbol{\chi}}. \tag{10}$$

Finally, *E*_{QEq} can be expressed as

$$E_{\mathrm{QEq}} = \boldsymbol{\chi}^T\mathbf{q} + \frac{1}{2}\,\mathbf{q}^T\mathbf{A}\,\mathbf{q}. \tag{11}$$
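To make the procedure concrete, the constrained solve behind Eqs. (10) and (11) can be sketched in a few lines of NumPy. This is an illustrative stand-alone implementation, not the q-pac API; the diatomic geometry and the parameter values are arbitrary (atomic units assumed).

```python
import numpy as np
from math import erf, sqrt, pi

def qeq_charges(pos, chi, J, alpha, q_tot=0.0):
    """Charge equilibration for one structure: build the hardness matrix,
    solve the Lagrange-extended linear system, and return (q, E_QEq)."""
    chi, J, alpha = map(np.asarray, (chi, J, alpha))
    N = len(chi)
    A = np.zeros((N, N))
    for i in range(N):
        A[i, i] = J[i] + 1.0 / (alpha[i] * sqrt(pi))  # on-site hardness
        for j in range(i + 1, N):
            r = np.linalg.norm(pos[i] - pos[j])
            gamma = sqrt(alpha[i] ** 2 + alpha[j] ** 2)
            A[i, j] = A[j, i] = erf(r / (sqrt(2) * gamma)) / r
    # Extended system with a Lagrange multiplier enforcing sum(q) = q_tot
    A_bar = np.zeros((N + 1, N + 1))
    A_bar[:N, :N] = A
    A_bar[:N, N] = A_bar[N, :N] = 1.0
    chi_bar = np.concatenate([-chi, [q_tot]])
    q = np.linalg.solve(A_bar, chi_bar)[:N]
    E = chi @ q + 0.5 * q @ A @ q
    return q, E

# Hypothetical diatomic with unequal electronegativities
pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
q, E = qeq_charges(pos, chi=[0.3, -0.3], J=[1.0, 1.0], alpha=[1.0, 1.0])
```

As expected, the charges sum to the prescribed total charge, and the atom with the larger electronegativity acquires a negative partial charge.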

### B. Periodic boundary conditions

Up to this point (and in Ref. 34), we have only considered systems in open boundary conditions (i.e., isolated molecules in the gas phase). For periodic systems, Eqs. (10) and (11) can also be used. However, this requires using Ewald summation in the construction of the hardness matrix **A** in order to take the full long-range interactions in an infinite crystal into account.^{36} In particular, the Coulomb integral must be modified. The same implementation as in Ref. 31 was adopted here.

Ewald summation computes the Coulomb energy of *N* point charges in periodic boundary conditions by separating it into a real-space and a reciprocal-space contribution. To this end, each charge is embedded into an auxiliary Gaussian charge distribution of the opposite sign and width *η*, defined as^{37}

$$\eta = \frac{1}{\sqrt{2\pi}}\left(\frac{V^2}{N}\right)^{1/6}, \tag{12}$$

where *V* is the volume of the unit cell. Note that these auxiliary Gaussians are not to be confused with the ones defined in Eq. (4).

The screened short-range contribution *E*_{real} can be evaluated in real space as

$$E_{\mathrm{real}} = \frac{1}{2}\sum_i^N \sum_j^{N_{\mathrm{neig}}} q_i q_j\, \frac{\operatorname{erfc}\!\left(r_{ij}/(\sqrt{2}\eta)\right)}{r_{ij}}, \tag{13}$$

where the first sum runs over all *N* atoms *i* in the unit cell, and the second sum goes over all *N*_{neig} atoms *j* (including periodic replicas) within the cutoff distance *r*_{real} of atom *i*. The cutoff is derived from the width *η* of the auxiliary Gaussians and depends on the desired accuracy *ϵ*, which is a small positive number determined by the user (as in Ref. 31, we use a default value of *ϵ* = 10^{−8}). This yields the following expression for *r*_{real}:

$$r_{\mathrm{real}} = \sqrt{-2\ln\epsilon}\;\eta. \tag{14}$$

The long-range contribution is evaluated in reciprocal space as^{37,38}

$$E_{\mathrm{recip}} = \frac{2\pi}{V}\sum_{\mathbf{k}\neq\mathbf{0}} \frac{\exp\!\left(-\eta^2 k^2/2\right)}{k^2} \left|\sum_j^N q_j \exp(i\,\mathbf{k}\cdot\mathbf{r}_j)\right|^2, \tag{15}$$

where the sum runs over all reciprocal lattice vectors **k** within the cutoff *r*_{recip}, which is computed as

$$r_{\mathrm{recip}} = \frac{\sqrt{-2\ln\epsilon}}{\eta}. \tag{16}$$

Note that *r*_{recip} depends again on the user defined accuracy parameter *ϵ*.

Finally, subtracting the spurious self-interaction of the auxiliary Gaussians,

$$E_{\mathrm{self}} = \frac{1}{\sqrt{2\pi}\,\eta}\sum_i^N q_i^2, \tag{17}$$

yields the total Coulomb energy of *N* point charges in periodic boundary conditions,

$$E_{\mathrm{Ewald}} = E_{\mathrm{real}} + E_{\mathrm{recip}} - E_{\mathrm{self}}. \tag{18}$$

Since kQEq describes the fluctuation density with Gaussian charge distributions of width *α*_{i} instead of point charges, an additional correction term is required^{36}

$$E_{\mathrm{corr}} = \frac{1}{2}\sum_i^N \sum_j^{N_{\mathrm{neig}}} q_i q_j\, \frac{\operatorname{erf}\!\left(r_{ij}/(\sqrt{2}\gamma_{ij})\right) - 1}{r_{ij}} + \sum_i^N \frac{q_i^2}{2\alpha_i\sqrt{\pi}}. \tag{19}$$

The first term accounts for the difference between Gaussian and point charges within *r*_{real}, while the second term corresponds to the on-site contribution of the Coulomb integral, which is also present in the non-periodic case. The total charge-dependent Coulomb energy of the periodic system is then

$$E_{\mathrm{Coul}} = E_{\mathrm{Ewald}} + E_{\mathrm{corr}}, \tag{20}$$

which replaces the Coulomb term of Eq. (6).
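The point-charge part of this scheme can be illustrated with a compact NumPy implementation for a cubic cell. This is a didactic sketch of the real-space/reciprocal-space/self-energy split described above, not the optimized C++ routine used in q-pac; image and wave-vector enumeration are handled naively, and a charge-neutral cell is assumed.

```python
import numpy as np
from math import erfc, sqrt, pi, log

def ewald_energy(pos, q, L, eta, eps=1e-8):
    """Coulomb energy of point charges in a cubic box of side L
    (atomic units), as real-space + reciprocal-space - self terms."""
    pos, q = np.asarray(pos, dtype=float), np.asarray(q, dtype=float)
    N = len(q)
    r_real = sqrt(-2.0 * log(eps)) * eta   # real-space cutoff
    k_cut = sqrt(-2.0 * log(eps)) / eta    # reciprocal-space cutoff
    # Real-space sum over periodic images within r_real
    nmax = int(np.ceil(r_real / L)) + 1
    E_real = 0.0
    for n in np.ndindex(2 * nmax + 1, 2 * nmax + 1, 2 * nmax + 1):
        shift = (np.array(n) - nmax) * L
        for i in range(N):
            for j in range(N):
                d = np.linalg.norm(pos[i] - pos[j] + shift)
                if d < 1e-12 or d > r_real:
                    continue  # skip self term and beyond-cutoff pairs
                E_real += 0.5 * q[i] * q[j] * erfc(d / (sqrt(2) * eta)) / d
    # Reciprocal-space sum over k vectors with 0 < |k| <= k_cut
    V = L ** 3
    mmax = int(np.ceil(k_cut * L / (2 * pi)))
    E_recip = 0.0
    for m in np.ndindex(2 * mmax + 1, 2 * mmax + 1, 2 * mmax + 1):
        k = 2 * pi / L * (np.array(m) - mmax)
        k2 = k @ k
        if k2 < 1e-12 or k2 > k_cut ** 2:
            continue
        S = np.sum(q * np.exp(1j * pos @ k))  # structure factor
        E_recip += 2 * pi / V * np.exp(-eta ** 2 * k2 / 2) / k2 * abs(S) ** 2
    E_self = np.sum(q ** 2) / (sqrt(2 * pi) * eta)
    return E_real + E_recip - E_self
```

As a sanity check, the result must be independent of the auxiliary width *η*, and for a CsCl-type arrangement of two opposite unit charges it reproduces the corresponding Madelung energy.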

### C. Kernel charge equilibration

As described so far, conventional charge equilibration schemes like QEq require the definition of three parameters per element. These are the electronegativity (*χ*_{i}), the non-classical contribution to the hardness (*J*_{i}), and the atomic radius (*α*_{i}). In practice, this limits the achievable accuracy of QEq, since the same electronic properties are assumed for all atoms of the same element, independent of their chemical environment and oxidation state. ML based charge equilibration methods can overcome this limitation by allowing *χ*_{i} (and, in principle, also the other parameters) to adapt to the environment of each atom, e.g., via a NN.^{27,28}

In kQEq, the electronegativity of atom *i* is instead expressed as a kernel expansion in terms of a representation vector **p**_{i} encoding its chemical environment, a kernel function *k*, and regression weights *w*_{m} as

$$\chi_i = \sum_m^M w_m\, k(\mathbf{p}_i, \mathbf{p}_m), \tag{21}$$

where the sum runs over all atoms *m* in a representative set of chemical environments. Simply put, this equation thus assigns the electronegativity of atom *i* based on the similarity between *i* and each atom *m* in the representative set, as quantified by the kernel function *k*.

In the original kQEq implementation reported in Ref. 34, the representative set simply consisted of all chemical environments in the training set. In this case, the cost of predicting the electronegativities scales linearly with the number of training samples. Even worse, the training cost of such a Kernel Ridge Regression (KRR) model scales cubically with the number of training samples. The new implementation in q-pac, therefore, uses a different regression framework, namely sparse GPR. This is directly analogous to the approach used in Gaussian Approximation Potentials (GAP).^{4,7} Specifically, the representative set now consists of *M* environments (also called sparse points), which form a representative subset of the training set. As in GAP, these are selected through a CUR decomposition of the matrix of representation vectors.^{4,39}
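The sparse-point selection can be illustrated with a minimal leverage-score variant of CUR row selection. This generic sketch (not the q-pac implementation) scores each row of the representation matrix by its leverage in the top-*k* singular subspace and keeps the *M* highest-scoring environments; probabilistic CUR variants instead sample indices proportionally to these scores.

```python
import numpy as np

def cur_select(P, M, k=None):
    """Select M representative rows (environments) of P.

    P: (N, D) matrix with one representation vector per row.
    k: rank of the singular subspace used for scoring.
    Returns the indices of the M rows with the largest leverage scores.
    """
    if k is None:
        k = min(M, min(P.shape))
    U, _, _ = np.linalg.svd(P, full_matrices=False)
    leverage = np.sum(U[:, :k] ** 2, axis=1)  # per-row leverage score
    return np.argsort(leverage)[::-1][:M]

rng = np.random.default_rng(0)
P = rng.normal(size=(50, 8))     # stand-in for a matrix of SOAP vectors
idx = cur_select(P, 10)
```

In practice, the scored matrix contains the representation vectors of all training environments, and the selection is performed per element.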

As a representation of local chemical environments, Smooth Overlap of Atomic Positions (SOAP)^{40} vectors are used, as implemented in the DScribe package. As in GAP, polynomial kernels are used to quantify similarities between SOAP vectors,

$$k(\mathbf{p}_i, \mathbf{p}_j) = \left(\mathbf{p}_i \cdot \mathbf{p}_j\right)^{\zeta}, \tag{22}$$

where *ζ* = 2 is used as a default. SOAP vectors are normalized so that *k*(**p**_{i}, **p**_{i}) = 1.

For the simultaneous prediction of the electronegativities of *N* atoms, Eq. (21) can be rewritten as a matrix–vector multiplication,

$$\boldsymbol{\chi}_N = \mathbf{K}_{NM}\,\mathbf{w}_M, \tag{23}$$

following the notation of Ref. 4, where the subscripts indicate the dimensions of the respective arrays. **K**_{NM} is thus a matrix containing the evaluations of the kernel function between all *M* sparse points and all *N* environments to be predicted. The obtained electronegativities can then be used to predict charges via Eq. (10).

### D. Training on electrostatic properties

In principle, Eqs. (23) and (10) fully specify the kQEq method. However, this leaves the key question of what the regression weights *w*_{m} should be. In Ref. 34, we showed that these can be computed in closed form by solving a regularized least-squares problem, which was demonstrated by fitting kQEq models to molecular dipole moments. Importantly, training can only be performed as a single closed-form linear algebra operation because dipole moments are linear functions of the atomic partial charges. Additionally, the charges in QEq are linear functions of the electronegativities [see Eq. (10)], and the electronegativities are linear functions of the regression weights [see Eq. (23)]. Indeed, Kernel methods like KRR and GPR allow arbitrary linear transformations of the regression output in the loss function.

Specifically, the weights are obtained by minimizing a loss function of the form

$$L = \left(\mathbf{T}_t\bar{\mathbf{q}} - \mathbf{t}_{\mathrm{ref}}\right)^T \boldsymbol{\Sigma}_t^{-1} \left(\mathbf{T}_t\bar{\mathbf{q}} - \mathbf{t}_{\mathrm{ref}}\right) + \mathbf{w}^T\mathbf{K}_{MM}\,\mathbf{w}, \tag{24}$$

where **t**_{ref} is a general target property (e.g., atomic charges or dipole vector elements), and **T**_{t} is a transformation matrix that converts the Lagrange-multiplier expanded vector of charges $\bar{\mathbf{q}}$ [see Eq. (9)] to the target property. **Σ**_{t} is a diagonal regularization matrix that contains noise parameters $\sigma_i^2$, which are proportional to the regularization strength. Unlike the unit-less regularization parameter in the previous KRR implementation, *σ*_{i} has the unit of the predicted property and can be interpreted as the expected accuracy of the fit. Furthermore, the GPR framework allows assigning individual values of *σ*_{i} in order to weight training samples differently. For simplicity, a single value of *σ*_{i} is used for each of the examples below.

To train a model, we need to find the weight vector **w** which minimizes this loss function. To this end, Eq. (24) must be rewritten so that it only depends on **w**. Here, a technical difficulty arises, in that Eq. (23) yields a vector of electronegativities ***χ***, while Eq. (10) requires the extended vector $\bar{\boldsymbol{\chi}}$, which includes the total charge of the system *q*_{tot}. This transformation is achieved via an auxiliary matrix **X**,

$$\bar{\boldsymbol{\chi}} = \mathbf{X}\boldsymbol{\chi} + q_{\mathrm{tot}}\,\hat{\mathbf{e}}_{N+1}, \qquad \mathbf{X} = \begin{pmatrix} -\mathbf{I}_N \\ \mathbf{0}^T \end{pmatrix}, \tag{25}$$

where $\hat{\mathbf{e}}_{N+1}$ is the unit vector corresponding to the Lagrange-multiplier row. Combining Eqs. (10), (23), and (25), the expanded charge vector becomes a linear (affine) function of the weights,

$$\bar{\mathbf{q}} = \bar{\mathbf{A}}^{-1}\left(\mathbf{X}\,\mathbf{K}_{NM}\,\mathbf{w} + q_{\mathrm{tot}}\,\hat{\mathbf{e}}_{N+1}\right). \tag{26}$$

Inserting Eq. (26) into Eq. (24) yields a loss function in which the choice of **T**_{t} determines the targeted property **t** and which otherwise only depends on **w**. The optimal weights are obtained by taking the derivative $\nabla_{\mathbf{w}} L$ (which is linear in **w**), setting it to zero and solving for **w**,

$$\mathbf{w} = \left(\mathbf{D}^T\boldsymbol{\Sigma}_t^{-1}\mathbf{D} + \mathbf{K}_{MM}\right)^{-1}\mathbf{D}^T\boldsymbol{\Sigma}_t^{-1}\left(\mathbf{t}_{\mathrm{ref}} - \mathbf{t}_0\right), \tag{27}$$

with $\mathbf{D} = \mathbf{T}_t\bar{\mathbf{A}}^{-1}\mathbf{X}\mathbf{K}_{NM}$ and $\mathbf{t}_0 = q_{\mathrm{tot}}\,\mathbf{T}_t\bar{\mathbf{A}}^{-1}\hat{\mathbf{e}}_{N+1}$.

Note that for simplicity, all expressions provided herein assume a single kQEq problem with *N* atoms (i.e., one simulation cell or molecule). In practice, models are trained on multiple systems simultaneously using blocked matrices and concatenated vectors, which then naturally allow the use of systems with varying numbers of atoms *N* in the training set.
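The closed-form weight solve can be sketched generically as follows. Here, `D` stands for the full linear map from weights to the predicted target (containing the charge-equilibration solve and the transformation to the target property) and `t0` for the constant offset; both are random stand-ins in this example, which simply checks that noise-free synthetic targets are recovered.

```python
import numpy as np

def solve_weights(D, t_ref, t0, K_MM, sigma):
    """Minimize ||D w + t0 - t_ref||^2 / sigma^2 + w^T K_MM w in closed form."""
    Sinv = np.eye(len(t_ref)) / sigma ** 2      # diagonal noise matrix inverse
    lhs = D.T @ Sinv @ D + K_MM
    rhs = D.T @ Sinv @ (t_ref - t0)
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(2)
N, M = 40, 5
D = rng.normal(size=(N, M))
K_MM = np.eye(M)                                # stand-in kernel matrix
w_true = rng.normal(size=M)
t0 = rng.normal(size=N)
t_ref = D @ w_true + t0                         # noise-free synthetic targets
w = solve_weights(D, t_ref, t0, K_MM, sigma=1e-6)
```

With a small noise parameter the data term dominates the regularizer, and the generating weights are recovered essentially exactly.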

### E. Training on energies

As discussed in the introduction, one of the main motivations for developing ML-based charge equilibration models is the development of interatomic potentials with full long-range electrostatics. To this end, training on reference charges (e.g., from Hirshfeld partitioning) can yield a reasonable description of long-range interactions. However, population analysis schemes are, in general, not optimal for this purpose, as the charges usually yield quantitatively incorrect electrostatic properties. More critically, the energy *E*_{QEq} is in our experience rather unphysical when only training on charges or dipole moments. This is due to the fact that the energy expression is only a latent quantity in this case (yielding the appropriate charges upon minimization), which bears no relation to the real potential energy surface. As a consequence, the on-site energy contributions can be large and overly sensitive to small geometric changes, making *E*_{QEq} a poor basis for an interatomic potential.

One way to overcome this issue is to simply ignore the site-energy term in the interatomic potential. This is the approach taken in the 4GHDNNPs mentioned above.^{15} However, this has the downside that the corresponding potentials are not self-consistent, in the sense that their charges do not minimize the energy. The other alternative is to explicitly include energies in the loss function so that *E*_{QEq} takes physical information about the potential energy surface into account.

Unfortunately, fitting energies is not entirely straightforward within the kQEq framework. This is because *E*_{QEq} is not linear in the charges, which means that a closed-form solution for the optimal regression weights does not exist. However, for a given set of charges, the energy *is* linear in the electronegativities. q-pac, therefore, includes a form of fixed-point iteration to obtain accurate self-consistent energy models.

Specifically, we use an arbitrary set of initial charges **q**^{0} (e.g., from Hirshfeld partitioning) and train electronegativities *χ*^{1} that yield optimal energies for these charges. Subsequently, we predict the self-consistent charges **q**^{1} corresponding to *χ*^{1}. In general, there is a large difference between **q**^{0} and **q**^{1}, so the self-consistent energies of this kQEq model will be rather inaccurate. However, iteratively restarting this process usually yields significant improvements, in that the self-consistent charges **q**^{t} corresponding to *χ*^{t} quickly converge toward the ones used to fit the energies (**q**^{t−1}). In pathological cases, the loss function can be expanded to include a bias toward the charges from the previous iteration, further aiding convergence. Here, a practical approach is to begin training on energies alone until the energy fitting error starts to increase. The charge bias can then be added with an initially large regularization parameter $\sigma_q^2 = 0.01$ e (corresponding to a small weight of charges in the loss function), which is subsequently decreased by a factor of 0.5 at each iteration.
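The fixed-point scheme can be illustrated on a toy problem: diatomic "structures" with a single learnable electronegativity difference *d* = *χ*_A − *χ*_B. For each structure with effective hardness *κ*, the self-consistent charge is *q* = −*d*/*κ* and the energy is *E* = *d q* + ½*κq*². Starting from the charges of a perturbed model, alternating the linear energy fit (charges held fixed) with the charge update converges rapidly to the reference parameters. This is a deliberately simplified stand-in for the q-pac procedure; all numbers are synthetic.

```python
import numpy as np

# Effective hardnesses of three toy diatomics and reference data
kappa = np.array([2.0, 2.5, 3.0])
d_true = 1.2                          # reference chi_A - chi_B
E_ref = -0.5 * d_true ** 2 / kappa    # self-consistent reference energies

def fit_d(q):
    """Least-squares fit of d to the energies for FIXED charges q:
    E_s(d) = d * q_s + 0.5 * kappa_s * q_s^2 is linear in d."""
    target = E_ref - 0.5 * kappa * q ** 2
    return (q @ target) / (q @ q)

# Start from the charges of a perturbed model and iterate to self-consistency
d = 1.5 * d_true
history = []
for _ in range(10):
    q = -d / kappa      # self-consistent charges for the current model
    d = fit_d(q)        # refit the energies with these charges held fixed
    history.append(d)
```

For this toy model the update map works out to d → (d_true² + d²)/(2d), i.e., Newton's iteration for the square root, which converges quadratically; in realistic settings, the charge-bias term plays the role of a damping that keeps the iteration in the convergent regime.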

By monitoring the energy fitting error and the charge differences between two iterations, optimal regression weights for a kQEq interatomic potential can be selected. The typical convergence of charges and energies in this process is illustrated for a set of ZnO nanoparticles in Fig. 2 (see below for details on the dataset). This shows that even without the charge bias, energies and charges converge well, with the differences between self-consistent and fitting charges being below 10^{−3} electron charges.

### F. Atomic forces

To use kQEq models as interatomic potentials, atomic forces are required, i.e., the negative derivatives of the total energy with respect to the atomic positions **r**_{j}. While the derivative of Eq. (6) with respect to **r**_{j} is straightforward to compute, a complication arises because both the charges **q** and the electronegativities ***χ*** also depend on **r**_{j}. Here, the use of self-consistent charges is beneficial because, by definition, *E*_{QEq} is stationary with respect to any charge-conserving variation of the charges. The terms involving the derivatives of **q** therefore vanish, and the force on atom *j* can be expressed as

$$\mathbf{F}_j = -\frac{\partial E_{\mathrm{QEq}}}{\partial \mathbf{r}_j} = -\frac{1}{2}\,\mathbf{q}^T\frac{\partial\mathbf{A}}{\partial\mathbf{r}_j}\,\mathbf{q} - \sum_i^N q_i\, \frac{\partial\chi_i}{\partial\mathbf{r}_j}, \tag{28}$$

where the last term contains the derivatives of the electronegativities with respect to **r**_{j}. According to Eq. (21), this term only requires taking the derivatives of the kernel function and the SOAP vectors, the latter of which are obtained through DScribe.
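The fact that the charge derivatives drop out for self-consistent charges can be verified numerically: the analytic derivative evaluated at fixed charges matches a finite difference of the fully re-equilibrated energy. A self-contained diatomic sketch (with fixed, position-independent electronegativities, so that only the hardness-matrix term survives; all parameters arbitrary, atomic units):

```python
import numpy as np
from math import erf, exp, sqrt, pi

# Toy diatomic QEq model: Gaussian-screened Coulomb, neutral total charge
chi = np.array([0.3, -0.3])
J = np.array([1.0, 1.0])
alpha = np.array([1.0, 1.0])
gamma = sqrt(alpha[0] ** 2 + alpha[1] ** 2)
diag = J + 1.0 / (alpha * sqrt(pi))          # on-site hardness terms

def energy(r):
    """Self-consistent E_QEq for a diatomic at distance r (q1 = -q2 = q)."""
    A12 = erf(r / (sqrt(2) * gamma)) / r
    kappa = diag[0] + diag[1] - 2.0 * A12    # effective hardness
    q = (chi[1] - chi[0]) / kappa            # minimizer with sum(q) = 0
    return (chi[0] - chi[1]) * q + 0.5 * kappa * q ** 2, q

def analytic_force(r):
    """-dE/dr at FIXED self-consistent charges (charge derivatives vanish)."""
    _, q = energy(r)
    c = sqrt(2) * gamma
    dA12 = (2.0 / (sqrt(pi) * c)) * exp(-(r / c) ** 2) / r - erf(r / c) / r ** 2
    # The Coulomb cross term in E is q1*q2*A12 = -q^2 * A12
    return q ** 2 * dA12
```

Comparing `analytic_force(r)` with a central finite difference of `energy(r)` (which re-equilibrates the charges at each displaced geometry) confirms the variational argument to numerical precision.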

### G. Technical aspects

q-pac is implemented as an object-oriented library using Python 3.9. It heavily relies on numpy^{41} for array operations and linear algebra, ase^{42,43} for representing structural data and running atomistic simulations, and DScribe^{37} for calculating SOAP vectors and their derivatives. The Ewald summation portion of the code is written in C++, whose types are exposed to Python via pybind11.^{44}

## III. RESULTS

### A. Effect of sparsification

The main algorithmic advance in q-pac relative to the original kQEq implementation is the use of sparse GPR instead of KRR for predicting electronegativities. In order to demonstrate the benefit of this change, a series of dipole moment prediction models were trained, similar to Ref. 34. Specifically, 35 000 randomly selected molecules from the QM9 database were used, spanning a variety of small organic molecules containing C, H, O, N, and F.^{45} The corresponding reference dipole moments were computed at the PBE0/def2-TZVP level using ORCA.^{46–48} From this, 1000 molecules each were randomly drawn as validation and test sets, while differently sized training sets were randomly drawn from the remaining 33 000 molecules.

As described in Sec. II, multiple hyperparameters need to be defined for any kQEq model. Following Ref. 34, the non-classical atomic hardness *J*_{i} was set to 0 for all elements. Atomic radii *α*_{i} for all elements are tabulated in the original QEq paper.^{30} In our previous work, we found it beneficial to scale these since they are not necessarily ideal for the Gaussian charge distributions used in q-pac. In this paper, all radii are globally scaled by $1/\sqrt{2}$, which yielded robust models in all cases we considered.

The regularization strength *σ*_{μ} was optimized for each training set using a grid search on the validation set error. The main SOAP hyperparameter to be chosen is the cutoff radius *r*_{cut}, which was set to 4.4 Å. The remaining SOAP hyperparameters are discussed in the supplementary material, together with results for smaller values of *r*_{cut}.

To establish the accuracy of the sparse GPR approach, Fig. 3 shows the mean absolute error (MAE) in predicted dipole moments for different training sets as a function of the number of sparse points *M* (per element). For comparison, the dashed lines in these figures show results for the corresponding full GPR models, where *M* includes all chemical environments in the training set. This is equivalent to the KRR models used in the original kQEq paper.^{34} For training sets of 3000 and 10 000 molecules, the accuracy of the full GPR is reached with *M* = 3000 (0.12 and 0.1 D, respectively). Note that for the training set of 20 000 molecules, the full model was not computed due to prohibitively high memory requirements. With sparse GPR, this is not an issue, however, allowing even lower MAEs.

In Fig. 4, the evolution of test set errors and computational costs for training and prediction with the number of training molecules is shown. Here, it should be emphasized that the number of sparse points is given in terms of chemical environments, whereas the training set size is given in terms of the number of molecules. Sparse models with 3000 sparse points match the accuracy of the full models across all training set sizes, whereas the *M* = 1000 models deviate from the full results for training sets larger than 1000 molecules. As mentioned above, the full models become computationally prohibitive for the largest training set of 20 000 molecules.

This can also be seen from the timings in Fig. 4, which clearly show that the complete training process (including evaluation of the validation set for hyperparameter tuning) scales much more steeply with the training set size for the full model. This is in line with the expected asymptotic scalings of $O(N3)$ and $O(N)$ for the full and sparse models, respectively, although the sparse models are not yet in the linear regime in this plot. In concrete terms, training and validation of the full models took an average of 2.1 h (see supplementary material for hardware details) for 10 000 training molecules. In comparison, the *M* = 3000 model was six times faster, reaching nearly identical accuracy in 0.35 h. With 1000 sparse points per element, the same results were produced in 0.24 h.

The difference between sparse and full GPR models is even more striking when looking at the prediction times. Here, the sparse models scale as $O(1)$, while the full models scale as $O(N)$. This clearly has significant implications for models with large training sets and/or a large number of required predictions, where sparse kQEq models can potentially be one to two orders of magnitude more efficient. Beyond this substantial acceleration, the memory requirements of training sparse models are also much smaller.

### B. Interatomic potentials for isolated systems

To illustrate the capabilities of kQEq for fitting potential energy surfaces (PES) of ionic materials, we developed an ML potential for a set of ZnO nanoparticles taken from the global optimization study of Chen and Dixon.^{49} Specifically, a set of 98 low-energy structures containing between 62 and 264 zinc and oxygen atoms was used. Reference energies and Hirshfeld charges were computed with FHI-Aims at the PBE level using *tight* basis sets and integration settings.^{50,51}

Note that the choice of a predominantly ionic material like ZnO allows fitting accurate interatomic potentials using kQEq alone. Although bonding in ZnO is not purely ionic but contains a partial covalent character (in addition to other interactions like dispersion and exchange repulsion), the use of environment-dependent electronegativities allows kQEq models to indirectly take such effects into account. This works well for systems where non-ionic interactions are not dominating and of low body-order. In the more general case, such interactions should be treated separately.

The original dataset of Chen and Dixon exclusively consists of locally relaxed structures. In order to also test kQEq potentials on non-equilibrium structures, active learning was used to augment the dataset.^{4,52} Specifically, the initial kQEq interatomic potential (trained on the relaxed structures) was used to generate new configurations via MD simulations. From these, a diverse subset was selected via CUR decomposition, evaluated by the reference DFT method, and added to the training set. This active learning loop was repeated until the energy RMSE on newly generated structures converged.

The corresponding MD simulations were run through the Atomic Simulation Environment (ASE)^{42,43} calculator implemented in q-pac. Langevin dynamics were performed at 300 K with a 0.5 fs time step and a friction coefficient of 0.01. Before each production run, structures were reoptimized with the BFGS algorithm, followed by a 1 ps equilibration run. To account for the increasing accuracy of the interatomic potentials as a function of the active learning iterations, the length of the production runs was incrementally increased. Specifically, the initial simulation time was set to 0.2 ps and increased by a factor of two in each following iteration. Accordingly, the number of configurations added to the training set was also increased in each iteration, starting with 50 configurations.

In terms of model hyperparameters, the regularization for the energy term was set to $\sigma_E^2 = 0.01$ eV throughout. As discussed in the methods section, a charge bias term was added in later fixed point iterations to ensure convergence, although this was usually not necessary (see Fig. 2). The hardness parameters *J*_{i} were set to 27.21 eV (1.0 Ha) for Zn and O, and the SOAP cutoff was set to 3.0 Å. A full table of hyperparameters is provided in the supplementary material. Note that for this proof-of-principle application, no hyperparameter optimization was performed. It is, therefore, likely that even better performance could be achieved in principle.

The results of the active learning iterations are shown in Fig. 5. Here, the energy RMSE for new MD configurations is shown for each iteration, along with the number of configurations added to the training set. The RMSE quickly drops from 39.6 meV/atom for the initial model to 1 meV/atom in iteration 2 and 0.8 meV/atom in the final iteration.

Beyond this good energetic accuracy, it is also of interest to consider the charge distributions that are learned by this potential. In Fig. 6 (top), the correlation between Hirshfeld and kQEq charges is shown. This reveals that kQEq charges are somewhat larger than Hirshfeld ones, with average charges of ±0.54 and ±0.37, respectively. This is consistent with the known tendency of Hirshfeld charges to underestimate charge transfer in polar systems, which is related to the use of neutral isolated atom densities to define the partitioning.^{53,54}

Notably, the variation of the charges displays an inverse correlation between kQEq and Hirshfeld, in particular for the oxygen atoms. Here, the least negative atoms in kQEq are the most negative in Hirshfeld and vice versa. This can be understood by considering the individual cases displayed in the lower part of Fig. 6, which reveals that Hirshfeld charges tend to be more extreme for undercoordinated edge and corner atoms, while kQEq charges are less extreme for these atoms. This is again consistent with the way the respective charges are calculated. In Hirshfeld population analysis, the electron density is spatially partitioned according to a stockholder scheme where the electron densities of isolated neutral atoms are used to define the respective weights. Here, lower coordination environments mean that the density is partitioned between fewer partners. Meanwhile, kQEq charges minimize an electrostatic energy expression. Here, more extreme charges can be stabilized through electrostatic interactions in higher coordination environments.

Overall, partial charges are, in general, somewhat arbitrary, so no strong conclusions in favor of either model can be drawn from the charges alone. Here, we can consider dipole moments as an observable instead. While dipole interactions are, in principle, included in energies, they only dominate in the long-range interactions between clusters. Since we only train on isolated clusters, it should not be expected that these effects are learned by the current model. Indeed, while kQEq dipole elements correlate reasonably well with the DFT reference, they deviate rather strongly in absolute terms (see Fig. S6). Notably, the Hirshfeld charges obtained from DFT calculations actually reproduce the dipole of the particles surprisingly well in this case, as shown in Fig. S7. This indicates that combined loss functions with energies, charges, and/or dipole moments could be used to obtain improved electrostatic properties. Importantly, the kQEq charges are optimized for producing an interatomic potential for these clusters, whereas the Hirshfeld charges are not.

So far, we have only mentioned the non-classical atomic hardness *J* as a hyperparameter without discussing its effects in detail. Qualitatively speaking, larger values of *J* have the effect of counteracting charge transfer. To demonstrate how this affects the predicted charges and energies, 25 additional models were trained on the ZnO particle dataset, with different combinations of hardness parameters for Zn and O (see Fig. S8). This reveals that higher values of *J* lead to smaller absolute values of the charges and also decrease the fluctuations for the individual charge types. This is also reflected in the correlations between DFT and kQEq dipole moments (see Fig. S6), where models using smaller hardness parameters are better at reproducing the DFT dipole moments, likely because smaller hardness allows charges to be distributed more freely. Meanwhile, the energy RMSEs change less than 0.3 meV per atom between the best and worst models, underscoring that the energy is rather insensitive to this parameter.
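The charge-suppressing role of *J* can be seen in a minimal two-atom toy model. The sketch below uses the generic QEq energy with point-charge Coulomb interactions in atomic units (not the Gaussian-charge kernel used in q-pac, and with purely illustrative parameter values): for a neutral dimer with charges (+*q*, −*q*), the energy is quadratic in *q* and can be minimized analytically, showing directly that larger hardness shrinks the equilibrated charge.

```python
def qeq_two_atoms(chi1, chi2, J1, J2, r):
    """Analytic charge transfer for a neutral two-atom QEq toy model.

    With charges (+q, -q), point-charge Coulomb interaction, and atomic
    units, the energy is
        E(q) = (chi1 - chi2) * q + 0.5 * (J1 + J2) * q**2 - q**2 / r,
    and setting dE/dq = 0 yields the minimizer below (valid for
    J1 + J2 > 2 / r, where the energy is convex).
    """
    return -(chi1 - chi2) / (J1 + J2 - 2.0 / r)

# Electronegativity difference drives charge transfer; hardness opposes it.
for J in (5.0, 10.0, 20.0):
    q = qeq_two_atoms(chi1=-2.0, chi2=2.0, J1=J, J2=J, r=3.5)
    print(f"J = {J:5.1f}  ->  q = {q:+.4f}")
```

The printed charges decrease monotonically in magnitude with growing *J*, mirroring the trend observed for the 25 ZnO models.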

### C. Interatomic potentials for periodic systems

As described in the methods section, q-pac also allows the development of models for periodic systems. To demonstrate this, we fit an interatomic potential for a range of bulk structures with the stoichiometries ZnO and ZnO_{2}.

An initial set of crystals was obtained from Materials Project^{55} (16 for ZnO and 9 for ZnO_{2}, see supplementary material). This set was augmented by creating random neutral vacancy defect pairs (i.e., removing one O and one Zn atom) in different supercells of each crystal, yielding 20 configurations for each supercell. Here, the supercells were used to sample different defect densities and were chosen in order to still allow reference DFT calculations for all cells (i.e., containing less than 400 atoms, see supplementary material for details). Additionally, non-equilibrium structures were generated by randomly perturbing the atomic positions with Gaussian noise, yielding 20 additional structures per supercell (half of them perturbed with *σ* = 0.05 Å, the other half with *σ* = 0.1 Å). This led to a total of 1025 structures with supercells ranging from 52 to 384 atoms, for which reference calculations were performed at the PBE level using FHI-Aims with *light* basis set and integration settings.
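The augmentation procedure can be sketched as follows. This is a minimal numpy illustration with hypothetical helper names (`make_vacancy_pair`, `rattle`); the actual workflow operates on full crystal structures with periodic cells and passes the resulting configurations to DFT reference calculations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def make_vacancy_pair(positions, species):
    """Remove one random Zn and one random O atom (a charge-neutral defect pair)."""
    zn = rng.choice(np.flatnonzero(species == "Zn"))
    ox = rng.choice(np.flatnonzero(species == "O"))
    keep = np.delete(np.arange(len(species)), [zn, ox])
    return positions[keep], species[keep]

def rattle(positions, sigma):
    """Perturb atomic positions with isotropic Gaussian noise (sigma in Angstrom)."""
    return positions + rng.normal(scale=sigma, size=positions.shape)

# Toy 4-atom stand-in for a ZnO supercell:
species = np.array(["Zn", "O", "Zn", "O"])
positions = rng.random((4, 3)) * 3.0
def_pos, def_species = make_vacancy_pair(positions, species)
rattled = rattle(positions, sigma=0.05)
```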

In order to train the kQEq model, a training set was generated from this data by randomly drawing 150 configurations each from the pristine and defective ZnO structures. Similarly, 80 configurations each were drawn from the pristine and defective ZnO_{2} sets. The remaining 606 structures were used as an unseen test set. The same kQEq and SOAP hyperparameters as for the non-periodic structures were used.

The corresponding model displays an RMSE of 4.1 meV/atom on the test set. This is quite satisfactory, given that the test set covers an energy span of nearly 1 eV/atom (see Fig. 7). Importantly, the potential is able to fit ZnO and ZnO_{2} on an equal footing, since it can describe different oxidation states of Zn through environment-dependent electronegativities.

This is also reflected in Fig. 8, which shows the correlation between Hirshfeld and kQEq charges for the test set. Both approaches yield distinct clusters for the ZnO and ZnO_{2} charges, with the charges again being larger in magnitude for kQEq (e.g., ±0.55 vs ±0.35 for ZnO). Notably, the Hirshfeld oxygen charges display a strikingly large variation for ZnO_{2} and are even positive in some cases, whereas the corresponding kQEq charges are consistently negative and display much smaller variance.

To illustrate the corresponding charge distributions, three simulation cells for a tetraauricupride-structured ZnO polymorph (MP-ID 13161) are shown in Fig. 8. As a reference, Hirshfeld and kQEq charge distributions for the relaxed supercell are shown in the left column. In the center, a rattled configuration is shown. kQEq and Hirshfeld charges display a qualitatively similar response to this perturbation. Finally, the right frame shows a structure with a vacancy pair. Here, an O atom is missing on the bottom right, and a Zn atom is missing on the bottom left. Interestingly, the response to the O vacancy is almost identical between the kQEq and Hirshfeld charges, with the adjacent O atoms being more negative and the adjacent Zn atoms being less positive, consistent with an overall charge neutral defect. Meanwhile, the response for the Zn vacancy is similar for the adjacent Zn atoms (which become more positive) but different for the adjacent oxygen atoms. Here, the corresponding Hirshfeld charges are slightly more negative than average, while the kQEq charges are slightly less negative. Nevertheless, the agreement between these methods in terms of the electronic localization of the defect is rather good, given that both methods are based on very different premises.

Since we present models for zinc oxides targeting either isolated clusters or periodic structures, it is natural to ask whether the models are transferable from one case to the other. This is not quite straightforward to test for the reported models, since the bulk structures feature both ZnO and ZnO_{2} compositions, whereas the clusters all display the stoichiometry ZnO. Additionally, slightly different DFT settings were used to train both models.

To explore the transferability between bulk and cluster models, we therefore retrained the periodic model using only ZnO compositions and DFT settings consistent with the isolated-cluster case. Additionally, a new model was trained on both bulk and cluster configurations. Figure S5 shows the transferability of these three models. It reveals that the predictions of the bulk model are systematically too high in energy and display significant scatter for cluster configurations. In the reverse case, even stronger deviations are observed, with some configurations predicted to be several eV too high in energy. Nonetheless, when a model is trained on both types of configurations, excellent predictive performance is achieved on the test set. This indicates that the lack of transferability is not due to the simultaneous inclusion of periodic and cluster configurations but rather due to inherent differences between the two sets. For example, the bulk set contains defective and randomly perturbed structures, while the clusters feature hollow nanospheres and tubes with atomic environments that differ strongly from the bulk case.

### D. Forces

As for other interatomic potentials, forces can in principle also be used as fitting targets for kQEq. This feature is not yet implemented in q-pac, however, mainly due to the increased memory demand of fitting on forces as opposed to energies; we are actively working to overcome this limitation. Nevertheless, it is already of interest to consider how accurate forces are for kQEq models trained on energies.

As described above, the model for ZnO particles was trained on molecular dynamics configurations in an active learning setup. Thanks to these non-equilibrium structures, the model has sufficient information about the shape of the potential energy surface to predict forces with reasonable accuracy (RMSE = 0.17 eV/Å, see Fig. S3). In contrast, the potential for bulk ZnO and ZnO_{2} structures is less suitable for dynamical calculations due to overall larger force errors (see Fig. S4). Here, the training configurations were created by randomly removing atoms from crystalline supercells (without further relaxation) or by randomly perturbing atomic positions. Both of these procedures lead to rather unphysical structures and large forces. In practice, active learning would likely also enable better force predictions for the bulk structures, however.
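Since forces are simply negative gradients of the predicted energy, force quality can always be probed for an energy-only model. The following generic finite-difference sketch illustrates this relation on a stand-in harmonic pair energy (`toy_energy` is purely illustrative, not the actual kQEq model):

```python
import numpy as np

def toy_energy(positions):
    """Stand-in pair energy, harmonic in the interatomic distance (eq. dist. 2.0)."""
    r = np.linalg.norm(positions[1] - positions[0])
    return 0.5 * (r - 2.0) ** 2

def numerical_forces(energy_fn, positions, h=1e-5):
    """Central finite-difference forces F = -dE/dx, one component at a time."""
    forces = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for k in range(3):
            plus, minus = positions.copy(), positions.copy()
            plus[i, k] += h
            minus[i, k] -= h
            forces[i, k] = -(energy_fn(plus) - energy_fn(minus)) / (2 * h)
    return forces

# Stretched dimer: the bond pulls the atoms back toward each other.
pos = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0]])
print(numerical_forces(toy_energy, pos))
```

For analytic models, autodiff or explicit gradients would of course be preferred in production; the finite-difference form is mainly useful as an independent consistency check.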

## IV. CONCLUSIONS

In this paper, we have introduced the q-pac package, which provides an efficient and general framework for fitting kQEq models. This is achieved through a new sparse GPR formulation of the kQEq method and the implementation of additional, generalizable fitting targets. Importantly, this allows kQEq interatomic potentials to be fitted for the first time, using a fixed-point iteration algorithm.

While the showcased applications demonstrate the functionality and accuracy of this approach, pure kQEq potentials are (much like the CENT method)^{27} limited to predominantly ionic materials like ZnO. In future work, we will explore hybrid potentials that combine short-ranged local interatomic potentials like GAP with kQEq, in order to obtain generally applicable potentials with full long-range interactions.

The modularity of q-pac will also allow for the development of ML charge equilibration models beyond the simple QEq energy function. This may become necessary for properly modeling the response of polarizable systems to external fields, where conventional QEq is known to be inadequate.^{56} However, there are some indications that the much larger flexibility of kQEq mitigates these problems to a large extent.^{34} Either way, q-pac provides the ideal testing ground for addressing such pathologies both in terms of the physics of charge equilibration and in terms of more advanced ML approaches.

## SUPPLEMENTARY MATERIAL

See the supplementary material for hyperparameters, a detailed description of ZnO data, and an additional analysis of predicted charges and forces.

## ACKNOWLEDGMENTS

We thank Jonas Finkler for his help with implementing electrostatic interactions for periodic systems.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Martin Vondrák**: Investigation (equal); Methodology (equal); Software (lead); Visualization (lead); Writing – original draft (equal). **Karsten Reuter**: Conceptualization (equal); Funding acquisition (lead); Writing – review & editing (equal). **Johannes T. Margraf**: Conceptualization (lead); Methodology (equal); Software (supporting); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The code and data for this paper are publicly available at https://gitlab.com/jmargraf/qpac.

## REFERENCES
