Tight-binding approaches, especially the Density Functional Tight-Binding (DFTB) and the extended tight-binding schemes, allow for efficient quantum mechanical simulations of large systems and long time scales. They are derived from *ab initio* density functional theory using pragmatic approximations and some empirical terms, ensuring a fine balance between speed and accuracy. Their accuracy can be improved by tuning the empirical parameters using machine learning techniques, especially when information about the local environment of the atoms is incorporated. As the significant quantum mechanical contributions are still provided by the tight-binding models, and only short-ranged corrections are fitted, the learning procedure is typically shorter and more transferable than predicting the quantum mechanical properties directly with machine learning without an underlying physically motivated model. As a further advantage, derived quantum mechanical quantities can be calculated from the tight-binding model without the need for additional learning. We have developed an open-source framework, the Tight-Binding Machine Learning Toolkit, which allows for the easy implementation of such combined approaches. The toolkit currently contains layers for the DFTB method and an interface to the GFN1-xTB Hamiltonian, but due to its modular structure and its well-defined interfaces, additional atom-based schemes can be implemented easily. We discuss the general structure of the framework, some essential implementation details, and several proof-of-concept applications demonstrating the prospects of the combined methods and the functionality of the toolkit.

## I. INTRODUCTION

Many recent technological advancements can be attributed, at least in part, to the discovery and development of novel functional materials. Computational simulation methods have become an important tool in the field of materials research in which the ability to explore configurational space and to accurately predict material properties is of paramount importance. *Ab initio* methods, such as Density Functional Theory (DFT), are commonplace as they offer a comparably high degree of accuracy and provide access to the electronic structure. However, their high computational cost makes them ill-suited to tackle problems requiring large time and size scales, such as molecular overlayer formation. While classical force field methods are able to reach the required scales at a reasonable cost, they do not provide access to electronic structure and are commonly unable to capture the full complexity of the potential energy surface; particularly when metals are involved. Semi-empirical methods are commonly referred to as “bridging” methods due to their ability to reach size and time scales that are off-limits to traditional *ab initio* methods while retaining electronic structure. Unfortunately, the parameter sets upon which such methods commonly rely require a non-trivial degree of effort to create and might suffer from limited transferability for complex systems. However, by leveraging the power of machine learning (ML), one could augment existing bridging methods to improve their transferability, reduce the complexity of creating parameter sets, and even introduce reactivity.

The field of machine learning has experienced steady growth over the past two decades and has recently exploded in popularity due to the accessibility afforded by frameworks, such as TensorFlow^{1} and PyTorch.^{2} Machine learning has been used with great success to tackle problems in a variety of fields, with chemistry, physics, and materials science being no exception. Several groups demonstrated successful and promising approaches for combining various levels of materials simulation methods with machine learning, such as in Refs. 3–9. We will omit an in-depth discussion of machine learning and its successes within the field of materials science in this paper, as detailed reviews^{10–12} on the subject already exist, and the purpose of this publication is not to demonstrate any novel methods but to provide an overview of the current capabilities of the Tight-Binding Machine Learning Toolkit (TBMaLT) software package.

Applications of machine learning to materials modeling can be loosely placed into one of three categories: (*i*) assistive, (*ii*) augmentative, and (*iii*) substitutive. Assistive implementations are those which do not directly modify the method to which they are applied but rather fulfill an ancillary role, such as aiding the creation of new parameters.^{13,14} Those which directly modify or replace parts of or add new components to their parent method are deemed to be augmentative; some examples being those replacing traditional parameterizations with neural networks or those offering alternative routes to the chemical Hamiltonian.^{13,15,16} Novel ML-based approaches that are not modeled on any standard simulation method are labeled as substitutive as they are used in place of traditional methods; methods that directly predict specific properties, or groups thereof, are also included in this category. While such categorization is entirely artificial and somewhat subjective, it does highlight the notion that many approaches are based on established methods. Thus, assistive and augmentative approaches could, in theory, make use of existing codes to reduce the work required to implement them. When working with non-ML-based augmentations, one may directly modify existing codes to add new functionality and methods. However, this is not always viable for ML-based augmentations.

While novel ML-augmented methods are an attractive prospect, they can require a disproportionately large degree of effort to fully implement when compared to their non-ML counterparts. This is because the majority of production-level simulation packages were never designed to accommodate machine learning, and thus lack associated features, such as batch operability and automatic gradients. Therefore, existing code cannot easily be reused to reduce the work required, and one cannot simply generate and test a new method without first having to create and test a framework within which the new method can be used. With the Tight-Binding Machine Learning Toolkit (TBMaLT),^{17} we would like to introduce an ML-capable framework that offers tested and documented, ready-to-use components to facilitate the rapid prototyping and development of new tight-binding models. It is hoped that the presence of such a framework will not only reduce the work required to implement, validate, and share new models but also aid in reducing duplicated efforts.

To keep the scope of the project to a reasonable size, the decision was made to limit the focus of TBMaLT exclusively to tight-binding type methods. Tight-binding theory [such as in Density Functional Tight-Binding (DFTB)^{18,19} or in Extended Tight-Binding (xTB)^{20,21}] was chosen because it lends itself well to the ideas of machine learning as it tends to focus on speed and scalability without giving up access to the electronic band structure. It is important to note that the purpose of this body of work is not to present a novel tight-binding model but rather to highlight the existence and capabilities of the TBMaLT framework.

## II. STRUCTURE AND DESIGN

The main driving force behind TBMaLT’s development was the common need for a testbed within which novel, commonly machine learning-based, tight-binding methods could be implemented without having to repeatedly write and test all the associated boilerplate code. Such a framework could, in addition to helping to reduce a potential source of duplicated effort, be employed to reduce the complexity associated with parameterizing currently existing models by automating much of the fitting process. The primary design philosophy behind this project emphasizes flexibility and usability over raw computational efficiency. TBMaLT’s intended use case deviates from traditional modeling packages in that it is intended to facilitate the rapid prototyping, development, and validation of new models rather than the production-level use of existing ones.

The TBMaLT framework was written in Python, a highly accessible language that lends itself well to rapid prototyping and is widely used in the field of machine learning.^{22} Critical machine learning functionality, such as automated analytical gradients, was provided through the use of the PyTorch package.^{2} PyTorch was selected over other common machine learning frameworks due to its ease of use, simple pythonic syntax, and flexible nature. TBMaLT has been written in a modular and plug-and-play manner. This reduces the complexity associated with implementing new models and keeps users from having to modify the source code to do so. This has the added benefit of allowing new components to be used beyond their initial scope, as components from different models could be arbitrarily combined to create new ones.

TBMaLT supports batch operability through the use of packed padding, which involves expanding a set of *n* rank *k* arrays to a common size and concatenating them into a single rank (*k* + 1) array. The padding values are commonly set to zero or one, depending on usage. This allows for operations to be vectorized and optimized for graphics processing unit (GPU)/single-instruction multiple data (SIMD) execution. The low memory efficiency of this method can commonly be offset via sparse tensors. However, it should be noted that the undeveloped nature of sparse tensors in PyTorch, at the time of writing, prevents their current use. Lorentzian and conditional broadening were used to alleviate the gradient stability issues associated with performing eigendecompositions on systems with degenerate eigenspectra, such as an equilibrated CH_{4} molecule.^{23,24} Conversion from the generalized to the standard eigenvalue problem was achieved via Cholesky decomposition as it produced more stable gradients than Löwdin orthogonalization. Self-consistency cycles, such as the self-consistent-charge (SCC) cycle of DFTB and xTB, are then performed (if applicable to the method), during which systems are sequentially culled from the batch as they converge to prevent over-convergence.
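The packing operation described above can be sketched as follows; this is a minimal stand-in assuming simple dense padding (TBMaLT's actual packing routine may differ in its interface):

```python
import torch

def pack(tensors, value=0.0):
    """Pad a list of rank-k tensors to a common shape and stack them
    into a single rank-(k+1) tensor, using `value` for the padding."""
    rank = tensors[0].dim()
    # Common shape: the maximum extent along each dimension.
    shape = [max(t.shape[d] for t in tensors) for d in range(rank)]
    out = torch.full((len(tensors), *shape), value)
    for i, t in enumerate(tensors):
        out[(i, *[slice(0, s) for s in t.shape])] = t
    return out

# Two overlap-like matrices of different sizes (e.g., from two molecules
# with differently sized basis sets).
a = torch.eye(2)
b = torch.eye(5)
batch = pack([a, b])  # single rank-3 tensor of shape (2, 5, 5)
```

Once packed, a whole batch of matrices can be fed through vectorized PyTorch operations in a single call.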

The default behavior in TBMaLT is to track the gradients during the SCC cycles using PyTorch’s autograd engine, resulting in accurate gradients with respect to the input features and accurate forces. All results presented in the main paper were obtained this way. Depending on the system sizes, however, the overhead caused by the gradient tracking might become substantial. Different strategies can be applied in TBMaLT to reduce this overhead. One might calculate the converged charges (at a given value of the input feature vector) for each system outside of the purview of PyTorch’s autograd engine, use those charges as input when building the charge-dependent Hamiltonians (with gradient tracking), and omit any further SCC cycles. Keeping those input charges constant for several epochs, as first implemented by Yaron and co-workers,^{13} helps to reduce the computational complexity associated with the gradient computation. Nevertheless, the input charges must be updated regularly (e.g., every tenth epoch as in Ref. 13) to take their change due to the changes in the input feature vector into account. Alternatively, in systems where one expects the converged charges to have only a weak dependence on the input feature vector (e.g., in mostly homogeneous systems), one might also carry out the external SCC cycles without the autograd feature for each input feature vector. In contrast to the previous approach, this yields continuous error functions without jumps, but the obtained gradients will always deviate from the exact ones due to the missing derivative of the converged charges with respect to the input feature vector. If the deviation is small, the optimization might, nevertheless, converge to a solution close to the one obtained with exact gradients (as demonstrated for rattled bulk silicon structures in Fig. S1).
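The detachment strategy can be illustrated with a toy self-consistency loop; the update rule and all quantities below are invented stand-ins for an actual SCC cycle:

```python
import torch

def scc_charges(theta, q0, n_iter=50):
    """Toy self-consistency loop: iterate q <- f(theta, q) to a fixed point.
    The update rule is an arbitrary contraction, standing in for an SCC step."""
    q = q0
    for _ in range(n_iter):
        q = 0.5 * q + 0.1 * torch.tanh(theta * q)
    return q

theta = torch.tensor(0.8, requires_grad=True)  # an optimizable model parameter
q0 = torch.ones(4)                             # initial guess for the charges

# Converge the charges outside of the autograd engine's purview ...
with torch.no_grad():
    q_conv = scc_charges(theta, q0)

# ... then build the charge-dependent quantity once, with gradient tracking,
# treating the converged charges as constants.  The resulting gradient misses
# the derivative of q_conv with respect to theta by construction.
energy = (theta * q_conv ** 2).sum()
energy.backward()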

In a typical workflow, a base Calculator is selected and initialized by supplying it with the requisite Feed objects. The Feed objects are optimizable entities that are responsible for supplying the necessary inputs to a given calculation. The Calculator is then tasked with interrogating these feeds and using the resulting data to compute various properties as and when they are requested. During initialization, the Calculator may also be provided with calculator-specific settings, such as those pertaining to charge mixing, finite temperature, and so on. Once initialized, the Calculator is given a target system, or batch thereof, and is then itself interrogated to retrieve the desired properties. These properties can then be compared to those computed via the reference method, and a loss can be calculated. The PyTorch autograd engine can then be invoked, and the resulting gradients are used to update the properties contained within some or all of the feeds. A schematic depiction of this process is provided for the SCC-DFTB case in Fig. 1. The method can be modified by selecting a different choice of Calculator or Feed objects. For example, the SCC-DFTB implementation displayed in Fig. 1 can be morphed into xTB by swapping out the Hamiltonian and overlap Feed objects. The modular nature of TBMaLT means that in many cases, the Feed objects can themselves be modified or otherwise augmented to change the nature of the method.
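The interplay between feeds, calculator, loss, and autograd can be mocked up in a few lines. The classes and method names below are purely illustrative and do not correspond to TBMaLT's actual API; they only demonstrate the pattern of gradients flowing back into an optimizable feed:

```python
import torch

class GaussianOverlapFeed:
    """Illustrative feed: supplies an overlap-like matrix built from one
    optimizable width parameter (NOT the real TBMaLT feed interface)."""
    def __init__(self, width):
        self.width = torch.tensor(width, requires_grad=True)

    def matrix(self, positions):
        r = torch.cdist(positions, positions)      # interatomic distances
        return torch.exp(-(r / self.width) ** 2)   # Gaussian overlap model

class ToyCalculator:
    """Interrogates its feed and computes a requested (scalar) property."""
    def __init__(self, overlap_feed):
        self.overlap_feed = overlap_feed

    def property_sum(self, positions):
        return self.overlap_feed.matrix(positions).sum()

feed = GaussianOverlapFeed(width=1.5)
calc = ToyCalculator(feed)
positions = torch.tensor([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

# Compare the computed property to a (fake) reference value and backpropagate;
# the gradient lands on the optimizable parameter inside the feed.
loss = (calc.property_sum(positions) - 3.0) ** 2
loss.backward()
```

An optimizer would then update `feed.width` from its accumulated gradient, exactly as sketched for the SCC-DFTB case in Fig. 1.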

## III. METHODOLOGY

The TBMaLT framework currently offers two different tight-binding approaches: the DFTB and the xTB methods. In the following, we give a concise summary of these two formalisms in order to supply the necessary theoretical background for the applications described in Sec. IV.

### A. DFTB

The density functional tight-binding method^{18,19} is an approximation to density functional theory (for a review, see Ref. 25). It starts from the Kohn–Sham ansatz, where the electron density *ρ*(**r**) is represented as a sum of independent particle densities,

$$\rho(\mathbf{r}) = \sum_i f_i \, |\psi_i(\mathbf{r})|^2, \tag{1}$$

where the sum runs over all one-electron states and *f*_{i} is the occupation number of state *ψ*_{i}. Writing the density as a deviation with respect to a reference density, *ρ*(**r**) = *ρ*_{0}(**r**) + *δρ*(**r**), the total energy is expanded up to second (or sometimes third) order and is traditionally collected into three energy terms,

$$E = E_{\mathrm{rep}} + E_{\mathrm{band}} + E_2, \tag{2}$$

the repulsive energy *E*_{rep}, the band energy *E*_{band}, and the charge transfer related second order energy *E*_{2}. The reference density is composed as a sum of the (compressed) atomic densities $\rho_0(\mathbf{r}) = \sum_A \rho_0^A(\mathbf{r})$ from all atoms *A* in the system. The last two terms in Eq. (2) are explicitly calculated with several approximations, while the repulsive energy is used to compensate for the accuracy loss due to those approximations.

In the traditional DFTB approach, the repulsive energy is written as a sum of pairwise interactions,

$$E_{\mathrm{rep}} = \sum_{A<B} E_{\mathrm{rep}}^{sp(A)sp(B)}(R_{AB}), \tag{3}$$

where the sum runs over all atom pairs in the system, and the two-body contributions $E_{\mathrm{rep}}^{sp(A)sp(B)}$ depend on the species of the two atoms *A* and *B*, respectively, and on their distance *R*_{AB}. The two-body contributions are usually obtained by a fitting procedure, requiring DFTB total energies and/or forces to reproduce corresponding reference quantities (typically originating from *ab initio* calculations).

The band-structure energy can be written as

$$E_{\mathrm{band}} = \sum_i f_i \, \langle \psi_i | H | \psi_i \rangle, \tag{4}$$

where the one-electron Hamiltonian $H = -\tfrac{1}{2}\nabla^2 + V_{\mathrm{eff}}$ contains the kinetic energy and the Kohn–Sham effective potential *V*_{eff} calculated at the reference density *ρ*_{0}(**r**). The one-electron wavefunctions are expanded into a linear combination of atomic orbitals,

$$\psi_i(\mathbf{r}) = \sum_\mu c_{\mu i} \, \phi_\mu(\mathbf{r}), \tag{5}$$

which leads to the generalized eigenvalue problem

$$\sum_\nu H_{\mu\nu} \, c_{\nu i} = \varepsilon_i \sum_\nu S_{\mu\nu} \, c_{\nu i}, \tag{6}$$

with *H*_{μν} = ⟨*ϕ*_{μ}|*H*|*ϕ*_{ν}⟩ and *S*_{μν} = ⟨*ϕ*_{μ}|*ϕ*_{ν}⟩ being the Hamiltonian and overlap matrices, respectively, in the basis of the atomic orbitals.

The Hamiltonian matrix element $H_{\mu_A \nu_B}$ between the atomic orbitals *ϕ*_{μ} on atom *A* and *ϕ*_{ν} on atom *B* is approximated as

$$H_{\mu_A \nu_B} = \begin{cases} \varepsilon_\mu & A = B,\ \mu = \nu, \\ \langle \phi_\mu | -\tfrac{1}{2}\nabla^2 + V_{\mathrm{eff}}[\rho_0^A + \rho_0^B] | \phi_\nu \rangle & A \neq B, \\ 0 & \text{otherwise}, \end{cases} \tag{7}$$

where *ɛ*_{μ} is the electron eigenenergy (onsite energy) belonging to orbital *ϕ*_{μ} in a free atom. Due to the two-center approximation in the matrix elements between different atoms (*A* ≠ *B*), the actual Hamiltonian (and overlap) matrices can be constructed directly from pre-computed distance-dependent integrals for specific orbital orientations (ssσ, spσ, ppσ, ppπ, etc.) by applying the Slater–Koster transformations.^{26}
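As an illustration of this last step, a two-center block for an s/p basis can be assembled from the tabulated integrals and the direction cosines of the bond vector. The integral values below are arbitrary, and one common sign convention is used (conventions differ between codes):

```python
import numpy as np

def sk_block(r_vec, ss_s, sp_s, pp_s, pp_p):
    """Assemble the 4x4 (s, px, py, pz) Hamiltonian block between two atoms
    from distance-dependent integrals via the Slater-Koster rules."""
    l = r_vec / np.linalg.norm(r_vec)   # direction cosines (lx, ly, lz)
    H = np.empty((4, 4))
    H[0, 0] = ss_s
    H[0, 1:] = l * sp_s                 # s-p coupling: l_i * sp_sigma
    H[1:, 0] = -l * sp_s                # p-s: odd parity flips the sign
    # p-p: sigma part along the bond, pi part perpendicular to it.
    H[1:, 1:] = np.outer(l, l) * pp_s + (np.eye(3) - np.outer(l, l)) * pp_p
    return H

# Bond along z: only pz couples to s, and pp_sigma/pp_pi separate cleanly.
H = sk_block(np.array([0.0, 0.0, 1.0]),
             ss_s=-0.3, sp_s=0.2, pp_s=0.25, pp_p=-0.1)
```

For the z-aligned bond, the block reduces to its textbook form: the s orbital couples only to p_z with the bare spσ integral, while p_x and p_y carry the pure ppπ value.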

One typically uses a minimal basis with at most one polarization orbital in DFTB. In order to improve the accuracy, the basis functions and the atomic densities are obtained by solving the atomic Kohn–Sham equation for an atom in a confinement potential *V*_{conf},

$$\left[ -\tfrac{1}{2}\nabla^2 + V_{\mathrm{eff}}[\rho^{\mathrm{atom}}] + V_{\mathrm{conf}}(r) \right] \phi_\lambda(\mathbf{r}) = \varepsilon_\lambda \, \phi_\lambda(\mathbf{r}). \tag{8}$$

Often, the confinement potential is chosen to be a harmonic potential $V_{\mathrm{conf}}(r) = (r/r_0)^2$, and different values for the compression radius *r*_{0} are used when calculating the compressed atomic orbitals *ϕ*_{λ} and the compressed atomic density $\rho_0^I$ (in separate calculations) for a given atom type *I*.

Finally, the charge fluctuation-dependent energy term is written as a Coulomb-like interaction of the gross Mulliken monopole charges on the atoms,

$$E_2 = \frac{1}{2} \sum_{A,B} \gamma_{AB} \, \Delta q_A \, \Delta q_B, \tag{9}$$

where the coupling

$$\gamma_{AB} = \frac{1}{R_{AB}} - S(R_{AB}, U_A, U_B) \tag{10}$$

between the charges *q*_{A} and *q*_{B} on atoms *A* and *B* depends on the distance *R*_{AB} between the atoms and on the second derivative of the energy of the free DFT atom with respect to the charge, $U_I = \partial^2 E_I^{\mathrm{atom}} / \partial q_I^2$ (chemical hardness). The second term in Eq. (10) is short-ranged. The Coulomb term is long-ranged, though, and needs special summation techniques (e.g., Ewald summation^{27}) when calculated for periodic systems. The charge fluctuations also contribute to the Hamiltonian as

$$H^1_{\mu\nu} = \frac{1}{2} S_{\mu\nu} \sum_C \left( \gamma_{AC} + \gamma_{BC} \right) \Delta q_C, \qquad \mu \in A,\ \nu \in B, \tag{11}$$

and consequently also to the resulting electronic structure. Since the Mulliken population depends on the solution of Eq. (6), a self-consistent procedure is needed to obtain the final quantities of a calculation, similar to the self-consistent field (SCF) procedure in DFT calculations.
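To make the Mulliken analysis concrete, the following minimal sketch computes gross orbital populations for an H2-like two-orbital model; the matrices are constructed by hand for illustration, not taken from an actual DFTB calculation:

```python
import numpy as np

def mulliken_populations(P, S):
    """Gross orbital populations q_mu = sum_nu P_mu_nu S_nu_mu = diag(P @ S)."""
    return np.diag(P @ S)

# One s orbital per atom, two electrons in the bonding combination
# psi = (phi_1 + phi_2) / sqrt(2 + 2*s).
s = 0.4                                        # overlap of the two orbitals
S = np.array([[1.0, s], [s, 1.0]])
c = np.full(2, 1.0 / np.sqrt(2.0 + 2.0 * s))   # normalized MO coefficients
P = 2.0 * np.outer(c, c)                       # density matrix, occupation 2
q = mulliken_populations(P, S)                 # one electron per atom, by symmetry
```

The populations sum to the total electron count, and the charge fluctuations Δq entering Eq. (9) would be the deviations of these populations from the neutral-atom reference.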

### B. xTB

The extended tight-binding (xTB) method is rooted in the same energy expansion as DFTB but contains different energy expressions and approximations and employs a mostly atom-wise parameterization strategy. Note that the following expressions are restricted to the first iteration of the xTB method (GFN1-xTB) as TBMaLT currently implements the GFN1-xTB Hamiltonian.

The GFN1-xTB total energy reads as

$$E_{\text{GFN1-xTB}} = E_{\mathrm{rep}} + E_{\mathrm{disp}} + E_{\mathrm{EHT}} + E_{\mathrm{el}} + E_{\mathrm{XB}} + G_{\mathrm{Fermi}}, \tag{12}$$

whereby the zeroth order energy expansion is given by a repulsion term *E*_{rep} approximated by an atom-pairwise screened Coulomb potential,^{28} and a dispersion term *E*_{disp} given by the established DFT-D3(BJ) dispersion correction scheme.^{29,30} For the first order energy, an extended Hückel-type (EHT) term *E*_{EHT} is employed,

$$E_{\mathrm{EHT}} = \sum_{\mu\nu} P_{\mu\nu} \, H^0_{\mu\nu}, \tag{13}$$

where *H*^{0} and *P* denote the core Hamiltonian and the density matrix, respectively. This energy term is the analog of *E*_{band} in Eq. (2). Interatomic electrostatic interactions *E*_{el} originate from the second and third order energy expansion and are expressed with orbital-resolved Mulliken charges *q*_{μ} similar to DFTB. In fact, $E_{\mathrm{el}}^{(2)}$ closely resembles Eq. (9),

$$E_{\mathrm{el}}^{(2)} = \frac{1}{2} \sum_{\mu\nu} \gamma_{\mu\nu} \, q_\mu \, q_\nu, \tag{14}$$

replacing the Coulomb interaction with the analytical Mataga–Nishimoto–Ohno–Klopman kernel^{31–33}

$$\gamma_{\mu\nu} = \left( R_{AB}^{\,g} + f_{\mathrm{avg}}(U_{A,l}, U_{B,l'})^{-g} \right)^{-1/g}, \tag{15}$$

where *f*_{avg} denotes the (harmonic) average of the atomic Hubbard parameters *U*_{A,l} and *U*_{B,l′}, and *g* is a free parameter. The third order term $E_{\mathrm{el}}^{(3)}$ includes the diagonal elements of the charge derivative of the Hubbard parameter Γ_{μ},

$$E_{\mathrm{el}}^{(3)} = \frac{1}{3} \sum_\mu \Gamma_\mu \, q_\mu^3, \tag{16}$$

improving the description of highly charged systems.^{34} A classical halogen bond correction *E*_{XB} is added to remedy the poor description of halogen bonds due to the isotropic point charge approximation. Finally, the electronic free energy *G*_{Fermi} is added.^{35} This term arises from fractional occupations generated by Fermi smearing in order to handle static correlation cases.

Variational minimization of the total energy expression yields the tight-binding Hamiltonian matrix elements *H*_{μν} for the generalized eigenvalue problem [Eq. (6)]. The Hamiltonian can be separated in a core Hamiltonian *H*^{0} stemming from *E*_{EHT} and a charge-dependent Hamiltonian *H*^{1} derived from *E*_{el}: *H* = *H*^{0} + *H*^{1}.

Similar to DFTB, the extended tight-binding (xTB) Hamiltonian is defined in a valence-only but partially polarized minimal basis set of spherical Gaussian-type orbitals (GTO). The GTO contraction coefficients are obtained by expanding Slater-type orbitals (STO) using the STO-*n*G scheme.^{36} The core Hamiltonian matrix elements are obtained in an EHT-like fashion by

$$H^0_{\mu\nu} = \frac{1}{2} \, F_{\mathrm{scale}} \, X_{\mathrm{EN}} \, S_{\mu\nu} \left( H_{\mu\mu} + H_{\nu\nu} \right), \tag{17}$$

where *S*_{μν} denotes the overlap matrix elements, *X*_{EN} an electronegativity-dependent term, and *F*_{scale} an elaborate scaling function. The latter depends on the angular momenta of the shells and the distance of the atomic centers and consumes the majority of the xTB parameters. The diagonal elements *H*_{μμ/νν} are computed from parameterized, shell-resolved atomic level energies *h*_{μ} and their local environment described by the coordination number CN_{μ} as

$$H_{\mu\mu} = h_\mu \left( 1 + k_\mu^{\mathrm{CN}} \, \mathrm{CN}_\mu \right), \tag{18}$$

with the scaling factor $k_\mu^{\mathrm{CN}}$ being an adjustable parameter.

The charge-dependent Hamiltonian takes a form similar to DFTB’s charge fluctuation Hamiltonian [Eq. (11)] but extends it by the third order term,

$$H^1_{\mu\nu} = \frac{1}{2} S_{\mu\nu} \sum_\kappa \left( \gamma_{\mu\kappa} + \gamma_{\nu\kappa} \right) q_\kappa + \frac{1}{2} S_{\mu\nu} \left( \Gamma_\mu q_\mu^2 + \Gamma_\nu q_\nu^2 \right). \tag{19}$$

Correspondingly, and similar to DFTB, xTB also requires a self-consistent procedure.

## IV. RESULTS

In this section, we present the first proof-of-concept showcases demonstrating the current capabilities of the TBMaLT framework. We include both DFTB and xTB applications in order to emphasize the Hamiltonian-type agnostic structure of the framework. Since the applications described below are meant to serve as simple showcases only, we have chosen rather small training set sizes (which we have found to be suitable for those cases^{37}) but omitted a detailed systematic study on the influence of the training set choice on the final results.

### A. DFTB

#### 1. Global and local parameter optimization

The traditional parameterization technique in DFTB involves an initial choice of atomic orbitals obtained from confined atomic calculations, which are then used to calculate the two-center approximated Hamiltonian integrals and the overlap integrals, as shown in Eqs. (5), (7), and (8). Within the TBMaLT framework, we have tested three different optimization strategies that have the potential to improve this procedure.

In the first approach, we carry out a global optimization of the compression radii of the species-dependent confinement potentials [*V*_{conf} in Eq. (8)] and the onsite energies [*ɛ*_{μ} in Eq. (7)]. This way, we obtain a “global” basis set for every species adapted to the training set. This approach is similar to the traditional parameterization approach but allows for an optimized choice of the basis orbitals and the onsite energies.

The second approach is similar, but we represent the distance-dependent two-center integrals of Eq. (7) and the similar overlap integrals with splines and optimize the spline coefficients directly. This method is similar to the approach followed by Yaron and co-workers^{13} and yields globally optimized integral tables for the corresponding species pairs. Unfortunately, by optimizing the two-center diatomic integrals directly, one loses the concept of a well-defined basis. This inevitably leads to inconsistencies between the obtained Hamiltonian and overlap matrices, although those can be limited by starting the optimization from a good and consistent initial guess. Furthermore, one cannot represent calculated quantities (such as electron density or eigenstates) as a function of space anymore.

In our third approach, we reintroduce the concept of the well-defined basis by predicting individual compression radii *r*_{0} of the confinement potential *V*_{conf} and onsite energies *ɛ*_{μ} for each atom based on its local environment. This way, the basis becomes atom-dependent instead of atom-type dependent, as in the traditional approach. The Hamiltonian and the overlap matrices can be consistently calculated within this basis, and all obtained quantities have a well-defined real space representation.

In all three approaches, the optimization is driven by the loss function

$$L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \omega_j \left( P^{\mathrm{DFT}}_{ij} - P^{\mathrm{DFTB}}_{ij} \right)^2,$$

where *N* describes the number of systems in the training dataset, and *m* is the number of physical properties taken into account. *P*^{DFT} and *P*^{DFTB} are the target physical property values from the reference DFT and the DFTB calculations, respectively, and *ω*_{j} is the weight associated with a given physical property.
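A weighted multi-property loss of this kind is compact to express in PyTorch; the array shapes, values, and normalization below are illustrative assumptions:

```python
import torch

def weighted_property_loss(pred, ref, weights):
    """pred, ref: (N, m) arrays of m properties for N systems;
    weights: (m,) per-property weights.  One plausible form of the
    weighted squared-error loss described in the text."""
    return ((pred - ref) ** 2 * weights).sum() / pred.shape[0]

# Two systems, two properties (say, dipole component and a charge).
pred = torch.tensor([[0.9, 1.1], [2.1, 0.4]], requires_grad=True)
ref = torch.tensor([[1.0, 1.0], [2.0, 0.5]])
w = torch.tensor([1.0, 0.5])

loss = weighted_property_loss(pred, ref, w)
loss.backward()  # gradients with respect to the predicted properties
```

In practice, `pred` would be produced by the Calculator, so the gradients would propagate further into the optimizable feed parameters.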

For the training procedure, we used a subset of the ANAKIN-ME dataset,^{38} where every molecule contained one heavy atom only (ANI-1_{1}). From this subset, we have randomly selected 1000 and 400 molecules for the training and test sets, respectively, for all three optimization methods. In order to reduce the bias in our training, we have carried out three independent training runs for each optimization method, selecting the molecules of the training and test sets randomly from the ANI-1_{1} dataset each time. The reference DFT data were calculated using the FHI-aims code.^{39} We have used the Perdew–Burke–Ernzerhof (PBE) functional^{40} with the “tight” basis set. For the local basis optimization, we have used atom-centered symmetry functions (ACSFs)^{41} to represent the chemical environment of the atoms in the ML input and the random forest method as the ML algorithm.^{42} The ACSFs have been implemented in TBMaLT, while Scikit-learn^{43} has been used to perform the random forest predictions. (Further details about the ML settings can be found in the supplementary material.)
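As a minimal illustration of radial ACSFs, the Behler-type G2 function sums Gaussian-weighted neighbor distances inside a smooth cosine cutoff; the parameter values below are arbitrary:

```python
import numpy as np

def cutoff(r, rc):
    """Cosine cutoff function, smoothly decaying to zero at r = rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def acsf_radial(r_ij, eta, r_s, rc):
    """Radial G2 symmetry function of one atom, given its neighbor distances."""
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, rc))

# CH4-like environment: a carbon atom with four H neighbors at ~1.09 Angstrom.
r_ij = np.array([1.09, 1.09, 1.09, 1.09])
g2 = acsf_radial(r_ij, eta=1.0, r_s=1.0, rc=6.0)
```

A set of such functions with different `eta` and `r_s` values (plus angular terms) forms the environment descriptor fed to the random forest.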

Figure 2 shows the result of the optimizations when using the dipole moment as the only target property in the loss function (*m* = 1). The reported mean absolute errors (MAEs) are the averages of the MAEs calculated over the test sets in each of the three training runs. For comparison, we also report the performance when using the mio-1-1 parameter set.^{19} All three optimization techniques show a significant improvement, with the local basis optimization model performing best.

A more detailed discussion of the three different optimization techniques and further use cases can be found in Ref. 37.

#### 2. Training on the density of states for periodic systems

The TBMaLT framework can also treat periodic systems and allows for their inclusion in optimization and training scenarios (currently with the DFTB Hamiltonian only). This is demonstrated by the following showcase, where DFTB parameters were trained on bulk silicon structures in order to improve the density of states (DOS) description. Unlike discrete properties, such as Mulliken charges or dipole moments, the DOS is represented by a continuous distribution in a certain energy range. Therefore, the Hellinger distance^{44} was employed as a loss function to evaluate the difference between pairs of distributions, as it enables learning not only on the absolute differences but also on the trends of the curves. Using a similar strategy as discussed in Sec. IV A 1, the diatomic two-center integrals were represented by splines, which were optimized in order to obtain Hamiltonian and overlap matrices yielding an improved DOS for the training dataset.
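For two DOS curves sampled on a common energy grid and normalized to unit area, the discrete Hellinger distance can be computed as follows (a minimal sketch with made-up distributions, not TBMaLT's implementation):

```python
import numpy as np

def hellinger(p, q):
    """Discrete Hellinger distance between two normalized distributions:
    H(p, q) = sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2))."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Two normalized DOS-like histograms on a common energy grid.
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.3, 0.2])

d = hellinger(p, q)  # 0 for identical curves, 1 for disjoint support
```

Because the distance compares the square roots of the distributions bin by bin, it is sensitive to the shapes of the curves and not only to their absolute differences.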

First, we generated a database containing 50 randomly rattled silicon bulk systems with supercells of 64 atoms [an example is shown in Fig. 3(a)]. DFT calculations employing the HSE06 hybrid functional^{45} with a screening parameter of $0.11\,a_0^{-1}$ were carried out with the FHI-aims code^{39} using the “tight” parameter set. During the training process, the siband-1-1 parameter set^{46} was used as the starting point for the optimization of the spline representations of the DFTB diatomic Hamiltonian and overlap integrals, which were optimized simultaneously. In order to ensure that the DFT and DFTB DOS results are comparable, the DOS was calculated by applying identical Gaussian broadening to the eigenvalues in both cases. The same energy ranges (from −3.3 eV to +1.6 eV relative to the Fermi levels) were used to sample the DOS curves and to calculate the loss between the DFT and the DFTB results. To reduce the influence of sampling on the training data, 30 systems were randomly selected and partitioned into three distinct training sets, yielding three independent training runs. The remaining 20 unseen systems were then used to test the model. Figures 3(b) and 3(c) compare the DOS results with the DFT references on the test dataset in the three training runs before and after the ML optimization of the DFTB parameters. The ML optimization reduces the average Hellinger and MSE losses from 10.8 and 19.4 to 6.2 and 5.4, respectively, and the resulting DOS curves fit the DFT references visibly better than the original DFTB calculations using the siband-1-1 parameter set.
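The Gaussian broadening of discrete eigenvalues into a continuous DOS curve can be sketched as follows; the grid, eigenvalues, and width are illustrative:

```python
import numpy as np

def dos(energies, eigenvalues, sigma=0.1):
    """DOS on an energy grid as a sum of unit-area Gaussians centered
    on the eigenvalues."""
    diff = energies[:, None] - eigenvalues[None, :]
    g = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return g.sum(axis=1)

grid = np.linspace(-1.0, 1.0, 2001)
eig = np.array([-0.5, -0.1, 0.3])          # three made-up eigenvalues
curve = dos(grid, eig, sigma=0.05)

# Each Gaussian has unit area, so the curve integrates to ~(number of states).
norm = curve.sum() * (grid[1] - grid[0])
```

Applying the same `sigma` to both the DFT and the DFTB eigenvalues puts the two curves on an equal footing before the loss is evaluated.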

Additional tests were also performed to evaluate the transferability of the model, i.e., whether it makes reasonable predictions for larger systems after having been trained on small ones. This was done by training models exclusively on the small 64-atom systems and evaluating them on larger 512-atom systems. In order to reduce the computational effort, training and testing references were obtained using the GGA-PBE functional^{40} in this case. As shown in Fig. 3(d), the ML optimization can correct the energy shift of the DOS curve below the Fermi level and enhance the accuracy above the Fermi level, yielding a total improvement of 34% and 66% as measured by the Hellinger loss and the MSE loss, respectively.

### B. xTB

This section illustrates the possibility of using other tight-binding schemes within the TBMaLT framework. For this purpose, we employ the GFN1-xTB Hamiltonian to improve its parameterization for applications on specific chemical spaces. The Hamiltonian is integrated into the TBMaLT workflow employing a GFN1-xTB adapter.^{47} Prior to further analysis, it is verified that the xTB Hamiltonian reproduces the number of electrons as well as the orbital occupancies within TBMaLT.

The optimization procedure employs gradient descent with the Adam optimizer^{48} and an MSE loss function. The atomic charges obtained from SCC-DFTB calculations are used as the target quantity. The corresponding DFT reference charges are obtained as described in Sec. IV A. Using the ANAKIN-ME subset ANI-1_{1} with a single heavy atom (C, N, and O), Hamiltonian-dependent parameters (scaling of electronegativities *k*_{EN} and shell parameters) as well as element-specific scaling parameters (elemental pair parameters *k*_{pair}) are optimized. For training purposes, we utilize 50 samples and test on 1000 samples, both containing an equal ratio of molecular species. Despite the minimal split, the chemical space of the dataset is well represented due to the choice of only a single heavy atom. Moreover, this setup leads to only minor deviations from the original general-purpose GFN1-xTB parameterization and demonstrates the general robustness of the fit, which transfers to a multitude of samples not included in the fit. However, since the study is intended as a proof-of-principle, one should note that the chemical variety in the employed dataset is rather limited.
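The overall optimization pattern (Adam driving a scaling parameter against reference charges through an MSE loss) can be illustrated with a deliberately simplified one-parameter model. The charge model and all numbers below are invented for illustration and have nothing to do with the actual xTB parameterization:

```python
import torch

# Toy stand-in for an "electronegativity scaling" parameter to be refit.
k_en = torch.tensor(1.0, requires_grad=True)
ref_charges = torch.tensor([0.17, -0.34, 0.17])  # fake reference charges

def model_charges(k):
    """Crude charge model: charges proportional to electronegativity
    differences, scaled by the optimizable parameter k."""
    en = torch.tensor([2.2, 3.4, 2.2])           # fixed electronegativities
    return k * (en.mean() - en) * 0.3

opt = torch.optim.Adam([k_en], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((model_charges(k_en) - ref_charges) ** 2)
    loss.backward()
    opt.step()
```

In the actual workflow, the loop body would call the xTB Calculator for a batch of molecules instead of the toy `model_charges` function, with Adam updating the full set of selected parameters.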

Figure 4 shows the performance (MAE) of the default and various reoptimized GFN1-xTB parameterizations with respect to the DFT reference values on the ANI-1_{1} dataset. For training on the atomic charges (Fig. 4, left), reparameterizations based on Hamiltonian- (red, green) and element-specific (yellow) parameters lead to a significant improvement over the initial parameterization (blue). As refitting all Hamiltonian-based parameters grants the most flexibility to the optimization, the resulting reparameterization expectedly achieves the largest improvement of ∼44%. Considering the only slightly worse performance (∼40% error decrease) of the single-parameter fit of the electronegativity scaling *k*_{EN} (red), however, it becomes apparent that this parameter is most important and mainly drives the Hamiltonian fit. This is due to the fact that the original parameterization is designed to give reliable results over a wide range of chemical space. Since this study targets small molecules with limited binding motifs and simple structures, the electronic properties are optimized for individual heavy atoms.

Furthermore, it becomes evident that the reparameterization based on the atomic charges also has a beneficial effect on other properties, as exemplified by the dipole moments (Fig. 4, right). This demonstrates the possibility of improving the overall accuracy for specific use cases by means of a GFN1-xTB reparameterization.

## V. SUMMARY

Combining simple tight-binding models with machine learning is a very promising approach for materials simulations. Machine learning can improve the accuracy of such models significantly, while the presence of a well-defined physical model makes the simulation results easier to interpret than in approaches where physical quantities are predicted “directly” via machine learning without an underlying classical or quantum mechanical model. Also, the training procedure can be expected to be less involved when the physics is provided by the tight-binding model. The derivation of ground-state and excited-state quantities not considered during training is straightforward in these approaches, and their transferability should be considerably better than that of the direct approaches, provided the physical behavior is qualitatively well described by the chosen model.

We have started the development of the open-source software package TBMaLT,^{17} which provides a toolkit for combining machine learning approaches with atomistic simulation models. Our main goal is to provide a framework for the rapid prototyping of such hybrid approaches and for testing their capabilities and limits. The toolkit has been written using the PyTorch framework, providing backpropagated gradients out of the box. TBMaLT’s overall structure is designed to be physical model agnostic, opening the possibility for implementing and testing arbitrary (atom-based) approaches. Although the development only started recently, the package already offers the DFTB model (as a built-in module) and the xTB model (via a connector to an external PyTorch-based xTB implementation). Further models can be implemented or integrated easily. TBMaLT comes with extensive unit tests and a well-documented application programming interface.

We have presented a few proof-of-concept showcases, which demonstrate the viability of such hybrid approaches as well as the current capabilities of the TBMaLT framework. Our project’s development follows the usual open-source development workflow and is open to external contributors. We hope that the software package will be useful for the scientific community and will help to catalyze the development of faster and more accurate materials simulation methods in the near future.

## SUPPLEMENTARY MATERIAL

The supplementary material contains further details on the comparison of the approximate external SCC-cycle technique with the exact gradient calculation (as discussed in Sec. II) for a rattled silicon structure, as well as additional details about the machine learning settings used in Sec. IV A 1. Additionally, a dataset containing the geometries and the *ab initio* reference data used for the showcase applications in Sec. IV is provided.

## ACKNOWLEDGMENTS

This work was supported by the DFG Priority Program No. SPP 2363 on “Utilization and Development of Machine Learning for Molecular Applications—Molecular Machine Learning” (Project No. 497190956). The authors also gratefully acknowledge funding from the DFG Research Training Network GRK 2247 on “Quantum Mechanical Materials Modeling—QM^{3}.”

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**A. McSloy**: Conceptualization (equal); Methodology (equal); Software (equal); Supervision (equal); Writing – original draft (equal). **G. Fan**: Conceptualization (equal); Software (equal); Writing – original draft (equal). **W. Sun**: Conceptualization (equal); Software (equal); Writing – original draft (equal). **C. Hölzer**: Conceptualization (equal); Software (equal); Writing – original draft (equal). **M. Friede**: Conceptualization (equal); Software (equal); Writing – original draft (equal). **S. Ehlert**: Conceptualization (equal); Software (equal); Writing – original draft (equal). **N.-E. Schütte**: Software (equal). **S. Grimme**: Funding acquisition (equal). **T. Frauenheim**: Funding acquisition (equal). **B. Aradi**: Conceptualization (equal); Project administration (equal); Software (equal); Supervision (equal); Writing – original draft (equal).

## DATA AVAILABILITY

The TBMaLT software package can be downloaded from its GitHub repository.^{17} Its API is extensively documented in the source code using Python’s docstring feature. Geometries and the target quantities obtained from DFT reference calculations for the various training and test sets are provided in the supplementary material.