Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types or new many-body interactions. Another important limitation is the spatial locality assumption built into model architectures, which cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of materials science, we focus on semi-empirical quantum mechanics, using machine learning to predict corrections to reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical quantum chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
Machine learning (ML) was born decades ago as a branch of applied mathematics and statistics and has since put down roots in nearly all technological areas of life, ranging from search engines and computer vision to self-driving cars and natural language processing. These rapid developments catalyzed the advent of ML in the chemistry community,1–7 where a large amount of valuable data has been accumulated but not fully rationalized due to the lack of tools for finding implicit relationships between variables. Such issues are ubiquitous in chemistry and materials science, where chemical compound space is almost limitless, while an accurate theoretical description requires the solution of the Schrödinger equation at great computational expense. It is not surprising that ML, a powerful tool for uncovering relationships between variables of interest, has become a popular instrument in the computational chemistry toolbox.
The development of high accuracy, reactive, interatomic potentials is one of the primary breakthroughs of ML in the space of chemistry and materials science,8–12 which can infer energies and forces very close to quantum methods over a wide range of chemical space when there is sufficient data.4,13,14 However, the current generation of ML potentials widely employs the spatial locality assumption13,15–19 suggesting that the interactions between the nearest neighbors hold the highest significance. In practice, this assumption involves dividing the target property Ptot into localized contributions Pi, each confined to a specific spherical cut-off: Ptot = Σi Pi. This partitioning approach facilitates efficient and nearly linear scaling when applying Machine Learning Interatomic Potentials (MLIAPs) to new systems. However, it also imposes a limitation on the information processed by machine learning models for phenomena involving quantum-mechanical delocalization and effects such as aromaticity, conjugation, or long-range electrostatics. These interactions extend beyond the cut-off distance and may not be automatically incorporated into the model. This limitation remains for existing ML potentials and cannot be simply overcome with larger or more diverse datasets.
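To make the locality partitioning concrete, the following minimal Python sketch sums per-atom contributions Pi while discarding everything beyond the cut-off; the exponential pair term is an arbitrary placeholder, not a trained potential:

```python
import numpy as np

def local_energy_model(positions, cutoff=5.0):
    """Toy locality-based model: the total property is a sum of per-atom
    contributions P_i, each depending only on neighbors inside a
    spherical cut-off. The exponential pair term is an arbitrary
    placeholder, not a fitted potential."""
    n = len(positions)
    per_atom = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(positions[i] - positions[j])
            if r < cutoff:  # neighbors beyond the cut-off contribute nothing
                per_atom[i] += 0.5 * np.exp(-r)
    return per_atom.sum(), per_atom
```

By construction, any interaction between atoms separated by more than the cut-off, such as long-range electrostatics or extended conjugation, is invisible to such a model regardless of how it is trained.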
Another continuing challenge is to develop ML potentials that perform well on rare atomic configurations. Such a capability becomes especially important when modeling chemical reactions, phase transitions, and other transient processes critical to industrial processes, where rare events, such as transition state configurations, dictate the dynamics of the entire process. To build very general ML models, a noticeable focus has been on the collection of very broad datasets.13,20–22 However, the challenges remain significant: first, it is almost impossible to generate datasets spanning all relevant regions simply because it would require an impractically large number of reference simulations.20 Second, dramatic structural changes accompanying chemical reactions are naturally hard to sample, and plenty of relevant configurations could be missed even by active learning.17,23,24 Third, complete (or almost complete) neglect of electronic structure implies that energy (or another property) is primarily a function of the structure only9,10,13,15,18,25 or of a more sophisticated descriptor.25–30 Neglect of a quantum mechanical description leads to the immediate loss of many phenomena concerning reactivity, such as the interplay between closed- and open-shell wave functions,31,32 the preference for excited-state pathways, and many others. Abrupt changes in electronic structure often arise during bond breaking and formation, and the same nuclear configuration in different electronic states can result in different energies and other properties. Despite substantial progress, the majority of MLIAPs still perform significantly better for equilibrium structures, and their lack of electronic structure limits their transferability to unexplored and out-of-equilibrium regions of chemical space.
Since we are at a relatively early stage of the development of machine learning models, we should assume that these concerns will be addressed by future studies and that surrogate models might eventually incorporate the outlined effects.
In this perspective, we would like to focus on retaining relevant quantum physics in models parameterized by ML. Advanced dataset generation techniques responsible for tackling the first two sources of deficiency are covered elsewhere.17,23,24,33–35 A promising alternative to conventional scientific modeling is emerging as a new class of differentiable physics models.36–38 These models utilize auto-differentiation techniques39,40 to optimize the parameters of the underlying physical theory, in contrast to the randomly initialized weights used in pure ML models. Their primary objective remains unchanged: to discover the underlying numerical relationships between an input and an output. They are also sometimes referred to as physics-inspired or physics-aware models. By leveraging established scientific principles and thorough domain knowledge, these models strive to provide improved interpretability that cannot be attained through surrogate models' abstract layers.
In the realm of computational chemistry, semi-empirical quantum mechanics (SEQM)41,42 appears to be the most promising basis for developing differentiable models for three key reasons. First, the advent of machine learning (ML) was driven by the need for fast and scalable tools, in contrast to reliable but expensive ab initio techniques; higher-throughput SEQM thus provides a robust foundation for building differentiable models. Second, SEQM Hamiltonians involve a multitude of parameters, making them a fascinating subject for ML refinement. Finally, statically parameterized SEQM has known limitations in terms of accuracy and transferability, underscoring the need for more sophisticated optimization schemes. We argue that ML-enhanced SEQM models offer superior transferability compared to both conventional SEQM and MLIAPs, at a negligible increase in computational expense. Moreover, due to the encoded domain knowledge, such models usually require an order of magnitude less data than interatomic potentials, representing an excellent example of synergy, the mutually beneficial intersection of multiple approaches. These trends emphasize that the need for the development of quantum mechanical methods will not be suppressed by the advent of ML but rather stimulated by new ML techniques, as exemplified by the ML enhancement of other electronic structure methods.5
A. Semiempirical models: Place among other methods and historical remarks
A detailed review of SEQM methods is available in the literature.42–44 Figure 1(a) shows an illustration of how SEQM fits into the broader landscape of computational chemistry methodologies. It is important to note that this depiction is a simplified representation assuming the application of these methods to a hypothetical set of small molecules, such as drug-like compounds, as demonstrated in previous studies.27,45 At the lower level of accuracy, classical force fields (FFs) are employed. FFs typically utilize simple, physically motivated terms46 to account for phenomena such as bond stretching and angle bending in the harmonic approximation, non-covalent interactions, and Coulomb electrostatics. Their computational efficiency enables large-scale applications, such as protein folding simulations.47 More advanced FFs incorporate additional effects, including polarization and even bond breaking/formation, as exemplified by ReaxFF.48 Machine learning interatomic potentials (MLIAPs) trained on high-quality datasets can potentially achieve greater accuracy in specific applications, such as torsional data benchmarks.27,45 Relative to traditional FFs, a typical MLIAP may contain two or three orders of magnitude more model parameters, and its numerical costs grow commensurately. MLIAP simulations of up to 10⁷ atoms have been achieved.49,50 Both FFs and MLIAPs avoid self-consistent solutions of a quantum Hamiltonian and instead make strong assumptions regarding the spatial locality of chemical interactions, which leads to linear scaling of computational costs with system size. We point curious readers to the comprehensive literature on MLIAPs.4,8,12,24,51–53 Note, though, that the accuracy and transferability of MLIAPs cannot be rigorously compared with electronic structure methods, as MLIAPs are trained for specific elemental compositions and/or crystal structures, and their high accuracy is confined to the training domain.
The opposite side of the scale is dominated by the family of coupled cluster (CC) approaches,54 which remain the gold standard for accurate electronic structure calculations while retaining polynomial scaling. Most other methods naturally fall in between. Spanning the space of transferability, accuracy, and cost, SEQM methods occupy the middle ground between force fields (FFs) and density functional theory (DFT),55 the workhorse of computational chemistry. Much more affordable than DFT, SEQM methods are usually applied to large systems (10²–10³ atoms) that should not, for physical reasons, be treated classically. In the past decade, the use of SEQM declined substantially given the development of accurate and affordable DFT functionals and their highly parallelized implementations. However, we expect this balance to change with the arrival of ML-parametrized SEQM (ML-SEQM), which can offer accuracy on par with or exceeding that of DFT at much lower computational cost [Fig. 1(a)]. Recent advances in ML-SEQM will be the main focus of this discussion.
The original models introduced characteristic approximations to reduce the number of electronic interactions to calculate; for example, 3- and 4-center Coulomb integrals are neglected entirely in the popular Modified Neglect of Differential Overlap (MNDO) approach.41,56,57 Further, 1-center and 2-center integrals are simplified through monopole interactions and atom-specific constants (the latter will be subjected to ML parametrization). Parameters of the model are optimized to reproduce a set of reference values given the provided interatomic distances. The structural knowledge encoded in these models is simplified, since it does not contain angles, bond connectivity, etc., and the final parametrization yields a single set of parameters for each element. In contrast, descriptors in ML models typically encode radial and angular information on neighboring (or even more distant) atoms along with atom types,10,13,15,18,53 allowing a more bespoke fit. The original approach can also lead to an abundance of outliers beyond the dataset and poor transferability. “As a result, any deficiencies in the reference dataset would be reflected in deficiencies in the resulting method.” This statement could easily be confused with a direct quote from a modern ML paper, even though it is taken from Stewart’s work published back in 2002.58 The work goes on to say, “The lesson learned from this experience, an important lesson painfully learned, was that the composition of the reference dataset is of paramount importance.” In 2023, this lesson may sound elementary to modern ML practitioners, but it marked the beginning of data-driven techniques for quantum chemistry, and the foothold that ML has gained in the field is, therefore, no surprise. However, at the time, outliers were tackled manually: “An effective way to prevent the errors of the type that were found would have been to use rules.
Such rules would likely have prevented the types of errors that are present in PM3.”58 To eliminate severe outliers, SEQM was often further modified by somewhat arbitrary rules, such as additional or manually corrected terms for specific systems [water clusters, Cu-liganded complexes,59 or peptide bonds suffering from an improper nitrogen description58; Fig. 1(b)]. Along with careful selection of target values (enthalpies, ionization potentials, etc.), the implementation of system-specific rules requires high-level human expertise spanning method development, programming, and chemical research experience. It also means that system-specific rules must be recalibrated almost by hand for new applications or chemical families. We would like to conclude this historical overview with a prediction made by Stewart himself: “Finally, as more and more elements are parameterized, and as methods become increasingly sophisticated, the transition will have to be made to a purely mathematical approach.”58 Stewart had foreseen that the diversity of chemical systems can rarely be captured by automated “if-else” conditional logic. All this goes to show that the use of ML methods, designed to automatically identify patterns and hidden relationships in data, is in fact a highly logical direction that has been foreseen for decades.
To the best of our knowledge, the first true application of ML in SEQM can be traced to a 2015 report60 in which a workflow for automatic parameterization was established. Rupp et al. suggested the use of an invariant Coulomb matrix61 to take into account the structure of the molecules comprising the dataset. Given that the established static parameters in orthogonalized model 2 (OM2)44 are already optimized to give the best average performance, this work builds upon those parameters and suggests only small structure-specific corrections. The pipeline is very simple: vary one parameter P at a time to find optimal corrections ΔP for each individual molecule using the Levenberg–Marquardt algorithm;62,63 train an ML model on the derived corrections ΔP via kernel ridge regression to learn the variation of ΔP with respect to structure; predict ΔP for the molecules in a test set using the ML model; and evaluate the performance of OM2 based on the P + ΔP parameters. Figure 1(c) shows a performance comparison between the original OM2 model, a revised OM2 model (rOM2; a variant conventionally reparameterized for a specific dataset in that work), and the ML-OM2 model derived with automatic parameterization in the 2015 report.60 While rOM2 improves upon the original OM2 model, ML-OM2 exhibits the best accuracy for atomization enthalpies among the three models. Not only is the error distribution for ML-OM2 centered at zero, but the magnitude of errors is also noticeably reduced, narrowing the gap between SEQM and DFT. This seminal work suggests that ML is a powerful approach for broadly improving the SEQM family without sacrificing its favorable computational cost, even taking into account the small overhead of on-the-fly ΔP predictions.
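The kernel-ridge-regression step of this pipeline is easy to sketch. Below is a self-contained toy version with a Gaussian kernel; the descriptors and ΔP values are synthetic stand-ins for the Coulomb-matrix features and per-molecule parameter corrections of the original work:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of descriptors."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_fit(X, delta_p, sigma=1.0, lam=1e-8):
    """Solve (K + lam*I) alpha = delta_p for the regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), delta_p)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """Predict the parameter correction delta_p for new structures."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```

The corrected parameter is then simply P + ΔP, evaluated on the fly for each new molecule.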
B. Δ-Machine learning for improving semi-empirical Hamiltonians
Recently, Artificial Intelligence-Quantum Mechanical Method 1 (AIQM1), an ML-enhanced SEQM method, was developed.64 AIQM1 uses the semiempirical orthogonalization- and dispersion-corrected ODM2* Hamiltonian as a base (the star denotes that the original D3 correction was removed), augmented with Δ-ML [Fig. 1(d)] and D4 empirical dispersion corrections. Δ-ML is a composite technique in which baseline values (obtained in this case from ODM2*) are corrected toward target values65 calculated at a superior level of theory [here, DLPNO-CCSD(T)], thus predicting the difference; hence Δ-ML. To accomplish this, the pipeline first leveraged the abundance of DFT data, and an ANI13 neural network was trained on the difference EDFT − EODM2*. Next, transfer learning was used to retrain some neural network parameters using more sophisticated ECC − EODM2* data [Fig. 1(d)]. Transfer learning is a powerful data-saving technique66 that has proven its viability in mainline ML applications such as image recognition. Applied to chemistry, it has been demonstrated that an existing model trained on DFT data can be adapted to near-CC levels of accuracy using an order of magnitude less CC data.45 The AIQM1 method is implemented in the new open-source code SCINE Sparrow.67
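The Δ-ML construction itself is compact enough to sketch. In the toy below, plain linear least squares stands in for the ANI network, and the descriptors, baseline, and target energies are synthetic; the point is only that the model is fit to the difference between two levels of theory:

```python
import numpy as np

def train_delta_model(descriptors, e_baseline, e_target):
    """Fit a correction model to the difference E_target - E_baseline.
    Linear least squares is a stand-in for the neural network used in
    real Delta-ML pipelines such as AIQM1."""
    delta = np.asarray(e_target) - np.asarray(e_baseline)
    X = np.column_stack([descriptors, np.ones(len(delta))])
    coef, *_ = np.linalg.lstsq(X, delta, rcond=None)
    return coef

def delta_corrected_energy(coef, descriptor, e_baseline_value):
    """Baseline energy plus the ML-predicted correction."""
    return e_baseline_value + np.array([descriptor, 1.0]) @ coef
```

Because only the (much smoother) difference must be learned, far less high-level reference data is needed than for learning the target energies directly.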
Several applications show the broad applicability of the AIQM1 method. For example, AIQM1 was employed for fullerene C60 optimization, and correctly predicted bond alternations in line with CC computations and experiment68 [Fig. 1(e), top]. This delicate geometric effect is typically not described by many QM methods.69 Moreover, AIQM1 correctly predicted polyynic structure in a synthesized cyclocarbon, which remained elusive until 201970 [Fig. 1(e), bottom]. In addition, AIQM1 excels at predicting intermolecular interactions and H-bonded complexes [Fig. 1(f)]. However, some limitations of the developed scheme are apparent from the case of π–π stacked pyridines. The lower accuracy of the AIQM1 method here is explained by the lack of such structures in the dataset used for training the NN contribution. Other MLIAP limitations are also inherited: for example, both the ANI neural network architecture and the D4 correction in the training data are agnostic to the electronic structure, and so the neural network corrections, tuned for the neutral ground state, are invariant across electronic states. Nevertheless, the incorporation of the ODM2* scheme makes the description of open-shell and/or charged species possible, which is rarely the case for pure MLIAPs. We foresee the dramatic improvement of such composite ML-SEQM schemes as AIQM1 for charged and open-shell structures to become possible when charge equilibration layers or environment-dependent atomic electronegativities are employed in NN, as it was recently done for some advanced MLIAPs.14,27,28
C. Backpropagation through semi-empirical Hamiltonians
While the AIQM1 example used SEQM as a static base upon which to apply Δ-ML corrections, a more tightly coupled approach is possible. With the maturity of programming toolkits that support highly efficient reverse-mode automatic differentiation (PyTorch,40 JAX,71 etc.), a path toward automatic adjustment of static parameters in SEQM has emerged. One example is the work by Zubatiuk et al.,72 which augments the Extended Hückel Model (EHM)73 with ML-generated dynamic Hamiltonian parameters, yielding ML-EHM.
EHM is one of the simplest tight-binding models in chemistry, usually implemented in a minimal valence basis set. Its remarkable speed comes from the empirical parametrization of two Hamiltonian constants, α and K. Roughly, a diagonal element α corresponds to the negative of the ionization potential of an isolated atom, in agreement with Koopmans’ theorem.74 Immediately, we can recognize that while this approach may be viable in a statistical sense, one cannot expect a single α to be an accurate approximation across different cases, since atomic orbital energies depend on the neighboring atoms in a molecule, their bond orders, and their hybridization states. The same holds true for K, an empirical constant that scales pairwise overlap contributions to the energy, which is uniformly chosen as 1.75 for organics.75
Intuitively, by replacing the static α and K with environment-dependent values derived from ML, one can expect a significant boost in accuracy. The implementation of ML-EHM confirms this hypothesis, as demonstrated in this section. The idea behind ML-EHM, adjusting the EHM model based on chemical structure, closely resembles the concept of automatic adjustment developed in the work of Dral et al.,60 but it uses neural networks as a parametric structural descriptor, yielding a predictor that can be applied to systems of arbitrary size, and automatic differentiation to handle a large and diverse dataset.
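A minimal EHM sketch shows exactly where the ML hooks enter. The construction below uses the standard Wolfsberg–Helmholz off-diagonal form, Hij = K·Sij·(αi + αj)/2, and solves the generalized eigenproblem via Löwdin orthogonalization; swapping the single global α and K for per-atom values predicted from the local environment is precisely the ML-EHM modification:

```python
import numpy as np

def ehm_orbital_energies(alphas, S, K=1.75):
    """Extended Hueckel MO energies. Diagonal: H_ii = alpha_i; off-diagonal
    (Wolfsberg-Helmholz): H_ij = K * S_ij * (alpha_i + alpha_j) / 2.
    The generalized eigenproblem H C = S C e is solved via Loewdin
    orthogonalization. In ML-EHM, `alphas` (and per-pair K values) would
    come from a neural network instead of being global constants."""
    alphas = np.asarray(alphas, dtype=float)
    n = len(alphas)
    H = np.diag(alphas)
    for i in range(n):
        for j in range(i + 1, n):
            H[i, j] = H[j, i] = K * S[i, j] * 0.5 * (alphas[i] + alphas[j])
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals**-0.5) @ s_vecs.T  # X = S^(-1/2)
    return np.linalg.eigvalsh(X @ H @ X)
```

For a symmetric two-orbital model with equal α, this reproduces the textbook bonding/antibonding splitting (α + H12)/(1 + S12) and (α − H12)/(1 − S12).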
The Hierarchically Interacting Particle Neural Network (HIP-NN)18 architecture was chosen as the ML backend for the ML-EHM design [Fig. 2(a)]. In brief, HIP-NN is a message-passing architecture that can make inferences on both atom-centered and bond-centered quantities and allows communication between atoms within a given interaction range. HIP-NN also employs a hierarchical regularization scheme that emphasizes lower-order contributions in an effective n-body expansion. For the ML-EHM problem, HIP-NN was adjusted to learn atomic α and pairwise K parameters with respect to structure. The ML-EHM-predicted model parameters largely agree with those of the classical EHM, except for molecular conformations that are poorly described by the classical EHM, as shown in Fig. 2(b), which displays the distribution of learned atomic αi for s and p orbitals. Remarkably, in ML-EHM models, the sharpest peak for hydrogen corresponds to the widely accepted Koopmans value for the hydrogen s-orbital (−13.6 eV). The ML-EHM model exhibits better accuracy for predicting MO energies when it is not trained to predict unoccupied orbitals [Fig. 2(c)]; we suspect that this phenomenon can be associated with large variations of HOMO–LUMO gaps in the dataset. Note that the ML extension of EHM greatly improves transferability for off-equilibrium structures along rotational profiles,72 even though its limited quantum-mechanical description precludes its application to fully reactive conditions. For instance, ML-EHM reproduces the changes in MO energies upon the isomerization of 2-aza-1,3-butadiene, as shown in Fig. 2(d). In contrast, the original EHM significantly overestimates the HOMO energy and predicts its lowering at around a 90° torsion angle when, according to DFT, such a configuration raises the HOMO energy.
The success of ML-EHM emphasizes how backpropagation through the Hamiltonian opens new horizons for improving the accuracy and transferability of the simplest tight-binding models while retaining their low-order basis set.
Beyond ML-EHM, recent work by Vargas-Hernández et al. employed the Hückel model for the inverse design of organic molecules with targeted values of polarizability and band gaps.76 With learnable parameters through auto-differentiation, Vargas-Hernández et al. optimized atom types for specific sites in conjugated organic molecules to achieve properties of interest. These works, together, produce an exciting narrative that revives the long-standing Hückel model as a venue for novel research applying tight-binding models for the exploration of chemical space.
Despite these two success stories, the quantum mechanical description of EHM remains very constrained. More sophisticated SEQM models, such as the Parameterized Model (PMx) family41,58,59,77 or the Austin Model (AMx),78–80 offer more complex domain knowledge to be refined by ML. We highlight an example from Zhou et al.81 in Fig. 3(a), where HIPNN is interfaced with the PYSEQM package,82 which enables learning of PM358 Hamiltonian parameters. In this iterative procedure, each pass returns corrections to the Hamiltonian parameters, and a new updated Hamiltonian is used to perform self-consistent field calculations; based on the error of predictions, further training epochs are run until convergence between PYSEQM predictions and the reference theory (DFT) is reached. The update of a generic Hamiltonian parameter Pi can be simplistically summarized as Pi → Pi + ΔPi, where the correction ΔPi is predicted by the neural network from the local chemical environment.
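The training loop of such a scheme can be caricatured in a few lines. In the toy below, the "SCF step" is collapsed to a single diagonalization of a 2×2 model Hamiltonian whose diagonal elements are the trainable parameters, and finite-difference gradients stand in for the reverse-mode automatic differentiation that PyTorch provides in the actual PYSEQM workflow:

```python
import numpy as np

def mo_energies(params):
    """Eigenvalues of a toy 2x2 'Hamiltonian'; the diagonal elements are
    the trainable parameters, and the coupling is fixed."""
    H = np.array([[params[0], -1.0], [-1.0, params[1]]])
    return np.linalg.eigvalsh(H)

def fit_hamiltonian_params(ref_energies, params0, lr=0.05, steps=800, eps=1e-6):
    """Gradient descent on Hamiltonian parameters until the model's MO
    energies match the reference (e.g., DFT) values. Finite differences
    stand in for reverse-mode autodiff through the diagonalization."""
    p = np.array(params0, dtype=float)
    for _ in range(steps):
        loss0 = ((mo_energies(p) - ref_energies) ** 2).sum()
        grad = np.zeros_like(p)
        for k in range(len(p)):
            dp = p.copy()
            dp[k] += eps
            grad[k] = (((mo_energies(dp) - ref_energies) ** 2).sum() - loss0) / eps
        p = p - lr * grad
    return p
```

In the real pipeline, the loop runs over batches of molecules, the eigenvalue solve is replaced by a full SCF cycle, and the parameters are functions of the environment produced by HIP-NN rather than free scalars.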
When this pipeline is applied to the optimization of the PM3 model, a new set of environment-aware parameters is generated. Figure 3(b) depicts their distribution binned by bond type and underscores once more how Hamiltonian elements naturally vary with respect to structure to provide a better description of the system. The ML-optimized PM3 model (denoted as HIPNN + SEQM for consistency with the original work) shows promising results for structural optimization, achieving a root-mean-square error (RMSE) of 0.006 for bond lengths when compared to reference DFT geometries [Fig. 3(c)]. Furthermore, transferability tests show superior accuracy for off-equilibrium structures sampled at various temperatures [Fig. 3(d)], even up to 2000 K. Improved accuracy over an extended range of temperatures suggests more accurate molecular dynamics simulations in extreme conditions. Finally, the HIPNN + SEQM model shows promising transferability to quantities not used in the training procedure. Even though the model was unaware of the reference MO energies, HIPNN + SEQM predicts the band gap with reasonable accuracy, almost on par with the original PM3 method [Fig. 3(e)]. In contrast, without extra features and explicit training to infer band gaps, MLIAPs are not capable of predicting this quantity at all. Clearly, there is room for improvement, for example, by using costly multitarget training with MO energies and other relevant properties, but even at this stage, the ML-optimized variant outperforms SEQM and MLIAPs in almost all tasks. It should be noted that HIPNN + SEQM was trained on only a small subset (60k structures) of the ANI-1x dataset (500k points),83 while HIP-NN was trained on a much larger subset of 600k structures. Such a noticeable reduction in the amount of data needed to fit the model is possible thanks to the domain knowledge already encoded in the PM3 model.
Still, the training procedure remains costly since the self-consistent field (SCF) cycles for each molecule in a batch must be solved at each step, but costs can be mitigated by the fast-batching mode available in PYSEQM, which enables efficient GPU calculations for up to hundreds of molecules simultaneously. Moreover, the environment-aware parameters in HIPNN + SEQM potentially extend active learning to data sampling with respect to diversity in Hamiltonian parameter space.
D. Machine learning for tight-binding Hamiltonians
In parallel to wave function-based reduced Hamiltonians, ML has been introduced into Density Functional Tight Binding (DFTB) theory. As a reduced quantum chemical description, DFTB84–86 invokes a meticulously crafted system-specific set of parameters. Recent work demonstrates that ML can remarkably improve the accuracy of DFTB. Tested ML approaches for DFTB parametrization include the employment of the Δ-ML strategy64,65 (in a fashion similar to the design of AIQM1), deep learning parametrization of many-body repulsive potential based on SchNet architecture,87 clustering of bond types for environment-aware fitting of pairwise repulsive contributions,88 and backpropagation through Hamiltonian for automated adjustment of static parameters89 (similar to the HIPNN + SEQM architecture).
Recently, a customizable DFTB framework interfaced with the PyTorch backend has been developed: the Tight-Binding Machine Learning Toolkit (TBMaLT).90 Thanks to its modular design, TBMaLT allows for easy swapping of the electronic structure calculator (DFTB and extended tight-binding, xTB, are currently supported) or a customized implementation. TBMaLT offers three different optimization schemes, namely, global optimization of atom-type-dependent confinement potentials and on-site energies, spline representation of distance-dependent two-electron integrals, and atom-specific environment-dependent confinement potentials and on-site energies. With these versatile optimization schemes, TBMaLT offers remarkable flexibility for selecting and benchmarking the most appropriate algorithm. The benefit of integral optimization with splines has been demonstrated on a rattled bulk silicon structure [Fig. 4(a)]. Pure DFTB with the classical siband-1-1 parameter set is substantially inaccurate below the Fermi level, especially in the range from 0 to −3 eV [Fig. 4(a), left panel] when compared to DFT. In contrast, the spline-optimized DFTB model, trained on a tiny dataset of only 30 structures, exhibits greatly improved accuracy, as displayed by the better alignment of the density of states (DOS) curves and a mean square error (MSE) lowered from 19.4 to 5.4 eV. This TBMaLT case exemplifies how domain knowledge fine-tuned with ML offers superior accuracy and greatly reduced data demands in contrast to ML potentials.
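The spline scheme is simple to illustrate. Below, a distance-dependent two-center integral is stored as values on a knot grid (the trainable parameters), and a toy 1D tight-binding chain draws its hopping elements from it; piecewise-linear np.interp stands in for the cubic splines used in TBMaLT itself:

```python
import numpy as np

def make_spline_integral(r_knots, knot_values):
    """Distance-dependent two-center integral represented by values on a
    knot grid. The knot values are the parameters one would optimize
    against DFT reference data; np.interp (piecewise-linear) stands in
    for the cubic splines of the real toolkit."""
    r_knots = np.asarray(r_knots, dtype=float)
    knot_values = np.asarray(knot_values, dtype=float)
    return lambda r: np.interp(r, r_knots, knot_values)

def chain_eigenvalues(positions, onsite, hopping):
    """Tight-binding eigenvalues for a 1D chain whose nearest-neighbor
    hopping elements are read off the spline-represented integral."""
    n = len(positions)
    H = np.diag(np.full(n, float(onsite)))
    for i in range(n - 1):
        t = hopping(abs(positions[i + 1] - positions[i]))
        H[i, i + 1] = H[i + 1, i] = t
    return np.linalg.eigvalsh(H)
```

Optimizing the knot values against reference electronic structure data, instead of using a fixed tabulated parameter set, is what reduces the DOS error in the silicon example.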
Beyond the tight-binding approximation, Zhang et al.91 suggested constructing an analytical mapping of the Hamiltonian based on the atomic cluster expansion (ACE).92,93 Such a representation, using an orthogonal local atomic basis set, expands beyond the two- and three-body interactions used in typical tight-binding approximations and constructs a many-body Hamiltonian as in DFT. By design, the ACE Hamiltonian enforces three essential requirements: nearsightedness of chemical interactions, smooth dependence of Hamiltonian elements on structural parameters, and equivariance with respect to rotation in three-dimensional space. Moreover, the entire equivariant mapping of the Hamiltonian is achieved by learning only two relationships: one for onsite elements HII, denoting the interaction between two orbitals centered at one atom, and one for offsite elements HIJ, describing the overlap between orbitals centered at two different atoms. The resulting block Hamiltonian structure is shown in Fig. 4(b) (left panel), while Fig. 4(b) (middle panel) shows a detailed matrix structure of a zoomed block with an emphasis on the atomic orbital structure. Impressively, the learned Hamiltonian is numerically very close to the reference one, as showcased for FCC aluminum using a logarithmic heatmap [Fig. 4(b), right panel], and even the more complex dd interactions are reproduced with the same accuracy. The dataset used to train the ML-ACE model was sampled around the minima of the FCC and BCC aluminum structures; therefore, the transferability of the framework was challenged by computation of the Bain path, a transition from the FCC to the BCC crystal [Fig. 4(c)]. Along the Bain path, the simulation is forced to visit unsampled regions, for which the predicted DOS nevertheless agrees well with ground-truth DFT results. Altogether, the ACE Hamiltonian underscores once again how the combination of data-driven approaches and physics models works synergistically.
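Conceptually, the learned mapping reduces Hamiltonian construction to filling in blocks. The sketch below assembles a symmetric matrix from per-atom onsite blocks HII and pairwise offsite blocks HIJ; onsite_fn and offsite_fn are placeholders for the two trained ACE models:

```python
import numpy as np

def assemble_block_hamiltonian(n_atoms, n_orb, onsite_fn, offsite_fn):
    """Build a full Hamiltonian from onsite blocks H_II (orbitals on one
    atom) and offsite blocks H_IJ (orbitals on two different atoms).
    onsite_fn(I) and offsite_fn(I, J) are placeholders for the trained
    ACE models; symmetry is enforced by construction."""
    N = n_atoms * n_orb
    H = np.zeros((N, N))
    for I in range(n_atoms):
        rows = slice(I * n_orb, (I + 1) * n_orb)
        H[rows, rows] = onsite_fn(I)
        for J in range(I + 1, n_atoms):
            cols = slice(J * n_orb, (J + 1) * n_orb)
            block = offsite_fn(I, J)
            H[rows, cols] = block
            H[cols, rows] = block.T
    return H
```

Because only the two block-level maps are learned, the same predictors can be reused to assemble Hamiltonians for systems of any size within the model's transferability domain.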
II. OTHER ENDEAVORS
Before concluding, we would like to mention several other works aligned with the essence of this perspective. We point readers to OrbNet,94 which also introduced learnable Hamiltonian parameters in a minimal basis set. Similar work was done for crystalline materials (DeepH),95 for the Frenkel Hamiltonian in the context of exciton dynamics,96 and for molecular Hamiltonians for electron density evolution.97 Beyond ground state properties, ML was used to assist in the construction of a pseudo-Hamiltonian (SchNet + H),98 whose diagonalization outputs excitation energies. The pseudo-Hamiltonian architecture scales with the number of requested eigenvalues, in sharp contrast to the full quantum mechanical Hamiltonian, which scales with system size. Thus, even direct diagonalization is possible, eliminating the need for Krylov subspace algorithms such as the Davidson routine.99,100 Other recent work uses ML-inspired variational ansätze for quantum many-body wave functions.101–105
III. CONCLUDING REMARKS
Initial development of surrogate ML models in physical and chemical simulations has faced many challenges over the years. These challenges include limited transferability to unexplored spaces, the interplay between localized and delocalized phenomena, and the necessity of careful sampling of rare but critical events. These issues stem from the limited extrapolation capability of machine learning models and the neglect of the underlying physics, such as electronic structure. Many interatomic potentials already contain basic physics principles, such as the principle of locality, the spatial distribution of neighboring atoms, and the hierarchical decay of contributions from more distant atoms. However, a new class of differentiable physics models is emerging, which offers an enticing alternative to traditional scientific modeling. Differentiable physics models leverage auto-differentiation techniques to optimize parameters of the underlying physical theory rather than merely finding a numerical relationship between input and output. These models leverage rigorous domain knowledge and known scientific principles to offer an interpretability that cannot be extracted from the abstract layers of surrogate models. In the context of chemistry and materials simulations, ML-augmented SEQM models leverage differentiable physics encoded in reduced Hamiltonians and pair it with an ML backend. This interface enables responsiveness to the local spatial environment and provides improved accuracy compared to the conventional optimization approaches of early works. The synergy between differentiable physics and data-driven approaches can revive statically parameterized SEQM methods, which have been largely superseded by DFT calculations over the last few decades.
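The distinction between fitting a black-box input-output map and optimizing parameters of the underlying theory can be made concrete with a minimal sketch. Here a physically meaningful parameter (a hypothetical Morse well depth D) is recovered by gradient descent on a data-matching loss; the hand-coded analytic gradient stands in for automatic differentiation, and all numbers are synthetic.

```python
import numpy as np

def morse(r, D, a=1.0, r0=1.0):
    """Toy Morse-form energy; D is the physical parameter to be learned."""
    return D * (1.0 - np.exp(-a * (r - r0))) ** 2

# Synthetic "reference" data generated with a ground-truth well depth of 4.5,
# which the optimization does not know in advance.
r_data = np.linspace(0.8, 2.0, 30)
E_data = morse(r_data, D=4.5)

D = 1.0  # initial guess for the physical parameter
lr = 1.0
for _ in range(300):
    g = (1.0 - np.exp(-(r_data - 1.0))) ** 2  # dE/dD for each data point
    resid = morse(r_data, D) - E_data
    grad = 2.0 * np.mean(resid * g)           # gradient of the mean-squared loss
    D -= lr * grad                            # gradient step on the *physics* parameter
```

Because the optimized quantity is a parameter of the physical model rather than an opaque weight, the fitted value retains its physical interpretation, which is the essence of the differentiable-physics approach described above.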
The current development directions can be roughly classified into four categories: Δ-ML corrections on top of a base layer of pure SEQM calculations; backpropagation through Hamiltonian parameters with respect to the local chemical environment; fitting of repulsive contributions and electronic integrals in DFTB; and, finally, full analytical mapping of the electronic Hamiltonian as enabled, for instance, by ACE. We foresee that the evolution of physics-inspired models will include extensions of ML-SEQM to d-elements and open-shell systems (constant companions of chemical reactions), charged molecules, and, especially, excited states, which remain sparsely covered by the current generation of ML-augmented Hamiltonians.
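The first of these categories, Δ-ML, amounts to learning the residual between a cheap baseline and an expensive reference. The sketch below uses toy one-dimensional energies and a polynomial least-squares fit standing in for a neural network correction; `seqm_energy` and `reference_energy` are hypothetical illustrations, not real SEQM or ab initio calls.

```python
import numpy as np

def seqm_energy(x):
    """Toy stand-in for an inexpensive base-layer SEQM energy."""
    return x ** 2

def reference_energy(x):
    """Toy stand-in for an expensive ab initio reference energy."""
    return x ** 2 + 0.1 * np.sin(3.0 * x)

# Fit a small correction model to the baseline -> reference residual.
x_train = np.linspace(-1.0, 1.0, 50)
residual = reference_energy(x_train) - seqm_energy(x_train)
coeffs = np.polyfit(x_train, residual, deg=7)

def delta_ml_energy(x):
    """Delta-ML prediction: cheap baseline plus learned correction."""
    return seqm_energy(x) + np.polyval(coeffs, x)
```

The baseline carries the dominant physics, so the learned correction is small and smooth, which is precisely why Δ-ML models tend to need far less training data than end-to-end surrogates.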
In this perspective, we outlined recent trends in incorporating ML into reduced electronic structure calculations. SEQM in this context seems to be the most reasonable foundation for further improvement, given the abundance of static parameters and the overall low computational cost. Leveraging domain knowledge, as initially demonstrated by ML-EHM, the ML-parametrized PM3 model and ML-DFTB improve the accuracy of predictions on held-out tests, as well as transferability to unknown regions of chemical space, bringing applications of ML closer to practical discovery. As always, an open and long road of experimentation with different levels of theory and their optimization routines lies ahead, and many improvements must be implemented to make ML-SEQM fully generalizable.
The emergence of new models and tools highlights the dynamic nature of the field, which demands that computational chemists keep pace with an ever-changing environment. This dictates significant changes in culture and education. Future experts in the field are expected to possess a solid background in quantum mechanics and computational chemistry, as well as proficiency in programming, statistics, and data analysis. We envision a substantial shift in educational and workforce development programs at all levels, including high school, undergraduate, and graduate education. As the famous quote attributed to Derek Lowe suggests, “It is not that machines are going to replace chemists. It is that the chemists who use machines will replace those that do not.”107
ACKNOWLEDGMENTS
The authors gratefully acknowledge Dr. James P. Stewart for his insightful and stimulating discussions during his visit. The work at Los Alamos National Laboratory (LANL) was supported by the LANL Directed Research and Development Funds (LDRD) and performed in part at the Center for Nonlinear Studies (CNLS) and the Center for Integrated Nanotechnologies (CINT), a US Department of Energy (DOE) Office of Science user facility at LANL. N.F. and M.K. acknowledge the financial support of the Director's Postdoctoral Fellowship at LANL funded by LDRD. K.B. and S.T. acknowledge support from the US DOE, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC (“Triad”) Contract No. 89233218CNA000001 (FWP: LANLE3F2). This research used resources provided by the LANL Institutional Computing (IC) Program. LANL is managed by Triad National Security, LLC, for the US DOE’s NNSA, under Contract No. 89233218CNA000001. O.I. acknowledges financial support from NSF under the CCI Center for Computer Assisted Synthesis Grant No. CHE-2202693.
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Nikita Fedik: Conceptualization (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Benjamin Nebgen: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Nicholas Lubbers: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Kipton Barros: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Maksim Kulichenko: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Ying Wai Li: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Roman Zubatyuk: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Richard Messerly: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Olexandr Isayev: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Sergei Tretiak: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.