Variational quantum Monte Carlo (QMC) is an ab initio method for solving the electronic Schrödinger equation that is exact in principle, but limited by the flexibility of the available Ansätze in practice. The recently introduced deep QMC approach, specifically two deep-neural-network Ansätze PauliNet and FermiNet, allows variational QMC to reach the accuracy of diffusion QMC, but little is understood about the convergence behavior of such Ansätze. Here, we analyze how deep variational QMC approaches the fixed-node limit with increasing network size. First, we demonstrate that a deep neural network can overcome the limitations of a small basis set and reach the mean-field (MF) complete-basis-set limit. Moving to electron correlation, we then perform an extensive hyperparameter scan of a deep Jastrow factor for LiH and H4 and find that variational energies at the fixed-node limit can be obtained with a sufficiently large network. Finally, we benchmark MF and many-body Ansätze on H2O, increasing the fraction of recovered fixed-node correlation energy of single-determinant Slater–Jastrow-type Ansätze by half an order of magnitude compared to previous variational QMC results, and demonstrate that a single-determinant Slater–Jastrow-backflow version of the Ansatz overcomes the fixed-node limitations. This analysis helps understand the superb accuracy of deep variational Ansätze in comparison to the traditional trial wavefunctions at the respective level of theory and will guide future improvements of the neural-network architectures in deep QMC.

The fundamental problem in quantum chemistry is to solve the electronic Schrödinger equation as accurately as possible at a manageable cost. Variational quantum Monte Carlo (variational QMC or VMC in short) is an ab initio method based on the stochastic evaluation of quantum expectation values; it scales favorably with system size and provides explicit access to the wavefunction.1 Although exact in principle, VMC depends strongly on the quality of the trial wavefunction, which determines both the efficiency and the accuracy of the computation and typically constitutes the limiting factor of VMC calculations.

Recently, deep QMC has been introduced: a new class of Ansätze that complement traditional trial wavefunctions with the expressiveness of deep neural networks (DNNs). This ab initio approach is orthogonal to the supervised learning of electronic structure, which requires external datasets.2,3 The use of neural-network trial wavefunctions was pioneered for spin lattice systems4 and later generalized to molecules in second quantization.5 The first application to molecules in real space was a proof-of-principle effort but did not reach accuracy close to that of traditional VMC.6 The DNN architectures PauliNet and FermiNet advanced the real-space deep QMC approach,7,8 raising the accuracy to state-of-the-art levels and beyond. Demonstrating very high accuracy with far fewer determinants than their traditional counterparts, these deep-neural-network trial wavefunctions provide an alternative to increasing the number of Slater determinants, thus potentially improving the unfavorable scaling with respect to the number of electrons that complicates accurate calculations for large systems. Application of the deep QMC method to many-particle quantum systems other than electrons is also possible.9

Currently, there is little understanding of why these DNN wavefunctions work well and how their individual components contribute to the approximation of the ground-state wavefunction and energy. Examining their expressive power and measuring their accuracy in comparison to traditional approaches is essential to establish neural-network trial wavefunctions as a standard technique in VMC and to guide further development.

Here, we identify a hierarchy of model Ansätze based on the traditional VMC methodology (Fig. 1) that enables us to distinguish the effects of improving the single-particle orbitals from those of adding correlation in the symmetric part of the wavefunction Ansatz. This is of particular interest for discriminating these improvements from energy reductions obtained by addressing the intricate problem of missing many-body effects in the nodal surface.

FIG. 1.

Hierarchy of single-determinant Ansätze in QMC. The starting point of a finite-basis Hartree–Fock (HF) calculation can be extended by a “mean-field” (MF) Jastrow factor to improve the one-electron density of the Ansatz. From that point, the Ansatz can be improved in one of the two directions, by modifying the orbitals (bottom–top) or introducing electron correlation (left–right). The red pathway shows a standard approach in traditional QMC and is the path we pursue in our analysis.


The trial wavefunctions in QMC are typically constructed by combining a symmetric Jastrow factor with an antisymmetric part that implements the Pauli exclusion principle for fermions by specifying the nodal surface of the Ansatz—the hypersurface in the space of electron coordinates, r = (r1, …, rN)—on which the wavefunction changes sign. Expressing the antisymmetric part as a linear combination of Slater determinants gives rise to the Slater–Jastrow-backflow-type Ansatz that encompasses most VMC Ansätze, including the deep variants PauliNet and FermiNet,

$$\psi(\mathbf{r}) = e^{J(\mathbf{r})} \sum_p c_p \det\big[\tilde\varphi_{p\mu_i}(\mathbf{r})\big]. \tag{1}$$

The ability of neural networks to represent antisymmetric (wave) functions has also been explored theoretically.10,11

Traditionally, Slater determinants are antisymmetrized product states constructed from single-particle molecular orbitals, which are expressed in a one-electron basis set consisting of basis functions ϕk,

$$\varphi_\mu(\mathbf{r}) = \sum_k c_{\mu k}\,\phi_k(\mathbf{r}). \tag{2}$$

Employing such basis sets transforms the problem of searching over infinitely many functions into a problem of searching over coefficients in a system of equations, which can be solved by means of linear algebra applying, for instance, the Hartree–Fock (HF), the multi-configurational self-consistent field (MCSCF), or the full configuration interaction (FCI) method. The projection comes at the cost of introducing the finite-basis-set error (BSE), which completely vanishes only in the limit of infinitely many basis functions—the complete-basis-set (CBS) limit [Fig. 1(a)]. Finite-basis-set errors are inherent to the second-quantized representation, which, nevertheless, provides an alternative platform to introduce deep learning to quantum chemistry.5 
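For illustration, the finite-basis-set error at the mean-field level can be probed with a few lines of PySCF,21 the package also used for our FCI references; the sketch below is not part of the deep QMC workflow, the LiH geometry is that of Table VII, and the basis-set sequence is merely an example.

```python
# Minimal illustration of the finite-basis-set error with PySCF (Ref. 21):
# the HF energy of LiH (geometry of Table VII) in increasingly large bases
# approaches the complete-basis-set (CBS) limit from above.
from pyscf import gto, scf

for basis in ("6-31g", "cc-pvtz", "cc-pvqz"):
    mol = gto.M(atom="Li 0 0 0; H 1.595 0 0", basis=basis, unit="angstrom")
    e_hf = scf.RHF(mol).kernel()  # mean-field energy in the given basis
    print(f"HF/{basis}: {e_hf:.6f} Ha")
```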

The real-space formulation of VMC allows us to introduce explicit electron correlation efficiently by modeling many-body interactions with a Jastrow factor [Fig. 1(b)]. The Jastrow factor is a symmetric function of the electron coordinates that traditionally involves an expansion in one-, two-, and three-body terms.12 Although strongly improving the Ansatz, traditional Jastrow factors do not have sufficient expressiveness to reach high accuracy, and an initial VMC calculation is typically followed by a computationally demanding fixed-node diffusion QMC (FN-DMC) simulation [Fig. 1(c)], which eventually projects out the exact solution for the given nodal surface—the fixed-node limit.13 DMC is based on the imaginary-time Schrödinger equation and offers yet another entry point for the use of neural networks to represent quantum states.14,15
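To make the traditional form concrete, a minimal two-body Jastrow term is the well-known Padé form u(r) = a r/(1 + b r), whose linear coefficient can be chosen to satisfy the electron–electron cusp condition. The sketch below is a generic illustration with hypothetical default parameters, not the specific parameterization of Ref. 12.

```python
import numpy as np

def pade_u(r, a=0.5, b=1.0):
    """Two-body Pade-Jastrow term u(r) = a*r/(1 + b*r); a = 1/2 (1/4 for
    same-spin pairs) enforces the electron-electron cusp, b is variational."""
    return a * r / (1.0 + b * r)

def jastrow(r_elec, b=1.0):
    """exp(sum_{i<j} u(r_ij)) for electron positions r_elec of shape (N, 3),
    ignoring the spin dependence of the cusp coefficient for brevity."""
    diff = r_elec[:, None, :] - r_elec[None, :, :]
    r_ij = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(r_elec), k=1)  # unique pairs i < j
    return np.exp(pade_u(r_ij[iu], b=b).sum())
```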

The nodal surface of the trial wavefunctions can be improved by increasing the number of determinants or by applying the backflow technique, which transforms single-particle orbitals into many-body orbitals while respecting the symmetry constraints. These are key concepts for efficiently reaching very high accuracy with VMC and integral features of deep QMC. Simultaneously using multiple determinants, applying the backflow technique, and modifying the symmetric component of the Ansatz, however, makes it difficult to identify the contribution of each individual part. Benchmarking deep QMC Ansätze in conceptually simpler contexts confirms their correct functionality and helps achieve a better understanding.

In this paper, we take a closer look at how neural networks compensate for errors arising from finite basis sets and demonstrate convergence to the fixed-node limit within the VMC framework by systematically increasing the expressiveness of a deep Jastrow factor. To disentangle the individual contributions to the overall accuracy, we conduct our analysis mainly with Slater–Jastrow-type trial wavefunctions whose antisymmetric part consists of a single determinant, that is, with Ansätze possessing a mean-field nodal surface. We compare neural-network variants with traditional functional forms as well as with DMC results. In particular, we investigate PauliNet, a recently proposed neural-network trial wavefunction.7 PauliNet combines ideas from conventional trial wavefunctions, such as a symmetric Jastrow factor, a generalized backflow transformation, multi-determinant expansions, quantum-chemistry baselines, and an explicit implementation of the physical constraints of ground-state wavefunctions. Since PauliNet is a powerful instance of the general Ansatz in (1), we can obtain traditional types of QMC Ansätze at different levels of theory by deactivating certain trainable parts of PauliNet. The hierarchy of Ansätze sketched in Fig. 1 places restricted single-determinant versions of PauliNet and their respective expressiveness in the context of the traditional single-determinant VMC approach. The purpose of implementing restricted variants of PauliNet is to test the behavior of the Ansatz in settings that are well solved by existing methods and to investigate the expressiveness of the individual components of PauliNet on well-defined subproblems. These restricted variants, however, are not intended for achieving the best accuracy, which is attained when taking advantage of the full flexibility of the PauliNet Ansatz, as demonstrated previously.7

The rest of the paper is organized as follows. In Sec. II, we review the general PauliNet Ansatz and show how different levels of the model hierarchy (Fig. 1) can be obtained. In Sec. III, we use these instances of PauliNet to investigate several subproblems of the fixed-node limit within the deep QMC approach. First, we demonstrate that DNNs can be employed to correct the single-particle orbitals of a HF calculation in a small basis and obtain energies close to the CBS limit. Next, we benchmark the deep Jastrow factor. We start by applying it to two node-less test systems, H2 and He, where results within five significant digits of the exact energy are achieved. Next, we conduct an extensive hyperparameter search for two systems with four electrons, LiH and the H4 rectangle, revealing that the expressiveness of the Ansatz can be systematically increased to converge to the fixed-node limit imposed by the employed antisymmetric Ansatz. We further explore the convergence aspect by sampling the dipole moment for the LiH Ansätze and evaluating energy differences for two configurations of the hydrogen rectangle. Thereafter, we show the size consistency of the method, examining the optimization of the deep Jastrow factor for systems of non-interacting molecules (H2–H2 and LiH–H2). Finally, we test various single-determinant variants of PauliNet in an analysis of the water molecule and compare them to traditional trial wavefunctions. Section IV discusses the results.

The central object of our investigation is PauliNet, a neural-network trial wavefunction of the form in (1). PauliNet extends the traditional multi-determinant Slater–Jastrow-backflow-type trial wavefunctions,16 retaining physically motivated structural features while replacing ad hoc parameterizations with highly expressive DNNs,

$$\psi_{\boldsymbol\theta}(\mathbf{r}) = e^{\gamma(\mathbf{r}) + J_{\boldsymbol\theta}(\mathbf{r})} \sum_p c_p \det\big[\tilde\varphi_{\boldsymbol\theta,p\mu_i}(\mathbf{r})\big], \tag{3}$$
$$\tilde\varphi_{\boldsymbol\theta,p\mu_i}(\mathbf{r}) = \varphi_{p\mu}(\mathbf{r}_i)\,f_{\boldsymbol\theta,p\mu_i}(\mathbf{r}). \tag{4}$$

The Ansatz consists of a linear combination of Slater determinants of molecular single-particle orbitals φμ corrected by a generalized backflow transformation fθ and of a Jastrow factor Jθ. The DNN components are indicated by the θ subscript, denoting the trainable parameters of the involved neural networks. The expansion coefficients cp and the single-particle orbitals are initialized from a preceding standard quantum-chemistry calculation (HF or MCSCF). The analytically known electron–nucleus and electron–electron cusp conditions17 are enforced within the orbitals φμ and as a fixed part γ of the Jastrow factor, respectively. The correct cusps are maintained by designing the remaining trial wavefunction architecture to be cusp-less.

Both the backflow transformation and the Jastrow factor can introduce many-body correlation and are constructed in such a way that they preserve the antisymmetry of the trial wavefunction. The Jastrow factor is a symmetric function, that is, it is invariant under the exchange of same-spin electrons and therefore retains the antisymmetry; as a consequence, it scales the wavefunction without altering the nodes of the Ansatz. The backflow transformation, on the other hand, alters the nodal surface by acting on the orbitals directly. Traditionally, the backflow correction introduces many-body correlation by assigning quasi-particle coordinates that are streamed through the original orbitals. PauliNet generalizes this concept, based on the observation that equivariance with respect to the exchange of electrons is a sufficient criterion to retain the antisymmetry of the Slater determinant, and considers the backflow correction as a many-body transformation of the orbitals themselves. In fact, it has been shown in principle that a single Slater determinant with generalized orbitals is capable of representing any antisymmetric function, provided the many-body orbitals are sufficiently expressive.11 Both the Jastrow factor J_θ and the backflow transformation f_θ are obtained from a joint latent-space representation encoded by a graph-convolutional neural network. The network acts on the rotation- and translation-invariant representation of the system given by the fully connected graph of distances between all electrons and nuclei. The latent-space many-body representation is designed to be equivariant under the exchange of same-spin electrons, which is used to construct the permutation-equivariant backflow transformation and the permutation-invariant Jastrow factor. Details on the graph-convolutional neural-network architecture can be found in the Appendix. Combining an expansion in Slater determinants with the Jastrow factor and backflow transformation provides multiple ways to model many-body effects, helping us to efficiently encode correlation in the Ansatz by, e.g., representing dynamic correlation explicitly while capturing static correlation with multiple determinants.
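How these parts compose can be summarized in a few lines of schematic Python. The sketch below evaluates log|ψ| for an Ansatz of the form of (3) and (4); `orbitals`, `backflow`, and `jastrow` are placeholder callables standing in for the actual networks, and the spin-block structure of the determinants is omitted for brevity.

```python
import torch

def log_abs_psi(r, orbitals, backflow, jastrow, coeffs):
    """Schematic evaluation of Eqs. (3) and (4); placeholder callables:
    orbitals: (batch, N, 3) -> (batch, N, N) matrix phi_mu(r_i)
    backflow: (batch, N, 3) -> (batch, P, N, N) corrections f_{p,mu,i}(r)
    jastrow:  (batch, N, 3) -> (batch,) symmetric exponent gamma + J
    coeffs:   (P,) determinant coefficients c_p
    """
    phi_tilde = orbitals(r)[:, None] * backflow(r)   # Eq. (4)
    sign, logdet = torch.linalg.slogdet(phi_tilde)   # (batch, P) each
    dets = sign * torch.exp(logdet)                  # determinant values
    psi = torch.einsum("p,bp->b", coeffs, dets)      # sum_p c_p det_p
    return jastrow(r) + psi.abs().log()              # Eq. (3), log scale
```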

The PauliNet Ansatz is optimized according to the standard VMC scheme1 of minimizing its energy expectation value. This is based on the variational principle of quantum mechanics that guarantees the energy expectation value of any trial wavefunction to be lower-bounded by the ground-state energy, as long as the fermionic antisymmetry constraint is implemented,

$$E[\psi_{\boldsymbol\theta}] = \frac{\langle \psi_{\boldsymbol\theta} | \hat H | \psi_{\boldsymbol\theta} \rangle}{\langle \psi_{\boldsymbol\theta} | \psi_{\boldsymbol\theta} \rangle} \geq E_0. \tag{5}$$

In VMC, this expectation value is approximated by Monte Carlo integration,

$$E[\psi_{\boldsymbol\theta}] = \mathbb{E}_{\mathbf{r} \sim |\psi_{\boldsymbol\theta}|^2}\big[E_{\mathrm{loc}}(\mathbf{r})\big] \approx \frac{1}{M} \sum_{m=1}^{M} E_{\mathrm{loc}}(\mathbf{r}_m), \qquad E_{\mathrm{loc}}(\mathbf{r}) = \frac{\hat H \psi_{\boldsymbol\theta}(\mathbf{r})}{\psi_{\boldsymbol\theta}(\mathbf{r})}, \quad \mathbf{r}_m \sim |\psi_{\boldsymbol\theta}|^2. \tag{6}$$

In practice, this gives rise to an alternating scheme of sampling electronic configurations from the probability density associated with the trial wavefunction using a standard Langevin sampling approach and optimizing the parameters of the wavefunction by following the (stochastic) gradient of energy estimates over small batches. For further details of the training methodology, see Ref. 7. Numerical calculations were carried out with the DeepQMC Python package,18 with training hyperparameters as reported in Table VI.
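As an illustration of this scheme (and not of the actual DeepQMC implementation), a single optimization step with the standard log-derivative gradient estimator might look as follows; `log_psi_fn` and `local_energy_fn` are assumed to be supplied, and the walkers r are assumed to be equilibrated samples from |ψ_θ|².

```python
import torch

def vmc_step(log_psi_fn, local_energy_fn, r, opt):
    """One schematic VMC update: the gradient of E[E_loc] with respect to
    the network parameters equals 2*E[(E_loc - E[E_loc]) * grad log|psi|]."""
    e_loc = local_energy_fn(r).detach()   # (batch,) local energies, no grad
    log_psi = log_psi_fn(r)               # (batch,) log|psi_theta(r)|
    loss = 2.0 * torch.mean((e_loc - e_loc.mean()) * log_psi)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return e_loc.mean().item()

# e.g., opt = torch.optim.AdamW(model.parameters()), matching Table VI
```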

Next, we show how to obtain the Ansätze of Fig. 1 from the general PauliNet architecture and introduce the respective optimization problems to be solved.

The simplest way to approach the quantum many-body problem is by considering a mean-field theory. The HF method gives the optimal mean-field solution within the space of the employed basis set. A mean-field variant of the PauliNet architecture can be used to account for finite-basis-set errors in the HF baseline, by introducing a real-space correction to the single-particle orbitals,

$$\psi^{\mathrm{MF}}_{\boldsymbol\theta}(\mathbf{r}) = \det\big[\tilde\varphi_{\boldsymbol\theta,\mu}(\mathbf{r}_i)\big], \tag{7}$$
$$\tilde\varphi_{\boldsymbol\theta,\mu}(\mathbf{r}_i) = \varphi_\mu(\mathbf{r}_i)\,f_{\boldsymbol\theta,\mu}\big(\mathbf{x}_i^{(L)}(\mathbf{r}_i)\big) + \tilde f_{\boldsymbol\theta,\mu}\big(\mathbf{x}_i^{(L)}(\mathbf{r}_i)\big). \tag{8}$$

The functions f_θ and f̃_θ are implemented by DNNs that generate a multiplicative and an additive correction to the HF orbitals φ_μ, respectively. Combining a multiplicative and an additive correction serves the practical purpose of facilitating the learning process, as the multiplicative correction has a strong effect where the value of the orbital is large, while the additive correction can alter the nodes of the molecular orbital. (In principle, an additive correction alone would be a sufficient parameterization.) This approach is a special case of the generalized backflow transformation in (4), in which the backflow correction depends on the position of the ith electron only. The single-particle representation x_i^{(L)}(r_i) can be obtained by a slight modification of the graph-convolutional architecture, as described in the Appendix. If Gaussian-type orbitals are used, it is common to correct the missing nuclear cusp at the coalescence points within the orbitals. We employ the cusp correction of Ma et al.19 and construct the DNNs to be cusp-less. Though the DNN could in principle approximate the orbitals from scratch, providing the HF baseline, which ensures the correct asymptotics and offers a good initial guess, reduces the training cost and makes the training process more robust. In the mean-field theory, the HF energy at the CBS limit constitutes a benchmark for the best possible solution to the optimization problem.
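A minimal sketch of such a correction layer, with illustrative names and a single linear map per correction, is given below.

```python
import torch
from torch import nn

class OrbitalCorrection(nn.Module):
    """Schematic deep orbital correction, Eq. (8): cusp-less multiplicative
    and additive DNN corrections to HF orbitals, acting on a single-particle
    latent representation x_i(r_i) (all names are illustrative)."""

    def __init__(self, embed_dim, n_orb):
        super().__init__()
        self.f_mult = nn.Linear(embed_dim, n_orb)   # f_theta in Eq. (8)
        self.f_add = nn.Linear(embed_dim, n_orb)    # f-tilde_theta in Eq. (8)

    def forward(self, phi, x):
        # phi: (batch, N, n_orb) HF orbitals, x: (batch, N, embed_dim)
        return phi * self.f_mult(x) + self.f_add(x)
```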

The Slater–Jastrow-type Ansatz goes beyond the mean-field theory by introducing explicit electronic correlation. The symmetric Jastrow factor, however, cannot alter the nodal surface, and the single-determinant Slater–Jastrow-type Ansatz is, therefore, a many-body Ansatz possessing a mean-field nodal surface,

$$\psi_{\boldsymbol\theta}(\mathbf{r}) = e^{\gamma(\mathbf{r}) + J_{\boldsymbol\theta}(\mathbf{r})} \det\big[\varphi_\mu(\mathbf{r}_i)\big]. \tag{9}$$

The deep Jastrow factor J_θ is obtained from the latent-space many-body representation encoded by the graph-convolutional neural network described in the Appendix,

$$J_{\boldsymbol\theta}(\mathbf{r}) = \eta_{\boldsymbol\theta}\Big(\sum_i \mathbf{x}_i^{(L)}\Big). \tag{10}$$

To enforce the symmetry of the Jastrow factor, the permutation-equivariant many-body embeddings x_i^{(L)} are summed over the electrons to give permutation-invariant features. These features serve as the input to a fully connected neural network η_θ, which returns the final Jastrow factor. The process of obtaining the latent-space representation involves multiple smaller components, such as trainable arrays and fully connected neural networks, whose full specification gives rise to a collection of hyperparameters that influence the expressiveness of the Ansatz. A list of the components and the respective hyperparameters can be found in Table V.
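A sketch of this construction, assuming the embeddings x_i^{(L)} have already been computed by the graph-convolutional network, is the following; the width, depth, and activation are illustrative hyperparameters (cf. Table V).

```python
import torch
from torch import nn

class DeepJastrow(nn.Module):
    """Schematic deep Jastrow factor, Eq. (10): summing the permutation-
    equivariant electron embeddings yields a permutation-invariant feature
    vector, which a fully connected network eta_theta maps to a scalar."""

    def __init__(self, embed_dim, width=128, depth=3):
        super().__init__()
        dims = [embed_dim] + [width] * depth + [1]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.SiLU()]
        self.eta = nn.Sequential(*layers[:-1])  # no activation on the output

    def forward(self, x):                        # x: (batch, N, embed_dim)
        return self.eta(x.sum(dim=1)).squeeze(-1)  # J_theta, shape (batch,)
```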

Benchmarking Jastrow factors comes with the difficulty of distinguishing errors arising from the nodal surface from those present due to a lack of expressiveness in the Jastrow factor. The optimal energy of a Slater–Jastrow-type trial wavefunction, however, can be obtained with the FN-DMC algorithm, which gives the exact ground state of the Schrödinger equation under the fixed-node constraint of the antisymmetric part of the Ansatz.

We, furthermore, implement a mean-field Jastrow factor, which constitutes another point in the space of Ansatz classes (Fig. 1),

$$J^{\mathrm{MF}}_{\boldsymbol\theta}(\mathbf{r}) = \sum_i \eta_{\boldsymbol\theta}\big(\mathbf{x}_i^{(L)}(\mathbf{r}_i)\big). \tag{11}$$

The mean-field Jastrow factor can optimize the one-electron density of the Ansatz without modifying the nodal surface or introducing correlation, making its variations a strict subset of the orbital correction. This equips us with an intermediate step in approaching the finite-basis-set limit that can be used to relate the finite-basis-set error to the fixed-node error of the HF baseline. If the many-body Jastrow factor is used, the mean-field version is not needed, as it is implicitly included in the many-body version.
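In code, the distinction from the many-body Jastrow factor of Eq. (10) is only where the sum over electrons is taken; a minimal sketch (with `eta` a placeholder network mapping the embedding dimension to a scalar) is:

```python
# Schematic mean-field Jastrow, Eq. (11): eta acts on each electron's
# single-particle embedding x_i(r_i) separately and the outputs are summed,
# so exp(J) factorizes into one-body terms and introduces no correlation.
def mean_field_jastrow(eta, x_single):      # x_single: (batch, N, embed_dim)
    return eta(x_single).squeeze(-1).sum(dim=1)   # (batch,)
```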

We start from a HF baseline obtained in the small 6-31G basis set. Instead of introducing more basis functions, the PauliNet Ansatz follows the alternative approach of correcting the orbitals directly in real space. We trained the mean-field variant of the PauliNet Ansatz (Sec. II B) on H2, He, Be, LiH, and the hydrogen square H4. For all five test systems, we obtained energies close to the extrapolated CBS limit and recovered at least 97% of the finite-basis-set error (Fig. 2). This shows that the use of a very small basis set for the baseline of PauliNet does not introduce any fundamental limitation to accuracy, because the neural network is able to correct it. We note that such an approach to the CBS limit is practical only within the context of the full PauliNet, not as a standalone technique to replace large basis sets in quantum chemistry.

FIG. 2.

Removing the basis-set error of a HF calculation. The error with respect to the estimated CBS limit of HF calculations with increasingly large basis sets as well as the result of employing a DNN to correct the single-particle orbitals of a HF calculation in a small basis (6-31G) are shown. The statistical error of the Monte Carlo integration is shown in light blue. The DNN is capable of correcting deficiencies arising from the finite basis, producing energies close to the CBS limit.


Next, we turn to modeling electron correlation with the deep Jastrow factor (Sec. II C). We start by evaluating the deep Jastrow factor for H2 and He, two-electron closed-shell systems for which the ground state is node-less (the antisymmetry comes from the spin part of the wavefunction only), such that the Jastrow factor is, in principle, sufficient to reproduce exact results. This yields a pure test of the expressiveness of the deep Jastrow factor. The recovered many-body correlation is measured by the fraction of correlation energy,

$$\eta = \frac{E - E_{\mathrm{HF}}}{E_{\mathrm{exact}} - E_{\mathrm{HF}}}. \tag{12}$$

For both systems, we obtain energies matching five significant digits of the exact references (Table I). We evaluate the Ansatz along the dissociation curve of H2 (Fig. 3). Deep QMC outperforms FCI even with the large cc-pV5Z basis set, reducing the error in correlation energy by one to two orders of magnitude at compressed geometries and still being more accurate at stretched geometries, where the system exhibits static correlation, and the restricted HF baseline gives qualitatively wrong results (ionic contributions resulting in negative interaction energy). The results demonstrate the difficulty of modeling dynamic correlation in Slater-determinant space when applying purely second-quantized approaches and showcase the advantages of explicitly encoding many-body correlations.

TABLE I.

Results for two-electron node-less systems.

System         Deep Jastrow factor   Exact energy              η (%)
H2 (d = 1.4)   −1.174 46(1)          −1.174 474 8 (Ref. 20)    99.97(3)
He             −2.903 72(1)          −2.903 724 7 (Ref. 22)    99.98(2)
FIG. 3.

Dissociation curve of the hydrogen molecule. Upper panel shows total energy. Exact results20 cannot be distinguished from FCI and the deep Jastrow factor. Lower panel shows the percentage of correlation energy recovered. FCI results were obtained with PySCF21 in the cc-pVQZ basis (orange) and cc-pV5Z basis (green). Deep QMC surpasses the FCI accuracy for the entire dissociation curve.


The complexity of modeling correlation increases steeply with the number of particles. We evaluate the performance of the deep Jastrow factor for LiH and the hydrogen rectangle H4. While these four-electron systems exhibit more intricate interactions, they are computationally lightweight, such that the hyperparameter space of the deep Ansätze can be explored exhaustively. With multiple same-spin electrons, the spatial wavefunction is no longer node-less, and the single-determinant Slater–Jastrow Ansatz possesses a fixed-node error. Instead of comparing to exact energies, we, therefore, measure the performance of the Jastrow factor with respect to the fixed-node limit estimated from FN-DMC calculations and report the fraction of recovered fixed-node correlation energy,

$$\eta_{\mathrm{FN}} = \frac{E - E_{\mathrm{HF}}}{E_{\mathrm{FN\text{-}DMC}} - E_{\mathrm{HF}}}. \tag{13}$$

As the fixed-node correlation energy is defined for Ansätze with an identical nodal surface, the nodes of the FN-DMC benchmark have to be reconstructed. For the mean-field nodal surface, this implies starting from a HF computation with the same basis set.
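Both measures are simple ratios; as a minimal helper, with the H2O entries of Table III serving as a consistency check:

```python
def corr_fraction(e, e_hf, e_ref):
    """Fraction of (fixed-node) correlation energy, Eqs. (12) and (13):
    e_ref is the exact energy for eta, or the FN-DMC energy obtained with
    the same nodal surface (same HF basis set) for eta_FN."""
    return (e - e_hf) / (e_ref - e_hf)

# e.g., corr_fraction(-76.4139, -76.0672, -76.42376) ~ 0.972, cf. Table III
```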

For the H4 rectangle, we performed a scan of all the hyperparameters of the deep Jastrow factor including those of the graph-convolutional neural-network architecture (Table V). The scan was a grid search that involved the training of 864 models, comprising models with all combinations of the hyperparameters in the vicinity of their default values. In order to reduce the dimensionality of the experiment, some hyperparameters were merged and varied together. Further details of the scan can be found in the caption of Fig. 8, depicting the energies of all the model instances. The experiment aimed at obtaining a first impression of the hyperparameter space and revealed that by increasing the total number of trainable parameters, the fixed-node limit can be approached. The experiment shows that the energy behaves smoothly with respect to changes in the hyperparameters, and there are no strong mutual dependencies between hyperparameters. Several important hyperparameters for systematically scaling the architecture can be identified, such as the depth of the neural network ηθ from (10), the number of interactions L, and the dimension of the convolutional kernel, referring to the dimension of the latent space where the interactions within the graph-convolutional neural network take place (Appendix).
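Schematically, such a scan is a plain grid search over the hyperparameters of Table V; the names and value ranges below are illustrative, not the exact grid behind Fig. 8, and `train_deep_jastrow` is a hypothetical stand-in for the training routine.

```python
from itertools import product

grid = {
    "n_interactions": [2, 3, 4],     # number of interactions L
    "kernel_dim": [64, 128, 256],    # dimension of the convolution kernel
    "eta_depth": [2, 3, 4],          # depth of the Jastrow network eta
}

for values in product(*grid.values()):
    hparams = dict(zip(grid, values))
    # energy = train_deep_jastrow(**hparams)  # placeholder training call
    print(hparams)
```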

The results were used to perform a thorough investigation of the convergence behavior on LiH, varying a subset of hyperparameters and fixing the remaining hyperparameters at suitable values. We show a systematic convergence to the fixed-node limit with an increasing dimension of the convolutional kernel (DNN width) as well as the number of interactions L (Fig. 4). This is an indication that the deep Jastrow factor can be extended toward completeness in a computationally feasible way. The remaining fluctuations of the fixed-node correlation energy are caused by the stochasticity of the training and the sampling error of evaluating the energy of the wavefunction.

FIG. 4.

Approaching FN-DMC accuracy with the deep Jastrow factor. For LiH, increasingly expressive Jastrow factors are trained. The dependence on the dimension of the convolutional kernel (DNN width), and the number of interactions (L) in the Jastrow graph-neural-network is shown. The orbitals of the antisymmetric part are expressed in the TZP basis.23 The upper panel shows the energy of the trial wavefunctions. The most accurate Ansätze give results within the sampling error of the FN-DMC energy of a single-determinant benchmark from Casalegno, Mella, and Rappe,24 indicated by the shaded region at the upper end of the graph. In the lower panel, the dipole moment of the trial wavefunctions is evaluated. As a reference, the dipole moments of a HF (dashed line) and of an explicitly correlated coupled-cluster [CCSD(T)-R12] calculation25 (dashed–dotted line) are shown. For the models with five and six interactions, the dipole moment converges to the results from the coupled-cluster calculation.


By evaluating the dipole moment of the LiH wavefunctions, we go beyond the energy and investigate the convergence of a property that the PauliNet Ansatz is not explicitly optimized for. We found that upon converging to the fixed-node limit with increasingly large models, the dipole moment approaches the coupled-cluster reference (Fig. 4). Even though the energies of the LiH wavefunctions converge consistently, the convergence of the dipole moment is subject to fluctuations, which are particularly strong for the small models and decrease as the fixed-node limit is approached. This can be explained by degenerate energy minima of the Ansatz with respect to the parameters. Multiple solutions to the optimization problem can be present if the exact solution lies outside the variational subspace. This ambiguity, however, decreases with increasing expressiveness of the trial wavefunction.
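The dipole moment is itself just a Monte Carlo expectation value over |ψ|², so no machinery beyond the sampler is needed; a minimal estimator (atomic units, with electrons carrying charge −1) might read:

```python
import torch

def dipole_moment(r_samples, charges, coords):
    """MC estimate of <mu> = sum_I Z_I R_I - <sum_i r_i> over samples
    r_samples ~ |psi|^2 of shape (n_samples, N, 3); charges: (n_nuc,),
    coords: (n_nuc, 3), all in atomic units."""
    nuclear = (charges[:, None] * coords).sum(dim=0)   # sum_I Z_I R_I
    electronic = r_samples.sum(dim=1).mean(dim=0)      # <sum_i r_i>
    return nuclear - electronic
```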

While the accuracy with respect to the total energy is an appropriate measure for the expressiveness of the trial wavefunction Ansatz, in practice, relative energies are most often of interest. The capability of the full PauliNet Ansatz in computing relative energies has been previously demonstrated for the cyclobutadiene automerization,7 and the results with the deep Jastrow factor for the node-less H2 (Fig. 3) provide interaction energies at the level of FCI. Here, we want to study how the relative energy converges with increasing expressiveness of the deep Jastrow factor and demonstrate a cancellation of errors at different geometries. This is a feature that usually makes relative energy calculations more accurate than total energy calculations and is very desirable for any quantum-chemistry method. We optimized increasingly expressive versions of the deep Jastrow factor for two geometries of the hydrogen rectangle (Table VII) and determined their relative energy (Fig. 5). In order to reduce the level of stochasticity in the training, we performed five independent optimization runs and used the ones with the lowest energy to calculate the relative energies. Both the total and relative energies converge to the DMC reference with an increasing number of trainable parameters of the Ansatz. Furthermore, the relative energy fluctuates within 2 kcal/mol of the DMC reference for all models with more than two interactions and is well within 1 kcal/mol for the models with the largest DNN width. This demonstrates that the deep Jastrow factor can achieve similar accuracy for both geometries and exhibits a cancellation of errors. The stochasticity of the optimization, however, complicates the comparison of individual runs, which will be subject to further investigation. Looking at the energy of the different optimizations, though, we found a decrease in the stochasticity of the final energy with increasing model size, which is convenient for practical purposes, where typically large models would be used. The difficulty of optimizing small models is well known in the context of training neural networks and tends to be alleviated by increasing the number of trainable parameters.27

FIG. 5.

Relative energy for hydrogen rectangle. The convergence of the energy to the DMC reference energy26 (black line) with increasing model size for two geometries of the hydrogen rectangle, as well as the relative energy difference, is shown. The minimal energy over five independent runs is highlighted in the upper plots and used to compute the relative energy. The dashed lines indicate error margins of 1 and 2 kcal/mol.


One of the essential properties of any proper electronic structure method is size consistency. Traditional Jastrow factors are factorizable in the electronic and nuclear coordinates of two infinitely distant subsystems, which leads to exact size consistency for identical copies of a given system and to approximate size consistency for an assembly of different systems (because optimized parameters are then shared by different systems). In PauliNet, the embeddings x_i for two electrons in two distant subsystems are independent of each other by construction. Although the subsequent nonlinear transformation η_θ applied to the sum of the embeddings breaks exact factorizability, it could be restored by applying the transformation before summing the embeddings, which in numerical experiments does not affect performance. Regardless, in numerical experiments with two systems of non-interacting molecules (H2–H2 and LiH–H2), we show that even the variant of our Ansatz that is not exactly factorizable is size-consistent in practice (Table II). For the system composed of two distant hydrogen molecules, both the combined and individual calculations give nearly exact results, 99.99(1)% and 100.00(1)% of the correlation energy, respectively. In the second test with LiH and H2, 99.65(2)% and 99.68(2)% of the correlation energy are achieved, respectively, which corresponds to a difference of less than 10% of the overall error of PauliNet with respect to the exact energy. The results, furthermore, show that optimization of the Ansatz for the combined system works similarly well as optimizing the separate instances for the respective subsystems (Fig. 6).

TABLE II.

Sampled energies of the size consistency experiment.

System    Combined       Individual     Exact (Refs. 20 and 28)
H2–H2     −2.348 94(1)   −2.348 95(1)   −2.348 95
LiH–H2    −9.243 94(7)   −9.244 05(7)   −9.245 01
FIG. 6.

Size consistency of deep Jastrow factor optimization. This figure shows the smoothed training curves of the deep Jastrow factor for two systems of non-interacting molecules. The optimization of an Ansatz for the joint system (orange) is compared to the independent optimization of two separate Ansätze for the subsystems (blue), respectively.


The results for the small test systems showed that DNNs can be used to converge to the CBS limit within the mean-field theory and that, by adding correlation with a deep Jastrow factor, the fixed-node limit can be approached. To investigate how these Ansätze behave for larger systems, we evaluated the respective instances of PauliNet on the water molecule (Figs. 7 and 9). These experiments aim at demonstrating that the same Ansätze can be applied to a variety of systems without any modifications and at testing how much their respective accuracy decreases if the size of the graph-convolutional neural network is kept fixed. For the experiment, we chose four interactions and a kernel dimension of 256, equal to that of the large models from the H4 and LiH experiments. Due to the convolutional nature of the neural network, the number of trainable parameters is mostly independent of the number of electrons; hence, it is similar to that of the previous experiments.

FIG. 7.

Single-determinant variants of PauliNet evaluated on the H2O molecule. All three restricted variants of PauliNet introduced in Sec. II are compared: the deep orbital correction (6-311G + DNN), the full deep Jastrow factor, and the mean-field (MF) Jastrow factor. Furthermore, the energy of the combined Ansatz (Jastrow@6-311G + DNN) and single-determinant versions of the full PauliNet (full) are shown. ROOS denotes the Roos-aug-DZ-ANO basis set. VMC and DMC references are taken from Gurtubay and Needs.30 HF@CBS and exact energy are taken from Rosenberg and Shavitt.32 The Monte Carlo sampling error is smaller than the size of the markers. The training curves for the Ansätze are shown in Fig. 9.

FIG. 8.

Hyperparameter scan of the H4 square. This figure maps models with different combinations of hyperparameters in the space of accuracy with respect to the total number of trainable parameters. The exact set of hyperparameters of each model can be decoded via the accompanying legend. In order to reduce the degrees of freedom, the embedding and kernel dimensions were joined and varied together. The models with increasing embedding and kernel dimensions can be distinguished by their total number of parameters; hence, no explicit criterion is introduced. The energy of the models is estimated from the final steps of the optimization procedure. The fixed-node correlation energy is obtained with respect to FN-DMC results,26 where a single-determinant trial wavefunction was used with a basis consisting of the s functions from the cc-pV5Z basis set and the p and d functions from the cc-pVTZ basis set. By increasing the number of trainable parameters, the fixed-node limit can be approached.

FIG. 9.

Training curve for the H2O experiment. Exponentially weighted moving averages of the energy along the optimization process are shown for the Ansätze from Fig. 7. The upper panel depicts the percentage of the recovered finite-basis-set error (BSE) for the mean-field Ansätze, given by the energy difference of the HF@6-311G baseline from the complete-basis-set limit estimate. The center panel gives the fixed-node correlation energy of the Slater–Jastrow-type versions of PauliNet with respect to the DMC reference in the Roos-aug-DZ-ANO basis set.30 The lower panel shows the total correlation energy for the full PauliNet Ansätze.


We again start with the mean-field theory and consider the finite-basis-set error. We corrected the HF orbitals in the small 6-311G basis set with the deep orbital correction (Sec. II B), which recovered 90% of the finite-basis-set error. We then estimated how much of the finite-basis-set error amounts to the fixed-node error by applying the mean-field Jastrow factor (Sec. II D), which recovers only about half of the finite-basis-set error. This suggests that upon approaching the finite-basis-set limit, the nodal surface is altered significantly.

Next, we investigate single-determinant Ansätze with the full Jastrow factor (Sec. II C). We benchmark the deep Jastrow factor with a HF determinant in the Roos augmented double-zeta basis (Roos-aug-DZ-ANO),33 a basis set that is frequently used for calculations on H2O and gives HF energies at the CBS limit. We compare to VMC and DMC results from the literature, achieving 97.2(1)% of the fixed-node correlation energy and surpassing the accuracy of previous VMC calculations with single-determinant Slater–Jastrow trial wavefunctions by half an order of magnitude (Table III).

TABLE III.

Benchmarking single-determinant (SD) Slater–Jastrow (SJ) Ansätze on H2O.

Reference               HF         VMC (SD-SJ)   DMC (SD-SJ)     η_FN (%)     Basis set
PauliNet                −76.009    −76.3923(7)   …               91.2(2)a     6-311G
PauliNet                −76.0612   −76.4096(7)   …               96.0(2)a     6-311G + DNN
PauliNet                −76.0672   −76.4139(5)   …               97.2(1)a     Roos-aug-DZ-ANO
Clark et al.29          …          −76.3938(4)   −76.423 6(2)    91.6(1)      Roos-aug-TZ-ANO
Gurtubay and Needs30    −76.0672   −76.3773(2)   −76.423 76(5)   87.01(6)     Roos-aug-DZ-ANO
Gurtubay et al.31       −76.0587   −76.327(1)    −76.421 02(4)   73.5(3)      6-311++G(2d,2p)

a The fixed-node correlation energy is computed with respect to the reference FN-DMC energy of Gurtubay and Needs.30

In order to study how finite-basis-set errors manifest in the mean-field nodal surface of both the HF and the many-body Ansätze, we computed the energies of the deep Jastrow factor with a HF determinant in a 6-311G basis. The results suggest that finite-basis-set errors of the HF calculations transfer directly to the many-body regime. In particular, the differences between the energies of the mean-field Ansätze match the differences of the respective Slater–Jastrow trial wavefunctions, and errors in the energy due to finite-basis-set effects are not altered by the many-body correlation.

We, furthermore, demonstrate that both methods can be combined, by optimizing a trial wavefunction composed of a deep Jastrow factor and a Slater determinant of orbitals of an imprecise HF baseline that are modified by the orbital correction. The parameters of both the Jastrow factor and orbital correction were optimized simultaneously. The HF baseline was computed in the small 6-311G basis set. With this setup, we were able to achieve energies close to the fixed-node limit of the optimal mean-field nodal surface. Starting from a minimal baseline, we recovered 96.0(2)% of the fixed-node correlation energy with respect to the Roos-aug-DZ-ANO basis.

Finally, we show that the full PauliNet Ansatz can go beyond the fixed-node approximation and train an instance with the same graph-convolutional architecture as in the previous experiments, but using the full backflow transformation. With this Ansatz, we obtained VMC energies of −76.4252(3) and −76.4281(3) for the 6-311G and the Roos-aug-DZ-ANO basis set, respectively, amounting to 96.67(8)% and 97.38(8)% of the total correlation energy. This energy is significantly below the single-determinant DMC results (Fig. 7), demonstrating energetically favorable changes in the nodal surface due to the backflow transformation. A comparison to traditional VMC results shows that the single-determinant version of PauliNet strongly improves on single-determinant Slater–Jastrow-backflow (SD-SJB) trial wavefunctions, and multi-determinant Slater–Jastrow (MD-SJ) wavefunctions need thousands of determinants to obtain a similar accuracy (Table IV). Here, it should be stated that, in principle, the accuracy of PauliNet can be further improved by increasing the size of the graph-convolutional neural-network architecture or introducing multiple determinants. The comparison should, therefore, not be understood as ultimate, but serves to give an impression of the capabilities of the PauliNet backflow. More exemplary calculations with the full PauliNet Ansatz including multi-determinant Ansätze have been carried out previously,7 and a more thorough investigation of the improvements in the nodal surface as well as a benchmark of the computational complexity will be conducted in future work.

TABLE IV.

Comparison of full PauliNet instance with traditional trial wavefunctions.

Reference               Ansatz   # determinants   VMC
PauliNet                SD-SJB   1                −76.4281(3)
Gurtubay and Needs30    SD-SJB   1                −76.4034(2)
Clark et al.29          MD-SJ    2316             −76.4259(6)
Clark et al.29          MD-SJ    7425             −76.4289(8)

We have demonstrated that the choice of the architecture does not introduce fundamental limitations regarding the flexibility of the investigated components of PauliNet and that a systematic improvement of the accuracy is possible when the number of trainable parameters is increased in a suitable way. Energies close to the exact values at the corresponding level of theory can be obtained with both the deep orbital correction and the deep Jastrow factor. This highlights the generality and expressiveness of deep QMC: a single Ansatz without any problem-specific modifications can be applied to a variety of systems and extended systematically to improve the accuracy without introducing new components to the trial wavefunction architecture. Though the results with the deep orbital correction and the deep Jastrow factor emphasize the potential of the deep QMC approach, the major benefit of deep QMC over FN-DMC calculations remains that it can go beyond the fixed-node approximation by faithfully representing the nodal surface upon introducing many-body correlation at the level of the orbitals. We have outlined this with an exemplary calculation on the water molecule, using a single-determinant instance of the full PauliNet Ansatz. The presented analysis paves the way for future investigations of how the full PauliNet Ansatz improves the nodes and overcomes the fixed-node limitations.

We acknowledge funding and support from the European Research Commission (Grant No. ERC CoG 772230), the Berlin Mathematics Research Center MATH+ (Project Nos. AA2-8, EF1-2, and AA2-22), and the German Ministry for Education and Research (Berlin Institute for the Foundations of Learning and Data BIFOLD). J.H. would like to thank K.-R. Müller for support and acknowledge funding from TU Berlin.

The data that support the findings of this study are openly available at http://doi.org/10.6084/m9.figshare.13077158.v3.

At the core of the PauliNet architecture, a graph-convolutional neural network generates a permutation-equivariant latent-space many-body representation of a given electron configuration. In the following, we give a short introduction to the network architecture, discussing both the general concept and its particular application in the context of PauliNet.

Graph neural networks are constructed to represent functions on graph domains34 and have become increasingly popular for modeling chemical systems, as they can be designed to comply with the symmetries of molecules.35 The graph-convolutional neural network of PauliNet is a modification of SchNet,36 an architecture developed to predict molecular properties from atom positions upon being trained in a supervised setting, that is, by repeated exposure to known pairs of input and output data. In SchNet, a trainable embedding is assigned to each atom, which serves to give an abstract representation of the atomic properties in a high-dimensional feature space and is successively updated to encode information about the atomic environment. The updates are implemented as convolutions over the graph of atomic distances, which makes the architecture invariant to translation and rotation and equivariant with respect to the exchange of identical atoms. The graph convolutions, furthermore, implement parameter sharing across the edges such that the number of network parameters does not depend on the number of interacting entities and hence remains constant with the system size. The final features are then used to predict the molecular properties.

In quantum chemistry, we consider modeling electrons and nuclei. At this level, a molecule can be represented as a complete graph, where nodes correspond to electrons and nuclei, and the distances of each pair of particles are assigned to the edge between their respective nodes. The graph-convolutional neural network at the core of PauliNet acts on this graph representation of the system. Similar to the SchNet implementation, a representation in an abstract feature space is assigned to each node by introducing electronic embeddings Xθ,si and nuclear embeddings Yθ,I, respectively. The embeddings are trainable arrays that are initialized randomly. As same-spin electrons are indistinguishable, they share representations and get initialized with a copy of the same electronic embedding,

$$\mathbf{x}_i^{(0)} := \mathbf{X}_{\boldsymbol\theta, s_i}. \tag{A1}$$

The electronic embeddings constitute the latent-space representation, which serves to obtain the Jastrow factor and backflow transformation in the later process of evaluating the trial wavefunction. In order to encode positional information of each electron with respect to the nuclei as well as electronic many-body correlation into the latent-space representation, the electronic embeddings are updated in an interaction process. Information is transmitted along the edges of the graph, by exchanging messages that take the distances to the nuclei and the other electrons into account,

$$\mathbf{z}_i^{(n,\pm)} := \sum_{j:\, s_j = \pm} \mathbf{w}_{\boldsymbol\theta}^{(n,\pm)}\big(\mathbf{e}(r_{ij})\big) \odot \mathbf{h}_{\boldsymbol\theta}^{(n)}\big(\mathbf{x}_j^{(n)}\big), \qquad \mathbf{z}_i^{(n,\mathrm{n})} := \sum_I \mathbf{w}_{\boldsymbol\theta}^{(n,\mathrm{n})}\big(\mathbf{e}(r_{iI})\big) \odot \mathbf{Y}_{\boldsymbol\theta,I}. \tag{A2}$$

Here, the functions w_θ and h_θ are implemented by fully connected neural networks, e represents an expansion of the distances in a basis of Gaussian functions, ⊙ indicates element-wise multiplication, and the sums run over spin-up electrons, spin-down electrons, and nuclei, respectively. For each pair of interacting particles, the filter-generating function w_θ generates a mask that is applied to their respective embeddings; thereby, the filter-generating function moderates interactions based on the distance of the particles. By summing the messages of identical particles, their overall contribution is invariant under the exchange of these identical particles. The transformation h_θ serves to introduce additional flexibility to the architecture by separating the latent-space representation from the interaction space. The superscripts of the neural networks indicate that different functions are applied at each subsequent interaction and that the filter-generating functions for the interactions with spin-up electrons, spin-down electrons, and nuclei are different. The distance expansion e is truncated with an envelope that ensures that it is cusp-less, that is, that all Gaussian features have a vanishing derivative at zero distance, and imposes a long-range cutoff. The final step in the interaction process is to update the electronic embeddings,

$$\mathbf{x}_i^{(n+1)} := \mathbf{x}_i^{(n)} + \sum_{\pm} \mathbf{g}_{\boldsymbol\theta}^{(n,\pm)}\big(\mathbf{z}_i^{(n,\pm)}\big) + \mathbf{g}_{\boldsymbol\theta}^{(n,\mathrm{n})}\big(\mathbf{z}_i^{(n,\mathrm{n})}\big). \tag{A3}$$

Here, the messages are transformed from the interaction space to the embedding space and added to the original embedding. The transformation g_θ is again implemented by fully connected neural networks. This interaction process is repeated L times to successively encode increasingly complex many-body information. The continuous-filter convolutions over the molecular graph and the initialization of electrons with identical embeddings make the architecture equivariant with respect to the exchange of same-spin electrons,

$$\mathbf{x}_{P(i)}^{(L)}(P\mathbf{r}) = \mathbf{x}_i^{(L)}(\mathbf{r}), \qquad P \in S_{N_\uparrow} \times S_{N_\downarrow}. \tag{A4}$$

Overall, this gives a latent-space representation that can efficiently encode electronic many-body effects while intrinsically fulfilling the desired permutation equivariance. Information about the hyperparameters of all of the components is collected in Table V.
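A compressed sketch of one such interaction, merging the spin channels and the nuclear stream into a single message for brevity and omitting the exclusion of self-interactions, is the following; all layer sizes are illustrative.

```python
import torch
from torch import nn

class Interaction(nn.Module):
    """One schematic SchNet-style interaction, Eqs. (A2) and (A3): distance
    filters w modulate neighbor features h element-wise, messages are summed
    over neighbors, and g maps the result back to the embedding space."""

    def __init__(self, embed_dim, kernel_dim, n_dist_features):
        super().__init__()
        self.w = nn.Sequential(nn.Linear(n_dist_features, kernel_dim),
                               nn.SiLU(), nn.Linear(kernel_dim, kernel_dim))
        self.h = nn.Linear(embed_dim, kernel_dim)
        self.g = nn.Sequential(nn.Linear(kernel_dim, embed_dim), nn.SiLU())

    def forward(self, x, e):
        # x: (batch, N, embed_dim) embeddings
        # e: (batch, N, N, n_dist_features) Gaussian-expanded distances
        z = (self.w(e) * self.h(x)[:, None]).sum(dim=2)   # Eq. (A2)
        return x + self.g(z)                              # Eq. (A3)
```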

TABLE V.

Components of the deep Jastrow factor.

Component                                        Type              Hyperparameter
X_θ,s_i — electronic embedding                   Trainable array   Embedding dimension
Y_θ,I — nuclear embeddings                       Trainable array   Kernel dimension
e — distance expansion                           Fixed function    # distance features
w_θ — filter-generating function                 DNN               (# distance features → kernel dimension), depth
h_θ — transformation embedding to kernel space   DNN               (embedding dimension → kernel dimension), depth
g_θ — transformation kernel to embedding space   DNN               (kernel dimension → embedding dimension), depth
η_θ — Jastrow network                            DNN               (embedding dimension → 1), depth
Full architecture                                …                 # interactions L
TABLE VI.

Hyperparameters used in numerical calculations.

Hyperparameter                             Value
One-electron basis                         6-31G
Dimension of e (# distance features)       16
Dimension of x_i (embedding dimension)     128
Number of interaction layers L
Number of layers in η_θ
Number of layers in w_θ
Number of layers in h_θ
Number of layers in g_θ
Batch size                                 2000
Number of walkers                          2000
Number of training steps                   H4: 5000; H2, He, Be, LiH: 10 000; H2O: see Fig. 9
Optimizer                                  AdamW
Learning rate scheduler                    CyclicLR
Minimum/maximum learning rate              0.0001/0.01
Clipping window q
Epoch size                                 100
Number of decorrelation sampling steps
Target acceptance                          57%
TABLE VII.

Geometries of test systems.

Molecule       Atom   Position (Å)
LiH            Li     (0.000, 0.000, 0.000)
               H      (1.595, 0.000, 0.000)
H4 square      H      (−0.635, −0.635, 0.000)
               H      (−0.635, 0.635, 0.000)
               H      (0.635, −0.635, 0.000)
               H      (0.635, 0.635, 0.000)
H4 deformed    H      (−0.900, −0.635, 0.000)
               H      (−0.900, 0.635, 0.000)
               H      (0.900, −0.635, 0.000)
               H      (0.900, 0.635, 0.000)
H2O            O      (0.000 00, 0.000 00, 0.000 00)
               H      (0.756 95, 0.585 88, 0.000 00)
               H      (−0.756 95, 0.585 88, 0.000 00)

A single-particle variant of the graph-convolutional neural-network architecture can be obtained by considering only interactions along edges between electrons and nuclei, that is, the overall architecture of the network remains identical, but the electronic messages z_i^{(n,±)} in (A3) are removed. Since the convolutions over the electronic distances are the only interactions between electrons, the final embeddings then do not contain any many-body correlation.

1. W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, Rev. Mod. Phys. 73, 33 (2001).
2. K. T. Schütt, M. Gastegger, A. Tkatchenko, K.-R. Müller, and R. J. Maurer, Nat. Commun. 10, 5024 (2019).
3. M. Gastegger, A. McSloy, M. Luya, K. T. Schütt, and R. J. Maurer, J. Chem. Phys. 153, 044123 (2020).
4. G. Carleo and M. Troyer, Science 355, 602 (2017).
5. K. Choo, A. Mezzacapo, and G. Carleo, Nat. Commun. 11, 2368 (2020).
6. J. Han, L. Zhang, and W. E, J. Comput. Phys. 399, 108929 (2019).
7. J. Hermann, Z. Schätzle, and F. Noé, Nat. Chem. 12, 891 (2020).
8. D. Pfau, J. S. Spencer, A. G. D. G. Matthews, and W. M. C. Foulkes, Phys. Rev. Res. 2, 033429 (2020).
9. C. Adams, G. Carleo, A. Lovato, and N. Rocco, "Variational Monte Carlo calculations of A ≤ 4 nuclei with an artificial neural-network correlator ansatz," arXiv:2007.14282 (2020).
10. J. Han, Y. Li, L. Lin, J. Lu, J. Zhang, and L. Zhang, "Universal approximation of symmetric and anti-symmetric functions," arXiv:1912.01765 (2019).
11. M. Hutter, "On representing (anti)symmetric functions," arXiv:2007.15298 (2020).
12. N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. B 70, 235119 (2004).
13. R. J. Needs, M. D. Towler, N. D. Drummond, and P. López Ríos, J. Phys.: Condens. Matter 22, 023201 (2009).
14. A. Barr, W. Gispen, and A. Lamacraft, in Proceedings of Machine Learning Research (PMLR, 2020), Vol. 107, pp. 635–653.
15. J. Han, J. Lu, and M. Zhou, J. Comput. Phys. 423, 109792 (2020).
16. P. López Ríos, A. Ma, N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. E 74, 066701 (2006).
17. T. Kato, Commun. Pure Appl. Math. 10, 151 (1957).
18. J. Hermann, Z. Schätzle, and H. E. Sauceda, "deepqmc/deepqmc: DeepQMC 0.3.0," Zenodo (2021).
19. A. Ma, M. D. Towler, N. D. Drummond, and R. J. Needs, J. Chem. Phys. 122, 224322 (2005).
20. W. Kołos and L. Wolniewicz, J. Chem. Phys. 43, 2429 (1965).
21. Q. Sun, X. Zhang, S. Banerjee, P. Bao, M. Barbry, N. S. Blunt, N. A. Bogdanov, G. H. Booth, J. Chen, Z.-H. Cui, J. J. Eriksen, Y. Gao, S. Guo, J. Hermann, M. R. Hermes, K. Koh, P. Koval, S. Lehtola, Z. Li, J. Liu, N. Mardirossian, J. D. McClain, M. Motta, B. Mussard, H. Q. Pham, A. Pulkin, W. Purwanto, P. J. Robinson, E. Ronca, E. R. Sayfutyarova, M. Scheurer, H. F. Schurkus, J. E. T. Smith, C. Sun, S.-N. Sun, S. Upadhyay, L. K. Wagner, X. Wang, A. White, J. D. Whitfield, M. J. Williamson, S. Wouters, J. Yang, J. M. Yu, T. Zhu, T. C. Berkelbach, S. Sharma, A. Y. Sokolov, and G. K.-L. Chan, J. Chem. Phys. 153, 024109 (2020).
22. T. Kinoshita, Phys. Rev. 115, 366 (1959).
23. B. P. Pritchard, D. Altarawy, B. Didier, T. D. Gibson, and T. L. Windus, J. Chem. Inf. Model. 59, 4814 (2019).
24. M. Casalegno, M. Mella, and A. M. Rappe, J. Chem. Phys. 118, 7193 (2003).
25. D. Tunega and J. Noga, Theor. Chem. Acc. 100, 78 (1998).
26. K. Gasperich, M. Deible, and K. D. Jordan, J. Chem. Phys. 147, 074106 (2017).
27. R. Livni, S. Shalev-Shwartz, and O. Shamir, in Advances in Neural Information Processing Systems, edited by Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Curran Associates, Inc., 2014), Vol. 27, pp. 855–863.
28. W. Cencek and J. Rychlewski, Chem. Phys. Lett. 320, 549 (2000).
29. B. K. Clark, M. A. Morales, J. McMinis, J. Kim, and G. E. Scuseria, J. Chem. Phys. 135, 244105 (2011).
30. I. G. Gurtubay and R. J. Needs, J. Chem. Phys. 127, 124306 (2007).
31. I. G. Gurtubay, N. D. Drummond, M. D. Towler, and R. J. Needs, J. Chem. Phys. 124, 024318 (2006).
32. B. J. Rosenberg and I. Shavitt, J. Chem. Phys. 63, 2162 (1975).
33. P.-O. Widmark, P.-Å. Malmqvist, and B. O. Roos, Theor. Chim. Acta 77, 291 (1990).
34. P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu, "Relational inductive biases, deep learning, and graph networks," arXiv:1806.01261 (2018).
35. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, in International Conference on Machine Learning (PMLR, 2017), pp. 1263–1272.
36. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, J. Chem. Phys. 148, 241722 (2018).