Automatic differentiation represents a paradigm shift in scientific programming, where evaluating both functions and their derivatives is required for most applications. By removing the need to explicitly derive expressions for gradients, development times can be shortened and calculations can be simplified. For these reasons, automatic differentiation has fueled the rapid growth of a variety of sophisticated machine learning techniques over the past decade, but is now also increasingly showing its value to support ab initio simulations of quantum systems and enhance computational quantum chemistry. Here, we present an open-source differentiable quantum chemistry simulation code and explore applications facilitated by automatic differentiation: (1) calculating molecular perturbation properties, (2) reoptimizing a basis set for hydrocarbons, (3) checking the stability of self-consistent field wave functions, and (4) predicting molecular properties via alchemical perturbations.

Automatic differentiation is a collection of techniques used to evaluate, up to machine precision, the derivative of a function specified by a computer program. It allows software developers to focus solely on designing the best model for a given problem, without having to worry about implementing any derivatives of the model with respect to its various mathematical parameters. It has already had a transformative effect in machine learning, enabling the development of many new techniques over the past decade, such as batch normalization,1 attention layers,2 and unique neural network architectures.3,4

Automatic differentiation is still relatively new in the context of computational sciences but is already showing promise across a diverse set of applications, including tensor networks,5 computational fluid dynamics,6 and molecular dynamics simulations.7 Automatic differentiation is also growing increasingly popular in quantum chemistry, where it has been used to optimize molecular basis sets,8 to calculate higher derivatives of various exchange–correlation (xc) functionals9 of density functional theory (DFT),10,11 and to determine arbitrary-order nuclear coordinate derivatives of electronic energies.12 

Automatic differentiation is also an essential stepping stone to enable direct integration of quantum chemistry methods with machine learning models and their training. In this context, a differentiable implementation of DFT was recently used to learn the xc functional13 from accurate reference calculations within the density matrix renormalization group approach or from a mixture of computational and experimental data,14 showing a promising new approach to developing transferable and robust xc functionals via deep learning.

Although differentiation in quantum chemistry can be performed via finite-difference schemes, calculating the gradient of an output quantity with respect to a set of input parameters can be time consuming, as the simulation must be repeated as many times as there are input parameters. Moreover, finite-difference methods are prone to numerical instabilities and are very sensitive to the chosen step size. Automatic differentiation addresses these challenges by calculating the analytical gradient via the chain rule, eliminating both the need to run the simulation many times and the need for step-size tuning.

To address the growing need for automatic differentiation in quantum chemistry, we introduce Differentiable Quantum Chemistry (DQC), a DFT and Hartree–Fock (HF)15 simulation code. DQC is written in Python using PyTorch16 and xitorch.17 While PyTorch provides gradient calculations for elementary operations, such as matrix multiplication and explicit eigendecomposition, xitorch provides gradient calculations for functional operations, such as root finding, optimization, and implicit eigendecomposition. The use of PyTorch and xitorch in DQC enables various applications in quantum chemistry, which would otherwise be far more difficult to pursue. For example, the work of Kasim and Vinko14 on learning the xc functional directly from a set of heterogeneous experimental data and calculated density profiles within the constraints of Kohn–Sham DFT already mentioned above was enabled by the use of DQC.

We begin this paper by outlining the basic theory behind DQC in Sec. II. The implementation is described in Sec. III. Applications that exemplify the present approach are given in Sec. IV. We discuss the challenges in the use of automatic differentiation techniques in computational science in Sec. V and summarize our work in Sec. VI.

Quantum chemical calculations of the electronic structure typically require the evaluation of abstract functionals, such as root finding for self-consistent field (SCF) iterations, minimization for geometry optimizations, and the direct minimization approach to the SCF problem. We discuss the handling of these functionals in the context of DQC in this section.

DQC is based on the use of a Gaussian basis set within the linear combination of atomic orbitals (LCAO) approach. For simplicity, we will only present the theory for the spin-restricted formalism, as the spin-unrestricted (and restricted open-shell) formalisms are analogous. The SCF iterations proceed in the LCAO approach by solving Roothaan’s equation18,19
\[ \mathbf{F}(\mathbf{D})\,\mathbf{C} = \mathbf{S}\,\mathbf{C}\,\mathbf{E}, \tag{1} \]
where F(D) is the Fock matrix that is a function of the density matrix D, C is the orbital matrix, S is the overlap matrix, and E is a diagonal matrix containing the orbital energies. The generalized eigenvalue equation in Eq. (1) can be reduced to the normal form by orthogonalizing the overlap matrix S20 and operating in the linearly independent basis spanned by the eigenvectors of S with large enough eigenvalues; any ill-conditioning in the overlap matrix can be removed by choosing a well-defined sub-basis with the help of a pivoted Cholesky decomposition.21 The density matrix D, required to construct the Fock matrix F, can be obtained by
\[ \mathbf{D} = \mathbf{C}\,\mathbf{N}\,\mathbf{C}^{\mathrm{T}}, \tag{2} \]
where N is a diagonal matrix containing the occupation numbers of the orbitals. As discussed in Ref. 19, the SCF procedure requires repeatedly solving Eqs. (1) and (2) until self-consistency is achieved.
The Roothaan iteration in Eq. (1) can be written mathematically as the process of finding a vector y such that
\[ \mathbf{y} = \mathbf{f}(\mathbf{y}, \boldsymbol{\theta}), \tag{3} \]
where f is a function that takes the vector y and other parameters θ and returns the expected value of vector y. In DQC, the parameter y is represented by the Fock matrix F, so the function f is the procedure that solves Eq. (1), calculating the density matrix from Eq. (2) and constructing back the Fock matrix from the density matrix. The parameters θ represent the other variables involved in the calculation, such as the overlap matrix S and the occupation number matrix N. The algorithm and gradient calculation for Eqs. (1) and (3) are already available in PyTorch and xitorch.
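For concreteness, a minimal PyTorch sketch of this fixed-point view of the SCF loop is shown below. The Fock builder fock_from_dm is a hypothetical placeholder for the integral-based construction, and DQC itself delegates the fixed-point solve of Eq. (3), together with its gradient, to xitorch rather than using an explicit loop like this.

```python
import torch

def scf_fixed_point(fock_from_dm, S, nocc, n_iter=50, tol=1e-8):
    """Plain fixed-point SCF loop for Eqs. (1)-(3), with y represented by the Fock matrix F.

    fock_from_dm: hypothetical callable D -> F supplied by the caller.
    S: overlap matrix; nocc: number of doubly occupied orbitals.
    """
    # Orthogonalization: X = S^{-1/2}, built from the eigendecomposition of S
    s_eval, s_evec = torch.linalg.eigh(S)
    X = s_evec @ torch.diag(s_eval.rsqrt()) @ s_evec.T

    D = torch.zeros_like(S)                       # start from an empty density matrix
    F = fock_from_dm(D)
    for _ in range(n_iter):
        # Solve Roothaan's equation F C = S C E in the orthogonal basis
        Fp = X.T @ F @ X
        eps, Cp = torch.linalg.eigh(Fp)
        C = X @ Cp
        # Density matrix from the occupied orbitals, Eq. (2) with N = 2 * I_occ
        D_new = 2.0 * C[:, :nocc] @ C[:, :nocc].T
        F_new = fock_from_dm(D_new)
        if torch.linalg.norm(F_new - F) < tol:    # self-consistency reached
            return F_new, D_new
        F, D = F_new, D_new
    return F, D
```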
An alternative approach to solving the self-consistent field equations is to directly find the orbitals that minimize the total energy E by the use of optimization algorithms.22 The energy E can be calculated from the density matrix D, which can be obtained in turn from the orbital coefficients C using Eq. (2). This relation makes the energy a function of the orbital matrix, E(C). However, as the orbitals must remain orthonormal, CᵀSC = I, we introduce a new variable Q defined in terms of its relation with C as
\[ \mathbf{Q} = \hat{\mathbf{Q}}\,\mathbf{R}, \tag{4} \]
\[ \mathbf{C} = \mathbf{S}^{-1/2}\,\hat{\mathbf{Q}}. \tag{5} \]
Equation (4) is the QR decomposition of the matrix Q into an orthogonal matrix Q̂ and an upper triangular matrix R. Equation (5) involves the inverse square root of the overlap matrix S, which can be computed using the eigendecomposition of S. The energy can then be parameterized by Q, reducing the direct minimization problem to an unconstrained optimization problem,
\[ \mathbf{Q}^{*} = \underset{\mathbf{Q}}{\arg\min}\; E(\mathbf{Q}). \tag{6} \]
The gradient of the energy with respect to Q, i.e., ∂E/∂Q, is required for an efficient solution. It is automatically computed by PyTorch and xitorch.

Once the optimum Q* in Eq. (6) is found, Q* can still be differentiated with respect to any other variables, such as the nuclear coordinates or the occupation number matrix N. This is made possible by the gradient of the optimization functional provided by xitorch.
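A minimal sketch of the QR parameterization of Eqs. (4)-(6) in plain PyTorch might look as follows. The energy function energy_from_coeffs is a hypothetical placeholder, and DQC performs the actual minimization through xitorch rather than the simple optimizer loop used here.

```python
import torch

def direct_minimize(energy_from_coeffs, S, n_orb, n_steps=500, lr=0.1):
    """Minimize E(Q) of Eq. (6): Q is unconstrained, while orthonormality of C
    is enforced through the QR decomposition (Eq. (4)) and the S^{-1/2} map (Eq. (5))."""
    # S^{-1/2} from the eigendecomposition of the overlap matrix
    s_eval, s_evec = torch.linalg.eigh(S)
    S_inv_sqrt = s_evec @ torch.diag(s_eval.rsqrt()) @ s_evec.T

    Q = torch.randn(S.shape[0], n_orb, requires_grad=True)
    opt = torch.optim.SGD([Q], lr=lr, momentum=0.9)   # gradient descent with momentum
    for _ in range(n_steps):
        opt.zero_grad()
        Q_hat, _ = torch.linalg.qr(Q)      # Eq. (4): orthogonal factor of Q
        C = S_inv_sqrt @ Q_hat             # Eq. (5): orbital coefficients with C^T S C = I
        energy = energy_from_coeffs(C)     # hypothetical callable returning E(C)
        energy.backward()                  # dE/dQ supplied by automatic differentiation
        opt.step()
    return Q.detach()
```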

We note that our approach of using QR decomposition is slightly different from common direct minimization techniques that use the matrix exponential of a skew-Hermitian matrix, such as the geometric direct minimization algorithm of Ref. 23.

PyTorch16 is a dynamic automatic differentiation library written in Python that provides gradients automatically without having to implement them explicitly. It works by constructing a computational graph and propagating it backward when calculating the gradient. The backward propagation of the computational graph follows the chain rule of differentiation. By default, PyTorch takes a dynamic approach, meaning that the computational graph is constructed as the forward calculation is performed.

For example, let us say we want to calculate a variable L from two other variables, a and b, with L = f1(a, b) + f2(a)f3(b) for some functions f1, f2, and f3. If the calculation of L is performed from a and b, PyTorch will construct the graph that consists of some operations, as shown in Fig. 1(a). To calculate the gradient, e.g., ∂L/∂a, the calculations are performed backward in the computational graph. It starts from ∂L/∂L and propagates through until it reaches the gradient that we want, as shown in Fig. 1(b).
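As a concrete illustration, the graph of Fig. 1 corresponds to only a few lines of PyTorch; the particular choices of f1, f2, and f3 below are arbitrary.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Forward pass builds the graph of Fig. 1(a): L = f1(a, b) + f2(a) * f3(b)
f1 = a * b           # arbitrary choices for f1, f2, f3
f2 = torch.sin(a)
f3 = torch.exp(b)
L = f1 + f2 * f3

# Backward pass of Fig. 1(b): propagate from dL/dL = 1 down to dL/da and dL/db
L.backward()
print(a.grad)   # dL/da = b + cos(a) * exp(b)
print(b.grad)   # dL/db = a + sin(a) * exp(b)
```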

FIG. 1.

The computational graph of calculation L = f1(a, b) + f2(a)f3(b). (a) The computational graph for the forward calculation. Note that t1–t4 are the temporary variables in this case. (b) The backward gradient calculation from the computational graph. For the backward calculation, it starts from ∂L/∂L = 1 at the top and ends at the bottom. Merging arrows in this backward computational graph means that the gradients are accumulated.


As the backward calculation passes through an operation, the backward rule of that operation is applied. For example, if the operation is t4 = t2 · t3 (i.e., the * node in Fig. 1), then the backward operations are ∂L/∂t2 = (∂L/∂t4) t3 and ∂L/∂t3 = (∂L/∂t4) t2. If all the operations in a calculation have backward rules defined, then one can back-propagate the graph to calculate the gradients.

PyTorch provides the backward calculations for commonly used operations, such as multiplication, addition, and matrix multiplication. However, backward calculations for functionals (such as root finding) are not available in PyTorch. This is where xitorch17 comes in. xitorch provides backward calculations of functionals, i.e., functions that require other functions as their inputs.

One example that is used in writing DQC is equilibrium finding, i.e., find x such that
\[ \mathbf{x} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}), \tag{7} \]
where f is a function and θ is a parameter of that function. If x is used to calculate a value L and ∂L/∂x is available from another calculation, then the gradient of L with respect to the parameter θ is given by the adjoint method,24
\[ \frac{\partial L}{\partial \boldsymbol{\theta}} = \frac{\partial L}{\partial \mathbf{x}} \left( \mathbf{I} - \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right)^{-1} \frac{\partial \mathbf{f}}{\partial \boldsymbol{\theta}}. \tag{8} \]

Using the adjoint method reduces the memory requirements of the gradient computation compared to direct backpropagation through the equilibrium-finder iterations, as done in Ref. 13: the latter method needs to save the temporary values calculated in each iteration, while the former requires no such temporary variables.

The gradient calculation above is provided by xitorch. Besides equilibrium finding, xitorch also provides gradient calculations for other functionals, such as root finding, optimization, and initial value problem solvers.
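To illustrate how Eq. (8) avoids storing the iteration history, the sketch below evaluates the adjoint expression directly for a generic fixed point. It is only an illustration of the formula, not xitorch's implementation.

```python
import torch

def equilibrium_grad(f, x0, theta, dL_dx, n_iter=200):
    """Evaluate Eq. (8) for a fixed point x = f(x, theta).

    f: any differentiable PyTorch function of (x, theta);
    dL_dx: gradient of the loss with respect to x at the solution."""
    # Forward pass: plain fixed-point iteration, no graph is recorded for the iterates
    with torch.no_grad():
        x = x0.clone()
        for _ in range(n_iter):
            x = f(x, theta)

    # Jacobians df/dx and df/dtheta evaluated at the solution only
    J_x = torch.autograd.functional.jacobian(lambda z: f(z, theta), x)
    J_t = torch.autograd.functional.jacobian(lambda t: f(x, t), theta)

    # Eq. (8): dL/dtheta = dL/dx (I - df/dx)^{-1} df/dtheta
    I = torch.eye(x.numel(), dtype=x.dtype)
    v = torch.linalg.solve((I - J_x).T, dL_dx)    # v = (I - df/dx)^{-T} dL/dx
    return v @ J_t
```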

DQC is implemented mainly using PyTorch as the automatic differentiation package and xitorch for differentiating through functionals. Besides those two general-purpose libraries, DQC also uses libxc25 to include exchange–correlation functionals from the literature and libcint26 for Gaussian-basis integrals. Those libraries are wrapped with PyTorch to provide the automatic differentiation capability; the details can be found in Ref. 14.

DQC implements restricted and unrestricted HF and Kohn–Sham DFT calculations without periodic boundary conditions. The energy can be minimized either using SCF iterations with Broyden’s good method27 for Fock matrix updates or with direct minimization using gradient descent with momentum;28 more elaborate SCF convergence accelerators will be implemented at a later stage of the project. All parameters are differentiable with respect to any other parameters present in the calculation, including the nuclear coordinates, atomic numbers, and electron occupation numbers, as well as the basis set exponents and contraction coefficients. The differentiability of these parameters allows for the exploration of new applications with DQC, a few of which are presented in what follows.
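As a usage sketch, a nuclear gradient could be requested roughly as follows. The call names mirror the examples in the DQC repository and should be checked against the current code rather than taken as a fixed interface.

```python
import torch
import dqc

# Hydrogen molecule; positions carry requires_grad so that the nuclear gradient is available
atomzs = torch.tensor([1, 1])
atomposs = torch.tensor([[0.0, 0.0, 0.0],
                         [1.4, 0.0, 0.0]], dtype=torch.float64, requires_grad=True)

mol = dqc.Mol((atomzs, atomposs), basis="cc-pvdz")
qc = dqc.HF(mol).run()            # SCF via the differentiable fixed-point solver
ene = qc.energy()

# Nuclear gradient (negative force) from automatic differentiation
grad_pos, = torch.autograd.grad(ene, atomposs)
```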

Although the execution speed is not a top priority for DQC, the program shows good overall performance. Table I gives the comparison of the running times of DQC and PySCF,29 an established quantum chemistry code. For small systems, we find that DQC is as efficient as PySCF. However, for moderate-sized molecules (e.g., C4H5N), DQC is about 4–6 times slower than PySCF. This slowdown can be attributed to DQC currently not taking advantage of the symmetry of the two-electron integral tensor, which can reduce the tensor size by roughly a factor of 8 when the basis functions are real-valued. In contrast, DQC is only about 50% slower than PySCF for systems with density fitting. The difference is mainly caused by the higher number of algebraic operations in DQC than in PySCF, as DQC has not been primarily optimized for speed. Speed optimization of DQC is the subject of future work.

TABLE I.

Execution speed comparison between DQC (SCF iterations) and PySCF.

Cases                                 DQC      PySCF
H2O (HF/cc-pVDZ)                      96 ms    245 ms
H2O (PW92/cc-pVDZ)                    530 ms   430 ms
C4H5N (HF/cc-pVTZ)                    108 s    17 s
C4H5N (PW92/cc-pVTZ)                  101 s    25 s
C4H5N (density fit PW92/cc-pVTZ)      30 s     22 s
C6H8O6 (density fit PW92/cc-pVDZ)     87 s     57 s

The automatic availability of gradients makes it easy to implement the direct minimization method in DQC. Direct minimization is known to be more robust than SCF iterations in finding a converged solution19 and is particularly useful in difficult cases, such as for molecules near their dissociation limits. We illustrate this in Fig. 2, where we show the PW92/cc-pVDZ30,31 total energy of H2+ as a function of the internuclear distance. The SCF method does not converge for internuclear distances greater than around 17 bohrs. In contrast, the direct minimization method continues to converge even for substantially larger distances in excess of 40 bohrs.

FIG. 2.

Unrestricted PW92/cc-pVDZ30,31 energy for the hydrogen molecule cation H2+ as a function of the internuclear distance calculated via the SCF (using Broyden’s method27) and direct minimization approaches. Note that the SCF iterations do not converge for internuclear distances larger than 17 bohrs.


One difficulty in SCF calculations is that the calculation can converge onto a saddle point, which corresponds to an excited state.19 If the calculation has converged onto a local minimum, the Hessian of the energy with respect to orbital rotations must be positive semidefinite. Saddle point solutions, instead, correspond to one or more negative eigenvalues of the orbital Hessian. To maintain the orthonormality of the orbitals, the Hessian is calculated based on the Q variable. Conveniently, we do not need to derive the expression for the Hessian matrix ∂²E/∂Q², as it is automatically obtained by PyTorch.

Moreover, only the lowest eigenvalue needs to be calculated for the stability check, so the full Hessian matrix does not need to be constructed. We obtain the lowest eigenvalue using Davidson's iterative algorithm,32 as implemented in xitorch. Note that the gradients of the lowest eigenvalue with respect to all other parameters in the calculation are automatically available, which may prove useful in future applications.
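A matrix-free version of this stability check can be sketched with Hessian-vector products. The energy function energy_of_q below is a hypothetical stand-in for E(Q), and SciPy's Lanczos solver is used here in place of the Davidson routine provided by xitorch.

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def lowest_hessian_eigenvalue(energy_of_q, q_star):
    """Lowest eigenvalue of d^2E/dQ^2 at a stationary point q_star, using only
    Hessian-vector products (the full Hessian is never formed)."""
    q_flat = q_star.detach().reshape(-1)
    n = q_flat.numel()

    def hvp(v_np):
        v = torch.from_numpy(np.asarray(v_np, dtype=np.float64).ravel()).to(q_flat.dtype)
        q = q_flat.clone().requires_grad_(True)
        energy = energy_of_q(q.reshape(q_star.shape))
        grad, = torch.autograd.grad(energy, q, create_graph=True)
        hv, = torch.autograd.grad(grad, q, grad_outputs=v)   # Hessian-vector product
        return hv.detach().numpy()

    H = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
    lowest, _ = eigsh(H, k=1, which="SA")     # smallest algebraic eigenvalue
    return float(lowest[0])
```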

We list some examples of SCF stability checks for HF/pc-133 calculations on diatomic molecules in Table II. As can be seen from the data, the lowest eigenvalues of the orbital Hessian are very close to zero for the ground states, while the excited states yield large negative eigenvalues. This illustrates that it is straightforward to check with DQC whether or not a state corresponds to a true minimum.

TABLE II.

Lowest eigenvalues of the HF/pc-1 orbital Hessian matrix. All the calculations correspond to the stationary points of the HF energy.

Molecules       Distance (bohrs)    Energy (Eh)    Lowest eigenvalue (Eh)
O2 (ground)     2.0                 −149.54        −6 × 10⁻¹⁴
O2 (excited)    2.0                 −149.17        −0.73
BeH (ground)    2.5                 −15.14         −4 × 10⁻¹⁴
BeH (excited)   2.5                 −15.01         −0.25
CH (ground)     2.0                 −38.259        −5 × 10⁻⁸
CH (excited)    2.0                 −38.256        −0.07

A key advantage of writing quantum chemistry software with automatic differentiation is that the calculations of molecular properties can be implemented efficiently: all we need to know is how a specific property relates to some derivative expressions. For example, vibrational modes and frequencies of a molecule can be written as
\[ \mathbf{q}_{\mathrm{vib}},\, \omega_{\mathrm{vib}} = \mathrm{eig}\!\left(\frac{\partial^{2} E}{\partial \mathbf{X}^{2}}\right), \tag{9} \]
where X is an n × 3 matrix containing the nuclear coordinates of the n atoms, eig is the eigendecomposition, qvib is one of the vibrational modes, and ωvib is its frequency. The expression shown in Eq. (9) is all that is needed to calculate the vibrational characteristics of the molecule; the explicit form of the derivative expression is not required.
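In code, Eq. (9) amounts to a Hessian evaluation followed by an eigendecomposition. The sketch below assumes a hypothetical differentiable energy function energy_of_coords and omits the mass weighting that a full vibrational analysis would include.

```python
import torch

def vibrational_analysis(energy_of_coords, coords):
    """Eigendecomposition of d^2E/dX^2 as in Eq. (9).

    energy_of_coords: hypothetical callable mapping an (n, 3) coordinate tensor
    to the total energy; coords: equilibrium nuclear coordinates.
    (Mass weighting is omitted in this sketch.)"""
    flat = coords.detach().reshape(-1)
    energy_flat = lambda x: energy_of_coords(x.reshape(coords.shape))
    hess = torch.autograd.functional.hessian(energy_flat, flat)   # (3n, 3n) matrix
    eigvals, eigvecs = torch.linalg.eigh(hess)
    return eigvals, eigvecs    # frequencies and normal modes follow from these
```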
Another useful example is the calculation of the electric dipole and quadrupole moments of a molecule. The electric dipole moment is given by
\[ \boldsymbol{\mu} = \frac{\partial E}{\partial \boldsymbol{\mathcal{E}}} + \sum_{i} Z_{i}\,\mathbf{x}_{i}, \tag{10} \]
while the electric quadrupole tensor is given by
\[ \mathbf{M} = \frac{\partial^{2} E}{\partial \boldsymbol{\mathcal{E}}^{2}} + \sum_{i} Z_{i}\,\mathbf{x}_{i}\mathbf{x}_{i}^{\mathrm{T}}. \tag{11} \]
In both expressions, E is the total energy of the molecule, ℰ is the electric field, Zi is the atomic number of the ith atom, and xi are its coordinates.
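In practice, Eq. (10) only requires that the electric field enter the energy expression as a differentiable tensor. A minimal sketch, with a hypothetical field-dependent energy function energy_with_field:

```python
import torch

def dipole_moment(energy_with_field, atomzs, atomposs):
    """Electric dipole moment from Eq. (10): mu = dE/dfield + sum_i Z_i x_i.

    energy_with_field: hypothetical callable returning the total energy for a
    given uniform electric field (a length-3 tensor)."""
    efield = torch.zeros(3, dtype=atomposs.dtype, requires_grad=True)
    energy = energy_with_field(efield)
    dE_dfield, = torch.autograd.grad(energy, efield)
    nuclear_part = (atomzs.unsqueeze(-1) * atomposs).sum(dim=0)
    return dE_dfield + nuclear_part
```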
Having these vibrational and electric multipole properties readily available makes obtaining the Raman vibrational spectrum straightforward. For example, the intensity of the infrared vibrational transition for a normal mode q at a frequency ω is given by
\[ I_{\mathrm{IR}} = \sum_{i} \left( \sum_{jk} \frac{\partial \mu_{i}}{\partial X_{jk}}\, q_{jk} \right)^{2}, \tag{12} \]
where μ is the electric dipole moment and X is the matrix of atomic coordinates.
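Equation (12) can likewise be evaluated with nested automatic differentiation, as the sketch below illustrates; dipole_fn stands for a hypothetical differentiable map from the nuclear coordinates to the dipole moment of Eq. (10).

```python
import torch

def ir_intensity(dipole_fn, coords, mode):
    """Infrared intensity of a normal mode, following Eq. (12).

    dipole_fn: hypothetical callable mapping the coordinates to the dipole vector;
    mode: normal-mode displacement with the same shape as coords."""
    coords = coords.detach().requires_grad_(True)
    mu = dipole_fn(coords)                      # differentiable dipole moment

    intensity = torch.zeros((), dtype=coords.dtype)
    for i in range(3):
        # dmu_i/dX, obtained by one vector-Jacobian product per Cartesian component
        dmu_i, = torch.autograd.grad(mu[i], coords, retain_graph=True)
        intensity = intensity + ((dmu_i * mode).sum()) ** 2
    return intensity
```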
Similarly, the Raman activity at the same frequency and normal mode is proportional to35 
\[ I_{\mathrm{Raman}} = 5\,[\mathrm{tr}(\boldsymbol{\alpha})]^{2} + 7\gamma, \tag{13} \]
\[ \alpha_{ij} = \sum_{kl} \frac{\partial^{2} \mu_{i}}{\partial \mathcal{E}_{j}\, \partial X_{kl}}\, q_{kl}, \tag{14} \]
\[ \gamma = \sum_{ij} (1 - \delta_{ij}) \left[ \frac{1}{4}\left(\alpha_{ii} - \alpha_{jj}\right)^{2} + \frac{3}{2}\alpha_{ij}^{2} \right], \tag{15} \]
where δij is the Kronecker delta.

The HF/cc-pVDZ perturbation properties of H2O, found using the expressions above, are displayed in Table III. The values are in excellent agreement with the data from the computational chemistry comparison and benchmark database (CCCBDB),34 even though DQC does not have any explicit code to calculate the gradients required for these properties.

TABLE III.

Perturbative properties of H2O from HF/cc-pVDZ calculations. The middle column presents the values calculated in DQC, while the last column shows the CCCBDB values.34 The IR intensities and Raman activities are presented at the frequency of 1800 cm⁻¹.

Properties                    DQC       CCCBDB
IR intensities (km/mol)       80.69     80.70
Raman activities (Å⁴/amu)     4.79      4.79
Dipole (D)                    −2.044    −2.044
Quadrupole xx (D Å)           −7.008    −7.008

The differentiability of DQC with respect to the basis set parameters enables the optimization of system-specific basis sets. Here, we show this capability by optimizing a basis set for hydrocarbons within Kohn–Sham DFT using the PW92 functional. We reoptimized the cc-pVDZ basis31 for a training set of molecules consisting of CH (methylidyne), CH3 (methyl radical), CH4 (methane), C2H2 (acetylene), and C2H4 (ethylene). The accuracy of the reoptimized basis was then tested on a set of hydrocarbons that were not included in the training set. The results are shown in Fig. 3. The reoptimization of the cc-pVDZ basis set leads to a marked decrease in the total energies of all the molecules in the test set, as the cc-pVXZ basis set series is designed to capture correlation energies instead of polarization energies31 and as hydrocarbons are chemically similar.
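Such a reoptimization can be sketched as an ordinary PyTorch training loop over the basis parameters; energy_of_molecule and the training molecules below are hypothetical placeholders for DQC's own machinery.

```python
import torch

def reoptimize_basis(energy_of_molecule, molecules, exponents, coeffs,
                     n_steps=200, lr=1e-2):
    """Minimize the summed total energies of a training set with respect to the
    basis-set exponents and contraction coefficients.

    energy_of_molecule: hypothetical callable (molecule, exponents, coeffs) -> energy."""
    exponents = exponents.clone().requires_grad_(True)
    coeffs = coeffs.clone().requires_grad_(True)
    opt = torch.optim.Adam([exponents, coeffs], lr=lr)

    for _ in range(n_steps):
        opt.zero_grad()
        loss = sum(energy_of_molecule(m, exponents, coeffs) for m in molecules)
        loss.backward()     # gradients flow through the SCF solve via xitorch
        opt.step()
    return exponents.detach(), coeffs.detach()
```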

FIG. 3.

Basis set truncation errors for the cc-pVDZ and cc-pVTZ basis sets for a range of hydrocarbons, compared with the truncation error for a reoptimized cc-pVDZ basis set. The reference energies are computed in the cc-pV5Z basis set. None of the molecules shown in the figure were used in the basis set optimization.


One of the differentiable quantities in DQC is the atomic number, allowing us to perform alchemical perturbation studies to predict the properties of molecules without actually needing to simulate them.36,37 As a simple example, we will show here that some properties of diatomic molecules CO (atomic numbers 6 and 8) and BF (atomic numbers 5 and 9) can be estimated directly from the properties of N2 (atomic number 7) and its alchemical perturbations. To do this, we parameterize the atomic number of the atoms in the diatomic molecule by a continuous variable λ so that the atoms have atomic numbers Zl = 7 + λ and Zr = 7 − λ. The molecules N2, CO, and BF, thus, correspond to λ = 0, λ = 1, and λ = 2, respectively.

The properties we aim to predict are the bond length s* and the energy E* at the equilibrium position, which can be mathematically expressed as
\[ s^{*}(\lambda) = \underset{s}{\arg\min}\; E(s, \lambda), \tag{16} \]
\[ E^{*}(\lambda) = E(s^{*}, \lambda). \tag{17} \]
Performing HF/pc-1 calculations, we evaluate the equilibrium distance and the energy at the equilibrium position in two ways. The first way is to optimize the geometry for various fixed values of λ and calculate the above properties directly. The calculations were performed separately but with the same basis (pc-1 nitrogen basis on all atoms) for different molecules, in analogy to the procedure used in Refs. 36 and 37. The second way is to estimate those properties using a Taylor series expansion to second order in λ,
\[ s^{*}(\lambda) \approx s^{*}(0) + \lambda\,\frac{\partial s^{*}}{\partial \lambda}(0) + \frac{1}{2}\lambda^{2}\,\frac{\partial^{2} s^{*}}{\partial \lambda^{2}}(0), \tag{18} \]
\[ E^{*}(\lambda) \approx E^{*}(0) + \lambda\,\frac{\partial E^{*}}{\partial \lambda}(0) + \frac{1}{2}\lambda^{2}\,\frac{\partial^{2} E^{*}}{\partial \lambda^{2}}(0). \tag{19} \]
As s* and E* are calculated at equilibrium, calculating the perturbation terms requires propagating the gradient through the optimization process. However, automatic differentiation makes the propagation trivial, as it is automatically handled by the optimization routine in xitorch.
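A minimal sketch of the second-order expansion of Eq. (19), treating the equilibrium energy as a differentiable function of λ; equilibrium_energy is a hypothetical stand-in for a geometry optimization whose gradient is propagated by xitorch.

```python
import torch

def taylor_estimate(equilibrium_energy, lam_target):
    """Second-order Taylor estimate of E*(lambda) around lambda = 0, Eq. (19).

    equilibrium_energy: hypothetical differentiable callable lambda -> E*(lambda)."""
    lam = torch.zeros((), dtype=torch.float64, requires_grad=True)
    e0 = equilibrium_energy(lam)
    de, = torch.autograd.grad(e0, lam, create_graph=True)   # dE*/dlambda at 0
    d2e, = torch.autograd.grad(de, lam)                      # d^2E*/dlambda^2 at 0
    return e0.detach() + lam_target * de.detach() + 0.5 * lam_target**2 * d2e.detach()
```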

The results obtained via these two approaches are compared in Fig. 4. As we can see from these results, the properties of CO and BF can be estimated accurately from alchemical perturbations of N2. The estimated equilibrium distances for CO and BF differ by −0.0004 and 0.024 bohr, respectively, while the estimated equilibrium energies deviate by 33 and 587 mEh, respectively. This shows that the equilibrium position of new molecules can be estimated well with the alchemical gradient calculated by DQC, without actually having to calculate those molecules.

FIG. 4.

Comparison of the properties of CO and BF obtained via an exact calculation and an estimation from the second-order gradients via alchemical perturbations, employing the nitrogen pc-1 basis on all atoms.


Implementing quantum chemistry with automatic differentiation libraries is a promising way to accelerate simulation workflows and to enable novel applications. However, the implementation comes with several challenges. An overarching challenge is that the automatic differentiation library used here, PyTorch, is primarily designed for deep learning rather than for scientific computing. As deep learning only focuses on low-order derivatives, accessing high-order gradients that are commonly required for scientific applications can be difficult.

Detecting numerical instabilities in high-order gradient calculations can also be demanding. Instabilities that produce NaN (not-a-number) values are relatively straightforward to manage with the debugging features PyTorch has provided since version 1.8, but instabilities that do not produce NaN can be challenging to detect.

Another challenge is debugging and profiling higher-order gradient calculations. As the gradient is generated automatically, it is hard to find the slowest part of the code or the part that requires the most memory because this information is not readily provided by the available profiling tools.

In addition to higher-order gradient calculations, using automatic differentiation libraries also poses unique challenges in terms of code optimization. For example, quantum chemistry codes usually store the two-electron integrals on disk due to their large size and process them only in blocks small enough to fit easily into memory. This scheme can only be used in DQC if the gradients with respect to the nuclear positions and the basis are not required. To the best of the authors' knowledge, there is currently no obvious mechanism in PyTorch that allows gradient-propagating operations to work with large tensors stored on disk.

Other automatic differentiation libraries developed for deep learning, such as TensorFlow38 or JAX,39 could also be used to implement differentiable quantum chemistry. As those libraries have matured over the years, their feature sets have become similar, so the choice mainly comes down to the developers' preference, for instance, which library they are most familiar with.

Implementing quantum chemical calculations using automatic differentiation liberates us from needing to derive analytical gradient expressions. With gradients automatically generated by the program, software developers can focus on designing better and more detailed computational models and applying them to problems at hand. We have shown how automatic differentiation allows us to easily explore various applications in quantum chemistry and are confident that further exploration of this approach will unveil new applications that have not been considered to date.

M.F.K. and S.M.V. acknowledge support from the UK EPSRC (Grant No. EP/P015794/1) and the Royal Society. S.M.V. is a Royal Society University Research Fellow.

The authors have no conflicts to disclose.

The Differentiable Quantum Chemistry (DQC) code can be found at https://github.com/diffqc/dqc/. The repository that contains the applications presented in this paper is located at https://github.com/diffqc/dqc-apps/.

1. S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning (PMLR, 2015), pp. 448–456.
2. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Curran Associates, Inc., 2017), Vol. 30, https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
3. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, "SchNet – A deep learning architecture for molecules and materials," J. Chem. Phys. 148(24), 241722 (2018).
4. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.
5. H.-J. Liao, J.-G. Liu, L. Wang, and T. Xiang, "Differentiable programming tensor networks," Phys. Rev. X 9(3), 031041 (2019).
6. C. Schenck and D. Fox, "SPNets: Differentiable fluid dynamics for deep neural networks," in Conference on Robot Learning (PMLR, 2018), pp. 317–335.
7. S. Schoenholz and E. D. Cubuk, "JAX MD: A framework for differentiable physics," J. Stat. Mech. 2021, 124016.
8. T. Tamayo-Mendoza, C. Kreisbeck, R. Lindh, and A. Aspuru-Guzik, "Automatic differentiation in quantum chemistry with applications to fully variational Hartree–Fock," ACS Cent. Sci. 4(5), 559–566 (2018).
9. U. Ekström, L. Visscher, R. Bast, A. J. Thorvaldsen, and K. Ruud, "Arbitrary-order density functional response theory from automatic differentiation," J. Chem. Theory Comput. 6(7), 1971–1980 (2010).
10. P. Hohenberg and W. Kohn, "Inhomogeneous electron gas," Phys. Rev. 136(3B), B864 (1964).
11. W. Kohn and L. J. Sham, "Self-consistent equations including exchange and correlation effects," Phys. Rev. 140(4A), A1133 (1965).
12. A. S. Abbott, B. Z. Abbott, J. M. Turney, and H. F. Schaefer III, "Arbitrary-order derivatives of quantum chemical methods via automatic differentiation," J. Phys. Chem. Lett. 12(12), 3232–3239 (2021).
13. L. Li, S. Hoyer, R. Pederson, R. Sun, E. D. Cubuk, P. Riley, and K. Burke, "Kohn-Sham equations as regularizer: Building prior knowledge into machine-learned physics," Phys. Rev. Lett. 126(3), 036401 (2021).
14. M. F. Kasim and S. M. Vinko, "Learning the exchange-correlation functional from nature with fully differentiable density functional theory," Phys. Rev. Lett. 127, 126403 (2021); arXiv:2102.04229.
15. D. R. Hartree, "The wave mechanics of an atom with a non-Coulomb central field. Part II. Some results and discussion," in Mathematical Proceedings of the Cambridge Philosophical Society (Cambridge University Press, 1928), Vol. 24, pp. 111–132.
16. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), Vol. 32, pp. 8026–8037.
17. M. F. Kasim and S. M. Vinko, "ξ-torch: Differentiable scientific computing library," arXiv:2010.01921 (2020).
18. C. C. J. Roothaan, "New developments in molecular orbital theory," Rev. Mod. Phys. 23(2), 69 (1951).
19. S. Lehtola, F. Blockhuys, and C. Van Alsenoy, "An overview of self-consistent field calculations within finite basis sets," Molecules 25(5), 1218 (2020).
20. P.-O. Löwdin, "Quantum theory of cohesive properties of solids," Adv. Phys. 5(17), 1–171 (1956).
21. S. Lehtola, "Curing basis set overcompleteness with pivoted Cholesky decompositions," J. Chem. Phys. 151(24), 241102 (2019).
22. M. Head-Gordon and J. A. Pople, "Optimization of wave function and geometry in the finite basis Hartree–Fock method," J. Phys. Chem. 92(11), 3063–3069 (1988).
23. T. Van Voorhis and M. Head-Gordon, "A geometric approach to direct minimization," Mol. Phys. 100(11), 1713–1721 (2002).
24. S. Bai, J. Z. Kolter, and V. Koltun, "Deep equilibrium models," in Advances in Neural Information Processing Systems, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), Vol. 32, https://proceedings.neurips.cc/paper/2019/file/01386bd6d8e091c2ab4c7c7de644d37b-Paper.pdf.
25. S. Lehtola, C. Steigemann, M. J. T. Oliveira, and M. A. L. Marques, "Recent developments in libxc—A comprehensive library of functionals for density functional theory," SoftwareX 7, 1–5 (2018).
26. Q. Sun, "Libcint: An efficient general integral library for Gaussian basis functions," J. Comput. Chem. 36(22), 1664–1671 (2015).
27. C. G. Broyden, "A class of methods for solving nonlinear simultaneous equations," Math. Comput. 19(92), 577–593 (1965).
28. B. A. Pearlmutter, "Gradient descent: Second-order momentum and saturating error," 4, 887–894 (1992).
29. Q. Sun, X. Zhang, S. Banerjee, P. Bao, M. Barbry, N. S. Blunt, N. A. Bogdanov, G. H. Booth, J. Chen, Z.-H. Cui et al., "Recent developments in the PySCF program package," J. Chem. Phys. 153(2), 024109 (2020).
30. J. P. Perdew and Y. Wang, "Accurate and simple analytic representation of the electron-gas correlation energy," Phys. Rev. B 45(23), 13244 (1992).
31. T. H. Dunning, Jr., "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen," J. Chem. Phys. 90(2), 1007–1023 (1989).
32. E. R. Davidson, "The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real symmetric matrices," J. Comput. Phys. 17(1), 87–94 (1975).
33. F. Jensen, "Polarization consistent basis sets: Principles," J. Chem. Phys. 115, 9113–9125 (2001).
34. NIST Computational Chemistry Comparison and Benchmark Database, NIST Standard Reference Database Number 101, edited by R. D. Johnson III (Release on 21 August 2020), National Institute of Standards and Technology, Gaithersburg, MD, 2020, available at https://dx.doi.org/10.18434/T47C7Z.
35. D. P. O'Neill, M. Kállay, and J. Gauss, "Analytic evaluation of Raman intensities in coupled-cluster theory," Mol. Phys. 105(19–22), 2447–2453 (2007).
36. R. Balawender, M. Lesiuk, F. De Proft, C. Van Alsenoy, and P. Geerlings, "Exploring chemical space with alchemical derivatives: Alchemical transformations of H through Ar and their ions as a proof of concept," Phys. Chem. Chem. Phys. 21(43), 23865–23879 (2019).
37. G. F. von Rudorff and O. A. von Lilienfeld, "Alchemical perturbation density functional theory," Phys. Rev. Res. 2(2), 023220 (2020).
38. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (USENIX Association, 2016), pp. 265–283.
39. R. Frostig, M. J. Johnson, and C. Leary, "Compiling machine learning programs via high-level tracing," in Systems for Machine Learning (SysML, 2018).
Published open access through an agreement with University of Oxford