Automatic differentiation represents a paradigm shift in scientific programming, where evaluating both functions and their derivatives is required for most applications. By removing the need to explicitly derive expressions for gradients, development times can be shortened and calculations can be simplified. For these reasons, automatic differentiation has fueled the rapid growth of a variety of sophisticated machine learning techniques over the past decade, but is now also increasingly showing its value to support *ab initio* simulations of quantum systems and enhance computational quantum chemistry. Here, we present an open-source differentiable quantum chemistry simulation code and explore applications facilitated by automatic differentiation: (1) calculating molecular perturbation properties, (2) reoptimizing a basis set for hydrocarbons, (3) checking the stability of self-consistent field wave functions, and (4) predicting molecular properties via alchemical perturbations.

## I. INTRODUCTION

Automatic differentiation is a collection of techniques used to evaluate, up to machine precision, the derivative of a function specified by a computer program. It allows software developers to focus solely on designing the best model for a given problem, without having to worry about implementing any derivatives of the model with respect to its various mathematical parameters. It has already had a transformative effect in machine learning, enabling the development of many new techniques over the past decade, such as batch normalization,^{1} attention layers,^{2} and unique neural network architectures.^{3,4}

Automatic differentiation is still relatively new in the context of computational sciences but is already showing promise across a diverse set of applications, including tensor networks,^{5} computational fluid dynamics,^{6} and molecular dynamics simulations.^{7} Automatic differentiation is also growing increasingly popular in quantum chemistry, where it has been used to optimize molecular basis sets,^{8} to calculate higher derivatives of various exchange–correlation (xc) functionals^{9} of density functional theory (DFT),^{10,11} and to determine arbitrary-order nuclear coordinate derivatives of electronic energies.^{12}

Automatic differentiation is also an essential stepping stone to enable direct integration of quantum chemistry methods with machine learning models and their training. In this context, a differentiable implementation of DFT was recently used to learn the xc functional^{13} from accurate reference calculations within the density matrix renormalization group approach or from a mixture of computational and experimental data,^{14} showing a promising new approach to developing transferable and robust xc functionals via deep learning.

Although differentiation in quantum chemistry can be done via finite-difference schemes, calculating the gradient of an output quantity with respect to a set of input parameters is then time consuming, as the simulation must be run as many times as there are input parameters. Moreover, finite-difference methods are prone to numerical instabilities and are very sensitive to the chosen step size. Automatic differentiation addresses these challenges by calculating the analytical gradient via the chain rule, eliminating both the need to run the simulation many times and the need for step-size tuning.
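The step-size sensitivity can be seen in a minimal sketch (a toy function, not a quantum chemistry simulation; all names here are illustrative): the finite-difference error first shrinks as the step *h* decreases, then grows again once floating-point rounding dominates, whereas an analytical gradient has no such trade-off.

```python
import math

def f(x):
    # Toy stand-in for a simulation output; the analytical derivative is known.
    return math.sin(x) ** 2

def central_diff(func, x, h):
    # Second-order central finite difference.
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 1.0
exact = math.sin(2.0 * x0)  # d/dx sin^2(x) = sin(2x)

for h in (1e-2, 1e-5, 1e-13):
    approx = central_diff(f, x0, h)
    print(f"h = {h:.0e}  error = {abs(approx - exact):.2e}")
```

The truncation error scales as *h*², but the rounding error scales as 1/*h*, so an overly small step makes the estimate worse, which is exactly the tuning problem automatic differentiation avoids.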

To address the growing need for automatic differentiation in quantum chemistry, we introduce Differentiable Quantum Chemistry (DQC), a DFT and Hartree–Fock (HF)^{15} simulation code. DQC is written in Python using PyTorch^{16} and xitorch.^{17} While PyTorch provides gradient calculations for elementary operations, such as matrix multiplication and explicit eigendecomposition, xitorch provides gradient calculations for functional operations, such as root finding, optimization, and implicit eigendecomposition. The use of PyTorch and xitorch in DQC enables various applications in quantum chemistry, which would otherwise be far more difficult to pursue. For example, the work of Kasim and Vinko^{14} on learning the xc functional directly from a set of heterogeneous experimental data and calculated density profiles within the constraints of Kohn–Sham DFT already mentioned above was enabled by the use of DQC.

We begin this paper by outlining the basic theory behind DQC in Sec. II. The implementation is described in Sec. III. Applications that exemplify the present approach are given in Sec. IV. We discuss the challenges in the use of automatic differentiation techniques in computational science in Sec. V and summarize our work in Sec. VI.

## II. METHODS

Quantum chemical calculations of the electronic structure typically involve functional operations, such as root finding for self-consistent field (SCF) iterations and minimization for geometry optimization and for the direct minimization approach to the SCF problem. We discuss the handling of these operations in the context of DQC in this section.

### A. SCF iterations

In the SCF approach, the electronic structure problem is formulated as a generalized eigenvalue equation,^{18,19}

$$\mathbf{F}(\mathbf{D})\,\mathbf{C} = \mathbf{S}\mathbf{C}\mathbf{E}, \quad (1)$$

where **F**(**D**) is the Fock matrix that is a function of the density matrix **D**, **C** is the orbital matrix, **S** is the overlap matrix, and **E** is a diagonal matrix containing the orbital energies. The generalized eigenvalue equation in Eq. (1) can be reduced to the normal form by orthogonalizing the overlap matrix **S**^{20} and operating in the linearly independent basis spanned by the eigenvectors of **S** with large enough eigenvalues; any ill-conditioning in the overlap matrix can be removed by choosing a well-defined sub-basis with the help of a pivoted Cholesky decomposition.^{21}

The density matrix **D**, required to construct the Fock matrix **F**, can be obtained by

$$\mathbf{D} = \mathbf{C}\mathbf{N}\mathbf{C}^{\dagger}, \quad (2)$$

where **N** is a diagonal matrix containing the occupation numbers of the orbitals. As discussed in Ref. 19, the SCF procedure requires repeatedly solving Eqs. (1) and (2) until self-consistency is achieved.

The self-consistency requirement can be cast as an equilibrium problem, i.e., finding a vector **y** such that

$$\mathbf{y} = \mathbf{f}(\mathbf{y}, \boldsymbol{\theta}), \quad (3)$$

where **f** is a function that takes the vector **y** and other parameters ***θ*** and returns the expected value of the vector **y**. In DQC, the variable **y** is represented by the Fock matrix **F**, so the function **f** is the procedure that solves Eq. (1), calculates the density matrix from Eq. (2), and constructs the Fock matrix back from the density matrix. The parameters ***θ*** represent the other variables involved in the calculation, such as the overlap matrix **S** and the occupation number matrix **N**. The algorithm and gradient calculation for Eqs. (1) and (3) are already available in PyTorch and xitorch.

### B. Direct minimization

An alternative to SCF iterations is to minimize the energy directly with respect to the orbital parameters.^{22} The energy $E$ can be calculated from the density matrix **D**, which can be obtained in turn from the orbital coefficients **C** using Eq. (2). This relation makes the energy a function of the orbital matrix, $E(\mathbf{C})$. However, as the orbitals must remain orthonormal, $\mathbf{C}^{\dagger}\mathbf{S}\mathbf{C} = \mathbf{I}$, we introduce a new variable **Q** defined in terms of its relation with **C** as

$$\mathbf{C} = \mathbf{S}^{-1/2}\hat{\mathbf{Q}}, \quad (5)$$

where $\hat{\mathbf{Q}}$ is obtained from the QR decomposition $\mathbf{Q} = \hat{\mathbf{Q}}\mathbf{R}$, which splits **Q** into an orthogonal matrix $\hat{\mathbf{Q}}$ and an upper triangular matrix **R**. Equation (5) involves the inverse square root of the overlap matrix **S**, which can be computed using the eigendecomposition of **S**. The energy can then be parameterized by **Q**, reducing the direct minimization problem to an unbounded optimization problem,

$$\mathbf{Q}^{*} = \arg\min_{\mathbf{Q}} E(\mathbf{Q}). \quad (6)$$

The gradient of the energy with respect to **Q**, i.e., $\partial E/\partial \mathbf{Q}$, is required for an efficient solution; it is automatically computed by PyTorch and xitorch.

Once the optimum **Q*** in Eq. (6) is found, **Q*** can still be differentiated with respect to any other variables, such as the nuclear coordinates or the occupation number matrix **N**. This is made possible by the gradient of the optimization functional provided by xitorch.

We note that our approach of using QR decomposition is slightly different from common direct minimization techniques that use the matrix exponential of a skew-Hermitian matrix, such as the geometric direct minimization algorithm of Ref. 23.
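A toy sketch of why the QR-based parameterization removes the constraint (pure Python, a 2 × 2 example with a diagonal overlap matrix for simplicity; all matrices below are illustrative, not DQC internals): whatever the unconstrained variable **Q** is, the orbitals built from S^(-1/2) times its orthogonal QR factor satisfy orthonormality by construction.

```python
import math

# Diagonal, symmetric positive-definite overlap matrix (illustrative values);
# the same argument holds for a general SPD S with S^(-1/2) from its
# eigendecomposition.
S = [[2.0, 0.0], [0.0, 0.5]]
S_inv_sqrt = [[1.0 / math.sqrt(2.0), 0.0], [0.0, 1.0 / math.sqrt(0.5)]]

Q = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary unconstrained variable

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def qr_orthogonal(A):
    # Gram-Schmidt on the columns of a 2x2 matrix; returns the orthogonal factor.
    c0 = [A[0][0], A[1][0]]
    c1 = [A[0][1], A[1][1]]
    n0 = math.hypot(*c0)
    q0 = [c0[0] / n0, c0[1] / n0]
    proj = c1[0] * q0[0] + c1[1] * q0[1]
    u1 = [c1[0] - proj * q0[0], c1[1] - proj * q0[1]]
    n1 = math.hypot(*u1)
    q1 = [u1[0] / n1, u1[1] / n1]
    return [[q0[0], q1[0]], [q0[1], q1[1]]]

Q_hat = qr_orthogonal(Q)
C = matmul(S_inv_sqrt, Q_hat)                 # C = S^(-1/2) Q_hat
check = matmul(transpose(C), matmul(S, C))    # C^T S C should be the identity
print(check)
```

Because the orthonormality constraint is satisfied identically, any unconstrained optimizer can be applied to **Q** directly.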

### C. Automatic differentiation with PyTorch

PyTorch^{16} is a dynamic automatic differentiation library written in Python that provides gradients automatically without having to implement them explicitly. It works by constructing a computational graph and propagating the graph backward when calculating the gradient. The backward propagation of the computational graph follows the chain rule of differentiation. It uses a dynamical approach by default, which means that the computational graph is constructed when the forward calculation is performed.

For example, let us say we want to calculate a variable *L* from two other variables, *a* and *b*, with *L* = *f*_{1}(*a*, *b*) + *f*_{2}(*a*)*f*_{3}(*b*) for some functions *f*_{1}, *f*_{2}, and *f*_{3}. If the calculation of *L* is performed from *a* and *b*, PyTorch will construct the graph that consists of some operations, as shown in Fig. 1(a). To calculate the gradient, e.g., *∂L*/*∂a*, the calculations are performed backward in the computational graph. It starts from *∂L*/*∂L* and propagates through until it reaches the gradient that we want, as shown in Fig. 1(b).

As the backward pass traverses an operation in the graph, that operation's backward rule is evaluated. For example, if the operation is *t*_{4} = *t*_{2}*t*_{3} (i.e., the * node in Fig. 1), then the backward operation computes *∂L*/*∂t*_{2} = (*∂L*/*∂t*_{4})*t*_{3} and *∂L*/*∂t*_{3} = (*∂L*/*∂t*_{4})*t*_{2}. If every operation in a calculation has its backward rule defined, the graph can be back-propagated to obtain the gradients.
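The graph-and-backward-rule mechanism can be condensed into a minimal sketch (a hand-rolled scalar class, purely for illustration — PyTorch's actual implementation is far more general and traverses the graph in reverse topological order rather than by naive recursion):

```python
class Node:
    """Minimal reverse-mode autodiff scalar mimicking a dynamic graph."""

    def __init__(self, value, parents=()):
        self.value = value      # forward value
        self.parents = parents  # (parent_node, local_gradient) pairs
        self.grad = 0.0         # accumulated dL/d(self)

    def __add__(self, other):
        # Backward rule of '+': local gradient 1 to each operand.
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # Backward rule of '*': d(xy)/dx = y and d(xy)/dy = x.
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # Chain rule: accumulate dL/d(self), then pass contributions upstream.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

a, b = Node(2.0), Node(3.0)
# L = f1(a, b) + f2(a) * f3(b), here with f1 = a*b, f2 = a*a, f3 = b*b
L = a * b + (a * a) * (b * b)
L.backward()
print(a.grad, b.grad)  # dL/da = b + 2*a*b**2, dL/db = a + 2*a**2*b
```

With *a* = 2 and *b* = 3, the accumulated gradients match the analytical values dL/da = 39 and dL/db = 26, assembled purely from the per-operation backward rules.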

### D. xitorch

PyTorch provides the backward calculations for commonly used operations, such as multiplication, addition, and matrix multiplication. However, backward calculations for functionals (such as root finding) are not available in PyTorch. This is where xitorch^{17} comes in. xitorch provides backward calculations of functionals, i.e., functions that require other functions as their inputs.

As an example, consider an equilibrium problem of finding a vector **x** such that

$$\mathbf{x} = \mathbf{f}(\mathbf{x}, \theta), \quad (7)$$

where **f** is a function and *θ* is a parameter of that function. If **x** is used to calculate a value *L* and *∂L*/*∂***x** is available from another calculation, then the gradient of *L* with respect to the parameter *θ* is given by the adjoint method,^{24}

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial \mathbf{x}}\left(\mathbf{I} - \frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)^{-1}\frac{\partial \mathbf{f}}{\partial \theta}. \quad (8)$$

Using the adjoint method reduces the memory requirements of the gradient computation compared to direct backpropagation through the equilibrium-finder iterations, as done in Ref. 13: the latter must store the temporary values calculated in every iteration, while the former requires no such temporary variables.

The gradient calculation above is provided by xitorch. Besides equilibrium finding, xitorch also provides gradient calculations for other functionals, such as root finding, optimization, and initial value problem solvers.
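A scalar sketch of the adjoint gradient for an equilibrium problem, using an assumed toy map f(x, θ) = 0.5x + θ (illustrative, not DQC code): the fixed point is found by plain iteration, and the gradient of a downstream value L then follows from dL/dθ = (dL/dx)(1 − ∂f/∂x)⁻¹(∂f/∂θ), with no backpropagation through the iterations themselves.

```python
def f(x, theta):
    return 0.5 * x + theta  # fixed point: x* = 2*theta

def solve_equilibrium(theta, n_iter=200):
    # Plain fixed-point iteration; no intermediate values need to be stored
    # for the gradient, since the adjoint formula only uses the solution.
    x = 0.0
    for _ in range(n_iter):
        x = f(x, theta)
    return x

theta = 1.5
x_star = solve_equilibrium(theta)  # x* = 3.0
L = x_star ** 2                    # downstream value L = (x*)^2

dL_dx = 2.0 * x_star               # dL/dx, "available from another calculation"
df_dx, df_dtheta = 0.5, 1.0        # partials of f at the fixed point
dL_dtheta = dL_dx * (1.0 / (1.0 - df_dx)) * df_dtheta

print(dL_dtheta)  # analytic check: L(theta) = (2*theta)^2, dL/dtheta = 8*theta
```

With θ = 1.5, the adjoint result 12.0 agrees with the analytical derivative 8θ, illustrating why only the converged solution, not the iteration history, is needed.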

## III. IMPLEMENTATION

DQC is implemented mainly using PyTorch as the automatic differentiation package and xitorch for differentiating through functionals. Besides those two general purpose libraries, DQC also uses libxc^{25} for exchange–correlation functionals from the literature and libcint^{26} for Gaussian-basis integrals. Those libraries are wrapped with PyTorch to provide automatic differentiation capability; the details can be found in Ref. 14.

DQC implements restricted and unrestricted HF and Kohn–Sham DFT calculations without periodic boundary conditions. The energy can be minimized either using SCF iterations with Broyden’s good method^{27} for Fock matrix updates or with direct minimization using gradient descent with momentum;^{28} more elaborate SCF convergence accelerators will be implemented at a later stage of the project. All parameters are differentiable with respect to any other parameters present in the calculation, including the nuclear coordinates, atomic numbers, and electron occupation numbers, as well as the basis set exponents and contraction coefficients. The differentiability of these parameters allows for the exploration of new applications with DQC, a few of which are presented in what follows.

Although execution speed is not a top priority for DQC, the program shows good overall performance. Table I compares the running times of DQC and PySCF,^{29} an established quantum chemistry code. For small systems, DQC is as efficient as PySCF. For moderate-sized molecules (e.g., C_{4}H_{5}N), however, DQC is about 4–6 times slower than PySCF. This slowdown can be attributed to DQC currently not exploiting the symmetry of the two-electron integral tensor, which can reduce the tensor size by roughly a factor of 8 when the basis functions are real-valued. With density fitting, DQC is only about 50% slower than PySCF; the remaining difference is mainly caused by the larger number of algebraic operations in DQC, which has not been primarily optimized for speed. Speed optimization of DQC is left for future work.

| Cases | DQC | PySCF |
|---|---|---|
| H_{2}O (HF/cc-pVDZ) | 96 ms | 245 ms |
| H_{2}O (PW92/cc-pVDZ) | 530 ms | 430 ms |
| C_{4}H_{5}N (HF/cc-pVTZ) | 108 s | 17 s |
| C_{4}H_{5}N (PW92/cc-pVTZ) | 101 s | 25 s |
| C_{4}H_{5}N (density fit PW92/cc-pVTZ) | 30 s | 22 s |
| C_{6}H_{8}O_{6} (density fit PW92/cc-pVDZ) | 87 s | 57 s |


## IV. APPLICATIONS

### A. Direct minimization

The automatic availability of gradients makes it easy to implement the direct minimization method in DQC. Direct minimization is known to be more robust than SCF iterations in finding a converged solution^{19} and is particularly useful in difficult cases, such as for molecules near their dissociation limits. We illustrate this in Fig. 2, where we show the PW92/cc-pVDZ^{30,31} total energy of $\mathrm{H}_2^+$ as a function of the internuclear distance. The SCF method does not converge for internuclear distances greater than around 17 bohrs. In contrast, the direct minimization method continues to converge even for substantially larger distances in excess of 40 bohrs.

### B. Checking SCF stability

One difficulty in SCF calculations is that the calculation can converge onto a saddle point, which corresponds to an excited state.^{19} If the calculation has converged onto a local minimum, the Hessian of the energy with respect to orbital rotations must be positive semidefinite. Saddle point solutions, instead, correspond to one or more negative eigenvalues of the orbital Hessian. To maintain the orthonormality of the orbitals, the Hessian is calculated with respect to the **Q** variable. Conveniently, we do not need to derive the expression for the Hessian matrix $\partial^2 E/\partial \mathbf{Q}^2$, as it is automatically obtained by PyTorch.

Moreover, only the lowest eigenvalue needs to be calculated for the stability check, so the full Hessian matrix does not need to be constructed. We obtain the lowest eigenvalue using Davidson’s iterative algorithm,^{32} as implemented in xitorch. Note that the gradients of the lowest energy eigenvalue with respect to all other parameters in the calculation are automatically available, which may prove useful in further future applications.

We list some examples of SCF stability checks for HF/pc-1^{33} calculations on diatomic molecules in Table II. As the data show, the lowest eigenvalues of the orbital Hessian are very close to zero for the ground states, while the excited states yield large negative eigenvalues. This illustrates that it is straightforward to check with DQC whether or not a state corresponds to a true minimum.

| Molecules | Distance (bohrs) | Energy (E_{h}) | Lowest eigenvalue (E_{h}) |
|---|---|---|---|
| O_{2} (ground) | 2.0 | −149.54 | −6 × 10^{−14} |
| O_{2} (excited) | 2.0 | −149.17 | −0.73 |
| BeH (ground) | 2.5 | −15.14 | −4 × 10^{−14} |
| BeH (excited) | 2.5 | −15.01 | −0.25 |
| CH (ground) | 2.0 | −38.259 | −5 × 10^{−8} |
| CH (excited) | 2.0 | −38.256 | −0.07 |

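The stability criterion itself can be condensed into a toy sketch (ordinary 2-D functions standing in for the orbital energy surface; everything here is illustrative, not DQC code): at a stationary point, a non-negative lowest Hessian eigenvalue signals a true minimum, while a negative one signals a saddle.

```python
import math

def hessian_2x2(f, x, y, h=1e-4):
    # Central-difference Hessian, a dependency-free stand-in for the
    # automatically differentiated Hessian used in DQC.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def lowest_eigenvalue(a, b, c):
    # Smaller eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]].
    mean, diff = 0.5 * (a + c), 0.5 * (a - c)
    return mean - math.hypot(diff, b)

minimum = lambda x, y: x**2 + x * y + y**2  # positive-definite Hessian
saddle = lambda x, y: x**2 - y**2           # one negative direction

ev_min = lowest_eigenvalue(*hessian_2x2(minimum, 0.0, 0.0))
ev_sad = lowest_eigenvalue(*hessian_2x2(saddle, 0.0, 0.0))
print(ev_min, ev_sad)  # ~1 (stable minimum) and ~-2 (saddle point)
```

Only the lowest eigenvalue matters for the verdict, which is why an iterative eigensolver such as Davidson's algorithm suffices in practice and the full Hessian never needs to be formed.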

### C. Molecular properties

Automatic differentiation makes various molecular perturbation properties directly accessible from derivative expressions, without implementing those derivatives explicitly. The vibrational modes and frequencies, for instance, follow from the eigendecomposition of the mass-weighted Hessian of the energy with respect to the nuclear coordinates,

$$\omega_{\mathrm{vib}}^2,\ \mathbf{q}_{\mathrm{vib}} = \mathrm{eig}\!\left(\mathbf{M}^{-1/2}\,\frac{\partial^2 E}{\partial \mathbf{X}^2}\,\mathbf{M}^{-1/2}\right), \quad (9)$$

where **X** is an *n* × 3 matrix containing the nuclear coordinates of the *n* atoms, **M** is the corresponding matrix of atomic masses, eig is the eigendecomposition, **q**_{vib} is one of the vibrational modes, and *ω*_{vib} is its frequency. The expression shown in Eq. (9) is all that is needed to calculate the vibrational characteristics of the molecule; the explicit form of the derivative expression is not required.

The electric dipole moment is obtained similarly, from the derivative of the energy with respect to an applied electric field together with the nuclear contribution involving **Z**_{i}, the atomic number of the *i*th atom, and **x**_{i}, its coordinates. The infrared absorption intensity of a vibrational mode **q** at a frequency *ω* then follows from the derivative of the electric dipole moment *μ* along that mode, with **X** again denoting the matrix of atomic coordinates. Raman activities^{35} and the traceless quadrupole moment, whose definition involves the Kronecker delta *δ*_{ij}, are obtained from analogous higher derivatives of the energy.

The HF/cc-pVDZ perturbation properties of H_{2}O, found using the expressions above, are displayed in Table III. The values are in excellent agreement with the data from the computational chemistry comparison and benchmark database (CCCBDB),^{34} even though DQC does not have any explicit code to calculate the gradients required for these properties.

| Properties | DQC | CCCBDB |
|---|---|---|
| IR intensities (km/mol) | 80.69 | 80.70 |
| Raman activities (Å^{4}/amu) | 4.79 | 4.79 |
| Dipole (D) | −2.044 | −2.044 |
| Quadrupole_{xx} (D Å) | −7.008 | −7.008 |


### D. Basis set optimization

The differentiability of DQC with respect to the basis set parameters enables the optimization of system-specific basis sets. Here, we show this capability by optimizing a basis set for hydrocarbons within Kohn–Sham DFT using the PW92 functional. We reoptimized the cc-pVDZ basis^{31} for a training set of molecules consisting of CH (methylidyne), CH_{3} (methyl radical), CH_{4} (methane), C_{2}H_{2} (acetylene), and C_{2}H_{4} (ethylene). The accuracy of the reoptimized basis was then tested on a set of hydrocarbons that were not included in the training set. The results are shown in Fig. 3. The reoptimization of the cc-pVDZ basis set leads to a marked decrease in the total energies of all the molecules in the test set, as the cc-pVXZ basis set series is designed to capture correlation energies instead of polarization energies^{31} and as hydrocarbons are chemically similar.
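The gradient-driven exponent optimization can be illustrated with a classic closed-form case (a single 1s Gaussian for the hydrogen atom; the analytic energy expression replaces the automatic gradients DQC would provide for a real contracted basis):

```python
import math

# Variational energy (in hartree) of a single Gaussian trial function
# exp(-alpha * r^2) for the hydrogen atom:
#   E(alpha) = 3*alpha/2 - 2*sqrt(2*alpha/pi)
# The optimum is known analytically: alpha = 8/(9*pi), E = -4/(3*pi).

def energy(alpha):
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)

def grad(alpha):
    # Analytic dE/dalpha; DQC would obtain this by automatic differentiation.
    return 1.5 - math.sqrt(2.0 / (math.pi * alpha))

alpha = 1.0                    # starting exponent
for _ in range(500):
    alpha -= 0.1 * grad(alpha)  # plain gradient descent on the exponent

print(alpha, energy(alpha))  # -> ~0.2829, ~-0.4244 hartree
```

Optimizing many exponents and contraction coefficients at once works the same way; the only ingredient is the gradient of the energy with respect to the basis parameters.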

### E. Alchemical perturbation

One of the differentiable quantities in DQC is the atomic number, allowing us to perform alchemical perturbation studies to predict the properties of molecules without actually needing to simulate them.^{36,37} As a simple example, we will show here that some properties of diatomic molecules CO (atomic numbers 6 and 8) and BF (atomic numbers 5 and 9) can be estimated directly from the properties of N_{2} (atomic number 7) and its alchemical perturbations. To do this, we parameterize the atomic number of the atoms in the diatomic molecule by a continuous variable *λ* so that the atoms have atomic numbers *Z*_{l} = 7 + *λ* and *Z*_{r} = 7 − *λ*. The molecules N_{2}, CO, and BF, thus, correspond to *λ* = 0, *λ* = 1, and *λ* = 2, respectively.

We would like to estimate the equilibrium separation *s** and the energy *E** at the equilibrium position, which can be mathematically expressed as

$$s^{*} = \arg\min_{s} E(s; \lambda), \qquad E^{*} = E(s^{*}; \lambda).$$

There are two ways to obtain these properties. The first is to set the value of *λ* and calculate the properties above directly; these calculations were performed separately for the different molecules, but with the same basis (the pc-1 nitrogen basis on all atoms), in analogy to the procedure used in Refs. 36 and 37. The second way is to estimate the properties using a Taylor series expansion to second order in *λ*,

$$s^{*}(\lambda) \approx s^{*}(0) + \lambda \left.\frac{\partial s^{*}}{\partial \lambda}\right|_{\lambda=0} + \frac{\lambda^{2}}{2} \left.\frac{\partial^{2} s^{*}}{\partial \lambda^{2}}\right|_{\lambda=0},$$

and analogously for *E**(*λ*). As *s** and *E** are defined at the equilibrium geometry, calculating the perturbation terms requires propagating the gradient through the optimization process. However, automatic differentiation makes this propagation trivial, as it is automatically handled by the optimization routine in xitorch.

The results obtained via these two approaches are compared in Fig. 4. As we can see from these results, the properties of CO and BF can be estimated accurately from alchemical perturbations of N_{2}. The estimated equilibrium distances for CO and BF differ by −0.0004 and 0.024 bohr, respectively, while the estimated equilibrium energies deviate by 33 and 587 m*E*_{h}, respectively. This shows that the equilibrium position of new molecules can be estimated well with the alchemical gradient calculated by DQC, without actually having to calculate those molecules.
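The extrapolation scheme can be sketched on an assumed model energy surface (illustrative only; the derivatives through the minimizer are taken by finite differences here purely to keep the sketch dependency-free, whereas DQC propagates them by automatic differentiation):

```python
# Model surface with equilibrium s*(lambda) = 2 + 0.1*lambda (an assumption
# for illustration, not a molecular potential).
def energy(s, lam):
    return (s - (2.0 + 0.1 * lam)) ** 2 + 0.5 * lam ** 2

def s_star(lam, lr=0.4, n_iter=200):
    # Find the equilibrium position by gradient descent on s.
    s = 0.0
    for _ in range(n_iter):
        g = 2.0 * (s - (2.0 + 0.1 * lam))  # dE/ds
        s -= lr * g
    return s

# Derivatives of the *minimizer output* with respect to lambda at lambda = 0.
h = 1e-3
s0 = s_star(0.0)
ds = (s_star(h) - s_star(-h)) / (2 * h)            # d s*/d lambda
d2s = (s_star(h) - 2 * s0 + s_star(-h)) / h ** 2   # d^2 s*/d lambda^2

lam = 1.0
taylor = s0 + lam * ds + 0.5 * lam ** 2 * d2s      # second-order estimate
direct = s_star(lam)                               # direct minimization
print(taylor, direct)
```

For this model the second-order Taylor estimate reproduces the directly minimized equilibrium position, mirroring how the perturbed properties of CO and BF are predicted from derivatives evaluated at N₂.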

## V. DISCUSSION

Implementing quantum chemistry with automatic differentiation libraries is a promising way to accelerate simulation workflows and to enable novel applications. However, the implementation comes with several challenges. An overarching challenge is that the automatic differentiation library used here, PyTorch, is primarily designed for deep learning rather than for scientific computing. As deep learning only focuses on low-order derivatives, accessing high-order gradients that are commonly required for scientific applications can be difficult.

Detecting numerical instabilities in high-order gradient calculations can also be demanding. Instabilities that produce NaN (not-a-number) values are relatively straightforward to manage with the debugging feature available in PyTorch since version 1.8, but instabilities that do not produce NaN can be much harder to detect.

Another challenge is debugging and profiling higher-order gradient calculations. As the gradient is generated automatically, it is hard to find the slowest part of the code or the part that requires the most memory because this information is not readily provided by the available profiling tools.

In addition to higher-order gradient calculations, using automatic differentiation libraries also poses unique challenges in terms of code optimization. For example, quantum chemistry codes usually save the two-electron integrals on disk due to their large size and process them only in blocks small enough to fit into memory. This scheme can only be used in DQC if the gradients with respect to the nuclear positions and the basis are not required: to the best of the authors' knowledge, there is currently no obvious mechanism in PyTorch that allows gradient-propagating operations to work with large tensors stored on disk.

Other automatic differentiation libraries developed for deep learning, such as TensorFlow^{38} or JAX,^{39} could equally be used to implement differentiable quantum chemistry: as these libraries have matured over the years, their feature sets have grown similar to each other. The choice mainly depends on the preference of the developers, for instance, which library they are most familiar with.

## VI. CONCLUSIONS

Implementing quantum chemical calculations using automatic differentiation liberates us from needing to derive analytical gradient expressions. With gradients automatically generated by the program, software developers can focus on designing better and more detailed computational models and applying them to problems at hand. We have shown how automatic differentiation allows us to easily explore various applications in quantum chemistry and are confident that further exploration of this approach will unveil new applications that have not been considered to date.

## ACKNOWLEDGMENTS

M.F.K. and S.M.V. acknowledge support from the UK EPSRC (Grant No. EP/P015794/1) and the Royal Society. S.M.V. is a Royal Society University Research Fellow.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The Differentiable Quantum Chemistry (DQC) code can be found at https://github.com/diffqc/dqc/. The repository that contains the applications presented in this paper is located at https://github.com/diffqc/dqc-apps/.

## REFERENCES
