We propose the relaxation of geometries throughout chemical compound space using alchemical perturbation density functional theory (APDFT). APDFT refers to perturbation theory involving changes in nuclear charges within approximate solutions to Schrödinger’s equation. We give an analytical formula to calculate the mixed second order energy derivatives with respect to both nuclear charges and nuclear positions (named “alchemical force”) within the restricted Hartree–Fock case. We have implemented and studied the formula for its use in geometry relaxation of various reference and target molecules. We have also analyzed the convergence of the alchemical force perturbation series as well as basis set effects. Interpolating alchemically predicted energies, forces, and Hessian to a Morse potential yields more accurate geometries and equilibrium energies than when performing a standard Newton–Raphson step. Our numerical predictions for small molecules including BF, CO, N2, CH4, NH3, H2O, and HF yield mean absolute errors of equilibrium energies and bond lengths smaller than 10 mHa and 0.01 bohr for fourth order APDFT predictions, respectively. Our alchemical geometry relaxation still preserves the combinatorial efficiency of APDFT: Based on a single coupled perturbed Hartree–Fock derivative for benzene, we provide numerical predictions of equilibrium energies and relaxed structures of all 17 iso-electronic charge-neutral BN-doped mutants with averaged absolute deviations of ∼27 mHa and ∼0.12 bohr, respectively.

Chemical compound space is conceptually defined as the infinite set of all possible chemical compounds.1,2 Extensive knowledge of chemical compound space based on quantum mechanical calculations only is made extremely challenging by the unfathomably large number of molecules that have to be taken into consideration. In practical terms, only a small portion of it can be screened using standard computational methods. In order to improve our understanding of chemical compound space as well as to enhance molecular design efforts, faster methods to scan chemical space are needed.3,4 This can be achieved in two possible ways: either using an exact method at a low level of theory [cheap quantum machine (QM) methods or force fields] or using an approximate method for estimating high-level theory calculations. Both quantum machine learning (QML) and alchemical perturbation density functional theory (APDFT) take advantage of data previously acquired and generate new predictions using either interpolation (QML) or extrapolation (APDFT) techniques. QML5,6 can be used effectively to predict several quantum properties;7–12 recent studies13–15 succeeded in machine learning the optimized molecular geometries, but most QML methods do not predict directly geometrical structures.

APDFT16 is an alternative to QML for the rapid screening of chemical space. QML comes at the initial cost of securing a large training set, and this is not always available for the desired level of theory, the desired property of interest, or a relevant chemical subspace. In this regard, APDFT is more versatile than QML because it requires the explicit calculation of only one reference molecule. From the reference molecule calculation and through the use of alchemical perturbations, it is possible to subsequently obtain estimates of molecular properties for a combinatorially large number of target molecules.16 APDFT accuracy depends on the perturbation order17 and on the basis set choice.18 If a high-quality calculation [e.g., coupled cluster singles doubles (CCSD)] is used as a reference, the error in the third order energy prediction can be smaller than the error made using a cheaper QM method, such as Hartree–Fock (HF) and MP2.16,18

Alchemical changes and derivatives were at first introduced as a way to rationalize energy differences between molecules.19–21 Successive works showed that APDFT is also capable of predicting many other molecular properties, such as electron densities, dipole moments,16 ionization potentials,22,23 HOMO energy eigenvalues,24 band structures,25 and deprotonation energies.26,27 APDFT is particularly convenient if applied to highly symmetric systems where the explicit calculation of only a few derivatives (all the others can be obtained by symmetry operations) is required, e.g., in the BN doping of polycyclic aromatic hydrocarbons.28–30 Using pseudopotentials, quantum alchemy can also be applied to crystal structures25,31–35 and can even suggest catalyst improvements.36–40 

One of the major remaining challenges in APDFT is to include geometry relaxation as the equilibrium molecular structure changes from the reference to the target; this would constitute a huge improvement over fixed atoms’ “vertical” predictions. To the best of our knowledge, only two studies24,41 have tried to predict equilibrium energies, applying APDFT to “non-vertical” energy predictions (predicting energies for a perturbation that includes changes in both nuclear charges and geometry).

FIG. 1.

APDFT predictions of gradients and Hessians enable efficient geometry relaxation of target molecules. The predictive accuracy improves with Morse potential interpolation.

FIG. 1.

APDFT predictions of gradients and Hessians enable efficient geometry relaxation of target molecules. The predictive accuracy improves with Morse potential interpolation.

Close modal

In this paper, we propose an alternative approach: we alchemically predict the geometry energy derivatives (gradients and Hessians) for the target molecule and, subsequently, use them to relax the geometry (see Fig. 1). Alchemical forces are defined as the second order mixed derivatives of the energy with respect to nuclear positions and nuclear charges (2EZR). Alchemical forces appear as non-diagonal terms in the generalized Hessian42 (the matrix that contains all the second derivatives of the energy with respect to nuclear charges, atoms’ positions, and electron number). Other off-diagonal terms correspond to nuclear Fukui functions43–48, (2ENeR) and alchemical Fukui functions22,23,49–52 (2ENeZ).

These functions have already been studied, while, to the best of our knowledge, no study has yet carried out an in-depth analysis of alchemical forces. To study the predictive power and applications of alchemical predicted gradients and Hessians, we just consider the restricted Hartree–Fock case. We think that an extension to unrestricted Hartree–Fock (UHF) or density functional theory (DFT) is possible, and some of the conclusions derived for RHF may also hold for other levels of theory.

In Sec. II, we will introduce the reader to the concept of APDFT as well as to the Coupled Perturbed Hartree–Fock (CPHF) method used to calculate alchemical derivatives. We will derive an analytical formulation of RHF alchemical forces. In Sec. III, we show that it is possible to compute higher order derivatives through numerical differentiation and those derivatives can be used to predict geometrical gradients and Hessians for the target molecules; we also discuss how basis sets affect the accuracy of alchemical predictions of energy and geometry. In the results section, we give some examples of how to use the predicted gradients and Hessians to relax the geometry structures of the target molecules.

Within the Born–Oppenheimer approximation,53 the non-degenerate electronic ground state wave function Ψ0(rR, Z, Ne) depends parametrically on the positions of the nuclei R, the nuclear charges Z, and the number of electrons Ne. The nuclear charge vector Z specifies the chemical composition of a given molecule, for real molecules, Z can only have integer values, though from a theoretical point of view, it is possible to extend the concept of atomic charge to any real number. It is thus possible to define a smooth path for Z coordinates that connects molecules of different chemical compositions.22 We call “alchemical transmutation” the process of transforming one element into another for one or more atoms in a molecule. In order to connect a reference molecule R to a target molecule T, we can define the alchemical coordinate λ as a linear transformation of the nuclear charges from the reference ZR = Z(λ = 0) to the target ZT = Z(λ = 1),
Z(λ)ZR+λ(ZTZR)=ZR+λΔZ.
(1)
The linear transformation of the nuclear charges corresponds also to a linear transformation of the electronic Hamiltonian and the nuclear electron attraction operators V̂ne,
Ĥ(λ)=ĤR+λ(ĤTĤR)=ĤR+λ(V̂neTV̂neR),Ĥ(λ)=ĤR+λΔV̂ne.
(2)
For isoelectronic transmutations, the total electronic ground state energy of the molecule is continuous and differentiable in λ.24 From the energy of the reference molecule ER = E(λ = 0) and its derivatives, we can express the energy of the target molecule ET = E(λ = 1) through a Taylor series expansion, where Δλ = 1 and where we assume convergence,
ET=ER+n=11n!nE(λ)λnλ=0.
(3)

Truncating the series in Eq. (3) to a certain order will give an approximation of ET. We call APDFTn prediction the approximation that includes all terms up to the n-th order.

For a variational wave function, assuming the independence of the basis set with respect to λ, the first alchemical derivative of the electronic energy can be evaluated via the Hellmann–Feynman theorem24,54 as follows:
Eλλ=0=ψR|ĤTĤR|ψR=ΩdrΔVne(r)ρR(r).
(4)
Higher order derivatives can be expressed as functionals of density derivatives,
nEλnλ=0=ΩdrΔVne(r)n1ρR(r)λn1λ=0.
(5)

As one can see from Eqs. (4) and (5), the alchemical derivatives are a functional of perturbed electronic densities. These expressions are part of APDFT (alchemical perturbation density functional theory), as already introduced in Ref. 16. They offer an orbital-free way to approximate alchemical energy differences as perturbations of the reference electronic density.

For a monodeterminantal orbital based wave function (e.g., the solution of a Hartree–Fock or a Kohn–Sham DFT calculation), the first alchemical derivative [Eq. (4)] can be rewritten as a contraction between the one-electron density matrix P of the reference molecule with the change in the nuclear electron potential energy operator ΔV,
Eλλ=0=μνPμνΔVμν.
(6)

Higher order derivatives can be obtained via numerical differentiation or analytically using either Møller–Plesset perturbation theory41 or a coupled perturbed approach couple perturbed Hartree-Fock (CPHF) or coupled perturbed Kohn-Sham DFT (CPKS).55 The contribution of nuclear–nuclear repulsion to the alchemical derivatives can be obtained analytically as the derivative of a classical charge distribution.

Coupled perturbed methods are a powerful tool to calculate derivative properties of a monodeterminant wave function (HF, DFT). They represent the standard way to calculate polarization56–58 and can also be used for alchemical perturbations.28,30,50,59

The mathematical treatment to calculate the molecular response to an external electric field or to the electric field generated by the transmutation of one atom is indeed identical. We would like to recall some theoretical aspects of the coupled perturbed Hartree–Fock method applied to perturbations, such as APDFT, which do not include basis set derivatives. In CPHF, the first derivative of the coefficient’s matrix C is written as the product of C and a unitary response matrix U,
Cλ=CU.
(7)
From the orthogonality condition CTSC = I, it is possible to show that U must be skew symmetric,
λ(CTSC)=0,UT(CTSC)+(CTSC)U=0,U=UT.
(8)
Since U acts as a rotation matrix, and since orbital rotations inside the occupied–occupied and the virtual–virtual blocks do not affect the total energy, we can choose the oo and vv blocks of U to be zero, keeping only the ov blocks of U non-zero. The derivative of the Roothaan equation FC = SCϵ with respect to λ becomes
FλC+FCU=SCUϵ+SCϵλ.
(9)
Multiplying from the left by CT and simplifying using the identities CTSC = I and CTFC = ϵ lead to
CTFλC=UϵϵU+ϵλ.
(10)
Since ϵ is a diagonal matrix and U, , and ϵU are zero on the diagonal blocks, we can divide Eq. (10) into two parts as follows:
CTFλCia=Uia(ϵaaϵii)outofdiagonal,
CTFλCpq=ϵpqλonthediagonal.

We used the standard index notation for molecular orbitals, i, j, k for occupied orbitals, a, b, c for virtual orbitals, and p, q, r for arbitrary orbitals. We will use Greek letters λμ, ν, and σ as indices of atomic orbitals and capital letters I, J, and K as atoms’ indices.

The partial derivative of the Fock matrix in Eq. (10) contains the term ∂V/∂λ, but also an implicit dependence on U since F depends on C. A detailed solution to the CPHF equations can be found in Refs. 55, 56, and 60. After solving Eq. (10), derivatives of the one-particle density matrix of a RHF close shell calculation can be evaluated as
Pμνλ=2CμiCiνTλ=2(CμaUaiCiνT+CμiUiaTCaνT).
(11)

In APDFT, any molecular property can be expressed as a function of the alchemical coordinate λ.

Explicating the dependence on the nuclear charges (A(λ)=AZ(λ)), differentiation with respect to λ can be performed using the chain rule as follows:
nAλn=IatomsZIλZInA,ZIλ=ZITZIR.
(12)
The last equation comes directly from Eq. (1).

Solving the CPHF equation for any atom I leads to the evaluation of a response matrix UI and of the first order derivatives (ΔVI, CI, FI, ϵI) of the operators with respect to the nuclear charge ZI.

Consistent with the Wigner 2n + 1 rule, from these first order derivatives, it is possible to evaluate the second and third derivatives of the electronic energy as follows:
2EZIZJ=4aiUaiIΔVaiJ,
(13)
3EZIZJZK=4[(IJK)+(JKI)+(KIJ)],(IJK)=iocc.μνFμνICμiJCνiKijocc.μνSμνCμiICνjJϵijK.
(14)

This formulation is particularly convenient for highly symmetrical systems, where many of the derivatives ∂A/∂ZI are symmetrically equivalent. Only a few explicit CPHF calculations are needed, while the number of possible targets grows combinatorially with the size of the system.

We pointed out in a previous paper18 that neglecting the derivatives of the basis set coefficients with respect to nuclear charges, i.e., calculating alchemical derivatives using the basis set of the reference molecule only, the alchemical series converges to the energy of the target molecule with the basis set of the reference (ET[R]). The difference between this energy and the true target energy was named (alchemical) basis set error: ΔEBSET[R]ET[T].

In the same paper, we proposed a correction to the basis set error that can be added to any order APDFT energy prediction. The correction decomposes the total basis set error into a sum of individual atom contributions calculated from isolated atoms,
ΔEBScorrectionIatomsEIT[T]EIT[R].
(15)

It is worth mentioning that there is the existence of the universal Gaussian basis set (UGBS),61 an uncontracted basis set where all elements share the same basis set exponents, and thus, there is no difference between reference and target basis sets.

Alchemical forces are the mixed derivatives of the energy with respect to both nuclear charges and nuclear positions.42 Alchemical forces can be computed analytically either by differentiating the alchemical potential with respect to the nuclear coordinates or by differentiating the geometrical gradient with respect to the nuclear charges. In accordance with Schwarz’s theorem, both methods are valid and will be presented below.

1. First formulation

Differentiating the first alchemical derivative of the electronic energy [Eq. (6)] with respect to the nuclear coordinates RI of the atom I leads to
RIPΔVne=PRIΔV+PΔVRI,
(16)
where
ΔVRI=VRIZIλ.
(17)

The derivative of the one-electron density matrix PR can be obtained through a CPHF calculation, independently for each molecular coordinate R, i.e., three times the number of atoms; for this reason, we think that the second approach is more convenient.

2. Second formulation

Differentiating the geometrical gradient with respect to the nuclear charge of an atom (ZI) requires only one density derivative PZI, which is also needed for APDFT3 energy predictions.

The first derivative of the energy with respect to one nuclear Cartesian coordinate R,60 unlike Eq. (6), contains terms deriving from the A.O. integrals’ derivatives,
ER=μνPμνHμν(1)R+12μνλσPμνPλσR(μλ||νσ)μνWμνSμνR,
(18)
where Hμν(1) is the monoelectronic part of the Hamiltonian, (μλ||νσ) is the sum of the bielectronic Coulomb and exchange integrals, and W is the energy weighted density matrix,
(μλ||νσ)ϕμ(r1)ϕλ(r2)|1|r1r2||ϕν(r1)ϕσ(r2)12ϕμ(r1)ϕλ(r2)|1|r1r2||ϕσ(r1)ϕν(r2),
(19)
Wμν2iocc.ϵiCμiCνi.
Equation (18) is composed of three terms, and we need to differentiate all of them with respect to the nuclear charge ZI of a given atom I.
For the first term of Eq. (18),
ZIPH(1)R=PZIH(1)R+P2H(1)ZIR.
(20)
The monoelectronic Hamiltonian H(1) is composed of two parts: the kinetic energy operator T, which is independent of ZI, and the nuclear electron potential V,
2Hμν(1)ZIR=2(T+V)μνRZI=RVμνZI=RZIJatomsϕμ(r)|ZJ|RJr||ϕν(r)=Rϕμ(r)|1|RIr||ϕν(r).
(21)

The mixed derivative of the one-electron Hamiltonian [Eq. (21)] is only non-zero if the coordinate R is referred to the position of atom I.

For the second term of Eq. (18),
ZIPμνPλσR(μλ||νσ)=PμνPλσZIR(μλ||νσ)+PμνZIPλσR(μλ||νσ).
(22)
Differentiating the third term of Eq. (18) yields
ZIWμνSμνR=WμνZISμνR.
(23)
The derivative of W can be obtained from Eq. (19) using the response matrix U and the derivatives of the MO energies ∂ϵ/∂ZI from the solution of the CPHF equation [Eq. (10)],
WμνZI=2iocc.ϵi(CU)μiCνi+ϵiCμi(CU)νi+ϵiZICμiCνi.
(24)
Collecting together all terms, the electronic part of the alchemical force at a RHF level of theory can then be expressed as follows:
2ERZI=μνPμνZIHμν(1)R+Pμν2Hμν(1)ZIR+μνλσPμνZIPλσR(μλ||νσ)μνWμνZISμνR.
(25)

3. Nuclear–nuclear contribution

The nuclear–nuclear contribution to the alchemical force can be evaluated analytically as derivatives of an electrostatic charge distribution,
2ENNZJRI=ZI(RIRJ)|RIRJ|3(1δIJ)+QQIZQ(RIRQ)|RIRQ|3δIJ.
(26)

All through our work, we used a locally modified version of PySCF62 in which we implemented subroutines for the analytical alchemical derivatives [Eqs. (6), (13), and (14)] as well as the alchemical force [Eqs. (25) and (26)]. If not specified differently, the pcX-2 basis set63 will be used for second row elements in conjunction with the pc-2 basis set for hydrogen atoms. Basis sets’ coefficients and exponents were obtained from Basis Set Exchange.64–66 Geometrical optimization in redundant internal coordinates was performed using a locally modified version of PyBerny,67 where we included the transformation of the geometrical Hessian from Cartesian to internal coordinates. Root mean square deviations of atomic positions were evaluated using the program (RMSD)68 implementing the Kabsch algorithm.69 The molecular sketch of benzene’s B–N mutants was plotted using RDKit.70–72 All Python code used in this work is made available, free of charge on a Zenodo repository.73 

Geometrical gradients and geometrical Hessians can be expanded as an alchemical series resulting in alchemical predictions of gradients and Hessians of the target molecules. The first alchemical prediction of the gradient is the alchemical force [Eqs. (25) and (26)].

Higher order derivatives of the gradient can be obtained through numerical differentiation of the alchemical force, while differentiation of the geometrical Hessian was only computed numerically. Derivatives were performed via a central finite difference stencil of seven points equispaced by Δλ = 0.1.

As a test case, we choose to analyze the bond stretching of some diatomic molecules and the stretching of a C–H bond in a methane molecule, and the results are plotted in Fig. 2.

FIG. 2.

Decreasing error for APDFT predictions of geometrical gradients (top) and Hessians (bottom) as the perturbation order increases. Legends indicate respective colors for “Ref. → Targ.” molecules. The inset shows convergence to less than 1.5%.

FIG. 2.

Decreasing error for APDFT predictions of geometrical gradients (top) and Hessians (bottom) as the perturbation order increases. Legends indicate respective colors for “Ref. → Targ.” molecules. The inset shows convergence to less than 1.5%.

Close modal

The zeroth order gradient prediction is just the gradient of the reference, which is zero in its geometrical minimum, and this leads to a 100% prediction error for all molecules. Due to symmetry, the alchemical force for the transmutation N2 → CO is zero, and therefore, the first order term does not improve the prediction. The inverse transmutation CO → N2 has also a high first order error, which can be justified by similar reasons.

The third order prediction leads to an error of about 5%, while predictions of orders higher than 4 lead to an error, which is in any case below 1.50%. Going above the fifth order does not necessarily increase accuracy because of numerical errors.

Furthermore, if the derivatives of the atomic orbitals are neglected, the Taylor series converges to the gradient of the target calculated with the basis set of the reference. We can see this effect in the prediction of N2 → CO, where the green line has an offset of ∼1.3%.

For Hessians, the zeroth order prediction, i.e., approximating the target Hessian with the reference’s one, has still a certain degree of accuracy. Additional terms increase the precision and going above the second order will lead to an error below 2%. The best predictions were obtained at the third and the fourth order, while after the fifth order term, the series diverge due to numerical errors in the calculation of the Hessians.

In a previous paper,18 we stated that one of the major sources of error in APDFT vertical energy predictions is the alchemical basis set error, which is explained in Sec. II D.

For the sake of this study, it is of great interest to analyze the alchemical basis set error, not only for vertical energy predictions, but also for the predictions of energies and geometries of the target in its energy minimum “relaxed errors” (ΔEeqBS,ΔReqBS),
ΔEeqBSEeqT[R]EeqT[T],ΔReqBSReqT[R]ReqT[T].
(27)

In the simple cases of diatomic molecules (BF ↔ CO, CO ↔ N2) at the RHF level of theory, we tested performances of some of the most commonly used triple and quadruple ζ basis sets. We compared the family of Karlsruhe def2-nZ,74,75 Dunning and co-worker’s correlation consistent polarized valence cc-pVnZ76 and polarized core and valence cc-pCVnZ,77 Jensen’s polarization consistent pc-n,78–80 and their uncontracted variant optimized for x-ray spectroscopy pcX-n.63 

Figure 3 shows that ΔEeqBS decreases with the increase in the basis set size, and this trend is approximately linear in the logarithmic plot for def2, cc-pCVnZ, and pc-n basis sets.

FIG. 3.

Alchemical basis set errors [see Eq. (27)] for energy (top) and geometry (bottom) as a function of the number of atomic basis functions. Values correspond to averages for BF, CO, and N2 (which have been assessed as Ref. and Targ. molecules for APDFT predictions in Fig. 2 and Table I).

FIG. 3.

Alchemical basis set errors [see Eq. (27)] for energy (top) and geometry (bottom) as a function of the number of atomic basis functions. Values correspond to averages for BF, CO, and N2 (which have been assessed as Ref. and Targ. molecules for APDFT predictions in Fig. 2 and Table I).

Close modal

Core polarization is important for the alchemical invariance of basis sets: the cc-pVnZ basis functions, which are only polarized on the valence shell, have a higher ΔEeqBS compared to other basis of equal size.

As a proof of this concept, we created a new basis set combining the core basis functions (the orbitals with s symmetry) of the cc-pCVQZ basis set with the valence and polarization orbitals of the smaller cc-pCVTZ. The orthonormality of this basis set labeled cc-pCV(sQ)TZ is guaranteed by spherical symmetry.

In Fig. 3, we can compare ΔEeqBS for cc-pCVTZ and cc-pCV(sQ)TZ basis sets: passing from 6 to 8 s type orbitals per atom is enough to reduce the error by about one order of magnitude.

The alchemical basis set error in equilibrium bond lengths decreases with the basis set size, and the highest ΔReqBS occurred using the cc-pVTZ basis set. Equilibrium bond lengths are consistently shorter as a consequence of the basis set superposition error.81–83 

The other basis sets of the cc-pVnZ family (cc-pVQZ and cc-pV5Z) do not exhibit such behavior. ΔReqBS is similar for cc-pCV(sQ)TZ and cc-pCVTZ, and both basis sets share the same valence and polarization functions. We can conclude that a good description of core and valence atomic orbitals is needed for accurate APDFT energies, while a good description of valence orbitals and the presence of polarization functions are required for accurate geometry predictions.

Best results in terms of accuracy per basis set number were achieved for both energy and geometry by the polarization consistent pcX-2 and pcX-3 basis sets. Those are uncontracted basis sets whose coefficients are optimized for x-ray spectroscopy. To effectively describe core-ionization of an atom, the basis functions used should provide a balanced representation of both the ground and core-excited state, where the effective nuclear charge changes by roughly +1. The same requirement is also true for alchemy, where the true nuclear charges change by ±1. This is a case where the computational solution to one specific problem (x-ray spectroscopy) can also be used effectively in a completely different application (APDFT). Acknowledged these conclusions on basis sets, we will use for the rest of the article the pCX-2 basis set, and since the pcX-2 basis sets are defined only for the second and third row elements, for hydrogens, we will use the pc-2 basis set.

In Sec. III B, we showed how alchemical perturbation can be used to predict geometrical gradients and Hessians. Those gradients and Hessians can be used to relax the geometry of the target molecule in several ways. We would like to show in this section that a geometrical relaxation using the alchemical predicted gradients and Hessians is possible and can be accurate depending on the APDFT order.

The simplest relaxation technique is a single step in the Newton–Raphson (NR) optimization method. The potential energy surface is approximated with a paraboloid. Given a set of Cartesian coordinates x, a gradient g, and a Hessian H, the optimization step and the relaxation energy are
Δx=H(1)g,
(28)
ΔE=12ΔxHΔx.
(29)
A better representation for the monodimensional potential of a bond stretching was given by Morse,84 
VMorse(R;De,Re,a,Ve)=De(1ea(RRe))2+Ve.
(30)

Morse potentials depend on four parameters: the bond dissociation energy De, the equilibrium distance Re, the parameter a, which controls the width of the potential well, and Ve = V(re) the value of the energy at the minimum.

Knowing the energy, the first and second derivatives at a given point of the curve, we can match the number of parameters with the number conditions using an empirical value for the dissociation energy. An empirical dissociation energy can be used because the interpolated curves do not depend sensitively on De. In analogy to what is done in the UFF,85 we will approximate De with the value of 100 kcal/mol times the total bond order [for the universal force field (UFF), the parameter is 80 kcal/mol].

For molecules bigger than diatomics, it is still possible to introduce the Morse potential interpolation as a correction to a Newton–Raphson step.

At first, a transformation into internal coordinates q is needed. The Morse correction can be calculated for every bond qb from the b component of the gradient gb and the diagonal term of the Hessian Hbb,
ΔqbCorr.=ΔqbMorse(qb,gb,Hbb,De(b))(gb/Hbb).
(31)

The mixed terms between different coordinates are still calculated in the paraboloid (Newton–Raphson) approximation.

The Morse potential correction affects also the energy,
ΔECorr.=12ΔqTHΔq+bΔEbMorse+12Hbb(ΔqbMorse)2.
(32)

A more detailed discussion of the Morse potential interpolation is given in the supplementary material of this paper.

Taking as an example the dissociation of CO, shown in Fig. 4, we can see that the quadratic approximation leads to a poor result when the starting point is far from the minimum. Since the geometries of the reference and target are usually substantially different, this condition occurs quite often in alchemical relaxation.

FIG. 4.

Morse interpolation improves geometry relaxation as exemplified for CO: Interpolating Hartree–Fock energies, gradients, and Hessians at R0 to the Morse potential and parabola (Newton–Raphson step) affords purple and red potentials, respectively.

FIG. 4.

Morse interpolation improves geometry relaxation as exemplified for CO: Interpolating Hartree–Fock energies, gradients, and Hessians at R0 to the Morse potential and parabola (Newton–Raphson step) affords purple and red potentials, respectively.

Close modal

At the Morse potential, the whole energy curve of bond dissociations is described better, and this can lead to a more accurate relaxation, if the gradient and the Hessian are predicted accurately.

In the simple case of diatomic molecules, we can derive some considerations regarding the accuracy of the alchemical relaxation at different perturbation orders and compare the relaxations obtained using the Newton–Raphson step and the Morse potential interpolation.

Table I shows the results obtained with a NR step. In the predictions CO → BF and N2 → BF, the error in bond lengths does not decrease with the alchemical prediction order. The reason for that is because part of the error comes from the relaxation step used and does not depend on how precisely gradients and Hessians are predicted.

TABLE I.

APDFT based predictions of equilibrium distances and energies and frequencies using a single step of the parabola Newton–Raphson (top) and Morse potential (bottom) optimization. “True” corresponds to Hartree–Fock SCF results.

Ref.Targ.APDFTReq [bohr]E [Ha.]ω0 (cm−1)
Newton–Raphson optimization 
N2 BF 2.290 −124.2586 2676 
N2 BF 2.221 −124.1884 2678 
CO BF 2.285 −124.1862 2402 
CO BF 2.262 −124.1598 2407 
CO BF 2.258 −124.1634 2410 
True BF ⋯ 2.353 −124.1624 1507 
BF CO 1.793 −112.7944 1415 
BF CO 1.846 −112.8141 1418 
BF CO 1.864 −112.8121 1431 
N2 CO 2.080 −112.7909 2740 
N2 CO 2.076 −112.7855 2740 
True CO ⋯ 2.083 −112.7866 2430 
BF N2 0.813 −109.4238 1244 
BF N2 1.375 −109.1547 1274 
BF N2 1.761 −109.0494 1490 
CO N2 1.989 −108.9803 2416 
CO N2 2.009 −108.9977 2411 
CO N2 2.005 −108.9933 2415 
True N2 ⋯ 2.014 −108.9891 2730 
MAE ΔZ ± 1 0.097 0.0111 634 
MAE ΔZ ± 1 0.084 0.0108 635 
MAE ΔZ ± 1 0.083 0.0080 632 
Morse optimization 
N2 BF 2.543 −124.2904 1233 
N2 BF 2.369 −124.2033 1440 
CO BF 2.412 −124.1965 1379 
CO BF 2.364 −124.1674 1453 
CO BF 2.354 −124.1704 1471 
True BF ⋯ 2.353 −124.1624 1507 
BF CO 2.096 −112.7580 2744 
BF CO 2.101 −112.7867 2589 
BF CO 2.104 −112.7868 2572 
N2 CO 2.090 −112.7913 2389 
N2 CO 2.084 −112.7858 2408 
True CO ⋯ 2.083 −112.7866 2430 
BF N2 2.133 −108.7587 5082 
BF N2 2.084 −109.0103 3510 
BF N2 2.110 −109.0010 3130 
CO N2 2.005 −108.9794 2910 
CO N2 2.019 −108.9973 2785 
CO N2 2.017 −108.9929 2812 
True N2 ⋯ 2.014 −108.9891 2730 
MAE ΔZ ± 1 0.022 0.0192 166 
MAE ΔZ ± 1 0.011 0.0045 77 
MAE ΔZ ± 1 0.007 0.0032 70 
Ref.Targ.APDFTReq [bohr]E [Ha.]ω0 (cm−1)
Newton–Raphson optimization 
N2 BF 2.290 −124.2586 2676 
N2 BF 2.221 −124.1884 2678 
CO BF 2.285 −124.1862 2402 
CO BF 2.262 −124.1598 2407 
CO BF 2.258 −124.1634 2410 
True BF ⋯ 2.353 −124.1624 1507 
BF CO 1.793 −112.7944 1415 
BF CO 1.846 −112.8141 1418 
BF CO 1.864 −112.8121 1431 
N2 CO 2.080 −112.7909 2740 
N2 CO 2.076 −112.7855 2740 
True CO ⋯ 2.083 −112.7866 2430 
BF N2 0.813 −109.4238 1244 
BF N2 1.375 −109.1547 1274 
BF N2 1.761 −109.0494 1490 
CO N2 1.989 −108.9803 2416 
CO N2 2.009 −108.9977 2411 
CO N2 2.005 −108.9933 2415 
True N2 ⋯ 2.014 −108.9891 2730 
MAE ΔZ ± 1 0.097 0.0111 634 
MAE ΔZ ± 1 0.084 0.0108 635 
MAE ΔZ ± 1 0.083 0.0080 632 
Morse optimization 
N2 BF 2.543 −124.2904 1233 
N2 BF 2.369 −124.2033 1440 
CO BF 2.412 −124.1965 1379 
CO BF 2.364 −124.1674 1453 
CO BF 2.354 −124.1704 1471 
True BF ⋯ 2.353 −124.1624 1507 
BF CO 2.096 −112.7580 2744 
BF CO 2.101 −112.7867 2589 
BF CO 2.104 −112.7868 2572 
N2 CO 2.090 −112.7913 2389 
N2 CO 2.084 −112.7858 2408 
True CO ⋯ 2.083 −112.7866 2430 
BF N2 2.133 −108.7587 5082 
BF N2 2.084 −109.0103 3510 
BF N2 2.110 −109.0010 3130 
CO N2 2.005 −108.9794 2910 
CO N2 2.019 −108.9973 2785 
CO N2 2.017 −108.9929 2812 
True N2 ⋯ 2.014 −108.9891 2730 
MAE ΔZ ± 1 0.022 0.0192 166 
MAE ΔZ ± 1 0.011 0.0045 77 
MAE ΔZ ± 1 0.007 0.0032 70 

Furthermore, in this method, the dependence of Hessian on the geometry is neglected, and the different geometries of the reference and target lead to a huge error in the prediction of harmonic vibrational frequencies. Table I shows that the Morse method performs better than the method of Newton–Raphson. The biggest errors were made in the predictions N2 ↔ CO, where an alchemical change of ΔZ = ±2 is involved. All the other predictions, which have ΔZ = ±1, have an error in the bond length prediction on an average of 0.011 bohr at the third order, which decreases to 0.007 bohr at the fourth order. The error in the energy is also very low, 4.5 and 3.2 mHa for APDFT3 and APDFT4 predictions, respectively. The error in the harmonic frequency prediction is about 70 cm−1; it is not accurate, but it is indicative of the value of the frequency.

Deprotonation and protonation energies are important quantities because they describe the changes in the enthalpy within an acid–base reaction and are needed in the calculation of the equilibrium constants pKa and pKb.86–88 In this section, we demonstrate how APDFT can be used to predict protonation and deprotonation energies accurately. The APDFT3 error for vertical deprotonation energies can be as small as 1.4 kcal/mol,26,27,41,89 and we would like to go beyond the fixed geometry approximation including the geometrical relaxation energy. More specifically, we exemplify our approach for the alchemical navigation of the iso-electronic ten electron series of second row hydrides CH4 → NH3 → H2O → HF (see Fig. 5 for illustration). Moving from one molecule to a neighboring one, within alchemical deprotonation (protonation), a hydrogen nucleus is alchemically annihilated (created), while the nuclear charge of the heavy atom is simultaneously increased (decreased). The initial guess of the protonation site was chosen on the electrostatic potential minimum of the reference molecule; in that position, we placed a basis set for the hydrogen created. We implemented a three-point finite difference stencil (Δλ = 0.1); combining numerical and analytical differentiation, we obtained second order Hessian predictions, third order gradient prediction, and fifth order energy prediction. Combining numerical differentiation from a three-point finite difference scheme (Δλ = 0.1) with analytical differentiation [Eqs. (13), (14), and (25)], we calculated second order predictions for the Hessian, third order prediction for the gradient, and fifth order prediction for the energy. Geometry relaxation was performed using the Morse potential interpolation [Eqs. (31) and (32)].

FIG. 5.

Subsequent alchemical couplings connect the entire iso-electronic series of second period elements’ hydrides (CH4, NH3, BH2, and HF) through successive alchemical deprotonation (to the right) and protonation (to the left).

FIG. 5.

Subsequent alchemical couplings connect the entire iso-electronic series of second period elements’ hydrides (CH4, NH3, BH2, and HF) through successive alchemical deprotonation (to the right) and protonation (to the left).

Close modal

Figure 6 shows the errors in protonations and deprotonations. The average error on energy predictions is 7 mHa for deprotonation and 5 mHa for protonation; this error is already present in the vertical predictions, as such does not come from the geometrical relaxations.

FIG. 6.

Absolute APDFT prediction errors of equilibrium energies (top), bond lengths (middle), and angle widths (bottom) for the alchemical protonations and deprotonations (see Fig. 5).

FIG. 6.

Absolute APDFT prediction errors of equilibrium energies (top), bond lengths (middle), and angle widths (bottom) for the alchemical protonations and deprotonations (see Fig. 5).

Close modal

Alchemical deprotonation can predict relaxed bond lengths accurately; the results show a mean absolute error of 4 mbohrs, and in all cases, the error is lower than 10 mbohrs; for protonations, the predicted bond length of the hydrogens that were already present in the reference molecule is different from the one of the alchemically created hydrogen.

The effect of alchemical perturbation is much higher on the created hydrogens; for this reason, the error in bond length predictions is ten times higher than the error for hydrogens already present in the molecule, 56 and 4 mbohrs, respectively.

Angles are more flexible coordinates than bonds, and therefore, the prediction error of about 1° is quite substantial, and an exception is the prediction NH3 → CH4, where the smaller error might be due to a stiffer energy potential.

APDFT is efficient in predicting properties for multiple targets from a few derivatives when symmetric systems are chosen as the reference. An appealing application is the B–N doping of aromatic hydrocarbons, such as coronenes, fullerenes, or graphene.28–30,42

Here, we show that it is possible to obtain approximate equilibrium energies and relaxed structures for all the 17 B–N doped mutants shown in Fig. 7 through the explicit calculation of just one CPHF derivative for benzene. The CPHF response matrix U can be obtained for all atoms via symmetry operations; in particular, we rotated the U matrix around the symmetry axis of the molecule using the Wigner D-matrix.90 Energy prediction up to the third order can be obtained using Eqs. (6), (13), and (14); the same CPHF derivatives can be reused to calculate the alchemical force in Eq. (25). This is sufficient to give a first order prediction of the geometrical gradient for all 17 target molecules. The targets’ Hessians were approximated at the zeroth order with the Hessian of the reference. As shown in Fig. 2, this approximation has an error of ∼20% for the isoelectronic hydride series, which can still lead to qualitative improvements. In this case, the geometrical relaxation was performed using a Newton–Raphson step and not Morse because it is less sensitive to errors in gradient and Hessian. We measure geometry prediction errors through the root mean square deviation from the true target molecules’ minima (RMSD69). It is important to remark on the energy ordering of the mutants. Every B–N substitution corresponds to a decrease in the energy of about 3 hartree, and therefore, stoichiometry is the dominating dimension. Among constitutional isomers, energy ordering depends on the relative positions of B and N: for monodoped mutants, the least stable is the one with B and N in the meta position, followed by the trans and ortho isomers. This ordering is analogous to the reactivity in electrophilic–nucleophilic aromatic substitutions.91 In fact, after a B–N doping of benzene, the electronic charge shifts from the boron atom to the nitrogen; in this context, nitrogen atoms act as Lewis acids (electrophiles) and borons as Lewis bases (nucleophiles).92,93 We note, however, that these trends change upon removal of the nuclear repulsion terms, which do not affect the electronics. The predicted energy ordering is the same, both for the vertical and relaxed energies; the relaxation calculated only with the first order alchemical force is in fact not able to discriminate alchemical enantiomers94 and does not change the energy order (Fig. 8).

FIG. 7.

A single APDFT3 reference calculation for benzene affords equilibrium energy predictions for all 17 possible iso-electronic charge-neutral B–N doped mutants at the Hartree–Fock level of theory.

FIG. 7.

A single APDFT3 reference calculation for benzene affords equilibrium energy predictions for all 17 possible iso-electronic charge-neutral B–N doped mutants at the Hartree–Fock level of theory.

Close modal
FIG. 8.

Geometry (top) and equilibrium energy (bottom) prediction errors from a single APDFT3 reference calculation for benzene for all 17 possible iso-electronic charge-neutral B–N doped mutants (structures shown in Fig. 7).

FIG. 8.

Geometry (top) and equilibrium energy (bottom) prediction errors from a single APDFT3 reference calculation for benzene for all 17 possible iso-electronic charge-neutral B–N doped mutants (structures shown in Fig. 7).

Close modal

Relaxation reduces the average RMSD by a factor of 60%: from 0.31 to 0.12 bohr. For singly doped mutants, the RMSD is reduced from 0.23 bohr to only 0.07 bohr.

The highest deviation from benzene’s geometry was found for mutants with adjacent N–N atoms and adjacent B–B atoms (mutants 4, 5 7, 15), the lowest for the mutants with alternating B–N (mutants 3, 14, 17). This is a consequence of alchemical symmetry:94 at the first order, the alchemical force on a B–N bond is zero, and consecutively, there will be no stretching.

The mean absolute error in energy predictions is reduced by 70% after geometrical relaxation, passing from 89 to 27 mHa. In particular, for singly B–N doped mutants, it is reduced from 42 mHa to just 2 mHa.

In this work, we have shown that alchemical perturbation can successfully be applied to geometry relaxation and the prediction of equilibrium energies. A key step is the computation of the alchemical force (2EZR), for which we gave an analytical formulation in the restricted Hartree–Fock case; an extension of the formula to other self-consistent field methods, such as UHF and DFT, should be possible. As we reported previously for vertical APDFT predictions,18 the basis set choice is crucial for accuracy. We confirm this conclusion is also for the APDFT based geometry relaxation and equilibrium energies studied in this work. More specifically, using the pcX-2 basis set, gradients and Hessians can be predicted with a fourth order error smaller than 2%. The consecutive geometrical relaxation using the employed Morse potential interpolation can predict equilibrium energies and bond lengths with a respective accuracy of ∼10 mHa and ∼0.01 bohr for small molecules (BF, CO, and N2 and CH4, NH3, H2O, and HF), and ∼27 mHa and ∼0.12 bohr for all the BN doped benzene mutants. The Morse potential was found to be a good descriptor of the bond dissociation energy profile for molecules within the range of bond distances encountered. Nevertheless, for different systems (e.g., solid state materials) or different bond lengths, the choice of another potential (e.g., Buckingham and Lennard-Jones) might be more suitable, but further investigations are beyond the scope of this paper.

In comparison to vertical APDFT predictions, we have confirmed the expectation that the inclusion of gradient and Hessian information results in a considerable increase in accuracy. The computational overhead is modest since first order APDFT predictions of gradients do not require more CPHF calculations than the one required for APDFT3. As for all alchemical methods, APDFT based geometry relaxations become particularly attractive when symmetry reduces number derivatives and simultaneously increases the number of target molecules. In particular, we demonstrated reasonable equilibrium energy and geometry predictions for all 17 BN doped mutant systems based on a single APDFT3 reference calculation for benzene. As an outlook, we believe that our findings and discussion would suggest that APDFT based geometry relaxation can be meaningful for many compounds on regular lattices, e.g., doped fullerenes or graphitic systems.

See the supplementary material for additional data on the accuracy of alchemical force predictions and for a more detailed discussion on the Morse potential interpolation.

We acknowledge support from the European Research Council (ERC-CoG grant QML and H2020 project BIG-MAP). This project has received funding from the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement Nos. 772834 and 957189. All results only reflect the authors’ views and the EU is not responsible for any use that may be made of the information. This work was partly supported by the NCCR MARVEL, funded by the Swiss National Science Foundation. The computational results presented have been achieved using the Vienna Scientific Cluster (VSC). The authors would also like to thank Dr. Max Schwilk for the helpful discussions about the formulation of analytic derivatives and Professor Frank Jensen from Aarhus University for the suggestion of using the pc-X basis set in the context of quantum alchemy.

The authors have no conflicts to disclose.

The data that support the findings of this study are openly available in Alchemical CPHF perturbator at http://doi.org/10.5281/zenodo.5606918.70 

1.
P.
Kirkpatrick
and
C.
Ellis
, “
Chemical space
,”
Nature
432
,
823
(
2004
).
2.
A.
Mullard
, “
The drug-maker’s guide to the galaxy
,”
Nature
549
,
445
447
(
2017
).
3.
O. A.
von Lilienfeld
, “
First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties
,”
Int. J. Quantum Chem.
113
,
1676
1689
(
2013
).
4.
J. G.
Freeze
,
H. R.
Kelly
, and
V. S.
Batista
, “
Search for catalysts by inverse design: Artificial intelligence, mountain climbers, and alchemists
,”
Chem. Rev.
119
,
6595
6612
(
2019
).
5.
O. A.
von Lilienfeld
, “
Quantum machine learning in chemical compound space
,”
Angew. Chem., Int. Ed.
57
,
4164
4169
(
2018
).
6.
M.
Rupp
, “
Machine learning for quantum mechanics in a nutshell
,”
Int. J. Quantum Chem.
115
,
1058
1073
(
2015
).
7.
O. A.
von Lilienfeld
,
K.-R.
Müller
, and
A.
Tkatchenko
, “
Exploring chemical compound space with quantum-based machine learning
,”
Nat. Rev. Chem.
4
,
347
358
(
2020
).
8.
M.
Rupp
,
A.
Tkatchenko
,
K.-R.
Müller
, and
O. A.
von Lilienfeld
, “
Fast and accurate modeling of molecular atomization energies with machine learning
,”
Phys. Rev. Lett.
108
,
058301
(
2012
).
9.
F. A.
Faber
,
L.
Hutchison
,
B.
Huang
,
J.
Gilmer
,
S. S.
Schoenholz
,
G. E.
Dahl
,
O.
Vinyals
,
S.
Kearnes
,
P. F.
Riley
, and
O. A.
von Lilienfeld
, “
Prediction errors of molecular machine learning models lower than hybrid DFT error
,”
J. Chem. Theory Comput.
13
,
5255
5264
(
2017
).
10.
A. S.
Christensen
,
F. A.
Faber
, and
O. A.
von Lilienfeld
, “
Operators in quantum machine learning: Response properties in chemical space
,”
J. Chem. Phys.
150
,
064105
(
2019
).
11.
J.
Weinreich
,
N. J.
Browning
, and
O. A.
von Lilienfeld
, “
Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation
,”
J. Chem. Phys.
154
,
134113
(
2021
).
12.
G.
Pilania
,
C.
Wang
,
X.
Jiang
,
S.
Rajasekaran
, and
R.
Ramprasad
, “
Accelerating materials property predictions using machine learning
,”
Sci. Rep.
3
,
2810
(
2013
).
13.
M. K.
Bisbo
and
B.
Hammer
, “
Global optimization of atomistic structure enhanced by machine learning
,” arXiv:2012.15222 (
2020
).
14.
D.
Lemm
,
G. F.
von Rudorff
, and
O. A.
von Lilienfeld
, “
Machine learning based energy-free structure predictions of molecules, transition states, and solids
,”
Nat. Commun.
12
,
4468
(
2021
).
15.
J. A.
Keith
,
V.
Vassilev-Galindo
,
B.
Cheng
,
S.
Chmiela
,
M.
Gastegger
,
K.-R.
Müller
, and
A.
Tkatchenko
, “
Combining machine learning and computational chemistry for predictive insights into chemical systems
,”
Chem. Rev.
121
,
9816
9872
(
2021
).
16.
G. F.
von Rudorff
and
O. A.
von Lilienfeld
, “
Alchemical perturbation density functional theory
,”
Phys. Rev. Res.
2
,
023220
(
2020
).
17.
G. F.
von Rudorff
, “
Arbitrarily accurate quantum alchemy
,”
J. Chem. Phys.
155
,
224103
(
2021
).
18.
G.
Domenichini
,
G. F.
von Rudorff
, and
O. A.
von Lilienfeld
, “
Effects of perturbation order and basis set on alchemical predictions
,”
J. Chem. Phys.
153
,
144118
(
2020
).
19.
E. B.
Wilson
, “
Four-dimensional electron density function
,”
J. Chem. Phys.
36
,
2232
2233
(
1962
).
20.
P.
Politzer
and
R. G.
Parr
, “
Some new energy formulas for atoms and molecules
,”
J. Chem. Phys.
61
,
4258
4262
(
1974
).
21.
M.
Levy
, “
An energy-density equation for isoelectronic changes in atoms
,”
J. Chem. Phys.
68
,
5298
5299
(
1978
).
22.
O. A.
Von Lilienfeld
and
M. E.
Tuckerman
, “
Molecular grand-canonical ensemble density functional theory and exploration of chemical space
,”
J. Chem. Phys.
125
,
154104
(
2006
).
23.
V.
Marcon
,
O. A.
von Lilienfeld
, and
D.
Andrienko
, “
Tuning electronic eigenvalues of benzene via doping
,”
J. Chem. Phys.
127
,
064305
(
2007
).
24.
O.
Anatole von Lilienfeld
, “
Accurate ab initio energy gradients in chemical compound space
,”
J. Chem. Phys.
131
,
164102
(
2009
).
25.
K. Y. S.
Chang
and
O. A.
von Lilienfeld
, “
AlxGa1−xAs crystals with direct 2 eV band gaps from computational alchemy
,”
Phys. Rev. Mater.
2
,
073802
(
2018
).
26.
G. F.
von Rudorff
and
O. A.
von Lilienfeld
, “
Rapid and accurate molecular deprotonation energies from quantum alchemy
,”
Phys. Chem. Chem. Phys.
22
,
10519
(
2020
).
27.
M.
Muñoz
,
A.
Robles-Navarro
,
P.
Fuentealba
, and
C.
Cárdenas
, “
Predicting deprotonation sites using alchemical derivatives
,”
J. Phys. Chem. A
124
,
3754
3760
(
2020
).
28.
R.
Balawender
,
M. A.
Welearegay
,
M.
Lesiuk
,
F.
De Proft
, and
P.
Geerlings
, “
Exploring chemical space with the alchemical derivatives
,”
J. Chem. Theory Comput.
9
,
5327
5340
(
2013
).
29.
Y. S.
Al-Hamdani
,
A.
Michaelides
, and
O. A.
von Lilienfeld
, “
Exploring dissociative water adsorption on isoelectronically BN doped graphene using alchemical derivatives
,”
J. Chem. Phys.
147
,
164113
(
2017
).
30.
R.
Balawender
,
M.
Lesiuk
,
F.
De Proft
, and
P.
Geerlings
, “
Exploring chemical space with alchemical derivatives: BN-simultaneous substitution patterns in C60
,”
J. Chem. Theory Comput.
14
,
1154
1168
(
2018
).
31.
N.
Marzari
,
S.
de Gironcoli
, and
S.
Baroni
, “
Structure and phase stability of GaxIn1−xP solid solutions from computational alchemy
,”
Phys. Rev. Lett.
72
,
4001
4004
(
1994
).
32.
A. M.
Saitta
,
S.
de Gironcoli
, and
S.
Baroni
, “
Structural and electronic properties of a wide-gap quaternary solid solution: Zn, Mg S, Se
,”
Phys. Rev. Lett.
80
,
4939
(
1998
).
33.
L.
Bellaiche
,
A.
García
, and
D.
Vanderbilt
, “
Low-temperature properties of Pb(Zr1−xTix)O3 solid solutions near the morphotropic phase boundary
,”
Ferroelectrics
266
,
41
56
(
2002
).
34.
D.
Sheppard
,
G.
Henkelman
, and
O. A.
von Lilienfeld
, “
Alchemical derivatives of reaction energetics
,”
J. Chem. Phys.
133
,
084104
(
2010
).
35.
A.
Solovyeva
and
O. A.
von Lilienfeld
, “
Alchemical screening of ionic crystals
,”
Phys. Chem. Chem. Phys.
18
,
31078
31091
(
2016
).
36.
C. D.
Griego
,
L.
Zhao
,
K.
Saravanan
, and
J. A.
Keith
, “
Machine learning corrected alchemical perturbation density functional theory for catalysis applications
,”
AIChE J.
66
,
e17041
(
2020
).
37.
K.
Saravanan
,
J. R.
Kitchin
,
O. A.
Von Lilienfeld
, and
J. A.
Keith
, “
Alchemical predictions for computational catalysis: Potential and limitations
,”
J. Phys. Chem. Lett.
8
,
5002
5007
(
2017
).
38.
C. D.
Griego
,
K.
Saravanan
, and
J. A.
Keith
, “
Benchmarking computational alchemy for carbide, nitride, and oxide catalysts
,”
Adv. Theory Simul.
2
,
1800142
(
2019
).
39.
C. D.
Griego
,
J. R.
Kitchin
, and
J. A.
Keith
, “
Acceleration of catalyst discovery with easy, fast, and reproducible computational alchemy
,”
Int. J. Quantum Chem.
121
,
e26380
(
2021
).
40.
C. D.
Griego
,
A. M.
Maldonado
,
L.
Zhao
,
B.
Zulueta
,
B. M.
Gentry
,
E.
Lipsman
,
T. H.
Choi
, and
J. A.
Keith
, “
Computationally guided searches for efficient catalysts through chemical/materials space: Progress and outlook
,”
J. Phys. Chem. C
125
,
6495
6507
(
2021
).
41.
K. Y. S.
Chang
,
S.
Fias
,
R.
Ramakrishnan
, and
O. A.
von Lilienfeld
, “
Fast and accurate predictions of covalent bonds in chemical space
,”
J. Chem. Phys.
144
,
174110
(
2016
).
42.
S.
Fias
,
K. Y. S.
Chang
, and
O. A.
von Lilienfeld
, “
Alchemical normal modes unify chemical space
,”
J. Phys. Chem. Lett.
10
,
30
39
(
2019
).
43.
M. H.
Cohen
,
M. V.
Ganduglia-Pirovano
, and
J.
Kudrnovský
, “
Electronic and nuclear chemical reactivity
,”
J. Chem. Phys.
101
,
8988
8997
(
1994
).
44.
M. H.
Cohen
,
M. V.
Ganduglia-Pirovano
, and
J.
Kudrnovský
, “
Reactivity kernels, the normal modes of chemical reactivity, and the hardness and softness spectra
,”
J. Chem. Phys.
103
,
3543
3551
(
1995
).
45.
F.
De Proft
,
S.
Liu
, and
P.
Geerlings
, “
Calculation of the nuclear Fukui function and new relations for nuclear softness and hardness kernels
,”
J. Chem. Phys.
108
,
7549
7554
(
1998
).
46.
B. G.
Baekelandt
, “
The nuclear Fukui function and Berlin’s binding function in density functional theory
,”
J. Chem. Phys.
105
,
4664
4667
(
1996
).
47.
R.
Balawender
and
P.
Geerlings
, “
Nuclear Fukui function from coupled perturbed Hartree–Fock equations
,”
J. Chem. Phys.
114
,
682
691
(
2001
).
48.
R.
Laplaza
,
C.
Cárdenas
,
P.
Chaquin
,
J.
Contreras-García
, and
P. W.
Ayers
, “
Orbital energies and nuclear forces in DFT: Interpretation and validation
,”
J. Comput. Chem.
42
,
334
343
(
2021
).
49.
M.
Muñoz
and
C.
Cárdenas
, “
How predictive could alchemical derivatives be?
,”
Phys. Chem. Chem. Phys.
19
,
16003
16012
(
2017
).
50.
R.
Balawender
,
M.
Lesiuk
,
F.
De Proft
,
C.
Van Alsenoy
, and
P.
Geerlings
, “
Exploring chemical space with alchemical derivatives: Alchemical transformations of H through Ar and their ions as a proof of concept
,”
Phys. Chem. Chem. Phys.
21
,
23865
23879
(
2019
).
51.
T.
Gómez
,
P.
Fuentealba
,
A.
Robles-Navarro
, and
C.
Cárdenas
, “
Links among the Fukui potential, the alchemical hardness and the local hardness of an atom in a molecule
,”
J. Comput. Chem.
42
,
1681
1688
(
2021
).
52.
C.
Cárdenas
,
F.
Heidar-Zadeh
, and
P. W.
Ayers
, “
Benchmark values of chemical potential and chemical hardness for atoms and atomic ions (including unstable anions) from the energies of isoelectronic series
,”
Phys. Chem. Chem. Phys.
18
,
25721
25734
(
2016
).
53.
M.
Born
and
R.
Oppenheimer
, “
Zur quantentheorie der molekeln
,”
Ann. Phys.
389
,
457
484
(
1927
).
54.
R. P.
Feynman
, “
Forces in molecules
,”
Phys. Rev.
56
,
340
343
(
1939
).
55.
M.
Lesiuk
,
R.
Balawender
, and
J.
Zachara
, “
Higher order alchemical derivatives from coupled perturbed self-consistent field theory
,”
J. Chem. Phys.
136
,
034104
(
2012
).
56.
R.
Cammi
,
M.
Cossi
, and
J.
Tomasi
, “
Analytical derivatives for molecular solutes. III. Hartree–Fock static polarizability and hyperpolarizabilities in the polarizable continuum model
,”
J. Chem. Phys.
104
,
4611
4620
(
1996
).
57.
T. C.
Caves
and
M.
Karplus
, “
Perturbed Hartree–Fock theory. I. Diagrammatic double-perturbation analysis
,”
J. Chem. Phys.
50
,
3649
3661
(
1969
).
58.
A.
Dalgarno
, “
Atomic polarizabilities and shielding factors
,”
Adv. Phys.
11
,
281
315
(
1962
).
59.
R.
Balawender
,
A.
Holas
,
F.
De Proft
,
C.
Van Alsenoy
, and
P.
Geerlings
, “
Alchemical derivatives of atoms: A walk through the periodic table
,” in
Many-Body Approaches at Different Scales: A Tribute to Norman H. March on the Occasion of his 90th Birthday
, edited by
G.
Angilella
and
C.
Amovilli
(
Springer International Publishing
,
Cham
,
2018
), pp.
227
251
.
60.
J. A.
Pople
,
R.
Krishnan
,
H. B.
Schlegel
, and
J. S.
Binkley
, “
Derivative studies in Hartree-Fock and Møller-Plesset theories
,”
Int. J. Quantum Chem.
16
,
225
241
(
1979
).
61.
E. V. R.
de Castro
and
F. E.
Jorge
, “
Accurate universal Gaussian basis set for all atoms of the periodic table
,”
J. Chem. Phys.
108
,
5225
5229
(
1998
).
62.
Q.
Sun
,
T. C.
Berkelbach
,
N. S.
Blunt
,
G. H.
Booth
,
S.
Guo
,
Z.
Li
,
J.
Liu
,
J. D.
McClain
,
E. R.
Sayfutyarova
,
S.
Sharma
,
S.
Wouters
, and
G. K.
Chan
, “
PySCF: The python-based simulations of chemistry framework
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1340
(
2017
).
63.
M. A.
Ambroise
and
F.
Jensen
, “
Probing basis set requirements for calculating core ionization and core excitation spectroscopy by the Δ self-consistent-field approach
,”
J. Chem. Theory Comput.
15
,
325
337
(
2019
).
64.
B. P.
Pritchard
,
D.
Altarawy
,
B.
Didier
,
T. D.
Gibson
, and
T. L.
Windus
, “
New basis set exchange: An open, up-to-date resource for the molecular sciences community
,”
J. Chem. Inf. Model.
59
,
4814
4820
(
2019
).
65.
D.
Feller
, “
The role of databases in support of computational chemistry calculations
,”
J. Comput. Chem.
17
,
1571
1586
(
1996
).
66.
K. L.
Schuchardt
,
B. T.
Didier
,
T.
Elsethagen
,
L.
Sun
,
V.
Gurumoorthi
,
J.
Chase
,
J.
Li
, and
T. L.
Windus
, “
Basis set exchange: A community database for computational sciences
,”
J. Chem. Inf. Model.
47
,
1045
1052
(
2007
).
67.
J.
Hermann
, “
PyBerny is an optimizer of molecular geometries with respect to the total energy, using nuclear gradient information
,” accessed November 2020, Github project: https://github.com/jhrmnn/pyberny, Zenodo database: https://doi.org/10.5281/zenodo.3695038.
68.
J. C.
Kromann
, “
Calculate root-mean-square deviation (RMSD) of two molecules using rotation
,” accessed February 2021, Github project: http://github.com/charnley/rmsd.
69.
W.
Kabsch
, “
A solution for the best rotation to relate two sets of vectors
,”
Acta Crystallogr., Sect. A
32
,
922
923
(
1976
).
70.
Rdkit: Open-source cheminformatics software, 2013, www.rdkit.org.
71.
G.
Landrum
,
RDkit: A Software Suite for Cheminformatics, Computational Chemistry, and Predictive Modeling
(
Academic Press Cambridge
,
2013
); available at https://pubs-acs-org.uaccess.univie.ac.at/doi/full/10.1021/acs.jcim.1c01537.
72.
Y.
Kim
and
W. Y.
Kim
, “
Universal structure conversion method for organic molecules: From atomic connectivity to three-dimensional geometry
,”
Bull. Korean Chem. Soc.
36
,
1769
1777
(
2015
).
73.
G.
Domenichini
(
2021
). “
Alchemical CPHF perturbator
,” Zenodo. https://zenodo.org/10.5281/zenodo.5606918
74.
A.
Schäfer
,
C.
Huber
, and
R.
Ahlrichs
, “
Fully optimized contracted Gaussian basis sets of triple zeta valence quality for atoms Li to Kr
,”
J. Chem. Phys.
100
,
5829
5835
(
1994
).
75.
F.
Weigend
,
F.
Furche
, and
R.
Ahlrichs
, “
Gaussian basis sets of quadruple zeta valence quality for atoms H–Kr
,”
J. Chem. Phys.
119
,
12753
12762
(
2003
).
76.
T. H.
Dunning
, Jr.
, “
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
90
,
1007
1023
(
1989
).
77.
D. E.
Woon
and
T. H.
Dunning
, “
Gaussian basis sets for use in correlated molecular calculations. V. Core-valence basis sets for boron through neon
,”
J. Chem. Phys.
103
,
4572
4585
(
1995
).
78.
F.
Jensen
, “
Polarization consistent basis sets: Principles
,”
J. Chem. Phys.
115
,
9113
9125
(
2001
).
79.
F.
Jensen
, “
Polarization consistent basis sets. II. Estimating the Kohn-Sham basis set limit
,”
J. Chem. Phys.
116
,
7372
7379
(
2002
).
80.
F.
Jensen
, “
Polarization consistent basis sets. 4: The elements He, Li, Be, B, Ne, Na, Mg, Al, and Ar
,”
J. Phys. Chem. A
111
,
11198
11204
(
2007
).
81.
B. J.
Ransil
, “
Studies in molecular structure. IV. Potential curve for the interaction of two helium atoms in single-configuration LCAO MO SCF approximation
,”
J. Chem. Phys.
34
,
2109
2118
(
1961
).
82.
N. R.
Kestner
, “
He–He interaction in the SCF–MO approximation
,”
J. Chem. Phys.
48
,
252
257
(
1968
).
83.
J. M.
Leclercq
,
M.
Allavena
, and
Y.
Bouteiller
, “
On the basis set superposition error in potential surface investigations. I. Hydrogen-bonded complexes with standard basis set functions
,”
J. Chem. Phys.
78
,
4606
4611
(
1983
).
84.
P. M.
Morse
, “
Diatomic molecules according to the wave mechanics. II. Vibrational levels
,”
Phys. Rev.
34
,
57
64
(
1929
).
85.
A. K.
Rappe
,
C. J.
Casewit
,
K. S.
Colwell
,
W. A.
Goddard
, and
W. M.
Skiff
, “
UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations
,”
J. Am. Chem. Soc.
114
,
10024
10035
(
1992
).
86.
P.
Atkins
,
J.
De Paula
and
J.
Keeler
,
Atkins' Physical Chemistry
(
Oxford University Press, Oxford
,
2006
).
87.
F. M.
Carvalho
,
Y. A.
de Oliveira Só
,
A. S. K.
Wernik
,
M. d. A.
Silva
, and
R.
Gargano
, “
Accurate acid dissociation constant (pKa) calculation for the sulfachloropyridazine and similar molecules
,”
J. Mol. Model.
27
,
233
(
2021
).
88.
R.
Casasnovas
,
J.
Frau
,
J.
Ortega-Castro
,
A.
Salvà
,
J.
Donoso
, and
F.
Muñoz
, “
Absolute and relative pKa calculations of mono and diprotic pyridines by quantum methods
,”
J. Mol. Struct.: THEOCHEM
912
,
5
12
(
2009
), part of the Special Issue: The 6th Congress on Electronic Structure: Principles and Applications (ESPA 2008).
89.
R.
Flores-Moreno
,
S. A.
Cortes-Llamas
,
K.
Pineda-Urbina
,
V. M.
Medel
, and
G. K.
Jayaprakash
, “
Analytic alchemical derivatives for the analysis of differential acidity assisted by the h function
,”
J. Phys. Chem. A
125
,
10463
(
2021
).
90.
E. P.
Wigner
,
Gruppentheorie und ihre Anwendung auf die Quantenmechanik der Atomspektren
(
Springer
,
1931
).
91.
J. F.
Bunnett
and
R. E.
Zahler
, “
Aromatic nucleophilic substitution reactions
,”
Chem. Rev.
49
,
273
412
(
1951
).
92.
A.
Robles
,
M.
Franco-Pérez
,
J. L.
Gázquez
,
C.
Cárdenas
, and
P.
Fuentealba
, “
Local electrophilicity
,”
J. Mol. Model.
24
,
245
(
2018
).
93.
P. W.
Ayers
,
J. S. M.
Anderson
, and
L. J.
Bartolotti
, “
Perturbative perspectives on the chemical reaction prediction problem
,”
Int. J. Quantum Chem.
101
,
520
534
(
2005
).
94.
G. F.
von Rudorff
and
O. A.
von Lilienfeld
, “
Simplifying inverse materials design problems for fixed lattices with alchemical chirality
,”
Sci. Adv.
7
,
eabf1173
(
2021
).

Supplementary Material