Extending the definition of atomic basis sets to atoms with fractional nuclear charge

Alchemical transformations showed that perturbation theory can be applied also to changes in the atomic nuclear charges of a molecule. The alchemical path that connects two different chemical species involves the conceptualization of a non-physical system in which atom possess a non-integer nuclear charge. A correct quantum mechanical treatment of these systems is limited by the fact that finite size atomic basis sets do not define exponents and contraction coefficients for fractional charge atoms. This paper is proposes a solution to this problem, and shows that a smooth interpolation of the atomic orbital coefficients and exponents across the periodic table is a convenient way to produce accurate alchemical predictions, even using small size basis sets.


I. INTRODUCTION
A molecular compound can be defined by its composition, its molecular geometry, and its molecular charge; in a computational simulation, these properties are fully described by: nuclear positions R, nuclear charges Z, and number of electrons N e .
Considering only compounds in their electronic ground state, and assuming valid the Born-Oppenheimer adiabatic approximation 1 , R, Z, and N e are the variables which define a continuous P.E.S. (Potential Energy Surface): E(R, Z, N e ).5][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32] Perturbation theory can be applied to alchemical transmutations, but it requires the abstraction of calculating derivatives along a non-physical path.The concept of nuclear charge has to be extended to non-integer numbers: if the basis set is kept fixed, a molecule with fractional nuclear charge can be modelled by scaling the nuclear-electron attraction operator by a real number.
Contracted Gaussian type orbitals (CGTO) 33,34 are standard atomic basis for quantum mechanical calculations on molecular systems, their contraction coefficients and exponents are optimized and tabulated for every chemical element.
Previous works 35,36 showed that the alchemical perturbation (AP) series calculated using the atomic orbital (AO) basis set of the reference, converges to the energy of the target molecule with the basis set of the reference (E T [R] ).The difference between the target's energy calculated with reference basis set, and the target's energy with its own basis set(E T [T ] ), was defined 36 as "alchemical basis set error" . Using small basis sets, ∆E BS can be as big as 1.
A way to reduce this error is to use of large augmented basis sets [43][44][45] (e.g. the Dunning's augcc-pVQZ, aug-cc-pV5Z ).Also uncontracted basis set are more flexible than their respective contracted versions (e.g. for uncontracted cc-pVDZ 46 uncontracted Cartesian Def2TZVP 45 ), therefore more suitable to applications in quantum alchemy.According to a previous investigation 47 we found that the smallest alchemical basis set errors are produced using polarized core and valence basis functions, such as Dunning's cc-pCVnZ 48 , or Jensen's pC-n [49][50][51] basis sets, similar conclusions were obtained also by Geerlings et al. 13 .Particular good results can be achieved using the Jensen's pcX-n 52 basis sets.Some basis set do not differ from reference to target, this is the case of the universal Gaussian basis set 53 , as well as the plane waves basis sets. 54,55e most elegant way to solve the basis set issue, suggested in a 2012 paper from M.Lesiuk, R.Balawender and J.Zachara 56 , involve a smooth transformation of the atomic orbitals as the nuclear charges of the reference approaches the nuclear charges of the target molecules.In this paper I show that, as long as target and reference molecules share the same number of primitive and contracted Gaussian basis function, it is straightforward to interpolate through Splines basis set exponents and coefficients, extending the definition of basis sets to atoms with a non integer nuclear charge.The extended basis set that change consistently with the nuclear charge will be called alchemically consistent basis sets (ACBS), a term which is not referred to a particular basis set, rather to the interpolating procedure on which can be derived from an existing basis set.In section FIG. 1.For the simple alchemical transmutation CO→BF (HF/6-31G), using reference basis set, the AP series converges E BF [CO] , while using AC-6-31G basis set the series approaches E BF [BF] II are explained the interpolation procedure through alchemical consistent basis sets are defined.
For the Restricted Hartree Fock (RHF) level of theory is proposed a formula to calculate the first alchemical derivative including the basis set derivatives.In section III are given some example on how including basis set derivatives in the alchemical expansion is crucial to produce accurate results while using a small basis set.Numerical evidence of this are deduced from the two test cases of diatomic molecules (exemplified in Figure 1) and the boron-nitrogen (BN) doping of benzene.
The results in this paper were obtained from a locally modified version of the PySCF program 57,58 , accessible through the GitHub repository "Supplementary code for Quantum Alchemy" 59 .In this work were used Python 60,61 3.8 scripts, using Numpy 62 , Scipy 63 and Matplotlib 64 libraries.

Interpolation of orbitals exponent and coefficients
The radial part of a CGTO is: N is a normalization constant chosen such that the atomic orbitals are normalized, this constant is calculated automatically by PySCF.
The contraction coefficients c i and the Gaussian exponents ζ i , are tabulated in the basis set for different elements.In many basis sets, elements belonging to the same period share the same number of atomic orbitals and Gaussian primitives.This allows a smooth transformation of orbital coefficients and exponents from one element to another.Coefficients c i (Z), and exponents ζ i (Z) have to be defined as continuous and differentiable functions of the atomic nuclear charge Z, and for integer nuclear charge they should be equal to the tabulated value.A simple choice that fulfills these requirements is to define c i (Z), ζ i (Z), is to use splines interpolations.In this paper will be used a fifth order spline with the 'not-a-knot' boundary conditions.
In Figure 2 are plotted, for Pople's 6-31G basis set, the tabulated a, with integer Z, and interpolated a with fractional Z.A n th order spline is a polynomial function that is continuous up to its n − 1 th derivative.Discontinuity of higher order derivatives affects the alchemical perturbations.As an example, Figure 3 shows that if AO coefficients and exponents are transformed with a cubic spline, the alchemical series diverges for a perturbation order greater than 4, while if the AO coefficients and exponents transformed with the fifth order spline diverges for an alchemical perturbation order greater than 6.
As a rule of thumb, the AO interpolation spline should have an order which is higher than the alchemical perturbation.

First order analytical alchemical derivative
The first order alchemical derivative of the RHF energy can be derived from the gradient expression [65][66][67] .
W is the energy weighted density matrix: If the alchemical derivatives are calculated using the reference basis set, the derivatives of the AO integrals w.r.t. the nuclear charge are zero ∂ ∂ Z = 0, ∂ ∂ Z (µλ ||νσ ) = 0 , and Equation 2 is reduced to the Hellmann-Feynman derivative: Higher order alchemical derivatives can be evaluated analytically from the solution of the Coupled Perturbed Hartree Fock (CPHF), or the Couple Perturbed Kohn Sham (CPKS) equations [68][69][70] .
These analytical derivatives have been implemented and used in alchemical perturbations 35,47,[71][72][73] , but only in the case where basis sets do not depend on nuclear charge (reference basis set are kept fixed), and will be used in the following section.An implementation of analytical CPHF second and third derivatives which includes the basis set derivative (ACBS) is possible, but not yet implemented.Second and third alchemical derivatives, using ACBS, were calculated from numerical differentiation (through a three point central finite difference scheme with spacing h = 0.05e) of the first alchemical derivative (Equation 2).

III. NUMERICAL EXAMPLES
The basis sets 3-21G, 6-31G, cc-pVDZ have a constant number of CGTO and Gaussian primitives for all elements of the second period, this allows to define ACBS which connect all atoms from nuclear charge 3 (Lithium), up to nuclear charge 10 (Neon).The RHF energy alchemical prediction calculated using the ACBS will be compared to the AP calculated on the basis set of the reference molecule.
In table III are listed the mean absolute errors (MAE) of the third order alchemical prediction (AP3) of the 14 electrons diatomic molecules BF,CO,N 2 .From the reference molecule are reached all possible target with a ∆Z = ±1 perturbation: The error made using the ACBS, is on the milli Hartree scale, and it comes from the truncation of the alchemical series, while the MAE from the reference basis set AP it is constantly off by the huge amount of ≈ 1.2 Hartree, mainly due to ∆E BS .
FIG. 4. Alchemical B-N doping of benzene, the plot shows the true versus the second order predictions for AC-6-31G basis set, against the predictions made with the 6-31G basis set of benzene.
Another common application of alchemical perturbations is the BN doping of carbon aromatic systems, previous papers 46,71,72,[74][75][76] showed how quantum alchemy can predict a combinatorially large number of doped mutants from the explicit calculation of one single reference molecule.In figure 4 and in table II are presented the results for the second order alchemical prediction (AP2) of all 17 B-N doped mutants of benzene, calculated at the RHF level of theory, using Pople's 6-31G basis sets, the atomic coordinates are the one of the benzene's energy minimum.Figure 4 shows that the alchemical predictions with the ACBS approach the true energy of the target, while the predictions made from the reference basis set derivatives approach the energy of the mutants with the basis set of benzene (E T [R] ).The mean absolute errors for the second order predictions, listed in Table II show that the error for the AC-BS can be as little as 4 mHa for a single B-N doping, while grows up to 27 mHa for double substitutions, and 74 mHa for triple substitutions, even if this is quite a large error on total energy, however constitutes less than 1% of the total perturbation energy, and is expected that adding more terms in the series can eventually reduce this error to the desired accuracy.

IV. CONCLUSIONS
CGTO basis sets are defined only for real elements with integer nuclear charge.In this paper I present a way to extend the basis set definition, to non-physical atoms, with non-integer nuclear charge.This is possible as long as exist a series of contiguous elements which share the same basis set contraction scheme (they possess the same number of contracted Gaussians and the same number of Gaussian primitives).
Because of the high ∆E BS some basis sets struggle to give accurate alchemical predictions of the total energy if the derivatives are calculated on the reference basis set.However if the derivatives include the alchemical consistent transformation of the atomic orbitals, as is the case of the first alchemical derivative in Equation 2, than the alchemical perturbation series converge to the true target energy.I showed that in section III that for alchemical perturbation with ∆Z = ±1, such as

FIG. 2 .
FIG. 2. Radial density function of Pople's 6-31G 1s and 2s atomic orbitals for the elements of the second period.The solid lines represent the basis function of the elements, which have an integer nuclear charge Z, the dashed lines represent the basis functions of some non-physical atoms with fractional nuclear charge Z + 1 3 and Z + 2

TABLE I .
MAE of the third order alchemical perturbation of the 14 electrons diatomic molecules (BF,CO,N 2 ), all target within a ∆Z = ±1 perturbation are reached, and the interatomic distance is kept fixed at a length of 2.05 Bohr

TABLE II .
MAE for the alchemical energy predictions of the BN doped benzene's mutants.

TABLE III :
Third order alchemical perturbation (AP) energies of diatomic molecules using basis set of the reference, or the alchemical consistent basis set (ACBS).