Translational eigenstates of He@C 60 from four-dimensional ab initio Potential Energy Surfaces interpolated using Gaussian Process Regression

,


I. INTRODUCTION
Endofullerenes are a class of systems where atom(s) or small molecule(s) are trapped inside fullerene cages.While the development of the technique known as "molecular surgery" 1,2 has allowed for synthesis and characterisation of these species experimentally they are also of interest from a theoretical perspective 3 .The encapsulating cage provides a confining potential experienced by the endohedral species, which in turn quantises its translational motion.
The simplest examples of these systems that do not have any other degrees of freedom, are monoatomic noble gas endofullerenes.Among these, He@C 60 has been the subject of recent experimental and theoretical investigation. 4,5Bacanu et al. synthesised both 3 He@C 60 and 4 He@C 60 , and characterised them using using both Terahertz (THz) spectroscopy and Inelastic Neutron Scattering (INS).From these spectra, they simplified the model of the potential energy surface (PES) by considering it to be spherically symmetric, and neglecting the influence of the endohedral He atom on the C 60 cage radius.This reduces the complexity of the PES to a single dimension, specifically the distance of the He atom from the centre of the cage. 4,5his work extends upon this treatment by incorporating the angular dependence and allows the cage radius to vary.A consequence of this is that the PES becomes four-dimensional: three translational degrees of freedom of the encapsulated He atom, and the C 60 isotropic cage breathing mode.
The predominantly dispersive interaction between the He atom and C 60 cage 3 , combined with the high dimensionality of this PES imposes substantial demands on the electronic structure routines.It specifically requires the use of methods that are both very accurate in accounting for dispersion ef-fects, and computationally efficient.
Recent years have seen major developments in approaches to the interpolation problem based on machine learning frameworks.These allow for the prediction of PESs without the necessity of strong a priori knowledge of all its features.For example, Gaussian Processes (GPs) can span accurate surfaces for very different systems, despite using a very common, generic construction. 16,17This is possible due to the fact that GPs adapt their underlying characteristics to those of the true surface.The amount of data required for accurate interpolations depends on the dimensionality of the PES, the hypervolume needed to cover the relevant region of the surface and the roughness of the surface.Smooth and regular surfaces, which PESs are, can be accurately described by quite small, sparse datasets. 18t is no surprise that GPs have been extensively used in computational chemistry to generate cheaper PES evaluations for both small systems, [19][20][21] larger scale chemical simulations, 22 excited states 23 and reaction dynamics. 24The biggest challenge in creating accurate descriptions of these surfaces is not to explicitly obtain the data but to make it efficiently predict new data.Instead, one is usually hit with the curse of dimensionality due to the exponential increase in the hypervolume with increasing input dimensions.][27] Once armed with the PES, the energies of the translational states can be found by diagonalising the nuclear Hamiltonian.While this gives the full "absolute" energy spectrum, only the differences between energy levels are of physical importance, as these are the transitions observed spectroscopically.The comparison of these energy gaps with the previously reported one-dimensional and spectroscopic data 4,5 distinguishes the quality of the PES, and by consequence the quality of its underlying ab initio method.
With a spherically symmetric potential, using a basis set of three-dimensional harmonic oscillator functions generates energies which are (2l + 1)-fold degenerate.The angular dependence in the potential allows coupling of states with differing angular momentum quantum numbers and this effect is observed in the lifting of degeneracy of these states as expected under I h symmetry.The importance of the angular dependence in the potential can also be ascertained by analysing the wavefunctions, and their deviations from the purely spherically symmetric states.The influence of the cage breathing mode can be determined analogously.
The rest of this work is structured as follows: Section II contains the theory and methodology used to calculate the translational eigenstates of 3 He@C 60 .This is split into Section II A which provides an explanation of the rationale of the choices of electronic structure methods; Section II B details the construction of the four-dimensional PES using GPs expanded in spherical polar coordinates; and Section II C outlines the calculation of the translational eigenstates of the system.The results of this methodology and a discussion are presented in Section III.Concluding remarks and prospective avenues for future research are outlined in Section IV.

II. THEORY
A. Electronic Structure Calculations

Choosing an appropriate method
In He@C 60 the choice of electronic structure method is ultimately a balancing act between accurately depicting the effects of dispersion, and computational efficiency.][10][12][13][14][15] However, the choice of these parameters may not be unique, and their optimal values are strongly dependent on the environmental conditions. 8,15Recent work also suggests these empirical models yield potentials with considerable variance and can often lead to poor agreement with experimental observations.This is not a surprise due to the delocalised electronic structure of the C 60 cage. 4,5,14ensity functional theory (DFT) emerges as a natural alternative to this procedure, due to its renowned success across computational chemistry, physics and materials science.This is primarily owing to its favourable cost-to-performance ratio, which is why it is commonly referred to as the workhorse of quantum chemistry. 28Nonetheless, standard density functional approximations (DFAs) have well-documented limitations in accurately capturing dispersion forces. 29To mitigate this, various techniques including Grimme's dispersion corrections [30][31][32][33][34][35][36] have been developed and have gained enor-mous popularity.However, given the extensive array of DFAs, 37 some of them yielding divergent results in prior endofullerene studies, 11 the reliability of these methods in accurately modelling these systems remains questionable.
On the other hand, wavefunction (WF) methods are reputed for their ability to deliver highly accurate results, albeit with an unforgiving relationship between accuracy and computational expense.The significant computational demand arises from both the unfavourable scaling with system size, and also the inherent slow convergence with respect to basis set size. 38he latter effect is due to the necessity that WF methods explicitly describe the electronic cusp 39 which requires the use of many basis functions with a large angular momentum quantum number. 40DFAs on the other hand, implicitly incorporate this condition through exchange-correlation functionals leading to faster basis set convergence.
Nevertheless, significant advancements have been made in the field of WF electronic structure methods over recent decades.A prime example of this is second order Møller-Plesset perturbation theory (MP2) which stands out for its efficiency.  Notay, the only current higher dimensional PES of an endofullerene, HF@C 60 , was constructed using density-fitting local MP2 11 developed by Werner et al. 47 In our work, we will utilise an advanced MP2 variant known as ω-RI-CDD-MP2, a method recently introduced by the Ochsenfeld group. 60The specific thresholds applied are outlined within the supplementary information in Section SI 1.
3][64][65] These inaccuracies tend to systematically increase with system size, which suggests a strong concern in applying these methods to large dispersion-bound complexes. 66The root of MP2's shortcomings is the inadequate treatment of electrodynamic screening of the Coloumb interactions among electrons induced by particle-hole pairs resulting in an overestimation of the effective interaction strength. 66oncerned by these findings, we decided against relying exclusively on the MP2 method.Research by Furche and colleagues has demonstrated that while the scaled opposite spin (SOS) and spin component scaled (SCS) MP2 methods have similar issues, their errors are generally less severe. 66Consequently, we will also include results from SOS-MP2 and SCS-MP2 in our discussion.
An additional method that does account for correlation by including electrodynamic screening through induced particlehole pairs or density fluctuations, thereby reducing the effective electron interaction, is the random phase approximation (RPA). 66RPA can be interpreted as a resummation of all possible ring diagrams, and due to the neglect of exchange contributions between particle-hole pairs it can be evaluated even more efficiently than the MP2 method.The RPA has been recognised for counteracting the erratic behaviour of MP2, proving to be reliable across various system sizes. 66With its good description of long-range correlation effects such as dispersion, RPA therefore serves as an excellent tool for validating our results.In the present work, we will hence utilise the linear scaling ω-CDGD-RPA method put forward by the Ochsenfeld group which has been shown to provide excellent agreement with conventional RPA. 67While self-consistent formulations of RPA have been introduced in the literature, [68][69][70][71] RPA is predominantly applied in a post-Kohn-Sham manner, 72 utilising orbitals and orbital energies obtained from a preceding DFA calculation, which we will abbreviate as RPA@DFA in the following.Typically, these reference calculations employ a generalized gradient approximation (GGA), with the one proposed by Perdew, Burke, and Ernzerhof (PBE) 73,74 being particularly popular.However, pure density functionals such as GGAs are known for their self-interaction error 75,76 which, in turn, can lead to erroneous densities, Kohn-Sham orbitals, and orbital energies, impacting the subsequent RPA calculation.Aware of this issue, we also present results obtained with the corrected Hartree-Fock RPA (C(HF)-RPA) approach, designed to address these errors. 77

Ensuring Basis-Set Convergence
WF methods generally exhibit slow convergence with respect to the basis-set size.The RPA, sitting on the border between WF theory and DFT, shows similar limitations; in fact, research suggests that it converges even slower than MP2. 40,78Given the pronounced basis set incompleteness error in dispersion-bound systems, 78 extrapolation to the complete basis set (CBS) limit is strongly advised for both MP2 and RPA.

Gaussian Process Regression
A Gaussian Process (GP) is a machine learning regression method, and is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. 91An essential part of a GP model is its covariance function, or kernel, which is equivalent to a similarity measure of the data scattered throughout an input space.
Kernels may take a wide variety of mathematical forms, as they only need to adhere to a few simple rules. 91More complex kernels can be constructed by multiplying and/or adding other kernels and the rule yields a valid "composite" kernel function.
A GP is trained on a set of training data, with inputs X t with known outputs y(X t ) and predicts and multivariate Gaussian distribution defined by the mean µ(X p ) and covariance where K i j is shorthand for the kernel K(X i , X j ) and the subscripts t and p are for the training and prediction points respectively.A common metric used by the machine learning community to gauge the efficacy of the GP is to consider the 95% confidence interval of the prediction, which is given by µ(X p ) ± 1.96Σ(X p ). GPs are trained by finding an optimal set of hyperparameters, so-called as to differentiate them between parameters in the kernel which are left unchanged during the optimisation process.These hyperparameters are what introduce the flexibility into the model, and can control length scales, amplitude and noise within the data.Using a Bayesian approach, this hyperparamter set is found by maximising the log-marginal likelihood (LML) given by 91 The three contributing terms to this quantity are to be understood as a fit, a regularisation, and a normalisation factor.

Building an appropriate interpolation
The high I h symmetry of the He@C 60 system, motivates the use of a spherical polar coordinate system (r He , θ , φ , R C 60 ), with (θ , φ ) representing the polar and azimuthal angles respectively.The C 60 cage is oriented as given in Fig 1, with the z-axis going through the centre of a pentagon, a C 5 rotation axis, and the y-axis aligned through the centre of a hexagonpentagon C -C bond.
Analysis of the spectroscopic data 4,5 indicates that the PES is dominated by the distance of the encapsulated He atom This form of PES necessitates the learning of two separate but linked GPs.Initially, the two-dimensional radial surface is trained on the majority of the data.The four-dimensional corrections are then trained the remaining input data, but with y 4D = y 2D − µ 2D .That is to say, its outputs are the difference between the learned radial surface and electronic structure data.Finally, before training the four-dimensional surface, the input coordinates are symmetrised.This is accomplished by applying the I h symmetry operations C n 5 for n ∈ [0, 1, 2, 3, 4] using the axis aligned along the z direction, and also the centre of inversion i, producing nine extra equivalent geometries with identical energies.
The quality of the PES is determined by the choice of kernel and distance metric used in the GP.For the two-dimensional radial surface, an anisotropic Matérn covariance function 91 with ν = 2.5 using the standard Euclidean distance added to a white noise function was chosen.For the four-dimensional surface, a more considered choice is required as the angular and radial components use different types of coordinates, units and possibly even distance metrics.Consequently, this kernel is split into the sum of a white noise function and a product of a radial and angular functions where r refers to the radial coordinates (r He , R C 60 ), Ω refers to the angular coordinates (θ , φ ) and R := (r, Ω).The product of k r and k a is chosen as it treats both coordinate types on equal footing.An alternative choice of summed kernel here would treat the similarity as similar to the radii or the angles which TABLE I: Kernels which give a positive-definite covariance matrix when using the spherical geodesic distance for d.In the Matérn, ν, l refer to the smoothness and length-scale.In the Wendland kernels, τ, s play analogous roles although s is a support parameter and may not be optimisable.The subscript + is shorthand for max(f(x),0).If using the chordal metric, some parameter restrictions are lifted.
would introduce artificial correlation with data.While k r = k is a possible choice, a more nuanced approach is required for k a .This is due to the use of spherical polar coordinates, and how this affects the choice of metric.As the angular coordinates (without loss of generality) lie on the surface of a unit sphere this provokes the choice of two metrics: chordal, which is the simple straight line distance; or greatcircle, which is the geodesic on the sphere.This is calculated as where the spherical polar coordinates have been converted to longitude and latitude.This is easily converted to the chordal distance using the relationship given in Eq (10).If using the chordal metric, as this is analogous to the Euclidean distance of Cartesian coordinates, the usual choices of kernels are possible.However, should the great-circle distance be preferred, this is not the case as a positive-definite covariance matrix is not guaranteed.This is resolved by using compactly supported covariance functions 92,93 , examples of which are given in Table I.These functions can also use the chordal distance, but as the geodesic does not distort these distances, that should be the preferred metric where possible.Due to the high symmetry of the endofullerene, this also motivates the use of a kernel that can emulate the radial and angular patterns within the input data, which is better achieved through the kernels listed in Table I.
As we require the PES to be twice-differentiable, given the restriction on ν in Table I this removes the Matérn as a possibility.The Wendland kernels are dependent on two parameters: τ, a smoothness parameter which is analogous to ν in the Matérn; and s which appears to be an optimisable lengthscale but is actually a support parameter.To ensure differentiability of the Wendland kernel, we require s ≥ max(d) which is π and 2, for the great-circle and chordal distances respectively.
While in the latter case this could be an optimised hyperparameter, in both cases we choose to fix s = max(d).

Full 4D Problem
The Hamiltonian for He@C 60 including all degrees of freedom of the endohedral atom and the cage breathing in atomic units is given by where m is the reduced mass of the He-C 60 "two-particle" system calculated as m = m He m C60 m He +m C60 , M = m C 60 is the mass of the C 60 cage, (r, θ , φ ) refer to the endohedral coordinates of the He in spherical polars: distance from origin, polar and azimuthal angles respectively; and R is the C 60 cage radius.The harmonic contributions of the He translation and C 60 cage breathing have been explicitly included as this enables the use of the following simple coordinate transformations where R eq is the equilibrium C 60 cage radius.By scaling both the He and C 60 cage breathing to the natural length scales of the oscillators, and centring them at q i = 0, this allows for the basis set to be constructed from a three-dimensional harmonic oscillator (3D HO) for the He translation and a onedimensional harmonic oscillator (1D HO) for the C 60 cage breathing mode.Overall, the basis set is then written as where N klmb is the appropriate normalisation constant.Working in the finite basis representation (FBR), the Ĥ0 matrix is diagonal and depends only on the quantum numbers and frequencies of the two different oscillators.The potential energy matrix is a bit more cumbersome to work with.Traditionally this problem is solved using discrete variable representation (DVR) techniques [94][95][96] , which require both the selection of basis set and quadrature points where this matrix would be diagonal, and the quadrature grid would normally be constructed as a direct product of smaller sub-dimensional grids.
However, as the angular momentum quantum number l is shared between the radial and angular functions, a direct product grid is not possible.Nonetheless it is still possible to leverage advantages from using the DVR in this scenario.As this basis set is orthonormal, there exists an orthogonal transformation between the FBR and DVR. 94This transformation from the DVR to the FBR corresponds to effectively evaluating the potential matrix elements using Gaussian quadrature.
An appropriate choice of values for k He and k C 60 in Equations (12) and ( 13) will correspond to the equivalent Gaussian quadrature points.There can be multiple choices for this: the traditional method, where an interval for each coordinate is chosen such that the potential doesn't exceed a cutoff values, or a Potential-Optimised DVR 96 scheme where the shape and curvature of the potential dictates this scaling.
For the q He coordinate, the atom is restricted to explore a sphere within the cage with r He ∈ [0, 1.5Å], as even at the equilibrium cage radius beyond this interval the potential is greater than 5000cm −1 .On the other hand, q C 60 is scaled using the natural known frequency of the cage breathing mode of 496cm −1 . 97

Comparison to 1D
After imposing spherical symmetry and fixing the cage radius, the simple 1D translational eigenstates can be calculated from the Hamiltonian 4,5,14 A simple sixth degree polynomial for V (r) has been fitted 4,5 , and by using the |kl⟩ basis set as defined in Equation ( 14) but with k chosen from a PODVR perspective, the onedimensional eigenstates can be derived.Anharmonicity lifts the degeneracy of states with the same principal quantum number 4,5 n = 2k + l, but the lack of angular dependence in this potential implies no coupling of states with differing values of l and maintains their (2l + 1)-fold degeneracy.Under I h , states with l ≥ 3 are expected to have this degeneracy lifted but this is not observable with the purely one-dimensional radial potential.
The simplest comparison of the one-dimensional and fourdimensional problems is through the energy gaps between eigenstates as these are what can be measured spectroscopically.The importance of angular dependence in the potential can be determined by how strongly the (2l + 1)-fold degeneracy of the spherically symmetric states is lifted.
Not only can the energies of the states be compared, but also the wavefunctions by considering their overlaps ⟨Kl|kl⟩ and ⟨B|b⟩ is not necessarily recovered due to the different scalings and centres of expansion, and they must be evaluated.
Instead of using the overlap which measures the similarity of the wavefunctions, an equivalent notion of how far apart the states are can be considered.The Hellinger distance 98 , which is given by can be used as an alternative.This requires both wavefunctions to be renormalised along the same measure guaranteeing 0 ≤ H ≤ 1 with equality holding if the states are completely identical or orthogonal respectively.

A. Potential Energy Surface
For the two-dimensional radial kernel, k defined in Eq (7), was chosen to take the form of an anisotropic Matérn covariance function with ν = 2.5.For the four-dimensional corrections with kernel of the form given in Eq (8), k r took the same form as k, whereas for k a we choose to use the Wendland-C4 covariance function shown in Table I, with τ = 6 and the greatcircle metric defined in Eq (9) enforcing s = π.Both kernels required the optimisation of four hyperparameters namely: amplitude, r He and R C 60 length-scales and the noise.Further details for training the GPs to generate the PESs are outlined within the supplementary information, Section SI 2.
Adding the predictions of these kernels together gives rise to the two-dimensional radial and angular slices of the fourdimensional PES seen in Fig 2 .As the PES is defined up to an arbitrary energy shift, we choose the energy zero to be the energy minimum in the radial slice which is taken at (θ , φ ) = (0, 0) and in the angular slice to be the value at the origin, with the radii frozen at (r He , R C 60 ) = (0.30066Å, 3.54181Å).
The shape of the radial slices is remarkably smooth along both radial coordinates.This indicates the radial surface can be well described by a polynomial oscillator.The r He coordinate shows a considerable amount of anharmonicity as previously observed. 4,5The R C 60 coordinate on the other hand shows very little anharmonicity.As the elliptical contours are almost perfectly aligned along these axes, this suggests there is unlikely to be any observable coupling between the He translational and C 60 cage breathing modes.
These features are shared in both the MP2 and RPA@PBE radial surfaces, alongside the equilibrium He position being at the origin.This is expected given the I h symmetry of the system.However there is one stark difference being the equilibrium cage radius.MP2 predicts this to be at R C 60 = 3.536Å whereas the RPA@PBE predicts it to be at R C 60 = 3.551Å.This difference of approximately 0.04% signifies the relevance of this coordinate in the surface which was previously neglected. 4,5 Turning to the angular slices the contours are taken every 0.5cm −1 in the interval [−5cm −1 , 5cm −1 ].There is a great similarity between the positions of the positive fourdimensional corrections given in red, and the negative shown in blue.Both the MP2 and RPA@PBE surfaces have a periodic 2π 5 symmetry along the φ direction as necessitated by the symmetrisation of the data with the C 5 rotation axis.They also exhibit the symmetry of mirroring along θ = π 2 and swapping the φ = (0, π) and φ = (π, 2π) regions due to the inversion symmetry introduced into the training data.
However, the numerical values of the corrections are subtly different between the MP2 and RPA@PBE.The MP2 corrections span the full corrections range, whereas the RPA@PBE, with its much lighter contours only spans the range [−2.5cm −1 , 2cm −1 ].The sensitivity of these fourdimensional corrections is further illustrated in the SCS-MP2 and SOS-MP2 PES slices, which are given in the supplementary information Section SI 2. While these corrections are on the order of single digit wavenumbers, this weak angular dependence not being completely flat will lift the (2l + 1)fold degeneracy of the states in the traditional 3D HO.This is alongside the breaking of principal quantum number n = 2k + l degeneracy due to the radial anharmonicity.
The choice of angular kernel and metric was made by comparing the 95% confidence interval on the surface prediction, whose lower and upper bound two-dimensional slices are shown in Fig 3 .The radial slices are taken at the same fixed angular coordinates and vice versa as in Fig 2 .These confidence intervals are calculated only from the error in the prediction of the four-dimensional corrections.This is equivalent to "pre-selecting" a two-dimensional radial surface and asking what is the possible range of values of the four-dimensional corrections.If the GPs are independent, the total error in each prediction would be given by Σ 2 2D + Σ 2 4D .However, in this particular form of PES, this is likely to be an upper bound to the true confidence interval, as the two-dimensional GP and four-dimensional GP would have a negative cross-covariance.This is because as the two-dimensional surface becomes more accurate, the four-dimensional surface becomes more flat.
The radial lower and upper bound surfaces for MP2 look almost identical to the mean surface prediction.The contours are centred at the minimum of the surface, but taken at 44cm −1 lower and higher than the values in Fig 2 .While this confidence interval of 88cm −1 may seem quite large, as the PES is only defined up to an arbitrary energy zero, it can be linearly shifted without affecting the subsequent calculations.This is an indication of the kernel constraining the behaviour of the GP, strongly enforcing the overall shape of the PES.As further calculations using the PES are agnostic of the energy zero, the mapping ) will maintain the subsequent properties derived from the PES.Therefore this wide confidence interval is not diagnostic of a poor PES.
bound surfaces are still approximately 10cm −1 , analogous to the mean prediction, there is a dominance towards the extremes of each bound.That is to say, the lower bound surface shows tendency to skew towards a more negative correction and vice-versa.This implies the confidence interval is somewhat spiky, with a width in places of over 100cm −1 .Once again, as the PES is only defined up to an arbitrary energy zero, the shape of these errors is of more importance.The consequence of the exact shape and size of the PES and confidence interval will be observed in the degree of splitting of the (2l + 1)-fold degenerate energy levels and the size of error bars on the translational eigenstates seen in Fig 4.
This choice of kernel and metric had the tightest confidence interval of all possible options indicated in Table I.The ranges of the other options of kernel and metric for MP2 are given in the supplementary information Section SI 2. While the shape of these confidence intervals and PES is more important than its width, it is still prudent to try to tighten them.The errors could arise due to a combination of factors including the choice of metric which dictates the value of the support parameter s defined in Table I, or even because τ is fixed.They may also be a consequence of the tight hyperparameter bounds on the four-dimensional kernel imposed by chemical intuition.However, widening them could lead to a very flat surface prediction by the GP.On the other hand, introducing more training data could lead to overfitting of the surface.Noticing the Wendland-C4 being an improvement on the Wendland-C2 opens a possible systematic route to better choices for k a .This could be achieved by generating higher degree Wendland functions 99 and using them with the greatcircle metric as the angular kernel.

B. Translational Eigenstates
The nuclear Hamiltonian in Eq (11) was diagonalised using basis functions of the form defined in Eq (14), with 10 translational |kl⟩ functions for each l, spherical harmonics |lm⟩ with l ≤ 7, and 4 cage breathing |b⟩ functions.This gave an overall basis set size of 2560 functions.The 100 lowest lying eigenstates of 3 He@C 60 were converged to within 0.5cm −1 , with the majority of states converging to within 0.02cm −1 .
These eigenstates are plotted, with the ground state energy set to be the energy zero, in Fig 4, alongside energies from the spherically symmetric one-dimensional potential.As these were generated from a potential fitted to the spectroscopic data, they are equivalent for comparison.These onedimensional energies are colour coded by their angular momentum quantum number, l.Due to the spherical symmetry, these states are (2l + 1)-fold degenerate, but the principal quantum number of the 3D HO n = 2k + l which usually gives ⌈ n 2 ⌉-fold degeneracy (ignoring the l degeneracy effect), is lifted.This is due to the anharmonic nature of the potential along r He , which is also seen in Fig 2. This anharmonicity is recovered in all five electronic structure methods: MP2, SCS-MP2, SOS-MP2 RPA@PBE, and C(HF)-RPA.Looking at the energies of these translational eigenstates, it is noticeable that they are consistently substantially overestimated by the two semi-empirical methods SCS-MP2 and SOS-MP2.The fundamental transition differs by approximately 10cm −1 to the spectroscopic data, with higher energy states having sequentially worse agreement.The two ab initio methods on the other hand perform considerably better, with MP2 predicting a fundamental frequency of 96.58 ± 0.99cm −1 which has very good agreement to the spectroscopic value of 96.7cm −1 . 4,5The RPA@PBE consistently overestimates these energies by approximately 6cm −1 .While its corrected variant, C(HF)-RPA, yields an equilibrium cage radius significantly closer to that obtained with the MP2 method (see Section SI 2), the accuracy in terms of the energies of the translational eigenstates is deteriorated.C(HF)-RPA performs very similar to SCS-MP2 at low energy states, but as this energy increases it starts to outperform the semiempirical methods.The possible causes of the discrepancies of MP2 and RPA to the one-dimensional data are described in Section II A: the MP2 lacking screening effects, and the RPA@PBE not including exchange effects.While a priori it is not obvious which is the more significant effect, the comparison to the spectroscopic data indicates that the MP2 outperforms the RPA@PBE.However, it should be noted that, due to the neglect of exchange effects between particle-hole pairs, RPA@PBE exhibits significantly higher efficiency compared to the MP2 method.To illustrate, a single MP2 calculation using a quadruple-zeta basis set takes approximately 325 min, whereas the corresponding RPA@PBE calculation requires about 34 min.This makes the MP2 method roughly ten times as computationally expensive as the RPA calculation.Therefore, although the energies predicted by RPA@PBE do not converge to spectroscopic accuracy, the significantly higher computational efficiency of this electronic structure method suggests that it should not be discarded as a viable choice for these systems.
We also observe a lifting of the of the (2l + 1)-fold degeneracy in all four electronic structure methods, but to varying degrees, indicating different levels of importance on angular dependence within the PES.All three MP2 methods predict a p-state splitting of over 1cm −1 and a d-state splitting ranging to over 3cm −1 whereas the RPA@PBE and C(HF)-RPA predict these splittings to be less than 1cm −1 and 2cm −1 respectively.However, considering the I h character table, all three p states and five d states are expected to be degenerate.This fictitious splitting arises due to the symmetrisation of the data using only a single C 5 rotation axis and the centre of inversion i.This only generates a subgroup symmetry of I h , namely D 5d , where the p and d state splitting is expected.Fully symmetrising the training data is a non-trivial task due to the orientation of the C 60 cage shown in Fig 1 and position of the remaining symmetry elements.This would also be counter-productive to the problem as it would lead to strong overfitting in the PES, making it very spiky at the location of the training data.
For states with l ≥ 3, namely f states and higher, there is a real splitting of these states as expected under I h symmetry, of 4cm −1 in the MP2 type and 2.5cm −1 in the RPA type methods.The smaller splitting is expected in the RPA, as the range of the angular PES slice in Fig 2 is smaller than the equivalent MP2 range.This real splitting indicates the importance of catering for the angular dependence in the PES, as this allows for states with differing l to couple and mix, which is not possible with a purely symmetric potential.The lifting of this degeneracy may be masked within an experimental spectrum into the linewidth of the corresponding transition.
For the states with energies under 500cm −1 , there is a oneto-one mapping of the one-dimensional energy levels, to the ones of each electronic structure method.However, around 500cm −1 , an extra state appears in the electronic structure diagonalisation.This is due to the excitation of a quantum in the cage breathing mode but remaining in the ground translational state.In the three MP2 methods, the fundamental cage breathing frequency is 495cm −1 , but 490cm −1 in the RPA@PBE.Looking at the top right insert in Fig 4, this extra state is obvious in the MP2, SOS-MP2 and RPA@PBE, but seems to be missing in the SCS-MP2 and C(HF)-RPA.It is necessarily still present, but has been mixed into the cluster of the first set of excited d states.
Going upwards in energy, all the states with energy < 500cm −1 will be repeated again, with an excitation in the breathing mode.These will be combined with the higher pure translational excitations which leads to a cluttered energy level diagram and increases the difficulty in matching the one-dimensional energies to the electronic structure analogues.There seems to be no discernible coupling of the breathing mode to the translational mode, as the combination modes occur at the sum of energies of exciting each mode separately.This is expected given the form of radial slices in the PES in Fig 2 , with the contours aligning almost perfectly along the radial axes.
An advantage of using GPR for PES interpolation, is the ability to generate the covariance matrix as defined in Eq (5), alongside its mean prediction given by Eq (4).We construct the overall covariance matrix as Σ = Σ 2D,2D + Σ 4D,4D , which is exact assuming independence between each GP.However,  as mentioned in Section III A, there is likely to be a negative cross covariance between these processes denoted by Σ 2D,4D , implying the error bars shown in Fig 4 are upper bounds to the "true" errors.These error bars were calculated by generating 100 samples of the PES from the appropriate multivariate Gaussian distribution which in turn generates 100 different nuclear Hamiltonian matrices.The eigenvalues of these matrices were averaged, generating its mean and standard deviation.These error bars are also colour coded by near degeneracy of the states.For states with energies below 500cm −1 , this corresponds identically to the angular momentum quantum number, l, with the spectroscopic reference.Above this, however, this may not be the case due to the muddling of states involving an excitation in the breathing mode.
Looking at the four zoomed-in inserts in Fig 4, it is noticeable how small these error bars are, usually less than ±2cm −1 .Despite the wide looking 95% confidence interval seen in Fig 3, this ends up being a relatively small error bar in the translational energy, which may be interpreted in relation to a linewidth in an experimental spectrum.The minute size of these error bars once again demonstrates the dominance of the two-dimensional radial PES, compared to the fourdimensional corrections.The error bars on the MP2 translational energies are smaller than those of the RPA@PBE, despite the angular corrections being larger in magnitude.This implies the confidence interval on the MP2 surface is tighter than the respective one for RPA@PBE.
The energies are not the only information obtained from the diagonalisation procedure; the eigenvectors corresponding to the wavefunctions are also obtained.These nuclear wavefunctions (orbitals) have strikingly regular patterns as seen in  x and y axes is not best suited to the Cartesian representation of these orbitals.The isosurfaces, taken at 0.5% of the maximum amplitude of each wavefunction are constrained within a sphere of radius 1.5Å which is less than 50% the value of the C 60 cage radius.This implies that while the He is encapsulated within the C 60 cage, it does not explore any region close to the C atoms.
Instead of using the energies as the discriminator for the validity of electronic structure method, the wavefunction can be used instead.This is achieved by considering the overlap, or by calculating the Hellinger distance, as defined in Eqns ( 16) and ( 17) respectively between the one-dimensional reference state and four-dimensional eigenfunction.Considering just the ground state, we take the reference ⟨LM| ⟨B| states to be the pure basis state ⟨00| ⟨0|, reducing the summation over lmb quantum numbers to a single term.Due to the different equilibrium cage radii predicted by each electronic structure method, the centre of expansion of these breathing 1D HO functions is different.For each electronic structure method, we enforce the reference 1D breathing state to be centred in the same position.This has the intended effect of the Hellinger distance (and overlap) to be purely due to the difference in the translational portion of the eigenfunction.These overlaps and distances are given in Table II.While all these methods have over 98% overlap with the reference one-dimensional ground state, the distinction in their quality is more apparent in their Hellinger distances.In line with the energy matches, the MP2 and RPA@PBE perform best, with the C(HF)-RPA and semi-empirical SCS-MP2 and SOS-MP2 lagging behind.This seems to indicate that either the energy, or the Hellinger distance could be used interchangeably as the validator of the electronic structure method.
Despite the spectroscopic accuracy in the energy prediction of the MP2, the Hellinger distance still seems fairly large.This could be due to the extra angular dependence in the po-tential, allowing the mixing of states with differing angular momentum quantum number l to couple, which is not possible in a purely spherically symmetric potential.This casts doubt on the correctness of the one-dimensional potential as while this fits the spectroscopic data, might not correspond to any particular one-dimensional slice of the four-dimensional PES.
Not only can the wavefunctions calculated from the electronic structure methods be compared to just the onedimensional reference, they can also be compared to each other.The Hellinger distances of the ground state between electronic structure methods is given in Table III.Once again, for the ⟨B|b⟩ integral, despite the differing centres of expansion, we enforce this to be δ Bb to isolate just the translational states.Due to the high frequency of the cage breathing mode, if these integrals were left in, the distances would all be over 0.70.Considering just the translational states, we find that all these methods are closer to each other than they are to the one-dimensional reference.This is possibly due to the angular dependence, with the sum over lmb in Eq (17) not collapsing as both wavefunctions are four-dimensional.The SCS-MP2 and SOS-MP2 are most similar, closely followed by the MP2 and RPA@PBE, which follows suit from the energy level diagram in Fig 4 .As with the energies, the C(HF)-RPA sits in between these two pairs, but falling closer to the SCS-MP2.

IV. CONCLUSION
In this work, we have investigated how the angular dependence of the potential energy surface and the cage breathing mode impact the translational eigenstates of the He@C 60 endofullerene.For this investigation, we calculated complete basis set extrapolated four-dimensional potential energy surfaces, incorporating three He translational degrees of freedom and the C 60 cage radius, using various electronic structure methods: MP2, SOS-MP2, SCS-MP2, RPA@PBE, and C(HF)-RPA.The rationale for utilising this array of methods is twofold: by comparing their results, we have gained not only confidence in our findings but also valuable insights into the performance of these methods where comparisons with reference data were feasible.
Due to the high computational cost of the electronic structure calculations, the full Potential Energy Surface (PES) required interpolation from this sparse training data which was performed using Gaussian Process Regression.The PES was partitioned into a two-dimensional radial surface, to which corrections were applied making it overall four-dimensional.This is a well motivated choice due to the lack of obvious angular splitting within the spectroscopic data.The expression of the PES using spherical polar coordinates motivated the form of kernels shown in Eqns ( 7) and ( 8), which required a nuanced choice of the angular kernel k a and its metric.We chose to use the Wendland-C4, as shown in Table I with the great-circle metric as defined in Eq (9) as this accurately reflected the symmetry within the endofullerene, and had the smallest 95% confidence interval as seen in Figs 2 . and 3.
We generated the four-dimensional translational eigenstates of He@C 60 by diagonalising the nuclear Hamiltonian.By not enforcing spherical symmetry in the potential, and allowing states for different angular momenta to couple we find the (2l + 1)-fold degeneracy of states to be lifted.The three MP2 type methods all predict very similar levels of splitting, whereas the RPA methods predict smaller splittings.While some of this is artificial due to not exploiting the full I h symmetry of the system, we do observe a true splitting of f states and higher of approximately 4cm −1 .This suggests while the angular dependence is weak, it can still be observable in the broadening of these peaks in the spectra.On the other hand, we find very little influence on the cage breathing mode on these translational states.
Comparing the energies to the one-dimensional case, we find that the RPA@PBE and MP2 outperform the C(HF)-RPA and the semi-empirical methods SCS-MP2 and SOS-MP2.Comparing the energies as seen in Figure 4, MP2 has the best agreement with the one-dimensional case, achieving sub-wavenumber accuracy with the spectroscopic data. 4,5On the other hand, RPA@PBE exhibits a discrepancy of approximately 6cm −1 compared to the experimental results; however, it proves significantly more efficient than MP2, surpassing it by almost an order of magnitude.
Furthermore, the wavefunctions of the ground state were compared to the one-dimensional case through their overlap and Hellinger distances in Tables II and III.Once again we find that the MP2 and RPA@PBE eclipse the C(HF)-RPA and the semi-empirical SCS-MP2 and SOS-MP2 methods.Considering the overall efficacy of these methods, and the tradeoff between accuracy and computational cost we can recommend both MP2 and RPA@PBE as viable electronic structure methods for these systems.
Going forwards, a natural extension would be to apply these electronic structure calculations to larger endofullerenes, whether that be molecules within C 60 such as H 2 @C 60 [6][7][8][9] and H 2 O@C 60 [100][101][102][103] which are popular in the literature or by considering larger fullerene cages such as Ne@C 70 . 104An extra challenge for the latter system is to accurately describe the double well that is opened up by elongat-ing the cage along the unique axis and reducing the symmetry to D 5h . 15hile knowledge of the PES gave us access to the translational eigenstates, in order to accurately reproduce an experimental spectrum the knowledge of intensities of transitions is also required.This necessitates the calculation of a dipole moment surface (DMS), which could be generated and interpolated in an analogous way to the PES 11 .Knowledge of the DMS allows for the calculation of the transition dipole moment integrals which are fundamental in calculating the transition intensities.
2][3] The calculations were executed on compute nodes equipped with 2 AMD EPYC 7302 CPUs, featuring in total 32 cores and 64 threads at 3.00 GHz.All reported runtimes are wall times, not CPU times.
The evaluation of the exchange-correlation terms was carried out using the multi-grids defined in Ref. 4, utilising a smaller grid during the SCF optimisation and a larger grid for the final energy evaluation.These grids were generated using the modified Becke weighting scheme. 4The SCF convergence threshold was set to 10 −7 , both for the change in energy and for the commutator norm ||FPS − SPF||.
We employ the integral-direct resolution-of-the-identity Coulomb (RI-J) method, as described by Kussmann et al. 5 , for evaluating the Coulomb matrices.Additionally, for the evaluation of the exact exchange matrices, we utilise the linearscaling semi-numerical exact exchange (sn-LinK) method developed by Laqua et al. 6 All correlation methods employed in this study utilise the frozen-core approximation.For both the ω-CDGD-RPA and the ω-RI-CDD-MP2 methods, we set the attenuation parameter to ω = 0.1.In the RPA calculations, the integration along the imaginary frequency axis was conducted using an optimised minimax grid, 7,8 consisting of 15 quadrature points.For the MP2 calculations, we employed 7 Laplace points.Additionally, the following thresholds were selected for the MP2 calculations: θ NB = 10 −8 , θ iaP NB = 10 −10 , θ 3c = 10 −8 , θ i j = 10 −9 , θ schwarz = 10 −12 .For detailed descriptions of these thresholds and approximations, the reader is referred to the original publications (Ref.8 and Ref. 9).

SI 2. GAUSSIAN PROCESS POTENTIAL ENERGY SURFACE
The set of 200 training points was generated by sampling each of the four coordinates from the following distributions r He ∼ N(0Å, 100cm −1 ) (SI 1) The angular distributions are chosen to force all the coordinates into a single spherical wedge.While this does not correspond to a uniform distribution on the surface of a sphere which may be preferable, this still generates a valid set of random training coordinates.Ensuring the angular coordinates lie within the given spherical wedge allows for ease of generating their symmetric equivalents.
The distribution of the radial coordinates is motivated similarly to a potential optimised discrete variable representation (DVR) scheme. 20For the means, the equilibrium position of He within the cage, and cage radius of C 60 are taken.For the variances, we use round values close to the harmonic frequency of each mode.These frequencies can be simply converted to squared distances modulated by the appropriate mass SI: He@C 60 4D SI 4 in the harmonic frequency calculation.A subtlety to note is that by definition r He ≥ 0 whereas sampling from the distribution in Eq (SI 1) does not guarantee this.We choose to take the value as |r He |.This method generates sampling coordinates r He ∈ (0, 1.5)Å, and R C 60 ∈ (3.52, 3.57)Å.These ranges ensure the sampling points used when evaluating matrix elements in the DVR 21,22 lie within these intervals, allowing for sensible interpolation by the Gaussian Process (GP).
The two GPs were trained using a modified framework from scikit-learn. 23The training set was partitioned in a 2:1 ratio, with the larger set used to train a GP on the twodimensional radial surface.The kernel given in Eq 7 used an anisotropic Matérn covariance function with ν = 2.5 with a white noise function added, with the hyperparameter bounds given in Table SI 1. Once the latter GP was trained, its latent surface was evaluated at the remaining training points, and an additional GP is trained on the four-dimensional surface representing the difference between the electronic structure calculation and the two-dimensional GP prediction.This smaller set of training points was symmetrised by applying the I h symmetry operations C n 5 for n ∈ [0, 1, 2, 3, 4] using the axis aligned along the z direction as seen in Fig 1, and also the centre of inversion i, producing nine extra equivalent geometries with identical energies.The four-dimensional GP was thus trained on the symmetrised "extended" set of coordinates.
The kernel given in Eq 8 used an anisotropic Matérn covariance function 24 with ν = 2.5 for k r , but for k a given the multitude of choices for covariance function outlined in Table I, with metrics in Eqns 9 and 10 the rationale for a specific choice is not obvious.All five choices (excluding Matérn function with geodesic metric due to its restriction on ν) were tested by plotting the four-dimensional surfaces and the corresponding 95% confidence interval for the MP2 data, analogously to Figs 2 and 3.The ranges of all these surfaces is given in Table SI 2 with the hyperparameter bounds and optimised values for this kernel also given in Table SI 1.These are noticeably tighter than their two-dimensional counterparts.This is due to the fact that the majority of the energy spread is accounted for in the two-dimensional GP and having wide bounds for the latter process often leads to "ficitious" LML minima (which are really very flat regions in the LML surface) to be chosen.Too large a length scale leads to a flat surface which ignores the locality of the data, whereas too low a length scale leads to overfitting.
Considering the range of the potential energy surface (PES), the Wendland-C2 covariance function performs worst, with a spread of over 15cm −1 .Looking at its confidence interval, the lower and upper bound surfaces range over 30cm −1 , which is not consistent with the other choices of covariance function and metric whose lower and upper bound surfaces of the confidence interval have the same range as the surface itself.Finally noticing the width of the confidence interval being well over 100cm −1 allows us to discount the Wendland-C2 as a possible choice.The remaining three choices are all fairly similar to each other.Using the tightest 95% confidence interval as the discriminator, this ranks Wendland-C4 with geodesic as best, closely followed by Matérn with chordal, and Wendland-C4 with chordal ranking last.Due to the high symmetry, the geodesic metric outperforming the chordal metric is due to its better ability at capturing the radial and angular patterns within the training data. 25,26alogously to the MP2 and RPA@PBE surfaces given in Fig 2, the equivalent SCS-MP2, SOS-MP2 and C(HF)-RPA PESs are shown in Fig SI 1.Looking at the radial surfaces, they once again are very smooth, indicative of being well described as a polynomial oscillator.There is appreciable anharmonicity along the r He coordinate unlike the R C 60 coordinate which does not show any.The alignment of the elliptical contours almost perfectly along each direction indicates there to be a lack of any observable coupling between the He translational and C 60 cage breathing mode.As with MP2 and RPA@PBE all these methods correctly predict the equilibrium He position to be the origin.However, they also predict slightly different equilibrium cage radii for the He@C 60 system, with SCS-MP2 predicting a cage radius of 3.539Å SOS-MP2 one of 3.542Åand C(HF)-RPA a radius of 3.536Å.
Turning our attention to the angular contours, they maintain the symmetry dictated by the C 5 axis as periodic 2π 5 symmetry along the φ direction, and by the centre of inversion i which manifests as an inversion about the point (θ , φ ) = ( π 2 , π).Once again there are regions of both positive (red) and negative (blue) corrections relative to the value at the origin.The locations of these regions is identical and consistent with the MP2 and RPA@PBE angular surfaces.The shape of the angular contours in all three MP2 methods is extremely similar.The only obvious difference between the SCS-MP2 and SOS-MP2 appears to be the former has a larger magnitude negative peak, whereas the latter has this in the positive contours.The range of these surfaces is larger than the RPA@PBE, but lower than the pure MP2.On the other hand, the C(HF)-RPA has a much smaller range, which is consistent with the RPA@PBE.
Based on the information in Table SI 2, we decided on Wendland-C4 covariance function with geodesic metric for the kernel due to it having the tightest 95% confidence interval.The MP2 lower and upper bound surfaces are given in Fig 3, and the RPA@PBE equivalents are shown in Fig SI 2. As with the MP2, both the radial lower and upper bounds have the same shape as each other, and the mean PES prediction.As the PES is agnostic to a linear energy shift, the behaviour of the GP is to strongly enforce the shape of the surface.The angular bounds on the other hand look very different to each other, and to their MP2 counterparts.While it maintains the appropriate symmetries, this once again points to the sensitivity of the four-dimensional kernel to its data.Both the lower and upper bound contours are centred on the value at the origin.This implies that the lower bound surface has its maximum at the origin, with the exact opposite for the upper-bound surface.However, the range of both the upper and lower bound surface is again on the order of 6cm −1 , identical to the mean prediction which is also the case for the MP2 surface.He@C 60 for MP2, SCS-MP2, SOS-MP2, RPA@PBE, and C(HF)-RPA PESs alongside the one-dimensional experimental derived counterparts.One-dimensional energies are coloured by angular momentum quantum number l in increasing order: blue, orange, green, red, purple, brown, pink and grey.Error bars on the electronic structure methods are also coloured according to the near degeneracy of states.where ω He and ω C 60 are the harmonic frequencies defined during the transformations given in Eqns 12 and 13, using a DVR 21,22 and potential optimised DVR 20 respectively.Due to the definitions of the basis functions, the ĥ0 matrix is diagonal, whereas the ∆V matrix is not.However, as the basis functions are orthonormal, there exists an orthogonal transformation between the finite basis representation (FBR) and the DVR 21,22 which can be leveraged.The basis matrices where the V DV R matrix is diagonal; entries being the potential sampled at the quadrature points minus the already accounted for harmonic contribution.Eq (SI 9) leverages another advantage of DVR, as the overall quadrature grid is the outer product of sub-dimensional grids.This is not possible for Eq (SI 6) due to the shared quantum number between the radial function and spherical harmonics, which in turns implies their quadrature grids are correlated.The q He α grid is defined by generalised Gauss-Laguerre quadrature, cos(θ β ) and φ γ are defined on Gauss-Laguerre grids, and q C 60 δ is defined by a Gauss-Hermite grid.Calculating this matrix product is equivalent to evaluating the integral using Gaussian quadrature. 21,22Usually, the number of quadrature points and basis functions are equal, making the basis matrices in Eqns (SI 6) and (SI 8) square, but this does not need to be the case.Due to coupling of radial functions with differing l values, and oscillatory nature of the spherical harmonics, in order to converge the integrals, twice the number of radial functions are used as quadrature points in the q He α grid, and four times the number of angular functions are used as quadrature points in the φ γ grid.This factor of eight between the number of basis functions and quadrature points implies working in the variational basis representation (VBR) 21,22 , as opposed to the FBR, as more matrix elements are converged.
While the energies for all four electronic structure methods are given in  SI 3. Similar to the ground s state in Table III, the MP2 and RPA@PBE p states are very similar to each other, as are the SCS-MP2 and SOS-MP2.Once again, the C(HF)-RPA is very close with the SCS-MP2.However, this is not found to be the case in the d state distances.Despite the energy ordering of MP2<RPA@PBE<C(HF)-RPA<SCS-MP2<SOS-MP2, all the MP2 wavefunctions are closer together than the RPA methods.This is justifiable considering the higher fold degeneracy of the d basis states, which are the major contributor to the eigenfunction and their exact contribution being sensitive to the shape of the PES.This is further exacerbated within the f nuclear orbital distances, as the energy matches are poorer but the wavefunction distances are even further apart.This can be indicative of nuclear orbitals for each method having different principal x and y axes.If they have been rotated a different amount relative to each other, this will lead to a poorer overlap of the orbitals, leading to a larger Hellinger distance.
While the results in Section III are for 3 He@C 60 , the translational eigenstate energies for 4 He@C 60 are shown in Fig SI 4, which is the analogous figure to Fig ??.As for the lighter isotope, C(HF)-RPA and the semi-empirical SCS-MP2 and SOS-MP2 methods consistently overestimate the energies; MP2 and RPA@PBE perform significantly better.Once again, the MP2 reaches spectroscopic accuracy with the one-dimensional results, with the RPA@PBE lagging a few wavenumber behind.

FIG. 1 :
FIG. 1: Axis orientation of C 60 .Red, green and blue arrows correspond to x, y and z directions respectively Fig 2 also highlights the importance of the four-dimensional nature of the surface.Picking any particular one-dimensional slice through an ab initio PES with fixed (θ , φ , R C 60 ) will lead to different translational energies.

FIG. 3 :
FIG. 3: 95% confidence interval on the value of the four-dimensional corrections to a fixed two-dimensional radial MP2 PES: (a) radial lower bound, (b) angular lower bound, (c) radial upper bound, and (d) angular upper bound.The radial and angular slices are at the same fixed coordinates as in Fig 2. Radial contours have their values shifted by -44cm −1 in the lower bound, and +44cm −1 in the upper bound.Angular contours are at every 0.5cm −1 relative to the value at the origin being -43.5cm −1 in the lower bound and +43.5cm −1 in the upper bound.

FIG. 4 :
FIG.4: Four-dimensional translational energies of3  He@C 60 for MP2, SCS-MP2, SOS-MP2, RPA@PBE, and C(HF)-RPA PESs alongside the one-dimensional experimental derived counterparts.One-dimensional energies are coloured by angular momentum quantum number l in increasing order: blue, orange, green, red, purple, brown, pink and grey.Error bars on the electronic structure methods are also coloured according to the near degeneracy of states.

Fig 5 ,
which allows for simple assignment of both translational and total angular momentum quantum numbers k and l respectively.These eigenfunctions are usually dominated by a single 3D HO wavefunction.With the reorientation of the

FIG. 5 :
FIG. 5: MP2 isosurfaces of (one of) the lowest lying set of (a) s, (b) p, (c) d, and (d) f nuclear orbitals.The C 60 cage has been rotated relative to Fig 1, with the z direction pointing directly upwards, and xy plane rotated to centre down a C 3 axis.The isosurfaces are taken at 0.5% of the maximum amplitude of each wavefunction.
FIG. SI 2: 95% confidence interval on the value of the four-dimensional corrections to a fixed two-dimensional radial RPA@PBE PES: (a) radial lower bound, (b) angular lower bound, (c) radial upper bound, and (d) angular upper bound.The radial and angular slices are at the same fixed coordinates as in Fig 2. Lower bound contours have their values shifted by -44cm −1 , and upper bound by +44cm −1 .

FIG. SI 3 : 100 FIG. SI 4 :
FIG. SI 3: RPA@PBE isosurfaces of (one of) the lowest lying set of (a) s, (b) p, (c) d, and (d) f nuclear orbitals.The C 60 cage has been rotated relative to Fig 1, with the z direction pointing directly upwards, and xy plane rotated to centre down a C 3 axis.The isosurfaces are taken at 0.5% of the maximum amplitude of each wavefunction.

Fig 4 ,
only the ground state wavefunctions are compared in Tables II and III.For MP2, one of the lowest lying set of s, p, d, and f orbitals is given in Fig 5.The RPA@PBE analogue is given here in Fig SI 3. Once again, they have strikingly regular patterns, with assignment of translational and angular momentum quantum numbers k and l respectively being very simple by analysing the nodal patterns.With the cage being reoriented, neither the d nor f orbital lie perfectly along the x and y axes.The ordering of the nearly degenerate orbitals is subtly different between the RPA@PBE and MP2, which is due to the difference in angular contours seen in Fig ??.The Hellinger distances of the p, d and f states of the MP2 in Fig 5 and RPA@PBE in Fig SI 3 are compared to the SCS-MP2, SOS-MP2 and C(HF)-RPA in Table

TABLE SI 2
: Minimum and maximum values in cm −1 of the MP2 angular PES and its lower and upper bound of the 95% confidence interval for four-dimensional kernel for the varying choices of covariance function and metric.