Interoperable Workflows by Exchanging Grid-Based Data between Quantum-Chemical Program Packages

Quantum-chemical subsystem and embedding methods require complex work-flows that may involve multiple quantum-chemical program packages. Moreover, such workflows require the exchange of voluminous data that goes beyond simple quantities such as molecular structures and energies. Here, we describe our approach for addressing this interoperability challenge by exchanging electron densities and embedding potentials as grid-based data. We describe the approach that we have implemented to this end in a dedicated code, PyEmbed , currently part of a Python scripting framework. We discuss how it has facilitated the development of quantum-chemical subsystem and embedding methods, and highlight several applications that have been enabled by PyEmbed , including WFT-in-DFT embedding schemes mixing non-relativistic and relativistic electronic structure methods, real-time time-dependent DFT-in-DFT approaches, the density-based many-body expansion, and workflows including real-space data analysis and visualization. Our approach demonstrates in particular the merits of exchanging (complex) grid-based data, and in general the potential of modular software development in quantum chemistry, which hinges upon libraries that facilitate interoperability.


Introduction
Contemporary applications of quantum chemistry increasingly rely on complex workflows, which combine calculations with different methods and program packages.For instance, schemes for the automatic exploration of chemical reaction networks (see, e.g., Ref. 1-3)   combine heuristics for the generation of possible reaction products and pathways, [4][5][6] semiempirical quantum chemistry for the generation of conformer ensembles, 7,8 and quantumchemical calculations at different levels of theory, possibly in combination with approaches for quantifying their uncertainty. 9,10 imilarly, the elucidation of reaction mechanisms in homogeneous transition-metal catalysis requires intricate workflows combining distinct computational tools (for examples, see Ref. 11, 12).3][54][55][56] As these multiscale and multilevel methods combine rather distinct computational approaches for different parts or fragments of complex chemical systems, they commonly give rise to computational workflows that combine many individual calculations, possibly with different quantum-chemical software modules or packages -both for density-functional theory (DFT) and for various methods of wave-function theory (WFT) -and with other software tools, such as force-field codes or chemoinformatics toolkits.Furthermore, they might require exchanging data between these individual calculations beyond "simple" quantities such as atomic coordinates or total energies.[66] To realize the computational workflows that are required for density-based subsystem and embedding methods, some of us have initially developed the Python-based scripting framework PyAdf. 23While initially focused around the Amsterdam Density Functional (Adf) program 67 [now part of the Amsterdam Modeling Suite (Ams)], over the years PyAdf has been extended to facilitate density-based subsystem and embedding methods in which different quantum-chemical methods and program packages can be combined.
In this context, the interoperability of different quantum-chemical program packages poses particular challenges.Commonly, quantum-chemical program packages (such as Ams, 67,68 Dalton, 69,70 Dirac, 71 NWChem, 72,73 Molcas, 74,75 Orca, 76,77 Turbomole, 78,79 and QuantumEspresso 80,81 ) have been developed as monolithic codes over many decades. 82erefore, there is generally no easy access to the internals of the program packages, either because they are closed-source codes or because they are too complex for users and developers outside of their core community.However, density-based subsystem and embedding methods require exchanging information between different program packages and/or the workflow environment that go beyond those commonly provided to the user, such as integration grids or electron densities and derived properties.Unfortunately, there are no standardized interfaces for exchanging such information between different quantum-chemical program packages.
Here, we discuss how these challenges have been addressed in the PyAdf scripting framework.To this end, we describe the PyEmbed module of PyAdf for handling grid-based data (see Section 2) and demonstrate how it facilitates interoperable workflows in multiscale quantum chemistry.To illustrate the capabilities of our approach, we discuss selected examples of workflows that either combine different program packages or that are agnostic to the underlying quantum-chemical programs and methods.In particular, we highlight the development and applications of DFT-in-DFT and WFT-in-DFT embedding schemes (see Section 3.1), of real-time time-dependent DFT-in-DFT embedding schemes (see Section 3.2), and of the density-based many-body expansion (see Section 3.3).Furthermore, we discuss the integration of real-space data analysis and visualization into computational workflows (see Section 3.4).Note that the present article focuses on PyAdf's functionality for handling grid-based data and does not aim at covering its full capabilities in other areas, which will be discussed elsewhere.

Challenges and General Design
The scripting framework PyAdf allows the user to set up workflows combining several quantum-chemical calculations seamlessly.It provides classes for handling and manipulating atomic coordinates, for defining quantum-chemical calculations with a number of program packages (including Adf/Ams, Dalton, Dirac, NWChem, Molcas, Orca, and Turbomole) as well as for generating input files for these programs, for executing the quantum-chemical calculations as well as for storing their results, and for extracting results data (such as optimized structures, energies, and properties) from the result files.
The general design of PyAdf is described in Ref. 23, even though its functionality has been extended significantly in the past decade.

Use Case: Density-Based Embedding
The prototypical use case that is of interest to us here are density-based subsystem and embedding calculations, in particular, DFT-in-DFT and WFT-in-DFT calculations based on the framework of subsystem-DFT and frozen-density embedding (FDE). 52,54,56 Hre, one considers a molecular system, which is divided into N subsystems.This is achieved by partitioning the total electron density ρ tot (r) into the electron densities of these subsystems, When considering the first subsystem as the subsystem of interest, its electron density ρ 1 (r) is obtained by minimizing the total energy functional.This results in Kohn-Sham (KS)-like equations for the active subsystem, in which the effect of the frozen environment of all the other subsystems (with electron densities ρ J=2,...,N ) is included via an effective embedding potential, 83 Here, v is the kinetic energy functional.The latter two are commonly evaluated using suitable approximate density functionals.
For DFT-in-DFT embedding calculations, this embedding potential is included in the KS equations.Note that it depends on the electron density of the active subsystem, i.e., it needs to be updated during the self-consistent field interactions.In WFT-in-DFT calculations, the embedding potential is included in the one-electron Hamiltonian, which requires evaluating the matrix elements,  Several quantum-chemical program packages provide integrated implementation of DFTin-DFT embedding, e.g., Adf/Ams, 89 Serenity, 90,91 CP2K, 92 and eQE. 93,94 hese control the calculations for the different subsystems themselves and use their internal data structures to exchange data between them.In this case, algorithms and routines available within these programs can be used to evaluate the different terms of the embedding potential and their matrix elements.An integrated implementation of WFT-in-DFT methods is available in Koala. 95wever, such an integrated approach makes it difficult to combine the unique capabilities of different program packages in a single DFT-in-DFT or WFT-in-DFT workflow.An alternative modular approach, which is facilitated by the PyEmbed module, is illustrated in Fig. 1.Here, DFT calculations for the frozen subsystems are performed individually, possibly using different methods and/or program packages for different frozen subsystems (see green and blue rectangles at the top of the figure).These calculations provide the electron densities of the subsystems, from which the PyEmbed module calculates the embedding potential according to Eq. (2) (see central box), which is subsequently imported into the WFT calculation for the active subsystem, again using a different program package (see orange box at the bottom).In freeze-and-thaw calculations, roles of the subsystems can be interchanged, and the electron density from the WFT calculation (or an approximation thereof) can be used to obtain an embedding potential for the initially frozen subsystems, leading to updated frozen electron densities that can, in turn, be used to calculate an updated embedding potential for the WFT calculation.
Such a modular approach to DFT-in-DFT and WFT-in-DFT calculations poses a challenge, as it requires clearly defined interfaces between the quantum-chemical program packages and PyEmbed that facilitate interoperability.In particular, the quantumchemical program packages, on the one hand, need to be able to export the electron density and related quantities, and on the other hand, must provide the capability to import an external embedding potential.In particular the latter will generally require modifying the quantum-chemical program packages themselves, but the required changes should be as lightweight as possible.
Here, we chose to address these challenges by exchanging the densities and potentials as grid-based data.That is, we employ a suitable three-dimensional grid of points {r i }, and express the electron densities and related quantities as well as embedding potentials on these grid points.Usually, the grid points coincide with those of a suitable numerical integration grid (e.g., a Becke grid 96 ), which makes it possible to evaluate the integrals over the embedding potential [Eq. ( 3)] by numerical integration: To this end, the embedding potential is also evaluated at the individual grid points from the values of the electron densities and related properties at the grid points, which are exported from the quantum-chemical program packages on these grid points.In addition to the electron density itself, the first and second derivatives of the electron density at the grid points are required to calculate the functional derivatives of E xc [ρ] and of T s [ρ] when using generalized-gradient approximations (GGAs), and the Coulomb potential of the electron densities, should also be provided on the grid points by the quantum-chemical program packages.
Furthermore, the nuclear potential v nuc (r) at each grid point is required, but can be easily calculated directly from the atomic coordinates.
Exchanging grid-based data facilitates interoperability, as it defines a clear interface that is independent of the quantum-chemical methods that are used, of the employed basis functions and the specific basis sets, and of the inner workings of the individual quantumchemical program packages.In the following, we will present the primary components of the PyEmbed module that allow for the exchange of grid-based data between quantumchemical program packages.
While our focus here is on calculating embedding potentials using PyEmbed, it is worth noting that exchanging grid-based data can extend to more complex quantities.For instance, it can involve derivatives of the embedding potential, essential for computing kernel contributions to linear response properties. 97

Handling of Grid-Based Data
To facilitate the handling of grid-based quantities in Python, we have developed a library of GridFunction classes, which provides the required data structures (see Fig. 2).Simply speaking, a GridFunction contains references to the associated integration grid and the relevant data on these grid points.
Here, the integration grid is represented by an instance of a Grid class, which encap- Coul (r) and v Coul (r).The grid-based representations of the densities and potentials are stored as instances of GridFunctions, each referencing the associated Grid.See text for further details.
sulates the coordinates of the grid points as well as their weights.Different types of grids, such as evenly-spaced cubic grids (commonly used for visualization) and irregular molecular integration grids (as used in molecular quantum chemistry) are supported.
PyEmbed interfaces with PySCF 98,99 to generate suitable Becke integration grids based on the molecular structure, independent of external program packages.The accuracy of the generated Becke grid can be tuned according to the options available in PySCF.
Grid-based data (values on the grid points) are generally stored in Numpy arrays.Different subclasses of GridFunction are tailored to specific common use cases.For instance, GridFunctionDensityWithDerivatives can encapsulate the electron density ρ(r i ) (stored as one-dimensional Numpy array of size N ), its gradient ∇ρ(r i ) (stored as N × 3 array), and if necessary its second derivative ∇∇ρ(r i ) (stored as N × 6 array).The possibility to represent spin unrestricted densities is also available.
All GridFunctions provide methods for mathematical operations on them, such as addition and multiplication by scalars, as well as for numerical integration.These methods provide an interface that is unified across all types of grid functions, and are implemented efficiently via Numpy.Furthermore, all GridFunctions include functionality for reading and writing the grid and data to file in various formats, including Numpy's 100,101 text and binary formats, as well as HDF5. 102Finally, routines for interpolating grid-based quantities between different grids are also available.

Generation of Grid-Based Data
The available methods for reading GridFunctions from files provide a natural interface to external program packages and can be used to import electron densities and related quantities, provided the program packages allow for the export of grid-based data in a suitable file format.Currently, PyAdf supports the import from Ams's tape files as well as from text files written by Dalton and Dirac.Implementing interfaces to other program packages and file formats as well as to other scalar, vector and tensor fields beyond density-related quantities (see Sect. 3.4) is straightforward.
Another approach for obtaining the values of the electron density and related quantities on the grid points proceeds via the molecular orbital coefficients optimized in the quantumchemical calculation.Together with information on the basis functions used, this allows for the calculation of the electron density and related quantities inside PyEmbed.However, despite efforts to develop a standardized data format for the exchange of the required basis set and orbital / wavefunction information (such as Q5Cost, 103 QCSchema, 104 and TREXIO 105 ), only a few quantum-chemical program packages currently provide such a standardized interface.Further complications may arise if calculations take into account relativistic effects 106,107 such as spin-orbit coupling, 108 since then one-electron functions may be expressed as complex-valued functions, or as quaternions. 109,110 owever, if one remains within the non-relativistic theory, the file format used by the Molden program 111 (originally developed several decades ago), provides a de facto standard, as such files can be obtained from most program packages.Therefore, we have implemented a DensityEvaluator module that can process Molden files generated by quantum-chemical program packages using Gaussian-type orbitals.This module makes use of PySCF to read in the basis functions, molecular orbital coefficients, as well as occupation numbers and uses these to calculate the electron density, its derivatives, and the corresponding electrostatic potential on the points of an integration grid.
These quantities are provided as GridFunction objects inside PyEmbed.We note that the use of such a non-standardized legacy file format can cause problems, in particular because different program packages implement the Molden file format inconsistently.

Calculation of Embedding Potential
At the core of PyEmbed is the EmbPot module, which performs the calculation of the embedding potential as given in Eq. ( 2).In this module, the nuclear potential of the frozen subsystems is evaluated directly from the corresponding atomic coordinates, whereas the Coulomb potential of the frozen subsystems needs to be provided as input as a suitable GridFunction object.For evaluating the nonadditive kinetic and exchange-correlation potentials, it takes the electron density (and possibly its first and second derivatives) of the active and frozen subsystems as input, again in the form of suitable GridFunction objects.To this end, the functional derivatives of the selected approximate functionals are evaluated with the help of the XcFun library. 113,114 or the nonadditive xc and kinetic energy functionals, both LDA functionals (requiring only the electron densities of the subsystems) as well as GGA functionals (additionally requiring the first and second derivatives of the subsystem electron densities) are currently supported.Note that for GGA functionals, the local embedding potential is evaluated explicitly, such that the matrix elements of the embedding potential can be evaluated directly according to Eq. ( 4) (i.e., the derivatives of the basis functions are not needed).
As a result, this provides a GridFunction representation of the embedding potential, which can be imported into the quantum-chemical calculation for the active subsystem.
The evaluation of the matrix elements of the embedding potential [see Eq. ( 3)] by numerical integration generally needs to be performed inside the respective codes, which also need to provide an interface for importing this embedding potential from file.Such interfaces are currently publicly available in Ams/Adf, Dalton, Dirac, NWChem, Molcas, and QuantumEspresso.We note that with the modular approach described here, the integration grid used inside the quantum-chemical program packages in the case of DFT calculations and those used for the evaluation of the embedding potential and for the evaluation of the matrix elements of the embedding potential according to Eq. ( 4) are generally different.Therefore, both grid have to be chosen appropriately.

DFT-in-DFT and WFT-in-DFT embedding
The exchange of grid-based data has been extensively employed by some of us in applications of WFT-in-DFT, employing a linearized embedding potential.In the case one is interested in energies for ground or excited states, the linearized scheme has been found to be rather accurate, since electron densities from WFT and DFT are often very similar. 84,85,87 Orecent example is found in the simulations of valence binding energies of halides in water droplets, 115 such as the one shown in Fig. 3a.In this work, the halide binding energies have been calculated with Dirac using the relativistic equation of motion coupled cluster for ionization energies (EOM-IP), with the embedding potential being obtained after preparatory DFT-in-DFT calculations with ADF.Due to the need to account for finite temperature effects, these CC-in-DFT calculations have been carried out for a set of 100 snapshots originating from classical molecular dynamics simulations with polarizable force fields.With this setup, the overall computational cost for the calculation of each snapshot was roughly equivalent to that of the DFT-in-DFT calculations (instead of the much more computationally expensive EOM-IP calculation), while results for the halide binding energies were found to deviate from experiment by about 0.1 eV, a value comparable to that obtained with state of the art periodic Green's functions calculations. 116similar protocol, based on an ensemble of preparatory DFT-in-DFT calculations followed by a WFT-in-DFT calculation, has been employed to obtain the core electron binding energies associated with the chlorine 2s and 2p levels for chloride in water droplets as well as chloride and HCl on ice surfaces (see Fig. 3b). 117The WFT-in-DFT calculations were again carried out with Dirac, but now employing the core-valence separation (CVS)  120 For the case of superheavy elements, it was found that embedding potentials were largely transferable between atoms, and provided a more physical description of the cage than the previously employed potentials consisting of a square well.A key implementation detail in PyBertha for achieving high efficiency is the expression of the embedding potential on the basis of fit functions, instead of orbital pairs. 120he applications outlined above also serve to underscore that when different subsystems are treated with different Hamiltonians, exchanging grid-based data has a practical (if not fundamental) advantage over data exchange through orbital manipulations: since water molecules or C 60 are made up of light elements for which spin-orbit coupling is not important, preparatory DFT-in-DFT calculations could safely be carried out with the scalar relativistic ZORA Hamiltonian (decreasing the overall cost of calculations), and the resulting embedding potential imported into four-component calculations without any particular difficulty.We are not aware of any other implementation that allows for such a mixture.
Grid-based data exchange has also been deployed for relativistic DFT-in-DFT calculations of response properties.In a recent application, the X-ray absorption spectra of the uranyl tetrachloride molecule (Cs 2 UO 2 Cl 4 ) has been obtained from the calculation of dipole polarizabilities within the damped response formalism, 121 and in line with what had been found for valence excitations, 87 embedding calculations in which the chlorides are represented by an embedding potential can reproduce very well the spectral features for the uranium M 4 and L 3 edges, and the O K-edge.
Another example of DFT-in-DFT response calculations is that of magnetic properties such as NMR shieldings, indirect spin-spin coupling constants, and magnetizabilities. 97 this case, the embedding was determined self-consistently at the four-component level, followed by that of perturbed densities, required for the calculation of kernel and coupling contributions which are essential for linear response properties.
It should be noted that the Fortran implementation in Dirac closely followed the design philosophy of PyEmbed, with the definition of GridFunction containing grid-based data for particular subsystems, and operations that can be carried out on them.This makes for a natural connection to the PyAdf and PyEmbed frameworks, and easily allows the exchange between subsystems (as Dirac is not capable of handling multiple fragments in the same execution).
The magnetic properties embedding code has been used to investigate the solvent effects on NMR shieldings of the molybdate (MnO 4 2 -) ion, 122 with temperature effects being taken into account by averaging the property calculations over 100 snapshots from AIMD simulations of molybdate with 20 water molecules. 123One key finding from these studies is that despite the operators involved in the magnetic perturbations exhibiting an effective r −2 dependence (suggesting strong locality), the response properties still show non-negligible contributions from longer-range interactions with solvent molecules.These findings, along with the challenges in understanding the interplay between long-range and short-range contributions, have motivated some of us to pursue the development of real-space data analysis and visualization, discussed below.

Real-Time TDDFT-in-DFT Simulations
The applications of FDE discussed above for calculating molecular properties rely in one way or another on perturbation (response) theory.Apart from the fact that response theory will break down in the case of strong external perturbations (e.g., in the simulation of processes such as high harmonic generation), and that they are unable to provide detained information in the attosecond regime, a practical drawback of response theory is the need to calculate first or higher-order derivatives of the embedding potential.
An alternative approach lies in the so-called real-time (rt) methods, in which the timedependent Schrödinger (or Dirac) equation is solved explicitly.In the case of Kohn-Sham DFT, this translates into where from time-dependent orbitals ψ i (r, t) one obtains the time-dependent density matrix D µν (t) and consequently the time-dependent electron density can be expressed as ρ(r, t) = µν D µν (t)χ µ (r)χ ν (r), with χ µ (r) being, for instance, atom-centered basis functions.This equation can be recast as the Liouville-von Neumann (LvN) equation 124,125 i ∂D(t) ∂t = F(t)D(t) − D(t)F(t), with the time-dependent Fock matrix F(t) now includes the embedding potential.Molecular properties can then be obtained from a simulation of the time evolution of the dipole moment or another suitable operator, and this signal can be subsequently Fourier transformed to obtain for instance an excitation spectrum.
One consequence of this formulation for the density-based embedding theories, 53,126 is that the embedding potential now formally becomes time-dependent, even if only subsystem I evolve in time (in analogy to the time-independent FDE case), due to the dependence of the embedding potential on both electron densities through the non-additive term (in what follows we only discuss results employing non-relativistic Hamiltonians, but note that a four-component code for rt-TDDFT-in-DFT calculations based on the PyBertha framework is currently under development).
In practice, recalculating the embedding potential at each time step, especially considering that typical simulations require thousands to tens of thousands of time steps, can be computationally very expensive and, in some cases, prohibitive.However, in a first implementation with atom-centered Gaussian basis sets employing the PyEmbed framework and combining the Adf and Psi4 codes, some of us have shown 126 that it is nevertheless possible to avoid recalculating the embedding potential over a certain (problem-dependent) number of time steps, and with that reduce the computational cost of rt-TDDFT-in-DFT FDE calculations, making it feasible to investigate the effect of the solvent on spectra with it.
In spite of the computational savings introduced in Ref. 126, calculations with relatively large frozen density region remained rather time-consuming, putting application to sys-tems requiring large solvation shells (such as the halides discussed above) out of reach.
At the same time, even for halides FDE had been shown to fail to properly describe the spectra of species such as fluoride, 115 which interact strongly with their first solvation shell.
In order to overcome these two difficulties, we have recently introduced 127 an embedding rt-TDDFT implementation based on the PyEmbed and PyBertha-RT 119 frameworks that combines the real-time 128,129 Block-Orthogonalized Manby-Miller Embedding (BOMME) 130 and FDE 126 approaches.This hybrid approach aims to reduce computational cost further while accurately treating strong interactions in the vicinity of a species of interest (BOMME).Moreover, it can incorporate long-range interactions (FDE), which are crucial for describing excitation processes accurately.
We also explored a simple way to speed up calculations by taking advantage of the GPU offloading capabilities of optimized tensor libraries like PyTorch and Tensorflow.
Our early stage implementation, described in Ref. 127, involved replacing certain Numpy operations, such as Fock matrix formation in time propagation, and constructing the matrix representation of the embedding potential, which are all performed in our own Python codes.Further work still needs to be done to fully realize the potential of GPU acceleration, in particular by enabling GPU offloading in more components of our implementation, as the generation of the non-additive potentials (get nadd pot) with PyEmbed.
The performance analysis summarized in Fig. 4, illustrates the impact of GPU offloading.
In this figure we can see that the total wall time is reduced when going from the CPU-only implementation to that with PyTorch, with a further reduction with Tensorflow.In parallel to these developments, we have recently started to explore how far one can take the idea of not updating the embedding potentials at each timestep.To this end, in Fig. 5 we present results of a simulation revisiting the acetone in water system of Ref. 126 in which we compare two limiting cases: the update of the embedding potential at each timestep as done previously, and the opposite situation in which the embedding potential is not updated at all during the time propagation.These preliminary results seem to indicate that for this system, which is a case of relatively weak interactions between the active system and its environment, this very crude approximation yields results which are quite close to the original ones.This point requires further investigations, as it may be that this approximation will be suitable for a few low-lying transitions but break down for high energy transitions such as those in the X-ray region.
In the case of heavy atoms (Rn, Cn, Fl, Og) encapsulated in C 60 , discussed above for time-independent calculations, we observed the transferability of the embedding potential.We explored here whether that could hold true also for time-dependent case, by considering trifluoracetone, which can be regarded as a modified acetone.We calculated its absorption spectrum utilizing embedding potentials without updates (corresponding to the assumption of weak interactions with the environment for the fluorinated species), one obtained for the acetone in water system, and another calculated for the trifluoracetone in water system.As shown in Fig. 5b, the spectra for the two calculations show a semi-quantitative agreement.The conditions of transferability of embedding potentials in the case of confined molecular systems will be addressed in future works.

Density-Based Many-Body Expansion
Another example that has been enabled by the possibility to import and process gridbased quantities inside PyEmbed is the development of the density-based many-body expansion (db-MBE) by some of us. 112,131,132 Te db-MBE offers an accurate and efficient fragmentation method for the quantum-chemical treatment of molecular clusters.Its starting point is the conventional many-body expansion (MBE), 48,133,134 in which the total energy of a molecular cluster is approximated as [energy-based many-body expansion (eb-MBE)], I is the energy of the I-th fragment, ∆E IJ − E J is the interaction energy for the dimer of fragments I and J, and so forth.
The db-MBE seeks to improve upon the eb-MBE by calculating the density-based correction, where is the many-body expansion of the total electron density, E tot [ρ] is the DFT total energy functional, and E tot is its many-body expansion.With this density-based correction, an improved n-body approximation to the total energy of the molecular cluster is obtained as Previous studies have demonstrated the excellent accuracy of db-MBE in predicting both absolute and relative energies of water and ion-water clusters, even at the two-body expansion level. 132,135 first order (i.e., using only the electron densities of the monomers), the density-based correction can be calculated as J ](r) d 3 r where the non-additive kinetic energy and exchange-correlation functionals are evaluated using suitable approximate density functionals, and where NN is the interaction energy between the nuclei of fragments I and J.
With representations of the monomers' electron densities (as well as their derivatives) and nuclear and Coulomb potentials available, all terms in Eq. ( 11) can be evaluated using numerical integration. 131We have integrated a db-MBE module in PyAdf that makes use of the PyEmbed module as described above and evaluates the density-based correction independent of external quantum-chemical program packages (see top part of Fig. 6).Again, the density functionals are evaluated using the XcFun library. 113,114  higher orders (n > 1), the density-based correction is given by

Real-Space Data Analysis and Visualization
Another important area of application for workflows that involve the exchange of gridbased data is the analysis and visualization of quantum-chemical calculation.Such combined methods can be particularly insightful -they adopt the "give us insight (not) and numbers" mindset, [136][137][138] which has been driving the development of numerous analysis tools to accompany quantum-chemical calculations.Specifically, we highlight techniques applied to molecular descriptors, which can be represented as scalar, vector, or tensor fields and discretized on grids in physical space.Such molecular descriptors can be provided by quantum-chemical program packages by the interfaces described above (passed as GridFunctions, see Fig. 7), and if necessary, the modules described in Section 2 facilitate their processing, manipulation, and export to software tools for their analysis and manipulation.
The most obvious example of such a molecular descriptor is the electron density, ρ(r).
Plotting the electron density in real space as, for instance, a three-dimensional surface for a selected isovalue, can demonstrate the "shape" of a molecule or a specific molecular fragment (depending on the isovalue), point to nuclear positions and identify chemical bonds (see Fig. 7a).More elaborated electron density functions designed to capture specific information can highlight a particular feature of the molecular system.For instance, the electron density Laplacian ∆ρ(r) identifies electron localization domains. 140The reduced density gradient, s(r) ∝ |∇ρ(r)|/ρ(r), captures weak non-covalent interactions much better than the electron density.For this reason, its visualization (popularized as the NCI index 141,142 ) often complements electron density plots in studies on chemical interactions (see Fig. 7b).Such an information-centric approach to chemical simulations bloomed into the conceptual DFT theory 143 and Quantum Chemistry Topology (QCT) 144 domains, which largely focus on developing such molecular descriptors.
Efficient visualization techniques for such fields can be broadly classified as geometrical or feature-based approaches (for more detailed classification schemes, see, e.g., Ref. 145).
Geometrical approaches are based on plotting the data either directly on the grids (as points or vectors) or by various functions that represent the data, for instance, via contours (scalar data) or integral curves (vector data).However, while these methods illustrate the underlying data intuitively and efficiently, they "transfer the burden of feature recognition to human brains". 146Moreover, such methods come with various difficulties, which emerge especially for more complex data, such as the selection of meaningful isosurfaces (scalar data) or the choice of seeding points for streamlines (vector data), to name but a few.
The other group of methods focuses on extracting specific features of the data, which are independent of the choice of their visual representation and, therefore, can lead to more reliable qualitative conclusions.An example is the topology-based visualization, 147 which, for instance, applied to a scalar field, results in the topological skeleton of the field composed of the critical points and separatrices (lines and planes "joining" these points).Such topological methods have been widely adopted in computational chemistry.In particular, the whole QCT domain is based on the analysis of scalar and vector fields through their gradient fields.Specifically, it applies the theory of dynamical systems to the studied field gradient and interprets the extracted topological objects (e.g., critical points) in terms of chemical concepts (e.g., chemical bonds).The well-known example of a QCT method is the topological analysis of the electron density in molecular systems, pioneered by Bader and popularized as the Quantum Theory of Atoms in Molecules (QTAIM). 148wever, the practical applications of QCT methods to chemistry problems are often burdened with technical difficulties, and their value is reduced by ambiguous interpretations.
Modern analysis approaches, such as the Topological Data Analysis (TDA), can address many of such challenges.TDA is a new research field at the interface between mathematics and computer science, rooted in sound theoretical settings such as Morse theory 149 and Persistent Homology (PH). 150TDA constitutes an appealing framework for extracting the topological features in the data, measuring their salience (what allows for discriminating important features from non-essential ones), and for multi-scale data analysis (for examples from chemistry, see Refs.151-156).As such, it is particularly useful to study the manifestation of weak phenomena in the data representing both weak and strong physical features.A good illustration of this aspect is the analysis of non-covalent interactions in small gold complexes, Au 4 -S-C 6 H 4 -S'-Au' 4 , based on the presence of bond critical points of the electron density in the region between interacting atoms: while QTAIM attributed their appearance to the relativistic effects, 157 TDA demonstrated that these critical points are non-essential electron density features. 139 the analysis outcome remains inconclusive, a good strategy is to reconsider the selection of a molecular descriptor for the studied phenomenon -in the cited case, further studies employing the reduced density gradient scalar field confirmed the presence of non-covalent interactions between Au-H irrespective of the inclusion of relativistic effects in the quantum chemistry model (Fig. 7c).The choice of an appropriate function for analysis is also crucial for understanding specific features of descriptors represented by higher-rank tensor fields.For example, the extraction of axial and toroidal vortices in the "omega" scalar field derived from the magnetically-induced current density tensor by TDA tools 158 serves as a good illustration.While the primary focus of real-space analysis revolves around data visualization, other aspects such as quantification (e.g., through data integration), space partitioning, and data clustering can now also be efficiently addressed by TDA tools and libraries implementing them, such as the Topological Toolkit (TTK). 159,160

Conclusions and Outlook
Realizing complex workflows in quantum chemistry that involve different quantum-chemical program packages presents a challenge, particularly in the context of quantum-chemical subsystem and embedding methods.Quantum chemistry software packages are typically developed as stand-alone codes that prioritize numerical efficiency, accuracy, and adaptability to parallel architectures, but they often lack standardized interfaces for interoperability.However, achieving interoperability requires the seamless exchange of complex data between different program packages.
In this context, we have introduced an approach implemented in the PyEmbed module of the PyAdf scripting framework to facilitate the exchange of electron densities, related properties, and embedding potentials between various quantum-chemical program packages.We have opted to utilize grid-based data and developed Python modules specifically for this purpose.Leveraging grid data for information exchange significantly simplifies the interfaces between program packages, as it is agnostic to internal data structures and technical intricacies such as basis functions and basis sets.The PyEmbed module outlined here establishes clear data and file formats for grid-based information exchange, streamlining the integration of program packages into workflows to the creation of clean interfaces that furnish the necessary grid-based data.
While grid-based data simplifies data exchange, standardization of data and file formats, such as adopting a standardized format based on HDF5, would greatly enhance the broader acceptance of the modular approach outlined here.Furthermore, establishing clear and standardized interfaces in quantum-chemical program packages for results data, including molecular orbital coefficients, will further improve interoperability.In the approach presented here, utilizing the Molden file format, despite its antiquated nature and potential limitations in meeting modern software development needs, offers the advantage of enabling the generation of grid-based data outside of quantum-chemical program packages, thus allowing this process to be performed within PyEmbed.This flexibility enhances the versatility of the modular approach and streamlines the integration of grid-based data into quantum-chemical workflows.
The Python modules discussed here are presently incorporated into the PyADF scripting framework but are also designed to be utilized independently.We intend to release PyEmbed as a standalone library in the near future.Additionally, we plan to introduce support for GPU offloading using machine learning frameworks such as PyTorch or Tensorflow.This integration has the potential to yield substantial speed improvements, particularly when processing grid-based data, aligning with our recent efforts to accelerate rt-TDDFT-in-DFT calculations. 127 have showcased numerous applications made possible by interoperable workflows that rely on the exchange of grid-based data between quantum-chemical program packages.
Primarily, PyEmbed facilitates DFT-in-DFT and WFT-in-DFT embedding workflows by integrating various program packages, thereby granting access to a wide array of quantumchemical methods.The modular nature of this approach is especially well-suited for implementing customized freeze-and-thaw schemes involving different program packages.
In a broader sense, PyEmbed streamlines the rapid prototyping of subsystem and embedding methods.This capability has been exemplified in the development of real-time time-dependent density functional theory (rt-TDDFT) embedding schemes, where different program packages must interact at each time step.Similarly, the creation of the density-based many-body expansion (db-MBE) has been made possible solely through the modular and interoperable approach outlined here.These instances signify a larger paradigm shift, wherein the innovation of novel quantum-chemical methods shifts away from monolithic codes to lightweight Python scripts that manipulate and amalgamate externally provided data.We anticipate that this evolving paradigm will gain broader acceptance in the years to come (see also Refs.82, 161).
Finally, the utilization of grid-based data streamlines the incorporation of data analysis and visualization tools into quantum chemistry workflows.This facilitates the development of customized workflows that seamlessly integrate analysis tools to adjust the computational model as required, akin to the modeling cycle proposed in Ref. 162.These adaptive workflows will need to encompass steps related to evaluating the accuracy of models and data, which are pivotal for their optimization and refinement.The integration of real-space data analysis could enhance numerous stages of such efficient modeling cycles.
The integration of grid-based data, including descriptors utilized in QSAR/QSAP 163 or from conceptual DFT, 143 into computational workflows also presents new opportunities for high-throughput applications.These applications can serve as the foundation for machine learning in computational chemistry. 164,165 is the potential of the nuclei in subsystem I, v Coul [ρ](r) is the Coulomb potential of electron density ρ, E xc [ρ] is the exchange-correlation functional, and T s [ρ]

where χ µ
are the basis functions.For further details on the inclusion of the FDE embedding potential in WFT calculations, see Refs.40,84-87.Here, one commonly replaces the dependence of the embedding potential on the electron density of the active subsystem by a fixed density (linearized embedding potential 88 ).

Figure 1 :
Figure 1: Illustration of a modular approach to WFT-in-DFT calculations.Each subsystem is treated in a separate quantum-chemical calculation (shown as rectangles), possibly using different program packages (indicated by the colors of the rectangles).The electron densities from the individual calculations serve as input to PyEmbed, which calculates the embedding potential that is subsequently imported into the WFT calculation of the active subsystem.In freeze-and-thaw calculations, the roles of the different subsystems can be interchanged (indicated by the dashed arrows).

Figure 2 :
Figure 2: Overview of the general design of PyEmbed, for the example of the evaluation of the embedding potential v emb (r from the fragments electron densities ρ 1 (r) and ρ 2 (r) as well as the corresponding Coulomb potentials v

Further
details are described in Ref. 112.Currently, PyEmbed supports the generation of electron densities and related quantities via the DensityEvaluator Molden file interface from Dalton, NWChem, Orca, and Turbomole.Extending this interface to modern, standardized file formats for the exchange of molecular orbital / wavefunction information 103-105 will be straightforward once such interfaces are available in the respective quantum-chemical program packages.

Figure 3 :
Figure 3: Structural models of (a) a halide in a 50-water droplet used in the WFT-in-DFT calculations reported in Ref. 115 (Figure available from https://doi.org/10.5281/zenodo.1477102under a CC-BY license); and (b) HCl adsorbed onto an amorphous ice surface used in the WFT-in-DFT calculations reported in Ref. 117 (Figure available from https://zenodo.org/doi/10.5281/zenodo.10591687under a CC-BY license).In both cases the highlighted volumes indicate the electron densities for the individual subsystems, obtained from the preparatory DFT-in-DFT calculations.
This is mostly due to a significant reduction in the portion of the total wall time spent on building the atomic orbital (AO) matrix representation of the embedding potential (embpot2mat), from 65.8% in the original CPU-based implementation (making it the bottleneck) to 22.6% for PyTorch and further down to 1.8% for Tensorflow.With these speedups and the lack of GPU offloading, get nadd pot then becomes the bottleneck for the tensor libraries, representing 61.4% of walltime for PyTorch and 49.4% for Tensorflow.The percentage of time spent on Fock matrix formation (mo fock mid forward) also becomes a bit more important for the tensor libraries, though they are rather similar (22.6% of walltime for PyTorch and 25.9% for Tensorflow).For Tensorflow other routines such as traceback utils start to become important for the timings, though at this stage we consider them as artifacts of profiling runs, and we intend to explore this matter further once the modifications to get nadd pot are carried out.Taken together, these results provide a good first indication of the potential to speed up python-based scientific code by offloading costly operations onto GPU-aware frameworks, with minimal code changes in the scientific application.

Figure 4 :
Figure 4: Comparison of the performance of rt-TDDFT-in-DFT simulations with and without GPU offloading employing the PyTorch and Tensorflow frameworks.Left: comparison of total simulation time (wall time) required to carry out 56000 time steps, and the aggregate of time spent on evaluating the embedding potential.Right: breakdown of the time spent on the functions: embpot2mat calculates matrix elements of the embedding potential in the electronic structure code; get nad pot calculates the embedding potential on the grid with PyEmbed, mo fock mid forward extrapolates of Fock matrices during the propagation; denstogrid calculates the electron density on the grid.See Ref. 127 for a detailed discussion.The benchmarks were carried out using 64 OpenMP threads on an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz for the CPU runs, and 1 OpenMP thread (on the same type of CPU) and 1 NVIDIA A100-PCIE-40GB GPU for the GPU runs (Figure available from https://zenodo.org/doi/10.5281/zenodo.10591687under a CC-BY license).

Figure 5 :
Figure5: Absorption spectra (a) of acetone in water and obtained using the timedependent embedding potential (green) evaluated at each time step (denoted as iterative FDE) with the same simulation setup as in Ref. 126.The violet trace ("external pot") has been obtained here for simulations that closely match those in Ref.126, with the difference that there is no update of the embedding potential using the time propagation; and (b) of trifluoroacetone in water, embedded in external static potentials.We show in violet the spectrum obtained with a potential originating from a calculation of acetone in water, and in green the spectrum obtained with the with an embedding potential determined for trifluoroacetone in water.

Figure 6 :
Figure 6: Top: Illustration of the db-MBE for the example of a water tetramer.The electron densities and related quantities are fed into the db-MBE module of PyAdf as grid-based data.The db-MBE calculation uses these to calculate the density-based correction energy E (2) corr .Bottom: Error in interaction energy obtained for the eb-MBE and db-MBE at first and second order, using fragment calculations with three different quantum-chemical methods and program packages.The calculations have been performed as described in Ref. 112.

Figure 7 :
Figure 7: The selection of the correct molecular descriptor and the analysis method can significantly affect the analysis results: The topological analysis of the electron density of the Au 4 -S-C 6 H 4 -S'-Au' 4 complex (a) by QTAIM and TDA suggests relativity-induced non-covalent interactions between Au and H atoms (manifested by the absence/presence of the bond critical points in the non-relativistic/relativistic case, respectively, demonstrated on the left and right figures).Their further inspection with the TDA tools (c) shows that these bond critical points are insignificant, and their importance is further reduced with the improvement of the quantum chemistry model.A better molecular descriptor for uncovering weak interactions is the NCI index (b), which unambiguously points to the presence of the bond critical points irrespective of the relativity level applied in QM calculations.Example after Ref. 139.