The development of highly efficient methods for the calculation of electronic coupling matrix elements between the electron donor and acceptor is an important goal in theoretical organic semiconductor research. In Paper I [F. Gajdos, S. Valner, F. Hoffmann, J. Spencer, M. Breuer, A. Kubas, M. Dupuis, and J. Blumberger, J. Chem. Theory Comput. **10**, 4653 (2014)], we introduced the analytic overlap method (AOM) for this purpose, which is an ultrafast electronic coupling estimator parameterized to and orders of magnitude faster than density functional theory (DFT) calculations at a reasonably small loss in accuracy. In this work, we reparameterize and extend the AOM to molecules containing nitrogen, oxygen, fluorine, and sulfur heteroatoms using 921 dimer configurations from the recently introduced HAB79 dataset. We find again a very good linear correlation between the frontier orbital overlap, calculated ultrafast in an optimized minimum Slater basis, and DFT reference electronic couplings. The new parameterization scheme is shown to be transferable to sulfur-containing polyaromatic hydrocarbons in experimentally resolved dimeric configurations. Our extension of the AOM enables high-throughput screening of very large databases of chemically diverse organic crystal structures and the application of computationally intense non-adiabatic molecular dynamics methods to charge transport in state-of-the-art organic semiconductors, e.g., non-fullerene acceptors.

## I. INTRODUCTION

The calculation of electronic coupling matrix elements, or transfer integrals, is at the very heart of charge transfer simulations in biology and materials science.^{1–4} Computational techniques for electronic couplings have significantly matured over the last 20 years, and depending on the requirements of the problem at hand, a large number of methods can now be employed, ranging from high accuracy but time consuming *ab initio* calculations^{5–7} to density functional theory (DFT) calculations [e.g., time-dependent DFT (TDDFT),^{8} constrained DFT (CDFT),^{9–14} projector operator-based diabatization (POD),^{15–17} fragment-orbital DFT (FODFT),^{18,19} and frozen density embedding],^{20,21} to fast semi-empirical density functional tight binding (DFTB),^{6,7,22} and to the ultrafast analytic overlap method (AOM).^{23}

For the modeling of charge carrier transport in soft condensed media (e.g., organic and biological semiconductors), it is important to properly account for the strong thermal fluctuations of the molecules, and several methods have been developed to do that, e.g., kinetic Monte Carlo,^{24–27} transient localization theory,^{28,29} or non-adiabatic molecular dynamics (MD) simulations.^{30,31} A common feature of these methods is the very large number of transfer integrals that need to be evaluated to converge charge mobilities. As a result, one needs to select an appropriate electronic coupling calculation scheme as to strike a balance between computational efficiency and chemical accuracy.

The analytic overlap method for electronic coupling calculation was designed with these requirements in mind.^{23} The idea behind this method is that the computationally expensive DFT calculation of charge transfer integrals can be effectively substituted by very rapid estimations of frontier molecular orbital (FMO) overlap integrals, multiplied by an appropriate scaling coefficient. The FMOs are reconstructed very efficiently in an optimized minimum Slater type orbital (STO) basis set, allowing the ultrafast analytic calculation of FMO overlaps and electronic couplings for a large number of dimer configurations.^{32}

The parameterization of the AOM relies on the compilation of a molecular training set and a series of single molecule DFT calculation for the determination of the FMOs and dimer electronic coupling calculations using an appropriate level of theory as to establish a linear approximation between overlap and charge transfer integrals. The original version of the AOM relied on planewave DFT calculations for both single molecule and dimeric configurations using a limited molecular training set comprised of small pi-conjugated molecules and C60 fullerenes, primarily focusing on carbon-rich compounds with minimal heteroatom contribution to the FMO electronic density.^{23}

From a materials point of view, the AOM has been successfully utilized for the modeling of charge transfer mechanisms for homocyclic pristine and functionalized polyaromatic hydrocarbons (PAHs) for organic electronics.^{33–36} Furthermore, from a theoretical and method development perspective, the AOM is at the heart of the fragment orbital-based surface hopping non-adiabatic MD simulation methodology for charge transport modeling^{34} while simultaneously constituting a robust electronic coupling estimator for pi-conjugated systems.^{7}

Two issues limit the applicability of our previous implementation of the AOM.^{23} First, reference electronic couplings were obtained with planewave DFT calculations in the gas phase, which generated a significant computational overhead for large pi-conjugated organic molecules compared to atomic orbital-based DFT calculations. Second, the chemical diversity of the training molecular dataset with regard to the types of PAH heteroatoms and dimer structural variability was very limited. The focus of this work is to effectively address both of these issues. With regard to the first issue, we take advantage of the recently implemented^{16} POD method^{15} for reference electronic coupling calculations from DFT with atom-centered Gaussian-type orbitals (GTOs). The AOM is recast here to support the DFT FMOs in a GTO basis as input for the FMO reconstruction in a minimum Slater basis. With regard to the diversity of the chemical space covered by the AOM, we utilize the newly introduced HAB79 dataset^{37} containing nitrogen, oxygen, fluorine, and sulfur heteroatoms. This database contains 79 single molecules and 921 dimer configurations, providing a wealth of data for the reparameterization of key parameters of the AOM (that is, Slater decay coefficients), hence enabling ultrafast electronic coupling calculation for a plethora of organic semiconductors found in state-of-the-art organic electronics and photovoltaic applications (e.g., non-fullerene acceptors).

The revised AOM framework provides reparameterized optimal Slater decay coefficients for carbon and nitrogen chemical species and new parameters for oxygen, fluorine, and sulfur [oxygen and sulfur parameters in the original AOM implementation relied on unoptimized Clementi–Raimondi (C–R) reference values,^{38} and fluorine was not included]. Using the new parameterization, a universal scaling constant satisfying the linear relationship between electronic coupling and overlap integrals is established and successfully tested on a molecular dataset taken from the experimentally resolved molecular crystal structures.

## II. COMPUTATIONAL DETAILS

The AOM asserts that the electronic coupling *H*_{ab} between two non-orthogonal diabatic charge transfer electronic states, Ψ_{a} and Ψ_{b}, is, to a good approximation, proportional to the overlap integral *S*_{ab} between the two states, thus satisfying the following linear relationship:

where

We note that a similar relation is used in the Hückel theory to approximate the resonance integral. The difference is that in the latter, the resonance integral refers to the electronic interaction of atomic orbitals centered at neighboring atoms on the same molecule, whereas in the AOM, the coupling matrix element is between two diabatic wavefunctions centered on neighboring molecules.

In most methods for coupling matrix element calculation, including POD, the two wavefunctions Ψ_{a} and Ψ_{b} are approximated by the localized FMOs obtained by diabatization of a dimer electronic Hamiltonian (highest occupied FMOs on the donor and acceptor for hole transfer and lowest unoccupied FMOs for excess electron transfer). These diabatized FMOs are very similar to the FMOs of the isolated molecules, the highest occupied electronic orbitals (HOMOs) or the lowest unoccupied molecular orbitals (LUMOs), but not identical with them due to donor–acceptor interactions in the dimer Hamiltonian (if included) and the presence of orthogonalization tails. Hence, if the (non-orthogonal) FMOs of the isolated donor and acceptor are used for the calculation of *S*_{ab}, as done in this work, “a” and “b” on the left- and right-hand sides of Eq. (1) refer to slightly different orbitals. In practice, this is not a problem because any such differences are absorbed in the *C* constant.

In the following, we present explicit expressions of the FMOs in terms of an optimized minimal Slater basis set that allows for an ultrafast calculation of overlap and of electronic coupling via Eq. (1). The FMOs of the donor and acceptor are initially resolved by means of DFT calculations on the isolated molecules using an atom-centered GTO basis set. For instance, the HOMO of the donor molecule, $\varphi D\u2032N\u3009$ (notation as in Ref. 23), is expressed as

where

is a primitive GTO in its Cartesian form and *N*_{GTO} is the number of GTO basis functions. Similar expansions are made for the HOMO of the acceptor or for the LUMO of the donor and acceptor in the case of excess electron transfer. The FMO is subsequently projected into a STO basis set only with s- and p-type angular momentum contributions,

where *N*_{STO} is the number of STO basis functions and

with $Rn\mu ;r$ being the radial part of the STO orbital,

and $Ylm\theta ,\phi $ being an orthonormal real spherical harmonic,

The GTO-to-STO FMO projection is effectively reduced to the solution of the following linear system of equations:

For the sake of computational simplicity and efficiency, the STO–GTO overlap integrals $\chi j|\psi i$ are calculated using the Gaussian expansion method by O-ohata *et al.*^{39} using expansions of 10, 6, and 8 GTOs for 1s and 2s, 2p, 3s and 3p STO orbitals, respectively.

Defining the projection completeness as

the set of STO exponential coefficients $\mu i$ per chemical species and angular momentum channel is chosen so as to maximize $p\mu i\u22641$. In order to achieve an augmented level of chemical diversity, the projection completeness optimization procedure with respect to $\mu i$ is carried out considering an ensemble of *N*_{mol} different molecules. To this end, we define a GTO-to-STO projection goodness metric function

where $meanp=1Nmol\u2211j=1Nmolpj$ and $varp=1Nmol\u2211j=1Nmolpj$$\u2212meanp2$. Due to the intricate nature of the multidimensional objective function $pM\mu i$, the minimization procedure is carried out by means of a derivative-free Nelder–Mead simplex optimization algorithm.^{40} Using the optimized *μ*_{i} Slater decay coefficients, we can arrive at a set of $ci$ expansion coefficients for every isolated molecule under study, thus concluding the GTO-to-STO projection procedure.

The expansion of the FMOs [Eq. (5)] can be simplified further. The FMOs of pi-conjugated organic molecules are usually formed of linear combinations of p-orbitals pointing in the direction perpendicular to the plane of pi-conjugation. Hence, we project the Slater p-orbitals of each atom *i* participating in pi-conjugation on p-orbitals orthogonal to the plane [*p*_{π,i}] and on two p-orbitals parallel to the plane [*p*_{σ1,i} and *p*_{σ2,i}]. As shown before, the expansion coefficients of the *p*_{σ} orbitals and s orbitals can be neglected to a very good approximation.^{23} Hence, the expansion equation (5) simplifies to

A similar projection is carried out for the HOMO of the acceptor. After renormalization, these orbitals are then used to calculate the overlap in an optimal minimalistic representation of the FMOs,

The AOM estimate for electronic coupling is then given by

where $C\u0304$ is a constant of proportion obtained by a linear fit of $S\u0304ab$ to $Hab$ values from POD DFT reference calculations. Given a set of dimeric configurations, in order to obtain an optimal linear correlation between reference electronic coupling values $Hab$ and AOM calculated overlap integrals $S\u0304ab$, a further tuning of the 2p and 3p STO exponential coefficients $\mu \u0304i$ per chemical species is required. This optimization is carried out using the Nelder–Mead simplex algorithm, aiming to minimize the sum of squared residuals during a least-squares linear fit between $Hab$ and $S\u0304ab$ in a logarithmic scale, effectively calculating the intercept $C\u2032\u0304=logC\u0304$ and minimizing the residual in

With regard to the technical details of DFT single molecule calculations, FMOs of isolated molecules are calculated at the PBE/DZVP-GTH minimal level of theory.^{41,42} Electronic couplings from scaled POD (sPOD) calculations at the PBE/DZVP-GTH level of theory were taken from the HAB79 dataset.^{37} The POD couplings were scaled by a factor of 1.282, as dictated by the correlation analysis using reference *ab initio* values at the minimal active space CASSCF/NEVPT2 level of theory, corresponding to a mean relative signed error (MRSE) of −3.3% and a mean relative unsigned error (MRUE) of 17.9%.^{37} All DFT calculations are carried out using the CP2K package.^{43}

## III. RESULTS AND DISCUSSION

The training of the revised AOM scheme in this work is carried out using the newly introduced HAB79 dataset: a selection of planar heterocyclic PAHs with non-degenerate FMOs, containing nitrogen, oxygen, fluorine, and sulfur heteroatoms with a significant contribution to the electronic density and with either n- or p-type semiconducting behavior.^{37}

The first step in the training procedure is dedicated to isolated molecules and aims at the determination of the optimal STO decay exponents $\mu i$ for atomic species and orbital type and molecule-specific expansion coefficients $ci$, both optimizing the statistical distribution of the GTO-to-STO projection completeness value via the metric described in Eq. (11), with the STO decay exponents explicitly treated during the optimization procedure and the expansion coefficients derived via the solution of Eq. (9) for every $\mu i$ set. We have carried out two types of computational procedures for $\mu i$. In the first procedure, we group the HAB79 molecules in subsets that contain the same atom types actively participating to the FMO density and examine the landscape of the goodness metric of Eq. (11) by means of 2D $\mu i$ grid evaluations. Doing so, we validate the global minima resolved by the Nelder–Mead simplex optimization on selected lower dimensional 2D grids. In the second procedure, we carry out a simultaneous multi-dimensional optimization of $\mu i$ for all molecules of HAB79.

In the first procedure, the HAB79 dataset is divided into five subsets, as depicted in Figs. 1–3. Molecules 1–14 correspond to heterocyclic PAHs with nitrogen heteroatoms having significant FMO density contribution. In the same fashion, molecules 15–23, 24–30, and 31–53 bear oxygen, fluorine, and sulfur heteroatoms with sizable contribution to the electronic density. Finally, molecules 54–79 have multiple heteroatom contributions to the FMOs and are excluded from the 2D landscape analysis of the projection goodness metric.

Considering the cases of X-based PAHs, where X corresponds to a N, O, F, or S heteroatom, the projection goodness metric is evaluated on a grid of $\mu C,\mu X$ STO exponents. This procedure returns unimodal surfaces with a distinctive global minimum, indicating the existence of a unique set of decay exponents that can minimize the projection goodness metric.

The contour diagrams of the projection goodness metric, along with projection completeness histograms utilizing the C–R and optimized STO exponential coefficients for the aforementioned HAB79 subsets, are depicted in Figs. 4 and 5. As a reminder, the C–R coefficients were obtained by Clementi and Raimondi^{38} by means of SCF single atom calculations for the expression of atomic orbitals using a STO basis set—hence may not be optimal for our purpose. In Paper I,^{23} the C–R coefficients were used for oxygen and sulfur as the optimization of these coefficients was not possible due to a lack of molecules in the previously used training set with significant contributions to the FMO density on these atom types. The optimization procedure clearly shifts all projection completeness distributions toward unity and reduces the standard deviation relative to C–R.

In the second procedure, a simultaneous minimization using the complete HAB79 dataset (molecules 1–79) is carried out by means of a Nelder–Mead simplex, resulting in the optimized values reported in Table I.

. | C . | N . | O . | F . | S . |
---|---|---|---|---|---|

Projection $\mu i$ (this work) | 1.4427 | 1.6467 | 1.8588 | 2.1364 | 1.6517 |

AOM $\mu \u0304i$ (this work) | 1.3856 | 1.6171 | 1.5051 | 1.6652 | 1.6411 |

Projection $\mu i$^{a} | 1.3120 | 1.7000 | 2.2266 | N/A | 1.8273 |

AOM $\mu \u0304i$^{a} | 1.0000 | 1.5000 | 2.2266 | N/A | 1.8273 |

C–R^{b} | 1.5679 | 1.9170 | 2.2266 | 2.5500 | 1.8273 |

. | C . | N . | O . | F . | S . |
---|---|---|---|---|---|

Projection $\mu i$ (this work) | 1.4427 | 1.6467 | 1.8588 | 2.1364 | 1.6517 |

AOM $\mu \u0304i$ (this work) | 1.3856 | 1.6171 | 1.5051 | 1.6652 | 1.6411 |

Projection $\mu i$^{a} | 1.3120 | 1.7000 | 2.2266 | N/A | 1.8273 |

AOM $\mu \u0304i$^{a} | 1.0000 | 1.5000 | 2.2266 | N/A | 1.8273 |

C–R^{b} | 1.5679 | 1.9170 | 2.2266 | 2.5500 | 1.8273 |

The performance of the Nelder–Mead derivative-free optimization procedure in minimizing the projection goodness metric objective function is illustrated in Fig. 6. Using the C–R STO exponential decay coefficients as a reference, the new set of optimized $\mu i$ parameters correspond to a more complete GTO-to-STO projection, exhibiting tighter distributions closer to 1.0. In what follows below, we use the set of $\mu i$ values from the Nelder–Mead simplex optimization procedure.

After the calculation of the STO decay coefficients $\mu i$ reported in Table I that optimize the projection procedure, we arrive at a set of STO-based FMO expansion coefficients $ci$ for every molecule in the dataset. Paper I has shown that the set of $\mu i$ coefficients derived from the projection part of this particular methodology cannot fully account for a sufficient linear relationship between overlap and electronic coupling values.^{23} This is also the case for the current work, as demonstrated by the correlation plots depicted in Fig. 7.

As described in Ref. 23, we introduce a second set of STO decay coefficients $\mu \u0304i$ to optimize the linear correlation between electronic coupling and overlap while retaining the optimized expansion coefficients $ci$ obtained from the GTO-to-STO projection with optimized decay coefficients $\mu i$. Due to the complexity of the optimization problem at hand, we utilize a Nelder–Mead simplex algorithm, aiming to minimize the sum of squared residuals during a least-squares linear fit of $logHab$ vs $logS\u0304ab$ according to Eq. (15). The set $\mu \u0304i$ reported in Table I corresponds to the global minimum attained by the simplex optimization procedure. Contour plots of the residual objective function calculated on two-dimensional grids of STO decay coefficients for HAB79 subsets are illustrated in Fig. 8, along with the simplex-derived global minimum position. The optimal linear correlation using the set $\mu \u0304i$ between overlap and charge coupling is shown in Fig. 9 with a universal scaling constant $C\u0304$ equal to 9463 meV. We find that that the linear correlation is excellent for large coupling values (>100 meV) and still rather good for values down to about 10 meV, while we notice a significant spread for smaller coupling values.

The reparameterization of the AOM based on the HAB79 dataset is tested against a set of 30 sulfur-based PAHs. The testing molecular set contains seven molecules already present in the training HAB79 dataset, namely, molecules 31, 32, 38, 40, 41, 42, and 48. The molecules on the test dataset not belonging to the training set are depicted in Fig. 10.

The Cambridge Crystallographic Data Centre (CCDC) deposition numbers for all compounds in the testing dataset are listed in the supplementary material.^{44} It should be noted that, although consisting of a smaller number of configurations compared to the training set, the testing set constitutes a quite challenging benchmark, since it contains dimeric structures taken directly from experimentally resolved structures that are not present in the training set.

As a first test at the single molecule level of description, the GTO-to-STO projection completeness is evaluated for all 23 test molecules not belonging to the training dataset. The statistical distribution of these values using the optimized parameters $\mu i$ from Table I is plotted in Fig. 10. All except one fall in the range obtained for the training set 0.97–0.99, showing excellent transferability of $\mu i$ to the sulfur-containing PAH outside the training set.

Moving on to molecular dimers, a set of test configurations of dimers is extracted from the experimentally resolved molecular crystal structures reported in the CCDC records for the testing dataset. In order to keep dimers with significant charge transfer characteristics, scaled POD coupling calculations at the PBE/DZVP-GTH level of theory are carried out for all dimers having a center-of-mass separation smaller than 1.5 nm, discarding dimers with charge coupling values smaller than 0.1 meV. As a result, we obtain 233 dimer configurations.

The correlation between AOM and scaled POD coupling values is shown in Fig. 10, with the values of the dimers of the 23 test molecules not present in the training set depicted using square symbols and the seven molecules belonging to the training set (but in different dimer configurations than in the HAB79 dataset) depicted using circles. The linear fitting procedure results in a slope of 0.99, thus perfectly correlating the AOM electronic coupling estimate with the reference sPOD DFT calculations. As for the training set, the agreement with reference sPOD couplings is excellent for couplings larger than 10 meV. For lower values, the AOM underestimates sPOD couplings by up to an order of magnitude. The results on the training data (Fig. 9) suggest that the underestimation of small coupling values is not a systematic feature of the AOM but specific to the molecules in the testing set.

Finally, a statistical error analysis is carried out on AOM electronic coupling values, calculating the mean unsigned error (MUE), mean relative signed error (MRSE), mean relative unsigned error (MRUE), and maximum absolute error (MAX). The results of this analysis are summarized in Table II using scaled POD and *ab initio* values for the HAB79 dataset.

. | . | MUE . | MRSE . | MRUE . | MAX . |
---|---|---|---|---|---|

Dataset . | Reference . | (meV) . | (%) . | (%) . | (meV) . |

HAB79 | sPOD | 12.4 | 11.2 | 44.0 | 114.4 |

HAB79 | Ab initio^{a} | 17.8 | −8.4 | 36.6 | 182.5 |

Testing | sPOD | 4.4 | −10.5 | 32.4 | 49.1 |

. | . | MUE . | MRSE . | MRUE . | MAX . |
---|---|---|---|---|---|

Dataset . | Reference . | (meV) . | (%) . | (%) . | (meV) . |

HAB79 | sPOD | 12.4 | 11.2 | 44.0 | 114.4 |

HAB79 | Ab initio^{a} | 17.8 | −8.4 | 36.6 | 182.5 |

Testing | sPOD | 4.4 | −10.5 | 32.4 | 49.1 |

^{a}

Reference 37.

The MRUE of the AOM relative to sPOD is reasonable, considering that this is an average error over electronic couplings that span 5–6 orders of magnitude. A further breakdown of the AOM error for each order of magnitude of reference sPOD coupling is summarized in Table III. As apparent from Fig. 7, the MRUE for couplings larger than 100 meV is very small (8.5%) and still reasonably small for couplings between 10 and 100 meV (24.7%). For couplings smaller than 10 meV, the AOM appears to be less reliable. However, we would like to point out that also for DFT electronic structure calculations, it becomes increasingly challenging to accurately calculate small electronic coupling values and converge them with respect to the basis set and grid size, so some of the errors could be derived from inaccuracies of the reference calculations. Fortunately, the electronic couplings that determine transport in organic (opto-)electronic materials are typically on the order of 10–200 meV, which is the regime that is very well captured by the present AOM parameterization.

sPOD H_{ab}
. | MUE . | MRSE . | MRUE . | MAX . |
---|---|---|---|---|

interval (meV) . | (meV) . | (%) . | (%) . | (meV) . |

(0, 1] | 0.7 | 299.8 | 357.5 | 12.9 |

(1, 10] | 3.9 | 24.0 | 69.3 | 27.0 |

(10, 100] | 12.2 | −13.2 | 24.7 | 94.4 |

(100, 1000] | 18.3 | −4.8 | 8.5 | 114.4 |

sPOD H_{ab}
. | MUE . | MRSE . | MRUE . | MAX . |
---|---|---|---|---|

interval (meV) . | (meV) . | (%) . | (%) . | (meV) . |

(0, 1] | 0.7 | 299.8 | 357.5 | 12.9 |

(1, 10] | 3.9 | 24.0 | 69.3 | 27.0 |

(10, 100] | 12.2 | −13.2 | 24.7 | 94.4 |

(100, 1000] | 18.3 | −4.8 | 8.5 | 114.4 |

The fact that the MRUE of the AOM relative to *ab initio* couplings is smaller than that relative to sPOD is most likely a coincidence. The small statistical deviations reported for the sulfur-based PAH testing dataset constitute a promising feature with regard to the predictive capabilities of AOM calculations.

## IV. CONCLUSIONS

In this work, we have reparameterized the analytic overlap method (AOM) for ultrafast electronic coupling calculation, we extended the parameterization to heteroatoms commonly found in semiconducting organic molecules (namely, nitrogen, oxygen, fluorine, and sulfur), and we made the AOM compatible with DFT electronic coupling calculations that employ Gaussian-type basis sets. The original AOM implementation relied on unoptimized values for oxygen and sulfur, taken directly from the work of Clementi and Raimondi,^{38} and a parameterized value for nitrogen based only on the pyrrole molecule, hence the need for (re)parameterization in this work. The good linear correlation between frontier orbital overlap and DFT electronic coupling that we reported in Paper I^{23} is retained for the much larger and more diverse database investigated in this work. The couplings of the vast majority of the 921 dimers investigated here can be well described by a single linear fit, i.e., by a single constant $C\u0304$. The transferability of the revised parameterization was tested on a set of application-relevant sulfur-containing PAHs that was not included in the parameterization. The error was not larger (in fact, smaller) than that for the training set, which makes us confident that transferability is a general feature of the AOM.

The major advantage of the AOM is the sheer speed with which electronic couplings can be estimated as it requires only the analytic evaluation of overlap integrals using a small number of Slater type orbitals. One AOM coupling calculation on a dimer of a 152-atom molecule (e.g., a porphyrin dimer) typically takes 14 ms on a single compute core, whereas the explicit POD electronic coupling calculation takes 527 s on 144 cores (Intel Xeon CPU E5-2630 v3 @ 2.40 GHz), a speed-up of a factor of 10^{4}–10^{5} and a saving in computing resources of a factor of 100.

The error of the AOM, while reasonably small (given the large chemical space and range of coupling values covered), can be reduced by parameterizing the linear relation, i.e., the $C\u0304$ constant, specifically for a given molecule in question. This usually requires only a small number of dimer configurations, typically not more than a few tens.^{34,35} Another approach to reduce the error, which could be investigated in future work, is to use state-of-the-art machine learning (ML) methods to estimate and correct the error between AOM and DFT coupling estimates (ΔML). Compared to ML methods, the AOM has the advantage that it already knows the complicated nodal shape of the FMOs, which determines electronic coupling, so this important feature no longer needs to be learned. Hence, we expect that ΔML will need a much smaller number of reference data than the ML models alone.

The AOM reparameterization procedure employed here is based on a well-defined derivative-free optimization scheme, the Nelder–Mead simplex method, and an appropriate choice of the objective functions. As a result, the method can be easily extended to cover additional application-relevant heteroatoms, such as other peripheral halogen atoms or PAH core boron, phosphorus, and heavy chalcogen additions—provided the existence and curation of appropriate training molecular datasets.

Finally, the new code for GTO-to-STO projection and for calculation of AOM electronic couplings is publicly available at https://github.com/oziogos/pyAOMlite, along with all reference single molecule and dimer information needed to validate the results presented in this work.

## SUPPLEMENTARY MATERIAL

See the supplementary material for a full list of CCDC deposition numbers for all molecules belonging to the AOM sulfur-based PAH testing set.

## ACKNOWLEDGMENTS

This work was supported by the European Research Council (ERC) under the European Union Horizon 2020 Research and Innovation Program (Grant Agreement No. 682539/SOFTCHARGE). Through our membership of the UK’s HEC Materials Chemistry Consortium [funded by the EPSRC (Grant Nos. EP/L000202 and EP/R029431)], this work used the ARCHER UK National Supercomputing Service (http://www.archer.ac.uk) as well as the UK Materials and Molecular Modeling (MMM) Hub [partially funded by the EPSRC (Grant No. EP/P020194)] for computational resources. The authors also acknowledge the use of UCL Grace and Kathleen High Performance Computing Facilities.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.