Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (∼1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the Periodic Table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the group number alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the Periodic Table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the Periodic Table, a property we expect to be broadly useful for other material domains.
Machine learning (ML) models have been demonstrated as a powerful alternative to conventional computational high-throughput screening1,2 by greatly accelerating chemical discovery,3–8 e.g., with active learning.9–12 It is typically the case at the outset of a discovery campaign that a great deal is known from experiments, computation, and associated ML models about a highly localized region of chemical space, but the most promising directions for improvement lie outside this region. For example, many of the most active catalysts13,14 or single molecule magnets15,16 require the use of rare and toxic 4d or 5d metals, and identification of principles17 for their replacement with 3d metals would be advantageous18,19 in a number of difficult small molecule catalytic conversions (e.g., H2,20–25 CO226, and O227). Thus, ML-accelerated discovery requires models and representations that generalize well and strategies to exploit existing knowledge while data in new regions are scarce.
Despite the need for ML models that generalize across the Periodic Table,3,28–31 ML models have much more commonly been applied in narrow regions of chemical space, such as closed-shell organic (i.e., CHNOF-containing) molecules in the QM9 dataset32 with few exceptions.33 For open-shell transition-metal chemistry, in particular, unique challenges are present in the generation of sufficiently large datasets for ML model training while spanning a large range of metal centers, ligand chemistry, and the cooperative effect the metal/ligand identity play in the resulting spin or oxidation state of the complex.34 While ML-model-accelerated computational discovery has targeted numerous transition-metal catalysts35–39 or functional complexes9,40 and related metal–organic materials (e.g., metal–organic frameworks41–43 and transition metal oxides44), these models have also generally been restricted to a small number of transition metals or modest modifications of the ligands. While only preliminarily demonstrated on open-shell transition-metal complexes,45–47 there is evidence that electronic descriptors such as partial charges or bond valence45 and molecular orbitals46,48,49 or other computed electronic properties50,51 are one path to training transferable ML models on small datasets. Sidestepping ML with direct, computationally demanding simulation [e.g., with density functional theory (DFT)] in large spaces of functional materials and catalysts is plagued by combinatorial challenges34,52–54 that can only be partly reduced through inverse design strategies (e.g., with alchemical derivatives53,55–57 or heuristics58). Nevertheless, all such approaches that require an electronic structure calculation to be completed for property prediction or optimization hamper the large-scale exploration that is required for large-scale material discovery.
For transferable ML models, opportunities and challenges for either conventional 3D-structure-based59–63 or graph-based64–66 representations remain. Given the importance of this challenge, numerous groups have made effort to understand how to transfer information learned on small molecules to larger molecules (e.g., with persistent homology,67 using a smooth overlap of atomic positions (SOAP) kernel or atom centered symmetry functions68 or graph neural networks69) and how to ensure transferability across different datasets70 and to new solid state materials with unseen elements (e.g., using Euclidean neural networks71). Here, we focus on graph-based revised autocorrelations65,72,73 (RACs) that are no-cost products and differences of heuristic atomic properties (e.g., nuclear charge or Pauling electronegativity) on the molecular graph that have been demonstrated for transition metal chemistry. The 155-feature form of RACs (i.e., RAC-155) was used9 to train a multi-task artificial neural network (ANN) that accelerated identification of soluble and high redox potential transition metal complexes (TMCs) for redox flow batteries from nearly 3 M candidates in weeks instead of decades.9 In this or other discovery campaigns for which there is a reasonable degree of similarity between training and test data, RAC-155 and its feature-selected subsets have been used9,35,38,65,74–77 to train ML [e.g., kernel ridge regression (KRR) and ANN] models that exhibit good performance on modestly sized (∼200 to 1000 data points) datasets of DFT properties for TMCs, including around 1–4 kcal/mol accuracy for adiabatic spin-splitting energies,65,76 bond lengths,65,74,78 ionization/redox potentials,65 HOMO energies,75 and catalytic properties.38
Exemplary of how RAC-trained models have limited domains of applicability, Janet et al. observed76 that introducing new elements (e.g., P or As) into an out-of-distribution test molecule uniquely challenged ANN models trained only on first-row, 3d open-shell transition-metal complexes with primarily 2p ligand chemistry. Conversely, introducing diverse metal coordination environments with a wider range of sampled elements into model training greatly improved ANN model generalization,79 reducing mean absolute errors by around 25%–30%. Because the distance in feature space (e.g., in KRR) or latent space (i.e., in the ANN) is a critical component, prediction by ML models of new materials with compositions not well supported by training data is universally challenging. Nevertheless, RACs and other widely used representations60 employ the nuclear charge in the representation, exaggerating differences between rows of the Periodic Table with respect to differences in the group number.80 While some heuristic properties in RACs (e.g., the Pauling electronegativity) more faithfully encode chemical similarity,64 the nuclear-charge-based features are often emphasized in feature-selected subsets, highlighting their importance in model prediction.74
Limitations in the representation will make it challenging for an ML model to learn relationships between isovalent materials with elements from different rows of the Periodic Table, despite clear relationships81–83 in both ligand field strength (e.g., 3p P > S and 2p N > O) and d-filling trends of spin-splitting energies84 between 3d and 4d TMCs. One may anticipate that changing the row of the metal requires a change in the ligand chemical composition (e.g., for nitrogen-coordinating ligands vs phosphorus-coordinating ligands) as well. The existence of these structure–property relationships presents an opportunity for representation improvement. In this work, we tackle the challenge of improving the transferability of no-cost representations by introducing the group number as a heuristic into RACs, an approach previously demonstrated only in closed-shell organic molecules85,86 or in metallic alloys.33,87 We show how this always improves model performance and devise a synergistic strategy for leveraging large amounts of data containing elements from one row of the Periodic Table to improve the learning curves on materials belonging to a new target row. While demonstrated for the first time here on open shell 3d/4d learning tasks, we expect this approach to be broadly useful for other materials.
A. Dataset construction
Mononuclear, octahedral TMCs were assembled from combinations of eight transition metals with ten small, common ligands to explore relationships among isovalent metal centers and metal-coordinating ligand atoms. While the electronic structure of some of these TMCs were discussed in Ref. 84, we reiterate the rationale for their curation in paired datasets for ML on isovalent properties here. The TMCs contain 3d or 4d mid-row transition metals (Cr/Mo, Mn/Tc, Fe/Ru, and Co/Rh) in +2 and +3 oxidation states (Fig. 1). We study the energetically accessible spin states for these metals, which correspond to low-spin (LS), intermediate-spin (IS), and high-spin (HS) states, where the IS and HS states differ by having two more or four more unpaired electrons, respectively, with respect to the singlet or doublet LS state. The earliest Cr(III)/Mo(III) and latest Co(II)/Rh(II) are studied in only two states, a LS doublet and an IS quartet (supplementary material, Table S1).
The ten common ligands were selected for their varying field strengths (i.e., from weak field halides to strong field carbonyl). Six (i.e., three pairs) of these ligands contain isovalent coordinating atoms that differ in the principal quantum number, e.g., 2p vs 3p NH3 and PH3 (Fig. 1). A large set of possible TMC isomers were assembled from up to two of these ligands (i.e., L1 and L2) in four arrangements: (i) homoleptic M(L1)6; (ii) trans M(L1)4(L2)2; (iii) cis M(L1)4(L2)2; and (iv) 5 + 1 M(L1)5(L2) (Fig. 1). All homoleptic (i.e., 440) TMCs and trans or 5 + 1 (i.e., 3960 each) structures were set up for calculation with DFT, whereas only a subset (1,080) of the cis structures were set up (see Sec. III A and the supplementary material, Table S2). For the TMCs in this set, we compute (see Sec. III A) up to three gas-phase, adiabatic spin-splitting energies, i.e., between the LS and HS states, ∆EH-L, as well as between the IS state and either the LS or the HS state (i.e., ∆EH-I and ∆EI-L), which were discussed in part in Ref. 84. In this work, for a subset of the TMCs, we now also compute a vertical ligand dissociation (LD) energy, ∆ELD, the energy required to rigidly remove a single axial ligand (see Sec. III A).
From nearly ten thousand hypothetical TMCs, we obtain a smaller subset for testing the transferability of ML models by pairing isovalent species with differing principal quantum numbers via a filtering procedure that is more restrictive than that used in Ref. 84. For the isovalent metal pairing set, we compare computed properties of pairs of complexes that have metal centers in the same group of the Periodic Table but differ in their principal quantum number [e.g., 3d Fe(II) and 4d Ru(II)] with identical characteristics for all other TMC properties (e.g., spin multiplicity, L1/L2 ligands, and symmetry class). After accounting for data fidelity and convergence rates, we obtain over 1200 pairs of spin-splitting ∆E values depending on the spin states compared with the I–L set roughly twice as large as the H–L or H–I sets. The isovalent metal pairing set of ∆ELD values across spin states contains over 1000 pairs (supplementary material, Table S3).
We also construct an isovalent ligand pairing set, which compares properties of pairs of TMCs that have ligands with metal-coordinating atoms that belong to the same group of the Periodic Table but differ in the principal quantum number (e.g., 2p N and 3p P). Because only six of the ten ligands belong to one of these 2p/3p pairs, heteroleptic TMCs could include between zero and two unique ligands from this subset. For the pairs in the isovalent ligand pairing set, we change all relevant ligands simultaneously while holding any incompatible ligands and all other TMC properties (e.g., metal and oxidation and spin state) fixed. Because we start from a smaller subset of TMCs that have the relevant ligands, isovalent ligand pairing subsets of spin-splitting and ligand dissociation energies are about half of the size of the isovalent metal pairing sets (supplementary material, Table S4). Despite the importance of machine learning to accelerate discovery in transition metal chemistry, our dataset here represents the first systematic approach to identify transferability of properties when changing both the metal center and ligand principal quantum numbers. The extent to which trends exist in this data will be assessed in Sec. IV where we show some trends exist but not enough to rely on simple models (i.e., linear models), thus motivating development of transferable ML representations (Sec. II B).
We employ Moreau–Broto autocorrelations (ACs)72,88 and revised AC variants (i.e., RACs),65 which are representations formed by discrete operations on the heuristic properties of atoms in the molecular graph in analogy to continuous autocorrelations. A standard AC is evaluated over all of the n atoms in a molecule that are d bonds apart,
where P is the chosen property and Pi and Pj are the specific values of the ith and jth atoms, respectively, that are dij bonds apart. The five heuristic properties65,72,73,88,89 we have employed for transition-metal chemistry65,89 include the Pauling electronegativity, nuclear charge, covalent radius, topology, and the identity (i.e., 1) and produce a 5d + 5 dimensional vector. ACs have achieved best-in-class performance for geometry-free descriptor predictions of atomization energies on subsets of QM932 using KRR65 or ANN models76 with diminishing returns observed for depth cutoffs above three,65 although higher depth cutoffs are sometimes used.73,89
Extensions of ACs to enrich metal-local descriptors90 in the feature vector led to the development of RACs, as first introduced in Ref. 65. RACs include centering (i.e., always including in the sum) the property evaluation on the metal (i.e., mc-RACs) or the ligand atom that coordinates the metal (i.e., lc-RACs). The lc-RACs are averaged separately over ligands of specific types, which were referred to as the scope in Ref. 65. In the octahedral complexes studied here, lc-RACs are averaged separately over the equatorial or axial ligands. RACs also introduced the difference of two atomic properties,
which can be non-trivially computed for metal-centered or ligand-centered RACs excluding the identity property for any d > 0 bond paths. In total, the complete RAC set evaluated on the five original heuristic properties consists of 30d + 30 product-based and 12d difference-based RACs for a total feature vector size of 42d + 30. For a typical65 cutoff of d = 3, this yields 156 possible features. For the mononuclear octahedral TMCs that are the focus of this work, excluding five constant features (i.e., corresponding to connectivity around the metal center) produces 151 features. In the original work, four additional descriptors of ligand denticity, oxidation state, spin state, and Hartree–Fock exchange fraction in the DFT functional were added, leading to the name RAC-155.65 In this work, we always include the oxidation state (i.e., for 152 total features) and the spin state is included only for ΔELD (i.e., for 153 total features) but still refer to the set as RAC-155.
Because RAC heuristic properties depend on the properties of individual atoms, we revisit the extent to which such properties encode useful information about chemical similarity of distinct elements. Some, such as the Pauling electronegativity, encode close chemical similarity of distinct elements (e.g., χ: 1.88 for Co vs 1.90 for Tc). Still, the nuclear charge, Z, exaggerates the dissimilarity of elements with differing principal quantum number belonging to the same group (e.g., Z: 27 for Co vs 45 for Rh or 7 for N vs 15 for P). Thus, we expect low-complexity kernel models trained on RAC-155 to exaggerate differences in TMCs with metals or ligands from differing rows of the Periodic Table in comparison to those from the same row. To illustrate this, we carried out principal component analysis (PCA) on a subset (i.e., d = 0 or 1) of mc-RACs in RAC-155 applied to all isovalent metal pairing (i.e., with both 3d and 4d metals) complexes for which we have computed the ∆EI-L property. For these TMCs, the first PC in this subset of RAC-155 primarily distinguishes the metal centers (Fig. 2). Not only are 3d and 4d metals very distant from each other, but the strong variation of the metal Z over the set in comparison to the other heuristics leads to the later 3d Co being the most proximal metal to the early 4d Mo.
We thus add to our original set of RAC heuristic properties the group number, G, which had been shown to be beneficial for increasing transferability in organic molecules.85,86 This property increases the number of feature vector dimensions by 6d + 6 product-based and 3d difference-based RACs. For the d = 3 cutoff used here, this adds 33 features to our original 152- or 153-dimensional set, and we refer to this set as eRAC-185. The G is identical for isovalent species (e.g., 9 for Co and Rh and 5 for N and P) with distinct principal quantum numbers. As expected, PCA in a subset (i.e., d = 0 or 1) of the new G-derived mc-RACs distinguishes TMCs by the metal G in PC1 and by the metal-coordinating ligand atoms in PC2 (Fig. 2 and supplementary material, Fig. S1). As a result, isovalent 3d and 4d metals (e.g., Co and Rh) will be identical in this feature set, and the full G set of features will order TMCs by increasing d filling and groups isovalent 3d and 4d metals close to each other (supplementary material, Figs. S2 and S3). Conversely, both PCA and average Euclidean distances over isovalent metal pairing complexes for the full RAC-155 set indicate large differences between isovalent 3d and 4d metal centers and do not reproduce expected trends from the Periodic Table (supplementary material, Figs. S2 and S3). Because the new features encode intuitive relationships, we expect that incorporating them into ML property predictions should be beneficial for learning tasks that span multiple rows in the Periodic Table. Despite this natural expectation, the extent to which including intuitive relationships is necessary to improve property predictions from models and in which data regimes this relationship is most needed to pass to the model will be addressed in Sec. IV.
A. Electronic structure calculations
Some of the DFT data used for training in this work was previously published in Ref. 84, and all data generation steps followed an established protocol75 for DFT training data for octahedral, mononuclear transition-metal complexes,45 which we reiterate here. DFT geometry optimizations were performed using a development version of TeraChem v1.9.91,92 The B3LYP93–95 global hybrid functional was employed with the LANL2DZ96 effective core potential for transition metals and the 6-31G* basis97 for all other atoms. Singlet calculations were carried out in a spin-restricted formalism, while all other calculations were unrestricted. Level shifting98 was employed to aid self-consistent field convergence with the majority-spin and minority-spin virtual orbitals each shifted by 0.25 Ha. Geometry optimizations were carried out in translation rotation internal coordinates99 using the L-BFGS algorithm. Default tolerances of 4.5 × 10−4 hartree/bohr and 10−6 hartree were applied in the convergence criteria for the gradient and energy difference between steps, respectively.
Initial structures were generated with molSimplify,76,100,101 which uses Open Babel102,103 as a backend, and calculations were automated with molSimplify Automatic Design (mAD).40,75 The mAD calculations run for 24 h of wall time with up to five resubmissions. Geometric criteria75 were applied after each resubmission and at tighter thresholds on the final structure (supplementary material, Table S5). We use the same geometric criterion values as in Ref. 75, which preserve connectivity and penalize excessive asymmetry, except we have tightened the criterion on metal–ligand distance asymmetry for homoleptic complexes (supplementary material, Table S5). Out of 9440 initiated 3d and 4d geometry optimizations, converged structures were obtained for 8729 TMCs, and 6480 TMCs satisfied all geometric criteria (supplementary material, Table S6).
As in previous work,45,75,76 criteria based on electronic structure properties were also used to filter the dataset. For all open shell (i.e., non-singlet) complexes, calculations with deviations of ⟨S2⟩ from its expected S(S + 1) value by 1 μB2 or more were eliminated (196 runs) as were cases where the Mulliken spin on the metal was at least 1 μB lower than the expected total spin (292 runs, supplementary material, Table S6). After all filtering steps, 5992 octahedral 3d or 4d TMCs were retained for spin-splitting property prediction, but because evaluation of this property requires the convergence of multiple spin states, the final dataset sizes were smaller (supplementary material, Table S7). The ∆ELD property was only evaluated for trans or homoleptic complexes with neutral ligands, and filtering steps by the ⟨S2⟩ and Mulliken metal spin criteria were also applied to these complexes (supplementary material, Table S8).
The retained TMCs were then processed through two additional steps to generate the datasets used in ML model training, a more restrictive procedure than that outlined in Ref. 84. First, TMCs were paired to construct the isovalent metal pairing and isovalent ligand pairing datasets, reducing the candidate TMC set sizes by a factor of two (supplementary material, Tables S3 and S4). We then detected cases where the electronic structure was qualitatively distinct between isovalent pairs of complexes, as judged by the metal d-orbital occupations. The occupation of each d orbital was obtained from the NBO v6.0 package,38 and a heuristic cutoff of 3e− for the total difference summed over all d orbitals was used to exclude a pair of calculations (supplementary material, Fig. S4). This procedure eliminates a small number (∼10 to 100 for spin-splitting and 250 for ∆ELD) of outliers (supplementary material, Tables S3 and S4). The final isovalent metal pairing datasets contain ∼300 to 600 pairs of spin-splitting (i.e., I–L, H–L, or H–I) ∆E values and over 1000 pairs of ∆ELD values, and the final isovalent ligand pairing datasets are a little more than half the size of the isovalent metal pairing sets (supplementary material, Tables S3 and S4). The total energies and structures of all TMCs converged in this work along with details of structures eliminated are provided in the supplementary material.
B. Machine learning
We largely follow a previously developed protocol65,75 for training kernel ridge regression (KRR) models to predict properties in open-shell transition-metal chemistry with some modifications noted. KRR models with a Gaussian kernel were trained on both RAC-155 and eRAC-185 representations and isovalent metal pairing or isovalent ligand pairing data using the scikit-learn104 software package. The pairs of complexes in the isovalent metal pairing or isovalent ligand pairing sets were randomly partitioned into 80% train and 20% validation. We use the 20% validation data to select hyperparameters and then report the errors on this set as a lower bound for the test error. The pairs are preserved during the random partitioning, e.g., if an isovalent metal pairing 3d complex is in the training set, its isovalent 4d counterpart will be as well. All inputs and outputs were normalized to have zero mean and unit variance on the training data.
Two types of models were trained: single-row models that only are trained on a subset of the training data (in practice, from 1 to 50 points) from the same row as the learning task and transfer models that are supplemented with all of the training data from the alternate row. Using the 324-pair of complex isovalent metal pairing set for the 3d to 4d learning task as an example, models (see Sec. IV) were trained on a subset of the 259-complex 80% 4d-only training data (i.e., single-row models) or this same subset of points along with the 80% 3d data (i.e., transfer models). Exhaustive grid search from 10−12 to 102 with logarithmic spacing was used to select hyperparameters (i.e., kernel width and regularization strength). For the isovalent metal pairing 3d to 4d example, hyperparameters selected from ten-fold cross-validation errors on 3d data produce models with low test set mean unsigned error (MUE) on the in-distribution 3d complexes but high MUE on the out-of-distribution 4d test set (i.e., the isovalent pairs to the 20% 3d test set, supplementary material, Fig. S5). Thus, we select hyperparameters for each KRR model on the 20% data from the same row as the learning task and hereafter refer to this as the validation MUE, which provides a lower bound of a true unseen test set model error (supplementary material, Fig. S5).
Feature selection was employed as motivated by observations65,75 that it improves KRR model generalization in open-shell transition-metal chemistry. Feature-selected subsets were obtained in a one-shot fashion using LASSO105 or random forest (RF)106 as in some prior work.65 Features with an importance greater than 1% were retained from RF. For LASSO feature selection, five values for the regularization strength were tested (10−2, 10−1, 100, 101, or 102), and all nonzero features were retained from the LASSO model with the lowest validation MUE. The overall feature set (i.e., from LASSO, RF, or the full feature set) that produced the lowest validation MUE was selected for training the KRR model. We report the MUE as the ensemble-average MUE and a credible interval from the ensemble standard deviation. A representative example of the selected hyperparameters, the feature selection method, and MUEs is provided in detail in the supplementary material (Table S9). All other model hyperparameters and feature sets are provided in the supplementary material.
IV. RESULTS AND DISCUSSION
A. Correlations of properties between rows
Because our modified representations are motivated by the assumption that isovalent metals or ligands should exhibit similar properties and structure–property relationships, we first validate this expectation over the isovalent metal pairing and isovalent ligand pairing sets. For the isovalent 3d and 4d transition metals in the isovalent metal pairing set, we observe good correlations between the adiabatic gas-phase spin-splitting energies of the first- and second-row metals (Fig. 3). This correlation holds somewhat better (R2 = 0.9) for the two-electron ∆EI-L and four-electron ∆EH-L than for the two-electron ∆EH-I (R2 = 0.65) values (Fig. 3 and supplementary material, Table S10). As observed in prior work,84 all 4d TMC DFT spin-splitting energies are consistently low-spin shifted with respect to equivalent 3d TMCs, especially for strong-field ligands (i.e., with positive 3d TMC ∆E values). For the ∆EI-L and ∆EH-L cases where the correlations hold best, the 4d TMC ∆E increases by around 1.4 kcal/mol for every 1 kcal/mol increase in the 3d TMC (Fig. 3 and supplementary material, Table S10).
Despite the apparent good linear correlation over the dataset, the MUE of the linear model for predicting 4d ∆EH-L from its 3d TMC value is still large at around 10 kcal/mol, and the maximum unsigned error is even larger at 45 kcal/mol (supplementary material, Table S10). While MUEs are reduced somewhat for ∆EH-I and ∆EI-L (to ∼7 kcal/mol), the range of values are also smaller for these properties. Each individual 3d or 4d metal/oxidation state spans a wide range of ∆EH-L values (∼100 kcal/mol) with metal-specific distributions that reflect changes in ligand field strength in our set (i.e., from weak-field halide to strong-field methylisocyanide) for both 3d and 4d TMCs (supplementary material, Fig. S6). When linear models are fit on each individual metal and oxidation state, the MUE of each linear model is somewhat reduced (e.g., ∆EH-L 4–5 kcal/mol vs 10 kcal/mol for the full set) in part due to the fact that a smaller range of ∆E values is sampled (supplementary material, Figs. S7–S9 and Tables S10 and S11). Subdividing the data by metal/oxidation state and building individual linear relationships as a strategy for TMC design would, nevertheless, be challenged by variability in dataset sizes (supplementary material, Tables S11 and S12). Thus, even though these spin-splitting properties are related for metal pairs in the isovalent metal pairing set, their relationship is not trivial and could be expected to benefit from a flexible ML regression model. A predictive ML model would have the added benefit of not requiring the calculation of one property (e.g., 3d ΔEH-L) to predict the other (e.g., 4d ΔEH-L) that would be required for even a better linear fit.
For the ligand dissociation energy, ∆ELD, introduced in this work, a single linear model obtains good correlation (R2 = 0.91) between 3d and 4d TMCs over all metals, oxidation states, and spin states (Fig. 3). The ratio of 4d:3d ∆ELD values (∼0.93) is close to unity but indicates that ligands bind 4d TMCs somewhat more weakly than equivalent 3d TMCs (Fig. 3 and supplementary material, Table S13 and Fig. S10). Subdividing ligand dissociation energies by metal/oxidation state or spin provides limited benefit because subset comparisons generally have similar 4d:3d ratios and correlation coefficients to the overall set (supplementary material, Tables S13 and S14 and Fig. S11). Whether overall or in metal/oxidation/spin-state-specific subsets, the MUE of the linear fits is somewhat lower (4–5 kcal/mol) than was observed for spin-splitting energies (supplementary material, Tables S11 and S13).
The isovalent ligand pairing set consists of TMCs containing ligands (i.e., six of ten) that vary by isovalent (i.e., 2p vs 3p) changes to the metal-coordinating atom of the ligand. Depending on the metal and oxidation state, the correlation of 2p and 3p spin-splitting energies ranges from good (R2 > 0.9) to poor (R2 < 0.1), where poorer correlations correspond most typically to unchanged spin-splitting energies with the ligand substitution and with earlier (i.e., d3–d4) transition metals (supplementary material, Fig. S12 and Table S15). The influence of the 2p to 3p isovalent substitution on spin-splitting energies depends on the degree of change in ligand field strength (Fig. 4). A consistent trend can be observed over all d5–d7 metal centers in either row for the H2O/H2S and NH3/PH3 substitutions with a good correlation and low standard deviation (Fig. 4 and supplementary material, Fig. S13 and Table S16). For the majority of 4d TMCs, this change corresponds to an increase in spin-splitting energy, but there are some exceptions especially with acetonitrile ligands (supplementary material, Table S17). While average trends can be observed over the dataset, linear relationships differ as the oxidation and spin states being compared are varied. For example, a correlation that works well for Co(II) metal centers does not transfer to either Co(III) or Rh(II) (supplementary material, Table S15). This observation motivates our focus on development of machine learning models.
Conversely, the limited change in field strength among the halides means these TMCs have a poorer correlation for the substitution (supplementary material, Fig. S13 and Table S16). For ∆ELD, trends are less clear, but the 2p to 3p substitution of two axial ligands (i.e., including the ligand undergoing dissociation) generally decreases the ligand dissociation energy (by 12–16 kcal/mol) for the H2O/H2S and NH3/PH3 pairs (supplementary material, Tables S17 and S18). As with the isovalent metal pairing set, there is a clear structure to the isovalent ligand pairing data indicating that properties of 2p complexes are related to their 3p counterparts, but the relationships depend more strongly on the group identity of the metal-coordinating atom.
We briefly summarize the structure–property relationships uncovered for our learning tasks to test ML model and representation generalization across the Periodic Table. For isovalent metal pairing spin-splitting energies, 3d/4d relationships are sensitive to the spin states and ligand field strengths being compared, whereas isovalent metal pairing ΔELD values are comparable across 3d and 4d TMCs. Thus, conclusions about which features are essential for learning spin state ordering trends can be expected to differ from those based on ligand dissociation energy, highlighting the importance of comparing a range of properties when assessing featurizations. For isovalent ligand pairing spin-splitting energies, 3d/4d relationships are dependent on ligand field strength, and ligand dissociation energies are more clearly influenced by the principal quantum number of the dissociated axial ligands. Given the evidence for some relationships between rows of the Periodic Table, ML model training can be expected to benefit from the G heuristics present in eRAC-185 for both learning tasks that involve changing the metal or changing the ligand principal quantum number. Nevertheless, the relative extent to which these features improve property prediction requires further quantitative assessment (Sec. IV B).
B. ML property prediction across the periodic table
Based on the ability of linear models to predict isovalent metal pairing properties as well as some evidence of structure in the isovalent ligand pairing dataset, we expect that data from one row of the Periodic Table can be used to inform predictions on isovalent counterparts. To test this hypothesis, we construct a series of KRR models using both RAC-155 and the eRAC-185 representation. We adopt two strategies to test model generalization (see Sec. III B). While we train all models on a randomly selected and small subset (i.e., from 10 to 50) of training points from the same row as the 20% validation data, we also train “transfer” models that are supplemented with all available (i.e., the full 80%) training data from the alternate row. To unambiguously identify the training data included in each model, we adopt the notation data-representation, where the training data are only from the same row (S) as the test (here, also validation) data or transfer (T) data from the alternate row are also included and representation is the feature set (i.e., RACs or eRACs) used by the model. For example, a T-eRAC model applied to the 3d to 4d ∆EH-L learning task will be provided with all available 3d ∆EH-L data in the isovalent metal pairing set along with a small number (e.g., 10–50) of randomly selected 4d ∆EH-L points from the isovalent metal pairing set and trained on eRAC-185 and feature-selected subsets (see Sec. III B).
We first focus on 3d to 4d prediction of ∆EH-L and ∆ELD in the isovalent metal pairing dataset to assess the relative performance of the possible data–representation combinations. Models incorporating data from both rows [i.e., T-(e)RACs] consistently outperform single-row models [i.e., S-(e)RACs], and simultaneously, eRAC-185-trained models have lower MUEs than models trained with RAC-155 (Fig. 5). The MUE is evaluated as the average error from an ensemble of 25 feature-selected models (see Sec. III B), and observed differences in model performance are typically larger in magnitude than the standard deviation of the errors from the ensemble. The benefit of eRAC-185 is most apparent when limited training data (i.e., <20 points) from the same row as the validation data are used. For example, prediction of ∆ELD from the isovalent metal pairing dataset has a substantially lower MUE (by ∼6.6 kcal/mol) with the T-eRAC model with 20 4d points in the training set than with the S-eRAC model. When we increase the number of 4d training points modestly (i.e., to 50), the relative benefit of this additional data from another row is smaller (i.e., the MUE is lower by only ∼3.5 kcal/mol) but still significant (supplementary material, Table S19).
To investigate if this observed maximum benefit of eRAC-185 when few examples from the same row as the validation data are available is a general phenomenon, we expanded our analysis to models trained to predict ∆EI-L and ∆EH-I with only 20 4d data points. Indeed, we find that transfer models consistently outperform single-row models, and the use of eRAC-185 feature set is synergistic with this benefit (Fig. 6). The T-eRAC models consistently achieve the lowest overall MUEs for predicting all three spin-splitting properties, and those trained on eRAC-185 achieve larger reductions in MUE (∼2 to 5 kcal/mol) than equivalent models training using RAC-155 (∼1 to 2 kcal/mol). For predicting ∆ELD, transfer models significantly improve over single-row models (by ∼6 kcal/mol), but this improvement is observed for both RAC-155 and eRAC-185 representations (supplementary material, Table S19). We attribute the difference in ∆ELD model errors to the observed lack of dependence on the principal quantum number of the metal for this property (see Sec. IV A). Comparing MUEs scaled by the property range in the dataset, T-RAC or T-eRAC models predict ∆ELD with a scaled MUE roughly half of that of the S-eRAC model (0.04 vs 0.07, supplementary material, Table S20). Similar improvements are observed on scaled MUE values for the spin-splitting properties (i.e., from as high as 0.1 to as low as 0.05, supplementary material, Table S20). While the lowest scaled errors from the transfer models are slightly larger than those obtained on models with the same metals in train and test data, they still represent a significant improvement.
To probe the source of benefit of the inclusion of the first 20 inter-row data points, we developed an additional ML model training procedure in which we grouped inter-row data by metal in order to preferentially add one metal at a time to the training set. When the training data includes only a single 4d metal (i.e., Mo, Tc, Ru, or Rh) for ∆EI-L, no improvement in the model is observed (supplementary material, Fig. S14). Rather than reducing MUEs with more data, the MUEs are instead high and constant at ∼15 kcal/mol when one to 15 training points of this type are added (supplementary material, Table S21). If data points are added from multiple additional 4d metals, performance instead improves significantly. In this case, MUEs for predicting ∆EI-L reduce to ∼11 kcal/mol when training data include two 4d metals (supplementary material, Table S21). After adding fewer than 20 points, the T-eRAC model with knowledge of two 4d metals alongside all 3d metals outperforms the S-eRAC model (MUE: 13 kcal/mol) that was trained on 20 random points that included all 4d metals (Fig. 6 and supplementary material, Fig. S15). The MUE is reduced further when a third 4d metal (i.e., to 7.8 kcal/mol) and all 4d metals (i.e., to 6.2 kcal/mol) are included in the training set (supplementary material, Table S21). Thus, we expect that inter-row models perform best when examples of each new metal type are included in the training data by giving the model balanced information about the relationship among all groups of the Periodic Table studied in the validation set. The significance of the identified sample size of 20 inter-row data points maximizing benefits in transfer models is likely attributable to reaching a sample that includes sufficient diversity of isovalent metals or ligands. This result is non-trivial because at such small set sizes, the model has information about all metals but not in combination with ligands and thus must infer the cooperativity of metal and ligand field effects.
Returning to the eRAC-185 representation, we tested whether its observed benefit in the isovalent metal pairing dataset 3d to 4d learning task is general to both the reverse learning task (i.e., 4d to 3d isovalent metal pairing) as well as to changes in ligand chemistry (i.e., 2p to 3p isovalent ligand pairing and 3p to 2p isovalent ligand pairing). Across all of these learning tasks, a combination of transfer models and eRAC-185 reduces validation MUEs for most properties with results and improvements comparable to those observed for the isovalent metal pairing 3d to 4d learning task (supplementary material, Fig. S15). In all learning tasks and properties compared, isovalent ligand pairing 2p to 3p and 3p to 2p ∆EH-I are the only cases where RAC-155-trained models outperform eRAC-185, but these differences are small (0.3 kcal/mol) and within the standard deviation (0.6 kcal/mol) of the 25-model ensembles (supplementary material, Table S22). As in the isovalent metal pairing dataset, the benefit of inter-row, transfer models trained to predict ∆ELD in the isovalent ligand pairing set (∼6 kcal/mol) is observed to be independent of the representation (i.e., RAC-155 or eRAC-185) chosen. For spin-splitting energy prediction in the isovalent ligand pairing set, we again observe that eRAC-185 improves inter-row models with T-eRAC models outperforming single-row models by a slightly larger margin (∼2 to 6 kcal/mol) than T-RAC models (∼2 to 4 kcal/mol). Taken together, we conclude that the use of eRAC-185 in combination with a transfer model trained on a small set of examples from the new learning task provides a near-universal benefit and never significantly degrades the accuracy in comparison to a standard ML model. The converse is thus also true and an important conclusion from our study. Specifically, while some transferability could have been expected given the relationship of properties when changing the metal or ligand identity (Sec. IV A), once sufficient data are available, the benefit of feature engineering diminishes although it never worsens the learning performance. We expect these conclusions to be broadly true across other material classes, but to understand what chemical trends are learned by the models, we carry out further analysis of selected features next in Sec. IV C.
C. Feature analysis reveals learned chemical trends
To investigate the source of the improved performance observed when using eRAC-185, we evaluate the averaged, down-selected feature space. For each model in the ensemble, features are selected independently (i.e., from one-shot LASSO or random forest 1% cutoff, see Sec. III B), so this analysis is on the normalized feature contribution averaged over the KRR models in the ensemble. Features containing the G atomic property are frequently selected in inter-row models for predicting both spin-splitting energies and ∆ELD (see the supplementary material for all features and models). For 3d to 4d prediction of ∆EI-L (i.e., on the isovalent metal pairing set), G eRACs involving the metal center, atoms one bond-path away from the metal, and atoms two bond-paths away from the metal are all selected (Fig. 7). This suggests that the T-eRAC models achieve performance improvements by positioning both metals and ligands closer to their inter-row counterparts in the feature space than standard RACs. In contrast, for 3d to 4d prediction of ∆ELD, G features involving the metal are rarely selected (Fig. 7). This analysis mirrors our observation that the expanded representation did not lead to significant improvements for a ∆ELD inter-row, transfer model (see Sec. IV B) due to the relative lack of dependence of the ligand dissociation on the principal quantum number of the metal.
We analyzed other learning tasks (i.e., beyond 3d to 4d learning of ∆EI-L and ∆ELD) to confirm that the benefit of using eRAC-185 was due to the presence of G features in feature-selected subsets. Prior to feature selection, the G features comprise 18% of the initial eRAC-185 set, and they are frequently retained after feature selection regardless of the learning task or property (supplementary material, Fig. S16). For the ∆EI-L and ∆EH-I properties where G-based features were most beneficial in model performance (see Sec. IV B), G contributions are enriched (to ∼23% of features) after feature selection. For ∆ELD and ∆EH-I, cases where G-based features provided less benefit, they are retained at a comparable frequency (∼18% of the down-selected features) to others in the set, confirming that these features are neither particularly detrimental nor beneficial (supplementary material, Table S23).
As in previous work,38,65,75,77 we also analyzed the selected features to reveal qualitative chemical trends in the dataset. We select ∆EI-L 3d to 4d learning as a representative property for spin-splitting energy trends (supplementary material, Fig. S16). Features retained by models trained on ∆EI-L are strongly metal-local with 80% corresponding to the metal or its immediate coordinating atoms. Since this result recapitulates observations from prior work65 that emphasized the metal-local nature of spin-splitting energy prediction, the tailored representation preserves known trends (Fig. 7 and supplementary material, Table S24). We next analyze the features selected for predicting ∆ELD, which is a property for which we had not previously trained ML models. In this case, the retained features are 79% distal (i.e., with information from atoms two or more bonds away) from the metal (Fig. 7 and supplementary material, Table S24). The high weight of ligand-based features and more global information is consistent with our observations of reduced dependence on the principal quantum number of the metal for ∆ELD prediction (Fig. 3). Thus, feature analysis suggests that it is possible to design ligand binding strength (e.g., for catalysis) in a metal-row-independent fashion (e.g., predicting 4d properties from data obtained on 3d TMCs). Conversely, because the spin state is dictated by both the metal center and only the immediate coordinating atoms of the ligand, inter-row transfer models trained on eRACs and minimal but chemically diverse TMCs79 should enable accelerated prediction of 4d TMC spin-splitting in cases where only 3d TMC data are available.
Because the property prediction obtained from a KRR model depends on a combination of feature-space distances and the kernel width, the relative distances between TMCs in a representation are critical for understanding model performance. We thus quantify the Euclidean norm distance between complexes with differing metals in feature-selected subspaces that have been averaged and normalized over the 25-model ensembles used for property prediction. In the features selected in T-RAC models, these Euclidean distances do not represent what we would expect from the arrangement of elements on the Periodic Table (supplementary material, Table S25). For example, in the 3d to 4d prediction of ∆EI-L, Cr is positioned closer to Fe and Co in the feature space than it is to the more chemically similar Mn. Furthermore, Cr is comparatively distant to all of the 4d metals, artificially reducing its influence on predicting their properties (Fig. 8). In contrast, the features retained in T-eRAC models rearrange the feature space to more closely match intuition from the Periodic Table with Cr in closest proximity to Mn and Mn in closest proximity to Cr and Fe. The 3d and 4d metal pairs are also shifted closer together in T-eRAC models. For example, 3d Cr is closer to isovalent 4d Mo than to a dissimilar 3d metal such as Co (Fig. 8). The rearrangement of the feature space is observed for other metals with 3d/4d pairs shifted an average of 13% closer in T-eRAC models in comparison to equivalent T-RAC models (supplementary material, Table S25). These results suggest that the reduced prediction error when using eRAC-185 (see Sec. IV B) can be attributed to this representation encoding chemical trends more efficiently.
To demonstrate the value of using the eRAC representation in chemical discovery, we apply transfer learning models with both RACs and eRACs to new out of sample sets of TMCs. For the out of sample sets, we construct 4d TMCs from monodentate OHLDB79 ligands from previous work. The OHLDB ligands consist of up to two heavy (i.e., C, N, O, P, or S) atoms in combination with hydrogen atoms enumerated in an exhaustive fashion and scored against heuristic rules.79 The goal of the OHLDB set was to generate a set of ligands distinct from more commonly studied ligands and thereby to enable chemical discovery of novel metal-local environments. We now evaluate our best-performing IMP ΔEH-L transfer learning models on never-before-studied 4d TMCs of OHLDB ligands as a representative chemical discovery example. All structures are provided in the supplementary material. For consistency with our established approach, the best IMP transfer models were trained with 3d data and 20 seeded 4d data points (i.e., all from the initial ligand set) and applied to 70 new 4d homoleptic OHLDB complexes. Prediction errors over this set are systematically higher for complexes containing 3p atoms two bonds distant from the metal, none of which are present in any of the training data (supplementary material, Figs. S17 and S18). We thus focus our analysis on the subset of 52 data points that exclude more metal-non-local 3p atoms in ligands. Comparison of the T-RAC model and T-eRAC models for this subset emphasizes the advantage of using the group number as a descriptor (supplementary material, Fig. S19). Despite having the same number of seeded 4d data points, the T-eRAC model predicts double the number of points (i.e., 26 vs 14) to lower than 10 kcal/mol error (supplementary material, Fig. S19). The T-eRAC model also predicts all points with a lower overall MUE (21 kcal/mol vs 27 kcal/mol). For example, the T-eRAC model correctly predicts Tc(III)(methylamine)6 to have a HS ground state with a modest error (i.e., 5 kcal/mol), whereas the T-RAC model error is over three times larger (19 kcal/mol, supplementary material, Fig. S19). We emphasize that OHLDB ligands are particularly challenging for our models to predict because they represent a set of ligands designed to increase diversity from more commonly studied ligands. Thus, the T-eRAC approach is expected to be most beneficial when starting to collect data in spaces where chemistry is most distinct. We thus conclude that while many ML tasks have relied on large datasets and narrow chemical compositions, this work has highlighted the minimum new data needed to develop transferable ML models across a wider range of chemical compositions. While demonstrated here on mononuclear transition metal complexes with 3d and 4d metals and 2p and 3p ligands, we plan to next extend this approach to other heavier (i.e., 5d and 4f) elements as well as more diverse ligand chemistry. In all such cases, this work has revealed opportunities in the small data limit to carry out chemical exploration once a minimum amount of data is used in combination with transferable ML representations.
For many materials design challenges in transition metal chemistry, it is common to find one element that works well for a chemical property or a transformation, even if it is preferred to use another due to price or scarcity of the element. Furthermore, when carrying out ML-accelerated chemical discovery efforts, one may start with a specific range of chemical compositions but then aim to change course. It is not known a priori how much prior data can be used in these cases. It is also unknown the extent to which chemical observations from one element (e.g., 3d vs 4d) for either the metal or the ligand (e.g., 2p vs 3p) within the same group of the Periodic Table should be transferable. Intuitively, chemical relationships between neighboring elements in the Periodic Table are expected, so efficient strategies for ML-driven discovery should leverage representations that preserve these similarities. To thus tackle these challenges, we carried out the first study of the extent to which ML models can best leverage transferability of properties between 3d and 4d transition metal complexes with 2p and 3p ligands.
After confirming the strong inter-row structure–property relationships for spin-splitting energies and ligand dissociation in isovalent 3d/4d metals and 2p/3p ligands in large sets (∼1000 pairs) of data, we identified a strategy for increasing ML model transferability. We introduced eRAC-185, a tailored representation that blends together the monotonically increasing nuclear charge with the periodic in nature group number heuristics. We showed how the addition of the G heuristic property in eRACs altered feature-space distances to encode a more intuitive degree of similarity among isovalent elements essential for kernel-based ML models.
Next, we demonstrated the synergistic value of eRACs alongside a transfer learning approach to leverage inter-row data in sets containing isovalent metals or ligand-coordinating atoms in transition-metal chemistry. To assess the potential improved performance of eRAC-185, we trained KRR models using data from either only the same row as the prediction task or both rows of the Periodic Table. While including data from the alternate row always reduced prediction errors, the eRAC-185 model errors were most substantially improved in the limit of very low data (∼20 points). The model error improvement was strongest when the relationship being learned deviated most from parity. This highlights how transferable representations will be essential to improve ML model generalization from one row of the Periodic Table to another especially when carrying out chemical discovery where limited prior knowledge is available.
To identify differences in models trained on eRAC-185, we analyzed the distance and heuristic atomic properties retained during feature selection. Feature-space distances of TMCs grouped and averaged by metals confirmed that eRAC-185 subsets faithfully represented trends expected from the Periodic Table, causing elements from the same or adjacent group to influence KRR predictions most strongly. These representations, models, and approaches can be leveraged to accelerate the discovery of earth-abundant catalysts or materials from known materials featuring scarcer elements. We expect our strategies for inter-row design and expanding transferability to be broadly useful to other material challenges across the Periodic Table.
See the supplementary material for spin multiplicity definitions; subset of combinations studied for cis TMCs; statistics for filtering and pairing structures as well as 3d/4d comparisons to obtain properties; PCA of RAC and eRAC representations; comparison of the Euclidean distance between metals by the featurization type; geometry metrics used for geometric filtering; number of compounds failed at each filtering step with reasons; details of NBO d orbital check elimination; cross-validation heat maps for hyperparameter selection; KRR model hyperparameters by the model number; linear models for inter-row property prediction for the isovalent metal pairing dataset, overall and broken down by metal; histograms of property distributions for spin-splitting energy; inter-row parity plots for properties by the metal and oxidation state; ligand dissociation energy distribution broken down by the metal and oxidation state; frequencies of ligand dissociation energy by metal, oxidation, and spin state in the isovalent metal pairing and isovalent ligand pairing datasets; parity plots for spin-splitting energy in the isovalent metal pairing set; changes in the property by ligand mixing for three spin-splitting energy properties; linear models for spin-splitting energies in isovalent ligand pairing dataset split by metal, oxidation state, and type of spin-splitting energy; change in spin-splitting energy as a function of ligand mixing in the isovalent ligand pairing dataset; quantification of ligand substitution in the isovalent ligand pairing dataset quantified by the ligand type; differences in ligand dissociation energy for compounds that vary only in equatorial ligands or only in axial ligands; quantification of test set errors for KRR model ensembles for various inter-row learning tasks; model performance by ordered metal addition for 4d metals; quantification of errors with ordered metal addition of 4d metals; mean unsigned errors for various inter-row learning schemes over the isovalent metal pairing and isovalent ligand pairing datasets; stacked bar charts for the feature character for different properties obtained during each scheme of inter-row learning and tabulation of the corresponding feature character by the property and bond depth; quantification of feature space distances for feature selected subspaces of 3d/4d models in the isovalent metal pairing dataset (PDF); structures and total energies of all 3d and 4d TMCs studied in this work; reasons complexes were not included in the dataset; spin-splitting values for all pairs of 3d/4d TMCs; ligand dissociation energies for 3d/4d TMCs; total energies of all TMCs; OHLDB out of sample predictions; and KRR model performance and selected features and ensemble quantification (ZIP).
The authors acknowledge initial partial support in data generation steps and machine learning representation development (for A.N.) by the U.S. Department of Energy under Grant No. DE-SC0012702. This work was also supported by the Office of Naval Research under Grant Nos. N00014-17-1-2956 and N00014-18-1-2434 (D.R.H. and J.P.J.), Grant No. N00014-20-1-2150 (C.D., D.R.H., and N.A.), and DARPA Grant No. D18AP00039 (for N.A., C.D., and A.N.). Ligand dissociation studies were supported by the National Science Foundation under Grant No. CBET-1704266 (for A.N.). A.N. and N.A. were partially supported by a National Science Foundation Graduate Research Fellowship under Grant No. 1122374. H.J.K. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and an AAAS Marion Milligan Mason Award, which supported this work. This work was carried out, in part, using computational resources from the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. ACI-1548562. The authors thank Adam H. Steeves for providing a critical reading of the manuscript.
Conflict of Interest
The authors have no conflicts of interest to disclose.
D.R.H. and A.N. contributed equally to this work.
The data that support the findings of this study are available within the article and its supplementary material.