The experimental search for new thermoelectric materials remains largely confined to a limited set of successful chemical and structural families, such as chalcogenides, skutterudites, and Zintl phases. In principle, computational tools such as density functional theory (DFT) offer the possibility of rationally guiding experimental synthesis efforts toward very different chemistries. However, in practice, predicting thermoelectric properties from first principles remains a challenging endeavor [J. Carrete et al., Phys. Rev. X 4, 011019 (2014)], and experimental researchers generally do not directly use computation to drive their own synthesis efforts. To bridge this practical gap between experimental needs and computational tools, we report an open machine learning-based recommendation engine (http://thermoelectrics.citrination.com) for materials researchers that suggests promising new thermoelectric compositions based on pre-screening about 25 000 known materials and also evaluates the feasibility of user-designed compounds. We show this engine can identify interesting chemistries very different from known thermoelectrics. Specifically, we describe the experimental characterization of one example set of compounds derived from our engine, RE12Co5Bi (RE = Gd, Er), which exhibits surprising thermoelectric performance given its unprecedentedly high loading with metallic d and f block elements and warrants further investigation as a new thermoelectric material platform. We show that our engine predicts this family of materials to have low thermal and high electrical conductivities, but modest Seebeck coefficient, all of which are confirmed experimentally. We note that the engine also predicts materials that may simultaneously optimize all three properties entering into zT; we selected RE12Co5Bi for this study due to its interesting chemical composition and known facile synthesis.
I. INTRODUCTION
For any materials problem, breaking out of “local optima” in composition space to discover entirely new chemistries remains a notoriously difficult challenge.3 Many of the most notable materials classes under investigation today—from NaxCoO2 derived thermoelectrics4 to iron arsenide superconductors5—were discovered fortuitously. As a result, experimental efforts often gravitate toward incrementally improving known chemistries (via doping, nanostructuring, etc.), as these efforts are more likely to bear fruit than high-risk searches through chemical whitespace for entirely new materials.
The consequence of research communities’ focus on further exploitation of known chemistries rather than exploration of unknown chemistries is that much of composition space simply remains uncharacterized. We illustrate the remarkable chemical homogeneity of most thermoelectric materials investigated to date by plotting each material from the thermoelectric database of Gaultois et al.6 on the periodic table based on the composition-weighted average of the positions of elements in the material (Fig. 1). The tight cluster of previously investigated chemistries is, as expected, dominated by chalcogenides and p-block elements such as Sn and Sb. In contrast, we also show the positions of Gd12Co5Bi and Er12Co5Bi, materials derived from our recommendation engine, which we characterize as a new class of thermoelectrics in this work. These materials are almost pure intermetallics, in sharp contrast to thermoelectric compounds investigated to date (Fig. 2). The objective of our recommendation engine is to directly enable experimental researchers to rapidly identify new materials, such as RE12Co5Bi, that are very distinct from known compound classes, and worthy of further study.
Fig. 1. Most known thermoelectric materials lie in a tight cluster in composition space (black and blue dots; blue dots have chemical formulae explicitly labelled). The recommendation engine presented here allows the identification of new thermoelectric materials families that are well outside the existing composition space of common systems in the Gaultois et al. database.6 In particular, we report the characterization of RE12Co5Bi (RE = Gd, Er; orange squares), which are chemically and structurally distinct from known thermoelectrics.
Fig. 2. The strongly intermetallic RE12Co5Bi compounds we report here lie far outside the norm for metal loading among collected thermoelectric compositions in the Gaultois et al. database.6 The recommendation of these materials was neither the result of simple interpolation between known compounds nor obvious from a strict chemical intuition standpoint.
A. A materials recommendation engine
Our recommendation engine is a machine learning-based approach7,8 for efficiently driving synthetic efforts toward promising new chemistries. We have trained a machine learning model to make a confidence-level prediction of whether the (1) Seebeck coefficient, (2) electrical resistivity, (3) thermal conductivity, and (4) band gap of input materials are within acceptable ranges for thermoelectric applications. We define these ranges as follows: (1) |S| > 100 μV K⁻¹; (2) ρ < 10⁻² Ω cm; (3) κ < 10 W m⁻¹ K⁻¹; and (4) Eg > 0 eV, all at room temperature.
For each range of thermoelectric property, the engine gives a confidence score between 0% and 100% that a given material’s measured value for that property at room temperature will fall within the targeted range. We would classify any material for which the answer to all these questions is likely “yes” as a potentially promising thermoelectric that may warrant further study. The purpose of our recommendation engine is thus neither to make quantitative predictions of these thermoelectric properties nor to definitively identify record-setting compounds—these remain open challenges for future work. Rather, the engine is intended to greatly augment the chemical intuition of experimental researchers working on materials discovery. In particular, we have found that our model’s ability to screen vast numbers of possible compositions and short-list interesting candidates can inspire materials syntheses that would not have been obvious a priori.
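As a minimal sketch of the screening logic described above, the four target windows can be written as simple predicates, and a candidate flagged when every per-property confidence is high. The function names, the dictionary keys, and the 0.5 cutoff are assumptions made for illustration; the text only says that the answer to each question should be "likely yes".

```python
# Target windows quoted in the text, all at room temperature.
# Units: Seebeck in uV/K, resistivity in ohm*cm,
# thermal conductivity in W/(m*K), band gap in eV.
TARGETS = {
    "seebeck": lambda s: abs(s) > 100.0,
    "resistivity": lambda rho: rho < 1e-2,
    "thermal_conductivity": lambda kappa: kappa < 10.0,
    "band_gap": lambda eg: eg > 0.0,
}


def in_target_windows(measured):
    """Map each property name to True/False for a dict of measured values."""
    return {name: test(measured[name]) for name, test in TARGETS.items()}


def is_promising(confidences, cutoff=0.5):
    """Hypothetical decision rule: every confidence must exceed `cutoff`."""
    return all(confidences[name] > cutoff for name in TARGETS)


# Illustrative inputs only; these are not values reported in this work.
example_measured = {
    "seebeck": -150.0,
    "resistivity": 2e-3,
    "thermal_conductivity": 3.0,
    "band_gap": 0.4,
}
example_confidences = {
    "seebeck": 0.9,
    "resistivity": 0.8,
    "thermal_conductivity": 0.95,
    "band_gap": 0.3,
}

windows = in_target_windows(example_measured)
flagged = is_promising(example_confidences)
```

In this toy case every measured value sits inside its window, yet the candidate is not flagged because the band gap confidence falls below the assumed cutoff. A ranked list, rather than a hard cutoff, is closer to how the engine is described later in the text.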
Machine learning models such as those developed here differ considerably from atomistic simulation approaches such as density functional theory (DFT). DFT is already a well-established tool for accelerating materials discovery,9,10 and high-throughput methods have already been applied successfully in the search for new thermoelectric materials.11–15 Nevertheless, accurately predicting thermoelectric properties from first principles remains challenging.2 Recent works, for example, use the BoltzTraP code16 to estimate the Boltzmann transport properties of candidate materials based on DFT-predicted band structures.17 The nascent field of materials informatics—algorithmically extracting new knowledge by mining large-scale materials databases—has emerged alongside these traditional physics-based simulations as a key means of predicting materials behavior.18,19
The present machine learning-based recommendation engine looks for empirical, chemically meaningful patterns in experimentally reported data on known thermoelectric compounds to make statistical predictions for the performance of new materials. Further, while efforts such as the Materials Project are making the results of DFT calculations more accessible to the experimental materials community than ever before,3 most experimentalists still are not able to run DFT calculations continually to inform their laboratory work in real-time. To make predictive computation more widely accessible, we make the results of the present work available as a web application (http://thermoelectrics.citrination.com) that any materials researcher can utilize to request real-time predictions and search for new thermoelectric candidates.
II. METHODS
A. Modelling and informatics
Here we describe the approach used to construct the recommendation engine. Our engine is an example of materials informatics,20,21 or the application of empirical machine learning methods to the prediction of materials behaviour. Any machine learning approach for materials relies on three key ingredients: training data, descriptors, and choice of algorithm. Training data are the example sets from which the machine learning approach should extract meaningful chemical trends. Descriptors are the low-level characteristics of materials (e.g., crystal structure, chemical formula, etc.) that might correlate with materials properties of interest. Specifically, descriptors are either numerical (e.g., average atomic number Z) or categorical (e.g., crystal structure = perovskite) variables that enable us to “vectorize” materials in such a way that they become amenable to machine learning techniques. Finally, learning algorithms interrogate descriptor-vectorized training data for relevant patterns.
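To make the idea of "vectorizing" a material concrete, the sketch below builds a tiny descriptor vector from one numerical descriptor (composition-weighted average atomic number) and one categorical descriptor (a one-hot structure label). The helper names, the formula parser, and the list of structure labels are assumptions for illustration; they are not the engine's actual descriptor set, and, as noted in the Discussion, the engine itself does not use crystal structure as an input.

```python
import re

# Atomic numbers for the handful of elements used in this example.
ATOMIC_NUMBER = {"Gd": 64, "Er": 68, "Co": 27, "Bi": 83}


def parse_formula(formula):
    """Parse a simple formula such as 'Gd12Co5Bi' into {element: count}.

    Handles only element symbols followed by optional numbers
    (no parentheses or hydrates).
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    return counts


def mean_atomic_number(formula):
    """Composition-weighted average atomic number Z."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return sum(ATOMIC_NUMBER[el] * n for el, n in counts.items()) / total


def vectorize(formula, structure,
              known_structures=("perovskite", "skutterudite", "other")):
    """Return [mean Z, one-hot structure label...] as a flat list of floats."""
    one_hot = [1.0 if structure == s else 0.0 for s in known_structures]
    return [mean_atomic_number(formula)] + one_hot


vector_gd = vectorize("Gd12Co5Bi", "other")
vector_er = vectorize("Er12Co5Bi", "other")
```

Each material becomes a fixed-length list of numbers, which is the form a learning algorithm can consume regardless of how many elements the formula contains.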
In this work, the training set comprises a large body of experimental thermoelectric characterization data,6 experimental materials property data from the NIMS MatNavi database, and first principles-derived electronic structure data.3,22 These data are publicly available via the Citrination platform (http://www.citrination.com), the Materials Project API (http://www.materialsproject.org/open), and NIMS (http://mits.nims.go.jp/index_en.html). These data consist of the Seebeck coefficients, thermal conductivities, electrical conductivities, and band gaps measured for thousands of materials as a function of temperature and a variety of other metadata conditions. Our model uses these input data to learn interesting chemical trends that could be exploited to design new materials. As large, high-quality training data sets are scarce in materials science relative to the biological sciences, where bioinformatics has become a standard tool, we urge the materials community to consider contributing to data infrastructures (Citrination, Materials Project, NIST’s DSpace repository, EU’s NoMaD, and others) that together will significantly expand open access to data for materials researchers.
Descriptors are the second key ingredient in materials informatics. The scientific literature around designing descriptors for materials has grown substantially in just the past several years.23,24 Indeed, recent work has shown that the predictive power of machine learning models for materials is strongly dependent upon the selected descriptor set.25 Our engine relies upon a tuned blend of descriptors designed in-house and drawn from a variety of sources.2,7 By way of example, as materials scientists, we recognize that the periodic table contains a tremendous amount of information about how the elements behave and interact. We thus pre-bias our machine learning models with such knowledge (e.g., the d block of the periodic table is metallic; Li and Na are chemically very similar but not identical; and the lanthanides behave similarly in ionic compounds). This step allows us to create predictive models with data sets that have thousands (rather than tens or hundreds of thousands) of examples.
The ability of materials informatics techniques to extract signal from materials data is strongly dependent on effective descriptor design and access to large quantities of training data. With respect to the latter point, machine learning algorithms are only able to identify patterns that are (at least sparsely) sampled by the training data. An important manifestation of this requirement in the context of the present work is modeling doping. Doping represents a minute change in materials composition (on an atomic percentage basis) but may result in orders of magnitude changes in properties. As most of the training data used in this work correspond to undoped bulk compounds, we expect the recommendation engine to perform best in identifying new such bulk systems, which could potentially be further optimized via doping. Given more training data, we could readily extend the current work to dilutely doped thermoelectric systems.
Finally, our recommendation engine is built using the so-called random forest algorithm.26 This algorithm constructs a large number of decision trees, all trained on slightly different subsets of the training data. Random forest is an ensembling technique, which takes advantage of the fact that a collection of “weak” learners such as decision trees can, in concert, model extraordinarily complex nonlinear behaviour. An example rule that a single decision tree might learn is that if a material contains two elements with very different electronegativities (e.g., Na and Cl), that material is likely to have a large band gap. Of course, the thermoelectric phenomena we seek to model here are substantially more subtle, and thus a large random forest of decision trees is useful in untangling the underlying physics. We refer the reader elsewhere2,7,27 for more detailed discussions and tutorials on how to apply random forests to materials data.
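The ensembling idea can be illustrated with a deliberately stripped-down stand-in: many depth-one decision trees (stumps), each fit on a bootstrap resample of the data and a randomly chosen feature, with the confidence taken as the fraction of trees voting "yes". This is a sketch of bagging with random feature selection, not the engine's implementation; a full random forest grows deeper trees and re-draws candidate features at every split. The toy data (electronegativity difference and mean atomic number versus a large-gap label) are invented for illustration.

```python
import random


def fit_stump(rows, labels, feature):
    """Fit a single-threshold split on one feature.

    Returns (feature, threshold, label_if_below_or_equal, label_if_above).
    """
    best = None
    values = sorted(set(r[feature] for r in rows))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        left_label = int(sum(left) * 2 >= len(left))      # majority vote
        right_label = int(sum(right) * 2 >= len(right))
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, feature, thr, left_label, right_label)
    if best is None:
        # Only one distinct value in this resample: predict the majority class.
        majority = int(sum(labels) * 2 >= len(labels))
        return (feature, float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=200, seed=0):
    """Train stumps on bootstrap resamples, one random feature per stump."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def confidence(forest, row):
    """Fraction of trees that vote for the positive class."""
    votes = sum(left if row[f] <= thr else right
                for f, thr, left, right in forest)
    return votes / len(forest)


# Toy data: [electronegativity difference, mean atomic number] -> large gap?
rows = [[2.2, 14], [2.0, 20], [1.8, 30], [0.3, 40], [0.1, 27], [0.5, 60]]
labels = [1, 1, 1, 0, 0, 0]

forest = fit_forest(rows, labels)
conf_ionic = confidence(forest, [2.1, 25])      # large electronegativity difference
conf_metallic = confidence(forest, [0.2, 25])   # small electronegativity difference
```

Because each stump sees a slightly different resample, individual votes disagree, and the vote fraction behaves like the confidence score between 0 and 1 described in the text.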
B. Model validation
We visualize the accuracy of our recommendation engine’s predictions in Fig. 3, which represents the results of leave-one-out cross-validation (LOOCV) on our training data (in the case of the band gap data, we performed LOOCV on a subset of the extremely large training set). In the LOOCV procedure, if we have n total measurements of a particular property such as thermal conductivity, we train our machine learning model on n − 1 of these values and predict the nth (left out) value. We perform one training step and prediction for each property value and present the error distribution for all n values in Fig. 3. The error distribution then provides us with a sense of how we may expect the model to perform on new materials of which we have no prior knowledge.
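The leave-one-out procedure can be sketched as follows. A nearest-neighbour vote on a single feature stands in for the trained model (an assumption made to keep the example short; the engine uses a random forest), and the error is taken as the true label (1 if the property is in range, 0 otherwise) minus the predicted confidence. With that sign convention, +1 corresponds to a confident false negative and −1 to a confident false positive.

```python
def knn_confidence(train_x, train_y, x, k=3):
    """Stand-in model: fraction of the k nearest training points labelled 1."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loocv_errors(xs, ys, model=knn_confidence):
    """Hold out each point in turn, train on the rest, record the error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predicted = model(train_x, train_y, xs[i])
        errors.append(ys[i] - predicted)   # +1: false negative, -1: false positive
    return errors


# Toy one-dimensional data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

errors = loocv_errors(xs, ys)
```

A histogram of `errors` is the toy analogue of one error-distribution panel: points deep inside either class give errors near 0, while points close to the class boundary give the largest deviations.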
Fig. 3. Leave-one-out cross validation error histograms for the four key properties estimated by our recommendation engine: (a) Seebeck coefficient; (b) electrical resistivity; (c) thermal conductivity; and (d) band gap. For each material in our training set and each property, the recommendation engine gives a confidence score between 0 and 1 that the property value falls within the ideal windows we have defined for thermoelectric applications. Errors approaching +1 represent false negatives (our engine was extremely confident the material would be poor for that property, but the property is actually good); and an error of −1 is a false positive (our engine was extremely confident the material would be good for that property, but the property is actually poor). The peak around 0 for each property shows that the engine generally gives confidence values very close to unity for materials possessing properties in the desired ranges, or close to zero for materials whose property values fall outside the target range.
Fig. 3 indicates that our engine generally makes very reliable assessments of thermoelectric materials properties. The modes of the error distributions are in each case close to 0. For each property, the engine’s errors skew toward false negatives (resistivity, band gap, thermal conductivity) or false positives (Seebeck), which reflects the fact that the underlying training data do not contain equal fractions of positive and negative examples. Seebeck coefficients prove most difficult to assess (i.e., the error distribution for that property has the largest standard deviation), likely because there are strikingly different mechanisms that underpin the values, for example, strongly correlated oxides as opposed to degenerate semiconductors. Owing to the difficulty in assessing the Seebeck coefficient, initial predictive models using only the electrical resistivity, thermal conductivity, and Seebeck coefficient produced too many candidates that were good metals with poor Seebeck coefficients. To remedy this shortcoming and provide more robust recommendations, the band gap was added as a secondary metric, for which we determine the probability that a given composition will have a non-zero band gap.
C. Experimental details
RE12Co5Bi (RE = Gd, Er) samples were made by arc-melting freshly filed Er or Gd pieces (99.9%, Hefa), Co powder (99.8%, Cerac), and Bi powder (99.999%, Alfa Aesar). Stoichiometric mixtures (0.5 g total mass) with 5% to 7% excess Bi were pressed into pellets and melted twice in an arc-melting furnace under an argon atmosphere (Edmund Bühler Compact Arc Melter MAM-1). The total mass loss after melting was <1%. The samples were sealed in silica tubes and annealed at 1070 K for one week, then quenched in cold water. To produce enough material for physical property measurement, ∼70 samples of each compound were prepared, and pure samples were combined by melting into a single ingot of ∼5 g, which was sanded to yield the appropriate geometry (either a rectangular bar or a cylinder). Density was measured using Archimedes’ method; the final pellets had densities equal to 100% of the single-crystal values (ρGd12Co5Bi = 8.6 g/cm3, ρEr12Co5Bi = 9.9 g/cm3).
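For readers who wish to reproduce the weighing step, the arithmetic for a 0.5 g charge can be sketched as below. The atomic masses are standard rounded values. Two points are assumptions rather than details given in the text: that the 0.5 g refers to the stoichiometric charge, and that the Bi excess is added on top of it as a fraction of the stoichiometric Bi mass.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"Gd": 157.25, "Er": 167.26, "Co": 58.933, "Bi": 208.98}


def batch_masses(rare_earth, total_mass_g=0.5, bi_excess=0.05):
    """Mass of each element (g) for one RE12Co5Bi charge.

    `total_mass_g` is treated as the stoichiometric charge; the Bi excess is
    added on top as a fraction of the stoichiometric Bi mass (assumption).
    """
    stoich = {rare_earth: 12, "Co": 5, "Bi": 1}
    formula_mass = sum(ATOMIC_MASS[el] * n for el, n in stoich.items())
    moles = total_mass_g / formula_mass
    masses = {el: moles * n * ATOMIC_MASS[el] for el, n in stoich.items()}
    masses["Bi"] *= 1.0 + bi_excess
    return masses


gd_batch = batch_masses("Gd", bi_excess=0.05)
er_batch = batch_masses("Er", bi_excess=0.07)
```

Because Bi makes up under a tenth of the charge by mass, a 5% to 7% excess of Bi changes the total mass by well under 1%.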
Powder X-ray diffraction patterns were collected using an INEL CPS 120 diffractometer with Cu Kα1 radiation at room temperature, and Rietveld refinement was used to confirm the structure and phase purity (see supplementary material).28 Backscatter electron microscopy and elemental analysis via energy dispersive X-ray spectroscopy (EDX) were performed with a JEOL JSM-6010LA InTouchScope scanning electron microscope. Backscatter micrographs reveal that the samples are largely compositionally homogeneous (see supplementary material).28 Quantitative elemental analysis on several polished pieces found an atomic composition of Gd69(2)Co26(2)Bi5(2), which is in good agreement with the expected RE12Co5Bi composition. Er12Co5Bi samples were not appropriate for quantitative analysis because of overlapping Co Kα (6.924 keV) and Er Lα (6.947 keV) lines.
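The agreement between the EDX result and the nominal stoichiometry can be checked with a few lines of arithmetic. The sketch below converts the 12:5:1 formula into atomic percentages and expresses each deviation in units of the quoted uncertainty; reading the parenthesized digits as a standard uncertainty of ±2 at.% is the usual convention and is assumed here.

```python
def atomic_percent(stoich):
    """Convert {element: count} into {element: atomic percent}."""
    total = sum(stoich.values())
    return {el: 100.0 * n / total for el, n in stoich.items()}


expected = atomic_percent({"Gd": 12, "Co": 5, "Bi": 1})

# Measured values and uncertainties as quoted in the text: Gd69(2)Co26(2)Bi5(2).
measured = {"Gd": (69.0, 2.0), "Co": (26.0, 2.0), "Bi": (5.0, 2.0)}

# Deviation of each element from the nominal value, in units of its uncertainty.
deviation_in_sigma = {
    el: abs(value - expected[el]) / sigma
    for el, (value, sigma) in measured.items()
}
```

The nominal composition is roughly 66.7 at.% Gd, 27.8 at.% Co, and 5.6 at.% Bi, and every measured value lies within two quoted uncertainties of it.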
High-temperature thermoelectric properties (electrical resistivity and Seebeck coefficient) were measured with an ULVAC Technologies ZEM-3. Sample bars had approximate dimensions of 9 mm × 4 mm × 4 mm. Measurements were performed with a helium under-pressure, and data were collected from 300 K to 800 K through three heating and cooling cycles over 18 h to ensure sample stability and reproducibility.
III. DISCUSSION
In this work, we are interested not only in developing a model that gives accurate predictions of materials properties but also in making it immediately accessible and useful for experimental researchers. To that end, we have published our recommendation engine as a web app at http://thermoelectrics.citrination.com, where researchers may explore a pre-computed list of around 25 000 known compounds (representing a sizable subset of the Inorganic Crystal Structure Database, or ICSD) and also use our model to evaluate their own materials candidates in real-time. In this way, we hope that the app serves as a rapid triage tool for ideas for potential new thermoelectric materials.
This adds to a growing toolbox of computational tools designed to be a user-friendly aid to experimental workers, such as TEDesignLab and the Materials Project.3,29 Our pre-computed list may be arranged according to the probabilities associated with any one of the four properties we are modelling and is sorted by default according to a composite score that takes all four properties into account. Furthermore, the user may specify cutoff thresholds for any of the properties and thereby greatly reduce the size of the list.
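The sorting and cutoff behaviour described here can be mimicked offline with a few rows transcribed from Table I. The `Candidate` type and `shortlist` function are hypothetical names, not part of the web application. The formula behind the composite score is not given in the text, so the tabulated composite values are used as-is rather than recomputed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    material: str
    p_seebeck: float
    p_resistivity: float
    p_kappa: float
    p_gap: float
    composite: float


# Rows transcribed from Table I.
TABLE_I = [
    Candidate("TaPO5 and TaVO5", 0.894, 0.793, 0.958, 0.987, 3.537),
    Candidate("Tl9SbTe6", 0.845, 0.871, 0.999, 0.876, 3.460),
    Candidate("TaAlO4", 0.893, 0.703, 1.000, 0.977, 3.477),
    Candidate("SrCrO3", 0.772, 0.767, 0.996, 0.950, 3.308),
    Candidate("TaSbO4", 0.892, 0.919, 1.000, 0.997, 3.559),
    Candidate("TiCoSb", 0.981, 0.714, 0.958, 0.833, 3.467),
]


def shortlist(rows, sort_by="composite", **cutoffs):
    """Keep rows meeting every cutoff, then sort descending by `sort_by`.

    Cutoffs are passed as keyword arguments, e.g. p_resistivity=0.8.
    """
    kept = [r for r in rows
            if all(getattr(r, field) >= value for field, value in cutoffs.items())]
    return sorted(kept, key=lambda r: getattr(r, sort_by), reverse=True)


by_composite = shortlist(TABLE_I)
low_resistivity = shortlist(TABLE_I, p_resistivity=0.8)
by_seebeck = shortlist(TABLE_I, sort_by="p_seebeck")
```

With these six rows, the default ordering puts TaSbO4 first, and requiring a resistivity confidence of at least 0.8 leaves only TaSbO4 and Tl9SbTe6.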
As we believe our extensive precomputed list contains some interesting and heretofore uncharacterized candidate thermoelectric materials, we now comment on a select set of high-ranking compounds. Several of these compounds are given in Table I.
Table I. Several promising new thermoelectric compounds selected from our pre-computed list. The P values refer to the engine’s confidence level that a given material will exhibit a room-temperature value for a particular property (e.g., S or ρ) within the target ranges specified above. The full compound list is available for exploration at http://thermoelectrics.citrination.com.
Material | P_S | P_ρ | P_κ | P_gap | Composite | Comments |
---|---|---|---|---|---|---|
TaPO5 and TaVO5 | 0.894 | 0.793 | 0.958 | 0.987 | 3.537 | High polyhedral connectivity and structural superlattices |
Tl9SbTe6 | 0.845 | 0.871 | 0.999 | 0.876 | 3.46 | Recently reported to be a good thermoelectric (zT ≈ 1 at 600 K) |
TaAlO4 | 0.893 | 0.703 | 1 | 0.977 | 3.477 | High mass contrast, high polyhedral connectivity (edge- and corner-sharing TaO6 octahedra) |
SrCrO3 | 0.772 | 0.767 | 0.996 | 0.95 | 3.308 | High polyhedral connectivity (3-D corner-sharing CrO6 octahedra), metallic when made under high pressure |
TaSbO4 | 0.892 | 0.919 | 1 | 0.997 | 3.559 | High polyhedral connectivity: layered, edge-sharing MO6 octahedra |
TiCoSb | 0.981 | 0.714 | 0.958 | 0.833 | 3.467 | TiCoSb is not a new compound but has been studied as a high-zT material. However, it was not included in the training data |
TaVO5 and TaPO5 occur in an analogous crystal structure to the phosphate tungsten bronzes.30,31 These materials can be expected to have good thermoelectric performance given the heavy atoms, the potential for low electrical resistivity provided by the repeating ReO3-type structural network that is highly connected in three dimensions, and the intrinsic crystallographic shear provided by the crystal structure. Although the phosphate tungsten bronzes themselves are not highly rated, their metallic electrical transport properties are encouraging for structural analogues.32 Moreover, TaVO5 has a negative coefficient of thermal expansion and a structural transition at 600 °C.33 This structural transition may lead to softening of phonon modes and anharmonic scattering, which may lead to low thermal conductivity.
Other interesting suggestions to come from the recommendation engine are Tl9SbTe6, Ba2Pb, and FeAs2. Although none of these compounds were included in the thermoelectric database, they all scored highly within the recommendation engine. This prediction provides experimental validation since good thermoelectric performance has recently been demonstrated for these materials through property measurements or high-level DFT calculations.34–36
The suggestion of TaAlO4, SrCrO3, TaSbO4, and other oxides expected to be insulators can be understood because the recommendation engine uses as training data references where stoichiometric formulas were primarily reported rather than doping details.37,38 Nevertheless, with doping through substitution or reduction, these compounds may exhibit moderate electrical performance. Further, these materials all feature extended structures that are highly connected in three dimensions, an important feature for low electrical resistivity. Moreover, the large mass contrast on the cation sublattice in TaAlO4 (edge-shared TaO6 and AlO6 octahedra) could lead to low thermal conductivity, and previous reports have shown that SrCrO3 is metallic when synthesized under pressure.39
Many of the high-ranking candidate materials are interesting because of their highly connected extended structures, even though the recommendation engine does not use features of crystal structure to make its suggestions. The chief disadvantage to training prediction algorithms using crystal structure is that structure then becomes a required input for making predictions, and yet structure is by definition not available for uncharacterized materials. However, the absence of crystal structure does cause our engine difficulty where changes in crystal structure with similar elemental compositions cause large changes in physical properties. For example, both DyPO4 and LaPO4 are predicted to have low thermal conductivity. However, LaPO4 is monazite, a corner edge-shared structure, whereas DyPO4 is xenotime,40 an edge-shared structure leading to inherently higher thermal conductivity.41
A. New materials and their properties
Our final and most important task in this work is to demonstrate that our recommendation engine can indeed guide researchers toward interesting experimental discoveries. Among the set of high-scoring candidate materials, we selected Er12Co5Bi and Gd12Co5Bi to characterize as thermoelectric materials due to their facile synthesis through arc melting and due to the fact they are chemically quite distinct from known thermoelectrics (Fig. 1). While the RE12Co5Bi (RE = rare earth) family of compounds has only been sparsely studied in the literature, their crystal structure and initial low-temperature electrical and magnetic properties have been reported by Mar and co-workers.42 The crystal structure of RE12Co5Bi is shown in Figure 4.
Fig. 4. (a) Crystal structure of RE12Co5Bi (prototype Ho12Co5Bi), of which Er12Co5Bi and Gd12Co5Bi are exemplars. (b) Crystal structure of the filled skutterudites, which have the generic chemical formula AM4X12. These two structure types share an icosahedral motif consisting of RE12Bi and AX12 units, respectively.
Interestingly, the crystal structure of our candidate thermoelectric exhibits notable similarity to the structures of known thermoelectrics, in spite of the fact that crystal structure was not an input feature for our recommendation engine. Ho12Co5Bi is the eponymous structure prototype (orthorhombic, space group Immm) adopted by a series of rare-earth intermetallics RE12Co5Bi (RE = Y, Gd, …, Tm). In this structure, the Ho12Bi icosahedra play an analogous role to the LaP12 icosahedra in the filled skutterudite prototype LaFe4P12; rare-earth atoms “rattling” within their 12-fold coordinated cages is the idiosyncratic feature of filled skutterudites that imparts low thermal conductivity so prized in thermoelectric materials. In fact, if the transition metal atoms, which occupy different sites in these structures, are disregarded, the Ho12Bi framework is an antitype to the LaP12 framework, with the roles of the rare-earth and group 15 elements reversed. We hypothesize its crystallographic similarity to skutterudite could be partly responsible for the thermoelectric behaviour of RE12Co5Bi (RE = Gd, Er).
We give a full thermoelectric characterization of Er12Co5Bi and Gd12Co5Bi in Fig. 5. Based on these results, we report the discovery of a new thermoelectric class, which remains a completely unoptimized, pure bulk material and thus lends itself to further study. Notably, the material falls far outside the usual search space for thermoelectrics (Figs. 1 and 2) and was neither the result of simple interpolation between known compounds nor obvious from a strict chemical intuition standpoint. The electrical resistivity is commensurate with other high-performing materials such as chalcogenides, although the Seebeck coefficient is too low for the material to be competitive with the best-known thermoelectrics. Furthermore, the thermal conductivity is relatively high, but the filled cage structure lends itself to substitution that has successfully reduced thermal conductivity in the skutterudite systems.1,43 In RE12Co5Bi (RE = Gd, Er), the thermal conductivity from 300 K to 800 K ranges from 4 W m−1 K−1 to 8 W m−1 K−1, comparable to the half-Heuslers.44,45 Note that these results are consistent with the engine’s predictions (Fig. 5); the models give a high probability of achieving the thresholds for electrical conductivity (a) and thermal conductivity (c) (see confidence bar insets), while also suggesting a low probability of observing a large Seebeck coefficient (b). The electrical performance figure of merit κzT is around 0.03 W m−1 K−1 at 400 K, which is actually higher than that of nearly 30% of the thermoelectrics in the Gaultois et al. thermoelectrics database;6 of course, the database is a highly self-selected set of materials, consisting of literature-reported thermoelectrics, and would skew toward much higher κzT values than would a random subset of all crystalline materials. We note, of course, that the zT of several other thermoelectric materials can be significantly improved through carrier concentration tuning and microstructural engineering. 
For example, undoped polycrystalline Si has a 60-fold increase in performance after optimization, going from zT < 0.01 to 0.6 at 300 K.46
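The quantities discussed above follow from the standard definition zT = S²T/(ρκ), so that κzT = S²T/ρ depends only on the electrical properties. The sketch below carries out the unit conversions explicitly. The function names are illustrative, and the input numbers are round placeholder values chosen for easy checking; they are not the measured data of Fig. 5.

```python
def kappa_zT(seebeck_uV_per_K, resistivity_ohm_cm, temperature_K):
    """Electrical figure of merit kappa*zT = S^2 T / rho, in W m^-1 K^-1."""
    s = seebeck_uV_per_K * 1e-6          # convert uV/K to V/K
    rho = resistivity_ohm_cm * 1e-2      # convert ohm*cm to ohm*m
    return s * s * temperature_K / rho


def figure_of_merit(seebeck_uV_per_K, resistivity_ohm_cm,
                    kappa_W_per_mK, temperature_K):
    """Dimensionless zT = S^2 T / (rho * kappa)."""
    return kappa_zT(seebeck_uV_per_K, resistivity_ohm_cm, temperature_K) / kappa_W_per_mK


# Placeholder inputs (assumptions): S = 100 uV/K, rho = 1e-3 ohm*cm,
# kappa = 1 W/(m*K), T = 300 K.
example_kappa_zT = kappa_zT(100.0, 1e-3, 300.0)
example_zT = figure_of_merit(100.0, 1e-3, 1.0, 300.0)

# The optimization gain quoted for polycrystalline Si: zT from 0.01 to 0.6.
si_improvement = 0.6 / 0.01
```

Since κzT scales with S², doubling the Seebeck coefficient at fixed resistivity quadruples it, which is why a modest Seebeck coefficient limits zT even when ρ and κ are favourable.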
Fig. 5. Thermoelectric characterization of RE12Co5Bi (RE = Gd, Er). (a) Electrical resistivity, (b) Seebeck coefficient, (c) thermal conductivity, and (d) thermoelectric figure of merit zT as a function of temperature. We also include the recommendation engine’s confidence levels for the first three properties; the lowest-probability property, the Seebeck coefficient, is indeed found to be below the 100 μV K−1 threshold.
Another observation from Fig. 5 illustrates the scientific boon of studying entirely new classes of materials. Unexpectedly, RE12Co5Bi (RE = Gd, Er) exhibits increasing thermal conductivity with temperature. (We note the recommendation engine successfully chose a material with a low thermal conductivity at room temperature, which would normally decrease with increasing temperature.) The increasing electrical resistivity with temperature indicates metallic electrical transport, so the electrical contribution to the total thermal conductivity should therefore decrease with increasing temperature. Additionally, the phonon contribution to thermal conductivity should also decrease with increasing temperature due to more phonon–phonon (Umklapp) scattering.47 Thermal conductivity is calculated from the following relation: κ = αρCp, where α is thermal diffusivity, Cp is heat capacity, and ρ is density. Normally, thermal diffusivity has a negative temperature dependence whereas heat capacity and density both have positive temperature dependence. However, for this compound we observe a positive temperature dependence for the thermal diffusivity even after multiple measurements, the origin of which is not presently understood. Materials with increasing thermal conductivity with temperature are rare, though not unprecedented,48,49 and further studies on this class of compounds to shed light on this anomaly could thus lead to new strategies for thermoelectric materials optimization.
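The relation κ = αρCp can be evaluated directly once units are made consistent, as in the sketch below. The density is the value quoted for Gd12Co5Bi in the experimental section; the diffusivity and heat capacity are placeholder numbers chosen only to illustrate the arithmetic and are not measurements from this work. Note that ρ here denotes mass density, not electrical resistivity.

```python
def thermal_conductivity(diffusivity_mm2_per_s, density_g_per_cm3,
                         heat_capacity_J_per_gK):
    """kappa = alpha * rho * Cp, returned in W m^-1 K^-1."""
    alpha = diffusivity_mm2_per_s * 1e-6     # convert mm^2/s to m^2/s
    rho = density_g_per_cm3 * 1e3            # convert g/cm^3 to kg/m^3
    cp = heat_capacity_J_per_gK * 1e3        # convert J/(g K) to J/(kg K)
    return alpha * rho * cp


# Density of Gd12Co5Bi from the text; alpha and Cp are assumed placeholders.
kappa_example = thermal_conductivity(
    diffusivity_mm2_per_s=2.0,
    density_g_per_cm3=8.6,
    heat_capacity_J_per_gK=0.25,
)
```

Because density and heat capacity vary only weakly over this temperature range, any marked rise in κ computed this way has to come from the diffusivity term, consistent with the observation in the text.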
IV. CONCLUSIONS
This initial experimental validation of our recommendation engine is encouraging. The present work represents the first time that machine learning has been used to suggest an experimentally viable new compound from true chemical white space, where no prior characterization had hinted at promising chemistries. The implication is that our approach, wherein a data-driven computational tool directly augments experimental capabilities and intuition, is a semi-rational way to discover new materials families that may have desirable properties. We suggest that such a paradigm could eventually replace trial-and-error and fortuity in the search for new materials across a wide variety of application areas.
ACKNOWLEDGMENTS
We thank Ram Seshadri for helpful discussions and insight. We thank the National Science Foundation for support of this research through No. NSF-DMR 1121053, as well as the Natural Sciences and Engineering Research Council of Canada (NSERC), and the DARPA SIMPLEX Program No. N66001-15-C-4036. Additionally, this research made extensive use of shared experimental facilities of the Materials Research Laboratory: a NSF MRSEC, supported by No. NSF-DMR 1121053. M.W.G. is thankful for support from NSERC through a Postgraduate Scholarship, support from the US Department of State through an International Fulbright Science & Technology Award, and support from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska–Curie Grant Agreement No. 659764. B.M. and G.J.M. are founders and significant shareholders in Citrine Informatics Inc.