Advances in materials informatics (MI), which combines material property calculations/measurements and informatics algorithms, have realized properties in the nanostructures of thermal functional materials beyond what is accessible using empirical approaches based on physical instincts and models. In this Tutorial, we introduce technological procedures and underlying knowledge of MI combining thermal transport calculations and machine learning using an optimization problem of superlattice structures as an example (sample script available in the supplement). To provide fundamental guidance on how to use MI, we describe practical details about descriptors, objective functions, property calculators, machine learning (Bayesian optimization) algorithms, and optimization efficiencies. We then briefly review the recent successful applications of MI to design thermoelectric and thermal radiation materials. Finally, we summarize and provide future perspectives about the topic.

## I. INTRODUCTION

The rapid development of nanotechnology has introduced unprecedented opportunities in designing nanostructures for energy transport in the forms of phonons, electrons, and photons. Phonon transport becomes more ballistic when the material length scale is reduced to the nanoscale. Forming the appropriate nanostructure can control heat conduction. Such control has been widely realized in superlattices,^{1,2} phononic crystals,^{3–5} nanocrystals,^{6–8} and nanocomposites.^{9,10} Furthermore, when the length scale of the nanostructure periodicity exceeds the phase-relaxation lengths of the phonons that contribute to the thermal transport and the nanostructure-boundary roughness is smaller than the wavelengths,^{11,12} nanostructures can exhibit various coherent phonon transport phenomena such as constructive/deconstructive phonon interference,^{13–15} resonance,^{16} and localization.^{17,18} These phenomena can magnify the controllability of heat conduction.

A similar discussion can be made for heat conduction through interfaces. Various factors have been reported to tune the interfacial thermal conductance such as roughness,^{19,20} vacancy defects,^{21} lattice orientation,^{22,23} nanoinclusions,^{24} and interfacial adhesion or bonding.^{25,26} To further advance the field of thermal transport control by nanostructures, an important question is how much room is left to control for a specific class of nanostructures. That is, what is the optimal structure to minimize or maximize the thermal conductivity? However, this is a challenging question to answer because phonon transport is highly sensitive to the detailed atomistic configuration.

When phonon transport is coupled with other energy carrier transports such as electrons, transport property design becomes more complex. Considering thermoelectric applications as an example, the energy conversion efficiency of thermoelectric materials is characterized by the dimensionless figure of merit: *ZT* = *S*^{2}*σT*/(*κ*_{e }+ *κ*_{ph}), where *S*, *σ*, *T*, *κ*_{e}, and *κ*_{ph} are the Seebeck coefficient, electrical conductivity, absolute temperature, and thermal conductivities of phonons and electrons, respectively. Many studies have been dedicated to reduce the phonon thermal conductivity.^{7,27} However, increasing the figure of merit by altering only one component is difficult due to the general correlation between the electric and thermal transports.

In addition to the thermal transport in solids carried by phonons and electrons, photons can transfer heat via thermal radiation. By tuning the photon transport, wavelength- or directional-selective emitters/absorbers have been designed and applied to thermal photovoltaics,^{28,29} infrared heaters,^{30} and thermal radiative cooling.^{31–33} Similar to phonon transport, when the structure length scale is comparable or smaller than the photon wavelength, micro-/nanostructures can induce interference and diffraction in the electromagnetic waves. This allows thermal radiation to be tailored. Although various structures such as nanowires, planar multilayers, periodic gratings, nanoparticles embedded in a matrix, and two-dimensional materials have been proposed, their optimization usually involves many variables: material type, layer thickness, grating size, particle size and distribution, etc. Hence, the number of possible candidates is enormous.

Along with theory, simulations, and experiments, materials informatics (MI)^{34–36} has become the fourth paradigm of science in the past few years. MI combines material property calculations/measurements and informatics algorithms. Not only has it guided experiments directly,^{37} MI has a high efficiency in designing novel materials such as drugs,^{38,39} polymers,^{40} thermoelectrics,^{41,42} magnetic tunnel junctions,^{43} and thermal emitters/absorbers.^{44,45} Compared with traditional trial-and-error materials’ research procedures, which are time-consuming and costly, MI has opened new avenues for accelerating the discovery, development, manufacturing, and deployment of advanced materials.

Many MI methods have been developed for energy transport optimization. However, they can be divided into two categories: database screening and structure designing. The main difference is that screening selects from available databases while designing creates novel materials or structures that do not yet exist.

Due to advances in computational materials science and computing power, more and more open-access material databases have been established. Examples include Materials Project,^{46} AFLOW,^{47} ICSD, OQMD,^{48,49} and NOMAD.^{50} To select the most appropriate material from these databases for material design, high-throughput screening^{51} has been developed. Typical screening involves various machine learning algorithms such as neural networks,^{52,53} Gaussian process regression,^{54} random-forest regression,^{55} and transfer learning.^{56}

For structure design, an initial database is typically not available. Hence, training data are collected gradually, and the prediction model is updated over time via an iterative and closed-loop design process. Currently, various intelligent optimization algorithms such as the genetic algorithm,^{57,58} particle swarm optimization,^{59} Bayesian optimization (BO),^{42,54,60–62} Monte Carlo tree search (MCTS),^{63,64} and quantum annealing (factorization machine)^{65} are widely employed as black-box optimization tools. They realize automated materials’ discovery based on an iterative process to select the optimal candidate from a massive number of potential structures.

This Tutorial introduces the automated design of thermal functional materials through a combination of thermal transport calculations and a Bayesian optimization algorithm. It offers guidance for researchers who are interested in efficiently designing novel materials for thermal transport using MI. This Tutorial is organized as follows. Section II explains the basis of MI using an example problem of designing a superlattice structure to obtain minimum thermal conductance. Section III discusses the optimization results, including the efficiency of optimization, the effect of the parameters on optimization, and new knowledge we can learn from MI. Section IV introduces recent applications of Bayesian optimizations on nanostructure designs for thermoelectrics^{42} and thermal radiation.^{44,45} Finally, Sec. V summarizes and gives perspectives for designing thermal functional materials using MI.

## II. METHODOLOGY

Over the past few decades, various computational and experimental techniques for thermal transport have been developed at the nanoscale. Typically, the design is based on a physical model of thermal transport formulated in four dimensions (time and the three dimensions of space) either in a real or in a reciprocal representation, which is preferred for humans to comprehend. However, dimensional restrictions may become the limiting factor when identifying complicated structure–property relations. On the other hand, machine learning with a black-box model is advantageous in handling hyperspace of a large number of dimensions and may be superior in predicting complicated structure–property relations, though it may not immediately help humans understand the underlying physics. With advances in machine learning and data-mining methodologies, researchers are now able to construct a black-box model from computational or experimental data and identify or predict nanostructures or materials displaying the desired properties. Although these methods employ interpolation in a large-dimensional hyperspace, this can be viewed as an extrapolation in our usual four dimensions. Hence, it offers great opportunities for discovering, designing, and even understanding novel materials.

The key to this MI formalism is to build communications between materials’ simulations/experiments and machine learning tools (Fig. 1). Generally, four essential elements are necessary for MI: objective function, descriptor selection, property calculator to evaluate the objective function, and an informatics optimization method. In the following, we introduce the general methodology to optimize the nanostructure for thermal transport.

### A. Problem description and objective function

Before initiating any optimization, the optimization problem needs to be clearly defined with a specific objective function and the degrees of freedom to be optimized. Herein, we use the design of the superlattice structure as an example to demonstrate the methodology. Superlattice materials,^{66–69} which are important in applications such as thermoelectrics and semiconductor lasers, have attracted much attention in the past few decades due to their unique thermal and electrical transport properties.

Alternately, depositing two different materials on a substrate can synthesize a superlattice in a periodic or aperiodic fashion. Periodic superlattices are well studied. Their thermal conductivity exhibits a minimum with respect to layer thickness due to the crossover between wave interference and particle scattering effects of phonons.^{70} Recent studies have revealed that aperiodic superlattices have a lower thermal conductivity than periodic ones.^{71} Because the thickness of each layer in a superlattice differs, the underlying physics becomes complicated. Consequently, designing an optimal superlattice structure from a vast number of candidates based on physical principles is a challenge.

Figure 2 illustrates a case where the design aims to realize an optimal structure to minimize thermal conductance at room temperature. The superlattice structure consists of silicon (Si) and germanium (Ge) layers with a total thickness of 7.6 nm sandwiched by two Si leads. The superlattice structure is discretized into unit layers with thicknesses of 5.43 Å, which is the lattice constant of a silicon crystal. Each unit layer can be either Si or Ge. The objective is to find the optimal arrangement of Si and Ge layers to obtain the minimum thermal conductance. The objective function is the magnitude of thermal conductance or its reciprocal. To validate and discuss the optimization efficiency, the number of layers is intentionally kept to 14, which gives 16 384 possible structures. This number is small enough that all structures can be calculated for validation.

### B. Descriptors

Descriptors, which are digital representations of the nanostructures, are used as inputs in machine learning models. Adopting an effective descriptor is important for successful machine learning. Various types of descriptors have been proposed in the literature for MI. Descriptors associated with the fundamental crystal properties are widely used since large amounts of data that are known to influence the transport properties in solid-state physics are available and compiled as databases. For instance, in database screening for high and low thermal conductivity crystals,^{51,54,72,73} the properties associated with the crystal structure are usually used as crystal descriptors. These include the lattice constants, the number of elements, the number of atoms in the cell, volume, density, formation energy, and space group number. In addition to independent parameters, secondary parameters, which are generated by mixing different parameters such as volume/atom, density/atom, and formation energy/atom, can also improve the prediction accuracy. Moreover, different types of elemental descriptors are important in screening. These include the atomic number, atomic weight, empirical covalent atomic radius, position in the periodic table, Pettifor chemical scale, and Pauling electronegativity. However, a descriptor does not have to be a physical property. Abstract quantities such as radial distribution function perform as well as or often even better than a physical property.

On the other hand, in the design of new nanostructures, descriptors must incorporate the spatial distribution of atoms. In the case of superlattice structure optimization, three types of descriptors are tested: digital flag numbers, the Coulomb matrix (CM), and the mass matrix (MM).^{74}

#### 1. Digital flag numbers

Binary flags of “1” and “0,” which are convenient and intuitive, indicate the material state of each unit layer. Here, “1” indicates a Ge layer and “0” indicates a Si layer. For instance, the descriptor for the aperiodic superlattice structure shown in Fig. 1 is (11001010110111). The 14 layers give rise to 16 384 possible candidates. It should be mentioned that decimal fraction values can also be used as descriptors. For instance, Guo *et al.*^{44} used a discrete decimal fraction to describe the layer thicknesses in the multilayer structure introduced in Sec. IV.

#### 2. Coulomb matrix

CM is a common descriptor in the field of molecular chemistry. It describes the Coulomb interaction between atoms.^{75} For a given atomistic structure, CM is calculated as

where *q* is the nuclear charge of the atom, *r _{ij}* is the distance between atoms

*i*and

*j*, and

*n*is the number of atoms in the system. The diagonal elements in CM describe the self-interaction of each atom, which can be obtained by a polynomial fit of the atomic energies to the nuclear charge

*q*, while the off-diagonal elements represent the Coulomb interactions between atoms

*i*and

*j*.

Because CM is symmetric, the actual descriptor vector can be obtained by adopting only the upper triangular part (*i *≤ *j*) in CM as

Therefore, the dimension size of CM is *n*(*n *+ 1)/2. Based on the Coulomb matrix, the eigenvalues of CM (EVCM)^{75,76} can be calculated and used as a descriptor,

where *λ _{i}*,

*i*∈ (1, 2, … ,

*n*), are the eigenvalues of

*C*. In this way, the dimension size of EVCM can be reduced from

*n*(

*n*+ 1)/2 to

*n*compared with CM, decreasing the search space. In general, compressing the dimension size of a descriptor is preferable for an optimization. However, over-compression may not sufficiently represent the actual structural information, negatively affecting the optimization. Another advantage of a CM descriptor is that it is size-independent when a cut-off for the interaction for a complicated irregular atomistic structure is applied.

#### 3. Mass matrix

Similar to the definition of CM, the MM descriptor, which considers the mass information instead of charge, is defined as

where *m* is the atomic weight of the atom. The diagonal elements expressing the self-interaction are set to 0, considering that heat transport occurs via interatomic interactions. Similar with CM, the final descriptors of MM and the eigenvalues of MM (EVMM) can be defined as

where *μ _{i}*,

*i*∈ (1, 2, … ,

*n*), are the eigenvalues of

*M*. MM should describe the thermal transport properties of the structure better by incorporating information about the atomic mass while inheriting the good features of CM. It is worth a mention that the expression of MM [Eq. (6)] was formulated simply as an extension of CM and was not derived based on physical principles, but descriptors do not necessarily need to be physical, and often unphysical descriptors can perform as well as the physical ones if not better.

### C. Property calculator

The property calculator is used to calculate the objective function during optimization. It can be any open-source, commercial, or homemade tool for thermal transport property calculations. Examples include density functional theory,^{77} molecular dynamics simulations,^{78} Monte Carlo simulations,^{79} finite element method, and electromagnetic simulations.^{80,81} When selecting the appropriate property calculator, the calculation time is a critical consideration because it needs to be balanced with the selection time (including modeling). This balance depends on the target system and the property as well as the general size and complexity of the system. For a given property calculation method, the calculation time increases as the system size, selection time, and the number of structure candidates increase. However, the selection time may be overcome by, for example, choosing the appropriate selection process and transforming it to a binary optimization problem (fraternization machine) to use quantum (or digital) annealers.^{65} Although the property calculation can also be sped up by parallelization, the speed remains limited, particularly when using *ab initio* approaches. Therefore, the choice of calculator is critical when considering the objective of the research, and often compromise is necessary. For instance, in the example problem of superlattice optimization among 16 384 candidates, the atomistic Green's function (AGF) method is chosen because it can be performed within one minute. This time is comparable with the selection (modeling) time in Bayesian optimization, which is about a half a minute on a computer with an Intel Core i7-8700 CPU @ 3.20 GHz. The drawback of AGF is that it cannot capture anharmonic phonon–phonon scattering. Hence, this is used as a compromise for the sake of optimization speed because the influence of anharmonicity has been thought to be negligible in many cases since the interface distance is much smaller than the phonon mean-free-paths. It should be noted here that recent advances in the anharmonic AGF calculations^{82} have suggested notable contribution of anharmonicity to the interfacial conductance, and this aspect of compromise requires more attention in the future. Of course, the choice of calculator also depends on the amount of computer resources available and the acceptable wait time for the results. Heavier property calculators that include anharmonic effects such as molecular dynamics and anharmonic lattice dynamics are feasible, but it is important to consider the balance in the problem design.

Here, we briefly describe the AGF method. For a more detailed description, refer to previous works.^{83,84} The phonon transmission function across the Si/Ge superlattice region Ξ(*ω*) is calculated by

where *ω* is the phonon frequency. *G ^{r}* and

*G*are the retarded and advanced Green's functions of the scattering region, respectively. The level broadening matrices $\Gamma L=i(\Sigma Lr-\Sigma La)$ and $\Gamma R=i(\Sigma Lr-\Sigma La)$ describe the rates of inflow from the left lead and outflow into the right lead, respectively. $\Sigma L$ and $\Sigma R$ are the self-energies, which are calculated from the surface Green's functions of the left and right leads, respectively. The surface Green's function calculated for semi-infinite regions reproduces the left and right Si leads with an infinite thickness (Fig. 2). Hence, the transmission can be calculated for the full phonon spectrum, including long-wavelength phonons.

^{a}With the two leads kept at different temperatures *T _{L}* and

*T*, the heat current flowing through the device is given by the Landauer formula,

_{R}^{85}

where *f _{L}* and

*f*are the Bose–Einstein distributions of phonons. In the limit of small temperature differences, the value of thermal conductance at average temperature

_{R}*T*can be further obtained as

where *k*_{B} is the Boltzmann constant and *S* is the cross-sectional area. As mentioned above, only the harmonic interaction is considered here. In principle, the anharmonic interaction can be included in the AGF calculation by introducing three-phonon scattering at interfaces via the many-body self-energy in the reciprocal space. However, the calculation is heavy.

To minimize the computational cost, the supercell size in the transverse direction perpendicular to heat conduction is set as a single unit cell (UC) (i.e., 5.43 × 5.43 Å^{2}) since the periodic boundary condition can represent the interface with an infinite cross section, and the thermal conductance is independent of the size of the transverse supercell. The lattice mismatch between Si and Ge is ignored because the stress effect on thermal transport due to the lattice mismatch is negligible.^{83} To further reduce the computational time, the k-mesh size is optimized in the AGF calculation. Figure 3 shows the dependences of the phonon transmission function and thermal conductance on k-mesh size for a pure silicon system with a cross section size of 1 UC × 1 UC, where convergence is confirmed for a mesh size larger than 10 × 10. This confirms that the *k*-mesh used in our calculation (20 × 20) is sufficient to capture the phonons in the full frequency spectrum. All the calculations are carried out using the commercial Atomistix ToolKit simulation package (ATK)^{86} with Tersoff potentials.^{87,88}

### D. Combining the Bayesian optimization and thermal transport calculation

The final step is to choose a suitable machine-learning algorithm and combine it with the property calculator to form closed-loop optimization. Among the various machine-learning algorithms, the Bayesian optimization employing Gaussian processes is well established as a prediction model for black-box optimizations. The combined process for the Bayesian optimization and the thermal-transport-property calculator is shown in Algorithm 1 and Fig. 4. The sample script can be found in the supplementary material. Suppose the candidates are represented as *d*-dimensional vectors ** x** = (

*x*

_{1},… ,

*x*) ∈

_{m}**R**

*, which are candidate descriptors. The dimensional size is typically smaller than 20 in most successful applications. In the Bayesian optimization process, the transport properties of*

^{d}*n*candidates are initially calculated. Then, the next one to calculate is chosen. The purpose of the Bayesian optimization is to find the candidate with an optimal transport property using as few calculations/experiments as possible. A Bayesian regression function is learned from

*n*pairs of descriptors and transport properties. This statistical prediction model is invariably a Gaussian process, which provides a Bayesian posterior probability distribution that describes the potential thermal conductance values for candidate point

*x*. The best candidates are chosen based on the expected improvement criterion,

^{89}a widely used acquisition function that can explicitly encode a trade-off between exploitation (evaluating points with low mean) and exploration (evaluating points with high uncertainty). Besides the expected improvement, the maximum probability of improvement and upper confidence bound are two other popular acquisition functions

^{89}used in Bayesian optimization. Finally, the exact thermal conductance values are calculated for the chosen candidate and added to the training set. By repeating this procedure, the calculation of thermal conductance is scheduled optimally, and the best candidate can be found quickly.

Determine vector descriptors for the optimization problem. |

Randomly select n initial candidates and evaluate the target property. _{0} |

While the calculated number of candidates or the optimization target is not reached |

do |

Update the posterior probability distribution using all available training data. |

Compute the acquisition function using the current posterior distribution. |

Select next possible optimal candidate x according to the maximization of the acquisition function over _{n}. x |

Evaluate the property for candidate x. _{n} |

Increment n. |

end while |

Return the solution. |

Determine vector descriptors for the optimization problem. |

Randomly select n initial candidates and evaluate the target property. _{0} |

While the calculated number of candidates or the optimization target is not reached |

do |

Update the posterior probability distribution using all available training data. |

Compute the acquisition function using the current posterior distribution. |

Select next possible optimal candidate x according to the maximization of the acquisition function over _{n}. x |

Evaluate the property for candidate x. _{n} |

Increment n. |

end while |

Return the solution. |

Many Bayesian optimization tools have been developed. Examples include GPyOpt,^{90} DiceOptim,^{91} Spearmint,^{92} GPFlow,^{93} Metrics Optimization Engine,^{94} and the open-source Bayesian optimization library (COMBO).^{61} Here, we use COMBO^{61} to perform the Bayesian optimization process. COMBO is a black-box tool without user hyperparameter tuning for the kernel of Gaussian processes or an acquisition function. It has been successfully applied to optimizations in various energy transport problems. It is highly scalable and convenient for researchers from different research fields such as materials, physics, chemistry, and engineering due to the efficient protocol employing Thompson sampling, random feature maps, one-rank Cholesky update, and automatic hyperparameter tuning.

In COMBO, the prediction model is built by employing a Bayesian linear regression model combined with a random feature map *Φ* as

where ** x** is a candidate descriptor, the values

*y*are the thermal transport property evaluations for the candidates,

**is a**

*w**D*-dimensional weight vector, and

*ɛ*is the noise subject to a normal distribution with mean 0 and variance

*σ*. The selection of feature map

*Φ*which approximates the distribution will affect the performance of Bayesian optimization crucially. Here, the random feature map means the feature map is built based on random samples. With the increasing number of random samples, the Bayesian linear model with the random feature map converges to a Gaussian process. In COMBO, the random feature map is chosen so that the inner product corresponds to the Gaussian kernel,

^{95}

The model's performance depends on hyperparameters *σ* and *η* that are initialized according to the heuristic procedure by Yang *et al.*^{96} In every round with 20 or so training examples added, the hyperparameters are periodically configured and updated by maximizing the type-II likelihood.^{97} The supplementary material includes the full example script and supporting thermal conductance data for researchers to conveniently repeat our demonstration case for the superlattice structure design.

## III. RESULTS AND DISCUSSION

### A. Optimization efficiency of Bayesian optimization

Figure 5(a) shows the typical evolution of Bayesian optimization. As the number of calculated candidate structures increases, the thermal conductance value fluctuates, reflecting the stochastic feature in Bayesian optimization. The current minimum thermal conductance converges gradually in the calculations of 1000 structures, which is around 6.1% of the total number of candidates.

This value is somewhat larger (i.e., the optimization efficiency is lower) than the other cases that the authors and co-workers have previously reported such as interfacial Si/Ge alloy structure optimization.^{60} One reason may be that the distribution of structures near the lowest thermal conductance is large in the histogram (Fig. 6), suggesting that there are many structures with similarly low thermal conductance values. Thus, more calculations are required to find the superlattice with the global minimum of thermal conductance. This differs from the case for alloy structure optimization, which has a smaller distribution near the lowest thermal conductance. These results suggest that optimization efficiency is not universal and depends on the details of the optimization problem. Regardless, machine learning performs significantly better than a random search to find structures with a lower thermal conductance [Fig. 5(b)].

To check optimization accuracy, the thermal conductances of all candidates are calculated. It is confirmed that the minimum thermal conductance and the corresponding superlattice structure are the same as those obtained by Bayesian optimization.

To evaluate how optimization efficiency depends on the algorithm, the Bayesian optimization is compared with a random search and a Monte Carlo tree search, which combines the generality of random simulation with a precision tree search in optimization. For each optimization algorithm, ten rounds of optimizations are conducted with different random seeds, which represent the different optimization experiments. Figure 7 shows the average convergence of each optimization method. The efficiency of Bayesian optimization is much higher than that of random search. As for the Monte Carlo tree search, the efficiency is better than the random search but is lower than that of Bayesian optimization.

It is noteworthy that the Bayesian optimization is extremely effective when the total number of candidates is around several hundred thousand. However, for cases with a much larger or even unlimited number of candidates, the required computer memory becomes massive because the algorithm simultaneously handles and scores the entire candidate structures through the built model. To overcome this limitation, two options are offered. One is to divide the candidate pool into subgroups and perform the Bayesian optimization for each subgroup. Section IV shows recent applications of this option to design a thermal radiation surface. The other option is to adopt the Monte Carlo tree search algorithm, which has superior scalability for targeting candidates that are approaching the global optimal ones when the total number of candidates is large or infinite. As shown in Fig. 7, the optimization efficiency is reduced.

### B. Comparison of different descriptors

The optimization efficiencies for different descriptors described in Sec. II B are compared. Figure 8 shows the average number of calculated structures needed to find the superlattice with the minimum thermal conductance. To compare the efficiency of different descriptors, here we define the average efficiency by averaging the percentage of structures calculated until we find the optimal structure for each optimization experiment with random seed. The average efficiency is 6.59%, 4.25%, 3.10%, 6.31%, and 3.69% for descriptors using binary flag values, CM, EVMC, MM, and EVMM, respectively. The efficiencies for binary flag and MM descriptors are comparable, whereas CM improves the search efficiency. When reducing the dimension size of the descriptors using eigenvalues, the optimization efficiency further improves for both the CM and the MM.

The search efficiencies of Bayesian optimization and a random search are compared for different descriptors. For each type of descriptor, *N*_{Bayes} and *N*_{rand} are evaluated, which denote the average number of calculations needed until the first structure belonging to top k (k = 1–20) of all the candidates is found by Bayesian optimization and random search. Here, we compare the optimization efficiency using the ratio of *N*_{Bayes} to *N*_{rand} (*N*_{Bayes}/*N*_{rand}). For all descriptor types, the Bayesian optimization has a higher efficiency than the random search (Fig. 9). For k ≥ 6, the search efficiencies of EVCM and EVMM are worse than that of CM, MM, and binary flag values. Binary flag values, CM, and MM descriptors are more efficient than the eigenvalue descriptors to find superlattice structures approaching the global minimum thermal conductance.

### C. Number of candidates calculated and added in each round

In the Bayesian optimization, one adjustable parameter is the number of candidates added in each round. Figure 10 compares the consumed optimization time and the efficiency for different numbers of candidates added in each round. The number added should not be too small or too large. If it is too small, modeling and selection must be performed frequently, resulting in a longer optimization time. If it is too large, the total number of calculations required to obtain the optimal structure becomes large. By balancing these two factors, the optimal number is 50 in the current superlattice structure optimization. The optimal number depends on the characteristics of each problem. However, our experience suggests that the optimal number typically ranges between 20 and 50.

### D. New knowledge gained via MI

MI not only finds the optimal structure with a high efficiency but also helps find the physical mechanisms behind designed novel structures. The aforementioned optimized aperiodic superlattice structure (10010101101111) is one such example. In fact, we have optimized superlattices with different layer thicknesses. All the results show that the aperiodic superlattice has a relatively lower thermal conductivity than its corresponding periodic one [Fig. 11(a)].

Knowing the optimal structures or classes, we can “reverse-study” the mechanism that gives rise to minimum thermal conductance. Using two separate calculations, one where the thickness is changed for a constant number of layers and the other where the number of layers is changed at a constant thickness, the optimal periodic superlattice is evaluated. Both reduce the thermal conductance but for different reasons. The change when altering the thickness is due to Fabry–Pérot wave interference, while that when altering the number of layers is due to interfacial scattering. When the total thickness of the superlattice is fixed, the thickness and the number of layers cannot be freely varied, imposing a trade-off to balance the two. Here, an aperiodic structure has more degrees of freedom to optimize the balance. The significance of the added degrees of freedom in an aperiodic superlattice can be seen in Fig. 11(b) as the spread of the data for a certain number of layers (i.e., the number of interfaces). Hence, the minimum value is the one with the best balance between wave interference and interfacial scattering effects. Recent progress on the aperiodic GaAs/AlAs superlattice design via Bayesian optimization has been demonstrated experimentally in Ref. 98. The agreement between the fully coherent calculation and the experiments assures that the minimization of thermal conductivity is due to the optimal control of the coherent phonon thermal transport. The significant reduction in the optimal aperiodic superlattice compared with the optimal periodic superlattice is attributed to the stronger and more extensive localization of the acoustic phonons.

## IV. RECENT APPLICATIONS OF NANOSTRUCTURE OPTIMIZATION BY MI

### A. Thermoelectric optimization

The Bayesian optimization is also useful to optimize the figure-of-merit, which is comprised of multiple properties in trade-off relations such as thermal and electric transport in thermoelectric materials. Here, we introduce graphene nanoribbons (GNRs) structural optimization,^{42} which alternated the phonon/electron transport calculation and the Bayesian optimization to obtain a high thermoelectric figure of merit.

#### 1. Problem description and objective function

Figures 12(a) and 12(b) show the two types of defective GNR structures: the periodically nanostructured GNR and the antidot GNR. For the periodic GNR, the optimization problem removes *m* atoms from the system to achieve the highest figure of merit. To ensure structural stability during optimization, it is constrained such that every carbon atom in each candidate structure belongs to at least one complete hexagonal lattice, and no hexagonal lattice is isolated. As for the antidot GNR, it is composed of multiple sections, and each section is either a pristine structure or an antidot structure with a pore possessing a diameter of *d* = 2.9 Å in the center. The optimization problem is whether to dig a hole in each section to achieve the highest figure of merit.

#### 2. Descriptors and transport property evaluation

Similar to the phonon transport optimization, binary flag values are employed as descriptors, where 0 and 1, respectively, represent the defective and complete hexagonal lattices in periodic GNR optimization [Fig. 12(a)] and the pristine structure and the antidot structure in antidot GNR optimization [Fig. 12(b)]. In this way, the vector value ** g** = (1, 0, …) can describe each candidate structure in optimization. Green's function with the Landauer formula is employed to evaluate the thermoelectric properties, which includes the phonon/electron thermal resistance, the Seebeck coefficient, and the power factor calculated using the Atomistix ToolKit package.

^{86}

#### 3. Results and discussions

The efficiencies of Bayesian optimization and a random search for thermoelectric optimization are first compared for periodic nanostructured GNR optimization. In most cases, a random search requires double the number of candidate structure calculations compared to the Bayesian optimization to find top 0.5% of all candidates with the highest figure of merit. The absence of *m*-dependence indicates that the optimization efficiency is independent of the total number of candidates. Hence, a comparable efficiency is expected in cases for larger GNR systems.

Figure 12(c) shows GNR structures with *m* = 4 and 10 with the largest power factor and lowest thermal resistance, respectively. For the highest power factor structure, vacancies are introduced over the entire area of GNR, except for the hexagonal lattices along the edge. The structural optimization leads to a strong flattening of the electronic bands around the energy levels of the edge state, and eventually, bandgaps are generated. Nanostructuring in the middle areas of zigzag GNRs leads to phonon scattering without a significant change in the edge state and enhances the thermoelectric performance. For the structure with the lowest thermal resistance, GNR has a complicated labyrinthian shape, where the phonons are significantly localized with reduced group velocities.

For the antidot GNR optimization, 16 sections, which give a total of 32 896 candidates, are considered. The final optimal structure has an aperiodic array of antidots. Figure 12(d) compares the figure of merit, power factor, and thermal resistance of the pristine, periodic antidot, and optimal structures. The optimal arrangement of antidots increases the figure of merit 11 times with an enhanced power factor and thermal resistance of 2.9 and 3.6 times, respectively. It is interesting to note that simply arranging the antidot periodically increases the figure of merit 5.0 times compared with the pristine structure. However, the remaining 2.1-time enhancement requires optimization. These results show that the optimization of the arrangement of antidots can simultaneously improve thermal and electronic properties. The successful thermoelectric optimization for defective GNR structures demonstrates the versatility of the Bayesian optimization in a robust and highly efficient multi-variable optimal search.

### B. Thermal radiation

Tailoring thermal radiation plays an important role in designing ultranarrow-band, polarization-dependent, directional emission nanophotonics, and metamaterials. Such materials have a wide range of applications in solar emitters/absorbers, thermal functional textiles, light incandescent sources, and radiative cooling. The Bayesian optimization has been successfully applied to an ultranarrow-band wavelength-selective thermal radiator and thermal radiative cooling.^{44} Here, we introduce the work by Guo *et al.*^{44} They employed the Bayesian optimization to design structures consisting of a one-dimensional grating and multilayer structures for a spectral-selective radiative cooling application.

#### 1. Problem description and objective function

Thermal radiative cooling takes advantage of the atmospheric transparent window in the wavelength range of 8–13 *μ*m. This window directly transmits the thermal radiation energy to 3 K outer space without any energy consumption. A high radiative emissivity is essential to maximize the heat flux radiated from this atmospheric transparent window. The structure to be optimized is composed of a top dielectric grating layer, a middle multilayer, and a thin silver layer at the bottom [Fig. 13(a)]. The grating material is silica because it can excite the surface phonon polaritons around 10 *μ*m to enhance the absorption in the atmospheric window. Candidate materials for the multilayer are selected based on the criteria that the absorption should be high in the atmospheric window while the solar energy wavelength range should be as low as possible. Silver is selected as the substrate material due to its high plasmon frequency to reflect the solar energy.

To evaluate the thermal radiative property within the atmospheric window for each candidate structure, the figure of merit (FOM), which compares the emission power with blackbody case, is defined as

where *ɛ* is the emissivity of each structure and *E _{b}* is the spectral blackbody emissive.

*λ*

_{1},

*λ*

_{2},

*λ*

_{min}, and

*λ*

_{max}are 8, 13, 1, and 20

*μ*m, respectively.

#### 2. Descriptors and transport property evaluation

The parameters involved in optimization are the grating period length *P*; filling ratio *f*; height *h*; the material type for three layers *m*_{1}, *m*_{2}, and *m*_{3}; and thickness *d*_{1}, *d*_{2}, and *d*_{3} for three layers in the multilayer region. Each possible candidate is represented by descriptors [*P*, *f*, *h*, *m*_{1}, *m*_{2}, *m*_{3}, *d*_{1}, *d*_{2}, and *d*_{3}]. Prior to optimization, different length scales and material types are roughly estimated to determine the variation range of these parameters as well as to fix increments to control the searching space. Here, the grating period length *P* ranges from 1 *μ*m to 10 *μ*m in 0.5-*μ*m steps, the filling ratio *f* ranges from 0.1 to 1 in 0.05 steps, and the height *h* ranges from 0.5 *μ*m to 10 *μ*m in 1-*μ*m steps. The material types are 0, 1, and 2 denoting Si, SiO_{2}, and Al_{2}O_{3}, respectively. The multilayer thicknesses *d*_{1}, *d*_{2}, and *d*_{3} range from 1 *μ*m to 10 *μ*m in 0.5-*μ*m steps. Hence, the total number of candidate structures is 510,183,360.

The property of the thermal photonic structure is evaluated by the rigorous coupled-wave analysis method. The structures are sliced into several layers along the *z*-direction. The electromagnetic field and dielectric distribution of the structure are decomposed by summation of the finite Fourier harmonic terms in each sliced layer. By imposing the continuous boundary condition between the adjacent sliced layers of the electric and magnetic fields, the governing Maxwell equation of the electric or the magnetic field can be solved fast and accurately.

#### 3. Results and discussion

Due to memory-size constraints in Bayesian optimization, the candidates are divided into 320 × 81 subgroups. Then, the Bayesian optimization is performed for each group. Finally, the best structure is selected from each subgroup. Less than 50 candidates in each subgroup are used to identify the optimal structure in Bayesian optimization. By contrast, a random search requires 4500 structures to find the optimal candidate in each subgroup.

The descriptor of the final targeted optimal structure is [8, 0.5, 1.4, 0, 0, 1, 0.9, 0.5, and 1] [Fig. 13(a)]. Its spectral selectivity exactly matches the atmospheric window [Fig. 13(b)]. It is interesting to see that, despite including Al_{2}O_{3} in the optimization, the optimal structure does not contain any Al_{2}O_{3}. Figures 13(c) and 13(d) give the polar angle dependence of emittance for the designed optimal structure under *p-*polarized and *s*-polarized incident waves, respectively. For *s-*polarized wave incidence, the phonon polariton cannot be excited due to the absence of a material with negative permeability. The absorption is from the intrinsic absorption band of the materials. Comparing the *s-* and *p-*polarized incident results, the magnitude of *p*-polarized incident emittance increases greatly within 8–11 *μ*m, indicating the necessity of a grating structure on the top surface to excite the phonon polaritons.

## V. SUMMARY AND PERSPECTIVES

In the past few years, MI has evolved into a powerful and highly efficient tool for thermal functional materials design. In this Tutorial, we discuss the basic principles and implementation of MI involving descriptors, objective functions, property calculators, and informatics algorithms. The Bayesian optimization has been applied to find the superlattice structure with a minimized thermal conductance. The optimal aperiodic superlattice can be designed by calculating only a few percent of the total number of candidates. The low thermal conductivity of the aperiodic superlattice is attributed to the balance of the Fabry–Pérot wave interference and interfacial particle scattering. These results show that MI not only supports finding the optimal structure, but also promotes the discovery of physical mechanisms. The thermoelectric optimization for defective GNR structures demonstrates the versatility, efficiency, and robustness of the Bayesian optimization in a multi-variable optimal search. For thermal radiation, a hybrid multilayer and grating structure realizes radiative cooling tailored to the thermal emittance within the atmospheric window. This material has the highest cooling performance to date. These successful applications demonstrate the advantage of MI in designing thermal functional materials with a desired property.

Despite the successful applications, MI for thermal transport is still in its infancy, which offers great opportunities for future development.

Most current MI work focuses on computational thermal transport due to ease and speed of investigation. In the future, experimental MI research should be expanded. In materials science, high-throughput materials synthesis and composition/structure characterization on combinatorial chips have been already relatively well developed. Combining MI with thermal property mapping measurements on a micrometer-scale resolution using methods such as time-domain thermoreflectance

^{99}may accelerate this process.Although various elemental, structural, digital, and physical property descriptors have been developed for MI, novel and effective descriptors depend on the application. How to optimize the selection and combination of different types of descriptors is also important. One challenge is that expert experience or knowledge about descriptor selection is unknown prior to machine learning. One way to overcome this is to use machine learning to select the descriptors.

^{100}When using MI, there is a large gap between the “big data” required for credible machine learning and the “small data” currently available. The technique of transfer learning

^{56}has considerable potential to address this issue. For the target property to be predicted from limited training data, an intermediate property, which holds physical connections with the target property, is pretrained using available big data. The final prediction model can then be obtained by adjusting minor parameters in the pretrained model using small data.As the design degrees of freedom increase, optimizations tend to exceed the ability of classical algorithms since optimization problems grow exponentially more difficult with the number of variables. Although the Monte Carlo tree search and genetic algorithms can deal with such optimization problems, their efficiency is not high. Recently, quantum annealing

^{65}has been incorporated into automated materials’ discovery to circumvent the difficulty with global optimization. Its application to design complex metamaterials for thermal radiative cooling using quantum annealing has shown its amazing efficiency and searching ability.

The concept of MI should be widely applied in more advanced materials’ design related to phonons, electrons, photons, and magnons.

## SUPPLEMENTARY MATERIAL

Information concerning installation of COMBO, a sample script for the Si/Ge superlattice optimization, and data file description are provided.

## ACKNOWLEDGMENTS

We thank Koji Tsuda, Thaer M Dieb, Atsushi Sakurai, Jiang Guo, and Masaki Yamawaki for their significant contributions to some of the works reviewed here. This work was supported by the Materials Research by Information Integration Initiative (MI^{2}I) project with the Support Program for Starting Up Innovation Hub and CREST (Grant Nos. JPMJCR16Q5 and JPMJCR20Q3) from the Japan Science and Technology Agency (JST), and KAKENHI (Grant No. 19K14902) from the Japan Society for the Promotion of Science (JSPS). This work was also partially supported by the National Natural Science Foundation of China (NNSFC; Grant No. 52006134), the Materials Genome Initiative Center of Shanghai Jiao Tong University, and the Materials Genetic Engineering Project for Rare and Precious Metals by Yunnan Province (Grant No. 202002AB080001-1).

## DATA AVAILABILITY

The data and unavailable thesis reference that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

_{2}O

_{4}