Thermoelectric coolers (TECs) offer a promising solution for direct cooling of local hotspots and active thermal management in advanced electronic systems. However, TECs present significant trade-offs among spatial cooling, heating, and power consumption. The optimization of TECs requires extensive simulations, which are impractical for managing actual systems with multiple hotspots under spatial and temporal variations. In this study, we present a novel machine learning-assisted optimization algorithm for thermoelectric coolers that can achieve global optimal temperature by individually controlling TEC units based on real-time multi-hotspot conditions across the entire domain. We train a convolutional neural network with a combination of the inception module and multi-task learning approach to comprehend the coupled thermal-electrical physics underlying the system and attain accurate predictions for both temperature and power consumption with and without TECs. Due to the intricate interaction among passive thermal gradient, Peltier effect and Joule effect, a local optimal TEC control experiences spatial temperature trade-off which may not lead to a global optimal solution. To address this issue, we develop a backtracking-based optimization algorithm using the machine learning model to iterate all possible TEC assignments for attaining global optimal solutions. For any m × n matrix with NHS hotspots (n, m ≤ 10, 1 ≤ NHS ≤ 20), our algorithm is capable of providing 52.4% peak temperature reduction and its corresponding TEC array control within an average of 1.64 s while iterating through tens of temperature predictions behind-the-scenes. This represents a speed increase of over three orders of magnitude compared to traditional finite element method strategies which take approximately 27 min.

Despite great advancements in semiconductor technology beyond the sub-3 nm node,1 most thermal management techniques nowadays are limited to the macroscale operation. The trend toward device miniaturization and the rapid emergence of System-on-Chip (SoC) inevitably complicate the thermal behavior within microelectronic devices.2,3 Specifically, multiple on-chip hotspots exhibit spatial and temporal changes due to workload variations, environmental fluctuations, device defects and aging, which can occur among modules,4,5 cores (processors),6,7 and transistors.8 The complexity of the hotspot behavior presents unprecedented challenges for conventional thermal management methods which only rely on uniform control, necessitating a more efficient, sophisticated, and intelligent approach capable of on-demand thermal management to ensure optimal functionality and longevity of microelectronic devices.9 

Among various active cooling techniques, thermoelectric coolers (TECs) offer distinctive local cooling capability as well as several other advantages,10–12 making them a promising solution to hotspot thermal management. In recent years, there have been emerging designs utilizing single TECs13–16 and TEC arrays17–19 for on-chip hotspot cooling in microelectronic devices. Revolutionary materials, including nanostructured Si,20–22 self-hygroscopic hydrogel,23 and flexible inorganics,24,25 are extensively studied to improve TEC cooling performance. However, TEC cooling exhibits significant trade-offs in spatial temperature and power consumption, and its performance relies on multiple variables including TEC voltages and hotspot conditions.12,26 Because TEC behavior is highly non-linear, optimization requires many candidate solutions to be evaluated, which makes conventional finite element method (FEM) simulations computationally expensive. Furthermore, in actual applications where multiple hotspots undergo spatial and temporal evolution, it becomes challenging or even impossible for the traditional techniques to deliver real-time optimal TEC control.

The thriving field of machine learning offers a powerful tool for thermoelectric research by providing neural network models that greatly expedite thermoelectric material selection,27 TEC design,28,29 and optimization.30,31 However, these models primarily focus on back-end design analysis, either considering an individual, isolated TEC device28,29 or using over-simplified optimization logic such as linear control30 and uniform control.31 There remains an unmet and urgent need for a more comprehensive model that can capture the coupled thermal-electrical physics in TECs while predicting their spatial interplay with multiple hotspots undergoing dynamic evolution across the entire domain. Ultimately, this model should be capable of providing responsive TEC control and the corresponding power consumption over the entire domain to achieve real-time global optimal temperature.

In this study, we present a machine learning-assisted optimization algorithm for TECs that fulfills the aforementioned on-demand thermal management. We utilize our previous holey silicon-based TEC array with independent TEC control32 as an illustrative example for conducting the analysis. We develop a convolutional neural network (CNN) with the inception module and multi-task learning (MTL) approach to perceive the spatial correlation of TECs and hotspots, thereby accurately predicting temperature and power consumption by comprehending the thermal-electrical physics underlying the system. During the TEC optimization process, the major challenge lies in the intricate thermal-electrical interaction among multiple hotspots and TECs, since a local optimal TEC control may not lead to a global optimal solution due to temperature redistribution. Therefore, we develop a backtracking-based optimization algorithm that efficiently explores all potential TEC assignments in order to obtain the global optimal temperature based on real-time hotspot conditions. Note that this methodology can be applied to general TEC/TEC array designs with a wide range of thermoelectric materials (e.g., Bi2Te3/Sb2Te3), configurations (e.g., lateral- and vertical-oriented TECs), and device scales (e.g., module-scale and transistor-scale). Consequently, this approach can hopefully provide efficient TEC/TEC array control logics for the future TEC-incorporated electronic systems.

To demonstrate our study, we choose our previous theoretical designs,32 the holey silicon-based single TEC [Fig. 1(a)] and its scaled-up array [Fig. 1(b)], as our TEC model. The model features a lateral orientation of the TEC components (i.e., Peltier electrodes and holey silicon region) along with the central hotspot. Here, holey silicon is chosen as the thermoelectric material because it is compatible with microfabrication processes and because its vertical nanoholes substantially decrease the in-plane thermal conductivity through phonon boundary scattering while retaining the excellent electrical properties (electrical conductivity and Seebeck coefficient) of p-type silicon.15,21,33–35 When a positive voltage is applied to the cooler, lateral heat redistribution occurs, which provides active heat removal from the hotspot to the in-plane surroundings. Compared to a single TEC, the TEC array shown in Fig. 1(b) has multiple coolers enclosed by a single ground, which allows for independent TEC control. Based on the existing hotspot conditions, different coolers can accept different input voltages to achieve on-demand thermal management.

FIG. 1.

Holey silicon-based lateral TEC and its array model. (a) The schematic of a single TEC. (b) The schematic of an arbitrary m × n TEC array with NHS assigned hotspots and NTEC assigned TECs (1 ≤ m, n ≤ 10; 0 ≤ NHS, NTEC ≤ min[m × n, 20]). The intensities of hotspots and TECs are in nine discrete levels. (c) FEM modeling examples of a 3 × 3 hotspot-TEC array with three scenarios: hotspots only, TECs only, and hotspots + TECs.

In general, the TEC modeling involves thermal-electrical physics which includes passive heat diffusion, thermoelectric effect, and Joule effect. In steady state, the governing equations can be written as
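$$\nabla \cdot \mathbf{q} = Q_e \tag{1}$$
$$Q_e = \mathbf{J} \cdot \mathbf{E} \tag{2}$$
$$\mathbf{q} = S T \mathbf{J} - k \nabla T \tag{3}$$
$$\mathbf{J} = -\sigma \left( \nabla V + S \nabla T \right) \tag{4}$$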
where q is the heat flux vector, Q_e is the Joule heat, J is the current density vector, E is the electric field vector, S is the Seebeck coefficient, T is the absolute temperature, k is the thermal conductivity, V is the voltage, and σ is the electrical conductivity. Equations (1) and (2) represent heat transfer in a thermoelectric material, where Joule heating is considered the primary source of internal heat generation. Equation (3) defines the total heat flux, which combines the Peltier effect and conventional heat conduction dictated by Fourier's law. Equation (4) describes the current density as driven by both electric potential and temperature gradients as a result of the Seebeck effect. In our TEC model, the thermal conductivity, Seebeck coefficient, and electrical conductivity of holey silicon are set at 1 W/mK, 440 μV/K, and 1.36 × 10⁴ S/m, respectively, representing a highly doped p-type holey silicon thin film with 30% porosity and 20 nm neck size at elevated temperatures.21,34,35 The ambient temperature is set at 30 °C, and the array boundary is set to conduct heat laterally. Considering the target of transistor-scale thermal management, the size of one cooler unit is defined as 1 × 1 μm². In addition, to investigate the scaling effect and provide more flexibility, the TEC array can have arbitrary m rows and n columns (1 ≤ m, n ≤ 10) with NHS randomly assigned hotspots and NTEC randomly assigned TECs (0 ≤ NHS, NTEC ≤ min[m × n, 20]). To simplify the problem while exaggerating the temperature difference, the assigned hotspot heat flux and assigned TEC voltage are pre-defined with eight discrete intensities, ranging from 0.5 × 10¹⁵ to 4 × 10¹⁵ W/m³ (in steps of 0.5 × 10¹⁵ W/m³) and from 25 to 200 mV (in steps of 25 mV), respectively. Figure 1(c) demonstrates a specific example of using 2D steady-state FEM simulations for a 3 × 3 TEC array.
From the simulations, a temperature map can be numerically derived given the values of m, n, NHS, NTEC and their corresponding intensities.
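As a minimal sketch of the discrete intensity scheme described above, the Python snippet below maps an intensity level (0 to 8, with 0 meaning "not assigned") to the stated physical values and builds a random level grid. The function names, the uniform random placement, and the fixed seed are illustrative assumptions and are not taken from the authors' implementation.

```python
import random

HEAT_STEP_W_PER_M3 = 0.5e15  # hotspot heat generation per intensity level
VOLTAGE_STEP_MV = 25.0       # TEC voltage per intensity level
MAX_LEVEL = 8                # level 0 means "not assigned"


def level_to_heat(level):
    """Map a hotspot level (0..8) to heat generation in W/m^3."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must be in 0..8")
    return level * HEAT_STEP_W_PER_M3


def level_to_voltage_mv(level):
    """Map a TEC level (0..8) to input voltage in mV."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must be in 0..8")
    return level * VOLTAGE_STEP_MV


def random_level_grid(m, n, count, rng):
    """Place `count` non-zero levels at random cells of an m x n grid.

    Assumption: positions and levels are drawn uniformly at random.
    """
    if not 0 <= count <= m * n:
        raise ValueError("count must fit inside the grid")
    grid = [[0] * n for _ in range(m)]
    for cell in rng.sample(range(m * n), count):
        grid[cell // n][cell % n] = rng.randint(1, MAX_LEVEL)
    return grid


rng = random.Random(0)  # fixed seed so the example is repeatable
hotspots = random_level_grid(3, 3, 4, rng)  # 3 x 3 array with NHS = 4
tecs = random_level_grid(3, 3, 2, rng)      # 3 x 3 array with NTEC = 2
```

Keeping the grids as integer levels and converting to physical units only at the simulation boundary keeps the hotspot and TEC inputs on the same 0 to 8 scale.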

Figure 2 illustrates the research workflow, which includes massive data generation and data postprocessing. Here, we use MATLAB R2021a to generate random hotspot and TEC inputs based on the given constraints, which are sequentially fed to COMSOL 6.0 to conduct FEM simulations and evaluate temperature maps and power consumption as outputs. In MATLAB, random values of rows (m) and columns (n) in the TEC array are first defined. Then, two random m × n matrices with values from 0 to 8 are generated, representing the hotspot and TEC intensities. The maximum number of assigned hotspots (NHS) and TECs (NTEC) depends not only on the model constraints but also on the robustness of the training process. Here, we select min[m × n, 36] as the maximum number: it exceeds the original span (0 ≤ NHS, NTEC ≤ min[m × n, 20]), which makes the training more robust, yet it is not so large as to weaken the independence of individual units. With the defined hotspot and TEC inputs, steady-state FEM simulations are conducted, followed by the evaluation of temperature maps. Finally, the temperature is stored in an m × n output and the power consumption at the eight intensities is stored in an 8 × 1 output. The automated program generates 100 000 random samples, a number chosen by considering the complexity of the model, the desired accuracy, and the computational resources. After the original data set is generated, it is split into a training set (70%), a development set (10%), and a test set (20%). The training set is further augmented with flipping (horizontal and vertical), rotation (90°, 180°, and 270°), and their combinations, resulting in eight times as many training samples as the original (560 000). This augmentation offers a cost-effective way to create new data, improve the model's accuracy and robustness,36 and facilitate the learning of inherent symmetry.
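The eightfold augmentation can be illustrated with a short Python sketch. Flips, rotations, and their combinations on a rectangular grid give exactly eight distinct transforms (four rotations, each with and without a mirror). The function names are hypothetical, and the sketch assumes that the same transform is applied to the hotspot input, the TEC input, and the temperature map while the 8 × 1 power output is left unchanged.

```python
def rotate90(grid):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def flip_horizontal(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def dihedral_variants(grid):
    """Return the eight variants: four rotations, each with and without a flip."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def augment_sample(hotspots, tecs, temperature):
    """Apply each transform consistently to both inputs and the temperature map.

    Assumption: the power output does not depend on array orientation,
    so it is reused unchanged for all eight variants.
    """
    return list(zip(dihedral_variants(hotspots),
                    dihedral_variants(tecs),
                    dihedral_variants(temperature)))


example = [[1, 2], [3, 4]]
variants = dihedral_variants(example)
augmented_training_size = 70_000 * len(variants)
```

For the 70 000 training samples (70% of 100 000), eight variants per sample give the 560 000 training samples stated above. A vertical flip is not listed separately because it equals a 180° rotation followed by a horizontal flip.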

FIG. 2.

Research workflow. (Left) Massive data generation based on autonomous FEM simulations and (right) data postprocessing, including data splitting, data augmentation, neural network training, and TEC optimization algorithm design.

Because the task requires generating a 2D temperature map that depends strongly on the spatial correlation among hotspots and TECs, a convolutional neural network (CNN) is chosen as the base model.37 However, most CNN models in the literature, such as LeNet-5,38 VGG16,39 and GoogLeNet (Inception),40 are primarily designed for classification tasks. Therefore, modifications are necessary to adapt the CNN model for regression tasks with continuous output values. After conducting multiple tests and comparisons, we have developed a CNN model based on the inception module40,41 rather than other options (e.g., encoder–decoder42,43 and residual network44) and employed a multi-task learning (MTL) approach45,46 to comprehend the thermal-electrical physics underlying the system and attain accurate predictions for both temperature and power consumption, as depicted in Fig. 3. Specifically, the 10 × 10 × 3 input matrix consists of three channels, representing the hotspot heat flux, TEC voltage, and boundary indicator. Note that the boundary indicator is introduced as an auxiliary input channel so that the model can learn the array boundary effectively. For any m × n TEC array where m or n is less than 10, zero padding is applied to ensure uniform shapes for both the input and the temperature output. The inception module incorporates nine parallel pathways, each with one or more Conv2D layers of a given kernel size. Each layer is followed by a rectified linear unit (ReLU)47 activation function to introduce non-linearity. Through many trials, it is observed that the network performs best when the layer counts are [1,3,4,4,1,1,1,1,1], corresponding to the pathways with kernel sizes from 1 × 1 to 9 × 9. More layers are found necessary for pathways with kernel sizes of 2 × 2, 3 × 3, and 4 × 4 since these kernels capture the most important spatial hierarchies in the system. Besides, the one-layer pathway with the 1 × 1 kernel may play a role in boundary detection. Furthermore, the other short pathways with large kernels seem crucial for perceiving extensive spatial distribution.48 The results obtained from the multiple pathways are then concatenated and flattened before being processed by two separate series of fully connected layers with linear activation functions for continuous output values. Finally, the first series generates 100 nodes, representing a 10 × 10 temperature map output, and the second series generates 8 nodes, which correspond to the power consumption at the 8 TEC intensities. During training, the sum of two mean squared errors (MSEs),49,50 one for temperature and one for power, is defined as the loss function used to adjust the weights of all layers through backpropagation, which can be written as
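$$L_{\mathrm{total}} = \mathrm{MSE}_T + \mathrm{MSE}_P \tag{5}$$
$$\mathrm{MSE}_T = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( Y_{i,j} - \hat{Y}_{i,j} \right)^2 \tag{6}$$
$$\mathrm{MSE}_P = \frac{1}{N_L} \sum_{k=1}^{N_L} \left( Z_k - \hat{Z}_k \right)^2 \tag{7}$$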
where L_total is the total loss, MSE_T and MSE_P are the mean squared errors of temperature and power consumption, respectively, Y_i,j and Ŷ_i,j are the ground-truth and predicted temperatures in the ith row and jth column, respectively, M and N are the maximum numbers of rows and columns, respectively (M = N = 10), Z_k and Ẑ_k are the ground-truth and predicted power consumption values at the kth TEC intensity, respectively, and NL is the total number of non-zero TEC intensities (NL = 8). Figure 4 shows the MSE loss over 100 training epochs. With a learning rate of 0.001 and a batch size of 500 samples, the loss decreases rapidly through the first 20 epochs and stabilizes to the third decimal place through the last 20 epochs. Table I summarizes the MSE loss for the training set, development set, and test set.
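The summed multi-task loss described above can be sketched in plain Python for a single sample. This is an illustrative calculation only: the function names are hypothetical, the two terms are assumed to be weighted equally, and a real training loop would compute the same quantity inside a deep learning framework so that gradients are available.

```python
def mse_temperature(y_true, y_pred):
    """Mean squared error over the full padded M x N temperature map."""
    rows, cols = len(y_true), len(y_true[0])
    total = sum((y_true[i][j] - y_pred[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def mse_power(z_true, z_pred):
    """Mean squared error over the power values at the non-zero TEC intensities."""
    if len(z_true) != len(z_pred):
        raise ValueError("power vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(z_true, z_pred)) / len(z_true)


def total_loss(y_true, y_pred, z_true, z_pred):
    """Sum of the temperature and power MSE terms (equal weights assumed)."""
    return mse_temperature(y_true, y_pred) + mse_power(z_true, z_pred)


# Toy data: every temperature is off by 1 and every power value is off by 2.
y_true = [[30.0] * 10 for _ in range(10)]
y_pred = [[31.0] * 10 for _ in range(10)]
z_true = [0.0] * 8
z_pred = [2.0] * 8
loss = total_loss(y_true, y_pred, z_true, z_pred)
```

In this toy case the temperature term is 1 and the power term is 4, so the total loss is 5. Because the temperature term averages over all 100 padded cells, small arrays contribute many zero-valued cells to that average.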
FIG. 3.

CNN based on inception module and multi-task learning. The input matrices and output temperature matrix are padded into 10 × 10 arrays for uniform shapes. The output power matrix is an 8 × 1 array corresponding to eight TEC intensities.

While MSE is sensitive to large values and offers a smooth and differentiable function for the training, the mean absolute error (MAE)51 provides a more interpretable evaluation of temperature prediction. The MAE loss is defined as
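$$\mathrm{MAE} = \frac{1}{m n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| Y_{i,j} - \hat{Y}_{i,j} \right| \tag{8}$$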
where m and n represent the actual numbers of rows and columns before padding, respectively. Figures 5(a)–5(e) show the loss analysis with an additional 4200 samples, where each data point is the average of 10 random samples for every combination of x-axis and y-axis arguments. As shown in Fig. 5(a), a relatively small and uniform MAE loss occurs when no TEC is assigned. However, when any TEC is assigned, the MAE loss becomes generally larger and depends on the numbers of array rows and columns. This is because the thermoelectric effect significantly increases the model complexity compared to a pure heat transfer model. It should be noted that the number of assigned TECs does not significantly impact the MAE loss when more than one TEC is assigned, as shown in Fig. 5(d). For all samples, the error in power consumption between the simulated and predicted values is plotted in Fig. 5(e).
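A minimal Python sketch of the MAE evaluation is given below. Unlike the training loss, it averages only over the actual m × n array and ignores the padded cells. The function name is hypothetical, and the sketch assumes that the actual array occupies the top-left corner of the padded 10 × 10 map, which the text does not specify.

```python
def mae_unpadded(y_true, y_pred, m, n):
    """Mean absolute error over the first m rows and n columns only.

    Assumption: the real m x n array sits in the top-left corner of the
    padded map, so cells outside that block are padding and are skipped.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be at least 1")
    total = sum(abs(y_true[i][j] - y_pred[i][j])
                for i in range(m) for j in range(n))
    return total / (m * n)


# Toy 10 x 10 maps for a 3 x 4 array: error of 2 inside the array,
# and a large error of 100 in the padding that must not count.
m, n = 3, 4
y_true = [[0.0] * 10 for _ in range(10)]
y_pred = [[2.0 if (i < m and j < n) else 100.0 for j in range(10)]
          for i in range(10)]
mae = mae_unpadded(y_true, y_pred, m, n)
```

Here the result is 2, which can be read directly as the average temperature error per cooler unit in °C for that array.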
FIG. 4.

MSE loss as a function of training epochs.

FIG. 5.

Loss analysis of 4200 random samples. (a)–(c) MAE loss as a function of array dimensions (1 ≤ m, n ≤ 10) in 3000 samples. (a) Hotspots only. (b) TECs only. (c) Hotspots + TECs. (d) MAE loss as a function of hotspot and TEC counts (1 ≤ NHS, NTEC ≤ 10) in a 6 × 6 TEC array in 1200 samples. (e) Error of power consumption between ground-truth and predicted values for all samples.

TABLE I.

Summary of MSE loss.

| Loss | Training loss (70% data) | Validation loss (10% data) | Test loss (20% data) |
| --- | --- | --- | --- |
| MSE, temperature | 0.604 | 0.936 | 0.951 |
| MSE, power | 0.258 | 0.447 | 0.443 |

The predictions of a 1 × 1 TEC array (i.e., a single TEC) under hotspot-only and TEC-only scenarios are illustrated in Fig. 6. For temperature predictions, the MAE loss can be simply interpreted as the local error in a single TEC. The results show that the CNN model not only captures the proportional relationship between temperature and hotspot intensity but also successfully predicts the parabolic dependence of TEC cooling on input voltage50 that arises from the coupled Joule and Peltier effects.

FIG. 6.

Predictions of 1 × 1 array. (a) Temperature prediction for hotspot only. (b) Temperature prediction for TEC only. (c) Power consumption prediction for TEC only.


To demonstrate the multi-hotspot scenarios, Fig. 7 illustrates four prediction examples of a 6 × 6 TEC array: (a) random hotspots, (b) random hotspots + TEC cooling, (c) clustered hotspots, and (d) clustered hotspots + TEC cooling. In the first two scenarios, nine arbitrary hotspots are assigned with random intensities to represent a system incorporating different modules that experience various local heating conditions. In the last two scenarios, on the other hand, nine assigned hotspots are clustered within a 3 × 3 region with equal intensity to mimic a system consisting of similar components that undergo simultaneous workloads. It is observed that the orderly, clustered scenarios exhibit slightly higher MAE loss than the scattered, random scenarios because well-organized patterns are less likely to appear during random data generation. Furthermore, for Figs. 7(b) and 7(d), a larger MAE loss is identified owing to the introduction of the TEC cooling mechanism. Nevertheless, the key features of the TEC array are still captured. First, significant lateral heat redistribution can be observed from the predictions, where higher temperature occurs near the TEC-assigned regions compared to their hotspots-only counterparts. Second, the clustered TECs are predicted to be less effective than isolated TECs with the same intensity. This is because adjacent TECs tend to drive active heat flow against each other, resulting in ineffective cooling. Lastly, local TEC cooling can be influenced by its corresponding hotspot conditions: a TEC is more likely to provide greater temperature reduction when its local hotspot has higher intensity. In summary, local TEC cooling will impact and be impacted by the surrounding TECs and hotspots. Therefore, achieving a global optimal solution is almost impossible by simply considering the local optimal TEC control.

FIG. 7.

Case studies of a 6 × 6 array for temperature predictions (unit: °C). (a) Arbitrary hotspots without TECs. (b) Arbitrary hotspots with TECs (not optimal). (c) Clustered hotspots without TECs. (d) Clustered hotspots with TECs (not optimal).


Due to the complex dependency among the TEC array and multiple hotspots, using traditional FEM-based techniques to enumerate all possible solutions is difficult or even impossible. However, with an efficient optimization algorithm based on the machine learning model, a real-time global optimal solution becomes feasible. In this study, we set our target to find the global optimal temperature (i.e., the smallest peak temperature) based on the existing hotspot conditions, although other targets, such as the minimum TEC power or TEC count for achieving an acceptable temperature, are also possible. Figure 8 shows the flowchart of the backtracking-based52,53 TEC decision-making algorithm. This algorithm computes the lowest peak temperature across the multi-hotspot system given the number of available TEC intensities (i.e., the optimization level, K). The established CNN model serves as a function to efficiently evaluate the current status. To improve efficiency and reduce unnecessary iterations, two assumptions are made: first, the highest-temperature grid cell has the highest priority for TEC assignment; second, the next TEC assignment must lead to a lower peak temperature than the current one. Only when these two assumptions hold will the algorithm look for a deeper solution based on the existing ones. Figure 9 demonstrates three cases of 9 × 9 TEC array control using the developed algorithm: (a) random sparse hotspots, (b) random dense hotspots, and (c) clustered hotspots. The peak temperatures of the three samples (348, 362, and 349 °C, respectively) are substantially reduced (by 177, 172, and 176 °C, respectively) after the single-level optimal TEC control. Here, we define cooling effectiveness = (Tpeak,0 − Tpeak,opt)/(Tpeak,0 − 30 °C), where Tpeak,0 and Tpeak,opt are the original and optimal peak temperatures, respectively, and 30 °C is the ambient temperature. At this point, the cooling effectiveness values of the three samples are 55%, 52%, and 55%, respectively. The total iteration counts (and times) are 26 (936 ms), 44 (1584 ms), and 47 (1692 ms), respectively. A greater K can lead to higher temperature reduction and cooling effectiveness at the expense of more computational cost. Additionally, all samples show a trend where the hotspot moves toward the center as the optimization process evolves. This is because TEC cooling inherently drives the system toward a more uniform temperature field, which manifests as the formation of a centralized hotspot with a mitigated temperature gradient. It is worth noting that only small and moderate TEC intensities (i.e., 1–5) are found in the optimal TEC assignments, while large TEC intensities (i.e., 6–8) are abandoned by the optimization algorithm because they may either cause too much penalty (i.e., temperature rise) to neighboring cells or generate too much local Joule heat. This again demonstrates that a local optimal TEC control may not suffice for the global optimal temperature. Interestingly, for clustered hotspots, optimal TEC assignments follow a staggered "checkerboard" pattern. This observation motivates a novel strategy for TEC array placement against uniform heat flux.
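The two pruning rules can be sketched as a small backtracking search in Python. This is a simplified illustration, not the authors' algorithm from Fig. 8: `toy_predict` is a hypothetical stand-in for the trained CNN with made-up coefficients, the allowed levels are passed in explicitly because the text does not state which intensities each optimization level uses, and all function names are assumptions.

```python
AMBIENT_C = 30.0


def toy_predict(hotspots, tecs):
    """Stand-in for the CNN surrogate. NOT the real physics: each TEC gives
    parabolic net cooling at its own cell and warms its four neighbors."""
    rows, cols = len(hotspots), len(hotspots[0])
    temp = [[AMBIENT_C + 10.0 * hotspots[i][j] for j in range(cols)]
            for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            level = tecs[i][j]
            if level == 0:
                continue
            temp[i][j] -= 12.0 * level - 2.0 * level ** 2
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    temp[a][b] += 1.5 * level
    return temp


def peak(temp):
    """Return (peak temperature, row, col) of a temperature map."""
    return max((t, i, j) for i, row in enumerate(temp) for j, t in enumerate(row))


def cooling_effectiveness(t_peak_0, t_peak_opt, ambient=AMBIENT_C):
    """(Tpeak,0 - Tpeak,opt) / (Tpeak,0 - ambient), as defined in the text."""
    return (t_peak_0 - t_peak_opt) / (t_peak_0 - ambient)


def optimize(hotspots, levels, predict=toy_predict):
    """Backtracking search for the lowest peak temperature."""
    rows, cols = len(hotspots), len(hotspots[0])
    tecs = [[0] * cols for _ in range(rows)]
    start_peak, i0, j0 = peak(predict(hotspots, tecs))
    best = {"peak": start_peak, "tecs": [r[:] for r in tecs], "calls": 1}

    def search(current_peak, i, j):
        if tecs[i][j] != 0:          # hottest cell already has a TEC: dead end
            return
        for level in levels:         # rule 1: only the hottest cell is tried
            tecs[i][j] = level
            new_peak, ni, nj = peak(predict(hotspots, tecs))
            best["calls"] += 1
            if new_peak < current_peak:   # rule 2: must lower the peak
                if new_peak < best["peak"]:
                    best["peak"] = new_peak
                    best["tecs"] = [r[:] for r in tecs]
                search(new_peak, ni, nj)
            tecs[i][j] = 0           # backtrack and try the next level

    search(start_peak, i0, j0)
    return best


result = optimize([[5, 0, 4]], levels=(1, 2, 3))
effectiveness = cooling_effectiveness(80.0, result["peak"])
```

In this toy 1 × 3 case the search lowers the peak from 80 °C to 62 °C with levels 3 and 1 on the two hotspot cells, using 13 surrogate calls. The `calls` counter plays the role of the iteration count discussed in the text, since each call corresponds to one temperature prediction.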

FIG. 8.

Flowchart of backtracking-based TEC decision-making algorithm.

FIG. 9.

Case studies of a 9 × 9 array for backtracking-based decision-making (unit: °C). The pink matrix represents the real-time hotspot conditions, the blue matrix represents the optimal TEC assignment based on the optimization level, and the red-green matrix denotes the corresponding global optimal temperature map. Three scenarios are discussed: (a) random sparse hotspots, (b) random dense hotspots, and (c) clustered hotspots.


Figure 10 evaluates the cooling performance achieved for 1800 random samples using the machine learning-assisted TEC optimization algorithm. A total of 600 samples are generated for each array size of 6 × 6, 8 × 8, and 10 × 10. Among these samples, there are 30 samples for every NHS ranging from 1 to 20. Here, the maximum optimization level is set at six, which allows up to six discrete intensities for the assigned TEC voltage. As a result, the average peak temperature reductions for the six levels are {206, 222, 226, 226, 226} °C, the average cooling effectiveness values are {52.3%, 56.5%, 57.4%, 57.6%, 57.6%, 57.6%}, and the corresponding power consumption values are {24.4, 29.0, 29.4, 29.7, 29.8, 29.8} mW, respectively. For 0 < K ≤ 3, an increase in K provides a greater peak temperature reduction and higher cooling effectiveness at the expense of increased power consumption. However, for 3 < K ≤ 6, the cooling reaches a plateau. In general, larger arrays and smaller hotspot counts result in more significant cooling due to more available TECs and larger space for heat redistribution.

FIG. 10.

As-achieved cooling performance and power consumption of 1800 random samples using the machine learning-assisted TEC optimization algorithm. (a) and (b) Peak temperature reduction as a function of the optimization level. (c) and (d) The corresponding cooling effectiveness. (e) and (f) The corresponding power consumption. The first column is varied by array dimensions and the second is by hotspot counts.


Figure 11 demonstrates the efficiency analysis based on the aforementioned 1800 samples using the machine learning-assisted TEC optimization algorithm. Here, the iteration count indicates the total number of predictions required to perform a single optimization, which reflects the computational cost. From level one to six, the average iteration counts are {36, 344, 1400, 2951, 3906, 4105}, respectively. The iteration count increases with the increased optimization level and becomes excessively large when NHS is large. Given the decreasing margin of cooling improvements, it is highly recommended to apply the TEC optimization algorithm with K ≤ 3 in order to achieve balanced computational cost and TEC cooling performance.

FIG. 11.

Efficiency analysis of 1800 random samples using the machine learning-assisted TEC optimization algorithm. (a) Iteration statistics of various array dimensions. (b) Iteration statistics of various hotspot counts.


To further investigate the efficiency of the machine learning-based TEC optimization algorithm, we record the single prediction time in Table II for both the FEM simulation and the CNN model using the 3000 samples mentioned in Fig. 5. Following this, in Table III, we summarize the running time of the optimization algorithm using the 1800 samples mentioned in Figs. 10 and 11. All FEM simulations are performed using COMSOL 6.0 with CPU computation on an AMD Ryzen 9 3950X processor (16-core, 3.5 GHz) and 128 GB memory, with a maximum of 1 582 714 degrees of freedom. The CNN model predictions are computed using GPU acceleration on an NVIDIA GeForce RTX 2080Ti (11 GB) with a total of 124 157 612 parameters. Based on the statistical results, the average FEM simulation time for a single prediction is found to be 45 s. Larger array sizes generally result in a longer computational time due to the increased degrees of freedom. Conversely, the CNN prediction shows a similar computational time across the various input variables, with an average time of only 42 ms. A speed increase of over three orders of magnitude is found when using the CNN model to conduct a single prediction compared to the traditional FEM methods. Furthermore, with the acceleration of the CNN model, the single-level, double-level, and triple-level TEC optimization can be carried out within an average time of 1.5, 14.5, and 58.8 s, respectively, whereas the same amount of FEM computation would take about 26.8 min, 4.3 h, and 17.4 h, respectively. The significant increases in speed pave the way for on-demand thermal management using realistic TEC systems. Future research will focus on the practical integration of TEC arrays into complex SoC systems, exploring ways to leverage the machine learning-assisted TEC optimization algorithm to ensure efficient and reliable operation.
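The reported running times follow from multiplying the average iteration count by the average time per prediction. The Python sketch below repeats that arithmetic using the rounded averages quoted in the text (42 ms per CNN prediction, 45 s per FEM simulation, and 36, 344, and 1400 iterations for levels one to three). The variable and function names are illustrative, and small differences from the reported values are expected because the inputs are rounded.

```python
CNN_MS_PER_PREDICTION = 42.0   # average CNN prediction time from the text
FEM_S_PER_PREDICTION = 45.0    # average FEM simulation time from the text
AVERAGE_ITERATIONS = {1: 36, 2: 344, 3: 1400}  # optimization level -> iterations


def cnn_seconds(iterations):
    """Estimated optimization time when every prediction uses the CNN."""
    return iterations * CNN_MS_PER_PREDICTION / 1000.0


def fem_seconds(iterations):
    """Estimated time if every prediction were a full FEM simulation."""
    return iterations * FEM_S_PER_PREDICTION


# Ratio of FEM time to CNN time for a single prediction.
speedup = FEM_S_PER_PREDICTION / (CNN_MS_PER_PREDICTION / 1000.0)

for level, iterations in AVERAGE_ITERATIONS.items():
    print(f"K={level}: CNN {cnn_seconds(iterations):.1f} s, "
          f"FEM {fem_seconds(iterations) / 3600.0:.2f} h")
```

With these rounded inputs the CNN-based estimates are about 1.5, 14.4, and 58.8 s, and the FEM-based estimates are about 27 min, 4.3 h, and 17.5 h. The per-prediction ratio is roughly 1070, consistent with the stated speed increase of over three orders of magnitude.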

TABLE II.

Summary of single prediction time between simulation and the CNN model.

| m × n | Test samples | Maximum simulation time (s) | Minimum simulation time (s) | Average simulation time (s) | Average CNN prediction time (ms) |
| --- | --- | --- | --- | --- | --- |
| [1,25] | 1530 | 39 | | 20 | 42 |
| [26,50] | 900 | 76 | 29 | 53 | 43 |
| [51,75] | 390 | 108 | 69 | 89 | 42 |
| [76,100] | 180 | 149 | 102 | 123 | 42 |

TABLE III.

Summary of running time for machine learning-assisted TEC optimization.

| Optimization levels | Test samples | Maximum time (iterations) | Minimum time (iterations) | Average time (iterations) |
| --- | --- | --- | --- | --- |
| 1 | 1800 | 3.3 s (78) | 0.3 s (8) | 1.5 s (36) |
| 2 | 1800 | 51 s (1212) | 0.4 s (9) | 14 s (344) |
| 3 | 1800 | 282 s (6702) | 0.4 s (9) | 59 s (1440) |
| 4 | 1800 | 815 s (19 403) | 0.4 s (9) | 124 s (2951) |
| 5 | 1800 | 1356 s (32 286) | 0.4 s (9) | 164 s (3906) |
| 6 | 1800 | 1545 s (36 780) | 0.4 s (9) | 172 s (4105) |
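
As a rough consistency check on Table III, the running times can be divided by the iteration counts. The short Python sketch below does this for the maximum and average columns, with rows listed in the order they appear in the table. It assumes, as an interpretation rather than a statement from the authors, that each iteration costs roughly one CNN prediction; the helper name is hypothetical. Under that assumption, every row should land near the 42 ms average CNN prediction time reported in Table II.

```python
# Rows copied from Table III, in table order:
# (maximum time in s, maximum iterations, average time in s, average iterations)
TABLE_III_ROWS = [
    (3.3, 78, 1.5, 36),
    (51.0, 1212, 14.0, 344),
    (282.0, 6702, 59.0, 1440),
    (815.0, 19403, 124.0, 2951),
    (1356.0, 32286, 164.0, 3906),
    (1545.0, 36780, 172.0, 4105),
]


def ms_per_iteration(seconds: float, iterations: int) -> float:
    """Wall-clock time per iteration, in milliseconds."""
    return 1000.0 * seconds / iterations


def per_iteration_costs(rows):
    """Return (max-column, average-column) cost per iteration for each row."""
    return [
        (ms_per_iteration(max_s, max_it), ms_per_iteration(avg_s, avg_it))
        for max_s, max_it, avg_s, avg_it in rows
    ]


if __name__ == "__main__":
    for row_index, (max_ms, avg_ms) in enumerate(
            per_iteration_costs(TABLE_III_ROWS), start=1):
        print(f"row {row_index}: {max_ms:.1f} ms/iteration (maximum), "
              f"{avg_ms:.1f} ms/iteration (average)")
```

Because the tabulated times are rounded, small deviations from 42 ms are expected, especially in the rows with short running times.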

In this study, we present a novel machine learning-assisted TEC optimization algorithm aimed at achieving global optimal temperature control for on-demand multi-hotspot thermal management in microelectronic systems. Our findings demonstrate the ability of the machine learning-assisted algorithm to dynamically adapt to the evolving thermal landscape of microelectronic devices, efficiently offering optimal TEC control for managing the spatial and temporal variations of hotspots. The algorithm not only mitigates the computational burdens associated with the traditional FEM-based optimization techniques but also heralds a significant leap toward achieving the on-demand thermal management imperative for the sustainability and performance of advanced semiconductor devices.

The authors have no conflicts to disclose.

Jiajian Luo: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Funding acquisition (lead); Investigation (lead); Methodology (lead); Project administration (lead); Resources (lead); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead). Jaeho Lee: Supervision (lead); Writing – review & editing (lead).

The data that support the findings of this study are available from the corresponding author upon reasonable request.
