In this study, we introduce an automated design method for Quantum Cascade Laser (QCL) active region (AR) structures employing a generative neural network, termed an inverse network, which creates structure designs based on specific k · p metric inputs related to device performance. The training dataset, derived from an earlier study, was selectively filtered to remove entries affected by energy-level hybridization or splitting, yielding ∼300 000 valid entries. A pre-trained forward network that processes QCL-AR structures and returns the corresponding k · p metrics serves as the evaluator for the inverse network, supplanting traditional loss functions such as mean squared error or mean absolute error. This strategy overcomes the problem of non-uniqueness in the mapping from k · p metrics to QCL-AR structures. The inverse network incorporates a random layer, allowing it to produce a variety of QCL-AR structures from identical predicted metrics, thereby increasing the model’s practicality. Performance testing indicates high accuracy in the metrics of the generated QCL-AR structures: the coefficient-of-determination (*R*^{2}) scores for the key energy-level differences *E*_{43} (between the upper-laser (*ul*) level and the lower-laser level) and *E*_{54} (between the next-higher-energy level above the *ul* level and the *ul* level) are 0.9153 and 0.9701, respectively, and those for the electron lifetimes *τ*_{43} and *τ*_{54} are 0.9568 and 0.9175. As an example, we show how the network generates a QCL-AR structure with the potential for low threshold-current density by suppressing shunt-type carrier leakage from the *ul* level through a higher-energy AR state.

Mid-infrared (IR) semiconductor lasers, such as Quantum Cascade Lasers (QCLs), hold promise for applications in free-space communication links and remote sensing of chemicals and explosives. Automated approaches for efficient QCL design include numerical optimization methods, such as Bayesian optimization,^{1} simulated annealing,^{2} and genetic algorithms,^{3,4} as well as data-driven machine learning.^{5–8} The performance of QCLs is intricately linked to their structural design, which typically requires advanced physics-based simulation tools to design the active region (AR) for specified performance metrics. For each stage of a QCL, there is a sequence of events: carrier injection into the upper laser level, followed by scattering-assisted photon emission and carrier extraction to the subsequent stage. These processes are influenced by the spatial distributions of the electron wavefunctions at various AR energy states. QCLs with InGaAs/AlInAs ARs, grown on InP substrates, have achieved high continuous-wave output powers (>1 W) over the 3.8–11 *μ*m range.^{9} However, optimizing these designs to enhance key performance metrics and reduce design time remains a challenge. Traditional design methods for the AR of QCLs^{10} are limited by their complexity and dependency on expert knowledge and manual intervention, making them time-consuming and labor-intensive. Machine learning presents a promising alternative to improve and potentially automate the design process for QCL-ARs.^{5–8} Our previous work resulted in the development of an automated wavefunction-identification program coupled with a k · p solver, generating a substantial dataset of QCL-AR structures and their associated k · p metrics. The k · p metrics are quantities obtained directly from k · p calculations for a given layered structure (layer thicknesses and compositions); they include critical band-structure information such as energy-level differences and carrier lifetimes.
Such k · p metrics can be used to compute device-level performance metrics, such as threshold-current density, lasing wavelength, lasing-transition efficiency, and internal efficiency.^{9,11} Using this dataset, we have successfully trained and validated forward neural networks that predict energy-level differences from QCL-AR structures, demonstrating the dataset’s utility for training networks focused on AR design. Building on this foundation, we have developed, trained, and tested an inverse, or generative, neural network for QCL-AR design. This model, which operates inversely to traditional QCL performance-metric calculations, takes k · p metrics as inputs and outputs corresponding QCL-AR structures. The training dataset, refined from the earlier work, was specifically filtered to optimize training. We used k · p metrics such as energy-level differences and electron lifetimes for the lower-laser level (*E*_{3}), the upper-laser level (*E*_{4}), and the next-higher-energy AR level (*E*_{5}) as inputs. The differences in energy between *E*_{4} and *E*_{3}, and between *E*_{5} and *E*_{4}, are denoted as *E*_{43} and *E*_{54}, respectively, while the electron lifetimes between those states are referred to as *τ*_{43} and *τ*_{54}. These lifetimes, influenced by the longitudinal-optical (LO) phonon scattering rates of crucial AR states, affect the laser population inversion as well as the carrier-leakage mechanisms involved.^{9} In addition, a random layer within the inverse network introduces variability, allowing the generation of diverse QCL-AR structures from identical k · p metrics, thus broadening the model’s applicability in exploring a variety of potential designs. The dataset used to train the inverse neural network in this study is derived from the comprehensive dataset employed in the previous work,^{12} which was generated using a k · p energy-band solver.
A significant modification to this dataset involved a filtering process that reduced the number of data points to just under 300 000. This reduction was necessitated by the effects of hybridization or energy level splitting, which significantly altered the distribution of wavefunctions.

For a more detailed description of the dataset, the data filtering process, and its justification based on the underlying physical phenomena and computational constraints, readers may refer to the supplementary material accompanying this paper.

Despite assembling a substantial training dataset, directly training the inverse neural network posed significant challenges due to the inherent non-uniqueness in the mapping between k · p metrics and QCL-AR structure parameters. Specifically, multiple QCL-AR structures can correspond to a given set of k · p metrics, which can exist within the training dataset. Employing a Mean Squared Error (MSE) loss function to train an inverse network may lead the network to minimize this MSE loss across different structure parameter sets with identical k · p metrics, thereby converging on the average of these parameters. However, this averaged parameter set often does not accurately reflect the specific k · p metrics, thereby undermining the effectiveness of direct inverse network training.
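The averaging failure mode can be illustrated with a toy one-to-many mapping (a hypothetical forward map used only for illustration, not the k · p solver):

```python
import numpy as np

# Toy forward map: metric = structure**2, so the inverse is non-unique
# (both +x and -x yield the same metric).
forward = lambda s: s ** 2

# Training pairs for a naive inverse net: metric 4.0 maps to BOTH 2.0 and -2.0.
structures = np.array([2.0, -2.0])

# The MSE-optimal single prediction is the mean of the conflicting targets...
mse_optimal = structures.mean()   # 0.0

# ...but the averaged structure no longer reproduces the metric at all.
print(forward(mse_optimal))       # 0.0, far from the target metric 4.0
```

This is exactly why a plain MSE loss on structure parameters collapses degenerate solutions onto an invalid average, motivating the tandem approach described below.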

To overcome this challenge, we implemented a tandem network approach,^{13} which integrates the target inverse network with a pre-trained forward network. The forward network,^{12} which takes QCL-AR structures as inputs and outputs the corresponding k · p metrics, was trained independently and kept fixed during the tandem system’s training phase. This configuration effectively employed the forward network as an advanced loss function, verifying the accuracy of the QCL-AR structures generated by the inverse network against the actual k · p metrics. The forward network thus functioned as an evaluator, ensuring that the structures produced by the inverse network accurately reproduced the k · p metrics used in the training process.

Notably, we opted not to substitute the forward network with direct k · p solvers or other standard computational physics techniques. These traditional methods require explicit wavefunction identification, which poses significant challenges for GPU acceleration and cannot be easily integrated with gradient descent and backpropagation algorithms. Consequently, they are incompatible with neural network frameworks. By employing the pre-trained forward network, we ensured that our tandem system remains fully differentiable and efficient for training within a deep learning architecture.
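A minimal PyTorch sketch of this tandem loss is shown below; the layer sizes and network definitions are illustrative stand-ins, not the exact networks used in this work:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the pre-trained forward network (structure -> metrics)
# and the trainable inverse network (metrics + guess -> structure).
forward_net = nn.Sequential(nn.Linear(24, 50), nn.SELU(), nn.Linear(50, 4))
inverse_net = nn.Sequential(nn.Linear(4 + 24, 100), nn.SELU(), nn.Linear(100, 24))

# Freeze the forward network; only the inverse network is trained.
for p in forward_net.parameters():
    p.requires_grad = False

def tandem_loss(metrics, guess):
    """Compare target metrics with the metrics of the generated structure."""
    structure = inverse_net(torch.cat([metrics, guess], dim=-1))
    predicted_metrics = forward_net(structure)
    return nn.functional.mse_loss(predicted_metrics, metrics)

metrics = torch.randn(8, 4)    # batch of target k.p metrics
guess = torch.randn(8, 24)     # random "GuessInput" structures
loss = tandem_loss(metrics, guess)
loss.backward()                # gradients flow THROUGH the frozen forward net
```

Because the frozen forward network is differentiable, gradients propagate through it back to the inverse network, which is what a k · p solver in the loop could not provide.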

Upon completion of the tandem network training, the inverse network of the tandem was extracted and used on its own to generate QCL-AR structures from given k · p metrics.

The forward network within the tandem architecture is structured to independently evaluate each k · p metric, employing individual networks trained for each k · p metric. The networks for the energy level differences, *E*_{43} and *E*_{54}, were detailed in the previous work.^{12} A similar design and training process was used for the forward networks that predict the lifetimes *τ*_{43} and *τ*_{54}. The training results for these networks on the lifetime metrics are in the supplementary material.

After training the forward networks, we proceeded to construct the inverse component of the tandem network. As illustrated in Fig. 1, the inverse network has two distinct input ports: “Input” and “GuessInput.” The “Input” receives a one-dimensional vector of length four, comprising the metrics *E*_{43}, *E*_{54}, 1/*τ*_{43}, and 1/*τ*_{54}, which represent the energy level differences and the reciprocals of the lifetimes. “GuessInput” takes in a one-dimensional vector of length 24, representing the thicknesses of various QCL layers. “GuessInput” is aptly named due to its role in accepting either randomly generated QCL-AR structures or speculative estimates, which do not necessarily correspond to the k · p metrics input. The inverse network outputs QCL-AR structure parameters corresponding to the metrics provided at the “Input” port. Essentially, the network calculates a deviation based on the k · p metrics, which is added to “GuessInput” to produce the required QCL-AR structure.

During training, variability is introduced by connecting a random layer to “GuessInput.” This random layer can either remain active during practical application to generate diverse QCL-AR structures for identical k · p metrics or be bypassed for manual structure input.

At the heart of the inverse network is a fully connected, multilayer perceptron (MLP) architecture, where vectors from “Input” and “GuessInput” are concatenated and fed into the MLP. This MLP includes two linear layers, with widths of 100 and 50, each followed by a scaled exponential linear unit (SELU) activation layer. The final output is a 24-length vector that indicates the modifications needed to adjust the initially guessed QCL-AR structure toward one that better agrees with the given k · p metrics. The magnitude of this output vector is used in “ResThickLoss,” which adjusts the influence of the guessed input on the final QCL-AR structure. By varying the weight of “ResThickLoss” during training, we control the degree to which the network’s output should conform to or deviate from the initial structure. A higher weight on “ResThickLoss” favors outputs closer to the random guesses, while a lower weight permits substantial modifications, yielding structures that may significantly differ from the guesses but align more closely with the input k · p metrics.
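The residual-style architecture described above can be sketched as follows; the hidden-layer widths (100 and 50) and the SELU activations follow the text, while other details (such as the exact form of "ResThickLoss") are assumptions:

```python
import torch
import torch.nn as nn

class InverseNet(nn.Module):
    """Sketch of the inverse network: MLP on [metrics, guess] predicting a
    deviation that is added to the guessed structure."""

    def __init__(self, n_metrics=4, n_layers=24):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_metrics + n_layers, 100), nn.SELU(),
            nn.Linear(100, 50), nn.SELU(),
            nn.Linear(50, n_layers),   # 24-length deviation vector
        )

    def forward(self, metrics, guess):
        delta = self.mlp(torch.cat([metrics, guess], dim=-1))
        # "ResThickLoss" penalizes the deviation magnitude (mean |delta| here,
        # as an assumed form); the output structure is guess + delta.
        return guess + delta, delta.abs().mean()

net = InverseNet()
metrics = torch.randn(8, 4)    # E43, E54, 1/tau43, 1/tau54
guess = torch.randn(8, 24)     # random layer output fed to "GuessInput"
structure, res_thick_loss = net(metrics, guess)
```

Weighting `res_thick_loss` against the forward-network metric loss then controls how far the output may drift from the guessed structure.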

The random layer enables the generation of multiple QCL-AR structures for the same input k · p metrics. This allows the user to choose among a variety of possible structures, eliminating those that may be unphysical or unrealistic. In addition, the generation of multiple structures allows the user to select those with minimal deviation from the desired k · p metrics. Finally, the inverse network can be used to optimize specific k · p metrics of an existing QCL-AR structure: inputting the existing structure into “GuessInput” allows the network to refine it based on modified k · p metrics fed into “Input.”

Once the inverse network generates a QCL-AR structure, it is input to the pre-trained forward network for loss calculation, using both the k · p metric and “ResThickLoss” to iteratively refine the network during training. Further details of the training configuration are in the supplementary material.

Upon completing the training phase for the tandem network, we proceeded to test the performance of the resulting inverse network using a selected subset of the test dataset not used in training. For practical reasons, the test was limited to the first one thousand data points from the test set. Each of these data points was independently and randomly generated.

For each metric input, the inverse network was run 50 times to generate 50 different QCL-AR structures. The generated structural parameters were then rounded to the nearest whole number to meet the input requirements of the k · p solver. Even the minor changes induced by rounding can significantly affect lifetime calculations. Initially, we attempted to round the output of the inverse network during training; however, this made the tandem network untrainable, with divergent loss values, and the approach was abandoned. This sensitivity instead necessitated running the inverse network multiple times (50 iterations per input set) and selecting a structure whose rounded parameters still yield k · p metrics closely aligned with the expected outcomes.

The 50 structures for each k · p metric set were evaluated using the forward network, which assessed the accuracy of the metrics. The structure with the smallest total relative error was chosen as the optimal output for that particular input set. This approach helps mitigate error propagation from the forward to the inverse network, leveraging the observation that smaller predicted errors often correlate with smaller actual discrepancies.
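The generate-round-evaluate-select loop can be sketched as below; `inverse_net` and `forward_net` are hypothetical stand-ins for the trained networks, used only to show the selection logic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the trained networks (illustrative only).
def inverse_net(metrics, guess):
    return guess + 0.1 * rng.standard_normal(guess.shape)

def forward_net(structure):
    return structure[:4] * 0.1   # hypothetical metric predictor

target = np.array([0.155, 0.055, 0.32, 2.8])   # E43, E54, 1/tau43, 1/tau54

best_structure, best_err = None, np.inf
for _ in range(50):                             # 50 candidates per metric input
    guess = rng.uniform(5, 50, size=24)         # random "GuessInput" (layer thicknesses)
    structure = np.rint(inverse_net(target, guess))   # round to whole numbers
    pred = forward_net(structure)
    # Equal-weight total relative error over all four metrics.
    err = np.sum(np.abs(pred - target) / np.abs(target))
    if err < best_err:
        best_structure, best_err = structure, err
```

The structure with the smallest total relative error after rounding is kept as the output for that metric set.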

All four metrics—*E*_{43}, *E*_{54}, 1/*τ*_{43}, and 1/*τ*_{54}—were equally weighted in calculating the total relative error. Adjustments to these weights could be considered if specific metrics require prioritization in future designs. Notably, the entire process, from structure generation to metric evaluation and error comparison, was fully automated, with a processing time of about 1 s per metric input.

The results are displayed in scatter plots in Fig. 2, showing the original metrics on the *x*-axis and the metrics computed from the generated QCL-AR structures post-k · p solver processing on the *y*-axis. These plots reveal a strong linear correlation between predicted and actual values, confirming the effectiveness of the inverse network. The *R*^{2} scores for the metrics were as follows: *E*_{43}: 0.9153, *E*_{54}: 0.9701, 1/*τ*_{43}: 0.9568, and 1/*τ*_{54}: 0.9175, underscoring the inverse network’s ability to accurately generate QCL-AR structures based on given metrics.
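For reference, the *R*^{2} score used here is the standard coefficient of determination; a minimal numpy implementation:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

# Perfect agreement gives R^2 = 1; values near 1 indicate strong correlation.
print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 1.0
```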

Next, we tested how changes made to critical metrics affect the QCL-AR structures generated by the inverse network. We selected a set of metrics representing median values from the dataset (*E*_{43} = 0.155 eV, *E*_{54} = 0.055 eV, 1/*τ*_{43} = 0.32 ps^{−1}, 1/*τ*_{54} = 2.8 ps^{−1}) to demonstrate the capabilities of the inverse network. Following the same testing methodology as above, we generated 50 different QCL-AR structures for this set of k · p metrics and selected the one with the smallest prediction error from the forward network as the output. The chosen QCL-AR structure was subsequently analyzed using the k · p solver to determine the corresponding k · p metrics, as shown in Table I. The conduction-band diagram of the chosen QCL-AR structure is shown in Fig. 3(a).

TABLE I. Target k · p metrics (“Origin” and “Modified” rows), the metrics of the generated QCL-AR structures computed by the k · p solver (“Test” rows), and the corresponding relative errors.

| | *E*_{43} (eV) | *E*_{54} (eV) | 1/*τ*_{43} (ps^{−1}) | 1/*τ*_{54} (ps^{−1}) |
|---|---|---|---|---|
| Origin | 0.155 | 0.055 | 0.32 | 2.8 |
| Test | 0.1538 | 0.0558 | 0.3152 | 2.6281 |
| Relative error | 0.78% | 1.47% | 1.51% | 6.14% |
| Modified *E*_{43} | 0.165 | 0.055 | 0.32 | 2.8 |
| Test | 0.1658 | 0.0565 | 0.3169 | 2.7472 |
| Relative error | 0.49% | 2.72% | 0.98% | 1.88% |
| Modified *E*_{54} | 0.155 | 0.065 | 0.32 | 2.8 |
| Test | 0.1550 | 0.0635 | 0.3253 | 2.7488 |
| Relative error | 0.02% | 2.30% | 1.67% | 1.83% |
| Modified 1/*τ*_{43} | 0.155 | 0.055 | 0.4 | 2.8 |
| Test | 0.1542 | 0.0552 | 0.4028 | 2.8058 |
| Relative error | 0.49% | 0.29% | 0.69% | 0.21% |
| Modified 1/*τ*_{54} | 0.155 | 0.055 | 0.32 | 2 |
| Test | 0.1549 | 0.0570 | 0.3200 | 2.0519 |
| Relative error | 0.04% | 3.57% | 0.00% | 2.59% |


Subsequently, we adjusted the k · p metrics *E*_{43}, *E*_{54}, 1/*τ*_{43}, and 1/*τ*_{54} individually to evaluate whether the inverse network could accurately produce QCL-AR structures for desired k · p metrics beyond those within the test set, i.e., for more optimized designs. The modified values chosen are *E*_{43} = 0.165 eV, *E*_{54} = 0.065 eV, 1/*τ*_{43} = 0.4 ps^{−1}, and 1/*τ*_{54} = 2 ps^{−1}. The adjustments to *E*_{43} and 1/*τ*_{43} were arbitrary choices tailored to specific requirements, whereas the increase in *E*_{54} and the reduction in 1/*τ*_{54} were aimed at minimizing electron-scattering probabilities to suppress carrier-leakage loss.^{9}

The band diagram of the QCL-AR structure with modified *E*_{43} (0.165 eV) is shown in Fig. 3(b). Minor alterations are observed in *E*_{4} and *E*_{5}; however, the wavefunction of *E*_{3} shifts to the left. This adjustment is due to the exclusive enlargement of *E*_{43}, with the other k · p metrics held constant, thereby ensuring minimal changes in *E*_{4} and *E*_{5} to maintain the stability of *E*_{54} and 1/*τ*_{54}. To counterbalance the impact of the increased *E*_{43} on the lifetime, the leftward shift in *E*_{3} results in enhanced wavefunction overlap between *E*_{3} and *E*_{4}. Similarly, in Fig. 3(c), *E*_{5} shifts markedly to the right for analogous reasons due to the modified *E*_{54} (0.065 eV). For Fig. 3(d), the increase in 1/*τ*_{43}, from 0.32 to 0.4 ps^{−1}, indicates a higher transition probability, yet with *E*_{43} unchanged, *E*_{3} shifts left to augment overlap with *E*_{4}. In Fig. 3(e), a decrease in 1/*τ*_{54} (from 2.8 to 2 ps^{−1}) reduces the transition probability; with *E*_{54} remaining constant, this prompts a rightward shift in *E*_{5} to reduce overlap with *E*_{4}.

Carrier leakage from the *ul* level (*E*_{4}) through the higher-energy AR state *E*_{5} can be described by^{14,15}

*J*_{leak,45} ∝ (1/*τ*_{54}) exp(−*E*_{54}/*kT*_{e4}),

where *J*_{leak,45} is the leakage-current density, *τ*_{54} is the lifetime characterizing the carrier relaxation between states *E*_{5} and *E*_{4}, *E*_{54} is the energy difference between those states, *T*_{e4} is the electron temperature in *E*_{4}, and *k* is the Boltzmann constant. In general, this expression considers all forms of scattering important in mid-IR QCLs, that is, LO-phonon, interface-roughness, and alloy-disorder scattering; in this work, we consider only LO-phonon scattering and its related values.^{15} Following this equation, the adjustments made to *E*_{54} and 1/*τ*_{54} have significant impacts on the carrier-leakage current density. When *E*_{54} is increased from 0.055 to 0.065 eV, the leakage current becomes ∼0.79 times its original value; here, the value of *T*_{e4} is set as 500 K.^{15} Similarly, when the transition rate, 1/*τ*_{54}, is reduced from 2.8 to 2 ps^{−1}, the leakage current decreases to about 0.71 times its initial value. These changes indicate that modifications to *E*_{54} and *τ*_{54} can effectively be used to minimize the carrier leakage of mid-IR QCLs, which is crucial for improving device performance.^{9}
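The leakage-current reduction factors quoted above (∼0.79 and ∼0.71) can be checked numerically, assuming the leakage-current density scales as (1/*τ*_{54}) exp(−*E*_{54}/*kT*_{e4}):

```python
import math

k_B = 8.617333e-5   # Boltzmann constant in eV/K
T_e4 = 500.0        # electron temperature in K (value used in the text)

# Ratio of leakage currents when E54 rises from 0.055 to 0.065 eV,
# with 1/tau54 held fixed: only the Boltzmann factor changes.
ratio_E54 = math.exp(-(0.065 - 0.055) / (k_B * T_e4))
print(round(ratio_E54, 2))    # ~0.79

# Ratio when 1/tau54 drops from 2.8 to 2 ps^-1 (linear in the rate).
ratio_rate = 2.0 / 2.8
print(round(ratio_rate, 2))   # ~0.71
```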

The outcomes, detailed in Table I, indicate that the network adeptly generated QCL-AR structures corresponding to the specifically modified k · p metrics. Most k · p metrics exhibited errors around 1%, with some exceptions that were slightly higher but still acceptable. This variation is inevitable due to the interdependence of k · p metrics, where not all k · p metric combinations are physically plausible. Furthermore, when selecting structures using the forward network, we prioritize minimizing the total error, which may result in individual k · p metrics deviating more significantly. To address this, one could adjust the weighting of specific k · p metrics during the selection process to focus on minimizing their errors. Currently, all k · p metrics are weighted equally.

These results underscore the capabilities of the inverse network, not only in reproducing but also in effectively predicting and adapting the k · p metrics of QCL-AR structures. Such adaptability suggests potential applications of the network in designing optimized QCLs with predetermined properties.

In conclusion, this study developed a tandem neural network architecture to address the non-uniqueness in generating QCL-AR structures from k · p metrics, using a cascaded configuration that includes an inverse network with a random layer. This architecture allows the generation of diverse QCL-AR designs, with the forward network dynamically evaluating the generated structures during training to ensure generalization.

The random layer in the inverse network enables the production of multiple structures for each input set, with multiple iterations and selection based on minimal error to mitigate inaccuracies from the forward network.

The process of generating these structures is fully automated and independent of the k · p solver, which is used only during the testing phase to validate the accuracy of the k · p metrics. This method represents a significant advancement in QCL-AR design and could influence the design of other semiconductor devices.

The main limitation stems from the forward network’s errors in predicting lifetimes, often due to energy level hybridization or splitting in the training data. Future research will aim to correct these inaccuracies, possibly through modifications to the k · p solver or alternative simulations. Further enhancements will focus on improving the wavefunction identification program and expanding the range of QCL-AR structures to include different compositions and operational conditions, increasing the design models’ applicability and effectiveness.

## SUPPLEMENTARY MATERIAL

See the supplementary material for additional information, which is divided into three sections to enhance the reader’s understanding of the study and its outcomes. The first section, “Dataset and Filtering,” provides an overview of the data collected and the criteria used to filter this data. The second section, “Forward Networks,” discusses the training outcomes for the forward networks used in our study. The third section, “Training Configuration for the Tandem Network,” details the specific configurations and parameters utilized in the training of the tandem network, offering insights into the optimization process and the rationale behind each choice. These additions aim to ensure that all readers, regardless of their familiarity with the subject matter, can fully appreciate the depth and scope of our research.

This work was supported under Navy Contract No. N6893622C0020 (Richard LaMarca).

## AUTHOR DECLARATIONS

### Conflict of Interest

The research reported in this paper was supported via funding provided by Intraband, LLC, with which D. Botez and L. J. Mawst have significant financial interest.

### Author Contributions

**Y. Hu**: Conceptualization (equal); Data curation (equal); Formal analysis (lead); Investigation (lead); Methodology (lead); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead). **S. Suri**: Data curation (equal); Investigation (supporting); Software (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (equal). **J. Kirch**: Data curation (equal); Investigation (supporting); Software (equal); Writing – review & editing (equal). **B. Knipfer**: Data curation (supporting); Software (equal). **S. Jacobs**: Investigation (supporting); Supervision (supporting); Validation (equal); Writing – review & editing (equal). **Z. Yu**: Conceptualization (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal). **D. Botez**: Resources (equal); Supervision (equal); Writing – review & editing (equal). **L. J. Mawst**: Conceptualization (equal); Funding acquisition (lead); Resources (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The datasets generated and analyzed during the current study are available in the repository linked below. The description of each file is in the README file in the repository. Repository link: https://drive.google.com/drive/folders/1Ej43Tk4qiDxLkaQgRPGjyOV1CdBZh8Iw?usp=sharing

## REFERENCES

*Mid-Infrared and Terahertz Quantum Cascade Lasers*