While there are several bottlenecks in hybrid organic–inorganic perovskite (HOIP) solar cell production steps, including composition screening, fabrication, material stability, and device performance, machine learning approaches have begun to tackle each of these issues in recent years. Different algorithms have successfully been adopted to solve the unique problems at each step of HOIP development. Specifically, high-throughput experimentation produces the vast amounts of training data required to effectively implement machine learning methods. Here, we present an overview of machine learning models, including linear regression, neural networks, deep learning, and statistical forecasting. Experimental examples from the literature, where machine learning is applied to HOIP composition screening, thin film fabrication, thin film characterization, and full device testing, are discussed. These paradigms give insights into the future of HOIP solar cell research. As databases expand and computational power improves, increasingly accurate predictions of HOIP behavior are becoming possible.
Over the past decade, hybrid organic–inorganic perovskites (HOIPs) have seen a rapid increase in research interest due to their exceptional optical and electronic properties, which demonstrate their potential for optoelectronic applications, such as photovoltaics (PVs), light-emitting diodes (LEDs), and radiation sensors.1 Perovskites follow the ABX3 structure [see Fig. 1(a)], in which A is an organic or inorganic cation [such as methylammonium (MA+), formamidinium (FA+), or Cs+], B is a metal cation (such as Pb2+ or Sn2+), and X is a halide anion (such as Cl−, I−, or Br−). There are several possible constituent ions for HOIPs, and each composition has its own unique properties. This adjustability is extremely important as it produces a tunable bandgap, which makes HOIPs an ideal candidate for semiconducting applications. At the same time, the innumerable possible chemical compositions and their responses to environmental stressors represent a colossal hyperparameter space to be explored.2,3 In turn, machine learning (ML) could significantly accelerate the discovery of new HOIP families and their physical properties.4,5
Many applications of HOIPs to PVs have been explored, including dual- and multi-junction devices,6–8 which have produced thousands of scientific publications over the past decade, shown as the blue curve in Fig. 1(b). However, due to the vast parameter space and degrees of freedom encompassing the HOIPs, purely experimental progress is often limited. Machine learning (ML), automated and autonomous high-throughput experimentation, and artificial intelligence can offer solutions to challenges related to HOIP material/device development. The pressing need to identify stable HOIPs has, thus far, led to an exponential growth in the number of peer-reviewed publications applying ML to these materials, as shown by the red curve in Fig. 1(b). This steep “learning curve” demonstrates the scientific community’s interest in the topic and, if extrapolated to the next five years, suggests a burgeoning interest in the implementation of ML algorithms to effectively identify stable materials, optimize fabrication parameters and device structures, and determine the device operation conditions that maximize device recovery.2
Exceptional physical properties of HOIPs have led to them being considered the “holy grail” for PVs, as they work well in both single- and dual-junction configurations.9 Yet, it is imperative that researchers effectively exploit all material- and device-related aspects required for the commercialization of these solar cells. Thus, herein we present our perspective on how ML can be an additional and powerful tool to assist in solving several critical open questions related to this promising class of material. First, we provide a brief overview of five distinct ML models. Second, we discuss examples from the literature applying ML models to different stages of HOIP development. Third, we examine which specific models are most beneficial for solving the distinct HOIP problems. Finally, we share a succinct outlook regarding future experiments that could benefit from ML.
MACHINE LEARNING MODELS
Numerous ML models have been applied to the field of functional materials, and we focus here on the ones that have already been used with HOIPs. We present an overview of significant ML examples from the HOIP literature in this section. The most straightforward one is linear regression, which can elucidate trends in data and suggest where additional analysis may be beneficial.10 At a more sophisticated level, neural networks (NNs) are multilayer models that resemble neurons in the brain, and they have successfully been used for tasks ranging from text prediction to image analysis.11 Deep learning neural networks entail a series of hidden “black-box” layers, which have the potential to provide extremely accurate predictions in data trends and image analysis.12,13 Recurrent NN models can remember previous states, which makes them suited to capturing the lasting historical effects that influence HOIP degradation, a very relevant topic of research. Finally, statistical forecasting combines advanced statistical models with ML to produce accurate time series predictions of both regular and irregular, non-linear trends.14
Initial screening for long-term data trends can be made with linear regression before beginning the time intensive task of obtaining adequate training data for advanced ML models. Simple predictions of these data can be made with linear regression, which uses a least-squares curve fitting method [the black solid line in the schematic in Fig. 2(a)] to fit the dependent and independent variables (the blue data points). This model has an extremely low computational cost when compared to more advanced ML algorithms, making it an adequate starting method to detect long-term trends before moving to more advanced procedures, if needed. It is also very useful to use linear regression as a baseline approximation to which the accuracy of other models can be compared. For example, in Fig. 2(a),15 the bandgap of Cs0.25FA0.75Pb(ClxBryI(1−x−y))3 HOIPs was estimated with high accuracy, showing a strong correlation between bandgap and halide composition (Pearson’s coefficient, r > 0.99) and an overall low root mean squared-error (RMSE, <0.032 eV).
The nearly linear variation of the bandgap with chemical composition allows predictions of HOIP bandgaps [see Fig. 2(a)] to be calculated quickly, efficiently, and accurately. As mentioned before, the bandgap tunability enables these materials to be used in various optoelectronic devices and, thus, it is critical to screen all potentially useful chemical compositions. Estimating the bandgaps using ML can drastically reduce the time and resource investment needed to identify stable HOIPs.
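As a minimal sketch of the baseline described above, the following Python example fits a least-squares line to bandgap versus halide fraction and reports Pearson's r and the RMSE. The composition and bandgap values are hypothetical placeholders chosen for illustration, not data from Ref. 15, and the function names are our own.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical data: bromide fraction y and bandgap in eV (illustrative only).
br_fraction = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
bandgap_ev = [1.55, 1.68, 1.79, 1.93, 2.04, 2.17]

slope, intercept = fit_line(br_fraction, bandgap_ev)
predicted = [slope * x + intercept for x in br_fraction]
r_value = pearson_r(br_fraction, bandgap_ev)
fit_error = rmse(bandgap_ev, predicted)
print(f"slope={slope:.3f} eV, intercept={intercept:.3f} eV, "
      f"r={r_value:.4f}, RMSE={fit_error:.4f} eV")
```

Because the fit costs only a few sums, it is a convenient reference point: any more elaborate model should at least beat this RMSE on held-out compositions before its extra cost is justified.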
Potential applications of neural networks (NNs), which are a subset of ML, include quantitative, categorical, or visual data analysis, making this an extremely versatile approach to investigate HOIP thin films and full devices. The overall NN structure resembles neurons in the brain, as represented by the schematic in Fig. 2(b), which have complex and intricate levels of connections. NNs may have one or several hidden layers between their input and output, which have different weighted computational nodes. In the past decade, with faster graphics processing units (GPUs) more readily available, NNs increased in complexity by adding anywhere from two to thousands of hidden layers. Networks with this larger number of layers are known as deep neural networks, and their use as deep learning, as represented by the schematic in Fig. 2(c). There is no definite distinction between NNs and deep neural networks, but most recent studies are taking advantage of the computing power that gives deep neural networks more accurate results. There are numerous types of the latter, including artificial neural networks (ANNs), which are able to predict the potential energy of MAPbI3 perovskites with a hyperbolic tangent update function [see Fig. 2(b)];11 convolutional neural networks (CNNs), already used to calculate the octahedral tilt angle, lattice constant, and bandgap of 862 perovskite compositions [see Fig. 2(c)];16 and recurrent neural networks (RNNs),18 such as long short-term memory (LSTM)4 and echo state networks (ESNs),4 which can predict the photoluminescence (PL) response of MAPbI3 and MAPbBr3 upon their exposure to humidity. After initial predictions with linear regression, Li et al. have demonstrated the application of NNs and random forest algorithms (RFAs) to precisely predict bandgaps of the CsaFAbMA(1−a−b)Pb(ClxBryI(1−x−y))3 family with an RMSE of <0.145 and <0.05, respectively. The more advanced networks are discussed below.
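To make the layered structure concrete, the sketch below runs a forward pass through a tiny feed-forward network with one hidden layer of hyperbolic tangent nodes and a linear output. It is an illustration only: the weights are random and untrained, the three input features are hypothetical composition fractions, and the layer sizes are arbitrary assumptions rather than the architecture of any cited study.

```python
import math
import random

def dense_layer(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def identity(z):
    """Linear output activation, used for regression targets."""
    return z

def make_layer(n_in, n_out, rng):
    """Create random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
hidden_layer = make_layer(3, 4, rng)   # 3 inputs -> 4 hidden tanh nodes
output_layer = make_layer(4, 1, rng)   # 4 hidden nodes -> 1 output value

# Hypothetical input: fractions of three A-site cations (illustrative only).
composition = [0.25, 0.75, 0.0]

hidden_values = dense_layer(composition, *hidden_layer, math.tanh)
prediction = dense_layer(hidden_values, *output_layer, identity)
print("hidden activations:", hidden_values)
print("untrained output:", prediction[0])
```

Stacking more `dense_layer` calls between input and output is what turns this into a deeper network; training would then adjust the weights to minimize an error such as the RMSE against measured values.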
One of the more recent groundbreaking results has been achieved in the field of image classification. As shown in the illustration in Fig. 2(d), with sufficient input information, one can build a trustworthy pathway to distinguish images, for example dogs from cats. This method has rapidly accelerated the classification of visual research data, which previously would either require a scientist with many years of experience (in the simplest cases) or keep material patterns and correlations hidden (in situations where human eyes solely cannot resolve the information). As an effective example of image classification, Kirman et al.13 adopted a convolutional NN to predict the probability of perovskite thin film crystallization in the phenethylammonium lead bromide family of compositions [see Fig. 2(d)]. Starostin et al.12 utilized deep learning methods to identify Debye–Scherrer rings from grazing incidence x-ray diffraction (GIXRD) images to track the formation of the distinct (BA)2(MA)n−1PbI3n+1 Ruddlesden–Popper phases. Applications of NNs to identify space groups from XRD data19 and carrier lifetime from time-resolved PL measurements [see Fig. 3(c-i)]20 are becoming more researched as these techniques offer valuable information related to material stability. In our opinion, deep neural networks and image classification will be adopted more often by research groups across the world and, in the next few years, will become routinely used methods to characterize HOIPs.
While NN-assisted methods are in an intermediate stage of development in the context of HOIPs, statistical forecasting is still in its early stages of application to PV fabrication and device processing. These advanced statistical methods can be implemented to forecast economic and weather trends that may have several unknown confounding variables. For instance, they have the potential to pick up long- and short-term trends and differences in seasons. Srivastava et al. showed how a statistical forecasting model reveals photoluminescence trends in Cs–FA thin films for over 50 h while submitting the samples to varying moisture exposure17 [see Fig. 2(e) for one example case]. Overall, this method outperformed a deep neural network algorithm and achieved greater than 90% accuracy.
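The text does not specify which statistical model was used in Ref. 17, so the sketch below swaps in Holt's linear exponential smoothing as a simple, well-known stand-in for trend-aware time series forecasting. The photoluminescence series and the smoothing constants `alpha` and `beta` are hypothetical values chosen for illustration.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` future points with Holt's linear exponential smoothing.

    level tracks the smoothed value, trend tracks the smoothed slope.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]

# Hypothetical hourly PL intensity (arbitrary units) during humidity exposure.
pl_intensity = [100.0, 97.0, 95.0, 92.0, 90.0, 87.0, 85.0, 82.0, 80.0, 77.0]

next_hours = holt_forecast(pl_intensity, horizon=3)
print("forecast for the next 3 h:", [round(v, 2) for v in next_hours])
```

Seasonal models extend the same idea with an additional periodic component, which is what allows this family of methods to separate short-term cycles from long-term drift.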
MACHINE LEARNING APPLIED TO HOIP MATERIAL SELECTION, CHARACTERIZATION, AND DEVICE TESTING
Effective use of ML in all the steps required to design, test, and manufacture stable HOIP PVs will attract much of the focus of the perovskite industry in the next few years. We dedicate this section to demonstrating how ML is useful in each development stage, such as material screening, thin film characterization, and full device testing. Overall, the cornucopia of possible compositions for HOIPs has driven researchers to gather a profusion of data relating to various properties and behaviors of the material combinations.5,26 This dataquake has led to the need for the screening of compositions at various levels to achieve the desired performance and stability metrics.27 This methodology is analogous to sifting through a funnel, as depicted in Fig. 3(a-i). Acquiring experimental data to determine materials’ properties to then screen each possible composition is often extremely laborious and resource intensive. Thus, big data repositories, such as the “Perovskite Database Project,”5 have enabled quick visualization of and information extraction from the existing published data. Beyond PVs, researchers have also developed methods to sort through existing papers using unsupervised ML and natural language processing (NLP)28 methods to extract possible compositions suitable for other optoelectronic applications. In search of new materials, several studies on the application of ML-based methods have provided insights into the structural, electronic, optical, and overall physical properties of HOIPs without the need for fabrication and characterization of each chemical composition,16 an incredible step forward in the field. Thus, we see ML as the impetus of the race to find the ideal composition and optoelectronic properties, supported by automated and high-throughput fabrication and characterization.
Material discovery and selection
Regarding material fabrication, changing the ratios of the A site ions, as shown in Fig. 3(a-ii), or the halide ions, can significantly alter the bandgap. For example, increasing the Cs content at the A site is well known for increasing the semiconducting film bandgap, while the addition of FA reduces it. This approach is similar to the situation presented in Fig. 2(c), where two convolutional NNs predicted the perovskite bandgap after calculating the lattice constant and octahedral tilt angle of the structures. Furthermore, Cai et al.29 used LR, support vector regression (SVR), k-nearest neighbor regression (KNR), RFA, gradient boosting regression (GBR), and NN to predict the bandgaps of mixed Pb–Sn perovskites and they used the predicted bandgaps to estimate device performance in terms of open-circuit voltage (Voc), short-circuit current (Jsc), fill factor (FF), and power conversion efficiency (PCE). First-principles calculations30 and density functional theory (DFT) have been employed to judge the accuracy of the output of ML algorithms and indicate robustness through low error-margins and computing costs.
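Of the regressors listed above, k-nearest neighbor regression is the simplest to write out in full. The sketch below predicts a bandgap as the average over the k closest training compositions. The feature choice (Sn fraction at the B site, Br fraction at the X site) and all numerical values are hypothetical and are not taken from Ref. 29.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training points."""
    ranked = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical training set: (Sn fraction, Br fraction) -> bandgap in eV.
compositions = [
    (0.0, 0.0), (0.0, 0.5), (0.0, 1.0),
    (0.5, 0.0), (0.5, 0.5), (0.5, 1.0),
    (1.0, 0.0), (1.0, 0.5), (1.0, 1.0),
]
bandgaps_ev = [1.55, 1.85, 2.20, 1.25, 1.50, 1.80, 1.30, 1.60, 1.95]

query_composition = (0.4, 0.1)
estimate = knn_predict(compositions, bandgaps_ev, query_composition, k=3)
print(f"estimated bandgap: {estimate:.2f} eV")
```

In a screening workflow, an estimate like this would be compared against the other regressors on held-out compositions, and the best-performing model would feed the downstream device-performance estimates.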
A primary challenge faced by HOIP solar cells is the limited knowledge of environmentally induced degradation. The mechanisms understood so far indicate that improving charge carrier mobility by reducing the density of trap states is required for enhanced device lifetimes. Due to the poor thermal stability of numerous HOIP combinations, defect formation poses a real threat to the lifetime of the devices. Defect-induced localized trap states hinder ionic motion and trap charges at the surface and/or grain boundaries. In PV applications, this behavior is, in general, detrimental to the device open-circuit voltage (Voc) and, hence, the power conversion efficiency (PCE) over a period of time. In addition, exposure to moisture accelerates degradation in MAPbI3 and is one of the most extensively studied environmental stressors.10,11 Solutions to hinder degradation, such as surface passivation using capping layers that modify the material chemistry of the HOIPs, have shown promising results. Due to the large number of possible chemical species, the selection of appropriate layers has also been subjected to ML regression, assisting the rapid identification of the best candidates [see Fig. 3(b-i)]. Here, along with regression models, image classification has been utilized to quickly analyze the color of thin films during degradation experiments on MAPbI3 with 21 different capping layers. The observed onset of color change was matched against that predicted by a random forest regression with an RMSE of ∼100 min.22 Other measures to improve stability entail appropriate annealing times for carrier transport layers, such as spiro-OMeTAD, a commonly used hole transport layer (HTL) in full HOIP devices [see Fig. 3(b-ii)],23 post-treatment amines, for which ML has again been used to predict the options that would be most compatible with perovskite films,31 and feature engineering to study the photoelectrochemical properties of MAPbI3 films in water.32,33 We note that further quantitative studies on the effects of external stressors, such as exposure to oxygen, applied potential, and variations in temperature, are required to fill in the remaining knowledge gaps. Overall, we foresee ML as a key tool for effectively exploring the vast hyperparameter space related to all possible combinations of the relevant stressors.
Lead (Pb) is, by far, the most selected B site metal cation in HOIPs. However, due to environmental and health concerns related to the handling, storage, and use of lead, there has been a strong interest in Pb-free compositions.34 The use of ML for identifying Pb-free families has, thus far, focused on tree-based algorithms that use predictive models for properties such as heat of formation, Debye temperature, and bandgap to determine the suitability of the possible compositions.35,36 Sn is a common alternative for Pb-based HOIPs;34 however, the performance and stability of solar cells containing Sn significantly lag behind compared to those with Pb. Again, there is much work to be accomplished with Pb-free HOIPs and the need for ML methods for the discovery of stable, Pb-free alternatives remains imperative.
We also note an emerging interest in higher-throughput fabrication experiments in HOIP research. The fabrication of thin films commonly utilizes techniques such as spin coating, thermal evaporation, and atomic layer deposition. However, these methods are tedious and time-consuming, impeding the pace of characterization and development of new materials.19 In addition, the occasional poor consistency in the fabrication process has stalled the development of methods for scaling up the production of HOIPs. To overcome these barriers, the automation of the fabrication steps using robotics has already been demonstrated in a few publications.21,23,27 Uniform coating of each layer, i.e., the absorber, electron transport layers (ETLs), hole transport layers (HTLs), and electrodes, is required for a successful scale-up. While methods such as spin coating work exceptionally well at the laboratory scale, they cannot be used for fabricating large PV modules. Therefore, there is a pressing need for the development of high-throughput fabrication methods that can be scaled successfully to manufacture full PV modules. Fabrication of PV modules is a sequential process. From substrate preparation to deposition of the absorber layer, ETL and HTL, and, finally, electrodes, each step requires optimization in various ways. Liu et al.37 applied probabilistic constraints at each development stage to ascertain the optimal parameters using power conversion efficiency as the target metric. Using the knowledge acquired from one set of experimental data, the next set of experiments is improved and developed. Other methods, such as Bayesian parameter estimation,38,39 also utilize the output from previous sets of data to reject unnecessary or unpromising parameters as a means of optimization.
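The idea of reusing earlier results to discard unpromising parameter values can be illustrated with a grid-based Bayesian update. The sketch below assumes a deliberately simple model, y = theta * x with Gaussian noise, and hypothetical measurements; the model, the noise level `sigma`, and the 1% rejection threshold are all assumptions made for illustration, not the procedure of Refs. 38 and 39.

```python
import math

def update_posterior(prior, grid, x, y_obs, sigma):
    """One Bayesian update on a parameter grid for y = theta * x + Gaussian noise."""
    likelihood = [
        math.exp(-0.5 * ((y_obs - theta * x) / sigma) ** 2) for theta in grid
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Candidate parameter values 0.0, 0.1, ..., 2.0 with a uniform prior.
theta_grid = [i / 10 for i in range(21)]
posterior = [1.0 / len(theta_grid)] * len(theta_grid)

# Hypothetical (x, y) measurements arriving one experiment at a time.
measurements = [(1.0, 1.25), (2.0, 2.35), (3.0, 3.65)]
for x_value, y_value in measurements:
    # The posterior from the previous experiment becomes the next prior.
    posterior = update_posterior(posterior, theta_grid, x_value, y_value, sigma=0.2)

best_theta = theta_grid[posterior.index(max(posterior))]

# Reject candidates whose posterior falls below 1% of the best one.
survivors = [
    theta for theta, p in zip(theta_grid, posterior) if p > 0.01 * max(posterior)
]
print("most probable theta:", best_theta)
print("candidates kept for the next round:", survivors)
```

Each new measurement narrows the set of surviving candidates, so later experiments can be concentrated on the region of parameter space that earlier data have not yet ruled out.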
A growing interest in data-driven approaches to characterize HOIP thin films has led to the development of novel ML-based solutions. The use of automated data diagnosis has significantly reduced the time investment, enabling further workflow optimization. High-throughput experimentation can often generate very large quantities of data, resulting in time- and computation-intensive methods for analysis, which may leave significant room for error. Material parameters, such as carrier lifetime [see Fig. 3(c-i)] and refractive index [see Fig. 3(c-ii)], have been analyzed and predicted for large datasets using ML models to quickly and accurately report crucial data required for the development of stable HOIPs. For the optimization of complete PV devices, the identification of the main limiting factor in J–V characteristics is essential.40 Surface and bulk defects in HOIP solar cells lead to Shockley–Read–Hall (SRH) recombination dominating the loss and reducing Voc. Several publications have indicated the interfaces between the absorber layer and electron transport layer (ETL)/hole transport layer (HTL) to be the main recombination centers in HOIP solar cells. Determining the type of recombination mechanism can help identify the presence of trap states. ML algorithms, such as tree-based methods, have been used to classify recombination processes based on Voc and ideality factor. Predictions for carrier diffusion lengths in MAPbI3, as shown in Fig. 3(d-i), and the PCE of MA/FA/Cs–Pb/Sn–I3/Br3/Cl3 type compositions in Fig. 3(d-ii) are additional prime examples of the crucial need for ML methods to assist in the accurate and reliable analysis of large sets of data.
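As a rough illustration of classifying recombination from Voc and the ideality factor, the sketch below extracts the ideality factor n from the slope of Voc versus the logarithm of light intensity, using Voc = (n kT/q) ln(I) + constant, and then applies a single hand-written threshold. This one-split rule is only a stand-in for a trained tree-based classifier; the threshold of 1.5, the labels, and the measurement values are assumptions made for illustration.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def ideality_factor(intensities, voc_values, temperature_k=300.0):
    """Estimate n from the least-squares slope of Voc versus ln(intensity)."""
    log_intensity = [math.log(i) for i in intensities]
    count = len(log_intensity)
    mean_x = sum(log_intensity) / count
    mean_y = sum(voc_values) / count
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(log_intensity, voc_values)
    )
    sxx = sum((x - mean_x) ** 2 for x in log_intensity)
    thermal_voltage = BOLTZMANN * temperature_k / ELEMENTARY_CHARGE
    return (sxy / sxx) / thermal_voltage

def classify_recombination(n_id, threshold=1.5):
    """Single-split rule: n closer to 2 suggests trap-assisted (SRH) losses."""
    if n_id > threshold:
        return "trap-assisted (SRH)-like"
    return "band-to-band-like"

# Hypothetical intensity sweep (in suns) and measured Voc (in V).
suns = [0.01, 0.1, 1.0]
voc = [0.90, 1.00, 1.10]

n_estimate = ideality_factor(suns, voc)
print(f"ideality factor: {n_estimate:.2f} ->", classify_recombination(n_estimate))
```

A trained decision tree or random forest would learn such thresholds, and combinations of them across several features, from labeled devices rather than relying on a fixed cutoff.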
Yesterday’s technology can no longer sufficiently meet the needs of today’s rapid pace of material discovery and development. The vast compositional space and non-standardized fabrication and characterization techniques for HOIPs require ML-facilitated analyses and predictions to keep up with the pace of knowledge being acquired in the field. The various ML models presented in this perspective, and their applications to the numerous stability and performance issues of HOIP solar cells, clearly indicate the rising interest in and uptake of advanced computational frameworks that are considerably less resource and time intensive. The sharing of ML algorithms and databases is already helping in standardizing the performance metrics for all kinds of HOIPs for energy applications, including LEDs and thermoelectrics. By extending this to robotic applications in high-throughput experiments and fabrication techniques, key environmental factors, optimal processing conditions, and the best suited ML models for various problems can be narrowed down. Specifically, we advocate for high-throughput measurements that do not require data augmentation17 and can provide direct information about solar cells’ figures of merit, such as ellipsometry, PL, and time-resolved PL, to acquire information related to the generation, recombination, and collection of carriers, respectively. In the realm of imaging, the analysis of data produced by sophisticated methods, such as scanning probe and electron microscopies, is now benefiting from ML-assisted methods that entail the conversion of information (images and spectroscopy measurements) into descriptors that enable its quick assessment using statistical methods and graph analysis.41 Images acquired by simpler methods can also be extremely informative. 
As an example, dark-field imaging has already been successfully adopted to inform variations in the electrical conductivity of HOIPs, revealing subtle features not distinguishable by human eyes.4 Ideally, photographs taken with cell phones could provide quick, low-cost access to a vast amount of information about how HOIPs change once submitted to distinct environmental stressors, which could then be correlated with the materials’ optical properties. Concrete examples include correlating variations in sample color with changes in the material permittivity (i.e., light absorption) or the formation of new compounds on the surface of the HOIP films. In this ideal scenario, the main challenge would be to ensure that images acquired from different laboratories around the world are comparable. Thus, a standard set of parameters (camera angle, lighting, etc.) is imperative. Furthermore, the colossal amounts of data needed to train any ML model accurately demonstrate more than ever the need for researchers to follow the findable, accessible, interoperable, and reusable (FAIR) guiding principles.42
Significant present-day challenges to HOIP solar cell development are long-term stability and reproducibility. These issues are currently being addressed by diverse approaches that integrate ML models. However, ML accuracy is still limited by the availability of adequate training data. To train ML models with high accuracy to forecast HOIP degradation and performance, emphasis on high-throughput, automated, or robotic experimentation is needed. Neural networks can be applied to a diverse set of problems in the HOIP development process and are, therefore, a vital approach. Statistical forecasting methods are computationally expensive but can predict long-term behavior with high accuracy. Overall, we presented a survey of the most promising ML models to predict the behavior of HOIP materials and PV devices. As the number of scientific publications encompassing HOIP solar cells and ML continues to grow (likely exponentially), we anticipate that artificial intelligence tools will become more widely used prior to, during, and after experiments.
M.S.L. acknowledges financial support from the National Science Foundation (Grant No. 20-23974) and from Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under Contract No. DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in this paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Conflict of Interest
The authors have no conflicts to disclose.
A.R.H. and M.D. contributed equally to this work.
Abigail R. Hering: Data curation (equal); Formal analysis (equal); Visualization (equal); Writing – original draft (equal). Mansha Dubey: Data curation (equal); Formal analysis (equal); Visualization (equal); Writing – original draft (equal). Marina S. Leite: Conceptualization (lead); Funding acquisition (lead); Project administration (lead); Supervision (lead); Visualization (supporting); Writing – review & editing (supporting).
Data sharing is not applicable to this article as no new data were created or analyzed in this study.