The key to sustaining the steady rise in perovskite solar cell performance is to reduce non-radiative losses by minimizing the defect density. To this end, a wide variety of strategies have been adopted, ranging from interfacial layers and surface modifiers to interface engineering. Although successful concepts have been demonstrated, they result from a trial-and-error approach, which is time-consuming and operator-dependent. To address this challenge, we propose a machine learning (ML) approach for an informed and rational screening of materials with optimal surface-passivation characteristics. In particular, we applied Shapley additive explanations (SHAP) to extract the specific chemical features of the passivator that directly impact the device parameters, specifically the open circuit voltage (Voc). Using the different material parameters as input, we ranked the most promising passivators and tested them directly in working solar cells. By comparing the device performance with the modeling results and with additional optical and morphological characterization, we identified the material properties most strongly linked to the highest efficiency: (i) the presence of chlorine, whose strong binding to positively charged defects on the perovskite surface reduces non-radiative recombination, and (ii) an increased molecular flexibility, which results in better surface coverage. Finally, we tested the predictive power of the ML algorithm by proposing a new passivator which, implemented in a working device, delivered the predicted high Voc, confirming the modeling results.
I. INTRODUCTION
In the past decade, hybrid halide perovskites (HPs) have stood out as frontrunners in the emerging field of photovoltaics (PVs) thanks to their remarkable optoelectronic characteristics, leading to a record power conversion efficiency (PCE) of 26.1%.1–4 This holds true despite a non-negligible defect density (>10¹⁶ cm⁻³), which increases non-radiative losses and limits the device open circuit voltage (Voc).5–8 To date, one of the most effective methods to mitigate the defective nature of HPs is surface defect passivation by means of surface modifiers. These include aromatic compounds,9,10 alkyl compounds,11,12 and organic salts with fluorine atoms or sulfur-rich functional groups.13–15 However, despite a few examples providing rational guidelines on the link between the chemical properties of the perovskite surface and the device Voc,16–18 device optimization is still mainly driven by trial and error. In this context, we move away from current methods by developing and applying a machine learning (ML) approach that provides a rational screening of molecular species for informed device optimization. On the one hand, physics-informed ML workflows are emerging to speed up the discovery process.17,19 On the other hand, experiment-driven ML workflows are essential for finding correlations between variables in a database.20 In particular, Hartono et al. and, more recently, Zhi et al. pioneered ML methods for optimizing the organic cations used to passivate the perovskite active layer; Kouroudis et al. used an ML pipeline to predict the outdoor degradation behavior of perovskite solar cells (PSCs); and Kirman et al. used ML to guide their robotic synthesis trials.20–23 However, experiment-driven ML workflows require considerable effort in terms of time and materials.
Here, we attempt to bridge the gap between the large amount of data required to build a suitable ML database and the need for reliable ML results by considering both different organic salts for passivation and different experimental processing conditions.
In this work, we propose a multistep approach in which different ML models are trained to select the best organic cations for passivating formamidinium lead iodide (FAPbI3) perovskite solar cells, with the open circuit voltage (Voc) as the target variable to be maximized. Through this procedure, and by applying the Shapley additive explanations (SHAP) values method, we first identified the most influential processing variables and the chemical properties that the organic cation should possess to maximize the device Voc. Second, we obtained a reliable prediction for a new candidate cation, which we directly tested as an FAPbI3 passivator in a working device. The proposed approach not only provides critical insight into the key properties of the organic cations that are crucial for achieving a high Voc but also enables a deeper understanding of the physics behind the passivation mechanism. Overall, this method paves the way for the informed selection of new organic passivators through ML screening of their chemical properties.
II. RESULTS AND DISCUSSION
A. Model for organic cation analysis
For the material screening, we considered nine organic salts among the most widely used passivators for FAPbI3 in n-i-p PSC architectures, i.e., (I) 4-methylphenethylammonium iodide (MePEAI), (II) 4-methylphenethylammonium bromide (MePEABr), (III) 4-methylphenethylammonium chloride (MePEACl), (IV) n-butylammonium iodide (n-BAI), (V) iso-butylammonium iodide (iso-BAI), (VI) n-octylammonium iodide (n-OAI), (VII) benzylammonium bromide (BBr), (VIII) hexylammonium bromide (HBr), and (IX) octylammonium tosylate (OaTos). Figure 1(a) shows their molecular structures. Each compound was deposited on top of the FAPbI3 film at three concentrations (5, 10, and 15 mM), with or without an annealing step at 100 °C, giving six different processing conditions. To evaluate the relationship between the Voc, the processing variables, and the chemical properties of the salts, we applied the workflow shown in Fig. 1(b), testing the performance of various ML models coupled with a feature-importance ranking method. Three samples were produced for each combination of passivator and processing conditions, giving a total of 162 samples, which were measured under a solar simulator. To validate the measurements, we also fabricated, as controls, solar cells without passivation (FAPbI3) and cells passivated with MePEABr, our reference passivator (FAPbI3 + MePEABr) (Fig. S1). The device Voc was chosen as the target variable, considering only values exceeding 1000 mV. The filtered values were then averaged, providing the average Voc for each of the fifty-four combinations (Fig. S2). The ML models were trained using the chemical properties of the cations and the two processing variables as input features, with the corresponding measured Voc as the output variable. A set of sixteen chemical properties (computed with the CACTVS toolkit24), concatenated with the two processing variables, was initially selected.
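The bookkeeping behind this dataset (nine salts × three concentrations × two annealing conditions × three replicates, followed by the 1000 mV filter and per-combination averaging) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: the helper names are hypothetical, the replicate Voc values generated in the usage example are synthetic placeholders rather than measured data, and dropping a combination when none of its replicates exceeds 1000 mV is our assumption, since the text does not say how such cases were handled.

```python
import itertools
import random
import statistics

# Nine passivators and six processing conditions, as listed in the text.
SALTS = ["MePEAI", "MePEABr", "MePEACl", "n-BAI", "iso-BAI",
         "n-OAI", "BBr", "HBr", "OaTos"]
CONCENTRATIONS_MM = [5, 10, 15]
ANNEALED = [False, True]  # without / with the 100 °C annealing step
REPLICATES = 3
VOC_THRESHOLD_MV = 1000.0


def build_design():
    """All (salt, concentration, annealing) combinations: 9 x 3 x 2 = 54."""
    return list(itertools.product(SALTS, CONCENTRATIONS_MM, ANNEALED))


def average_filtered_voc(measurements):
    """Average the replicate Voc values (mV) above the threshold, per combination.

    `measurements` maps a combination tuple to its list of replicate Voc values.
    Assumption: combinations with no replicate above the threshold are dropped.
    """
    averaged = {}
    for combo, vocs in measurements.items():
        kept = [v for v in vocs if v > VOC_THRESHOLD_MV]
        if kept:
            averaged[combo] = statistics.mean(kept)
    return averaged


if __name__ == "__main__":
    design = build_design()
    rng = random.Random(0)
    # Synthetic placeholder values, NOT the measured Voc data.
    fake = {c: [rng.uniform(950.0, 1150.0) for _ in range(REPLICATES)] for c in design}
    print(len(design), len(design) * REPLICATES)  # 54 162
    print(len(average_filtered_voc(fake)), "combinations kept")
```

The averaged dictionary is what would then be joined with the per-salt chemical descriptors to form the 54-row training table.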
To enhance model performance, mitigate overfitting, decrease computational costs, and promote model interpretability, we performed a preliminary selection of the input features by computing a correlation matrix before the training phase. In this square matrix, rows and columns represent input features, and each cell contains a correlation coefficient (Pearson factor, Pf). These coefficients range from −1 (complete linear anticorrelation) to +1 (complete linear correlation). The matrix was used to screen out redundant variables, retaining features whose pairwise Pearson factors were lower than 0.9 (Fig. S3), which reduced the final set of chemical features to nine: molecular weight, rotatable bond count, rigid bond count, complexity (ensemble complexity according to the modified Bertz/Hendrickson algorithm25,26), halide fraction (fraction of molecular weight contributed by halogen atoms), heteroatom carbon ratio (number of atoms other than carbon and hydrogen divided by the number of carbon atoms), and iodine, bromine, and chlorine atom counts (Table S1). Furthermore, the selected input features were normalized between 0 and 1 (Fig. S4). This is a standard procedure that prevents features with larger numerical ranges from dominating model training. Six models were independently employed in this study: linear regression (LR), serving as the minimum performance threshold; two geometric models, i.e., k-nearest neighbor regression (KNN) and support vector machine regression (SVM); two ensemble models based on decision trees, i.e., random forest regression (RF) and gradient boosting regression (GB); and a neural network (NN). To determine the optimal hyperparameters for each model, we employed a grid search (GS) with fivefold cross-validation (Fig. S5), which makes full use of the available dataset and mitigates the risk of overfitting to a single test set, as could occur with a direct split into training and test sets. To select the model with the best generalization, we employed an all-but-one (AbO) cross-validation approach (Fig. S6). Each model was trained on the data of n − 1 organic cations, with GS used to identify the best hyperparameters, while the data of the nth cation were kept as the test set. At the end of each iteration, the root mean square error (RMSE) and standard deviation of each model were saved. The process was repeated, excluding a different cation from the training set at each cycle, until all n cations had been evaluated. Finally, the model exhibiting the lowest average RMSE and standard deviation, i.e., RF, was identified as the best-performing model, as shown in Fig. 1(c). After selecting the best model and training it on the full dataset, we applied the SHAP method to extract the weight that the model assigns to the value of each input feature when computing its predictions.27 The method offers an interpretable strategy that reveals the influence of each feature on the target variable.
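The nested procedure described above (an inner fivefold grid search inside an outer loop that holds out one cation at a time) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: to stay self-contained it tunes a k-nearest-neighbour regressor, one of the six model families considered, rather than the RF that was ultimately selected, and all helper names are hypothetical. In scikit-learn, the same structure could be assembled from `LeaveOneGroupOut`, `GridSearchCV`, and `RandomForestRegressor`. The key point is that rows are grouped by cation, so no measurement of the held-out cation leaks into hyperparameter tuning.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def knn_predict(X_train, y_train, X_test, k):
    """k-nearest-neighbour regression: mean target of the k closest training rows."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def grid_search_kfold(X, y, k_grid, n_folds=5, seed=0):
    """Inner loop: pick the k with the lowest mean RMSE over n_folds random folds."""
    rng = np.random.default_rng(seed)
    folds = [f for f in np.array_split(rng.permutation(len(y)), n_folds) if len(f)]
    best_k, best_score = k_grid[0], np.inf
    for k in k_grid:
        scores = []
        for fold in folds:
            train = np.setdiff1d(np.arange(len(y)), fold)
            if len(train) >= k:  # need at least k neighbours to predict
                scores.append(rmse(y[fold], knn_predict(X[train], y[train], X[fold], k)))
        if scores and np.mean(scores) < best_score:
            best_k, best_score = k, float(np.mean(scores))
    return best_k


def all_but_one_cv(X, y, groups, k_grid=(1, 3, 5)):
    """Outer loop: hold out one cation, tune on the rest, score the held-out cation."""
    per_cation = {}
    for cation in np.unique(groups):
        test = groups == cation
        X_tr, y_tr = X[~test], y[~test]
        k = grid_search_kfold(X_tr, y_tr, k_grid)
        per_cation[cation] = rmse(y[test], knn_predict(X_tr, y_tr, X[test], k))
    scores = np.array(list(per_cation.values()))
    return per_cation, float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    groups = np.repeat(np.arange(9), 6)          # 9 cations x 6 processing conditions
    X = min_max_scale(rng.random((54, 4)))       # synthetic features, not the real descriptors
    y = 1050.0 + 80.0 * X[:, 0] + rng.normal(0.0, 10.0, 54)  # synthetic Voc (mV)
    _, mean_rmse, std_rmse = all_but_one_cv(X, y, groups)
    print(f"AbO RMSE: {mean_rmse:.1f} +/- {std_rmse:.1f} mV")
```

Running the outer loop once per candidate model family and comparing the resulting mean and standard deviation of the RMSE mirrors the model-selection criterion described in the text.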
B. Features interpretation
Figure 2(a) shows the SHAP values obtained from the selected optimal model, the RF. A positive/negative SHAP value indicates that the input property, at the given value (color-coded: yellow for high values and purple for low values), positively/negatively affects the output variable (Voc), while its magnitude indicates the weight of this influence. The clearest and most relevant result for Voc optimization is that the annealing step is not beneficial: omitting it favors a higher Voc. In addition, a low/medium halide fraction, a low molecular weight and heteroatom carbon ratio, and a high rotatable bond count are predicted to increase the device Voc. Finally, the presence of an iodine atom appears detrimental, whereas the presence of a Cl or Br atom is beneficial for maximizing the Voc. The trend for the concentration is more complex. To explain these results and find links between the different variables, we considered the Pf matrix shown in Fig. 2(b). First, the processing conditions, namely, the annealing temperature and the concentration, show Pf values between −0.02 and +0.02 with all chemical features, indicating that the processing variables are uncorrelated with the chemical properties of the cations. We then divided the chemical properties of the cations into two groups according to their correlation/anticorrelation values. The first group contains the main chemical properties, i.e., the absence/presence of I, Br, and Cl atoms, the molecular weight, and the halide fraction. In particular, iodine, with its high atomic weight of 126.904 u (compared with 79.904 and 35.453 u for bromine and chlorine, respectively), strongly affects the overall molecular weight of the organic salt (for MePEAI, the halogen accounts for ∼48% of the total molecular weight), and a high molecular weight is detrimental for the Voc.
On the contrary, for Cl-based organic salts, the lighter halogen results in a low molecular weight and in the maximization of the Voc (for MePEACl, the halogen accounts for only 21% of the molecular weight). This explains why a low halide fraction, i.e., the fraction of molecular weight contributed by halogen atoms, is beneficial for the Voc. Finally, we grouped the heteroatom carbon ratio, the C and H counts, and the rotatable bond count (Fig. S3). Except for OaTos, all the cations explored contain exactly two atoms other than C and H (the halogen and N). Hence, the heteroatom carbon ratio, i.e., the number of heteroatoms divided by the number of carbon atoms, is inversely related to the number of C atoms, consistent with the anticorrelation between the two (Pf = −0.62). On the other hand, the rotatable bond count correlates with the C and H counts (Pf = 0.53 and Pf = 0.90, respectively). Because both the heteroatom carbon ratio (negatively) and the rotatable bond count (positively) track the C count (and, for rotatable bonds, also the H count), the less intuitive heteroatom carbon ratio can be interpreted through the rotatable bond count. Following these considerations, we obtain the three groups of variables shown in Fig. 2(c): processing variables (yellow circle), the first group of chemical features (light blue triangle), and the second group of chemical features (dark blue square). The annealing temperature emerges as the most influential factor, with a clear trend: omitting the annealing step yields a higher Voc than annealing at 100 °C. Since the influence of concentration on Voc is unclear, further data are needed to clarify it.
Next, lighter halogens yield better performance, in agreement with findings for a given cation combined with different halogens, namely TMAX (X = I, Br, and Cl), where TMA is 2-thiophenemethylammonium.16 Indeed, going from I to Cl, the increased electronegativity is accompanied by a stronger binding between the anion and positively charged defects at the perovskite surface, with a beneficial effect on the Voc.17 Finally, a higher number of C and H atoms, forming C–H bonds and contributing to rotatable bonds, results in a more flexible molecular structure. We propose that this flexibility improves the spreading on, and coverage of, the FAPbI3 surface, leading to more efficient passivation. This conclusion is consistent with prior studies correlating alkyl chain length with PSC performance improvement.18
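SHAP values such as those in Fig. 2(a) are Shapley values of the model output: each feature receives its average marginal contribution over all coalitions of the other features. As a hedged illustration of what these numbers mean, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets, using a single baseline input to stand in for "absent" features. This is a simplification: the `shap` package estimates the values against a background dataset and provides a specialised explainer for tree ensembles such as RF. The surrogate `toy_voc_model` and its coefficients are hypothetical and are not the trained RF.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value, so the result
    satisfies sum(phi) == model(x) - model(baseline) (efficiency property).
    The cost grows as 2**n, so this is practical only for a few features.
    """
    n = len(x)

    def coalition_value(members):
        z = [x[j] if j in members else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (coalition_value(members | {i}) - coalition_value(members))
    return phi


def toy_voc_model(z):
    """Hypothetical surrogate, NOT the trained RF: Voc (mV) from three normalized
    features [halide_fraction, rotatable_bonds, annealed]."""
    halide, rotatable, annealed = z
    return 1100.0 - 60.0 * halide + 30.0 * rotatable - 40.0 * annealed + 20.0 * halide * rotatable


if __name__ == "__main__":
    x = [0.2, 0.9, 0.0]         # low halide fraction, flexible, not annealed
    baseline = [0.5, 0.5, 0.5]  # e.g., the mean of the normalized features
    phi = shapley_values(toy_voc_model, x, baseline)
    print([round(p, 2) for p in phi])  # [13.8, 14.8, 20.0]
    assert isclose(sum(phi), toy_voc_model(x) - toy_voc_model(baseline))
```

All three contributions are positive here, which is how a SHAP summary plot would encode "low halide fraction, many rotatable bonds, and no annealing push the Voc up" for this particular sample.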
C. Material characterizations
To link the most influential features of the organic cations, as identified by the SHAP analysis, to the film properties, we selected MePEACl and MePEABr as the best-performing and iso-BAI as the worst-performing materials (10 mM, no annealing step). Indeed, PSCs with FAPbI3 + iso-BAI, FAPbI3 + MePEABr, and FAPbI3 + MePEACl show mean Voc increases of +1%, +6%, and +7%, respectively, relative to FAPbI3 PSCs (Fig. S2). X-ray diffraction (XRD) patterns (Fig. S7) show, for all samples, a dominant peak at 2θ = 14.3° and higher-order reflections assigned to α-FAPbI3, while the photoinactive δ-FAPbI3 phase appears only for pure FAPbI3, and a lead iodide peak is visible, especially for FAPbI3 + iso-BAI.28–30 In addition, atomic force microscopy (AFM) images (Fig. S8) show improved surface coverage for MePEABr and MePEACl, which we ascribe to the flexible MePEA cation.31 Photoluminescence (PL) spectra, shown in the inset of Fig. 3(a), reveal relative increases in PL intensity of +16%, +85%, and +84% for FAPbI3 + iso-BAI, FAPbI3 + MePEABr, and FAPbI3 + MePEACl, respectively, compared with bare FAPbI3 (all samples have the same absorption spectra, as shown in Figs. S9 and S10). This trend suggests that the surface treatment reduces non-radiative recombination, particularly for MePEA-based passivation.32 Time-resolved PL traces [Fig. 3(a) and Table S2] further confirm the reduced charge trapping, with a slower decay (from 30 to 70 ns) for FAPbI3 + MePEACl and FAPbI3 + MePEABr than for bare FAPbI3.33,34 Figure 3(b) shows the PL quantum yield (PLQY) and the experimental Voc of the corresponding PSCs.
The PLQY increases clearly from neat FAPbI3 to FAPbI3 + iso-BAI and, especially, to FAPbI3 + MePEACl and FAPbI3 + MePEABr, in agreement with the higher PL intensity and longer lifetime discussed above. This further confirms that surface passivation reduces non-radiative recombination, with an evident beneficial impact on the device Voc.6
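The link between PLQY and Voc invoked here can be made semi-quantitative with the standard reciprocity (detailed-balance) estimate, in which non-radiative recombination lowers the Voc below its radiative limit by (kT/q)·ln(1/PLQY). The sketch below applies this textbook relation; it is not the authors' analysis. Strictly, the relation refers to the luminescence yield of the complete device at open circuit, so using film PLQY values is only an approximation, and the PLQY values in the usage example are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # k_B in eV/K, so k_B*T in eV equals k_B*T/q in volts


def nonradiative_voc_loss_mv(plqy, temperature_k=300.0):
    """Voc deficit relative to the radiative limit: (kT/q) * ln(1/PLQY), in mV."""
    if not 0.0 < plqy <= 1.0:
        raise ValueError("PLQY must lie in (0, 1]")
    return 1000.0 * BOLTZMANN_EV_PER_K * temperature_k * math.log(1.0 / plqy)


def voc_gain_mv(plqy_before, plqy_after, temperature_k=300.0):
    """Expected Voc gain when passivation raises the PLQY (same radiative limit assumed)."""
    return (nonradiative_voc_loss_mv(plqy_before, temperature_k)
            - nonradiative_voc_loss_mv(plqy_after, temperature_k))


if __name__ == "__main__":
    # Hypothetical PLQY values: a twofold increase is worth about 18 mV at 300 K.
    print(f"{voc_gain_mv(0.01, 0.02):.1f} mV")  # 17.9 mV
```

Because the gain depends only on the PLQY ratio, every doubling of the yield corresponds to roughly 18 mV at room temperature, which gives a sense of scale for the Voc improvements discussed above.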
D. Predictions on new candidate organic cations
The trained RF model can be used directly to predict the device Voc. To demonstrate this capability, we compiled a database of the chemical properties of sixteen candidate organic cations (Table S3) and used the RF to predict the corresponding Voc. To minimize the number of processing conditions, we excluded the annealing step (noA) and used a single concentration (10 mM). The results are shown in Fig. 3(c). In general, the highest predictions are associated with cations paired with Br and Cl, while those paired with I yield the lowest, in line with the SHAP values of the model. In particular, the highest Voc is predicted for PEACl. Notably, the predicted Voc of 1127.69 mV for FAPbI3 + PEACl is slightly lower than the average experimental value of 1131.05 mV obtained with FAPbI3 + MePEACl (10 mM, noA). This is consistent with their chemical properties: although MePEACl has a higher molecular weight than PEACl (+9%), it compensates with a lower halide fraction (−8%) and a lower heteroatom carbon ratio (−11%). Because the RF prediction confidence range, given by ± the AbO RMSE score, is comparable with the spread of the predictions themselves, no definitive conclusion can be drawn about which candidate cation is the most effective. On the other hand, the candidate cations were chosen based on the conclusions drawn from the SHAP values, so the model predictions are expected to be close to one another.35 Relative to the 168 mV range of the experimental data used for model training, the AbO RMSE corresponds to a normalized error of 12% (i.e., roughly 20 mV; Fig. S11). This AbO RMSE reflects both the limited size of the dataset and the high variability of the experimental data for each combination of surface passivator, concentration, and temperature (Fig. S12 reports AbO cross-validation RMSEs obtained with two, three, four, or nine cations and with additional models). To verify whether the prediction holds empirically, we fabricated and measured a batch of five PSCs passivated with PEACl (FAPbI3 + PEACl). The measured Voc values of the best two PSCs are shown in Fig. 3(d) together with the RF prediction confidence range (light blue), given by the predicted Voc ± the AbO RMSE score. The prediction is very close to the mean of the experimental measurements, and the confidence range is consistent with the variability of the experimental data. Although a more rigorous assessment would require comparing the predictions with a larger set of experimental data, this result supports the potential of the proposed method to provide dependable a priori predictions for a guided optimization of experimental devices.
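The descriptor values and differences quoted in this section can be checked by hand from the molecular formulas. The sketch below assumes the formulas C9H14NX for the MePEA salts and C8H12ClN for PEACl, together with standard atomic weights (the halogen values match those quoted in Section B); the CACTVS toolkit may use slightly different conventions, so small deviations from the tabulated descriptors are possible. The helper names are hypothetical.

```python
# Standard atomic weights (u); the halogen values are those quoted in the text.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007,
                 "Cl": 35.453, "Br": 79.904, "I": 126.904}
HALOGENS = {"Cl", "Br", "I"}

# Element counts of the salts (assumed formulas: organic cation plus halide anion).
FORMULAS = {
    "MePEAI": {"C": 9, "H": 14, "N": 1, "I": 1},
    "MePEACl": {"C": 9, "H": 14, "N": 1, "Cl": 1},
    "PEACl": {"C": 8, "H": 12, "N": 1, "Cl": 1},
}


def molecular_weight(formula):
    return sum(ATOMIC_WEIGHT[el] * n for el, n in formula.items())


def halide_fraction(formula):
    """Fraction of the molecular weight contributed by halogen atoms."""
    halogen_mass = sum(ATOMIC_WEIGHT[el] * n for el, n in formula.items() if el in HALOGENS)
    return halogen_mass / molecular_weight(formula)


def heteroatom_carbon_ratio(formula):
    """Number of atoms other than C and H divided by the number of C atoms."""
    hetero = sum(n for el, n in formula.items() if el not in ("C", "H"))
    return hetero / formula["C"]


def relative_change(value, reference):
    return (value - reference) / reference


if __name__ == "__main__":
    for name, f in FORMULAS.items():
        print(f"{name}: MW {molecular_weight(f):.1f} u, "
              f"halide fraction {halide_fraction(f):.1%}, "
              f"heteroatom/C {heteroatom_carbon_ratio(f):.3f}")
    me, pea = FORMULAS["MePEACl"], FORMULAS["PEACl"]
    for label, fn in [("MW", molecular_weight), ("halide fraction", halide_fraction),
                      ("heteroatom/C", heteroatom_carbon_ratio)]:
        # Prints +9%, -8%, and -11%, matching the comparison in the text.
        print(f"MePEACl vs PEACl, {label}: {relative_change(fn(me), fn(pea)):+.0%}")
```

Under these assumptions the halogen accounts for about 48% of the MePEAI molecular weight and about 21% of the MePEACl molecular weight, in line with the values discussed in Section B.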
III. CONCLUSIONS
In conclusion, we have developed a machine-learning-assisted investigation revealing the critical chemical features of different surface passivators used to boost PSC efficiency, mainly through an increase in the device Voc. Fifty-four combinations of surface-passivated FAPbI3 solar cells were used to train six machine learning models, and AbO cross-validation was applied to select the best one, the RF. We applied the SHAP method to identify which features most strongly correlate with an improved device Voc and found that the most impactful features are (I) the absence of an annealing step, (II) a low halide fraction, explained by the beneficial role of Cl in binding iodine vacancies, and (III) a low heteroatom carbon ratio, which correlates with the increased spreading of a flexible molecule. By comparing the organic salts that gave the largest Voc improvement, i.e., MePEACl and MePEABr, with the one that gave the smallest, i.e., iso-BAI, we linked the Voc enhancement specifically to (II) and (III). Vacancy passivation by Cl (and Br) is further supported by PLQY, while the increased flexibility of MePEACl (and MePEABr) with respect to iso-BAI affects the uniformity of the crystallite size and the coverage of the perovskite film, as shown by morphological mapping. Moreover, we further tested our algorithm by performing a priori predictions to guide experimental efforts. We used the RF model to predict the Voc for sixteen new candidate surface passivators. The highest predicted value was obtained for PEACl, and it was validated by the measured Voc values (i.e., the measured Voc values fell within the confidence band given by the RF prediction ± its AbO RMSE score). This supports the reliability of the predictive power of the ML model.
We demonstrate a robust and rational ML approach that can facilitate a deeper understanding of the relevant chemical properties and an easier screening of large libraries of materials, reducing reliance on a time-consuming trial-and-error approach. In addition, we demonstrate its predictive power, which is essential for the a priori selection of candidate materials. We foresee that follow-up work will benefit from active learning workflows incorporating substantially more data, to corroborate the model and generalize the observations. Beyond that, the approach is versatile, and we envisage that it could be applied to explore other chemical features and target variables and to further accelerate the optimization of perovskite solar cells.
SUPPLEMENTARY MATERIAL
The supplementary material includes the experimental section, organic cation properties, PSC performance, main ML results, XRD, AFM, absorption (ABS) spectra, Tauc plots, TRPL lifetimes, and the complete ML code, available at: https://github.com/MattiaSpider-SIMaP/PVSquared2/tree/main/ML_for_FAPbI3_passivation.
ACKNOWLEDGMENTS
The authors acknowledge the “HY-NANO” project (Grant Agreement No. 802862) and the “ERC Proof of Concept SPIKE” (Grant Agreement No. 101068936) that received funding from the European Research Council (ERC) Starting Grant 2018 under the European Union’s Horizon 2020 research and innovation programme. The authors acknowledge the project for infrastructures funded by Regione Lombardia RL3776, the Ministero dell’Università e della Ricerca (MUR), and the University of Pavia through the program “Dipartimenti di Eccellenza 2023–2027.” F.F. and G.G. acknowledge Edison for the collaboration in the PhD fellowship co-funded by the European Union–FSE, Programma Operativo Nazionale (PON) Ricerca e Innovazione 2014–2020 (CCI 2014IT16M2OP005). M.R. acknowledges MIAI@Grenoble Alpes (ANR-19-P3IA-0003). I.P. acknowledges the National Institute for Nuclear Physics (INFN) within the next_AIM (Artificial Intelligence in Medicine: next steps) research Project No. (INFN-CSN5).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
M.R. and F.F. contributed equally to this work.
Mattia Ragni: Data curation (lead); Formal analysis (equal); Methodology (equal); Software (lead); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Fabiola Faini: Conceptualization (equal); Data curation (supporting); Formal analysis (equal); Methodology (supporting); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Matteo Degani: Formal analysis (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Silvia Cavalli: Formal analysis (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Ian Postuma: Methodology (supporting); Software (supporting); Writing – original draft (equal); Writing – review & editing (equal). Giulia Grancini: Conceptualization (equal); Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Visualization (supporting); Writing – original draft (supporting); Writing – review & editing (supporting).
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.