An understanding of how input parameter uncertainty in the numerical simulation of physical models leads to simulation output uncertainty is a challenging task. Common methods for quantifying output uncertainty, such as performing a grid or random search over the model input space, are computationally intractable for a large number of input parameters represented by a high-dimensional input space. It is, therefore, generally unclear as to whether a numerical simulation can reproduce a particular outcome (e.g., a set of experimental results) with a plausible set of model input parameters. Here, we present a method for efficiently searching the input space using Bayesian optimization to minimize the difference between the simulation output and a set of experimental results. Our method allows explicit evaluation of the probability that the simulation can reproduce the measured experimental results in the region of input space defined by the uncertainty in each input parameter. We apply this method to the simulation of charge-carrier dynamics in the perovskite semiconductor methyl-ammonium lead iodide (MAPbI_{3}), which has attracted attention as a light harvesting material in solar cells. From our analysis, we conclude that the formation of large polarons, quasiparticles created by the coupling of excess electrons or holes with ionic vibrations, cannot explain the experimentally observed temperature dependence of electron mobility.

## I. INTRODUCTION

Numerical simulation of physical models typically requires a large number of input parameters that must be derived from previous theoretical calculations or measured experimentally.^{1,2} It is, therefore, inevitable that model input parameters suffer from uncertainty, and the effect of this uncertainty on the simulation output is not often discussed. Quantifying the functional relationship between the model input parameters and simulation output in the *N*-dimensional input space, where a set of *N* model input parameters is a point in this space, is computationally challenging due to the combination of a high-dimensional input space and expensive numerical simulation times.^{3} The effect of input uncertainty is thus routinely ignored given that naive methods for exploring the input space, such as grid or random search, require a computationally intractable number of simulations. It is, therefore, difficult to establish whether it is possible to generate a particular outcome, e.g., matching a set of experimental results, from a numerical simulation with a plausible set of input parameter values.

Machine learning, specifically Bayesian optimization,^{4,5} can be used to efficiently search the input space in the region defined by the uncertainty in each input parameter, while minimizing the difference between the simulation output and the experimentally measured result. During optimization a probabilistic function is learned that maps the input parameters to the simulation output. This function explicitly evaluates the probability that the simulation can reproduce the experimental result in the region of input parameter space defined by the uncertainty. With this approach, informed conclusions can be drawn from the simulation output that allow for the uncertainty present in the input parameters. The method presented here falls into the wider field of statistical emulation for sensitivity and uncertainty analysis.^{6–8}

In the field of semiconductor physics, previous methods have attempted to manually minimize the error between simulation and experiment in the context of device parameter estimation for organic light-emitting diodes (OLEDs).^{9–11} However, a machine-learning method for automatically quantifying the effect of input parameter uncertainty on simulation prediction by minimizing the error between simulation and experiment has not been established. A related problem of utilizing simulation prediction to find device parameters from experimental measurements was demonstrated by Knapp *et al.*^{12}

We demonstrate the method with application to the simulation of charge-carrier dynamics in lead-halide perovskites (LHPs). LHPs as light-harvesting layers in solar cells have been the focus of intense research activity due to power conversion efficiency increases of 9%–27% since 2009^{13,14} and low-energy, low-cost manufacturing.^{15} However, a comprehensive theoretical understanding of the mechanisms responsible for a number of their electronic properties is currently lacking.^{16,17} Specifically, the fundamental physical mechanisms controlling the transport of photogenerated charge carriers are actively debated; see, for example, the measurements of Kobbekaduwa *et al.*^{18} on ultrafast dynamics that address the carrier mobility dependence on temperature in LHPs. Mesoscale simulation models of charge transport in LHPs, such as BoltMC, an ensemble Monte Carlo approach using Boltzmann transport theory,^{19} can provide insight into these mobility-limiting mechanisms.

Here, we investigate the possibility that the formation of large polarons,^{20} quasi-particles resulting from the coupling of charge carriers with the surrounding polarized ionic lattice, affects the temperature dependence of the carrier mobility in methyl-ammonium lead iodide, MAPbI_{3}. A large polaron is an itinerant particle with mobility that decreases with temperature.^{19} Prior theoretical work has utilized many approaches for calculating this dependence,^{20–22} parameterized by electronic structure calculations and/or experimental measurements of material parameters. Yet, the effect of uncertainty on these material input parameters has been largely ignored. Here, we show, with calculations from BoltMC, that the temperature dependence of polaron mobility is a non-linear function of the input parameters in the region of input space defined by the uncertainty in each parameter. This finding indicates that strong conclusions derived from a single point (i.e., a single set of material input parameters) in the input space are unjustified. Bayesian optimization has thus been used to efficiently explore the effect of uncertain inputs on simulation predictions of the temperature dependence of polaron mobilities in comparison to experiments. This approach strengthens any conclusions drawn from the simulation output as the effect of input parameter uncertainty has been considered. In addition, context aids our understanding of the fundamental physical mechanisms influencing charge transport in MAPbI_{3}.

## II. RESULTS AND DISCUSSION

### A. Polaron mobility temperature exponent

Assuming the polaron mobility varies with temperature as a power law, Bayesian optimization was used to minimize the absolute value of the difference between the predicted temperature exponent and an experimentally measured exponent of −1.5 (see Fig. 1). This power law dependence was assumed to be linear in the high-temperature limit, found in previous studies to be for temperatures $>200$ K.^{19} The predicted temperature exponent was determined from the polaron mobility at three temperatures toward the lower temperature end of this limit (200, 250, and 300 K) in order for the procedure to be computationally efficient; the simulation time increases with temperature due to increased scattering at higher temperatures. Our procedure has two stages explained in more detail in Sec. III C: first, we find that the minimum temperature exponent predicted by BoltMC for a prior uncertainty range of ±20% is −0.5 ± 0.04. A value of 0.12 *m*_{e} for the effective mass *m** minimizes the temperature exponent at the bottom of the uncertainty range. Second, we increase the prior uncertainty range to ±40% and find that the predicted temperature exponent is further reduced to −0.54 with *m** = 0.098*m*_{e}. Our predictions for both uncertainty ranges are well above the experimental temperature exponent, so our predicted minimum temperature exponent is the same as the prediction we would have found if we had minimized the difference between the predicted and experimental temperature exponents.

Using the approach described in Sec. III, 1269 simulations were required from 423 points (including the initial set) in the 6-D simulation input parameter space at temperatures of 200, 250, and 300 K. The minimum temperature exponent of polaron mobility in MAPbI_{3} was found to be −0.50 ± 0.04. This prediction resulted from optimization in the hypervolume of input space defined by the uncertainty in each of the six material simulation input parameters. The optimization procedure was terminated due to Gaussian Process Regression (GPR) predicting the probability of obtaining a temperature exponent equal to or more negative than −1.5 of $\u223c10\u221224$. Table I shows the material parameters associated with the minimum temperature exponent found during minimization. All material parameters lie within the assumed uniform uncertainty, except the effective mass *m**, which was found to minimize the temperature exponent at 0.12 *m*_{e} (bottom of the uncertainty range). This suggests that the temperature exponent may be further minimized by decreasing *m**. On increasing the prior uncertainty range to ±40% and allowing the minimization procedure to continue, it was found that the temperature exponent may be further minimized to −0.54 with *m** = 0.098*m*_{e}. This value of *m** is well within the ±40% range, indicating that the temperature exponent may not be further minimized by decreasing *m** (within this range). Additionally, a temperature exponent of −0.54 is within one standard error of the temperature exponent found within the ±20% uncertainty range and, therefore, does not affect any conclusions drawn below. Finally, increasing the material parameter uncertainty range to ±40% also leads to material parameters that are unphysical for MAPbI_{3} (see Refs. 20 and 27–29).

Parameter . | Mean value . | Minimum exponent discrepancy value . |
---|---|---|

Conduction band effective mass m* | 0.15 m_{e} | 0.12 m_{e} |

Polar optical phonon frequency ω_{pop}/2π | 2.25 THz | 2.67 THz |

Low frequency permittivity ɛ_{LF} | 25.7 ɛ_{0} | 27.1 ɛ_{0} |

High frequency permittivity ɛ_{HF} | 4.5 ɛ_{0} | 5.4 ɛ_{0} |

Acoustic deformation potential Ξ | 2.13 eV | 1.7 eV |

Elastic constant c_{L} | 32 GPa | 32 GPa |

Parameter . | Mean value . | Minimum exponent discrepancy value . |
---|---|---|

Conduction band effective mass m* | 0.15 m_{e} | 0.12 m_{e} |

Polar optical phonon frequency ω_{pop}/2π | 2.25 THz | 2.67 THz |

Low frequency permittivity ɛ_{LF} | 25.7 ɛ_{0} | 27.1 ɛ_{0} |

High frequency permittivity ɛ_{HF} | 4.5 ɛ_{0} | 5.4 ɛ_{0} |

Acoustic deformation potential Ξ | 2.13 eV | 1.7 eV |

Elastic constant c_{L} | 32 GPa | 32 GPa |

An accurate prediction of the probability of reproducing a temperature exponent of −1.5 relies strongly on the accuracy of the output variance predicted by GPR. A measure of this accuracy is easily obtained and is described as follows: In the 6-D simulation input space, 100 points were randomly generated, and the temperature exponent of the polaron mobility was calculated. Using the trained Gaussian Process Regressor, the predicted normal distribution [see Eq. (3)] over the temperature exponent for the 100 points in the input space was determined, and ± 2*σ*, ±3*σ* (95.5%, 99.7%) confidence bounds were calculated. 88% of simulated temperature exponents for the 100 points in the input space fell within 95.5% predicted bounds, with 100% of simulated temperature exponents within 99.7% bounds. This indicates that the uncertainty of the GP (Gaussian Process) model is reasonably accurate. Further uncertainty calibration to systematically correct model uncertainties so that model errors (with respect to the training data, i.e., the simulation data) agree with the predicted model uncertainties is not required.^{30}

The accuracy of predicted GP model uncertainties provides strong evidence that the probability of obtaining a temperature exponent more negative than −1.5 is indeed vanishingly small, $\u223c10\u221224$. Therefore, this modeling indicates that polaron formation in MAPbI_{3} cannot explain the observed temperature dependence of electron mobility, even once the effect of a large uncertainty in the material simulation input parameters is considered. Other mechanisms that regulate charge-carrier dynamics in MAPbI_{3} must be investigated, such as trapping and recombination, as polaron formation has been shown to be more limited in its effect on electron mobilities than previously hypothesized in the literature.^{31–33}

### B. Band electron and polaron mobility at 298 K

Additionally, GPR was fit to both the band electron and polaron mobility as a function of the 6-D simulation input space at 298 K, with mobilities calculated as described in Sec. III B. A training set of 1000 simulations was obtained from randomly generated sets of simulation input parameters and their calculated band electron and polaron mobilities. A mean absolute percentage error of 0.5% was achieved for both model predictions of band electron and polaron mobility on an unseen test set of 200 simulations. The trained length-scales of the covariance function of the GP for the two models provide information regarding the physical scattering mechanisms present in MAPbI_{3}. Specifically, a measure of the sensitivity of the band electron and polaron mobility to each of the six input parameters (normalized by removing the mean and scaling to unit variance) is quantified by the inverse of the length-scale for each parameter (see Fig. 2).^{34}

An understanding of the effect of a small perturbation to an input parameter on the band electron and polaron mobility is, therefore, obtained, providing insight into the physical mechanisms controlling polaron mobility in lead-halide perovskites. For MAPbI_{3}, this analysis suggests that the polaron mobility is less sensitive to a perturbation in any input parameter than seen for bare band electron mobility. Given that the six input parameters define the polaron scattering rates for acoustic phonons, optical phonons, and ionized impurities,^{19} this in turn suggests that the polaron mobility is less sensitive to perturbations in the scattering rates.

Furthermore, comparisons across the magnitude of length-scales for each parameter (note that comparison between length-scales for different parameters is possible only as the input data were normalized before fitting GPR) indicate which perturbed scattering rates the electron mobility is most sensitive to within the uncertainty in each input parameter. For example, a perturbation in the conduction band effective mass *m** has the largest influence on mobility. This can be rationalized from the simple Drude theory *μ* = *τ*(*m**)/*m**, where the mobility is mediated by the magnitude of the effective mass directly as well as implicitly through the relaxation time *τ*(*m**), arising from the dependence of the scattering rates on *m**. The effective mass can be seen to strongly regulate electron mobility, a result that is confirmed by this analysis.

Large mobility sensitivity to both low and high frequency permittivity, *ϵ*_{s} and *ϵ*_{∞}, is indicated by large inverse length-scales for both parameters. This suggests that the band electron/polaron mobility in MAPbI_{3} has increased sensitivity to a perturbation in scattering rates strongly dependent on these parameters, namely, the scattering rates for polar optical phonons and ionized impurities due to their functional dependence on (1/*ϵ*_{∞} − 1/*ϵ*_{s}) and $1/\u03f5s2$, respectively.^{35} Walsh^{36} discussed the importance of dielectric screening for defect tolerance of perovskites and, we infer, for charge transport. Despite the dependence of the acoustic phonon scattering rate on Ξ^{2} and 1/*c*_{L}, the mobility sensitivity (inverse length-scale) to each parameter (Ξ, *c*_{L}) remains low, suggesting that a perturbation in the acoustic phonon scattering rate has a reduced effect on the band electron mobility. This effect on mobility is significantly decreased for polarons, as indicated by the smaller inverse length-scale. Indeed, previous studies of polaron formation in MAPbI_{3} have concluded that scattering due to acoustic phonons is the mechanism most significantly decreased in comparison to bare band electrons,^{19} a result confirmed by this analysis.

However, one must be careful with the physical insight derived from an analysis of the input parameter length-scales. Specifically, it is not clear that an input parameter with a larger length-scale has reduced significance in determining the band electron/polaron mobility; only that a perturbation in the parameter and, therefore, a perturbation in the scattering rate, has a smaller effect on the mobility. For example, it may be found that the magnitude of electron mobility is not significantly changed by a perturbation in one scattering rate. Yet this does not mean that electron mobility would be unchanged if this scattering mechanism was not present in the simulation; only that its contribution to mobility is approximately constant over the range in which the material parameters were varied.

Finally, this analysis is useful for guiding future uncertainty reduction, whereby reducing uncertainty in input parameters with smaller length scales will result in a greater decrease in the uncertainty over the simulation output. Similarly, an analysis of the length-scale for each input parameter may be used to determine a reduction in the dimensionality of the simulation input space by excluding parameters with comparatively large length-scales, in turn improving the efficiency of search/optimization methods.

## III. METHODS

### A. Efficient simulation input space search

^{37}Here, the “black box” function refers to a numerical simulation of a parameterized physical model. Bayesian optimization

^{4,5}is frequently used in this setting and utilizes a learned probabilistic model of the function that is being optimized. No knowledge of the functional form is assumed; e.g., the function is not assumed to be convex, and gradients are not directly accessible. GPR has been used to model the functional relationship between the simulation input parameters

**x**and output

*g*(

**x**). GPs

^{38}can be viewed as specifying a distribution over functions,

*m*(

**x**) defines the mean function and

*k*(

**x**,

**x′**) is the covariance function of the GP. The radial basis function (RBF) was used in this work to calculate

*k*(

**x**,

**x′**), specifying the

*i*th row and

*j*th column of the covariance matrix with the addition of additive noise along the diagonal,

*l*, define the output signal variance and input length-scales, respectively, and $\sigma n2$ specifies the variance of additional Gaussian measurement noise. These parameters are optimized during training to maximize the log-marginal likelihood of the data. The choice of covariance function quantifies certain modeling assumptions that the underlying numerical function is expected to obey, such as smoothness, non-stationarity, and periodicity.

^{39}The RBF kernel encodes a prior assumption that the output of the function to be approximated smoothly varies with respect to function input parameters.

*y*(simulation output) is univariate normal,

^{38}

^{,}

*α*(

**x**) is defined over the input space, and the next input location with which to query the ground truth function is determined by maximizing this acquisition function,

^{38}

**x**

^{+}represents the input location associated with the minimum function value observed thus far during optimization and

*κ*is a hyper-parameter that defines the exploration-exploitation trade-off of the optimization policy.

In this application, the function minimized during optimization is the absolute value of the difference between the predicted simulation output and the experimental result. The optimization procedure can be terminated once either a simulation output is found that reproduces the experimental result below a pre-specified error or the probability of obtaining an output equal to the experimental result falls below a threshold. This probability can be explicitly evaluated from the predicted variance of the simulation output from GPR at each point in the input space; a method for determining the accuracy of the predicted variances is reported in Sec. III. Note that the probability of reproducing the experimental result may increase or decrease with the addition of a further simulation. The threshold probability must, therefore, be defined conservatively (i.e., sufficiently low) to ensure sufficient iterations take place and so avoid incorrect early stopping.

### B. Simulation of polaron dynamics

^{40}using the ensemble Monte Carlo method.

^{35,41}For an ensemble of polarons subject to a constant electric field

**E**, the Boltzmann transport equation is given by

**k**is the polaron wavevector and

*f*(

**k**,

*t*) defines the one-particle distribution function. The partial derivatives on the right hand side of Eq. (6) represent the change to the distribution function due to the scattering of polarons by optical phonons (pop), acoustic phonons (aco), and ionized impurities (imp). The scattering rates for each of these three mechanisms were calculated using Fermi’s golden rule,

*H*′ is the time-dependent perturbing Hamiltonian for each scattering mechanism in the bulk of a polar semiconductor.

^{35},

*ϵ*

_{i}and

*ϵ*

_{f}are the energies of the initial and final states, respectively, and Δ

*ϵ*represents the quanta of energy exchanged as a result of the scattering. A derivation of the scattering rates for the three scattering mechanisms considered here for both band electrons and polarons can be found in Ref. 19.

**k**⟩. These eigenstates were obtained from the Feynman model of a polaron, where an electron is coupled to a second particle via a harmonic potential that represents the cloud of virtual phonons associated with the surrounding polarized ionic lattice.

^{42}The Hamiltonian of this system is given by

^{40}

**r**,

**k**, and

*m** are the position, wavevector, and effective mass for the electron (subscript “e”), phonon cloud (subscript “c”), and polaron (no subscript). The polaron’s internal harmonic oscillator state is defined by the angular frequency

*ω*

_{osc}, with $(aosc)i\u2020$ and $(aosc)i$ representing the ladder operators for this harmonic oscillator system for the three cartesian directions, indexed by

*i*.

^{35,41}(additional information on this calculation and its computational implementation can be found in Ref. 19), and the polaron mobility

*μ*was determined from the ensemble average of the polaron wavevector,

**k**

_{e}was determined from the effective mass Hamiltonian,

### C. Polaron dynamics under input uncertainty

_{3}, as predicted by electronic structure calculations, can be seen in Table I. The uncertainty in each of the six MAPbI

_{3}input parameters was assumed to be ±20% in the absence of uncertainty estimation accompanying

*ab initio*prediction.

^{19}This assumed uncertainty results in input parameter values that align with values presented in a range of previous studies, both theoretical predictions and experimental measurements.

^{20,27–29}In the hypervolume of the 6-D input space defined by the uncertainty in each input parameter, 300 points (sets of input parameters) were randomly generated, and the polaron mobility was calculated at 200, 250, and 300 K. The temperature dependence of the polaron mobility has been characterized by fitting the power-law relationship,

*β*is the temperature exponent obtained from the gradient of a linear fit performed on log–log axes,

*β*was used as the predicted variable

*y*for GPR as a function of the 6-D input space

**x**. We explore the temperature dependence of the mobility from 150 to 300 K, for which Fig. 1 shows there is a linear fit using log–log axes.

As noted in Sec. III A, Bayesian optimization could then be used to minimize the absolute value of the difference between the predicted temperature exponent and an experimentally measured exponent of −1.5 (see Fig. 1). The Python codes used for performing this procedure, along with the supporting data used to run the code, can be found in Ref. 43. Correlations between input parameters have not been considered during optimization, as constraining the optimization procedure by considering parameter correlations would only decrease the searched parameter space. In this specific application, reducing the searched parameter space may result in a temperature exponent optimum further from the experimental exponent but never closer (as the space considered is inclusive of the correlation constrained space). While any correlations between input parameters will be important in other applications, their inclusion here would not affect any conclusions drawn.

## IV. CONCLUSIONS

We have demonstrated that Bayesian optimization can efficiently search a region of the simulation input space defined by the uncertainty in each input parameter to minimize the difference between the simulation output and a set of experimental results. This was achieved through an explicit evaluation of the probability that the simulation can reproduce the measured experimental results in this region of input space. With this method, 1269 simulations using the code BoltMC were required from 423 points in the 6-D simulation input parameter space (three temperatures at each point in the input space). A naive grid search of any simulation input space would require *n*^{d} points for *n* discrete parameter values along each input dimension *d*, rendering a quantification of input uncertainty computationally intractable for most numerical physics simulations. Here, a coarse grid (*n* = 10) would require 2 × 10^{6} simulations. Given the generality of the method, application to other numerical simulations (in semiconductor physics and other fields) is straightforward. For physical models known to exhibit a non-linear functional dependence on model inputs, this method allows for more complete conclusions to be drawn from the results as the effect of input parameter uncertainty on the simulation output has been determined.

The minimum temperature exponent of the polaron mobility found during optimization was −0.50 ± 0.04, with the probability of reproducing an experimental exponent of −1.5 found to be $\u223c10\u221224$. This result suggests that the formation of large polarons in MAPbI_{3} cannot explain the observed temperature dependence of electron mobility; other mechanisms that regulate charge-carrier dynamics in lead-halide perovskites, such as trapping and recombination, must therefore be investigated.

Additionally, with analysis of the length-scales of the covariance function for GPR, polaron mobility was found to be less sensitive to perturbations in the scattering rates than observed for band electrons. Both band electrons and polarons in MAPbI_{3} displayed greater mobility sensitivity to perturbations in the scattering rates for polar optical phonons and ionized impurities, with the effect on the mobility of a perturbation in the acoustic phonon scattering rate significantly decreased. This analysis demonstrates the predictive possibilities of BoltMC, giving insight on the origins of the temperature dependence of the mobility from a comparison of physical scattering for band electrons and electron polarons.

## ACKNOWLEDGMENTS

S.G.M. and J.E.L. acknowledge the UK Engineering and Physical Sciences Research Council (EPSRC) for, respectively, a summer bursary from Computational Collaboration Project No. 5 (CCP5) and a doctoral training partnership studentship.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Samuel G. McCallum**: Conceptualization (lead); Methodology (lead); Writing – original draft (lead); Writing – review & editing (equal). **James E. Lerpinière:** Methodology (equal); Writing – review & editing (equal). **Kjeld O. Jensen**: Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). **Pascal Friederich**: Conceptualization (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). **Alison B. Walker**: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available within the article. The code with supporting data for this article is publicly available at https://github.com/sammccallum/BoltMC-Bayes-Opt, where the datasets and code files are described in README.md.

## REFERENCES

*ab initio*total-energy calculations: Molecular dynamics and conjugate gradients

*Advances in Neural Information Processing Syste*

*ms 25,*edited by

*Bayesian Optimization*

_{3}perovskite nanocrystalline thin films

_{3}NH

_{3}PbI

_{3}from multiphonon Fröhlich coupling

_{3}NH

_{3}PbI

_{3}perovskite thin films

*Advances in Neural Information Processing Systems 32 (NeurIPS 2019)*, edited by

_{3}NH

_{3}PbI

_{3}perovskites revealed by coherent acoustic phonons

*The Monte Carlo Method for Semiconductor Device Simulation*

*Gaussian Processes for Machine Learning*

*Bayesian Data Analysis*