Optimizing materials’ performance for specific applications often requires balancing multiple aspects of materials’ functionality. Even when a generative physical model of material behavior is known and reliable, this typically requires a search over a multidimensional parameter space to identify the low-dimensional manifold corresponding to the required Pareto front. Here, we introduce a multi-objective Bayesian optimization (MOBO) workflow for optimizing ferroelectric/antiferroelectric performance for memory and energy storage applications, based on the numerical solution of the Ginzburg–Landau equation with electrochemical or semiconducting boundary conditions. MOBO is a low-computational-cost optimization tool for expensive multi-objective functions: posterior surrogate Gaussian process models are updated from prior evaluations, and future evaluations are selected by maximizing an acquisition function. Using the parameters for a prototype bulk antiferroelectric (PbZrO_{3}), we first develop a physics-driven decision tree of target functions derived from the loop structures. We further develop a physics-driven MOBO architecture to explore the multidimensional parameter space and build Pareto frontiers by jointly optimizing two target functions, energy storage and loss. This approach allows for rapid initial materials and device parameter selection for a given application and can be further expanded toward the active experiment setting. The associated notebooks provide a tutorial on MOBO and allow readers to reproduce the reported analyses and apply them to other systems (https://github.com/arpanbiswas52/MOBO_AFI_Supplements).

## I. INTRODUCTION

Ferroelectric (FE) and antiferroelectric (AFE) materials are among the key material classes for memory and energy storage applications.^{1} For ferroelectrics, the classical applications include ferroelectric capacitors^{2,3} as well as emergent applications such as ferroelectric tunneling devices^{4–6} and the long-sought ferroelectric field effect transistors.^{7,8} Complex ferroelectric structures are of broad interest in the context of ferroelectric device integration as well as for enabling new functionalities such as composite magnetoelectric devices.^{9,10} Similarly, antiferroelectric materials are broadly considered for potential energy storage, either in the form of on-chip components or in capacitive or electrolytically gated configurations. Common to both ferroelectric and antiferroelectric materials is the presence of a polar instability, which is the ground state for ferroelectrics and is suppressed by the structural instability at zero field for antiferroelectrics.^{11,12} Correspondingly, FE and AFE materials both allow for description by Ginzburg–Landau–Devonshire (GLD)-type theories^{1,13–15} with several competing order parameters, which include a polarization for multiaxial FE systems and a structural order parameter for AFE systems. This formalism allows for the direct prediction of the corresponding ground states; concentration, field, and strain responses; and serves as the basis for phase-field models.^{16–18}

The potential applications of ferroelectrics in information technology have stimulated over two decades of effort in integrating the classical perovskite-based systems with semiconductors, stymied by the intrinsic instability of the oxide–semiconductor interfaces. The development of HfO_{2} and other binary ferroelectrics by Mikolajick *et al.* in early 2011 has stimulated a new and rapidly growing research effort in this area,^{19–23} also stimulating the search for and discovery of novel binary systems such as Al_{x}Sc_{1−x}N^{24,25} and Mg_{1−x}Zn_{x}O.^{26} Notably, despite significant controversies on the origin of ferroelectric-like responses in these systems,^{27–31} these material systems also allow effective representation via the GLD formalism, predicting dopant-dependent phase transitions and temperature–polarization behavior.^{32,33}

Both for conventional and binary ferroelectrics, of fundamental interest is their behavior and performance in realistic devices and structures, i.e., in the form of films, multilayers, or 3D integrated objects.^{34–41} The intrinsic complexity of this problem stems from the non-local nature of the depolarization phenomena in ferroelectric systems, where the distributions of the polarization and screening charges on the bounding interfaces result in the dominant contribution to the thermodynamics of the system. Traditionally, these phenomena are analyzed using effective dead layer models, where the complex electrostatic structure of the interface is approximated via the introduction of an effective dielectric layer.^{14,42} However, this approximation breaks down for more realistic scenarios, including an open ferroelectric surface with ionic screening, interfaces between ferroelectrics and ionically conductive or electrochemically active materials, and systems combining ferroelectric and semiconductor functionalities, i.e., ferroelectric–semiconductor interfaces,^{43,44} interfaces between ferroelectric–semiconductors and other materials, and interfaces with an intrinsic finite density of electronic (or ionic) states.

Despite the apparent variability of these systems, this broad range of phenomena allows ready representation in terms of a formalism combining the Landau–Ginzburg–Devonshire (LGD) description of the ferroelectric system with the corresponding surface and bulk semiconductor subsystem or surface/interface electrochemical models. For many ferroelectric materials, the corresponding free energy coefficients are well known, as are their temperature, concentration, and strain dependencies, whereas the remaining parameters can be readily measured via classical semiconductor characterization techniques. As such, these systems are amenable to numerical assessment via approximate analytical, 1D numerical, and 3D numerical models. However, many of these models are time-consuming and bring forth the challenge of effectively navigating the possible parameter space toward target functionalities or their combinations.

Here, we introduce the combination of multi-objective Bayesian optimization (MOBO) and a ferroelectric–electrochemical model for figure-of-merit optimization for memory and energy storage devices. The analysis here is developed for a 1D model that does not allow for the formation of domain structures but can be trivially extended to full 3D finite element models. We develop the general formalism for the (anti)ferroelectric with interface control in Sec. II, introduce Bayesian optimization and multi-objective Bayesian optimization in Sec. III, and demonstrate the exploration of optimal functionalities in interface-controlled ferroelectrics in Sec. IV. The analyses reported here are summarized in the associated Colab notebooks, which can both serve as tutorials and allow readers to reproduce the results in this manuscript and apply the same analytics to other models (https://github.com/arpanbiswas52/MOBO_AFI_Supplements).^{112}

## II. GENERAL FERROELECTRIC–IONIC/SEMICONDUCTOR MODEL

Traditionally, interfacial phenomena in ferroelectrics are studied in the dielectric dead layer approximation, effectively allowing for separation between polarization bound charges and screening charges. While capturing some aspects of polarization behavior in ferroelectrics, this model is insufficient to describe the surface and interface phenomena in ferroelectrics in the presence of a finite electronic or ionic density of states. For ferroelectric–semiconductor interfaces, nonlinear screening phenomena and their influence on the contact barrier can be important, yet these phenomena are usually either neglected (the model of constant screening charge density) or treated in a linear approximation (the Bardeen states model).^{45–47} Similarly, a number of authors have explored ferroelectric semiconductors, where the ferroelectric and semiconducting subsystems coexist in a single material.^{48} Finally, for surfaces, a complementary thermodynamic approach was developed by Stephenson and Highland (SH),^{49,50} who considered an ultrathin film in equilibrium with a chemical environment that supplies (positive and negative) ionic species to compensate its polarization bound charge at the surface.

Recently, we modified the SH approach to allow for the presence of a gap between the ion-covered ferroelectric surface and a scanning probe microscopy (SPM) tip^{39,51–54} and developed an analytical description of the thermodynamics and kinetics of these systems. The analysis leads to the elucidation of ferroionic states, which result from the nonlinear electrostatic interaction between the ions, whose surface charge density is described via a Langmuir adsorption isotherm, and the ferroelectric dipoles. More recently, the emergent behaviors in uniaxial antiferroelectric (AFE) thin films covered with surface ions have been explored within the framework of the Kittel–Landau–Ginzburg–Devonshire–Stephenson–Highland (KLGD-SH) approach.^{55} The approach allows the characterization of the dependence of the polar and anti-polar orderings on the voltage applied to the antiferroelectric film, and the analysis of their static and dynamic hysteresis loops.

Below, we introduce the general formalism for polarization rotation in AFE films covered by surface ions with a charge density proportional to the relative partial oxygen pressure. We calculate the phase diagrams and analyze the dependence of polarization and anti-polarization on applied voltage and discuss the peculiarities of static and dynamic hysteresis loops. It is important to note that, while here we consider a special case of adsorption isotherm, this analysis can be readily extended toward other functional forms and mechanisms that give rise to finite electron or ion density of states at the interface. While the exact nature of the carriers will affect the transport and reaction mechanisms, the ferroelectric responses are expected to be universal.

Here, we consider the case of an ultrathin dielectric gap between the top electrode and the surface of the AFE film with a controlled electrochemical potential of the mobile component, where the control is performed by the oxygen pressure [see Fig. 1(a)]. The linear equation of state $D=\epsilon_0\epsilon_d E$ relates the electric displacement $D$ and field $E$ in the gap. Here, $\epsilon_0$ is the universal dielectric constant and $\epsilon_d\sim(1{-}10)$ is the relative permittivity of the gap filled by air with controllable oxygen pressure. A wide-bandgap AFE film can be considered insulating, and hence within the material $D=\epsilon_0 E+P$. The potential $\varphi$ of a quasi-static electric field inside the film satisfies a Laplace equation in the gap and a Poisson equation in the film, with the bound charge density determined by the polarization gradient. The boundary conditions for the system are the equivalence of the electric potential to the applied voltage *U* at the top electrode (modeled by a flattened region $z=-\lambda$); the equivalence of the difference $D_3^{(gap)}-D_3^{(film)}$ to the ionic surface charge density $\sigma[\varphi(\vec{r})]$ at $z=0$; the continuity of $\varphi$ at the gap–film interface $z=0$; and zero potential at the conducting bottom electrode $z=h$.

The polarization components of the AFE film depend on the electric field $E_i$ as $P_i=P_i^f+\epsilon_0(\epsilon_{ij}^b-1)E_j$, where $i=1,2,3$, and $\epsilon_{33}^b$ is the relative background permittivity of the antiferroelectric, $\epsilon_{ij}^b\lesssim 10$.^{56} The polarization component $P_i^f$ is further abbreviated as $P_i$. To determine the spatial–temporal evolution of $P_i$, we use either the simplest two-sublattice Kittel model^{57} or more complicated models incorporating anti-polar-mode-related oxygen octahedral rotations^{58–60} combined with the LGD approach. The corresponding LGD free energy functional *F* additively includes a bulk part—an expansion in powers 2–4 of the polarization $(P_i)$ and anti-polarization $(A_i)$, $F_{bulk}$; a polarization gradient energy contribution, $F_{grad}$; an electrostatic contribution, $F_{el}$; and a surface energy, $F_S$. It has the form
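Collecting the contributions listed above, Eq. (1) reads schematically (a reconstruction from the text; the explicit expressions for the constituent terms of Eq. (2) are not reproduced here):

$$F = F_{bulk} + F_{grad} + F_{el} + F_{S}$$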

where the constituent parts are

Here, $V_f$ is the film volume. The coefficients $a_i$ and $b_i$ depend linearly on the temperature *T*; in particular, $a_i=\alpha_T(T_P-T)$ and $b_i=\alpha_T(T_A-T)$, so they change their sign at the Curie temperature $T_P$ and the AFE temperature $T_A$, respectively; $\alpha_T>0$. All other tensors in Eq. (2) are regarded as temperature-independent. The tensors $a_{ijk}$ and $b_{ijk}$ and the gradient coefficient tensor $g_{ijkl}$ are positive definite to ensure functional stability. The elastic and electrostriction energies are negligibly small for a single-domain film on a matched substrate. Otherwise, the influence of elastic strain can be approximately taken into account by renormalization of the coefficients in Eq. (2a) (see, e.g., Ref. 1).

The energy $F_{el}$ depends on the surface charge density $\sigma(\varphi)$, since the electric field is ultimately determined by the charge distribution in the system. Several models have been introduced for the charge variation.^{39,47} In principle, these models can incorporate the specific DOS and electric potential at the interface, but only a few are sensitive to the changes of interfacial potential in a self-consistent manner.

Below, we use an equation for the surface ionic charge density $\sigma(\varphi)$ analogous to the Langmuir adsorption isotherm^{61} and a linear relaxation model to describe the temporal dynamics of the surface charge density.
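A linear relaxation model of this type has the generic form (a plausible reconstruction of Eq. (3); $\tau$ denotes a relaxation time):

$$\tau\,\frac{\partial\sigma}{\partial t} = -\big(\sigma - \sigma_0[\varphi]\big)$$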

Here, the dependence of the equilibrium charge density $\sigma_0[\varphi]$ on the electric potential $\varphi$ is controlled by the concentration of surface ions, $\theta_i(\varphi)$, at the interface $z=0$ in a self-consistent manner, as proposed by Stephenson and Highland,^{49,50}

where *e* is the elementary charge, $Z_i$ is the ionization degree of the surface ions/electrons, and $1/N_i$ are the saturation densities of the surface ions. The subscript *i* designates the summation over positive $(i=1)$ and negative $(i=2)$ charges, respectively; $\rho=p_{O_2}/p_{O_2}^{00}$ is the relative partial pressure of oxygen (or another ambient gas);^{14,49} $n_i$ is the number of surface ions created per gas molecule. Two surface charge species exist, since the gas molecule had been electroneutral before its electrochemical decomposition started. The dimensionless ratio $\rho$ is chosen to vary in the range from $10^{-6}$ to $10^{6}$.
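As an illustration, a Langmuir-type coverage of this kind can be coded as follows. This is a schematic sketch only: the exact prefactors, signs, and pressure dependence follow Refs. 49 and 50, and the functional form below (coverage set by an effective formation energy shifted by the electrostatic work $eZ_i\varphi$ and a pressure term) is an assumption for illustration.

```python
import math

kB = 8.617e-5  # Boltzmann constant in eV/K

def theta(phi, dG00, Z, n, rho, T=300.0):
    """Langmuir-type coverage of one surface species at potential phi (V).

    dG00 : formation free energy at normal conditions (eV)
    Z    : ionization degree of the species
    n    : surface ions created per gas molecule
    rho  : relative partial oxygen pressure (dimensionless)
    """
    # Assumed effective formation energy: electrostatic work e*Z*phi
    # (in eV for phi in volts) plus a pressure term n*kB*T*ln(rho).
    dG = dG00 + Z * phi - n * kB * T * math.log(rho)
    return 1.0 / (1.0 + math.exp(dG / (kB * T)))

def sigma0(phi, species, rho, T=300.0):
    """Equilibrium surface charge density summed over the species list.

    species: tuples (dG00, Z, n, N_sat), with N_sat the saturation
    density 1/N_i of the text (per m^2)."""
    e = 1.602e-19  # elementary charge, C
    return sum(e * Z * N_sat * theta(phi, dG00, Z, n, rho, T)
               for (dG00, Z, n, N_sat) in species)
```

With two species of opposite ionization degree, the net charge vanishes at zero potential and becomes potential-dependent otherwise, which is the self-consistency the SH model exploits.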

The positive parameters $\Delta G_1^{00}$ and $\Delta G_2^{00}$ are the free energies of surface defect formation at normal conditions, $p_{O_2}^{00}=1$ bar, and zero applied voltage, $U=0$. The energies $\Delta G_i^{00}$ are responsible for the formation of different surface charge states (ions, vacancies, or their complexes). Exact values of $\Delta G_i^{00}$ are poorly known even for many practically important cases, and so hereinafter they are regarded as varying in the range ∼(0–1) eV.^{49} Note that exact values of the parameters can be derived either via variable pressure experiments or by exploring polarization dynamics under electrochemical control.^{62} However, we defer the theory–experiment matching to further studies. The schematic dependence of the normalized charge density $\sigma_0$ vs the dimensionless electric potential $\psi$ calculated for the SH model, in comparison with other models, is shown in Fig. 1(b).

Since the stabilization of a single-domain polarization in ultrathin perovskite films can take place due to chemical switching (see, e.g., Refs. 49, 50, and 63), we can assume the same for an AFE film. Also, we will assume that the distributions of $P_i(x,y,z)$ and $A_i(x,y,z)$ do not deviate significantly from their values averaged over the film thickness, which are further abbreviated as the “polarization” $P_i\cong\langle P_i\rangle$ and “anti-polarization” $A_i\cong\langle A_i\rangle$. Note that this mean-field approximation reduces the problem to zero dimensions and allows for the analytical treatment performed here.

In this case, the behavior of the polarization $P_i$ and anti-polarization $A_i$ can be described via relaxation-type nonlinear differential equations,

Hereinafter, we assume that the in-plane electric field components are absent, $E_1=E_2=0$, and consider only the component $E_3$ in Eq. (5). The expression for the electric field $E_3$ follows from the minimization of the electrostatic energy,^{55}

Equation (6) corresponds either to the stationary case or to adiabatic conditions, when $\sigma=\sigma_0[\Psi]$ in Eq. (3). The overpotential $\Psi$ contains the contribution from surface charges proportional to $\sigma_0$, the depolarization field contribution proportional to *P*, and the external potential drop proportional to *U*. Hence, the component $E_3$ consists of the external field, the depolarization field, and the field created by the surface ion layer.
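The structure of such relaxation equations can be illustrated with a single-order-parameter toy model (a minimal sketch; the actual Eq. (5) couples $P_i$ and $A_i$ through the free energy (1) and the self-consistent field of Eq. (6), which are not reproduced here):

```python
def relax_polarization(a, b, E, P0=0.01, gamma=1.0, dt=1e-3, steps=20000):
    """Explicit Euler integration of Gamma * dP/dt = -dF/dP for the toy
    Landau free energy F = a*P**2/2 + b*P**4/4 - E*P."""
    P = P0
    for _ in range(steps):
        dFdP = a * P + b * P ** 3 - E   # derivative of the toy free energy
        P -= (dt / gamma) * dFdP
    return P

# Below the transition (a < 0), at zero field the order parameter relaxes
# toward a spontaneous value +/- sqrt(-a/b), with the sign set by the
# initial condition P0.
```

Sweeping `E` cyclically in such a scheme is what produces the static and dynamic hysteresis loops discussed below.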

The constituent parts of the thermodynamic potential (1) can be further expanded in powers of $A_i$, $P_i$, and $\Psi$, assuming that $|eZ_i\Psi/k_BT|\ll 1$. As a result, we obtain that the surface ions and depolarization field change the coefficients $a_3$, $a_{i3}$, and $\chi_{i3}$; induce the built-in field $E_{SI}$; and decrease the external field $E_a$ in the following way:

The last term in Eq. (7a), proportional to $\lambda/[2\epsilon_0(\epsilon_d h+\lambda\epsilon_{33}^b)]$, originates from the depolarization field. Since, as a rule, $\epsilon_d h\gg\lambda\epsilon_{33}^b$, the acting field is close to the external field, $E_{act}\approx -U/h$. Also, we introduce positive functions in Eq. (7),

It is seen from expressions (6) and (7) that the possible solutions of the relaxation Eq. (5) and, in particular, the features of the static and dynamic polarization (P–E) hysteresis loops depend on a great number of parameters, some of which can vary in a relatively wide range. Specifically, besides the dynamic variables, such as the applied voltage *U*, temperature *T*, and partial oxygen pressure $\rho$, one can vary the film thickness *h* and the gap thickness $\lambda$, as well as the energies $\Delta G_i^{00}$ of surface defect formation and their saturation densities $1/N_i$. Not least, the coupling constants $\chi_{ij}$ are poorly known, as are the fitting parameters.

Our preliminary analysis of the phase diagrams calculated for a simplified 2–4 LGD expansion [without 6th powers in Eq. (2a)] shows that there are multiple possible phases in the system, such as paraelectric (PE) and multiple ferroelectric-like (FE), antiferroelectric, and ferroionic phases, nontrivially located in the space of variables $\{T,\rho,U\}$ and ruled by the other parameters, namely, the external ones ($h$ and $\lambda$) and the interfacial and material constants ($\Delta G_i^{00}$, $1/N_i$, and $\chi_{ij}$); see Fig. 1(c) for an illustration of the influence of the film thickness *h* on the phase diagram. We expect multiple solution routes, especially in the dynamic case, and the specific route critically depends on the point in the above multi-parameter space via, e.g., interfacial conditions (controlled by $\lambda$), finite size effects (controlled by $h$), and, maybe most intriguingly, the coupling of surface ions to the soft modes in the AFE film (controlled by the parameters $\Delta G_i^{00}$, $1/N_i$, and $\chi_{ij}$). These control parameters define the multidimensional parameter space of the system.

Of practical interest for applications are the macroscopic responses of the system, including energy storage and losses for a given film thickness and bias window. With the formalism above, the relevant P–E loops can be readily evaluated, and the parameters can be extracted. However, the relationships between these can be highly non-trivial, with a change in a single parameter affecting multiple aspects of materials functionality. Combined with the high-dimensional nature of the parameter space, this precludes simple gradient descent or grid search methods and necessitates the introduction of Bayesian optimization methods capable of combining exploration and exploitation of the relevant parameter spaces. Previously, we adopted these approaches for single-parameter optimization of ferroelectric materials^{48} and lattice models^{1} and for theory–experiment matching.^{64} Below, we discuss the basic principles of BO and particularly multi-objective BO, necessary for discovery balancing multiple functionalities.

## III. BAYESIAN OPTIMIZATION (BO) AND MULTI-OBJECTIVE BAYESIAN OPTIMIZATION (MOBO)

Bayesian optimization^{65} (BO) is an emerging field of study in sequential design methods. It is considered a computationally low-cost global optimization tool for design problems having expensive black-box objective functions. The general idea of BO is to emulate the unknown functional behavior in the given parameter space and find the local and global optimal locations, while reducing the cost of function evaluations from expensive high-fidelity models. It is called Bayesian because it follows the ideology of the Bayes theorem, which states that the posterior probability of a model (or parameters) *M* given evidence (or data, or observations) *E* is proportional to the likelihood of *E* given *M*, multiplied by the prior probability of *M*. Mathematically,
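In standard notation, Eq. (9) is Bayes' rule (reconstructed here in the form the text describes):

$$p(M|E) \propto p(E|M)\,p(M)$$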

In the BO setting, the prior represents the belief about the unknown function *f*, assuming there exists prior knowledge about the function (e.g., smoothness, noisy or noise-free, etc.). Given the prior belief, the likelihood represents how likely the observed data, *D*, are. These data are sampled as the true function values from expensive evaluations and can be viewed as realizations of the unknown function. Finally, given the data, the posterior distribution is computed in BO as a posterior surrogate model [e.g., a Gaussian process model (GPM)], $\Delta=p(f|D)$, developed from these sampled data. Thus, Eq. (9) in the Bayesian optimization setting can be modified as below,

where $D_{1:k}=[x_{1:k},f(x_{1:k})]$ is the augmentation of the observed or sampled data up to the $k$th iteration of BO. Unlike in general Bayesian modeling, here the prior and likelihood functions are not written separately; rather, the augmentation of data at each iteration combines the prior distribution with the likelihood function. Generally, the GPM posterior model is based on a conjugate normal prior, assuming the data are realized from a Gaussian (normal) function.

This approach has been widely used in many machine learning problems.^{66–70} Attempts have also been made to handle discrete responses, such as in consumer modeling problems where the responses are in terms of user preference.^{65,71} The idea is to approximate the discrete user-preference response function by continuous latent functions, using the binomial-probit model for binary choices^{72,73} and the polychotomous regression model for more than two choices, where the user can state no preference.^{74} In the domain of materials science, though the optimization problems in materials characterization and synthesis have expensive (computational) or unknown (physical experiments) functions, the application of BO has not progressed much yet.^{75} In notable applications of BO to materials science, researchers found similar performance with far fewer BO evaluations than exhaustive evaluation in optimizing different properties of materials, such as grain boundary structure, grain boundary energy, and elastic properties, thus reducing cost and increasing efficiency.^{76–79} Even if the functions are computationally inexpensive, BO allows for quicker convergence over very large parameter spaces than an exhaustive search.^{80} Furthermore, BO has also been applied to optimizing the hysteresis loop shape in ferroelectric materials using a continuous lattice model.^{81} Just as in expensive computational simulations, BO has also helped in reducing the effort and resource cost in experimental materials science problems.^{82–84} However, the vast majority of these applications address only single-objective optimization, whereas in many cases, we aim to discover the optimal balance between two or more functionalities, as implemented in multi-objective BO.
To date, there are only a few applications of MOBO for material discovery in optimizing multiple target properties, considering both experimental and computer simulated data.^{85} MOBO has also been applied for efficient discovery of the targeted materials, performed by a thermodynamically consistent micromechanical model that predicts the material’s response based on its composition and microstructure.^{86}

BO adopts a Bayesian perspective and assumes that there is a prior on the function, typically a Gaussian process. Evaluations from the true model or experiments, which are assumed to be realizations of the true function, are treated as training data. The overall Bayesian optimization approach has two major components: a predictor or Gaussian process model (GPM) and an acquisition function (AF). As shown in Fig. 2, in this approach, a posterior GPM is first built or iteratively updated, given the data from the currently evaluated locations (red dots). The surrogate GPM then predicts the realizations of the function at the unexplored locations over the control parameter space. The best locations in the space (green dots) are then strategically selected for future expensive evaluations by maximizing the acquisition function, defined from the posterior GPM simulations. Finally, after the evaluations are done, they are augmented with the existing evaluations and the cycle is repeated until the model converges.
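The cycle in Fig. 2 can be sketched with a minimal numpy-only GP surrogate and an upper-confidence-bound acquisition on a dense candidate grid. This is an illustrative stand-in, not the implementation used in the associated notebooks; the test function, the grid, and the acquisition weight `beta` are all assumptions.

```python
import numpy as np

def rbf(X1, X2, ls=0.2, var=1.0):
    """Squared-exponential kernel matrix between two 1D point sets."""
    d = X1[:, None] - X2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at test points."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    alpha = np.linalg.solve(K, ytr)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = rbf(Xte, Xte).diagonal() - np.sum(Ks * v, axis=0)
    return mu, np.maximum(var, 0.0)

def bayes_opt(f, n_init=3, n_iter=10, beta=2.0, seed=0):
    """BO loop: fit GP, maximize the acquisition, evaluate, augment."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)      # candidate locations
    X = rng.uniform(0.0, 1.0, n_init)      # initial evaluations
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        ucb = mu + beta * np.sqrt(var)     # exploration + exploitation
        x_next = grid[np.argmax(ucb)]      # best candidate for evaluation
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))        # augment and repeat the cycle
    return X[np.argmax(y)], y.max()
```

The associated notebooks follow the same fit–acquire–evaluate cycle, with the physical model supplying the expensive evaluations.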

It is important to mention that, as an alternative to a GPM, random forest regression has been proposed as an expressive and flexible surrogate model in the context of sequential model-based algorithm configuration.^{87} Although random forests are good interpolators, in the sense that they output good predictions in the neighborhood of training data, they are very poor extrapolators far away from the training data.^{88} In the early iterations of the optimization, this can lead the acquisition function to suggest redundant exploration (more experiments) in non-interesting regions, due to the additional prediction errors in regions far away from the training data. This motivates us to consider the GPM in a Bayesian framework while extending the application to complex physics-driven problems.

Numerical optimization problems can be classified on the basis of the number of objective functions into single-objective (SOO) and multi-objective optimization (MOO) problems. MOO is the extension of SOO to more than one objective, as in Eq. (10),

SOO is relatively simpler, with lower computational cost; however, in practical problems, it may be a challenge to formulate a single-objective problem, and, therefore, much focus has been given to multi-objective optimization (MOO) methods. The question any MOO generally attempts to solve is finding optimal decisions under user-defined preferences among objectives, which are referred to as Pareto-optimal solutions. Considering a minimization (maximization) problem, a candidate point is a Pareto-optimal solution if there is no other feasible point that gives a lower (higher) objective function value for at least one of the objectives without sacrifice in the others. The Pareto-optimal solutions at different trade-offs of the objectives are represented by a Pareto-frontier. Pareto-optimal solutions are also referred to as non-dominated solutions in the design or objective criterion space. The methods to solve MOO problems can be classified into *a priori* and *a posteriori* methods, based on whether or not the weight associated with each objective is predefined. Similarly, in the Bayesian framework, MOBO is the extension of BO, where we focus on optimizing multiple expensive responses or objective functions and building a Pareto-frontier. We next describe the GPM and AF.
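The non-domination rule above translates directly into code (a small helper for the maximization convention; the names are illustrative):

```python
def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors,
    assuming every objective is to be maximized."""
    def dominates(p, q):
        # p dominates q if p is at least as good in every objective
        # and strictly better in at least one of them.
        return (all(a >= b for a, b in zip(p, q))
                and any(a > b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Example: (2, 1) and (1, 2) trade off against each other and are both
# non-dominated; (0, 0) and (1, 1) are dominated and filtered out.
front = pareto_front([(2, 1), (1, 2), (0, 0), (1, 1)])
```

Here `front` contains exactly the two trade-off points, i.e., the Pareto-frontier of this toy set.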

### A. Gaussian process model (GPM)

As a simple example, Figs. 3(a)–3(f) show a simple 1D Gaussian process model with one control parameter *x* and one objective function variable $z=f(x)$. The green dots are the evaluated locations, and the gray dotted and blue solid lines are the true and the predictor mean functions in the parameter space, given the prior evaluated data. The area enclosed by the black curves shows the measure of uncertainty of the surrogate GPM prediction. It is clearly seen that the variance near the observations is small and increases as the design samples move farther away from the evaluated data, a behavior inherited from kriging models, where the errors are not independent. Much research has addressed incorporating and quantifying the uncertainty of the experimental or training data by using a nugget term in the predictor GPM; it has been found that the nugget improves the solutions and the computational stability of the framework.^{89,90} Furthermore, GPMs have also been implemented in high-dimensional design space exploration^{91} and big-data problems,^{92} in an attempt to increase computational efficiency. A survey of different GP packages in different coding languages, such as MATLAB, R, and Python, has been provided.^{93} Extending BO to MOBO, we have multiple Gaussian priors, one for each expensive function, and therefore we develop respective multiple posterior GPMs. Since, in classical MOO problems, each function is treated independently during optimization, here also in MOBO, we consider multiple independently fitted GPMs.

The general form of the GPM is as follows:
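The general form referred to as Eq. (11) is the standard universal-kriging decomposition (reconstructed from the description that follows):

$$y(x) = x^{T}\beta + z(x)$$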

where $x^T\beta$ is the polynomial regression model. The polynomial regression model captures the global trend of the data. In general, a first-order polynomial regression is used, which is also known as universal kriging;^{94} however, it has also been claimed that it is fine to use a constant mean model.^{95} $z(x)$ is a realization of a correlated Gaussian process with mean $E[z(x)]$ and covariance $cov(x_i,x_j)$ functions defined as follows:

where $\sigma^2$ is the overall variance parameter and $\theta_m$ is the correlation length scale parameter in dimension *m* of the *d*-dimensional *x*. These are termed the hyper-parameters of the GP model. $R(x_i,x_j)$ is the spatial correlation function. In this paper, we have considered the constant mean model and a radial basis function, which is given by Eq. (12c). The objective is to estimate (by MLE) the hyper-parameters $\sigma,\theta_m$ that create the surrogate model that best explains the training data $D_k$ at iteration *k*.
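A common way to write the anisotropic radial basis correlation of Eq. (12c) is $R(x_i,x_j)=\exp[-\sum_{m=1}^{d}\theta_m(x_{im}-x_{jm})^2]$; the sketch below assumes this standard kriging form:

```python
import math

def rbf_correlation(xi, xj, theta):
    """Anisotropic radial-basis correlation R(x_i, x_j) with one
    length-scale parameter theta_m per dimension (larger theta_m
    means faster correlation decay along dimension m)."""
    return math.exp(-sum(t * (a - b) ** 2
                         for t, a, b in zip(theta, xi, xj)))

def covariance(xi, xj, sigma2, theta):
    """cov(x_i, x_j) = sigma^2 * R(x_i, x_j), cf. Eq. (12b)."""
    return sigma2 * rbf_correlation(xi, xj, theta)
```

The correlation equals 1 at zero separation (so the covariance reduces to $\sigma^2$) and decays smoothly and symmetrically with distance, which is what produces the uncertainty envelope in Fig. 3.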

After the GP model is fitted, its next task is to predict at an arbitrary (unexplored) location drawn from the parameter space. Assume $D_k=\{X_k,Y(X_k)\}$ is the prior information from previous evaluations or experiments from high-fidelity models, and $\bar{x}_{k+1}\in\bar{X}$ is a new design within the unexplored locations in the parameter space, $\bar{X}$. The predictive output distribution of $\bar{x}_{k+1}$, given the posterior GP model, is given by Eq. (13a),

where

$COV_k$ is the kernel matrix of the already sampled designs $X_k$, and $cov_{k+1}$ is the covariance function of the new design $\bar{x}_{k+1}$, which is defined as follows:
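Since the explicit Eqs. (13) and (14) are not reproduced above, the sketch below assumes the standard zero-mean kriging predictor, $\mu_{k+1}=cov_{k+1}^{T}COV_k^{-1}Y_k$ and $s_{k+1}^{2}=cov(\bar{x}_{k+1},\bar{x}_{k+1})-cov_{k+1}^{T}COV_k^{-1}cov_{k+1}$:

```python
import numpy as np

def gp_predict(Xk, Yk, x_new, sigma2=1.0, theta=5.0, jitter=1e-10):
    """Zero-mean GP prediction at a new design x_new, given sampled
    designs Xk (n x d array) and responses Yk (length-n array)."""
    def k(a, b):
        # isotropic radial basis covariance with parameters sigma2, theta
        return sigma2 * np.exp(-theta * np.sum((a - b) ** 2))
    n = len(Xk)
    COV = np.array([[k(Xk[i], Xk[j]) for j in range(n)] for i in range(n)])
    COV += jitter * np.eye(n)            # tiny "nugget" for stability
    cov_new = np.array([k(Xk[i], x_new) for i in range(n)])
    w = np.linalg.solve(COV, cov_new)
    mu = w @ Yk                          # predictive mean
    var = k(x_new, x_new) - w @ cov_new  # predictive variance
    return mu, max(var, 0.0)
```

At an already sampled design, the prediction reproduces the observed value with near-zero variance; far from all samples, the mean reverts to the prior (zero) and the variance to $\sigma^2$, matching the behavior described for Fig. 3.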

### B. Acquisition function (AF)

The second major component in Bayesian optimization is the acquisition function (AF). The AF guides the search for future evaluations or experiments toward the desired goal and thereby brings sequential design into BO. The AF predicts an improvement metric for each sample. The improvement metric balances exploration (the probability of discovering useful behaviors in unexplored locations) and exploitation (sampling known regions with high responses). Thus, the acquisition function assigns a high improvement value to samples whose predicted mean is high, whose variance is high, or both. By maximizing the acquisition function, we select the best samples to find the optimal solution and reduce the uncertainty over the unknown, expensive design space. Figures 3(a)–3(f) show a simple example of BO exploration with one control parameter x and one objective function $z=f(x)$: samples are selected sequentially by maximizing the acquisition function, given the posterior GP model, in iterations 1–3, 7, 15, and 25. The solid blue line is the GP-predicted function and the dotted gray line is the original function. The green dots in iteration 1 [Fig. 3(a), top panel] are the evaluated locations and the red dot is the newly evaluated location, chosen by maximizing the acquisition function [Fig. 3(a), bottom panel]. The new data are augmented with the current data, the GP model is updated in the next iteration, and the acquisition function is again maximized to obtain the next evaluation location [see Fig. 3(b)]. This process is repeated iteratively, and in this manner the search space is explored. We can see from the figure that the acquisition function is highest where the samples have high predicted mean and/or high variance and lowest where they have low predicted mean, low variance, or both.
Thus, the acquisition function strategically selects points that can potentially contain the optimum (e.g., the maximum of the unknown function) and, over the iterations, gradually reduces the error so that the GP model aligns with the true function in those potential regions.
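The loop just described can be sketched in a few lines. The following is a minimal, self-contained illustration, not the paper's implementation: a numpy GP with a fixed RBF kernel is refit after each evaluation, and an upper-confidence-bound acquisition (one simple choice among those discussed in the text) selects the next point. The objective, kernel length scale, and all other settings are illustrative.

```python
# Minimal 1D Bayesian-optimization loop: refit a GP surrogate after each
# evaluation, then maximize an acquisition function over a candidate grid.
import numpy as np

def rbf(a, b, ls=0.15):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_grid, noise=1e-6):
    # standard GP regression equations with a unit-variance RBF kernel
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_grid, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

def objective(x):                      # hidden "expensive" function
    return np.sin(6 * x) * x

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 201)
X = rng.uniform(0, 1, 3)               # initial evaluated locations
Y = objective(X)
for _ in range(10):
    mean, var = gp_posterior(X, Y, x_grid)
    ucb = mean + 2.0 * np.sqrt(var)    # favors high mean and/or high variance
    x_next = x_grid[np.argmax(ucb)]    # maximize the acquisition function
    X = np.append(X, x_next)
    Y = np.append(Y, objective(x_next))

best = x_grid[np.argmax(gp_posterior(X, Y, x_grid)[0])]
```

After a handful of iterations the GP mean tracks the true function near its maximum, mirroring the progression shown in Figs. 3(a)–3(f).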

Throughout the years, various formulations have been proposed for the acquisition function. One such method is the probability of improvement (PI),^{96} an improvement-based acquisition function. Jones^{97} notes that the performance of PI “is truly impressive; however, the difficulty is that the PI method is extremely sensitive to the choice of the target. If the desired improvement is too small, the search will be highly local and will only move on to search globally after searching nearly exhaustively around the current best point. On the other hand, if the small-valued tolerance parameter ξ in PI equation is set too high, the search will be excessively global, and the algorithm will be slow to fine-tune any promising solutions.” Thus, the expected improvement (EI) acquisition function,^{65} which trades off exploration and exploitation, is widely used over PI. Another acquisition function is the confidence bound criterion (CB), introduced by Cox and John,^{98} where points are selected based on the upper or lower confidence bound of the predicted design surface for maximization or minimization problems, respectively. However, these acquisition functions are limited to single-objective optimization. Hence, we next discuss acquisition functions for MOBO, where the target is to find the manifold in the parameter space of the system containing the points on the Pareto-frontier.
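As a concrete reference, the three single-objective criteria named above can be written down directly from their standard textbook (maximization-form) definitions. This is a hedged sketch: μ and σ denote the GP posterior mean and standard deviation at a candidate point, and f_best the incumbent best observation.

```python
# Textbook forms of PI, EI, and the (upper) confidence bound criterion.
import math

def _phi(z):   # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    # xi is the tolerance parameter whose choice PI is sensitive to
    return _Phi((mu - f_best - xi) / sigma)

def expected_improvement(mu, sigma, f_best):
    z = (mu - f_best) / sigma
    return (mu - f_best) * _Phi(z) + sigma * _phi(z)

def upper_confidence_bound(mu, sigma, beta=2.0):
    return mu + beta * sigma
```

Note how EI grows with both the predicted excess (mu − f_best) and the uncertainty sigma, which is exactly the exploration–exploitation trade-off discussed above.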

In MOBO, the computation of the acquisition function is more complicated and can be computationally costly. Currently, there are two approaches to configuring acquisition functions in MOBO, namely, the *a priori* method and the *a posteriori* method. The two differ in whether the weighting factors for each objective are predefined. This mirrors classical MOO problems; however, unlike MOO, where the weighting factors are associated with the objective functions, in MOBO they are associated with the acquisition functions. In the *a posteriori* approach, the goal is to maximize the hyper-volume indicator function so as to maximize the likelihood of iteratively selecting non-dominated solutions (green dots), as shown in Fig. 4. The shaded area is the hyper-volume at iteration 1. As we proceed from iteration 1 to 2, we find three additional non-dominated solutions, while two of the earlier non-dominated solutions are now dominated (blue dots), which guarantees that those solutions are not Pareto-optimal. Consequently, the hyper-volume increases in iteration 2 (red colored area). This iterative refinement of the non-dominated set continues until the convergence criteria are met, and the final set of non-dominated solutions configures the Pareto-frontier. Methods in this family include the expected hyper-volume improvement (EIHV), expected hyper-volume improvement gradient-based (EIHVG), max-value entropy search (MESMO), and predictive entropy search (PESMO) acquisition functions. EIHV has been shown to provide good performance.^{99,100} To increase the computational efficiency, Yang *et al.*^{101} modified the EIHV acquisition function into the EIHVG acquisition function and proposed an efficient algorithm to calculate it. To further reduce the computational cost, MESMO^{102} and PESMO^{103} have been formulated.
Abdolshah *et al.*^{104} proposed a multi-objective BO framework with preferences over objectives based on stability, focusing on the parts of the Pareto front where the preferred objectives are more stable. However, because the computational complexity of these MOBO approaches grows with the number of objectives, the weighted Tchebycheff method has been augmented in MOBO with a ridge regularization term to smooth the converted single multi-objective function.^{105} Likewise, a weighted Tchebycheff method has been implemented in MOBO together with a regression-based model calibration to estimate the utopia parameter of the multi-objective function;^{106} here, the weights are predefined by the user. This architecture was further modified by optimizing the regression-based calibration model over an ensemble of models to enhance performance.^{107} Another *a priori* method is ParEGO,^{108} where the weights are randomly assigned from a uniform distribution. In these *a priori* methods,^{106–108} the multiple objectives are first transformed into a single weighted multi-objective function, which is then fit with a single-function-based acquisition function (for example, EI).
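As an illustration of the *a priori* route, a ParEGO-style scalarization can be sketched as follows. This is a minimal sketch under assumed normalization bounds, not the exact formulation of Refs. 105–108: the augmented weighted Tchebycheff function collapses two maximized objectives into one scalar, which a standard single-function acquisition function such as EI can then drive.

```python
# Augmented weighted Tchebycheff scalarization (a priori MOBO), sketch form.
import random

def weighted_tchebycheff(f, utopia, weights, rho=0.05):
    # Maximization form: penalize the largest weighted gap to the utopia
    # (ideal) point; the rho term is the smoothing augmentation.
    terms = [w * (u - fi) for fi, u, w in zip(f, utopia, weights)]
    return -(max(terms) + rho * sum(terms))

def random_simplex_weights(k, rng=random):
    # ParEGO draws the weights randomly; a user can instead fix them a priori.
    raw = [rng.random() for _ in range(k)]
    s = sum(raw)
    return [r / s for r in raw]

# A point that is better in both (maximized) objectives scores higher for
# any positive weight vector:
utopia = (1.0, 1.0)
w = (0.5, 0.5)
score_a = weighted_tchebycheff((0.9, 0.8), utopia, w)
score_b = weighted_tchebycheff((0.4, 0.3), utopia, w)
```

The scalarized value is then treated as a single expensive objective inside an ordinary single-objective BO loop.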

### C. Multi-objective Bayesian optimization framework (MOBO)

Table I presents a high-level description of the MOBO algorithm. For the sake of simplicity, we consider two control parameters (2D) and two objective functions; however, the model can be easily extended to “n” parameters and “j” objective functions.

1. Initialization: Define the parameter space and the maximum number of iterations, N. Randomly select m samples, X_{k} = {X_{1}, X_{2}}. Let f_{1}, f_{2} be the two expensive objective functions. Evaluate the m samples for both objectives as Y_{1}(X_{k}) and Y_{2}(X_{k}), respectively. Build the training data matrices D_{1,k} = {X_{k}, Y_{1,k}} and D_{2,k} = {X_{k}, Y_{2,k}}. Set k = 1.

For k ≤ N:

2. Surrogate modeling: Develop or update the GP models for both objectives, Δ_{1}(D_{1,k}) and Δ_{2}(D_{2,k}). Optimize the hyper-parameters of each GP model by minimizing the loss (negative marginal log-likelihood) using the Adam optimizer; here, we use a learning rate of 0.2.

3. Posterior predictions: Given the surrogate models, compute the posterior means $\{\pi_1(Y(\bar{X}_k)\,|\,\Delta_1),\ \pi_2(Y(\bar{X}_k)\,|\,\Delta_2)\}$ and variances $\{\sigma_1^2(Y(\bar{X}_k)\,|\,\Delta_1),\ \sigma_2^2(Y(\bar{X}_k)\,|\,\Delta_2)\}$ for the unexplored locations $\bar{X}_k$ over the parameter space.

4. Acquisition function: Compute and maximize the acquisition function, $\max_x U(f_1, f_2\,|\,\Delta_1, \Delta_2)$, to select the next best location, x_{best,k}, for evaluation.

5. Augmentation: Evaluate y_{1}(x_{best,k}) and y_{2}(x_{best,k}) and augment the data, D_{1,k+1} = [D_{1,k}; {x_{best,k}, y_{1}}] and D_{2,k+1} = [D_{2,k}; {x_{best,k}, y_{2}}]. Set k = k + 1.

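The five steps of Table I can be sketched as a runnable loop. For clarity, this illustration substitutes a fixed-hyperparameter numpy GP and a random-weight scalarized UCB acquisition for the Adam-tuned GPyTorch models and the qEIHV/qParEGO acquisition functions used in the paper; only the step structure follows the table, and the two quadratic objectives are invented for the example.

```python
# Sketch of the Table I loop for two objectives over a 2D parameter space.
import numpy as np

rng = np.random.default_rng(1)

def f1(x):  # illustrative "expensive" objectives (maximized)
    return -np.sum((x - 0.25) ** 2, axis=-1)

def f2(x):
    return -np.sum((x - 0.75) ** 2, axis=-1)

def rbf(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_post(X, y, Xs, noise=1e-6):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 1e-12))

# 1. Initialization: m random samples, evaluated for both objectives.
X = rng.uniform(0, 1, (8, 2))
Y1, Y2 = f1(X), f2(X)

for k in range(15):                      # for k <= N
    cand = rng.uniform(0, 1, (256, 2))   # unexplored candidate locations
    # 2-3. Surrogate models and posterior predictions for both objectives.
    m1, s1 = gp_post(X, Y1, cand)
    m2, s2 = gp_post(X, Y2, cand)
    # 4. Acquisition: random-weight scalarized UCB (stand-in for qEIHV/qParEGO).
    w = rng.dirichlet([1.0, 1.0])
    ucb = w[0] * (m1 + 2 * s1) + w[1] * (m2 + 2 * s2)
    x_best = cand[np.argmax(ucb)]
    # 5. Augmentation: evaluate and append the new datum for each objective.
    X = np.vstack([X, x_best])
    Y1 = np.append(Y1, f1(x_best))
    Y2 = np.append(Y2, f2(x_best))
```

Because the weights are redrawn each iteration, successive selections land at different points along the trade-off between the two objectives, which is how the scalarized family of methods populates a Pareto set.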

Here, we considered two acquisition functions, namely, (1) qEIHV^{109} and (2) qParEGO. These are extensions of EIHV and ParEGO, respectively, in which parallelism is introduced: we can jointly optimize to obtain the next best locations in a batch, $X_{best,k}=\{x_{best,k,1},\ldots,x_{best,k,q}\}$. This reduces the computational effort and can be extremely helpful when expensive evaluations can be run in batches. For this study, we chose a batch size of $q=4$. Obviously, the choice of acquisition function can change the performance of MOBO depending on the problem; however, choosing the best acquisition function is a hard problem that is beyond the scope of this paper and can be addressed in the future. It is up to the user to select or build an appropriate acquisition function for a given problem.

To initialize the computation of the acquisition function, we select 20 restart locations from a set of 1000 random points and allow a maximum of 200 iterations for the acquisition optimizer. The acquisition function is approximated using 500 Monte Carlo (MC) samples. To compute the hyper-volume indicator function in qEIHV, a reference point (red star in Fig. 4) is required, which is the lower bound of the objectives (the worst solution, since MOBO is set up as a maximization problem). After normalizing the objectives, we set this value to $0-\delta$, where $\delta = 0.001$ is a small positive value that makes the reference point slightly worse than the true lower bound of the objectives. Different strategies can be applied to select the reference point based on the problem complexity, especially when little is known about the lower bound of the objective functions (e.g., for constrained problems), to increase the computational efficiency of the acquisition function. However, this is beyond the scope of this paper, and we use the stated reference point throughout our analysis since we have good knowledge of the (normalized) lower bound of the objectives.
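The hyper-volume indicator itself is easy to state in two dimensions. The following hedged sketch computes the area dominated by the current non-dominated set relative to the reference point (objectives normalized to [0, 1] and maximized, reference at $0-\delta$); production codes such as BoTorch use more general box-decomposition algorithms.

```python
# 2D hyper-volume indicator: area dominated by a point set above a reference.
import numpy as np

def hypervolume_2d(points, ref):
    """Area dominated by a 2D point set (both objectives maximized)."""
    pts = np.asarray(points, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pts = pts[(pts > ref).all(axis=1)]        # ignore points not above ref
    # keep the non-dominated subset: scan in order of decreasing f1
    front, best_f2 = [], -np.inf
    for p in pts[np.argsort(-pts[:, 0])]:
        if p[1] > best_f2:
            front.append((float(p[0]), float(p[1])))
            best_f2 = p[1]
    # sweep left to right; each front point caps a strip of dominated area
    hv, prev_f1 = 0.0, float(ref[0])
    for g1, g2 in sorted(front):
        hv += (g1 - prev_f1) * (g2 - float(ref[1]))
        prev_f1 = g1
    return hv

# reference point slightly below the normalized lower bound, as in the text
delta = 0.001
ref = (0.0 - delta, 0.0 - delta)
hv = hypervolume_2d([(0.8, 0.2), (0.2, 0.8), (0.1, 0.1)], ref)
```

qEIHV-type acquisition functions score a candidate batch by the expected increase of this quantity under the GP posterior.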

To illustrate the MOBO framework in Table I and demonstrate the general applicability of the proposed design architecture, we first implement it on two numerical multi-objective test problems. As the scope of the proposed Bayesian framework is solving expensive multi-objective problems, we assume that the numerical functions are expensive in nature. The two test problems (maximization) are Test Problem (1): *ZDT1* and Test Problem (2): the *2-D six-hump camel back—inverted-Ackley's path (6HC-IAP)*^{110} multi-objective functions. The full mathematical formulations of the optimization problems are provided in Appendix A in the supplementary material. We start with ten randomly selected samples and set the maximum number of MOBO iterations to 50; thus, the total number of evaluations is 10 + 50 × 4 = 210. Although we solved each problem with both acquisition functions, we present here the results (Figs. 5 and 6) for the acquisition function that performed better.
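For reference, ZDT1 in its standard (minimization) form is shown below; since MOBO is set up here as a maximization problem, the objectives are negated before use. The exact formulations used in the notebooks are given in Appendix A of the supplementary material; this sketch is the textbook definition.

```python
# Textbook ZDT1 benchmark and its negated (maximization) form.
import math

def zdt1(x):
    """Standard ZDT1 (minimization); each x_i lies in [0, 1]."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

def zdt1_maximized(x):
    # the MOBO runs treat both objectives as maximized
    f1, f2 = zdt1(x)
    return -f1, -f2

# On the true Pareto front, g = 1 (x_2 ... x_n = 0) and f2 = 1 - sqrt(f1):
f1, f2 = zdt1([0.25, 0.0, 0.0])
```

The closed-form front f2 = 1 − sqrt(f1) is what makes ZDT1 a convenient ground truth for benchmarking the recovered Pareto-frontier.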

Figures 5(a), 5(b), 6(a), and 6(b) show the ground truth of the individual functions (*f*_{1}, *f*_{2}) of test problems (1) and (2), respectively. The ground truth is configured by exhaustive grid-based sampling of the parameter space, with true evaluations at all points of a 100 × 100 grid. Comparing the individual functions [panels (a) and (b)] of each test problem in Figs. 5 and 6, we can see clear trade-offs between them (see the color representation—yellow regions mark the maxima of the individual functions), which gives rise to the Pareto-frontier (optimal balance) as we attempt to maximize both functions jointly (optimizing trade-offs) by implementing MOBO.

Comparing the ground truth of the functions of the two test problems, it is readily apparent that test problem (2) is considerably more complex than test problem (1). Figures 5(c), 5(e), 6(c), and 6(e) show all the samples selected through the MOBO iterations in the parameter and objective spaces for test problems (1) and (2), using the qEIHV and qParEGO acquisition functions. The color coding is such that darker colors correspond to early iterations and lighter colors to later iterations. Looking at Figs. 5(e) and 6(e), we see that at the early stage of evaluation the MOBO model explores by sampling over the entire region, as denoted by the scattered dark samples. In the later stages, once it has quickly identified the potential region of interest (the Pareto-frontier), the model exploits by sampling more often in that region, as denoted by the cluster of light-colored samples. This is the power of adaptive sampling in MOBO, which maximizes the learning focused on the region of interest through exploration–exploitation. Readers can refer to Figs. 5(c) and 6(c) to see where most of the sampling occurred in the parameter space of the ground truth. We also found that, for these test problems, qParEGO explores more than qEIHV and, therefore, selects a few more samples far from the Pareto-frontier. The Pareto-optimal solutions in the parameter space, extracted from Figs. 5(c) and 6(c), are drawn over the parameter space in Figs. 5(d) and 6(d), respectively. Figures 5(f) and 6(f) show the Pareto-frontiers in the objective space, extracted from Figs. 5(e) and 6(e), respectively. From the Pareto-frontier, one can visualize the trade-off between the objectives and choose one or more solutions with the desired trade-off.

To measure the quality of the Pareto-frontiers obtained with MOBO for the test problems, we compare them with the Pareto-frontiers obtained with the non-dominated sorting genetic algorithm II (NSGA-II) from the *pymoo* package^{111} in Python. We select a population size of 100 and 100 generations (a total of 10 000 function evaluations) to ensure convergence of the Pareto-frontier, and consider this the ground truth. Using MOBO, our goal is to cover this Pareto-frontier while maximizing accuracy and minimizing cost (fewer evaluations). We also run NSGA-II with a population size of only 15 and 15 generations, a total of 225 function evaluations, which is close to the number of function evaluations used in MOBO. Figure 7 shows the comparison. Given that MOBO used only about 2% of the evaluations of the fully converged NSGA-II (210 function evaluations in MOBO, compared to 10 000 in NSGA-II), MOBO offered superior performance in identifying a large portion of the Pareto-optimal solutions, which overlap with the ground truth for the test problem ZDT1. Likewise, MOBO with either acquisition function clearly outperforms NSGA-II run with a similar number of function evaluations [Figs. 7(a) and 7(b)]. Because the 6HC-IAP problem is much more complex than ZDT1, as seen from the ground truth of the individual functions [Figs. 6(a) and 6(b)] and its highly disjoint Pareto-frontier (10 000 NSGA-II evaluations), MOBO needs more evaluations to cover more of the Pareto-frontier. However, even with only 210 evaluations for this complex problem with a highly disjoint Pareto-frontier, MOBO identified Pareto solutions in three out of four Pareto sub-frontiers (qParEGO) and four out of four (qEIHV) and did not get stuck in one region; the Pareto-optimal solutions are, however, more accurate for qParEGO [Figs. 7(c) and 7(d)].
Thus, although the user will not have full knowledge of the Pareto-frontier, they will have a good approximation over the whole objective and parameter space. Here, too, MOBO, particularly with the qParEGO acquisition function, found more Pareto-optimal solutions than NSGA-II run with a similar number of evaluations. For truly expensive functions, 10 000 evaluations would not be feasible; we are therefore confident in the performance of MOBO in terms of the trade-off between accuracy and cost. More detailed, step-wise figures for the test problems across the MOBO iterations are provided in the Colab notebook. Next, we proceed to solve a complex physics-driven model using MOBO for performance optimization for memory and energy storage applications.
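The comparison above relies on extracting the non-dominated subset from a set of evaluated points. A minimal Pareto filter for two maximized objectives (essentially the O(n²) sorting step at the heart of NSGA-II) can be written as:

```python
# Brute-force non-dominated filter for two maximized objectives.
def pareto_front(points):
    """Return the non-dominated subset of (f1, f2) tuples (maximization)."""
    front = []
    for p in points:
        # p is dominated if some other point is at least as good in both
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

pts = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.2, 0.2)]
front = pareto_front(pts)  # (0.4, 0.4) and (0.2, 0.2) are dominated
```

Note that this sketch drops exact duplicate points (each "dominates" the other); NSGA-II's actual fast non-dominated sort also assigns every point a front rank rather than only extracting the first front.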

## IV. MOBO OF CHEMICALLY CONTROLLED FERROELECTRICS

With the general principles of MOBO discussed in Sec. II and the interface-controlled ferroelectric model introduced in Sec. I, here we present a MOBO optimization of the energy storage and losses in this system. For a given set of model parameters, we can generate the hysteresis loops of the field-dependent polarization response. The loops per se are too complex (high-dimensional) to serve directly as objective functions for training the MOBO model. Thus, we build a decision tree to first extract low-dimensional target functions from the different loop structures, given the numerical physics-driven model parameters, through an automated task.

Figure 8 shows the full decision tree. For chosen parameters of a material [e.g., 2–4–6 KLGD for bulk PbZrO_{3} (PZO)], we can obtain different loop structures (zero, one, or two loops), which define different phases, namely, paraelectric (PE), ferroelectric (FE), and antiferroelectric (AFE), respectively. For each of these phases, we have different classifications. If the phase is PE, we classify the hysteresis curve as linear, sigmoid, or two-wave. If the phase is AFE, we further classify whether the two loops are attached or not, quantifying the separation between the loops. If the phase is FE, we have two classifications: the first is the measure of the shift of the loop in the positive or negative direction, and the second is the measure of the width of the loop. Finally, as shown in Fig. 8, irrespective of the loop structure, we calculate the energy storage and loss.
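The terminal branch of the decision tree, the energy storage and loss, admits a compact numerical sketch. Assuming the two branches of a simulated P–E loop are sampled on a common field grid, the hysteresis loss is the enclosed loop area and the recoverable storage is the integral of E dP along the discharge branch; these are standard textbook definitions, and the exact extraction in the notebooks may differ in detail.

```python
# Storage and loss from the two branches of a sampled P-E loop.
import numpy as np

def _trapz(y, x):
    # explicit trapezoidal rule (portable across numpy versions)
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1])) / 2.0)

def storage_and_loss(E, P_charge, P_discharge):
    # Loss: area enclosed between the charging and discharging branches.
    loss = _trapz(P_charge - P_discharge, E)
    # Storage: recoverable energy density, integral of E dP on discharge.
    storage = _trapz(E, P_discharge)
    return abs(storage), abs(loss)

# Illustrative lossy linear dielectric: the two branches are offset by a
# constant, so the loop area (loss) is 0.2 and the recoverable energy is 0.4.
E = np.linspace(0.0, 1.0, 101)
storage, loss = storage_and_loss(E, 0.8 * E - 0.1, 0.8 * E + 0.1)
```

The same two numbers can be computed for zero-, one-, and two-loop structures alike, which is why they sit at the bottom of every branch of Fig. 8.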

As the focus of this paper is performance optimization for memory and energy storage applications, we selected the energy storage and loss as the objectives of the MOBO. It should be noted that the stated automated task of developing a similar decision tree from the hysteresis loop is applicable to any theoretical model or experimental setting of any material, and the user can choose any target function (branch) in the decision tree as an objective, as appropriate for the optimization problem. Finally, integrating the decision tree in Fig. 8 with the general MOBO framework (Table I), we develop a physics-driven MOBO architecture to optimize various target material properties jointly, and thus attempt to rapidly explore the multidimensional parameter space for initial material and device parameter selection. The full architecture is summarized in Fig. 9. The MOBO framework is implemented with the Python BoTorch package; the GP models are configured using the GPyTorch package, and to handle multiple independent GP models we utilize the multi-output GP function ModelListGP. The full code is available in the Colab notebook.

We now illustrate the application of the proposed physics-driven MOBO architecture on the 2–4–6 KLGD model for bulk PZO, which contains different phases (PE, FE, and AFE) within the chosen parameter space. Here, we aim to explore the balance between energy storage and loss and build the Pareto front containing various flavors of solutions—high storage-low loss, medium storage-medium loss, and low storage-high loss. It is up to the decision maker to choose the Pareto-optimal solution(s) that give the desired balance in device parameter selection for a given application. We started with 20 randomly selected initial samples and stopped MOBO after 30 iterations, with four samples selected at each iteration (batch_size = 4; 30 × 4 = 120 additional samples). Figures 10–12 show the Pareto-frontiers obtained through rapid exploration–exploitation using MOBO with *a priori* and *a posteriori* acquisition functions over different parameter spaces, such as temperature, partial O_{2} pressure, film thickness, and surface ion energy.

To check the performance of MOBO, we draw the ground truth of storage and loss [Figs. 10(a) and 10(b)] using a low-sampling (7 × 7), expensive exhaustive (grid-based) exploration. Here, we configure the Pareto-frontier within the AFE phase only. Interpreting the trade-off between storage and loss in the parameter and objective spaces [Figs. 10(c)–10(f)], we see that higher storage maps to higher temperature and lower-order (in the absolute power of 10) partial O_{2} pressure, whereas lower storage generally maps to lower temperature and higher-order partial O_{2} pressure at the chosen values of film thickness and surface ion energy. The balanced trade-off between storage and loss populates the intermediate values in the parameter space. The ground truth shows similar maps of high and low values of storage and loss in the parameter space. Comparing the Pareto-frontier and the ground truth in the parameter space [Figs. 10(a)–10(d)], the Pareto-frontier connects the maximum-value regions of both storage and loss, giving a trade-off between them. We also present the loop diagrams, from which the ground truth of storage/loss is obtained following the decision tree, in Fig. 13 in the Appendix in the supplementary material. It is clearly seen that the loop area decreases as the temperature increases and the partial O_{2} pressure decreases (in the absolute power of 10), thus maximizing the storage or minimizing the loss. This agrees with the results in Fig. 10. Both acquisition functions performed similarly.

Similarly, in Fig. 11, we first draw the ground truth of storage and loss [Figs. 11(a) and 11(b)]. Here, we configure the Pareto-frontier within the FE phase only. Interpreting the trade-off between storage and loss in the parameter and objective spaces [Figs. 11(c)–11(f)], we see that higher loss maps to higher partial O_{2} pressure, whereas lower loss maps to lower partial O_{2} pressure at the chosen values of film thickness and surface ion energy. The balanced trade-off between storage and loss populates the intermediate values of partial O_{2} pressure. Although temperature has a much weaker effect on storage or loss than partial O_{2} pressure in the FE phase at the given parameters, most of the Pareto solutions in the intermediate region are at low temperature. Validating this interpretation against the ground truth from the low-sampling (7 × 7), expensive exhaustive (grid-based) exploration, we find similar maps of high and low values of storage and loss in the parameter space. Comparing the Pareto-frontier and the ground truth in the parameter space [Figs. 11(a)–11(d)], the Pareto-frontier connects the maximum-value regions of both storage and loss, giving a trade-off between them. Here, too, we see that the dependence on partial O_{2} pressure is much greater than that on temperature, consistent with the MOBO results. From the loop diagrams in Fig. 14 in the Appendix in the supplementary material, it is clearly seen that the loop area increases as the partial O_{2} pressure increases, thus minimizing the storage or maximizing the loss. This agrees with the results in Fig. 11. Both acquisition functions performed similarly.

Both parameter spaces in Fig. 12 configure the Pareto-frontier within both the AFE and FE phases. Focusing on the parameter space of Figs. 12(a)–12(d), we first draw the ground truth of storage and loss [Figs. 12(a) and 12(b)]. Interpreting the trade-off between storage and loss in the objective and parameter spaces [Figs. 12(c) and 12(d)], we see that higher loss maps to higher partial O_{2} pressure and lower temperature, whereas higher storage maps to lower partial O_{2} pressure at the chosen values of film thickness and surface ion energy. Validating this interpretation against the ground truth from the low-sampling (7 × 7), expensive exhaustive (grid-based) exploration, we find similar maps of high and low values of storage and loss in the parameter space. Comparing the Pareto-frontier and the ground truth in the parameter space [Figs. 12(a), 12(b), and 12(d)], the Pareto-frontier connects the maximum-value regions of both storage and loss, giving a trade-off between them. From the loop diagrams in Fig. 15 in the Appendix in the supplementary material, it is clearly seen that the loop area (considering both the single-loop FE region and the double-loop AFE region) increases as the partial O_{2} pressure increases and the temperature mostly decreases (as we move from AFE to FE and deeper into the FE domain) and decreases as the partial O_{2} pressure decreases (as we move from FE to AFE and deeper into the AFE domain), thus minimizing (maximizing) and maximizing (minimizing) the storage (loss), respectively. This agrees with the results in Figs. 12(a)–12(d). Interestingly, we see a sudden jump in storage and loss along the Pareto front after very closely spaced Pareto solutions, which in this case is likely due to the phase transition between FE and AFE (see Fig. 15 in the Appendix in the supplementary material).

Focusing on the parameter space of temperature and surface ion energy in Figs. 12(e)–12(h), we first draw the ground truth of storage and loss [Figs. 12(e) and 12(f)]. Interpreting the trade-off between storage and loss in the objective and parameter spaces [Figs. 12(g) and 12(h)], we see that higher loss maps to lower surface ion energy and temperature, whereas higher storage maps to higher surface ion energy at the chosen values of film thickness and partial O_{2} pressure. Validating this interpretation against the ground truth from the low-sampling (7 × 7), expensive exhaustive (grid-based) exploration, we find similar maps of high and low values of storage and loss in the parameter space. Comparing the Pareto-frontier and the ground truth in the parameter space [Figs. 12(e), 12(f), and 12(h)], the Pareto-frontier connects the maximum-value regions of both storage and loss, giving a trade-off between them. From the loop diagrams in Fig. 16 in the Appendix in the supplementary material, it is clearly seen that the loop area (considering both the single-loop FE region and the double-loop AFE region) increases as the surface ion energy and temperature decrease (as we move from AFE to FE and deeper into the FE domain) and decreases as the surface ion energy increases (as we move from FE to AFE and deeper into the AFE domain), thus minimizing (maximizing) and maximizing (minimizing) the storage (loss), respectively. This agrees with the results in Figs. 12(e)–12(h). We see a similar jump in storage and loss, which again is likely due to the phase transition between FE and AFE (see Fig. 16 in the Appendix in the supplementary material). In practice, we do not know this transition region between the phases, and such sharp jumps can be problematic for the surrogate GP model in MOBO.
Thus, to avoid such sudden jumps at the unknown transition boundary, one can build the Pareto-frontier (trade-off between the target functions) over any parameter space for a pre-selected phase (AFE or FE) using the more complex architecture of constrained multi-objective Bayesian optimization, which will be addressed in the future. More detailed, step-wise figures for the analysis of the 2–4–6 KLGD model for bulk PZO across the MOBO iterations are provided in the Colab notebook.

## V. SUMMARY

To summarize, here, we develop a workflow for the joint optimization of multiple materials functionalities from expensive computations using the multi-objective Bayesian optimization. This approach is applied for the discovery of the optimal Pareto-frontier between the energy storage and loss in interfacially controlled ferroelectric materials but can be equally applied to other functionalities that can be derived from the P–E hysteresis loops.

In general, MOBO offers a powerful tool for the optimization of material functionalities and can be further extended toward theory–experiment matching. In practical problems, there is a general need to optimize multiple functionalities from expensive experiments, which makes the optimization task considerably more complex. Here, the proposed MOBO architecture offers a great choice: it first substitutes the expensive evaluations with inexpensive surrogate models informed by physics-driven prior knowledge and then jointly optimizes any chosen set of functionalities, in an attempt to rapidly discover the optimal systems with the desired balance between the functionalities. Going forward, this approach can help guide experiments by steering exploration toward the region of interest in a high-dimensional multi-functionality space, thereby reducing time and effort.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the mathematical equations of the test problems in Sec. II and the hysteresis plots for the low-sampling grid (ground truth) used in the analysis in Sec. III. From these hysteresis plots, we calculate the respective ground truth of storage and loss to validate the MOBO results presented in Sec. III.

## ACKNOWLEDGMENTS

This work was supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences, as part of the Energy Frontier Research Centers program: CSSAS—The Center for the Science of Synthesis Across Scales—under Award No. DE-SC0019288 (A.B.), located at the University of Washington; by the Center for 3D Ferroelectric Microelectronics (3DFeM), an Energy Frontier Research Center funded by the U.S. Department of Energy (DOE), Office of Science, Basic Energy Sciences under Award No. DE-SC0021118 (S.V.K.); and performed at Oak Ridge National Laboratory's Center for Nanophase Materials Sciences (CNMS), a U.S. Department of Energy, Office of Science User Facility. A.N.M.’s work was supported by the National Research Foundation of Ukraine (Grant Application No. Ф81/41481).

## DATA AVAILABILITY

The data that support the findings of this study are openly available in Github at https://github.com/arpanbiswas52/MOBO_AFI_Supplements, Ref. 112.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## REFERENCES
