Benchmarking Inverse Optimization Algorithms for Materials Design

Machine learning-based inverse materials discovery has recently attracted enormous attention due to its flexibility in dealing with black-box models. Yet, many metaheuristic algorithms are not as widely applied to materials discovery as machine learning methods. Ongoing challenges remain in applying different optimization algorithms to discover crystals with single- or multi-elemental compositions and in understanding how these algorithms differ in mining ideal materials. We comprehensively compare 11 different optimization algorithms for the design of single- and multi-elemental crystals with targeted properties. By maximizing the bulk modulus and minimizing the Fermi energy through perturbing parameterized elemental composition representations, we estimated the unique counts of elemental compositions, the mean density scan of the objective space, and the mean objectives and frequency distributed over the materials' representations and objectives. We found that nature-inspired algorithms carry more uncertainty in the defined elemental composition design tasks, which corresponds to their dependence on multiple hyperparameters. Runge Kutta optimization (RUN) exhibits higher mean objectives, whereas Bayesian optimization (BO) displays lower mean objectives compared with other methods. Combined with the materials count and density scan, we propose that BO strives to approximate a more accurate surrogate of the design space by sampling more elemental compositions and hence has lower mean objectives, whereas RUN repeatedly samples the targeted elemental compositions with higher objective values. Our work sheds light on the automated digital design of crystals with single- and multi-elemental compositions and is expected to elicit future studies on materials optimization, such as composite and alloy design based on specific desired properties.


Introduction
Designing materials with desired, tailored properties has always been a key goal of human society, from ancient metallurgy to modern nanotechnology. Traditionally, most research in the field of designing crystals with multi-elemental compositions relies on costly experimental synthesis and time-consuming trial-and-error processes. Advances in computational engineering can greatly accelerate materials design efficiency. To achieve this design goal, many optimization methods have been developed. These methods can be categorized as first order and zeroth order, where first-order optimization methods are gradient-based methods, such as gradient descent and Newton-Raphson, that rely on the problem being first-order differentiable. First-order methods are widely applied to topology optimization coupled with finite element methods for mechanical design, including classical problems like the design of beams 1 and trusses 2. However, the primary drawback of first-order optimization is that the differential form of the problem may not exist, or may not be easily obtained, in many (or most) real-world application scenarios. In many cases, the governing equations of the problem do not exist in differentiable form. Moreover, first-order optimization tends to get "trapped" in local optima close to the initial design, which requires prior physical understanding of the problem to ameliorate. Hence, metaheuristic optimization methods, such as genetic algorithms and particle swarm optimization, are more flexible and adaptive to complex problem domains. These characteristics enable metaheuristics to be applicable to multiscale materials design problems involving, for example, molecular simulations, machine learning surrogates, and density functional theory calculations. For example, molecular dynamics (MD) simulations model the interactions of particles based on interatomic potentials 3,4, DFT simulations calculate the energy based on pseudo-potentials for the Hamiltonian 5, and ML
surrogates represent the material's structure-property map with a neural network 6. In these situations, obtaining first-order derivatives is either extremely taxing or may even be intractable. To overcome these challenges, recent literature highlights the utility of metaheuristics for elemental composition design, in contrast to gradient-based optimization. For example, Ramprasad and coworkers developed genetic algorithm methods coupled with deep learning surrogate models for designing polymer dielectrics 7 and used similar strategies to design polymers with high glass transition temperatures and bandgaps 8. Winter et al. combined a machine learning surrogate with particle swarm optimization (PSO) to design drugs 9. Weiel et al. used PSO to select parameters in molecular dynamics simulations 10.
Many numerical methods exist for design optimization, and two of the most widely applied are Bayesian optimization (BO) and deep reinforcement learning (DRL). BO leverages Gaussian process regression to fit a surrogate model of the design space and uses Bayesian statistics to iteratively improve the accuracy of this surrogate by sampling more points via an acquisition function. BO is frequently applied in materials design. For example, Shin et al. combined micromechanical experiments and BO to design spiderweb nanomechanical resonators 11. Chen and coworkers developed a series of BO frameworks for mechanical design and elemental composition design using finite element simulations 12, molecular simulations 13,14, and machine learning surrogate models 15,16. Zhai and Yeo combined BO and individual-based modeling of bacteria to design antimicrobial nanosurfaces 17. DRL, another popular method, utilizes a deep neural network (DNN) as an agent that interacts with the environment, receives rewards from different output properties, and learns the best strategy to achieve the design goal. When applying DRL to materials design, the evaluations of the material properties are treated as the environment from which the agent learns the correct policy. For instance, Gu and coworkers developed a DRL framework for designing tough graphene oxide 18 and continuum composite materials 19. Farimani and coworkers developed DRL frameworks for porous graphene for water desalination 20.
Can these optimization methods be uniformly applied to the same problem, and how do they differ? To answer these questions, benchmark studies are urgently needed. One particular reason for the urgency is efficiency. Optimization methods are intended to improve the efficiency of design problems. However, when companies or research groups want to employ design optimization for their problems of interest, it is of utmost importance to select the optimization methods best suited to their specific problems. Selecting ideal optimization methods from the massive body of existing literature is non-trivial: one may fall into the "trial-and-error" trap, significantly reducing the efficiency of solving the problem. Moreover, ML-based studies for materials design often focus on either large-scale molecular structures (i.e., polymers) or continuum structures. This is in part because it is easier to extract structure-property relationships for larger structural bodies in the context of continuum modeling, where first-order derivatives are easy to obtain. Specifically for materials composed of multiple elements, many studies leverage molecular graphs or natural language processing to find this structure-property map. In contrast, while there are some studies on the design of small molecules using language representations with generative models 21, a huge gap exists in applying general optimization methods to the design of such multi-elemental compositions. Therefore, following our previous discussion, the specific problem posed herein is how different optimization methods differ in the context of design optimization.
To address these challenges, we propose and conduct a benchmark study by designing a framework that implements most state-of-the-art optimization methods. We benchmark many widely employed metaheuristic optimizations, including Bayesian optimization, genetic algorithms, particle swarm optimization, and simulated annealing. We further compare novel optimization methods, such as the arithmetic optimization algorithm and Runge Kutta optimization, for a total of 11 different optimization algorithms. We take advantage of the large number of open-source software packages for computational elemental composition design, including pymatgen 22, Materials Project 23, matminer 24, and mealpy 25. The material evaluations are enabled by surrogate models: pretrained graph neural network models that predict the elemental compositions' properties from an input graph structure. The general schematic of our study is shown in Figure 1. We begin with the general premise of industrial or fundamental research problems: one needs to design elemental compositions based on specific applications. Given an application, the targeted physical properties must be identified based on the design requirements, such as high toughness or low thermal conductivity. Based on these targeted applications and properties, the design problem can then be formulated mathematically together with the corresponding optimization algorithms. The design space can then be explored to extract the optimal elemental compositions from the materials database(s). In this study, we formulate an optimization framework that enables the application of the most popular current optimization methods to search for desired elemental compositions, compare the different optimizations based on specific metrics, and analyze the "mined" elemental compositions. Here, we mainly focus on the optimization process (Figure 1 C, D, & E), proposing simplified target material properties and concentrating on the algorithmic implementations and benchmarking. The significance of this work lies in the fact that there is a lack of studies evaluating the performance of different optimization methods, in contrast with the abundance of published literature that applies specific optimization methods to tailored materials design tasks. Our study may benefit future research in the following ways: (1) researchers can select algorithms that correspond well with their potential applications based on our evaluation results; (2) researchers can use or refer to our framework to benchmark other optimization algorithms of interest. Our major contributions are to (1) propose a framework that can implement different algorithms, generalizing materials inverse design tasks, (2) evaluate different optimization methods based on defined metrics, and (3) highlight ideal algorithms for different potential applications based on our evaluations. Some results were unexpected: the commonly employed BO did not stand out in terms of the mean objective evaluation, while surprisingly RUN outperformed on both the stability and objective evaluations.
This paper is organized as follows. We present an overview of the underlying methodology, describe the workflow connecting machine-learned models of materials properties with different optimization methods, and benchmark the results obtained from 11 different optimization algorithms in Section 3. We then discuss the algorithms developed to perform the automated optimizations in Section 4. A detailed explanation of the methodologies employed is presented in Section 2, and further information is contained in the ESI.

Problem Formulation
Our goal is to benchmark design optimization algorithms for inverse elemental composition design, in which we formulate the problem as maximizing an objective function corresponding to materials properties. We benchmark using a model problem of maximizing the bulk modulus while simultaneously minimizing the Fermi energy. While not specific to any application, our prototype model is useful for benchmarking and unveiling characteristics of different optimization algorithms for elemental composition design and discovery. We maximize an objective function J, defined as the difference between the bulk modulus and the Fermi energy, as a single-objective optimization problem:

$$\max_{\Theta}\; \mathcal{J}(\Theta) = K(G_{\Theta}) - E_{\mathrm{Fermi}}(G_{\Theta}) \qquad (1)$$

Here, the problem is defined as a discrete constrained optimization problem. The input graph structure G = G_Θ is related to the properties K and E_Fermi, which can be considered as a forward mapping. The forward mapping for fast property prediction is enabled by the graph neural network architecture MatErials Graph Network (MEGNet) 26. Further details are provided in Section 2 & the ESI.
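In code, the objective of Equation (1) is simply the difference of two surrogate-predicted properties. The sketch below is illustrative only: `toy_surrogate` and its property formulas are invented stand-ins for the pretrained MEGNet forward map, not the paper's implementation.

```python
def objective(theta, surrogate):
    """J(Theta) = K - E_Fermi, the single-objective form of Equation (1)."""
    k, e_fermi = surrogate(theta)  # forward map: design variables -> properties
    return k - e_fermi


def toy_surrogate(theta):
    """Stand-in for MEGNet: maps Theta = (n_atom, xi, eta) to (K, E_Fermi).

    The formulas are arbitrary placeholders chosen for illustration.
    """
    n_atom, xi, eta = theta
    k = 100.0 + 10.0 * n_atom   # pretend bulk modulus
    e_fermi = 0.05 * xi - eta   # pretend Fermi energy
    return k, e_fermi
```

An optimizer then perturbs `theta` to maximize `objective(theta, surrogate)`; swapping in a real property predictor changes only the surrogate.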
The optimization problem can be solved using various optimization algorithms. Here, we explored 11 different state-of-the-art optimizations to conduct benchmarking for materials discovery. The methods include Bayesian Optimization 27, Deep Reinforcement Learning 28, Genetic Algorithm 29, Particle Swarm Optimization 30, Hybrid Grey Wolf Optimization 31, (Hybrid-Improved) Whale Optimization Algorithm 32, Ant Colony Optimization 33, Differential Evolution 34, Simulated Annealing 35, Bees Optimization 36, Arithmetic Optimization 37, and Runge Kutta (RUN) Optimization 38. These algorithms fall into two main categories: machine learning-based optimizations and metaheuristic optimizations (Figure 2 B & C). We use different strategies to perturb Θ to maximize J, which are elaborated in Section 2 & the ESI. Here, we consider single- and multi-element composition optimization scenarios. For single-element composition optimization, the design variable n_atom is restricted to 1, leaving only two design variables, ξ_n and η. For multi-element composition optimization, the screened elemental compositions contain n_atom ∈ [1, 4]. The materials libraries indicated in Figure 2 consist of a series of MP IDs, and the randomly generated η is used to select the corresponding MP ID.
Following the discussion of Equation (1), the optimization is benchmarked and connected through the material evaluations. We utilize MEGNet for materials property evaluation mainly for two reasons. (1) Fast inference. By using the pretrained MEGNet model (mp-2019.4.1), the properties can be obtained "on the fly" given the input graph structure, avoiding the lengthy computation of traditional numerical methods like molecular dynamics (MD) or density functional theory (DFT). (2) Better screen-ability of the design space. Given the pretrained MEGNet model, one can infer the corresponding properties for a generated input graph G, allowing the evaluation of a large number of "untouched" materials. On the other hand, due to the lack of existing data for the properties of many potential materials, one may not obtain the properties of given input elements directly from a query. Also, due to the lack of pseudo- and/or interatomic potentials, MD or DFT calculations may not be feasible. Here, we employ MEGNet for its reported accuracy and stability. As reported by Chen et al. 26, the MAEs of the predictions of the Fermi energy and bulk modulus are around 0.028 and 0.05, respectively. MEGNet is not the only option for materials property inference; other existing frameworks with competitive performance include CGCNN 39, MODNet 39, H-CLMP(T) 40, and crystal graph attention networks 41. Since our focus is on the optimization methods, we do not emphasize surrogate model evaluations. Notably, many recent works focus on materials property prediction benchmarks 42,43.
For this problem, the inverse optimization task is to directly perturb the graph structure such that the material's properties can be tailored. However, it is difficult to represent the graph structures of different materials, containing different numbers of elements, in the same dimensions for optimization. Hence, we parameterize the graph structure G = G_Θ so that the algorithms tune the design variables Θ = [n_atom, ξ_n, η] to change the elemental composition G_Θ. As visualized in Figure 2 A, the optimization begins with the variable n_atom, the number of different types of elements in the elemental composition. We denote the different atoms in the elemental composition as the atomic dimensions. Next, a random number ξ_n distributed in [0, 100] assigns atomic types to the atomic dimensions, which we define as the material basis. For example, if the atomic dimension n_atom = 3, we may obtain an element basis from ξ_n such as {C-H-O}. Given this material basis, one can generate a list of potential elemental compositions from the Materials Project, which we denote as the materials library. One can then use η to select the candidate elemental composition from the materials library. Following our previous example, the materials library may look like {CH3O, C2H6O, C3H6O, ...} and our selected material may be C2H6O. For how ξ_n and η determine the atomic graph structure G in the algorithm, please refer to Extension on Problem Formulation in the ESI.
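A minimal sketch of this Θ-to-material decoding is shown below. The element table, the hashing of ξ_n to element symbols, and the `query_library` callable are all illustrative placeholders for the pymatgen/Materials Project machinery described above.

```python
# Illustrative subset of the periodic table; the real framework draws on the
# full element list via pymatgen.
ELEMENTS = ["H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Fe"]


def decode_theta(n_atom, xi, eta, query_library):
    """Map Theta = [n_atom, xi_n, eta] to one candidate material (e.g., an MP ID).

    xi : one number per atomic dimension; each is hashed to an element symbol,
         forming the "material basis" (e.g., n_atom = 3 -> {C-H-O}).
    eta: number in [0, 1) used to select one entry from the materials library.
    query_library: callable returning all known entries with the given basis
         (stand-in for a Materials Project query).
    """
    basis = []
    for x in xi[:n_atom]:
        el = ELEMENTS[int(x) % len(ELEMENTS)]
        if el not in basis:
            basis.append(el)
    library = query_library(sorted(basis))   # the "materials library"
    if not library:
        return None                          # no known material with this basis
    return library[int(eta * len(library)) % len(library)]
```

With a mock query returning `["mp-887", "mp-702", "mp-441"]` for the basis {C-H-O}, `decode_theta(3, [0.5, 1.5, 3.5], 0.5, query)` selects the middle entry; the real mapping used in the paper is detailed in the ESI.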

Workflow Automation
One of the main contributions of our work is the development of an optimization framework that is flexible enough to implement many state-of-the-art optimization methods. The key point of such a framework is the automation of the optimization, in which the current set of material evaluations is connected to the update of the next set. The automation is enabled through the graph neural network surrogate model formulated in Equation (1), MEGNet. The workflow iterates over two blocks (Figure 3). The optimization block:

• Initialize the design variable vector Θ.
• Evaluate material properties using the pre-trained GNN models.
• Obtain the best fit and record the iterations.

The materials generation and evaluation block:

• Read in the design variables Θ.
• Search for candidates in the Materials Project based on ξ_n and store them as a list.
• Select the material in the list based on the randomization factor η.

The whole workflow can be simplified to the expression:

$$\Theta_{n+1} = \mathrm{IOA}\big(\Theta_n, \mathrm{MEGNet}(\Theta_n);\; P_{\mathrm{IOA}}\big) \qquad (2)$$

where IOA stands for "inverse optimization algorithms". Assume the algorithm evaluates the n-th set of elemental compositions with the design variables Θ_n, obtaining the corresponding properties MEGNet(Θ_n); by employing the hyperparameters P_IOA specified by the chosen optimization method, one then obtains the design variables for the next set of evaluations, Θ_{n+1}, whose evaluated properties MEGNet(Θ_{n+1}) feed back into the IOA to iteratively search for the next set. This loop allows one to dispense with the traditional "trial-and-error" approach: one only needs to set the hyperparameters to obtain the optimal elemental compositions given the targeted properties. The algorithmic flowchart is visualized in Figure 3 (Equations (1) & (2)). The formulated basic optimization method is shown in the right block, while the materials generation and evaluation processes are laid out in the left blocks. In our approach, the algorithm is initialized and repeated with different random seeds for the five attempts in Figure 5. The hyperparameters for each algorithm are provided in the ESI and were selected from commonly used default values for each algorithm.
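The automation loop can be sketched generically. Here `propose` stands for one step of any of the optimizers (the IOA with its hyperparameters baked in) and `evaluate` for the MEGNet surrogate; both names and signatures are illustrative placeholders, not the paper's code.

```python
def run_inverse_design(propose, evaluate, theta0, n_evals=100):
    """Generic loop in the spirit of Equation (2): evaluate the current design,
    record the best, and ask the optimizer for the next design variables."""
    theta, best, history = theta0, None, []
    for _ in range(n_evals):
        j = evaluate(theta)              # surrogate property evaluation -> J
        history.append((theta, j))
        if best is None or j > best[1]:  # maximization, as in Equation (1)
            best = (theta, j)
        theta = propose(theta, j)        # optimizer proposes the next design
    return best, history
```

Swapping optimizers only changes `propose`; the evaluation side stays fixed, which is what makes the benchmark uniform across all 11 algorithms.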

Bayesian Optimization
The core of machine learning (ML)-based optimization algorithms is to employ the explored materials data to fit a surrogate assisting the optimization process. Here, there are two main ways of adopting ML methods for design optimization: (1) building up a surrogate model of the design space for final exploration, namely the Bayesian optimization method; (2) utilizing a deep neural network (DNN) to learn the interactions of input elemental compositions and their corresponding material properties, which can be considered as the interaction of actions and their environment, in which the DNN is considered an "agent". This method is known as deep reinforcement learning. The detailed mathematical formulations and derivations are shown in Machine Learning Based Algorithms & Metaheuristic Optimization in the ESI.
For approach (1), following the problem formulated in Equation (1), we now assume the optimization algorithm is initiated by a set of m material evaluations from MEGNet, (Ω(Θ_1), Ω(Θ_2), ..., Ω(Θ_m)); one then obtains data lying in the mapping M : Θ → J, where we assume M exists and is continuous. Here, M represents the design space, i.e., the projection from the input design variables to the objectives. M can be approximated using Gaussian process regression (GPR) from the data generated in the material evaluations, which can be simplified to the form:

$$\widetilde{\mathcal{M}} = \mathrm{GPR}\big(\Omega(\Theta_1), \Omega(\Theta_2), \ldots, \Omega(\Theta_m);\; p_{\mathrm{GP}}\big),$$

where p_GP denotes all the hyperparameters involved in setting up the regression model, elaborated in detail in the ESI. Given a surrogate model of the design space built from GPR, one can improve the accuracy of this surrogate by iteratively sampling more points. The core of BO here is the strategy for sampling the next set of material evaluations from Θ.

The next set of design variables, Θ_next, is selected by maximizing the acquisition function A:

$$\Theta_{\mathrm{next}} = \arg\max_{\Theta} \mathcal{A}(\Theta),$$

where we use the upper confidence bound as the acquisition function (see Bayesian Optimization in the ESI). BO utilizes GPR and the acquisition function to efficiently sample the design space for a more accurate approximation towards the formulated optimization goal (max J).
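The GPR-plus-UCB loop can be illustrated with a from-scratch toy on a one-dimensional design space (numpy only). This is a sketch of the mechanism; the kernel, length scale, κ, and candidate grid are illustrative choices, not the settings used in the paper.

```python
import numpy as np


def rbf(a, b, ls=0.5):
    """RBF kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)


def gp_posterior(x_train, y_train, x_cand, noise=1e-6):
    """Exact GP posterior mean and std at candidate points (zero prior mean)."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_cand)                 # (n_train, n_cand)
    alpha = np.linalg.solve(k, y_train)
    mu = k_star.T @ alpha
    v = np.linalg.solve(k, k_star)
    var = np.clip(1.0 - np.sum(k_star * v, axis=0), 0.0, None)
    return mu, np.sqrt(var)


def bo_maximize(f, x_cand, n_init=3, n_iters=25, kappa=2.0, seed=0):
    """BO loop: fit surrogate, pick argmax of the UCB, evaluate, repeat."""
    rng = np.random.default_rng(seed)
    x = list(rng.choice(x_cand, size=n_init, replace=False))
    y = [f(xi) for xi in x]
    for _ in range(n_iters):
        mu, sigma = gp_posterior(np.array(x), np.array(y), x_cand)
        x_next = x_cand[int(np.argmax(mu + kappa * sigma))]  # upper confidence bound
        x.append(x_next)
        y.append(f(x_next))
    i = int(np.argmax(y))
    return x[i], y[i]
```

The κσ term drives exploration of uncertain regions (improving the surrogate), while μ drives exploitation, mirroring the exploration/exploitation trade-off discussed for BO in the results.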

Metaheuristic Optimizations
Metaheuristic optimization is a family of optimization techniques used to find the optimal solution by exploring a large search space. They are often inspired by natural processes or social behavior, such as animal behavior (e.g., ants, whales, wolves), physical processes (e.g., the annealing of metals), biological processes (e.g., gene mutation), or fundamental mathematics. Metaheuristic algorithms are typically applied to complex optimization problems where traditional methods are either infeasible or inefficient. They are highly flexible and can be adapted to a wide range of problem domains, of which elemental composition design is our target scenario. Metaheuristics are often considered "black box" optimization methods, as they do not require knowledge of the problem structure. More specifically, they do not require the existence of a first-order differential form of the governing equations of the problem.
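As a concrete example of such a black-box update, a single particle swarm step can be written in a few lines. This is a toy sketch for intuition; the paper's experiments rely on the mealpy implementations.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One PSO velocity/position update for a single particle.

    w, c1, c2 are the metaheuristic hyperparameters; p_best and g_best are the
    particle's and swarm's best-known positions. No gradients are needed --
    only objective values, via the bookkeeping of p_best/g_best.
    """
    r1, r2 = random.random(), random.random()
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, p_best, g_best)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

Iterating this step over a swarm, and refreshing `p_best`/`g_best` after each objective evaluation, is the whole algorithm; other metaheuristics differ only in the update rule.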
If we assume the simplest form, one can interpret the metaheuristic optimization process as sampling a set of random particles in the design space, with the optimization process referring to the update of these sampled particles (for the generalization of this form, see Methodology Generalization in the ESI). The update rules may be inspired by different natural processes, corresponding to the optimization method employed. If one assumes the problem domain has a total of d dimensions and denotes the rule as R, the update rule for the i-th particle writes:

$$X_i^{t+1} = R\big(X_i^{t}, X_j^{t};\; p_{\mathrm{MH}}\big), \quad X_i \in \mathbb{R}^{d},$$

where X is the location of the particle, the i-th particle's location may be related to its adjacent particle(s) j, and p_MH denotes the hyperparameters employed in the metaheuristic optimization. We present only this generalized form here to identify the key essence of metaheuristics (for details of each algorithm, see Metaheuristic Optimization in the ESI).

Both metrics — the unique materials count and the mean density scan of the objective space (Figure 4) — help us understand the "strategy" of different algorithms as they scan the chemical design space in search of ideal elemental compositions. There are five major findings from Figure 4.
(1) Comparing A1 and A2, the scanned objective spaces are smaller for GWO than for the other optimization methods in both the single- and multi-element composition design cases. For the single-element composition design case: (2) the numbers of evaluated elemental compositions are similar for most methods, whereas those of SA and AOA are observably smaller; GWO and SA display relatively high variances compared with the other methods. (3) The mean scanned densities are relatively higher for SA and AOA, with both displaying higher variances. For the multi-element composition design case: (4) RUN has the lowest number of evaluated elemental compositions, while GWO and BWO have the highest variances. (5) For the mean scanned densities, GWO and RUN have the highest values and variances.

Elemental Composition Design Benchmarks
From Figure 5, we can draw five key observations. (1) Comparing A1 and A2, the mean objective distributions differ between the single-element and multi-element composition design cases. (2) For single-element design, GA, SA, and RUN display high mean objective values. SA and RUN also exhibit relatively low variances. GWO shows relatively high mean objectives but with higher variances, while BO demonstrates stable performance with a low mean objective value and low variance.
(3) DE and AOA exhibit repetitive scans of the same composition region, indicating a tendency to get "trapped" in a local space. This aligns with their relatively lower mean objectives in A1. Moving on to multi-element design, RUN achieves the highest mean objective values with low variances. GA, GWO, and BWO show higher variances. (4) AOA and SA again exhibit repeated scanning of the same elemental compositions, suggesting a tendency for these optimization methods to get trapped in local regions. To further quantitatively support our observations on the uncertainties of different optimizations across repeated elemental composition design experiments in Figures 4 & 5, Tables 1 & 2 show the variances among different experiments for each optimization. Both Table 1 & Figure 4 show that the materials count variances are very different for single- and multi-element compositions, owing to the very limited number of single-element compositions in the periodic table. For multi-element composition design, BO has the lowest and GWO the highest variance of the materials count, with GWO's variance approximately 7398 times higher than BO's. In addition, GWO's variance of the materials count is approximately 27 times higher than RUN's. GA and AOA have approximately the same level of uncertainty for the materials count. The relative magnitudes of the variances of the materials count and the mean density scan do not follow exactly the same trend. Yet for multi-element composition design, GWO still exhibits the highest variance, 366.93% higher than that of RUN and 9992.08% higher than that of BO. Generally, it can still be deduced from Table 1 that, for both single- and multi-element composition design, nature-inspired algorithms (i.e., GA, GWO, BWO) display relatively higher variances, indicating higher uncertainties, which will be further discussed in Section 4.
Table 2 further confirms the observation that nature-inspired algorithms present higher uncertainties. For both single- and multi-element composition designs, GA, GWO, & BWO show evidently higher variances of the mean objectives than the other optimizations. For single-element composition design, GA presents 155.56% and 1093.8% relatively higher variances, GWO presents 35.24% and 531.77% relatively higher variances, and BWO presents 95.20% and 811.86% relatively higher variances than those of RUN and BO, respectively. For multi-element composition design, GA presents 2136.79% and 6380.87% relatively higher variances, GWO presents 2190.70% and 6537.04% relatively higher variances, and BWO presents 2132.71% and 6369.04% relatively higher variances than those of RUN and BO, respectively.
Figure 6 visualizes the material distribution generated by all the optimization algorithms, the changes in objective values during the material evaluations benchmarked against silicon, and the reconstruction of the design space averaged across the 11 different optimizations. In Figure 6 A, the crystal phases of Xe and Kr are repeatedly scanned by multiple algorithms, despite their instability at room temperature. This raises an intriguing question for computational optimization: how can complex factors such as cost and industrial applicability be incorporated into the optimization framework to enhance its realism for real-world deployment? This is left for future exploration. AOA and GWO show an increasing trend in the number of evaluated elemental compositions in the target space for the bulk modulus, while BO, GA, and GWO exhibit the same trend for the Fermi energy. Figure 6 C visually depicts the design space, conveying a clear message: it is highly nonconvex and presents significant challenges in searching for the optimal point. Moreover, the non-continuous contours observed in the averaged design spaces based on the evaluated elemental compositions further complicate the search for the global optimum. For a detailed discussion of the design spaces of all 11 optimization algorithms (and DRL), please refer to the ESI.
From Figure 6, we can derive five main observations. (1) Comparing A and B, we observe that the top evaluated elemental compositions are similar. (2) The majority of the top chemical compounds are binary alloys, consisting of elements present in the top single-element compositions, such as Cs, Ce, and Ba. (3) RUN demonstrates effective scanning strategies and consistently converges towards locally optimal elemental compositions. (4) SA demonstrates similar search strategies to RUN, evidenced by the high normalized frequency along the materials list. (5) BO appears to struggle in identifying the "ideal" elemental compositions and tends to scan the same region repeatedly, indicating a potential trap in a locally optimal area. Based on Figures 4-6, we have also added a qualitative comparison and suggested potential applications in Table 3. Since we hope to design materials digitally with low uncertainty and targeted desired properties quantified through the objectives, low variances and high mean objectives can be viewed as "advantages" of an algorithm. "Targeted materials selection" is recommended for algorithms with repeated materials sampling, and "large-scale exploration" is recommended for those that sample more material candidates with decent objective values.
Among all the metrics we evaluated, it can be deduced that RUN achieved superior optimization results compared with the other methods (Figures 4 & 5, Tables 1 & 2). This out-performance may be due to several reasons. (1) RUN is free of hyperparameters. The update process of RUN does not depend on fine-tuned hyperparameters, making it more robust, especially in highly nondeterministic situations like elemental composition design. (2) The updating strategies of RUN are very different from those of other methods. RUN searches for the next set of evaluations by mimicking the update strategies of Runge Kutta discretizations of differential equations (see ESI), which are fundamentally different from other metaheuristic methods. (3) Following (2), the enhanced solution quality (ESQ) mechanism is well suited to our formulated elemental composition discovery cases. In the authors' original proposal 38, the ESQ mechanism is employed to avoid locally optimal solutions and increase convergence speed. In our case, as the number of material evaluations is fixed at 5000, the faster convergence induced by ESQ enables the robust performance of the RUN algorithm in our benchmark cases. Further comprehensive characteristics are discussed in Section 4.
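To give intuition for point (2), a loose one-dimensional caricature of a Runge-Kutta-style search step is shown below: it combines four slope estimates with RK4 weights to propose the next point. This illustrates only the flavor of the mechanism; it omits RUN's population, randomized search mechanism, and ESQ components.

```python
def rk_style_step(f, x, h=0.5):
    """Propose the next point by ascending f with an RK4-weighted combination
    of finite-difference slope estimates (toy sketch, not the RUN algorithm)."""
    def slope(z, d=1e-2):
        # central finite difference: no analytic gradient required
        return (f(z + d) - f(z - d)) / (2.0 * d)

    k1 = slope(x)
    k2 = slope(x + 0.5 * h * k1)
    k3 = slope(x + 0.5 * h * k2)
    k4 = slope(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Repeatedly applying this step climbs toward a maximum using only objective evaluations, hinting at why an RK-weighted update can converge quickly within a fixed evaluation budget.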

Discussion
Based on the results presented earlier, our findings raise several intriguing questions, for which we proffer the following discussion. Firstly, we observe variations in the performance of different optimization algorithms across the two elemental composition design scenarios, such as the contrasting materials counts and mean densities of RUN and AOA (Figure 4 B & C). One explanation is that algorithm effectiveness is problem-specific. Moreover, the optimization methods may also be sensitive to the dimensionality of the problem, such that changing from single-element to multi-element composition design induces the changes in the metric evaluations shown in Figures 4 & 5. Additionally, in the case of single-element composition design, the limited number of available elemental compositions can impact the performance of certain algorithms, potentially influencing their search capabilities. An interesting observation pertains to the consistent success of the RUN algorithm in achieving high mean objective values in both problem scenarios, especially in the second case (Figure 5 A). This achievement can be attributed to the fact that the sampling strategy employed by RUN is hyperparameter-free and adept at storing information, which proves advantageous in handling high-dimensional spaces.
Another intriguing finding is that BO exhibits lower mean objectives accompanied by low variances in both cases. This trend becomes more apparent when considering Figures 5 B & 6 E. A plausible explanation is that BO samples a wider range of diverse elemental compositions during the optimization process. This behavior can be attributed to the uncertainty-based sampling strategy employed by BO, which facilitates the exploration of a diverse design space and aids in constructing a more accurate surrogate model. AOA, by contrast, tends to repeatedly sample similar elemental compositions over an extensive number of evaluations (Figure 5 B). This behavior suggests that AOA may become trapped within a specific region of the design space, necessitating further investigation to understand the underlying causes and identify potential remedies. Finally, we explore the major disparity in SA's performance between the two elemental composition discovery cases. A possible explanation for this discrepancy is that SA is sensitive to the dimension of the optimization problem, leading to its differing performance on the designed metrics in Figures 4 & 5.
Each optimization algorithm demonstrates unique characteristics and strategies that contribute to its specific outcomes. The low variances of the objectives in Figure 5 could be explained by BO's low dependence on hyperparameters. BO aims to approximate an accurate surrogate model to optimize the objective, resulting in small mean objective values. This strategy utilizes uncertainty-based sampling, which decreases the overall objective value. The low uncertainty and variance in the final results are attributed to the small number of hyperparameters and the large sampling base. Additionally, BO exhibits high material counts, as observed in Figure 4 B. GA heavily relies on hyperparameters such as mutation, offspring generation, and randomization (see ESI for details). This strong dependence on hyperparameters leads to high uncertainty in the mean objective values for both cases (Figure 5 A1 & A2). Moreover, GA repeatedly samples the same elemental compositions once ideal properties are detected, resulting in a high sampling density compared with BO, PSO, and other related methods (Figure 5 B1 & B2). PSO, on the other hand, exhibits lower uncertainty and variance as it has fewer hyperparameters, contributing to its more stable performance, as depicted in Figure 5 A1 & A2.
GWO is similar to GA in terms of hyperparameter dependency, resulting in high uncertainty and a sampling density focused on specific objective regions (Figure 5). Notably, GWO displays higher uncertainty even in material counts and mean density, which differentiates it from GA (Figure 4). Similarly, BWO exhibits high uncertainty in mean objective values (Figure 5 A), which may also be attributed to its dependence on hyperparameters and its very similar search strategy of mimicking animal dynamical behavior. However, BWO displays higher material counts and a lower sampling density scan (Figure 4 B & C), potentially attributable to the setting of the encircling prey parameter, which enhances sampling stability. ACO employs continuous probability-based sampling, resulting in relatively less uncertainty in the mean objective values (Figure 5 A). ACO has mean objective values similar to BO and PSO while exhibiting lower sampling density scans (Figure 4 C1), indicating that ACO avoids oversampling the same regime in the single-element composition design case. DE algorithmically operates as a variant of GA, yielding comparable mean objective results (Figure 5 A). With fewer hyperparameters than GA, DE exhibits lower uncertainty (Figure 5 A). The evolutionary strategies of DE yield material counts similar to GA, as seen in Figure 4, and repeated sampling of favorable materials, as depicted in Figure 6. SA's performance is observed to be more problem-specific, as both the material counts and density scan vary across the design cases (Figure 4 B & C and Figure 5 A). SA exhibits low uncertainty for the mean objective values (Figure 5 A) due to fewer hyperparameters and the implementation of a Markov chain for the acceptance-rejection sampling criteria (see ESI for details). BOA shares similarities with GWO and BWO in terms of high uncertainty in the mean objectives (Figure 5 A) and low material counts for multi-element composition design (Figure 4 B2), as they use particle-based, animal behavior-inspired search criteria (see ESI for details). However, BOA exhibits relatively lower variance (Figure 5 A), indicating a lower dependence on, and fewer, hyperparameters.
AOA uses a distinct searching strategy based on mathematically symbolized operations. AOA exhibits high uncertainty with relatively low mean objectives (Figure 5 A). AOA tends to oversample the local regime, leading to low material counts (Figure 5 B). This behavior can be attributed to a large number of hyperparameters and an overdependence on certain factors such as MOA and MOP (see Arithmetic Optimization Algorithm in ESI), resulting in a scaling factor that underestimates previous locations. Finally, RUN achieves high objectives with low uncertainty and avoids oversampling in specific objective regions (Figure 5). Its material count is sensitive to dimensionality, yet all dimensions yield successful search results, as judged by the high overall mean objectives (Figure 5 A). RUN is characterized by being hyperparameter-free, incorporating numerous intermediate parameters to increase the dimensionality of the iteration scheme, and relying on previous solutions for updates based on the search history. Following our previous discussion on RUN's ESQ search mechanism and hyperparameter-free design, we examine why RUN outperformed other algorithms with high objective values and low uncertainties. RUN employs an update scheme with four update parameters 1 , each containing randomization coefficients 2 , and these parameters are interconnected. This randomization and "parameter connection" could help explain the low uncertainty observed in our numerical experiments. For the high objectives, RUN's ability to avoid local optima as reported by Ahmadianfar et al. 38 is a possible explanation: they contend that the adaptive mechanism employed to update the parameters and the ESQ are the main mechanisms that ensure a good transition from exploration to exploitation.

Conclusion
In this paper, we propose a framework that can incorporate most state-of-the-art optimization algorithms to perform multi-elemental composition discovery and design for crystals. Our framework parameterizes the input elemental graph structure to tailor the corresponding bulk modulus K and Fermi energy E Fermi. The key goal is to conduct a comprehensive benchmark of many state-of-the-art optimization methods and provide further insights for future implementations of these algorithms for elemental composition design. We optimized both single- and multi-element composition discovery for maximized K and minimized E Fermi by formulating a single-objective optimization. We benchmarked 11 different inverse optimization algorithms while fixing the number of material evaluations. We found that GA, GWO, and BWO exhibit higher variances, and BO and RUN display generally lower variances for materials count, density scan (Table 1), and mean objectives (Table 2). We further conclude that nature-inspired algorithms contain more uncertainties in the elemental design cases, which can be attributed to their dependency on hyperparameters. RUN exhibits higher mean objectives whereas BO displays lower mean objectives compared with other optimization methods. Combined with materials count and density scan, it is proposed that BO samples more elemental compositions, aiming to approximate a more accurate surrogate of the design space and hence yielding lower mean objectives, whereas RUN repeatedly samples the discovered optimal elemental compositions owing to its successful search strategy (Figures 5 B and 6 D & E). We also proffer detailed discussions of the exhibited results in relation to each optimization's characteristics.

The DRL agent takes the action of selecting the best element composition from the Materials Project database. Detailed formulations are elaborated in the following section. Figure S5 showcases some preliminary results from using DRL to design materials, where the single-element composition design processes are shown in Figure S5 A and the multi-element composition design processes are shown in Figure S5 B. Correspondingly, the material evaluations indicated in Figure 4 A are projected into 2-dimensional space and shown in Figure S6.
To better visualize our efforts in estimating the design processes through the mean density scan method, we present some example density scan evaluations using BO and RUN in Figure S7. It can be seen that BO tends to evaluate a broad range of different element compositions with shallow gradients distributed over a material region, whereas RUN tends to repeatedly evaluate the detected "optimal" element composition multiple times, with sharp gradient points discretely distributed over the objectives space. This direct observation cross-validates our observations and proposed understanding of the search strategies of BO and RUN in Sections 3 & 4.

Extension on Problem Formulation
Based on our problem formulation in Equation (1), we can further expand the materials parametrization function Ω. This is a highly nonconvex problem in which the design variables bear no explicit analytical relation to the objective values. The ideal scenario, as described in the article, is to directly tune the graph structure representing the element composition to achieve direct modifications of its corresponding material properties. However, the difficulty that arises is that it is hard to represent the graph as a fixed-dimensional input for perturbing the variables.
If we assume the graph structures of the element composition can be written in the form G(u, e, v), the relation between the design variables and the objective values can be written as K = MEGNet_K(G(u, e, v)) and E_Fermi = MEGNet_EFermi(G(u, e, v)), where MEGNet_K and MEGNet_EFermi represent the two pretrained models for predicting the bulk modulus K and the Fermi energy E_Fermi. In our formulation (Equation (1)), the graph is generated from the design variables via a parameterization function Ω. This parameterization process relaxes the dimensional constraint of the graphs for different types of compounds.
Here, u, e, v are the state, bond, and atom attributes, following the notation of Ref. 26 . The bounds n_atom^low and n_atom^high are readjusted for the two design optimization scenarios. C_mat, L_mat, and T_mat are different functions relating the design variables to the graph structure through hierarchical algorithmic steps. [MP] simply stands for the materials database stored in the Materials Project.

Estimation Metrics Density Scan
We use density scans to estimate the repeated evaluations of the same material in the objectives space. First, the material evaluations in the objectives space (Figure 4 A & Figure S7) are projected onto a 100 × 100 grid as a density scan to describe repeated evaluations of the same points. We then eliminate all the zero-value scan points and compute the mean values of the non-zero density scans. The density scan N is a function of the design variables, N = N(n_atom, ξ_n, η).
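The steps above can be sketched in a few lines of numpy; this is a minimal illustration of the metric (projection onto a grid, dropping empty cells, averaging the rest), not the paper's exact implementation, and the toy objective values are invented for demonstration.

```python
import numpy as np

def mean_density_scan(obj1, obj2, bins=100):
    """Project material evaluations in the 2D objectives space onto a
    bins x bins grid and average the counts of the non-empty cells.
    Repeated evaluations of the same composition pile up in one cell,
    so a higher mean indicates more repetitive sampling."""
    density, _, _ = np.histogram2d(obj1, obj2, bins=bins)
    nonzero = density[density > 0]  # eliminate the zero-value scan points
    return nonzero.mean()

# Toy example: 5 evaluations, one point (10, 1) evaluated twice,
# so 5 counts fall into 4 non-empty cells.
k = np.array([10.0, 20.0, 30.0, 40.0, 10.0])   # e.g. bulk modulus values
e = np.array([1.0, 2.0, 3.0, 4.0, 1.0])        # e.g. Fermi energies
print(mean_density_scan(k, e))                 # -> 1.25
```

A mean of exactly 1 would mean no composition was ever re-evaluated; values well above 1 (as reported for RUN and AOA) signal repeated sampling of the same objective-space points.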

FIGURE S4
FIGURE S4. Schematic for materials discovery using deep reinforcement learning. The optimization is enabled by training a deep neural network as the "agent" to make optimal decisions, i.e., selecting the best materials. The action, the output of the network, screens the materials database and finds the corresponding structure as input to the GNN serving as the environment. The environment then feeds back rewards (from predicted material properties) to the agent for making the best decisions. Note that the blue area corresponds to the zero-evaluation region.

Deep Reinforcement Learning
For approach (2), one trains a deep neural network (DNN) as the agent to learn the policy between the environment (Θ → J) and the action (a). Here, the environment is the material evaluator, i.e., MEGNet. The action a is the change of the design variables Θ. In this process, r is the reward from the evaluated material properties, which the DNN receives as input. Analogous to the GPR case, p_DNN denotes the hyperparameters involved in training the DNN. The agent receives the reward r based on the properties of the evaluated material and takes the corresponding actions a to tune the design variables Θ.
Here, we employ basic deep Q-learning. The Q-learning update is solved from Bellman's equation 28 , Q^π(s, a) = E[r + γ max_{a′} Q^π(s′, a′)], where Q^π is the optimal action-value function and γ is the discount-rate parameter. s′ & a′ represent the updated state and action. The update for the agent's new action requires a policy, for which we employ the Boltzmann Q policy. In the DRL implementation, the optimization objective max J is expected to be learned as policies via agent-environment interactions by feeding rewards. Here, we implement the basic Deep Q Network (DQN) by employing the Q-learning strategy. The agent numerically approaches the strategy for selecting the best Θ to either maximize or minimize J.
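The Bellman update underlying Q-learning can be illustrated in tabular form; in the paper's DQN the table is replaced by a neural network, so the two-state toy MDP below (states, actions, rewards) is purely an invented illustration of the update rule, not the materials environment.

```python
import numpy as np

# Minimal tabular sketch of the Bellman update behind deep Q-learning.
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9  # learning rate and discount-rate parameter

def q_update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(0, 1, 1.0, 1)  # reward 1 for taking action 1 in state 0
q_update(1, 0, 0.0, 0)  # zero reward, but bootstraps from the state-0 value
print(Q)
```

After the first call Q[0, 1] moves halfway toward the reward (0.5); the second call propagates that value backwards through the discount factor, which is the recursive structure Bellman's equation expresses.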
In our DRL implementation, the agent is a deep neural network (DNN) with 3 hidden layers of 64 neurons per layer, activated by the ReLU function. s is the state, which we define as the objective value. Here, a_1, a_2, a_3 correspond to changing the design variables n_atom, ξ_n, η. The rewards are crafted based on the state, and the reward differences across different material property intervals are relatively small. Our goal here is to pose a more challenging learning task for the agent in approximating the correct policy for the design objective. The reward feeds into the total score per episode to train the agent.
The standard reinforcement learning setup is considered, consisting of an agent interacting with an environment E in discrete timesteps. The action-value function is used in many reinforcement learning algorithms. It describes the expected return after taking an action a_t in state s_t and thereafter following policy π: Q^π(s, a) = E[R|s, a]. The optimization goal can be achieved by solving the Bellman equation, where the Q^π values are updated recursively 44 . Q-learning is achieved by minimizing the loss over function approximators parameterized by p_Q, L(p_Q) = E[(Q(s, a|p_Q) − Y)²], where Y = r(s, a) + γQ(s′, µ(s′)|p_Q). The optimization is achieved via feedback-action loops through the rewards.

Here, we employ the Boltzmann Q policy for the Q^π values, π(a|s) = exp(Q(s, a)/τ) / Σ_{a′} exp(Q(s, a′)/τ), where π(a|s) is the probability of taking action a in state s, Q(s, a) is the estimated Q-value of taking action a in state s, and τ is the temperature parameter that controls the exploration-exploitation trade-off. In our implementation, the algorithm undergoes an exploration phase of 30 episodes, during which 5 elemental compositions are evaluated per episode. After this, the algorithm is trained for 60 steps, with 1 material evaluation per step. Finally, the algorithm is tested over 20 episodes, with 5 material evaluations per episode. This setup allows the algorithm to learn and optimize material designs, with the testing phase providing a measure of its effectiveness.
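The Boltzmann Q policy is a softmax over Q-values; a minimal sketch follows (the Q-values are invented, and the max-subtraction is a standard numerical-stability trick rather than part of the formula).

```python
import numpy as np

def boltzmann_policy(q_values, tau=1.0):
    """pi(a|s) = exp(Q(s,a)/tau) / sum_a' exp(Q(s,a')/tau).
    Small tau -> near-greedy (exploitation); large tau -> near-uniform
    (exploration)."""
    z = q_values / tau
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 2.0, 0.0])          # illustrative Q-values for 3 actions
print(boltzmann_policy(q, tau=1.0))    # favors the action with largest Q
print(boltzmann_policy(q, tau=100.0))  # nearly uniform over the actions
```

The temperature τ plays the same role here as in the implementation described above: annealing it from high to low shifts the agent from exploring design-variable changes to exploiting the best-known ones.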

Bayesian Optimization
Gaussian Process Regression
Gaussian process regression (GPR) is a Bayesian statistical approach to approximating and modeling functions. Considering our optimization problem, if the function is denoted y = DS(x, p), where DS is evaluated at a collection of different sets of points x_1, x_2, ..., x_k ∈ R^d, we can obtain the vector [f(x_1), ..., f(x_k)] to construct a surrogate model relating the design parameters to the correlated objectives. The vector is randomly drawn from a prior probability distribution; GPR takes this prior distribution to be a multivariate normal with a particular mean vector and covariance matrix. Here, the mean vector and covariance matrix are constructed by evaluating the mean function µ_0 and the covariance function Σ_0 at each pair of points x_i, x_j. The resulting prior distribution on the vector [f(x_1), ..., f(x_k)] is represented in the form of a normal distribution to construct the surrogate model, where N(•) denotes the normal distribution. The collection of input points is represented in compact notation: 1 : k represents the range 1, 2, ..., k. The surrogate model f(x) on 1 : k is represented as the probability distribution given in Equation (S8). To update the model with new observations, such as after inferring the value of f(x) at a new point x, we let k = l + 1 and x_k = x.
The conditional distribution of f(x) given the observations x_1:l follows from Bayes' rule as f(x) | f(x_1:l) ∼ N(µ_l(x), σ_l²(x)), where the posterior mean µ_l(x) is a weighted average between the prior µ_0(x) and the estimation from f(x_1:l), with the weights depending on the kernel used. We used the Matérn function in the implementation.
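The posterior computation can be sketched directly in numpy; this is a generic zero-mean GP with a 1D Matérn-5/2 kernel and illustrative data (sin samples), assuming unit length scale and a small noise term for conditioning, and is not the paper's exact surrogate.

```python
import numpy as np

def matern52(X1, X2, ell=1.0):
    """Matern-5/2 kernel: k(r) = (1 + s + s^2/3) exp(-s), s = sqrt(5) r / ell."""
    r = np.abs(X1[:, None] - X2[None, :])
    s = np.sqrt(5.0) * r / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(X_train, y_train, X_test, ell=1.0, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP at X_test,
    conditioned on the observations (X_train, y_train)."""
    K = matern52(X_train, X_train, ell) + noise * np.eye(len(X_train))
    Ks = matern52(X_test, X_train, ell)
    Kss = matern52(X_test, X_test, ell)
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    mu = Ks @ alpha                              # posterior mean mu_l(x)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)    # posterior covariance
    return mu, np.diag(cov)

X = np.array([0.0, 1.0, 2.0])                    # illustrative observations
y = np.sin(X)
mu, var = gp_posterior(X, y, np.array([0.0, 1.5]))
```

At an observed point the posterior collapses onto the data (near-zero variance), while between observations the variance grows; that variance is exactly what the acquisition function exploits next.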

Acquisition Function
Given the training data [y_k, x_k], Equation (S8) gives us the prior distribution y_l ∼ N(µ_0, Σ_0) as the surrogate. This prior and the given dataset induce a posterior: the acquisition function, denoted A : X → R^+, determines the point in X to be evaluated through the proxy optimization x_best = arg max_x A(x). The acquisition function depends on the previous observations, which can be represented as A = A(x; (x_l, y_l), θ). Taking our previous notation, the new observation is probed through the acquisition 17 , where the input space contains the evaluation of design variables at n points: X_l := (x_1, x_2, ..., x_l). In our case, X is acquired by running n material evaluations. We pick the GP Upper Confidence Bound (GP-UCB) as the acquisition function, exploiting the lower confidence bounds (in the case of minimizing the objective function) to construct the acquisition and minimize the regret. GP-UCB takes the form A(x) = µ(x) + κσ(x), where κ is a tunable parameter balancing exploitation and exploration when constructing the surrogate model. We take κ = 2.5 as the default value in the model. Combining GPR and the acquisition function, the surrogate model can be constructed to approximate the minimum value in the design space. In our case, such BO methods are applied to discover elemental compositions with the targeted properties. For both material optimizations, we randomly explored the design space for 5000 material evaluations, repeated 5 times with different random seeds.
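A minimal sketch of the GP-UCB acquisition follows, written in the maximization form µ + κσ (for minimization one would use the lower confidence bound µ − κσ, as noted above); the posterior means and standard deviations are invented stand-ins for a surrogate's output.

```python
import numpy as np

def gp_ucb(mu, sigma, kappa=2.5):
    """GP-UCB acquisition A(x) = mu(x) + kappa * sigma(x).
    kappa trades off exploitation (high mean mu) against
    exploration (high posterior uncertainty sigma)."""
    return mu + kappa * sigma

# Posterior over 3 candidate compositions (illustrative values only).
mu = np.array([0.2, 0.8, 0.5])      # surrogate means
sigma = np.array([0.9, 0.1, 0.4])   # surrogate standard deviations
x_best = int(np.argmax(gp_ucb(mu, sigma, kappa=2.5)))
print(x_best)  # -> 0: the high-uncertainty candidate wins at kappa = 2.5
```

With κ = 2.5 the poorly-explored candidate (large σ) is chosen even though its mean is lowest, which is precisely the diverse-sampling behavior attributed to BO in the Discussion; setting κ = 0 would instead pick the best mean.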

Methodology Generalization
Although different methods may employ totally different notations and problem representations depending on their optimization techniques, general metaheuristic methods can, in a non-rigorous sense, be generalized as a "heuristic sampling" technique for the design space. To symbolize this, assume a design space maps our design variables Θ (recall we parametrize the elemental graph in Equation (1)) to our objective values. We generalize metaheuristic optimizations as sampling a set of points (or particles) of dimension d, whose locations are represented as X_1^d, X_2^d, ..., X_i^d, ..., X_m^d if we initialize the search with m particles. This is how we perceive the metaheuristic optimization process shown in Equation (5), where we use R to denote the update strategy, which differs between metaheuristic methods employing different prior information. Note that we propose this merely to simplify the representation; we do not claim to use this notation for all the further methods.
Here, we roughly classify the employed metaheuristic optimization methods into two categories: nature-inspired algorithms and mathematics-inspired algorithms. Of course, one may further categorize simulated annealing as physics-inspired (or materials-inspired), and the genetic algorithm, particle swarm optimization, etc., as biology-inspired algorithms. For the sake of simplicity and clarity, we do not proceed further. In most implementations, we pick the default hyperparameters from the library or previous literature to limit bias from hyperparameter tuning.

Genetic Algorithm
Genetic algorithms have recently seen rapidly growing application to elemental composition design. The genetic algorithm was originally inspired by the mutation and reproduction of genes, and it mimics these fundamental mechanisms as an optimization process. The process involves representing the design variables, denoted θ, as a set of "chromosomes," denoted C. The chromosomes can be written as C = [n_atom, ξ_n, η], or C ≡ [Θ].
The population, denoted N_pop, is a randomly chosen assortment of chromosomes, such that N_pop = [C_1, C_2, ...]. For instance, a chromosome containing five binary genes can be represented as [10011]. The population then undergoes "crossover" and "mutation" operations to generate the next generation of chromosomes, so as to maximize or minimize a given "fitness function," denoted J.
The probability of selecting a chromosome for reproduction is determined by its fitness value. Specifically, the probability of choosing the i_j-th chromosome, denoted C_{i_j}, is given by p(C_{i_j}) = f(C_{i_j}) / Σ_j f(C_j), where f(C_{i_j}) represents the fitness value of the chromosome and the sum runs over the N_pop chromosomes in the population. Here, f(C_{i_j}) ≡ MEGNet(Θ), where the MEGNet material property evaluations are defined as the fitness function(s).
In the context of our generalization, one can treat each chromosome as a particle X_i^d and N_pop as the initial set of particles. The mutation and selection of chromosomes are analogous to the update of these particles; the hyperparameters in the update, together with the selection probability, determine R.
By iteratively applying crossover and mutation operations to the population and selecting chromosomes based on their fitness values, the genetic algorithm is able to identify the most promising set of design variables that maximize or minimize the fitness function. For the hyperparameters, in the single- and multi-element cases we pick maximum iteration numbers of 25 and 75, respectively, a population size of 10, a mutation probability of 0.3, a crossover probability of 0.5, a parents portion of 0.5, and a uniform crossover type.
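The fitness-proportional selection and mutation steps can be sketched as follows; the binary encoding, fitness values, and 0.3 mutation probability mirror the description above, but the function names and toy data are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def roulette_select(fitness, n_parents, rng=rng):
    """Fitness-proportional (roulette-wheel) selection:
    p(C_i) = f(C_i) / sum_j f(C_j)."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), size=n_parents, p=p)

def mutate(chromosome, p_mut=0.3, rng=rng):
    """Flip each binary gene independently with probability p_mut."""
    mask = rng.random(len(chromosome)) < p_mut
    return np.where(mask, 1 - chromosome, chromosome)

# Toy fitness values standing in for MEGNet property evaluations.
fitness = [1.0, 3.0, 6.0]
parents = roulette_select(fitness, n_parents=4)  # indices of chosen parents
child = mutate(np.array([1, 0, 0, 1, 1]))        # e.g. the [10011] chromosome
```

The chromosome with fitness 6 is drawn 60% of the time on average, so repeated generations concentrate sampling on high-fitness compositions, consistent with the repeated-sampling behavior of GA noted in the Discussion.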

Particle Swarm Optimization
Particle Swarm Optimization (PSO) is another evolutionary optimization algorithm that can be used for materials design. PSO starts with a set of randomized design variables (similar to chromosomes in GA) and propagates these points according to certain rules. The velocity and position in the d-th dimension of the i-th particle are denoted V_i^d and X_i^d, respectively. The update of these variables is given by V_i^d ← V_i^d + c_1 rand1_i^d (pbest_i^d − X_i^d) + c_2 rand2_i^d (gbest^d − X_i^d) and X_i^d ← X_i^d + V_i^d. Here, rand1_i^d and rand2_i^d are random numbers between 0 and 1, c_1 and c_2 are acceleration coefficients, and pbest_i^d and gbest^d are the best positions of the i-th particle and the swarm, respectively, in the d-th dimension.
Liang et al. 30 proposed a new formulation improving traditional PSO by changing the velocity update. Here, f_i = [f_i(1), f_i(2), f_i(3)] defines which particles' pbest particle i should follow. Additionally, c_local is a local acceleration coefficient, w is the inertia weight, and flag_i represents the number of generations for which the i-th particle has not improved on its pbest position. In our PSO implementations, for both the single- and multi-element design cases, the population size is 10, c_local equals 1.2, the parameter flag_i equals 7, the range of the weight is w ∈ [0.4, 0.9], and the portion of parents is 0.5.

Hybrid Grey Wolf Optimization
Grey wolf optimization (GWO) is a swarm intelligence algorithm inspired by the hunting dynamics of grey wolves, originally proposed by Mirjalili et al. 45 . To explore the design space, GWO starts with a group of particles representing initial designs, denoted X_i = [n_atom, ξ_n, η] for the i-th design, with the position represented by X (Equation (5)). The GWO algorithm updates the population by mimicking the hunting behavior of grey wolves: the algorithm comprises a pack of wolves, each representing a solution to the optimization problem. The original GWO updates the position of the i-th wolf as the average of three candidate positions X_1 = X_α − A_1 D_α, X_2 = X_β − A_2 D_β, and X_3 = X_δ − A_3 D_δ, with D_α = |C_1 X_α − X_i| (and analogously for β and δ), where X_i is the position of the i-th wolf, X_α, X_β, and X_δ are the positions of the α, β, and δ wolves, respectively, and A_1, A_2, A_3 are random coefficients. Here, both the coefficients A = [A_1, A_2, A_3] and C = [C_1, C_2, C_3] are calculated as A = 2a r_1 − a and C = 2 r_2, where the components of a are linearly decreased from 2 to 0 during the material evaluations and r_1 and r_2 are random vectors in [0, 1]. These randomization settings make GWO a pseudo-hyperparameter-free optimization method, as one does not need to set hyperparameters to update evaluations. However, while the original GWO effectively explores potential areas in a large search space, its exploitation capability can be improved. To address this, Obadina et al. 31 proposed combining GWO with the whale optimization algorithm (WOA), using the logarithmic spiral equation from WOA to enhance GWO's exploitation capability. This hybrid algorithm, denoted GWO-WOA, updates the position of the i-th wolf in the same way as the original GWO, but with different forms for the two parameters D_α and a. For the parameter D_α of the alpha wolf, p_1, Y, and ρ are random numbers within [0, 1], l is a random number in [−1, 1], and b is a constant defining the logarithmic spiral shape in WOA, which in our case equals 1. As mentioned in the vanilla GWO settings, a is linearly decreased from 2 to 0 during the material evaluations.
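The vanilla GWO update (averaging the pulls toward the α, β, and δ wolves, with A = 2a r₁ − a and C = 2 r₂) can be sketched as below; this is the original update only, not the GWO-WOA hybrid, and the function name and toy initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def gwo_step(X, X_alpha, X_beta, X_delta, a, rng=rng):
    """Vanilla GWO update: each wolf averages three candidate moves,
    one toward each leader, with A = 2 a r1 - a, C = 2 r2 redrawn
    per leader, and D = |C * leader - X|."""
    new_X = np.empty_like(X)
    leaders = (X_alpha, X_beta, X_delta)
    for i, x in enumerate(X):
        candidates = []
        for leader in leaders:
            A = 2 * a * rng.random(x.shape) - a
            C = 2 * rng.random(x.shape)
            D = np.abs(C * leader - x)
            candidates.append(leader - A * D)
        new_X[i] = np.mean(candidates, axis=0)
    return new_X

X = rng.random((5, 3))                 # 5 wolves over 3 design variables
alpha, beta, delta = X[0], X[1], X[2]  # three best solutions so far
for t in range(10):
    a = 2.0 * (1 - t / 10)             # components of a decay from 2 to 0
    X = gwo_step(X, alpha, beta, delta, a)
```

As a decays to 0, the coefficients A shrink, so the pack converges onto the mean of the three leaders: the same exploration-to-exploitation transition described in the text.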

Figure 1 .
Figure 1. The general overview of the materials design process. (A) Schematic for defining the intended industrial application. (B) The process of identifying the desired material properties, such as high bulk modulus and low Fermi energy. (C) The schematic for formulating a digital design optimization problem using numerical models to discover or design elemental compositions with the desired properties. (D) Different optimization methods use exploration/exploitation strategies to generate a design space for exploring the best materials. (E) The optimal elemental compositions are extracted from the design space exploration and are ready for real-world applications. In this study, we focus on formulating an optimization framework that enables the application of the most popular current optimization methods to search for desired elemental compositions, and we compare the different optimizations based on specific metrics. Finally, the "mined" elemental compositions are analyzed.

Figure 2 .
Figure 2. Schematic for optimization benchmarking and the different optimization algorithms employed in the study. (A) Workflow for discovering elemental compositions by combining graph neural network (GNN) surrogate models with defined design variable randomization to select optimal elemental compositions with targeted material properties. See Section 2 & ESI for details. (B) Schematic for optimization through metaheuristic algorithms, starting with a set of random particles (or a population) sampled in the design space, which iteratively update and eventually cluster around globally optimal values, indicating the search for elemental compositions with desired properties. (C) Bayesian optimization methods for elemental composition discovery, which seek to build a surrogate model of the design space by employing Gaussian process regression and search for the next material evaluations based on an acquisition function. Optimal elemental compositions can be extracted from the design space surrogate.

Figure 4 shows the benchmarks of the 11 different inverse optimization algorithms (IOAs): their objective property explorations (Column A), the distribution of the material evaluation numbers with the corresponding mean values (Column B), and the distributions of scanned density with overall mean values (Column C), for both the single-element (Row 1) and multi-element (Row 2) composition design. In the objective distributions, different IOAs are marked in different colors, and the corresponding optimization methods are abbreviated. Here, the materials count represents the total number of unique elemental compositions scanned by the different IOAs. The mean densities of the objective spaces are computed to estimate the repetitive evaluations of the same elemental compositions by computing the mean value of the density scan space (see Section 2 & ESI).

Figure 4 & Figure 5 .
Figure 4. Objective properties exploration, objective value evaluation, and targeted design space count for benchmarking different optimization methods for single- and multi-element composition design. (A1) Overall design space benchmarking for the 11 different optimization schemes used, marked in different colors, for the single-element composition design (for the projections in 2D, please refer to Figure S6). (B1) The distribution (for the 5 repeated experiments) and mean values of the overall material counts during the evaluation processes. (C1) The distribution (for the 5 repeated experiments) and mean values of the scanned material evaluation densities (see Section 2 & Density Scan in ESI for the calculation of "Mean Density") during the evaluation processes. (Related details can be found in the analysis of Figure S7.) (A2) Overall design space benchmarking for the multi-element composition design (refer to Figure S6 for details). (B2) The distribution and mean values of the overall material counts. (C2) The distribution and mean values of the scanned material evaluation densities. The white triangular dots are the mean values for the 11 evaluations in both (B) & (C). The corresponding numbers are marked on top of the bar plots for both (B) & (C). BO, GA, PSO, GWO, HIWOA, ACO, DE, SA, BOA, AOA, & RUN denote Bayesian Optimization, Genetic Algorithm, Particle Swarm Optimization, Hybrid Grey Wolf Optimization, Hybrid-Improved Whale Optimization Algorithm, Ant Colony Optimization, Differential Evolution, Simulated Annealing, Bees Optimization, Arithmetic Optimization, and Runge Kutta Optimization, respectively.

Figure 6 .
Figure 6. Extracted elemental composition evaluations for the optimization benchmarking. (A) The top ten elemental compositions extracted from the single-element design case. (B) The top ten elemental compositions extracted from the multi-element composition design cases. (C) The top ten chemical compounds extracted from the multi-element composition design case. (D) The normalized frequency (see Additional Results in ESI for extended analysis of the frequency evaluations of the optimizations) for different extracted elemental compositions in the single-element composition design cases. The top subfigure dots stand for the projection of the normalized frequency in the range of [1, 2.5]. (E) The normalized frequency for the multi-element composition design cases. The top subfigure dots stand for the projection of the normalized frequency in the range of [10, 45].

FIGURE S2 .
FIGURE S2. The distribution of the materials count over different objective values for the five different experiments, in which the single-element composition design experiments are shown in A and the multi-element case in B.

FIGURE S3 .
FIGURE S3. Some example numerical experiments showcasing results from Bayesian optimization and Runge Kutta optimization. Subfigures A1 & B1 are the objective space evaluations for the single-element composition design for BO and RUN; Subfigures A2 & B2 are the objective space evaluations for the multi-element composition design for BO and RUN. Subfigures C1 & D1 are the top ten element compositions extracted for the single-element composition design. Subfigures C2 & D2 are the top ten element compositions extracted for the multi-element composition design.

FIGURE S5 .
FIGURE S5. Example cases of materials design using the deep reinforcement learning methods. (A) Single-element composition design case with 110 evaluations. (B) Multi-element composition design case with 300 evaluations.
The lower and upper bounds of the design variables are given, and d indicates that the problem is d-dimensional.

Table 1 .
The variances for the five repeated experiments of the different optimization algorithms for single- and multi-element composition design, expressed with single-digit precision. Note that MC denotes "Materials Count" and MDS stands for "Mean Density Scan". The four columns correspond to Figure 4 B & C. For better presentation, we rescale the variances of MDS by a factor of 100 (indicated in the form ×10⁻²).

Table 2 .
The mean objective values and variances for the five repeated experiments of the different optimization algorithms for single- and multi-element composition design, expressed with single-digit precision. Note that MO denotes "Mean Objective" and Vr stands for "Variances". The four columns correspond to Figure 5 A. For better presentation, we rescale the variances by a factor of 100 (indicated in the form ×10⁻²).

Table 3 .
Assessment and conclusions from the comparison of the benchmarked optimization methodologies. Note that the (dis)advantages concluded from the previous results may be subjective. Note that "Low selectivity" for RUN indicates that the variance is relatively low compared with DE and AOA.