Restricted Global Optimization for QAOA

The Quantum Approximate Optimization Algorithm (QAOA) has emerged as a promising variational quantum algorithm for addressing NP-hard combinatorial optimization problems. However, a significant limitation lies in optimizing its classical parameters, which is in itself an NP-hard problem. To circumvent this obstacle, initialization heuristics, enhanced problem encodings and beneficial problem scalings have been proposed. While such strategies further improve QAOA’s performance, they still rely solely on local optimizers. We show that local optimization methods are inherently inadequate within the complex cost landscape of QAOA. Instead, global optimization techniques greatly improve QAOA’s performance across diverse problem instances. While global optimization generally requires high numbers of function evaluations, we demonstrate that restricted global optimizers still perform better without requiring an excessive number of function evaluations.


Introduction
Quantum algorithms like Shor's algorithm [1] and the Harrow-Hassidim-Lloyd (HHL) algorithm [2] show exponential advantages over their classical counterparts. Since such quantum algorithms demand fault-tolerant quantum computers, variational quantum algorithms, which combine parameterized quantum circuits and classical optimizers, have emerged for currently available Noisy Intermediate-Scale Quantum (NISQ) devices. One of these variational quantum algorithms is the Quantum Approximate Optimization Algorithm (QAOA), which is considered a promising candidate to show quantum advantage for NP-hard combinatorial optimization problems. While extrapolations suggest a quantum speedup for problem instances that require several hundreds of qubits [3], the main limitation lies in the exponential cost of optimizing the classical parameters of this quantum algorithm, which has been shown to be an NP-hard problem itself [4]. The cost landscape of QAOA exhibits wide flat areas and large numbers of local minima [5], which are especially disadvantageous for local optimizers. To overcome this issue, heuristic strategies for the classical optimization as well as favorable problem formulations and encodings have been studied [6][7][8][9][10]. While such strategies enhance the performance of QAOA, further improvements are crucial in order to apply QAOA to real-world optimization problems like the Unit Commitment (UC) problem [11], the Traveling Salesperson (TSP) problem [12] or the Factory Layout (FL) problem [13].
In classical optimization, it is well known that optimization algorithms based solely on local information are generally not adequate for solving NP-hard problems [14]. Instead, global optimization algorithms that combine multipoint global search and problem oriented local search have been well established for such problems [15,16].
In this work we show why current approaches with local optimizers show poor performance, even for simple problem instances. While a current line of research aims to improve the overall structure of the cost landscape [6,7], we argue that such improvements cannot resolve the drawbacks caused by the use of local optimizers. We therefore introduce global optimizers and show that they greatly improve the performance of QAOA on various problem instances and sizes, at the cost of a higher number of function evaluations.
Our work is structured as follows: In Section 2 we motivate the need for better optimization techniques and discuss current approaches. In Section 3 we briefly discuss QAOA and the problem formulations used in this work. In Section 4, we show how the cost landscape of general problem formulations changes with both the selected encoding parameters and the general parameter choices of QAOA. In Section 5 we motivate the use of global optimizers on the basis of a simple example and benchmark their performance on three use cases in Sections 6 and 7. Finally, we summarize our findings in Section 8.

Related Works
To overcome the difficulty of optimizing the classical parameters of QAOA, various optimization strategies as well as favorable problem encodings and formulations have been proposed. One line of research focuses on improving the structure of QAOA's cost landscape in order to reduce its difficulty for local optimizers. Brandhofer et al. show how different factors like scaling parameters, encoding strategies and mixer Hamiltonians can influence the shape of the cost landscape and therefore influence optimization performance [6]. Although many enhancements have been proposed, this line of research does not consider the underlying problem of optimizing QAOA's classical parameters with local optimizers.
Another line of research proposes heuristics for parameter initialization in order to minimize the effort of classical optimization. Nakanishi et al. combine such a heuristic with a layer-wise training approach [8], where the layers of QAOA are optimized one after the other, leading to improved results at the cost of a higher number of optimization steps. Zhou et al. also propose a heuristic for a layer-wise initialization [7], while Sack et al. propose adapting concepts from Quantum Annealing to the initialization of classical parameters [9]. A similar approach is the idea of parameter transfer, where the classical parameters of QAOA are first optimized for a simpler problem instance and the resulting parameters are then used for the initialization of harder problem instances [10,17,18]. Instead of improving optimization results of QAOA by enhanced encodings or heuristics, various works have tried to replace the local optimizers by other optimization approaches such as reinforcement learning [19]. Closest to our claims are the works of [20] and [21]. Acampora  Besides these approaches, many concepts like warm-starting [22] have been introduced, which improve the solution quality of QAOA without addressing the difficulty of optimizing the classical parameters with local optimizers.

Preliminaries
The cost landscape of hybrid quantum-classical optimization algorithms like QAOA has been subject to detailed analyses. The specific shape changes with the problem at hand and its encoding into a Hamiltonian. To start from a common basis, Section 3.1 introduces the general structure of QAOA and Section 3.2 the general form of the problem formulations used in this work.

QAOA
The aim of the optimization algorithm is the preparation of a state |ψ(β, γ)⟩ that encodes the solution to the given problem, where β and γ are parameter vectors that are optimized by a classical optimizer. The output state is prepared from an equal superposition by an Ansatz that consists of layers. Each of them depends on the values of β_i and γ_i, where i denotes the index of the layer. The layers are repeated p times, with p being small for implementations on NISQ devices. Each layer Ĥ_layer consists of two parts, generated by the cost and the mixing Hamiltonians Ĥ_C and Ĥ_M respectively:

$$\hat{H}_{\mathrm{layer}}(\beta_i, \gamma_i) = e^{-i \beta_i \hat{H}_M} \, e^{-i \gamma_i \hat{H}_C} \tag{1}$$

Here, β_i parameterizes the mixer term, while γ_i parameterizes the cost term.
The complete circuit can be written as

$$|\psi(\beta, \gamma)\rangle = \hat{H}_{\mathrm{layer}}(\beta_p, \gamma_p) \cdots \hat{H}_{\mathrm{layer}}(\beta_1, \gamma_1) \, |+\rangle^{\otimes n} \tag{2}$$

For all cases considered in this work, the mixer Hamiltonian is defined simply as

$$\hat{H}_M = \sum_{i=1}^{n} \hat{\sigma}^x_i \tag{3}$$

There are several other options for choosing the mixer Hamiltonian to improve performance [6], but in this work we want to focus on the general drawbacks of local optimizers and therefore only consider this simplified case.
The cost Hamiltonian, on the other hand, encodes the given problem and can be constructed from a quadratic unconstrained binary optimization (QUBO) formulation via the Ising model. The cost Hamiltonian is constructed as

$$\hat{H}_C = \sum_{i<j} J_{ij} \, \hat{\sigma}^z_i \hat{\sigma}^z_j + \sum_{i} h_i \, \hat{\sigma}^z_i \tag{4}$$

with J_ij being the factors of the quadratic terms of the i-th and j-th element and h_i the factors of the linear terms of the Ising model. The circuit constructed in this way generates states depending on the parameters β and γ. The quality of a given state depends on the expectation value of the cost function, denoted as C(β, γ). This cost function is minimized by the classical optimizer.
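As a minimal sketch of how C(β, γ) can be evaluated classically for small instances, the following NumPy code simulates the circuit as a state vector. It assumes the standard transverse-field mixer and an Ising cost Hamiltonian with couplings stored in the upper triangle of J; the function names `ising_energies` and `qaoa_expectation` are illustrative and are not taken from this work.

```python
import numpy as np

def ising_energies(J, h):
    """Energy of every basis state for sum_{i<j} J_ij z_i z_j + sum_i h_i z_i."""
    n = len(h)
    # Row k holds the bits of basis state k; bit 0 maps to z=+1, bit 1 to z=-1.
    bits = (np.arange(2**n)[:, None] >> np.arange(n)) & 1
    z = 1 - 2 * bits
    return np.einsum("ki,ij,kj->k", z, np.triu(J, 1), z) + z @ np.asarray(h, float)

def qaoa_expectation(betas, gammas, energies):
    """Expectation value of the cost Hamiltonian after p = len(betas) layers."""
    n = len(energies).bit_length() - 1
    psi = np.full(2**n, 2.0 ** (-n / 2), dtype=complex)  # equal superposition
    for beta, gamma in zip(betas, gammas):
        # Cost layer: diagonal in the computational basis.
        psi = np.exp(-1j * gamma * energies) * psi
        # Mixer layer: exp(-i beta X) applied to every qubit.
        rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                       [-1j * np.sin(beta), np.cos(beta)]])
        psi = psi.reshape([2] * n)
        for q in range(n):
            psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
        psi = psi.reshape(-1)
    return float(np.real(np.vdot(psi, energies * psi)))
```

With γ = 0 the state stays in the equal superposition, so the expectation value equals the mean energy over all basis states, which is a convenient sanity check for such a simulator.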

Problem Formulation
The problem formulations of interest consist of an objective function and a set of constraints for a binary (or integer) optimization problem. In order to encode this problem formulation into the cost Hamiltonian for QAOA, it needs to be converted into a QUBO formulation. A general guide on doing this can be found in [23]. In this work the cost function of a given QUBO will have the following general form:

$$H_{\mathrm{QUBO}}(x) = s \left( H_{\mathrm{cost}}(x) + \sum_{j=1}^{n_{\mathrm{pen}}} P_j \, H_{\mathrm{pen},j}(x) \right) \tag{5}$$

Here x_i refers to the binary variables of the optimization problem. H_cost and H_pen,j are the functions encapsulating the cost and the n_pen penalties. The factor s, as described in [17], is a scaling factor that is chosen such that it ensures numerical stability in the optimization process (ref. Section 4.1). Lastly, P_j scales the respective penalty term to change the influence it has on the cost landscape, as shown in [6,12,24] (ref. Section 4.2). A detailed description of the problem formulations of the UC, TSP and FL problems can be found in Appendix A.
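The step from a QUBO to the Ising coefficients J_ij and h_i can be sketched as follows, using the substitution x_i = (1 - z_i)/2. The helper name `qubo_to_ising` and the convention that the QUBO is given as a matrix Q with cost x^T Q x are assumptions made for illustration; the returned constant offset is the term that is dropped when the cost Hamiltonian is built.

```python
import numpy as np

def qubo_to_ising(Q):
    """Map x^T Q x (x in {0,1}) to sum_{i<j} J_ij z_i z_j + sum_i h_i z_i + offset."""
    Q = np.asarray(Q, dtype=float)
    W = np.triu(Q + Q.T, 1)            # combined pair weights w_ij for i < j
    J = W / 4                          # quadratic Ising coefficients
    h = -np.diag(Q) / 2 - (W + W.T).sum(axis=1) / 4  # linear Ising coefficients
    offset = np.diag(Q).sum() / 2 + W.sum() / 4      # constant, not needed by QAOA
    return J, h, offset
```

Enumerating all bit strings of a small instance and comparing both expressions is a cheap way to check such a conversion before building a circuit from it.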

Performance Metrics
The performance of an optimization algorithm depends on two main factors: the quality of the solution found and the computational cost of this solution. The latter can be easily quantified in our setting as the number of cost function evaluations (which corresponds to the number of circuit executions). Therefore, the number of function evaluations will be the measure of how costly it is to obtain a given solution. The quality of the solution is measured by the expectation value of the cost function divided by the cost of the minimal solution, that is, the normalized cost C(β, γ)/C_min [18]. This metric has the benefit that it does not depend on the absolute value of the cost function. Because the constant offset is dropped in the construction of the cost function, the minimal cost will always be negative, so the optimal value is 1. Given these performance metrics, an optimizer should maximize the normalized cost while minimizing the number of function evaluations.
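The normalized cost can be written as a one-line helper. This is only a sketch; the name `normalized_cost` and the explicit check on the sign of C_min are illustrative choices.

```python
def normalized_cost(expectation, c_min):
    """Return C(beta, gamma) / C_min, where 1 corresponds to the optimal solution."""
    if c_min >= 0:
        # The metric relies on a negative minimal cost (constant offset dropped).
        raise ValueError("c_min is expected to be negative")
    return expectation / c_min
```

For example, an expectation value of -3 on an instance with C_min = -4 gives a normalized cost of 0.75.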

Cost Landscape
The shape of the cost landscape for a given problem depends on a set of parameters that are influenced both by the specific problem at hand and the characteristics of the algorithm being used. These parameters greatly influence the performance of classical optimization methods and require a thorough examination of their impact on the underlying cost landscape. As we show in Section 5, even for well designed QAOA cost landscapes, optimization (especially for local optimizers) remains difficult. To obtain such well designed QAOA cost landscapes, the effects of various parameters will be studied in the following.
Problem dependent parameters like the power demand L of the UC problem or the adjacency matrix D of the TSP and FL problems are directly tied to the problem. Other parameters, like the scaling factor s or the penalty factors P_j, have to be chosen such that stable and successful optimization is achieved. On the other hand, algorithm dependent parameters are (partly) independent of the problem instance and influence the characteristics of the QAOA cost landscape as well. In this work only the qubit number n and the layer count p are investigated, dropping the additional degrees of freedom stemming from the choice of mixer Hamiltonian and enhanced encoding strategies.
The visualization of the cost landscape for a given problem formulation for QAOA with one layer is straightforward: The parameters β and γ are used as x and y axes respectively, and the expectation value of the cost function C(β, γ) is used as the z axis (ref. Figure 1). In order to plot higher dimensional cost landscapes we follow the ideas of [25], where two random vectors (θ_1 and θ_2) serve as the axes needed to visualize parts of the cost landscape.
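A two-dimensional slice along two random directions can be computed as in the sketch below. The function name `landscape_slice`, the normalization of the direction vectors and the default span are assumptions for illustration; `cost` stands for any callable that maps a parameter vector to a cost value.

```python
import numpy as np

def landscape_slice(cost, center, n_points=25, span=np.pi, seed=0):
    """Evaluate cost on the plane center + u * theta_1 + v * theta_2."""
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=float)
    # Two random directions in parameter space, normalized to unit length.
    theta_1, theta_2 = rng.normal(size=(2, center.size))
    theta_1 /= np.linalg.norm(theta_1)
    theta_2 /= np.linalg.norm(theta_2)
    axis = np.linspace(-span, span, n_points)
    grid = np.array([[cost(center + u * theta_1 + v * theta_2) for u in axis]
                     for v in axis])
    return axis, grid
```

The returned grid can be passed to any surface or contour plotting routine, with `axis` used for both plot axes.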
A characteristic example of a well designed cost landscape of QAOA with a single layer for a four unit UC problem instance is depicted in Figure 1a: Due to the simple construction of the mixer Hamiltonian as described in [9], a clear periodicity in the direction of β can be seen. This is not the case in the direction of γ, where (depending on the Hamiltonian) many different frequency components overlap [26]. Generally, the detection of the combined period for arbitrary Hamiltonians is unlikely. Again, this can be seen in Figure 1a. Since the periodicity of γ for the Hamiltonians considered in this work is unknown, strategies that use parameter initializations similar to an annealing schedule, as proposed in [9], cannot be applied.
The similarity of the positions of local minima in the cost landscapes depicted in Figure 1 suggests applying the knowledge of a minimum found in one problem instance to other problem instances. This includes the concepts from [10,17,27], where this strategy is successfully demonstrated. These concepts have the caveat that they need prior knowledge of solutions found for similar problems, so the initial problem of finding a good first solution remains the same, motivating the use of global optimizers.

Effect of Scaling Factor s
The magnitude of s (ref. Equation 5) has a direct effect on the scaling of the coefficients in the cost Hamiltonian. As described by [17], this stretches or compresses the cost landscape with respect to γ when s is smaller or greater than 1, respectively. The comparison between Figures 1a and 1c illustrates this phenomenon. The change in s by a factor of 10 compresses the landscape spanning from 0 to 2π down to 0 to around 0.4. Additionally, the magnitude of C(β, γ) also changes, as the calculation of the cost function also depends on s. The resulting states and their corresponding solutions are not changed, so the minima remain the same and are merely shifted by this parameter.
As the effectiveness of standard optimization algorithms often deteriorates if the variables do not have the same order of magnitude, s should be chosen such that all variables fulfill this requirement. In Figure 1c a further increase of s would result in a cost landscape where small changes in γ would have an even greater effect, making convergence difficult. In this work, the rescaling heuristic suggested in [17] is used to enhance the cost landscape, where s is chosen such that the absolute mean weight of the coefficients is scaled to 1. This ensures that at least one minimum lies in the range γ ∈ [0, 2π], but also results in a trade-off between excluding better minima from the search space on one side and a smaller, but limited parameter space on the other side.
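One possible reading of this rescaling rule is sketched below. It assumes that the "coefficients" are the nonzero Ising weights J_ij (i < j) and h_i; which terms enter the mean in [17] is not specified here, so this choice and the name `mean_abs_scale` are assumptions.

```python
import numpy as np

def mean_abs_scale(J, h):
    """Return s such that the mean absolute nonzero weight of s*J and s*h is 1."""
    J = np.asarray(J, dtype=float)
    h = np.asarray(h, dtype=float)
    # Collect pair weights from the upper triangle together with the linear weights.
    weights = np.concatenate([J[np.triu_indices_from(J, k=1)], h])
    weights = weights[weights != 0]
    return 1.0 / np.mean(np.abs(weights))
```

Multiplying J and h by the returned factor leaves the ranking of all solutions unchanged and only rescales the landscape along γ and in magnitude.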

Effect of Penalty Factors P j
The penalty factors affect the number and size of the local minima: With rising values of P_j, more local minima are placed in the same segment of the parameter space. This is due to the increased value an invalid solution adds to the expectation value of the solution, as states representing wrong solutions become less desirable. This effect is depicted in Figures 1a and 1b, where the increased P_j results in a rougher surface. When choosing the value of P_j there is again a trade-off: If P_j is too low, invalid solutions will become optimal (in terms of H_cost). If P_j is too large, the cost landscape becomes difficult to optimize in. Additionally, the relative difference between valid and optimal solutions decreases in comparison to the overall values of the cost landscape, so valid and optimal solutions become increasingly similar.
Works such as [6,24] have proposed methods to choose a favorable penalty factor for a given problem. In this work we follow the strategy suggested in [6], where P_j is iteratively increased by small values until the cheapest wrong (invalid) solution becomes at least as expensive as the optimal solution. A short synopsis of this method can be found in Appendix B.3. In real applications this strategy is not feasible, as it requires knowledge of the full cost structure of the problem.
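The iterative strategy can be sketched by brute force for small instances with a single penalty term. The function name `choose_penalty`, the step size and the callables `h_cost` and `h_pen` are illustrative assumptions; the enumeration of all 2^n bit strings also makes visible why the approach does not scale to real applications.

```python
from itertools import product

def choose_penalty(h_cost, h_pen, n, step=0.25, max_steps=100_000):
    """Increase P until no invalid bit string is cheaper than the best valid one."""
    states = list(product((0, 1), repeat=n))
    valid = [x for x in states if h_pen(x) == 0]
    invalid = [x for x in states if h_pen(x) != 0]
    best_valid = min(h_cost(x) for x in valid)
    for k in range(max_steps):
        P = k * step
        # Stop once the cheapest invalid solution costs at least as much as the optimum.
        if all(h_cost(x) + P * h_pen(x) >= best_valid for x in invalid):
            return P
    raise RuntimeError("no suitable penalty factor found within max_steps")
```

For a toy cost x_1 + 2 x_2 + 3 x_3 with the constraint that exactly one variable is set, the all-zero string is the cheapest invalid solution, and the loop stops at P = 1.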

Effect of the Problem Dependent Variables
In contrast to s and P_j, problem dependent variables such as the power demand L in the UC problem cannot be chosen freely, as they are part of the problem at hand. The positions of the minima of the cost landscape move depending on such values. To illustrate such changes, the power demand L of the UC problem has been varied between Figures 1a and 1d. While the underlying structure of the cost landscape remains the same, the distinct minima change. This also shows the difficulty of applying parameter transfer [10,17,27] for such problem instances, where some of the minima are greatly shifted or even disappear for varying problem dependent variables.

Figure 2: [...] 1a). Subfigures 2a and 2b show the effect more qubits have on the landscape. Subfigures 2c and 2d show the impact of an increase in layers, with the axes being random vectors in the parameter space.

Effect of Qubit Number
Increasing the number of qubits moves the archetype of the cost landscape from a turbulent shape with many local minima (Figure 1a) to the shape of a barren plateau. Here, local minima occupy a smaller parameter subspace, while wide planes show low variance in the value of the cost function, making optimization in this domain difficult. This phenomenon can be seen in both Figures 2a and 2b, where the number of minima drastically decreases at the cost of wide planes without any visible structure.

Effect of Layer Number
The effect of the number of layers is the hardest to assess with the method of visual analysis used here, as a higher number of layers results in higher dimensionality. Therefore, only limited information can be extracted from this comparison. The analysis shows that the cost landscape exhibits an increased number of local minima, which is in line with the hypothesis from [28]. An increase in the number of free parameters makes the process of optimization increasingly complex. The two examples shown in Figures 2c and 2d demonstrate this increased complexity.

Global optimization
Global optimization aims to find the best solution within the entire feasible search space. The search space encompasses all possible solutions of the problem and global optimization attempts to identify the global minimum by considering all variable combinations and constraints. However, global optimizers are often limited to finding the minimum in a designated area, so there is no guarantee of finding the global minimum.
On the other hand, local optimization focuses on finding the best solution within a specific region of the search space, usually centered around an initial guess or starting point. Unlike global optimization, local methods analyse the local behavior of the cost function near the starting point.
While local optimization methods are generally faster and more efficient than global optimizers, as they do not explore the entire search space, they can get trapped in local minima. Additionally, their effectiveness is greatly dependent on their initialization.
Often there is an interplay between both types, as global optimizers can use local ones to improve the potential solutions found. There is no clear cut that assigns every algorithm to one of the two groups. Methods like the Univariate Marginal Distribution Algorithm (UMDA) [29] (considered a local optimizer in this work) could be argued to also sample a part of the parameter space, which is a feature of global methods. A brief overview of all local and global optimizers used and their categorization can be found in Appendix B.1 and B.2.
In the following we illustrate the drawbacks of local optimization for QAOA and motivate the use of global optimizers with a simple example. In Figure 3 the cost landscape of a UC problem instance (previously depicted in Figure 1a) can be seen. This cost landscape has been designed and optimized by the methods proposed by [6,17], as shown in Section 4, and therefore represents a best case. A local optimization algorithm (represented here by the Nelder-Mead (NM) optimizer) is initialized at four different initial points and run until convergence. One can clearly see that the local optimizer gets trapped in local minima, even for such a simple problem instance. The effectiveness of local optimizers, compared to global optimizers, is therefore greatly dependent on their initialization. In contrast to this, Figure 3b shows the behavior of the global optimization algorithm Fast-Slow (FS) [21]. This algorithm first samples globally and approximates the cost landscape by Bayesian optimization. In a second step, a local optimizer is initialized at the most promising point. The yellow dots in Figure 3b show the sampled points used for the global search, with the red dot denoting the most promising point for a local optimizer to start at. The global optimization phase ensures that the optimization algorithm is not trapped in local minima due to its (disadvantageous) initialization.
To illustrate this point further, different optimizers are used to find the minimum in the cost landscape of Figure 4, with similar results. All local optimizers get trapped in various local minima, depending on their point of initialization. On the other hand, all global optimizers (almost always) find the global minimum in the predefined parameter space. Note that in this scenario a value of 1 is not achievable, as the depth of QAOA as well as the choice of the parameter subspace does not allow such a solution. The improvement comes at the cost of an increasing number of function evaluations and a longer optimization time. While local optimizers can achieve convergence within a few hundred function evaluations for simple problem instances, global optimizers might require several thousand evaluations to converge, if given the opportunity to do so. As this would take prohibitively long in simulations and especially in experiments on real hardware, this will be an important factor when comparing the different optimizers.
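The two-phase idea of sampling globally and then refining locally can be sketched with SciPy. The sketch deliberately replaces the Bayesian surrogate of FS by plain uniform random sampling, so it is not the FS algorithm itself; `toy_landscape` is a made-up periodic function standing in for a QAOA cost landscape, and `sample_then_refine` is an illustrative name.

```python
import numpy as np
from scipy.optimize import minimize

def toy_landscape(theta):
    """Hypothetical stand-in for C(beta, gamma) with several local minima."""
    beta, gamma = theta
    return np.sin(2 * beta) * np.sin(3 * gamma) + 0.3 * np.cos(5 * gamma) + 0.1 * gamma

def sample_then_refine(cost, bounds, n_samples=200, seed=0):
    """Global phase: uniform sampling. Local phase: Nelder-Mead from the best sample."""
    rng = np.random.default_rng(seed)
    lower, upper = np.array(bounds, dtype=float).T
    samples = rng.uniform(lower, upper, size=(n_samples, len(lower)))
    values = np.array([cost(x) for x in samples])
    start = samples[np.argmin(values)]          # most promising sampled point
    result = minimize(cost, start, method="Nelder-Mead")
    total_evaluations = n_samples + result.nfev  # budget spent in both phases
    return result, values.min(), total_evaluations
```

Because the Nelder-Mead simplex contains its starting point, the refined value is never worse than the best sample; note that the local phase in this sketch is unbounded and may leave the sampled region. The sample count directly controls the evaluation budget of the global phase.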

Experiments
In order to demonstrate the advantage of global optimization, experiments on three industrially motivated use cases are performed. Most experiments were conducted on different configurations of the UC problem, which has the benefit of being scalable to arbitrary numbers of qubits.

Use Cases
For all use cases, the cost landscapes of the problem instances are enhanced according to the principles of Section 4 and kept constant for all experiments. Differences between samples are solely due to the optimizer used and

Backend
The simulations are carried out with an ideal state vector backend and a shot based noisy simulation backend. In the latter a custom noise model is used, which imitates the gate and readout errors described in Qiskit's FakeBoeblingenV2 backend, without copying its coupling map, therefore allowing for all-to-all connections in the circuit. This model results in lower error rates than the real counterpart, but allows for more easily scalable and faster experiments. To verify the findings from simulations, experiments are also run on real hardware. All experiments use layer numbers of p = 1 and p = 3.

Figure 5: [...] an iteratively chosen penalty factor. State vector and noisy simulations for all tested local optimizers and the most promising global optimizers with 10 random initializations each. NM is labeled as a reference to results in Table 1.

Simulations
All optimizers are tested on 8, 10 and 14 qubit UC problem instances with 25 random initializations. The results for state vector simulations are depicted in Figure 6 in Appendix C. The global optimizers are evaluated with and without a limited number of function evaluations. In the following, just the abbreviations of the algorithms will be used (see Appendix B for details). On all problem instances the unrestricted global optimizers outperform the local optimizers, while requiring one to three orders of magnitude more function evaluations. The performance of the global optimizers SHGO and BH decreases greatly when the number of allowed function evaluations is reduced to 1000 to 5000. This is also the case for DE and DA, although the trend is weaker, especially for DE. For DA in particular, the decrease in function evaluations leads to a higher variance in performance across runs. Therefore the global optimizers FS and DE show the most promising behavior: high performance with low variance and a relatively low number of function evaluations. These global optimizers are benchmarked in a restricted setting against all local optimizers on all three use cases with state vector as well as noisy simulations. In Figure 5 the results for nine qubit instances of each use case are depicted. Here a clear trend can be observed: The restricted global optimizers improve the performance of QAOA by a factor of two while increasing the number of function evaluations by roughly the same factor. At the same time the variance of performance is strongly reduced.

Hardware experiments
To verify the results of our simulations, we evaluated FS and DE as well as the local optimizer NM on the 27-qubit IBMQ Ehningen quantum computer. The results in Table 1 show that for all tested problem instances the global optimizers FS and DE outperform all tested local optimizers, while generally requiring more function evaluations. Among the different configurations, a deteriorating effect on the solution quality can be observed that is caused by hardware errors: The errors of deeper circuits even outweigh the benefits of higher layer numbers, such that QAOA with p = 1 has higher values of C(β, γ)/C_min than its three layer counterpart. These findings are in line with the results in [18], where, depending on the problem, better performance with increasing layer number is not to be expected. The limited number of samples available for higher numbers of qubits makes it hard to estimate the scalability of this approach, so it remains to be shown that the advantage holds for larger systems, even though the trend seems promising.

Conclusion
In this work, we investigated the performance of local optimizers on well designed cost landscapes of QAOA. We demonstrated that even for simple problem instances, local optimizers fail to find optimal solutions. Global optimizers, on the other hand, outperform local optimizers not only on such simple problem instances, but on a variety of use cases, at the cost of higher numbers of function evaluations. These results hold for state vector simulations, noisy simulations as well as for experiments on real hardware. In order to overcome the caveat of the high numbers of function evaluations required, we propose to restrict the global optimizers in terms of function evaluations. While these restrictions lead to a great decrease in the number of function evaluations, the overall solution quality is only slightly decreased. Global optimizers like FS and DE can greatly improve QAOA's performance, while increasing the number of function evaluations only by a factor of two compared to local optimizers like Powell or NM.
In current research, local optimization has been the standard approach to optimize the classical parameters of QAOA. Our work showed that restricted global optimizers are better suited and should be applied more widely. Especially when combined with further enhancements in problem encodings, problem scalings and mixer designs, global optimizers can help to pave the way for the application of QAOA to real world problems.

A.1 Unit Commitment Problem
The Unit Commitment (UC) problem describes the allocation of power units to satisfy a given power demand L for the lowest possible price. The formulation used here is modelled after [11]:

$$H_{\mathrm{cost}} = \sum_{i=1}^{n_{\mathrm{units}}} \left( A_i + B_i \, p_i + C_i \, p_i^2 \right) x_i \tag{6}$$

A_i, B_i and C_i are fixed parameters that encode the cost for producing an amount of power p_i if the unit x_i is turned on. In this work, p_i is additionally fixed to constant integer values, which are not subject to the optimization process. n_units denotes the overall number of units of the problem and is equal to the number of required qubits.
Constraints are added to Equation 6 to ensure that the sum of the produced power is equal to the power demand L, which can be expressed as the penalty term

$$H_{\mathrm{pen}} = \left( \sum_{i=1}^{n_{\mathrm{units}}} p_i \, x_i - L \right)^2 \tag{7}$$

Equations 6 and 7 are used to construct the QUBO formulation using Equation 5.
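A small brute-force sketch makes the role of the penalty term concrete. It assumes a quadratic per-unit cost A_i + B_i p_i + C_i p_i^2 that is incurred when x_i = 1 and a squared deviation from the demand L as penalty; the function names and the example numbers are hypothetical.

```python
from itertools import product

def uc_cost(x, A, B, C, p):
    """Production cost of all units that are switched on."""
    return sum(xi * (a + b * pi + c * pi**2)
               for xi, a, b, c, pi in zip(x, A, B, C, p))

def uc_penalty(x, p, L):
    """Squared mismatch between produced power and demand L."""
    return (sum(xi * pi for xi, pi in zip(x, p)) - L) ** 2

def uc_brute_force(A, B, C, p, L, P):
    """Return the bit string minimizing cost + P * penalty by full enumeration."""
    return min(product((0, 1), repeat=len(p)),
               key=lambda x: uc_cost(x, A, B, C, p) + P * uc_penalty(x, p, L))
```

With three units of power 1, 2 and 3 and a demand of L = 3, a penalty factor of P = 10 selects the first two units, whereas P = 0 returns the invalid all-off assignment, which illustrates the lower bound on the penalty factor discussed in Section 4.2.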

A.2 Travelling Salesperson Problem
The objective of the Travelling Salesperson (TSP) problem is to find the shortest possible route for visiting a list of cities and returning to the initial point. The implementation used is taken from [12] and is described there in greater detail. The binary variables x_i,j of the problem have two indices, with i being an identifier for the city and j the point in time when it is visited. Both are integers bounded by n_cit, so n_cit^2 qubits are needed to construct the QAOA circuit. With this the problem can be described as

$$H_{\mathrm{cost}} = \sum_{i, i'} D_{ii'} \sum_{j} x_{i,j} \, x_{i',j+1} \tag{8}$$

D is the adjacency matrix describing the distances between the cities, with the element D_ii′ being the distance between the i-th and i′-th city. So each movement between cities in consecutive time steps adds to the cost. The constraints are added with the penalty term

$$H_{\mathrm{pen}} = \sum_{i} \Big( 1 - \sum_{j} x_{i,j} \Big)^2 + \sum_{j} \Big( 1 - \sum_{i} x_{i,j} \Big)^2 \tag{9}$$

Equation 9 ensures that each city is visited only once and that only one city is visited in each time step, respectively. The results in [12] show that using the same penalty factor for both constraints works reasonably well, so this simplification will be used here as well. Combining Equations 8 and 9 in Equation 5 constructs the QUBO formulation.
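The cost and penalty terms can be checked numerically on a small example. The sketch below assumes that x is stored as an n_cit × n_cit array with x[i, j] = 1 if city i is visited at time step j, and that the time index wraps around so that the route returns to its start; the function names are illustrative.

```python
import numpy as np

def tsp_cost(x, D):
    """Sum of distances between cities visited in consecutive (cyclic) time steps."""
    x_next = np.roll(x, -1, axis=1)   # x_next[i, j] = x[i, (j + 1) mod n]
    return float(np.einsum("ab,aj,bj->", D, x, x_next))

def tsp_penalty(x):
    """Zero exactly when every city and every time step is used once."""
    per_city = (1 - x.sum(axis=1)) ** 2   # each city visited exactly once
    per_step = (1 - x.sum(axis=0)) ** 2   # one city per time step
    return float(per_city.sum() + per_step.sum())
```

For a permutation matrix the penalty vanishes and the cost equals the length of the closed tour, while the all-zero assignment has zero cost but a nonzero penalty.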

A.3 Factory Layout Problem
The Factory Layout (FL) problem is based on [13]. A number of positions n_pos on a factory floor plan are given and some number n_mach of production units has to be placed as efficiently as possible to minimize the cost of transport between the machines. As in the previous case, the distances between the positions can be summarized in an adjacency matrix D. The flow of material between the machines can be described by a transportation density matrix T, where each element T_ii′ describes the amount of material going from the i-th to the i′-th production cell. To construct the problem, the decision variables have two indices, with x_i,j denoting that machine i is placed on position j. Using these definitions the problem can be formulated as

$$H_{\mathrm{cost}} = \sum_{i, i'} \sum_{j, j'} T_{ii'} \, D_{jj'} \, x_{i,j} \, x_{i',j'} \tag{10}$$

The constraints in this problem are that each machine is used exactly once and that each position is taken at most once. This results in two distinct penalty terms: Each machine being used is written as

$$H_{\mathrm{pen},1} = \sum_{i} \Big( 1 - \sum_{j} x_{i,j} \Big)^2 \tag{11}$$

while each position being taken once is encoded as

$$H_{\mathrm{pen},2} = \sum_{j} \sum_{i < i'} x_{i,j} \, x_{i',j} \tag{12}$$

B Algorithms and Heuristics

B.1 Local Optimizers
The optimizers used are briefly summarized below; unless noted otherwise, the standard SciPy implementation is used.
• Nelder-Mead (NM) [30]: This algorithm is derivative-free and uses a simplex of n + 1 points, with n being the dimensionality of the problem. Based on the function values at the vertices, a new point is constructed that should be closer to the next minimum. The SciPy implementation uses the version described in [31].
• Powell [32]: This algorithm is derivative-free and performs several line searches to find the next optimum. It is an iterative process that alternates between choosing the lines to search, based on the currently known minimum, and performing the actual searches. The SciPy implementation is a modified version that is not described in further detail.
• Conjugate Gradient (CG) [33]: This algorithm uses derivatives and was originally designed to solve large linear systems of equations. Based on gradient information it chooses a subspace to optimize in the next step.
• Broyden-Fletcher-Goldfarb-Shanno (BFGS) [34]: This algorithm iteratively builds an approximation of the inverse Hessian from gradient information. Based on this approximation the direction of descent is chosen and a line search is performed.
• Truncated Newton (TNC) [35]: This algorithm uses a CG method to solve the Newton equations and updates the parameters for the next iteration based on the result. It is designed to handle nonlinear functions as well.
• Constrained Optimization BY Linear Approximation (COBYLA) [36]: This algorithm is derivative-free and uses a linear approximation of the problem to find candidate points. For this, a simplex is constructed that forms the basis of the approximation. In each iteration, vertices can be replaced based on the approximation, either to improve the simplex or to store a found vector.
• Simultaneous Perturbation Stochastic Approximation (SPSA) [37]: This algorithm approximates the gradient using only two measurements, so the cost of a gradient estimate is independent of the dimensionality. With this gradient the next step is calculated and the approximation is performed again. The implementation used is part of the Qiskit package.
• Continuous Univariate Marginal Distribution Algorithm (UMDA) [29]: This algorithm populates an area with sampling points and uses the best of them as data to approximate the cost function with univariate normal distributions. From these combined distributions new points are drawn to sample the next generation. It is part of the Qiskit package.
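The SciPy-based optimizers in this list can all be called through the same `scipy.optimize.minimize` interface. The following is a minimal sketch and not the benchmark setup of this work: `toy_cost` is a hypothetical multimodal stand-in for a QAOA cost landscape, and the number of starts, the dimension and the search range are assumptions chosen for illustration. SPSA and UMDA are omitted because they are not part of SciPy.

```python
import numpy as np
from scipy.optimize import minimize


def toy_cost(x):
    """Hypothetical multimodal stand-in for a QAOA cost landscape."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.sin(3.0 * x) + 0.1 * x**2))


# SciPy method names for the local optimizers listed above.
METHODS = ["Nelder-Mead", "Powell", "CG", "BFGS", "TNC", "COBYLA"]


def run_local(methods=METHODS, n_starts=5, dim=2, seed=0):
    """Run every method from the same set of random starting points."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(-np.pi, np.pi, size=(n_starts, dim))
    results = {}
    for method in methods:
        # Gradient-based methods fall back to finite differences here,
        # since no analytic gradient is supplied.
        results[method] = [
            float(minimize(toy_cost, x0, method=method).fun) for x0 in starts
        ]
    return results


local_results = run_local()
for method, values in local_results.items():
    print(f"{method:12s} best={min(values):.4f} worst={max(values):.4f}")
```

The spread between the best and worst final value per method indicates how strongly the outcome of a local optimizer depends on its initialization.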

B.2 Global Optimizers
As with the local optimizers, this work mostly uses optimizers from the SciPy library, with deviations noted in the following.
• Basin-Hopping (BH) [38]: This algorithm is initialized at a random point and tries to find the next local minimum with a local minimizer. After convergence the algorithm applies a random perturbation to the coordinates in order to find a different local minimum. Depending on the function value found, the algorithm can choose to use this point as the basis for the next global step or reject the point and revert to a previous solution. The sequence of local optimizations and global steps is repeated until either the maximum number of steps is reached or some condition is met. The implementation used ignores whether the local optimizer converged to a minimum, and decides on accepting a new point purely based on its function value.
• Differential Evolution (DE) [39]: This algorithm uses a random population of points in the search space to find the minimum. From iteration to iteration it combines the vectors of members of the current population to form members of the next generation.
• Dual Annealing (DA) [40]: This algorithm is also known as Generalized Simulated Annealing. It tests new random points, minimizes them with a local optimizer and rejects those with higher function values than the current best point according to a random distribution. This distribution is annealed over time, so the acceptance condition becomes stricter.
• Simplicial Homology Global Optimization (SHGO) [41]: This algorithm uses sampling points to build a complex for approximating the cost function and finding potential regions where the minima might lie. It optimizes all candidate points with a local optimizer to find all local minima and returns them.
• Fast-Slow (FS) [21]: This algorithm is a hybrid between a Bayesian learning regime and a classical local optimizer. In a first step, the energy landscape is approximated over a wide area via Bayesian approximation. With this model the most promising point is identified through classical optimization. In a second step, this point is minimized on the actual function by a local optimizer. This approach is the only one that is not directly implemented in the SciPy library. Instead, the Bayesian learning from the scikit-learn library is used for the first step, while the second step is done with the SciPy optimizers.
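The SciPy-based global optimizers in this list can be called as sketched below. This is an illustrative setup under stated assumptions, not the configuration used in this work: `toy_cost` is a hypothetical multimodal test function, and the budgets (`niter`, `maxiter`, `popsize`, `n`) are small values picked to show how the number of function evaluations can be restricted. FS is omitted because it is not a SciPy routine.

```python
import numpy as np
from scipy.optimize import (
    basinhopping,
    differential_evolution,
    dual_annealing,
    shgo,
)


def toy_cost(x):
    """Hypothetical multimodal stand-in for a QAOA cost landscape."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.sin(3.0 * x) + 0.1 * x**2))


# Assumed search box for two parameters.
BOUNDS = [(-np.pi, np.pi)] * 2


def run_global(seed=0):
    """Run each global optimizer once with a deliberately small budget."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-np.pi, np.pi, size=len(BOUNDS))
    results = {
        # BH needs a starting point instead of bounds.
        "BH": basinhopping(toy_cost, x0, niter=20, seed=seed),
        # maxiter and popsize cap the number of generations and members.
        "DE": differential_evolution(
            toy_cost, BOUNDS, maxiter=20, popsize=10, seed=seed
        ),
        "DA": dual_annealing(toy_cost, BOUNDS, maxiter=100, seed=seed),
        # n is the number of sampling points used to build the complex.
        "SHGO": shgo(toy_cost, BOUNDS, n=64, sampling_method="sobol"),
    }
    # Keep the final cost and the number of function evaluations.
    return {name: (float(r.fun), int(r.nfev)) for name, r in results.items()}


global_results = run_global()
for name, (fun, nfev) in global_results.items():
    print(f"{name:5s} cost={fun:.4f} evaluations={nfev}")
```

Reporting the evaluation count next to the final cost makes the trade-off between solution quality and optimization budget visible for each method.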

B.3 Adaptive Choice of Penalty Factors $P_j$
The goal of the algorithm described in [6] is to set $P_j$ just large enough that the optimal solution becomes the global optimum of the function in Equation 5. This does not necessarily mean that all invalid solutions have a higher value of the cost function than any of the valid solutions.
For the construction of $P_j$ the cost structure of all combinations needs to be known beforehand. From this one can extract $H_\text{min,opt}$, the cost of the optimal solution, and $H_\text{val}$, the mean cost of the valid solutions. With this, the algorithm can start at $P_j = 0$.
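The stated goal, a penalty just large enough that the optimal valid solution becomes the global optimum, can be illustrated by brute force on a tiny instance. The sketch below is not the algorithm of [6]: the grid search over the penalty, the single shared penalty factor, the function name `smallest_sufficient_penalty` and the three-bit example with a one-hot constraint are all assumptions made for illustration.

```python
import itertools

import numpy as np


def smallest_sufficient_penalty(cost, penalty, n_bits, step=0.1, max_steps=1000):
    """Grid-search the smallest penalty factor for which the best valid
    bitstring has a strictly lower total cost than every invalid one."""
    states = [np.array(b) for b in itertools.product((0, 1), repeat=n_bits)]
    costs = np.array([cost(s) for s in states])
    pens = np.array([penalty(s) for s in states])
    valid = np.isclose(pens, 0.0)  # zero penalty marks a valid solution
    if not valid.any():
        raise ValueError("no valid solution exists")
    h_min_opt = costs[valid].min()  # cost of the optimal valid solution
    for k in range(max_steps + 1):
        p = k * step  # start at a penalty of zero and increase
        invalid_total = costs[~valid] + p * pens[~valid]
        if invalid_total.size == 0 or invalid_total.min() > h_min_opt:
            return p, float(h_min_opt)
    raise RuntimeError("no sufficient penalty found within max_steps")


# Hypothetical example: linear cost, exactly one bit must be set.
weights = np.array([1.0, 2.0, 3.0])
p_star, h_opt = smallest_sufficient_penalty(
    cost=lambda x: float(weights @ x),
    penalty=lambda x: float((x.sum() - 1) ** 2),
    n_bits=3,
)
print(p_star, h_opt)
```

Note that the check only compares invalid solutions against the optimal valid one, so other valid solutions may still cost more than some invalid ones, in line with the remark above.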

Figure 1: Influence of problem-dependent parameters: All contour plots depict cost landscapes created by a four-unit UC problem. Subfigure 1a serves as a best-case baseline cost landscape with parameters s = 0.0002, P = 0.2 and L = 300 (chosen according to Section 4). Subfigures 1b and 1c show the introduction of additional minima with changed values of P = 2 and s = 0.002, respectively. Subfigure 1d shows an additional cost landscape created by a related problem with L = 200.

Figure 2: Influence of algorithm-dependent parameters: All contour plots depict cost landscapes created by the UC problem at s = 0.0002, P = 0.2 and L = 300 (same as the baseline in Figure 1a). Subfigures 2a and 2b show the effect that more qubits have on the landscape. Subfigures 2c and 2d show the impact of an increase in layers, with the axes being random vectors in the parameter space.

Figure 3: Both surface plots show the cost landscape of the one-layer, four-unit UC problem baseline with L = 300, s = 0.0002 and P = 0.2 (as in Figure 1a). Subfigure 3a shows four optimization runs of the randomly initialized local optimizer NM, with the starting and final points indicated by red markers. Depending on the initialization, the local optimizer only converges to the nearest local minimum. Subfigure 3b shows the sampling points (green) used in the Bayesian optimization step of the FS optimizer and the selected final point for the local optimization step, indicated by a red marker. The global optimizer finds one of the best possible minima.

Figure 4: Comparison of local (upper) and global (lower) optimizers on the baseline cost landscape (four-unit UC problem, L = 300, s = 0.0002 and P = 0.2): The plots depict the results of 25 randomly initialized runs of eight local and four global optimizers (see Appendix B). The dotted line denotes the value of the best possible minimum in the defined search space. The local optimizers get stuck in various local minima depending on their initializations, while global optimizers mostly find the optimal solution.

Figure 5: Results of nine-qubit use case instances of a three-layer QAOA with adaptively chosen s and, if needed, an iteratively chosen penalty factor. State vector and noisy simulations for all tested local optimizers and the most promising global optimizers with 10 random initializations each. NM is labeled as a reference to the results in Table 1.
et al. demonstrate how the use of evolutionary strategies enhances the performance of QAOA on the Max-Cut problem, and Rad et al. show how the optimization of the classical parameters of QAOA can be separated into two distinct phases: a first global phase, in which a Bayesian estimate is used to initialize the parameters, and a second local phase. Both works, which can be considered to use global optimization approaches, show enhanced performance over local optimizers.

Table 1: Results of hardware experiments on IBMQ Ehningen (backend=e) and comparison with state-vector (backend=s) and noisy (backend=n) simulation on various UC problem instances with 10 to 18 qubits (Q). Two local optimizers (NM and Powell) are benchmarked against the restricted global optimizers FS and DE in terms of function evaluations (eval.) and normalized cost (norm. cost).