Quantum machine learning algorithms promise to deliver near-term, applicable quantum computation on noisy, intermediate-scale systems. While most of these algorithms leverage quantum circuits for generic applications, a recent set of proposals, called analog quantum machine learning (AQML) algorithms, breaks away from circuit-based abstractions and favors leveraging the natural dynamics of quantum systems for computation, promising to be noise-resilient and suited for specific applications such as quantum simulation. Recent AQML studies have called for determining best ansatz selection practices and whether AQML algorithms have trap-free landscapes based on theory from quantum optimal control (QOC). We address this call by systematically studying AQML landscapes on two models: those admitting black-boxed expressivity and those tailored to simulating a specific unitary evolution. Numerically, the first kind exhibits local traps in their landscapes, while the second kind is trap-free. However, both kinds violate QOC theory’s key assumptions for guaranteeing trap-free landscapes. We propose a methodology to co-design AQML algorithms for unitary evolution simulation using the ansatz’s Magnus expansion. Our methodology guarantees the algorithm has an amenable dynamical Lie algebra with independently tunable terms. We show favorable convergence in simulating dynamics with applications to metrology and quantum chemistry. We conclude that such co-design is necessary to ensure the applicability of AQML algorithms.

Quantum machine learning (QML) promises to deliver advantageous applications on noisy, intermediate-scale quantum computers. QML algorithms aim to do so either by using quantum systems to speed up machine learning subroutines1,2 or by designing novel techniques to optimize and control noisy quantum systems for machine learning and quantum applications.3 This latter approach, called variational quantum algorithms, uses variational optimization for applications on near-term devices. These algorithms often comprise an underlying hardware architecture with tunable parameters, a loss function measuring the error relative to a desired computation, and a classical optimizer routine tuning the parameters to minimize the loss. The hardware architecture is often abstracted away and modeled as a digital quantum circuit, and such abstracted algorithms are called variational quantum circuits (VQCs).

Despite their promise, VQCs exhibit issues with accuracy, efficiency, and training, which precludes an advantage over classical algorithms. VQCs often experience flat landscapes upon random initialization4—a phenomenon known as barren plateaus—and an exponential number of local minima.5 To circumvent these challenges, extensive work has been performed on proposing circuit architectures (ansätze),6–10 loss functions,11–14 regulation techniques,15–17 and optimization techniques.18–21 

A recent set of proposals, called analog quantum machine learning (AQML) ansätze,22–24 breaks away from circuit-based abstractions. Instead, AQML favors directly using the system’s dynamics for computation. At a high level, an AQML ansatz comprises a native interaction Hamiltonian and a set of time-dependent controls. However, AQML studies suffer from various practical drawbacks, such as the fact that simulating time evolution is computationally expensive and, therefore, limits theoretical studies to small system sizes.

AQML ansätze are more hardware-efficient than circuit approaches. AQML tends to reduce circuit depth,25 is closely related to quantum simulation, can efficiently prepare states adiabatically or through Floquet dynamics,26 and is well-suited for machine-learning applications with limited tunable devices.27 Theoretical AQML studies are more faithfully translated into experimental studies.

AQML appeals to developers for various reasons related to classical analog computation. First, classical analog computing is sometimes advantageous over digital computing for specific applications due to ease of fabrication, problem-solution co-design, and energy cost.28–30 AQML is thus potentially beneficial for applications related to the simulation of other quantum behavior or for classical problems that may benefit from quantum-like computation. Second, analog classical computing can also be noise-resilient upon a judicious choice of learning paradigm.31 AQML algorithms are potentially resilient to current noise in quantum devices and can thus deliver practical applications sooner.

Recent observations and desires within quantum computation also justify investments in AQML research. First, analog approaches promise to clarify the fundamental computational capabilities of available physical systems.32,33 Second, small-scale analog systems have been used for various tasks relevant to scientific and industrial communities, including efficient unitary time-evolution simulation,34 sensing,35 resource allocation optimization,36 time-series prediction,37 image classification,27 and memory storage and retrieval.32 Third, while VQCs are expressive, they are also plagued with barren plateaus. On the other hand, for AQML algorithms, the source of barren plateaus stems from excess entanglement.15,35,38–40 Finally, for both unitary evolution41 and ground state finding,42 shallow VQCs often exhibit low accuracy and exponential local minima.43 VQCs more often succeed when the circuit depth is exponential in the system size.5,44,45 On the other hand, AQML algorithms’ landscapes are believed to resemble those from controllable quantum systems more closely. According to quantum optimal control (QOC) theory, these systems exhibit trap-free landscapes.46–48 Recent AQML studies and perspectives have called for a study of the training landscapes of AQML models.36,49 There is a dire need to determine the topography of the landscapes. Which problems are suitable for AQML algorithms and what algorithms are ideal for specific problems remain open lines of inquiry.

This work leverages various numerical and analytical studies to interrogate and understand AQML landscapes as a function of the number of control parameters, control types, and system size for a specific loss function. To do this, we focus on simulating the transverse-field Ising model evolution with two groups of ansätze: (A1) ansätze whose native Hamiltonians differ from the Ising interactions but admit black-box universal quantum computation, such as the recently introduced quantum perceptrons (QPs),35 and (A2) ansätze tailored to the specific task, such as those with Ising native interactions or specialized QPs that contain the Ising interaction as a special case (see Sec. II C for details). Ansätze in A1 are expressive, while those in A2 are co-designed with the task we study. Co-design is not a new concept; significant effort has gone into describing how quantum computing architectures are often built to address specific problems.50–52 

In Sec. III A, we show that using QOC theory with common QOC assumptions leads to a theoretical prediction of trap-free landscapes when an exponential number of control parameters are available (i.e., in over-parametrized ansätze) regardless of the microscopic ansatz details. We also discuss conditions on AQML ansätze that lead to a violation of one QOC assumption, local surjectivity. Consequently, QOC theory does not apply to AQML ansätze. However, numerical experiments show that A2 ansätze exhibit trap-free landscapes and better approximation errors, while those in A1 are trap-ridden (Sec. III B). In summary, we argue that although most AQML landscapes do not possess the amenable qualities of QOC settings, co-designing an ansatz is an important step toward favorable training landscapes.

In Sec. IV, we show that task-algorithm co-design can be developed for unitary time evolution simulation by analyzing the ansatz’s Magnus expansion. A co-designed algorithm is one where terms in the Magnus expansion contain the Hamiltonian terms in the target unitary evolution and where each term in the Magnus expansion is independently tunable. That is, a co-designed algorithm has a tunable dynamical Lie algebra (DLA) amenable to the problem at hand. We do this analysis pictorially using a generalization of the Lie trees used in optimal quantum control.46 We show that when co-designed, the QP ansatz is suited for time-reversible spin-squeezing with metrology applications (see Sec. IV B), and the Ising ansatz is suited for unitary coupled-cluster evolution with applications to quantum simulation (see Sec. IV C). In both cases, satisfactory accuracy is reached within a constant number of controllable parameters. Section V concludes with further research directions.

Upfront, our studies make the following two significant conclusions:

  1. AQML algorithms violate common QOC assumptions. Despite not following QOC theory, co-designed AQML algorithms exhibit trap-free landscapes. On the other hand, black-box algorithms exhibit traps. Further work is needed to determine if co-design is necessary for trap-free landscapes.

  2. Succeeding at a task is a more likely outcome when AQML algorithms are co-designed. In the case of unitary time evolution, one approach toward co-design is given by analyzing the terms of the AQML algorithm’s Magnus expansion.

All of the code and data for this publication are available in Ref. 53.

This section introduces AQML algorithms comprising an analog ansatz, a loss function, and a classical optimization scheme. We also introduce two classes of algorithms: black-box expressive ones and task co-designed ones. This distinction will allow us to see the quantitative differences in training performance due to co-design.

Our studies concern a system of N qubits labeled i = 1, …, N, each living in the Hilbert space $h_i$ spanned by the basis states {|0⟩i, |1⟩i}. The system inhabits the product Hilbert space $\bigotimes_{i=1}^{N} h_i$. We call Xi, Yi, and Zi the spin operators acting on qubit i, which obey the usual spin commutation relation $[X_i, Y_j] = i\delta_{ij}Z_i$.54 The system evolves under a time-dependent Hamiltonian H(t; θ) composed of a native (also called drift) Hamiltonian Hnat and a control Hamiltonian Hctr, with tunable parameters J and a, respectively, with θ = (J, a). For the Ising and QP ansätze, the native Hamiltonians are
(1)
(2)
We select the QP Hamiltonian based on our recent introduction and interest in QPs as the basic motif of other 2D native Hamiltonians.35 QPs are useful for ancillary-based operations, energy measurements, and sensing applications. Indeed, the QML community is interested in proposing scalable and analytically tractable building blocks of quantum neural networks to inform computational capacity.55 Previously, we studied the source and presence of vanishing gradients upon random initialization in QPs. In this study, we are interested in how the parameters θ shape the landscape.
The control Hamiltonian we use is
(3)
where a are variational parameters that modulate the control terms through the functions $f_i^\alpha$. It is customary to parameterize these functions as a finite sum of K simple functions, $f_i^\alpha(t) = \sum_{k=0}^{K-1} a_{ik}^\alpha g_k(t)$.36 In our case, the functions gk will be Fourier series, Gaussians, or piecewise constants (see Appendix A for details). We remark that when using narrow Gaussians, an AQML model is equivalent to a circuit-based QML model.
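As an illustration, the following minimal Python sketch evaluates such a parameterized control field. The specific Fourier basis, the function names, and the example coefficients are ours for illustration and may differ from the exact forms used in Appendix A; the implementation used for our results is available in Ref. 53.

import numpy as np

def fourier_basis(k, t, T):
    # g_0 = 1 (constant term), then alternating cosine/sine harmonics on [0, T].
    if k == 0:
        return np.ones_like(t)
    n = (k + 1) // 2
    return np.cos(2 * np.pi * n * t / T) if k % 2 else np.sin(2 * np.pi * n * t / T)

def control_field(a_ik, t, T):
    # f_i^alpha(t) = sum_k a_ik^alpha g_k(t) for one qubit i and one direction alpha.
    return sum(a * fourier_basis(k, t, T) for k, a in enumerate(a_ik))

# Example: three Fourier coefficients for one control field.
T = 1.0
t = np.linspace(0.0, T, 200)
f = control_field(np.array([0.3, -0.1, 0.5]), t, T)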
We assume that the quantum system is noiseless. Therefore, an initial state |Ψ0⟩ evolves under the Schrödinger equation i∂t|Ψ(t)⟩ = H(tθ)|Ψ(t)⟩ to the final state |Ψf⟩ at t = T. Mathematically, the final state is a unitary transformation of |Ψ0⟩ defined by
(4)
where T is the time-ordering operator. To recap, an AQML ansatz is a particular choice of native Hamiltonian and controls. Figure 1 shows a pictorial representation of the QP ansatz.
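To make Eq. (4) concrete, the sketch below approximates the time-ordered exponential by slicing [0, T] into short steps and multiplying the corresponding matrix exponentials. The step count, the midpoint rule, and the toy single-qubit Hamiltonian are illustrative choices, not the implementation behind our results (Ref. 53).

import numpy as np
from scipy.linalg import expm

def propagator(H_of_t, theta, T, n_steps=200):
    # U(theta) ~= prod_m exp(-i H(t_m, theta) dt), later times applied on the left,
    # approximating the time-ordered exponential of Eq. (4).
    dt = T / n_steps
    dim = H_of_t(0.0, theta).shape[0]
    U = np.eye(dim, dtype=complex)
    for m in range(n_steps):
        t_mid = (m + 0.5) * dt          # midpoint of the m-th time slice
        U = expm(-1j * H_of_t(t_mid, theta) * dt) @ U
    return U

# Toy example: a single driven qubit with H(t) = theta_0 cos(t) X.
X = np.array([[0, 1], [1, 0]], dtype=complex)
U = propagator(lambda t, th: th[0] * np.cos(t) * X, np.array([0.7]), T=1.0)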
FIG. 1.

Schematic representation of the AQML Hamiltonian (a) and the learning procedure (b). The analog ansatz’s Hamiltonian generally comprises a native Hamiltonian with qubit interactions, parameterized by variables J, and a control Hamiltonian of single-qubit terms parameterized by variables a. The ansatz parameters are θ = (J, a). Panel (a) exemplifies the analog ansatz of a quantum perceptron (QP) composed of N − 1 qubits interacting with a central qubit. Each qubit experiences several time-dependent control fields as sums of simple time-dependent functions such as Fourier and Gaussian functions. Panel (b) exemplifies the learning loop with an AQML algorithm. The system undergoes unitary evolution dictated by the time-dependent Hamiltonian with parameters θ. The evolved state is then measured to calculate a loss function depending on the variational parameters. A classical optimizer—such as the Nelder–Mead or gradient descent methods—calculates parameters that reduce the loss. The parameters are updated, and the training loop continues until convergence.


In the most general setting, after the system evolves under U(θ), the system is measured several times with each qubit in a particular basis Mi to calculate a loss L encoding the error relative to a target behavior. The loss is then passed through a classical optimizer for improvement through tuning the parameters θ.

We focus on learning a unitary time evolution Utarg. The motivation for this loss function is twofold. First, studying this loss tells us about the possibilities of using AQML algorithms for quantum simulation, a promising application of quantum hardware. Second, previous QOC studies have shown that this loss has a trap-free landscape for some simple Hamiltonians. Mathematically, the loss is the Frobenius norm
(5)
where $\|A\|_F^2 = \mathrm{Tr}(A^\dagger A)$, and the factor of $2^{N-1}$ bounds the loss to the interval [0, 1], making LE a measure of fidelity.
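A minimal sketch of such a loss follows. The exact prefactor used in Eq. (5) is the one stated above; here, for illustration only, we normalize by the maximum attainable value so the quantity lies in [0, 1].

import numpy as np

def frobenius_loss(U, U_targ):
    # ||U - U_targ||_F^2 = Tr[(U - U_targ)^dagger (U - U_targ)], normalized to [0, 1].
    diff = U - U_targ
    raw = np.real(np.trace(diff.conj().T @ diff))
    d = U.shape[0]              # d = 2^N
    return raw / (4 * d)        # the maximum of raw over pairs of unitaries is 4d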
Except for Sec. IV, the target unitary in all our numerical experiments is generated by the 1D transverse field Ising Hamiltonian
(6)
The parameters will be updated using Adam, a version of gradient descent with momentum and adaptive step sizes.56 

We now define the groups of ansätze we alluded to in the introduction. All ansätze we study admit full expressivity in the form of controllability. These are ansätze for which the operators in the native and control Hamiltonians can produce any element of the dynamical Lie algebra (i.e., Pauli strings in our case) through nested commutations.
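For small systems, controllability in this sense can be checked numerically by generating the dynamical Lie algebra through nested commutators and counting linearly independent directions. The brute-force sketch below, including the two-qubit example and all function names, is an illustration rather than the procedure used for the results in this paper.

import numpy as np
from itertools import combinations

def dla_dimension(hamiltonian_terms, max_iter=10, tol=1e-10):
    # Work with anti-Hermitian elements iH; the DLA is their real span closed
    # under commutation. Linear independence is checked over the reals.
    basis = []

    def to_vec(op):
        return np.concatenate([op.real.flatten(), op.imag.flatten()])

    def add(op):
        v = to_vec(op)
        for b in basis:
            v = v - np.dot(b, v) * b
        n = np.linalg.norm(v)
        if n > tol:
            basis.append(v / n)
            return True
        return False

    ops = [1j * h.astype(complex) for h in hamiltonian_terms]
    for g in list(ops):
        add(g)
    for _ in range(max_iter):
        new = [A @ B - B @ A for A, B in combinations(ops, 2)]
        new = [C for C in new if add(C)]
        if not new:
            break
        ops.extend(new)
    return len(basis)

# Example: two qubits with a ZZ coupling and X, Y controls on each qubit.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
terms = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X), np.kron(Y, I2), np.kron(I2, Y)]
print(dla_dimension(terms))   # 15 = dim su(4): every Pauli string is reachable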

Group A1 comprises QPs with arbitrary control parameters and only Gaussian or Fourier controls. These ansätze are expressive but not tailored to the task in Eq. (6). Group A2 consists of expressive ansätze with a closed-form solution for simulating the evolution in Eq. (6). For example, the Ising ansatz with constant fields is immediately in A2. Moreover, in Appendix A, we show that specialized QPs with a combination of Gaussian controls and extra piecewise constant controls on the output qubit are in A2. This can also be derived from analyzing the ansatz’s Magnus expansion as in Sec. IV. In summary, algorithms in A1 and A2 are expressive, but those in A2 are tailored to the task in Eq. (6).

In Sec. III, we show that QPs in A1 perform poorly at the task in Eq. (6), while specialized A2 QPs perform better, though still worse than the Ising ansatz. However, we show that A2 QPs, co-designed for the task, exhibit landscapes free of local minima. Therefore, we contrast A1 and A2 ansätze to showcase the importance of co-design.

Before going any further, this section introduces crucial concepts in landscape analysis. We discuss theoretical conditions from QOC that guarantee trap-free landscapes and show that while these conditions are violated, co-designed ansätze in A2 indeed show trap-free landscapes.

Let us begin by reviewing how to characterize a landscape’s curvature. In general, studying the landscape of any algorithm encompasses finding the parameter values at which the loss’s derivative vanishes (i.e., the critical points). Critical points are then categorized as minima, maxima, or saddles, depending on the curvature around them.

The derivative can be expressed as the function composition $\partial_{\boldsymbol{\theta}} L = \partial_U L\,\partial_{\boldsymbol{\theta}} U$. For the case of Eq. (5), the first term is independent of θ and, therefore, all critical points satisfy $\mathrm{Tr}(\partial_{\boldsymbol{\theta}} U) = 0$.

In quantum optimal control theory (QOC), $\partial_{\boldsymbol{\theta}} U$ is called the dynamical derivative, and it captures how the unitary U changes for a changing control field. For an AQML ansatz, the dynamical derivative of a parameter $\theta_{ik}^\alpha$ associated with the time-dependent control function gk(t), qubit i, and the Hamiltonian operator $h_i^\alpha$ is given by (see Appendix B 1 for details)
(7)
Here, if the parameter is associated with a term in Hnat, we use gk(t) = 1 and $h_i = Z_iZ_k$; else $h_i^\alpha = X_i$ or $Y_i$. Equation (7) is a special case of the AQML dynamical derivative in Ref. 36, assuming our particular ansatz architecture.

The nature of the critical points is dictated by the Hessian (i.e., second derivative) of the loss. In the case of Eq. (5), the Hessian follows from the chain rule, $\partial_{\boldsymbol{\theta}}\partial_{\boldsymbol{\theta}} L = \partial_U^2 L\,\partial_{\boldsymbol{\theta}} U\,\partial_{\boldsymbol{\theta}} U + \partial_U L\,\partial_{\boldsymbol{\theta}}\partial_{\boldsymbol{\theta}} U$. The eigenvalues of the Hessian determine the nature of the critical point. A trap-free landscape has only one minimum and one maximum among its critical points, while the rest are saddles; see Fig. 2(a) for an example of a trap-free landscape.
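The classification of a critical point can be illustrated with a small numerical sketch: estimate the Hessian of the loss at the critical point and inspect its eigenvalues. The paper’s results use exact Hessians from automatic differentiation; the finite-difference version below is only a simple stand-in.

import numpy as np

def hessian_eigenvalues(loss, theta, eps=1e-4):
    # Central finite-difference Hessian of a scalar loss at a (near-)critical point.
    p = len(theta)
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = eps
            ej = np.zeros(p); ej[j] = eps
            H[i, j] = (loss(theta + ei + ej) - loss(theta + ei - ej)
                       - loss(theta - ei + ej) + loss(theta - ei - ej)) / (4 * eps ** 2)
    return np.linalg.eigvalsh((H + H.T) / 2)

# All eigenvalues >= 0  -> local minimum (a trap if the loss there is sub-optimal);
# mixed signs           -> saddle point, as observed below for the A2 QPs.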

FIG. 2.

The optimization landscape depends on the ansatz and the loss. (a) Exemplifies a trap-free optimization landscape where all sub-optimal critical points are maxima and saddles, while the only minimum is optimal. According to QOC theory, the AQML ansatz we present could be trap-free if it satisfies the condition of local surjectivity, meaning that each dynamical derivative of the unitary has a nonzero average overlap with each Pauli string. (b) Shows numerical calculations of the average overlaps between the dynamical derivatives and Pauli strings for A1 and A2 QP ansätze with N = 2, 3, 4, 5 qubits and with $4^N - 1$ parameters. For some Pauli strings, the average overlap is zero or near zero, violating local surjectivity.


For the remainder of this section, we show that AQML algorithms violate one of the QOC conditions that together ensure trap-free landscapes (Sec. III A). However, we present evidence that co-designed ansätze in A2 exhibit trap-free landscapes (Sec. III B).

We now discuss conditions from QOC theory that guarantee trap-free landscapes. We show that AQML ansätze in both A1 and A2 violate these requirements, but we will show that A2 ansätze are nonetheless trap-free.

According to QOC theory, three conditions are sufficient to ensure a trap-free landscape,57–59 which several studies have assumed to apply to AQML algorithms.36,49 The conditions are as follows:

  1. Unconstrained Fields. The control fields $f_n^\alpha$ should be allowed to take on any real value.

  2. Controllability. The terms in H(t) should produce any element of the dynamical Lie algebra through nested commutators.

  3. Local Surjectivity. The dynamical derivatives of the ansatz for every parameter must be full-rank in the dynamical Lie algebra.

For AQML, the dynamical Lie algebra comprises all possible Pauli strings P composed of products of Xn, Yn, Zn, and the identity operator. Local surjectivity requires that every Pauli-string P has a nonzero average overlap with the dynamical derivative in Eq. (7). That is,
(8)

The first two conditions are satisfied for our AQML ansätze. Indeed, the control fields are unconstrained, and the ansätze allow for universal quantum computation,35,60 which is linked to controllability.61 On the other hand, local surjectivity must be checked numerically on all $4^N - 1$ Pauli strings. As a result, local surjectivity is often assumed. Theoretical studies of QOC landscapes justify this assumption because, experimentally, optimal control fields are indeed easy to find in most cases.62 However, this ease does not carry over to AQML algorithms.
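For small N, such a check can be sketched as follows, assuming the overlap in Eq. (8) is the Hilbert–Schmidt inner product between a Pauli string and the dynamical derivative, averaged over random parameter draws; the finite-difference derivative and the function names are illustrative, and the actual test used for Fig. 2(b) is in Ref. 53.

import numpy as np
from itertools import product

PAULIS = {"I": np.eye(2, dtype=complex),
          "X": np.array([[0, 1], [1, 0]], dtype=complex),
          "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
          "Z": np.diag([1.0, -1.0]).astype(complex)}

def pauli_strings(N):
    # All 4^N - 1 nontrivial Pauli strings on N qubits.
    for labels in product("IXYZ", repeat=N):
        if set(labels) == {"I"}:
            continue
        P = np.array([[1.0]], dtype=complex)
        for s in labels:
            P = np.kron(P, PAULIS[s])
        yield "".join(labels), P

def average_overlaps(U_of_theta, theta_sampler, idx, N, n_samples=50, eps=1e-5):
    # Estimate <|Tr(P^dag dU/dtheta_idx)|> over random parameter draws, using a
    # central finite difference for the dynamical derivative of Eq. (7).
    sums = {label: 0.0 for label, _ in pauli_strings(N)}
    for _ in range(n_samples):
        theta = theta_sampler()
        step = np.zeros_like(theta)
        step[idx] = eps
        dU = (U_of_theta(theta + step) - U_of_theta(theta - step)) / (2 * eps)
        for label, P in pauli_strings(N):
            sums[label] += abs(np.trace(P.conj().T @ dU))
    return {label: s / n_samples for label, s in sums.items()}

# Local surjectivity fails if some entry stays (numerically) pinned at zero.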

In  Appendix B, we show that if one assumes these three conditions for an AQML ansatz with at least $4^N - 1$ variational parameters, then the landscape of LE is trap-free (i.e., it consists of saddles and only one local minimum and maximum). We remark that the over-parametrizing requirement is sufficient to produce a theoretical prediction of trap-free landscapes, but the numerics in Sec. III B show that over-parametrizing results in higher losses in practice. This proof uses the methods developed in Ref. 57. We highlight that the landscape analysis depends on the loss function. The loss dictates the values of the landscape, while its derivatives dictate the critical values and curvature. The analysis in this paper only applies to the loss in Eq. (5). However, as has been shown in QOC using the techniques in Ref. 58, the QOC assumptions suffice to produce trap-free landscapes for other losses, such as those related to ground state preparation, and trap-free landscapes have been observed in a plethora of settings.63 

Figure 2(b) shows numerical tests of local surjectivity using A1 and A2 QP ansätze with Fourier and Gaussian parametrizations for different qubit numbers. We see that while the average overlap for many Pauli strings is nonzero, there exist Pauli strings for which the overlap is zero or close to zero. In other words, the ansätze fail the local surjectivity condition even in the over-parametrized regime of $4^N - 1$ parameters. We also observe similar results for fewer parameters and other parametrizations, such as using Legendre polynomials and piecewise constant functions.

This observed violation gives way to a natural question: under what conditions can we expect an AQML algorithm to violate local surjectivity? In  Appendix C, we show that when an AQML ansatz produces a unitary U(θ) that is Haar-randomly distributed, then local surjectivity is violated for every P. This condition can be relaxed to U(t; θ) following a unitary 1-design when the control functions gk vanish around t = 0, T, in which case local surjectivity is violated for Pauli strings P contained in the native Hamiltonian. The observations in  Appendix C provide us with design guidelines for AQML algorithms. First, it is best to use time-dependent control functions that do not vanish at the endpoints. Second, it is best to avoid native and control Hamiltonians that produce dynamics resembling Haar-random unitaries.

We highlight that following these guidelines does not guarantee trap-free landscapes. Our ansätze do not produce Haar-random unitaries. Therefore, further work is needed to close the gap between our analytic and numeric understanding of the scenarios leading to the violation of local surjectivity and of the cases in which a violation leads to trap-ridden landscapes. However, as Sec. III B shows, trap-free landscapes can still appear when an ansatz is co-designed, even when local surjectivity is violated.

We now present a numerical analysis of the landscapes of A1 and A2 ansätze. We show that A2 QPs are trap-free. We compare the quality of the solutions produced by both A1 and A2 QPs at approximating the time evolution in Eq. (6). Importantly, this evolution is theoretically simulatable since both ansätze are expressive, but only A2 QPs are co-designed to accomplish this task. The main results are presented in Figs. 3 and 4.

FIG. 3.

Approximating the 1D transverse-field Ising evolution on type A1 and A2 QPs. Panel (a) shows the average Frobenius norm found for both types of QPs after 100 trials and for different qubit numbers. We observe that both types of QPs fail to approximate the desired time evolution. A2 QPs consistently outperform A1 QPs. To further investigate the A2 QPs’ behavior, (b) shows the average and minimum Frobenius norm for A2 QPs with an increasing number of control functions K for N = 4 and 5. For N = 4 qubits, we observe an improvement in the loss as we approach the over-parametrized regime (K = 26), followed by an increasing error due to over-fitting. For N = 5 qubits, this improvement is no longer present. Moreover, when comparing the average loss to the minimum loss, we observe a stark difference of approximately two orders of magnitude, thus suggesting a wide spread of the loss at the critical points.

FIG. 4.

Trap-free AQML landscapes for A2 ansätze. Each plot shows the sorted Hessian eigenvalues across 50 trials, calculated through automatic differentiation, for N = 4 A1 and A2 QPs. Each eigenvalue represents the local curvature in each linearly independent direction at the found critical points. Positive and zero curvature (eigenvalue) present for the first row (A1 ansätze) correspond to local/global minima, whereas negative or zero curvature present for the second row (A2 ansätze) correspond to saddle points. The basis set sizes (K) are chosen such that the ansätze on the left (right) column are under- (over-) parameterized. All eigenvalues are non-negative for the first row (A1 QP ansatz), characterizing these critical points as traps. This is no longer the case for the second row (A2 QP ansatz), where we observe the appearance of positive and negative eigenvalues. Interestingly, we observe that in both parameter-size regimes there is no significant change in the properties and quality of the critical points. That is, for A1 ansätze, the loss landscape in both regimes consists solely of local (sub-optimal) minima, whereas for A2 ansätze the landscapes remain free of local minima. The critical points found through A2 present positive and negative curvature along some directions, thus characterizing them as saddle points despite this ansatz violating key assumptions in QOC theory.


For Fig. 3(a), we initialized 100 instances of each QP ansatz with K = 5N and K = 3N control fields for A1 and A2 ansätze, respectively. We note that we ran several more simulations with varying numbers of fields, and the simulation results remained practically unchanged. Figure 3(a) shows the average converged loss. We observe that from N = 4 onwards, both ansätze failed to converge, on average, to the optimal solution. We attribute the sudden change at N = 4 to the fact that both A1 and A2 ansätze have a “star-like” architecture, which reduces to the Ising connectivity at the values of N = 2 and N = 3. At N ≥ 4, however, neither can reliably converge to the optimal solution. We note that the A2 ansatz consistently outperforms the A1, and the error bars show a much larger spread in the loss distribution at the converged critical points.

To further investigate the effect of the number of control functions (K) and obtain a better picture of the quality and spread of the solutions for these landscapes, we repeated the same experiments for the A2 QP ansatz with varying K. Figure 3(b) shows the resulting plots of the minimum and average over 100 trials for each value of K. For N = 4, we see an improvement in the average and minimum loss found as K approaches the over-parametrized regime at K = 26, followed by an increasing error due to over-fitting. In addition, we observe a stark difference between the average and minimum solutions found by ∼2 orders of magnitude. This contrasts with the homogeneity in loss values observed for the A1 QPs. For N = 5, the sudden improvement as the pulse basis sets grow in size is absent. Yet, we still recover the trend in the spread of the found critical points, signaling a richer and more heterogeneous distribution of critical points with differences in quality by ∼2 orders of magnitude. The explored values of K encompass the under- and over-parametrized regimes for each value of N (K = 26 and 85, respectively). Finally, Appendix D shows the convergence rates of our AQML models, which are high.

We investigate the nature of the critical points we found for both A1 and A2 QPs. Using automatic differentiation, we calculate the exact Hessian matrix at each critical point across 50 trials with varying numbers of control parameters K. Figure 4 shows the results for four select choices of K corresponding to below and above the over-parametrized regime of $4^N - 1$ parameters (see Appendix D for more details). The right (left) column plots the Hessian eigenvalues for both ansätze in the under- (over-) parameterized regime. As we can see, for the A1 QPs, both regimes exhibit only non-negative eigenvalues; therefore, these critical points correspond to local minima. For the A2 QPs, however, we observe the presence of both positive and negative eigenvalues in both regimes. The critical points of A2 QPs can thus be classified as saddles. This result adds to the difference in solution quality achievable through both ansatz classes, as it further characterizes the A2 QP solutions as saddles that can potentially be escaped by increasing the number of optimization cycles.

In sum, our numerical experiments provide evidence of the absence of amenable landscapes for black-boxed expressive algorithms. Rather than seeing a computational phase transition around the overparameterized regime, we found that a more accurate predictor of success was whether a model is co-designed with a particular task. In particular, the A1 QPs are swamped with traps. However, the experiments also revealed a quantitative and qualitative difference in the achievable solutions through the different ansätze. Specifically, while neither class was able to produce an optimal solution on average, we observed a significant increase in the variability of the quality of the solutions achievable through the A2 QPs and, for both cases, a significantly better solution (by ∼3 orders of magnitude when compared to the A1 QP solutions) was found. Additionally, we found that the A2 QPs produced saddle points, which opens the possibility of improving the solutions found by optimizing them for longer. The co-designed algorithm with a modest amount of parameters outperformed non-co-designed algorithms with even an exponential number of parameters. These drastically different results in solution quality thus hint at the importance of choosing the right ansatz for the appropriate problem.

We have shown that black-box AQML algorithms suffer from trap-ridden landscapes. Therefore, an outstanding set of questions remains: What kinds of unitaries can an AQML algorithm readily approximate? Conversely, given a desired unitary, can we devise a systematic methodology for allocating attention to different ansätze? This section argues that these questions can be addressed by thinking of algorithm-task co-design.

Co-design is not a new concept; significant effort has gone into describing how quantum computing architectures are often built to address specific problems.50–52 More generally, technologies are defined by and help define the issues they aim to resolve.64 Co-design is crucial in the development of hardware control software,65 in proposals for materials and chemistry simulation,66–69 and in approaches to error correction.70–72 Indeed, the field of AQML broadly asserts that the applications developers envision are inseparable from the hardware we expect to use to realize them.

In this section, co-design will take the following particular meaning. An AQML ansatz is co-designed for a desired unitary evolution if its Magnus expansion contains the operators of the desired evolution weighted by independently tunable coefficients. Likewise, understanding the Magnus expansion of an AQML ansatz reveals which desired unitaries, and hence which applications, it is co-designed for.

We remark that co-design is linked to the dynamical Lie algebra (DLA) of a QML model. The DLA of a model is the set of operators generated by the nested commutators of the model’s native and control Hamiltonians. A model with a DLA consisting of D linearly independent operators exhibits vanishing gradients concentrated around zero as 1/D.73 Vanishing gradients are equivalent to the presence of narrow local minima with 1/D curvature scaling.74 Therefore, the most expressive models suffer from barren plateaus with exponentially concentrated local minima.5 When a co-designed model is trained, it explores a region of parameter space where Hamiltonian terms outside the target Hamiltonian are zeroed out while keeping desired terms accessible. This parameter region corresponds to a smaller effective DLA of size D′ ≪ $4^N$, therefore avoiding narrow local minima. While this guarantees an absence of parasitic local minima, trap-free landscapes are not guaranteed. Yet, our numerics show that trap-free landscapes do appear. Therefore, further work is needed to design a predictive theory by exploring whether methodology linked to the DLA can be used to explain trap-free landscapes or by finding a QOC theory that does not require local surjectivity but does predict trap-free landscapes.

Let us begin by justifying our focus on the Magnus expansion and showing how it can be used for co-design.

A time-dependent Hamiltonian produces a unitary U(θ), which can be expressed as generated by an effective Hamiltonian, $U = \exp\{-iTH_{\mathrm{eff}}(\boldsymbol{\theta})\}$. This effective Hamiltonian is calculated using the Magnus expansion75,76
(9)
each term in the expansion comprises nested commutators of the control and native Hamiltonian of the AQML algorithm (see  Appendix E for details).
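For reference, the first two terms of the expansion take the standard average-Hamiltonian form below; sign and normalization conventions may differ slightly from those used in Appendix E.

H^{(0)} = \frac{1}{T}\int_0^T H(t;\boldsymbol{\theta})\,dt, \qquad
H^{(1)} = -\frac{i}{2T}\int_0^T dt_1 \int_0^{t_1} dt_2\,\bigl[H(t_1;\boldsymbol{\theta}),\,H(t_2;\boldsymbol{\theta})\bigr].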

Each term in the Magnus expansion contains $O(N^{2l})$ operators. At first glance, for simple control fields (i.e., small K), one would expect that the coefficients of different operators are linearly dependent. However, numeric experimentation shows that the operators’ coefficients are linearly independent and can thus be, in theory, tuned independently. For this, see Fig. 10 in Appendix E, where we also argue that the relevance of the integration order ensures linear independence by analyzing operators generated up to l = 2.

Unsurprisingly, we can expect an ansatz to be reliably used to simulate the dynamics of a many-body Hamiltonian whose interactions resemble those in Hnat. This is because the term H(0), called the average Hamiltonian, is
(10)
where $F_i^\alpha$ is the integral of $f_i^\alpha$ divided by T. Figure 5 shows the A1 QP and A2 Ising ansatz results in approximating two kinds of evolution. Figure 5(a) shows the training results of approximating a 1D transverse-field Ising model as in Eq. (6) with h = 0.5. The QP ansatz performs poorly, while the Ising ansatz is successful. Likewise, Fig. 5(b) shows the training results of approximating a star model with a transverse field of strength h = 0.5. In the star evolution, all qubits interact with a central qubit. This kind of evolution was, in fact, the inspiration for the QP connectivity. The Ising ansatz performs poorly, while the QP ansatz is successful.
FIG. 5.

The importance of task-algorithm co-design. In (a) and (b), we train an A1 QP and an A2 Ising ansatz at two different tasks: simulating the evolution under a 1D transverse-field Ising model and under a transverse-field star model. Panel (a) shows the Ising evolution results, demonstrating that the Ising ansatz succeeds while the QP ansatz trails behind. Panel (b) shows that the QP is better suited for simulating the star evolution than the Ising ansatz. Using an ansatz suited for a particular task is key to success. Panels (c) and (d) pictorially depict calculating the operators and their coefficients in the Magnus expansion to co-design ansätze for a particular task. Panel (c) depicts the transformation of different Hamiltonian terms in the QP ansatz due to nested commutations. Single-site operators transform to other single-site operators when commuted with control fields. The interaction terms in Hnat generate many-body terms. Going against the arrows results in the coefficients picking up an extra factor of −1. Panel (d) exemplifies the operators in the Magnus expansion emerging from commuting an interaction term twice. The resulting operators are all in the l = 2 term of the Magnus expansion. The arrows here symbolize the coefficients associated with each operator. We exemplify the coefficient in front of $Z_iZ_jX_N$, which can be used for time-reversible spin-squeezing.


For both experiments, N = 4 and 100 randomly initialized instances were optimized to approximate a T = 1 evolution with K = N. We observe that convergence is achieved when $a_{ik}^\alpha \approx 0$ for all k > 0 (i.e., the algorithm is trained to choose constant fields) and for α = y, z. Importantly, for the case of constant fields, H(l) = 0 for all l > 0.

The example of co-design in Fig. 5 is very simple. However, the idea is that by analyzing the terms in H(l)(θ) for l > 0, and the coefficients in front of them, we can tell what kinds of unitaries can be readily approximated by our AQML algorithm. Notice that this is quite different from ensuring simulatability through either a claim of universality or controllability. In the case of universality, a desired unitary Utarg is broken down into a product of unitaries, each achievable given a particular hardware architecture. Instead, our approach focuses on determining whether the Hamiltonian generating the desired unitary is spanned by the terms within the ansatz’s Magnus expansion. Controllability studies whether the nested commutators of the terms in the Hamiltonian can generate every generator of the dynamical Lie algebra (Pauli strings, in our case). Our approach also attends to the coefficient in front of a given generator, which influences how easy it is to achieve evolution under said generator using an AQML ansatz in an experiment. By looking at the Magnus expansion, one can gain insight into both the generators and the coefficients in front of the generators, which are crucial to determining the likelihood of success in practice. In summary, if A is an operator present in the Magnus expansion with a coefficient c(θ), we can, in theory, approximate the evolution $\exp\{-ic(\boldsymbol{\theta})A\}$ as long as c(θ) can be made nonzero and all other coefficients for other operators can be suppressed.

Figures 5(c) and 5(d) exemplify how to calculate the operators and coefficients in the Magnus expansion for l = 2 for a QP. Figure 5(c) is a pictorial representation of how the Hamiltonian terms transform into each other through commutation. For example, the operator Zi transforms into Yi while the associated coefficient picks up a factor of $if_i^z(t)$. The yellow, short-dash, long-dash arrow in Fig. 5(c) depicts this example. Going in the direction opposite to the arrows picks up an extra −1 on the coefficient. Figure 5(d) exemplifies how to use this pictorial depiction of the commutation-induced transformations to compute the operators generated from the interaction $Z_iZ_N$ appearing in H(2). We note that new interactions emerge between two inputs and the output qubit (i.e., three-body terms). Using panel (c), each term comes with a coefficient composed of a nested integral of control fields and interaction strengths (see Appendix E for details on how to calculate the coefficients).

It is important to note that the order of integration matters for these coefficients. For an example of the importance of integration order, see Eq. (E6) in Appendix E. Using this method and focusing on a particular term in the Magnus expansion labeled by l gives us a function mapping the variational parameters θ to the coefficients of all possibly generated Hamiltonian terms. Figure 9 shows all operators obtained from $Z_iZ_N$ in the l = 2 term of the expansion.

Figure 5(d) also exemplifies the coefficient in front of the operator $Z_iZ_jX_N$, creating an effective interaction between two input qubits mediated through the output qubit. In Sec. IV B, we show that this operator can naturally be used for time-reversible spin-squeezing, an observation that was previously made in Ref. 35 using second-order perturbation theory. Therefore, while the result that quantum perceptrons can perform time-reversible squeezing is not new, the Magnus expansion approach to arrive at this result shows that this can be achieved even outside the perturbative regime.

With this framework in mind, let us show that co-design via attention to the Magnus expansion can reveal naturally suitable evolutions for a given ansatz. Below, we show that QPs are naturally suited to implement time-reversible spin-squeezing and that the Ising ansatz is naturally well-suited for evolution under products of Jordan–Wigner strings, an evolution paramount in quantum chemistry applications of quantum computers.

This subsection offers a fresh perspective on an observation we made in previous work (Ref. 35), namely that QPs are suited for realizing time-reversible spin-squeezing, which can be used for quantum metrology applications. Our previous observation used second-order perturbation theory. This section shows that the Magnus expansion tells us QPs can do time-reversible spin-squeezing. We do not show how this can be used for metrology applications since that has already been explored. Instead, this section explains how the Magnus expansion can lead us to similar conclusions without the need for perturbation theory.

Figure 5(d) shows that the coefficient of the operator $Z_iZ_jX_N$ is given by
(11)
where $G_N^x(t)$ [$H_N^x(t)$] is the second (third) anti-derivative of the function $f_N^x$, and $\Delta G_N^x$ ($\Delta H_N^x$) is the difference of the second (third) anti-derivatives from t = 0 to T.
Then, a term in the Magnus expansion is ( Appendix E 2)
(12)
Here, $F_N^x$ is the anti-derivative (divided by T) of $f_N^x$. In the equation above, $S_{\mathrm{in}}^z = \sum_{i=1}^{N-1} Z_i$ is the total spin z-projection of the input qubits. The term $(S_{\mathrm{in}}^z)^2$ is called one-axis twisting,77 which creates spin-squeezing with applications to Heisenberg-limited metrology.78 Moreover, the inclusion of XN means that the output qubit can be used to squeeze the input when initialized in the state $|+\rangle_N = \frac{1}{\sqrt{2}}(|0\rangle_N + |1\rangle_N)$, the positive eigenstate of XN. Similarly, the output can anti-squeeze (i.e., create a time-reversed evolution) when in the state $|-\rangle_N = \frac{1}{\sqrt{2}}(|0\rangle_N - |1\rangle_N)$, the negative eigenstate of XN. This time reversal has proven an efficient methodology in metrology experiments, achieving Heisenberg scaling.79 
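A small numerical sketch makes the sign reversal explicit: build the generator $(S_{\mathrm{in}}^z)^2 X_N$ and project the output qubit onto |+⟩ or |−⟩, which flips the sign of the effective one-axis-twisting term acting on the inputs. The construction and normalizations below are illustrative only.

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    out = np.array([[1.0]], dtype=complex)
    for o in ops:
        out = np.kron(out, o)
    return out

def twisting_generator(N):
    # (S_in^z)^2 X_N, with S_in^z = sum_{i<N} Z_i and the output qubit placed last.
    S = sum(kron_all([Z if j == i else I2 for j in range(N)]) for i in range(N - 1))
    XN = kron_all([I2] * (N - 1) + [X])
    return S @ S @ XN

def project_output(G, out_state, N):
    # <out| G |out> taken on the last (output) qubit, leaving an operator on the inputs.
    d_in = 2 ** (N - 1)
    G4 = G.reshape(d_in, 2, d_in, 2)
    return np.einsum("a,iajb,b->ij", out_state.conj(), G4, out_state)

N = 4
G = twisting_generator(N)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
squeeze = project_output(G, plus, N)        # +(S_in^z)^2: one-axis twisting
anti_squeeze = project_output(G, minus, N)  # -(S_in^z)^2: time-reversed twisting
assert np.allclose(squeeze, -anti_squeeze)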

Therefore, the Magnus expansion analysis can elucidate the applications suitable for an AQML ansatz.

Finally, this subsection shows how the Magnus expansion can be used to co-design algorithms to simulate evolution under products of Pauli strings, which show up repeatedly in quantum chemistry applications.

Quantum computers promise to help with calculations related to chemical reactions and molecular configurations. One approach to achieving this promise relies on mapping N chemically relevant spin orbitals to N qubits with $N - m$ occupied and m virtual orbitals. Then, a reference state |ψref⟩ is evolved via a unitary U to minimize the energy of U|ψref⟩ as measured by the molecular Hamiltonian Hmol. This approach underpins algorithms like variational quantum eigensolvers (VQEs),6 where a circuit with variational parameters is tuned for energy minimization regardless of whether the circuit’s operation directly affects the occupancy of virtual orbitals. Chemical intuition has resulted in designing U to correlate virtual and occupied orbitals.80 This approach underpins the unitary coupled cluster (UCC) method, which aims at evolving under the UCC unitary $U = \exp\{i\hat{T}\}$, where $\hat{T}$ is the so-called cluster operator defined by
(13)
A larger overlap with the ground state is then achieved by tuning the real parameters $\{c_i^a, c_{ij}^{ab}, \ldots\}$, which mediate coupled occupancy (unoccupancy) of the virtual (occupied) orbitals.
To realize the UCC, it is crucial to implement the UCC unitary on a quantum computer. One popular approach is to break down U using Trotterization81 and implement each coupling operator independently. To do this, we must map operators like $a_a^\dagger a_i$ to the qubit basis using the Jordan–Wigner transformation, $a_m = \frac{1}{2}\left(\prod_{a<m} Z_a\right)(X_m - iY_m)$. Notice that, for example, the coupled operator $a_a^\dagger a_i$ necessitates implementing unitary evolutions generated by the following operators, which we call Jordan–Wigner products,
(14)
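As a concrete check, the sketch below builds Jordan–Wigner lowering operators as matrices and expands the hopping term $a_3^\dagger a_0 + \mathrm{h.c.}$ for four qubits, recovering a combination of the XZZX and YZZY products of Eq. (14); the sign convention of the mapping, and hence the overall sign of the result, is an illustrative choice.

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    out = np.array([[1.0]], dtype=complex)
    for o in ops:
        out = np.kron(out, o)
    return out

def jw_lowering(m, N):
    # a_m = (prod_{a<m} Z_a) (X_m - i Y_m)/2 under one common sign convention.
    return kron_all([Z] * m + [(X - 1j * Y) / 2] + [I2] * (N - m - 1))

N = 4
a0, a3 = jw_lowering(0, N), jw_lowering(3, N)
hop = a3.conj().T @ a0 + a0.conj().T @ a3        # a_3^dag a_0 + h.c.

target = (kron_all([X, Z, Z, X]) + kron_all([Y, Z, Z, Y])) / 2
# The hopping term equals +/- (XZZX + YZZY)/2, the sign depending on the convention.
assert np.allclose(hop, target) or np.allclose(hop, -target)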
Using the Magnus expansion, we see that an AQML algorithm can simulate evolution under Jordan–Wigner products. Take, for example, the Jordan–Wigner product $X_1Z_2\cdots Z_{N-1}X_N$. We can expect that evolution under this operator can be simulated using a rotated Ising ansatz with $H_{\mathrm{nat}} = J\sum_i X_iX_{i+1}$ and at least z controls. To see this, consider the Magnus expansion of this model. Through nested commutation, we have that
(15)
The operators above the arrows are the ansatz’s Hamiltonian operators with which we commute. It is worth noting that the Jordan–Wigner transformation is famously used to solve the time-independent 1D Ising model. Therefore, it is reasonable that the 1D Ising model can also simulate the evolution of Jordan–Wigner products.
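The commutator chain of Eq. (15) can be verified directly for N = 4: starting from one XX coupling and commuting alternately with Z controls and neighboring XX couplings reaches the Jordan–Wigner product X1Z2Z3X4 up to a scalar. The particular sequence below is only one of several possibilities and is meant as a sanity check, not as the derivation used in the text.

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def op(labels):
    out = np.array([[1.0]], dtype=complex)
    for s in labels:
        out = np.kron(out, {"I": I2, "X": X, "Y": Y, "Z": Z}[s])
    return out

def comm(A, B):
    return A @ B - B @ A

# One possible nested-commutator chain, cf. Eq. (15):
A = op("XXII")                       # the X1 X2 coupling
for B in [op("IZII"), op("IXXI"), op("IIZI"), op("IIXX")]:
    A = comm(A, B)

target = op("XZZX")                  # the Jordan-Wigner product X1 Z2 Z3 X4
ratio = np.trace(target.conj().T @ A) / np.trace(target.conj().T @ target)
assert np.allclose(A, ratio * target) and abs(ratio) > 0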

Figure 6 shows the results of simulating the evolution of different Jordan–Wigner products. We minimized Eq. (5) for these numerical experiments, starting from 100 random initializations of control fields and coupling constants J with K = 10 for N = 4. Fourier functions were used, but similar results were observed with Legendre and Gaussian functions. In Fig. 6, we show the channel fidelity of the optimized evolution U(θ). We see that both products’ evolution was faithfully simulated at short times, while it is easier to simulate the XZZY evolution over a longer period. The inset shows the results of simulating the product XZZX for different qubit numbers for T = 0.1 and K = 10. We see that the fidelity stays largely constant. Therefore, the simulation is faithful with linear resources in N. This is similar to other methods for calculating these products using fermionic swap operators, where faithful simulation requires resources to scale linearly in N.69 However, fermionic operations are only available on a handful of platforms, such as superconducting qubits and movable neutral atoms.82 Our approach, on the other hand, makes coupled-cluster simulation with linear resources available to other platforms, such as fixed neutral atoms and trapped ions, without requiring extra controls to achieve fermionic interactions.83 

FIG. 6.

Results of simulating evolution under Jordan–Wigner products on an Ising ansatz. We used N = 4 qubits to simulate the unitary evolution $e^{-iTP}$, where P is a Jordan–Wigner product as defined in Eq. (14) for a total time T. In particular, we show the average channel fidelity (F) of XZZX and XZZY evolution. We note that both products are faithfully simulated for small T. The inset shows the average fidelity obtained (T = 0.1) as a function of the number of qubits (N). We observe that the minimal fidelity remains constant around 99.5%. For all experiments, K = 10 control field terms were used. Therefore, a high-fidelity simulation is obtained with constant resources in N.


In this work, we systematically studied the landscapes of several AQML algorithms. We showed that AQML algorithms violate a key assumption from QOC theory that would suffice for trap-free landscapes. However, we observe that ansätze co-designed with specific tasks often showcase trap-free landscapes. Therefore, co-design is paramount to the development of successful AQML algorithms. We showed that, in the case of time evolution simulation, co-design can be realized by studying the Magnus expansion of a given ansatz, and we showcased our approach on tasks relevant to the quantum information community.

This work elucidates various lines of inquiry worth further exploration. First, further work is needed to design an AQML ansatz that satisfies local surjectivity. As pointed out in Ref. 84, QOC theory’s assertion of trap-free landscapes does not apply when the control fields are singular (i.e., they violate local surjectivity). However, recent work shows that even singular ansätze, depending on powers of control functions, can still produce trap-free landscapes.85 So far, these ansätze’s Hamiltonians include higher-order terms resulting from the higher-order effects of shining intense lasers onto quantum systems. Along this direction, further work could determine whether higher-order effects in the envisioned hardware of AQML algorithms can mitigate traps. For example, it is well known that optical-tweezer arrays of Rydberg atoms exhibit nonlinear dynamics when closely packed due to the Rydberg blockade mechanism. These dynamics could then give rise to trap-free landscapes. Alternatively, classes of basis functions may exist for which local surjectivity is met. Moreover, studying the effect of decoherence on the landscape remains an important line of inquiry.

Second, our study of the Magnus expansion as a means to co-design AQML algorithms is limited. Based on our work, we hope the community is inspired to study the viability of simulating higher-order terms from the UCC operator.

Third, the theory linked to the dynamical Lie algebra (DLA) and QOC approaches fails to explain the presence of trap-free landscapes in co-designed algorithms. The DLA’s prediction of barren plateaus and local minima whose curvature scales as 1/D only guarantees that a co-designed algorithm lacks parasitic local minima. QOC theory guarantees trap-free landscapes when local surjectivity is assumed. Therefore, further work is needed to design a predictive theory by exploring if methodology linked to the DLA can be used to explain trap-free landscapes or by finding a QOC theory that does not require local surjectivity but does predict trap-free landscapes.

Finally, using the operators uncovered by the Magnus expansion to simulate novel quantum dynamics seems particularly enticing. For example, the operator highlighted in Sec. IV B has a precise application to quantum metrology as it creates spin-squeezing. Moreover, such an all-to-all Ising model has recently exhibited a dynamical phase transition,86 which can enable new understandings of out-of-equilibrium phenomena.87 Multi-body terms such as those in Sec. IV C also appear when considering the dynamics of qubits to phononic baths,88 and understanding these models may enable long-lived quantum information storage, kinetically constrained dynamics, and correlated quantum states of matter.

The authors acknowledge Christian Arentz for his insightful discussion on QOC theory and Nishad Maskara for pointing out applications relevant to quantum chemistry. R.A.B. acknowledges support from the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. DGE1745303. J.G.P. acknowledges support from the Harvard College Research Program and the Harvard Quantum Initiative. S.F.Y. acknowledges funding from the NSF through the Q-IDEAS HDR Institute (No. OAC-2118310), the Q-SeNSE QCLI (No. OMA-2016244), and the CUA PFC (No. PHY-2317134). The authors also acknowledge the funding through the DARPA IMPAQT Program (No. HR0011-23-3-0023).

The authors have no conflicts to disclose.

Rodrigo Araiza Bravo and Jorge Garcia Ponce contributed equally to this work.

Rodrigo Araiza Bravo: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (lead); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). Jorge Garcia Ponce: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (supporting); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (supporting). Hong-Ye Hu: Formal analysis (supporting); Validation (supporting); Writing – review & editing (supporting). Susanne F. Yelin: Funding acquisition (lead); Project administration (supporting); Resources (lead); Supervision (supporting); Writing – review & editing (supporting).

The data supporting this study’s findings are openly available in Ref. 53.

The control fields $f_i^{\alpha}$ are sums of simple functions, $f_i^{\alpha}(t;\mathbf{a}) = \sum_{k=1}^{K} a_{ik}^{\alpha} g_k(t)$. We use the following different functions:
(A1)
(A2)
(A3)
(A4)
where $T$ is the total evolution time and $\sigma$ is a pulse width. The Fourier expansion is continuous, while the Gaussian expansion more closely represents a circuit-based (or pulse-based) system.
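For concreteness, the following minimal sketch shows how such a parametrization can be coded. Since Eqs. (A1)–(A4) are not reproduced here, the Fourier and Gaussian forms below (sine harmonics and unit-height Gaussians centered at t = 0, T/K, …, (K − 1)T/K) are illustrative assumptions rather than the published definitions.

```python
# Illustrative sketch of the control-field parametrization f_i^alpha(t) = sum_k a_ik^alpha g_k(t).
# The exact basis functions of Eqs. (A1)-(A4) are not reproduced here; the Fourier and
# Gaussian forms below are plausible stand-ins (assumptions), not the published definitions.
import numpy as np

def fourier_basis(t, K, T):
    """First K sine harmonics on [0, T]."""
    k = np.arange(1, K + 1)
    return np.sin(2.0 * np.pi * k[:, None] * t / T)              # shape (K, len(t))

def gaussian_basis(t, K, T, sigma=0.05):
    """Unit-height Gaussians centered at t = 0, T/K, ..., (K - 1)T/K."""
    centers = np.arange(K) * T / K
    return np.exp(-((t - centers[:, None]) ** 2) / (2.0 * sigma ** 2))

def control_field(a, t, basis, T):
    """One control field f(t) = sum_k a_k g_k(t) for a single qubit and direction."""
    return a @ basis(t, len(a), T)

# Example: a random Fourier-parametrized control on a grid of times.
t = np.linspace(0.0, 1.0, 200)
a = np.random.default_rng(0).uniform(size=10)
f = control_field(a, t, fourier_basis, T=1.0)
```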
We now show that a QP with Gaussian controls on the input qubits and piecewise-constant (PWC) controls on the output qubit can simulate the target unitary in Eq. (6). To do so, let us first observe that we can Trotterize the Ising evolution in Eq. (6) such that
(A5)
Therefore, we only have to ensure that a QP ansatz can produce the unitaries $U_R^i = e^{i\alpha X_i}$ and $U_I^i = e^{i\beta Z_i Z_{i+1}}$ for every $i$, at least for small $\alpha$ and $\beta$.
Consider a QP with Gaussian pulses of width $\sigma \ll 1$. Suppose the Gaussian functions $g_k(t)$ have a unit height and are centered at $t = 0, T/K, 2T/K, \ldots, (K-1)T/K$. The parameters $a_{ik}^{x,y}$ paired with these Gaussian functions dictate the magnitude of the control field. Additionally, suppose the output qubit contains piecewise-constant functions that change at the same intervals. The magnitude of the PWC functions is mediated via the parameters $b_{Nk}^{x,y}$. In that case, one can faithfully break down the unitary evolution of a QP as follows:
(A6)
Immediately, we notice that the right-most term of the equation above contains the rotations $U_R^i$. Moreover, the left-most term contains the unitary $U_I^{N-1}$. However, we are still missing the Ising-type interactions between different input qubits (i.e., $Z_i Z_{i+1}$ for $i < N-2$). Suppose, however, that the PWC parameters are such that $J/b \ll 1$. For the sake of argument, suppose that only the $x$-field is present. In that case, we have already shown in Ref. 32 that this effectuates the following evolution on the input qubits:
(A7)
This contains the interaction terms $U_I^i$ for which $j = i+1$. However, it also contains unwanted interactions, which we want to eliminate. This can be done by choosing the two subsequent rotations generated by the Gaussians to be of opposite sign and large magnitude, such that $\sigma a_{jk}^{x} = \pi = -\sigma a_{j(k+1)}^{x}$ for a $j$ we do not want. This means that per interaction term $U_I^i$ that we want to produce, we must use at least three values of $k$: one used to rotate qubits according to $U_R^i$, and two with $b_{Nk}^{x}$ large and equal to make sure that, for all $j \neq i+1$, we have a vanishing interaction. Notice that this works because
(A8)
For the case of $j = i+1$, the total evolution time of the interaction is $\beta = 3 J_i J_{i+1} T/(K b_{Nk}^{x})$, which is necessarily small.

This procedure necessitates that K be large enough for each interaction to be tuned independently. This means that K ≥ 3Nn, where n is the number of Trotter steps.

Moreover, our proof heavily relies on producing the effective Hamiltonian in Eq. (A7). Therefore, it is paramount that the QP ansatz includes PWC functions on the output qubit. In numerical experiments, we have seen that while Gaussian controls work best, Fourier controls also allow the QP to simulate the Ising evolution as long as the output contains PWC controls. We say that a QP with PWC controls is an A2 ansatz as specified in Sec. II C.

1. Derivation of the dynamical derivative

Assume that we can break the evolution of our system into M constant portions,
(B1)
where $\delta t_i = t_i - t_{i-1}$. Then,
(B2)
(B3)
Taking the continuous limit and using the midpoint definition of integration, we obtain
(B4)
By introducing the identity $\mathbb{1} = U(\boldsymbol{\theta})U^{\dagger}(\boldsymbol{\theta})$ on the left side of the integral, one obtains Eq. (7).

2. Trap-free landscapes

For this section, we use the loss $\mathcal{L}_E = \|U - W\|_F^2$, where $\|A\|_F^2 = \mathrm{Tr}(A^{\dagger}A)$ is the Frobenius norm. Note that this loss differs from that in Eq. (5) by a factor of $2^{N+1}$, but this discrepancy only changes its values, not its curvature. Evidently,
(B5)
where we use the notation $W = U_{\mathrm{targ}}$. Let us express the dynamical derivative in Eq. (7) as $\partial_{\theta_{nk}^{\alpha}} U = iU\mu_{nk}^{\alpha}$.
The derivative of the loss function is
(B6)
If we assume local surjectivity, then a critical point where all derivatives vanish can only occur if $U^{\dagger}W = W^{\dagger}U$.57 Obviously, one answer is that $U = e^{i\phi}W$ is a critical point for any phase $\phi$. We denote $U^{\dagger}W$ as $\chi$ and note that $\chi$ is unitary. At the critical points, $\chi = \chi^{\dagger}$, so $\chi^2 = I$ and $\chi$ must have eigenvalues $\pm 1$ ($\phi = 0, \pi$). For the critical points, the loss function takes on the following values:
where $n_i = 0$ if the $i$th eigenvalue of $\chi$ is positive and $n_i = 1$ otherwise. Clearly, the case where $W = U$ yields $\mathcal{L}_E = 0$, and $W = -U$ yields $\mathcal{L}_E = 4(2^N)$. Other critical points can be found by taking $W$ and permuting some of its columns using a permutation matrix $\Pi$; in such cases, $\chi = \Pi$. Evidently, the loss becomes $\mathcal{L}_E = 4n$ with $n = \sum_i n_i$, a critical value shared by $\binom{2^N}{n}$ degenerate critical points.
Let us now calculate the Hessian of the loss function $\mathcal{L}_E$. We obtain
(B7)
since $\mu_{nk}^{\alpha}$ is Hermitian.
Note that
Therefore,
(B8)
At a critical point, $\chi = \chi^{\dagger}$ and, therefore,
(B9)
Since $\chi$ is diagonalizable, there exists a change of basis $D$ such that $\chi = DAD^{\dagger}$, where $A = \mathrm{diag}\big((-1)^{n_1}, (-1)^{n_2}, \ldots, (-1)^{n_d}\big)$ with $d = 2^N$. Let $a, b$ stand for the indices of $\theta_{nk}^{\alpha}$ and $\theta_{ml}^{\beta}$. Equation (B9) is the entry $H_{ab}$ of the Hessian $H$. Inserting the identity $DD^{\dagger}$ into Eq. (B9), we obtain
(B10)
Let us call $\bar{\mu}_{a,b} = D^{\dagger}\mu_{a,b}D$. Then,
Therefore, we have that
(B11)
(B12)
with the $d^2 \times d^2$ matrix $\Gamma$ defined as
(B13)
So far, this theory has yielded the same results as in Ref. 57. However, while in that case the Hessian is defined over pairs of times, $H(t, t')$, because the controls can be tuned at every point in time, in an AQML model the controls are tunable only through a finite set of $N_p$ parameters. The entire Hessian is then a finite matrix defined by
(B14)
The matrix $M$ is a $(d^2 \times N_p)$ matrix. Since we have assumed local surjectivity, the entries of $\mu_a$ are all linearly independent but not necessarily orthogonal.

When $N_p \geq d^2$, we can apply some of the results in Sec. VI of Ref. 57. These results show that there exists an orthonormal basis $O$ that diagonalizes $H$ ($C = O^{T}HO$), where $C$ is a diagonal matrix whose entries are the curvatures of the landscape at the critical points. Therefore, the matrix $\Gamma$ is congruent to $C$. By Sylvester's law of inertia, congruent matrices share rank (i.e., the number of nonzero eigenvalues) and signature (i.e., the number of positive minus negative eigenvalues). A signature whose magnitude equals the rank means the critical point is a minimum or a maximum if the signature is positive or negative, respectively. A signature smaller in magnitude than the rank means the critical point is a saddle point, because such a critical point has at least one direction in which the curvature is of the opposite sign from the other directions. Therefore, Eq. (B13) gives the most relevant information about the curvature through the signs of its entries and the number of zero entries.
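As a concrete illustration of this rank-and-signature bookkeeping, the following minimal sketch classifies a critical point from a numerically computed Hessian; the eigenvalue tolerance is an assumption, and the helper is not part of the analysis of Ref. 57.

```python
# Minimal sketch: classify a critical point from the eigenvalues of its Hessian.
# The tolerance used to call an eigenvalue "zero" is an assumption.
import numpy as np

def classify_critical_point(H, tol=1e-8):
    eigs = np.linalg.eigvalsh((H + H.T) / 2.0)      # symmetrize, then diagonalize
    rank = int(np.sum(np.abs(eigs) > tol))           # number of nonzero eigenvalues
    signature = int(np.sum(eigs > tol) - np.sum(eigs < -tol))
    if rank == 0:
        return "completely flat (all zero curvature)"
    if signature == rank:
        return "minimum"
    if signature == -rank:
        return "maximum"
    return "saddle point"
```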

For the case that $U = W$, $n_i = 0$ for all $i$ and, therefore, the first $d$ entries of $\Gamma$ are positive while the rest are zero. Therefore, the rank of the Hessian is $d$ and its signature is $d$, and this critical point is a global minimum. For the case that $U = -W$, $\Gamma$ has a rank of $d$ and a signature of $-d$, and it is thus a global maximum. For the cases that $U = \Pi W$, where $\Pi$ is a permutation of $n$ columns, the rank is $R_n = d(d-2n) + 2n^2$, which is greater than $d$ for $n > 0$. The signature is $S_n = d(d-2n)$. Clearly, for $n > 0$, $S_n < R_n$.

Let us summarize our findings. We made the following two significant assumptions:

  1. We assumed that our AQML algorithm is locally surjective. In practice, this needs to be checked at least numerically.

  2. We assumed that we have at least $2^{2N}$ variational parameters. In practice, this is a choice experiments must decide on. Making this choice can be seen as choosing to overparametrize the AQML algorithm.

Given those assumptions, we found that

  1. the critical points of $\mathcal{L}_E$ correspond to the cases in which $U^{\dagger}W = W^{\dagger}U$,

  2. the values of the loss function at these critical points are the integers $\mathcal{L}_E = 4n$, where $0 \leq n \leq 2^N$ is an integer; each critical value has a degeneracy of $\binom{2^N}{n}$ critical points,

  3. the rank of the Hessian at the critical points is $R_n = d(d-2n) + 2n^2$, which achieves a maximum value of $d^2$ for $n = 0$ ($U = W$) or $n = 2^N$ ($U = -W$), and

  4. The signature of the Hessian at the critical points is $S_n = d(d-2n)$. Therefore, the critical point of the case $W = U$ ($n = 0$) is a global minimum; that is, the Hessian contains only non-negative eigenvalues. Similarly, $W = -U$ ($n = d$) is a global maximum; that is, the Hessian contains only non-positive eigenvalues. For all other critical points, there is at least one positive and one negative eigenvalue and, therefore, they are saddle points.

In conclusion, the assumptions of local surjectivity and overparametrization should lead to trap-free landscapes of $\mathcal{L}_E$.

This appendix shows that AQML ansätze fail to satisfy local surjectivity in certain cases. To prove this statement, it is helpful to establish the following lemmas.

Lemma 1.
Suppose $h$ and $P$ are two Pauli strings and that $U_1$ and $U_2$ are independent unitary 1-designs. Then,
(C1)

Proof.
First, recall that for a 1-design, $\mathbb{E}_U(UAU^{\dagger}) = \frac{\mathrm{Tr}(A)}{2^N}\,\mathbb{1}$. Then,
(C2)
This follows from Tr(h) = 0. □

Lemma 2.
Suppose M is a Hermitian operator and U is a Haar random matrix. Then,
(C3)

Proof.
Since M is Hermitian, it can be diagonalized. That is, a unitary $D$ exists for which $M = D\Lambda D^{\dagger}$, where $\Lambda$ is diagonal. Then, using the right and left invariances of the Haar measure,
By symmetry, each of the terms in this last equation should vanish since, if they did not, the eigenvector associated with the eigenvalue $\Lambda_i$ would be a special direction for the Haar random unitary. □
Using these lemmas, we can now make a substantive statement about local surjectivity for AQML ansätze. Notice that testing local surjectivity amounts to calculating the following quantity:
(C4)
Here, we make a few extra assumptions. We assume that $U(\tau)$ and $U(T)$ are independent for $\tau < T$, that $U(\tau)$ is a 1-design for $\tau > 0$, and that $U(T)$ is a Haar random unitary. These assumptions imply that $U(T)$ is also a 1-design. For $0 < \tau < T$, the integrand vanishes due to Lemma 1. The edge case $\tau = 0$ gives an integrand containing $\mathbb{E}_{\boldsymbol{\theta}}\big(\mathrm{Tr}(U(\boldsymbol{\theta})hP)\big)$. The edge case $\tau = T$ gives an integrand containing $\mathbb{E}_{\boldsymbol{\theta}}\big(\mathrm{Tr}(U(\boldsymbol{\theta})Ph)\big)$. Since Pauli strings are closed under multiplication, both cases have integrands of the form of Lemma 2 and, therefore, vanish.

We can, therefore, conclude that local surjectivity is violated for every Pauli string whenever an AQML algorithm is such that $U(\boldsymbol{\theta})$ is a Haar random unitary. Notice that this statement also assumes that $U(\tau)$ and $U(T)$ are independent for $\tau < T$ and that $U(\tau)$ is a 1-design for $\tau > 0$.

Here, we must clarify that one could also relax the Haar-randomness requirement if $g_k(t)$ vanishes for $t \approx T$ and $t \approx 0$. In that case, an AQML algorithm producing unitaries that only resemble 1-designs suffices for the violation of local surjectivity.

Another caveat to our proof is that the Haar random requirement of Lemma 2 is stringent. It remains to be shown whether the average Tr(UM) can vanish for other kinds of unitary ensembles.
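The Haar averages invoked above can also be probed numerically. The short sketch below estimates $\mathbb{E}_U[\mathrm{Tr}(UM)]$ for Haar-random unitaries sampled via QR decomposition of a Ginibre matrix; the dimension and sample count are arbitrary choices for illustration.

```python
# Monte Carlo illustration of Lemma 2: the Haar average of Tr(U M) vanishes.
import numpy as np

rng = np.random.default_rng(0)
d = 8
M = rng.standard_normal((d, d))
M = (M + M.T) / 2.0                                   # an arbitrary Hermitian operator

def haar_unitary(d):
    """Sample a Haar-random unitary via QR of a complex Ginibre matrix."""
    A = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2.0)
    Q, R = np.linalg.qr(A)
    return Q * (np.diag(R) / np.abs(np.diag(R)))      # fix the column phases

samples = [np.trace(haar_unitary(d) @ M) for _ in range(5000)]
print(np.mean(samples))                               # close to 0, up to ~1/sqrt(5000) noise
```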

We numerically simulated the time evolution of a 1D Ising model with a transverse field, defined by
(D1)
with h = 0.1. We approximated this evolution with N = 2, 3, 4, and 5 qubits using the loss in Eq. (5).

The simulation code was implemented with PennyLane's differentiable pulse programming capabilities89 and the JAX framework. The optimizers were implemented using the Optax library. We note that these libraries support automatic differentiation; therefore, the gradients are exact up to machine precision. Additionally, we note that Gaussian and Fourier control fields behave qualitatively very similarly. The plots shown here are for Fourier controls.

The first phase of the numerical experiments consists of training 100 randomly initialized A1 QP ansätze for each qubit number. The initial parameters are sampled from a uniform distribution on the interval [0, 1]. We parameterize the pulses as a sum of the first K terms of a Fourier series, where K = 5N for N qubits. Concretely, we used 10, 15, 20, and 25 terms per control pulse for the experiments with 2, 3, 4, and 5 qubits, respectively. We note that the K = 5N cutoff for the series is arbitrary, but the system's behavior remained unchanged for several different K. As for the optimizer, we used Optax's Adam implementation.
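A minimal JAX + Optax sketch of such a training loop is shown below. It is not the code of Ref. 53 (which uses PennyLane's pulse programming); the piecewise-constant propagator, system size, step counts, and learning rate are illustrative assumptions.

```python
# Minimal sketch of training Fourier-parametrized controls to match a target unitary.
# Not the code of Ref. 53; the propagator and hyperparameters below are assumptions.
import jax
import jax.numpy as jnp
import optax
from jax.scipy.linalg import expm

N, K, T, STEPS = 2, 10, 1.0, 100                       # qubits, Fourier terms, time, time slices
X = jnp.array([[0.0, 1.0], [1.0, 0.0]], dtype=jnp.complex64)
Z = jnp.array([[1.0, 0.0], [0.0, -1.0]], dtype=jnp.complex64)
I2 = jnp.eye(2, dtype=jnp.complex64)

def embed(op, site):                                   # place a single-qubit operator at `site`
    out = jnp.array([[1.0 + 0.0j]])
    for j in range(N):
        out = jnp.kron(out, op if j == site else I2)
    return out

Xs = [embed(X, i) for i in range(N)]
H_nat = sum(embed(Z, i) @ embed(Z, i + 1) for i in range(N - 1))   # Ising term, J = 1 assumed
W = expm(-1j * T * (H_nat + 0.1 * sum(Xs)))                        # target of Eq. (D1) with h = 0.1

def controls(params, t):                               # f_i^x(t) as a truncated Fourier series
    k = jnp.arange(1, K + 1)
    return params @ jnp.sin(2.0 * jnp.pi * k * t / T)  # params has shape (N, K)

def propagate(params):                                 # piecewise-constant approximation of U(T)
    dt = T / STEPS
    U = jnp.eye(2 ** N, dtype=jnp.complex64)
    for s in range(STEPS):
        f = controls(params, (s + 0.5) * dt)
        H = H_nat + sum(f[i] * Xs[i] for i in range(N))
        U = expm(-1j * dt * H) @ U
    return U

def loss(params):                                      # Frobenius-norm distance to the target
    D = propagate(params) - W
    return jnp.real(jnp.trace(D.conj().T @ D))

params = jax.random.uniform(jax.random.PRNGKey(0), (N, K))
opt = optax.adam(1e-2)
state = opt.init(params)
for epoch in range(200):
    grads = jax.grad(loss)(params)
    updates, state = opt.update(grads, state)
    params = optax.apply_updates(params, updates)
```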

Figure 7 shows the results of the numerical experiments. The training process is successful in all cases and converges to a critical point. Yet the optimal approximation of the target unitary is only achieved at a value of $\mathcal{L}_E = 0$. Therefore, we see that for 4 and 5 qubits, all sampled QP ansätze converge to a sub-optimal critical point (i.e., a local minimum or maximum instead of the global optimum). We see similar results with the A2 QP ansatz, with a lower loss for N ≥ 4.

FIG. 7.

Training history of 100 random A1 QP ansätze on 2, 3, 4, and 5 qubits. The QPs were trained for 6000 epochs, and we observe that for each trial, $\mathcal{L}_E$ plateaus, suggesting that the training converged to a critical point. For 2 and 3 qubits, all trials converge to the optimal value of $\mathcal{L}_E = 0$, yet for 4 and 5 qubits this is no longer the case, and they all converge to sub-optimal values. The inset shows logarithmic plots of two different gradient metrics used to confirm convergence: the L2 norm of the gradient and the L2 norm of the differences between gradients at consecutive epochs. These metrics further confirm convergence to a critical point.

To confirm that the training had indeed converged, we focus on two additional metrics: the mean L2 norm of the gradient at each epoch [Eq. (D2)] and the mean L2 norm of the differences between the gradients at epoch i and epoch i − 1 [Eq. (D3)],
(D2)
(D3)

To make sense of these metrics, recall that at a critical point of $\mathcal{L}_E$, its gradient $\nabla\mathcal{L}_E = 0$ by definition. Therefore, the mean L2 norm of the gradient will only be zero at a critical point, and we would expect the gradients at that epoch and onwards to remain very small and close to 0. Similarly, because the gradients are zero at a critical point, the update rule of the optimizer at the critical point would not change the current parameters. Therefore, the difference between the gradients at epochs i and i − 1 at a critical point should remain small and close to 0.
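Both metrics can be computed directly from the stored gradient history, for example as in the short sketch below; the array layout is an assumption, and Eqs. (D2) and (D3) are not reproduced verbatim.

```python
# Sketch of the two convergence diagnostics, assuming the per-epoch gradients of all
# trials have been stacked into an array of shape (trials, epochs, n_params).
import numpy as np

def convergence_metrics(grads):
    grad_norm = np.linalg.norm(grads, axis=-1).mean(axis=0)                   # ~ Eq. (D2), per epoch
    diff_norm = np.linalg.norm(np.diff(grads, axis=1), axis=-1).mean(axis=0)  # ~ Eq. (D3), per epoch
    return grad_norm, diff_norm
```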

We see that for all experiments, both metrics keep decreasing and plateau by the end of the training. For 2 and 3 qubits, this plateau occurs at ${\sim}10^{-5}$, whereas for 4 and 5 qubits it is located at around $10^{-4}$.

The second phase of the numerical experiments consists of calculating the Hessian matrix at the critical points found during training. Recall that the Hessian matrix encodes information about the second derivatives of the function at a given point and, therefore, encodes information about the local curvature of the landscape around that point. In particular, a positive eigenvalue corresponds to a direction (the direction of its corresponding eigenvector) with positive curvature, and conversely, a negative eigenvalue corresponds to a direction with negative curvature. A zero eigenvalue represents a direction with zero curvature (i.e., flat).
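In practice, the Hessian and its extreme eigenvalues can be obtained by automatic differentiation. A minimal sketch, reusing the `loss`, `params`, `N`, and `K` names from the illustrative training sketch above (an assumption, not the pipeline of Ref. 53), is:

```python
# Hessian eigenvalues at the converged parameters via automatic differentiation.
# Reuses `loss`, `params`, `N`, and `K` from the illustrative training sketch above.
import jax
import jax.numpy as jnp

flat_loss = lambda p: loss(p.reshape(N, K))          # flatten parameters for jax.hessian
H = jax.hessian(flat_loss)(params.reshape(-1))
eigs = jnp.linalg.eigvalsh((H + H.T) / 2.0)          # eigenvalues of the symmetrized Hessian
print(float(eigs.min()), float(eigs.max()))          # the extreme signs reveal the local curvature
```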

Figure 8 shows the minimum and maximum eigenvalues of the Hessian matrices after the training process converges to a critical point for each trial of A1 QPs. The insets show that for 2 and 3 qubits, the final loss is indeed optimal. Yet even in this optimal case, we observe that there is always a 0 eigenvalue corresponding to some locally “flat” direction around the critical points. That is, even for qubit numbers that can converge to an optimal value, the landscape is not as well behaved as the theory predicts. This suggests that the ability to converge to the global minimum for the A1 QP and similar AQML ansätze is not a consequence of a trap-free landscape. We note that for A2 QPs, the eigenvalues are both positive and negative (see Fig. 4), and so those landscapes are trap-free.

FIG. 8.

Minimum and maximum eigenvalues of the Hessian matrix at the converged critical points for each trial of an A1 QP. Since the minimum eigenvalue is always 0, every trial converged to a critical point with a positive semidefinite Hessian. The presence of a 0 eigenvalue indicates that in some direction around the critical point, the landscape is locally “flat.” The insets show a histogram of the final losses at each critical point.


1. Magnus expansion analysis

Every unitary $U$ can be written as the evolution under a Hamiltonian $H_U$ via $U = \exp(-iTH_U)$. Formally, $H_U = \frac{i}{T}\log(U)$. On the other hand, a time-dependent Hamiltonian generates an evolution as in Eq. (4). This evolution can be written as $\exp(-iTH_{\mathrm{eff}})$, where $H_{\mathrm{eff}}$ is an effective Hamiltonian.
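This relation can be checked numerically with a matrix logarithm. In the small sketch below, the generator H is an arbitrary example, and the principal branch of the logarithm is assumed to be the relevant one.

```python
# Numerical illustration of H_U = (i/T) log(U) for an example Hermitian generator.
import numpy as np
from scipy.linalg import expm, logm

T = 1.0
H = np.array([[1.0, 0.3], [0.3, -1.0]])       # arbitrary example generator
U = expm(-1j * T * H)
H_U = (1j / T) * logm(U)                       # principal branch of the matrix logarithm
print(np.allclose(H_U, H))                     # True when the eigenphases stay within (-pi, pi)
```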

According to average Hamiltonian theory,75,76 the effective Hamiltonian is given by the Magnus expansion
(E1)
(E2)
(E3)
(E4)
In the case that $T\|H(t)\|_S \ll 1$ (where $\|A\|_S$ is the spectral norm of an operator), it is possible to truncate the expansion since the $l$th term is of order $O\big(T^l\|H(t)\|_S^l\big)$. However, all orders are relevant to the cases presented in the main text.

Note that for a constant Hamiltonian, all terms with l > 0 cancel since [H(t1), H(t2)] = 0.

The Hamiltonians generated by the Magnus expansion depend on the native and control Hamiltonians. We will use $S_H^l$ to denote the set of operators generated by the nested commutators in $H^{(l)}$. This means that $H^{(l)}$ is a linear combination of the operators in $S_H^l$ (i.e., $H^{(l)} = \sum_{O \in S_H^l} \alpha_O O$). It is worth noting that different orders may produce the same operators; that is, in general, $S_H^l \cap S_H^m \neq \emptyset$ for $l \neq m$. An example here is instructive. Consider $H_{\mathrm{nat}} = JZ_1Z_2$ with x-controls $H_c(t) = \sum_i f_i^x(t)X_i$. Clearly, $S_H^0$ contains $Z_1Z_2$. We can recover this operator by commuting $Z_1Z_2$ with the same control operator $X_i$ twice. Therefore, $Z_1Z_2 \in S_H^2$ as well.

Theoretically, one can approximate every unitary evolution $U$ for which $H_U$ is in the span of $S_H = \bigcup_l S_H^l$. In practice, however, the Hamiltonians generated by the first few terms of the Magnus expansion may be easier to optimize and find.
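The sets $S_H^l$ can be enumerated mechanically by nesting commutators, as in the following sketch for the two-qubit example above ($H_{\mathrm{nat}} = JZ_1Z_2$ with x-controls). The labeling helper and normalization are illustrative choices, not the bookkeeping used to produce Table I.

```python
# Sketch: collect the Pauli strings produced by one and two nested commutators for the
# two-qubit example H_nat = J Z1 Z2 with x-controls X1, X2. Labels and normalization are
# illustrative; this is not the bookkeeping used to generate Table I.
import numpy as np
from itertools import product

single = {"I": np.eye(2, dtype=complex),
          "X": np.array([[0, 1], [1, 0]], dtype=complex),
          "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
          "Z": np.array([[1, 0], [0, -1]], dtype=complex)}
strings = {a + b: np.kron(single[a], single[b]) for a, b in product("IXYZ", repeat=2)}

def label(op):
    """Return the Pauli-string label proportional to `op`, or None if op is zero."""
    for name, P in strings.items():
        coeff = np.trace(P.conj().T @ op) / 4.0
        if abs(coeff) > 1e-12 and np.allclose(op, coeff * P):
            return name
    return None

comm = lambda A, B: A @ B - B @ A
native = strings["ZZ"]                          # Z1 Z2
controls = [strings["XI"], strings["IX"]]       # X1, X2

S1 = {label(comm(native, C)) for C in controls}
S2 = {label(comm(comm(native, C1), C2)) for C1 in controls for C2 in controls}
print(S1 - {None})                              # {'YZ', 'ZY'}
print(S2 - {None})                              # contains 'ZZ' again, plus 'YY'
```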

The coefficient in front of each generator is a parametric function. For example, returning to the example above, up to second order,
(E5)
Notice that we have not simplified the last two terms since the order of integration matters. To see this, suppose the controls are polynomials of orders two and three. Then, note that
(E6)
In this case, the contributions of the coefficients are distinct but linearly dependent.
One can follow this procedure to construct the terms in $H^{(l)}$ and their coefficients. We exemplify the procedure with the Ising ansatz with global control fields,
(E7)
(E8)
  1. First, we construct a graph that explains how the elements in $H_{\mathrm{nat}}$ and $H_{\mathrm{ctr}}$ transform under commutation. In our example, the operator $X_i$ transforms into $Y_i$ when commuted with $Z_i$, picking up a coefficient of $if_z$. It also transforms into $Y_iZ_{i+1}$, picking up a coefficient of $iJ$. Figure 9 exemplifies the transformations generated by terms in our Hamiltonian. The arrows represent commutations that lead to the prescribed coefficient, while going in the opposite direction requires an extra factor of −1.

  2. To construct a term given by $l$ commutation relations, we first grab an operator from the Hamiltonian (for example, $JZ_iZ_{i+1}$) and operate on it using the graph in Step 1 a total of $l$ times to produce one possible operator. We observe that many different operators can be created in this fashion starting from a given initial operator. We must write the acquired coefficients from right to left at each stage. For time-dependent coefficients, the times acquire growing indices, from 0 to $l$, from right to left. For example, for $l = 2$, we may obtain the following operators:
    (E9)
    (E10)
  3. We then multiply the coefficients by the constant $(-i)^l/(l!\,T)$, which weighs the contribution to the Magnus expansion. Finally, the coefficients are integrated and added to all other contributions paired with the same operator. For the paths exemplified above in Step 2, we see that for $Z_iZ_{i+1}$ we obtain the coefficients
    (E11)
    (E12)

    With this in mind, the coefficients of all the operators of the Magnus expansion up to $l = 2$ for the model in Eq. (E7) are shown in Table I, where we have used the notation $\overline{[A,B]} = \overline{A\bar{B}} - \overline{B\bar{A}}$. Notice that if the order of integration did not matter, then $X_iZ_{i+1}$ and $Y_iZ_{i+1}$ would have a zero coefficient, and the coefficients of $X_iX_{i+1}$ and $X_iY_{i+1}$ would be the same (up to a sign). Therefore, if the order of integration did not matter, the Magnus expansion would lead to linearly dependent coefficients, so individual terms could not be tuned independently.

FIG. 9.

Diagrammatic representation of the commutation relationships of the Hamiltonian composed of a nearest-neighbor Ising-type interaction and global controls along the x, y, and z directions. Arrows represent directions where coefficients are multiplied by a +1, while directions opposite to the arrows acquire a factor of −1.

TABLE I.

Operators and coefficients of the Ising ansatz with global fields appearing in the Magnus expansion up to second order.

Operator | Coefficient
$X_i$ | $\bar{f}_x + \overline{[f_z,f_x]} + \frac{1}{2}J\overline{[f_x,J]} + \frac{1}{2}f_y\overline{[f_x,f_y]} + \frac{1}{2}f_z\overline{[f_x,f_z]}$
$Y_i$ | $\bar{f}_y + \overline{[f_x,f_z]} + \frac{1}{2}J\overline{[f_y,J]} + \frac{1}{2}f_x\overline{[f_y,f_x]} + \frac{1}{2}f_z\overline{[f_y,f_z]}$
$Z_i$ | $\bar{f}_z + \overline{[f_y,f_x]} + \frac{1}{2}f_x\overline{[f_z,f_x]} + \frac{1}{2}f_y\overline{[f_z,f_y]}$
$X_iX_{i+1}$ | $\frac{1}{2}f_z\overline{f_yJ} + \frac{1}{2}f_y\overline{[f_y,J]}$
$X_iY_{i+1}$ | $\frac{1}{2}f_y\overline{f_xJ} + \frac{1}{2}f_x\overline{[J,f_y]}$
$X_iZ_{i+1}$ | $\overline{[J,f_y]} + \frac{1}{2}J\overline{[f_x,f_z]} + \frac{1}{2}f_z\overline{[f_x,J]}$
$Y_iX_{i+1}$ | $\frac{1}{2}f_x\overline{f_yJ} + \frac{1}{2}f_y\overline{[J,f_x]}$
$Y_iY_{i+1}$ | $\frac{1}{2}f_x\overline{f_xJ} + \frac{1}{2}f_x\overline{[f_x,J]}$
$Y_iZ_{i+1}$ | $\overline{[f_x,J]} + \frac{1}{2}J\overline{[f_y,f_z]} + \frac{1}{2}f_z\overline{[J,f_y]}$
$Z_iX_{i+1}$ | $\frac{1}{2}f_z\overline{f_xJ} - \overline{f_yJ}$
$Z_iY_{i+1}$ | $\frac{1}{2}f_z\overline{f_yJ} + \overline{f_xJ}$
$Z_iZ_{i+1}$ | $\bar{J} + \frac{1}{2}f_x\overline{[J,f_x]} + \frac{1}{2}f_y\overline{[J,f_y]} + \frac{1}{2}\overline{(f_yf_y - f_xf_x)J}$
$Z_iY_{i+1}Z_{i+2}$ | $\frac{1}{2}J\overline{f_yJ}$
$Z_iX_{i+1}Z_{i+2}$ | $\frac{1}{2}J\overline{f_xJ}$

However, even when the order of integration matters, the analysis in Table I does not suffice to show that the coefficients are linearly independent. To do so, we view the right-most column of Table I as a map $\phi$ that takes in the variational parameters $\boldsymbol{\theta}$ and produces the coefficients of the operators. In a sense, $\phi$ is a feature vector whose entries $\phi_O$ are the coefficients of the operators $O$. We can produce a data matrix $D$ whose $M$ columns correspond to choosing $M$ random instantiations of the variational parameters $\{\boldsymbol{\theta}_i\}_{i=1}^{M}$ and calculating the associated coefficient vectors $\{\phi(\boldsymbol{\theta}_i)\}_{i=1}^{M}$. We can then analyze the singular value decomposition of $D$ and count the number of nonzero singular values. Suppose the dimension of $\phi$ is $d$ and the number of nonzero singular values is $s$; if $s < d$, then the coefficients are linearly dependent.
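A short sketch of this SVD test is given below; `coefficient_map` stands in for the map $\phi$ built from Table I (not reproduced here), and the numerical threshold for a "nonzero" singular value is an assumption.

```python
# Sketch of the SVD-based linear-independence test. `coefficient_map` is a placeholder
# for the map phi of Table I; the rank tolerance is an assumption.
import numpy as np

def count_independent_coefficients(coefficient_map, n_params, M=500, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, 1.0, size=(M, n_params))            # M random parameter draws
    D = np.stack([coefficient_map(t) for t in thetas], axis=1)    # columns are phi(theta_i)
    s = np.linalg.svd(D, compute_uv=False)
    return int(np.sum(s > tol * s.max()))                         # number of nonzero singular values
```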

Figure 10 shows the SVD analysis of the coefficients of the operators of the Magnus expansion up to the second order. We chose 500 random instances of variational parameters and calculated all 29 coefficients of an Ising ansatz with global and local pulses for N = 3. We plot the singular values of the matrix obtained by this procedure for different numbers of basis functions K and for different functions (Fourier and Legendre). We find that whenever K > 1, the coefficients are linearly independent. Similar results were observed for N = 4, 5, 6.

FIG. 10.

SVD analysis of the coefficients of the operators of the Magnus expansion up to second order. We chose 500 random instances of variational parameters and calculated all 29 coefficients of an Ising ansatz with global and local pulses for N = 3. We plot the singular values of the matrix obtained by this procedure for different numbers of basis functions K and for different functions (Fourier and Legendre). We find that whenever K > 1, the coefficients are linearly independent. Similar results were observed for N = 4, 5, 6.


2. Spin-squeezing using QPs

Suppose that all control fields in a QP are zero except $f_N^x$. Then, up to second order, the effective Hamiltonian has the following contributions:
Therefore, the effective Hamiltonian is
(E13)
Let us analyze the case when $\alpha_{ZZ} = 0$. Notice that
Therefore, the Hamiltonian is that of Eq. (12). Notice that this hinges on $\alpha_{ZZ} = 0$, which is thus a condition on the control fields we choose.

3. Additional data on simulating Jordan–Wigner products

In the main text, we showed the result of simulating the evolution of Jordan–Wigner products for small times using the Fourier ansatz. Figure 11 shows the results of simulating the evolution of Jordan–Wigner products for longer times using the Fourier (top panel) and Legendre (bottom panel) ansätze. We observe that for both ansätze, the fidelity decreases as time increases but returns to its maximum at later times. The time of this revival differs between the two ansätze. The Fourier ansatz has a repetition property after T = 1, but the Legendre ansatz does not.

FIG. 11.

Results of simulating the evolution of Jordan–Wigner products for longer times and using the Fourier (top panel) and Legendre (bottom panel) ansätze.

1. A. W. Harrow, A. Hassidim, and S. Lloyd, “Quantum algorithm for linear systems of equations,” Phys. Rev. Lett. 103(15), 150502 (2009).
2. J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,” Nature 549(7671), 195–202 (2017).
3. M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, “Variational quantum algorithms,” Nat. Rev. Phys. 3(9), 625–644 (2021).
4. J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, “Barren plateaus in quantum neural network training landscapes,” Nat. Commun. 9(1), 4812 (2018).
5. E. R. Anschuetz and B. T. Kiani, “Quantum variational algorithms are swamped with traps,” Nat. Commun. 13(1), 7760 (2022).
6. A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’brien, “A variational eigenvalue solver on a photonic quantum processor,” Nat. Commun. 5(1), 4213 (2014).
7. E. Farhi and A. W. Harrow, “Quantum supremacy through the quantum approximate optimization algorithm,” arXiv:1602.07674 (2016).
8. K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, “Training deep quantum neural networks,” Nat. Commun. 11(1), 808 (2020).
9. X. Yuan, J. Sun, J. Liu, Q. Zhao, and Y. Zhou, “Quantum simulation with hybrid tensor networks,” Phys. Rev. Lett. 127(4), 040501 (2021).
10. J. Martyn and B. Swingle, “Product spectrum ansatz and the simplicity of thermal states,” Phys. Rev. A 100(3), 032107 (2019).
11. B. T. Kiani, G. De Palma, M. Marvian, Z.-W. Liu, and S. Lloyd, “Learning quantum data with the quantum earth mover’s distance,” Quantum Sci. Technol. 7(4), 045002 (2022).
12. R. LaRose, A. Tikku, É. O’Neel-Judy, L. Cincio, and P. J. Coles, “Variational quantum state diagonalization,” npj Quantum Inf. 5(1), 57 (2019).
13. S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, “Quantum embeddings for machine learning,” arXiv:2001.03622 (2020).
14. A. V. Uvarov and J. D. Biamonte, “On barren plateaus and cost function locality in variational quantum algorithms,” J. Phys. A: Math. Theor. 54(24), 245301 (2021).
15. T. L. Patti, K. Najafi, X. Gao, and S. F. Yelin, “Entanglement devised barren plateau mitigation,” Phys. Rev. Res. 3(3), 033090 (2021).
16. M. Kobayashi, K. Nakaji, and N. Yamamoto, “Overfitting in quantum machine learning and entangling dropout,” Quantum Mach. Intell. 4(2), 30 (2022).
17. K. Kuroiwa and Y. O. Nakagawa, “Penalty methods for a variational quantum eigensolver,” Phys. Rev. Res. 3(1), 013197 (2021).
18. S. Y. C. Chen, C.-M. Huang, C.-W. Hsing, H.-S. Goan, and Y.-J. Kao, “Variational quantum reinforcement learning via evolutionary optimization,” Mach. Learn.: Sci. Technol. 3(1), 015025 (2022).
19. J. Stokes, J. Izaac, N. Killoran, and G. Carleo, “Quantum natural gradient,” Quantum 4, 269 (2020).
20. Y. Huang, H. Lei, and X. Li, “An empirical study of optimizers for quantum machine learning,” in 2020 IEEE 6th International Conference on Computer and Communications (ICCC) (IEEE, 2020), pp. 1560–1566.
21. X. Bonet-Monroig, H. Wang, D. Vermetten, B. Senjean, C. Moussa, T. Bäck, V. Dunjko, and T. E. O’Brien, “Performance comparison of optimization methods on variational quantum algorithms,” Phys. Rev. A 107(3), 032407 (2023).
22. J. Tangpanitanon, S. Thanasilp, N. Dangniam, M.-A. Lemonde, and D. G. Angelakis, “Expressibility and trainability of parametrized analog quantum systems for machine learning applications,” Phys. Rev. Res. 2(4), 043364 (2020).
23. D. Marković and J. Grollier, “Quantum neuromorphic computing,” Appl. Phys. Lett. 117(15), 150501 (2020).
24. P. Mujal, R. Martínez-Peña, J. Nokkala, J. García-Beni, G. L. Giorgi, M. C. Soriano, and R. Zambrini, “Opportunities in quantum reservoir computing and extreme learning machines,” Adv. Quantum Technol. 4(8), 2100027 (2021).
25. M. Otten, C. L. Cortes, and S. K. Gray, “Noise-resilient quantum dynamics using symmetry-preserving ansatzes,” arXiv:1910.06284 (2019).
26. W. W. Ho and T. H. Hsieh, “Efficient variational simulation of non-trivial quantum states,” SciPost Phys. 6(3), 029 (2019).
27. M. Kornjača, H.-Y. Hu, C. Zhao, J. Wurtz, P. Weinberg, M. Hamdan, A. Zhdanov, S. H. Cantu, H. Zhou, R. Araiza Bravo et al., “Large-scale quantum reservoir learning with an analog quantum computer,” arXiv:2407.02553 (2024).
28. C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, “Opportunities for neuromorphic computing algorithms and applications,” Nat. Comput. Sci. 2(1), 10–19 (2022).
29. B. MacLennan, “The promise of analog computation,” Int. J. Gen. Syst. 43(7), 682–696 (2014).
30. D. Marković, A. Mizrahi, D. Querlioz, and J. Grollier, “Physics for neuromorphic computing,” Nat. Rev. Phys. 2(9), 499–510 (2020).
31. T. Wunderlich, A. F. Kungl, E. Müller, A. Hartel, Y. Stradmann, S. A. Aamir, A. Grübl, A. Heimbrecht, K. Schreiber, D. Stöckel et al., “Demonstrating advantages of neuromorphic computation: A pilot study,” Front. Neurosci. 13, 260 (2019).
32. R. A. Bravo, K. Najafi, X. Gao, and S. F. Yelin, “Quantum reservoir computing using arrays of Rydberg atoms,” PRX Quantum 3(3), 030325 (2022).
33. R. Martínez-Peña, G. L. Giorgi, J. Nokkala, M. C. Soriano, and R. Zambrini, “Dynamical phase transitions in quantum reservoir computing,” Phys. Rev. Lett. 127, 100502 (2021).
34. S. Ghosh, T. Krisnanda, T. Paterek, and T. C. H. Liew, “Realising and compressing quantum circuits with quantum reservoir computing,” Commun. Phys. 4(1), 105 (2021).
35. R. Araiza Bravo, K. Najafi, T. L. Patti, X. Gao, and S. F. Yelin, “Universal quantum perceptrons for quantum machine learning,” arXiv:2211.07075 (2022).
36. J. Leng, Y. Peng, Y.-L. Qiao, M. Lin, and X. Wu, “Differentiable analog quantum computing for optimization and control,” Adv. Neural Inf. Process. Syst. 35, 4707–4721 (2022).
37. P. Mujal, R. Martínez-Peña, G. L. Giorgi, M. C. Soriano, and R. Zambrini, “Time-series quantum reservoir computing with weak and projective measurements,” npj Quantum Inf. 9(1), 16 (2023).
38. F. Sauvage, M. Larocca, P. J. Coles, and M. Cerezo, “Building spatial symmetries into parameterized quantum circuits for faster training,” Quantum Sci. Technol. 9(1), 015029 (2024).
39. C.-Y. Park and N. Killoran, “Hamiltonian variational ansatz without barren plateaus,” Quantum 8, 1239 (2024).
40. C.-Y. Park, M. Kang, and J. Huh, “Hardware-efficient ansatz without barren plateaus in any depth,” arXiv:2403.04844 (2024).
41. B. Toussi Kiani, S. Lloyd, and R. Maity, “Learning unitaries by gradient descent,” arXiv:2001.11897 (2020).
42. R. Wiersema, C. Zhou et al., “Exploring entanglement and optimization within the Hamiltonian variational ansatz,” PRX Quantum 1(2), 020319 (2020).
43. X. You and X. Wu, “Exponentially many local minima in quantum neural networks,” in International Conference on Machine Learning (PMLR, 2021), pp. 12144–12155.
44. J. Liu, K. Najafi, K. Sharma, F. Tacchino, L. Jiang, and A. Mezzacapo, “Analytic theory for the dynamics of wide quantum neural networks,” Phys. Rev. Lett. 130(15), 150601 (2023).
45. X. You, S. Chakrabarti, and X. Wu, “A convergence theory for over-parameterized variational quantum eigensolvers,” arXiv:2205.12481 (2022).
46. C. Arenz, G. Gualdi, and D. Burgarth, “Control of open quantum systems: Case study of the central spin model,” New J. Phys. 16(6), 065023 (2014).
47. C. Arenz and H. Rabitz, “Drawing together control landscape and tomography principles,” Phys. Rev. A 102(4), 042207 (2020).
48. R.-B. Wu, R. Long, J. Dominy, T.-S. Ho, and H. Rabitz, “Singularities of quantum control landscapes,” Phys. Rev. A 86(1), 013405 (2012).
49. A. B. Magann, C. Arenz, M. D. Grace, T.-S. Ho, R. L. Kosut, J. R. McClean, H. A. Rabitz, and M. Sarovar, “From pulses to circuits and back again: A quantum optimal control perspective on variational quantum algorithms,” PRX Quantum 2(1), 010101 (2021).
50. T. Tomesh and M. Martonosi, “Quantum codesign,” IEEE Micro 41(5), 33–40 (2021).
51. W. Jiang, J. Xiong, and Y. Shi, “A co-design framework of neural networks and quantum circuits towards quantum advantage,” Nat. Commun. 12(1), 579 (2021).
52. X. Zhao, X. Xu, L. Qi, X. Xia, M. Bilal, W. Gong, and H. Kou, “Unraveling quantum computing system architectures: An extensive survey of cutting-edge paradigms,” Inf. Software Technol. 167, 107380 (2024).
53. J. Garcia Ponce and R. Araiza Bravo, Perceptron-loss-landscapes, 2023, https://github.com/jorgegponce/perceptron-loss-landscapes.
54. J. Preskill, Lecture notes for Ph219/CS219: Quantum information, 2015, available at http://www.theory.caltech.edu/people/preskill/ph229.
55. M. Schuld and N. Killoran, “Is quantum advantage the right goal for quantum machine learning?,” PRX Quantum 3(3), 030101 (2022).
56. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).
57. T.-S. Ho, J. Dominy, and H. Rabitz, “Landscape of unitary transformations in controlled quantum dynamics,” Phys. Rev. A 79(1), 013422 (2009).
58. H. Rabitz, M. Hsieh, and C. Rosenthal, “Optimal control landscapes for quantum observables,” J. Chem. Phys. 124(20), 204107 (2006).
59. B. Russell, H. Rabitz, and R.-B. Wu, “Control landscapes are almost always trap free: A geometric assessment,” J. Phys. A: Math. Theor. 50(20), 205302 (2017).
60. X. Gao, S.-T. Wang, and L.-M. Duan, “Quantum supremacy for simulating a translation-invariant Ising spin model,” Phys. Rev. Lett. 118(4), 040502 (2017).
61. V. Ramakrishna and H. Rabitz, “Relation between quantum computing and quantum controllability,” Phys. Rev. A 54(2), 1715 (1996).
62. T. S. Ho and H. Rabitz, “Why do effective quantum controls appear easy to find?,” J. Photochem. Photobiol., A 180(3), 226–240 (2006).
63. M. M. Müller, R. S. Said, F. Jelezko, T. Calarco, and S. Montangero, “One decade of quantum optimal control in the chopped random basis,” Rep. Prog. Phys. 85(7), 076001 (2022).
64. T. Pinch, “The social construction of technology: A review,” in Technological Change (2012), pp. 17–35.
65. G. Li, A. Wu, Y. Shi, A. Javadi-Abhari, Y. Ding, and Y. Xie, “On the co-design of quantum software and hardware,” in Proceedings of the Eighth Annual ACM International Conference on Nanoscale Computing and Communication (ACM, 2021), pp. 1–7.
66. I. M. Georgescu, S. Ashhab, and F. Nori, “Quantum simulation,” Rev. Mod. Phys. 86, 153–185 (2014).
67. N. Maskara, S. Ostermann, J. Shee, M. Kalinowski, A. McClain Gomez, R. A. Bravo, D. S. Wang, A. I. Krylov, N. Y. Yao, M. Head-Gordon et al., “Programmable simulations of molecules and materials with reconfigurable quantum processors,” arXiv:2312.02265 (2023).
68. J. Argüello-Luengo, A. González-Tudela, T. Shi, P. Zoller, and J. I. Cirac, “Analogue quantum chemistry simulation,” Nature 574(7777), 215–218 (2019).
69. I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney, A. Aspuru-Guzik, G. K.-L. Chan, and R. Babbush, “Quantum simulation of electronic structure with linear depth and connectivity,” Phys. Rev. Lett. 120(11), 110501 (2018).
70. D. Bluvstein, S. J. Evered, A. A. Geim, S. H. Li, H. Zhou, T. Manovitz, S. Ebadi, M. Cain, M. Kalinowski, D. Hangleiter et al., “Logical quantum processor based on reconfigurable atom arrays,” Nature 626(7997), 58–65 (2024).
71. Y. Zhao et al., “Realization of an error-correcting surface code with superconducting qubits,” Phys. Rev. Lett. 129, 030501 (2022).
72. E. Campbell, “A series of fast-paced advances in quantum error correction,” Nat. Rev. Phys. 6, 160–161 (2024).
73. N. L. Diaz, D. García-Martín, S. Kazi, M. Larocca, and M. Cerezo, “Showcasing a barren plateau theory beyond the dynamical Lie algebra,” arXiv:2310.11505 (2023).
74. A. Arrasmith, Z. Holmes, M. Cerezo, and P. J. Coles, “Equivalence of quantum barren plateaus to cost concentration and narrow gorges,” Quantum Sci. Technol. 7(4), 045015 (2022).
75. A. Brinkmann, “Introduction to average Hamiltonian theory. I. Basics,” Concepts Magn. Reson., Part A 45A(6), e21414 (2016).
76. J. Choi, H. Zhou, H. S. Knowles, R. Landig, S. Choi, and M. D. Lukin, “Robust dynamic Hamiltonian engineering of many-body spin systems,” Phys. Rev. X 10(3), 031002 (2020).
77. M. Kitagawa and M. Ueda, “Squeezed spin states,” Phys. Rev. A 47, 5138–5143 (1993).
78. M. Block, B. Ye, B. Roberts, S. Chern, W. Wu, Z. Wang, L. Pollet, E. J. Davis, B. I. Halperin, and N. Y. Yao, “A universal theory of spin squeezing,” arXiv:2301.09636 (2023).
79. S. Colombo, E. Pedrozo-Peñafiel, A. F. Adiyatullin, Z. Li, E. Mendez, C. Shu, and V. Vuletić, “Time-reversal-based quantum metrology with many-body entangled states,” Nat. Phys. 18, 925–930 (2022).
80. G. Greene-Diniz, D. Z. Manrique, W. Sennane, Y. Magnin, E. Shishenina, P. Cordier, P. Llewellyn, M. Krompiec, M. J. Rančić, and D. Muñoz Ramo, “Modelling carbon capture on metal-organic frameworks with quantum computing,” EPJ Quantum Technol. 9(1), 37 (2022).
81. J. Romero, R. Babbush, J. R. McClean, C. Hempel, P. J. Love, and A. Aspuru-Guzik, “Strategies for quantum computing molecular energies using the unitary coupled cluster ansatz,” Quantum Sci. Technol. 4(1), 014008 (2018).
82. D. González-Cuadra, D. Bluvstein, M. Kalinowski, R. Kaubruegger, N. Maskara, P. Naldesi, T. V. Zache, A. M. Kaufman, M. D. Lukin, H. Pichler et al., “Fermionic quantum processing with programmable neutral atom arrays,” Proc. Natl. Acad. Sci. U. S. A. 120(35), e2304294120 (2023).
83. B. Fauseweh, “Quantum many-body simulations on digital quantum computers: State-of-the-art and future challenges,” Nat. Commun. 15(1), 2123 (2024).
84. B. Russell, R. Wu, and H. Rabitz, “Reply to comment on ‘Control landscapes are almost always trap free: A geometric assessment’,” J. Phys. A: Math. Theor. 51(50), 508002 (2018).
85. B. Russell, R.-B. Wu, and H. Rabitz, “Quantum control landscapes beyond the dipole approximation: Controllability, singular controls, and resources,” Front. Phys. 9, 674794 (2021).
86. Á. L. Corps, P. Stránský, and P. Cejnar, “Mechanism of dynamical phase transitions: The complex-time survival amplitude,” Phys. Rev. B 107(9), 094307 (2023).
87. M. Heyl, “Dynamical quantum phase transitions: A review,” Rep. Prog. Phys. 81(5), 054001 (2018).
88. F. M. Gambetta, C. Zhang, M. Hennrich, I. Lesanovsky, and W. Li, “Long-range multibody interactions and three-body antiblockade in a trapped Rydberg ion chain,” Phys. Rev. Lett. 125(13), 133602 (2020).
89. V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Ali et al., “PennyLane: Automatic differentiation of hybrid quantum-classical computations,” arXiv:1811.04968 (2018).