Quantum machine learning algorithms promise to deliver near-term, applicable quantum computation on noisy, intermediate-scale systems. While most of these algorithms leverage quantum circuits for generic applications, a recent set of proposals, called analog quantum machine learning (AQML) algorithms, breaks away from circuit-based abstractions and favors leveraging the natural dynamics of quantum systems for computation, promising to be noise-resilient and suited for specific applications such as quantum simulation. Recent AQML studies have called for determining best ansatz selection practices and whether AQML algorithms have trap-free landscapes based on theory from quantum optimal control (QOC). We address this call by systematically studying AQML landscapes on two models: those admitting black-boxed expressivity and those tailored to simulating a specific unitary evolution. Numerically, the first kind exhibits local traps in their landscapes, while the second kind is trap-free. However, both kinds violate QOC theory’s key assumptions for guaranteeing trap-free landscapes. We propose a methodology to co-design AQML algorithms for unitary evolution simulation using the ansatz’s Magnus expansion. Our methodology guarantees the algorithm has an amenable dynamical Lie algebra with independently tunable terms. We show favorable convergence in simulating dynamics with applications to metrology and quantum chemistry. We conclude that such co-design is necessary to ensure the applicability of AQML algorithms.
I. INTRODUCTION
Quantum machine learning (QML) promises to deliver advantageous applications in noisy, intermediate-scale quantum computers. QML algorithms aim to do so either by using quantum systems to speed up machine learning subroutines1,2 or by designing novel techniques to optimize and control noisy quantum systems for machine learning and quantum applications.3 This latter approach, called variational quantum algorithms, uses variational optimization for application within near-term devices. These algorithms often comprise an underlying hardware architecture with tunable parameters, a loss function measuring the error relative to a desired computation, and a classical optimizer routine tuning the parameters to minimize the loss. The hardware architecture is often abstracted away and modeled as a digital quantum circuit, and such abstracted algorithms are called variational quantum circuits (VQCs).
Despite their promise, VQCs exhibit issues with accuracy, efficiency, and training, which precludes an advantage over classical algorithms. VQCs often experience flat landscapes upon random initialization4—a phenomenon known as barren plateaus—and an exponential number of local minima.5 To circumvent these challenges, extensive work has been performed on proposing circuit architectures (ansätze),6–10 loss functions,11–14 regulation techniques,15–17 and optimization techniques.18–21
A recent set of proposals, called analog quantum machine learning (AQML) ansätze,22–24 breaks away from circuit-based abstractions. Instead, AQML favors directly using the system’s dynamics for computation. At a high level, an AQML ansatz comprises a native interaction Hamiltonian and a set of time-dependent controls. However, AQML studies suffer from various practical drawbacks, such as the fact that simulating time evolution is computationally expensive and, therefore, limits theoretical studies to small system sizes.
AQML ansätze are more hardware-efficient than circuit approaches. AQML tends to reduce circuit depth,25 is closely related to quantum simulation, can efficiently prepare states adiabatically or through Floquet dynamics,26 and is well-suited for machine-learning applications with limited tunable devices.27 Theoretical AQML studies are more faithfully translated into experimental studies.
AQML appeals to developers for various reasons related to classical analog computation. First, classical analog computing is sometimes advantageous over digital computing for specific applications due to ease of fabrication, problem-solution co-design, and energy cost.28–30 AQML is thus potentially beneficial for applications related to the simulation of other quantum behavior or for classical problems that may benefit from quantum-like computation. Second, analog classical computing can also be noise-resilient upon a judicious choice of learning paradigm.31 AQML algorithms are potentially resilient to current noise in quantum devices and can thus deliver practical applications sooner.
Recent observations and desires within quantum computation also justify investments in AQML research. First, analog approaches promise to clarify the fundamental computational capabilities of available physical systems.32,33 Second, small-scale analog systems have been used for various tasks relevant to scientific and industrial communities, including efficient unitary time-evolution simulation,34 sensing,35 resource allocation optimization,36 time-series prediction,37 image classification,27 and memory storage and retrieval.32 Third, while VQCs are expressive, they are also plagued with barren plateaus. On the other hand, for AQML algorithms, the source of barren plateaus stems from excess entanglement.15,35,38–40 Finally, for both unitary evolution41 and ground state finding,42 shallow VQCs often exhibit low accuracy and exponential local minima.43 VQCs more often succeed when the circuit depth is exponential in the system size.5,44,45 On the other hand, AQML algorithms’ landscapes are believed to resemble those from controllable quantum systems more closely. According to quantum optimal control (QOC) theory, these systems exhibit trap-free landscapes.46–48 Recent AQML studies and perspectives have called for a study of the training landscapes of AQML models.36,49 There is a dire need to determine the topography of the landscapes. Which problems are suitable for AQML algorithms and what algorithms are ideal for specific problems remain open lines of inquiry.
This work leverages various numerical and analytical studies to interrogate and understand AQML landscapes as a function of the number of control parameters, control types, and system size for a specific loss function. To do this, we focus on simulating the transverse-field Ising model evolution with two groups of ansätze: (A1) ansätze whose native Hamiltonian differs from the Ising interactions but which admit black-box universal quantum computation, such as the recently introduced quantum perceptrons (QPs),35 and (A2) ansätze tailored to the specific task, such as those with Ising native interactions, or specialized QPs that contain the Ising interaction as a special case (see Sec. II C for details). Ansätze in A1 are expressive, while those in A2 are co-designed with the task we study. Co-design is not a new concept; significant effort has gone into describing how quantum computing architectures are often built to address specific problems.50–52
In Sec. III A, we show that using QOC theory with common QOC assumptions leads to a theoretical prediction of trap-free landscapes when an exponential number of control parameters are available (i.e., in over-parametrized ansätze) regardless of the microscopic ansatz details. We also discuss conditions on AQML ansätze that lead to a violation of one QOC assumption, local surjectivity. Consequently, QOC theory does not apply to AQML ansätze. However, numerical experiments show that A2 ansätze exhibit trap-free landscapes and better approximation errors, while those in A1 are trap-ridden (Sec. III B). In summary, we argue that although most AQML landscapes do not possess the amenable qualities of QOC settings, co-designing an ansatz is an important step toward favorable training landscapes.
In Sec. IV, we show that task-algorithm co-design can be developed for unitary time evolution simulation by analyzing the ansatz’s Magnus expansion. A co-designed algorithm is one where terms in the Magnus expansion contain the Hamiltonian terms in the target unitary evolution and where each term in the Magnus expansion is independently tunable. That is, a co-designed algorithm has a tunable dynamical Lie algebra (DLA) amenable to the problem at hand. We do this analysis pictorially using a generalization of the Lie trees used in optimal quantum control.46 We show that when co-designed, the QP ansatz is suited for time-reversible spin-squeezing with metrology applications (see Sec. IV B), and the Ising ansatz is suited for unitary coupled-cluster evolution with applications to quantum simulation (see Sec. IV C). In both cases, satisfactory accuracy is reached within a constant number of controllable parameters. Section V concludes with further research directions.
Upfront, our studies make the following two significant conclusions:
AQML algorithms violate common QOC assumptions. Despite not following QOC theory, co-designed AQML algorithms exhibit trap-free landscapes. On the other hand, black-box algorithms exhibit traps. Further work is needed to determine if co-design is necessary for trap-free landscapes.
Succeeding at a task is a more likely outcome when AQML algorithms are co-designed. In the case of unitary time evolution, one approach toward co-design is given by analyzing the terms of the AQML algorithm’s Magnus expansion.
All of the code and data for this publication are available in Ref. 53.
II. ANALOG QUANTUM MACHINE LEARNING ALGORITHMS
This section introduces AQML algorithms comprising an analog ansatz, a loss function, and a classical optimization scheme. We also introduce two classes of algorithms: black-box expressive ones and task co-designed ones. This distinction will allow us to see the quantitative differences in training performance due to co-design.
A. Analog Ansätze
B. Learning with analog ansätze
In the most general setting, after the system evolves under U(θ), the system is measured several times with each qubit in a particular basis Mi to calculate a loss encoding the error relative to a target behavior. The loss is then passed through a classical optimizer for improvement through tuning the parameters θ.
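The loop described above (evolve under U(θ), evaluate a loss, update θ classically) can be sketched on a toy problem. The following is a minimal illustration only, not the ansatz or optimizer used in this work: the single-qubit piecewise-constant Hamiltonian, the infidelity form of the loss, the finite-difference gradient, and all function names are assumptions introduced here for illustration, and the loss is evaluated from the full unitary rather than from repeated measurements.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_herm(H, t):
    """Return exp(-i t H) for Hermitian H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def ansatz_unitary(theta, T=1.0):
    """Toy analog ansatz (assumed): native term Z plus a piecewise-constant X control."""
    dt = T / len(theta)
    U = np.eye(2, dtype=complex)
    for th in theta:
        U = expm_herm(Z + th * X, dt) @ U  # later time slices act on the left
    return U

def loss(theta, U_targ):
    """Assumed loss: infidelity 1 - |Tr(U_targ^dagger U(theta))|^2 / d^2."""
    d = U_targ.shape[0]
    overlap = np.trace(U_targ.conj().T @ ansatz_unitary(theta))
    return 1.0 - abs(overlap) ** 2 / d**2

def grad_fd(theta, U_targ, eps=1e-6):
    """Central finite-difference gradient of the loss."""
    g = np.zeros_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (loss(theta + e, U_targ) - loss(theta - e, U_targ)) / (2 * eps)
    return g

def train(U_targ, K=4, steps=300, lr=0.5, seed=0):
    """Plain gradient descent from a random initialization of K control amplitudes."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=K)
    history = [loss(theta, U_targ)]
    for _ in range(steps):
        theta = theta - lr * grad_fd(theta, U_targ)
        history.append(loss(theta, U_targ))
    return theta, history

# Target: evolution under a constant field, reachable with theta_k = 0.7 for all k.
U_targ = expm_herm(Z + 0.7 * X, 1.0)
theta_opt, history = train(U_targ)
```

Here `history` records the loss after each update, so the first and last entries can be compared to see how far a given initialization progressed.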
C. Groups of ansätze
We now define the groups of ansätze we alluded to in the introduction. All ansätze we study admit full expressivity in the form of controllability. These are ansätze for which the operators in the native and control Hamiltonians can produce any element of the dynamical Lie algebra (i.e., Pauli strings in our case) through nested commutations.
Group A1 comprises QPs with arbitrary control parameters and only Gaussian or Fourier controls. These ansätze are expressive but not tailored for the task in Eq. (6). Group A2 consists of expressive ansätze with a closed-form solution for simulating the evolution in Eq. (6). For example, the Ising ansatz with constant fields is immediately in A2. Moreover, in Appendix A, we show that specialized QPs with a combination of Gaussian controls and extra piecewise constant controls on the output qubit are in A2. This can also be derived from analyzing the ansatz’s Magnus expansion as in Sec. IV. In summary, algorithms in A1 and A2 are expressive, but those in A2 are tailored to the task in Eq. (6).
In Sec. III, we show that QPs in A1 perform poorly at the task in Eq. (6), while specialized A2 QPs perform better, though still worse than the Ising ansatz. However, we show that A2 QPs, co-designed for the task, exhibit landscapes free of local minima. Therefore, we contrast A1 and A2 ansätze to showcase the importance of co-design.
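Controllability in this sense can be checked numerically for small systems by closing a set of generators under commutation and counting linearly independent elements. The sketch below is an illustration under stated assumptions: the two-qubit generator sets are hypothetical examples chosen here (they are not the QP or Ising ansätze of this work), and `lie_closure_dim` is a helper name introduced for this example.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def lie_closure_dim(generators, max_rounds=10, tol=1e-9):
    """Dimension of the real Lie algebra generated by {i*G} under nested commutators."""
    basis = []  # linearly independent anti-Hermitian matrices found so far

    def try_add(A):
        # Real-linear independence test on stacked real and imaginary parts.
        vecs = [np.concatenate([B.real.ravel(), B.imag.ravel()]) for B in basis + [A]]
        if np.linalg.matrix_rank(np.array(vecs), tol=tol) > len(basis):
            basis.append(A)
            return True
        return False

    for G in generators:
        try_add(1j * G)
    for _ in range(max_rounds):
        added = False
        for A, B in itertools.combinations(list(basis), 2):
            C = A @ B - B @ A
            norm = np.linalg.norm(C)
            if norm > tol and try_add(C / norm):
                added = True
        if not added:  # closed under commutation
            break
    return len(basis)

# Hypothetical two-qubit examples (N = 2, so the full algebra has 4^N - 1 = 15 elements).
X1, X2 = np.kron(X, I2), np.kron(I2, X)
Z1, Z2 = np.kron(Z, I2), np.kron(I2, Z)
ZZ = np.kron(Z, Z)

dim_full = lie_closure_dim([X1, X2, Z1, Z2, ZZ])   # individually addressable fields
dim_restricted = lie_closure_dim([ZZ, X1 + X2])    # a single global transverse field
```

With individually addressable fields the closure reaches the full 15-dimensional algebra, whereas the globally driven set closes on a strict subalgebra, which illustrates why the choice of controls matters for controllability.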
III. LANDSCAPES OF ANALOG QUANTUM MACHINE LEARNING ALGORITHMS
Before going any further, this section introduces crucial concepts in landscape analysis. We discuss theoretical conditions from QOC that suffice for trap-free landscapes and show that while these conditions are violated, co-designed ansätze in A2 indeed show trap-free landscapes.
Let us begin by reviewing how to characterize a landscape’s curvature. In general, studying the landscape of any algorithm involves finding the parameter values at which the derivative of the loss vanishes (i.e., the critical points). Critical points are then categorized as minima, maxima, or saddles, depending on the curvature around them.
The derivative can be expressed as the function composition . For the case of Eq. (5), the first term is independent of θ and, therefore, all critical points satisfy Tr(∂θU) = 0.
The nature of the critical points is dictated by the Hessian (i.e., second derivative) of the loss. In the case of Eq. (5), the Hessian is given by . The eigenvalues of the Hessian determine the nature of the critical point [see Fig. 2(a) for examples]. A trap-free landscape has only one maximum or minimum critical point, while the rest are saddles. See Fig. 2(a) for an example of a trap-free landscape.
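The classification of critical points by Hessian eigenvalues can be illustrated with a short numerical sketch. This is an assumption-laden toy: it uses a finite-difference Hessian on simple quadratic functions rather than the automatic differentiation and the loss in Eq. (5) used in this work, and the function names are introduced here for illustration.

```python
import numpy as np

def hessian_fd(f, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return 0.5 * (H + H.T)  # symmetrize against round-off

def classify(f, x, tol=1e-6):
    """Label a critical point by the signs of the Hessian eigenvalues."""
    ev = np.linalg.eigvalsh(hessian_fd(f, x))
    if np.all(ev > tol):
        return "minimum"
    if np.all(ev < -tol):
        return "maximum"
    if np.any(ev > tol) and np.any(ev < -tol):
        return "saddle"
    return "degenerate"

origin = np.zeros(2)
label_bowl = classify(lambda v: v[0] ** 2 + v[1] ** 2, origin)
label_saddle = classify(lambda v: v[0] ** 2 - v[1] ** 2, origin)
```

A point with eigenvalues of both signs is a saddle and is not a trap for a descent method, which is the distinction used in the numerical analysis of Sec. III B.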
A. AQML theory of trap-free landscapes
We now discuss conditions from QOC theory that suffice for trap-free landscapes. We show that AQML ansätze in both A1 and A2 violate these requirements, but we will show that A2 ansätze are trap-free.
According to QOC theory, three conditions are sufficient to ensure a trap-free landscape,57–59 which several studies have assumed to apply to AQML algorithms.36,49 The conditions are as follows:
Unconstrained Fields. The control fields should be allowed to take on any real value.
Controllability. The terms in H(t) should produce any element of the Hilbert space’s dynamical Lie algebra through nested commutators.
Local Surjectivity. The dynamic derivatives of the ansatz for every parameter must be full-rank in the dynamical Lie algebra.
The first two conditions are satisfied for our AQML ansätze. Indeed, the control fields are unconstrained, and the ansätze allow for universal quantum computation,35,60 which is linked to controllability.61 On the other hand, local surjectivity must be checked numerically on all 4^N − 1 Pauli strings. As a result, local surjectivity is often assumed. Theoretical studies of QOC landscapes justify this assumption because, experimentally, optimal control fields are indeed easy to find in most cases.62 However, this ease does not carry over to AQML algorithms.
In Appendix B, we show that if one assumes these three conditions for an AQML ansatz with at least 4^N − 1 variational parameters, then the landscape of the loss in Eq. (5) is trap-free (i.e., it consists of saddles and only one local minimum and maximum). We remark that the over-parametrization requirement is sufficient to produce a theoretical prediction of trap-free landscapes, but that numerics in Sec. III B show that over-parametrizing results in higher losses in practice. This proof uses the methods developed in Ref. 57. We highlight that the landscape analysis depends on the loss function. The loss dictates the values of the landscape, while its derivatives dictate the critical values and curvature. The analysis in this paper only applies to the loss in Eq. (5). However, as has been shown in QOC using the techniques in Ref. 58, QOC assumptions suffice to produce trap-free landscapes on other losses such as those related to ground state preparation, and trap-free landscapes have been observed in applications in a plethora of settings.63
Figure 2(b) shows numerical tests of local surjectivity using A1 and A2 QP ansätze with Fourier and Gaussian parametrizations for different qubit numbers. We see that while the average overlap for many Pauli strings is nonzero, there exist Pauli strings for which the minimum overlap is zero or close to zero. In other words, the ansätze fail the local surjectivity condition even in the over-parametrized regime of 4^N − 1 parameters. We also observe similar results for fewer parameters and other parametrizations, such as using Legendre polynomials and piecewise constant functions.
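To make the local surjectivity condition concrete, the following sketch tests it on a single-qubit toy ansatz. It is not the test behind Fig. 2(b): the Euler-angle ansatz, the rank criterion on the Pauli components of U†∂U/∂θ_k, and the helper names are assumptions made for this illustration. For one qubit, the condition holds at θ when these components span all three Pauli directions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [X, Y, Z]

def rot(P, angle):
    """exp(-i * angle / 2 * P) for a Pauli matrix P."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * P

def ansatz(theta):
    """Toy Z-X-Z Euler-angle ansatz (assumed for illustration)."""
    a, b, c = theta
    return rot(Z, a) @ rot(X, b) @ rot(Z, c)

def pauli_jacobian(theta, eps=1e-6):
    """J[p, k] = Tr(P_p * i U^dagger dU/dtheta_k) / 2, via central differences."""
    U = ansatz(theta)
    J = np.zeros((3, len(theta)))
    for k in range(len(theta)):
        e = np.zeros(len(theta))
        e[k] = eps
        dU = (ansatz(theta + e) - ansatz(theta - e)) / (2 * eps)
        A = 1j * U.conj().T @ dU  # Hermitian generator of the variation
        for p, P in enumerate(PAULIS):
            J[p, k] = np.real(np.trace(P @ A)) / 2
    return J

def is_locally_surjective(theta, tol=1e-5):
    """True when the parameter variations span every Pauli direction."""
    return np.linalg.matrix_rank(pauli_jacobian(theta), tol=tol) == 3

generic_ok = is_locally_surjective(np.array([0.3, 0.9, -0.4]))
singular_ok = is_locally_surjective(np.array([0.3, 0.0, -0.4]))
```

At the generic point the Jacobian has full rank, while at b = 0 the two Z rotations generate the same direction and the rank drops to two, a simple instance of a singular control point.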
This observed violation gives way to a natural question: under what conditions can we expect an AQML algorithm to violate local surjectivity? In Appendix C, we show that when an AQML ansatz produces a unitary U(θ) that is Haar-random distributed, then local surjectivity is violated for every P. This condition can be relaxed to U(t; θ) following a unitary 1-design in case the control functions gk vanish around t = 0, T, in which case local surjectivity is violated for Pauli strings P contained in the native Hamiltonian. The observations in Appendix C provide us with design guidelines for AQML algorithms. First, it is best to use time-dependent control functions that do not vanish at the endpoints. Second, it is best to avoid native and controlled Hamiltonians that produce dynamics resembling Haar unitaries.
We highlight that following these guidelines does not guarantee trap-free landscapes. Our algorithms do not resemble Haar-random unitaries. Therefore, further work is needed to close the gap between our analytic and numeric understanding of the scenarios leading to the violation of local surjectivity and in what cases a violation leads to trap-ridden landscapes. However, as Sec. III B shows, trap-free landscapes can still appear when an ansatz is co-designed, even when local surjectivity is violated.
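As a reference point for what "resembling Haar unitaries" means numerically, one can sample Haar-random unitaries and check the defining 1-design property, namely that the average of U P U† over the ensemble vanishes for a traceless P. The sketch below uses the standard QR-based sampler; the function names, sample count, and choice of P are assumptions for illustration, and the snippet is not taken from Appendix C.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary via QR of a complex Gaussian matrix."""
    G = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases  # fix column phases so the distribution is Haar

def twirl_average(P, n_samples, rng):
    """Monte Carlo estimate of the ensemble average of U P U^dagger."""
    d = P.shape[0]
    acc = np.zeros((d, d), dtype=complex)
    for _ in range(n_samples):
        U = haar_unitary(d, rng)
        acc += U @ P @ U.conj().T
    return acc / n_samples

rng = np.random.default_rng(1)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P = np.kron(Z, np.eye(2))          # a traceless two-qubit Pauli string
U_sample = haar_unitary(4, rng)
avg = twirl_average(P, 2000, rng)  # close to the zero matrix up to sampling noise
```

The same `twirl_average` estimate could be applied to samples of an ansatz unitary at random θ to gauge, heuristically, how close its ensemble is to a 1-design.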
B. Numerical evidence for trap-ridden and trap-free landscapes
We now present a numerical analysis of the landscapes of A1 and A2 ansätze. We show that A2 QPs are trap-free. We compare the quality of the solutions produced by both A1 and A2 QPs at approximating the time evolution in Eq. (6). Importantly, this evolution is theoretically simulatable since both ansätze are expressive, but only A2 QPs are co-designed to accomplish this task. The main results are presented in Figs. 3 and 4.
For Fig. 3(a), we initialized 100 instances of each QP ansatz with K = 5N and K = 3N control fields for A1 and A2 ansätze, respectively. We note that we ran several more simulations with varying numbers of fields, and the simulation results remained practically unchanged. Figure 3(a) shows the average converged loss. We observe that at N = 4 onwards, both ansätze failed to converge, on average, to the optimal solution. We attribute the sudden change at N = 4 to the fact that both A1 and A2 ansätze have a “star-like” architecture, which reduces to the Ising connectivity at the values of N = 2 and N = 3. At N ≥ 4, however, neither can reliably converge to the optimal solution. We note that the A2 ansatz consistently outperforms the A1, and the error bars show a much larger spread in the loss distribution at the converged critical points.
To further investigate the effect of the number of control functions (K) and obtain a better picture of the quality and spread of the solutions for these landscapes, we repeated the same experiments for the A2 QP ansatz with varying K. Figure 3(b) shows the resulting plots of minimum and average over 100 trials for each value of K. For N = 4, we see an improvement in the average and minimum loss found as K approaches the over-parametrized regime at K = 26, followed by an increasing error due to over-fitting. In addition, we observe a stark difference of ∼2 orders of magnitude between the average and minimum solutions found. This contrasts with the homogeneity in loss values observed for the A1 QPs. For N = 5, the sudden improvement as the pulse basis sets grow in size is absent. Yet, we still recover the trend in the spread of the found critical points, signaling a richer and more heterogeneous distribution of critical points with differences in quality by ∼2 orders of magnitude. The explored values of K encompass the under- and over-parametrized regimes for each value of N (K = 26 and 85, respectively). Appendix D reports the convergence rates of our AQML models, which are high.
We investigate the nature of the critical points we found for both A1 and A2 QPs. Using automatic differentiation, we calculate the exact Hessian matrix at each critical point across 50 trials with varying numbers of control parameters K. Figure 4 shows the results for four select choices of K corresponding to below and above the over-parametrized regime of 4^N − 1 parameters (see Appendix D for more details). The right (left) column plots the Hessian eigenvalues for both ansätze in the under- (over-) parameterized regime. As we can see, for the A1 QPs, both regimes exhibit only non-negative eigenvalues; therefore, these critical points correspond to local minima. For the A2 QPs, however, we observe the presence of both positive and negative eigenvalues in both regimes. The critical points of A2 QPs can thus be classified as saddles. This result adds to the difference in solution quality achievable through both ansätze classes, as it further characterizes the A2 QPs solutions as saddles that can potentially be avoided by increasing the number of optimization cycles.
In sum, our numerical experiments provide evidence of the absence of amenable landscapes for black-boxed expressive algorithms. Rather than seeing a computational phase transition around the overparameterized regime, we found that a more accurate predictor of success was whether a model is co-designed with a particular task. In particular, the A1 QPs are swamped with traps. However, the experiments also revealed a quantitative and qualitative difference in the achievable solutions through the different ansätze. Specifically, while neither class was able to produce an optimal solution on average, we observed a significant increase in the variability of the quality of the solutions achievable through the A2 QPs and, for both cases, a significantly better solution (by ∼3 orders of magnitude when compared to the A1 QP solutions) was found. Additionally, we found that the A2 QPs produced saddle points, which opens the possibility of improving the solutions found by optimizing them for longer. The co-designed algorithm with a modest amount of parameters outperformed non-co-designed algorithms with even an exponential number of parameters. These drastically different results in solution quality thus hint at the importance of choosing the right ansatz for the appropriate problem.
IV. TASK-ALGORITHM CO-DESIGN
We have shown that AQML algorithms suffer from trap-ridden landscapes. Therefore, an outstanding set of questions remains: What kinds of unitaries can an AQML algorithm readily approximate? Conversely, given a desired unitary, can we devise a systematic methodology for allocating attention to different ansätze? This section argues that these questions can be addressed by thinking of algorithm-task co-design.
Co-design is not a new concept; significant effort has gone into describing how quantum computing architectures are often built to address specific problems.50–52 More generally, technologies are defined by and help define the issues they aim to resolve.64 Co-design is crucial in the development of hardware control software,65 in proposals for materials and chemistry simulation,66–69 and in approaches to error correction.70–72 Indeed, the field of AQML broadly asserts that the applications developers envision are inseparable from the hardware we expect to use to realize them.
In this section, co-design will take the following particular meaning. An AQML ansatz is co-designed for a desired unitary evolution when its Magnus expansion contains the operators of the desired evolution weighted by independently tunable coefficients. Likewise, understanding the Magnus expansion of an AQML ansatz can guide the co-design of a desired unitary with a specific application.
We remark that co-design is linked to the dynamical Lie algebra (DLA) of a QML model. The DLA of a model is the set of operators generated by the nested commutators of the model’s native and control Hamiltonians. A model with a DLA consisting of D linearly independent operators exhibits vanishing gradients concentrated around zero as 1/D.73 Vanishing gradients are equivalent to the presence of narrow local minima with 1/D curvature scaling.74 Therefore, the most expressive models suffer from barren plateaus with exponentially concentrated local minima.5 When a co-designed model is trained, it explores a region of parameter space where Hamiltonian terms outside the target Hamiltonian are zeroed out while keeping desired terms accessible. This parameter region contains a smaller effective DLA of size D′ ≪ 4^N, therefore avoiding narrow local minima. While this guarantees an absence of parasitic local minima, trap-free landscapes are not guaranteed. Yet, our numerics show their presence. Therefore, further work is needed to design a predictive theory by exploring if methodology linked to the DLA can be used to explain trap-free landscapes or by finding a QOC theory that does not require local surjectivity but does predict trap-free landscapes.
A. Co-design guided by the Magnus expansion
Let us begin by justifying our emphasis on the Magnus expansion and showing how it can be used for co-design.
Each term in the Magnus expansion contains operators. At first glance, for simple control fields (i.e., small K), one would expect that the coefficients of different operators are linearly dependent. However, numeric experimentation shows that the operators’ coefficients are linearly independent and can thus be, in theory, tuned independently. For this, see Fig. 10 in Appendix E, where we also argue that integration order relevance ensures linear independence by analyzing operators generated up to l = 2.
For both experiments, N = 4 and 100 randomly initialized were optimized to approximate a T = 1 evolution with K = N. We observe that convergence is achieved when for all k > 0 (i.e., the algorithm is trained to choose constant fields) and for α = y, z. Importantly, for the case of constant fields, H(l) = 0 for all l > 0.
The example of co-design in Fig. 5 is very simple. However, the idea is that by analyzing the terms in H(l) (θ) for l > 0, and the coefficients in front of them, we can tell what kinds of unitaries can be readily approximated by our AQML algorithm. Notice that this is quite different from ensuring simulatability through either a claim of universality or controllability. In the case of universality, a desired unitary Utarg is broken down into a product of unitaries, each achievable given a particular hardware architecture. Instead, our approach focuses on determining if the Hamiltonian generating the desired unitary is spanned by the terms within the ansatz’s Magnus expansion. Controllability studies whether the nested commutators of the terms in the Hamiltonian can generate every generator of the dynamical Lie algebra (Pauli strings, in our case). Our approach also attends to the coefficient before a given generator, which influences how easy it is to achieve evolution under said generator using an AQML ansatz in an experiment. By looking at the Magnus expansion, one can gain insight into both the generators and the coefficients in front of the generators, which are crucial to determining the likelihood of success in practice. In summary, if A is an operator present in the Magnus expansion with a coefficient c(θ), we can, in theory, approximate the evolution as long as c(θ) can be made nonzero and all other coefficients for other operators can be mitigated.
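The role of the leading Magnus terms can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions: it uses a single qubit with the hypothetical Hamiltonian H(t) = X + tZ, the textbook expressions Ω1 = −i∫H(t1)dt1 and Ω2 = −(1/2)∫∫[H(t1), H(t2)]dt2dt1 with t2 < t1, and a midpoint grid. The labels Ω1 and Ω2 are the standard ones and may be offset from the H(l) labels used in the text.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def exp_antiherm(Omega):
    """exp(Omega) for anti-Hermitian Omega, via the Hermitian matrix i * Omega."""
    w, V = np.linalg.eigh(1j * Omega)
    return (V * np.exp(-1j * w)) @ V.conj().T

def magnus_terms(H_of_t, T, n=200):
    """First two Magnus terms evaluated on a midpoint grid with n slices."""
    dt = T / n
    Hs = [H_of_t((j + 0.5) * dt) for j in range(n)]
    Omega1 = -1j * dt * sum(Hs)
    Omega2 = np.zeros_like(Hs[0])
    running = np.zeros_like(Hs[0])  # sum of H(t_k) over earlier slices k < j
    for Hj in Hs:
        Omega2 += -0.5 * dt**2 * (Hj @ running - running @ Hj)
        running = running + Hj
    return Omega1, Omega2

def exact_propagator(H_of_t, T, n=200):
    """Time-ordered product of short-time propagators on the same grid."""
    dt = T / n
    U = np.eye(2, dtype=complex)
    for j in range(n):
        U = exp_antiherm(-1j * dt * H_of_t((j + 0.5) * dt)) @ U
    return U

T = 0.5
H_drive = lambda t: X + t * Z       # time-dependent control (assumed example)
H_const = lambda t: X + 0.5 * Z     # constant fields

O1, O2 = magnus_terms(H_drive, T)
U_exact = exact_propagator(H_drive, T)
err_first = np.linalg.norm(exp_antiherm(O1) - U_exact)
err_second = np.linalg.norm(exp_antiherm(O1 + O2) - U_exact)

_, O2_const = magnus_terms(H_const, T)  # vanishes: H commutes with itself at all times
```

Comparing `err_first` and `err_second` shows how much of the evolution is captured once the commutator term is included, and `O2_const` illustrates the statement above that the higher-order terms vanish for constant fields.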
Figure 5 exemplifies how to calculate the operators and coefficients in the Magnus expansion for l = 2 for a QP. Figure 5(c) is a pictorial representation of how the Hamiltonian terms transform into each other through commutation. For example, the operator Zi transforms into Yi while the associated coefficient picks a factor. The yellow, short-dash, long-dash arrow in Fig. 5(c) depicts this example. Going in the direction opposite to the arrows picks up an extra −1 on the coefficient. Figure 5(d) exemplifies how to use this pictorial depiction of the commutation-induced transformations to compute the operators generated from the interaction ZiZN appearing in H(2). We note that new interactions emerge between two inputs and the output qubit (i.e., three body terms). Using panel (c), each term comes with a coefficient composed of a nested integral of control fields and interaction strengths (see Appendix E for details on how to calculate the coefficients).
It is important to note that the order of integration matters for these coefficients. For an example of the importance of integration order, see Eq. (E6) in Appendix E. Using this method and focusing on a particular term in the Magnus expansion labeled by l gives us a function mapping the variational parameters θ to the coefficients of all possibly generated Hamiltonian terms. Figure 9 shows all operators obtained from ZiZN in the l = 2 term of the expansion.
Figure 5(c) also exemplifies the coefficient in front of the operator ZiZjXN, creating an effective interaction between two input qubits mediated through the output qubit. In Sec. IV B, we show that this operator can naturally be used for time-reversible spin-squeezing, an observation that was previously made in Ref. 35 using second-order perturbation theory. Therefore, while the result that quantum perceptrons can perform time-reversible squeezing is not new, the Magnus expansion approach to arrive at this result shows that this can be achieved even outside the perturbative regime.
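The emergence of the three-body operator ZiZjXN from nested commutation can be verified directly with matrices. The sketch below assumes three qubits (two inputs and one output) and an X control on the output qubit; the control choice and the helper names are assumptions for this illustration, and the nested-integral coefficient from Appendix E is not computed here.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op(paulis):
    """Tensor product of single-qubit operators, first entry on qubit 1."""
    return reduce(np.kron, paulis)

def comm(A, B):
    """Matrix commutator [A, B]."""
    return A @ B - B @ A

# Qubits 1 and 2 are inputs; qubit 3 plays the role of the output qubit N.
Z1Z3 = op([Z, I2, Z])
Z2Z3 = op([I2, Z, Z])
X3 = op([I2, I2, X])      # assumed control term on the output qubit
Z1Z2X3 = op([Z, Z, X])

nested = comm(Z1Z3, comm(Z2Z3, X3))

# Coefficient of Z1 Z2 X3 in the nested commutator (Hilbert-Schmidt projection).
coeff = np.trace(Z1Z2X3.conj().T @ nested) / 8
```

The double commutator is proportional to Z1Z2X3, so an input-input interaction mediated by the output qubit appears at the order of the expansion that contains two nested commutators.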
With this framework in mind, let us show that co-design via attention to the Magnus expansion can reveal naturally suitable evolution for a given ansatz. Below, we show that QPs are naturally suited to implement time-reversible spin-squeezing and that the Ising ansatz is naturally well-suited for evolution under products of Jordan–Wigner strings, an evolution paramount in quantum chemistry applications of quantum computers.
B. Time-reversible spin-squeezing through co-design
This subsection offers a fresh perspective on an observation we made in previous work (Ref. 35), namely that QPs are suited for realizing time-reversible spin-squeezing, which can be used for quantum metrology applications. Our previous observation used second-order perturbation theory. This section shows that the Magnus expansion tells us QPs can do time-reversible spin-squeezing. We do not show how this can be used for metrology applications since that has already been explored. Instead, this section explains how the Magnus expansion can lead us to similar conclusions without the need for perturbation theory.
Therefore, the Magnus expansion analysis can elucidate the applications suitable for AQML ansätze.
C. Products of Jordan–Wigner strings through co-design
Finally, this subsection shows how the Magnus expansion can be used to co-design algorithms to simulate evolution under products of Pauli strings, which show up repeatedly in quantum chemistry applications.
Figure 6 shows the results of simulating the evolution of different Jordan–Wigner products. We minimized Eq. (5) for these numerical experiments, starting from 100 random initializations of control fields and coupling constants J with K = 10 for N = 4. Fourier functions were used, but similar results were observed with Legendre and Gaussian functions. In Fig. 6, we show the channel fidelity of the optimized evolution U(θ). We see that both products’ evolution was faithfully simulated early, while it is easier to simulate XZZY evolution over a longer period. The inset shows the results of simulating the product XZ…ZX for different qubits for T = 0.1 and K = 10. We see that the fidelity stays largely constant. Therefore, the simulation is faithful with linear resources in N. This is similar to other methods for calculating these products using fermionic swap operators, where faithful simulation requires resources to scale linearly in N.69 However, fermionic operations are only available on a handful of platforms, such as superconducting qubits and movable neutral atoms.82 Our approach, on the other hand, makes coupled-cluster simulation with linear resources available to other platforms, such as fixed neutral atoms and trapped ions, without requiring extra controls to achieve fermionic interactions.83
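For concreteness, the target evolution under a string XZ…ZX and a fidelity between two unitaries can be written down in a few lines. This sketch is illustrative only: the fidelity |Tr(U†V)|²/d² is an assumed stand-in, since the precise channel fidelity used for Fig. 6 is not restated in this section, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def jw_string(n):
    """Pauli string X Z ... Z X on n >= 2 qubits."""
    return reduce(np.kron, [X] + [Z] * (n - 2) + [X])

def pauli_evolution(P, T):
    """exp(-i T P) for an operator P that squares to the identity."""
    d = P.shape[0]
    return np.cos(T) * np.eye(d) - 1j * np.sin(T) * P

def unitary_fidelity(U, V):
    """Assumed fidelity measure |Tr(U^dagger V)|^2 / d^2 between two unitaries."""
    d = U.shape[0]
    return abs(np.trace(U.conj().T @ V)) ** 2 / d**2

T = 0.1
P4 = jw_string(4)
U_targ = pauli_evolution(P4, T)

fid_self = unitary_fidelity(U_targ, U_targ)         # perfect simulation
fid_idle = unitary_fidelity(U_targ, np.eye(16))     # doing nothing at all
```

Because the string is traceless, the idle baseline `fid_idle` equals cos²(T), which gives a sense of scale for fidelities reported at short evolution times such as T = 0.1.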
V. CONCLUSION AND OUTLOOK
In this work, we systematically studied the landscapes of several AQML algorithms. We showed that AQML algorithms violate a key assumption from QOC theory that would suffice for trap-free landscapes. However, we observe that ansätze co-designed with specific tasks often showcase trap-free landscapes. Therefore, co-design is paramount to the development of successful AQML algorithms. We show that in the case of time evolution simulation, co-design can be realized by studying the Magnus expansion of a given ansatz, and we showcase our approach on tasks relevant to the quantum information community.
This work suggests various lines of inquiry worth further exploration. First, further work is needed to design an AQML ansatz that satisfies local surjectivity. As pointed out in Ref. 84, QOC theory’s assertion of trap-free landscapes does not apply when the control fields are singular (i.e., they violate local surjectivity). However, recent work shows that even singular ansätze, depending on powers of control functions, can still produce trap-free landscapes.85 So far, these ansätze’s Hamiltonians include higher-order terms resulting from the higher-order effects of shining intense lasers onto quantum systems. Along this direction, further work could determine whether higher-order effects in the envisioned hardware of AQML algorithms can mitigate traps. For example, it is well-known that optical-tweezer arrays of Rydberg atoms exhibit nonlinear dynamics when closely packed due to the Rydberg blockade mechanism. These dynamics could then give rise to trap-free landscapes. Alternatively, classes of basis functions may exist for which local surjectivity is met. Moreover, studying the effect of decoherence on the landscape remains an important line of inquiry.
Second, our study of the Magnus expansion as a means to co-design AQML algorithms is limited. Based on our work, we hope the community is inspired to study the viability of simulating higher-order terms from the UCC operator.
Third, the theory linked to the dynamical Lie algebra (DLA) and QOC approaches fails to explain the presence of trap-free landscapes in co-designed algorithms. The DLA’s prediction of barren plateaus and local minima whose curvature scales as 1/D only guarantees that a co-designed algorithm lacks parasitic local minima. QOC theory guarantees trap-free landscapes when local surjectivity is assumed. Therefore, further work is needed to design a predictive theory by exploring if methodology linked to the DLA can be used to explain trap-free landscapes or by finding a QOC theory that does not require local surjectivity but does predict trap-free landscapes.
Finally, using the operators uncovered by the Magnus expansion to simulate novel quantum dynamics seems particularly enticing. For example, the operator highlighted in Sec. IV B has a precise application to quantum metrology as it creates spin-squeezing. Moreover, such an all-to-all Ising model has recently exhibited a dynamical phase transition,86 which can enable new understandings of out-of-equilibrium phenomena.87 Multi-body terms such as those in Sec. IV C also appear when considering the dynamics of qubits coupled to phononic baths,88 and understanding these models may enable long-lived quantum information storage, kinetically constrained dynamics, and correlated quantum states of matter.
ACKNOWLEDGMENTS
The authors acknowledge Christian Arentz for his insightful discussion on QOC theory and Nishad Maskara for pointing out applications relevant to quantum chemistry. R.A.B. acknowledges support from the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. DGE1745303. J.G.P. acknowledges support from the Harvard College Research Program and the Harvard Quantum Initiative. S.F.Y. acknowledges funding from the NSF through the Q-IDEAS HDR Institute (No. OAC-2118310), the Q-SeNSE QCLI (No. OMA-2016244), and the CUA PFC (No. PHY-2317134). The authors also acknowledge the funding through the DARPA IMPAQT Program (No. HR0011-23-3-0023).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Rodrigo Araiza Bravo and Jorge Garcia Ponce contributed equally to this work.
Rodrigo Araiza Bravo: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (lead); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). Jorge Garcia Ponce: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (supporting); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (supporting). Hong-Ye Hu: Formal analysis (supporting); Validation (supporting); Writing – review & editing (supporting). Susanne F. Yelin: Funding acquisition (lead); Project administration (supporting); Resources (lead); Supervision (supporting); Writing – review & editing (supporting).
DATA AVAILABILITY
The data supporting this study’s findings are openly available in Ref. 53.
APPENDIX A: QP’S EXPRESSIVITY AND THE ISING EVOLUTION
This procedure necessitates that K be large enough for each interaction to be tuned independently. This means that K ≥ 3Nn, where n is the number of Trotter steps.
Moreover, our proof heavily relies on producing the effective Hamiltonian in Eq. (A7). Therefore, it is paramount that the QP ansatz includes PWC functions on the output qubit. In numerical experiments, we have seen that while Gaussian controls work best, Fourier controls also allow the QP to simulate the Ising evolution as long as the output contains PWC controls. We say that a QP with PWC controls is an A2 ansatz as specified in Sec. II C.
APPENDIX B: LANDSCAPES OF UNITARY TIME EVOLUTION
1. Derivation of the dynamical derivative
2. Trap-free landscapes
When Np ≥ d^2, we can apply some of the results in Sec. VI of Ref. 57. These results show that there exists an orthonormal basis O that diagonalizes , where C is a diagonal matrix whose entries are the curvature of the landscape at the critical points. Therefore, the matrix Γ is congruent to C. By Sylvester’s law of inertia, congruent matrices share rank (i.e., the number of nonzero eigenvalues) and signature (i.e., the number of positive eigenvalues minus the number of negative eigenvalues). A signature of magnitude equal to the rank means the critical point is a minimum or a maximum if the signature is positive or negative, respectively. A signature smaller than the rank in magnitude means the critical point is a saddle point, because such a critical point has at least one direction in which the curvature has the opposite sign from the other directions. Therefore, Eq. (B13) tells us the most relevant information about the curvature through the signs of its entries and the number of zero entries.
For the case that U = W, ni = 0 and, therefore, the first d entries of Γ are positive. The rest of the entries are zero. Hence, the rank of the Hessian is d and its signature is d, so this critical point is a global minimum. For the case that U = −W, Γ has a rank of d and a signature of −d, and it is thus a global maximum. For the cases that U = ΠW, where Π is a permutation of n columns, the rank Rn = d(d − 2n) + 2dn^2 is greater than d for n > 0. The signature is Sn = d(d − 2n). Clearly, for n > 0, Sn < Rn.
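The rank-and-signature test used in this argument can be sketched numerically. The snippet below is a minimal illustration, not the analysis code of this work: the function names, the tolerance, and the example matrices are assumptions. It follows the convention of the case analysis above, in which a signature equal to the rank marks a minimum and a signature equal to minus the rank marks a maximum, and it checks Sylvester's law of inertia on a small congruence transformation.

```python
import numpy as np

def rank_and_signature(hessian, tol=1e-9):
    """Return (rank, signature) of a real symmetric matrix.

    rank      = number of eigenvalues with magnitude above tol
    signature = (# positive eigenvalues) - (# negative eigenvalues)
    """
    eigvals = np.linalg.eigvalsh(hessian)
    n_pos = int(np.sum(eigvals > tol))
    n_neg = int(np.sum(eigvals < -tol))
    return n_pos + n_neg, n_pos - n_neg

def classify(hessian, tol=1e-9):
    """Classify a critical point from the rank and signature of its Hessian."""
    rank, signature = rank_and_signature(hessian, tol)
    if rank == 0:
        return "flat"
    if signature == rank:
        return "minimum"   # no negative curvature (flat directions may remain)
    if signature == -rank:
        return "maximum"   # no positive curvature (flat directions may remain)
    return "saddle"        # curvature of both signs

# Sylvester's law of inertia: S^T C S has the same rank and signature as C
# for any invertible S. Both matrices below are arbitrary examples.
C = np.diag([2.0, -1.0, 0.0])
S = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])   # unit upper triangular, hence invertible
gamma = S.T @ C @ S

print(rank_and_signature(C))      # (2, 0)
print(rank_and_signature(gamma))  # (2, 0)
print(classify(gamma))            # saddle
```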
Let us summarize our findings. We made the following two significant assumptions:
- We assumed that our AQML algorithm is locally surjective. In practice, this needs to be checked at least numerically.
- We assumed that we have at least 2^(2N) variational parameters. In practice, this is a choice experimenters must make. Making this choice can be seen as choosing to overparametrize the AQML algorithm.
Given those assumptions, we found that
- the critical points of the loss function correspond to the cases in which U†W = W†U,
- the values of the loss function at these critical points are integers given by 2(2^N − n), where 0 ≤ n ≤ 2^N is an integer. Each critical point has a degeneracy of solutions,
- the rank of the Hessian at the critical points is Rn = d(d − 2n) + 2dn^2, which achieves a maximum value of d^2 for n = 2^N (U = W) or n = 0 (U = −W), and
- the signature of the Hessian at the critical points is Sn = d(d − 2n). Therefore, the critical point of the case of W = U (n = 0) is a global minimum. That is, the Hessian contains only non-negative eigenvalues. Similarly, W = −U (n = d) is a global maximum. That is, the Hessian contains only non-positive eigenvalues. For all other critical points, there is at least one positive and one negative eigenvalue and, therefore, they are saddle points.
In conclusion, the assumptions of local surjectivity and overparametrization should lead to trap-free landscapes of the loss function.
APPENDIX C: AQML MODELS AND LOCAL SURJECTIVITY
This appendix shows that AQML ansätze fail to satisfy local surjectivity in certain cases. To prove this statement, it is helpful to establish the following lemmas.
We can, therefore, conclude that local surjectivity will be violated for every Pauli string if an AQML algorithm is such that U(θ) is a Haar random unitary. Notice that this statement also assumes that U(τ) and U(T) are independent for τ < T and that U(τ) is a 1-design for τ > 0.
Here, we must clarify that one could also relax the Haar random requirement if gk(t) vanishes for t ≈ T and t ≈ 0. In that case, an AQML algorithm producing unitaries that only resemble 1-designs suffices for the violation of local surjectivity.
Another caveat to our proof is that the Haar random requirement of Lemma 2 is stringent. It remains to be shown whether the average Tr(UM) can vanish for other kinds of unitary ensembles.
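The vanishing Haar average of Tr(UM) mentioned above can be illustrated with a small Monte Carlo estimate. This is a sketch under stated assumptions: the sampler uses the standard QR-based construction with a phase correction on the diagonal of R, and the choice of M = X ⊗ Z, the sample size, and the seed are arbitrary. It does not reproduce the lemmas of this appendix.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Rescale each column by the phase of the matching diagonal entry of R;
    # without this step the QR output is not Haar distributed.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
M = np.kron(X, Z)  # an example two-qubit Pauli string

samples = [np.trace(haar_unitary(4, rng) @ M) for _ in range(2000)]
mean_trace = np.mean(samples)
print(abs(mean_trace))  # small; shrinks roughly like 1/sqrt(number of samples)
```

Repeating the estimate with unitaries drawn from a different ensemble in place of `haar_unitary` gives a quick numerical probe of the open question raised above, although it is no substitute for a proof.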
APPENDIX D: ANALOG LANDSCAPES
The simulation code was implemented with PennyLane’s differentiable pulse programming capabilities89 and the JAX framework. The optimizers were similarly implemented using the Optax library. We note that these libraries/frameworks support automatic differentiation; therefore, the computed derivatives are exact up to machine precision. Additionally, we note that Gaussian and Fourier control fields behave qualitatively very similarly. The plots shown here are for Fourier controls.
The first phase of the numerical experiments consists of training 100 randomly initialized A1 QP ansätze for each qubit number. The initial parameters are sampled from a uniform distribution on the interval [0, 1]. We parameterize the pulses as a sum of the first K terms of a Fourier series, where K = 5N for N qubits. Concretely, we used 10, 15, 20, and 25 terms for each control pulse for the experiments with 2, 3, 4, and 5 qubits, respectively. We note that the K = 5N cutoff for the series is arbitrary, but the system’s behavior remained unchanged for several different K. As for the optimizer, we used Optax’s Adam implementation.
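The parameter bookkeeping of this setup can be sketched as follows. The sine-only basis, the single pulse per qubit, the pulse duration T = 1, and the function names are all assumptions made for illustration, since the exact Fourier form and the number of control channels are not restated in this appendix. Only the uniform initialization on [0, 1], the 100 trials, and the K = 5N cutoff come from the text.

```python
import numpy as np

def fourier_pulse(coeffs, t, T=1.0):
    """Evaluate sum_k coeffs[k-1] * sin(2*pi*k*t/T) for k = 1..K (assumed basis)."""
    k = np.arange(1, len(coeffs) + 1)
    return np.sin(2 * np.pi * np.outer(np.atleast_1d(t), k) / T) @ coeffs

def init_coefficients(n_qubits, n_trials, rng):
    """Uniform [0, 1) initial coefficients with K = 5N terms per control pulse."""
    K = 5 * n_qubits
    # One pulse per qubit is an assumption made only for this sketch.
    return rng.uniform(0.0, 1.0, size=(n_trials, n_qubits, K))

rng = np.random.default_rng(0)
theta = init_coefficients(n_qubits=3, n_trials=100, rng=rng)
print(theta.shape)  # (100, 3, 15)

times = np.linspace(0.0, 1.0, 5)
print(fourier_pulse(theta[0, 0], times))  # one pulse of the first trial on a time grid
```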
Figure 7 shows the results of the numerical experiments. The training process is successful in all cases and converges to a critical point. Yet the optimal approximated unitary is only achieved at a value of . Therefore, we see that for 4 and 5 qubits, all sampled QP ansätze converge to a sub-optimal critical point (i.e., a local minimum or maximum instead of the global optimum). We see similar results with the A2 QP ansatz, with a lower loss for N ≥ 4.
To make sense of these metrics, recall that at a critical point of the loss function, its gradient vanishes by definition. Therefore, the mean L2 norm of the gradient will only be zero at a critical point, and we would expect the gradients at that epoch and onwards to remain very small and close to 0. Similarly, because the gradients are zero at a critical point, the update rule of the optimizer at the critical point would not change the current parameters. Therefore, the difference between the gradients at epoch i and i − 1 at a critical point should remain small and close to 0.
We see that for all experiments, both metrics decrease and plateau by the end of the training. For 2 and 3 qubits, this plateau occurs at ∼10^−5, whereas for 4 and 5 qubits it is located at around 10^−4.
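The two diagnostics discussed above can be tracked with a few lines of code. The sketch below is a toy stand-in: it uses plain gradient descent on a two-parameter quadratic loss instead of Adam on the AQML loss, and the function name, learning rate, and number of epochs are assumptions. It records the L2 norm of the gradient and the L2 norm of the difference between gradients at consecutive epochs.

```python
import numpy as np

def track_convergence(grad_fn, theta0, lr=0.1, epochs=200):
    """Plain gradient descent that records two convergence diagnostics per epoch."""
    theta = np.array(theta0, dtype=float)
    grad_norms, grad_diffs = [], []
    prev_grad = grad_fn(theta)
    for _ in range(epochs):
        theta = theta - lr * prev_grad                       # parameter update
        grad = grad_fn(theta)
        grad_norms.append(np.linalg.norm(grad))              # ||g_i||
        grad_diffs.append(np.linalg.norm(grad - prev_grad))  # ||g_i - g_{i-1}||
        prev_grad = grad
    return theta, np.array(grad_norms), np.array(grad_diffs)

# Toy quadratic loss 0.5 * theta^T A theta with a unique critical point at 0.
A = np.diag([1.0, 3.0])
grad_fn = lambda theta: A @ theta

theta, norms, diffs = track_convergence(grad_fn, theta0=[1.0, -2.0])
print(norms[-1], diffs[-1])  # both are tiny once the run has converged
```

In a multi-trial experiment, averaging `norms` over the independent initializations would give the mean L2 norm referred to in the text.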
The second phase of the numerical experiments consists of calculating the Hessian matrix at the critical points found during training. Recall that the Hessian matrix encodes information about the second derivatives of the function at a given point and, therefore, encodes information about the local curvature of the landscape around that point. In particular, a positive eigenvalue corresponds to a direction (the direction of its corresponding eigenvector) with positive curvature, and conversely, a negative eigenvalue corresponds to a direction with negative curvature. A zero eigenvalue represents a direction with zero curvature (i.e., flat).
Figure 8 shows the minimum and maximum eigenvalues of the Hessian matrices after the training process converges to a critical point for each trial of A1 QPs. The insets show that for 2 and 3 qubits, the final loss is indeed optimal. Yet even for this optimal case, we observe that there is always a zero eigenvalue corresponding to some locally “flat” direction around the critical points. That is, even for qubit sizes that can converge to an optimal value, the landscape is not as well-behaved as the theory predicts. This suggests that the ability to converge to the global minimum for the A1 QP and similar AQML ansätze is not a consequence of a trap-free landscape. We note that for A2 QPs, the eigenvalues are both positive and negative (see Fig. 4), and so those landscapes are trap-free.
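To make the notion of a flat direction concrete, the following sketch computes the Hessian of a toy loss at one of its minima and inspects the extreme eigenvalues. It uses central finite differences as a stand-in for the automatic differentiation used in this work, and the toy loss, step size, and function name are assumptions.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = eps
            ej = np.zeros(n)
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return 0.5 * (H + H.T)  # symmetrize against rounding noise

# Toy loss that depends only on x0 + x1, so it is flat along (1, -1).
loss = lambda x: (x[0] + x[1] - 1.0) ** 2

H = numerical_hessian(loss, [0.5, 0.5])   # (0.5, 0.5) is a global minimum
eigvals = np.linalg.eigvalsh(H)
print(eigvals.min(), eigvals.max())       # approximately 0 and 4
```

The zero eigenvalue here reflects a whole line of equivalent minima, which is one way an optimal critical point can coexist with a flat direction.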
APPENDIX E: ALGORITHM-TASK CO-DESIGN
1. Magnus expansion analysis
Every unitary U could be written as the evolution under a Hamiltonian HU via . Formally, . On the other hand, a time-dependent Hamiltonian generates evolution as in Eq. (4). This evolution could be written as , where Heff is an effective Hamiltonian.
Note that for a constant Hamiltonian, all terms with l > 0 cancel since [H(t1), H(t2)] = 0.
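For orientation, the first two terms of the Magnus expansion in one common textbook convention (ħ = 1, with the effective Hamiltonian defined through a total evolution time T) are written out below. The normalization and the indexing by l are assumptions chosen to match the statement that only the l = 0 term survives for a constant Hamiltonian; they may differ from the equations of this appendix by constant factors.

```latex
\begin{align}
U(T) &= \mathcal{T}\exp\!\left(-i\int_0^T H(t)\,dt\right)
      = \exp\!\left(-i\,T\,H_{\mathrm{eff}}\right),
      \qquad H_{\mathrm{eff}} = \sum_{l \ge 0} H^{(l)}, \\
H^{(0)} &= \frac{1}{T}\int_0^T dt_1\, H(t_1), \\
H^{(1)} &= -\frac{i}{2T}\int_0^T dt_1 \int_0^{t_1} dt_2\, \bigl[H(t_1), H(t_2)\bigr].
\end{align}
```

In this form, H^{(0)} is the time average of the Hamiltonian, and H^{(1)} depends on the order of the nested integration through the commutator, which is why it vanishes whenever H(t_1) and H(t_2) commute.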
The Hamiltonians generated by the Magnus expansion depend on the native and control Hamiltonians. We will use to denote the operators generated by the nested commutators in H(l). This means that H(l) is a linear combination of the operators in (i.e., ). It is worth noting that different orders may produce the same operators. That is, in general, . An example here is instructive. Consider Hnat = JZ1Z2 with x-controls . Clearly, contains Z1Z2. We can obtain this operator by commuting Z1Z2 with the operator Xi twice or with X1 once and then with X2 (or vice versa). Therefore, as well.
Theoretically, one can approximate every unitary evolution U for which HU is in the span of . In practice, however, the Hamiltonians generated by the first few terms of the Magnus expansion may be easier to optimize and find.
- Step 1: First, we construct a graph that explains how the elements in Hnat and Hctr transform under commutations. In our example, the operator Xi transforms into Yi when commuted with Zi, picking up a coefficient of ifz. It also transforms to YiZi+1, picking up a coefficient of iJ. Figure 9 exemplifies the transformations generated by terms in our Hamiltonian. The arrows represent commutations that lead to the prescribed coefficient, while going in the opposite direction requires an extra factor of −1.
- Step 2: To construct a term given by l commutation relations, we first grab an operator from the Hamiltonian (for example, JZiZi+1) and operate on it using the graph in Step 1 a total of l times to produce one possible operator. We observe that many different operators can be created in this fashion, starting from a given initial operator. We must write the acquired coefficients from right to left at each stage. For time-dependent coefficients, the times acquire growing indices, from 0 to l, from right to left. For example, for l = 2, we may obtain the operators in Eqs. (E9) and (E10).
- Step 3: We then multiply the coefficients by the constant (−i)^l/(l! T), which weights the contribution to the Magnus expansion. Finally, the coefficients are integrated and added to all other contributions paired with the same operator. For the paths exemplified above in Step 2, the operator ZiZi+1 receives the coefficients in Eqs. (E11) and (E12).
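The commutation bookkeeping in the steps above can be cross-checked numerically by expanding nested commutators in the Pauli-string basis. The sketch below is a brute-force illustration on two qubits with hypothetical helper names. It uses bare Pauli matrices and unit coefficients, so the numerical prefactors (such as factors of 2) depend on this normalization and need not match the coefficients in Fig. 9.

```python
import itertools
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Dense matrix of a Pauli string such as 'ZZ' or 'XI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

def commutator(a, b):
    return a @ b - b @ a

def decompose(op, n_qubits, tol=1e-12):
    """Expand op in the Pauli-string basis and return {label: coefficient}."""
    out = {}
    for letters in itertools.product("IXYZ", repeat=n_qubits):
        label = "".join(letters)
        coeff = np.trace(pauli_string(label) @ op) / 2**n_qubits
        if abs(coeff) > tol:
            out[label] = complex(coeff)
    return out

ZZ = pauli_string("ZZ")   # native interaction Z1 Z2 (unit coupling)
X1 = pauli_string("XI")   # x-control on the first qubit

first = commutator(X1, ZZ)       # one step on the graph: lands on Y1 Z2
second = commutator(X1, first)   # two steps: returns to Z1 Z2

print(decompose(first, 2))    # only 'YZ', with coefficient -2i
print(decompose(second, 2))   # only 'ZZ', with coefficient 4
```

Replacing the second commutation with the control on the other qubit, `pauli_string("IX")`, shows which operator that alternative path reaches.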
With this in mind, Table I lists the coefficients of all the operators of the Magnus expansion up to l = 2 for the model in Eq. (E7), where we have used the notation . Notice that if the order of integration did not matter, then XiZi+1 and YiZi+1 would have a zero coefficient, and the coefficients of XiXi+1 and XiYi+1 would be the same (up to a sign). Therefore, if the order of integration did not matter, the Magnus expansion would lead to linearly dependent coefficients, so individual terms could not be tuned independently.
Operator | Coefficient |
---|---|
Xi | |
Yi | |
Zi | |
XiXi+1 | |
XiYi+1 | |
XiZi+1 | |
YiXi+1 | |
YiYi+1 | |
YiZi+1 | |
ZiXi+1 | |
ZiYi+1 | |
ZiZi+1 | |
ZiYi+1Zi+2 | |
ZiXi+1Zi+2 |
However, even when the order of integration matters, the analysis in Table I does not suffice to show that the coefficients are linearly independent. To do so, we view the right-most column of Table I as a map ϕ that takes in variational parameters θ and produces the coefficients of the operators. In a sense, ϕ is a feature vector whose entries ϕO are the coefficients of the operator O. We can produce a data matrix D whose M columns correspond to choosing M random instantiations of the variational parameters and calculating the associated coefficient vectors ϕ(θ). We can then analyze the singular value decomposition of D and count the number of nonzero singular values. Suppose the dimension of ϕ is d and the number of nonzero singular values is s; if s < d, then the coefficients are linearly dependent.
Figure 10 shows the SVD analysis of the coefficients of the operators of the Magnus expansion up to the second order. We chose 500 random instances of variational parameters and calculated all 29 coefficients of an Ising ansatz with global and local pulses for N = 3. We plot the singular values of the matrix obtained by this procedure for different numbers of basis functions K and for different functions (Fourier and Legendre). We find that whenever K > 1, the coefficients are linearly independent. Similar results were observed for N = 4, 5, 6.
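The singular-value test described above can be sketched in a few lines. The two feature maps below are hypothetical toy examples, not the Magnus coefficients of Table I, and the sampling range, relative tolerance, and function name are assumptions. The sketch builds the data matrix D column by column from random parameter draws and counts the singular values above a relative threshold.

```python
import numpy as np

def count_independent(feature_map, n_params, n_samples=500, tol=1e-10, seed=0):
    """Return (s, d): number of nonzero singular values and feature dimension."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-1.0, 1.0, size=(n_samples, n_params))
    # Each column of D is the coefficient vector phi(theta) for one random theta.
    D = np.stack([feature_map(th) for th in thetas], axis=1)   # shape (d, M)
    singular_values = np.linalg.svd(D, compute_uv=False)
    s = int(np.sum(singular_values > tol * singular_values[0]))
    return s, D.shape[0]

# Hypothetical toy maps from 2 parameters to 3 coefficients.
independent = lambda th: np.array([th[0], th[1], th[0] * th[1]])
dependent = lambda th: np.array([th[0], th[1], th[0] + th[1]])  # third = first + second

print(count_independent(independent, 2))  # (3, 3): s = d, linearly independent
print(count_independent(dependent, 2))    # (2, 3): s < d, linearly dependent
```

A relative threshold is used because rounding leaves singular values that are numerically tiny rather than exactly zero.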
2. Spin-squeezing using QPs
3. Additional data on simulating Jordan–Wigner products
In the main text, we showed the result of simulating the evolution of Jordan–Wigner products for small times and for the Fourier ansatz. Figure 11 shows the results of simulating the evolution of Jordan–Wigner products for longer times and using the Fourier (top panel) and Legendre (bottom panel) ansätze. We observe that for both of these ansätze, the fidelity decreases as time increases but returns to its maximum later. The time of this revival differs between the two ansätze. The Fourier ansatz has a repetition property after T = 1, but the Legendre ansatz does not.