The controlled-swap and controlled-controlled-not gates are at the heart of the original proposal of reversible classical computation by Fredkin and Toffoli. Their widespread use in quantum computation, both in the implementation of classical logic subroutines of quantum algorithms and in quantum schemes with no direct classical counterparts, has made it imperative early on to pursue their efficient decomposition in terms of the lower-level gate sets native to different physical platforms. Here, we add to this body of literature by providing several logically equivalent circuits for the Toffoli and Fredkin gates under all-to-all and linear qubit connectivity, the latter with two different routings for control and target qubits. Besides achieving the lowest cnot counts in the literature for all these configurations, we also demonstrate the remarkable effectiveness of the obtained decompositions at mitigating coherent errors on near-term quantum computers via equivalent circuit averaging. We first quantify the performance of the method in silico with a coherent-noise model before validating it experimentally on a superconducting quantum processor. In addition, we consider the case where the three qubits on which the Toffoli or Fredkin gates act nontrivially are not adjacent, proposing a novel scheme to reorder them that saves one cnot for every swap. This scheme also finds use in the shallow implementation of long-range cnots. Our results highlight the importance of considering different entangling gate structures and connectivity constraints when designing efficient quantum circuits.
I. INTRODUCTION
The Fredkin gate (also known as controlled-swap) and the Toffoli gate (also known as controlled-controlled-not) are three-input, three-output logic gates that were introduced within the reversible logic model of classical computation,1 in which logic circuits realize invertible Boolean functions.2 The cswap leaves an input bit unchanged and swaps the remaining two if and only if the first one is in state 1,3 while the ccnot negates the target bit if both control bits are in state 1. Their importance lies in both being universal Boolean primitives for reversible logic: any classical logic operation can be constructed entirely out of Fredkin or Toffoli gates.4
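The following Python sketch makes these definitions concrete. It models both gates as Boolean functions on three bits and checks that each is a bijection and its own inverse. It then recovers AND and NOT from each gate by fixing some inputs to constants, which is the usual route to the universality statement above. The function names are illustrative and are not taken from the paper.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled-swap: exchange a and b if and only if the control c is 1."""
    return (c, b, a) if c else (c, a, b)


def toffoli(a, b, t):
    """Controlled-controlled-not: negate t if and only if a and b are both 1."""
    return (a, b, t ^ (a & b))


INPUTS = list(product((0, 1), repeat=3))

for gate in (fredkin, toffoli):
    outputs = [gate(*bits) for bits in INPUTS]
    # Reversible: the 8 outputs are a permutation of the 8 inputs.
    assert sorted(outputs) == INPUTS
    # Self-inverse: applying the gate twice restores the input.
    assert all(gate(*gate(*bits)) == bits for bits in INPUTS)


# Irreversible primitives obtained by fixing some inputs to constants;
# the outputs that are not read are "garbage" bits.
def and_via_toffoli(a, b):
    return toffoli(a, b, 0)[2]


def not_via_toffoli(a):
    return toffoli(1, 1, a)[2]


def and_via_fredkin(a, b):
    return fredkin(a, b, 0)[2]


def not_via_fredkin(a):
    return fredkin(a, 0, 1)[2]


for a, b in product((0, 1), repeat=2):
    assert and_via_toffoli(a, b) == and_via_fredkin(a, b) == (a & b)
for a in (0, 1):
    assert not_via_toffoli(a) == not_via_fredkin(a) == 1 - a
```

Because AND and NOT are universal for irreversible Boolean logic, either reversible gate can emulate any classical circuit given constant inputs and garbage outputs. The copy needed for fan-out is also available, since `fredkin(a, 0, 1)[1]` equals `a`.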
As their reversible nature implies unitarity, both gates were readily adopted in quantum computing, particularly to realize classical logic circuits that perform subroutines of quantum algorithms. In this setting, they can operate on superposition states (i.e., complex-valued linear combinations of the classical states) and thus implement arbitrary classical logic operations on quantum data. Moreover, adding just the Hadamard gate to the Toffoli gate suffices to form a universal quantum basis set.5,6 The Fredkin gate requires the x gate in addition to the Hadamard gate to form a universal quantum basis.7 In practice, however, basis gate sets rely on two-qubit gates, rather than the Toffoli or Fredkin gates, as the elements that can change the entanglement structure of the input state, a necessary condition for quantum universality.
Nevertheless, the Fredkin and Toffoli gates play a pivotal role in quantum computation. In particular, the Toffoli gate is the key building block of its multi-qubit generalizations,8–10 which are ubiquitous in quantum arithmetic circuits11 and in the construction of oracles.12,13 Moreover, the Toffoli gate has been adopted in quantum error correction.14 Recently, the iToffoli gate, a close variant of the Toffoli gate,15 was part of a proposal to compute frequency-domain molecular response properties.16 As for the Fredkin gate, it is the core element of the swap test,17,18 the canonical method to compute the fidelity between two states. In addition, the Fredkin gate has also been employed in quantum state preparation,19–21 estimation of linear and nonlinear functionals of density operators,22 quantum switches,23–25 optimal quantum cloning,26 stabilization of quantum computations by symmetrization,17 sampling states in the Hamiltonian eigenbasis (along with the Toffoli gate),27 and calculation of Bargmann invariants.28,29 Both the Fredkin and Toffoli gates have found use in routines tailored to near-term quantum hardware.21,29–33
In light of such a broad range of applications, it is unsurprising that the problem of implementing the Fredkin and Toffoli gates on digital quantum computers has attracted great interest. Unlike previous proposals tailored to specific quantum hardware (e.g., platforms based on trapped ions,34,35 superconducting circuits,14,36–39 and quantum optics40–44), we follow a high-level, hardware-agnostic approach, whereby the Fredkin and Toffoli gates are decomposed in terms of standard single- and two-qubit operations. In particular, we take the cnot as the reference two-qubit basis gate. Earlier studies45–47 have minimized the number of non-Clifford operations, such as t gates, to render the Fredkin and Toffoli gates less onerous for fault-tolerant quantum computation.48 Instead, we focus on decompositions suitable for noisy intermediate-scale quantum hardware,49 in which case the key goal is to minimize the number of cnot gates while taking qubit connectivity constraints into account. A lower cnot count can be achieved by allowing for implementations up to a relative phase factor8,47,50 or by replacing some qubits with qutrits.51–54 Here, we restrict ourselves to qubits, aiming to realize three-qubit operations with the exact matrix representations shown in Fig. 1, up to a global phase.
Matrix representations and quantum circuit diagrams of (a) the Fredkin gate and (b) the Toffoli gate. Qubit significance decreases from top to bottom in the diagrams. (c) Standard decomposition of swap gate in terms of 3 cnots. (d) Decomposition of Fredkin gate in terms of a Toffoli gate between two cnots, adapting the circuit in (c) and making use of the result from Appendix A. (e) Introduction of textbook decomposition of the Toffoli gate,9 as depicted within the solid-line blue box, into circuit shown in (d). This gives rise to a circuit for the Fredkin gate with 8 cnots and depth 14. (f) Two simplifications to the circuit shown in (e) can be applied. The first corresponds to the replacement of the two-qubit subcircuit shown inside the dashed-line red box, which results in one less cnot. The second amounts to removing a layer of single-qubit gates by changing some single-qubit gates while keeping the cnot structure unaltered. Overall, the Fredkin gate on three adjacent qubits can, therefore, be executed with 7 cnots and a depth of 13, ignoring qubit connectivity constraints. Single-qubit Hadamard (H), phase (S), and π/8 (T) gates follow the standard definitions,9 and .
The remainder of this paper is structured as follows. Section II considers the cnot-count minimization of the Fredkin and Toffoli gate decompositions for three adjacent qubits with both all-to-all and linear qubit connectivity. Section III addresses the case where the three qubits on which these unitaries act nontrivially are not directly connected to one another. In particular, we devise a method to bring the three qubits together and then return them to their original positions that saves one cnot for every swap. In Sec. IV, we exploit the multiple generated circuits for the Fredkin and Toffoli gates to mitigate coherent errors via equivalent circuit averaging (ECA), analyzing performance in silico and experimentally. Finally, Sec. V summarizes our results.
II. DECOMPOSITIONS FOR ADJACENT QUBITS
It is well established that five two-qubit operations suffice to decompose the Toffoli gate.55,56 However, the native basis gate sets that can be realized in quantum computing platforms typically only include a single fixed (i.e., not parameterized) two-qubit operation such as the cnot. Hence, in practice, the minimum number of two-qubit gates involved in the decomposition of the Toffoli gate is 6. The circuit inside the blue solid-line box in Fig. 1(e) shows the textbook decomposition9 of the Toffoli gate, which is optimal as far as the cnot count is concerned. Henceforth, this circuit will be the starting point to find shallow decompositions of the Toffoli and Fredkin gates under different qubit connectivity constraints.
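This optimality baseline can be checked numerically. The pure-Python state-vector sketch below applies one standard arrangement of the textbook 6-cnot Toffoli circuit to every basis state. It confirms that the circuit reproduces the Toffoli permutation exactly, with no residual phase. The circuit is our transcription, and the placement of single-qubit gates in Fig. 1(e) may differ slightly. The gate-tuple format and the helper names (`apply`, `implements`) are assumptions made for illustration.

```python
import cmath
import math

R = 1 / math.sqrt(2)
ONE_QUBIT = {
    "h": [[R, R], [R, -R]],
    "t": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
    "tdg": [[1, 0], [0, cmath.exp(-1j * math.pi / 4)]],
}


def apply(state, gate, n):
    """Apply ("cx", control, target) or (name, qubit); qubit 0 is the top wire (most significant bit)."""
    name, *qubits = gate
    out = list(state)
    if name == "cx":
        cmask = 1 << (n - 1 - qubits[0])
        tmask = 1 << (n - 1 - qubits[1])
        for i, amp in enumerate(state):
            if i & cmask:
                out[i ^ tmask] = amp
        return out
    m = ONE_QUBIT[name]
    step = 1 << (n - 1 - qubits[0])
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            out[i] = m[0][0] * a + m[0][1] * b
            out[i | step] = m[1][0] * a + m[1][1] * b
    return out


def implements(gates, n, classical_map, tol=1e-9):
    """True if every basis state |x> is sent exactly to |classical_map(x)>, with no phase."""
    dim = 2 ** n
    for x in range(dim):
        state = [0j] * dim
        state[x] = 1 + 0j
        for g in gates:
            state = apply(state, g, n)
        y = classical_map(x)
        if any(abs(amp - (1 if k == y else 0)) > tol for k, amp in enumerate(state)):
            return False
    return True


# Controls on qubits 0 and 1, target on qubit 2.
TOFFOLI_6CX = [
    ("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2), ("cx", 1, 2),
    ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2), ("h", 2),
    ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1),
]


def toffoli_map(x):
    """Flip qubit 2 (least significant bit) when qubits 0 and 1 are both 1."""
    return x ^ 1 if (x & 0b110) == 0b110 else x


cx_count = sum(1 for g in TOFFOLI_6CX if g[0] == "cx")
print(cx_count, implements(TOFFOLI_6CX, 3, toffoli_map))  # 6 True
```

The t and t† gates between the cnots accumulate the phase π·x0·x1·x2, which is a ccz on the target. The Hadamard pair then turns that ccz into the Toffoli gate.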
The standard quantum circuit for the Fredkin gate55 results from adapting the well-known decomposition of the swap gate in terms of 3 cnots [Fig. 1(c)]. Naïvely, an extra control-qubit should be added to each cnot, but only the middle one happens to be required [Fig. 1(d)], thanks to the symmetric structure of the swap circuit (see Appendix A). Making use of the aforementioned textbook decomposition of the Toffoli gate,9 this results in a circuit for the Fredkin gate with 8 cnots and depth 14 [Fig. 1(e)]. However, the subcircuit within the red dashed-line box can be further simplified, resulting in the elimination of 1 cnot. Moreover, a layer of single-qubit gates can also be removed at the end of the circuit by changing some single-qubit gates while leaving the entangling gates structure unchanged. The result of these two simplifications is shown in Fig. 1(f), corresponding to a total of 7 cnots and a circuit depth of 13. To the best of our knowledge, this is the shallowest decomposition of the Fredkin gate in the literature in terms of cnot count.
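The construction of Fig. 1(d) can be checked in the same way. The sketch below assumes that the two outer cnots have their control on qubit 2 and their target on qubit 1; the mirror orientation works equally well because the swap circuit is symmetric. Combined with a transcription of the textbook Toffoli circuit, this gives the 8-cnot circuit of Fig. 1(e). The 7-cnot simplification of Fig. 1(f) is not reproduced here, and the helper names are illustrative.

```python
import cmath
import math

R = 1 / math.sqrt(2)
ONE_QUBIT = {
    "h": [[R, R], [R, -R]],
    "t": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
    "tdg": [[1, 0], [0, cmath.exp(-1j * math.pi / 4)]],
}


def apply(state, gate, n):
    """Apply ("cx", control, target) or (name, qubit); qubit 0 is the most significant bit."""
    name, *qubits = gate
    out = list(state)
    if name == "cx":
        cmask, tmask = 1 << (n - 1 - qubits[0]), 1 << (n - 1 - qubits[1])
        for i, amp in enumerate(state):
            if i & cmask:
                out[i ^ tmask] = amp
        return out
    m, step = ONE_QUBIT[name], 1 << (n - 1 - qubits[0])
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            out[i] = m[0][0] * a + m[0][1] * b
            out[i | step] = m[1][0] * a + m[1][1] * b
    return out


def implements(gates, n, classical_map, tol=1e-9):
    """True if every basis state |x> is sent exactly to |classical_map(x)>."""
    dim = 2 ** n
    for x in range(dim):
        state = [0j] * dim
        state[x] = 1 + 0j
        for g in gates:
            state = apply(state, g, n)
        y = classical_map(x)
        if any(abs(amp - (1 if k == y else 0)) > tol for k, amp in enumerate(state)):
            return False
    return True


TOFFOLI_6CX = [  # controls 0, 1; target 2
    ("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2), ("cx", 1, 2),
    ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2), ("h", 2),
    ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1),
]
# swap(1, 2) = cx(2,1) cx(1,2) cx(2,1); only the middle cnot needs the extra control.
FREDKIN_8CX = [("cx", 2, 1)] + TOFFOLI_6CX + [("cx", 2, 1)]


def fredkin_map(x):
    """Swap the bits of qubits 1 and 2 when qubit 0 (most significant bit) is 1."""
    if x & 0b100:
        return 0b100 | ((x & 1) << 1) | ((x >> 1) & 1)
    return x


cx_count = sum(1 for g in FREDKIN_8CX if g[0] == "cx")
print(cx_count, implements(FREDKIN_8CX, 3, fredkin_map))  # 8 True
```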
The circuits shown in Fig. 1 assume that all qubits are connected to one another, thus allowing us to implement a cnot gate between any pair of qubits natively. However, in quantum computers based on solid-state platforms that realize qubits through superconducting circuits57 or silicon quantum dots,58 there are unavoidable restrictions in the connections between qubits. cnot gates between widely separated qubits are only possible by moving the information content of the qubits around through networks of swap gates,59 which introduce a considerable depth overhead. Generating shallow decompositions that forgo such swap networks while taking these qubit connectivity constraints into account is thus crucial to exploit the potential of near-term quantum processors. This is particularly relevant for circuits comprising three-qubit operations, such as the Toffoli or the Fredkin gates, as the architectures of most quantum processors that are currently available or under development do not include trios of fully connected qubits.
Leveraging the ZX-calculus and optimization heuristics, we have recently developed a technique60 for unitary decomposition capable of producing many logically equivalent circuits with manifestly different entangling gate structures. The entangling gate structure of the circuit, as we define it, consists of the description of the order and position of the cnot gates applied to different qubit pairs along the execution of the circuit. Single-qubit gates are excluded from this definition, grouping circuits differing only in single-qubit gates under the same category. Furthermore, if two circuits differ from each other only due to permutations of commuting cnot gates, they are also considered under the same entangling gate structure.60
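One way to make this definition operational is to reduce each cnot sequence to a canonical representative. This is a sketch under our own assumptions, not the procedure of Ref. 60. Two cnots commute unless the target of one is the control of the other. Repeatedly emitting the smallest cnot that commutes with every gate still ahead of it yields the lexicographically smallest reordering reachable through such swaps. Two circuits therefore share an entangling gate structure exactly when their keys coincide. The function names are hypothetical.

```python
def commute(g, h):
    """cnots g = (c1, t1) and h = (c2, t2) commute unless t1 == c2 or t2 == c1."""
    return g[1] != h[0] and h[1] != g[0]


def structure_key(cnots):
    """Canonical representative of a cnot sequence modulo swaps of commuting cnots."""
    remaining, canonical = list(cnots), []
    while remaining:
        # A gate that commutes with every gate still ahead of it can be moved to the front.
        movable = [i for i, g in enumerate(remaining)
                   if all(commute(g, h) for h in remaining[:i])]
        best = min(movable, key=lambda i: remaining[i])
        canonical.append(remaining.pop(best))
    return tuple(canonical)


# (0, 1) and (0, 2) share a control, so they commute: same structure.
print(structure_key([(0, 1), (0, 2), (1, 2)]) == structure_key([(0, 2), (0, 1), (1, 2)]))
# (0, 1) targets the control of (1, 2), so they do not commute: different structures.
print(structure_key([(0, 1), (1, 2)]) == structure_key([(1, 2), (0, 1)]))
```

Single-qubit gates are simply dropped before computing the key, in line with the definition above.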
The input provided to this circuit optimization technique is a circuit that implements the desired gate; this initial circuit is generally suboptimal in cnot count, and the goal of the method is to generate an equivalent circuit with fewer cnots. However, it is equally possible to start from a cnot-count-optimal circuit and obtain another circuit with the same number of cnots but a different entangling gate structure. The input circuit is converted into a ZX-diagram through the PyZX software package,61 which also includes methods to simplify the ZX-diagram and convert it back into a quantum circuit.62,63 This conversion can often give rise to a wide variety of circuits, and our technique searches for those that minimize the cnot count. Specifically, we build upon the PyZX simplification techniques with an intensive search and optimization procedure that often succeeds in escaping from local minima, thus optimizing the decompositions further.
We have applied our circuit simplification technique to generate several logically equivalent circuits for the Fredkin and Toffoli gates under all-to-all and linear qubit connectivity. In the former case, we started from the cnot-count-optimal circuits shown in Fig. 1, so the obtained circuits had the same number of cnot gates, though arranged differently. Under linear connectivity constraints, our starting point also corresponded to the circuits in Fig. 1, but with the two cnots between the outermost qubits each requiring a swap before and after the execution of the actual cnot. This naïve approach to handling the qubit connectivity restrictions is naturally far from optimal, and therefore, our ZX-calculus-based technique yielded circuits with significantly fewer cnots.
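For the Toffoli gate, the size of this naïve starting point is easy to tally. Each of the two cnots between qubits 0 and 2 in the textbook circuit becomes swap, cnot, swap, i.e., 7 nearest-neighbor cnots. The total is therefore 4 + 2 × 7 = 18, compared with the 8 cnots listed in Table I. The sketch below builds that routed circuit with small illustrative state-vector helpers and checks that it is still an exact Toffoli. It assumes qubits 0, 1, 2 on a line and swaps qubits 0 and 1; the choice of which pair to swap is arbitrary.

```python
import cmath
import math

R = 1 / math.sqrt(2)
ONE_QUBIT = {
    "h": [[R, R], [R, -R]],
    "t": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
    "tdg": [[1, 0], [0, cmath.exp(-1j * math.pi / 4)]],
}


def apply(state, gate, n):
    """Apply ("cx", control, target) or (name, qubit); qubit 0 is the most significant bit."""
    name, *qubits = gate
    out = list(state)
    if name == "cx":
        cmask, tmask = 1 << (n - 1 - qubits[0]), 1 << (n - 1 - qubits[1])
        for i, amp in enumerate(state):
            if i & cmask:
                out[i ^ tmask] = amp
        return out
    m, step = ONE_QUBIT[name], 1 << (n - 1 - qubits[0])
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            out[i] = m[0][0] * a + m[0][1] * b
            out[i | step] = m[1][0] * a + m[1][1] * b
    return out


def implements(gates, n, classical_map, tol=1e-9):
    dim = 2 ** n
    for x in range(dim):
        state = [0j] * dim
        state[x] = 1 + 0j
        for g in gates:
            state = apply(state, g, n)
        y = classical_map(x)
        if any(abs(amp - (1 if k == y else 0)) > tol for k, amp in enumerate(state)):
            return False
    return True


def toffoli_map(x):
    return x ^ 1 if (x & 0b110) == 0b110 else x


TOFFOLI_6CX = [
    ("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2), ("cx", 1, 2),
    ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2), ("h", 2),
    ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1),
]
SWAP_01 = [("cx", 0, 1), ("cx", 1, 0), ("cx", 0, 1)]


def route_naively(gates):
    """Replace each cnot between qubits 0 and 2 by swap(0,1), a nearest-neighbor cnot, swap(0,1)."""
    relabel = {0: 1, 1: 0, 2: 2}
    routed = []
    for g in gates:
        if g[0] == "cx" and {g[1], g[2]} == {0, 2}:
            routed += SWAP_01 + [("cx", relabel[g[1]], relabel[g[2]])] + SWAP_01
        else:
            routed.append(g)
    return routed


ROUTED = route_naively(TOFFOLI_6CX)
cnots = [g for g in ROUTED if g[0] == "cx"]
nearest_neighbor = all(abs(c - t) == 1 for _, c, t in cnots)
print(len(cnots), nearest_neighbor, implements(ROUTED, 3, toffoli_map))  # 18 True True
```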
At the end of the search procedure, we generated further logically equivalent circuits with different cnot structures from those obtained directly by the ZX-calculus-based procedure. To do so, we exploited the symmetries of the Toffoli and Fredkin gates:

- the invariance of the Toffoli gate under permutations of all three qubits, once it is converted into a controlled-controlled-z (ccz) gate by applying a pair of Hadamard gates on either side, as discussed in Appendix B;
- the invariance of the Fredkin gate under the permutation of its two target-qubits;
- the invariance of both gates under inversion, since the Fredkin and Toffoli gates are self-inverse.

Under linear qubit connectivity, some of these transformations were discarded, as they resulted in cnot gates between unconnected qubits. All in all, this process increased the number of circuits with different entangling gate structures for each gate implementation.
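These symmetries can be checked directly on the classical action of the gates. The sketch below uses illustrative helper names, with qubit 0 as the most significant bit. It confirms that both gates are self-inverse and that the Toffoli gate is unchanged when its two controls are exchanged. It also confirms that the Fredkin gate is unchanged when its two target-qubits are exchanged, and that the diagonal ccz gate is invariant under all six permutations of its qubits.

```python
from itertools import permutations

N = 3
STATES = range(2 ** N)


def bits(x):
    return [(x >> (N - 1 - q)) & 1 for q in range(N)]


def index(b):
    return sum(v << (N - 1 - q) for q, v in enumerate(b))


def toffoli(x):
    """Controls on qubits 0 and 1, target on qubit 2."""
    b = bits(x)
    b[2] ^= b[0] & b[1]
    return index(b)


def fredkin(x):
    """Control on qubit 0, targets on qubits 1 and 2."""
    b = bits(x)
    if b[0]:
        b[1], b[2] = b[2], b[1]
    return index(b)


def ccz_phase(x):
    """Diagonal entry of ccz for the basis state |x>."""
    return -1 if all(bits(x)) else 1


def relabel(x, perm):
    """Qubit permutation: the bit of qubit q moves to position perm[q]."""
    b, out = bits(x), [0] * N
    for q, p in enumerate(perm):
        out[p] = b[q]
    return index(out)


# Invariance under inversion: both gates are self-inverse.
assert all(toffoli(toffoli(x)) == x and fredkin(fredkin(x)) == x for x in STATES)
# Exchanging the two targets of the Fredkin gate (or the two controls of the Toffoli gate)
# leaves it unchanged; these relabellings are involutions, so conjugation is P·U·P.
assert all(relabel(fredkin(relabel(x, (0, 2, 1))), (0, 2, 1)) == fredkin(x) for x in STATES)
assert all(relabel(toffoli(relabel(x, (1, 0, 2))), (1, 0, 2)) == toffoli(x) for x in STATES)
# ccz is unchanged under every permutation of the three qubits.
assert all(ccz_phase(relabel(x, p)) == ccz_phase(x)
           for p in permutations(range(N)) for x in STATES)
```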
The cnot count and the number of equivalent circuits for each scenario of qubit connectivity and position of the odd qubit (target-qubit for Toffoli and control-qubit for Fredkin) are shown in Table I. When all qubits are connected to one another, the placement of the odd qubit is immaterial. For the Toffoli gate, even under linear connectivity, the position of the target-qubit is irrelevant as far as the entangling gate structure of the circuit is concerned, since the target-qubit can be changed by simply moving a pair of Hadamard gates, one on either end of the circuit. This follows from the close relation of the Toffoli gate to the ccz gate, which is invariant under permutations of the three qubits (see Appendix B). Figure 2 shows an example of a circuit with the lowest cnot count for each of the three scenarios of linear qubit connectivity.
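The Hadamard-moving argument can be verified in a few lines. Conjugating ccz with a Hadamard pair on qubit k yields a Toffoli gate whose target is qubit k and whose controls are the other two qubits. The minimal state-vector sketch below, with hypothetical helper names, checks all three choices of k exactly.

```python
import math

N = 3
R = 1 / math.sqrt(2)


def apply_h(state, q):
    """Hadamard on qubit q (qubit 0 = most significant bit)."""
    step = 1 << (N - 1 - q)
    out = list(state)
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            out[i], out[i | step] = R * (a + b), R * (a - b)
    return out


def apply_ccz(state):
    """ccz flips the sign of |111> only."""
    out = list(state)
    out[0b111] = -out[0b111]
    return out


def toffoli_target(k, x):
    """Classical Toffoli with target qubit k and the other two qubits as controls."""
    m = 1 << (N - 1 - k)
    return x ^ m if (x | m) == 0b111 else x


for k in range(N):
    for x in range(2 ** N):
        state = [1.0 if i == x else 0.0 for i in range(2 ** N)]
        state = apply_h(apply_ccz(apply_h(state, k)), k)
        expected = toffoli_target(k, x)
        assert all(abs(amp - (1.0 if i == expected else 0.0)) < 1e-12
                   for i, amp in enumerate(state))
```

Because ccz is symmetric in its three qubits (Appendix B), the same cnot skeleton serves every target position. Only the location of the Hadamard pair changes.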
cnot count and the number of equivalent circuits generated for Fredkin and Toffoli decompositions in five different scenarios of qubit connectivity and position of odd qubit (control-qubit for the former and target-qubit for the latter). All circuits have been made available online in qasm file format. In addition to the optimal-cnot-count circuits, we also provide all circuits with linear qubit connectivity that have one more cnot gate than the optimal, as these may be useful for equivalent circuit averaging (see Sec. IV).
Gate | Connectivity | Odd qubit placement | cnot count | No. equiv. circ. | No. equiv. circ. with +1 cnot |
---|---|---|---|---|---|
Toffoli | All-to-all | Anywhere | 6 | 48 | ⋯ |
Toffoli | Linear | Anywhere | 8 | 18 | 54 |
Fredkin | All-to-all | Anywhere | 7 | 40 | ⋯ |
Fredkin | Linear | Ends | 8 | 8 | 22 |
Fredkin | Linear | Center | 10 | 2 | 69 |
Examples of optimal basis gate decompositions in cnot count obtained through our ZX-calculus-based optimization heuristic for the case of linear qubit connectivity of (a) Toffoli gate, (b) Fredkin gate with control-qubit at one end of three-qubit register, and (c) Fredkin gate with control-qubit at center of three-qubit register. Single-qubit Hadamard (H), phase (S), π/8 (T), Pauli-X (X), and Pauli-Z (Z) gates follow the standard definitions,9 and , being its inverse. The circuit in (a) applies specifically to the case where the target-qubit of the Toffoli gate is at the central position, but the target-qubit can be changed by simply moving the two Hadamard gates, one on either end of the circuit, to the desired target-qubit (see Appendix B).
The shallowest circuits for the Fredkin and Toffoli gates generated by our ZX-calculus-based unitary decomposition technique have the lowest cnot counts in the literature. Table II shows the cnot counts achieved by different basis gate decomposition methods for the five scenarios of qubit connectivity and odd qubit placement previously considered in Table I. In addition, Table II includes the cnot counts presented in two earlier papers32,67 for the decomposition of the Toffoli gate under all-to-all and linear qubit connectivity; analogous results for the Fredkin gate could not be found in the literature. The lowest cnot counts reported here have also been attained by the BQSKit64 and CPFlow65 packages, with one exception: CPFlow requires 11 cnots for the Fredkin gate under linear qubit connectivity with the control-qubit at the center. Our results offer three advantages relative to using these alternative packages. First, the circuits we have generated have been decomposed in the {cnot, Rz(θ), Rx(θ)} basis9 with all single-qubit-gate parameters θ corresponding to exact fractions of π. Besides guaranteeing that the decompositions are accurate to numerical precision, these circuits may also be useful for fault-tolerant quantum hardware, as the decomposition of single-qubit gates with respect to a finite basis is simplified. Second, instead of just one decomposition, our method generates several logically equivalent ones. Third, all equivalent circuits we have generated for the Fredkin and Toffoli gates have been made available online, so they can simply be stored and retrieved when required.68
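The sketch below illustrates what exact fractions of π provide. It stores rotation angles as `Fraction` multiples of π and checks, up to a global phase, a few standard identities in the {Rz(θ), Rx(θ)} basis. Examples are H = e^{iπ/2} Rz(π/2) Rx(π/2) Rz(π/2) and T = e^{iπ/8} Rz(π/4). These are textbook identities chosen for illustration, not gates extracted from the published circuit files. The sketch assumes the convention Rz(θ) = diag(e^{-iθ/2}, e^{iθ/2}), and the helper names are hypothetical.

```python
import cmath
import math
from fractions import Fraction


def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def build(sequence):
    """Multiply (axis, k) rotations, with angle k·π and k an exact Fraction, in time order."""
    u = [[1, 0], [0, 1]]
    for axis, k in sequence:
        gate = (rz if axis == "rz" else rx)(float(k) * math.pi)
        u = matmul(gate, u)  # later gates act on the left
    return u


def equal_up_to_global_phase(a, b, tol=1e-12):
    """Estimate the phase from the largest entry of b, then compare all entries."""
    i, j = max(((r, c) for r in range(2) for c in range(2)),
               key=lambda rc: abs(b[rc[0]][rc[1]]))
    phase = a[i][j] / b[i][j]
    return abs(abs(phase) - 1) < tol and all(
        abs(a[r][c] - phase * b[r][c]) < tol for r in range(2) for c in range(2))


R = 1 / math.sqrt(2)
TARGETS = {
    "h": [[R, R], [R, -R]],
    "s": [[1, 0], [0, 1j]],
    "t": [[1, 0], [0, cmath.exp(0.25j * math.pi)]],
    "x": [[0, 1], [1, 0]],
}
# Angles are kept as exact multiples of pi and converted to floats only when building matrices.
DECOMPOSITIONS = {
    "h": [("rz", Fraction(1, 2)), ("rx", Fraction(1, 2)), ("rz", Fraction(1, 2))],
    "s": [("rz", Fraction(1, 2))],
    "t": [("rz", Fraction(1, 4))],
    "x": [("rx", Fraction(1))],
}
for name, seq in DECOMPOSITIONS.items():
    assert equal_up_to_global_phase(build(seq), TARGETS[name]), name
```

Keeping the angles symbolic means that mapping each rotation onto a finite fault-tolerant gate set (e.g., Clifford+T) reduces to exact arithmetic on fractions rather than numerical synthesis.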
cnot count of decompositions of Fredkin and Toffoli gates for the five scenarios of qubit connectivity and position of the odd qubit considered in Table I. The BQSKit,64 CPFlow,65 and Qiskit66 unitary decomposition methods were used to benchmark our results. The lowest cnot counts reported in the literature32,67 for the Toffoli gate are also included for reference; no analogous results for the Fredkin gate could be found. Apart from achieving the lowest cnot count in all five cases, the multiple equivalent circuits we have generated have two additional benefits. They are exact, as all single-qubit-gate parameters are exact fractions of π. They are also stored, so they can be retrieved when necessary instead of carrying out the unitary decomposition from scratch.
Method | Toffoli (all-to-all) | Toffoli (linear) | Fredkin (all-to-all) | Fredkin (linear, ends) | Fredkin (linear, center) |
---|---|---|---|---|---|
This work | 6 | 8 | 7 | 8 | 10 |
BQSKit64 | 6 | 8 | 7 | 8 | 10 |
CPFlow65 | 6 | 8 | 7 | 8 | 11 |
Qiskit66 | 6 | 10 | 7 | 11 | 17 |
Duckering et al.32 | 6 | 8 | ⋯ | ⋯ | ⋯ |
Liu et al.67 | 6 | 8 | ⋯ | ⋯ | ⋯ |
Having different circuits that realize the same gate makes it possible to implement a number of methods that address the limitations of near-term quantum hardware. For example, two decompositions of the same gate may allow for a different degree of simplification of the surrounding circuit when the context around the gate is taken into account.67 Likewise, if the cnot gate implemented between a pair of qubits has an especially high error rate, one may choose a circuit that uses the fewest cnots between those two qubits to maximize the fidelity of the outcome. Even more importantly, it is possible to mitigate the effects of coherent errors through equivalent circuit averaging.69–71 Before we discuss this application in Sec. IV, we will consider the implementation of the Fredkin and Toffoli gates when the three qubits are not adjacent.
III. DECOMPOSITIONS FOR NON-ADJACENT QUBITS
In this section, we address the implementation of the Fredkin and Toffoli gates when the three active qubits are not adjacent. In this scenario, the neighboring qubits in their path must be used to implement the global long-range unitary. Rather than performing a direct basis gate decomposition, we introduce the cnot-swap method and show how it allows for an efficient rerouting of the qubits before and after applying the three-qubit circuits in Fig. 2. We first examine the general applicability of this technique to moving any qubit with respect to which the matrix representation of the gate is diagonal in the computational basis; this includes the important case of control-qubits. Then, we explain how it can also be used to move the target-qubits of multi-controlled-not operations. The cases of the long-range cnot and the Toffoli gate follow immediately from these two instances. Finally, the application to the Fredkin gate is discussed.
A. CNOT-SWAP rerouting
Let us now suppose that we wish to implement a two-qubit gate V between two non-adjacent qubits. The general approach would be to bring the two qubits together through a network of swap gates, apply V locally to a pair of adjacent qubits, and finally, reverse the initial swap network to return the qubits to their original positions. However, provided that V is diagonal in the computational basis of the moving qubit, a more efficient alternative is possible by replacing every swap with a cnot-swap, thereby saving two cnots for every qubit hop and its reversal [see Fig. 3(b)]. The moving qubit is the clean qubit of every cnot-swap. Although the qubits it goes past are initially left dirty, the final cnot-swap network cleans them to recover their original form. This is possible because V is guaranteed not to change the computational basis states of the moving qubit. Hence, after its application, each computational basis state of a dirty qubit is still associated with the computational basis state of the moving qubit responsible for its garbage [see Eq. (1)], and it can be cleaned by uncomputing the initial cnot-swap network. More generally, this method is valid to reroute any qubit on which a given n-qubit gate has support, provided that this unitary only modifies its amplitudes up to a relative phase factor.
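The diagonality condition can be probed numerically with the sketch below, which uses hypothetical helpers. Qubit 0 is the moving qubit, qubit 1 is the idle qubit it hops past, and qubit 2 is the other qubit of V. Here V is an arbitrary gate that is block-diagonal with respect to the moving qubit: it applies A or B to qubit 2 depending on the moving qubit's computational basis state. We read cnot-swap_{a,b} as a cnot with control a and target b followed by a cnot with control b and target a. This reading reproduces the state mapping given for Fig. 3(b). Under these assumptions, one cnot-swap hop, V, and the reversed cnot-swap reproduce V exactly with 4 routing cnots instead of 6. A gate that does change the moving qubit's basis state, here a swap, instead leaves the idle qubit dirty.

```python
import cmath
import math

N = 3  # qubit 0: moving qubit; qubit 1: idle qubit it hops past; qubit 2: V's other qubit
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]


def mask(q):
    return 1 << (N - 1 - q)


def apply(state, gate):
    """Apply ("bd", m, o, A, B): A acts on qubit o when qubit m is |0>, B when it is |1>."""
    _, m, o, a_mat, b_mat = gate
    out = list(state)
    for i in range(len(state)):
        if not i & mask(o):
            u = b_mat if i & mask(m) else a_mat
            x, y = state[i], state[i | mask(o)]
            out[i] = u[0][0] * x + u[0][1] * y
            out[i | mask(o)] = u[1][0] * x + u[1][1] * y
    return out


def cx(c, t):
    """A cnot is the block-diagonal gate (identity, X) with respect to its control."""
    return ("bd", c, t, I2, X)


def swap(p, q):
    return [cx(p, q), cx(q, p), cx(p, q)]


def same_unitary(g1, g2, tol=1e-12):
    """Compare two circuits on every computational basis state."""
    for x in range(2 ** N):
        s1 = [1.0 if i == x else 0.0 for i in range(2 ** N)]
        s2 = list(s1)
        for g in g1:
            s1 = apply(s1, g)
        for g in g2:
            s2 = apply(s2, g)
        if any(abs(p - q) > tol for p, q in zip(s1, s2)):
            return False
    return True


theta = 0.7
A = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
B = [[0, cmath.exp(0.3j)], [cmath.exp(-1.1j), 0]]
V = ("bd", 0, 2, A, B)  # block-diagonal in the moving qubit's computational basis

hop, unhop = [cx(1, 0), cx(0, 1)], [cx(0, 1), cx(1, 0)]
# Works: 4 routing cnots (6 with a swap there and back).
assert same_unitary([V], hop + [("bd", 1, 2, A, B)] + unhop)
# Fails: a swap changes the moving qubit's basis state, so the idle qubit stays dirty.
assert not same_unitary(swap(0, 2), hop + swap(1, 2) + unhop)
```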
(a) Shorthand diagram for the cnot-swap, which is equivalent up to single-qubit transformations to the fermionic swap72 and iswap73 gates. The cnot-swap gate was also discussed previously under the name “double-cnot” and shown to be a maximally non-local operator.74 (b) One-hop movement of the control-qubit of an arbitrary controlled gate via two cnot-swaps. (c) One-hop movement of the target-qubit of a multi-controlled-not gate via two cnot-swaps. Note that the direction of the cnot-swaps is reversed with respect to the rerouting of a control-qubit shown in (b).
B. Long-range CNOT and Toffoli gates
Let us now consider the important case of rerouting a control-qubit, as illustrated in Fig. 3(b). A basis state $|i_1\rangle\otimes|i_2\rangle$ of the top two qubits is first transformed into $|i_1\oplus i_2\rangle\otimes|i_1\rangle$ by cnot-swap$_{2,1}$. The subsequent controlled operation on the bottom $n+1$ qubits is, therefore, controlled by $|i_1\rangle$ and preserves this state, as intended. Finally, by reversing the direction of the cnot-swap, the state of the top two qubits is transformed back into $|i_1\rangle\otimes|i_1\oplus i_1\oplus i_2\rangle = |i_1\rangle\otimes|i_2\rangle$. Longer movements of the control-qubit follow by applying this process sequentially, one hop at a time.
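Applying this process sequentially can be checked for several hops. The sketch below uses hypothetical helper names and assumes a chain with the control on qubit 0, k idle qubits, and the target on qubit k + 1. It moves the control k hops with cnot-swaps, applies an arbitrary controlled-U, and undoes the hops, then compares the result with the direct controlled-U. The routing costs 4k cnots, versus 6k for moving the control there and back with swaps.

```python
import cmath
import math


def apply(state, gate, n):
    """Apply ("cx", c, t) or ("cu", c, t, U); qubit 0 is the most significant bit."""
    kind, c, t = gate[:3]
    u = [[0, 1], [1, 0]] if kind == "cx" else gate[3]
    cmask, tmask = 1 << (n - 1 - c), 1 << (n - 1 - t)
    out = list(state)
    for i in range(len(state)):
        if i & cmask and not i & tmask:
            a, b = state[i], state[i | tmask]
            out[i] = u[0][0] * a + u[0][1] * b
            out[i | tmask] = u[1][0] * a + u[1][1] * b
    return out


def hop_network(k, u):
    """Control hops 0 -> k via cnot-swaps, controlled-U acts on (k, k+1), then the hops are undone."""
    forward = [g for j in range(1, k + 1) for g in (("cx", j, j - 1), ("cx", j - 1, j))]
    backward = [g for j in range(k, 0, -1) for g in (("cx", j - 1, j), ("cx", j, j - 1))]
    return forward + [("cu", k, k + 1, u)] + backward


def same_unitary(g1, g2, n, tol=1e-12):
    """Compare two circuits on every computational basis state."""
    dim = 2 ** n
    for x in range(dim):
        s1 = [1.0 if i == x else 0.0 for i in range(dim)]
        s2 = list(s1)
        for g in g1:
            s1 = apply(s1, g, n)
        for g in g2:
            s2 = apply(s2, g, n)
        if any(abs(p - q) > tol for p, q in zip(s1, s2)):
            return False
    return True


co, si = math.cos(0.9), math.sin(0.9)
U = [[cmath.exp(0.4j) * co, -si], [si, cmath.exp(-0.4j) * co]]  # arbitrary single-qubit unitary

for k in range(1, 4):
    n = k + 2  # control, k idle qubits, target
    assert same_unitary(hop_network(k, U), [("cu", 0, k + 1, U)], n)
    routing = sum(1 for g in hop_network(k, U) if g[0] == "cx")
    print(k, routing, 6 * k)  # hops, cnot-swap routing cost, swap-based routing cost
```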
The cnot-swap can also be used to move the target-qubit of a multi-controlled-not (mcx) gate, as shown in Fig. 3(c). The crucial difference relative to the previously considered case of a control-qubit is that the moving target-qubit is the dirty qubit of the cnot-swap, while the idle qubits it goes past are left clean. This is why the cnot-swap gates have opposite orientations with respect to the direction of flow of the moving qubit in Figs. 3(b) and 3(c). In the circuit of Fig. 3(c), the following sequence takes place for the bottom two qubits:

1. The first cnot-swap gate transforms the basis state $|i_{n+1}\rangle\otimes|i_{n+2}\rangle$ into $|i_{n+1}\oplus i_{n+2}\rangle\otimes|i_{n+1}\rangle$.
2. The mcx operation, controlled by the top $n$ qubits, yields the state $|i_{n+1}\oplus i_{n+2}\oplus i_1 i_2\cdots i_n\rangle\otimes|i_{n+1}\rangle$.
3. The cnot-swap$_{n+1,n+2}$ gate transforms this state into $|i_{n+1}\rangle\otimes|i_{n+2}\oplus i_1 i_2\cdots i_n\rangle$, as expected for an mcx gate.
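This state bookkeeping can be verified classically, because cnot and mcx gates, and therefore the whole circuit, permute computational basis states. The sketch below uses hypothetical names. It places k controls on qubits 0, ..., k − 1, an idle qubit on qubit k, and the target on qubit k + 1. It then hops the target onto the idle qubit with the orientation described above and checks every basis state against the direct mcx for k = 1, 2, 3.

```python
def mask(q, n):
    return 1 << (n - 1 - q)


def run(gates, x, n):
    """Apply (controls, target) gates to basis state x; a single control means a cnot."""
    for controls, target in gates:
        if all(x & mask(c, n) for c in controls):
            x ^= mask(target, n)
    return x


def moved_target_mcx(k):
    """mcx with controls 0..k-1 and target k+1, realized by hopping the target onto idle qubit k."""
    idle, target, controls = k, k + 1, tuple(range(k))
    hop = [((target,), idle), ((idle,), target)]    # target becomes the dirty qubit
    unhop = [((idle,), target), ((target,), idle)]  # reversed cnot-swap cleans the idle qubit
    return hop + [(controls, idle)] + unhop


for k in range(1, 4):
    n = k + 2
    direct = [(tuple(range(k)), k + 1)]
    assert all(run(moved_target_mcx(k), x, n) == run(direct, x, n) for x in range(2 ** n))
```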
In constructing cnot-swap networks to facilitate extended movements of control and target qubits, a simple but important simplification can be applied to the resultant circuits. Specifically, the last cnot in a given cnot-swap can be permuted with the initial cnot of the subsequent cnot-swap along each network path. This interchange is feasible as these cnot gates lack a common qubit serving as the target for one and the control for the other. This rearrangement allows some pairs of cnot gates along each of the network paths to be applied concurrently, thereby achieving a further reduction in circuit depth. For a visual representation, refer to Fig. 4(c).
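The effect of this permutation on depth can be estimated with a simple as-soon-as-possible schedule, assuming every cnot takes one layer. The sketch below uses hypothetical helpers. It builds the one-way cnot-swap network that moves a control k hops, then swaps each trailing cnot with the leading cnot of the next cnot-swap; these two cnots share only their target qubit, so they commute. It checks that the permutation of basis states is unchanged and compares depths. In this toy schedule, the depth drops from 2k to k + 2 for k ≥ 2.

```python
def mask(q, n):
    return 1 << (n - 1 - q)


def run(cnots, x, n):
    """Apply (control, target) cnots to basis state x (qubit 0 = most significant bit)."""
    for c, t in cnots:
        if x & mask(c, n):
            x ^= mask(t, n)
    return x


def control_hops(k):
    """cnot-swaps moving a clean control from qubit 0 to qubit k: A_j = (j, j-1), B_j = (j-1, j)."""
    return [g for j in range(1, k + 1) for g in ((j, j - 1), (j - 1, j))]


def permuted_hops(k):
    """Same network with each B_j moved after A_{j+1}; both target qubit j, so they commute."""
    seq = [(1, 0)]
    for j in range(2, k + 1):
        seq += [(j, j - 1), (j - 2, j - 1)]
    return seq + [(k - 1, k)]


def asap_depth(cnots):
    """Layer count when each cnot starts as soon as both of its qubits are free."""
    ready, depth = {}, 0
    for c, t in cnots:
        layer = 1 + max(ready.get(c, 0), ready.get(t, 0))
        ready[c] = ready[t] = layer
        depth = max(depth, layer)
    return depth


for k in range(2, 7):
    n = k + 1
    assert all(run(control_hops(k), x, n) == run(permuted_hops(k), x, n) for x in range(2 ** n))
    print(k, asap_depth(control_hops(k)), asap_depth(permuted_hops(k)))  # k, 2k, k + 2
```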
Long-range cnot gate circuit. (a) A 3 × 3 square qubit lattice layout example with nearest-neighbor connections only. The five qubits on which the long-range cnot operates are highlighted in color. Active qubits, in blue, are connected through idle qubits, in red, across one of the shortest paths in terms of the Manhattan distance. (b) Decomposition of the long-range cnot gate between qubits 1 and 9, which are both moved toward each other to minimize circuit depth. (c) General construction of a long-range cnot gate, minimizing cnot count and depth, in this order. The subcircuit inside the blue box corresponds to the optimal decomposition of a cnot with an idle qubit between control and target. For a greater number of idle qubits between the pair of active qubits, networks of cnot-swaps on either side are applied. The permutation of cnots from adjacent cnot-swaps, highlighted in the red box for the second cnot of the first qubit pair and the first cnot in the second qubit pair, allows each rerouting layer to fit two to four gates.
The minimal case of a single control-qubit results in the so-called long-range cnot, i.e., a cnot gate acting on two qubits that are not directly connected to each other. Applying the cnot-swap methodology introduced here to the decomposition of the long-range cnot yields the lowest cnot count in the literature and, among circuits with that cnot count, the lowest circuit depth.
A brief review of the literature on the implementation of the long-range cnot is in order. The standard approach to synthesizing a long-range cnot gate from basic circuit primitives is to introduce swap gates along the shortest path connecting the control and target qubits until they are adjacent, at which point a cnot gate can be applied directly. With n ≥ 1 intermediary qubits between the control-qubit and target-qubit, this method results in a circuit comprising 6n + 1 cnot gates, with a best-case depth of , assuming that the control-qubit and target-qubit of the long-range cnot are both moved toward each other in parallel. An improvement over this simple swap-based method was proposed by Shende et al.;75 the number of cnot gates was reduced to 4n at the expense of increasing circuit depth to 4n as well. Interestingly, this method appears to have been rediscovered recently with an algorithm based on the cryptographic problem of syndrome decoding.76 Later, Kutin et al.77 proposed a circuit construction that reduces circuit depth to ∼n while increasing the cnot count by only one relative to the circuit by Shende et al.
The decomposition of the long-range cnot that we arrive at using the cnot-swap methodology, represented in Fig. 4, reaches the ∼n circuit depth78 of Kutin et al. while maintaining exactly the same minimal number of 4n cnots achieved by Shende et al. We have verified its optimality for the cases where the two active qubits are separated by n = 1 and n = 2 idle qubits, minimizing these instances primarily by cnot count and secondarily by depth; in both instances, an exhaustive search was carried out with a gate set containing only nearest-neighbor cnot gates.
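The sketch below assembles a 4n-cnot long-range cnot from the pieces discussed above. It is our reading of the construction; its gate order may differ from Fig. 4(c), and it does not apply the depth-reducing permutation of Sec. III B. The construction works on a line of qubits 0, ..., n + 1:

- The control hops a positions to the right via cnot-swaps.
- The target hops b = n − 1 − a positions to the left via oppositely oriented cnot-swaps.
- The 4-cnot core for one idle qubit between control and target is applied.
- The routing is undone.

This gives 4a + 4b + 4 = 4n nearest-neighbor cnots. The check is classical, since cnot circuits permute basis states, and the helper names are hypothetical.

```python
def mask(q, n):
    return 1 << (n - 1 - q)


def run(cnots, x, n):
    """Apply (control, target) cnots to basis state x (qubit 0 = most significant bit)."""
    for c, t in cnots:
        if x & mask(c, n):
            x ^= mask(t, n)
    return x


def long_range_cnot(n, a):
    """Nearest-neighbor cnots realizing cnot(0 -> n+1) across n >= 1 idle qubits.

    The control hops a positions right and the target b = n - 1 - a positions left,
    leaving one idle qubit between them for the 4-cnot core; then the hops are undone.
    """
    b = n - 1 - a
    if a < 0 or b < 0:
        raise ValueError("need 0 <= a <= n - 1")
    p = n + 1  # initial target position
    ctrl_in = [g for j in range(1, a + 1) for g in ((j, j - 1), (j - 1, j))]
    targ_in = [g for j in range(p, p - b, -1) for g in ((j, j - 1), (j - 1, j))]
    core = [(a, a + 1), (a + 1, a + 2), (a, a + 1), (a + 1, a + 2)]
    targ_out = [g for j in range(p - b + 1, p + 1) for g in ((j - 1, j), (j, j - 1))]
    ctrl_out = [g for j in range(a, 0, -1) for g in ((j - 1, j), (j, j - 1))]
    return ctrl_in + targ_in + core + targ_out + ctrl_out


for n in range(1, 5):
    width = n + 2
    control_bit, target_bit = mask(0, width), mask(n + 1, width)
    for a in range(n):
        gates = long_range_cnot(n, a)
        assert len(gates) == 4 * n
        assert all(abs(c - t) == 1 for c, t in gates)
        assert all(run(gates, x, width) == (x ^ target_bit if x & control_bit else x)
                   for x in range(2 ** width))
```

Splitting the hops between control and target (0 < a < n − 1) leaves the cnot count unchanged. It lets the two routing networks run in parallel, which is where the ∼n depth comes from.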
Furthermore, the parallelized structure of the circuit provides additional advantages: when two long-range cnots are composed one after the other, possibly with some local operations in between, the overall circuit can be simplified further by canceling consecutive cnots on the same qubit pairs. An important case where this occurs is that of sequences of cnots with a fixed control-qubit but multiple target-qubits, which are commonly found in state distillation and error correction.79,80 Another relevant use of cnot-swaps to reduce the depth and cnot count of quantum circuits is the implementation of complex exponentials of Pauli strings, which are ubiquitous in Hamiltonian simulation.81 An example of each of these cases is given in Appendix C.
To synthesize the Toffoli gate on a trio of non-adjacent qubits, interpreting each cnot that appears in the decomposition as a long-range cnot may not be the most advantageous solution. However, and most importantly, the same underlying ideas discussed before can be applied to bring the qubits together and implement the Toffoli gate through the circuits introduced in Sec. II that assume linear qubit connectivity. Concretely, both control-qubits and the target-qubit of the Toffoli gate can be moved similarly to the control-qubit and target-qubit of the long-range cnot, respectively. For the sake of clarity, Fig. 5(a) illustrates a specific example of this cnot-swap-based decomposition for a Toffoli gate. As far as we are aware, this decomposition has not appeared in the literature before.
Shallow implementation of Toffoli (a) and Fredkin (b) gates when the three qubits on which they act are not adjacent in an architecture with linear connectivity constraints. To reroute the qubits, every swap gate was replaced by a cnot-swap, saving one cnot in each instance. This use of cnot-swaps to reroute the target-qubits of the Fredkin gate only works when both are moved past the same idle qubits, as discussed in the main text. This strategy of moving both the control-qubit and the pair of target-qubits of the Fredkin in parallel aims to minimize the circuit depth; we could instead move only the control through cnot-swap networks, which would achieve a lower overall cnot count, though at the cost of a greater depth. To fully appreciate the depth savings, consider the cnot-swap decomposition in terms of its constituent cnots and the permutation trick depicted in Fig. 4(c).
For the long-range cnot and Toffoli gates, all swaps in the qubit rerouting layers before and after the actual gate can be replaced with cnot-swaps. Hence, if the three (two) qubits on which the Toffoli (long-range cnot) gate acts nontrivially move past a cumulative total of n idle qubits, the cnot count of the rerouting networks is reduced from 6n to 4n, and their depth is reduced to ∼n.
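A minimal sketch of the cnot-swap bookkeeping, assuming that a cnot-swap denotes a cnot followed by a swap on the same pair of qubits, which compiles to two cnots instead of the four that a separate cnot plus swap would take. The Python code below (hypothetical names, classical basis-state simulation) routes the control of a long-range cnot past n idle qubits, applies the cnot, and undoes the routing, confirming that the idle qubits are restored and that the rerouting networks use 4n cnots. It does not reproduce the circuit of Fig. 4, whose total count is 4n rather than the 4n + 1 obtained here.

```python
from itertools import product

def run(circuit, bits):
    """Apply a list of (control, target) cnots to a tuple of classical bits."""
    bits = list(bits)
    for control, target in circuit:
        if bits[control]:
            bits[target] ^= 1
    return bits

def equivalent(c1, c2, n_qubits):
    return all(run(c1, x) == run(c2, x) for x in product((0, 1), repeat=n_qubits))

def swap(a, b):
    return [(a, b), (b, a), (a, b)]

def cnot_swap(a, b):
    """cnot(a -> b) followed by swap(a, b), compiled to two cnots (time order)."""
    return [(b, a), (a, b)]

def cnot_swap_inverse(a, b):
    return [(a, b), (b, a)]

def rerouted_long_range_cnot(n):
    """Move the control (qubit 0) next to the target (qubit n + 1) with cnot-swaps,
    apply the cnot, then undo the rerouting, which also cleans the idle qubits."""
    forward = [g for k in range(n) for g in cnot_swap(k, k + 1)]
    backward = [g for k in reversed(range(n)) for g in cnot_swap_inverse(k, k + 1)]
    return forward + [(n, n + 1)] + backward, len(forward) + len(backward)

def long_range_cnot(n):
    return [(0, n + 1)]  # reference gate, not nearest-neighbour

print(equivalent([(0, 1)] + swap(0, 1), cnot_swap(0, 1), 2))
for n in range(1, 6):
    circuit, rerouting_cost = rerouted_long_range_cnot(n)
    print(n, rerouting_cost, equivalent(circuit, long_range_cnot(n), n + 2))
```

The backward network only cleans the idle qubits because the control is left unchanged by the central cnot; the same reasoning underlies the discussion of the Fredkin targets that follows.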
C. Fredkin gate
Regarding the Fredkin gate, the control-qubit can always be moved through cnot-swap networks in the same way as the control-qubits of the long-range cnot and Toffoli gates. As for the target-qubits, rerouting via cnot-swap networks at first glance appears not to be a valid option, as the effective action of the Fredkin gate on the target-qubits is neither diagonal in the computational basis nor equivalent to a not gate. Nevertheless, the network of cnot-swaps (just like for the control-qubits of the long-range cnot and Toffoli gates) can be applied to the target-qubits of the Fredkin gate if they are moved together, as illustrated with an example in Fig. 5(b).
If the two target-qubits of the Fredkin gate are initially adjacent, they have to move past exactly the same idle qubits to reach the control-qubit, so the garbage introduced in the idle qubits can still be cleaned even if the two target-qubits are swapped by the Fredkin gate between the two rerouting layers. Taking the example shown in Fig. 5(b), let us consider the action of the networks of cnot-swaps on the computational basis states of all qubits before and after the Fredkin,
Since the case where the control-qubit of the Fredkin gate is in state |c⟩ = |0⟩ is trivial, we shall assume that the control-qubit is in state |c⟩ = |1⟩, in which case the target-qubits |t1⟩ and |t2⟩ are swapped. The basis states of the idle qubits are represented as . After the Fredkin gate swaps |t1⟩ and |t2⟩, both target-qubits move back past the same idle qubits, so the change that the two target-qubits jointly imprinted on the idle qubits during the first network of cnot-swaps is still reversed by the second network. Conversely, if the two target-qubits of the Fredkin gate are not next to each other, the cnot-swap gate cannot, in general, replace the swap gate. However, even if the two target-qubits are initially separated, we may move one of them (namely, the one farthest from the control-qubit) toward the other via a network of swaps and then move the pair of target-qubits together toward the control-qubit via a network of cnot-swaps. Meanwhile, the control-qubit should also be moved toward the target-qubits via a network of cnot-swaps to parallelize the rerouting, thus reducing the circuit depth.
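The cleaning argument can be checked exhaustively on a four-qubit toy layout chosen by us for illustration (it is not the layout of Fig. 5(b)): the two targets sit at positions 0 and 1, an idle qubit at position 2, and the control at position 3. In the Python sketch below, whose names are hypothetical, a cnot-swap is compiled as two cnots, and the pair of targets is moved past the idle qubit before the Fredkin gate and back afterward. The idle qubit receives t1 ⊕ t2, which is symmetric under the swap, so it is cleaned. A second layout, where only one target passes the idle qubit, shows the failure case.

```python
from itertools import product

def run(circuit, bits):
    """Classical simulation of ("cx", c, t) and ("cswap", c, a, b) gates."""
    bits = list(bits)
    for gate in circuit:
        if gate[0] == "cx":
            _, c, t = gate
            if bits[c]:
                bits[t] ^= 1
        elif gate[0] == "cswap":
            _, c, a, b = gate
            if bits[c]:
                bits[a], bits[b] = bits[b], bits[a]
    return bits

def cnot_swap(a, b):
    # cnot(a -> b) followed by swap(a, b): moves the qubit at a to b.
    return [("cx", b, a), ("cx", a, b)]

def cnot_swap_inverse(a, b):
    return [("cx", a, b), ("cx", b, a)]

def equivalent(circuit, reference, n_qubits=4):
    return all(run(circuit, x) == run(reference, x)
               for x in product((0, 1), repeat=n_qubits))

# Adjacent targets: t1 at 0, t2 at 1, idle at 2, control at 3.
adjacent = (cnot_swap(1, 2) + cnot_swap(0, 1)
            + [("cswap", 3, 1, 2)]
            + cnot_swap_inverse(0, 1) + cnot_swap_inverse(1, 2))
print(equivalent(adjacent, [("cswap", 3, 0, 1)]))

# Separated targets: t1 at 0, idle at 1, t2 at 2, control at 3; only t1 is moved.
separated = cnot_swap(0, 1) + [("cswap", 3, 1, 2)] + cnot_swap_inverse(0, 1)
print(equivalent(separated, [("cswap", 3, 0, 2)]))
```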
Rerouting only the control-qubit stands out as the best approach for minimizing the cnot count of the Fredkin gate when only the control-qubit is non-adjacent to the targets. When targeting circuit depth reduction, however, moving only the control-qubit yields a depth scaling of in the rerouting networks, where n is the number of idle qubits, whereas moving all three qubits concurrently into an intermediate position reaches ∼n depth.82 In the latter case, the strategy consists of hopping the control-qubit by 1 position if n = 1, or by positions if n > 1, while also moving the target-qubits together in the opposite direction to make them adjacent to the control-qubit. As a result, rerouting all three qubits at the same time may be the best option for minimizing environmental interactions in near-term quantum hardware or for reducing total execution time in fault-tolerant hardware.
IV. EQUIVALENT CIRCUIT AVERAGING
Exploiting the full potential of quantum computing and achieving super-polynomial algorithmic speedups will require further technological advances that allow for the faithful execution of arbitrarily long quantum circuits. On both near- and long-term quantum hardware, this is hampered by two primary challenges: decoherence, which limits the time during which quantum circuits can operate before incoherent errors accumulate, and control errors, which often arise from coherent sources. It remains unclear which of these limitations will be harder to overcome, not least because there is typically a trade-off between them: deeper circuits are more exposed to decoherence, while parallelizing operations to reduce depth adds coherent noise.
Coherent noise sources may be more damaging to the operation of a digital quantum computer, as their worst-case error rate scales as the square root of the average error rate, potentially leading to a faster deterioration of the fidelity of the outcome of a quantum circuit.83,84 It is therefore imperative to suppress coherent errors in gate implementations as much as possible and to prevent their accumulation during algorithmic execution, since they can interfere constructively or destructively and lead to computational results that, while precise, are incorrect. In fact, coherent errors can be statistically resolved in the outcomes of current superconducting quantum processors even with very shallow circuits.85
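A toy calculation illustrates why coherent errors accumulate faster, assuming a single qubit in |0⟩ subject to a small over-rotation ε about a fixed axis after each of N gates. When the over-rotation always has the same sign, the angles add up and the infidelity grows roughly as (Nε/2)², whereas with a random sign at each gate it grows only as Nε²/4 on average. This Python sketch does not reproduce the worst-case versus average error-rate scaling cited above; it only shows the constructive interference of repeated coherent errors.

```python
import math
import random

def infidelity(angle):
    """1 - |<0|Rx(angle)|0>|^2 = sin^2(angle / 2) for a rotation about a fixed axis."""
    return math.sin(angle / 2) ** 2

def coherent_infidelity(eps, n_gates):
    # Same over-rotation after every gate: the angles add up linearly.
    return infidelity(n_gates * eps)

def stochastic_infidelity(eps, n_gates, trials=2000, seed=7):
    # Over-rotation with a random sign after every gate: the total angle is a random walk.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        angle = sum(rng.choice((-eps, eps)) for _ in range(n_gates))
        total += infidelity(angle)
    return total / trials

eps, n_gates = 0.01, 100
print(coherent_infidelity(eps, n_gates))    # scales like (N * eps / 2) ** 2
print(stochastic_infidelity(eps, n_gates))  # scales like N * eps ** 2 / 4
```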
To this end, various methods that add new single-qubit gates to the default circuit or modify existing ones have been introduced. Important examples include dynamical decoupling;86 arbitrarily accurate composite pulse sequences;87 and randomization procedures, such as Pauli twirling,88 Pauli frame randomization,89 and randomized compiling.90 Another strategy consists of synthesizing close but different unitaries in such a way that mixing and averaging over them produces statistics closer to those of the target unitary.70,71 The equivalent circuit averaging (ECA) technique69 follows a similar spirit: different but logically equivalent circuits are executed, and their measurement statistics are aggregated to convert the different systematic errors into stochastic noise.
In contrast to previously proposed ECA protocols, we recognize that the primary source of control errors in current quantum processors originates from the implementation of two-qubit gates rather than single-qubit gates. Consequently, our focus is directed toward devising a set of equivalent circuits featuring a variety of entangling gate structures. The greater the diversity of equivalent circuits, the more effective the ECA methodology is at mitigating the coherent errors.
A. Approximating ideal and faulty circuits
While the systematic nature of coherent errors makes it theoretically possible to correct them through recalibration or compensation operations, in practice, characterizing these errors on multi-qubit processors is an unmanageable task. The challenge stems from the lack of efficient methods to fully characterize the coherent processes that occur in all qubits in a timely manner when a single- or two-qubit gate is applied.
In order to assess how well Eq. (3) might hold in practice for the set of equivalent circuits we generated for the Fredkin and Toffoli gates, a concrete coherent-error model must be considered. We adopted a model recently introduced by one of us85 for the unitary errors of two-qubit operations implemented in transmon-based quantum hardware, namely a biased-cnot (bcnot) gate. In the current quantum processors developed by IBM Q, the two-qubit interaction that implements a cnot gate is the so-called cross-resonance (cr) gate. In theory, the cr pulse Hamiltonian should only generate a ZX interaction term, whose time evolution yields a cnot (up to single-qubit rotations) for an appropriate duration of the dynamics. In practice, however, control errors arise from the challenging calibration procedure and result in small additional error terms in the interaction. Focusing only on the two-qubit subspace of the effective cr Hamiltonian and ignoring the entanglement with spectator qubits and external degrees of freedom, the most significant of these error terms have been identified as IY, IZ, IX, ZY, and ZZ.91 The bcnot gate takes these terms into account through five dimensionless parameters that quantify the bias ratios between the coupling strengths of these extra error terms and that of the desired ZX interaction. Its usefulness in modeling experimental data and improving the understanding of the computational outcomes of these quantum processors has been statistically demonstrated with exhaustive experiments on small circuits.
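To make the role of the bias ratios concrete, the Python sketch below builds a hypothetical biased cnot by adding the five error terms, weighted by bias ratios β, to the ZX generator, and then applying fixed local corrections that turn the ideal ZX evolution into an exact cnot through the identity CNOT = e^{iπ/4} exp(−iπ/4 ZI) exp(−iπ/4 IX) exp(iπ/4 ZX). This is our own simplified construction under stated assumptions (control qubit first, evolution angle π/4, corrections chosen for β = 0 only), not the exact definition of the bcnot model of Ref. 85.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
ERROR_TERMS = ("IY", "IZ", "IX", "ZY", "ZZ")

def pauli2(label):
    """Two-qubit Pauli operator; the first letter acts on the control qubit."""
    return np.kron(PAULI[label[0]], PAULI[label[1]])

def expi_hermitian(h, angle):
    """exp(1j * angle * h) for Hermitian h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(1j * angle * w)) @ v.conj().T

def biased_cnot(betas):
    """Hypothetical biased cnot: ZX generator plus beta-weighted error terms.

    With all betas equal to zero this reduces exactly to the cnot.
    """
    h = pauli2("ZX") + sum(b * pauli2(p) for b, p in zip(betas, ERROR_TERMS))
    local = expi_hermitian(pauli2("ZI"), -np.pi / 4) @ expi_hermitian(pauli2("IX"), -np.pi / 4)
    return np.exp(1j * np.pi / 4) * local @ expi_hermitian(h, np.pi / 4)

u = biased_cnot([0.05, -0.03, 0.02, 0.04, -0.01])
print(np.round(u, 3))
```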
By replacing all cnot gates by bcnot gates in the circuits we provided for the Fredkin and Toffoli gates, we performed in silico numerical simulations to evaluate how well these decompositions approximate the target unitary, both with and without the ECA procedure. We assumed that bcnots acting on different pairs of qubits have different bias parameters. However, these parameters remain fixed over time for a cnot gate applied to the same qubit pair more than once in the circuit, in order to simulate a systematic miscalibration of that gate. The numerical study began by uniformly sampling the five bias ratios from an interval bounded in magnitude by βmax and assigning them to the bcnot model of each qubit pair in the circuit. Having defined all two-qubit gates under the noise model, the unitary representations of the equivalent circuits for the Fredkin (Toffoli) gate were obtained by replacing every cnot appearing in the circuit by the respective bcnot. The diamond distance of each of these unitaries to the target unitary was computed, and their average was calculated. The same unitaries were also used to build the corresponding uniformly mixed unitary channel [see Eq. (2)], and the diamond distance from the channel to the target unitary was also computed. The QuTiP 4.7 open-source software library92 was employed to perform these computations through a simplified semidefinite program method.93 This procedure was repeated B times, each with a different sampling of the biases in the interval mentioned above for a given βmax. From the resulting B values of the diamond distance of the channel and of the average diamond distance of the individual circuits, two separate averages and standard deviations were calculated. This process was repeated inside an outer loop that varied βmax from 0 to 0.5.
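The contrast between the single-circuit average and the mixed channel can be sketched with NumPy under two loudly stated simplifications. First, the diamond distance, which requires a semidefinite program, is replaced by the trace distance between normalized Choi states, a lower bound on the diamond distance with the same normalization. Second, the equivalent circuits are modeled as the target followed by a random, fixed coherent error exp(−iεH_k), where the generators H_k are hypothetical. Because the trace distance is convex, the mixed channel can never be farther from the target than the average single circuit; the example shows by how much it is closer for one random instance. Taking the identity as the target loses no generality here, since a fixed target unitary only conjugates the Choi states by a unitary.

```python
import numpy as np

def random_hermitian(d, rng):
    """Random Hermitian matrix with unit spectral norm (a hypothetical error generator)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (a + a.conj().T) / 2
    return h / np.linalg.norm(h, 2)

def unitary_from_generator(h, eps):
    """exp(-1j * eps * h) via the eigendecomposition of the Hermitian generator."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * eps * w)) @ v.conj().T

def choi_state(unitaries):
    """Normalized Choi state of the uniform mixture of the given unitaries."""
    d = unitaries[0].shape[0]
    omega = np.eye(d).reshape(d * d) / np.sqrt(d)  # maximally entangled state
    rho = np.zeros((d * d, d * d), dtype=complex)
    for u in unitaries:
        psi = np.kron(np.eye(d), u) @ omega
        rho += np.outer(psi, psi.conj()) / len(unitaries)
    return rho

def trace_distance(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

rng = np.random.default_rng(2024)
d, eps, n_circuits = 8, 0.1, 10   # three qubits, as for the Toffoli and Fredkin gates
target = np.eye(d)                # identity target; any fixed unitary gives the same distances
circuits = [unitary_from_generator(random_hermitian(d, rng), eps) @ target
            for _ in range(n_circuits)]

ideal = choi_state([target])
single = np.mean([trace_distance(choi_state([u]), ideal) for u in circuits])
mixed = trace_distance(choi_state(circuits), ideal)
print(f"average single-circuit distance: {single:.4f}")
print(f"mixed-channel distance:          {mixed:.4f}")
```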
The results are plotted in Fig. 6 for a total of B = 20 bcnot models generated for each value of βmax. For both the Fredkin (red) and Toffoli (blue) gates, the diamond distance between the uniformly mixed unitary channel resulting from the ECA methodology and the exact unitary representation of the gate is noticeably lower than the average diamond distance for a single circuit. The black line shows the diamond distance of a single bcnot with respect to the exact cnot for reference. Naturally, the diamond distances for the Toffoli and Fredkin gates are greater than that of the bcnot, as their decompositions require 6 and 7 cnots, respectively, under the assumed all-to-all connectivity. The consistently lower diamond distance for the Toffoli gate relative to the Fredkin gate is due to the extra cnot in the decomposition of the latter.
Impact of equivalent circuit averaging (ECA) on the approximation of the Fredkin and Toffoli gates using the multiple logically equivalent circuits discussed in Sec. II subject to a coherent-noise model where every cnot is replaced by a biased-cnot (bcnot).85 The degree of approximation to the exact unitary is quantified through the diamond distance d♢, which is plotted against the maximum magnitude βmax of the bias ratios of the noise model. The numerical simulation procedure is detailed in the main text. For each of the 20 different values of βmax, B = 20 different bcnot models were generated. The diamond distance between the bcnot and the cnot gates is plotted for reference. The width of each shaded region represents two standard deviations. The ECA implementation results in a significant reduction in d♢ for the Fredkin and Toffoli circuits compared to single-circuit implementations. The systematic difference in d♢ between the Toffoli and Fredkin circuits, with or without ECA, is due to the Toffoli circuit having one fewer cnot gate, making it less susceptible to the coherent errors.
B. Application to quantum simulation: An example
As a proof of concept of the application of equivalent circuit averaging to the determination of expectation values of physical quantities in digital quantum simulation, in this section, we consider the estimation of the energy of the ground state of the Fermi–Hubbard model94–96 on a two-site lattice at half-filling using the Gutzwiller wave function.33,94
The Fermi–Hubbard model is a canonical description of strongly correlated electrons, capturing the competition between the kinetic energy, which favors the delocalization of electrons, and the potential energy, which tends to localize them due to the Coulomb repulsion between like charges. Specifically, the electrons are assumed to move on a lattice, where each site represents an orbital of an atom that is part of the crystalline structure of a solid. The hopping of an electron from one site to a nearest-neighboring one is described by a kinetic term with amplitude −t < 0. Each site can be occupied by at most two electrons, one with spin-↑ and another with spin-↓; such a double occupancy of a site imposes an energy penalty U > 0. At sufficiently low temperatures, the electrons under the Fermi–Hubbard model take the configuration that minimizes the total energy, the so-called ground state.
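As a consistency check on the exact reference energy, the two-site model can be diagonalized directly. The Python sketch below assumes the standard Hamiltonian H = −t Σσ (c†1σ c2σ + h.c.) + U Σi ni↑ ni↓ without a chemical potential (Eq. (5) is not reproduced here), maps the four spin orbitals to qubits with the Jordan–Wigner transformation, restricts to the sector with two electrons and zero magnetization, and compares the lowest eigenvalue with the well-known closed form E0 = [U − √(U² + 16t²)]/2. The restriction matters: without a chemical potential, sectors with other electron numbers can lie lower in energy for large U/t. Function names and the mode ordering are our own choices.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0>: annihilates an occupied mode

def annihilation(j, n_modes=4):
    """Jordan-Wigner annihilation operator for mode j (Z string on modes < j)."""
    factors = [Z] * j + [LOWER] + [I2] * (n_modes - j - 1)
    return reduce(np.kron, factors)

def hubbard_dimer(t, U):
    """Two-site Hubbard Hamiltonian on 4 qubits (assumed standard form).

    Mode ordering: 0 = site 1 up, 1 = site 1 down, 2 = site 2 up, 3 = site 2 down.
    """
    a = [annihilation(j) for j in range(4)]
    n = [op.T @ op for op in a]  # real matrices, so the adjoint is the transpose
    h = np.zeros((16, 16))
    for spin in (0, 1):
        hop = a[spin].T @ a[2 + spin]  # c^dagger_{1,spin} c_{2,spin}
        h -= t * (hop + hop.T)
    h += U * (n[0] @ n[1] + n[2] @ n[3])
    return h, n

def ground_energy_half_filling(t, U):
    """Lowest energy with two electrons and zero magnetization."""
    h, n = hubbard_dimer(t, U)
    occ = np.rint([np.diag(op) for op in n]).astype(int).T  # occ[state, mode]
    keep = [s for s in range(16) if occ[s].sum() == 2 and occ[s, 0] + occ[s, 2] == 1]
    return np.linalg.eigvalsh(h[np.ix_(keep, keep)]).min()

for U in (0.0, 4.0, 8.0):
    exact = (U - np.sqrt(U**2 + 16.0)) / 2
    print(U, ground_energy_half_filling(1.0, U), exact)
```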
On quantum hardware, adopting the Jordan–Wigner transformation to map electrons to qubits,97 each site is encoded by two qubits, one to store in the computational basis states the number of spin-↑ electrons at that site (either 0 or 1) and another for spin-↓. Here, we consider a two-site lattice, so four qubits are required to store the wave function. Assuming half-filling and net zero magnetization—i.e., there are as many electrons as the number of sites, one with spin-↑ and another with spin-↓—the Gutzwiller wave function94 encodes the exact ground state for the two-site case through a suitable choice98 of its single free parameter g. This ansatz is prepared on quantum hardware following the scheme proposed by one of us.33 At each site, a controlled-controlled-Ry (ccRy) gate with the two qubits that encode the spin-↑ and spin-↓ occupations at that site acting as control-qubits and an ancillary qubit initialized in the fiducial state |0⟩ acting as the target-qubit is applied to the ground state of the non-interacting model (i.e., for , which is just a Slater determinant99,100). The Gutzwiller parameter g sets the angle of the Ry gate.101 After applying the ccRy gate, the ancilla is measured in the computational basis and only the trials that yield the fiducial state |0⟩ are retained, thus resulting in a non-deterministic preparation scheme. The greater the value of , the lower the probability of success, converging to as for the two-site case. Overall, the six-qubit circuit—i.e., four qubits to encode the ground state and two ancillas, one for each site—comprises two ccRy gates, each being decomposed in terms of two Toffoli gates, thus resulting in a total of four Toffoli gates. A scheme of the quantum circuit can be found in Appendix D.
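One standard way to build a ccRy gate from two Toffoli gates, consistent with the count stated above although not necessarily identical to the circuit of Appendix D, is to interleave the Toffolis with Ry(θ/2) and Ry(−θ/2) on the target: when both controls are in |1⟩, the target undergoes X Ry(−θ/2) X Ry(θ/2) = Ry(θ), and otherwise the two rotations cancel. The NumPy sketch below verifies this identity; the qubit ordering (controls first, target last) and the function names are our own assumptions.

```python
import numpy as np
from functools import reduce

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def on_target(gate):
    """Embed a single-qubit gate on the last of three qubits."""
    return np.kron(np.eye(4), gate)

def toffoli():
    u = np.eye(8)
    u[[6, 7]] = u[[7, 6]]  # exchange |110> and |111>
    return u

def ccry_direct(theta):
    u = np.eye(8)
    u[6:, 6:] = ry(theta)  # rotate the target only when both controls are 1
    return u

def ccry_from_toffolis(theta):
    # Time order: Ry(theta/2), Toffoli, Ry(-theta/2), Toffoli.
    ops = [on_target(ry(theta / 2)), toffoli(), on_target(ry(-theta / 2)), toffoli()]
    return reduce(lambda acc, op: op @ acc, ops, np.eye(8))

print(np.allclose(ccry_from_toffolis(0.7), ccry_direct(0.7)))
```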
In order to demonstrate what would be observed in practice, Fig. 7 shows the finite-statistics estimate of the energy of the ground state |ψ0⟩ of the two-site Fermi–Hubbard model in the presence of a bcnot coherent-noise model with βmax = 0.04. Other simulations under bcnot noise models with different randomly generated parameters for the same βmax were also performed, producing analogous results. All-to-all qubit connectivity is assumed. The exact ground state energy is shown in black for reference. The results presented in red correspond to the default option, where the textbook circuit for the Toffoli gate [see the circuit inside the blue solid-line box in Fig. 1(e)] was used at all four occurrences of the Toffoli gate in the circuit that prepares |ψ0⟩. The results in blue correspond to the ECA methodology, whereby one of the 48 logically equivalent circuits generated for the Toffoli gate was sampled at random for each of the four instances of the Toffoli gate in the circuit. To allow for a fair comparison between the default and ECA approaches, in both cases, for each value of , 100 different sampling trials were carried out, each involving 1000 measurements. Of the total of 100 000 samples, only those for which both ancillas were measured in the fiducial state |0⟩, thus signaling the successful preparation of |ψ0⟩ in the ideal noiseless scenario, were used to estimate the energy. This partly accounts for the larger error bars observed as increases: fewer samples were used to estimate the energy, so the shot noise is greater. The ECA approach yields estimates of the ground state energy closer to the exact value across the whole range of values of , thus handling the effect of the bcnot coherent errors more effectively than the default method. This implementation did not even attempt to address the coherent errors introduced by the bcnots in the first part of the circuit (see the red dashed-line box in Fig. 13), as the averaging over equivalent circuits only considers the second part (see the blue solid-line box in Fig. 13), where the four Toffoli gates are located. Nevertheless, of the 28 bcnots present in the circuit, 24 are found in the latter part, so most of the impact of the coherent-noise model is addressed by ECA.
Coherent error mitigation via equivalent circuit averaging (ECA) in the ground state energy estimation of the two-site Fermi–Hubbard model. t is the hopping constant, and U is the Hubbard parameter of the Fermi–Hubbard model. A bcnot noise model with βmax = 0.04 was considered. The ground state was prepared via the quantum circuit shown in Appendix D. The exact ground state energy is shown in black. A total of 100 000 samples were generated to estimate each set of commuting terms in the Hamiltonian stated in Eq. (5). These 100 000 samples were divided into 100 trials. For the default approach, the same circuit was employed to prepare the ground state across all trials, replacing each of the four occurrences of the Toffoli gate by the circuit shown inside the blue solid-line box in Fig. 1(e). The corresponding results are shown in red. For the ECA method, in each of the 100 trials, a new circuit was generated by selecting a circuit at random from the set of 48 logically equivalent ones for the Toffoli gate with all-to-all connectivity introduced in Sec. II for each of the four Toffoli gates of the circuit. The respective results are presented in blue. Only the samples for which the Gutzwiller wave function was successfully prepared were considered, due to the non-deterministic nature of the preparation scheme. This contributes to the rise in the size of the error bars as increases, since the probability of success of the preparation scheme decreases with down to a minimum of as for the two-site case.
C. Experimental testing
Finally, we conducted an experimental evaluation of the equivalent circuit averaging protocol on an IBM Q quantum processor to validate its performance on a physical device. Specifically, we considered the swap test17,18 with single-qubit states as the application example. The swap test is a quantum algorithm that estimates the fidelity between two input states without performing full tomography of either one. For single-qubit states, it requires three qubits: one to prepare each input state and a third, auxiliary qubit that is measured in the computational basis, the fidelity being estimated from the expectation value of its Pauli-Z observable. Besides two Hadamard gates, the procedure only employs one Fredkin gate, which can be implemented using our logically equivalent circuit decompositions.
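For reference, the ideal swap test on two single-qubit pure states |ψ⟩ and |φ⟩ gives a probability P(0) = (1 + F)/2 of measuring the auxiliary qubit in |0⟩, so the fidelity F = |⟨ψ|φ⟩|² equals the ancilla's ⟨Z⟩ = 2P(0) − 1. The noiseless NumPy state-vector sketch below checks this relation for random states; the ancilla is taken as the first qubit, and the Fredkin gate is written directly as a block-diagonal matrix rather than through any of our decompositions. Names are hypothetical.

```python
import numpy as np

def random_qubit_state(rng):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def swap_test_probability_zero(psi, phi):
    """Ideal swap test: probability of finding the ancilla (first qubit) in |0>."""
    had = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    had_anc = np.kron(had, np.eye(4))
    swap = np.eye(4)[[0, 2, 1, 3]]                   # exchanges |01> and |10>
    fredkin = np.block([[np.eye(4), np.zeros((4, 4))],
                        [np.zeros((4, 4)), swap]])   # swap only if the ancilla is |1>
    state = np.kron([1, 0], np.kron(psi, phi))       # |0>|psi>|phi>
    state = had_anc @ fredkin @ had_anc @ state
    return float(np.sum(np.abs(state[:4]) ** 2))

rng = np.random.default_rng(1)
psi, phi = random_qubit_state(rng), random_qubit_state(rng)
fidelity = abs(np.vdot(psi, phi)) ** 2
z_expectation = 2 * swap_test_probability_zero(psi, phi) - 1
print(fidelity, z_expectation)
```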
Due to the restricted qubit connectivity of the hardware, we opted to test the cswap decompositions obtained for linear connectivity with the control-qubit at one of the ends, for which we collected equivalent circuits with eight different entangling gate structures (see Table I). Besides these structural differences, our method also returns variations in single-qubit gates, producing a multitude of different circuits. These variations are not counted in Table I because, as mentioned previously, coherent two-qubit gate errors are more significant than single-qubit ones; moreover, effective techniques such as randomized compiling can easily add variations to single-qubit gates.90 Nevertheless, in the experimental implementation, we could leverage all our circuits to increase the diversity of logically equivalent decompositions. The 404 circuits with minimal cnot count that we obtained were thus transpiled into the native gate set of IBM Q devices and sorted by depth. Since there were more circuits than necessary and their depths (including both cnots and single-qubit gates) varied significantly, from 28 to 43, a cutoff depth of 31 was defined and only the shallowest circuits were kept. This value was chosen so that all eight entangling gate structures were represented. In the end, 40 equivalent circuits were considered, with depths of 28 (2 circuits), 29 (7 circuits), 30 (6 circuits), and 31 (25 circuits). The numbers of circuits for the eight entangling gate structures were 6, 3, 5, 3, 7, 1, 9, and 6, respectively.
The experimental protocol started by defining 200 pairs of Haar random single-qubit states, generated by applying a Haar random unitary to a fixed pure state, and keeping only the pairs with F > 0.01. Then, with the aim of estimating the fidelity of each pair of states, an independent experiment was prepared for each pair to be carried out under two different protocols: as a single-circuit execution (SCE) or with ECA. A budget of S = 980 000 shots was given to each protocol. For the SCE protocol, one of the 40 equivalent Fredkin decompositions was sampled102 and the complete circuit for the swap test was assembled by prepending the gate sequence that prepared each input state.75 This circuit was run S times, and the fidelity was estimated from the measurement outcomes. For the ECA protocol, an equal share of s = S/8 shots was given to each entangling gate structure, where the s circuits to be used were defined by sampling from the Fredkin decompositions with the entangling gate structure under consideration. The initial states were prepared with the same algorithm as before, and the resulting S shots were combined to estimate the fidelity. All the circuits and shots in one experiment, comprising both protocols, were executed within the same job on the ibmq_lima processor103 to ensure a fair comparison under the same experimental conditions. Besides the circuits under evaluation (49 copies of a single circuit for the SCE protocol and 49 logically equivalent circuits for the ECA protocol), each job included two additional circuits to calibrate the measurement error mitigation (MEM) protocol,104 which was also tested with and without combining it with our ECA error mitigation method. Within each job, the first shot of each circuit was executed sequentially before moving on to the second shot, and so on until all of the circuits in the job had run 20 000 shots each, totaling 980 000 shots for each protocol.
With one job per pair of random states (200 jobs in total), the full runtime of all the independent experiments was around 30 h, spread over two days.105
Having completed all the experiments, we began our analysis by comparing the measured fidelity with the expected value F, which revealed an anticipated behavior: in both protocols, when F ≈ 0, the estimated fidelity tends to be slightly higher than the theoretical value F, as random errors during circuit execution make it more challenging to reduce the error further; conversely, when F ≈ 1, any error introduced tends to decrease the estimate, emphasizing the sensitivity of fidelity estimation to errors in such scenarios. More comprehensively, Fig. 8 presents the relative accuracy error in fidelity estimation with and without the ECA protocol. The plot showcases the remarkable improvements achieved through the application of ECA: the ECA protocol not only reduces the errors but also diminishes their variability. To quantify this observation, we computed the mean and standard deviation (σɛ) of all relative errors. ECA proves highly effective at error reduction compared to SCE, reducing the average relative error from 28% to 17% and its standard deviation from 56% to 25%.
Coherent error mitigation via equivalent circuit averaging (ECA) in the swap test of 200 pairs of Haar random single-qubit states performed on the ibmq_lima quantum processor from IBM Corp. For each pair of states, ECA is compared against a single-circuit execution (SCE) using the relative accuracy error ɛ. The average ɛ (horizontal colored lines) is significantly reduced from 0.28 to 0.17 when the ECA protocol is applied to combine the shot statistics of different circuit decompositions. The standard deviation of the errors is also noticeably reduced when ECA is performed, from 0.56 to 0.25, improving the predictability of results. The histograms of the marginal distributions are also displayed. Tests were uniformly performed for fidelity, F, in the range from 0.01 to 1.
While Fig. 8 exclusively displays the results of our ECA method, the method can be seamlessly complemented with other error mitigation protocols, such as measurement error mitigation (MEM).104,106 Although this is not displayed in the figure, we also compared ECA with MEM104 to further benchmark our method. Applied to the SCE protocol, MEM yields a marginal improvement of one percentage point in the average relative error, reducing it from 28% to 27%, while the associated σɛ actually increases to 62%. In stark contrast, ECA reduces the average relative error from 28% to 17%, as mentioned above. Moreover, coupling ECA with MEM demonstrates the potential for an even greater error reduction: this combination achieves the best performance of all the protocols tested, with the lowest average relative error and a standard deviation of only σɛ = 16%.
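As a hedged illustration of what MEM does, the sketch below implements one common variant: a single-qubit assignment matrix, calibrated by preparing |0⟩ and |1⟩ on the ancilla, is inverted to correct the measured outcome distribution. It is not necessarily the exact procedure of Ref. 104, and the calibration and measurement values are hypothetical.

```python
import numpy as np


def assignment_matrix(p0_given_0: float, p1_given_1: float) -> np.ndarray:
    """Readout model A: column j is the measured distribution after preparing |j>."""
    return np.array([[p0_given_0, 1.0 - p1_given_1],
                     [1.0 - p0_given_0, p1_given_1]])


def mitigate(measured, a_matrix: np.ndarray) -> np.ndarray:
    """Invert the readout model, clip unphysical entries, and renormalize."""
    ideal = np.linalg.solve(a_matrix, np.asarray(measured, dtype=float))
    ideal = np.clip(ideal, 0.0, 1.0)
    return ideal / ideal.sum()


# Hypothetical calibration and measured ancilla distribution of a swap test.
A = assignment_matrix(0.97, 0.93)
measured = [0.79, 0.21]
corrected = mitigate(measured, A)
f_raw = 2.0 * measured[0] - 1.0          # swap-test estimate F = 2 P(0) - 1
f_mitigated = 2.0 * corrected[0] - 1.0
print(f"raw F = {f_raw:.2f}, mitigated F = {f_mitigated:.2f}")
```

Because this correction acts only on the readout statistics, it can be applied equally to the counts of a single circuit or to the pooled counts of an ECA run.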
In addition to supplementing ECA with MEM, it might be worth considering pairing it with a compatible technique for mitigating incoherent errors. Because these errors are expected to occur randomly and independently of the circuit decomposition, and because their level should be similar in circuits with comparable depths, ECA should have no impact on them. Therefore, complementing ECA with incoherent error mitigation might improve fidelity further.
V. CONCLUSION
The Fredkin and Toffoli gates play a prominent role in quantum computing, which underscores the importance of decomposing these three-qubit gates efficiently in terms of cnots and single-qubit gates. In this paper, we have provided multiple decompositions of the Fredkin and Toffoli gates that achieve, to the best of our knowledge, an optimal cnot count, which makes them relevant for near-term quantum hardware. The cnot savings produced by our ZX-calculus-based optimization scheme were especially pronounced under qubit connectivity constraints. Because generating the multiple equivalent quantum circuits presented herein required a considerable amount of computation time, these circuits have been stored so that they can be retrieved when required.
Besides considering the case where the three qubits on which the Toffoli and Fredkin gates act nontrivially are adjacent, we have also explored the scenario where they are separated from one another in an architecture subject to connectivity constraints. In particular, we have devised an improved scheme to efficiently reroute the qubits of long-range Fredkin and Toffoli gates by replacing each swap gate with a cnot-swap. Although a cnot-swap moves only one of the two qubits to its new position, leaving the other one dirty, it takes only two cnots as opposed to the three required by a full swap. We employed this cnot-swap-based rerouting scheme to bring the three active qubits next to one another, apply our local Toffoli or Fredkin gate decompositions, and then return the qubits to their original positions, ensuring that the idle qubits are left in their starting state. Consequently, the cnot count and depth for implementing these three-qubit gates were further reduced.
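The rerouting idea can be sketched in Python on computational basis states. Modeling qubits as classical bits suffices here because circuits built only from cnots and Toffolis are permutation matrices, so agreeing on every basis state means agreeing as unitaries. The line layout (controls on qubits 0 and 2, target on qubit 3, qubit 1 idle) and all function names are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def cnot(bits, control, target):
    """Apply a CNOT to a tuple of classical bits (a computational basis state)."""
    b = list(bits)
    b[target] ^= b[control]
    return tuple(b)


def toffoli(bits, c1, c2, target):
    """Apply a Toffoli to a tuple of classical bits."""
    b = list(bits)
    b[target] ^= b[c1] & b[c2]
    return tuple(b)


def cnot_swap(bits, src, dst):
    """Two CNOTs: dst receives the bit of src, while src is left dirty (src XOR dst)."""
    bits = cnot(bits, dst, src)
    return cnot(bits, src, dst)


def cnot_swap_inverse(bits, src, dst):
    """The same two CNOTs in reverse order (each CNOT is its own inverse)."""
    bits = cnot(bits, src, dst)
    return cnot(bits, dst, src)


def rerouted_toffoli(bits):
    """Toffoli(0, 2 -> 3) on a line 0-1-2-3 using only nearest-neighbor gates:
    hop control 0 onto idle qubit 1, apply a local Toffoli, and hop back."""
    bits = cnot_swap(bits, 0, 1)
    bits = toffoli(bits, 1, 2, 3)
    return cnot_swap_inverse(bits, 0, 1)


# Exhaustive check: same action as the long-range Toffoli, idle qubit restored.
for state in product((0, 1), repeat=4):
    assert rerouted_toffoli(state) == toffoli(state, 0, 2, 3)
```

In this sketch the rerouting costs four cnots (two per cnot-swap) instead of the six needed for a full swap there and back, and the idle qubit is restored even though it is dirty while the Toffoli acts.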
The use of cnot-swaps is not restricted to the implementation of the Fredkin and Toffoli gates. In fact, replacing the standard swap with a cnot-swap, and thereby saving one cnot per substitution, applies generally to the rerouting of the control-qubits of any multi-controlled gate and of the target-qubit of any multi-controlled-not operation, as well as of any qubit of a multi-qubit gate with respect to which the matrix representation of the gate is diagonal in the computational basis. A noteworthy application of this general scheme is the implementation of a long-range cnot, i.e., a cnot between two qubits that are not directly connected to each other. In addition to yielding the optimal cnot-count decomposition of the long-range cnot when there are n = 1 or n = 2 idle qubits between the active ones (as confirmed by an exhaustive search over circuits comprising only cnots), this cnot-swap-based decomposition requires exactly 4n cnot gates and has a depth of ∼n. Although this cnot count scaling had already been achieved by Shende et al.,75 their decomposition did not offer the possibility of compressing circuit depth and was thus restricted to a depth scaling of 4n as well. Our cnot-swap methodology, in turn, allows cnots to be parallelized by moving the control and target qubits toward each other simultaneously and by permuting commuting cnots in the rerouting layers, as illustrated in Fig. 4(c). The cnot-swap decomposition of the long-range cnot therefore combines the best of both worlds.
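The 4n count can be checked with a bit-level Python model. The sketch assumes a line with the control on qubit 0, the target on qubit n + 1 and n ≥ 1 idle qubits in between. It hops the control value forward with n − 1 cnot-swaps, applies a standard 4-cnot circuit for a cnot across one idle qubit, and then undoes the cnot-swaps, for a total of 2(n − 1) + 4 + 2(n − 1) = 4n cnots. It moves only the control, so it does not reproduce the depth-∼n parallel scheme of Fig. 4(c), and the function names are hypothetical.

```python
from itertools import product


def apply(bits, gates):
    """Apply a list of (control, target) CNOTs to a tuple of classical bits."""
    b = list(bits)
    for control, target in gates:
        b[target] ^= b[control]
    return tuple(b)


def long_range_cnot(n):
    """Nearest-neighbor CNOT list realizing CNOT(0 -> n + 1) across n >= 1 idle qubits."""
    forward = []
    for i in range(n - 1):
        # cnot-swap: qubit i + 1 receives the control value, qubit i is left dirty
        forward += [(i + 1, i), (i, i + 1)]
    m = n - 1  # the control value now sits on qubit m, one idle qubit from the target
    core = [(m, m + 1), (m + 1, m + 2), (m, m + 1), (m + 1, m + 2)]
    backward = forward[::-1]  # every CNOT is self-inverse, so reversing the list inverts it
    return forward + core + backward


# Exhaustive check over all basis states for a few distances.
for n in range(1, 6):
    gates = long_range_cnot(n)
    assert len(gates) == 4 * n
    assert all(abs(c - t) == 1 for c, t in gates)  # nearest-neighbor only
    for bits in product((0, 1), repeat=n + 2):
        assert apply(bits, gates) == bits[:-1] + (bits[-1] ^ bits[0],)
```

Since a circuit of cnots is a permutation matrix, passing this check on every basis state confirms the unitary, and all idle qubits are returned to their initial values.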
Having multiple logically equivalent circuits with different entangling gate structures that realize the Toffoli and Fredkin gates opens a number of possibilities for overall circuit optimization and error mitigation. In this regard, we have explored the use of equivalent circuit averaging (ECA), i.e., combining the measurement statistics of multiple different but logically equivalent circuits instead of repeating the same circuit multiple times, to address the effects of coherent noise sources. Using a realistic coherent-noise model that accounts for the leading-order biases in the implementation of the cnot via the cross-resonance gate in transmon-based quantum hardware, we computed the diamond distance and showed that the uniformly mixed unitary channel resulting from ECA approximates the exact Fredkin and Toffoli unitaries more closely than an average individual circuit does. In addition, to illustrate the application of ECA to digital quantum simulation, we employed it to estimate the ground-state energy of the Fermi–Hubbard dimer, obtaining improved results relative to the bare approach under the same coherent-noise model used in the diamond-distance calculation. Finally, to confirm the effectiveness of ECA on actual quantum hardware, we carried out an experiment on an IBM Q processor that estimated the fidelity between pairs of single-qubit states via the swap test. ECA was found to reduce both the average relative accuracy error and its variance with respect to the single-circuit approach, and integrating ECA with measurement error mitigation reduced the average error further.
The various decompositions of the Fredkin and Toffoli gates should find wide use in near-term quantum computing hardware. We expect them to be especially useful in solid-state platforms based on superconducting circuits and silicon quantum dots, given the prevalence of qubit connectivity constraints in such cases. Nevertheless, even quantum computing platforms based on trapped ions and cold atoms may benefit from the multiple realizations of the Fredkin and Toffoli gates that assume all-to-all connectivity, for example to perform equivalent circuit averaging to mitigate coherent errors or to unlock opportunities for overall circuit simplifications. While the results presented in Sec. IV on the implementation of equivalent circuit averaging on quantum processors based on superconducting circuits are promising, further studies involving alternative technological realizations of quantum computers are encouraged.
ACKNOWLEDGMENTS
We thank J. P. Pedroso, J. Fernández-Rossier, D. Farina, and E. Bäumer for fruitful discussions. P.M.Q.C. acknowledges Fundação para a Ciência e a Tecnologia (FCT) Portugal for Grant No. SFRH/BD/150708/2020, the Government of Spain (Severo Ochoa CEX2019-00910-S and TRANQI), Fundació Cellex, Fundació Mir-Puig, Generalitat de Catalunya (CERCA program), and the AXA Chair in Quantum Information Science. B.M. acknowledges support from FCT Grant No. SFRH/BD/08444/2020. Both authors acknowledge the use of the IBM Q for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM Corp. or the IBM Q team.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Pedro M. Q. Cruz: Conceptualization (equal). Bruno Murta: Conceptualization (equal).
DATA AVAILABILITY
The OpenQASM instruction files for the quantum circuits used in this study are openly available68 in Zenodo at https://doi.org/10.5281/zenodo.10047422, reference number 10.5281/zenodo.10047422. They are released under a CC BY-NC 4.0 license so that quantum compilers can use them directly, avoiding the computational cost and time required to re-obtain these decompositions at runtime when executing circuits that make use of these primitives.
APPENDIX A: CONTROLLING A GATE WITH SYMMETRIC DECOMPOSITION
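Let U be an n-qubit gate for which a symmetric decomposition U = V†WV can be found for some n-qubit gates V and W. Suppose that we wish to add a control-qubit to U. We will show that the controls on V and V† assumed in the naïve approach illustrated in Fig. 9(b) can be skipped: it suffices to control the central gate W [see Fig. 9(c)]. Indeed, if the control-qubit is in state |0⟩, the circuit applies V†V = I to the target register, whereas if it is in state |1⟩, it applies V†WV = U, which is exactly the action of the controlled-U gate.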
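The argument can also be checked numerically. The NumPy sketch below draws random two-qubit unitaries V and W (an arbitrary choice, since the identity holds for any n), forms U = V†WV, and compares the circuit "V, then controlled-W, then V†" with controlled-U. The control is taken as the most significant qubit, and the helper names are illustrative.

```python
import numpy as np


def random_unitary(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def controlled(u: np.ndarray) -> np.ndarray:
    """|0><0| (x) I + |1><1| (x) u, with the control as the most significant qubit."""
    dim = u.shape[0]
    zero = np.zeros((dim, dim))
    return np.block([[np.eye(dim), zero], [zero, u]])


rng = np.random.default_rng(7)
dim = 4  # two target qubits
V = random_unitary(dim, rng)
W = random_unitary(dim, rng)
U = V.conj().T @ W @ V
I2 = np.eye(2)

# Matrix products read right to left: V first, then controlled-W, then V^dagger,
# with V and V^dagger left uncontrolled.
circuit = np.kron(I2, V.conj().T) @ controlled(W) @ np.kron(I2, V)
print(np.allclose(circuit, controlled(U)))
```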
(a) Symmetric decomposition of n-qubit gate U = V†WV. (b) Naïve approach to controlling U adds a control to every element of the circuit. (c) Thanks to the symmetric decomposition, a control at the central gate W suffices to yield the controlled-U gate.
APPENDIX B: CHANGING TARGET-QUBIT OF TOFFOLI GATE
Applying a Hadamard gate on either side of a Toffoli gate at both the target-qubit and one of the control-qubits reverses their roles, i.e., the target-qubit becomes a control-qubit and vice versa [see Fig. 10(a)]. This result follows from the well-known identity HXH = Z and the fact that the controlled-controlled-z gate is invariant under any permutation of the three qubits on which it acts nontrivially.
(a) Changing the target-qubit of the Toffoli gate by applying a pair of Hadamard gates, one on either side of the Toffoli, at the old and new target-qubits. (b) Equivalent two-qubit circuit identity reverses the direction of cnot. (c) Generalization to arbitrary number n = m1 + m2 + m3 + 1 control-qubits for multi-controlled-Toffoli gate.
This result is a generalization to three qubits of the more familiar two-qubit result shown in Fig. 10(b), where the direction of a cnot gate is reversed by applying a pair of Hadamard gates on both qubits, one on either side of the cnot. As shown in Fig. 10(c), this result is valid for an arbitrary number of control-qubits: a multi-controlled-Toffoli (mcx) gate can always be turned into a multi-controlled-z (mcz) gate by applying a pair of Hadamard gates at the target-qubit, and an mcx gate with a different target-qubit can then be generated by applying another pair of Hadamard gates to the mcz at the new target-qubit.
It should be stressed, however, that this result is only valid for a single target-qubit, i.e., applying two or more pairs of Hadamard gates to an mcz gate on as many different qubits does not result in a multi-controlled operation with conditional not gates on those qubits.
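The identities of this appendix, including the single-target caveat, can be verified with dense matrices. In the NumPy sketch below, qubit 0 is the most significant bit. The four-qubit examples and the helper names are illustrative choices, and the two-target comparison takes "conditional not gates on those qubits" to mean a NOT on each of them triggered by the remaining controls.

```python
from functools import reduce

import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)


def on_qubits(n, gate, qubits):
    """Tensor a single-qubit gate onto the listed qubits (qubit 0 is the most significant bit)."""
    return reduce(np.kron, [gate if q in qubits else I2 for q in range(n)])


def mcx(n, controls, target):
    """Permutation matrix of a multi-controlled NOT."""
    dim = 2 ** n
    m = np.zeros((dim, dim))
    for idx in range(dim):
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]
        if all(bits[c] for c in controls):
            bits[target] ^= 1
        out = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        m[out, idx] = 1.0
    return m


def mcz(n):
    """Multi-controlled Z: phase -1 on |1...1>, symmetric in all qubits."""
    d = np.ones(2 ** n)
    d[-1] = -1.0
    return np.diag(d)


# Toffoli: Hadamard pairs on the old target (2) and new target (1) swap their roles.
hh3 = on_qubits(3, H, {1, 2})
toffoli_ok = np.allclose(hh3 @ mcx(3, [0, 1], 2) @ hh3, mcx(3, [0, 2], 1))

# Four qubits: mcx -> mcz at the old target, then mcz -> mcx at the new target.
h3 = on_qubits(4, H, {3})
mcz_ok = np.allclose(h3 @ mcz(4) @ h3, mcx(4, [0, 1, 2], 3))
hh4 = on_qubits(4, H, {1, 3})
mcx_ok = np.allclose(hh4 @ mcx(4, [0, 1, 2], 3) @ hh4, mcx(4, [0, 2, 3], 1))

# Caveat: Hadamard pairs on two qubits of an mcz do NOT give NOTs on both of them.
hh23 = on_qubits(4, H, {2, 3})
two_targets = mcx(4, [0, 1], 2) @ mcx(4, [0, 1], 3)
caveat_holds = not np.allclose(hh23 @ mcz(4) @ hh23, two_targets)
print(toffoli_ok, mcz_ok, mcx_ok, caveat_holds)
```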
APPENDIX C: TWO IMPORTANT EXAMPLES OF SIMPLIFICATIONS OF QUANTUM CIRCUITS WITH CNOT-SWAP NETWORKS
Here, we demonstrate how cnot-swaps can be leveraged to reduce the depth and cnot count of important examples of circuits under linear connectivity constraints. First, the long-range cnot decompositions based on the cnot-swapping methodology (see Sec. III) are shown to simplify circuits involving sequences of cnot gates with the same control-qubit but different target-qubits, which are commonly found in error correction codes.79,80 Then, cnot-swaps are also applied to the circuits that realize complex exponentials of Pauli strings, which are pervasive in quantum simulation.81
1. Sequences of CNOTs with shared control-qubit
Figure 11 shows an example of a quantum circuit with three consecutive cnot gates that share the same control-qubit but act on different target-qubits. As the leftmost scheme suggests, such a sequence of cnots can be regarded as a single-control-multi-target-not gate. Assuming linear qubit connectivity, each long-range cnot is implemented by moving the control-qubit via cnot-swaps until it is next to the target-qubit, applying a cnot gate, and returning the control-qubit to its original position via cnot-swaps. The cnot-swaps within the red dashed-line boxes highlighted in the scheme after the second equality of Fig. 11 cancel out in pairs, which greatly reduces the cnot count and depth. Finally, the subcircuit within the blue solid-line box, which would take 5 cnots upon decomposing the cnot-swaps, can be replaced by the 4-cnot circuit shown in the dashed-line box of Fig. 4. All in all, the full circuit has a total of 22 cnots and depth 21.
Example of a single-control multi-target-not gate decomposition in terms of nearest-neighbor cnot gates. The circuit is simplified by making use of the cnot-swap decomposition of long-range cnots (see Fig. 4) and eliminating conjugated pairs of cnot-swap gates acting on the same qubits when possible, as highlighted inside the red dashed-line boxes. The subcircuit in the blue solid-line box is further simplified with the optimal decomposition of a cnot with an idle qubit between the control- and target-qubits. The final decomposition comprises 22 cnot gates and has a depth of 21.
Had we implemented each of the three long-range cnots via the method first introduced by Shende et al.,75 we would have obtained a cnot count of 30 and depth of 29. Like the cnot-swap-based approach described in Fig. 11, the standard approach of moving the control-qubit via conventional swaps also allows for the cancellation of many gates, resulting in a cnot count and depth of 29. Alternatively, we can make use of the cnot-swap decomposition of the long-range cnots while moving both the control- and target-qubits in parallel toward each other; compared to the case where only the control-qubit is moved (see Fig. 11), the depth is reduced from 21 to 19, but the cnot count increases from 22 to 31, as fewer pairs of cnot-swaps cancel out.
Although the cnot-swap-based methods herein introduced result in a shallower circuit for the example considered in Fig. 11, we note that this advantage relative to the long-range cnot decomposition of Shende et al.75 may not be observed for all circuits with successive cnots sharing the same control-qubit. In fact, in the cases where all target-qubits are adjacent to one another (though distant from the shared control-qubit), the method by Shende et al.75 achieves a lower cnot count after straightforward simplifications of the global circuit. For example, if the target-qubits of the three cnots in Fig. 11 were the three bottommost qubits in the scheme, the cnot-swap method would result in 20 cnots and a depth of 18, while the long-range cnot method due to Shende et al.75 would produce a circuit with 16 cnots and a depth of 14. The advantage of one long-range cnot decomposition over the other for the overall simplification of these circuits depends on the specific cnot sequence under consideration, the number of qubits involved, and the adjacency relations between all target-qubits. In practice, a compilation procedure could be implemented to choose the combination of different long-range cnot decompositions that yields the shallowest circuit.
2. Complex exponentials of Pauli strings
Let P be an n-qubit Pauli string, i.e., $P = P_1 \otimes P_2 \otimes \cdots \otimes P_n$ with $P_j \in \{I, X, Y, Z\}$, an element of the Pauli group $G_n$ on n qubits.9 Any unitary of the form $e^{-i\theta P}$ with $\theta \in \mathbb{R}$ can be implemented with 2(s − 1) cnots, where s ≤ n is the number of qubits on which P acts nontrivially (i.e., the number of occurrences of X, Y, or Z in the Pauli string P, with the remaining n − s elements of the tensor product corresponding to the identity $I$). The key idea9 behind this decomposition is the fact that, if P′ is the Pauli string obtained from P by replacing every occurrence of X and Y by Z, then $e^{-i\theta P'}$ applies the phase factor $e^{-i\theta}$ to an input computational basis state if the parity of its bits on the s active qubits is even, and $e^{i\theta}$ otherwise. The circuit for $e^{-i\theta P}$ can be obtained from that of $e^{-i\theta P'}$ by applying the suitable single-qubit basis transformation, before and after, to the qubits where the respective Pauli operation in P is X or Y.
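A minimal NumPy sketch of this construction follows. A cnot ladder accumulates the parity of the active qubits on the last active one, Rz(2θ) = diag(e^{−iθ}, e^{iθ}) applies the parity-dependent phase, and the ladder is undone. The basis changes B satisfy B†ZB = P_j: H for X (since HZH = X) and HS† for Y (since SHZHS† = Y). Qubit ordering, helper names and the test string are assumptions for illustration, and the ladder assumes all-to-all connectivity, so no rerouting is included.

```python
from functools import reduce

import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
SDG = np.diag([1, -1j])  # S^dagger
PAULI = {"I": I, "X": X, "Y": Y, "Z": Z}
BASIS = {"I": I, "Z": I, "X": H, "Y": H @ SDG}  # B with B^dagger Z B = P_j


def kron_all(ops):
    return reduce(np.kron, ops)


def cnot(n, c, t):
    """Permutation matrix of a CNOT (control c, target t); qubit 0 is the most significant bit."""
    dim = 2 ** n
    m = np.zeros((dim, dim), dtype=complex)
    for idx in range(dim):
        flip = (1 << (n - 1 - t)) if (idx >> (n - 1 - c)) & 1 else 0
        m[idx ^ flip, idx] = 1.0
    return m


def rz_on(n, q, phi):
    """Rz(phi) = diag(exp(-i phi / 2), exp(i phi / 2)) on qubit q."""
    ops = [I] * n
    ops[q] = np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])
    return kron_all(ops)


def pauli_exponential(pauli, theta):
    """Return (unitary, cnot_count) for exp(-i theta P); P needs at least one non-identity."""
    n = len(pauli)
    active = [q for q, p in enumerate(pauli) if p != "I"]
    ladder = np.eye(2 ** n, dtype=complex)
    for a, b in zip(active, active[1:]):
        ladder = cnot(n, a, b) @ ladder  # parity accumulates on the last active qubit
    basis = kron_all([BASIS[p] for p in pauli])
    core = ladder.conj().T @ rz_on(n, active[-1], 2 * theta) @ ladder  # exp(-i theta P')
    return basis.conj().T @ core @ basis, 2 * (len(active) - 1)


U, count = pauli_exponential("XIYZ", 0.37)
P = kron_all([PAULI[p] for p in "XIYZ"])
exact = np.cos(0.37) * np.eye(16) - 1j * np.sin(0.37) * P  # valid because P @ P = I
print(np.allclose(U, exact), count)
```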
Under linear qubit connectivity, some of these 2(s − 1) cnots will act on pairs of non-adjacent qubits. The standard approach is to move one qubit toward the other via swaps. However, once again every swap can be replaced by a cnot-swap, thus reducing the overall cnot count by 2 for every idle qubit between the active qubits. Figure 12 shows an example of a circuit for a complex exponential of a Pauli string under linear qubit connectivity; the cnot count obtained using cnot-swaps is 12, i.e., 4 fewer cnot gates than the approach based on swap gates.
Example of the decomposition of a quantum circuit that realizes $e^{-i\theta P}$ for a Pauli string P under (a) all-to-all qubit connectivity, (b) linear qubit connectivity using swaps to reroute qubits, and (c) linear qubit connectivity using cnot-swaps to reroute qubits. The standard approach based on full swaps to move active qubits past idle ones yields a total of 16 cnots. Rerouting the qubits with cnot-swaps saves 2 cnots for every idle qubit, resulting in an overall cnot count of 12. Note that the strategy adopted in Fig. 4(c) to parallelize pairs of adjacent cnots can also be employed here to reduce the circuit depth further.
APPENDIX D: QUANTUM CIRCUIT TO PREPARE GROUND STATE OF FERMI–HUBBARD DIMER VIA GUTZWILLER WAVE FUNCTION
Quantum circuit to prepare the exact ground state of the Fermi–Hubbard dimer via the Gutzwiller wave function33 and compute its energy. For the special case of the dimer, the Gutzwiller wave function94 is the exact ground state of the Fermi–Hubbard model for , where t is the hopping constant and U is the Hubbard parameter. The first part of the circuit, shown inside the red dashed-line box, corresponds to the preparation of the ground state of the non-interacting model (i.e., for U = 0), which is just a Slater determinant.99,100 The corresponding subcircuit was decomposed in the {U3(θ, ϕ, λ), cnot} basis to highlight the four cnots. The second part of the circuit, shown inside the blue solid-line box, applies the Gutzwiller operator at each site non-deterministically, with . The preparation is successful when both ancillary qubits A1 and A2 are measured in the Z-basis and found in the |0⟩ state. The success probability decreases with , being 1 for and as . All-to-all qubit connectivity is assumed, so qubits do not need to be rerouted to perform the Toffoli gates, which require 6 cnots each. Once the ground state has been successfully prepared, its energy can be estimated by measuring all four qubits in the main register in the same single-qubit basis P = X, Y, Z, depending on the set of commuting terms that is computed ({X0X1, X2X3}, {Y0Y1, Y2Y3}, or {Z0, Z1, Z2, Z3, Z0Z2, Z1Z3}). The qubit labels shown at the left end of the scheme are consistent with the expansion of the Hamiltonian of the Fermi–Hubbard dimer in the Pauli basis that is presented in Eq. (5) in the main text, assuming the Jordan–Wigner transformation to map electrons to qubits.97
REFERENCES
In other words, circuits that implement a bijection between input and output states. When, additionally, the circuit outputs a bit string that is just a permutation of the input bits, the computation is said to be conservative.
In fact, in Fredkin and Toffoli’s seminal paper,1 the target-bits are assumed to be swapped only when the control-bit is 0 [see the truth table in Eq. (2)]. However, since this paper is devoted to the quantum versions of the Fredkin and Toffoli gates, and because it is common practice in quantum information science to associate the nontrivial action of a controlled-gate with the control-qubit in state |1⟩, we will adopt this convention.
Strictly speaking, the classical universality of the Toffoli and Fredkin gates assumes that we can add ancillary bits to the circuit that can be initialized in either 0 or 1 as required.
Classically, given the universality of the Fredkin gate, it is clear that a Toffoli gate can be decomposed in terms of Fredkin gates alone by adding extra bits, so it might seem that the basis set comprising the Fredkin and Hadamard gates is also universal. However, because the Fredkin gate is conservative (i.e., it preserves the Hamming weight of input bit strings, unlike the Toffoli gate), replacing every Toffoli gate by Fredkin gates would not reset all required ancillary qubits, thus generating undesired garbage qubits. Hence, the NOT gate needs to be added to the Fredkin and Hadamard gates to form a universal basis set.
The iToffoli gate amounts to a Toffoli gate that triggers the NOT gate on the target-qubit when the control-qubits are in state |0⟩, followed by a controlled-S gate also triggered by |0⟩. See Fig. 1(c) of Ref. 38.
Concretely, the depth of the rerouting network that moves only the control qubit is 2n + 4 for n ≥ 2 idle qubits between the control and both targets, while the depth achieved by the network that moves the three qubits simultaneously, considering that the targets begin in an adjacent position, is given by , for n ≥ 6, when the pattern is fully developed.
Specifically, , following the definition from Ref. 33.
Concretely, if θ is the parameter of the ccRy gate, .
Note that a different decomposition was sampled for each pair of states, i.e., each different experiment.
The ibmq_lima is a Falcon r4T processor with Quantum Volume 8, made available free of charge by IBM Q with no special credentials required.
It is worth noting that ibmq_lima may have undergone recalibration over the course of the experiments. However, all executions evaluating the performance of the swap test for one pair of random states, using both protocols, were conducted within a single, uninterrupted job lasting ∼9 min. Hence, analyzing the results of all experiments together remains valid even if the device undergoes recalibration between jobs. Our primary focus is to assess whether the performance with ECA improves relative to the single circuit within each job, independently.