The quantum Monte Carlo algorithm can provide significant speedup compared to its classical counterpart. So far, most reported works have utilized Grover’s state preparation algorithm. However, this algorithm relies on costly controlled Y rotations to apply the correct amplitudes onto the superposition states. Recently, a comparison-based state preparation method was proposed to reduce computational complexity by avoiding rotation operations. One critical aspect of this method is the generation of the comparison threshold associated with the amplitude of the quantum superposition states. The direct computation of the comparison threshold is often very costly. An alternative is to estimate the threshold with a Taylor approximation. However, Taylor approximations do not work well with heavy-tailed distribution functions such as the Cauchy distribution, which is widely used in applications such as financial modeling. Therefore, a new state preparation method needs to be developed. In this study, an efficient comparison-based state preparation method is proposed for the heavy-tailed Cauchy distribution. Instead of a single Taylor approximation for the entire function domain, this study uses quantum piecewise arithmetic to increase accuracy and reduce computational cost. The proposed piecewise function is in the simplest form to estimate the comparison threshold associated with the amplitudes. Numerical analysis shows that the number of required subdomains increases linearly as the maximum tolerated approximation error decreases exponentially. 197 subdomains are required to keep the error below of the maximum amplitude. Quantum parallelism ensures that the computational complexity of estimating the amplitudes is independent from the number of subdomains.
I. INTRODUCTION
Quantum computers have the potential for large speedup over their classical counterparts.1–3 One specific algorithm that leverages potential quantum speedup is the quantum Monte Carlo algorithm.4–6 The quantum Monte Carlo algorithm is used in fields such as risk analysis7 and pricing prediction8 and in many other financial applications.9–12 However, maximizing the effects of quantum speedup for the Monte Carlo algorithm heavily depends on the chosen method of state preparation. State preparation is used twice in every Oracle call during the quantum Monte Carlo algorithm,13 so efficient state preparation is a crucial part of realizing quantum speedup.
Currently, there are several ways to prepare the superposition of quantum states for the quantum Monte Carlo algorithm. One of the most popular methods was developed by Grover and Rudolph to inductively prepare the amplitude of superposition states from the most significant qubit to the least significant qubit.14 New methods, such as the rotation-based method15 and the matrix product state method,16 have been proposed that can provide some improvements over Grover’s state preparation algorithm. Both of these methods eliminate the use of integration operations. This removes the restriction of efficiently integrable distribution functions. However, these methods had to make some trade-offs to obtain their improvements. The matrix product state method has only been demonstrated on distribution functions with bounded derivatives, and the newly proposed rotation-based method has high computational complexity to calculate the rotation angle.
All rotation-based state preparation methods, including Grover’s state preparation algorithm, require controlled Y rotations to prepare the quantum superposition states. The controlled Y rotations are performed in each Oracle call and contribute significantly to the computational complexity associated with state preparation. Furthermore, some of these methods have restrictions on the type of distribution functions that can be used. Grover’s state preparation method performs state preparation iteratively by adding a qubit and doubling the number of points in the distribution at each step. Because each point in the distribution corresponds to an interval, the distribution with the added points can be obtained by integrating the probability density function of the left and right half of the interval. For a distribution function that is not efficiently integrable, Grover’s algorithm cannot maintain computational complexity within polynomial time to achieve quantum speedup in the Monte Carlo algorithm. Therefore, Grover’s algorithm is only applicable for efficiently integrable distribution functions, such as log-concave functions. This requirement has led many studies on the quantum Monte Carlo algorithm to focus on log-concave functions.17–19 If more distribution functions are to be used in the quantum Monte Carlo algorithm, new quantum state preparation algorithms need decreased computational complexity and extended coverage.20–22
Recently, a comparison-based state preparation method was proposed.23 This state preparation method utilizes only a comparison operation and does not need any controlled Y rotations in its Oracle calls. Each state is assigned a threshold value to apply the correct amplitude to the superposition states. However, the study that presented the comparison-based method did not mention how to either calculate or load a quantum register with all the threshold values. Loading or calculating the comparison thresholds is crucial to the state preparation algorithm.
Generally, it is less costly to use quantum computation rather than calculating every threshold in a classical computer and loading them into the corresponding quantum register. However, this is not always the case. It is very costly for quantum computers to calculate complicated functions such as arcsine or square root. Consequently, the direct computation of the comparison thresholds can be very costly. Therefore, an efficient alternative approach is needed to approximate the threshold values. One way that the threshold level can be estimated is to use a Taylor approximation. The Taylor approximation works well with distribution functions with exponentially bounded tails, such as the normal distribution. However, Taylor approximations need high order and costly arithmetic when working with heavy-tailed distributions whose tails are not exponentially bounded. Heavy-tailed distributions, such as the Cauchy distribution, also have many applications in financial modeling using the Monte Carlo algorithm.24–26
This paper will utilize the comparison-based state preparation method and quantum piecewise arithmetic to create a practical gate-level algorithm for quantum superposition state preparation based on the non-log-concave heavy-tailed Cauchy distribution. Gate-level circuits that estimate the comparison thresholds based on a piecewise polynomial are presented. In general, a piecewise polynomial can be any polynomial function. The Cauchy distribution in this study is an even function because its center is set at 0, so the piecewise function can only contain even order terms. The proposed piecewise polynomial in this study consists of one quadratic term and one constant term to maintain the lowest time complexity in calculating the comparison threshold. Only a single layer of logic consisting of one multiplier (MUL) and one multiplier-accumulator (MAC) is required in this algorithm.
As far as we know, this is the first study on a gate-level comparison-based algorithm for quantum superposition state preparation based on the non-log-concave heavy-tailed Cauchy distribution.
The rest of the paper will be arranged in the following way. Section II will cover previous studies that are used in the development of the algorithm presented in this paper. Section III will present the algorithm, the gate implementation, and the cost analysis of the proposed method. Next, Sec. IV will lay out a numerical analysis that analyzes the relationship between the computational complexity and the maximum tolerated error in the piecewise approximation.
II. RELATED WORK
The comparison-based state preparation method provides an alternate non-rotation-based method of state preparation. The simplified operations come at the cost of an additional register and an ancilla qubit. The controlled Y rotations are replaced by a comparison operator and two Hadamard operations.23
The comparison-based method starts with n qubits at |0⟩ being put into superposition by n Hadamard gates. The result is an n qubit register that keeps track of all the coordinates in the quantum Monte Carlo algorithm. The initial superposition states |Φ1⟩ are shown in Eq. (1). Note that |0⟩⊗m denotes an m qubit register with all initial states at |0⟩. Two additional registers are added. The first additional register is , which holds all the comparison thresholds for every coordinate point i in the quantum Monte Carlo algorithm. This register is the threshold register and holds m qubits where m is directly related to the resolution of the amplitude of each state. The second register is an m qubit register, which is used in the comparison operation and is referred to as the reference register. Finally, a flag qubit is added,
At this point, the register is put into equal superposition by m Hadamard gates. The register will be referred to as |j⟩ref after it is put into superposition. The superposition states |Φ2⟩ after the Hadamard operation are shown in Eq. (2). Note that the tensor product symbol ⊗ denotes the joining of two groups of quantum registers,
At this point, the comparison operation can be carried out on the registers and |j⟩ref. The comparison operator is defined as
The comparison operation is applied on the threshold register, reference register, and flag qubit. The result will be similar to thermometer coding, with the first ai terms of the summation being assigned the |0⟩flag qubit and the rest of the terms being assigned the |1⟩flag qubit. The superposition states |Φ3⟩ after the comparison operation are shown as
The first summation term of in Eq. (4) has ai terms, so applying m Hadamard gates to the register |j⟩ will result in an amplitude of on the state where the flag qubit is |0⟩flag. The final state |Φ⟩4 is shown as
|ω⟩ is an unnormalized state that contains the parts of the superposition with nonzero values encoded on ref ⊗ flag.
At this point, the correct amplitudes are applied onto their respective states by controlling |0⟩ref|0⟩flag, so the state preparation is finished. No rotation operations were applied at any point during the state preparation.
The comparison-based state preparation method provides an alternative to the complicated rotation operations used in methods such as Grover’s state preparation algorithm. However, neither calculating nor loading the values in the register is trivial. This is especially true for the Cauchy distribution because a Taylor approximation is not an efficient estimation for a heavy-tailed distribution function. This paper will focus on a new method to calculate the values in for the Cauchy distribution.
Another study that will be used in this paper is regarding the quantum implementation of piecewise arithmetic.27 The piecewise function will be used to estimate the values to be stored in the register. This method divides the domain of a complicated function into multiple subdomains. Each subdomain will have its own unique polynomial approximation to estimate the function. The error of estimation can be reduced by increasing the number of subdomains.
The efficient implementation of this method relies on the iterative multiplication by x and adding a constant bk where k ranges anywhere from the highest power of the polynomial down to zero. The process for calculating a quadratic polynomial is shown in Eqs. (6)–(8). This fully leverages the benefit of quantum parallelism as the computational complexity is independent of the number of subdomains,
Equations (6)–(8) show an example of iterative multiplication and accumulation steps to create a general piecewise quadratic approximation for a more complicated function. The result in Eq. (8) is the final approximation for the more complicated function. In Eqs. (6)–(8), the value l serves as the label for the specific subdomain the coordinate x is a part of. In the algorithm, the coefficients are loaded onto their respective quantum registers to be used in the function computation. After the coefficients are loaded, a single set of quantum multiplier accumulators carries out the operations shown in Eqs. (6)–(8) iteratively. This method will be used in the algorithm presented in this paper.
III. ALGORITHM
A. Cauchy distribution function
This paper will focus on the Cauchy distribution as the distribution function used in the quantum Monte Carlo algorithm. The Cauchy distribution is defined by the function as follows:
In Eq. (9), i is the coordinate point, γ is the half width at half maximum, and i0 is the coordinate at the center of the distribution. The comparison-based state preparation method requires the calculation of to be stored in the register . According to Eq. (5), the amplitude of the superposition states is (ignoring the global normalization factor of ). Therefore, needs to store the value of to apply an amplitude of onto the superposition states (again ignoring the global normalization factor ). This requires the calculation of
Computing the square root of Eq. (9) in a quantum computer has a high computational complexity that will reduce the effective benefits of quantum speedup. If the values for ai are calculated in a classical computer, then each value for ai needs to be loaded onto its quantum register. This process greatly reduces the effective benefits of quantum speedup.
One of the alternatives to directly calculating Eq. (10) is to use a Taylor series approximation. However, this method does not work well for heavy-tailed distribution functions. In the case of the Cauchy distribution, Taylor approximations result in high error even for small values of i. An example of the second, fourth, and sixth degree Taylor approximations is shown in Fig. 1. In Fig. 1, γ is set to 1, and i0 is set to 0. The error of the Taylor approximations is large even before i reaches the half width at half maximum. A much higher order polynomial is needed to achieve reasonable error, but a higher order polynomial will drastically increase computational complexity. Therefore, the method of using a Taylor approximation is inefficient.
Second, fourth, and sixth order Taylor approximations are overlaid on the graph of Eq. (10). i0 is set at 0, and γ is set at one. It is clearly presented that every approximation has large error even before i reaches the half width at half maximum.
Second, fourth, and sixth order Taylor approximations are overlaid on the graph of Eq. (10). i0 is set at 0, and γ is set at one. It is clearly presented that every approximation has large error even before i reaches the half width at half maximum.
B. Piecewise polynomial approximation
Instead of using a single Taylor approximation for the entire domain of the distribution function, this study will use a piecewise polynomial function to estimate the value of ai, skipping the square root operation. The piecewise polynomial will be quadratic in the form of b0 + b2x2 to create the simplest possible polynomial approximation for an even function. The linear term is skipped because Eq. (9) is even. Both the size and the number of subdomains will be able to be adjusted to achieve the desired level of accuracy and relatively low computational complexity.
The algorithm consists of the following three steps:
First, the bounds, coefficients, and number of subdomains of the piecewise polynomial function will be calculated by using a classical computer. These parameters will depend on the maximum allowed error for ai. The calculation is performed iteratively starting at i = 0. At the beginning of each iteration for a subdomain, the value of Eq. (10) is calculated for the first two points in the subdomain. These values are used to create and solve a system of equations to find the coefficients b0 and b2. Once the coefficient values are calculated, the approximation error at each point can be found by calculating the difference between the polynomial and the value in Eq. (10). The error is calculated for increasing values of i until it is greater than the maximum error tolerance. Once the difference exceeds the maximum tolerated error, the subdomain ends, and a new iteration is started for the next subdomain. This process can be carried out using a search algorithm, such as binary search, to reduce computational cost. After the end of the entire domain is reached, the process ends, and the piecewise polynomial for the entire domain is determined. Note that each subdomain has a range independent from each other and does not follow a set pattern. The maximum error tolerance can be chosen specific to the individual application. It can be adjusted to trade the accuracy of the estimation with the number of subdomains as needed. Note that a decrease in maximum error tolerance will increase accuracy and increase the number of subdomains and an increase in maximum error tolerance will decrease accuracy and decrease the number of subdomains.
Next, the coefficients b0 and b2 are loaded onto quantum registers, and the values of ai are calculated using a quantum computer.
Finally, the comparison-based method is used to complete the state preparation.
C. Gate implementation
The gate implementation requires multiple quantum registers and ancilla qubits. The register |i⟩ will require n qubits to cover all the 2n grid points in the domain of the quantum Monte Carlo algorithm. An ancilla register containing (m − 1) qubits will store the value of i2 that will be used in the calculation of ai. The value m is directly tied with the resolution of ai. Two additional (m − 1) qubit registers will contain the values for b0 and b2. The final register will have m qubits to store the final value of ai. Two additional qubits will be used as a control and a flag qubit.
Two gate circuits are used in the calculation of ai. The first circuit will load the values of b0 and b2 onto their respective registers. The corresponding values are already calculated beforehand in a classical computer. The circuit that loads the coefficients onto their registers is shown in Fig. 2.
This circuit is used to load the coefficients onto their respective registers. The process only needs to be carried out once throughout the whole algorithm.
This circuit is used to load the coefficients onto their respective registers. The process only needs to be carried out once throughout the whole algorithm.
Register |b0⟩, register |b2⟩, register |i⟩, and control qubit |0⟩ are all used as inputs for the gate circuit shown in Fig. 2. First, the |i⟩ register and |0⟩ control qubit are used as inputs into a Pi gate. The P gate switches the control qubit to |1⟩ if |i⟩ is in the subdomain that corresponds to the Pi gate. The control qubit controls the gate, which will load the |b0⟩ and |b2⟩ registers with the coefficients for the subdomain. Finally, a gate is used to return the control qubit and |i⟩ register to their original states. Only the |b0⟩ and |b2⟩ registers are changed in this process. This gate circuit only needs to be used once to load the coefficients as they are only used in the calculation for ai.
The next circuit calculates the value of ai. The circuit is shown in Fig. 3. The same registers |b0⟩, |b2⟩, and |i⟩ are used as inputs for this circuit. An additional m − 1 qubit ancilla register and an m qubit |ai⟩ register are also required for this circuit.
This circuit uses multiplier and multiplier-accumulator gates to calculate the value of ai. The ancilla and a registers are labeled to avoid confusion.
This circuit uses multiplier and multiplier-accumulator gates to calculate the value of ai. The ancilla and a registers are labeled to avoid confusion.
Both the |i⟩ and ancilla registers will be used as inputs for the first multiplier gate. The multiplier (MUL) will calculate the value of i2 and store the most significant (m − 1) qubits into the ancilla register. Note that the symbol ⊕ represents the XOR operator. Then, the ancilla register is used alongside registers |b0⟩ and |b2⟩ and the ai register as inputs into the multiplier-accumulator (MAC). The MAC will multiply the values in the ancilla register, and |b2⟩ registers and adds the value in the |b0⟩ register to compute the piecewise polynomial approximation for ai. The result is then stored into the a register.
Once the MAC operation is finished, the comparison-based state preparation method described in Sec. II can be used to finish state preparation for the quantum Monte Carlo algorithm.
D. Cost analysis
As described in Sec. III C, the proposed algorithm has two quantum computational steps. The first step calculates the comparison threshold ai outside of the Oracle call. The second step prepares the superposition states using a comparison method within the Oracle call.
1. ai estimation outside of the Oracle call
The register |i⟩ requires n qubits to cover the entire domain of the quantum Monte Carlo algorithm. Each of the |b0⟩, |b2⟩, and ancilla register requires (m − 1) qubits where m is directly related to the resolution in the amplitude. The exact relationship between m and the resolution is , where δa is the resolution in the amplitude and is the ceiling function that gives the smallest integer greater than or equal to x. The threshold register |ai⟩ contains m qubits. Finally, a control qubit is needed for loading the coefficients onto their registers. Note that the ancilla qubits inside the MUL and MAC gates are not counted as the number depends on their design.
The computational time complexity mostly comes from the MUL and MAC gates. The design for the gates used in the estimation as well as time complexity for each is presented in the Appendix. These gates can be replaced with more efficient designs that are developed in the future. The MUL gate used in the estimation has a time complexity of O((3 log2 m) + 1), and the MAC has a delay complexity of O(log2 m).28 All the other operations used in estimating the amplitude are simple arithmetic and do not contribute significantly to the computational time complexity.
2. Comparison operation within the Oracle call
Most of the savings of the comparison-based method come from using the comparison operator instead of controlled Y rotations. The comparison operation can be applied with m Toffoli gates and m qubits.23 In addition to this, both the threshold and the reference registers need m additional qubits each.23 Finally, the flag qubit is added onto the register. The total number of qubits needed for this operation is 3m + 1.
The quantum Monte Carlo algorithm has speedup from O(N) to , where N is the number of coordinate points used in the algorithm. Due to the high number of Oracle calls in the quantum Monte Carlo algorithm, the time complexity of the algorithm is dominated by the time spent in Oracle calls. The total number of Oracle calls of the comparison-based state preparation method, Grover’s state preparation algorithm, and the new rotation-based state preparation algorithm is the same.14,15,23 Therefore, the time complexity comparison only depends on the time complexity of each Oracle call for each of the three state preparation algorithms.
Controlled Y rotations are used in the Oracle call of both Grover’s state preparation algorithm and the new rotation-based method. In addition to the controlled Y rotation, Grover’s state preparation algorithm requires an arcsine operation and an integration operation to calculate the rotation angle in each Oracle call. The new rotation-based state preparation algorithm does not need these extra operations as the rotation angle is calculated outside of the Oracle call and stored in a register in the algorithm. Therefore, the time complexity of the new rotation-based state preparation method is less than that of Grover’s state preparation algorithm.
Grover’s algorithm is now eliminated from the comparison, so the time complexity of the new rotation-based state preparation algorithm and that of the comparison-based state preparation algorithm need to be compared. The comparison-based state preparation method requires two Hadamard operations and one comparison operation. The comparison operation can be executed with a time complexity of O(2(2 + log2 m)),29 leading to an overall time complexity of O(2(3 + log2 m)). In comparison, the controlled Y rotations used in the rotation-based state preparation algorithm have a time complexity of O(m)23 for an m qubit rotation angle. These two methods break even when m equals 16 qubits. Therefore, the comparison-based state preparation algorithm is more efficient than the new rotation-based state preparation algorithm when m exceeds 16 qubits.
IV. NUMERICAL ANALYSIS
A numerical analysis was conducted to analyze the impact the maximum error tolerance of ai had on the computational complexity. This impact is extremely difficult to derive analytically because the distribution function is not partitioned into subdomains based on a set pattern. A graph of the error of the piecewise polynomial approximation with respect to the actual value of the square root function is shown in Fig. 4.
Error of the piecewise polynomial approximation for each point in the domain of the distribution function is presented in the graph. The maximum tolerated error in this case was set at of the maximum value of ai. A total of 28 subdomains were required to keep the error below the maximum tolerated error level. Note that the graph only shows positive values of i because the distribution function is even. γ was set at 1000, and i0 was set at 0. The domain covers a 20 bit range. Note that the i coordinates are plotted in logscale.
Error of the piecewise polynomial approximation for each point in the domain of the distribution function is presented in the graph. The maximum tolerated error in this case was set at of the maximum value of ai. A total of 28 subdomains were required to keep the error below the maximum tolerated error level. Note that the graph only shows positive values of i because the distribution function is even. γ was set at 1000, and i0 was set at 0. The domain covers a 20 bit range. Note that the i coordinates are plotted in logscale.
Figure 4 uses 20 unsigned qubits because the numerical analysis contained values of i that ranged from −(220 − 1) to (220 − 1). In the numerical analysis, gamma in the Cauchy distribution had a value of 1000, and the center i0 is at zero. Only positive values of i are presented in Fig. 4 due to the even nature of distribution functions. It is clearly presented that no i coordinate in the entire domain exceeds the maximum tolerated error.
As shown in Sec. III, quantum parallelism keeps the computation depth and gate complexity independent from the number of subdomains once the polynomial coefficients are loaded onto their registers. Therefore, the number of subdomains does not impact the computational complexity. However, loading the coefficients is still an important part of the quantum operation and is strongly affected by the number of subdomains. The number of subdomains is directly related to the maximum tolerated error, so the relationship needs to be investigated.
The numerical analysis calculated a 10 bit accuracy range, starting from to of the maximum value of in Eq. (10). The graph of the number of subdomains with respect to the maximum tolerated error is shown in Fig. 5. As the maximum tolerated error halved, the number of required subdomains appeared to increase linearly. The maximum number of subdomains needed in the numerical analysis was around 200.
The graph presents the number of required subdomains with respect to the maximum tolerated error. Each function was tested with a domain of 20 qubits and a γ value of 1000. Note that the maximum allowed error is in logscale and that the error gets larger from left to right.
The graph presents the number of required subdomains with respect to the maximum tolerated error. Each function was tested with a domain of 20 qubits and a γ value of 1000. Note that the maximum allowed error is in logscale and that the error gets larger from left to right.
The proposed algorithm contains quantum gate circuits for loading the coefficients onto quantum registers and calculating the piecewise approximation for the comparison thresholds. The time complexity of the circuit for loading the quantum registers is directly proportional to the number of subdomains. Although the circuit for calculating the piecewise approximation is not directly related to the number of subdomains, it is directly related to how complicated the piecewise approximation is. How complicated the piecewise approximation is determines how many subdomains, so the computational complexity of calculating the piecewise approximation is indirectly related to the number of subdomains. Note that there is a trade-off between the time complexity of loading the coefficients onto their registers and the computational complexity of calculating the comparison thresholds. If a piecewise approximation contains a more complicated polynomial, the computational complexity of calculating the piecewise function increases. However, the gate depth of loading the coefficients decreases as the number of subdomains decreases. On the other hand, a simpler piecewise polynomial approximation reduces computational complexity when calculating comparison thresholds but increases gate depth for loading the coefficients as the number of subdomains increases. The trade-off can be adjusted to fit the need of specific applications.
Trade-off optimization has to start with more advanced ways of using polynomial functions. There are some works that use modified polynomial methods for scientific problems in computational analysis,30–32 which could provide improvements over the proposed piecewise polynomial approximation. It will be an interesting future work to investigate the potential improvement by using these methods.
V. CONCLUSION
This study proposed a gate-level implementation of a Cauchy distribution based state preparation method. Specifically, this algorithm relied on the comparison-based state preparation method that bypasses the need for any rotation operators. Rather than using a Taylor approximation, the proposed algorithm relied on quantum piecewise arithmetic to estimate the comparison threshold for each state with relatively low computation cost and error.
The proposed algorithm was separated into three sections. First, a classical computer will calculate all the parameters for the piecewise polynomial approximation, and the parameters will be loaded onto quantum registers. Next, quantum circuits will estimate the amplitude of the superposition states. Finally, the comparison-based state preparation framework will be applied to finish the state preparation.
The computational complexity for the proposed algorithm is less than that of Grover’s state preparation algorithm or other rotation-based state preparation methods. This is achieved by eliminating the need for controlled Y rotations. However, this comes at the cost of extra qubits for the threshold and reference registers. For the estimation of the comparison thresholds, a piecewise function was used that was in the simplest form to model an even function. The estimation error can be reduced by increasing the total number of subdomains in the piecewise function. Numerical analysis showed that the number of required subdomains increased linearly as the maximum tolerated approximation error decreased exponentially. Quantum parallelism ensures that the computational complexity remains the same regardless of the number of subdomains.
The proposed algorithm is capable of performing quantum superposition state preparation for any distribution function. Specifically, this study demonstrated its capability to handle non-log-concave heavy-tailed Cauchy distributions. In comparison, Grover’s state preparation algorithm is only suitable for efficiently integrable distribution functions, and the matrix product state method has only been demonstrated on functions with bounded derivatives. To the best of our knowledge, this is the first comparison-based state preparation method for the non-log-concave distribution function category.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
APPENDIX: MULTIPLIER AND MULTIPLIER-ACCUMULATOR DESIGN
This appendix reviews the design of the MUL and MAC gates presented in Ref. 28. A multiplication operation can be represented as the sum of multiple partial products between the multiplier and the multiplicand. Figure 6 shows how an eight-qubit multiplier and an eight-qubit multiplicand are multiplied. {|a0⟩, |a1⟩, |a2⟩, |a3⟩, |a4⟩, |a5⟩, |a6⟩, and |a7⟩} represent the multiplicand qubits, and {|b0⟩, |b1⟩, |b2⟩, |b3⟩, |b4⟩, |b5⟩, |b6⟩, and |b7⟩} represent the multiplier qubits. The partial products are represented by AiBi and are generated by the partial product generation array denoted by the symbol “X.” Partial product array generation requires ANDing and copying operations. These operations can be executed in parallel to make the generation of partial products faster. PAi denotes the addition of the partial products, and Mi is the final multiplication result.
The addition of the partial product arrays is executed with a quantum full-adder circuit. The design for the full-adder circuit is shown in Fig. 7.
The time complexity of this multiplication algorithm is at least O((3 log2 n) + 1). This is proven through mathematical induction. An n qubit multiplier and multiplicand requires log2 n2 ANDing and copying operations to generate the partial product array. Therefore, the time required for the ANDing and copying operations is
The next step is performing the addition of the partial products. There need to be log2 2n operations to calculate the addition when the multiplier and multiplicand are both n qubits. Therefore, the total time required to perform the addition of the partial products is
The calculation of the base case with n = 1 is shown in Eq. (A3). Because the actual time complexity has the same value as O((3 log2 m) + 1), the base case holds true,
Now the induction step is proven. Assuming that the time complexity is O((3 log2 n) + 1) for n = k, the time complexity is
Extending this to the case of n = k + 1 gives a required time of
Therefore, the time complexity is O((3 log2(k + 1)) + 1) for n = k + 1. This proves that the time complexity of the proposed multiplication approach is at least O((3 log2n) + 1).
The multiplier-accumulator utilizes a similar algorithm to the multiplier. The only difference is that the quantum full-adder and an additional quantum accumulator are added after the quantum multiplier. The logic flow for the quantum multiplier-accumulator and the circuit for the quantum accumulator are shown in Figs. 8 and 9, respectively.
The hardware and delay complexity of the quantum accumulator were determined by a performance evaluation. The results of the performance evaluation are shown in Table I.