We propose a quantum algorithm for the lattice Boltzmann (LB) method to simulate fluid flows in the low Reynolds number regime. First, we encode the particle distribution functions (PDFs) as probability amplitudes of the quantum state and demonstrate the need to control the state of the ancilla qubit during the initial state preparation. Second, we express the LB algorithm as a matrix-vector product by neglecting the quadratic non-linearity in the equilibrium distribution function, wherein the vector represents the PDFs, and the matrix represents the collision and streaming operators. Third, we employ classical singular value decomposition to decompose the non-unitary collision and streaming operators into a product of unitary matrices. Finally, we show the importance of having a Hadamard gate between the collision and the streaming operations. Our approach has been tested on linear/linearized flow problems such as the advection-diffusion of a Gaussian hill, Poiseuille flow, Couette flow, and lid-driven cavity problems. We provide counts for two-qubit controlled-NOT and single-qubit U gates for test cases involving 9–12 qubits with grid sizes ranging from 24 to 216 points. While the gate count aligns closely with theoretical limits, the high number of two-qubit gates on the order of 107 necessitates careful attention to circuit synthesis.

Recent advancements in quantum computing algorithms and the availability of actual quantum processor units (QPUs) offer an exciting opportunity for the field of computational fluid dynamics (CFD) to explore the potential of quantum computing. The CFD workflow generally involves three key steps: mesh generation (for a given geometry), flow modeling (governing equations, discretization, and numerical methods), and post-processing of the results (analysis and visualization). Each of these tasks can demand high computational resources in terms of memory and speed, depending on the problem being solved.

For instance, a three-dimensional direct numerical simulation (DNS) of turbulent flow for a Reynolds number of 108 would require approximately Re941018 grid points.21,41 Storing a flow variable at each of these 1018 points using 64-bit precision floating values would require approximately 1018×64 bits of memory, which is equivalent to eight thousand petabytes (8×109 GB) of memory. Quantum superposition and the encoding of classical information on a quantum processor as probability amplitudes suggest the number of qubits required would be approximately 60 [ 15/2log(Re)]. Thus, in terms of computing memory, there is a promising opportunity to perform large-scale CFD simulations on QPUs.17 However, the QPUs are still in their early stages of development (limited numbers of noisy qubits), and the potential benefits of quantum speed-up have yet to be fully established.8,30

Present-day qubits in quantum computers can be affected by errors and imperfections, such as decoherence, gate errors, and measurement errors. In particular, decoherence times are an important factor in determining the performance and feasibility of quantum computations, wherein decoherence refers to the process by which a qubit loses its quantum properties and behaves more classically due to environmental interaction. Thus, the current noisy intermediate-scale quantum (NISQ) devices are characterized by having a limited number of qubits and relatively short coherence times, which impact their computational capabilities.

Typical efforts to develop a quantum CFD involve either (i) the application of Schrödinger equation to describe fluid motion or (ii) the development of algorithms for solving classical fluid mechanic equations for implementation on QPUs. In the former case, an analogy between the Navier–Stokes equation (NSE) and the Schrödinger equation has been derived through Madelung transformation.28 The latter case focuses on the applicability of existing CFD algorithms portable to QPUs, which received much more attention due to the extensive advancement in CFD algorithms over the past few decades. The main steps to perform CFD on QPUs involve encoding classical data (such as velocity and pressure fields), designing circuits representing differential or integral operators, and employing measurement techniques to convert quantum data back into classical variables.

The encoding of classical data requires the preparation of quantum states, which begins with initializing all qubits in their ground state. This is followed by the application of a specific set of quantum gates designed to manipulate the qubits into the desired quantum state. The resulting state vector contains the probability amplitudes corresponding to the classical data. A key challenge in preparing an arbitrary quantum state for a system comprising n qubits is the exponential increase in the number of required two-qubit controlled-NOT (CNOT) gates. Bharadwaj and Sreenivasan6 have investigated two approaches relevant to CFD that demonstrate reduced CNOT gate requirements, suggesting potential applicability in NISQ devices. In transient flow simulations, reinitialization of the quantum state is necessary at each time step to incorporate updated information, which involves applying a series of quantum gates. This reinitialization process becomes increasingly resource-intensive as both the system size and the number of iterations increase. Moreover, quantum measurements are essential for extracting classical information at each iteration; however, these measurements lead to the collapse of the quantum state, necessitating further reinitialization. This phenomenon introduces additional computational overhead, as multiple measurements may be needed to obtain reliable results, particularly when dealing with complex nonlinear partial differential equations.6,8,38,41

In CFD, the governing equations can be discretized and expressed in matrix-vector form. For explicit time-marching schemes, the discretized equation will be of the form Φt+1=AΦt, where matrix A represents the discrete differential/integral operator and vector Φt represents the flow variable at time t. In this context, the matrix-vector product AΦt can be viewed as a quantum state evolution, where qubits in a registry are initialized with state vector |Φt and result in state vector |Φt+1 containing the probability amplitudes in the chosen basis state. If the matrix A is not unitary, it can be decomposed as a linear combination of Pauli matrices, or a Hamiltonian AH can be constructed as AH=(0iAiAT0) which can then be converted to a unitary matrix10,15,27 as U(t)=eiAHt.

For implicit schemes, the equation becomes Φt+1=A1Φt. In this case, quantum linear solver algorithms such as the Harrow–Hassidim–Lloyd (HHL) algorithm6,7,14,19,25 or the variational quantum linear solver (VQLS)9,20,31 can be used. VQLS belongs to a class of hybrid algorithms known as variational quantum algorithms (VQAs) that utilize classical optimization schemes on CPUs to provide circuit parameters implementable on NISQ devices. Thus, VQLS performs two tasks simultaneously: quantum circuit parametrization and solution to linear system of equations. In summary, any CFD algorithm can be viewed as a Hamiltonian simulation on QPUs.

Instead of discretizing the NSE, an alternative approach known as the lattice Boltzmann method (LBM), which is based on kinetic theory, has been shown to effectively solve the NSE in the incompressible limit. It is important to note that LBM is not restricted to incompressible flows; it can also simulate compressible flows.1,16,43 LBM describes the evolution of particle distribution functions (PDFs) on a Cartesian grid with a predefined lattice structure. The moments of these PDFs yield the macroscopic flow variables of primary interest, such as density and velocity. Due to particle interactions with their nearest neighbors, LBM demonstrates high parallel efficiency on GPUs. However, the drawback comes in terms of computational memory. For instance, in an isothermal and incompressible flow simulation with N grid points and qn lattice directions, storing variables for present and future time steps in LBM will be 2Nqn, whereas, in traditional schemes, it would be 2N(D+1); D-number of physical dimension. The most common lattice structures for 2D and 3D simulations are D2Q9 and D3Q19, which give the scaling factor qn=9 and 19, respectively.

Before the development of the quantum lattice Boltzmann method (QLBM), Yepez44 proposed the quantum lattice-gas model (QLG) for fluid flow simulations that can be implemented using a hybrid quantum-classical nuclear magnetic resonance (NMR) computing methodology, commonly known as Type-II quantum computer (T2QC). Developed in the early 2000s, a T2QC consists of an array of small quantum computers linked by classical communication channels. Several QLG algorithms were developed for T2QC.4,26,45 Additionally, Pravia et al.32 successfully implemented the quantum lattice gas algorithm for the one-dimensional diffusion equation on T2QC. By using two-qubit processors assigned to each of the 16 grid points, they found that the experimental measurements at seven computational time steps aligned well with both the simulation and analytical solutions. This motivated further developments in the QLBM.

Typically, the LBM algorithm is divided into collision and streaming operations. The collision operator is local and has a non-linearity in the velocity, whereas the streaming operator is non-local and linear. Significant work on QLBM can be categorized based on (i) tackling the non-linearity,21,35,40 (ii) circuit design of QLBM,11,12 (iii) implementation of streaming without collision,36,42 and (iv) encoding.37 The fundamental challenge in QLBM involves handling non-linearity in the equilibrium distribution function and its computation through unitaries, which are linear operators.41 One way to tackle non-linearity is to perform Carleman linearization (CL) of the LB equations, resulting in an infinite dimensional linear system.21,22,25,34,35 At second-order truncation, the qubit requirement will increase by a factor of two before CL. Instead of CL which utilize amplitude encoding, Ref. 40 designed a quantum circuit based on computational basis encoding, which performs quantum floating point arithmetic and thereby evaluates the non-linear term. Excluding the non-linear term, Mezzacapo et al.29 developed a quantum simulator suitable for trapped ion and superconducting quantum systems and attempted to solve a two-dimensional advection-diffusion equation (ADE). The collision operator was decomposed as a sum of unitaries, and the streaming operator using the D2Q4 lattice was represented as shifted diagonal matrices. Later, Budinski11 presented a complete circuit with single/ two qubit gates for solving ADE with the D1Q3 and D2Q5 lattice model along with linearized equilibrium distribution function (EDF) and constant flow velocity assumptions. An extension to stream function-vorticity formulation of NSE using QLBM was also attempted with the D2Q5 lattice.12 In contrast to amplitude encoding, which occupies the whole of Hilbert space, Schalkers and Möller37 argued that quantum parallelism is best exploited when we utilize the least of Hilbert space. Instead of PDFs, if the velocity were encoded in qubits, neither amplitude encoding nor computational basis encoding technique would permit the collision and streaming to be unitary at once.

Based on the above discussion, it is evident that there is no comprehensive quantum model for LBM that encompasses collision, streaming, boundary conditions, and external forcing terms for solving various fluid flow problems in the existing literature. Moreover, treating boundary conditions as a separate operator, dealing with non-orthogonal states, and the possibility of taking measurements at each time step present challenges. As discussed earlier, the measurement-induced collapse of the quantum state further complicates matters, as each measurement alters the state and can potentially introduce errors. Thus, performing simulating without intermediate initialization/measurements is an active field of research in QCFD.41 

In this study, we propose the QLB algorithm to solve the 2D low Reynolds number flows with external forcing and wall boundary conditions. Our approach involves representing the streaming and boundary conditions as a unified matrix operator. The objective of this study is to develop QLBM as a Hamiltonian simulation. Our work primarily focuses on implementing the QLBM without making any intermediate assumptions, except for neglecting the non-linear term in the equilibrium distribution function. We note that upon neglecting the non-linearity, the linear lattice Boltzmann equation will not recover the incomplete Navier–Stokes equations. However, the linear model is not affected by the nonlinear O(u3) error terms in the momentum equation.24 Thus, the test cases considered herein are linear/linearized flows with low Mach (Ma21) and low Reynolds (Re=10) flows. The quantum circuit presented here for the collision and streaming remain same throughout the simulation. However, initialization of the quantum state based on the constructed PDFs at the start of each simulation time step is required.

In the following, the LB model along with the boundary conditions are outlined in Sec. II. The quantum LB algorithm is described in Sec. III. Results are discussed in Secs. IV and V summarizes with salient conclusions.

The single relaxation time LB model for fluid flow is given by
(1)
where fi is the PDF along the ith direction, ei is the lattice velocity, and τ=3ν+0.5 is the relaxation parameter with ν being the kinematic viscosity. The EDF is given by
(2)
where wi is the lattice weight, cs is the sound speed, ρ is the fluid density, and u is the flow velocity. The source term can be evaluated using the simple force model of Buick and Greated,13  Si=wics2ei·Fb, where Fb is body force. Equation (1) is solved using the following two step approach:
(3)
(4)
Equations (3) and (4) are referred as collision and streaming steps, respectively. The moments of the distribution function (DF) yield the macroscopic flow variables, density (ρ) and momentum (ρu),
(5)
(6)
During streaming step, each lattice site sends and receives the DFs to and from the neighboring sites. In a periodic domain, the DFs leaving on one side of the domain will reenter on the other side. In the case of solid walls, boundary nodes (xb) send the DFs (fi) to the wall node xw with lattice velocity ei, which then reflect back with velocity ei. If the wall is moving with the velocity uw, the reflected DF (fi) will gain/lose the momentum.24 The incoming DF (fi) for the boundary node can be computed using Eq. (7) which includes the stationary wall case by taking uw=0
(7)

The proposed quantum algorithm consists of three major steps, viz., (i) matrix-vector representation of the LB algorithm, (ii) decomposition of non-unitary LB matrices into unitaries using the singular value decomposition (SVD), and (iii) development of quantum circuit with suitable encoding of PDFs. In the following, we will explain each of these steps in turn.

Let ng=nx×ny be the total number of grid points, where nx and ny are the number of grid points along x and y axes, respectively. Then the total number of DFs will be nf=neng, where ne is the number of directions defined in the DmQn lattice model. At first, the DFs are arranged in a vector form according to their lattice direction, df=[df1,df2,,dfne], where the superscript refers to the direction. The index of any element in dfie can be written as x+ynx+ienxny, where (x,y) is the co-ordinates and ie is the direction. Next, the entries for the collision and streaming matrices will be derived, wherein the row and column indices of the constructed matrix follow the same order in which the DFs are arranged. In general, every element of the resultant vector from the matrix-vector product can be expressed as a linear combination of the elements of the input vector with weights being the corresponding row. Thus, each fidf can be written as fi=j=1nfαijfj, where αij represents the element at the ith row and jth column of the matrix.

1. Collision

The quadratic terms in Eq. (2) are needed to recover the NSE through the Chapman–Enskog analysis using the Knudsen number as the expansion parameter. We assume the flow is low Mach and low Reynolds, and the flow quantities such as pressure and velocity exhibit small variations about the constant rest state. Hence, we neglect the second-order term in Eq. (2). Detailed discussion on the validity of linearized EDF can be found in Ref. 24 
(8)
Using Eqs. (5) and (6), the linearized EDF in Eq. (8) becomes
(9)
Now, Eq. (3) can be written as
(10)
where δik is the Kronecker delta function. Note the values of αiki are constant and computed once, as they depend on relaxation time, lattice velocity, and weights. Thus, the matrix form of collision operation [Eq. (10)] can be written as
(11)

2. Streaming and boundary conditions

The streaming matrix S consists of ne×ne blocks with the size of each block ng×ng. In the absence of boundary conditions, the streaming operator shifts the DFs along the direction of propagation. This results in a permutation matrix, i.e., a square binary matrix where each row/column contains one entry equal to 1, the remaining entries are zero. As mentioned earlier, the index of each row i can be decomposed as x+ynx+ienxny. For a fully periodic domain, the column index j such that sij=1 can be computed by replacing the co-ordinate x by xeiΔt+L, where L is the domain length. In the case of a wall boundary, the column index will be computed as xΔ+yΔnx+ienxny, where (xΔ,yΔ)=xei and ie is the reflection of the direction ie.

As an illustration, we choose the D2Q9 lattice model, which is a commonly used lattice for two dimensional LB fluid flow simulations.24 The lattice parameters are given in Table I. The first block of matrix S will be identity, as the lattice velocity will be zero. In a fully periodic case, the number of elements in the streaming matrix will be the size of diagonal entries and the row major ordering of grid points provide the shifted elements closer to the diagonal [Fig. 1(a)]. The matrix shapes in Fig. 1 are obtained for the test cases presented in Sec. IV with nqa=11, and the grid sizes are given in Table IV.

Table I.

Lattice parameters for the D2Q9 model.

i ei wi
(0,0)  4/9 
1,2,3,4  (0,1),(1,0),(0,1),(1,0)  1/9 
5,6,7,8  (1,1),(1,1),(1,1),(1,1)  1/36 
i ei wi
(0,0)  4/9 
1,2,3,4  (0,1),(1,0),(0,1),(1,0)  1/9 
5,6,7,8  (1,1),(1,1),(1,1),(1,1)  1/36 
Fig. 1.

Streaming matrices for test cases with nqa=11 qubits using the D2Q9 lattice model. (a) A fully periodic domain, (b) periodic along the x axis and bounded by stationary walls in the y axis, (c) periodic along the x axis and bounded by stationary (bottom) and moving walls (top) in the y axis, and (d) bounded by a moving top wall and stationary walls on other three sides.

Fig. 1.

Streaming matrices for test cases with nqa=11 qubits using the D2Q9 lattice model. (a) A fully periodic domain, (b) periodic along the x axis and bounded by stationary walls in the y axis, (c) periodic along the x axis and bounded by stationary (bottom) and moving walls (top) in the y axis, and (d) bounded by a moving top wall and stationary walls on other three sides.

Close modal

Neither the collision nor the streaming matrix generated in Sec. III A is unitary. There are several methods to decompose non-unitary matrices into unitary matrices. One common approach is to express them as a weighted sum of unitaries, known as the linear combination of unitaries (LCU) method,15 or as a product of unitaries, such as through singular value decomposition (SVD).18 The number of terms required in an LCU representation of a matrix generally depends on three factors: (i) the size of the matrix, (ii) the sparsity or structure of the matrix, and (iii) the precision and algorithmic context of the LCU application. For a general N×N matrix, the worst-case scenario typically requires O(N) terms. However, for sparse matrices, the number of terms can be proportional to the number of non-zero entries, resulting in O(k), where k is the number of non-zero terms. In contrast to LCU, the number of matrices produced in SVD is always three with two of them being unitary. However, Bharadwaj and Sreenivasan8 utilized a four-unitaries approach, wherein a non-unitary matrix can be approximated as a sum of four unitaries using an expansion parameter. Thus, finding the efficient decomposition of a given non-unitary matrix is highly crucial and is a challenging area of research.2,5,8

In the present work, we use the singular value decomposition (SVD) to transform collision/streaming matrices as a combination of unitary matrices. Specifically, a matrix A can be decomposed into A=U*D*V, where U, V are unitary and D is a diagonal matrix but need not be an unitary. Again, this presents two approaches for representing D: either as a sum or as a product of unitaries. Previously, Schlimgen et al.38 introduced a dilation-based algorithm based on SVD (product of unitaries) to simulate a non-unitary diagonal operator. In our work, we utilized LCU to decompose the diagonal matrix as a sum of two unitary matrices. Before we decompose D, the norm of the elements in D may not be unity. Hence, we normalize it and store the factor α for the post multiplication of resultant vector. Now, D can be decomposed as D=0.5(D1+D2), where D1,D2 are unitary matrices given by D1,D2=D±iID2. Thus, A will be written in terms of unitaries U*1/2(D1+D2)*V. Consequently, the total number of unitary operators in our quantum LB algorithm is eight. In contrast, in the generic LCU approach, this count depends on the size of the matrix, which scales with the problem size.

The algorithm consists of three steps: encoding, circuit construction, and measurement. Herein, we encode the PDFs as the sole variable using the amplitude encoding technique. The collision and streaming operations are performed via the quantum circuits built from the unitaries. Finally, we measure and evaluate the probability amplitudes resulting in the PDFs for the next time step. The distinct advantage of the present algorithm is that the circuit is built once for the entire simulation. This means the circuit built at the start of the simulation can be iterated using a for loop, with the statevector re-initialized at each time step of the simulation. The complete circuit model for implementing QLBM is given in Fig. 2.

Fig. 2.

Quantum circuit for LBM, wherein the collision and streaming blocks are repeated for n time steps.

Fig. 2.

Quantum circuit for LBM, wherein the collision and streaming blocks are repeated for n time steps.

Close modal

1. Encoding

Two quantum registers, a computational (q) and an ancilla (a) will be used. The q register contains nq=log2(ne×nx×ny) qubits, whereas ancilla register has one qubit. In case 2nq>nf, additional zeros are padded to make the vector df of length 2nq. Because of ancilla qubit, the size of statevector |ϕ is of 21+nq. Now, the question is whether to control the initialization on a state of the ancilla, i.e., df is either to append with itself or to pad with zero entries. Physically, this means whether the ancilla qubit should ((HI)(|0a|ϕ)) or should not (|0a|ϕ) be in superposition with computational qubits. Because of non-essential orthogonal states generated after collision, the answer is affirmative, and it will be discussed in the  Appendix. We append the vector df to itself, resulting in a vector of length 21+nq. Thus, the initial state of qubits ϕ=[df,df] will be encoded as
(12)

2. Quantum circuits

At first, the collision matrices are converted into quantum gates by the direct application of UnitaryGate operation, which is available in major quantum software development kits (SDKs), such as Qiskit. After SVD decomposition, the collision matrix can be written as C=Uc*1/2(D1c+D2c)*Vc. Because of the summation, matrices D1c and D2c cannot be directly applied as gates. Following Refs. 11 and 27, the controlled operation for the linear combination of unitaries will be implemented as
(13)
The unitary corresponding to Eq. (13) is (DciIDcDciIDcDcDc). The single ancilla is used as a control qubit, which through Hadamard gate (H) is in superposition with the computational qubits. Thus the complete collision operation as product of unitaries can be expressed as
(14)
Upon collision, the quantum state |ϕ0 is transformed to |ϕc as
(15)
where ||αc|| denotes the normalization factor obtained from the elements of diagonal matrix Dc. Similarly, the streaming operation can be expressed as a product of unitaries,
(16)

However, before proceeding to streaming after collision, the Hadamard needs to be applied on the ancilla, i.e., circuit (HIq) has to be applied. In the  Appendix, we show its necessity. Upon performing streaming the real part in the first block of statevector results in the desired PDF for the next simulation time step.3 

3. Measurement and flow variables

The measurement process in quantum computing does not yield the complete state vector of qubits in the register. Instead it results in the collapse of the so-called quantum wave function. Here, we encoded PDFs onto the amplitude of the standard basis. Thus, the measurement collapses to one of the state |j and contains the binary string of zeros and ones. Since a quantum device is a probabilistic model, we execute the circuit multiple times and compute the probability distribution of the different states of the qubit. This results in the distribution function of LBM. Now, converting the DFs into the flow variable can be done in the post-processing step on the classical computer. The matrix form for evaluating the first and second moments of PDFs in Eqs. (5) and (6) is given by
(17)
(18)
where the identity matrices I in Eqs. (17) and (18) are of size ng×ng and repeated ne times horizontally. Upon computing variables such as density and momentum, the normalization factor (||ϕ||·||αc||·||αs||) need to multiplied; where ||αs|| is the norm of elements in Ds.

The proposed algorithm is implemented using IBM's Qiskit quantum computing SDK. Since we are interested in obtaining PDFs, we used the state vector simulator in Qiskit-Aer package.33 The unitaries derived in Eq. (14) and (16) are applied in the Qiskit-simulator as a custom gate operation. Four test cases with different boundary conditions were chosen for study. The D2Q9 lattice is used unless otherwise mentioned.

The evolution of species concentration C with an initial Gaussian profile is simulated in a fully periodic two-dimensional domain. The time evolution of concentration field C in the presence of a homogeneous advection velocity u is given by an analytical solution,24 
(19)
where C0 is the maximum concentration value, σ0 is the initial mean squared deviation and σD=2Dt, and D is diffusion coefficient. Initially,
(20)
where x0 is chosen at the center of the domain. We choose the domain of grid size 10×10, which results in nf=900 and nq=10. The values of other parameters are: C0=1.0, σ0=2.0, and D=0.005. The relaxation time can be computed by τ=3D+0.5. The simulation ran for 100 time steps. Figure 3 shows the comparison between the analytical solution and QLBM results of C profiles picked along the x-axis where the max{C} is found. The comparison of concentration contours between QLBM results with analytical solution is given in Fig. 4. After 100 time steps, the L2 relative error norm (=(CrefCLBM)2Cref2) is found to be around 6%.
Fig. 3.

Comparison of concentration profiles obtained from QLBM with analytical solution24 for the advection-diffusion of Gaussian hill of grid size 10×10 at different time steps. The parameters are: relaxation time τ=0.515, C0=1.0, σ0=2.0, and D=0.005.

Fig. 3.

Comparison of concentration profiles obtained from QLBM with analytical solution24 for the advection-diffusion of Gaussian hill of grid size 10×10 at different time steps. The parameters are: relaxation time τ=0.515, C0=1.0, σ0=2.0, and D=0.005.

Close modal
Fig. 4.

Comparison of concentration contours obtained from QLBM with analytical solution (bold lines)24 for the advection-diffusion of Gaussian hill of grid size 10×10. The chosen parameters are specified in caption of Fig. 3.

Fig. 4.

Comparison of concentration contours obtained from QLBM with analytical solution (bold lines)24 for the advection-diffusion of Gaussian hill of grid size 10×10. The chosen parameters are specified in caption of Fig. 3.

Close modal
The plane Poiseuille flow is an important test case of pressure driven flow, where a constant pressure gradient is applied in the direction of flow. The domain is two-dimensional which is periodic along the flow direction (x-axis) and bounded by stationary walls at top and bottom (y-axis). The analytical solution for horizontal velocity distribution along the vertical axis (y) is given by24 
(21)
where G is the pressure gradient applied along the direction of the flow, μ is the dynamic viscosity, and h is the distance between the walls. The Reynolds number (Re) of the flow is taken as 10. The value for the forcing term in Si can be evaluated as Fb=8νh2umax with umax=(0.1,0). Four different grid sizes were chosen by varying the number of qubits nqa (Table II). Figure 5 shows the comparison of horizontal velocity profile along the y-axis taken at the center of xaxis. The maximum relative error of around 2% is found in the smaller grid case of size 3×8 which utilized nine qubits (Table II).
Table II.

Relative error norm obtained for the Poiseuille flow test case with different grid sizes.

nqa Grid # Timesteps L2 (%)
3×8  500  2.34 
10  5×10  700  1.35 
11  7×16  900  0.6 
12  9×24  1200  0.56 
nqa Grid # Timesteps L2 (%)
3×8  500  2.34 
10  5×10  700  1.35 
11  7×16  900  0.6 
12  9×24  1200  0.56 
Fig. 5.

The horizontal velocity profile of plane Poiseuille flow obtained from QLBM for four different qubit counts is compared with analytical solution.24 The values of Fb are 0.001, 0.008, 0.0005, and 0.0003, respectively, for the nqa= 9, 10, 11, and 12. The grid size and the final time step are given in Table II.

Fig. 5.

The horizontal velocity profile of plane Poiseuille flow obtained from QLBM for four different qubit counts is compared with analytical solution.24 The values of Fb are 0.001, 0.008, 0.0005, and 0.0003, respectively, for the nqa= 9, 10, 11, and 12. The grid size and the final time step are given in Table II.

Close modal
The plane CouettePoiseuille flow is a test case of shear induced fluid motion. Test case setup is similar to the Poiseuille flow except that it features a moving wall at the top boundary. The analytical solution for the horizontal velocity distribution along the vertical axis (y) is given by
(22)
Here, the top wall velocity is taken as uw=0.1, and all other parameters are chosen as in Poiseuille flow case. The grid size of 7×16 with nqa=11 is chosen. Two test cases with and without pressure gradient has been simulated. The simulation ran for 900 time steps, and compared against the analytical solution (Fig. 6). The relative difference is found to be 0.3%.
Fig. 6.

The horizontal velocity profile of plane Couette flow obtained from QLBM for grid size of 7×16 with nqa=11 is compared with analytical solution24 at the end of time step t=900.

Fig. 6.

The horizontal velocity profile of plane Couette flow obtained from QLBM for grid size of 7×16 with nqa=11 is compared with analytical solution24 at the end of time step t=900.

Close modal

The lid-driven cavity is a standard CFD benchmark problem for validating various numerical schemes for low-speed flows. A square cavity with two side walls and the bottom are considered to be rigid, whereas the top wall is moving with a tangential velocity uw. Before proceeding with QLBM, we performed the classical LB simulation with and without non-linear terms in the equilibrium distribution function. We choose the grid size of 10×10. The lid velocity is chosen as uw=0.1 and Re=10. This gives ν=uwL/Re=0.1, and τ=3ν+0.5=0.8. This setup yielded no significant difference between the LB simulations with and without non-linearity at low Re. However, we observed a significant difference when Re=100, necessitating a larger grid size to maintain stability. Hence, in order to validate the QLBM simulation with the coarse grid in the absence of an analytical solution, we used icoFOAM solver from OpenFOAM-v2312 to obtain the reference solution.23 Simulation time of 1s with Δt=0.005s is used in OpenFOAM, whereas QLBM simulation is performed for 1000 time steps. Figure 7 shows the streamlines obtained from QLBM after 1000 time steps. Figure 8 shows the comparison of QLBM results with OpenFOAM for horizontal (ux) and vertical (uy) velocity profiles extracted, respectively, along the y- and x-axes at the center of the domain. The relative difference is found to be 5.9%.

Fig. 7.

Lid-driven cavity case in the 10×10 grid. Streamlines for Re=10 obtained from QLBM at the end of time step t=1000.

Fig. 7.

Lid-driven cavity case in the 10×10 grid. Streamlines for Re=10 obtained from QLBM at the end of time step t=1000.

Close modal
Fig. 8.

Lid-driven cavity case Re=10 in the 10×10 grid. Comparison of x and y velocities from QLBM with OpenFOAM solution. The x and y velocities are obtained along the y and x axes, respectively.

Fig. 8.

Lid-driven cavity case Re=10 in the 10×10 grid. Comparison of x and y velocities from QLBM with OpenFOAM solution. The x and y velocities are obtained along the y and x axes, respectively.

Close modal

Implementing the present algorithm on real QPU hardware is challenging for several reasons. First, when applying Hadamard gates to all qubits including the ancilla, all possible states in the Hilbert space have equal probabilities. Since we encode PDFs onto these probabilities, we must prepare the state of qubits before the start of QLBM. The number of CNOT gates required for state preparation39 will be O(2n). In the present algorithm, at each time step, LBM is expressed as a product of six unitary matrices, three each for collision and streaming operation. Of the six unitaries, two are diagonal matrices. Each of these matrix decomposed as controlled-NOT (CNOT) and U-gates using Qiskit decomposition tool, which uses the quantum Shannon decomposition (QSD) algorithm. In QSD, the circuit for n-qubit operator is decomposed into three rotational gates and n1 qubits operators; and this undergoes recursively until the circuit is made of single and two qubit gates.39  Table III shows the gate count obtained from the Poiseuille flow case with four different qubit counts. For each additional qubit, gate counts of both CNOT and U gate increases by a factor of 4, which matches with the theoretical estimate of O(4nqa). Each diagonal operator over nqa qubits will be decomposed into O(2nqa+1) gates, and the general unitary operator require O(2nqa1(2nqa1)) gates. Thus, the total gate count for two diagonal matrices and four generic unitaries is of O(2nqa+2(2nqa1)). Next, we report on the influence of different streaming-cum-boundary operators on the gate count. Table IV shows the gate count obtained for all test cases with nqa=11. The difference is observed in the streaming part Vs where the gate count of fully bounded case (Lid-cavity) is four times that of fully periodic case (ADE).

Table III.

Gate count obtained from the Poiseuille flow test case solved for different grid sizes (single time step).

nqa Grid points CNOT U
Uc Dc Vc Us Ds Vs Uc Dc Vc Us Ds Vs
24  (3×8)  31 020  5 325  31 020  31 020  5 325  16 505  62 168  4 100  62 168  62 160  4 069  31 435 
10  50  (5×10)  124 844  12 362  124 844  124 844  12 362  64 959  249 942  9 854  249 944  249 775  9 599  126 602 
11  112  (7×16)  500 908  27 619  500 908  500 908  27 619  256 491  1 002 326  20 901  1 002 327  1 001 847  20 838  504 379 
12  216  (9×24)  2 006 700  61 307  2 006 700  2 006 700  61 307  1 016 860  4 014 422  45 816  4 014 424  4 013 528  45 561  2 014 951 
nqa Grid points CNOT U
Uc Dc Vc Us Ds Vs Uc Dc Vc Us Ds Vs
24  (3×8)  31 020  5 325  31 020  31 020  5 325  16 505  62 168  4 100  62 168  62 160  4 069  31 435 
10  50  (5×10)  124 844  12 362  124 844  124 844  12 362  64 959  249 942  9 854  249 944  249 775  9 599  126 602 
11  112  (7×16)  500 908  27 619  500 908  500 908  27 619  256 491  1 002 326  20 901  1 002 327  1 001 847  20 838  504 379 
12  216  (9×24)  2 006 700  61 307  2 006 700  2 006 700  61 307  1 016 860  4 014 422  45 816  4 014 424  4 013 528  45 561  2 014 951 
Table IV.

Gate count obtained for all test cases with nqa=11 (single time step).

Case Grid points CNOT U
Uc Dc Vc Us Ds Vs Uc Dc Vc Us Ds Vs
ADE  100  (10×10)  500 908  27 619  500 908  500 908  27 619  134 397  1 002 327  21 093  1 002 325  1 002 066  20 838  257 167 
Poiseuille  112  (7×16)  256 491  1 002 326  20 901  1 002 327  1 001 847  20 838  504 379 
Couette  112  (7×16)  500 908  1 002 324  20 901  1 002 327  1 002 328  21 861  1 002 327 
Cavity  100  (10×10)  500 908  1 002 326  21 093  1 002 328  1 002 328  21 861  1 002 322 
Case Grid points CNOT U
Uc Dc Vc Us Ds Vs Uc Dc Vc Us Ds Vs
ADE  100  (10×10)  500 908  27 619  500 908  500 908  27 619  134 397  1 002 327  21 093  1 002 325  1 002 066  20 838  257 167 
Poiseuille  112  (7×16)  256 491  1 002 326  20 901  1 002 327  1 001 847  20 838  504 379 
Couette  112  (7×16)  500 908  1 002 324  20 901  1 002 327  1 002 328  21 861  1 002 327 
Cavity  100  (10×10)  500 908  1 002 326  21 093  1 002 328  1 002 328  21 861  1 002 322 

Another challenging aspect of the quantum simulation described by generic unitary matrices is the system memory of classical computers. The quantum state described by the probability amplitudes will be stored as an array of complex numbers. A single qubit system described by two complex numbers requires two 64-bit precision floating values, i.e., 2×2×64=256 bits =32 bytes of memory. An arbitrary 2×2 unitary matrix representing a quantum operation requires 4×2×64=512 bits =64 bytes of memory. In case of LBM with D2Q9 lattice with nqa=20 qubits, which can handle a grid size of 1024×1024, storing statevector and the single unitary matrix would require 0.017, and 18 000 Gigabytes (GB) of memory, respectively. Because our algorithm comprises of six unitaries, simulation with larger qubits are not feasible, unless the matrix can be built directly from the universal quantum gate set. It is worth to mention the importance of non-linear term on the computational requirements. For instance, the Carleman system at second order truncation35 requires nqa=log2(nf+nf2), whereas the present work requires nqa=log2(nf). Also, the computational cost of SVD on the classical computer is of O(23nqa), which we have not taken into account.

We presented a comprehensive quantum model for LBM that encompasses collision, streaming, boundary conditions, and external forcing terms for solving various fluid flow problems in the low Re regime of O(10). As a result of neglecting the non-linear term in the equilibrium function, the LB algorithm is written in terms of a matrix-vector product. Furthermore, we combined the streaming operation and boundary conditions as a single matrix operator. We utilized classical SVD to decompose the non-unitaries, which followed by representing the diagonal operator as a sum of unitaries. Thus, we have eight unitary operators describing the LB algorithm for a single time step. Compared to the generic linear combination of the unitaries approach, where the number of unitaries depends on the problem size, the fixed number of unitaries is a key advantage of the present algorithm. The algorithm has been tested with four different CFD benchmark problems performed using the “statevector” simulator. The results are in good agreement with the reference solutions.

The present QLB algorithm has certain limitations. First, the quantum state evolution performed with any of eight unitary matrices generated in a single time step of QLBM results in a larger circuit depth of O(105) (minimum estimate among all test cases), which is not feasible in present day QPUs, as they are made of qubits with shorter decoherence times and their two qubit gates are noisy/error-prone. However, the quantum circuit presented here for the collision and streaming remain same throughout the simulation. This allows us to explore various optimization strategies to obtain the shallow circuit depth, which in turn has a potential to be implementable on NISQ devices. However, initialization of the quantum state based on the constructed PDFs at the start of each simulation time step is required. Second, simulations with more than 15 qubits using the D2Q9 model are not feasible due to higher memory requirements. Even though the first limitation is hardware-related, circuit depth can be optimized by building a parameterized quantum circuit or performing a unitary circuit synthesis with more efficient algorithms. This helps efficiently implement the Carleman system for LBM, which includes non-linear terms, as it requires at least twice the resources required by the present algorithm. Further works include testing the present algorithm in real hardware or simulation with noise models. This helps to quantify (i) the single qubit noise on the final distribution functions and (ii) affect the number of measurements required to construct the state vector.

This research was kindly supported by the Quantum Computing Consortium that has been funded by the MAGNET program of the Israel Innovation Authority. The authors acknowledge the referees for the comments/suggestions which greatly improved the work.

The authors have no conflicts to disclose.

E. Dinesh Kumar: Conceptualization (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Steven H. Frankel: Conceptualization (equal); Funding acquisition (lead); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available within the article.

Let us denote the matrices defined in Eqs. (14) and (16) as
(A1)
(A2)
(A3)
(A4)
(A5)
(A6)
The complete LB operation can be written as (operation from right to left) as
(A7)
where |ϕ0 and |ϕ1 are the statevectors containing initial and the final DFs, respectively. As mentioned earlier, the choice of encoding df with padded zeros of the same size found to be unsuitable. Let |ϕ0=(df0), where 0 is the zero vector of size same as df. We will apply the matrix operations defined in Eq. (A7) (from right to left). Thus,
(A8)
(A9)
(A10)
(A11)
(A12)
(A13)
Thus, the imaginarity present in the second block of Eq. (A10) results in the additional term UsIDsDsVsUcIDcDcVcdf in Eq. (A13) affecting the post-streaming state vector. Now, let's choose |ϕ0=(dfdf), and apply the matrix operations as before
(A14)
(A15)
(A16)
(A17)
(A18)
(A19)
(A20)
Thus, the final state vector in Eq. (A20) contains undesired terms in both real and imaginary components. In order to get the correct DFs, before proceeding to streaming, the Hadamard needs to be applied on the ancilla, i.e., circuit (HIq) has to be applied. The resulting matrix operation is defined as Hcs=(HIq)=1/2(IIII). After collision operation defined in Eq. (A16), we proceed as follows:
(A21)
(A22)
(A23)
(A24)
(A25)

Finally, the real part in the first block of state vector results in the desired PDF for the next simulation time step.

1.
Alexander
F. J.
,
Chen
H.
,
Chen
S.
, and
Doolen
G.
,
Phys. Rev. A
46
,
1967
(
1992
).
2.
An
D.
,
Liu
J.-P.
, and
Lin
L.
,
Phys. Rev. Lett.
131
,
150603
(
2023
).
3.
Bautista
I.
,
Kreinovich
V.
,
Kosheleva
O.
, and
Nguyen
H. P.
, “
Why it is sufficient to have real-valued amplitudes in quantum computing
,” in
Soft Computing: Biomedical and Related Applications
(
Springer
,
2021
), pp.
131
136
.
4.
Berman
G.
,
Ezhov
A.
,
Kamenev
D.
, and
Yepez
J.
,
Phys. Rev. A
66
,
012310
(
2002
).
5.
Berry
D. W.
,
Childs
A. M.
, and
Kothari
R.
, “
Hamiltonian simulation with nearly optimal dependence on all parameters
,” in
IEEE 56th Annual Symposium on Foundations of Computer Science
(
IEEE
,
2015
), pp.
792
809
.
6.
Bharadwaj
S. S.
and
Sreenivasan
K. R.
,
Proc. Natl. Acad. Sci. U. S. A.
120
,
e2311014120
(
2023
).
7.
Bharadwaj
S. S.
and
Sreenivasan
K. R.
, “
Compact quantum algorithms that can potentially maintain quantum advantage for solving time-dependent differential equations
,” arXiv:2405.09767 (
2024
).
8.
Bharadwaj
S. S.
and
Sreenivasan
K. R.
, “
Simulating fluid flows with quantum computing
,” arXiv:2409.09736 (
2024
).
9.
Bravo-Prieto
C.
,
LaRose
R.
,
Cerezo
M.
,
Subasi
Y.
,
Cincio
L.
, and
Coles
P. J.
,
Quantum
7
,
1188
(
2023
).
10.
Brearley
P.
and
Laizet
S.
,
Phys. Rev. A
110
,
012430
(
2024
).
11.
Budinski
L.
,
Quantum Inf. Process.
20
,
57
(
2021
).
12.
Budinski
L.
,
Int. J. Quantum Inf.
20
,
2150039
(
2022
).
13.
Buick
J. M.
and
Greated
C. A.
,
Phys. Rev. E
61
,
5307
(
2000
).
14.
Cao
Y.
,
Daskin
A.
,
Frankel
S.
, and
Kais
S.
,
Mol. Phys.
110
,
1675
(
2012
).
15.
Childs
A. M.
and
Wiebe
N.
,
Quantum Inf. Comput.
12
,
901
(
2012
), see https://dl.acm.org/doi/abs/10.5555/2481569.2481570.
16.
Frapolli
N.
,
Chikatamarla
S. S.
, and
Karlin
I. V.
,
Phys. Rev. E
92
,
061301
(
2015
).
18.
Gilyén
A.
,
Su
Y.
,
Low
G. H.
, and
Wiebe
N.
, “
Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics
,” in
Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
(
ACM
,
2019
), pp.
193
204
.
19.
Harrow
A. W.
,
Hassidim
A.
, and
Lloyd
S.
, “
Quantum algorithm for linear systems of equations
,”
Phys. Rev. Lett.
103
,
150502
(
2009
).
20.
Ingelmann
J.
,
Bharadwaj
S. S.
,
Pfeffer
P.
,
Sreenivasan
K. R.
, and
Schumacher
J.
,
Comput. Fluids
281
,
106369
(
2024
).
21.
Itani
W.
,
Sreenivasan
K. R.
, and
Succi
S.
,
Phys. Fluids
36
,
017112
(
2024
).
22.
23.
Jasak
H.
,
Int. J. Nav. Archit. Ocean Eng.
1
,
89
(
2009
).
24.
Krüger
T.
,
Kusumaatmaja
H.
,
Kuzmin
A.
,
Shardt
O.
,
Silva
G.
, and
Viggen
E. M.
,
The Lattice Boltzmann Method
(
Springer International Publishing
,
2017
).
25.
Li
X.
,
Yin
X.
,
Wiebe
N.
,
Chun
J.
,
Schenter
G. K.
,
Cheung
M. S.
, and
Mülmenstädt
J.
, “
Potential quantum advantage for simulation of fluid dynamics
,” arXiv:2303.16550 (
2023
).
26.
Love
P. J.
and
Boghosian
B. M.
,
Physica A
362
,
210
(
2006
).
27.
Low
G. H.
and
Chuang
I. L.
,
Quantum
3
,
163
(
2019
).
28.
Meng
Z.
and
Yang
Y.
, “
Quantum computing of fluid dynamics using the hydrodynamic Schrödinger equation
,” arXiv:2302.09741 (
2023
).
29.
Mezzacapo
A.
,
Sanz
M.
,
Lamata
L.
,
Egusquiza
I. L.
,
Succi
S.
, and
Solano
E.
,
Sci. Rep.
5
,
13153
(
2015
).
30.
Penuel
J.
,
Katabarwa
A.
,
Johnson
P. D.
,
Farquhar
C.
,
Cao
Y.
, and
Garrett
M. C.
, “
Feasibility of accelerating incompressible computational fluid dynamics simulations with fault-tolerant quantum computers
,” arXiv:2406.06323 (
2024
).
31.
Pool
A. J.
,
Somoza
A. D.
,
Mc Keever
C.
,
Lubasch
M.
, and
Horstmann
B.
,
Phys. Rev. Res.
6
,
033257
(
2024
).
32.
Pravia
M. A.
,
Chen
Z.
,
Yepez
J.
, and
Cory
D. G.
,
Quantum Inf. Process
2
,
97
(
2003
).
33.
Qiskit Contributors,
Qiskit: An Open-Source Framework for Quantum Computing
(
Qiskit
,
2023
).
34.
Sanavio
C.
,
Mauri
E.
, and
Succi
S.
, “
Explicit quantum circuit for simulating the advection-diffusion-reaction dynamics
,” arXiv:2410.05876 (
2024
).
35.
Sanavio
C.
and
Succi
S.
,
AVS Quantum Sci.
6
,
023802
(
2024
).
36.
Schalkers
M. A.
and
Möller
M.
,
J. Comput. Phys.
502
,
112816
(
2024
).
37.
Schalkers
M. A.
and
Möller
M.
,
Quantum Inf. Process.
23
,
1
(
2024
).
38.
Schlimgen
A. W.
,
Head-Marsden
K.
,
Sager-Smith
L. M.
,
Narang
P.
, and
Mazziotti
D. A.
,
Phys. Rev. A
106
,
022414
(
2022
).
39.
Shende
V. V.
,
Bullock
S. S.
, and
Markov
I. L.
, “
Synthesis of quantum logic circuits
,” in
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
(
IEEE
,
2005
), pp.
272
275
.
41.
Succi
S.
,
Itani
W.
,
Sreenivasan
K.
, and
Steijl
R.
,
Europhys. Lett.
144
,
10001
(
2023
).
42.
Todorova
B. N.
and
Steijl
R.
,
J. Comput. Phys.
409
,
109347
(
2020
).
43.
Xu
A.-G.
,
Zhang
G.-C.
,
Gan
Y.-B.
,
Chen
F.
, and
Yu
X.-J.
,
Front. Phys.
7
,
582
(
2012
).
45.
Yepez
J.
,
Int. J. Mod. Phys. C
12
,
1273
(
2001
).