Quantum JPEG

The JPEG algorithm compresses a digital image by filtering its high spatial-frequency components. Similarly, we introduce a quantum algorithm that uses the quantum Fourier transform to discard the high spatial-frequency qubits of an image, downsampling it to a lower resolution. This allows one to capture, compress, and send images even with limited quantum resources for storage and communication. We show under which conditions this protocol is advantageous with respect to its classical counterpart.


I. INTRODUCTION
Digital images represent information in terms of arrays of pixels. Compression and downsampling can reduce the cost of data storage and transmission while preserving the original visual pattern [1]. An example is the Joint Photographic Experts Group (JPEG) algorithm [2], which operates in the spatial-frequency domain. The JPEG algorithm divides the input image into smaller subimages, then takes the discrete cosine transform of each of them. The high spatial frequencies are removed, reducing the amount of information stored at the cost of the quality of the output image.
Quantum image processing seeks to encode, manipulate, and retrieve visual information in a quantum-mechanical way [3-9]. In this paper, we discuss the downsampling and compression of images encoded as multiqubit quantum states. Using the quantum Fourier transform (QFT) [10,11], we provide an algorithm to discard the most significant qubits in the encoding register, thus filtering the high spatial-frequency components of the input image. This algorithm preserves the original visual pattern, downscaling its resolution and reducing the number of encoding resources in the register (see Fig. 1). Our implementation differs from [12-15] in both the encoding and the compression strategies. In particular, [12] leverages matrix-product-state truncation, while [13] and [14,15] propose hybrid and interpolation-based algorithms, respectively. It also differs from [16] (which appeared at the same time as the current paper), which focuses on image filtering rather than downsampling.
By taking into account the statistical reconstruction at the output, we show that our algorithm is advantageous over its classical counterpart, as long as the output resolution is sufficiently compressed. As a possible encoder implementation, we consider a multiatom lattice sensor.
In this section we discuss the quantum downsampling and compression algorithm. We vectorize and encode the image in the probabilities of a state loaded in the $n_0$-register, i.e. a register of $n_0$ qubits. After a QFT, we trace out the qubits that correspond to the high spatial-frequency components of the image. Then, we return to the computational basis with the inverse QFT ($\mathrm{QFT}^\dagger$), discarding the redundant qubits and compressing the image in the $n_2$-register, i.e. a register of $n_2 < n_0$ qubits. We discuss the relation between the downsampling parameter, i.e. the number of discarded qubits $n_0 - n_2$, and the loss in the output resolution.
A digital grayscale image is a grid of elements called pixels. Each pixel is associated with a finite and discrete quantity called gray value, which represents the brightness reproduced at each point of the grid (for colored images, there are three brightness values, one for each of the red, green, and blue channels). Classically, each pixel value is encoded in a string of $c$ bits, which determines the total number $L = 2^c$ of gray levels of the image, namely the depth. Mathematically, an image is described by a matrix $I$, with the size $N \times M$ representing its resolution and $NM$ the total number of pixels. For simplicity we consider only square images, i.e. with $N = M$. We represent $I$ as the $n_0$-qubit quantum state $|\Psi\rangle_0$ of Eq. (1), where $|j\rangle_0$ labels the elements of the computational basis on the $n_0$-qubit Hilbert space $\mathcal{H}_0$ and $\theta$ is a row-wise vectorization map that associates an $N \times N$ matrix to the $N^2$ column vector
$$\theta(I) = \left(I_{00}, \ldots, I_{0(N-1)}, I_{10}, \ldots, I_{(N-1)(N-1)}\right)^T. \tag{2}$$
We call $|\Psi\rangle_0$ a quantum image. In our notation, states or operations with subscript 0 refer to the $n_0$-register. This representation encodes the gray values of $N^2$ pixels in the probabilities of $n_0 = 2\log_2 N$ qubits, whose normalization follows by rescaling each pixel value to the total brightness of the image. This strategy yields a first lossless exponential compression of the image: from $N^2$ pixels to $n_0$ qubits. Below, we process the quantum image to further reduce the size of the encoding register.
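As a minimal illustration of this encoding, the following Python sketch vectorizes a small grayscale image row-wise and maps it to a normalized state on $n_0 = 2\log_2 N$ qubits. The helper name `encode_image` is hypothetical, and the choice of square-root amplitudes (so that the measurement probabilities equal the rescaled pixel values) is an assumption made for illustration.

```python
import numpy as np

def encode_image(image):
    """Return (amplitudes, n0) for a square N x N image with N a power of two.

    Assumption: amplitudes are the square roots of the pixel values rescaled
    to the total brightness, so that probabilities reproduce the image.
    """
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    if n_rows != n_cols or n_rows & (n_rows - 1):
        raise ValueError("expected a square image with power-of-two side")
    n0 = 2 * (n_rows.bit_length() - 1)        # n0 = 2 * log2(N) qubits
    probs = image.reshape(-1) / image.sum()   # row-wise vectorization, normalized
    return np.sqrt(probs), n0

image = np.array([[0, 64], [128, 255]])
amplitudes, n0 = encode_image(image)
print(n0, np.round(amplitudes**2, 3))
```

The printed probabilities follow the row-wise order of Eq. (2), so the first two entries belong to the first image row.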
The QFT operates on the $n_0$-register through the unitary $U_{\mathrm{QFT}_0}$. We use the subscript notation to specify the register on which the QFT operates. Consider the $n_1$-subregister made only of the first $n_1 = n_0 - \tilde{n}$ qubits of $\mathcal{H}_0$ and let $|j\rangle_1$ label the elements of its computational basis. Then $|j\rangle_0 = |l\rangle_{\tilde{n}} \otimes |m\rangle_1$, with $|l\rangle_{\tilde{n}}$ defined on the $\tilde{n}$-register. The inverse QFT operates naturally on the $n_1$-subregister through $U^\dagger_{\mathrm{QFT}_1}$. Our algorithm downsamples a quantum image $|\Psi\rangle_0$ from the $n_0$-register to the $n_2$-register, with a tunable parameter $\tilde{n}$, reducing the number of encoding qubits from $n_0$ to $n_2 = n_0 - 2\tilde{n}$. The output image resolution is $N/2^{\tilde{n}} \times N/2^{\tilde{n}}$.
We adopt the little-endian ordering, with the least significant qubit placed at the top of the register and labeled by $q = 0$, with $0 \leq q \leq m - 1$ and $m \in \{n_0, n_1, n_2\}$. We denote by $H$ the single-qubit Hadamard gate.

Algorithm 1 Quantum image downsampling
Input: Quantum image $|\Psi\rangle_0$
Parameter: Integer $\tilde{n} < n_0/2$
1: apply $H^{\otimes n_0}$ ▷ $n_0$-register
2: apply $U_{\mathrm{QFT}_0}$
3: for $q$ in $n_0$-register do

The algorithm operates as follows. After applying $H^{\otimes n_0}$, a first QFT is taken on the register in which $|\Psi\rangle_0$ is initially loaded. The $\tilde{n}$ most significant qubits are then discarded from the bottom of the register, which means averaging out the low-probability frequency components of the original image (Rule 1). Next, the $\mathrm{QFT}^\dagger$ is taken on the remaining qubits. This reduces the image resolution by a factor $2^{\tilde{n}}$ along only one of its axes. Then, the last $\tilde{n}$ positions of the first half of the initial register, which contain redundant information after the $\mathrm{QFT}^\dagger$, are discarded (Rule 2), and $H^{\otimes n_2}$ is applied. As we discuss below, this yields an $N/2^{\tilde{n}} \times N/2^{\tilde{n}}$ image, conformally downscaled along both its axes, and described by the state $\rho_2$. There may be different implementations of this protocol, which could use the superposition principle to achieve a Fourier compression that operates on both image axes simultaneously. However, we expect such generalizations to produce similar results, possibly optimizing the complexity of the whole algorithm or maximizing the signal-to-noise ratio of the final reconstruction.
The algorithm works independently of the presence of $H^{\otimes n_0}$ (Step 1) and $H^{\otimes n_2}$ (Step 10). As we show in Appendix A, the Hadamard gates improve the quality at the output, reducing the statistical fluctuations at each pixel while preserving the original image contrast. We report the circuit implementation of Alg. 1 in Fig. 2.
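The steps described above can be sketched numerically for small registers. The following NumPy code is an illustrative simulation under stated assumptions, not the authors' implementation: it assumes square-root amplitude encoding of the normalized pixel values, an ideal QFT matrix without bit reversal, that Rule 1 removes the $\tilde{n}$ most significant bits of the basis index, and that Rule 2 removes the bits at positions $n_0/2 - \tilde{n}, \ldots, n_0/2 - 1$ of the $n_1$-subregister. The function names are hypothetical. Because the discarded qubits are traced out in the computational basis, the output probabilities are obtained by summing the squared amplitudes over the discarded indices.

```python
import numpy as np

def qft_matrix(n):
    """Ideal QFT on n qubits (no bit reversal) as a 2^n x 2^n unitary."""
    d = 2 ** n
    idx = np.arange(d)
    return np.exp(2j * np.pi * np.outer(idx, idx) / d) / np.sqrt(d)

def hadamard_matrix(n):
    """Hadamard gate applied to each of n qubits."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.ones((1, 1))
    for _ in range(n):
        out = np.kron(out, h)
    return out

def downsample(image, n_tilde, use_hadamard=True):
    """Output probabilities of the downsampling steps for an N x N image."""
    image = np.asarray(image, dtype=float)
    half = image.shape[0].bit_length() - 1          # n0 / 2
    n0, n1, n2 = 2 * half, 2 * half - n_tilde, 2 * (half - n_tilde)
    # Assumed encoding: square roots of the normalized pixel values
    psi = np.sqrt(image.reshape(-1) / image.sum()).astype(complex)
    if use_hadamard:
        psi = hadamard_matrix(n0) @ psi
    phi = qft_matrix(n0) @ psi
    # Rule 1: index k = l * 2^n1 + m; tracing out l leaves one branch per l
    branches = phi.reshape(2 ** n_tilde, 2 ** n1)
    # Inverse QFT on the n1-subregister, applied to every branch (row)
    chi = branches @ qft_matrix(n1).conj()
    # Rule 2: split the n1 index into (row', t, col') and trace out t
    side = 2 ** (half - n_tilde)
    chi = chi.reshape(2 ** n_tilde, side, 2 ** n_tilde, side)
    chi = chi.transpose(0, 2, 1, 3).reshape(-1, 2 ** n2)
    if use_hadamard:
        chi = chi @ hadamard_matrix(n2)
    probs = (np.abs(chi) ** 2).sum(axis=0)          # sum over discarded branches
    return probs.reshape(side, side)

image = np.arange(64, dtype=float).reshape(8, 8) + 1.0
print(np.round(downsample(image, n_tilde=1), 3))
```

The dense matrices restrict this sketch to a few qubits. It is meant only to make the order of the operations explicit, and a different qubit-ordering convention would change which indices are traced out.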
The reconstruction of the output image requires complete knowledge of the probabilities $p_j$ of the computational-basis outcomes of $\rho_2$ [Eq. (5)]. Since the gray values are probabilistically encoded in the output state, a single computational-basis measurement on $\rho_2$ would produce one white pixel in the image plane, with position specified by the output bitstring. The complete reconstruction of the image therefore requires full knowledge of the probability distribution of Eq. (5). Let $S$ be the number of shots, obtained by repeating the algorithm and the measurement $S$ times, and $L$ be the number of gray levels desired at the output, e.g. $L = 256$ for an 8-bit image. Let $f_j$ be the frequency of each outcome. We identify the color white with the pixel with the highest frequency $f_w = \max_j f_j$, with $f_w \in [0, 1]$ and $f_j \in [0, f_w]$ $\forall j \in \{0, 1, \ldots, d^2 - 1\}$, where $d \times d$ is the output resolution. Each gray value is reconstructed as
$$g_j = \left\lfloor L \frac{f_j}{f_w} \right\rfloor,$$
with $g_j = 0, 1, \ldots, L - 1$ and $g_w = L$. A measurement on $\rho_2$ corresponds to two possible outcomes: a shot is assigned to the $j$th bin, i.e. to the $j$th pixel, with probability $p_j$, or it is not, with probability $1 - p_j$. Under the normal approximation and at the 95% confidence level [18], the probabilities can be estimated as
$$p_j \simeq f_j \pm 2\sqrt{\frac{f_j (1 - f_j)}{S}},$$
so that the fluctuation of the corresponding gray value reads
$$\Delta g_j \simeq \frac{2L}{f_w}\sqrt{\frac{f_j (1 - f_j)}{S}}.$$
By requiring that $g_j$ fluctuates by at most one gray level, we get $S \geq 4 f_j (1 - f_j) L^2 / f_w^2$. Any realistic image with at least two non-black pixels has $f_w < 0.5$, yielding $S \geq 4L^2/f_w$. For an all-purpose estimation, consider a $d \times d$ completely white image as the worst-case scenario. In this case $f_w = 1/d^2$ and
$$S \geq 4 L^2 d^2, \tag{8}$$
where each pixel uniformly collects an average of $4L^2$ shots.
Eq. (8) provides a conservative estimation for every image. The optimal sample size depends on the proportion between dark and bright pixels, which determines the value of $f_w$. For a specific image, the reconstruction can be actively optimized by measuring the fluctuations each time a new shot is collected, then stopping the experimental repetitions as soon as the desired standard deviation is reached for the chosen number of pixels.
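The statistical reconstruction can be illustrated with a short Python sketch. It samples $S$ shots from a given output distribution and converts the observed counts into gray values. The function names are hypothetical, and the rounding rule $g_j = \lfloor L f_j / f_w \rfloor$ is an assumption chosen so that the brightest pixel is assigned the value $L$. The worst-case sample size follows the conservative estimate $4L^2 d^2$ with $d^2 = 2^{n_2}$.

```python
import numpy as np

def worst_case_shots(num_levels, n2):
    """Conservative sample size 4 L^2 d^2, with d^2 = 2^n2 output pixels."""
    return 4 * num_levels ** 2 * 2 ** n2

def reconstruct(probs, shots, num_levels, rng):
    """Sample `shots` outcomes and map the counts to gray values.

    Assumption: g_j = floor(L * f_j / f_w). Integer counts are used so that
    the ratio f_j / f_w is evaluated exactly.
    """
    counts = rng.multinomial(shots, probs)
    return (num_levels * counts) // counts.max()

rng = np.random.default_rng(seed=1)
probs = np.array([0.4, 0.3, 0.2, 0.1])    # toy 2 x 2 output distribution
shots = worst_case_shots(num_levels=16, n2=2)
print(shots, reconstruct(probs, shots, 16, rng).reshape(2, 2))
```

Lowering `shots` well below the conservative estimate makes the reconstructed gray values fluctuate from run to run, which is the effect quantified by the bound above.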
In terms of number of gates, Alg. 1 provides an exponential speedup over its classical counterpart. However, the output reconstruction requires multiple runs of the algorithm, increasing the overall cost of the protocol. We show that an advantage remains, which increases when registers of different sizes are all downsampled to the same resolution. Consider an $N \times N$ ($n_0$-qubit) image and the $N/2^{\tilde{n}} \times N/2^{\tilde{n}}$ ($n_2$-qubit) output of Alg. 1, with $L = 2^c$ gray levels and downsampling parameter $\tilde{n} = (n_0 - n_2)/2$. For $n_1 \ll n_0$, the complexity of $\mathrm{QFT}_0$ dominates that of $\mathrm{QFT}^\dagger_1$, while for $n_1 \simeq n_0$ both contribute to the same order. For this reason, their composition can be upper bounded by $O(2n_0^2)$ gates. A conservative reconstruction requires $4L^2 2^{n_2}$ shots, for a total cost of $Q(n_0, n_2) = O(8L^2 n_0^2 2^{n_2})$ operations. A single fast Fourier transform (FFT) requires $O(2N^2 \log_2 N)$ operations [19], for a total classical cost $C(n_0) = O(2n_0 2^{n_0})$. Then, the relative cost in terms of operations and statistics is
$$\frac{Q(n_0, n_2)}{C(n_0)} = 4L^2 n_0 2^{n_2 - n_0}. \tag{9}$$
For example, downsampling a $128 \times 128$ (14-qubit) black-and-white image to an $8 \times 8$ (6-qubit) resolution requires $\sim 12\%$ fewer operations using Alg. 1 than the FFT.
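The cost comparison can be checked numerically. The sketch below evaluates the quantum and classical operation counts quoted in the text, dropping the big-O notation and treating the leading terms as exact counts, which is an assumption made only for illustration. The function names are hypothetical.

```python
def quantum_cost(n0, n2, num_levels):
    """Q(n0, n2) = 8 L^2 n0^2 2^n2: gates per run times conservative shots."""
    return 8 * num_levels ** 2 * n0 ** 2 * 2 ** n2

def classical_cost(n0):
    """C(n0) = 2 n0 2^n0: FFT-based classical cost."""
    return 2 * n0 * 2 ** n0

def relative_cost(n0, n2, num_levels):
    """Ratio Q / C; values below 1 favor the quantum protocol."""
    return quantum_cost(n0, n2, num_levels) / classical_cost(n0)

# 128 x 128 black-and-white image (n0 = 14, L = 2) downsampled to 8 x 8 (n2 = 6)
print(relative_cost(14, 6, 2))  # 0.875, i.e. about 12% fewer operations
```

For a fixed output size $n_2$, the ratio decreases as $n_0 2^{-n_0}$, so the advantage grows with the size of the input register, while it shrinks quadratically with the number of gray levels $L$.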
In Fig. 3, we simulate the downsampling and reconstruction of an 8-bit $512 \times 512$ image, prepared in an 18-qubit register. The output is shown for different values of $n_2$. In Fig. 3e, we plot the statistical fluctuations for a reconstruction performed with $L^{3/2} d^2$ shots. We show that this order of magnitude provides adequate statistics for an average reconstruction.
The vectorized representation of Eq. (1) allows the processing of high-resolution images even for small NISQ registers [20]. For example, let $b$ be the maximum number of qubits supported by a hypothetical hardware device, and consider an $N \times N$ image exceeding the encoding capability of the hardware, with $N^2 > 2^b$. The compression can still be achieved by first splitting the image into $2^{b/2} \times 2^{b/2}$ subimages, then processing it as a sequence of vectorized quantum states $\{|\Psi^{(m)}\rangle_0\}_m$, each given as input to Alg. 1, with $m = 0, 1, \ldots, N^2/2^b - 1$. A similar pre-encoding is also performed by the classical JPEG algorithm, which splits the original image into $8 \times 8$ blocks before mapping them to the spatial-frequency domain [2].
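The splitting step can be sketched in a few lines of NumPy. The helper `split_into_blocks` is hypothetical. It assumes an even qubit budget $b$ and an image side that is a multiple of $2^{b/2}$, and it returns the subimages in row-major block order, each of which could then be vectorized and encoded separately.

```python
import numpy as np

def split_into_blocks(image, b):
    """Split an N x N image into N^2 / 2^b subimages of side 2^(b/2)."""
    image = np.asarray(image)
    side = 2 ** (b // 2)
    n = image.shape[0]
    if b % 2 or image.shape[0] != image.shape[1] or n % side:
        raise ValueError("need even b and a square image divisible into blocks")
    k = n // side                                  # blocks per axis
    blocks = image.reshape(k, side, k, side).swapaxes(1, 2)
    return blocks.reshape(k * k, side, side)

image = np.arange(64).reshape(8, 8)
blocks = split_into_blocks(image, b=4)             # 4-qubit budget: 4 x 4 subimages
print(blocks.shape)
```

Each block fits a $b$-qubit register, at the price of running the downsampling once per block.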
Finally, we outline an alternative algorithm that compresses the image without downscaling its resolution. Consider $U_{\mathrm{QFT}_0} |\Psi\rangle_0$. Instead of discarding the last $\tilde{n}$ qubits of the $n_0$-register, we apply Rule 1 to reinitialize them to $|0 \ldots 0\rangle_{\tilde{n}}$. After taking $\mathrm{QFT}^\dagger_0$ and skipping Rule 2, we obtain $\rho_0$, without modifying the size of the original register. This implementation closely relates to the classical JPEG compression, in particular when combined with the previous subimage encoding. As opposed to Alg. 1, this algorithm preserves the resolution of the input image. For this same reason, however, the full output reconstruction shows no computational advantage with respect to its classical counterpart.

III. HARDWARE ENCODER: THE QUANTUM CAMERA
In this section, we show how to encode an image into the $n_0$-register using a multiatom lattice sensor analogous to a classical charge-coupled device (CCD) [21]. This provides a hardware alternative to the Grover-Rudolph preparation scheme [22]. The implementation below precedes the application of Alg. 1, whose working principle remains independent of the specific hardware encoder model or the preparation strategy. The output of the sensor is related to the $n_0$-register of Alg. 1, in which the fingerprint of the image is loaded as the multiqubit quantum state $|\Psi\rangle_0$.
We consider a lattice made of $N^2$ identical atoms with evenly spaced energy levels. Let $(m, n)$ label their position on the $N \times N$ lattice. Each atom can be modeled as a two-level system, namely a qubit with computational basis $\{|0\rangle_{mn}, |1\rangle_{mn}\}$. We initialize the lattice in the vacuum state $|b\rangle = \bigotimes_{m', n'} |0\rangle_{m'n'}$, which represents a completely black image with all the qubits in the ground state.
We model the interaction between the lattice and the electromagnetic field using a multimode and multiatom Jaynes-Cummings Hamiltonian $H_I$, in the resonant rotating-wave approximation [23-25], where $\gamma_{mn}(k)$ is the Fourier transform of the spatial coupling function $g_{mn}(r)$ between the coherent photons and a qubit located at $(m, n)$, i.e. $\gamma_{mn}(k) = \int dr\, g_{mn}(r) e^{-ik \cdot r}$. We assume that $g_{mn}$ is compactly supported $\forall m, n$ and that $\mathrm{supp}(g_{mn}) \cap \mathrm{supp}(g_{m'n'})$ is negligible for $(m, n) \neq (m', n')$.
The interaction Hamiltonian $H_I$ encodes the input coherent state as a quantum image in the user's sensor. See Appendix B for a complete derivation and a discussion about time evolution. Consider a shutter that initially prevents the photons from interacting with the qubits, so that the initial state reads $|I\rangle\rangle = |\alpha_\psi\rangle \otimes |b\rangle$. When the shutter opens, a photon may release energy in the sensor, exciting one of the qubits as $|0\rangle_{mn} \to |1\rangle_{mn}$. Globally, a single interaction may give no excitation, a single excitation, or multiple-qubit excitations, each contributing a corresponding term to the final state. The vacuum term can be neglected in post-selection, by discarding any completely black image at the output. Then, the single-qubit contributions become dominant and the final state reads $|F\rangle\rangle \simeq |\alpha_\psi\rangle \otimes |\Omega\rangle$, where $|\Omega\rangle$ is given in Eqs. (15) and (16), with $P_{mn} = \mathrm{supp}(g_{mn})$ and $C$ an overall normalization constant. In these equations, the interaction term $g_{mn}(r)$ acts as a spatial transfer function that discretizes the image in the region $P_{mn}$, encoding the light intensity in the probability of the corresponding qubit to occupy the higher-energy state. In Fig. 4, we plot the probabilities encoded from a superposition of two Gaussian packets. We compare the numerical results with the probability density of the input state, showing that they correctly reproduce the spectral amplitudes of $|\alpha_\psi\rangle$.
The state $|\Omega\rangle$ represents the image using a sub-optimal one-hot encoding, i.e. with $N^2$-qubit contributions in which only a single qubit is excited. We optimize the output of the hardware encoder by appending a one-hot to binary converter [26,27], which uses a combination of CNOTs to re-encode the image in $n_0 = 2\log_2 N$ qubits, yielding the state $|\Psi\rangle_0$ of Section II, with $w = \theta(v)$. This conversion requires $O(N^2)$ operations, which is the same as the readout cost of the CCD bidirectional shift registers. After this step, the user applies Alg. 1 to discard $2\tilde{n}$ qubits from the encoding register. This compresses $|\Psi\rangle_0$ into $\rho_2$, reducing the requirements for its storage, e.g. the number of qubits of a quantum random access memory [28], as well as the dimension of the channel needed for communicating the image to another user [29].

IV. CONCLUSIONS
In this paper, we introduced a novel quantum algorithm for image downsampling and compression. Our protocol leverages the QFT to downscale the original visual pattern, which is preserved while reducing the number of encoding qubits. This opens new perspectives for quantum imaging and quantum image processing. In terms of number of gates only, our scheme provides an exponential speedup over its classical FFT-based counterpart and over interpolation-based quantum algorithms. Although its cost rises when reconstructing full images, we investigated the statistics at the output, showing that the advantage increases consistently with the size of the input register. Whenever a full reconstruction is not needed, e.g. when using our algorithm as a quantum pre-processing operation rather than a standalone module, the theoretical advantage is completely recovered.
As a possible hardware implementation, we designed a multiatom lattice sensor for coherent light. Modeled by the Jaynes-Cummings Hamiltonian, this encoder maps the light intensity at each lattice location to the probability that the corresponding qubit is excited by a single-photon interaction. In this framework, two users can capture and share images even with limited communication resources, using our downsampling algorithm to forward the image state through lower-dimensional channels.
The above procedure is implicitly summarized in Eq. (5), in which the compressed state $\rho_2$ is obtained by applying (Rule 1) and (Rule 2) to $|\Phi\rangle_0$. For the state of Eq. (A1), we find an output that reproduces the same input pattern, but downsampled and represented with fewer qubits than $|\Psi\rangle_0$. The results improve when $|\Psi\rangle_0$ is encoded in the X basis. Let $H$ be the single-qubit Hadamard operator. As shown in Fig. 2, we introduce $H^{\otimes n_0}$ and $H^{\otimes n_2}$ respectively before $\mathrm{QFT}_0$ and after (Rule 2), which gives the complete form of Alg. 1. In Fig. 5, we simulate the downsampling of a one-dimensional array pattern, with and without using the Hadamard gates. Without the Hadamard gates, Alg. 1 reproduces the original input, but with lowered contrast. Moreover, non-symmetrical boundary artifacts occur for a ladder, i.e. linear and monotonic, pattern. This effect increases with the number of discarded qubits, i.e. inversely with the size of $\mathrm{QFT}^\dagger_1$. In both cases, the Hadamard gates improve the quality of the output: they regularize the contrast, while removing the artifacts in the ladder pattern. In Fig. 6 we show that, for a generic $512 \times 512$ image, the Hadamard gates also reduce the statistical fluctuations in the output reconstruction, while keeping the same number of shots.

APPENDIX B: JAYNES-CUMMINGS HARDWARE ENCODING
In this section we show how to obtain the state of Eq. (15) from the multimode and multiatom Jaynes-Cummings model introduced in [23,24], also known as the multimode Tavis-Cummings model [32]. We adopt units with $\hbar = 1$. We work in the interaction picture and in the rotating-wave approximation [25], in which the free Hamiltonian reads
$$H_F = \omega_0 \sum_{m,n} \sigma^{(3)}_{mn} + \int dk\, \omega(k)\, a^\dagger(k) a(k), \tag{B1}$$
with $\sigma^{(3)}_{mn} = |1\rangle_{mn}\langle 1|_{mn} - |0\rangle_{mn}\langle 0|_{mn}$, while the interaction term is the Hamiltonian $H_I$ introduced in Section III. The total Hamiltonian of the system reads $H = H_F + \lambda H_I$, where $\lambda$ is an overall coupling constant. Prepare the system in the state $|I\rangle\rangle = |\alpha_\psi\rangle \otimes |b\rangle$. Since $[H_F, H_I] = 0$, the evolution of $|I\rangle\rangle$ is completely driven by the unitary operator generated by $H_I$, which yields precisely the result shown in Eqs. (15) and (16).

FIG. 6. Statistical reconstruction at the algorithm output, without (a) and with (b) the Hadamard gates. The input image represents a Shepp-Logan phantom, commonly used in medical tomography [31], which is prepared as an 18-qubit quantum state and downsampled for $\tilde{n} = 2$. The simulation is performed with Qiskit Aer and $2^{14}$ shots. The choice of a sub-optimal sample size purposely showcases the effect of the statistical fluctuations. The histograms plot the number of pixels with respect to their standard deviation, obtained by repeating the reconstruction 20 times. The vertical axis is in $\log_{10}$ scale. The inset displays the standard deviation at each pixel location. (a) Downsampling in the standard computational basis, i.e. without using the Hadamard gates. (b) Downsampling in the X basis, i.e. with the Hadamard gates, as reported in Fig. 2. Statistical fluctuations reduce when using the Hadamard gates, while keeping the same number of shots.

FIG. 2. Downsampling of a quantum image into a register of fewer qubits. The image is initially encoded in the probabilities of $|\Psi\rangle_0$. The most significant qubit is represented at the bottom of the $n_0$-register. The QFT is taken. (Rule 1) Having chosen an integer $\tilde{n} < n_0/2$, take a $\mathrm{QFT}^\dagger$ on the first $n_0 - \tilde{n}$ qubits and discard the remaining part of the register. (Rule 2) Discard the same number of qubits from those register positions that correspond to the last $\tilde{n}$ qubits of the first half of the $n_0$-register. The algorithm compresses the image $|\Psi\rangle_0$ into the state $\rho_2$, reducing the number of pixels from $N^2$ to $N^2/4^{\tilde{n}}$. The Hadamard gates reduce the statistical fluctuations at the output, while preserving the original image contrast (see Appendix A for a discussion). The trade-off between the output resolution and the amount of resources saved (the number of discarded qubits) is controlled by the choice of $\tilde{n}$.

FIG. 3. Simulated results for an 8-bit digital image with original resolution of $512 \times 512$ pixels. (a) The image is initially vectorized and prepared as an 18-qubit quantum state, then Alg. 1 is applied. Panels (b-d) show the reconstructed output with respect to the number $n_2$ of qubits in the output register. The Hadamard gates improve the reconstruction, reducing the statistical fluctuations while correcting the image contrast (see Appendix A for a discussion). Simulation performed with Qiskit Aer and $2^{n_2} \times 256^{3/2}$ shots. For comparison, all the images are rescaled to the same size. Aliasing starts to appear on the cat's whiskers. (b) $n_2 = 16$, $256 \times 256$ pixels. (c) $n_2 = 14$, $128 \times 128$ pixels. (d) $n_2 = 10$, $32 \times 32$ pixels. (e) Statistical fluctuations at the output. The reconstruction of (c) is repeated 20 times. The histogram plots the number of pixels with respect to their standard deviation. The inset shows the standard deviation at each pixel location.
Consider a classical image encoded in a multimode coherent state $|\alpha_\psi\rangle = D(\alpha_\psi) |\emptyset\rangle$, where $|\emptyset\rangle$ denotes the vacuum state and
$$D(\alpha_\psi) = \exp\left(\alpha A^\dagger_\psi - \alpha^* A_\psi\right) \tag{10}$$
the multimode displacement operator. We work in the far-field approximation, with $k$ and $r = (x, y)$ respectively labeling the momentum and the transversal position on the image plane. The creation operator
$$A^\dagger_\psi = \int dk\, \psi(k)\, a^\dagger(k) \tag{11}$$
encodes the Fourier transform $\psi(k) = \int dr\, \psi(r) e^{-ik \cdot r}$ of the spectral amplitudes of $|\alpha_\psi\rangle$. In our notation, $\psi$ describes the classical analog image encoded in the coherent state, while $|\Psi\rangle_0$ denotes the same image but discretized, vectorized, and probabilistically encoded in the $n_0$-register of Section II. In this section, we show how to connect these two representations.