Compressed sensing (CS) is a promising approach to reduce the number of measurements in photoacoustic tomography (PAT) while preserving high spatial resolution. This allows increased measurement speed and reduced system costs. Instead of collecting point-wise measurements, in CS one uses various combinations of pressure values at different sensor locations. Sparsity is the main condition that allows recovery of the photoacoustic (PA) source from compressive measurements. In this paper, a different concept enabling sparse recovery in CS PAT is introduced. This approach is based on the fact that the second time derivative applied to the measured pressure data corresponds to the application of the Laplacian to the original PA source. As typical PA sources consist of smooth parts and singularities along interfaces, the Laplacian of the source is sparse (or at least compressible). To efficiently exploit the induced sparsity, a reconstruction framework is developed to jointly recover the initial and modified sparse sources. Reconstruction results with simulated as well as experimental data are given.

## I. INTRODUCTION

Photoacoustic tomography (PAT) is a non-invasive hybrid imaging technology that beneficially combines the high contrast of pure optical imaging with the high spatial resolution of pure ultrasound imaging (see Refs. 1–3). The basic principle of PAT is as follows (see Fig. 1). A semitransparent sample (such as a part of a human body) is illuminated with short pulses of optical radiation. A fraction of the optical energy is absorbed inside the sample, which causes thermal heating, expansion, and a subsequent acoustic pressure wave depending on the interior absorbing structure of the sample. The acoustic pressure is measured outside of the sample and used to reconstruct an image of the interior.

In this paper, we consider PAT in heterogeneous acoustic media, where the acoustic pressure satisfies the wave equation^{3,4}

Here $\mathbf{r} \in \mathbb{R}^d$ is the spatial location, $t \in \mathbb{R}$ the time variable, $\Delta_{\mathbf{r}}$ the spatial Laplacian, *c*(**r**) the speed of sound, and *f*(**r**) the photoacoustic (PA) source that has to be recovered. The factor $\delta'(t)$ denotes the distributional derivative of the Dirac *δ*-distribution in the time variable, and the product $\delta'(t)\, f(\mathbf{r})$ on the right-hand side of the equation represents the PA source due to the pulsed optical illumination. The wave equation (1) is augmented with the initial condition *p*(**r**,*t*) = 0 on {*t* < 0}. The acoustic pressure is then uniquely defined and referred to as the causal solution of Eq. (1). Both cases *d* = 2,3 for the spatial dimension are relevant in PAT: The case *d* = 3 arises in PAT using classical point-wise measurements; the case *d* = 2 is relevant for PAT with integrating line detectors (ILDs).^{5–7}

To recover the PA source, the pressure is measured with sensors distributed on a surface or curve outside of the sample; see Fig. 1. Using standard sensing, the spatial sampling step size limits the spatial resolution of the measured data and therefore the spatial resolution of the final reconstruction. Consequently, high spatial resolution requires a large number of detector locations. Ideally, for high frame rate, the pressure data are measured in parallel with a large array made of small detector elements. However, producing a detector array with a large number of parallel readouts is costly and technically demanding. In this work, we use techniques of compressed sensing (CS) to reduce the number of required measurements and thereby accelerate PAT while keeping high spatial resolution.^{8–11}

CS is a novel sensing paradigm, introduced in Refs. 12–14, that allows capturing high resolution signals using fewer measurements than advised by Shannon's sampling theory. The basic idea is to replace point samples by linear measurements, where each measurement consists of a linear combination of sensor values. This offers the ability to reduce the number of measurements while keeping high spatial resolution. One crucial ingredient enabling CS PAT is sparsity, which refers to the requirement that the unknown signal is sparse, in the sense that it has only a small number of entries that are significantly different from zero (possibly after a change of basis).

### A. Main contributions

In this work, we develop a new framework for CS PAT that brings sparsity into play. Our approach is rooted in the concept of sparsifying temporal transforms developed for PAT in Refs. 10 and 11 for two and three spatial dimensions. However, the method in the present paper extends and simplifies this transform approach considerably. First, it applies equally to any detection surface and arbitrary spatial dimension. Second, the new method can even be applied to heterogeneous media. In order to achieve this, we use the second time derivative applied to the pressure data as a sparsifying transform. As opposed to Refs. 10 and 11, where the transform was used to sparsify the measured signals, in the present work we exploit it to obtain sparsity in the original imaging domain.

Our new approach is based on the following. Consider the second time derivative $\partial_t^2 p(\mathbf{r},t)$ of the PA pressure. We will show that this transformed pressure again satisfies the wave equation, however, with the modified PA source $c^2(\mathbf{r})\, \Delta_{\mathbf{r}} f(\mathbf{r})$ in place of the original PA source *f*(**r**). If the original PA source consists of smooth parts and jumps, the modified source consists of smooth parts and sparse structures; see Fig. 2 for an example. This enables the use of efficient CS reconstruction algorithms based on sparse recovery. One possible approach is based on the following two-step procedure. First, recover an approximation $h \simeq c^2 \Delta_{\mathbf{r}} f$ via *ℓ*^{1}-minimization. In a second step, recover an approximation to *f* by solving the Poisson equation $\Delta_{\mathbf{r}} f = h/c^2$.
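The sparsification effect can be illustrated with a minimal sketch (not part of the paper's code): the discrete Laplacian of a piecewise-constant phantom is sparse even though the phantom itself is not. The `laplacian` helper and the disk phantom below are hypothetical stand-ins for the phantom of Fig. 2, assuming a constant speed of sound *c* = 1.

```python
import numpy as np

def laplacian(f):
    """Discrete Laplacian via the symmetric five-point stencil.

    np.roll wraps periodically, which is harmless here because the
    phantom vanishes near the grid boundary.
    """
    lap = -2 * f.ndim * f
    for ax in range(f.ndim):
        lap = lap + np.roll(f, 1, axis=ax) + np.roll(f, -1, axis=ax)
    return lap

# Piecewise-constant phantom: a disk of radius 20 on a 128 x 128 grid.
N = 128
y, x = np.mgrid[:N, :N]
f = ((x - N / 2) ** 2 + (y - N / 2) ** 2 <= 20 ** 2).astype(float)

h = laplacian(f)  # stands in for c^2 * Laplacian(f) with c = 1

# f has ~1250 nonzero pixels, while h is nonzero only along the
# disk boundary, i.e., the modified source is far sparser.
print(np.count_nonzero(f), np.count_nonzero(h))
```

The Laplacian vanishes on the constant interior and exterior and is supported only on a thin ring around the interface, which is exactly the sparsity exploited by the *ℓ*^{1}-recovery step.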

While the above two-stage approach recovers the singularities of the PA source very well in our numerical tests, it also produces disturbing low-frequency artifacts. Therefore, in this paper we develop a completely novel strategy for jointly recovering *f* as well as its Laplacian. Similarly to the two-stage procedure, the joint sparse reconstruction approach aims at finding the sparsified PA source *h*, corresponding to the second derivative of the measured data, via *ℓ*^{1}-minimization. Additionally, it requires the original PA source *f* to be consistent with the original data. Finally, the two source terms are connected by adding the constraint $\Delta_{\mathbf{r}} f = h/c^2$. The resulting optimization problem will be formally introduced in Sec. III C. In the case of noisy data, we consider a penalized version that can be efficiently solved via various modern optimization techniques such as forward-backward splitting.

### B. Outline

In Sec. II we describe PAT and existing CS approaches, wherein we focus on the role of sparsity in PAT. The proposed framework for CS PAT will be presented in Sec. III. This includes the sparsification of the PA source and the joint sparse reconstruction algorithm. Numerical and experimental results are presented in Sec. IV. The paper concludes with a summary and an outlook on future research presented in Sec. V.

## II. COMPRESSED PAT

### A. PAT

Suppose that $V \subseteq \mathbb{R}^d$ is a bounded region and let **r**_{i} for *i* = 1,…,*n* denote admissible detector locations distributed on the boundary *∂V*. We define the forward wave operator **W** that maps a PA source *f* vanishing outside *V* to the corresponding pressure data by

where *p*(**r**,*t*) is the solution of Eq. (1). The inverse source problem of PAT is to recover the PA source *f* from the data $\mathbf{W}f(\mathbf{r}_i, t)$, known for a finite number of admissible detector locations **r**_{i}. The measured data **W***f* are considered fully sampled if the transducers are densely located on the whole boundary *∂V* such that the function *f* can be stably reconstructed from the data. Finding necessary and sufficient sampling conditions for PAT is still ongoing research.^{15}

Let us mention that most theoretical works on PAT consider the continuous setting, where the transducer locations comprise all points of a surface or curve $\Gamma \subseteq \partial V$; see Refs. 14–20. On the other hand, most works on discrete settings consider both discrete spatial and time variables.^{15,21} The above setting [Eq. (2)] has been considered in a few works.^{10,11,22} It reflects well the high sampling rate of the time variable in many practical PAT systems.

### B. Compressive measurements in PAT

The number *n* of detector positions in Eq. (2) is directly related to the resolution of the final reconstruction. Namely, consider the case of *V* being the disk of radius *R* and *f* being essentially wavenumber limited with maximal wavenumber *λ*_{0}. Then *N*_{φ} ≥ 2*Rλ*_{0} equally spaced transducers are sufficient to recover *f* with small error; see Ref. 15. This condition, however, requires a very high sampling rate, especially when the PA source contains narrow features, such as blood vessels, or a sharp interface. Consequently, full sampling in PAT is costly and time consuming.
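As a quick numerical illustration of this sampling condition (the function name and example values are ours, not from Ref. 15), the smallest admissible transducer count is simply the ceiling of 2*Rλ*_{0}:

```python
import numpy as np

def min_transducers(R, lam0):
    """Smallest integer N_phi satisfying N_phi >= 2 * R * lam0 (see Ref. 15).

    R is the radius of the disk V and lam0 the maximal wavenumber of f.
    """
    return int(np.ceil(2 * R * lam0))

# A source with maximal wavenumber 100 on the unit disk already
# requires 200 equally spaced transducers for stable full sampling.
print(min_transducers(R=1.0, lam0=100))  # 200
```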

To reduce the number of measurements while preserving resolution, we use CS measurements in PAT. Instead of collecting *n* individually sampled signals as in Eq. (2), we take CS measurements

with *m* ≪ *n*.^{10,11} We previously proposed to take the measurement matrix **A** in Eq. (3) as the adjacency matrix of a lossless expander graph. Hadamard matrices have been proposed in Refs. 9 and 23. In this work, we take **A** as a random Bernoulli matrix with entries $\pm 1/\sqrt{m}$ with equal probability, or as a Gaussian random matrix consisting of independent identically distributed (i.i.d.) $N(0,1/m)$-Gaussian random variables in each entry. These choices are validated by the fact that Gaussian and Bernoulli random matrices satisfy the restricted isometry property (RIP) with high probability (see Sec. II C). Subsampling matrices, which do not satisfy the RIP, have been used in Ref. 24.
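A sketch of how such compressive measurements could be generated is given below, assuming the normalizations stated above; the detector and time-sample counts are illustrative placeholders, and the pressure data are random stand-ins for actual signals.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_matrix(m, n):
    """Random Bernoulli matrix with entries +-1/sqrt(m), equal probability."""
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

def gaussian_matrix(m, n):
    """Random matrix with i.i.d. N(0, 1/m)-Gaussian entries."""
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

# Compress n = 200 detector channels into m = 40 CS measurements.
# p has one row per detector and one column per time sample.
n, m, nt = 200, 40, 301
p = rng.standard_normal((n, nt))   # placeholder pressure data W f
A = bernoulli_matrix(m, n)
y = A @ p                          # CS measurements M f, shape (m, nt)
print(y.shape)
```

Each of the *m* rows of **A** mixes all *n* sensor signals into one recorded channel, so only *m* parallel readouts are needed.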

### C. The role of sparsity

A central aspect in the theory of CS is sparsity of the given data in some basis or frame.^{12,14,25} Recall that a vector $x \in \mathbb{R}^N$ is called *s*-sparse if $|\{i \mid x_i \neq 0\}| \leq s$ for some number *s* ≪ *N*, where $|\cdot|$ is used to denote the number of elements of a set. If the data are known to have sparse structure, then reconstruction procedures using *ℓ*_{1}-minimization or greedy-type methods can often be guaranteed to yield high quality results even if the problem is severely ill-posed.^{12,26} If we are given measurements **M***x* = *y*, where $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^m$ with *m* ≪ *N*, the success of the aforementioned reconstruction procedures can, for example, be guaranteed if the matrix **M** satisfies the restricted isometry property of order *s* (*s*-RIP), i.e., for all *s*-sparse vectors *z* we have

for a RIP constant $\delta_s < 1/2$; see Ref. 25. Gaussian and Bernoulli random matrices satisfy the *s*-RIP with high probability, provided $m \geq C s \log(en/s)$ for some reasonable constant *C* and with e denoting Euler's number.^{27}
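The two quantities above — the sparsity level *s* and the resulting measurement budget — can be computed directly; the following sketch is ours, and the constant `C` is an illustrative placeholder rather than a sharp value from Ref. 27.

```python
import numpy as np

def sparsity(x, tol=0.0):
    """s-sparsity of x: number of entries with magnitude above tol."""
    return int(np.count_nonzero(np.abs(x) > tol))

def rip_sample_bound(s, n, C=2.0):
    """Heuristic measurement count m >= C * s * log(e * n / s) for which
    Gaussian/Bernoulli matrices satisfy the s-RIP with high probability.
    The constant C is a placeholder, not a sharp theoretical value."""
    return int(np.ceil(C * s * np.log(np.e * n / s)))

x = np.zeros(1000)
x[[3, 17, 256]] = [1.0, -2.0, 0.5]
s = sparsity(x)                      # s = 3
print(s, rip_sample_bound(s, x.size))
```

The point of the bound is that *m* grows essentially linearly in *s* and only logarithmically in the ambient dimension *n*, which is what makes compressive sampling worthwhile for sparse signals.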

In PAT, the possibility to sparsify the data has recently been examined.^{10,11} In these works it was observed that the measured pressure data could be sparsified, and the sparse reconstruction methods were applied directly to the pressure data. As a second step, still, a classical reconstruction via filtered backprojection (FBP) had to be performed. The sparsification of the data was achieved with a transformation in the time direction of the pressure data. In two dimensions, the transform is a first-order pseudo-differential operator,^{11} while in three dimensions the transform is of second order.^{10}

## III. PROPOSED FRAMEWORK

### A. Sparsifying transform

The following theorem is the foundation of our CS approach. It shows that the required sparsity in space can be obtained by applying the second time derivative to the measured data.

**Theorem III.1.** *Let* $f\colon \mathbb{R}^d \to \mathbb{R}$ *be a given smooth source term vanishing outside V, and let p*(**r**,*t*) *denote the causal solution of Eq.* (1)*. Then* $\partial_t^2 p(\mathbf{r},t)$ *is the causal solution of*

*In particular, we have* $\partial_t^2 \mathbf{M}[f] = \mathbf{M}[c^2 \Delta_{\mathbf{r}} f]$, *where* **M** *denotes the CS PAT forward operator defined by Eq.* (3).

*Proof.* The proof is split into two parts. First, we show that the causal solution of Eq. (1) is, for positive times, equal to the solution of the initial value problem

Second, we take the second time derivative of the even extension and derive Eq. (5).

To see the equivalence of Eqs. (6)–(8) and (1), denote by *q* the even extension of the solution of Eqs. (6)–(8) to $\mathbb{R}^d \times \mathbb{R}$, which satisfies the homogeneous wave equation for all $t \in \mathbb{R}$. By using the characteristic function of the positive semiaxis, *χ*(**r**,*t*) := 1 if *t* > 0 and zero otherwise, we obtain that *χq* is a causal function. By the product rule, we have

This shows that *χq* is the (unique) causal solution of Eq. (1) and establishes the equivalence of the two formalisms. Next notice that $\partial_t^2 q$ satisfies the wave equation as well, $(\partial_t^2 - c^2 \Delta_{\mathbf{r}})(\partial_t^2 q)(\mathbf{r},t) = 0$. Therefore, a computation similar to Eq. (9) gives $(\partial_t^2 - c^2 \Delta_{\mathbf{r}})(\chi\, \partial_t^2 q) = \delta'\, c^2 \Delta_{\mathbf{r}} f + \delta\, \partial_t^3 q(\cdot, 0)$. Because *q* is smooth and even in *t*, its third time derivative is continuous and odd in *t*, and thus $\partial_t^3 q(\mathbf{r}, 0) = 0$. This shows that $\chi\, \partial_t^2 q$ is the causal solution of Eq. (5) and concludes the proof. ◻

### B. Spatial discretization and sparse recovery

Throughout the following, we assume a semi-discrete setting, where the spatial variable $\mathbf{r} \in \{0,\dots,N_r\}^d$ is already discretized. As we show in Theorem A.2 in the Appendix, an analog of Theorem III.1 holds for discrete sources $f\colon \{0,\dots,N_r\}^d \to \mathbb{R}$ as well, where the discrete Laplacian $\Delta_{\mathbf{r}}$ may be defined in the spatial domain using finite differences or in the spectral domain using the Fourier transform. Working with discrete sources significantly simplifies treating the sparsity of $\Delta_{\mathbf{r}} f$. In the continuous setting, applying the Laplacian to piecewise smooth functions would require working with generalized functions (or distributions), which is completely avoided by discretizing the spatial variable (a discretization that is required for the numerical implementation anyway).

Typical phantoms consist of smoothly varying parts and rapid changes at interfaces. For such PA sources, the modified source $c^2 \Delta_{\mathbf{r}} f$ is sparse or at least compressible. The theory of CS therefore predicts that the modified source can be recovered by solving

where **M** is the forward operator (including wave propagation and compressed measurements) and $\Vert \cdot \Vert_1$ denotes the *ℓ*^{1}-norm, which promotes sparsity of *h*.

In the case in which the unknown is only approximately sparse or the data are noisy, one instead minimizes the penalized functional $(1/2)\Vert \mathbf{M}h - \partial_t^2 y \Vert_2^2 + \beta \Vert h \Vert_1$, where *β* > 0 is a regularization parameter that controls the trade-off between the data-fitting term $\Vert \mathbf{M}h - \partial_t^2 y \Vert_2^2$ and the regularization term $\Vert h \Vert_1$. Having obtained an approximation of *h* by either solving Eq. (10) or the relaxed version, one can recover the original PA source *f* by subsequently solving the Poisson equation $\Delta_{\mathbf{r}} f = h/c^2$ with zero boundary conditions.

While the above two-stage procedure recovers boundaries well, we observed disturbing low-frequency artifacts in the reconstruction of *f*. Therefore, below we introduce a new joint sparse reconstruction approach based on Theorem III.1 that jointly recovers *f* and *h*.

### C. Joint sparse reconstruction approach

As argued above, the second derivative $p''$ is well suited [via $c^2(\mathbf{r}) \Delta_{\mathbf{r}} f$] to recover the singularities of *f*, but it hardly contains low-frequency components of *f*. On the other hand, the low-frequency information is contained in the original data, which are still available to us. Therefore, we propose the following joint sparsity-constrained optimization problem:

Here, *I*_{C} is the indicator function of the positive cone $C \triangleq \{f \mid f(\mathbf{r}) \geq 0\}$, defined by *I*_{C}(*f*) = 0 if *f* ∈ *C* and *I*_{C}(*f*) = ∞ otherwise, which guarantees non-negativity.

We have the following result.

**Theorem III.2.** *Assume that* $f\colon \{0,\dots,N_r\}^d \to \mathbb{R}$ *is non-negative and that* $\Delta_{\mathbf{r}} f$ *is s-sparse. Moreover, suppose that the measurement matrix* **M** *satisfies the RIP of order 2s with δ*_{2s} *< 0.6246 [see Eq.* (4)*] and denote y =* **M***f. Then the pair* $[f, \Delta_{\mathbf{r}} f]$ *can be recovered as the unique solution of Eq.* (11).

*Proof.* According to Theorem A.2, the second derivative $p''(\mathbf{r},t)$ is the unique causal solution of the wave equation (5) with modified source term $h = c^2(\mathbf{r}) \Delta_{\mathbf{r}} f$. As a consequence, $h$ satisfies $\mathbf{M}h = y''$, which implies that the pair [*f*,*h*] is a feasible solution of Eq. (11). It remains to verify that [*f*,*h*] is the only solution of Eq. (11). To show this, note that for any solution [*f*^{*},*h*^{*}] of Eq. (11) its second component *h*^{*} is a solution of Eq. (10). Because **M** satisfies the 2*s*-RIP and $h = c^2(\mathbf{r}) \Delta_{\mathbf{r}} f$ is *s*-sparse, CS theory implies that Eq. (10) is uniquely solvable (see Theorem 6.12 in Ref. 25) and therefore *h* = *h*^{*}. The constraint $\Delta_{\mathbf{r}} f = h/c^2$ then implies that *f*^{*} = *f*. *◻*

In the case in which the data are only approximately sparse or noisy, we propose, instead of Eq. (11), to solve the *ℓ*^{2}-relaxed version

Here, *α* > 0 is a tuning parameter and *β* > 0 a regularization parameter. There are several modern methods to efficiently solve Eq. (12), for example, proximal or stochastic gradient algorithms.^{28,29} In this work, we use a proximal forward-backward splitting method with the quadratic terms used in the explicit (forward) step and $\beta \Vert h \Vert_1 + I_C(f)$ for the implicit (backward) step. For an overview of proximal forward-backward splitting methods in signal processing, see, for example, Ref. 29.

### D. Numerical minimization

We will solve Eq. (12) using a proximal gradient algorithm,^{29} which is well suited for minimizing the sum of a smooth part and a non-smooth but convex part. In the case of Eq. (12) we take the smooth part as

and the non-smooth part as $\Psi(f,h) \triangleq \beta \Vert h \Vert_1 + I_C(f)$.

The proximal gradient algorithm then alternately performs an explicit gradient step for Φ and an implicit proximal step for Ψ. For the proximal step, the proximity operator of a function must be computed. The proximity operator of a given convex function $F\colon \mathbb{R}^d \to \mathbb{R}$ is defined by^{29}

The regularizers we are considering here have the advantage that their proximity operators can be computed explicitly and do not cause a significant computational overhead. The gradient [**∇**_{f}Φ, **∇**_{h}Φ] of the smooth part can easily be computed to be

The proximal operator of the non-smooth part is given by

With this, the proximal gradient algorithm is given by

where (*f*^{k}, *h*^{k}) is the *k*th iterate and *μ*_{k} the step size in the *k*th iteration. We initialize the proximal gradient algorithm with *f*^{0} = *h*^{0} = 0.
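Because Ψ separates in *f* and *h*, its proximity operator splits into a projection onto the positive cone *C* and component-wise soft-thresholding. The sketch below (our illustration, with arbitrary example values) shows the backward step in isolation:

```python
import numpy as np

def prox_psi(f, h, mu, beta):
    """Proximity operator of mu * Psi with Psi(f, h) = beta*||h||_1 + I_C(f).

    It separates: f is projected onto the positive cone C, and h undergoes
    component-wise soft-thresholding with threshold mu * beta.
    """
    f_new = np.maximum(f, 0.0)                                   # prox of I_C
    h_new = np.sign(h) * np.maximum(np.abs(h) - mu * beta, 0.0)  # soft-threshold
    return f_new, h_new

f_k = np.array([-0.5, 0.2, 1.0])
h_k = np.array([-0.3, 0.05, 2.0])
f_next, h_next = prox_psi(f_k, h_k, mu=1.0, beta=0.1)
print(f_next, h_next)  # f -> [0, 0.2, 1],  h -> [-0.2, 0, 1.9]
```

Both maps are closed-form and element-wise, which is why the backward step adds essentially no computational overhead compared to evaluating the forward operator and its adjoint.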

**Remark III.3.** *Note that the optimization problem* (11) *is further equivalent to the analysis-ℓ _{1} problem*

*Implementation of Eq.* (16) *avoids taking the second time derivative of the data y. Because the proximal map of* $f \mapsto \Vert c\, \Delta_{\mathbf{r}} f \Vert_1$ *is not available explicitly, Eq.* (16) *and its relaxed versions cannot be straightforwardly addressed with the proximal gradient algorithm. Therefore, in the present paper we only use the model [Eq.* (12)*] and the algorithm [Eqs.* (14) *and* (15)*] for its minimization. Different models and algorithms will be investigated in future research.*

## IV. EXPERIMENTAL AND NUMERICAL RESULTS

### A. Numerical results

For the presented numerical results, the two-dimensional (2D) PA source term $f\colon \{0,\dots,N_r\}^2 \to \mathbb{R}$ depicted in Fig. 2 is used and is assumed to be supported in a disk of radius *R*. For simplicity, we assume the speed of sound *c* to be constant. Additional results are presented using a magnetic resonance imaging (MRI) phantom. The synthetic data are recorded on the boundary circle of radius *R*, where the time variable is discretized with 301 equidistant sampling points in the interval [0, 2*R*/*c*]. The phantom images were then used to compute the pressure data at *n* = 200 equispaced detector locations on a circle around the source. For the simulated experiments, we assume ideal detectors with *δ*(*t*) impulse response. The reconstructions of both phantoms via the FBP algorithm of Ref. 16 from the full measurements are shown in Fig. 3.

CS measurements **M***f* [see Eq. (3)] have been generated in two random ways and one deterministic way. The random matrices **A** have been taken either as a random Bernoulli matrix with entries $\pm 1/\sqrt{m}$ with equal probability or as a Gaussian random matrix consisting of i.i.d. $N(0,1/m)$-Gaussian random variables in each entry. The deterministic subsampling was performed by choosing *m* equispaced detectors. In the joint sparse reconstruction with Eqs. (14) and (15), the step-size and regularization parameters are chosen as *μ*_{k} = 0.1, *α* = 0.1, and *β* = 0.005; five thousand iterations are performed. For the random subsampling matrices the recovery guarantees from the theory of CS can be employed, but they do not provably hold for the deterministic subsampling, although the results are equally convincing even for subsampling on the order of 10% of the full data, cf. Fig. 4. All results are compared to the standard FBP reconstruction applied to **A**^{T}(**M***f*).

If one increases the number of measurements to about 25% of the data, the reconstruction results become almost indistinguishable from the results obtained from FBP on the full data, cf. Fig. 5.

In Fig. 6, the application of the method developed in this paper to an MRI image is presented. As the sparsity of the Laplacian is not as pronounced as in the synthetic example, the MRI image requires more measurements to achieve a high-quality outcome.

For noisy data, the algorithm still produces good results, although more samples need to be taken. For the synthetic phantom, Gaussian noise amounting to a signal-to-noise ratio (SNR) of ∼15 dB was added, as shown in Figs. 7 and 8. The reconstruction results using *m* = 20 and *m* = 50 measurements are depicted in Figs. 9 and 10, respectively.
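A noise level prescribed in dB can be generated as follows; this helper is our own illustration (the paper does not specify its noise-generation code), using the standard convention SNR = 10 log₁₀ of the signal-to-noise power ratio.

```python
import numpy as np

def add_noise(y, snr_db, rng=None):
    """Add white Gaussian noise to the data y at a prescribed SNR in dB."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 10, 2000))        # placeholder pressure data
y_noisy = add_noise(y, snr_db=15, rng=rng)
snr_est = 10 * np.log10(np.mean(y ** 2) / np.mean((y_noisy - y) ** 2))
print(round(snr_est, 1))                     # close to 15 dB
```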

### B. Experimental results

Experimental data have been acquired by an all-optical photoacoustic projection imaging (O-PAPI) system as described in Ref. 5. The system featured 64 ILD elements distributed along a circular arc of radius 4 cm, covering an angle of 289 degrees. For an imaging depth of 20 mm, the imaging resolution of the O-PAPI system was estimated to be between 100 and 260 μm; see Ref. 5. PA signals were excited by illuminating the sample from two sides with pulses from a frequency-doubled Nd:YAG laser (Continuum Surelite, 20 Hz repetition rate, 6 ns pulse duration, 532 nm center wavelength) at a fluence of 21mJ/cm^{2} and recorded by the ILD elements with a sample rate of 60 MS/s. The sample consisted of an approximately triangular shaped piece of ink-stained leaf skeleton, embedded in a cylinder consisting of agarose gel with a diameter of 36 mm and a height of 40 mm. Intralipid was added to the agarose to increase optical scattering. The strongest branches of the leaf had diameters of approximately 160–190 μm and the smallest branches of about 50 μm. Results are only for 2D samples (projection imaging).

Reconstruction results for the leaf phantom from 60 sparsely sampled sensor locations after 500 iterations with the proposed joint sparse minimization algorithm are shown in Fig. 11. For this, the regularization and step-size parameters were chosen as in Sec. IV A. From the experimental data we also generated *m* = 30 random Bernoulli measurements. The reconstruction results using this data are shown in Fig. 12. For all results, the PA source is displayed on a 1.6 cm × 1.33 cm rectangle with step size 26 μm inside the detection arc.

### C. Discussion

All numerical results presented in Figs. 4–12 demonstrate that the proposed joint sparse reconstruction method yields artifact-free reconstructions from a smaller number of spatial measurements. In particular, our approach demonstrated robustness with respect to noise (see Figs. 9 and 10) and also good performance on experimental data (see Figs. 12 and 11). In the case of the cross phantom we even obtained almost artifact-free reconstructions from only 20 spatial measurements (see Fig. 4).

We point out that, as opposed to most existing reconstruction schemes for sparse data, our algorithm comes with a uniqueness result, given in Theorem III.2. Theorem III.2 is based on the RIP, which is satisfied, for example, by Gaussian and Bernoulli measurement matrices. It does not hold for the subsampling matrices and, therefore, it is surprising that even in this case the joint sparse reconstruction method shows similar performance as for Gaussian or Bernoulli matrices. This fact will be investigated theoretically and numerically in future work. We will also compare our approach with other standard methods such as total variation (TV) minimization.

## V. CONCLUSION

In order to achieve high spatial resolution in PAT, standard measurement and reconstruction schemes require a large number of spatial measurements with high bandwidth detectors. In order to speed up the measurement process, systems allowing a large number of parallel measurements are desirable. However, such systems are technically demanding and costly to fabricate. For example, in PAT with integrating detectors, the required analog-to-digital converters are among the most costly building blocks. In order to increase measurement speed and minimize system costs, CS aims to reduce the number of measurements while preserving high resolution of the reconstructed image.

One main ingredient enabling CS in PAT is sparsity of the image to be reconstructed. To bring sparsity into play, in this paper we introduced a new approach based on the commutation relation $\partial_t^2 \mathbf{W}[f] = \mathbf{W}[c^2 \Delta_{\mathbf{r}} f]$ between the PAT forward operator **W** and the Laplacian. We developed a new reconstruction strategy for jointly reconstructing the pair $[f, \Delta_{\mathbf{r}} f]$ by minimizing Eq. (12), thereby using the sparsity of $\Delta_{\mathbf{r}} f$. The commutation relation further allows a rigorous study of generalized Tikhonov regularization of the form $(1/2)\Vert \mathbf{M}f - y \Vert_2^2 + \beta \Vert c\, \Delta_{\mathbf{r}} f \Vert_1 + I_C(f)$ for CS PAT. Such an analysis, as well as the development of more efficient numerical minimization schemes, are subjects of further research.

## ACKNOWLEDGMENTS

The research of L.N. is supported by the National Science Foundation (NSF) Grant Nos. DMS 1212125 and DMS 1616904. M.H. and T.B. acknowledge support from the Austrian Science Fund (FWF), project P 30747. Michael Sandbichler was supported by the Austrian Science Fund (FWF) under Grant No. Y760. Peter Burgholzer and Johannes Bauer-Marschallinger were supported by the strategic economic and research program “Innovative Upper Austria 2020” of the province of Upper Austria. In addition, the computational results presented have been achieved (in part) using the HPC infrastructure LEO of the University of Innsbruck.

### APPENDIX A: SPARSIFYING CONCEPT IN SEMI-DISCRETE SETTING

Suppose the discrete source is given by $f\colon \{0,\dots,N_r\}^d \to \mathbb{R}$, which we can identify with $f\colon \mathbb{Z}^d \to \mathbb{R}$ by assigning the value *f*(**r**) = 0 for $\mathbf{r} \notin \{0,\dots,N_r\}^d$. We consider the spatially discretized wave equation

together with the causality condition *p*(**r**,*t*) = 0 for *t* < 0. To be specific, we take the discrete Laplacian $\Delta_{\mathbf{r}} p(\mathbf{r},t)$ to be defined via symmetric finite differences. However, we note that it may alternatively be defined in the spectral domain via the discrete spatial Fourier transform.
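The two definitions can be compared numerically. The sketch below is our illustration: for a source supported away from the grid boundary, the symmetric finite-difference Laplacian (with zero extension) and a spectral Laplacian whose Fourier symbol matches the same stencil agree to machine precision.

```python
import numpy as np

def laplacian_fd(f):
    """Symmetric finite-difference Laplacian, zero-extending f outside the grid."""
    g = np.pad(f, 1)                     # one layer of zeros on each side
    lap = -2 * f.ndim * g
    for ax in range(g.ndim):
        lap = lap + np.roll(g, 1, axis=ax) + np.roll(g, -1, axis=ax)
    inner = tuple(slice(1, -1) for _ in range(g.ndim))
    return lap[inner]                    # drop the padding again

def laplacian_spectral(f):
    """Laplacian defined in the spectral domain via the discrete Fourier
    transform; the symbol 2*(cos(2*pi*k/n) - 1) per axis matches the
    finite-difference stencil on a periodic grid."""
    symbol = np.zeros(f.shape)
    for ax, n in enumerate(f.shape):
        k = np.arange(n).reshape([-1 if a == ax else 1 for a in range(f.ndim)])
        symbol = symbol + 2 * (np.cos(2 * np.pi * k / n) - 1)
    return np.real(np.fft.ifftn(symbol * np.fft.fftn(f)))

# A source supported away from the grid boundary: both definitions agree.
rng = np.random.default_rng(0)
f = np.zeros((32, 32))
f[10:20, 12:22] = rng.standard_normal((10, 10))
print(np.max(np.abs(laplacian_fd(f) - laplacian_spectral(f))))
```

Near the grid boundary the two definitions differ (zero extension versus periodic wrap-around), which is one reason the analysis above poses Eq. (A1) on all of $\mathbb{Z}^d$.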

Note that we assume Eq. (A1) to be satisfied for all $\mathbf{r} \in \mathbb{Z}^d$ (instead of finitely many sampling points) in order not to suffer from boundary effects, which would be induced by considering a finite spatial domain. The spatially discretized wave equation (A1) is therefore an infinite-dimensional coupled system of second-order differential equations that can be analyzed in an abstract Hilbert space.

*Remark A.1 [Existence and uniqueness of Eq.* (A1)]. *Consider the (two-sided) initial value problem*

*where* $L_{\mathbf{r}}\colon \ell^2(\mathbb{Z}^d) \to \ell^2(\mathbb{Z}^d)$ *is linear and bounded and* $f \in \ell^2(\mathbb{Z}^d)$*. From the general theory of evolution equations in Hilbert spaces,*^{30} *it follows that Eqs.* (A2)–(A4) *have a unique solution* $p \in C^\infty(\mathbb{R}, \ell^2(\mathbb{Z}^d))$*. As in the case of a continuous spatial variable, this shows that χq solves Eq.* (A1) *for general* $L_{\mathbf{r}}$ *in place of* $c^2(\mathbf{r}) \Delta_{\mathbf{r}}$*, and therefore assures the existence of a causal solution of Eq.* (A1)*. Its uniqueness is implied by the uniqueness of a solution of Eqs.* (A2)–(A4).

The following theorem establishes a discrete version of Theorem III.1 concerning the sparsification of the PA source.

**Theorem A.2.** *Suppose the discrete source* $f\colon \mathbb{Z}^d \to \mathbb{R}$ *vanishes outside the set* $\{0,\dots,N_r\}^d$*, let* $L_{\mathbf{r}}\colon \ell^2(\mathbb{Z}^d) \to \ell^2(\mathbb{Z}^d)$ *be a continuous linear operator, and let p denote the unique causal solution of Eq.* (A1)*. Then* $\partial_t^2 p$ *is the unique causal solution of*

*Proof.* Taking into account Remark A.1, the conclusion can be shown in a similar fashion as for the continuous version (Theorem III.1). *◻*

## References
