In this Tutorial, we combine the scientific fields of information theory, thermodynamics, regularization theory, and non-destructive imaging, especially for photoacoustic and photothermal imaging. The goal is to gain a better understanding of how information is acquired in subsurface imaging and how the spatial resolution limit can be overcome by using additional information. Here, the resolution limit in photoacoustic and photothermal imaging is derived from the irreversibility of the attenuation of the pressure wave and of heat diffusion, respectively, during the propagation of the signals from the imaged subsurface structures to the sample surface. The acoustic or temperature signals are converted into so-called virtual waves, which are their reversible counterparts and can be used for image reconstruction by well-known ultrasound reconstruction methods. The conversion into virtual waves is an ill-posed inverse problem, which needs regularization. The reason is the information loss during signal propagation to the sample surface, which turns out to be equal to the entropy production. As the entropy production from acoustic attenuation is usually small compared to that from heat diffusion, the spatial resolution in acoustic imaging is higher than in thermal imaging. It is therefore especially necessary to overcome this resolution limit for thermographic imaging by using additional information. Incorporating sparsity and non-negativity in iterative regularization methods gives a significant resolution enhancement, which was demonstrated experimentally by one-dimensional imaging of thin layers at varying depths and by three-dimensional imaging, either from a single detection plane or from three perpendicular detection planes on the surface of a sample cube.

## I. INTRODUCTION

Metrology can be described as collecting information from samples. Imaging techniques provide much information, as indicated by the saying “a picture is worth a thousand words.” Non-destructive evaluation (NDE) and biomedical imaging often image structures deep in a sample's interior without destroying the sample. Therefore, an information-theoretical viewpoint is instructive to determine the amount of information on subsurface or other embedded structures that can be gained by measurements on the sample surface. Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information. It was developed by Shannon^{1} to find fundamental limits on signal processing operations, such as compressing data and reliably storing and communicating data. Information processing is a physical activity, which has to obey the laws of (non-equilibrium) thermodynamics. This was originally recognized by Szilard,^{2} and Landauer^{3} captured it with his aphorism: “Information is physical.”

Imaging of subsurface features from data measured on the sample surface usually results in an ill-posed or ill-conditioned inverse problem, which needs regularization,^{4–6} as illustrated in Fig. 1 for photoacoustic and photothermal imaging. Regularization typically involves additional assumptions, such as smoothness of the solution. Scientists working on the inverse problem of subsurface imaging rarely work on non-equilibrium thermodynamics and information theory, and vice versa. This Tutorial should enable researchers from both sides to gain more insight from the counterpart field. The models used for subsurface imaging and for thermodynamics are kept simple in the beginning, allowing us to investigate the essential connections between thermodynamics and the regularization of inverse problems. As a first step, a one-dimensional (1D) model without boundaries and with a Dirac delta-like structure at a varying depth is used for imaging, either inducing thermal waves described by heat diffusion or acoustic pressure signals showing acoustic attenuation as in a liquid. For describing the relevant results from non-equilibrium thermodynamics, we follow a paper by Esposito and van den Broeck on the “Second law and Landauer principle far from equilibrium,”^{7} and describe the photothermal or photoacoustic measurement process in terms of thermodynamics. The description can be simplified as the Hamiltonian does not change in time. A short, Dirac delta-like laser pulse is used for excitation, which brings the sample suddenly out of thermal equilibrium by heating the structures via optical absorption and thereby generating an initial pressure distribution; the system then returns slowly to equilibrium, either by heat diffusion (“thermal”) or by attenuation of the pressure wave (“acoustic”).

After presenting these simple models for the wave propagation and the thermodynamic approach, they are linked in the frequency domain to evaluate the physical background of regularization. Later on, more realistic samples in higher dimensions and with realistic boundary conditions are described. But before stepping into the detail, in the remaining part of Sec. I, previous work in subsurface imaging and in thermodynamics is described.

In former times, images always showed what could be detected with the human eye; looking under the sample surface in a non-destructive way was possible only for optically transparent samples. This situation changed drastically with the discovery of x rays by Röntgen in 1895. Nowadays, the interaction of sample structures with electromagnetic waves in the whole frequency range, from eddy current (EC), radar, terahertz (THz), and infrared (IR), up to UV and x-ray radiation, is used for imaging, as well as particles and elastic (acoustic) waves. The amount of information that can be gained by measurements on the sample surface depends on the interaction of these waves or particles with the structures of the sample, such as scattering or absorption, and on the detector (e.g., its sensitivity or bandwidth). For interior structures, the available information for imaging is also limited by the information loss during wave propagation from the imaged structures to the sample's surface, caused by diffusion, dissipation, or scattering (Fig. 1 or Burgholzer and Hendorfer^{9}). A unifying framework for treating diverse diffusion-related periodic phenomena under the global mathematical label of diffusion-wave fields has been developed by Mandelis,^{10} including thermal waves, charge-carrier-density waves, and diffuse-photon-density waves, but also modulated eddy currents, neutron waves, and harmonic mass-transport diffusion waves.

Several attempts have been made to compensate the diffusion, dissipation, or scattering during wave propagation to obtain a higher resolution for the reconstructed images of the sample's interior. We have shown that thermodynamic fluctuations cause an entropy production, which is equal to the information loss and limits this compensation.^{9} Therefore, the spatial resolution for non-destructive imaging (NDI) at a certain depth is also limited. For waves that satisfy the wave equation, no information is lost during propagation, because the time-reversed wave is a solution of the same wave equation, and the inverse problem of “backprojection” can be solved exactly by, e.g., time reversal.^{11} Such a signal is called a virtual wave signal, because in reality dissipation, diffusion, or scattering always causes a loss of information during propagation. For attenuated acoustic waves^{12–24} and thermal diffusion,^{9,25–30} a linear relation between the time-discretized measured signal $Y_{\mathrm{meas}}$ and the virtual wave signal $Y_{\mathrm{virt}}$, which is a solution of the wave equation, was established and can be written as

$$Y_{\mathrm{meas}} = M\, Y_{\mathrm{virt}}. \tag{1}$$

The matrix $M$ depends on the kind of wave propagation and has already been calculated for acoustic waves with a power-law attenuation and for heat diffusion. Solving Eq. (1) for $Y_{\mathrm{virt}}$ constitutes an ill-posed or ill-conditioned inverse problem: a direct inversion would cause severe noise amplification, resulting in useless solutions. Therefore, regularization methods are used for the inversion. This is a consequence of the information loss during signal propagation to the surface. $Y_{\mathrm{meas}}$ can be the measured acoustic pressure or temperature, and $Y_{\mathrm{virt}}$ the corresponding virtual pressure or temperature.
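To make this noise amplification concrete, the following sketch (illustrative values only; the Gaussian-broadening matrix is a generic stand-in for $M$, not one of the matrices computed in this Tutorial) shows how a tiny amount of measurement noise destroys the direct inversion of an ill-conditioned forward matrix:

```python
import numpy as np

# Generic stand-in for the matrix M in Eq. (1): each row is a Gaussian
# kernel whose width grows with "propagation time", mimicking the
# broadening caused by diffusion or attenuation (illustrative only).
n = 100
t = np.arange(1, n + 1, dtype=float)
width = 1.0 + 0.05 * t                          # kernel width grows with depth
M = np.exp(-(t[:, None] - t[None, :])**2 / (2.0 * width[:, None]**2))
M /= M.sum(axis=1, keepdims=True)               # normalize each row

y_virt = np.zeros(n); y_virt[40] = 1.0          # sharp subsurface structure
rng = np.random.default_rng(0)
y_meas = M @ y_virt + 1e-3 * rng.standard_normal(n)   # small additive noise

naive = np.linalg.solve(M, y_meas)              # direct inversion of Eq. (1)
print(f"condition number of M: {np.linalg.cond(M):.2e}")
print(f"max |direct-inversion solution|: {np.abs(naive).max():.2e}")
```

The condition number is astronomically large, so the directly inverted solution is dominated by amplified noise; this is why regularization is unavoidable.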

Equation (1) was derived for attenuated acoustic waves and thermal diffusion, but the concept of deriving the resolution limit from entropy production seems to be useful for non-destructive and biomedical imaging in general. For example, applications of a transformation of the diffusive electromagnetic wave into a wave field were shown by Lee *et al.*^{31,32} and Gershenson^{33} for geophysical inverse problems. The physical reason for the entropy production is fluctuations, i.e., the noise added to an ideally noise-free signal, which can be statistically described by stochastic processes.^{9} The ratio between the ideal signal amplitude and the noise amplitude is called the signal-to-noise ratio (SNR) and plays an essential role in this Tutorial. Noise is identified as the thermodynamic fluctuations around a certain time-varying mean value and is shown in Sec. III to be the mechanism for entropy production and information loss.

One comprehensive letter about the relevant non-equilibrium thermodynamics, the second law, and the connection between entropy production and information loss was published in 2011 by Esposito and van den Broeck.^{7} They showed that for two different non-equilibrium states evolving to the same equilibrium state, the entropy production $\Delta_i S$ during the evolution from one state to the other is equal to the information loss $\Delta I = k_B\, \Delta D$, where $k_B$ is the Boltzmann constant and $\Delta D$ is the difference of the Kullback–Leibler divergence $D$, also called the relative entropy, of these states. $D$ is a measure of how “far” a certain state is from equilibrium.^{34} For macroscopic states with small fluctuations around equilibrium, the entropy production turns out to be, in good approximation, equal to the dissipated energy $\Delta Q$, which is the dissipated heat or the heat transported by diffusion, divided by the mean temperature $T_{\mathrm{mean}}$, so $\Delta_i S = \Delta Q / T_{\mathrm{mean}} = k_B\, \Delta D$.^{26}

The thermodynamic fluctuations in macroscopic samples are usually so small that they can be neglected, but not for ill-posed inverse problems such as the inversion of Eq. (1): there, the fluctuations are highly amplified during image reconstruction. As long as the macroscopic mean-value equations describe the mean entropy production, the resolution limit depends only on the amplitude of the fluctuations and not on the actual stochastic process including all correlations.^{26} Therefore, we have used in the past a simple Gauss–Markov process to describe the measured signal as a time-dependent random variable.^{9} In addition, we performed preliminary work describing heat diffusion as a Wiener process and compared, for a simulated signal, the spatial resolution limits derived in the temporal and spatial frequency domains, $\omega$- and $k$-space, respectively.^{35} Experimentally, these theoretical resolution limits were verified by thermographic reconstructions of inclined steel rods in an epoxy sample, heated by a short light pulse^{27} (Sec. IV B 1).

## II. PHOTOACOUSTIC AND PHOTOTHERMAL IMAGING MODEL

In photothermal and photoacoustic imaging, a short light pulse, e.g., from a laser, is used to heat subsurface structures by the absorbed (scattered) light, such as blood vessels in tissue or light-absorbing structures in epoxy resin, creating a temperature distribution $T_0(r)$ at location $r$ immediately after the pulse [Fig. 1(a)]. This sudden temperature increase causes an initial pressure distribution $p_0(r)$ proportional to the absorbed optical energy density $A(r) = T_0(r)\, C_p\, \rho_0$, with the dimensionless material constant $\Gamma = \beta c^2 / C_p$, the Grüneisen coefficient.^{36–38} The initial pressure distribution can be written as

$$p_0(r) = \Gamma\, A(r) = \frac{\beta c^2}{C_p}\, A(r), \tag{2}$$

where $C_p$ is the specific heat capacity, $\rho_0$ is the ambient density, $c$ is the sound speed, and $\beta$ is the thermal volume expansion coefficient at constant pressure. The initial pressure distribution causes a virtual pressure wave $p_{\mathrm{virt}}(r,t)$ as a function of space $r$ and time $t$. This wave is called “ideal” as no acoustic attenuation or dispersion during propagation is assumed, and “virtual” as attenuation and dispersion are always present for real waves, even if acoustic attenuation can often be neglected at lower frequencies. The wave equation describes the acoustic pressure $p_{\mathrm{virt}}(r,t)$ and can be written as^{37,38}

$$\nabla^2 p_{\mathrm{virt}}(r,t) - \frac{1}{c^2}\frac{\partial^2 p_{\mathrm{virt}}(r,t)}{\partial t^2} = -\frac{1}{c^2}\, p_0(r)\, \frac{d\delta(t)}{dt}, \tag{3}$$

where $\nabla^2$ is the Laplacian (second derivative in space). The source term on the right side of Eq. (3) ensures that the pressure just after the short excitation pulse, which is modeled by the temporal Dirac delta function $\delta(t)$, is the initial pressure distribution $p_0(r)$. The same equation is valid for a virtual temperature wave, which is defined as $T_{\mathrm{virt}} := p_{\mathrm{virt}}(r,t)/(\beta c^2 \rho_0)$ and gives

$$\nabla^2 T_{\mathrm{virt}}(r,t) - \frac{1}{c^2}\frac{\partial^2 T_{\mathrm{virt}}(r,t)}{\partial t^2} = -\frac{1}{c^2}\, T_0(r)\, \frac{d\delta(t)}{dt}. \tag{4}$$

The virtual wave is the induced virtual pressure wave, multiplied by a material constant so that it is measured in Kelvin like a temperature. As it has no direct physical representation but is a mathematical model used for thermographic reconstruction, it is called a “virtual temperature wave.”

The actual temperature evolution $T(r,t)$ can be described by the heat diffusion equation^{39}

$$\frac{\partial T(r,t)}{\partial t} - \alpha\, \nabla^2 T(r,t) = T_0(r)\, \delta(t), \tag{5}$$

where $T(r,t)$ is the temperature as a function of space and time, and $\alpha $ is the thermal diffusivity, which is assumed to be homogeneous in the sample. This description of the thermal diffusion is based on Fourier’s law, which is valid for macroscopic samples, where the propagation distance is much larger than the phonon mean free path.^{40}
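As a quick numerical illustration of the heat diffusion equation (a sketch using the standard 1D Green's function of the diffusion equation; the steel diffusivity value is taken from Sec. II B), the temperature at a distance $z$ from a buried plane source peaks at the characteristic time $t_D = z^2/(2\alpha)$ used later in the text:

```python
import numpy as np

# Standard 1D Green's-function solution of the heat diffusion equation:
# a plane Dirac-delta source produces, at distance z, the temperature
# T(t) = exp(-z**2/(4*alpha*t)) / sqrt(4*pi*alpha*t).
alpha = 11.8e-6                       # thermal diffusivity of steel (Sec. II B), m^2/s

def green_1d(z, t):
    return np.exp(-z**2 / (4.0 * alpha * t)) / np.sqrt(4.0 * np.pi * alpha * t)

t = np.linspace(1e-4, 5.0, 200_000)   # time axis in seconds
for z in (1e-3, 3e-3):                # source depths of 1 mm and 3 mm
    t_peak = t[np.argmax(green_1d(z, t))]
    print(f"z = {z*1e3:.0f} mm: peak at {t_peak:.4f} s, "
          f"z^2/(2*alpha) = {z**2 / (2*alpha):.4f} s")
```

The quadratic depth dependence of the peak time already hints at the widespread timescales seen later in Fig. 3(a).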

We denote by $\tilde{T}(r,\omega)$ the temperature signal in the frequency domain, the $\omega$-space, which is related to $T(r,t)$ by the temporal Fourier transform

$$\tilde{T}(r,\omega) = \int_{-\infty}^{+\infty} T(r,t)\, e^{i\omega t}\, dt \tag{6a}$$

and its inverse

$$T(r,t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde{T}(r,\omega)\, e^{-i\omega t}\, d\omega. \tag{6b}$$

Similarly, $\tilde{p}_{\mathrm{virt}}(r,\omega)$ and $\tilde{T}_{\mathrm{virt}}(r,\omega)$ denote the Fourier transforms of the virtual waves $p_{\mathrm{virt}}(r,t)$ and $T_{\mathrm{virt}}(r,t)$, respectively. Taking the Fourier transform according to Eq. (6) of Eqs. (3)–(5) and using $\delta(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \exp(-i\omega t)\, d\omega$ results in the Helmholtz equations

with $k(\omega) \equiv \omega/c$ and $\sigma(\omega)^2 \equiv i\omega/\alpha$. For the virtual pressure wave and the virtual temperature wave, the wavenumber $k(\omega)$ is real; for the temperature, the wavenumber $\sigma(\omega)$ is complex with real and imaginary parts of equal size, describing the diffusive effect.
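A one-line numerical check (with an arbitrary test frequency) confirms that $\sigma(\omega) = \sqrt{i\omega/\alpha}$ indeed has real and imaginary parts of equal size, both equal to $\sqrt{\omega/(2\alpha)}$:

```python
import numpy as np

# sigma(omega) = sqrt(i*omega/alpha): check that Re and Im are equal,
# as stated for the diffusive (thermal) wavenumber.
alpha = 11.8e-6                # m^2/s, the steel value used later in the text
omega = 2 * np.pi * 1.0        # 1 Hz, an arbitrary test frequency
sigma = np.sqrt(1j * omega / alpha)
print(f"Re = {sigma.real:.1f} 1/m, Im = {sigma.imag:.1f} 1/m")
```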

Acoustic attenuation and dispersion can also be described by a complex wavenumber, but usually the imaginary part is much smaller than the real part. As Stokes^{41} showed already in 1845, and later, e.g., Shutilov,^{42} fundamental responses of the material system cause a relaxation time between pressure and density changes, which results in attenuation of a propagating acoustic wave. For liquids, the acoustic absorption $\alpha(\omega)$ increases with the square of the angular frequency $\omega$, that is, $\alpha(\omega) = \alpha_0 \omega^2$, with the material constant $\alpha_0$ taking into account, e.g., the viscosity of the liquid. The pressure caused by a monochromatic acoustic plane wave of frequency $\omega$ in a uniform attenuating medium at a point $x$ and at an instant $t$ can be expressed as

$$p(x,t) = \hat{p}\, e^{i(k(\omega)x - \omega t)}\, e^{-\alpha(\omega)x} = \hat{p}\, e^{i(K(\omega)x - \omega t)}, \tag{10}$$

where the acoustic attenuation coefficient $\alpha(\omega)$ can be written as the imaginary part of the complex wavenumber $K(\omega) = k(\omega) + i\alpha(\omega) = \omega/c + i\alpha_0\omega^2$. For the attenuated acoustic wave, the Helmholtz equation

is the same as for the virtual pressure wave in Eq. (7), but with the complex wavenumber.^{13–15} The source term on the right side of Eq. (11) is the same as for the virtual pressure wave $\tilde{p}_{\mathrm{virt}}(r,\omega)$ in Eq. (7), and therefore a direct relation can be derived between the attenuated pressure wave and the virtual pressure wave at the same location $r$.^{13–15} Replacing $\omega$ by $cK(\omega)$ in Eq. (7) results in

which is equal to Eq. (11) if multiplied by $\omega/(cK(\omega))$, and

$$\tilde{p}(r,\omega) = \frac{\omega}{cK(\omega)}\, \tilde{p}_{\mathrm{virt}}\big(r, cK(\omega)\big). \tag{13}$$

This is the sought relation between the attenuated pressure signal and the virtual pressure signal. The location $r$ is the same for $\tilde{p}_{\mathrm{virt}}$ and $\tilde{p}$, and the relation is valid in all dimensions. Transforming back from $\omega$-space to the time domain with the inverse Fourier transformation [Eq. (6b)] results in

$$p(r,t) = \int_{-\infty}^{+\infty} M_p(t,t')\, p_{\mathrm{virt}}(r,t')\, dt' \tag{14}$$

with

which can be written as

The factor $\omega/(cK(\omega))$ turns out to be approximately one for relevant frequencies and attenuation coefficients. This is shown for glycerine, a liquid with rather high viscosity, in Subsection II A: even for a high frequency of 100 MHz, the factor is 0.994. For lower frequencies or lower acoustic attenuation than in glycerine, this factor is even closer to one. Therefore, $M_p(t,t')$ is in good approximation a Gaussian pulse, whose amplitude decreases and whose width increases proportional to $\sqrt{\alpha_0 c t'}$. For a thin absorbing planar layer in an infinite medium, which can be modeled as a Dirac delta function in the thickness coordinate $z$, the virtual pressure $p_{\mathrm{virt}}(z,t)$ for $z>0$ is proportional to $\delta(t - z/c)$. From Eq. (15), one gets for the attenuated pressure wave

$$p(z,t) \propto \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{iK(\omega)z}\, e^{-i\omega t}\, d\omega, \tag{16}$$

which is the inverse Fourier transform [Eq. (6b)] of $\exp(iK(\omega)z)$.^{43} This is also true for a general complex $K(\omega)$, e.g., with power laws other than the square of the frequency, such as for porcine fat tissue with a power-law exponent of 1.5 (Sec. IV A 1). More details on the compensation of acoustic attenuation can be found in Refs. 12–24.

Equation (15) can be discretized to produce a matrix equation

$$p = M_p\, p_{\mathrm{virt}}, \tag{17}$$

where $p$ and $p_{\mathrm{virt}}$ are the vectors of the attenuated and virtual pressure signals at discrete time steps, respectively, and $M_p$ is the matrix at these time steps calculated from Eq. (15).

In a similar way, we have calculated a local relation between the actual temperature $T(r,t)$ and the virtual wave signal $T_{\mathrm{virt}}(r,t)$ from Eqs. (4) and (5), or in the frequency domain from the Helmholtz equations (8) and (9).^{25} Replacing $\omega$ by $c\sigma(\omega)$ in Eq. (8) gives Eq. (9) by multiplying with $ic\alpha\sigma(\omega)$ and identifying

The inverse Fourier transformation in Eq. (6b) can be integrated analytically and can be written as^{25}

$$T(r,t) = \int_{-\infty}^{+\infty} M_T(t,t')\, T_{\mathrm{virt}}(r,t')\, dt' \quad \text{with} \quad M_T(t,t') = \frac{c}{\sqrt{4\pi\alpha t}}\, \exp\!\left(-\frac{c^2 t'^2}{4\alpha t}\right) \; (t>0). \tag{19}$$

Equation (19), like Eq. (18), connects the temperature signal to the virtual wave signal at the same location $r$, but in the time domain instead of the temporal frequency domain.

For the example of a thin absorbing layer in an infinite medium, the virtual temperature wave consists of two Dirac delta pulses running in the positive and negative $z$-directions, proportional to $\frac{1}{2}\delta(z \pm ct) = \frac{1}{2c}\delta(t \pm z/c)$. From Eq. (19), we get the 1D Green's function

$$T(z,t) = \frac{1}{\sqrt{4\pi\alpha t}}\, \exp\!\left(-\frac{z^2}{4\alpha t}\right). \tag{20}$$

Equation (19) can be discretized to produce a matrix equation

$$T = M_T\, T_{\mathrm{virt}}, \tag{21}$$

similar to Eq. (17). Equations (21) and (17) can be inverted only with appropriate regularization, as the matrices $M_T$ and $M_p$ are rank deficient. The truncated singular value decomposition (T-SVD) method is used in the following to get the reconstructed signal $T_{\mathrm{rec}}$ as an estimate for $T_{\mathrm{virt}}$ from the thermographic signal $T$, or $p_{\mathrm{rec}}$ as an estimate for $p_{\mathrm{virt}}$ from the pressure signal $p$, respectively. The truncation value for the smallest singular values is $1/\mathrm{SNR}$, with SNR being the signal-to-noise ratio of the measured temperature or pressure.
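A compact sketch of this T-SVD inversion for the thermal case is given below. The grid sizes, the virtual sound speed, and the kernel normalization are illustrative choices, not values from this Tutorial; the Gaussian kernel shape follows the 1D Green's function of heat diffusion:

```python
import numpy as np

# Forward matrix M_T (Gaussian kernel, width growing as sqrt(t)) and its
# regularized inversion by truncated SVD with truncation value 1/SNR.
rng = np.random.default_rng(1)
alpha, c = 11.8e-6, 1.0            # diffusivity (steel) and a chosen virtual sound speed
t = np.linspace(1e-3, 2.0, 400)    # measurement times (s)
tp = np.linspace(0.0, 5e-3, 400)   # virtual-wave times (s)
M = np.exp(-c**2 * tp[None, :]**2 / (4 * alpha * t[:, None])) / np.sqrt(t[:, None])

T_virt = np.zeros(tp.size); T_virt[120] = 1.0   # delta-like source at depth c*tp[120]
SNR = 100.0
T_meas = M @ T_virt
T_meas += (T_meas.max() / SNR) * rng.standard_normal(t.size)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
keep = s >= s[0] / SNR                           # truncate singular values below 1/SNR
T_rec = Vt[keep].T @ ((U[:, keep].T @ T_meas) / s[keep])
T_naive = Vt.T @ ((U.T @ T_meas) / s)            # unregularized pseudo-inverse

print(f"kept {keep.sum()} of {s.size} singular values")
print(f"|T_rec| = {np.linalg.norm(T_rec):.2e}, |T_naive| = {np.linalg.norm(T_naive):.2e}")
```

The truncated solution is a broadened estimate of the delta-like source, while the unregularized pseudo-inverse is dominated by amplified noise.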

Image reconstruction is now a two-stage process, where two successive inverse problems are solved. In photoacoustic imaging, in a first step, $p_{\mathrm{virt}}$ is determined from the measured pressure signal $p$ on the sample surface, and in a second step, the initial pressure distribution $p_0(r)$ is determined by acoustic reconstruction methods, such as backpropagation or time reversal reconstruction algorithms.^{11} In photothermal imaging, the initial temperature distribution $T_0(r)$ is determined by the same reconstruction methods from the virtual temperature wave $T_{\mathrm{virt}}$, which was calculated in the first step from the measured surface temperature $T$. This two-stage process can be used in 1D, 2D, and 3D. In one dimension, the second step, the reconstruction by backprojection, is rather trivial: the time signal of the virtual pressure or temperature wave multiplied by the constant sound speed $c$ directly gives the initial pressure or temperature at the depth $z = ct$. Therefore, we will start with a 1D example.

### A. Compensation of acoustic attenuation for 1D photoacoustic imaging

In the frequency domain, according to Eq. (10), the amplitude of the wave component with frequency $\omega$ is damped by the factor $\exp(-\alpha_0\omega^2 z)$ during propagation from depth $z$ to the sample surface. For frequencies larger than the truncation frequency $\omega_{\mathrm{cut}}$, the amplitude of these wave components is damped below the noise level, and at $\omega_{\mathrm{cut}}$, we get

$$\exp\!\left(-\alpha_0\,\omega_{\mathrm{cut}}^2\, z\right) = \frac{1}{\mathrm{SNR}} \quad \Longleftrightarrow \quad \omega_{\mathrm{cut}} = \sqrt{\frac{\ln(\mathrm{SNR})}{\alpha_0\, z}}. \tag{22}$$

For the spatial resolution in photoacoustic imaging, the minimal possible width of the acoustic signal in the time domain is essential. A small width enables high spatial resolution, which corresponds to a broad frequency bandwidth. If the frequency bandwidth is limited according to Eq. (22), the spatial resolution limit according to Nyquist is half the wavelength at this frequency,

$$\delta_{\mathrm{resolution}} = \frac{\pi c}{\omega_{\mathrm{cut}}} = \pi c\, \sqrt{\frac{\alpha_0\, z}{\ln(\mathrm{SNR})}}. \tag{23}$$

The resolution limit can be validated by the reconstruction of a thin absorbing planar layer in an infinite medium, which is modeled as a Dirac delta function in the thickness coordinate $z$, and where the virtual pressure $p_{\mathrm{virt}}(z,t)$ for $z>0$ is proportional to $\delta(t - z/c)$. The Fourier transform of a delta pulse is $\exp(i\omega z/c)$ and shows all frequencies with equal amplitude. The reconstruction in the attenuation-limited frequency bandwidth is the inverse Fourier transformation, where the frequency integral is taken from $-\omega_{\mathrm{cut}}$ to $+\omega_{\mathrm{cut}}$, which gives

$$p_{\mathrm{rec}}(z,t) = \frac{1}{2\pi}\int_{-\omega_{\mathrm{cut}}}^{+\omega_{\mathrm{cut}}} e^{i\omega z/c}\, e^{-i\omega t}\, d\omega = \frac{\sin\!\big(\omega_{\mathrm{cut}}(t - z/c)\big)}{\pi\,(t - z/c)}. \tag{24}$$

This is a sinc function, whose maximum for a layer at distance $z$ is at $t = z/c$, the arrival time of the virtual wave. The zero points are at $t = z/c \pm \pi/\omega_{\mathrm{cut}} = (z \pm \delta_{\mathrm{resolution}})/c$, with $\delta_{\mathrm{resolution}}$ from Eq. (23). Compared to the measured pressure pulse without any compensation of attenuation given in Eq. (16), the width of the reconstructed pressure pulse, Eq. (24), is reduced by a factor of $\sqrt{\ln(\mathrm{SNR})}$. A higher SNR allows a better resolution for the reconstruction.
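The sinc shape of the band-limited reconstruction can be verified numerically (a sketch using the $z = 1\,$mm values from Table I):

```python
import numpy as np

# Band-limited reconstruction of a delta-like layer: numerically integrate
# the truncated flat spectrum over [-omega_cut, omega_cut] as a function of
# tau = t - z/c and compare with the analytic sinc result.
omega_cut = 2 * np.pi * 42.9e6       # truncation frequency for z = 1 mm (Table I)

tau = np.linspace(-3, 3, 401) * np.pi / omega_cut        # time around the arrival
omega = np.linspace(-omega_cut, omega_cut, 4001)
w = np.full(omega.size, omega[1] - omega[0]); w[0] /= 2; w[-1] /= 2  # trapezoid weights
rec = (np.exp(-1j * np.outer(tau, omega)) @ w).real / (2 * np.pi)
analytic = omega_cut / np.pi * np.sinc(omega_cut * tau / np.pi)      # sin(w*tau)/(pi*tau)

print(f"max deviation from sinc: {np.abs(rec - analytic).max():.3e}")
```

The first zeros of this function lie at $\tau = \pm\pi/\omega_{\mathrm{cut}}$, consistent with the resolution limit stated above.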

Glycerine is a liquid with rather high viscosity; its acoustic attenuation is about 100 times higher than that of water. At a temperature of $25\,^{\circ}\mathrm{C}$ and a frequency of 1 MHz, the sound velocity is $c = 1923\,\mathrm{m/s}$ and the attenuation is $2.5\,\mathrm{m}^{-1}$, which gives 22 dB/m or 0.22 dB/cm [multiplying the attenuation by $20\log_{10}(e)$].^{42} This gives $\alpha_0 = \alpha/\omega^2 = 2.5/(2\pi \times 10^6)^2\,\mathrm{s^2/m} = 6.3\times 10^{-14}\,\mathrm{s^2/m}$. Even for an acoustic attenuation as high as in glycerine and for high frequencies, e.g., up to 100 MHz, $\omega/(cK(\omega))$ can still be approximated by one in Eq. (16), because $c\alpha_0\omega = 0.076$ is small compared to one.
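These glycerine numbers are easy to verify with straightforward arithmetic on the values quoted above:

```python
import numpy as np

# Glycerine at 25 C: attenuation 2.5 1/m at 1 MHz -> alpha0 = alpha/omega**2,
# dB values, and the smallness of c*alpha0*omega at 100 MHz.
c = 1923.0                                     # m/s
attenuation_1MHz = 2.5                         # 1/m
alpha0 = attenuation_1MHz / (2 * np.pi * 1e6)**2
dB_per_cm = attenuation_1MHz * 20 * np.log10(np.e) / 100
factor = c * alpha0 * 2 * np.pi * 1e8          # c*alpha0*omega at 100 MHz

print(f"alpha0 = {alpha0:.2e} s^2/m")          # ~6.3e-14
print(f"attenuation: {dB_per_cm:.3f} dB/cm")   # ~0.22
print(f"c*alpha0*omega at 100 MHz: {factor:.3f}")
```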

We will start with a simple 1D imaging problem: a thin planar layer at a certain depth $z$ below the surface of our detection plane. Due to acoustic attenuation, the rectangular pressure pulse changes its shape according to Eq. (15), calculated by the discretized matrix equation (17). In our example, we take three layers at depths $z$ of 1 mm, 5 mm, and 10 mm, with a layer thickness of 0.2 mm. Table I shows the truncation frequencies according to Eq. (22) and the resulting resolution limits from Eq. (23). Figure 2 shows the initial pressure distribution, which in 1D corresponds directly to the virtual pressure wave $p_{\mathrm{virt}}$, and the “measured” pressure signal $p$, which was calculated from Eq. (17) with 1% Gaussian noise added (SNR = 100). For image reconstruction, the inverse matrix was calculated using the truncated singular value decomposition (T-SVD).^{25} The SVD of the matrix $M_p$ is a factorization of the form $U\Sigma V^{\mathrm{t}}$, where $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix with non-negative diagonal elements in decreasing order, called the singular values. For the pseudo-inverse $\Sigma^+$, the non-vanishing diagonal elements of $\Sigma$ are inverted, and the non-significant singular values, those smaller than $1/\mathrm{SNR}$, are set to zero. The pseudo-inverse of the matrix $M_p$ is $M_p^+ = V\Sigma^+ U^{\mathrm{t}}$. Then, the reconstructed virtual pressure wave is

$$p_{\mathrm{rec}} = M_p^+\, p = V\Sigma^+ U^{\mathrm{t}}\, p, \tag{25}$$

which is shown in Fig. 2 around the depth of 1 mm and around the depth of 10 mm.

**Table I.** Truncation frequency and spatial resolution for three propagation distances (SNR = 100).

| Propagation distance $z$ (mm) | Truncation frequency $\omega_{\mathrm{cut}}/(2\pi)$ (MHz) | Spatial resolution $\delta_{\mathrm{resolution}}$ ($\mu$m) |
|---|---|---|
| 1 | 42.9 | 22.4 |
| 5 | 19.2 | 50.1 |
| 10 | 13.6 | 70.9 |

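Table I can be reproduced from Eqs. (22) and (23) (a sketch using $\alpha_0 = 6.33\times 10^{-14}\,\mathrm{s^2/m}$ from the glycerine value above and SNR = 100):

```python
import numpy as np

# Reproducing Table I: omega_cut = sqrt(ln(SNR)/(alpha0*z)) and
# delta = pi*c/omega_cut for the three layer depths.
c, alpha0, SNR = 1923.0, 6.33e-14, 100.0
table = []
for z in (1e-3, 5e-3, 10e-3):
    omega_cut = np.sqrt(np.log(SNR) / (alpha0 * z))
    table.append((omega_cut / (2 * np.pi) / 1e6,        # f_cut in MHz
                  np.pi * c / omega_cut * 1e6))         # delta in micrometers
    print(f"z = {z*1e3:4.0f} mm: f_cut = {table[-1][0]:5.1f} MHz, "
          f"delta = {table[-1][1]:5.1f} um")
```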

### B. Virtual wave concept for thermographic reconstruction

In the frequency domain, we can calculate the decay in amplitude of a thermal wave with frequency $\omega $ after a distance $z$ either by directly solving the Helmholtz equation (9) or by inserting the Ansatz^{44}

with the complex wavenumber $\sigma(\omega) = \sqrt{i\omega/\alpha} \equiv (1+i)/\mu$ and a thin absorbing layer $T_0(z) = T_0\,\delta(z)$ into the heat diffusion equation (5), which results in

where $\Re$ denotes the real part and $\mu(\omega) \equiv \sqrt{2\alpha/\omega}$ is defined as the thermal diffusion length.^{45} The amplitude of the thermal wave is reduced by a factor $1/e$ after propagating that length. The wavenumber or spatial frequency is $k(\omega) \equiv 1/\mu(\omega) = \sqrt{\omega/(2\alpha)}$. Similar to the acoustic attenuation described in Sec. II A, for frequencies larger than the truncation frequency $\omega_{\mathrm{cut}}$, the amplitude of these wave components is damped below the noise level, with the truncation wavenumber $k_{\mathrm{cut}}$, and for the truncation diffusion length, frequency, or wavenumber, we get

$$\mu_{\mathrm{cut}} = \frac{z}{\ln(\mathrm{SNR})}, \qquad k_{\mathrm{cut}} = \frac{\ln(\mathrm{SNR})}{z}, \qquad \omega_{\mathrm{cut}} = 2\alpha\, k_{\mathrm{cut}}^2 = \frac{2\alpha \left(\ln(\mathrm{SNR})\right)^2}{z^2}. \tag{28}$$

Similar to Eq. (23), the spatial resolution

$$\delta_{\mathrm{resolution}} = \frac{\pi}{k_{\mathrm{cut}}} = \frac{\pi\, z}{\ln(\mathrm{SNR})} \tag{29}$$

is half of the wavelength at the truncation frequency. For thermographic reconstruction, the resolution limit grows linearly with depth $z$; for photoacoustic reconstruction, in comparison, it grows only with the square root of $z$ [Eq. (23)].

The objective of the inverse problem in thermographic reconstruction is to calculate the virtual temperature wave from the noisy temperature measurements. The one-dimensional solution of the wave equation is a wave packet of constant shape traveling with sound velocity, because the group velocity is equal to the phase velocity. Therefore, in 1D, the virtual wave immediately gives the reconstruction at time $t=0$ by projecting it back over a distance $z = ct$. Similar to photoacoustic imaging, the optimum reconstruction can be achieved in the temporal frequency domain ($\omega$-space) by considering frequencies up to the truncation frequency $\omega_{\mathrm{cut}}$. If $T_{\mathrm{virt}}$ has the shape of a sharp pulse, ideally a Dirac delta distribution, the pulse keeps its shape over the travel distance, as the virtual wave equation (8) has only solutions without dispersion and attenuation; in $\omega$-space, the high frequencies of $\tilde{T}_{\mathrm{virt}}$ still contribute to the signal after some propagation. In contrast, the temperature signal $T$ related to a delta pulse in $T_{\mathrm{virt}}$ gets broadened with increasing distance $z$ according to Eq. (19).

In Fig. 3(a), the time-dependent temperature signal calculated using Eq. (21) is shown for three different heat sources at layer depths of 1, 3, and 5 mm. The layer thickness is 0.2 mm, the same as in the photoacoustic example. A low-carbon steel ($<0.4\%$ C) with the following material parameters is theoretically investigated: $k = 43\,\mathrm{W/(m\,K)}$, $C_p = 465\,\mathrm{J/(kg\,K)}$, and $\rho = 7850\,\mathrm{kg/m^3}$. The resulting thermal diffusivity, which describes the speed of heat propagation, is $\alpha = k/(\rho C_p) = 11.8\times 10^{-6}\,\mathrm{m^2/s}$. The characteristic time (or timescale) to reach the maximum temperature in a 1D heat diffusive process is $t_D = z^2/(2\alpha)$. Due to this quadratic dependence, the temperature curves in Fig. 3(a) show very widespread timescales for the different layer depths $z$. This can be interpreted by the concept of “thermal waves”: the short-time behavior is dominated by the fastest propagating, high-frequency components of the pulse and the long-time behavior by the low-frequency components, with a phase velocity of $\sqrt{2\alpha\omega}$.^{44} The sample can be considered a low-pass filter: the high-frequency components are heavily attenuated as they propagate, so only the slowly moving low-frequency components reach the surface from deeper sources in the sample. In Fig. 3(b), the reconstructed signal $T_{\mathrm{rec}}$ is shown for the three rectangular signals of $T_{\mathrm{virt}}$, where the truncated SVD is used for regularization, with a truncation value of $1/\mathrm{SNR}$ for the smallest singular values and an SNR of 100. The truncation frequencies are listed in Table II for the three layer depths. As the resolution limit is the size of a small, ideally point-like, reconstructed source at a certain depth, the resolution after a propagation distance $z = ct'$ is half the wavelength at the truncation frequency, or the width of the reconstructed signal calculated by the T-SVD. This is in good agreement with the spatial resolution in Eq. (29).
The thermographic resolution limit is proportional to the travel distance $z$ of the signal between the excitation location and the detection surface and inversely proportional to the natural logarithm of the SNR; the thermal diffusivity has no influence. The resolution limits for an SNR of 100 are given in Table II.

**Table II.** Truncation frequency and spatial resolution for three layer depths (low-carbon steel, SNR = 100).

| Propagation distance $z$ (mm) | Truncation frequency $\omega_{\mathrm{cut}}/(2\pi)$ (Hz) | Spatial resolution $\delta_{\mathrm{resolution}}$ (mm) |
|---|---|---|
| 1 | 79.5 | 0.68 |
| 3 | 8.8 | 2.05 |
| 5 | 3.2 | 3.41 |

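Analogously, Table II follows from Eqs. (28) and (29) (a sketch with the steel diffusivity quoted above and SNR = 100):

```python
import numpy as np

# Reproducing Table II: k_cut = ln(SNR)/z, omega_cut = 2*alpha*k_cut**2,
# delta = pi/k_cut = pi*z/ln(SNR) for the three layer depths.
alpha, SNR = 11.8e-6, 100.0
table = []
for z in (1e-3, 3e-3, 5e-3):
    k_cut = np.log(SNR) / z
    omega_cut = 2 * alpha * k_cut**2
    table.append((omega_cut / (2 * np.pi),   # f_cut in Hz
                  np.pi / k_cut * 1e3))      # delta in mm
    print(f"z = {z*1e3:.0f} mm: f_cut = {table[-1][0]:5.1f} Hz, "
          f"delta = {table[-1][1]:.2f} mm")
```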

## III. THERMODYNAMICS AND INFORMATION

Gaining information during photoacoustic and photothermal imaging is a physical process, which has to obey the laws of (non-equilibrium) thermodynamics. Determining the virtual pressure or temperature wave from the measured pressure or temperature is an ill-posed or ill-conditioned inverse problem, which needs regularization.^{4} In Sec. II, the truncation frequency $\omega_{\mathrm{cut}}$ was chosen as the regularization parameter: with T-SVD regularization it was assumed that all signal components below $\omega_{\mathrm{cut}}$ can be used for reconstruction, whereas all components above $\omega_{\mathrm{cut}}$, where the signal is damped below the noise level, provide no additional information. The signal amplitude compared to the noise level therefore determines the truncation frequency $\omega_{\mathrm{cut}}$, which serves as the regularization parameter of the inverse problem. Where does this noise come from, and what is its relation to the information gained in imaging?

Before answering these essential questions, some basics of thermodynamics and statistical physics should be reviewed. A microstate is a specific microscopic configuration of a thermodynamic system, e.g., with certain locations and velocities of all of its molecules. A microstate can be represented as a single point with coordinates $x$ in a usually very high-dimensional phase space. In contrast, the macrostate of a system refers to a few macroscopic properties, such as its temperature, pressure, volume, or density. A macrostate is characterized by a probability distribution $\rho(x)$ on the phase space of possible microstates, which describes the probability of finding the system in a certain microstate. In classical mechanics, the position and momentum variables of a particle vary continuously, so the set of microstates is uncountable and $x$ becomes a continuous variable. The time evolution of all the particles is determined by the Hamiltonian $H$, which gives the total energy of a microstate as the sum of its kinetic and potential energy. The system's energy

$$U_t = \langle H \rangle_t = \int H(x)\, \rho_t(x)\, dx \tag{30}$$
$$U_t = \langle H \rangle_t = \int H(x)\,\rho_t(x)\,\mathrm{d}x \tag{30}$$
at time $t$ is the mean value, also called the expectation value, of the system Hamiltonian $H$, where $\rho_t(x)$ is the probability distribution at time $t$.

Here, we follow the work of Esposito and van den Broeck^{7} about the second law and the connection between entropy production and information loss. They have used a general Hamiltonian system in contact with an ideal heat bath at temperature $T$. $W_t$ is the work performed on the system, and $Q_t$ is the heat flowing from the system into the ideal heat bath. In our case, the system is the sample, the performed work is the energy $W$ of the short laser pulse at the time $t=0$, and $Q_t$ is the diffused heat or the heat dissipated by acoustic attenuation until time $t$. The ideal heat bath is the sample's environment at ambient temperature $T$ (see Fig. 4). The system entropy is the Shannon entropy
$$S_t := -k_B \int \rho_t(x) \ln \rho_t(x)\,\mathrm{d}x, \tag{31}$$
where $k_B$ is the Boltzmann constant, and the corresponding system free energy is given by
$$F_t := U_t - T S_t. \tag{32}$$
Here, $T$ is the temperature in Kelvin of the ideal heat bath with which the system is in contact, i.e., the ambient temperature of the sample. The excitation laser pulse deposits the work $W$ at the time $t=0$; $Q_t$ denotes the part of this energy that has left the system until time $t$, either by heat diffusion or as heat from dissipation of the photoacoustic pulse by acoustic attenuation. After a long time, the system is in the same thermal equilibrium as before the excitation pulse, and all the energy $W$ of the laser pulse has left the system and is in the heat bath ($Q_t = W$), as sketched in Fig. 4.

Following conservation of energy (first law of thermodynamics), the corresponding energy change $\Delta U_t := U_t - U_{eq}$ of the system is given by
$$\Delta U_t = W - Q_t. \tag{33}$$
At $t \to \infty$, equilibrium is reached again, $Q_t$ approaches $W$, and the change in system energy vanishes, $\Delta U_t = 0$.

If the system is in equilibrium with the surroundings at the ambient temperature $T$, the equilibrium distribution is the canonical distribution
$$\rho_{eq}(x) = \frac{1}{Z}\, e^{-\beta H(x)}, \tag{34}$$
where the normalization function of the equilibrium distribution is the canonical partition function $Z$ and $\beta = 1/(k_B T)$ is the thermodynamic beta. This Maxwell-Boltzmann distribution $\rho_{eq}$ can be determined by maximizing the entropy $S_t$ under the two constraints of a fixed total particle number and a fixed average energy per particle, see, e.g., Refs. 46 and 47. For the equilibrium entropy, one gets
$$S_{eq} = \frac{U_{eq}}{T} + k_B \ln Z, \tag{35}$$

and the equilibrium free energy in Eq. (32) becomes

$$F_{eq} = U_{eq} - T S_{eq} = -k_B T \ln Z. \tag{36}$$
According to the second law of thermodynamics, the entropy of an adiabatically insulated system increases monotonically until thermodynamic equilibrium is established, where the relative entropy $-k_B D(\rho_t \| \rho_{eq})$ vanishes, with the Kullback–Leibler divergence
$$D(\rho_t \| \rho_{eq}) = \int \rho_t(x) \ln \frac{\rho_t(x)}{\rho_{eq}(x)}\,\mathrm{d}x \tag{37}$$
between the non-equilibrium distribution $\rho_t$ at time $t$ and the equilibrium distribution $\rho_{eq}$.^{34} The Chernoff–Stein Lemma gives a precise meaning to $D(f \| g)$ as a "distance" between two distributions: if $n$ data from $g$ are given, the probability of guessing incorrectly that the data come from $f$ is bounded by the error $\epsilon = \exp(-n D(f \| g))$ for large $n$.^{34,48} The free energy of a non-equilibrium state $\rho_t$ is higher than that of the equilibrium state by an amount equal to the temperature times the information $I_t$ needed to specify the non-equilibrium state,
$$T I_t := k_B T\, D(\rho_t \| \rho_{eq}) = U_t - T S_t + k_B T \ln Z = F_t - F_{eq}. \tag{38}$$
For the first equality, Eqs. (37), (31), and (34) are used, and the second equation uses Eq. (30). The third equation uses the definition of the free energy in Eq. (32) and for the equilibrium state in Eq. (36). Due to the excitation pulse at time $t=0$, the free energy jumps from $F_{eq}$ to $F_{kick} = F_{eq} + W$, because the pulse shifts the distribution to a higher energy $\Delta U_{kick} = W$ [Eq. (33)] but does not change the entropy (Fig. 4), $S_{kick} = S_{eq}$.
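As a small numerical illustration (not from the Tutorial itself), the Kullback–Leibler divergence and the Chernoff–Stein error bound can be evaluated for two Gaussian distributions with equal variance, where $D$ reduces to the squared mean difference divided by twice the variance:

```python
import numpy as np

def kl_gaussians_equal_var(mu_f, mu_g, var):
    """D(f || g) in nats for two 1D Gaussians with equal variance:
    D = (mu_f - mu_g)^2 / (2 var)."""
    return (mu_f - mu_g) ** 2 / (2.0 * var)

# "Non-equilibrium" state f: mean shifted by a kick; "equilibrium" state g.
D = kl_gaussians_equal_var(2.0, 0.0, 1.0)   # D = 2 nats

# Chernoff-Stein: with n samples drawn from g, the probability of wrongly
# deciding that they came from f is bounded by exp(-n D).
for n in (1, 5, 10):
    print(n, np.exp(-n * D))
```

The larger the "distance" $D$, the fewer samples are needed to distinguish the two states; for $n = 10$ samples and $D = 2$, the error bound is already of the order of $10^{-9}$.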

The change in the non-equilibrium system entropy, $\Delta S_t = S_t - S_{eq}$, can be written as the sum of a reversible contribution from the heat flow, $-Q_t/T$, and an irreversible contribution called the entropy production,
$$\Delta S_t = -\frac{Q_t}{T} + \Delta_i S_t. \tag{39}$$
Together with the definition of the free energy in Eq. (32), the first law in Eq. (33), and Eq. (38), we obtain for the entropy production
$$\Delta_i S_t = \frac{W - \Delta F_t}{T}. \tag{40}$$
The free energy change $\Delta F_t$ jumps at time $t=0$ from zero to $W$, then decreases and approaches zero; the information according to Eq. (38) makes the corresponding jump to $W/T$ and then decreases to zero. $\Delta I_t$ is defined as the information loss, and according to Eq. (40), we get the very important result that the entropy production,
$$\Delta_i S_t = \frac{W}{T} - I_t = \Delta I_t, \tag{41}$$
is equal to the information loss.

For a macroscopic sample, the fluctuations, i.e., the variance or "noise" around the mean values $x(t)$, are small, and the change of the "shape" of the distribution during the time evolution shown in Fig. 4 can be neglected. Therefore, the change of the Shannon system entropy can be neglected [$\Delta S_t \approx 0$ in Eq. (39)] and the change in free energy is equal to the change in system energy, $\Delta F_t \approx \Delta U_t$, which gives from Eq. (38) for the information
$$I_t \approx \frac{\Delta U_t}{T} = \frac{W - Q_t}{T}. \tag{42}$$
In this very good approximation for macroscopic samples, the Kullback–Leibler divergence becomes a distance in the mathematical sense, and the information content is the energy $W - Q_t$, which has not yet dissipated or diffused to the surroundings at temperature $T$, divided by this temperature. After a long time, when all the work $W$ has dissipated or diffused, $I_t$ vanishes and the equilibrium distribution $\rho_{eq}$ is reached. But according to the Chernoff–Stein Lemma, already after some time $t_{cut}$ the distribution $\rho_{t_{cut}}$ cannot be statistically distinguished from $\rho_{eq}$, namely when $k_B T\, D(\rho_t \| \rho_{eq})$ gets smaller than the equilibrium energy $U_{eq}$.

For real-world examples with a high dimensional phase space $x$, the actual distribution density $\rho_t$ can hardly be determined, but for the information loss from Eq. (42) it is sufficient to evaluate the dissipated or diffused work from the mean-value equations. Usually, a set of reduced variables with a drastically lower dimensionality than the phase space can be used, which captures the information on the mean dissipated or diffused work. In the macroscopic approximation, the actual noise or fluctuation around the mean value is irrelevant; only its amplitude is significant for setting the lower bound in the Chernoff–Stein Lemma. Subsections III A and III B give small examples for dissipation and for diffusion to clarify the relation between entropy production and information loss.

### A. Kicked Brownian particle

We now consider the velocity $v$ of a particle as a stochastic process, as used to describe the Brownian motion of a particle; for simplicity, only one velocity component (1D) is considered here. This shows the essential relation between fluctuation (noise) and dissipation (viscous damping). Stochastic processes can be described mathematically, e.g., by Master equations, Langevin equations, or Fokker–Planck equations, as described, e.g., in the books of van Kampen,^{49} Gardiner,^{50} or Risken.^{51} In the Langevin equation
$$\frac{\mathrm{d}v}{\mathrm{d}t} = -\gamma v + \sigma \eta(t), \tag{43}$$
the environmental forces on a particle in Newton's law are a linear damping term together with random noise. The linear damping $-\gamma v$ is a viscous drag, and $\sigma$ is the amplitude of the white noise $\eta$, which has zero mean and is uncorrelated in time: $\langle \eta(t)\eta(t') \rangle = \delta(t - t')$. The Langevin equation governs an Ornstein–Uhlenbeck (O-U) process, named after L. S. Ornstein and G. E. Uhlenbeck, who formalized the properties of this continuous Markov process.^{52} Using the statistical properties and the continuum limit of the white noise $\eta$, it can be shown that the Langevin equation is equivalent to a description based on a Fokker–Planck equation. For the time-dependent distribution density $\rho_t(v)$ of the velocity, the linear differential equation
$$\frac{\partial \rho_t(v)}{\partial t} = \gamma \frac{\partial}{\partial v}\left[v\, \rho_t(v)\right] + \frac{\sigma^2}{2} \frac{\partial^2 \rho_t(v)}{\partial v^2} \tag{44}$$
can be derived.^{51,53}

We start with the equilibrium state [zero time derivative if inserted into Eq. (44)] as the initial velocity distribution
$$\rho_{eq}(v) = \sqrt{\frac{\gamma}{\pi \sigma^2}}\, \exp\!\left(-\frac{\gamma v^2}{\sigma^2}\right), \tag{45}$$
which is a Gaussian distribution with mean value zero and variance $\mathrm{Var}(v) = \sigma^2/(2\gamma)$, representing the statistical "noise." Comparison with Eq. (34) and the kinetic energy $H(v) = m v^2/2$ gives
$$\frac{\sigma^2}{2\gamma} = \frac{k_B T}{m} \quad \Longleftrightarrow \quad \sigma^2 = \frac{2\gamma\, k_B T}{m}. \tag{46}$$
This relation states a connection between the strength of the fluctuations, given by $\sigma $, and the strength of the dissipation $\gamma $. This is the *fluctuation–dissipation theorem* for uncorrelated white noise.

At time zero, the particle is kicked, which causes an immediate change in velocity of $v_0$ (kick magnitude), and the distribution density after the kick is
$$\rho_{kick}(v) = \rho_{eq}(v - v_0). \tag{47}$$
The solution of Eq. (44) for the time-dependent distribution density $\rho_t(v)$ at $t > 0$, with $\rho_{kick}(v)$ as the initial condition at $t=0$, is
$$\rho_t(v) = \sqrt{\frac{\gamma}{\pi \sigma^2}}\, \exp\!\left(-\frac{\gamma \left(v - \bar{v}(t)\right)^2}{\sigma^2}\right), \quad \bar{v}(t) = v_0\, e^{-\gamma t}, \tag{48}$$
which is a Gaussian distribution with a time-dependent mean value $\bar{v}(t)$ but a constant variance $\mathrm{Var}(v) = \sigma^2/(2\gamma)$. As shown by Burgholzer and Hendorfer, this is a general feature of Gauss–Markov processes, also in higher dimensions: taking the equilibrium as the initial condition results, for all times after the kick, in a distribution with a constant (co)variance (matrix) equal to the equilibrium variance.^{9} One realization of the kicked O-U process is shown in Fig. 5. At the time $t=0$, a kick with a magnitude of $v_0 = 10$ occurs. For $t > 0$, the information about the magnitude of the kick gets more and more lost due to the fluctuations. Now, this increasing information loss is quantified and compared to the mean entropy production.

According to Eqs. (30) and (33), due to the kick at time zero, the energy of the particle is increased by $W = m v_0^2/2$ and
$$\Delta U_t = \frac{m\, \bar{v}(t)^2}{2} = W e^{-2\gamma t}. \tag{49}$$
From Eq. (31), the entropy
$$S_t = \frac{k_B}{2}\left(1 + \ln \frac{\pi \sigma^2}{\gamma}\right) \tag{50}$$
stays constant in time ($\Delta S_t = 0$), as the distribution is shifted in time but with a constant "shape." Therefore, the approximation in Eq. (42) becomes exact and the information content is
$$I_t = \frac{\Delta U_t}{T} = \frac{W}{T}\, e^{-2\gamma t}. \tag{51}$$
According to the Chernoff–Stein Lemma, if the information $I_t$ at a time $t_{cut}$ gets less than a certain limit, here chosen as the equilibrium energy divided by the temperature ($U_{eq}/T = k_B/2$), the distribution $\rho_t(v)$ cannot be statistically distinguished from the equilibrium distribution $\rho_{eq}(v)$. The information content is
$$I_{t_{cut}} = \frac{W}{T}\, e^{-2\gamma t_{cut}} = \frac{k_B}{2}, \tag{52}$$
which gives
$$t_{cut} = \frac{1}{2\gamma} \ln\!\left(\frac{2W}{k_B T}\right) = \frac{\ln(\mathrm{SNR})}{\gamma}. \tag{53}$$
At this truncation time $t_{cut}$, not only can the distribution no longer be distinguished from equilibrium, but also the mean velocity $\bar{v}(t) = v_0 \exp(-\gamma t)$ gets less than the noise level $v_0/\mathrm{SNR}$. In Fig. 5, this is at the scaled time $t_{cut}\gamma = \ln(10) \approx 2.3$. This coincidence, that the information-related criterion in Eq. (52) gives the same time $t_{cut}$ at which the signal gets less than the noise, always happens if *the energy described by the Hamiltonian $H$ is proportional to the square of the signal amplitude.*
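The kicked O-U process can be simulated directly from the Langevin equation. The following sketch uses illustrative parameters (only the kick magnitude $v_0 = 10$ is taken from Fig. 5) and checks that the mean decays as $v_0 e^{-\gamma t}$ while the variance stays at the equilibrium value $\sigma^2/(2\gamma)$:

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 1.0
sigma = np.sqrt(2.0)       # equilibrium variance sigma^2/(2 gamma) = 1
v0 = 10.0                  # kick magnitude, as in Fig. 5
dt, n_steps, n_paths = 1e-3, 4000, 20000

# Start in equilibrium, then kick every path by v0 at t = 0:
v = rng.normal(0.0, sigma / np.sqrt(2.0 * gamma), n_paths) + v0

means, variances = [], []
for _ in range(n_steps):
    # Euler-Maruyama step of the Langevin equation dv = -gamma v dt + sigma dW
    v += -gamma * v * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
    means.append(v.mean())
    variances.append(v.var())

t = dt * np.arange(1, n_steps + 1)
print(np.max(np.abs(np.array(means) - v0 * np.exp(-gamma * t))))  # small
print(np.mean(variances))                                         # close to 1

# SNR = v0 / sqrt(Var(v)) = 10, so the mean drops below the noise level at
# t_cut = ln(SNR)/gamma = ln(10) ~ 2.3, the scaled truncation time of Fig. 5.
snr = v0 / (sigma / np.sqrt(2.0 * gamma))
print(np.log(snr) / gamma)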

### B. Ideal gas diffusion between two boxes

Another example of an Ornstein–Uhlenbeck process, using diffusion instead of dissipation, is the distribution of $2N$ particles of an ideal gas between two boxes of volume $V$, which are connected by a small hole through which particles can slip from one box to the other (effusion). The constant rate $\gamma$ is the reciprocal of the mean time a particle stays in one of the boxes before it slips through the hole. At a certain time $t$, $N + \delta N(t)$ particles are situated in the left box and $N - \delta N(t)$ particles are in the right box (Fig. 6).

$\delta N(t)$ can be approximated by an Ornstein–Uhlenbeck process for a large number of particles (see, e.g., the tutorial introduction to stochastic processes by Lemons and Langevin^{54}). At the time $t=0$, $N_0$ particles are "pumped" from the right to the left box. The mean value $\delta \bar{N}(t) = N_0 \exp(-\gamma t)$ for $t > 0$ is given by an exponential decay in time, and the variance of the Gauss distribution is again constant in time with $\mathrm{Var}(\delta N) = \sigma^2/(2\gamma)$, according to Eq. (48). However, $\mathrm{Var}(\delta N)$ is not an energy, and therefore $\sigma^2$ cannot be expressed in terms of a rate $\gamma$ and a temperature $T$ by requiring the equipartition of energy at equilibrium. The fluctuation–dissipation theorem as in Eq. (46) does not apply; $\delta N$ fluctuates without dissipation.^{54} The information loss can be described analogously to Eq. (51) by an exponential decay
$$I_t = k_B \frac{\delta \bar{N}(t)^2}{2\, \mathrm{Var}(\delta N)} = k_B \frac{N_0^2}{2\, \mathrm{Var}(\delta N)}\, e^{-2\gamma t}. \tag{54}$$
Spatial diffusion is the cause of the entropy production and is described according to Boltzmann by $S = k_B \ln W$, where $W$ is the number of possibilities to distribute the particles into the two boxes for a certain $\delta \bar{N}$. Using Stirling's formula for a large number of particles $N$, one gets for the entropy^{46}
$$S(\delta N) \approx 2 N k_B \ln 2 - k_B \frac{\delta N^2}{N}. \tag{55}$$
By comparison with Eq. (54), one again gets a relation between the fluctuations, given by $\sigma$, and the strength of the diffusion, given by the rate $\gamma$ (a fluctuation–"diffusion" theorem), and for the variance of $\delta N$, we get
$$\mathrm{Var}(\delta N) = \frac{\sigma^2}{2\gamma} = \frac{N}{2}. \tag{56}$$
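The equilibrium variance $\mathrm{Var}(\delta N) = N/2$ can be checked with a direct Monte Carlo sketch (illustrative numbers): in equilibrium, each of the $2N$ particles sits in the left box with probability $1/2$, so the left-box occupation is binomial with variance $2N \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = N/2$:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 500              # 2N = 1000 particles in total
n_samples = 200_000  # independent equilibrium snapshots

# Each particle is in the left box with probability 1/2 at equilibrium,
# so the left-box occupation is Binomial(2N, 1/2).
left = rng.binomial(2 * N, 0.5, size=n_samples)
delta_N = left - N   # deviation from the symmetric configuration

print(delta_N.var(), N / 2)   # both close to 250
```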
## IV. IMAGING AS AN INVERSE PROBLEM

In Sec. III, imaging with an excitation pulse at time $t=0$ has been described by means of non-equilibrium statistical physics. Pressure or temperature is proportional to the momentum or the kinetic energy of many particles, which fluctuate around their mean values. For macroscopic samples, these fluctuations are very small and can usually be neglected in the thermodynamic limit. But for inverse problems, these fluctuations are highly "amplified," which was shown in Sec. III for a system kicked out of equilibrium, followed by a dissipative process back to equilibrium (Fig. 4). The inverse problem of estimating the kick magnitude from an intermediate state a certain time after the kick is ill-posed. Just after the kick, its magnitude can be estimated very well; a long time after the kick, the state is nearly in equilibrium and all the information about the kick magnitude is lost. For macroscopic systems, it could be shown that the information loss is, in a good approximation, just the mean dissipated energy or diffused heat divided by the temperature, which is the mean entropy production. In imaging, the spatial resolution and the information content are strongly correlated, and therefore a loss of information results in a loss of resolution, which is quantified for 1D examples in Secs. II A and II B.

In Secs. III A and III B, the mean dissipated energy and the resulting loss of information are calculated explicitly for a kicked Brownian particle and for an ideal gas diffusing between two boxes. For these study cases, the time-dependent probability distribution could also be calculated explicitly, and the deduced information losses are in agreement with the results from mean-value calculations, even when the distributions are broad and still far away from the thermodynamic limit. A truncation time could be given, after which the state cannot be distinguished from the equilibrium distribution according to the Chernoff–Stein Lemma, namely when the Kullback–Leibler divergence gets too small. The amplitude of the fluctuations and the dissipation or the diffusion are not independent: fluctuations and mean entropy production can be thought of as two sides of the same coin and are connected by the fluctuation–dissipation theorem. For systems near thermal equilibrium in the linear regime, such relations between entropy production and fluctuation properties have been found by Callen and Greene,^{55} Callen and Welton,^{56} and Greene and Callen.^{57} This fluctuation–dissipation theorem is a generalization of the famous Johnson^{58} and Nyquist^{59} formula in the theory of electric noise. It is based on the fact that in the linear regime the fluctuations decay according to the same law as a deviation from equilibrium following an external perturbation.

For macroscopic systems, it is not necessary to describe the full stochastic process to capture the influence of the fluctuations. The relevant information loss can be calculated from the averaged behavior (mean value), which describes the usually known macroscopic evolution of the system in time, as the mean dissipated work or diffused heat divided by the temperature. This remarkable feature might be the reason that regularization methods for ill-posed inverse problems work so well, although they use only the mean-value equations and not the detailed stochastic process to describe the time evolution. The choice of an adequate regularization parameter, e.g., the truncation value for the truncated singular value decomposition (T-SVD) method, is equivalent to the choice of the error level $\epsilon$ in the Chernoff–Stein Lemma.

A prominent class of ill-posed inverse problems is non-destructive imaging (NDI), where the information about the spatial pattern of a sample's interior has to be transferred to the sample surface by certain waves, e.g., ultrasound or thermal waves (Fig. 1). Imaging is done by reconstruction of the interior structure from the signals measured on the sample surface, e.g., by backprojection or time reversal for a photoacoustically induced ultrasound wave.^{11,60} Several effects limit the spatial resolution for photoacoustic imaging. Besides insufficient instrumentation and data processing, one principal limitation comes from attenuation of the acoustic wave or from heat diffusion (Sec. II). Attenuation during wave propagation can be caused by dissipation of acoustic energy to heat or by acoustic scattering. In this article, it is assumed that, as with dissipation, the information is also lost with scattering and that scattered waves are not used for image reconstruction. Using scattered waves for reconstruction would be possible, but this is beyond the scope of this Tutorial. Therefore, the information removed by acoustic attenuation is entirely lost and cannot be recovered by subsequent processing algorithms. In this section, it is shown that the loss of information, which is equal to the entropy production (Sec. III), i.e., the dissipated energy or diffused heat divided by the temperature, is a principal thermodynamic limit, which cannot be compensated. Using the information loss and entropy production for a kicked process derived in Sec. III, it is shown that the resolution limit depends just on the macroscopic mean-value equations and is independent of the actual stochastic process, as long as the macroscopic equations describe the mean work and therefore also the mean dissipated work.

In this section, we do not attempt to model the process of acoustic attenuation or heat diffusion as a stochastic process, as we have done earlier with a Gaussian process.^{9} Instead, we use from Sec. II the mean-value equations for the evolution of the pressure $p(r,t)$ or temperature $T(r,t)$ in the frequency domain, $\tilde{p}(r,\omega)$ or $\tilde{T}(r,\omega)$, respectively.

For an acoustic wave in the frequency domain, according to Eq. (10), the amplitude of the wave component with frequency $\omega$ is damped by the factor $\exp(-\alpha_0 \omega^2 r)$ after propagating a distance $r$. The energy $\Delta U_\omega$ of the acoustic wave with frequency $\omega$ is proportional to the square of the pressure amplitude,
$$\Delta U_\omega = \frac{\chi\, \Delta V}{2}\, |\tilde{p}(r,\omega)|^2, \tag{57}$$
which can be found, e.g., in Morse and Ingard,^{61} where $\chi = 1/(c^2 \rho)$ is the adiabatic compressibility with the density $\rho$, and $\Delta V$ is the measurement volume. Using that $\mathrm{Var}(p) = k_B T/(\chi \Delta V)$ is the variance of the pressure (e.g., from Landau and Lifshitz^{62}), one gets from Eq. (42) for the information content of the Fourier component with frequency $\omega$: $I_\omega = \Delta U_\omega / T = 0.5\, k_B\, \mathrm{SNR}^2 \exp(-2\alpha_0 \omega^2 r)$. The signal-to-noise ratio at the distance $r=0$ is the reciprocal of the square root of the variance of the pressure, as the signal amplitude in the frequency domain is normalized to one [Eq. (10)]. The truncation frequency $\omega_{cut}$ is defined such that the information content for frequencies larger than $\omega_{cut}$ is so low that its distribution cannot be distinguished from the equilibrium distribution within a certain statistical error level (Chernoff–Stein Lemma). This level is $\Delta U_{eq}/T = 0.5\, \chi \Delta V\, \mathrm{Var}(p)/T = 0.5\, \chi \Delta V\, k_B T/(\chi \Delta V)/T = 0.5\, k_B$, which results in
$$I_{\omega_{cut}} = 0.5\, k_B\, \mathrm{SNR}^2 \exp(-2\alpha_0 \omega_{cut}^2 r) = 0.5\, k_B \quad \Rightarrow \quad \omega_{cut} = \sqrt{\frac{\ln(\mathrm{SNR})}{\alpha_0 r}}. \tag{58}$$
This gives the same truncation frequency as in Eq. (22), where the amplitude is damped just below the noise level. As already described in Sec. III A, the information-related criterion in Eq. (58) gives the same truncation value as the criterion that the signal amplitude gets less than the noise, because the energy $\Delta U_\omega$ is proportional to the square of the signal amplitude $|\tilde{p}(r,\omega)|$ [Eq. (57)].

Similarly, for particle diffusion, the information content $I_t$ is proportional to the square of the mean difference in the particle number $\delta \bar{N}(t)$ [Eq. (54)]. For heat diffusion and thermal waves, the information content is proportional to the square of the temperature deviation from the equilibrium temperature, both in the time domain and in the frequency domain. Heat diffusion can be described as an Ornstein–Uhlenbeck process.^{9,63} Therefore, the information-related criterion in Eq. (42) for thermal waves gives the same truncation frequency $\omega_{cut}$ as the criterion that the signal gets less than the noise, with the results given in Sec. II B.

As mentioned at the end of Sec. II, imaging in two (2D) or three dimensions (3D) can always be reduced to one dimension using a two-stage process: first, for each detector location, the virtual pressure signal in the absence of attenuation or the virtual thermal wave is calculated from the measured acoustic or thermal signal, respectively. This is a one-dimensional (1D) reconstruction problem, described in discrete time steps by Eqs. (17) and (21). In a second step, any reconstruction method for photoacoustic tomography without acoustic attenuation, such as time reversal or backprojection, can be used for reconstruction in higher dimensions.^{11} In 1D, the virtual wave immediately gives the reconstruction at time $t=0$ by projecting it back over a distance $ct$ (see Secs. II A and II B). Therefore, it is sufficient to examine the acoustic attenuation or thermal diffusion of 1D waves and the reconstruction in 1D. Compensation of acoustic attenuation or thermal diffusion in higher dimensions can always be reduced to 1D, which was also explicitly shown for signals from a layer (1D), a cylinder (2D), and a sphere (3D).^{20} Therefore, we begin with a 1D example.

### A. 1D imaging

#### 1. Acoustic attenuation in a fat tissue layer

For the excitation of plane acoustic waves, the surface of a silicon wafer was illuminated by a nanosecond laser pulse (Fig. 7). Due to the abrupt local heating caused by absorption of the 532 nm wavelength laser pulse, the silicon wafer expands thermoelastically and generates an acoustic wave, which propagates through a layer of fatty tissue and is detected by an unfocused ultrasound transducer (V358-SU, Panametrics, Waltham, MA). The measured acoustic pressure without fat, through 6 mm of porcine fat tissue, and through 20 mm of tissue is shown in Fig. 8(a) as a function of time.^{64,65} For comparison, the measured signals are scaled and time-shifted. The frequency spectrum was calculated by Fourier transformation in time, and the frequency-dependent attenuation for the 6 mm and 20 mm fat layers was determined by dividing their spectra by the spectrum of the reference signal without fat. It turned out that a power law $\alpha(\omega) = \alpha_0 \omega^n$ with an exponent $n = 1.5$ and $\alpha_0 = 0.87\,\mathrm{dB\,MHz^{-n}\,cm^{-1}}$ fits the attenuation very well over a wide frequency range.^{64}

The experimental setup is now slightly different from the photoacoustic imaging setup described in Sec. II. In photoacoustic imaging, the short excitation pulse at time zero excites a pressure wave in the whole sample at the same time; an acoustic signal arriving later at the detector has traveled a longer propagation distance and therefore suffers a higher attenuation. Here, the fat layer always has the same thickness, so the attenuation is the same for earlier and later arriving parts of the signal. Only for an excitation that is a Dirac delta in space and time, as described in Eq. (16), are the two situations the same. Since the differential equations are linear, the general attenuated signal is a temporal convolution of the input signal with the attenuated solution for Dirac excitation, or a simple multiplication in the frequency domain.^{16} The discrete version of the relationship between the measured pressure and the virtual pressure wave now, instead of Eq. (17), becomes
$$p = M_z\, p_{virt}, \tag{59}$$
where $p$ and $p_{virt}$ are the vectors of the attenuated and virtual pressure signals at discrete time steps, respectively. Writing the discrete Fourier and inverse Fourier transformations as multiplications by $F$ and its conjugate transpose $F^*$, and with $\mathrm{diag}(\cdot)$ forming a diagonal matrix, $M_z$ can be written, neglecting dispersion, as^{64}
$$M_z = F^*\, \mathrm{diag}\!\left(\exp(-\alpha_0\, \omega_k^n\, z)\right) F, \tag{60}$$
which immediately shows that the singular values of $M_z$ decay exponentially. According to Eq. (22), for frequencies larger than the truncation frequency $\omega_{cut}$, the amplitude of these wave components is damped below the noise level, which results in
$$\omega_{cut} = \left(\frac{\ln(\mathrm{SNR})}{\alpha_0\, z}\right)^{1/n}. \tag{61}$$
The spatial resolution according to Eq. (23) is half the wavelength at the truncation frequency $\omega_{cut}$. For the SNR of 1358 (63 dB), the truncation frequency for a fat thickness of 6 mm and 20 mm is 24 MHz and 11 MHz, respectively, which corresponds to a spatial resolution of $32\,\mu\mathrm{m}$ after 6 mm of fat and $70\,\mu\mathrm{m}$ after 20 mm of fat.
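These numbers can be reproduced from the measured attenuation parameters. The following sketch assumes the usual conversion of $\alpha_0$ from dB to Np (division by $20 \log_{10} e \approx 8.686$) and expresses frequencies in MHz and depths in cm:

```python
import numpy as np

# Parameters from the fat-layer experiment:
alpha0_dB = 0.87          # attenuation prefactor in dB MHz^-n cm^-1
n = 1.5                   # power-law exponent
alpha0 = alpha0_dB / (20.0 * np.log10(np.e))  # dB -> Np (divide by ~8.686)
snr = 1358.0              # 63 dB
c = 1512.0                # sound velocity in fatty tissue, m/s

def f_cut_MHz(z_cm):
    # The damping exp(-alpha0 f^n z) drops below 1/SNR at the cutoff:
    # alpha0 * f_cut^n * z = ln(SNR)
    return (np.log(snr) / (alpha0 * z_cm)) ** (1.0 / n)

for z_cm in (0.6, 2.0):
    f_cut = f_cut_MHz(z_cm)                        # in MHz
    resolution_um = c / (2.0 * f_cut * 1e6) * 1e6  # half wavelength, in um
    print(f"z = {10 * z_cm:.0f} mm: f_cut = {f_cut:.0f} MHz, "
          f"resolution = {resolution_um:.0f} um")
```

Within rounding, this reproduces the quoted cutoff frequencies of 24 MHz and 11 MHz and the resolutions of about 32 μm and 70 μm.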

The acoustic attenuation in water is negligible compared to that in fat.^{64} Therefore, the signal measured in water without fat can be taken as the virtual pressure signal $p_{virt}$, which is not a single Dirac delta pulse but becomes negative and shows additional "ringing" because of the laser ultrasound excitation within the silicon wafer and the characteristics of the piezoelectric transducer and the amplifier. All these influences on the signal can be described by a matrix $M_{water}$ with $p_{virt} = M_{water}\, p_\delta$, and Eq. (59) can be written as
$$p = M_z M_{water}\, p_\delta. \tag{62}$$
Using the truncated SVD for the inversion of $M_z M_{water}$, the reconstructions of $p_\delta$ from the measurements $p$ for water, 6 mm of fat, and 20 mm of fat are shown in Fig. 8(b). Multiplying the temporal widths of these peaks of 21 ns and 46 ns by the sound velocity of 1512 m/s in fatty tissue matches very well the resolution limits of $32\,\mu\mathrm{m}$ after 6 mm of fat and $70\,\mu\mathrm{m}$ after 20 mm of fat derived above from the truncation frequencies.
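The structure of the attenuation operator $M_z$ and its exponentially decaying singular values can also be verified numerically. The sketch below uses illustrative dimensionless parameters, not the measured fat-layer values:

```python
import numpy as np

# Sketch of the discrete attenuation operator M_z = F* diag(exp(-alpha0 w^n z)) F
# with illustrative dimensionless parameters.
N = 256
alpha0, n_exp, z = 3.0, 1.5, 1.0
omega = 2.0 * np.pi * np.fft.fftfreq(N, d=1.0)
decay = np.exp(-alpha0 * np.abs(omega) ** n_exp * z)

F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary DFT matrix
Mz = np.real(F.conj().T @ np.diag(decay) @ F)   # real, since decay is even in omega

# The singular values of M_z are exactly the attenuation factors:
s = np.linalg.svd(Mz, compute_uv=False)
print(np.allclose(np.sort(s), np.sort(decay)))

# T-SVD keeps only the components not damped below the noise level 1/SNR,
# which is the same set as the frequencies below the truncation frequency:
snr = 100.0
omega_cut = (np.log(snr) / (alpha0 * z)) ** (1.0 / n_exp)
kept = int(np.sum(s > 1.0 / snr))
print(kept, int(np.sum(np.abs(omega) < omega_cut)))
```

The two printed counts agree, illustrating that truncating singular values at the noise level and truncating frequencies at $\omega_{cut}$ are the same regularization.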

#### 2. Axial thermal profiling of planar heat sources

To generate an internal planar heat source for an experimental study, a thin plate (foil) was embedded in pure epoxy resin. For thermal excitation, laser light is absorbed and heats the foil [Fig. 9(b)]. In the case of an electrically conductive film, electromagnetic induction can also be used to heat the film [Fig. 9(a)]. The epoxy resin is opaque in the spectral sensitivity range of the mid-wave IR camera ($3{-}5\,\mu\mathrm{m}$). Therefore, the measured infrared radiation comes only from the sample surface ($z = 0\,\mathrm{mm}$).

The thermal diffusivity of the epoxy sample was obtained by the Parker method^{67} as $\alpha = 1.4 \times 10^{-7}\,\mathrm{m^2\,s^{-1}}$. For two different epoxy samples, with the embedded film at a depth of $z = 7.2\,\mathrm{mm}$ and $11.7\,\mathrm{mm}$, the surface temperature as a function of time was measured with a quantum IR detector [Fig. 10(a)]. Due to the low temperature increase on the sample surface, the measurement noise is approximately additive white Gaussian noise with a standard deviation of 25 mK.^{68} The measurement parameters are listed in Table III, with the pulse duration $t_p$, the number of discretization points in time and space $N$, and the temporal and spatial discretizations $\Delta t$ and $\Delta z$.

| $z$ (mm) | $t_p$ (s) | $N$ (−) | $\Delta t$ (s) | $\Delta z$ ($\mu$m) | SNR (−) | $\omega_{cut}/(2\pi)$ (mHz) | $\delta_{resolution}$ (mm) |
|---|---|---|---|---|---|---|---|
| 7.2 | 0.5 | 900 | 0.1 | 17 | 587 | 36 | 3.5 |
| 11.7 | 1 | 1100 | 0.2 | 24 | 930 | 16 | 5.4 |


We invert these experimental temperature signals into virtual wave signals to reconstruct the location, width, and amplitude of the internal heat sources, showing the capability of incorporating prior information in the regularization. The reconstructed depth profile of the initial temperature using the direct regularization method T-SVD^{4} is shown in Fig. 10(b). The comparison of the different reconstructed depth profiles clearly shows the entropy production caused by thermal diffusion, which is equal to the information loss, with the result of a lower axial resolution for the deeper defect, as listed in Table III. The SNR, which was derived from the singular values, is nearly doubled for the defect depth of 11.7 mm, as the duration $t_p$ of the heating pulse is increased from 0.5 s to 1 s. The truncation frequency $\omega_{cut}$ and the spatial resolution $\delta_{resolution}$ are derived from Eqs. (28) and (29), respectively.

Using additional information about the system, e.g., through regularization constraints, improves the definition of the system, decreases the number of available states, and leads to a lower entropy, which in turn leads to an increased spatial resolution in image reconstruction. To increase the information content in the regularization process, we introduce prior knowledge in the form of positivity, which is justified because the amplitude values of the virtual wave, like those of the photoacoustic pressure, are solely non-negative in the 1D regime.^{69} In contrast to the bipolar reconstruction with T-SVD, non-negative values for the reconstructed virtual wave field can be enforced using iterative regularization procedures, such as the Alternating Direction Method of Multipliers (ADMM).^{70,71} Additionally, for a majority of practical problems in photothermal imaging, we can assume sparsity: in defect detection, mostly isolated singular defects occur, e.g., cracks, pores, delaminations, or inclusions of other materials, and in parameter estimation, only the positions of the sources, the back wall, or individual inner boundary layers have to be identified. Consequently, we have only a few point or line scatterers, which leads to a sparse virtual wave signal. Sparsity is introduced in the ADMM by an appropriate formulation of the objective function using $\ell_1$-norm minimization. As shown in the reconstructed depth profiles in Fig. 10(b), iterative regularization with prior information leads to an improvement in energy localization and hence to a higher axial resolution.
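A minimal sketch of such iterative regularization with sparsity and non-negativity (not the authors' implementation; a projected proximal-gradient/ISTA iteration is used here instead of ADMM, but it enforces the same two priors) solves $\min_x \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$ subject to $x \ge 0$ for an illustrative diffusion-like blur $A$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative forward operator: a Gaussian (diffusion-like) blur matrix A.
N = 100
i = np.arange(N)
A = np.exp(-0.5 * (i[:, None] - i[None, :]) ** 2 / 16.0)
A /= A.sum(axis=1, keepdims=True)

x_true = np.zeros(N)
x_true[20], x_true[60] = 1.0, 0.7         # two sparse, positive "sources"
b = A @ x_true + 0.01 * rng.normal(size=N)

lam = 0.005                               # sparsity weight
step = 1.0 / np.linalg.norm(A, 2) ** 2    # gradient step 1/L with L = ||A||_2^2

x = np.zeros(N)
for _ in range(2000):
    grad = A.T @ (A @ x - b)
    # Non-negative soft-thresholding: enforces x >= 0 and the l1 prior at once.
    x = np.maximum(x - step * grad - step * lam, 0.0)

print(np.count_nonzero(x > 0.05), int(np.argmax(x)))
```

The reconstruction concentrates the energy in a few non-negative bins near the true source positions, whereas an unconstrained least-squares or T-SVD solution of the same blurred problem spreads it over many bins and oscillates around zero.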

### B. Active thermographic computed tomography (ATCT)

Active thermographic computed tomography (ATCT) is a new hybrid reconstruction technique that utilizes the photothermal (PT) effect^{72,73} or other thermal excitation sources, such as inductive heating,^{25,27} mechanical friction through ultrasound, or microwaves, for signal generation. In the case of photothermal computed tomography (PTCT), an optical pulse is used to irradiate an object, such as biological tissue or manufactured materials and products, to generate thermal waves within the object. The thermal waves diffuse to the sample surface, where the temperature change can be measured using infrared cameras. Due to the focal plane array of the infrared camera, up to $10^6$ signals can be recorded simultaneously, which is equivalent to an extremely large aperture in contrast to photoacoustic transducers. The aim of PTCT is to reconstruct an image that represents a map of the initial temperature distribution within the object from the measured photothermally induced thermal signals. The initial temperature distribution is proportional to the absorbed optical energy, which can reveal useful information about the internal structure, such as material defects or inhomogeneous material parameters.

The reconstruction process in ATCT is a two-step algorithm:

*Computation of the virtual wave field*: The original thermographic problem is converted to a photoacoustic imaging task by a pointwise transformation, that means separately for each pixel, of the measured temperature field into a virtual wave field as shown in Sec. IV A 2.*Virtual wave backpropagation*: The resulting virtual wave field exhibits wave properties, such as wavefront propagation, reflection, and refraction. Frequency domain-synthetic aperture focusing technique (F-SAFT), a well known acoustic reconstruction method,^{74,75}can be applied to image the initial temperature distribution. This allows a multidimensional image reconstruction with the advantage of a higher SNR.

The common inverse heat conduction solutions for multidimensional thermographic imaging lead to large-scale optimization problems with high computational costs. The virtual wave concept for ATCT splits the severely ill-posed 3D large-scale problem into many small-scale 1D problems, for which efficient algorithms, e.g., ADMM, can compute the solutions in parallel. The 1D depth-encoded signals calculated for each location can be gathered and then reconstructed into a 3D image using F-SAFT on a fast computational architecture, which can also be accelerated by graphics processing unit (GPU) programming.^{76}
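The factorization into independent per-pixel problems can be sketched as follows, where `Kinv` is a placeholder for any 1D regularized inverse operator (e.g., one obtained by T-SVD or ADMM) and the batched `einsum` stands in for the parallel evaluation; all array sizes are illustrative.

```python
import numpy as np

def virtual_wave_cube(T, Kinv):
    """Apply a 1D regularized inverse Kinv (shape Nt' x Nt) to the time
    trace of every surface pixel of a data cube T (shape Nx x Ny x Nt).
    Each pixel is an independent 1D problem, so the 3D data set reduces
    to Nx*Ny small matrix-vector products; the batched einsum evaluates
    them all at once and could equally be distributed across workers."""
    return np.einsum('st,xyt->xys', Kinv, T)

# Placeholder demo with random data and a random "inverse operator".
rng = np.random.default_rng(1)
Nx, Ny, Nt = 4, 5, 30
Kinv = rng.standard_normal((Nt, Nt))
T = rng.standard_normal((Nx, Ny, Nt))
V = virtual_wave_cube(T, Kinv)   # virtual wave cube, one 1D solve per pixel
```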

In 3D imaging, we prefer the iterative regularization technique ADMM, including the prior information of sparsity and positivity, because of its improvements in sensitivity and depth-resolving capability compared to direct regularization methods. The energy input during thermal excitation always results in positive temperature signals. After converting the temperature field into a virtual wave field, positivity cannot be applied directly to 2D or 3D wave propagation: an initially non-negative acoustic signal will take negative values during propagation.^{69} The circular and spherical projections, which are the Abel transformation in 2D and the time integral of the virtual wave in 3D,^{18} retain the positivity of the initial source. For one data point, the information gain by a positivity constraint would be only a factor of 2, but for a signal with $N$ data points, this factor becomes $2^N$, which can be large for higher $N$.
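This scaling can be checked with exact integer arithmetic (a trivial sketch; the chosen $N$ is illustrative):

```python
import math

def positivity_gain_bits(N):
    """A sign constraint halves the number of admissible states per data
    point, so for N independent data points the state-space factor is
    2**N, i.e., an information gain of N bits."""
    factor = 2 ** N                 # exact Python integer, even for large N
    return math.log2(factor)

bits = positivity_gain_bits(1000)   # 1000 data points -> 1000 bits gained
```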

In Secs. IV B 1 and IV B 2, we show two examples of ATCT with limited data (limited view), when the detectors cannot be placed around the complete object. In practical measurements, only a limited space around the specimen is available for photothermal detection. The thermal diffusivity and the resulting virtual speed of sound of the bulk are assumed to be constant and isotropic.

#### 1. Reconstruction from a single detector plane

The experimental phantom consists of two steel rods embedded in epoxy resin (Fig. 11). The cylindrical axes of the steel rods are parallel, with a distance of 5 mm between them, and inclined with respect to the object surface. The inductively heated steel rods act as volumetric heat sources with a heating time of $t_p = 2\,\mathrm{s}$, similar to the 1D photothermal example (Sec. IV A 2). The temperature field is measured on a single surface at $z=0$. The measured signal decreases with increasing depth of the rods due to the homogeneous diffusion in each direction. The spatial and temporal discretizations were $\Delta x = 168\,\mu\mathrm{m}$ and $\Delta t = 40\,\mathrm{ms}$.

Reconstruction results based on this phantom and T-SVD regularization were also published by Burgholzer *et al.*^{27} In this Tutorial, not only T-SVD but also the iterative regularization scheme ADMM is used, which enables the incorporation of prior information in the form of positivity and sparsity for defect reconstruction. A cross section of the virtual wave field reconstructed by T-SVD and by ADMM is illustrated in Fig. 12. The virtual wave fields were normalized to the maximum wave amplitude obtained by ADMM regularization. Each cross section shows the typical scattering hyperbola caused by the cylindrical steel rods, but the ADMM regularization leads to a much sharper localization. Based on the virtual wave field, the initial temperature distribution can be calculated by applying inverse wave propagation methods such as F-SAFT.

Figure 13 illustrates the resulting 3D initial temperature distribution for the different regularization methods. The amplitude of the reconstructed initial temperature field was binarized with a threshold value to represent the localization of thermal sources within the 3D sample volume. The circles indicate the real positions of the steel rods in the epoxy cylinder. Figure 13(a) shows the T-SVD based reconstruction. The steel rods can be distinguished down to $z = 4\,\mathrm{mm}$, as expected for direct reconstructions without any prior information: the resolution is in the range of the depth of the sources. Moreover, artifacts are present in regions with low SNR. Figure 13(b) illustrates the reconstruction employing ADMM, in which positivity and sparsity were introduced to improve the regularized solution. Here, the steel rods are separable in the domain $10\,\mathrm{mm} < x < 40\,\mathrm{mm}$, and the artifacts are significantly reduced compared to T-SVD.

#### 2. 3D plane detection

An example of ATCT with temperature data from three orthogonal detection planes is shown in Fig. 14. The phantom is an epoxy cube with an edge length of $a = 15\,\mathrm{mm}$ containing four steel spheres with a diameter of 2 mm. The epoxy material, the steel beads, the inductive heating device, and the position of the sample inside the coil were the same as in Sec. IV B 1. The spatial and temporal discretizations of the IR camera were $\Delta x = \Delta y = \Delta z = 172\,\mu\mathrm{m}$ and $\Delta t = 50\,\mathrm{ms}$. The bottom and the sidewalls of the cube were thermally insulated to ensure adiabatic boundary conditions. The time-dependent temperature fields were measured on the right side plane ($x=a$), the front plane ($y=0$), and the top plane ($z=a$) for a duration of 120 s after the pulse excitation. The pulse duration of 7.5 s was chosen to obtain thermographic signals also from the deeper spheres.

Reconstructions based on this phantom and T-SVD regularization were also published in Burgholzer *et al.*^{25} In this Tutorial and in contrast to prior works, we use ADMM regularization and incorporate prior information in the form of positivity and sparsity to improve the quality of the regularized solution. The three corresponding reconstructions using ADMM for regularization and F-SAFT for image reconstruction are shown in Fig. 14. For all three reconstruction planes, limited view artifacts are visible and the inclusions far from the detector planes cannot be reconstructed at all.

## V. SUMMARY, CONCLUSIONS, AND OUTLOOK

In nature, real processes are always irreversible and show a clear direction of the arrow of time. Some processes, such as the propagation of an acoustic wave, especially at low frequencies in a medium with low viscosity, are reversible to a good approximation. The corresponding wave equation for pressure is symmetric in time: for every pressure wave that is a solution of this equation, the time-reversed pressure wave is also a solution. Yet even if some acoustic waves can be described well by the wave equation, acoustic attenuation can never be avoided entirely. Therefore, we call this ideal solution a “virtual” wave, in contrast to the real measured signal. In this Tutorial, it is demonstrated for thermal and acoustic waves that the measured and the virtual signal are linked by a local relation [Eq. (1)].
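Discretized, such a local relation can be written as a kernel matrix acting on the virtual-wave time trace. The following is a minimal numerical sketch, assuming the Gaussian heat-kernel form $K(t,t') \propto \exp[-c^2 t'^2/(4\alpha t)]$ used in the virtual-wave literature (normalization conventions vary between publications; here $c$ denotes the virtual speed of sound and $\alpha$ the thermal diffusivity, and all grid values are illustrative):

```python
import numpy as np

def virtual_wave_kernel(t, t_virt, c=1.0, alpha=1.0):
    """Kernel K(t, t') of a local relation T(t) = int K(t, t') T_virt(t') dt',
    normalized here so that each row integrates to one over t' in (-inf, inf)."""
    t = np.asarray(t)[:, None]        # measurement times, t > 0
    tp = np.asarray(t_virt)[None, :]  # virtual-wave times
    return (c / np.sqrt(4.0 * np.pi * alpha * t)
            * np.exp(-c**2 * tp**2 / (4.0 * alpha * t)))

t = np.linspace(0.1, 2.0, 50)          # illustrative time grids
tp = np.linspace(-20.0, 20.0, 4001)
K = virtual_wave_kernel(t, tp)
row_int = K.sum(axis=1) * (tp[1] - tp[0])   # numerical row integral
```

With this normalization, the Gaussian rows integrate to unity, which makes the ill-posedness visible: for late measurement times the rows become very broad, so many different virtual-wave traces map to nearly the same temperature signal.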

For non-destructive imaging of subsurface structures, it is advantageous from an information theoretical viewpoint to use “nearly reversible” waves, such as acoustic waves with low attenuation, to transport the information from the structures to the sample surface. Due to heat diffusion, thermal waves exhibit a higher entropy production, which is shown to be equal to the information loss (Sec. III) and thus results in a lower spatial resolution (Tables I and II or Figs. 2 and 3). In Sec. III, two simple examples of stochastic processes are given, where the close connection between fluctuation and dissipation (“kicked Brownian particle”) and diffusion (“ideal gas diffusion between two boxes”) can be shown explicitly.

In Sec. IV, the information theoretical cutoff criterion using Chernoff–Stein’s Lemma (Sec. III) is applied in the frequency domain to photoacoustic and photothermal imaging. The information-related criterion gives the same truncation value as the condition that the signal amplitude falls below the noise level. This is always the case if the mean energy of a wave component, and therefore also its information content, is proportional to the square of the signal amplitude. In Sec. IV A, for 1D imaging even in fat, with its rather high acoustic damping, the resolution is always much better than for thermographic imaging (see also the simulation in glycerine in Sec. II A). Therefore, for thermographic imaging, with its high entropy production from heat diffusion, it is essential to use additional information such as positivity and sparsity, implemented in iterative algorithms, e.g., ADMM, to obtain a better resolution, also in the sub-mm range [Fig. 10(b)]. This has been used in thermographic computed tomography (Sec. IV B), either for reconstruction from single detector plane measurements or from three perpendicular planes on the surface of a cubic sample containing steel spheres.
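The cutoff rule, keeping only those frequency components whose amplitude still exceeds the noise level, can be illustrated with a synthetic, exponentially damped spectrum (all numbers are illustrative, not measured values):

```python
import numpy as np

def truncation_mask(amplitudes, noise_level):
    """A frequency component carries usable information only while its
    amplitude exceeds the noise level; components below it are truncated."""
    return np.asarray(amplitudes) >= noise_level

# Damped spectrum |S(k)| = exp(-k/10) with a noise floor of 0.1:
k = np.arange(100)
amp = np.exp(-k / 10.0)
mask = truncation_mask(amp, 0.1)
n_kept = int(mask.sum())   # components up to k = 10*ln(10) survive
```

Stronger damping (a faster decay of `amp`) shifts the cutoff to lower frequencies, which is exactly the mechanism by which heat diffusion limits the spatial resolution more severely than acoustic attenuation.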

Using T-SVD without incorporating additional information, the spatial resolution of a single 1D reconstruction is no better than that of a direct inversion of the heat diffusion. The advantage of the virtual wave concept in 1D is that more advanced regularization techniques incorporating *a priori* information, such as sparsity or positivity, can be utilized in the reconstruction process. Including such prior information allows thermographic reconstruction with a significantly better resolution than without it, as shown in Figs. 10(b), 12, and 13.^{72,73} In 2D and 3D, after the computation of the virtual wave field, well-known acoustic reconstruction methods such as F-SAFT are used (Sec. IV B). Here, the SNR is significantly enhanced, as the heat flow in all directions is taken into account in the reconstruction. In the past, one-dimensional axial image reconstruction was mainly carried out for thermographic reconstruction regardless of the lateral heat flow. The virtual wave concept respects the lateral heat flow, which yields an improved SNR and therefore an increased spatial resolution.^{25} This is similar to averaging many 1D measurements when imaging a layered structure, but with the essential advantage that the virtual wave concept can be used for any 2D or 3D structure to be imaged.
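The noise amplification that T-SVD suppresses can be illustrated on an ill-conditioned toy system, where a Hilbert matrix stands in for the thermal kernel (a sketch under illustrative assumptions, not the kernel used in this Tutorial):

```python
import numpy as np

def tsvd_solve(A, y, tol):
    """Truncated SVD: invert only singular values above tol; the
    discarded small singular values are the ones that would amplify
    measurement noise without adding usable information."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])

# Ill-conditioned toy system with noisy data.
n = 10
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)  # Hilbert matrix
rng = np.random.default_rng(2)
x_true = np.ones(n)
y = A @ x_true + 1e-4 * rng.standard_normal(n)
x_naive = np.linalg.solve(A, y)       # unregularized: noise explodes
x_tsvd = tsvd_solve(A, y, tol=1e-4)   # truncated: stable estimate
```

The truncated solution stays close to the true source, while the direct inverse is dominated by amplified noise; iterative schemes such as ADMM then improve on T-SVD by additionally enforcing sparsity and positivity.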

For future work, deep-learning neural networks are planned to be used as the second step instead of, e.g., F-SAFT. For the same data as in Fig. 13, this gives another significant enhancement in resolution. In Fig. 13, the threshold level for the isosurface plot is rather critical for avoiding additional artifacts. With a deep neural network trained only on simulated data, the reconstructions are very stable, and choosing a threshold level for the isosurface plot is not critical, as reconstruction artifacts hardly appear.^{77}

## ACKNOWLEDGMENTS

The financial support by the Austrian Federal Ministry of Science, Research and Economy and the National Foundation for Research, Technology and Development is gratefully acknowledged. Furthermore, this work has been supported by the project “From measurement science to information gaining” by the federal government of Upper Austria, the project “Multimodal and *in situ* Characterization of Inhomogeneous Materials” (MiCi), by the federal government of Upper Austria and the European Regional Development Fund (EFRE) in the framework of the EU-program IWB2020. Financial support was also provided by the Austrian Research Funding Association (FFG) under the scope of the COMET programme within the research project “Photonic Sensing for Smarter Processes (PSSP)” (Contract No. 871974). This programme is promoted by BMK, BMDW, the federal state of Upper Austria, and the federal state of Styria, represented by SFG. Parts of this work have been supported by the Austrian Science Fund (FWF), Project Nos. P30747-N32 and P33019-N.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Variational Methods in Imaging*, Applied Mathematical Sciences Vol. 167 (Springer, New York, 2009).

*IEEE Nuclear Science Symposium Conference Record, 23–29 October 2005*(IEEE, 2005), pp. 1841–1845.

*Mathematical Modeling in Biomedical Imaging II, Lecture Notes in Mathematics*Vol. 2035, Mathematical Biosciences Subseries, edited by H. Ammari (Springer, Heidelberg, 2012), ISSN: 0075-8434.

*Mathematical Modeling in Biomedical Imaging II*, Lecture Notes in Mathematics Vol. 2035, edited by H. Ammari (Springer, Heidelberg, 2012), pp. 85–130, ISSN: 0075-8434.

**6437**, 643724 (2007).

**7564**, 75640O (2010).

*Modeling of Biological Interfacial Processes Using Thickness-Shear Mode Sensors*, edited by M. G. Beghi (INTECH Open Access Publisher, 2011).

**11004**, 1100404 (2019).

*Conduction of Heat in Solids*, 2nd ed. (Oxford Science Publications, Oxford, 1959), 1986 printing.

*Mathematical and Physical Papers*, edited by G. G. Stokes (Cambridge University Press, Cambridge, 2009).

*Photothermal Science and Techniques*, Physics and Its Applications Vol. 10 (Chapman & Hall, London, 1996).

*Stochastic Processes in Physics and Chemistry*, 3rd ed., North-Holland Personal Library (Elsevier, Amsterdam, 2007).

*Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*, 2nd ed., Springer Series in Synergetics Vol. 13 (Springer, Berlin, 1985).

*The Fokker-Planck Equation: Methods of Solution and Applications*, 2nd ed., Springer Series in Synergetics Vol. 18, edited by H. Risken (Springer-Verlag, Berlin, 1989), ISSN: 0172-7389.

*Nonequilibrium Statistical Physics of Small Systems*, Reviews of Nonlinear Dynamics and Complexity (Wiley-VCH, Weinheim, 2013).

*An Introduction to Stochastic Processes in Physics: Containing “On the Theory of Brownian Motion”*, edited by P. Langevin, A. Gythiel, and D. S. Lemons (Johns Hopkins University Press, Baltimore, 2002).

**7177**, 717723 (2009).

**11240**, 112403Y (2020).

*Photoacoustic Imaging and Spectroscopy*, Optical Science and Engineering Vol. 144 (CRC Press, Boca Raton, 2009).

*ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4 May 2020–8 May 2020*(IEEE, 2020), pp. 4277–4281.