In this Tutorial, we aim to directly recreate some of our “aha” moments when exploring the impact of heat diffusion on the spatial resolution limit of photothermal imaging. Our objective is also to communicate how this physical limit can nevertheless be overcome and include some concrete technological applications. Describing diffusion as a random walk, one insight is that such a stochastic process involves not only a Gaussian spread of the mean values in space, with the variance proportional to the diffusion time, but also temporal and spatial fluctuations around these mean values. All these fluctuations strongly influence the image reconstruction immediately after the short heating pulse. The Gaussian spread of the mean values in space increases the entropy, while the fluctuations lead to a loss of information that blurs the reconstruction of the initial temperature distribution and can be described mathematically by a spatial convolution with a Gaussian thermal point-spread-function. The information loss turns out to be equal to the mean entropy increase and limits the spatial resolution proportional to the depth of imaged subsurface structures. This principal resolution limit can only be overcome by including additional information such as sparsity or positivity. Prior information can be also included by using a deep neural network with a finite degrees of freedom and trained on a specific class of image examples for image reconstruction.

## I. INTRODUCTION

Photothermal imaging, and also photoacoustic imaging, involves (scattered) light causing a rise in the temperature of sub-surface light-absorbing structures, such as blood vessels in human tissue or carbon fibers in an epoxy material. In photothermal imaging, the internally absorbed heat diffuses from the absorbing structures to the sample surface, where it can be measured as a change in surface temperature. In photoacoustic imaging, the temperature rise causes a sudden increase in pressure, which produces an acoustic wave that can be detected at the surface with an ultrasound transducer. Both surface signals, the detected temperature increase and the pressure signal, can be used to reconstruct an image of the internal light-absorbing structures in a non-destructive way. The achievable spatial resolution in imaging is directly related to the information content of the thermal diffusion wave or the acoustic wave.^{1} For linear image reconstruction methods, the spatial resolution is the width of the point-spread-function (PSF), which is the reconstructed image of a point-like heat or pressure source.

Unfortunately, heat diffusion from the internal structure to the sample surface “annihilates” a lot of information, which, as we have shown earlier, corresponds to entropy production during heat diffusion.^{2,3} In thermal imaging, this leads to a spatial resolution proportional to the depth of a point heat source below the surface. In photoacoustic imaging, entropy production due to acoustic attenuation is lower and leads for liquids, where the acoustic attenuation is proportional to the square of the acoustic frequency, to a degradation of spatial resolution proportional to the square root of depth.^{4} Therefore, for a higher imaging depth, acoustic imaging shows a significantly better spatial resolution than thermographic imaging.

In this Tutorial, we first describe the one-dimensional diffusion process as a random walk, which can be described mathematically as a simple stochastic process. Of course, this is a rough approximation to the real physical processes of heat diffusion, where phonons can carry a whole distribution of energy, but two essential properties of thermal diffusion waves can already be recognized in this random walk model: The mean values of the occupation numbers spread in space, which can be described by the diffusion equation, and the actual occupation numbers fluctuate around these mean values. This fluctuation is described as the variance of the stochastic process. The diffusion process spreads the particles or the heat in space and, at the same time, induces temporal and spatial fluctuations around the mean values of Gaussian spreading, which are the reasons for an information loss being equal to an increase in entropy. For a macroscopic object, the temperature fluctuations are very small; but for thermographic reconstruction, the time reversal of heat diffusion turns out to be an highly ill-posed inverse problem that amplifies these fluctuations exponentially with increasing time. This limits the spatial resolution proportional to the depth of the subsurface structures, and this principal resolution limit can only be overcome by including additional information such as sparsity or positivity.^{5} Prior information can be also included by using a trained deep neural network for image reconstruction, either directly from the measured data (end-to-end reconstruction by deep learning) or by calculating first the virtual wave, which corresponds to the same initial heat or pressure distribution as the measured surface temperature but is a solution to the wave equation (hybrid reconstruction by the virtual wave concept and deep learning).^{6}

## II. HEAT DIFFUSION DESCRIBED AS A RANDOM WALK

In one dimension, the traditional way to define a random walk is to allow the walker to take steps at each time interval at which he must step either backward or forward with equal probability (Fig. 1). Described mathematically as a stochastic process, for the so-called Wiener process, the distribution density of the walking distance is a Gaussian distribution with the mean value constant in time and equal to the starting position and a variance proportional to time $t$ (e.g., Gardiner^{7}). Therefore, an initial sharp distribution spreads out with time [Figs. 1(b) and 2]. In addition to these simulated results, such a stochastic process can be physically realized using a Galton board (Fig. 3). The Galton board consists of a vertical board with nested rows of pins. Beads are dropped from the top and bounce either to the left or to the right when they hit the pins. They are collected into bins at the bottom, where the height of bead columns accumulated in the bins approximate a Gaussian bell curve. For the following mathematical description of the one-dimensional diffusion process, the individual bins are the cells numbered such that the central cell has $i=0$, and the occupation number $Ni(t)$ is the number of beads collected in a given pin. The time $t$ can be identified with the number of rows of pins of the Galton board.

We assume that the diffusion of the particles is independent of each other, with $pi(t)$ giving the probability that a certain particle is in cell $i$ at time $t$ when starting from cell $i=0$ at $t=0$. $Xij(t)$ is a new random variable, defined to be one if particle $j$ is in cell $i$ and zero in all other cases. For every particle, $Xij(t)$ is equal to one with probability $pi(t)$, independent of particle number $j$. Therefore, the mean value and variance of this random variable are the same for all the particles,

To derive these equations, it was used that $Xij2=Xij$ equals 1 if the particle $j$ is in cell $i$ and equals 0 otherwise. With occupation numbers $Ni=\u2211j=1NXij$, the mean value and variance follow from independence of the particle number $j$,

A simple model for one-dimensional heat diffusion using adiabatic boundary conditions is a one-dimensional random walk with reflecting boundaries. Such adiabatic boundary conditions can be realized for a thermally isolated sample for which the normal derivative of the temperature vanishes at all times at sample boundaries. The walking particles could be thought as the phonons in the sample volume. Giving them all the same energy is a rough approximation, but the essential details of the influence of fluctuations and the connection of entropy production and information loss can be seen already for such a model system.

Starting from an equilibrium modeled by a uniform distribution (Fig. 4) with $Nequi$ particles in $Ncell$ cells, at the time zero, $N0$ particles are added at cell $i=0$. Compared to Eq. (2), we have now two different distributions: the equilibrium distribution, which is constant in time and space (for every cell $i$), and the Gaussian distribution for the added particles (at least up to a time when the reflecting boundaries can be still neglected). As the movement of individual particles is assumed to be independent, one gets for the mean value and variance,

The last approximation is valid if the equilibrium occupation number $Nequi/Ncell$ is much higher than the mean diffusion term $N0piGauss(t)$ and the variance $Var$ gets constant in space and time. An example for $Nequi=10000,Ncell=40$, and $N0=1000$ is shown in Fig. 5.

Despite the independence of the movement of individual particles, the occupation numbers are not uncorrelated. One example is that if a cell has a high occupation number, then the occupation number of the neighboring cells increases shortly afterward. The covariance of the occupation number in cell $i$ at time $t$ and cell $j$ at later time $t+\tau $ is

For Eq. (5), it was used that the probability for a particle to be in cell $j$ at time $t+\tau $ when it was in cell $i$ at time $t$ is $pi\u2212j(\tau )$.

The time development of occupation numbers $Ni$ is a Gauss–Markov process and can be described by a Langevin equation (see, e.g., Groot and Mazur^{8}), which we have used in previous publications to describe the information loss for a kicked Brownian particle^{1} or the diffusion of heat.^{2} During the infinitesimal time $dt$, the infinitesimal change of the occupation number in cell $i$ is $dNi$,

where $\epsilon t(0,1)$ is a random variable with standard distribution (mean value zero and variance one), which describes uncorrelated white noise. The reflecting boundaries are represented by

All the occupation numbers can be combined in an $Ncell$ dimensional vector **N**,

The matrix **M** in Eq. (9) is a square matrix with dimension equal to the number of cells,

Equation (9) can be seen also as a discretized version (finite differences) of the one-dimensional diffusion equation,

with a unit spacing between discrete cells, a diffusion coefficient of $\alpha =12$, adiabatic boundary conditions, and a white noise generating term. The Fourier transform in space “k-spac” of the diffusion equation yields

which solves the diffusion equation (11) for the initial condition, $N(x,t=0)=N0(x)$,

To solve the Langevin equation (9), the matrix **M** can be diagonalized by an adequate coordinate transform, which turns out to be the Fourier transform in space and which is the same as for the solution to the diffusion equation (11). For the discretized random walk with reflecting boundaries, this transformation is in fact the discrete cosine transform $F=dct(diag(1,1,\u20261))$ because of the adiabatic boundary conditions. $F\u22c5M\u22c5Ft$ with $Ft$ being the transposed matrix of $F$ results in a diagonal matrix with singular values $\gamma k$ in the matrix diagonal,

The last approximation is for a continuous k-space when the cell volume gets zero and $Ncell$ gets to infinity and is identical to $\alpha k2$ with $\alpha =1/2$ for the one-dimensional diffusion equation in k-space. The singular values are proportional to the square of the wavenumber $k$ and are time constants for the exponential decay of $N^(k,t)=N^0(k)e\u2212k2\alpha t$ in time [compare to Eq. (13)]. Therefore, variations with a higher wavenumber (lower wavelength) decay faster. Wavenumber $k=0$ shows no decay and represents the equilibrium temperature, which is reached after a long time. The fluctuations and their Variance $Var(\u22c5)$ is the same in k-space and in real space because the noise energy must be the same in both domains (Parseval’s theorem). In Sec. IV, we show that the spatial resolution limit at a certain time $t$ is given by the wavenumber $kcut$ due to information loss by fluctuations, where $N^(kcut,t)$ is equal to the noise level given by the square root of the variance $Var$. For wavenumbers with $k>kcut$, $N^(k,t)$ is below the noise level and $N^0(k)=N^(k,t)e+k2\alpha t$ cannot be reconstructed any more.

## III. HOW PROBABILITY COMES INTO PLAY

In Sec. II, diffusion was introduced by a stochastic process such as random walk, where in each time interval the next step is chosen with a certain probability. But how does probability come into play for diffusion? In an earlier publication, we have used statistical physics and non-equilibrium thermodynamics to describe the heat diffusion, entropy, and information during photothermal imaging.^{1} One main result was that the information content about the initial state just after the excitation pulse in the diffusive system decreases due to fluctuations. The loss of information in the diffusive system is equal to the increase in entropy called entropy production, with the entropy expressed by the Shannon or Gibbs formula,

where the mean occupation number is $Ni\xaf(t)=Npi(t)$ in cell $i$ at time $t$ [Eq. (1)] and $kB$ is the Boltzmann constant. For real heat diffusion, the actual distribution with all its fluctuations can hardly be determined, but for obtaining information loss, the average entropy production can be used, which for heat diffusion in macroscopic samples is, to a very good approximation, the heat diffused to the surroundings divided by temperature.^{1}

The temporal evolution of the particles is described microscopically by mechanical equations, which are deterministic. But how does probability come into play in particle diffusion, for example? Here, we are confronted by the “demon” first articulated by Pierre Simon Laplace in 1814, which is a hypothetical observer that knows the position and momentum of every molecule, and together with the deterministic time evolution, this would be also true for the past and the future. Fortunately, this “demon can be exorcised” by moving to a quantum perspective on statistical mechanics, as described in an excellent overview in Physics Today by Katie Robertson on “demons in thermodynamics.”^{9} For Laplace’s demon, the entropy $S$ in Eq. (15) would be zero for all times because the state of the system is known with certainty. No probability distributions as described in thermodynamics would be necessary. But in the quantum case, probabilities are already an inherent part of the theory, so there is no need to add ignorance to the picture. In other words, the probabilities from statistical mechanics and quantum mechanics turn out to be one and the same.^{9}

Three publications by different authors in 2006^{10–12} describe that quantum entanglement of the diffusive system with the environment is the key, and no “ignorance” probabilities are necessary in the description of the diffusion process. They considered the global state of a large isolated system called the “universe” to be a quantum pure state. Therefore, there is no lack of knowledge about this state, and the entropy is zero. However, if only a small part of the universe is considered, which we call the “diffusive system,” its state will not remain pure due to quantum entanglement with the rest of the universe, which is called the “environment” (Fig. 6).^{11} If one of the two systems is taken on its own, it will be in an intrinsically uncertain state known as the mixed state. Assuming that the surrounding environment, e.g., the laboratory at room temperature, is sufficiently large, then for almost any pure state that the composite system is in, the diffusive system will be in a state very close to the state it would be assigned by traditional statistical mechanics, which was described by the stochastic random-walk process in Sec. II. The quantum description, thus, leads to a probability distribution indistinguishable from that of statistical mechanics.^{9} Probabilities are an intrinsic and inescapable part of quantum mechanics. When it describes the diffusive system taken on its own, Laplace’s demon cannot know more than anyone else. According to Fig. 6, the probabilities describing the uncertainties (or ignorance) in the diffusion process result from the fact that the information “transported” to the environment by entanglement is lost.

## IV. INFORMATION LOSS LIMITS THE SPATIAL RESOLUTION OF THE RECONSTRUCTION

In Sec. II, it was shown that a stochastic process such as random walk leads to spreading in space (dissipation), which is accompanied by an increase in entropy and at the same time to fluctuations. Fluctuations and mean entropy production can be thought as two sides of the same coin and are connected by the fluctuation–dissipation theorem (Fig. 7). For systems near thermal equilibrium in the linear regime, such relations between entropy production and fluctuation properties have been found by Callen,^{13} Welton,^{14} and Greene.^{15} This fluctuation–dissipation theorem is a generalization of the famous Johnson^{16}–Nyquist^{17} formula in the theory of electric noise. It is based on the fact that in the linear regime, the fluctuations decay according to the same law as a deviation from equilibrium following an external perturbation. We could show from non-equilibrium thermodynamics that the mean entropy production due to the spreading of heat or particles (dissipation) is equal to the information loss due to fluctuations.^{1} This connects dissipation, e.g., described by the singular values $\gamma k$ in Eq. (14), with the fluctuations given by the variance $Var$. Equation (9) reads in k-space for all wavenumbers $k$,

where the amplitude of the noise for each wavenumber $k$ is given by $\sigma k2=2\gamma kVar$. This fluctuation–dissipation relation connects directly the exponential decay of $N^(k,t)=N^0(k)e\u2212\gamma kt$, which shows the dissipative behavior, with the fluctuations represented by the noise term $\sigma k2$ in the Langevin equation (16).

The diffusion process increases the entropy, but also leads to fluctuations that reduce the information content and, thus, the spatial resolution for a subsequent reconstruction process. For thermal diffusion imaging, we assume that immediately after the excitation pulse, the absorbed heat is present only at a specific location, e.g., in cell zero. It diffuses with time [$t=20$ (a) and $t=100$ (b) according to Fig. 2], and because of the fluctuations, reconstruction of the initial distribution from the distribution at a later time is possible only with increasing uncertainty. At time $t=1000$, equilibrium is reached (Fig. 4), and all information about the initial location is lost.

The information loss and the degradation of the spatial resolution for the initial distribution just after the excitation pulse can be quantified in k-space. With an initially fully localized distribution $N0(x)=N0\delta (x)$, Eq. (16) gives for each wavenumber $k$ an exponential decay of the mean values. After a certain time $t$, only the wavenumbers $k<kcut$ where $N^(k,t)\xaf=N0e\u2212k2\alpha t>Var$ can be distinguished from noise (Fig. 8) and can be used for reconstruction.^{3} The reconstructed signal $Nr(x,t)$ is the inverse Fourier transform of this rectangular function, which gives a sinc-function,

With the signal-to-noise ratio $SNRk=N0/Var$ in k-space, we get

with the natural logarithm $ln(\u22c5)$. The spatial resolution for the reconstruction of the initial distribution from a distribution at time $t$ is the “width” of the reconstructed signal and is taken as half of the “wavelength” at wavenumber $kcut$,

## V. THERMAL POINT-SPREAD-FUNCTION

In general, photothermal imaging does not measure the temperature at all locations $T(x)$ of the sample at a given time $t$, as assumed in Sec. IV, but at a specific location $x=0$, which is usually the sample surface, the temperature is measured for the entire time $T(x=0,t)$.^{1} Instead of a Fourier transform to k-space, the one-dimensional heat diffusion equation,

with the initial temperature distribution $T0(x)\u2261T(x,t=0)$ is solved in the $\omega $-space not by assuming a real wavenumber and a complex frequency as for solving Eq. (11), but for real frequency $\omega $ and a complex wavenumber $\sigma (\omega )$, and by inserting the “thermal wave” as an Ansatz into the heat diffusion Eq. (20),

where $R$ is the real part, $\sigma (\omega )=i\omega /\alpha \u2261(1+i)/\mu $ is the complex wave number, and $T0$ is the amplitude, we get

with $\mu (\omega )\u22612\alpha /\omega $ being defined as the thermal diffusion length.^{18} The thermal wave is strongly damped because the real and imaginary parts of the complex wavenumber $\sigma (\omega )$ are equal and the amplitude is reduced by a factor of $1/e$ after propagation of the thermal diffusion length. The wavenumber or spatial frequency is $k(\omega )\u22611/\mu (\omega )=\omega /2\alpha $. Similar as in k-space, for frequencies larger than the truncation frequency $\omega cut$, the amplitude of these wave components is damped below the noise level, and with the truncation frequency $\omega cut$, the truncation diffusion length $\mu cut$, and the signal-to-noise ratio $SNR$ in $\omega $-space, we get

or

Similar to Eq. (19), the spatial resolution

is half of the wavelength of the truncation frequency. For thermographic reconstruction, the resolution limit increases linearly with depth $x$. In comparison, $\delta r(t)$ increases with the square root of $t$ [Eq. (19)].

In the frequency domain, a short-time excitation is the Fourier transform of a delta pulse where all frequencies have the same amplitude. The thermal point-spread function (PSF) is the inverse Fourier transform, where the frequency integral is taken up to $\omega cut$, which assumes that below $\omega cut$ the full signal can be reconstructed, and for frequency components above the cutoff frequency, no signal information is available.^{19} Therefore, the signal-to-noise ratio $SNR$, or more precisely, due to the exponential decrease in the amplitude, the natural logarithm $ln$ of $SNR$ plays an important role for spatial resolution. The width of the one-dimensional thermal PSF is proportional to $x/ln\u2061(SNR)$, as derived in Eq. (24). For frequency components lower than $\omega cut$, the signal amplitude at the sample surface is less than the noise. In deriving spatial resolution, it is assumed that such frequency components cannot be detected. Frequency components above the noise level are used for reconstruction. From information theory and non-equilibrium thermodynamics, we could show that the information content for frequencies larger than $\omega cut$ is so small that their distribution cannot be distinguished from equilibrium distribution.^{1} If the mean energy and, thus, the entropy are proportional to the square of the signal amplitude, the cutoff frequency $\omega cut$ from this information criterion is always the same frequency at which the signal amplitude becomes smaller than the noise amplitude.

One-dimensional signals can be realized by layered structures, and there the noise can easily be reduced by increasing the area covered by the detector, which corresponds to enhanced averaging. For higher dimensions, such as image reconstruction in three dimensions, this is no longer possible because the signal is different for different detection points on the surface. There is a trade-off between small measurement pixels for better spatial resolution and larger measurement pixels with less noise and better $SNR$, which also affects the spatial resolution of the reconstruction.

To obtain the two- or three-dimensional thermal PSF, we assume a point source to be embedded in a homogeneous sample at depth $d$ and a planar detection surface. The distance to the surface $x=d/cos(\theta )$ depends on angle $\theta $ (Fig. 9). Using Eq. (23), the wavenumber $kcut=\omega cut/2\alpha $ depends on the direction $\theta $,

Inverse Fourier transformation back to real space gives the thermal PSF, as shown for $SNR=1000$ in Fig. 10. It shows the principle resolution limit for a certain $SNR$, like the Abbe limit in optics for a certain wavelength. The axial depth resolution is limited by $kcut$ [Eq. (25)] at $\theta =0$, which is the same as in the one-dimensional case given in Eq. (24), and is $\pi d/ln\u2061(SNR)$, and ranges in the scaled depth units in Fig. 10 from $z/d=0.7726$ until $z/d=1.2274$ for $SNR=1000$. For $SNR=100$, this would range from 0.66 until 1.34.

## VI. OVERCOMING THE RESOLUTION LIMIT

The resolution limit $\delta r$ given in Eq. (24) degrades proportional to imaging depth $x$. One way to improve the resolution when imaging a structure at certain depth is to increase the $SNR$, which could be achieved by increasing the power of the excitation laser. This is only possible to a certain extent to prevent the sample from being destroyed by too much heating. As mentioned above, for layered structures for which the one-dimensional thermal diffusion equation applies, the $SNR$ can be improved by measuring and averaging over a larger area. For two-dimensional structures, such as parallel cylindrical objects, it is still possible to perform averaging in the direction of the parallel axis and thereby increase the $SNR$.

For three-dimensional reconstruction, the $SNR$ cannot be improved by direct averaging of the surface signals. However, similar to reconstruction with ultrasound, reconstruction methods can be used that include the diffusion of heat in all directions. This provides a kind of averaging over all detection points and the $SNR$ can be increased similar to direct averaging. Acoustic imaging methods, such as back-projection, Synthetic Aperture Focusing Technique (SAFT), or time reversal, can be used by calculating in a first step from the measured temperature, the so-called virtual wave, which has the same initial heat or pressure distribution as the measured temperature, but is a solution to the wave equation.^{20} The measured surface temperature and the virtual wave are connected by local transformation, i.e., for the same location $r$,

where the sound velocity $c$ of the virtual wave can be chosen arbitrarily. Figure 11 shows a two-dimensional example with simulated data where the virtual wave concept is used for reconstruction of original temperature distribution. Here, for the initial temperature distribution, three gaussian-like temperature peaks are located at different depths below the sample surface. The deepest source is barely visible when looking at the 200 measured surface temperature signals $T(r,t)$ alone with a SNR of 1000, whereas when reconstructed using the virtual wave concept, all three original temperature sources are clearly visible in $T0rec(r)$. The ultrasound reconstruction from the virtual waves has the same effect on the SNR as averaging, and therefore, compared to a single one-dimensional reconstruction, in this example for 200 detection points, the SNR can be enhanced by a factor of $200$ or the spatial resolution is enhanced by a factor of $ln\u2061200\u22482.65$ according to Eq. (24). Since the $SNR$ increases with the square root of the averaged measurements and the resolution increases with $ln\u2061(SNR)$ [Eq. (24)], increasing the $SNR$ by averaging or by using ultrasound reconstruction from virtual waves increases the resolution only in a logarithmic way.

To increase the thermal resolution limit more efficiently, either super-resolution methods, such as those known to overcome the Abbe limit in optical imaging, e.g., by structured illumination, or additional information, such as sparsity or positivity, can be used. When using structured illumination, several reconstructed images with different illumination are used to calculate one super-resolution image by a non-linear reconstruction method, such as the iterative joint sparsity algorithm (IJOSP).^{19} As an example of the consideration of sparsity and positivity, the reconstruction of three graphite bars embedded in epoxy resin is presented here (thermal diffusivity $0.13\xd710\u22126m2/s$ at a depth of 1.6, 2.6, and 3.6 mm).^{5} The measurement setup is sketched in Fig. 12. The graphite bars were heated by laser excitation (diode laser wavelength 938 nm, 250 W laser power with a pulse duration of 200 ms). The surface temperature after laser pulse excitation was measured with an infrared camera (106 Hz full frame mode and a noise equivalent temperature difference of 25 mK, cooled InSb sensor), which was sensitive in the spectral range of $3.0\u22125.1\mu m$. In this spectral range, the epoxy resin is opaque and we measured the surface temperature evolution. The spatial resolution from the camera pixels was 0.1 mm on the sample surface. Figure 13 shows a cross section of the initial temperature field from the light absorbing bars, and the reconstruction using the truncated-singular value decomposition (T-SVD) method for calculating the virtual wave and SAFT for the ultrasound reconstruction algorithm. To calculate the virtual wave considering sparsity (the cross section of graphite rods are only small points) and positivity (heating always increases the temperature), the Alternating Direction Method of Multipliers (ADMM) was used, a nonlinear optimization algorithm similar to the Douglas–Rachford method. Sparsity is taken into account by these methods by minimizing in addition to a data fitting term also the L1-norm. More precisely, the virtual wave is computed as a minimizer of $\Vert MTvirt\u2212T\Vert 2+R(Tvirt)$, where $M$ is a discretization of the integral in (26) and $R(.)$ is a suitable regularizer defined by the L1-norm. The use of this regularized virtual wave allows to implement efficient L1-minimization algorithms (e.g., ADMM) and leads to a much better resolution of the reconstructed temperature field.^{5}

The application of deep neural networks to tackle inverse problems is currently attracting a huge amount of interest within the inverse problems community, e.g., for thermal property reconstruction^{21–23} or for source reconstruction.^{24} Combining these methods with traditional model-based reconstruction techniques for solving inverse problems is an emerging trend that comes with new challenges, such as generating training data, designing architectures, choosing learning strategies, and interpreting the learned parameters. For recent results in this field, we refer to the survey of Arridge *et al.*^{25} We have compared purely model-based reconstructions using the ADMM method for calculating the virtual wave and a subsequent reconstruction with SAFT, a hybrid reconstruction by using deep learning to determine the reconstruction from the virtual wave, and a end-to-end deep learning reconstruction using directly the measured temperature data without calculating the virtual wave before.^{6} The best spatial resolution could be demonstrated by hybrid reconstruction using the ADMM method to calculate the virtual wave and then the deep learning reconstruction. This showed better results than reconstruction directly by deep learning and also the effort for training of the network was significantly less. For training of the deep neural network (u-net architecture), we have used simulated synthetic data, but nevertheless the reconstruction from measured data was better than using our model-based approach.^{26} One reason for this could be that the virtual waves show clearer signals with more spatial variations, similar to the images that are to be reconstructed, and not as slightly varying signals as the original thermal measurements. Using simulated synthetic data for training is important because real-world data are hard to obtain, as it requires the production of physical *phantoms* with a range of material properties (e.g., defects at various positions and of various sizes and shapes). However, we can synthesize arbitrarily large amounts of data with relatively simple analytic models of the heat diffusion process, albeit at the risk of results based on synthetic data not perfectly matching the real world. As a consequence, we provided code and data (synthetic and real-world) for full reproducibility of the results obtained by our experiments.

In summary, the virtual wave concept is an ideal and universal tool enabling the prospect of high-resolution photothermal imaging. The virtual wave concept can be viewed as beneficial factorization of the highly ill-posed inverse photothermal problem into a conversion from thermal to acoustic waves and a subsequent beamforming task. The conversion task is still highly ill-posed. However, the challenging situation of the full 2D or 3D reconstruction problem has been reduced to the solution to an 1D integral equation (26). In doing so, we were able to efficiently integrate sparsity as well as positivity of the initial temperature profile to overcome the resolution limit. Note that the virtual waves $Tvirt(r,t)$ are generally not positive. However, if we apply the Abel transform to virtual waves, we obtain the spherical mean values of the initial temperature profile. Therefore, in reality, positivity is always satisfied for Abel transformed waves that we actually use for the positivity constraint. In the pixel basis, sparsity may also not be given directly. However, sparsity of the initial temperature is usually the case in various different bases such as the wavelet base. As shown in Ref. 27 in the context of photoacoustic imaging, wavelet sparsity of the initial temperature is transferred to wavelet sparsity of the virtual wave, thereby allowing the use of sparse recovery techniques. Finally, some other information such as the shape of absorbers may also be available. Exploiting such information is interesting and challenging. We anticipate, however, that ideas from Ref. 27 along with matched filtering techniques are promising for integrating such prior information into the reconstruction of the virtual wave.

## VII. DISCUSSION

Often, high-frequency components in the signals cannot be detected because the noise prevents them from detection. There are several effects that limit the detectability and cause noise. Beside insufficient instrumentation and data processing, one principle limitation comes from thermodynamic fluctuations (noise) associated with dissipative processes in accordance to the dissipation–fluctuation theorem (Fig. 7). As mentioned above, in addition to the imaging depth, the $SNR$ also affects spatial resolution and the width of the thermal PSF. Even if the noise amplitude is the same in frequency and time domain for Gaussian white noise, because the noise energy must be the same in the frequency and time domain (Parseval’s theorem), the signal amplitude and, hence, the $SNR$ can be very different. For example, a short delta-shaped pulse in time contains all frequencies with the same amplitude, and therefore, the $SNR$ is much lower in the frequency domain than in the time domain. The appropriate space for the $SNR$ to calculate the resolution limit is closely related to the ill-posed inverse problem to compensate for the effect of thermal diffusion or acoustic attenuation. Regularization is required to solve such ill-posed inverse problems.^{28–30} The use of the cutoff frequency $\omega cut$ is one way to regularize the ill-posed inverse problem, which turns out to be equivalent to the truncated singular value decomposition (SVD) method. But, of course, such a rigorous cut, which uses the full information of the frequency components below $\omega cut$ and completely neglects them above this cutoff frequency, is only a rough approximation. In the frequency domain, the information content of a frequency component gradually decreases with the square of the reduced amplitude as the frequency increases, until it disappears in the noise at cutoff frequency $\omega cut$. Therefore, other regularization methods that allow a smoother transition, such as Tikhonov regularization,^{31} might better describe this behavior.

To describe how humans perceive acoustic waves, Harvey Fletcher conducted tone-in-noise masking experiments, where a single tone (now a frequency domain peak equivalent to the delta time peak from above) was masked by noise. He had a problem with at the time (very) noisy telephones and needed to find out why people could or could not hear the speech and what the telephone company could do about it.^{32} We performed some of these experiments with different signals and with white noise and bandstop noise to better understand whether signals are detectable or not in the presence of simultaneous noise. It turned out that before the time-domain signal is converted to the frequency domain, temporal windowing is an important aspect in describing detectability. This, in turn, affects the possible measurement bandwidth for the signal, and a trade-off between higher bandwidth or more noise must be made in choosing the optimal discretization time step. This could lead to an optimal temporal discretization to achieve an optimal spatial resolution.

The physical reason for the entropy production is fluctuation, which is the noise added to an ideally noise-free signal and can be statistically described by stochastic processes.^{2} The ratio between the ideal signal amplitude and the noise amplitude is called the signal-to-noise ratio (SNR) and plays an essential role in this Tutorial. Noise is identified as thermodynamic fluctuations around a certain time varying mean-value and is the mechanism for entropy production and information loss. Other technical limitations from insufficient instrumentation can additionally limit spatial resolution more than described by the thermal PSF.

The thermodynamic fluctuations in macroscopic samples are usually so small that they can be neglected but not for ill-posed inverse problems. The fluctuations are highly amplified due to the ill-posed problem of image reconstruction. As long as the macroscopic mean-value-equations describe the mean entropy production, the resolution limit depends only on the amplitude of the fluctuations and not on the actual stochastic process including all correlations.^{3} Therefore, we have already used in the past a simple Gauss–Markov process to describe the measured signal as a time-dependent random variable.^{2} Here, we describe heat diffusion as a random walk (Wiener process).

## VIII. CONCLUSIONS

This Tutorial explains the origin of the spatial resolution limit associated to photothermal techniques for imaging buried structures. The diffusion process is represented as a stochastic process (random walk) that causes spatial spreading of the mean values (entropy increase) and is accompanied by fluctuations that lead to information loss about the original temperature distribution, imposing a resolution limit that is calculated. Thereby, it shows how the diffusion of heat and fluctuations lead to the well-known blurring of reconstructions in infrared imaging. This can be described mathematically by a spatial convolution of the original temperature distribution with a thermal point-spread-function (PSF). The width of the thermal PSF increases linearly with imaging depth and is indirectly proportional to the natural logarithm of the signal-to-noise ratio (SNR).

The random walk model is introduced as a mathematically simple approach based on statistical probability calculations that provides insight into the heat diffusion process. It is also explained how probabilities come into play in a deterministic model for diffusion, where interaction with the environment is the cause for losing information. Furthermore, some inversion strategies based on the virtual wave concept, the inclusion of sparsity and positivity of the solution, as well as deep neural networks are presented and compared as suitable methods to partially overcome the depth-dependent resolution limit.

## ACKNOWLEDGMENTS

This work was co-financed by research subsidies granted by the government of Upper Austria. Financial support was also provided by the Austrian research funding association (FFG) under the scope of the COMET programme within the research project “Photonic Sensing for Smarter Processes (PSSP)” (Contract No. 871974). This programme is promoted by BMK, BMDW, the federal state of Upper Austria, and the federal state of Styria, represented by SFG. Parts of this work have been supported by the Austrian Science Fund (FWF), Project Nos. P 30747-N32 and P 33019-N. In addition, the financial support by the Austrian Federal Ministry for Digital and Economic Affairs and the National Foundation for Research, Technology and Development and the Christian Doppler Research Association is gratefully acknowledged.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Photons Plus Ultrasound: Imaging and Sensing 2007: The Eighth Conference on Biomedical Thermoacoustics, Optoacoustics, and Acousto-optics*,

*SPIE Proceedings*, edited by A. A. Oraevsky and L. V. Wang (SPIE, 2007), p. 643726.

*Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*, 2nd ed., Springer Series in Synergetics Vol. 13 (Springer, Berlin, 1985).

*Variational Methods in Imaging*, Applied Mathematical Sciences Vol. 167 (Springer, New York, 2009).

*Modeling of Biological Interfacial Processes Using Thickness-Shear Mode Sensors*, edited by M. G. Beghi (INTECH Open Access Publisher, 2011).

*Speech and Hearing in Communication*, The Bell Telephone Laboratories Series (Van Nostrand, New York, 1953).