In this Tutorial, we aim to directly recreate some of our “aha” moments when exploring the impact of heat diffusion on the spatial resolution limit of photothermal imaging. Our objective is also to communicate how this physical limit can nevertheless be overcome and to include some concrete technological applications. Describing diffusion as a random walk, one insight is that such a stochastic process involves not only a Gaussian spread of the mean values in space, with a variance proportional to the diffusion time, but also temporal and spatial fluctuations around these mean values. These fluctuations strongly influence the reconstruction of the temperature distribution present immediately after the short heating pulse. The Gaussian spread of the mean values in space increases the entropy, while the fluctuations lead to a loss of information that blurs the reconstruction of the initial temperature distribution, which can be described mathematically by a spatial convolution with a Gaussian thermal point-spread-function. The information loss turns out to be equal to the mean entropy increase and limits the spatial resolution in proportion to the depth of the imaged subsurface structures. This principal resolution limit can only be overcome by including additional information, such as sparsity or positivity. Prior information can also be included by using a deep neural network with a finite number of degrees of freedom, trained on a specific class of image examples, for the image reconstruction.

## I. INTRODUCTION

Photothermal imaging, and also photoacoustic imaging, involves (scattered) light causing a rise in the temperature of sub-surface light-absorbing structures, such as blood vessels in human tissue or carbon fibers in an epoxy material. In photothermal imaging, the internally absorbed heat diffuses from the absorbing structures to the sample surface, where it can be measured as a change in surface temperature. In photoacoustic imaging, the temperature rise causes a sudden increase in pressure, which produces an acoustic wave that can be detected at the surface with an ultrasound transducer. Both surface signals, the detected temperature increase and the pressure signal, can be used to reconstruct an image of the internal light-absorbing structures in a non-destructive way. The achievable spatial resolution in imaging is directly related to the information content of the thermal diffusion wave or the acoustic wave.^{1} For linear image reconstruction methods, the spatial resolution is the width of the point-spread-function (PSF), which is the reconstructed image of a point-like heat or pressure source.

Unfortunately, heat diffusion from the internal structure to the sample surface “annihilates” a lot of information, which, as we have shown earlier, corresponds to entropy production during heat diffusion.^{2,3} In thermal imaging, this leads to a spatial resolution proportional to the depth of a point heat source below the surface. In photoacoustic imaging, entropy production due to acoustic attenuation is lower; for liquids, where the acoustic attenuation is proportional to the square of the acoustic frequency, it leads to a degradation of spatial resolution proportional to the square root of depth.^{4} Therefore, at larger imaging depths, acoustic imaging shows a significantly better spatial resolution than thermographic imaging.

In this Tutorial, we first describe the one-dimensional diffusion process as a random walk, which can be described mathematically as a simple stochastic process. Of course, this is a rough approximation to the real physical processes of heat diffusion, where phonons can carry a whole distribution of energies, but two essential properties of thermal diffusion waves can already be recognized in this random walk model: the mean values of the occupation numbers spread in space, which can be described by the diffusion equation, and the actual occupation numbers fluctuate around these mean values. This fluctuation is described by the variance of the stochastic process. The diffusion process spreads the particles or the heat in space and, at the same time, induces temporal and spatial fluctuations around the mean values of the Gaussian spreading, which cause an information loss equal to the increase in entropy. For a macroscopic object, the temperature fluctuations are very small, but the time reversal of heat diffusion required for thermographic reconstruction turns out to be a highly ill-posed inverse problem that amplifies these fluctuations exponentially with increasing time. This limits the spatial resolution in proportion to the depth of the subsurface structures, and this principal resolution limit can only be overcome by including additional information such as sparsity or positivity.^{5} Prior information can also be included by using a trained deep neural network for image reconstruction, either directly from the measured data (end-to-end reconstruction by deep learning) or by first calculating the virtual wave, which corresponds to the same initial heat or pressure distribution as the measured surface temperature but is a solution of the wave equation (hybrid reconstruction by the virtual wave concept and deep learning).^{6}

## II. HEAT DIFFUSION DESCRIBED AS A RANDOM WALK

In one dimension, the traditional way to define a random walk is to let the walker take a step in each time interval, stepping either backward or forward with equal probability (Fig. 1). Described mathematically as a stochastic process, the so-called Wiener process, the distribution density of the walking distance is a Gaussian with a mean value that is constant in time and equal to the starting position and a variance proportional to the time $t$ (e.g., Gardiner^{7}). Therefore, an initially sharp distribution spreads out with time [Figs. 1(b) and 2]. In addition to these simulated results, such a stochastic process can be physically realized using a Galton board (Fig. 3). The Galton board consists of a vertical board with nested rows of pins. Beads are dropped from the top and bounce either to the left or to the right when they hit the pins. They are collected in bins at the bottom, where the height of the bead columns accumulated in the bins approximates a Gaussian bell curve. For the following mathematical description of the one-dimensional diffusion process, the individual bins are the cells, numbered such that the central cell has $i=0$, and the occupation number $N_i(t)$ is the number of beads collected in a given bin. The time $t$ can be identified with the number of rows of pins of the Galton board.
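The spread of the mean and the linear growth of the variance can be checked with a few lines of simulation; the walker count and step number below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_walkers = 100_000  # beads dropped on the board (illustrative value)
n_steps = 100        # time steps, i.e., rows of pins (illustrative value)

# Each walker steps backward or forward with equal probability.
steps = rng.choice([-1, 1], size=(n_walkers, n_steps))
positions = steps.sum(axis=1)

# The mean stays at the starting position and the variance grows
# proportionally to the diffusion time, as for the Wiener process.
print(np.mean(positions))  # close to 0
print(np.var(positions))   # close to n_steps
```

A histogram of `positions` reproduces the Gaussian bead columns of the Galton board.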

Such a random walk can be described by a master equation (e.g., Gardiner^{8}), which we have used in previous publications to describe the information loss for a kicked Brownian particle^{1} or the diffusion of heat.^{2} During the infinitesimal time $dt$, the infinitesimal change of the occupation number in cell $i$ is $dN_i$; collecting all occupation numbers in the vector $\mathbf{N}$, the master equation takes the matrix form of Eq. (9). The matrix $\mathbf{M}$ in Eq. (9) is a square matrix with dimension equal to the number of cells. $\mathbf{M}$ can be diagonalized by an adequate coordinate transform, which turns out to be the Fourier transform in space and which is the same as for the solution to the diffusion equation (11). For the discretized random walk with reflecting boundaries, this transformation is in fact the discrete cosine transform $F = \mathrm{dct}(\mathrm{diag}(1,1,\dots,1))$ because of the adiabatic boundary conditions. $F \cdot \mathbf{M} \cdot F^{t}$, with $F^{t}$ being the transpose of $F$, is a diagonal matrix with the singular values $\gamma_k$ on its diagonal.
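The diagonalization by the discrete cosine transform can be verified numerically. The sketch below builds the rate matrix of the master equation for one possible reflecting-boundary convention (the walker at a wall steps inward with probability 1/2 and stays otherwise); the cell count and the resulting eigenvalues $\gamma_k = \cos(\pi k/n) - 1$ follow from that assumption.

```python
import numpy as np

n = 8  # number of cells (illustrative)

# Rate matrix M of the master equation dN/dt = M N: each interior cell
# loses at rate 1 and gains at rate 1/2 from each neighbor; at the
# reflecting walls the walker steps inward with probability 1/2.
M = np.zeros((n, n))
for i in range(n):
    if i > 0:
        M[i, i - 1] = 0.5
    if i < n - 1:
        M[i, i + 1] = 0.5
    M[i, i] = -0.5 * ((i > 0) + (i < n - 1))

# Orthonormal DCT-II matrix (rows are the eigenvectors of M);
# scipy.fft.dct(np.eye(n), type=2, norm='ortho', axis=0) gives the same.
k = np.arange(n)[:, None]
i = np.arange(n)[None, :]
F = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
F[0, :] /= np.sqrt(2.0)

# F M F^t is diagonal, with the values gamma_k on the diagonal.
D = F @ M @ F.T
gamma = np.cos(np.pi * np.arange(n) / n) - 1.0
print(np.allclose(D, np.diag(gamma)))  # True
```

Note that $\gamma_0 = 0$: the uniform (equilibrium) mode does not decay, while all higher spatial frequencies relax at increasing rates.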

## III. HOW PROBABILITY COMES INTO PLAY

One main result of our earlier work^{1} was that the information content about the initial state just after the excitation pulse in the diffusive system decreases due to fluctuations. The loss of information in the diffusive system is equal to the increase in entropy, called entropy production, with the entropy expressed by the Shannon or Gibbs formula of Eq. (15), $S = -k_{B}\sum_i p_i \ln p_i$, where $p_i$ is the probability of finding the system in microstate $i$.^{1}
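The monotonic growth of the Shannon entropy during diffusion can be checked numerically with the random-walk evolution of Sec. II; grid size and step count below are arbitrary, and the entropy is evaluated in nats with $k_B$ set to one.

```python
import numpy as np

def shannon_entropy(p):
    """S = -sum p_i ln p_i (in nats, k_B = 1), ignoring empty cells."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

n = 201           # cells; the heat source sits in the central cell
p = np.zeros(n)
p[n // 2] = 1.0   # sharp initial distribution, S = 0

entropies = []
for t in range(300):
    entropies.append(shannon_entropy(p))
    # one random-walk step: half of each cell moves to each neighbor
    p = 0.5 * np.roll(p, 1) + 0.5 * np.roll(p, -1)

print(entropies[0], entropies[-1])  # entropy grows from 0 as p spreads
```

The initially certain state has zero entropy; as the distribution spreads toward equilibrium, the entropy rises monotonically, mirroring the loss of information about the initial location.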

The temporal evolution of the particles is described microscopically by mechanical equations, which are deterministic. But how does probability come into play in particle diffusion, for example? Here, we are confronted with the “demon” first articulated by Pierre Simon Laplace in 1814: a hypothetical observer who knows the position and momentum of every molecule and for whom, together with the deterministic time evolution, the entire past and future would also be known. Fortunately, this “demon can be exorcised” by moving to a quantum perspective on statistical mechanics, as described in an excellent overview in Physics Today by Katie Robertson on “demons in thermodynamics.”^{9} For Laplace’s demon, the entropy $S$ in Eq. (15) would be zero at all times because the state of the system is known with certainty. No probability distributions as used in thermodynamics would be necessary. But in the quantum case, probabilities are already an inherent part of the theory, so there is no need to add ignorance to the picture. In other words, the probabilities from statistical mechanics and quantum mechanics turn out to be one and the same.^{9}

Three publications by different authors in 2006^{10–12} describe that quantum entanglement of the diffusive system with its environment is the key, and no “ignorance” probabilities are necessary in the description of the diffusion process. They considered the global state of a large isolated system, called the “universe,” to be a quantum pure state. Therefore, there is no lack of knowledge about this state, and its entropy is zero. However, if only a small part of the universe is considered, which we call the “diffusive system,” its state will not remain pure due to quantum entanglement with the rest of the universe, called the “environment” (Fig. 6).^{11} If one of the two systems is taken on its own, it will be in an intrinsically uncertain state known as a mixed state. If the surrounding environment, e.g., the laboratory at room temperature, is sufficiently large, then for almost any pure state of the composite system, the diffusive system will be in a state very close to the one assigned to it by traditional statistical mechanics, which was described by the stochastic random-walk process in Sec. II. The quantum description thus leads to a probability distribution indistinguishable from that of statistical mechanics.^{9} Probabilities are an intrinsic and inescapable part of quantum mechanics, and when the diffusive system is described on its own, Laplace’s demon cannot know more than anyone else. According to Fig. 6, the probabilities describing the uncertainties (or ignorance) in the diffusion process result from the fact that the information “transported” to the environment by entanglement is lost.

## IV. INFORMATION LOSS LIMITS THE SPATIAL RESOLUTION OF THE RECONSTRUCTION

The connection between fluctuations and dissipation was established by Callen,^{13} Welton,^{14} and Greene.^{15} This fluctuation–dissipation theorem is a generalization of the famous Johnson^{16}–Nyquist^{17} formula in the theory of electric noise. It is based on the fact that in the linear regime, the fluctuations decay according to the same law as a deviation from equilibrium following an external perturbation. We could show from non-equilibrium thermodynamics that the mean entropy production due to the spreading of heat or particles (dissipation) is equal to the information loss due to fluctuations.^{1} This connects dissipation, e.g., described by the singular values $\gamma_k$ in Eq. (14), with the fluctuations given by the variance $\mathrm{Var}$, when Eq. (9) is written in k-space for all wavenumbers $k$.

The diffusion process increases the entropy, but also leads to fluctuations that reduce the information content and, thus, the spatial resolution for a subsequent reconstruction process. For thermal diffusion imaging, we assume that immediately after the excitation pulse, the absorbed heat is present only at a specific location, e.g., in cell zero. It diffuses with time [ $t=20$ (a) and $t=100$ (b) according to Fig. 2], and because of the fluctuations, reconstruction of the initial distribution from the distribution at a later time is possible only with increasing uncertainty. At time $t=1000$, equilibrium is reached (Fig. 4), and all information about the initial location is lost.

Only wave components below a cutoff wavenumber $k_{\mathrm{cut}}$ carry enough information to be used in the reconstruction, which corresponds to a rectangular window in k-space.^{3} The reconstructed signal $N_r(x,t)$ is the inverse Fourier transform of this rectangular function, which gives a sinc function.
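This reconstruction can be sketched numerically: diffuse a delta source forward in k-space, add noise, and invert the exponential damping only below the cutoff wavenumber. All grid sizes and noise levels below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
alpha_t = 40.0  # product of diffusivity and diffusion time (arbitrary units)

# Initial condition: all heat in the central cell.
T0 = np.zeros(n)
T0[n // 2] = 1.0

# Forward diffusion: multiplication with exp(-k^2 alpha t) in k-space.
k = 2 * np.pi * np.fft.fftfreq(n)
decay = np.exp(-k**2 * alpha_t)
T = np.fft.ifft(np.fft.fft(T0) * decay).real

# Small additive measurement noise.
noise_level = 1e-3
T_noisy = T + noise_level * rng.standard_normal(n)

# Inverse problem: divide by the damping factor, but only up to k_cut,
# where exp(-k_cut^2 alpha t) = 1/SNR; above k_cut the amplified noise
# would swamp the signal, so those components are set to zero.
SNR = T.max() / noise_level
k_cut = np.sqrt(np.log(SNR) / alpha_t)
mask = np.abs(k) <= k_cut
Tk = np.fft.fft(T_noisy)
recon_k = np.zeros(n, dtype=complex)
recon_k[mask] = Tk[mask] / decay[mask]
recon = np.fft.ifft(recon_k).real

# The result is the delta source blurred by a sinc-shaped PSF.
print(np.argmax(recon))  # peak at the source cell
```

Without the truncation, dividing by `decay` at the highest wavenumbers would amplify the noise by hundreds of orders of magnitude, illustrating how ill-posed the time reversal of diffusion is.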

## V. THERMAL POINT-SPREAD-FUNCTION

Instead of a Fourier transform to k-space,^{1} the one-dimensional heat diffusion equation, $\partial T/\partial t = \alpha\,\partial^{2}T/\partial x^{2}$ with thermal diffusivity $\alpha$, can be transformed to the frequency domain, where its solutions are thermal waves.^{18} The thermal wave is strongly damped because the real and imaginary parts of the complex wavenumber $\sigma(\omega)$ are equal, and the amplitude is reduced by a factor of $1/e$ after propagation of one thermal diffusion length $\mu(\omega)=\sqrt{2\alpha/\omega}$. The wavenumber or spatial frequency is $k(\omega) \equiv 1/\mu(\omega) = \sqrt{\omega/2\alpha}$. Similarly to k-space, for frequencies larger than the truncation frequency $\omega_{\mathrm{cut}}$, the amplitude of these wave components is damped below the noise level; with the truncation frequency $\omega_{\mathrm{cut}}$, the truncation diffusion length $\mu_{\mathrm{cut}} = \mu(\omega_{\mathrm{cut}})$, and the signal-to-noise ratio $SNR$ in $\omega$-space, the damping condition for a source at depth $x$ gives $x/\mu_{\mathrm{cut}} = \ln(SNR)$.

In the frequency domain, a short-time excitation corresponds to a delta pulse, whose Fourier transform contains all frequencies with the same amplitude. The thermal point-spread function (PSF) is the inverse Fourier transform, where the frequency integral is taken up to $\omega_{\mathrm{cut}}$, which assumes that below $\omega_{\mathrm{cut}}$ the full signal can be reconstructed and that no signal information is available for frequency components above the cutoff frequency.^{19} Therefore, the signal-to-noise ratio $SNR$, or more precisely, due to the exponential decrease in amplitude, the natural logarithm of the $SNR$, plays an important role for the spatial resolution. The width of the one-dimensional thermal PSF is proportional to $x/\ln(SNR)$, as derived in Eq. (24). For frequency components higher than $\omega_{\mathrm{cut}}$, the signal amplitude at the sample surface is less than the noise. In deriving the spatial resolution, it is assumed that such frequency components cannot be detected; frequency components above the noise level are used for the reconstruction. From information theory and non-equilibrium thermodynamics, we could show that the information content of frequencies larger than $\omega_{\mathrm{cut}}$ is so small that their distribution cannot be distinguished from the equilibrium distribution.^{1} If the mean energy and, thus, the entropy are proportional to the square of the signal amplitude, the cutoff frequency $\omega_{\mathrm{cut}}$ from this information criterion is the same frequency at which the signal amplitude becomes smaller than the noise amplitude.
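The scaling of the resolution limit can be tabulated directly from these relations. The diffusivity, the SNR, and the prefactor $\pi$ in the PSF width below are illustrative assumptions; the essential point is the proportionality to $x/\ln(SNR)$.

```python
import numpy as np

alpha = 1.2e-7  # thermal diffusivity in m^2/s (typical polymer, assumed)
SNR = 1000.0    # signal-to-noise ratio (assumed)

for x in [1e-3, 2e-3, 4e-3]:  # imaging depth in m
    # Damping by exp(-x/mu(w)) reaches 1/SNR at the truncation length:
    mu_cut = x / np.log(SNR)            # truncation diffusion length
    w_cut = 2 * alpha / mu_cut**2       # cutoff angular frequency
    delta_r = np.pi * x / np.log(SNR)   # PSF width; prefactor pi assumed
    print(f"depth {x*1e3:.0f} mm: w_cut = {w_cut:.2f} rad/s, "
          f"resolution ~ {delta_r*1e3:.2f} mm")
```

Doubling the depth doubles the PSF width but divides the usable cutoff frequency by four, which is why deep structures appear so strongly blurred.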

One-dimensional signals can be realized by layered structures, and there the noise can easily be reduced by increasing the area covered by the detector, which corresponds to enhanced averaging. For higher dimensions, such as image reconstruction in three dimensions, this is no longer possible because the signal is different for different detection points on the surface. There is a trade-off between small measurement pixels for better spatial resolution and larger measurement pixels with less noise and better $SNR$, which also affects the spatial resolution of the reconstruction.

## VI. OVERCOMING THE RESOLUTION LIMIT

The resolution limit $\delta_r$ given in Eq. (24) degrades proportionally to the imaging depth $x$. One way to improve the resolution when imaging a structure at a certain depth is to increase the $SNR$, which could be achieved by increasing the power of the excitation laser. This is possible only to a certain extent, as too much heating destroys the sample. As mentioned above, for layered structures for which the one-dimensional thermal diffusion equation applies, the $SNR$ can be improved by measuring and averaging over a larger area. For two-dimensional structures, such as parallel cylindrical objects, it is still possible to average along the direction of the parallel axis and thereby increase the $SNR$.
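The effect of averaging over $m$ detector pixels can be illustrated with white Gaussian noise (all amplitudes below are arbitrary): the noise standard deviation drops as $1/\sqrt{m}$, improving the SNR accordingly.

```python
import numpy as np

rng = np.random.default_rng(2)

signal = 1.0     # identical signal at every pixel of a layered sample
noise_std = 0.5  # per-pixel noise standard deviation (arbitrary)

for m in [1, 16, 400]:  # number of averaged detector pixels
    # 2000 repeated measurements, each averaged over m pixels.
    data = signal + noise_std * rng.standard_normal((2000, m))
    averaged = data.mean(axis=1)
    print(m, averaged.std())  # ~ noise_std / sqrt(m)
```

This works only as long as all averaged pixels carry the same signal, which is exactly what fails for genuinely three-dimensional structures.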

The measured surface temperature and the virtual wave are connected by a local transformation,^{20} i.e., for the same location $\mathbf{r}$, the measured temperature $T(\mathbf{r},t)$ and the virtual wave $T_{\mathrm{virt}}(\mathbf{r},t')$ are related by the one-dimensional integral equation (26).
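The severity of this conversion step can be seen from a discretization of the integral kernel. The Gaussian kernel below, $K(t,t') \propto \exp(-c^2 t'^2/4\alpha t)/\sqrt{t}$, is the commonly used form relating the measured temperature to the virtual wave; the constants and grids are illustrative, and only the rapid singular-value decay matters.

```python
import numpy as np

alpha = 1.0  # thermal diffusivity (arbitrary units)
c = 1.0      # virtual speed of sound (arbitrary units)

t = np.linspace(0.01, 5.0, 100)   # measurement times
tv = np.linspace(0.0, 5.0, 100)   # virtual (acoustic) times
dt = tv[1] - tv[0]

# Discretized kernel: T = K @ T_virt (simple quadrature, illustrative).
K = (c / np.sqrt(np.pi * alpha * t[:, None])
     * np.exp(-c**2 * tv[None, :]**2 / (4 * alpha * t[:, None])) * dt)

# The singular values decay extremely fast: the conversion from the
# measured temperature to the virtual wave is severely ill-posed.
s = np.linalg.svd(K, compute_uv=False)
print(s[0], s[-1])  # many orders of magnitude apart
```

This is the same exponential amplification of noise encountered in k-space, now expressed as a rapidly decaying singular-value spectrum, and it is the reason regularization of Eq. (26) is indispensable.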

To overcome the thermal resolution limit more efficiently, either super-resolution methods, such as those known to overcome the Abbe limit in optical imaging, e.g., by structured illumination, or additional information, such as sparsity or positivity, can be used. When using structured illumination, several reconstructed images with different illumination patterns are used to calculate one super-resolution image by a non-linear reconstruction method, such as the iterative joint sparsity algorithm (IJOSP).^{19} As an example of the use of sparsity and positivity, the reconstruction of three graphite bars embedded in epoxy resin (thermal diffusivity $0.13 \times 10^{-6}\,\mathrm{m}^2/\mathrm{s}$) at depths of 1.6, 2.6, and 3.6 mm is presented here.^{5} The measurement setup is sketched in Fig. 12. The graphite bars were heated by laser excitation (diode laser wavelength 938 nm, 250 W laser power, pulse duration 200 ms). The surface temperature after the laser pulse was measured with an infrared camera (106 Hz full-frame mode, noise equivalent temperature difference of 25 mK, cooled InSb sensor) sensitive in the spectral range of $3.0{-}5.1\,\mu\mathrm{m}$. In this spectral range, the epoxy resin is opaque, so we measured the surface temperature evolution. The spatial resolution from the camera pixels was 0.1 mm on the sample surface. Figure 13 shows a cross section of the initial temperature field of the light-absorbing bars and the reconstruction using the truncated singular value decomposition (T-SVD) method to calculate the virtual wave, followed by the synthetic aperture focusing technique (SAFT) as the ultrasound reconstruction algorithm. To calculate the virtual wave while taking sparsity (the cross sections of the graphite rods are only small points) and positivity (heating always increases the temperature) into account, the Alternating Direction Method of Multipliers (ADMM), a nonlinear optimization algorithm similar to the Douglas–Rachford method, was used.
Sparsity is taken into account by these methods by minimizing, in addition to a data-fitting term, the L1-norm. More precisely, the virtual wave is computed as a minimizer of $\| \mathbf{M} T_{\mathrm{virt}} - T \|_{2} + \mathcal{R}(T_{\mathrm{virt}})$, where $\mathbf{M}$ is a discretization of the integral in Eq. (26) and $\mathcal{R}(\cdot)$ is a suitable regularizer defined by the L1-norm. This regularized virtual wave formulation allows the use of efficient L1-minimization algorithms (e.g., ADMM) and leads to a much better resolution of the reconstructed temperature field.^{5}
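As a sketch of such an L1-regularized, positivity-constrained reconstruction, the toy problem below uses a simple proximal-gradient iteration (ISTA) instead of full ADMM; the forward matrix is a generic Gaussian blur standing in for the discretized Eq. (26), and all sizes, positions, and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120

# Forward model: Gaussian blur as a stand-in for the ill-posed operator.
idx = np.arange(n)
M = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2 * 6.0**2))
M /= M.sum(axis=1, keepdims=True)

# Sparse, positive ground truth: three point-like absorbers.
x_true = np.zeros(n)
x_true[[30, 60, 90]] = 1.0
y = M @ x_true + 0.01 * rng.standard_normal(n)

# ISTA for 0.5 * ||M x - y||^2 + lam * ||x||_1 subject to x >= 0:
# a gradient step followed by a nonnegative soft threshold.
lam = 0.005
step = 1.0 / np.linalg.norm(M, 2) ** 2
x = np.zeros(n)
for _ in range(2000):
    x = np.maximum(0.0, x - step * (M.T @ (M @ x - y)) - step * lam)

# The sparse, positive solution concentrates at the absorber positions.
print(np.argsort(x)[-3:])
```

The nonnegative soft threshold is exactly the proximal operator of the L1-norm plus the positivity constraint, so each iteration enforces both priors at once.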

The application of deep neural networks to inverse problems is currently attracting a huge amount of interest within the inverse problems community, e.g., for thermal property reconstruction^{21–23} or for source reconstruction.^{24} Combining these methods with traditional model-based reconstruction techniques is an emerging trend that comes with new challenges, such as generating training data, designing architectures, choosing learning strategies, and interpreting the learned parameters. For recent results in this field, we refer to the survey of Arridge *et al.*^{25} We have compared purely model-based reconstruction using the ADMM method to calculate the virtual wave followed by SAFT, a hybrid reconstruction that uses deep learning to determine the reconstruction from the virtual wave, and an end-to-end deep learning reconstruction that uses the measured temperature data directly without first calculating the virtual wave.^{6} The best spatial resolution was demonstrated by the hybrid reconstruction, using the ADMM method to calculate the virtual wave followed by a deep learning reconstruction. This showed better results than end-to-end reconstruction by deep learning, and the effort for training the network was also significantly lower. For training the deep neural network (U-Net architecture), we used simulated synthetic data, but the reconstruction from measured data was nevertheless better than with our model-based approach.^{26} One reason for this could be that the virtual waves show clearer signals with more spatial variation, similar to the images to be reconstructed, rather than the slowly varying signals of the original thermal measurements.
Using simulated synthetic data for training is important because real-world data are hard to obtain: this requires the production of physical *phantoms* with a range of material properties (e.g., defects at various positions and of various sizes and shapes). In contrast, arbitrarily large amounts of data can be synthesized with relatively simple analytic models of the heat diffusion process, albeit at the risk that results based on synthetic data do not perfectly match the real world. As a consequence, we provided code and data (synthetic and real-world) for full reproducibility of the results of our experiments.

In summary, the virtual wave concept is an ideal and universal tool enabling high-resolution photothermal imaging. It can be viewed as a beneficial factorization of the highly ill-posed inverse photothermal problem into a conversion from thermal to acoustic waves and a subsequent beamforming task. The conversion task is still highly ill-posed. However, the challenging full 2D or 3D reconstruction problem has been reduced to the solution of the 1D integral equation (26). In doing so, we were able to efficiently incorporate sparsity as well as positivity of the initial temperature profile to overcome the resolution limit. Note that the virtual waves $T_{\mathrm{virt}}(\mathbf{r},t)$ are generally not positive. However, applying the Abel transform to the virtual waves yields the spherical mean values of the initial temperature profile, which are always positive; it is these Abel-transformed waves that we actually use for the positivity constraint. In the pixel basis, sparsity may also not hold directly, but the initial temperature is usually sparse in various other bases, such as a wavelet basis. As shown in Ref. 27 in the context of photoacoustic imaging, wavelet sparsity of the initial temperature is transferred to wavelet sparsity of the virtual wave, thereby allowing the use of sparse recovery techniques. Finally, other information, such as the shape of the absorbers, may also be available. Exploiting such information is interesting and challenging. We anticipate, however, that ideas from Ref. 27 along with matched filtering techniques are promising for integrating such prior information into the reconstruction of the virtual wave.

## VII. DISCUSSION

Often, high-frequency components of the signals cannot be detected because noise prevents their detection. Several effects limit detectability and cause noise. Besides insufficient instrumentation and data processing, one principal limitation comes from the thermodynamic fluctuations (noise) associated with dissipative processes, in accordance with the fluctuation–dissipation theorem (Fig. 7). As mentioned above, in addition to the imaging depth, the $SNR$ also affects the spatial resolution and the width of the thermal PSF. For Gaussian white noise, the noise amplitude is the same in the time and frequency domains because the noise energy must be equal in both domains (Parseval's theorem); the signal amplitude and, hence, the $SNR$ can nevertheless be very different. For example, a short delta-shaped pulse in time contains all frequencies with the same amplitude, and therefore, the $SNR$ is much lower in the frequency domain than in the time domain. The appropriate domain in which to evaluate the $SNR$ for the resolution limit is closely related to the ill-posed inverse problem of compensating for the effect of thermal diffusion or acoustic attenuation. Regularization is required to solve such ill-posed inverse problems.^{28–30} The use of the cutoff frequency $\omega_{\mathrm{cut}}$ is one way to regularize the ill-posed inverse problem, which turns out to be equivalent to the truncated singular value decomposition (SVD) method. But, of course, such a rigorous cut, which uses the full information of the frequency components below $\omega_{\mathrm{cut}}$ and completely neglects them above this cutoff frequency, is only a rough approximation. In the frequency domain, the information content of a frequency component gradually decreases with the square of the reduced amplitude as the frequency increases, until it disappears in the noise at the cutoff frequency $\omega_{\mathrm{cut}}$. Therefore, other regularization methods that allow a smoother transition, such as Tikhonov regularization,^{31} might describe this behavior better.
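The difference between the hard truncation and the smoother Tikhonov roll-off is easy to see from the filter factors applied to the singular components; the singular-value spectrum and noise level below are arbitrary illustrations.

```python
import numpy as np

sigma = np.logspace(0, -6, 50)  # decaying singular values (illustrative)
noise_level = 1e-3

# Truncated SVD: keep a component fully or discard it entirely.
f_tsvd = (sigma >= noise_level).astype(float)

# Tikhonov: filter factors sigma^2 / (sigma^2 + lambda^2) roll off smoothly.
lam = noise_level
f_tikh = sigma**2 / (sigma**2 + lam**2)

for s, a, b in zip(sigma[::12], f_tsvd[::12], f_tikh[::12]):
    print(f"sigma = {s:.1e}:  T-SVD {a:.0f},  Tikhonov {b:.3f}")
```

Both filters keep the well-determined components and suppress the noisy ones, but Tikhonov attenuates the components near the noise level gradually, which matches the gradual loss of information described above.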

To describe how humans perceive acoustic waves, Harvey Fletcher conducted tone-in-noise masking experiments, in which a single tone (a peak in the frequency domain, the counterpart of the delta peak in time discussed above) was masked by noise. He was dealing with the (very) noisy telephones of the time and needed to find out why people could or could not hear speech and what the telephone company could do about it.^{32} We performed some of these experiments with different signals and with white noise and bandstop noise to better understand whether signals are detectable in the presence of simultaneous noise. It turned out that temporal windowing, applied before the time-domain signal is converted to the frequency domain, is an important aspect in describing detectability. This, in turn, affects the possible measurement bandwidth for the signal, and a trade-off between higher bandwidth and more noise must be made when choosing the discretization time step. This could lead to an optimal temporal discretization for achieving an optimal spatial resolution.

The physical reason for the entropy production is fluctuation, which is the noise added to an ideally noise-free signal and can be described statistically by stochastic processes.^{2} The ratio between the ideal signal amplitude and the noise amplitude is called the signal-to-noise ratio (SNR) and plays an essential role in this Tutorial. Noise is identified as thermodynamic fluctuations around a time-varying mean value and is the mechanism for entropy production and information loss. Other technical limitations from insufficient instrumentation can additionally limit the spatial resolution beyond what is described by the thermal PSF.

The thermodynamic fluctuations in macroscopic samples are usually so small that they can be neglected, but not for ill-posed inverse problems: there, the fluctuations are highly amplified by the ill-posed image reconstruction. As long as the macroscopic mean-value equations describe the mean entropy production, the resolution limit depends only on the amplitude of the fluctuations and not on the details of the actual stochastic process, including all correlations.^{3} Therefore, we have in the past used a simple Gauss–Markov process to describe the measured signal as a time-dependent random variable.^{2} Here, we describe heat diffusion as a random walk (Wiener process).

## VIII. CONCLUSIONS

This Tutorial explains the origin of the spatial resolution limit associated with photothermal techniques for imaging buried structures. The diffusion process is represented as a stochastic process (random walk) that causes spatial spreading of the mean values (entropy increase) and is accompanied by fluctuations that lead to a loss of information about the original temperature distribution, imposing a resolution limit that is calculated explicitly. The Tutorial thereby shows how the diffusion of heat and the fluctuations lead to the well-known blurring of reconstructions in infrared imaging. This blurring can be described mathematically by a spatial convolution of the original temperature distribution with a thermal point-spread-function (PSF). The width of the thermal PSF increases linearly with imaging depth and is inversely proportional to the natural logarithm of the signal-to-noise ratio (SNR).

The random walk model is introduced as a mathematically simple approach based on statistical probability calculations that provides insight into the heat diffusion process. It is also explained how probabilities come into play in a deterministic model of diffusion, where interaction with the environment is the cause of the loss of information. Furthermore, several inversion strategies based on the virtual wave concept, the inclusion of sparsity and positivity of the solution, and deep neural networks are presented and compared as suitable methods to partially overcome the depth-dependent resolution limit.

## ACKNOWLEDGMENTS

This work was co-financed by research subsidies granted by the government of Upper Austria. Financial support was also provided by the Austrian research funding association (FFG) under the scope of the COMET programme within the research project “Photonic Sensing for Smarter Processes (PSSP)” (Contract No. 871974). This programme is promoted by BMK, BMDW, the federal state of Upper Austria, and the federal state of Styria, represented by SFG. Parts of this work have been supported by the Austrian Science Fund (FWF), Project Nos. P 30747-N32 and P 33019-N. In addition, the financial support by the Austrian Federal Ministry for Digital and Economic Affairs and the National Foundation for Research, Technology and Development and the Christian Doppler Research Association is gratefully acknowledged.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Photons Plus Ultrasound: Imaging and Sensing 2007: The Eighth Conference on Biomedical Thermoacoustics, Optoacoustics, and Acousto-optics*, SPIE Proceedings, edited by A. A. Oraevsky and L. V. Wang (SPIE, 2007), p. 643726.

*Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*, 2nd ed., Springer Series in Synergetics Vol. 13 (Springer, Berlin, 1985).

*Non-Equilibrium Thermodynamics*

*Rank-Deficient and Discrete Ill-Posed Problems*

*Parameter Estimation and Inverse Problems*

*Variational Methods in Imaging*, Applied Mathematical Sciences Vol. 167 (Springer, New York, 2009).

*Modeling of Biological Interfacial Processes Using Thickness-Shear Mode Sensors*, edited by M. G. Beghi (INTECH Open Access Publisher, 2011).

*Speech and Hearing in Communication*, The Bell Telephone Laboratories Series (Van Nostrand, New York, 1953).