This article offers a perspective on photoacoustic tomography (PAT) under realistic scenarios. While PAT has gained much attention in preclinical and clinical research, most early works used image reconstruction techniques based on ideal assumptions, and thus these techniques may not be fully effective in real environments. In this work, we consider such non-ideal conditions as a limited view, limited bandwidth, lossy medium, or heterogeneous medium. More importantly, we use k-Wave simulation to numerically evaluate the effects of these limiting factors on various image reconstruction algorithms. Then, to enable more reliable PAT image reconstruction, we introduce recent techniques for mitigating each of the limiting conditions. We seek to emphasize the importance of working within these realistic limitations, and we encourage researchers to develop compensating solutions that advance PAT’s translation to real clinical environments.
I. INTRODUCTION
Photoacoustic (PA) imaging, also called optoacoustic (OA) imaging, has been increasingly employed in biomedical imaging, where it provides information at scales from microscopic to macroscopic.1–5 PA imaging utilizes both optical and acoustic energy to generate rich optical absorption contrast in biological tissues while maintaining high ultrasonic spatial resolution at depths beyond the optical ballistic region (∼1 mm in soft tissues). In the PA effect, the instantaneous absorption of short pulses of light induces recurring thermal expansions that generate acoustic waves. The PA signal at the instant of generation (called the initial PA pressure) is linearly proportional to the amount of light absorbed; thus, it can spectroscopically differentiate endogenous chromophores, such as hemoglobin,6–8 lipid,9–11 melanin,12,13 water,14 DNA/RNA,15 and collagen.16 Beyond intrinsic contrasts, exogenous contrast agents have been vigorously investigated to improve the local visibility of deeply located vasculatures, internal organs, and diseased tissues.17–23 PA contrast agents must exhibit a high thermal energy transition to the given optical exposure and comprise various materials, such as small molecules,24–27 inorganic nanoparticles,28–30 organic nanoparticles,31–36 polymeric nanoparticles,37–39 or their composites.40,41
In particular, PA tomography (PAT) has gained much attention as a preclinical and clinical imaging modality.42–44 PAT can visualize optical heterogeneities of biological tissues at depths of several centimeters, with spatial resolutions of tens to hundreds of micrometers.45–50 The main PAT application is agent-free vascular imaging, which is critical for various diseases, such as cancers (e.g., breast,51–54 thyroid,55–57 cervical,58 colon,59,60 and prostate cancers61,62), joint inflammations,63–65 peripheral vascular diseases,66–68 and skin diseases.13,69 Similar to conventional x-ray computed tomography (CT), PAT produces images from simultaneously acquired multiple PA measurements made using various forms of ultrasound (US) transducer arrays positioned around the imaging targets. Widely used array geometries include linear,49,51 convex,70,71 concave,52,72 and circular shapes73,74 for two-dimensional (2D) cross-sectional imaging, and planar54,75 or hemispherical shapes10,76 for three-dimensional (3D) volumetric imaging.77 The multiple measurements are combined to form a single image frame by using image reconstruction algorithms, also called beamformers.78–81 A common beamforming principle is to reverse the wave propagation to the initial state, where the light absorption was converted to the acoustic pressure at the target position.78,82–84 Assuming that the acoustic waves propagate directly toward the detectors with known speeds, the acoustic pressures at the targets can simply be estimated via triangulation.85 Many publications have reported novel reconstruction methods, most of which can be categorized as time-domain,86–94 frequency-domain,95–98 or numerical model-based algorithms.99–102 While beamformers are generally developed and validated by thorough theoretical derivations, in real situations, the actual output images from the beamformers may not be as good as predicted. 
Practically, PAT must operate under non-ideal conditions that are difficult to mathematically model in the development process, and thus, the qualitative and quantitative accuracy of the reconstructed images may be greatly degraded.103,104
In this perspective, we discuss the practical effects of the factors limiting PAT's performance and introduce state-of-the-art techniques to overcome them. Starting with Sec. II, we assign the limiting factors to four categories: limited view, limited bandwidth, lossy medium, and heterogeneous medium. Realistic US detectors have both finite apertures that fail to capture part of the tomographic data and limited detection bandwidths that are inadequate for the inherently wideband PA signals. Furthermore, actual propagation media are lossy for optical and acoustic waves, and the heterogeneous constitution of various tissue components also significantly distorts the optical and acoustic wavefronts, complicating the image reconstruction process. To demonstrate the effects of the limiting factors, we use the k-Wave toolbox in MATLAB105,106 to simulate those conditions and evaluate the corresponding images qualitatively and quantitatively. In Sec. III, we review potential technical solutions for each limiting factor to provide future research directions, and we conclude our work in Sec. IV with a brief summary. Due to the difficulty of modeling heterogeneous optical fluence in tissues, we have not considered the optical heterogeneity of the medium. Because all these factors affect the images, the practical deployment of PAT can be limited, especially in clinical scenarios where guarantees of reliability and reproducibility are required. For PAT to be successfully translated from benchside to bedside, these realistic limitations must be properly acknowledged and mitigated.
II. EVALUATION OF FACTORS LIMITING PRACTICAL PHOTOACOUSTIC TOMOGRAPHY
A. Limiting factors of practical photoacoustic tomography
In general, the performance of the PA image reconstruction is determined by the imaging probe parameters and the medium conditions. As illustrated in Fig. 1, the specific determinants are the signal reception aperture, the bandwidth (BW) of the sensor, the optical fluence, acoustic attenuation, and tissue heterogeneity that results in various speeds of sound transmission. Ideally, PAT would have an infinite aperture, infinite BW, a lossless medium (i.e., uniform light fluence and no acoustic attenuation), and a homogeneous known speed of sound, and early image reconstruction algorithms (beamforming techniques) were derived under these ideal assumptions. However, in practical scenarios, various combinations of these defects could differently affect the reconstructed image. In this section, we separately discuss the characteristics of each limiting factor.
As a first consideration, a US imaging probe has a limited aperture, creating what is commonly termed a “limited-view” condition. This limit is caused by either the geometry of the US transducer array or the directional sensitivity of each transducer element. Circular transducer arrays can receive PA signals over nearly a full 360°, but linear or concave transducer arrays can receive signals only within much smaller angles of view. Circular arrays may also suffer from limited-view artifacts in their outer regions because each transducer element can essentially receive only signals arriving normal to the transducer surface. The −6 dB sensitivity angle of an element is typically about 40°; thus, a linear transducer array might not exceed this aperture angle even with an ultra-wide geometry. The effective apertures of circular or concave arrays are much less affected by directional sensitivity because all elements face the center of the arc, but the usable field-of-view (FOV) can be limited to the region near the center. This limited-view condition can create image distortion and degrade the spatial resolution because the image reconstruction is done with only a restricted portion of the generated signal. For example, a circular target may appear as an oval.
Third, propagation media have different degrees of optical and acoustic attenuation. Optical fluence in biological tissues decreases exponentially with depth, which is a major factor limiting imaging depth. More seriously, optical attenuation depends strongly on optical wavelength. If this optical attenuation is not properly compensated, spectroscopic PAT that assumes a linear relationship between PA signals and light absorption will not be accurate. Complicating the matter, it is extremely challenging to accurately map the distribution of light fluence in tissue, and for this reason, optical heterogeneity is not considered in our analysis here. Typically, acoustic attenuation is not as serious as optical attenuation, and it comes from a combination of frequency-dependent power attenuation and the dispersion of frequency components. In practice, high-frequency acoustic waves are more vulnerable to attenuation, so mainly low-frequency PA signals can be detected from deep tissue.
Last, tissue heterogeneity causes a complex speed-of-sound distribution that complicates image reconstruction. Because most image reconstruction algorithms assume a uniform speed-of-sound (typically, 1540 m/s), they become very ineffective when applied to highly heterogeneous tissues. Lipids and bones are well-known sources of speed-of-sound contrast, with typical values of 1440 and 3500 m/s, respectively. These outliers cause phase aberration, which breaks the coherence of the received acoustic waves and deteriorates the constructive interference among them. Moreover, it is challenging to measure the actual speed-of-sound distribution in real tissue, and thus compensation based on typical values or on crude tissue segmentation may not be adequate.
B. Numerical simulation of PAT under limiting conditions
Next, we demonstrate our procedure for numerically simulating PAT in realistic conditions. Recall the four categories of limiting conditions: limited view, limited bandwidth, lossy medium, and heterogeneous medium. Figure 2(a) illustrates the flow of our simulation and evaluation procedure. To begin, a radio frequency (RF) dataset was generated with k-Wave, a MATLAB-based time-domain acoustic wave simulator, under selected conditions reflected in the source, detector, and medium parameter presets.105 The RF data were then processed with seven types of beamformers: delay and sum (DAS), delay multiply and sum (DMAS), nonlinear pth-root DAS (p-DAS), minimum variance (MV), filtered back projection (FBP), Fourier-domain reconstruction (Fourier), and time reversal (TR).78,80,82,87,93,96,99 The performance analysis was performed using two representative indices, mean squared error (MSE), and structural similarity (SSIM) for two separate regions of interest (ROIs) indicating the source and background.
The simulation parameters for each condition are tabulated in Fig. 2(b), and the simulation setting is illustrated in Fig. 2(c). A 25.6 mm (axial) × 38.4 mm (lateral) virtual planar space with a 0.1-mm grid step was defined for each condition. Nine circular targets with a diameter of 4 mm were equally spaced in a 3 × 3 grid centered in this space. Circulatory vessels are the most common imaging target in PAT; hence, the targets mimicked the typical size of human cephalic veins (∼4 mm). The medium was initially set to be optically and acoustically lossless, and the speed of sound (SoS) was assumed to be homogeneous at 1540 m/s.
As shown in Fig. 2(b), we defined the conditions C1–C5 to describe the practical PAT scenarios where the four limiting factors are cumulatively applied. As a baseline condition (C1), the lateral coverage and element number of the transducer were set to twice those of the other conditions (76.8 mm and 384 elements). For the limited-view condition (C2), the transducer size was reduced to 38.4 mm and the element number was changed to 192. The FOV, calculated as the view angle from the target to the aperture, is 143° (C1) and 112° (C2–C5) at the center target. For the limited bandwidth condition (C3), the acquired RF data were preprocessed with a Gaussian-shaped bandpass filter, designed with the parameters from a commercial linear probe (GE9LD, General Electric) having a 5.3 MHz center frequency and 75% bandwidth. To model the optical attenuation in the lossy medium condition (C4), the optical absorption coefficient ($\mu_a$), optical scattering coefficient ($\mu_s$), and the anisotropy ($g$) were set as the typical values of 0.1 cm$^{-1}$, 100 cm$^{-1}$, and 0.9, respectively.107 Because all targets are located deeper than the optical ballistic region (<1 mm depth), we used an exponential function to approximate the fluence at depth $z$: $\Phi(z) = \Phi_0 \exp(-\mu_{\mathrm{eff}} z)$, where $\mu_{\mathrm{eff}} = \sqrt{3\mu_a[\mu_a + \mu_s(1 - g)]}$. Note that we assumed the medium to be optically homogeneous because of the difficulty of accurately modeling optical heterogeneity. Also, we assumed that the light illumination on the surface is homogeneous so that the optical fluence depends only on depth. Under these assumptions, the light fluence in deep tissue (>1 mm depth) can be simply modeled with an exponential function following the diffusion equation. Finally, the initial pressure map was multiplied by the optical fluence map to simulate the optically lossy medium. The acoustic attenuation was set using the k-Wave medium parameter settings, using the typical value of 0.75 dB/(MHz$^{1.5}$·cm) in soft tissues.
To induce acoustic heterogeneity in the medium (C5), a lipid layer was added with a sinusoidally varied thickness ranging from 3 to 5 mm (SoS 1440 m/s), and the rest of the medium was maintained as before (i.e., SoS 1540 m/s). The acoustic heterogeneity of the real tissue would be much more complex, and thus, the image reconstruction quality would be even more degraded in an unexpected manner. Still, it is very difficult to compensate the phase aberrations even in a simple model because the actual speed-of-sound distribution would be unknown in realistic situations.
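As a rough numerical check of the setup described above, the following Python sketch evaluates the view angle at the center target and the diffusion-approximation fluence at that depth. It is only an illustration under stated assumptions: the transducer is assumed to lie flat on the top edge of the 25.6-mm-deep grid, the center target is assumed to sit at mid-depth (12.8 mm), and the function names are hypothetical rather than taken from the simulation code. Under these assumptions the computed angles (about 143.1° and 112.6°) agree with the quoted 143° and 112° to within a degree.

```python
import math


def view_angle_deg(aperture_mm, depth_mm):
    """Full angle subtended by a flat aperture at a point centered below it."""
    return 2.0 * math.degrees(math.atan((aperture_mm / 2.0) / depth_mm))


def effective_attenuation(mu_a, mu_s, g):
    """Effective attenuation coefficient (cm^-1) in the diffusion approximation."""
    mu_s_reduced = mu_s * (1.0 - g)  # reduced scattering coefficient
    return math.sqrt(3.0 * mu_a * (mu_a + mu_s_reduced))


def relative_fluence(depth_cm, mu_eff):
    """Fluence relative to the surface value, exp(-mu_eff * z)."""
    return math.exp(-mu_eff * depth_cm)


# Assumption: the center target sits at half of the 25.6 mm axial extent.
center_depth_mm = 25.6 / 2.0

angle_c1 = view_angle_deg(76.8, center_depth_mm)   # baseline aperture
angle_c2 = view_angle_deg(38.4, center_depth_mm)   # halved aperture
mu_eff = effective_attenuation(mu_a=0.1, mu_s=100.0, g=0.9)
fluence_center = relative_fluence(center_depth_mm / 10.0, mu_eff)

print(f"view angle C1: {angle_c1:.1f} deg, C2: {angle_c2:.1f} deg")
print(f"mu_eff: {mu_eff:.3f} cm^-1, relative fluence at center: {fluence_center:.3f}")
```

With the quoted optical parameters, the fluence at the center target is roughly a tenth of its surface value, which gives a sense of how strongly condition C4 weights the deeper targets.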
Next, the generated RF data were processed with seven different beamformers. DAS is a direct sum of delayed signals, and its simplicity makes DAS the most conventional type of reconstruction scheme and the basis of every time-domain reconstruction model. DMAS is derived from DAS with a cross-element multiplication, similar to that of cross correlation, that highlights coherent pixels while suppressing low coherence artifacts or random noises.88,90–92,94 Although various advanced beamformers have been derived92,108 from DMAS to improve the denoising and sidelobe suppression, the baseline DMAS algorithm was selected in our simulation study. A generalized expansion of DMAS from the order of 2 to a variable p yields our third scheme, p-DAS.93 While higher values of p suppress the sidelobe more, we used p = 3 in our simulation to avoid excessive cancellation of the mainlobe in C5 from incoherent summation of the waves. MV is also known as the Capon beamformer and adaptively finds the optimal apodization weight that minimizes the variance (i.e., power) of the beamforming output.86 In our model, the beamforming process contains both spatial and temporal averaging to maximize the stability of covariance matrix calculation and, consequently, to achieve the best image quality.87 FBP can be regarded as DAS with a first derivative operation on the RF data, which works as a ramp filter in the frequency domain because its original use was to sharpen the edges of figures developed in x-ray CT.82 An angular normalization term is another key feature of this beamformer, compensating the magnitude of each data sample by weighting it with the solid angle of each detector element.51,84 Instead of the delay-based reconstruction methods mentioned above, the frequency-domain reconstruction algorithm, called Fourier in this work, transforms the acquired RF data into spatial frequency domain to reconstruct the image.95 First, the spatial/temporal-domain RF data are processed with a fast 
Fourier transform (FFT) to translate the data into the spatial/temporal frequency ($k_x$, $\omega$) domain. Using the relationship established by the general wave equation between the temporal frequency, $\omega$, and the spatial frequencies, $k_x$ and $k_z$, each frequency component is reconstructed via interpolation in the spatial frequency domain. Finally, the image can be reconstructed via inverse FFT of the interpolated spatial frequency-domain result. By applying FFT, such a method has advantageously low computational complexity, but harmonic artifacts compromise the image quality. The last beamformer, acoustic TR, uses the symmetry in acoustic wave propagation and operates by literally transmitting the received data backward in time.99,100 Because the fully received data must be retraced through stepwise modeling of acoustic wave propagation, such a method requires complex numerical simulation.101,102 Unlike the previously mentioned beamformers, TR can be implemented to consider a heterogeneous SoS. However, to show the severe limitation imposed by a heterogeneous medium, we assumed that only the typical value of 1540 m/s is known for the TR method.
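The delay-based schemes described above all build on the same DAS core. The following Python sketch shows that core for a linear array and a single idealized point source. It is a minimal illustration, not the implementation used in the simulations: the sampling rate, element count, pitch, and grid are hypothetical values, the RF data are unit impulses rather than k-Wave output, and no apodization or interpolation is applied.

```python
import math

SPEED_OF_SOUND = 1540.0   # m/s, uniform value assumed by the beamformer
SAMPLING_RATE = 40e6      # Hz, hypothetical value chosen for this sketch


def sample_index(distance_m):
    """Convert a one-way propagation distance into an RF sample index."""
    return round(distance_m / SPEED_OF_SOUND * SAMPLING_RATE)


def simulate_point_source(element_x, source, n_samples):
    """Idealized RF data: each element records a unit impulse at the arrival time."""
    rf = [[0.0] * n_samples for _ in element_x]
    for e, ex in enumerate(element_x):
        idx = sample_index(math.hypot(source[0] - ex, source[1]))
        if idx < n_samples:
            rf[e][idx] = 1.0
    return rf


def delay_and_sum(rf, element_x, xs, zs):
    """For every pixel, sum the RF samples at the pixel-to-element delays."""
    n_samples = len(rf[0])
    image = []
    for z in zs:
        row = []
        for x in xs:
            total = 0.0
            for e, ex in enumerate(element_x):
                idx = sample_index(math.hypot(x - ex, z))
                if idx < n_samples:
                    total += rf[e][idx]
            row.append(total)
        image.append(row)
    return image


# Hypothetical geometry: 32 elements with 0.3 mm pitch on the line z = 0.
element_x = [(i - 15.5) * 0.3e-3 for i in range(32)]
xs = [(-2.0 + 0.5 * i) * 1e-3 for i in range(9)]   # lateral pixel positions (m)
zs = [(8.0 + 0.5 * j) * 1e-3 for j in range(9)]    # axial pixel positions (m)
source = (xs[4], zs[4])                            # point source on a grid node

rf = simulate_point_source(element_x, source, n_samples=512)
image = delay_and_sum(rf, element_x, xs, zs)
peak_value, peak_row, peak_col = max(
    (value, r, c) for r, row in enumerate(image) for c, value in enumerate(row)
)
print(peak_value, peak_row, peak_col)
```

Only at the true source position do all element delays line up, so the sum there equals the number of elements. DMAS and p-DAS change how the delayed samples are combined, while the delay calculation itself stays the same, which is why an incorrect speed of sound affects all of them.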
Figure 3 summarizes the image reconstruction results for the center target with all the aforementioned beamforming algorithms. All images are normalized to their maximum absolute pixel value. Scanning from left to right, the image quality degrades progressively for all beamformers as the sequence of practical conditions proceeds. Initially, in the near-ideal condition (C1), the Fourier and TR images show the best resemblance to the circle, with almost uniform PA amplitudes. The first derivative term in the FBP algorithm produces a consistently hollow feature under all conditions because its ramp filtering enhances the edges. In the limited-view condition (C2), a slight deformation of the circular shape is observed, and the sidelobes tend to increase due to the degraded focusing capability of the US transducer. Under the limited bandwidth condition (C3), all seven images appear hollow because the low-frequency components are suppressed by the bandpass filtering of the transducer element.104 Optical and acoustic attenuation in the medium (C4) blur the bottom boundary of the target because of the decreased optical fluence and increased acoustic attenuation at depth. Finally, in the heterogeneous medium (C5), circular shapes are no longer preserved, and only a part of the upper boundary is reconstructed: the acoustic waves are not coherently summed, especially at the side boundaries with steeper slopes. Deeply located targets are generally more vulnerable to the limiting factors, particularly in C2 and C4, because they have a narrower FOV to the transducer aperture and a longer traveling distance. From these results, we can observe that it becomes pointless to compare the image qualities of various beamformers as the conditions become harsher. Up through C3, the limiting conditions are problems arising from the signal detection perspective, and they can be reasonably mitigated by using advanced transducer array geometries or wideband sensor materials.
However, the medium conditions in C4 and C5 are uncontrollable and almost unmeasurable, and thus they can be regarded as the major limiting factor of practical PAT image reconstruction.
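The hollowing seen under C3 can be made plausible with a short calculation of the Gaussian bandpass response. The Python sketch below is an illustration under assumptions that are not stated in the text: the 75% bandwidth is taken as the −6 dB full width of the amplitude response, and the dominant frequency of a 4-mm target is approximated very roughly as the speed of sound divided by the diameter.

```python
import math


def gaussian_bandpass(freq_mhz, center_mhz=5.3, fractional_bw=0.75):
    """Amplitude response of a Gaussian filter.

    Assumption: fractional_bw is the -6 dB (half-amplitude) full width
    divided by the center frequency.
    """
    full_width = fractional_bw * center_mhz
    sigma = full_width / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((freq_mhz - center_mhz) / sigma) ** 2)


# Rough order-of-magnitude estimate of the dominant frequency of a 4 mm target.
target_freq_mhz = 1540.0 / 4e-3 / 1e6          # about 0.385 MHz

edge_response = gaussian_bandpass(5.3 - 0.75 * 5.3 / 2.0)  # lower band edge
dc_response = gaussian_bandpass(0.0)
target_response = gaussian_bandpass(target_freq_mhz)

print(f"band edge: {edge_response:.3f}")
print(f"DC: {dc_response:.4f}, at {target_freq_mhz:.3f} MHz: {target_response:.4f}")
```

Under these assumptions the filter passes only about 1% of the amplitude near the frequencies that carry the bulk of a 4-mm target, which is consistent with the interiors being lost while the sharp edges remain visible.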
C. Quantitative evaluation measures: Structural similarity (SSIM) and mean squared error (MSE)
Figure 5 displays SSIMs and MSEs from the two ROIs. An image is considered good when its SSIM index is close to 1 and its MSE index is close to 0. For evaluation, we used a total of 35 images (7 beamformers × 5 conditions), each containing all nine targets. The SSIMs and MSEs in the target and background show almost opposite tendencies as the conditions become harsher. The monotonic decrease in the target SSIM and the increase in the target MSE for every beamformer reveal increasingly degraded image quality as the limiting conditions accumulate. In contrast, the background consists of sidelobes around each target, which are prominent in the limited-view condition and diminish afterward due to the signal attenuation and acoustic heterogeneity. In C1, the target SSIM values of the seven beamformers (in the order DAS, DMAS, p-DAS, MV, FBP, Fourier, and TR) are [0.26, 0.17, 0.12, 0.02, 0.16, 0.70, 0.73] and the MSE values are [0.33, 0.49, 0.59, 0.59, 0.51, 0.08, 0.08]. These values match well with the visual appearances in Fig. 3, where TR produces the best image quality. In terms of the background, the SSIM values are [0.07, 0.28, 0.59, 0.14, 0.48, 0.05, 0.11], and the MSE values are [0.05, 0.02, 0.01, 0.03, 0.01, 0.04, 0.03], which also reflect the sidelobe reduction capability of p-DAS. From C1 to C2, slight degradation in the two target metrics is observed for all beamformers because the halved aperture distorts the target morphology. The background SSIM and MSE are also degraded due to the increased sidelobe levels. The target features are most adversely affected by the limited bandwidth (from C2 to C3), which empties out the inner content of the target image and results in major drops in SSIM and rises in MSE. On the other hand, noticeable improvements in the background SSIM and MSE are observed in this case because the strong low-frequency sidelobes are also erased.
In the following conditions, C4 and C5, further image degradation is hardly evident in the graphs because the target SSIM values are already low, from 0.02 to 0.05, and the MSE values are already high, from 0.73 to 0.81. While the Fourier and TR beamformers are superior for target reconstruction in the near-ideal condition (C1), the comparison among beamformers becomes meaningless in the harshest but most realistic condition (C5). It can be concluded that many image reconstruction algorithms tend to lose their effectiveness in realistic situations where the ideal assumptions are no longer valid. Note that TR can incorporate the acoustic heterogeneity of the medium when the parameters are known, but it would still be affected by the incompleteness of the acquired RF data due to the other conditions. Given these results, we stress that the limiting conditions must be seriously considered and mitigated to make PAT a reliable and truly competitive biomedical imaging modality.
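For readers who want to reproduce this kind of evaluation, the two indices can be sketched in a few lines of Python. This is a simplified illustration rather than the evaluation code behind Fig. 5: the SSIM here is computed once over a whole ROI (a single global window) with the conventional constants K1 = 0.01 and K2 = 0.03, whereas windowed implementations average local SSIM values, and the ROI is passed as a flat list of normalized pixel values. The one-dimensional "target" profiles are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def mse(image, reference):
    """Mean squared error between two flattened ROIs of equal length."""
    return mean([(a - b) ** 2 for a, b in zip(image, reference)])


def global_ssim(image, reference, dynamic_range=1.0):
    """SSIM evaluated over the whole ROI as one window (simplifying assumption)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = mean(image), mean(reference)
    var_x = mean([(a - mu_x) ** 2 for a in image])
    var_y = mean([(b - mu_y) ** 2 for b in reference])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(image, reference)])
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 1D cut through a target: filled reference vs. hollow reconstruction.
reference = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
hollow = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

mse_hollow = mse(hollow, reference)
ssim_hollow = global_ssim(hollow, reference)
ssim_identical = global_ssim(reference, reference)
print(mse_hollow, ssim_hollow, ssim_identical)
```

Even in this toy case, emptying the interior of the target raises the MSE and lowers the SSIM, mirroring the trend reported from C2 to C3.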
III. TECHNICAL SOLUTIONS TO OVERCOMING THE LIMITING CONDITIONS IN PHOTOACOUSTIC TOMOGRAPHY
A. Limited-view conditions
Various approaches have been tried to avoid or compensate for limited-view effects.52,53,76,110–115 A popular hardware solution is to use specialized transducer arrays, such as arc, ring, or hemisphere shapes.52,53,76,114 These shapes are well suited to receiving omnidirectional PA waves because such arrangements can enlarge the aperture angle for tomographic signal reception and let each US transducer element receive normally incident signals. Typically, these sensors require a relatively large number of elements (>256) to maintain a pitch small enough to avoid grating lobe artifacts, and thus they require expensive high-performance data acquisition platforms. As an alternative, carefully positioned acoustic reflectors can be used to redirect the out-of-view signals toward the transducer, which virtually extends the detection view.111,116 As in Fig. 6(a), a reverberant cavity can be used to acquire multiple reflections of the PA signal that propagates outside of the aperture. When the reflected data are incorporated into the image reconstruction, blurred targets can be much more clearly distinguished. This design concept is appropriate only for targets that can be enclosed within the cavity, a requirement that may limit the application to in vitro or ex vivo samples. In any case, acoustic reflectors must be designed and manufactured with high accuracy and a judicious consideration of the imaging application. While the former approaches exploit signal reception schemes, several studies have used modulated light transmission to suppress the limited-view artifacts.110,117 Recently, a new approach using Hadamard-modulated light illumination was proposed.110 A series of Hadamard illumination patterns are used to measure the PA responses and estimate the pattern that best focuses the light energy onto each pixel location through the scattering medium, which is referred to as the PA transmission matrix (PA-TM).
The eigenvalue of the PA-TM for each pixel represents the extent of the PA signal response of that pixel and thus can be used to distinguish each PA signal source from the limited-view artifacts. However, this approach may not be applicable for in vivo situations where the propagation medium conditions change with metabolic activities. Another approach approximates the out-of-view PA signals using the Gerchberg–Papoulis method.115 This method was originally used to extrapolate a truncated function and recover the original function having a known bandwidth or to restore the blurred or noisy parts of a given image,118,119 and it is reported to show great potential for limited-view PA imaging. In principle, the method is performed in five steps: (1) transform the limited-view image into the frequency domain via FFT, (2) map the frequency components from the ($k_x$, $k_z$) domain back to the ($k_x$, $\omega$) domain via the interpolation relationship described in the Fourier-domain reconstruction,95 (3) estimate the data from the missing detector elements in the frequency domain, (4) use the data to recover the missing view in the image, and (5) apply compensation to the limited-view image from the previous reconstruction. This process is performed iteratively until it converges. Its stability is further improved by combining the process with total variation (TV) optimization. As in Fig. 6(b), the proposed method (denoted as GPEF, Gerchberg–Papoulis extrapolation using FFT) provided decent reconstruction performance in the limited-view condition for both circular and linear scans. Even with a coverage angle as small as 60°, the GPEF method successfully recovered the simulated target. As a practical demonstration, the method was applied to linearly scanned phantom data, which showed its superiority over other methods (gradient descent or variable splitting) for solving the TV optimization.
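The core idea of Gerchberg–Papoulis extrapolation can be illustrated on a one-dimensional band-limited signal with missing samples. The Python sketch below is the classic alternating-projection form of the algorithm, not the GPEF method itself: it works on a 1D signal instead of 2D PA data, uses a plain discrete Fourier transform written out in pure Python, and omits the TV regularization. The signal, band limit, and number of known samples are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_limit(x, max_bin):
    """Zero every DFT bin above max_bin (both positive and negative frequencies)."""
    spectrum = dft(x)
    n = len(spectrum)
    kept = [v if min(k, n - k) <= max_bin else 0.0 for k, v in enumerate(spectrum)]
    return [v.real for v in idft(kept)]


def gerchberg_papoulis(measured, known, max_bin, n_iter):
    """Alternate between the band-limit constraint and the known samples."""
    estimate = [m if k else 0.0 for m, k in zip(measured, known)]
    for _ in range(n_iter):
        estimate = band_limit(estimate, max_bin)
        estimate = [m if k else e for m, k, e in zip(measured, known, estimate)]
    return estimate


def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


n = 64
true_signal = [math.cos(2 * math.pi * t / n) + 0.5 * math.sin(2 * math.pi * 3 * t / n)
               for t in range(n)]
known = [t < 48 for t in range(n)]                 # the last 16 samples are "out of view"
measured = [v if k else 0.0 for v, k in zip(true_signal, known)]

error_before = rms_error(measured, true_signal)
restored = gerchberg_papoulis(measured, known, max_bin=3, n_iter=50)
error_after = rms_error(restored, true_signal)
print(error_before, error_after)
```

Each pass enforces what is known in one domain (the band limit) and then restores what is known in the other (the measured samples), so the estimate of the missing part improves from iteration to iteration.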
Capitalizing on great advances in deep learning, several studies have focused on solving the limited-view problem by training neural networks.112,113 In the work of Hauptmann et al.,113 the network was trained to learn the update function in the iterative image reconstruction step. Davoudi et al.112 proposed a neural network for post-processing the limited-view images by training the network with the full-view tomographic images. While the performance of the deep learning approaches may vary with the quality of the training data set, noticeable quality enhancements as well as accelerated image reconstruction have been reported.
B. Limited bandwidth conditions
A wideband US sensor is the most basic remedy for a limited bandwidth. Conventional piezoelectric US sensors generally have a limited bandwidth centered at high frequencies (e.g., over 5 MHz), which is actually useful for obtaining high-resolution US images. However, they are not desirable for detecting PA signals originating from targets of various sizes, where large targets generally emit low-frequency signals (down to about a few hundred kHz). As one alternative, capacitive micromachined ultrasonic transducers (CMUTs) have gained attention for PA imaging because of their wide signal detection bandwidth and their ease of fabrication compared to piezoelectric materials.120–123 Due to these characteristics, CMUTs are especially advantageous for PAT applications that require a large number of sensor elements. Still, CMUTs may not sufficiently detect frequency components down to direct current (DC), which are necessary to describe large targets. Another promising technique for wideband PA signal detection is all-optical US sensing.75,124–128 All-optical detection of US waves uses the principle of the Fabry–Pérot interferometer, which detects acoustically induced changes in the thickness of a Fabry–Pérot cavity using laser beams reflected from optical interferometer mirrors on two surfaces of the cavity. One representative example of an all-optical detector design is illustrated in Fig. 7(a).128 This detector architecture delivers the excitation laser pulses through the transparent Fabry–Pérot sensor head and detects the PA signals with an interrogation beam that scans the target ROI, and it is capable of receiving extremely broadband frequency components from DC to 25 MHz.
Besides hardware approaches, image processing methods using PA signal fluctuation analysis have been reported.117,129,130 These methods utilize multiple PA image frames with varying pixel intensities, obtained either with speckle light illumination or from targets containing light absorbers moving in a flow. By calculating a high-order cumulant (e.g., the variance, which corresponds to the second-order cumulant) of a series of PA images, the hollow part of the image caused by the limited bandwidth can be recovered.129,130 Fundamentally, the procedure enables incoherent summation of the PA waves that are destructively cancelled out in the original image. At the same time, it can mitigate the limited-view effect, like the Hadamard-modulated illumination method described in the previous section.
Deep learning can also play a role in providing good estimates of the PA signals lost in limited bandwidth scenarios.131 Figure 7(b) shows a demonstration of a deep neural network that is primarily trained with simulated PA signal pairs of full bandwidth and limited bandwidth. The results suggest that the images reconstructed from the estimated signals can approach the actual target as well as the full bandwidth images.
C. Lossy medium conditions
Most of the studies on overcoming a lossy medium have focused on compensating the optical fluence, probably because the acoustic attenuation is broadly assumed to be negligible compared to the optical attenuation.51,132–140 The simplest method to approximate the optical fluence distribution would be to use an exponentially decaying function following Beer's law. Yet, this method cannot accurately describe the effect of scattering on the optical fluence. The radiative transfer equation (RTE) is the most exact analytical model of photon transport in biological tissue and is derived from energy conservation within an arbitrary volume. Because the RTE is difficult to solve analytically, the diffusion equation is used as an approximation under the assumption of sufficiently scattered propagation in a high-albedo medium ($\mu_a \ll \mu_s(1 - g)$). Such a model can be expressed in a closed-form solution that is further simplified in deep tissue into an exponential function using the effective attenuation coefficient $\mu_{\mathrm{eff}}$, but it remains inaccurate in the near field, where the high-albedo assumption does not hold. Monte Carlo light transport simulation is a numerical method that is considered to be equivalent to the RTE and can be performed under a simplified modeling of the tissue structure to estimate the light fluence.135,136 The research illustrated in Fig. 8(a) demonstrates the image enhancement from Monte Carlo fluence estimation compared to exponential modeling of the fluence. While the exponential function can well approximate the fluence in deep tissue (>1 mm depth), it is not applicable for shallow tissue (<1 mm depth) where the light scattering is minimal. Thus, the Monte Carlo simulation can be advantageous especially for applications that image regions near the skin, as in Fig. 8(a).
However, both fluence modeling methods must be built on typical values for optical extinction coefficients, which are only imprecisely known, or even unknown, in practical situations. As an alternative, several researchers have tried to estimate fluence indirectly from the pixel intensity distribution of the 3D PA image.51,137 Rather than approximating the fluence distribution, the work illustrated in Fig. 8(b) aimed to balance the pixel intensity distribution by assuming that the intensities were dependent only on the light fluence.137 The compensation process has two steps: (1) compensate the non-uniform illumination pattern on the skin and (2) compensate the depthwise optical attenuation. The results have shown that the depthwise pixel intensity distribution resembles the optical attenuation modeled by Beer's law, so this method can be regarded as reasonable. Figure 8(b) shows that a compensated image of the human breast can enhance the visualization of the underlying deep vasculature. However, this approach assumes that the target shape is close to a sphere, which is applicable almost only to the breast. Also, it is difficult to apply to body parts that contain bone structures. In a similar manner, the mean background PA signal intensity at each depth has been used to estimate and compensate the depthwise fluence distribution at each wavelength, and this approach has proved effective in accurately calculating hemoglobin oxygen saturation in deep tissues.51
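The depthwise compensation idea described above can be sketched as a log-linear fit to the mean background intensity followed by an exponential gain. The Python example below is a hypothetical illustration, not the procedure of the cited works: it assumes a purely exponential depth dependence, uses synthetic noise-free data, and the function names and the decay value of 1.74 cm$^{-1}$ are chosen only for demonstration.

```python
import math


def fit_decay_coefficient(depths_cm, mean_intensity):
    """Least-squares slope of log(intensity) versus depth, returned as a decay rate."""
    logs = [math.log(v) for v in mean_intensity]
    n = len(depths_cm)
    mean_z = sum(depths_cm) / n
    mean_log = sum(logs) / n
    numerator = sum((z - mean_z) * (l - mean_log) for z, l in zip(depths_cm, logs))
    denominator = sum((z - mean_z) ** 2 for z in depths_cm)
    return -numerator / denominator


def compensate_rows(image_rows, depths_cm, decay):
    """Multiply each depth row by exp(decay * z) to undo the fitted attenuation."""
    return [[v * math.exp(decay * z) for v in row]
            for row, z in zip(image_rows, depths_cm)]


# Synthetic example: uniform absorbers seen through exponentially decaying fluence.
depths_cm = [0.1 * i for i in range(1, 21)]
image_rows = [[math.exp(-1.74 * z)] * 4 for z in depths_cm]
background_mean = [sum(row) / len(row) for row in image_rows]

fitted_decay = fit_decay_coefficient(depths_cm, background_mean)
compensated = compensate_rows(image_rows, depths_cm, fitted_decay)
print(fitted_decay, compensated[0][0], compensated[-1][0])
```

On real data the fit would be disturbed by noise, non-uniform illumination, and structures that are not background, which is why the cited methods add further steps and assumptions.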
Figure 8(c) shows a novel solution that uses acousto-optic tomography (AOT) to measure the fluence map experimentally.133,134 AOT is a hybrid imaging technique that uses ultrasonic tagging of the light inside the tissue. This tagging induces changes in the optical speckle contrast, the extent of which can be used to measure the light fluence at that location. The phantom experiment demonstrates that compensating the image with the AOT-measured fluence map corrects the PA intensity values of identical absorbers so that they become uniform. Applying this method to multi-wavelength PA imaging can greatly enhance the accuracy and reliability of spectral unmixing.
One representative study on compensating for acoustic attenuation was reported by Treeby.138 While the effect of acoustic attenuation is mostly regarded as negligible, its actual mechanism is complex because it is frequency dependent and nonlinear. This work proposed time-variant filtering of the PA signals to correct both the acoustic attenuation and dispersion, following a frequency power law. The filter was applied to time-domain PA signals acquired from a pregnant mouse, and the result showed noticeable contrast enhancement at imaging depths greater than 2 mm.
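To give a sense of scale for the frequency power law mentioned above, the sketch below evaluates the loss α(f) = α0·f^y over a propagation distance and the gain that would undo it. It is only an illustration: it does not implement the time-variant filter or the dispersion correction of the cited work, and the default α0 = 0.5 dB/(MHz^y·cm), y = 1, and the 40 dB gain cap are assumed, illustrative values.

```python
import numpy as np

def attenuation_db(freq_mhz, depth_cm, alpha0=0.5, y=1.0):
    """One-way power-law loss in dB: alpha0 * f^y * distance."""
    return alpha0 * np.power(freq_mhz, y) * depth_cm

def amplitude_factor(freq_mhz, depth_cm, alpha0=0.5, y=1.0):
    """Fraction of the pressure amplitude that survives the propagation."""
    return 10.0 ** (-attenuation_db(freq_mhz, depth_cm, alpha0, y) / 20.0)

def compensation_gain(freq_mhz, depth_cm, alpha0=0.5, y=1.0, max_gain_db=40.0):
    """Inverse of the loss, capped so that noise is not amplified without bound."""
    gain_db = np.minimum(attenuation_db(freq_mhz, depth_cm, alpha0, y), max_gain_db)
    return 10.0 ** (gain_db / 20.0)

freqs_mhz = np.array([1.0, 5.0, 10.0, 20.0])
for depth_cm in (0.5, 2.0):
    print(depth_cm, "cm:", amplitude_factor(freqs_mhz, depth_cm))
```

The loss grows with both frequency and depth, so the high-frequency content of deep signals is attenuated most, and an uncapped inverse gain would mostly amplify noise there. This is the reason a practical correction needs some form of regularization.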
D. Acoustically heterogeneous medium conditions
The heterogeneity of SoS directly degrades the quality of image reconstruction, but it is difficult to estimate and compensate for.73,141,142,144–146 Even in conventional US imaging, fatty tissues can hamper the visualization of structures beneath them. Nevertheless, most image reconstruction algorithms assume a uniform speed of sound, so their quality depends heavily on the actual tissue composition of the imaged target. Compensating for the time of arrival is even more difficult because the acoustic wavefront is refracted as it passes between two different tissue components. Still, given the exact distribution of the speed of sound, it is possible to track the wave propagation and recover the actual source profile by using the multi-stencils fast marching (MSFM) method.141 MSFM provides a stepwise estimate of the wave propagation, assuming that the wave component at each node propagates along the fastest path. The calculated path delays can be used in time-domain image reconstruction algorithms such as DAS. As a result, signals that are unfocused in the conventional image can be tightly focused to resemble the actual source profile [Fig. 9(a)]. Drawbacks of this approach include the need for prior knowledge of the speed of sound and the complex pre-processing required to calculate the time delays, which limits real-time image reconstruction. To help resolve this issue, US computed tomography (USCT) can be integrated with PA computed tomography (PACT) to provide comprehensive acoustic information about the imaged target.73,142 Figure 9(b) presents various optical and acoustic features of a mouse, obtained from simultaneous PACT, reflection USCT, and transmission USCT measurements.73 Transmission USCT in particular can provide the SoS and acoustic attenuation maps of the target, which encourages the practical application of MSFM reconstruction.
While an SoS map can be obtained solely from USCT signals, a joint reconstruction of both the PA initial pressure and SoS maps using the combined PACT and USCT signals can enhance the accuracy of both measurements.142 Deep learning can also be applied to compensate for the phase aberration of photoacoustic signals. Jeon et al. trained a network to reconstruct PA images with the optimized speed of sound for each target and showed that it can effectively recover the target in heterogeneous media while minimizing sidelobes and noisy background signals.143
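The idea of computing first-arrival delays through a known SoS map can be illustrated with a much simpler stand-in for MSFM. The sketch below uses Dijkstra's shortest-path algorithm on an 8-connected pixel grid, which shares the fastest-path assumption but is not the multi-stencils fast marching method and overestimates travel times along directions other than the axes and diagonals. The function name, grid spacing, and SoS values are illustrative assumptions.

```python
import heapq
import math
import numpy as np

def travel_time_map(sos, dx, source):
    """First-arrival travel time [s] from `source` to every pixel.

    sos: 2D speed-of-sound map [m/s]; dx: pixel size [m];
    source: (row, col) tuple. Edge cost is the step length times the mean
    slowness of the two pixels it connects.
    """
    sos = np.asarray(sos, dtype=float)
    slowness = 1.0 / sos
    times = np.full(sos.shape, np.inf)
    times[source] = 0.0
    heap = [(0.0, source)]
    steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > times[r, c]:
            continue  # stale queue entry, a faster path was already found
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < sos.shape[0] and 0 <= nc < sos.shape[1]:
                dist = dx * math.hypot(dr, dc)
                nt = t + dist * 0.5 * (slowness[r, c] + slowness[nr, nc])
                if nt < times[nr, nc]:
                    times[nr, nc] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return times

# Assumed example: water-like background with a slower, fat-like inclusion.
sos_map = np.full((41, 41), 1500.0)
sos_map[15:25, 15:25] = 1450.0
delays = travel_time_map(sos_map, dx=1e-4, source=(20, 0))
print("delay to the opposite edge [us]:", delays[20, 40] * 1e6)
```

In a delay-and-sum reconstruction, one such map per detector element would replace the usual distance divided by a constant speed of sound when selecting the sample that each pixel contributes. Computing a map per element is also the pre-processing cost that the text identifies as an obstacle to real-time use.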
IV. CONCLUSIONS
In this article, we discussed various conditions that limit PAT's performance in realistic scenarios. Through simulation, we showed reconstructed images under four cumulatively added limiting conditions and evaluated image quality with two quantitative metrics, SSIM and MSE. For each limiting condition, we summarized research on potential solutions. All limiting factors and their corresponding solutions for practical PAT are summarized in Table I.
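For readers who wish to reproduce this kind of evaluation, the two metrics can be computed as in the sketch below. It is a simplified illustration and not necessarily the implementation used for the reported numbers: SSIM is evaluated once over the whole image rather than in local windows, and the constants follow the common choice of K1 = 0.01 and K2 = 0.03 with an assumed data range of 1.

```python
import numpy as np

def mse(reference, image):
    """Mean squared error between a reference image and a reconstruction."""
    ref = np.asarray(reference, dtype=float)
    img = np.asarray(image, dtype=float)
    return float(np.mean((ref - img) ** 2))

def global_ssim(reference, image, data_range=1.0):
    """Single-window SSIM computed from whole-image statistics."""
    ref = np.asarray(reference, dtype=float)
    img = np.asarray(image, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), img.mean()
    var_x, var_y = ref.var(), img.var()
    cov_xy = np.mean((ref - mu_x) * (img - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Synthetic example: a ground-truth ramp and a reconstruction at half amplitude.
truth = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
recon = 0.5 * truth
print("MSE:", mse(truth, recon), "SSIM:", global_ssim(truth, recon))
```

MSE penalizes any amplitude error directly, whereas SSIM separates luminance, contrast, and structure, so the two metrics can rank reconstructions differently when a limiting condition mainly rescales the signal.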
Limiting factor | Hardware limitations: Limited view | Hardware limitations: Limited bandwidth | Medium limitations: Lossy medium | Medium limitations: Heterogeneous medium |
---|---|---|---|---|
Definition | Incapability of detecting signals outside of the ultrasound transducer aperture | Incapability of detecting signals out of the frequency range of the sensor | Optical/acoustic attenuation of the signals during propagation | Distortion of wavefronts due to non-uniform speed of sound in the medium |
Effects | | Blurring of the boundaries or emptying of the inner parts | Diminishing signal intensity in deep regions | Distortion of the target morphology |
Potential solutions | | | | |
The field of PAT has grown fast, and PAT is now nearing clinical deployment, where reliability and reproducibility become crucial. At this stage, we believe that the great advances in mainstream techniques must be accompanied by proper compensation for these limitations and thorough validation of the resulting accuracy. Efforts to overcome the effects of non-ideal conditions should be encouraged in order to firmly establish PAT as a new medical imaging modality.
AUTHORS’ CONTRIBUTIONS
W.C. and D.O. contributed equally to this work.
ACKNOWLEDGMENTS
This work was supported by the Ministry of Science and ICT's National Research Foundation under Grant No. NRF-2019R1A2C2006269, Republic of Korea, and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant No.: HI15C1817). Chulhong Kim has financial interests in OPTICHO; however, it did not support this research.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.