The implicit representation by physics-informed neural networks (PINNs) serves as an effective solution to a key challenge faced by optical sound measurements. Since optical sound measurements observe the line integral of the sound pressure along the optical path, reconstruction is necessary to determine the sound pressure at each point in the three-dimensional field. In this paper, we extend the PINNs-based reconstruction method to three-dimensional reconstruction and demonstrate its effectiveness for optically measured sound fields. Furthermore, we propose a reconstruction approach that can estimate solutions well outside the bounds of the data used for training.
1. Introduction
Partial differential equations (PDEs) are fundamental tools for describing various physical phenomena in science and engineering, including sound propagation. Physics-informed neural networks (PINNs) (Raissi et al., 2019, 2017b; Raissi, 2018; Raissi et al., 2017a) have emerged as a powerful framework for solving PDEs. In PINNs, the PDEs are incorporated into the loss function to impose physical constraints directly on the network. Since this provides the network with systematic knowledge of the physics, the solution can be estimated well even from small amounts of data. Furthermore, PINNs have shown their capability to handle noisy experimental data and multi-physics problems (Raissi, 2018; Raissi et al., 2017a). One notable advantage of PINNs is that they can estimate solutions well outside the bounds of the data used for training (Raissi et al., 2017a).
Sinusoidal representation networks (SIREN), introduced by Sitzmann et al. (2020), brought about a significant advancement in this field. SIREN can represent derivatives of arbitrary order by using periodic sine activation functions, allowing detailed modeling of harmonic signals. This precise representation of derivatives demonstrates the potential of implicit neural representations as a tool for solving inverse problems of PDEs.
PINNs-based implicit neural representation serves as an effective solution to a key challenge faced by optical sound measurement (Ito et al., 2024). Optical sound measurement methods (Rosell et al., 2012; Løkberg, 1994) have been actively studied as an alternative to microphones because they enable contactless sound measurement. The main advantage of contactless acoustic measurement is that it eliminates the measurement difficulties caused by sound reflection and diffraction from the microphone itself in the sound field. In addition, it enables acoustic measurements in small areas or in the immediate vicinity of a sound source, which are impossible with a microphone. Despite these advantages, optical sound measurement faces certain challenges that must be addressed for practical implementation. One of them is that the observed quantity is a projection of the sound field, i.e., the line integral of the sound pressure along the measurement laser path. Due to this line-integral nature, the sound pressure at each point in three-dimensional space cannot be obtained directly. Therefore, a reconstruction is necessary to obtain the sound pressure at each point in space from the integral observation.
To address this challenge, we have recently introduced a two-dimensional sound field reconstruction method using SIREN (Ito et al., 2024). We derived an integral loss between the sound field projection and the line integral of the neural field and evaluated the method on synthetic data. In this paper, we extend this approach to three-dimensional sound field reconstruction and apply it to real data for the first time. The results of numerical experiments demonstrate that the proposed method achieves a high-fidelity reconstruction of the three-dimensional sound field that is visually indistinguishable from the original sound field. In addition, we propose a reconstruction method that can estimate solutions well outside the training region by extending the physical constraints beyond the measurement region (see Fig. 1).
2. Backgrounds
2.1 Optical sound measurement
2.2 Reconstruction of sound field
As mentioned above, since optical sound measurements observe the line integral of the sound pressure along the optical path, reconstruction is necessary to determine the sound pressure at each point in the three-dimensional field. The most common reconstruction technique is computed tomography (CT), based on the Radon transform (Kak and Slaney, 2001; Løkberg et al., 1995). However, its accuracy decreases when applied to sound fields due to discrepancies in the underlying assumptions. Specifically, filtered back projection (FBP), commonly used in CT, assumes that the sound pressure is zero outside the measurement region. This assumption rarely holds because sound tends to spread over a wide area. Additionally, since FBP does not account for the physical properties of sound, it may not be optimal for sound fields.
Recently, physical model-based reconstruction methods based on the Helmholtz equation have been proposed (Ishikawa et al., 2021; Verburg and Fernandez-Grande, 2021). These methods achieve better accuracy than FBP because they assume that the observed data are obtained from a sound field that satisfies the physical properties of sound. A typical example of these methods is the plane wave expansion (PWE) (Verburg and Fernandez-Grande, 2021). PWE adequately reconstructs sound fields from a sparse set of optical measurements by expanding the measured data on a basis of plane waves. The physical model-based reconstruction method using spherical harmonic expansion (SHE) (Nozawa et al., 2024) is a three-dimensional sound field reconstruction technique that addresses the exterior problem. In SHE, spherical harmonics serve as a basis for expressing sound fields defined in spherical coordinates, enabling the representation of sound fields with various directivities through the superposition of basis functions with different degrees and orders.
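The line-integral observation described at the start of this subsection can be illustrated with a minimal sketch. It assumes a free-field monopole of the form exp(-jkr)/(4πr), a speed of sound of 343 m/s, and a midpoint-rule quadrature; the function names `point_source` and `line_integral` and the ray geometry are hypothetical and are not taken from the measurement setup of this paper.

```python
import cmath
import math


def point_source(x, y, z, k, src=(0.0, 0.0, 0.0)):
    """Free-field monopole exp(-jkr)/(4*pi*r); the sign convention is an assumption."""
    r = math.dist((x, y, z), src)
    return cmath.exp(-1j * k * r) / (4.0 * math.pi * r)


def line_integral(field, start, end, n=2000):
    """Midpoint-rule integral of field(x, y, z) along the straight segment start -> end."""
    step = math.dist(start, end) / n
    total = 0j
    for i in range(n):
        t = (i + 0.5) / n  # midpoint of the i-th sub-interval
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        z = start[2] + t * (end[2] - start[2])
        total += field(x, y, z)
    return total * step


# Wavenumber at 5000 Hz, assuming c = 343 m/s.
k = 2.0 * math.pi * 5000.0 / 343.0

# One ray parallel to the y axis that passes 5 cm from the source.
projection = line_integral(
    lambda x, y, z: point_source(x, y, z, k),
    start=(0.05, -0.1, 0.0),
    end=(0.05, 0.1, 0.0),
)
print(abs(projection))
```

A single complex number is obtained per ray, which is why the pressure at individual points along the ray cannot be read off directly and a reconstruction step is required.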
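For reference, the physical model underlying these methods can be written compactly. The notation below ($a_n$, $\mathbf{k}_n$, $N$, and the $e^{j\mathbf{k}\cdot\mathbf{r}}$ sign convention) is illustrative and may differ from that of the cited works.

```latex
% Homogeneous Helmholtz equation for the complex sound pressure p at angular frequency \omega
\left( \nabla^2 + k^2 \right) p(\mathbf{r}) = 0, \qquad k = \frac{\omega}{c}

% Plane wave expansion with N basis waves (a_n, \mathbf{k}_n, N are illustrative notation)
p(\mathbf{r}) \approx \sum_{n=1}^{N} a_n \, e^{j \mathbf{k}_n \cdot \mathbf{r}}, \qquad \lVert \mathbf{k}_n \rVert = k

% Each basis function satisfies the Helmholtz equation term by term
\nabla^2 e^{j \mathbf{k}_n \cdot \mathbf{r}} = -\lVert \mathbf{k}_n \rVert^2 \, e^{j \mathbf{k}_n \cdot \mathbf{r}} = -k^2 \, e^{j \mathbf{k}_n \cdot \mathbf{r}}
```

Because every basis function already satisfies the Helmholtz equation, any field expressed in such a basis is physically admissible by construction, which is the property FBP lacks.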
3. Sound field reconstruction by neural network
3.1 Implicit neural representation by SIREN
3.2 Proposed method
We have recently introduced a two-dimensional sound field reconstruction method using SIREN (Ito et al., 2024). However, it cannot be directly applied to optically measured projections because it is constrained by the two-dimensional Helmholtz equation and its input coordinates are also two-dimensional. Therefore, to apply the SIREN-based model to optically measured data, it is necessary to evaluate the three-dimensional Helmholtz equation on the model output.
Furthermore, we propose a reconstruction approach that can estimate solutions well outside the bounds of the data used for training. By extending the domain of the input coordinates by a factor of 1.5, we input coordinates outside the training data region into the network and expanded the domain over which the physical constraint is computed. Outside the region covered by training data, the data-dependent loss term was set to zero. Although the measurement region of optical sound measurement is limited, this extrapolation capability makes it possible to estimate the sound pressure distribution over a wider area.
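To make the three-dimensional Helmholtz constraint concrete, the following sketch evaluates the residual of the equation (Laplacian plus k² times the pressure) at a single point. It is not the implementation used in this paper: a PINN would normally obtain the derivatives by automatic differentiation of the network, whereas this sketch swaps in a central finite-difference stencil so that it runs with the standard library alone. The function names, the step size `h`, and the assumed speed of sound of 343 m/s are hypothetical.

```python
import cmath
import math


def helmholtz_residual(field, point, k, h=1e-4):
    """Central finite-difference estimate of (laplacian + k^2) p at `point`."""
    x, y, z = point
    center = field(x, y, z)
    neighbours = (
        field(x + h, y, z) + field(x - h, y, z)
        + field(x, y + h, z) + field(x, y - h, z)
        + field(x, y, z + h) + field(x, y, z - h)
    )
    laplacian = (neighbours - 6.0 * center) / (h * h)
    return laplacian + k * k * center


# Wavenumber at 5000 Hz, assuming c = 343 m/s.
k = 2.0 * math.pi * 5000.0 / 343.0


def plane_wave(x, y, z):
    """Plane wave along (1, 1, 1)/sqrt(3) with wavenumber k: a valid Helmholtz solution."""
    d = 1.0 / math.sqrt(3.0)
    return cmath.exp(1j * k * d * (x + y + z))


def non_physical(x, y, z):
    """Wave with wavenumber 2k: does not satisfy the Helmholtz equation for k."""
    return cmath.exp(1j * 2.0 * k * x)


res_ok = abs(helmholtz_residual(plane_wave, (0.01, 0.02, 0.03), k))
res_bad = abs(helmholtz_residual(non_physical, (0.01, 0.02, 0.03), k))
print(res_ok, res_bad)
```

The residual is close to zero for the physically valid field and on the order of k² for the invalid one, which is the quantity a physics loss would penalize at sampled three-dimensional coordinates.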
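The domain extension can be sketched as follows, under several assumptions: the measurement region is treated as an axis-aligned box, collocation points for the physical constraint are drawn uniformly from the box enlarged by a factor of 1.5 about its centre, and a pointwise mask stands in for the restriction of the data term to the measured region (in the actual method the data are line integrals, not point values). The box size and all function names are hypothetical.

```python
import random


def extend_bounds(bounds, factor=1.5):
    """Scale an axis-aligned box about its centre by `factor`."""
    extended = []
    for lo, hi in bounds:
        centre = 0.5 * (lo + hi)
        half = 0.5 * (hi - lo) * factor
        extended.append((centre - half, centre + half))
    return extended


def inside(point, bounds):
    """True if every coordinate of `point` lies within the corresponding interval."""
    return all(lo <= c <= hi for c, (lo, hi) in zip(point, bounds))


def sample_collocation(bounds, n, rng):
    """Draw n points uniformly from the box described by `bounds`."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


measured = [(-0.05, 0.05)] * 3         # hypothetical measurement box in metres
physics_box = extend_bounds(measured)  # region where the PDE residual is evaluated
rng = random.Random(0)
points = sample_collocation(physics_box, 1000, rng)

# The data term contributes only where measurements exist; elsewhere its weight is zero.
data_mask = [1.0 if inside(p, measured) else 0.0 for p in points]
print(sum(data_mask) / len(points))
```

With a factor of 1.5 per axis, only about (1/1.5)³ ≈ 30% of the sampled volume is supported by data; the remaining points are constrained by the physical model alone.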
4. Experiment
We conducted experiments using numerical simulations and optically measured projection data to evaluate the proposed method.
4.1 Implementation detail
The proposed method used a SIREN architecture consisting of a five-layer fully connected neural network with a hidden layer size of 256. The Adam optimizer with a learning rate of was used. The batch size and weighting coefficient were determined independently for each experiment: the batch size was set to fill the GPU memory (a single NVIDIA GeForce RTX 4090), and the weighting coefficient was adjusted to maintain an appropriate balance between the two loss terms. Specifically, we found that, in the early part of training, keeping the ratio of the two loss terms close to 1:4 yields satisfactory results. Therefore, we trained the model for 10 000 epochs and then adjusted the weighting coefficient so that the ratio became close to 1:4. Typically, we repeated this process two or three times to determine an appropriate value for the weighting coefficient.
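As an illustration of the architecture only, a forward pass of a SIREN-style network can be sketched in plain Python. Several choices here are assumptions rather than details confirmed by this paper: "five layers" is read as five linear layers in total (four sine-activated hidden layers plus a linear output), the output is taken to be two values (real and imaginary parts of the pressure), the frequency factor is ω₀ = 30 with the weight initialization of Sitzmann et al. (2020), and biases are set to zero for simplicity. The function names are hypothetical.

```python
import math
import random


def init_siren(in_dim, hidden, n_hidden, out_dim, omega0=30.0, seed=0):
    """Create weights for n_hidden sine layers plus one linear output layer."""
    rng = random.Random(seed)
    dims = [in_dim] + [hidden] * n_hidden + [out_dim]
    layers = []
    for i, (fan_in, fan_out) in enumerate(zip(dims[:-1], dims[1:])):
        # First layer: U(-1/fan_in, 1/fan_in); later layers: U(+-sqrt(6/fan_in)/omega0).
        bound = 1.0 / fan_in if i == 0 else math.sqrt(6.0 / fan_in) / omega0
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out  # simplification: zero biases
        layers.append((weights, biases))
    return layers


def siren_forward(layers, coords, omega0=30.0):
    """Map input coordinates to outputs; sin(omega0 * (Wx + b)) on all but the last layer."""
    h = list(coords)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        h = z if is_last else [math.sin(omega0 * v) for v in z]
    return h


# 3D coordinates in, hidden size 256, assumed (real, imaginary) pressure out.
layers = init_siren(in_dim=3, hidden=256, n_hidden=4, out_dim=2)
pressure = siren_forward(layers, (0.01, 0.02, 0.03))
print(pressure)
```

Because every hidden activation is a sine, all derivatives of the output with respect to the input coordinates are again compositions of sines and cosines, which is what makes second-order PDE constraints such as the Helmholtz equation well behaved for this architecture.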
4.2 Numerical simulations
Initially, we performed a series of numerical simulation experiments. Figures 2(a) and 2(b) show the sound field generated by a point source and its projection data, both numerically generated for the experiments. The sampling points were configured to generate projection data with dimensions of . The rotation angle was sampled at 30° intervals. The networks were trained for epochs. The proposed method was compared with SHE. Since the sound field is that of a simple point source, SHE can perfectly restore the sound field in the noise-free case if the expansion origin coincides with the position of the point source. To avoid this situation, we shifted the expansion origin of SHE by half a wavelength from the position of the point source. This shift reasonably reflects the situation encountered when applying SHE in real-world applications: in practice, it is difficult to align the expansion origin precisely with the true acoustic center of the source, and some degree of mismatch is expected.
Fig. 2. (a) Reference sound field used for the numerical experiment, (b) its projection, and (c) comparison of the real parts of the reconstructed fields and reconstruction errors from noisy data at different SNR levels. In this example, the point source emitted a 5000 Hz sinusoidal wave. The objective of this research is to reconstruct the original sound field using projection data acquired at multiple rotation angles. In (c), the reconstruction from noise-free data is shown at the top. Note that the error maps are displayed using different color ranges.
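To give a sense of scale for the half-wavelength shift of the expansion origin, the shift can be computed from the frequency. The speed of sound of 343 m/s is an assumption (the value used in the experiments is not stated here), and `half_wavelength` is a hypothetical helper.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def half_wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Half of the acoustic wavelength c / f, in metres."""
    return 0.5 * c / frequency_hz


for f in (2500.0, 5000.0, 7500.0):
    print(f"{f:.0f} Hz: shift = {half_wavelength(f) * 1e3:.1f} mm")
```

Under this assumption, the shift at 5000 Hz is about 34 mm, which is a substantial misalignment relative to the wavelength.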
We tested the reconstruction methods using projection data with added Gaussian noise because noise contamination is inevitable in optically measured sound field images. The results for the sound field at different signal-to-noise ratios (SNRs) are summarized in Fig. 2(c). In noise-free conditions, both SHE and the proposed method achieved such high-fidelity reconstruction that there was almost no noticeable visual difference from the original sound field. In addition, both methods demonstrated robustness to noise, achieving reconstruction accuracy that showed no visible degradation compared to the noise-free conditions. This behavior can be attributed to the fact that both methods eliminate high-frequency random noise through their physical constraints.
To assess the accuracy of the reconstructions, we calculated the normalized mean squared error (NMSE) with respect to the original sound field. The results are summarized in Table 1. Under all conditions, the proposed method exhibits a higher NMSE than SHE. In noise-free conditions, SHE exhibits superior performance, which, as previously mentioned, is likely because both the sound field generation and SHE follow the same physical model. As shown in Fig. 2, the reconstruction error of the proposed method is nevertheless sufficiently small. Moreover, for noisy data, the differences between SHE and the proposed method diminish, demonstrating that the proposed method remains effective in the presence of noise.
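A minimal sketch of how noise at a prescribed SNR can be added to projection data is given below. It assumes complex-valued data, circular white Gaussian noise with its power split equally between the real and imaginary parts, and an SNR defined as the ratio of mean signal power to mean noise power; the exact noise model used in the experiments is not specified here, and the function names and the synthetic test signal are hypothetical.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng):
    """Add complex white Gaussian noise so that signal power / noise power = 10^(snr_db/10)."""
    signal_power = sum(abs(s) ** 2 for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per real/imaginary part
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in samples]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal = sum(abs(c) ** 2 for c in clean)
    noise = sum(abs(n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal / noise)


rng = random.Random(0)
clean = [complex(math.cos(0.01 * i), math.sin(0.01 * i)) for i in range(20000)]
noisy = add_gaussian_noise(clean, snr_db=6.0, rng=rng)
print(measured_snr_db(clean, noisy))
```

At 3 dB the noise power is about half the signal power, so the noisiest condition considered here is far from a mild perturbation.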
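For clarity, one common definition of the NMSE is sketched below: the summed squared error divided by the summed squared magnitude of the reference, optionally expressed in decibels. Whether the values reported in this paper use exactly this normalization or a decibel scale is an assumption; the sample values are made up for illustration.

```python
import math


def nmse(reference, estimate):
    """Sum of squared errors normalized by the energy of the reference field."""
    num = sum(abs(e - r) ** 2 for r, e in zip(reference, estimate))
    den = sum(abs(r) ** 2 for r in reference)
    return num / den


def nmse_db(reference, estimate):
    """NMSE on a decibel scale; undefined for a perfect reconstruction."""
    return 10.0 * math.log10(nmse(reference, estimate))


# Made-up complex pressure samples for illustration only.
reference = [1 + 1j, 2 - 1j, -1 + 0.5j]
estimate = [1.1 + 1j, 2 - 0.9j, -1 + 0.5j]
print(nmse(reference, estimate), nmse_db(reference, estimate))
```

Normalizing by the reference energy makes the values comparable across frequencies and noise levels even when the absolute pressure amplitudes differ.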
Table 1. NMSE between the original and reconstructed sound fields by SHE and the proposed method at different frequencies and noise levels.
Noise level | Method | 2500 Hz | 5000 Hz | 7500 Hz |
---|---|---|---|---|
Noise-free | SHE | | | |
Noise-free | Proposed | | | |
9 dB | SHE | | | |
9 dB | Proposed | | | |
6 dB | SHE | | | |
6 dB | Proposed | | | |
3 dB | SHE | | | |
3 dB | Proposed | | | |
To evaluate the extrapolation capability of the proposed method, we estimated the sound pressure distribution outside the region covered by the training data. In this experiment, the number of iterations was set to . To match the radial sampling interval for calculating the line integral and the frequency to those of the optically measured data presented in Sec. 4.3, the frequency was set to 40 000 Hz, and projection data with dimensions of were used. The rotation angle was sampled at 30° intervals. The results of expanding the calculation region by a factor of 1.5 are shown in Fig. 3. The red frame in the central panel of Fig. 3 indicates the reconstruction region based on projection data, while the outer region is reconstructed without reference data. The reconstructed sound field and its cross-sectional slice confirm that the proposed method can reconstruct the region not covered by projection data with sufficient accuracy. The error image in Fig. 3 shows a typical pattern in which errors are minimal in the central region and tend to increase towards the periphery.
Fig. 3. Results of extending the calculation region by a factor of 1.5. (Left) The real part of the reconstructed sound field, (center) its cross-sectional slice at m, and (right) the reconstruction error. The red frame in the center panel indicates the reconstruction region based on projection data. The NMSE between the original and reconstructed sound fields is shown below the left panel.
The NMSE between the original and reconstructed sound fields is shown in Fig. 3. The value is slightly higher than that obtained under the standard noise-free conditions. However, as shown in Fig. 3, the reconstruction error is sufficiently small in most areas, even outside the region covered by training data.
4.3 Experiment using the measurement data obtained by PPSI
Finally, we conducted experiments using measurement data obtained by parallel phase-shifting interferometry (PPSI), one of the optical sound measurement methods (Ishikawa et al., 2016). The two ultrasonic transducers [SPL (Hong Kong) Limited UOD1035-Z570R] used as sound sources and the measured sound-field projection are shown in Figs. 4(a) and 4(b), respectively. The two sound sources were configured to generate an asymmetric sound field with directivity patterns in both the vertical and diagonal directions. Each sound source emitted a 40 000 Hz acoustic wave. The rotation angle was sampled at 5° intervals. Since our method assumes that there are no sound sources within the observation area, we used only the data for m, with dimensions of , for reconstruction. The learning rate and the number of iterations were set to and , respectively. To validate the proposed method, the reconstructed sound field was compared with one directly measured by a microphone. A 1/4-in. microphone was scanned horizontally and vertically at 1 mm intervals on a plane that includes the centers of the two transducers to capture the 2D sound field.
Fig. 4. Reconstruction results using measurement projection data obtained through PPSI. (a) Two ultrasonic transducers generating the sound field used for the experiment, (b) the PPSI experimental data, (c) the real part of the reconstructed sound field, (d) its cross-sectional slice at m, and (e) microphone observation values in the same plane as (d). The red frame in (b) indicates the region of measurement data used for the experiment.
The reconstructed sound field and its cross-sectional slice are shown in Figs. 4(c) and 4(d), respectively. The microphone observation in the same plane as (d) is shown in Fig. 4(e). The slice of the reconstructed sound field exhibits patterns similar to those observed in the PPSI projection data. In addition, its values are almost the same as the microphone observation values shown in Fig. 4(e). These results indicate that the proposed method is applicable to optical sound measurement data. In this experiment with measured data, the weighting coefficient of the loss function significantly influenced the reconstruction accuracy. Since the optimal value of the weighting coefficient varies depending on the target sound field, developing a method to automatically determine appropriate weighting factors remains a challenge for future work.
5. Conclusion
In this paper, we proposed a three-dimensional sound field reconstruction method using PINNs for optical sound measurements. The proposed method demonstrated high fidelity in reconstructing the original sound field while retaining the advantages of PINNs, such as high robustness to noise and reliable extrapolation outside the training region. Additionally, in experiments with measured projection data, we visually confirmed that the reconstructed sound field exhibited trends similar to the projection data obtained by PPSI and showed reasonable agreement with the microphone observation, indicating that the proposed method is applicable to real-world optical sound measurement data. Since the weighting factor in the loss function has a significant impact on reconstruction accuracy, developing a method to automatically determine appropriate weighting factors is an important area for future research.
Author Declarations
Conflict of Interest
The authors have no conflicts to disclose.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.