The implicit representation by physics-informed neural networks (PINNs) serves as an effective solution for a key challenge faced by optical sound measurements. Since optical sound measurements observe the line integral of the sound pressure along the optical path, reconstruction is necessary to determine the sound pressure at each point in the three-dimensional field. In this paper, we expand the PINNs-based reconstruction method to three-dimensional reconstruction and demonstrate its effectiveness for optically measured sound fields. Furthermore, we propose a reconstruction approach that can estimate solutions well outside the bounds of the data used for training.

Partial differential equations (PDEs) are fundamental tools for describing various physical phenomena in science and engineering, including sound propagation. Physics-informed neural networks (PINNs) (Raissi et al., 2019, 2017b; Raissi, 2018; Raissi et al., 2017a) have emerged as a powerful framework for solving PDEs. The PDEs are incorporated into the loss function in order to impose physical constraints directly on the network. Since this provides the network with systematic knowledge of the physics, the solution can be estimated well even from small amounts of data. Furthermore, PINNs have also shown their capabilities on noisy experimental data and multi-physics problems (Raissi, 2018; Raissi et al., 2017a). One notable advantage of PINNs is that they can estimate solutions well outside the bounds of the data used for training (Raissi et al., 2017a).

Sinusoidal representation networks (SIREN) (Sitzmann et al., 2020) brought about a significant advancement in this field. By using periodic sine activation functions, SIREN can represent derivatives of arbitrary order, allowing detailed modeling of harmonic signals. This precise representation of derivatives demonstrates the potential of implicit neural representations as an innovative tool for solving inverse problems involving PDEs.

PINNs-based implicit neural representation serves as an effective solution for a key challenge faced by optical sound measurement (Ito et al., 2024). Optical sound measurement (Rosell et al., 2012; Løkberg, 1994) has been actively studied as an alternative to microphones because it enables contactless sound measurement. The main advantage of contactless acoustic measurement is that it eliminates the sound reflection and diffraction caused by the presence of the microphone itself in the sound field. In addition, it enables acoustic measurements in small areas or in the immediate vicinity of a sound source, which is impossible with a microphone. Despite these advantages, optical sound measurement faces certain challenges that must be addressed for practical implementation. One of them is that the observed quantity is a projection of the sound field, i.e., the line integral of the sound pressure along a measurement laser path. Due to this line-integral nature, it is impossible to directly obtain the sound pressure at each point in three-dimensional space. Therefore, a reconstruction is necessary to obtain the sound pressure at each point in space from the integral observation.

To address this challenge, we recently introduced a two-dimensional sound field reconstruction method using SIREN (Ito et al., 2024). We derived an integral loss between the sound field projection and the line integral of the neural field and evaluated it on synthetic data. In this paper, we expand this approach to three-dimensional sound field reconstruction and apply it to real data for the first time. The results of numerical experiments demonstrate that the proposed method achieves a high-fidelity reconstruction of the three-dimensional sound field that is visually indistinguishable from the original sound field. In addition, we propose a reconstruction method that can estimate solutions well outside the training region by extending the physical constraints beyond the measurement region (see Fig. 1).

Fig. 1.

An overview of the proposed method.

Since changes in air density induced by sound alter the refractive index of air, optical measurements can provide quantitative information about sound field properties. Assuming adiabatic conditions, the refractive index of air n(x,t) is expressed as a function of the sound pressure p(x,t),
n(x, t) = n0 + {(n0 − 1)/(γ p0)} p(x, t),    (1)
where x = (x, y, z) is the position vector, t is the time, ρ0 and n0 represent the air density and refractive index under atmospheric pressure p0, and γ is the specific heat ratio of air. Since the frequency of light is high enough for the rays to be approximated geometrically, the phase of the light ϕ is given by
ϕ(t) = k ∫_L n(x, t) dy,    (2)
where k is the wavenumber of light. The optical phase is proportional to the line integral of the refractive index n along the optical path L. Substituting Eq. (1) into Eq. (2), we obtain
ϕ(t) = k n0 ∫_L dy + {k(n0 − 1)/(γ p0)} ∫_L p(x, t) dy.    (3)
The time-independent first term represents a DC component and can be neglected in acousto-optic measurement. Therefore, the measurement of phase modulation provides the line-integrated sound pressure along the optical path as follows:
ϕ(t) = {k(n0 − 1)/(γ p0)} [pL(t) + C],    (4)
where pL = ∫_L p dy and C = {(n0 γ p0)/(n0 − 1)} ∫_L dy is the constant term.
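As a minimal numeric illustration of this relation, the sketch below converts a measured phase modulation back to line-integrated sound pressure. The laser wavelength and air constants are assumed illustrative values, not taken from the paper:

```python
import numpy as np

# Assumed constants (standard air at 1 atm; a 532 nm laser) --
# illustrative values, not taken from the paper.
n0 = 1.000279           # refractive index of air at atmospheric pressure
gamma = 1.4             # specific heat ratio of air
p0 = 101325.0           # atmospheric pressure [Pa]
k = 2 * np.pi / 532e-9  # wavenumber of the measurement light [rad/m]

def refractive_index(p):
    """Eq. (1): refractive index as a function of sound pressure."""
    return n0 + (n0 - 1.0) / (gamma * p0) * p

def phase_to_line_pressure(phi_ac):
    """Invert the AC part of Eq. (4): recover the line-integrated
    sound pressure p_L [Pa*m] from the measured phase modulation."""
    return gamma * p0 / (k * (n0 - 1.0)) * phi_ac

# Round trip: a known p_L yields a phase that maps back to p_L.
p_L = 0.5  # [Pa*m]
phi_ac = k * (n0 - 1.0) / (gamma * p0) * p_L
```

The tiny coefficient (n0 − 1)/(γ p0) is why interferometric sensitivity is needed: audible-level sound pressures perturb the refractive index only at the 1e-9 level.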

As mentioned above, since optical sound measurements observe the line integral of the sound pressure along the optical path, reconstruction is necessary to determine the sound pressure at each point in the three-dimensional field. The most common reconstruction technique is computed tomography (CT), based on the Radon transform (Kak and Slaney, 2001; Løkberg et al., 1995). However, its accuracy decreases when applied to sound fields due to discrepancies in the underlying assumptions. Specifically, the filtered back projection (FBP) commonly used in CT assumes that the sound pressure is zero outside the measurement region. This assumption rarely holds because sound tends to spread over a wide area. Additionally, since FBP does not account for the physical properties of sound, it may not be optimal for sound fields.

Recently, physical model-based reconstruction methods based on the Helmholtz equation have been proposed (Ishikawa et al., 2021; Verburg and Fernandez-Grande, 2021). These methods have achieved better accuracy than FBP because they assume that the observed data are obtained from a sound field satisfying the physical properties of sound. A typical example of these methods is the plane wave expansion (PWE) (Verburg and Fernandez-Grande, 2021). PWE adequately reconstructs sound fields from a sparse set of optical measurements by expanding the measured data on a basis of plane waves. The physical model-based reconstruction method using spherical harmonic expansion (SHE) (Nozawa et al., 2024) is a three-dimensional sound field reconstruction technique that addresses the exterior problem. In SHE, spherical harmonics serve as a basis for expressing sound fields defined in spherical coordinates, enabling the representation of sound fields with various directivities through the superposition of basis functions with different degrees and orders.

SIREN employs a fully connected neural network to optimize the implicit representation (Sitzmann et al., 2020). This approach enables precise representation of complex natural signals and their derivatives. The model output of SIREN is expressed as
F(x) = Wn (fn−1 ∘ fn−2 ∘ ⋯ ∘ f0)(x) + bn,  fi(xi) = sin(Wi xi + bi),    (5)
where x = (x, y, z) is a position vector and fi denotes the ith layer of the network. For each layer, the weight matrix Wi and bias vector bi are applied to the input vector xi. Because the derivative of a sine is a phase-shifted sine, the derivative of F(x) can be represented as a cosine-activated network with the same representational ability as the original model. This property of well-behaved derivatives makes implicit neural representations particularly suitable for solving inverse problems involving differential equations. An illustrative application of this concept is neural full-waveform inversion (FWI) (Sitzmann et al., 2020), which addresses the Helmholtz equation. FWI reconstructs the complete sound field from sparsely distributed microphones by solving a PDE-constrained problem. The reconstruction is achieved by minimizing the following loss function:
Ldata = Σi ‖F(xi) − r(xi)‖,    (6)
LHelmholtz = Σj ‖(∇² + k²) F(xj)‖,    (7)
L = Ldata + λ LHelmholtz,    (8)
where r(x) models the observed microphone data at position x, λ represents the weighting coefficient of the loss function, and ∇² denotes the Laplacian operator.
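As a rough illustration of the SIREN model and the Helmholtz residual, the following NumPy sketch builds a small sinusoidal network and evaluates (∇² + k²)F with finite differences. The layer sizes, the ω0 frequency scaling, and the finite-difference Laplacian are assumptions made for this sketch; a real PINN would use automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal SIREN forward pass: each hidden layer applies sin(W x + b);
# the last layer is affine. Sizes are illustrative, not the paper's.
sizes = [3, 64, 64, 1]
omega0 = 30.0  # frequency scaling from Sitzmann et al. (2020)
params = [(rng.normal(size=(m, n)) / np.sqrt(n), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def siren(x):
    h = x
    for W, b in params[:-1]:
        h = np.sin(omega0 * (h @ W.T + b))
    W, b = params[-1]
    return h @ W.T + b

def helmholtz_residual(x, k, eps=1e-4):
    """Finite-difference approximation of (lap + k^2) F(x), the quantity
    penalized by L_Helmholtz. Autodiff would be used in practice;
    finite differences keep this sketch dependency-free."""
    lap = np.zeros((x.shape[0], 1))
    for d in range(x.shape[1]):
        e = np.zeros(x.shape[1])
        e[d] = eps
        lap += (siren(x + e) - 2 * siren(x) + siren(x - e)) / eps**2
    return lap + k**2 * siren(x)

# Residual at random points for a 5000 Hz field (c = 343 m/s assumed).
x = rng.uniform(-1, 1, size=(8, 3))
res = helmholtz_residual(x, k=2 * np.pi * 5000 / 343.0)
```

For an untrained network the residual is large; training drives it toward zero, which is exactly how the physical constraint is imposed.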

We have recently introduced a two-dimensional sound field reconstruction method using SIREN (Ito et al., 2024). However, it cannot be directly applied to optically measured projections because it is constrained by the two-dimensional Helmholtz equation and its input coordinates are two-dimensional. Therefore, to apply the SIREN-based model to optically measured data, the three-dimensional Helmholtz equation must be computed for the model output.

In this paper, we expand this two-dimensional approach to three-dimensional reconstruction. Instead of using the microphone observation r, we introduce a loss function using the line integral I (Ito et al., 2024). The line integral for an arbitrary three-dimensional sound field is given by
I(s, θ, z) = ∫_{t⁻}^{t⁺} p(s cos θ − t sin θ, s sin θ + t cos θ, z) dt,    (9)
where p(x, y, z) represents the arbitrary sound field, s and θ are the radial position and the rotation angle, respectively, t⁻ and t⁺ represent the lower and upper limits of the integration range, and 0 ≤ θ < π. Since the line integral of the sound field I cannot be directly compared with the network output, we calculate the line integral of the network output as
Î(s, θ, z) = ∫_{t⁻}^{t⁺} F(s cos θ − t sin θ, s sin θ + t cos θ, z) dt.    (10)
To numerically compute the line integral, we place multiple evaluation points at equal intervals along the line defined by the parameters in Eq. (10) and evaluate the network output F at each point. The number of evaluation points matches the number of samples of the parameter s. By summing the discrete network outputs multiplied by the spacing between the evaluation points, the continuous line integral is approximated by a discrete sum. Since Î is differentiable, the loss computed from the line-integral values can be used in gradient-based optimization. Due to computational memory constraints, s, θ, and z are randomly sampled at each training iteration. The loss function is defined as
Lint = Σ_(s̃, θ̃, z̃) ‖Î(s̃, θ̃, z̃) − I(s̃, θ̃, z̃)‖,    (11)
where s̃, θ̃, and z̃ represent randomly sampled values of s, θ, and z, respectively. Training constrained by line-integral observations is achieved by substituting Lint for Ldata in Eq. (8). While the ℓ2 norm is commonly used for loss functions, our preliminary tests demonstrated that the ℓ1 norm achieves superior reconstruction accuracy (Ito et al., 2024). Therefore, we adopted the ℓ1 norm in our implementation. It should be noted that our method assumes that no sound sources are present within the optical measurement area. The reconstruction approach using this SIREN framework incorporates the physical properties of sound by learning to satisfy the constraints imposed by the Helmholtz equation. In addition, it inherits various advantages of PINNs, such as robustness to noise and ease of extension to multi-physics problems.
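The discretization described above can be sketched as follows, with an arbitrary callable standing in for the network output F. The midpoint sampling and the point count are illustrative assumptions:

```python
import numpy as np

def line_integral(field, s, theta, z, t_lo=-1.0, t_hi=1.0, n_pts=128):
    """Discrete approximation of the line integral of Eq. (10):
    sample n_pts points at equal spacing dt along the rotated line
    and sum the field values weighted by dt."""
    dt = (t_hi - t_lo) / n_pts
    t = t_lo + (np.arange(n_pts) + 0.5) * dt   # midpoints of equal intervals
    x = s * np.cos(theta) - t * np.sin(theta)
    y = s * np.sin(theta) + t * np.cos(theta)
    zz = np.full_like(t, z)
    return np.sum(field(x, y, zz)) * dt

# Sanity check: a unit field integrates to the line length t_hi - t_lo.
const = line_integral(lambda x, y, z: np.ones_like(x), s=0.3, theta=0.7, z=0.0)
```

Because the sum is a linear, differentiable function of the network outputs, gradients flow from the integral loss back to the network parameters without any special handling.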

Furthermore, we propose a reconstruction approach that can estimate solutions well outside the bounds of the training data. By extending the domain of s by a factor of 1.5, we input coordinates outside the training data region into the network and expand the computational domain of LHelmholtz accordingly. Outside the region covered by training data, Lint is set to zero. Although the measurement region of optical sound measurement is limited, this extrapolation capability makes it possible to estimate the sound pressure distribution over a wider area.
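The extended-domain training can be sketched as follows. The domain sizes, sample counts, and the `total_loss` helper are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative domain sizes (assumptions, not the paper's values).
s_max = 1.0          # radial extent covered by the projection data
s_ext = 1.5 * s_max  # extended domain on which L_Helmholtz is evaluated

# Sample radial coordinates over the extended domain; the integral
# loss only contributes where projection data exist.
s = rng.uniform(-s_ext, s_ext, size=1024)
inside = np.abs(s) <= s_max

def total_loss(l_helmholtz, l_int, lam):
    """L = L_int + lam * L_Helmholtz, with L_int zeroed outside the
    measured region (hypothetical helper mirroring Eq. (8) with
    L_int substituted for L_data)."""
    return np.where(inside, l_int, 0.0).mean() + lam * np.mean(l_helmholtz)
```

In the outer band, only the Helmholtz term constrains the network, so the extrapolated field is whatever Helmholtz-consistent continuation best matches the interior solution.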

We conducted experiments using numerical simulations and optically measured projection data to evaluate the proposed method.

The proposed method used a SIREN architecture consisting of five fully connected layers with a hidden size of 256. The Adam optimizer with a learning rate of 2.0×10⁻⁵ was used. The batch size and the weighting coefficient λ were determined independently for each experiment: the batch size was set to fill the GPU memory (a single NVIDIA GeForce RTX 4090), and λ was adjusted to maintain an appropriate balance between LHelmholtz and Lint. Specifically, we found that setting the ratio of LHelmholtz to Lint close to 1:4 in the early part of training yields satisfactory results. Therefore, we trained the model for 10 000 epochs and then adjusted λ so that the ratio became close to 1:4. Typically, we repeated this process two or three times to determine an appropriate value of λ.
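The ratio-based tuning of λ described above can be sketched with a hypothetical helper (not the authors' code; it assumes the 1:4 target applies to the λ-weighted Helmholtz term):

```python
def retune_lambda(lam, weighted_helmholtz, l_int, target_ratio=4.0):
    """After a warm-up run, rescale lambda so that the weighted
    Helmholtz term and the integral loss approach a 1:target_ratio
    balance. Hypothetical helper: the 1:4 target comes from the text,
    the weighting convention is an assumption."""
    return lam * l_int / (target_ratio * weighted_helmholtz)
```

Applying the helper once moves the ratio to the target; applying it again when the ratio is already 1:target_ratio leaves λ unchanged, which is why a couple of repetitions suffice in practice.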

Initially, we performed a series of numerical simulation experiments. Figures 2(a) and 2(b) show the sound field of a point source and its projection data, numerically generated for the experiments. The sampling points were configured to generate projection data with dimensions of 32×32. The rotation angle θ was sampled at 30° intervals. The networks were trained for 1.0×10⁶ epochs. The proposed method was compared with SHE. Since the sound field is that of a simple point source, SHE can perfectly restore the sound field in the noise-free case if the expansion origin coincides with the origin of the point source. To avoid this situation, we shifted the expansion origin of SHE by half a wavelength from the origin of the point source. This shift reasonably reflects real-world applications: in practice, it is difficult to align the expansion origin precisely with the true acoustic center of the source, and some degree of mismatch is expected.

Fig. 2.

(a) Reference sound field used for the numerical experiment, (b) its projection, and (c) comparison of the real parts of the reconstructed fields and reconstruction errors from noisy data at different SNR levels. In this example, the point source emitted a 5000-Hz sinusoidal wave. The objective of this research is to reconstruct the original sound field using projection data acquired at multiple rotation angles θ. In Fig. 2(c), the reconstruction from noise-free data is shown at the top. Note that the error maps are displayed using different color ranges.


We tested the reconstruction methods using projection data contaminated with Gaussian noise because noise is inevitable in optically measured sound field images. The results for different signal-to-noise ratios (SNRs) are summarized in Fig. 2(c). In noise-free conditions, both SHE and the proposed method achieved reconstructions of such high fidelity that there was almost no noticeable visual difference from the original sound field. In addition, both methods demonstrated robustness to noise, achieving reconstruction accuracy with no visible degradation compared to the noise-free conditions. This behavior can be attributed to the fact that both methods eliminate high-frequency random noise through their physical constraints.

To assess the accuracy of the reconstructions, we calculated the normalized mean squared error (NMSE) with respect to the original sound field. The results are summarized in Table 1. Under all conditions, the proposed method exhibits a higher NMSE than SHE. In noise-free conditions, SHE exhibits superior performance, which, as previously mentioned, is likely because both the sound field generation and SHE follow the same physical model. As shown in Fig. 2, the reconstruction error of the proposed method is nevertheless sufficiently small. Moreover, for noisy data, the differences between SHE and the proposed method diminish, demonstrating that the proposed method is highly effective even in the presence of noise.
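The NMSE used here can be computed as the squared error normalized by the energy of the reference field; a minimal sketch (the exact normalization convention is an assumption):

```python
import numpy as np

def nmse(p_ref, p_rec):
    """Normalized mean squared error between a reference field and a
    reconstruction (works for real or complex arrays)."""
    return np.sum(np.abs(p_rec - p_ref) ** 2) / np.sum(np.abs(p_ref) ** 2)
```

With this convention, identical fields give 0 and an all-zero reconstruction gives 1, so the table values of order 10⁻⁴ to 10⁻² correspond to very small relative errors.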

Table 1.

NMSE between the original and reconstructed sound field by SHE and the proposed method at different frequencies and noise levels.

Noise level   Method     2500 Hz      5000 Hz      7500 Hz
Noise-free    SHE        1.12×10⁻⁴    1.02×10⁻⁴    1.30×10⁻⁴
              Proposed   1.70×10⁻²    1.04×10⁻²    4.55×10⁻⁴
9 dB          SHE        4.13×10⁻³    4.13×10⁻³    8.56×10⁻³
              Proposed   1.80×10⁻³    1.23×10⁻²    2.27×10⁻²
6 dB          SHE        8.35×10⁻³    1.22×10⁻²    1.10×10⁻²
              Proposed   2.49×10⁻²    2.52×10⁻²    2.73×10⁻²
3 dB          SHE        1.46×10⁻²    2.35×10⁻²    2.21×10⁻²
              Proposed   3.21×10⁻²    4.10×10⁻²    4.92×10⁻²

To evaluate the extrapolation capability of the proposed method, we estimated the sound pressure distribution outside the region covered by the training data. In this experiment, the number of iterations was set to 2.0×10⁵. To match the radial sampling interval for calculating the line integral and the frequency to the optically measured data presented in Sec. 4.3, the frequency was set to 40 000 Hz and the projection data had dimensions of 79×79. The rotation angle θ was sampled at 30° intervals. The results of expanding the calculation region of LHelmholtz by a factor of 1.5 are shown in Fig. 3. The red frame in the central panel of Fig. 3 indicates the reconstruction region based on projection data, while the outer region is reconstructed without reference data. From the reconstructed sound field and its cross-sectional slice, we confirm that the proposed method can reconstruct the region not covered by projection data with sufficient accuracy. The error image in Fig. 3 shows a typical pattern in which errors are minimal in the central region and tend to increase toward the periphery.

Fig. 3.

The results of extending the calculation region of LHelmholtz by a factor of 1.5. (Left) The real part of the reconstructed sound field, (center) its cross-sectional slice at z = 0.02 m, and (right) the reconstruction error. The red frame in the center figure indicates the reconstruction region based on projection data. The NMSE between the original and reconstructed sound fields is shown below the left figure.


The NMSE between the original and reconstructed sound fields is shown in Fig. 3. The value is slightly higher than that under standard noise-free conditions. However, as shown in Fig. 3, the reconstruction error is sufficiently small in most areas, even outside the region covered by training data.

Finally, we conducted experiments using measurement data obtained by parallel phase-shifting interferometry (PPSI), one of the optical sound measurement methods (Ishikawa et al., 2016). The two ultrasonic transducers [SPL (Hong Kong) Limited UOD1035-Z570R] used as sound sources and the measured sound-field projection are shown in Figs. 4(a) and 4(b), respectively. The two sound sources are arranged to generate an asymmetric sound field with directivity patterns in the vertical and diagonal directions. Each sound source emits a 40 000 Hz acoustic wave. The rotation angle θ was sampled at 5° intervals. Since our method assumes that there are no sound sources within the observation area, we used only the data for z ≥ 0.01 m, with dimensions of 34×79, for reconstruction. The learning rate and the number of iterations were set to 1.0×10⁻⁵ and 2.0×10⁵, respectively. To assess the validity of the proposed method, the reconstructed sound field was compared with one directly measured by a microphone. A 1/4-in. microphone was scanned horizontally and vertically at 1 mm intervals on a plane that includes the centers of the two transducers to capture the 2D sound field.

Fig. 4.

Reconstruction results using measurement projection data obtained through PPSI. (a) Two ultrasonic transducers generating the sound field used for the experiment, (b) the PPSI experimental data, (c) the real part of the reconstructed sound field, (d) its cross-sectional slice at x=0 m, and (e) microphone observation values in the same plane as (d). The red frame in (b) indicates the region of measurement data used for the experiment.


The reconstructed sound field and its cross-sectional slice are shown in Figs. 4(c) and 4(d), respectively. The microphone observation in the same plane as (d) is shown in Fig. 4(e). The slice of the reconstructed sound field exhibits patterns similar to those observed in the PPSI projection data. In addition, its values are in close agreement with the microphone observations shown in Fig. 4(e). These results indicate that the proposed method is applicable to optical sound measurement data. In this experiment with measured data, the weighting coefficient λ of the loss function significantly influenced the reconstruction accuracy. Since the optimal value of λ varies depending on the target sound field, developing a method to automatically determine appropriate weighting factors remains a topic for future work.

In this paper, we proposed a three-dimensional sound field reconstruction method using PINNs for optical sound measurements. The proposed method demonstrated high fidelity in reconstructing the original sound field while retaining the advantages of PINNs, such as high robustness to noise and reliable extrapolation outside the training region. Additionally, in experiments with measured projection data, we visually confirmed that the reconstructed sound field exhibited trends similar to the projection data obtained by PPSI and showed reasonable agreement with the microphone observations, indicating that the proposed method is applicable to real-world optical sound measurement data. Since the weighting factor λ in the loss function has a significant impact on reconstruction accuracy, developing a method to automatically determine appropriate weighting factors is an important area for future research.

The authors have no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Ishikawa, K., Yatabe, K., Chitanont, N., Ikeda, Y., Oikawa, Y., Onuma, T., Niwa, H., and Yoshii, M. (2016). "High-speed imaging of sound using parallel phase-shifting interferometry," Opt. Express 24(12), 12922–12932.
2. Ishikawa, K., Yatabe, K., and Oikawa, Y. (2021). "Physical-model-based reconstruction of axisymmetric three-dimensional sound field from optical interferometric measurement," Meas. Sci. Technol. 32(4), 045202.
3. Ito, R., Oikawa, Y., and Ishikawa, K. (2024). "Tomographic reconstruction of sound field from optical projections using physics-informed neural networks," in 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP).
4. Kak, A. C., and Slaney, M. (2001). Principles of Computerized Tomographic Imaging (SIAM, Philadelphia, PA).
5. Løkberg, O. (1994). "Sound in flight: Measurement of sound fields by use of TV holography," Appl. Opt. 33(13), 2574–2584.
6. Løkberg, O. J., Espeland, M., and Pedersen, H. M. (1995). "Tomographic reconstruction of sound fields using TV holography," Appl. Opt. 34(10), 1640.
7. Nozawa, H., Imanishi, M., Oikawa, Y., and Ishikawa, K. (2024). "Physical-model-based reconstruction of three-dimensional sound field from multi-directional measurement by parallel phase-shift interferometry," Proc. Mtgs. Acoust. 52, 030001.
8. Raissi, M. (2018). "Deep hidden physics models: Deep learning of nonlinear partial differential equations," J. Mach. Learn. Res. 19, 932–955.
9. Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2017a). "Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations," SIAM J. Sci. Comput. 40, 172–198.
10. Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2017b). "Machine learning of linear differential equations using Gaussian processes," J. Comput. Phys. 348, 683–697.
11. Raissi, M., Perdikaris, P., and Karniadakis, G. (2019). "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys. 378, 686–707.
12. Rosell, A. T., Figueroa, S. B., and Jacobsen, F. (2012). "Sound field reconstruction using acousto-optic tomography," J. Acoust. Soc. Am. 131, 3786–3793.
13. Sitzmann, V., Martel, J. N. P., Bergman, A. W., Lindell, D. B., and Wetzstein, G. (2020). "Implicit neural representations with periodic activation functions," Adv. Neural Inf. Process. Syst. 33, 7462–7473.
14. Verburg, S. A., and Fernandez-Grande, E. (2021). "Acousto-optical volumetric sensing of acoustic fields," Phys. Rev. Appl. 16, 044033.