Realistic sound is essential in virtual environments, such as computer games and mixed reality. Efficient and accurate numerical methods for pre-calculating acoustics have been developed over the last decade; however, pre-calculating acoustics makes handling dynamic scenes with moving sources challenging, requiring intractable memory storage. A physics-informed neural network (PINN) method in one dimension is presented, which learns a compact and efficient surrogate model with parameterized moving Gaussian sources and impedance boundaries and satisfies a system of coupled equations. The model shows relative mean errors below 2%/0.2 dB and proposes a first step in developing PINNs for realistic three-dimensional scenes.

## 1. Introduction

In computer games and mixed reality, realistic sound is essential for an immersive user experience. The impulse responses (IR) can be obtained accurately and efficiently by numerically solving the wave equation using traditional numerical methods, such as finite element methods,^{1} spectral element methods (SEM),^{2} discontinuous Galerkin finite element method,^{3} and finite-difference time-domain methods.^{4,5} For real-time applications spanning a broad frequency range, the IRs are calculated offline due to the computational requirements. However, for dynamic, interactive scenes with numerous moving sources and receivers, the computation time and storage requirement for a lookup database become intractable (in the range of gigabytes) since the IR is calculated for each source-receiver pair. When covering the whole audible frequency range, these challenges become even more extensive. Previous attempts to overcome the storage requirements of the IRs include work for lossy compression,^{6} and lately, a novel portal search method has been proposed as a drop-in solution to pre-computed IRs to adapt to flexible scenes, e.g., when doors and windows are opened and closed.^{7} A recent technique for handling parameter parameterization and model order reduction for acceleration of numerical models is the reduced basis method (RBM).^{8,9} Although very efficient, RBM cannot meet the runtime requirements regarding computation time for virtual acoustics.

In this paper, we consider a new approach using physics-informed neural networks (PINNs)^{10–12} including knowledge of the underlying physics (in contrary to traditional “black box” neural networks^{13}) to learn a surrogate model for a one-dimensional (1D) domain that can be executed very efficiently at runtime (in the range of ms) and takes up little storage due to their intrinsic interpolation properties in grid-less domains. The applications of PINNs in virtual acoustics are very limited,^{14,15} and the main contribution of this work is the development of frequency-dependent and independent impedance boundary conditions with parameterized moving Gaussian sources, making it possible to model sound propagation taking boundary materials properly into account. This work investigated PINNs for virtual acoustics in a 1D domain—still taking the necessary physics into account—making it a possible stepping stone to model realistic and complex 3D scenes for applications, such as games and mixed reality, where the computation and storage requirements are very strict.

## 2. Methods

We take a data-free approach where only the underlying physics is included in the training and their residual minimized through the loss function, allowing insights into how well PINNs perform for predicting sound fields in acoustic conditions. The Gaussian impulse is used as the initial condition tested with frequency-independent and dependent impedance boundaries. To assess the quality of the developed PINN models, we have used our in-house open-source SEM simulators.^{2}

### 2.1 Governing equations

We consider in the following the use of PINNs for the construction of a surrogate model predicting the solution to the linear wave equation in 1D,

where *p* is the pressure (Pa), *t* is the time (s), and *c* is the speed of sound in air (m/s). The initial conditions (ICs) are satisfied by using a Gaussian source for the pressure part and setting the velocity equal to zero,

with *σ*_{0} being the width of the pulse determining the frequencies to span.

### 2.2 Boundary conditions

We consider impedance boundaries and denote the boundary domain as Γ (in 1D, the left and right endpoints). We will omit the source position *x*_{0} in the following.

#### 2.2.1 Frequency-independent impedance boundaries

The acoustic properties of a wall can be described by its surface impedance^{16} $Zs=p/vn$, where *v _{n}* is the normal component of the velocity at the same location on the wall surface. Combining the surface impedance with the pressure term $\u2202p/\u2202n=\u2212\rho 0(\u2202vn/\u2202t)$ of the linear coupled wave equation yields

where $\xi =Zs/(\rho 0c)$ is the normalized surface impedance and *ρ*_{0} denotes the air density ($kg/m3$). Note that perfectly reflecting boundaries can be obtained by letting $\xi \u2192\u221e$ being the Neumann boundary formulation.

#### 2.2.2 Frequency-dependent impedance boundaries

The wall impedance can be written as a rational function in terms of the admittance $Y=1/Zs$ and rewritten by using partial fraction decomposition in the last equation (17)

where *a*, *b* are real coefficients; $i=\u22121$ is the complex number; *Q* is the number of real poles *λ _{k}*;

*S*is the number of complex conjugate pole pairs $\alpha k\xb1j\beta k$; and $Y\u221e$,

*A*,

_{k}*B*, and

_{k}*C*are numerical coefficients. Since we are concerned with the (time-domain) wave equation, the inverse Fourier transform is applied on the admittance and on the partial fraction decomposition term in Eq. (4). Combining these gives

_{k}^{17}

The functions $\varphi k,\u2009\psi k(0)$, and $\psi k(1)$ are the so-called accumulators determined by the following set of ordinary differential equations (ODEs) referred to as auxiliary differential equations (ADEs):

The boundary conditions can then be formulated by inserting the velocity *v _{n}* calculated in Eq. (5) into the pressure term of the linear coupled wave equation $\u2202p/\u2202n=\u2212\rho 0(\u2202vn/\u2202t)$.

### 2.3 PINNs

Two multi-layer feed-forward neural networks are setup,

where **W** and **b** are the network weights and biases, respectively; $NADE$ is only applied in case of frequency-dependent boundaries. The networks take three inputs $x,t,x0$ corresponding to the spatial, temporal, and source position dimensions. The network $Nf$ has one output $p\u0302(x,t,x0)$ approximating $p(x,t,x0)$ predicting the pressure; the network $NADE$ is a multi-output network with the number of outputs corresponding to the number of accumulators to approximate as explained in the following.

Including only information about the underlying physics, the governing partial differential equation (PDE) from Eq. (1) and initial conditions (ICs) from Eq. (2) can be learned by minimizing the mean squared error loss denoted $\Vert \u2022\Vert $ as

where

Here, ${xici,x0,ici}i=1NIC$ denotes the initial $NIC$ data points, ${xfi,tfi,x0,fi}i=1Nf$ denotes the *N _{f}* collocation points for the PDE

*f*, and the penalty weights $\lambda IC$ and $\lambda BC$ are used for balancing the impact of the individual terms. The loss function $LBC$ will be treated separately for the impedance boundary conditions in the following, where ${xbci,tbci,x0,bci}i=1NBC$ will, correspondingly, be denoting the $NBC$ collocation points on the boundaries. For frequency-dependent boundaries, an auxiliary neural network will be coupled, resulting in the additional loss term $LADE$ in Eq. (7) and is explained in detail in the following.

#### 2.3.1 Frequency-independent impedance boundary loss functions

The frequency-independent boundary condition, Eq. (3), is included in the loss function $LBC:=LINDEP$ satisfied by $LINDEP=\Vert (\u2202/\u2202t)Nf(xbci,tbci,x0,bci;W,b)+c\xi (\u2202/\u2202n)Nf(xbci,tbci,x0,bci;W,b)\Vert $.

#### 2.3.2 Frequency-dependent impedance boundary loss functions

For frequency-dependent boundaries, the ADEs need to be solved as well, approximating the ODEs from Eq. (6) by introducing an additional neural network $NADE(xb,t,x0;W,b)$ parameterized by *x _{b}* and

*x*

_{0}for boundary positions and moving sources, respectively. The network has multiple outputs $NADE(xb,t,x0;W,b)=[\varphi \u03030,\varphi \u03031,\u2026,\varphi \u0303Q\u22121;\psi \u03030(0),\psi \u03031(0),\u2026,\psi \u0303S\u22121(0);\psi \u03030(1),\psi \u03031(1),\u2026,\psi \u0303S\u22121(1)]$ corresponding to the

*scaled*accumulators determined by a scaling factor $lADE\u2022$, mapping $\varphi \u0303k=lADE\varphi k\varphi \u0302k,\u2009\psi \u0303k(0)=lADE\psi k\psi \u0302k(0),$ and $\psi \u0303k(1)=lADE\psi k\psi \u0302k(1)$, such that $\varphi \u0303k,\psi \u0303k(0),\psi \u0303k(1):(xb,t,x0)\u21a6[\u22121,1]$ matching the range of the tanh function used in this work. The scaling factors are independent of the geometry and domain dimensionality, only the material properties determine the amplitude of the accumulators. In this work, the scaling factors are determined using the SEM solver, but might be analytically estimated from the accumulators considering only a single reflection in a 1D domain. A graphical representation of the neural network architectures for approximating the governing physical equations and ADEs is depicted in Fig. 1. The accumulators with parametrized moving sources and boundary positions $\varphi \u0303k(xbci,tbci,x0,bci),\u2009\psi \u0303k(0)(xbci,tbci,x0,bci),\u2009\psi \u0303k(1)(xbci,tbci,x0,bci)$ can be learned by minimizing the mean squared error loss as (arguments omitted)

where

and

The frequency-dependent boundary conditions are satisfied by $LDEP=\Vert (\u2202/\u2202n)Nf(xbci,tbci,x0,bci;W,b)+\rho 0[\u2202vn(xbci,tbci,x0,bci)/\u2202t]\Vert $, where *v _{n}* is the expression at the boundaries given in Eq. (5) with $\varphi \u0302k=1/lADE\varphi k\varphi \u0303k,\u2009\psi \u0302k(0)=1/lADE\psi k(0)\psi \u0303k(0),$ and $\psi \u0302k(1)=1/lADE\psi k(1)\psi \u0303k(1)$, ensuring that the accumulators are properly re-scaled. The loss is included as the term $LBC:=LDEP$ in Eq. (7) together with the loss for the ADEs in Eq. (9).

### 2.4 Setup

tensorflow 2.5.1,^{18} sciann 0.6.4.7,^{19} and python 3.8.9 with 64 bit floating points for the neural network weights are used. The code for reproducing the results can be found online.^{20}

Reference data for impedance boundaries are generated using a fourth-order Jacobi polynomial SEM solver. The grid was discretized with 20 points per wavelength spanning frequencies up to 1000 Hz yielding an average grid resolution of $\Delta x=0.017$ m, and the time step was $\Delta t=CFL\xd7\Delta x/c$, where CFL is the Courant-Friedrichs-Lewy constant ($CFL=1.0$ and $CFL=0.1$ for frequency-independent and dependent boundaries, respectively). The speed of sound *c* = 1 m/s is used for the PINN setup, implying $\Delta x=\Delta t/c=\Delta t$, which is a normalization introduced to ensure the same scaling in time and space required for the optimization problem to converge. In case of a normalized speed of sound, the effective normalized frequency is correspondingly $f=fphys/cphys$, since the wave is now travelling slower compared to the physical setup. To evaluate the results for a physical speed of sound $cphys=343$ m/s, the temporal dimension should be converted back as $tphys=t/cphys$ s. In case of frequency-dependent boundaries, the velocity from Eq. (5) needs to be normalized accordingly regarding frequency and flow resistivity $\sigma mat=\sigma mat,\u2009phys/cphys$ in Miki's model. Fitting the parameters for *c* = 1 m/s yields modified *λ _{k}*,

*α*,

_{k}*β*, and $Y\u221e$ values resulting in the exact same surface impedance as for

_{k}*c*= 343 m/s, but scaled by $cphys$ in frequency range and amplitude. This can be seen from the complex wavenumber and characteristic impedance of the porous medium in Miki's model involving $f/\sigma mat$ and $2\pi f/c$ not being affected by normalization.

^{20}

The point distribution in time and space, number of sources, penalty weights *λ* and scaling factors $lADE$ for the ADEs are listed in Table 1. Note that the number of (time and space) domain points (30%) per source (7) is $47\u2009089\xd70.3/7=2018$, satisfying the Nyquist theorem, since $\Delta x=2/2018=0.045$ m (we can use the square root to get the gridpoint distribution in the spatial dimension) resulting in $ppw=\lambda w/\Delta x=7.6$ points per wavelength; $\lambda w=c/f$ being the wavelength for physical frequency 1000 Hz and physical speed of sound 343 m/s. For the neural network, we have used the ADAM optimizer and the mean-squared error for calculating the losses for both networks. The training was run with learning rate 1e−4 and batch size 512 until a total loss of $\u03f5=2e\u22124$ was reached (roughly 16k and 20k epochs needed for frequency-independent and dependent boundaries, respectively). The relatively big batch size was chosen to ensure that enough initial and boundary points were included in the optimization steps.

#total . | #BC . | #IC . | #inner^{a}
. | #srcs . | $\lambda IC$ . | $\lambda BC$ . | $\lambda ADE\u2022$ . | $lADE\varphi 0$ . | $lADE\varphi 1$ . | $lADE\psi 0(0)$ . | $lADE\psi 0(1)$ . |
---|---|---|---|---|---|---|---|---|---|---|---|

$47,089$ | 45% | 25% | 30% | 7 | 20 | 1 | 10 | 10.3 | 261.4 | 45.9 | 22.0 |

#total . | #BC . | #IC . | #inner^{a}
. | #srcs . | $\lambda IC$ . | $\lambda BC$ . | $\lambda ADE\u2022$ . | $lADE\varphi 0$ . | $lADE\varphi 1$ . | $lADE\psi 0(0)$ . | $lADE\psi 0(1)$ . |
---|---|---|---|---|---|---|---|---|---|---|---|

$47,089$ | 45% | 25% | 30% | 7 | 20 | 1 | 10 | 10.3 | 261.4 | 45.9 | 22.0 |

^{a}

The centered Latin hypercube sampling strategy (Ref. 24) is used.

The network architecture of $Nf$ consists of three layers, each with 256 neurons applying the sine activation function in each layer except for a linear output layer, with proper weight initialization.^{21} Using sine activation functions can be seen as representing the signal using Fourier series^{22} and is probably the reason for a significantly better convergence compared to using the more common choice of tanh activation functions. However, experiments showed degraded interpolation properties using sine activation functions when the network was trained on grids with source positions distributed more sparsely (0.3 m), even when lowering the number of neurons to prevent overfitting. A reason could be related to the distributed source interval violating the Nyquist sampling theorem $\Delta x<c/(2f)=0.17$ m and causing aliasing effects, but this remains an open question. Therefore, the source positions were distributed evenly with finer resolution to improve the results between the source positions, consequently resulting in a sparser grid per source by keeping the total number of points the same. Despite the sparser grid, the convergence and final error still showed satisfying results. Distributing the source positions more densely is trivial in a data-free implementation, but if a combination of the underlying physics and simulated/measured data is considered later, a large number of source positions could be practically challenging.

The network architecture of $NADE$ consists of three layers, each with 20 neurons applying the tanh activation function in each layer except for a linear output layer, with Glorot normal initialization of the weights.^{23} Using the tanh function is an obvious choice since we have chosen to scale the accumulators to take values in the range $[\u22121,1]$.

## 3. Results

Frequency-independent and dependent boundary conditions are tested, each with parameterized moving sources trained at seven evenly distributed positions $x0=[\u22120.3,\u22120.2,\u2026,0.3]$ m and evaluated at five positions $x0=[\u22120.3,\u22120.15,0.0,0.15,0.3]$ m. Additional results are included as supplementary materials.^{20} The source is satisfied through the initial condition modeled as a Gaussian impulse from Eq. (2) with $\sigma 0=0.2$ spanning frequencies up to 1000 Hz. The speed of sound $cphys=343$ m/s and air density $\rho 0=1.2\u2009kg/m3$ are used for all studies.

First, we test the frequency-independent boundary condition with normalized impedance $\xi =5.83$ depicted in Fig. 2(a). Frequency-independent boundaries with corresponding wave propagation animations available from Mm. 1. Then, we test the frequency-dependent impedance boundary condition, where the boundary is modeled as a porous material mounted on a rigid backing with thickness $dmat=0.10$ m with an air flow resistivity of $\sigma mat,phys=8000\u2009Nsm\u22124$. The surface impedance *Y* of this material is estimated using Miki's model^{25} and mapped to a two-pole rational function in the form of Eq. (4) with *Q* = 2 and *S* = 1 using a vector fitting algorithm^{26} yielding the coefficients for Eq. (5). The results are depicted in Fig. 2(b). Frequency-dependent boundaries with corresponding wave propagation animations available from Mm. 2.

We observe that the shape of the wave propagations is well captured, and the impulse responses also fit the reference solutions very well for all boundary types. The relative mean error $\mu rel(x,x0)=(1/N)\u2211i=0N\u22121[|p\u0302(x,ti,x0)\u2212p(x,ti,x0)|/p(x,ti,x0)]$ within −60 dB and absolute maximum error $\u221eabs(x,x0)=max{|p\u0302(x,ti,x0)\u2212p(x,ti,x0)|\u2009:i=0\cdots N\u22121}$ of the impulse responses originating from various source and receiver positions are summarized in Table 2. Relative errors are below 2%/0.2 dB for all predictions. The absolute maximum errors are below 0.011 Pa for all predictions indicating that no severe outliers are present.

. | $s0$ . | $s1$ . | $s2$ . | $s3$ . | $s4$ . | |||||
---|---|---|---|---|---|---|---|---|---|---|

. | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . |

Freq. indep. | 0.7% | 0.004 | 0.3% | 0.002 | 0.5% | 0.001 | 0.4% | 0.002 | 1.5% | 0.006 |

Freq. dep. | 0.5% | 0.004 | 1.5% | 0.008 | 1.7% | 0.011 | 1.9% | 0.009 | 1.1% | 0.005 |

. | $s0$ . | $s1$ . | $s2$ . | $s3$ . | $s4$ . | |||||
---|---|---|---|---|---|---|---|---|---|---|

. | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . | $\mu rel$ . | $\u221eabs$ . |

Freq. indep. | 0.7% | 0.004 | 0.3% | 0.002 | 0.5% | 0.001 | 0.4% | 0.002 | 1.5% | 0.006 |

Freq. dep. | 0.5% | 0.004 | 1.5% | 0.008 | 1.7% | 0.011 | 1.9% | 0.009 | 1.1% | 0.005 |

## 4. Conclusion and future work

A novel method is presented for predicting the sound field in a 1D domain for impedance boundaries and parameterized moving Gaussian sources using PINNs. A coupled system of differential equations, consisting of the governing physical equations and a system of ODEs predicting the accumulators of the ADEs, was used for training the PINN. The equations for the ADEs depend only on time *t* but were parameterized to take boundary and source positions into account, yielding a very flexible implementation. The results are promising, with relative mean errors below 2%/0.2 dB for all cases. The approach taken by learning a compact surrogate model that is inexpensive to evaluate at runtime shows potential to overcome current numerical methods' limitations in modeling flexible scenes, such as moving sources.

Compared to standard numerical methods, the PINN method takes up to three orders of magnitude more time to converge. Therefore, to solve realistic problems in 3D, the convergence rate needs to be improved. This is partly due to the need for fairly large amounts of grid points with 70% of the points located at the initial time step and at boundaries where penalty weights are also needed for balancing each loss term. Formulating an ansatz imposing initial and boundary conditions directly could overcome this problem.^{27} Also, considering other architectures taking (discrete) time-dependence into account instead of optimizing the entire spatiotemporal domain at once might improve the learning rate and produce more precise results. Moreover, we have observed challenges in the global optimizer for a larger domain size and/or by increasing the frequency due to the ratio between zero and non-zero pressure values. Domain decomposition methods^{28} have been introduced to overcome this limitation. In ongoing work, more complex benchmarks are being considered.

## Acknowledgments

Thanks to DTU Computing Center GPULAB for access to GPU clusters and swift help. Also, a big thanks to Ehsan Haghighat for valuable discussions regarding PINNs and SciANN. Last but not least, thanks to Finnur Pind for making an SEM code available for calculating reference solutions.