Sound field reconstruction (SFR) augments the information of a sound field captured by a microphone array. Using basis function decomposition, conventional SFR methods are straightforward and computationally efficient but may require more microphones than needed to measure the sound field. Recent studies show that pure data-driven and learning-based methods are promising in some SFR tasks, but they are usually computationally heavy and may fail to reconstruct a physically valid sound field. This paper proposes a compact acoustics-informed neural network (AINN) method for SFR, whereby the Helmholtz equation is exploited to regularize the neural network. As opposed to pure data-driven approaches that solely rely on measured sound pressures, the integration of the Helmholtz equation improves robustness of the neural network against variations during the measurement processes and prompts the generation of physically valid reconstructions. The AINN is designed to be compact and able to predict not only the sound pressures but also sound pressure gradients within a spatial region of interest based on measured sound pressures along the boundary. Experiments with acoustic transfer functions measured in different environments demonstrate the superiority of the AINN method over the traditional cylindrical harmonics and singular value decomposition methods.

## I. INTRODUCTION

Microphone arrays are commonly used to measure a sound field and maximize the information captured about the source (Benesty et al., 2008). A large-aperture array with densely spaced microphones is preferred for sound field measurements; however, this is not always possible because of practical considerations such as cost and microphone arrangement (Rafaely, 2015). This necessitates sound field reconstruction (SFR; Fernandez-Grande, 2016; Williams, 1999; Zhang et al., 2008), a task that aims to reconstruct a sound field away from the limited (sparse) measurement locations.

Existing SFR methods can be broadly classified into two categories: conventional methods based on basis function decomposition and recent learning-based methods. The conventional methods decompose measured sound fields into basis functions, such as cylindrical harmonics (CHs) (Williams, 1999), spherical harmonics (Chen et al., 2015; Tang et al., 2022; Verburg and Fernandez-Grande, 2018; Wabnitz et al., 2011; Williams, 1999), prolate spheroidal wave functions (Zhang et al., 2023), and plane waves (Antonello et al., 2017; Fernandez-Grande, 2016; Schmid et al., 2021; Williams, 1999). The basis functions are solutions of the Helmholtz equation (Skudrzyk, 2012; Williams, 1999), the governing partial differential equation (PDE) of time-harmonic wave propagation, and are continuous spatial functions that can be evaluated at arbitrary positions. These two factors make the conventional methods easy to compute and able to generate a physically valid reconstruction of the sound field away from the measurement positions. However, the basis functions are designed with respect to particular coordinate systems (Williams, 1999) without considering the statistical characteristics of sound fields. Thus, conventional methods may require more than the necessary number of measurements (spatial sampling points) to determine the basis function weights and reconstruct a sound field. In fact, the statistical characteristics of a sound field can be exploited to reduce the number of sampling points, based on singular value decomposition (SVD; Zhu et al., 2020, 2021), compressive sensing (Verburg and Fernandez-Grande, 2018; Wabnitz et al., 2011), statistical learning (Hahmann et al., 2021), or Bayesian inference (Schmid et al., 2021).

In contrast to conventional methods, recent learning-based techniques do not rely on predesigned basis functions. Instead, they exploit the learned statistical characteristics of sound fields for SFR. Lluis et al. (2020) and Kristoffersen et al. (2021) developed U-net-like neural networks, which were trained with simulated or measured room impulse responses (RIRs). The U-net-like neural networks achieved superior SFR performance over some of the conventional methods in the low frequency range (<300 Hz). Hahmann et al. (2021) proposed to learn basis functions in local subdomains that are subsequently generalized across different rooms and frequencies, which showed potential for modeling complex sound fields according to their local (spatial) or statistical characteristics. By further enforcing self-similarity between adjacent local subdomains (Hahmann and Fernandez-Grande, 2022), the method attained better SFR performance when few measurements were available. Most recently, Fernandez-Grande et al. (2023) proposed using generative adversarial networks for SFR to recover some of the sound field energy at high frequencies that would otherwise be lost due to under-sampling, demonstrating the promise of statistical learning methods for overcoming sampling limitations. Although the learning-based methods outperformed the conventional methods in some SFR tasks, their computations are time-consuming. Furthermore, they are purely data-driven and, thus, do not necessarily reconstruct physically valid sound fields (Fernandez-Grande et al., 2023).

Recently, physical laws have been integrated into neural networks for various acoustic studies, such as the Kirchhoff–Helmholtz-based convolutional neural network (CNN) for nearfield acoustic holography (Olivieri et al., 2021), the physics-informed convolutional neural network (PI-CNN) for sound field estimation (Shigemi et al., 2022), the physics-informed neural network (PINN) for RIR reconstruction (Karakonstantis et al., 2024; Pezzoli et al., 2023), the PINN for acoustic boundary admittance estimation (Schmid et al., 2024), the DeepONet-based method for sound propagation simulations with moving sources (Borrel-Jensen et al., 2024), and the Fourier neural network-based method for estimating the sound field due to a source whose position is unknown (Middleton et al., 2023). Most of these studies (Karakonstantis et al., 2024; Olivieri et al., 2021; Pezzoli et al., 2023; Shigemi et al., 2022) attempted to reconstruct the sound field within a region of interest (ROI) from a few measurements inside the ROI.

In an alternative approach, this paper proposes an acoustics-informed neural network (AINN) to reconstruct the sound field within the ROI based on the sound pressures measured on its boundary (Cuomo et al., 2022; Raissi et al., 2019). The AINN is designed to approximate the sound field at the measurement positions and is guided by the Helmholtz equation to generate physically valid reconstructions away from the measurement positions. The AINN is designed in the frequency domain, and its size, i.e., the number of neurons in the hidden layers, is determined based on a physical principle. The AINN is compact and lightweight, making it easier to train than large neural networks. In addition, owing to the automatic differentiation provided by deep-learning libraries (Pezzoli et al., 2023), the AINN is able to reconstruct not only sound pressures but also their gradients within the ROI. Experiments with transfer functions measured with two microphone arrays in three different rooms (Zhao et al., 2022) are conducted to compare the proposed method with the conventional CH (Williams, 1999) and SVD (Zhu et al., 2021) methods. Experimental results demonstrate the superiority of the proposed AINN method over the CH and SVD methods.

The remainder of this paper is organized as follows. The problem is formulated in Sec. II. The CH and SVD methods are reviewed in Sec. III, followed by the proposed AINN method. Numerical experiments are presented in Sec. IV to validate the performance of the proposed AINN method in comparison to the CH and the SVD methods. Section V concludes this work.

## II. PROBLEM FORMULATION

The problem of interest is illustrated in Fig. 1, where (*x*, *y*) and $(r,\varphi)$ denote the Cartesian and polar coordinates with respect to the coordinate origin *O*, respectively. The stars denote sound sources that generate a sound field in the ROI, which is depicted as the gray area. An array of microphones on the boundary of the ROI, displayed as the dots in Fig. 1, measures the sound pressures at $\{x_q,y_q\}_{q=1}^{Q}$ [or $\{r_q,\varphi_q\}_{q=1}^{Q}$] as $\{P(\omega,x_q,y_q)\}_{q=1}^{Q}$ [or, equivalently, $\{P(\omega,r_q,\varphi_q)\}_{q=1}^{Q}$], where *Q* is the number of measurement points and $\omega = 2\pi f$ is the angular frequency, with *f* the frequency. The objective is to estimate the sound pressures and their gradients inside the ROI based on the measured sound pressures on the boundary. Hereafter, the symbol *ω* is omitted from some quantities for notational simplicity.

*c* = 340 m/s is the speed of sound in air at room temperature, and $\nabla^2$ denotes the Laplacian operator. In Cartesian coordinates, the Laplacian operator is given by (Williams, 1999)

In this paper, we build a compact neural network informed by the Helmholtz equation [Eq. (1)] to reconstruct the sound field within the ROI based on the microphone measurements.
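As a sanity check on the role of the Helmholtz equation [Eq. (1)], the sketch below verifies numerically that a plane wave satisfies it. This is an illustrative aside, not part of the paper's method: a second-order finite-difference stencil stands in for the analytic Laplacian, and the frequency and propagation direction are arbitrary.

```python
import numpy as np

# Numerically verify that a plane wave p(x, y) = exp(-i k (x cos(theta) + y sin(theta)))
# satisfies the Helmholtz equation (nabla^2 + k^2) p = 0.
# The Laplacian is approximated with second-order central differences.

c = 340.0               # speed of sound (m/s), as in the paper
f = 1000.0              # illustrative frequency (Hz)
k = 2 * np.pi * f / c   # wavenumber
theta = 0.3             # arbitrary propagation direction (rad)
h = 1e-4                # finite-difference step (m)

def p(x, y):
    return np.exp(-1j * k * (x * np.cos(theta) + y * np.sin(theta)))

def helmholtz_residual(x, y):
    # 5-point stencil approximation of the Laplacian, plus k^2 p.
    lap = (p(x + h, y) + p(x - h, y) + p(x, y + h) + p(x, y - h) - 4 * p(x, y)) / h**2
    return lap + k**2 * p(x, y)

# The residual should vanish (up to discretization error) at any point;
# compare its magnitude with k^2 ~ 341.
print(abs(helmholtz_residual(0.05, -0.02)))
```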

## III. METHODOLOGY

This section first reviews the CH method (Williams, 1999) and the SVD method (Zhu et al., 2021) for SFR and subsequently proposes the AINN method.

### A. The CH method

*n*th-order CH (Williams, 1999) evaluated at $(r_q,\varphi_q)$,

*i* is the imaginary unit, and $J_n(\cdot)$ is the Bessel function of order *n* (Williams, 1999). In Eqs. (4) and (5),

*N* is the dimensionality of the sound field under CH decomposition and is normally chosen as (Kennedy et al., 2007)

*x* axis and *y* axis at $(x_e, y_e)$ can be reconstructed as

### B. The SVD method

$(x_s, y_s)$ as a cluster of virtual point sources whose positions are $\{x_{s,j}, y_{s,j}\}_{j=1}^{J}$ and constructs two matrices with respect to the virtual point sources. The first is a matrix of transfer functions between the virtual point sources and microphones such that

*q*th row and *j*th column entry, $H(x_q,y_q,x_{s,j},y_{s,j})$, is the free-field transfer function between the virtual point source located at $(x_{s,j},y_{s,j})$ and the microphone located at $(x_q,y_q)$, i.e. (Williams, 1999),

*v*th row and *j*th column entry, $H(x_v,y_v,x_{s,j},y_{s,j})$, is the free-field transfer function between the virtual point source located at $(x_{s,j},y_{s,j})$ and the pressure estimation point located at $(x_v,y_v)$, which is defined similarly as in Eq. (13). The two matrices are decomposed as (Zhu et al., 2021)
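Although Eqs. (12)–(20) are not reproduced in this excerpt, the SVD pipeline described above can be sketched end to end. Everything below is an illustrative assumption rather than the paper's exact implementation: the transfer function is taken as the 3-D free-field Green's function $e^{-ikd}/(4\pi d)$, the geometry (28 boundary microphones of 0.1 m radius, 120 virtual sources clustered around a source 1.5 m away) loosely mimics the experiments, and the virtual-source weights are obtained with a truncated-SVD pseudoinverse.

```python
import numpy as np

# Sketch of the SVD method: model the source as a cluster of virtual point
# sources, solve for their weights from boundary-microphone pressures, and
# re-radiate to interior estimation points.
rng = np.random.default_rng(0)
c, f = 340.0, 2000.0
k = 2 * np.pi * f / c

def transfer(rx, ry, sx, sy):
    # Assumed free-field transfer function (3-D Green's function form).
    d = np.hypot(rx - sx, ry - sy)
    return np.exp(-1j * k * d) / (4 * np.pi * d)

# Virtual point sources clustered around a nominal source position (1.5, 0).
src = np.array([1.5, 0.0])
vs = src + rng.uniform(-0.05, 0.05, size=(120, 2))

# Q = 28 boundary microphones and V = 36 interior estimation points.
phi_q = np.linspace(0, 2 * np.pi, 28, endpoint=False)
mics = 0.1 * np.stack([np.cos(phi_q), np.sin(phi_q)], axis=1)
est = rng.uniform(-0.07, 0.07, size=(36, 2))

H_mic = transfer(mics[:, :1], mics[:, 1:], vs[:, 0], vs[:, 1])   # Q x J
H_est = transfer(est[:, :1], est[:, 1:], vs[:, 0], vs[:, 1])     # V x J

# Simulated measurement: the true source coincides with one virtual source.
p_mic = H_mic[:, 0]
p_true = H_est[:, 0]

# Truncated-SVD solve of H_mic w = p_mic, then re-radiation H_est w.
U, s, Vh = np.linalg.svd(H_mic, full_matrices=False)
keep = s > 1e-6 * s[0]                       # discard tiny singular values
w = (Vh[keep].conj().T / s[keep]) @ (U[:, keep].conj().T @ p_mic)
p_hat = H_est @ w

err = np.linalg.norm(p_hat - p_true) / np.linalg.norm(p_true)
print(err)  # small: the true field lies in the span of the virtual sources
```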

### C. The AINN

This section proposes an AINN method for SFR. Two designs of the AINN, depending on whether the real and imaginary parts of the sound pressure are modeled separately or collaboratively, are investigated in this paper. In the first design, the real and imaginary parts of the sound pressure are modeled with a single network and a single loss function, as shown in Fig. 2(a). In the second design, by contrast, the real and imaginary parts of the sound pressure are modeled separately with two small networks with individual loss functions, as illustrated in Fig. 2(b). Hereinafter, the two designs are referred to as the coupled acoustics-informed neural network (cAINN) and decoupled acoustics-informed neural network (dAINN).

To model both the real and imaginary parts of the sound pressure, the single-network design, the cAINN, needs more expressive power and, hence, more hidden layers or more neurons in each hidden layer (Goodfellow et al., 2016). As shown in Figs. 2(a) and 2(b), the additional neuron connections between the two hidden layers make the cAINN more complicated than the dAINN. If, in Fig. 2(a), the *N* neurons in the upper half of the network are disconnected from the other *N* neurons in the lower half, the cAINN becomes identical to the simpler dAINN when two losses are used to train the network separately. Although, in theory, the cAINN is capable of jointly exploiting the real and imaginary parts of the sound pressure during training and, therefore, may achieve better SFR performance, its training is more complicated and may not attain the desired performance in practice.

(*x*, *y*) and the outputs are the real and imaginary parts of the reconstructed sound pressure, denoted as $N^{\Re}(x,y)$ and $N^{\Im}(x,y)$, respectively. Similar to the conventional data-driven methods, a data loss is used to minimize the differences between the reconstructed and measured sound pressures at the measurement locations. The data loss for the real parts of the sound pressures, $L_d^{\Re}$, is given by

*D* positions within the ROI at $\{x_d,y_d\}_{d=1}^{D}$ and referring to Eqs. (1) and (2), the PDE loss for the real part of the sound pressure, $L_p^{\Re}$, is given by

*D* should be chosen to be sufficiently large such that the distance between two adjacent PDE loss calculation points is no more than one-tenth of the wavelength of interest. The data loss and PDE loss are combined to calculate the total loss. The definitions of the imaginary-part data loss $L_d^{\Im}$ and PDE loss $L_p^{\Im}$ are similar to Eqs. (21) and (22), respectively, and are omitted here for brevity.
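A minimal numerical sketch of the two losses, assuming a mean-squared-error form for Eqs. (21) and (22). A finite-difference Laplacian stands in for the automatic differentiation used in the paper, and the "network" is replaced by an exact plane wave so that both losses have a known expected value (numerically zero):

```python
import numpy as np

# Sketch of the real-part data loss and PDE loss. The network output is a
# stand-in: an exact Helmholtz solution, for which both losses should vanish.
c, f = 340.0, 2000.0
k = 2 * np.pi * f / c

def net_re(x, y):                 # stand-in for the network output N^R(x, y)
    return np.cos(-k * (0.6 * x + 0.8 * y))   # plane wave, unit direction

def data_loss(xq, yq, p_meas_re):
    return np.mean((net_re(xq, yq) - p_meas_re) ** 2)

def pde_loss(xd, yd, h=1e-4):
    # Finite-difference Laplacian in place of automatic differentiation.
    lap = (net_re(xd + h, yd) + net_re(xd - h, yd)
           + net_re(xd, yd + h) + net_re(xd, yd - h)
           - 4 * net_re(xd, yd)) / h**2
    # Divide by (omega/c)^2 = k^2 so the residual has the dimension of pressure.
    return np.mean(((lap + k**2 * net_re(xd, yd)) / k**2) ** 2)

# Data points on one edge of a 0.28 m square ROI (for brevity) and
# a 29 x 29 interior grid with 0.01 m spacing (D = 841, as in Sec. IV B).
q = np.linspace(-0.14, 0.14, 28)
xq, yq = q, np.full_like(q, 0.14)
xd, yd = np.meshgrid(np.linspace(-0.14, 0.14, 29), np.linspace(-0.14, 0.14, 29))

Ld = data_loss(xq, yq, np.cos(-k * (0.6 * xq + 0.8 * yq)))
Lp = pde_loss(xd.ravel(), yd.ravel())
print(Ld, Lp)  # both ~ 0 for an exact Helmholtz solution
```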

*N* neurons in each hidden layer. The trainable parameters of the network are updated to minimize a single total loss function, i.e.,

*N* neurons in each hidden layer. The trainable parameters of the two networks are updated to minimize the real-part total loss,

$(x_e, y_e)$ as $N(x_e,y_e)=N^{\Re}(x_e,y_e)+iN^{\Im}(x_e,y_e)$. The sound pressure gradient at that position can be reconstructed as $\partial N(x_e,y_e)/\partial x_e$ along the *x* direction and as $\partial N(x_e,y_e)/\partial y_e$ along the *y* direction through differentiation of the network output. The pressure gradient along the radial direction can be reconstructed as
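The radial-gradient expression referred to above presumably follows from the chain rule, $\partial N/\partial r = \cos\varphi\,\partial N/\partial x + \sin\varphi\,\partial N/\partial y$. A quick numerical check of this relation, using an analytic plane wave in place of the network output:

```python
import numpy as np

# Chain-rule radial gradient check with a plane wave p(x, y) = exp(-i k x),
# whose exact Cartesian gradients are dp/dx = -i k p and dp/dy = 0.
c, f = 340.0, 1000.0
k = 2 * np.pi * f / c

def p(x, y):
    return np.exp(-1j * k * x)

r, phi = 0.1, 0.7                    # an evaluation point in polar coordinates
x, y = r * np.cos(phi), r * np.sin(phi)

dpdx, dpdy = -1j * k * p(x, y), 0.0  # exact Cartesian gradients
dpdr = np.cos(phi) * dpdx + np.sin(phi) * dpdy   # chain rule

# Reference: central difference along the radial direction.
h = 1e-6
dpdr_fd = (p((r + h) * np.cos(phi), (r + h) * np.sin(phi))
           - p((r - h) * np.cos(phi), (r - h) * np.sin(phi))) / (2 * h)
print(abs(dpdr - dpdr_fd))  # ~ 0
```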

Here are some comments on the proposed AINN method and its recommended configurations.

#### 1. Using tanh as the activation function

We use tanh as the activation function for two reasons. First, tanh is a smooth function whose second-order gradient can be computed, which is necessary for the Laplacian operators in Eqs. (2) and (3). Second, tanh outputs positive or negative values according to its input, which makes it easier to model sound pressure, whose value can be either positive or negative. Sinusoidal functions also meet these two criteria and, thus, could also be used as the activation function. However, our trials indicated that tanh yields better results than sin, so we choose tanh over sin in this paper.
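The smoothness claim can be checked directly: tanh has the closed-form second derivative $-2\tanh(x)\,(1-\tanh^2(x))$, which is what the Laplacian in the PDE loss differentiates through. A quick finite-difference verification of this identity:

```python
import numpy as np

# Verify d^2/dx^2 tanh(x) = -2 tanh(x) (1 - tanh(x)^2) numerically.
x = np.linspace(-3, 3, 101)
h = 1e-5
fd2 = (np.tanh(x + h) - 2 * np.tanh(x) + np.tanh(x - h)) / h**2   # central difference
analytic = -2 * np.tanh(x) * (1 - np.tanh(x) ** 2)                # closed form
print(np.max(np.abs(fd2 - analytic)))  # small (finite-difference error only)
```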

#### 2. Cartesian coordinates vs polar coordinates

For the AINN method, we express sound pressures in Cartesian coordinates instead of polar coordinates for two reasons. First, the presence of the $1/r$ term can make the polar-coordinate Laplacian operator in Eq. (3) numerically unstable. Second, when a circular microphone array is used, no information about the sound field variation along the radial direction can be measured. In this case, the AINN method is unable to accurately estimate the first- and second-order radial gradients needed for calculating the Laplacian operator in polar coordinates [Eq. (3)].

#### 3. Loss function

The loss functions in Eqs. (23)–(25) consist of the data loss and the PDE loss. The data loss prompts the network output to approximate the measured sound pressures at the positions $\{x_q,y_q\}_{q=1}^{Q}$ on the boundary of the ROI, as shown in Fig. 1. The PDE loss, on the other hand, regularizes the network output to conform with the Helmholtz equation at the positions $\{x_d,y_d\}_{d=1}^{D}$ on the boundary of and within the ROI. *D* should be chosen to be sufficiently large that the distance between adjacent positions is at most half (ideally, one-tenth) of the wavelength at the frequency of interest. It is noted that when calculating the PDE loss in Eq. (22), the Helmholtz equation was divided by the term $(\omega/c)^2$ such that the residual has a physical dimension consistent with sound pressure. Therefore, theoretically, there is no need to further balance the data loss and PDE loss with an extra weighting factor.

#### 4. Neuron number

As displayed in Eq. (8), the sound pressure can be expressed as a linear combination of the $2N+1$ CHs $\{J_n(kr)e^{-in\varphi}\}_{n=-N}^{N}$, which are solutions of the Helmholtz equation [Eq. (1)]. As depicted in Figs. 2(a) and 2(b), the very same sound pressure can also be expressed as a linear combination of the outputs of a number of neurons. This fact inspires us to set the number of neurons in the hidden layers according to the CH decomposition of the sound pressure. Specifically, for the cAINN (the single-network design), the neuron number is set to 2*N*, and for the dAINN (the two-network design), the neuron number is set to *N*, where *N* can be calculated from Eq. (6). Based on Eq. (6), for a sound field within a circular ROI of radius *r* = 0.1 m, the numbers of trainable parameters of the dAINN are listed in Table I for *L* = 2 hidden layers. As shown in Table I, a dAINN with no more than 200 trainable parameters is sufficient for modeling a sound field of *r* = 0.1 m radius up to 4 kHz, considering both the real and imaginary parts of the complex sound pressure. To model a sound field of a different size or at other frequencies, we can design the cAINN and dAINN based on Eq. (6) accordingly. The AINN is compact in comparison to other learning-based methods and, thus, is easier and faster to train. For example, Karakonstantis et al. (2024) employed a network with five hidden layers, each with 512 neurons. It is worth noting that, unlike the time-domain approach in Karakonstantis et al. (2024), the proposed AINN works in the frequency domain. To reconstruct a time-domain sound field, multiple compact AINNs, one per frequency bin, can be trained in parallel. It is also noted that this paper designed the AINN architecture and parameters based on the CH expansion order, aiming to make the network more explainable. Hyperparameter optimization of the AINN may lead to better performance but is not covered in this paper and will be investigated in the future.
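Eq. (6) itself is not reproduced in this excerpt; a truncation rule consistent with both Table I (*r* = 0.1 m) and the dimensionalities quoted in Sec. IV B is $N = \lceil kr \rceil$. A sketch under that assumption:

```python
import math

def ch_truncation_order(f, r, c=340.0):
    """Truncation order N for the CH decomposition of a sound field of
    radius r (m) at frequency f (Hz), assuming Eq. (6) has the form
    N = ceil(k r) with wavenumber k = 2*pi*f/c."""
    k = 2 * math.pi * f / c
    return math.ceil(k * r)

# Reproduce the N column of Table I (r = 0.1 m, f = 1...4 kHz):
print([ch_truncation_order(f, 0.1) for f in (1000, 2000, 3000, 4000)])  # [2, 4, 6, 8]
```

With *r* = 0.14 m (half the 0.28 m square used in Sec. IV B), the same rule gives *N* = 3, 6, 8 at 1, 2, 3 kHz, matching the planar-array dimensionalities quoted there.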

| Frequency | Number of layers, *L* | Number of neurons, *N* | Number of trainable parameters |
|---|---|---|---|
| *f* = 1 kHz | 2 | 2 | 11 |
| *f* = 2 kHz | 2 | 4 | 29 |
| *f* = 3 kHz | 2 | 6 | 57 |
| *f* = 4 kHz | 2 | 8 | 99 |

## IV. EXPERIMENTS

Experiments were conducted to validate the performance of the proposed AINN method and compare it with the CH and SVD methods.

### A. Data processing

The SFR methods were evaluated using the University of Technology Sydney (UTS) multi-zone sound field reproduction dataset (Zhao , 2022). The measurement setup is shown in Figs. 3 and 4. The RIRs between a loudspeaker array [Fig. 3(a)] and two microphone arrays [Figs. 3(b) and 3(c)] were measured in an anechoic chamber [Fig. 4(a)], a medium meeting room [Fig. 4(b)], and a small meeting room [Fig. 4(c)]. The microphone arrays were placed at the center of the circular array of 60 loudspeakers. The radius of the loudspeaker array is 1.5 m.

*R* = 1.5 m and $l = 1, 2, \ldots, 60$.

The loudspeaker and microphone arrays were connected to a 64-input, 64-output audio interface consisting of four Yamaha RIO1608-D2 and four Yamaha RIO8-D units (Yamaha, Shizuoka, Japan). The audio interface was controlled by a MATLAB program (MathWorks, Natick, MA) through a Dante virtual sound card (Sydney, Australia). A logarithmic sine sweep of 3 s duration covering 20 Hz to 20 kHz was played through each loudspeaker. The microphone recordings were processed to obtain the RIRs, which are 43 680 taps long at a sampling rate of 48 kHz. The RIRs were transformed into frequency-domain transfer functions through the discrete Fourier transform (Oppenheim et al., 1997), resulting in the complex sound pressures used in the experiments here. For more details about the measurement, please refer to Zhao et al. (2022).
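The conversion from RIRs to complex pressures can be sketched as follows. The 48 kHz sampling rate and 43 680-tap length follow the paper; the delta-function RIR is only a stand-in to make the result easy to check, and the nearest-bin selection helper is an assumed detail, not the paper's documented procedure:

```python
import numpy as np

# Obtain frequency-domain transfer functions from an RIR via the DFT and
# pick the complex pressure at the bin nearest a frequency of interest.
fs = 48000
n_taps = 43680
rir = np.zeros(n_taps)
rir[0] = 1.0                         # ideal (flat-spectrum) impulse response

spectrum = np.fft.rfft(rir)          # transfer function on the DFT grid
freqs = np.fft.rfftfreq(n_taps, d=1 / fs)

def pressure_at(f):
    """Complex pressure (transfer function value) at the bin nearest f Hz."""
    return spectrum[np.argmin(np.abs(freqs - f))]

print(abs(pressure_at(1000.0)))  # 1.0 for the ideal impulse
```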

### B. Implementation

The CH method was implemented based on Eqs. (4)–(11). According to Eq. (6), the dimensionalities of the sound field within the planar array are *N* = 3, 6, 8 for *f* = 1, 2, 3 kHz, respectively, and the dimensionalities of the sound field within the dual-circular array are *N* = 3, 5, 7 for *f* = 1, 2, 3 kHz, respectively.

The SVD method was implemented based on Eqs. (12)–(20). For a source (loudspeaker) located at $(x_s, y_s)$, the virtual point sources are uniformly arranged around the source in a 0.1 m × 0.1 m square, with a distance of 0.01 m between adjacent virtual point sources, amounting to 120 virtual point sources in total. The arrangement of the sources for the SVD method is shown in Fig. 5. It is noted that in the implementations of the CH and SVD methods according to Eqs. (7) and (17), respectively, no regularization was employed. For the CH method, the condition numbers of the matrix **J** in Eq. (7) for the current experimental setup are less than ten; therefore, regularization is not necessary. For the SVD method, incorporating regularization does not provide a sensible improvement over the current implementation.

To implement the proposed AINN method, we used the TensorFlow library and initialized the trainable parameters according to the Xavier initialization (Glorot and Bengio, 2010). The ADAM algorithm (Kingma and Ba, 2014), with a learning rate of 0.001, was used as the optimizer. The neuron number was set based on the dimensionality of the sound field under CH decomposition. The hidden layer number was set as one for *f* = 1 kHz and as two for *f* = 2, 3 kHz based on a trial-and-error process.

The data loss was calculated with respect to the measured sound pressures $P_M$ at the corresponding coordinates $\{x_q,y_q\}_{q=1}^{Q}$. The PDE loss was calculated with respect to coordinates $\{x_d,y_d\}_{d=1}^{D}$ on the boundary of and within the ROI, which were sampled with a uniform interval of 0.01 m between adjacent points, as depicted in Fig. 6. The AINN method was trained for 10^{5} epochs, and no early-stopping strategy was implemented. At each epoch, $\{P_M, \{x_q,y_q\}_{q=1}^{Q}, \{x_d,y_d\}_{d=1}^{D}\}$ were fed to the AINN all at once (no batching). The specific values of *Q* and *D* depend on the experimental setup. For the sound pressure reconstruction experiments in Secs. IV D and IV E, *Q* = 28 because 28 measurement microphones are used, and *D* = 841 for the spatial points uniformly sampled with an interval of 0.01 m within the 0.28 m square (Fig. 6). For the pressure gradient reconstruction experiments in Secs. IV F and IV G, *Q* = 30 because 30 measurement microphones are used, and *D* = 625 for the spatial points uniformly sampled with an interval of 0.01 m within the 0.24 m square around the circular microphone array. The AINN was trained on an NVIDIA RTX 6000 graphics processing unit (GPU) (NVIDIA, Santa Clara, CA) with 24 GB of random access memory (RAM), and the training of each AINN for a single frequency took about 3 min.

In Secs. IV C–IV G, the performance of the proposed AINN method is compared to that of the CH and SVD methods when the same amount of data are available for all three methods, which is a fair comparison. It would also be interesting to compare the amount of training data required by these methods to reconstruct a sound field with a certain accuracy, which is supposed to depend on frequency and the accuracy threshold. This will be investigated in future work.

### C. Performance metrics

*E* is the total number of reconstruction positions. The pressure gradient reconstruction error was defined similarly to Eq. (34). The logarithmic error measure was chosen because it can represent a wide range of values compactly.
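Eq. (34) is not reproduced in this excerpt; a common form consistent with the description (logarithmic, normalized over the *E* reconstruction positions) is the reconstruction-error energy divided by the true-field energy, in decibels. This is sketched below as an assumption:

```python
import numpy as np

# Assumed form of the reconstruction error of Eq. (34): normalized error
# energy over all E reconstruction positions, expressed in dB.
def reconstruction_error_db(p_true, p_hat):
    num = np.sum(np.abs(p_true - p_hat) ** 2)
    den = np.sum(np.abs(p_true) ** 2)
    return 10 * np.log10(num / den)

p = np.ones(36, dtype=complex)        # E = 36 reconstruction positions
p_hat = p + 0.1                       # uniform 10% amplitude error
print(reconstruction_error_db(p, p_hat))  # ~ -20 dB
```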

### D. Sound pressure reconstruction: Loudspeaker 7

Based on the sound pressures measured by the 28 exterior microphones of the planar array [Fig. 3(b)], we reconstructed the sound pressures at the 36 interior microphones within the array. Figures 7–9 show the real parts of the sound pressure due to loudspeaker 7 [the black loudspeaker in Fig. 3(a)] at 1, 2, and 3 kHz in the anechoic chamber, the medium room, and small room, respectively. Figures 7–9 also show the reconstructions by the CH method, SVD method, AINN method, and corresponding (real and imaginary parts) reconstruction errors *ξ*. The results for the imaginary parts are similar and are not shown here for brevity.

As shown in Figs. 7–9, the dAINN (two-network design) outperforms the cAINN (single-network design) at all frequencies and in all room environments. Experiments for the cAINN with more or fewer neurons in the hidden layers, i.e., $\lceil 3N/2 \rceil$ or $3N$, were also conducted; the results were also inferior to those of the dAINN and, thus, are not shown for brevity. Hereinafter, we focus only on the dAINN.

As shown in the second and third rows of Figs. 7–9, the SVD method performs better than the CH method in most of the cases for *f* = 2, 3 kHz, with a reduction in the overall reconstruction error of 1.0–2.6 dB at 2 kHz and 1.2–4.1 dB at 3 kHz across the three rooms. This could be attributed to the prior information about the sound source location used by the SVD method (Zhu et al., 2020, 2021), or to the noncircular/noncylindrical geometry of the array, which does not allow the CH method to work effectively. In contrast, the proposed dAINN outperforms the SVD method in all of the tested rooms at *f* = 2, 3 kHz, even though the dAINN requires no prior information about the sound source location. This could be attributed to the fact that the CH and SVD methods relied only on the sound pressures measured on the edge of the planar array for SFR; the measured pressures did not necessarily contain sufficient information to fully determine the sound field within the planar array. The dAINN, on the other hand, exploited the Helmholtz equation to regularize the SFR within the array through the PDE loss and reconstructed the sound field within the planar array more accurately.

As shown in the first column of Fig. 7, at 1 kHz, all methods achieve reconstruction errors lower than −10 dB in the anechoic chamber. However, Figs. 8(a1)–8(a4) and 9(a11)–9(a42) show that in the medium and small rooms, the pressure reconstruction accuracy of all of the methods is poor. Specifically, the SVD, cAINN, and dAINN methods yield pressure reconstruction errors larger than 0 dB in the medium room. This may be attributed to the fact that in the medium [Fig. 8(a0)] and small [Fig. 9(a0)] rooms, under the particular loudspeaker-microphone array setup, the interference between the direct sound and the room-reflected sound leaves the sound field within the planar array with no clear wave front as in the anechoic chamber [Fig. 7(a0)], which prevents the methods from accurately reconstructing the sound field. An extension of this work is to exploit knowledge of the sound source position(s) to further improve the performance of the AINN method, which will be investigated in the future.

Figure 10 further shows the difference, i.e., $P\u211c(xe,ye)\u2212P\u0302\u211c(xe,ye)$, between the ground truth and the reconstructions by different methods in the small meeting room. It can be observed that the difference is close to zero on the boundary of the microphone array but is larger in the interior of the microphone array. The results are expected as the pressures on the boundary are known and used in the calculations or training. Larger errors are observed at 1 kHz than at 2 kHz and 3 kHz, which is consistent with Fig. 9.

Figure 11 depicts the learning curves for real/imaginary-part data and PDE losses of the dAINN method in the medium room at 2 kHz, which corresponds to Fig. 6(b4). The learning curves of the dAINN method at other frequencies and other room environments are similar to those in Fig. 11 and, thus, are not shown here for brevity. As displayed in Fig. 11, the learning curves converge after 50 000 epochs of training. After convergence, the difference between the data and PDE losses for the real part is $|Ld\u211c\u2212Lp\u211c|\u22484$ dB, and the corresponding difference for the imaginary part is $|Ld\u2111\u2212Lp\u2111|\u22483$ dB.

### E. Sound pressure reconstruction: All loudspeakers

The SFR experiment was further conducted for other loudspeakers with one loudspeaker working each time. For each loudspeaker, we set the transfer functions between it and the planar array as the sound pressure. We used the sound pressure measured by the exterior 28 microphones to reconstruct the sound pressure at the interior 36 microphones within the planar array. Figure 12 compares the sound pressure reconstruction errors of the CH, SVD, and dAINN methods in the three rooms for all loudspeakers at 3 kHz.

The dAINN achieves the smallest average reconstruction errors of –12.5, –8, and –7.2 dB in the anechoic chamber, medium room, and small room, respectively. Similar experiments were also conducted at 1 and 2 kHz. Except for larger reconstruction errors in the medium and small rooms at 1 kHz, the results were similar to those in Fig. 12 and, thus, are not shown for brevity. It is noted that Fig. 12 shows smaller reconstruction errors in the anechoic chamber than in the other two rooms. This is because the boundaries and scatterers in the other two rooms make the sound field complex, i.e., of high spatial frequency, and the AINN has a hard time learning high-frequency spatial features (Xu et al., 2022).

### F. Pressure gradient reconstruction: Loudspeaker 7

To examine the effectiveness of the pressure gradient approximation, we performed a simulation in an ideal free-field environment before experimenting with the transfer functions measured in the different rooms. The transfer functions between loudspeaker 7 and the dual-circular array were simulated using the free-field Green's function (Williams, 1999), with white Gaussian noise added at a signal-to-noise ratio (SNR) of 20 dB to model disturbances. The reconstruction errors are depicted in Fig. 13. As shown in Figs. 13(a1), 13(b1), and 13(c1), the reconstruction errors of the three methods are relatively small, i.e., <−10 dB, when using the simulated sound pressures for reconstruction, which demonstrates the effectiveness of the approach.

The experimental results for the transfer functions measured in the different rooms are shown in Figs. 13(a2)–13(c4). It can be observed that the reconstruction errors of all three methods degrade compared with the results for the simulated transfer functions. The CH and SVD methods exhibit significant deviations from the ground truth in the medium and small rooms. In contrast, by exploiting the data-independent Helmholtz equation for regularization, the dAINN method is less susceptible to the inherent disturbances of the measurement processes and, thus, achieves the smallest pressure gradient reconstruction errors in most cases.

### G. Pressure gradient reconstruction: All loudspeakers

The pressure gradient reconstruction experiment was repeated over all 60 loudspeakers. For each loudspeaker, we set the transfer functions between it and the dual-circular array as the sound pressure, approximated the radial pressure gradient according to Eq. (35), and reconstructed the pressure gradient in the same way as in Sec. IV F.

Figure 14 shows the reconstruction errors for all loudspeakers in the three rooms for the CH, SVD, and dAINN methods at 3 kHz. The experimental results at 1 and 2 kHz were similar to those in Fig. 14 and, thus, are not shown for brevity. Comparing Fig. 14 with Fig. 12, we can see that the pressure gradient is more challenging to reconstruct than the sound pressure. The dAINN achieves the smallest average reconstruction errors of –11.5, –7.0, and –5.5 dB in the anechoic chamber, medium room, and small room, respectively. The average reconstruction errors of the SVD method are about 2 dB higher than those of the dAINN method. The reconstruction errors of the CH method are the worst and exceed 0 dB in the small room.

It is noted that in this paper, the pressure gradient was approximated by a finite difference [Eq. (35)]. This may not be accurate in some circumstances and may contribute to the relatively high reconstruction errors depicted in Figs. 13 and 14. A velocity sensor (De Bree, 2003) may be used to measure the pressure gradient and test the performance of the pressure gradient reconstruction in the future.
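Eq. (35) is not reproduced in this excerpt; a finite-difference approximation of the radial pressure gradient from a dual-circular array plausibly takes the form below, i.e., the pressure difference between the two circles divided by their radial separation, attributed to the mid radius. The radii and test field are illustrative assumptions:

```python
import numpy as np

# Finite-difference radial gradient (Eq.-(35)-style), checked against the
# exact derivative of a cylindrically symmetric test field p(r) = exp(-i k r).
c, f = 340.0, 1000.0
k = 2 * np.pi * f / c
r1, r2 = 0.09, 0.11                  # assumed radii of the two circles

def p(r):
    return np.exp(-1j * k * r)

grad_fd = (p(r2) - p(r1)) / (r2 - r1)        # finite-difference approximation
grad_exact = -1j * k * p((r1 + r2) / 2)      # exact gradient at the mid radius
print(abs(grad_fd - grad_exact) / abs(grad_exact))  # a few percent at most
```

The relative error grows with $k\,(r_2-r_1)$, which is consistent with the paper's observation that the finite-difference approximation contributes to the higher gradient reconstruction errors at higher frequencies.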

## V. CONCLUSION

This paper proposed a compact AINN method for SFR. A neural network was designed to approximate the measured sound pressure and obey the Helmholtz equation, which regularized the network to generate physically valid output at and beyond the measurement positions. The performance of the AINN method was validated by sound pressure and pressure gradient reconstruction experiments and outperformed the CH and SVD methods. The design of the AINN method, specifically its width and depth, is still empirical. Further theoretical investigation is needed to provide better guidance on the AINN design, which will be one of our future research topics.

## ACKNOWLEDGMENT

S.Z. is the recipient of an Australian Research Council Australian Early Career Industry Fellowship (Project No. IE240100245), funded by the Australian Government. Computational facilities were provided by the UTS eResearch High Performance Computer Cluster.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are openly available in UTS Stash at https://doi.org/10.26195/0wx8-v473, Zhao (2022).

## REFERENCES

Benesty, J., Chen, J., and Huang, Y. (2008). *Microphone Array Signal Processing* (Springer, Berlin).

Glorot, X., and Bengio, Y. (2010). "Understanding the difficulty of training deep feedforward neural networks," in *JMLR Workshop and Conference Proceedings*, May 13–15, 2010, Sardinia, Italy.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). *Deep Learning* (MIT Press, Cambridge, MA).

*Forum Acusticum*

Oppenheim, A. V., Willsky, A. S., and Nawab, S. H. (1997). *Signals and Systems* (Prentice Hall, Upper Saddle River, NJ).

Rafaely, B. (2015). *Fundamentals of Spherical Array Processing* (Springer, Berlin).

Skudrzyk, E. (2012). *The Foundations of Acoustics: Basic Mathematics and Basic Acoustics* (Springer, Vienna).

Williams, E. G. (1999). *Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography* (Academic Press, London).