Simulating room transfer functions between transducers mounted on audio devices using a modified image source method

The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency. The standard ISM is limited to simulations of room impulse responses between point sources and omnidirectional receivers. In this work, the ISM is extended using spherical harmonic directivity coefficients to include acoustic diffraction effects due to source and receiver transducers mounted on physical devices, which are typically encountered in practical situations. The proposed method is verified using finite element simulations of various loudspeaker and microphone configurations in a rectangular room. It is shown that the accuracy of the proposed method is related to the sizes, shapes, number, and positions of the devices inside a room. A simplified version of the proposed method, which can significantly reduce computational effort, is also presented. The proposed method and its simplified version can simulate room transfer functions more accurately than currently available image source methods and can aid the development and evaluation of speech and acoustic signal processing algorithms, including speech enhancement, acoustic scene analysis, and acoustic parameter estimation.

are implicitly modeled.However, because of the much higher computational cost of solving wave-based models, geometrical methods are often preferred.While assuming that sound waves behave like light rays does not apply to long wavelengths, Aretz et al. 37 have shown that the ISM can be used to approximate wave-based method solutions above the Schroeder frequency 38 in rectangular rooms by specifying angle-dependent reflection coefficients.Thus, the ISM can be used to obtain acceptable solutions across most of the audible frequency range above the Schroeder frequency.
The ISM has been implemented in many widely used toolboxes [39][40][41][42] , along with various approaches for reducing the computational complexity, for example, parallelization based on GPUs 42,43 .Directional properties of the source and receiver were also incorporated into the ISM to produce more realistic room impulse responses [39][40][41][44][45][46] , and efforts have been made to include acoustic diffraction effects due to the human head 44 and rigid spherical baffles 47,48 . Jarett et al. 47 extended the ISM to include rigid spherical microphones using an analytical expression of the diffraction effects in the spherical harmonic domain.Samarasinghe et al. 49 further extended the ISM, viz. the Generalized ISM (GISM), to facilitate parameterization of the reverberant transfer function between a directional source and a directional receiver.More recently, an extension of the ISM to incorporate loudspeaker directivities in the far field in the spherical harmonic domain has been reported 50 . Ho-ever, the existing approaches cannot simulate a transfer function between a directive source and a directive receiver that includes near-field diffraction effects.
In this work, the GISM 49 is extended to enable the incorporation of acoustic diffraction effects.First, we review the basic concepts of the ISM and its extension to the spherical harmonic domain in Sec.II.In Sec.III, we present our proposed room transfer function (RTF) parameterization, which utilizes the source directivity coefficients computed from solutions of an exterior radiation problem and receiver directivity coefficients obtained using reciprocity 51 .In addition, we derive a simplified version of the proposed method to reduce computational complexity.The Finite Element Method (FEM) is used to verify the proposed method by comparing RTFs with the source and receiver mounted on either different devices or on the same device.The experimental setups and the evaluation of the proposed method are presented in Sec.IV.The effect of varying the parameters of the device, e.g., its position in the room, physical shape, and size, on the accuracy of the proposed method is investigated.Finally, conclusions are drawn in Sec.V.

II. THE IMAGE SOURCE METHOD A. Standard image source method
The ISM was first applied to acoustics by Carslaw 52 and later implemented using a digital computer by Allen and Berkley 22 to model the impulse response between a point source and an omnidirectional receiver in a reverberant rectangular room.
To derive the standard ISM, consider a rectangular room with length, width, and height given by L x , L y and L z , as well as an acoustic point source and an omidirectional receiver inside of the room at positions x s = (x s , y s , z s ) and x r = (x r , y r , z r ), given in Cartesian coordinates, respectively.The RTF can be expressed as a weighted sum of free-space Green's functions attenuated by the walls, that is: where k is the wavenumber.The triples p and q determine the position of the images and the attenuation factor β(p, q).The triple p = (p x , p y , p z ) is used to determine whether the image source is mirrored w.r.t.walls at x = 0, y = 0 or z = 0, and its elements take values of 0 or 1, resulting in a set P of eight combinations.Each element of the other triple q = (q x , q y , q z ) takes a value between [−N m , N m ], and it includes higher order reflections, where N m is used to limit computational complexity and circular convolution errors 47 , resulting in a set Q of (2N m + 1) 3 combinations.β(p, q) = β and β x1 , β x2 , β y1 , β y2 , β z1 , β z2 are the angle-independent reflection coefficients of the walls, and the subscripts a 1 , a 2 (a ∈ {x, y, z}) of β (•) correspond to the boundaries at a = 0 and the boundaries at a = L a , respectively.For an image source with position the Green's function can be written as where i denotes the imaginary unit.The vector pointing from every image of the source to the receiver is expressed as B. Extensions of the standard ISM Many extensions to the ISM have been proposed to facilitate more realistic modeling of room acoustics.In this section, we discuss two important extensions of the ISM relevant to the present study, namely, using angledependent reflection coefficients and including source and receiver directivities.

Angle-dependent reflection coefficients
In Eq. ( 1), angle-independent reflection coefficients are used.Aretz et al. 37 have shown that including the angle of incidence when specifying the boundary conditions in the ISM can result in solutions that agree with FEM solutions.As such, angle-dependent reflection coefficients are used in this work.
For each reflection path represented via the indices (p, q) in the ISM, three incident angles in Cartesian coordinates are needed, i.e., θ a (p, q) = arccos a R sI→r p,q where a ∈ {x, y, z}, a R sI→r p,q is the corresponding coordinate of the vector R sI→r p,q and θ a (p, q) is the incident angle on the walls perpendicular to the a−axis.The incident angle θ a (p, q) is the same for the reflections at the walls perpendicular to the same axis in a rectangular room 37 .For a uniform normalized impedance, ζ, the angle-dependent reflection coefficients are given by 53 From this point onward, the attenuation term due to the reflections, β(p, q), contains the angle-dependent reflection coefficients defined in Eq. ( 6).Note that, while the reflection properties of surfaces are generally frequency dependent, we use a frequency-independent impedance ζ for all walls in this work without loss of generality.Triple determining whether the image source is mirrored w.r.t.walls at x = 0, y = 0 or z = 0, elements take values of 0 or 1 q = (qx, qy, qz) ∈ Q Triple determining higher order reflections each element can take values between [−Nm, Nm] Source images Receiver images Vector pointing from source images to the receiver Vector pointing from receiver to source images The reversed vector corresponding to the same reflection path of R r→sI p,q

Extension to the spherical harmonic domain
In practice, the sound source and receiver in a room usually exhibit directional properties that are far more complex than those of a point source or an omnidirectional receiver.Various methods for incorporating source and receiver directivities into the ISM have been presented in the literature, e.g., 15,18,[39][40][41][44][45][46]48 . In this sction, we present the Spherical Microphone array Impulse Response (SMIR) generator proposed by Jarrett et al. 54 .
The directivity of a source or receiver depends not only on the directional properties of the transducer itself but also on acoustic wave interaction with the physical object on which the transducer is mounted.The SMIR generator extends the ISM to the spherical harmonic domain, which facilitates the incorporation of the acoustic diffraction effect due to a rigid sphere and enables the simulation of RTFs for microphones mounted on such a sphere.When considering the receiver at x r as the center of the spherical array with an omnidirectional microphone positioned at y (r) , the SMIR generator computes the RTF via where G N (•|•) denotes the Green's function fulfilling Neumann boundary conditions on the sphere and R r→sI p,q is the vector pointing from the center of the array to the source image in spherical coordinates.Detailed expressions of H SMIR (y (r) |x s , k) can be found in the literature 54 .It is worth noting that the Green's function given in Eq. ( 3) and the Neumann Green's function in Eq. ( 7) use different coordinate systems for the input arguments.
The acoustic diffraction on the sphere affects the directivity of the microphones on the sphere.Further directional properties of the source and receiver are usually implemented as a gain factor for each reflection path in the ISM.For example, Eq. ( 7) is modified by considering the source directivity 48 yielding the expression: p,q , ϕ e,o p,q , k) where θ e,o p,q , ϕ e,o p,q are the emission inclination and azimuth angles of the image source with indexes (p, q) with respect to the orientation of the source directivity pattern.The angles θ e,o p,q , ϕ e,o p,q can be obtained by calculating the angles between the source orientation, e.g., the main direction of the directivity pattern at the cell with indexes (p, q), and the vector from the source image with indexes (p, q) to the receiver.For first-order directivity patterns, g(•), can be expressed as 48 g(θ e,o p,q , ϕ e,o p,q , k) = δ + (1 − δ) cos θ e,o p,q sin ϕ e,o p,q , where 0 ≤ δ ≤ 1.The relation between the emission and impinging angles, i.e., the mirrored emission angles after all reflections, depends on the number of reflections by walls perpendicular to different axes.A detailed formulation of this relation is provided by Hafezi et al. 48.However, the phase information of the source directivity is missing in this approach.

C. Generalized image source method
The method introduced in Eq. ( 8) to model directional sources and receivers is restricted to directivity patterns with known analytical expressions.In practice, directivity patterns may exhibit complex shapes without analytical expressions.Samarasinghe et al. 49 further extended the ISM to facilitate the parameterization of the transfer function between a source with arbitrary directivity and a receiver region.The directional properties of the receiver region can be formed using beamforming techniques based on the received sound field.In this sec-  49 .A directional source is positioned inside a rectangular room.The room transfer function from the source located at xs to an observation point y (r) around the receiver located at xr can be simulated.tion, we describe the GISM, which is the foundation of our proposed method.
The GISM decomposes the transfer function into three parts: i) an outgoing sound field from the directional source, ii) an incident sound field captured by the receiver that can be directional, and iii) the mode coupling coefficients that couple the source and receiver directivities in the spherical harmonic domain.
Consider an outgoing sound field from a directional source, for example a loudspeaker, located at x s as shown in Fig. 1.The sound field observed at an arbitrary point r outside of the sphere enclosing the entire speaker can be computed as 55 : where (d, θ, ϕ) denote the spherical coordinates associated with r, viz. the distance d, the inclination angle θ and the azimuth angle ϕ.Moreover, h n (•) denotes the spherical Hankel function of order n, Y n,m denotes the spherical harmonic function of order n and mode m, and n,m (k) represent the spherical harmonic source directivity coefficients.In practice, the infinite summation in Eq.( 10) is truncated to a maximum order N = kr (s) 0 where r (s) 0 denotes the radius of the sphere that is large enough to surround the loudspeaker 56,57 .
The room transfer function from the source x s to any point y (r) with spherical coordinates (d y ) relative to x r can be expressed using the GISM as 49 : where V = k d (r) y denotes the truncation limit that produces a small error 58 and j v (•) denotes the nth-order spherical Bessel function of the first kind.The reverberant mode coupling coefficients γ n,m v,u (k) are given by where R sI→r p,q has spherical coordinates (||R sI→r p,q ||, Ω sI→r p,q ), and Ω sI→r p,q = (θ sI→r p,q , ϕ sI→r p,q ).The additional factors (−1) (py+pz)m+pzn and (−1) px+py result from the mirror effect of the image sources 59 on the spherical harmonics, which is a more general description compared to the use of emission and impinging angles as applied in the SMIR+ method 48 .Using x 0 with spherical coordinates (d x0 , θ x0 , ϕ x0 ) instead of R sI→r p,q for brevity, the singlepath mode-coupling coefficients α n,m v,u (•) in Eq.( 12) can be derived using the translation of fields 59 , with W 1 and W 2 denoting Wigner 3 -j symbols, and ξ = (2n + 1)(2v + 1)(2l + 1)/4π.

D. Fast source room receiver method
The Fast Source Room Receiver (FSRR) method, proposed by Luo and Kim 50 , also simulates the RTF in the spherical harmonic domain.The RTF parameterization includes an approximation of the spherical harmonic description of the source and receiver directivities and utilizes a simplification of the cross-modal coupling, i.e., the mode coupling coefficients in Eq. (12).The method also uses a randomized sign parameter.Notably, the FSRR method uses reversed reflection paths -a technique explored in this work for deriving a simplified version of the proposed method, which is presented in Sec.III D.
Using our notation, the FSRR method RTF is given by where B p,q randomly takes value between 1 and −1, M p,q is a function fitted to frequency-dependent absorption coefficients, C is a directivity coefficient obtained at 1 m from either the source or the receiver.The modified triples p, q are defined in Sec.III D (Eq. ( 30)), Ω sI→r p,q is the direction of a vector pointing from a source image to the receiver, and Ω r→sI p,q is the direction of the vector corresponding to the reversed path of the same reflection.Definitions of the two vectors are given in Sec.III D.

III. PROPOSED METHOD
We have seen that the SMIR generator relies on an analytical descriptions of directivity, the GISM simulates either a directive source or a directive receiver, and the FSRR method approximates the RTF between directional sources and receivers.In this section, we propose a method for simulating RTFs that can include near-field diffraction effects of both directional sources and directional receivers.

A. Problem formulation
We consider the following scenario: one source device and one receiver device are positioned in a rectangular room, as depicted in Fig. 2. The source contains a sound-radiating transducer located at x s , e.g., a vibrating piston on one surface, and a microphone located at x r is mounted on the receiver.The two devices can have arbitrary shapes and materials, and the transducers can exhibit arbitrary directivity patterns.The acoustic field generated by the source transducer interacts with both devices and the walls and is captured by the microphone.We aim to incorporate the local diffraction effects around the devices into the RTFs based on the ISM.The existing methods are unable to resolve the problem under consideration due to the following reasons: • As shown by Xu et al. 51 , the terms y ) corresponding to the incident sound field in Eq. ( 11) cannot model the diffraction around the receiver.Therefore, even utilizing the acoustic reciprocity principle, the GISM can only model the RTFs when either the source or the receiver does not include local diffraction effects.• The FSRR method 50 extends the ISM to include the directivity of both the source and receiver using a far-field approximation and, therefore, cannot model near-field directivities.

B. Source and receiver directivities
The directivity of a transducer mounted on a device can either be numerically simulated or practically measured at discrete points on a sphere around a local origin.When sound waves are generated or captured by the mounted transducers, e.g., a circular piston at x s or a microphone at x r as shown in Fig. 2, acoustic diffraction occurs due to the wave interaction with the speakers.As shown in previous works 51,55,60 , the total sound radiation of a source device can be quantified using directivity coefficients defined on a transparent sphere with radius r (s) 0 .Using reciprocity, the receiver directivity can be obtained analogously to the source directivity 51 .As an example, consider the source device.If the sound field is sampled at J discrete points on the transparent sphere, one can reformulate Eq. ( 10) with truncated order N as (16)  where r j denotes the the jth position on the transparent sphere, (θ j , ϕ j ) the corresponding direction, P n,m (r 0 ) is the spherical wave spectrum 55 and j ∈ {1, 2, . . ., J}. Different choices of the distribution of (θ j , ϕ j ) can be found in the literature 59 .If the number of samples is larger than (N + 1) 2 , the spherical wave spectrum can be calculated from the sampled sound field by solving an over-determined linear system using a least-squares approach.The directivity coefficients of the source can then be determined by n,m (k) can be obtained following the same procedure by making use of reciprocity 51 .

C. Room transfer function
The following parameterization of the transfer function between two transducers with arbitrary directivities in the free field, H (F) , was proposed by Xu et al. 51 : where α n,m v,u (k) represent the free-field mode coupling coefficients.Note that the two transparent spheres shown in Fig. 2 are nonoverlapping in this work.However, when both the source and receiver are on the same device, one can measure or simulate the direct path to avoid the overlapping of the two transparent spheres.The following relation between D Since the summation over source images is a linear operation and the mirroring effects are already included in γ n,m v,u in Eq. ( 12), the total RTF can be written by substituting γ n,m v,u (k) for α n,m v,u (k) in Eq. ( 17), yielding: The modified reverberant mode coupling coefficients can be defined, based on Eq. ( 12) and the reciprocal relation in Eq. ( 18), as follows: The final proposed RTF is therefore expressed as where, for clarity, the proposed method is identified using the acronym Diffraction Enhanced Image Source Method (DEISM).The GISM RTF, i.e., the simulation of a directional source to a local receiving region, can be recovered from the proposed RTF as follows.Similarly to Xu et al. 51 , for a receiver region without any physical objects and an observation point y (r) with spherical coordinates (d y ) relative to x r , as shown in Fig. 1, the receiver directivity coefficients can be written as By substituting Eq. ( 22) into Eq.( 21), and utilizing the property of spherical harmonics y ), one can obtain the same expression of the RTFs as the GISM shown in Eq. (11).

D. Far-field approximation
The computational cost of the DEISM RTF is expected to increase with frequency since the directivity patterns tend to be more complicated at higher frequencies 61 , resulting in higher orders of the directivities, i.e., larger N and V , required in the DEISM.As either N or V increases, the number of summations over l in the mode coupling coefficients in Eq. ( 12) increases, thus consuming more computational resources.Therefore, it is necessary to investigate the possibility of simplifying the DEISM in Eq. ( 21) while maintaining sufficient accuracy.To this end, a far-field approximation is applied to each reflection path of the RTF in Eq. ( 21) in this section.
Without loss of generality, we consider a reflection path from the source image to the receiver, denoted by the vector R sI→r p,q , for the following analysis.The mode coupling coefficients from Eq. ( 20) for a single reflection path can be expressed as follows, where m ′ = (−1) px+py m and Λ(p, m, n) = (−1) (py+pz)m+pzn are the additional effect of the mirroring on the spherical harmonics 49 .
Assuming ||R sI→r p,q || is large, we employ the large argument of the spherical Hankel function, i.e., h l (k||R sI→r p,q ||) ≈ i l+1 e −ik||R sI→r p,q || /k||R sI→r p,q ||.The modified mode coupling coefficients in far field can be 3. An example of reversed reflection paths along the x−direction used in the ISM.For an odd number of reflections, R r→sI 1,0 and R s→rI 1,0 share the same-length, but reversed reflection path.However, for an even number of reflections, e.g., R r→sI 0,−1 , the reversed path R s→rI 0,1 is symmetric w.r.t the original room cell instead of R s→rI 0,−1 .
expressed as Note that the product of two spherical harmonics can be expressed using the formula 62 , By applying Eq. ( 26) to Eq. ( 25), we can express the mode coupling coefficients as follows, αn,m v,u (k) ≈β(p, q)Λ(p, m, n) The transfer function of one reflection path using the farfield approximation model can thus be written as, where DEISM-LC is used to refer to the Low-Complexity (LC) solution.
The terms Λ(p, m, n)Y n,m ′ (Ω sI→r p,q ) are a result of mirroring of spherical harmonics.Since the source images are used in Eq. ( 28), the additional factors Λ(p, m, n) and m ′ need to be associated with the source spherical harmonics.To further simplify the expressions in Eq. ( 28), we propose to replace Λ(p, m, n)Y n,m ′ (Ω sI→r p,q ) by a simplified expression, which can be achieved as shown in the following by using the direction that corresponds to the reversed traveling path of the vector R sI→r p,q .First, we note that the vector pointing from the receiver to the source image R r→sI p,q is the inverse of the vector R sI→r p,q given in Eq. ( 4).In addition, we define the vector pointing from the source to the receiver images R s→rI p,q , which can be expressed similarly to Eq. ( 4) as In spherical coordinates, the vectors R r→sI p,q and R s→rI p,q have directions Ω r→sI p,q and Ω s→rI p,q , respectively.These two vectors are reversed reflection paths only when there are an odd number of reflections between parallel walls, as shown by the gray arrows in Fig. 3. To take the even number of reflections into account, as shown by the black arrows in Fig. 3, we use the modified vector R s→rI p,q (with directions Ω s→rI p,q ) that represents the reversed path of R r→sI p,q with its subscript indices p, q = (p x , py , pz ), (q x , qy , qz ) taking the following expressions where a ∈ {x, y, z}, and p ′ a , q ′ a represent the image indices that are symmetric to p a , q a w.r.t the image indices p a , q a = 0, 0. The relation between p ′ a , q ′ a and p a , q a can be summarized as where ⌊•⌋ denotes the floored division.Secondly, due to the properties of the mirroring effect on the spherical harmonics, one can write the following relation between the spherical harmonics associated with the spherical direction Ω sI→r p,q in Eq. ( 28) and Ω s→rI p,q introduced above Since R r→sI p,q = −R sI→r p,q , i.e., Ω sI→r p,q and Ω r→sI p,q have opposite directions, the following relation is obtained Finally, by substituting Eqs. ( 33) and (34) in the approximated solution in Eq. ( 28), we obtain the simplified expression The RTF of the DEISM-LC is then given by The DEISM-LC is more computationally efficient since it does not require the summation over l in the mode coupling coefficient in Eq. (12).The DEISM in Eq. ( 21) has a time complexity of O(IN 2 V 2 (N +V )) per wavenumber k, where I is the number of images, N and V are the maximum SH order used to describe directivities of the source and receiver.However, the time complexity of DEISM-LC is reduced to O(IN 2 V 2 ) per wavenumber k, which can lead to approximately N + V times faster computations compared to DEISM.As the frequency increases, larger N or V is usually necessary to accurately describe the directivity.Therefore, one might find DEISM-LC more practical for simulating high frequencies.While the simplified method described here is similar to that of the FSRR method, we note that there are significant differences between Eq. (36) and Eq.(15).As stated in Sec.III A, the FSRR utilizes directivity coefficients C(s) n,m (k), C(r) v,u (k) that are obtained at 1 m from the source or the receiver.However, the directivity coefficients used in DEISM-LC are not limited to this assumption as long as the transparent spheres enclose the devices.Furthermore, there are inherent dissimilarities between DEISM-LC and FSRR due to the different coefficients, e.g., B p,q , M p,q used in Eq. ( 15) and i n , i v used in Eq. ( 36).This difference is also illustrated by an example described in Sec.IV B.

E. Algorithm Summary
A summary of the algorithm proposed in Sec.III C is listed as pseudocode in 1, 2 and 3. A Python implementation of the algorithm, along with an example, is available online at https://github.com/audiolabs/DEISM.The pseudocode of the far-field approximation method is not provided here for brevity.
Algorithm 1: Calculate lookup tables 1: Calculate the Wigner-3j symbols W1(n, v, l) and W2(n, v, l, m, u) from Eq. ( 14).2: Calculate the spherical harmonic coefficients Pn,m(k, r0) shown in Eq. ( 16) for both the source and receiver using the least squares approach.The calculation of the Wigner 3j symbols can be computationally expensive if they are computed for each reflection path.Since they are identical for each reflection path, to reduce the computational effort, we precalculate them for all the given spherical harmonic orders and modes n, m, v, u, l, and store them in a lookup table.In addition, the directivity coefficients C The procedures for generating the lookup tables are listed in Alg. 1.
In the provided implementation, we separate the whole procedure of the ISM into two parts, viz. the image generation given in Alg. 2 and the single-path transfer function calculation using DEISM given in Alg. 3. The computation in the spherical harmonic domain is generally also expensive, and therefore we use parallel computation for all reflection paths after all images are calculated.Calculate the vector pointing from the source image to the receiver R sI→r p,q in Eq. ( 4) and its spherical coordinates (dx 0 , θx 0 , ϕx 0 ).

IV. EVALUATION OF THE PROPOSED METHODS
In the following sections, we describe the experimental setups used to evaluate the performance of DEISM and DEISM-LC (see Eq. ( 21) and Eq. ( 36), respectively).The simulated RTFs are plotted and compared for different scenarios.The differences between the DEISM and the FEM solutions are evaluated using error metrics, and the difference between DEISM and DEISM-LC is analyzed.In addition, the effects of incorporating directivities into the RTFs by comparing the DEISM-LC with the original ISM, and the differences between DEISM-LC and FSRR are also demonstrated by comparing them with FEM.

A. Test setup
The proposed methods require a description of the directivities of the devices, i.e., a loudspeaker (or microphone) in the spherical harmonic domain.These data can be obtained from measurements of the pressure field in the local vicinity of the transducer and its enclosure.Alternatively, directivity patterns obtained from simulations could be used.In this work, the FEM is used to simulate the free-field pressure fields surrounding the transducer and its enclosure.Aretz et al. 37 have demonstrated that solutions of the ISM with angle-dependent reflection coefficients are in good agreement with FEM solutions above the Schroeder frequency.Therefore, the FEM is used in this work to evaluate the performance of the proposed method.
This section presents the details of generating the simulated transducer directivities and the RTFs.

Transducer directivities
The three speaker shapes considered in this work are shown in Fig. 4. For each speaker, the loudspeaker driver is modeled as a vibrating piston with frequencyindependent unit acceleration.For each microphone, a monopole source with frequency-independent unit acceleration is placed at the microphone position, which is located on the top and close to the front of each enclosure.
One must ensure that free-field conditions are used to capture the loudspeaker and microphone directivities.In a measurement setup, one might use an anechoic cham-ber.When using the FEM, a suitable non-reflecting boundary condition is required.In this work, a perfectly matched layer 63 is used to reduce unwanted reflections from the edge of the computational domain.Note that care must be taken to ensure that the perfectly matched layer sufficiently reduces the unwanted reflections -this is achieved in this work by performing a priori mesh studies.Each speaker is meshed using triangular elements, the surrounding air (with a sound speed of 343 m/s and density of 1.2 kg/m 3 ) is meshed using tetrahedral elements, and the perfectly matched layer is meshed using hexahedral elements.An average element size that ensures that there are ten degrees of freedom (or five quadratic elements) per wavelength is used.Quadratic shape functions are used.The free-field transfer function is simulated from 20 Hz to 1 kHz with steps of 2 Hz.
For each speaker, two sizes are considered, which are referred to as Size 1 and Size 2, where Size 2 is the smallest.The sound pressure field is simulated on a transparent sphere around the speaker for each size and shape.Note that, for a given speaker shape, the radius of the transparent sphere changes depending on the size of the speaker and the position of the transducer on the speaker; The reason being that we place the transducer at the center of the coordinate system.For the source on the speakers of Size 1, the radius of the transparent sphere is 0.4 m, while for Size 2, it is 0.2 m.For the receivers on the speakers, a radius of 0.5 m is used for Size 1, and a radius of 0.25 m is used for Size 2.
The complex-valued pressure field simulated on the surface of the transparent sphere is used to compute the spherical harmonic directivity coefficients.For the loudspeaker directivity, this involves a straightforward implementation of spherical harmonic decomposition 55 .To obtain the receiver directivities, we use acoustic reciprocity 51 (see Sec. III B).Phase in radians (π) FIG. 5. Plots of sound pressure level and phase reconstructed using simulated speaker directivity coefficients of a large (Size 1) spherical and cuboidal speaker.The pressure field is reconstructed with maximum spherical harmonic order 5.The results are evaluated on the XY-plane at three frequencies.
An example of the directivity patterns is shown in Fig. 5, where the Sound Pressure Level (SPL) and phase are plotted for the spherical and cuboidal speakers with Size 1.The vibrating piston faces the positive direction of the x-axis, which is hereafter referred to as +x.The pressure field on the XY-plane is reconstructed using Eq. ( 16) using simulated directivity coefficients C (s) n,m with maximum spherical harmonic order N = 5.It can be seen that both the SPL and phase plots show differences between the two selected speaker shapes, especially at angles that are in the shadow region of the speakers.One can also observe a significant phase jump in the shadow region when a cuboidal speaker is used, which is less prominent for a spherical speaker.

Room transfer functions
A rectangular room with dimensions L x × L y × L z = 4 m × 3 m × 2.5 m is considered.The air in the room has a speed of sound of 343 m/s and a density of 1.2 kg/m 3 .A uniform frequency-independent normalized impedance of 18, which gives an absorption coefficient of approximately 0.2, is specified at the walls of the room.This value is chosen because it gives a reverberation time of approximately 0.29 s at the fundamental frequency of 43.88 Hz, which gives a Schroeder frequency of approximately 198 Hz.The reverberation time was estimated using the modal model described by Morse and Bolt 64 .Note that, in general, the surface impedance is a complex-valued frequency-dependent quantity.Our choice of a real-valued frequency-independent impedance does not limit the generality of the method described here.
Five source-receiver configurations are simulated.The center position of each transducer, for each configuration, is given in Table II.Four of the configurations contain two devices, one with a loudspeaker and the other with a microphone.The remaining configuration 3 has a single device on which the loudspeaker and microphone are placed.The transfer functions of the direct path between the source and receiver are simulated using the FEM for Configuration 3. The microphone position x C3 r in Table II depends on the size, shape, and rotation of the speaker and takes the values given in Table III.A 2D illustration of the speaker position and facing configurations is shown in Fig. 6.The source and receiver are both at least 1 m away from the walls in Positions 1 − 3. Positions 4 − 5 allow a closer distance between the walls and at least one speaker to examine the performance of the DEISM in challenging configurations.
Each model is meshed using tetrahedral elements with an average size that ensures that 10 degrees of freedom per wavelength are used.Quadratic shape functions are used.The room transfer functions for each configuration are simulated, from 20 Hz to 1 kHz, in steps of 2 Hz.

B. Preliminary comparisons with ISM and FSRR
In this section, we compare the DEISM-LC to existing RTF simulation methods.To this end, we choose the  original ISM with omnidirectional transducers (referred to as ISM-Omni) and the FSRR method, as introduced in Eq.( 1) and Eq.( 15), respectively.We demonstrate the effects of incorporating transducer directivities into the RTFs by comparing the ISM-Omni with the DEISM-LC.We are also interested in the differences between the FSRR method and DEISM-LC as they are computationally more efficient compared to the DEISM.In Fig. 7, the magnitudes and phases of the RTFs using the ISM-Omni, DEISM-LC, FSRR, and FEM are plotted.As an example, the results are shown for large cuboidal speakers in Configuration 2 with maximum reflection order 25 and maximum spherical harmonic order 5 for both the source and receiver.The magnitudes of the RTFs are given in terms of SPL in decibels and the unwrapped phases in radians.Unless specified, the magnitude and phase responses use the same units in the subsequent sections.The frequency-dependent SPL is defined as 53 SPL(k) = 20 log 10 P rms (k) P 0 , where P rms is the root-mean-square pressure calculated from the RTFs and P 0 = 2 × 10 −5 Pa is the reference pressure.The FEM solutions are presumed as the ground truth of the RTFs in this section and throughout the following evaluations.The RTFs of the ISM-Omni were generated using an omnidirectional source and receiver at the same locations as those used in Configuration 2. The directivity coefficients of the source and receiver are the same and contain only a monopole component based on Eq. ( 22), i.e., where (θ, ϕ) denotes the directions in spherical coordinates of either the point source or the omnidirectional receiver relative to their local origins.For the RTFs of the FSRR given in Eq.( 15), the directivity coefficients v,u (k) to the measurement sphere with a radius of 1 m, and we choose M p,q = β(p, q).
Overall, the differences between the DEISM-LC and FEM are small compared to those between the ISM-Omni and FEM and those between the FSRR and FEM.This indicates that the DEISM-LC can generate more realistic RTFs when compared to the other image source methods for the considered scenario.The large offset in magnitude responses between the DEISM-LC and ISM-Omni mainly come from the different source types, i.e., a point source used in ISM-Omni and a vibrating piston used in DEISM-LC.However, the locations and Q-factors of the resonant peaks and the phase responses also vary greatly.This shows that the directivities of the source and receiver, including sound diffraction around the speakers, can affect the RTFs to a large extent.
The differences between the DEISM-LC and FSRR are mainly attributed to the inherent mismatch in their RTF parameterizations, i.e., between Eq.( 36) and Eq.(15).The magnitude responses are different not only in the amplitude but also in the locations and Q-factors of the resonant peaks.The phase response of the FSRR even contains positive values at lower frequencies corresponding to non-causal behaviors, which results from the random sign B p,q used in Eq. (15).
Since the inherent differences between DEISM-LC and other methods, i.e., the ISM-Omni and FSRR, are shown to be large, we limit the following evaluations to DEISM, DEISM-LC, and FEM.

C. Varying distance between source and receiver
In this section, we present a comparison between the FEM and DEISM solutions for the small spherical speaker when the distance between the source and receiver is changing, corresponding to Configuration 1-3 in Tab.II.The two distances in Configuration 1-2 are around 1.97 m, 0.95 m and the distances in Configuration 3 are around 0.1 m and 0.2 m for the small and large speakers, respectively.The magnitude and phase are compared in Fig. 8.To avoid visual overlap of the curves for different configurations, the SPLs and phases are plotted by shifting the curves vertically, as denoted by the arrows.
We see that good agreement is obtained between the FEM and DEISM solutions.In general, greater differences are found at the notches of the RTFs, than at the peaks.Additionally, the differences between DEISM and DEISM-LC solutions are much smaller than those between the FEM and DEISM solutions in the notches of the RTFs.Since Configuration 3 contains only one speaker in the room, the corresponding RTFs show fewer variations across frequencies compared to the other two configurations.Similar agreement between FEM and DEISM was found for the other speaker shapes.For brevity, the respective plots are not presented here.

D. Source and receiver on the same device
Next, we compare solutions for cases where the source and receiver are placed on the same device (Configuration 3).For this comparison, we analyze data for the three speaker shapes considered.
The respective magnitude and phase solutions are shown in Fig. 9. Once again, good agreement is obtained between the FEM and DEISM solutions.The magnitudes and phases of the RTFs of the cuboidal and cylindrical speakers show very similar behaviour.In contrast, those of the spherical speaker exhibit clear differences when compared to the RTFs of the other speakers.Note that the main differences appear at the notches of the RTF magnitudes, which correspond to frequencies around 444 Hz and 650 Hz.In addition, the phase curves of the cuboidal and cylindrical speakers also exhibit significant jumps at some frequencies, which can not be observed for the spherical speaker.The differences between the curves indicate that the edges of the cuboidal and cylindrical speakers can induce additional diffraction and lead to distinct characteristics in the RTFs.

E. Quantitative analysis
In this section, we quantify the differences between the FEM and DEISM/DEISM-LC solutions and DEISM and DEISM-LC solutions.The following error metrics are used for the comparison: The root-mean-square log and the normalized root-mean-square phase error in radians 47 where K is the set of the K wavenumbers used in simulations, H Test (k) is the simulated RTF of either the DEISM or DEISM-LC at wavenumber k, and H FEM (k) is the simulated RTF of the FEM at wavenumber k.The log spectral distance and the phase errors are presented for different positions, speaker shapes, and sizes in Tables IV and V, respectively.In both tables, lower values indicate better performance.General observations can be made for Positions 1-3 in the two tables: • The smaller the speaker is, the smaller the difference between the DEISM and FEM solutions.
• The fewer speakers there are, i.e., for Configuration 3, the smaller the difference.
The errors of the DEISM solutions increase when the speakers are located close to one wall of the room, i.e., for Configurations 4 and 5.The large errors may be attributed to the mirror source approximation since the speakers can have more complicated wave interaction with the nearby walls, which is not considered in this model, and can lead to additional RTF fluctuations.
For devices with a physical extent, the assumption of a source image might not hold and could require more complex modeling.In Configuration 3, the DEISM outperforms DEISM-LC for all speaker shapes and sizes.However, this does not necessarily hold for Configurations 1 and 2. This is mainly caused by the notches of the RTF magnitudes, where the DEISM produces much lower values than DEISM-LC, as shown in Fig. 8 and Fig. 9.
To provide insight into the performance of the methods in larger rooms, we consider the case in which the room's length in the x-dimension is doubled.For this Meter in log10 scale  41)) between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver.Both the distance and relative errors are plotted on a logarithmic scale.
test, only the cylindrical driver is considered in Configurations 1-3.The error levels obtained are presented in Table VI.The error levels for the larger room are similar to the values given in Tables IV and V, and imply similar conclusions.

F. Evaluation of DEISM-LC
In the previous section, we showed that in some cases, DEISM-LC achieves a smaller difference to the FEM than DEISM, especially at the notches of the RTFs.It is worth noting that the frequencies of the notches might not be accurate since the maximum resolution of the frequencies is 2 Hz.Therefore, directly comparing the DEISMs with the FEM might not be enough to determine whether the simplified DEISM performs worse or better.In the following, we investigate the difference between the DEISMs by computing the relative error using the ℓ 2 norm, defined as  41)) between the DEISM and DEISM-LC solutions as a function of reflection order for the cuboidal speakers at Positions 1, 2, and 3.
where H (•) is the complex-valued vector with the transfer function values H (•) (k) of all wavenumbers.In this comparison, we consider the error in free-field and reverberant conditions.
The errors in free-field conditions are given in Fig. 10, where the source-receiver distance and the error are plotted on logarithmic scales.One can observe that the difference decreases to around 1% at a distance of 10 m and to around 0.1% at a distance of 100 m.It can be concluded that DEISM-LC asymptotically converges to the DEISM solution as the distance between the source and receiver increases, which is expected from the derivations in Sec.III D.
In Fig. 11, the relative error is evaluated by changing the maximum reflection order in the DEISM for the cuboidal speakers.Note that for the curves of Configuration 3, the data of reflection order zero is missing since the direct path is simulated using the FEM.The difference between the DEISM solutions reduces for the first few reflection orders, but remains constant for higher reflection orders.Similar behaviors were also found for other speaker shapes, which are not presented here for brevity.

G. Summary
The presented comparisons and error analyses show that the proposed method and its simplified version can simulate RTFs that are in good agreement with FEM solutions in Configurations 1-3.Less agreement is found when a speaker is placed close to the walls, i.e., at Configurations 4 and 5.This might be attributed to the comb filtering caused by the direct path and reflection between the wall and speaker when the distance is less than 0.5 m 66 .
The errors between the DEISM and FEM solutions decrease when smaller speakers are used in the simulation.Additionally, the lowest error is obtained when the source and receiver are placed on the same speaker, which reduces the number of speakers in the simulation, and hence the number of reflecting surfaces.These observations support the hypothesis that the errors come mostly from the omission of speakers with physical extent in the

FIG. 1 .
FIG.1.Illustration of the scenario modeled by Samarasinghe et al.49 .A directional source is positioned inside a rectangular room.The room transfer function from the source located at xs to an observation point y (r) around the receiver located at xr can be simulated.

FIG. 2 .
FIG. 2. Illustration of the scenario considered for the proposed method.Two speakers are positioned inside a rectangular room.The source speaker has a vibrating piston on one of its surfaces located at xs.The receiver speaker has a microphone on the top surface located at xr.The room transfer function from xs to xr is of interest.Not shown here is the room in which the speakers are placed.The two spherical regions marked in light gray are nonoverlapping.
(r)v,u (k) and the directivity coefficients C (r) v,u (k) obtained via reciprocity for the receiver was derived in51

3 :
Calculate the directivity coefficients of the source and receiver, viz.C (s) n,m(k) and C (r) n,m(k), using the relation Pn,m(r0, k) = Cn,m(k)hn(kr0).Then store the directivity coefficients as matrices C (s) (n, m, k) and C (r) (n, m, k).
(k) are precalculated and stored in a lookup table.

FIG. 4 .
FIG.4.Speaker shapes considered in this work.From left to right, these speakers are referred to in the text as: cuboidal (cub.), spherical (sph.) and cylindrical (cyl.),where the terms in brackets are abbreviations also used in this work.The three speakers share the same height and the same-sized vibrating piston.

FIG. 6 .
FIG.6.A 2D view of the positions of the mounted transducers inside the room.The black dots and gray dots represent the approximate positions of the source and receiver transducers, respectively.The arrows of the dots indicate the directions in which the speakers face, and the numbers indicate the source and receiver positions.

FIG. 7 .
FIG. 7. A comparison of the magnitudes and phase responses between the ISM-Omni, DEISM-LC, FSRR, and FEM solutions.The results are shown for large cuboidal speakers with Configuration 2. Max.reflection order is 25 and max.SH orders are 5 for both the source and receiver.

FIG. 8 .
FIG.8.Magnitudes and phase responses for small spherical speakers with varying distance between speakers.Max.reflection order is 25 and max.SH orders are 5 for both the source and receiver.The magnitudes and phases are shifted by vertical arrows for better distinction of the curves.

FIG. 10 .
FIG.10.Relative error (Eq.(41)) between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver.Both the distance and relative errors are plotted on a logarithmic scale.

TABLE I .
Parameters and descriptions used in ISM p = (px, py, pz) ∈ P

TABLE II .
Configurations of the transducers and the corresponding speaker orientations used in the room simulation.The position vector x C3 r for Configuration 3 is listed for different shapes and sizes in Tab.III.
Magnitudes and responses for large speakers with Configuration 3 while varying shapes of the speakers.Max.reflection order is 25 and max.SH. orders are 5 for both the source and receiver.The magnitudes and phases are shifted by verticle arrows for better distinction of the curves.
k∈K 10 log 10 |H Test

TABLE IV .
Root-mean-square log spectral distance in decibel between DEISM and FEM, and between DEISM-LC and FEM.(s.denotes small.)config.id cub.s. cub.sph.s. sph.cyl.s. cyl.

TABLE VI .
Root-mean-square log spectral distance in decibel and normalized root-mean-square phase error in radians between DEISM and FEM, and between DEISM-LC and FEM in a larger room.(s.denotes small.)RMS LSD (decibel) NRMS phase error (radians) FIG. 11.Relative error (Eq.(