The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency. The standard ISM is limited to simulations of room impulse responses between point sources and omnidirectional receivers. In this work, the ISM is extended using spherical harmonic directivity coefficients to include acoustic diffraction effects. These effects occur in practice when transducers are mounted on audio devices of finite spatial extent, e.g., modern smart speakers with loudspeakers and microphones. The proposed method is verified using finite element simulations of various loudspeaker and microphone configurations in a shoebox-shaped room. It is shown that the accuracy of the proposed method is related to the sizes, shapes, number, and positions of the devices inside a room. A simplified version of the proposed method, which can significantly reduce computational effort, is also presented. The proposed method and its simplified version can simulate room transfer functions more accurately than currently available image source methods and can aid the development and evaluation of speech and acoustic signal processing algorithms, including speech enhancement, acoustic scene analysis, and acoustic parameter estimation.

Room acoustic simulation has been an active research field since the 1960s.1 It is of paramount importance for, e.g., auralization and virtual acoustics,2–8 evaluation of acoustic signal processing algorithms,9–13 and data generation for data-driven methods.14–21 Room acoustics simulation methods can be broadly categorized into geometric methods,22–26 wave-based methods,27–32 hybrid methods,33,34 and, more recently, deep-learning–based methods.8,35,36 This paper presents an extension to the well-known geometrical method, the image source method (ISM), which enables the inclusion of acoustic diffraction effects from audio devices of finite extent, e.g., smart speakers on which the transducers are mounted, in shoebox-shaped rooms. The extension to arbitrary room shapes, which has been proposed by Borish37 and realized in well-known toolboxes,4,38 and the consideration of sound scattering by room boundary surfaces is outside the scope of this paper.

Modern smart speakers are a typical embodiment of an audio device that contains loudspeakers and microphones. Generating a reverberant room acoustic simulation that includes the directivities of loudspeakers and microphones due to acoustic diffraction effects is essential for developing and testing new acoustic signal processing algorithms. When using wave-based methods, the acoustic diffraction effects caused by the loudspeaker enclosure are implicitly modeled. However, because of the much higher computational cost of solving wave-based models, geometrical methods are often preferred. While assuming that sound waves behave like light rays does not apply to long wavelengths, Aretz et al.39 have shown that the ISM can be used to approximate wave-based method solutions above the Schroeder frequency40 in rectangular rooms by specifying angle-dependent reflection coefficients. Thus, the ISM can be used to obtain acceptable solutions across most of the audible frequency range above the Schroeder frequency.

The ISM has been implemented in many widely used toolboxes,4,38,41–43 along with various approaches for reducing the computational complexity, for example, parallelization based on graphics processing units (GPUs).43,44 Directional properties of the source and receiver were also incorporated into the ISM to produce more realistic room impulse responses,38,41,42,45–47 and efforts have been made to include acoustic diffraction effects due to the human head45 and rigid spherical baffles.48,49 Jarrett et al.48 extended the ISM to include rigid spherical microphones using an analytical expression of the diffraction effects in the spherical harmonic domain. Samarasinghe et al.50 further extended the ISM, viz., the generalized ISM (GISM), to facilitate parameterization of the reverberant transfer function between a directional source and a directional receiver. More recently, an extension of the ISM to incorporate loudspeaker directivities in the far field in the spherical harmonic domain has been reported.51 However, the existing approaches cannot simulate a transfer function between a directive source and a directive receiver that includes near-field diffraction effects.

In this work, the GISM50 is extended to enable the incorporation of the local acoustic diffraction effects that result from the interaction of sound waves with audio devices of finite spatial extent, which are used to excite and capture the acoustic field in a room. First, we review the basic concepts of the ISM and its extension to the spherical harmonic domain in Sec. II. In Sec. III, we present our proposed method, which incorporates the source directivity coefficients computed from solutions of an exterior radiation problem and receiver directivity coefficients obtained using reciprocity.52 The pseudocode of the proposed method is also included at the end of the section. In addition, we derive a simplified version of the proposed method to reduce computational complexity. The finite element method (FEM) is used to verify the proposed method by comparing room transfer functions (RTFs) with the source and receiver mounted on either different devices or on the same device. The experimental setups and the evaluation of the proposed method are presented in Sec. IV. The effect of varying the parameters of the device, e.g., its position in the room, physical shape, and size, on the accuracy of the proposed method, is investigated. Finally, conclusions are drawn in Sec. V.

The ISM was first applied to acoustics by Carslaw53 and later implemented using a digital computer by Allen and Berkley22 to model the impulse response between a point source and an omnidirectional receiver in a reverberant rectangular room.

To derive the standard ISM, consider a rectangular room with length, width, and height given by Lx, Ly, and Lz, as well as an acoustic point source and an omnidirectional receiver inside of the room at positions x s = ( x s , y s , z s ) and x r = ( x r , y r , z r ), given in Cartesian coordinates, respectively. The room transfer function can be expressed as a weighted sum of free-space Green's functions attenuated by the walls, that is
(1)
where k is the wavenumber. The triples p and q determine the position of the images and the attenuation factor β ( p , q ). The triple p = ( p x , p y , p z ) is used to determine whether the image source is mirrored with respect to walls at x = 0, y = 0, or z = 0, and its elements take values of 0 or 1, resulting in a set P of eight combinations. Each element of the other triple q = ( q x , q y , q z ) takes a value between [ N m , N m ], and it includes higher order reflections, where Nm is used to limit computational complexity and circular convolution errors,48 resulting in a set Q of ( 2 N m + 1 ) 3 combinations. β ( p , q ) = β x 1 | q x p x | β x 2 | q x | β y 1 | q y p y | β y 2 | q y | β z 1 | q z p z | β z 2 | q z | and β x 1 , β x 2 , β y 1 , β y 2 , β z 1 , β z 2 are the angle-independent reflection coefficients of the walls, and the subscripts a1, a2 ( a { x , y , z }) of β ( · ) correspond to the boundaries at a = 0 and the boundaries at a = La, respectively. For an image source with position
(2)
the Green's function can be written as
(3)
where i denotes the imaginary unit. The vector pointing from every image of the source to the receiver is expressed as
(4)

Many extensions to the ISM have been proposed to facilitate more realistic modeling of room acoustics. In this section, we discuss two important extensions of the ISM relevant to the present study, namely, using angle-dependent reflection coefficients and including source and receiver directivities.

1. Angle-dependent reflection coefficients

In Eq. (1), angle-independent reflection coefficients are used. Aretz et al.39 have shown that including the angle of incidence when specifying the boundary conditions in the ISM can result in solutions that agree with FEM solutions. As such, angle-dependent reflection coefficients are used in this work.

For each reflection path represented via the indices ( p , q ) in the ISM, three incident angles in Cartesian coordinates are needed, i.e.,
(5)
where a { x , y , z } , a R p , q sI r is the corresponding coordinate of the vector R p , q sI r and θ a ( p , q ) is the incident angle on the walls perpendicular to the a axis. The incident angle θ a ( p , q ) is the same for the reflections at the walls perpendicular to the same axis in a rectangular room.39 For a uniform normalized impedance, ζ, the angle-dependent reflection coefficients are given by54 
(6)
From this point onward, the attenuation term due to the reflections, β ( p , q ), contains the angle-dependent reflection coefficients defined in Eq. (6). Note that, while the reflection properties of surfaces are generally frequency dependent, we use a frequency-independent impedance ζ for all walls in this work without loss of generality.

2. Extension to the spherical harmonic domain

In practice, the sound source and receiver in a room usually exhibit directional properties that are far more complex than those of a point source or an omnidirectional receiver. Various methods for incorporating source and receiver directivities into the ISM have been presented in the literature.15,18,38,41,42,45–47,49 In this section, we present the spherical microphone array impulse response (SMIR) generator proposed by Jarrett et al.55 

The directivity of a source or receiver depends not only on the directional properties of the transducer itself but also on acoustic wave interaction with the physical object on which the transducer is mounted. The SMIR generator extends the ISM to the spherical harmonic domain, which facilitates the incorporation of the acoustic diffraction effect due to a rigid sphere and enables the simulation of RTFs for microphones mounted on such a sphere. When considering the receiver at x r as the center of the spherical array with an omnidirectional microphone positioned at y ( r ), the SMIR generator computes the RTF via
(7)
where G N ( · | · ) denotes the Green's function fulfilling Neumann boundary conditions on the sphere and R ̃ p , q r sI is the vector pointing from the center of the array to the source image in spherical coordinates. Detailed expressions of H SMIR ( y ( r ) | x s , k ) can be found in the literature.55 It is worth noting that the Green's function given in Eq. (3) and the Neumann Green's function in Eq. (7) use different coordinate systems for the input arguments.
The acoustic diffraction on the sphere affects the directivity of the microphones on the sphere. Further directional properties of the source and receiver are usually implemented as a gain factor for each reflection path in the ISM. For example, Eq. (7) is modified by considering the source directivity49 yielding the expression
(8)
where θ p , q e , o , ϕ p , q e , o are the emission inclination and azimuth angles of the image source with indices ( p , q ) with respect to the orientation of the source directivity pattern. The angles θ p , q e , o , ϕ p , q e , o can be obtained by calculating the angles between the source orientation, e.g., the main direction of the directivity pattern at the cell with indices ( p , q ), and the vector from the source image with indices ( p , q ) to the receiver. For first-order directivity patterns, g ( · ), can be expressed as49 
(9)
where 0 δ 1. The relation between the emission and impinging angles, i.e., the mirrored emission angles after all reflections, depends on the number of reflections by walls perpendicular to different axes. A detailed formulation of this relation is provided by Hafezi et al.;49 however, the phase information of the source directivity is missing in this approach.

The method introduced in Eq. (8) to model directional sources and receivers is restricted to directivity patterns with known analytical expressions. In practice, directivity patterns may exhibit complex shapes without analytical expressions. Samarasinghe et al.50 further extended the ISM to facilitate the parameterization of the transfer function between a source with arbitrary directivity and a receiver region. The directional properties of the receiver region can be formed using beamforming techniques based on the received sound field. In this section, we describe the GISM, which is the foundation of our proposed method.

The GISM decomposes the transfer function into three parts: (i) an outgoing sound field from the directional source, (ii) an incident sound field captured by the receiver that can be directional, and (iii) the mode coupling coefficients that couple the source and receiver directivities in the spherical harmonic domain.

Consider an outgoing sound field from a directional source, for example, a loudspeaker, located at x s, as shown in Fig. 1. The sound field observed at an arbitrary point r outside of the sphere enclosing the entire speaker can be computed as56 
(10)
where ( d , θ , ϕ ) denote the spherical coordinates associated with r, viz., the distance d, the inclination angle θ, and the azimuth angle ϕ. Moreover, h n ( · ) denotes the spherical Hankel function of order n, Y n , m denotes the spherical harmonic function of order n and mode m, and C n , m ( s ) ( k ) represent the spherical harmonic source directivity coefficients. In practice, the infinite summation in Eq. (10) is truncated to a maximum order N = k r 0 ( s ) , where r 0 ( s ) denotes the radius of the sphere that is large enough to surround the loudspeaker.57,58
FIG. 1.

Illustration of the scenario modeled by Samarasinghe et al. (Ref. 50). A directional source is positioned inside a rectangular room. The room transfer function from the source located at x s to an observation point y ( r ) around the receiver located at x r can be simulated.

FIG. 1.

Illustration of the scenario modeled by Samarasinghe et al. (Ref. 50). A directional source is positioned inside a rectangular room. The room transfer function from the source located at x s to an observation point y ( r ) around the receiver located at x r can be simulated.

Close modal
The room transfer function from the source x s to any point y ( r ) with spherical coordinates ( d y ( r ) , θ y ( r ) , ϕ y ( r ) ) relative to x r can be expressed using the GISM as50,
(11)
where V = k d y ( r ) denotes the truncation limit that produces a small error59 and j v ( · ) denotes the nth-order spherical Bessel function of the first kind. The reverberant mode coupling coefficients γ v , u n , m ( k ) are given by
(12)
where R p , q sI r has spherical coordinates ( | | R p , q sI r | | , Ω p , q sI r ), and Ω p , q sI r = ( θ p , q sI r , ϕ p , q sI r ). The additional factors ( 1 ) ( p y + p z ) m + p z n and ( 1 ) p x + p y result from the mirror effect of the image sources60 on the spherical harmonics, which is a more general description compared to the use of emission and impinging angles as applied in the SMIR+ method.49 Using x 0 with spherical coordinates ( d x 0 , θ x 0 , ϕ x 0 ), instead of R p , q sI r for brevity, the single-path mode coupling coefficients α v , u n , m ( · ) in Eq. (12) can be derived using the translation of fields,60 
(13)
with W1 and W2 denoting Wigner 3-j symbols,
(14)
and ξ = ( 2 n + 1 ) ( 2 v + 1 ) ( 2 l + 1 ) / 4 π.

The fast source room receiver (FSRR) method, proposed by Luo and Kim,51 also simulates the RTF in the spherical harmonic domain. The RTF parameterization includes an approximation of the spherical harmonic description of the source and receiver directivities and utilizes a simplification of the cross-modal coupling, i.e., the mode coupling coefficients in Eq. (12). The method also uses a randomized sign parameter. Notably, the FSRR method uses reversed reflection paths—a technique explored in this work for deriving a simplified version of the proposed method, which is presented in Sec. III D.

Using our notation, the FSRR method RTF is given by
(15)
where B p , q randomly takes value between 1 and –1, M p , q is a function fitted to frequency-dependent absorption coefficients, and C ̃ is a directivity coefficient obtained at 1 m from either the source or the receiver. The modified triples p ̃ , q ̃ are defined in Sec. III D [Eq. (30)], Ω p ̃ , q ̃ sI r is the direction of a vector pointing from a source image to the receiver, and Ω p ̃ , q ̃ r sI is the direction of the vector corresponding to the reversed path of the same reflection. Definitions of the two vectors are given in Sec. III D.

We have seen that the SMIR generator relies on analytical descriptions of directivity, the GISM simulates either a directive source or a directive receiver, and the FSRR method approximates the RTF between directional sources and receivers. In this section, we propose a method for simulating RTFs that can include near-field diffraction effects of both directional sources and directional receivers.

We consider the following scenario: one source device and one receiver device are positioned in a rectangular room, as depicted in Fig. 2. The source contains a sound-radiating transducer located at x s, e.g., a vibrating piston on one surface, and a microphone located at x r is mounted on the receiver. The two devices can have arbitrary shapes and materials, and the transducers can exhibit arbitrary directivity patterns. The acoustic field generated by the source transducer interacts with both devices and the walls and is captured by the microphone. We aim to incorporate the local diffraction effects around the devices into the RTFs based on the ISM. The existing methods are unable to resolve the problem under consideration due to the following reasons:

  • As shown by Xu et al.,52 the terms j v ( k d y ( r ) ) Y v , u ( θ y ( r ) , ϕ y ( r ) ) corresponding to the incident sound field in Eq. (11) cannot model the diffraction around the receiver. Therefore, even utilizing the acoustic reciprocity principle, the GISM can only model the RTFs when either the source or the receiver does not include local diffraction effects.

  • The FSRR method51 extends the ISM to include the directivity of both the source and receiver using a far-field approximation and, therefore, cannot model near-field directivities.

FIG. 2.

Illustration of the scenario considered for the proposed method. Two speakers are positioned inside a rectangular room. The source speaker has a vibrating piston on one of its surfaces located at x s. The receiver speaker has a microphone on the top surface located at x r. The room transfer function from x s to x r is of interest. Not shown here is the room in which the speakers are placed. The two spherical regions marked in light gray are nonoverlapping.

FIG. 2.

Illustration of the scenario considered for the proposed method. Two speakers are positioned inside a rectangular room. The source speaker has a vibrating piston on one of its surfaces located at x s. The receiver speaker has a microphone on the top surface located at x r. The room transfer function from x s to x r is of interest. Not shown here is the room in which the speakers are placed. The two spherical regions marked in light gray are nonoverlapping.

Close modal
The directivity of a transducer mounted on a device can be either numerically simulated or practically measured at discrete points on a sphere around a local origin. When sound waves are generated or captured by the mounted transducers, e.g., a circular piston at x s or a microphone at x r as shown in Fig. 2, acoustic diffraction occurs due to the wave interaction with the speakers. As shown in previous works,52,56,61 the total sound radiation of a source device can be quantified using directivity coefficients defined on a transparent sphere with radius r 0 ( s ). Using reciprocity, the receiver directivity can be obtained analogously to the source directivity.52 As an example, consider the source device. If the sound field is sampled at J discrete points on the transparent sphere, one can reformulate Eq. (10) with truncated order N as
(16)
where r j denotes the jth position on the transparent sphere, ( θ j , ϕ j ) is the corresponding direction, P n , m ( r 0 ( s ) , k ) = C n , m ( s ) ( k ) h n ( k r 0 ( s ) ) is the spherical wave spectrum,56 and j { 1 , 2 , , J }. Different choices of the distribution of ( θ j , ϕ j ) can be found in the literature.60 If the number of samples is larger than ( N + 1 ) 2, the spherical wave spectrum can be calculated from the sampled sound field by solving an over-determined linear system using a least squares approach. The directivity coefficients of the source can then be determined by C n , m ( s ) ( k ) = P n , m ( r 0 ( s ) , k ) / h n ( k r 0 ( s ) ). The receiver directivity coefficients C n , m ( r ) ( k ) can be obtained following the same procedure by making use of reciprocity.52 
The following parameterization of the transfer function between two transducers with arbitrary directivities in the free field, H ( F ), was proposed by Xu et al.:52 
(17)
where α v , u n , m ( k ) represent the free-field mode coupling coefficients. Note that the two transparent spheres shown in Fig. 2 are nonoverlapping in this work. However, when both the source and receiver are on the same device, one can measure or simulate the direct path to avoid the overlapping of the two transparent spheres. The following relation between D v , u ( r ) ( k ) and the directivity coefficients C v , u ( r ) ( k ) obtained via reciprocity for the receiver was derived in52 
(18)
Since the summation over source images is a linear operation and the mirroring effects are already included in γ v , u n , m in Eq. (12), the total RTF can be written by substituting γ v , u n , m ( k ) for α v , u n , m ( k ) in Eq. (17), yielding
(19)
The modified reverberant mode coupling coefficients can be defined, based on Eq. (12) and the reciprocal relation in Eq. (18), as follows:
(20)
The final proposed RTF is therefore expressed as
(21)
where, for clarity, the proposed method is identified using the acronym diffraction enhanced image source method (DEISM).
The GISM RTF, i.e., the simulation of a directional source to a local receiving region, can be recovered from the proposed RTF as follows. Similar to Xu et al.,52 for a receiver region without any physical objects and an observation point y ( r ) with spherical coordinates ( d y ( r ) , θ y ( r ) , ϕ y ( r ) ) relative to x r, as shown in Fig. 1, the receiver directivity coefficients can be written as
(22)
By substituting Eq. (22) into Eq. (21), and utilizing the property of spherical harmonics Y v , u * ( θ y ( r ) , ϕ y ( r ) ) = ( 1 ) u Y v , u ( θ y ( r ) , ϕ y ( r ) ), one can obtain the same expression of the RTFs as the GISM shown in Eq. (11).

The computational cost of the DEISM RTF is expected to increase with frequency since the directivity patterns tend to be more complicated at higher frequencies,62 resulting in higher orders of the directivities, i.e., larger N and V, required in the DEISM. As either N or V increases, the number of summations over l in the mode coupling coefficients in Eq. (12) increases, thus consuming more computational resources. Therefore, it is necessary to investigate the possibility of simplifying the DEISM in Eq. (21) while maintaining sufficient accuracy. To this end, a far-field approximation is applied to each reflection path of the RTF in Eq. (21) in this section.

Without loss of generality, we consider a reflection path from the source image to the receiver, denoted by the vector R p , q sI r, for the following analysis. The mode coupling coefficients from Eq. (20) for a single reflection path can be expressed as follows:
(23)
(24)
where m = ( 1 ) p x + p y m and Λ ( p , m , n ) = ( 1 ) ( p y + p z ) m + p z n are the additional effects of the mirroring on the spherical harmonics.50 Assuming | | R p , q sI r | | is large, we employ the large argument of the spherical Hankel function, i.e., h l ( 2 ) ( k | | R p , q sI r | | ) i l + 1 e i k | | R p , q sI r | | / k | | R p , q sI r | |. The modified mode coupling coefficients in far field can be expressed as
(25)
Note that the product of two spherical harmonics can be expressed using the formula63 
(26)
By applying Eq. (26) to Eq. (25), we can express the mode coupling coefficients as follows:
(27)
The transfer function of one reflection path using the far-field approximation model can thus be written as
(28)
where DEISM-LC is used to refer to the low-complexity (LC) solution.

The terms Λ ( p , m , n ) Y n , m ( Ω p , q sI r ) are a result of mirroring of spherical harmonics. Since the source images are used in Eq. (28), the additional factors Λ ( p , m , n ) and m need to be associated with the source spherical harmonics. To further simplify the expressions in Eq. (28), we propose to replace Λ ( p , m , n ) Y n , m ( Ω p , q sI r ) with a simplified expression, which can be achieved as shown in the following by using the direction that corresponds to the reversed traveling path of the vector R p , q sI r.

First, we note that the vector pointing from the receiver to the source image R p , q r sI is the inverse of the vector R p , q sI r given in Eq. (4). In addition, we define the vector pointing from the source to the receiver images R p , q s rI, which can be expressed similarly to Eq. (4) as
(29)
In spherical coordinates, the vectors R p , q r sI and R p , q s rI have directions Ω p , q r sI and Ω p , q s rI, respectively. These two vectors are reversed reflection paths, only when there is an odd number of reflections between parallel walls, as shown by the gray arrows in Fig. 3. To take the even number of reflections into account, as shown by the black arrows in Fig. 3, we use the modified vector R p ̃ , q ̃ s rI (with directions Ω p ̃ , q ̃ s rI) that represents the reversed path of R p , q r sI with its subscript indices p ̃ , q ̃ = ( p ̃ x , p ̃ y , p ̃ z ) , ( q ̃ x , q ̃ y , q ̃ z ) taking the following expressions:
(30)
where a { x , y , z }, and p a , q a represent the image indices that are symmetric to pa, qa with respect to the image indices p a , q a = 0 , 0. The relation between p a , q a and pa, qa can be summarized as
(31)
(32)
where · denotes the floored division.
FIG. 3.

An example of reversed reflection paths along the x -direction used in the ISM. For an odd number of reflections, R 1 , 0 r sI and R 1 , 0 s rI share the same-length, but reversed reflection path. However, for an even number of reflections, e.g., R 0 , 1 r sI, the reversed path R 0 , 1 s rI is symmetric with respect to the original room cell instead of R 0 , 1 s rI.

FIG. 3.

An example of reversed reflection paths along the x -direction used in the ISM. For an odd number of reflections, R 1 , 0 r sI and R 1 , 0 s rI share the same-length, but reversed reflection path. However, for an even number of reflections, e.g., R 0 , 1 r sI, the reversed path R 0 , 1 s rI is symmetric with respect to the original room cell instead of R 0 , 1 s rI.

Close modal
Second, due to the properties of the mirroring effect on the spherical harmonics, one can write the following relation between the spherical harmonics associated with the spherical direction Ω p , q sI r in Eq. (28) and Ω p ̃ , q ̃ s rI introduced above:
(33)
Since R p , q r sI = R p , q sI r, i.e., Ω p , q sI r and Ω p , q r sI have opposite directions, the following relation is obtained:
(34)
Finally, by substituting Eqs. (33) and (34) in the approximated solution in Eq. (28), we obtain the simplified expression
(35)
The RTF of the DEISM-LC is then given by
(36)
The DEISM-LC is more computationally efficient since it does not require the summation over l in the mode coupling coefficient in Eq. (12). The DEISM in Eq. (21) has a time complexity of O ( I N 2 V 2 ( N + V ) ) per wavenumber k, where I is the number of images, and N and V are the maximum spherical harmonic (SH) order used to describe directivities of the source and receiver. However, the time complexity of DEISM-LC is reduced to O ( I N 2 V 2 ) per wavenumber k, which can lead to approximately N + V times faster computations compared to DEISM. As the frequency increases, larger N or V is usually necessary to accurately describe the directivity. Therefore, one might find DEISM-LC more practical for simulating high frequencies.

While the simplified method described here is similar to that of the FSRR method, we note that there are significant differences between Eq. (36) and Eq. (15). As stated in Sec. III A, the FSRR utilizes directivity coefficients C ̃ n , m ( s ) ( k ) , C ̃ v , u ( r ) ( k ) that are obtained at 1 m from the source or the receiver. However, the directivity coefficients used in DEISM-LC are not limited to this assumption as long as the transparent spheres enclose the devices. Furthermore, there are inherent dissimilarities between DEISM-LC and FSRR due to the different coefficients, e.g., B p , q , M p , q used in Eq. (15) and in, iv used in Eq. (36). This difference is also illustrated by an example described in Sec. IV B.

Algorithm 1.

Calculate lookup tables.

1: Calculate the Wigner 3-j symbols W 1 ( n , v , l ) and W 2 ( n , v , l , m , u ) from Eq. (14)
2: Calculate the spherical harmonic coefficients P n , m ( k , r 0 ) shown in Eq. (16) for both the source and receiver using the least squares approach. 
3: Calculate the directivity coefficients of the source and receiver, viz., C n , m ( s ) ( k ) and C n , m ( r ) ( k ), using the relation P n , m ( r 0 , k ) = C n , m ( k ) h n ( k r 0 ). Then, store the directivity coefficients as matrices C ( s ) ( n , m , k ) and C ( r ) ( n , m , k )
1: Calculate the Wigner 3-j symbols W 1 ( n , v , l ) and W 2 ( n , v , l , m , u ) from Eq. (14)
2: Calculate the spherical harmonic coefficients P n , m ( k , r 0 ) shown in Eq. (16) for both the source and receiver using the least squares approach. 
3: Calculate the directivity coefficients of the source and receiver, viz., C n , m ( s ) ( k ) and C n , m ( r ) ( k ), using the relation P n , m ( r 0 , k ) = C n , m ( k ) h n ( k r 0 ). Then, store the directivity coefficients as matrices C ( s ) ( n , m , k ) and C ( r ) ( n , m , k )

A summary of the algorithm proposed in Sec. III C is listed as pseudocode in 1, 2, and 3. A Python implementation of the algorithm, along with an example, is available online.64 The pseudocode of the far-field approximation method is not provided here for brevity.

The calculation of the Wigner 3-j symbols can be computationally expensive if they are computed for each reflection path. Since they are identical for each reflection path, to reduce the computational effort, we precalculate them for all the given spherical harmonic orders and modes n , m , v , u , l, and store them in a lookup table. In addition, the directivity coefficients C n , m ( s ) ( k ) and C n , m ( r ) ( k ) are precalculated and stored in a lookup table. The procedures for generating the lookup tables are listed in Algorithm 1.

ALGORITHM 2.

Image generation.

1: p = ( p x , p y , p z ) P = { 0 , 1 } 3 (A triple introduced in Table I). 
2: q = ( q x , q y , q z ) Q = { N m , , 0 , , N m } 3 (A triple introduced in Table I). 
3: A = P × Q (A set containing all possible images.) 
4: for ( p , q ) A do 
5:  if | | 2 q x p x | | + | | 2 q y p y | | + | | 2 q z p z | | N o then { N o = maximum reflection order}. 
6:  Calculate the vector pointing from the source image to the receiver R p , q sI r in Eq. (4) and its spherical coordinates ( d x 0 , θ x 0 , ϕ x 0 )
7:   Calculate the incident angles θ a ( p , q ) with respect to different walls using Eq. (5)
8:   Calculate the reflection coefficients β x 1 , β x 2 , β y 1 , β y 2 , β z 1 , β z 2 using Eq. (6)
9:   Calculate the wall attenuation β ( p , q ) = β x 1 | q x p x | β x 2 | q x | β y 1 | q y p y | β y 2 | q y | β z 1 | q z p z | β z 2 | q z |
10:  else 
11:    A = A { ( p , q ) } (Remove current image if it exceeds No.) 
12:  end if 
13: end for 
1: p = ( p x , p y , p z ) P = { 0 , 1 } 3 (A triple introduced in Table I). 
2: q = ( q x , q y , q z ) Q = { N m , , 0 , , N m } 3 (A triple introduced in Table I). 
3: A = P × Q (A set containing all possible images.) 
4: for ( p , q ) A do 
5:  if | | 2 q x p x | | + | | 2 q y p y | | + | | 2 q z p z | | N o then { N o = maximum reflection order}. 
6:  Calculate the vector pointing from the source image to the receiver R p , q sI r in Eq. (4) and its spherical coordinates ( d x 0 , θ x 0 , ϕ x 0 )
7:   Calculate the incident angles θ a ( p , q ) with respect to different walls using Eq. (5)
8:   Calculate the reflection coefficients β x 1 , β x 2 , β y 1 , β y 2 , β z 1 , β z 2 using Eq. (6)
9:   Calculate the wall attenuation β ( p , q ) = β x 1 | q x p x | β x 2 | q x | β y 1 | q y p y | β y 2 | q y | β z 1 | q z p z | β z 2 | q z |
10:  else 
11:    A = A { ( p , q ) } (Remove current image if it exceeds No.) 
12:  end if 
13: end for 

In the provided implementation, we separate the whole procedure of the ISM into two parts, viz., the image generation given in Algorithm 2 and the single-path transfer function calculation using DEISM given in Algorithm 3. The computation in the spherical harmonic domain is generally also expensive, and therefore, we use parallel computation for all reflection paths after all images are calculated.

ALGORITHM 3.

DEISM core.

1: P(k) = 0 (Initialize the sound field at x r.) 
2: for ( p , q ) A do 
3:  for n = 0 to N do 
4:   for m = n to n do 
5:     Γ = ( 1 ) ( p y + p z ) m + p z n [Part of mirror effect of spherical harmonics in Eq. (12).] 
6:     m = ( 1 ) p x + p y m [The modified mode m of α v , u n , m in Eq. (12).] 
7:    for v = 0 to V do 
8:     for u = v to v do 
9:       α v , u n , m = 0 [Initialize the mode coupling coefficient in Eq. (13).] 
10:      for l = | n v | to n + v do 
11:          ξ = ( 2 n + 1 ) ( 2 v + 1 ) ( 2 l + 1 ) / 4 π
12:          α v , u n , m + = 4 π i v n ( 1 ) m i l h l ( 2 ) ( k d x 0 ) Y l , m u ( θ x 0 , ϕ x 0 ) W 1 ( n , v , l ) 
13:            × W 2 ( n , v , l , m , u ) ξ (Update the mode coupling coefficient.) 
14:      end for 
15:       P ( k ) + = Γ β ( p , q ) C ( s ) ( n , m , k ) α v , u n , m i ( 1 ) u k C ( r ) ( v , u , k ) [Summation over A constitutes the room transfer function in Eq. (21).] 
16:     end for 
17:    end for 
18:   end for 
19:  end for 
20: end for 
1: P(k) = 0 (Initialize the sound field at x r.) 
2: for ( p , q ) A do 
3:  for n = 0 to N do 
4:   for m = n to n do 
5:     Γ = ( 1 ) ( p y + p z ) m + p z n [Part of mirror effect of spherical harmonics in Eq. (12).] 
6:     m = ( 1 ) p x + p y m [The modified mode m of α v , u n , m in Eq. (12).] 
7:    for v = 0 to V do 
8:     for u = v to v do 
9:       α v , u n , m = 0 [Initialize the mode coupling coefficient in Eq. (13).] 
10:      for l = | n v | to n + v do 
11:          ξ = ( 2 n + 1 ) ( 2 v + 1 ) ( 2 l + 1 ) / 4 π
12:          α v , u n , m + = 4 π i v n ( 1 ) m i l h l ( 2 ) ( k d x 0 ) Y l , m u ( θ x 0 , ϕ x 0 ) W 1 ( n , v , l ) 
13:            × W 2 ( n , v , l , m , u ) ξ (Update the mode coupling coefficient.) 
14:      end for 
15:       P ( k ) + = Γ β ( p , q ) C ( s ) ( n , m , k ) α v , u n , m i ( 1 ) u k C ( r ) ( v , u , k ) [Summation over A constitutes the room transfer function in Eq. (21).] 
16:     end for 
17:    end for 
18:   end for 
19:  end for 
20: end for 

In the following, we describe the experimental setups used to evaluate the performance of DEISM and DEISM-LC [see Eqs. (21) and (36), respectively]. The simulated RTFs are plotted and compared for different scenarios. The differences between the DEISM and the FEM solutions are evaluated using error metrics, and the difference between DEISM and DEISM-LC is analyzed. In addition, the effects of incorporating directivities into the RTFs by comparing the DEISM-LC with the original ISM, and the differences between DEISM-LC and FSRR are also demonstrated by comparing them with FEM.

The proposed methods require a description of the directivities of the devices, i.e., a loudspeaker (or microphone) in the spherical harmonic domain. These data can be obtained from measurements of the pressure field in the local vicinity of the transducer and its enclosure. Alternatively, directivity patterns obtained from simulations could be used. In this work, the FEM is used to simulate the free-field pressure fields surrounding the transducer and its enclosure. Aretz et al.39 have demonstrated that solutions of the ISM with angle-dependent reflection coefficients are in good agreement with FEM solutions above the Schroeder frequency. Therefore, the FEM is used in this work to evaluate the performance of the proposed method.

This section presents the details of generating the simulated transducer directivities and the RTFs.

1. Transducer directivities

The three speaker shapes considered in this work are shown in Fig. 4. For each speaker, the loudspeaker driver is modeled as a vibrating piston with frequency-independent unit acceleration. For each microphone, a monopole source with frequency-independent unit volume velocity is placed at the microphone position, which is located on the top and close to the front of each enclosure.

FIG. 4.

Speaker shapes considered in this work. From left to right, these speakers are referred to in the text as cuboidal (cub.), spherical (sph.), and cylindrical (cyl.), where the terms in brackets are abbreviations also used in this work. The three speakers share the same height and the same-sized vibrating piston. Note that the speakers are not to scale in this figure.

FIG. 4.

Speaker shapes considered in this work. From left to right, these speakers are referred to in the text as cuboidal (cub.), spherical (sph.), and cylindrical (cyl.), where the terms in brackets are abbreviations also used in this work. The three speakers share the same height and the same-sized vibrating piston. Note that the speakers are not to scale in this figure.

Close modal

One must ensure that free-field conditions are used to capture the loudspeaker and microphone directivities. In a measurement setup, one might use an anechoic chamber. When using the FEM, a suitable non-reflecting boundary condition is required. In this work, a perfectly matched layer65 is used to reduce unwanted reflections from the edge of the computational domain. Note that care must be taken to ensure that the perfectly matched layer sufficiently reduces the unwanted reflections—this is achieved in this work by performing a priori mesh studies. Each speaker is meshed using triangular elements, the surrounding air (with a sound speed of 343 m/s and density of 1.2 kg/m3) is meshed using tetrahedral elements, and the perfectly matched layer is meshed using hexahedral elements. An average element size that ensures that there are 10 degrees of freedom (or five quadratic elements) per wavelength is used. Quadratic shape functions are used. The free-field transfer function is simulated from 20 Hz to 1 kHz with steps of 2 Hz.

For each speaker, two sizes are considered, which are referred to as Size 1 and Size 2, where Size 2 is the smallest. The sound pressure field is simulated on a transparent sphere around the speaker for each size and shape. Note that, for a given speaker shape, the radius of the transparent sphere changes depending on the size of the speaker and the position of the transducer on the speaker. These changes occur because we place the transducer at the center of the coordinate system. For the source on the speakers of Size 1, the radius of the transparent sphere is 0.4 m, while for Size 2, it is 0.2 m. For the receivers on the speakers, a radius of 0.5 m is used for Size 1, and a radius of 0.25 m is used for Size 2.

The complex-valued pressure field simulated on the surface of the transparent sphere is used to compute the spherical harmonic directivity coefficients. For the loudspeaker directivity, this involves a straightforward implementation of spherical harmonic decomposition.56 To obtain the receiver directivities, we use acoustic reciprocity52 (see Sec. III B).

An example of the directivity patterns is shown in Fig. 5, where the sound pressure level (SPL) and phase are plotted for the spherical and cuboidal speakers with Size 1. The vibrating piston faces the positive direction of the x axis, which is hereafter referred to as + x. The pressure field on the XY-plane is reconstructed using Eq. (16) using simulated directivity coefficients C n , m ( s ) with maximum spherical harmonic order N = 5. It can be seen that both the SPL and phase plots show differences between the two selected speaker shapes, especially at angles that are in the shadow region of the speakers. One can also observe a significant phase jump in the shadow region when a cuboidal speaker is used, which is less prominent for a spherical speaker.

FIG. 5.

Plots of sound pressure level and phase reconstructed using simulated speaker directivity coefficients of a large (Size 1) spherical and cuboidal speaker. The pressure field is reconstructed with maximum spherical harmonic order 5. The results are evaluated on the XY-plane at three frequencies.

FIG. 5.

Plots of sound pressure level and phase reconstructed using simulated speaker directivity coefficients of a large (Size 1) spherical and cuboidal speaker. The pressure field is reconstructed with maximum spherical harmonic order 5. The results are evaluated on the XY-plane at three frequencies.

Close modal

2. Room transfer functions

A rectangular room with dimensions L x × L y × L z = 4 m × 3 m × 2.5 m is considered. The air in the room has a speed of sound of 343 m/s and a density of 1.2 kg/m3. A uniform frequency-independent normalized impedance of 18, which gives an absorption coefficient of approximately 0.2, is specified at the walls of the room. This value is chosen because it gives a reverberation time of approximately 0.29 s at the fundamental frequency of 43.88 Hz, which gives a Schroeder frequency of approximately 198 Hz. The reverberation time was estimated using the modal model described by Morse and Bolt.66 Note that, in general, the surface impedance is a complex-valued frequency-dependent quantity. Our choice of a real-valued frequency-independent impedance does not limit the generality of the method described here.

Five source–receiver configurations are simulated. The center position of each transducer, for each configuration, is given in Table II. Four of the configurations contain two devices—one with a loudspeaker and the other with a microphone. The remaining Configuration 3 has a single device on which the loudspeaker and microphone are placed. The transfer functions of the direct path between the source and receiver are simulated using the FEM for Configuration 3. The microphone position x r C 3 in Table II depends on the size, shape, and rotation of the speaker and takes the values given in Table III. A two-dimensional (2D) illustration of the speaker position and facing configurations is shown in Fig. 6. The source and receiver are both at least 1 m away from the walls in Positions 1–3. Positions 4–5 allow a closer distance between the walls and at least one speaker to examine the performance of the DEISM in challenging configurations.

TABLE I.

Parameters and descriptions used in ISM.

Parameter Descriptions
L x , L y , L z  Room dimensions: length, width, and height 
β x 1 , β x 2 , β y 1 , β y 2 , β z 1 , β z 2  Reflection coefficients of the walls 
p = ( p x , p y , p z ) P  Triple determining whether the image source is mirrored 
  With respect to walls at x = 0, y = 0, or z = 0, elements take values of 0 or 1 
q = ( q x , q y , q z ) Q  Triple determining higher order reflections 
  Each element can take values between [ N m , N m ] 
x p , q sI  Source images 
x p , q rI  Receiver images 
R p , q sI r = x r x p , q sI  Vector pointing from source images to the receiver 
R p , q r sI = R p , q sI r  Vector pointing from receiver to source images 
R p ̃ , q ̃ s rI  The reversed vector corresponding to the same reflection path of R p , q r sI 
Parameter Descriptions
L x , L y , L z  Room dimensions: length, width, and height 
β x 1 , β x 2 , β y 1 , β y 2 , β z 1 , β z 2  Reflection coefficients of the walls 
p = ( p x , p y , p z ) P  Triple determining whether the image source is mirrored 
  With respect to walls at x = 0, y = 0, or z = 0, elements take values of 0 or 1 
q = ( q x , q y , q z ) Q  Triple determining higher order reflections 
  Each element can take values between [ N m , N m ] 
x p , q sI  Source images 
x p , q rI  Receiver images 
R p , q sI r = x r x p , q sI  Vector pointing from source images to the receiver 
R p , q r sI = R p , q sI r  Vector pointing from receiver to source images 
R p ̃ , q ̃ s rI  The reversed vector corresponding to the same reflection path of R p , q r sI 
TABLE II.

Configurations of the transducers and the corresponding speaker orientations used in the room simulation. The position vector x r C 3 for Configuration 3 is listed for different shapes and sizes in TABLE III.

Configuration Source, orientation Receiver, orientation
[1.1, 1.1, 1.3], + x  [2.9, 1.9, 1.3], – x 
[1.1, 1.1, 1.3], + x  [1.9, 1.6, 1.4], – x 
[1.1, 1.1, 1.3], + x  x r C 3 , + x 
[0.4, 1.1, 1.3], + x  [2.1, 1.6, 1.3], – x 
[0.4, 1.1, 1.3], + x  [2.5, 2.6, 1.3], – y 
Configuration Source, orientation Receiver, orientation
[1.1, 1.1, 1.3], + x  [2.9, 1.9, 1.3], – x 
[1.1, 1.1, 1.3], + x  [1.9, 1.6, 1.4], – x 
[1.1, 1.1, 1.3], + x  x r C 3 , + x 
[0.4, 1.1, 1.3], + x  [2.1, 1.6, 1.3], – x 
[0.4, 1.1, 1.3], + x  [2.5, 2.6, 1.3], – y 
TABLE III.

Positions x r C 3 for simulations with the source and receiver on the same speaker (Configuration 3).

Speaker shape Size 1 (large) Size 2 (small)
Cuboid  [1.05, 1.1, 1.5]  [1.075, 1.1, 1.4] 
Spherical  [1.05, 1.1, 1.473]  [1.075, 1.1, 1.387] 
Cylindrical  [1.05, 1.1, 1.5]  [1.075, 1.1, 1.4] 
Speaker shape Size 1 (large) Size 2 (small)
Cuboid  [1.05, 1.1, 1.5]  [1.075, 1.1, 1.4] 
Spherical  [1.05, 1.1, 1.473]  [1.075, 1.1, 1.387] 
Cylindrical  [1.05, 1.1, 1.5]  [1.075, 1.1, 1.4] 
FIG. 6.

A 2D view of the positions of the mounted transducers inside the room. The black dots and gray dots represent the approximate positions of the source and receiver transducers, respectively. The arrows of the dots indicate the directions in which the speakers face, and the numbers indicate the source and receiver positions.

FIG. 6.

A 2D view of the positions of the mounted transducers inside the room. The black dots and gray dots represent the approximate positions of the source and receiver transducers, respectively. The arrows of the dots indicate the directions in which the speakers face, and the numbers indicate the source and receiver positions.

Close modal

Each model is meshed using tetrahedral elements with an average size that ensures that 10 degrees of freedom per wavelength are used. Quadratic shape functions are used. The RTFs for each configuration are simulated, from 20 Hz to 1 kHz, in steps of 2 Hz.

In this section, we compare the DEISM-LC to existing RTF simulation methods. To this end, we choose the original ISM with omnidirectional transducers (referred to as ISM-Omni) and the FSRR method, as introduced in Eqs. (1) and (15), respectively. We demonstrate the effects of incorporating transducer directivities into the RTFs by comparing the ISM-Omni with the DEISM-LC. We are also interested in the differences between the FSRR method and DEISM-LC as they are computationally more efficient compared to the DEISM. In Fig. 7, the magnitudes and phases of the RTFs using the ISM-Omni, DEISM-LC, FSRR, and FEM are plotted. As an example, the results are shown for large cuboidal speakers in Configuration 2 with maximum reflection order 25 and maximum spherical harmonic order 5 for both the source and receiver. The magnitudes of the RTFs are given in terms of SPL in decibels and the unwrapped phases in radians. Unless specified, the magnitude and phase responses use the same units in the subsequent sections. The frequency-dependent SPL is defined as54 
(37)
where P rms is the root mean square (RMS) pressure calculated from the RTFs and P 0 = 2 × 10 5 Pa is the reference pressure. The FEM solutions are the reference RTFs in this section and throughout the following evaluations.
FIG. 7.

A comparison of the (unshifted) magnitudes and phase responses between the ISM-Omni, DEISM-LC, FSRR, and FEM solutions. The results are shown for large cuboidal speakers with Configuration 2. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver.

FIG. 7.

A comparison of the (unshifted) magnitudes and phase responses between the ISM-Omni, DEISM-LC, FSRR, and FEM solutions. The results are shown for large cuboidal speakers with Configuration 2. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver.

Close modal
The RTFs of the ISM-Omni were generated using an omnidirectional source and receiver at the same locations as those used in Configuration 2. The directivity coefficients of the source and receiver are the same and contain only a monopole component based on Eq. (22), i.e.,
(38)
where ( θ , ϕ ) denotes the directions in spherical coordinates of either the point source or the omnidirectional receiver relative to their local origins. For the RTFs of the FSRR given in Eq. (15), the directivity coefficients C ̃ n , m ( s ) ( k ) , C ̃ n , m ( r ) ( k ) are extrapolated from C n , m ( s ) ( k ) , C v , u ( r ) ( k ) to the measurement sphere with a radius of 1 m, and we choose M p , q = β ( p , q ).

Overall, the differences between the DEISM-LC and FEM are small, compared to those between the ISM-Omni and FEM and those between the FSRR and FEM. This indicates that the DEISM-LC can generate more realistic RTFs when compared to the other image source methods for the considered scenario. The large offset in magnitude responses between the DEISM-LC and ISM-Omni mainly come from the different source types, i.e., a point source used in ISM-Omni and a vibrating piston used in DEISM-LC. However, the locations and Q-factors of the resonant peaks and the phase responses also vary greatly. This shows that the directivities of the source and receiver, including sound diffraction around the speakers, can affect the RTFs to a large extent.

The differences between the DEISM-LC and FSRR are mainly attributed to the inherent mismatch in their RTF parameterizations, i.e., between Eqs. (36) and (15). The magnitude responses are different, not only in the amplitude, but also in the locations and Q-factors of the resonant peaks. The phase response of the FSRR even contains positive values at lower frequencies corresponding to non-causal behaviors, which results from the random sign B p , q used in Eq. (15).

Since the inherent differences between DEISM-LC and other methods are shown to be large, i.e., the ISM-Omni and FSRR, we limit the following evaluations to DEISM, DEISM-LC, and FEM.

In this section, we present a comparison between the FEM and DEISM solutions for the small spherical speaker when the distance between the source and receiver is changing, corresponding to Configurations 1–3 in Table II. The two distances in Configurations 1 and 2 are around 1.97 and 0.95 m, and the distances in Configuration 3 are around 0.1 and 0.2 m for the small and large speakers, respectively. The magnitude and phase are compared in Fig. 8. To avoid visual overlap of the curves for different configurations, the SPLs and phases are plotted by shifting the curves vertically, as denoted by the arrows.

FIG. 8.

Magnitudes and phase responses for small spherical speakers with varying distance between speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves. Config., configuration.

FIG. 8.

Magnitudes and phase responses for small spherical speakers with varying distance between speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves. Config., configuration.

Close modal

We see that good agreement is obtained between the FEM and DEISM solutions. In general, greater differences are found at the notches of the RTFs, than at the peaks. Additionally, the differences between DEISM and DEISM-LC solutions are much smaller than those between the FEM and DEISM solutions in the notches of the RTFs. Since Configuration 3 contains only one speaker in the room, the corresponding RTFs show fewer variations across the frequency range when compared to the other two configurations. Similar agreement between FEM and DEISM was found for the other speaker shapes. For brevity, the respective plots are not presented here.

Next, we compare solutions for cases where the source and receiver are placed on the same device (Configuration 3). For this comparison, we analyze data for the three speaker shapes considered.

The respective magnitude and phase solutions are shown in Fig. 9. Once again, good agreement is obtained between the FEM and DEISM solutions. The magnitudes and phases of the RTFs of the cuboidal and cylindrical speakers show very similar behaviour. In contrast, those of the spherical speaker exhibit clear differences when compared to the RTFs of the other speakers. Note that the main differences appear at the notches of the RTF magnitudes, which correspond to frequencies around 444 and 650 Hz. In addition, the phase curves of the cuboidal and cylindrical speakers also exhibit significant jumps at some frequencies, which cannot be observed for the spherical speaker. The differences between the curves indicate that the edges of the cuboidal and cylindrical speakers can induce additional diffraction and lead to distinct characteristics in the RTFs.

FIG. 9.

Magnitudes and phase responses for large speakers with Configuration 3 while varying shapes of the speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves.

FIG. 9.

Magnitudes and phase responses for large speakers with Configuration 3 while varying shapes of the speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves.

Close modal
In this section, we quantify the differences between the FEM and DEISM/DEISM-LC solutions and DEISM and DEISM-LC solutions. The following error metrics are used for the comparison: The root mean square log spectral distance (LSD)67 
(39)
and the normalized root mean square (NRMS) phase error in radians48 
(40)
where K is the set of the K wavenumbers used in simulations, H Test ( k ) is the simulated RTF of either the DEISM or DEISM-LC at wavenumber k, and H FEM ( k ) is the simulated RTF of the FEM at wavenumber k. The log spectral distance and the phase errors are presented for different positions, speaker shapes, and sizes in Tables IV and V, respectively. In both tables, lower values indicate better performance. General observations can be made for Positions 1–3 in the two tables:
  • The smaller the speaker, the smaller the difference between the DEISM and FEM solutions.

  • The fewer speakers, i.e., for Configuration 3, the smaller the difference.

TABLE IV.

Root mean square log spectral distance in decibel between DEISM and FEM, and between DEISM-LC and FEM. S., small. Cub., sph., and cyl. denote cuboidal, spherical, and cylindrical, respectively.

Configuration Cub. S. cub. Sph. S. sph. Cyl. S. cyl.
1 DEISM  1.918  1.214  2.268  1.143  2.062  0.950 
1 DEISM-LC  1.669  1.070  2.206  1.125  2.007  0.916 
2 DEISM  1.974  1.051  2.521  1.003  1.900  0.964 
2 DEISM-LC  1.855  1.003  2.060  0.984  1.809  0.943 
3 DEISM  0.423  0.204  0.841  0.170  0.920  0.154 
3 DEISM-LC  0.427  0.206  0.883  0.174  0.952  0.155 
4 DEISM  7.018  8.187  6.457  7.418  6.350  7.287 
4 DEISM-LC  6.864  8.149  6.461  7.298  6.240  7.211 
5 DEISM  6.887  6.810  7.077  6.309  7.011  6.425 
5 DEISM-LC  6.792  6.807  6.986  6.277  6.968  6.397 
Configuration Cub. S. cub. Sph. S. sph. Cyl. S. cyl.
1 DEISM  1.918  1.214  2.268  1.143  2.062  0.950 
1 DEISM-LC  1.669  1.070  2.206  1.125  2.007  0.916 
2 DEISM  1.974  1.051  2.521  1.003  1.900  0.964 
2 DEISM-LC  1.855  1.003  2.060  0.984  1.809  0.943 
3 DEISM  0.423  0.204  0.841  0.170  0.920  0.154 
3 DEISM-LC  0.427  0.206  0.883  0.174  0.952  0.155 
4 DEISM  7.018  8.187  6.457  7.418  6.350  7.287 
4 DEISM-LC  6.864  8.149  6.461  7.298  6.240  7.211 
5 DEISM  6.887  6.810  7.077  6.309  7.011  6.425 
5 DEISM-LC  6.792  6.807  6.986  6.277  6.968  6.397 
TABLE V.

Normalized root mean square phase error in radians between DEISM and FEM, and between DEISM-LC and FEM. S., denotes small. Cub., sph., and cyl. denote cuboidal, spherical, and cylindrical, respectively.

Configuration Cub. S. cub. Sph. S. sph. Cyl. S. cyl.
1 DEISM  0.258  0.179  0.307  0.174  0.278  0.162 
1 DEISM-LC  0.265  0.153  0.300  0.176  0.267  0.181 
2 DEISM  0.251  0.138  0.246  0.128  0.255  0.123 
2 DEISM-LC  0.246  0.149  0.259  0.142  0.260  0.144 
3 DEISM  0.051  0.026  0.104  0.032  0.093  0.024 
3 DEISM-LC  0.052  0.026  0.109  0.032  0.097  0.024 
4 DEISM  1.796  1.764  1.834  1.776  1.801  1.778 
4 DEISM-LC  1.791  1.763  1.828  1.774  1.781  1.776 
5 DEISM  1.795  1.926  1.818  1.906  1.822  1.921 
5 DEISM-LC  1.791  1.928  1.816  1.907  1.821  1.923 
Configuration Cub. S. cub. Sph. S. sph. Cyl. S. cyl.
1 DEISM  0.258  0.179  0.307  0.174  0.278  0.162 
1 DEISM-LC  0.265  0.153  0.300  0.176  0.267  0.181 
2 DEISM  0.251  0.138  0.246  0.128  0.255  0.123 
2 DEISM-LC  0.246  0.149  0.259  0.142  0.260  0.144 
3 DEISM  0.051  0.026  0.104  0.032  0.093  0.024 
3 DEISM-LC  0.052  0.026  0.109  0.032  0.097  0.024 
4 DEISM  1.796  1.764  1.834  1.776  1.801  1.778 
4 DEISM-LC  1.791  1.763  1.828  1.774  1.781  1.776 
5 DEISM  1.795  1.926  1.818  1.906  1.822  1.921 
5 DEISM-LC  1.791  1.928  1.816  1.907  1.821  1.923 

The errors of the DEISM solutions increase when the speakers are located close to one wall of the room, i.e., for Configurations 4 and 5. The large errors may be attributed to the mirror source approximation since the speakers can have more complicated wave interaction with the nearby walls, which is not considered in this model, and can lead to additional RTF fluctuations. For devices with a physical extent, the assumption of a source image might not hold and could require more complex modeling.

In Configuration 3, the DEISM outperforms DEISM-LC for all speaker shapes and sizes. However, this does not necessarily hold for Configurations 1 and 2. This is mainly caused by the notches of the RTF magnitudes, where the DEISM produces much lower values than DEISM-LC, as shown in Figs. 8 and 9.

To provide insight into the performance of the methods in larger rooms, we consider the case in which the room's length in the x-dimension is doubled. For this test, only the cylindrical driver is considered in Configurations 1–3. The error levels obtained are presented in Table VI. The error levels for the larger room are similar to the values given in Tables IV and V, and imply similar conclusions.

TABLE VI.

Root mean square log spectral distance in decibel and normalized root mean square phase error in radians between DEISM and FEM, and between DEISM-LC and FEM in a larger room. S., small.

RMS LSD (decibel) NRMS phase error (radians)
Configuration Cyl. S. cyl. Cyl. S. cyl.
1 DEISM  1.177  0.967  0.185  0.128 
1 DEISM-LC  1.190  1.036  0.227  0.126 
2 DEISM  0.987  0.928  0.123  0.131 
2 DEISM-LC  1.188  0.972  0.153  0.119 
3 DEISM  0.783  0.319  0.151  0.038 
3 DEISM-LC  0.893  0.320  0.155  0.038 
RMS LSD (decibel) NRMS phase error (radians)
Configuration Cyl. S. cyl. Cyl. S. cyl.
1 DEISM  1.177  0.967  0.185  0.128 
1 DEISM-LC  1.190  1.036  0.227  0.126 
2 DEISM  0.987  0.928  0.123  0.131 
2 DEISM-LC  1.188  0.972  0.153  0.119 
3 DEISM  0.783  0.319  0.151  0.038 
3 DEISM-LC  0.893  0.320  0.155  0.038 
In Sec. IV E, we showed that in some cases, DEISM-LC achieves a smaller difference to the FEM than DEISM, especially at the notches of the RTFs. It is worth noting that the frequencies of the notches might not be accurate since the maximum resolution of the frequencies is 2 Hz. Therefore, directly comparing the DEISMs with the FEM might not be enough to determine whether the simplified DEISM performs worse or better. In the following, we investigate the difference between the DEISMs by computing the relative error using the 2 norm, defined as
(41)
where H ( · ) is the complex-valued vector with the transfer function values H ( · ) ( k ) of all wavenumbers. In this comparison, we consider the error in free-field and reverberant conditions.

The errors in free-field conditions are given in Fig. 10, where the source–receiver distance and the error are plotted on logarithmic scales. One can observe that the difference decreases to around 1% at a distance of 10 m and to around 0.1% at a distance of 100 m. It can be concluded that DEISM-LC asymptotically converges to the DEISM solution as the distance between the source and receiver increases, which is expected from the derivations in Sec. III D.

FIG. 10.

Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver. Both the distance and relative errors are plotted on a logarithmic scale.

FIG. 10.

Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver. Both the distance and relative errors are plotted on a logarithmic scale.

Close modal

In Fig. 11, the relative error is evaluated by changing the maximum reflection order in the DEISM for the cuboidal speakers. Note that for the curves of Configuration 3, the data of reflection order zero is missing since the direct path is simulated using the FEM. The difference between the DEISM solutions reduces for the first few reflection orders, but remains constant for higher reflection orders. Similar behaviors were also found for other speaker shapes, which are not presented here for brevity.

FIG. 11.

Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions as a function of reflection order for the cuboidal speakers at Positions 1, 2, and 3.

FIG. 11.

Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions as a function of reflection order for the cuboidal speakers at Positions 1, 2, and 3.

Close modal

The presented comparisons and error analyses show that the proposed method and its simplified version can simulate RTFs that are in good agreement with FEM solutions in Configurations 1–3. Less agreement is found when a speaker is placed close to the walls, i.e., in Configurations 4 and 5. This might be attributed to the comb filtering caused by the direct path and reflection between the wall and speaker when the distance is less than 0.5 m.68 

The errors between the DEISM and FEM solutions decrease when smaller speakers are used in the simulation. Additionally, the lowest error is obtained when the source and receiver are placed on the same speaker, which reduces the number of speakers in the simulation, and hence, the number of reflecting surfaces. These observations support the hypothesis that the errors come mostly from the omission of speakers with physical extent in the ISM model, i.e., scattering effects between the walls and the speaker's surfaces are not included.

We have also shown that the simplified version (DEISM-LC) converges to its complete version (DEISM) in the free field and yields a small bias when used to simulate RTFs. The simplified version, DEISM-LC, yields a better agreement with the FEM simulations as opposed to the existing FSRR method.

In addition, the computation of a single RTF using FEM took approximately 2 days on a desktop computer with 256 GB of random access memory (RAM), while the computation of the same RTF using DEISM took approximately 5 min on a laptop with 16 GB of RAM. Moreover, DEISM-LC was around ten times faster than DEISM for the scenarios considered in this paper. This reduction of the time needed to compute RTFs can facilitate applications that require large amounts of data, e.g., the generation of training data for data-driven approaches.

A RTF simulation method, which makes use of spherical harmonic directivity coefficients, and which is based on the image source method, has been presented. The directivity coefficients contain information about the sound diffraction around transducers mounted on audio devices, and the method presented aims to capture this effect. The proposed method (DEISM) and its simplified version (DEISM-LC) were compared with FEM solutions and evaluated for different acoustic scenarios. The results show that the proposed approaches can simulate RTFs with good agreement to FEM simulations for shoebox-shaped rooms when devices are not placed close to the room boundaries. A rule of thumb for indicating the regions of the room in which the DEISM provides results that are significantly different from FEM results is a topic for future studies.

Future work could include extending the proposed method to incorporate the sound reflections between the enclosures of multiple devices, or between the enclosures and room boundaries. Such extensions might reduce the error incurred when multiple devices are used or when devices are placed close to the room boundaries. Another valuable extension of the proposed approach would be to apply the DEISM and DEISM-LC to rooms with more general geometries, with the aim of efficiently generating more realistic room acoustics simulations.

1.
M. R.
Schroeder
, “
Novel uses of digital computers in room acoustics
,”
J. Acoust. Soc. Am.
33
(
11
),
1669
(
1961
).
2.
M.
Kleiner
,
B.-I.
Dalenbäck
, and
P.
Svensson
, “
Auralization-an overview
,”
J. Audio Eng. Soc.
41
(
11
),
861
875
(
1993
), available at https://www.aes.org/e-lib/browse.cfm?elib=6976.
3.
L.
Savioja
, “
Modeling techniques for virtual acoustics
,” Ph.D. thesis,
Helsinki University of Technology
,
Espoo, Finland
,
1999
.
4.
D.
Schröder
, “
Physically based real-time auralization of interactive virtual environments
,” Ph.D. thesis,
RWTH Aachen
,
Aachen, Germany
,
2011
.
5.
M.
Vorländer
, “
Computer simulations in room acoustics: Concepts and uncertainties
,”
J. Acoust. Soc. Am.
133
(
3
),
1203
1213
(
2013
).
6.
C.
Schissler
and
D.
Manocha
, “
Interactive sound propagation and rendering for large multi-source scenes
,”
ACM Trans. Graph. (TOG)
36
(
4
),
2:1
2:12
(
2016
).
7.
M.
Vorländer
,
Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality
, 2nd ed. (
Springer
,
Berlin/Heidelberg, Germany
,
2020
).
8.
Z.
Tang
,
H.-Y.
Meng
, and
D.
Manocha
, “
Learning acoustic scattering fields for dynamic interactive sound propagation
,” in
IEEE Virtual Reality and 3D User Interfaces (VR)
,
Lisboa
,
Portugal
(
2021
), pp.
835
844
.
9.
E.
Mabande
,
K.
Kowalczyk
,
H.
Sun
, and
W.
Kellermann
, “
Room geometry inference based on spherical microphone array eigenbeam processing
,”
J. Acoust. Soc. Am.
134
(
4
),
2773
2789
(
2013
).
10.
D.
Cherkassky
and
S.
Gannot
, “
Blind synchronization in wireless acoustic sensor networks
,”
IEEE/ACM Trans. Audio. Speech. Lang. Process.
25
(
3
),
651
661
(
2017
).
11.
C.
Evers
and
P. A.
Naylor
, “
Acoustic slam
,”
IEEE/ACM Trans. Audio. Speech. Lang. Process.
26
(
9
),
1484
1498
(
2018
).
12.
S.
Das
and
T.
Bäckström
, “
Enhancement by postfiltering for speech and audio coding in ad hoc sensor networks
,”
JASA Express Lett.
1
(
1
),
015206
(
2021
).
13.
A.
Herzog
and
E. A. P.
Habets
, “
Distance estimation in the spherical harmonic domain using the spherical wave model
,”
Appl. Acoust.
193
,
108733
(
2022
).
14.
D.
Diaz-Guerra
,
A.
Miguel
, and
J. R.
Beltran
, “
Robust sound source tracking using SRP-PHAT and 3D convolutional neural networks
,”
IEEE/ACM Trans. Audio. Speech. Lang. Process.
29
,
300
311
(
2021
).
15.
F. B.
Gelderblom
,
Y.
Liu
,
J.
Kvam
, and
T. A.
Myrvoll
, “
Synthetic data for dnn-based doa estimation of indoor speech
,” in
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
,
Toronto, Ontario, Canada
(
2021
), pp.
4390
4394
.
16.
F.
Hübner
,
W.
Mack
, and
E. A. P.
Habets
, “
Efficient training data generation for phase-based DOA estimation
,” in
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, Toronto, Canada (
2021
), pp.
456
460
.
17.
J.
Wechsler
,
W.
Mack
, and
E. A. P.
Habets
, “
End-to-end signal-aware direction-of-arrival estimation using weighted steered-response power
,”
European Signal Processing Conference (EUSIPCO)
, Belgrade, Serbia (
2022
), pp.
41
45
.
18.
P.
Srivastava
,
A.
Deleforge
, and
E.
Vincent
, “
Realistic sources, receivers and walls improve the generalisability of virtually-supervised blind acoustic parameter estimators
,” in
International Workshop on Acoustic Signal Enhancement (IWAENC)
,
Bamberg
,
Germany
(
2022
), pp.
1
5
.
19.
P.
Srivastava
,
A.
Deleforge
,
A.
Politis
, and
E.
Vincent
, “
How to (virtually) train your speaker localizer
,” in Proceedings of Interspeech, Dublin, Ireland (
2023
), pp.
1204
1208
.
20.
A.
Herzog
,
S. R.
Chetupalli
, and
E. A. P.
Habets
, “
AmbiSep: Ambisonic-to-ambisonic reverberant speech separation using transformer networks
,” in
International Workshop on Acoustic Signal Enhancement (IWAENC)
,
Bamberg
,
Germany
(
2022
), pp.
1
5
.
21.
P.-A.
Grumiaux
,
S.
Kitić
,
L.
Girin
, and
A.
Guérin
, “
A survey of sound source localization with deep learning methods
,”
J. Acoust. Soc. Am.
152
(
1
),
107
151
(
2022
).
22.
J. B.
Allen
and
D. A.
Berkley
, “
Image method for efficiently simulating small-room acoustics
,”
J. Acoust. Soc. Am.
65
(
4
),
943
950
(
1979
).
23.
P. M.
Peterson
, “
Simulating the response of multiple microphones to a single acoustic source in a reverberant room
,”
J. Acoust. Soc. Am.
80
(
5
),
1527
1529
(
1986
).
24.
J. C. B.
Torres
,
L.
Aspöck
, and
M.
Vorländer
, “
Comparative study of two geometrical acoustic simulation models
,”
J. Braz. Soc. Mech. Sci. Eng.
40
(
6
),
1
15
(
2018
).
25.
Z.
Tang
,
L.
Chen
,
B.
Wu
,
D.
Yu
, and
D.
Manocha
, “
Improving reverberant speech training using diffuse acoustic simulation
,” in
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, Barcelona, Spain (
2020
), pp.
6969
6973
.
26.
E.
Bezzam
,
R.
Scheibler
,
C.
Cadoux
, and
T.
Gisselbrecht
, “
A study on more realistic room simulation for far-field keyword spotting
,” in
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
, Auckland, New Zealand (
2020
), pp.
674
680
.
27.
D. T.
Murphy
, “
Digital waveguide mesh topologies in room acoustics modelling
,” Ph.D. thesis,
University of York
,
York, UK
,
2000
.
28.
K.
Kowalczyk
, “
Boundary and medium modelling using compact finite difference schemes in simulations of room acoustics for audio and architectural design applications
,” Ph.D. thesis,
Queen's University Belfast
,
Belfast, UK
,
2010
.
29.
T.
Sakuma
,
S.
Sakamoto
, and
T.
Otsuru
,
Computational Simulation in Architectural and Environmental Acoustics
(
Springer, Tokyo,
2014
).
30.
B.
Hamilton
and
S.
Bilbao
, “
FDTD methods for 3-D room acoustics simulation with high-order accuracy in space and time
,”
IEEE/ACM Trans. Audio. Speech. Lang. Process.
25
(
11
),
2112
2124
(
2017
).
31.
J. A.
Hargreaves
,
L. R.
Rendell
, and
Y. W.
Lam
, “
A framework for auralization of boundary element method simulations including source and receiver directivity
,”
J. Acoust. Soc. Am.
145
(
4
),
2625
2637
(
2019
).
32.
T.
Yoshida
,
T.
Okuzono
, and
K.
Sakagami
, “
A parallel dissipation-free and dispersion-optimized explicit time-domain FEM for large-scale room acoustics simulation
,”
Buildings
12
(
2
),
105
(
2022
).
33.
M.
Aretz
, “
Combined wave and ray based room acoustic simulations of small rooms
,” Ph.D. thesis,
RWTH Aachen
,
Aachen, Germany
,
2012
.
34.
M. R.
Thomas
, “
Wayverb: A graphical tool for hybrid room acoustics simulation
,” Ph.D. thesis,
University of Huddersfield
,
Huddersfield, UK
,
2017
.
35.
A.
Ratnarajah
,
Z.
Tang
, and
D.
Manocha
, “
TS-RIR: Translated synthetic room impulse responses for speech augmentation
,” in
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
,
Cartagena
,
Columbia
(
2021
), pp.
259
266
.
36.
A.
Luo
,
Y.
Du
,
M. J.
Tarr
,
J. B.
Tenenbaum
,
A.
Torralba
, and
C.
Gan
, “
Learning neural acoustic fields
,” in
Conference on Neural Information Processing Systems (NeurIPS)
,
New Orleans, LA
(
2022
), available at https://openreview.net/pdf?id=D21DRzkZbSB.
37.
J.
Borish
, “
Extension of the image model to arbitrary polyhedra
,”
J. Acoust. Soc. Am.
75
(
6
),
1827
1836
(
1984
).
38.
R.
Scheibler
,
E.
Bezzam
, and
I.
Dokmanić
, “
Pyroomacoustics: A Python package for audio room simulation and array processing algorithms
,” in
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
,
Calgary
,
Alberta
,
Canada
,
351
355
(
2018
).
39.
M.
Aretz
,
P.
Dietrich
, and
M.
Vorländer
, “
Application of the mirror source method for low frequency sound prediction in rectangular rooms
,”
Acta Acust. united Acust.
100
(
2
),
306
319
(
2014
).
40.
M. R.
Schroeder
and
K. H.
Kuttruff
, “
On frequency response curves in rooms. Comparison of experimental, theoretical, and Monte Carlo results for the average frequency spacing between maxima
,”
J. Acoust. Soc. Am.
34
(
1
),
76
80
(
1962
).
41.
E. A. P.
Habets
, “
Room impulse response generator
,”
Technical report
(
2000
)
1
21
.
42.
A.
Wabnitz
,
N.
Epain
,
C.
Jin
, and
A.
Van Schaik
, “
Room acoustics simulation for multichannel microphone arrays
,” in
Proceedings of the International Symposium on Room Acoustics
, Melbourne, Australia,
Citeseer
(
2010
), pp.
1
6
.
43.
D.
Diaz-Guerra
,
A.
Miguel
, and
J. R.
Beltran
, “
gpuRIR: A python library for room impulse response simulation with GPU acceleration
,”
Multimed. Tools Appl.
80
(
4
),
5653
5671
(
2021
).
44.
Z.-h.
Fu
and
J.-w.
Li
, “
GPU-based image method for room impulse response calculation
,”
Multimed. Tools Appl.
75
(
9
),
5205
5221
(
2016
).
45.
M.
Kompis
and
N.
Dillier
, “
Simulating transfer functions in a reverberant room including source directivity and head-shadow effects
,”
J. Acoust. Soc. Am.
93
(
5
),
2779
2787
(
1993
).
46.
T.
Betlehem
and
M.
Poletti
, “
Sound field of a directional source in a reverberant room
,”
J. Acoust. Soc. N. Z.
25
(
4
),
12
22
(
2012
).
47.
F.
Brinkmann
,
V.
Erbes
, and
S.
Weinzierl
, “
Extending the closed form image source model for source directivity
,” in
DAGA, München
(
2018
).
48.
D. P.
Jarrett
,
E. A. P.
Habets
,
M. R. P.
Thomas
, and
P. A.
Naylor
, “
Rigid sphere room impulse response simulation: Algorithm and applications
,”
J. Acoust. Soc. Am.
132
(
3
),
1462
1472
(
2012
).
49.
S.
Hafezi
,
A. H.
Moore
, and
P. A.
Naylor
, “
Modelling source directivity in room impulse response simulation for spherical microphone arrays
,” in
European Signal Processing Conference (EUSIPCO)
,
Nice
,
France
(
2015
), pp.
574
578
.
50.
P. N.
Samarasinghe
,
T. D.
Abhayapala
,
Y.
Lu
,
H.
Chen
, and
G.
Dickins
, “
Spherical harmonics based generalized image source method for simulating room acoustics
,”
J. Acoust. Soc. Am.
144
(
3
),
1381
1391
(
2018
).
51.
Y.
Luo
and
W.
Kim
, “
Fast source-room-receiver acoustics modeling
,” in
European Signal Processing Conference (EUSIPCO)
(
2021
), pp.
51
55
.
52.
Z.
Xu
,
A.
Herzog
,
A.
Lodermeyer
,
E. A. P.
Habets
, and
A. G.
Prinn
, “
Acoustic reciprocity in the spherical harmonic domain: A formulation for directional sources and receivers
,”
JASA Express Lett.
2
,
124801
(
2022
).
53.
H.
Carslaw
, “
Some multiform solutions of the partial differential equations of physical mathematics and their applications
,”
Proc. London Math. Soc.
s1-30
(
1
),
121
165
(
1898
), https://academic.oup.com/plms/article/s1-30/1/121/2917781 (Last viewed December 2023).
54.
H.
Kuttruff
,
Room Acoustics
(
CRC Press
,
Boca Raton, FL
,
2017
).
55.
D. P.
Jarrett
and
E. A. P.
Habets
, “
On the noise reduction performance of a spherical harmonic domain tradeoff beamformer
,”
IEEE Signal Process. Lett.
19
(
11
),
773
776
(
2012
).
56.
E. G.
Williams
,
Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography
, 1st ed. (
Academic Press
,
London
,
1999
).
57.
W.
Zhang
,
T. D.
Abhayapala
,
R. A.
Kennedy
, and
R.
Duraiswami
, “
Insights into head-related transfer function: Spatial dimensionality and continuous representation
,”
J. Acoust. Soc. Am.
127
(
4
),
2347
2357
(
2010
).
58.
P. N.
Samarasinghe
and
T. D.
Abhayapala
, “
Room transfer function measurement from a directional loudspeaker
,” in
IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
,
Xi'an
,
China
(
2016
), pp.
1
5
.
59.
D.
Ward
and
T.
Abhayapala
, “
Reproduction of a plane-wave sound field using an array of loudspeakers
,”
IEEE Trans. Speech Audio Process.
9
(
6
),
697
707
(
2001
).
60.
B.
Rafaely
,
Fundamentals of Spherical Array Processing
(
Springer
,
New York
,
2015
), Vol. 8.
61.
J.
Ahrens
and
S.
Bilbao
, “
Computation of spherical harmonics based sound source directivity models from sparse measurement data
,”
Forum Acusticum
(
2020
), pp.
2019
2026
.
62.
J. G.
Tylka
,
R.
Sridhar
, and
E.
Choueiri
, “
A database of loudspeaker polar radiation measurements
,” in 139th Audio Engineering Society Convention, New York (
2015
), pp.
1
4
.
63.
A. R.
Edmonds
,
Angular Momentum in Quantum Mechanics
(
Princeton University Press
,
Princeton, NJ
,
2016
).
64.
See https://github.com/audiolabs/DEISM for the implementation of the algorithm described in Sec. III E.
65.
J.-P.
Berenger
, “
A perfectly matched layer for the absorption of electromagnetic waves
,”
J. Comput. Phys
114
(
2
),
185
200
(
1994
).
66.
P. M.
Morse
and
R. H.
Bolt
, “
Sound waves in rooms
,”
Rev. Mod. Phys.
16
(
2
),
69
150
(
1944
).
67.
A.
Gray
and
J.
Markel
, “
Distance measures for speech processing
,”
IEEE Trans. Acoust, Speech, Signal Process.
24
(
5
),
380
391
(
1976
).
68.
J.
Paasonen
,
A.
Karapetyan
,
J.
Plogsties
, and
V.
Pulkki
, “
Proximity of surfaces – Acoustic and perceptual effects
,”
J. Audio Eng. Soc.
65
(
12
),
997
1004
(
2017
).