The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency. The standard ISM is limited to simulations of room impulse responses between point sources and omnidirectional receivers. In this work, the ISM is extended using spherical harmonic directivity coefficients to include acoustic diffraction effects. These effects occur in practice when transducers are mounted on audio devices of finite spatial extent, e.g., modern smart speakers with loudspeakers and microphones. The proposed method is verified using finite element simulations of various loudspeaker and microphone configurations in a shoebox-shaped room. It is shown that the accuracy of the proposed method is related to the sizes, shapes, number, and positions of the devices inside a room. A simplified version of the proposed method, which can significantly reduce computational effort, is also presented. The proposed method and its simplified version can simulate room transfer functions more accurately than currently available image source methods and can aid the development and evaluation of speech and acoustic signal processing algorithms, including speech enhancement, acoustic scene analysis, and acoustic parameter estimation.
I. INTRODUCTION
Room acoustic simulation has been an active research field since the 1960s.1 It is of paramount importance for, e.g., auralization and virtual acoustics,2–8 evaluation of acoustic signal processing algorithms,9–13 and data generation for data-driven methods.14–21 Room acoustics simulation methods can be broadly categorized into geometric methods,22–26 wave-based methods,27–32 hybrid methods,33,34 and, more recently, deep-learning–based methods.8,35,36 This paper presents an extension to the well-known geometrical method, the image source method (ISM), which enables the inclusion of acoustic diffraction effects from audio devices of finite extent, e.g., smart speakers on which the transducers are mounted, in shoebox-shaped rooms. The extension to arbitrary room shapes, which has been proposed by Borish37 and realized in well-known toolboxes,4,38 and the consideration of sound scattering by room boundary surfaces is outside the scope of this paper.
Modern smart speakers are a typical embodiment of an audio device that contains loudspeakers and microphones. Generating a reverberant room acoustic simulation that includes the directivities of loudspeakers and microphones due to acoustic diffraction effects is essential for developing and testing new acoustic signal processing algorithms. When using wave-based methods, the acoustic diffraction effects caused by the loudspeaker enclosure are implicitly modeled. However, because of the much higher computational cost of solving wave-based models, geometrical methods are often preferred. While assuming that sound waves behave like light rays does not apply to long wavelengths, Aretz et al.39 have shown that the ISM can be used to approximate wave-based method solutions above the Schroeder frequency40 in rectangular rooms by specifying angle-dependent reflection coefficients. Thus, the ISM can be used to obtain acceptable solutions across most of the audible frequency range above the Schroeder frequency.
The ISM has been implemented in many widely used toolboxes,4,38,41–43 along with various approaches for reducing the computational complexity, for example, parallelization based on graphics processing units (GPUs).43,44 Directional properties of the source and receiver were also incorporated into the ISM to produce more realistic room impulse responses,38,41,42,45–47 and efforts have been made to include acoustic diffraction effects due to the human head45 and rigid spherical baffles.48,49 Jarrett et al.48 extended the ISM to include rigid spherical microphones using an analytical expression of the diffraction effects in the spherical harmonic domain. Samarasinghe et al.50 further extended the ISM, viz., the generalized ISM (GISM), to facilitate parameterization of the reverberant transfer function between a directional source and a directional receiver. More recently, an extension of the ISM to incorporate loudspeaker directivities in the far field in the spherical harmonic domain has been reported.51 However, the existing approaches cannot simulate a transfer function between a directive source and a directive receiver that includes near-field diffraction effects.
In this work, the GISM50 is extended to enable the incorporation of the local acoustic diffraction effects that result from the interaction of sound waves with audio devices of finite spatial extent, which are used to excite and capture the acoustic field in a room. First, we review the basic concepts of the ISM and its extension to the spherical harmonic domain in Sec. II. In Sec. III, we present our proposed method, which incorporates the source directivity coefficients computed from solutions of an exterior radiation problem and receiver directivity coefficients obtained using reciprocity.52 The pseudocode of the proposed method is also included at the end of the section. In addition, we derive a simplified version of the proposed method to reduce computational complexity. The finite element method (FEM) is used to verify the proposed method by comparing room transfer functions (RTFs) with the source and receiver mounted on either different devices or on the same device. The experimental setups and the evaluation of the proposed method are presented in Sec. IV. The effect of varying the parameters of the device, e.g., its position in the room, physical shape, and size, on the accuracy of the proposed method, is investigated. Finally, conclusions are drawn in Sec. V.
II. THE IMAGE SOURCE METHOD
A. Standard image source method
The ISM was first applied to acoustics by Carslaw53 and later implemented using a digital computer by Allen and Berkley22 to model the impulse response between a point source and an omnidirectional receiver in a reverberant rectangular room.
B. Extensions of the standard ISM
Many extensions to the ISM have been proposed to facilitate more realistic modeling of room acoustics. In this section, we discuss two important extensions of the ISM relevant to the present study, namely, using angle-dependent reflection coefficients and including source and receiver directivities.
1. Angle-dependent reflection coefficients
In Eq. (1), angle-independent reflection coefficients are used. Aretz et al.39 have shown that including the angle of incidence when specifying the boundary conditions in the ISM can result in solutions that agree with FEM solutions. As such, angle-dependent reflection coefficients are used in this work.
2. Extension to the spherical harmonic domain
In practice, the sound source and receiver in a room usually exhibit directional properties that are far more complex than those of a point source or an omnidirectional receiver. Various methods for incorporating source and receiver directivities into the ISM have been presented in the literature.15,18,38,41,42,45–47,49 In this section, we present the spherical microphone array impulse response (SMIR) generator proposed by Jarrett et al.55
C. Generalized image source method
The method introduced in Eq. (8) to model directional sources and receivers is restricted to directivity patterns with known analytical expressions. In practice, directivity patterns may exhibit complex shapes without analytical expressions. Samarasinghe et al.50 further extended the ISM to facilitate the parameterization of the transfer function between a source with arbitrary directivity and a receiver region. The directional properties of the receiver region can be formed using beamforming techniques based on the received sound field. In this section, we describe the GISM, which is the foundation of our proposed method.
The GISM decomposes the transfer function into three parts: (i) an outgoing sound field from the directional source, (ii) an incident sound field captured by the receiver that can be directional, and (iii) the mode coupling coefficients that couple the source and receiver directivities in the spherical harmonic domain.
Illustration of the scenario modeled by Samarasinghe et al. (Ref. 50). A directional source is positioned inside a rectangular room. The room transfer function from the source located at to an observation point around the receiver located at can be simulated.
Illustration of the scenario modeled by Samarasinghe et al. (Ref. 50). A directional source is positioned inside a rectangular room. The room transfer function from the source located at to an observation point around the receiver located at can be simulated.
D. Fast source room receiver method
The fast source room receiver (FSRR) method, proposed by Luo and Kim,51 also simulates the RTF in the spherical harmonic domain. The RTF parameterization includes an approximation of the spherical harmonic description of the source and receiver directivities and utilizes a simplification of the cross-modal coupling, i.e., the mode coupling coefficients in Eq. (12). The method also uses a randomized sign parameter. Notably, the FSRR method uses reversed reflection paths—a technique explored in this work for deriving a simplified version of the proposed method, which is presented in Sec. III D.
III. PROPOSED METHOD
We have seen that the SMIR generator relies on analytical descriptions of directivity, the GISM simulates either a directive source or a directive receiver, and the FSRR method approximates the RTF between directional sources and receivers. In this section, we propose a method for simulating RTFs that can include near-field diffraction effects of both directional sources and directional receivers.
A. Problem formulation
We consider the following scenario: one source device and one receiver device are positioned in a rectangular room, as depicted in Fig. 2. The source contains a sound-radiating transducer located at , e.g., a vibrating piston on one surface, and a microphone located at is mounted on the receiver. The two devices can have arbitrary shapes and materials, and the transducers can exhibit arbitrary directivity patterns. The acoustic field generated by the source transducer interacts with both devices and the walls and is captured by the microphone. We aim to incorporate the local diffraction effects around the devices into the RTFs based on the ISM. The existing methods are unable to resolve the problem under consideration due to the following reasons:
-
As shown by Xu et al.,52 the terms corresponding to the incident sound field in Eq. (11) cannot model the diffraction around the receiver. Therefore, even utilizing the acoustic reciprocity principle, the GISM can only model the RTFs when either the source or the receiver does not include local diffraction effects.
-
The FSRR method51 extends the ISM to include the directivity of both the source and receiver using a far-field approximation and, therefore, cannot model near-field directivities.
Illustration of the scenario considered for the proposed method. Two speakers are positioned inside a rectangular room. The source speaker has a vibrating piston on one of its surfaces located at . The receiver speaker has a microphone on the top surface located at . The room transfer function from to is of interest. Not shown here is the room in which the speakers are placed. The two spherical regions marked in light gray are nonoverlapping.
Illustration of the scenario considered for the proposed method. Two speakers are positioned inside a rectangular room. The source speaker has a vibrating piston on one of its surfaces located at . The receiver speaker has a microphone on the top surface located at . The room transfer function from to is of interest. Not shown here is the room in which the speakers are placed. The two spherical regions marked in light gray are nonoverlapping.
B. Source and receiver directivities
C. Room transfer function
D. Far-field approximation
The computational cost of the DEISM RTF is expected to increase with frequency since the directivity patterns tend to be more complicated at higher frequencies,62 resulting in higher orders of the directivities, i.e., larger N and V, required in the DEISM. As either N or V increases, the number of summations over l in the mode coupling coefficients in Eq. (12) increases, thus consuming more computational resources. Therefore, it is necessary to investigate the possibility of simplifying the DEISM in Eq. (21) while maintaining sufficient accuracy. To this end, a far-field approximation is applied to each reflection path of the RTF in Eq. (21) in this section.
The terms are a result of mirroring of spherical harmonics. Since the source images are used in Eq. (28), the additional factors and need to be associated with the source spherical harmonics. To further simplify the expressions in Eq. (28), we propose to replace with a simplified expression, which can be achieved as shown in the following by using the direction that corresponds to the reversed traveling path of the vector .
An example of reversed reflection paths along the direction used in the ISM. For an odd number of reflections, and share the same-length, but reversed reflection path. However, for an even number of reflections, e.g., , the reversed path is symmetric with respect to the original room cell instead of .
An example of reversed reflection paths along the direction used in the ISM. For an odd number of reflections, and share the same-length, but reversed reflection path. However, for an even number of reflections, e.g., , the reversed path is symmetric with respect to the original room cell instead of .
While the simplified method described here is similar to that of the FSRR method, we note that there are significant differences between Eq. (36) and Eq. (15). As stated in Sec. III A, the FSRR utilizes directivity coefficients that are obtained at 1 m from the source or the receiver. However, the directivity coefficients used in DEISM-LC are not limited to this assumption as long as the transparent spheres enclose the devices. Furthermore, there are inherent dissimilarities between DEISM-LC and FSRR due to the different coefficients, e.g., used in Eq. (15) and in, iv used in Eq. (36). This difference is also illustrated by an example described in Sec. IV B.
Calculate lookup tables.
1: Calculate the Wigner 3-j symbols and from Eq. (14). |
2: Calculate the spherical harmonic coefficients shown in Eq. (16) for both the source and receiver using the least squares approach. |
3: Calculate the directivity coefficients of the source and receiver, viz., and , using the relation . Then, store the directivity coefficients as matrices and . |
1: Calculate the Wigner 3-j symbols and from Eq. (14). |
2: Calculate the spherical harmonic coefficients shown in Eq. (16) for both the source and receiver using the least squares approach. |
3: Calculate the directivity coefficients of the source and receiver, viz., and , using the relation . Then, store the directivity coefficients as matrices and . |
E. Algorithm summary
A summary of the algorithm proposed in Sec. III C is listed as pseudocode in 1, 2, and 3. A Python implementation of the algorithm, along with an example, is available online.64 The pseudocode of the far-field approximation method is not provided here for brevity.
The calculation of the Wigner 3-j symbols can be computationally expensive if they are computed for each reflection path. Since they are identical for each reflection path, to reduce the computational effort, we precalculate them for all the given spherical harmonic orders and modes , and store them in a lookup table. In addition, the directivity coefficients and are precalculated and stored in a lookup table. The procedures for generating the lookup tables are listed in Algorithm 1.
Image generation.
1: (A triple introduced in Table I). |
2: (A triple introduced in Table I). |
3: (A set containing all possible images.) |
4: for do |
5: if then { maximum reflection order}. |
6: Calculate the vector pointing from the source image to the receiver in Eq. (4) and its spherical coordinates . |
7: Calculate the incident angles with respect to different walls using Eq. (5). |
8: Calculate the reflection coefficients using Eq. (6). |
9: Calculate the wall attenuation . |
10: else |
11: (Remove current image if it exceeds No.) |
12: end if |
13: end for |
1: (A triple introduced in Table I). |
2: (A triple introduced in Table I). |
3: (A set containing all possible images.) |
4: for do |
5: if then { maximum reflection order}. |
6: Calculate the vector pointing from the source image to the receiver in Eq. (4) and its spherical coordinates . |
7: Calculate the incident angles with respect to different walls using Eq. (5). |
8: Calculate the reflection coefficients using Eq. (6). |
9: Calculate the wall attenuation . |
10: else |
11: (Remove current image if it exceeds No.) |
12: end if |
13: end for |
In the provided implementation, we separate the whole procedure of the ISM into two parts, viz., the image generation given in Algorithm 2 and the single-path transfer function calculation using DEISM given in Algorithm 3. The computation in the spherical harmonic domain is generally also expensive, and therefore, we use parallel computation for all reflection paths after all images are calculated.
DEISM core.
1: P(k) = 0 (Initialize the sound field at .) |
2: for do |
3: for n = 0 to N do |
4: for to n do |
5: [Part of mirror effect of spherical harmonics in Eq. (12).] |
6: [The modified mode m of in Eq. (12).] |
7: for v = 0 to V do |
8: for to v do |
9: [Initialize the mode coupling coefficient in Eq. (13).] |
10: for to n + v do |
11: . |
12: |
13: (Update the mode coupling coefficient.) |
14: end for |
15: [Summation over constitutes the room transfer function in Eq. (21).] |
16: end for |
17: end for |
18: end for |
19: end for |
20: end for |
1: P(k) = 0 (Initialize the sound field at .) |
2: for do |
3: for n = 0 to N do |
4: for to n do |
5: [Part of mirror effect of spherical harmonics in Eq. (12).] |
6: [The modified mode m of in Eq. (12).] |
7: for v = 0 to V do |
8: for to v do |
9: [Initialize the mode coupling coefficient in Eq. (13).] |
10: for to n + v do |
11: . |
12: |
13: (Update the mode coupling coefficient.) |
14: end for |
15: [Summation over constitutes the room transfer function in Eq. (21).] |
16: end for |
17: end for |
18: end for |
19: end for |
20: end for |
IV. EVALUATION OF THE PROPOSED METHODS
In the following, we describe the experimental setups used to evaluate the performance of DEISM and DEISM-LC [see Eqs. (21) and (36), respectively]. The simulated RTFs are plotted and compared for different scenarios. The differences between the DEISM and the FEM solutions are evaluated using error metrics, and the difference between DEISM and DEISM-LC is analyzed. In addition, the effects of incorporating directivities into the RTFs by comparing the DEISM-LC with the original ISM, and the differences between DEISM-LC and FSRR are also demonstrated by comparing them with FEM.
A. Test setup
The proposed methods require a description of the directivities of the devices, i.e., a loudspeaker (or microphone) in the spherical harmonic domain. These data can be obtained from measurements of the pressure field in the local vicinity of the transducer and its enclosure. Alternatively, directivity patterns obtained from simulations could be used. In this work, the FEM is used to simulate the free-field pressure fields surrounding the transducer and its enclosure. Aretz et al.39 have demonstrated that solutions of the ISM with angle-dependent reflection coefficients are in good agreement with FEM solutions above the Schroeder frequency. Therefore, the FEM is used in this work to evaluate the performance of the proposed method.
This section presents the details of generating the simulated transducer directivities and the RTFs.
1. Transducer directivities
The three speaker shapes considered in this work are shown in Fig. 4. For each speaker, the loudspeaker driver is modeled as a vibrating piston with frequency-independent unit acceleration. For each microphone, a monopole source with frequency-independent unit volume velocity is placed at the microphone position, which is located on the top and close to the front of each enclosure.
Speaker shapes considered in this work. From left to right, these speakers are referred to in the text as cuboidal (cub.), spherical (sph.), and cylindrical (cyl.), where the terms in brackets are abbreviations also used in this work. The three speakers share the same height and the same-sized vibrating piston. Note that the speakers are not to scale in this figure.
Speaker shapes considered in this work. From left to right, these speakers are referred to in the text as cuboidal (cub.), spherical (sph.), and cylindrical (cyl.), where the terms in brackets are abbreviations also used in this work. The three speakers share the same height and the same-sized vibrating piston. Note that the speakers are not to scale in this figure.
One must ensure that free-field conditions are used to capture the loudspeaker and microphone directivities. In a measurement setup, one might use an anechoic chamber. When using the FEM, a suitable non-reflecting boundary condition is required. In this work, a perfectly matched layer65 is used to reduce unwanted reflections from the edge of the computational domain. Note that care must be taken to ensure that the perfectly matched layer sufficiently reduces the unwanted reflections—this is achieved in this work by performing a priori mesh studies. Each speaker is meshed using triangular elements, the surrounding air (with a sound speed of 343 m/s and density of 1.2 kg/m3) is meshed using tetrahedral elements, and the perfectly matched layer is meshed using hexahedral elements. An average element size that ensures that there are 10 degrees of freedom (or five quadratic elements) per wavelength is used. Quadratic shape functions are used. The free-field transfer function is simulated from 20 Hz to 1 kHz with steps of 2 Hz.
For each speaker, two sizes are considered, which are referred to as Size 1 and Size 2, where Size 2 is the smallest. The sound pressure field is simulated on a transparent sphere around the speaker for each size and shape. Note that, for a given speaker shape, the radius of the transparent sphere changes depending on the size of the speaker and the position of the transducer on the speaker. These changes occur because we place the transducer at the center of the coordinate system. For the source on the speakers of Size 1, the radius of the transparent sphere is 0.4 m, while for Size 2, it is 0.2 m. For the receivers on the speakers, a radius of 0.5 m is used for Size 1, and a radius of 0.25 m is used for Size 2.
The complex-valued pressure field simulated on the surface of the transparent sphere is used to compute the spherical harmonic directivity coefficients. For the loudspeaker directivity, this involves a straightforward implementation of spherical harmonic decomposition.56 To obtain the receiver directivities, we use acoustic reciprocity52 (see Sec. III B).
An example of the directivity patterns is shown in Fig. 5, where the sound pressure level (SPL) and phase are plotted for the spherical and cuboidal speakers with Size 1. The vibrating piston faces the positive direction of the x axis, which is hereafter referred to as . The pressure field on the XY-plane is reconstructed using Eq. (16) using simulated directivity coefficients with maximum spherical harmonic order N = 5. It can be seen that both the SPL and phase plots show differences between the two selected speaker shapes, especially at angles that are in the shadow region of the speakers. One can also observe a significant phase jump in the shadow region when a cuboidal speaker is used, which is less prominent for a spherical speaker.
Plots of sound pressure level and phase reconstructed using simulated speaker directivity coefficients of a large (Size 1) spherical and cuboidal speaker. The pressure field is reconstructed with maximum spherical harmonic order 5. The results are evaluated on the XY-plane at three frequencies.
Plots of sound pressure level and phase reconstructed using simulated speaker directivity coefficients of a large (Size 1) spherical and cuboidal speaker. The pressure field is reconstructed with maximum spherical harmonic order 5. The results are evaluated on the XY-plane at three frequencies.
2. Room transfer functions
A rectangular room with dimensions is considered. The air in the room has a speed of sound of 343 m/s and a density of 1.2 kg/m3. A uniform frequency-independent normalized impedance of 18, which gives an absorption coefficient of approximately 0.2, is specified at the walls of the room. This value is chosen because it gives a reverberation time of approximately 0.29 s at the fundamental frequency of 43.88 Hz, which gives a Schroeder frequency of approximately 198 Hz. The reverberation time was estimated using the modal model described by Morse and Bolt.66 Note that, in general, the surface impedance is a complex-valued frequency-dependent quantity. Our choice of a real-valued frequency-independent impedance does not limit the generality of the method described here.
Five source–receiver configurations are simulated. The center position of each transducer, for each configuration, is given in Table II. Four of the configurations contain two devices—one with a loudspeaker and the other with a microphone. The remaining Configuration 3 has a single device on which the loudspeaker and microphone are placed. The transfer functions of the direct path between the source and receiver are simulated using the FEM for Configuration 3. The microphone position in Table II depends on the size, shape, and rotation of the speaker and takes the values given in Table III. A two-dimensional (2D) illustration of the speaker position and facing configurations is shown in Fig. 6. The source and receiver are both at least 1 m away from the walls in Positions 1–3. Positions 4–5 allow a closer distance between the walls and at least one speaker to examine the performance of the DEISM in challenging configurations.
Parameters and descriptions used in ISM.
Parameter . | Descriptions . |
---|---|
Room dimensions: length, width, and height | |
Reflection coefficients of the walls | |
Triple determining whether the image source is mirrored | |
With respect to walls at x = 0, y = 0, or z = 0, elements take values of 0 or 1 | |
Triple determining higher order reflections | |
Each element can take values between | |
Source images | |
Receiver images | |
Vector pointing from source images to the receiver | |
Vector pointing from receiver to source images | |
The reversed vector corresponding to the same reflection path of |
Parameter . | Descriptions . |
---|---|
Room dimensions: length, width, and height | |
Reflection coefficients of the walls | |
Triple determining whether the image source is mirrored | |
With respect to walls at x = 0, y = 0, or z = 0, elements take values of 0 or 1 | |
Triple determining higher order reflections | |
Each element can take values between | |
Source images | |
Receiver images | |
Vector pointing from source images to the receiver | |
Vector pointing from receiver to source images | |
The reversed vector corresponding to the same reflection path of |
Configurations of the transducers and the corresponding speaker orientations used in the room simulation. The position vector for Configuration 3 is listed for different shapes and sizes in TABLE III.
Configuration . | Source, orientation . | Receiver, orientation . |
---|---|---|
1 | [1.1, 1.1, 1.3], | [2.9, 1.9, 1.3], – x |
2 | [1.1, 1.1, 1.3], | [1.9, 1.6, 1.4], – x |
3 | [1.1, 1.1, 1.3], | |
4 | [0.4, 1.1, 1.3], | [2.1, 1.6, 1.3], – x |
5 | [0.4, 1.1, 1.3], | [2.5, 2.6, 1.3], – y |
Configuration . | Source, orientation . | Receiver, orientation . |
---|---|---|
1 | [1.1, 1.1, 1.3], | [2.9, 1.9, 1.3], – x |
2 | [1.1, 1.1, 1.3], | [1.9, 1.6, 1.4], – x |
3 | [1.1, 1.1, 1.3], | |
4 | [0.4, 1.1, 1.3], | [2.1, 1.6, 1.3], – x |
5 | [0.4, 1.1, 1.3], | [2.5, 2.6, 1.3], – y |
Positions for simulations with the source and receiver on the same speaker (Configuration 3).
Speaker shape . | Size 1 (large) . | Size 2 (small) . |
---|---|---|
Cuboid | [1.05, 1.1, 1.5] | [1.075, 1.1, 1.4] |
Spherical | [1.05, 1.1, 1.473] | [1.075, 1.1, 1.387] |
Cylindrical | [1.05, 1.1, 1.5] | [1.075, 1.1, 1.4] |
Speaker shape . | Size 1 (large) . | Size 2 (small) . |
---|---|---|
Cuboid | [1.05, 1.1, 1.5] | [1.075, 1.1, 1.4] |
Spherical | [1.05, 1.1, 1.473] | [1.075, 1.1, 1.387] |
Cylindrical | [1.05, 1.1, 1.5] | [1.075, 1.1, 1.4] |
A 2D view of the positions of the mounted transducers inside the room. The black dots and gray dots represent the approximate positions of the source and receiver transducers, respectively. The arrows of the dots indicate the directions in which the speakers face, and the numbers indicate the source and receiver positions.
A 2D view of the positions of the mounted transducers inside the room. The black dots and gray dots represent the approximate positions of the source and receiver transducers, respectively. The arrows of the dots indicate the directions in which the speakers face, and the numbers indicate the source and receiver positions.
Each model is meshed using tetrahedral elements with an average size that ensures that 10 degrees of freedom per wavelength are used. Quadratic shape functions are used. The RTFs for each configuration are simulated, from 20 Hz to 1 kHz, in steps of 2 Hz.
B. Preliminary comparisons with ISM and FSRR
A comparison of the (unshifted) magnitudes and phase responses between the ISM-Omni, DEISM-LC, FSRR, and FEM solutions. The results are shown for large cuboidal speakers with Configuration 2. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver.
A comparison of the (unshifted) magnitudes and phase responses between the ISM-Omni, DEISM-LC, FSRR, and FEM solutions. The results are shown for large cuboidal speakers with Configuration 2. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver.
Overall, the differences between the DEISM-LC and FEM are small, compared to those between the ISM-Omni and FEM and those between the FSRR and FEM. This indicates that the DEISM-LC can generate more realistic RTFs when compared to the other image source methods for the considered scenario. The large offset in magnitude responses between the DEISM-LC and ISM-Omni mainly come from the different source types, i.e., a point source used in ISM-Omni and a vibrating piston used in DEISM-LC. However, the locations and Q-factors of the resonant peaks and the phase responses also vary greatly. This shows that the directivities of the source and receiver, including sound diffraction around the speakers, can affect the RTFs to a large extent.
The differences between the DEISM-LC and FSRR are mainly attributed to the inherent mismatch in their RTF parameterizations, i.e., between Eqs. (36) and (15). The magnitude responses are different, not only in the amplitude, but also in the locations and Q-factors of the resonant peaks. The phase response of the FSRR even contains positive values at lower frequencies corresponding to non-causal behaviors, which results from the random sign used in Eq. (15).
Since the inherent differences between DEISM-LC and other methods are shown to be large, i.e., the ISM-Omni and FSRR, we limit the following evaluations to DEISM, DEISM-LC, and FEM.
C. Varying distance between source and receiver
In this section, we present a comparison between the FEM and DEISM solutions for the small spherical speaker when the distance between the source and receiver is changing, corresponding to Configurations 1–3 in Table II. The two distances in Configurations 1 and 2 are around 1.97 and 0.95 m, and the distances in Configuration 3 are around 0.1 and 0.2 m for the small and large speakers, respectively. The magnitude and phase are compared in Fig. 8. To avoid visual overlap of the curves for different configurations, the SPLs and phases are plotted by shifting the curves vertically, as denoted by the arrows.
Magnitudes and phase responses for small spherical speakers with varying distance between speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves. Config., configuration.
Magnitudes and phase responses for small spherical speakers with varying distance between speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves. Config., configuration.
We see that good agreement is obtained between the FEM and DEISM solutions. In general, greater differences are found at the notches of the RTFs, than at the peaks. Additionally, the differences between DEISM and DEISM-LC solutions are much smaller than those between the FEM and DEISM solutions in the notches of the RTFs. Since Configuration 3 contains only one speaker in the room, the corresponding RTFs show fewer variations across the frequency range when compared to the other two configurations. Similar agreement between FEM and DEISM was found for the other speaker shapes. For brevity, the respective plots are not presented here.
D. Source and receiver on the same device
Next, we compare solutions for cases where the source and receiver are placed on the same device (Configuration 3). For this comparison, we analyze data for the three speaker shapes considered.
The respective magnitude and phase solutions are shown in Fig. 9. Once again, good agreement is obtained between the FEM and DEISM solutions. The magnitudes and phases of the RTFs of the cuboidal and cylindrical speakers show very similar behaviour. In contrast, those of the spherical speaker exhibit clear differences when compared to the RTFs of the other speakers. Note that the main differences appear at the notches of the RTF magnitudes, which correspond to frequencies around 444 and 650 Hz. In addition, the phase curves of the cuboidal and cylindrical speakers also exhibit significant jumps at some frequencies, which cannot be observed for the spherical speaker. The differences between the curves indicate that the edges of the cuboidal and cylindrical speakers can induce additional diffraction and lead to distinct characteristics in the RTFs.
Magnitudes and phase responses for large speakers with Configuration 3 while varying shapes of the speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves.
Magnitudes and phase responses for large speakers with Configuration 3 while varying shapes of the speakers. Max. reflection order is 25 and max. SH orders are 5 for both the source and receiver. The magnitudes and phases are shifted by vertical arrows for better distinction of the curves.
E. Quantitative analysis
-
The smaller the speaker, the smaller the difference between the DEISM and FEM solutions.
-
The fewer speakers, i.e., for Configuration 3, the smaller the difference.
Root mean square log spectral distance in decibel between DEISM and FEM, and between DEISM-LC and FEM. S., small. Cub., sph., and cyl. denote cuboidal, spherical, and cylindrical, respectively.
Configuration . | Cub. . | S. cub. . | Sph. . | S. sph. . | Cyl. . | S. cyl. . |
---|---|---|---|---|---|---|
1 DEISM | 1.918 | 1.214 | 2.268 | 1.143 | 2.062 | 0.950 |
1 DEISM-LC | 1.669 | 1.070 | 2.206 | 1.125 | 2.007 | 0.916 |
2 DEISM | 1.974 | 1.051 | 2.521 | 1.003 | 1.900 | 0.964 |
2 DEISM-LC | 1.855 | 1.003 | 2.060 | 0.984 | 1.809 | 0.943 |
3 DEISM | 0.423 | 0.204 | 0.841 | 0.170 | 0.920 | 0.154 |
3 DEISM-LC | 0.427 | 0.206 | 0.883 | 0.174 | 0.952 | 0.155 |
4 DEISM | 7.018 | 8.187 | 6.457 | 7.418 | 6.350 | 7.287 |
4 DEISM-LC | 6.864 | 8.149 | 6.461 | 7.298 | 6.240 | 7.211 |
5 DEISM | 6.887 | 6.810 | 7.077 | 6.309 | 7.011 | 6.425 |
5 DEISM-LC | 6.792 | 6.807 | 6.986 | 6.277 | 6.968 | 6.397 |
Configuration . | Cub. . | S. cub. . | Sph. . | S. sph. . | Cyl. . | S. cyl. . |
---|---|---|---|---|---|---|
1 DEISM | 1.918 | 1.214 | 2.268 | 1.143 | 2.062 | 0.950 |
1 DEISM-LC | 1.669 | 1.070 | 2.206 | 1.125 | 2.007 | 0.916 |
2 DEISM | 1.974 | 1.051 | 2.521 | 1.003 | 1.900 | 0.964 |
2 DEISM-LC | 1.855 | 1.003 | 2.060 | 0.984 | 1.809 | 0.943 |
3 DEISM | 0.423 | 0.204 | 0.841 | 0.170 | 0.920 | 0.154 |
3 DEISM-LC | 0.427 | 0.206 | 0.883 | 0.174 | 0.952 | 0.155 |
4 DEISM | 7.018 | 8.187 | 6.457 | 7.418 | 6.350 | 7.287 |
4 DEISM-LC | 6.864 | 8.149 | 6.461 | 7.298 | 6.240 | 7.211 |
5 DEISM | 6.887 | 6.810 | 7.077 | 6.309 | 7.011 | 6.425 |
5 DEISM-LC | 6.792 | 6.807 | 6.986 | 6.277 | 6.968 | 6.397 |
Normalized root mean square phase error in radians between DEISM and FEM, and between DEISM-LC and FEM. S., denotes small. Cub., sph., and cyl. denote cuboidal, spherical, and cylindrical, respectively.
Configuration . | Cub. . | S. cub. . | Sph. . | S. sph. . | Cyl. . | S. cyl. . |
---|---|---|---|---|---|---|
1 DEISM | 0.258 | 0.179 | 0.307 | 0.174 | 0.278 | 0.162 |
1 DEISM-LC | 0.265 | 0.153 | 0.300 | 0.176 | 0.267 | 0.181 |
2 DEISM | 0.251 | 0.138 | 0.246 | 0.128 | 0.255 | 0.123 |
2 DEISM-LC | 0.246 | 0.149 | 0.259 | 0.142 | 0.260 | 0.144 |
3 DEISM | 0.051 | 0.026 | 0.104 | 0.032 | 0.093 | 0.024 |
3 DEISM-LC | 0.052 | 0.026 | 0.109 | 0.032 | 0.097 | 0.024 |
4 DEISM | 1.796 | 1.764 | 1.834 | 1.776 | 1.801 | 1.778 |
4 DEISM-LC | 1.791 | 1.763 | 1.828 | 1.774 | 1.781 | 1.776 |
5 DEISM | 1.795 | 1.926 | 1.818 | 1.906 | 1.822 | 1.921 |
5 DEISM-LC | 1.791 | 1.928 | 1.816 | 1.907 | 1.821 | 1.923 |
Configuration . | Cub. . | S. cub. . | Sph. . | S. sph. . | Cyl. . | S. cyl. . |
---|---|---|---|---|---|---|
1 DEISM | 0.258 | 0.179 | 0.307 | 0.174 | 0.278 | 0.162 |
1 DEISM-LC | 0.265 | 0.153 | 0.300 | 0.176 | 0.267 | 0.181 |
2 DEISM | 0.251 | 0.138 | 0.246 | 0.128 | 0.255 | 0.123 |
2 DEISM-LC | 0.246 | 0.149 | 0.259 | 0.142 | 0.260 | 0.144 |
3 DEISM | 0.051 | 0.026 | 0.104 | 0.032 | 0.093 | 0.024 |
3 DEISM-LC | 0.052 | 0.026 | 0.109 | 0.032 | 0.097 | 0.024 |
4 DEISM | 1.796 | 1.764 | 1.834 | 1.776 | 1.801 | 1.778 |
4 DEISM-LC | 1.791 | 1.763 | 1.828 | 1.774 | 1.781 | 1.776 |
5 DEISM | 1.795 | 1.926 | 1.818 | 1.906 | 1.822 | 1.921 |
5 DEISM-LC | 1.791 | 1.928 | 1.816 | 1.907 | 1.821 | 1.923 |
The errors of the DEISM solutions increase when the speakers are located close to one wall of the room, i.e., for Configurations 4 and 5. The large errors may be attributed to the mirror source approximation since the speakers can have more complicated wave interaction with the nearby walls, which is not considered in this model, and can lead to additional RTF fluctuations. For devices with a physical extent, the assumption of a source image might not hold and could require more complex modeling.
In Configuration 3, the DEISM outperforms DEISM-LC for all speaker shapes and sizes. However, this does not necessarily hold for Configurations 1 and 2. This is mainly caused by the notches of the RTF magnitudes, where the DEISM produces much lower values than DEISM-LC, as shown in Figs. 8 and 9.
To provide insight into the performance of the methods in larger rooms, we consider the case in which the room's length in the x-dimension is doubled. For this test, only the cylindrical driver is considered in Configurations 1–3. The error levels obtained are presented in Table VI. The error levels for the larger room are similar to the values given in Tables IV and V, and imply similar conclusions.
Root mean square log spectral distance in decibel and normalized root mean square phase error in radians between DEISM and FEM, and between DEISM-LC and FEM in a larger room. S., small.
. | RMS LSD (decibel) . | NRMS phase error (radians) . | ||
---|---|---|---|---|
Configuration . | Cyl. . | S. cyl. . | Cyl. . | S. cyl. . |
1 DEISM | 1.177 | 0.967 | 0.185 | 0.128 |
1 DEISM-LC | 1.190 | 1.036 | 0.227 | 0.126 |
2 DEISM | 0.987 | 0.928 | 0.123 | 0.131 |
2 DEISM-LC | 1.188 | 0.972 | 0.153 | 0.119 |
3 DEISM | 0.783 | 0.319 | 0.151 | 0.038 |
3 DEISM-LC | 0.893 | 0.320 | 0.155 | 0.038 |
. | RMS LSD (decibel) . | NRMS phase error (radians) . | ||
---|---|---|---|---|
Configuration . | Cyl. . | S. cyl. . | Cyl. . | S. cyl. . |
1 DEISM | 1.177 | 0.967 | 0.185 | 0.128 |
1 DEISM-LC | 1.190 | 1.036 | 0.227 | 0.126 |
2 DEISM | 0.987 | 0.928 | 0.123 | 0.131 |
2 DEISM-LC | 1.188 | 0.972 | 0.153 | 0.119 |
3 DEISM | 0.783 | 0.319 | 0.151 | 0.038 |
3 DEISM-LC | 0.893 | 0.320 | 0.155 | 0.038 |
F. Evaluation of DEISM-LC
The errors in free-field conditions are given in Fig. 10, where the source–receiver distance and the error are plotted on logarithmic scales. One can observe that the difference decreases to around 1% at a distance of 10 m and to around 0.1% at a distance of 100 m. It can be concluded that DEISM-LC asymptotically converges to the DEISM solution as the distance between the source and receiver increases, which is expected from the derivations in Sec. III D.
Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver. Both the distance and relative errors are plotted on a logarithmic scale.
Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions evaluated for the direct path only while changing the distance between the source and receiver. Both the distance and relative errors are plotted on a logarithmic scale.
In Fig. 11, the relative error is evaluated by changing the maximum reflection order in the DEISM for the cuboidal speakers. Note that for the curves of Configuration 3, the data of reflection order zero is missing since the direct path is simulated using the FEM. The difference between the DEISM solutions reduces for the first few reflection orders, but remains constant for higher reflection orders. Similar behaviors were also found for other speaker shapes, which are not presented here for brevity.
Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions as a function of reflection order for the cuboidal speakers at Positions 1, 2, and 3.
Relative error [Eq. (41)] between the DEISM and DEISM-LC solutions as a function of reflection order for the cuboidal speakers at Positions 1, 2, and 3.
G. Summary
The presented comparisons and error analyses show that the proposed method and its simplified version can simulate RTFs that are in good agreement with FEM solutions in Configurations 1–3. Less agreement is found when a speaker is placed close to the walls, i.e., in Configurations 4 and 5. This might be attributed to the comb filtering caused by the direct path and reflection between the wall and speaker when the distance is less than 0.5 m.68
The errors between the DEISM and FEM solutions decrease when smaller speakers are used in the simulation. Additionally, the lowest error is obtained when the source and receiver are placed on the same speaker, which reduces the number of speakers in the simulation, and hence, the number of reflecting surfaces. These observations support the hypothesis that the errors come mostly from the omission of speakers with physical extent in the ISM model, i.e., scattering effects between the walls and the speaker's surfaces are not included.
We have also shown that the simplified version (DEISM-LC) converges to its complete version (DEISM) in the free field and yields a small bias when used to simulate RTFs. The simplified version, DEISM-LC, yields a better agreement with the FEM simulations as opposed to the existing FSRR method.
In addition, the computation of a single RTF using FEM took approximately 2 days on a desktop computer with 256 GB of random access memory (RAM), while the computation of the same RTF using DEISM took approximately 5 min on a laptop with 16 GB of RAM. Moreover, DEISM-LC was around ten times faster than DEISM for the scenarios considered in this paper. This reduction of the time needed to compute RTFs can facilitate applications that require large amounts of data, e.g., the generation of training data for data-driven approaches.
V. CONCLUSION
A RTF simulation method, which makes use of spherical harmonic directivity coefficients, and which is based on the image source method, has been presented. The directivity coefficients contain information about the sound diffraction around transducers mounted on audio devices, and the method presented aims to capture this effect. The proposed method (DEISM) and its simplified version (DEISM-LC) were compared with FEM solutions and evaluated for different acoustic scenarios. The results show that the proposed approaches can simulate RTFs with good agreement to FEM simulations for shoebox-shaped rooms when devices are not placed close to the room boundaries. A rule of thumb for indicating the regions of the room in which the DEISM provides results that are significantly different from FEM results is a topic for future studies.
Future work could include extending the proposed method to incorporate the sound reflections between the enclosures of multiple devices, or between the enclosures and room boundaries. Such extensions might reduce the error incurred when multiple devices are used or when devices are placed close to the room boundaries. Another valuable extension of the proposed approach would be to apply the DEISM and DEISM-LC to rooms with more general geometries, with the aim of efficiently generating more realistic room acoustics simulations.