Generating acoustically bright and dark zones using loudspeakers is gaining attention as one of the most important acoustic communication techniques for such uses as personal sound systems and multilingual guide services. Although most conventional methods are based on numerical solutions, an analytical approach based on the spatial Fourier transform with a linear loudspeaker array has been proposed, and its effectiveness relative to conventional acoustic energy difference maximization has been shown by computer simulations. To assess the effectiveness of the proposal in actual environments, this paper experimentally validates the proposed approach with rectangular and Hann windows and compares it with three conventional methods: simple delay-and-sum beamforming, contrast maximization, and least squares-based pressure matching, using an actually implemented linear array of 64 loudspeakers in an anechoic chamber. The results of both the computer simulations and the actual experiments show that the proposed approach with a Hann window more accurately controlled the bright and dark zones than the conventional methods.
I. INTRODUCTION
Achieving a personalized listening area without headphones is garnering attention as an important and attractive acoustic communication technique. A very well-known application is a personal sound system1–12 that allows individual listening using multiple loudspeakers. In addition, multiple sound zones9,13,14 can simultaneously provide different sound signals at different positions to multiple users, which is useful for multilingual guide services and other virtual reality applications.
Many approaches for generating a personalized listening area using multiple loudspeakers, initially proposed in Ref. 15, have been investigated over the last decade. These approaches control the acoustic contrast or the energy between two spaces called acoustically bright and dark zones.2–7,10,13,15–22 For creating personal sound systems and multiple sound zones, these approaches are more effective than beamforming methods,1,23–25 which maximize the energy to the target direction with the given input source power. In addition, extended approaches, which simultaneously control not only the sound pressures but also multiple sound fields in multiple regions, have also been investigated.8,11,12,26–30
Most existing methods are based on the least squares (LS) solution that is numerically calculated using control points and loudspeaker positions.2–7,10,15,17,21,28,30 Such methods, however, are quite unstable because the acoustic inverse problem is very ill-conditioned.31,32 This problem is the same in the LS-based pressure matching approach.33,34 To stably calculate the well-conditioned inversion and driving signals, regularization schemes are required, such as the truncated singular value decomposition (SVD) method.35 In these methods, repeated calculations of the inversion are needed to select the optimal regularization parameters.36
Analytical approaches for generating bright and dark zones, on the other hand, have also been investigated.13,14,19,20 An analytical method with a linear loudspeaker array, which can generate a bright zone only at the center of the array, was initially provided without being compared with other approaches.19,20 For creating multiple sound zones, an extended approach, which generates bright and dark zones with arbitrary lengths at arbitrary positions, has been proposed13 that can efficiently generate them better than the conventional energy difference maximization (EDM).16 These methods13,19,20 are based on the spectral division method (SDM),37 which is a sound field synthesis scheme based on the spatial Fourier transform31 and spatial filtering in the wave number domain. An analytical approach based on 2.5-D cylindrical harmonics expansion with open and baffled circular loudspeaker arrays has also been proposed14 and can more accurately control a sound field than conventional 2D-cylindrical harmonics-based beamforming and LS methods.24
Sound pressures at the control line have been modeled by a simple rectangular window corresponding to bright and dark zones.13,19,20 For improving control accuracy, an extended spatial filtering with a Hann window, whose dynamic range is wider than that with a simple rectangular window,38 is introduced and analytically derived in this paper. In one of the above cited works,20 only a simulation result using spatial filtering with a Hann window was reported but no analytical solution was derived.
Although only computer simulations were conducted and no experiments used actual loudspeakers,13,19,20 experiments with actual loudspeakers are critical for validating the proposed approaches in actual environments.39 Therefore, this paper provides experimental validation of the proposed method with an experiment that actually implemented linear arrays of 64 loudspeakers and 64 microphones in an anechoic chamber. The experimental results of the proposed method are compared with three conventional approaches: simple delay-and-sum (DS) beamforming,23 contrast maximization (CM),15 and LS-based pressure matching.33,34 EDM16 was not included in this paper's comparisons since its control accuracy with a linear loudspeaker array was lower than that of the simple DS beamforming in the pre-experiments. Though simple DS beamforming only considers the bright zone and the control accuracy of the other conventional numerical methods depends on normalization with tuning hyperparameters, the proposed approach can directly control the sound pressures without hyperparameters and is expected to outperform the conventional methods.
The rest of this paper is organized as follows. Section II introduces an analytical method for generating multiple sound zones using a linear loudspeaker array based on the spatial Fourier transform and derives an extended analytical spatial filtering modeled by a Hann window. In Sec. III, computer simulations and experiments with actually implemented linear arrays of loudspeakers and microphones in an anechoic chamber are conducted to evaluate the proposed method and compare it with the conventional approaches. Experimental results are compared and discussed in Sec. IV. Finally, conclusions are provided in Sec. V.
II. SPATIAL FOURIER TRANSFORM-BASED MULTIPLE SOUND ZONE GENERATION USING A LINEAR LOUDSPEAKER ARRAY
A. Spatial Fourier transform-based sound field synthesis with a continuous linear sound source
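Sound pressure $P(\mathbf{x}, k)$ synthesized at position $\mathbf{x} = [x, y, z]^T$ by a continuous linear sound source with an infinite length along the x axis is given as
$$P(\mathbf{x}, k) = \int_{-\infty}^{\infty} D(x_0, k)\, G(\mathbf{x} - \mathbf{x}_0, k)\, dx_0, \tag{1}$$
where k = 2πf/c is the wave number, f is the temporal frequency, c is the speed of sound, $D(x_0, k)$ is the sound source driving function at position $\mathbf{x}_0 = [x_0, 0, 0]^T$, and $G(\mathbf{x} - \mathbf{x}_0, k)$ is the transfer function of the sound source placed at $\mathbf{x}_0$ to point x. Under the free-field assumption, $G(\mathbf{x} - \mathbf{x}_0, k)$ is the three-dimensional free-field Green's function,31 defined as
$$G(\mathbf{x} - \mathbf{x}_0, k) = \frac{e^{-jk|\mathbf{x} - \mathbf{x}_0|}}{4\pi|\mathbf{x} - \mathbf{x}_0|}, \tag{2}$$
where $j = \sqrt{-1}$. When applying the spatial Fourier transform to Eq. (1) with respect to the x axis, the convolution along it is performed by the convolution theorem:
$$\tilde{P}(k_x, y, z, k) = \tilde{D}(k_x, k)\, \tilde{G}(k_x, y, z, k), \tag{3}$$
where kx is the spatial frequency in the direction of x and $\tilde{G}(k_x, y, z, k)$ is the spatial Fourier transform of $G(\mathbf{x}, k)$ with respect to the x axis and is given, for the propagating components $|k_x| < k$ and with $r = \sqrt{y^2 + z^2}$, as
$$\tilde{G}(k_x, y, z, k) = -\frac{j}{4} H_0^{(2)}\!\left(\sqrt{k^2 - k_x^2}\; r\right), \tag{4}$$
where $H_0^{(2)}(\cdot)$ denotes the 0th order Hankel function of the second kind.31
When the continuous receiver line is located at $r = r_{\mathrm{ref}}$, the driving function of the linear sound source in the wave number domain is directly obtained by
$$\tilde{D}(k_x, k) = \frac{\tilde{P}(k_x, r_{\mathrm{ref}}, k)}{\tilde{G}(k_x, r_{\mathrm{ref}}, k)} \tag{5}$$
in the spectral division method.37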
B. Analytical spatial filtering approach for generating multiple sound zones
For synthesizing each sound signal Sl(k) at each zone using a continuous linear sound source, each filter for each sound signal Sl(k) at each zone is calculated ( ). When the number of sound signals is L, the driving function of the sound source, which represents the superposition of each , is given as
To derive the spatial filters, the simplest case with L = 1 and S1(k) = 1 is considered and L is omitted in the following equations. In this case, , and Eq. (6) is then represented as
From Eq. (7), is calculated as the spatial filter in the wave number domain for generating a bright zone using a continuous linear sound source.
In sound field reproduction, at position is the actual acoustic pressure received at the continuous linear receiver at r = rref.36 In the previous approach that used Fourier transform-based spatial windowing, an arbitrary sound field was assumed for original sound field before the spatial window was filtered. Then contains all the wave number components, and the driving function includes the convolution operation in the wave number domain.19,20 In the proposed method, on the other hand, and contains only a component with kx = 0 whose wave front is just parallel to the linear sound source to avoid the convolution operation in the wave number domain. Previous work indicated that is sufficient for generating bright and dark zones.13
1. Spatial filtering modeled by a rectangular window
When the spatial filter is modeled by a simple rectangular window and a bright zone of length lb is generated at a position [xb, rb], the positions at and 0 correspond to the bright and dark points, respectively.13,19,20
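For generating a bright zone of length lb centered around x = 0, P(x, lb) is modeled by rectangular window Π(x/lb) and is given as
$$P_{\mathrm{rect}}(x, l_b) = \Pi(x/l_b) = \begin{cases} 1, & |x| \le l_b/2, \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$
The spatial Fourier transform of Prect (x, lb) with respect to x is then obtained:31
$$\tilde{P}_{\mathrm{rect}}(k_x, l_b) = l_b\, \frac{\sin(k_x l_b/2)}{k_x l_b/2}. \tag{9}$$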
2. Spatial filtering modeled by a Hann window
To extend the spatial filtering, a Hann window is introduced and an analytical solution is derived.
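As in Eq. (8), P(x, lb) is modeled by a Hann window of length lb centered around x = 0 that is given as
$$P_{\mathrm{Hann}}(x, l_b) = \begin{cases} \dfrac{1}{2}\left[1 + \cos\!\left(\dfrac{2\pi x}{l_b}\right)\right], & |x| \le l_b/2, \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$
The Fourier transform of PHann (x, lb) with respect to x is analytically derived in the Appendix and obtained as
$$\tilde{P}_{\mathrm{Hann}}(k_x, l_b) = \frac{l_b}{2}\,\mathrm{sinc}\!\left(\frac{k_x l_b}{2}\right) + \frac{l_b}{4}\,\mathrm{sinc}\!\left(\frac{k_x l_b}{2} + \pi\right) + \frac{l_b}{4}\,\mathrm{sinc}\!\left(\frac{k_x l_b}{2} - \pi\right), \tag{11}$$
where $\mathrm{sinc}(u) = \sin(u)/u$.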
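As a numerical cross-check of the two window spectra, the following minimal sketch evaluates them in plain Python. The function names (`sinc`, `rect_spectrum`, `hann_spectrum`, `numeric_spectrum`) are illustrative and not taken from the paper, and the Hann window is assumed to be 0.5[1 + cos(2πx/lb)] on |x| ≤ lb/2. Because both windows are even in x, the result does not depend on the sign convention of the spatial Fourier transform.

```python
import math


def sinc(u: float) -> float:
    """Unnormalized sinc, sin(u)/u, with the removable singularity at u = 0."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u


def rect_spectrum(kx: float, lb: float) -> float:
    """Spatial Fourier transform of a unit rectangular window of length lb."""
    return lb * sinc(kx * lb / 2.0)


def hann_spectrum(kx: float, lb: float) -> float:
    """Spatial Fourier transform of the assumed Hann window of length lb."""
    u = kx * lb / 2.0
    return lb / 2.0 * sinc(u) + lb / 4.0 * (sinc(u + math.pi) + sinc(u - math.pi))


def hann_window(x: float, lb: float) -> float:
    """Assumed Hann window: 0.5 * (1 + cos(2*pi*x/lb)) inside |x| <= lb/2."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * x / lb)) if abs(x) <= lb / 2.0 else 0.0


def numeric_spectrum(window, kx: float, lb: float, n: int = 20000) -> float:
    """Midpoint-rule cosine transform of an even window, used only as a check."""
    dx = lb / n
    total = 0.0
    for i in range(n):
        x = -lb / 2.0 + (i + 0.5) * dx
        total += window(x, lb) * math.cos(kx * x) * dx
    return total


lb = 1.04  # bright zone length in metres (the wide-zone value used in the experiments)
for kx in (0.0, 3.0, 10.0, 25.0):
    closed = hann_spectrum(kx, lb)
    check = numeric_spectrum(hann_window, kx, lb)
    print(f"kx={kx:5.1f} rad/m  closed-form={closed:+.6f}  numeric={check:+.6f}")
```

For large |kx| the rectangular spectrum decays as 1/kx while the Hann spectrum decays as 1/kx³, which is the sidelobe difference the comparison between the two windows relies on.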
As in a previous work,13 for shifting the center of the bright zone from x = 0 to x = xb (Fig. 1), the shift theorem31 with respect to x is applied to Eqs. (9) and (11). Then the spatial filters in the wave number domain for generating the bright zone at are analytically derived,
Consequently, a bright zone of arbitrary length lb can be generated at arbitrary horizontal position [xb, rb] by the proposed spatial filter in the wave number domain using a continuous linear monopole sound source distribution.
C. Practical implementation using a linear loudspeaker array
The spatial filter coefficients of the proposed method in the temporal frequency domain are finally derived by the inverse spatial Fourier transform:31,37
where only the propagation wave components are considered and evanescent components are discarded to calculate stable filters.36
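The effect of keeping only the propagating components can be illustrated with a small sketch. It does not reproduce Eq. (13): the division by the Green's function spectrum is deliberately omitted, and only the target pressure profile of a shifted Hann window is reconstructed from its spectrum restricted to |kx| < k. The function names and the midpoint-rule integration are assumptions made for illustration, and the Hann window is assumed to be 0.5[1 + cos(2πx/lb)] on |x| ≤ lb/2.

```python
import math


def sinc(u):
    """Unnormalized sinc, sin(u)/u."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u


def hann_spectrum(kx, lb):
    """Spatial Fourier transform of the assumed Hann window of length lb."""
    u = kx * lb / 2.0
    return lb / 2.0 * sinc(u) + lb / 4.0 * (sinc(u + math.pi) + sinc(u - math.pi))


def band_limited_profile(x, xb, lb, k, n=4000):
    """Inverse spatial Fourier transform restricted to |kx| < k.

    The window spectrum is real and even, so after the shift to xb the
    inverse transform reduces to a cosine integral in (x - xb).
    The integral over [-k, k] is evaluated with the midpoint rule.
    """
    dkx = 2.0 * k / n
    total = 0.0
    for i in range(n):
        kx = -k + (i + 0.5) * dkx
        total += hann_spectrum(kx, lb) * math.cos(kx * (x - xb)) * dkx
    return total / (2.0 * math.pi)


c = 344.18                      # speed of sound in m/s
f = 2000.0                      # temporal frequency in Hz
k = 2.0 * math.pi * f / c       # wave number in rad/m
xb, lb = 1.04, 1.04             # bright zone center and length in m

for x in (-1.0, 0.0, 0.52, 0.78, 1.04, 1.56):
    print(f"x={x:+.2f} m  band-limited target={band_limited_profile(x, xb, lb, k):+.4f}")
```

At 2 kHz the main lobe of the Hann spectrum lies well inside |kx| < k, so discarding the evanescent range changes the target profile only slightly in this sketch.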
For actual implementations, a linear loudspeaker array instead of a continuous linear sound source is used, and Eq. (13) must be discretized and truncated.13,14,36,40 The truncation and discretization properties of the driving function in the SDM have been scrutinized.37
III. EXPERIMENTS
A. Experimental conditions
Computer simulations and experiments using actually implemented linear arrays of loudspeakers and microphones were conducted to evaluate the proposed method and to compare it with the conventional approaches.
The temperature of the anechoic chamber was 21 °C, and the speed of sound c was set to 344.18 m/s for both the computer simulations and experiments. A linear array of loudspeakers was set along the x axis and centered around x = 0. The number of loudspeakers in the linear array was M = 64, and the distance between adjacent loudspeakers was Δxsp = 0.065 m, which corresponds to an actually implemented linear loudspeaker array. Its spatial Nyquist frequency, c/(2Δxsp), was about 2.6 kHz.
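The array figures can be checked with a few lines of arithmetic. The sketch below assumes the usual half-wavelength criterion f = c/(2Δx) for the spatial Nyquist frequency; the variable names are illustrative.

```python
c = 344.18          # speed of sound in m/s
dx_sp = 0.065       # spacing between adjacent loudspeakers in m
num_speakers = 64   # number of loudspeakers M

# Half-wavelength criterion: the spacing equals half a wavelength at f_nyquist.
f_nyquist = c / (2.0 * dx_sp)

# Distance between the first and last loudspeaker centers.
array_span = (num_speakers - 1) * dx_sp

print(f"spatial Nyquist frequency: {f_nyquist:.1f} Hz")
print(f"array span: {array_span:.3f} m")
```

Under this criterion the value comes out at roughly 2.65 kHz, so the 2 kHz condition examined in the experiments lies below the aliasing limit of the array.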
B. Conventional methods
The proposed method was compared with the following three conventional approaches.
1. Delay-and-sum beamforming
This is the simplest approach,23 and the spatial filter coefficients are obtained as
where and are the mth loudspeaker position ( ) and the nth control point for the bright zone ( ), respectively.
2. Contrast maximization
When using N control points at and M loudspeakers at ,15 the spatial averaged correlation matrix between the control points and the loudspeaker positions is calculated as
where . is the Hermitian transpose of is the spatial correlation matrix between the bright points and the loudspeaker positions, and is that between the dark points and the loudspeaker positions. In the CM approach, spatial filters are obtained from the eigenvector of matrix that corresponds to the largest eigenvalue of this matrix.15 is the generalized inverse of .
3. Least squares-based pressure matching
In the LS-based pressure matching approach,33 the spatial filters are directly calculated as the inverse of matrix , which is constructed from every transfer function between each control point and loudspeaker position ,
where
and is the sound pressures at control points . For generating bright zone and dark zone at and are set to 1 and 0, respectively.
To calculate stable filters for the CM and LS methods, a truncated SVD was employed for regularization. Small valued eigenvalues of the matrices were truncated, and the threshold of the ratio between the maximum and minimum eigenvalues was set to 20 dB, which was decided from pre-experimental results.
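The truncation rule can be sketched as a simple selection over the singular values. This is only an illustration under stated assumptions: the helper name `keep_within_db` is hypothetical, and the 20 dB ratio is interpreted on an amplitude scale (20 log10), which the text does not specify.

```python
import math


def keep_within_db(singular_values, threshold_db=20.0):
    """Return the values whose level lies within threshold_db of the largest one.

    Assumption: the ratio is expressed as 20*log10(s_max / s), i.e., an
    amplitude ratio. A power-ratio reading would use 10*log10 instead.
    """
    s_max = max(singular_values)
    kept = []
    for s in singular_values:
        if s > 0.0 and 20.0 * math.log10(s_max / s) <= threshold_db:
            kept.append(s)
    return kept


values = [10.0, 5.0, 2.0, 0.5, 0.01]
print(keep_within_db(values))
```

In a truncated SVD inversion, only the retained singular values and their singular vectors would then be used to form the pseudoinverse, and the discarded ones would be treated as zero.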
C. Evaluation indices
The sound pressure levels synthesized by the loudspeakers on a plane with z = 0 m, −2.08 m ≤ x ≤ 2.08 m, and 1.5 m ≤ y ≤ 2.5 m were evaluated. The measurement points at the plane were discretized as Δx = 0.0325 m, which depends on an implemented linear microphone array and corresponds to twice the spatial Nyquist frequency of the linear loudspeaker array, and Δy = 0.05 m, which depends on the actual measurements.
In previous work,13,14 the bright to dark ratio (BDR) was defined for evaluating the produced sound pressure level between the bright and dark zones using a rectangular window. In this paper, an extended spatial filtering with a Hann window is proposed. The simple BDR that was previously used13,14 is therefore inadequate since a Hann window is not rectangular.
For generating bright and dark zones, since the most important performance is how to reduce the undesired sound pressures at the dark zone, the following two evaluation indices are defined to evaluate it.
One is the sound pressure level averaged for y components that is defined as
where is the bright zone center position, Δy = 0.05 m, and I = 21.
The other is the averaged sound pressure level ratio between bright zone center and dark zone , defined as the modified BDR, given as
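A level ratio of this general kind can be computed as follows. The sketch is not the paper's definition of the modified BDR: it assumes a plain ratio of mean squared pressure magnitudes between two sets of points, expressed in decibels, and the function names are hypothetical.

```python
import math


def mean_square(pressures):
    """Mean of |p|^2 over a list of complex sound pressures."""
    return sum(abs(p) ** 2 for p in pressures) / len(pressures)


def level_ratio_db(bright, dark):
    """Level difference in dB between the mean energy at bright and dark points."""
    return 10.0 * math.log10(mean_square(bright) / mean_square(dark))


# Illustrative values only: unit-magnitude pressures against 0.1-magnitude pressures.
bright_points = [1 + 0j, 1j]
dark_points = [0.1 + 0j, -0.1j]
print(f"{level_ratio_db(bright_points, dark_points):.2f} dB")
```

For the modified index described in the text, the bright set would be restricted to points at the bright zone center, averaged over the y positions.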
In the proposed approach, control distance rb = yb = was set to 2.0 m (zb = 0 m), and dkx in Eq. (13) was discretized into , and spatial filters of M = 64 loudspeakers in temporal frequency domain were obtained.
In the conventional methods, N = 64 control points were set to y = yb = 2 m, z = 0 m, and −2.08 m ≤ x ≤ 2.0475 m discretized Δxco = 0.065 m.
The produced sound pressure level at temporal frequency f = 2000 Hz with xb = 1.04 m and lb = 1.04 m for each method was evaluated. In addition, averaged sound pressure levels (SPL), (SPL)ave (x, k), were calculated for the proposed methods.
In the modified BDR evaluation, BDRmod (k) was calculated from the produced sound pressure level at discretized 128 (−2.08 m ≤ x ≤ 2.0475 m with Δx = 0.0325 m) × 21 (1.5 m ≤ y ≤ 2.5 m with Δy = 0.05 m) = 2688 measurement points to evaluate the following four conditions:
- Narrow width bright zone at the array center (xb = 0 m and lb = 0.26 m).
- Wide width bright zone at the array center (xb = 0 m and lb = 1.04 m).
- Narrow width bright zone at the left side of the array (xb = 1.04 m and lb = 0.26 m).
- Wide width bright zone at the left side of the array (xb = 1.04 m and lb = 1.04 m).
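The measurement grid used for the modified BDR evaluation can be generated directly from the stated ranges and step sizes. The sketch below is illustrative; the variable names are not from the paper.

```python
dx, dy = 0.0325, 0.05   # grid steps along x and y in m

# 128 positions along x starting at -2.08 m and 21 positions along y starting at 1.5 m.
xs = [-2.08 + i * dx for i in range(128)]
ys = [1.5 + i * dy for i in range(21)]

# One (x, y) pair per measurement point on the plane z = 0.
grid = [(x, y) for y in ys for x in xs]

print(len(grid), round(xs[-1], 4), round(ys[-1], 2))
```

The last x position falls at 2.0475 m and the last y position at 2.5 m, matching the stated ranges.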
D. Computer simulations
In all the simulations, a three-dimensional free-field was assumed. According to the experimental conditions, the spatial filters of the conventional and proposed methods, the produced sound pressure level, and BDRmod (k) were calculated using ideal transfer functions , defined in Eq. (2).
E. Acoustic measurements with actually implemented arrays
To validate the effectiveness of the proposed method using actual loudspeakers and compare it with the conventional approaches, a linear array of 64 loudspeakers was implemented. Sixty-four loudspeakers (Bose; M2, which are active loudspeakers, although their built-in amplifiers were not used) were mounted on an aluminum frame [Fig. 2(b)] and controlled by two DA converters (RME; M-32DA) and 32 loudspeaker amplifiers (Roland; SRA-5050). The loudspeaker locations were the same as described in Sec. III A. As shown in Fig. 2(a), the transfer functions between the loudspeakers and the evaluation points were measured using an actually implemented linear microphone array in an anechoic chamber where the background noise level was 9.2 dB (100 to 20 000 Hz). Sixty-four microphones (DPA; 4060) were fixed using 64 aluminum jigs, mounted on an aluminum frame [Fig. 2(c)], and controlled by two AD converters (RME; M-32AD) and 8 microphone amplifiers (RME; Octamic II). The distance between adjacent microphones was Δxmic = 0.0325 m. The DA and AD converters were connected to a MADI audio interface (RME; HDSPe-MADIface) and controlled by a laptop (Apple; MacBook Pro) with audio control software (Pd-extended 0.42.5) that can synchronously manage 64-in/64-out MADI audio signals with a sampling frequency of 48 kHz.41 Sixty-four microphone gains were calibrated using a sound pressure calibrator (Brüel & Kjær; type 4231).
To ensure identical experimental conditions among the three conventional and two proposed methods, the transfer functions between the loudspeakers and the evaluation points were measured as impulse responses instead of direct measurements of the sound pressures produced by these five methods. A time-stretched pulse42 with a length of 16 384 points, where the sampling frequency was 48 kHz, was used as the measurement signal. In the measurement, each impulse response was calculated from the synchronous addition of ten measurements. The impulse responses between 64 loudspeakers and 2688 evaluation points were measured, and the total number was 172 032. The transfer functions in the temporal frequency domain were obtained from the measured impulse responses by the discrete temporal Fourier transform. The signal to noise ratio (SNR) of the measured transfer functions exceeded 20 dB.
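The benefit of synchronous addition can be illustrated numerically. The sketch below is not the measurement procedure itself: it assumes noise that is uncorrelated between repetitions and uses a sinusoid as a stand-in for the measured response. Under that assumption, averaging ten repetitions lowers the noise power by about 10 log10(10) = 10 dB.

```python
import math
import random


def noise_power(noisy, clean):
    """Mean squared deviation of a noisy sequence from the clean reference."""
    return sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)


random.seed(0)
n_samples, n_repeats = 4096, 10

# Stand-in for a repeatable response; the actual measurement signal is not modeled.
clean = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(n_samples)]

# Each repetition sees the same response plus independent Gaussian noise.
repeats = [[s + random.gauss(0.0, 0.5) for s in clean] for _ in range(n_repeats)]

# Synchronous addition: sample-by-sample average across the repetitions.
averaged = [sum(column) / n_repeats for column in zip(*repeats)]

gain_db = 10.0 * math.log10(noise_power(repeats[0], clean) / noise_power(averaged, clean))
num_pairs = 64 * 2688  # loudspeaker-to-evaluation-point pairs

print(f"noise reduction from averaging {n_repeats} repetitions: {gain_db:.2f} dB")
print(f"number of impulse responses: {num_pairs}")
```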
In the experimental evaluation, produced sound pressure level , averaged sound pressure level SPLave (x, k), and modified bright to dark ratio BDRmod (k) were calculated using the measured transfer functions.
IV. RESULTS AND DISCUSSIONS
A. Produced sound pressure level in ideal condition
Figures 3 and 4 show the results of the produced sound pressure level, where with was set to 0 dB at temporal frequency f = 2000 Hz with xb = 1.04 m and lb = 1.04 m, calculated from the simulated and measured transfer functions, respectively.
From the simulation results in Fig. 3, the conventional CM and LS methods and the proposed approaches controlled the bright and dark zones on reference line y = yb more effectively than the simple DS beamforming that only considered the bright zone. However, the undesired sound pressures were severely radiated around the boundary between the bright and dark zones with y ≠ yb in the CM and LS methods since these methods are based on numerical solutions, and the sound pressures at y ≠ yb cannot be controlled at all. Especially in the LS method, undesired sound pressures were radiated at the opposite side of the bright zone. On the other hand, the proposed approaches were derived from the wave equation-based analytical solution and both the sound pressures and the wave front on y = yb were efficiently controlled. The method with a Hann window effectively generated bright and dark zones with fewer undesired sound pressures radiated to the dark zone compared with the rectangular window. This is because of the wave number component difference between the rectangular and Hann windows. Figure 5 shows the absolute value of the spatial filter in the wave number domain of each proposed method for f = 2000 Hz with xb = 1.04 m, yb = 2.0 m, and lb = 1.04 m. The spatial filter modeled by a rectangular window contains the sidelobes in the wave number domain at kx ≠ 0, and these components are radiated to the dark zones with y ≠ yb. That with a Hann window, in contrast, contains fewer sidelobes at kx ≠ 0, and the produced sound field is also constructed from mainlobe components kx ≈ 0 that are almost parallel to the control line. As a result, the unwanted sound pressures are not radiated to the dark zones with y ≠ yb. The simulation results in Fig. 3 validated the theoretical performance of both the proposed methods and indicated that spatial filtering with a Hann window can more effectively generate the bright and dark zones using a linear loudspeaker array than the other methods.
B. Produced sound pressure level using actual loudspeakers
The experimental results of control accuracy calculated from the measured transfer functions shown in Fig. 4 were degraded compared with the simulation results depicted in Fig. 3. This is because the measured transfer functions include measurement errors such as loudspeaker gain differences, loudspeaker and microphone location errors, the directivity of the loudspeakers, and additive noise, so the measured transfer functions differ from the ideal ones. Especially in the CM method, the performance degradation was severe since it includes numerical inversion and is unstable even though truncated SVD regularization was employed. Compared with the conventional methods, the proposed approach with a Hann window adequately controlled the bright and dark zones using the actual linear array. This indicates that the wave equation-based analytical solutions were more robust and practical for actual implementations than the conventional beamforming and numerical approaches.
C. Control accuracy comparison between proposed methods with actual loudspeakers
For evaluating the control accuracy of the two proposed methods using the actual array, the results of averaged sound pressure level SPLave (x, k) defined in Eq. (18) are plotted in Fig. 6. The sound pressure level in the bright zone produced by rectangular window filtering is obviously flatter than that produced by Hann window filtering because of the window shape in the bright zone. These results, on the other hand, indicate that the control accuracy of Hann window filtering in the dark zone is higher than that of rectangular window filtering for the following two reasons. The cause of the high control accuracy near the bright zone is the wave number component difference described in Sec. IV A. In addition, the dynamic range of a Hann window is theoretically wider than that of a rectangular window, so the control accuracy of a Hann window is higher than that of a rectangular window throughout the dark zone. The effectiveness of Hann window filtering with actual loudspeakers is validated by the results in Fig. 6.
D. Control accuracy of dark zone
Figures 7, 8, 9, and 10 show the results of modified bright to dark ratio BDRmod (k) defined in Eq. (19) for the four conditions described in Sec. III C that were calculated from the ideal and measured transfer functions. In addition, to validate the results obtained with the measured transfer functions, the measurement degradation was simulated from the ideal ones: the sound pressures produced by the five methods were calculated using the ideal transfer functions with additive Gaussian noise. The averaged SNR of the additive noise was set to 20 dB, which was also determined from the pre-experimental results. The BDRmod (k) results calculated from the simulated transfer functions with additive noise are plotted in Figs. 7(c) to 10(c), respectively.
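One way to degrade ideal transfer functions to a target average SNR is sketched below. How the noise was scaled in the actual study is not described in detail, so the circular complex Gaussian noise model, the single source at the origin, and the function names are all assumptions made for illustration.

```python
import cmath
import math
import random


def add_noise(transfer, target_snr_db, rng):
    """Add circular complex Gaussian noise so that the average SNR is target_snr_db."""
    signal_power = sum(abs(h) ** 2 for h in transfer) / len(transfer)
    noise_power = signal_power / (10.0 ** (target_snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per real/imaginary part
    return [h + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for h in transfer]


def measured_snr_db(clean, noisy):
    """SNR in dB estimated from the clean and noisy transfer functions."""
    signal = sum(abs(h) ** 2 for h in clean)
    noise = sum(abs(a - b) ** 2 for a, b in zip(noisy, clean))
    return 10.0 * math.log10(signal / noise)


k = 2.0 * math.pi * 2000.0 / 344.18  # wave number at 2 kHz

# Evaluation grid on the plane z = 0: 128 x positions times 21 y positions.
points = [(-2.08 + 0.0325 * i, 1.5 + 0.05 * j) for j in range(21) for i in range(128)]

# Free-field Green's function from a hypothetical single source at the origin.
ideal = [cmath.exp(-1j * k * math.hypot(x, y)) / (4.0 * math.pi * math.hypot(x, y))
         for x, y in points]

noisy = add_noise(ideal, 20.0, random.Random(0))
achieved = measured_snr_db(ideal, noisy)
print(f"achieved SNR: {achieved:.2f} dB")
```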
First, the results of the measurement cases shown in Figs. 7(b) to 10(b) were degraded compared with those of the simulation cases with the ideal transfer functions because of the transfer function mismatch described in Sec. IV B.
By comparing the results of the measured transfer functions shown in Figs. 7(b) to 10(b) with those of the simulated ones with the additive noise in Figs. 7(c) to 10(c), they seem to share a similar tendency and validate the results of the measured transfer functions.
In the case of narrow width bright zone lb = 0.26 m, especially for temporal frequency 2 to 3.5 kHz with xb = 1.04 m, the CM approach can more effectively control the dark zone in both the simulation and measurement results than the other methods. However, the results of the CM method plotted in Fig. 9 at a temporal frequency of about 4 kHz and Figs. 8 and 10 for wide width bright zone lb = 1.04 m were unstable since the CM method included the numerical inversion and the spatial filters were unstable even though regularization was applied. The proposed method with a Hann window controlled the dark zone more effectively than the other methods, especially with a wide width bright zone, in both the simulation and measurement cases.
Consequently, the effectiveness of the proposed approach with a Hann window for generating bright and dark zones using an actual linear loudspeaker array was validated from both the simulation and measurement results. Future work will improve the proposed method for reverberant environments.
V. CONCLUSIONS
This paper experimentally validated the proposed spatial Fourier transform-based approaches for controlling multiple sound zones. Although previous work proposed a spatial filter with a simple rectangular window, a spatial filter with a Hann window in the wave number domain was introduced and an analytical solution was derived. Both a computer simulation and an experiment using actual loudspeakers were conducted. In the experiment, linear arrays of 64 loudspeakers and 64 microphones were actually implemented in an anechoic chamber, and the transfer functions between loudspeakers and evaluation points were measured. The proposed methods were compared with the conventional DS beamforming, CM, and LS methods. Both the simulation and experimental results validated the effectiveness of the proposed approach with a Hann window using an actual linear loudspeaker array.
ACKNOWLEDGMENTS
This study was partly supported by JSPS KAKENHI Grant Nos. 25871208 and 15K21674.
APPENDIX: SPATIAL FOURIER TRANSFORM OF A HANN WINDOW
The spatial Fourier transform of a Hann window is analytically derived as
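The derivation itself is not reproduced in this text. The following is a sketch of one standard route, assuming the Hann window is defined as ½[1 + cos(2πx/l_b)] for |x| ≤ l_b/2 and zero elsewhere. Because this window is even in x, its spatial Fourier transform reduces to a cosine integral, so the result does not depend on the sign convention of the transform kernel.

```latex
\begin{align}
\tilde{P}_{\mathrm{Hann}}(k_x, l_b)
  &= \int_{-l_b/2}^{l_b/2} \frac{1}{2}\left[1 + \cos\!\left(\frac{2\pi x}{l_b}\right)\right] \cos(k_x x)\, dx \\
  &= \frac{1}{2}\int_{-l_b/2}^{l_b/2} \cos(k_x x)\, dx
   + \frac{1}{4}\int_{-l_b/2}^{l_b/2} \left[\cos\!\left(\left(k_x + \tfrac{2\pi}{l_b}\right)x\right)
   + \cos\!\left(\left(k_x - \tfrac{2\pi}{l_b}\right)x\right)\right] dx \\
  &= \frac{\sin(k_x l_b/2)}{k_x}
   + \frac{\sin(k_x l_b/2 + \pi)}{2\,(k_x + 2\pi/l_b)}
   + \frac{\sin(k_x l_b/2 - \pi)}{2\,(k_x - 2\pi/l_b)} \\
  &= \frac{\sin(k_x l_b/2)}{k_x}\;\frac{1}{1 - \left(k_x l_b / 2\pi\right)^2}.
\end{align}
```

The second line uses the product-to-sum identity for cosines, the third uses $\int_{-a}^{a}\cos(\kappa x)\,dx = 2\sin(\kappa a)/\kappa$ with $a = l_b/2$, and the last follows from $\sin(u \pm \pi) = -\sin u$. In the limit $k_x \to 0$ the expression tends to $l_b/2$, the area under the window.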