Generating acoustically bright and dark zones using loudspeakers is gaining attention as one of the most important acoustic communication techniques for such uses as personal sound systems and multilingual guide services. Although most conventional methods are based on numerical solutions, an analytical approach based on the spatial Fourier transform with a linear loudspeaker array has been proposed, and its effectiveness relative to conventional acoustic energy difference maximization has been shown by computer simulations. To demonstrate the effectiveness of the proposal in actual environments, this paper experimentally validates the proposed approach with rectangular and Hann windows and compares it with three conventional methods: simple delay-and-sum beamforming, contrast maximization, and least-squares-based pressure matching, using an actually implemented linear array of 64 loudspeakers in an anechoic chamber. The results of both the computer simulations and the actual experiments show that the proposed approach with a Hann window more accurately controlled the bright and dark zones than the conventional methods.
I. INTRODUCTION
Achieving a personalized listening area without headphones is garnering attention as an important and attractive acoustic communication technique. A very well-known application is a personal sound system^{1–12} that allows individual listening using multiple loudspeakers. In addition, multiple sound zones^{9,13,14} can simultaneously provide different sound signals at different positions to multiple users, which is useful for multilingual guide services and other virtual reality applications.
Many approaches for generating a personalized listening area using multiple loudspeakers, initially proposed in Ref. 15, have been investigated over the last decade. These approaches control the acoustic contrast or the energy between two spaces called acoustically bright and dark zones.^{2–7,10,13,15–22} For creating personal sound systems and multiple sound zones, these approaches are more effective than beamforming methods,^{1,23–25} which maximize the energy to the target direction with the given input source power. In addition, extended approaches, which simultaneously control not only the sound pressures but also multiple sound fields in multiple regions, have also been investigated.^{8,11,12,26–30}
Most existing methods are based on the least squares (LS) solution that is numerically calculated using control points and loudspeaker positions.^{2–7,10,15,17,21,28,30} Such methods, however, are quite unstable because the acoustic inverse problem is very ill-conditioned.^{31,32} This problem is the same in the LS-based pressure matching approach.^{33,34} To stably calculate the ill-conditioned inversion and the driving signals, regularization schemes are required, such as the truncated singular value decomposition (SVD) method.^{35} In these methods, repeated calculations of the inversion are needed to select the optimal regularization parameters.^{36}
Analytical approaches for generating bright and dark zones, on the other hand, have also been investigated.^{13,14,19,20} An analytical method with a linear loudspeaker array, which can generate a bright zone only at the center of the array, was initially provided without being compared with other approaches.^{19,20} For creating multiple sound zones, an extended approach, which generates bright and dark zones with arbitrary lengths at arbitrary positions, has been proposed^{13} and generates them more effectively than the conventional energy difference maximization (EDM).^{16} These methods^{13,19,20} are based on the spectral division method (SDM),^{37} which is a sound field synthesis scheme based on the spatial Fourier transform^{31} and spatial filtering in the wave number domain. An analytical approach based on 2.5D cylindrical harmonics expansion with open and baffled circular loudspeaker arrays has also been proposed^{14} and can more accurately control a sound field than conventional 2D cylindrical harmonics-based beamforming and LS methods.^{24}
Sound pressures at the control line have been modeled by a simple rectangular window corresponding to bright and dark zones.^{13,19,20} For improving control accuracy, an extended spatial filtering with a Hann window, whose dynamic range is wider than that with a simple rectangular window,^{38} is introduced and analytically derived in this paper. In one of the above cited works,^{20} only a simulation result using spatial filtering with a Hann window was reported but no analytical solution was derived.
Although only computer simulations were conducted and no experiments used actual loudspeakers in the previous studies,^{13,19,20} experiments with actual loudspeakers are critical for validating the proposed approaches in actual environments.^{39} Therefore this paper provides experimental validation of the proposed method with an experiment that actually implemented linear arrays of 64 loudspeakers and 64 microphones in an anechoic chamber. The experimental results of the proposed method are compared with three conventional approaches: simple delay-and-sum (DS) beamforming,^{23} contrast maximization (CM),^{15} and LS-based pressure matching.^{33,34} EDM^{16} was not included in this paper's comparisons since its control accuracy with a linear loudspeaker array was lower than that of the simple DS beamforming in the pre-experiments. Whereas simple DS beamforming only considers the bright zone and the control accuracy of the other conventional numerical methods depends on regularization with tuned hyperparameters, the proposed approach can directly control the sound pressures without hyperparameters and is expected to outperform the conventional methods.
The rest of this paper is organized as follows. Section II introduces an analytical method for generating multiple sound zones using a linear loudspeaker array based on the spatial Fourier transform and derives an extended analytical spatial filtering modeled by a Hann window. In Sec. III, computer simulations and experiments with actually implemented linear arrays of loudspeakers and microphones in an anechoic chamber are conducted to evaluate the proposed method and compare it with the conventional approaches. Experimental results are compared and discussed in Sec. IV. Finally, conclusions are provided in Sec. V.
II. SPATIAL FOURIER TRANSFORM-BASED MULTIPLE SOUND ZONE GENERATION USING A LINEAR LOUDSPEAKER ARRAY
A. Spatial Fourier transform-based sound field synthesis with a continuous linear sound source
Sound pressure $P(\mathbf{x}, k)$ synthesized at position $\mathbf{x} = [x, y, z]^{T}$ by a continuous linear sound source with an infinite length along the x axis is given as
$$ P(\mathbf{x}, k) = \int_{-\infty}^{\infty} D(\mathbf{x}_0, k)\, G_{3D}(\mathbf{x}, \mathbf{x}_0, k)\, dx_0, \tag{1} $$
where k = 2πf/c is the wave number, f is the temporal frequency, c is the speed of sound, $D(\mathbf{x}_0, k)$ is the sound source driving function at position $\mathbf{x}_0 = [x_0, 0, 0]^{T}$, and $G_{3D}(\mathbf{x}, \mathbf{x}_0, k)$ is the transfer function of the sound source placed at $\mathbf{x}_0$ to point $\mathbf{x}$. Under the free-field assumption, $G_{3D}(\mathbf{x}, \mathbf{x}_0, k)$ is the three-dimensional free-field Green's function,^{31} defined as
$$ G_{3D}(\mathbf{x}, \mathbf{x}_0, k) = \frac{e^{-jk|\mathbf{x} - \mathbf{x}_0|}}{4\pi |\mathbf{x} - \mathbf{x}_0|}, \tag{2} $$
where $j = \sqrt{-1}$. When applying the spatial Fourier transform to Eq. (1) with respect to the x axis, the convolution along it becomes a product by the convolution theorem:
$$ \tilde{P}(k_x, y, z, k) = \tilde{D}(k_x, k)\, \tilde{G}(k_x, y, z, k), \tag{3} $$
where k_{x} is the spatial frequency in the direction of x and $\tilde{G}(k_x, y, z, k)$ is the spatial Fourier transform of $G_{3D}(\mathbf{x}, \mathbf{x}_0, k)$ with respect to the x axis and is given as
$$ \tilde{G}(k_x, y, z, k) = -\frac{j}{4} H_0^{(2)}\!\left(\sqrt{k^2 - k_x^2}\,\sqrt{y^2 + z^2}\right), \tag{4} $$
where $H_0^{(2)}$ denotes the 0th order Hankel function of the second kind.^{31}
When the continuous receiver line is located at $\sqrt{y^2 + z^2} = r_{\mathrm{ref}}$, the driving function of the linear sound source in the wave number domain is directly obtained by
$$ \tilde{D}(k_x, k) = \frac{\tilde{P}(k_x, r_{\mathrm{ref}}, k)}{\tilde{G}(k_x, r_{\mathrm{ref}}, k)} \tag{5} $$
in the spectral division method.^{37}
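As a small numerical illustration of the quantities defined in this subsection, the following sketch evaluates the wave number and the free-field monopole transfer function. It assumes the $e^{-jkr}$ sign convention implied by the use of the Hankel function of the second kind; the function names `wave_number` and `green_3d` are hypothetical and chosen only for this example.

```python
import cmath
import math


def wave_number(f_hz: float, c: float = 344.18) -> float:
    """k = 2*pi*f/c; c defaults to the speed of sound quoted in Sec. III A."""
    return 2.0 * math.pi * f_hz / c


def green_3d(x, x0, k: float) -> complex:
    """Free-field monopole transfer function exp(-j*k*r) / (4*pi*r).

    The exp(-j*k*r) sign convention is an assumption of this sketch.
    """
    r = math.dist(x, x0)
    return cmath.exp(-1j * k * r) / (4.0 * math.pi * r)


# Source element at the origin, receiver on the control line y = 2 m.
k = wave_number(2000.0)
g = green_3d((0.0, 2.0, 0.0), (0.0, 0.0, 0.0), k)
print(f"k = {k:.3f} rad/m, |G| = {abs(g):.5f}")
```

The magnitude depends only on the distance (1/(4πr)), while the phase carries the propagation delay that the driving function has to compensate.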
B. Analytical spatial filtering approach for generating multiple sound zones
For synthesizing each sound signal S_{l}(k) at each zone using a continuous linear sound source, each filter $F_l(\mathbf{x}_0, k)$ for each sound signal S_{l}(k) at each zone is calculated ($l = 1, 2, \ldots, L$). When the number of sound signals is L, the driving function of the sound source, which represents the superposition of each $S_l(k) F_l(\mathbf{x}_0, k)$, is given as
$$ D(\mathbf{x}_0, k) = \sum_{l=1}^{L} S_l(k)\, F_l(\mathbf{x}_0, k). \tag{6} $$
To derive the spatial filters, the simplest case with L = 1 and S_{1}(k) = 1 is considered and subscript l is omitted in the following equations. In this case, $D(\mathbf{x}_0, k) = F(\mathbf{x}_0, k)$, and Eq. (6) is then represented as
From Eq. (7), $\tilde{F}(k_x, k)$ is calculated as the spatial filter in the wave number domain for generating a bright zone using a continuous linear sound source.
In sound field reproduction, $P(\mathbf{x}_{\mathrm{ref}}, k)$ at position $\mathbf{x}_{\mathrm{ref}} = [x, r_{\mathrm{ref}}]$ is the actual acoustic pressure received at the continuous linear receiver at r = r_{ref}.^{36} In the previous approach that used Fourier transform-based spatial windowing, an arbitrary sound field was assumed for original sound field $P_{\mathrm{org}}(\mathbf{x}_{\mathrm{ref}}, k)$ before the spatial window was applied. Then $P_{\mathrm{org}}(\mathbf{x}_{\mathrm{ref}}, k)$ contains all the wave number components, and the driving function includes the convolution operation in the wave number domain.^{19,20} In the proposed method, on the other hand, $P_{\mathrm{org}}(\mathbf{x}_{\mathrm{ref}}, k) = 1$, which contains only the component with k_{x} = 0 whose wave front is parallel to the linear sound source, so that the convolution operation in the wave number domain is avoided. Previous work indicated that $P_{\mathrm{org}}(\mathbf{x}_{\mathrm{ref}}, k) = 1$ is sufficient for generating bright and dark zones.^{13}
1. Spatial filtering modeled by a rectangular window
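When the spatial filter is modeled by a simple rectangular window and a bright zone of length l_{b} is generated at a position [x_{b}, r_{b}], the positions at which $P(\mathbf{x}, k) = 1$ and 0 correspond to the bright and dark points, respectively.^{13,19,20}
For generating a bright zone of length l_{b} centered around x = 0, P(x, l_{b}) is modeled by rectangular window Π(x/l_{b}) and is given as
$$ P_{\mathrm{rect}}(x, l_b) = \Pi\!\left(\frac{x}{l_b}\right) = \begin{cases} 1, & |x| \le l_b/2, \\ 0, & \text{otherwise}. \end{cases} \tag{8} $$
The spatial Fourier transform of P_{rect} (x, l_{b}) with respect to x is then obtained:^{31}
$$ \tilde{P}_{\mathrm{rect}}(k_x, l_b) = \frac{2 \sin(k_x l_b / 2)}{k_x} = l_b\, \frac{\sin(k_x l_b / 2)}{k_x l_b / 2}. \tag{9} $$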
2. Spatial filtering modeled by a Hann window
To extend the spatial filtering, a Hann window is introduced and an analytical solution is derived.
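As in Eq. (8), P(x, l_{b}) is modeled by a Hann window of length l_{b} centered around x = 0 that is given as
$$ P_{\mathrm{Hann}}(x, l_b) = \begin{cases} \dfrac{1}{2}\left[1 + \cos\!\left(\dfrac{2\pi x}{l_b}\right)\right], & |x| \le l_b/2, \\ 0, & \text{otherwise}. \end{cases} \tag{10} $$
The Fourier transform of P_{Hann} (x, l_{b}) with respect to x is analytically derived in the Appendix and obtained as
$$ \tilde{P}_{\mathrm{Hann}}(k_x, l_b) = \frac{l_b}{2}\, \frac{\sin(k_x l_b / 2)}{k_x l_b / 2}\, \frac{1}{1 - \left(k_x l_b / 2\pi\right)^2}. \tag{11} $$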
As in a previous work,^{13} for shifting the center of the bright zone from x = 0 to x = x_{b} (Fig. 1), the shift theorem^{31} with respect to x is applied to Eqs. (9) and (11). Then the spatial filters in the wave number domain for generating the bright zone at $\sqrt{y^2 + z^2} = r_b$ are analytically derived,
Consequently, a bright zone of arbitrary length l_{b} can be generated at arbitrary horizontal position [x_{b}, r_{b}] by the proposed spatial filter in the wave number domain using a continuous linear monopole sound source distribution.
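The closed-form window transforms used in this subsection can be checked numerically. The sketch below compares them with a direct midpoint-rule integration, assuming a forward transform kernel $e^{-jk_x x}$ (the sign does not matter here because both windows are even). All function names are hypothetical, and the closed forms are written in the explicit sine form to avoid any ambiguity about the sinc normalization.

```python
import cmath
import math

LB = 1.04  # bright zone length used in the experiments (m)


def rect_ft(kx: float, lb: float) -> float:
    """Closed-form transform of a unit rectangular window of length lb."""
    if abs(kx) < 1e-12:
        return lb
    return 2.0 * math.sin(kx * lb / 2.0) / kx


def hann_ft(kx: float, lb: float) -> float:
    """Closed-form transform of 0.5 * (1 + cos(2*pi*x/lb)) on |x| <= lb/2."""
    a = 2.0 * math.pi / lb
    if abs(kx) < 1e-12:
        return lb / 2.0
    if abs(abs(kx) - a) < 1e-9:
        return lb / 4.0  # limit at the removable singularity kx = +/- 2*pi/lb
    return a * a * math.sin(kx * lb / 2.0) / (kx * (a * a - kx * kx))


def numeric_ft(window, kx: float, lb: float, n: int = 4000) -> complex:
    """Midpoint-rule estimate of the integral of window(x) * exp(-j*kx*x)."""
    dx = lb / n
    total = 0j
    for i in range(n):
        x = -lb / 2.0 + (i + 0.5) * dx
        total += window(x) * cmath.exp(-1j * kx * x)
    return total * dx


def rect_window(x: float) -> float:
    return 1.0


def hann_window(x: float) -> float:
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * x / LB))


def relative_level_db(ft, kx: float) -> float:
    """Level at kx relative to the kx = 0 peak, in dB."""
    return 20.0 * math.log10(abs(ft(kx, LB)) / abs(ft(0.0, LB)))


if __name__ == "__main__":
    kx_side = 5.0 * math.pi / LB  # a sidelobe peak of the rectangular window
    print("rect sidelobe (dB):", relative_level_db(rect_ft, kx_side))
    print("Hann sidelobe (dB):", relative_level_db(hann_ft, kx_side))
```

Evaluating the relative level at the same sidelobe position illustrates why the Hann-modeled filter places less energy at k_{x} ≠ 0 than the rectangular one.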
C. Practical implementation using a linear loudspeaker array
The spatial filter coefficients of the proposed method in the temporal frequency domain are finally derived by the inverse spatial Fourier transform:^{31,37}
where only the propagating wave components are considered and evanescent components $|k_x| > |k|$ are discarded to calculate stable filters.^{36}
For actual implementations, a linear loudspeaker array instead of a continuous linear sound source is used, and Eq. (13) must be discretized and truncated.^{13,14,36,40} The truncation and discretization properties of the driving function in the SDM have been scrutinized.^{37}
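The discretization and truncation step can be sketched as a finite sum over the propagating wave numbers only. The example below is a minimal illustration, not the authors' implementation: it assumes an inverse transform of the form (1/2π)∫ F̃(k_x) e^{+j k_x x_0} dk_x (the kernel sign and the 1/2π normalization are assumptions of this sketch), and `inverse_spatial_ft` is a hypothetical name. The wave-number-domain filter is passed in as a callable so that the sketch does not depend on a Hankel function implementation.

```python
import cmath
import math


def inverse_spatial_ft(f_tilde, k: float, dkx: float, x0: float) -> complex:
    """Riemann-sum estimate of (1/2pi) * int_{-k}^{k} f_tilde(kx) e^{+j kx x0} dkx.

    Only samples with |kx| <= k are used, so evanescent components are
    discarded. The kernel sign and normalization are assumptions.
    """
    n_max = int(math.floor(k / dkx))
    total = 0j
    for n in range(-n_max, n_max + 1):
        kx = n * dkx
        total += f_tilde(kx) * cmath.exp(1j * kx * x0)
    return total * dkx / (2.0 * math.pi)


# Sanity check with a flat spectrum: the band-limited result is sin(k*x0)/(pi*x0).
k_demo, dkx_demo, x0_demo = 36.5, 1.0 / 128.0, 0.3
value = inverse_spatial_ft(lambda kx: 1.0, k_demo, dkx_demo, x0_demo)
expected = math.sin(k_demo * x0_demo) / (math.pi * x0_demo)
print(value.real, expected)
```

Evaluating the sum at each loudspeaker position x_0 yields one filter coefficient per loudspeaker and per temporal frequency.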
III. EXPERIMENTS
A. Experimental conditions
Computer simulations and experiments using actually implemented linear arrays of loudspeakers and microphones were conducted to evaluate the proposed method and to compare it with the conventional approaches.
The temperature of the anechoic chamber was 21 °C, and the speed of sound c was set to 344.18 m/s for both the computer simulations and experiments. A linear array of loudspeakers was set along the x axis and centered around x = 0. The number of loudspeakers in the linear array was M = 64, and the distance between adjacent loudspeakers was Δx_{sp} = 0.065 m, which corresponds to the actually implemented linear loudspeaker array. Its spatial Nyquist frequency, c/(2Δx_{sp}), was about 2.6 kHz.
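The array geometry and its spatial Nyquist frequency follow directly from the values above. The sketch below assumes that "centered around x = 0" means the element positions are symmetric about the origin; the helper names are hypothetical.

```python
def loudspeaker_positions(m: int = 64, dx: float = 0.065) -> list:
    """x coordinates (m) of an m-element line array centered on x = 0."""
    return [(i - (m - 1) / 2.0) * dx for i in range(m)]


def spatial_nyquist_hz(c: float = 344.18, dx: float = 0.065) -> float:
    """Frequency at which the element spacing equals half a wavelength."""
    return c / (2.0 * dx)


positions = loudspeaker_positions()
f_nyq = spatial_nyquist_hz()
print(f"array spans {positions[0]:.4f} m to {positions[-1]:.4f} m")
print(f"spatial Nyquist frequency: {f_nyq:.1f} Hz")
```

Above this frequency, spatial aliasing is expected to limit the achievable control accuracy of any method driven by this array.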
B. Conventional methods
The proposed method was compared with the following three conventional approaches.
1. Delay-and-sum beamforming
This is the simplest approach,^{23} and the spatial filter coefficients are obtained as
where $\mathbf{x}_{\mathrm{sp},m}$ and $\mathbf{x}_{\mathrm{b},n}$ are the mth loudspeaker position ($m = 1, 2, \cdots, M$) and the nth control point for the bright zone ($n = 1, 2, \cdots, N$), respectively.
2. Contrast maximization
When using N control points at $\mathbf{x}_{\mathrm{co}}$ and M loudspeakers at $\mathbf{x}_{\mathrm{sp}}$,^{15} the spatially averaged correlation matrix between the control points and the loudspeaker positions is calculated as
where $\mathbf{G}_n(k) = [G_{3D}(\mathbf{x}_{\mathrm{co},n}, \mathbf{x}_{\mathrm{sp},1}, k) \cdots G_{3D}(\mathbf{x}_{\mathrm{co},n}, \mathbf{x}_{\mathrm{sp},M}, k)]$, $\mathbf{G}_n^{H}(k)$ is the Hermitian transpose of $\mathbf{G}_n(k)$, $\mathbf{R}_b(k)$ is the spatial correlation matrix between the bright points and the loudspeaker positions, and $\mathbf{R}_d(k)$ is that between the dark points and the loudspeaker positions. In the CM approach, spatial filters $F_{\mathrm{CM}}(\mathbf{x}_{\mathrm{sp}}, k)$ are obtained from the eigenvector of matrix $[\mathbf{R}_b(k) + \mathbf{R}_d(k)]^{+} \mathbf{R}_b(k)$ that corresponds to the largest eigenvalue of this matrix,^{15} where $[\mathbf{R}_b(k) + \mathbf{R}_d(k)]^{+}$ is the generalized inverse of $[\mathbf{R}_b(k) + \mathbf{R}_d(k)]$.
3. Least-squares-based pressure matching
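In the LS-based pressure matching approach,^{33} the spatial filters are directly calculated from the inverse of matrix $\mathbf{G}(\mathbf{x}_{\mathrm{co}}, \mathbf{x}_{\mathrm{sp}}, k)$, which is constructed from every transfer function between each control point $\mathbf{x}_{\mathrm{co},n}$ and loudspeaker position $\mathbf{x}_{\mathrm{sp},m}$,
$$ \mathbf{F}_{\mathrm{LS}}(\mathbf{x}_{\mathrm{sp}}, k) = \mathbf{G}^{-1}(\mathbf{x}_{\mathrm{co}}, \mathbf{x}_{\mathrm{sp}}, k)\, \mathbf{P}(\mathbf{x}_{\mathrm{co}}, k), \tag{16} $$
where
$$ \mathbf{G}(\mathbf{x}_{\mathrm{co}}, \mathbf{x}_{\mathrm{sp}}, k) = \begin{bmatrix} G_{3D}(\mathbf{x}_{\mathrm{co},1}, \mathbf{x}_{\mathrm{sp},1}, k) & \cdots & G_{3D}(\mathbf{x}_{\mathrm{co},1}, \mathbf{x}_{\mathrm{sp},M}, k) \\ \vdots & \ddots & \vdots \\ G_{3D}(\mathbf{x}_{\mathrm{co},N}, \mathbf{x}_{\mathrm{sp},1}, k) & \cdots & G_{3D}(\mathbf{x}_{\mathrm{co},N}, \mathbf{x}_{\mathrm{sp},M}, k) \end{bmatrix} \tag{17} $$
and $\mathbf{P}(\mathbf{x}_{\mathrm{co}}, k)$ is the vector of sound pressures at control points $\mathbf{x}_{\mathrm{co}}$. For generating bright zone $\mathbf{x}_b$ and dark zone $\mathbf{x}_d$, the elements of $\mathbf{P}(\mathbf{x}_{\mathrm{co}}, k)$ at $\mathbf{x}_b$ and $\mathbf{x}_d$ are set to 1 and 0, respectively.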
To calculate stable filters for the CM and LS methods, a truncated SVD was employed for regularization. Small eigenvalues of the matrices were truncated, and the threshold of the ratio between the maximum and minimum eigenvalues was set to 20 dB, which was determined from pre-experimental results.
C. Evaluation indices
The sound pressure levels synthesized by the loudspeakers on a plane with z = 0 m, −2.08 m ≤ x ≤ 2.08 m, and 1.5 m ≤ y ≤ 2.5 m were evaluated. The measurement points on the plane were discretized with Δx = 0.0325 m, which is determined by the implemented linear microphone array and corresponds to twice the spatial Nyquist frequency of the linear loudspeaker array, and Δy = 0.05 m, which is determined by the actual measurements.
In previous work,^{13,14} the bright to dark ratio (BDR) was defined for evaluating produced sound pressure level $20 \log_{10} |P(\mathbf{x}, k)|$ between bright zone $\mathbf{x}_b$ and dark zone $\mathbf{x}_d$ using a rectangular window. In this paper, an extended spatial filtering with a Hann window is proposed. The simple BDR that was previously used^{13,14} is then inadequate since a Hann window is not rectangular and the target level is not uniform across the bright zone.
For generating bright and dark zones, the most important performance criterion is how well the undesired sound pressures in the dark zone are reduced, so the following two evaluation indices are defined.
One is the sound pressure level averaged for y components that is defined as
where $\mathbf{x}_{\mathrm{b,center}}$ is the bright zone center position, Δy = 0.05 m, and I = 21.
The other is the averaged sound pressure level ratio between bright zone center $\mathbf{x}_{\mathrm{b,center}}$ and dark zone $\mathbf{x}_d$, defined as the modified BDR, given as
In the proposed approach, control distance r_{b} = y_{b} was set to 2.0 m (z_{b} = 0 m), dk_{x} in Eq. (13) was discretized into $\Delta k_x = 2\pi / (4 M \Delta x_{\mathrm{sp}}) = 0.3776$ rad/m, and spatial filters of M = 64 loudspeakers in the temporal frequency domain $F_{\mathrm{rect/Hann}}(\mathbf{x}_{\mathrm{sp}}, k)$ were obtained.
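The wave number step quoted above, and the number of propagating wave number samples it leaves at a given frequency, can be reproduced with a few lines. This is a sketch under stated assumptions: the factor 4 is read here as an oversampling factor relative to the array aperture MΔx_{sp}, which is an interpretation rather than a statement from the text, and the helper names are hypothetical.

```python
import math

M = 64          # number of loudspeakers
DX_SP = 0.065   # loudspeaker spacing (m)
C = 344.18      # speed of sound (m/s)


def delta_kx(m: int = M, dx: float = DX_SP, factor: int = 4) -> float:
    """Wave number step 2*pi / (factor * m * dx) in rad/m."""
    return 2.0 * math.pi / (factor * m * dx)


def propagating_samples(f_hz: float, dkx: float, c: float = C) -> int:
    """Number of samples n*dkx with |n*dkx| <= k, i.e. non-evanescent ones."""
    k = 2.0 * math.pi * f_hz / c
    n_max = int(math.floor(k / dkx))
    return 2 * n_max + 1


dkx = delta_kx()
print(f"delta kx = {dkx:.4f} rad/m")
print("propagating samples at 2 kHz:", propagating_samples(2000.0, dkx))
```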
In the conventional methods, N = 64 control points were set at y = y_{b} = 2 m, z = 0 m, and −2.08 m ≤ x ≤ 2.015 m discretized with Δx_{co} = 0.065 m.
The produced sound pressure level at temporal frequency f = 2000 Hz with x_{b} = 1.04 m and l_{b} = 1.04 m for each method was evaluated. In addition, averaged sound pressure levels SPL_{ave} (x, k) were calculated for the proposed methods.
In the modified BDR evaluation, BDR_{mod} (k) was calculated from the produced sound pressure level at discretized 128 (−2.08 m ≤ x ≤ 2.0475 m with Δx = 0.0325 m) × 21 (1.5 m ≤ y ≤ 2.5 m with Δy = 0.05 m) = 2688 measurement points to evaluate the following four conditions:

Narrow width bright zone at the array center (x_{b} = 0 m and l_{b} = 0.26 m).

Wide width bright zone at the array center (x_{b} = 0 m and l_{b} = 1.04 m).

Narrow width bright zone at the left side of the array (x_{b} = 1.04 m and l_{b} = 0.26 m).

Wide width bright zone at the left side of the array (x_{b} = 1.04 m and l_{b} = 1.04 m).
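The evaluation grid and the four conditions listed above can be enumerated as follows. The sketch assumes that a grid column belongs to the bright zone when |x − x_{b}| ≤ l_{b}/2, which is an illustrative convention rather than a definition taken from the text; all names are hypothetical.

```python
def axis(start: float, step: float, count: int) -> list:
    """Uniformly spaced coordinates start, start + step, ..."""
    return [start + i * step for i in range(count)]


X_POINTS = axis(-2.08, 0.0325, 128)  # -2.08 m ... 2.0475 m
Y_POINTS = axis(1.5, 0.05, 21)       # 1.5 m ... 2.5 m

CONDITIONS = [
    {"name": "narrow, array center", "x_b": 0.0, "l_b": 0.26},
    {"name": "wide, array center", "x_b": 0.0, "l_b": 1.04},
    {"name": "narrow, array side", "x_b": 1.04, "l_b": 0.26},
    {"name": "wide, array side", "x_b": 1.04, "l_b": 1.04},
]


def bright_columns(x_points, x_b: float, l_b: float, tol: float = 1e-9) -> list:
    """Indices of grid columns with |x - x_b| <= l_b / 2 (assumed convention)."""
    return [i for i, x in enumerate(x_points) if abs(x - x_b) <= l_b / 2.0 + tol]


print("measurement points:", len(X_POINTS) * len(Y_POINTS))
for cond in CONDITIONS:
    cols = bright_columns(X_POINTS, cond["x_b"], cond["l_b"])
    print(cond["name"], "->", len(cols), "bright columns")
```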
D. Computer simulations
In all the simulations, a three-dimensional free field was assumed. According to the experimental conditions, the spatial filters of the conventional and proposed methods, the produced sound pressure level, and BDR_{mod} (k) were calculated using ideal transfer functions $G_{3D}(\mathbf{x}, \mathbf{x}_{\mathrm{sp}}, k)$, defined in Eq. (2).
E. Acoustic measurements with actually implemented arrays
To validate the effectiveness of the proposed method using actual loudspeakers and compare it with the conventional approaches, a linear array of 64 loudspeakers was implemented. Sixty-four loudspeakers (Bose; M2, which are active loudspeakers with their own amplifiers, although these built-in amplifiers were not used) were mounted on an aluminum frame [Fig. 2(b)] and controlled by two DA converters (RME; M32DA) and 32 loudspeaker amplifiers (Roland; SRA5050). The loudspeaker locations were the same as described in Sec. III A. As shown in Fig. 2(a), the transfer functions between the loudspeakers and the evaluation points were measured using an actually implemented linear microphone array in an anechoic chamber where the background noise level was 9.2 dB (100 to 20 000 Hz). Sixty-four microphones (DPA; 4060) were fixed using 64 aluminum jigs, mounted on an aluminum frame [Fig. 2(c)], and controlled by two AD converters (RME; M32AD) and 8 microphone amplifiers (RME; Octamic II). The distance between adjacent microphones was Δx_{mic} = 0.0325 m. The DA and AD converters were connected to a MADI audio interface (RME; HDSPeMADIface) and controlled by a laptop (Apple; MacBook Pro) with audio control software (Pd-extended 0.42.5) that can synchronously manage 64-in/64-out MADI audio signals with a sampling frequency of 48 kHz.^{41} The gains of the 64 microphones were calibrated using a sound pressure calibrator (Brüel & Kjær; type 4231).
To ensure identical experimental conditions among the three conventional and two proposed methods, the transfer functions between the loudspeakers and the evaluation points were measured as impulse responses instead of direct measurements of the sound pressures produced by these five methods. A time-stretched pulse^{42} with a length of 16 384 points, where the sampling frequency was 48 kHz, was used as the measurement signal. In the measurement, each impulse response was calculated from the synchronous addition of ten measurements. The impulse responses between 64 loudspeakers and 2688 evaluation points were measured, and the total number was 172 032. The transfer functions in the temporal frequency domain were obtained from the measured impulse responses by the discrete temporal Fourier transform. The signal to noise ratio (SNR) of the measured transfer functions exceeded 20 dB.
In the experimental evaluation, produced sound pressure level $20 \log_{10} |P(\mathbf{x}, k)|$, averaged sound pressure level SPL_{ave} (x, k), and modified bright to dark ratio BDR_{mod} (k) were calculated using the measured transfer functions.
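The benefit of synchronous addition can be illustrated with a toy simulation. For noise that is uncorrelated between repetitions, averaging N recordings is expected to improve the SNR by about 10 log10 N dB, i.e., roughly 10 dB for ten repetitions. The signal, noise level, and function names below are illustrative assumptions and do not model the actual measurement chain.

```python
import math
import random


def synchronous_average(recordings: list) -> list:
    """Sample-wise mean of time-aligned recordings."""
    n = len(recordings)
    return [sum(samples) / n for samples in zip(*recordings)]


def noise_power(recorded: list, clean: list) -> float:
    """Mean squared deviation of a recording from the noise-free signal."""
    return sum((r - c) ** 2 for r, c in zip(recorded, clean)) / len(clean)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [math.sin(2.0 * math.pi * i / 64.0) for i in range(4096)]
recordings = [[s + rng.gauss(0.0, 0.1) for s in clean] for _ in range(10)]

averaged = synchronous_average(recordings)
gain_db = 10.0 * math.log10(
    noise_power(recordings[0], clean) / noise_power(averaged, clean)
)
print(f"SNR gain from 10 averages: {gain_db:.2f} dB")

# Bookkeeping from the text: impulse responses and pulse duration.
n_impulse_responses = 64 * 2688
tsp_duration_s = 16384 / 48000
```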
IV. RESULTS AND DISCUSSIONS
A. Produced sound pressure level in ideal condition
Figures 3 and 4 show the results of the produced sound pressure level, where $P(\mathbf{x}_{\mathrm{b,center}}, k)$ with $\mathbf{x}_{\mathrm{b,center}} = [x_b, y_b, 0]$ was set to 0 dB, at temporal frequency f = 2000 Hz with x_{b} = 1.04 m and l_{b} = 1.04 m, calculated from the simulated and measured transfer functions, respectively.
From the simulation results in Fig. 3, the conventional CM and LS methods and the proposed approaches controlled the bright and dark zones on reference line y = y_{b} more effectively than the simple DS beamforming that only considered the bright zone. However, undesired sound pressures were severely radiated around the boundary between the bright and dark zones with y ≠ y_{b} in the CM and LS methods since these methods are based on numerical solutions at the control points, and the sound pressures at y ≠ y_{b} are not controlled at all. Especially in the LS method, undesired sound pressures were radiated at the opposite side of the bright zone. On the other hand, the proposed approaches were derived from the wave equation-based analytical solution, and both the sound pressures and the wave front on y = y_{b} were efficiently controlled. The method with a Hann window effectively generated bright and dark zones with fewer undesired sound pressures radiated to the dark zone compared with the rectangular window. This is because of the wave number component difference between the rectangular and Hann windows. Figure 5 shows the absolute value of the spatial filter in the wave number domain of each proposed method $|\tilde{F}_{\mathrm{rect/Hann}}(k_x, k)|$ for f = 2000 Hz with x_{b} = 1.04 m, y_{b} = 2.0 m, and l_{b} = 1.04 m. The spatial filter modeled by a rectangular window contains sidelobes in the wave number domain at k_{x} ≠ 0, and these components are radiated to the dark zones with y ≠ y_{b}. That with a Hann window, in contrast, contains lower sidelobes at k_{x} ≠ 0, and the produced sound field is mainly constructed from mainlobe components k_{x} ≈ 0 that are almost parallel to the control line. As a result, the unwanted sound pressures are not radiated to the dark zones with y ≠ y_{b}. The simulation results in Fig. 3 validated the theoretical performance of both the proposed methods and indicated that spatial filtering with a Hann window can more effectively generate the bright and dark zones using a linear loudspeaker array than the other methods.
B. Produced sound pressure level using actual loudspeakers
The control accuracy in the experimental results calculated from the measured transfer functions (Fig. 4) was degraded compared with the simulation results in Fig. 3. This is because the measured transfer functions include such measurement errors as loudspeaker gain differences, loudspeaker and microphone location errors, the directivity of the loudspeakers, and additive noise, and therefore differ from the ideal ones. Especially in the CM method, the performance degradation was severe since it includes numerical inversion and is unstable even though truncated SVD regularization was employed. Compared with the conventional methods, the proposed approach with a Hann window adequately controlled the bright and dark zones using the actual linear array. This indicates that the wave equation-based analytical solutions were more robust and practical for actual implementations than the conventional beamforming and numerical approaches.
C. Control accuracy comparison between proposed methods with actual loudspeakers
For comparing the control accuracy of the two proposed methods using the actual array, the results of averaged sound pressure level SPL_{ave} (x, k) defined in Eq. (18) are plotted in Fig. 6. The sound pressure level in the bright zone produced by rectangular window filtering is obviously flatter than that produced by Hann window filtering because of the window shape in the bright zone. These results, on the other hand, indicate that the control accuracy of Hann window filtering in the dark zone is higher than that of rectangular window filtering for the following two reasons. The cause of the high control accuracy near the bright zone is the wave number component difference described in Sec. IV A. In addition, the dynamic range of a Hann window is theoretically wider than that of a rectangular window, and the control accuracy of a Hann window in the dark zone is consistently higher than that of a rectangular window. The effectiveness of Hann window filtering with actual loudspeakers is validated by the results in Fig. 6.
D. Control accuracy of dark zone
Figures 7, 8, 9, and 10 show the results of modified bright to dark ratio BDR_{mod} (k) defined in Eq. (19) for the four conditions described in Sec. III C that were calculated from the ideal and measured transfer functions. In addition, to validate the results of the measured transfer functions, their degradation was simulated from the ideal ones: the sound pressures produced by the five methods were calculated using the ideal transfer functions with additive Gaussian noise. The averaged SNR of the additive noise was set to 20 dB, which was also determined from the pre-experimental results. The BDR_{mod} (k) results calculated from the simulated transfer functions with additive noise are plotted in Figs. 7(c) to 10(c), respectively.
First, the results of the measurement cases shown in Figs. 7(b) to 10(b) were also clearly degraded compared with those of the simulation cases with the ideal transfer functions because of the transfer function mismatch described in Sec. IV B.
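The noise-perturbation step described above can be sketched as follows. The text only states that Gaussian noise with an averaged SNR of 20 dB was added; the sketch further assumes circular complex Gaussian noise and an SNR defined on the power averaged over all transfer function entries. The function names are hypothetical.

```python
import cmath
import math
import random


def add_noise(transfer: list, snr_db: float, rng: random.Random) -> list:
    """Add complex Gaussian noise so the average SNR is about snr_db.

    Assumptions of this sketch: circular complex noise, SNR defined on the
    mean power over all entries.
    """
    signal_power = sum(abs(h) ** 2 for h in transfer) / len(transfer)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # per real and imaginary part
    return [h + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for h in transfer]


def measured_snr_db(clean: list, noisy: list) -> float:
    """Average SNR in dB between clean and perturbed transfer functions."""
    signal = sum(abs(h) ** 2 for h in clean)
    noise = sum(abs(n - h) ** 2 for h, n in zip(clean, noisy))
    return 10.0 * math.log10(signal / noise)


rng = random.Random(1)  # fixed seed for reproducibility
ideal = [cmath.exp(-1j * 0.37 * i) / (4.0 * math.pi * 2.0) for i in range(5000)]
noisy = add_noise(ideal, 20.0, rng)
print(f"realized SNR: {measured_snr_db(ideal, noisy):.2f} dB")
```

The perturbed transfer functions can then be used in place of the ideal ones when evaluating each method, while the filters themselves are still designed from the ideal model.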
By comparing the results of the measured transfer functions shown in Figs. 7(b) to 10(b) with those of the simulated ones with the additive noise in Figs. 7(c) to 10(c), they seem to share a similar tendency and validate the results of the measured transfer functions.
In the case of narrow width bright zone l_{b} = 0.26 m, especially for temporal frequency 2 to 3.5 kHz with x_{b} = 1.04 m, the CM approach can more effectively control the dark zone in both the simulation and measurement results than the other methods. However, the results of the CM method plotted in Fig. 9 at a temporal frequency of about 4 kHz and in Figs. 8 and 10 for wide width bright zone l_{b} = 1.04 m were unstable since the CM method includes the numerical inversion and the spatial filters were unstable even though regularization was applied. The proposed method with a Hann window can more effectively control the dark zone, especially with a wide width bright zone, in both the simulation and measurement cases compared with the other methods.
Consequently, the effectiveness of the proposed approach with a Hann window for generating bright and dark zones using an actual linear loudspeaker array was validated from both the simulation and measurement results. Future work will improve the proposed method for reverberant environments.
V. CONCLUSIONS
This paper experimentally validated the proposed spatial Fourier transformbased approaches for controlling multiple sound zones. Although previous work proposed a spatial filter with a simple rectangular window, a spatial filter with a Hann window in the wave number domain was introduced and an analytical solution was derived. Both a computer simulation and an experiment using actual loudspeakers were conducted. In the experiment, linear arrays of 64 loudspeakers and 64 microphones were actually implemented in an anechoic chamber, and the transfer functions between loudspeakers and evaluation points were measured. The proposed methods were compared with the conventional DS beamforming, CM, and LS methods. Both the simulation and experimental results validated the effectiveness of the proposed approach with a Hann window using an actual linear loudspeaker array.
ACKNOWLEDGMENTS
This study was partly supported by JSPS KAKENHI Grant Nos. 25871208 and 15K21674.
APPENDIX: SPATIAL FOURIER TRANSFORM OF A HANN WINDOW
The spatial Fourier transform of a Hann window is analytically derived as
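A derivation consistent with the Hann window of Sec. II B 2 can be sketched as follows. It assumes a forward transform kernel $e^{-jk_x x}$ (the result is identical for the opposite sign because the window is even) and introduces the shorthand $a = 2\pi / l_b$, which is not part of the original notation.

```latex
\begin{aligned}
\tilde{P}_{\mathrm{Hann}}(k_x, l_b)
  &= \int_{-l_b/2}^{l_b/2} \frac{1}{2}\left[1 + \cos(a x)\right] e^{-j k_x x}\, dx,
     \qquad a = \frac{2\pi}{l_b} \\
  &= \frac{\sin(k_x l_b/2)}{k_x}
     + \frac{1}{2}\left[\frac{\sin\big((k_x - a) l_b/2\big)}{k_x - a}
     + \frac{\sin\big((k_x + a) l_b/2\big)}{k_x + a}\right] \\
  &= \sin\!\left(\frac{k_x l_b}{2}\right)
     \left[\frac{1}{k_x} - \frac{1}{2(k_x - a)} - \frac{1}{2(k_x + a)}\right]
     \quad \text{since } \sin\!\left(\tfrac{k_x l_b}{2} \mp \pi\right) = -\sin\!\left(\tfrac{k_x l_b}{2}\right) \\
  &= \frac{a^2 \sin(k_x l_b/2)}{k_x\,(a^2 - k_x^2)}
   = \frac{l_b}{2}\,\frac{\sin(k_x l_b/2)}{k_x l_b/2}\,
     \frac{1}{1 - \left(k_x l_b / 2\pi\right)^2}.
\end{aligned}
```

The last form makes the relation to the rectangular window explicit: the same sine-over-argument term appears, scaled by 1/2 and multiplied by a factor that decays as $1/k_x^2$, which is what lowers the sidelobes. At $k_x = \pm 2\pi / l_b$ the expression has a removable singularity with limit $l_b/4$.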