Personal sound zones (PSZ) systems use an array of loudspeakers to render independent audio signals to multiple listeners within a room. The performance of a PSZ system, designed using weighted pressure matching, depends on the selected target responses for the bright zone. In reverberant environments, the target responses are generally chosen to be the room impulse responses from one of the loudspeakers to the control points in the selected bright zone. This approach synthesizes the direct propagation component and all the reverberant components in the bright zone, while minimizing the energy in the dark zone. We present a theoretical analysis to show that high energy differences cannot be achieved for the diffuse reverberant components in the bright and dark zones, and so trying to synthesize these components in the bright zone does not lead to the best performance. It is then shown that the performance can be improved by using windowed versions of these measured impulse responses as target signals, in order to control which reverberant components are synthesized in the bright zone and which are not. This observation is supported by experimental measurements in two scenarios with different levels of reverberation.

Personal sound zones (PSZ) systems use an array of loudspeakers to render different audio signals to different users with minimum leakage between them.1,2 To achieve this, a set of filters is used to process the audio signals that are fed to the loudspeakers. Different techniques have been proposed to compute the filters, such as beamforming,3,4 soundfield synthesis,5–7 energy cancellation approaches,8,9 or hybrid approaches.10,11 Among these, acoustic contrast control (ACC)12 is the algorithm that can achieve the highest isolation between the bright and dark zones, where the terms bright and dark zone refer to the regions where we want high and low acoustic energy, respectively.12 However, ACC cannot synthesize a specific target response in the bright zone. To overcome this limitation, the weighted pressure matching (WPM) algorithm has been proposed,13 which offers the possibility to render a target response in the bright zone while keeping control over the energy in the dark zone. To do so, the authors in Ref. 13 proposed a novel cost function in which a weighting parameter is used to balance the energy in the dark zone against the error with respect to the desired response in the bright zone. The WPM algorithm can be formulated in the time domain,14 the subband domain,15 and the frequency domain.16 In this paper, because of its simplicity, we will use the frequency-domain formulation. Nevertheless, our findings can be generalized to the other formulations of WPM.

The main advantage of WPM over ACC is that it allows us to synthesize a desired target response in the bright zone; however, this is at the cost of higher energy in the dark zone (i.e., higher interference between the users of the system).13 The performance of ACC and WPM has been compared under reverberant conditions in Refs. 17 and 18. The target response selected for the bright zone is a key choice for WPM systems, as different target responses can lead to different levels of energy in the dark zone. However, this is an aspect that, to the best of our knowledge, has not been previously studied in the literature. The most usual approach in reverberant environments is to select the target as the room impulse response (RIR) produced by one of the loudspeakers of the system at the control points of the bright zone.14–16,19–21 This approach aims to synthesize all the direct and reverberant components of the RIR in the bright zone while minimizing the energy in the dark zone. The late reverberation components, however, can be assumed to be diffuse for frequencies above the Schroeder frequency. We will show that there is no set of filters that can achieve high energy differences for the diffuse reverberant components in the bright and dark zones. Therefore, trying to synthesize these components in the bright zone while minimizing their energy in the dark zone does not give the best overall performance. We therefore propose a variation of the WPM approach, in which a window function is applied to the target impulse response for the bright zone. By windowing this response, we can control which reverberant components are synthesized and which are minimized in the bright zone. Windowing has been previously used in the context of PSZ systems in Refs. 18, 22, and 23. In these works, the authors propose to window the RIRs used to compute the filters.
The effect of windowing in these cases is similar to regularization, as it makes the filters more robust to inaccuracies in the RIRs. The approach proposed in this paper is conceptually different, as the RIRs used to compute the filters are not windowed, but windowing is instead applied to the targets for the bright zone. We present experimental evaluations in two scenarios that show the performance obtained with different window lengths. The results indicate that windowing the target can lead to performance improvements with respect to the case without windowing. In general, it seems that the optimal window length is frequency and scenario dependent, but the improvements that can be obtained are more significant for mid-high frequencies. Moreover, the results show that the higher the room reverberation, the higher the performance improvements obtained by windowing the target. Finally, we present evaluation results that show that the improvements in the performance obtained with the proposed method are robust to perturbations in the environment.

The paper1 is structured as follows. Section II studies the WPM algorithm. Section III presents the novel target selection for WPM. Section IV presents experimental results to show the performance of the proposed strategy to select the target under different reverberation levels. Finally, Sec. V summarizes the main conclusions.

Let us consider a PSZ system that uses an array of $L$ loudspeakers, and where the bright and dark zones are spatially sampled using $M_b$ and $M_d$ control points, respectively. Let us denote by $H_{ml,q}(f)$ the room frequency response at frequency $f$ between the $l$-th loudspeaker and the $m$-th control point in the $q$-th zone, where $q \in \{b, d\}$, and $b$ and $d$ are the indices of the bright and dark zones, respectively. Also, let us define $G_l(f)$ as the frequency response of the filter applied to the signal fed to the $l$-th loudspeaker. From now on, we will omit the index $f$ for the sake of simplicity. Next, let us define the $M_q \times L$ matrix containing the room frequency responses between all loudspeakers and all control points in the $q$-th zone as

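$$\mathbf{H}_q = \begin{bmatrix} H_{00,q} & \cdots & H_{0(L-1),q} \\ \vdots & \ddots & \vdots \\ H_{(M_q-1)0,q} & \cdots & H_{(M_q-1)(L-1),q} \end{bmatrix}. \tag{1}$$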

Similarly, let us define the L×1 vector containing the frequency responses of the filters for all loudspeakers in the system as

$$\mathbf{g} = [G_0 \ \cdots \ G_{L-1}]^T. \tag{2}$$

Then, we can write the Mq×1 vector containing the combined frequency responses for the control points in the q-th zone as

$$\mathbf{x}_q = \mathbf{H}_q \mathbf{g}. \tag{3}$$

Moreover, let us define Dm,b(f) as the target frequency response that we want to synthesize at the m-th control point of the bright zone (we assume that a null target response is selected for the dark zone). Then, we can write the Mb×1 vector containing the target frequency responses for all the control points in the bright zone as

$$\mathbf{d}_b = [D_{0,b} \ \cdots \ D_{M_b-1,b}]^T. \tag{4}$$

Having presented the model, we now describe the weighted pressure matching (WPM) algorithm, which was originally proposed in Ref. 13. The algorithm aims to find the filter coefficients g that minimize the following cost function

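$$J(\mathbf{g}) = \frac{\kappa}{M_d}\|\mathbf{H}_d\mathbf{g}\|^2 + \frac{1-\kappa}{M_b}\|\mathbf{H}_b\mathbf{g}-\mathbf{d}_b\|^2 + \lambda\|\mathbf{g}\|^2. \tag{5}$$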

We can see that Eq. (5) is formed by three terms: (1) the mean energy in the dark zone, (2) the mean square error (MSE) with respect to the target frequency response in the bright zone, and (3) the energy of the filter coefficients. In Eq. (5), $\lambda$ is a regularization factor that constrains the energy of the filters. Also, $\kappa$ is the weighting factor, satisfying $0 \leq \kappa \leq 1$, that is used to balance the solution: high values of $\kappa$ put more effort into minimizing the mean energy in the dark zone, whereas low values put more effort into minimizing the MSE in the bright zone. It is straightforward to show that the optimal filter coefficients that minimize Eq. (5) are given by (Ref. 13)

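$$\mathbf{g} = \left(\frac{\kappa}{M_d}\mathbf{H}_d^H\mathbf{H}_d + \frac{1-\kappa}{M_b}\mathbf{H}_b^H\mathbf{H}_b + \lambda\mathbf{I}\right)^{-1}\frac{1-\kappa}{M_b}\mathbf{H}_b^H\mathbf{d}_b. \tag{6}$$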
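
As an illustration, the closed-form solution in Eq. (6) can be evaluated for a single frequency bin with a few lines of NumPy. This is a minimal sketch and not the implementation used for the evaluations: the function names `wpm_cost` and `wpm_filters`, the random transfer functions, and the value of λ are assumptions chosen only for illustration.

```python
import numpy as np


def wpm_cost(g, H_b, H_d, d_b, kappa, lam):
    """Cost function of Eq. (5) for one frequency bin."""
    M_b, M_d = H_b.shape[0], H_d.shape[0]
    dark = kappa / M_d * np.linalg.norm(H_d @ g) ** 2
    bright = (1.0 - kappa) / M_b * np.linalg.norm(H_b @ g - d_b) ** 2
    return dark + bright + lam * np.linalg.norm(g) ** 2


def wpm_filters(H_b, H_d, d_b, kappa=0.5, lam=1e-3):
    """Minimizer of Eq. (5), i.e., Eq. (6), for one frequency bin."""
    M_b, L = H_b.shape
    M_d = H_d.shape[0]
    A = (kappa / M_d) * (H_d.conj().T @ H_d)
    A = A + ((1.0 - kappa) / M_b) * (H_b.conj().T @ H_b) + lam * np.eye(L)
    rhs = ((1.0 - kappa) / M_b) * (H_b.conj().T @ d_b)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(A, rhs)


# Toy data: random complex transfer functions (hypothetical, not measured RIRs).
rng = np.random.default_rng(0)
L, M_b, M_d = 8, 12, 12
H_b = rng.standard_normal((M_b, L)) + 1j * rng.standard_normal((M_b, L))
H_d = rng.standard_normal((M_d, L)) + 1j * rng.standard_normal((M_d, L))
d_b = H_b[:, 3]  # target: response of a reference loudspeaker (index 3 here)

g_opt = wpm_filters(H_b, H_d, d_b, kappa=0.5, lam=1e-3)
```

Because the cost is a strictly convex quadratic for λ > 0, any perturbation of `g_opt` should increase `wpm_cost`, which gives a simple sanity check of the solver.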

The target selected for the bright zone heavily influences the performance of WPM; however, this is an aspect that has not been extensively studied in the PSZ related literature. The most common approach in reverberant environments is to select the target impulse response as the delayed response from one of the loudspeakers to all of the control points in the bright zone,14–16,19–21 i.e.,

$$d_{m,b}(n) = h_{ml_r,b}(n-\tau), \tag{7}$$

where $l_r \in \{0, \ldots, L-1\}$ is the index of the reference loudspeaker, and $\tau$ is a modelling delay that ensures the causality of the filters. The target frequency response $D_{m,b}$ is obtained by computing the DTFT of $d_{m,b}$ in Eq. (7). This selection aims to synthesize the direct propagation component and all the reverberant components produced by the reference loudspeaker in the bright zone.

Next, let us define

$$w_{L_w}(n) = 0, \quad \text{for } |n| \geq L_w, \tag{8}$$

which is a window function centered on $n = 0$. From the definition of the window in Eq. (8), it is easy to see that its length is $2L_w - 1$; however, from now on, we refer to the length of the window as $L_w$ (i.e., the number of non-negative time samples in the window). Now, we alternatively propose to define the target impulse response in the control points of the bright zone as

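$$d_{m,b}^{(L_w)}(n) = \begin{cases} w_{L_w}(n-\tau-\tau_p)\, h_{ml_r,b}(n-\tau) & \text{if } L_w < \infty, \\ h_{ml_r,b}(n-\tau) & \text{otherwise}, \end{cases} \tag{9}$$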

where $\tau_p$ is the propagation delay corresponding to the direct component of $h_{ml_r,b}$, and in which the window is time-shifted such that its center is located at $n = \tau_p + \tau$. Then, we propose to select the target frequency response $D_{m,b}$ used to form $\mathbf{d}_b$ as the DTFT of $d_{m,b}^{(L_w)}$ in Eq. (9). It is important to highlight that, contrary to what is proposed in Refs. 18, 22, and 23, no windowing is applied to the RIRs used to form matrices $\mathbf{H}_b$ and $\mathbf{H}_d$. Also, it is relevant to note that the targets in Eqs. (7) and (9) are equivalent when $L_w = \infty$. However, Eq. (9) has the advantage that, by selecting the window length, we can choose which reverberant components we want to synthesize and which to minimize in the bright zone. Thus, when selecting a window length that removes certain reverberant components, we are aiming to achieve a de-reverberation of the bright zone. We present in Fig. 1 a schematic to illustrate this effect. In the schematic, we show an example of a RIR between the reference loudspeaker and one control point in the dark and bright zones, and also the target that we aim to achieve at these control points using a window of length $L_w$. From this schematic, it is easy to see that when $L_w = \infty$, we want to synthesize all the reverberant components in the bright zone while minimizing the energy of all the components in the dark zone. However, the effect is different when we select finite values for $L_w$. For time instants within the window, we seek to synthesize the direct propagation component and some early reflections in the bright zone, while minimizing the energy in the dark zone. For time instants after the end of the window, we want to minimize the energy of the reverberant components both in the bright and dark zones. The proposed target selection can be directly applied to the time-domain formulation of WPM14 by using Eq. (9) to form the vector with the target impulse responses in all the control points of the bright zone.
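
The construction in Eq. (9) can be sketched for one control point as follows. This is an illustrative sketch under stated assumptions: the helper names `windowed_target` and `rect_window`, the synthetic RIR, and the delay values are hypothetical, and a rectangular window is used only as a simple stand-in for $w_{L_w}$. The frequency-domain target is approximated by sampling the DTFT with an FFT.

```python
import numpy as np


def windowed_target(h_ref, tau, tau_p, window=None):
    """Target impulse response of Eq. (9) for one control point.

    h_ref  : RIR from the reference loudspeaker (1-D array, index 0 is n = 0)
    tau    : modelling delay in samples
    tau_p  : propagation delay of the direct component in samples
    window : callable w(n) evaluated on integer offsets, or None for L_w = infinity
    """
    n = np.arange(len(h_ref) + tau)
    delayed = np.zeros(len(n))
    delayed[tau:] = h_ref                      # h(n - tau)
    if window is None:                         # L_w = infinity reduces to Eq. (7)
        return delayed
    return window(n - tau - tau_p) * delayed   # window centered on n = tau + tau_p


def rect_window(L_w):
    """Rectangular stand-in for w_{L_w}: zero for |n| >= L_w, as in Eq. (8)."""
    return lambda n: (np.abs(n) < L_w).astype(float)


# Synthetic RIR (hypothetical): direct sound at tau_p followed by a decaying tail.
rng = np.random.default_rng(1)
tau_p, tau = 20, 50
h = np.zeros(400)
h[tau_p] = 1.0
tail = np.arange(400 - tau_p - 1)
h[tau_p + 1:] = 0.3 * rng.standard_normal(tail.size) * np.exp(-tail / 80.0)

d_full = windowed_target(h, tau, tau_p)                  # full target, Eq. (7)
d_win = windowed_target(h, tau, tau_p, rect_window(30))  # windowed target, Eq. (9)
D_win = np.fft.rfft(d_win)                               # samples of the DTFT
```

Inside the window the two targets coincide, and after the end of the window the windowed target is zero, so the filters are asked to cancel the late reverberation rather than reproduce it.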

FIG. 1.

(Color online) Schematic to illustrate the proposed target selection. The upper plots represent the RIR between the reference loudspeaker and one control point in the bright and dark zones, and the lower plots represent the windowed target at these control points using a window of length Lw.


It is not initially clear whether windowing the target response will lead to performance improvements. However, above the Schroeder frequency, the late reverberation components of the RIR become diffuse.24 These components can then be assumed to be of similar energy but uncorrelated between the different control points. Now, let us define $\mathbf{H}_{b,\mathrm{dif}}$ and $\mathbf{H}_{d,\mathrm{dif}}$ as the matrices containing the diffuse components of the RIR at a single frequency $f$ in the bright and dark zones, respectively. From the ACC algorithm,25 we know that the maximum acoustic contrast (AC) that can be achieved between the diffuse components of the bright and dark zones is given by the highest eigenvalue of the matrix $(\mathbf{H}_{d,\mathrm{dif}}^H\mathbf{H}_{d,\mathrm{dif}})^{-1}\mathbf{H}_{b,\mathrm{dif}}^H\mathbf{H}_{b,\mathrm{dif}}$. For the diffuse components, we can assume that $\mathbf{H}_{b,\mathrm{dif}}^H\mathbf{H}_{b,\mathrm{dif}} \approx \sigma\mathbf{I}$ and $\mathbf{H}_{d,\mathrm{dif}}^H\mathbf{H}_{d,\mathrm{dif}} \approx \sigma\mathbf{I}$ (where $\sigma$ is the energy of the diffuse field in the room).26 Then, the maximum AC that can be achieved between the diffuse components of the bright and dark zones is approximately 1 (in linear units). This shows that there is no set of filters that is able to provide a significant energy difference between the diffuse components in the bright and dark zones. Thus, selecting a target that tries to synthesize all of the reverberant components, including the diffuse components in the bright zone (i.e., with $L_w = \infty$), is not a good choice. In Sec. IV, we will present evaluation results that show that improvements in the performance can be achieved by windowing the target, i.e., by aiming to minimize the energy of the diffuse components in both zones.
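
The step from the diffuse-field approximation to a maximum contrast of about 1 can be written out explicitly. The short derivation below only restates the argument of the preceding paragraph under the same approximation; the symbol $C_{\mathrm{dif}}$ for the contrast between the diffuse components is introduced here for illustration.

```latex
\begin{align*}
\left(\mathbf{H}_{d,\mathrm{dif}}^H\mathbf{H}_{d,\mathrm{dif}}\right)^{-1}
\mathbf{H}_{b,\mathrm{dif}}^H\mathbf{H}_{b,\mathrm{dif}}
  &\approx (\sigma\mathbf{I})^{-1}(\sigma\mathbf{I}) = \mathbf{I}, \\
C_{\mathrm{dif}}(\mathbf{g})
  = \frac{\mathbf{g}^H\mathbf{H}_{b,\mathrm{dif}}^H\mathbf{H}_{b,\mathrm{dif}}\mathbf{g}}
         {\mathbf{g}^H\mathbf{H}_{d,\mathrm{dif}}^H\mathbf{H}_{d,\mathrm{dif}}\mathbf{g}}
  &\approx \frac{\sigma\,\mathbf{g}^H\mathbf{g}}{\sigma\,\mathbf{g}^H\mathbf{g}} = 1
  \quad \text{for any } \mathbf{g} \neq \mathbf{0}, \\
10\log_{10} C_{\mathrm{dif}} &\approx 0~\text{dB}.
\end{align*}
```

Since every eigenvalue of the identity matrix equals 1, no choice of filters changes the ratio, which is why the diffuse tail cannot be steered into one zone and away from the other.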

In this section, we evaluate the performance of the proposed target selection for different window lengths and for two scenarios with different reverberation times. First, we define the experimental setup and the metrics used for the evaluations. Then, we present evaluation results that show the effect on the performance of windowing the target response for the bright zone. Finally, we evaluate the robustness to perturbations in the filters computed with a windowed target.

The experimental evaluations have been carried out in two scenarios using rooms with different reverberation levels:

  • Scenario 1: Office-like room at the Institute of Telecommunications and Multimedia Applications (iTEAM) of size 7.2×11.72×2.63 m and reverberation time T60=0.5 s. The Schroeder frequency for this room is 95 Hz. In this scenario, a linear array of 8 two-way loudspeakers with an inter-element distance of 0.18 m has been used [see Fig. 2(a)]. The setup is formed by one bright and one dark zone [Fig. 3(a)]. In each zone, two different grids of microphones have been used for spatial sampling, such that the RIRs measured with the control grid are used to compute the filters and the RIRs measured with the validation grid are used to evaluate the filters. The highest frequency that can be rendered without spatial aliasing is 1498 Hz.27 

  • Scenario 2: Listening room at iTEAM, which is a rectangular room of size 9.07×4.45×2.65 m with acoustically treated walls and reverberation time T60=0.18 s. The Schroeder frequency for this room is 137 Hz. Similarly to scenario 1, a linear array of 8 two-way loudspeakers with an inter-element distance of 0.18 m [Fig. 2(b)] has been used and two zones are considered, which are spatially sampled using two different grids of microphones [see Fig. 3(b)]. The highest frequency that can be rendered without spatial aliasing is 1401 Hz.27 

FIG. 2.

(Color online) Array of loudspeakers used in scenarios 1 and 2 in (a) and (b), respectively. The reference loudspeaker used to select the target is indicated using a red arrow in each case.

FIG. 3.

(Color online) Setups for scenarios 1 and 2 in (a) and (b), respectively. Circle, cross, and triangle markers denote control points, validation points, and loudspeakers, respectively. The walls are located in x=±3.2 m, y=±5.86 m, and z={0,2.65} m in (a), and in x=±4.53 m, y={0.46,3.99} m, and z={0,2.65} m in (b). The loudspeakers and microphones are located at a height of 1.56 m and 1.51 m in (a) and (b), respectively.


In all cases, the RIRs were measured using the exponentially swept sine technique28 with a sampling frequency of 44 100 Hz. In the following, we use an overbar, $\bar{(\cdot)}$, to denote the elements related to the validation grid of microphones. Due to the effects of the spatial aliasing in the arrays of loudspeakers, we limit the study to the frequency range 150–2000 Hz.

The optimal filters have been computed using Eq. (6), where the weighting parameter $\kappa$ is set to 0.5, if not indicated otherwise. Moreover, the regularization parameter $\lambda$ is selected such that the array effort [which will be defined next in Eq. (12)] is upper bounded as $AE \leq AE_{\max}$. The value of $AE_{\max}$ will be indicated in each case. The target for the bright zone has been selected using Eq. (9), where we select $w_{L_w}$ as a Tukey window with cosine fraction $\alpha = 0.3$ (Ref. 29) and the reference loudspeaker is $l_r = 3$ [as indicated in Figs. 2(a) and 2(b)]. We select a Tukey window because, when applied to the target response, it does not modify the amplitude of the first early reflections of the response and it leads to a smooth transition between the reverberant and null components of the target. Other types of windows have also been investigated but gave similar results. Moreover, when $L_w < \infty$, we apply a 1/3-octave band equalizer to the target to obtain the same energy in the bright zone as for the case with $L_w = \infty$. In Fig. 4(a), we show an example of the causal part of a Tukey window with $L_w = 529$ (12 ms), and Figs. 4(b) and 4(c) show examples of a target impulse response and a frequency response in the bright zone for scenario 1 with $L_w = 529$ (12 ms) and $L_w = \infty$.
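
To make the window concrete, the sketch below builds a symmetric Tukey (tapered-cosine) window of $2L_w - 1$ samples and keeps its causal half. It is an illustration under assumptions: the hand-written `tukey` helper follows the common tapered-cosine definition with endpoints equal to zero, which may differ in its exact sampling convention from the window used for the evaluations.

```python
import numpy as np


def tukey(N, alpha):
    """Symmetric tapered-cosine (Tukey) window with N samples.

    alpha is the fraction of the window covered by the two cosine tapers
    (alpha = 0 gives a rectangular window, alpha = 1 a Hann window).
    """
    if N == 1 or alpha <= 0:
        return np.ones(N)
    alpha = min(alpha, 1.0)
    n = np.arange(N)
    w = np.ones(N)
    edge = alpha * (N - 1) / 2.0          # taper length in samples
    rise = n < edge
    fall = n > (N - 1) - edge
    w[rise] = 0.5 * (1 + np.cos(np.pi * (n[rise] / edge - 1)))
    w[fall] = 0.5 * (1 + np.cos(np.pi * ((N - 1 - n[fall]) / edge - 1)))
    return w


fs = 44100                   # sampling frequency in Hz
L_w = 529                    # window length in samples
w_full = tukey(2 * L_w - 1, alpha=0.3)
w_causal = w_full[L_w - 1:]  # samples n = 0, ..., L_w - 1
length_ms = 1e3 * L_w / fs   # about 12 ms
```

The causal half stays at 1 over roughly the first 70% of its length, so the direct sound and the first early reflections are left untouched, and then decays smoothly to zero.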

FIG. 4.

(Color online) Example of the causal part of a Tukey window with Lw = 529 (12 ms) and cosine fraction α=0.3 in (a), target impulse response and transfer function in the 0-th control point of the bright zone for scenario 1 with different window lengths in (b) and (c), respectively.


Next, we present the metrics used for the evaluations. First, we define the MSE in the bright zone as

$$\varepsilon_b = \|\bar{\mathbf{H}}_b\mathbf{g}-\bar{\mathbf{d}}_b\|^2, \tag{10}$$

where the target is selected using the same criterion used to compute the filters, meaning that the same window $w_{L_w}$ is applied to obtain the targets $\mathbf{d}_b$ and $\bar{\mathbf{d}}_b$ that are used to compute and to evaluate the filters, respectively. Next, we define the mean energy in the dark zone as

$$E_d = \|\bar{\mathbf{H}}_d\mathbf{g}\|^2, \tag{11}$$

and the array effort as

$$AE = \frac{\|\mathbf{g}\|^2}{E_{\mathrm{ref}}}, \tag{12}$$

where Eref is the energy required by the reference loudspeaker to provide the same energy in the bright zone as the array of loudspeakers using the set of filters g.
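
The metrics in Eqs. (10)–(12) translate directly into code. The sketch below is illustrative only; in particular, the way $E_{\mathrm{ref}}$ is computed (the squared gain that the reference loudspeaker alone would need to match the bright-zone energy produced by the array, evaluated on the validation grid) is one reading of the definition above and should be treated as an assumption, as are the function names and the random test data.

```python
import numpy as np


def bright_mse(H_b_val, g, d_b_val):
    """Eq. (10): squared error with respect to the target on the validation grid."""
    return np.linalg.norm(H_b_val @ g - d_b_val) ** 2


def dark_energy(H_d_val, g):
    """Eq. (11): energy in the dark zone on the validation grid."""
    return np.linalg.norm(H_d_val @ g) ** 2


def array_effort(H_b_val, g, l_ref):
    """Eq. (12): ||g||^2 / E_ref.

    Assumption: E_ref = |G_ref|^2, where G_ref is the gain applied to the
    reference loudspeaker alone so that it produces the same bright-zone
    energy as the array driven with the filters g.
    """
    e_bright = np.linalg.norm(H_b_val @ g) ** 2
    e_ref = e_bright / np.linalg.norm(H_b_val[:, l_ref]) ** 2
    return np.linalg.norm(g) ** 2 / e_ref


# Toy check with random (hypothetical) validation responses.
rng = np.random.default_rng(2)
L, M = 8, 10
H_b_val = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
H_d_val = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))

g_ref = np.zeros(L, dtype=complex)
g_ref[3] = 1.0  # drive only the reference loudspeaker
ae_db = 10 * np.log10(array_effort(H_b_val, g_ref, l_ref=3))
```

Driving only the reference loudspeaker gives an array effort of 0 dB under this reading, which is a convenient sanity check before applying an effort constraint such as $AE \leq AE_{\max}$.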

So far, we have defined metrics that are directly related to the three components of the cost function in Eq. (5), and whose influence can be adjusted with λ and κ. Now, we define the acoustic contrast (AC) as

$$C = \frac{\|\bar{\mathbf{H}}_b\mathbf{g}\|^2}{\|\bar{\mathbf{H}}_d\mathbf{g}\|^2}, \tag{13}$$

which is a metric related to the level of acoustic isolation between the bright and dark zones. The AC is not directly present in the cost function [Eq. (5)] but is commonly used as a performance indicator for PSZ systems. Finally, we define the mean Kurtosis of the measured RIRs at sample time n as

$$K(n) = \frac{1}{N_h}\sum_{m,l,q}\left(\frac{E\{(\bar{h}_{ml,q}-\mu_{ml,q}(n))^4\}}{(\sigma_{ml,q}(n))^4}-3\right), \tag{14}$$

where $\mu_{ml,q}(n)$ and $\sigma_{ml,q}(n)$ are the mean value and the standard deviation of $\bar{h}_{ml,q}$, respectively, over the interval $n, \ldots, n+L_s-1$, and the expected value for time $n$ is computed over the same interval. In Eq. (14), $N_h$ denotes the total number of RIRs for which the mean Kurtosis is computed. The Kurtosis is a measure of the tailedness of a sample distribution. For a Gaussian probability density function (PDF), the Kurtosis value is 0. The authors in Ref. 30 show that the Kurtosis is closely related to the diffuseness of late reverberations. In particular, they show that the early part of the RIR, containing the direct propagation component and strong deterministic reflections, is unlikely to have a Gaussian distribution, so it presents high Kurtosis levels. However, the late diffuse components of the RIR present Kurtosis values close to 0. The reason is that the reflection density is high for the diffuse part of the RIR, which makes it more likely that the RIR has a Gaussian distribution. For the computation of the Kurtosis, we used a segment length of $L_s = 882$ (20 ms) and all the RIRs have been aligned such that their direct propagation component is located at sample index $n = 0$. From now on, we use a third-octave averaging31 for all frequency-domain plots to improve the readability of the results.
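
The sliding-segment computation in Eq. (14) can be sketched as follows. The function name `mean_kurtosis` and the synthetic data are assumptions for illustration; the expectation is replaced by the sample mean over each segment, and the RIRs are assumed to be already aligned and to contain no all-zero segments (which would give a zero standard deviation).

```python
import numpy as np


def mean_kurtosis(rirs, L_s):
    """Mean excess kurtosis K(n) of Eq. (14).

    rirs : array of shape (N_h, N) with time-aligned RIRs
    L_s  : segment length in samples
    Returns K(n) for n = 0, ..., N - L_s.
    """
    rirs = np.atleast_2d(np.asarray(rirs, dtype=float))
    # segs[i, n, :] holds samples n, ..., n + L_s - 1 of the i-th RIR.
    segs = np.lib.stride_tricks.sliding_window_view(rirs, L_s, axis=1)
    mu = segs.mean(axis=-1, keepdims=True)
    sigma = segs.std(axis=-1)
    m4 = ((segs - mu) ** 4).mean(axis=-1)
    return (m4 / sigma ** 4 - 3.0).mean(axis=0)  # average over the N_h RIRs


# Synthetic stand-in for RIRs: Gaussian noise plus a strong "direct" impulse at n = 0.
rng = np.random.default_rng(3)
rirs = rng.standard_normal((16, 1000))
rirs[:, 0] += 30.0
K = mean_kurtosis(rirs, L_s=300)
```

In this toy case the first segment, which contains the impulse, has a large kurtosis, while all later segments contain only Gaussian noise and give values near 0. Note that the intermediate arrays grow with the number of RIRs times the number of segments times `L_s`, so long measured RIRs may need to be processed in blocks.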

In this subsection, we present experimental results to evaluate the performance of the proposed target selection for scenario 1.

First, we compare the proposed target selection with the approach employed in Refs. 18, 22, and 23 in which not only the target used to form db is windowed but also the RIRs used to form Hq. We show in Fig. 5 the performance of the proposed approach as a function of Lw in terms of: mean energy in the dark zone [in Figs. 5(a) and 5(d)], MSE in the bright zone [in Figs. 5(b) and 5(e)], and array effort [in Figs. 5(c) and 5(f)]. Similarly, the performance of the approach used in Refs. 18, 22, and 23 is shown in Fig. 6. As expected, the performance in both cases is identical for very long windows, e.g., 200 ms. We can see in Fig. 6 that windows shorter than 100 ms highly degrade the performance for the case with windowed RIRs and targets. This is because relevant reverberation components of the RIRs are not taken into account in the optimization. On the contrary, the proposed approach takes into account all the reverberant components of the RIR in the optimization but tries to minimize the energy of some of these components. The results in Fig. 5 show that short windows can offer better performance in this case. These results indicate that the performance of the system can be improved with the proposed approach by selecting short windows, while this is not possible with the approach used in Refs. 18, 22, and 23. From now on, we focus on further evaluating the performance of the proposed method.

FIG. 5.

(Color online) Performance of the proposed target selection in Sec. III as a function of the window length and frequency for scenario 1 in terms of: mean energy in the dark zone (a) and (d), MSE in the bright zone (b) and (e), and array effort (c) and (f). For the top row figures AEmax=0 dB, and AEmax=15 dB for the bottom figures.

FIG. 6.

(Color online) Performance of the approach used in Refs. 18, 22, and 23 as a function of the window length and frequency for scenario 1 in terms of: mean energy in the dark zone (a) and (d), MSE in the bright zone (b) and (e), and array effort (c) and (f). In this case, both the RIR and the target are windowed. For the top row figures AEmax=0 dB, and AEmax=15 dB for the bottom figures.


Next, we discuss the effect of the window length on the performance of the proposed target selection. In Fig. 5, the metrics in the top and bottom rows are computed with AEmax = 0 dB and AEmax = 15 dB, respectively, but the performance for both effort constraints is equal for frequencies above 800 Hz. The reason is that the matrix that must be inverted in Eq. (6) is well conditioned above 800 Hz, so even a regularization parameter of λ = 0 does not lead to a high array effort. For frequencies below 800 Hz, the performance is different for the two effort constraints; however, the effect of the window length on the performance is similar in both cases. Hence, the following analysis is valid for the two studied constraints. We can see in Figs. 5(b) and 5(e) that very short windows (of 1 ms or less) lead to an MSE that is at least 10 dB worse than for longer windows, while the energy generated in the dark zone is similar [in Figs. 5(a) and 5(d)]. This degradation appears because very short windows aim to suppress the first early reflections and also to equalize the response of the reference loudspeaker in the bright zone, which is too restrictive and leads to large errors in the bright zone. Also, we can see that, in general, short windows (of about 12 ms) present a significantly lower energy in the dark zone than longer windows. The reason is that these windows remove the diffuse components from the target in the bright zone, so the designed filters are capable of minimizing the diffuse field both in the bright and dark zones. Hence, the improvements obtained by windowing the target do not arise because we force the cancellation of some high-energy early reflections, but because we target the cancellation of the diffuse reverberant tail. In particular, for frequencies in the ranges 150–200 Hz, 400–500 Hz, and above 700 Hz, the improvements in the energy in the dark zone are not at the cost of substantially higher MSE. For frequencies in the ranges 200–400 Hz and 500–700 Hz, short windows lead to lower energy in the dark zone at the cost of worse MSE. For these frequency bands, lower energy in the dark zone may be achieved for long windows by tuning the weighting parameter κ; therefore, at this point, it is not yet clear whether real improvements in the performance are achieved with short windows across the whole studied frequency range.

Now, we present additional results where we compare the following configurations:

  • Win. Target (WT): Lw = 529 (12 ms) and κ=0.5.

  • Full Target (FT): Lw = ∞ and κ=0.5.

  • Full Target with Frequency Dependent κ (FT-FD): Lw = ∞ and κ selected as shown in Fig. 7 to achieve the same MSE as WT.
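
The FT-FD configuration requires, at each frequency, a value of κ for which the full target yields the same MSE as WT. The text does not state how this value is found; one simple possibility, shown here purely as an assumption, is to scan a grid of κ values and keep the one whose MSE is closest (in dB) to the desired level. The helper names, the fixed λ, the random transfer functions, and the example MSE level are all hypothetical, and the MSE is evaluated on the control points for brevity.

```python
import numpy as np


def wpm_filters(H_b, H_d, d_b, kappa, lam):
    """Eq. (6) for one frequency bin."""
    M_b, L = H_b.shape
    M_d = H_d.shape[0]
    A = (kappa / M_d) * (H_d.conj().T @ H_d)
    A = A + ((1.0 - kappa) / M_b) * (H_b.conj().T @ H_b) + lam * np.eye(L)
    return np.linalg.solve(A, ((1.0 - kappa) / M_b) * (H_b.conj().T @ d_b))


def bright_mse(H_b, g, d_b):
    """Squared error with respect to the target in the bright zone."""
    return np.linalg.norm(H_b @ g - d_b) ** 2


def match_kappa(H_b, H_d, d_full, mse_target, lam, grid):
    """Return the kappa on `grid` whose full-target MSE is closest to mse_target (in dB)."""
    mse = np.array([
        bright_mse(H_b, wpm_filters(H_b, H_d, d_full, k, lam), d_full)
        for k in grid
    ])
    i = int(np.argmin(np.abs(10 * np.log10(mse / mse_target))))
    return grid[i], mse[i]


# Toy data (hypothetical): random transfer functions for one frequency bin.
rng = np.random.default_rng(0)
L, M = 8, 12
H_b = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
H_d = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
d_full = H_b[:, 3]

grid = np.linspace(0.001, 0.999, 999)
mse_wt = 0.1 * np.linalg.norm(d_full) ** 2  # stand-in for the MSE achieved by WT
kappa_fd, mse_fd = match_kappa(H_b, H_d, d_full, mse_wt, lam=1e-3, grid=grid)
```

If the desired MSE lies outside the range reachable with κ between 0 and 1, the scan simply returns the nearest achievable value, so the match should be checked before comparing dark-zone energies.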

FIG. 7.

(Color online) Weighting factor κ for FT-FD in scenarios 1 and 2.


It is important to note that, in order to fairly determine which window length is able to provide lower energy in the dark zone, we must compare their performance for the case in which their MSE is equal. This means that reductions in the energy in the dark zone for one of the configurations compared to another are then not at the cost of higher MSE in the bright zone. This is the motivation for including FT-FD in the comparison. The performance of the three configurations is shown in Fig. 8 in terms of: mean energy in the dark zone [in Fig. 8(a)], MSE in the bright zone [in Fig. 8(b)], array effort [in Fig. 8(c)], and acoustic contrast [in Fig. 8(d)]. We only present results for AEmax = 15 dB, as we can see in Fig. 5 that the effect of the window length on the performance is similar for different effort constraints. We see that WT can offer lower energy in the dark zone than FT and also lower or equal MSE in certain frequency bands, e.g., 150–200 Hz, 400–500 Hz, 700–900 Hz, and 1.1–1.5 kHz. This means that a longer window cannot offer better performance than Lw = 529 for these frequencies, neither in terms of energy in the dark zone nor in terms of MSE (even if we tune κ). Next, if we compare WT and FT-FD, we can see that WT leads to lower energy in the dark zone for all the studied frequencies except for the band 200–360 Hz, where FT-FD presents slightly lower energy. For example, WT leads to almost 6 dB lower energy in the dark zone than FT-FD around 1 kHz, and 5 dB lower energy around 180 Hz. Obtaining lower energy in the dark zone does not necessarily mean that the acoustic contrast is higher, since the energy reduction could arise because the energy in the bright zone is also lower. This aspect is especially important for the comparison of targets windowed with different window lengths, as these targets present different energy levels for the bright zone at individual frequencies (although their mean energy is equal across 1/3-octave bands). Nonetheless, the results in Fig. 8(d) indicate that, in general, the acoustic contrast follows the inverse trend to that of the energy in the dark zone, since for the same MSE level, WT can offer higher contrast than FT-FD across almost all of the studied frequencies (with improvements of more than 5 dB). Moreover, the array effort required by WT is lower than for FT and FT-FD for frequencies above 700 Hz. The MSE is almost the same for WT and FT-FD, as expected, and is broadly similar for FT, although slightly higher or lower for different frequencies. From these results, we can conclude that, for scenario 1, windowing the target response with a short window of length Lw = 529 (12 ms) leads to significantly better performance than the case without windowing for most of the studied frequencies. This indicates that targeting the minimization of the energy of the diffuse reverberant components in the bright and dark zones can lead to large improvements in the performance for this scenario.

FIG. 8.

(Color online) Performance for three different configurations as a function of frequency for scenario 1 in terms of: (a) mean energy in the dark zone, (b) MSE in the bright zone, (c) array effort, and (d) acoustic contrast. AEmax = 15 dB is considered.


Finally, Fig. 9 shows the mean Kurtosis of the RIRs for scenario 1. We can observe that the Kurtosis has high values for the early part of the RIR, where the direct component and the early reflections are located, and drops to values below 3 after about 20 ms, indicating that the reverberant components of the RIRs can be assumed diffuse, with a Gaussian PDF, after this time.30 It is for window lengths greater than about 20 ms that the performance starts to deteriorate in Fig. 5. Then, the Kurtosis provides a useful metric for the selection of the window length.
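
As a rough illustration of how the Kurtosis curve could guide the choice of window length, the sketch below returns the first sample index after which K(n) stays below a threshold. The rule itself, the function name, the default threshold of 3 (taken from the observation above), and the synthetic decay used as input are all assumptions; the text only observes the link between the Kurtosis and the useful window lengths.

```python
import numpy as np


def window_length_from_kurtosis(K, fs, threshold=3.0):
    """First index after which K(n) stays below `threshold`, in samples and ms.

    If K never drops below the threshold by the end of the curve, the returned
    index equals len(K), which signals that no suitable length was found.
    """
    K = np.asarray(K, dtype=float)
    above = np.flatnonzero(K >= threshold)
    n0 = 0 if above.size == 0 else int(above[-1]) + 1
    return n0, 1e3 * n0 / fs


# Synthetic, smoothly decaying curve standing in for a measured K(n).
fs = 44100
n = np.arange(4000)
K_toy = 40.0 * np.exp(-n / 300.0)
L_w, L_w_ms = window_length_from_kurtosis(K_toy, fs)
```

For measured curves, which fluctuate from segment to segment, some smoothing of K(n) before thresholding would likely be needed, and the result should be treated as a starting point rather than an optimal window length.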

FIG. 9.

(Color online) Mean Kurtosis for scenarios 1 and 2.


Next, we study the performance of the proposed target selection for scenario 2, with the lower reverberation time. Figure 10 shows the performance as a function of Lw in terms of: mean energy in the dark zone [in Figs. 10(a) and 10(d)], MSE in the bright zone [in Figs. 10(b) and 10(e)], and array effort [in Figs. 10(c) and 10(f)]. In Fig. 10, the metrics in the top row are again computed with AEmax = 0 dB, and AEmax = 15 dB for the bottom row. As for scenario 1, the performance is similar for both constraints at frequencies above 800 Hz, as are the effects of the window length for frequencies below 800 Hz in both cases. We can see in Figs. 10(b) and 10(e) that very short windows (of 1 ms or less) lead to a much higher MSE than longer windows, while the energy generated in the dark zone is similar [in Figs. 10(a) and 10(d)]. Similarly to scenario 1, this is produced because very short windows aim to suppress the first early reflections and to equalize the response of the reference loudspeaker in the bright zone, which leads to high MSE. Then, selecting a window that is too short can significantly degrade the performance. The mean energy in the dark zone is lower for shorter windows (of about 8 ms) than longer ones, particularly for frequencies above 1 kHz. However, we can also see in Figs. 10(b) and 10(e) that, in general, the MSE is higher for short windows. In this case, the lower energy generated in the dark zone when using short windows comes at the cost of higher MSE in the bright zone.

FIG. 10.

(Color online) Performance of the proposed target selection in Sec. III as a function of the window length and frequency for scenario 2 in terms of: mean energy in the dark zone (a) and (d), MSE in the bright zone (b) and (e), and array effort (c) and (f). For the top row figures, AEmax = 0 dB, and AEmax = 15 dB for the bottom figures.

To further study whether windowing the target offers performance improvements in this scenario, we present additional results in Fig. 11, where we compare three configurations that are similar to those used above:

FIG. 11.

(Color online) Performance for three different configurations as a function of frequency for scenario 2 in terms of: (a) mean energy in the dark zone, (b) MSE in the bright zone, (c) array effort, and (d) acoustic contrast. AEmax = 15 dB is considered.

  • Win. Target (WT): Lw = 353 (8 ms) and κ=0.5.

  • Full Target (FT): Lw= and κ=0.5.

  • Full Target with Frequency Dependent κ (FT-FD): Lw= and κ selected as shown in Fig. 7 to achieve the same MSE as WT.
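To make the relation between the window length in samples and in milliseconds concrete, the sketch below truncates a set of bright-zone RIRs to a given duration and transforms the result to the frequency domain. It is only an illustration under stated assumptions: the sampling rate of 44.1 kHz is inferred from the pairing of 353 samples with 8 ms and is not stated in this section, the window shape (rectangular with a short half-Hann fade-out) is a placeholder for the window defined in Sec. III, and the names `window_target` and `fade_ms` are hypothetical.

```python
import numpy as np

def window_target(rirs, fs, win_ms, fade_ms=1.0):
    """Truncate RIRs (n_points, n_samples) to win_ms.

    Assumed window: ones up to the window length with a falling half-Hann
    taper over the last fade_ms, zeros afterwards.
    Returns the windowed RIRs and the window length in samples.
    """
    rirs = np.atleast_2d(np.asarray(rirs, dtype=float))
    n_win = min(int(round(win_ms * 1e-3 * fs)), rirs.shape[1])
    n_fade = min(int(round(fade_ms * 1e-3 * fs)), n_win)
    w = np.zeros(rirs.shape[1])
    w[:n_win] = 1.0
    if n_fade > 0:
        # Falling half of a Hann window applied at the end of the window.
        w[n_win - n_fade:n_win] = 0.5 * (1.0 + np.cos(np.pi * np.arange(n_fade) / n_fade))
    return rirs * w, n_win

if __name__ == "__main__":
    fs = 44100                                  # assumed sampling rate
    rng = np.random.default_rng(0)
    rirs = rng.standard_normal((4, 4096))       # stand-in for measured RIRs
    target, n_win = window_target(rirs, fs, win_ms=8.0)
    # Frequency-domain targets, one row per bright-zone control point.
    D = np.fft.rfft(target, axis=1)
    print(n_win, D.shape)
```

With these assumptions, the full-target configurations correspond to leaving the measured RIRs unwindowed, while WT uses the truncated responses as the target.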

We again include FT-FD to determine fairly which window length provides lower energy in the dark zone for the same MSE. The performance of these configurations is shown in Fig. 11 in terms of: mean energy in the dark zone [in Fig. 11(a)], MSE in the bright zone [in Fig. 11(b)], array effort [in Fig. 11(c)], and acoustic contrast [in Fig. 11(d)]. As for scenario 1, we only present results for AEmax = 15 dB to avoid redundancy. Above about 700 Hz, the energy in the dark zone is lower for WT than for FT, and above 1 kHz it is also lower than for FT-FD. In this case, however, FT has a significantly lower MSE than WT or FT-FD. From these results, we can conclude that, in this scenario, windowing the target response for the bright zone with a short window of 8 ms can lead to better acoustic contrast than the case without windowing for frequencies above 1 kHz. These results and the results in Sec. IV C show that the optimal window length is frequency and scenario dependent. Comparing the results for the two scenarios, we can see that the higher the reverberation in the room, the greater the improvement obtained by windowing the target response in the bright zone.

Figure 9 also shows the mean Kurtosis of the RIRs for scenario 2, which is high for times shorter than about 10 ms and drops below 3 thereafter. This indicates that the reverberant components of the RIRs are diffuse after about 10 ms, which is the window length beyond which the performance starts to degrade in Fig. 10. This again illustrates the use of the Kurtosis in estimating the window length, although in this case, with a short reverberation time, only at higher frequencies.

We now evaluate whether the improvements obtained by windowing the target are robust to perturbations in the environment. To this end, we present in Fig. 12 evaluation results for scenario 1, where the filters are computed with the RIRs measured at the control points without any perturbation [as in Fig. 13(a)] and evaluated using the RIRs measured at the control points when perturbations due to two people located within the zones are present [as in Fig. 13(b)]. The control filters are thus those calculated in Sec. IV C, and only the RIRs used to compute the performance in Fig. 12 differ. For comparison purposes, we also include in Fig. 12 the performance of the filters when evaluated without perturbations (as in Sec. IV C). We can see that the perturbations have generally increased the energy in the dark zone and the MSE in the bright zone with respect to the case without perturbations. The mean energy in the dark zone is still significantly smaller for the windowed target in WT than it is without the window, and the MSE is again broadly similar in the two cases. We can thus conclude that the performance improvements obtained by selecting a short window for scenario 1 are robust to perturbations in the environment.
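The evaluation procedure described above (design with nominal responses, evaluate with perturbed ones) can be sketched at a single frequency as follows. This is a minimal sketch under several assumptions that are not taken from the paper: the cost is assumed to be a κ-weighted sum of bright-zone error and dark-zone energy with a fixed Tikhonov term `beta` standing in for the array effort constraint, the transfer matrices are random rather than measured, and the effect of the two people is modelled as small additive random perturbations. All function names are hypothetical.

```python
import numpy as np

def wpm_filter(Hb, Hd, d, kappa=0.5, beta=1e-6):
    """Closed-form minimiser of the assumed single-frequency cost
    kappa*||Hb q - d||^2 + (1 - kappa)*||Hd q||^2 + beta*||q||^2."""
    n_src = Hb.shape[1]
    A = (kappa * Hb.conj().T @ Hb
         + (1.0 - kappa) * Hd.conj().T @ Hd
         + beta * np.eye(n_src))
    return np.linalg.solve(A, kappa * Hb.conj().T @ d)

def evaluate(Hb, Hd, d, q):
    """Dark-zone energy, bright-zone MSE and acoustic contrast in dB."""
    pb, pd = Hb @ q, Hd @ q
    e_dark = np.mean(np.abs(pd) ** 2)
    e_bright = np.mean(np.abs(pb) ** 2)
    mse = np.mean(np.abs(pb - d) ** 2)
    return {"dark_db": 10 * np.log10(e_dark),
            "mse_db": 10 * np.log10(mse),
            "contrast_db": 10 * np.log10(e_bright / e_dark)}

def complex_noise(rng, shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Hb = complex_noise(rng, (4, 8))          # bright zone: 4 points, 8 sources
    Hd = complex_noise(rng, (4, 8))          # dark zone: 4 points, 8 sources
    d = Hb[:, 0].copy()                      # target: response of source 0
    q = wpm_filter(Hb, Hd, d)                # designed without perturbations
    Hb_p = Hb + 0.05 * complex_noise(rng, Hb.shape)   # perturbed responses
    Hd_p = Hd + 0.05 * complex_noise(rng, Hd.shape)
    print("nominal  ", evaluate(Hb, Hd, d, q))
    print("perturbed", evaluate(Hb_p, Hd_p, d, q))
```

The key point of the design is that `q` is never recomputed: the same filters are scored against both sets of responses, which mirrors the comparison shown in Fig. 12.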

FIG. 12.

(Color online) Performance for WT and FT as a function of frequency for scenario 1 with perturbations in terms of: (a) mean energy in the dark zone, (b) MSE in the bright zone, (c) array effort, and (d) acoustic contrast. AEmax = 15 dB is considered. The performance of the filters evaluated without perturbations is also shown for comparison purposes.

FIG. 13.

(Color online) Setup used to measure the RIRs without perturbations in (a), and to measure the RIRs with the perturbations produced by two persons within the zones in (b).

In this paper, we proposed a novel approach to select the target response in the bright zone for the weighted pressure matching (WPM) algorithm in personal sound zones systems. In previous works, the target for the bright zone has generally been selected to be the room impulse response (RIR) from one loudspeaker to all the control points in the bright zone. The aim is thus to synthesize the direct propagation component and all the reverberant components in the bright zone, while minimizing the energy of all components in the dark zone. The late reverberant components, however, are diffuse above the Schroeder frequency, and it is shown that there is no set of filters that can lead to high energy differences between the diffuse reverberant components in the bright and dark zones. Trying to synthesize all of the reverberant components in the bright zone while minimizing their energy in the dark zone is therefore not a good strategy. Instead, we proposed to window the RIRs from one loudspeaker to all the control points in the bright zone and to use these responses as the target for the bright zone. This approach allows us to select which reverberant components we want to synthesize in the bright zone by choosing the window length. Experimental evaluation results in two rooms with different levels of reverberation show the effect of windowing the target response. The results show that windowing the target response can lead to lower energy in the dark zone and higher acoustic contrast than the case without windowing, with similar array effort and, in the case of the more reverberant room, similar mean square error (MSE) in the bright zone. Specifically, improvements of up to 6 dB in the acoustic contrast are observed for a room with T60 = 500 ms when a window length of 12 ms is used in the target impulse responses. The window length that offers the best performance is, in general, both frequency and scenario dependent. The Kurtosis of the RIRs, which is related to their level of diffuseness, is also shown to give a good indication of the best window length. We also observed that greater improvements with respect to the case without windowing are obtained at mid-to-high frequencies. The improvements obtained by windowing the target are greater for scenarios with a high level of reverberation and are also robust to perturbations in the room impulse responses.

Vicent Molés-Cases was supported by the Spanish Ministry of Education through Grant No. FPU17/01288. Stephen J. Elliott was supported by the EPSRC DigiTwin project (EP/R006768/1). Jordan Cheer was supported by the Intelligent Structures for Low Noise Environments (ISLNE) EPSRC Prosperity Partnership (EP/S03661X/1). This research was supported by the Spanish Ministry of Science, Innovation and Universities through Grant No. RTI2018-098085-B-C41 (MCIU/AEI/FEDER, UE). Open access was funded by Universitat Politècnica de València through Grant No. PAID-12-21.

1

Throughout this paper, matrices and vectors are represented by upper- and lowercase boldface letters, respectively, (·)^T stands for transpose, (·)^H stands for conjugate transpose, ‖·‖ for the vector 2-norm, E{·} for expected value, F{·} denotes the discrete time Fourier transform (DTFT), and I denotes the identity matrix.

1. W. F. Druyvesteyn and R. M. Aarts, “Personal sound,” J. Acoust. Soc. Am. 96, 3281 (1994).
2. T. Betlehem, W. Zhang, M. A. Poletti, and T. Abhayapala, “Personal sound zones: Delivering interface-free audio to multiple listeners,” IEEE Signal Process. Mag. 32(2), 81–91 (2015).
3. B. Van Veen and K. Buckley, “Beamforming: A versatile approach to spatial filtering,” IEEE ASSP Mag. 5(2), 4–24 (1988).
4. E. Mabande and W. Kellermann, “Towards superdirective beamforming with loudspeaker arrays,” in Proceedings of the 19th International Congress on Acoustics, Madrid, Spain, September 2007 (2007).
5. T. Betlehem and P. D. Teal, “A constrained optimization approach for multi-zone surround sound,” in Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011 (2011), pp. 437–440.
6. Y. J. Wu and T. Abhayapala, “Spatial multizone soundfield reproduction: Theory and design,” IEEE Trans. Audio Speech Lang. Process. 19(6), 1711–1720 (2011).
7. P. Coleman, P. J. B. Jackson, M. Olik, and J. A. Pedersen, “Personal audio with a planar bright zone,” J. Acoust. Soc. Am. 136, 1725–1735 (2014).
8. M. Shin, S. Q. Lee, F. M. Fazi, P. A. Nelson, D. Kim, S. Wang, K. H. Park, and J. Seo, “Maximization of acoustic energy difference between two spaces,” J. Acoust. Soc. Am. 128, 121–131 (2010).
9. Y. Cai, M. Wu, L. Liu, and J. Yang, “Time-domain acoustic contrast control design with response differential constraint in personal audio systems,” J. Acoust. Soc. Am. 135, 252–257 (2014).
10. F. Olivieri, F. M. Fazi, S. Fontana, D. Menzies, and P. A. Nelson, “Generation of private sound with a circular loudspeaker array and the weighted pressure matching method,” IEEE/ACM Trans. Audio Speech Lang. Process. 25(8), 1579–1591 (2017).
11. A. Canclini, D. Marković, M. Schneider, F. Antonacci, E. A. P. Habets, A. Walther, and A. Sarti, “A weighted least squares beam shaping technique for soundfield control,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April 2018 (2018), pp. 6812–6816.
12. J.-W. Choi and Y.-H. Kim, “Generation of an acoustically bright zone with an illuminated region using multiple sources,” J. Acoust. Soc. Am. 111(4), 1695–1700 (2002).
13. J.-H. Chang and F. Jacobsen, “Sound field control with a circular double-layer array of loudspeakers,” J. Acoust. Soc. Am. 131, 4518–4525 (2012).
14. M. F. Simón Gálvez, S. J. Elliott, and J. Cheer, “Time domain optimization of filters used in a loudspeaker array for personal audio,” IEEE/ACM Trans. Audio Speech Lang. Process. 23(11), 1869–1878 (2015).
15. V. Molés-Cases, G. Piñero, M. De Diego, and A. Gonzalez, “Personal sound zones by subband filtering and time domain optimization,” IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2684–2696 (2020).
16. J. Cheer and S. Elliott, “Design and implementation of a personal audio system in a car cabin,” in Proc. of the Meetings on Acoustics (2013).
17. M. Møller and M. Olsen, “Sound zones: On performance prediction of contrast control methods,” in Proc. of the AES International Conference on Sound Field Control (2016).
18. M. Olik, J. Francombe, P. Coleman, P. J. Jackson, M. Olsen, M. Møller, R. Mason, and S. Bech, “A comparative performance study of sound zoning methods in a reflective environment,” in Proc. of the AES International Conference (2013).
19. M. Schneider and E. A. P. Habets, “Iterative DFT-domain inverse filter optimization using a weighted least-squares criterion,” IEEE/ACM Trans. Audio Speech Lang. Process. 27(12), 1957–1969 (2019).
20. L. Vindrola, M. Melon, J.-C. Chamard, B. Gazengel, and G. Plantier, “Pressure matching with forced filters for personal sound zones application,” in Proc. of the AES 147th Convention (2019).
21. L. Vindrola, M. Melon, J.-C. Chamard, and B. Gazengel, “Pressure matching with forced filters for personal sound zones application,” J. Audio Eng. Soc. 68(11), 832–842 (2020).
22. Q. Zhu, P. Coleman, M. Wu, and J. Yang, “Robust acoustic contrast control with reduced in-situ measurement by acoustic modeling,” J. Audio Eng. Soc. 65(6), 460–473 (2017).
23. M. Ebri, N. Strozzi, F. M. Fazi, A. Farina, and L. Cattani, “Individual listening zone with frequency-dependent trim of measured impulse responses,” in Proc. of the 149th AES Convention (2020).
24. H. Kuttruff, Room Acoustics, 5th ed. (Spon Press, London, 2009), p. 52.
25. S. J. Elliott, J. Cheer, J.-W. Choi, and Y. Kim, “Robustness and regularization of personal audio systems,” IEEE/ACM Trans. Audio Speech Lang. Process. 20(7), 2123–2133 (2012).
26. M. F. Simón-Gálvez, S. J. Elliott, and J. Cheer, “The effect of reverberation on personal audio devices,” J. Acoust. Soc. Am. 135, 2654–2663 (2014).
27. S. Spors and R. Rabenstein, “Spatial aliasing artifacts produced by linear and circular loudspeaker arrays used for wave field synthesis,” in Proc. of the 120th AES Convention, Paris, France, May 2006 (2006).
28. A. Farina, “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in Proc. of the 108th AES Convention, Paris, France, February 2000 (2000).
29. P. Bloomfield, Fourier Analysis of Time Series: An Introduction, 2nd ed. (Wiley, New York, 1976), p. 69.
30. C.-H. Jeong, “Kurtosis of room impulse responses as a diffuseness measure for reverberation chambers,” J. Acoust. Soc. Am. 139(5), 2833–2841 (2016).
31. P. D. Hatziantoniou and J. N. Mourjopoulos, “Generalized fractional-octave smoothing of audio and acoustic responses,” J. Audio Eng. Soc. 48(4), 259–280 (2000).