Isolation performance metrics for personal sound zone reproduction systems

Two isolation performance metrics, Inter-Zone Isolation (IZI) and Inter-Program Isolation (IPI), are introduced for evaluating Personal Sound Zone (PSZ) systems. Compared to the commonly-used Acoustic Contrast metric, IZI and IPI are generalized for multichannel audio, and quantify the isolation of sound zones and of audio programs, respectively. The two metrics are shown to be generally non-interchangeable and suitable for different scenarios, such as generating dark zones (IZI) or minimizing audio-on-audio interference (IPI). Furthermore, two examples with free-field simulations are presented and demonstrate the applications of IZI and IPI in evaluating PSZ performance in different rendering modes and PSZ robustness.

The following article has been accepted by JASA Express Letters. After it is published, it will be found here.

Introduction
Personal Sound Zone (PSZ) 1 reproduction aims to deliver, using loudspeakers, individual audio programs to listeners in the same space with minimum interference between programs.
In PSZ reproduction, given a specific audio program, a bright zone (BZ) refers to the area where the program is rendered for the listener, while a dark zone (DZ) refers to the area where the program is attenuated. For a specific listener, the audio programs are categorized as either the target program, which is rendered for the intended listener with the best possible audio quality, or the interfering program, which is delivered to a different listener but may interfere with the target program for the intended listener.
When evaluating the performance of PSZ systems, a commonly adopted metric is the so-called Acoustic Contrast (AC), defined by Choi and Kim 16 as the ratio of the acoustic energy in BZ to that in DZ. Although the AC metric gives a measure of the separation between two sound zones, there are several limitations associated with its definition. First, AC is calculated using the sound pressure resulting from a single-channel input that corresponds to rendering mono audio programs. As will be shown, it does not reflect the isolation performance of other rendering modes, where multichannel programs with different inter-channel correlations (e.g., stereo and binaural programs 17) are specified as input. In addition, while AC represents the isolation between sound zones, it does not give a measure of the level difference between programs in the same zone, which is more relevant to the listener's perception of audio-on-audio interference 10,11,18 (excluding psychoacoustic masking effects).
In this paper, we introduce separate PSZ performance metrics for sound zone isolation and audio program isolation, which are independent of both the number of channels in a program and the correlation between channels. With these metrics, it is possible to evaluate the isolation performance given arbitrary program specifications from two complementary perspectives (sound zone isolation and audio program isolation). We also present examples that utilize the defined metrics to illustrate 1) the effects of different rendering modes (e.g., mono/stereo programs or binaural programs with built-in crosstalk cancellation) on the isolation performance, and 2) the effective physical boundaries of a PSZ within which the isolation performance is preserved.
The rest of the paper is organized as follows: a mathematical definition of the PSZ system is presented in Sec. 2; the new metrics are defined and discussed in Sec. 3; free-field simulations are used to illustrate two example use cases of the metrics in Sec. 4, followed by the conclusion and further discussion on the metrics and their applications in Sec. 5.

Problem Definition
Without loss of generality, we consider a PSZ system (illustrated in Fig. 1) with $L$ loudspeakers and two sound zones, $Z_A$ and $Z_B$. A total of $K$ control points are defined in the two zones, with the subsets in each zone denoted as $\{K_A, K_B\}$, respectively. A total of $I$ channels of signals represent the inputs to the system, and are divided into two subsets corresponding to the two audio programs for the two zones, denoted as $\{I_A, I_B\}$. We use the term channel to refer to a particular input audio signal and the term program to refer to the unified audio content, which may contain one or more channels, as in mono or stereo/binaural programs.
All subsequent quantities are implicitly dependent on frequency $\omega$; plain font is used for scalars and bold font for vectors (in lowercase) and matrices (in uppercase). Given the complex loudspeaker gains $\mathbf{g} \in \mathbb{C}^{L \times 1}$, the sound pressure vector $\mathbf{p} \in \mathbb{C}^{K \times 1}$ at the control points is determined by

$$\mathbf{p} = \mathbf{H}\mathbf{g}, \tag{1}$$

where $\mathbf{H} = \{H_{kl}\} \in \mathbb{C}^{K \times L}$ is the acoustic transfer function matrix between the loudspeakers and the control points. Furthermore, the loudspeaker gains for a channel $s_i$ of an audio program are given by

$$\mathbf{g} = \mathbf{c}_i s_i, \tag{2}$$

where $\mathbf{c}_i \in \mathbb{C}^{L \times 1}$ corresponds to the PSZ filters designed for the channel $s_i$. Combining Eq. 1 and 2 and summing over all channels, the input channels and the control point pressure are related by

$$\mathbf{p} = \mathbf{H}\mathbf{C}\mathbf{s}, \tag{3}$$

where $\mathbf{s} \in \mathbb{C}^{I \times 1}$ denotes a vector of input channels and $\mathbf{C} = \{C_{li}\} \in \mathbb{C}^{L \times I}$ denotes the matrix containing the PSZ filters for each channel. We further introduce the system transfer function matrix, $\mathbf{M}$, as the product of the transfer function and filter matrices,

$$\mathbf{M} = \mathbf{H}\mathbf{C}, \tag{4}$$

where $\mathbf{M} = \{M_{ki}\} \in \mathbb{C}^{K \times I}$ relates the input channels to the pressure at the control points and will be mainly used in the subsequent metric definitions. We also note that the size of $\mathbf{M}$ is independent of the number of loudspeakers.
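The dimension bookkeeping of the quantities above can be checked with a short NumPy sketch. The sizes and the random complex data are placeholders chosen for illustration only (they are not from the paper), and the helper `crandn` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, I = 4, 8, 2  # control points, loudspeakers, input channels (illustrative sizes)

def crandn(*shape):
    """Random complex array standing in for measured or simulated data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = crandn(K, L)  # acoustic transfer functions, loudspeakers -> control points
C = crandn(L, I)  # PSZ filters, one column per input channel
s = crandn(I)     # input channel spectra at one frequency

g = C @ s         # loudspeaker gains, summed over all channels
p = H @ g         # pressure at the control points
M = H @ C         # system transfer function matrix, channels -> control points

print(M.shape)                # (4, 2): K x I, independent of L
print(np.allclose(p, M @ s))  # True: p = H C s = M s
```

Because $\mathbf{M}$ has one row per control point and one column per input channel, metrics defined on it do not need to know how many loudspeakers produced the field.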

Metric Definition
Two aspects of the isolation performance of PSZ systems need to be considered: the isolation between two zones given a specific program, and the isolation between target and interfering programs for a specific zone. Therefore, we define two separate isolation metrics, which are complementary for evaluating the overall system performance.

Inter-Zone Isolation (IZI)
To evaluate the isolation performance between the previously defined zones $Z_A$ and $Z_B$, we define the Inter-Zone Isolation (IZI) metric as the ratio of the averaged acoustic power spectra in the two zones. Since the input channels in an audio program are not necessarily correlated (for example, in a stereo program, both correlated and uncorrelated components generally exist), two terms are considered, which represent two extreme cases where all channels are correlated or none of them are correlated. IZI is then determined by the minimum of the two terms.
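The following equations show the two terms and the final IZI when $Z_A$ is considered as BZ:

$$\mathrm{IZI}_A^{\mathrm{cor}} = \frac{\dfrac{1}{|K_A|}\sum_{k \in K_A}\left|\sum_{i \in I_A} M_{ki}\right|^2}{\dfrac{1}{|K_B|}\sum_{k \in K_B}\left|\sum_{i \in I_A} M_{ki}\right|^2}, \qquad \mathrm{IZI}_A^{\mathrm{unc}} = \frac{\dfrac{1}{|K_A|}\sum_{k \in K_A}\sum_{i \in I_A}\left|M_{ki}\right|^2}{\dfrac{1}{|K_B|}\sum_{k \in K_B}\sum_{i \in I_A}\left|M_{ki}\right|^2},$$

$$\mathrm{IZI}_A = \min\left(\mathrm{IZI}_A^{\mathrm{cor}},\ \mathrm{IZI}_A^{\mathrm{unc}}\right),$$

where $|K_A|$ and $|K_B|$ denote the number of control points in each set. The definitions imply an even spacing of the defined control points, as is commonly adopted in the PSZ literature.
In the case of $Z_B$ as BZ, the sets $K_A$ and $K_B$ are interchanged (with the channel sums taken over the program rendered in $Z_B$), and

$$\mathrm{IZI}_B = \min\left(\mathrm{IZI}_B^{\mathrm{cor}},\ \mathrm{IZI}_B^{\mathrm{unc}}\right).$$

It is worth noting that in the simplest case where only single-channel programs are considered, IZI is equivalent to AC. Assuming the input signal to be the Dirac delta, which in the frequency domain is represented as a constant, the loudspeaker gains are given by

$$\mathbf{g} = \mathbf{c},$$

and as there is no distinction between the correlated and uncorrelated cases, IZI can be expressed as

$$\mathrm{IZI}_A = \frac{\dfrac{1}{|K_A|}\sum_{k \in K_A}\left|M_{k}\right|^2}{\dfrac{1}{|K_B|}\sum_{k \in K_B}\left|M_{k}\right|^2} = \frac{|K_B|\,\lVert \mathbf{H}_A \mathbf{c} \rVert^2}{|K_A|\,\lVert \mathbf{H}_B \mathbf{c} \rVert^2},$$

where $M_k$ is the $k$-th entry of the single-column matrix $\mathbf{M} = \mathbf{H}\mathbf{c}$, and $\mathbf{H}_A$ and $\mathbf{H}_B$ are the sub-matrices of $\mathbf{H}$ corresponding to the two zones. The latter form is equivalent to the AC definition 16.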
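A minimal NumPy sketch of the IZI computation described above follows. It assumes that the correlated term sums the channels of the program coherently at each control point before taking the power, that the uncorrelated term sums the per-channel powers, and that both are averaged over the control points of each zone. The function name `izi`, its arguments, and the random test data are illustrative and not from the paper.

```python
import numpy as np

def izi(M, bright_pts, dark_pts, prog_ch):
    """Inter-Zone Isolation (linear power ratio) for one program.

    M          : (K, I) system transfer function matrix at one frequency
    bright_pts : row indices of the control points in the bright zone
    dark_pts   : row indices of the control points in the dark zone
    prog_ch    : column indices of the channels that make up the program
    """
    Mb = M[np.ix_(bright_pts, prog_ch)]
    Md = M[np.ix_(dark_pts, prog_ch)]
    # Fully correlated channels: add complex pressures, then take the power.
    cor = np.mean(np.abs(Mb.sum(axis=1)) ** 2) / np.mean(np.abs(Md.sum(axis=1)) ** 2)
    # Fully uncorrelated channels: add the per-channel powers.
    unc = np.mean(np.sum(np.abs(Mb) ** 2, axis=1)) / np.mean(np.sum(np.abs(Md) ** 2, axis=1))
    return min(cor, unc)

# Single-channel check against the Acoustic Contrast form.
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
c = rng.standard_normal((8, 1)) + 1j * rng.standard_normal((8, 1))
KA, KB = [0, 1], [2, 3]

M = H @ c  # K x 1 system transfer function for a single channel
ac = (len(KB) * np.linalg.norm(H[KA] @ c) ** 2) / (len(KA) * np.linalg.norm(H[KB] @ c) ** 2)
izi_single = izi(M, KA, KB, [0])
print(np.isclose(izi_single, ac))  # True
```

The value is returned as a linear power ratio; `10 * np.log10(...)` converts it to decibels.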

Inter-Program Isolation (IPI)
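We define the Inter-Program Isolation (IPI) metric as the ratio of the two averaged acoustic power spectra, in the same zone, corresponding to the two different audio programs. IPI therefore quantifies the isolation of the target program from the other program in a particular zone. As in the case of IZI, we compute both the correlated and uncorrelated components of IPI, and adopt their minimum. Referring to the previously defined system, the IPI for $Z_A$ is expressed as

$$\mathrm{IPI}_A^{\mathrm{cor}} = \frac{\dfrac{1}{|I_A|}\sum_{k \in K_A}\left|\sum_{i \in I_A} M_{ki}\right|^2}{\dfrac{1}{|I_B|}\sum_{k \in K_A}\left|\sum_{i \in I_B} M_{ki}\right|^2}, \qquad \mathrm{IPI}_A^{\mathrm{unc}} = \frac{\dfrac{1}{|I_A|}\sum_{k \in K_A}\sum_{i \in I_A}\left|M_{ki}\right|^2}{\dfrac{1}{|I_B|}\sum_{k \in K_A}\sum_{i \in I_B}\left|M_{ki}\right|^2},$$

$$\mathrm{IPI}_A = \min\left(\mathrm{IPI}_A^{\mathrm{cor}},\ \mathrm{IPI}_A^{\mathrm{unc}}\right),$$

where $|I_A|$ and $|I_B|$ denote the number of input channels in the audio program for each zone.
For the IPI associated with $Z_B$, $I_A$ and $I_B$ are interchanged (with the control points taken from $K_B$), and

$$\mathrm{IPI}_B = \min\left(\mathrm{IPI}_B^{\mathrm{cor}},\ \mathrm{IPI}_B^{\mathrm{unc}}\right).$$

As a special case, IPI can also be used to evaluate the isolation performance at a single control point by choosing a particular point $k$ in the set $K_A$ (or $K_B$). We show in the subsequent section that this single-point IPI is by itself useful in evaluating the robustness of a generated PSZ.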
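The IPI computation can be sketched in the same way. The normalization by the number of channels in each program (applied identically in the correlated and uncorrelated terms) is an assumption based on the description above; the function name `ipi`, its arguments, and the toy matrix are illustrative and not from the paper. Passing a single control point gives the single-point IPI.

```python
import numpy as np

def ipi(M, zone_pts, target_ch, interf_ch):
    """Inter-Program Isolation (linear power ratio) in one zone.

    M         : (K, I) system transfer function matrix at one frequency
    zone_pts  : row indices of the control points in the zone (one index
                gives the single-point IPI)
    target_ch : column indices of the target program's channels
    interf_ch : column indices of the interfering program's channels
    """
    Mt = M[np.ix_(zone_pts, target_ch)]
    Mi = M[np.ix_(zone_pts, interf_ch)]
    nt, ni = len(target_ch), len(interf_ch)
    # Fully correlated channels within each program.
    cor = (np.sum(np.abs(Mt.sum(axis=1)) ** 2) / nt) / (np.sum(np.abs(Mi.sum(axis=1)) ** 2) / ni)
    # Fully uncorrelated channels within each program.
    unc = (np.sum(np.abs(Mt) ** 2) / nt) / (np.sum(np.abs(Mi) ** 2) / ni)
    return min(cor, unc)

# Toy system: two control points per zone, two channels per program, and a
# leakage amplitude of 0.1 relative to the target program.
M = np.array([[1.0, 1.0, 0.1, 0.1],
              [1.0, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 1.0],
              [0.1, 0.1, 1.0, 1.0]], dtype=complex)

ipi_zone_db = 10 * np.log10(ipi(M, [0, 1], [0, 1], [2, 3]))  # whole zone A
ipi_point_db = 10 * np.log10(ipi(M, [0], [0, 1], [2, 3]))    # first point of zone A only
print(ipi_zone_db, ipi_point_db)  # both approximately 20 dB
```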

Applications Using the Pressure Matching Method
In this section, we show two example use cases for the complementary metrics, IZI and IPI, using sound pressure levels calculated from a free-field numerical simulation of a PSZ system. These examples cannot be evaluated with the AC metric because of its limited definition. In the first example (Sec. 4.3), IZI and IPI are used to evaluate the effects of different rendering modes on the isolation performance of the same PSZ system; in the second example (Sec. 4.4), the single-point IPI is calculated to determine the effective physical boundaries of sound zones in a two-listener PSZ system. All simulated PSZ filters in the examples are designed using the standard Pressure Matching (PM) method 19. As the PM method is usually described in the literature by vectors of loudspeaker gains and pressure at control points, we first re-express the method in terms of filter and transfer function matrices for the case of multichannel programs.

PSZ filter generation
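The general idea of the PM method 19 is to minimize the difference between the specified target sound field with ideal zone/program isolation and the actual sound field generated by the loudspeakers. The target sound field is usually specified with target pressure $\mathbf{p}_T$ at the control points, where in DZ it is set to zero, and in BZ it is set based on program-specific transfer functions. The cost function to be minimized is constructed as

$$J(\mathbf{g}) = \lVert \mathbf{H}\mathbf{g} - \mathbf{p}_T \rVert^2 + \beta \lVert \mathbf{g} \rVert^2, \tag{13}$$

where the latter term is introduced as Tikhonov regularization, with weight $\beta$, to improve both matrix conditioning and the robustness of the resulting PSZ filters. Taking a single input channel to be the Dirac delta, the above function can be rewritten by replacing $\mathbf{g}$ and $\mathbf{p}_T$ with the filter $\mathbf{c}$ and the target transfer function $\mathbf{m}_T$, respectively:

$$J(\mathbf{c}) = \lVert \mathbf{H}\mathbf{c} - \mathbf{m}_T \rVert^2 + \beta \lVert \mathbf{c} \rVert^2. \tag{14}$$

The corresponding optimal solution $\mathbf{c}^*$ is given by

$$\mathbf{c}^* = \left(\mathbf{H}^H \mathbf{H} + \beta \mathbf{I}_L\right)^{-1} \mathbf{H}^H \mathbf{m}_T, \tag{15}$$

where $(\cdot)^H$ denotes the conjugate transpose and $\mathbf{I}_L$ is the $L \times L$ identity matrix. Due to the added regularization, the solution applies to all three cases of $K < L$, $K = L$, and $K > L$. This solution is suitable only when a single-channel program is considered and the corresponding BZ has been specified. For the case of multichannel programs, multiple targets and filters are required. Therefore, assuming $I$ channels, we modify the cost function to be the sum of the costs from minimizing the errors in all channels:

$$J(\mathbf{C}) = \sum_{i=1}^{I} \left( \lVert \mathbf{H}\mathbf{c}_i - \mathbf{m}_{T,i} \rVert^2 + \beta \lVert \mathbf{c}_i \rVert^2 \right) = \lVert \mathbf{H}\mathbf{C} - \mathbf{M}_T \rVert_F^2 + \beta \lVert \mathbf{C} \rVert_F^2, \tag{16}$$

where $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_I]$ and $\mathbf{M}_T = [\mathbf{m}_{T,1}, \ldots, \mathbf{m}_{T,I}]$ are the filter and target system transfer function matrices, respectively, and the subscript $F$ denotes the Frobenius norm. The resulting optimal filter matrix $\mathbf{C}^*$ is given by

$$\mathbf{C}^* = \left(\mathbf{H}^H \mathbf{H} + \beta \mathbf{I}_L\right)^{-1} \mathbf{H}^H \mathbf{M}_T, \tag{17}$$

which has a form similar to Eq. 15 for a single program, except that the target vector is replaced by the target matrix. Similar solutions can also be found in the literature on crosstalk cancellation systems (such as Bai and Lee 20).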
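The regularized least-squares solution described above can be sketched in NumPy as follows. The function names `pm_filters` and `pm_cost`, the matrix sizes, the random data, and the value of `beta` are illustrative assumptions. The sketch solves the normal equations with `np.linalg.solve` rather than forming an explicit inverse, and checks that the gradient of the cost vanishes at the solution.

```python
import numpy as np

def pm_filters(H, M_T, beta):
    """Tikhonov-regularized pressure-matching filters, one column per channel."""
    L = H.shape[1]
    A = H.conj().T @ H + beta * np.eye(L)  # L x L, Hermitian positive definite for beta > 0
    return np.linalg.solve(A, H.conj().T @ M_T)

def pm_cost(H, C, M_T, beta):
    """Sum over channels of the squared matching error plus regularization."""
    return np.linalg.norm(H @ C - M_T, "fro") ** 2 + beta * np.linalg.norm(C, "fro") ** 2

rng = np.random.default_rng(2)
K, L, I = 4, 8, 4  # illustrative sizes
H = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))
M_T = rng.standard_normal((K, I)) + 1j * rng.standard_normal((K, I))
beta = 1e-3

C_opt = pm_filters(H, M_T, beta)

# Stationarity condition of the cost: H^H (H C - M_T) + beta C = 0.
grad = H.conj().T @ (H @ C_opt - M_T) + beta * C_opt
print(np.allclose(grad, 0, atol=1e-10))  # True

# Each column equals the single-channel solution for the matching target column.
c0 = pm_filters(H, M_T[:, [0]], beta)
print(np.allclose(C_opt[:, [0]], c0))    # True
```

Because the cost separates over channels, solving for all columns at once is equivalent to solving one single-channel problem per column of the target matrix.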

Simulation setup
The PSZ system adopted for the free-field simulation consists of a linear array of eight loudspeakers and two zones, as illustrated in Fig. 2. The loudspeakers are modeled as circular baffled pistons in the far field, with a spacing of 25 cm between two adjacent units.
The two zones are separated by 1 m, and are also 1 m from the array. Two control points are defined in each zone, representing the ear locations of a listener with a spacing of 16.8 cm (the listeners' heads are not included in the simulation). In addition to the case of both listeners A and B located at the zone centers, a moved listener (illustrated as A in the figure) with a displacement of $(x, y) = (-0.3, -0.2)$ m is also simulated. Three cases of target matrices are considered, which correspond to three rendering modes for mono programs, stereo programs without crosstalk cancellation (XTC), and binaural programs with XTC (equivalent to a multi-listener transaural system 21), respectively, and are given by

[Eq. 18: target matrices for the three rendering modes; only the row $H_{11}\ \ H_{14}\ \ 0\ \ 0$ is recoverable]

where the transfer functions of loudspeakers 1, 4, 5 and 8 (denoted in Fig. 2) are chosen as those between the input channels and the control points. To simulate realistic uncertainties in the transfer functions due to factors such as loudspeaker position inaccuracies or response/gain variances, each transfer function $H_{kl}$ is sampled with Gaussian-distributed amplitude and phase, modeled as

$$H_{kl} = A e^{j\varphi}, \qquad A = \hat{A} + \mathcal{N}(0, \sigma_A^2), \qquad \varphi = \hat{\varphi} + \mathcal{N}(0, \sigma_\varphi^2),$$

where $A$, $\varphi$ denote the amplitude and phase of the transfer function, $\mathcal{N}(\cdot, \cdot)$ denotes the normal distribution, the hat symbol denotes the value obtained from the free-field simulation, and $\sigma_A^2$, $\sigma_\varphi^2$ denote the variances. In the simulation we simply choose $\sigma_A^2 = \sigma_\varphi^2 = \sigma^2 = 10^{-4}$ at all frequencies, and set the constant regularization as $\beta = K \cdot \sigma^2 = 4 \times 10^{-4}$ to minimize the expected cost in Eq. 14, following a probabilistic approach 22. From the same specified distribution $\mathcal{N}(0, \sigma^2)$ we sample two sets of transfer functions independently for filter generation and performance evaluation, and each set is averaged across 10 trials to represent the procedure in actual experiments. The corresponding PSZ filters are computed using Eq. 17.
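The perturbation procedure can be sketched as below, under several loudly stated assumptions that are not from the paper: loudspeakers are modeled as free-field point sources (not baffled pistons), the array lies on the line $y = 0$ centered at the origin, the zone centers sit at $x = \pm 0.5$ m and $y = 1$ m, and the Gaussian noise is added directly to the simulated amplitude and phase. The names `free_field_H` and `perturb` are hypothetical.

```python
import numpy as np

def free_field_H(f, spk, pts, c0=343.0):
    """Point-source free-field transfer functions (K x L); a simplification."""
    k = 2 * np.pi * f / c0
    r = np.linalg.norm(pts[:, None, :] - spk[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def perturb(H_hat, sigma2, rng):
    """Add zero-mean Gaussian noise to the amplitude and phase of each entry."""
    sigma = np.sqrt(sigma2)
    A = np.abs(H_hat) + rng.normal(0.0, sigma, H_hat.shape)
    phi = np.angle(H_hat) + rng.normal(0.0, sigma, H_hat.shape)
    return A * np.exp(1j * phi)

# Assumed geometry: 8 loudspeakers spaced 0.25 m apart, ears 16.8 cm apart.
spk = np.stack([(np.arange(8) - 3.5) * 0.25, np.zeros(8)], axis=1)
ear = 0.168 / 2
pts = np.array([[-0.5 - ear, 1.0], [-0.5 + ear, 1.0],   # zone A, left/right ear
                [0.5 - ear, 1.0], [0.5 + ear, 1.0]])    # zone B, left/right ear

rng = np.random.default_rng(3)
sigma2 = 1e-4
H_hat = free_field_H(1000.0, spk, pts)

# Two independent sets, each averaged over 10 trials: one for filter design,
# one for performance evaluation.
H_design = np.mean([perturb(H_hat, sigma2, rng) for _ in range(10)], axis=0)
H_eval = np.mean([perturb(H_hat, sigma2, rng) for _ in range(10)], axis=0)

beta = H_hat.shape[0] * sigma2  # beta = K * sigma^2
print(H_hat.shape, beta)
```

Using separate draws for design and evaluation keeps the filters from being scored on exactly the transfer functions they were fitted to.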

Evaluating PSZ performance with different rendering modes
In the literature, the performance of PSZ systems is usually evaluated with a single set of target pressures 2,6,8,15, which corresponds to a fixed rendering mode for mono programs in each zone. While most existing systems are capable of delivering multichannel programs, the impact of their corresponding rendering modes on the isolation performance has not been studied. By using the new IZI and IPI metrics, such potential impact can be explicitly evaluated for various choices of target matrices.
We simulate and compare the IZI and IPI performance of the system described above for the three target matrices given by Eq. 18 and two cases of listener positions: 1) both listeners centered in their zones and 2) one listener moved away from the zone center, with the results for $\mathrm{IZI}_A$ and $\mathrm{IPI}_A$ shown in Fig. 3. Fig. 3(a) corresponds to case 1, where $\mathrm{IZI}_A$ and $\mathrm{IPI}_A$ are almost identical due to the symmetric setup and free-field assumptions. The mono mode yields the best performance, followed by the stereo mode, while the XTC mode has the worst isolation. In particular, the degradation in the XTC mode is more significant below 1 kHz, indicating a potential trade-off between program isolation and crosstalk cancellation, due to the increased wavelength (and therefore weaker isolation between listeners) and the requirement in the cost function for cancelling the crosstalk. For PSZ systems rendering binaural audio with XTC, this trade-off can be further optimized based on established perceptual preferences of listeners 17, or one can simply downmix the audio to mono to trade off spatial quality for better isolation. Fig. 3(b) and 3(c) correspond to case 2, with the PSZ filters designed for the new and the centered position, respectively. We start to observe differences between the two metrics in Fig. 3(b) due to the breaking of the symmetry; in Fig. 3(c) the differences become clearer as the PSZ filters are no longer optimal. $\mathrm{IZI}_A$ shows a minor degradation in isolation between the two listeners, whereas $\mathrm{IPI}_A$ reflects severely degraded isolation between the two programs for the moved listener, which is more indicative of the actual experience of the listener with unoptimized PSZ filters.
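A rough single-frequency sketch of such a comparison is given below for the centered-listener case. Everything specific in it is an assumption for illustration, not the paper's setup: point-source loudspeakers, zone centers at $x = \pm 0.5$ m, no transfer function noise, a "stereo" target that sends each channel to both ears of its listener, and an "xtc" target that sends each channel to one ear only. The mono target is omitted because its exact form is not recoverable here. The metric helpers follow the same assumed IZI/IPI forms as described earlier.

```python
import numpy as np

def pm_filters(H, M_T, beta):
    """Regularized pressure-matching filters for all channels at once."""
    A = H.conj().T @ H + beta * np.eye(H.shape[1])
    return np.linalg.solve(A, H.conj().T @ M_T)

def power(M, pts, ch, correlated):
    """Power summed over control points `pts` for channels `ch`."""
    sub = M[np.ix_(pts, ch)]
    if correlated:
        return np.sum(np.abs(sub.sum(axis=1)) ** 2)
    return np.sum(np.abs(sub) ** 2)

def izi(M, bright, dark, ch):
    return min((power(M, bright, ch, c) / len(bright)) /
               (power(M, dark, ch, c) / len(dark)) for c in (True, False))

def ipi(M, pts, target, interf):
    return min((power(M, pts, target, c) / len(target)) /
               (power(M, pts, interf, c) / len(interf)) for c in (True, False))

# Assumed geometry and point-source transfer functions at 1 kHz.
k = 2 * np.pi * 1000.0 / 343.0
spk = np.stack([(np.arange(8) - 3.5) * 0.25, np.zeros(8)], axis=1)
ear = 0.168 / 2
pts = np.array([[-0.5 - ear, 1.0], [-0.5 + ear, 1.0],
                [0.5 - ear, 1.0], [0.5 + ear, 1.0]])
r = np.linalg.norm(pts[:, None, :] - spk[None, :, :], axis=2)
H = np.exp(-1j * k * r) / (4 * np.pi * r)

ref = [0, 3, 4, 7]  # loudspeakers 1, 4, 5, 8 (zero-based) as channel references
masks = {"stereo": np.kron(np.eye(2), np.ones((2, 2))),  # each channel to both ears
         "xtc": np.eye(4)}                               # each channel to one ear
KA, KB, IA, IB = [0, 1], [2, 3], [0, 1], [2, 3]

results = {}
for mode, mask in masks.items():
    M_T = mask * H[:, ref]                  # assumed target matrix for this mode
    M = H @ pm_filters(H, M_T, 4e-4)        # achieved system transfer functions
    results[mode] = (10 * np.log10(izi(M, KA, KB, IA)),
                     10 * np.log10(ipi(M, KA, IA, IB)))
    print(f"{mode}: IZI_A = {results[mode][0]:.1f} dB, IPI_A = {results[mode][1]:.1f} dB")
```

With this mirror-symmetric geometry and no noise, the two metrics coincide for each mode, which matches the observation for the centered case; moving one listener or evaluating with perturbed transfer functions breaks that equality.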

Determining effective PSZ boundaries
In most PSZ systems, sound zones are specified with regular geometries (e.g., round 7,16 or square 4,10,11,15 shapes). However, due to the constraints of practical systems, such as the number and distribution of loudspeakers and control points, the isolation performance within the sound zone is often non-uniform, leading to a lack of robustness against possible listener movements within zones. To quantify such robustness, the effective "boundaries" of sound zones are defined by using the previously defined single-point IPI metric as the contour line for a certain IPI level (e.g., 20 dB). As a result, the robustness against listener movements can be evaluated by the area (or volume) within the boundaries.
Furthermore, the dependency of that robustness on moving directions can be evaluated with the projection of the effective area/volume along the direction. As PSZ filters are determined by both the exact transfer functions and the target matrices (see Eq. 17), we also expect a great variability of the resulting PSZ boundaries with the choice of system configurations and rendering modes. Comparing the three map plots in Fig. 4, a trend in the decrease of the effective size of sound zones is observed as frequency increases, which has been well recognized in the literature 1,7,15 as reduced robustness, but was difficult to study using the AC metric. It is worth noting that the simulation is intended only to illustrate the trend, which is mainly due to the wavelength changes and can be observed with and without head presence.
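Given a spatial map of single-point IPI on a regular grid, the effective area and its extent along each axis can be estimated with a simple threshold count, as sketched below. The function name `effective_zone`, the grid, and the synthetic radially decaying map are illustrative assumptions. This simplification counts every grid cell above the threshold, so it does not separate disconnected regions from the one containing the listener.

```python
import numpy as np

def effective_zone(ipi_db, x, y, threshold_db=20.0):
    """Area and axis-aligned extents of the region where IPI >= threshold.

    ipi_db : 2D array with ipi_db[j, i] evaluated at (x[i], y[j])
    x, y   : uniformly spaced 1D grid coordinates in metres
    """
    inside = ipi_db >= threshold_db
    dx, dy = x[1] - x[0], y[1] - y[0]
    area = np.count_nonzero(inside) * dx * dy
    extent_x = np.count_nonzero(inside.any(axis=0)) * dx  # projection onto x
    extent_y = np.count_nonzero(inside.any(axis=1)) * dy  # projection onto y
    return area, extent_x, extent_y

# Synthetic map: 40 dB at the zone center, falling by 100 dB per metre,
# so the 20 dB contour is a circle of radius 0.2 m.
x = np.linspace(-0.5, 0.5, 101)
y = np.linspace(-0.5, 0.5, 101)
X, Y = np.meshgrid(x, y)
ipi_db = 40.0 - 100.0 * np.hypot(X, Y)

area, extent_x, extent_y = effective_zone(ipi_db, x, y)
print(area, extent_x, extent_y)  # area close to pi * 0.2**2, extents close to 0.4 m
```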

Conclusion and Final Discussion
This paper introduces two metrics, IZI and IPI, for evaluating the isolation performance of generalized PSZ systems from two different aspects. The IZI metric, which reduces to the commonly used AC metric for the special case of rendering single-channel programs, represents the isolation of the sound zones for a (single- or multi-channel) program, whereas the IPI metric quantifies the level of isolation of the target program from the interfering program in the same sound zone.
The two metrics, albeit defined by similar expressions, are generally non-interchangeable except for special cases where both the physical system setup and the program assignment are perfectly symmetric with respect to the two listeners. In addition, the different emphases of the two metrics make them suitable for different situations. IZI compares the acoustic energy between two sound zones, and is therefore more suitable in cases where a high contrast of sound energy between different regions is desired, such as creating a dark zone in which all audio programs are attenuated. IPI is related to the acoustic energy of different programs rendered in the same zone, and is therefore more applicable when different programs are present concurrently, and also more suitable for the objective evaluation of audio-on-audio interference.
In Sec. 4, we present two examples of different applications of the IZI and IPI metrics, with implications for future work. In the first example, we show the potential trade-off between the isolation performance and crosstalk cancellation at low frequencies for PSZ systems with crosstalk-cancelled binaural content. This offers the potential to further improve the filter design method by optimizing the trade-off in accordance with subjective preferences. In the second example, we show that the single-point IPI metric can serve as a basis for evaluating the robustness of the generated sound zones against listener movements. This allows the definition of a new metric that specifically quantifies the sound zone robustness, which is beyond the scope of this work and will be presented in a future study.

Fig. 1. Signal flow diagram of a general PSZ system consisting of I input audio programs (repre-

Fig. 2. Illustration of the simulated PSZ system. The black triangles on the top and the cross

Fig. 3. Simulated $\mathrm{IZI}_A$ (solid lines) and $\mathrm{IPI}_A$ (dashed lines) for two evaluation setups of differ-

Fig. 4. Computed spatial maps of single-point IPI (truncated to 40 dB) for the left half of the

Fig. 4 shows three computed spatial maps of simulated single-point IPI in 2D for the