Immersive and spatial sound reproduction has been widely studied using loudspeaker arrays. Flat-panel loudspeakers, which use thin flat panels driven by force actuators, are a promising alternative to traditional coaxial loudspeakers for practical applications, offering a low visual profile and diffuse radiation. The literature has addressed the sound quality and applications of flat-panel loudspeakers in three-dimensional sound reproduction, such as wave field synthesis and sound zones. This paper revisits the spatial sound perception of flat-panel loudspeakers, specifically the localization mismatch between the perceived and desired sound directions when using amplitude panning. Subjective tests in an anechoic chamber with 24 subjects yield a mean azimuth mismatch within ±6.0° and a mean elevation mismatch within ±10.0°. The experimental results show that a virtual source created by amplitude panning over a flat-panel loudspeaker achieves spatial localization accuracy close to that of a real sound source, despite not using complex algorithms or acoustic transfer function information. The findings establish a benchmark for virtual source localization in spatial sound reproduction using flat-panel loudspeakers, which can serve as a starting point for future research and algorithm optimization.

Extensive research is dedicated to using loudspeaker arrays to reproduce spatial sound for creating immersive and realistic listening experiences.1 Such systems recreate the auditory sense of space2 and the localization of perceived sound sources,3–5 which is essential for various applications such as augmented or mixed reality (AR/MR), multimedia content creation,1,4,6 and personalized sound zones.7,8 However, developing this technology from laboratory prototypes for use in real-world settings, especially in complex acoustic environments like buildings, requires further exploration. Previous research has explored various approaches, including, but not limited to, equalization of room responses,9 optimization of robustness,6,10 optimization of loudspeaker placement,11,12 simplification of implementation by reducing acoustic transfer function measurement,8,13 and the use of distributed systems.7 However, loudspeakers are limited by their physical structure and spatial placement in sound reproduction. For example, coaxial loudspeakers can be impractical for certain applications due to their weight, cost, or other factors. Additionally, reproducing sound with sufficient spatial coverage requires loudspeakers to span an appropriate spatial extent while maintaining small enough spacing to control sound waves at high frequencies.

The flat-panel loudspeaker is a promising alternative to traditional coaxial loudspeakers, with advantages in low visual profile and wide sound dispersion.14 It uses a thin, flat panel with force actuators on the rear side that generate acoustic radiation through the panel vibration. It can adapt to various indoor environments and can even utilize existing displays, e.g., organic light-emitting diode (OLED) screens, to generate spatialized audio.15 Moreover, a flat-panel loudspeaker configured as a multi-actuator panel (MAP) has multiple exciters driven with separate signals, so signal processing allows dynamic control of the panel's spatial vibration profile with a diffuse sound radiation characteristic.16 This characteristic helps avoid the beaming of piston loudspeakers at high frequencies17 and reduces modal excitation within rooms.18 Compared to conventional loudspeakers, achieving good sound quality can be challenging with the flat-panel loudspeaker. Though beyond the scope of this paper, the existing literature has widely addressed sound quality improvement19 and applications of flat-panel loudspeakers in three-dimensional (3D) sound reproduction, such as wave field synthesis14,20 and directional sound fields.21

The main aim of sound reproduction is to provide listeners with a clear spatial perception by utilizing psychoacoustic cues that lead to perceptual satisfaction. However, the human auditory perception mechanism is complex, with selective emphasis and suppression even under unfavorable conditions, as in the cocktail party effect. Therefore, considering subjective perception is crucial for evaluating, designing, and optimizing sound reproduction methods. The flat-panel loudspeaker has been perceptually evaluated in terms of loudness,22,23 sound localization, perception of sound distance with wave field synthesis,24 and sound quality enhancement.25 It has also been compared with the electrodynamic loudspeaker on objective and subjective measures for wave field synthesis.26

So far, vector-based amplitude panning (VBAP)27 remains an effective and straightforward method to create virtual sound sources using traditional loudspeakers arbitrarily placed in space, with ongoing research and development.28,29 VBAP has several practical advantages, including low computational complexity, no destructive interference within the sweet spot, superior timbral quality, and gradual sound quality degradation outside the sweet spot.29 Moreover, it does not require precise information on the acoustic transfer function for implementation. This feature is essential for controlling flat-panel loudspeakers since their sound radiation is affected by several factors, such as material, boundary conditions, and coupling. While VBAP has been utilized with flat-panel loudspeakers,30 a complete and thorough evaluation of spatial sound panning with the flat-panel loudspeaker is still lacking.31 

This paper revisits the spatial sound perception of flat-panel loudspeakers, specifically the localization mismatch between the perceived and desired sound directions when using amplitude panning. An experiment involving subjective and objective tests used a vector-based amplitude panning algorithm to create 81 virtual sources with four actuators placed at the corners of a flat panel. The subjective listening tests involved 24 normal-hearing subjects, while the objective tests comprised measurements of the interaural time difference (ITD) and the interaural level difference (ILD) in an anechoic chamber. The study aims to determine the spatial localization accuracy of amplitude panning using a flat-panel loudspeaker. As amplitude panning does not rely on complex algorithms or acoustic transfer function information, the experimental results can serve as a benchmark for virtual source localization in spatial sound reproduction using flat-panel loudspeakers. This information can be useful for future research and algorithm optimization in this area.

The equation of motion of a thin flat panel can be expressed as31
$D \nabla^4 u(y,z,t) + \rho h \,\frac{\partial^2 u(y,z,t)}{\partial t^2} = f(y,z,t),$
(1)
where u(y,z,t) is the out-of-plane displacement at time t of point (y,z) on the panel. The coordinate system is defined in Sec. III. f(y,z,t) is the external forcing function applied to the panel, h and ρ are the thickness and density of the panel, respectively, and D is the bending stiffness per unit width, given in terms of Young's modulus E and Poisson's ratio υ as19
$D = \frac{E h^3}{12(1-\upsilon^2)}.$
(2)
The forced response of a rectangular panel with dimensions L y × L z × h and simply supported edges can be expressed as a sum of modes of the panel's free response as
$u(y,z,t) = \sum_{r=1}^{\infty} \alpha_r\, \Phi_r(y,z)\, e^{j\omega_r t} = \sum_{r=1}^{\infty} \alpha_r \sin\!\left(\frac{m_r \pi}{L_y}\, y\right) \sin\!\left(\frac{n_r \pi}{L_z}\, z\right) e^{j\omega_r t},$
(3)
where α_r is the amplitude of mode Φ_r(y,z), ω_r is the resonant frequency of each mode, j denotes the imaginary unit, and m_r and n_r are the numbers of sinusoidal half-wavelengths of each mode along the y and z axes, respectively.
Using the Rayleigh integral, the surface area S of the flat panel can be divided into small sub-regions ds, each of which is treated as a point source radiating sound outward. The total acoustic response in space is the superposition of these point sources. The sound pressure at r, with the origin at the center of the panel, is
$p(\mathbf{r}) = \int_{S} \frac{j\omega\rho_0\, \dot{u}(\mathbf{r}_s)\, \exp(-jkR)}{2\pi R}\, \mathrm{d}s,$
(4)
where u̇(r_s) is the complex transverse velocity at a point r_s on the surface, R = |r − r_s|, ρ_0 is the density of air, and k is the wave number. The complex velocity u̇(r_s) is the first time derivative of the panel displacement.
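For illustration, the following minimal Python/NumPy sketch evaluates Eq. (4) by discretizing the panel into small elements and superposing their contributions as point sources; the material constants, grid resolution, and single-mode velocity profile are illustrative placeholders rather than the parameters of the experimental panel.

```python
import numpy as np

# Sketch of Eq. (4): discretize the panel into elements ds and superpose their
# contributions as point sources (Rayleigh integral). Values are illustrative.
rho0, c0 = 1.21, 343.0          # air density (kg/m^3), speed of sound (m/s)
Ly, Lz = 0.605, 0.575           # panel dimensions (m)
f = 1000.0                      # analysis frequency (Hz)
omega = 2 * np.pi * f
k = omega / c0                  # wave number

# Element centers in the y-z plane; the panel is centered at the origin
Ny, Nz = 60, 60
y = (np.arange(Ny) + 0.5) / Ny * Ly - Ly / 2
z = (np.arange(Nz) + 0.5) / Nz * Lz - Lz / 2
Y, Z = np.meshgrid(y, z, indexing="ij")
ds = (Ly / Ny) * (Lz / Nz)      # element area

# Example complex transverse velocity: a single (m, n) = (3, 2) mode of Eq. (3),
# with the mode-shape coordinates measured from the panel corner
m, n = 3, 2
u_dot = 1e-3 * np.sin(m * np.pi * (Y + Ly / 2) / Ly) * np.sin(n * np.pi * (Z + Lz / 2) / Lz)

def pressure(r_obs):
    """Sound pressure at observation point r_obs = (x, y, z), per Eq. (4)."""
    R = np.sqrt(r_obs[0] ** 2 + (r_obs[1] - Y) ** 2 + (r_obs[2] - Z) ** 2)
    integrand = 1j * omega * rho0 * u_dot * np.exp(-1j * k * R) / (2 * np.pi * R)
    return np.sum(integrand) * ds

p = pressure(np.array([0.7, 0.0, 0.0]))   # observation point 0.7 m in front of the panel
print(abs(p))
```

Refining the element grid (more elements per structural and acoustic wavelength) improves the accuracy of the discretized integral, particularly at high frequencies and small distances R.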

Note that the derivation presented here may be insufficiently accurate under near-field conditions, where the radiated sound pressure is considerably more complex and exhibits intricate oscillatory features that cannot be accurately approximated. Near-field scenarios are nonetheless demanding and common in practice, such as when the user sits within 1 m of the display screen. Furthermore, the practical boundary conditions can differ from those assumed in the derivation, making an analytical solution difficult to obtain.

Though the sound field produced by a flat-panel loudspeaker depends strongly on frequency and on the observation position relative to the source,21 laser Doppler vibrometer measurements have shown that each actuator element can vibrate independently without being affected by neighboring exciters and panel edges.32 This implies that individual exciters can be treated as independent sources for spatial sound reproduction, so a virtual sound source can be created using amplitude panning.

Figure 1 illustrates vector-based amplitude panning over actuator triplets on a flat panel to create a virtual sound source. For a given virtual source direction, the three closest actuators are activated simultaneously as a triplet, each with its own signal gain.27 The direction of the virtual source is defined as33,34
$\mathbf{p}_{\mathrm{vs}} = \mathbf{L}_{123}\,\mathbf{g} = g_1 \mathbf{l}_1 + g_2 \mathbf{l}_2 + g_3 \mathbf{l}_3,$
(5)
where the unit vectors l_1, l_2, and l_3 represent the directions from the listener to each actuator in Cartesian coordinates, L_123 = [l_1 l_2 l_3] contains them as columns, and the normalized gain vector g = [g_1 g_2 g_3]^T is
$\mathbf{g} = \mathbf{L}_{123}^{-1}\,\mathbf{p}_{\mathrm{vs}} \,/\, \left\| \mathbf{L}_{123}^{-1}\,\mathbf{p}_{\mathrm{vs}} \right\|,$
(6)
where (·)^T denotes matrix transposition, ||g|| = 1, and the inverse matrix L_123^{-1} satisfies L_123^{-1} L_123 = I, where I is the identity matrix. This work implemented vector-based amplitude panning based on code from Refs. 35 and 36.
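A minimal sketch of the gain computation in Eqs. (5) and (6) is given below; the actuator coordinates, the desired direction, and the actuator labeling only mimic the example triplet of Fig. 1 and are not tied to the experimental geometry or to the toolboxes of Refs. 35 and 36.

```python
import numpy as np

# Sketch of Eqs. (5)-(6): panning gains of one actuator triplet for a desired
# virtual-source direction. Positions are illustrative (listener at the origin,
# panel 0.7 m ahead, actuators at +/-0.25 m offsets, labels as in Fig. 1).
L, a = 0.70, 0.25
act_1 = np.array([L,  a,  a])
act_2 = np.array([L, -a,  a])
act_4 = np.array([L,  a, -a])

def unit(v):
    return v / np.linalg.norm(v)

# Columns of L123 are the unit vectors from the listener to the three actuators
L123 = np.column_stack([unit(act_1), unit(act_2), unit(act_4)])

p_vs = unit(np.array([L, 0.10, 0.15]))   # desired virtual-source direction

g = np.linalg.solve(L123, p_vs)          # solve L123 g = p_vs, as in Eq. (6)
g = g / np.linalg.norm(g)                # normalize so that ||g|| = 1
print(g)                                 # panning gains g1, g2, g3
```

If any computed gain is negative, the desired direction lies outside the chosen triplet, and another triplet should be selected, as in the full VBAP triangulation.27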
FIG. 1.

Three-dimensional amplitude panning using a flat-panel loudspeaker with four actuators "ACT.1–4." For example, actuators 1, 2, and 4 form the activated triplet that creates a virtual sound source in the direction p_vs. The unit vectors l_1, l_2, and l_3 represent the directions from the listener to each actuator in Cartesian coordinates.


VBAP involves a geometric determination of the triangle of active loudspeakers and an algebraic solution for the panning gains such that the velocity vector of the synthesized sound field matches the direction of the virtual source. Though it does not require acoustic transfer function information, the spatial localization accuracy achieved by conventional loudspeakers with VBAP is within ±8° in azimuth and ±18° in elevation.34 In comparison, for a real sound source, the mean azimuth mismatch of human localization ranges from 1° to 3°, and the mean elevation mismatch in the median plane ranges from 4° for white noise to 17° for speech.3 Conventional loudspeakers are spatially discrete sound sources. A flat panel with multiple actuators, on the other hand, is a continuous sound source, as in Eq. (3). The sound received from the flat-panel loudspeaker depends on the plate's vibration, i.e., u̇(r_s) in Eq. (4). Since the actuators are driven with different gains but the same phase under VBAP, the largest values of u̇(r_s) occur in the vicinity of the actuators. Thus, due to spatial masking, the perceived sound of VBAP using the flat panel may be comparable to that of conventional loudspeakers. The following section presents experimental characteristics of virtual source localization with amplitude panning on a flat panel with actuators.

The experiment was designed to evaluate, objectively and subjectively, the spatial sound perception of flat-panel loudspeakers, specifically the localization mismatch between the perceived and desired sound directions when using amplitude panning. As illustrated in Fig. 2, we consider an indoor display scenario with the listener in the near field at a distance L = 70 cm from the center of the subject's head O to the center of the panel. The flat panel is a 0.2 mm thick aluminum stencil of dimensions 60.5 × 57.5 cm with fixed edges. Virtual sound sources were created at 81 locations within a square region of size 50.0 × 50.0 cm. The four actuators are at the vertices (L, a, a), (L, −a, a), (L, −a, −a), and (L, a, −a), respectively, with a = 25 cm, as shown in Fig. 3. Virtual sources are denoted S_ij, where i = 1, 2,…, 9 and j = 1, 2,…, 9 represent the row and column numbers, respectively. Thus, the available azimuth and elevation ranges of the virtual sound sources were within ±19.65°. The experiment was carried out in the anechoic chamber of the Institute of Acoustics, Chinese Academy of Sciences. The chamber's dimensions are 6.40 m in length, 4.70 m in width, and 4.70 m in height, with a usable height of 3.20 m.
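The quoted ±19.65° range follows directly from the geometry; a short check is sketched below, assuming the usual conventions azimuth = arctan(y/L) in the horizontal plane and elevation = arctan(z/L) in the median plane.

```python
import numpy as np

# Geometry check of the +/-19.65 deg range of the virtual-source grid
L = 0.70      # listener-to-panel distance (m)
a_vs = 0.25   # half-width of the 50.0 x 50.0 cm virtual-source region (m)

az_max = np.degrees(np.arctan(a_vs / L))   # outermost source on the horizontal mid-line
el_max = np.degrees(np.arctan(a_vs / L))   # outermost source in the median plane
print(az_max, el_max)                      # both ~19.65 deg
```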

FIG. 2.

(Color online) Experimental setup using the KEMAR Head and Torso simulator at the listener location with a distance of 70 cm from the panel.

FIG. 3.

(Color online) The distance from the center of the listener's head to that of the panel was L = 70 cm. Four actuators ACT.1–4 were on the rear side of the panel with a = 25 cm. Eighty-one virtual sources S_ij, i = 1, 2,…, 9 and j = 1, 2,…, 9, were within the square region (contoured in black), with the four actuators at the vertices.


Stimuli were generated at a sampling rate of 48 kHz. VBAP gains were calculated based on code from Refs. 35 and 36. Output signals were played as multi-channel .flac files, with a pink noise signal in each channel for the corresponding actuator. The computer was equipped with a Fireface UC audio interface (RME Audio, Haimhausen, Germany) for digital-to-analog conversion. Separate power amplifiers drove the four actuators. The stimuli were reproduced within an effective volume range, determined by a pre-test of channel distortion, to ensure a total harmonic distortion (THD) of less than 10%. Furthermore, the reproduction chain included equalization37 and inter-channel calibration to achieve flattened and aligned frequency responses from all actuators over 100 Hz to 20 kHz.
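One plausible form of the magnitude equalization and inter-channel calibration is sketched below (a regularized magnitude inverse per channel plus an RMS alignment); this is an illustrative assumption rather than the exact procedure of Ref. 37, and the placeholder impulse responses stand in for the measured actuator responses.

```python
import numpy as np

# Sketch of per-channel magnitude equalization and inter-channel calibration.
# `h_meas` holds one measured impulse response per actuator (placeholders here).
fs, n_fft = 48000, 4096
rng = np.random.default_rng(0)
h_meas = rng.standard_normal((4, 512)) * np.exp(-np.arange(512) / 50)

H = np.fft.rfft(h_meas, n_fft, axis=1)
beta = 1e-2                                      # regularization to limit boost
H_inv_mag = np.abs(H) / (np.abs(H) ** 2 + beta)  # regularized magnitude inverse

# Linear-phase equalization filters per channel (zero phase, then made causal)
eq = np.fft.irfft(H_inv_mag, n_fft, axis=1)
eq = np.roll(eq, n_fft // 2, axis=1)

# Inter-channel calibration: scale so all equalized channels have equal RMS level
level = np.linalg.norm(H * H_inv_mag, axis=1)
gains = np.mean(level) / level
print(gains)
```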

The ITD and ILD are widely used auditory cues for the localization of a single source in psychoacoustics.3,5 The ITD reflects the different times at which a sound arrives at the two ears.38 It is calculated from the position of the interaural cross-correlation peak within a maximum interaural delay of 1 ms for frequencies below 1.5 kHz as39
$\mathrm{ITD}(\theta) = \arg\max_{\tau} \left\{ \mathrm{E}\!\left[ s_L(t)\, s_R(t+\tau) \right] \right\},$
(7)
where s_L(t) and s_R(t) are the sound signals received by the left and right ears, respectively. The ILD reflects the shadowing effect of the human head (the sound pressure decreases at the ear farther from the source and increases at the other) when the sound source deviates from the median plane.40 The ILD is defined as
$\mathrm{ILD}(x_s, y_s, z_s, f_0) = 20 \lg \left| \frac{P_R(x_s, y_s, z_s, f_0)}{P_L(x_s, y_s, z_s, f_0)} \right| \ (\mathrm{dB}),$
(8)
where P_L(x_s, y_s, z_s, f_0) and P_R(x_s, y_s, z_s, f_0) are the left- and right-ear frequency-domain sound pressures at the ear canals generated by a sound source at location (x_s, y_s, z_s) at frequency f_0, respectively.
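A minimal sketch of Eqs. (7) and (8) on toy ear signals is given below; the synthetic left/right signals merely stand in for the KEMAR recordings, and the ±1 ms search range follows the text.

```python
import numpy as np

# Sketch of Eqs. (7)-(8) on toy ear signals: the right-ear signal is the left
# one delayed by 10 samples and attenuated by 6 dB, so ITD ~ +0.21 ms and
# ILD ~ -6 dB are expected.
fs, delay = 48000, 10
rng = np.random.default_rng(1)
noise = rng.standard_normal(fs + delay)
s_left = noise[delay:]            # left-ear signal
s_right = 0.5 * noise[:fs]        # right-ear signal: delayed and attenuated copy

# --- ITD, Eq. (7): peak of E[ s_L(t) s_R(t + tau) ] over |tau| <= 1 ms ---
max_lag = int(1e-3 * fs)
taus = np.arange(-max_lag, max_lag + 1)
t = np.arange(max_lag, fs - max_lag)              # common analysis window
c = np.array([np.mean(s_left[t] * s_right[t + tau]) for tau in taus])
itd = taus[np.argmax(c)] / fs
print(f"ITD = {itd * 1e3:.3f} ms")

# --- ILD, Eq. (8): right/left level ratio at a single frequency f0 ---
f0 = 2500.0
freqs = np.fft.rfftfreq(fs, 1 / fs)
idx = np.argmin(np.abs(freqs - f0))
P_L = np.fft.rfft(s_left)[idx]
P_R = np.fft.rfft(s_right)[idx]
print(f"ILD at {f0:.0f} Hz = {20 * np.log10(abs(P_R) / abs(P_L)):.2f} dB")
```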

In the experiment, the ITD and ILD for the different virtual sources were calculated from recordings made with the Head and Torso simulator (GRAS Sound & Vibration, Holte, Denmark) using 5 s of pink noise as the test signal. The corresponding perceptual virtual source directions were then obtained by referencing a high-resolution lookup table of simulated ITD and ILD values derived, with interpolation, from a head-related transfer function database.41
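The lookup step can be sketched as a nearest-neighbor search against simulated cue tables. The spherical-head ITD model and the linear ILD model below are stand-in assumptions for illustration only; the experiment used tables derived from an HRTF database with interpolation.41

```python
import numpy as np

# Sketch of the cue-to-direction lookup: find the grid direction whose simulated
# ITD/ILD are closest to the measured values. The models and weights below are
# illustrative assumptions, not the HRTF-derived tables used in the experiment.
c0, head_radius = 343.0, 0.0875
az_grid = np.radians(np.arange(-40.0, 40.01, 0.1))       # high-resolution azimuth grid

itd_sim = (head_radius / c0) * (az_grid + np.sin(az_grid))  # Woodworth-style ITD model (s)
ild_sim = 0.18 * np.degrees(az_grid)                         # crude linear ILD model (dB)

def lookup_azimuth(itd_meas, ild_meas, w_itd=1.0, w_ild=0.1):
    """Nearest-neighbor match of measured cues against the simulated tables."""
    cost = w_itd * ((itd_sim - itd_meas) * 1e3) ** 2 + w_ild * (ild_sim - ild_meas) ** 2
    return np.degrees(az_grid[np.argmin(cost)])

print(lookup_azimuth(itd_meas=0.21e-3, ild_meas=4.0))
```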

Figure 4 presents the localization mismatch between the perceptual virtual source direction (obtained from the ITD measurement) and the desired sound direction for each virtual source. The tested values are smoothed for visualization. For virtual sources at most locations on the panel, the azimuth and elevation mismatch values are relatively small. A few negative azimuth mismatches, no worse than 8.0° in magnitude, appear in the lower right area (y < 100 mm, z < 0 mm), while a few positive elevation mismatches, no worse than 5.0°, are located in the lower right area (y < 0 mm, z < 0 mm).

FIG. 4.

(Color online) Localization mismatch between the perceptual virtual source direction (obtained from the ITD measurement) and the desired sound direction for each virtual source in Fig. 3. (a) Azimuth mismatch, (b) elevation mismatch.


Figure 5 presents the localization mismatch between the perceptual virtual source direction (obtained from the ILD measurement) and the desired sound direction for each virtual source at 2500 and 5000 Hz, respectively. The tested values are smoothed for visualization. As in Fig. 4, the azimuth and elevation mismatch values are relatively small for virtual sources at most locations on the panel. ILD values are frequency dependent. Mismatch values at the lower frequency of 2500 Hz exhibit larger deviations across different virtual source heights. At 2500 Hz, the lower region near the center of the panel (y < 100 mm, z < 50 mm) exhibits larger azimuth localization mismatch values, whereas the elevation mismatch is more consistent across locations, with values falling within a range of 0.3°. At 5000 Hz, larger mismatch values occur when the virtual source is positioned in the lower right area of the panel (y < 50 mm, z < 0 mm).

FIG. 5.

(Color online) Localization mismatch between the perceptual virtual source direction (obtained from the ILD measurement) and the desired sound direction for each virtual source in Fig. 3. (a) Azimuth mismatch at 2500 Hz, (b) elevation mismatch at 2500 Hz, (c) azimuth mismatch at 5000 Hz, (d) elevation mismatch at 5000 Hz.


When the virtual sound source is at ear height, the localization trend of the flat-panel loudspeaker is more consistent with that of traditional coaxial loudspeakers. However, when the sound source is below ear level, the ILD localization mismatch values become more pronounced.

Owing to the variations in principles and reproduction methods among different spatial sound techniques, there is not yet an established standard for assessing spatial sound, covering evaluation criteria, methodologies, experimental conditions, and data processing. The International Telecommunication Union (ITU) has set standards for subjectively assessing spatial sound, which can be referenced in specific evaluations.42,43

Twenty-four normal-hearing listeners, 22–49 yrs of age, were involved in subjective tests. Among them, 13 were male, and 11 were female. The subjects sat naturally on a chair in an anechoic chamber with a flat panel located 0.7 m in front of them. During the listening test, the subjects were instructed to maintain their head orientation toward the direction of the unknown perceptual virtual source.43 They were also asked to maintain a stable body position while slightly rotating their head to keep their eyes and ears at approximately the same height as the center of the panel.

A pre-study with three groups of training was conducted. First, virtual sound sources at different azimuth angles and the same height were played in sequence from left to right, namely "S51," "S55," and "S59" in Fig. 3. Then, virtual sound sources at different elevation angles in the median plane were played sequentially from top to bottom, namely "S15," "S55," and "S95." Finally, the virtual sound sources at the four corners were played, namely "S11," "S19," "S99," and "S91." During the pre-study, subjects were instructed to identify the perceived direction of each virtual source before being told the correct direction. The subjects then proceeded to the formal testing phase after a 5 min rest interval. Each equalized VBAP pink noise sample lasted 5 s and was presented in random order to simulate virtual sound sources at various locations. A 3 s pause was introduced between successive samples to allow the subject sufficient time to respond. In a localization experiment, the choice of reporting method is of utmost importance: it should demonstrate an accuracy at least as high as human localization accuracy, which is approximately 1° for frontal sound incidence. Therefore, we opted for absolute evaluation over auditory comparison and discrimination experiments. Once the direction of the virtual source was determined, subjects were instructed to point a laser pointer in that direction. After confirming the direction, the subjects pressed a Bluetooth remote button (Xiaomi Co., Beijing, People's Republic of China) connected to a phone. A mobile phone positioned behind the subject then took a picture and recorded the location of the laser mark. The laser pointer marks were to be confined within the black frame indicating the target region on the panel. The controlling computer was used to verify the results to avoid any omissions. If there was an error in operation, the subjects were allowed to revise their answers before the end of the audio playback. After a 3 s rest, the subsequent trial began immediately. If the virtual source was indeterminate, the subject was asked to point the laser marker to the most likely position. The listening test lasted a total of 14 min for each subject to avoid fatigue.

The mobile phone archived the laser marks indicating the chosen azimuth and elevation angles. Because the mobile phone may have shifted slightly due to vibration of the steel-mesh floor when subjects entered or left the anechoic chamber, it was carefully repositioned at the beginning of each subject's experiment.
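For reference, converting a calibrated laser-mark position on the panel into perceived azimuth and elevation angles can be sketched as follows; the function name and the assumption that the head center lies on the panel's normal axis are ours for illustration.

```python
import numpy as np

# Sketch of converting a laser-mark position on the panel into perceived angles.
# (y_mark, z_mark) are the mark coordinates relative to the panel center (m),
# read from the calibrated photo; L is the listener-to-panel distance.
L = 0.70

def mark_to_angles(y_mark, z_mark):
    azimuth = np.degrees(np.arctan2(y_mark, L))
    elevation = np.degrees(np.arctan2(z_mark, np.hypot(L, y_mark)))
    return azimuth, elevation

print(mark_to_angles(0.125, -0.0625))   # example mark: 12.5 cm lateral, 6.25 cm below center
```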

Figures 6 and 7 depict the mean and standard deviation of the subjective localization mismatch. The subjective results exhibit a more nuanced distribution than the objective ones, with pronounced mismatch values over larger regions. For virtual sources positioned toward the two sides of the panel, the perceptual azimuth angles behave like those of traditional coaxial loudspeakers, indicating lower accuracy at the sides. However, flat-panel loudspeakers show less accuracy for central positions than their coaxial counterparts. The azimuth accuracy of central localization for flat-panel loudspeakers at ear level is moderate. Upper localization slightly surpasses lower localization accuracy, suggesting a concentrated but imprecise lower localization area. Left horizontal localization is less effective than right localization, a trait that may be attributed to the right-handedness of all subjects when using the laser pointer.

FIG. 6.

(Color online) Localization mismatch between the perceptual virtual source direction (averaged among 24 subjects in the listening test) and the desired sound direction for each virtual source in Fig. 3. (a) Azimuth mismatch. (b) elevation mismatch.

FIG. 7.

(Color online) Standard deviation of the localization mismatch in (a) azimuth and (b) elevation among 24 subjects in the listening test for each virtual source in Fig. 3.


Previous research on VBAP localization with coaxial loudspeakers indicates that elevation accuracy is highest only when the virtual sound source is aligned with the loudspeakers' height, with limited precision in other scenarios. However, this localization behavior differs for flat-panel loudspeakers. Notably, right-side horizontal localization performs worse than the left, while bottom-side vertical localization is less accurate than the top.

For the entire panel, the mean azimuth mismatch is within ±6.0°, generally superior to the mean elevation mismatch within ±10.0°, implying more accurate horizontal than vertical localization. This aligns with the established notion that human ears have better horizontal than vertical resolution. Figure 6 reveals that when the virtual source lies exactly on the edge of the target region, the azimuth and elevation mismatch values are large, yet precision improves just inside this edge. For azimuth, when the virtual sound source is close to but not on the frame edge (y = ±187.5 mm, i.e., the cross-lines labeled 2 and 8 in Fig. 3), the mismatch values are relatively small, within ±1.0°. Conversely, when the source is on the left and right borders (y = ±250 mm), the mean mismatch values are larger, within ±6.0°. A similar pattern occurs for the elevation angles. We term this phenomenon the "edge-deterioration effect," suggesting that subjects subconsciously shift the laser mark inward to avoid exceeding the control area. The effect's substantial presence might arise from the boundary conditions and the constraints of the actuator distribution.

The mismatch observed in our experiment is related to several factors, such as the limited directional resolution of human hearing, the limitations of VBAP or pairwise amplitude panning itself, and the use of flat-panel loudspeakers as reproduction sources. First, for a real sound source, the mean azimuth mismatch of human hearing ranges from 1° to 3°, and the mean elevation mismatch in the median plane ranges from 4° for white noise to 17° for speech. Second, multichannel sound with conventional loudspeakers using the VBAP algorithm shows a similar localization mismatch pattern, in that the azimuth mismatch is much smaller than the elevation mismatch. Specifically, the azimuth mismatches of the flat-panel loudspeaker are comparable to those of conventional loudspeakers using VBAP34 for the desired virtual sources at (0°, 0°), (0°, 15°), and (10°, 0°), where the difference in the median value, the interquartile range, or the data range is within 2°, except that the data range of the flat-panel loudspeaker at (10°, 0°) is 5° larger. For the elevation mismatches, the difference in the median value is within 2°, while the interquartile range and the data range at (0°, 15°) and (10°, 0°) are 5° to 13° larger, probably due to disparities in array configuration, as the triplets are placed differently in the flat-panel loudspeaker of our experiment and in the conventional loudspeaker setup.34

Thus, the perceived location mismatch can be largely attributed to the limitations of VBAP itself, the limited directional resolution of human hearing, and the edge effect of the flat panel when the desired virtual source is geometrically close to the panel edge.

The standard deviations of the azimuth and elevation mismatch values are illustrated in Fig. 7. The azimuth angles exhibit a uniform distribution, with small deviations near the real sound source actuators at the vertices and slightly larger deviations elsewhere. In the upper part of the central panel area, the deviations are smaller, probably due to improved localization accuracy resulting from slight head rotation. The elevation angle deviations are substantial around the central height, in contrast with smaller deviations at the top and bottom. This suggests that the smallest localization deviation does not occur at the height of the human ear.

We also analyzed individual differences using the per-subject standard deviation, as shown in Fig. 8. Subjects 5, 8, 15, 21, and 22 show large horizontal localization deviations, while subjects 1, 7, 11, 15, and 19 exhibit notable vertical deviations. Subject 15's overall deviation is substantial. Subject 8's azimuth mismatch deviation is large while the elevation mismatch deviation is smaller, consistent with established disparities in vertical localization accuracy. Spearman's ρ test between the perceptual and ideal localization angles reveals correlations ranging from 88% to 98% (p < 0.05), indicating a robust correlation between the ideal and tested values.
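A sketch of the Spearman rank-correlation check is given below, with placeholder arrays standing in for the ideal and perceived angles.

```python
import numpy as np
from scipy.stats import spearmanr

# Sketch of the Spearman check between desired (ideal) and perceived angles.
# `ideal_az` and `perceived_az` are placeholders, one value per virtual source.
rng = np.random.default_rng(2)
ideal_az = np.linspace(-19.65, 19.65, 81)
perceived_az = ideal_az + rng.normal(0, 2.0, size=81)   # toy data with 2 deg scatter

rho, p_value = spearmanr(ideal_az, perceived_az)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.2g}")
```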

FIG. 8.

(Color online) The analysis of individual differences through the standard deviation of the subjects. The abscissa represents the subject number, ranging from 1–24. The dark filled bar represents the standard deviation of azimuth mismatch, while the light filled bar represents the standard deviation of elevation mismatch.


The test results were categorized into two groups based on the virtual sound source positions, horizontally and vertically. The impact of these factors on the perceptual azimuth and elevation angles was analyzed, resulting in four datasets. Normality was assessed using the Lilliefors test, and equality of variances was checked using Bartlett's test. Because the null hypotheses were not accepted at the 5% significance level, the Kruskal–Wallis test was chosen over an analysis of variance (ANOVA).

The Kruskal–Wallis test was then used to determine whether there are statistically significant differences between the azimuth/elevation of the virtual source and the mean azimuth/elevation mismatch of the perceived localizations. The results show that the mean perceptual azimuth mismatch differs significantly across the azimuth angles of the virtual sources (p < 0.001). Similarly, the mean perceptual elevation mismatch differs significantly across the elevation angles of the virtual sources (p < 0.001). The mean perceptual azimuth mismatch also differs across the elevation angles of the virtual sources, less markedly but still with p < 0.001. In contrast, the dependence of the mean perceptual elevation mismatch on the azimuth angles of the virtual sources is the weakest, with p = 0.03. These results highlight the substantial influence of virtual source placement on both perceptual azimuth and elevation localization accuracy (p < 0.05).
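The statistical pipeline (Lilliefors, Bartlett, Kruskal–Wallis) can be sketched as follows with SciPy and statsmodels; the toy groups below are placeholders for the per-angle mismatch datasets.

```python
import numpy as np
from scipy.stats import bartlett, kruskal
from statsmodels.stats.diagnostic import lilliefors

# Sketch of the tests: Lilliefors normality check per group, Bartlett's test for
# equal variances, then the Kruskal-Wallis test across groups. `groups` is a
# placeholder: one array of mismatch values per virtual-source angle.
rng = np.random.default_rng(3)
groups = [rng.normal(mu, 2.0, size=24) for mu in (-3.0, 0.0, 2.0)]   # toy data

for g in groups:
    stat, p_norm = lilliefors(g, dist="norm")
    print(f"Lilliefors p = {p_norm:.3f}")

stat, p_var = bartlett(*groups)
print(f"Bartlett p = {p_var:.3f}")

h_stat, p_kw = kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.3g}")
```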

Though the experiment was conducted on a particular panel with corner-positioned actuators, the findings may have broader implications. In applications, flat-panel loudspeakers can be used singly or in multi-panel setups for immersive sound with extended spatial coverage. Under VBAP, each panel can operate independently, so we assessed a corner-actuated single-panel scenario as a representative module. Since VBAP does not require exact knowledge of the sound source or its propagation, and considering auditory perception and masking effects, the results may hold for other flat-panel loudspeakers with comparable geometry.

This experiment assesses the performance of VBAP on the flat-panel loudspeaker, and the results can serve as a baseline for perceptual evaluation in current and future research on flat-panel loudspeakers, since VBAP is a simple but effective approach for spatial sound reproduction that does not require acoustic transfer function information. This study also enhances, from a perceptual standpoint, the understanding of the transition from conventional loudspeakers to flat panels for sound reproduction. These findings extend the existing theory and practical value of flat-panel loudspeakers, especially for auditory displays in buildings.

This paper explored the localization mismatch between desired and perceived sound directions using amplitude panning with flat-panel loudspeakers. The study involved creating virtual sound sources at various locations and evaluating the perceptual source direction through both objective and subjective tests. The subjective tests resulted in a mean azimuth mismatch within ±6.0° and a mean elevation mismatch within ±10.0°. Additionally, the objective tests using the head and torso simulator and auditory localization cues indicated a good match. These findings suggest that a virtual source created by amplitude panning over a flat-panel loudspeaker can achieve spatial localization accuracy comparable to that of a real sound source without the need for complex algorithms or acoustic transfer function information. Future research will focus on optimizing algorithms for virtual source localization in spatial sound reproduction using flat-panel loudspeakers, along with perceptual evaluation.

This work was supported by the Beijing Natural Science Foundation (Grant No. L223032). The authors would like to express gratitude to all the participants who took part in the listening test, Professor Feiran Yang for insightful discussions on binaural modeling, Dr. Yuzhen Yang for engaging discussions on acoustic modeling, and Cong Wang, Yong Chen, Kai Wang, and Xuyang Zhu for their assistance during the experiment.

1. S. Spors, H. Wierstorf, A. Raake, F. Melchior, M. Frank, and F. Zotter, "Spatial sound with loudspeakers and its perception: A review of the current state," Proc. IEEE 101(9), 1920–1938 (2013).
2. P. Coleman, A. Franck, P. J. Jackson, R. J. Hughes, L. Remaggi, and F. Melchior, "Object-based reverberation for spatial audio," J. Audio Eng. Soc. 65(1/2), 66–77 (2017).
3. J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization (MIT Press, Cambridge, MA, 1997), pp. 137–191.
4. V. Erbes and S. Spors, "Localisation properties of wave field synthesis in a listening room," IEEE/ACM Trans. Audio, Speech, Language Process. 28, 1016–1024 (2020).
5. B. Xie, Head-Related Transfer Function and Virtual Auditory Display, 2nd ed. (J. Ross Publishing, Plantation, FL, 2013), pp. 8–12.
6. C. Kyriakakis, P. Tsakalides, and T. Holman, "Surrounded by sound," IEEE Signal Process. Mag. 16(1), 55–66 (1999).
7. T. Betlehem, W. Zhang, M. A. Poletti, and T. D. Abhayapala, "Personal sound zones: Delivering interface-free audio to multiple listeners," IEEE Signal Process. Mag. 32(2), 81–91 (2015).
8. Q. Zhu, P. Coleman, M. Wu, and J. Yang, "Robust acoustic contrast control with reduced in-situ measurement by acoustic modeling," J. Audio Eng. Soc. 65(6), 460–473 (2017).
9. S. Cecchi, A. Carini, and S. Spors, "Room response equalization–A review," Appl. Sci. 8(1), 16 (2017).
10. M. Poletti, "Robust two-dimensional surround sound reproduction for nonuniform loudspeaker layouts," J. Audio Eng. Soc. 55(7/8), 598–610 (2007).
11. S. Koyama, G. Chardon, and L. Daudet, "Optimizing source and sensor placement for sound field control: An overview," IEEE/ACM Trans. Audio, Speech, Language Process. 28, 696–714 (2020).
12. Q. Zhu, P. Coleman, X. Qiu, M. Wu, J. Yang, and I. Burnett, "Robust personal audio geometry optimization in the SVD-based modal domain," IEEE/ACM Trans. Audio, Speech, Language Process. 27(3), 610–620 (2019).
13. Q. Zhu, X. Qiu, P. Coleman, and I. Burnett, "An experimental study on transfer function estimation using acoustic modelling and singular value decomposition," J. Acoust. Soc. Am. 150(5), 3557–3568 (2021).
14. B. Pueo, J. J. López, J. Escolano, and L. Hörchens, "Multiactuator panels for wave field synthesis: Evolution and present developments," J. Audio Eng. Soc. 58(12), 1045–1063 (2011).
15. Z. Li, P. Luo, C. Zheng, and X. Li, "Vibrational contrast control for local sound source rendering on flat panel loudspeakers," in Proceedings of the 145th Audio Engineering Society Convention, New York, NY, October 17–20, 2018.
16. M. C. Heilemann, D. Anderson, and M. F. Bocko, "Sound-source localization on flat-panel loudspeakers," J. Audio Eng. Soc. 65(3), 168–177 (2017).
17. V. P. Gontcharov and N. P. Hill, "Diffusivity properties of distributed mode loudspeakers," in Proceedings of the 108th Audio Engineering Society Convention, Paris, France, February 19–22, 2000.
18. N. Harris, "Modelling acoustic room interaction for pistonic and distributed-mode loudspeakers in the correlation domain," in Proceedings of the 117th Audio Engineering Society Convention, San Francisco, CA, October 28–31, 2004.
19. M. C. Heilemann, D. A. Anderson, S. Roessner, and M. F. Bocko, "The evolution and design of flat-panel loudspeakers for audio reproduction," J. Audio Eng. Soc. 69(1/2), 27–39 (2021).
20. M. M. Boone, "Multi-actuator panels (MAPs) as loudspeaker arrays for wave field synthesis," J. Audio Eng. Soc. 52(7/8), 712–723 (2004).
21. N. Kournoutos and J. Cheer, "A system for controlling the directivity of sound radiated from a structure," J. Acoust. Soc. Am. 147(1), 231–241 (2020).
22. S. Flanagan and N. Harris, "Loudness: A study of the subjective difference between DML and conventional loudspeaker," in Proceedings of the 106th Audio Engineering Society Convention, Munich, Germany, May 8–11, 1999.
23. N. J. Harris, "The acoustics and psychoacoustics of the distributed-mode loudspeaker (DML)," Ph.D. dissertation, University of Essex, Essex, UK, 2001.
24. M. Rébillat, E. Corteel, B. F. Katz, and X. Boutillon, "From vibration to perception: Using large multi-actuator panels (LaMAPs) to create coherent audio-visual environments," in Acoustics 2012, Nantes, France, April 23–27, 2012.
25. M. Heilemann, D. Anderson, S. Roessner, and M. F. Bocko, "Quantifying listener preference of flat-panel loudspeakers," in Proceedings of the 145th Audio Engineering Society Convention, New York, NY, October 17–20, 2018.
26. E. Corteel, K.-V. Nguyen, O. Warusfel, T. Caulkins, and R. Pellegrini, "Objective and subjective comparison of electrodynamic and MAP loudspeakers for wave field synthesis," in Proceedings of the 30th Audio Engineering Society International Conference: Intelligent Audio Environments, Saariselka, Finland, March 15–17, 2007.
27. V. Pulkki, "Virtual sound source positioning using vector base amplitude panning," J. Audio Eng. Soc. 45(6), 456–466 (1997).
28. E. Erdem, Z. Cvetkovic, and H. Hacihabiboglu, "3D perceptual soundfield reconstruction via virtual microphone synthesis," IEEE/ACM Trans. Audio, Speech, Language Process. 31, 1305–1317 (2023).
29. A. Franck, W. Wang, and F. M. Fazi, "Sparse ℓ1-optimal multiloudspeaker panning and its relation to vector base amplitude panning," IEEE/ACM Trans. Audio, Speech, Language Process. 25(5), 996–1010 (2017).
30. M. C. Heilemann, D. A. Anderson, and M. F. Bocko, "Near-field object-based audio rendering on flat-panel displays," J. Audio Eng. Soc. 67(7/8), 531–539 (2019).
31. M. C. Heilemann, "Spatial audio rendering with flat-panel loudspeakers," Ph.D. dissertation, University of Rochester, Rochester, NY, 2018.
32. M. Kuster, D. D. Vries, D. Beer, and S. Brix, "Structural and acoustic analysis of multiactuator panels," J. Audio Eng. Soc. 54(11), 1065–1076 (2006).
33. V. Pulkki, "Generic panning tools for MAX/MSP," in Proceedings of the 2000 International Computer Music Conference, Berlin, Germany, August 27–September 1, 2000, pp. 304–307.
34. V. Pulkki, "Localization of amplitude-panned virtual sources II: Two- and three-dimensional panning," J. Audio Eng. Soc. 49(9), 753–767 (2001).
35. A. Politis, "Microphone array processing for parametric spatial audio techniques," Ph.D. dissertation, Aalto University, Espoo, Finland, 2016.
36. H. Wierstorf and S. Spors, "Sound field synthesis toolbox," in Proceedings of the 132nd Audio Engineering Society Convention, Budapest, Hungary, April 26–29, 2012.
37. U. Zölzer, DAFX: Digital Audio Effects, 2nd ed. (John Wiley & Sons, West Sussex, UK, 2002), pp. 283–289.
38. D. Howard and J. Angus, Acoustics and Psychoacoustics, 4th ed. (Focal Press, Oxford, UK, 2009), pp. 107–111.
39. J. Zheng, T. Zhu, J. Lu, and X. Qiu, "A linear robust binaural sound reproduction system with optimal source distribution strategy," J. Audio Eng. Soc. 63(9), 725–735 (2015).
40. B. Xie, H. Mai, D. Rao, and X. Zhong, "Analysis of and experiments on vertical summing localization of multichannel sound reproduction with amplitude panning," J. Audio Eng. Soc. 67(6), 382–399 (2019).
41. R. Algazi, C. Avendano, and R. O. Duda, "Estimation of a spherical-head model from anthropometry," J. Audio Eng. Soc. 49(6), 472–479 (2001).
42. ITU-R BS.2159-9, Multichannel Sound Technology in Home and Broadcasting Applications (International Telecommunication Union, 2022).
43. B. Xie, Spatial Sound: Principles and Applications (CRC Press, Boca Raton, FL, 2023), pp. 704–710.