Spherical harmonics beamforming (SHB) with solid spherical microphone arrays can identify acoustic sources in all directions simultaneously. To surpass the Rayleigh resolution limit and improve acoustic source identification, this paper applies the high-resolution CLEAN-SC (HR-CLEAN-SC) algorithm, introduced by Sijtsma et al. for beamforming with planar arrays, to SHB. The potential resolution enhancement is typically a factor of about 1.7 relative to the Rayleigh resolution limit. Furthermore, simulations and experiments with spherical arrays demonstrate that HR-CLEAN-SC achieves higher spatial resolution and better accuracy in both localization and quantification than standard CLEAN-SC.
1. Introduction
Solid spherical microphone arrays have been widely used for omnidirectional acoustic source identification because they record comprehensive information about the sound field and offer high numerical stability owing to their strong diffraction effects.1–3 Deconvolution using CLEAN-SC4 for spherical array beamforming takes advantage of the fact that mainlobes are spatially coherent with their sidelobes: it iteratively removes these coherent sidelobes from the results of spherical harmonics beamforming (SHB),1,2 and no point spread functions are required. CLEAN-SC provides high accuracy, effective sidelobe attenuation, efficient computation, fast convergence, and strong robustness. However, if the acoustic sources emit at low frequencies or are spaced too closely to be separated by SHB (limited by the Rayleigh criterion), standard CLEAN-SC loses effectiveness.5,6 In the field of planar array beamforming, to identify low-frequency or closely spaced sources accurately, Sijtsma et al.6,7 proposed a high-resolution extension of CLEAN-SC, called HR-CLEAN-SC, which has been successfully applied to speakers in a laboratory experiment and to a wind tunnel experiment featuring a nose landing gear.7 However, beamforming with planar arrays is restricted to a limited solid angle.2 To identify low-frequency or closely spaced acoustic sources in all directions simultaneously using spherical arrays, this paper adapts HR-CLEAN-SC to SHB with solid spherical microphone arrays.
2. Theory
Figure 1(a) shows a coordinate system whose origin is located at the center of the array. An arbitrary position in 3D space is described by , where denotes the distance to the origin, and denotes the direction, with and being the elevation and azimuth angles, respectively. The symbol ✦ represents the acoustic source, and the symbols ● represent microphones embedded in the array surface with a radius of . is the total number of microphones, and is the microphone index. The sound pressure signal is sampled by the microphone due to the source at and with the wavenumber . Construct a row vector , and is the cross-spectral matrix (CSM) of the sound pressure signals collected by the microphones, where the overbar indicates the average over data blocks and the superscript H represents the Hermitian transpose. SHB essentially processes the CSM through matrix operations with the column weight vector , which is computed based on the orthogonality of spherical harmonics at each focus point , where the assumed acoustic source is positioned. The expression of element in is
where , is the weight applied to the microphone signal, is the truncated upper limit of the spherical harmonics degree,2 is the radial function,2 is the Legendre polynomial of degree , and is the elevation angle of the microphone in the rotated coordinate system.2 Thus, the outputs at focus points near the real acoustic sources are enhanced, and the others are attenuated. Accordingly, the average power output of SHB is5
The mainlobe peak values obtained from Eq. (1) are equal to the sound pressure levels (SPLs) at the array center contributed by the acoustic sources under free-field conditions.2
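The block-averaged CSM described above can be sketched in a few lines. This is only an illustration under stated assumptions: the CSM is taken to be the average over data blocks of the outer product of the Hermitian-transposed row vector of microphone pressures with the row vector itself, and the function name `estimate_csm` and the array layout are hypothetical choices, not taken from the paper.

```python
import numpy as np

def estimate_csm(block_spectra):
    """Estimate the cross-spectral matrix at one frequency.

    block_spectra : complex array of shape (n_blocks, n_mics); each row is
        the row vector of microphone pressures for one data block
        (assumed layout, chosen for this sketch).
    Returns the (n_mics, n_mics) average of p^H p over the blocks.
    """
    p = np.asarray(block_spectra, dtype=complex)
    n_blocks = p.shape[0]
    # Entry (i, j) is the block average of conj(p_i) * p_j.
    return p.conj().T @ p / n_blocks

# Example with synthetic data: 46 blocks, 36 microphones.
rng = np.random.default_rng(0)
spectra = rng.standard_normal((46, 36)) + 1j * rng.standard_normal((46, 36))
csm = estimate_csm(spectra)
```

With this convention, a column weight vector `w` gives the non-negative output `w^H C w`, which equals the block average of `|p w|^2`.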
Define the focus point at which the reconstructed acoustic source generates the CSM of the sound pressure signals at all microphone positions as the “source marker,”6 and call the acoustic source indicated by this focus point the “marked source.” Standard CLEAN-SC uses the peak point in the SHB output as the source marker, and the SPL of the marked source is equal to the peak value. When the mainlobes output by SHB are not fused or are only slightly fused, the contribution of the marked source at the peak point is much larger than that of the other sources, so the CSM , which contains the location and power information of the source, can be reconstructed accurately. Consequently, standard CLEAN-SC identifies the acoustic sources correctly. However, when the mainlobes output by SHB are severely fused, multiple sources contribute substantially at the peak point, and standard CLEAN-SC cannot distinguish these sources accurately. In fact, as long as the output of SHB at each source marker is mainly contributed by its marked source, the CSM can be accurately reconstructed and the acoustic sources can be correctly identified.7 Based on this fact, HR-CLEAN-SC selects alternative source markers to improve the spatial resolution and location accuracy when the mainlobes output by SHB are severely fused. Figure 1(b) (Ref. 6) sketches the main idea of HR-CLEAN-SC, where the symbols ✦ denote acoustic sources, and the peak value output by SHB falls within the circle centered at the sources. The symbol ✮ denotes the source marker selected by standard CLEAN-SC, which is also the peak point in the SHB output. The symbol ⋄ represents the alternative source marker for source 2 selected by HR-CLEAN-SC, which falls in the mainlobe of source 2 and on the boundary of source 1. At the alternative source marker for source 2, the output of SHB contributed by source 2 is far greater than that contributed by source 1.
HR-CLEAN-SC for spherical array beamforming is implemented as follows. First, the number of acoustic sources , the initial source positions , and the SPLs are determined from the distribution matrix of SPL reconstructed by standard CLEAN-SC,5 where is the index of the reconstructed sources, and the SPLs are sorted as . Next, the correct source positions and SPLs are searched for iteratively; each iteration updates the source markers, determines the new source positions and SPLs, and sorts the reconstructed sources by SPL.
The specific steps of the iteration follow:
- Update the source marker position according to Eq. (2), where is the source marker for the source, and is the cost function6 adapted to SHB, which is given by Eq. (3),
where is also the index of the reconstructed sources, denotes the -norm, and defines the row vector of the sound field transfer function from the source to each microphone. When , is to ensure that the output of SHB at the source marker is not more than 6 dB below (that is, not less than 0.25 times) that at the source point.6 When , represents the ratio of the SHB output of all the other sources to that of the source at the focus point. All sources have unit SPL.
- Determine the new source positions and SPLs based on the updated source markers. Through a source coherence analysis similar to that of standard CLEAN-SC, the CSM of the sound pressure signal generated by the source marked by the updated source marker can be obtained from Eqs. (4)–(6),
where is the component coefficient of the marked source and is the coherent source component. Compute the SHB output of the marked source by , and traverse all focus points to obtain the output matrix . The peak value of this output is taken as the SPL of the source, and its position is the new source position. Perform the same analysis for the other sources in sequence.
- Sort the sources obtained in step 2 in descending order of SPL, and then return to step 1 to repeat the cycle.
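The second step above can be illustrated with a short sketch. It does not reproduce Eqs. (4)–(6); instead it assumes the widely used standard CLEAN-SC construction without diagonal removal, in which the part of the CSM coherent with a marker weight vector `w` is `C w w^H C / (w^H C w)`. The function names, the array layout, and the random weight vectors are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def coherent_component(csm, w_marker):
    """Split off the part of the CSM that is coherent with one source marker.

    csm      : (M, M) Hermitian cross-spectral matrix.
    w_marker : (M,) weight vector at the source marker (assumed input).
    Returns the beamformer power at the marker and the coherent CSM part.
    """
    cw = csm @ w_marker
    power = np.real(np.vdot(w_marker, cw))       # w^H C w
    h = cw / power                               # coherent source component
    return power, power * np.outer(h, h.conj())  # C w w^H C / (w^H C w)

def peak_of_marked_source(weights, csm_marked):
    """Beamform the marked-source CSM at all focus points and pick the peak.

    weights : (F, M) array holding one weight vector per focus point.
    Returns the index of the peak focus point and the peak output.
    """
    out = np.real(np.einsum("fm,mn,fn->f", weights.conj(), csm_marked, weights))
    idx = int(np.argmax(out))
    return idx, out[idx]

# Synthetic check: one fully coherent source, random (non-SHB) weights.
rng = np.random.default_rng(1)
n_mics, n_focus = 36, 50
v = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
csm = np.outer(v, v.conj())
weights = rng.standard_normal((n_focus, n_mics)) + 1j * rng.standard_normal((n_focus, n_mics))
power, csm_marked = coherent_component(csm, weights[7])
idx, peak = peak_of_marked_source(weights, csm_marked)
```

For a single source the coherent part equals the full CSM regardless of which marker is used, which is the property that lets a marker away from the peak still represent its source.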
After iterations, the elements in the distribution matrix of SPL are given by Eq. (7).
3. Simulations
To verify the established theory of HR-CLEAN-SC for spherical array beamforming and to compare its performance with that of SHB and CLEAN-SC, simulations are conducted. A 36-element solid spherical microphone array with a radius of 97.5 mm is used, whose geometric setup is shown in Fig. 1(a).5 The specific procedure is as follows. (1) Assume a source distribution, including position, SPL, and frequency. Set the surface of interest as a sphere concentric with the array and with a radius of , where the focus points are spaced and apart, giving 61 × 121 focus points in total. (2) Compute the microphone signals to obtain the CSM.5 (3) Process the CSM with SHB according to Eq. (1) and map the sources. (4) Compute the distribution matrix of SPL iteratively with standard CLEAN-SC according to Ref. 5 and map it. Herein, the safety factor4 is set to 1 to obtain a definite number and positions of sources. (5) According to the HR-CLEAN-SC theory given in Eqs. (2)–(7), compute the distribution matrix iteratively on the basis of the matrix and map it. When mapping the distribution matrix , a smooth image can be obtained by using the normalized clean beam function. The clean beam width here is defined as , the focus point is the identified source point in the reconstructed distribution matrix , and the SPL at a focus point around it can then be computed by , where denotes the normalized clean beam function, and denotes the angle between the vectors inside it. When , , and . When , . The clean beam width is set to 5° in this paper, the number of iterations for HR-CLEAN-SC is set to 5, and standard CLEAN-SC uses the following termination condition:
where is the total number of iterations, is the element at row and column in the “degraded” CSM of standard CLEAN-SC,5 and represents the modulus.
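The focus grid and the angle between a focus direction and an identified source direction, which the clean beam mapping relies on, can be sketched as follows. The grid limits are an assumption: 61 × 121 points are consistent with elevation running from 0° to 180° and azimuth from 0° to 360° in 3° steps, but the spacing values are not legible in the text. Elevation is also assumed to be measured from the z axis, and the helper names are hypothetical. The clean beam function itself is not reproduced.

```python
import numpy as np

def unit_vectors(elev_deg, azim_deg):
    """Unit direction vectors, with elevation measured from the z axis (assumed)."""
    th = np.deg2rad(elev_deg)
    ph = np.deg2rad(azim_deg)
    return np.stack(
        [np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)], axis=-1
    )

def angle_between_deg(u, v):
    """Angle in degrees between unit vectors, clipped for numerical safety."""
    cosang = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
    return np.rad2deg(np.arccos(cosang))

# Assumed grid: 3-degree spacing in both angles gives 61 x 121 focus points.
elev = np.linspace(0.0, 180.0, 61)
azim = np.linspace(0.0, 360.0, 121)
elev_grid, azim_grid = np.meshgrid(elev, azim, indexing="ij")
focus_dirs = unit_vectors(elev_grid, azim_grid)   # shape (61, 121, 3)

# Angular distance of every focus point from a hypothetical source direction.
source_dir = unit_vectors(90.0, 180.0)
separation = angle_between_deg(focus_dirs, source_dir)  # shape (61, 121)
```

The `separation` array is the quantity a normalized clean beam function would take as input when smoothing each identified source over neighbouring focus points.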
3.1 Dual sources with equal SPL
Contour maps in Fig. 2 show simulations of two incoherent white noise sources with equal SPL of 60 dB, located at and , respectively. The focus distance is set to 1 m. In each map, the real source direction is marked by the symbol ○. Compared with the output of SHB, which has wide mainlobes and plentiful sidelobes, both standard CLEAN-SC and HR-CLEAN-SC improve the spatial resolution of the identification results and remove the sidelobes completely. At 900 Hz, owing to the poor resolution of SHB at low frequency, the mainlobes of the two sources output by SHB are severely fused. The two sources identified by standard CLEAN-SC deviate considerably from the real sources in both location and quantification, whereas the two sources identified by HR-CLEAN-SC are close to the real ones. The results at 1800 Hz are similar to those at 900 Hz, except that HR-CLEAN-SC already identifies the left source accurately. At 3600 Hz, there is still a large deviation between the real sources and those identified by standard CLEAN-SC, but HR-CLEAN-SC identifies the two sources accurately. These results show that standard CLEAN-SC cannot distinguish the acoustic sources accurately when the mainlobes output by SHB are severely fused. HR-CLEAN-SC overcomes this limitation effectively and has higher spatial resolution and location accuracy. Typically, the factor of the potential resolution enhancement6 is about 1.7 compared to the Rayleigh resolution limit. It should be noted that sources with equal SPL, as shown in Fig. 2, represent the worst case for HR-CLEAN-SC. When two sources have unequal SPLs, the peak point in the SHB output is closer to the louder one, and the associated source component of standard CLEAN-SC contains less energy from the secondary source, which makes it easier for HR-CLEAN-SC to find a suitable source marker.
3.2 Four sources with unequal SPLs
To study the case of multiple sources with unequal SPLs, Fig. 3 shows contour maps of four sources located at , , , and , respectively, with corresponding SPLs of 54, 60, 56, and 50 dB. At 900 and 1800 Hz, owing to the poor resolution of SHB at low frequency, the mainlobes of the four sources are fused too severely for standard CLEAN-SC to determine the number of sources: only three sources are identified, and they lie far from the real ones. Since HR-CLEAN-SC determines the number of sources from the result of standard CLEAN-SC, it also identifies only three sources, but with higher accuracy. At 3600 Hz, the two strong sources identified by standard CLEAN-SC are much closer to the real ones, but the weak sources still deviate. HR-CLEAN-SC identifies all four sources accurately. However, the result is not as good as that of the case with two sources. This shows that the more severely the mainlobes output by SHB are fused, the smaller the resolution improvement of HR-CLEAN-SC is. Herein, the transfer functions processed by the cost function in Eq. (3) are more closely constrained to each other, so there is less freedom in minimizing Eq. (3), and the resolution improvement of HR-CLEAN-SC is reduced accordingly.
4. Experiments
To verify the conclusions of the simulations, experiments are conducted on four small loudspeakers in a spacious meeting room, using a Brüel & Kjær type 8606 solid spherical microphone array, the same as the one used in the simulations. All four loudspeakers are located approximately 1 m from the array center. The sampling frequency is 16384 Hz. A Hanning window is applied, the overlap is 66.7%, the number of blocks averaged is 46, each block has a length of 0.25 s, and the frequency resolution is 4 Hz. The remaining calculation parameters are consistent with those of the simulations. Figure 4 shows the contour maps when the four loudspeakers are simultaneously excited by four incoherent stationary white noise signals. At 900 and 1800 Hz, the mainlobes output by SHB are fused too severely, so that both standard CLEAN-SC and HR-CLEAN-SC identify only three sources, which are far away from the real ones, especially those identified by standard CLEAN-SC. At 3600 Hz, there is still some deviation between the real sources and those identified by standard CLEAN-SC. However, HR-CLEAN-SC already identifies the four sources accurately in terms of both location and quantification. These conclusions are consistent with those of the simulations.
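The acquisition parameters quoted above are mutually consistent, as the following short check illustrates. It treats the 66.7% overlap as exactly two thirds, which is an assumption, and the variable names are chosen only for this sketch.

```python
# Acquisition parameters quoted in the text.
sampling_rate_hz = 16384
block_length_s = 0.25
n_blocks = 46
overlap = 2.0 / 3.0  # assumption: 66.7% is taken as exactly two thirds

# Samples per block and the resulting frequency resolution.
samples_per_block = int(sampling_rate_hz * block_length_s)
frequency_resolution_hz = sampling_rate_hz / samples_per_block

# Each new block advances by the non-overlapping part of a block.
hop_s = block_length_s * (1.0 - overlap)
record_length_s = block_length_s + (n_blocks - 1) * hop_s

print(samples_per_block, frequency_resolution_hz, round(record_length_s, 6))
```

Under these assumptions each block holds 4096 samples, the resolution is 4 Hz as stated, and the 46 overlapping blocks span about 4 s of signal.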
5. Conclusions
HR-CLEAN-SC, a high-resolution extension of standard CLEAN-SC, is adapted to SHB with spherical microphone arrays in this paper. The algorithm takes advantage of the fact that a point can mark a source as long as the output of SHB at this point is mainly contributed by this source. In each iteration, it searches for suitable source markers at which the relative influence of the SHB outputs of other marked sources is minimized, so that the acoustic sources can be reconstructed more accurately in terms of both location and quantification. HR-CLEAN-SC effectively improves the performance of acoustic source identification when the mainlobes output by SHB are severely fused and the sources cannot be accurately distinguished by standard CLEAN-SC. Simulations and experiments demonstrate that the modified HR-CLEAN-SC algorithm for spherical arrays has higher spatial resolution and location accuracy than standard CLEAN-SC. Typically, the spatial resolution can be increased by a factor of 1.7 compared to the Rayleigh resolution limit. However, the more severely the mainlobes are fused, the smaller the resolution improvement is.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant Nos. 11774040 and 11874096 and by the Fundamental Research Funds for the Central Universities under Grant Nos. 2018CDQYHK0031 and 2018CDXYTW0031.