Spherical harmonics beamforming (SHB) with solid spherical microphone arrays can identify acoustic sources in all directions simultaneously. To surpass the Rayleigh resolution limit and improve acoustic source identification, this paper applies the high-resolution CLEAN-SC (HR-CLEAN-SC) algorithm, introduced by Sijtsma et al. for beamforming with planar arrays, to SHB. The potential resolution enhancement is typically a factor of about 1.7 relative to the Rayleigh resolution limit. Furthermore, simulations and experiments with spherical arrays demonstrate that HR-CLEAN-SC achieves higher spatial resolution and better accuracy in both localization and quantification than standard CLEAN-SC.

Solid spherical microphone arrays have been widely used for omnidirectional acoustic source identification because they record comprehensive information about the sound field and offer high numerical stability owing to their strong diffraction effects.1–3 Deconvolution using CLEAN-SC4 for spherical array beamforming exploits the fact that mainlobes are spatially coherent with their sidelobes: it iteratively removes these coherent sidelobes from the results of spherical harmonics beamforming (SHB),1,2 and no point spread functions are required. CLEAN-SC provides high accuracy, effective sidelobe attenuation, efficient computation, fast convergence, and strong robustness. However, if the acoustic sources radiate at low frequencies or are spaced too closely to be separated by SHB (which is limited by the Rayleigh criterion), standard CLEAN-SC loses effectiveness.5,6 In the field of planar array beamforming, to identify low-frequency or closely spaced sources accurately, Sijtsma et al.6,7 proposed a high-resolution extension of CLEAN-SC, called HR-CLEAN-SC, which has been successfully applied to loudspeakers in a laboratory experiment and to a wind tunnel experiment featuring a nose landing gear.7 However, beamforming with planar arrays is restricted to a limited solid angle.2 To make it feasible to identify low-frequency or closely spaced acoustic sources in all directions simultaneously using spherical arrays, this paper adapts HR-CLEAN-SC to SHB with solid spherical microphone arrays.

Figure 1(a) shows a coordinate system whose origin is located at the center of the array. An arbitrary position in 3D space is described by $(r,\Omega)$, where $r$ denotes the distance to the origin and $\Omega=(\theta,\varphi)$ denotes the direction, with $0\le\theta\le180^\circ$ and $0\le\varphi\le360^\circ$ being the elevation and azimuth angles, respectively. The symbol ✦ represents the acoustic source, and the symbols ● represent microphones embedded in the array surface, which has radius $a$. $Q$ is the total number of microphones, and $q=1,2,\ldots,Q$ is the microphone index. The sound pressure signal $p(ka,\Omega_q)$ is sampled by the $q$th microphone due to the source at $(r_0,\Omega_0)$ with wavenumber $k$. Construct the row vector $\mathbf{p}=[p(ka,\Omega_1),p(ka,\Omega_2),\ldots,p(ka,\Omega_Q)]$; then $\mathbf{C}=\overline{\mathbf{p}^{H}\mathbf{p}}$ is the cross-spectral matrix (CSM) of the sound pressure signals collected by the microphones, where the overbar indicates the average over data blocks and the superscript $H$ represents the Hermitian transpose. SHB essentially processes the CSM $\mathbf{C}$ by matrix operations with the column weight vector $\mathbf{u}(kr_f,\Omega_f)=[u_1(kr_f,\Omega_f),u_2(kr_f,\Omega_f),\ldots,u_Q(kr_f,\Omega_f)]^{T}$, which is computed based on the orthogonality of spherical harmonics at each focus point $(r_f,\Omega_f)$, where an acoustic source is assumed to be located. The $q$th element of $\mathbf{u}(kr_f,\Omega_f)$ is

where $i=\sqrt{-1}$, $\alpha_q$ is the weight applied to the $q$th microphone signal, and $N$ is the truncation limit of the spherical harmonics degree,2 $R_n(kr_0,ka)$ is the radial function,2 $Q_n$ is the Legendre polynomial of degree $n$, and $\psi_q$ is the elevation angle of the $q$th microphone in the rotated coordinate system.2 Thus, the outputs at focus points near the real acoustic sources are enhanced, and the others are attenuated. Accordingly, the average power output of SHB is5 

$$W(kr_f,\Omega_f)=\mathbf{u}^{H}(kr_f,\Omega_f)\,\mathbf{C}\,\mathbf{u}(kr_f,\Omega_f).\tag{1}$$

The mainlobe peak values obtained from Eq. (1) are equal to the sound pressure levels (SPLs) at the array center contributed by the acoustic sources under free-field conditions.2 
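
A minimal NumPy sketch of this processing chain, under stated assumptions, is given here. It takes the weight vectors as precomputed inputs (their spherical-harmonic expression is not re-derived), forms the CSM by block averaging, and evaluates the quadratic form $\mathbf{u}^{H}\mathbf{C}\mathbf{u}$, the same form used for the marked-source output in step 2 of the iteration described later. The function names `cross_spectral_matrix` and `shb_power` and the array layouts are illustrative choices, not part of the original formulation.

```python
import numpy as np

def cross_spectral_matrix(blocks):
    """CSM C = average over blocks of p^H p, with p a row vector of Q spectra.

    blocks: complex array of shape (B, Q), one row per data block.
    """
    blocks = np.asarray(blocks, dtype=complex)
    # Sum over blocks of conj(p)^T p, divided by the number of blocks -> (Q, Q)
    return blocks.conj().T @ blocks / blocks.shape[0]

def shb_power(csm, weights):
    """Beamformer power u^H C u at every focus point.

    weights: complex array of shape (F, Q); row f is the (assumed precomputed)
    weight vector u of focus point f.
    """
    weights = np.asarray(weights, dtype=complex)
    # einsum evaluates conj(u_f)^T C u_f for each focus point f
    power = np.einsum("fq,qr,fr->f", weights.conj(), csm, weights)
    return power.real
```

Because $\mathbf{u}^{H}\mathbf{C}\mathbf{u}$ equals the block average of $|\mathbf{p}\mathbf{u}|^2$, the result can be cross-checked against a direct evaluation of the beamformed block spectra.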

Fig. 1.

(Color online) (a) Coordinate system fixed on the solid spherical microphone array. (b) Sketch of the main idea of HR-CLEAN-SC (Ref. 6).


Define the focus point from which the CSM G, induced at all microphone positions by the reconstructed acoustic source, is obtained as the “source marker,”6 and call the acoustic source indicated by this focus point the “marked source.” Standard CLEAN-SC uses the peak point in the SHB output as the source marker, and the SPL of the marked source equals the peak value. When the mainlobes output by SHB are not fused, or only slightly fused, the contribution of the marked source at the peak point is much larger than that of the other sources, so the CSM G, which contains the location and power information of the source, can be reconstructed accurately. Consequently, standard CLEAN-SC identifies the acoustic sources correctly. However, when the mainlobes output by SHB are severely fused, several sources contribute substantially at the peak point, and standard CLEAN-SC cannot distinguish these sources accurately. In fact, as long as the SHB output at each source marker is dominated by its marked source, the CSM G can be reconstructed accurately and the acoustic sources can be identified correctly.7 HR-CLEAN-SC exploits this fact by selecting alternative source markers, which improves the spatial resolution and location accuracy when the mainlobes output by SHB are severely fused. Figure 1(b) (Ref. 6) sketches the main idea of HR-CLEAN-SC, where the symbols ✦ denote acoustic sources, and the peak value output by SHB falls within the circle centered at the sources. The symbol ✮ denotes the source marker selected by standard CLEAN-SC, which is also the peak point in the SHB output. The symbol ⋄ represents the alternative source marker for source 2 selected by HR-CLEAN-SC, which falls in the mainlobe of source 2 and on the boundary of source 1. At the alternative source marker for source 2, the SHB output contributed by source 2 is far larger than that contributed by source 1.
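
To make the role of the source marker concrete, the following sketch performs one source-coherence step with the beamforming peak as marker, as standard CLEAN-SC does. It assumes the basic CLEAN-SC construction of Ref. 4 without diagonal removal, namely $\mathbf{h}=\mathbf{C}\mathbf{u}/(\mathbf{u}^{H}\mathbf{C}\mathbf{u})$ and $\mathbf{G}=(\mathbf{u}^{H}\mathbf{C}\mathbf{u})\,\mathbf{h}\mathbf{h}^{H}$; the expressions actually used for the spherical array may differ in detail. The names `marked_source_csm` and `clean_sc_peak_step` are hypothetical, and the weight vectors are again treated as given.

```python
import numpy as np

def marked_source_csm(csm, u_marker):
    """Rebuild the CSM G of the source marked by the weight vector u_marker.

    Assumes no diagonal removal: h = C u / (u^H C u), G = (u^H C u) h h^H.
    """
    csm = np.asarray(csm, dtype=complex)
    u = np.asarray(u_marker, dtype=complex)
    cu = csm @ u                      # part of the CSM coherent with the marker output
    power = np.real(np.vdot(u, cu))   # beamformer output u^H C u at the marker
    h = cu / power                    # coherent source component
    return power * np.outer(h, h.conj()), power

def clean_sc_peak_step(csm, weights):
    """One step with the beamforming peak as source marker (standard CLEAN-SC choice)."""
    weights = np.asarray(weights, dtype=complex)
    dirty_map = np.einsum("fq,qr,fr->f", weights.conj(), csm, weights).real
    peak = int(np.argmax(dirty_map))
    g, power = marked_source_csm(csm, weights[peak])
    return peak, power, csm - g       # csm - g is the degraded CSM
```

For a single source the marker output is contributed entirely by that source, so G reproduces the full CSM and the degraded CSM vanishes. With fused mainlobes this no longer holds at the peak, which is the situation the alternative markers address.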

HR-CLEAN-SC for spherical array beamforming is implemented as follows. First, the number of acoustic sources $S$, the initial source positions $(r_{0s}^{(0)},\Omega_{0s}^{(0)})$, and the SPLs $P_c^{(0)}(r_{0s}^{(0)},\Omega_{0s}^{(0)})$ are determined from the SPL distribution matrix $\mathbf{P}_c^{(0)}$ reconstructed by standard CLEAN-SC,5 where $s=1,2,\ldots,S$ is the index of the reconstructed sources and the SPLs are sorted as $P_c^{(0)}(r_{01}^{(0)},\Omega_{01}^{(0)})>P_c^{(0)}(r_{02}^{(0)},\Omega_{02}^{(0)})>\cdots>P_c^{(0)}(r_{0S}^{(0)},\Omega_{0S}^{(0)})$. Next, the correct source positions and SPLs are sought iteratively; each iteration updates the source markers, determines the new source positions and SPLs, and sorts the reconstructed sources by SPL.

The specific steps of the lth iteration follow:

  1. Update the source marker position $(r_{\max s}^{(l)},\Omega_{\max s}^{(l)})$,
    (2)
    where $(r_{\max s}^{(l)},\Omega_{\max s}^{(l)})$ is the source marker for the $s$th source, and $F((kr_f,\Omega_f)\,|\,(kr_{0s}^{(l-1)},\Omega_{0s}^{(l-1)}))$ is the cost function6 adapted to SHB, whose expression is
    (3)

    where $v$ is also an index of the reconstructed sources, $\|\cdot\|_2$ denotes the 2-norm, and $\mathbf{t}(kr_0,\Omega_0)$ is the row vector of sound field transfer functions from the source to each microphone. When $|\mathbf{t}(kr_{0v}^{(l-1)},\Omega_{0v}^{(l-1)})\mathbf{u}(kr_f,\Omega_f)|^2<0.25$, setting $F((kr_f,\Omega_f)\,|\,(kr_{0s}^{(l-1)},\Omega_{0s}^{(l-1)}))=+\infty$ ensures that the SHB output at the source marker is not more than 6 dB (a factor of 0.25) below that at the source point.6 When $|\mathbf{t}(kr_{0v}^{(l-1)},\Omega_{0v}^{(l-1)})\mathbf{u}(kr_f,\Omega_f)|^2\ge0.25$, $F((kr_f,\Omega_f)\,|\,(kr_{0s}^{(l-1)},\Omega_{0s}^{(l-1)}))$ represents the ratio of the SHB outputs of all sources other than the $s$th one at the focus point $(r_f,\Omega_f)$ to that of the $s$th one. All sources are assigned unit SPL.

  2. Determine the new source positions and SPLs based on the updated source markers. Through a source coherence analysis similar to that of standard CLEAN-SC, the CSM $\mathbf{G}_s^{(l)}$ of the sound pressure signals generated by the source marked by the updated source marker $(r_{\max s}^{(l)},\Omega_{\max s}^{(l)})$ can be obtained by
    (4)
    (5)
    (6)

    where $\lambda(kr_{\max s}^{(l)},\Omega_{\max s}^{(l)})$ is the component coefficient of the marked source and $\mathbf{h}(kr_{\max s}^{(l)},\Omega_{\max s}^{(l)})$ is the coherent source component. Compute the SHB output of the marked source by $W_s^{(l)}(kr_f,\Omega_f)=\mathbf{u}^{H}(kr_f,\Omega_f)\mathbf{G}_s^{(l)}\mathbf{u}(kr_f,\Omega_f)$, and traverse all focus points to obtain the output matrix $\mathbf{W}_s^{(l)}$. The peak value $P_c^{(l)}(r_{0s}^{(l)},\Omega_{0s}^{(l)})=\max\mathbf{W}_s^{(l)}$ is taken as the SPL of the $s$th source, and its position $(r_{0s}^{(l)},\Omega_{0s}^{(l)})$ is the new source position. Perform the same analysis for the other sources in sequence.

  3. Sort the sources obtained by step 2 in descending order of SPL, and then return to step 1 to repeat the cycle.
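
The marker update of step 1 can be sketched as follows. Since the exact expression of the cost function is not reproduced here, the sketch assumes one plausible form consistent with its verbal description: the summed unit-strength SHB output of the other sources divided by that of the $s$th source, set to $+\infty$ wherever the $s$th source's own output drops below 0.25 (the 6 dB bound of Ref. 6 applied to the marked source itself). The function name `select_source_markers`, the array layouts, and the normalization $|\mathbf{t}\mathbf{u}|^2=1$ at the source position are assumptions made for illustration.

```python
import numpy as np

def select_source_markers(transfer, weights, floor=0.25):
    """Pick one marker (focus-point index) per source by minimizing an assumed cost.

    transfer: (S, Q) rows t_v, transfer vectors of the current source estimates.
    weights:  (F, Q) rows u_f, weight vectors of the focus points.
    Assumes |t_v u_f|^2 equals 1 when focus point f coincides with source v.
    """
    transfer = np.asarray(transfer, dtype=complex)
    weights = np.asarray(weights, dtype=complex)
    # response[v, f] = |t_v u_f|^2: unit-strength output of source v at focus point f
    response = np.abs(transfer @ weights.T) ** 2
    total = response.sum(axis=0)
    markers = []
    for s in range(transfer.shape[0]):
        own = response[s]
        others = total - own              # combined output of all other sources
        cost = np.full(own.shape, np.inf)
        ok = own >= floor                 # stay within 6 dB of the source's own peak
        cost[ok] = others[ok] / own[ok]
        # If no focus point passes the bound, argmin falls back to index 0.
        markers.append(int(np.argmin(cost)))
    return markers
```

In a full iteration, each returned marker would then feed the source coherence analysis of step 2, after which the sources are re-sorted by SPL as in step 3.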

After $L$ iterations, the elements of the SPL distribution matrix $\mathbf{P}_c^{(L)}$ are as follows:

(7)

To verify the HR-CLEAN-SC theory established for spherical array beamforming and to compare its performance with that of SHB and CLEAN-SC, simulations are conducted. A 36-element solid spherical microphone array with a radius of 97.5 mm is used, whose geometric setup is shown in Fig. 1(a).5 The procedure is as follows. (1) Assume a source distribution, including position, SPL, and frequency. Set the surface of interest as a sphere concentric with the array and of radius $r_f$, on which the focus points are spaced $\Delta\theta=3^\circ$ and $\Delta\varphi=3^\circ$ apart, giving 61 × 121 focus points in total. (2) Compute the microphone signals to obtain the CSM.5 (3) Process the CSM with SHB according to Eq. (1) and map the sources. (4) Compute the SPL distribution matrix $\mathbf{P}_c^{(0)}$ iteratively with standard CLEAN-SC according to Ref. 5 and map it. Herein, the safety factor4 is set to 1 to obtain a definite number of sources and their positions. (5) Following the HR-CLEAN-SC theory in Eqs. (2)–(7), compute the distribution matrix $\mathbf{P}_c^{(L)}$ iteratively from the matrix $\mathbf{P}_c^{(0)}$ and map it. When mapping a distribution matrix $\mathbf{P}_c$, a smooth image is obtained by using the normalized clean beam function. Let $\gamma$ denote the clean beam width and let the focus point $(r_{0\max},\Omega_{0\max})$ be an identified source point in the reconstructed distribution matrix $\mathbf{P}_c$; the SPL at a focus point $(r_f,\Omega_f)$ around it is then computed by $P_c(r_f,\Omega_f)=P_c(r_{0\max},\Omega_{0\max})\,\Psi(\langle(r_f,\Omega_f),(r_{0\max},\Omega_{0\max})\rangle)$, where $\Psi$ denotes the normalized clean beam function and $\langle\cdot\rangle$ denotes the angle between the two vectors. When $\langle(r_f,\Omega_f),(r_{0\max},\Omega_{0\max})\rangle\le\gamma$, $0<\Psi\le1$, with $\Psi(0)=1$. When $\langle(r_f,\Omega_f),(r_{0\max},\Omega_{0\max})\rangle>\gamma$, $\Psi=0$. The clean beam width $\gamma$ is set to 5° in this paper, the number of iterations for HR-CLEAN-SC is set to 5, and standard CLEAN-SC uses the following termination condition:

(8)

where $L_1$ is the total number of iterations, $D(i,j)$ is the element in the $i$th row and $j$th column of the “degraded” CSM $\mathbf{D}$ of standard CLEAN-SC,5 and $|\cdot|$ represents the modulus.
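
The focus grid and the clean-beam mapping of the simulation procedure can be illustrated with a short standard-library sketch. The grid follows the stated 3° spacing. The clean beam shape is not specified beyond its constraints (equal to 1 at zero angle, positive up to the width γ, zero beyond), so the parabolic taper used in `clean_beam` is purely a hypothetical choice, as are all function names. Whether the mapped value is a level in dB or a linear quantity is left open here.

```python
import math

def focus_grid(step_deg=3):
    """Elevation and azimuth focus directions covering 0..180 and 0..360 degrees."""
    thetas = list(range(0, 181, step_deg))
    phis = list(range(0, 361, step_deg))
    return thetas, phis

def angle_between(dir_a, dir_b):
    """Angle in degrees between two directions given as (theta, phi) in degrees."""
    ta, pa, tb, pb = map(math.radians, (*dir_a, *dir_b))
    cos_angle = (math.cos(ta) * math.cos(tb)
                 + math.sin(ta) * math.sin(tb) * math.cos(pa - pb))
    # Clamp against rounding before taking the arccosine
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def clean_beam(angle_deg, width_deg=5.0):
    """Hypothetical normalized clean beam: 1 at 0 deg, 0.5 at the width, 0 beyond."""
    if angle_deg > width_deg:
        return 0.0
    return 1.0 - 0.5 * (angle_deg / width_deg) ** 2

def smooth_map(source_dir, source_value, thetas, phis, width_deg=5.0):
    """Spread one identified source over the focus grid with the clean beam."""
    return [[source_value * clean_beam(angle_between((t, p), source_dir), width_deg)
             for p in phis] for t in thetas]
```

With the default step, `focus_grid()` returns 61 elevation and 121 azimuth values, matching the 61 × 121 focus points quoted above.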

Contour maps in Fig. 2 show simulations of two incoherent white noise sources with equal SPLs of 60 dB, located at (1 m, 102°, 168°) and (1 m, 102°, 192°), respectively. The focus distance is set to 1 m. In each map, the real source direction is marked by the symbol ○. Compared with the SHB output, which has wide mainlobes and numerous sidelobes, both standard CLEAN-SC and HR-CLEAN-SC improve the spatial resolution of the identification results and eliminate the sidelobes completely. At 900 Hz, owing to the poor resolution of SHB at low frequency, the mainlobes of the two sources output by SHB are severely fused. The two sources identified by standard CLEAN-SC deviate considerably from the real sources in both location and quantification, whereas the two sources identified by HR-CLEAN-SC are close to the real ones. The results at 1800 Hz are similar to those at 900 Hz, except that HR-CLEAN-SC already identifies the left source accurately. At 3600 Hz, there is still a large deviation between the real sources and those identified by standard CLEAN-SC, but HR-CLEAN-SC identifies both sources accurately. These results show that standard CLEAN-SC cannot distinguish the acoustic sources accurately when the mainlobes output by SHB are severely fused. HR-CLEAN-SC overcomes this limitation effectively and has higher spatial resolution and location accuracy. Typically, the potential resolution enhancement6 is a factor of about 1.7 relative to the Rayleigh resolution limit. It should be noted that sources with equal SPL, as shown in Fig. 2, represent the worst case for HR-CLEAN-SC. When two sources have unequal SPLs, the peak point in the SHB output is closer to the louder one, and the associated source component of standard CLEAN-SC contains less energy from the secondary source, which makes it easier for HR-CLEAN-SC to find a suitable source marker.

Fig. 2.

(Color online) Contour maps showing simulations of two equal sources at 900, 1800, and 3600 Hz after different post-processing algorithms: (a), (d), and (g) for SHB; (b), (e), and (h) for standard CLEAN-SC; (c), (f), and (i) for HR-CLEAN-SC.


To study the case of multiple sources with unequal SPLs, Fig. 3 shows contour maps of four sources located at (1 m, 102°, 168°), (1 m, 102°, 192°), (1 m, 78°, 168°), and (1 m, 78°, 192°), with SPLs of 54, 60, 56, and 50 dB, respectively. At 900 and 1800 Hz, owing to the poor resolution of SHB at low frequency, the mainlobes of the four sources are fused too severely for standard CLEAN-SC to determine the number of sources: only three sources are identified, and they lie far from the real ones. Since HR-CLEAN-SC determines the number of sources from the result of standard CLEAN-SC, it also identifies only three sources, but with higher accuracy. At 3600 Hz, the two strong sources identified by standard CLEAN-SC are much closer to the real ones, but the weak sources still deviate. HR-CLEAN-SC identifies all four sources accurately. However, the result is not as good as in the case with two sources. This shows that the more severely the mainlobes output by SHB are fused, the smaller the resolution improvement of HR-CLEAN-SC is. Here, the transfer functions $\mathbf{t}(kr_0,\Omega_0)$ processed by the cost function in Eq. (3) constrain each other more closely, so there is less freedom in minimizing Eq. (3), and the resolution improvement of HR-CLEAN-SC is reduced accordingly.

Fig. 3.

(Color online) Contour maps showing simulations of four unequal sources at 900, 1800, and 3600 Hz after different post-processing algorithms: (a), (d), and (g) for SHB; (b), (e), and (h) for standard CLEAN-SC; (c), (f), and (i) for HR-CLEAN-SC.


To verify the conclusions drawn from the simulations, experiments are conducted on four small loudspeakers in a spacious meeting room, using a Brüel & Kjær type 8606 solid spherical microphone array identical to the one used in the simulations. All four loudspeakers are located approximately 1 m from the array center. The sampling frequency is 16384 Hz. A Hanning window is applied, the overlap is 66.7%, the number of blocks averaged is 46, each block has a length of 0.25 s, and the frequency resolution is 4 Hz. The remaining calculation parameters are the same as in the simulations. Figure 4 shows the contour maps when the four loudspeakers are simultaneously excited by four incoherent stationary white noise signals. At 900 and 1800 Hz, the mainlobes output by SHB are fused so severely that both standard CLEAN-SC and HR-CLEAN-SC identify only three sources, which lie far from the real ones, especially those identified by standard CLEAN-SC. At 3600 Hz, there is still some deviation between the real sources and those identified by standard CLEAN-SC. However, HR-CLEAN-SC already identifies the four sources accurately in both location and quantification. These conclusions are consistent with those of the simulations.
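
The acquisition settings quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that 66.7% denotes a two-thirds overlap and that the 46 blocks are cut from one contiguous record; the function name `block_averaging_parameters` is hypothetical.

```python
def block_averaging_parameters(fs_hz, block_s, overlap, n_blocks):
    """Derive block size, frequency resolution, and record length for block averaging."""
    block_len = round(fs_hz * block_s)      # samples per block
    freq_res = fs_hz / block_len            # spacing of the FFT bins in Hz
    hop = block_len * (1.0 - overlap)       # samples between successive block starts
    duration = (block_len + (n_blocks - 1) * hop) / fs_hz
    return block_len, freq_res, duration

block_len, freq_res, duration = block_averaging_parameters(16384, 0.25, 2 / 3, 46)
```

Under these assumptions each block holds 4096 samples, the frequency resolution is 4 Hz as stated, and the 46 overlapping blocks span a record of about 4 s.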

Fig. 4.

(Color online) Contour maps showing identification results of four loudspeakers at 900, 1800, and 3600 Hz after different post-processing algorithms: (a), (d), and (g) for SHB; (b), (e), and (h) for standard CLEAN-SC; (c), (f), and (i) for HR-CLEAN-SC.


HR-CLEAN-SC, a high-resolution extension of standard CLEAN-SC, is adapted to SHB with spherical microphone arrays in this paper. The algorithm exploits the fact that a point can mark a source as long as the SHB output at this point is contributed mainly by that source. In each iteration it searches for suitable source markers at which the relative influence of the SHB outputs of the other marked sources is minimized, so that the acoustic sources can be reconstructed more accurately in terms of both location and quantification. HR-CLEAN-SC can effectively improve the performance of acoustic source identification when the mainlobes output by SHB are severely fused and the sources cannot be accurately distinguished by standard CLEAN-SC. Simulations and experiments demonstrate that the modified HR-CLEAN-SC algorithm for spherical arrays has higher spatial resolution and location accuracy than standard CLEAN-SC. Typically, the spatial resolution can be increased by a factor of about 1.7 relative to the Rayleigh resolution limit. However, the more severely the mainlobes are fused, the smaller the resolution improvement is.

This work was supported by the National Natural Science Foundation of China under Grant Nos. 11774040 and 11874096 and by the Fundamental Research Funds for the Central Universities under Grant Nos. 2018CDQYHK0031 and 2018CDXYTW0031.

1. B. Rafaely, “Plane wave decomposition of the sound field on a sphere by spherical convolution,” J. Acoust. Soc. Am. 116(4), 2149–2157 (2004).
2. K. Haddad and J. Hald, “3D localization of acoustic sources with a spherical array,” in Proceedings of the 7th European Conference on Noise Control, Paris, France (June 29–July 4, 2008), pp. 1585–1590.
3. C. Jin, C. E. Nicolas, and P. Abhaya, “Design, optimization and evaluation of a dual-radius spherical microphone array,” IEEE Trans. Audio Speech Lang. Process. 22(1), 193–204 (2014).
4. P. Sijtsma, “CLEAN based on spatial source coherence,” Int. J. Aeroacoust. 6(4), 357–374 (2007).
5. Z. Chu, S. Zhao, Y. Yang, and Y. Yang, “Deconvolution using CLEAN-SC for acoustic source identification with spherical microphone arrays,” J. Sound Vib. 440, 161–173 (2019).
6. P. Sijtsma, R. Merino-Martinez, A. Malgoezar, and M. Snellen, “High-resolution CLEAN-SC: Theory and experimental validation,” Int. J. Aeroacoust. 16(4-5), 274–298 (2017).
7. R. Merino-Martinez, E. Neri, M. Snellen, J. Kennedy, D. G. Simons, and G. J. Bennett, “Analysis of nose landing gear noise comparing numerical computations, prediction models and flyover and wind-tunnel measurements,” in 24th AIAA/CEAS Aeroacoustics Conference, Atlanta, GA (June 25–29, 2018), AIAA paper 2018-3299.