This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, modelling the sound field is difficult and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state-of-the-art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper also briefly addresses some practical issues related to array design and placement.
I. INTRODUCTION
The optimisation of the acoustic design of cars over the past years has led to significant reduction in the engine-, wind- and tyre-noise, chassis vibration noise, etc. Due to this lowering of the overall sound level, a variety of low-level contributors to the acoustic footprint are now coming into the focus of the automobile manufacturers. One such contributor is brake squeal—the high-pitched noise sometimes emitted when brakes are applied. This is known to be very annoying and is a major concern for the manufacturers. The in situ detection and localisation of brake squeal is, however, a challenging problem since the scenario in which brake squeal occurs is a dynamic one: the background noise is highly time-variant depending upon road surfacing, atmospheric conditions, age and state of components, speed of motion, and engine condition, to name just a few uncontrollable factors. Further, most brake squeal is produced by vibration of the brake components, especially the pads and discs, due to resonance. Thus, the source of the squeal is distributed over the disk, pad and calliper surfaces, giving it a significant spatial spread. The squeal is narrow-band in nature, with the squeal frequencies typically ranging from 1 to 16 kHz. Also, the squeal frequencies are usually different for each wheel, and dependent upon the speed of the automobile when braking, age of the components, environmental conditions (e.g., heat, moisture), brake pressure, etc. Analysis and quantification of the various factors inducing brake squeal would therefore help the manufacturers further reduce this contributor to the acoustic footprint.
Depending upon the aim of the analysis, this can be conducted from several different perspectives. One good classification, suggested in Mauer and Haverkamp (2007), consists of the following three approaches:
A driver-focussed analysis, which considers the perceivable attributes of brake squeal in the interior of the automobile, with the aim of optimising brake noise and vibration as part of the driving experience.
A pedestrian-focussed analysis, which considers the perception of brake noise exterior to the automobile to minimise the exterior radiation of noise.
A component-focussed analysis, where the assessment is focussed on the source of the noise, in order to apply measures of noise reduction at the source.
Whereas significant research has been conducted towards the perceptual analysis of brake squeal with respect to the first two assessment goals (see, e.g., Attia et al., 2006; Philippen et al., 2010), work is still required for achieving the third goal—that of component-based analysis. Such an analysis is beneficial in that the data are more appropriate to understand the excitation process leading to brake squeal and could, thereby, allow for the development and application of suitable noise reduction measures at the source (see, e.g., Hou et al., 2009; Liang and Yamaura, 2014; Lü et al., 2017a,b; Pan et al., 2019).
A first step in this direction is the detection of brake squeal and the localisation of this squeal to the contributing brake(s). This information can then be combined along with the other local parameters such as temperature, brake-pedal pressure, etc., in the subsequent analysis. Therefore, from the acoustic signal processing point of view, we have the following:
An environment that contains several narrowband sources whose approximate positions (the four wheel positions) are known a priori.
At any time none, one, or several of these sources could be “active” (i.e., emit a squeal), occupying possibly overlapping frequencies. The presence of active sources within an observed time range constitutes what we shall define as an acoustic event.
Brake squeal localisation defined in this context consists of detecting such acoustic events and attributing them to their contributing sources (generator brakes). For each detected squeal frequency, given that there are four brakes, there are $2^4 - 1 = 15$ possible combinations of brakes that could have contributed to the generation of the squeal. Thus, selecting the right combination constitutes the solution to the localisation problem. In other words, brake squeal localisation may be seen as a multiple-hypothesis testing or a detection problem based on the directional properties of the sound field received at appropriately situated sensors.
While the intuitive solution for brake squeal detection and localisation would be to analyse signals from individual microphones mounted next to the sound sources, i.e., next to the brake in the wheel casing, or from accelerometers placed on the brake itself, this setup suffers from several disadvantages that make the implementation impractical. First, the wheel casings are not easily accessible, making installation difficult and requiring special facilities. Furthermore, this region is prone to high temperatures and often collects dust and dirt. These could affect the sensors and could lead to a more-or-less complete system breakdown due to the lack of sensor redundancy. This is compounded by the fact that repair or replacement in the field is difficult. Performing brake squeal localisation using single microphones in the wheel casing requires a comparison of the amplitudes or powers of the recorded signals. To make the right decision, therefore, the microphones must be perfectly calibrated, requiring expensive measurement microphones. Even under perfect calibration, dust or dirt sticking to any one sensor will change its frequency response, and replacing the defective sensor would require recalibration of the system. Additionally, brake squeal from any one brake may lead to almost identical amplitudes at all the microphones. Thus, a distinction based on amplitudes is not easy. Furthermore, accelerometers mounted near the brake pads are difficult to secure in position mechanically, due to the heat and vibration.
In Madhu et al. (2014), a microphone array based solution was proposed as an alternative. This approach exploited the directional properties of the sound field; hence, the arrays were not constrained to be in close vicinity of the brakes and were mounted under the chassis, which is easily accessible. Furthermore, it was argued in Madhu et al. (2014) that techniques based on sound field directivity are relatively robust against gain mismatches amongst the microphones, so an accurate calibration was not required. The redundancy of microphones in the array, combined with an appropriate algorithm to detect defective sensors (see, e.g., Madhu and Martin, 2008b, and references therein) not only prevents a system breakdown but additionally allows correct localisation, taking the changed array geometry into account. If required, an entire array can be replaced in the field since the arrays are easily accessible. This makes maintenance and servicing of the system easy.
It was further concluded in Madhu et al. (2014) that, given the narrowband nature of brake squeal, the well-known broadband approaches (e.g., Benesty, 2000; Chen et al., 2003; Knapp and Carter, 1976; Madhu and Martin, 2005; Roth, 1971; Scheuing and Yang, 2006; Talantzis et al., 2005; Yoon et al., 2006) should not be used to localise the squeal to the generating brakes, and that algorithms that lend themselves to solving this problem should be based on narrowband approaches for each squeal frequency (cf. DiBiase et al., 2001; Paulraj and Kailath, 1986; Schmidt, 1981; Thiergart et al., 2016; Wax, 1992; Wax and Kailath, 1984). Since such narrowband approaches are implicitly based on the phase of the pairwise cross-power spectra of the microphone signals, a weighted variant of the narrowband steered response power phase transform (SRP-PHAT) algorithm was adopted in Madhu et al. (2014) and an approach based on an ad hoc formulation of likelihoods was developed. Here, we propose an alternative method, based on more solid theoretical foundations, that can systematically account for the different uncertainties. We further demonstrate the improved consistency of this approach over the existing state-of-the-art in a real setting.
The remainder of the paper is organised as follows: we first introduce the signal model of Madhu et al. (2014) in Sec. II, followed by the system overview in Sec. III (array placement and hypothesis nomenclature). In Sec. IV, we briefly describe the algorithm of Madhu et al. (2014). In Sec. V, we detail the proposed approach. The developed algorithm is next tested on both simulations and data obtained under realistic conditions, and its performance relative to the state-of-the-art is summarised.
II. SIGNAL MODEL
Since we deal with the narrowband localisation of the sources, we first transform each microphone signal into the short-time Fourier domain. This frame-wise spectral representation of a discrete-time signal $x(n)$ is obtained by a K-point discrete Fourier transform (DFT) on overlapped, windowed signal segments. The corresponding representation of the signal is $X(k,b)$, where k is the discrete frequency bin index and b is the frame index.
Consider an array of M microphones at positions $\mathbf{r}_m$, $m = 1, \ldots, M$, capturing the signals emitted from Q sources located at $\mathbf{r}_q$, $q = 1, \ldots, Q$. The signal at microphone m may be approximated as (Madhu and Martin, 2008a)
$$X_m(k,b) \approx \sum_{q=1}^{Q} A_{mq}(k)\, S_q(k,b) + V_m(k,b), \tag{1}$$
where $A_{mq}(k)$ is the transfer function from $\mathbf{r}_q$ to $\mathbf{r}_m$; $S_q(k,b)$ is the qth source signal and $V_m(k,b)$ is the noise component at microphone m. The approximation is a result of truncating the support of the signals to finite length. Each $A_{mq}(k)$ is further represented as $A_{mq}(k) = A'_{mq}(k)\, e^{-j\Omega_k \tau_{mq}} + A''_{mq}(k)$, where $A'_{mq}(k)$ represents the gain along the direct path and $A''_{mq}(k)$ indicates the net gain and phase smearing caused by the reflections along the indirect paths. $\tau_{mq}$ represents the absolute time delay of the signal from source q to microphone m along the direct path, with $\Omega_k = 2\pi k f_s / K$ being the kth discrete frequency and $f_s$ the sampling frequency.
Whereas the direct path components of the transfer function are directly related to the geometric arrangement of the sources and the sensors and are key to the localisation problem, the indirect path components ($A''_{mq}(k)$) are treated as disturbances. Further, dominance of the direct path is assumed (i.e., $|A'_{mq}(k)| \gg |A''_{mq}(k)|$), allowing us to lump the indirect component contributions with the background noise.
The model is further simplified in terms of the relative transfer functions (Gannot et al., 2001) by considering the signals received at the first microphone through the direct path as the reference,
$$S_q(k,b) \leftarrow A'_{1q}(k)\, e^{-j\Omega_k \tau_{1q}}\, S_q(k,b). \tag{2}$$
Under these simplifications, stacking the microphone signals into a vector for each time-frequency point leads us to the following compact vector representation of the model:
$$\mathbf{X}(k,b) = \sum_{q=1}^{Q} \mathbf{a}_q(k)\, S_q(k,b) + \mathbf{V}(k,b) = \mathbf{A}(k)\, \mathbf{S}(k,b) + \mathbf{V}(k,b), \tag{3}$$
with
$$\mathbf{a}_q(k) = \left[ 1,\ \frac{A'_{2q}(k)}{A'_{1q}(k)}\, e^{-j\Omega_k \Delta\tau_{2q}},\ \ldots,\ \frac{A'_{Mq}(k)}{A'_{1q}(k)}\, e^{-j\Omega_k \Delta\tau_{Mq}} \right]^{T}, \quad \mathbf{A}(k) = [\mathbf{a}_1(k), \ldots, \mathbf{a}_Q(k)], \quad \mathbf{S}(k,b) = [S_1(k,b), \ldots, S_Q(k,b)]^{T},$$
where $\Delta\tau_{mq} = \tau_{mq} - \tau_{1q}$ is the relative time delay or time delay of arrival (TDOA) of the wavefront of source q at microphone m, with respect to the first microphone.
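To make the direct-path model concrete, the following Python sketch builds a relative propagation vector for one source, with the first microphone as the reference. It is only an illustration under free-field assumptions: the 1/distance gain law, the speed of sound `c`, the two-dimensional geometry, the numerical values and the function name `relative_steering_vector` are all assumptions introduced here and are not taken from the paper.

```python
import numpy as np

def relative_steering_vector(mic_pos, src_pos, k, K, fs, c=343.0):
    """Direct-path propagation vector relative to microphone 1 (free-field assumption)."""
    dist = np.linalg.norm(mic_pos - src_pos, axis=1)  # source-to-microphone distances (m)
    tau = dist / c                                    # absolute direct-path delays (s)
    omega_k = 2.0 * np.pi * k * fs / K                # k-th discrete frequency (rad/s)
    gain = dist[0] / dist                             # assumed 1/r spreading, relative to mic 1
    tdoa = tau - tau[0]                               # TDOA with respect to the first microphone
    return gain * np.exp(-1j * omega_k * tdoa)

# Hypothetical geometry: 8-element linear array with 5 cm pitch, source about 1 m away.
mic_pos = np.stack([np.arange(8) * 0.05, np.zeros(8)], axis=1)
src_pos = np.array([0.5, 1.0])
a = relative_steering_vector(mic_pos, src_pos, k=64, K=1024, fs=32000.0)
```

By construction the first entry of `a` is one, so only gain ratios and delay differences enter the model. This is why a directional method of this kind needs no absolute calibration of the reference channel.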
III. SYSTEM OVERVIEW
A. System overview
The system proposed in Madhu et al. (2014) consisted of four linear microphone sub-arrays, each consisting of eight elements. The inter-microphone spacing for each sub-array was chosen in order to ensure that an unambiguous decision at each frequency would be possible in the desired range and that the spatial aliasing problem (van Trees, 2002) would not occur. This led to the harmonically nested array depicted in Fig. 1. Such harmonic nesting additionally has the benefit of frequency invariant beamwidth (Ward et al., 2001) in the SRP-PHAT cost function over the frequency range of interest. The four linear sub-arrays are placed along the outside border of the auto-chassis as depicted in Fig. 2. This placement preserves the dominance of the direct path from a brake to the arrays to which it has a direct line of sight (LOS). To clarify: array 2 can distinguish accurately between events originating front right and front left, but an event at the rear right/left brakes is not accurately localised (construction of the chassis is such that line-of-sight is obstructed). The reverse holds true for array 4. Array 1, on the other hand, cannot distinguish between an event at front right and front left. Controlled tests on a stationary vehicle demonstrated that even when the squeal event was from front left, array 1 indicated maximum spatially coherent energy from the direction of front right. The reverse is true for array 3. We infer that this behaviour is due to the shadowing of the brake at front right by the wheel, and the presence of a parallel reflecting surface in the vicinity of the brake at front left, leading to a coherent first-order reflection. Thus, array 1 perceives a source emanating from front right as coming from front left. The same holds for squeals originating from the rear wheels, while the reverse argument holds for array 3. Consequently, arrays 1 and 3 are used to distinguish reliably between events coming from the front and the rear. 
This discrimination capability for each array is also illustrated in Fig. 2. Accordingly, we require separate arrays for the front and rear wheels, to perform a left-right distinction and two arrays on the side to perform the front-back distinction. The reasoning for the redundancy in the front-back distinction is explained further in the following section.
(Color online) The designed array for brake squeal localisation. The spacings are symmetric about the center, with cm, cm, cm, and cm.
(Color online) Arrangement of arrays to achieve LOS of two arrays for each brake. The subscripts for each wheel indicate the arrays having a direct LOS to that wheel. (a) Search regions for each wheel. (b) Candidate positions for array 3.
B. Hypothesis nomenclature and formulation
Assume, for now, the availability of an initial stage that can detect the occurrence of brake squeal and extract the corresponding squeal frequencies. Given these frequencies at which we have an acoustic event and the directional statistics available from the arrays, the hypothesis testing approach requires us to decide which of the combinations of brakes generated the event. Since no single array can localise all sources, hypothesis tests which consider the complete 4-array system were not optimal. Therefore, in the hierarchical approach we proposed previously, the localisation of a squeal event to a brake was done in two steps. In the first stage, arrays 1 and 3 were used to test
$\mathcal{H}_{\mathrm{F}}$: squeal emanated from the front.
$\mathcal{H}_{\mathrm{B}}$: squeal emanated from the back (rear).
$\mathcal{H}_{\mathrm{FB}}$: squeal emanated simultaneously from the front and rear.
Depending upon the outcome of this stage, arrays 2 and/or 4 were considered to detect if the event originated from the right (R) or the left (L), i.e., if $\mathcal{H}_{\mathrm{F}}$ was true, we used array 2 to pick from the subsequent hypotheses:
$\mathcal{H}_{\mathrm{FL}}$: squeal emanated from the brake on front left.
$\mathcal{H}_{\mathrm{FR}}$: squeal emanated from the brake on front right.
$\mathcal{H}_{\mathrm{FL,FR}}$: squeal emanated from front left and front right.
Similarly, given the occurrence of squeal from the rear, array 4 was used to decide between $\mathcal{H}_{\mathrm{BL}}$, $\mathcal{H}_{\mathrm{BR}}$, and $\mathcal{H}_{\mathrm{BL,BR}}$. Note that the accuracy of the front/back detection is critical as the next stage depends on the outcome of this test. It is to increase this accuracy and to provide robustness against sensor defects that we built in redundancy by using two arrays for this stage. The truth of the hypotheses themselves depends on the particular use of the spatial statistics.
IV. HIERARCHICAL APPROACH SUMMARY
The approach of Madhu et al. (2014) was based on heuristic conditions with the aim of assigning a measure of contribution of each brake to each acoustic event, thereby arriving at a “soft” decision. Under assumptions of independence between the squeal event at each brake, a heuristic likelihood of activity for each brake was computed in a factorable form as in Eq. (4) below,
$$\mathcal{L}(\mathrm{FR}) = \mathcal{L}(\mathrm{F})\, \mathcal{L}(\mathrm{FR} \mid \mathrm{F}), \tag{4}$$
where, e.g., $\mathcal{L}(\mathrm{FR})$ denotes the likelihood of activity of the front right brake; $\mathcal{L}(\mathrm{F})$ denotes the likelihood of activity from the front brakes; $\mathcal{L}(\mathrm{FR} \mid \mathrm{F})$ denotes the likelihood of activity of the front right brake conditional upon the hypothesis that the event originated from the front; and so on. The likelihood was based on a pair-wise coherence-weighted sum of the cross-power spectrum components of the SRP-PHAT cost function $J_i(k, \mathbf{r})$. In this notation, the dependence on frequency k, the candidate position $\mathbf{r}$ and the array i is explicit. For convenience, this function is reproduced below for a generic array as
$$J(k, \mathbf{r}) = \sum_{m=1}^{M-1} \sum_{m'=m+1}^{M} |\gamma_{mm'}(k)|\ \mathrm{Re}\left\{ \frac{\Phi_{mm'}(k)}{|\Phi_{mm'}(k)|}\, e^{\,j\Omega_k \left[ \Delta\tau_m(\mathbf{r}) - \Delta\tau_{m'}(\mathbf{r}) \right]} \right\}, \tag{5}$$
where, in accordance with the model in Sec. II, $\Delta\tau_m(\mathbf{r})$ is the TDOA that would be generated across the array if a source were present at location $\mathbf{r}$; $\Phi_{mm'}(k)$ is the cross-power spectrum and $\gamma_{mm'}(k)$ is the coherence between microphones m and $m'$ at frequency $\Omega_k$.
To account for the small deviations in the positioning of the microphone arrays and the spatially spread nature of the acoustic sources, the candidate positions for each brake are extended to encompass the possible spread of the source and the deviations in the microphone array, leading to a spatially averaged estimate of $J(k, \mathbf{r})$ over a candidate region $\mathcal{R}$,
$$\bar{J}(k, \mathcal{R}) = \frac{1}{|\mathcal{R}|} \int_{\mathbf{r} \in \mathcal{R}} J(k, \mathbf{r})\, \mathrm{d}\mathbf{r}. \tag{6}$$
Since the squeal event can be reasonably assumed to be concentrated in the region around the respective wheels, we define the extent of the candidate regions as depicted by the shading in Fig. 3(a) for all the wheels and in Fig. 3(b) for array 3 in particular.
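The following Python sketch illustrates one generic way to evaluate a coherence-weighted narrowband SRP-PHAT cost for a candidate position and to average it over a candidate region. It is not the implementation of Madhu et al. (2014): the magnitude-coherence weighting, the sample estimate of the cross-power spectra, the simulated single-source data and the names `weighted_srp_phat` and `region_average` are assumptions made for illustration.

```python
import numpy as np

def weighted_srp_phat(X, tdoa, omega_k, eps=1e-12):
    """X: (M, B) STFT coefficients of one bin; tdoa: (M,) candidate TDOAs relative to mic 1."""
    M, B = X.shape
    Phi = X @ X.conj().T / B                           # sample cross-power spectra
    power = np.real(np.diag(Phi))
    coherence = np.abs(Phi) / np.sqrt(np.outer(power, power) + eps)
    phat = Phi / (np.abs(Phi) + eps)                   # PHAT: retain only the phase
    steer = np.exp(1j * omega_k * (tdoa[:, None] - tdoa[None, :]))
    upper = np.triu_indices(M, k=1)                    # each microphone pair once
    return float(np.sum(coherence[upper] * np.real(phat[upper] * steer[upper])))

def region_average(X, tdoa_grid, omega_k):
    """Average the cost over TDOA vectors sampled from a candidate region."""
    return float(np.mean([weighted_srp_phat(X, t, omega_k) for t in tdoa_grid]))

# Simulated coherent source at 2 kHz with hypothetical TDOAs of 0.1 ms per microphone.
rng = np.random.default_rng(0)
M, B = 8, 20
omega_k = 2.0 * np.pi * 2000.0
tdoa_true = np.arange(M) * 1e-4
s = rng.standard_normal(B) + 1j * rng.standard_normal(B)
X = np.exp(-1j * omega_k * tdoa_true)[:, None] * s[None, :]

J_match = weighted_srp_phat(X, tdoa_true, omega_k)
J_mismatch = weighted_srp_phat(X, -tdoa_true, omega_k)
J_region = region_average(X, [0.9 * tdoa_true, tdoa_true, 1.1 * tdoa_true], omega_k)
```

For a fully coherent source and matched TDOAs, every pair contributes close to one, so the cost approaches the number of pairs, M(M-1)/2. The region average trades some of this peak value for tolerance to position errors.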
(Color online) Candidate search regions. The shaded regions indicate the a priori knowledge of regions of source presence, which is used to obtain the candidate search regions for the cost function computation as in Eq. (6). The search region for array 3 is, further, illustrated for convenience.
With these extended regions and the corresponding spatially averaged metrics, the marginals and the conditional probabilities were defined as
and
where
This gives us a normalised value for the occurrence of a squeal event, akin to the probability of that event. At this stage, we may take a hard decision regarding the activity of a particular brake by thresholding the likelihood values obtained from Eq. (4) after substituting the values of Eqs. (7) and (8). This threshold is set empirically.
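The factorisation and the hard decision can be sketched in a few lines of Python. The numerical likelihood values below are invented for illustration, the function name `brake_activity` is hypothetical, and the use of "greater than or equal" at the threshold is an assumption. The threshold value of 0.7 is the one listed with the experimental parameters.

```python
def brake_activity(marginal, conditional, threshold=0.7):
    """Combine front/back marginals with left/right conditionals and threshold the result.

    marginal:    likelihoods of activity at the front ('F') and back ('B')
    conditional: likelihoods of each brake given its axle, keyed 'FL', 'FR', 'BL', 'BR'
    """
    likelihood = {brake: marginal[brake[0]] * cond for brake, cond in conditional.items()}
    active = sorted(brake for brake, value in likelihood.items() if value >= threshold)
    return likelihood, active

# Invented example values: strong front activity, mostly on the right.
marginal = {"F": 0.9, "B": 0.2}
conditional = {"FL": 0.1, "FR": 0.95, "BL": 0.5, "BR": 0.5}
likelihood, active = brake_activity(marginal, conditional)
```

Because each brake is thresholded separately, more than one brake can be declared active for the same event, which is the "soft" multi-source behaviour of the hierarchical approach.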
V. PROPOSED APPROACH SUMMARY
This approach is derived from the sequential hypothesis testing framework based on maximum likelihood (ML) (Kraus, 1993; Maiwald, 1995). However, these approaches cannot be used directly since they require accurate knowledge of the signals and the propagation vector, and such information is hard to gather for brake squeal localisation. The reason lies in part in the insufficient knowledge regarding the source locations: the positions are only approximately known, and the spatial extent during a squeal is unknown and varies with time and frequency. Estimating these parameters during the localisation procedure is also difficult, since the squeal is only present for a short time interval. In addition, we require a model for the signal distortion caused by the acoustic environment under the automobile. This is further complicated by the fact that such a model is time variant and depends not only on the relative source–receiver positions, but also on the external environment, which is continually changing. Thus, given the short time frames in which the detection and localisation must be done, applying the ML framework requires a high-dimensional optimisation, which increases the computational complexity and makes a real-time implementation impossible. This is the motivation to find an approach that is not only real-time capable (low computational complexity) but also more tolerant and robust against such imperfect knowledge. In this section, we shall develop one such method.
Since each array principally evaluates two regions, consider one such generic array, with the corresponding sources $S_1$ and $S_2$. We further make the simplifying assumption that the sources are spectrally disjoint, i.e., each active frequency is dominated by one source, which we are required to localise. This assumption is realistic since the squeal frequencies and the presence of squeal depend on several factors (hydraulic pressure, humidity, dust, state of brake shoe and disk, speed, etc.), which will be different for any two brakes. Note the contrast with the previous algorithm, which allows multiple wheels to contribute to a squeal event at a given frequency and time frame. The spectral disjointness assumption further implies that, given an active frequency, it must be allocated to one source. To do this, we sequentially test each of the two regions for the presence of source activity whilst blocking out contributions from the other region (treated as a clutter region). The approximate knowledge of the source positions can be used for designing such blocking systems with sufficient tolerance to compensate for possible inaccuracies in the model. Assume for now that we have such a blocking system $\mathbf{B}_1(k)$ for region 1 at frequency bin k. Then, with $\mathbf{X}(k,b)$ the vector of stacked microphone signals and $\mathbf{V}(k,b)$ the corresponding noise vector, define $\mathbf{Z}_1(k,b)$ as
$$\mathbf{Z}_1(k,b) = \mathbf{B}_1^{H}(k)\, \mathbf{X}(k,b) = \mathbf{B}_1^{H}(k) \left[ \tilde{\mathbf{a}}_1(k)\, S_1(k,b) + \tilde{\mathbf{a}}_2(k)\, S_2(k,b) + \mathbf{V}(k,b) \right] \approx \mathbf{B}_1^{H}(k) \left[ \tilde{\mathbf{a}}_2(k)\, S_2(k,b) + \mathbf{V}(k,b) \right], \tag{10}$$
where $\tilde{\mathbf{a}}_q(k)$ indicates the relative propagation vector from source q to the receiver [i.e., the signal model from Eq. (3)], but taking into account the unknown disturbances. The approximation in the last step holds as, by designing the blocking system to have a suitable tolerance, we may negate the contributions from the blocked source 1. Similarly, we may negate the contribution of source 2 to obtain $\mathbf{Z}_2(k,b) \approx \mathbf{B}_2^{H}(k) \left[ \tilde{\mathbf{a}}_1(k)\, S_1(k,b) + \mathbf{V}(k,b) \right]$. The use of such a blocking system to obtain the resultant signals is key to our approach, which we term the maximum choice (MaxChoice) algorithm. Note that such a blocking matrix is well-known in the beamforming community for generating the noise estimate in, e.g., adaptive beamforming approaches such as the generalised sidelobe canceller (Gannot et al., 2017; Griffiths and Jim, 1982).
We next compute the residual energy in $\mathbf{Z}_1(k,b)$ and $\mathbf{Z}_2(k,b)$ over the B time records. When source q is blocked, this may be compactly expressed as
$$E_q(k) = \sum_{b=1}^{B} \left\| \mathbf{Z}_q(k,b) \right\|^2 = \sum_{b=1}^{B} \mathbf{X}^{H}(k,b)\, \mathbf{B}_q(k)\, \mathbf{B}_q^{H}(k)\, \mathbf{X}(k,b). \tag{11}$$
Recognise that this is a measure of the energy received by the array from spatial regions other than the region associated with the presence of source q. Once this residual energy has been computed, a binary decision is made for that bin as follows:
$$E_1(k) \underset{\mathcal{H}_1}{\overset{\mathcal{H}_2}{\gtrless}} E_2(k), \tag{12}$$
with $\mathcal{H}_q$ representing the hypothesis that the source originated from region q. Note that in this approach we do not perform explicit source localisation. Rather, by configuring the blocking system with suitable tolerance around the nominal source positions, we compensate for imperfections in the knowledge of the propagation model and source position.
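A minimal Python sketch of the blocking-and-compare decision for a single frequency bin is given below. To keep the example short, each region is reduced to a single nominal propagation vector, whose orthogonal complement serves as the blocking matrix; this point-blocking simplification, the simulated vectors and noise level, and the names `point_blocker`, `residual_energy` and `max_choice` are assumptions for illustration only.

```python
import numpy as np

def point_blocker(a):
    """Orthonormal basis of the complement of one propagation vector (simplified blocker)."""
    U, _, _ = np.linalg.svd(a[:, None], full_matrices=True)
    return U[:, 1:]                                    # (M, M-1): columns orthogonal to a

def residual_energy(B_q, X):
    """Energy left in the B frames of X (M, B) after blocking region q."""
    Z = B_q.conj().T @ X
    return float(np.sum(np.abs(Z) ** 2))

def max_choice(B_1, B_2, X):
    """Return the region (1 or 2) whose blocking leaves the smaller residual energy."""
    return 1 if residual_energy(B_1, X) < residual_energy(B_2, X) else 2

# Hypothetical nominal propagation vectors for the two regions of one array.
M, n_frames = 8, 20
a_1 = np.exp(-1j * 0.8 * np.arange(M))
a_2 = np.exp(+1j * 0.8 * np.arange(M))
B_1, B_2 = point_blocker(a_1), point_blocker(a_2)

rng = np.random.default_rng(1)
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = 0.05 * (rng.standard_normal((M, n_frames)) + 1j * rng.standard_normal((M, n_frames)))
decision_from_1 = max_choice(B_1, B_2, a_1[:, None] * s[None, :] + noise)
decision_from_2 = max_choice(B_1, B_2, a_2[:, None] * s[None, :] + noise)
```

When the source lies in region 1, blocking region 1 removes almost all of its energy while blocking region 2 leaves it nearly untouched, so the comparison selects region 1, and vice versa.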
When the propagation model is well-known and the source positions are available, it is straightforward to extend this approach to perform a hypothesis test for the presence or absence of source activity independently in each region. Thereby, it is also possible to handle the case of correlated or even coherent sources, and the assumption of spectral disjointness is not necessary. Indeed, one may consider such a system to be a generalised form of the approach presented in Tadaion et al. (2007), that better uses the a priori knowledge available regarding the nominal source positions and spatial extent.
MaxChoice is summarised in Fig. 4.
A. Design of the blocking system
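We now address the issue of design of the blocking matrix for a generic frequency k, an M channel array and a blocking region defined as $\mathbf{r} \in \mathcal{R}$. Denote by $M \times D_k$ the dimensions of the blocking matrix $\mathbf{B}(k)$. The frequency-dependent parameter $D_k$ is determined such that we minimise the contribution of sources from $\mathcal{R}$, i.e., find $\mathbf{B}(k)$ such that
$$\mathbf{B}(k) = \arg\min_{\mathbf{B}} \int_{\mathbf{r} \in \mathcal{R}} \left\| \mathbf{B}^{H} \mathbf{a}(k, \mathbf{r}) \right\|^2 \mathrm{d}\mathbf{r} \quad \text{subject to} \quad \mathbf{B}^{H} \mathbf{B} = \mathbf{I}, \tag{13}$$
where $\mathbf{a}(k, \mathbf{r})$ is the relative propagation vector for a source at position $\mathbf{r}$.
Defining
$$\mathbf{R}(k) = \int_{\mathbf{r} \in \mathcal{R}} \mathbf{a}(k, \mathbf{r})\, \mathbf{a}^{H}(k, \mathbf{r})\, \mathrm{d}\mathbf{r}, \tag{14}$$
we see that Eq. (13) is minimised if we select $\mathbf{B}(k)$ to span the orthogonal complement space of $\mathbf{R}(k)$, i.e., if the eigenvectors of $\mathbf{R}(k)$ are denoted as $\mathbf{u}_1(k), \ldots, \mathbf{u}_M(k)$, then
$$\mathbf{B}(k) = \left[ \mathbf{u}_{M-D_k+1}(k), \ldots, \mathbf{u}_M(k) \right], \tag{15}$$
where $D_k$ is estimated, for a given threshold $\eta$ ($0 < \eta < 1$), in the following manner:
$$D_k = M - \min \left\{ d : \sum_{i=1}^{d} \lambda_i(k) \geq \eta \sum_{i=1}^{M} \lambda_i(k) \right\}, \tag{16}$$
where the $\lambda_i(k)$ are the eigenvalues of $\mathbf{R}(k)$, sorted in descending order of magnitude, with $\mathbf{u}_i(k)$ the eigenvector associated with $\lambda_i(k)$. In effect, Eq. (16) selects the dimension of the orthogonal complement space by setting a threshold on the energy that is present in the target region. The dimensions of the target region subspace and of the orthogonal complement space are illustrated below for an azimuthal blocking region, for various values of $\eta$.
As may be seen in Fig. 5, the target region dimensions are low for the lower frequencies. This is because of the lack of directivity at these frequencies, making $\mathbf{a}(k, \mathbf{r})$ in Eq. (13) almost independent of $\mathbf{r}$. As k increases, so does the directivity, and the matrix $\mathbf{R}(k)$ increases in rank. The blocking capabilities of such a designed system are illustrated in Fig. 6 for two sample values of $\eta$.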
Target region dimensions for the array shown in Fig. 1. The sampling frequency is kHz and a point DFT has been used. The target region dimension and the orthogonal complement space dimension add up to M for each combination of . (a) ; (b) .
Performance of the blocking system in dB for the array of Fig. 1, for a sampling frequency of kHz and a point DFT. The discontinuities in the plot are the frequency bins at which the dimensions of the orthogonal complement space change. The blocking region was set to .
The blocking matrix is computed according to Eqs. (14)–(16), where the integration is either performed numerically or, under simplifying assumptions, obtained in closed form (see Appendix A). Note that the broad spread of the blocking region about the mean source DOA is chosen to account for the inaccuracies in array positioning, source spread and propagation model uncertainties. The target regions for the two sources corresponding to each array are defined as in Table I. Note, further, that the tolerance regions are the same for all arrays, requiring only a single computation of the blocking matrices, unlike the hierarchical approach where the cost function parameters for each array have to be computed separately. Further, this approach is easy to tailor to almost any automobile size and does not require a separate recalibration of the search regions as in the hierarchical approach.
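The eigen-decomposition based design can be sketched numerically as follows. This is an illustrative Python example, not the authors' code. It assumes a far-field uniform linear array, replaces the region integral by an average over sampled directions, and uses a hypothetical energy threshold `eta`, a hypothetical 30 degree wide region and hypothetical array parameters. Keeping at least one blocking direction is also an assumption of this sketch.

```python
import numpy as np

def far_field_steering(theta, M, spacing, freq, c=343.0):
    """Steering vectors (M, N) of a uniform linear array for azimuths theta (N,) in radians."""
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * freq * spacing * m * np.sin(theta)[None, :] / c)

def design_blocking_matrix(steering, eta=0.99):
    """Return (blocking matrix, target-subspace dimension) for sampled steering vectors."""
    M, N = steering.shape
    R = steering @ steering.conj().T / N               # numerical stand-in for the region integral
    lam, U = np.linalg.eigh(R)                         # eigenvalues in ascending order
    lam, U = np.clip(lam[::-1], 0.0, None), U[:, ::-1] # re-sort in descending order
    frac = np.cumsum(lam) / np.sum(lam)
    d_target = int(np.searchsorted(frac, eta)) + 1     # smallest d capturing at least eta
    d_target = min(d_target, M - 1)                    # assumption: keep one blocking direction
    return U[:, d_target:], d_target

# Hypothetical setup: 8 microphones, 4 cm pitch, 4 kHz, region of 25 to 55 degrees azimuth.
thetas = np.deg2rad(np.linspace(25.0, 55.0, 61))
A = far_field_steering(thetas, M=8, spacing=0.04, freq=4000.0)
B_mat, d_target = design_blocking_matrix(A, eta=0.99)

# Fraction of the region's energy that leaks through the blocking matrix.
leak = np.sum(np.abs(B_mat.conj().T @ A) ** 2) / np.sum(np.abs(A) ** 2)
```

The leakage averaged over the sampled region equals the discarded eigenvalue fraction, so it is bounded by `1 - eta` as long as the target dimension was not clipped.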
VI. EXPERIMENTAL EVALUATION
The presented approaches to brake squeal localisation were evaluated using a VW Touran. The recordings of brake squeal were made under different driving conditions and surroundings. As mentioned previously, the signals were analyzed in T = 200 ms time segments. Each segment was decomposed into its short-time spectrogram using a windowed, K point DFT, with a frame shift of O samples, as depicted in Fig. 7. Such spectrograms were constructed for each channel of every array and evaluated for the presence of active frequencies. If such frequencies were found, the hypothesis tests were carried out in the corresponding frequency bins. The analysis parameters are presented in Table II. These parameters yield, for our application, an acceptable tradeoff between computational complexity, storage requirements, and efficient processing of information for instrumental and perceptual analysis. For the spectral analysis, the von Hann or the Hamming windows are usually used due to their good spectral characteristics. The threshold was empirically set based on the experiments we did in the controlled scenario (see further). The DFT was computed using the efficient fast Fourier transform (FFT) technique.
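As a worked example of the framing, the Python sketch below converts the analysis parameters (32 kHz sampling, 32 ms von Hann window, 8 ms frame shift, 200 ms segments) into sample counts and computes the short-time spectrum of one segment. The test tone and the function name `segment_stft` are hypothetical, and NumPy's symmetric `hanning` window is assumed as the von Hann window.

```python
import numpy as np

fs = 32000                       # sampling frequency (Hz)
K = fs * 32 // 1000              # DFT / window length: 32 ms -> 1024 samples
O = fs * 8 // 1000               # frame shift: 8 ms -> 256 samples
T = fs * 200 // 1000             # segment length: 200 ms -> 6400 samples

def segment_stft(x, K, O):
    """Windowed K-point DFT of overlapping frames; returns an array of shape (K//2 + 1, frames)."""
    window = np.hanning(K)
    n_frames = 1 + (len(x) - K) // O
    frames = np.stack([x[i * O:i * O + K] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

# Hypothetical 2 kHz test tone standing in for a squeal component.
n = np.arange(T)
x = np.sin(2.0 * np.pi * 2000.0 * n / fs)
X = segment_stft(x, K, O)
peak_bin = int(np.argmax(np.abs(X[:, 0])))
```

With these values each 200 ms segment yields 22 frames, and the bin spacing is fs/K = 31.25 Hz, so the 2 kHz tone falls exactly on bin 64.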
(Color online) Schematic of the data analysis for a microphone signal illustrating the concept of a signal segment of length T ms, its decomposition into frames of length K with a frame shift of O between two successive frames. The fact that the data-frames are windowed before the computation of the DFT is also schematically illustrated. (a) Sample signal spectrum, (b) localisation results, hierarchical approach, (c) localisation results, MaxChoice approach.
Parameters for the experimental evaluations.
Sampling frequency (kHz) | DFT length (ms) | Frame shift (ms) | Window/length (ms) | Likelihood threshold |
---|---|---|---|---|
32 | 32 | 8 | von Hann/32 | 0.7 |
A general difficulty in the evaluation stems from the absence of reliable ground truth. Even for the experienced listener, it is not easy to localise sounds while driving. Therefore, we first validated the approaches on a controlled scenario, where the automobile was at standstill and the brake squeal was simulated by manually exciting the brake discs with a shaker. For the subsequent evaluation on the realistic recordings, we base our evaluation on the plausibility of the results (we know the approximate resonance frequencies of the brakes and thereby can see if signals detected at these frequencies are correctly assigned).
A point of note here: the finite spectral resolution associated with the DFT coupled with the frequency jitter of the sources smears each narrowband acoustic event across more than one frequency bin. The accretion of such adjacent frequency bins builds what we denote as a frequency band and the presence of an active band characterises an acoustic event.
Both the hierarchical approach and the MaxChoice approach are modified to take such spectral smearing into account. In the former case, the probability of activity is computed using the spectrally-averaged cost functions for each array and in the latter case, the decision is based on a weighted average of the decisions in the individual bins of a band. As the MaxChoice approach essentially makes one decision per band, the maximum allowable bandwidth of a band in this approach is constrained to about 100 Hz. No such restriction is placed on the bandwidth in the hierarchical case, as it is capable of making multiple decisions at any band, and thus the smearing of two different events into one band should not affect the performance of this approach.
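The band handling for the MaxChoice approach can be sketched as follows. The paper does not specify how the weights are chosen or how an over-wide band is split, so the per-bin energy weights, the greedy splitting rule and the names `group_bins` and `band_decision` are assumptions of this Python illustration. The limit of three bins corresponds to roughly 100 Hz at a bin spacing of 31.25 Hz.

```python
import numpy as np

def group_bins(active_bins, max_bins):
    """Group adjacent active bins into bands of at most max_bins bins (greedy split)."""
    bands, current = [], []
    for k in sorted(active_bins):
        if current and (k != current[-1] + 1 or len(current) == max_bins):
            bands.append(current)
            current = []
        current.append(k)
    if current:
        bands.append(current)
    return bands

def band_decision(bin_decisions, bin_weights):
    """Weighted vote over per-bin region decisions (1 or 2) within one band."""
    decisions = np.asarray(bin_decisions)
    weights = np.asarray(bin_weights, dtype=float)
    share_region_2 = np.sum(weights * (decisions == 2)) / np.sum(weights)
    return 2 if share_region_2 > 0.5 else 1

# Hypothetical active bins and per-bin decisions for the first band.
bands = group_bins([64, 65, 66, 67, 120], max_bins=3)
decision = band_decision([1, 2, 2], [5.0, 1.0, 1.0])
```

In this example the single strong bin outweighs the two weak bins, so the band is attributed to region 1 even though most bins voted for region 2.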
Some sample results are presented in Figs. 8–10, where one may easily discern the squeal regions in the reference spectrogram. The results obtained from the localisation approaches are depicted on this spectrogram in a colour coded manner consistent with Table III.
(Color online) Single active frequency, dominantly from BR. In general, both the proposed approaches are in good agreement regarding the localisation. The hierarchical approach makes multiple decisions at 21.8 s, as compared to the single decision of the MaxChoice approach. Further, the hierarchical approach detects, but does not localise the harmonic in the three time segments from 32.8 to 33.4 s. Such behaviour is examined further in Sec. VI A. This recording was made as the car was moving along a ramp in a tunnel—thus under exceedingly reverberant conditions. (a) Sample signal spectrum, (b) localisation results, hierarchical approach, (c) localisation results, MaxChoice approach.
(Color online) Sample results for a more dynamic scenario. Multiple brakes squeal simultaneously and at multiple frequencies. Again, in general, both approaches are in good agreement. The period between 35 s – 39 s is of interest. Two closely spaced traces are visible at 8 kHz. This is treated by the hierarchical approach as a single band, which is then localised to both FR and FL, and presented in the order FR, FL. The MaxChoice approach, due to its constrained maximum bandwidth, localises each trace independently, the lower to FL and the upper to FR. This explains the apparent discrepancy between the approaches. Note that the end result of localisation is essentially the same from both approaches. (a) Sample signal spectrum, (b) localisation results, hierarchical approach, (c) localisation results, MaxChoice approach.
(Color online) Sample results for a multi-squeal scenario. Again, both the proposed approaches detect the squeal as being from different brakes and localise them correspondingly. Again, as in Fig. 9, the multiple traces around 8 kHz are analysed as a single band in the hierarchical approach, and localised to FR and FL, whereas the MaxChoice approach localises each trace independently. Thus, while the graphical presentation may seem different, the end result of both approaches is essentially the same. (a) Sample signal spectrum and noise floor estimate, (b) Local SNR estimate.
We begin with a localisation scenario where only a single active frequency is present. The predominantly active brake in this case was BR. Both approaches are in agreement in terms of their localisation result. At 21.8 s, the hierarchical approach makes multiple decisions whereas the MaxChoice approach makes a single decision consistent with the time history until that point. Note that in case of multiple decisions in the hierarchical approach, the results are presented for the localised brakes in the order FR, FL, BL, and BR. Thus, at 21.8 s, where the hierarchical approach localises the squeal simultaneously to FL, BL, and BR, the results are presented by bars of the appropriate colour, slightly frequency shifted, in the above-mentioned order.
A. Relative comparison of the approaches
We now present a relative comparison of the hierarchical and the MaxChoice approach, evaluated on the available data. The comparison is according to four different criteria:
(1) Detection and localisation rate.
(2) Validity of the spectral disjointness assumption in MaxChoice.
(3) Agreement between the two approaches.
(4) Consistency of the two approaches during disagreement in the localisation result.
1. Detection and localisation rate
This criterion measures the number of instances where each algorithm detects and localises a squeal. Table IV lists, for each approach, the number of localisation decisions and the number of segments where a squeal was detected but no localisation decision was taken. Also listed is the number of soft decisions taken by the hierarchical approach. We see that the data contains 1290 instances of detected and 895 instances (under MaxChoice) of localised squeal events. For any particular time frame and frequency, we say we have detected a squeal event if at least one array marks this frequency as active in the observed time frame. The discrepancy between the number of detected and localised events arises for the following reasons:
C1: When the SNR is low, there are squeal events that are detected by only one or the other array, and therefore do not contribute to the localisation result.
C2: In the hierarchical approach, given the upper threshold of zero on the cost function, some frequencies where a squeal is present (i.e., detected by all the arrays) may not be localised because the cost function assumes positive values over the corresponding search regions. Manual examination of such instances indicates that in these cases the minimum of the cost function is shifted to lie outside the demarcated search region. This could happen due to temporary divergence of the direct path as, for example, when turning the wheels. The hierarchical approach is more sensitive to such misalignments than the MaxChoice approach because the cost function has a stronger influence on its decision chain, whereas MaxChoice propagates binary decisions based on relative differences.
C3: In very rare cases, the MaxChoice approach may also fail to make a decision. This occurs when, for example, in the first stage the squeal is localised to the front of the automobile, but array 2, responsible for the subsequent FL/FR decision, does not detect the squeal at that frequency and time frame.
Performance comparison, hierarchical and MaxChoice: detection and localisation capability.
Approach | Localisation decisions | Detected, not localised | Soft decisions |
---|---|---|---|
Hier. | 861 | 456 | 27 |
MaxCh. | 895 | 395 | N/A |
Since C3 is very rare (based on manual examination of the results), we take the MaxChoice approach as the baseline and assume that the lack of decision here is principally due to C1. As this condition must also hold for the hierarchical approach, the difference between the two algorithms in the counts of localised and non-localised events in Table IV should indicate the frequency of occurrence of C2, a drawback of the hierarchical approach. In this context we see that the hierarchical approach fails to make a decision in an additional 61 instances, as compared to the MaxChoice approach.
2. Validity of the spectral disjointness assumption of MaxChoice
We analyse here the soft decisions taken by the hierarchical approach, indicating the number of times multiple wheels have been allocated to a specific event. This metric should give us an idea of the validity of our assumption regarding the temporal and spectral disjointness of the squeal signals originating from different brakes. We see from Table V, which lists the number of instances where i wheels were localised, that a soft decision was taken in only 27 cases (3.4%), supporting the validity of our spectral disjointness assumption for the MaxChoice approach. Furthermore, we see that in 23 of these cases, one of the decisions of the hierarchical approach corresponds to the decision taken by the MaxChoice approach. The four instances where no agreement is reached with MaxChoice are engendered by the occurrence of condition C3, which forces MaxChoice to a “no-decision” state.
3. Agreement between the two approaches
This is presented in Table VI for two cases. In the first case, we consider the instances where the squeal has been detected by all arrays and therefore must be localised at least by the MaxChoice approach. We measure, here, how often both approaches localise such events to the same wheel. In the second case, we consider the agreement between the approaches when no decision is taken for the time segment. The discrepancy between and is because of the rare occurrence of C3 for the MaxChoice approach. In general, we see a very good agreement between the two proposed approaches.
4. Consistency of the results during disagreement
This criterion presents the temporal consistency of the localisation results for the cases where both approaches are not in agreement for a squeal event. In effect, we treat the localisation result for a given frame and frequency k as consistent if the same wheel has been localised at the neighbouring frame for that same frequency. This assumes, of course, that the squeal event spans more than one time frame; consistency in this context cannot be defined for events lasting only one time frame. We see from the results in Table VII that the MaxChoice approach seems to be consistent for a larger proportion of the detected squeal events where the results of the approaches differed. Note, however, that the consistency measure is heuristic in the absence of ground truth, and it is difficult to ascertain if a consistent result is indeed the true result. Therefore, these results should be treated with some caution. Table VII lists the total number of detected squeal events for which the two approaches were not in agreement and, of these, the number of instances in which each approach is consistent.
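The consistency criterion can be sketched in a few lines of Python. The data layout (a dictionary keyed by frame index and frequency bin), the function name `is_consistent`, and the choice of accepting agreement with either the preceding or the following frame are assumptions for illustration; the exact neighbouring frame used in the evaluation is not specified here.

```python
from typing import Dict, Optional, Tuple

# (frame index, frequency bin) -> localised wheel, e.g. "FL".
# A missing key means that no localisation decision was taken.
Decisions = Dict[Tuple[int, int], str]


def is_consistent(decisions: Decisions, frame: int, k: int) -> Optional[bool]:
    """Check a decision against the neighbouring frames at the same bin k.

    Returns None when consistency is undefined, i.e. when there is no
    decision at (frame, k) or no decision in either neighbouring frame.
    """
    current = decisions.get((frame, k))
    neighbours = [decisions[(f, k)] for f in (frame - 1, frame + 1)
                  if (f, k) in decisions]
    if current is None or not neighbours:
        return None  # single-frame event: consistency cannot be defined
    return current in neighbours


# Hypothetical localisation results for two frequency bins.
results: Decisions = {
    (10, 256): "FL", (11, 256): "FL", (12, 256): "FR",
    (20, 300): "BR",
}
```

Here the decision at frame 11 is consistent with its predecessor, the isolated switch to FR at frame 12 is not, and the single-frame event at bin 300 is left undefined.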
5. Comparison summary
Both approaches are capable of detecting and localising closely spaced harmonics that originate from different wheels. This is illustrated in Fig. 9, around frequencies of 7.9–8.1 kHz. Furthermore, both show a high degree of agreement in the localisation performance, in general making the same decision when the squeal component has a high SNR. The time frames where the approaches diverge correspond mainly to segments that contain the squeal only in a small fraction of its total length, or where the SNR is low. In such cases, the temporal consistency of the results obtained by MaxChoice is higher than that of the hierarchical approach. We reiterate that in the absence of a ground-truth, such a comparative evaluation cannot completely predict which algorithm is the better choice. However, the MaxChoice approach seems to be less sensitive to positioning errors and errors in the propagation model, and might well prove to be the better algorithm.
The hierarchical approach makes a soft decision in only a small percentage of the cases. In all other cases, only a single decision is made, leading us to conclude the validity of the assumption that an active frequency stems from a single source.
VII. SQUEAL DETECTION AND IDENTIFICATION OF ACTIVE FREQUENCIES
As mentioned previously (Sec. III B), the hypothesis testing is performed only on frequency bands where brake squeal is detected. A robust detection of the squeal and the identification of the active frequencies is, therefore, critical for the proper functioning of the localisation stage. Since brake squeal consists of narrowband harmonics, the signals recorded by the microphones consist of such harmonics superimposed upon the broadband background noise, which is relatively stationary compared to the squeal. Thus, identifying spectral components that “stick out” of the background noise would offer a good estimate of the active frequencies.
One way to do this would be to monitor the background noise level using conventional algorithms based on long-term statistics of the signal spectra (e.g., Gerkmann and Hendriks, 2011; Martin, 2001), and then to consider harmonic peaks that are significantly higher than this noise floor. This necessitates maintaining a record of the required signal statistics and computing the a posteriori signal-to-noise ratio (SNR) (Ephraim and Malah, 1984) at each frequency and each time frame, to identify potentially active frequencies. Thus detection of brake squeal and the identification of the active frequencies are performed in a single step. This approach has two drawbacks: (1) the computational cost associated with maintaining the noise floor statistics and computing the active frequencies in each time frame and (2) the memory requirements associated with maintaining records of the noise floor for computing the statistics. Since brake squeal only occurs in a small fraction of the observed time period, most of these computations are superfluous. We therefore propose to decouple the problem of detection and frequency identification: the first stage is further broken down into its two constituent subtasks: (a) a priori detection of squeal presence and (b) extraction of the active frequencies, if part (a) detects a squeal.
For the initial decision regarding the presence of a squeal in a signal segment it suffices to use a simpler, computationally less expensive approach. If a squeal is detected by this approach, the local SNR estimators need be run only on that segment for extracting the active frequencies. The algorithm used for the a priori detection of squeal presence is described, for completeness, in Appendix B.
We reduce the computational and memory expense further by computing the local SNRs without maintaining noise statistics. For this, we exploit the observation that the squeal harmonics are extremely narrowband. Hence, order statistic methods lend themselves nicely to detecting these harmonics. Specifically, for each segment where squeal has been detected, we perform a median filtering across the signal periodogram. This yields an estimate of the noise floor for that time segment, under the assumption that the noise has a smooth spectral progression. This is a reasonable assumption given that the noise is essentially road/tyre/engine noise, predominant at the low frequencies. This segment-wise noise floor estimate can be used to determine the local SNRs and, by a proper selection of the threshold, active frequency bins can be identified. This is illustrated in Fig. 11. For further robustness, the detection of squeal and computation of the active frequencies is done for each array on the basis of the aggregated amplitude data.
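A minimal Python sketch of this order-statistic approach is given below. The window half-width, the 10 dB threshold, the function names, and the synthetic test spectrum are all assumed values chosen for illustration; they are not the settings used in the evaluation, and the aggregation across the microphones of an array is not modelled.

```python
import math
from statistics import median
from typing import List


def median_noise_floor(periodogram: List[float], half_width: int) -> List[float]:
    """Sliding median across frequency; the window shrinks at the band edges."""
    n = len(periodogram)
    return [median(periodogram[max(0, k - half_width):min(n, k + half_width + 1)])
            for k in range(n)]


def active_bins(periodogram: List[float], half_width: int = 8,
                threshold_db: float = 10.0) -> List[int]:
    """Return the bins whose local SNR (power over median floor) exceeds the threshold."""
    floor = median_noise_floor(periodogram, half_width)
    eps = 1e-12  # guards against division by zero and log of zero
    return [k for k, (p, f) in enumerate(zip(periodogram, floor))
            if 10.0 * math.log10((p + eps) / (f + eps)) > threshold_db]


# Synthetic periodogram: smoothly decaying noise plus one narrowband peak.
spectrum = [1.0 / (1.0 + 0.01 * k) for k in range(128)]
spectrum[60] *= 100.0  # peak 20 dB above the local noise level
found = active_bins(spectrum)
```

Because a narrowband peak occupies only a small fraction of the median window, it barely shifts the median, so the floor estimate follows the smooth noise spectrum and the peak stands out in the local SNR. No statistics from past segments are needed.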
(Color online) Illustration of the order-statistic method for local SNR estimation and active squeal frequency identification. Note that this approach is only carried out on signal segments where squeal is detected. There is no requirement to maintain signal statistics or records of past segments for computing these statistics. (a) Sub-band containing environmental noise only, (b) sub-band containing squeal.
VIII. CONCLUSIONS
This contribution has presented the design of a new localisation algorithm for the detection of brake squeal. Given the complex and time-varying acoustic environment under the car and uncertainties in the knowledge of the source and sensor positions, localising a squeal event to the brake that generated it is a challenging problem, where standard localisation approaches fail. The proposed approach builds upon the previous state-of-the-art, which treats the problem as a multiple-hypothesis testing problem. Under the simplifying assumption of source disjointness, each detected squeal event (within a narrow bandwidth) is allocated to one brake. This is based on projections of the signals onto their orthogonal component subspace, where the computation of this component is designed to account for the model uncertainties. For linear arrays, the computation of the orthogonal component allows for a closed-form solution. Further, for most passenger vehicle dimensions, the design of the blocking system is identical for each array, leading to a simpler architecture. Both approaches perform well on the test data. A comparison of the two approaches shows that their performance is very similar, highlighting, on the one hand, the good choice of the heuristic and, on the other hand, the validity of the simplifying assumptions of the second approach. The MaxChoice approach, however, seems to present more consistent results and is computationally simpler and more versatile. More importantly, the detection approach presented here is not constrained to the problem of brake squeal localisation alone, but may be extended to other applications too. The design of robust blocking systems, for example, may be applied to other problems where the propagation environment is not ideally known, and errors in the signal model need to be compensated for. This approach may also be applied to the hypothesis testing for the presence of coherent sources from a priori known spatial regions. Usage of appropriate blocking systems would allow for the independent testing of the presence or absence of source activity in each region, with a resulting lower computational complexity.
ACKNOWLEDGMENTS
We thank Dr. H.-W. Rehn and A. Fischer for their helpful advice on the acquisition of test data on automotive platforms.
APPENDIX A: CLOSED-FORM SOLUTION FOR R
For a linear array and a blocking region demarcated in terms of the azimuth as , and under the assumption of well calibrated arrays, the propagation vector from the sources in this region to the array is defined as
with dm being measured with respect to the center of mass of the array. The covariance matrix for is then
which may be evaluated numerically, as a closed form solution to this is difficult. If we, however, make the simplifying assumption that the sources lie on the surface of a sphere with the array at the center and assume the blocking region to span the surface of the sphere between the specified angles, we may obtain a closed form solution (see, e.g., Elko, 2001) as
with elements
where . Such a closed-form solution is also reminiscent of Slepian sequences or discrete prolate spheroidal sequences (Slepian, 1978), applied now to the context of spatial beampattern design.
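The kind of identity that underlies such a closed form can be checked numerically. The Python sketch below assumes a far-field model for a linear array, with the angle θ measured from the array axis and the blocking region weighted by the spherical surface element sin θ. Under these assumptions the covariance between two sensors separated by Δ reduces to the integral of exp(jκΔ cos θ) sin θ between the two limiting angles, which has an elementary antiderivative. The function names, the sensor spacing, the frequency, and the angular limits are hypothetical, and normalisation constants are omitted; this is a sketch under the stated assumptions rather than a reproduction of Eq. (A2).

```python
import cmath
import math


def closed_form_element(delta: float, kappa: float, th1: float, th2: float) -> complex:
    """Closed form of the integral of exp(j*kappa*delta*cos(th)) * sin(th) over [th1, th2]."""
    a = kappa * delta
    if abs(a) < 1e-12:
        # Limit for coincident sensors: integral of sin(th) alone.
        return complex(math.cos(th1) - math.cos(th2))
    return (cmath.exp(1j * a * math.cos(th1))
            - cmath.exp(1j * a * math.cos(th2))) / (1j * a)


def numeric_element(delta: float, kappa: float, th1: float, th2: float,
                    steps: int = 20000) -> complex:
    """Midpoint-rule evaluation of the same integral, for comparison."""
    h = (th2 - th1) / steps
    total = 0j
    for i in range(steps):
        th = th1 + (i + 0.5) * h
        total += cmath.exp(1j * kappa * delta * math.cos(th)) * math.sin(th)
    return total * h


# Illustrative values: 8 kHz, speed of sound 343 m/s, 3 cm sensor separation,
# blocking region between 60 and 120 degrees from the array axis.
kappa = 2.0 * math.pi * 8000.0 / 343.0
theta1, theta2 = math.radians(60.0), math.radians(120.0)
exact = closed_form_element(0.03, kappa, theta1, theta2)
approx = numeric_element(0.03, kappa, theta1, theta2)
```

The two values agree to within the accuracy of the numerical integration. For a region symmetric about broadside, as in this example, the result is real and takes a sinc-like form in κΔ, which is consistent with the connection to prolate spheroidal sequences noted above.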
Further, it is easy to show that if the arrays are not well calibrated in terms of amplitude, then the normalised steering vector may be reformulated as
where each may be modelled by a suitable probability density function with mean 1. In this case, the deterministic covariance between two elements m and is obtained by the statistical expectation
Since this expectation is not a function of the source location, it amounts to multiplying the results in Eq. (A2) by the statistical expectation of the amplitude terms: . We can assume, further, these amplitude variations to be independent of each other, simplifying the computation. Under the assumption of a uniform distribution of the amplitude about , we see that the covariance matrix is unaffected for small after the expectation. On the other hand, if the perturbations in the amplitude follow other distributions, these can be factored in as well.
APPENDIX B: DETECTION OF BRAKE SQUEAL PRESENCE
The brake-squeal events are narrowband, with most of the spectral energy being concentrated in the squeal frequencies,² giving rise to a rather well-structured spectrum in the presence of brake squeal. The aim of the detection stage is to form an a priori decision on the presence of a squeal event. The prerequisites for any such approach are that it should be computationally undemanding and yet reasonably accurate. Fast model-order estimation techniques (e.g., Cong et al., 2012), based on the statistics of the eigenvectors of covariance matrices, could be a possibility. If we compute the covariance matrix over small frequency bands, the presence of squeal would manifest as a dominant eigenvalue. However, this approach requires maintaining signal statistics. Further, squeal events are often short-lived, which leads to underestimation of the statistics. Hence, we propose the use of modified information theoretic tools (like entropy).
The entropy for a discrete source with an alphabet of symbols $x$ and an associated probability distribution function $P$ is defined as (Shannon and Weaver, 1949)
$$H = -\sum_{x} P(x) \log P(x). \tag{B1}$$
It characterises the amount of disorder in the system: the entropy is maximum if all the symbols in the alphabet are equally probable and reduces as the probability distribution becomes more “peaked.”
Now, consider small sub-bands of frequencies in the discrete Fourier spectrum of a signal. Let the length of each sub-band be . If a sub-band contains a harmonic signal, the power spectrum would be peaked at this frequency. Alternatively, if the sub-band contained only environmental noise, the distribution of power would be more or less equal along all the discrete frequency bins of that band. Similar to Renevey and Drygajlo (2001), let us define a pseudo probability distribution function along the bins k of a sub-band as
where represents the DFT coefficient of bin k in sub-band . Such a distribution function is shown in Fig. 12.
(Color online) Probability distribution functions for cases where only noise is present (a) and where the sub-band contains an active frequency (b). Note the ‘peakiness’ of the second plot with respect to the almost flat curve of the first.
It may be seen from Fig. 12 that the distribution defined in Eq. (B2) is a faithful representation of the underlying spectral structure. The “entropy” of this distribution can then be calculated using Eqs. (B1) and (B2) and used to predict squeal presence for a fixed detection threshold. Note that setting the detection threshold involves a trade-off between sensitivity to events at lower power and the generation of too many false alarms, and depends upon the requirements of the application. If this approach detects a squeal in a signal segment, that segment is analysed in more detail and the squeal frequencies are extracted using more sophisticated algorithms.
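The detection principle can be sketched in Python as follows. The sketch assumes that the pseudo probability of a bin is its power divided by the total power of the sub-band, that the entropy is normalised by its maximum value so that the threshold lies between 0 and 1, and that a threshold of 0.8 is used. These choices, along with the function names, are illustrative assumptions and not the settings of the actual detector.

```python
import math
from typing import List


def subband_entropy(power: List[float]) -> float:
    """Shannon entropy (in bits) of the normalised power in one sub-band."""
    total = sum(power)
    if total <= 0.0:
        # Treat an all-zero sub-band as flat, i.e. maximum entropy.
        return math.log2(len(power))
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def squeal_present(power: List[float], threshold: float = 0.8) -> bool:
    """Flag a sub-band whose entropy, normalised to [0, 1], falls below the threshold."""
    if len(power) < 2:
        raise ValueError("a sub-band needs at least two bins")
    max_entropy = math.log2(len(power))  # entropy of a perfectly flat sub-band
    return subband_entropy(power) / max_entropy < threshold


# Hypothetical 16-bin sub-bands: flat noise, and noise plus one strong harmonic.
noise_only = [1.0] * 16
with_squeal = [1.0] * 16
with_squeal[5] = 200.0
```

A flat sub-band attains the maximum entropy and is rejected, while a single dominant bin concentrates the distribution and drives the entropy well below the threshold. Lowering the threshold reduces false alarms at the cost of missing weaker events.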
APPENDIX C: DESCRIPTION OF HARDWARE USED
The 32-channel array with microphones and pre-amplifiers was custom-made. The microphones were Knowles FG-6163-P07 analog electret microphones with an omnidirectional characteristic and a 6 dB per octave sloping frequency response for the suppression of low frequency components at the electro-acoustic interface. The frequency response of the capsules is flat above 2 kHz.
The microphones were connected to Funk Tonstudiotechnik Type SOA-2V2 (Funk-Tonstudiotechnik, 2019) pre-amps with symmetric outputs.
For the A/D conversion and transmission to the computer, we used hardware based on the NIST Mark III design (NIST, 2019).
A picture of the array and the casing of the data acquisition system is presented in Fig. 13 to allow the reader to get a better idea of the structure of the array and the compactness of the data acquisition setup. Further, to enable usage in a mobile environment, the recording setup could be hooked up to a 12 V auto battery.
(Color online) An illustration of the hardware used. (a) Sub-array module with protective cover removed. The microphone capsules are visible in the centre portion. All 4 arrays are similar in shape and construction. (b) Front view of compact case used for data acquisition. The preamplifiers, A/D converters, etc., are integrated into this case. The arrow indicates the ethernet connector for data transmission to the acquisition and evaluation computer.
¹ The presence of a squeal event precludes the null hypothesis that no brakes were active.
² The lower frequencies containing motor noise are neglected in our considerations.