The exponential sine sweep is a commonly used excitation signal in acoustic measurements; it is, however, susceptible to non-stationary noise. This paper shows how to detect contaminated sweep signals and select clean ones based on a procedure called the rule of two, which analyzes repeated sweep measurements. A high correlation between a pair of signals indicates that they are devoid of non-stationary noise. The detection threshold for the correlation is determined based on the energy of background noise and time variance. A median-based method, which is not disturbed by non-stationary events, is suggested for reliable background noise energy estimation. The proposed method is shown to reliably detect 95% of impulsive noises and 75% of dropouts in the synthesized sweeps. Tested on a large set of measurements and compared with a previous method, the proposed method is shown to be more robust in detecting various non-stationary disturbances, improving the detection rate by 30 percentage points. The rule-of-two procedure increases the robustness of practical acoustic and audio measurements.

## I. INTRODUCTION

An impulse response (IR) measurement is one of the most common procedures to assess the acoustic qualities of various systems, including physical spaces, such as concert halls and rooms,^{1–5} electronic devices,^{6} audio software,^{7} and more. IR measurements can be conducted using a variety of excitation signals, including impulses, which are produced with sources such as pistols and balloon pops;^{8,9} noise-based methods, e.g., maximum-length sequence (MLS)^{10–12} and inverse repeated sequence (IRS);^{13} and linear and exponential swept-sine signals. Each of these methods has its strengths and shortcomings.^{14}

The exponential swept-sine (ESS) as an excitation signal for measuring IRs was introduced in the form used nowadays by Farina over 20 years ago.^{15} Currently, it is widely used as it provides the best consistency and highest robustness of measurements.^{14,16} The ESS technique also rejects most of the harmonic distortion,^{16} a bane of noise-based methods, such as MLS and IRS.^{14,17–19} However, ESS is sensitive to non-stationary noise, which causes artifacts in IRs obtained in the deconvolution process, and may lead to errors in the estimation of acoustic parameters.^{14,20–23} The present study investigates the ESS technique and discusses the disturbances that may occur during measurements and negatively affect the resulting IRs. A novel method to discriminate between clean and corrupted sweeps is proposed.

The ESS technique is known for its excellent signal-to-noise ratio (SNR)^{14} that results from a long excitation of low frequencies, which are usually more susceptible to contamination by background noise than high frequencies. This feature of swept-sine signals can be employed to achieve target SNR values for different frequencies by adjusting the time over which specific frequencies are excited.^{24–30} The vulnerability of the ESS method to non-stationary noise, however, grows proportionally with the length of the sweep signal. This may force a compromise between lengthening the ESS signal to increase the SNR and shortening it to minimize the risk of the occurrence of non-stationary noise.^{19,20} In this light, Stan *et al.*^{14} recommend using the swept-sine technique only for measurements in empty, quiet spaces.

Currently, there is no established method to identify non-stationary noise in sweep measurements. Manual detection works only for singular measurements, but in the case of numerous unsupervised measurements, automatic detection is necessary. Guski^{31,32} presented an algorithm addressing the problem of automatic classification of contaminated sweeps. Relying, however, on the separation of IR and background noise, this method is prone to errors when estimating decay and noise floor. Therefore, the need for a simpler and more reliable procedure remains.

This paper proposes to identify clean and contaminated ESS measurements based on their similarity to each other, expressed by the Pearson correlation coefficient (PCC). Used in applications such as pattern recognition^{33} and as a criterion for filter optimization,^{34} the PCC has proved more advantageous than the mean square error criterion. Similarly, cross correlation was used as a measure to estimate the sensitivity of IRs to small changes in sound-source position^{35} as well as for robust IR measurement against nonlinearities.^{36–38} This suggests that parameters related to similarity are good indicators of changes in audio signals, even when the environment is not free from noise.

The present work studies the problem of ESS measurements corrupted by non-stationary noise and introduces a procedure called the rule of two (Ro2). Ro2 is a method to identify a pair of clean sweeps, those not contaminated by non-stationary noise, from a series of measurements in a noisy environment. The method is based on the correlation between measured ESS signals. Various factors impacting the correlation are examined. The threshold separating clean sweeps from corrupted ones is determined. The Ro2 procedure is tested on a big dataset of ESS measurements and is compared to another method aimed at detecting impulsive noise in sweep measurements.

The remainder of this paper is organized as follows. Section II discusses the correlation between acoustic signals and describes the proposed method. In Sec. III, the sources of expected contamination, namely stationary noise and time variance, are presented. Section IV elaborates on the types of non-stationary contamination and their effect on the correlation. Section V describes the validation procedure for the proposed method, discusses the experimental results, and compares the proposed method with another technique. Section VI concludes the paper.

## II. METHOD

This section tackles the detection of non-stationary events in an ESS signal and proposes a novel method called Ro2. The correlation of acoustic signals is also discussed.

### A. Problem formulation

Assessing whether the signal obtained during acoustic measurement is free of non-stationary noise or other artifacts is often a difficult task. Therefore, a good practice is to record a few test signals so as to be able to choose the best one, should unexpected acoustic events occur. In this case, recordings of the same conditions of the system under test can be compared to one another.

Given two acoustic measurements *y*_{1} and *y*_{2}, we want to determine whether they are clean or not. Assuming that contamination is a random occurrence, we measure the similarity of *y*_{1} and *y*_{2} as an indicator of contamination: if the similarity is low, then the contamination is indicated (in either one or in both of the signals), whereas a great similarity denotes an uncontaminated pair. We propose PCC as a robust measure of similarity. PCC is defined as

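$$\rho_{y_1,y_2} = \frac{\mathrm{cov}(y_1,y_2)}{\sigma_{y_1}\sigma_{y_2}} = \frac{\sum_{k=1}^{N}\left[y_1(k)-\mu_{y_1}\right]\left[y_2(k)-\mu_{y_2}\right]}{\sqrt{\sum_{k=1}^{N}\left[y_1(k)-\mu_{y_1}\right]^2\,\sum_{k=1}^{N}\left[y_2(k)-\mu_{y_2}\right]^2}}, \tag{1}$$

where $\mathrm{cov}(y_1,y_2)$ is the covariance of signals *y*_{1} and *y*_{2}, $\sigma_{y_1}$ and $\sigma_{y_2}$ are their standard deviations, $\mu_{y_1}$ and $\mu_{y_2}$ are the mean values, and *N* is the number of samples in the signals. In acoustic measurements, the mean of the measured signals is removed,^{39} transforming Eq. (1) to

$$\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N} y_1(k)\,y_2(k)}{\sqrt{\sum_{k=1}^{N} y_1^2(k)\,\sum_{k=1}^{N} y_2^2(k)}}. \tag{2}$$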

Assuming that the system under test is free of noise, and neither the system nor the measurement equipment changes between or during the recording of both signals, i.e., *y*_{1} = *y*_{2}, then $\rho_{y_1,y_2}=1$. Consecutive measurements, however, are never strictly the same, and the PCC of two clean measurements is impacted by two classes of factors: (1) expected disturbances including stationary background noise and time-variances of the measured system and (2) non-stationary occurrences such as impulsive noise or sound dropouts.^{31,40}

Note that in the present study the term “clean signal” refers to a measured signal that contains stationary background noise and effects of time variance only, whereas the term “contaminated” is used for the signals containing both expected and unexpected disturbances.

An example of a set of measured ESS signals is shown in Fig. 1, where one of five sweeps is contaminated with impulsive noise. The corresponding PCC matrix is presented in Table I together with the total energy of each sweep in dB. The contaminated signal displays lower similarity with the other sweeps, while also having higher energy than the rest.

Sweep # | 1 | 2 | 3 | 4 | 5 | Energy (dB) |
---|---|---|---|---|---|---|
1 | 1.000 | 0.999 | 0.995 | 0.999 | 0.999 | 71.76 |
2 | 0.999 | 1.000 | 0.995 | 0.999 | 0.999 | 71.75 |
3 | 0.995 | 0.995 | 1.000 | 0.994 | 0.994 | 71.83 |
4 | 0.999 | 0.999 | 0.994 | 1.000 | 0.999 | 71.74 |
5 | 0.999 | 0.999 | 0.994 | 0.999 | 1.000 | 71.73 |
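
A PCC matrix of this kind can be computed with a few lines of code. The following is a minimal sketch, assuming equal-length recordings stored as lists of samples; the function names `pcc` and `pcc_matrix` and the toy signals (a sine standing in for a sweep, one copy with an added impulse) are illustrative assumptions and not part of the measurements reported here.

```python
import math

def remove_mean(y):
    """Subtract the mean value from a signal."""
    m = sum(y) / len(y)
    return [v - m for v in y]

def pcc(y1, y2):
    """Pearson correlation coefficient of two equal-length signals."""
    if len(y1) != len(y2):
        raise ValueError("signals must have the same length")
    a, b = remove_mean(y1), remove_mean(y2)
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den

def pcc_matrix(signals):
    """Pairwise PCC values of all signals in a list."""
    n = len(signals)
    return [[pcc(signals[i], signals[j]) for j in range(n)] for i in range(n)]

# Toy data (assumption): two identical signals and one with an added impulse.
x = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
spiked = list(x)
spiked[300] += 5.0

m = pcc_matrix([x, list(x), spiked])
for row in m:
    print(" ".join(f"{value:.3f}" for value in row))
```

As in Table I, the row and column belonging to the disturbed signal show lower values than the entries relating the undisturbed signals to each other.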

### B. Proposed method

The proposed method presents a systematic criterion to distinguish expected disturbances from non-stationary noise to create a meaningful and robust measure for the level of contamination. The Ro2 method requires a correlation threshold $\hat{\rho}_{y_1,y_2}$ separating clean signals from contaminated ones. Thus, the Ro2 is

$$\rho_{y_1,y_2} \geq \hat{\rho}_{y_1,y_2}. \tag{3}$$

When non-stationary noise occurs in the measurement, the PCC value does not point directly towards the contaminated sweep. Thus, when $\rho_{y_1,y_2}<\hat{\rho}_{y_1,y_2}$, the measurement should be repeated and the correlation of all captured signals should be estimated. The measurement can end when at least two signals fulfill the requirement in Eq. (3). Section III shows how to determine the threshold $\hat{\rho}_{y_1,y_2}$.
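
The repeat-until-two-agree logic described above can be sketched as follows. This is only an illustration under stated assumptions: `measure` is a hypothetical callable returning one recorded, zero-mean sweep, the threshold value is chosen arbitrarily for the toy data, and `max_sweeps` is an added safeguard that is not part of the procedure itself.

```python
import math
from itertools import combinations

def pcc(y1, y2):
    """PCC of two zero-mean, equal-length signals."""
    num = sum(a * b for a, b in zip(y1, y2))
    den = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return num / den

def rule_of_two(measure, threshold, max_sweeps=10):
    """Capture sweeps until any two correlate at or above the threshold.

    measure    -- hypothetical zero-argument callable returning one sweep
    threshold  -- detection threshold for the PCC
    max_sweeps -- safeguard against endless repetition (assumption)
    Returns (i, j, sweeps); i and j are None if no clean pair was found.
    """
    sweeps = [measure(), measure()]
    while True:
        for i, j in combinations(range(len(sweeps)), 2):
            if pcc(sweeps[i], sweeps[j]) >= threshold:
                return i, j, sweeps
        if len(sweeps) >= max_sweeps:
            return None, None, sweeps
        sweeps.append(measure())

# Toy recordings (assumption): the second capture contains an impulse.
clean = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
spiked = list(clean)
spiked[300] += 5.0
recordings = iter([clean, spiked, list(clean)])

i, j, captured = rule_of_two(lambda: next(recordings), threshold=0.99)
print(i, j, len(captured))
```

In this toy run the first pair fails the test, a third sweep is captured, and the first and third sweeps are returned as the clean pair.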

## III. EXPECTED CONTAMINATION

In the following, we present the two main sources of “unavoidable” impacts on correlation, namely, background noise and time variance. The discussion leads to the determination of the expected PCC $\hat{\rho}_{y_1,y_2}$, which serves as the detection threshold.

### A. Effect of background noise

The term “background noise” in acoustic measurements refers to any type of unwanted extra sound event. Since this definition includes non-stationary noise, the distinction needs to be made that in this study “background noise” is used to describe only the stationary noise.

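The presence of stationary noise in the sweep signals affects their correlation. Therefore, in Eq. (2), we need to consider two noise signals *n*_{1} and *n*_{2} with zero mean. In this subsection, the background noise is the only disturbance, so that the measured signal is a mixture of the signal with the background noise, i.e., $y_1=x+n_1$, where

$$x = s \ast h \tag{4}$$

is the convolution of the ESS *s* and room impulse response *h*, denoted by an asterisk $\ast$. Similarly, $y_2=x+n_2$. The resulting correlation is then

$$\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N}(x+n_1)(x+n_2)}{\sqrt{\sum_{k=1}^{N}(x+n_1)^2\,\sum_{k=1}^{N}(x+n_2)^2}}. \tag{5}$$

If the noise signals are uncorrelated with the ESS signals as well as with each other, i.e., $\sum_{k=1}^{N} n_1 n_2=0$, $\sum_{k=1}^{N} x\,n_1=0$, and $\sum_{k=1}^{N} x\,n_2=0$, then Eq. (5) can be simplified to

$$\rho_{y_1,y_2} = \frac{\sum_{k=1}^{N}x^2}{\sqrt{\left(\sum_{k=1}^{N}x^2+\sum_{k=1}^{N}n_1^2\right)\left(\sum_{k=1}^{N}x^2+\sum_{k=1}^{N}n_2^2\right)}}. \tag{6}$$

Thus, the signal energies are related by

$$\rho_{y_1,y_2} = \frac{E[x]}{\sqrt{E[y_1]\,E[y_2]}} = \frac{E[x]}{\sqrt{\left(E[x]+E[n_1]\right)\left(E[x]+E[n_2]\right)}}, \tag{7}$$

where the energy of a signal is computed as

$$E[x] = \sum_{k=1}^{N} x^2(k). \tag{8}$$

When the noise signal energies are equal, i.e., $E[n_1]=E[n_2]=E[n]$, the PCC can be estimated using the SNR value,

$$\hat{\rho}_{y_1,y_2} = \frac{E[x]}{E[x]+E[n]} = \frac{\mathrm{SNR}}{\mathrm{SNR}+1}, \tag{9}$$

where the SNR is expressed in terms of signal energies,

$$\mathrm{SNR} = \frac{E[x]}{E[n]}. \tag{10}$$

In practice, $E[x]$ is unknown as it is affected by the room impulse response *h*. However, it can be inferred from the difference $E[x]=E[y_1]-E[n_1]=E[y_2]-E[n_2]$.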
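
The relation between SNR and the expected PCC can be checked numerically. The sketch below assumes uncorrelated Gaussian noise of equal energy in both signals, for which the expected PCC reduces to SNR/(SNR + 1), with the signal energy inferred as the measured energy minus the noise energy. A plain sine stands in for the sweep response *x*, and all names and parameter values are illustrative assumptions.

```python
import math
import random

def energy(x):
    """Signal energy as the sum of squared samples."""
    return sum(v * v for v in x)

def pcc(y1, y2):
    """PCC of two zero-mean, equal-length signals."""
    num = sum(a * b for a, b in zip(y1, y2))
    return num / math.sqrt(energy(y1) * energy(y2))

def expected_pcc(e_y, e_n):
    """Expected PCC for uncorrelated noise of equal energy in both signals.

    e_y -- energy of one measured signal
    e_n -- energy of the stationary background noise
    """
    e_x = e_y - e_n            # signal energy inferred from the measurement
    snr = e_x / e_n            # SNR as a ratio of energies
    return snr / (snr + 1.0)

rng = random.Random(0)         # fixed seed for repeatability
N = 20000
x = [math.sin(2 * math.pi * 50 * k / N) for k in range(N)]  # stand-in for s * h
n1 = [rng.gauss(0.0, 0.1) for _ in range(N)]
n2 = [rng.gauss(0.0, 0.1) for _ in range(N)]
y1 = [a + b for a, b in zip(x, n1)]
y2 = [a + b for a, b in zip(x, n2)]

measured = pcc(y1, y2)
predicted = expected_pcc(energy(y1), energy(n1))
print(f"measured PCC {measured:.4f}, predicted PCC {predicted:.4f}")
```

In an actual measurement, the noise energy would be taken from a separate recording of silence rather than from the noise realization contained in the sweep, as is done here for brevity.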

### B. Correlated background noise

Equation (9) provides an expected PCC value based on the assumptions that (1) the sweep responses are identical and (2) the background noise is uncorrelated and stationary. In the following, we discuss these assumptions.

The background noise is likely to contain strong harmonic content caused, for instance, by electric humming. Depending on the phase relation between measurements, harmonic background noise can be strongly correlated.

Let us consider the two extreme cases: when the noise signals *n*_{1} and *n*_{2} are fully correlated positively or negatively (anticorrelated). Thus, $n_1=\pm n_2$, ergo $\sum_{k=1}^{N} n_1 n_2=\pm E[n]$. This yields the following bounds to Eq. (9):

$$\hat{\rho}_{y_1,y_2} = \frac{E[x]+\zeta E[n]}{E[x]+E[n]} = \frac{\mathrm{SNR}+\zeta}{\mathrm{SNR}+1}, \tag{11}$$

where $\zeta=\rho_{n_1,n_2}$ is the correlation between two stationary noise terms. Note that perfectly correlated background noise, as part of the measurement signal, is virtually indistinguishable from an ESS.
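
The effect of correlated noise on the expected PCC can be illustrated as follows. The sketch assumes equal noise energies and noise uncorrelated with the signal, in which case the expected PCC follows from Eq. (5) as (SNR + ζ)/(SNR + 1). The deterministic check uses mutually orthogonal sines as stand-ins for the signal and the noise; these signals and the function names are assumptions made for illustration only.

```python
import math

def expected_pcc(snr, zeta=0.0):
    """Expected PCC for an energy-ratio SNR and noise correlation zeta."""
    if not -1.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie between -1 and 1")
    return (snr + zeta) / (snr + 1.0)

def pcc(y1, y2):
    """PCC of two zero-mean, equal-length signals."""
    num = sum(a * b for a, b in zip(y1, y2))
    den = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return num / den

# Bounds for an SNR of 30 dB (energy ratio of 1000).
snr = 10 ** (30 / 10)
lower = expected_pcc(snr, zeta=-1.0)   # anticorrelated noise
middle = expected_pcc(snr, zeta=0.0)   # uncorrelated noise
upper = expected_pcc(snr, zeta=1.0)    # fully correlated noise

# Deterministic check with anticorrelated "noise" orthogonal to the signal.
N = 1000
x = [math.sin(2 * math.pi * 5 * k / N) for k in range(N)]        # energy 500
n = [0.1 * math.sin(2 * math.pi * 7 * k / N) for k in range(N)]  # energy 5
y1 = [a + b for a, b in zip(x, n)]
y2 = [a - b for a, b in zip(x, n)]                               # n2 = -n1
measured = pcc(y1, y2)
predicted = expected_pcc(500.0 / 5.0, zeta=-1.0)
print(lower, middle, upper, measured, predicted)
```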

To this end, an experiment showing the relation between PCC and SNR values was conducted. A 3-s-long ESS was synthesized and convolved with a synthetic IR having a reverberation time (RT) of 2 s. This signal was then added to a set of white and pink noise signals having various energies so that different values of SNR could be obtained. The noise signals were either uncorrelated (*ζ* = 0) or anticorrelated ($\zeta=-1$). The PCC values of these combined signals were calculated using Eqs. (9) and (11).

To simulate background noise with harmonic content, sawtooth waves were added to the aforementioned noise signals. The phase shifts between these signals were randomized between 0 and *π*. The results of the experiment, shown in Fig. 2, indicate that for clean signals, i.e., without non-stationary noise, PCC calculated as a function of SNR reaches high values close to unity. The results show that the spectral characteristics of the stationary noise have a negligible effect on the correlation as long as the noise does not contain periodic components that may result in sharp peaks or dips in the spectrum, e.g., sine waves.

The results also illustrate that harmonic content in a noise signal can heavily influence the correlation in both positive and negative directions. In Fig. 2, phase shifts for different SNR values create lines parallel to the ones resulting from the assumptions of uncorrelated and anticorrelated stationary noise. Small phase shifts close to 0 produce a highly correlated signal, whereas the increase in phase shift towards *π* decreases the PCC values, placing the signals with the biggest shift between the uncorrelated and anticorrelated boundaries.

In principle, estimating the correlation of the background noise between measurements is possible if there is a sufficiently long time interval without any other signals. However, the time intervals between measurements can be several seconds such that the stationarity of the background signals needs to be fulfilled precisely to reliably estimate the correlation. Here, we adopt the worst-case scenario of anti-correlated noise as a lower bound for the expected PCC.

### C. Transfer-function variation

The measured system itself can undergo change. For instance, the position of the microphone and loudspeakers may vary due to vibrations, or the propagation paths can be altered due to variations in the air caused by temperature and humidity fluctuations or air movement (e.g., due to ventilation).^{17,31,40–43} Unlike background noise, the measurement variations impact the impulse response *h* directly such that the signal model is

$$y_1 = s \ast (h+v_1) + n_1, \qquad y_2 = s \ast (h+v_2) + n_2, \tag{12}$$

where *h* is some “ideal” room impulse response and *v* is the variation of the impulse response between measurements. Thus, the energy relations of the two measurements are

$$E[y_1] = E[s \ast h] + E[s \ast v_1] + E[n_1], \qquad E[y_2] = E[s \ast h] + E[s \ast v_2] + E[n_2]. \tag{13}$$

The difference between the two measurements is then

$$y_1 - y_2 = s \ast (v_1 - v_2) + n_1 - n_2. \tag{14}$$

The energy relation is

$$E[y_1 - y_2] = E[s \ast (v_1 - v_2)] + E[n_1] + E[n_2]. \tag{15}$$

We choose the variation energy between two measurements such that $E[v_1]=E[v_2]$ and $E[s \ast v_1]=E[s \ast v_2]$. Thus,

$$E[s \ast (v_1 - v_2)] = 2E[s \ast v_1], \tag{16}$$

since the variations *v*_{1} and *v*_{2} are uncorrelated (as the correlated part belongs to *h* by definition). Thus, we define the transfer-function variation factor

where $E[s \ast (v_1-v_2)]$ can be retrieved from the difference $y_1-y_2$ using Eq. (15), and $E[s \ast h]$ can be retrieved from the measurement as $E[s \ast h]=E[y_1]-E[s \ast v_1]-E[n_1]$ using Eqs. (13) and (16).
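
The two energies mentioned above can be retrieved from a pair of recordings as sketched below. The sketch assumes that the noise terms, the variation terms, and the ideal response are mutually uncorrelated and that both variations carry equal energy, so that the energy of the difference signal is the sum of the variation and noise energies. The function name, the clamping at zero, and the orthogonal test sines are illustrative assumptions; the factor *τ* itself is not computed here.

```python
import math

def energy(x):
    """Signal energy as the sum of squared samples."""
    return sum(v * v for v in x)

def variation_energies(y1, y2, e_n1, e_n2):
    """Estimate E[s*(v1 - v2)] and E[s*h] from two measurements.

    y1, y2     -- recorded sweeps of equal length
    e_n1, e_n2 -- background noise energies of the two recordings
    """
    diff = [a - b for a, b in zip(y1, y2)]
    # Energy of the difference minus the noise energies (clamped at zero).
    e_var_diff = max(energy(diff) - e_n1 - e_n2, 0.0)
    # Equal, uncorrelated variations: each carries half of that energy.
    e_var_single = e_var_diff / 2.0
    e_ideal = energy(y1) - e_var_single - e_n1
    return e_var_diff, e_ideal

def sine(amplitude, cycles, n=1000):
    """Sine with an integer number of cycles; different cycle counts are orthogonal."""
    return [amplitude * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]

ideal = sine(1.0, 5)                         # stands in for s * h, energy 500
var1, var2 = sine(0.01, 7), sine(0.01, 9)    # variations, energy 0.05 each
n1, n2 = sine(0.001, 11), sine(0.001, 13)    # "noise", energy 0.0005 each
y1 = [a + b + c for a, b, c in zip(ideal, var1, n1)]
y2 = [a + b + c for a, b, c in zip(ideal, var2, n2)]

e_var_diff, e_ideal = variation_energies(y1, y2, energy(n1), energy(n2))
print(e_var_diff, e_ideal)
```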

The PCC of the transfer-function variation is

Therefore, the transfer-function variation factor *τ* serves as a tolerance parameter for the expected PCC from Eq. (11),

The effect of time variances on the impulse response measurements can be modeled with time-stretching^{40} or by introducing sinusoidal jitter to the signal.^{17} The complexity and unpredictability of such variations, however, might render these experiments insufficient to predict *τ* values correctly. Therefore, in this study, transfer-function variation is estimated from the measured signals in Sec. V.

## IV. NON-STATIONARY NOISE EVENTS

During an acoustic measurement, various non-stationary disturbances can occur. Such artifacts are, e.g., impulses, low-frequency noises, and sound dropouts, which originate, respectively, from door slams, heavy vehicles moving outside of the measured space, and errors in measurement software.

This section examines how different types of non-stationary noise impact the PCC values, depending on their energy, frequency content, and time of occurrence. The effect of contamination on the correlation threshold estimation is also discussed.

### A. Impulsive noise

The relation between the energy added to the sweep and the drop in PCC values can be derived from Eq. (5) when one of the signals is contaminated with additional non-stationary noise $n_{\mathrm{ns}}$, which is also assumed to be zero-mean and uncorrelated with both sweeps and stationary noise signals. Following the same reasoning leading from Eq. (5) to Eq. (7), we arrive at the following formula:

where $E[n_{\mathrm{ns}}]$ is the energy of non-stationary noise.

The theoretical values of correlation estimated with the aforementioned formula are presented in Fig. 3. Equation (20) predicts the general trend of decrease in the PCC with the growing energy difference between the clean and contaminated ESS. The energy difference $\Delta E$ is a quotient of the energy of the two signals,

In the experiment, a synthetic sweep signal containing stationary noise, as described in Sec. III B, with an SNR of 84 dB was further contaminated with impulsive noise and low-frequency noise. Broadband, lowpassed, and bandpassed impulses served as impulsive disturbances. Lowpass-filtered white Gaussian noise lasting 1 s was used to elevate the noise floor of the measurement. One hundred signals of each type of non-stationary noise were used. The disturbances appeared at different times within the sweep, and their energy varied as well to obtain various changes to the contaminated signal's energy.

All signals used in the experiment had a different frequency content: the broadband impulse spanned across all frequencies, the lowpassed one had its cutoff frequency at 100 Hz, the bandpassed extended between 500 and 5000 Hz, whereas the white noise was lowpass filtered at 300 Hz.

The results of this experiment, presented in Fig. 3, show that increasing the signal's energy causes the PCC values to drop in accordance with Eq. (20). They also reveal that the correlation between the sweeps may vary for disturbances that add the same amount of energy. This phenomenon is especially prominent for the narrowest-band disturbances, such as low-passed impulse and low-frequency noise, which might be because non-stationary disturbance is not completely uncorrelated, but displays similarity to either the ESS or stationary noise. This is especially possible in the low-frequency region, where the sound is usually less diffuse.^{44}
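
The general trend can be reproduced with a toy example. In the sketch below, a single-sample impulse of growing amplitude is added to one of two otherwise identical signals, and the PCC is evaluated together with the energy change. A sine stands in for the sweep, no stationary noise is included, and expressing the energy difference as the contaminated-to-clean energy ratio in dB is an assumption of this sketch.

```python
import math

def energy(x):
    """Signal energy as the sum of squared samples."""
    return sum(v * v for v in x)

def pcc(y1, y2):
    """PCC of two zero-mean, equal-length signals."""
    num = sum(a * b for a, b in zip(y1, y2))
    return num / math.sqrt(energy(y1) * energy(y2))

N = 1000
clean = [math.sin(2 * math.pi * 5 * k / N) for k in range(N)]

results = []
for amplitude in (0.5, 1.0, 2.0, 5.0):
    contaminated = list(clean)
    contaminated[300] += amplitude          # impulse at a zero crossing
    delta_e_db = 10 * math.log10(energy(contaminated) / energy(clean))
    results.append((amplitude, delta_e_db, pcc(clean, contaminated)))

for amplitude, delta_e_db, rho in results:
    print(f"impulse {amplitude:3.1f}: energy change {delta_e_db:.4f} dB, PCC {rho:.4f}")
```

The PCC decreases monotonically as the impulse adds more energy, while the energy change itself stays well below 1 dB even for the largest impulse.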

Note that the energy differences represented by $\Delta E$ and mentioned in Table I are often small (<0.1 dB) and may fall below the uncertainty of the measurement equipment. Therefore, the Ro2 should be applied to measurements performed within a short time, using the same measurement equipment and the same settings.

### B. Median-based background noise energy estimation

Another problem related to the presence of non-stationary noise in the measurement is the possibility of contaminating the background noise used to estimate the SNR and thus the PCC threshold. A wrongly estimated noise energy $E[n_1]$ leads to an underestimated PCC threshold and thus to the incorrect classification of clean and contaminated sweeps.

When a non-stationary random event contaminates the background noise, the affected samples carry more energy than the clean ones. Therefore, the contamination skews the amplitude distribution of noise in the positive direction. The nature of the stationary noise does not allow for the use of the PCC values as a discriminant for finding the non-stationary noise, as with sweep signals. However, if the amplitude distribution of the noise signal is Gaussian, i.e., $n_1(k)\sim\mathcal{N}(0,\,\sigma^2)$, a robust estimator can be used.

The energy of Gaussian noise is essentially a scaled mean value of the squared signal. The drawback of such an estimator is its high sensitivity to outliers, resulting in false estimations for contaminated signals. The median, however, is less influenced by the outliers than the mean, since its breakdown point, i.e., the maximum proportion of contaminated observations that do not force the estimator to result in an aberrant value, is higher than that of the mean: the breakdown point is 0.5 for the median, whereas it is 0 for the mean.^{45–47} This means that if at least 50% of samples are not contaminated, the median values are not skewed.

For squared samples from the Gaussian distribution, the mean and median are related by a constant scaling factor $b_{\chi}=1.4826$.^{47} Thus, the robust noise energy estimate is

To demonstrate the effectiveness of this method, a 2-s-long noise signal with Gaussian distribution and different values of noise power was contaminated at random times with impulsive noise (as described in Sec. IV A) and 200-ms-long lowpassed Gaussian noise. The non-stationary disturbances were scaled by random factors to achieve various effects on the noise energy. Then, the mean and scaled median values of noise energy were compared to a target value—the energy of noise without contamination.

The results presented in Fig. 4 show that while the mean energy value can change by tens of decibels in the presence of non-stationary noise, the median remains essentially unchanged and very close to the target (within 1 dB). Additionally, using the median instead of the mean does not add any processing to the detection process, making it the recommended procedure for calculating the background noise energy.
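
A comparison of this kind can be sketched in a few lines. The example assumes zero-mean Gaussian noise and uses the standard relation between the standard deviation and the median of the absolute sample values, σ ≈ 1.4826 · median(|n|), so that the energy estimate is N σ². This form is an assumption of the sketch and may differ in notation from Eq. (22); the burst length, amplitudes, and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

B_CHI = 1.4826  # scaling factor for Gaussian data

def mean_energy(noise):
    """Conventional energy: sum of squared samples."""
    return sum(v * v for v in noise)

def median_energy(noise):
    """Robust energy estimate based on the median of absolute sample values."""
    sigma = B_CHI * statistics.median([abs(v) for v in noise])
    return len(noise) * sigma ** 2

rng = random.Random(1)                      # fixed seed for repeatability
N = 20000
clean = [rng.gauss(0.0, 0.01) for _ in range(N)]

# Contaminate 1% of the samples with a strong burst (assumption).
contaminated = list(clean)
for k in range(5000, 5200):
    contaminated[k] += rng.gauss(0.0, 1.0)

target = mean_energy(clean)
mean_error_db = 10 * math.log10(mean_energy(contaminated) / target)
median_error_db = 10 * math.log10(median_energy(contaminated) / target)
print(f"mean-based error {mean_error_db:.1f} dB, median-based error {median_error_db:.2f} dB")
```

In this toy case the mean-based estimate rises by roughly 20 dB, whereas the median-based estimate stays within a fraction of a decibel of the target.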

### C. Sound dropouts

The software-related dropouts do not add energy to the contaminated signal, but reduce it instead. Additionally, skipping the samples creates two cropped sweeps, one of which is shifted with respect to the clean sweep. Therefore, the energy difference introduced by the dropout is of less importance to the correlation between two signals than the time at which the skipping occurs.

To estimate the effect of the time of the dropout on the correlation, the synthesized sweeps, as used in Sec. III B, were contaminated with sound dropouts. The dropouts were simulated by deleting small portions of the signal, ranging from one to ten samples, at different times throughout the ESS and shifting the remaining portion of the sweep by the respective number of deleted samples. The dropouts were broadband disturbances, since a discontinuity is an impulsive event affecting all frequencies.

The relation between the drop in the PCC values and the time at which the samples were skipped is depicted in Fig. 5. The results show that if the dropout happens at the beginning of the sweep, the correlation between the clean and corrupted sweeps is very low. However, if the contamination occurs later in the signal, the PCC drop is less prominent. Additionally, the dropouts appearing after the ESS has finished playing (in the present case, after 3.0 s) affect the correlation only marginally.
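
The dependence on the dropout time can be illustrated with a short simulation. The sketch below synthesizes a 1-s exponential sweep without any room response or noise, deletes five samples either early or late in the sweep, and compares the resulting PCC values. The sweep parameters, the sampling rate, and the zero-padding used to keep the signal length constant are assumptions of this sketch and do not reproduce the experiment described above.

```python
import math

def ess(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 (Hz) lasting `duration` seconds."""
    rate = duration / math.log(f2 / f1)
    return [math.sin(2 * math.pi * f1 * rate * (math.exp(k / fs / rate) - 1.0))
            for k in range(int(duration * fs))]

def drop_samples(x, start, count):
    """Delete `count` samples at index `start`, shift the rest, and zero-pad the end."""
    return x[:start] + x[start + count:] + [0.0] * count

def pcc(y1, y2):
    """PCC of two zero-mean, equal-length signals."""
    num = sum(a * b for a, b in zip(y1, y2))
    den = math.sqrt(sum(a * a for a in y1) * sum(b * b for b in y2))
    return num / den

fs = 8000
sweep = ess(20.0, 3000.0, 1.0, fs)

early = pcc(sweep, drop_samples(sweep, 800, 5))    # dropout at 0.1 s
late = pcc(sweep, drop_samples(sweep, 7200, 5))    # dropout at 0.9 s
print(f"early dropout PCC {early:.3f}, late dropout PCC {late:.3f}")
```

The early dropout shifts almost the entire sweep, including its high-frequency part, and therefore lowers the PCC far more than the late one.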

## V. VALIDATION

In this section, the database used for testing the proposed method is presented. The results of using Ro2 on this dataset are presented, and the transfer-function variation is determined. The proposed method is also compared with another procedure for coping with non-stationary noise in acoustic measurements.

### A. Validation database

Ro2 was validated on a database of swept-sine measurements collected in the *Arni* room at the Acoustics Lab of Aalto University, Espoo, Finland.^{5,48} *Arni* is a rectangular room, with dimensions 8.9 m × 6.3 m × 3.6 m (length, width, and height, respectively). The room's walls and ceiling are equipped with acoustic panels that can switch their state between open and closed, changing the amount of absorption and thus varying the acoustics within the space. A view of the space and measurement equipment is shown in Fig. 6.

The equipment used during the measurements included a 01 dB LS01 omnidirectional loudspeaker (sound source), two G.R.A.S. 1/2-in. diffuse-field microphones of type 40AG, two G.R.A.S. 1/2-in. free-field microphones of type 46AF, one Brüel & Kjær 1/2-in. diffuse-field microphone of type 4192, a G.R.A.S. power module of type 12AG, a measurement laptop, and a MOTU UltraLite mk3 Audio Interface. The measurement signal was a 3-s-long ESS^{19} that was played five times for each panel configuration with 2 s of silence in between to allow the sound to fully decay. The total number of measurements was 5342, amounting to 26 710 sweeps recorded with each microphone.

Due to the size of the database and the time required for its collection, the measurements were conducted automatically, without human supervision. Therefore, when an unwanted acoustic event occurred, no action was taken to discard the corrupted recording and repeat the measurement. This approach led to many sweeps being contaminated with non-stationary noise of unknown origin, type, and energy. Examples of sounds recorded during the measurements are available online.^{49}

### B. Ro2 measurement and selection procedure

The Ro2 detection proceeds as follows: before every measurement, a short period of silence (background noise) is captured, and its energy is calculated from Eq. (22). Next, an ESS is captured, and its energy is calculated as well. In the event that the noise and sweep signal lengths are different, their energies cannot be compared, and thus, the signal power can be used instead. The procedure is repeated so that two sweeps are measured. The expected PCC value is then computed from Eq. (9), and the lower bound for the expected PCC is calculated from Eq. (11). Next, the tolerance resulting from transfer-function variation [obtained from Eq. (18)] is applied according to Eq. (19). Finally, the detection threshold is compared to the sweeps' PCC estimated from Eq. (1).

If the measured PCC is on or above the threshold, both sweeps are classified as clean, and the measurement can end. If, on the other hand, the correlation is below the threshold, the presence of non-stationary noise is indicated. The measurement must continue until two sufficiently highly correlated sweeps are obtained. The ESSs which display low correlation with the clean sweeps are marked as contaminated and are discarded.
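
The selection step can be sketched using the PCC values from Table I. In this illustration, a sweep is labeled clean if it reaches the threshold with at least one other sweep; the threshold of 0.998 is a hypothetical value chosen for the example and is not derived from the SNR or *τ* of that measurement.

```python
# PCC values taken from Table I (five sweeps).
pcc_table = [
    [1.000, 0.999, 0.995, 0.999, 0.999],
    [0.999, 1.000, 0.995, 0.999, 0.999],
    [0.995, 0.995, 1.000, 0.994, 0.994],
    [0.999, 0.999, 0.994, 1.000, 0.999],
    [0.999, 0.999, 0.994, 0.999, 1.000],
]

def classify(pcc_matrix, threshold):
    """Label each sweep as clean (True) or contaminated (False).

    A sweep is considered clean if its PCC with at least one other
    sweep is on or above the threshold.
    """
    n = len(pcc_matrix)
    clean = set()
    for i in range(n):
        for j in range(i + 1, n):
            if pcc_matrix[i][j] >= threshold:
                clean.update((i, j))
    return [i in clean for i in range(n)]

threshold = 0.998  # hypothetical detection threshold
labels = classify(pcc_table, threshold)
contaminated = [i + 1 for i, is_clean in enumerate(labels) if not is_clean]
print("contaminated sweeps:", contaminated)
```

With this threshold, only sweep 3 is marked as contaminated, in line with the lower correlation values in its row of Table I.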

### C. Transfer-function variation estimation

The transfer-function variation factor was estimated for all measurements based on the difference between signals, as in Eqs. (15)–(17). Since this was done for both clean and contaminated sweeps, many values are skewed in the positive direction due to non-stationary disturbances. To eliminate outliers, values of *τ* that were higher than three median absolute deviations (MADs) from the median were discarded.
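
The outlier rejection can be sketched as follows. The example discards values lying more than three MADs above the median; using the unscaled MAD and a one-sided limit is an assumption of this sketch, and the listed *τ* values are hypothetical numbers chosen only to exercise the code.

```python
import statistics

def discard_outliers(values, k=3.0):
    """Keep values not exceeding median + k * MAD; also return that limit."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    limit = med + k * mad
    return [v for v in values if v <= limit], limit

# Hypothetical variation factors; the last two mimic contaminated pairs.
taus = [1.5e-4, 1.7e-4, 1.9e-4, 2.0e-4, 2.1e-4, 1.8e-4, 3.0e-3, 8.0e-3]

kept, limit = discard_outliers(taus)
print(f"limit {limit:.2e}, kept {len(kept)} of {len(taus)} values")
```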

The distribution of transfer-function variation factors is displayed in Fig. 7. The figure shows that abnormal values of *τ* start just above the adopted threshold, with a prominent rise in the number of outliers between $10^{-3}$ and $10^{-2}$.

In the present study, $\tau=0.00019$ (indicated in Fig. 7 with a solid line) was used to set the tolerance to the PCC threshold. Thus, the detection criterion is not as strict as when using the SNR-based threshold. Note, however, that the value of *τ* depends on, among other factors, the length of the sweep, the time between consecutive measurements, and the characteristics of air movement within the measured space. Therefore, although the *τ* presented here may be used as a guideline for similar conditions, ideally it should be estimated for each measurement scenario separately.

### D. Correlation-based detection

In the present study, a period of silence before the emission of each sweep was used for background noise energy estimation. It was certain that there would be no late part of the decay or long-ringing modes present in that part of the measurement.

The validation results are presented in Fig. 8, where a two-dimensional histogram shows the relative probability of the expected and measured PCC values. The distribution reveals two clusters: the first one is located along the diagonal, where both the expected and measured PCC values are similar. It contains clean signals and includes the largest number of occurrences (the color bar limits the probabilities to increase readability). The second cluster consists of clearly contaminated sweeps. It is located below the diagonal in Fig. 8, meaning that the measured PCC is considerably lower than the values expected based on Eq. (9).

The SNR-motivated prediction of PCC values is indicated in Fig. 8 with a dashed blue line. Such a threshold is visibly too strict since the majority of measured signals fall below it. This shows that the tolerances due to transfer-function variation and background noise correlation need to be considered in Ro2. The lower bound for the expected PCC, $\hat{\rho}$ with $\zeta=-1$, is indicated in Fig. 8 with a dash-dotted blue line. Although most of the signals are now classified as clean, a large number of measurements still lie below this threshold.

The final detection threshold $\hat{\rho}$ with tolerance *τ* accounting for time variance is marked with a solid blue line in Fig. 8. The threshold $\hat{\rho}$ with $\zeta=-1$ and $\tau=1.9\times10^{-4}$ identifies most of the sweeps from the top cluster as clean, whereas the signals from the bottom cluster are considered contaminated. This threshold also treats excessive time variation as contamination, so that some sweeps that do not otherwise contain non-stationary noise are discarded. The threshold represented by $\hat{\rho}$ with $\zeta=-1$ and $\tau\neq0$ is recommended when using the Ro2 procedure.

### E. Comparison with a previous method

The proposed correlation-based detection is compared with the procedure developed by Guski,^{31,32} since it is the only other method created specifically for the purpose of identifying impulsive noise in sweep measurements. In this approach, the detection is conducted by first separating the sweep with the IR from the background noise using the iterative approach by Lundeby *et al.*^{50} Then, the logarithmic ratio between the maximum value and the root mean square value of the stationary noise is calculated. If the ratio is in the range of values typical for Gaussian noise, i.e., 12–14 dB, the measurement is classified as clean. However, if the ratio is higher, namely, 20 dB or more, contamination is indicated. Therefore, in the present study, the value of 20 dB is used as the threshold discriminating between clean and corrupted sweeps for the Guski method. The implementation provided in the ITA Toolbox^{51} was employed when testing this procedure.
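
The ratio test at the core of this comparison method can be sketched as follows. The example covers only the peak-to-RMS ratio of a noise segment and the 20-dB threshold used here; it does not include the separation of the noise from the IR, it is not the ITA Toolbox implementation, and the function names and test signals are assumptions.

```python
import math
import random

def peak_to_rms_db(noise):
    """Logarithmic ratio between the maximum absolute value and the RMS value."""
    rms = math.sqrt(sum(v * v for v in noise) / len(noise))
    peak = max(abs(v) for v in noise)
    return 20 * math.log10(peak / rms)

def is_contaminated(noise, threshold_db=20.0):
    """Indicate contamination when the ratio reaches the threshold."""
    return peak_to_rms_db(noise) >= threshold_db

rng = random.Random(0)                       # fixed seed for repeatability
clean = [rng.gauss(0.0, 1.0) for _ in range(20000)]
hit = list(clean)
hit[5000] += 50.0                            # strong impulse (assumption)

clean_ratio = peak_to_rms_db(clean)
hit_ratio = peak_to_rms_db(hit)
print(f"clean noise {clean_ratio:.1f} dB, noise with impulse {hit_ratio:.1f} dB")
```

A disturbance that raises the RMS value without producing a pronounced peak, such as a low-frequency noise burst, changes this ratio only slightly, which is consistent with the lower detection rates reported for such disturbances.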

First, the detection rate for non-stationary disturbances from Sec. IV is compared. Since the signals were synthesized, it was known that *ζ* = 0 and *τ* = 0. Thus, the strictest threshold could be used for the Ro2. The results for each type of non-stationary noise are presented in Table II. They show that the Ro2 is superior in terms of separating clean and contaminated sweeps, with detection rates being higher for each type of disturbance.

| Non-stat. noise type | Detected (Guski) | Detected (Ro2) |
|---|---|---|
| Broadband impulse | 78% | 95% |
| Low-passed impulse | 48% | 95% |
| Band-passed impulse | 63% | 95% |
| Low-frequency noise | 30% | 95% |
| Dropouts | 0% | 75% |


The last row in Table II reveals that both methods perform worst when sound dropouts occur in the ESS. With the Ro2, only the dropouts occurring after about 3.5 s go undetected (cf. Fig. 5). Guski's method, in contrast, was unable to detect this kind of disturbance at all. This result was expected, since Guski's method was not intended for identifying this type of non-stationary noise.

For the remaining types of non-stationary noise, the Ro2 did not correctly identify the sweeps containing disturbances of low energy (cf. Fig. 3). The Guski method, on the other hand, proved inconsistent in this regard, wrongly classifying the ESSs corrupted with both low- and high-energy non-stationary noise.

The comparison was also performed on the dataset of measured sweeps. The signals that were marked as contaminated by the Ro2 were further analyzed by a human annotator. The measurements were checked in terms of audibility of non-stationary disturbances as well as their visibility in spectrograms, since often the signal itself may mask the contamination, rendering it inaudible. The sweeps falling below the detection threshold due to the excessive transfer function variation were not incorporated in further experiments, as the Guski method was insensitive to time variance.

In the annotation process, 283 contaminated signals were selected to be analyzed with the Guski method. The results of the comparison are shown in Table III. The total number of measurements marked as contaminated by the Ro2 served as a reference, constituting 100% of detected non-stationary disturbances. Seventy percent of these signals were also correctly identified by Guski's method, while the remaining 30% were false negatives, i.e., contaminated sweeps that it classified as clean. The human annotation revealed that the majority of unidentified disturbances were short low-frequency noise bursts. The Guski procedure also overlooked a small number of ESSs including impulsive noise.

| Outcome | Guski | Ro2 |
|---|---|---|
| Detected | 197 (70%) | 283 (100%) |
| Low-freq. noise undetected | 77 (27%) | 0 (0%) |
| Impulsive noise undetected | 9 (3%) | 0 (0%) |


Both experiments show that the Ro2 outperforms Guski's method regardless of the type of contamination. Its efficiency and robustness prove that it is the best available method for separating clean sweeps from those containing non-stationary noise. The Ro2 method can thus be recommended for acoustic measurements in situations where non-stationary noise may occur, which include most practical scenarios.

## VI. CONCLUSION

The paper introduces a novel method, called the rule of two or Ro2, to identify a pair of clean exponential swept sines in a series of repeated sweep measurements. The classification is based on the similarity between the ESS signals, expressed by means of Pearson's correlation coefficient. A detection threshold separates signals containing expected contamination, such as background noise and time variance, from those contaminated by non-stationary noise. This study also shows that using the median to estimate the background noise energy helps avoid the bias caused by non-stationary events.
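
The robustness of the median can be illustrated with a short sketch. It is only an assumption about how such an estimate might be set up: the signal is split into short frames, the energy of each frame is computed, and the median over frames is compared with the mean. The frame length, the function names, and the synthetic noise are hypothetical and are not taken from the method described in this paper.

```python
import numpy as np

def frame_energies(x, frame_len):
    """Mean-square energy of consecutive non-overlapping frames (incomplete tail dropped)."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def noise_energy_estimates(x, frame_len=1024):
    """Return (mean-based, median-based) estimates of the background noise energy."""
    e = frame_energies(x, frame_len)
    return float(np.mean(e)), float(np.median(e))

# Synthetic illustration (assumed data): stationary noise of variance 1e-4 plus one short burst.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, 96000)
noise[30000:30050] += 1.0  # non-stationary event confined to a single frame

mean_est, median_est = noise_energy_estimates(noise)
print(f"mean-based:   {mean_est:.2e}")
print(f"median-based: {median_est:.2e}")
```

In this example, the burst inflates the mean-based estimate several times over, while the median-based estimate stays close to the energy of the stationary noise because only one frame is affected.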

If the resulting PCC value between two measured sweeps is above the threshold, the measurements are marked as clean, and both signals can be used in further analysis. If, on the other hand, the correlation is lower than the threshold, the presence of non-stationary noise is indicated and the signals must be discarded. Therefore, the measurement should be repeated until a pair of highly correlated ESSs is found.
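
The decision rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the threshold is passed in as a plain number (the value 0.999 below is a placeholder, not derived from the background noise energy and time variance as the paper prescribes), and the helper names `pearson` and `find_clean_pair`, as well as the synthetic sweeps, are hypothetical.

```python
from itertools import combinations

import numpy as np

def pearson(a, b):
    """Pearson's correlation coefficient between two equally long signals."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def find_clean_pair(sweeps, threshold):
    """Return the indices of the first pair whose PCC is above the threshold, or None."""
    for i, j in combinations(range(len(sweeps)), 2):
        if pearson(sweeps[i], sweeps[j]) > threshold:
            return i, j
    return None  # no clean pair yet: the measurement should be repeated

# Synthetic illustration (assumed data): a 1-s exponential sweep recorded three times.
fs, dur, f1, f2 = 48000, 1.0, 20.0, 20000.0
t = np.arange(int(fs * dur)) / fs
rate = np.log(f2 / f1)
sweep = np.sin(2 * np.pi * f1 * dur / rate * (np.exp(t * rate / dur) - 1.0))

rng = np.random.default_rng(2)
recordings = [sweep + rng.normal(0.0, 0.01, sweep.size) for _ in range(3)]
recordings[0][10000:12000] += rng.normal(0.0, 0.5, 2000)  # non-stationary burst

print(find_clean_pair(recordings, threshold=0.999))
```

Here the first recording carries a noise burst, so both pairs that include it fall below the placeholder threshold, and the second and third recordings are returned as the clean pair.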

In the large set of thousands of experiments reported in this study, the Ro2 procedure proved to be reliable and easily applicable in acoustic measurements. It also performed better than the previously established procedure for non-stationary noise detection, proving its robustness and efficiency. The Ro2 procedure increases the reliability of practical acoustic and audio measurements using sine sweeps.

## ACKNOWLEDGMENTS

This work was supported by the “Nordic Sound and Music Computing Network—NordicSMC,” NordForsk Project No. 86892.