Room impulse responses (RIRs) vary over time due to fluctuations in atmospheric temperature, humidity, and pressure. This can introduce uncertainties in room transfer-function measurements, which are challenging to account for. Previous methods of identification and compensation of time variance focus on systematic atmospheric changes and do not apply to subtle discrepancies in RIRs. In this work, we address this problem by proposing a model of short-time coherence between repeated RIR measurements as an indicator of time-frequency similarity and as a measure of time-variance-induced changes in RIRs. Atmospheric changes cause fluctuation in sound speed, which, in turn, results in variation in the time-of-arrival of sound reflections following a Generalized Wiener process. We show that the short-time coherence decreases exponentially with the reflection-path length and propose volatility as a single model parameter determining the coherence decay rate. The proposed model is validated on simulations and measurements, showing applicability in indoor scenarios. The method reliably estimates volatility of as measured under laboratory conditions. We exemplify the utility of short-time coherence loss by predicting the high-frequency energy loss stemming from RIR averaging. The proposed method is useful in assessing the uncertainty of RIR measurements, especially when repeated measurements are compared or averaged.
I. INTRODUCTION
In room acoustics, the most common way to learn about sound propagation in a particular enclosed space is to measure its room impulse response (RIR). Among numerous measurement techniques, many require that the system under test is linear and time-invariant (LTI) to produce a correct output with minimal artifacts, such as harmonic distortion.1 That prerequisite is, however, impossible to fulfill in RIR measurements due to the nature of air as a propagation environment, which is susceptible to fluctuations in atmospheric conditions, as well as to random movements of the medium.2–8
Noise-based excitation signals, such as maximum-length sequences, are notoriously sensitive to changes in the transfer function,2,5,7 although the problem applies to sweep-based techniques as well.4,9–12 When the LTI requirement does not hold within the duration of one measurement, i.e., intraperiodic changes appear, distortion may occur in high frequencies of captured signals.3 If, on the other hand, the system is not LTI when two or more measurements are compared, i.e., interperiodic changes take place, the differences in phases caused by time variance may lead to destructive interference, particularly in high frequencies,9,10 which appears as a loss of signal energy in the deconvolved RIRs.2,9,13 This is especially problematic when synchronous averaging of measured signals or of the RIRs themselves is used to enhance the signal-to-noise ratio (SNR).
Considering RIR variations is especially relevant in scenarios where multiple measurements are to be compared, such as measurements of loudspeaker directivity14 and scattering coefficient.15,16 Regardless of short excitation signals and relatively stable measurement environments in such scenarios, changes in the RIR still affect the obtained parameters when measurements last for a long time15,16 or are averaged.
Changes in the transfer function play an important role in acoustic feedback control and cancellation.17 The temperature-induced changes in the return path from the loudspeaker to the microphone might shift the howlback frequencies of a system.18–20 While this effect is not necessarily a negative consequence, the knowledge of the amount of time variance in a room is still crucial for assessing the performance of feedback cancellation.
The RIR variations are mainly attributed to changes in atmospheric conditions in the measured room. Most notably, significant changes in the average temperature between RIRs were identified as the cause for altered speed of sound, resulting in one of the RIRs being time-stretched in comparison to the other, i.e., the respective reflections arriving at the receiver earlier or later in time.10,21,22 Many studies attribute this effect to temperature changes only,2,4,7,23 while others, mostly on the topic of acoustic tomography, also consider the impact of varying humidity.24–26
Another effect that contributes to the time variance in RIRs is the movement of the air, which commonly results from working HVAC devices,2 objects moving or being inserted into the scene,27,28 and convective flow as a consequence of temperature changes.24,29,30 Although air movement affects the speed of sound, it has been studied only in the context of wind in outdoor sound propagation.29,31,32 In rooms, this effect has been considered as an increase in the background noise level.2 In the present study, we do not differentiate between the effects of air movement and of changes in atmospheric conditions on the speed of sound c, but treat them collectively as factors that comprise time variance.
In the literature, the estimation of the effect of varying speed of sound on the measured RIRs is based on the short-time running cross correlation, and specifically, the time-lag of its maximum value.10,21,22 Depending on the properties of the time lag over time, the main phenomena causing time-variance are identified,21 and correction methods are applied.10,22 However, such techniques only apply to systematic atmospheric changes, which cause considerable differences in resulting RIRs.10,21,22
In this work, we propose a model to account for the stochastic variations in the speed of sound. Concretely, the proposed model predicts the short-time coherence between repeated RIR measurements. The model has a single frequency-independent parameter called volatility, which indicates the strength of the atmospheric variation. Based on a spatiotemporal distribution of the speed of sound changes, we derive the variation of the time-of-arrival (TOA) of reflection paths between the source and receiver as a Generalized Wiener process. As a consequence, the proposed model predicts an exponentially decaying short-time coherence.
The paper is organized as follows. Section II explains the time variance problem and discusses RIR coherence. Section III derives the short-time coherence model between repeated RIR measurements and presents an estimation method of the volatility parameter. The proposed model is validated on simulations in Sec. IV, while Sec. V shows the results of applying the model to measured RIRs. Section VI shows how the proposed model can be used to predict the energy loss caused by synchronous averaging of RIRs. Section VII offers a summary of the paper and concluding remarks.
II. BACKGROUND
The effect of atmospheric changes on an RIR is manifested in two ways: as a change in atmospheric absorption of sound (affects mostly high frequencies33) and as changes in the speed of sound c (wideband effect). The latter is of particular interest in the present work. To this end, this section discusses the spatiotemporal variations in the speed of sound and elaborates on the coherence of repeated RIR measurements.
A. RIR variation
The atmospheric conditions determine the speed of sound in the medium. The effective speed of sound is a combination of the scalar temperature-dependent Laplace sound speed and the vector velocity field of the fluid.30 In literature, the change in the speed of sound is often assumed to be approximately homogeneous, meaning that the speed of sound is constant for the entire measurement period2 or the changes occur slowly enough that the constant value is assumed anyway.22 In room acoustic simulations, at position x is often assumed to be time-invariant or even time- and space-invariant. However, the atmospheric conditions in a room rarely stay constant homogeneously and exhibit small fluctuations of values.34 The fluctuations cause to change over time, while its statistical distribution remains stationary. In this work, we focus on the difference in speed of sound spatial distribution between two measurement instances, and .
The changes from to alter the propagation time of reflections. This is illustrated in Figs. 1 and 2 by an example of the image-source method (ISM). According to ISM, having a room geometry with a sound source and a receiver, the image sources are created by reflecting the original source against the surfaces in the room geometry and using those created images for further generation of higher-order image sources.35 Two of the resulting first-order image sources are marked with gray dots in Figs. 1(a) and 1(b).
When atmospheric changes prompt fluctuations in the speed of sound, they are inhomogeneous in both time and space. In Figs. 1(a) and 1(b), this is signified by the colored voxels of shades of red and blue, marking the different values of c, which is assumed constant in each voxel. The sound paths between the sources and the receiver traverse multiple voxels, resulting in the sound traveling through regions of different c values. Consequently, the TOA of reflections is affected, resulting in discrepancies in the simulated RIRs, as depicted in Fig. 2. The magnitude of differences is proportional to the image-source order, i.e., RIR duration. In Figs. 1 and 2, the values of and were computed for realistic temperature fluctuations of 20 °C °C.34
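The link between a temperature change and the TOA of a reflection can be sketched numerically. The snippet below is a minimal illustration, not the authors' simulation: it uses the common dry-air approximation c ≈ 331.3·sqrt(1 + T/273.15) m/s, applies a homogeneous temperature change along the whole path (a simplification of the voxel-wise inhomogeneous field in Fig. 1), and uses a hypothetical fluctuation of 0.1 °C. The function names are illustrative.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def toa_shift(path_length_m: float, temp_c: float, delta_c: float) -> float:
    """Change in time of arrival (s) when the whole path warms by delta_c.

    Assumption: the temperature change is homogeneous along the path.
    """
    t_ref = path_length_m / speed_of_sound(temp_c)
    t_new = path_length_m / speed_of_sound(temp_c + delta_c)
    return t_new - t_ref


# Hypothetical fluctuation of 0.1 degrees C around 20 degrees C.
for length_m in (10.0, 100.0, 343.0):
    shift_us = 1e6 * toa_shift(length_m, 20.0, 0.1)
    print(f"path {length_m:6.1f} m -> TOA shift {shift_us:8.2f} microseconds")
```

The shift grows with the path length, so late reflections are displaced more than early ones. For comparison, one period of a 20-kHz component lasts 50 μs, which is why high frequencies are the first to lose phase alignment.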
B. RIR short-time coherence
C. Problem formulation
If the estimated short-time coherence were equal to the SNR-based one, , the loss of coherence would be attributed only to the decaying energy of the signal,40 approaching the uncorrelated background noise. However, the estimated coherence is often lower, , even when the SNR is high, indicating the presence of RIR variation39 according to Eq. (7).
Recently, we presented a study on the loss of correlation over the RIR duration and time separation between measurements.13 The research results show that correlation is a good metric of RIR energy loss resulting from time-variance-induced changes in the propagation paths. Here, we extend this concept to time-frequency domain and propose a model of short-time coherence of repeated RIR measurements based on statistical considerations of the speed of sound fluctuations.
III. METHODOLOGY
In this section, we derive a single-parameter model for the short-time coherence between repeated RIR measurements due to the speed of sound fluctuations. We also propose how to estimate the model parameters from RIR measurements.
A. Statistical TOA change model
In the derivation of Eq. (10), we assumed that is independent and identically distributed. These assumptions can be considerably weakened. Due to Donsker's theorem41 (see their Theorem 8.1.3), no assumption on the change factor is required other than that it has a mean and a finite variance. In particular, does not need to be normally distributed. The assumptions on can be even weaker, such that they may be differently distributed42 and even weakly correlated.43 Note that the change factor in Eq. (9) depends on the difference of the speed of sound and not its spatial distribution itself. In many practical situations, the temperature gradients are present in a room such that , which does not directly impact . Therefore, we believe that the Generalized Wiener process in Eq. (10) is widely applicable to account for the random speed of sound fluctuations in a room.
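The role of Donsker's theorem can be illustrated with a small Monte Carlo experiment. The sketch below is a toy model under stated assumptions: the path is split into equal segments, each contributing an independent, zero-mean, uniformly distributed (deliberately non-Gaussian) TOA change in arbitrary units. If the accumulated deviation behaves like a Wiener process, its variance should grow linearly with the number of segments, i.e., with the path length.

```python
import random
import statistics


def toa_deviation(num_segments: int, rng: random.Random) -> float:
    """Accumulated TOA change: sum of i.i.d. zero-mean, non-Gaussian increments."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(num_segments))


def deviation_variance(num_segments: int, trials: int = 4000, seed: int = 1) -> float:
    """Monte Carlo estimate of the variance of the accumulated TOA change."""
    rng = random.Random(seed)
    samples = [toa_deviation(num_segments, rng) for _ in range(trials)]
    return statistics.pvariance(samples)


var_short = deviation_variance(50, seed=1)   # shorter reflection path
var_long = deviation_variance(200, seed=2)   # path four times longer
print("variance ratio (expected to be close to 4):", var_long / var_short)
```

The standard deviation of the TOA change therefore scales with the square root of the path length, which is the behavior that the volatility parameter summarizes.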
Drift can occur when the mean speed of sound in the space changes between the measurements, for example, by overall changes in temperature. The drift can be modeled and compensated by resampling the RIR.22 In this work, we focus on repeated measurements within a short time frame such that the fluctuations occur without drift, i.e., η = 0, or compensate for when applicable (see Sec. V).
The volatility ϑ also depends on the speed of sound fluctuations. However, estimating the volatility from those physical parameters is beyond the scope of our study. Instead, we estimate the volatility ϑ from the short-time coherence between repeated RIR measurements in Eq. (7).
The volatility depends on the time between the measurements. When the RIR measurements are close in time, the change factor is small, and thus ϑ is small. When the measurements are farther apart in time, the change factor grows, and with it, the volatility ϑ. We demonstrate this in Sec. V with repeated measurements.
B. Short-time coherence model
Similarly, Fig. 4 depicts the progression of over time for different frequencies with a given volatility. The coherence at each frequency decays as an exponential function. For volatility , the 20-kHz band coherence fell by a third of its initial value after 1 s, while the low frequencies, 1–2 kHz, still have values of close to 1.
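The exponential decay can be reproduced with a few lines of code. The sketch below assumes that the TOA difference between the two measurements at time t is zero-mean Gaussian with variance ϑ²t, so that the expected coherence of a narrowband component at frequency f is the Gaussian characteristic function exp(−(2πfϑ)²t/2). The factor 1/2 follows from that assumption and may differ from the normalization of ϑ used in the model equation; the volatility value is hypothetical.

```python
import math


def model_coherence(t: float, f: float, volatility: float) -> float:
    """Expected coherence at time t (s) and frequency f (Hz).

    Assumption: Gaussian TOA deviation with variance volatility**2 * t.
    """
    return math.exp(-0.5 * (2.0 * math.pi * f * volatility) ** 2 * t)


VOLATILITY = 5e-6  # hypothetical value, in seconds per square root of second

for freq_hz in (1000.0, 5000.0, 10000.0, 20000.0):
    row = [model_coherence(t, freq_hz, VOLATILITY) for t in (0.25, 0.5, 1.0)]
    print(f"{freq_hz / 1000:5.1f} kHz:", ", ".join(f"{v:.4f}" for v in row))
```

Because the exponent is linear in t and quadratic in f, doubling the time squares the coherence, while doubling the frequency raises it to the fourth power.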
C. Volatility estimation
The proposed method estimates the volatility ϑ using the short-time coherence between two RIRs. We use the decomposition in Eq. (7) to estimate the from measurements x and .
The volatility can be estimated for different time-frequency regions or jointly over the entire signal. This work determines the volatility per frequency band to study the frequency dependency. We used the MATLAB function fminbnd as a general-purpose bounded scalar optimizer to determine ϑ numerically. For time intervals with low SNR, the fit in Eq. (19) is dominated by the SNR term and is, therefore, unsuitable for estimating the volatility. Instead, we propose to perform the fitting in a time interval with a sufficiently high SNR, which we discuss in Sec. V.
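A bounded one-dimensional fit of this kind can be sketched as follows. This is not the authors' implementation: a plain golden-section search stands in for fminbnd, the cost is assumed to be a least-squares error between the measured and modeled coherence, and the model form (Gaussian TOA deviation with variance ϑ²t) is the same assumption as a characteristic-function argument would give. The synthetic data, the bounds, and all names are hypothetical.

```python
import math
import random


def model_coherence(t, f, volatility):
    # Assumed model form: Gaussian TOA deviation with variance volatility**2 * t.
    return math.exp(-0.5 * (2.0 * math.pi * f * volatility) ** 2 * t)


def golden_section_min(func, lo, hi, tol=1e-12):
    """Minimize a unimodal scalar function on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


def estimate_volatility(times, coherence, f, upper=5e-5):
    """Least-squares fit of the assumed model to a measured coherence curve."""
    def cost(vol):
        return sum((rho - model_coherence(t, f, vol)) ** 2
                   for t, rho in zip(times, coherence))
    return golden_section_min(cost, 0.0, upper)


# Synthetic "measurement": known volatility plus a small estimation noise.
rng = random.Random(0)
TRUE_VOLATILITY = 5e-6
times = [0.01 * (i + 1) for i in range(50)]
measured = [model_coherence(t, 20000.0, TRUE_VOLATILITY) + rng.gauss(0.0, 0.005)
            for t in times]
estimate = estimate_volatility(times, measured, 20000.0)
print(f"true {TRUE_VOLATILITY:.3e}, estimated {estimate:.3e}")
```

The search relies on the cost being unimodal within the bounds, which holds for a noise-free curve and approximately for mildly noisy ones. With very noisy data or very wide bounds, a coarse grid search before the refinement would be a safer choice.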
IV. VALIDATION VIA SIMULATIONS
In the following, we verify the proposed model with a simulation. First, we generate two RIRs using the stochastic reverberation model.45,46 We synthesize the RIRs in the frequency domain to allow for fractional delays.47 The reflection amplitudes are drawn from a uniform distribution between 0 and 1. The TOA changes of the synthesized reflections are drawn randomly from a normal distribution and scaled according to Eq. (13), with a . The TOA changes are the only aspect differentiating the two signals, simulating a real-life scenario when two consecutive RIRs captured using the same measurement setup are subjected to time variance.
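The frequency-domain synthesis with fractional delays can be sketched as below. It is a simplified stand-in for the stochastic reverberation model, under several assumptions: a 48-kHz sampling rate, a short 128-sample response, a naive inverse DFT, a hypothetical volatility, and TOA changes whose standard deviation grows with the square root of the TOA (the exact scaling of Eq. (13) is not restated here). Each reflection is a linear phase term, so its delay does not need to fall on the sample grid.

```python
import cmath
import math
import random


def synthesize_rir(amplitudes, toas, num_samples, fs):
    """Sum of fractionally delayed impulses, built in the frequency domain."""
    spectrum = []
    for m in range(num_samples):
        # Signed bin index, so the result of the inverse DFT is real-valued.
        k = m if m <= num_samples // 2 else m - num_samples
        freq = k * fs / num_samples
        spectrum.append(sum(a * cmath.exp(-2j * math.pi * freq * tau)
                            for a, tau in zip(amplitudes, toas)))
    rir = []
    for n in range(num_samples):  # naive inverse DFT, adequate for a short sketch
        acc = sum(spectrum[m] * cmath.exp(2j * math.pi * m * n / num_samples)
                  for m in range(num_samples))
        rir.append(acc.real / num_samples)
    return rir


FS = 48000.0        # assumed sampling rate (Hz)
NUM_SAMPLES = 128   # short response to keep the naive DFT fast
VOLATILITY = 5e-6   # hypothetical volatility

rng = random.Random(0)
toas = sorted(rng.uniform(0.0, 0.8 * NUM_SAMPLES / FS) for _ in range(20))
amps = [rng.uniform(0.0, 1.0) for _ in toas]
# Assumed scaling: the standard deviation of the TOA change grows as sqrt(TOA).
shifted = [tau + rng.gauss(0.0, 1.0) * VOLATILITY * math.sqrt(tau) for tau in toas]

rir_a = synthesize_rir(amps, toas, NUM_SAMPLES, FS)
rir_b = synthesize_rir(amps, shifted, NUM_SAMPLES, FS)
print("largest sample difference:", max(abs(a - b) for a, b in zip(rir_a, rir_b)))
```

The two responses share amplitudes and nominal TOAs and differ only in the small TOA changes, mirroring the setup described above.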
We estimate the short-time coherence on bandpassed signals. The bandpass filters are minimum-order filters, i.e., filters that achieve the required magnitude response with the lowest possible order. They have a stopband attenuation of 60 dB, and the delay introduced by each filter is compensated. As Figs. 3 and 4 show, the behavior of the short-time coherence depends on the wavelength, with pronounced differences in the high-frequency regions requiring sufficient analysis resolution above 10 kHz. Therefore, in this work, we choose a constant bandwidth of 1000 Hz, spanning ±500 Hz from the center frequency, in place of the more commonly used octave bands.
We then apply the short-time coherence to estimate and compare it with . We use a rectangular window of length 21.3 ms. Figure 5 shows the coherence between two simulated RIRs, bandpassed at 5, 10, and 20 kHz ±500 Hz, and the fit obtained by the statistical model. The lines resulting from the coherence model fit the estimated well, closely following the values decrease over time and in frequency. The values of and at times 250, 500, and 1000 ms are also presented in Fig. 6. Although there are some differences between the two curves, it is obvious from Fig. 5 that they come from the noisiness of the estimated coherence.
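A windowed coherence estimate of this kind can be sketched as a normalized zero-lag cross-correlation. The snippet below assumes that definition and uses consecutive, non-overlapping rectangular windows; the window hop is an assumption. At an assumed sampling rate of 48 kHz, a 1024-sample window lasts about 21.3 ms.

```python
import math


def short_time_coherence(x, y, win_len):
    """Normalized zero-lag cross-correlation in consecutive rectangular windows.

    Assumption: non-overlapping windows; a sliding window works the same way
    with a smaller hop.
    """
    values = []
    last_start = min(len(x), len(y)) - win_len
    for start in range(0, last_start + 1, win_len):
        xs = x[start:start + win_len]
        ys = y[start:start + win_len]
        num = sum(a * b for a, b in zip(xs, ys))
        den = math.sqrt(sum(a * a for a in xs) * sum(b * b for b in ys))
        values.append(num / den if den > 0.0 else 0.0)
    return values


# Toy check: a decaying tone compared with a copy whose delay error grows in time.
n_samples, period = 256, 16
tone = [math.exp(-n / 200.0) * math.sin(2.0 * math.pi * n / period)
        for n in range(n_samples)]
drifting = [math.exp(-n / 200.0) * math.sin(2.0 * math.pi * (n + n / 100.0) / period)
            for n in range(n_samples)]
print([round(v, 3) for v in short_time_coherence(tone, drifting, 32)])
```

The normalization removes the energy decay of the response, so the estimate reflects only the phase agreement within each window.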
As shown in Figs. 5 and 6, the coherence drops faster for high frequencies, which is especially visible above 10 kHz. This is attributed to the TOA variation, which results in larger phase changes in high-frequency bands. Conversely, the phase for frequencies between 1 and 5 kHz remains largely unaffected by the TOA variation.
Note that the many choices in the simulation, e.g., the distribution of reflections and the reflection amplitudes and signs, do not affect the resulting coherence curve. Effectively, only the volatility between the RIRs determines the resulting short-time coherence.
The goodness of fit between the estimated and modeled coherence was further evaluated by computing the model-signal correlation (MSC) expressed by the Pearson Correlation Coefficient. For all considered bands, the lowest MSC value was 0.834 (1-kHz band), and the highest one was 0.987 (at 17 kHz). The mean value of MSC for all bands was 0.928, and the standard deviation was 0.022. Such a high similarity confirms that the proposed model accurately predicted the short-time coherence due to the simulated time variance.
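The model-signal correlation is a plain Pearson coefficient between the two coherence curves. A minimal sketch, with made-up curve values for illustration only:

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical estimated and modeled coherence values in one band.
estimated = [0.99, 0.95, 0.93, 0.88, 0.86, 0.80]
modeled = [0.98, 0.96, 0.92, 0.89, 0.85, 0.82]
print("MSC:", pearson(estimated, modeled))
```

Since the Pearson coefficient is invariant to offset and scale, it measures how well the shapes of the two curves agree rather than their absolute difference.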
Additionally, the model was evaluated by comparing the estimated volatility ϑ to a ground-truth value of . Figure 7 shows that the proposed model predicts the volatility accurately, with an estimation error no higher than 4.3% of the reference value. Such a result further reinforces the applicability of the proposed model to estimating time-variance-induced changes in RIRs.
V. EVALUATION WITH MEASUREMENTS
This section presents the validation of the proposed model on two sets of measured RIRs. The background noise considerations are discussed as well.
A. Measurement setup
The proposed model was validated on two sets of RIRs, which were collected in the variable acoustics laboratory Arni, located at the Acoustics Lab on the Aalto University campus in Espoo, Finland. Arni is a shoebox-shaped room with dimensions 8.9 m × 6.3 m × 3.6 m (length, width, and height, respectively). Its walls and ceiling are covered with variable acoustics panels, which can change their state from open-absorptive to closed-reflective.33,48 A total of 55 individually controllable panels allows for a multitude of total room absorption combinations, resulting in a variety of RIRs with decay times spanning 0.2–1.5 s. A detailed illustration of the room and the measurement setup is provided elsewhere.33,48,49 In this work, we always analyze a pair of RIRs that was captured during the same measurement session, using the same room setup, equipment, and source-receiver positions, with the only variable being the time that elapsed between the measurements.
Both sets of RIRs were obtained using the exponentially swept sine as the excitation signal. The sweep was 3 s long, with a frequency range from 20 Hz to 20 kHz. The first set of Arni RIRs, from here on called Arni 1, comprises 5342 panel combinations. For each of those, the sweep measurement was repeated five times, with 2 s of silence between repetitions, so that the sound could fully decay.
In the second set, dubbed Arni 2, the panel combination remained the same for the whole duration of the measurement session. The RIRs were measured continuously for about two hours with a series of 3-s sweeps with 2 s of silence between each of them, resulting in 1291 measurements. This means that, for any two RIRs, the excitation at each frequency is separated in time by a multiple of 5 s. The SNR in both measurement sets spanned 40–70 dB.
B. Background noise
The measured RIRs contain noise, so it is beneficial for the coherence analysis to consider only their useful parts. Non-stationary noise is considered an artifact, and its presence renders the measurements unsuitable for further analysis;12,51 therefore, the studied RIRs were devoid of any non-stationary noise. First, if there is a noisy part before the direct sound, representing the delay between the emission of the excitation signal and the arrival of the direct sound, it should not be considered in the volatility estimation.
Additionally, we assume that at the time of emission from the sound source (t = 0) two consecutive measurements are identical, assuming a perfectly reproducible sound source, but are susceptible to time variances already on the path from source to receiver, thus making the direct sound events in two RIRs different from each other [cf. Figs. 1(a) and 1(b)]. Hence, fitting the curve resulting from the statistical model starting from the direct sound will result in a sub-optimal fit. However, if the time delay between the source and the receiver is known, we can perform a simple data augmentation by inserting the coherence value of unity at time zero, enhancing the accuracy of the fit.
Second, as the RIRs decay, the energy of the useful part of the signal is gradually suppressed by the stationary noise, resulting in a drop of coherence to values oscillating around zero. Thus, only the part of RIR with sufficient SNR should be considered in volatility estimation.
The SNR effect is depicted in Fig. 8 on the example of RIRs from Arni 1 for a 19-kHz frequency band. In the top pane, both and are relatively stable up to around 400 ms, after which they display a sudden drop, which is attributed to a growing contribution of stationary noise to the signal. The SNR, presented in the bottom pane of Fig. 8, decreases from around 30 dB at 370 ms to close to 0 dB at 500 ms. Thus, in this work, we use 30 dB as the SNR threshold for volatility estimation, which proved to be a good value for both RIR datasets. The fitting range is marked with the black dashed line in both panes of Fig. 8. The fitted model , shown in the top pane of Fig. 8, follows the decrease in estimated coherence well, although detail is lacking, as does not predict, e.g., SNR-induced coherence drop. Multiplying with shows the combined curve, closely replicating the behavior of .
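The selection of the fitting range and the SNR-related coherence term can be sketched as follows. Two assumptions are made: the two measurements contain the same signal plus independent stationary noise of equal power, in which case their expected correlation is SNR/(1 + SNR) with the SNR on a linear power scale, and the fitting range ends at the first window whose SNR falls below the threshold. The SNR curve in the example is made up.

```python
def snr_coherence(snr_db: float) -> float:
    """Expected coherence of two measurements that differ only by noise.

    Assumption: independent noise of equal power in both measurements.
    """
    snr = 10.0 ** (snr_db / 10.0)
    return snr / (1.0 + snr)


def fit_range_end(snr_db_curve, threshold_db=30.0):
    """Index of the first window below the SNR threshold (exclusive range end)."""
    for index, snr_db in enumerate(snr_db_curve):
        if snr_db < threshold_db:
            return index
    return len(snr_db_curve)


# Hypothetical per-window SNR of a decaying RIR, in dB.
snr_curve = [62.0, 55.0, 47.0, 41.0, 34.0, 28.0, 19.0, 9.0, 2.0]
end = fit_range_end(snr_curve)
print("windows used for fitting:", list(range(end)))
print("SNR-based coherence:", [round(snr_coherence(s), 4) for s in snr_curve])
```

At 30 dB the noise-related term is still about 0.999, so within the fitting range nearly all of the observed coherence loss can be attributed to time variance.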
C. Results
The resulting model fits for two examples, one from Arni 1 and one from Arni 2, are presented in Figs. 9 and 10. In the case of Arni 1, Fig. 9 shows the fit for a pair of RIRs separated by 5 s and two frequency bands with center frequencies 10 and 19 kHz. The modeled curves follow the estimated ones closely as long as the SNR stays above 0 dB. The accuracy of the modeled coherence is similar in both frequency bands, proving that the proposed model works well regardless of the amount of RIR data available for fitting. For the 10-kHz band, the volatility was , while for the 19-kHz band it was .
The example from Arni 2 dataset, presented in Fig. 10, shows the model fitted on three pairs of RIRs with increasing time separation between the measurements—5, 30, and 50 min. The analyzed frequency band was 18 kHz. As expected based on our previous research,13 the larger the time separation, the steeper the coherence curves, which signifies the growing discrepancies between RIRs. Again, the modeled coherence follows the estimated coherence closely, also realistically reproducing the increase in volatility for measurements further apart with time. The estimated volatility values for this example were for time separations 5, 30, and 50 min, respectively.
Similarly to the simulation results, the goodness of fit was again assessed using MSC between the estimated and modeled coherence curves. The evaluation was performed only on the curves' segments that were used for the fitting, i.e., parts displaying sufficient SNR. The MSC values for all frequency bands ( kHz) and all time separations (5–20 s) for RIRs from the example panel combination from Arni 1 are depicted in Fig. 11. Figure 12 illustrates MSC results for the three pairs of RIRs from Arni 2. The similarity between the measurements and predictions for Arni 1 is above 0.85 for all the considered bands except for the 5-s time separation at 18 kHz. In the case of Arni 2, the MSC values are higher, consistently remaining above 0.95, except for the 50-min time separation and 19 kHz band. The results show that despite the minor differences between the estimated and the fitted coherence curves, the model is capable of predicting the time-variance-induced similarity loss exceptionally well.
To obtain more information about the statistics of volatility ϑ, 150 sets of five RIRs were chosen randomly from the 5342 combinations measured in Arni 1, to account for different conditions regarding the reverberation time, surface absorption distribution, and atmospheric conditions.33 Figure 13 presents the results of the volatility estimation, showing that ϑ grows proportionally to the time interval between measurements. The per-band variance of ϑ decreases with increasing frequency. Despite that, the median values across the entire frequency range are close to each other, which is in line with the frequency-independent nature of the fluctuations in c.
Volatility ϑ was also estimated for all RIRs from the Arni 2 dataset, where the first measurement served as a reference. Figure 14 illustrates the ϑ results for Arni 2 over the entire measurement period and frequency bands 10–18 kHz. Each dot represents one measurement, while the lines depict the ϑ trends obtained from the median filtering of 50 data points. The remaining bands are omitted here for visual clarity.
Figure 14 shows good agreement with the results presented in Fig. 13—the values with the time separations similar to those in Arni 1 (left side of the plot) are in the range of . Figure 14 also shows that ϑ does not saturate even for long time separations between measurements. The saturation seems to appear only for RIRs measured more than 80 min apart and is visible for 16–18 kHz bands. Similarly to Fig. 13, ϑ values are relatively stable across frequencies, except for a momentary increase for the 18-kHz band between about 30 and 80 min in Fig. 14.
D. Comparison with previous methods
In the literature, two methods known to the authors aim to mitigate the time-variance-related phenomena in measured RIRs.10,22 However, both focus on significant interperiodic variations, assuming that the atmospheric conditions, e.g., temperature, are constant within one measurement period and that the RIR changes slowly. Therefore, the purpose of the model presented in this study differs from both techniques.
Additionally, in our previous study,13 we showed that for RIRs recorded with a long time separation between the measurements, accounting for time-stretching related to interperiodic changes in temperature does not remove all the differences between RIRs. In this paper, we show the effect of time variance for short time separations between measurements (up to 20 s in the Arni 1 dataset), for which the state-of-the-art methods cannot be applied, as the changes in RIRs are too small to be captured by such techniques.
This is illustrated in the top pane of Fig. 15 by the time lag of the cross correlation for two RIRs captured in Arni 1 at different but small time separations between the measurements. The RIRs were upsampled by a factor of 10 to allow the algorithm to capture changes happening within 2 μs.
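The lag-based analysis can be sketched with a brute-force search for the cross-correlation maximum. The snippet is a generic illustration, not the cited methods: the sampling rate of 48 kHz is an assumption, chosen because an upsampling factor of 10 then gives a lag step of about 2.08 μs, consistent with the resolution quoted above.

```python
def lag_resolution(fs: float, upsampling_factor: int) -> float:
    """Smallest resolvable time lag (s) after upsampling."""
    return 1.0 / (fs * upsampling_factor)


def xcorr_lag(x, y, max_lag):
    """Lag in samples that maximizes the cross-correlation of x and y.

    A positive result means that y is delayed with respect to x.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[n] * y[n + lag]
                  for n in range(len(x)) if 0 <= n + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


reference = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
lag = xcorr_lag(reference, delayed, max_lag=5)
print("lag in samples:", lag)
print("lag in seconds at 10x upsampling:", lag * lag_resolution(48000.0, 10))
```

Applied to successive short windows of two RIRs, the sequence of lags reveals a systematic time stretch as a slope. When the TOA changes are random and smaller than the lag step, no slope emerges, which is the situation discussed here.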
After 1.2 s from the direct sound, the lag does not show any visible slope or tendency that would allow the estimation of a suitable resampling factor according to the method developed by Postma and Katz,22 or Wang.10 However, the bottom pane of Fig. 15 shows that the difference between RIRs protrudes up to 39 dB above the noise floor level for time separation of 5 s and up to 46 dB for 15 s.
The results presented in Fig. 15 further confirm that the previous models are more suited to assessing and compensating for significant effects in time variance, e.g., interperiodic changes that clearly differentiate two RIRs from each other. Such an outcome further underlines the need for a model able to capture the subtle variation in RIRs, such as the one proposed in Sec. III.
VI. PREDICTION OF ENERGY LOSS IN AVERAGED RIRS
The problem formulation in Sec. II indicates that the most common and severe consequence of time variance in RIR measurements is the energy loss, which occurs when two or more measurements are averaged to enhance the SNR. Our previous study13 showed that cross correlation, being energy-based, is a suitable measure of energy loss. The cross correlation is superior to a simple energy difference since it is not affected by the signal value, and decays over time, capturing the growing TOA variation over the RIR duration.
Figure 16 displays the energy loss predictions made with the short-time coherence compared to the energy loss estimated directly from averaged RIRs. As expected, the amount of energy lost by averaging increases with time separation between the measurements, as the atmospheric environment keeps changing. The modeled energy loss follows the reference curves closely in Fig. 16, proving that the short-time coherence is a robust tool for estimating energy loss in averaged RIRs.
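The relation between coherence and averaging loss can be made concrete with a standard identity. For N signals of equal energy whose pairwise correlation is ρ, the energy of their mean is (1 + (N − 1)ρ)/N times the energy of a single signal. The sketch below applies this identity to the signal component only and is not necessarily the exact expression used for Fig. 16; the coherence values are hypothetical.

```python
import math


def averaging_energy_loss_db(coherence: float, num_signals: int = 2) -> float:
    """Energy of the mean of equal-energy signals relative to one signal, in dB.

    Assumption: all signal pairs share the same correlation `coherence`.
    """
    ratio = (1.0 + (num_signals - 1) * coherence) / num_signals
    if ratio <= 0.0:
        raise ValueError("coherence too negative for this number of signals")
    return 10.0 * math.log10(ratio)


for rho in (1.0, 0.95, 0.8, 0.5, 0.0):
    print(f"coherence {rho:4.2f}: "
          f"2 RIRs {averaging_energy_loss_db(rho, 2):6.2f} dB, "
          f"5 RIRs {averaging_energy_loss_db(rho, 5):6.2f} dB")
```

With full coherence, averaging leaves the signal energy untouched, whereas fully incoherent components are attenuated exactly like uncorrelated noise, so no SNR gain is obtained for them.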
VII. CONCLUSIONS
In this work, we investigated the impact of time variance on the coherence between RIRs. We focused on the atmospheric fluctuations that contribute to the spatiotemporal variations in the speed of sound, effectively altering the TOA of reflections between consecutive RIR measurements.
We showed that the TOA fluctuations follow a Generalized Wiener process and that the resulting short-time coherence exhibits an exponential decay over time. We also described the coherence loss through a slope factor called volatility.
Validation on simulated RIRs showed that the model accurately predicts the progression of coherence over time. The volatility estimation showed good agreement with the ground truth, further confirming our approach's applicability to estimating the effects of atmospheric variations on RIRs.
In the validation with measurements, we discussed the coherence drop due to decreasing SNR and concluded that an SNR well above 0 dB, in our case 30 dB, is needed for reliable analysis of the time variance. Below that limit, the signal is heavily affected by the background noise. The examples of modeled coherence curves demonstrate a good fit of the statistical model even when little data is available for fitting.
Volatility estimation on two large datasets of measured RIRs was in accord with expectations. This paper's results show volatility's invariance over frequency and its increase with growing time separation between measurements, which is compatible with the theory. The results also suggest that volatility likely does not saturate over time or that the measurement time was insufficient for the saturation to occur.
Furthermore, the paper showed that the proposed model can estimate the time-variance-induced changes in RIRs where the state-of-the-art models fall short, as it can capture very subtle discrepancies between a pair of RIRs. It was also successfully applied to predicting the high-frequency energy loss resulting from RIR averaging, a procedure commonly used to improve the SNR.
In future work, we suggest investigating the proposed model under various scenarios, such as long-term measurements or more turbulent atmospheric changes outdoors and indoors. The proposed model can be applied to methods where detailed information on the impulse response and its statistics is required, such as feedback cancellation and room acoustic measurements.
ACKNOWLEDGMENTS
The authors would like to thank Dr. Maximilian Schäfer for helpful feedback on the background and derivation of the proposed model. This work was supported by the Nordic Sound and Music Computing Network—NordicSMC, NordForsk Project No. 86892.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts of interest to disclose.