Sound masking can reduce the distraction due to ambient sounds in open-plan offices. This paper compares a typical masking sound with a slope of −5 dB per octave to a steady-state signal with the spectrum of the disturbing speech signal. Subjects had to complete a number recall task and a questionnaire in a laboratory experiment. The sound conditions with the spectrally-matched noise resulted in similar error rates at 3 dB higher speech-to-noise ratios as compared to the standard noise. Using a speech-shaped steady-state noise as masking sound could reduce the effect of distracting speech in the workplace more efficiently.
1. Introduction
Sound masking in combination with a high level of room absorption and high screens is a known method to cover disturbing background sounds and to improve the sound privacy within open-plan offices.1,2 Sound masking decreases the fluctuations of disturbing speech signals at low signal-to-noise ratios (SNRs) and provides a controlled background noise level. This nonfluctuating background noise covers distracting noise such as speech sounds that would be intelligible and more distinguishable without the use of sound masking. In the following, the SNR refers to the ratio of the A-weighted sound pressure level (SPL) of distracting single-voice speech to the A-weighted SPL of a masking sound, expressed as a level difference in dB. This study aimed to compare two steady-state masking sounds with different frequency spectra regarding their effect on working memory performance and annoyance perception.
Both semanticity and temporal-spectral variability of background sounds affect performance and annoyance.3 Colle and Welsh4 were the first to report that exposure to background sounds with sufficient variation in time and frequency substantially impairs working memory performance. The effect is now referred to as the Irrelevant Sound Effect.5 Background speech impairs cognitive performance more than speech-like noise with the same temporal-spectral characteristics and is perceived as more annoying, regardless of whether listeners hear the speech-like noise as speech or nonspeech.6,7 Serial recall is a standard task to test verbal short-term memory performance and is a common method to examine the beneficial effect on cognitive performance of adding a masking sound to background speech (e.g., Refs. 3, 8, and 9). The SPL of the sound signal is less crucial than the SNR, which has a high impact on working memory performance. For instance, presenting an irrelevant speech signal at 60 and 75 dB(A) did not result in differences in serial recall, but adding pink noise to the irrelevant speech sound showed a monotonic performance increase with decreasing SNRs.10
A broadband stationary noise with a generic spectral slope of −5 dB per octave band is a common signal for sound masking applications (e.g., Refs. 8 and 11). Liebl et al.3 investigated the working memory performance of subjects who listened to speech that was masked by a stationary noise signal with a speech-like spectrum and were able to show that a sound condition with clear speech resulted in significantly more errors than sound conditions with masked speech at SNRs of 0, −3, and −6 dB. Recently, water sounds and babble consisting of multiple voices have been suggested as efficient signals for sound masking.8,9 Keus van de Poll et al.9 showed that a masker with a decline of 5 dB per octave is not as advantageous to working memory performance as a masking sound consisting of water and wind sounds, or speech babble at a SNR of −3 dB. To the authors' knowledge, working memory performance during sound conditions with masked speech has not yet been compared between a noise-like masker with a spectrum of −5 dB per octave and a stationary speech-shaped masker.
In applied research, considering the subjective perception of sound conditions with masking sounds is as important as their effect on working memory performance as sound masking systems are frequently not accepted in practice because employees perceive them as annoying. Veitch et al.12 analyzed 15 different stationary masking sounds with regard to their spectra. Based on a sound with a −5 dB per octave frequency spectrum they grouped the spectrum into a low-, mid-, and high-frequency part, cut or boosted some parts, and adjusted all signals to the same overall loudness level. For example, the masking sound with a low boost and mid cut was perceived as less hissy, as described by Veitch et al.,12 but the speech intelligibility was rated higher as compared to the masker with a −5 dB per octave spectrum.
A laboratory experiment was performed to compare the effects of two steady-state masking sounds on working memory performance and annoyance. A time-reversed speech masker was tested as well but the results are outside the scope of this paper. A serial recall task was used to measure the serial order short-term memory performance and subjective ratings were collected by means of a questionnaire. One noise-like sound had the same long-term frequency spectrum as the distracting speech signal. It was expected that this speech spectrum based masker would lead to improvements in working memory performance as compared to the masking sound with a slope of −5 dB per octave because the resulting speech intelligibility was expected to be lower at equal SNRs. However, the exact impact of semanticity on working memory performance is still subject to research and thus it was unclear if notable improvements would occur.
Multiple SNRs were tested to evaluate if the results are similar at different SNRs. Both masking sounds were tested at SNRs of −6, −9, and −12 dB whereas the masker with the spectrum of −5 dB per octave was also tested at −3 dB SNR. At a SNR of −6 dB all masking sounds were expected to improve the serial recall performance as compared to the sound condition with unmasked speech. Additional gains in serial recall performance were expected at lower SNRs due to lower speech intelligibility and temporal-spectral fluctuations.
Subjects were also asked to rate the perceived annoyance after each sound condition. The annoyance ratings were expected to lie between those of the two control conditions, silence and unmasked speech. The annoyance during masked speech was expected to diminish for decreasing SNRs because the speech sound was quieter at lower SNRs. The masking sound with the decline of 5 dB per octave may be perceived as less obtrusive than the speech spectrum based masker due to the louder low-frequency parts, but it was expected to mask the speech sound less efficiently because the speech intelligibility was expected to be higher at the same SNRs. The experiment aimed at giving some indication of whether the combination of both effects results in similar annoyance perceptions at the same SNRs for both noise-like masking sound conditions. In particular, two research questions were addressed:
Does speech-shaped stationary noise mask distracting speech more efficiently than stationary noise with a −5 dB per octave band slope during number recall?
Are sound conditions with speech-shaped stationary masking sounds more annoying than with stationary masking sounds with a −5 dB per octave slope?
2. Methods and materials
Twenty-four students (6 female) aged 20 to 29 (median = 24, standard deviation = 2.5) participated in the experiment. All participants were native German speakers and received a small stipend.
The experimental design was a one-way repeated measures design with 12 levels corresponding to the 12 sound conditions during which working memory performance and subjective ratings were tested. Four subjects performed the test at the same time. All 12 sound conditions were presented as mono signals to both ears using the on-board audio and Sennheiser HD 280 PRO headphones (Sennheiser electronics GmbH & Co. KG, Wedemark, Germany). These headphones have a rather flat frequency response within the relevant voice frequency range of 100 to 5000 Hz that only dips below −6 dB at around 3000 Hz. No filter was applied to the signals to account for the frequency response of the headphones. The tests were conducted in the High Performance Indoor Environment Laboratory at Fraunhofer Institute for Building Physics. Each subject was seated at a desk without visual contact with any other subject. The temperature was kept at around 22 °C. Each participant performed one trial under each of the 12 sound conditions and one additional practice instance of the serial recall task at the beginning. Subjective ratings were collected directly after the serial recall task. The presentation order of the sound conditions was balanced over all 24 subjects by a Latin square design.
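The counterbalancing described above can be sketched as follows. The text does not state which Latin square construction was used, so this minimal Python sketch assumes a simple cyclic square; the condition labels C1 to C12 and the function names are hypothetical placeholders, not the study's identifiers.

```python
# Hypothetical labels for the 12 sound conditions (placeholders only).
CONDITIONS = [f"C{i}" for i in range(1, 13)]


def cyclic_latin_square(conditions):
    """Return an n x n cyclic Latin square: row r is the list rotated by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def assign_orders(conditions, n_subjects):
    """Assign each subject one row of the square, reusing rows cyclically."""
    square = cyclic_latin_square(conditions)
    return [square[subject % len(square)] for subject in range(n_subjects)]


if __name__ == "__main__":
    # Assumption: 24 subjects share the 12 rows, so each row is used twice.
    for subject, order in enumerate(assign_orders(CONDITIONS, 24), start=1):
        print(subject, " ".join(order))
```

With 12 conditions and 24 subjects, each row of the square is used twice, so every condition occupies every serial position exactly twice. A cyclic square does not balance first-order carry-over effects; a Williams design would be one alternative if that were required.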
The serial recall task was designed as follows. One sequence consisted of each digit from 1 to 9 in a randomized order. Each digit was displayed for 700 ms in font type Chicago, size 16 point. Between two successive digits there was a break of 300 ms, i.e., the digits were presented at a rate of one per second. After a retention interval of 8 s, subjects had to recall the visually presented digits in the order of presentation by clicking numbers in the same order on a 3 × 3 array on the screen. Each sound condition contained 12 sequences. The percentage of incorrectly recalled digit positions was calculated for each sequence, and the average score from all 12 sequences of one condition (mean error rate) was analyzed.
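The scoring rule described above can be illustrated with a short Python sketch. It assumes a strict position-by-position comparison (a digit counts as correct only if it is recalled at its presented position); the function names and the example sequences are hypothetical and not taken from the study.

```python
def sequence_error_rate(presented, recalled):
    """Fraction of serial positions at which the recalled digit is wrong."""
    if len(presented) != len(recalled):
        raise ValueError("presented and recalled must have the same length")
    errors = sum(p != r for p, r in zip(presented, recalled))
    return errors / len(presented)


def mean_error_rate(trials):
    """Average the per-sequence error rates of one sound condition.

    `trials` is a list of (presented, recalled) pairs, e.g. 12 per condition.
    """
    rates = [sequence_error_rate(p, r) for p, r in trials]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Illustrative data only: two adjacent digits are swapped during recall.
    presented = [3, 1, 4, 5, 9, 2, 6, 8, 7]
    recalled = [3, 1, 5, 4, 9, 2, 6, 8, 7]
    print(sequence_error_rate(presented, recalled))
```

Under this rule, swapping two digits produces two position errors, so a single transposition in a nine-digit sequence gives an error rate of 2/9.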
The questionnaire regarding the perception of the masking sound was designed as follows. The perceived annoyance referred only to the ambient acoustic conditions and was assessed after each sound condition by a 5-point verbal (not at all, slightly, moderately, very, and extremely annoying) and an 11-point numerical (0–10, where 0 means “not at all annoying” and 10 means “extremely annoying”) rating scale according to ISO/TS 15666:2003.13 This standard provides recommendations on the assessment of noise annoyance and suggests the use of two questions on annoyance in each questionnaire with both annoyance scales, a 5-point verbal and an 11-point numerical scale. The questions are designed for the assessment of noise annoyance at home and were modified to refer to the short period of indoor sound exposure in a work environment.
The distracting speech sound was a dry recording of German sentences of the Hochmair-Schulz-Moser test for measuring speech intelligibility,14 recorded in the anechoic chamber at Fraunhofer Institute for Building Physics. Twelve sound conditions, ten mixed signals and two control conditions, silence and unmasked speech, were tested as listed in Table 1. All masker signals were calibrated to 45 dB(A) while the speech signal was varied between 33 and 42 dB(A), resulting in SNRs from −12 to −3 dB, respectively. Masker SPLs of 45 dB(A) are commonly suggested because higher levels can annoy employees and impair communication.1,12 The sound condition with unmasked speech was calibrated to 42 dB(A). The SPL refers to an A-weighted energy-equivalent SPL LAeq averaged over 40 s and measured using a sound level meter Norsonic Sound Analyzer type 110 (Norsonic AS, Tranby, Norway) and an artificial ear with G.R.A.S. 40AG 1/2 in. pressure microphone (G.R.A.S. Sound and Vibration A/S, Holte, Denmark).
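The level arithmetic behind these conditions can be sketched in Python. This is only an illustration of the dB relations, not the authors' calibration procedure, which relied on measurements with a sound level meter and an artificial ear. The function names are hypothetical, and the combined-level function assumes that speech and masker are uncorrelated so that their energies add.

```python
import math

MASKER_LEVEL_DBA = 45.0  # masker level reported in the text


def snr_db(speech_level_dba, masker_level_dba=MASKER_LEVEL_DBA):
    """SNR as the difference between speech and masker levels in dB."""
    return speech_level_dba - masker_level_dba


def speech_level_for_snr(target_snr_db, masker_level_dba=MASKER_LEVEL_DBA):
    """Speech level needed to reach a target SNR against a fixed masker."""
    return masker_level_dba + target_snr_db


def amplitude_gain(level_change_db):
    """Linear amplitude factor that changes a signal level by the given dB."""
    return 10.0 ** (level_change_db / 20.0)


def combined_level_dba(speech_level_dba, masker_level_dba=MASKER_LEVEL_DBA):
    """Energetic sum of two levels (assumption: uncorrelated signals)."""
    total = 10.0 ** (speech_level_dba / 10.0) + 10.0 ** (masker_level_dba / 10.0)
    return 10.0 * math.log10(total)


if __name__ == "__main__":
    for snr in (-3.0, -6.0, -9.0, -12.0):
        speech = speech_level_for_snr(snr)
        print(snr, speech, round(combined_level_dba(speech), 2))
```

Because the masker level is fixed, lowering the SNR in this scheme means attenuating the speech; a 3 dB step corresponds to an amplitude factor of `amplitude_gain(-3.0)`.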
Table 1. Name of sound condition, used masker, and SNR [dB]. The three conditions with the time-reversed masker are not shown. The acronyms REF (reference), BER (masking sound as suggested by Beranek), and HSM (masking sound that is spectrally-matched to Hochmair-Schulz-Moser speech recordings) are used in the following to refer to the sound conditions. The SNR is printed as subscript.
Sound condition | Masker type | SNR [dB] |
---|---|---|
REF0 | (Silence) | — |
REF∞ | (Unmasked distracting speech) | ∞ |
BER−3 | Steady-state noise with −5 dB per octave slope | −3 |
BER−6 | Steady-state noise with −5 dB per octave slope | −6 |
BER−9 | Steady-state noise with −5 dB per octave slope | −9 |
BER−12 | Steady-state noise with −5 dB per octave slope | −12 |
HSM−6 | Speech-shaped steady-state noise | −6 |
HSM−9 | Speech-shaped steady-state noise | −9 |
HSM−12 | Speech-shaped steady-state noise | −12 |
3. Results
Figure 1(a) depicts the mean error rates that were observed in the number recall task. A repeated measures analysis of variance (ANOVA) showed a significant effect of sound condition on mean error rate in serial recall [F(4.4,102.2) = 5.59, mean squared error (MSE) = 0.0080, p < 0.001, η2 = 0.20]. The Greenhouse-Geisser correction was applied to the degrees of freedom because Mauchly's test for sphericity was significant. One-tailed t-tests comparing the mixed sound conditions with both control conditions were calculated. It was also tested whether the mean error rate in silence was lower than in speech background. Benjamini-Hochberg α-error correction15 was applied. Significantly more errors were made during unmasked speech than under silence (p < 0.001, Cohen's d > 1.0). The mean error rates during sound conditions with the stationary masking sound with −5 dB per octave spectrum were lower as compared to unmasked speech at SNRs of −6, −9, and −12 dB (p < 0.05, Cohen's d > 0.5). The conditions with the speech-shaped masking sound resulted in lower mean error rates than under unmasked speech at −6, −9, and −12 dB SNR (p < 0.01, Cohen's d > 0.6). Furthermore, the mean error rates were higher with −5 dB per octave band spectrum masking at −3 and −6 dB SNR than under silence (p < 0.05, Cohen's d > 0.5).
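For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, a minimal Python sketch of the standard step-up method follows. It is not the authors' analysis code, the function name is hypothetical, and the p-values in the example are invented for illustration; they are not the study's results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (adjusted_p_values, rejected), both in the input order.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


if __name__ == "__main__":
    # Invented p-values, for illustration only.
    example = [0.01, 0.04, 0.03, 0.005, 0.5]
    adjusted, rejected = benjamini_hochberg(example)
    for p, q, r in zip(example, adjusted, rejected):
        print(p, round(q, 4), r)
```

The procedure controls the false discovery rate across the family of follow-up comparisons, which is less conservative than a Bonferroni-type correction of the familywise error rate.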
Fig. 1. Illustration of the results; means with standard errors are plotted. (a) Mean error rates in number recall during the sound conditions (n = 24); (b) mean annoyance ratings on a 5-point Likert scale during the sound conditions (n = 21).
All 24 subjects were asked to rate the perceived annoyance after completion of every serial recall task. The datasets of three subjects were not considered because they had experienced problems with the web browser while replying to the questions. The mean annoyance ratings of the sound conditions are depicted in Fig. 1(b). A repeated measures ANOVA reached statistical significance [F(8,160) = 28.4, MSE = 0.54, p < 0.001, η2 = 0.59]. Follow-up t-tests for paired samples were calculated and Benjamini-Hochberg α-error correction15 was applied. The annoyance of speech background was rated significantly higher as compared to the silent sound condition (p < 0.001, Cohen's d > 4.1). Paired comparisons against the control condition with speech showed that all sound conditions with masked speech were rated as significantly less annoying (p < 0.05, Cohen's d > 0.4). All other sound conditions were perceived as more annoying than the silent sound condition (p < 0.001, Cohen's d > 1.9).
4. Discussion
This study compared the effects of masking sounds with different frequency spectra on working memory performance and annoyance. During speech that was masked by a signal with a spectrum of −5 dB per octave at a SNR of −6 dB significantly more errors were made than under silence, but not during speech masked by a speech-shaped masking sound at −6 dB SNR. These results indicate that the sound with speech-like frequency spectrum has an advantage over the sound with −5 dB per octave spectrum. When the SNR of the sound conditions was decreased from −3 to −12 dB in steps of 3 dB, the number recall performance increased monotonically. The working memory performance improved significantly as compared to unmasked speech at a SNR between −6 and −9 dB, as well as at lower SNRs.
These results suggest that a sound masking system requires SNRs between −6 and −9 dB to achieve working memory performance improvements. In practical applications speech would be less intelligible due to reverberation, informal instead of formal speech, and a faster speech tempo. The consideration of typical office conditions with reverberation times around 0.5 s may enable more accurate conclusions about appropriate SNRs. This study provides first indications that working memory performance in serial recall is very sensitive to the SNR of a sound masking system in a range of approximately −10 to 0 dB. Reducing the SPL of background speech below the SPL of the masking sound, which is commonly adjusted to around 45 dB(A), requires very long distances between the disturbing sound source and the receiver or a state-of-the-art open-plan office with high attenuation of disturbing speech sounds.
Both maskers resulted in lower annoyance perception when the SNR was decreased from −6 to −12 dB. Unlike serial recall performance, annoyance ratings during masked speech conditions that included the two masking sounds were very similar at the same SNRs. A subsequent study is necessary to analyze the different behavior of cognitive performance and perceived annoyance in more detail. While adjusting the masking sound spectrum may have a beneficial impact on working memory performance, it might have a detrimental impact on the perceived annoyance.
The application of sound masking systems in open-plan offices is mostly driven by improving the user acceptance because masking sounds are often perceived as annoying. There is a common understanding that the SPL has to be set to a level that provides a good trade-off between acoustic privacy and annoyance. This study followed a different approach by analyzing the serial order short-term memory performance by means of a serial recall task. The results point out that adjusting the spectrum of a noise-like masking sound to the speech spectrum may improve the cognitive performance noticeably while the annoyance perception remains unchanged. Whereas office users may not always perceive such performance improvements, employers could clearly benefit from a technology that adjusts the masking sound spectrum to the spectrum of the most disturbing background sound in terms of performance increases of their employees. This approach would require signal processing to separate speech sources from other nonfluctuating background sounds. A study that considers binaural sound conditions may provide further evidence about the applicability of a performance driven approach. Particularly in decentralized sound masking architectures with individual emitters and more directional sound components, the spatial relation between masking and speech sounds becomes more important.