This project acquired sound levels logged over six school days, along with impulse responses, in 220 classrooms spanning four K–12 grades. Seventy-four percent of the classrooms met reverberation time recommendations. Occupied signal-to-noise ratios (SNRs) were estimated from the logged sound levels using Gaussian mixture modeling and using daily equivalent and statistical levels. A third method, k-means clustering, estimated SNR more precisely by separating the data, in nine dimensions, into one cluster with high levels across speech frequencies and one without. The SNRs calculated as the daily difference between the average levels of the speech and non-speech clusters were lower than 15 dB in 27.3% of the classrooms and differed from those obtained with the other two methods. The k-means data additionally indicate that speech occurred during 30.5%–81.2% of the day, with significantly larger percentages in grade 3 than in higher grades. On average, speech levels exceeded 65 dBA for 35% of the day, and non-speech levels exceeded 50 dBA for 32% of the day; grades 3 and 8 experienced speech levels above 65 dBA significantly more often than the other two grades. Finally, classroom speech and non-speech levels were significantly correlated, with speech levels increasing by 0.29 dBA for every 1 dBA increase in non-speech levels.
I. INTRODUCTION
The acoustic conditions of K–12 classrooms affect students' abilities to understand and learn course material. If the clarity of aural communication is impeded, students may not achieve their full potential or surpass the learning goals set for them. The ANSI S12.60 standard, “Acoustical Performance Criteria, Design Requirements, and Guidelines for Schools, Part 1: Permanent Schools,” provides guidelines for sound levels in unoccupied classrooms due to noise sources resulting from the site or building design (ANSI, 2010), with the goal of achieving appropriate signal-to-noise ratios (SNRs) for student learning. However, meaningful data on the speech and ambient noise levels actually experienced by students in occupied K–12 classrooms are currently limited or only estimated. This paper presents results from a survey of acoustic conditions gathered in situ from 220 occupied K–12 classrooms over six complete school days. K-means clustering was applied to the logged sound level data to separate them into speech and non-speech clusters, allowing for deeper analysis and greater understanding of the speech and ambient noise levels experienced in occupied classrooms.
Three aspects that affect classroom acoustics are the sound levels from talkers relevant to instruction, the sound levels from other competing noise sources, and the room's natural acoustic conditions. Together these determine the difference between the desired acoustic signal level and the undesired sound levels (noise) that may mask it; this difference is characterized as an SNR. The desired signal is typically presented by teachers or by audio materials played in class, although peer learning is also common in K–12 classrooms, so students may be focusing on what another classmate is saying in the midst of competing talkers (Topping et al., 2017). Other common noise sources in classrooms include sounds from adjacent spaces, exterior sources such as traffic or aircraft, audio-visual and computer equipment operating in the classroom, and building mechanical systems such as those used for heating, ventilation, and air-conditioning (HVAC). The room's size, shape, surface materials, and furnishings determine its natural acoustic conditions; one common metric used to characterize them is reverberation time (RT).
To maintain acceptable SNR in classrooms, one should keep desired talker levels high, competing noise levels low, and RTs in a range that supports desired signals (Yang and Bradley, 2009) without prolonging noise. Bistafa and Bradley (2000) originally suggested a minimum SNR of 15 dB for classrooms, although later studies indicated that even higher SNRs are required for younger students to perform at the same levels as older ones (Bradley and Sato, 2008; Neuman et al., 2010). Lower SNR has been found to correlate with poorer speech intelligibility, poorer performance on more cognitively challenging tasks, and lower academic achievement (Caviola et al., 2021; Connolly et al., 2019; Picard and Bradley, 2001; Prodi et al., 2019; Ronsse and Wang, 2013; Rudner et al., 2018; Shield and Dockrell, 2008), as well as with greater listening effort (Degeest et al., 2015; Howard et al., 2010; Hsu et al., 2021; Ohlenforst et al., 2017; Rennies et al., 2014). The effects are significantly worse for children than for adults (Klatte et al., 2010; Leibold, 2017; Valente et al., 2012; Wróblewski et al., 2012), for children with hearing impairments than for those with normal hearing (McCreery et al., 2019), and for people listening to non-native talkers or listening in a non-native language (Cooke and Lecumberri, 2012; Nelson et al., 2005; Peng and Wang, 2016, 2019). Lower SNR also correlates with higher vocal effort, load, and/or fatigue experienced by teachers (Bottalico and Astolfi, 2012; Bottalico et al., 2016; Graetzer et al., 2017; Hunter et al., 2020).
To achieve the minimum suggested SNR, ANSI S12.60 recommends that the greatest 1-h average A-weighted background noise level measured in an unoccupied classroom with HVAC systems on should not exceed 35 dBA for single-mode HVAC systems or 37 dBA for multi-mode HVAC systems and that the RTs at the mid-frequency octave bands of 500, 1000, and 2000 Hz should not exceed 0.6 s for classrooms smaller than 283 m³ (10 000 ft³) (ANSI, 2010). Since the initial publication of the ANSI S12.60 standard in 2002, research on and attention to classroom acoustics have grown tremendously, resulting in a plethora of peer-reviewed work, much of which has shown that there are few classrooms meeting the unoccupied background noise level recommendations given in ANSI S12.60, while the RT guidelines are more often achieved (Knecht et al., 2002; Nelson et al., 2007; Ronsse and Wang, 2013; Sato and Bradley, 2008; Shield and Dockrell, 2004).
The noise criterion in ANSI S12.60 applies to unoccupied spaces and was set based on available research at the time of its development, including common talker levels, the expected drop of levels across the length of classrooms, and the minimum 15 dB SNR suggested for normal-hearing students. The standard does not provide explicit guidance on occupied conditions; some may consider it harder to design toward occupied conditions, which can be much more variable over time. Students learn in occupied spaces though; is the minimum 15 dB SNR typically achieved in occupied classrooms?
Reviews of older measurements made in occupied classrooms are provided by Picard and Bradley (2001) and Shield and Dockrell (2004). As those authors note, some older studies reported a single decibel value without indicating the precise metric used, or in cases where an equivalent sound level is given, the specific measurement time period was not always provided. Furthermore, very little had been published on variation in classroom acoustic conditions over the school day. The summary by Picard and Bradley (2001) found that ambient noise levels of occupied K–12 classrooms ranged from 51 to 75 dBA. In their own investigation, Shield and Dockrell (2004) monitored noise and activity in a morning or afternoon classroom session at intervals of 2 min in 110 occupied classrooms and reported A-weighted equivalent sound levels (LAeq) and A-weighted levels exceeded 90% of the time (LA90) for different categories of classroom activity. The LAeq values ranged from 56.3 dBA for the quietest category of activity (silent reading/testing) to 76.8 dBA for the loudest (group work and movement), while the LA90 values ranged from 42.4 to 63.9 dBA, respectively. Across all measurements in occupied teaching spaces, they report an average LAeq of 72.1 dBA and LA90 of 54.1 dBA. Unlike what Picard and Bradley (2001) reported in their review, Shield and Dockrell (2004) did not find a reduction of noise levels with increasing student age, particularly when analyzed by activity type. More recently, Astolfi et al. (2019b) measured ambient noise over 3 min in 20 occupied grade 1 classrooms, showing levels ranging from 38.4 to 55.9 dBA when students were sitting silently and levels ranging from 59.9 to 75.1 dBA when students were doing group activities; this study did not report levels measured with instructors leading lessons.
Table I summarizes results from other published investigations of occupied primary and/or secondary school classrooms that were not included in the reviews above. Many of these studies reported SNR in an occupied class as the difference between LAeq and LA90 and then compared that value against the 15 dB minimum that guided the development of ANSI S12.60 (ANSI, 2010). Calculating this value from the data of Shield and Dockrell (2004) gives an SNR of 18 dBA (72.1 − 54.1 dBA), which does meet the guideline. When their results are examined by activity type, though, only the activity in which one person is speaking shows a difference between LAeq and LA90 (15.4 dBA) greater than the 15 dB minimum; the other activities have differences ranging from 12.6 to 14.3 dBA. (When reporting SNR between dBA values, the authors herein label the resulting SNR as dBA.) Most of the studies in Table I indicate SNRs that do not meet the recommended 15 dB minimum, although a few international ones report higher values. These previous studies have limitations, though: measurement durations were typically less than 1 h, and none except Shield et al. (2015) monitored actual classroom activity. Consequently, the reported differences between LAeq and LA90 can only be considered estimates of the true SNR between a desired signal (conveying teaching material) and ambient noise (competing with the desired signal) in occupied classrooms. One other study calculated SNRs in four grade 4 classrooms based on (a) observations of classroom activity, classified as teacher speaking, child speaking, child group noise, or occupied noise floor, and (b) the means of intensity peaks from the logged acoustic signals; it found values of +1 to +6 dB, which increased to +11 to +15 dB when a classroom amplification system was deployed (Larsen and Blair, 2008).
Reference | No. of classrooms | Grades | Duration | LAeq (dBA) | LA90 (dBA) | SNR (dBA) |
---|---|---|---|---|---|---|
Skarlatos and Manatakis (2003) | 32 | High schools | 40 min | 71.9 (SD 2.6, min 65.4, max 81.9) | — | 19.9 |
Choi and McPherson (2005) | 47 | Primary schools | 30 min | 60.7 (SD 3.2, min 54.1, max 67.6) | — | 13.5 (SD 3.3, min 7.1, max 20.0) |
Astolfi and Pellerey (2008) | 8 | Secondary schools | 3-6 min | — | min 28.2, max 39.0 | min 15.4, max 27 |
Sato and Bradley (2008) | 27 | 1, 3, 6 | 15-20 min | 60.1 (SD 4.3) | — | 11 (SD 4.3) |
Sarantopoulos et al. (2014) | 41 | Primary schools | 40 min | 69.0 (SD 5.7, min 56.3, max 82) | 52.4 (SD 6.0, min 35.2, max 66.3) | 12.7 (SD 4.5, min 0.6, max 21.3) |
Shield et al. (2015) | 68 | Secondary schools | 43 min (SD 9) | 64.2 (SD 5.4) | 51.1 (SD 6.5) | 13.1 |
Silva et al. (2016) | 3 | 3, 4 | 30 min | 69.7 (SD 2.3, min 67.2, max 72.3) | — | 19.2 (SD 3.1, min 12.9, max 24.3) |
Sala and Rantala (2016) | 40 | Primary schools | 3.6 h (SD 0.7) | 69 (SD 6.2, min 57, max 89) | 42 (SD 4.1, min 29, max 51) | 27 |
Kapetanaki et al. (2018) | 91 | Primary schools | 5 min | 70.1 | 55.7 | 14.4 |
Peng et al. (2018) | 46 | Primary schools | 15 min | 72.0 (SD 5.5, min 58.0, max 82.4) | — | 9.2 |
This paper presents results on the acoustic conditions experienced in occupied K–12 classrooms in the midwestern United States on a much larger scale. Impulse responses, along with octave-band sound levels logged every 10 s over six full school days, were gathered in each of 220 K–12 classrooms across four grade levels: 3rd, 5th, 8th, and 11th. These data have been processed into metrics that give insight into the conditions of K–12 classrooms, including unoccupied RTs across octave bands and daily metrics such as equivalent and statistical levels. Because classroom activity could not be monitored on this scale, an unsupervised machine learning algorithm, k-means clustering, was instead applied to separate the logged sound levels into two groups, using nine dimensions of octave-band levels from 32 Hz to 8 kHz: one group with high levels across speech frequencies and one without. SNR estimates calculated from Gaussian mixture modeling and from the difference between daily equivalent and statistical levels are compared against a more precise estimate using the k-means clustered data. Additionally, the clustered data are analyzed to determine how often speech occurred in the classrooms and the percentages of time that speech levels exceeded 65 dBA and that non-speech levels exceeded 50 dBA. This paper also provides further insight into how classroom acoustic conditions vary across grade levels and how speech levels in occupied classrooms are related to non-speech levels.
II. METHODOLOGY
Measurements of indoor environmental conditions, including acoustics, lighting, thermal comfort, and indoor air quality, were carried out in 220 classrooms; 110 were completed each year during the 2015–2016 and 2016–2017 academic years. This paper focuses on the acoustic measurements; more detail on the complete set of measurements may be found in Kuhlenengel et al. (2017) and Kabirikopaei et al. (2019). Classrooms from five school districts in Nebraska and Iowa were in the sample, including 74 rooms at grade 3, 70 rooms at grade 5, 32 rooms at grade 8, and 44 rooms at grade 11. Subjects taught in the grade 8 and 11 rooms were either math or language arts.
All of the measured classrooms had closed floor plans. The average volume of classrooms in the study was 201 m³ (7099 ft³), with a standard deviation (SD) of 32.4 m³ (1144 ft³), minimum of 101 m³ (3564 ft³), and maximum of 331 m³ (11 702 ft³). Seven of the 220 classrooms were in portable buildings, while the rest were in traditional buildings. Most classrooms had thin carpet, acoustical tile ceilings, gypsum board or concrete-masonry unit walls, and typically at least one exterior window. Some rooms exhibited unique features, such as brick walls, sloped ceilings, wood beams, or ceilings with no absorption. Rooms had typical classroom furnishings, including desks, chairs, dry-erase boards, bulletin boards, cabinetry, bookcases, and assorted decorations and posters on the walls. On average, there were 22 students in each classroom (SD 2.7, minimum 11, maximum 32).
To characterize the room acoustics, impulse responses were measured in each unoccupied classroom on a day after school was dismissed, using a Larson Davis 831 sound level meter and a laptop running the software program EASERA. An omnidirectional source was positioned at a typical teaching location (e.g., 1 m in front of a wall-mounted dry-erase board), and two receiver locations were used: a seated position in the middle of the classroom and the farthest student location. A swept-sine method was used, with eight repetitions and measurement sweeps at least 1.2 s long. Room acoustic metrics such as RT and clarity index were later calculated by the EASERA software in accordance with ISO 3382-2 (ISO, 2008).
Equivalent sound levels were logged over time by two BSWA 309 type 2/class 2 sound level meters in each occupied classroom; the meters were routinely calibrated throughout the study. One meter was placed at a work-plane height of 80 cm (32 in) above the finished floor inside an open-air metal cage on top of a stand near the teacher's desk or teaching position. The second meter was mounted in (or to) the ceiling above the farthest listening position, on the opposite side of the room from the teacher's desk. Care was taken to avoid placing the hanging meter next to sources of steady-state noise such as ventilation diffusers and projectors. The sound level meters were powered by external rechargeable batteries. Meters were deployed on a school day before class began and retrieved approximately 36 h later, after school the following afternoon, capturing both occupied and unoccupied conditions. These two-day logging measurements were repeated three times throughout the academic year, roughly once each season (fall, winter, and spring), so that sound levels were logged over six occupied school days. Because of the weather in the midwestern United States, the schools' HVAC systems operated differently throughout the year, with cooling typical in early fall and late spring and heating from late fall through early spring.
The sound level meters logged A-weighted equivalent levels and octave band equivalent levels with center frequencies ranging from 32 Hz to 8 kHz every 10 s. Because the focus of this investigation is on sound levels experienced during the occupied school day, only sound level data recorded during published academic hours for each school (typically from 08:30 to 15:30) were used in the following analyses. All data processing and analyses were conducted using matlab (MathWorks, Natick, MA). The data from the two sound level meters within each classroom were energy-averaged at every time interval across the school day, and the energy-averaged data were then used to calculate assorted acoustic metrics for each school day, including LAeq and statistical levels such as LA90.
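The two-meter energy average and the daily metrics described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, and LA90 is assumed here to be the 10th percentile of the combined 10-s equivalent levels (the level exceeded 90% of the time).

```python
import numpy as np


def energy_average(levels_db, axis=None):
    """Energy-average levels in dB: 10*log10(mean(10**(L/10)))."""
    levels_db = np.asarray(levels_db, dtype=float)
    return 10.0 * np.log10(np.mean(10.0 ** (levels_db / 10.0), axis=axis))


def daily_metrics(meter_a, meter_b):
    """Combine two meters' 10-s A-weighted logs for one school day.

    Returns the combined time series, the daily LAeq, and the daily LA90.
    """
    # Energy-average the two meters at every 10-s interval.
    combined = energy_average(np.vstack([meter_a, meter_b]), axis=0)
    # All intervals have equal duration, so the daily LAeq is their energy mean.
    laeq = energy_average(combined)
    # LA90 is the level exceeded 90% of the time, i.e., the 10th percentile
    # (assumed here to be taken over the 10-s equivalent levels).
    la90 = np.percentile(combined, 10)
    return combined, laeq, la90
```

For example, two intervals at 60 and 70 dBA give a daily LAeq of about 67.4 dBA rather than 65 dBA, because the louder interval dominates the energy mean.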
An unsupervised statistical learning technique called k-means clustering was subsequently applied to the logged equivalent and octave-band levels from each day in each classroom in an attempt to categorize the data into times when speech did or did not occur. K-means clustering partitions observations into a specified number (K) of clusters while minimizing the within-cluster variation (Alpaydin, 2020). A challenge of using the technique is that there is no direct way to determine the optimum number of clusters for a given data set, and there is no universally agreed-upon approach to determine whether the clusters found represent the true subgroups in the data. For this study, k-means clustering was performed in a nine-dimensional feature space on observations of the energy-averaged sound level meter data logged during the school day in each classroom, using octave bands as the features. All clustering was performed in matlab using the kmeans function. The number of data partitions was chosen to be K = 2, the distance measure was squared Euclidean distance, and five repetitions were performed. Initial centroids were chosen using the kmeans++ algorithm. Instead of choosing the starting centroids uniformly at random, kmeans++ picks the first centroid at random from the observations and each subsequent centroid with probability proportional to its squared distance from the nearest centroid already chosen. The centroids are 1 × p vectors, where p is the number of features (or dimensions) in the data to be clustered. This seeding improves the algorithm and outperforms methods that use random seeding (Arthur and Vassilvitskii, 2007). Figure 1 shows boxplots of the data in the two resulting clusters across all classrooms, confirming that one cluster corresponds to times when higher levels occurred in the speech frequencies (250 Hz to 2 kHz), while the other does not.
The boxes represent the interquartile range from the 25th to 75th percentile, the mid-line within the box is the median, the whiskers extend to 1.5 times the interquartile range past the 25th or 75th percentile marks, and plus marks indicate outliers. Figure 1(a) also shows the spectrum of a female talker at a raised level of 66 dBA, based on data from the standards IEC 60268 (IEC, 2003) and ISO 9921 (ISO, 2003), which corresponds closely with the speech cluster measurements.
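For readers without MATLAB, the clustering step can be sketched in plain NumPy. The block below is a hypothetical re-implementation, not the authors' code: it seeds with k-means++, runs Lloyd's algorithm with squared Euclidean distance, repeats five times, and keeps the replicate with the lowest total within-cluster sum of squares, loosely mirroring the settings listed above. Each row of `X` is assumed to be one 10-s observation and each column one octave-band level.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: first centroid uniformly at random, each later one
    with probability proportional to squared distance to the nearest centroid."""
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        C = np.array(centroids)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)


def kmeans(X, k=2, replicates=5, max_iter=100, seed=0):
    """Lloyd's algorithm (squared Euclidean); keep the best of several replicates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(replicates):
        C = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            # Assign each observation to its nearest centroid.
            labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # Move each centroid to the mean of its members (keep it if empty).
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(X.shape[0]), labels].sum()
        if best is None or sse < best[2]:
            best = (labels, C, sse)
    return best
```

With real data, `X` for one classroom-day would hold roughly 2,500 rows (7 h of 10-s intervals) and nine columns; the returned labels then mark each interval as belonging to one of the two clusters.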
Gaussian mixture modeling (GMM) is an alternative method for categorizing logged classroom sound levels; Hodgson et al. (1999) fit histograms of A-weighted sound levels measured in university classrooms with two Gaussian curves to distinguish one group representing teaching activity from another representing occupied background noise. The method has been used in other studies (Peng et al., 2018; Sato and Bradley, 2008) to estimate SNR in occupied classrooms. D'Orazio et al. (2020) recently compared these two blind segmentation techniques (GMM and k-means) along with two visual segmentation methods (percentile levels technique and peak detection technique) and found fairly close extractions of student activity levels and speech levels among the four methods. They appear to have applied k-means clustering only to the A-weighted levels, however, without considering the octave-band data at each logged instance, which limits the richness of the data set. In the investigation discussed herein, SNR estimated by applying GMM is compared against SNR calculated from k-means clustering applied in nine dimensions including A-weighted and eight octave band levels.
III. RESULTS
A. Room acoustic metrics
The impulse responses gathered from each classroom were used to calculate standard room acoustic metrics in EASERA, including RT T20 and clarity index C50 per ISO 3382-2 (ISO, 2008). Figure 2(a) shows boxplots of T20 across octave bands. ANSI S12.60 states that RTs at the mid-frequency octave bands of 500, 1000, and 2000 Hz should not exceed 0.6 s for classrooms smaller than 283 m³ (10 000 ft³) and 0.7 s for those between 283 m³ (10 000 ft³) and 566 m³ (20 000 ft³) (ANSI, 2010). The acquired data show that 74% of the classrooms in this sample met the recommended maximum RT in each of the three mid-frequency octave bands (500 Hz, 1 kHz, and 2 kHz). Of the remaining 26%, 56% exceeded the limit in one band, 14% in two bands, and 30% in all three bands. Typically, a classroom of standard rectangular (parallelepiped) design with acoustical ceiling tile across the ceiling surface met the RT guidelines; those that did not tended to lack complete absorptive coverage overhead. The mid-frequency averaged values for the sampled classrooms are in alignment with those reported in other investigations (Sala and Rantala, 2016; Sato and Bradley, 2008; Shield et al., 2015).
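The compliance tally above can be reproduced with a short sketch. It assumes each classroom is described by its volume and its T20 values at 500 Hz, 1 kHz, and 2 kHz (all function names are hypothetical), applies the ANSI S12.60 limits quoted above, and treats a band as failing when its T20 strictly exceeds the limit.

```python
from collections import Counter


def rt_limit(volume_m3):
    """Maximum mid-frequency RT (s) per the ANSI S12.60 limits quoted above."""
    if volume_m3 < 283:
        return 0.6
    if volume_m3 <= 566:
        return 0.7
    raise ValueError("volume outside the ranges quoted in the text")


def failing_bands(t20_mid, volume_m3):
    """Number of mid-frequency bands (500 Hz, 1 kHz, 2 kHz) exceeding the limit."""
    limit = rt_limit(volume_m3)
    return sum(1 for t in t20_mid if t > limit)


def summarize_rt(rooms):
    """rooms: list of (t20_mid, volume_m3) pairs.

    Returns the percentage of rooms meeting the limit in all three bands and,
    among the remaining rooms, the percentage failing in 1, 2, or 3 bands.
    """
    counts = Counter(failing_bands(t20, vol) for t20, vol in rooms)
    n_fail = len(rooms) - counts[0]
    pct_meet = 100.0 * counts[0] / len(rooms)
    breakdown = {b: 100.0 * counts[b] / n_fail for b in (1, 2, 3)} if n_fail else {}
    return pct_meet, breakdown
```

Note that the failure breakdown is normalized by the number of non-compliant rooms, not by the whole sample, which is why its three percentages sum to 100%.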
Figure 2(b) shows boxplots of the clarity index C50 across octave bands. While ANSI S12.60 does not provide guidance for C50, other standards for classroom acoustics do list desired minimum values of metrics for quantifying speech intelligibility. The United Kingdom's Building Bulletin 93 (2015) recommends a speech transmission index (STI) of 0.6 or higher. Bradley et al. (1999) published a regression equation that linked 0.6 STI with a C50 value of 1 dB, with every 0.1 increase in STI corresponding to a 3 dB increase in C50. More recently, Italy released a national standard on classroom acoustics, UNI 11532, which lists 2 dB as the minimum desired C50, averaged across the mid-frequencies of 500 Hz, 1 kHz, and 2 kHz and across measurement positions (Astolfi et al., 2019a). In the current sample, 98% of the classrooms met the UNI 11532 guidance for C50; the classrooms that did not meet it were ones that also did not meet the ANSI S12.60 RT limits in each mid-frequency octave band. In a study of 20 grade 1 classrooms, Astolfi et al. (2019b) similarly found that classrooms meeting T20 guidelines exhibited good C50 values, while those with RTs above 0.8 s typically had unacceptable C50 values.
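Reading the relation described above as a straight line through (STI, C50) = (0.6, 1 dB) with a slope of 3 dB per 0.1 STI gives the rough conversion below. This is a linearized paraphrase of the text for illustration, not the published regression equation of Bradley et al. (1999).

```latex
C_{50} \approx 1\ \mathrm{dB} + 30\ \mathrm{dB}\times(\mathrm{STI} - 0.6)
\quad\Longrightarrow\quad
\mathrm{STI}\big|_{C_{50} = 2\ \mathrm{dB}} \approx 0.6 + \frac{2 - 1}{30} \approx 0.63
```

Under this reading, the UNI 11532 minimum of 2 dB corresponds to an STI of roughly 0.63, slightly stricter than the 0.6 STI recommended by Building Bulletin 93.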
B. Daily noise metrics
The data acquired in this investigation by logging levels in occupied classrooms over six school days can be analyzed in numerous ways to understand classroom acoustic conditions. A primary interest is determining the experienced SNR, which has previously been estimated using GMM or from the difference between LAeq and LA90. First, the data from the two sound level meters deployed in each classroom were energy-averaged to produce one value per classroom at each logged time interval. To apply GMM, Gaussian mixture distributions were fit to the LAeq data logged in each classroom for every measured school day. This analysis was performed in matlab using the fitgmdist function. The higher-level peak (LAeqH) is taken as an estimate of the average speech activity sound level, while the lower-level peak (LAeqL) is an estimate of the average noise activity sound level that day. The difference between these two peaks is the GMM estimate of the daily SNR experienced in that classroom.
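The GMM step can be sketched without MATLAB's fitgmdist by running expectation-maximization (EM) on a two-component, one-dimensional mixture. The block below is a hypothetical NumPy stand-in, not the authors' code: the initialization (means at the 25th and 75th percentiles), the convergence tolerance, and the small variance floor are all assumptions. The larger fitted mean plays the role of LAeqH and the smaller one of LAeqL.

```python
import numpy as np


def fit_gmm2(x, n_iter=500, tol=1e-8):
    """Two-component 1-D Gaussian mixture fitted by EM; returns weights, means, variances."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    mu = np.percentile(x, [25, 75])                   # initial means
    var = np.full(2, x.var())                         # initial variances
    w = np.array([0.5, 0.5])                          # initial weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step in the log domain to avoid underflow.
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)              # responsibilities, shape (n, 2)
        ll = log_norm.sum()
        # M-step: update weights, means, and variances.
        nk = resp.sum(axis=0)
        w = nk / x.shape[0]
        mu = (resp * x).sum(axis=0) / nk
        var = (resp * (x - mu) ** 2).sum(axis=0) / nk + 1e-6
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, var


def gmm_snr(laeq_10s):
    """GMM estimate of daily SNR: higher component mean minus lower component mean."""
    _, mu, _ = fit_gmm2(laeq_10s)
    laeq_l, laeq_h = np.sort(mu)
    return laeq_h, laeq_l, laeq_h - laeq_l
```

Because the fit sees only the A-weighted level distribution, any day with two level modes yields a "speech" component, whether or not speech actually occurred.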
One representative value for each classroom is then calculated by arithmetically averaging the results across a classroom's six days of measurements. Figure 3(a) shows a distribution of the resulting estimates for speech activity levels and noise activity levels in the 220 classrooms; the daily speech activity level averaged across classrooms is found to be 65 dBA (SD 2.5), and the daily noise activity level averaged across classrooms is 47 dBA (SD 3.5). The distribution of the resulting averaged daily SNR in each classroom is shown in Fig. 3(b), yielding a sample average of 16.2 dBA (SD 3.2). While the SNR average across the sample is slightly greater than the 15 dB SNR desired, note that 34.6% of the classrooms did not meet the desired minimum according to this metric.
Another way to estimate SNR from the acquired data, as previous studies have done, is to first calculate it for each classroom as the difference between its daily LAeq and the corresponding daily LA90 over the occupied school day. One representative value for each classroom is again calculated by arithmetically averaging the results across the classroom's six days of measurements. Figure 4(a) shows the distribution of the resulting daily LAeq and LA90 across the sample of 220 classrooms. The LAeq averaged across classrooms is 64 dBA (SD 2.5), and the LA90 averaged across classrooms is 45 dBA (SD 3.5). The distribution of the resulting averaged daily SNR in each classroom is shown in Fig. 4(b), yielding a sample average of 19 dBA (SD 4.1). This SNR average across the sample is greater than the desired 15 dB SNR and, in this study, equal to the difference between the overall LAeq and LA90 sample averages. Note that 14.6% of the classrooms did not meet the desired minimum SNR based on this calculation method.
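The per-classroom averaging can be sketched as below, assuming each classroom's daily LAeq and LA90 values are already available; the helper names are hypothetical. Because every step here is an arithmetic mean, the sample-average SNR computed this way equals the difference between the sample-average LAeq and LA90, as noted above.

```python
import numpy as np


def classroom_snr(daily_laeq, daily_la90):
    """Arithmetic mean over one classroom's school days of (LAeq - LA90), in dBA."""
    return float(np.mean(np.asarray(daily_laeq, float) - np.asarray(daily_la90, float)))


def percent_below(snrs, threshold=15.0):
    """Percentage of classrooms whose averaged SNR is below the threshold."""
    return 100.0 * float(np.mean(np.asarray(snrs, float) < threshold))
```

For example, two hypothetical classrooms with averaged SNRs of 19 and 14 dBA give a sample average of 16.5 dBA while half of them still fall below 15 dBA, which is why the text reports both the sample average and the percentage of classrooms below the minimum.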
Applying k-means clustering across nine dimensions of data separated each logged measurement into one of two clusters: one shows high levels across speech frequencies and is used herein to represent speech activity, while the other does not and is therefore used to represent non-speech activity. The clustered data for each classroom were used to calculate a daily LAeq for the speech cluster (LAeqS) and a daily LAeq for the non-speech cluster (LAeqN). The daily SNR is then estimated as the difference between the daily speech and non-speech levels. This approach estimates speech and non-speech levels more accurately, because it separates the data based on spectral information rather than simply on sound level distributions (as LAeq and LA90 do, and as GMM does when applied only to LAeq). One representative value for each classroom is calculated by arithmetically averaging the results across the classroom's six days of measurements. Figure 5(a) shows the distribution of the daily LAeqS and LAeqN across the sample of 220 classrooms. The LAeqS averaged across classrooms is 66 dBA (SD 2.4), and the LAeqN averaged across classrooms is 49 dBA (SD 2.9). The distribution of the resulting averaged daily SNR in each classroom is shown in Fig. 5(b), yielding a sample average of 16.9 dBA (SD 3.1). This SNR average across the sample is greater than the desired 15 dB SNR and, in this study, very close to the difference between the overall LAeqS and LAeqN sample averages. Note that 27.3% of the classrooms did not meet the desired minimum SNR based on this calculation method.
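Given labels from a two-cluster k-means run, the speech and non-speech levels and the resulting SNR might be computed as sketched below. The rule used to decide which cluster is "speech" (the one with the higher mean level across the 250 Hz to 2 kHz bands) is an assumption standing in for the inspection shown in Fig. 1, and LAeqS and LAeqN are assumed to be energy averages of the A-weighted 10-s levels within each cluster. The share of intervals in the speech cluster also gives the percentage of the day during which speech occurred. All names are hypothetical.

```python
import numpy as np

# Octave-band center frequencies (Hz) of the nine features, spanning 32 Hz to 8 kHz.
BANDS_HZ = np.array([32, 63, 125, 250, 500, 1000, 2000, 4000, 8000])


def energy_mean(levels_db):
    """Energy average of levels in dB."""
    return 10.0 * np.log10(np.mean(10.0 ** (np.asarray(levels_db, float) / 10.0)))


def speech_nonspeech_snr(octave_levels, la_levels, labels):
    """octave_levels: (n, 9) band levels; la_levels: (n,) A-weighted levels;
    labels: (n,) cluster labels in {0, 1}.

    Returns LAeqS, LAeqN, SNR (dBA), and the percentage of intervals with speech.
    """
    octave_levels = np.asarray(octave_levels, float)
    la_levels = np.asarray(la_levels, float)
    labels = np.asarray(labels)
    speech_bands = (BANDS_HZ >= 250) & (BANDS_HZ <= 2000)
    # Assumed rule: the cluster that is louder across the speech bands is "speech".
    band_means = [octave_levels[labels == j][:, speech_bands].mean() for j in (0, 1)]
    speech = int(np.argmax(band_means))
    laeq_s = energy_mean(la_levels[labels == speech])
    laeq_n = energy_mean(la_levels[labels != speech])
    pct_speech = 100.0 * np.mean(labels == speech)
    return laeq_s, laeq_n, laeq_s - laeq_n, pct_speech
```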
Table II summarizes the results from the three methods: GMM, LAeq and LA90, and the k-means clustered LAeqS and LAeqN. Of these, the k-means results are considered to represent speech and non-speech activity levels most accurately, as they are based on clustering nine dimensions of spectral data. All three methods estimate speech activity levels closely, with only a 2 dBA spread across those values. Noise or non-speech activity levels appear to be underestimated by GMM's LAeqL and even more so by LA90. Consequently, the estimated SNR in occupied classrooms is overpredicted when using LAeq and LA90, while GMM's results align more closely with those calculated from k-means. All methods produce an SNR averaged across the sample of occupied classrooms that is greater than 15 dBA, but a substantial percentage of the classrooms did not individually meet the minimum. Using GMM, 34.6% of the sample exhibited an SNR lower than 15 dBA, while the LAeq and LA90 method yields 14.6%; the k-means result falls between the two at 27.3%. At the other end, GMM yields 10% of the sample with an SNR higher than 20 dBA, while the LAeq and LA90 method yields 35.9%; the k-means result of 15.9% is closer to the GMM value. SNRs greater than 15 dB are desired for normal-hearing students, but others have recommended SNRs above 20 dB for children with hearing impairments, with learning disabilities, or learning in a second language (Bradley and Sato, 2008; Neuman et al., 2010). Based on the k-means results, only 15.9% of the classrooms in this sample exceed a 20 dB SNR.
Method | Speech level | σS | Noise level | σN | SNR | SNR <15 dBA (%) | SNR >20 dBA (%) |
---|---|---|---|---|---|---|---|
GMM | LAeqH = 65 dBA (SD 2.5) | 2.3 dBA (SD 1.6, min 0.4, max 11.6) | LAeqL = 47 dBA (SD 3.5) | 2.8 dBA (SD 2.6, min 0.2, max 16.1) | 16.2 dBA (SD 3.2) | 34.6 | 10.0 |
LAeq and LA90 | LAeq = 64 dBA (SD 2.5) | 2.4 dBA (SD 1.8, min 0.3, max 12.3) | LA90 = 45 dBA (SD 3.5) | 2.0 dBA (SD 1.4, min 0.1, max 7.0) | 19 dBA (SD 4.1) | 14.6 | 35.9 |
K-means | LAeqS = 66 dBA (SD 2.4) | 2.0 dBA (SD 1.1, min 0.3, max 6.6) | LAeqN = 49 dBA (SD 2.9) | 1.8 dBA (SD 1.1, min 0.2, max 6.0) | 16.9 dBA (SD 3.1) | 27.3 | 15.9 |
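For readers reproducing the second method in Table II, the following is a minimal Python sketch of the daily LAeq minus LA90 difference and of the percentages of classrooms below 15 dB or above 20 dB. It assumes each day is a list of logged A-weighted levels (e.g., one value per 10-s interval) and uses a nearest-rank 10th percentile for LA90; the function names are hypothetical, and the authors' exact percentile convention is not stated here.

```python
import math

def laeq(levels_dba):
    """Energy-average a sequence of A-weighted levels (dBA)."""
    mean_energy = sum(10 ** (lvl / 10) for lvl in levels_dba) / len(levels_dba)
    return 10 * math.log10(mean_energy)

def la90(levels_dba):
    """Level exceeded 90% of the time: nearest-rank 10th percentile (assumed convention)."""
    ordered = sorted(levels_dba)
    rank = max(1, (len(ordered) + 9) // 10)  # ceil(0.10 * n) with integer arithmetic
    return ordered[rank - 1]

def classroom_snr(days):
    """Average over logged days of (daily LAeq - daily LA90), in dB."""
    diffs = [laeq(day) - la90(day) for day in days]
    return sum(diffs) / len(diffs)

def share_outside(snrs, low=15.0, high=20.0):
    """Percent of classrooms with SNR below `low` and percent above `high`."""
    n = len(snrs)
    below = 100 * sum(s < low for s in snrs) / n
    above = 100 * sum(s > high for s in snrs) / n
    return below, above
```

Because LA90 tracks the quietest 10% of intervals, it tends to sit below the energy average of the non-speech periods, which is one way to see why this method yields the highest SNRs in Table II.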
Table II also allows comparison of how much each classroom's daily speech activity level and daily noise activity level varied across the six logged school days, that is, the day-to-day variation in measured speech and noise activity levels. With k-means, the daily speech and non-speech activity levels varied by about 2 dBA on average (σS = 2.0 dBA, σN = 1.8 dBA), with similar SDs (1.1 dBA). The maximum variation found across days is around 6 dBA. The other two methods show similar average variations across days, but their maximum variations are much higher, reaching 16.1 dBA. One reason is that some days of data were collected in classrooms with little student activity, due to school cancellations for weather or to other activities such as field trips. In those cases, the GMM method and the LAeq and LA90 method still produced a “speech” level from the logged levels, even though there was no actual speech activity. The k-means method as applied, by contrast, clusters the data based on the spectral content of the measurements; consequently, its maximum variations across days are considerably lower.
C. Differences across grades
Some previous work has shown that occupied levels in classrooms do vary with student age, with younger students in lower grades experiencing higher sound levels than older students in higher grades (Peng et al., 2018; Sarantopoulos et al., 2014; Shield et al., 2015). Figure 6 shows boxplots by grade of the average daily LAeqS, average daily LAeqN, and average difference between daily LAeqS and daily LAeqN. Wald tests were conducted on these data to determine if there were statistically significant differences across grades. The LAeqS levels in third and eighth grade classrooms are found to be higher than the other two grades at a statistically significant level of p < 0.01, but this was not the case for the LAeqN or SNR data.
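The grade comparisons rely on Wald tests, but the model specification is not given in this section. The following is therefore only a simplified sketch of a two-group Wald test on mean values (e.g., classroom LAeqS in two grades), referred to a chi-square distribution with one degree of freedom; the function name and example data are hypothetical, and the authors' tests may instead have been run on coefficients of a regression model.

```python
import math
from statistics import mean, variance

def wald_two_groups(a, b):
    """Wald test of H0: mean(a) == mean(b).

    W = (difference in means)^2 / (sum of squared standard errors), which is
    approximately chi-square with 1 df under H0. Returns (W, p).
    Requires at least two values per group and non-zero pooled variance.
    """
    diff = mean(a) - mean(b)
    se2 = variance(a) / len(a) + variance(b) / len(b)
    w = diff ** 2 / se2
    # For 1 df, P(chi2 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    p = math.erfc(math.sqrt(w / 2))
    return w, p
```

For example, `wald_two_groups([66, 67, 68, 65], [63, 64, 62, 65])` gives W = 10.8 and p of roughly 0.001 for these made-up values.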
The k-means clustered data additionally show that students in the sampled classrooms were listening to speech on average 58% of each day (SD 9.1%, minimum 30%, maximum 81%). The label %SpD is used for the average daily percent of speech in each classroom for the remainder of this paper. %SpD varied by grade, with lower-grade classrooms experiencing more speech than higher-grade ones (Fig. 7); specifically, the differences between third grade and all other grades were statistically significant (p < 0.01), while other pairwise comparisons were not. Shield et al. (2015) stated that in their sample, on average 46% of lessons were spent in plenary sessions with one teacher addressing the class; this is consistent with the findings here, since plenary teaching is only one source of classroom speech and 46% is lower than 58%. Hunter and Titze (2010) report that K–12 teachers had voicing time percentages (the percent of time that teachers were phonating over a given period) of 30% (SD 11%) in occupied classrooms; other half- or full-day investigations have measured slightly lower averages, ranging from 21% to 29% (Bottalico and Astolfi, 2012; Durup et al., 2015; Puglisi et al., 2017). With speech occurring on average 58% of the school day and teachers phonating around 30% of class time, one could infer that talking students or audio teaching materials account for the remaining 28 percentage points of average daily speech occurrence.
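Once each logged interval carries a cluster label, %SpD is a simple proportion averaged over days. A minimal sketch follows; the label values "S" (speech) and "N" (non-speech) and the function name are hypothetical.

```python
def percent_speech_per_day(daily_labels):
    """Average daily percent of intervals labeled as speech.

    daily_labels: one list per logged day, each holding the cluster label
    ("S" or "N", hypothetical encoding) of every logged interval that day.
    """
    per_day = [100 * labels.count("S") / len(labels) for labels in daily_labels]
    return sum(per_day) / len(per_day)
```

With two days at 50% and 25% speech, the function returns 37.5; the 28-point remainder discussed above is likewise just the difference 58 − 30.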
D. Exceedance of sound levels across a school day
The clustered data were further used to calculate the average daily percent of time that LAeqS exceeded 65 dBA in each classroom (labeled %Sp > 65 in this paper) and the average daily percent of time that LAeqN exceeded 50 dBA (labeled %Ns > 50). These metrics offer alternative ways to consider the speech and non-speech activity in the classrooms, giving insight into what happens over time rather than only into time-averaged values. The exceedance thresholds of 65 dBA for speech and 50 dBA for non-speech were selected because they produced the most normal distributions across the data compared with other exceedance thresholds. Raised voice levels are stated to be around 66 dBA at a distance of 1 m (ANSI, 1997), just above the 65 dBA used; also note that 65 and 50 dBA are separated by the minimum 15 dB SNR that guided ANSI S12.60's development. Figure 8 shows histograms of (a) the averaged daily %Sp > 65, with a sample average of 35% (SD 15%, minimum 5%, maximum 71%), and (b) the averaged daily %Ns > 50, with a sample average of 32% (SD 22%, minimum 1%, maximum 100%). Figure 9 separates the data by grade; Wald tests indicate that the third and eighth grade classrooms had statistically higher %Sp > 65 than the other two grades (p < 0.001). No statistically significant differences across grades are found for %Ns > 50 at the p < 0.01 level, however. These findings mirror the grade results for LAeqS and LAeqN reported earlier, likely because those metrics are highly correlated with the respective exceedance metrics, as explored further in Sec. III E.
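A minimal sketch of the exceedance metrics follows. It assumes the percentage is computed relative to the intervals in the corresponding cluster rather than the whole day; this is an assumption, suggested by %Ns > 50 reaching 100% in some classrooms even though no classroom had less than 30% speech. Function names and the "S"/"N" labels are hypothetical.

```python
import math

def exceedance_percent(levels, labels, cluster, threshold):
    """Percent of intervals in `cluster` whose level (dBA) exceeds `threshold`."""
    in_cluster = [lvl for lvl, lab in zip(levels, labels) if lab == cluster]
    if not in_cluster:
        return float("nan")  # no intervals of this cluster on that day
    return 100 * sum(lvl > threshold for lvl in in_cluster) / len(in_cluster)

def daily_average_exceedance(days, cluster, threshold):
    """Average over days; `days` is a list of (levels, labels) pairs."""
    values = [exceedance_percent(lv, lb, cluster, threshold) for lv, lb in days]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)
```

Under these assumptions, %Sp > 65 corresponds to `daily_average_exceedance(days, "S", 65)` and %Ns > 50 to `daily_average_exceedance(days, "N", 50)`.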
E. Correlations between acoustic metrics
Table III shows Pearson's correlation coefficients between metrics used to characterize speech activity (LAeqS, %SpD, %Sp > 65, LAeq), non-speech or noise activity (LAeqN, %Ns > 50, LA90), SNR estimated from k-means clustering, and room acoustic conditions (T20 and C50 averaged across the three mid-frequency octave bands of 500 Hz, 1 kHz, and 2 kHz, labeled T20m and C50m). Most of the speech metrics are significantly correlated with each other at the p < 0.001 level, with correlation coefficients of r = 0.87 or higher. The exception is %SpD, which is not significantly correlated with LAeqS or %Sp > 65 and has an r-value of only 0.36 (p < 0.001) with LAeq. This is consistent with the earlier grade results: although eighth grade classrooms had higher %Sp > 65, they did not have statistically higher percentages of speech occurrence in a day. All of the non-speech activity metrics are significantly correlated with each other at the p < 0.001 level, with r = 0.85 or higher. SNR values are correlated with all of these speech and non-speech metrics (p < 0.001), but the r-values with the non-speech metrics are negative and larger in magnitude (−0.61 to −0.70): as non-speech activity levels increase, SNR values decrease. Finally, the two mid-frequency room acoustic metrics T20 and C50 are strongly correlated with each other (r = −0.89, p < 0.01) but not with any other metric in this analysis. This is likely due to the relatively acceptable range of T20 and C50 values in the sample, with few measured T20s higher than 1.0 s or C50s lower than 1 dB.
Metric | LAeqS | %SpD | %Sp > 65 | LAeq | LAeqN | %Ns > 50 | LA90 | SNR | T20m | C50m
---|---|---|---|---|---|---|---|---|---|---|
LAeqS | 1 | 0.05 | 0.92** | 0.93** | 0.35** | 0.29** | 0.18* | 0.46** | 0.13 | −0.12
%SpD | | 1 | 0.10 | 0.36** | −0.19* | −0.19* | −0.19* | 0.23** | 0.02 | 0.05
%Sp > 65 | | | 1 | 0.87** | 0.41** | 0.37** | 0.27** | 0.33** | 0.15 | −0.15
LAeq | | | | 1 | 0.33** | 0.27** | 0.14 | 0.34** | 0.13 | −0.19
LAeqN | | | | | 1 | 0.90** | 0.88** | −0.66** | 0.14 | −0.13
%Ns > 50 | | | | | | 1 | 0.85** | −0.61** | 0.11 | −0.11
LA90 | | | | | | | 1 | −0.70** | 0.11 | 0.13
SNR | | | | | | | | 1 | −0.03 | 0.02
T20m | | | | | | | | | 1 | −0.89**
C50m | | | | | | | | | | 1
Previous researchers have sought to quantify the Lombard effect in classrooms, whereby speakers increase their vocal effort as noise levels increase. Bottalico et al. (2017) found that the effect begins when the noise level is 43.3 dBA or higher, a condition met by practically all of the LAeq values in this sample [Fig. 5(a)]. The magnitude of the Lombard effect as measured by others has ranged from a 0.51 to a 1 dBA increase in a talker's speech level per 1 dBA increase in noise level (Bottalico and Astolfi, 2012; Durup et al., 2015; Lane and Tranel, 1971; Pearsons et al., 1977; Puglisi et al., 2017; Sarantopoulos et al., 2014; Sato and Bradley, 2008). The data presented in the current paper cannot be used to calculate a Lombard effect directly, as the speech levels were measured generally in the classroom and not at any specific talker position. Still, it is of interest to see how the speech and non-speech levels in the classrooms are related. Applying linear regression to the LAeqS and LAeqN data acquired in this study yields a statistically significant relationship (r = 0.35, p < 0.001), with a 0.29 dBA increase in LAeqS for every 1 dBA increase in LAeqN (Fig. 10). This is a lower rate of increase than previously reported Lombard effects, as would be expected if ambient noise levels are assumed to be uniform across the classroom while the talker's level is lower when measured at greater distances from the talker. Note that with r = 0.35, the linear regression model accounts for only about 12% of the variance (r² ≈ 0.12). A relationship exists, but other factors account for most of the variance in LAeqS.
IV. DISCUSSION
Based on the k-means clustered data, the daily average speech levels in classrooms, LAeqS, were found in this study to range from 60.3 to 74.1 dBA, with an overall average of 66 dBA (SD 2.4). These values are in alignment with LAeq values previously reported as estimates of teaching activity in occupied classrooms (Table I), ranging from 60.1 to 71.9 dBA. The values are also consistent with mean unweighted sound pressure levels ranging from 67.2 to 72.7 dB at a distance of 1 m from the teacher's mouth for primary and secondary school teachers (Calosso et al., 2017; Durup et al., 2015; Puglisi et al., 2017), extrapolated from data captured over half or full days by portable vocal analyzers worn by the teachers. Such talker levels would typically be expected to exceed the average LAeqS measured in the classroom, since LAeqS is measured at greater distances from the instructor.
Analyses in this paper show that calculated LAeq values were close to LAeqS numbers, but the two would vary more significantly if regular teaching activity in the classroom were not assured. The non-speech levels found via k-means in this study, LAeqN, ranged from 42.0 to 57.6 dBA, with an overall average of 49 dBA (SD 2.9). These overlap with the higher LA90 values previously reported as estimates of noise activity in occupied classrooms (Table I), ranging from 28.2 to 55.7 dBA. Analyses herein indicate that LA90 underestimates the non-speech noise levels in rooms, which can then lead to overestimating the SNR.
The overall averaged SNR across this sample of 220 classrooms calculated from the k-means clustered data is found to be 16.9 dBA (SD 3.1), which is greater than the desired 15 dB SNR minimum. However, 27.3% of the individual classrooms in the sample did not meet the guideline, meaning that acoustic conditions in over a quarter of the occupied classrooms did not achieve optimal SNR for normal-hearing students. The majority of the classrooms (84.1%) exhibited average SNR below 20 dBA, therefore not achieving the more optimal SNR recommended for students with hearing impairments, learning disabilities, or learning in a second language (Bradley and Sato, 2008; Neuman et al., 2010). Comparing this to the previous work summarized in Table I, five of the 11 papers reported average estimated SNRs above 15 dB, as high as 27 dBA, but these SNRs were estimated using less precise methods compared to k-means clustering on multiple dimensions.
When looking at how often speech occurred in classrooms, there is a trend for lower grades to experience a greater percentage of speech over a school day, as plotted in Fig. 7. However, only the differences between third grade and all other grades were found to be statistically significant (p < 0.01). Having a higher percentage of speech does not necessarily correlate with having higher speech levels, as Table III indicates. Further analyses of how metrics varied across grades revealed that the LAeqS levels in third and eighth grade classrooms were higher than for the other two grades (p < 0.01). Having higher LAeqS values resulted in those same grade levels having statistically higher %Sp > 65 (Fig. 9). Statistically significant differences across grades were not found for the non-speech metric LAeqN or the resulting SNR data, however.
Findings from previous studies support lower grades in primary school having higher teaching activity sound levels (Peng et al., 2018; Sarantopoulos et al., 2014; Shield et al., 2015; Skarlatos and Manatakis, 2003), substantiating the result here that grade 3 was found to have higher LAeqS and %Sp > 65 values. Why the eighth grade classrooms also demonstrated these significant results remains unclear, as actual classroom activity was not visually observed. In this sample of midwestern United States classrooms, third grade students were taught all subjects in the same room, typically by the same teacher. The eighth grade classrooms, though, were subject-specific: each room was assigned to a specific teacher who taught only math or language arts to different groups of students throughout the day. There were fewer eighth grade classrooms, only 32 of the 220 rooms sampled, so the smaller sample may have influenced these results. More investigations are needed to determine if the grade findings reported here are consistently observed and, if so, why.
Correlations between the acoustic metrics compiled in this study demonstrate many statistically significant relationships (Table III), such as between the various speech activity metrics, between the various noise or non-speech activity metrics, and between all activity level metrics and the k-means estimated SNR. The mid-frequency averaged T20 and C50, however, were strongly correlated only with each other. This differs from Sato and Bradley (2008), who found a weak relation between these two metrics and speech levels. Some previous investigations have found significant correlations between RT and LAeq and/or LA90 (Calosso et al., 2017; Puglisi et al., 2017; Shield et al., 2015), but their samples included classrooms that had RTs up to 1.4 or 1.6 s. The majority of classrooms sampled in the current study were closed plan, seated around 22 students, and had ceiling heights around 3.3 m (11 ft) with fully absorptive ceilings, thereby resulting in few measured T20s higher than 1.0 s or C50s lower than 1 dB. Classroom designs found more commonly outside the United States can produce higher T20 and lower C50 values, which may then correlate with higher noise activity levels and, in turn, higher speech activity levels to compensate.
There is a statistically significant correlation between LAeqS and LAeqN (r = 0.35, p < 0.001), with a 0.29 dBA increase in speech levels found for every 1 dBA increase in non-speech levels in occupied classrooms. Keeping non-speech levels down can consequently be important for maintaining vocal health, leading to less vocal effort, load, and/or fatigue for teachers (Hunter et al., 2020). Also, the SNR values calculated through k-means have their largest-magnitude correlation coefficients with the noise or non-speech activity metrics; as LAeqN increases, the SNR decreases. Thus, classroom designers should aim to reduce non-speech levels in occupied classrooms by designing for noise levels in unoccupied classrooms that are at least 10 dB lower than the average LAeqN of 49 dBA reported here, that is, no higher than 39 dBA, which is slightly above the level set in the ANSI S12.60 standard (ANSI, 2010). While noise levels in unoccupied classrooms were not explicitly presented in this paper, others have reported unoccupied ambient noise levels being 5–10 dB lower than occupied ambient noise levels in classrooms (Sato and Bradley, 2008) or even on average up to 20 dB lower (Shield et al., 2015).
Logging sound levels in 220 classrooms over six occupied school days has provided a large data set with which to explore the acoustic conditions of occupied classrooms. One limitation of the investigation presented in this paper is that classroom activities were not visually observed. Also, because all logged data from the start to the end of the school day were analyzed, the results include times when classrooms were unoccupied during the school day, such as during lunch or school-wide assemblies. Applying GMM or using daily LAeq to calculate speech activity levels from data logged in classrooms that are unoccupied for significant periods of time can therefore introduce error into the estimates, which may explain some of the differences presented in Table II. The presented k-means clustering method at least groups logged levels with similar spectral content.
V. CONCLUSIONS
The acoustic conditions of 220 occupied K–12 classrooms in the midwestern United States have been captured across six school days each. Results show that the majority of classrooms in the sample meet guidelines for RT and clarity index as provided in classroom acoustic standards. For example, 74% of the classrooms met ANSI S12.60's recommended maximum RT in each of the three mid-frequency octave bands (500 Hz, 1 kHz, 2 kHz); of those that did not meet the recommendations, 56% failed in one frequency band, 14% in two bands, and 30% in all three bands. Sound levels logged every 10 s at two locations in each classroom have been analyzed to produce assorted acoustic metrics, including LAeqH and LAeqL from applying GMM as well as the average daily LAeq and average daily LA90 over occupied school days. K-means clustering was applied across nine dimensions of data (specifically, the sound levels in nine octave frequency bands from 32 Hz to 8 kHz) to separate each logged measurement into one of two clusters: one with high levels across speech frequencies, used to represent speech activity, and the other used to represent non-speech activity. The clustered data yielded two metrics, the average daily LAeqS and average daily LAeqN, that may be more confidently assumed to represent speech levels and non-speech levels in occupied classrooms.
Across the sample of 220 classrooms, the average daily LAeqS is found to be 66 dBA, the average daily LAeqN is 49 dBA, and the average difference between daily LAeqS and daily LAeqN is 16.9 dBA. This last value is greater than the 15 dB SNR desired for normal-hearing students, although not as high as the 20 dB SNR that has been recommended for children with hearing impairments, with learning disabilities, or learning in a second language. Day-to-day variation of the k-means metrics across the six school days was commonly less than 3 dBA. Comparisons of the speech activity level, noise or non-speech activity level, and SNR estimates calculated using GMM, LAeq and LA90, or k-means indicate that LA90 can underestimate the non-speech activity in the classrooms. GMM can also underestimate non-speech activity levels, but its resulting SNRs are similar to those calculated from k-means clustered data.
Additional metrics analyzed in this paper are the average daily percent of time that speech levels exceeded 65 dBA (%Sp > 65) and the average daily percent of time that non-speech levels exceeded 50 dBA (%Ns > 50); these sample averages were found to be 35% and 32%, respectively. Statistically significant differences were found across grades for some metrics, with third grade classrooms demonstrating higher percentages of speech activity (61.2%) than the higher grades (e.g., 54.9% for 11th grade). The third and eighth grade classrooms also demonstrated significantly higher %Sp > 65 values than fifth and 11th grade rooms; the eighth grade result may be due to the smaller number of eighth grade classrooms in this sample. Further study including observations of classroom activities is suggested.
Correlations between the assorted metrics show statistically significant relations between many of the speech metrics (LAeqS, %Sp > 65, LAeq), between all non-speech or noise metrics (LAeqN, %Ns > 50, LA90), and between all of these and SNR, with correlation coefficients of 0.85 or higher among the speech metrics and among the non-speech metrics. Linear regression analysis indicates a statistically significant relation between LAeqS and LAeqN (r = 0.35, p < 0.001), yielding a 0.29 dBA increase in speech levels for every 1 dBA increase in non-speech levels. The scatterplot between LAeqS and LAeqN shows a great deal of variance (about 88%, since r² ≈ 0.12) that is not accounted for by this relationship.
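The two-cluster step can be illustrated with a minimal NumPy sketch. It assumes that each 10-s record is a vector of nine unweighted octave-band levels (31.5 Hz to 8 kHz), that clustering is run on one classroom-day at a time using unscaled dB values, and that the speech cluster is the one whose centre is louder across 500 Hz to 4 kHz. These choices, the nominal A-weighting corrections used to form per-interval dBA values, and all names are assumptions for illustration, not the authors' published pipeline. A library routine such as scikit-learn's `KMeans(n_clusters=2)` could replace the hand-written Lloyd iteration.

```python
import numpy as np

# Nominal A-weighting corrections (dB) at octave-band centres 31.5 Hz to 8 kHz.
A_WEIGHT = np.array([-39.4, -26.2, -16.1, -8.6, -3.2, 0.0, 1.2, 1.0, -1.1])
# Assumed "speech frequency" bands: 500 Hz, 1 kHz, 2 kHz, 4 kHz (columns 4-7).
SPEECH_BANDS = slice(4, 8)

def kmeans_two(x, n_iter=100, seed=0):
    """Lloyd's algorithm with k = 2 on the rows of x (intervals x 9 bands, dB)."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=2, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([x[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
                        for k in range(2)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment against the converged centres.
    dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1), centres

def leq(levels_db):
    """Energy average of levels in dB."""
    return 10 * np.log10(np.mean(10 ** (levels_db / 10)))

def speech_noise_levels(bands_db):
    """Return (LAeqS, LAeqN) in dBA for one day of octave-band records."""
    labels, centres = kmeans_two(bands_db)
    # Speech cluster: higher mean centre level across the assumed speech bands.
    speech = centres[:, SPEECH_BANDS].mean(axis=1).argmax()
    # Overall A-weighted level of each interval from its nine band levels.
    la = 10 * np.log10((10 ** ((bands_db + A_WEIGHT) / 10)).sum(axis=1))
    return leq(la[labels == speech]), leq(la[labels != speech])
```

Under these assumptions, the daily SNR is the difference of the two returned values, and averaging that difference over the logged days gives a per-classroom SNR comparable in spirit to the k-means row of Table II.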
Since the initial publication of the ANSI S12.60 standard in 2002, which focused greater attention on classroom acoustics, both research and awareness about the impact of K–12 classroom sound fields on occupant well-being and performance have grown tremendously and continue to expand. The broader investigation during which the measurements presented herein were taken is aimed at relating acoustic metrics to student achievement scores. Do classrooms with lower RTs, lower non-speech levels, and/or higher SNR values correlate to greater student achievement? How do those relationships vary when considering other measured environmental variables, such as temperature, indoor air quality, and lighting conditions? The scope of this current paper does not include these research questions, but more research to address such questions is encouraged and will strengthen evidence-based guidelines for the design of classroom environments moving forward.
ACKNOWLEDGMENTS
This research was supported by United States Environmental Protection Agency Grant No. R835633. The authors gratefully acknowledge all members of the research team who worked on the project, listed at the University of Nebraska–Lincoln Healthy Schools website (https://engineering.unl.edu/healthy-schools), with special thanks to Michael Kuhlenengel, Kieren Smith McCord, and Jayden Nord. The authors would also like to thank our school district partners and Dr. Ralph Muehleisen, who served as a technical advisor on the Project Advisory Committee.