Cultural differences in auditory ecology

Demographic differences in acoustic environments are usually studied using geographic area monitoring. This approach, however, may miss valuable information that differentiates cultures. This motivated the current study, which used wearable sound recorders to measure noise levels and speech-to-noise ratios (SNRs) in the immediate acoustic environments of Latinx and European-American college students. Latinx students experienced higher noise levels (64.8 dBC) and lower SNRs (3.7 dB) than European-American students (noise level, 63 dBC; SNR, 5.4 dB). This work provides a framework for a larger study on the impact of culture on auditory ecology. © 2023 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Auditory ecology refers to the relationship between acoustic environments and the listening demands that individuals experience during their daily life. 1 A person's auditory ecology is determined, in part, by the noise levels experienced and how this noise affects their ability to extract behaviorally relevant auditory information from their immediate environment. This ability is particularly important in social interactions in which accessing speech sounds effectively is crucial for communication. In this study, using a novel approach, we measured noise levels and speech-to-noise ratios (SNRs; used as a proxy of listening environmental demands) to investigate whether Latinx and European-American college students differed in the acoustic environment that they experience.
Cultural factors have been proposed to influence a person's exposure to loud sounds, background noise, and the acoustic features of their environment. 2-6 For example, cultural values associated with collectivism and interdependence drive Latinx 7,8 social interactions, 9 leading Latinx individuals to spend less time alone and more time talking, socializing, and engaging in group interactions in everyday life than European-Americans. 10 Latinx also spend more time laughing and listening to music during their interactions than their European-American peers 3,11 and are more likely to live in crowded multigenerational homes that increase the number of potential noise sources. 12 These cultural factors may combine to shape the acoustic characteristics of everyday environments.
Previous studies have used geographic area monitoring to study cultural differences in environmental sound levels. For example, in a study of noise exposure that aggregated 1.5 × 10⁶ hours of data from approximately 500 test sites in the U.S., Latinx were overrepresented in noisy neighborhoods close to train tracks or airports. 12 That study and others like it used sound level meters placed at different neighborhood locations to measure the average environmental noise level for each neighborhood. This geospatial, area monitoring approach, however, has limitations. It leaves unclear whether group differences in neighborhood noise levels, measured from microphones at fixed locations, result from cultural factors or from the socioeconomic pressures that determine where people from different cultures live. This approach also may not distinguish between types of sound sources, for example, speech versus background noise. As a result, it cannot determine the extent to which environmental noise affects communication; that is, it cannot estimate the SNRs that individuals experience while communicating. In summary, geospatial monitoring may be less sensitive to the social and cultural dynamics that shape the communication environment than recordings made at the individual level with body-worn sound recorders.
These gaps in knowledge motivate the current study, which used personal sound recorders to capture noise levels in the immediate communication environment of the person wearing the device. In this study, we provided wearable sound recorders to two groups of college students living in the U.S. who self-identified as either Latinx or European-American. Participants wore the recorders for two full days during the academic term as they went about their daily routines. Both groups were tested in the same context (the same university campus). We used an algorithm previously developed by C.R.B.-B. to estimate noise levels for the two cultural groups. In addition, we examined whether communication took place under different SNRs for the two groups. We hypothesized a priori that differences in cultural values and interpersonal dynamics would manifest in the personal sound recordings as higher levels of noise and lower SNRs for Latinx students compared to their European-American peers. That is, we expected that our methodology would capture cultural differences in acoustic environments.

Methods
All procedures conformed to a protocol approved by the University of Connecticut Institutional Review Board and Connecticut Recording Laws. Informed consent was obtained from all participants.

Participants
Seventy-four college students with normal hearing (57 females) participated in the study. These students were recruited as part of a larger study at the University of Connecticut investigating the extent to which the day-to-day language environments of Latinx and European-American college students affect their language processing skills.
Participants were between 18 and 24 years old [M = 20.2; standard deviation (SD) = 1.4]. All participants passed an air-conduction pure-tone hearing screening at 20 dB hearing level (HL) at octave intervals from 250 to 4000 Hz in both ears and used English and/or Spanish as their primary language. Thirty-one participants (23 females) identified themselves as Latinx, whereas the remaining 43 participants (34 females) identified as European-American. Participants reported the socioeconomic status (SES) of their primary caregiver growing up on a Likert scale from 1 = working class to 5 = upper class. We included this information because SES has been associated with environmental sound quality 12 and, therefore, could inform us about potential preferences regarding listening environments. Participants were also asked to report whether they were currently active musicians because we wanted to rule out possible effects of music practice on our sound levels. 13 The two groups did not differ significantly by age or current music training. However, they did differ in SES: Latinx were more likely to be at the lower end of the socioeconomic stratum than European-Americans (Latinx, …). Table 1 provides detailed demographic information for each group and the statistical tests used to compare the groups.

Latinx group characterization
Latinx participants represented different parts of Latin America's cultural heritage. Specifically, nine participants' cultural heritages were from Mexico, four were from the Dominican Republic, four were from Ecuador, three were from Peru, two were from Colombia, two were from Guatemala, one was from Argentina, one was from Bolivia, one was from Costa Rica, one was from Cuba, one was from El Salvador, and one was from Puerto Rico (one no response). Twenty participants reported living in the U.S. since birth, and the others reported being born outside of the U.S. On average, the Latinx participants reported living in the U.S. for 16.4 years (SD = 5.6). Sixteen of the Latinx participants indicated that English was their primary (L1) language and Spanish was their secondary (L2) language, whereas ten participants reported that Spanish was their L1 and English was their L2 (note that five Latinx participants did not provide this information). On a Likert scale from 1 = I cannot speak the language fluently to 5 = I have a native-like proficiency, Latinx participants' averages were 4.95 (SD = 0.25) for English and 4.57 (SD = 0.63) for Spanish. Therefore, most Latinx participants self-reported as balanced bilinguals with native or near-native proficiency in both languages.

Procedures
The Language Environment Analysis™ (LENA) technology was used to obtain two full (continuous) recording days for each participant. LENA records and analyzes the acoustic parameters of natural environments, with a capacity of 16 h per recording. 14 The technology has been used extensively to study child language development but has also been adopted outside this area of research (e.g., collecting language samples in adults 11,15 or analyzing real-world acoustic environments in adults with hearing loss 16). Typically, the person of interest wears a LENA recorder to capture the linguistic environment throughout the day, and then, based on acoustic parameters of the language environment, the LENA software estimates the amount of speech produced near the recorder (i.e., within approximately 6-8 ft 15). LENA also estimates the amount of time the wearer is exposed to noise, silence (segments below 32 dBC), TV/electronics, and overlapping sounds (i.e., two people speaking simultaneously or one person speaking with the TV on). Importantly, LENA technology has been shown to provide precise and accurate measures for spoken English and Spanish. 17,18 For the current analysis, we calculated speech levels and SNRs without considering the specific language; that is, we did not factor in whether the language at a particular point of the recording was English or Spanish. Participants received two LENA recorders and an armband to hold the recorder and were instructed to record continuously on one weekday and one weekend day. On recording days, participants were instructed to follow their regular routines and, if possible, wear the recorder for the entire day.

Data analyses
Noise levels and SNRs (the variables of interest in the current study) are not part of the LENA automated analysis system. To calculate them, we followed the methodology detailed in Benítez-Barrera et al. 19 Briefly, the LENA software automatically and continuously categorizes recorded sounds into speech, noise, overlap, tv/electronics, and silence segments. These segments are brief (0.5-5 s long) and are the minimal unit of analysis that the LENA software provides. The segments are further divided into "near" and "far" categories (e.g., "speech near" vs "speech far") depending on the estimated proximity of the sound to the wearer. Near segments are those the software estimates were generated <6-8 ft from the recorder, whereas far segments are estimated to have been generated >6-8 ft from the recorder. The software then generates a complete summary of each recording day with time-stamped labels for each identified segment (e.g., female speech near and noise near). Importantly, the LENA recording summary provides an estimate of the average level (in dBC) for each labeled sound. The recordings are also divided into conversation and pause "blocks." Conversation blocks are sections that include near speech segments and interleaved segments of nonspeech, overlap (i.e., two people talking simultaneously), or far speech categories. Pause blocks, on the other hand, are sections of the recording that do not include any near speech segments (i.e., they include only nonspeech, overlap, and far speech categories). Specifically, a pause block is established when more than 5 s pass without any near, clear human speech-related sound. Finally, each conversation block is always followed by a pause block, creating pairs of conversation-pause blocks.
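The block structure just described can be illustrated with a short Python sketch. This is a hypothetical re-implementation for illustration only: LENA computes conversation and pause blocks internally, and the label names below (e.g., `female_speech_near`) are invented stand-ins, not LENA's export codes. The sketch merges near-speech segments separated by 5 s or less into one conversation block, treats the time up to the next conversation block (or the end of the recording) as its pause block, and, as a simplifying assumption, ignores segments that precede the first conversation block.

```python
from dataclasses import dataclass

# Hypothetical label names; LENA's own export uses different codes.
NEAR_SPEECH = {"female_speech_near", "male_speech_near", "child_speech_near"}
PAUSE_GAP_S = 5.0  # more than 5 s without near speech starts a pause block


@dataclass
class Segment:
    start: float      # seconds from recording onset
    end: float        # seconds from recording onset
    label: str        # hypothetical category label
    level_dbc: float  # average level reported for the segment


def conversation_pause_pairs(segments):
    """Group segments into (conversation_segments, pause_segments) pairs."""
    segments = sorted(segments, key=lambda s: s.start)
    if not segments:
        return []

    # 1. Merge near-speech segments separated by <= PAUSE_GAP_S into spans.
    spans = []
    for seg in segments:
        if seg.label not in NEAR_SPEECH:
            continue
        if spans and seg.start - spans[-1][1] <= PAUSE_GAP_S:
            spans[-1][1] = max(spans[-1][1], seg.end)
        else:
            spans.append([seg.start, seg.end])

    # 2. Each conversation span is followed by a pause lasting until the
    #    next conversation span (or the end of the recording). Segments are
    #    assigned to the block in which they start.
    recording_end = max(s.end for s in segments)
    pairs = []
    for i, (conv_start, conv_end) in enumerate(spans):
        pause_end = spans[i + 1][0] if i + 1 < len(spans) else recording_end
        conversation = [s for s in segments if conv_start <= s.start < conv_end]
        pause = [s for s in segments if conv_end <= s.start < pause_end]
        pairs.append((conversation, pause))
    return pairs
```

In a toy recording where two near-speech segments are 2 s apart, both fall into one conversation block (together with the noise segment between them), and a following 12-s silence opens the pause block that closes the pair.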
We used sound levels from noise, tv/electronics, and silence labels to calculate a time-weighted average noise level within each conversation-pause block pair. Then, using our bespoke algorithm, these levels were averaged in a time-weighted fashion to calculate the average noise level across conversation-pause block pairs for a given recording day. To calculate the average SNR for a recording day, we first calculated speech levels as a time-weighted average of near and far speech segments within the conversation block of each conversation-pause block pair. Next, we subtracted the pair's average noise level from its average speech level, resulting in the average SNR for that conversation-pause block pair. We then used a time-weighted average across conversation-pause block pairs to calculate an average SNR for a given recording day. As a final step, the noise levels and SNRs were averaged across the two recording days to create two-day averages. For detailed information about the algorithm, we refer the reader to Benítez-Barrera et al. 19 Using this methodology, we estimated the noise levels and SNRs experienced by Latinx and European-American participants across the two recording days. For the planned comparisons, we compared noise levels and SNRs between the two groups using Welch two-sample t-tests, which are appropriate when groups have unequal variances. Of note, we did not correct for multiple comparisons because only two planned comparisons were established. 20,21 In addition to the planned comparisons, we undertook exploratory analyses using additional Welch two-sample t-tests to compare the proportion of time Latinx and European-Americans spent in environments categorized by LENA as silence, tv/electronics, and overlap. We used the Benjamini-Hochberg procedure to control for multiple comparisons in these exploratory analyses.
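The aggregation steps described in the preceding paragraph (pair noise level, pair SNR, day averages, two-day average) can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's algorithm: each segment is a hypothetical `(label, duration_s, level_dbc)` tuple with invented label names; levels are combined with a plain arithmetic time-weighted mean of dBC values (an energy-based, Leq-style average would be an equally plausible reading of "time-weighted average"); and, when forming day averages, pair noise levels are weighted by their noise duration and pair SNRs by their speech duration. The exact weighting is specified in Benítez-Barrera et al. 19

```python
from statistics import mean

# Hypothetical labels standing in for LENA's noise / tv-electronics / silence
# categories and its near / far speech categories.
NOISE_LABELS = {"noise_near", "noise_far", "tv_near", "tv_far", "silence"}
SPEECH_LABELS = {"speech_near", "speech_far"}


def weighted_mean(items):
    """Arithmetic time-weighted mean of (value, weight) items; None if empty."""
    total = sum(w for _, w in items)
    if total == 0:
        return None
    return sum(v * w for v, w in items) / total


def pair_levels(conversation, pause):
    """Noise level, noise duration, SNR, and speech duration for one pair.

    Segments are (label, duration_s, level_dbc) tuples. Noise uses both
    blocks of the pair; speech uses only the conversation block.
    """
    noise = [(lvl, dur) for lab, dur, lvl in conversation + pause if lab in NOISE_LABELS]
    speech = [(lvl, dur) for lab, dur, lvl in conversation if lab in SPEECH_LABELS]
    noise_level = weighted_mean(noise)
    speech_level = weighted_mean(speech)
    snr = None
    if noise_level is not None and speech_level is not None:
        snr = speech_level - noise_level  # SNR in dB for this pair
    return noise_level, sum(d for _, d in noise), snr, sum(d for _, d in speech)


def day_summary(pairs):
    """Time-weighted noise level (dBC) and SNR (dB) for one recording day."""
    per_pair = [pair_levels(conv, pause) for conv, pause in pairs]
    noise_day = weighted_mean([(n, w) for n, w, _, _ in per_pair if n is not None])
    snr_day = weighted_mean([(s, w) for _, _, s, w in per_pair if s is not None])
    return noise_day, snr_day


def two_day_average(day1, day2):
    """Unweighted mean of the two days' (noise, SNR) summaries."""
    return tuple(mean(values) for values in zip(day1, day2))
```

Keeping the per-pair step separate from the day-level step mirrors the two-stage description in the text and makes it easy to swap in a different weighting or an energy-based average if that turns out to match the original method more closely.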
All t-test results reported herein are two-tailed, and the significance threshold was set at p < 0.05. Normality was confirmed for all variables; thus, parametric t-tests were appropriate for our analyses. All statistical tests were computed using the R software environment. 22
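For readers unfamiliar with the Benjamini-Hochberg procedure used for the exploratory comparisons, the following standard-library sketch computes BH-adjusted p-values with the usual step-up rule: sort the p-values, scale each by m/rank, and enforce monotonicity from the largest down. It is illustrative only; in practice one would call R's `p.adjust(p, method = "BH")` or, in SciPy 1.11 and later, `scipy.stats.false_discovery_control`. The p-values in the usage line are hypothetical, not results from this study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns (adjusted_p, reject) in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and keeping
    # each adjusted value no larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


# Hypothetical p-values for three exploratory comparisons.
adjusted, reject = benjamini_hochberg([0.01, 0.04, 0.03])
print(adjusted, reject)
```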

Results
On average, participants contributed a total of 23.3 recording hours (SD = 6.1) across both days. As revealed by a Welch two-sample t-test, the groups did not differ in the number of recording hours (t[71.9] = −0.2, p = 0.9, 95% confidence interval (CI) [−10838.2, 9074.9]). In addition, a paired-sample t-test revealed that, on average and regardless of group, participants were recorded for the same number of hours on each recording day (day 1, M = 11.9, SD = 2.7; day 2, M = 11.9, SD …
Our exploratory analyses revealed that Latinx and European-American participants spent 25% (SD = 11.5) and 37% (SD = 12.2) of their recorded time, respectively, in silence segments. This 12% mean difference between the groups was significant (t[66.7] = −4.18, p = 2.6 × 10⁻⁴, 95% CI [−17.2, −6.1]). On the other hand, no significant differences were observed between the groups in the amount of exposure to overlapping sounds (Latinx, M = 16.3%, SD = 9…
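Because Welch's test needs only group means, SDs, and sizes, the silence comparison above can be approximately checked from the reported summary statistics. The sketch below computes Welch's t and the Welch-Satterthwaite degrees of freedom with the standard library; a p-value additionally requires the t distribution (e.g., `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` in SciPy, or R's default `t.test`). With the rounded values (25% vs 37%, n = 31 and 43), it gives t ≈ −4.32 on about 67 df, close to the reported t[66.7] = −4.18; some discrepancy is expected because the published means are rounded.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    v2 = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded silence summaries (percent of recorded time): Latinx vs European-American.
t, df = welch_from_summary(25, 11.5, 31, 37, 12.2, 43)
print(f"t({df:.2f}) = {t:.2f}")
```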

Discussion
Using personal sound recorders, we sampled the acoustic environments of college students from two cultural backgrounds attending the same university. In line with our previous findings that Latinx socialize more and engage in more group interactions than European-Americans, 10 we found that Latinx college students communicated in noisier acoustic environments than their European-American counterparts. In addition, they communicated at lower SNRs and spent less time in silence than their European-American peers. Collectively, our findings shed light on cultural differences in auditory ecology.
Despite the cultural difference in noise levels, the overall sound levels experienced by both groups did not exceed the recreational levels that, if sustained over 24 h, would likely lead to noise-induced hearing loss (75 dB and above). 23 Nevertheless, even noise below the levels that cause noise-induced hearing loss can hinder speech communication by limiting access to speech acoustics. 24 In fact, we found that, on average, Latinx and European-Americans were exposed to SNRs of +3 dB and +5 dB, respectively. Note that the ANSI standard establishes that the full dynamic range of speech is available at SNRs above +15 dB. 25 Taking this into account, our findings suggest that, regardless of cultural background, these college students engaged in communicative interactions in which access to speech information was limited. Also, the approximately 2-dB group difference in SNRs could indicate that Latinx have less access to speech acoustics in their daily lives than European-Americans. Based on a meta-analysis of 139 studies, speech intelligibility is expected to increase with increasing SNR by approximately 6% per dB, but the increase could be even greater depending on the specific listening environment (e.g., type of masker). 26 That said, the clinical implications of exposure to lower SNRs in the Latinx group than in the European-American group warrant more detailed investigation. Finally, while chronic exposure to low and moderate levels of noise has been associated with decreased academic performance and cardiovascular function, increased levels of stress, and modification of social behavior, 27-29 the well-known benefits of an active social life might outweigh the relatively small risk of noise-related health and communication complications in Latinx populations.
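To make the +15 dB criterion concrete, the sketch below converts the average SNRs into speech-to-noise power ratios and applies the band-audibility function of the Speech Intelligibility Index (ANSI S3.5-1997), (SNR + 15)/30 clipped to [0, 1], which reaches full audibility at +15 dB. Two assumptions are worth stating loudly: that this function corresponds to the ANSI criterion cited above, and that it can be applied to a single broadband SNR. The standard actually works per frequency band with importance weights and further corrections, so the output is illustrative only. Under these assumptions, +3 dB and +5 dB correspond to speech power roughly 2.0 and 3.2 times the noise power and to audibility fractions of 0.60 and about 0.67.

```python
def power_ratio(snr_db):
    """Speech-to-noise power ratio corresponding to an SNR in dB."""
    return 10 ** (snr_db / 10)


def broadband_audibility(snr_db):
    """Fraction of an assumed 30-dB speech dynamic range lying above the noise.

    Simplified single-band use of the SII band-audibility function
    (SNR + 15) / 30, clipped to [0, 1]; it reaches 1 at +15 dB.
    """
    return min(1.0, max(0.0, (snr_db + 15) / 30))


for group, snr in [("Latinx", 3), ("European-American", 5)]:
    print(group, round(power_ratio(snr), 2), round(broadband_audibility(snr), 2))
```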
An important feature of our study design is that all participants were college students at the same university and, hence, likely lived in similar environments (e.g., campus dormitories) regardless of ethnicity. This minimizes the likelihood that group differences reflect broad geographic differences in noise exposure. Instead, the higher noise levels and poorer SNRs measured in the Latinx group more likely arise from the social interactions and other activities that distinguish the two groups. 10 This interpretation is supported by the fact that Latinx were exposed to less silence than their European-American peers, with a trend (not statistically significant) toward a greater incidence of overlapping speech sounds. It is also possible that Latinx participants were raised in noisier communication environments than European-Americans and, as a result, are more comfortable in, and may even seek out, noisy social environments. This possibility is consistent with previous studies suggesting that cultural dynamics experienced during childhood are reproduced in adulthood. 30 That said, future studies should confirm this cultural dynamic.
Three methodological limitations should be pointed out. First, for practical reasons, we obtained only two days of audio data for each participant. While it could be argued that two days may not fully represent a person's everyday acoustic environment, two-day data collection is common practice in the field and has been shown to provide robust and reliable data about an individual's communication environment. 16,31 Second, it is challenging to completely separate speech from noise, and "wanted" from "unwanted" speech, in complex everyday listening environments. Speech segments identified by LENA's automatic algorithm necessarily contain noise, which contributes to the overall level of the "speech" segment. Therefore, our speech levels and SNRs are likely inflated by this noise, leaving possible residual error in our speech-alone levels and SNRs. That said, as indicated in Benítez-Barrera et al., 19 the contribution of noise to the overall speech level is likely minimal (<3 dB). Also, for near speech, the algorithm cannot determine whether that speech is directed to and/or of interest to (i.e., wanted by) the listener. When multiple talkers speak simultaneously, the algorithm labels the segment as overlap, which is discarded when calculating speech levels, noise levels, and SNRs; near speech from a single talker, however, is counted as speech whether or not it is wanted. Unwanted near speech may therefore be more common for people who live in smaller dwellings than for those in larger ones. Because cohabitation in dormitories and small apartments is common among college students, this may have influenced the amount of unwanted speech in private settings for both groups. Finally, this study is limited by the fact that the LENA system calculates noise levels in dBC. This scale matches the sensitivity of the human ear at high sound levels and may, therefore, not be appropriate for the more moderate levels that predominated in the current study.
As a result, a scale such as dBA, which better approximates the ear's sensitivity at low-to-moderate sound levels, would have been preferable for estimating sound levels. However, measurement errors in our calculations would affect both groups similarly; thus, differences between the groups are likely to reflect true differences. In addition to the methodological limitations of the acoustic measurement, our sample was not large enough to isolate the effects of SES and bilingualism or to study their interactions. For example, whether the amount of wanted or unwanted speech in the listener's environment varies as a function of culture, SES, and/or degree of bilingualism is beyond the scope of this study.
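The <3 dB bound on noise contamination of speech segments, mentioned among the limitations, can be motivated with a short calculation, assuming that speech and noise powers add incoherently. A segment with speech at level L_s and noise at level L_n measures 10·log10(10^(L_s/10) + 10^(L_n/10)), which exceeds L_s by 10·log10(1 + 10^(−SNR/10)). That excess is at most about 3 dB whenever the true SNR is 0 dB or higher, and it shrinks as the SNR grows. The sketch below is illustrative and is not the procedure used by Benítez-Barrera et al.; the hypothetical helper `speech_alone_level` shows how one could energetically subtract a known noise level from a measured speech-plus-noise level.

```python
import math


def inflation_db(true_snr_db):
    """How far (dB) noise raises a 'speech' segment above the speech-alone
    level, assuming speech and noise powers add incoherently."""
    return 10 * math.log10(1 + 10 ** (-true_snr_db / 10))


def speech_alone_level(measured_db, noise_db):
    """Energetically subtract a noise level from a measured speech+noise level."""
    excess = 10 ** (measured_db / 10) - 10 ** (noise_db / 10)
    if excess <= 0:
        raise ValueError("measured level must exceed the noise level")
    return 10 * math.log10(excess)


print(round(inflation_db(0), 2), round(inflation_db(5), 2))  # 3.01 1.19
```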
In summary, we used wearable recording devices to obtain individual-level measurements of acoustic environments, including noise levels and SNRs in day-to-day lives. Using this novel acoustic approach, this study provides the first evidence of potential cultural differences in auditory ecology. Whether these findings hold with more rigorous controls, larger datasets, or other cultural or geographic regions remains unknown. Nevertheless, this work provides a framework for using wearable technology to link ecoacoustic and sociocultural variables to health outcomes in other cultural groups and larger and more geographically diverse study samples.