Listeners recognizing environmental sounds must contend with variations in level arising from both the source and the environment. Nonetheless, variations in level disrupt short-term sound recognition [Susini, Houix, Seropian, and Lemaitre (2019). J. Acoust. Soc. Am. 146(2), EL172–EL176], suggesting that loudness is encoded. We asked whether the common experimental practice of setting sounds to equal levels disrupts long-term recognition, especially if it creates a mismatch with ecological loudness. Environmental sounds were played at equalized or ecological levels. Although recognition improved with increased loudness and familiarity, this relationship was unaffected by equalization or by real-life experience with the source. However, sound pleasantness was altered by deviations from the ecological level.

Memory representations of a sound event may generalize across some perceptual features but preserve others. Sound can convey both the physical form of the sound source and the manner in which the sound is generated (McAdams, 1993). Listeners typically acquire this information through everyday listening, attending to the typology (e.g., solid vs liquid), the action (e.g., scraping), and the source characteristics (e.g., material, size, shape) of an event in the environment (Gaver, 1993). Some sound event properties are structurally invariant: they are conveyed by the acoustic waveform despite factors that alter its structure, such as source distance. Invariant properties also generalize across different events and objects (e.g., a plucked-string event is associated with both a guitar and a violin). Source intensity is one example of a structurally invariant property of a sound event: although the physical level of the received sound decreases with greater distance, the perceived level of the source event remains constant (Shigenaga, 1965; Zahorik and Wightman, 2001). McAdams (1993) suggested that such auditory properties are mapped onto memory representations through one of two “matching” processes: (1) comparison or (2) direct activation of a memory structure. This matching is thought to underpin listeners' ability to recognize and identify sound events in their environment. However, it is not fully understood how the ecological loudness of a sound event is maintained within these representations and used for source recognition (Susini et al., 2019; Traer et al., 2021).

Loudness provides the listener not only with spatial information (e.g., distance) about sound events but also with information about the force of an action imposed on an object (Kinoshita et al., 2007) and the size of the source (Grassi et al., 2013). However, loudness is a cue of limited reliability: variations in perceived loudness may result from variations in the level of the sound source, but also from attenuation over distance, sound absorption, and other factors (Coleman, 1963; Zahorik et al., 2005).

The evidence for loudness in sound representations appears to depend on both the timescale over which the listener retrieves the information and how the listener encodes the sound. Susini et al. (2019) demonstrated an effect of a change in sound level between initial memory encoding and retrieval. In their study phase, listeners were assigned to one of three encoding tasks: sorting the sounds by typology (e.g., liquid, solid), rating the loudness of the sounds, or passively listening to the sounds with no assigned task. In their test phase, listeners reported whether each sound had been present or absent during the study phase. Sound levels in the study and test phases were set to one of two values, depending on whether the event was associated with being “quiet” (55 dB SPL) or “loud” (70 dB SPL). The type of encoding task did not influence source recognition; however, a change in level from study to test significantly impaired recognition of the sound. More recently, findings by Traer et al. (2021) suggest that source recognition depends on sound level. In a series of studies, listeners identified the source of sounds presented at levels from 30 to 90 dB SPL. For high source intensity sounds, identification accuracy increased with presentation level, reaching a peak at 70 dB SPL that was maintained up to 90 dB SPL. For low source intensity sounds, identification accuracy peaked at 60 dB SPL and decreased at higher presentation levels. We hypothesized that this decrease could be due to the higher presentation levels violating listeners' memory of those low source intensity sounds.

In practice, psychoacoustic researchers often deal with level confounds by setting all sounds to the same level. We reviewed studies published within the last 20 years in the Journal of the Acoustical Society of America and found that 90% of the studies on environmental sound identification and/or discrimination (excluding speech, music, and noise) specified an equalization procedure. Most studies (60%) specified normalization to a common level in dBA (range 61 to 87 dBA) or dB SPL (range 66 to 76 dB SPL, median 70 dB SPL). The remainder presented equalized sounds at a comfortable listening level (see supplementary material).

If sound recognition accuracy varies with loudness, equalizing sound levels may substantially disrupt sound recognition and discrimination. Listeners may incorporate the ecological loudness of a sound into its memory representation. If so, a typically quiet sound, such as a person whispering, may be poorly recognized in a laboratory study if normalization increases its loudness. To our knowledge, Lemaitre et al. (2010) conducted the only recognition study in which environmental sounds were adjusted to ecological levels in an attempt to optimize performance. However, that study did not show that recognition at the ecological level was better than at a non-ecological level. Alternatively, cognitive inferences may allow listeners to reconcile differences between what is typically experienced and what is presented in a laboratory setting. In that case, sound recognition could be studied in the laboratory without regard to the loudness of the presented sounds; whispers and screams could be played at the same overall level. A further consideration is that level equalization does not produce loudness equalization, because loudness also depends on frequency content and temporal properties. Level equalization may therefore create more loudness variation for environmental sounds than for more homogeneous sound classes (e.g., speech), given the greater spectral and temporal diversity of environmental sounds.

In the present study, we assessed whether equalizing environmental sound levels presents a methodological concern. If memory of a sound preserves its typical loudness, playing sounds at their ecological levels should confer some advantage in recognition and familiarity over playing them at equalized levels. We asked whether any negative effects of equalization interact with the known benefits of increased level and familiarity for recognition [e.g., Shafiro (2008) and Traer et al. (2021)]. Finally, because previous studies have demonstrated effects of loudness on sound pleasantness [e.g., Nilsson et al. (2007) and Rådsten Ekman et al. (2015)], we asked whether the effect of loudness on pleasantness depends on whether the sound is presented at its ecological level.

One environmental sound was drawn from each of the 50 event classes in the Environmental Sound Classification database, “ESC-50” (Piczak, 2015). Sound durations were approximately 5 s, and levels ranged from 64 to 103 dB SPL. To supplement the sample of unpleasant sounds, we recorded two additional events using a Zoom H4n Pro (24-bit/96-kHz sampling rate) in a sound-attenuating chamber with sound-absorbing foam on the walls and ceiling: (1) a person chewing gummy candy and (2) a person repeatedly sniffling. The final stimulus set comprised 52 sounds.

To create the ecological level condition, we used a three-part in-house normalization process adapted from Traer et al. (2021).1 Our five raters were instructed to set the level of each sound to reflect the loudness of the event in real life when encountered at its typical distance [ICC = 0.95, F(51, 204) = 89, p < 0.001]. The resulting ecological sound levels ranged from 40 to 89 dB SPL. In the equal level condition, all sounds were set to 70 dB SPL (the median found in our literature search). Levels were calibrated using a G.R.A.S. 42AP Intelligent Pistonphone (Class O) and flat-plate coupler.
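The level arithmetic behind the two conditions can be made concrete. The Python sketch below is a hypothetical illustration, not the authors' code: it assumes each rater's setting is stored in dB SPL per sound, takes the median across raters as the ecological level (as in the footnoted procedure), and computes the gain that equalization to 70 dB SPL applies. The helper names and rater values are invented, and the loudness ratio uses the textbook rule of thumb that a 10 dB increase roughly doubles loudness in sones; it is not necessarily the loudness model behind the sone values reported in this study, and it ignores the spectral and temporal factors that make level equalization differ from loudness equalization.

```python
"""Illustrative sketch: ecological levels from rater settings and the
gain applied by level equalization. All names and data are hypothetical."""
from statistics import median

EQUAL_LEVEL_DB = 70.0  # equal level condition target (dB SPL)


def ecological_level(rater_settings_db: list[float]) -> float:
    """Median of the raters' level settings for one sound (dB SPL)."""
    return float(median(rater_settings_db))


def equalization_gain_db(eco_level_db: float,
                         target_db: float = EQUAL_LEVEL_DB) -> float:
    """Gain (dB) that moves a sound from its ecological level to the target.
    Positive values mean equalization makes the sound louder."""
    return target_db - eco_level_db


def approx_loudness_ratio(gain_db: float) -> float:
    """Rule-of-thumb loudness ratio (sones after / sones before), assuming
    loudness doubles for every 10 dB increase. Ignores spectral and temporal
    effects, so it is only a rough approximation."""
    return 2.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical settings from five raters for a quiet sound (dB SPL).
    whisper = [38.0, 42.0, 40.0, 45.0, 39.0]
    eco = ecological_level(whisper)           # median -> 40.0
    gain = equalization_gain_db(eco)          # 70 - 40 -> +30.0 dB
    print(eco, gain, approx_loudness_ratio(gain))  # roughly 8x louder
```

Under this rule of thumb, equalizing a sound whose ecological level is 40 dB SPL (the lowest in the set) to 70 dB SPL makes it roughly eight times louder, while a sound at 89 dB SPL (the highest) is attenuated by 19 dB, to roughly a quarter of its ecological loudness.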

Fifty-three individuals (mean age = 23.38 years; range = 18 to 68 years; 34 female, 17 male, and 2 other) from Carnegie Mellon University and the surrounding area were recruited via flyers and a university recruitment system. All participants reported normal hearing.

The experiment was conducted in person. All participants signed a consent form approved by Carnegie Mellon University's Institutional Review Board. Participants were randomly assigned to one of two level conditions: (1) the ecological level condition or (2) the equal level condition. Sounds were presented in random order through Sennheiser HD600 headphones. Seated in front of a computer, participants listened to each entire sound before answering questions. Four questions appeared in the following order: (1) Assign the correct label to the sound (source recognition), (2) How pleasant is the sound to you? (sound pleasantness), (3) How familiar is the sound to you? (sound familiarity), and (4) Have you heard the sound in your actual life? (real-life encounter). Source recognition was a closed-set task with 52 labels, from which participants could select only one. The labels were created in-house by a group of research assistants; each label described the sound with a noun and a verb (e.g., frog croaking) (Ballas, 1993). Pleasantness was rated on an analog scale from −5 (very unpleasant) to +5 (very pleasant), and familiarity on an analog scale from 0 (unfamiliar) to 4 (familiar). Finally, participants indicated whether they had experienced the sound in real life. After two practice trials, there were 52 experimental trials plus one catch trial to ensure high data quality. The audio in the catch trial instructed the listener to give specific answers to the questions on that trial only. At the end of the study, participants were prompted to recall those instructions. Nine participants did not follow the catch-trial instructions, and their data were excluded. The final sample comprised 44 listeners (22 in each level condition).

Although sounds presented at 70 dB SPL were recognized with greater mean accuracy (M = 90.6%, 95% CI = [86.7, 94.4]) than sounds presented at their ecological level (M = 87.2%, 95% CI = [83.8, 90.5]), this difference was not reliable [t(102) = –1.33, p = 0.185]. Figure 1S in the supplementary material depicts the mean recognition accuracy for each sound as a function of loudness (sones) in both level conditions. Loudness did not account for recognition accuracy (ecological: R² = 0.038, F(51) = 1.973, p = 0.166; equal: R² = 0.055, F(51) = 2.928, p = 0.093). To ensure that the average was not obscuring effects for individual sounds, we analyzed the difference in recognition accuracy for each sound as a function of the difference in loudness (in sones) between conditions. As Fig. 1 illustrates, the change in recognition accuracy was not significantly related to the change in loudness [R² = 0.02, F(51) = 0.863, p = 0.357]. Similarly nonsignificant relationships were obtained when the change in recognition accuracy was expressed as a function of the loudness ratio (loudness at equal level/loudness at ecological level) [R² = 0.017, F(51) = 0.847, p = 0.362] and as a function of the change in dB SPL [R² = 0.023, F(51) = 1.173, p = 0.284]. Altogether, loudness variation within the range tested did not affect environmental sound recognition accuracy.
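The item-level analyses in this paragraph can be sketched in a few lines of Python. The paper does not publish its analysis code, so this is an assumption-laden illustration: the 102 degrees of freedom are consistent with a pooled-variance two-sample t test over the 52 per-sound means in each condition (52 + 52 − 2), and the Fig. 1 analysis is treated as an ordinary least-squares regression of per-sound change in accuracy (equal minus ecological) on per-sound change in loudness. All function names and input values are hypothetical; the sketch reports F with 1 and n − 2 degrees of freedom and omits p-values to stay within the standard library.

```python
"""Illustrative item-level statistics (hypothetical helpers, stdlib only)."""
from math import sqrt
from statistics import fmean, variance


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def per_sound_change(equal: dict[str, float],
                     ecological: dict[str, float]) -> dict[str, float]:
    """Equal-level value minus ecological-level value, sound by sound."""
    return {name: equal[name] - ecological[name] for name in equal}


def simple_regression(x: list[float], y: list[float]) -> dict[str, float]:
    """Ordinary least-squares fit y = a + b*x, with R^2 and F(1, n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_model = ss_total - ss_resid
    f_stat = float("inf") if ss_resid == 0 else ss_model / (ss_resid / (n - 2))
    return {"slope": slope, "intercept": intercept,
            "r2": ss_model / ss_total, "F": f_stat, "df_resid": n - 2}


if __name__ == "__main__":
    # Hypothetical per-sound values for three sounds (not study data).
    acc_equal = {"whisper": 80.0, "dog": 95.0, "siren": 90.0}
    acc_eco = {"whisper": 85.0, "dog": 90.0, "siren": 95.0}
    sones_equal = {"whisper": 12.0, "dog": 15.0, "siren": 14.0}
    sones_eco = {"whisper": 2.0, "dog": 14.0, "siren": 40.0}

    d_acc = per_sound_change(acc_equal, acc_eco)
    d_loud = per_sound_change(sones_equal, sones_eco)
    names = sorted(d_acc)
    print(simple_regression([d_loud[k] for k in names],
                            [d_acc[k] for k in names]))
    # Ecological listed first, so higher equal-level accuracy gives a negative t.
    print(pooled_t(list(acc_eco.values()), list(acc_equal.values())))
```

The same regression helper applies unchanged to the loudness-ratio and dB-change versions of the analysis; only the x values differ.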

Fig. 1.

The graph depicts the average change in recognition accuracy (y axis) as a function of the relative difference in perceptual loudness (Sones) (x axis) between the equal and ecological level conditions. The average change is calculated by subtracting the average value measured in the ecological level condition from the average value measured in the equal level condition. Each data point represents the change for a single environmental sound. Error bars indicate standard error of the mean.

Sounds presented at 70 dB SPL were rated as marginally more familiar (M = 3.60, 95% CI = [3.50, 3.69]) than sounds presented at their ecological level (M = 3.47, 95% CI = [3.35, 3.59]) [t(102) = –1.66, p = 0.098]. Thus, ecological level, within the range tested, did not reliably influence perceived source familiarity. Although listeners were familiar with most sounds in our experiment, some sounds were unfamiliar. Table 1 lists the sounds whose average familiarity rating was more than one standard deviation below the mean. The least familiar sounds largely overlapped across the level conditions; the exceptions were two sounds that were low in familiarity only in the ecological level condition (hen calling and fire crackling). In general, listeners were least familiar with animal and mechanical sounds. Additionally, four of the low-familiarity sounds in each condition (crickets calling, helicopter flying, pig squealing, and washing machine spinning) were also poorly recognized (average accuracy of 75% or below); because these exemplars were the same in both level conditions, they may be poor examples of their sound events.
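The selection rules behind Table 1 can be written out explicitly. The Python sketch below is a hypothetical illustration, not the authors' code; it assumes the standard deviation is the sample SD of per-sound mean ratings within one level condition, and it applies the 75% accuracy cutoff from the table footnote. The accuracy values in the usage example are the ecological level condition entries from Table 1; because the familiarity cutoff depends on all 52 sounds, the familiarity example uses made-up ratings.

```python
"""Flag low-familiarity and poorly recognized sounds (illustrative sketch)."""
from statistics import fmean, stdev


def low_familiarity(ratings: dict[str, float], n_sd: float = 1.0) -> list[str]:
    """Sounds whose mean rating is more than n_sd sample SDs below the
    mean rating across all sounds in the same condition."""
    values = list(ratings.values())
    cutoff = fmean(values) - n_sd * stdev(values)
    return sorted(name for name, r in ratings.items() if r < cutoff)


def poorly_recognized(accuracy: dict[str, float],
                      threshold: float = 75.0) -> list[str]:
    """Sounds with average recognition accuracy of threshold % or below."""
    return sorted(name for name, a in accuracy.items() if a <= threshold)


if __name__ == "__main__":
    # Made-up familiarity ratings (0 to 4 scale) for four sounds.
    print(low_familiarity({"a": 4.0, "b": 4.0, "c": 4.0, "d": 2.0}))  # ['d']

    # Ecological level condition accuracies (%) from Table 1.
    eco_accuracy = {
        "Fire crackling": 90.91, "Crickets calling": 72.73,
        "Frog croaking": 86.36, "Handsaw cutting": 90.91,
        "Helicopter flying": 63.64, "Hen calling": 77.27,
        "Pig squealing": 68.18, "Washing machine spinning": 63.64,
    }
    print(poorly_recognized(eco_accuracy))
```

Applied to the Table 1 accuracies, the second function returns the four sounds marked in the table as poorly recognized.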

Table 1.

Sounds that received low familiarity ratings, defined as an average familiarity rating more than one SD below the mean. The upper and lower halves of the table show sounds in the ecological and equal level conditions, respectively. The second column gives the average familiarity rating, and the third column gives the average recognition accuracy for each sound. The fourth and fifth columns give the percentage of participants who gave a low familiarity rating, separated by whether they recognized the sound incorrectly or correctly, respectively.

Sound name                    Familiarity  Recognition   Low familiarity and       Low familiarity and
                              (0 to 4)     accuracy (%)  incorrect recognition     correct recognition
                                                         (% of listeners)          (% of listeners)
Ecological level condition
  Fire crackling              3.00         90.91         0.00                      31.82
  Crickets callingᵃ           2.64         72.73         4.55                      13.64
  Frog croaking               2.68         86.36         0.00                      18.18
  Handsaw cutting             2.82         90.91         0.00                      22.73
  Helicopter flyingᵃ          2.41         63.64         4.55                      13.64
  Hen calling                 3.00         77.27         0.00                      18.18
  Pig squealingᵃ              1.95         68.18         4.55                      4.55
  Washing machine spinningᵃ   3.00         63.64         9.09                      0.00
Equal level condition
  Crickets callingᵃ           3.00         68.18         18.18                     4.55
  Frog croaking               3.05         90.91         4.55                      0.00
  Handsaw cutting             2.59         90.91         9.09                      18.18
  Helicopter flyingᵃ          2.77         63.64         0.00                      9.09
  Pig squealingᵃ              2.59         63.64         9.09                      9.09
  Washing machine spinningᵃ   3.18         59.09         13.64                     9.09

ᵃ Sounds that received an average source recognition accuracy of 75% or below.

3.2.1 Relationship between source familiarity and source recognition

Given that sounds are frequently encountered at their ecological loudness, we predicted that sounds presented at their ecological level would receive higher familiarity ratings than sounds in the equal level condition. We therefore expected level condition to interact with familiarity: sounds presented at their ecological level would be more familiar to listeners, which in turn would yield higher recognition accuracy than for sounds presented at a level deviating from their ecological level. Figure 2 depicts the relationship between perceived familiarity and source recognition accuracy for the two level conditions. Both level conditions exhibit a moderate, positive relationship [ecological: R² = 0.30, F(51) = 21.88, p < 0.001; equal: R² = 0.33, F(51) = 24.86, p < 0.001]. Thus, on average, listeners most accurately identified the sounds with which they were most familiar. However, the effect of familiarity was not greater when the sounds were presented at their ecological level (i.e., the slopes and intercepts did not differ significantly).
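The text reports that the slopes did not differ between conditions but does not name the test. One common way to compare two independent regression slopes is a t statistic on their difference, t = (b1 − b2)/√(SE1² + SE2²), with n1 + n2 − 4 degrees of freedom; the Python sketch below implements that approach as an assumption, not as the authors' method (a single regression with a condition-by-familiarity interaction term is an equivalent alternative). Inputs would be per-sound mean familiarity (x) and recognition accuracy (y) for each condition; the example values are invented.

```python
"""Compare two independent OLS slopes (illustrative sketch)."""
from math import sqrt
from statistics import fmean


def slope_and_se(x: list[float], y: list[float]) -> tuple[float, float]:
    """OLS slope of y on x and its standard error, sqrt(s^2 / Sxx),
    where s^2 is the residual variance with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(ss_resid / (n - 2) / sxx)


def compare_slopes(x1: list[float], y1: list[float],
                   x2: list[float], y2: list[float]) -> tuple[float, int]:
    """t statistic (and df) for the difference between two independent slopes."""
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    t = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    return t, len(x1) + len(x2) - 4


if __name__ == "__main__":
    # Hypothetical familiarity (x) and accuracy in % (y) for four sounds.
    fam = [2.0, 3.0, 3.5, 4.0]
    acc_eco = [60.0, 75.0, 80.0, 95.0]
    acc_equal = [65.0, 70.0, 85.0, 90.0]
    print(compare_slopes(fam, acc_eco, fam, acc_equal))
```

A small |t| relative to the t distribution with the returned df would correspond to the "slopes not significantly different" outcome reported here.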

Fig. 2.

The graph depicts the average recognition accuracy (y axis) as a function of sound familiarity (x axis) for the equal level condition (70 dB SPL, open circles) and the ecological level condition (gray diamonds). Averages are calculated across participants in each condition. Each data point represents a single environmental sound. Asterisks next to a line of best fit denote a significant R² (p < 0.05). Error bars reflect the standard error of the mean.

3.2.2 Source familiarity based on real-life experiences

In addition to perceived source familiarity, we examined whether real-life encounters with the event are necessary for ecological level to matter. We assumed that a sound encountered in real life would be heard and remembered at its ecological loudness, whereas a sound heard via media but never in real life might not form a strong representation of loudness. As with familiarity, real-life encounters showed a moderate relationship with recognition accuracy in both level conditions [ecological: R² = 0.13, F(51) = 7.67, p = 0.007; equal: R² = 0.14, F(51) = 8.47, p = 0.005]. However, the effect of real-life encounters on source recognition accuracy was not greater when the sounds were presented at their ecological level (i.e., the slopes and intercepts did not differ significantly).

Most listeners reported previous real-life encounters with nearly all the environmental sounds in the study, regardless of whether the sounds were presented at 70 dB SPL (M = 1.07, 95% CI = [1.04, 1.10]) or at their ecological level (M = 1.08, 95% CI = [1.04, 1.11]). Nevertheless, a quarter or more of our listeners had never encountered some of the environmental sounds in their lives. Consistent with the familiarity ratings, animal sounds (ecological: up to 59% of listeners; equal: up to 50% of listeners) and mechanical sounds (ecological: up to 36% of listeners; equal: up to 41% of listeners) had not been encountered in real life by a sizeable proportion of our listeners.

Finally, we examined whether the perceived pleasantness of a sound changes when the sound is presented at a higher or lower level than its ecological level. Figure 3(A) depicts the relationship between sound pleasantness and loudness for each level condition. In the equal level condition, loudness varied over a range of approximately 10 sones; however, this variation did not account for the large variations in sound pleasantness [R² = 0.001, F(51) = 0.054, p = 0.82]. The restricted range of loudness limited the power of this regression. In the ecological level condition, sound pleasantness decreased as loudness increased [R² = 0.19, F(51) = 11.42, p = 0.001]. Within the range of loudness shared by both conditions, the unexplained variability in pleasantness appears similar. These variations in pleasantness could be due to acoustic or semantic attributes of the sounds.

Fig. 3.

(A) The graph depicts the average sound pleasantness (y axis) as a function of loudness in sones (x axis) for each level condition. Environmental sounds presented at 70 dB SPL are represented by open circles, and sounds presented at their ecological level by gray diamonds. Asterisks next to a line of best fit denote a significant R² (p < 0.05). Error bars reflect the standard error of the mean. (B) The graph depicts the average change in sound pleasantness (y axis) as a function of the relative difference in loudness in sones (x axis) between the equal and ecological level conditions. The average change is calculated by subtracting the average value measured in the ecological level condition from the average value measured in the equal level condition. Each data point represents the change for a single environmental sound, with different symbols indicating low (light circles), medium (triangles), and high (dark squares) ecological levels. Asterisks next to a line of best fit denote a significant R² (p < 0.05). Error bars indicate the pooled standard error of the mean.

To remove the variability in pleasantness attributable to acoustics and semantics, Fig. 3(B) depicts the change in average pleasantness of each sound as a function of its change in loudness between conditions. A decrease in pleasantness was strongly associated with an increase in loudness [R² = 0.57, F(51) = 66.71, p < 0.001]. The symbol for each sound indicates its ecological level: low (light circles), medium (triangles), or high (dark squares). Low and high levels were defined as 10 dB below and above, respectively, the median ecological level of all the sounds. When equalization increases the loudness of an ecologically low-level sound, the sound is perceived as more unpleasant; when equalization decreases the loudness of an ecologically high-level sound, the sound is perceived as more pleasant.
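The grouping used for the symbols in Fig. 3(B) can be expressed as a small function. In this hypothetical Python sketch, a sound counts as low (or high) when its ecological level lies at least 10 dB below (or above) the median ecological level; whether the boundary is inclusive is an assumption, since the text does not say. A second helper reports whether equalizing to 70 dB SPL raises or lowers a sound's level, which the results associate with lower or higher pleasantness, respectively. The example levels are invented.

```python
"""Group sounds by ecological level relative to the median (sketch)."""
from statistics import median

EQUAL_LEVEL_DB = 70.0  # equal level condition target (dB SPL)


def level_category(eco_levels: dict[str, float],
                   margin_db: float = 10.0) -> dict[str, str]:
    """Label each sound 'low', 'medium', or 'high' relative to the median
    ecological level; levels exactly margin_db from the median count as
    low/high (an assumption)."""
    mid = median(list(eco_levels.values()))
    labels = {}
    for name, level in eco_levels.items():
        if level <= mid - margin_db:
            labels[name] = "low"
        elif level >= mid + margin_db:
            labels[name] = "high"
        else:
            labels[name] = "medium"
    return labels


def equalization_direction(eco_level_db: float) -> str:
    """Whether equalizing to 70 dB SPL raises or lowers the level."""
    gain = EQUAL_LEVEL_DB - eco_level_db
    if gain > 0:
        return "louder"
    if gain < 0:
        return "quieter"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical ecological levels (dB SPL), not study data.
    levels = {"whisper": 40.0, "rain": 60.0, "dog": 62.0, "siren": 85.0}
    for name, label in level_category(levels).items():
        print(name, label, equalization_direction(levels[name]))
```

With a median of 61 dB SPL in this example, only the whisper falls in the low group and only the siren in the high group; equalization would make the whisper louder and the siren quieter.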

Our results indicate that environmental sound recognition was invariant to deviations from ecological loudness. The high agreement among the raters who set the ecological levels suggests that ecological loudness is represented in long-term memory, yet this memory did not reliably affect recognition under the equalization and task parameters we used. If memory for ecological loudness influenced recognition, equalization should have impaired recognition, but it did not. Different results could emerge with a free-response task or a reaction-time measure, both of which may be sensitive to subtler effects. However, our forced-choice task did not make guessing easy, because there were 52 options on each trial. Furthermore, familiarity did not differ reliably between the level conditions, nor did familiarity interact with loudness. This may have been because familiarity ratings for our sounds were near ceiling. Nonetheless, the range of familiarity was large enough to show, in agreement with the literature, that familiarity improves sound recognition. In addition, many listeners were able to recognize sounds (e.g., farm animals) that they had never heard in real life, possibly by drawing on exposure through media.

Although recognition of sounds was not better when presented at ecological level, playing a sound at a level that is higher or lower than a sound's ecological level modulated its pleasantness. A sound presented at a higher level was always more unpleasant regardless of its ecological level. Therefore, equalizing sounds decreases the pleasantness of typically quiet sounds and increases the pleasantness of typically loud sounds.

Our findings may appear inconsistent with those of Susini et al. (2019) and Traer et al. (2021), but the discrepancies have plausible explanations. Unlike Susini et al. (2019), our design did not ask listeners to recall whether sounds had been heard previously, nor to categorize the sounds' semantic and acoustic properties. Our listeners recognized sounds from a single presentation, a process that likely relies on long-term memory. The difference from Susini et al. (2019) could therefore arise because loudness is more relevant to a working-memory old/new task than to comparing a stimulus with long-term memory representations. Our study conceptually agrees with Traer et al. (2021), who found that higher presentation levels were beneficial for recognizing sounds of both low and high source intensities. However, Traer et al. (2021) tested source intensity, whereas our study tested ecological level, i.e., the level that produces the loudness heard in everyday life. Source level and ecological level can diverge. For example, the source level of an airplane is higher than its ecological level at a typical listening distance. In such a case, a long-term representation of ecological loudness would predict higher recognition accuracy for airplane sounds presented at the lower, ecological level, despite the airplane's high source level. However, neither study included enough such distinguishing cases to discriminate between the effects of source intensity and ecological level.

Based on our findings, we recommend that auditory studies of long-term environmental sound recognition normalize sounds to a single level in the vicinity of 70 dB SPL. Recognition studies would also be valid if sounds were normalized to equal loudness instead of equal level, and/or if ecological levels were used within audible and comfortable limits. Nevertheless, normalizing all sounds to a single value near 70 dB SPL ensures that (1) the sounds are comfortably audible to a normal-hearing listener and (2) the remaining loudness variations are small enough not to affect recognition accuracy. Because level normalization does affect sound pleasantness, however, we recommend caution in using normalization in studies of environmental sound pleasantness. Level equalization causes deviations from ecological loudness, so pleasantness judgments of normalized sounds will not accurately reflect the pleasantness of everyday sounds in real life.

See the supplementary material for Fig. 1S and for the articles and keywords used in our literature search on level equalization practices.

We thank Paige Brady for conducting the literature search and Sungjoon Park for feedback on the manuscript. Funding was provided by REAM.

There are no conflicts of interest to disclose.

This study was approved by Carnegie Mellon University's Institutional Review Board (IRB). Informed consent was obtained from all participants.

The data that support the findings of this study are openly available in KiltHub, dx.doi.org/10.1184/R1/c.6967506.

1

First, a research assistant adjusted the level of each sound to reflect the typical loudness of the sound event in everyday life at a typical distance. Next, four naïve research assistants each further adjusted the sound levels based on their own experience. Agreement among the five raters was high, with an intraclass correlation coefficient (ICC alpha) of 0.95 [F(51, 204) = 89, p < 0.001]. Finally, each sound was set to the median value across raters, which defined its ecological sound level.

1. Ballas, J. A. (1993). “Common factors in the identification of an assortment of brief everyday sounds,” J. Exp. Psychol.: Human Percept. Perform. 19(2), 250–267.
2. Coleman, P. D. (1963). “An analysis of cues to auditory depth perception in free space,” Psychol. Bull. 60(3), 302.
3. Gaver, W. W. (1993). “What in the world do we hear?: An ecological approach to auditory event perception,” Ecol. Psychol. 5(1), 1–29.
4. Grassi, M., Pastore, M., and Lemaitre, G. (2013). “Looking at the world with your ears: How do we get the size of an object from its sound?,” Acta Psychol. 143(1), 96–104.
5. Kinoshita, H., Furuya, S., Aoki, T., and Altenmüller, E. (2007). “Loudness control in pianists as exemplified in keystroke force measurements on different touches,” J. Acoust. Soc. Am. 121(5), 2959–2969.
6. Lemaitre, G., Houix, O., Misdariis, N., and Susini, P. (2010). “Listener expertise and sound identification influence the categorization of environmental sounds,” J. Exp. Psychol.: Appl. 16(1), 16–32.
7. McAdams, S. (1993). “Recognition of sound sources and events,” in Thinking in Sound: The Cognitive Psychology of Human Audition (Oxford Academic, Oxford), pp. 146–198.
8. Nilsson, M., Botteldooren, D., and De Coensel, B. (2007). “Acoustic indicators of soundscape quality and noise annoyance in outdoor urban areas,” in Proceedings of the 19th International Congress on Acoustics.
9. Piczak, K. J. (2015). “Environmental sound classification with convolutional neural networks,” in 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP) (IEEE, New York), pp. 1–6.
10. Rådsten Ekman, M., Lundén, P., and Nilsson, M. E. (2015). “Similarity and pleasantness assessments of water-fountain sounds recorded in urban public spaces,” J. Acoust. Soc. Am. 138(5), 3043–3052.
11. Shafiro, V. (2008). “Development of a large-item environmental sound test and the effects of short-term training with spectrally-degraded stimuli,” Ear Hear. 29(5), 775–790.
12. Shigenaga, S. (1965). “The constancy of loudness and of acoustic distance,” Bull. Faculty Lit. Kyushu Univ. 9, 289–333.
13. Susini, P., Houix, O., Seropian, L., and Lemaitre, G. (2019). “Is loudness part of a sound recognition process?,” J. Acoust. Soc. Am. 146(2), EL172–EL176.
14. Traer, J., Norman-Haignere, S. V., and McDermott, J. H. (2021). “Causal inference in environmental sound recognition,” Cognition 214, 104627.
15. Zahorik, P., Brungart, D. S., and Bronkhorst, A. W. (2005). “Auditory distance perception in humans: A summary of past and present research,” Acta Acust. united Acust. 91(3), 409–420.
16. Zahorik, P., and Wightman, F. L. (2001). “Loudness constancy with varying sound source distance,” Nat. Neurosci. 4(1), 78–83.

Supplementary Material