A previous modelling study reported that spectro-temporal cues perceptually relevant to humans provide enough information to accurately classify “natural soundscapes” recorded in four distinct temperate habitats of a biosphere reserve [Thoret, Varnet, Boubenec, Ferriere, Le Tourneau, Krause, and Lorenzi (2020). J. Acoust. Soc. Am. 147, 3260]. The goal of the present study was to assess this prediction for humans using 2 s samples taken from the same soundscape recordings. Thirty-one listeners were asked to discriminate these recordings based on differences in habitat, season, or period of the day using an oddity task. Listeners' performance was well above chance, demonstrating effective processing of these differences and suggesting a generally high sensitivity for natural soundscape discrimination. This performance did not improve with training of up to 10 h. Additional results obtained for habitat discrimination indicate that temporal cues play only a minor role; instead, listeners appear to base their decisions primarily on gross spectral cues related to biological sound sources and habitat acoustics. Convolutional neural networks were trained to perform a similar task using spectro-temporal cues extracted by an auditory model as input. The results are consistent with the idea that humans exclude the available temporal information when discriminating short samples of habitats, implying a form of sub-optimality.

Soundscapes have been referred to as “the collection of biological, geophysical and anthropogenic sounds that emanate from a landscape and which vary over space and time, reflecting important ecosystem processes and human activities” (Pijanowski et al., 2011). Such a definition, while easy to grasp, does not reflect the difficulty of providing an operational characterization of soundscapes. To this end, Grinfeder et al. (2022) proposed to distinguish between three levels of soundscape integration. The first and most abstract level is referred to as the distal soundscape. It is defined as the “spatial and temporal distribution of sounds in a predetermined area, in relation to sound propagation effects.” General properties of the environment, as well as the sound sources at play, are considered at this level. However, these are not considered in relation to a specific point in space or observer. This is the information that would be required to build artificial soundscapes. The second level is referred to as the proximal soundscape. It corresponds to “the collection of propagated sound signals that occurs at a specific point in space.” Therefore, it refers to the mixture of sounds generated by distinct sources reaching a particular location and how these sounds are shaped by the properties of the surroundings. Finally, the third level is defined as “the individual, subjective interpretation of a proximal soundscape” and has been referred to as the perceptual soundscape.

Over the last decades, soundscape ecology and, more recently, ecoacoustics, have repeatedly shown that distal and proximal forms of natural soundscapes, that is, soundscapes only marginally influenced by human activity, are structured in space and time (e.g., Rodriguez et al., 2014; Gage and Axel, 2014). Natural soundscapes convey information that can be extracted using a variety of signal-processing methods to assess ecological patterns and processes of a given environment (Sueur et al., 2014; Sueur and Farina, 2015; Farina and Gage, 2017; Sugai et al., 2019; Sethi et al., 2020). Still, knowledge about the human ability to perceive natural soundscapes, i.e., how humans build a “perceptual soundscape” and use it for mapping the habitat, navigating, finding resources or mates, and assessing danger, is clearly lacking. Most of the research conducted so far in psychoacoustics and neurosciences has been limited to urban soundscapes, concentrating efforts on qualitative and emotional judgments or semantic scales (e.g., Raimbault, 2006; Axelsson et al., 2010; Irwin et al., 2011; Filipan et al., 2019). This is quite surprising given the adaptive value of processing natural soundscapes for any species with an auditory system (Fay, 2009).

Current knowledge in soundscape ecology and ecoacoustics suggests that discrimination among proximal natural soundscapes is based on acoustic cues associated with “biophony” (the combined sound that living organisms produce in a given habitat), “geophony” (the combined sound resulting from geophysical events, such as the wind, thunder, water flow, or earth movement), and habitat sound propagation characteristics (Krause, 1987; Forrest, 1994; Pijanowski et al., 2011; Sueur and Farina, 2015; Krause, 2016; Farina and Gage, 2017; Grinfeder et al., 2022). These cues may vary with biological and geophysical cycles (e.g., biophony peaks at dawn and dusk, and during spring) (Krause et al., 2011; Gage and Axel, 2014). Because biological and geophysical sounds originate from a considerable variety of sources, they may be difficult to characterize. However, a few general properties have been put forward. Biophony is often assumed to correspond to mid-high audio frequency components (∼2–11 kHz) (Gage and Axel, 2014; Farina and Gage, 2017) with relatively slow temporal modulations and fine harmonic structure (Lewicki, 2002; Theunissen and Elie, 2014). However, many animal vocalizations show low-frequency components (<1 kHz), fast fluctuations, and inharmonic or noisy characteristics (Hauser, 1996; Bradbury and Vehrencamp, 2011). Geophony, on the other hand, is often assumed to correspond to low-frequency (<1 kHz) or broadband sounds (Qi et al., 2008; Pijanowski et al., 2011; Farina and Gage, 2017; Sánchez-Giraldo et al., 2020) and transient events (Lewicki, 2002). Habitat characteristics may add to the complexity of natural soundscapes by altering spectral and temporal cues conveyed by biophony and geophony due to geometric/spherical spreading loss, atmospheric attenuation and turbulence, ground effects, and scattering caused by obstacles (Forrest, 1994).

A recent modelling study conducted by Thoret et al. (2020) suggested that relatively slow spectro-temporal cues may play a role in the auditory discrimination of natural soundscapes by humans. The authors used a database of audio recordings collected in the temperate terrestrial biome of the USA Sequoia National Park (Krause et al., 2011). The recordings were representative of four distinct habitats—a riparian forest, a meadow, a chaparral, and a grassland—during four seasons and four periods of the day. These recordings were used to train a support-vector machine (SVM) classifier using only amplitude-modulation (AM) information extracted at the output of a simulated human auditory model composed of a cochlear filterbank followed by a modulation filterbank (Varnet et al., 2017). In Thoret et al. (2020), classification scores were twice the chance level (60% correct) even though no hyper-parameter optimization was performed, indicating that gross spectro-temporal modulation cues could support soundscape discrimination and, therefore, might be used by human observers when perceiving variations in soundscapes within and across the natural habitats of a given temperate, terrestrial biome. In this initial study, however, no human data were collected to corroborate the modelling results and assess the above assumption. The goal of the present study is to extend this initial work by collecting perceptual data and determining the extent to which gross spectro-temporal modulation cues can support soundscape discrimination in human listeners. Two questions were addressed in the present study: (i) Are human listeners able to discriminate natural soundscapes based on differences in habitat, season, or period of the day? (ii) If so, what cues are used to perform this task?

To tackle these issues, we took advantage of the Krause et al. (2011) database already used by Thoret et al. (2020). This unique database consists of 64 h of recordings that we subsequently divided into 2 s samples for psychophysical experiments and modelling. All psychophysical experiments were conducted with normal-hearing adults. In the first experiment, we evaluated the discrimination ability for one factor (i.e., habitats, periods of the day, or seasons) while randomizing the other factors across trials. In the second experiment, we reduced the acoustic variability of soundscape recordings by measuring the ability to discriminate between habitats for each period of the day and each season (these two parameters being fixed within each session). In the third and fourth experiments, we measured the ability to discriminate between habitats (period of the day and season varied randomly across trials) when stimuli were restricted in the audio frequency domain or noise vocoded to assess the contribution of spectral and temporal cues, respectively. Percent-correct discrimination scores were measured in each experiment using a 3-interval oddity (forced-choice) task with a method of constant stimuli. These scores were compared to predictions from one of three convolutional neural networks (CNNs) specifically trained for habitat, period-of-the-day, or season classification. The CNNs received as input “perceptually relevant” AM spectra computed from the model of the human auditory system used by Thoret et al. (2020). For consistency, the outputs of the CNNs were used to perform a computer-simulated discrimination task using the same oddity discrimination paradigm as used with human participants.

Thirty-one normal-hearing listeners (13 males, 18 females) participated in the experiments. These listeners were recruited through the platform RISC (“Relais d'Information sur les Sciences de la Cognition, UMS CNRS 332”) at Ecole Normale Supérieure (Paris, France). Listeners were between 22 and 40 years of age (mean = 27 years; standard deviation (SD) = 4 years). All participants had pure-tone air-conduction thresholds of 20 dB HL or better at 0.25, 1, 4, and 8 kHz in both ears. The pure-tone average (PTA; calculated using thresholds at 0.25, 1, 4, and 8 kHz) averaged across ears and listeners ranged between 0 and 13 dB HL (mean = 7 dB HL; SD = 3 dB HL). Audiometric thresholds were also measured at 12.5 kHz in both ears because many insect stridulations show audio frequency components beyond 10 kHz. These thresholds were not measurable for only one listener (S25). For the remaining participants, the averaged thresholds across ears at 12.5 kHz ranged between 8 and 45 dB HL (mean = 22 dB HL; SD = 9 dB HL).

Demographic data are shown in Table I. This table also indicates in which specific experiment(s) each listener participated (experiments Ia,b,c; II, III, IVa,b). At the end of the study, listeners were sent a questionnaire aiming to assess lifelong experience with natural soundscapes. Twenty-nine of the 31 participants responded to the questionnaire (all subjects except S25 and S29). The participants were fully informed about the purpose of the study and provided written consent before their participation. All listeners were paid 10 Euros per hour to participate in the experiments.

TABLE I.

Demographic data for the 31 human listeners who participated in the psychophysical experiments, with age (in years), gender (M: male, F: female), pure-tone average audiometric threshold (in dB HL) calculated between 0.25 and 8 kHz and averaged across ears, absolute threshold (in dB HL) measured at 12.5 kHz (mean across ears), lifetime exposure to natural soundscapes as estimated by the questionnaire (median of the 5-point mobility scale, where 1 corresponds to areas with large and dense permanent human settlements, i.e., a landscape mainly composed of infrastructure and built environment, and 5 corresponds to the wildest environment, i.e., areas where the land is in its natural state and the impact from human activities is minimal), and experiments in which listeners were included (Ia,b,c, II, III, IVa,b).

Participants Age (years) Gender PTA (dB HL) Absolute threshold (dB HL) at 12.5 kHz Median mobility score Experiments
S1  23  18  Ia,b,c 
S2  25  35  Ia,b,c 
S3  30  10  Ia,b,c 
S4  27  20  Ia,b,c 
S5  27  10  18  Ia,b,c 
S6  25  28  Ia,b,c 
S7  27  18  Ia,b,c 
S8  27  45  Ia,b,c 
S9  25  18  Ia,b,c 
S10  29  13  Ia,b,c 
S11  23  23  Ia,b,c; IVa,b 
S12  22  20  Ia,b,c 
S13  25  15  Ia,b,c; IVa,b 
S14  23  11  20  Ia,b,c 
S15  25  28  Ia,b,c; IVa,b 
S16  33  43  Ia,b,c 
S17  26  12  23  Ia,b,c; IVa,b 
S18  22  28  Ia,b,c 
S19  28  15  Ia,b,c 
S20  24  33  Ia,b,c; IVa,b 
S21  32  13  18  Ia,b,c; III 
S22  36  20  II; III; IVa,b 
S23  27  20  II; III; IVa,b 
S24  23  15  II; III; IVa,b 
S25  25  13  II; III 
S26  36  II; III; IVa,b 
S27  31  20  2.5  II; III; IVa,b 
S28  31  28  II; III; IVa,b 
S29  40  15  II 
S30  29  25  II; III; IVa,b 
S31  23  13  II; III 

The ability to discriminate natural soundscapes might be affected by previous exposure to biological sounds and natural soundscapes. To examine this possibility, a questionnaire was designed to assess lifetime exposure to natural soundscapes. Questionnaires designed to evaluate history of exposure to occupational and non-occupational noise (Johnson et al., 2017; Griest-Hines et al., 2021) were used as a framework to develop the current questionnaire (see supplementary material for the questionnaire used to assess lifetime exposure to natural soundscapes).1

The questionnaire, which was sent by email to the participants after the experiments, aimed at capturing exposure to non-anthropogenic biophony and natural soundscapes over the individual's lifetime. Listening situations were grouped into four sections. The first section, “mobility,” focused on the places where participants lived or have lived, even for short periods, to assess daily exposure. For each place, participants were asked (1) how many years they had lived there and (2) how “wild” this area was on a scale from 1 to 5. Item 1 corresponded to areas with large and dense permanent human settlements where the landscape is mainly composed of infrastructure and built environment. Item 5 corresponded to the wildest environment, defined as areas where the land is in its natural state, and the impact from human activities is minimal. The remaining sections focused on exposure to natural sounds during “holidays,” “occupational,” and “non-occupational” activities. Accordingly, “urban” environments were excluded, and these sections only provided a scale from 3 to 5 corresponding to rural to wild environments. The holidays section assessed the exposure over delimited periods of time for leisure and recreation. The duration of exposure was quantified by asking the total number of days of exposure. The occupational section assessed the regular exposure due to participants' occupations. The non-occupational section assessed the exposure associated with leisure and/or recreational activities (excluding those categorized as holidays) over the lifetime of the participant. For these two sections, the duration of exposure was quantified by asking the total number of years of exposure and an estimation of exposure frequency [“yearly” (an average of 3 days/year), “monthly” (an average of 3 days/month), “weekly” (an average of 3 days/week), “daily” (at least once/day)].
Participants were also asked whether they considered that the activities examined in the last two sections required or developed some form of expertise on natural sounds. In addition, participants were asked whether they had been exposed in situ or to recorded natural sounds during these activities. All the durations were ultimately converted to years.

Two scoring metrics indicative of the overall exposure were computed for each of the 21 participants in Experiment I. One corresponded to the mobility section, while the other corresponded to the cumulative exposure from the remaining sections (holidays, occupational, non-occupational). These metrics consisted of the median of each section or combined sections. For instance, subject S21 reported living 5 years in category 1, 17 years in category 3, 2 years in category 4, and 2 years in category 5, resulting in the following response set [1,1,1,1,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,5,5]. Therefore, the median and corresponding mobility score was 3 for this subject. These medians were used to conduct two separate ordinal regressions with habitat, period-of-the-day, and season scores as factor variables. For mobility, 11, 6, and 4 participants had a median score of 1, 2, and 3, respectively. For the combined remaining sections, 5, 5, and 9 participants had a median score of 3, 4, and 5, respectively (two participants reported no activities for sections 2–4).
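The derivation of a mobility score can be sketched as follows, using the example reported above for subject S21 (the `years_per_category` mapping simply restates that example; it is not taken from the questionnaire data files):

```python
from statistics import median

# Years lived in each "wildness" category (1 = dense urban, 5 = wild),
# as reported by subject S21 in the mobility section of the questionnaire.
years_per_category = {1: 5, 3: 17, 4: 2, 5: 2}

# Expand into one response per year lived, then take the median of the set.
responses = [cat for cat, years in years_per_category.items()
             for _ in range(years)]
mobility_score = median(responses)  # median of 26 responses -> 3
```

The same median-of-responses logic applies to the combined holidays/occupational/non-occupational sections once their durations are converted to years.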

The stimuli were generated from the sound database collected by Krause et al. (2011) in the Sequoia National Park, an area located in the southern Sierra Nevada, east of Visalia, CA (USA) and designated as a Biosphere reserve by UNESCO in 1976. Recordings were made in four habitats, at four seasons, and in four broadly defined periods of the day. Detailed information about habitats, recording conditions, and material is provided in Krause et al. (2011). The four habitats were chosen as they represent unique combinations of elevation and vegetation diversity. The four habitats, illustrated in Fig. 1, were: (1) Crescent Meadow (CM), located at 2154 m [N36° 33.364 W118° 44.867], a meadow surrounded by sequoia trees; (2) Shepherd Saddle (SH), located at 925 m [N36° 29.470 W118° 51.142], a dry savannah chaparral with high winds; (3) Buckeye Flat (BF), located at 890 m [N36° 31.185 W118° 45.692], a riparian area associated with a river producing a relatively loud stream; and (4) Sycamore Springs (SY), located at 645 m [N36° 29.470 W118° 51.225], a foothill site dominated by an oak savannah. The four seasons corresponded to fall (October), winter (January), spring (May), and summer (July). The four periods of the day corresponded to the following times: T1 [0, 5, or 7 a.m.], T2 [11 a.m.], T3 [4 or 5 p.m.], and T4 [8 or 11 p.m.]. As can be seen, the time range covered by these four periods was unequal; hence, the associated results should be interpreted with caution. In particular, recording samples from a period like T2 with a single recording time may be more homogeneous than recording samples from T1 and therefore easier to identify as belonging to the same period. Despite this limitation, the quality of these recordings argued in favor of using them. Moreover, the four periods of the day were sampled identically across habitats, limiting their influence on the experiments involving this factor, which represented the majority of the experiments.

FIG. 1.

(Color online) Photographs illustrating the ecological characteristics of each recording site.


A total of 64 h of recordings (i.e., a 1 h recording for each of the 64 recording conditions: four habitats × four seasons × four periods of the day) was made during the period of September 2001 through October 2002. All sounds were recorded at a 22 050 Hz sampling rate (one channel; 16 bits; wav format). Thirty-second samples were extracted at 5 min intervals from the original recordings, resulting in a 6.4 h database composed of 768 files that were 30 s long (12 recordings that were 30 s long for each of the 64 recording conditions). From this database, 90 acoustic 2 s samples were extracted for each combination of habitat, season, and period of the day, for a total of 5760 samples that were 2 s long (four habitats × four seasons × four periods of the day × 90 samples) (see the supplementary material for 16 examples of sound recordings).1
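The bookkeeping of the database construction described above can be summarized in a short sketch (illustrative only; the condition labels follow the abbreviations used in this paper):

```python
# Enumerate the 64 recording conditions and check the resulting sample counts.
habitats = ["BF", "CM", "SH", "SY"]
seasons = ["fall", "winter", "spring", "summer"]
periods = ["T1", "T2", "T3", "T4"]

conditions = [(h, s, p) for h in habitats for s in seasons for p in periods]

n_30s_files = len(conditions) * 12   # 12 thirty-second recordings per condition
n_2s_samples = len(conditions) * 90  # 90 two-second samples per condition
```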

The 90 samples (2 s long) were taken contiguously from the 30 s long recordings without overlapping. Previous studies investigating urban soundscape perception used recordings much longer than the present ones [e.g., 8–15 s for brain-imaging studies (Irwin et al., 2011; Gould van Praag et al., 2017) and 30 s to 3 min for psychophysical studies (Axelsson et al., 2010; Filipan et al., 2019)]. Here, the choice of 2 s was motivated by a combination of several practical and theoretical constraints. First, experimental time had to be kept within reasonable limits given the use of a multiple-interval paradigm. Second, stimuli had to be long enough for slow temporal modulations as low as a few Hertz to be represented given their importance for the perception of animal vocalizations (Theunissen and Elie, 2014). Note that while our experience of sounds is continuous, the duration of our exposure to a given soundscape, the proximal soundscape defined in the Introduction, may vary from seconds to hours or more.

All samples were tapered by 50 ms raised-cosine ramps and equated in long-term root mean square (rms) power to remove obvious but idiosyncratic level cues related to recording conditions (e.g., distance from the microphone to specific biotic and abiotic sound sources). All stimuli were presented diotically to each listener at a nominal sound pressure level (SPL) of 60 dBA under Sennheiser HD650 headphones (Wedemark, Germany). For each testing session, stimulus level was roved by ±6 dB (in 1 dB steps) within and across trials to further discourage the use of absolute loudness cues. Equating all recordings in long-term rms power and, in addition, roving presentation level across intervals served to restrict the present investigation to the informational content of natural soundscapes.
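A minimal sketch of this preprocessing chain (ramping, rms equalization, and level roving) is given below, assuming the database's 22 050 Hz sampling rate; the target rms value and function names are ours, chosen for illustration, and the nominal 60 dBA calibration is handled by the playback hardware, not this code:

```python
import numpy as np

FS = 22050  # sampling rate of the recordings (Hz)

def raised_cosine_ramp(x, ramp_ms=50.0, fs=FS):
    """Apply 50 ms raised-cosine onset/offset ramps to a 1-D signal."""
    n = int(fs * ramp_ms / 1000.0)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n) / n))  # rises 0 -> ~1
    y = x.copy()
    y[:n] *= ramp
    y[-n:] *= ramp[::-1]
    return y

def equate_rms(x, target_rms=0.05):
    """Scale the signal to a fixed long-term rms power."""
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

def rove_level(x, rng, max_db=6):
    """Rove presentation level by a random integer offset in [-6, +6] dB."""
    offset_db = rng.integers(-max_db, max_db + 1)
    return x * 10.0 ** (offset_db / 20.0)

rng = np.random.default_rng(0)
sample = rng.standard_normal(2 * FS)        # a 2-s noise stand-in for a recording
sample = equate_rms(raised_cosine_ramp(sample))
stimulus = rove_level(sample, rng)          # one roved presentation of the sample
```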

In the present study, a forced-choice, three-interval oddity paradigm (Frijters, 1979; Frijters et al., 1980; Versfeld et al., 1996) and the method of constant stimuli were used to investigate the ability of individuals with normal hearing to discriminate soundscapes. This “odd-one-out” paradigm, sometimes referred to as the “triangular” paradigm, is known to be more difficult than 2-alternative forced-choice, 3-interval 2-alternative forced-choice, yes/no, same/different, or XAB paradigms (Macmillan and Creelman, 2005), but one of its main advantages is that it is easier to explain to participants. While such a multiple-interval oddity paradigm may have lower ecological validity (Schmuckler, 2001; Neuhoff, 2004; Keidser et al., 2020) than the identification paradigms typically used in environmental studies (e.g., Gygi et al., 2004; Shafiro, 2008), it has the advantage of limiting the strong biases potentially caused by experience, education, or previous exposure that are likely to affect decision making in single- or even multiple-interval identification tasks (Hautus et al., 2021). Each testing session (each block) consisted of 120 trials. Each trial was made of three successive time intervals, one target and two standards, separated by a 1 s silent inter-stimulus interval, for a total of 3 × 120 = 360 stimuli per session. Samples were selected without replacement within a testing session so that no stimulus was presented more than once during a session. The target and two standards were presented in random order. Listeners were asked to indicate which interval differed from the remaining two by clicking on a box presented on a computer screen. Visual feedback as to the correct answer was given to the listener at the end of each trial. The next trial started 1 s after a listener's response. Every 12 trials, listeners were given a short (silent) 10 s break. Each testing session lasted about 20 min.

Soundscape recordings were passed through a two-stage model consisting of a computational model of human auditory perception followed by a classifier. The first stage was identical to that used by Varnet et al. (2017) and Thoret et al. (2020). First, sounds were passed through a bank of gammatone filters [bandwidth = 1 equivalent rectangular bandwidth (ERB); center frequency (CF) ranging from 70–10 298 Hz; log spacing]. A Hilbert envelope was then extracted at the output of each gammatone filter and decomposed into its AM components via a bank of broadly tuned modulation filters (Q = 1; CF ranging from 0.5–188 Hz; log spacing). The modulation index was calculated at the output of each modulation filter, yielding a two-dimensional amplitude-modulation index (AMi) spectrum (Thoret et al., 2020). The second stage of the model was a convolutional neural network (CNN) receiving these AMi spectra as inputs. Depending on the task, the CNN was trained and optimized for habitat, season, or period-of-the-day classification. All three CNNs were based on the same eight-layer architecture, not counting the input layer. The first six layers consisted of three convolutional layers, each followed by a subsampling layer (“max pooling”) that scaled down the input by a factor of 2. The convolutional layers consisted of 16, 32, and 32 kernels of size 3 × 3, in that order, and used rectified linear unit (ReLU) activation functions. The seventh and eighth layers were fully connected layers with 1600 and 128 outputs, respectively. Finally, the output layer transformed the output of the last layer into a probability distribution of four values (one for each class, e.g., BF, CM, SH, SY) using a normalized exponential function (often referred to as the softmax function). The CNNs were trained on a subset of the database consisting of 6360 samples. The remaining 360 samples were used for testing only.
For each condition, classification (in percent correct; chance level = 25%) and/or discrimination (in percent correct; chance level = 33%) scores were computed based on those 360 samples. For discrimination scores, a differencing rule was implemented (Frijters, 1979; Frijters et al., 1980; Versfeld et al., 1996) to mimic the design used in the psychophysical experiments. For each interval in a trial, the CNN returned a probability distribution of four values. These values were used as coordinates in a four-dimensional space (i.e., each dimension represented the probability of belonging to one of the four classes). The model then selected the odd interval based on the longest Euclidean distance between the three points/intervals.
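The decision stage of this differencing rule can be sketched as follows. The two standards are expected to produce the most similar model outputs, so the odd interval is the one left out of the closest pair; for three points this is equivalent to choosing the interval with the largest summed distance to the other two. The function name is ours, not from the original code:

```python
import numpy as np
from itertools import combinations

def pick_odd(p1, p2, p3):
    """Differencing rule for the three-interval oddity task.

    Each argument is a CNN 4-class probability vector for one interval,
    treated as a point in 4-D space.  The two intervals lying closest
    together are taken as the standards; the remaining interval is
    reported as the odd one.  Returns the 0-based index of that interval.
    """
    pts = [np.asarray(p, dtype=float) for p in (p1, p2, p3)]
    pairs = list(combinations(range(3), 2))
    # Pair of intervals with the smallest Euclidean distance = the standards.
    closest = min(pairs,
                  key=lambda ij: np.linalg.norm(pts[ij[0]] - pts[ij[1]]))
    return ({0, 1, 2} - set(closest)).pop()
```

For example, if intervals 1 and 2 both yield outputs peaking on class BF while interval 3 peaks on class SH, `pick_odd` returns index 2 (the third interval).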

1. Methods

The first experiment explored the ability to discriminate changes in (i) habitat, (ii) season, and (iii) period of the day. Twenty-one participants took part in this experiment (see Table I, last column, “Ia,b,c”). Discrimination abilities were tested in three sessions, each corresponding to a given experimental condition (habitat, season, or period of the day). The three experimental conditions were tested in random order. Experiment I lasted about 2 h (including breaks) per participant.

a. Habitat-discrimination task.

On each trial, participants were presented with a stimulus recorded in one habitat (e.g., BF) and two stimuli recorded in another habitat (e.g., SY). All 12 possible combinations of habitats were used during a session. These 12 combinations were presented in random order within the session. Within each trial, the season and period of the day were identical for all three stimuli, but changed randomly across trials. All the participants were presented with the same combinations of season and period of the day. However, the order of presentation of these combinations varied randomly across participants.

b. Season-discrimination task.

On each trial, participants were presented with a stimulus recorded during one season (e.g., fall) and two stimuli recorded in another season (e.g., winter), with the constraint that the two seasons were consecutive (i.e., fall and winter; winter and spring; spring and summer; summer and fall). All four pairs of consecutive seasons were used during a session. These four pairs were presented in random order within the session. Within each trial, the habitat and period of the day were identical for all three stimuli, but changed randomly across trials. All the participants were presented with the same combinations of habitat and period of the day. However, the order of presentation of these combinations varied randomly across participants.

c. Period-of-the-day discrimination task.

On each trial, participants were presented with a stimulus recorded during one period of the day (e.g., T1) and two stimuli recorded in another period of the day (e.g., T2), with the constraint that the two periods of the day were consecutive (i.e., T1-T2, T2-T3, T3-T4, and T4-T1). All four pairs of consecutive periods of the day were used during a session. These four pairs were presented in random order within the session. Within each trial, the habitat and season were identical for all three stimuli but changed randomly across trials. All the participants were presented with the same combinations of habitat and season. However, the order of presentation of these combinations varied randomly across participants.

2. Results

Figure 2 shows individual (dots) and mean (crosses) discrimination scores for each experimental condition: habitat (leftmost light gray dots), season (middle gray dots), and period of the day (rightmost black dots). Discrimination scores were fairly homogeneous across listeners; however, individual scores were not correlated across the three tasks. For habitat discrimination, scores ranged from 57%–72% correct (SD = 4.6 percentage points). For season discrimination, scores ranged from 52%–66% correct (SD = 4 percentage points). For discrimination of period of the day, scores ranged from 55%–68% correct (SD = 3.5 percentage points). Chance level corresponded to 33% correct discrimination for this three-interval, forced-choice oddity paradigm. Discrimination scores were significantly above chance for habitat [Student t test, t(20) = 31.4; p < 0.001], season [t(20) = 29.5; p < 0.001], and period of the day [t(20) = 35.1; p < 0.001]. Discrimination scores averaged across listeners were 64%, 59%, and 60% for habitat, season, and period of the day, respectively. These scores correspond to sensitivity levels (d′) of 2.2, 1.9, and 2, respectively (Versfeld et al., 1996).

FIG. 2.

Results of Experiment I. Discrimination scores for changes in habitat (leftmost light gray dots), season (middle gray dots), and period of the day (rightmost black dots). In each panel, dots show individual scores and crosses show mean scores across participants (n = 21). Error bars show ±1 standard deviation about the mean. The horizontal dashed line shows chance level (33% correct discrimination).


Pearson-correlation analyses were conducted on scores for each of the three experimental conditions and age, PTA, or audiometric thresholds at 12.5 kHz. For each analysis, the criterion for significance (p < 0.05) was adjusted using a Bonferroni correction, which divides the criterion by the number of comparisons made (three here). Therefore, significance was considered as achieved for p < 0.016. For each experimental condition, discrimination scores were not significantly correlated with age (habitat: r = 0.07; p = 0.75; season: r = 0.08; p = 0.71; period of the day: r = 0.17; p = 0.46) or with the PTA calculated between 0.25 and 8 kHz (habitat: r = –0.12; p = 0.6; season: r = –0.5; p = 0.81; period of the day: r = 0.23; p = 0.92). Moreover, discrimination scores were not significantly correlated with the mean audiometric thresholds across ears at 12.5 kHz for habitat (r = –0.12; p = 0.6), season (r = –0.42; p = 0.053), or period of the day (r = –0.21; p = 0.35). Two ordinal logistic regression analyses were conducted to evaluate the influence of lifelong exposure to natural soundscapes (as estimated by our questionnaire) on discrimination scores, with the latter used as predictor variables in the models. None of these variables was found to contribute to mobility [habitat (p = 0.17), season (p = 0.31), and period of the day (p = 0.17)] or to the combined activities [holidays, occupational, non-occupational; habitat (p = 0.73), season (p = 0.10), and period of the day (p = 0.59)].
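The correlation screening described above can be sketched as follows, with a hand-rolled Pearson r and the Bonferroni-adjusted criterion; the arrays in the example are placeholders chosen for illustration, not the study's data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))

# Bonferroni correction: divide the significance criterion by the
# number of comparisons (three per analysis here).
alpha = 0.05
n_comparisons = 3
alpha_adjusted = alpha / n_comparisons  # ~0.0167; rounded down to 0.016 above

# Placeholder example: a perfectly linear relation gives r = 1.
r = pearson_r([22, 25, 30, 27], [44, 50, 60, 54])
```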

1. Methods

The second experiment again explored the ability to discriminate habitats, but separately for each combination of season and period of the day. In other words, season and period of the day were fixed both within and between trials in Experiment II, whereas in Experiment I they were fixed within trials but varied randomly across trials. The purpose of keeping the same season and period of the day throughout an entire testing session was to compare the influence of these two factors on habitat discrimination; a likely consequence is that acoustic variability was reduced within each testing session. Ten new listeners participated in this experiment (see Table I, last column: “II”). Discrimination abilities were tested in 16 sessions, each corresponding to a given combination of season and period of the day (e.g., fall and T1). The 16 experimental conditions were tested in random order across participants. Experiment II lasted about 12 h (including breaks) per participant.

2. Results

The second experiment explored the ability to discriminate habitats in 10 naive listeners. In contrast to Experiment I, discrimination was assessed for each combination of season and period of the day separately. Figure 3 shows habitat-discrimination scores averaged across listeners for each combination of season and period of the day. Overall, discrimination scores were fairly consistent across listeners. Individual discrimination scores (not shown here) ranged from 37% to 89% across conditions; the corresponding individual sensitivity levels (d′) ranged between 0.7 and 4 (Versfeld et al., 1996). Mean discrimination scores ranged between 53% and 81% across conditions; the corresponding d′ ranged between 1.6 and 3.2. As in Experiment I, discrimination scores were significantly above chance (Student t tests, all p < 0.0001). A repeated-measures analysis of variance (ANOVA) was conducted on these discrimination scores with season (four levels) and period of the day (four levels) as within-subjects factors. The main effect of season was not significant [F(3,27) = 2.25; p = 0.105]. However, the analysis revealed a significant main effect of period of the day [F(3,27) = 87.61; p < 0.0001] and a significant interaction between season and period of the day [F(9,81) = 5.93; p < 0.0001]. Post hoc comparisons (Tukey HSD test) indicated that discrimination scores were significantly higher during “T4” (i.e., night) for all seasons except winter (all p < 0.01). Informal listening revealed that this peak in discrimination at night was generally associated with insects chirping and singing.
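The mapping between percent correct in the 3-interval oddity task and d′ (Versfeld et al., 1996) can be approximated by Monte Carlo simulation of a differencing-model observer. The sketch below assumes that particular decision rule (the closest pair of observations is taken as "same"); it illustrates the conversion rather than reproducing the study's exact tables.

```python
import numpy as np

def oddity_pc(d_prime, n_trials=200_000, rng=None):
    """Monte Carlo percent correct in a 3-interval oddity task under a
    differencing rule: the pair of observations closest together is taken as
    the 'same' pair, and the remaining interval is reported as odd.
    Observations are unit-variance Gaussians; the odd interval's mean is d'."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((n_trials, 3))
    odd = rng.integers(0, 3, n_trials)
    x[np.arange(n_trials), odd] += d_prime
    # pairwise |differences| for pairs (0,1), (0,2), (1,2);
    # the interval excluded from the closest pair is reported as odd
    d01 = np.abs(x[:, 0] - x[:, 1])
    d02 = np.abs(x[:, 0] - x[:, 2])
    d12 = np.abs(x[:, 1] - x[:, 2])
    closest_pair = np.argmin(np.stack([d01, d02, d12], axis=1), axis=1)
    response = np.array([2, 1, 0])[closest_pair]
    return 100 * np.mean(response == odd)
```

At d′ = 0 this observer performs at the 33% chance level; percent correct then rises monotonically with d′, which is the relationship used to convert the scores reported above into sensitivity levels.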

FIG. 3.

Results of Experiment II. Discrimination scores for changes in habitat, for each season (fall, winter, spring, summer) and period of the day [see legend: T1 (light gray bars), T2 (gray bars), T3 (dark gray bars) or T4 (black bars)]. Bars show mean scores across participants (n = 10). Error bars show ±1 standard deviation about the mean. The horizontal dashed line shows chance level (33% correct discrimination).


This pattern (i.e., T4 > T1 > T3 > T2) could be related to the level of biophonic activity (highest at night, then at dawn, lower during the afternoon, and lowest at noon) and hence to the amount of information provided by each recording. This suggests that discrimination improves as acoustic activity and diversity increase.

1. Methods

The third experiment explored the contribution of different spectral regions to the ability to discriminate habitats. The method followed that used in Experiment Ia (i.e., fixed season and period of the day within a trial but randomized across trials). In one condition, the stimuli were left intact, i.e., they were not filtered (UNF). In eight other conditions, they were lowpass (LP) or highpass (HP) filtered (zero-phase Butterworth filters, 72 dB/oct slope) at each of four cutoff frequencies (0.5, 1, 2, and 4 kHz), for a total of nine experimental conditions (including UNF). Ten listeners participated in this experiment. These listeners had benefited from some amount of initial training in the habitat-discrimination task (ranging from 2 to 12 h). Before participating in Experiment III, one listener had participated in Experiment I and nine listeners had participated in Experiment II (see Table I, last column: “III”). Discrimination abilities were tested in nine sessions, each corresponding to a given filtering condition (e.g., lowpass filtered stimuli with a cutoff frequency of 1 kHz). The nine experimental conditions were tested in random order across participants. Table II details the nine conditions of this experiment and the label assigned to each one of them. Experiment III lasted about 5 h (including breaks) per participant.
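The filtering stage described above can be sketched with standard tools. A 12th-order Butterworth filter gives the stated 72 dB/oct roll-off; note that the zero-phase (forward-backward) application doubles the effective attenuation, so the exact order used in the study is an assumption here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def filter_soundscape(x, fs, cutoff_hz, btype, order=12):
    """Zero-phase Butterworth filtering of a soundscape sample, as a sketch of
    the Experiment III LP/HP conditions. A 12th-order Butterworth rolls off at
    72 dB/oct; sosfiltfilt makes it zero phase (and doubles the effective
    attenuation, so the order is an assumption, not the study's exact value)."""
    sos = butter(order, cutoff_hz, btype=btype, fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

For example, `filter_soundscape(x, 44100, 1000.0, "lowpass")` produces the LP1kHz condition for a 44.1 kHz recording (the sampling rate is illustrative).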

TABLE II.

Description of the various conditions of Experiments III and IV and the label assigned to each of them (e.g., 8B-UNP).

Experiment (Exp)  Condition  Signal processing
Exp III    Audio filtering applied to the broadband signal
  UNF  No filtering
  LP0.5kHz  Lowpass filtering at 0.5 kHz
  LP1kHz  Lowpass filtering at 1 kHz
  LP2kHz  Lowpass filtering at 2 kHz
  LP4kHz  Lowpass filtering at 4 kHz
  HP0.5kHz  Highpass filtering at 0.5 kHz
  HP1kHz  Highpass filtering at 1 kHz
  HP2kHz  Highpass filtering at 2 kHz
  HP4kHz  Highpass filtering at 4 kHz
Exp IV    Processing applied to bandpass signals
  8-band noise vocoder
  8B-ENV-LP5Hz  Envelopes lowpass filtered at 5 Hz
  8B-ENV-LP20Hz  Envelopes lowpass filtered at 20 Hz
  8B-ENV-LP150Hz  Envelopes lowpass filtered at ERB/2 (max = 150 Hz)
  8B-ENV-HP5Hz  Envelopes highpass filtered at 5 Hz
  8B-ENV-HP20Hz  Envelopes highpass filtered at 20 Hz
  8B-UNP  Bandpass signals reconstructed by multiplying the (lowpass filtered) Hilbert envelope and original TFS
  8B-EqualRms-ENV-LP150Hz  As in 8B-ENV-LP150Hz, but long-term rms power of each band set to the mean rms power across the eight bands
  8B-Flat-ENV  Envelopes discarded; eight frequency-limited noise carriers assigned their original long-term rms power
  1-band noise vocoder
  1B-ENV-LP5Hz  Envelope lowpass filtered at 5 Hz
  1B-ENV-LP20Hz  Envelope lowpass filtered at 20 Hz
  1B-ENV-LP150Hz  Envelope lowpass filtered at 150 Hz

Although the stimuli were filtered using Butterworth filters with a relatively steep slope of 72 dB/oct, listeners may still have been able to use information from the “transition bands,” where soundscape information was attenuated but not removed completely. To prevent such “off-frequency listening,” a complementary noise was presented simultaneously with each stimulus. This complementary noise, referred to as S3N (for “soundscape-shaped noise”), had long-term spectral properties matching those of a subset of concatenated soundscape samples (64 samples of 2 s each, one for each combination of habitat, season, and period of the day, for a total of 128 s). The complementary noise was generated by computing the Fourier transform of the concatenated audio files. The phases of the spectral components were then randomized following a Gaussian distribution. The modified Fourier representation was converted back into the time domain using an inverse Fourier transform, bandpass filtered between 0.02 and 10 kHz (Butterworth filters, 72 dB/oct) and, finally, equated in long-term rms power with the original soundscape recordings.
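The S3N construction (magnitude spectrum preserved, phases randomized, rms matched) can be sketched as follows. This is a minimal sketch: the final bandpass stage is omitted, and phases are drawn uniformly here rather than from the Gaussian distribution reported in the text.

```python
import numpy as np

def make_s3n(corpus, rng=None):
    """Soundscape-shaped noise (S3N) sketch: keep the long-term magnitude
    spectrum of the concatenated corpus, randomize the phases, then equate
    long-term rms power with the original. Phases are drawn uniformly here
    (a simplification of the Gaussian randomization described in the text);
    the 0.02-10 kHz bandpass stage is omitted."""
    rng = np.random.default_rng(rng)
    spectrum = np.fft.rfft(corpus)
    phases = rng.uniform(0.0, 2 * np.pi, spectrum.shape)
    phases[0] = 0.0                       # keep the DC component real
    if len(corpus) % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist component real
    noise = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(corpus))
    # equate long-term rms power with the original corpus
    noise *= np.sqrt(np.mean(corpus**2) / np.mean(noise**2))
    return noise
```

Because only the phases change, Parseval's theorem guarantees the noise already has (almost exactly) the corpus energy, so the final rms scaling is essentially a safeguard.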

As shown in Fig. 4, the resulting S3N has a long-term power spectrum almost identical to that of the original corpus. Both power spectra show a lowpass shape, with power decreasing by about 30 dB between about 0.1 and 10 kHz. Interestingly, these spectra show increased power between about 1.5 and 10 kHz. According to ecoacoustic studies (e.g., Gage and Axel, 2014; Farina and Gage, 2017), this reflects the contribution of biotic sound sources (biophony) such as those produced by birds and insects and, to a lesser extent, by amphibians and mammals. For each interval, a 2 s sample was randomly extracted from the 128 s–long S3N and filtered (zero phase, Butterworth filters, 72 dB/oct) using the same cutoff frequency as the soundscape sample but in a complementary fashion (e.g., lowpass filtered if the stimuli were highpass filtered).

FIG. 4.

Long-term power spectra of (i) a subset of 64 concatenated 2 s soundscape samples, one for each combination of habitat, season, and period of the day (thick black line), and (ii) a 128 s soundscape-shaped noise (S3N, thin gray line). The analysis window is set to 512 points. For visual convenience, the power spectrum of S3N is shifted by 1 dB re the original corpus. The power spectrum of S3N is almost identical to that of the original corpus.


2. Results

The third experiment explored the ability to discriminate habitats as in Experiment Ia. It was conducted with 10 trained listeners. The stimuli were left intact (UNF) or lowpass (LP) or highpass (HP) filtered using four cutoff frequencies (0.5, 1, 2, and 4 kHz). A complementary masking noise (S3N) was added in the LP and HP conditions to prevent off-frequency listening. The individual and mean results are shown in Figs. 5 and 6, respectively. Figures 5 and 6 show that most of the detrimental effect of soundscape filtering in the audio-frequency domain occurred between 1 and 4 kHz, where discrimination scores dropped by up to 20 percentage points at the extreme cutoff frequencies. Nevertheless, Student t tests corrected for multiple comparisons (Bonferroni) showed that discrimination scores measured for soundscapes highpass filtered at 0.5 kHz or lowpass filtered at 4 kHz were significantly lower than those for unfiltered (UNF) soundscapes (p < 0.05). This suggests that listeners could also use acoustic cues outside the 0.5–4 kHz range to discriminate natural soundscapes. An ANOVA conducted on discrimination scores with filtering type (two levels: lowpass vs highpass filtering) and cutoff frequency (four levels: 0.5, 1, 2, and 4 kHz) as within-subjects factors showed significant main effects of filtering type [F(1,9) = 20.3; p < 0.01] and cutoff frequency [F(3,27) = 3.34; p < 0.05]. The interaction between these two factors was also significant [F(3,27) = 34.33; p < 0.0001]. As can be seen in Fig. 5, the individual crossover frequency for the lowpass and highpass filtering conditions ranged between 1 and 3 kHz, with an average crossover frequency of 1.8 kHz (Fig. 6). The apparent variability in these data suggests that these trained listeners weighted acoustic cues between 1 and 3 kHz differently: one listener (S25) showed a crossover frequency at 1 kHz, six listeners showed crossover frequencies between 1 and 2 kHz, and for the remaining three listeners the crossover frequency was between 2 and 3 kHz.
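A crossover frequency of the kind reported above can be estimated by interpolating the difference between the lowpass and highpass score curves on a log-frequency axis. The study's exact estimator is not specified, so linear interpolation in log2 frequency is an assumption in this sketch.

```python
import numpy as np

def crossover_frequency(cutoffs_hz, lp_scores, hp_scores):
    """Estimate the frequency at which lowpass and highpass discrimination
    curves cross, by linear interpolation of the score difference on a
    log-frequency axis (a common convention; the study's exact estimator
    is an assumption here). Returns NaN if the curves never cross."""
    logf = np.log2(np.asarray(cutoffs_hz, float))
    diff = np.asarray(lp_scores, float) - np.asarray(hp_scores, float)
    for i in range(len(diff) - 1):
        if diff[i] == 0:
            return float(cutoffs_hz[i])
        if diff[i] * diff[i + 1] < 0:          # sign change between cutoffs
            frac = diff[i] / (diff[i] - diff[i + 1])
            return float(2 ** (logf[i] + frac * (logf[i + 1] - logf[i])))
    return float("nan")
```

With symmetric curves crossing midway between 1 and 2 kHz, the estimate falls at their geometric mean (about 1.4 kHz), which is why log-frequency interpolation is the natural choice for octave-spaced cutoffs.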

FIG. 5.

Individual data of Experiment III. Discrimination scores for changes in habitat when stimuli are either lowpass filtered (open circles) or highpass filtered (open squares) in the audio frequency domain. In each panel, the horizontal dashed line shows chance level (33% correct discrimination); the horizontal continuous line shows the discrimination score for the UNF condition.

FIG. 6.

Mean data of Experiment III. Discrimination scores for changes in habitat when stimuli are either lowpass filtered (open circles) or highpass filtered (open squares) in the audio frequency domain. Open symbols show the mean scores across participants (n = 10). Error bars show ±1 standard deviation about the mean. The horizontal dashed lines show chance level (33% correct discrimination). The horizontal continuous lines show the mean discrimination score across listeners for the UNF condition (65% correct discrimination: SD = 8.7% points).


1. Methods

The fourth experiment explored the contribution of (i) temporal fine-structure cues, (ii) gross spectral cues, and (iii) temporal-envelope cues to the ability to discriminate changes in habitat. The method followed that used in Experiment Ia (i.e., fixed season and period of the day within a trial but randomized across trials). Here, the stimuli were processed by a noise vocoder to selectively degrade these categories of acoustic cues. Experiment IV was divided into two sets of conditions. Set 1, referred to as “8B,” corresponded to eight 8-band vocoder conditions. Set 2, referred to as “1B,” corresponded to three 1-band vocoder conditions. Twelve listeners participated in this experiment (see Table I, last column: “IVa,b”). The 8B conditions were administered before the 1B conditions. However, within each set, the different vocoder conditions were tested in random order across participants. Table II details the 11 conditions of this experiment and the label assigned to each of them. Participants were informed prior to the experiment that they would be listening to “heavily distorted natural soundscapes.” Experiment IV lasted about 6 h (including breaks) per participant.

a. 8-band vocoder conditions.

Each acoustic sample was initially split into eight 4-ERB-wide complementary analysis frequency bands spanning the 80–8026 Hz range using zero-phase Butterworth filters (72 dB/oct roll-off): 80–142 Hz (band 1, CF = 107 Hz), 142–253 Hz (band 2, CF = 190 Hz), 253–450 Hz (band 3, CF = 337 Hz), 450–801 Hz (band 4, CF = 600 Hz), 801–1426 Hz (band 5, CF = 1069 Hz), 1426–2536 Hz (band 6, CF = 1902 Hz), 2536–4512 Hz (band 7, CF = 3383 Hz), and 4512–8026 Hz (band 8, CF = 6018 Hz). The Hilbert transform was then applied in each frequency band to decompose the bandpass filtered signal into its temporal-envelope (modulus of the Hilbert analytic signal) and temporal fine structure (TFS; cosine of the argument of the Hilbert analytic signal). The resulting envelopes were (i) lowpass filtered (zero-phase Butterworth filter, 72 dB/oct roll-off) at a cutoff frequency of 5 Hz, 20 Hz, or half the bandwidth (ERB/2) of the auditory filter centered at the CF of the analysis band (with a maximum cutoff frequency of 150 Hz), or (ii) highpass filtered (zero-phase Butterworth filter, 72 dB/oct roll-off) at a cutoff frequency of either 5 or 20 Hz. The filtered envelopes were then used to modulate Gaussian white noises. The modulated noises were frequency-limited by filtering with the same bandpass filter used in the original analysis band and assigned their original long-term rms power to correct for envelope-filtering effects. The resulting modulated noises were finally summed. The experimental conditions corresponding to lowpass filtered envelopes were labelled “8B-ENV-LP5Hz,” “8B-ENV-LP20Hz,” and “8B-ENV-LP150Hz.” Those corresponding to highpass filtered envelopes were labelled “8B-ENV-HP5Hz” and “8B-ENV-HP20Hz.” Three additional experimental conditions, “8B-UNP,” “8B-EqualRms-ENV-LP150Hz,” and “8B-Flat-ENV,” were included to assess (i) the effects of removing temporal fine-structure cues and (ii) the role of long-term spectral cues.
In the 8B-UNP condition, temporal-envelope and temporal fine structure were computed at the output of each of the eight 4-ERB-wide complementary frequency bands by means of the Hilbert transform. The bandpass signal was reconstructed by multiplying each Hilbert envelope (lowpass filtered at ERB/2 with a maximum cutoff of 150 Hz) with the corresponding temporal fine structure. The resulting signals were frequency-limited by filtering with the same bandpass filter used in the original analysis band. The bandpass signals were finally combined. In the 8B-EqualRms-ENV-LP150Hz condition, signal processing was identical to that used to generate stimuli in the 8B-ENV-LP150Hz condition, with the exception that the long-term rms power of each band was set to the mean rms power across the eight bands. In the 8B-Flat-ENV condition, the temporal-envelope was discarded (that is, “replaced” by a flat temporal-envelope): Gaussian white noises were frequency-limited by filtering with the same bandpass filters used in the original analysis bands and assigned the original long-term rms power in each band. The resulting noise bands were finally combined. All vocoded signals were equated in long-term rms power.
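The core envelope-vocoding chain described above (band split, Hilbert envelope extraction, envelope lowpass filtering, noise-carrier modulation, band-limiting, rms restoration) can be sketched as follows. Filter orders, the envelope clipping step, and the sampling rate in the test are assumptions of this sketch, not the study's exact implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges, env_cutoff_hz, rng=None):
    """Minimal noise-vocoder sketch of the 8B-ENV-LP conditions: per band,
    extract the Hilbert envelope, lowpass it, modulate a white-noise carrier,
    re-filter to the band, and restore the band's original long-term rms
    power. 12th-order Butterworth filters approximate the 72 dB/oct slopes
    (an assumption; zero-phase filtering doubles the effective attenuation)."""
    rng = np.random.default_rng(rng)
    out = np.zeros(len(x), dtype=float)
    for lo, hi in band_edges:
        sos = butter(12, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                       # Hilbert envelope
        env_sos = butter(12, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(env_sos, env), 0.0)  # clip filter undershoot
        carrier = rng.standard_normal(len(x))
        mod = sosfiltfilt(sos, env * carrier)             # frequency-limit
        mod *= np.sqrt(np.mean(band**2) / np.mean(mod**2))  # restore band rms
        out += mod
    return out
```

Passing the eight band edges listed above and an ERB/2 cutoff per band would reproduce the 8B-ENV-LP150Hz-style processing; a single full-range band with the TFS discarded corresponds to the 1B conditions.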

b. 1-band vocoder conditions.

Each acoustic sample was initially bandpass filtered between 20 and 8026 Hz using a zero phase, Butterworth filter (72 dB per octave roll-off). The Hilbert transform was then applied at the output of this wide frequency band to obtain the temporal-envelope and temporal fine structure. The temporal fine structure was discarded. The envelope was lowpass filtered (zero-phase Butterworth filter, 72 dB/oct roll-off) using a cutoff frequency of either 5 (“1B-ENV-LP5Hz”), 20 (“1B-ENV-LP20Hz”), or 150 Hz (“1B-ENV-LP150Hz”). The resulting envelope was then used to modulate a Gaussian white noise. The modulated noise was frequency-limited by filtering with the same bandpass filter used in the original analysis band. All vocoded signals were equated in long-term rms power.

2. Results

The last experiment explored the ability to discriminate habitats as in Experiment Ia, but for stimuli processed through an 8-band or 1-band noise vocoder. Twelve trained listeners participated in this experiment. The mean results across listeners are shown in Fig. 7.

FIG. 7.

Mean data of Experiment IV. Discrimination scores for changes in habitat when stimuli are noise vocoded. In each panel, gray bars show the mean scores across participants (n = 12). Error bars show ±1 standard deviation about the mean. The thick horizontal dashed line shows chance level (33% correct discrimination). The thick horizontal dashed-dotted line shows the mean discrimination score across listeners for the 8B-UNP condition. The thin horizontal dotted line shows the mean discrimination score across listeners for the 8B-ENV-LP150Hz condition.

a. 8-band vocoder conditions.

For each condition, discrimination scores were significantly above chance (Student t tests, all p < 0.0001). The top panel of Fig. 7 shows the effects of removing temporal fine-structure cues and filtering temporal-envelopes in each band. Overall, mean discrimination scores were comparable across conditions. Student t tests with Bonferroni corrections were run on the six “8B” conditions (8B-UNP, 8B-ENV-LP150Hz, 8B-ENV-LP5Hz, 8B-ENV-LP20Hz, 8B-ENV-HP5Hz, 8B-ENV-HP20Hz) to assess (i) the effect of removing temporal fine-structure cues and (ii) the effect of degrading temporal-envelope cues. Comparison between 8B-UNP and 8B-ENV-LP150Hz shows that removing temporal fine-structure cues had no significant effect on discrimination scores (p = 0.83). Comparison between 8B-ENV-LP150Hz and each of the four remaining “filtered” conditions (8B-ENV-LP5Hz, 8B-ENV-LP20Hz, 8B-ENV-HP5Hz, 8B-ENV-HP20Hz) shows that lowpass or highpass filtering the temporal-envelope had no significant effects on discrimination scores (all p > 0.1).

The leftmost bottom panel of Fig. 7 shows the effect of removing temporal-envelope cues while preserving long-term spectral cues (8B-FlatEnv) or, conversely, removing long-term spectral cues while preserving temporal-envelope cues (8B-EqualRms-Env-LP150Hz) in each of the eight bands of the noise vocoder. Student t tests with Bonferroni corrections were run on three conditions (8B-Env-LP150Hz, 8B-FlatEnv, and 8B-EqualRms-Env-LP150Hz) to assess the importance of long-term spectral cues. Comparison between 8B-Env-LP150Hz and 8B-FlatEnv shows that entirely removing temporal-envelope cues in each of the eight bands had no significant effect on discrimination scores (p = 1.0). Comparison between 8B-Env-LP150Hz and 8B-EqualRms-Env-LP150Hz shows that removing long-term spectral cues in each of the eight bands had a clear detrimental effect on discrimination scores (p < 0.0001), with a drop in performance of 16 percentage points. As can be seen in Fig. 7, discrimination scores were significantly above chance for all conditions, including 8B-FlatEnv (no temporal-envelope cues; long-term spectral cues preserved) and 8B-EqualRms-Env-LP150Hz (no long-term spectral cues; temporal-envelope cues preserved). This suggests that both long-term spectral cues and temporal-envelope cues could be used by our listeners to discriminate natural soundscapes. However, discrimination scores were significantly higher in the 8B-FlatEnv condition than in the 8B-EqualRms-Env-LP150Hz condition (p < 0.0001), indicating that long-term spectral cues played a greater role than temporal-envelope cues in discrimination.

b. 1-band vocoder conditions.

The rightmost bottom panel of Fig. 7 shows discrimination scores when spectral cues were entirely degraded, reflecting the capacity to use broadband temporal-envelope cues only (1B-ENV-LP5Hz, 1B-ENV-LP20Hz, 1B-ENV-LP150Hz). Discrimination scores were significantly above chance for only one condition, 1B-ENV-LP150Hz [Student t test, t(11) = 5.1; p < 0.001]. Although significant, performance remained low in this condition at 38.5% correct, a mere 5.5 percentage points above chance (33% correct).

Three computational models of human auditory perception were optimized and trained to classify soundscapes based on habitat, season, or period of the day. The classification scores (in % correct; chance level = 25%) obtained with an unseen set of samples were 88.6% for habitat, 88.9% for season, and 80% for period of the day. These scores are higher than those of Thoret et al. (2020), who reported classification scores of 63% for habitat, 62% for season, and 63% for period of the day using an SVM classifier. Discrimination scores (in % correct; chance level = 33%) were also obtained using a differencing decision rule to mimic the behavioural data from Experiment I. They were equal to 89.2% for habitat, 87.5% for season, and 82.5% for period of the day. It is noteworthy that the auditory model did not reach perfect classification and discrimination performance in these tasks. Simulation scores from the model, along with discrimination scores from human participants, are shown in Fig. 8 (left panel). As can be seen, the computational model systematically outperformed human listeners by 22–28 percentage points, suggesting that human listeners ignored potentially useful spectro-temporal information and/or were not able to process the available information optimally.
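A differencing decision rule of the kind used to turn the model's outputs into oddity responses can be illustrated as follows. How the study compares the model's internal representations is not fully specified here, so the Euclidean distance on per-interval embeddings is an assumption of this sketch.

```python
import numpy as np

def oddity_response(embeddings):
    """Differencing decision rule for the 3-interval oddity task: the two most
    similar intervals (by Euclidean distance between their representations)
    are taken as the 'same' pair, and the remaining interval is reported as
    the odd one. The distance metric is an illustrative assumption."""
    e = [np.asarray(v, float) for v in embeddings]
    d01 = np.linalg.norm(e[0] - e[1])
    d02 = np.linalg.norm(e[0] - e[2])
    d12 = np.linalg.norm(e[1] - e[2])
    # pairs (0,1), (0,2), (1,2); the interval excluded from the closest pair
    return [2, 1, 0][int(np.argmin([d01, d02, d12]))]
```

Applied trial by trial to the representations the network computes for the three intervals, this rule yields a percent-correct score directly comparable to the human oddity data.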

FIG. 8.

Simulation data plotted along the empirical results of Experiment I (left panel) and Experiment III (right panel). Left panel: Mean human data (crosses with error bars) are plotted along with simulated scores (filled circles). See Fig. 1 for details about figure legends. Right panel: mean human data (open circles and squares connected by continuous lines) are plotted along with simulated scores (filled circles and squares connected by dotted lines). See Fig. 6 for details about figure legends.


The auditory model trained on habitat classification was then used to perform discrimination on the degraded stimuli from Experiment III. The results of this simulation are also shown in Fig. 8 (right panel). As can be seen, discrimination scores steadily dropped as the cutoff frequency decreased (lowpass filter) or increased (highpass filter). The fact that scores in the highpass-filtered-at-0.5-kHz condition, and to a lesser extent in the lowpass-filtered-at-4-kHz condition, were much lower than those measured with the original (UNF) soundscapes indicates that the model used acoustic cues outside the 0.5–4 kHz range to discriminate natural soundscapes. Interestingly, the crossover frequency for lowpass and highpass filtering was slightly lower than 1 kHz for the model. This contrasts with the results obtained in human listeners (i.e., 1.8 kHz). This discrepancy may well reflect divergent strategies implemented by the model and the human listeners. The strategy of the model may have favored relatively more invariant cues, more appropriate for classification. Indeed, low-frequency cues (<1 kHz) have often been associated with wind (Bradley et al., 2003; Sapozhnykov, 2019) or rain (Bedoya et al., 2017; Sanchez-Giraldo et al., 2020). These geophony cues, however, were perhaps too subtle for human listeners, who “chose” to focus on more easily identifiable cues such as biophony (see Discussion).

Finally, the auditory model trained on habitat classification was also used to perform discrimination on the degraded stimuli from Experiment IV. The results of this simulation are shown in Fig. 9. It can be seen in the upper panel of Fig. 9 that simulated scores were much higher in the 8B-UNP (72%) than in the 8B-Env-LP150Hz (56%) condition. Thus, reducing frequency resolution to eight channels, or in other words, decreasing the amount of temporal-envelope information delivered to the auditory model, substantially altered its performance. It may also be the case that the intrinsic random envelope fluctuations conveyed by the noise carrier contributed to this decrease in performance. Lower performance in the 8B-Env-LP5Hz, 8B-Env-LP20Hz, 8B-Env-HP5Hz, 8B-Env-HP20Hz, and 8B-FlatEnv conditions compared to the 8B-Env-LP150Hz condition suggests that, unlike human listeners, the model used both slow and fast temporal-envelope information. The lowest score (39%) was obtained when temporal envelopes were highpass filtered at 5 Hz, indicating that the slowest (<5 Hz) AM cues, presumably related to biophony (e.g., bird vocalisations), may have played a greater role than faster ones in the model's decisions. Comparison between the model and human data also illustrates an inherent limitation of neural networks: once trained, their weights are no longer updated, meaning that they cannot adapt to change without additional training. This is particularly apparent in the 8B-ENV-LP/HP conditions, where the human listeners generally outperformed the model, indicating that the cues present in the degraded stimuli were sufficient to achieve performance well above that of the model. Performance remained well above chance level in the 8B-EqualRms-ENV-LP150Hz condition (45%), further suggesting that the auditory model used temporal-envelope information (leftmost bottom panel).
However, this score was 11 percentage points below that obtained in the 8B-Env-LP150Hz condition, indicating that the model may also have used long-term spectral cues, as human listeners did. Performance was comparable in the 8B-FlatEnv and 8B-EqualRms-ENV-LP150Hz conditions (41% and 45%, respectively), suggesting that the model did not place as much emphasis on long-term spectral cues as human listeners did. Interestingly, the model's performance was close to chance level for all 1B conditions except 1B-Env-LP150Hz (41%), which is in line with the human data. Taken together, the empirical and modelling data suggest that narrowband temporal-envelope cues may provide useful information for natural soundscape discrimination, while broadband temporal-envelope cues are not heavily weighted in the decision process. One interpretation is that the latter are too distorted by the noise vocoder to constitute usable information. In contrast to humans, however, narrowband temporal-envelope cues, and especially the slowest ones, may have played a stronger role than long-term spectral cues for the auditory model.

FIG. 9.

Simulation data plotted along the empirical results of Experiment IV. Mean human data (gray bar with error bars) are plotted along with simulated scores (filled circles). See Fig. 7 for details about figure legends.


Perhaps the most remarkable result of the present study is that listeners show relatively high sensitivity for discrimination of habitat, season, and period of the day (d′ scores ranged between 0.7 and 4). In other words, listeners can discriminate natural soundscapes based on differences in habitat, season, or period of the day from only a 2 s sample. Despite large acoustic variability across these natural soundscapes, listeners were also fairly consistent. This may seem surprising given that most of our listeners were born and lived in urban environments. As shown by the questionnaire, the 21 listeners who participated in the first experiment had accumulated on average 5.8 years (range: 0.1–17.3 years) in situations where they could have been exposed to natural soundscapes since birth. Given that they were aged between 22 and 33 years, this listening experience amounts to roughly only 23% of their entire life (range: 0.3%–69%). Therefore, one may assume that little to no exposure to natural sounds is needed to discriminate soundscapes (as implemented in the present study) with relatively good accuracy. Overall, sensitivity to habitat changes was relatively comparable in Experiments Ia and II, with only the night condition in Experiment II (excluding the winter season) yielding higher sensitivity. Thus, stimulus variability associated with changes in season and period of the day did not impact auditory sensitivity to changes in habitat, suggesting that natural soundscape discrimination is an ancestral and robust capacity of the human auditory system. It is important to keep in mind that the current data were collected for a single temperate terrestrial biome on the American continent. Further work is warranted to assess whether these conclusions generalize to other, potentially quite different, terrestrial biomes (e.g., tropical and sub-tropical biomes).

The 21 naive listeners who participated in Experiment Ia achieved a mean score of 64.3% (SD = 4.6 percentage points) for habitat discrimination. Despite 2–12 h of experience with habitat-discrimination tasks, the 10 listeners who later also participated in Experiment III achieved a comparable score (64.8%, SD = 8.6 percentage points) for habitat (UNF condition). A Student t test for independent samples showed that discrimination scores did not differ significantly between Experiment Ia and Experiment III (UNF) [t(29) = –0.22; p = 0.83]. This absence of significant improvement after hours of practice suggests that the task and stimuli used in the present study were not susceptible to training effects. The absence of correlation between habitat-discrimination scores and lifelong exposure to natural soundscapes, as measured by our questionnaire, further suggests that this lack of a training effect cannot simply be attributed to the relatively short practice provided in the study. Clearly, additional work is needed to determine whether the present results are limited to the task and stimuli used here or reflect a general imperviousness of our auditory capacities to training in habitat discrimination. For example, one could compare the current data with scores collected from listeners who have developed expertise with natural soundscapes, such as ornithologists.
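The t statistic quoted above can be recovered from the summary statistics alone. A minimal sketch of the pooled-variance Student t test for independent samples, using the means, SDs, and group sizes reported in the text:

```python
import math

def students_t(m1, s1, n1, m2, s2, n2):
    """Two-sample Student t test from summary statistics (pooled variance).
    Returns (t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df

# Experiment Ia (n = 21) vs Experiment III, UNF condition (n = 10)
t, df = students_t(64.3, 4.6, 21, 64.8, 8.6, 10)
# t comes out near -0.21 with df = 29, matching the reported t(29) = -0.22
# (the small discrepancy reflects rounding of the published summary statistics)
```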

The results of the audio-filtering experiments (Experiment III) indicate that listeners used acoustic cues over the entire audible frequency range when discriminating habitats. However, they seemed to assign a higher weight to acoustic cues between 1 and 3 kHz. Interestingly, differences in habitat acoustics, and especially differences between open and closed environments, have been associated with large spectral changes in the mid-frequency range (e.g., Morton, 1975; Swearingen and White, 2007). Such a dichotomy existed in the stimulus set used in the present study: the set included two open environments [a grassland (SY) and a chaparral (SH)] and one closed environment [a forest (BF)]. The meadow (CM) may be a special case, as it corresponds to an open environment within a closed one. The spectral differences between these environments are illustrated in Fig. 10, which shows the long-term power spectra of the four habitats (computed for a subset of 160 2 s samples per habitat). The greater weight attributed by our listeners to acoustic cues in the mid-frequency range when performing habitat discrimination may therefore reflect the dominant role played by biophony and habitat acoustics in decision making. This suggests that human soundscape discrimination may be based on identifiable events, such as bird vocalizations or insect chirps, but also on global acoustic properties of the soundscape. In other words, both strategies may have been used to perform the task, depending on the characteristics of the recordings. For instance, subjects may have relied more heavily on biophony when it was present in some of the intervals, and on global acoustic properties of the soundscape when biophony was absent or non-distinctive.
However, human listeners also assign a greater weight to acoustic cues in the mid-frequency range for the identification of speech sounds (e.g., Studebaker et al., 1987; Bell et al., 1992; Ardoint and Lorenzi, 2010) and urban sounds (Gygi et al., 2004). Thus, the current data may reflect general and relatively stable decision strategies rather than specialized and adaptive ones. Further work is needed to clarify this issue by, for example, comparing perceptual weighting functions for soundscapes associated with other terrestrial biomes (e.g., deserts, savannah, boreal/tropical/sub-tropical forests).
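Long-term power spectra like those of Fig. 10 can be approximated by averaging short-time power spectra over the concatenated samples. The sketch below uses a naive DFT for transparency; a real analysis would use an FFT library, and the frame length here is an assumption for illustration.

```python
import cmath, math

def frame_power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (illustrative only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec

def long_term_spectrum(signal, frame_len=256):
    """Average the power spectra of consecutive frames (Welch-style, no overlap)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    avg = [0.0] * (frame_len // 2 + 1)
    for fr in frames:
        for k, p in enumerate(frame_power_spectrum(fr)):
            avg[k] += p / len(frames)
    return avg

# Synthetic check: a 1 kHz tone sampled at 8 kHz peaks at bin 1000/8000*256 = 32
fs, f0 = 8000, 1000
sig = [math.sin(2 * math.pi * f0 * t / fs) for t in range(2048)]
spec = long_term_spectrum(sig)
peak_bin = max(range(len(spec)), key=lambda k: spec[k])  # expected: 32
```

Averaging periodograms across frames is what makes the estimate "long-term": transient events are smoothed out, leaving the gross spectral shape that the listeners appear to rely on.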

The results of the modulation-filtering experiment (Experiment IV) suggest that human listeners rely on long-term spectral cues rather than temporal-envelope and temporal fine-structure cues in the habitat-discrimination task. Indeed, removing the temporal fine structure and/or the low-rate or high-rate temporal-envelope information did not significantly affect soundscape-discrimination performance. On the other hand, removing long-term spectral cues degraded performance substantially. These results are somewhat inconsistent with the conclusions drawn from the modelling study conducted by Thoret et al. (2020). They are also inconsistent with the outcome of identification tasks performed with speech and urban sounds (e.g., Shannon et al., 1995; Gygi et al., 2004). Overall, they indicate that human listeners do not use broadband temporal-envelope cues in habitat discrimination, probably because these cues are too distorted. However, they also show that narrowband temporal-envelope cues may be used in the absence of long-term spectral cues, although discrimination performance is then not as good as when long-term spectral cues are available. The limited contribution of acoustic temporal-envelope and temporal fine-structure cues observed in the present study is all the more surprising considering that these cues have long been associated with the perception of pitch, onsets/offsets, and rhythmic patterns, which are known to play an important role in auditory scene analysis (see Bregman, 1990; Moore and Gockel, 2012) and are considered salient features of animal vocalizations (see Gerhardt and Huber, 2002; Marler and Slabbekoorn, 2004; Catchpole and Slater, 2008). The limited contribution of temporal-envelope cues is also surprising given their role in the auditory perception of textures (McDermott and Simoncelli, 2011), which correspond to important aspects of geophony (e.g., wind, rain, streams) and biophony (dawn and dusk choruses) in natural soundscapes.
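For readers unfamiliar with the temporal envelope targeted by such modulation filtering, it can be crudely illustrated by rectification and smoothing. The actual experiment used more sophisticated (filterbank-based) processing, so the following is only a conceptual sketch with an assumed smoothing-window length:

```python
import math

def envelope(signal, win=64):
    """Crude temporal-envelope extraction: full-wave rectify, then smooth
    with a moving average (the window length sets the envelope cutoff)."""
    rect = [abs(x) for x in signal]
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - win // 2), min(len(rect), i + win // 2)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    return env

# A constant-amplitude 1 kHz tone (fs = 8 kHz) has a flat envelope:
fs = 8000
sig = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1024)]
env = envelope(sig)
# away from the edges, env is constant (the mean of the rectified tone),
# i.e., the fast fine structure has been smoothed away
```

Discarding `env` while keeping the carrier, or vice versa, is the logic behind selectively removing envelope or fine-structure cues from a stimulus.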

FIG. 10.

Long-term power spectra of a subset of 160 concatenated 2 s soundscape samples from habitats CM (thick gray dots), BF (thick black line), SY (thin black dots), and SH (thin black line). For each habitat, the 160 samples (2 s long) correspond to 10 samples for each of the 16 combinations of season and period of the day.

Taken together, the present findings suggest that, for habitat discrimination, human observers can use long-term spectral cues and, to a lesser degree, narrowband temporal-envelope cues, but that they adopt a “veto” rule excluding temporal information when long-term spectral cues are available. Thus, the present results are globally consistent with the notion that human listeners preferentially use differences in long-term spectral cues, i.e., the sensory cues based on the neural excitation pattern evoked by soundscapes in auditory-nerve fibers (brightness of timbre), when discriminating habitats from a temperate biome. Note that these conclusions should not be generalized to the discrimination of season and period of the day. The large variations in biological activity across seasons and periods of the day suggest that both gross and fine spectro-temporal cues associated with animal vocalizations might play a greater role in those cases. Further work is warranted to clarify this important point.

Comparison between the behavioural and modelling data suggests that human listeners do not use (or do not have the capacity to use) all the available spectro-temporal sensory information when discriminating natural soundscapes. Habitat, season, and period of the day were classified and discriminated with great accuracy (close to 90% correct) by a deep neural network using “perceptual” spectro-temporal (AM) cues, whereas human listeners' performance hardly reached 70% correct (not including the night conditions from Experiment II). The computational model used acoustic cues over the entire audible frequency range, including relatively low-frequency information presumably associated with geophony (e.g., wind, rain, and streams) and mid-to-high-frequency information presumably associated with biophony (here, mostly birds and insects). Both long-term spectral cues and temporal-envelope cues were used by the auditory model, with slow (<5 Hz) temporal-envelope cues playing a greater role than fast ones. In contrast, human listeners seemed to apply a veto rule when monitoring long-term spectral cues and temporal-envelope cues in natural soundscape discrimination, excluding the (presumably noisy) temporal-envelope cues when long-term spectral cues were available. Moreover, human listeners weighted mid-frequency cues more heavily than the model did, suggesting that humans favor cues most likely associated with biophony and habitat acoustics. Still, internal noise, imperfect memory, and attention may also contribute to the sub-optimal behaviour of humans. Note that the finding of sub-optimal use of temporal cues and greater importance of long-term spectral cues could be conditioned by the current database (a single temperate biome in California), the choice of stimulus parameters (e.g., 2 s samples), methods (e.g., use of feedback), and/or task (forced-choice discrimination). This encourages replication and extension of this study to other terrestrial biomes and with more “ecologically valid” experimental paradigms (see Schmuckler, 2001; Krakauer et al., 2017; Holleman et al., 2020).

Habitat choice can have considerable consequences for the fitness of living organisms: wrong decisions, i.e., selecting poor-quality habitat, can lead to reduced survival and reproduction (Hale and Swearer, 2016). For this reason, selective pressure on habitat choice is strong and the ability to gather information on habitats is essential (e.g., Doligez and Boulinier, 2008). In that respect, efficient auditory discrimination of natural soundscapes should be favored by natural selection. The current study demonstrates that humans with limited experience of natural environments show high auditory sensitivity in behavioural tasks requiring discrimination between pristine habitats. Comparing this auditory ability across non-human species using behavioural paradigms like the one used here may further our understanding of habitat-selection processes.

Over the last decades, research in soundscape ecology and ecoacoustics has demonstrated repeatedly that monitoring soundscapes allows efficient assessment of ecological processes and human impacts on ecosystems (for reviews, see Sueur and Farina, 2015; Farina and Gage, 2017; Buxton et al., 2018). Further exploring the capacities of biological organisms to perceive soundscapes, by clarifying the auditory cues and mechanisms used by these organisms to perceive changes in natural soundscapes, should benefit ecological acoustics and may lead to the development of new (bio-inspired) algorithms and metrics. Here, we showed that humans are relatively sensitive to acoustic differences between natural soundscape recordings associated with changes in habitat, season, and period of the day. Still, the human auditory system proved sub-optimal in these discrimination tasks. To optimize their performance, classification algorithms used by ecoacousticians might not take into account the entire (i.e., raw) signal but rather focus information processing on (i) long-term (gross) spectral cues in the mid-to-high frequency range (1–15 kHz) and (ii) narrowband AM cues (below 150 Hz) conveyed by biophony in the same frequency range. Spectro-temporal fine-structure cues in the low-to-mid frequency range (<1–3 kHz) may also be taken into account.

The present undertaking was inspired by two pioneering studies by Nelken et al. (1999) and Fay (2009). The former noted that auditory processing of natural sounds has been studied almost exclusively in the context of species-specific vocalisations, even though these constitute only a small fraction of the acoustic biotope. Fay (2009) took up and further elaborated this view by pointing out that most hearing scientists have focused on the adaptive value of species-specific communication sounds (e.g., vocalisations of non-human animals, speech) and have not taken into account the massive informational value of so-called “ambient noises” or “natural noises,” which, in accordance with their denomination, are generally considered the part of the acoustic scene to be suppressed or ignored. A tangible example of this way of thinking is embodied in hearing aids and cochlear implants, whose signal-processing resources are essentially designed to enhance the speech signal and attenuate environmental sounds, whatever their nature (whether urban noise or natural soundscapes). Fay (2009) particularly emphasized that hearing scientists are by and large unaware of the nature of the soundscapes that affect most species (including humans) and of the adaptive value of processing these natural soundscapes. In his view, natural soundscapes form a kind of “acoustic daylight” containing information that all organisms can potentially use to construct perceptual representations of their environment. This view is not widely shared; even important opinion articles calling for more “natural hearing research” (e.g., Theunissen and Elie, 2014) maintain that the goal is to better understand how communication signals are perceived within natural soundscapes, rather than shifting research priorities to the auditory processing of natural soundscapes per se.
However, a wealth of research in the field of soundscape ecology (Farina and Gage, 2017) has revealed that natural soundscapes are highly structured and potentially information rich. This was also demonstrated by neuroscientific and psychophysical studies characterising the statistics of natural scenes (e.g., Attias and Schreiner, 1997; Singh and Theunissen, 2003; McDermott and Simoncelli, 2011; Traer and McDermott, 2016). In line with the “efficient coding” hypothesis, other studies demonstrated optimized auditory processing for acoustic signals typically encountered in natural environments (e.g., Nelken et al., 1999; Lewicki, 2002; Lesica and Grothe, 2008). Fay (2009) hypothesised that the ability to use natural soundscape information should be shared by all vertebrate species, including humans. The present psychophysical and modelling studies, along with the study of Thoret et al. (2020), follow the direction set by Nelken et al. (1999) and Fay (2009), providing the first assessment of the human ability to discriminate natural soundscapes. These early results suggest that humans are quite sensitive to the information conveyed by natural soundscapes and that this information contributes to our sense of space and time.

  1. Human listeners showed high consistency and high sensitivity when discriminating habitat, season, and period of the day in short samples of natural soundscapes recorded in the same temperate terrestrial biome.

  2. Training listeners for up to 10 h had no effect on habitat-discrimination capacities.

  3. Changes in habitat, season, and period of the day were discriminated accurately by an optimized deep neural network using “perceptual” spectro-temporal (AM) cues extracted by an auditory model. The optimal performance serves as a benchmark against which human performance can be compared.

  4. Behavioural data indicated that human listeners use acoustic cues across the audible frequency range to discriminate habitats but use the information in a non-optimal way, weighting acoustic cues between 1 and 3 kHz more heavily.

  5. Additional behavioural and modelling data indicated that human listeners do not use the available temporal fine-structure and temporal-envelope cues for habitat discrimination. Instead, human listeners based their decisions on gross spectral cues (i.e., brightness of timbre) related to biological activity and/or habitat acoustics. Temporal-envelope cues were used only when gross spectral cues were degraded. These results suggest that for habitat discrimination, human listeners may use some form of veto rule when monitoring long-term spectral cues and temporal-envelope cues conveyed by natural soundscapes, excluding (the presumably noisy) temporal-envelope cues when long-term spectral (brightness) cues are available.

This work was supported by ANR-17-EURE-0017 and ANR-20-CE28-0011. The authors wish to thank Richard McWalter, Elie Grinfeder, Pavel Zahorik, and two reviewers for their valuable input on this work. This study was approved by the national ethical committee CPP Ile de France III (Am8618-1-S.C.3460; N° EUDRACT: 2016-A01769-42).

1

See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0017972 for the questionnaire used to assess lifetime exposure to natural soundscapes and 16 examples of sound recordings. These recordings correspond to stimuli used in the experiments and are organized in four sets as follows: Set 1 (four stimuli): season (summer) and period of the day (T1) are fixed; habitat is varied. Set 2 (four stimuli): habitat (SH) and season (spring) are fixed; period of the day is varied. Set 3 (2 × 4 stimuli): habitat (BF or CM) and period of the day (T1) are fixed; season is varied.

1. Ardoint, M., and Lorenzi, C. (2010). “Effects of lowpass and highpass filtering on the intelligibility of speech based on temporal fine structure or envelope cues,” Hear. Res. 260, 89–95.

2. Attias, H., and Schreiner, C. E. (1997). “Temporal low-order statistics of natural sounds,” in Advances in Neural Information Processing Systems (NIPS, Denver, USA), Vol. 9.

3. Axelsson, O., Nilsson, M. E., and Berglund, B. (2010). “A principal components model of soundscape perception,” J. Acoust. Soc. Am. 128, 2836–2846.

4. Bedoya, C., Isaza, C., Daza, J. M., and Lopez, J. D. (2017). “Automatic identification of rainfall in acoustic recordings,” Ecol. Indic. 75, 95–100.

5. Bell, T. S., Dirks, D. D., and Trine, T. D. (1992). “Frequency-importance functions for words in high- and low-context sentences,” J. Speech. Lang. Hear. Res. 35, 950–959.

6. Bradbury, J. W., and Vehrencamp, S. L. (2011). Principles of Animal Communication (Sinauer Associates, Sunderland, MA).

7. Bradley, S., Wu, T., von Hünerbein, S., and Backman, J. (2003). “The mechanisms creating wind noise in microphones,” in Audio Engineering Society 114th Convention, March 22–25, Amsterdam, The Netherlands, paper 5718, pp. 1–9.

8. Bregman, A. S. (1990). Auditory Scene Analysis (MIT Press, Cambridge, MA).

9. Buxton, R. T., McKenna, M. F., Clapp, M., Meyer, E., Stabenau, E., Angeloni, L. M., Crooks, K., and Wittemyer, G. (2018). “Efficacy of extracting indices from large-scale acoustic recordings to monitor biodiversity,” Conserv. Biol. 32, 1174–1184.

10. Catchpole, C. K., and Slater, P. J. B. (2008). Bird Song: Biological Themes and Variations (Cambridge University Press, Cambridge).

11. Doligez, B., and Boulinier, T. (2008). “Habitat selection and habitat suitability preferences,” in Encyclopedia of Ecology, edited by S. E. Jørgensen and B. D. Fath (Elsevier, Oxford), Vol. 5, pp. 1810–1830.

12. Farina, A., and Gage, S. H. (2017). Ecoacoustics: The Ecological Role of Sounds (John Wiley & Sons, Hoboken, NJ).

13. Fay, R. (2009). “Soundscapes and the sense of hearing of fishes,” Integr. Zool. 4, 26–32.

14. Filipan, K., De Coensel, B., Aumond, P., Can, A., Lavandier, C., and Botteldooren, D. (2019). “Auditory sensory saliency as a better predictor of change than sound amplitude in pleasantness assessment of reproduced urban soundscapes,” Build. Environ. 148, 730–741.

15. Forrest, T. G. (1994). “From sender to receiver: Propagation and environmental effects on acoustic signals,” Am. Zool. 34, 644–654.

16. Frijters, J. E. R. (1979). “Variations of the triangular method and the relationship of its unidimensional probabilistic models to three-alternative forced-choice signal detection theory models,” Br. J. Math. Stat. Psychol. 32, 229–241.

17. Frijters, J. E. R., Kooistra, A., and Vereijken, P. F. G. (1980). “Tables of d′ for the triangular method and the 3-AFC signal detection procedure,” Percept. Psychophys. 27, 176–178.

18. Gage, S. H., and Axel, A. C. (2014). “Visualization of temporal change in soundscape power of a Michigan lake habitat over a 4-year period,” Ecol. Inform. 21, 100–109.

19. Gerhardt, H. C., and Huber, F. (2002). Acoustic Communication in Insects and Anurans: Common Problems and Diverse Solutions (University of Chicago Press, Chicago).

20. Gould van Praag, C., Garfinkel, S. N., Sparasci, O., Mees, A., Philippides, A. O., Ware, M., Ottaviani, C., and Critchley, H. D. (2017). “Mind-wandering and alterations to default mode network connectivity when listening to naturalistic versus artificial sounds,” Sci. Rep. 7, 45273.

21. Griest-Hines, S. E., Bramhall, N. F., Reavis, K. M., Theodoroff, S. M., and Henry, J. A. (2021). “Development and initial validation of the lifetime exposure to noise and solvents questionnaire in U.S. service members and veterans,” Am. J. Audiol. 30, 810–824.

22. Grinfeder, E., Lorenzi, C., Haupert, S., and Sueur, J. (2022). “What do we mean by ‘soundscape’? A functional description,” Front. Ecol. Evol. 10, 894232.

23. Gygi, B., Kidd, G., and Watson, C. (2004). “Spectral-temporal factors in the identification of environmental sounds,” J. Acoust. Soc. Am. 115, 1252–1265.

24. Hale, R., and Swearer, S. E. (2016). “Ecological traps: Current evidence and future directions,” Proc. R. Soc. B 283, 20152647.

25. Hauser, M. D. (1996). The Evolution of Communication (MIT Press, Cambridge, MA).

26. Hautus, M. J., Macmillan, N. A., and Creelman, C. D. (2021). Detection Theory: A User's Guide (Routledge, Milton Park, Abingdon-on-Thames, Oxfordshire, UK).

27. Holleman, G. A., Hooge, I. T. C., Kemner, C., and Hessels, R. S. (2020). “The ‘real-world approach’ and its problems: A critique of the term ecological validity,” Front. Psychol. 11, 721.

28. Irwin, A., Hall, D. A., Peters, A., and Plack, C. J. (2011). “Listening to urban soundscapes: Physiological validity of perceptual dimensions,” Psychophysiology 48, 258–268.

29. Johnson, T. A., Cooper, S., Stamper, G. C., and Chertoff, M. (2017). “Noise Exposure Questionnaire (NEQ): A tool for quantifying annual noise exposure,” J. Am. Acad. Audiol. 28, 14–35.

30. Keidser, G., Naylor, G., Brungart, D. S., Caduff, A., Campos, J., Carlile, S., Carpenter, M. G., Grimm, G., Hohmann, V., Holube, I., Launer, S., Lunner, T., Mehra, R., Rapport, F., Slaney, M., and Smeds, K. (2020). “The quest for ecological validity in hearing science: What it is, why it matters, and how to advance it,” Ear Hear. 41, 5S–19S.

31. Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., and Poeppel, D. (2017). “Neuroscience needs behavior: Correcting a reductionist bias,” Neuron 93, 480–490.

32. Krause, B. (1987). “Bioacoustics, habitat ambience in ecological balance,” Whole Earth Rev. 57, 14–18.

33. Krause, B. (2016). Wild Soundscapes: Discovering the Voice of the Natural World (Yale University Press, New Haven, CT).

34. Krause, B., Gage, S. H., and Joo, W. (2011). “Measuring and interpreting the temporal variability in the soundscape at four places in Sequoia National Park,” Landscape Ecol. 26, 1247–1256.

35. Lesica, N. A., and Grothe, B. (2008). “Efficient temporal processing of naturalistic sounds,” PLoS One 3, e1655.

36. Lewicki, M. S. (2002). “Efficient coding of natural sounds,” Nat. Neurosci. 5, 356–363.

37. Macmillan, N. A., and Creelman, C. D. (2005). Detection Theory: A User's Guide, 2nd ed. (Psychological Press, New York).

38. Marler, P., and Slabbekoorn, H. (2004). Nature's Music: The Science of Birdsong (Elsevier Academic Press, San Diego).

39. McDermott, J. H., and Simoncelli, E. P. (2011). “Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis,” Neuron 71, 926–940.

40. Moore, B. C. J., and Gockel, H. (2012). “Properties of auditory stream formation,” Philos. Trans. R. Soc. B 367, 919–931.

41. Morton, E. S. (1975). “Ecological sources of selection on avian sounds,” Am. Naturalist 109, 17–34.

42. Nelken, I., Rotman, Y., and Bar Yosef, O. (1999). “Responses of auditory-cortex neurons to structural features of natural sounds,” Nature 397, 154–157.

43. Neuhoff, J. (2004). Ecological Psychoacoustics (Elsevier Academic Press, Cambridge, MA).

44. Pijanowski, B. C., Villanueva-Rivera, L. J., Dumyahn, S. L., Farina, A., Krause, B. L., Napoletano, B. M., Gage, S. H., and Pieretti, N. (2011). “Soundscape ecology: The science of sound in the landscape,” Bioscience 61, 203–216.

45. Qi, J., Gage, S., Joo, W., Napoletano, B., and Biswas, S. (2008). “Soundscape characteristics of an environment: A new ecological indicator of ecosystem health,” in Wetland and Water Resource Modelling and Assessment, edited by W. Ji and W. S. Ji (CRC Press, New York), pp. 201–211.

46. Raimbault, M. (2006). “Qualitative judgments of urban soundscapes: Questioning questionnaires and semantic scales,” Acta Acust. united Acust. 92, 929–937.

47. Rodriguez, A., Gasc, A., Pavoine, S., Grandcolas, P., Gaucher, P., and Sueur, J. (2014). “Temporal and spatial variability of animal sound within a neotropical forest,” Ecol. Inform. 21, 133–143.

48. Sanchez-Giraldo, C., Bedoya, C. L., Moran-Vasquez, R. A., Isaza, C. V., and Daza, J. M. (2020). “Ecoacoustics in the rain: Understanding acoustic indices under the most common geophonic source in tropical rainforests,” Remote Sens. Ecol. Conserv. 6, 248–261.

49. Sapozhnykov, V. V. (2019). “Sub-band detector for wind-induced noise,” J. Sign. Process. Syst. 91, 399–409.

50. Schmuckler, M. A. (2001). “What is ecological validity? A dimensional analysis,” Infancy 2, 419–436.

51. Sethi, S. S., Jones, N. S., Fulcher, B. D., Picinali, L., Clink, D. J., Klinck, H., Orme, C. D. L., Wrege, P. H., and Ewers, R. M. (2020). “Characterizing soundscapes across diverse ecosystems using a universal acoustic feature set,” Proc. Natl. Acad. Sci. U.S.A. 117, 17049–17055.

52. Shafiro, V. (2008). “Identification of environmental sounds with varying spectral resolution,” Ear Hear. 29, 401–420.

53. Shannon, R. V., Zeng, F. G., Wygonski, J., Kamath, V., and Ekelid, M. (1995). “Speech recognition with primarily temporal cues,” Science 270, 303–304.

54. Singh, N. C., and Theunissen, F. E. (2003). “Modulation spectra of natural sounds and ethological theories of auditory processing,” J. Acoust. Soc. Am. 114, 3394–3411.

55. Studebaker, G. A., Pavlovic, C. V., and Sherbecoe, R. L. (1987). “A frequency importance function for continuous discourse,” J. Acoust. Soc. Am. 81, 1130–1138.

56. Sueur, J., and Farina, A. (2015). “Ecoacoustics: The ecological investigation and interpretation of environmental sound,” Biosemiotics 8, 493–502.

57. Sueur, J., Farina, A., Gasc, A., Pieretti, N., and Pavoine, S. (2014). “Acoustic indices for biodiversity assessment and landscape investigation,” Acta Acust. united Acust. 100, 772–781.

58. Sugai, L., Silva, T., Ribeiro, J., Jr., and Llusia, D. (2019). “Terrestrial passive acoustic monitoring: Review and perspectives,” BioScience 69, 15–25.

59. Swearingen, M. E., and White, M. J. (2007). “Influence of scattering, atmospheric refraction, and ground effect on sound propagation through a pine forest,” J. Acoust. Soc. Am. 122, 113–119.

60. Theunissen, F. E., and Elie, J. E. (2014). “Neural processing of natural sounds,” Nat. Rev. Neurosci. 15, 355–366.

61. Thoret, E., Varnet, L., Boubenec, Y., Ferriere, R., Le Tourneau, F.-M., Krause, B., and Lorenzi, C. (2020). “Characterizing amplitude and frequency modulation cues in natural soundscapes: A pilot study in four habitats of a biosphere reserve,” J. Acoust. Soc. Am. 147, 3260–3274.

62. Traer, J., and McDermott, J. H. (2016). “Statistics of natural reverberation enable perceptual separation of sound and space,” Proc. Natl. Acad. Sci. U.S.A. 113, E7856–E7865.

63. Varnet, L., Ortiz-Barajas, M. C., Erra, R. G., Gervain, J., and Lorenzi, C. (2017). “A cross-linguistic study of speech modulation spectra,” J. Acoust. Soc. Am. 142, 1976–1989.

64. Versfeld, N. J., Dai, H., and Green, D. M. (1996). “The optimum decision rules for the oddity task,” Percept. Psychophys. 58, 10–21.

Supplementary Material