This Letter proposes a frequency scaling for processing, storing, and sharing high-bandwidth, passive acoustic spectral data that optimizes data volume while maintaining reasonable data resolution. The format is a hybrid that uses 1 Hz resolution up to 455 Hz and millidecade frequency bands above 455 Hz. This hybrid is appropriate for many types of soundscape analysis, including detecting different types of soundscapes and regulatory applications like computing weighted sound exposure levels. Hybrid millidecade files are compressed compared to the 1 Hz equivalent such that one research center could feasibly store data from hundreds of projects for sharing among researchers globally.
1. Introduction
In the acoustic realm, the effects of anthropogenic sound on marine and terrestrial life have become a growing concern. A large amount of attention has been focused on characterizing the acoustic environment in which animals communicate, find mates, locate food, and listen for predators or prey. In addition to identifying the contribution of human sound sources, analyses have provided a means for understanding the influences of environmental parameters, such as diel patterns, wind, sea ice presence, and lunar cycles, on local acoustic processes (Miksis-Olds et al., 2013; Staaterman et al., 2014); assessing habitat quality and health on coral reefs (Parmentier et al., 2015; Lillis et al., 2018); comparing regions (Miller, 2008; Haver et al., 2019); and measuring biodiversity (Parks et al., 2014; Harris et al., 2016).
An important use of these analyses is detecting and interpreting changes in ecosystems by comparing acoustic metrics across time and space. An excellent example of this type of work is analyzing the change in sound levels due to changes in human activity during the COVID-19 pandemic (Lecocq et al., 2020; Thomson and Barclay, 2020). Ideally, all research groups would compute an agreed-upon set of temporal and frequency-based metrics that would then be shared to enable comparative and post hoc studies on ecologically relevant scales. The lack of standards, accepted guidelines, and tools for collecting, processing, and reporting natural sound levels could lead to misinterpreting results and developing governmental policies and regulations that may be too conservative or too liberal.
Multiple national and international entities have recognized the significance of standardizing soundscape analysis by convening cross-sector workshops of stakeholders to develop protocols and guidelines (e.g., International Whaling Commission, 2014; Consortium for Ocean Leadership, 2018; International Quiet Ocean Experiment, 2019). The workshops identified consensus items and recommendations to enable meaningful comparisons of acoustic environments. With respect to temporal resolution, 1-min averages are the minimum recommended analysis unit, with 1-s averages recommended where practical. With regard to frequency resolution, the workshops identified decidecades as the minimum acceptable frequency resolution, with 1 Hz bands preferred where practical. These recommended resolutions were developed to ensure that as many historical datasets and hardware systems as possible could meet the specified requirements. It was also recognized that the recommended guidelines should create smaller data packages for transfer and easy comparison between research teams, stakeholders, and projects.
As sampling rates on long-term acoustic recorders have increased into the 100s of kHz, the number of spectral bins has increased to the point that 1-s, 1 Hz power spectral density files for a multi-month recording can be terabytes in size, which makes them difficult to manage, transfer, and manipulate. The size of these files and the use of research group-specific formats inhibit easy exchange among researchers. A previously proposed compromise to reduce the size of the datasets that are stored and exchanged is to use 1 Hz frequency bands up to 1000 Hz and decidecade bands above that (International Quiet Ocean Experiment, 2019). Decidecade resolution, in which a factor of 10 in frequency (e.g., from 1–10 kHz) has 10 frequency bins, is suitable for many applications, such as quantifying weighted sound levels of human activities or comparing sound levels to the hearing capabilities of marine and terrestrial life. However, it is inadequate for quantifying abiotic information, such as wind speed and rainfall, because the spectral slope is not preserved at decidecade resolution (Vagle et al., 1990; Ma et al., 2005).
Millidecade frequency resolution is proposed as a solution that can reduce the size of the spectral data by a large factor without compromising the use of the data for the applications described previously. As with decidecades, millidecades are logarithmically spaced frequency bands, but each has a bandwidth equal to 1/1000th of a decade. Dividing the frequency scale using fractions of a decade instead of fractions of an octave was preferred as more intuitive and in line with recent international standards such as ISO 18405:2017 (International Organization for Standardization, 2017). Historically, the millidecade was called the savart and was used in the context of measuring musical intonation in the 1800s (Pikler, 1966; Op de Coul, 2015); we prefer the name millidecade as more descriptive.
This note provides a definition for hybrid millidecade analysis that is proposed as a standard for the exchange of high-resolution power spectral density data between research groups studying marine or terrestrial soundscapes. The sharing of soundscape data between research groups will accelerate research into the effects of human sound and climate change on the acoustic environment. The millidecade frequency resolution proposed here is a hybrid of the recommended and optimal guidelines that (1) increases data resolution over the decidecade recommendation, and (2) provides information in a format that allows users to calculate decidecade or user-specified band levels, while (3) maintaining a manageable data package size for transfer and comparison.
2. Definition of millidecades
Similar to decidecades, the center frequency for the ith millidecade ($f_{c,i}$) is defined as
$$f_{c,i} = f_0 \cdot 10^{i/1000},$$
where $f_0$ is a reference frequency and the band index i counts up or down from the reference. In accordance with IEC 61260-1:2014 (International Electrotechnical Commission, 2014), the standard reference frequency is 1000 Hz and i is 0 at this frequency. The lower ($f_{lo,i}$) and upper ($f_{hi,i}$) bounds for each millidecade are
$$f_{lo,i} = f_{c,i} \cdot 10^{-1/2000}, \qquad f_{hi,i} = f_{c,i} \cdot 10^{1/2000}.$$
There are 1000 millidecades in each frequency decade, where a decade is an increase in the frequency by a factor of 10. For discussion, we consider 1–10, 10–100, 100–1000, 1000–10 000, and 10 000–100 000 Hz decade bands. A pure millidecade presentation of a spectrum from 1–100 000 Hz has 5000 bands rather than one hundred thousand 1 Hz bands, which results in a 20:1 decrease in the amount of data required for storage or exchange. The lowest three decades each have 1000 millidecades, for a total of 3000 bands between 1 and 1000 Hz, which is greater than the one thousand 1 Hz bands. The lowest millidecades over-resolve (bin sizes <1 Hz) the space between 1 and 435 Hz for nearly all soundscape applications. To address this, we propose a hybrid solution that uses 1 Hz bands up to the point where the millidecades are 1 Hz wide, which occurs at 436 Hz. At 436 Hz, the center frequency of the millidecade is 435.6 Hz, which would mean the two bins at 435 and 436 Hz are ∼0.7 Hz wide, which is significantly narrower than all other bands. To avoid this odd transition, we propose that the last 1 Hz bin be 455 Hz because there is a millidecade centered at 456.04 Hz (see Table I). The first millidecade bands above the transition are ∼1.05 Hz wide, and all bands are at least 1 Hz wide, which is viewed as a preferable transition. For a 100 000 Hz spectrum, the total number of points in the hybrid millidecade representation is 2797, a 35:1 compression compared to the 1 Hz representation. For a 256 000 Hz spectrum, which is becoming a common size for recorders sampling at 512 000 Hz, there are 3206 hybrid millidecades, resulting in a compression ratio of 80:1. Because all bands are at least 1 Hz wide, bands may be referred to unambiguously using their center frequency rounded to the nearest integer, for example the “405 Hz band” (which is centered at 405 Hz) or the “501 Hz band” (which is centered at approximately 501.187 Hz).
Band start frequency (Hz) | Band center frequency (Hz) | Band end frequency (Hz) |
---|---|---|
0 | 0 | 0.5 |
0.5 | 1 | 1.5 |
1.5 | 2 | 2.5 |
2.5 | 3 | 3.5 |
… | … | … |
453.5 | 454 | 454.5 |
454.5 | 455 | 455.5 |
455.51 | 456.04 | 456.56 |
456.56 | 457.09 | 457.61 |
457.61 | 458.14 | 458.67 |
… | … | … |
996.55 | 997.70 | 998.85 |
998.85 | 1000 | 1001.15 |
1001.15 | 1002.31 | 1003.46 |
… | … | … |
9965.5 | 9977.0 | 9988.5 |
9988.5 | 10000 | 10012 |
10012 | 10023 | 10035 |
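The band widths in Table I can be checked with a short worked calculation. This is a sketch that assumes the base-10 band definition with a 1000 Hz reference frequency, $f_{c,i} = f_0 \cdot 10^{i/1000}$ with edges at $f_{c,i} \cdot 10^{\pm 1/2000}$; the symbol $\Delta f_i$ for the bandwidth is introduced here for illustration only.

$$
\Delta f_i = f_{hi,i} - f_{lo,i} = f_{c,i}\left(10^{1/2000} - 10^{-1/2000}\right) \approx 2.3026 \times 10^{-3}\, f_{c,i}
$$

$$
\Delta f_i = 1\ \text{Hz} \;\Rightarrow\; f_{c,i} \approx \frac{1}{2.3026 \times 10^{-3}} \approx 434.3\ \text{Hz}
$$

$$
f_{c,i} = 456.04\ \text{Hz} \Rightarrow \Delta f_i \approx 1.05\ \text{Hz}, \qquad
f_{c,i} = 1000\ \text{Hz} \Rightarrow \Delta f_i \approx 2.30\ \text{Hz}
$$

Under these assumptions, the millidecade bandwidth is a fixed fraction (about 0.23%) of the center frequency. It reaches 1 Hz near 434 Hz, close to the transition region discussed in the text, and the 1000 Hz row of Table I (998.85 to 1001.15 Hz) is about 2.30 Hz wide.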
The supplementary material¹ contains a MATLAB function called “getBandTable.m” that generates band centers and edges for any type of logarithmically spaced bands, including hybrid millidecades. The band definitions are passed into two other functions, “getBandSquaredSoundPressure.m” and “getMeanBandPowerSpectralDensity.m,” also in the supplementary material,¹ that provide a reference implementation for dividing a spectrum into logarithmically spaced bands.
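For readers without access to MATLAB, the hybrid band layout can be sketched in a few lines of Python. This is an illustrative sketch, not a port of “getBandTable.m”: the function names `millidecade_band` and `hybrid_millidecade_table` are hypothetical, and the rule for where the table stops (bands are kept while their center frequency does not exceed `f_max`) is an assumption. The total band count depends on that rule and on whether the 0 Hz row is counted, so it may differ by one from the counts quoted in the text.

```python
import math


def millidecade_band(i, f0=1000.0):
    """Return (low, center, high) in Hz for millidecade index i (i = 0 at f0)."""
    center = f0 * 10.0 ** (i / 1000.0)
    half_step = 10.0 ** (1.0 / 2000.0)  # half a millidecade, as a frequency ratio
    return center / half_step, center, center * half_step


def hybrid_millidecade_table(f_max, last_linear_hz=455, f0=1000.0):
    """Build a list of (low, center, high) tuples in Hz.

    1 Hz bands are used up to last_linear_hz, millidecades above that.
    Assumption: millidecades are kept while their center is <= f_max.
    """
    bands = [(0.0, 0.0, 0.5)]  # first row as printed in Table I
    for f in range(1, last_linear_hz + 1):
        bands.append((f - 0.5, float(f), f + 0.5))

    # Find the first millidecade whose lower edge does not fall below the
    # upper edge of the last 1 Hz band.
    top_of_linear = last_linear_hz + 0.5
    i = math.floor(1000.0 * math.log10(top_of_linear / f0))
    while millidecade_band(i, f0)[0] < top_of_linear:
        i += 1

    while millidecade_band(i, f0)[1] <= f_max:
        bands.append(millidecade_band(i, f0))
        i += 1
    return bands


table = hybrid_millidecade_table(20000.0)
for low, center, high in table[454:459]:
    print(f"{low:8.2f} {center:8.2f} {high:8.2f}")
```

The printed rows span the transition from the 1 Hz bands to the millidecades and can be compared against the corresponding rows of Table I.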
3. Example results
A 10-month-long recording (September 2018–July 2019) from the Emerald Basin southeast of Halifax, Canada (recorder location: 43.50 N, 62.87 W, 120 m deep) was processed to demonstrate the utility of millidecades. The recording was made with an AMAR G4 (JASCO Applied Sciences, Dartmouth, Canada) using an M36-V35-100 hydrophone (GeoSpectrum Technologies Inc., Dartmouth, Canada). The recording was sampled at 512 kHz for 1 of every 15 min. The data were analyzed using traditional 1 Hz bands as well as millidecades. The most distinct feature in the long-term spectral average (Fig. 1) is the fin whale (Balaenoptera physalus) chorus around 20 Hz that occurred from approximately 1 October 2018 to 1 April 2019. This location also had occasional ship passages and fishing vessels, but the primary sound sources were natural abiotic sources (wind, waves, and rain) as well as distant shipping.
The resulting binary spectral data file was 26.6 GB for the 1 Hz data but only 334 MB for the millidecade equivalent. Both files use single-precision floating-point numbers; see IEC 80000-13:2008 (International Electrotechnical Commission, 2008) for definitions of MB, GB, and TB.
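The reported file sizes can be related to the band counts with simple arithmetic. The sketch below assumes the files hold only raw single-precision values with no header overhead, uses the band counts quoted earlier for a 256 000 Hz spectrum (256 000 one-hertz bins and 3206 hybrid millidecade bands), and takes a hypothetical count of 26 000 stored spectra, chosen only because it roughly reproduces the reported sizes. The function name `spectral_file_bytes` is illustrative.

```python
BYTES_PER_FLOAT32 = 4  # single-precision floating point


def spectral_file_bytes(n_spectra, n_bands, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw payload size in bytes for n_spectra spectra of n_bands values each."""
    return n_spectra * n_bands * bytes_per_value


n_spectra = 26_000  # hypothetical number of stored spectra (assumption)
one_hz_bytes = spectral_file_bytes(n_spectra, 256_000)
hybrid_bytes = spectral_file_bytes(n_spectra, 3_206)
compression = one_hz_bytes / hybrid_bytes

# GB and MB are decimal units here (10**9 and 10**6 bytes).
print(f"1 Hz: {one_hz_bytes / 1e9:.1f} GB")
print(f"hybrid millidecade: {hybrid_bytes / 1e6:.0f} MB")
print(f"compression: {compression:.1f}:1")
```

The ratio does not depend on the number of spectra, only on the band counts, which is why it stays close to the 80:1 figure quoted for a 256 000 Hz spectrum.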
The distribution of the spectral data is shown in Fig. 2 using decidecade sound pressure level (SPL), millidecade-band, and 1 Hz band spectra. Each increase in frequency resolution provides more finely detailed information. The L5 (cyan) vertical lines in Fig. 2, panel C above 30 kHz are recorder self-noise with bandwidths of ∼1 Hz. Their amplitudes are low enough that when they are averaged into millidecades or decidecades, they are not visible in the results. The power spectral density figures (Fig. 2, panels B and C) include the relative spectral probability densities (Merchant et al., 2013) that provide an even finer-scale view of the occurrence of different sound levels than the percentile lines. The millidecade and 1 Hz spectral probability densities are virtually identical.
A common method for performing an initial review of large datasets is to review daily spectral average figures and identify periods with different types of sounds. Figure 3 compares the hybrid millidecade and 1 Hz representations of data from 2 October 2016 collected outside the Gully Marine Protected Area (MPA, 43.87 N, 58.00 W, 2000 m deep). Fin whales (around 20 Hz), vessels (10–1000+ Hz), pilot whales (whistles 2–10 kHz, echolocation clicks 25–50 kHz) and changes in wind-driven noise (200–5000 Hz) are equally detectable in the two representations.
The millidecade data format also accelerated data analysis. In another example, a 6.2 TB dataset sampled at 197 kHz with 24-bit resolution was processed using both the 1 Hz and hybrid millidecade methods. The analysis was performed with custom MATLAB software in two stages: (1) computing and storing intermediate spectra, and (2) post-processing the data to generate daily median spectra (Dugan et al., 2015). The intermediate 1 Hz spectral data were stored with a 1 s temporal resolution, whereas the hybrid millidecade data were stored at a 1 min temporal resolution. Computing and storing the 1 Hz spectra required 14 h, whereas the hybrid millidecade spectra required 8.5 h. The post-processing required 16 h for the 1 Hz data, but only 8 min for the hybrid millidecade data, for a total processing time acceleration of ∼3.5:1 (30 h versus about 8.6 h) to generate virtually the same results. The reason for this difference was identified as the reduced memory footprint, which accelerated loading the data and eliminated swapping data from memory to disk.
4. Discussion
An important element of any acoustic data processing standard or guideline is defining the unit of measure. When computing band levels, the obvious units of measure are either SPL (units of dB re 1 μPa²) or power spectral density, which is the mean-square sound pressure in the band divided by the bandwidth (units of dB re 1 μPa²/Hz). When sharing datasets, there should be an accompanying metadata file that clearly describes the data. We recommend using the power spectral density representation for long-term storage and data exchange.
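The two representations differ only by the bandwidth term, so converting between them is a one-line calculation in decibels. The following is a minimal sketch with hypothetical function names; it assumes the power spectral density value is the mean over the band, as recommended for the shared files.

```python
import math


def band_level_from_psd_level(psd_level_db, bandwidth_hz):
    """Band SPL (dB re 1 uPa^2) from mean PSD level (dB re 1 uPa^2/Hz)."""
    return psd_level_db + 10.0 * math.log10(bandwidth_hz)


def psd_level_from_band_level(band_level_db, bandwidth_hz):
    """Mean PSD level (dB re 1 uPa^2/Hz) from band SPL (dB re 1 uPa^2)."""
    return band_level_db - 10.0 * math.log10(bandwidth_hz)


# Example: the millidecade centered at 1000 Hz spans 998.85 to 1001.15 Hz (Table I).
bandwidth = 1001.15 - 998.85
spl = band_level_from_psd_level(60.0, bandwidth)
print(f"PSD 60.0 dB re 1 uPa^2/Hz over {bandwidth:.2f} Hz -> SPL {spl:.2f} dB re 1 uPa^2")
```

For 1 Hz bands the two values are numerically equal, which is one reason the power spectral density form is convenient for a hybrid table that mixes 1 Hz bands and millidecades.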
The processed data used to create the long-term spectral average in Figs. 1 and 2 had more time and frequency cells than there are pixels in the figure. Because the purpose of the figures is to summarize the data in time and frequency, it is important that they highlight interesting features. Choices made while compressing the data to fit into the figures affect the detectability of features. We recommend several conventions for generating long-term spectral average figures:
Use a logarithmic frequency scale.
Use linear interpolation between frequencies that are present to fill in the figure. The lowest frequencies (10–100 Hz) often span more pixels in a figure than there are frequency bins in the power spectral density.
When multiple frequency bins are mapped to a single pixel in an output image, awareness and clear description of the underlying procedure (selection of maximum or minimum amplitude, spectral averaging, median) is essential. If the objective is to identify significant or interesting events, selecting the maximum is recommended. If the objective is to suppress higher-level sound sources and view the underlying ambient sound level, then selecting the minimum may be preferred.
Similar to selecting the frequency bin to display, when compressing spectrogram data in time, selecting the time slice with the maximum value for each mapped frequency helps identify interesting events; selecting the minimum is preferred for examining ambient sound levels. Supporting text must clearly define how the image was generated.
Use a colormap that is evenly spaced in decibels. A colormap optimized to maximize contrast for all visual capabilities is recommended. Figure 1 was generated using Google's “Turbo” colormap (https://bit.ly/3h4RtIo).
These conventions should be implemented by researchers directly rather than relying on software packages to compress/expand images. This is important because standard image processing packages use mean, median, and nearest-neighbor algorithms designed for visual images that do not enhance the detectability of acoustic features.
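The reduction step described in these conventions can be written explicitly so that the choice of maximum, minimum, or mean is visible in the analysis code. The sketch below is illustrative only: the function name `reduce_to_pixels` is hypothetical, and it assumes the bins are split into contiguous groups of nearly equal size, which suits a linear time axis. For a logarithmic frequency axis the groups would instead be defined by the pixel frequency edges.

```python
import math


def reduce_to_pixels(levels_db, n_pixels, method="max"):
    """Collapse a sequence of dB levels into n_pixels values.

    method: "max" highlights events, "min" shows the ambient floor,
    "mean" averages in linear power units before converting back to dB.
    """
    n = len(levels_db)
    if not 0 < n_pixels <= n:
        raise ValueError("n_pixels must be between 1 and len(levels_db)")

    pixels = []
    for p in range(n_pixels):
        # Contiguous, non-empty group of bins mapped to pixel p.
        group = levels_db[p * n // n_pixels:(p + 1) * n // n_pixels]
        if method == "max":
            pixels.append(max(group))
        elif method == "min":
            pixels.append(min(group))
        elif method == "mean":
            mean_power = sum(10.0 ** (x / 10.0) for x in group) / len(group)
            pixels.append(10.0 * math.log10(mean_power))
        else:
            raise ValueError(f"unknown method: {method}")
    return pixels


levels = [50.0, 52.0, 80.0, 51.0, 49.0, 50.0]  # one loud event at 80 dB
print(reduce_to_pixels(levels, 3, "max"))
print(reduce_to_pixels(levels, 3, "min"))
```

With the maximum, the 80 dB event survives the compression to three pixels; with the minimum it disappears and only the ambient level remains, which is the trade-off the conventions ask authors to state.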
Note that we recommend using the mean power spectral density as the value stored in the hybrid millidecade data files used for data sharing. This choice smoothed the L5 spectra in Fig. 2. Only the mean millidecade data can be used to find the decidecade band levels and sound exposure levels.
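Summing mean millidecade data into a wider band can be sketched as follows. This is not the supplementary “getBandSquaredSoundPressure.m” code: the function name `band_level_db` is hypothetical, and the sketch assumes the power spectral density is constant within each stored band. With the base-10 definition and a 1000 Hz reference, decidecade edges fall at the centers of millidecades, so the two edge millidecades contribute only the part of their width that overlaps the target band.

```python
import math


def band_level_db(band_edges, psd_levels_db, f_lo, f_hi):
    """Band level (dB re 1 uPa^2) between f_lo and f_hi from mean PSD levels.

    band_edges: list of (low, high) edges in Hz for each stored band.
    psd_levels_db: mean PSD level (dB re 1 uPa^2/Hz) for each stored band.
    """
    total = 0.0  # mean-square sound pressure in the target band
    for (low, high), level in zip(band_edges, psd_levels_db):
        overlap_hz = min(high, f_hi) - max(low, f_lo)
        if overlap_hz > 0.0:
            total += 10.0 ** (level / 10.0) * overlap_hz
    if total <= 0.0:
        raise ValueError("no stored band overlaps the requested range")
    return 10.0 * math.log10(total)


# Millidecades around 1 kHz (indices -60..60) with a flat 60 dB PSD.
edges = [(1000.0 * 10.0 ** ((i - 0.5) / 1000.0),
          1000.0 * 10.0 ** ((i + 0.5) / 1000.0)) for i in range(-60, 61)]
psd = [60.0] * len(edges)

# Decidecade centered at 1 kHz: edges at 1000 * 10**(-1/20) and 1000 * 10**(1/20).
f_lo = 1000.0 * 10.0 ** (-0.05)
f_hi = 1000.0 * 10.0 ** 0.05
level = band_level_db(edges, psd, f_lo, f_hi)
print(f"1 kHz decidecade level: {level:.2f} dB re 1 uPa^2")
```

Because the sum is done on mean-square pressure, it only gives correct band levels when the stored values are means, which is the reason given above for storing the mean rather than a median or percentile.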
5. Summary
A hybrid millidecade spectrum is proposed as a means of storing and exchanging passive acoustic spectral data with sufficient frequency resolution for many applications while maintaining reasonable data sizes for transfer and exchange. This frequency resolution is high enough to support many types of analysis, including analyzing different types of soundscapes, computing weighted sound exposure levels, and summing the millidecades to find decidecade, third-octave, and other desired frequency bands. Millidecade files are much smaller than their 1 Hz equivalents, such that data from long-term, multiple-station, high-sampling-frequency projects can easily be stored at a single location. The objective of sharing soundscape data from a single location is to accelerate research into the effects of human sound and climate change on marine and terrestrial soundscapes so that future regulatory decisions may be based on the best available information.
Acknowledgments
The authors acknowledge the Department of Fisheries and Oceans Canada (DFO) for access to the 2016 Gully East dataset. Collection of this dataset was funded by the Ocean and Coastal Management Division at the Bedford Institute of Oceanography. We also acknowledge DFO for access to the 2018–2019 Emerald Basin dataset. Collection of this dataset was funded by the Ocean Protection Plan's Marine Environmental Quality DFO research program. This work was initiated with funding from the Richard Lounsbery Foundation related to the Ocean Sound Software for Making Ambient Noise Trends Accessible (MANTA) project. This is Pacific Marine Environmental Laboratory contribution #5141.
¹ See the supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0003324 for a MATLAB function called “getBandTable.m” that generates band centers and edges for any type of logarithmically spaced bands, including hybrid millidecades. The band definitions are passed into two other functions, “getBandSquaredSoundPressure.m” and “getMeanBandPowerSpectralDensity.m,” that provide a reference implementation for dividing a spectrum into logarithmically spaced bands.