A previous analysis of 1977 passive acoustic recordings in the Indian Ocean focused on sound pressure levels (SPLs) and showed that SPLs were slightly depth dependent and highly influenced by shipping activities [Wagstaff and Aitkenhead, IEEE J. Ocean. Eng. 30(2), 295–302 (2005)]. Consequently, SPL alone does not provide a consistent, comprehensive metric for comparison among sites or with contemporary recordings in the same region. Therefore, a source separation analysis was devised and applied to identify the major sound source contributions at three Indian Ocean locations. Shipping noise was a major sound contributor at all sites, while the site with the greatest diversity of sources was in the central Arabian Sea.
1. Introduction
Concern about rising ocean sound levels was sparked by observations made in the Northeast Pacific Ocean in which a 3 dB/decade rise in low frequency ambient ocean sound was documented.1 The few well-documented datasets of ocean sound recordings prior to 1980 provided information to assess long term ambient sound trends and to understand the changes in the sound source contributions driving the sound level dynamics.2–4 These observations raised the question of whether the rise is a regional phenomenon or indicative of a more global effect. More recent studies sought to expand the current state of knowledge by assessing ocean sound trends in other parts of the world.5–8 The contemporary studies examined ocean sound trends over the past 20 years, as access to quality recordings prior to the 1990s in areas outside the Northeast Pacific has been a limiting factor. Recent trends do not indicate that low frequency sound levels are increasing globally,8 but access to more historical acoustic recordings is needed to better interpret changes beyond the most recent 20-year window.
The objective of this study was to assess the quality and characteristics of acoustic recordings made in the Northwest Indian Ocean in 1977 for future comparison to recordings made over the past decade. Acoustic recordings were made during the U.S. Naval Bearing Stake exercise in the Northwest Indian Ocean as part of the U.S. Navy Long Range Acoustic Propagation Project (LRAPP). The purpose of the Bearing Stake exercise was to acquire acoustic environmental data to support assessment of sonar performance. Ambient sound recordings were made at three sites across the Arabian and Somali Basins, in addition to other measures of propagation losses, coherence, surface shipping distributions, bottom losses, bathymetry, sound speed structure, physical oceanography, and meteorology.9 With regard to these sites, the authors were aware of (1) the acoustic analysis performed by Wagstaff and Aitkenhead,9 which addressed ambient sound pressure level, depth dependence, and directionality, and (2) the work conducted by Sagers and Knobles10 that addressed the statistical inference of the sound-speed depth profile of a seabed in the Gulf of Oman.
In the present study, the Bearing Stake recordings were revisited to provide new understanding of the ambient sound. SPL did not provide a consistent metric for comparing the Bearing Stake soundscapes because of poor signal-to-noise ratio (SNR) and the presence of high electrical noise in some channels. Therefore, source identification was pursued to better understand the source contribution to the overall soundscape. New knowledge resulting from the source identification process will support future comparison with contemporary datasets in this region to gain a better understanding of the ambient sound changes over the past 50 years.
2. Methods
2.1 Acoustic recordings
Recordings in the Bearing Stake exercise were made at three locations in the Northwestern Indian Ocean during the first quarter of 1977; recordings were made at three depths in the water column at each site. Clusters of hydrophones were deployed at depths of 400 m, mid-water column, and near-bottom. A vertical acoustic data capsule (VAC) unit was used to record the underwater sound. Each cluster consisted of four hydrophones recording at 27 dB gain steps to cover a wide dynamic range. The recorded data were multiplexed in a channel order from 1 to 14 at each site. The first 12 channels were distributed equally over the three targeted depth regions with four phones at each depth. Channel 13 was a single hydrophone at the deepest depth, and channel 14 was a time code.
In this study, three sites were analyzed based on knowledge gained from Ref. 9 and documentation of the Bearing Stake recordings provided by Applied Research Laboratories, University of Texas (ARL UT) at Austin11 (Table 1). Site 3 was the first deployment of the Bearing Stake exercise, with a total of six recording days. Recordings at sites 1B and 4 were made following site 3 and were recorded over 7 and 13 days, respectively. The original sound pressure time series data were requested from ARL UT by the authors, and they are not available for public use without Office of Naval Research approval.
Site No. | Coordinates | Water depth (m) | Recording period | Hydrophone depth (m) |
---|---|---|---|---|
1B | 23°38′N, 61°09′E | 3351 | Feb 16–Feb 22 | 1685 |
3 | 17°18′N, 65°24′E | 3583 | Feb 02–Feb 07 | 1919 |
4 | 04°45′N, 53°09′E | 5106 | Mar 11–Mar 24 | 1916 |
2.2 Ambient sound analysis and channel selection
Statistical analysis and characterization of the low frequency (<1000 Hz) ocean ambient sound was performed on acoustic recordings at the three Bearing Stake sites. The depth selected for analysis was the depth within the SOFAR channel (Table 1 shows only the selected depth) to maximize the acoustic coverage represented by the measurements and allow future comparison with contemporary datasets in the same region. The selection of channels with the clearest signal display among the three sites was a critical task for analysis purposes, because the large dynamic range across the hydrophones in each cluster at the same depth resulted in different SNR levels between the signals of interest and the ambient sound. Further, low signal level and electrical noise were documented10,11 in some channels within each cluster.
Spectrograms of each channel were visually reviewed to select one channel per site for source comparison purposes. The channel that displayed the clearest signal and the least system noise was selected. Channels 7, 6, and 6 were selected for sites 3, 1B, and 4, respectively. Because of the gain instability, sound levels alone did not provide a reliable metric for assessing the ambient sound differences across sites; therefore, source contribution was examined.
2.3 Frequency correlation metrics
The SPL variability between the three sites was the result of constantly changing source contributions. A frequency correlation method was used to identify the most salient contribution differences among the three Bearing Stake sites. Frequency correlation metrics were calculated based on the time-frequency representation of the raw data8 (Fig. 1 top row). The raw data were transformed from the time domain to the frequency domain with a 10 s FFT, giving a 0.1 Hz frequency resolution. The correlation is Pearson's pairwise linear correlation coefficient12 between each pair of frequency measurements in the time-frequency matrix. The absolute difference correlation matrices (Fig. 1 bottom row) were computed by taking the absolute difference between two frequency correlation matrices to highlight areas of greatest dissimilarity between the sites. The 2–20 Hz band was identified as a spectral region of large difference between site 3 and sites 1B and 4 and was thought to be driven by a unique source mechanism present at site 3. Sites 1B and 4 were comparatively similar.
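The frequency correlation and absolute difference steps can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function names, the use of non-overlapping Hann windows, the decibel scaling before correlation, and the exclusion of the 0 Hz bin are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import spectrogram


def frequency_correlation(x, fs, win_s=10.0, fmax=1000.0):
    """Pearson correlation between every pair of frequency bins over time.

    Assumptions (not stated in the text): non-overlapping Hann windows,
    spectral levels in dB, and the 0 Hz bin excluded.
    """
    nperseg = int(round(win_s * fs))  # 10 s window -> 0.1 Hz bin spacing
    f, _, sxx = spectrogram(x, fs=fs, window="hann",
                            nperseg=nperseg, noverlap=0)
    keep = (f > 0) & (f <= fmax)
    level_db = 10.0 * np.log10(np.maximum(sxx[keep], 1e-30))
    # Rows are frequency bins (variables); columns are time frames.
    return f[keep], np.corrcoef(level_db)


def absolute_difference(corr_a, corr_b):
    """Element-wise absolute difference of two correlation matrices."""
    return np.abs(corr_a - corr_b)
```

At the full 0.1 Hz resolution up to 1000 Hz the correlation matrix has roughly 10 000 × 10 000 entries, so a practical run may need to restrict the band (for example to the 2–20 Hz region discussed above) or use single-precision arrays.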
2.4 Sound source separation (clustering technique)
A source separation method that takes advantage of the time-frequency representation provided by the short-time Fourier transform (STFT) was applied to the data at each site. A normalized cross correlation was applied to determine the segments that were generated by the same source.
For each site, the sound pressure time series data, $x(n)$, were divided into $M$ segments, $x_m(n)$, in which the $m$th segment is 60 s long (144 000 samples, i.e., a sampling rate of 2400 Hz). One minute was collected at 12-minute intervals, totaling 5 minutes each hour. In this study, a total of 12, 12.5, and 22.5 hours were processed for sites 3, 1B, and 4, respectively. For each segment, the mean, $\bar{x}_m$, was subtracted, and each zero-mean segment was then transformed to a time-frequency representation using the STFT. The STFT matrix of the $m$th segment is given as $X_m$, and the element of the STFT matrix13 at frequency bin $k$ and time frame $l$ is computed as follows:

$$X_m(k, l) = \sum_{n=0}^{N-1} x_m(n + lH)\, w(n)\, e^{-j 2\pi k n / N},$$

where $x_m(n)$ is the $m$th sound pressure time series segment, $w(n)$ is the Hann window function (zero outside its support), $H$ is the hop size between successive DFTs, and $N$ is the number of DFT points. The STFT matrix magnitude of the $m$th segment is given as $|X_m|$. In this study, 2048 DFT points, a Hann window of 1024 samples, and an overlap of 512 samples were used. For each site, all the STFT matrices were concatenated into one large matrix [Fig. 2(a)]. That matrix was updated after each iteration with the remaining segments, which is why its total number of segments is treated as a variable [Fig. 2(a)].
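The segmentation and STFT step can be illustrated with a short sketch. It assumes a 2400 Hz sampling rate (implied by 144 000 samples per 60 s segment) and uses `scipy.signal.stft` as a stand-in for whatever implementation the authors used; the function name and the choice to disable edge padding are assumptions of this example.

```python
import numpy as np
from scipy.signal import stft

FS = 2400            # Hz, implied by 144 000 samples per 60 s segment
SEG_LEN = 60 * FS    # samples per one-minute segment


def segment_stft_magnitudes(x, fs=FS, seg_len=SEG_LEN):
    """Split x into one-minute segments, remove each mean, return |STFT|.

    Uses the stated settings: 2048 DFT points, a 1024-sample Hann window,
    and a 512-sample overlap (so the hop size is 512 samples).
    """
    n_seg = len(x) // seg_len
    if n_seg == 0:
        raise ValueError("input is shorter than one segment")
    mags = []
    for m in range(n_seg):
        seg = np.asarray(x[m * seg_len:(m + 1) * seg_len], dtype=float)
        seg = seg - seg.mean()                      # zero-mean segment
        f, _, z = stft(seg, fs=fs, window="hann", nperseg=1024,
                       noverlap=512, nfft=2048,
                       boundary=None, padded=False)  # assumed: no edge padding
        mags.append(np.abs(z))
    return f, mags
```

Calling `np.hstack(mags)` then gives one concatenated time-frequency matrix per site, with 1025 frequency rows and about 280 time frames per segment under these settings.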
The normalized 2-D cross-correlation14 was computed between the mth STFT matrix and the entire concatenated matrix to examine whether that particular segment held any similarity with other segments. The resulting cross-correlation is another large matrix, U, in which the dimension of each segment is double that of the corresponding STFT matrix. The maximum value of each cross-correlation matrix in U was accumulated and sorted in ascending order in an array. Changes in the slope of this sorted array were used as a metric to determine whether a given segment contained a salient or transient sound source or was representative of the ambient soundscape. In each iteration, two change points were computed and used as the test metric. These two points were computed using the error sum of squares (SSE) between the correlation coefficient values and the least-squares linear fit through those values, minimizing the cost function [Eq. (1) in Ref. 15]. If both change points were detected in the lower half of the sorted array, then the mth segment was classified as ambient soundscape [Figs. 2(a), 2(b)]. On the other hand, if at least one change point was detected in the upper half of the sorted array, then there were segments grouped as a sound source cluster [Figs. 2(a), 2(c)]. Consequently, in the case of ambient soundscape, the concatenated matrix was updated by excluding the mth segment. In the case of a cluster, the jth cluster held all K segments after the change point, and those K segments were excluded from the concatenated matrix. The process continued until all possible sound source clusters and ambient segments were separated.
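One iteration of this decision rule can be sketched as below. The sketch is a simplified stand-in, not the authors' code: it uses a globally normalized FFT cross-correlation instead of a locally normalized 2-D cross-correlation, a brute-force search that always returns exactly two change points, a minimum piece length of three points, and it assumes the cluster is taken after the last change point. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import correlate


def peak_normalized_xcorr(a, b):
    """Peak of the 2-D cross-correlation, scaled to lie in [-1, 1]."""
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())
    if denom == 0:
        return 0.0
    return float(correlate(a0, b0, mode="full", method="fft").max() / denom)


def linear_sse(y):
    """Sum of squared residuals of a least-squares line through y."""
    if len(y) <= 2:
        return 0.0
    n = np.arange(len(y))
    slope, intercept = np.polyfit(n, y, 1)
    return float(((y - (slope * n + intercept)) ** 2).sum())


def two_change_points(y, min_len=3):
    """Brute-force the two split indices minimizing piecewise-linear SSE."""
    best_cost, best = np.inf, (None, None)
    n = len(y)
    for c1 in range(min_len, n - 2 * min_len + 1):
        for c2 in range(c1 + min_len, n - min_len + 1):
            cost = (linear_sse(y[:c1]) + linear_sse(y[c1:c2])
                    + linear_sse(y[c2:]))
            if cost < best_cost:
                best_cost, best = cost, (c1, c2)
    return best


def classify_segment(ref, segments):
    """Label ref as 'ambient' or return indices of its source cluster."""
    peaks = np.array([peak_normalized_xcorr(ref, s) for s in segments])
    order = np.argsort(peaks)                 # ascending correlation peaks
    c1, c2 = two_change_points(peaks[order])
    half = len(peaks) / 2
    if c1 is None or (c1 < half and c2 < half):
        return "ambient", np.array([], dtype=int)
    # Assumption: the cluster holds the segments after the last change point.
    return "cluster", order[c2:]
```

In a full run, segments returned as a cluster (or the single ambient segment) would be removed from the list and the procedure repeated on the remainder until nothing is left to separate.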
The characteristics of the separated sound sources were unique enough for the normalized 2-D cross-correlation to make a robust decision about them (Fig. 3), except for the ship noise cluster, which consisted of a combination of both source and ambient environment characteristics. Therefore, an additional stage of processing was necessary to further separate the ship noise sources from ambient. For that purpose, the segments of the ship noise cluster (represented by the sound pressure time series segments) were transformed to Welch's power spectral density (Welch's PSD)16 [Fig. 2(a)]. The transformed segments were compared to an empirical threshold: the kth segment was considered ambient if its maximum PSD was below 135 dB; otherwise, the segment was identified as containing ship noise. The empirical threshold was selected high enough to ensure that only high SNR ship noise sources were separated out from the ambient soundscape. Opting for a lower threshold (<135 dB) increased the false positive rate of ship noise detection, and those additional detections were not supported by manual review of randomly selected segments across all three sites.
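The PSD threshold test can be sketched as follows. The 135 dB threshold comes from the text; everything else is an assumption for illustration, including the function name, the Welch window length, the 2400 Hz sampling rate, and the premise that the input is calibrated pressure in µPa so that levels are in dB re 1 µPa²/Hz.

```python
import numpy as np
from scipy.signal import welch


def is_ship_noise(segment_upa, fs=2400, threshold_db=135.0, nperseg=2048):
    """Return True if the peak Welch PSD level reaches the threshold.

    segment_upa is assumed to be calibrated pressure in micropascals, so
    the PSD level is in dB re 1 uPa^2/Hz (the reference is an assumption).
    """
    _, pxx = welch(segment_upa, fs=fs, window="hann", nperseg=nperseg)
    psd_db = 10.0 * np.log10(np.maximum(pxx, 1e-30))
    return bool(psd_db.max() >= threshold_db)
```

Segments in the ship noise cluster for which this returns False would be reassigned to the ambient category.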
3. Results
The separated sources were grouped into four categories based on unique amplitude and frequency characteristics (Fig. 3). The frequency content varied from several Hz to 500 Hz with different regions of energy concentration. Each source category was labeled from A to D, and the ambient environment was labeled N (Table 2). The proportion of each sound source and of the ambient soundscape without any identified sources is shown in Table 2. Identifiable sound sources were detected at site 3 more often than at any other site. The most prevalent source identified at site 3 was ship noise (source C), detected 34% of the time, followed closely by source A, detected 33% of the time. Source A is consistent with the broadband signal of particles or plankton colliding with the hydrophone in the current.17 Site 1B shared the presence of sources B and C along with a majority contribution of ambient (>60%). Source B is consistent with the signal of seismic impulses and associated multipath arrivals produced by distant seismic surveys.18 Site 1B had the lowest source diversity of the three sites, with a significant contribution from source C. The characteristics of signals in source D have not been linked directly to any known source by the authors, but the constant 4 s pulse rate suggests a human generated signal. Sound source D was only detected at site 4, with a minimal 2% proportion. All three sites exhibited ship noise along with ambient environment as the dominant parts of the soundscape (Table 2). The frequency correlation analysis in Sec. 2.3 showed that there was a spectral region of large difference (2–20 Hz band) between site 3 and the other sites. That analysis was supported by identifying source A, which was only detected at site 3, as likely driving the observed spectral difference in the correlation matrices of Fig. 1.
Source category | Site 3 (%) | Site 1B (%) | Site 4 (%) | Overall source contribution (%) |
---|---|---|---|---|
A | 33 | — | — | 8.38 |
B | 6 | 1 | 3 | 3.22 |
C | 34 | 37 | 35 | 35.2 |
D | — | — | 2 | 0.85 |
N | 28 | 62 | 60 | 52.35 |
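The text does not state how the overall contribution column was derived. One reading that is numerically consistent with Table 2 is a weighted average of the per-site percentages, weighted by the hours processed at each site (12, 12.5, and 22.5 h). The sketch below checks that reading; the weighting scheme, the treatment of dashes as 0%, and all names are assumptions, and the inputs are the rounded table values.

```python
# Hours processed per site, as stated in Sec. 2.4.
HOURS = {"3": 12.0, "1B": 12.5, "4": 22.5}

# Rounded per-site percentages from Table 2; dashes are treated as 0 (assumed).
SITE_SHARE = {
    "A": {"3": 33, "1B": 0, "4": 0},
    "B": {"3": 6, "1B": 1, "4": 3},
    "C": {"3": 34, "1B": 37, "4": 35},
    "D": {"3": 0, "1B": 0, "4": 2},
    "N": {"3": 28, "1B": 62, "4": 60},
}

# Overall column as reported in Table 2.
REPORTED = {"A": 8.38, "B": 3.22, "C": 35.2, "D": 0.85, "N": 52.35}


def duration_weighted_share(shares, hours=HOURS):
    """Average the per-site percentages, weighted by hours processed."""
    total = sum(hours.values())
    return sum(shares[site] * hours[site] for site in hours) / total


OVERALL = {k: duration_weighted_share(v) for k, v in SITE_SHARE.items()}
```

Under this assumption the computed values fall within about 0.15 percentage points of the reported column, with the residual plausibly due to rounding of the per-site percentages.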
4. Discussion
The Bearing Stake exercise measured the acoustic environment of the Northwestern Indian Ocean during the first four months of 1977. Observations of increasing sound levels (1) in the NE Pacific Ocean over a period that included the 1970s and (2) in the Northern Indian Ocean over the past two decades5 motivated the current study, in which the only known and accessible recordings in the Northern Indian Ocean prior to 1980 were from the Bearing Stake LRAPP exercise. The high and differing dynamic range settings between the hydrophones made it difficult to use the sound pressure levels as an indication of which channel was the most appropriate for spectral decomposition of the source contributions to the overall soundscape.
Due to the limited data recording duration, the statistical analysis was focused on finding the sound source contribution at each site to assess the diversity of regional sources. The applied sound source separation algorithm takes advantage of the short-time Fourier transform in representing the data segments in both time and frequency domains, whereas the normalized cross-correlation between a reference segment and the entire dataset was used as a source detector by scanning for relevant segments. The applied algorithm enabled separation of the segments, whether they were amplitude or frequency modulated sources, through a threshold that detects abrupt changes in the segments' correlation coefficients. The significance of this study lies in gaining new knowledge from historical recordings that are relatively short. The results provide a benchmark for comparison with contemporary studies in the same region and hence facilitate greater understanding of the ambient sound changes over the past 50 years. The unique source category detected at site 3 may indicate a greater level of current and/or water column particles at this site compared to sites 1B and 4. Present-day soundscapes from this area8 show a large soundscape contribution from baleen whales, which the analysis of the 1977 recordings did not reflect. It is possible that the military activity suppressed baleen whale vocal production during the 1977 recording period, as a reduction in baleen whale vocalization has been observed in response to elevated sound levels and anthropogenic source exposure.19,20 This is an observation that must be addressed when comparing the 1977 sound levels and soundscapes to contemporary datasets.
Acknowledgments
Thank you to the Applied Research Laboratories, University of Texas at Austin for facilitating the transfer of the Bearing Stake data. This study was supported under a grant from the Office of Naval Research Award No. N000141812063.