We use non-negative matrix factorization for source separation on ultra-low frequency passive-acoustic data from a single-channel recording acquired in deep sea. Non-negative matrix factorization decomposes the spectrogram into a spectral-component matrix and a time-encoding matrix. Detectors use known time-frequency features to group components from the same source and reconstruct spectrograms of blue whale calls, seismic sounds, and ship noise. Data are separated at low computational cost and without learning step. The separation assessment using scale-invariant signal-to-distortion ratio on spectrograms of simulated reference data is satisfying. Source separation on ocean-bottom seismometer data from the Southern Indian Ocean provides convincing results.
1. Introduction
Ocean bottom seismometers (OBSs) ensure a wide passive-acoustic coverage of the world ocean, particularly in the deep sea, with a limitation to ultra-low frequencies (≤100 Hz). The data acquired include acoustic signals from anthropogenic, biological, geophysical, and cryogenic sources. For a better understanding of the deep-sea acoustic environment, it is crucial to classify this wide diversity of sources. At-sea signals often present time and frequency mixing of several sources. Further investigations require source separation to facilitate future work on well-separated signals. The most challenging scenario for source separation is when information is provided by only one sensor: the underdetermined case.
It has been shown that non-negative matrix factorization (NMF) (Paatero and Tapper, 1994; Lee and Seung, 1999) is an effective technique for isolating patterns in images. Hence, it can be used for retrieving time-frequency patterns in spectrograms to achieve source separation. NMF natively integrates the non-negativity constraint inerrant to amplitude. NMF isolates patterns of limited time-frequency extension, whereas principal component analysis tends to produce basis images covering the whole spectrogram (Lee and Seung, 1999). Moreover, NMF avoids the assumption of non-Gaussian signals and Gaussian noise required by source separation via independent components analysis (Hyvärinen and Oja, 2000).
NMF separates the spectrogram into a spectral-component matrix and a time-encoding matrix. In this work, we propose to use detectors based on known time-frequency features to group components dominated by the same source. Recovered spectrograms can then be built for the sources of interest. We can hence avoid a possibly time-consuming learning step that would require unmixed annotated data. We present detectors of three sources, namely, one type of stereotyped great-whale call (blue whale Z-call), impulse seismic sound, and harmonic continuous ship noise. We simulate data for the three sources of interest. The detectors are applied to the simulated data for evaluating the method and for choosing an appropriate NMF algorithm. Scale-invariant signal-to-distortion ratio (SI-SDR) (Le Roux , 2019) is employed for a quantitative evaluation of the source separation method. We conclude with an example of NMF being applied on in situ data from the Southern Indian Ocean. The main contribution of this letter is source-separation in ultra-low frequency passive acoustic data acquired in the deep sea from a single-channel recorder.
2. Data
2.1 Simulated data
We simulate the reference data used to assess the separation method. Simulated data are generated for three types of sources: blue whale Z-calls, ship noise, and seismic sounds.
2.1.1 Blue whale Z-call
2.1.2 Ship noise
2.1.3 Seismic sounds
We use data simulated by SPECFEM2D (Tromp , 2008) using a configuration described in Lecoulant (2022). The signal is simulated at the bottom of a 3000-m-deep homogeneous ocean with a sound speed 1500 m s−1. The seismic source is a pure shear localized 7000 m below the sea surface, at the apex of a 2000-m-high seamount with a Gaussian shape. The signal is repeated once to show two occurrences of the seismic sound within the half-an-hour long simulated data. It is normalized to its maximum to give the signal sT(t).
2.1.4 Additive mixing
2.2 RHUM-RUM network data
We apply source separation to pressure measurements corrected for sensor response from the RR48 station in the open-source RHUM-RUM network (Barruol and Sigloch, 2013). The water depth is 4830 m and the sample rate is 100 Hz. An eighth order Butterworth high-pass filter (5 Hz) is applied to remove microseismic noise. We focus on the signal acquired on May 31, 2013, while a single male Antarctic blue whale (Balaenoptera musculus intermedia) passes by the OBS following a trajectory already studied by Dréo (2019). The signature of this whale is the Z-call: every 66 s, it emits a higher tone at 26 Hz followed by a lower tone at 18 Hz with a continuous downsweep in between. Other signals of interest include harmonic sounds generated by ships and impulse sounds linked to the seismic activity of the South West Indian Ridge.
3. Method
3.1 NMF
NMF decomposes a non-negative matrix of dimensions p × q (Vp × q) into a dictionary (Wp × n) and activation (Hn × q) matrix with n number of components. When n is smaller than the rank of V, NMF achieves rank reduction.
If V is a spectrogram, it is non-negative; p is the number of frequency bins; and q is the number of time steps. When NMF is applied, W contains n spectral components and the n lines of H are their amplitudes along time. The lack of source overlap in the spectrogram is known as W-disjoint orthogonality (Yilmaz and Rickard, 2004). With sufficient W-disjoint orthogonality, each component of the matrices W and H is dominated by one source, and NMF achieves source separation in the single-sensor case.
Two types of solvers used for equation are initialized with and that are iteratively updated until the cost function L is minimized such that . Multiplicative update is the first solver, where the matrices and receive a product of , , V and their inverses and transposes (e.g., Lee and Seung, 2000). The second solver is based on coordinate descent algorithm (e.g., Cichocki and Phan, 2009). Starting from and , the algorithm finds the matrix minimizing , then the matrix minimizing and repeats. The problem is not convex; therefore, the solution is non-unique, and the global minimum of the cost function is not guaranteed. We present results obtained using three different cost functions, namely, Frobenius (Fr.) norm (Paatero and Tapper, 1994); the non-symmetric Kullback-Leibler (KL) divergence (Lee and Seung, 2000); and the non-symmetric Itakura-Saito (IS) divergence (Févotte , 2009).
Source-separation results when using either solver is sensitive to initialization. A first initialization method fills and with random values between 0 and , being the average of V. In a second method, and are obtained through singular value decomposition (SVD). The zeros of the sparse matrices and can then be filled by or by small random values. Zeros in and cannot be updated multiplicatively.
We use the implementation of NMF proposed by the Python module Scikit-learn (Pedregosa , 2011).
3.2 Detectors
If the different source signals are W-disjoint and if n is large enough, each component of the matrices W and H calculated by NMF is dominated by a single source. However, one does not know which source each component corresponds to. Moreover, several components can be necessary to reconstruct a single source. To solve the class assignment problem, we propose detectors that use known time-frequency characteristics to group components dominated by the same source of interest. Classified components are finally employed to generate recovered spectrograms.
The Z-call detector is based on call repetition every 66 s. It selects components with a spectrum of the activation function H that peaks between 0.014 and 0.0165 Hz (periods between 60.6 and 71.4 s). The ability to adjust the target ICI enables this detector to be adapted to other stereotyped great-whale calls. Lin (2017, 2018) used the periodicity of activation functions to separate biological choruses based on their diurnal periodicity. We push forward this idea to identify a specific population.
The seismic sound detector is based on a large frequency width. The spectral components are smoothed using a rolling average with a 0.39-Hz window. The frequency width, , is the frequency range on which the smoothed spectral component, , is contiguously higher than the threshold: . Spectral components with Hz are classified as seismic sounds.
The ship noise detector is based on the slow variation of ship noise over time and on the small width of ship frequency lines. Components are classified as ship noise if the spectrum of their activation functions shows at least 10% of the energy in the mean value (f = 0 Hz) and if their frequency width verifies Hz.
3.3 Scale-invariant signal-to-distortion ratio
4. Application
4.1 Detector assessment by application to simulated data
Figure 1 gives the SI-SDR and computation time for the 16 combinations of cost function, solver type, and initialization method implemented in Scikit-learn. Although spectrograms visualize signals logarithmically, NMF is instead performed in the linear domain. NMF is conducted with 50 components, a number that guarantees acceptable performances for the three detectors. To account for the non-uniqueness of NMF results, 50 trials have been conducted for each of the 16 combinations. SI-SDR values and computation time in Fig. 1 correspond to the median of these 50 trials, with bars showing the first and the last decile.
SI-SDR between the spectrograms of simulated reference data and recovered spectrograms (top) and computation time (bottom) obtained for Z-calls (black circles), seismic sounds (blue squares), and ship noise (red triangles). The markers correspond to the median of 50 trials and the error bars to the first and last decile. The x axis gives employed cost functions: Frobenius norm (Fr), Kullback-Leibler divergence (KL), and Itakura-Saito divergence (IS); solvers: coordinate descent (CD) and multiplicative update (MU); and initialization methods: random value (ra.), SVD (S), SVD with zeros filled by the average of data (SA) or by small random values (SR).
SI-SDR between the spectrograms of simulated reference data and recovered spectrograms (top) and computation time (bottom) obtained for Z-calls (black circles), seismic sounds (blue squares), and ship noise (red triangles). The markers correspond to the median of 50 trials and the error bars to the first and last decile. The x axis gives employed cost functions: Frobenius norm (Fr), Kullback-Leibler divergence (KL), and Itakura-Saito divergence (IS); solvers: coordinate descent (CD) and multiplicative update (MU); and initialization methods: random value (ra.), SVD (S), SVD with zeros filled by the average of data (SA) or by small random values (SR).
The combinations of Frobenius norm and multiplicative update performs faster than the others (median computation time 0.92 s with a standard deviation 0.20 s). However, only the initialization by SVD with data-average zero filling guarantees median SI-SDR above 0 dB, as the coordinate descent solver does. Median SI-SDR are above the values obtained when source separation if performed by principal component analysis (Z-calls: 0.6 dB, seismic sounds: −19.1 dB, ship noise: −0.6 dB, seismic sounds: −14.1 dB, ship noise: −126.5 dB). This combination seems appropriate for large datasets (e.g., one year of data for investigating whale migration patterns). Comparison between the results obtained for Z-calls, seismic sounds, and ship noise clearly shows that differences in computation times are due to NMF algorithms and not to detectors. Whatever the norm, multiplicative update without zero filling of initial matrices yields low SI-SDR since zeros are not updated (Fig. 1: combinations Fr-MU-S, KL-MU-S, and IS-MU-S).
Figure 2 shows mixed simulated data and the result of source separation using Frobenius norm, multiplicative update, and initialization by SVD with zeros filled by the average of data. This combination yields median SI-SDR above 0 dB and a 0.92 s median computation time (Fig. 1, combination Fr-MU-SA). The three simulated sources are overall well separated, even though artifacts are visible in recovered Z-calls and energy is missing in recovered ship noise at times corresponding to seismic sounds and at frequencies corresponding to the Z-call 26-Hz tone. For the trial in Fig. 2, the SI-SDR are 3.5 dB for Z-calls, 3.6 dB for seismic sounds, and 4.0 dB for ship noise.
Spectrograms of simulated data (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Blue whale Z-calls recovered using NMF and detectors. (c) Recovered seismic sounds. (d) Recovered ship noise.
Spectrograms of simulated data (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Blue whale Z-calls recovered using NMF and detectors. (c) Recovered seismic sounds. (d) Recovered ship noise.
4.2 Application to in situ data
Source separation is applied to RHUM-RUM data to illustrate its advantages and limits. Figure 3 shows a spectrogram of the pressure recorded by the station RR48 on May 31, 2018, and the three recovered spectrograms. NMF is conducted using Frobenius norm, multiplicative update, and initialization by SVD with zeros filled by the average of data (case 7, Fig. 1). The computation time for 24 h of data is 46 s.
Spectrograms (in dB) of RR48 station recorded pressure on May 31, 2013, bandpass filtered between 5 and 46 Hz (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Recovered blue whale Z-calls. (c) Recovered seismic sounds. (d) Recovered ship noise.
Spectrograms (in dB) of RR48 station recorded pressure on May 31, 2013, bandpass filtered between 5 and 46 Hz (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Recovered blue whale Z-calls. (c) Recovered seismic sounds. (d) Recovered ship noise.
All recovered Z-calls show the higher and more energetic tone at 26 Hz; a large proportion of them shows the downsweep with approximately half displaying the lower tone at 18 Hz. False detections occur out of the 18–26 Hz band during the first 8 hours of the day when ship noise is relatively intense. The Z-calls of the blue whale passing by the RR48 station are clearly separated from the chorus of more distant whales. Impulse sounds linked to seismic activity are correctly separated from the rest of the signal after 6:00 a.m., when ship noise becomes less intense. Background noise is attenuated compared to ship noise, but it is not separated from the blue whale chorus, causing false detections. The detector is not perfectly robust to engine-regimen changes that occur after 5:00 a.m., modifying the fundamental frequency of the harmonic signal.
The difficulties encountered in the separation of Z-calls and seismic sounds are linked to the hypothesis of W-disjoint signals not being satisfied locally. However, further work is needed for separating ship noise from blue whale chorus.
5. Conclusion
NMF is effective for isolating time-frequency patterns in spectrograms. We use it to recover whale-call, seismic sound, and ship noise spectrograms. The source separation is evaluated quantitatively on simulated data for different NMF algorithms and then applied to in situ data. The results are encouraging for Z-calls and seismic sounds, as long as time-frequency overlaps are limited. Further work is required to reduce false ship noise detections caused by blue whale chorus.
Different time- and frequency-domain transformations (e.g., wavelet, Stockwell) could minimize the overlap between source signals. Regularization of W or H within NMF cost functions can enhance the identification of continuous components (e.g., Gloaguen , 2021). The methods described in this paper could be adapted to develop detectors, such as adjusting the target ICI to generalize the Z-call detector for other stereotyped great-whale calls.
Future work will use this source separation method to perform source localization.
Supplementary Material
See the supplementary material for the SI-SDR between reference data and recovered data for a number of components from 2 to 200 and for visualizing matrices W and H calculated by 10-component NMF over half an hour of data and mono-component reconstructed spectrograms.
Acknowledgments
This work was supported by Agence de l'Innovation de Défense, DÉtection et Séparation de Sources en Ultra Basses Fréquences project. The authors thank Associate Editor Dr. D. Dall 'Osto and two anonymous reviewers for their constructive and insightful comments on the original manuscript that greatly helped to improve the paper.
Author Declarations
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data Availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.