We use non-negative matrix factorization for source separation on ultra-low frequency passive-acoustic data from a single-channel recording acquired in deep sea. Non-negative matrix factorization decomposes the spectrogram into a spectral-component matrix and a time-encoding matrix. Detectors use known time-frequency features to group components from the same source and reconstruct spectrograms of blue whale calls, seismic sounds, and ship noise. Data are separated at low computational cost and without learning step. The separation assessment using scale-invariant signal-to-distortion ratio on spectrograms of simulated reference data is satisfying. Source separation on ocean-bottom seismometer data from the Southern Indian Ocean provides convincing results.

Ocean bottom seismometers (OBSs) ensure a wide passive-acoustic coverage of the world ocean, particularly in the deep sea, with a limitation to ultra-low frequencies (≤100 Hz). The data acquired include acoustic signals from anthropogenic, biological, geophysical, and cryogenic sources. For a better understanding of the deep-sea acoustic environment, it is crucial to classify this wide diversity of sources. At-sea signals often present time and frequency mixing of several sources. Further investigations require source separation to facilitate future work on well-separated signals. The most challenging scenario for source separation is when information is provided by only one sensor: the underdetermined case.

It has been shown that non-negative matrix factorization (NMF) (Paatero and Tapper, 1994; Lee and Seung, 1999) is an effective technique for isolating patterns in images. Hence, it can be used for retrieving time-frequency patterns in spectrograms to achieve source separation. NMF natively integrates the non-negativity constraint inerrant to amplitude. NMF isolates patterns of limited time-frequency extension, whereas principal component analysis tends to produce basis images covering the whole spectrogram (Lee and Seung, 1999). Moreover, NMF avoids the assumption of non-Gaussian signals and Gaussian noise required by source separation via independent components analysis (Hyvärinen and Oja, 2000).

NMF separates the spectrogram into a spectral-component matrix and a time-encoding matrix. In this work, we propose to use detectors based on known time-frequency features to group components dominated by the same source. Recovered spectrograms can then be built for the sources of interest. We can hence avoid a possibly time-consuming learning step that would require unmixed annotated data. We present detectors of three sources, namely, one type of stereotyped great-whale call (blue whale Z-call), impulse seismic sound, and harmonic continuous ship noise. We simulate data for the three sources of interest. The detectors are applied to the simulated data for evaluating the method and for choosing an appropriate NMF algorithm. Scale-invariant signal-to-distortion ratio (SI-SDR) (Le Roux , 2019) is employed for a quantitative evaluation of the source separation method. We conclude with an example of NMF being applied on in situ data from the Southern Indian Ocean. The main contribution of this letter is source-separation in ultra-low frequency passive acoustic data acquired in the deep sea from a single-channel recorder.

This Letter is divided in five sections. The simulated and in situ data are presented in Sec. 2. The method is described in Sec. 3, and it is applied to simulated and in situ data in Sec. 4. Concluding remarks are given in Sec. 5.

We simulate the reference data used to assess the separation method. Simulated data are generated for three types of sources: blue whale Z-calls, ship noise, and seismic sounds.

2.1.1 Blue whale Z-call

Blue whale Z-calls can be modeled using a sigmoid function in the time-frequency domain. The instantaneous phase is (Socheleau , 2015) as follows:
(1)
where t is time (s), U = 4.5 Hz is the upper limit of the frequency fluctuation, L = –4.0 Hz is the lower limit of the frequency fluctuation, α=2 Hz is a growth rate, and M = 40 s is the time of the center of the Z-call. The Z-call signal amplitude can be modeled via the following:
(2)
where ƒc = 22.5 Hz and denotes the real part. The time windowing of the Z-call is ensured by the functions
(3)
where β = 2 Hz. The signal sZ0(t) is normalized to its maximum and repeated every 66 s, to reproduce the Z-call inter-call interval duration (ICI) (Dréo , 2019), finally building the signal sZ(t).

2.1.2 Ship noise

Ship noise is simulated assuming a quasi-periodic source amplitude, sship(t), with a period, T0, perturbed by a Gaussian random process
(4)
where e(t) is a pulse (here a Dirac delta function) and ψm is the time perturbation of the mth period (with zero mean and 0.1% of T0 standard deviation). So each pulse (indexed by k) time delay cumulates all previous period fluctuations. This randomness on period duration permits to reproduce width and shape of observed ship noise tonal lines in the frequency domain. The source is moving eastward at 50 km h−1 with a minimal horizontal distance to the virtual OBS of 10 km. A ray code propagates the source signal in a constant sound-speed water layer until reaching the virtual OBS laying on the seabed of an ocean with a constant depth of 3000 m. The simulated signal sship(t) is finally normalized to its maximum to form the signal ss(t).

2.1.3 Seismic sounds

We use data simulated by SPECFEM2D (Tromp , 2008) using a configuration described in Lecoulant (2022). The signal is simulated at the bottom of a 3000-m-deep homogeneous ocean with a sound speed 1500 m s−1. The seismic source is a pure shear localized 7000 m below the sea surface, at the apex of a 2000-m-high seamount with a Gaussian shape. The signal is repeated once to show two occurrences of the seismic sound within the half-an-hour long simulated data. It is normalized to its maximum to give the signal sT(t).

2.1.4 Additive mixing

The signal used to evaluate the source-separation method is finally generated by an additive mixing of signals sZ(t), sT(t), and sS(t):
(5)
where b(t) is a zero-mean white Gaussian noise process with unit variance, and An the noise amplitude (Pa). The constants AZ=1 Pa, AT=2 Pa, AS=1 Pa, and An = 0.1 Pa are the respective amplitudes of the sources sZ(t), sT(t), and sS(t) and of the noise b(t).

We apply source separation to pressure measurements corrected for sensor response from the RR48 station in the open-source RHUM-RUM network (Barruol and Sigloch, 2013). The water depth is 4830 m and the sample rate is 100 Hz. An eighth order Butterworth high-pass filter (5 Hz) is applied to remove microseismic noise. We focus on the signal acquired on May 31, 2013, while a single male Antarctic blue whale (Balaenoptera musculus intermedia) passes by the OBS following a trajectory already studied by Dréo (2019). The signature of this whale is the Z-call: every 66 s, it emits a higher tone at 26 Hz followed by a lower tone at 18 Hz with a continuous downsweep in between. Other signals of interest include harmonic sounds generated by ships and impulse sounds linked to the seismic activity of the South West Indian Ridge.

NMF decomposes a non-negative matrix of dimensions p × q (Vp×q) into a dictionary (Wp × n) and activation (Hn × q) matrix with n number of components. When n is smaller than the rank of V, NMF achieves rank reduction.

If V is a spectrogram, it is non-negative; p is the number of frequency bins; and q is the number of time steps. When NMF is applied, W contains n spectral components and the n lines of H are their amplitudes along time. The lack of source overlap in the spectrogram is known as W-disjoint orthogonality (Yilmaz and Rickard, 2004). With sufficient W-disjoint orthogonality, each component of the matrices W and H is dominated by one source, and NMF achieves source separation in the single-sensor case.

Two types of solvers used for equation VW×H=0 are initialized with W0 and H0 that are iteratively updated until the cost function L is minimized such that L(V,Wk×Hk)L(V,Wk+1×Hk+1). Multiplicative update is the first solver, where the matrices Wk+1 and Hk+1 receive a product of Wk, Hk, V and their inverses and transposes (e.g., Lee and Seung, 2000). The second solver is based on coordinate descent algorithm (e.g., Cichocki and Phan, 2009). Starting from W0 and H0, the algorithm finds the matrix W1 minimizing L(V,W1×H0), then the matrix H1 minimizing L(V,W1×H1) and repeats. The problem is not convex; therefore, the solution is non-unique, and the global minimum of the cost function is not guaranteed. We present results obtained using three different cost functions, namely, Frobenius (Fr.) norm (Paatero and Tapper, 1994); the non-symmetric Kullback-Leibler (KL) divergence (Lee and Seung, 2000); and the non-symmetric Itakura-Saito (IS) divergence (Févotte , 2009).

Source-separation results when using either solver is sensitive to initialization. A first initialization method fills W0 and H0 with random values between 0 and (V¯/n)1/2, V¯ being the average of V. In a second method, W0 and H0 are obtained through singular value decomposition (SVD). The zeros of the sparse matrices W0 and H0 can then be filled by V¯ or by small random values. Zeros in W0 and H0 cannot be updated multiplicatively.

We use the implementation of NMF proposed by the Python module Scikit-learn (Pedregosa , 2011).

If the different source signals are W-disjoint and if n is large enough, each component of the matrices W and H calculated by NMF is dominated by a single source. However, one does not know which source each component corresponds to. Moreover, several components can be necessary to reconstruct a single source. To solve the class assignment problem, we propose detectors that use known time-frequency characteristics to group components dominated by the same source of interest. Classified components are finally employed to generate recovered spectrograms.

The Z-call detector is based on call repetition every 66 s. It selects components with a spectrum of the activation function H that peaks between 0.014 and 0.0165 Hz (periods between 60.6 and 71.4 s). The ability to adjust the target ICI enables this detector to be adapted to other stereotyped great-whale calls. Lin (2017, 2018) used the periodicity of activation functions to separate biological choruses based on their diurnal periodicity. We push forward this idea to identify a specific population.

The seismic sound detector is based on a large frequency width. The spectral components are smoothed using a rolling average with a 0.39-Hz window. The frequency width, Δf, is the frequency range on which the smoothed spectral component, W¯, is contiguously higher than the threshold: 0.1(max(W¯)min(W¯))+min(W¯). Spectral components with Δf10 Hz are classified as seismic sounds.

The ship noise detector is based on the slow variation of ship noise over time and on the small width of ship frequency lines. Components are classified as ship noise if the spectrum of their activation functions shows at least 10% of the energy in the mean value (f = 0 Hz) and if their frequency width verifies Δf2 Hz.

We use simulated data as a reference to evaluate the quality of the separation method based on SI-SDR (Le Roux , 2019) as follows:
(6)
where X is the reference spectrogram, Y the recovered spectrogram, x and y are the vectors containing all the values of matrices X and Y, matrix transpose is indicated by T, and ǁ.ǁ denotes the Frobenius norm. SI-SDR results in a robust pixel-to-pixel measure of similarity between the images X and Y, expressed in dB. High SI-SDR denotes a recovered spectrogram Y highly similar to the reference spectrogram X and hence proves the effectiveness of the source separation method.

Figure 1 gives the SI-SDR and computation time for the 16 combinations of cost function, solver type, and initialization method implemented in Scikit-learn. Although spectrograms visualize signals logarithmically, NMF is instead performed in the linear domain. NMF is conducted with 50 components, a number that guarantees acceptable performances for the three detectors. To account for the non-uniqueness of NMF results, 50 trials have been conducted for each of the 16 combinations. SI-SDR values and computation time in Fig. 1 correspond to the median of these 50 trials, with bars showing the first and the last decile.

Fig. 1.

SI-SDR between the spectrograms of simulated reference data and recovered spectrograms (top) and computation time (bottom) obtained for Z-calls (black circles), seismic sounds (blue squares), and ship noise (red triangles). The markers correspond to the median of 50 trials and the error bars to the first and last decile. The x axis gives employed cost functions: Frobenius norm (Fr), Kullback-Leibler divergence (KL), and Itakura-Saito divergence (IS); solvers: coordinate descent (CD) and multiplicative update (MU); and initialization methods: random value (ra.), SVD (S), SVD with zeros filled by the average of data (SA) or by small random values (SR).

Fig. 1.

SI-SDR between the spectrograms of simulated reference data and recovered spectrograms (top) and computation time (bottom) obtained for Z-calls (black circles), seismic sounds (blue squares), and ship noise (red triangles). The markers correspond to the median of 50 trials and the error bars to the first and last decile. The x axis gives employed cost functions: Frobenius norm (Fr), Kullback-Leibler divergence (KL), and Itakura-Saito divergence (IS); solvers: coordinate descent (CD) and multiplicative update (MU); and initialization methods: random value (ra.), SVD (S), SVD with zeros filled by the average of data (SA) or by small random values (SR).

Close modal

The combinations of Frobenius norm and multiplicative update performs faster than the others (median computation time 0.92 s with a standard deviation 0.20 s). However, only the initialization by SVD with data-average zero filling guarantees median SI-SDR above 0 dB, as the coordinate descent solver does. Median SI-SDR are above the values obtained when source separation if performed by principal component analysis (Z-calls: 0.6 dB, seismic sounds: −19.1 dB, ship noise: −0.6 dB, seismic sounds: −14.1 dB, ship noise: −126.5 dB). This combination seems appropriate for large datasets (e.g., one year of data for investigating whale migration patterns). Comparison between the results obtained for Z-calls, seismic sounds, and ship noise clearly shows that differences in computation times are due to NMF algorithms and not to detectors. Whatever the norm, multiplicative update without zero filling of initial matrices yields low SI-SDR since zeros are not updated (Fig. 1: combinations Fr-MU-S, KL-MU-S, and IS-MU-S).

Figure 2 shows mixed simulated data and the result of source separation using Frobenius norm, multiplicative update, and initialization by SVD with zeros filled by the average of data. This combination yields median SI-SDR above 0 dB and a 0.92 s median computation time (Fig. 1, combination Fr-MU-SA). The three simulated sources are overall well separated, even though artifacts are visible in recovered Z-calls and energy is missing in recovered ship noise at times corresponding to seismic sounds and at frequencies corresponding to the Z-call 26-Hz tone. For the trial in Fig. 2, the SI-SDR are 3.5 dB for Z-calls, 3.6 dB for seismic sounds, and 4.0 dB for ship noise.

Fig. 2.

Spectrograms of simulated data (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Blue whale Z-calls recovered using NMF and detectors. (c) Recovered seismic sounds. (d) Recovered ship noise.

Fig. 2.

Spectrograms of simulated data (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Blue whale Z-calls recovered using NMF and detectors. (c) Recovered seismic sounds. (d) Recovered ship noise.

Close modal

Source separation is applied to RHUM-RUM data to illustrate its advantages and limits. Figure 3 shows a spectrogram of the pressure recorded by the station RR48 on May 31, 2018, and the three recovered spectrograms. NMF is conducted using Frobenius norm, multiplicative update, and initialization by SVD with zeros filled by the average of data (case 7, Fig. 1). The computation time for 24 h of data is 46 s.

Fig. 3.

Spectrograms (in dB) of RR48 station recorded pressure on May 31, 2013, bandpass filtered between 5 and 46 Hz (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Recovered blue whale Z-calls. (c) Recovered seismic sounds. (d) Recovered ship noise.

Fig. 3.

Spectrograms (in dB) of RR48 station recorded pressure on May 31, 2013, bandpass filtered between 5 and 46 Hz (window length: 1024 points, overlap: 50%). (a) Raw data. (b) Recovered blue whale Z-calls. (c) Recovered seismic sounds. (d) Recovered ship noise.

Close modal

All recovered Z-calls show the higher and more energetic tone at 26 Hz; a large proportion of them shows the downsweep with approximately half displaying the lower tone at 18 Hz. False detections occur out of the 18–26 Hz band during the first 8 hours of the day when ship noise is relatively intense. The Z-calls of the blue whale passing by the RR48 station are clearly separated from the chorus of more distant whales. Impulse sounds linked to seismic activity are correctly separated from the rest of the signal after 6:00 a.m., when ship noise becomes less intense. Background noise is attenuated compared to ship noise, but it is not separated from the blue whale chorus, causing false detections. The detector is not perfectly robust to engine-regimen changes that occur after 5:00 a.m., modifying the fundamental frequency of the harmonic signal.

The difficulties encountered in the separation of Z-calls and seismic sounds are linked to the hypothesis of W-disjoint signals not being satisfied locally. However, further work is needed for separating ship noise from blue whale chorus.

NMF is effective for isolating time-frequency patterns in spectrograms. We use it to recover whale-call, seismic sound, and ship noise spectrograms. The source separation is evaluated quantitatively on simulated data for different NMF algorithms and then applied to in situ data. The results are encouraging for Z-calls and seismic sounds, as long as time-frequency overlaps are limited. Further work is required to reduce false ship noise detections caused by blue whale chorus.

Different time- and frequency-domain transformations (e.g., wavelet, Stockwell) could minimize the overlap between source signals. Regularization of W or H within NMF cost functions can enhance the identification of continuous components (e.g., Gloaguen , 2021). The methods described in this paper could be adapted to develop detectors, such as adjusting the target ICI to generalize the Z-call detector for other stereotyped great-whale calls.

Future work will use this source separation method to perform source localization.

See the supplementary material for the SI-SDR between reference data and recovered data for a number of components from 2 to 200 and for visualizing matrices W and H calculated by 10-component NMF over half an hour of data and mono-component reconstructed spectrograms.

This work was supported by Agence de l'Innovation de Défense, DÉtection et Séparation de Sources en Ultra Basses Fréquences project. The authors thank Associate Editor Dr. D. Dall 'Osto and two anonymous reviewers for their constructive and insightful comments on the original manuscript that greatly helped to improve the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

1.
Barruol
,
G.
, and
Sigloch
,
K.
(
2013
). “
Investigating La Réunion hot spot from crust to core
,”
EoS. Trans.
94
(
23
),
205
207
.
2.
Cichocki
,
A.
, and
Phan
,
A.-H.
(
2009
). “
Fast local algorithms for large scale nonnegative matrix and tensor factorizations
,”
IEICE Trans. Fund.
E92A
(
3
),
708
721
.
3.
Dréo
,
R.
,
Bouffaut
,
L.
,
Leroy
,
E.
,
Barruol
,
G.
, and
Samaran
,
F.
(
2019
). “
Baleen whale distribution and seasonal occurrence revealed by an ocean bottom seismometer network in the Western Indian Ocean
,”
Deep Sea Res. II
161
,
132
144
.
4.
Févotte
,
C.
,
Bertin
,
N.
, and
Durrieu
,
J.-L.
(
2009
). “
Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis
,”
Neural Comput.
21
(
3
),
793
830
.
5.
Gloaguen
,
J.-R.
,
Ecotière
,
D.
,
Gauvreau
,
B.
,
Finez
,
A.
,
Petit
,
A.
, and
Le Bourdat
,
C.
(
2021
). “
Automatic estimation of the sound emergence of wind turbine noise with nonnegative matrix factorization
,”
J. Acoust. Soc. Am.
150
(
4
),
3127
3138
.
6.
Hyvärinen
,
A.
, and
Oja
,
E.
(
2000
). “
Independent component analysis: Algorithms and applications
,”
Neural Netw.
13
(
4
),
411
430
.
7.
Le Roux
,
J.
,
Wisdom
,
S.
,
Erdogan
,
H.
, and
Hershey
,
J. R.
(
2019
). “
SDR: Half-baked or well done?
,” in
ICASSP 2019: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing
(
IEEE
,
Brighton, UK)
, pp.
626
630
.
8.
Lecoulant
,
J.
,
Guennou
,
C.
,
Guillon
,
L.
, and
Royer
,
J.-Y.
(
2022
). “
Numerical modeling and observations of seismo-acoustic waves propagating as modes in a fluid-solid waveguide
,”
J. Acoust. Soc. Am.
151
(
5
),
3437
3447
.
9.
Lee
,
D.
, and
Seung
,
H. S.
(
2000
). “
Algorithms for non–negative matrix factorization
,” in
Advances in Neural Information Processing Systems
, edited by
T.
Leen
,
T.
Dietterich
, and
V.
Tresp
,
Massachusetts Institute of Technology
,
Cambridge, MA
, Vol.
13
.
10.
Lee
,
D. D.
, and
Seung
,
H. S.
(
1999
). “
Learning the parts of objects by non–negative matrix factorization
,”
Nature
401
(
6755
),
788
791
.
11.
Lin
,
T.-H.
,
Fang
,
S.-H.
, and
Tsao
,
Y.
(
2017
). “
Improving biodiversity assessment via unsupervised separation of biological sounds from long-duration recordings
,”
Sci. Rep.
7
(
1
),
4547
.
12.
Lin
,
T.-H.
,
Tsao
,
Y.
, and
Akamatsu
,
T.
(
2018
). “
Comparison of passive acoustic soniferous fish monitoring with supervised and unsupervised approaches
,”
J. Acoust. Soc. Am.
143
(
4
),
EL278
EL284
.
13.
Paatero
,
P.
, and
Tapper
,
U.
(
1994
). “
Positive matrix factorization: A non–negative factor model with optimal utilization of error estimates of data values
,”
Environmetrics
5
(
2
),
111
126
.
14.
Pedregosa
,
F.
,
Varoquaux
,
G.
,
Gramfort
,
A.
,
Michel
,
V.
,
Thirion
,
B.
,
Grisel
,
O.
,
Blondel
,
M.
,
Prettenhofer
,
P.
,
Weiss
,
R.
,
Dubourg
,
V.
,
Vanderplas
,
J.
,
Passos
,
A.
,
Cournapeau
,
D.
,
Brucher
,
M.
,
Perrot
,
M.
, and
Duchesnay
,
E.
(
2011
). “
Scikit-learn: Machine Learning in Python
,”
J. Mach. Learn. Res.
12
,
2825
2830
.
15.
Socheleau
,
F.-X.
,
Leroy
,
E.
,
Carvallo Pecci
,
A.
,
Samaran
,
F.
,
Bonnel
,
J.
, and
Royer
,
J.-Y.
(
2015
). “
Automated detection of Antarctic blue whale calls
,”
J. Acoust. Soc. Am.
138
(
5
),
3105
3117
.
16.
Tromp
,
J.
,
Komatitsch
,
D.
, and
Qiny
,
L.
(
2008
). “
Spectral-element and adjoint methods in seismology
,”
Commun. Comput. Phys.
3
(
1
),
1
32
.
17.
Yilmaz
,
O.
, and
Rickard
,
S.
(
2004
). “
Blind separation of speech mixtures via time-frequency masking
,”
IEEE Trans. Signal Process.
52
(
7
),
1830
1847
.

Supplementary Material