Source separation is an important step in studying signals that are difficult or impossible to record individually. Common methods such as deep clustering, however, cannot be applied to mixtures with an unknown number of sources and/or to signals that overlap in time and/or frequency, a common problem in bioacoustic recordings. This work presents a supervised learning approach that parses individual sources from the spectrogram of a mixture containing a variable number of overlapping sources. The method isolates individual sources in the time-frequency domain using a single function applied in two separate steps: a detection step that estimates the number of sources and their corresponding bounding boxes, and a segmentation step that extracts a mask for each individual sound. The approach uses deep neural networks to fully separate sources that overlap in both time and frequency, in a manner applicable to other tasks such as bird audio detection. This paper presents the method and reports its performance in parsing individual bat signals from recordings containing hundreds of overlapping bat echolocation signals. The method can be extended to other bioacoustic recordings with a variable number of sources and signals that overlap in time and/or frequency.
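
To make the two-step idea concrete, the following is a minimal sketch of how per-source bounding boxes and masks could be applied to a mixture spectrogram once they are available. It is not the authors' network: the function name `separate_sources`, the `(f0, f1, t0, t1)` box layout, and the hand-written boxes and masks are assumptions made for illustration, standing in for what the detection and segmentation steps would predict.

```python
import numpy as np


def separate_sources(spectrogram, boxes, masks):
    """Return one masked copy of the spectrogram per detected source.

    spectrogram: 2-D array indexed as (frequency bin, time frame).
    boxes: list of (f0, f1, t0, t1) half-open index ranges (hypothetical layout).
    masks: list of boolean arrays, each with the shape of its box.
    """
    sources = []
    for (f0, f1, t0, t1), mask in zip(boxes, masks):
        isolated = np.zeros_like(spectrogram)
        # Keep only the bins inside the box that the mask assigns to this source.
        isolated[f0:f1, t0:t1] = np.where(mask, spectrogram[f0:f1, t0:t1], 0)
        sources.append(isolated)
    return sources


# Toy mixture: 4 frequency bins x 5 time frames.
mixture = np.arange(1, 21, dtype=float).reshape(4, 5)

# Stand-ins for the detection output (boxes) and segmentation output (masks).
# The two boxes overlap at bin (1, 2), so that bin is assigned to both sources.
boxes = [(0, 2, 0, 3), (1, 4, 2, 5)]
masks = [np.ones((2, 3), dtype=bool), np.eye(3, dtype=bool)]

sources = separate_sources(mixture, boxes, masks)
```

The number of separated sources is simply the number of boxes, so nothing in the sketch fixes the source count in advance, and because each source gets its own mask, a time-frequency bin can belong to more than one source.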
