This paper proposes a robust system for detecting North Atlantic right whales that uses deep learning methods to denoise noisy recordings. Passive acoustic recordings of right whale vocalisations are subject to noise contamination from many sources, such as shipping and offshore activities. When such recordings are input to uncompensated classifiers, accuracy falls substantially. To build robustness into the detection process, two approaches that have proved successful for image denoising are considered: a denoising convolutional neural network and a denoising autoencoder, each applied to spectrogram representations of the noisy audio signal. Performance is improved further by matching the classifier training to the denoised data, so that it includes the vestigial signal that remains in the clean estimates after denoising. Evaluations are first performed under simulated noisy conditions, created by adding white, tanker, trawler, and shot noise to clean recordings at signal-to-noise ratios from −10 to +5 dB. Experiments show that denoising gives substantial improvements in accuracy, particularly when using the vestigial-trained classifier. A final test applies the proposed methods to previously unseen noisy right whale recordings and finds that denoising improves performance over the baseline clean-trained model in this new noise environment.
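The simulated noisy conditions described above rest on mixing a noise recording into a clean recording at a chosen signal-to-noise ratio. The following is a minimal sketch of that step only, not the authors' implementation: the function names `mix_at_snr` and `measured_snr_db`, the synthetic tone, and the Gaussian noise are all illustrative assumptions, standing in for real whale and vessel recordings.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; both inputs are equal-length sample lists.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        raise ValueError("both signals need non-zero power")
    # The target noise power is p_clean / 10^(snr_db / 10); the amplitude gain
    # is the square root of (target noise power / current noise power).
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_resid)


rng = random.Random(0)
# Stand-in signals (assumptions): a synthetic tone and Gaussian white noise.
clean = [math.sin(2 * math.pi * 100 * t / 2000) for t in range(2000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Sweep the SNR range quoted in the abstract in 5 dB steps (step size assumed).
for snr_db in (-10, -5, 0, 5):
    noisy = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, noisy), 3))
```

Scaling the noise rather than the clean signal keeps the target call at its original level, so only the noise floor changes across conditions. In a full pipeline, the mixed waveform would then be converted to a spectrogram before denoising and classification.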
