Automatic classification of mysticete sounds has long been a challenging task in bioacoustics. The unknown statistical properties of the signals, the use of different recording apparatus, and low signal-to-noise ratio conditions often lead to suboptimal systems. The goal of this paper is to design methods for the automatic classification of mysticete sounds using a restricted Boltzmann machine and a sparse auto-encoder, two models widely used in the field of artificial intelligence. Experiments on five species of mysticetes are presented. The methods are applied both to the subset of species whose frequency ranges overlap and to the calls of all five species. Moreover, results are reported with and without a noise class. Overall, the systems achieve an average classification accuracy of over 69% (with noise) and 80% (without noise) across the different architectures.
