This paper presents a method for automatically detecting fish sounds in underwater environments. Two difficulties arise: (i) the features and classifiers that give good detection results differ from one underwater environment to another, and (ii) the large amount of training data that supervised machine learning requires cannot always be prepared. The method presented in this paper (the proposed hybrid method) addresses these difficulties in two steps. First, a novel logistic regression (NLR) is derived via adaptive feature weighting that focuses on the accuracy of the classification results of multiple classifiers, namely a support vector machine (SVM) and k-nearest neighbors (k-NN). Where SVM or k-NN performs poorly because the useful features vary, NLR can produce complementary results. Second, the proposed hybrid method performs multi-stage classification that takes into account the accuracy of SVM, k-NN, and NLR. This multi-stage acquisition of reliable results adapts to the underwater environment and reduces the performance degradation caused by the diversity of useful classifiers, even when abundant training data cannot be prepared. Experiments on underwater recordings that include sounds of Sciaenidae, such as silver croakers (Pennahia argentata) and blue drums (Nibea mitsukurii), show the effectiveness of the proposed hybrid method.
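The idea of multi-stage classification can be illustrated with a minimal sketch. The abstract does not specify the actual decision rule, so the code below assumes one plausible reading: classifiers are ordered by their accuracy on held-out data, and a stage's result is accepted only when its confidence exceeds a threshold; otherwise the next stage is consulted. The names `build_stages`, `multi_stage_predict`, `clf_a`, `clf_b`, the threshold value, and the toy data are all hypothetical, and the two toy rules merely stand in for trained classifiers such as SVM, k-NN, or NLR.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier maps a feature vector to (label, confidence in [0, 1]).
Classifier = Callable[[Sequence[float]], Tuple[int, float]]


def validation_accuracy(clf: Classifier,
                        samples: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Fraction of held-out samples that clf labels correctly."""
    correct = sum(1 for x, y in zip(samples, labels) if clf(x)[0] == y)
    return correct / len(labels)


def build_stages(classifiers: List[Classifier],
                 samples: Sequence[Sequence[float]],
                 labels: Sequence[int]) -> List[Classifier]:
    """Order classifiers from most to least accurate on held-out data."""
    return sorted(classifiers,
                  key=lambda c: validation_accuracy(c, samples, labels),
                  reverse=True)


def multi_stage_predict(x: Sequence[float],
                        stages: List[Classifier],
                        threshold: float = 0.8) -> int:
    """Accept the first sufficiently confident stage; else use the last one."""
    for clf in stages[:-1]:
        label, confidence = clf(x)
        if confidence >= threshold:
            return label
    return stages[-1](x)[0]


# Hypothetical toy rules standing in for trained classifiers.
def clf_a(x: Sequence[float]) -> Tuple[int, float]:
    margin = x[0] - 0.5  # decides on the first feature only
    return (1 if margin > 0 else 0, min(1.0, abs(margin) * 2))


def clf_b(x: Sequence[float]) -> Tuple[int, float]:
    margin = x[1] - 0.5  # decides on the second feature only
    return (1 if margin > 0 else 0, min(1.0, abs(margin) * 2))


# Hypothetical held-out data: label 1 = fish sound, 0 = background.
val_x = [(0.9, 0.2), (0.1, 0.1), (0.8, 0.9), (0.2, 0.7)]
val_y = [1, 0, 1, 0]

stages = build_stages([clf_b, clf_a], val_x, val_y)
confident = multi_stage_predict((0.95, 0.1), stages)  # first stage is confident
fallback = multi_stage_predict((0.55, 0.1), stages)   # first stage defers
```

On this toy data `clf_a` is the more accurate rule and is placed first; the second query falls through to `clf_b` because the first stage's confidence is below the threshold. How the proposed hybrid method weighs SVM, k-NN, and NLR in practice may differ from this sketch.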
