A method is described for the automatic recognition of transient animal sounds. Automatic recognition can be used in wild animal research, including studies of behavior, population, and impact of anthropogenic noise. The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps. For a sound type of interest, a two-dimensional synthetic kernel is constructed and cross-correlated with a spectrogram of a recording, producing a recognition function—the likelihood at each point in time that the sound type was present. A threshold is applied to this function to obtain discrete detection events, instants at which the sound type of interest was likely to be present. An extension of this method handles the temporal variation commonly present in animal sounds. Spectrogram correlation was compared to three other methods that have been used for automatic call recognition: matched filters, neural networks, and hidden Markov models. The test data set consisted of bowhead whale (Balaena mysticetus) end notes from songs recorded in Alaska in 1986 and 1988. The method had a success rate of about 97.5% on this problem, and the comparison indicated that it could be especially useful for detecting a call type when relatively few (5–200) instances of the call type are known.
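The kernel, correlation, and threshold steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the spectrogram is a hand-built toy grid rather than one computed from a recording, the "hat"-shaped kernel cross-section is an assumed choice, and the function names (`sweep_kernel`, `recognition_function`, `detect`) and the threshold value are hypothetical.

```python
import math

def sweep_kernel(n_freq, n_time, f_start, f_end, sigma=1.0):
    """Synthetic 2-D kernel for a linear frequency sweep (rows = frequency bins).

    Assumed shape: positive along the sweep, negative just above and below it,
    so that broadband noise contributes little to the correlation.
    """
    kernel = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            centre = f_start + (f_end - f_start) * t / (n_time - 1)
            x = (f - centre) / sigma
            row.append((1.0 - x * x) * math.exp(-0.5 * x * x))
        kernel.append(row)
    return kernel

def recognition_function(spec, kernel):
    """Slide the kernel along the time axis and sum kernel * spectrogram."""
    n_freq, n_frames, k_time = len(spec), len(spec[0]), len(kernel[0])
    scores = []
    for t0 in range(n_frames - k_time + 1):
        s = 0.0
        for f in range(n_freq):
            for t in range(k_time):
                s += kernel[f][t] * spec[f][t0 + t]
        scores.append(s)
    return scores

def detect(scores, threshold):
    """Return frame indices of local peaks that exceed the threshold."""
    events = []
    for i, s in enumerate(scores):
        if s <= threshold:
            continue
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= left and s > right:
            events.append(i)
    return events

# Toy spectrogram: 8 frequency bins x 30 frames, silent except for one
# upsweep from bin 1 to bin 6 that starts at frame 10.
spec = [[0.0] * 30 for _ in range(8)]
for u in range(6):
    spec[1 + u][10 + u] = 1.0

kernel = sweep_kernel(n_freq=8, n_time=6, f_start=1, f_end=6)
scores = recognition_function(spec, kernel)
events = detect(scores, threshold=3.0)
print(events)  # -> [10]
```

In practice the spectrogram would be computed from the recording, and the threshold would be tuned to balance missed calls against false detections.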
