Many odontocetes produce frequency modulated tonal calls known as whistles. The ability to automatically determine time × frequency tracks corresponding to these vocalizations has numerous applications including species description, identification, and density estimation. This work develops and compares two algorithms on a common corpus of nearly one hour of data collected in the Southern California Bight and at Palmyra Atoll. The corpus contains over 3000 whistles from bottlenose dolphins, long- and short-beaked common dolphins, spinner dolphins, and melon-headed whales that have been annotated by a human, and released to the Moby Sound archive. Both algorithms use a common signal processing front end to determine time × frequency peaks from a spectrogram. In the first method, a particle filter performs Bayesian filtering, estimating the contour from the noisy spectral peaks. The second method uses an adaptive polynomial prediction to connect peaks into a graph, merging graphs when they cross. Whistle contours are extracted from graphs using information from both sides of crossings. The particle filter was able to retrieve 71.5% (recall) of the human annotated tonals with 60.8% of the detections being valid (precision). The graph algorithm’s recall rate was 80.0% with a precision of 76.9%.

You do not currently have access to this content.