African manatees (Trichechus senegalensis) are vulnerable, understudied, and difficult to detect. Areas where African manatees are found were acoustically sampled and deep learning techniques were used to develop the first African manatee vocalization detector. A transfer learning approach was used to develop a convolutional neural network (CNN) using a pretrained CNN (GoogLeNet). The network was highly successful, even when applied to recordings collected from a different location. Vocal detections were more common at night and tended to occur within less than 2 min of one another.

Sound propagates well in water and is an effective medium for communication utilized by many marine species (Au and Hastings, 2008). Species can use sound for a wide range of purposes including defending territories, finding conspecifics, identifying individuals, and coordinating activities (Garcia and Favaro, 2017; Penar et al., 2020; Tavolga, 1965). Detecting biological sound to learn about species' presence and behavior is called passive acoustic monitoring (PAM) and has been successful in the detection of many marine mammal species (e.g., Marques et al., 2013; Marian et al., 2021; MacIntyre et al., 2013; Rand et al., 2022; Romagosa et al., 2020; Rycyk et al., 2022).

African manatees (Trichechus senegalensis) are an understudied and vulnerable species (Keith Diagne, 2015). The data deficiency stems, in part, from their cryptic behavior that makes them difficult to visually detect (Mayaka et al., 2015; Takoukam Kamla, 2012). African manatees come to the surface only briefly to breathe, rarely break the water’s surface with their activity during the day, and are commonly found in environments with limited water clarity (Takoukam Kamla, 2012). While difficult to see, African manatees produce vocalizations that can be used to acoustically detect their presence (Rycyk et al., 2021).

One of the barriers to the widespread use of passive acoustic monitoring across species is the challenge of analyzing large amounts of data. This burden can be greatly reduced by using deep learning techniques to develop automated methods of detecting sounds of interest, such as vocalizations. In particular, convolutional neural networks (CNNs) have grown in popularity and have been successful in extracting vocalizations of many species from large datasets (e.g., Escobar-Amado et al., 2022; Jiang et al., 2019; Merchan et al., 2020; Rasmussen and Širović, 2021; Ríos et al., 2021; Stowell, 2022; Usman et al., 2020). For example, Allen et al. (2021) trained a CNN to detect song produced by humpback whales (Megaptera novaeangliae) and applied it to more than 187 000 h of acoustic recordings. Training a CNN “from scratch” is time-consuming and requires significant knowledge of neural network architectures. That effort can be reduced by transfer learning, in which a CNN trained on a large dataset is used as the starting point for developing a new CNN (Dufourq et al., 2022; Pan and Yang, 2010; Weiss et al., 2016).

To develop a CNN, a large training dataset is required, and manually finding a large number of vocalizations for training can be time-consuming. Additionally, building a training dataset from data collected at a single location can result in a similar soundscape across the training set; the resulting CNN may not perform well at a new location with a different soundscape because sounds the CNN was never trained on may be misclassified (Roch et al., 2015). Beyond the challenges of developing a CNN, there are time and financial costs associated with collecting large amounts of acoustic data. Deciding when, where, and for how long to collect acoustic data is particularly challenging for a species whose vocal behavior is poorly known. Descriptions of the timing and variability of a species' vocal detections are therefore helpful for planning future studies of that species.

We address many of these challenges by (1) collecting acoustic data from multiple locations in areas where there is evidence that African manatees are present, (2) evaluating the effectiveness of a CNN trained to detect African manatee vocalizations, (3) validating the CNN using two datasets, and (4) summarizing temporal patterns in vocal detections.

We used previously collected recordings from Lake Ossa, Cameroon (3.76810° N, 10.02566° E) and newly collected recordings from Lekki Lagoon (6° 25.137′ N, 4° 14.102′ E) and Badagry Lagoon (6° 26.713′ N, 2° 50.964′ E) in Nigeria. The Lake Ossa samples were from recordings collected in April 2020 with an LS1 underwater acoustic recorder (HTI-96-min hydrophone, sensitivity of −180.2 dB re 1 V/µPa, frequency response 2 Hz–30 kHz, sampling rate of 44.1 kHz; Loggerhead Instruments, Sarasota, FL). For more information about this dataset, see Rycyk et al. (2021). The Nigeria data were collected continuously with a Snap underwater acoustic recorder (HTI-96-min hydrophone, sensitivity of −180.6 dB re 1 V/µPa, frequency response 2 Hz–30 kHz, sampling rate of 44.1 kHz; Loggerhead Instruments). The Lekki Lagoon site was selected based on manatee feeding signs noticed in an environment dominated by hippo grass (Vossia cuspidata) and white lotus (Nymphaea lotus). The recorder was deployed between March 23 and 31, 2022 (8.8 days) on a 40 kg concrete platform in a vertical position approximately 50 cm off the bottom. The environmental conditions at the time of deployment were an air temperature of 33.0 °C, water temperature of 34.0 °C, water transparency of 1.4 m, salinity of 0.0‰, and water depth of 1.4 m at Lekki Lagoon and an air temperature of 34.0 °C, water temperature of 33.5 °C, water transparency of 1.6 m, salinity of 0.1‰, and water depth of 1.6 m at Badagry Lagoon. The Badagry Lagoon site was selected based on its environmental similarity to Lekki Lagoon; the recorder was deployed between April 20 and 29, 2022 (10.0 days) on a 20 kg concrete platform, also in a vertical position, at a depth of 1.6 m.

Recordings were split into 0.5 s clips, and a spectrogram was created for each clip. Spectrograms were computed with short-time Fourier transforms using Kaiser windows with 64 Hz frequency resolution. The default MATLAB colormap was used, with magnitude-dependent color mapping whose color range extents were determined by the power range of the signal. Each spectrogram image was limited to 1–20 kHz and resized to 224 × 224 pixels with red-green-blue color channels using bicubic interpolation.
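The clip-to-image conversion can be sketched in MATLAB as below. Only the 0.5 s clip length, 64 Hz frequency resolution, 1–20 kHz band, default colormap, and 224 × 224 bicubic resizing follow the description above; the Kaiser shape parameter, overlap, and colormap choice (parula) are assumptions for illustration.

% Minimal sketch: convert a 0.5 s audio clip into a 224 x 224 RGB spectrogram image.
% Assumed values (not specified above): Kaiser beta = 5, 50% overlap, parula colormap.
fs   = 44100;                       % sampling rate (Hz)
clip = randn(round(0.5*fs), 1);     % placeholder 0.5 s clip; replace with audioread() output

nwin = round(fs/64);                % window length giving ~64 Hz frequency resolution
win  = kaiser(nwin, 5);             % Kaiser window (beta assumed)
nov  = round(nwin/2);               % 50% overlap (assumed)

[~, f, ~, p] = spectrogram(clip, win, nov, nwin, fs);   % power spectral density estimate
pdb  = 10*log10(p + eps);                               % convert to dB

band = f >= 1000 & f <= 20000;                          % keep the 1-20 kHz band
pdb  = pdb(band, :);

% Magnitude-dependent color mapping: scale to the clip's own power range.
lo  = min(pdb(:));  hi = max(pdb(:));
idx = round(1 + 255*(pdb - lo)/max(hi - lo, eps));      % map to 256 colormap indices
rgb = ind2rgb(flipud(idx), parula(256));                % apply default MATLAB colormap

img = imresize(rgb, [224 224], 'bicubic');              % resize to the GoogLeNet input size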

A transfer learning approach was used to train a convolutional neural network to classify spectrogram images as containing or not containing a manatee vocalization (see workflow in Fig. 1). A pretrained network, GoogLeNet, was used as the starting point in MATLAB with the Deep Learning Toolbox Model for GoogLeNet Network (Szegedy et al., 2014; Mathworks, 2022). This 22-layer network has been trained on more than 1 million images to classify object types. It was fine-tuned to classify the presence/absence of manatee vocalizations using a stochastic gradient descent algorithm with a learning rate of 0.0001 over 10 epochs, with shuffling between each epoch. Predictions ≥ 0.5 were classified as manatee vocalizations.
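A transfer-learning setup along these lines can be expressed in MATLAB as in the sketch below. The folder layout of the image datastore is an assumption, the final-layer names follow the standard MATLAB GoogLeNet model, and only the optimizer family, learning rate, epoch count, shuffling, and 70/30 split follow the description in this letter.

% Minimal sketch of fine-tuning GoogLeNet for two-class spectrogram classification.
% Assumes spectrogram images are saved in subfolders named 'vocalization' and 'no_vocalization'.
imds = imageDatastore('training_images', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.7, 'randomized');   % 70/30 train/validation split

net    = googlenet;                         % pretrained network (Deep Learning Toolbox Model)
lgraph = layerGraph(net);

% Replace the 1000-class output layers with two-class layers.
newFC  = fullyConnectedLayer(2, 'Name', 'fc_manatee');
lgraph = replaceLayer(lgraph, 'loss3-classifier', newFC);
lgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'out_manatee'));

opts = trainingOptions('sgdm', ...          % stochastic gradient descent (with momentum)
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', imdsVal);

trainedNet = trainNetwork(imdsTrain, lgraph, opts);

% Classify clips; scores >= 0.5 for the vocalization class count as detections.
[labels, scores] = classify(trainedNet, imdsVal);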

Fig. 1. A diagram of the workflow (starts in the top left) for creating the final convolutional neural network (CNN) to detect 0.5 s clips with an African manatee vocalization and validation data from two locations (inset box).

The training approach described in the previous paragraph was used twice. The first training session was used to find additional manatee vocalization samples [example vocalizations in Fig. 2(A)]. For each training session, images were randomly split into two groups, with 70% used for training and the rest for validation. The first training session used an initial dataset that contained a large number of samples from Lake Ossa (5885 vocalization samples, 2490 no vocalization samples) but fewer vocalization samples from Lekki Lagoon (627 vocalization samples, 4022 no vocalization samples). These samples were found manually by visually scanning spectrograms of a portion of the recordings. GoogLeNet trained on this preliminary set of samples is called the preliminary CNN. The preliminary CNN was applied to the full Lekki Lagoon set of recordings; newly detected vocalizations were added to the initial vocalization sample set, and false positives were added to the initial no vocalization sample set, to bolster sample size. The training approach was then repeated using this larger dataset (8613 vocalization samples, 8613 no vocalization samples) to create the final CNN (see supplementary material for the MATLAB file of the final CNN).1
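The second-pass expansion of the training set could look roughly like the sketch below, in which the preliminary CNN is run over every 0.5 s clip of a recording and the flagged clips are set aside for manual review. The file name, the preliminaryCNN variable, and the clip2image helper (standing in for the spectrogram conversion sketched earlier) are hypothetical.

% Minimal sketch: run the preliminary CNN over a full recording in 0.5 s clips and
% collect detections for manual review before adding them to the training set.
% clip2image() is a hypothetical helper implementing the spectrogram conversion above.
[x, fs]  = audioread('lekki_lagoon_recording.wav');     % placeholder file name
clipLen  = round(0.5*fs);
nClips   = floor(numel(x)/clipLen);

detectedIdx = [];
for k = 1:nClips
    clip = x((k-1)*clipLen + (1:clipLen));
    img  = clip2image(clip, fs);                        % 224 x 224 RGB spectrogram image
    [label, score] = classify(preliminaryCNN, img);
    if score(2) >= 0.5                                  % assumed: column 2 = vocalization class
        detectedIdx(end+1) = k; %#ok<AGROW>
    end
end

% Flagged clips are then manually audited: confirmed calls go to the vocalization set,
% false positives go to the no-vocalization set, and the CNN is retrained.
fprintf('%d of %d clips flagged as vocalizations\n', numel(detectedIdx), nClips);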

Fig. 2. (A) A spectrogram of two African manatee vocalizations (1024 point discrete Fourier transform, Hann window, 50% overlap). These vocalizations were detected farther apart and moved closer together for visualization. (B) Mean ± SE number of samples that contained a vocalization detection per day at two locations, Lekki Lagoon (8 days) and Badagry Lagoon (9 days). Only days with a full 24 h of data collection are included. Each sample is 0.5 s in duration, and a positive detection could contain more than one vocalization. A particular vocalization may also be detected in more than one clip.

The final CNN was applied to the full Lekki Lagoon and Badagry Lagoon sets of recordings. Because the final CNN was trained on samples from Lekki Lagoon, it was expected to perform well on that dataset. The final CNN was not trained on samples from Badagry Lagoon, so its performance on that dataset indicates whether the CNN was overfitted and whether it generalizes to other sites. After the final CNN was applied to each full dataset (Lekki Lagoon and Badagry Lagoon), 10% of the classifications from each location were manually validated. For validation, spectrogram clips classified as containing a manatee vocalization were checked to determine whether that classification was correct, and clips classified as not containing a manatee vocalization were checked in the same way. The validation set of clips was randomly selected, with 10% of clips from each day included, ensuring that the validation set evenly represented each recording day. From the manual validation, true detection, miss, and false alarm rates were calculated for each location (Lekki and Badagry).
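Under one common set of definitions (stated here as an assumption, since the exact formulas are not given above and the reported rates may be normalized differently), the three rates could be computed from the manual validation counts as in this sketch; the counts themselves are placeholders.

% Minimal sketch: summarize a manually validated subset of CNN classifications.
% All counts are placeholders; the rate definitions are assumptions.
truePos  = 470;    % clips classified as vocalization and confirmed by the analyst
falsePos = 2;      % clips classified as vocalization but rejected by the analyst
falseNeg = 12;     % clips classified as no vocalization that actually contained one
trueNeg  = 4500;   % clips classified as no vocalization and confirmed empty

trueDetectionRate = truePos / (truePos + falseNeg);    % fraction of real calls captured
missRate          = falseNeg / (truePos + falseNeg);   % fraction of real calls missed
falseAlarmRate    = falsePos / (falsePos + trueNeg);   % fraction of empty clips flagged

fprintf('True detection: %.1f%%, miss: %.1f%%, false alarm: %.1f%%\n', ...
    100*trueDetectionRate, 100*missRate, 100*falseAlarmRate);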

The classifications from applying the final CNN to the full Lekki Lagoon and Badagry Lagoon datasets were evaluated for differences in the occurrence of vocalizations between sites (Lekki and Badagry) and for diel patterns. To compare acoustic activity between sites and by time of day, we calculated the mean ± SE (standard error) number of vocalization detections pooled per hour and per day. We also evaluated how close together in time vocalization detections occurred to inform the selection of recording durations for future studies. For this, the number of seconds between successive vocalization detections was calculated for the full Lekki Lagoon and Badagry Lagoon datasets. Intervals less than 1 s were excluded because such small intervals can indicate vocalizations detected in neighboring 0.5 s clips. Cumulative density frequency curves of vocalization detection intervals were created for each location to visualize how different interval thresholds affect the portion of vocalizations captured; these curves can guide choices about duty cycle and sampling intervals in future African manatee PAM research. Additionally, the 90th percentile interval value was calculated for each location to identify an interval within which the strong majority of neighboring vocalizations occur. The 90th percentile was chosen as a guide, but a different threshold may be more appropriate when designing future sampling schemes, depending on resources, site accessibility, and research question. All analyses were performed in MATLAB (MATLAB, 2021).
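The interval analysis can be sketched as follows; detectionTimes is a hypothetical vector of detection clip start times (the values below are synthetic), and only the < 1 s exclusion, the cumulative frequency curve, and the 90th percentile follow the description above.

% Minimal sketch: time between successive vocalization detections at one site.
% detectionTimes is a placeholder vector of clip start times (s) flagged by the final CNN.
detectionTimes = sort(cumsum(exprnd(30, 500, 1)));   % synthetic example data

intervals = diff(detectionTimes);                    % seconds between neighboring detections
intervals = intervals(intervals >= 1);               % drop < 1 s (same call split across clips)

% Empirical cumulative frequency curve of detection intervals.
[fCum, xInt] = ecdf(intervals);
plot(xInt, fCum);
xlabel('Interval between detections (s)');
ylabel('Cumulative frequency');

% Interval within which 90% of neighboring detections occur.
p90 = prctile(intervals, 90);
fprintf('90th percentile interval: %.1f s\n', p90);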

The final CNN was able to classify spectrograms with African manatee vocalizations with high accuracy (see validation in Fig. 1). When tested on the Lekki Lagoon location, where part of the training data originated, the CNN accurately captured 95.9% of the samples with an African manatee vocalization. It missed only 0.4% of vocalization samples and had a false alarm rate of 0.0%. When tested on a new location, Badagry Lagoon, the CNN still had high accuracy, but less than was found for the Lekki Lagoon dataset. The final CNN captured 90.2% of samples with an African manatee vocalization in the Badagry Lagoon dataset. It missed 2.3% of samples with vocalizations and had a false alarm rate of 0.1%.

On average, there were more vocalization detections at the Badagry Lagoon location (1642 ± 297 vocalization detections/day) than at the Lekki Lagoon location (386 ± 145 vocalization detections/day) [Fig. 2(B)]. Vocalizations were more commonly detected during the night at both locations (Fig. 3).

Fig. 3. Mean ± SE number of samples that contained a vocalization detection per hour of the day at two locations, Lekki Lagoon (8.8 days) and Badagry Lagoon (10.0 days). Each sample is 0.5 s in duration, and a positive detection could contain more than one vocalization. A particular vocalization may also be detected in more than one clip. The hour numbers start at midnight (i.e., 0 = midnight, 12 = noon).

The cumulative density frequency curve of duration between vocalization detections was similar for the Lekki Lagoon and Badagry Lagoon locations (Fig. 4). The 90th percentile duration between vocalization detections (excluding < 1 s) at Lekki Lagoon was 112.6 s. The 90th percentile duration between vocalization detections (excluding < 1 s) at Badagry Lagoon was 107.7 s.

Fig. 4. The cumulative density frequency of the interval (seconds) between vocalization detections for Lekki Lagoon (top) and Badagry Lagoon (bottom). Intervals less than 1 s were excluded as these could represent vocalizations detected in neighboring 0.5 s clips.

Our study is the first to train a CNN to detect the presence of African manatee vocalizations in acoustic recordings. The resulting CNN was highly successful at detecting African manatee vocalizations, as evidenced by the high true detection rates, the low number of misses, and the low false alarm rates (Fig. 1). Unsurprisingly, the final CNN performed better on the dataset from the location, Lekki Lagoon, where some of the training data originated. Overfitting a CNN can lead to a false sense of model effectiveness; it occurs when the model produced is too complex and too specific to the training data. Overfitting can be evaluated by comparing model performance on the training data and on a set of new data. We tested whether the CNN was overfitted by applying it to a new location, Badagry Lagoon, and found that the CNN was still highly successful. If the CNN had been overfitted, it would have performed much worse on the Badagry Lagoon dataset. Our results suggest that, when analyzing recordings from new sites, training data from another site can serve as a good foundation as long as the acoustic recorder sampled frequencies up to 20 kHz. However, it is still important to incorporate samples from the new location into the training data so that differences in soundscape are accounted for and performance is improved. This is especially important when soundscapes differ greatly. Our two sites were relatively similar, as both were generally quiet locations, but there were still differences between the soundscapes: the largest were the insect choruses that lasted for hours a day at Lekki Lagoon and the more frequent bird vocalizations at Badagry Lagoon.

Building training datasets is commonly one of the most time-consuming hurdles to developing deep learning methods for extracting sounds of interest from acoustic recordings. We demonstrate that this burden can be lessened by starting with samples from another location and running a preliminary CNN to bolster the sample size (Yang et al., 2020). Adding samples from more locations will increase the representation of manatee vocalizations from more populations and incorporate samples from a wider variety of soundscapes; both will result in a more robust CNN with broader applicability. Another advantage is that automated detection algorithms can reduce human error in manual auditing and support replicability. Developing and using automated detection algorithms, such as the CNN developed here, is crucial for analyzing the large datasets needed to explore larger temporal and spatial scales of African manatee abundance, distribution, and habitat use patterns.

The development of a CNN to extract African manatee vocalizations from recordings greatly decreases the time required to analyze acoustic recordings, but there are still resource limitations to data collection. Acquiring acoustic recording devices, deploying and retrieving them, and storing and processing large amounts of data all carry time and financial costs. Therefore, it is important to select the lowest duty cycle and temporal resolution necessary to answer a given research question. These choices are difficult to make without knowing how often vocal detections are likely to occur and how much detection rates vary over time for the target species. We provide an analysis of the time between vocalization detections, temporal patterns, and a comparison between two locations to help inform these decisions.

On average, there were more than four times as many vocalization detections at the Badagry location as at Lekki (Fig. 2). These sites are 153.84 km apart but in similar environments. The recordings were collected approximately a month apart, and the large difference in detections could stem from temporal and/or geographic differences. Additionally, there was high variance in the number of vocalization detections per day at both sites, which suggests that sampling only a small number of days could lead to missing manatee presence in an area. Both sites exhibited far more vocalization detections at night than during the day (Fig. 3). This pattern agrees with African manatee vocalization detection patterns from a stationary recorder in Lake Ossa, Cameroon (Rycyk et al., 2021). All three datasets were recorded from stationary recorders, so it cannot be ruled out that manatees were simply not near the recorders during the day. However, finding a similar pattern at three locations builds evidence that African manatees are more vocally active at night. A possible reason for this diel behavior is reduced disturbance from human activity at night (Keith Diagne, 2015; Takoukam Kamla, 2012). This diel pattern in vocalization detection suggests it is crucial to include nighttime sampling when acoustically monitoring African manatees. Vocalization detections tended to occur close together in time, with the majority of detections occurring within less than 2 min of one another (Fig. 4). Altogether, our findings suggest that passive acoustic monitoring of African manatees should include multiple locations, multiple consecutive days, and nighttime recording.

Passive acoustic monitoring of African manatees can help us understand their distribution and habitat preferences. Vocalization detections may be used in the future to acoustically estimate abundance (Rycyk et al., 2022). Here, we only consider vocalizations, but similar methods can be used to develop a CNN to detect feeding sounds produced by African manatees. Acoustic detection of feeding sounds has been used to acoustically monitor feeding behavior in Amazonian (Trichechus inunguis) and West Indian manatees (Trichechus manatus) (Kikuchi et al., 2014). Combining the detection of both vocalizations and feeding sounds can increase the probability of acoustically detecting African manatees and provide information about how an area is being used by the manatees.

Data were collected in accordance with Institutional Animal Care and Use Committee Protocol No. IS00007646 from the University of South Florida. We thank the Save the Manatee Club for the manatee habitat monitoring grant awarded to D.A.B.

1. See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0016543 for the MATLAB file of the final CNN.

1. Allen, A. N., Harvey, M., Harrell, L., Jansen, A., Merkens, K. P., Wall, C. C., Cattiau, J., and Oleson, E. M. (2021). "A convolutional neural network for automated detection of humpback whale song in a diverse, long-term passive acoustic dataset," Front. Mar. Sci. 8, 607321.
2. Au, W. W. L., and Hastings, M. C. (2008). Principles of Marine Bioacoustics (Springer, New York), 679 pp.
3. Dufourq, E., Batist, C., Foquet, R., and Durbach, I. (2022). "Passive acoustic monitoring of animal populations with transfer learning," Ecol. Inform. 70, 101688.
4. Escobar-Amado, C. D., Badiey, M., and Pecknold, S. (2022). "Automatic detection and classification of bearded seal vocalizations in the northeastern Chukchi Sea using convolutional neural networks," J. Acoust. Soc. Am. 151, 299–309.
5. Garcia, M., and Favaro, L. (2017). "Animal vocal communication: Function, structures, and production mechanisms," Curr. Zool. 63, 417–419.
6. Jiang, J., Bu, L., Duan, F., Wang, X., Liu, W., Sun, Z., and Li, C. (2019). "Whistle detection and classification for whales based on convolutional neural networks," Appl. Acoust. 150, 169–178.
7. Keith Diagne, L. (2015). "Trichechus senegalensis," in IUCN Red List of Threatened Species (Last viewed August 16, 2022).
8. Kikuchi, M., Akamatsu, T., Gonzalez-Socoloske, D., de Souza, D. A., Olivera-Gomez, L. D., and da Silva, V. M. F. (2014). "Detection of manatee feeding events by animal-borne underwater sound recorders," J. Mar. Biol. Assoc. U.K. 94, 1139–1146.
9. MacIntyre, K. Q., Stafford, K. M., Berchok, C. L., and Boveng, P. L. (2013). "Year-round acoustic detection of bearded seals (Erignathus barbatus) in the Beaufort Sea relative to changing environmental conditions, 2008–2010," Polar Biol. 36, 1161–1173.
10. Marian, A. D., Monczak, A., Balmer, B. C., Hart, L. B., Soueidan, J., and Montie, E. W. (2021). "Long-term passive acoustics to assess spatial and temporal vocalization patterns of Atlantic common bottlenose dolphins (Tursiops truncatus) in the May River estuary, South Carolina," Mar. Mam. Sci. 37, 1060–1084.
11. Marques, T. A., Thomas, L., Martin, S. W., Mellinger, D. K., Ward, J. A., Moretti, D. J., Harris, D., and Tyack, P. L. (2013). "Estimating animal population density using passive acoustics," Biol. Rev. 88, 287–309.
12. Mathworks (2022). "Deep Learning Toolbox Model for GoogLeNet Network," https://www.mathworks.com/matlabcentral/fileexchange/64456-deep-learning-toolbox-model-for-googlenet-network (Last viewed August 16, 2022).
13. MATLAB (2021). MATLAB version 9.10 (R2021a) (The MathWorks Inc., Natick, MA).
14. Mayaka, T. B., Takoukam Kamla, A., and Self-Sullivan, C. (2015). "Using pooled local expert opinions (PLEO) to discern patterns in sightings of live and dead manatees (Trichechus senegalensis, Link 1785) in Lower Sanaga Basin, Cameroon," PLoS ONE 10, e0128579.
15. Merchan, F., Guerra, A., Poveda, H., Guzman, H. M., and Sanchez-Galan, J. E. (2020). "Bioacoustic classification of Antillean manatee vocalization spectrograms using deep convolutional neural networks," Appl. Sci. 10, 3286.
16. Pan, S. J., and Yang, Q. (2010). "A survey on transfer learning," IEEE Trans. Knowl. Data Eng. 22, 1345–1359.
17. Penar, W., Magiera, A., and Klocek, C. (2020). "Applications of bioacoustics in animal ecology," Ecol. Complex. 43, 100847.
18. Rand, Z. R., Wood, J. D., and Oswald, J. N. (2022). "Effects of duty cycles on passive acoustic monitoring of southern resident killer whale (Orcinus orca) occurrence and behavior," J. Acoust. Soc. Am. 151, 1651–1660.
19. Rasmussen, J. H., and Širović, A. (2021). "Automatic detection and classification of baleen whale social calls using convolutional neural networks," J. Acoust. Soc. Am. 149, 3635–3644.
20. Ríos, E., Merchan, F., Higuero, R., Poveda, H., Sanchez-Galan, J. E., Ferré, G., and Guzman, H. M. (2021). "Manatee vocalization detection method based on the autoregressive model and neural networks," in 2021 IEEE Latin-American Conference on Communications (LATINCOM), pp. 1–6.
21. Roch, M. A., Stinner-Sloan, J., Baumann-Pickering, S., and Wiggins, S. M. (2015). "Compensating for the effects of site and equipment variation on delphinid species identification from their echolocation clicks," J. Acoust. Soc. Am. 137, 22–29.
22. Romagosa, M., Baumgartner, M., Cascão, I., Lammers, M. O., Marques, T. A., Santos, R. S., and Silva, M. A. (2020). "Baleen whale acoustic presence and behaviour at a Mid-Atlantic migratory habitat, the Azores Archipelago," Sci. Rep. 10(1), 4766.
23. Rycyk, A. M., Berchem, C., and Marques, T. A. (2022). "Estimating Florida manatee (Trichechus manatus latirostris) abundance using passive acoustic methods," JASA Express Lett. 2, 051202.
24. Rycyk, A. M., Factheu, C., Ramos, E. A., Brady, B. A., Kikuchi, M., Nations, H. F., Kapfer, K., Hampton, C. M., Garcia, E. R., and Kamla, A. T. (2021). "First characterization of vocalizations and passive acoustic monitoring of the vulnerable African manatee (Trichechus senegalensis)," J. Acoust. Soc. Am. 150, 3028–3037.
25. Stowell, D. (2022). "Computational bioacoustics with deep learning: A review and roadmap," PeerJ 10, e13152.
26. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). "Going deeper with convolutions," http://arxiv.org/abs/1409.4842.
27. Takoukam Kamla, A. (2012). "Activity center, habitat use and conservation of the West African Manatee (Trichechus senegalensis Link, 1795) in the Douala-Edea and Lake Ossa Wildlife Reserves," M.Sc. thesis, University of Dschang, Cameroon.
28. Tavolga, W. (1965). "Review of marine bio-acoustics. State of the art," No. 1, US Naval Training Device Center.
29. Usman, A. M., Ogundile, O. O., and Versfeld, D. J. J. (2020). "Review of automatic detection and classification techniques for cetacean vocalization," IEEE Access 8, 105181–105206.
30. Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). "A survey of transfer learning," J. Big Data 3, 9.
31. Yang, M., Nurzynska, K., Walts, A. E., and Gertych, A. (2020). "A CNN-based active learning framework to identify mycobacteria in digitized Ziehl-Neelsen stained human tissues," Comput. Med. Imaging Graphics 84, 101752.
