Passive acoustic monitoring of marine mammals is common, and it is now possible to estimate absolute animal density from acoustic recordings. The most appropriate density estimation method depends on how much detail about animals' locations can be derived from the recordings. Here, a method for estimating cetacean density using acoustic data is presented, where only horizontal bearings to calling animals are estimable. This method also requires knowledge of call signal-to-noise ratios, as well as auxiliary information about call source levels, sound propagation, and call production rates. Results are presented from simulations, and from a pilot study using recordings of fin whale (*Balaenoptera physalus*) calls from Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) hydrophones at Wake Island in the Pacific Ocean. Simulations replicating different animal distributions showed median biases in estimated call density of less than 2%. The estimated average call density during the pilot study period (December 2007–February 2008) was 0.02 calls hr^{−1} km^{2} (coefficient of variation, CV: 15%). Using a tentative call production rate, estimated average animal density was 0.54 animals/1000 km^{2} (CV: 52%). Calling animals showed a varied spatial distribution around the northern hydrophone array, with most detections occurring at bearings between 90 and 180 degrees.

## I. INTRODUCTION

Using acoustic data to estimate animal density has been demonstrated for both terrestrial and marine species (e.g., Buckland, 2006; Marques *et al.*, 2013; Stevenson *et al.*, 2015). A suite of density estimation methods exist that can be applied to different types of acoustic survey data. The most appropriate density estimation method depends on how much detail about animals' locations can be derived from the recordings, which is often determined by the number and configuration of deployed instruments. At best, three-dimensional locations of calling animals can be estimated from acoustic data; conversely, some recordings can yield little to no information about animals' locations.

Distance sampling (Buckland *et al.*, 2001) and spatially explicit capture-recapture (SECR; e.g., Borchers, 2012) are methods that estimate the probability of detecting animals (a key parameter of any animal density estimation method) using spatial data collected during the survey. Specifically, distance sampling can be used when the horizontal range between an instrument and a calling animal can be estimated (e.g., Marques *et al.*, 2011), which, for marine animals, typically requires animal depth to be estimable (or assumed). SECR requires that the same acoustic event is matched across multiple recorders, creating “capture histories” of acoustic events. Indirect information about the location of calling animals can be inferred from these capture histories by assessing which recorders (with known locations) detected the acoustic events. Although SECR does not need measured ranges, SECR analyses can be supplemented with data relating to animals' locations such as direction, received sound level, and time of arrival (Borchers *et al.*, 2015). Given their data requirements, both distance sampling and SECR require arrays of recorders to estimate detection probability (though horizontal ranges to calling animals can, in some particular scenarios, be estimated from single instruments, e.g., Harris *et al.*, 2013; Marques *et al.*, 2011; Tiemann *et al.*, 2006).

Conversely, when no spatial information can be estimated from recorded data (e.g., in most scenarios where single instruments are deployed), detection probability can be estimated using some form of auxiliary data. Marques *et al.* (2013) consider two types of auxiliary information: (1) a sample of measured animal locations in relation to a recorder either from animal-borne tags (e.g., Marques *et al.*, 2009) or combined visual and acoustic trials using focal animals (e.g., Kyhn *et al.*, 2012); (2) acoustic modeling using elements of the passive sonar equation (Urick, 1983), including information about the target species' call source level (SL), transmission loss (TL), ambient noise levels (NLs), and the efficiency of the detection and classification process. This latter information can be combined to estimate the probability of detection using a simulation-based framework (e.g., Küsel *et al.*, 2011). Monte Carlo simulations have been implemented for a range of cetacean species (Küsel *et al.*, 2011; Harris, 2012; Helble *et al.*, 2013; Frasier *et al.*, 2016) but rely on accurate simulation inputs. One such input is the distribution of simulated animals; however, there are often no *a priori* data about what this distribution should be. This is a key limitation of the Monte Carlo simulation approach.

Here, a new method is presented for estimating cetacean density using acoustic data, for cases where horizontal bearings to calling animals are estimable. This approach is suitable for scenarios where neither distance sampling nor SECR can be implemented due to lack of recorders (note that SECR survey design is an ongoing area of research but, to date, the minimum number of recorders used for acoustic capture histories has been three; Kidney *et al.*, 2016). The new method is related to the Monte Carlo simulation methodology as it uses the passive sonar equation; measured call signal-to-noise ratios (SNRs) are required, as well as auxiliary information about call SLs, sound propagation, and call production rates. However, the additional bearing data give some empirical information about animal distribution, conferring an advantage over the standard Monte Carlo simulation. Another advantage of this method is that it produces a spatial map of estimated abundance (or density), allowing inferences about spatial habitat preferences of acoustically active animals.

The paper is structured as follows. Section II presents a background to density estimation using acoustic data and a description of the new method. Details about the motivating case study—fin whales recorded in the Pacific Ocean by Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) hydrophones—are given in Sec. III (including details of all the required auxiliary analyses). Simulations are presented, which investigate method performance under different known spatial animal distributions (Sec. IV). The method is then applied to three months of recordings from Wake Island between December 2007 and February 2008 (Sec. V). This analysis forms a pilot study prior to applying the method to long-term CTBTO datasets from Wake Island and Diego Garcia in the Indian Ocean. Finally, Sec. VI presents a discussion of the approach, including its limitations, benefits, and potential implementations.

## II. DENSITY ESTIMATION USING ACOUSTIC DATA

A general estimator of animal density using acoustic cues (e.g., animal calls) from static instruments was presented by Marques *et al.* (2009),

where $D\u0302$ = animal density, $nc$ = number of detected signals, $c\u0302$ = false positive proportion, $K$ = number of monitoring points, $w$ = maximum detection range, $P\u0302a$ = average probability of detection of an animal within radius *w* of the sensor, $T$ = total monitoring time, and $r\u0302$ = cue production rate. This equation can be decomposed into three components,

where $N\u0302c=nc(1\u2212c\u0302)/P\u0302a\u2009$ is the estimated abundance of cues, $K\pi w2$ is the area monitored, so that dividing the abundance of cues by the area monitored gives a density of cues, and $1/Tr\u0302$ converts the density of cues to the density of animals.

The average probability of detection, $P\u0302a$, can be estimated in several ways, as shown by the variety of available density estimation methods (Marques *et al.*, 2013). Each method has various assumptions that must be met to produce an unbiased detection probability and hence density. One key assumption in distance sampling is that the distribution of animals' distances from samplers (i.e., transect lines in a line transect survey, or monitoring points in a point transect survey) is known. This is achieved by random placement of multiple samplers within the study area so that, on average, animals are distributed uniformly in horizontal space. For a survey using many fixed monitoring points with circular detection areas, this assumed average distribution of animal distances is specifically a triangular distribution due to the linear increase in area with increasing incremental horizontal distance from each sample point (Buckland *et al.*, 2001). However, when single acoustic stations are used, it may not be reasonable to assume animal distances from that single station follow a triangular distribution, and standard distance sampling should not be used to estimate $P\u0302a$ (even if ranges to animals can be estimated). Therefore, an alternative approach to estimating detection probability is required. In the method developed here, cue abundance is estimated using a Horvitz-Thompson-like estimator (after terminology used by Borchers and Burnham, 2004). These estimators are based on seminal work by Horvitz and Thompson (1952), who showed that when sampling at random from a population where each individual, *i*, has probability $Pi$ of being sampled, then an unbiased estimator of population size is given by the sum over detected individuals of $1/Pi$. One can think of each detection “representing,” on average, $1/Pi$ objects in the population. In animal density estimation methods, individual detection probabilities for every detection can be estimated [rather than estimating an average detection probability as shown in Eq. (1)] and combined to give $N\u0302c=\u2211i=1nc1/P\u0302i$. However, the detection probabilities, $Pi$, are estimated, not known (hence “Horvitz-Thompson-like”). Horvitz-Thompson-like estimators are not unbiased; the bias is typically small unless estimated probabilities are highly uncertain or close to zero (Borchers *et al.*, 2002). The key advantage of this approach in the current case is that the individual detection probabilities can be estimated without requiring any assumption about the distribution of animals with respect to the samplers.

Other key assumptions that apply to this new method are that (1) all data measurements and derived parameters are accurate and (2) detections are independent of one another. It is highly improbable that recorded whale calls are produced independently of each other, given that one animal may produce many calls. However, violation of the independence assumption should not produce severe bias, though variance estimation can be affected (Marques *et al.*, 2013). Another assumption of any density estimation method is that parameters used in the estimator are accurate for the time and place of the main survey. A frequent limitation of auxiliary data used in density estimation analyses is that the additional experiments (e.g., to estimate cue production rate) may have been conducted in a limited part of the study area (or in a different location) and/or at a different time as the main survey, which may lead to bias in the estimated parameters. Therefore, as many auxiliary analyses should be undertaken using data from the main survey region and time period as possible.

### A. Method overview

It is assumed that acoustic data have been recorded at known locations for a known time and then processed using an automated detection and classification algorithm.

Estimation proceeds in the following stages, described in more detail in Sec. II B.

Characterize the automatic detection process to estimate the probability of detecting a call as a function of SNR [$P(SNR)$]. The resulting fitted “detection characterization curve” is used to estimate the detection probability for each detected signal.

Determine the monitored area: for each of a set of discrete bearings, use the assumed call SL and the measured NL distributions with a TL model to determine a set of ranges at which calls are almost certain to be masked (i.e., the resulting SNR is so low that probability of detection is very low) and exclude these areas from further analysis.

Estimate the distribution of possible ranges for each detection. Use the measured received level (RL) and bearing of each detection, together with the assumed SL distribution and TL model to estimate the probability density function (pdf) of ranges for that detection. A probabilistic approach is required because (a) SL for each detection is not assumed known, but is assumed to come from a probability distribution; (b) even if SL were assumed known, the TL does not increase monotonically with range, and hence a detected signal with a given RL can correspond to more than one range.

Estimate the range-specific distribution of number of signals corresponding to each detection, i.e., scale each detection by its associated detection probability to account for undetected signals. Using the Horvitz-Thompson-like estimator, each detection,

*i*, on average corresponds to $1/P(SNRi)$ signals within the area monitored.Estimate spatial density of signals by summing over the estimated number of signals at each bearing and range to yield an empirical spatially explicit abundance of signals. Then smooth this using a generalized estimating equation (GEE) spatial model.

Estimate animal density: use additional multipliers, i.e., false positive proportion, time spent monitoring (excluding periods of high ambient noise that cause masking), and cue rate [Eq. (1)]. Also potentially restrict inference to areas where detection probability is higher and hence inference more reliable.

### B. Further details

##### Stage 1: Characterize the automatic detector.

Detector characterization is performed using a sample of manually detected calls. To ensure the sample is representative, a systematic random subset of recordings (i.e., short sections equally spaced in time—see Sec. III for an example) should be analysed manually. SNR is measured for a sample of manually detected calls, as well as noting whether or not each call was detected by the automatic detector. Logistic regression with automated detection/non-detection as the response and SNR as the explanatory variable is used to model the probability of detecting a call as a function of SNR. A generalized additive model (GAM; Wood, 2006) is used to allow a smooth, nonlinear relationship between probability of detection and SNR. The fitted detector characterization curve is then used to predict probability of detection, $P(SNR)$, for each detection (over the entire monitoring period), $P\u0302i=P\u0302(SNRi)$.

If bearings cannot be estimated for all detections, one of two approaches can be taken: the detector characterization curve can be estimated where a successful detection is defined as either (1) any detected fin whale call (regardless of whether it had an associated bearing or not), or (2) detected fin whale calls that had an associated bearing measurement. The choice of detector characterization approach will affect the value used for *n _{c}* in the estimator [Eq. (1)]. Under the first definition,

*n*will be the number of detections (with or without measured bearings); under the second definition,

_{c}*n*will be the number of detections with measured bearings only. In both cases, an assumption is made that the measured bearings represent the spatial distribution of all detected signals, including those for which bearings could not be estimated.

_{c}##### Stage 2. Determine area monitored.

This stage is analogous to identifying the maximum detection range, *w*, in Eq. (1), although a set of bearing-specific ranges are derived, allowing TL to vary in different directions, and be non-monotonic with increasing range. Hence, the area monitored does not have to be circular or continuous.

SL is assumed to follow a normal distribution; so, it is theoretically possible to detect calls from implausibly large (or even infinite) ranges in stage 3. Therefore, a pragmatic cutoff is used that ensures detections from outside the area monitored will be very rare. The assumed SL and NL distributions are evaluated at the 90th and 10th percentiles, respectively, to represent a loud call in low noise. These values are used in the passive sonar equation along with TL to calculate the SNR of the hypothetical call at various range and bearing steps around the hydrophone (SL – TL – NL = SNR). The detection probability of the call at all locations is evaluated from the detector characterization curve. Locations where the call has a detection probability of equal to or less than 0.1 are considered to be acoustically masked. The lowest TL associated with a masked location is used as a TL threshold to define acoustically masked areas, which are then excluded from the remainder of the analysis.

##### Stage 3. Estimate distribution of possible ranges for each detection.

Given a detection with measured RL and bearing $\theta $, the SL of the detection if the source was at range *r* can be derived from the (simplified) passive sonar equation as

where $TL(r,\theta )$ is range- and bearing-specific TL. An SL distribution is required, which is assumed to follow a normal distribution with mean $\mu $ and standard deviation $\sigma $. In this analysis, SL could be estimated from a subsample of localized calls at short ranges. Then, the pdf of range is

where $\nu $ is a normalizing constant to ensure *f* is a proper pdf

The need for an *r* in the denominator of Eq. (4) is explained by viewing the analysis as analogous to distance sampling with measurement error on the distances. In this case, the geometry of a circular detection area means that random measurement error (in this case, uncertainty in location) will result in underestimation of detections' true locations (discussed in Buckland *et al.*, 2015), leading to biased density estimates at closer ranges.

In practice, range is discretized into a fixed set of range intervals, with midpoints ${R}$. TL is calculated at these ranges, and it is assumed that the TL values apply to each corresponding interval. Then, the probability a detection comes from interval *k* is

##### Stage 4. Estimate range-specific distribution of number of signals corresponding to each detection.

SNR for each detected signal is calculated from the RL and NL measurements associated with each signal (SNR = RL – NL). Detection probabilities of each detected signal are estimated using the detector characterization curve, and the range-specific distribution for each detection is divided by the estimated detection probability. Using the Horvitz-Thompson-like approach, the estimated number of signals in the population “represented” by a signal detected with a given SNR is $1/P(SNR)$. Hence, the range-specific distribution of number of signals corresponding to a particular detection is given by

##### Stage 5. Estimate spatial density of signals.

At each bearing and range interval, the estimated number of signals are summed. This yields a spatial abundance surface, but one that is not necessarily smooth because of random variation in detections. Given a long monitoring period, the true distribution of calls around the sensor likely is smooth, so precision can be gained by smoothing the raw estimates using a GEE model (Hardin and Hilbe, 2012), which accounts for spatial autocorrelation. The response variable is the estimated signal abundance, assuming an overdispersed quasipoisson error distribution and using a log link function. Explanatory variables are the location of the centre of the bearing and range interval in (*x*,*y*) space [two-dimensional (2D) Cartesian coordinates]. To account for the fact that intervals at larger ranges represent a larger area, the area of each interval is included as an offset in the model. To account for spatial autocorrelation, spatial blocks of 100 km × 100 km are created through the study area and an independent working correlation structure implemented; model residuals can therefore be correlated within blocks but are assumed to be independent between spatial blocks. The spatial GEE is fitted using CReSS (complex region and spatial smoother; Scott-Hayward *et al.*, 2014) and SALSA (spatially adaptive local smoothing algorithm; Walker *et al.*, 2011) methods, allowing a flexible surface with spatially varying smoothness to be modeled. Model fit is assessed using concordance correlation and marginal *R* squared values (in both cases, values close to 1 indicate good fit). A predicted density surface is created by predicting abundance on a regular (*x,y*) grid, and dividing by the area of each grid cell.

##### Stage 6. Estimate animal density.

The predicted density surface of signals is converted to a predicted animal density surface by multiplying by $(1\u2212c\u0302)/Tr\u0302$, where *c* is the false positive proportion, *T* is monitoring time, and *r* is the cue production rate. False positive proportion is estimated from the manually validated sample of data. Monitoring time should be known as part of the survey protocol. Furthermore, the NL measurements of the detections can be compared to ambient NL measured throughout the dataset to determine a NL threshold, above which total acoustic masking is likely to occur. Time periods of data where ambient NL exceeds the maximum NL associated with a detection are omitted from the monitoring time, *T*. Cue production rate must come from auxiliary information and is often not known, in which case density of calls can be estimated but not density of animals.

Average density can be computed by taking the average across the prediction surface. To increase robustness, grid cells far from the sensor, where detection probability is low, may be excluded from this averaging. A Horvitz-Thompson-like estimator is known to produce positively biased estimates, particularly when some of the $P\u0302i$ values are small (Borchers *et al.*, 2002) as is the case for more distant calls. To mitigate this, a simulation study can be used to determine at what range bias may be minimised and this can be used to truncate the range over which average density is inferred.

### C. Variance estimation

The delta method (Seber, 1982) is used to combine the coefficients of variation (CVs) for each random variable used in the density estimator to estimate the overall CV for the resulting density estimate. Note that the encounter rate also contributes to the overall variance of a density estimate, and is denoted by $CV(nc)$ in Eq. (8). All other density estimator inputs such as *K*, *T*, and *w* are known constants and therefore do not have an associated variance,

where $P\u0302a$ = overall mean probability of detection, defined as

In surveys with multiple samplers (i.e., monitored lines or points), between-sampler variance in encounter rate is usually estimated. With only one monitoring point as in this study, there is no spatial variance in encounter rate and, instead, variance in encounters is linked only to the detection process. Following guidance in Buckland *et al.* (2001), the encounters are assumed to follow an overdispersed Poisson distribution. Therefore, encounter variance can be estimated using the Poisson expression for variance [multiplied by a factor of 2 to acknowledge assumed aggregation in the encounters; Eq. (10)], which can then be used to calculate the CV,

The false positive proportion and call production rate have weighted means (see Sec. III for details) so variance is estimated using Cochran's approximation (Cochran, 1997, recommended by Gatz and Smith, 1995). Detection probability variance is estimated using parametric bootstrapping of the SL and NL distributions, the coefficients of both the logistic regression, and GEE spatial models, then taking the empirical variance of the resulting bootstrapped signal densities. As these signal densities are uncorrected for false positives, the only parameter used in their estimation is $P\u0302i$, and so the signal density CV will be equivalent to $CV(P\u0302a)$.

## III. CASE STUDY—FIN WHALES IN THE PACIFIC OCEAN

The pilot study focused on fin whale calls recorded in the Pacific Ocean. Fin whales, the second largest cetacean, occur globally and are currently listed as “endangered” in the IUCN Red List (Reilly *et al.*, 2013). Fin whales produce a low-frequency pulsed call, the “20-Hz” call (Watkins *et al.*, 1987), which has been widely utilized to investigate fin whales' distribution and density through passive acoustic monitoring (e.g., Širović *et al.*, 2015). In particular, a study of fin whales near Oahu, Hawaii, was an early example of using passive acoustic data to estimate density (McDonald and Fox, 1999). Multipath arrivals and the passive sonar equation were both used to estimate ranges to calling animals. However, neither detection probability nor non-calling animals were explicitly accounted for, so the resulting estimates were interpreted as a minimum number of animals (McDonald and Fox, 1999).

Data from the CTBTO International Monitoring System (IMS) station at Wake Island (station identifier: H11) in the Equatorial Pacific Ocean were used (1) as a basis for simulation studies to test the efficacy of the method and (2) to demonstrate a pilot analysis using fin whale 20 Hz calls. Data from peak seasonal detections from Dec. 1, 2007 to Feb. 29, 2008 were used, and details of data processing and auxiliary analyses are given throughout the rest of this section.

### A. Wake Island CTBTO IMS station

The Wake Island station is composed of two three-element triangular arrays with 2.5 km spacing between elements, with three hydrophones located to the north of the island (Fig. 1) and three to the south. These cabled hydrophones are suspended in the deep sound channel. The three-month pilot study used data from the northern array (hydrophone depths were 731 m, 732 m, and 729 m). The average water depth at the array was 1068 m (estimated from Amante and Eakins, 2009). Sound levels were recorded continuously at a 250 Hz sampling rate and 24 bit analog-to-digital (A/D) resolution. The hydrophones were calibrated individually prior to initial deployment in January 2002 and re-calibrated while at sea in 2011. All hydrophones had a flat (within 3 dB) frequency response from 8 to 100 Hz. Information from individual hydrophone response curves was applied to the data to obtain absolute values over the full frequency spectrum (5–115 Hz). Data less than 5 Hz and from 115 to 125 Hz were not used due to the steep frequency response roll-off at these frequencies.

### B. TL of a fin whale call

The TL due to range-dependent propagation between a vocalizing whale using a 20 Hz call and one of the northern hydrophone receivers (labeled N1) at 731 m depth was modeled along 360 bearings at 1^{o} resolution using the OASIS Peregrine parabolic equation model out to 1000 km from N1 (Heaney and Campbell, 2016; Fig. 2). The TL was modeled at 1 km range steps over the three-month study using seasonal sound speed profiles obtained from The World Ocean Atlas.^{1} It was assumed that the source was at a depth of 15 m, in keeping with results about fin whale calling behavior (Stimpert *et al.*, 2015). The bathymetry was taken from the global bathymetry database ETOPO1 (Amante and Eakins, 2009). Surface loss was negligible due to the low frequency of signals. Sea floor parameters of soft sand sediment were used representing a global average of deep ocean sediment. Details of the geoacoustics parameters in the specific Wake Island region are not known but should not affect propagation in this environment due to direct path/sound channel propagation.

### C. Ambient NLs

Mean spectral levels within the 10–30 Hz band were calculated for each minute of the three-month dataset, resulting in spectral levels with units of dB re 1 μPa^{2}/Hz. Ambient NLs were calculated in the targeted 10–30 Hz band to directly overlap with the frequency range of the fin whale 20-Hz pulse. Mean spectral levels were calculated using a Hann windowed 15 000 point discrete Fourier transform with no overlap to produce sequential 1-min power spectrum estimates. Note that these measurements included fin whale calls, where present; it was important that the NLs reflected all noise sources that each fin whale call could be exposed to, which included calls by conspecifics.

### D. SL estimation

A sample of fin whale calls were localized using the northern array so that a SL distribution could be estimated. SL estimates of detected fin whale vocalizations were computed using the passive sonar equation [Eq. (11)] that incorporated environmental NLs present at the time of the call within the RL of the vocalization,

As the low-frequency calls are omnidirectional, the directivity index (DI) was set to zero. Processing gain (PG) and detection threshold (DT) are accounted for in the calibration of the recording system. RLs were calculated for individual vocalizations recorded at N1 using a custom matlab (Mathworks, 2018) code. Spectrograms were calculated using a 512-point fast Fourier transform (FFT) and 93.75% overlap. Calls were then manually detected, with a human analyst selecting the upper and lower frequency and time bounds of an individual call. The rms (root-mean-square) RL of the call was then calculated from the selected spectral data.

The TL of a signal of a given frequency is dependent on the range, bearing, and depth of the vocalizing animal. The time difference of arrival (TDOA) between each hydrophone pair was found by cross correlation of received signals and was supplemented with manual inspection due to dispersed waveforms. 2D hyperbolic localization was then used to find the range and bearing of the vocalizing animal. Location information was then input into the site-specific, seasonal TL models to back calculate the SL of each identified vocalization. The depths of the sources were unknown but assumed to be at a depth of 15 m following results from Stimpert *et al.* (2015). For comparison, SLs of the same sample of calls were also calculated using simple spherical spreading instead of the more complex Peregrine TL model.

### E. Automated fin whale call detection

Fin whale calls were detected from the N1 hydrophone using the automatic detection feature of Ishmael, an open-access bioacoustic analysis software package (Mellinger, 2002). The spectrogram correlation method was utilized for the full three-month dataset, cross-correlating the spectrogram of the dataset with a synthetic call kernel. The kernel is a template that indicates the time and frequency endpoints of the desired call. To prepare the dataset for autodetection, time-waveform data were first passed through a 10–30 Hz bandpass filter. Spectral data were then calculated using a 512-point FFT with a 93% overlap, and a 22–14 Hz 1-s downsweep call kernel was applied.

Results from the automatic detector were compared with the manually detected calls from a subset of data. The three-month dataset was divided into six-hour sections, and a systematic random sample of these sections was taken. Every 11th 6-hour section was selected under the sampling scheme, resulting in 32 6-hour sections. All calls within the 32 selected sections were manually detected, and a receiver-operator curve was generated for the automatic detector that compared the false positive proportion (the number of false positives divided by the total number of automatic detections) with the proportion of missed calls (the number of missed calls divided by the total number of manually detected calls, i.e., false negative proportion) for a range of DTs. The Receiver Operating Characteristic (ROC) curve indicated that the optimal DT had a 10% false positive proportion and a false negative proportion of 59%. The mean false positive proportion was weighted by the number of detections checked in each six-hour section.

### F. Bearing measurements

Bearings were calculated using the TDOA of received signals. Using the known distances between receivers and the seasonal sound speed, an estimated bearing was calculated for each pair of hydrophones,

where $\tau $ represents the TDOA of a signal between a hydrophone pair, *d* is the distance between a hydrophone pair, and *s* is the speed of sound.

Left-right ambiguity of each bearing estimate could be resolved by comparing with the other two estimates. The median bearing was then selected. An acceptable bearing is one where the three bearings resulting from the three pair combinations all produced bearings within ten degrees of each other. TDOA between each pair of hydrophones (N1 and N2, N2 and N3, N3 and N1) were found through three different methods, as described in order of application below. If the cross-correlation method failed to produce an acceptable bearing, manual estimation was performed. When manual estimation using the start point of each call failed to produce an acceptable bearing, a band energy analysis was performed. The first step of all methods was to pass the signals through a 10–30 Hz bandpass filter. Bearings were rounded to the nearest integer, to correspond with the resolution of the TL model.

#### 1. Cross correlation

Once the data were filtered, a simple cross correlation was performed in matlab to determine time delays. Characteristics of the environment cause dispersion in the waveforms traveling from distant ranges. As a result, a simple cross correlation was not a viable option for many of the distant calls.

#### 2. Manual estimation

TDOA was found by manually selecting the start of each call from the time waveform. Manual inspection eliminates the discrepancies that arise from the modal dispersion. Manual selection also provided reliable results for calls with a low (<6 dB) SNR, which is not always possible with automated methods. Manual detections were feasible for a limited pilot study, but this method would not be appropriate for large datasets.

#### 3. Band energy analysis

Filtered data from N1 were analysed in 3 Hz bands with 1 Hz overlap, starting at 10 Hz, finding the peak in each band. The first band with a peak of at least 5 dB SNR was then selected. The time index of the first peak in this frequency band for each sensor was then noted and time delays were calculated from the identified time index.

### G. Detector characterization

All calls were manually detected in the subsampled six-hour sections. The rms RL of each call was measured, and the SNR of the call was calculated using a NL measured from the second of data preceding the call (in the same frequency bandwidth as the measured call rms RL). Whether or not the call was detected by the automatic detector was also noted. The detector characterization curve was modeled using the statistical analysis software, R version 3.3.1 (R Core Team, 2016). A GAM (Wood, 2006) with a binary response and logit link function was fitted to the data.

### H. Call production rate

No call production rate data were available for fin whales occurring near Wake Island, but call production rate data from the Southern California Bight in the North Pacific Ocean have been published (Stimpert *et al.*, 2015). The fin whale data from southern California were collected in summer months, and so it is possible that this cue rate is biased for the fin whales calling near Wake Island in the winter months. Cue rates from Stimpert *et al.* (2015) were applied here as a proof of concept only, and resulting animal density estimates must be treated cautiously.

## IV. SIMULATION STUDIES

### A. Simulation overview and input data

The primary aim of the simulation studies was to investigate whether the method returned unbiased (1) detection probability estimates and (2) distribution maps under a range of scenarios. To that end, call density only was estimated in the simulations (i.e., a false positive proportion and call production rate were not considered).

Ambient noise and SL information, as well as the detector characterization curve, were measured directly from the Wake Island dataset. The SL distribution (assumed to be normally distributed and summarized on the dB scale) had a mean of 177.7 dB re 1 μPa^{2}/Hz at 1 m (standard deviation: 3.30, *n* = 79) using the Peregrine TL model and 177.6 dB re 1 μPa^{2}/Hz at 1 m (standard deviation: 3.03) using spherical spreading to predict propagation loss. Further, estimated SL decreased significantly as a function of range when using the Peregrine model (linear regression coefficient = −2.20, *p*-value < 0.001, *n* = 76 due to the removal of three outlying data points using Cook's distance measures). Estimated SLs assuming spherical spreading also decreased slightly with range, though not significantly (linear regression coefficient = −0.62, *p*-value = 0.27, *n* = 76; Fig. 3). Given that the means and standard deviations of the two SL distributions were almost identical, the SL estimates using the more complex, bathymetry-dependent Peregrine model were used for all simulations and analyses (though see Sec. VI for a discussion of the regression results). The mean of the NL distribution (also assumed to be normally distributed and summarized on the dB scale) measured in association with manually detected calls was 92.5 dB re 1 μPa^{2}/Hz (standard deviation: 2.74, *n* = 1484). The detector characterization curve was estimated using 1484 manually detected calls, which were found in 20 out of 32 manually checked 6-hour sections (12 sections contained no calls). The mean SNR of automatically detected calls was 13.98 (standard devation: 7.09, *n* = 612) and the mean SNR of calls missed by the automatic detector was 4.45 (standard deviation: 1.59, *n* = 872). The fitted GAM predicted that the majority of calls with an SNR greater than 10 dB were certain to be detected (Fig. 4).

Simulation TL data were based on TL data from Wake Island but were modified due to extreme TL encountered in the real Wake Island data (see Sec. V). Wake Island TL data were extracted at a depth of 15 m to reflect realistic fin whale calling behavior. TL ranged between 71.70 dB and 286.46 dB. For the simulation studies, the minimum TL value (71.70 dB) was subtracted from all TL values resulting in simulated TL values that ranged between 0 and 214.76 dB.

Three call spatial distributions were tested via simulation, designed to reflect differing calling animal distributions (Fig. 5): calls were distributed (1) uniformly throughout the study area, (2) limited to the northeast, and (3) limited to the south of the hydrophone. The simulation was set up as follows:

Calls (with a specified density or abundance) were simulated through the study area; call distribution were changed by drawing

*x*- and*y*-coordinates from either a uniform or scaled beta distribution, depending on the desired spatial call pattern (Fig. 5).Each simulated call was assigned a SNR based on the passive sonar equation; each call was assigned a SL and NL by drawing values from normal distributions with mean and standard deviations as measured from the Wake Island dataset, which were then combined with the bearing- and range-specific TL value for that call, taken from the modified TL data.

Each call's detection probability was evaluated from the detector characterization curve and a Bernoulli trial was used to determine whether a given simulated call was detected or not.

The TL value above which no calls are detected was determined using the approach described in Sec. II.

Following the simulated detection process, the simulated RL, NL, and bearing values for each simulated detected call were used as inputs for analysis instead of using measurements from real recordings. All simulations were run 500 times in *R*. The maximum detection range of the recording system was specified as 1000 km in all cases. In both simulations and analyses, the maximum detection range is set as an upper limit for a given instrument but may be reduced when the monitored area is defined (step 2, Sec. II A). In each of the three simulation scenarios, the initial abundance was altered so that the number of detected calls was similar across all scenarios. The estimated call density was compared to the known true value by calculating the median percentage bias (with associated 2.5% and 97.5% percentiles). Additionally, because the true number of simulated calls was known at increasing range steps from the array, the percentage bias as a function of range from the array could also be assessed by comparing the true number of simulated calls and the predicted number of calls within each range step. The maximum range at which the percentage bias of call density was minimised was calculated for every iteration (in some cases, the same minimal bias was calculated at multiple ranges, so the largest range was selected). The distribution of these ranges could then be assessed after all iterations were run to see whether there was an optimal prediction range, beyond which percentage bias was likely to become larger, decreasing the robustness of the final predicted density. This feature of the simulation algorithm may be useful for analysts to decide whether to restrict the area of inference following an analysis to potentially reduce bias in the reported density estimate. However, it is important to note that the simulation relies on an assumed distribution of animal calls, which is likely to be different from the true, and unknown, animal distribution, so a reduction in bias in analysis results is not guaranteed.

### B. Simulation results

The simulations performed well—results from all scenarios had median percentage biases less than 2% (Table I). Percentage bias did not exceed 5% in any of the simulations. In some scenarios, assessing the bias as a function of range showed that bias in call density estimates could be substantially reduced when call density was inferred over a reduced range. Bias was negligible for the uniform and southern distributions at median ranges of 678 km and 360 km, respectively, suggesting that these ranges were the optimal prediction ranges for these scenarios. The northeastern (NE) distribution results were not improved by reducing the range of prediction. Spatial model fit across scenarios varied, with uniform distribution models displaying the poorest fit and the NE distribution producing spatial models with the best fit (median marginal *R* squared values: 0.51, 0.79, and 0.92; median concordance correlation values: 0.68, 0.88, and 0.96, for uniform, southern, and NE distributions, respectively). However, all spatial models produced density maps that replicated the initial distributions (Fig. 6).

Scenario . | Uniform distribution . | Southern distribution . | Northeastern (NE) distribution . |
---|---|---|---|

Number of detections | 7243 | 7597 | 7408 |

(7147,7354) | (7484,7714) | (7389,7427) | |

Percentage bias | −1.52 | −1.88 | 0.01 |

(−3.13,1.12) | (−3.96,0.97) | (−0.45,0.86) | |

Minimised percentage bias | −1.93 ×10^{−4} | −0.02 | −0.01 |

(−0.98,0.32) | (−0.67,0.70) | (−0.38,0.32) | |

Range at which bias | 678 | 360 | 1000 |

minimised (km) | (50,993) | (235,1000) | (45,1000) |

Scenario . | Uniform distribution . | Southern distribution . | Northeastern (NE) distribution . |
---|---|---|---|

Number of detections | 7243 | 7597 | 7408 |

(7147,7354) | (7484,7714) | (7389,7427) | |

Percentage bias | −1.52 | −1.88 | 0.01 |

(−3.13,1.12) | (−3.96,0.97) | (−0.45,0.86) | |

Minimised percentage bias | −1.93 ×10^{−4} | −0.02 | −0.01 |

(−0.98,0.32) | (−0.67,0.70) | (−0.38,0.32) | |

Range at which bias | 678 | 360 | 1000 |

minimised (km) | (50,993) | (235,1000) | (45,1000) |

## V. PILOT STUDY

### A. Pilot study overview and input data

The pilot study analysis estimated fin whale density based on the detected calls (and associated SNR and bearing measurements) from three months of data. A simulation was also run to investigate the level of potential bias in the analysis results, and whether inferring density over a smaller area may reduce any bias (as discussed in Secs. II B and IV A). Calls were uniformly distributed through the simulated study area and the steps of the simulation setup were the same as those described in Sec. IV A, except for the TL data used.

A key difference between the simulations described in Sec. IV A and the pilot study analysis and simulation was that unmodified TL data were used in the pilot study, reflecting the true environmental conditions at Wake Island (Fig. 7).

Inputs for the analysis were the following: number of detections, *n _{c}*, was 6552. The automatic detector detected 6658 signals, but the SNR of 106 signals fell below the lower SNR limit of detected calls in the detector characterization analysis (2.24 dB) and so were removed to prevent model extrapolation when estimating detection probability using the detector characterization curve. Of the remaining detections, 3086 (47%) had measurable bearings, which ranged between 1.69 and 359.40 degrees (Fig. 8). While detections occurred at all bearings around N1, the quadrant with the greatest number of detections occurred between 90 and 180 degrees.

The highest NL associated with a detection was 123.89 dB re 1 μPa^{2}/Hz. Of the 91 days of continuous monitoring, 27 min had an average NL of 124 dB re 1 μPa^{2}/Hz or above. Therefore, it is possible that high NLs in these minutes could have prevented any detections taking place, so these periods were considered “off effort” and were excluded from the time spent monitoring, *T*.

The false positive proportion, $\u2009c\u0302$, was 0.097 (standard error: 0.05). The maximum detection radius, where detection probability was assumed to be negligible, was set to 1000 km and a total of 2183.55 h were analysed (excluding 27 min of recordings where ambient noise was assumed to be too high to successfully run the automatic detector).

Call production rate was determined from Stimpert *et al.* (2015). Deployment duration and number of calls recorded were reported for 18 digital acoustic recording tags (DTAGs; Johnson and Tyack, 2003) records. Ten animals were tagged with a version of the DTAG (v3) that enables calls from the tagged animal to be identified from other calls made by non-tagged conspecifics. It is crucial when estimating call production rate that only calls from the focal animal are included in the analysis, so the other eight animals tagged with v2 DTAGs were omitted from the analysis. The v3 DTAGs were deployed between 1.60 and 6.30 h. Six tags did not record any calls, while the number of calls produced by the remaining four tagged whales ranged between 23 and 942. The mean call production rate, weighted by tag deployment length, was 45.08 calls hr^{−1} (standard error: 22.31).

### B. Pilot study results

The pilot study simulation was run 500 times assuming a uniform distribution with an initial starting abundance of 5e+6 calls, and a maximum detection range of 1000 km. The median number of observations was 238, and the resulting median percentage bias in estimated density was −56.37%, but decreased to −10.76% if density was only estimated up to a range step of 10 km. The median estimated density surface showed that the area within which the calls were predicted to originate was very restricted, compared to the detection area initially considered (∼12 × 10^{6} km^{2}) and is fragmented [Fig. 9(a)].

The pilot study analysis estimated initial average call density over the three month period from Dec 2007–Feb 2008 to be 0.014 calls hr^{−1} km^{2} (CV: 0.15). Applying the call production rate from the Southern Californian Bight resulted in an average fin whale density of 0.32 animals/1000 km^{2}. The CV for the density estimate was 0.52. The overall monitored area for both the pilot study simulation and analysis (once spatial acoustic masking was taken into consideration) was 973 km^{2} [Fig. 9(b)]. Based on the results of the simulation, the pilot analysis results were re-analysed so that density was only inferred up to a range step of 10 km. There was no way to determine which of the detections without bearings would have been detected within 10 km, so it was assumed that the relative abundances of the two detection types (which could be calculated from the initial analysis across the whole survey region) was not altered by making inference over a smaller area. Therefore, an additional multiplier, *b*, was used to scale the estimated density based on detections with bearings (*b = *1.22). The resulting call density estimate was 0.02 calls hr^{−1} km^{2} (CV: 0.15), which resulted in a density of 0.54 animals/1000 km^{2} (95% confidence interval: 0.21–1.40 animals/1000 km^{2}). The CV associated with the density estimate was 0.52.

## VI. DISCUSSION

There are already several existing methods that can be used to estimate animal density from acoustic data. However, the large variety of acoustic hardware and instrument configurations continue to present new surveying challenges and require current density approaches to be adapted. The CTBTO dataset presents such a case; there are six hydroacoustic stations similar to Wake Island situated in the Pacific, Atlantic, and Indian Oceans (CTBTO, 2018), which have provided a wealth of baleen whale recordings (e.g., Stafford *et al.*, 2010; Samaran *et al.*, 2013; Le Bras *et al.*, 2016). Each site is configured in a similar way to Wake Island, with two triads of cabled hydrophones, one located to the north and one to the south of a land-based station that collects data around the clock. However, to date, it has not been possible to utilize CTBTO data fully for cetacean density estimation. Distance sampling is not a suitable method for CTBTO data: only two monitoring points would be formed by the two triads at each site, which is too few for distance sampling (due to the animal distribution assumption discussed in Sec. I). In addition, the array spacing within triads only enables call localization using traditional TDOA methods at close ranges, meaning that detections from greater distances would have to be omitted from an analysis. Given that the large detection ranges due to the deep sound channel moorings are an advantageous feature of CTBTO hydrophones, distance sampling would not be an optimal analysis method in cases where the majority of signals were originating from distant locations and could not be localized (recently, however, Le Bras *et al.*, 2016, presented an alternative location methodology using bearing and amplitude information in a Bayesian framework to estimate calling animals' locations from CTBTO data, which may extend the localization capabilities of these arrays). The array design at each site is also not configured well for an SECR analysis. Although six hydrophones are available per site, acoustic masking is expected between the northern and southern arrays, creating an acoustic barrier (Pulli and Upton, 2001). Furthermore, the close spacing of the hydrophones in each triad would likely lead to many detections being recorded by all three instruments. SECR depends on a variety of capture histories to infer the location of calling animals; in this case, the array design may provide limited information (i.e., scenarios where all instruments are ensonified on each occasion yields little spatial information about the calling animals).

Therefore, data from the CTBTO arrays required a density estimation approach that used auxiliary data. Although Monte Carlo simulations have been used to estimate call density of blue whales in the Indian Ocean using CTBTO data (Harris, 2012), the method presented here used the additional distributional information available in the measured bearings. The more empirical data about animals' locations that can be collected during the acoustic survey, the fewer methodological assumptions are required during the analysis. Although this method was developed specifically for CTBTO data, there are other instrument systems that record similar information. For example, DIFAR (directional frequency analysis and recording) sonobuoys record bearings and have been used to detect blue whales at distances over 100 nautical miles (e.g., Miller *et al.*, 2015).

The simulations demonstrated that the method performed well under the three different simulated animal distributions (though with less extreme propagation conditions as modeled at Wake Island). In two of the three cases, bias was further reduced when density was predicted over a smaller area than the detection radius originally set for the simulation. For example, in the median surface plot of the uniform distribution scenario, an area on the periphery of the detection radius has some negative bias [as shown by the darker region to the south of the array in Fig. 6(a)] and the simulation results recommended that density only be predicted out to 678 km. The same issue was also encountered during the pilot study. Running a simulation specifically for the pilot study suggested that the initial estimates were likely to be negatively biased and inference was restricted to a smaller area. In this case, restricting the area nearly doubled the point estimate (from 0.32 to 0.54 animals/1000 km^{2}). In summary, the simulation code provides a tool for users to explore optimal detection ranges for their given target species, survey location, and automated detection software. A natural extension to the work would be to incorporate more complex animal distributions into the simulation algorithm.

The pilot study analysis demonstrated how most of the required auxiliary data for this approach can be generated using subsampled data from the main three-month survey. It is crucial that all parameters in the density estimator have been estimated accurately for the time and place of the main survey, otherwise resulting density estimates may be biased. SLs, NLs, TL, the proportion of false positives, and the detector characterization curve were all estimated specifically for the Wake Island dataset. The SL analysis suggested that, while the choice of TL model made little difference to the SL distribution parameters used in the simulations and analyses, the negative relationship between estimated SL and range of the call from the hydrophone when using the Peregrine TL model warrants further investigation. Parabolic equation models can have limitations at high incidence angles (i.e., small ranges in this case; Jensen *et al.*, 2000), which could result in the discrepancies seen between the two sets of SL results. Further, a fixed source depth of 15 m was assumed for all TL data used in both the simulations and analyses; an extension to this work would be to see whether changes in source depth (or using a distribution of source depths) significantly affects the Peregrine TL (and therefore SL) results. The one parameter that could not be estimated from the collected data was call production rate. In the absence of any other available data, call production rates from the Southern Californian Bight collected during summer months were applied to the estimated call densities. It is highly probable that the call production rates of fin whales around Wake Island and southern California are different; cue production rates do show spatiotemporal variation (e.g., Warren *et al.*, 2017). Therefore, the fin whale densities estimated around Wake Island should be considered a “ballpark” estimate at best.

The pilot study also demonstrated the flexibility of density estimation methods. In this case, bearings could not be measured for all detections, but all detections (except those with SNR values below the lower SNR limit of the detector characterization curve) could still be incorporated into the analysis. It should be noted, however, that the estimated distribution map was based on those detections with measurable bearings only. In order to interpret the resulting distribution map as the predicted spatial distribution of calling fin whales, an assumption must be made that the measured bearings represent the spatial distribution of all detections. In any method that makes assumptions, it is important to assess whether the assumptions are reasonable, or whether they may have been violated. Therefore, consideration should be given as to whether there are any oceanographic or bathymetric features of the study area that may result in certain bearings being difficult, or impossible, to measure (other than high TL values, which are accounted for by identifying areas of acoustic masking at the start of the analysis). In these cases, the resulting map would not depict the distribution of all calling animals.

The most striking result of the pilot analysis was the fact that the monitored area at Wake Island for fin whale calls was much smaller than originally anticipated. Širović *et al.* (2007) estimated detection ranges of fin whale calls in the Antarctic Ocean up to 56 km, though their instruments were not moored in the deep sound channel. Previous work investigating detection range of blue whale calls at CTBTO sites in the Indian Ocean (Samaran *et al.*, 2010; Harris, 2012) predicted that blue whale calls could be detected hundreds of kilometres away, facilitated by the deep sound channel. However, the pilot study results are supported by previous work that predicted detectability of low frequency signals at Wake Island to be lower than at Diego Garcia (Miksis-Olds *et al.*, 2015). The results of all simulations and pilot analysis also demonstrated that the monitored area may be an irregular shape, or even fragmented, as seen in the pilot study. The fragmentation of the monitored area in the pilot study is most likely caused by fluctuations in TL with range; the TL decreases at approximately 50 km (Fig. 8, inset), which corresponds to the fragmented regions. Monitored areas with unusual shapes should not lead to biased density estimates, as long as the results are not extrapolated to areas outside the defined monitored area.

The pilot study has demonstrated the importance of quantifying the size and shape of the monitored area (by estimating detection probabilities of the target species) during acoustic surveys. The same site may show temporal variation in detection probability as oceanographic conditions change through the year. Geographic variability in detection probability between sites, caused by local bathymetric and ocean conditions, should also be considered, even if the acoustic system is the same. Detection probability may also alter if the behavior of the target species changes, e.g., if animals increase call SLs in certain behavioral contexts. Investigating such spatial and temporal variation in detection probabilities at Wake Island and another CTBTO site, Diego Garcia in the Indian Ocean, will comprise the next stage of this research. Another natural extension to this work would be to analyse the southern site at Wake Island to investigate whether the same monitoring conditions are present at a site ∼200 km from the focal instrument in this initial study.

## ACKNOWLEDGMENTS

D.V.H. and L.T. were funded by the Office of Naval Research (Grant Nos. N00014-14-1-0394 and N00014-16-1-2364). J.L.M.O. and J.A.V. were funded under Grant Nos. N00014-14-1-0397 and N00014-16-1-2860 also from the Office of Naval Research. Kevin Heaney (OASIS, Inc., Fairfax Station, VA) generously contributed the TL modeling results for which we extend sincere thanks. Thanks also to Lindesay Scott-Hayward for her help with implementing CReSS and SALSA methods. The CTBTO data were accessed from the Air Force Tactical Applications Center (AFTAC)/U.S. National Data Center. Thanks are extended to James Neely (AFTAC), Richard Baumstark (AFTAC), Mark Prior (formerly CTBTO), and Andrew Forbes (CTBTO) for their assistance in data transfer and transfer of knowledge of CTBTO data. Thanks also to the CTBTO for making available the virtual Data Exploitation Center (https://www.ctbto.org/specials/vdec/). Finally, please note that the views expressed in this study are those of the authors and do not necessarily represent the views of the CTBTO Preparatory Commission.

^{1}

https://www.nodc.noaa.gov/OC5/indprod.html (Last viewed 8 May 2018).