Crows are highly intelligent and social creatures. Each night during the non-breeding period, they gather in large pre-roost aggregations as they move towards their communal roost where they sleep. Crows make numerous and varied vocalizations in these pre-roost aggregations, but the purpose of these calls, and of vocal communication in general, in these aggregations is not fully understood. In this paper, an array of four microphones is used as a non-intrusive means to observe crow vocal behavior in pre-roost aggregations in the absence of human observers. By passively localizing animal vocalizations, the location of individuals can be monitored while simultaneously recording the acoustic structure and organization of their calls. Simulations and an experiment are undertaken to study the performance of two time difference of arrival-based methods (hyperbolic location estimator and maximum likelihood estimator) for call localization. The effect of signal-to-noise ratio and uncertainty in measurement on the localization error is presented. By describing, modeling, and testing these techniques in this innovative context, the authors hope that researchers will employ these approaches in future empirical studies to more fully understand crow vocal behavior.

American and Northwestern crows (Corvus brachyrhynchos and Corvus caurinus, respectively) form nocturnal communal roosts with hundreds to tens or even hundreds of thousands of individuals during fall and winter non-breeding periods (Kalmbach, 1916; McGowan, 2001). As crows move towards these roosts around sunset, they gather in pre-roost aggregations, in which they make numerous vocalizations (Moore and Switzer, 1998). Because the time (relative to sunset) and location of pre-roost aggregations are consistent, they represent an ideal context in which to conduct longitudinal studies of vocal (and non-vocal) behavior to better understand how acoustic and structural variation may code meaning in crow vocal communication. However, little research has been conducted on the nature of calling in these noisy aggregations, partly because the presence of human observers can cause crows to change their calling behavior (Parr, 1997; Tarter, 2008).

In this study, we describe, model, and test approaches to assess the vocal communication of pre-roosting crows who utilize a large communal roost on the North Creek Wetlands Restoration on the University of Washington Bothell campus. The potential confound of observer effects is eliminated by recording crow vocalizations using a remotely activated, time-synched microphone array and video camera. This recording setup allows for not only acoustic and video monitoring of these birds, but also acoustic localization, creating the foundation for an integrated study of crow vocal behavior and spatial network dynamics. Here, we use two methods, the hyperbolic location (HL) estimator and the maximum-likelihood (ML) estimator, to locate vocalizing crows within the confines of the array of acoustic recorders. Both methods depend on calculating the time difference of arrival (TDOA). TDOA algorithms have been successfully applied to radar (Dersan and Tanik, 2002), sonar systems (Carter, 1981), and gunshot localization (Deligeorges et al., 2006). These methods have also been used extensively for localizing marine mammals such as blue and fin whales (Stafford et al., 1998; Širović et al., 2007), bowhead whales (Warner et al., 2017), and manatees (Muanke and Niezrecki, 2007), as well as birds such as antbirds (Collier et al., 2010) and woodpeckers (Wang et al., 2005). The two purposes of this paper are (1) to compare the performance of two TDOA-based localization algorithms to localize simulated crow vocalizations, and (2) apply these methods to localize actual crow vocalizations recorded by an array of microphones and compare the locations with those approximated from the video recordings. The remainder of this paper is divided into five sections. 
Section II presents the scientific motivation of this work, namely to describe, model, and test the application of known approaches (remote recording and spatial localization of bird calls using time difference of arrival comparisons) for use in a novel context (temporally and spatially predictable pre-roost aggregations of crows). Section III presents the mathematical formulation of the HL and ML estimators. Section IV presents the localization results from simulation. Section V presents the experimental results using the actual crow vocalizations. Section VI summarizes this research effort, presents the conclusions drawn from it, and describes how this approach can be extended to address important biological questions about the social behavior and communication of free-living crows.

Although there has been a significant amount of work to decode the vocal communication of crows, the functional relevance of their calls and the acoustic and structural variation within cawing have not yet been fully or adequately explained. Initial studies on crow cawing, such as Chamberlain and Cornwell (1971), named call types based on numerous attributes, from the context in which the call was given (e.g., scolding call) to the response elicited by emission (e.g., assembly call) to the perception of the calling crow (e.g., alert call) to the sound of the call to a human ear (e.g., carr carr notes). Although these early works often make only qualitative reference to the acoustic structure of these call types, their names are still used in the scientific and popular literature today. Richards and Thompson (1978) later suggested that there were two broad categories of cawing, structured and unstructured, the former associated with bursts of caw syllables and having no discernible association with particular behaviors. Such multisyllabic cawing, albeit common in crows, has subsequently received less attention in the research literature than the acoustic variation of individual or paired unstructured caw syllables. Later papers have attempted a more quantifiable acoustic description of crow cawing, especially of unstructured scolding and alarm calls (Mates et al., 2015; Parr, 1997; Tanimoto et al., 2017; Tarter, 2008; Yorzinski and Vehrencamp, 2009). These studies are highly informative and significantly more rigorous than previous work, but are not always in agreement as to the functional relevance of call types, and have involved known/marked populations, captive birds, or have required luring with food.
Crows have interacted with and thrived alongside humans for hundreds and likely thousands of years, and our presence and the history of our interactions (such as feeding) can alter their behavior, including their vocalizations (Pendergraft and Marzluff, 2019; Tanimoto et al., 2017; Tarter, 2008). Still, crows readily interact without humans being present, and their communication in these contexts has not been properly explored. In order to more fully understand the wide variety of crow vocalizations, especially multisyllabic cawing, we need to employ tools to record crow vocalizations without confounds associated with human observers. Due to the relative rarity of some call types and caw patterns, we need to predictably record birds over long periods of time and at different times of the year, reflecting breeding and non-breeding life history stages, to generate large datasets for subsequent, and hopefully automated, analyses. We also need to couple these data with information about the location, behavior, and group dynamics of callers vs non-callers. Here, we describe, model, and test the application of known technologies (remote audio/video recording and sound localization via time difference of arrival comparisons) in a novel and understudied setting (crow pre-roosting aggregations). We present this as a new approach to investigate the behavioral relevance of acoustic and structural differences in crow cawing for use in subsequent empirical studies.

TDOA methods depend on the delay between signals received by multiple receivers. If $N$ is the number of receivers, the signal received at the ith receiver, $1 \le i \le N$, is

$$y_i(t) = s(t - \tau_i) + n_i(t), \tag{1}$$

where $s(t)$ is the source signal, $n_i(t)$ is the noise at the ith receiver, and $\tau_i$ is the time delay between the sound source location and the ith receiver. $\tau_i$ cannot be found if the sound source location is unknown. However, the delay between the ith and jth receivers, $\Delta_{i,j}$, can be found using the cross-correlation between the signals received at the ith and jth receivers. $\Delta_{i,j}$ is the time lag that maximizes the cross-correlation between the filtered received signals and can be formulated as

$$\Delta_{i,j} = \tau_i - \tau_j = \frac{\Delta d_{i,j}}{c}, \tag{2}$$

where $c$ is the speed of sound and $\Delta d_{i,j} = d_i - d_j$, with $d_i$ the distance between the sound source and the ith receiver (so that $\tau_i = d_i/c$). The localization accuracy depends on the time-delay calculation, which in turn depends strongly on the signal-to-noise ratio (SNR) at each receiver. To improve the performance of the cross-correlator at low SNR, the Hilbert envelope peak approach was used to calculate the time delays (Muanke and Niezrecki, 2007). In this study, two approaches are taken to find the source location based on the time delay between each pair of receivers.
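To make the delay estimate in Eq. (2) concrete, the following minimal Python sketch picks the lag that maximizes a plain time-domain cross-correlation between two receivers. It is an illustration only: the function names `estimate_delay` and `delayed_burst`, the synthetic tone burst, and the 24 kHz sampling rate are assumptions made for this example, and the Hilbert envelope peak step used in this study is omitted.

```python
import math

def estimate_delay(y_i, y_j, fs, max_lag):
    """Return the lag (s) that maximizes sum_n y_i[n + lag] * y_j[n].

    With y_i(t) = s(t - tau_i) and y_j(t) = s(t - tau_j), the peak lies
    at tau_i - tau_j, matching the sign convention of Eq. (2).
    """
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, sample in enumerate(y_j):
            m = n + lag
            if 0 <= m < len(y_i):
                value += y_i[m] * sample
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / fs

def delayed_burst(delay_samples, length=400, fs=24000.0):
    """Hypothetical test signal: Gaussian-windowed 1.5 kHz tone, delayed."""
    out = []
    for n in range(length):
        t = n - delay_samples
        envelope = math.exp(-((t - 100) / 20.0) ** 2)
        out.append(envelope * math.sin(2 * math.pi * 1500.0 * t / fs))
    return out

fs = 24000.0
y_1 = delayed_burst(30)   # reaches receiver 1 after 30 samples
y_2 = delayed_burst(12)   # reaches receiver 2 after 12 samples
delta_12 = estimate_delay(y_1, y_2, fs, max_lag=50)
print(f"estimated delay: {delta_12 * 1e3:.3f} ms")
```

The sign of the returned lag indicates which receiver the sound reached first, which is the information later used to choose a hyperbola branch.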

$\Delta d_{i,j}$ represents possible two-dimensional (2D) source locations along a hyperbola whose foci are at receivers i and j. For $N>2$, there exist $K=\binom{N}{2}$ hyperbolas, and the source location can be estimated by finding the intersection of the $K$ hyperbolas. In the HL estimator, all $K$ hyperbolas are computed. In an ideal case, all the hyperbolas have one common intersection, which is the location of the sound source. However, calculation on a discrete grid and the presence of noise shift the intersections of the individual hyperbola pairs away from a single point. In this case, there are $\binom{K}{2}$ pairs of hyperbolas, and the source location can be estimated by averaging the locations of their intersections.

Since the absolute value of $\Delta_{i,j}$ is used to calculate the hyperbolas, each hyperbola has two branches. One of the branches can be discarded based on which receiver receives the audio signal first, which can be determined from the sign of the lag at the cross-correlation peak. As $|\Delta_{i,j}|$ decreases, the hyperbola gets wider until, for $\Delta_{i,j}$ equal to zero, it degenerates into a straight line: the perpendicular bisector of the segment joining the two receivers.
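As a rough illustration of the quantities involved in the HL estimator, the sketch below computes the modeled delay for every receiver pair of a four-receiver square array and uses the sign of each delay to decide which focus the source is nearer to. The zero-based receiver numbering, the function names, and the example source position are assumptions for this example; the intersection-averaging step itself is not implemented here.

```python
import math
from itertools import combinations

C_SOUND = 343.0  # m/s, the value used in the simulations
# Assumed (zero-based) receiver numbering; the text only gives the corners.
RECEIVERS = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]

def modeled_tdoas(source, receivers=RECEIVERS, c=C_SOUND):
    """Return {(i, j): tau_i - tau_j} in seconds for every receiver pair."""
    tau = [math.dist(source, r) / c for r in receivers]
    return {(i, j): tau[i] - tau[j]
            for i, j in combinations(range(len(receivers)), 2)}

def nearer_receiver(pair, delta):
    """Choose the hyperbola branch: the sign of delta gives the closer focus."""
    i, j = pair
    if delta == 0:
        return None  # degenerate case: the perpendicular bisector
    return i if delta < 0 else j

tdoas = modeled_tdoas((4.2, 2.5))
K = len(tdoas)                         # number of hyperbolas
pairs_of_hyperbolas = math.comb(K, 2)  # hyperbola pairs to intersect
for pair, delta in tdoas.items():
    print(pair, f"{delta * 1e3:+.3f} ms", "nearer:", nearer_receiver(pair, delta))
```

For four receivers this gives six hyperbolas and fifteen hyperbola pairs whose intersections would be averaged.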

The sound source location can be estimated by maximizing the likelihood function in a search grid. The corresponding likelihood function is

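$$L(x,y) = L(x,y \mid \Delta^{\mathrm{obs}}) = \prod_{k=1}^{K} \exp\!\left(-\frac{\left(\Delta_k^{\mathrm{obs}} - \Delta_k^{\mathrm{mod}}(x,y)\right)^2}{2\sigma_k^2}\right). \tag{3}$$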

Here, $\Delta_k^{\mathrm{obs}}$ is the time delay observed between the kth pair of receivers and $\Delta_k^{\mathrm{mod}}$ is the time delay modeled between the kth pair of receivers, equal to $\Delta d_{i,j}/c$ for the receivers i and j forming that pair. $(x,y)$ are the candidate source locations in the search grid, and the estimated source location is the point that maximizes the likelihood function $L(x,y)$, which minimizes the residual error between the observed and the modeled time differences. $\sigma_k$ is the standard deviation of the time differences. Here, $\sigma_k$ is assumed to be constant and equal to 1 ms for all the hyperbolas. If the measured TDOAs are independent and Gaussian distributed, maximum likelihood estimation is equivalent to a nonlinear least-squares fit between the measured and modeled TDOAs, as explained by Friedlander (1987).
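A minimal sketch of the grid search behind Eq. (3) is given below, assuming the four-corner array geometry, a sound speed of 343 m/s, and a coarse 0.1 m grid chosen only to keep the example fast (the simulations in this paper use a 1 mm step). The function names are hypothetical, and the logarithm of Eq. (3) is maximized instead of the product, which yields the same location.

```python
import math
from itertools import combinations

C_SOUND = 343.0   # m/s
SIGMA = 1e-3      # s, the constant sigma_k assumed in the text
RECEIVERS = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]
PAIRS = list(combinations(range(len(RECEIVERS)), 2))

def modeled_tdoas(x, y):
    """Modeled delays tau_i - tau_j (s) for a candidate source at (x, y)."""
    tau = [math.hypot(x - rx, y - ry) / C_SOUND for rx, ry in RECEIVERS]
    return [tau[i] - tau[j] for i, j in PAIRS]

def log_likelihood(x, y, observed):
    """Logarithm of Eq. (3); it has the same maximizer as L(x, y)."""
    return -sum((obs - mod) ** 2 / (2.0 * SIGMA ** 2)
                for obs, mod in zip(observed, modeled_tdoas(x, y)))

def ml_estimate(observed, side=6.0, step=0.1):
    """Exhaustive search over a square grid; returns the best (x, y)."""
    n = int(round(side / step))
    best, best_ll = None, -math.inf
    for ix in range(n + 1):
        for iy in range(n + 1):
            x, y = ix * step, iy * step
            ll = log_likelihood(x, y, observed)
            if ll > best_ll:
                best, best_ll = (x, y), ll
    return best

observed = modeled_tdoas(4.2, 2.5)   # noise-free synthetic observations
x_hat, y_hat = ml_estimate(observed)
print(f"estimated source: ({x_hat:.2f}, {y_hat:.2f}) m")
```

Working with the log-likelihood avoids numerical underflow of the product when residuals are large relative to $\sigma_k$.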

Simulations were undertaken in MATLAB (MathWorks Inc., Natick, MA) to compare the performance of the localization methods in ideal circumstances. The image grid step is equal to 1 mm in both the x and y directions. The geometry and parameters of the simulations shown in Fig. 1 mimic those of the rooftop experiment described in Sec. V.

FIG. 1.

(Color online) Simulation array geometry that mimics the array geometry of the experiment.

Figure 2 shows the localization results for a sound source located at (4.2 m, 2.5 m), broadcasting a chirp signal with a 500–3000 Hz bandwidth, sampled at 24 kHz. The SNR is 20 dB and the sound speed in the environment is 343 m/s. The localization uncertainty of the HL estimator is equal to the standard deviation of the intersections between the hyperbola pairs in the x and y directions. The localization uncertainty of the ML estimator is quantified by calculating the area of the focal spot where the pixel amplitudes are greater than 60% of the maximum amplitude in the image. Since the ML estimator assumes that the residual errors between the observed and modeled time differences can be represented as Gaussian-distributed random variables, 60% of the maximum amplitude corresponds approximately to one standard deviation from the peak. Hence, the extent of this area in the x and y directions estimates the localization uncertainty and can be compared with the uncertainty of the HL estimator.
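The 60% threshold can be checked with a one-line calculation. As a hedged illustration, assume the likelihood is locally Gaussian along a one-dimensional cut through its peak at $x_0$, with standard deviation $\sigma_x$ (both symbols are introduced here only for this example):

```latex
\frac{L(x_0 \pm \sigma_x)}{L(x_0)}
  = \exp\!\left(-\frac{\sigma_x^{2}}{2\sigma_x^{2}}\right)
  = e^{-1/2} \approx 0.607 .
```

Under that assumption, the contour at roughly 60% of the maximum lies about one standard deviation from the peak.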

FIG. 2.

Simulation localization results. (a) Localization result using the hyperbolic location (HL) estimator, (b) localization result using the maximum likelihood (ML) estimator. The circles located at (0,0), (6,0), (0,6), and (6,6) show the microphone locations. The blue lines on panel (a) show the hyperbolas between each pair of microphones. The square shows the actual sound source location and the diamond shows the estimated sound source location. The color bar in panel (b) represents the residual error in meters. The localization uncertainty, shown by the error bars, is equal to (a) 0.009 m in x direction and 0.005 m in y direction for the HL estimator and (b) 0.25 m in x direction and 0.23 m in y direction for the ML estimator. Note the insets do not have similar scales to better show the error bars and offset between the actual and estimated sound source.

Both techniques localized the source correctly with less than 1 cm error in an ideal situation where all the variables are precisely known and the SNR is high. However, there are some complications in analyzing the experimental data. First, there may be errors in measuring the receiver locations. Second, crow aggregations are usually very noisy because of calls from other crows and noise from other sources such as wind, rain, and rooftop equipment around the experiment. In Secs. IV A and IV B, both limitations are analyzed. In Sec. IV A, Monte Carlo simulations are used to characterize the performance of the localization algorithms when there are random errors in travel time between the sound source and the receivers. In Sec. IV B, white noise is added to each recording to generate various SNRs.

Monte Carlo simulations are used to examine the performance of the two algorithms. The area inside the array is divided into nine regions, and a Monte Carlo simulation was run in each region using 1000 random samples of source locations, assuming a root-mean-square (rms) timing error of 0.5 ms per pick for each receiver. The SNR is 20 dB at each receiver and the grid step is 1 mm in the x and y directions. Figure 3 shows the localization error averaged over 1000 trials in each region. It shows that the localization accuracy of the HL estimator depends more strongly on the location of the sound source than that of the ML estimator. The HL estimator is more accurate in the middle of the array than in the corners, whereas the ML estimator is similarly accurate in all the regions. Overall, the ML estimator is more robust than the HL estimator.
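The procedure described above can be sketched for a single region as follows. This is a scaled-down illustration, not the simulation used in this paper: it assumes the four-corner array geometry, uses only the ML estimator, and replaces the 1000 trials and 1 mm grid with 50 trials and a 0.25 m grid so that it runs quickly in plain Python. All function names and the random seed are hypothetical.

```python
import math
import random
from itertools import combinations

C_SOUND = 343.0  # m/s
RECEIVERS = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]
PAIRS = list(combinations(range(len(RECEIVERS)), 2))

def travel_times(x, y):
    """Travel time (s) from a source at (x, y) to each receiver."""
    return [math.hypot(x - rx, y - ry) / C_SOUND for rx, ry in RECEIVERS]

def tdoas_from_travel_times(tau):
    """Pairwise delays tau_i - tau_j for all receiver pairs."""
    return [tau[i] - tau[j] for i, j in PAIRS]

def build_grid(side=6.0, step=0.25):
    """Precompute the modeled TDOAs once for every grid point."""
    n = int(round(side / step))
    return [((ix * step, iy * step),
             tdoas_from_travel_times(travel_times(ix * step, iy * step)))
            for ix in range(n + 1) for iy in range(n + 1)]

def ml_estimate(observed, grid):
    """Least-squares grid search, equivalent to Eq. (3) with constant sigma_k."""
    def cost(entry):
        return sum((o - m) ** 2 for o, m in zip(observed, entry[1]))
    return min(grid, key=cost)[0]

def monte_carlo(region, trials, timing_rms, rng, grid):
    """Mean localization error (m) for sources drawn uniformly in a region."""
    (x_min, x_max), (y_min, y_max) = region
    total = 0.0
    for _ in range(trials):
        x, y = rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)
        # Perturb each receiver's travel time with Gaussian timing error.
        noisy = [t + rng.gauss(0.0, timing_rms) for t in travel_times(x, y)]
        x_hat, y_hat = ml_estimate(tdoas_from_travel_times(noisy), grid)
        total += math.hypot(x_hat - x, y_hat - y)
    return total / trials

rng = random.Random(0)              # fixed seed for repeatability
grid = build_grid()
middle = ((2.0, 4.0), (2.0, 4.0))   # central one of the nine regions
mean_error = monte_carlo(middle, trials=50, timing_rms=0.5e-3, rng=rng, grid=grid)
print(f"mean localization error: {mean_error:.2f} m")
```

Repeating the call for each of the nine regions would give a coarse counterpart of one panel of Fig. 3; the absolute numbers from this sketch are limited by the coarse grid and should not be compared directly with the reported results.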

FIG. 3.

(Color online) Monte Carlo analysis of (a) the hyperbolic location estimator (HL), (b) the maximum likelihood estimator (ML). The color bar represents the localization error in meters, averaged over 1000 trials in each region.

If $S_i(\omega)$ is the Fourier transform of the signal received at the ith receiver, $s(t-\tau_i)$ in Eq. (1), and $N_i(\omega)$ is the Fourier transform of the noise received at that receiver, $n_i(t)$ in Eq. (1), the SNR at the ith receiver is defined by

$$\mathrm{SNR}_i = \frac{|S_i(\omega)|^2}{|N_i(\omega)|^2}. \tag{4}$$

In this section, the performance of both localization methods is analyzed at various SNRs by adding white noise to the simulated received signals. Two source locations are considered: the middle region of the array and the corner region closer to receiver number 1. Figure 4 shows the localization error and one standard deviation of the error in the x and y directions. Both methods are more sensitive to noise when the sound source is located in a corner region, and the ML estimator is more accurate than the HL estimator for sources located in a corner. Both estimators localize the sound source accurately when the SNR is above 6.5 dB. The standard deviation of the localization error of the HL estimator increases when the sound source is in a corner. However, the standard deviation of the localization error of the ML estimator is independent of the sound source location and the SNR.
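One way to add white noise at a prescribed SNR is sketched below. It assumes a broadband reading of Eq. (4), that is, the ratio of total signal power to total noise power expressed in decibels as $10\log_{10}$ of that ratio; the function names, the test tone, and the random seed are hypothetical choices for this example.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a sampled signal."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, target_snr_db, rng):
    """Return (noisy, noise) with Gaussian noise scaled to the target SNR."""
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10.0 ** (target_snr_db / 10.0))
    # Rescale the drawn noise so its measured power matches the target.
    scale = math.sqrt(target_noise_power / power(raw))
    noise = [scale * v for v in raw]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise

def measure_snr_db(signal, noise):
    """Broadband SNR in dB: 10 log10 of the signal-to-noise power ratio."""
    return 10.0 * math.log10(power(signal) / power(noise))

fs = 24000.0
signal = [math.sin(2 * math.pi * 1500.0 * n / fs) for n in range(2400)]
noisy, noise = add_white_noise(signal, target_snr_db=6.5, rng=random.Random(0))
print(f"achieved SNR: {measure_snr_db(signal, noise):.2f} dB")
```

Rescaling the drawn noise to the exact target power removes sample-to-sample variation in the achieved SNR, which is convenient when sweeping SNR values as in Fig. 4.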

FIG. 4.

(Color online) Simulation localization results for various signal-to-noise ratios (SNR). (a) localization error, (b) one standard deviation of the error in x direction, (c) one standard deviation of the error in y direction. The results from the HL and ML estimators are shown with squares and circles, respectively. The filled symbols show the results for a sound source located in the middle of the array and the unfilled symbols show the results for a sound source located in a corner of the array.

Data were collected during winter 2017 on the University of Washington Bothell campus near the North Creek Wetlands roost. Crows gather on campus before moving to their communal roost for the night. This particular roost is estimated to contain over 15 000 individuals (Ferguson et al., 2018). A photograph of crows gathering on a pre-roost aggregation at the University of Washington Bothell is shown in Fig. 6(c).

Four Song Meter SM3 Acoustic Recorders (Wildlife Acoustics, Inc., Maryland) (henceforth referred to as “receivers”) were attached to concrete blocks and placed at the corners of a 6 m × 6 m square on the rooftop of the STEM building on the University of Washington Bothell campus, where crows predictably and reliably form a pre-roosting aggregation around sunset for about 15–60 min each evening. A 6 m × 6 m array was chosen to maximize the number of crows inside the array, while reducing the chances of signal loss due to distance from the recorder, noise, and call overlap. In addition to the acoustic receivers, a night vision video camera (RLC-411WS, ReoLink, Kowloon, Hong Kong) was installed with live feed capabilities for monitoring the experiment as well as ground-truthing the localization results. To minimize any possible disturbance to the animals' environment, the acoustic receivers and the camera were programmed to start a few hours before sunset and to continue recording for 3–4 h to make sure vocalizations from all pre-roosting crows were captured. Depending on the season, the number of crows observed within the confines of the 6 m × 6 m receiver array varied from one to nearly 200, with many more outside of the array on the same rooftop. Since the performance of the localization algorithms substantially degrades when the sound source is outside of the array (Park and Kotun, 2018), the focus of this study is only on the animals vocalizing inside the square array.

Crows are notoriously boisterous creatures, especially when they are in groups. The most common vocalizations consist of caw syllables separated by gaps of silence, forming a call or burst (Richards and Thompson, 1978; Tarter, 2008). The functional relevance of monosyllabic caws, sometimes categorized as unstructured cawing, is better understood than that of the multisyllabic cawing often heard in pre-roost aggregations (Chamberlain and Cornwell, 1971; Richards and Thompson, 1978; Tarter, 2008). A complete caw call normally comprises one to seven individual caw syllables. Figure 5 shows a sample caw call recorded on Feb. 22, 2017. This call comprises three caw syllables and has its highest signal energy in the ∼1–2 kHz band.

FIG. 5.

(Color online) A sample crow caw call. (a) Normalized waveform, (b) normalized spectrogram.

To compare the performance of the HL and ML estimators, nine separate caw syllables were selected by manually inspecting the audio recordings. Once a vocalization was found at one receiver, it was identified at the other three receivers within $\pm d_{\max}/c = \pm 6\sqrt{2}/343 \approx \pm 0.025$ s, where $d_{\max}$ is the diagonal of the array. The actual locations of the vocalizing crows were approximated by visually identifying the vocalizing crows on the video recordings. Crows often move their head forward and backward while vocalizing, which aided call identification in the video recording (Chamberlain and Cornwell, 1971). There is strong persistent noise from ventilation systems approximately 15 m away from the array setup. To reduce this noise while preserving the frequencies that carry the most energy in caw calls, all the recordings were bandpass filtered between 500 and 5000 Hz. The SNR of these recordings varied between 7.8 and 17.9 dB.
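The search window quoted above is simply the longest receiver spacing divided by the sound speed. A short sketch, assuming receivers at the four corners of the 6 m square and the 343 m/s sound speed used in the simulations (the function name is hypothetical):

```python
import math
from itertools import combinations

RECEIVERS = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]  # m
C_SOUND = 343.0  # m/s, the value used in the simulations

def max_pair_delay(receivers, c):
    """Largest possible |TDOA| (s): longest receiver spacing divided by c."""
    d_max = max(math.dist(a, b) for a, b in combinations(receivers, 2))
    return d_max / c

window = max_pair_delay(RECEIVERS, C_SOUND)
print(f"search window: +/-{window * 1e3:.1f} ms")
```

Any arrival pair separated by more than this window cannot come from a single source under direct-path propagation, so it can be excluded before localization.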

The HL and ML estimators were applied to the selected vocalizations. Figure 6 shows a sample localization result of the first syllable of the crow caw call shown in Fig. 5. The estimated crow location is (5.43 ± 0.09 m, 3.09 ± 0.07 m) and (5.42 ± 0.32 m, 3.08 ± 0.21 m) by the HL estimator and the ML estimator, respectively. The video recording confirms that the crow is located around (5.60 m, 3.21 m). Table I shows a summary of the localization results of this call and seven more crow vocalizations. The localization error averaged over all eight calls is 0.31 and 0.24 m for the HL estimator and the ML estimator, respectively.
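The averaged errors quoted above can be reproduced from the entries of Table I. The sketch below assumes the localization error of a call is the Euclidean distance between the estimated location and the location approximated from the video; the variable and function names are hypothetical.

```python
import math

# (HL estimate, ML estimate, location from video), in meters, from Table I.
CALLS = [
    ((5.43, 3.09), (5.42, 3.08), (5.60, 3.21)),
    ((4.07, 3.44), (4.07, 3.44), (4.25, 3.75)),
    ((3.40, 2.43), (3.41, 2.42), (3.14, 2.54)),
    ((1.74, 0.27), (1.72, 0.30), (1.71, 0.66)),
    ((4.23, 6.03), (4.16, 5.76), (4.04, 5.74)),
    ((5.46, 3.64), (5.14, 3.62), (5.35, 3.62)),
    ((3.38, 1.58), (3.37, 1.57), (3.49, 1.54)),
    ((4.88, 1.45), (4.93, 1.04), (4.89, 0.79)),
]

def mean_error(estimates, truths):
    """Mean Euclidean distance (m) between estimated and actual locations."""
    return sum(math.dist(e, t) for e, t in zip(estimates, truths)) / len(truths)

actual = [call[2] for call in CALLS]
mean_hl = mean_error([call[0] for call in CALLS], actual)
mean_ml = mean_error([call[1] for call in CALLS], actual)
print(f"HL: {mean_hl:.2f} m, ML: {mean_ml:.2f} m")
```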

FIG. 6.

(Color online) Crow localization results. (a) Localization result using the hyperbolic location estimator, (b) localization result using the maximum likelihood estimator, (c) a snapshot of the video recording showing the calling crow. The circles located at (0,0), (6,0), (0,6), and (6,6) show the microphone locations. The blue lines in panel (a) show the hyperbolas between each pair of microphones. In panels (a) and (b), the squares show the actual crow location and the diamonds show the estimated crow location. The color bar in panel (b) represents the residual error in meters. The circle in panel (c) shows the vocalizing crow.

TABLE I.

Summary of all the crow vocalizations, SNR, and their locations using the HL and ML estimators and the actual locations taken from the video recordings.

| Call # | Date | Time | SNR (dB) | Estimated location, HL (m) | Estimated location, ML (m) | Actual location (m) |
|---|---|---|---|---|---|---|
| 1 | Feb. 22 | 18:07:21 | 11.1 | (5.43 ± 0.09, 3.09 ± 0.07) | (5.42 ± 0.32, 3.08 ± 0.21) | (5.60, 3.21) |
| 2 | March 14 | 18:35:26 | 9.4 | (4.07 ± 0.06, 3.44 ± 0.05) | (4.07 ± 0.25, 3.44 ± 0.23) | (4.25, 3.75) |
| 3 | March 14 | 18:37:22 | 12.3 | (3.40 ± 0.08, 2.43 ± 0.12) | (3.41 ± 0.24, 2.42 ± 0.24) | (3.14, 2.54) |
| 4 | March 15 | 18:38:42 | 7.8 | (1.74 ± 0.13, 0.27 ± 0.31) | (1.72 ± 0.22, 0.30 ± 0.34) | (1.71, 0.66) |
| 5 | March 15 | 18:42:04 | 12.2 | (4.23 ± 0.23, 6.03 ± 1.12) | (4.16 ± 0.21, 5.76 ± 0.35) | (4.04, 5.74) |
| 6 | March 23 | 18:52:06 | 11.5 | (5.46 ± 1.24, 3.64 ± 0.11) | (5.14 ± 1.29, 3.62 ± 0.22) | (5.35, 3.62) |
| 7 | March 25 | 18:56:35 | 15.7 | (3.38 ± 0.03, 1.58 ± 0.06) | (3.37 ± 0.23, 1.57 ± 0.26) | (3.49, 1.54) |
| 8 | March 25 | 18:57:22 | 17.9 | (4.88 ± 1.39, 1.45 ± 1.64) | (4.93 ± 0.25, 1.04 ± 0.24) | (4.89, 0.79) |

In this study, two TDOA-based methods (hyperbolic location estimator and maximum likelihood estimator) have been used to localize calling crows in a pre-roost aggregation. The performance of these methods has been studied using simulations and an experiment. Crow vocalizations were recorded by a fixed programmable microphone array consisting of four time-synched recorders installed in a pre-roost aggregation. In this setup, vocalizations can be recorded and localized in the absence of human observers and in poor weather conditions that are not conducive to visual monitoring.

Evidence suggests that crow vocal communication is context-dependent (Mates et al., 2015). Crows form small groups on territories or diurnal activity centers, and previous examinations of crow vocal communication have often employed marked birds within small, known family groups. Much less is known about communication in larger, multi-family social groups, such as pre-roost aggregations, in part because of the logistic difficulties associated with recording from such large gatherings. Still, understanding communication within these groups, consisting of crows from multiple territories, is critical to fully understand crow social behavior. The approaches we have outlined for crow recording and localization in pre-roosting aggregations have the potential to lead to a better understanding of crow communication for a number of reasons. As these aggregations are predictable in time and space, we can keep our recording equipment set up continuously, reducing its novelty. Corvids (e.g., crows and ravens) show relatively high levels of neophobia as compared to other avian species (Greggor et al., 2016), and, in our experience, often alter their behavior when an observer points a microphone or video camera in their direction. After a few nights, the crows in this study acclimated to our recording setup, appearing near or on recorders without hesitation. Many previous studies of crow vocal communication were completed by human observers with only their car as a means of camouflage (e.g., Parr, 1997), with captive or hand-reared birds (e.g., Brown, 1985; Parr, 1997), or by inducing wild birds to approach using food (e.g., Tarter, 2008). The spatial predictability and density of pre-roost aggregations, like those assessed in this study, allow for remote recording not biased by the presence of human observers.
That crows use these pre-roost aggregations nearly year-round, albeit at much lower numbers during the late spring and early summer, allows longitudinal evaluation of vocal and non-vocal behavior across multiple annually repeating life history stages. Seasonally breeding birds, like crows, provide excellent models to understand the effects of seasonal endocrine and neural changes on social behavior and communication (Wacker, 2018), and the potential for long-term recording, not previously feasible, will aid such investigations. The accumulation and assessment of the resulting large datasets, again without the confound of human observers, will allow researchers to more fully understand crow vocal communication. Remote audio recording with our array setup also provides localization data for aggregated crows, allowing for an integration of spatial orientation and social network data to better understand both individual and group behavior.

We endeavor to use our approach to design a high-throughput system in which crow calls, spatial orientation, non-vocal behaviors, movements, and postures can be integrated from video and audio recording apparatuses to generate large datasets, which can then be searched for patterns that reveal the function of crow vocal communication within these large social groups. To realize this goal, an automated approach needs to be developed to localize all the recorded vocalizations. There are a few challenges in the automated localization approach that need to be addressed:

  1. Crow vocalizations need to automatically be identified in all recordings.

  2. Crow calls are directional signals. Hence, the amplitude of the recorded signals depends on the crow head direction. These data are already collected via our video recorders and should be considered in detection efforts.

  3. In larger groups, such as pre-roost aggregations, crows often overlap their vocalizations, making it difficult to identify which vocalizations pertain to a specific crow call.

This work can be extended to three-dimensional localization, which is suitable for monitoring more crow behaviors. Array size can also be scaled up to meet the demands of specific pre-roost aggregation spread and the particular research questions being addressed. It is worth mentioning that the localization methods used in this study assume the arrivals are direct propagation paths between the sound source and the receivers. Hence, multipath environments such as forests can degrade the accuracy of the estimated locations.

This work is partially supported by the University of Washington Royalty Research Fund. The authors want to thank Chun Lam, Benjamin Walzer, Virdie Guy, and Derek Delizo for help with developing a MATLAB script, and Lauren Kirk, Andrea Bilotta, and Arin Chic for help with maintenance and optimization of rooftop equipment.

1. Brown, E. D. (1985). “Functional interrelationships among the mobbing and alarm caws of common crows (Corvus brachyrhynchos),” Z. Tierpsychol. 67, 17–33.
2. Carter, G. C. (1981). “Time delay estimation for passive sonar signal processing,” IEEE Trans. Acoust., Speech, Signal Process. ASSP-29, 463–470.
3. Chamberlain, D. R., and Cornwell, G. W. (1971). “Selected vocalizations of the common crow,” Auk 88, 613–634.
4. Collier, T. C., Kirschel, A. N. G., and Taylor, C. E. (2010). “Acoustic localization of antbirds in a Mexican rainforest using a wireless sensor network,” J. Acoust. Soc. Am. 128(1), 182–189.
5. Deligeorges, S., Zosuls, A., Mountain, D., and Hubbard, A. (2006). “A biomimetic robotic system for localizing gunfire,” J. Acoust. Soc. Am. 119, 3271.
6. Dersan, A., and Tanik, Y. (2002). “Passive radar localization by time difference of arrival,” Proceedings MILCOM 2002, Anaheim, CA, Vol. 2, pp. 1251–1257.
7. Ferguson, T. R., Driscoll, Z. G., Wotus, C., Hartley, R. S., Greer, A. J., and Wacker, D. W. (2018). “Communal crow roosts induce changes in social behavior of song sparrows (Melospiza melodia morphna),” in 55th Annual Conference of the Animal Behavior Society (ABS), Milwaukee, Wisconsin, August 2–6.
8. Friedlander, B. (1987). “A passive localization algorithm and its accuracy analysis,” IEEE J. Oceanic Eng. 12, 234–245.
9. Greggor, A. L., Clayton, N. S., Fulford, A. J., and Thornton, A. (2016). “Street smart: Faster approach towards litter in urban areas by highly neophobic corvids and less fearful birds,” Anim. Behav. 117, 123–133.
10. Kalmbach, E. R. (1916). “Winter crow roosts,” U.S. Department of Agriculture.
11. Mates, E. A., Tarter, R. R., Ha, J. C., Clark, A. B., and McGowan, K. J. (2015). “Acoustic profiling in a complexly social species, the American crow: Caws encode information on caller sex, identity and behavioural context,” Bioacoustics 24(1), 63–80.
12. McGowan, K. J. (2001). “Demographic and behavioral comparisons of suburban and rural American crows,” in Avian Ecology and Conservation in an Urbanizing World (Springer, Boston, MA), pp. 365–381.
13. Moore, J. E., and Switzer, P. V. (1998). “Preroosting aggregations in the American crow, Corvus brachyrhynchos,” Can. J. Zool. 76(3), 508–512.
14.
Muanke
P. B.
, and
Niezrecki
,
C.
(
2007
). “
Manatee position estimation by passive acoustic localization
,”
J. Acoust. Soc. Am.
121
(
4
),
2049
2059
.
15.
Park
,
J.
, and
Kotun
,
K.
(
2018
). “
Spectral coherence and hyperbolic solutions applied to time difference of arrival localisation
,”
Appl. Acoust.
136
,
149
157
.
16.
Parr
,
C. S.
(
1997
). “
Social behavior and long-distance vocal communication in eastern American crows
,” Dissertation,
University of Michigan
,
Ann Arbor, MI
.
17.
Pendergraft
,
L. T.
, and
Marzluff
,
J. M.
(
2019
). “
Fussing over food: Factors affecting the vocalizations American crows utter around food
,”
Anim. Behav.
150
,
39
57
.
18.
Richards
,
D. B.
, and
Thompson
,
N. S.
(
1978
). “
Critical properties of the assembly call of the common American crow
,”
Behaviour
64
(
3
),
184
203
.
19.
Širović
,
A.
,
Hildebrand
,
J. A.
, and
Wiggins
,
S. M.
(
2007
). “
Blue and fin whale call source levels and propagation range in the Southern Ocean
,”
J. Acoust. Soc. Am.
122
(
2
),
1208
1215
.
20.
Stafford
,
K. M.
,
Fox
,
C. G.
, and
Clark
,
D. S.
(
1998
). “
Long-range acoustic detection and localization of blue whale calls in the northeast Pacific Ocean
,”
J. Acoust. Soc. Am.
104
,
3616
3625
.
21.
Tanimoto
,
A. M.
,
Hart
,
P. J.
,
Pack
,
A. A.
,
Switzer
,
R.
,
Banko
,
P. C.
,
Ball
,
D. L.
, and
Warrington
,
M. H.
(
2017
). “
Changes in vocal repertoire of the Hawaiian crow, Corvus hawaiiensis, from past wild to current captive populations
,”
Anim. Behav.
123
,
427
432
.
22.
Tarter
, and
Robin
,
R.
(
2008
). “
The vocal behavior of the American crow, Corvus brachyrhyncos
,” Dissertation,
The Ohio State University
,
Columbus, OH
.
23.
Wacker
D. W.
(
2018
). “
Leveraging seasonality in male songbirds to better understand the neuroendocrine regulation of vertebrate aggression
,” in
Routledge International Handbook of Social Neuroendocrinology
, edited by
O.
Schultheiss
and
P.
Mehta
(
Routledge
,
Abingdon, UK
), pp.
67
80
.
24.
Wang
,
H.
,
Chen
,
C. E.
,
Ali
,
A.
,
Asgari
,
S.
,
Hudson
,
R. E.
,
Yao
,
K.
,
Estrin
,
D.
, and
Taylor
,
C.
(
2005
). “
Acoustic sensor networks for woodpecker localization
,” in
Proc. SPIE 5910, Advanced Signal Processing Algorithms, Architectures, and Implementations XV
, San Diego, CA, Vol. 5910, p.
591009
.
25.
Warner
,
G. A.
,
Dosso
,
S. E.
, and
Hannay
,
D. E.
(
2017
). “
Bowhead whale localization using time-difference-of-arrival data from asynchronous recorders
,”
J. Acoust. Soc. Am.
141
(
3
),
1921
1935
.
26.
Yorzinski
,
J. L.
, and
Vehrencamp
,
S. L.
(
2009
). “
The effect of predator type and danger level on the mob calls of the American crow
,”
Condor
111
(
1
),
159
168
.