Studies have demonstrated that dolphins can maintain continuous auditory or echolocation vigilance for up to 5 to 15 days when provided with continuous primary reinforcement (i.e., food reward after each correct detection). The goals of this study were to examine whether dolphins could perform an 8-h echolocation vigilance task featuring variable reinforcement schedules, where correct responses were intermittently rewarded, and variable acoustic secondary reinforcement (feedback) patterns. Three dolphins were trained to echolocate simulated targets and press a response paddle upon detecting echoes. Three conditioned reinforcement conditions were utilized: no (acoustic) feedback, acoustic feedback, and structured acoustic feedback. The probability of primary reinforcement following a correct response began at 50% for all dolphins but was sequentially reduced to 25%, 12%, 6%, and 0% each time performance criteria were met. Conditions including acoustic feedback resulted in two dolphins successfully performing the echolocation vigilance task under the 0% primary reinforcement schedule (8 h before receiving primary reinforcement). None of the animals reached 0% reinforcement probability in the no feedback condition. The results demonstrate that dolphins can perform experimental echolocation tasks for extended time periods without primary reinforcement and suggest that secondary reinforcement may be important to maintain this behavior.
I. INTRODUCTION
Previous studies have shown that bottlenose dolphins (Tursiops truncatus) are capable of continuously performing auditory or echoic detection tasks for extended periods of time (Ridgway et al., 2006; Branstetter et al., 2012; Branstetter et al., 2018). In the first of these studies, Ridgway et al. (2006) demonstrated that dolphins could remain attentive and continuously perform a passive acoustic detection task for up to 5 days without significant performance degradation. In later studies, Branstetter et al. (2012) and Branstetter et al. (2018) utilized an active echolocation vigilance paradigm, where dolphins used their biosonar to continuously monitor the waters surrounding a netted enclosure and reported the detection of echoic signals simulating physical targets, for time periods ranging from 1.5 h to 15 days. Although these studies differed as to whether passive (hearing) or active (biosonar) sensory systems were studied, they shared a common feature in the use of a continuous reinforcement schedule: a secondary, conditioned reinforcer (an acoustic signal) was presented, followed by the primary, unconditioned reinforcer (food), after each conditioned response (the press of a response paddle following detection of a signal). Continuous reinforcement is most efficient for training new behaviors (Skinner, 1938); however, it has been demonstrated with a variety of species, including rats and pigeons, that variable reinforcement (VR) schedules, in which some but not all correct responses are reinforced, maintain higher rates of responding (Skinner, 1938; Ferster and Skinner, 1957).
A number of previous marine mammal bioacoustic tests have utilized VR schedules. In early research, a bottlenose dolphin was found to successfully perform an echolocation target detection task for up to 100 trials before receiving food reinforcement; however, the total length of the session was not reported and performance significantly dropped after eight consecutive sessions (Beach and Pepper, 1972; Murchison and Patterson, 1980). More recently, VR schedules have been utilized in underwater hearing tests with dolphins, belugas, and sea lions. In these studies, animals were provided primary reinforcement after a variable number of conditioned acoustic responses to hearing test tones (e.g., Schlundt et al., 2000; Ridgway et al., 2001; Mulsow et al., 2012).
The objectives of the current study were to (1) determine if dolphins could maintain performance during an echolocation vigilance task utilizing VR schedules with decreasing probability of an individual response being rewarded, with a limiting condition of primary reinforcement provided only at the end of the 8-h session, and (2) examine if the presence of secondary reinforcement (acoustic feedback) affected performance. Although variable schedules have been studied extensively, especially in laboratory animals (Ferster and Skinner, 1957), the current study utilized a VR schedule over an extended time period atypical of most studies of marine mammal hearing or biosonar. Extending the amount of time a dolphin performs an echolocation task before receiving food reinforcement provides insight into the behavioral capabilities of these animals while performing echolocation-based tasks where continuous delivery of primary reinforcement (i.e., food) may be impractical.
II. METHODS
A. Overview
The echolocation task was similar to that used by Branstetter et al. (2012), where dolphins were trained to continuously perform an echolocation detection task. Rather than using a physical target, the echolocation task utilized a phantom echo generator (PEG) (Simmons, 1973; Au et al., 1987) that extracted amplitude, frequency, and timing information from the dolphin's echolocation signals in order to broadcast signals back to the dolphin that simulated echoes from a hollow, steel sphere. This approach allowed the presence/absence of echoic targets at various azimuthal angles and ranges to be simulated for extended time periods without inadvertent cues to the dolphins, something that would be difficult to achieve with physical targets. The dolphin's task was to monitor the perimeter of a netted ocean enclosure using its biosonar and touch a response paddle with its rostrum upon detecting the presence of any simulated targets (i.e., echoic signals generated by the PEG in response to dolphin clicks).
Trials were defined as 2-min time intervals during which simulated echoes were presented to the dolphin. Paddle presses occurring within a trial were designated as “hits.” Paddle presses outside of a trial or within a trial but in the absence of echoes (i.e., the dolphin was not echolocating, so no echoes were generated) were designated as false alarms. Performance over each 8-h session was characterized using the hit rate (HR) and the false alarm rate (FR). HR was defined as the ratio of the number of hits to the number of echo present trials, expressed as a percentage. FR was defined as the average number of false alarms per hour during the session, i.e., the number of responses in the absence of phantom echoes divided by session duration in hours.
For baseline sessions, primary reinforcement probability (PRP)—defined as the probability that an individual paddle press following echo detection would be reinforced with fish—was 100% (i.e., a continuous reinforcement schedule). For experimental sessions, PRP was <100% (i.e., a VR schedule). Following sufficient performance by the dolphins (HR ≥ 90% and FR < 1 h−1 for an 8-h session), the PRP was progressively lowered. Thus, for experimental conditions, primary reinforcement followed a VR schedule, with the number of correct responses required before primary reinforcement variable and progressively increasing (PRP decreasing) within each experimental condition. Three experimental conditions were utilized to vary the presence and/or characteristics of a conditioned reinforcer consisting of acoustic feedback.
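The performance measures and the session criterion defined above can be written out as a short calculation. This is a minimal sketch, not the authors' analysis code; the function names and the example counts are hypothetical.

```python
def hit_rate(hits: int, echo_present_trials: int) -> float:
    """Hit rate (HR): hits as a percentage of echo-present trials."""
    if echo_present_trials <= 0:
        raise ValueError("need at least one echo-present trial")
    return 100.0 * hits / echo_present_trials


def false_alarm_rate(false_alarms: int, session_hours: float) -> float:
    """False alarm rate (FR): false alarms per hour of session time."""
    if session_hours <= 0:
        raise ValueError("session duration must be positive")
    return false_alarms / session_hours


def meets_criteria(hr_percent: float, fr_per_hour: float) -> bool:
    """Session criterion from the text: HR >= 90% and FR < 1 per hour."""
    return hr_percent >= 90.0 and fr_per_hour < 1.0


# Hypothetical 8-h session: 31 trials, 30 hits, 4 false alarms.
hr = hit_rate(30, 31)
fr = false_alarm_rate(4, 8.0)
print(f"HR = {hr:.1f}%, FR = {fr:.2f} per hour, pass = {meets_criteria(hr, fr)}")
```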
B. Subjects
The study was conducted in the spring and summer of 2011. Three Atlantic bottlenose dolphins participated: an 8-yr-old male, IND; a 27-yr-old female, APR; and a 33-yr-old female, MUU. Ages are those at the time of the study. All animals were housed in floating netted enclosures at the U.S. Navy Marine Mammal Program facility in San Diego Bay, CA. This study complied with protocols approved by the Space and Naval Warfare Systems Center (SSC) Pacific Institutional Animal Care and Use Committee.
The number of fish received as primary reinforcement depended on each animal's individual diet: APR received four fish per correct response, MUU received five, and IND received six. Any fish that the animals did not receive during their session (resulting from a failure to detect stimuli) were fed afterwards, and all animals received their entire diet each day regardless of performance on the experimental task. APR and MUU participated for 8 h during the day (typically 0600–1400) three to five days per week; they were fed minimally before each session and always received the remainder of their diet after each session over time periods similar to days without experimental sessions. IND participated for 8 h in the evening (typically 1500–2300) five days per week; he received his non-study diet during the day, before his session (0730–1400).
C. Echoic stimuli
Eight pairs of piezoelectric transducers (Reson TC4013) were positioned around the perimeter of a 9 × 9 m floating netted enclosure (Fig. 1). One transducer from each pair operated as a hydrophone (the click receiver) and the other operated as an underwater sound projector (the echo transmitter). The transducers were located at a depth of 0.8 m and azimuthal angles of 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°. A response paddle was placed 0.9 m from the 270°-transducer pair. An additional underwater sound projector (University Sound UW-30) was positioned near the 270° transducer pair and used to project tones for use as conditioned reinforcers.
(Color online) Illustration of the floating, netted enclosure showing the location of the eight transducer pairs (0, 45, 90, 135, 180, 225, 270, and 315 deg), as well as the locations of the response paddle and speaker.
Incoming signals from the eight hydrophone receivers were high-pass filtered (5 kHz, Krohn-Hite 3 C Series) and simultaneously sampled at 750 kHz with 16-bit resolution by a National Instruments PXI-7852R multifunction data acquisition board containing a Virtex-5 LX50 field-programmable gate array (FPGA). Digital signals from the eight receivers were continuously analyzed (in real time) for the presence of dolphin echolocation signals (“clicks”), based on a digital implementation of a threshold-crossing peak-detector. If a click was detected, the time and hydrophone position were recorded in a log file. The log files were analyzed offline to verify that the subjects continuously performed the echolocation task during each session.
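A threshold-crossing peak detector of the kind mentioned above can be sketched in software as follows. The actual FPGA implementation is not described in the text, so the threshold value, the dead time after each detection, and the function name are all assumptions made for illustration.

```python
def detect_clicks(samples, threshold, sample_rate_hz, dead_time_s=0.001):
    """Return peak times (s) for each threshold crossing in `samples`.

    After a crossing, the largest-magnitude sample within `dead_time_s`
    is taken as the click peak and the rest of that window is skipped.
    The dead time is a hypothetical parameter, not taken from the study.
    """
    dead_samples = max(1, round(dead_time_s * sample_rate_hz))
    click_times = []
    i, n = 0, len(samples)
    while i < n:
        if abs(samples[i]) >= threshold:
            end = min(i + dead_samples, n)
            peak = max(range(i, end), key=lambda k: abs(samples[k]))
            click_times.append(peak / sample_rate_hz)
            i = end  # skip the remainder of this click
        else:
            i += 1
    return click_times


# Synthetic example at the 750-kHz sampling rate given in the text.
fs = 750_000
signal = [0.0] * 3000
signal[100:103] = [0.2, 0.9, 0.5]   # first synthetic click
signal[2000] = -0.8                 # second synthetic click
print(detect_clicks(signal, threshold=0.1, sample_rate_hz=fs))
```

In a logging setup like the one described, each returned time would be written to the log together with the index of the hydrophone on which it was detected.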
An experimental trial was initiated by making a single echo transmitter “active,” meaning that phantom echoes were produced by the transmitter in response to any dolphin clicks detected on the corresponding receiver. Trials were not otherwise delineated to the dolphins; therefore, if the dolphins did not produce echolocation clicks, produced clicks with insufficient amplitude to trigger the active receiver, or produced clicks but failed to detect echoes, they would be unaware that a trial occurred. Only a single transmitter could be active at any given time. At the conclusion of a trial, the active transmitter was returned to an “inactive” state. Click detections on all receivers were always recorded in the log file, but echoes could only be generated during an experimental trial, and by the active transmitter. Toggling the active/inactive state for each echo transmitter was accomplished via a digital switching network implemented within the FPGA.
The time interval between trials was randomly chosen from a Poisson distribution, which, during experimental sessions, had a mean active rate of 0.0167 per 15-s time interval. This produced trial intervals that followed a Poisson distribution with a mean of 15 min; i.e., one of the PEG transmitters became active and responded to the dolphin's echolocation signals by emitting simulated target echoes every 15 min, on average. The majority of trial intervals were shorter than the mean, but much longer intervals occurred from time to time due to the strong positive skew of the Poisson distribution. The mean number of trials across all 8-h sessions was 31, with a minimum of 22 and maximum of 40.
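The relationship between the per-interval rate and the 15-min mean can be checked with a small simulation. The sketch below assumes that a trial starts in any given 15-s slot with probability 0.0167, independently of other slots; this is one reading of the description, not a confirmed detail of the authors' software. Under that assumption the waiting time is geometric, with a mean of 15 s / 0.0167, or about 15 min.

```python
import random

SLOT_S = 15.0      # length of each decision slot (from the text)
P_START = 0.0167   # chance that a trial starts in a slot (from the text)


def draw_interval_s(rng: random.Random) -> float:
    """Draw one inter-trial interval by stepping through 15-s slots."""
    slots = 1
    while rng.random() >= P_START:
        slots += 1
    return slots * SLOT_S


rng = random.Random(0)  # fixed seed so the sketch is repeatable
intervals = [draw_interval_s(rng) for _ in range(20000)]

mean_min = sum(intervals) / len(intervals) / 60.0
frac_below_mean = sum(x < 15 * 60 for x in intervals) / len(intervals)
print(f"mean interval: {mean_min:.1f} min")
print(f"fraction of intervals shorter than 15 min: {frac_below_mean:.2f}")
```

With this model, well over half of the simulated intervals fall below the mean, which matches the positive skew noted above.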
Phantom echoes were generated by convolving the click time-domain waveform with an impulse response function approximating that of a hollow, water-filled sphere. Echo waveforms were delayed 6.7 ms (approximating the two-way travel time for an object located at a range of ∼5 m), filtered (5–200 kHz, Krohn-Hite 3 C Series), amplified (Crest Pro 1500), and broadcast to the subject via the corresponding transmitter. Echo source levels were approximately −60 dB relative to the received click level. These echo source levels were low enough to prevent acoustic echoes from triggering nearby receivers and being counted as clicks, but resulted in echoes that were easily detectable by the dolphins throughout the experimental enclosure. In addition, click detection on the active receiver was suppressed during the time period when an echo was generated, to prevent the echo from triggering the active receiver, generating a new echo, and creating a feedback loop.
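The delay and level figures above follow from simple relations, sketched here. The sound speed of 1500 m/s is an assumed nominal value for seawater and is not given in the text; the function names are hypothetical.

```python
SOUND_SPEED_M_S = 1500.0  # assumed nominal seawater sound speed


def two_way_delay_s(range_m: float, c: float = SOUND_SPEED_M_S) -> float:
    """Round-trip travel time to a target at `range_m`."""
    return 2.0 * range_m / c


def simulated_range_m(delay_s: float, c: float = SOUND_SPEED_M_S) -> float:
    """Target range implied by a given echo delay."""
    return c * delay_s / 2.0


def db_to_amplitude_ratio(level_db: float) -> float:
    """Convert a relative level in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


print(f"delay for 5 m: {two_way_delay_s(5.0) * 1e3:.2f} ms")
print(f"range for 6.7 ms: {simulated_range_m(6.7e-3):.2f} m")
print(f"-60 dB as amplitude ratio: {db_to_amplitude_ratio(-60.0):.4f}")
```

Under the assumed sound speed, a 6.7-ms delay corresponds to a range slightly over 5 m, consistent with the approximate range stated above.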
D. Training
All three dolphins were initially trained using primary reinforcement (fish) to press a paddle with their rostrums upon hearing an underwater buzzer. Reception of echoes from the PEG with a single transducer pair was then temporally paired with the buzzer. Pressing the paddle triggered the terminal conditioned reinforcer comprised of two 40-kHz tones, each lasting 250 ms (including 20 ms rise/fall times) and separated by 50 ms. The conditioned reinforcer was immediately followed by the delivery of primary reinforcement by a trainer. The number of transducer pairs was then systematically increased in order to transfer the task onto the entire 360° arrangement. The amount of time between trials and the specific transducer pair that was active for each trial were then varied such that the dolphins could not predict when, or at which location, they would receive echoes. Therefore, the only way for the subject to successfully perform the task was to continuously echolocate around the enclosure perimeter, periodically ensonifying the click receiver at each azimuthal angle, for the duration of the training session. This was typically accomplished by the dolphins swimming in a circular pattern and echolocating beyond the enclosure perimeter. Once the dolphins were continuously ensonifying the hydrophones and reliably pressing the paddle upon echo presentation, the buzzer was faded out. Correct responses during this period were reinforced on a continuous basis (PRP = 100%). The conditioned reinforcer and primary reinforcement were withheld on false alarms, and the start of the next trial was delayed by an additional 15 s. After initial training, the mean time between trials and the total time the dolphins performed the echolocation task were gradually increased until session duration reached 8 h and trial intervals followed the Poisson distribution with a mean of 15 min. Once the dolphins met criteria of HR ≥ 90% and FR < 1 h−1 for an 8-h session, data collection began.
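The timing of the terminal conditioned reinforcer (two 250-ms tones with 20-ms rise/fall times, separated by 50 ms) can be illustrated by synthesizing the waveform. This is a sketch only: the 200-kHz output sampling rate and the linear ramp shape are assumptions, since neither is specified in the text.

```python
import math

FS_HZ = 200_000  # assumed output sampling rate (not given in the text)


def tone_burst(freq_hz, dur_s=0.250, ramp_s=0.020, fs_hz=FS_HZ):
    """One tone of total length `dur_s`, including onset and offset ramps."""
    n = round(dur_s * fs_hz)
    n_ramp = round(ramp_s * fs_hz)
    out = []
    for i in range(n):
        # Linear rise/fall envelope; the ramp shape is an assumption.
        env = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        out.append(env * math.sin(2.0 * math.pi * freq_hz * i / fs_hz))
    return out


def tone_pair(f1_hz, f2_hz, gap_s=0.050, fs_hz=FS_HZ):
    """Two tone bursts separated by a silent gap."""
    silence = [0.0] * round(gap_s * fs_hz)
    return tone_burst(f1_hz, fs_hz=fs_hz) + silence + tone_burst(f2_hz, fs_hz=fs_hz)


terminal = tone_pair(40_000, 40_000)  # two 40-kHz tones, as in the text
print(f"total duration: {len(terminal) / FS_HZ * 1e3:.0f} ms")
```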
E. Baseline and experimental conditions
Data were collected under a baseline condition followed by three experimental conditions. The characteristics of the baseline and experimental conditions are compared in Table I. The order of the experimental conditions was varied across the three dolphins.
Characteristics of the baseline and experimental conditions. The terminal acoustic conditioned reinforcer, after which the subject always received food reinforcement, was consistent throughout all conditions: two 40-kHz tones. The no feedback (NF) condition provided no acoustic conditioned reinforcer on non-food reinforced trials. The acoustic feedback (AF) condition always provided the same acoustic conditioned reinforcer on non-food reinforced trials: a 5-kHz tone followed by a 40-kHz tone. The structured acoustic feedback (SAF) condition provided progressive, structured feedback consisting of two tones: the frequency of the first tone ranged from 5 to 40 kHz, depending on the proximity to a food-reinforced response, and the frequency of the second was always 40 kHz.
Schedule | Primary reinforcement | Conditioned reinforcer (non-terminal) | Terminal conditioned reinforcer |
---|---|---|---|
Baseline | Continuous | None | 40 kHz/40 kHz |
No feedback (NF) | Variable | None | 40 kHz/40 kHz |
Acoustic feedback (AF) | Variable | 5 kHz/40 kHz | 40 kHz/40 kHz |
Structured acoustic feedback (SAF) | Variable | 5–40 kHz/40 kHz | 40 kHz/40 kHz |
During baseline data collection, correct detections were reinforced on a continuous basis (i.e., PRP = 100%) and all hits were followed by the terminal conditioned reinforcer. Once a dolphin met the performance criteria (two consecutive 8-h sessions with HR ≥ 90% and FR < 1 h−1), baseline testing ceased and testing began under the experimental conditions.
During experimental testing, VR schedules were utilized so that the PRP was less than 100%. The PRP began at 50%, and was sequentially reduced if the subject met the performance criteria; PRP values of 50%, 25%, 12%, 6%, and 0% were used during the study. The PRP values of 50%, 25%, 12%, and 6% were essentially equivalent to VR schedules of VR2, VR4, VR8, VR16 (i.e., the number of correct responses required for primary reinforcement). Since there were, on average, 31 trials per 8-h session, the 0% PRP value approximated a VR31 schedule. For correct responses that were not reinforced, the amount of fish reward that the subject would normally receive (four to six fish, depending on the subject) accumulated until the next reinforced response, at which time the animal received its entire “bank” of primary reinforcement. This resulted in a greater amount of food delivered the more time that elapsed between reinforced responses. If the subject did not meet performance criteria after six sessions at any PRP level, lower PRP values were not tested for that condition and the dolphin moved on to the next experimental condition.
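The accumulation of unreinforced fish into a "bank" can be illustrated with a short simulation. The sketch assumes that each correct response is reinforced by an independent random draw with probability equal to the PRP, and that at 0% PRP the only reinforced response is the last one of the session; the text does not state how reinforced responses were actually selected, so the selection mechanism and all names here are hypothetical.

```python
import random


def mean_responses_per_reward(prp: float) -> float:
    """Average number of correct responses per food delivery (1 / PRP)."""
    return 1.0 / prp


def simulate_session(n_correct, prp, fish_per_response, rng):
    """Return (fish delivered after each correct response, fish left in bank)."""
    bank = 0
    deliveries = []
    for k in range(n_correct):
        bank += fish_per_response
        is_last = k == n_correct - 1
        # At PRP = 0, food is delivered only at the end of the session.
        reinforced = is_last if prp == 0.0 else rng.random() < prp
        if reinforced:
            deliveries.append(bank)  # the whole bank is paid out
            bank = 0
        else:
            deliveries.append(0)
    return deliveries, bank


rng = random.Random(1)
for prp in (0.50, 0.25, 0.12, 0.06):
    print(f"PRP {prp:.0%}: about {mean_responses_per_reward(prp):.1f} responses per reward")

fish, leftover = simulate_session(31, 0.0, 4, rng)
print(f"0% PRP, 31 correct responses, 4 fish each: final payout = {fish[-1]} fish")
```

The reciprocal of the PRP gives 2, 4, about 8.3, and about 16.7 responses per reward, which is why the 12% and 6% levels are only approximately VR8 and VR16.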
The three experimental conditions all utilized VR schedules as described above, but differed in terms of the presence and character of acoustic feedback provided to the dolphins: (1) no acoustic feedback (NF condition), (2) acoustic feedback (AF condition), and (3) structured, acoustic feedback (SAF condition). In the NF condition, no acoustic feedback was provided for non-food reinforced correct paddle presses. Food-reinforced correct responses were followed by the terminal conditioned reinforcer and primary reinforcer. In the AF condition, an intermediate conditioned reinforcer was provided after each correct non-food reinforced response. The intermediate conditioned reinforcer was the same for all non-food reinforced trials: a 5-kHz tone followed by a 40-kHz tone (each lasting 250 ms and separated by 50 ms). The terminal acoustic reinforcer immediately preceding primary reinforcement was identical to that used during the baseline and NF conditions (two 40-kHz tones). Thus two types of acoustic feedback as to performance on the task were provided on correct responses. In the SAF condition, an intermediate acoustic conditioned reinforcer was also provided after each correct non-food reinforced response; however, the frequency of the first tone of the pair varied between 5 and 40 kHz in 1/16-octave increments, depending on proximity to a primary reinforcer. Thus, the frequencies of the two tones became closer leading up to the food-reinforced response, upon which the terminal conditioned reinforcer was presented and primary reinforcement was delivered. Following primary reinforcement, the intermediate conditioned reinforcer for the next correct response was reset, depending on proximity to the next upcoming food reinforced response. This condition provided additional information indicating progress regarding the VR schedule; however, it was not known whether this type of information would lead to increased performance relative to the AF or NF conditions.
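The set of possible first-tone frequencies in the SAF condition follows from the stated range and step size: 5 to 40 kHz spans three octaves, so 1/16-octave increments give 48 steps (49 frequencies). The sketch below builds that ladder. How a given proximity to the food-reinforced response was mapped onto a step is not described in the text, so the linear mapping in `first_tone_khz` is purely hypothetical.

```python
import math

F_LOW_KHZ = 5.0
F_HIGH_KHZ = 40.0
STEPS_PER_OCTAVE = 16
N_STEPS = round(STEPS_PER_OCTAVE * math.log2(F_HIGH_KHZ / F_LOW_KHZ))  # 48


def ladder_khz():
    """All first-tone frequencies from 5 to 40 kHz in 1/16-octave steps."""
    return [F_LOW_KHZ * 2.0 ** (k / STEPS_PER_OCTAVE) for k in range(N_STEPS + 1)]


def first_tone_khz(responses_remaining: int, schedule_length: int) -> float:
    """Hypothetical linear mapping from schedule progress to a ladder step.

    `responses_remaining` counts correct responses still needed before food;
    the first tone rises toward 40 kHz as that count approaches zero.
    """
    progress = 1.0 - responses_remaining / schedule_length
    step = round(progress * N_STEPS)
    return F_LOW_KHZ * 2.0 ** (step / STEPS_PER_OCTAVE)


freqs = ladder_khz()
print(f"{len(freqs)} frequencies from {freqs[0]:.2f} to {freqs[-1]:.2f} kHz")
print(f"halfway through a 16-response schedule: {first_tone_khz(8, 16):.2f} kHz")
```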
III. RESULTS
All three dolphins successfully performed the echolocation vigilance task continuously for 8 h in the baseline condition. After meeting baseline criteria (HR ≥ 90% and FR < 1 h−1) for five consecutive sessions with 100% PRP, each animal participated in the three experimental conditions utilizing VR schedules. Information on the lowest PRP achieved by each dolphin on each condition is summarized in Table II.
Lowest primary reinforcement probability (PRP) achieved by each dolphin. All experimental conditions began with a 50% PRP. If performance criteria (two consecutive sessions with HR ≥ 90% and FR < 1 h−1) were met within six sessions, PRP decreased first to 25%, then 12%, 6%, and finally 0%, during which the animals performed the echolocation task for 8 h before receiving food reinforcement.
Condition | APR | MUU | IND |
---|---|---|---|
No feedback (NF) | 25% | 6% | 12% |
Acoustic feedback (AF) | 0% | 0% | 50% |
Structured acoustic feedback (SAF) | 0% | 0% | 50% |
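The advancement rule described in the caption of Table II can be expressed as a short procedure. This is a sketch under the assumption that "lowest PRP achieved" means the lowest PRP level at which a dolphin was tested; the function names and the example session outcomes are hypothetical and are not the study data.

```python
PRP_LEVELS = [50, 25, 12, 6, 0]  # percent, in the order tested
MAX_SESSIONS = 6                 # sessions allowed at each PRP level


def passed_level(session_ok):
    """True if two consecutive sessions met criteria within six sessions."""
    window = session_ok[:MAX_SESSIONS]
    return any(a and b for a, b in zip(window, window[1:]))


def lowest_prp_tested(results):
    """`results` maps PRP (%) to a list of per-session pass/fail booleans."""
    reached = PRP_LEVELS[0]
    for level in PRP_LEVELS:
        reached = level
        if not passed_level(results.get(level, [])):
            break  # criteria not met: no lower PRP is tested
    return reached


# Hypothetical outcomes: criteria met at 50% and 25%, never at 12%.
example = {50: [True, True], 25: [False, True, True], 12: [False] * 6}
print(f"lowest PRP tested: {lowest_prp_tested(example)}%")
```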
For the dolphin APR, the experimental conditions were conducted in the sequence (1) AF, (2) SAF, and (3) NF. APR performed with HR = 100% for all but three sessions [Fig. 2(a)], for which the HR = 97%. In addition, the FR was < 1 h−1 for the conditions with acoustic feedback (AF, SAF), and she performed a total of 10 sessions with no false alarms. For all PRPs in the AF and SAF conditions, APR met performance criteria in the fewest sessions possible, spending just two days at each PRP condition before moving on. Furthermore, at a PRP of 0% in the AF and SAF conditions, performance for three 8-h sessions was at 100% correct without receiving any primary reinforcement until the last trial of the session, and APR performed one session missing only one trial the entire 8 h. While her HR met criteria in the NF condition, FR increased to > 1 h−1 at 25% PRP, with her highest FR > 4 h−1. Therefore, APR did not progress past 25% PRP in the NF condition.
Hit rate (HR, open symbols, left ordinate) and false alarm rate (FR, filled symbols, right ordinate) for the dolphins (a) APR, (b) MUU, and (c) IND across the no feedback (NF), acoustic feedback (AF), and structured acoustic feedback (SAF) experimental conditions. Each symbol represents the HR or FR for a single 8-h session. Each panel shows the performance for a specific experimental condition, arranged from left to right in the order in which the conditions were tested for each subject. The dashed lines show the performance criteria of HR = 90% and FR = 1 h−1 (i.e., one false alarm per hour).
Testing with the dolphin MUU followed the order (1) SAF, (2) NF, (3) AF. In both the AF and SAF conditions, MUU advanced to 0% PRP and successfully performed an 8-h session before receiving food reinforcement [Fig. 2(b)]. In the SAF condition, performance criteria were met at 0% PRP in two sessions; however, these sessions were not consecutive. In the AF condition, MUU only had the opportunity to participate in one 0% PRP session (performance was HR = 100%, FR = 0.12 h−1) before giving birth to a live calf, after which she was removed from the study. Although MUU's HR remained high during the NF condition, the FR increased to > 1 h−1 at 6% PRP, with her highest FR ∼ 4 h−1. Therefore, testing did not progress past 6% in the NF condition.
Testing with the dolphin IND followed the order (1) NF, (2) AF, (3) SAF. IND met criteria at 50% PRP and 25% PRP in the NF condition; however, criteria were not met at 12% PRP and therefore testing did not advance further. In addition, during both the AF and SAF conditions, IND did not meet criteria at 50% PRP and therefore testing did not progress further during these conditions. IND had FR > 1 h−1 in all sessions except the first four in the NF condition (the first four experimental sessions), with the highest FR > 8 h−1. In addition, there were no sessions during which IND did not commit a false alarm.
Examples of echolocation activity versus time for the last baseline session and the session(s) with the lowest PRP reached by each animal are shown in Fig. 3. For all sessions, click counts for all hydrophones were > 0 in all 5-min periods, indicating that the three dolphins continuously monitored the enclosure perimeter throughout the 8-h sessions. This was typically accomplished by each dolphin continuously swimming in a circular pattern and directing echolocation clicks at each hydrophone in turn. Minimum click rates during the first sessions at 0% PRP in the AF and SAF conditions were 0.30 and 0.94 clicks/s for APR and MUU, respectively. Echolocation rates for IND were always lower than those for APR and MUU, even during baseline sessions with 100% PRP. The minimum click rate for IND during the NF condition with 12% PRP was 0.11 clicks/s. In addition, IND had the highest number of false alarms, the majority of which occurred soon after a correct response.
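The continuity check described above (at least one click on every hydrophone in every 5-min period) can be sketched as a simple binning of the click log. The log format used here, a list of (time in seconds, hydrophone index) pairs, is an assumption, as is computing the minimum click rate from the total count across hydrophones in each 5-min bin.

```python
import math


def bin_counts(click_log, session_s, n_hydrophones=8, bin_s=300.0):
    """Count clicks per hydrophone in consecutive bins of `bin_s` seconds."""
    n_bins = math.ceil(session_s / bin_s)
    counts = [[0] * n_bins for _ in range(n_hydrophones)]
    for t, hydrophone in click_log:
        if 0 <= t < session_s:
            counts[hydrophone][int(t // bin_s)] += 1
    return counts


def continuously_monitored(counts):
    """True if every hydrophone logged at least one click in every bin."""
    return all(c > 0 for row in counts for c in row)


def min_click_rate(counts, bin_s=300.0):
    """Lowest click rate (clicks/s) over all bins, summed across hydrophones."""
    totals = [sum(col) for col in zip(*counts)]
    return min(totals) / bin_s


# Hypothetical 10-min log for two hydrophones.
log = [(10.0, 0), (20.0, 1), (310.0, 0), (320.0, 1)]
counts = bin_counts(log, session_s=600.0, n_hydrophones=2)
print(continuously_monitored(counts), f"{min_click_rate(counts):.4f} clicks/s")
```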
Examples of echolocation activity for dolphins (a) APR, (b) MUU, and (c) IND across 8-h sessions. Data are shown for the last baseline session and the session(s) with the lowest PRP reached by each animal: 0% PRP on the AF and SAF conditions for APR and MUU and 12% PRP on the NF condition for IND. Open circles indicate conditioned reinforcement events, stars represent delivery of primary reinforcement, and asterisks indicate false alarms.
IV. DISCUSSION
Baseline data demonstrated that all three dolphins were capable of maintaining high performance (HR ≥ 90%, FR < 1 h−1) in the echolocation vigilance task for up to 8 h with continuous reinforcement. These data provide additional evidence that dolphins can remain vigilant and monitor their surroundings via echolocation for extended periods of time without significant performance degradation (Ridgway et al., 2006; Branstetter et al., 2012). Dolphins have been shown to have marked inter-hemispheric brain wave asymmetries and display evidence of “uni-hemispheric sleep,” where electroencephalograms show slow wave sleep in one hemisphere and wakefulness in the other (e.g., Mukhametov et al., 1977; Ridgway, 2002). Uni-hemispheric sleep is generally thought to have evolved to facilitate breathing at the surface but may also aid in long-term, continuous vigilance such as that demonstrated in the present study (Ridgway et al., 2006; Branstetter et al., 2012).
When the continuous reinforcement schedule was replaced with variable reinforcement schedules, two of the three dolphins (APR and MUU) successfully performed the task for 8 h without primary reinforcement (i.e., 0% PRP) when acoustic feedback was provided after each correct (non-reinforced) response (i.e., in the AF and SAF conditions). This demonstrates the feasibility of trained dolphins participating in echolocation-based behavioral tasks for long time periods when continuous reinforcement is not practical; e.g., dolphins performing biosonar tasks in open ocean environments at some distance from human handlers. The dolphins APR and MUU performed at 0% PRP under the AF and SAF conditions, but not the NF condition. Furthermore, for all dolphins, FR stayed < 1 h−1 until they participated in the NF condition with PRP ≤ 25%. These data suggest that secondary reinforcement may be important in extending the amount of time a dolphin will participate in a task when timely delivery of food reward is difficult, or extending the time over which animals are motivated and engaged in reinforcing activities without being limited by the amount of food available. Although the dolphins did not appear to attend to the structured feedback (relative to unstructured) in terms of enhanced echolocation behavior or responses, a longer period of testing may have facilitated learning of the structured acoustic feedback and a change in biosonar behavioral patterns.
The dolphin IND was the only dolphin to fail to progress beyond 12% PRP for all conditions and to exhibit poor performance in the echolocation vigilance task during the AF and SAF conditions (i.e., did not advance below 50% PRP). Since IND was the only dolphin to participate in the NF condition first, it is possible that receiving no acoustic feedback on the task as PRP decreased initially reduced the effectiveness of the feedback in the following conditions. It is also possible that receiving no feedback in response to correct detections resulted in extinction of the paddle press behavior (suggested by the decline in hit rate across sessions during the NF condition, see below), a lack of motivation, or confusion about the task, as he learned that responses did not always result in reinforcement (whether conditioned or primary). However, definitive conclusions regarding differences between performance under the different experimental schedules cannot be made from the present data, in part because of the small number of subjects and individual differences between the dolphins (e.g., the generally low click rates employed by the dolphin IND). In many cases, failure to meet performance criteria was due to high FR, rather than low HR. Since echoic signals possessed salient amplitudes and were always presented at the same simulated range, it is unlikely that false alarms were due to the difficulty of the detection task itself. False alarms for IND (who generally had the highest FR) tended to closely follow correct detections for which he received no reinforcement. This suggests that many false alarms likely represented “probing” behavior, particularly in the NF condition, where the dolphin pressed the paddle again after receiving no feedback on the initial response. This makes interpretation of the high FRs difficult, as they may not have represented “true” false alarms but rather reflected some confusion on the dolphin's part when the PRP was ≤ 25%.
For IND, the HR progressively declined over the course of testing during the NF condition, potentially indicating extinction of the paddle press behavior with the decreasing occurrence of primary (food) reward. In contrast, the biosonar behavior itself did not show evidence of extinction; i.e., the animals continued to produce echolocation clicks even with no acoustic feedback and large time intervals between food reward. It is possible that the occasional presence of the phantom echoes served as a bridging stimulus to maintain the echolocation behavior, and the time intervals between trials were not long enough for extinction to occur. It is also possible that extinction of the biosonar behavior did not occur because the inherent cost to the animal of click production may be low and thus click production would be resistant to extinction. This is supported by recent findings that the metabolic cost of echolocation click production in dolphins is low (Noren et al., 2017).
Some modifications could be made to the study design to provide additional information about the effects of feedback. In the current study, the dolphins moved directly from one condition to the next; returning to the baseline condition (and performance of HR > 90% and FR < 1 h−1) before proceeding to a different condition may result in improved performance on the next condition, particularly after being exposed to the NF condition. Also, a greater number of subjects, longer session durations, and a more rigorous, counterbalanced design may help to parse out any difference between performance under the AF and SAF conditions.
V. CONCLUSIONS
Trained bottlenose dolphins can successfully perform echolocation detection tasks for at least 8 h before receiving primary reinforcement (i.e., food reward), provided that secondary reinforcement (feedback in the absence of food reward) is presented throughout the task. The findings demonstrate the feasibility of dolphins participating in biosonar tasks for long time periods when continuous reinforcement is not practical and suggest that secondary reinforcement may be important in maintaining performance when timely delivery of food reward is difficult.
ACKNOWLEDGMENTS
The authors thank Megan Tormey, Jennifer Stanley, Kelly Dillon, Stephanie Barkmeyer, Shannon Kneifl, and Megan Chase for animal training and running of sessions. The authors also thank research assistants Mark Booth, Jenny Heieck, Kelsey Kaplanek, Kiersten Meader, Lyndsey Nardone, and Brittney Tomaszewski. Figure 1 was prepared by Andrew Cardes. Financial support was provided by the Defense Advanced Research Projects Agency.