The harmonic structure of sounds is an important grouping cue in auditory scene analysis. The ability of ferrets to detect mistuned harmonics was measured using a go/no-go task paradigm. Psychometric functions plotting sensitivity as a function of degree of mistuning were used to evaluate behavioral performance using signal detection theory. The mean (± standard error of the mean) threshold for mistuning detection was 0.8 ± 0.1 Hz, with sensitivity indices and reaction times depending on the degree of mistuning. These data provide a foundation for investigating the neural basis of complex sound perception in ferrets, an increasingly used animal model in auditory research.
The harmonic structure of many natural sounds provides an important grouping cue. Thus, harmonic complex tones (HCTs), comprising tones that are integer multiples of the fundamental frequency, F0, are usually heard as a whole entity rather than as individual sounds with different frequencies. Inharmonic complex tones comprise different frequency components or partials that are not all harmonically related. In human listeners, mistuned partials are heard to “pop out” when their frequencies are close to resolved low harmonics, and as rough “beats” when they are close to unresolved higher harmonics (Moore et al., 1985; Hartmann et al., 1990).
Mistuning detection has also been measured in several animal species, including gerbils (Klinge and Klump, 2009, 2010), zebra finches, and budgerigars (Lohr and Dooling, 1998). Interestingly, these species showed substantially lower thresholds than humans, raising questions over whether the same neural mechanisms are involved. Here, we implemented a go/no-go task design to measure mistuning detection in ferrets, which are commonly used in behavioral investigations of both spatial (e.g., Bajo et al., 2010) and non-spatial (Kalluri et al., 2008; Walker et al., 2009; Bizley et al., 2013; Mill et al., 2014) aspects of hearing. In particular, their sensitivity to low-frequency sound has led to ferrets being used to study various aspects of pitch perception (Kalluri et al., 2008; Walker et al., 2009; Bizley et al., 2013). Moreover, the growing availability of methods for recording (Bizley et al., 2013) and manipulating (Bajo et al., 2010) neural activity during behavioral testing makes this species particularly well suited for addressing the neural processing of complex sounds. Consequently, ferrets should be a good model for investigating mistuning detection.
Six adult female ferrets from Marshall BioResources (North Rose, NY) were used. All experimental procedures were approved by the local ethical review committee and carried out under license from the UK Home Office.
Animals were trained in a custom-built test chamber equipped with one loudspeaker (Visaton FRS8, Crewe, UK) and two poke holes, each with an infrared sensor and spout for water delivery. The behavioral task, data acquisition, and stimulus generation were automated and controlled by a real-time signal processor with a sampling rate of 25 kHz (RP2, Tucker-Davis Technologies, Alachua, FL) using custom-written scripts in MATLAB software (MathWorks, Natick, MA).
The reference stimulus was a HCT of 350 ms duration, comprising 16 harmonics in sine phase with a 400 Hz F0, ramped by a 25 ms Hanning window. The same HCT with its fourth harmonic shifted upward in frequency was used as a target stimulus. The degree of mistuning ranged from 0.1 to 192 Hz (23 fixed values were chosen based on a logarithmic separation within this range).
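The stimulus description above can be illustrated with a minimal sketch. It assumes equal-amplitude harmonics, 25 ms raised-cosine (Hanning) ramps at both onset and offset, amplitude normalization by the number of harmonics, and a geometric progression for the 23 mistuning values; none of these details is confirmed by the text, and the function names are hypothetical. Level calibration to dB SPL is not modeled.

```python
import math

FS = 25_000        # sampling rate in Hz, as stated for the signal processor
F0 = 400.0         # fundamental frequency in Hz
N_HARMONICS = 16   # number of harmonics in the complex tone
DURATION = 0.350   # stimulus duration in seconds
RAMP = 0.025       # assumed: 25 ms ramp at onset and at offset


def complex_tone(mistuning_hz=0.0, mistuned_harmonic=4):
    """Return samples of a sine-phase complex tone.

    With mistuning_hz == 0 this is the harmonic reference tone; otherwise
    the chosen harmonic is shifted upward by mistuning_hz.
    """
    n = round(FS * DURATION)
    n_ramp = round(FS * RAMP)
    freqs = [
        F0 * h + (mistuning_hz if h == mistuned_harmonic else 0.0)
        for h in range(1, N_HARMONICS + 1)
    ]
    samples = []
    for i in range(n):
        t = i / FS
        # Sum of sine-phase partials, normalized so that |s| <= 1
        s = sum(math.sin(2 * math.pi * f * t) for f in freqs) / N_HARMONICS
        # Raised-cosine gain during the onset and offset ramps
        if i < n_ramp:
            gain = 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            gain = 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            gain = 1.0
        samples.append(gain * s)
    return samples


def mistuning_values(lo=0.1, hi=192.0, count=23):
    """Assumed logarithmic spacing: a geometric progression from lo to hi."""
    ratio = (hi / lo) ** (1 / (count - 1))
    return [lo * ratio ** k for k in range(count)]
```

Calling `complex_tone()` gives the reference tone and `complex_tone(mistuning_hz=m)` a target for each `m` in `mistuning_values()`.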
The animals were trained by positive reinforcement using water as reward in a go/no-go task, spread over blocks of either 14 or 5 days interspersed with 3 or 2 days off, respectively. During training blocks, animals had access to water only during the twice daily training sessions, but received a supplement if the volume of water provided during testing was <60 mL/kg. Body weight was measured daily and not allowed to drop by >10%.
To assist the animals during the initial procedural training phase, reference tones were presented at 30 dB sound pressure level (SPL), whereas target tones with the fourth harmonic mistuned by 200 Hz were delivered at an overall level of 70 dB SPL. This intensity difference was gradually reduced as the animals' performance improved, and all stimuli were then presented at an overall level of 70 dB SPL during testing.
The probability of no-go trials was 10% or 20%. To ensure that the 23 mistuned tones were evenly presented while maintaining task difficulty across sessions, they were subdivided into four blocks, each containing seven different mistuned tones equally distributed within the block. To reduce the predictability of the timing of the target tones and prevent stereotyped responses, each trial consisted of a variable number of stimuli (each 350 ms in duration, separated by 200 ms gaps), which was randomized from trial to trial. No-go trials consisted of a series of identical reference tones (3–7), for which the animal had to stay at the trigger spout in order to receive a reward. Go trials comprised two identical mistuned (target) tones preceded by a variable number (2–6) of reference tones. On hearing the mistuned tone, the animal had to move to the other (reward) spout in order to get a reward.
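The trial structure just described can be sketched as a small generator. This is an illustration only: it assumes the number of reference tones is drawn uniformly from the stated ranges and that the target mistuning is drawn uniformly from whichever set is active in the current block. The text states neither, and `make_trial` and its arguments are hypothetical names.

```python
import random


def make_trial(rng, mistunings, p_nogo=0.2):
    """Build one trial as a dict with its type, stimulus sequence and mistuning.

    No-go trials: 3-7 identical reference tones.
    Go trials: 2-6 reference tones followed by two identical target tones.
    """
    if rng.random() < p_nogo:
        n_ref = rng.randint(3, 7)  # assumed uniform over the stated range
        return {"type": "nogo", "stimuli": ["ref"] * n_ref, "mistuning": None}
    n_ref = rng.randint(2, 6)      # assumed uniform over the stated range
    delta = rng.choice(mistunings)  # assumed uniform choice within a block
    return {
        "type": "go",
        "stimuli": ["ref"] * n_ref + ["target"] * 2,
        "mistuning": delta,
    }


# Example: a reproducible session of 10 trials with an illustrative block
rng = random.Random(0)
session = [make_trial(rng, mistunings=[0.5, 1.0, 2.0, 4.0]) for _ in range(10)]
```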
After misses (staying at the trigger spout during go trials) and early releases (leaving the trigger spout during reference tone presentation), a burst of broadband noise was played to provide feedback and signal the lack of reward. Following a miss, a 1-s time-out was given before the animal was able to trigger the next trial. In contrast, a 12-s time-out was used after early releases to reinforce the performance of the animals on trials that required a long waiting time. To reduce the number of early releases, for the last four animals, we increased the proportion of no-go trials from 10% to 20% and introduced “correction trials” with the same stimulus composition after early releases. For first and second correction trials, the time-out was shortened to 8 or 5 s, respectively, and the stimulus composition was reset after two consecutive correction trials. Correction trials were not included in the data analysis.
For every session, performance was assessed by the overall correct response rate for all trials, the hit rate (number of correct responses on go trials / total number of go trials), and the false alarm (FA) rate (number of incorrect responses on no-go trials / total number of no-go trials). Data were accumulated over sessions that had FA rates <0.6, a criterion adopted as a measure of animal motivation during testing. Between 749 and 2073 trials in total were analyzed from each animal, and the overall mean FA rate was 0.4. Using signal detection theory, a sensitivity index (d′) was calculated for each mistuned stimulus by subtracting the z-transformed FA rate from the z-transformed hit rate. Psychometric functions were derived by fitting d′ values versus the degree of mistuning using unconstrained nonlinear optimization to a cumulative Gaussian distribution. Response bias was estimated by calculating λcenter, a measure of displacement of the decision criterion:
$$\lambda_{\text{center}} = -\tfrac{1}{2}\,\bigl[z(\text{hit rate}) + z(\text{FA rate})\bigr] \tag{1}$$
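As an illustration of the signal detection measures, the following sketch computes d′ and λcenter from a hit rate and an FA rate. It assumes the conventional definition of λcenter as minus half the sum of the two z-scores. The optional 1/(2N) clamp for rates of exactly 0 or 1 is a common convention added here as an assumption, not a procedure described in the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-transform


def z(rate, n_trials=None):
    """z-transform a rate (inverse of the standard normal CDF).

    If n_trials is given, rates of 0 or 1 are clamped to 1/(2N) and
    1 - 1/(2N) so the result stays finite (an assumed convention).
    """
    if n_trials is not None:
        rate = min(max(rate, 1 / (2 * n_trials)), 1 - 1 / (2 * n_trials))
    return _STD_NORMAL.inv_cdf(rate)


def d_prime(hit_rate, fa_rate, n_go=None, n_nogo=None):
    """Sensitivity index: z(hit rate) minus z(FA rate)."""
    return z(hit_rate, n_go) - z(fa_rate, n_nogo)


def lambda_center(hit_rate, fa_rate, n_go=None, n_nogo=None):
    """Assumed criterion measure; negative values indicate a bias toward go."""
    return -0.5 * (z(hit_rate, n_go) + z(fa_rate, n_nogo))
```

For example, a hit rate of 0.8 with an FA rate of 0.4 gives d′ ≈ 1.09 and λcenter ≈ −0.29, that is, near-threshold sensitivity with a bias toward go responses.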
Reaction time was calculated from go trials on which correct responses were made as the time from the first target tone onset until the animal ceased contact with the trigger spout. Mistuning reaction times were fitted using maximum likelihood estimation of an ex-Gaussian distribution (exponentially modified Gaussian, EMG), where μ is the estimated mean of the Gaussian component, σ is the estimated standard deviation of the Gaussian component, and τ is the estimated mean of the exponential component (Lacouture and Cousineau, 2008). Because reaction times showed two peaks corresponding to the two target tones presented, a double ex-Gaussian distribution (double-EMG) was defined by combining two EMGs linearly using a probability α to represent the relative contribution of the two distributions corresponding to the two target tones in a go trial. The best fit model was then obtained based on the corrected Akaike's information criterion.
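The reaction time model can be sketched as follows. The density is the standard ex-Gaussian form; the double-EMG mixture assumes that α weights the component associated with the first target tone and (1 − α) the second, which the text does not state explicitly. The function names are hypothetical, and the optimizer used to minimize the negative log-likelihood is omitted.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def emg_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density: Gaussian(mu, sigma) convolved with Exp(mean tau)."""
    exponent = mu / tau + sigma ** 2 / (2 * tau ** 2) - x / tau
    return math.exp(exponent) * _PHI((x - mu - sigma ** 2 / tau) / sigma) / tau


def double_emg_pdf(x, alpha, first, second):
    """Mixture of two ex-Gaussians; first and second are (mu, sigma, tau)."""
    return alpha * emg_pdf(x, *first) + (1 - alpha) * emg_pdf(x, *second)


def neg_log_likelihood(rts, alpha, first, second):
    """Objective to minimize for maximum likelihood estimation."""
    return -sum(math.log(double_emg_pdf(rt, alpha, first, second)) for rt in rts)


def aicc(nll, k, n):
    """Corrected Akaike information criterion for k parameters, n observations."""
    return 2 * k + 2 * nll + (2 * k * (k + 1)) / (n - k - 1)
```

Under these assumptions a single EMG has k = 3 free parameters and the double-EMG has k = 7 (two sets of μ, σ, τ plus α); the model with the lower `aicc` value would be preferred.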
3.1 Mistuning detection sensitivity
We found that ferrets are able to discriminate mistuned complex tones from HCTs and that the relationship between the sensitivity index, d′, and the degree of mistuning followed a cumulative Gaussian distribution [Fig. 1(A)]. The maximal performance (indicated by the asymptote in the d′ values) was obtained for mistuning values >3 Hz, with a mean (± standard error of the mean) threshold of 0.8 ± 0.1 Hz obtained from a criterion of d′ = 1. The psychometric functions of individual animals showed similar trends, and their thresholds varied between 0.4 and 1.2 Hz.
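To make the threshold definition concrete, the sketch below inverts a cumulative Gaussian psychometric function at the d′ = 1 criterion. The parameterization is an assumption: d′ is modeled as an asymptote `d_max` times a cumulative Gaussian on a log10 mistuning axis, with hypothetical parameters `midpoint_hz` and `spread`. The text does not specify the exact functional form that was fitted, and the numbers in the usage note are illustrative placeholders, not fitted values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def psychometric(mistuning_hz, d_max, midpoint_hz, spread):
    """Assumed form: d' = d_max * Phi((log10(x) - log10(midpoint)) / spread)."""
    zscore = (math.log10(mistuning_hz) - math.log10(midpoint_hz)) / spread
    return d_max * _STD_NORMAL.cdf(zscore)


def threshold(d_max, midpoint_hz, spread, criterion=1.0):
    """Mistuning (Hz) at which the fitted function reaches the d' criterion."""
    if not 0 < criterion < d_max:
        raise ValueError("criterion must lie between 0 and the asymptote d_max")
    zscore = _STD_NORMAL.inv_cdf(criterion / d_max)
    return midpoint_hz * 10 ** (spread * zscore)
```

With placeholder parameters such as `d_max=2.5`, `midpoint_hz=1.0`, and `spread=0.4`, `threshold(...)` returns the mistuning at which `psychometric(...)` equals 1; a threshold exists only if the asymptote exceeds the criterion.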
Response bias was examined by plotting λcenter as a function of mistuning [Fig. 1(B)]. We found that λcenter for each mistuning value above the threshold of ∼0.8 Hz was negative, indicating that the animals were biased to make go responses. λcenter values depend on both hit and FA rates [see Eq. (1)]. Because a single FA rate was computed from all no-go trials for each animal, changes in λcenter values [Fig. 1(B)] reflect variations in hit rate with the degree of mistuning. This can account for why the λcenter values changed from negative to positive around the threshold, where the hit rate was low.
3.2 Sensitivity depends on the degree of mistuning and is independent of waiting time
To investigate how the degree of mistuning and the waiting time at the trigger spout affected the animals' performance, hit rate, FA rate, d′ values, and λcenter values were calculated for each trial duration. The 23 different mistuned complex tones were divided into two groups according to the shape of the psychometric functions, which indicated that the animals' maximal performance was achieved for mistuning values >3 Hz [Fig. 1(A)], while balancing the number of trials in each case (small mistuning values, <3 Hz; large mistuning values, 3 to 8 Hz). To examine the effect of waiting time in go and no-go trials in parallel, trial length was divided into short, medium, and long groups: go trials composed of two, three, or four reference tones plus two target tones, and no-go trials with four, five, or six reference tones (waiting time ∼2.3, ∼2.85, and ∼3.4 s, respectively).
Hit and FA rates increased with trial length and were significantly correlated with it within these groups (for each group R > 0.63, p < 0.01). d′ values varied with the degree of mistuning [two-way analysis of variance (ANOVA): F(1, 30) = 19.7, p = 0.0001], but not with trial length, and no interaction between these parameters was detected [trial length: F(2, 30) = 1.29, p = 0.29; degree of mistuning × trial length: F(2, 30) = 0.5, p = 0.6]. However, λcenter values varied with both the degree of mistuning [two-way ANOVA: F(1, 30) = 15.4, p = 0.0005] and trial length [F(2, 30) = 63.1, p = 2 × 10⁻¹¹], and no interaction between these parameters was detected [degree of mistuning × trial length: F(2, 30) = 0.4, p = 0.7]. This suggests that while the animals' sensitivity to mistuning was independent of trial length, their decision criterion changed, biasing them toward go responses on longer trials.
3.3 Reaction time is dependent on the degree of mistuning
The histograms of mistuning reaction time were not normally distributed and showed two peaks corresponding to the two target tones presented [Fig. 2(A)]. In previous studies, a non-normal distribution of reaction times has been fitted using an ex-Gaussian function. Here, two ex-Gaussian distributions (double-EMG) were combined linearly, with the probability of responding representing the contribution of each.
Fitted curves showed that animals were generally more likely to respond to the first target tone than to the second [compare the height of the two peaks in each curve in Fig. 2(A)]. In the small mistuning group (<3 Hz), however, the response to the first target tone was delayed in all animals and showed a broader distribution at the second target tone compared to the large mistuning group (3–8 Hz) [Fig. 2(A)]. Significant differences in the parameters μ1 and σ1 from the double-EMG fitting were obtained between trials with different degrees of mistuning (paired t-test μ1: t(5) = 4.39, p = 0.007; σ1: t(5) = 5.30, p = 0.003), confirming that the animals took longer to respond as the degree of mistuning was reduced [Figs. 2(B) and 2(C)]. This analysis provides further evidence that the performance of the ferrets on this task was determined by the degree of mistuning.
We have shown that ferrets, a species increasingly used in auditory research, are readily able to discriminate mistuned complex tones from HCTs. Together with previous evidence that some cortical neurons in this species can distinguish between harmonic and inharmonic complex tones (Kalluri et al., 2008) and encode pitch judgments (Bizley et al., 2013), our results highlight the value of using this species for exploring the neural basis for pitch perception and auditory scene analysis.
We examined the mistuning detection ability of ferrets using a modified go/no-go paradigm. One advantage of a go/no-go paradigm is the simplicity of the task, although performance can be affected by the animals' motivation and by changes in their decision criterion across sessions. We observed higher hit and FA rates in trials that required longer waiting time at the trigger spout, resulting in increased bias toward go responses. Consequently, correction trials were introduced to encourage the animals to stay at the trigger spout and so decrease their FA rate. Because response bias was not investigated in mistuning detection studies in other species, it is not clear how this might have affected the reported behavior and thresholds or even if there are interspecies differences in bias. It is possible, for example, that, because they are carnivores, ferrets might be particularly prone to make go responses. In previous studies in birds and gerbils (Lohr and Dooling, 1998; Klinge and Klump, 2009, 2010), data were analyzed for sessions in which the FA rate did not exceed 20%, whereas the FA rate was 40% on average in our experiments. Despite these considerations, d′ values did not vary with trial length and mistuning detection thresholds in ferrets are very similar to those measured in other animal species using similar paradigms.
We also studied the relationship between reaction time and performance on the go/no-go task. Using an ex-Gaussian distribution to parameterize the distribution of reaction times, we found that, when the degree of mistuning was reduced, responses to the first target tone were delayed and responses to the second target tone became much less precise. These findings indicate that the animals require more time to correctly judge the target tones as mistuned when the harmonic shift is small. Furthermore, we found that not only τ but all parameter values from the ex-Gaussian distribution were affected, appearing to contradict the classical view that the exponentially distributed component represents the time for decision making (Hohle, 1965).
Our results show that the mistuning detection threshold in ferrets is 0.8 ± 0.1 Hz. This corresponds closely to the thresholds reported for gerbils (Klinge and Klump, 2009, 2010), zebra finches and budgerigars (Lohr and Dooling, 1998), which are often <1 Hz, whereas those measured in humans can be as much as an order of magnitude higher (Moore et al., 1985; Hartmann et al., 1990). Importantly, this does not reflect differences in task design. Although two-alternative forced choice and matching tasks are commonly used in humans (Moore et al., 1985; Hartmann et al., 1990), similarly high thresholds have been obtained in humans using procedures modeled on the go/no-go tasks employed in the animal studies (Lohr and Dooling, 1998; Klinge and Klump, 2009).
Because thresholds for the detection of mistuning in gerbils and birds increased when sine phase complex tones were replaced with random phase complex tones, whereas human thresholds were not affected (Lohr and Dooling, 1998; Klinge and Klump, 2009), it has been suggested that animals may rely relatively more on temporal fine structure, whereas spectral mechanisms might be more important in humans. It should be noted that 1600 Hz, which is the harmonic frequency that was shifted in this study, lies at the edge of the ferret's auditory nerve phase locking range (Sumner and Palmer, 2012). However, phase locking has been demonstrated in other species not only to the mistuned harmonics themselves, but also to the lower frequency beats corresponding to the difference in frequency between the mistuned and adjacent harmonics (Sinex, 2008).
The behavioral thresholds measured in gerbils, ferrets and birds for detecting a mistuned harmonic in a HCT are substantially smaller than the frequency difference limens for pure tones in these species (Lohr and Dooling, 1998; Klinge and Klump, 2009; Walker et al., 2009; Klinge et al., 2010). It has been proposed that animal species with a short cochlea are less able to rely on the spatial distribution of excitation for discriminating different frequencies (Klinge et al., 2010). This may therefore account for the difference in frequency difference limens for pure tones between humans and animals and the greater dependence of the latter on temporal cues for discrimination tasks of this sort.
This research was supported by a Wellcome Principal Research Fellowship (WT076508AIA, 108369/Z/2015/Z) to A.J.K., a Japan Student Services Organization Scholarship to N.Y.H., an Action on Hearing Loss grant to V.M.B. and A.J.K., and a Postdoctoral Fellowship of the German Academic Exchange Service to M.F.K.H. We thank Kerry Walker for providing initial MATLAB code for the go/no-go behavior and for valuable discussion.