This study assessed how precision of binaural processing is affected by center frequency (CF), interaural temporal disparity (ITD), and listeners' hearing status. Tonal signals and 100-Hz-wide Gaussian noise maskers were employed at CFs ranging between 250 and 8000 Hz, in octave steps. In addition, for CFs of 2000, 4000, and 8000 Hz, transposed maskers and signals were employed. All listeners had no greater than “slight” hearing losses (i.e., no thresholds greater than 25 dB HL). Across all CFs and ITDs tested, binaural detection thresholds were elevated for listeners whose absolute thresholds at 4 kHz exceeded 7.5 dB HL. That outcome is consistent with results from Bernstein and Trahiotis [(2016). J. Acoust. Soc. Am. 140, 3540–3548]. Quantitative predictions of binaural detection thresholds derived via a comprehensive interaural correlation-based model of binaural processing were highly accurate across the entire set of data. The modeling results suggest that elevated thresholds from listeners having small hearing losses stem from elevated levels of stimulus-dependent, additive internal noise. They do not appear to stem from increased levels of noise within the central binaural comparator or from reduced sensitivity to changes in interaural correlation produced by the addition of the signal to the masker.
I. INTRODUCTION
Modern explanations of binaural hearing assume that external interaural temporal disparities (ITDs) of sounds arriving at the ears are compensated internally in a manner that mediates and facilitates the ability to detect, to discriminate among, to localize, and to lateralize sources of binaural information. Such explanations commonly incorporate the concept of a neurally-based cross-correlation surface having axes of the amount of activity, internal delay, and center frequency (CF) (see Trahiotis et al., 2005).
Several studies have shown that the ability to discriminate between ITDs declines with the magnitude of the “reference” ITD (e.g., Mills, 1958; Hafter and De Maio, 1975; Domnitz and Colburn, 1977). That result is consistent with the notion that the precision with which external ITDs can be internally compensated decreases with the magnitude of ITD. That notion is, in turn, consistent with data obtained in several binaural detection studies in which ITDs were varied (e.g., Langford and Jeffress, 1964; Rabiner et al., 1966; van der Heijden and Trahiotis, 1999). Collectively, those studies have shown that the maximum release from masking (i.e., the masking-level difference, or MLD) occurs when masking noise is presented diotically (ITD = 0), and diminishes with increasing ongoing interaural delay.
More recent measures of binaural detection as a function of the magnitude of ITD were obtained by Bernstein and Trahiotis (2015, 2016). The novel aspect of those studies was that the ITD was imposed on the masker waveform (in the absence of the tonal signal) and on the signal-plus-masker waveform (when the tonal signal was present). The motivation for using such stimuli was the realization that detection performance would degrade systematically if the precision of compensation declined with the magnitude of ITD but would remain constant as a function of ITD if the internal compensation of ITD were errorless.
In Bernstein and Trahiotis (2015), we reported measurements of binaural detection utilizing either low-frequency stimuli (centered at 250 Hz), for which the fine-structures of the waveforms convey binaural temporal information, or high-frequency stimuli (centered at 4 kHz) for which only the envelopes of the waveforms convey binaural temporal information. The patterning of the detection data at both CFs, and the theoretical analyses of them, suggested that (1) listeners do internally compensate for external ITDs independent of whether the ITDs are conveyed by low- or high-frequency stimuli; (2) such compensation of ITDs enhances binaural detection performance; (3) the precision with which the compensation of ITD occurs declines as the magnitude of the externally imposed ITD increases. Those outcomes verified and extended the findings of van der Heijden and Trahiotis (1999).
Using the same types of interaurally delayed maskers and signal-plus-maskers, Bernstein and Trahiotis (2016) found degradations in binaural detection (measured as a function of ITD) for listeners whose monaural pure-tone thresholds at 4 kHz were elevated (i.e., ≥ 7.5 dB HL) and, importantly, whose overall audiometric profile would be characterized as normal or “slight loss.” Such elevations in threshold are typically not considered to be clinically relevant losses. The behavioral results and quantitative analyses supported our hypothesis that small degradations of neural encoding that may not be discoverable via monaural behavioral tasks might be discoverable via a suitable binaural behavioral task. That hypothesis followed from historically validated characterizations of binaural processing as arising from a central, across-ear, “multiplication” of monaural neural information (e.g., Sayers and Cherry, 1957; Colburn, 1977; Stern and Colburn, 1978) and was stated in Bernstein and Trahiotis (2016) as follows:
…let us assume that, without loss, the neural inputs from 100 left-right pairs of monaural nerve fibers represent inputs for binaural interaction. One would expect that if the neural information were perfectly interaurally synchronized, then 100 multiplications of the monaural events would result in 100 outputs after binaural interaction. Now, let us assume that 20% of the inputs from each monaural channel are removed in a manner that is effectively random with respect to left and right ears. Then, one would expect, everything else being equal, that the information from each of the 80 independent monaural neural units would yield, on average, 64 outputs after binaural interaction. That is, under this scenario, a loss of 20% of the neural information stemming from each ear, could result in a loss of 36% centrally (p. 3541).
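The arithmetic in the quoted passage can be checked with a short simulation. This is only an illustrative sketch: it assumes that each fiber is lost independently with probability 0.2 and that a left-right pair yields an output only when both members survive. The function name and parameters are hypothetical.

```python
import random

def surviving_pairs(n_pairs=100, loss=0.2, n_trials=2000, seed=0):
    """Average number of left-right pairs in which both fibers survive."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        for _ in range(n_pairs):
            left_ok = rng.random() >= loss    # left fiber survives
            right_ok = rng.random() >= loss   # right fiber survives
            if left_ok and right_ok:
                total += 1
    return total / n_trials

# Analytic expectation: 100 * 0.8 * 0.8 = 64 outputs, i.e., a 36% central loss.
expected = 100 * (1 - 0.2) ** 2
simulated = surviving_pairs()
```

Under these assumptions, the simulated mean falls close to the 64 outputs stated in the quotation.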
The experiments reported herein extend and verify, across CF, the general findings of the several foregoing lines of research. Binaural detection was measured as a joint function of magnitude of ITD and CF. ITD spanned the range from 0 to 3000 μs; CF spanned the five octaves from 250 Hz to 8 kHz.
The primary goals of the research were to determine (1) the relation between CF and the precision with which listeners internally compensate externally imposed ITDs; (2) the degree to which binaural deficits, measured across CF, are associated with slight hearing loss. One theoretical goal was to determine whether the data obtained from both normal-hearing listeners and listeners with slight hearing losses could be accounted for quantitatively and explained conceptually via the interaural correlation-based modeling approach recently described in Bernstein and Trahiotis (2017). That approach entailed making predictions of binaural detection thresholds while assuming that listeners' decisions are constrained by (1) the inherent variability of “internal” sample-to-sample values of interaural correlations of both masker and signal-plus-masker waveforms and (2) two independent sources of “internal noise,” one associated with peripheral, monaural processing and the other associated with central, binaural compensation of ITD. It will be seen that the model provides an accurate quantitative description of the binaural detection data obtained across CF, including data obtained from listeners having thresholds at 4 kHz > 7.5 dB HL and who were found to have elevated binaural detection thresholds.
II. EXPERIMENT
A. Stimuli
Detection thresholds were measured in the three masking configurations described in detail in Bernstein and Trahiotis (2015, 2016). The first stimulus configuration, (NoSπ)τ, consists of the standard NoSπ stimulus configuration, but with the imposition of an interaural delay, τ, on the entire NoSπ signal-plus-masker waveform. As explained above, if the listener compensated internally for the externally imposed interaural delay without error, then the (NoSπ)τ configuration would be transformed back to NoSπ, and performance would be independent of τ.
The second configuration, (No)±τ(Sπ)τ, is the “double delay” configuration. It differs from (NoSπ)τ in that the double-delayed noise masker, (No)±τ, is composed of the sum of two independent noises, one interaurally delayed toward the left ear and the other interaurally delayed equally toward the right ear. Note that when τ = 0, (No)±τ(Sπ)τ is equivalent to NoSπ.
For any given interaural delay, τ, the masker of the double-delay configuration, (No)±τ, has the same interaural correlation as the masker, (No)τ, of its (NoSπ)τ counterpart (see van der Heijden and Trahiotis, 1999). Because the double-delayed noise incorporates two opposing, “mirror image” interaural delays, there exists no single compensating internal interaural delay that can be employed by the listener in order to transform (No)±τ(Sπ)τ back to NoSπ. Important for interpretation of the data is the fact that for binaural releases of masking obtained in the double-delay configuration, there is no value of compensating internal delay that can be employed that would yield lower detection thresholds than those achievable by imposing no internal compensating delay (see van der Heijden and Trahiotis, 1999, p. 388). Thus, one can be confident that detection advantages measured in (NoSπ)τ conditions re their (No)±τ(Sπ)τ double-delay counterparts reflect advantages gained via the use of compensating interaural delays.
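The equality of the single-delay and double-delay masker correlations can be illustrated numerically. The sketch below is not the authors' MLSig code; it assumes an 8-kHz sampling rate, 1-s noise samples, a rectangular 100-Hz-wide band, and circular delays, and all function names are hypothetical. Correlations are averaged across independent noise samples because any single sample of the double-delayed masker fluctuates around the expected value.

```python
import numpy as np

FS = 8000   # sampling rate in Hz (assumed for this sketch)
N = 8000    # 1 s of samples, so FFT bins are spaced 1 Hz apart

def band_noise(rng, cf_hz, bw_hz=100.0):
    """Spectrum of Gaussian noise restricted to cf_hz +/- bw_hz/2."""
    freqs = np.fft.rfftfreq(N, 1.0 / FS)
    spec = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    spec[np.abs(freqs - cf_hz) > bw_hz / 2.0] = 0.0
    return freqs, spec

def delayed(freqs, spec, delay_s):
    """Apply a delay as a linear phase shift, then return the waveform."""
    return np.fft.irfft(spec * np.exp(-2j * np.pi * freqs * delay_s), n=N)

def corr(left, right):
    """Normalized interaural correlation at zero lag."""
    return np.sum(left * right) / np.sqrt(np.sum(left**2) * np.sum(right**2))

def mean_correlations(cf_hz, itd_s, n_draws=100, seed=0):
    """Mean correlation of single-delay and double-delay maskers."""
    rng = np.random.default_rng(seed)
    single, double = [], []
    for _ in range(n_draws):
        f, a = band_noise(rng, cf_hz)
        _, b = band_noise(rng, cf_hz)   # second, independent noise
        single.append(corr(delayed(f, a, 0.0), delayed(f, a, itd_s)))
        # Noise a is delayed at the right ear, noise b at the left ear.
        left = delayed(f, a, 0.0) + delayed(f, b, itd_s)
        right = delayed(f, a, itd_s) + delayed(f, b, 0.0)
        double.append(corr(left, right))
    return float(np.mean(single)), float(np.mean(double))

single, double = mean_correlations(500.0, 250e-6)
```

For a CF of 500 Hz and an ITD of 250 μs, both means should lie near 0.71 under these assumptions.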
The third stimulus configuration, (NoSo)τ, consists of the standard NoSo stimulus configuration but with the imposition of an interaural delay, τ, on the entire NoSo signal-plus-masker waveform. There are no stimulus-imposed binaural cues available to aid detection in the (NoSo)τ configuration. That condition, therefore, serves as a control by allowing one to determine if interaural-delay-based changes in lateral position of the intracranial image, per se, affect detection.
For all three masking configurations, detection was measured using 100-Hz-wide Gaussian noise maskers and tonal signals centered at octave frequencies spanning the range from 250 Hz to 8 kHz. The lowest three CFs fell within spectral regions within which the fine-structures of the stimulus waveforms convey the interaural temporal information; the highest three CFs fell within spectral regions within which the envelopes of the stimulus waveforms convey the interaural temporal information. Because binaural masking releases are known to be relatively small for high-frequency signals and Gaussian noise maskers (e.g., van de Par and Kohlrausch, 1997), we also employed transposed stimuli at CFs of 2, 4, and 8 kHz. Transposed stimuli were employed because they yield substantially greater releases from masking at high frequencies than do their conventional counterparts (van de Par and Kohlrausch, 1997).
The transposed stimuli were constructed and presented as follows. For observation intervals containing the signal (S+N), a 125-Hz tonal signal was added to a 125-Hz-centered, 100-Hz-wide narrowband Gaussian noise at the appropriate signal-to-noise ratio and then the entire waveform was transposed to 2, 4, or 8 kHz. For observation intervals containing only the masker, the waveforms were created by transposing only the 125-Hz-centered masker (N) to the desired CF. Transposition was carried out via the method described by Bernstein and Trahiotis (e.g., 2002, 2015). Specifically, each 125-Hz-centered S+N or N stimulus was half-wave rectified and transformed to the frequency domain, after which the magnitudes of components above 2 kHz were set to zero. Then, the rectified and filtered stimuli were transformed back to the time domain and multiplied by a 2-, 4-, or 8-kHz sinusoidal carrier. All of the stimuli centered at 2, 4, or 8 kHz, whether conventional or transposed, were presented against a background of continuous diotic Gaussian noise low-passed at 1.3 kHz (overall level of 61 dB SPL). This was done in order to preclude listeners' use of low-frequency distortion products arising from normal, non-linear peripheral auditory processing (e.g., Nuetzel and Hafter, 1976; Bernstein and Trahiotis, 1994).
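The transposition steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the sine starting phase of the carrier, and the use of a 1-s, 125-Hz tone as the low-frequency input are assumptions made for the example.

```python
import numpy as np

def transpose(low_wave, fs, carrier_hz, cutoff_hz=2000.0):
    """Half-wave rectify, zero components above cutoff_hz, then modulate."""
    rectified = np.maximum(low_wave, 0.0)              # half-wave rectification
    spec = np.fft.rfft(rectified)
    freqs = np.fft.rfftfreq(low_wave.size, 1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0                      # remove components above 2 kHz
    envelope = np.fft.irfft(spec, n=low_wave.size)     # back to the time domain
    t = np.arange(low_wave.size) / fs
    return envelope * np.sin(2.0 * np.pi * carrier_hz * t)

fs = 96000                                 # sampling rate stated in the text
t = np.arange(fs) / fs                     # 1 s of samples (assumed duration)
tone = np.sin(2.0 * np.pi * 125.0 * t)     # stand-in for a 125-Hz-centered stimulus
out = transpose(tone, fs, 4000.0)
```

Because the rectified waveform is limited to 2 kHz before modulation, the energy of the transposed stimulus in this example is confined to the band from 2 to 6 kHz around the 4-kHz carrier.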
All stimuli were generated digitally with a sampling rate of 96 kHz via a custom software library (MLSig) running within Matlab (MathWorks, Natick, MA) and were converted to voltages via a TDT (Alachua, FL) PD1 before being presented via Etymōtic (Elk Grove Village, IL) ER-2 insert earphones to listeners seated in individual single-walled IAC chambers. Noise and signal waveforms (whether conventional or transposed) were each selected randomly from within 4-s long buffers that were generated anew prior to each adaptive run. The total duration of the maskers was 310 ms (including 20-ms cos² ramps). Signals were 270-ms-long (including 20-ms cos² ramps) and were temporally centered within the maskers. In the absence of the addition of a signal, the overall level of the stimuli was 67 dB SPL. The relative levels of the stimuli and their rise-decay ramps were controlled via software and the absolute levels of the stimuli were determined by programmable attenuators (TDT PA4) in a manner that maximized the use of the 16-bit range of the digital-to-analog converters.
For the noise maskers, ITDs were implemented via linear phase shifts in the frequency domain, prior to transformation to the time domain; for the tonal signals, ITDs were converted to the appropriate starting interaural phase difference prior to generation of the signal in the time domain. As a result, all ITDs were implemented as ongoing delays. That is, there were no onset delays. Onset and offset ramps were applied to the final waveform, i.e., subsequent to the imposition of the ITD. The values of ITD employed for the various stimulus conditions are listed in Table I. As indicated, detection thresholds were measured as a function of ITD with the greatest resolution for the (NoSπ)τ conditions. That was done because the (NoSπ)τ conditions are those that yield direct estimates of the relations between precision of ITD processing and magnitude of ITD at each of the various CFs. Fewer values of ITDs were employed when using the (No)±τ(Sπ)τ and (NoSo)τ configurations because their purpose, as described above, was to serve primarily as control conditions.
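For the tonal signals, the conversion from ITD to starting interaural phase, followed by gating of the final waveform, might look like the sketch below. The function names are hypothetical, the phase is wrapped to the range from −π to π as a convenience, and the squared-sine rise is used as one common form of a cos² ramp; none of these details is specified in the text.

```python
import math

def itd_to_ipd(itd_us, freq_hz):
    """Interaural phase difference (radians, wrapped) for an ongoing ITD."""
    ipd = 2.0 * math.pi * freq_hz * itd_us * 1e-6
    return math.atan2(math.sin(ipd), math.cos(ipd))

def gated_tone_pair(freq_hz, itd_us, dur_s=0.270, ramp_s=0.020, fs=96000):
    """Left/right tones with an ongoing ITD and identical onset/offset gating."""
    n = round(dur_s * fs)
    n_ramp = round(ramp_s * fs)
    ipd = itd_to_ipd(itd_us, freq_hz)
    left, right = [], []
    for i in range(n):
        edge = min(i, n - 1 - i)          # distance to the nearer end
        if edge >= n_ramp:
            gate = 1.0
        else:
            gate = math.sin(0.5 * math.pi * edge / n_ramp) ** 2
        phase = 2.0 * math.pi * freq_hz * i / fs
        left.append(gate * math.sin(phase))
        right.append(gate * math.sin(phase - ipd))   # delay applied as phase lag
    return left, right

left, right = gated_tone_pair(500.0, 500.0)
```

Because the same gate multiplies both channels after the phase shift is applied, the two waveforms start and stop together, so the delay is carried only by the ongoing waveform.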
Entries indicate the values of ITD, in μs, employed for each stimulus condition at each CF.
Stimulus type | Conventional | Conventional | Conventional | Conventional | Conventional | Conventional | Transposed | Transposed | Transposed
---|---|---|---|---|---|---|---|---|---
CF (Hz) | 250 | 500 | 1000 | 2000 | 4000 | 8000 | 2000 | 4000 | 8000
(NoSo)τ | 0, 1000, 2000, 3000 | 0, 1000, 2000, 3000 | 0, 1000, 2000, 3000 | 0, 1000, 2000, 3000 | 0, 1000, 2000, 3000 | 0, 1000, 2000, 3000 | 0, 1000, 3000 | 0, 1000, 3000 | 0, 1000, 3000
(No)±τ(Sπ)τ | 0, 250, 500, 750, 1000, 3000 | 0, 250, 500, 3000 | 0, 125, 250, 3000 | 0, 500, 1000, 3000 | 0, 500, 1000, 3000 | 0, 500, 1000, 3000 | 0, 500, 1000, 2000, 3000 | 0, 500, 1000, 2000, 3000 | 0, 500, 1000, 2000, 3000
(NoSπ)τ | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2500, 3000 | 0, 500, 1000, 2000, 3000 | 0, 500, 1000, 2000, 3000 | 0, 500, 1000, 2000, 3000
B. Listeners and procedures
Seven of the listeners (6 females, 1 male) who participated in our earlier experiment (Bernstein and Trahiotis, 2016) were available and participated in the current experiment. Their ages ranged from 26 to 58 yrs. One female listener was unavailable to complete the transposed stimulus conditions centered at 2 and 8 kHz. So as to maintain a crew of seven listeners for those conditions, we collected thresholds in those conditions from a seventh female listener whose audiometric profile was similar to that of the listener who could not participate.
1. Pure-tone absolute thresholds
Absolute thresholds were measured using the method of adjustment for frequencies spanning the range 250 Hz to 8 kHz, in octave steps. Tonal stimuli were presented via matched, calibrated pairs of TDH-49 (Telephonics, Huntington, NY) earphones (mounted in MX-41-AR cushions). For each frequency tested, listeners were presented with repeating 250-ms-long (including 50-ms rise/fall times) tone bursts, each separated by 250 ms. The sequence began at a sound pressure level such that the tone bursts were clearly audible. The listener was instructed to hold down a button on a response box until the tone faded from audibility and then to release the button until the tone became audible again. The listener's actions controlled programmable attenuators that continuously and appropriately altered the level of the tone in 0.1-dB steps. Once the listener's actions produced six “reversals,” in the level of the tone, testing of the next tone in the sequence began. The order of testing in each ear was 1000, 500, 250, 2000, 4000, and 8000 Hz. Testing began with the left ear, followed by testing in the right ear. Thresholds in HL (dB) were computed as the mean of the levels corresponding to the last four reversals for each sequence. During each session, the entire test was repeated for each listener one or two times (depending upon whether the measures appeared to be asymptotic) and HL thresholds were recorded as the mean of the estimates obtained across repetitions. Occasionally, an individual test produced anomalous results, particularly when a listener was tested for the first time. When that occurred, the entire test was discarded. Listeners' absolute thresholds were re-measured at intervals of six months to a year during the one- to two-year period required to complete the experiment. 
Because no substantial changes were observed in the measures across the duration of the experiment, “final” HL thresholds were recorded for each listener as the means across all of the tests and re-tests conducted.
For all listeners, no pure-tone threshold exceeded 25 dB HL, thus ensuring that all listeners had no more than “slight” losses as defined by American Speech-Language-Hearing Association guidelines (see Clark, 1981). In addition, left- vs right-ear thresholds at each frequency differed by no more than 10 dB. As a check on our laboratory procedures, audiometry was also performed for six of the eight listeners in the Otolaryngology clinic at the University of Connecticut Health Center. Those measures confirmed our laboratory-based measures of absolute threshold.
2. Detection thresholds
Detection thresholds were measured via a two-alternative, two-interval forced-choice adaptive task targeting the 71% point on the psychometric function (Levitt, 1971). Each trial within the adaptive task consisted of a warning interval (500 ms) followed by two 310-ms observation intervals separated by 400 ms. Each interval was marked visually by a computer monitor. Feedback was provided for approximately 400 ms after the listener responded. The listener's task was to detect the presence of signals that were presented with equal a priori probability in either the first or second interval. The initial step-size for the adaptive track was 2 dB and was reduced to 1 dB after two reversals. A run was terminated after 12 reversals and threshold was defined as the mean of the signal-to-noise ratio (in dB) computed across the last ten reversals.
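The adaptive procedure can be illustrated with a simulated listener. The sketch assumes a two-down, one-up rule, which converges on the 70.7% point of the psychometric function; the text cites Levitt (1971) but does not name the rule. The logistic psychometric function, its midpoint and slope, and the starting level are hypothetical values chosen only for the example.

```python
import math
import random

def p_correct(snr_db, midpoint_db=-12.0, slope=0.5):
    """Hypothetical two-interval psychometric function (0.5 to 1.0)."""
    return 0.5 + 0.5 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))

def run_track(start_db=0.0, seed=0):
    """One adaptive run: 2-dB steps, then 1-dB steps after two reversals."""
    rng = random.Random(seed)
    snr, step = start_db, 2.0
    n_correct, direction = 0, 0
    reversals = []
    while len(reversals) < 12:            # run ends after 12 reversals
        if rng.random() < p_correct(snr):
            n_correct += 1
            if n_correct < 2:
                continue                  # need two correct in a row to step down
            n_correct, move = 0, -1
        else:
            n_correct, move = 0, +1       # one error steps up
        if direction != 0 and move != direction:
            reversals.append(snr)         # level at which direction changed
            if len(reversals) == 2:
                step = 1.0
        direction = move
        snr += move * step
    return sum(reversals[-10:]) / 10.0    # mean S/N across the last ten reversals

thresholds = [run_track(seed=s) for s in range(50)]
mean_threshold = sum(thresholds) / len(thresholds)
```

With this simulated listener, the 70.7% point lies near −12.7 dB, and the average of the simulated runs should fall close to that value.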
The general ordering of experimental conditions was the same for all listeners. For the stimuli utilizing 100-Hz-wide Gaussian noise maskers, testing was blocked by CF with CFs visited in the order 500, 2000, 8000, 250, 1000, 4000 Hz.¹ Within each of those blocks, the signal-plus-masker configurations were visited in the order (NoSo)τ, (NoSπ)τ, (No)±τ(Sπ)τ. For each configuration, four thresholds were collected at each ITD, with ITDs visited in ascending order. When all of those experimental conditions had been exhausted, four additional thresholds were obtained by visiting the entire set of experimental conditions in reverse order.
Next, detection thresholds were measured using transposed stimuli following the same scheme as described above for the 100-Hz-wide conventional Gaussian noise maskers. Testing began with transposed stimuli centered at 2 kHz, followed by those centered at 8 kHz. The data reported for the transposed stimuli centered at 4 kHz were obtained in an earlier study that was completed prior to the experiment reported here.
Final thresholds were obtained for each listener and condition by computing the mean of all of the estimates of threshold obtained for each condition. On the relatively few occasions that the standard deviation of any set of estimates exceeded 3.0 dB, two (or four) additional estimates were obtained and substituted for the “oldest” measures until the set of estimates of threshold yielded a standard deviation of less than or equal to 3.0 dB.
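The substitution rule for variable sets of estimates can be expressed compactly. The sketch below assumes the sample (n − 1) standard deviation and a batch of two replacement estimates; the text does not state which form of standard deviation was used, and the function name and the example values are hypothetical.

```python
from statistics import mean, stdev

def final_threshold(estimates, reruns, sd_limit=3.0, batch=2):
    """Replace the oldest estimates with re-runs until the SD is acceptable.

    estimates: thresholds (dB) in chronological order.
    reruns: additional thresholds (dB) available, in the order collected.
    """
    current = list(estimates)
    pending = list(reruns)
    while stdev(current) > sd_limit and len(pending) >= batch:
        new, pending = pending[:batch], pending[batch:]
        current = current[batch:] + new   # drop the oldest, append the newest
    return mean(current), stdev(current)

# Hypothetical example: one outlying early estimate inflates the SD.
runs = [-10.0, -2.0, -11.0, -12.0, -11.5, -12.5, -11.0, -12.0]
avg, sd = final_threshold(runs, reruns=[-11.5, -12.0])
```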
Four of the listeners were not available to complete eight measures of threshold across all of the experimental conditions. If only four thresholds could be obtained, then those were used to compute the mean detection threshold for the particular listener and condition. In the event that eight thresholds were obtained, the standard deviation of the estimates exceeded 3.0 dB, and the listener was unavailable to complete “re-runs,” then the most recent four thresholds were used to compute the mean detection threshold for that particular listener and condition.
C. Results and discussion
Each of the six panels of Fig. 1 shows mean threshold S/N (dB) plotted as a function of ongoing ITD. Squares represent data obtained using the (NoSo)τ configuration; triangles represent data obtained using the (No)±τ(Sπ)τ double-delay configuration; circles represent data obtained using the (NoSπ)τ configuration. The error bars represent ±1 standard error (se) and the parameter within each panel specifies the masking configuration.
Average S/N (in dB) plotted as a function of ITD (μs). The squares, triangles, and circles represent the data obtained in the (NoSo)τ, (No)±τ(Sπ)τ, and (NoSπ)τ configurations, respectively. Error bars represent ±1 se of the mean. The left-hand panels contain the data obtained from the group of listeners having thresholds at 4 kHz ≤ 7.5 dB HL; the right-hand panels contain the data obtained from the group of listeners having thresholds at 4 kHz > 7.5 dB HL. The upper, middle, and lower panels contain the data obtained with Gaussian noise maskers and tonal signals centered at 250, 500, and 1000 Hz, respectively.
The upper, middle, and lower panels present data obtained at CFs of 250, 500, and 1000 Hz, respectively. The three left-hand panels represent the average of the thresholds obtained from the four listeners whose audiometric thresholds at 4 kHz were ≤ 7.5 dB HL; the three right-hand panels represent the average of the thresholds obtained from the three listeners whose audiometric thresholds at 4 kHz were > 7.5 dB HL.² This partitioning in terms of audiometric threshold is the same one used successfully in our original investigation (Bernstein and Trahiotis, 2016) that employed 31 listeners and stimuli centered at only two CFs: 500 Hz and 4 kHz. The sizable and robust differences in the patterns of those earlier data partitioned in this manner gave us confidence that an attempt to conduct a study while employing a smaller number of listeners, many more stimulus conditions, and the same partitioning of the data would be successful. Within and across panels, the general form of the data matches that reported in our previous publications employing the same types of stimulus configurations and verifies our expectations.
The (NoSo)τ thresholds obtained for the three CFs shown in Fig. 1 are essentially independent of ITD and the median thresholds are elevated by about 1 dB for the “>7.5 dB” group as compared to the “≤7.5 dB” group. Though small, the differences were statistically significant at an alpha of 0.05 via a median test (see Siegel, 1956, pp. 111–116) calculated while including each listener's mean thresholds (χ² = 6.16; df = 1; p = 0.013).
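A two-group median test of the kind cited can be sketched as follows. The version below classifies scores as above versus not above the combined median and applies a continuity correction to the 2 × 2 chi-square; whether the authors applied that correction is not stated, so it is an assumption here. The function names and the example scores are hypothetical. The p value for one degree of freedom is obtained from the complementary error function.

```python
import math
from statistics import median

def chi2_p_df1(chi2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def median_test(group_a, group_b):
    """Median test for two independent groups (continuity-corrected)."""
    grand = median(list(group_a) + list(group_b))
    a = sum(x > grand for x in group_a)   # group A scores above the median
    b = sum(x > grand for x in group_b)   # group B scores above the median
    c = len(group_a) - a                  # group A scores at or below it
    d = len(group_b) - b                  # group B scores at or below it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a marginal total is zero; the test is undefined")
    chi2 = n * max(abs(a * d - b * c) - n / 2.0, 0.0) ** 2 / denom
    return chi2, chi2_p_df1(chi2)

# Hypothetical scores, for illustration only.
chi2, p = median_test([1, 2, 3, 4], [5, 6, 7, 8])
```

As a consistency check on the reported statistics, `chi2_p_df1(6.16)` evaluates to approximately 0.013, matching the p value given in the text.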
The data obtained in the (No)±τ(Sπ)τ configuration are very similar across the three CFs and across the two groups of listeners. Specifically, the thresholds obtained at each CF increase about 10 dB as the ITD of the masker is increased from 0 μs to the ITD representing one quarter-period of the CF (1000 μs at 250 Hz; 500 μs at 500 Hz; 250 μs at 1000 Hz). The ITDs corresponding to the quarter-period of those CFs are indicated by the dashed vertical lines within each panel. Note that thresholds did not change appreciably when the ITD at each CF was increased further to 3000 μs. The only exceptions to these trends are observed for the thresholds obtained at a CF of 1000 Hz and from listeners whose audiometric thresholds at 4 kHz were > 7.5 dB HL (lower-right panel). At an ITD of 0 μs, those thresholds are 7–8 dB higher than their counterparts shown in the other five panels.
The observed increases in threshold for ITDs up to a quarter-period of each CF were expected because the interaural correlations of the maskers in each case decreased from 1.0 (when the ITD was zero) to a deterministic 0.0 (when the ITD was equal to the quarter-period). That expectation was based on the outcomes of several prior studies concerning the relation between masker interaural correlation and binaural detection thresholds (e.g., Robinson and Jeffress, 1963; Rilling and Jeffress, 1965). The data obtained with the (No)±τ(Sπ)τ configuration exhibit a pattern consistent with those studies and confirm that listeners' performance was constrained by the interaural correlation of the masker.
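The relation between ITD and masker interaural correlation at the low CFs can be made concrete with a small calculation. The sketch assumes an idealized noise with a rectangular 100-Hz-wide spectrum, for which the normalized correlation at delay τ is sinc(Wτ)·cos(2π·CF·τ); the actual maskers and any peripheral filtering would modify the values somewhat. The function names are hypothetical.

```python
import math

def quarter_period_us(freq_hz):
    """ITD (in microseconds) equal to one quarter-period of freq_hz."""
    return 1e6 / (4.0 * freq_hz)

def masker_correlation(cf_hz, itd_us, bw_hz=100.0):
    """Interaural correlation of an ideal rectangular band of noise."""
    tau = itd_us * 1e-6
    x = bw_hz * tau
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return sinc * math.cos(2.0 * math.pi * cf_hz * tau)

for cf in (250.0, 500.0, 1000.0):
    itd = quarter_period_us(cf)
    # Correlation is 1.0 at zero ITD and essentially 0.0 at the quarter-period.
    print(cf, itd, masker_correlation(cf, 0.0), round(masker_correlation(cf, itd), 6))
```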
Turning to the stimulus conditions of primary interest, note that the thresholds obtained with the (NoSπ)τ configuration are generally much lower than their (No)±τ(Sπ)τ counterparts. That outcome indicates that the listeners were able to compensate the ITD of the masker. The fact that the (NoSπ)τ thresholds generally increased monotonically with ITD suggests that the precision of such compensation diminished as ITD was increased.
The data obtained with the (NoSπ)τ configuration were found via the same type of median test described above to differ statistically significantly across the two groups of listeners (χ² = 28.65; df = 1; p < 0.0001). For listeners having hearing thresholds at 4 kHz ≤ 7.5 dB HL, detection thresholds were similar across CF, ranging from about −15 dB when the ITD was zero to −11 dB or so when the ITD was 3000 μs. In contrast, for listeners having hearing thresholds at 4 kHz > 7.5 dB HL, thresholds were somewhat higher than those obtained from the other group at all values of ITD tested and the relations between thresholds and ITD differed depending on CF. When the CF was 250 Hz (upper right panel), thresholds increased by about 5 dB as ITD was increased from 0 to 3000 μs. When the CF was 500 or 1000 Hz, thresholds were remarkably constant across all values of ITD. Once again, the largest departure between the performances of the two groups occurred at a CF of 1000 Hz. In that case, thresholds for the >7.5 dB group were, independent of ITD, elevated to a value of about −7.5 dB, a value quite close to those obtained in the corresponding (No)±τ(Sπ)τ configuration. In fact, the (No)±τ(Sπ)τ and (NoSπ)τ thresholds are so similar as to suggest that internal compensation of ITD was greatly diminished (if not almost completely absent) for the >7.5 dB group at 1000 Hz. Although appealing at face value, this interpretation is not supported by our quantitative theory-based analysis of the data (see Sec. III).
Figure 2 displays the thresholds obtained at CFs of 2000, 4000, and 8000 Hz and follows the same format as that employed in Fig. 1. As before, the “monaural” thresholds obtained with the (NoSo)τ configuration are essentially independent of ITD. In this case, the median thresholds were elevated by about 2 dB for the “>7.5 dB” group as compared to the “≤7.5 dB” group and the differences were found to be statistically significant via a median test (χ² = 16.36; df = 1; p < 0.0001). The data obtained with the (No)±τ(Sπ)τ configuration necessarily display a different pattern as a function of ITD than do their counterparts in Fig. 1. That is so because, for all three of these high CFs, ITDs are conveyed by the envelopes of the 100-Hz-wide maskers. To explain, the ITD that would be expected to produce an interaural correlation of 0.0 would be about 2500 μs (indicated by the vertical dashed lines), the ITD corresponding to about a quarter period of the 100-Hz-wide bandwidth.
Same as Fig. 1, except for data obtained at CFs of 2000, 4000, and 8000 Hz.
For the ≤7.5 dB group, the (No)±τ(Sπ)τ thresholds increased with ITD at about the same rate at all three CFs, albeit by a small amount; for the >7.5 dB group, at each CF thresholds were both elevated slightly and were relatively constant as a function of ITD as compared to the data obtained from the ≤7.5 dB group. Overall, the MLDs [re (NoSo)τ] for the >7.5 dB group are smaller and vary less with ITD as compared to their counterpart MLDs obtained from the ≤7.5 dB group.
The general patterning of the data obtained in the (NoSπ)τ configuration differs from those obtained with the (No)±τ(Sπ)τ configuration in two important respects. First, all but very few of the (NoSπ)τ thresholds fall below their (No)±τ(Sπ)τ counterparts, indicating a detection advantage stemming from at least partial compensation of ITD. Second, compared to those in Fig. 1, the (NoSπ)τ thresholds are “flatter” as a function of ITD. Notably, as is the case for the (No)±τ(Sπ)τ thresholds in Fig. 2, the (NoSπ)τ thresholds obtained from the >7.5 dB group are, at all three CFs, elevated as compared to the corresponding thresholds obtained from the ≤7.5 dB group. A median test indicated that those across-group differences were statistically significant (χ² = 39.43; df = 1; p < 0.0001).
Once again, the MLDs obtained re (NoSo)τ are somewhat smaller for the > 7.5 dB group than for the ≤ 7.5 dB group. The across-group differences in the size of the MLDs notwithstanding, it is important to emphasize that the data obtained from both groups of listeners suggest that compensation of ITD did improve performance at all three CFs.
Figure 3 displays the thresholds obtained with transposed stimuli centered at 2000, 4000, and 8000 Hz. The reader is reminded that, for these CFs, ITDs are conveyed by the envelopes of the stimuli. Because the stimuli were generated by transposing 100-Hz-wide bands of noise centered at 125 Hz, the quarter-period of the envelope for all three CFs was 2000 μs (indicated by the vertical dashed lines). Therefore, all of the stimuli having an ITD of 2000 μs had a deterministic interaural correlation of zero. The general ordering of the data in Fig. 3 is the same as that in Fig. 2. Once again, the thresholds obtained with the (NoSo)τ configuration are essentially independent of ITD and the median thresholds were elevated by about 2 dB for the >7.5 dB group as compared to the ≤7.5 dB group. Once again, those differences were found to be statistically significant via a median test (χ² = 5.35; df = 1; p = 0.02). As was the case for the data displayed in Figs. 1 and 2, (NoSπ)τ thresholds are, overall, higher for the >7.5 dB group as compared to the ≤7.5 dB group. Those differences were also statistically significant via a median test (χ² = 13.14; df = 1; p = 0.0003). As was the case for the data obtained with Gaussian maskers (Fig. 2), the slopes relating threshold S/N (dB) to ITD obtained with transposed stimuli are steeper for the data obtained from the ≤7.5 dB group than for their >7.5 dB counterparts.
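The quarter-period values quoted for the Gaussian maskers (2500 μs) and for the transposed stimuli (2000 μs) follow from simple arithmetic. The short sketch below reproduces them; the function name `quarter_period_itd_us` is hypothetical and is introduced only for illustration.

```python
def quarter_period_itd_us(rate_hz: float) -> float:
    """Return one quarter of the period of `rate_hz`, in microseconds."""
    period_us = 1e6 / rate_hz  # full period in microseconds
    return period_us / 4.0


# 100-Hz-wide Gaussian maskers (Fig. 2): quarter of a 10-ms period.
gaussian_itd_us = quarter_period_itd_us(100.0)

# Transposed stimuli built from noise centered at 125 Hz (Fig. 3):
# quarter of an 8-ms period.
transposed_itd_us = quarter_period_itd_us(125.0)

print(gaussian_itd_us, transposed_itd_us)
```

These are the ITDs marked by the vertical dashed lines in Figs. 2 and 3, respectively.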
Same as Fig. 2 except for data obtained with transposed maskers and signals.
There are, however, differences between the data in Figs. 2 and 3 that deserve detailed comment. Note that, for the ≤7.5 dB group, the (NoSπ)τ thresholds obtained with Gaussian noise maskers (Fig. 2) and with transposed stimuli (Fig. 3) are essentially equivalent at an ITD of zero while thresholds obtained with the transposed stimuli at an ITD of 3000 μs are higher than those obtained with the Gaussian noise maskers. This type of outcome departs from the results obtained in several studies indicating that binaural detection and discrimination with transposed stimuli are superior to those obtained with conventional stimuli such as Gaussian noises, sinusoidally amplitude-modulated tones, etc. (e.g., Bernstein and Trahiotis, 1994, 2002). This type of outcome, however, is reminiscent of one reported by Greenberg et al. (2012) who measured detection of changes in the interaural phase of the envelope for different reference values of envelope interaural phase while employing both transposed and conventional stimuli. As will be seen when our model-based, quantitative analyses are presented (Sec. III), the apparent discrepancy vis-à-vis conventional vs transposed stimuli is entirely explainable within a general cross-correlation-based model that takes into account peripheral and central auditory processing.
As stated in Sec. I, two of our primary goals were to determine (1) the relation between CF and the precision with which listeners compensate, internally, externally imposed ITDs; and (2) the degree to which binaural deficits, measured across CF, are associated with slight hearing loss. With regard to those goals, the detection thresholds of primary interest were those obtained with (NoSπ)τ stimuli, with the detection thresholds obtained with (NoSo)τ and (No)±τ(Sπ)τ stimuli serving as “controls” for interpretive purposes. Pursuant to our goals, Fig. 4 displays all of the (NoSπ)τ thresholds presented in a manner that fosters the comparisons of interest across the two groups of listeners. The six panels contain the mean detection thresholds obtained as a function of ITD, replotted from Figs. 1–3. The top and middle pairs of panels contain the data obtained with the low CF and the high CF Gaussian maskers, respectively. The bottom pair of panels contains the data obtained with the three high-frequency, transposed stimuli. The left-hand panels contain data obtained from the ≤7.5 dB group; the right-hand panels contain data obtained from the >7.5 dB group. The slopes and intercepts of the regression lines that best fit the data are listed within each panel.
Average S/N (in dB) plotted as a function of ITD (μs). Data are replotted from Figs. 1–3. The squares, triangles, and circles represent the data obtained with Gaussian noise maskers and tonal signals at 250, 500, and 1000 Hz, respectively (upper panels); 2000, 4000, and 8000 Hz, respectively (middle panels); and transposed maskers and signals centered at 2000, 4000, and 8000 Hz, respectively (lower panels). The left-hand panels contain the data obtained from the group of listeners having thresholds at 4 kHz ≤ 7.5 dB HL; the right-hand panels contain the data obtained from the group of listeners having thresholds at 4 kHz > 7.5 dB HL. The solid lines within each panel represent linear regression fits to the data plotted within that panel.
With the data presented in this fashion, it is apparent that, in general, the thresholds obtained from the >7.5 dB group are both elevated (have higher intercepts) and are less affected by increases in ITD (have shallower regression slopes) as compared to their counterparts obtained from the ≤7.5 dB group. As can be seen in the upper right panel, and as discussed earlier, the thresholds obtained from the >7.5 dB group at a CF of 1000 Hz appear to be relatively constant as a function of ITD. They are largely responsible for the regression slope of the aggregate data being only 0.9 × 10−3 dB/μs. The regression slope and intercept calculated after omitting the data obtained at a CF of 1000-Hz were 1.3 × 10−3 dB/μs and −12.6 dB, respectively. The regression slope and intercept calculated after omitting the data obtained at a CF of 1000-Hz for the data obtained from the ≤7.5 dB group (upper left panel) were found to be 1.6 × 10−3 dB/μs and −15.7 dB, respectively. These comparisons reveal that when one considers only the data obtained at CFs of 250 and 500 Hz, the regression slopes derived from the two groups are more similar, while the intercepts for the >7.5 dB group remain elevated by about 3 dB. This means that the finding of relatively elevated detection thresholds as a function of ITD for the >7.5 dB group is a general one and not restricted to the higher CFs where the ITDs are conveyed by only the envelopes of the stimuli.
III. QUANTITATIVE ANALYSES
A. Description of the model
We attempted to account for the data in Figs. 1–3 using an enhanced version of the interaural-correlation-based model of binaural processing described in Bernstein and Trahiotis (2017). A block diagram of the model we used is shown in Fig. 5. The first stage of the model added interaurally uncorrelated internal noise. The internal noises matched the CFs and bandwidths of the external masker or signal-plus-masker waveforms. Within each monaural channel, each of the resulting waveforms was passed through a single gammatone bandpass filter centered on the frequency of each respective signal. This was accomplished via Dr. Michael Akeroyd's “Binaural Toolbox” for Matlab® (also, see Glasberg and Moore, 1990; Slaney, 1993; Patterson et al., 1995). The filtered waveforms were subjected to envelope compression employing an exponent of 0.23 [see Bernstein et al. (1999) and Bernstein and Trahiotis (2017)]. The compressed waveforms were then rectified (half-wave, square-law) and passed through a fourth-order low-pass filter with a cutoff frequency of 425 Hz (see Weiss and Rose, 1988; Bernstein and Trahiotis, 1996, 2017). These types of compression, filtering, and rectification were included to emulate cochlear hair-cell nonlinearity and the loss of neural synchrony to the fine-structure of the stimuli that occurs as the CF is increased. For stimuli centered at 2000 to 8000 Hz, for which the envelopes of the stimuli (and not their fine-structures) convey the ITD, the next step was to pass the waveforms through a second-order (12 dB/oct) Butterworth envelope low-pass filter (for details and justification, see Bernstein and Trahiotis, 2002, 2014). The cutoff frequencies of the envelope low-pass filter were 850, 350, and 150 Hz for CFs of 2000, 4000, and 8000 Hz, respectively.³ Those values were chosen to conform to the relation between envelope low-pass cutoff frequency and CF derived in a previous study (Bernstein and Trahiotis, 2014). In that study, values of the cutoffs of the envelope low-pass filter were those that provided best-fits to ITD-discrimination thresholds obtained at high CFs.
Block diagram of the model used to make predictions of binaural detection data.
The central stage of the model served to compute the normalized interaural coherence of the processed waveforms at desired values of “lag” of the cross-correlation function. Those values of lag corresponded to the values of internal delay that acted to compensate the value of the external ITDs applied to the stimuli. The computation incorporated an interaural time-jitter the magnitude of which increased exponentially with the value of the internal delay. To understand intuitively, envision a spatial mapping of magnitude of internal delay to “place” along the delay dimension of a putative delay line. The assumption is that the “noisiness” within the delay line increases as the place of activity moves from smaller toward larger delays. This conception is patterned after the type of delay-line described and quantified originally by van der Heijden and Trahiotis (1999). Within their “black-box” model, they assumed there to be an interaurally uncorrelated internal delay-line noise, the power of which increased exponentially with the addition of delay-line “links” that, in toto, yielded the desired internal delay. Our use of time-jitter along the delay-line served as a mechanism to reduce the interaural correlation along the delay-line in a manner akin to that described by van der Heijden and Trahiotis.
A time-jittered internal delay line was implemented in the following manner for each binaural pair of waveforms to be processed. For the internal delay (cross-correlation lag) at which the interaural coherence was to be computed, we drew two independent (destined to be left and right) samples from a mean-zero Gaussian distribution. The standard deviation of the Gaussian distribution was scaled to increase with the magnitude of the internal delay that was to be corrupted. The scaling was equal to τ^α, where τ is the value of the internal delay and the value of α determined the rate of the (exponential) growth of the time-jitter along the delay line. For example, assuming an internal delay, τ, of 250 μs and an α of 0.5, the two random values of delay would be drawn from a mean-zero Gaussian distribution having a standard deviation of 15.8 μs (250^0.5).
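The scaling of the jitter can be illustrated with a minimal sketch. The function names `jitter_sd_us` and `draw_left_right_jitter_us` are hypothetical, the use of the magnitude of τ is an assumption made so that negative lags are handled, and the sketch stops at drawing the two jitter values; it does not attempt to reproduce how they were subsequently applied to the waveforms.

```python
import random


def jitter_sd_us(internal_delay_us: float, alpha: float) -> float:
    """Standard deviation (μs) of the delay-line jitter: |tau| ** alpha."""
    return abs(internal_delay_us) ** alpha


def draw_left_right_jitter_us(internal_delay_us, alpha, rng):
    """Draw two independent mean-zero Gaussian jitters (left, right), in μs."""
    sd = jitter_sd_us(internal_delay_us, alpha)
    left = rng.gauss(0.0, sd)
    right = rng.gauss(0.0, sd)
    return left, right


rng = random.Random(0)  # seeded only so that the illustration is repeatable
print(round(jitter_sd_us(250.0, 0.5), 1))        # worked example from the text
print(draw_left_right_jitter_us(250.0, 0.5, rng))
```

With τ = 250 μs and α = 0.5 the computed standard deviation matches the 15.8 μs quoted in the worked example. Larger values of α make the jitter grow faster with internal delay.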
The two randomly drawn and scaled values served as random delays that were added to the left and right processed waveforms, respectively. Then, the resulting, jittered, internal interaural delay was applied to the waveforms. The interaural coherence of the left/right pair of waveforms was computed by averaging Fisher's z (see Pollack and Trittipoe, 1959; McNemar, 1969; Richards, 1987) transformed values of normalized interaural coherence obtained via a “running” single exponential window with a time-constant of 10 ms. Finally, a decision variable was formed by computing
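$$d_a = \frac{\left|\mu_{N} - \mu_{S+N}\right|}{\sqrt{\left(\sigma^{2}_{N} + \sigma^{2}_{S+N}\right)/2}},$$

where $\mu_{N}$ and $\mu_{S+N}$ represent the means of the distributions of the computed transformed interaural coherences of noise-alone and signal-plus-noise, respectively, and the quantities $\sigma^{2}_{N}$ and $\sigma^{2}_{S+N}$ represent the variances of those distributions, respectively [see Bernstein and Trahiotis (2017) for additional details].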
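A minimal sketch of the decision variable follows. It assumes the conventional definition of da (the difference between the two means divided by the root-mean-square of the two standard deviations), assumes sample rather than population variances, and uses hypothetical function names and made-up coherence values purely for illustration.

```python
import math
import statistics


def fisher_z(rho: float) -> float:
    """Fisher's z transform of a correlation; rho must lie strictly within (-1, 1)."""
    return math.atanh(rho)


def decision_variable_da(noise_z, signal_plus_noise_z) -> float:
    """d_a computed from two samples of Fisher-z-transformed interaural coherence."""
    mu_n = statistics.mean(noise_z)
    mu_sn = statistics.mean(signal_plus_noise_z)
    var_n = statistics.variance(noise_z)              # sample variance (assumption)
    var_sn = statistics.variance(signal_plus_noise_z)
    return abs(mu_n - mu_sn) / math.sqrt((var_n + var_sn) / 2.0)


# Hypothetical coherence values for a handful of tokens (not data from the study).
noise_alone = [fisher_z(r) for r in (0.95, 0.96, 0.94, 0.97)]
signal_plus_noise = [fisher_z(r) for r in (0.90, 0.88, 0.91, 0.89)]
print(decision_variable_da(noise_alone, signal_plus_noise))
```

Because Fisher's z diverges as the coherence approaches ±1, a practical implementation would need to guard against coherence values of exactly ±1.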
B. Predicting the behavioral thresholds
The model was used in the following manner to make predictions of S/N (dB) for the experimental conditions. At least 100 tokens of noise-alone and signal-plus-noise were generated for each combination of stimulus condition and ITD employed in the experiments. The duration of the tokens was 300 ms, corresponding to the duration of the stimuli employed in the behavioral experiment. For each ITD, stimuli were generated over a range of behaviorally-relevant signal-to-noise ratios (S/N). Following the addition of interaurally uncorrelated internal noise matching the CF and bandwidth of the masker, the waveforms corresponding to each token were passed through the stages of the model (Fig. 5). Repeating this procedure for each token resulted in the desired sampling distributions for ρN and ρS+N. Those distributions yielded pairs of values relating (S/N) to da for each experimental condition at each ITD. In order to allow for interpolation, paired values of log(da) and S/N in dB were fit with linear functions (Matlab®, Natick, MA) using a least-squares criterion. For each experimental condition, this entire procedure was carried out multiple times using a broad range of levels of internal noise, a broad range of values of α, and independent (i.e., determined anew) sets of tokens of masker and signal-plus-masker “external” waveforms.
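The interpolation step can be sketched as an ordinary least-squares line fit followed by inversion at a criterion value of da. The sketch assumes that log10(da) is regressed on S/N (the text does not state which variable was treated as dependent, nor the base of the logarithm), and the paired values below are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def threshold_snr_db(snr_db, da_values, criterion_da):
    """S/N (dB) at which the fitted log10(d_a) line reaches the criterion d_a."""
    log_da = [math.log10(v) for v in da_values]
    slope, intercept = fit_line(snr_db, log_da)
    return (math.log10(criterion_da) - intercept) / slope


# Hypothetical (S/N, d_a) pairs for one condition at one ITD.
snr = [-20.0, -15.0, -10.0]
da = [10 ** -0.5, 1.0, 10 ** 0.5]
print(threshold_snr_db(snr, da, 2.35))
```

For these made-up values the fitted line rises by 0.1 log units per dB, so a higher criterion da maps to a higher predicted threshold S/N.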
Finally, for all of the conditions employed in the experiment, we determined, for each group of listeners, the single value of da that maximized the average variance accounted for between the predictions of the model and the experimentally obtained values of S/N across all six CFs tested.⁴ Predictions were taken to be the smaller of: (1) the S/N values derived from the fitted curves relating da to S/N or (2) the threshold obtained in the corresponding monaural masking condition [e.g., NπSπ (where ρN = −1)]. In that manner, the threshold obtained in the monaural masking configuration served, for each experimental condition, as a “ceiling.” Thus, following Durlach (1963, 1972), performance was assumed never to be inferior to that which would result from monaural processing.
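The “ceiling” rule amounts to taking the smaller of two threshold estimates. A minimal sketch, with a hypothetical function name and made-up threshold values:

```python
def predicted_threshold_db(binaural_model_snr_db: float,
                           monaural_threshold_db: float) -> float:
    """Prediction never exceeds the threshold attainable via monaural processing."""
    return min(binaural_model_snr_db, monaural_threshold_db)


# Hypothetical values: the binaural model predicts a worse (higher) S/N than
# the monaural condition, so the monaural threshold is used as the prediction.
print(predicted_threshold_db(-3.0, -8.0))
# Here the binaural prediction is the lower of the two and is retained.
print(predicted_threshold_db(-14.0, -8.0))
```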
Determination of the “best-fitting” value of da was accomplished as follows. A “test value” of da, for example, 1.0, was chosen and then, separately for each CF, the level of internal noise and value of α (delay-line jitter growth) yielding the largest variance accounted for was determined. Then, the average value of the percentage of variance accounted for across the six frequencies was computed. The test value of da was incremented and the process was repeated. This allowed us to determine a relatively small range of da values within which the maximum average variance accounted for occurred. That small range was examined using a smaller increment between values of da than that used at the outset. As the process continued, ultimately, the increment between test values of da was set to 0.05 and the value of da yielding the largest average variance accounted for was determined. Note that the data obtained with the Gaussian maskers (Fig. 2) and their transposed counterparts (Fig. 3) were, at each CF (2000, 4000, and 8000 Hz), fit simultaneously for each group of listeners. This constraint was imposed so that values of the parameters were tied to CF and not to the type of stimulus employed, per se (i.e., Gaussian or transposed).
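The search described above is a coarse-to-fine grid search over test values of da. The sketch below shows one way such a search could be organized. The function `coarse_to_fine_search` and its step-shrinking factor are assumptions, and the objective passed to it is a simple stand-in: in the actual procedure, evaluating a test value of da would itself involve finding, at each CF, the internal noise level and α that maximize variance accounted for, and then averaging across the six CFs.

```python
def coarse_to_fine_search(objective, low, high, step, final_step=0.05):
    """Maximize `objective` on a grid, shrinking the grid around the best value."""
    while True:
        n_steps = int(round((high - low) / step))
        candidates = [low + i * step for i in range(n_steps + 1)]
        best = max(candidates, key=objective)
        if step <= final_step + 1e-12:
            return best
        # Narrow the range to the neighbors of the current best value and
        # refine the increment, never going below the final increment.
        low, high = best - step, best + step
        step = max(step / 5.0, final_step)


def stand_in_objective(test_da):
    """Hypothetical single-peaked score; not the model's actual objective."""
    return -(test_da - 2.33) ** 2


best_da = coarse_to_fine_search(stand_in_objective, low=1.0, high=4.0, step=0.5)
print(round(best_da, 2))
```

With the stand-in objective, the search settles on the grid value nearest the peak at the final increment of 0.05.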
The best-fitting values of the parameters of the model, partitioned by CF and listener group, are displayed in Table II. The table also lists the corresponding root-mean-square (rms) errors between predicted and obtained data, and the amounts of variance in the behavioral data accounted for by the model. The values listed within Table II will be discussed in detail, below.
The values of level of interaurally uncorrelated internal noise, delay-line jitter growth, α, rms error (in dB), and percentage of variance accounted for at each CF and for each of the two groups of listeners. The criterion da for each group was that yielding the greatest mean percentage of variance accounted for across the six CFs.
| CF | Parameter | Group: HL @ 4 kHz ≤ 7.5 dB (da = 2.35) | Group: HL @ 4 kHz > 7.5 dB (da = 2.30) |
|---|---|---|---|
| 250 Hz | Internal noise level (dB re external masker) | −11 | −7 |
| | α | 0.65 | 0.65 |
| | rms error (dB) | 1.3 | 1.5 |
| | % var accounted for | 87 | 82 |
| 500 Hz | Internal noise level (dB re external masker) | −11 | −8 |
| | α | 0.50 | 0.50 |
| | rms error (dB) | 1.0 | 1.2 |
| | % var accounted for | 91 | 78 |
| 1000 Hz | Internal noise level (dB re external masker) | −12 | −4 |
| | α | 0.45 | 0.40 |
| | rms error (dB) | 1.0 | 0.9 |
| | % var accounted for | 86 | −25 |
| 2000 Hz (Gaussian and transposed) | Internal noise level (dB re external masker) | −11 | −7 |
| | α | 0.75 | 0.85 |
| | rms error (dB) | 0.9 | 1.6 |
| | % var accounted for | 81 | 5 |
| 4000 Hz (Gaussian and transposed) | Internal noise level (dB re external masker) | −11 | −7 |
| | α | 0.75 | 0.75 |
| | rms error (dB) | 0.9 | 0.9 |
| | % var accounted for | 82 | 46 |
| 8000 Hz (Gaussian and transposed) | Internal noise level (dB re external masker) | −10 | −7 |
| | α | 0.75 | 0.85 |
| | rms error (dB) | 0.9 | 0.9 |
| | % var accounted for | 66 | −13 |
The obtained (No)±τ(Sπ)τ and (NoSπ)τ thresholds plotted in Figs. 1–3 are replotted in Figs. 6–8, respectively. The solid lines represent the predictions from the model. Visual inspection reveals that the model accounts quite well for the data obtained from both groups of listeners. More specifically, the model captures variations in threshold produced by varying CF, masking configuration, ITD, and stimulus type (Gaussian or transposed).
Symbols represent average obtained S/N (in dB) plotted as a function of ITD (μs). The triangles and circles represent the data obtained in the (No)±τ(Sπ)τ and (NoSπ)τ configurations, respectively. The upper, middle, and lower panels contain the data obtained with Gaussian noise maskers and tonal signals centered at 250, 500, and 1000 Hz, respectively. Data are replotted from Fig. 1. Solid lines represent quantitative predictions of thresholds derived from the interaural correlation-based model depicted in Fig. 5.
Same as Fig. 6, except for data obtained at CFs of 2000, 4000, and 8000 Hz. Data are replotted from Fig. 2.
Same as Fig. 7 except for data obtained with transposed maskers and signals. Data are replotted from Fig. 3.
A comparison of corresponding conditions across Figs. 7 and 8 reveals that, for the high-frequency stimuli, the model predicts, correctly, lower thresholds for Gaussian vs transposed maskers at the larger values of ITD. Thus, the model captures what we earlier termed an empirical “departure” from the literature that is consistent with data by Greenberg et al. (2012). In order to understand how this could occur, consider that predictions for thresholds obtained with Gaussian and transposed maskers at any particular CF were made using one, common, set of best-fitting values of internal noise level, α, and da. Next, consider that the envelopes of transposed stimuli are “sharper” than are their Gaussian counterparts. Thus, any given amount of added delay-line jitter that serves to obscure the “true ITD” will produce larger decreases of moment-to-moment samples of interaural coherences for transposed stimuli as compared to their Gaussian counterparts. Said differently, the additions of temporal jitter increase the variance of the sampling distributions of interaural coherence that go into the computation of da, and they would, other things being equal, be expected to do so to a greater extent for transposed stimuli as compared to Gaussian noise stimuli.
Turning to Table II, note that the best fitting values of da were 2.35 and 2.30 for the two groups, respectively. These independently determined values are, for all practical purposes, equivalent. In fact, using the value of 2.35 as the criterion da for the >7.5 dB group changed the averaged variance accounted for across CF by only a fraction of 1%.
Turning to the levels of internal noise, note that, for the ≤7.5 dB group, the levels of internal noise derived from the model at each CF are quite uniform, being −11 ± 1 dB. A similar uniformity is evident for the estimates of internal noise for the >7.5 dB group, which were −8 dB at 500 Hz, −4 dB at 1000 Hz, and −7 dB for the remaining four CFs. Note that, taken across all CFs, the estimated levels of internal noise are, on average, about 4 dB greater for the >7.5 dB group.
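The across-group difference of about 4 dB can be verified directly from the internal noise levels listed in Table II. The variable names below are hypothetical; the values are those in the table.

```python
# Internal noise levels (dB re external masker) from Table II, keyed by CF in Hz.
noise_le_7p5 = {250: -11, 500: -11, 1000: -12, 2000: -11, 4000: -11, 8000: -10}
noise_gt_7p5 = {250: -7, 500: -8, 1000: -4, 2000: -7, 4000: -7, 8000: -7}

mean_le = sum(noise_le_7p5.values()) / len(noise_le_7p5)
mean_gt = sum(noise_gt_7p5.values()) / len(noise_gt_7p5)
mean_difference_db = mean_gt - mean_le

print(round(mean_le, 1), round(mean_gt, 1), round(mean_difference_db, 1))
```

The means are −11.0 dB and about −6.7 dB, a difference of roughly 4.3 dB.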
The estimated values of α (growth of delay-line noise) are, center-frequency by center-frequency, quite similar for the two groups. They decreased monotonically from 0.65 by about 0.1 per doubling of CF between 250 and 1000 Hz, CFs for which interaural timing information is conveyed by the fine-structures of the stimuli. At the higher CFs, for which interaural timing information is conveyed by the envelopes of stimuli, estimated values of α were, for both groups of listeners, larger than those estimated at the lower frequencies and they did not vary systematically with CF. More specifically, for the ≤7.5 dB group, the estimated value of α was 0.75 at all three frequencies while for >7.5 dB group, they were either 0.75 or 0.85. These outcomes suggest that, for both groups of listeners, rate of growth of delay-line noise was greater and more uniform at high frequencies than at low frequencies.
C. Accuracy of predictions
Table II includes two indices used to evaluate the accuracy of the model's predicted thresholds. The first index is one that we have routinely employed (e.g., Bernstein and Trahiotis, 1996, 2014, 2016) to quantify variance accounted for.⁵ It reflects the degree to which using the model's predictions of individual data points is superior to using the grand mean of the data as the prediction at every point. Such an index of variance accounted for is more stringent than the often-employed r². Our index, unlike r², is sensitive to overall offsets between predicted and obtained thresholds. It also means that, as the variation of obtained data points around their grand mean decreases, the amount of variance to be accounted for, above that predicted by the grand mean, also decreases. This could result in very small or even negative amounts of variance accounted for (relative to that accounted for by the grand mean of the data) even when deviations between the model's prediction and the obtained thresholds are small or even negligible. Therefore, we guarded against instances in which one might, mistakenly, conclude that the model does a poor job accounting for the data by also computing the rms error (in dB) between the predictions of the model and the obtained data.
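The contrast between the two indices can be illustrated with a short sketch. It assumes that variance accounted for is computed as 100 × (1 − Σ(Oᵢ − Pᵢ)² / Σ(Oᵢ − Ō)²), where Oᵢ and Pᵢ are obtained and predicted thresholds and Ō is the grand mean of the obtained thresholds; that form is an assumption consistent with the verbal description above, and the thresholds used are hypothetical.

```python
import math


def percent_variance_accounted_for(observed, predicted):
    """100 * (1 - SS_residual / SS_about_grand_mean); can be negative."""
    grand_mean = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - grand_mean) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_residual / ss_total)


def rms_error_db(observed, predicted):
    """Root-mean-square deviation between predictions and observations (dB)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical thresholds (dB) that vary little with ITD, with predictions
# that are offset from them by less than 1 dB rms.
observed = [-10.0, -9.5, -10.5, -10.0]
predicted = [-9.2, -9.2, -9.2, -9.2]

print(percent_variance_accounted_for(observed, predicted))
print(rms_error_db(observed, predicted))
```

In this made-up case the rms error is below 1 dB, yet the variance accounted for is strongly negative, because there is almost no variation about the grand mean to be explained.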
The 12 rms errors reported in Table II are relatively uniform and range from 0.9 to 1.6 dB and average about 1.1 dB. This means that the predictions of the model, across the two groups of listeners, the six CFs, values of ITD, masker configuration, and stimulus type (Gaussian or transposed) are, in terms of rms error, extremely accurate. The amounts of variance accounted for, in contrast, differ, especially between the two groups of listeners. For the ≤7.5 dB group, the amounts of variance accounted for were relatively large, ranging from 86% to 91% at the lower three CFs and from 66% to 82% at the three higher frequencies. For the >7.5 dB group, amounts of variance accounted for ranged between −25% and 82% at the lower three CFs and between −13% and 46% at the three higher CFs. All of the very low or negative values of variance accounted for occurred for stimulus conditions for which behavioral thresholds varied very little as a function of ITD. The data obtained from the >7.5 dB group at 1000 Hz (Fig. 6, bottom right panel) are particularly illustrative. Note that, in this case, there is relatively little variation to predict as a function of ITD and as a function of stimulus configuration. For these data, the amount of variance accounted for was −25%, indicating that predictions of the model were poorer than predictions based on the grand mean of the data. This does not mean that the predictions of the model were inaccurate. On the contrary, as can be seen from the figure, and as indicated in the table by the rms error of 0.9 dB, the predictions of the model are quite accurate. To illustrate further, look at the data obtained for the same group of listeners at 250 and 500 Hz (top and middle right-hand panels of Fig. 6). Note that there is much more variation in the data as a function of ITD, principally because of thresholds obtained in the (No)±τ(Sπ)τ configuration. For these two conditions, the amounts of variance accounted for are 82% and 78%, respectively. 
Consistent with this and clearly evident via visual inspection, the grand mean of the data in each of the two panels would provide a very poor fit to the data. Notably, as shown in Table II, the rms errors obtained across the three CFs were highly similar, ranging from 0.9 to 1.5 dB.
Finally, it is crucial to point out that, under careful scrutiny, the levels of internal noise and values of α (delay-line jitter exponent) that yielded the best fits to the data (Table II) were found to be quite robust and valid. Specifically, we did not find other combinations of the values of the parameters that yielded essentially equivalent amounts of variance accounted for.
IV. SUMMARY AND CONCLUSIONS
As stated in Sec. I, this research focused on three inter-related issues concerning the processing of binaural information. The first concerned how the ability to compensate, internally, externally imposed ITDs is affected by the CF of the stimuli. Within our model, the precision of that compensation is indexed by α, which quantifies the rate of growth of “noise” with internal delay along the putative internal delay-line. We found that at low CFs, α decreased modestly with increasing CF while, at high frequencies, α was both larger and essentially constant across CF. This was true, independent of the listeners' hearing status.
The second goal was to assess whether and to what degree small, clinically negligible, elevations in audiometric thresholds affect binaural detection across CF. We found that such listeners exhibited modestly elevated binaural detection thresholds at low CFs and substantial elevations in detection thresholds at high CFs. These outcomes are consistent with those reported by Hawkins and Wightman (1980) and Smoski and Trahiotis (1986). They found that listeners who had high-frequency hearing loss exhibited deficits in resolution of ITD and did so for both high- and low-frequency stimuli.
The third goal concerned how well the totality of our findings could be accounted for quantitatively and explained conceptually via our interaural correlation-based modeling approach (Bernstein and Trahiotis, 2017). Quantitatively, in terms of amount of variance accounted for and rms error, the model was, overall, very accurate. Accuracy of predictions notwithstanding, perhaps the greatest contribution of the model is its ability to isolate and to explain which factor(s) are responsible for the elevated binaural detection thresholds found with the >7.5 dB group.
The results of the modeling (Table II) show that an increase in the level of stimulus-dependent interaurally uncorrelated, additive internal noise was, as far as we can tell from this complex experiment, solely responsible for the elevated thresholds obtained from the >7.5 dB group. Perhaps surprisingly, neither internal delay-line noise (α) nor underlying general sensitivity to signal-dependent changes in information derived from interaural correlation (da) proved to be diagnostic.
Three important generalizations/speculations seem appropriate. First, the stimulus conditions, psychophysical methods, and manner of partitioning listeners vis-à-vis their audiometric thresholds, yield precise, replicable, and valid results even when what might be considered to be small numbers of subjects are tested. This is so because the results of this study, which employed seven listeners, replicated important aspects of the results obtained in a recent study (Bernstein and Trahiotis, 2016). In that study, we employed many more listeners (31) using a small subset of the stimulus conditions employed in the present study.
Second, visual inspection of differences in binaural detection thresholds associated with any of the five independent variables examined [CF, ITD, masker configuration, masker-type (Gaussian or transposed), hearing status] not only does not appear to afford any intuitive understanding, but may be highly misleading. For example, visual inspection of the high-frequency data in Fig. 2 might lead one to conclude that relatively shallow slopes relating binaural detection thresholds to magnitude of ITD are indicative of relatively low rates of growth of internal delay-line noise with increasing magnitude of internal delay. The values of α obtained via the model, however, show this not to be the case. In fact, the slopes one observes across all of the data represent interplays among additive internal noise, rate of growth of delay-line noise, and the temporal characteristics of the stimuli themselves. In our view, understanding which factors determine binaural detection thresholds in any suitably complex modern experiment requires a theoretical framework that is not ad hoc but, rather, the result of a history of validation via application to a variety of relevant previously obtained sets of data.
Third, considering the experiments themselves and their quantitative analysis in the context of the results of previous investigations, the degradations of binaural processing that we found to accompany slight hearing loss likely stem from “noisy” monaural inputs to a binaural comparator. More specifically, that surmise is consistent with a host of prior monaural studies (e.g., Green, 1960; Spiegel and Green, 1981; Raab and Goldberg, 1975) and binaural studies (e.g., Durlach, 1972; van der Heijden and Trahiotis, 1997, 1999). Those studies characterize stimulus-dependent additive internal noise, the level of which grows with the level of the external stimulus, as an important factor that limits the fidelity of auditory processing or coding.
Finally, for archival purposes, it is worth addressing recent research within the auditory community concerning “hidden hearing loss.” That term was coined by Schaette and McAlpine (2011) and referred to their finding of abnormal auditory brainstem neural potentials in people who experienced tinnitus, but who, when tested clinically, exhibited normal audiograms. Schaette and McAlpine's findings are conceptually consistent with the outcomes of several modern anatomical and physiological studies conducted on animals, including primates (e.g., Kujawa and Liberman, 2009; Sergeyenko et al., 2013; Valero et al., 2017). Such studies have shown that cochlear synaptopathy may be present in the absence of measurable losses in absolute hearing sensitivity.
On that basis, one might hypothesize that perceptually significant peripheral auditory neural coding deficits may be present in human listeners classified as having clinically normal hearing on the basis of pure-tone audiometry. That hypothesis, which we construe as an extension of Schaette and McAlpine's notion of hidden hearing loss, has, either overtly or covertly, motivated many investigations seeking to discover behavioral evidence of auditory deficits in people categorized via standard audiometry as having normal hearing. It will be interesting to see if future investigations designed specifically to reveal hidden hearing loss capitalize on the measures of binaural detection performance and modeling reported herein.
ACKNOWLEDGMENTS
The authors thank Dr. Steven van de Par and Dr. Marcel van der Heijden for their help and insights with regard to the computational implementation of our quantitative model. The authors also thank the three reviewers and the Associate Editor, Dr. Chris Stecker, for their helpful comments that served to strengthen the presentation. This research was supported by the Office of Naval Research (ONR Award No. N00014-15-1-2140).
For two of the listeners, the ordering of the CFs was 250, 1000, 4000, 500, 2000, 8000 Hz.
The mean HL measured at 4 kHz for the ≤7.5 dB group was 1.7 dB (sd = 2.2 dB); the mean HL measured at 4 kHz for the >7.5 dB group was 15.4 dB (sd = 4.9 dB). A t-test confirmed a statistically significant difference (p = 0.008) in HL at 4 kHz between the two groups. The mean age for the ≤7.5 dB group was 40.5 yrs (sd = 13.8 yrs); the mean age for the >7.5 dB group was 50.3 yrs (sd = 11.6 yrs). A t-test indicated no statistically significant difference (p > 0.05) in age between the two groups.
Note that employing a cutoff frequency of 850 Hz for the envelope low-pass filter at the CF of 2000 Hz causes that filter to have no practical effect. That is because the low-pass cutoff of the “synchrony” filter employed within the model, which precedes the envelope low-pass filter, is 425 Hz.
Bernstein and Trahiotis (2017) demonstrated that the best-fitting values of da required to account for the data obtained in a wide variety of binaural detection studies were typically higher than 0.78, which is the value of d' corresponding to the targeted 71% correct performance in our adaptive, two-alternative, forced-choice task. They argued that such an outcome means that, from the standpoint of the model, adding the signal to the masker provided more information than was ultimately provided to the listener. That is, within the context of signal detection theory, some potentially useful information regarding the distributions of interaural correlation for masker-alone and signal-plus-masker is lost. For this reason, as was the case for the modeling in our earlier publication, da was a free parameter within the fitting procedures. Even so, a single value of da was found to account well for the data across all conditions and across the two groups of listeners.
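The correspondence between 71% correct and a d' of 0.78 can be checked with a short calculation. The sketch below assumes the conventional equal-variance Gaussian relation for a two-alternative, forced-choice task, d' = √2 · z(proportion correct); that relation and the function name are illustrative assumptions, not details taken from the model's implementation.

```python
from math import sqrt
from statistics import NormalDist


def dprime_2afc(proportion_correct: float) -> float:
    """Return d' for a two-alternative, forced-choice task.

    Assumes the standard equal-variance Gaussian signal detection
    relation d' = sqrt(2) * z(PC), where z is the inverse of the
    standard normal cumulative distribution function.
    """
    if not 0.0 < proportion_correct < 1.0:
        raise ValueError("proportion_correct must lie strictly between 0 and 1")
    z_score = NormalDist().inv_cdf(proportion_correct)
    return sqrt(2.0) * z_score


if __name__ == "__main__":
    # Targeted performance level of the adaptive task (71% correct).
    print(round(dprime_2afc(0.71), 2))
```

Under that assumption, a best-fitting da above this value indicates that the model extracts more information from the stimuli than the listeners' performance reflects.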
The formula used to compute the percentage of the variance for which our predicted values of threshold accounted was $100 \times \left(1 - \frac{\sum_i (O_i - P_i)^2}{\sum_i (O_i - \bar{O})^2}\right)$, where $O_i$ and $P_i$ represent individual observed and predicted values of threshold, respectively, and $\bar{O}$ represents the mean of the observed values of threshold.
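As an illustration, the percentage of variance accounted for can be computed as follows. This is a minimal sketch that assumes the conventional form, 100 × (1 − Σ(observed − predicted)² / Σ(observed − mean observed)²); the function name and the threshold values in the usage example are hypothetical and are not data from this study.

```python
def percent_variance_accounted_for(observed, predicted):
    """Percentage of variance in `observed` accounted for by `predicted`.

    Assumed form: 100 * (1 - SS_residual / SS_total), where SS_total is
    computed about the mean of the observed values.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    # Sum of squared deviations of predictions from observations.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations of observations from their mean.
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 100.0 * (1.0 - ss_residual / ss_total)


if __name__ == "__main__":
    # Hypothetical thresholds in dB, for illustration only.
    observed_thresholds = [10.0, 12.0, 14.0, 16.0]
    predicted_thresholds = [11.0, 12.0, 14.0, 15.0]
    print(percent_variance_accounted_for(observed_thresholds, predicted_thresholds))
```

Note that this measure can be negative when the predictions fit worse than the mean of the observed values would.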