Klein-Hennig et al. [J. Acoust. Soc. Am. 129, 3856–3872 (2011)] introduced a class of high-frequency stimuli for which the envelope shape can be altered by independently varying the attack, hold, decay, and pause durations. These stimuli, originally employed for testing the shape dependence of human listeners' sensitivity to interaural temporal differences (ITDs) in the ongoing envelope, were used to measure the lateralization produced by fixed interaural disparities. Consistent with the threshold ITD data, a steep attack and a non-zero pause facilitate strong ITD-based lateralization. In contrast, those conditions resulted in the smallest interaural level-based lateralization.
1. Introduction
Psychoacoustic investigations of human listeners' binaural sensitivity to ongoing amplitude modulated stimuli have often been performed with envelope waveforms that were generated with temporally symmetric or stochastic functions such as sinusoidal amplitude modulation, transposed tones, or raised sine modulation (e.g., Bernstein and Trahiotis, 2009), two-tone complexes, or narrow band noises (McFadden and Pasanen, 1976). In contrast, Klein-Hennig et al. (2011) altered the rising envelope segment (attack) duration and the decaying envelope segment duration independently and found that for stimuli with a steep, 1.25-ms attack and shallow 18.75-ms decay flank the threshold interaural time difference (ITD) was about 70% lower than for the temporally reversed stimulus with shallow attack and steep decay.
Here a subset of the stimuli from Klein-Hennig et al. (2011) is used to measure the extents of lateralization for given ITDs and interaural level differences (ILDs). Most previous studies measuring lateralization as a function of envelope ITD only tested ITDs within the human physiological range or up to 1 ms (e.g., Bernstein and Trahiotis, 1986, 2012). In most cases, the lateralization increased up to the maximum ITD value measured, leaving it unclear how much further it would increase for ITDs larger than 1 ms. Thus, the current study measured an extended ITD range up to 2 ms.
Testing this extended range is also relevant for the long-term goal of improving lateralization in cochlear implant (CI) subjects. These subjects typically do not have access to temporal fine-structure ITDs and rely on ILDs and envelope ITDs. Our hypothesis is that the range of perceivable extents of lateralization is not fully exploited with envelope ITDs limited to the human physiological range. A verification of this hypothesis would promote the development of ITD enhancement algorithms for the CI-specific “envelope only” situation.
2. Methods
Extent of lateralization was measured using an ILD-based acoustic pointer technique in six normal-hearing listeners (pure tone thresholds equal to or below 20 dB) aged between 24 and 35 years. One subject was author M.K.H., and one subject was a Ph.D. student from the laboratory. The other four subjects were university students and received compensation on an hourly basis. The stimuli were presented via HD 650 headphones (Sennheiser electronic GmbH & Co. KG, Wedemark-Wennebostel, Germany), connected to a Fireface UCX soundcard (RME Audio, Haimhausen, Germany). A PowerMate USB knob (Griffin Technology, Nashville, TN) was used as the input device for the subjects.
The target stimuli were generated in the same way as in Klein-Hennig et al. (2011) and represent a subset of the envelope waveforms from that study. In all conditions of the current study a tonal 4-kHz carrier was fully amplitude modulated with rates between 33 and 50 Hz. Rising and decaying segments were the respective half-cycles of squared sinusoids. For the first comparison all three stimuli had short flank durations of 1.25 ms, and the modulation frequency was always 50 Hz. The duty cycle was varied parametrically with pause durations of 0, 8.75, and 17.5 ms. For the second comparison, the attack and decay flank steepness was varied by setting their durations to either 1.25 or 18.75 ms. The third comparison tested the influence of the stimulus level for 50-Hz sinusoidally amplitude modulated (SAM) tones. Stimuli were 400 ms in duration and were gated simultaneously to both ears with 100-ms (onset) and 10-ms (offset) raised-sine ramps. ITDs were applied before the gating and only to the envelope. The target level was set to 65 dB sound pressure level (SPL) rms for the SAM tone (except for one condition with 45 dB SPL). The other conditions were then measured with the same peak level as the 65 dB SAM tone, which corresponded to slightly different shape-dependent rms values. Binaurally uncorrelated low-frequency noise was added to the target in order to mask potential distortion products. The noise was gated with 50-ms raised-sine ramps. It had a duration of 600 ms and was temporally centered around the target (as in Klein-Hennig et al., 2011; Bernstein and Trahiotis, 2012), i.e., it started 100 ms prior to the target. The noise had a flat spectrum up to 200 Hz; beyond 200 Hz its spectral density decreased by 3 dB per octave. It was filtered with a fifth-order 1000-Hz low-pass filter.
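The envelope construction described above can be illustrated with a short Python sketch. It is not the code used in the study: the names `envelope_value` and `make_target`, the 48-kHz sampling rate, and the hold durations (chosen here so that one 50-Hz cycle lasts 20 ms) are assumptions made for illustration, and the gating ramps, level calibration, and masking noise are omitted.

```python
import math

def envelope_value(t_ms, attack_ms, hold_ms, decay_ms, pause_ms):
    """Periodic attack/hold/decay/pause envelope (0..1) evaluated at time t_ms."""
    period = attack_ms + hold_ms + decay_ms + pause_ms
    t = t_ms % period                      # wrap into one modulation cycle
    if t < attack_ms:                      # rising half-cycle of a squared sine
        return math.sin(0.5 * math.pi * t / attack_ms) ** 2
    t -= attack_ms
    if t < hold_ms:                        # sustained maximum
        return 1.0
    t -= hold_ms
    if t < decay_ms:                       # decaying half-cycle of a squared sine
        return math.cos(0.5 * math.pi * t / decay_ms) ** 2
    return 0.0                             # pause

def make_target(shape, itd_us, fs=48000, dur_ms=400.0, fc=4000.0):
    """Left/right signals; the ITD delays only the right-ear envelope (left leading)."""
    n = int(round(dur_ms * fs / 1000.0))
    left, right = [], []
    for i in range(n):
        t_ms = 1000.0 * i / fs
        carrier = math.sin(2.0 * math.pi * fc * i / fs)   # identical in both ears
        left.append(envelope_value(t_ms, *shape) * carrier)
        right.append(envelope_value(t_ms - itd_us / 1000.0, *shape) * carrier)
    return left, right

# (attack, hold, decay, pause) in ms for the three pause conditions at 50 Hz;
# the hold values are inferred from the 20-ms period, not quoted from the text.
PAUSE_SHAPES = {
    0.0: (1.25, 17.5, 1.25, 0.0),
    8.75: (1.25, 8.75, 1.25, 8.75),
    17.5: (1.25, 0.0, 1.25, 17.5),
}
left, right = make_target(PAUSE_SHAPES[8.75], itd_us=1000.0)
```

Evaluating the envelope as a function of time, rather than shifting a sampled envelope, allows ITDs that are not integer multiples of the sampling period.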
The pointer stimulus was spectrally identical to the one employed by Bernstein and Trahiotis (2011, 2012): 200-Hz wide Gaussian noise centered at 500 Hz. Its duration and gating were identical to the target. The presentation level of the pointer was 65 dB SPL which resulted in a perceived loudness comparable to the target. The starting ILD of the pointer was roved randomly by ±10 dB. The ILD of the pointer could be changed at run-time by turning the knob, with a minimum step size of 0.1 dB. One rotation of the knob corresponded to 4 dB ILD. The overall rms level of the two-channel audio signal was kept constant while changing the ILD [Dietz et al., 2013a, Eq. (5)]. Presentation of target and pointer alternated continuously with inter-stimulus-intervals of 200 ms until the subject pressed down the knob to indicate matched lateralization.
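One way to change the pointer ILD while keeping the overall rms level of the two-channel signal constant is sketched below. This is an assumed formulation for illustration and is not necessarily identical to Eq. (5) of Dietz et al. (2013a); the function name `ild_gains` and the sign convention (positive ILD means the right channel is more intense) are hypothetical.

```python
import math

def ild_gains(ild_db):
    """Linear gains (left, right) realizing ild_db with g_left**2 + g_right**2 == 2.

    Assumed convention: positive ild_db makes the right channel more intense.
    For equal-rms input channels, the summed power is then independent of ILD.
    """
    ratio = 10.0 ** (ild_db / 20.0)              # amplitude ratio right/left
    g_left = math.sqrt(2.0 / (1.0 + ratio ** 2))
    g_right = ratio * g_left
    return g_left, g_right

g_left, g_right = ild_gains(5.0)
```

With this choice a 0-dB ILD leaves both channels unchanged, and any other ILD attenuates one channel and amplifies the other so that the total power stays fixed.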
An experimental session consisted of all conditions measured once in random order. Each of seven different envelope shapes was measured in nine interaural configurations: one diotic pointer calibration condition, five non-zero left-leading ITDs (200, 600, 1000, 1400, and 2000 μs), one non-zero ILD (5 dB), and two combinations of ITD and ILD (1000 μs and ±5 dB). The pointer ILD from the pointer calibration condition was subtracted from all pointer ILDs of the dichotic conditions with the same envelope shape. As in Bernstein and Trahiotis (2012), the purpose of this correction, which was usually below 4 dB, is that a 0 dB pointer value indicates the lateralization of a diotic target. All conditions were measured six times.
3. Results
Figure 1 displays the data for three of the envelope shapes. Here the influence of pause duration is investigated. The median pointer ILDs and the interquartile range of all six individual subjects are shown, as are the normalized across subject mean data with standard error.
(Color online) Influence of the pause duration (duty cycle). The envelope waveforms of the three conditions are shown in panel (D). (A) Median individual subject data for the extent of lateralization as a function of ITD. Error bars indicate the interquartile range. The corresponding threshold ITD (from Klein-Hennig et al., 2011) is given in parentheses in the legend. (B) Individual subject data (median and interquartile range) for the extent of lateralization as a function of ILD. Open symbols are ILD-only conditions, filled symbols have an additional fixed 1-ms ITD. (C) Mean normalized ITD data (mean pointer for all envelope shapes at ITD = 600 μs set to 1 for each subject, then averaged across subjects). Error bars indicate the standard error of the mean. (E) Mean normalized ILD data and standard error.
The main observation from visual inspection of the individual subjects' ITD data [Fig. 1(A)] is that extent of lateralization increases more or less monotonically up to 2 ms ITD, which is approximately three times the ITD that human listeners experience for sound sources at 90° azimuth. The individual subjects' ILD data [Fig. 1(B)] show a systematic dependence on ILD, both in isolation (open symbols) and with an additional fixed 1-ms ITD (filled symbols). A further observation is the subject-dependent magnitude of pointer ILD: Some subjects use consistently lower pointer values (e.g., S1 and S3) for all envelope shapes and all ITD and ILD conditions. Such subject dependence is not unusual and has been reported previously in acoustic pointer studies (e.g., Bernstein and Trahiotis, 1986; Dietz et al., 2009). In order to obtain a valid average across listeners, normalization prior to averaging is beneficial (e.g., see Bernstein and Trahiotis, 2009). In the current study each subject's data were normalized by dividing them by the subject's average pointer ILD (across all envelope shapes) at 600 μs ITD. Normalization values varied from 2.74 dB for S1 to 7.96 dB for S4, with an average of 5.7 dB. Therefore, the average normalized pointer ILDs can be scaled back to a “dB pointer ILD scale” by multiplying them by 5.7 dB.
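The calibration and normalization steps can be summarized in a brief Python sketch. The function names, the dictionary layout, and all numerical values below are hypothetical and serve only to illustrate the procedure; they are not data from the study.

```python
from statistics import mean

def calibrate(raw, diotic):
    """Subtract each shape's diotic pointer ILD from its dichotic pointer ILDs."""
    return {(shape, itd): v - diotic[shape] for (shape, itd), v in raw.items()}

def normalize_subject(pointer_ild, ref_itd_us=600):
    """Divide by the subject's mean pointer ILD (across shapes) at the reference ITD."""
    ref = mean(v for (shape, itd), v in pointer_ild.items() if itd == ref_itd_us)
    return {key: v / ref for key, v in pointer_ild.items()}, ref

def across_subject_mean(normalized):
    """Average normalized pointer ILDs across subjects, condition by condition."""
    return {k: mean(s[k] for s in normalized) for k in normalized[0]}

# Hypothetical pointer ILDs in dB, keyed by (envelope shape, ITD in microseconds).
subj_a = calibrate(
    {("no pause", 600): 2.5, ("pause", 600): 4.5, ("no pause", 1000): 3.5, ("pause", 1000): 6.5},
    {"no pause": 0.5, "pause": 0.5},
)
subj_b = {key: 2.0 * v for key, v in subj_a.items()}   # a listener using larger pointer values

norm_a, ref_a = normalize_subject(subj_a)
norm_b, ref_b = normalize_subject(subj_b)
group_mean = across_subject_mean([norm_a, norm_b])
# Multiplying group_mean by the mean of the normalization values returns a dB scale.
```

In this toy example the two listeners differ only by a scale factor, so their normalized data coincide, which is the situation the normalization is meant to handle.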
A repeated-measures two-way analysis of variance (ANOVA) was performed on both the non-normalized and the normalized pointer ILD values. Results for the latter are given second, in the second set of brackets or in parentheses. Assuming α = 0.05, a significant main effect was found for both ITD [F(4,8) = 19.9; p < 0.001] [normalized data: F(4,8) = 55.5; p < 0.001] and pause duration [F(2,8) = 27.7; p < 0.001] [F(2,8) = 74.8; p < 0.001]. The interaction between ITD and pause duration was only significant in the normalized data [F(8,8) = 1.35; p = 0.23] [F(8,8) = 3.1; p = 0.005]. The proportions of variance accounted for (ω²) were determined according to Hays (1973): ITD accounted for 36% (48%) of the total variance in the data, stimulus type (pause duration) accounted for 25% (32%), and the interaction of the two factors accounted for 5% (5%). The total variance accounted for by the two factors including their interaction was therefore 66% (86%). A post hoc pairwise comparison (Tukey) revealed that, in the non-normalized data, no pair of neighboring ITDs resulted in significantly different lateralization, whereas every pair of non-neighboring ITDs did (p < 0.05). In the normalized pointer ILD data, every ITD pair except for the pair of ITD = 1.4 ms and ITD = 2.0 ms resulted in significantly different lateralization (p < 0.05). The difference between the zero-pause and the two non-zero pause durations was highly significant (p < 0.001). Consistent with visual inspection of the data, there was no significant difference between the 8.75 and 17.5 ms pause duration conditions.
There also appears to be an envelope shape dependence in the ILD-based lateralization [Figs. 1(B) and 1(E)]. In contrast to the ITD-based lateralization, for an ILD of 5 dB at zero ITD (open symbols), the lateralization is greatest for the zero-pause condition. An analysis of the correlation between ILD- and ITD-based lateralization follows at the end of this section for all envelope waveforms together.
The statistical analysis for the different flank durations (Fig. 2) follows the same procedure as described above. A significant main effect of ITD [F(4,8) = 17.3; p < 0.001] [normalized pointer: F(4,8) = 67.9; p < 0.001] and of flank duration [F(2,8) = 8.8; p < 0.001] [F(2,8) = 27.5; p < 0.001] was found. The interaction between the two factors was not significant [F(8,8) = 0.25; p = 0.98] [F(8,8) = 0.81; p = 0.60]. ITD accounted for 42% (67%) of the total variance in the data, and stimulus type accounted for 11% (14%). The total variance accounted for was 54% (83%). As in the pause duration comparison, the condition with the least ITD-based lateralization (long attack) has the strongest lateralization with a 5 dB ILD.
(Color online) Mean normalized data for different flank steepness [same format as Figs. 1(C) and 1(E)]. (A) Envelope waveforms. The condition with two steep flanks is identical to the 8.75-ms pause from Fig. 1. (B) ITD data. (C) ILD data.
For different stimulus levels (see Fig. 3), an ANOVA revealed a significant main effect of ITD [F(4,4) = 16.1; p < 0.001] [F(4,4) = 42.4; p < 0.001]. However, although level influences threshold ITD (e.g., Klein-Hennig et al., 2011; Dietz et al., 2013a), there was no significant effect of stimulus level on lateralization [F(1,4) = 0.7; p = 0.42] [F(1,4) = 3.5; p = 0.07]. Also, there was no interaction between the two factors [F(4,4) = 0.03; p = 0.998] [F(4,4) = 0.03; p = 0.999]. ITD accounted for 56% (77%) of the total variance in the data. Both factors and their interaction together accounted for 57% (78%).
(Color online) Mean normalized data for two different overall stimulus levels of an SAM tone (same format as Fig. 2). Solid line: 65 dB SPL, dashed line: 45 dB SPL. The threshold ITDs in parentheses from Klein-Hennig et al. (2011) were measured at slightly different levels: 66 dB and 48 dB, respectively.
In the first two comparisons (pause duration and flank duration) an inverse relation was observed between the ITD- and ILD-based lateralization. This relation was explored more thoroughly by evaluating it across all seven envelope shapes employed. The non-normalized and normalized mean pointer ILDs from the 5 dB ILD (zero ITD) condition were correlated with their respective values for the 1 ms ITD (zero ILD) condition. A strong and significant negative correlation of ρ = −0.86 (p = 0.01) was found (normalized pointer ILD: ρ = −0.83; p = 0.02).
Similar to the stimulus-dependent ITD-ILD trading ratio (e.g., see Stecker, 2010), the inverse relation mentioned above also results in a shape-dependent ITD-ILD matching ratio. The values range from about 80 μs/dB for the conditions with pause and steep attack (400 μs corresponding to 5 dB) to about 400 μs/dB for the condition with no pause and maximum sustain duration. For the latter condition, the trading ratio can also be interpolated from Fig. 1(E) to be 385 μs/dB (2.6 dB ILD trading 1 ms ITD).
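The two ratios quoted with their underlying values follow from a simple division, which the following Python lines reproduce. The helper name `us_per_db` is hypothetical; the inputs are the ITD and ILD values given in the text.

```python
def us_per_db(itd_us, ild_db):
    """ITD (in microseconds) that corresponds to 1 dB of ILD."""
    return itd_us / ild_db

# Matching ratio, pause and steep attack: 400 us of ITD matched the 5-dB ILD.
matching_steep_attack = us_per_db(400.0, 5.0)     # 80 us/dB

# Trading ratio, no pause: 2.6 dB of ILD traded against 1 ms of ITD.
trading_no_pause = us_per_db(1000.0, 2.6)         # rounds to 385 us/dB
```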
4. Discussion
The extent of lateralization measured in terms of pointer ILD at fixed ITDs depends strongly and systematically on the shape of the ongoing amplitude modulation. A steep attack flank and non-zero pause duration resulted in relatively large extents of lateralization, whereas the steepness of the decay flank did not influence the lateralization significantly. All of this is consistent with the threshold ITD data (Klein-Hennig et al., 2011) and with how strongly ITD modulates neural response rates (guinea pig inferior colliculus, Dietz et al., 2013b). The higher importance of the rising envelope segment has also been reported in owls (Nelson and Takahashi, 2010) and in the low-frequency domain, where the modulation onset appears to trigger the “read out” of temporal fine structure ITDs (Dietz et al., 2013c; Dietz et al., 2014).
The above-stated correspondence of ITD sensitivity and extent of lateralization is nontrivial, and several counterexamples have been reported (e.g., Domnitz and Colburn, 1977; Bernstein and Trahiotis, 2011; Stecker et al., 2013). Apparently the relation also does not hold for the dependence on overall level: Although threshold ITD depends on overall level (e.g., Klein-Hennig et al., 2011; Dietz et al., 2013a), the extent of lateralization does not reveal a significant dependence on overall level. It was previously argued (e.g., Domnitz and Colburn, 1977; Bernstein and Trahiotis, 2011) that ITD (or ILD) discrimination sensitivity is proportional to the mean-displacement-to-sound-image-width (variance) ratio. In contrast to the single target presentation in threshold experiments, the repeated presentation in the acoustic pointer paradigm can reduce the influence of the variance by effectively averaging within one experimental trial. Further implications and potential consequences of the methodological difference are discussed in Stecker et al. (2013).
Largely independent of the envelope shape, the data support our initial hypothesis by showing an increasing lateralization up to 2 ms ITD in the absence of temporal fine structure ITDs. Furthermore, Laback et al. (2011) demonstrated a similar pause-duration dependency when comparing threshold envelope ITDs for normal-hearing listeners with CI users. We therefore conclude from the current study that modulation-onset enhancement and ITD enhancement may improve localization performance with future binaural CI processors.
The conditions with 50-Hz sinusoidal modulation employed at different overall levels were included in the protocol to allow for a comparison with previously published data: At 1 ms ITD the average pointer ILD in the current study was 7.2 dB. This matches very well with a linear interpolation between the approximately 4.5 and 9 dB that Bernstein and Trahiotis (2012) reported for 32- and 64-Hz sinusoidal modulation, respectively.
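The interpolation can be checked with a few lines of Python. Whether the interpolation is best taken on a linear or a logarithmic modulation-frequency axis is not specified, so the sketch below computes both as an assumption; the helper name `interpolate` is hypothetical.

```python
import math

def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Approximate pointer ILDs (dB) at 1 ms ITD for 32- and 64-Hz modulation,
# as quoted from Bernstein and Trahiotis (2012), interpolated to 50 Hz.
on_linear_axis = interpolate(50.0, 32.0, 4.5, 64.0, 9.0)
on_log_axis = interpolate(math.log2(50.0), math.log2(32.0), 4.5, math.log2(64.0), 9.0)
```

The two estimates are about 7.0 and 7.4 dB, respectively, so the measured 7.2 dB lies between them regardless of which axis is assumed.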
Another finding is the strong negative correlation between ITD- and ILD-based lateralization. A negative correlation can also be found when comparing two previously published studies: Bernstein and Trahiotis (1986) reported a much larger extent of lateralization for 500-Hz centered noise compared to 4-kHz centered noise at a given ITD, and Bernstein and Trahiotis (2011) reported a larger lateralization for the 4-kHz noise at a given ILD. Also, when visually comparing ILD- and ITD-based lateralization in Bernstein and Trahiotis (2012), it appears as if fully modulated (32 and 64 Hz) raised sine tones (n = 8) have a stronger ITD-based lateralization, whereas sinusoidal amplitude modulation (n = 1) results in a stronger ILD-based lateralization. A possible explanation is given in the terminology of the ITD-ILD trading ratio (e.g., Stecker, 2010): It is assumed that conditions with a large ITD-based lateralization generally have very potent ITD cues with a strong weight, even if the ITD is zero. In the stimuli under investigation both ITD cues and ILD cues are always present and always have a non-zero weight. It is further assumed that the ILD weight in isolation does not depend very much on envelope shape. However, when investigating pure ILD-based lateralization, the unavoidable addition of the central (zero) ITD cue results in a central cue with a shape-dependent weight. The more potent the ITD cue, the stronger the central lateralization weight and the lower the ILD-based lateralization.
Acknowledgments
This work was funded by the European Union under the Advancing Binaural Cochlear Implant Technology (ABCIT) grant agreement (No. 304912). We thank Birger Kollmeier for continuous support and fruitful discussions. Furthermore, we thank Constantine Trahiotis for inspiring us to run the study, Leslie R. Bernstein for helpful suggestions including the ITD-ILD trading ratio argument, and Regina Baumgärtel for her input on the methods.