The roles of interaural time difference (ITD) and interaural level difference (ILD) were studied in free-field source localization experiments for sine tones of low frequency (250–750 Hz). Experiments combined real-source trials with virtual trials created through transaural synthesis based on real-time ear canal measurements. Experiments showed the following: (1) The naturally occurring ILD is physically large enough to exert an influence on sound localization well below 1000 Hz. (2) An ILD having the same sign as the ITD modestly enhances the perceived azimuth of tones for all values of the ITD, and it eliminates left-right confusions that otherwise occur when the interaural phase difference (IPD) passes 180°. (3) Increasing the ILD to large, implausible values can decrease the perceived laterality while also increasing front-back confusions. (4) Tone localization is more directly related to the ITD than to the IPD. (5) An ILD having a sign opposite to the ITD promotes a slipped-cycle ITD, sometimes with dramatic effects on localization. Because the role of the ITD itself is altered by the ILD, the duplex processing of ITD and ILD reflects more than mere trading; the effect of the ITD can be reversed in sign.
I. INTRODUCTION
A. Duplex theory
According to the duplex theory of sound localization, sounds are localized through a combination of interaural time differences (ITD) and interaural level differences (ILD). For pure tones without useful localization information in the envelope, such as onsets or modulation, the ITD dominates for low frequencies and the ILD dominates for high. The ITD in the envelope of a modulated high-frequency tone is another cue for sound localization, and this has been regarded as an extension of the duplex theory (Bernstein and Trahiotis, 1985).
The duplex theory originated in experiments by Lord Rayleigh using pure tones from tuning forks or singing flames (Strutt, 1907). Faced with the problem of identifying the boundary between low and high frequencies, Rayleigh gave a somewhat tentative answer of 500 Hz. He was certain that ITD dominated at 128 Hz and below, and he also decided that ILD dominated above 500 Hz. Two years later, Rayleigh expressed the view that ITD could not be used much above 400 Hz (Strutt, 1909).
Since then, there have been other attempts to identify a boundary for pure tones. Stevens and Newman (1936) concluded that the ITD is useful for frequencies lower than 2000 Hz and that the ILD is useful for frequencies of 4000 Hz and higher. Mills (1960) compared difference limens for ITD and ILD with the minimum audible angles that he had measured earlier in free field (Mills, 1958). The comparison led to an estimated boundary between effective ITD and effective ILD of about 1500 Hz, an octave below the boundary found by Stevens and Newman, but still notably higher than 500 Hz as suggested by Rayleigh. Sandel et al. (1955) also found a boundary at 1500 Hz.
Macpherson and Middlebrooks (2002) showed that a duplex rule applies to noise bands whereby ITD is weighted more strongly for low-passed bands and ILD is weighted more strongly for high-passed bands. Blauert (1996) suggested that if a signal contains no components above 1600 Hz, localization is dominated by the ITD and the role of the ILD is small. Headphone noise-localization experiments designed by Wightman and Kistler (1992) to simulate free-field localization of broadband noise indicated that localization judgements were dominated by ITD whenever the noise included power below 2.5 kHz.
B. Trading ITD and ILD
The duplex idea gave rise to trading experiments in which the two interaural differences were placed in opposition to compare their relative strengths (Shaxby and Gage, 1932). The initial appeal of such experiments was that localization might be the result of a single physiological process. The ILD might be effective only because it is converted into an ITD through its role in the latency of neural spikes (David et al., 1958). Alternatively, the ITD might be effective only because it is converted by a central attenuation process into an ILD (Deatherage et al., 1959). It was subsequently found that, at least to some extent, the two interaural cues are separately preserved at the perceptual level (Hafter and Carrier, 1972). Edmonds and Krumbholz (2014) found evidence for both integrated and separated ITD-ILD excitations in EEG recordings of cortex.
The simplicity of the trading concept was ultimately challenged by the fact that the trading ratios (measured in units of microseconds of ITD per decibel of ILD) for human listeners in different experiments were highly variable, spanning a range of two orders of magnitude (Whitworth and Jeffress, 1961). Also, when presented with both ITD and ILD, some listeners heard two images, a “time image” and an “intensity image.” Different listeners exhibited greater sensitivity to one or another of these interaural differences, and experiments were done to see if these tendencies could be reversed (McFadden et al., 1973). The experiments reported in the present article also made use of interaural differences in opposition. The results cast doubt on the concept of a boundary between two parallel, independent processes for interaural cues and may provide insight into the variability of historical trading ratios.
C. Transaural synthesis
The present study addressed the duplex theory for sine tones with low frequencies—less than 1000 Hz. It examined the contributions of ITD and ILD cues to tone localization in a novel way. The essential new feature was that stimuli were presented from real sources in free field and, contemporaneously, from virtual sources with independently controlled ITD and ILD parameters. Synthesis of virtual cues was based on ear-canal measurements for real and synthesis loudspeakers. Knowing the transfer functions between two synthesis loudspeakers and the two ears made it possible to create arbitrary signals in the ear canals. The virtual parametric variations were supplemented with baseline virtual presentations having parameters identical to real sources. The goal was to realize and improve the control that is normally available with headphones while achieving the realism of open-ear listening in a real-world localization task.
In principle, transaural presentation offers several advantages over headphones. As noted by Domnitz (1975), headphone experiments are subject to accidental ILD offsets (as large as 3 dB), and they are subject to variability with repeated headphone positioning [standard deviation (SD) of 1.5 dB]. Such inadvertent ILDs can lead to increased experimental variability. By contrast, transaural experiments potentially reduce the variability to negligible amounts.
A second problem with headphone experiments is that there are defined ILD and ITD stimulus ranges. There may be a defined response range as well. Over time, a listener may associate the stimulus ranges with the allowed range of responses (or with his own chosen range), leading to conclusions that include an experimental artifact based on experience with the protocol. The ILD and ITD ranges of our real and baseline trials were also defined, but not by the experiment protocol. Instead, they were defined by the listener's own anatomy as experienced in everyday listening. Trials with fixed interaural parameters do introduce arbitrary values, but in our experiment these trials were interspersed with real and baseline trials. Our response range was also a natural one, determined only by the listener's egocentric geometry.
Like headphone experiments measuring lateralization, transaural experiments measuring localization can reveal the roles of ITD and ILD in opposition, where their relative weights can be measured, and in cooperation, where they can add or act synergistically. It can be expected that localization experiments will sometimes confirm effects that are well known from years of lateralization research using headphones. Alternatively, virtual presentation may discover processing modes for interaural differences that are unique to actual localization.
II. ILD EXPERIMENTS
Most recent studies of the combined effects of ITD and ILD have investigated noise stimuli (e.g., Blauert, 1982; Wightman and Kistler, 1992; Macpherson and Middlebrooks, 2002; Rakerd and Hartmann, 2010). The experiments described in the present article use sine tones, partly because the pioneering work on combining interaural cues by Lord Rayleigh used sine tones, and because the duplex theory is most clearly expressed in terms of sine tones. Our experiments studied the role of ILD in the free-field localization of low-frequency tones: 250, 500, and 750 Hz. By means of transaural techniques, the ILD was fixed at 0, ±6, or ±12 dB. Alternatively, the ILD was set to values that naturally occurred for a given listener and source azimuth (baseline condition). Whatever the ILD, the ITD was always maintained at its natural real-source value. Constraining the ITD to free-field values is a feature that distinguishes our localization experiments from headphone experiments.
A. Overview
A listener was seated in an anechoic room facing an array of 25 loudspeakers. The speakers were all in a horizontal plane at eye level, equally separated by 7.5°, and they extended from −90° of azimuth to +90°, as shown in Fig. 1. Also, there were two loudspeakers behind the listener used for virtual source synthesis. The listener's ear canals were fitted with probe microphones to obtain data for analysis and synthesis.
During an experiment run, the frequency of the tone was constant, and the magnitude (but not the sign) of the fixed ILD (when applied) was also constant. A run began with a calibration stage, presenting a tone of fixed frequency from each of the 25 loudspeakers in sequence from left to right. The listener was aware of the regular sequence, and the experience may have served as a helpful guide to the source locations. More importantly, the tones of the sequence were recorded through the probe microphones so that the amplitudes and phases occurring in the two ear canals could be determined for each source. Then the tone was presented individually by the two synthesis speakers to calibrate the transaural computation.
After the calibration stage, the experiment trials began. A trial consisted of a tone presentation and a verbal response from the listener indicating the perceived source azimuth on a numerical scale including the full 360° of azimuth. During the presentation, the ITD and ILD were measured. The tones were 1 s in duration and had 150-ms raised-cosine envelope edges to avoid providing onset/offset information. The responses were recorded by the experimenter and stored together with the interaural data. The experiments were paced by the listener's responses.
Each trial could be one of four types. Real trial: A tone was presented from one of the 25 real speakers of the array (see Fig. 1). Baseline trial: A virtual tone, with amplitudes and phases in both ears matching those of a real trial, was presented using the two synthesis speakers. Consistent trial: A virtual tone was presented with the chosen, fixed ILD magnitude (0, 6, or 12 dB). The sign of the ILD agreed with the sign of the source azimuth. Opposing trial: A virtual tone was presented with the chosen, fixed ILD magnitude and a sign opposite to the sign of the source azimuth. (Trials at zero source azimuth were always real trials.) During an experiment run, the trial types were presented in haphazard order. It required six experiment runs for each listener, each fixed frequency, and each fixed ILD value to obtain adequate data for all 25 sound sources for real, baseline, consistent, and opposing trials.
B. Experiment details
The anechoic room (IAC 107840) had dimensions 4.3 m by 3.0 m by 2.4 m high with 90-cm foam wedges on all six surfaces. The loudspeaker array was constructed using loudspeakers having a single 3-cm driver. The speakers were 1 m from the listener. They were labeled with source numbers per Fig. 1 and also with numbers on an extended scale that circled around behind the listener out to source numbers +24 and −24, both directly in back. Listeners made responses on this integer scale of source numbers. Responses outside the range −12 to +12 (front-back reversals) were reflected across the median frontal plane in the final analysis of lateralization results, per Stevens and Newman (1936) and Wightman and Kistler (1989a,b), but these responses were separately retained to study reversals.
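The folding of front-back reversals described above can be made concrete with a short sketch. It assumes, as the text implies, that source numbers step by 7.5° and that ±24 both denote the point directly behind the listener, so reflection across the frontal plane maps azimuth φ to sign(φ)(180° − |φ|), i.e., n → sign(n)(24 − |n|). The function names are hypothetical and for illustration only.

```python
SOURCE_SPACING_DEG = 7.5  # azimuth step between adjacent source numbers (Fig. 1)
FRONT_LIMIT = 12          # |n| <= 12 lies in front; +/-24 is directly behind


def is_front_back_reversal(response: int) -> bool:
    """True if a response on the -24..+24 scale lies behind the listener."""
    return abs(response) > FRONT_LIMIT


def reflect_front_back(response: int) -> int:
    """Fold a response into the frontal range -12..+12.

    Reflection across the frontal plane maps azimuth phi to
    sign(phi) * (180 - |phi|), which on this scale is n -> sign(n) * (24 - |n|).
    """
    if not -24 <= response <= 24:
        raise ValueError("response must lie on the -24..+24 scale")
    if not is_front_back_reversal(response):
        return response
    sign = 1 if response > 0 else -1
    return sign * (24 - abs(response))


def to_azimuth_deg(source_number: int) -> float:
    return source_number * SOURCE_SPACING_DEG


raw = [3, -5, 20, -13, 24]
folded = [reflect_front_back(r) for r in raw]            # [3, -5, 4, -11, 0]
reversals = sum(is_front_back_reversal(r) for r in raw)  # 3, retained for the reversal analysis
print([to_azimuth_deg(n) for n in folded])
```

Keeping the reversal count separate from the folded value mirrors the text: folded responses feed the lateralization analysis, while the raw responses remain available for studying reversals.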
Virtual trials used a transaural technique similar to cross-talk cancellation (Schroeder and Atal, 1963). As refined by Morimoto and Ando (1980) and by Zhang and Hartmann (2010), this technique requires two synthesis speakers. The generation of a signal begins with a two-by-two matrix—the complex transfer functions between the two synthesis speakers and the two ear canals. Inverting this matrix provides the transfer functions whereby the synthesis speakers can reproduce the sound from any real source for which the signals in the two ear canals are known. The value of the technique is that the signals from the real source can be modified in an arbitrary way, affording excellent stimulus control and flexibility. In our experiments, the synthesis speakers were also 3-cm single-driver and were 50 cm from the listener's head—behind and to the right and left. Because the wavelengths of our tones were long, it was possible to leave the listener's head unrestrained and still obtain a reliable synthesis. An adjustable bar rested on top of the listener's head to remind him to keep his head facing forward.
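At a single tone frequency, the matrix inversion described above reduces to solving two complex linear equations. The sketch below is a minimal illustration, not the authors' implementation. It assumes that `h[e][k]` holds the measured transfer function from synthesis speaker k to ear e. It also assumes a particular way of imposing a fixed ILD: each ear keeps its real-source phase, so the ITD is unchanged, and the geometric-mean amplitude is preserved. The text does not say how the overall level was handled. The matrix values are made up.

```python
import cmath
import math


def impose_ild(p_left: complex, p_right: complex, ild_db: float):
    """Return ear phasors with the original phases (hence the same ITD) but a
    fixed ILD. Assumed conventions: ILD = 20*log10(|p_right|/|p_left|), and the
    geometric-mean amplitude of the two ears is preserved."""
    mean_amp = math.sqrt(abs(p_left) * abs(p_right))
    half_ratio = 10 ** (ild_db / 40)  # square root of the linear amplitude ratio
    return (cmath.rect(mean_amp / half_ratio, cmath.phase(p_left)),
            cmath.rect(mean_amp * half_ratio, cmath.phase(p_right)))


def speaker_drives(h, target):
    """Solve h @ s = target for the two synthesis-speaker drives s.

    h[e][k] is the complex transfer function from synthesis speaker k
    (0 = left, 1 = right) to ear e (0 = left, 1 = right) at the tone frequency.
    """
    (h_ll, h_lr), (h_rl, h_rr) = h
    det = h_ll * h_rr - h_lr * h_rl
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is (nearly) singular")
    t_l, t_r = target
    return ((h_rr * t_l - h_lr * t_r) / det,
            (-h_rl * t_l + h_ll * t_r) / det)


# Made-up example values, not measurements.
h = [[0.9 + 0.2j, 0.35 - 0.1j],
     [0.30 + 0.05j, 0.8 - 0.3j]]
real_left, real_right = 0.7 + 0.2j, 0.5 - 0.6j
target = impose_ild(real_left, real_right, 6.0)
s_left, s_right = speaker_drives(h, target)
```

A baseline trial corresponds to passing the real-source phasors straight to `speaker_drives` without calling `impose_ild`.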
The ear canal recordings were made using ER-7c probe tube microphones (Etymotic, Elk Grove Village, IL) with associated preamplifiers. Signals from the preamplifiers were given an additional 40 dB of gain by a second preamplifier stage and then recorded by the 16-bit analog-to-digital converters of a TDT DD1 module (Tucker-Davis Technologies, Alachua, FL) with a sample rate of 25 ksps per channel. The digital recordings were then processed by matched filtering on half-second samples to determine the amplitudes and phases of the signals in the two ears.
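For a tone of known frequency, matched filtering amounts to correlating the record with cosine and sine references. The result is exact when the half-second record spans a whole number of cycles, as it does for 500 Hz at 25 ksps (250 cycles). The sketch below assumes this approach and a sign convention in which positive ILD and ITD mean that the right ear is louder or leading. The function names are hypothetical, and the signals are synthetic.

```python
import math


def tone_phasor(samples, freq_hz, sample_rate):
    """Amplitude A and phase phi of A*cos(2*pi*f*t + phi), by correlating the
    record with cosine and sine references at the known tone frequency (a
    matched filter). Exact when the record spans a whole number of cycles."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / sample_rate
    c = sum(x * math.cos(w * k) for k, x in enumerate(samples)) * 2.0 / n   # A*cos(phi)
    s = -sum(x * math.sin(w * k) for k, x in enumerate(samples)) * 2.0 / n  # A*sin(phi)
    return math.hypot(c, s), math.atan2(s, c)


def interaural_differences(left, right, freq_hz, sample_rate):
    """Return (ILD in dB, ITD in s), positive when the right ear is louder/leading."""
    amp_l, ph_l = tone_phasor(left, freq_hz, sample_rate)
    amp_r, ph_r = tone_phasor(right, freq_hz, sample_rate)
    ild_db = 20.0 * math.log10(amp_r / amp_l)
    ipd = (ph_r - ph_l + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return ild_db, ipd / (2.0 * math.pi * freq_hz)


# Synthetic check: half a second at 25 ksps, 500 Hz (250 whole cycles).
FS, F, N = 25_000, 500.0, 12_500
true_itd = 300e-6
left = [0.5 * math.cos(2 * math.pi * F * k / FS) for k in range(N)]
right = [math.cos(2 * math.pi * F * (k / FS + true_itd)) for k in range(N)]
ild, itd = interaural_differences(left, right, F, FS)
```

Because the phase is wrapped to one cycle, this estimate only recovers the ITD modulo one period. That limitation matters for the large 750-Hz IPDs discussed in Sec. III.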
Prior to the experiment runs, the preamplifier gains were adjusted to produce a 0-dB difference between the channels when the probe microphones were coincident. (Interestingly, this adjustment was unnecessary for correct synthesis of virtual stimuli because any constant ILD and/or ITD offset in the recording system would be automatically compensated by the synthesis technique as we applied it.) The probe tubes were inserted with their tips about halfway up the ear canal. It was not necessary to approach the eardrum because at the frequencies of this experiment the pressure distribution is known to be uniform along and across the canal (Hammershøi and Møller, 1996). Further, the pressure is uniform across the tympanic membrane (Ravicz et al., 2014).
Amplitudes and phases were recorded for every trial. They were monitored by the experimenter in real time and compared with the target values based on the calibration stage to ensure that the listener had not moved in a way that would adversely affect virtual trials.
To ensure equal coverage of all the conditions, the stimuli for an experiment run came from one of six lists of 29 stimuli. Combined, the six lists included all 25 speakers with real, baseline, consistent, and opposing conditions haphazardly assigned. During a run, the stimuli of the list were presented in random order. By completing six runs, each with 29 trials, a listener heard 1 real, 2 baseline, 2 consistent, and 2 opposing trials for each of the 24 speakers with non-zero source azimuth. Each list (run) also included one trial with a real presentation from zero azimuth.
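The bookkeeping above works out as follows: 24 off-midline speakers × 7 trials (1 real, 2 baseline, 2 consistent, 2 opposing) = 168 trials, or 28 per list. With one zero-azimuth real trial added, each list has 29 trials, and the six lists total 174. The sketch below checks this arithmetic by dealing shuffled (speaker, trial type) pairs into six lists. The seeded shuffle is a hypothetical stand-in for the authors' haphazard assignment; the actual lists were fixed in advance.

```python
import random
from collections import Counter

SPEAKERS = [n for n in range(-12, 13) if n != 0]   # the 24 off-midline sources
PER_SPEAKER = {"real": 1, "baseline": 2, "consistent": 2, "opposing": 2}
N_RUNS = 6


def make_run_lists(seed=0):
    """Deal every (speaker, trial type) pair into six lists and add one real
    zero-azimuth trial to each. The shuffle stands in for the haphazard
    assignment (hypothetical)."""
    rng = random.Random(seed)
    pool = [(n, kind) for n in SPEAKERS
            for kind, count in PER_SPEAKER.items() for _ in range(count)]
    rng.shuffle(pool)
    per_run = len(pool) // N_RUNS                  # 168 / 6 = 28
    runs = []
    for i in range(N_RUNS):
        trials = pool[i * per_run:(i + 1) * per_run] + [(0, "real")]
        rng.shuffle(trials)                        # random order within a run
        runs.append(trials)
    return runs


runs = make_run_lists()
totals = Counter(trial for run in runs for trial in run)
```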
The tone level at the listener's position was nominally 70 dBA, but it was attenuated by a random amount on each trial, by as much as 5 dB, to prevent listeners from using loudness or distortion as cues.
C. Listeners
There were five listeners in the experiments, A, B, C, D, and E, all males between the ages of 21 and 29 except for listener B, who was 62. Listeners had normal audiometric thresholds from 250 to 4000 Hz. Listeners B and E were authors. The other listeners were assistants in the lab who also had considerable experience in sound localization tasks. The experiments followed procedures approved by the Michigan State University Institutional Review Board.
III. INTERAURAL DIFFERENCE RESULTS
The real-source trials in our experiments measured ILDs and ITDs. These physical differences are described in this section; they were found to be important in the interpretation of the perceptual results.
A. ILDs
According to the duplex theory, listeners do not use ILDs to localize low-frequency tones. A common explanation is that the ILDs are too small to be useful. At low frequencies, the wavelengths are so long that waves diffract around the head causing the levels to be essentially identical in the two ears, whatever the azimuth of the source.
The small-ILD explanation is certainly appropriate for a 128-Hz plane wave, as envisioned by Rayleigh (Strutt, 1907). According to the spherical head model, the ILD for a plane-wave tone at 128 Hz is never greater than 0.02 dB. However, at the frequencies of our experiments, ILDs become easily detectable. The spherical head model gives maximum ILDs of 0.3, 1.8, and 3.9 dB for plane waves of 250, 500, and 750 Hz, respectively.¹ Therefore, even for sources at infinite distance there are usable ILDs in tones in a frequency range that is often considered low. This point was previously made by Wightman and Kistler (1992). When the source is closer to the listener, the ILDs become larger (Brungart and Rabinowitz, 1999). For a source at 1 m, those maximum ILDs become 2.5, 3.8, and 5.2 dB according to the spherical head model.
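The plane-wave spherical-head ILDs can be computed from the standard rigid-sphere scattering series. The surface pressure relative to free field is H = (1/μ²) Σ (−i)^(m−1) (2m+1) P_m(cos θ) / h′_m(μ), where μ = ka, h_m is the spherical Hankel function of the first kind, and θ is the angle between the source direction and the ear. The sketch below is illustrative and rests on several assumptions not stated in the text: a head radius of 8.75 cm, a speed of sound of 343 m/s, and ears at ±90°. The exact numbers it prints depend on those choices and have not been checked against the footnoted values.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_RADIUS = 0.0875    # m; a common average value, assumed here


def hankel_derivatives(x, n_terms):
    """h_m'(x), m = 0..n_terms-1, for the spherical Hankel function of the
    first kind, by upward recurrence (stable for the Hankel function)."""
    h = [-1j * cmath.exp(1j * x) / x,
         -cmath.exp(1j * x) * (x + 1j) / x ** 2]
    for m in range(1, n_terms):
        h.append((2 * m + 1) / x * h[m] - h[m - 1])
    dh = [-h[1]]                                   # h_0' = -h_1
    for m in range(1, n_terms):
        dh.append(h[m - 1] - (m + 1) / x * h[m])
    return dh


def legendre(n_terms, c):
    """P_0(c) .. P_{n_terms-1}(c) by Bonnet's recurrence."""
    p = [1.0, c]
    for m in range(1, n_terms - 1):
        p.append(((2 * m + 1) * c * p[m] - m * p[m - 1]) / (m + 1))
    return p[:n_terms]


def surface_response(freq_hz, cos_theta):
    """Plane-wave pressure on a rigid sphere relative to free field;
    theta is the angle between the source direction and the ear point."""
    x = 2 * math.pi * freq_hz * HEAD_RADIUS / SPEED_OF_SOUND   # ka
    n_terms = int(x) + 15                                     # series converges fast for small ka
    dh = hankel_derivatives(x, n_terms)
    p = legendre(n_terms, cos_theta)
    series = sum((-1j) ** (m - 1) * (2 * m + 1) * p[m] / dh[m]
                 for m in range(n_terms))
    return series / x ** 2


def ild_db(freq_hz, azimuth_deg):
    """Right-minus-left level difference for ears at +/-90 degrees."""
    s = math.sin(math.radians(azimuth_deg))   # cosine of the angle to the right ear
    right = abs(surface_response(freq_hz, s))
    left = abs(surface_response(freq_hz, -s))
    return 20 * math.log10(right / left)


def max_ild_db(freq_hz):
    return max(ild_db(freq_hz, az) for az in range(0, 91))


for f in (128, 250, 500, 750):
    print(f, round(max_ild_db(f), 2))
```

The plane-wave series covers only the infinite-distance case. The larger 1-m values quoted above would require the finite-distance (point-source) form of the model.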
In our experience, ILDs measured on human listeners are typically larger, and potentially more useful, than the spherical head model predicts (Cai et al., 2015). Our real trials provided three measurements of the ILD for each listener, frequency, and source azimuth. Figure 2 shows the ILDs, averaged over the five listeners, where ILDs become as large as 8 dB. However, the plots become rather flat for the higher source azimuths. For 500 Hz, most of the ILD variation with source azimuth occurs within 30° of the forward direction (azimuth zero). Therefore, although these ILDs are easily large enough to be useful, they provide little basis for discrimination of azimuths greater than 50°. For 750 Hz, the data show large error bars and a pronounced left-right asymmetry—mainly caused by one listener. Our protocol ensured that real-source data were collected across runs maximally separated in time, and that may have contributed to the variability.
B. IPDs
Our measurements of ITDs on real trials were converted to interaural phase differences (IPDs) by multiplying the ITDs by the frequency. The IPD representation better separates the functions for the different frequencies. As shown in Fig. 3, the IPDs show a remarkable consistency across listeners; most of the SDs are smaller than the plotting symbols. The IPDs for 250-Hz tones were all within the range −90° to +90°, and these provide unambiguous cues for localization. The IPDs for 500 Hz can have magnitudes greater than 90°, where the experiments by Sayers (1964) and by Domnitz and Colburn (1977) indicate that average localization begins to revert towards the midline, and experiments by Yost (1981) indicate that the IPDs occasionally lead to false cues pointing to the wrong side (left-right reversals). The IPDs for 750 Hz can have magnitudes greater than 180°, but Fig. 3 does not reduce these large IPDs to the central range −180° to +180°. It proved to be more useful to let the signs of large IPDs be equal to the signs of the azimuths of the sources. However, logically and psychologically, these IPDs, by themselves, cue a direction opposite to the true azimuth, and they could lead to confusion.
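The conversion above, and the reason IPDs beyond 180° are ambiguous, can be shown in a few lines. ITD × frequency gives the IPD in cycles, and multiplying by 360 gives degrees. Reducing that value to the central range (−180°, 180°] shows which side the phase alone would cue. The 700-μs ITD below is an illustrative value, not a measurement from Fig. 3.

```python
def ipd_from_itd(itd_s: float, freq_hz: float) -> float:
    """IPD in degrees: ITD times frequency gives cycles; times 360 gives degrees."""
    return 360.0 * freq_hz * itd_s


def wrap_deg(angle_deg: float) -> float:
    """Reduce an angle to the central range (-180, 180]."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


# Illustrative ITD (not a measured value): 700 microseconds at 750 Hz.
unwrapped = ipd_from_itd(700e-6, 750.0)   # about 189 degrees, source on the right
central = wrap_deg(unwrapped)             # about -171 degrees: by itself, a left-side cue
```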
C. Test of transaural synthesis
To test the accuracy of our transaural technique, we compared ILDs measured during real trials with ILDs for baseline trials. These two kinds of trials ought to lead to identical ILDs. The root-mean-square (RMS) discrepancies between real and baseline ILDs are given in Table I.² The table shows that the RMS discrepancies increased with increasing frequency so that at 750 Hz, the average discrepancy grew to 0.5 dB. A similar test of accuracy was made for ITDs, and those data appear in Table II. The table shows no systematic frequency dependence. The average RMS discrepancy was 14 μs. This discrepancy can be compared with the minimum just noticeable difference found by Brughera et al. (2013), which was between 10 and 20 μs.
Table I. RMS discrepancy (dB) between ILDs measured on real and baseline trials.

| Listener | 250 Hz | 500 Hz | 750 Hz |
|---|---|---|---|
| A | 0.14 | 0.26 | 0.39 |
| B | 0.08 | 0.20 | 0.22 |
| C | 0.13 | 0.19 | 0.38 |
| D | 0.12 | 0.34 | 0.73 |
| E | 0.14 | 0.41 | 0.80 |
| Average | 0.12 | 0.28 | 0.50 |
Table II. RMS discrepancy (μs) between ITDs measured on real and baseline trials.

| Listener | 250 Hz | 500 Hz | 750 Hz |
|---|---|---|---|
| A | 19.6 | 12.9 | 13.5 |
| B | 14.4 | 7.8 | 9.3 |
| C | 12.2 | 19.8 | 10.9 |
| D | 13.3 | 15.2 | 17.2 |
| E | 13.2 | 12.7 | 19.2 |
| Average | 14.5 | 13.7 | 14.0 |
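The accuracy measure used in Tables I and II is the RMS of the paired differences between real-trial and baseline-trial values. The sketch below shows that computation (the function name is hypothetical) and recomputes the "Average" rows from the per-listener entries. The recomputed means agree with the tabled averages to rounding, and the grand mean of Table II comes to about 14 μs.

```python
import math

# Per-listener RMS discrepancies from Tables I (dB) and II (microseconds),
# columns 250, 500, 750 Hz.
TABLE_I = {"A": (0.14, 0.26, 0.39), "B": (0.08, 0.20, 0.22), "C": (0.13, 0.19, 0.38),
           "D": (0.12, 0.34, 0.73), "E": (0.14, 0.41, 0.80)}
TABLE_II = {"A": (19.6, 12.9, 13.5), "B": (14.4, 7.8, 9.3), "C": (12.2, 19.8, 10.9),
            "D": (13.3, 15.2, 17.2), "E": (13.2, 12.7, 19.2)}


def rms_discrepancy(real, baseline):
    """RMS of paired differences between real-trial and baseline-trial values."""
    if len(real) != len(baseline) or not real:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((r - b) ** 2 for r, b in zip(real, baseline)) / len(real))


def column_means(table):
    return [sum(col) / len(col) for col in zip(*table.values())]


ild_avg = column_means(TABLE_I)    # close to the "Average" row: 0.12, 0.28, 0.50
itd_avg = column_means(TABLE_II)   # close to 14.5, 13.7, 14.0
overall_itd = sum(itd_avg) / 3     # about 14 microseconds
```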
To test for variability in the synthesis, we compared ITDs and ILDs recorded on baseline trials for blocks of runs at 500 Hz using three different fixed ILDs. Choosing runs in this way ensured that they were done on different days, with the probe tubes reset and in different contexts; yet the baseline trials ought to have been the same. Averaging across the 24 speakers, the SDs in ILD for listeners A, B, C, D, and E were, respectively, 0.52, 0.40, 0.27, 0.33, and 0.47 dB. These values can be compared with an SD of 1.5 dB found at 500 Hz by Domnitz (1975) in experiments where headphones were repeatedly placed on the listener's head. The corresponding ITD SDs for listeners A, B, C, D, and E were, respectively, 17, 15, 14, 15, and 18 μs, which can be compared with the 56 μs found at 500 Hz by Domnitz. These comparisons indicate that the transaural synthesis technique compares favorably with headphone experiments at these frequencies.
IV. LOCALIZATION FOR BASELINE STIMULI
The trials employing baseline virtual stimuli form a good basis of comparison for the fixed-ILD trials (described in Secs. V and VI) because the same number of trials was done for each of these trial types (48 for each listener and frequency) and because baseline stimuli had the same ITD and ILD values as real sources. The localization responses that each listener gave on baseline trials are shown in Fig. 4.³ Each response, in degrees of azimuth (y-axis), is plotted against the IPD recorded in the ear canals on the corresponding trial (x-axis). The IPD scale extends to values as large as 240°, with signs chosen to agree with the signs of the source azimuths. There were three frequencies: 250 Hz (diamonds), 500 Hz (triangles), and 750 Hz (circles).
V. ZERO ILD
Our experiments included virtual trials with an ILD of zero. For this condition, the variation in listener responses can be attributed exclusively to the ITD.
A. Comparison with headphone experiments
The perception of source lateralization caused by the ITD when the ILD is zero is important psychoacoustically because it is an unbiased mapping from ITD to spatial coordinates. It has been measured using headphones, where a nominal zero ILD can be obtained (e.g., Sayers, 1964; Yost, 1981; Schiano et al., 1986; Zhang and Hartmann, 2006).
To determine the ILDs actually present in our experiments, and to further test the possibility that transaural techniques can improve upon headphone experiments, we measured the distributions of ILDs for the “zero-ILD” experiment for the five listeners and three frequencies. The fifteen values of mean and SD are given in Table III. The largest mean was 0.17 dB and the largest SD was 0.55 dB, both for listener E at 750 Hz. The second-largest values were about half as large. These errors are smaller than the maximum headphone error of 3 dB measured by Domnitz (1975). The SDs from Table III were significantly less than Domnitz's value of 1.5 dB [t(14) = 45.3; p < 0.0001]. Evidently, the transaural presentation substantially reduced the inadvertent ILD problem.
Table III. Mean (SD) of the ILD, in dB, measured on zero-ILD virtual trials.

| Listener | 250 Hz | 500 Hz | 750 Hz |
|---|---|---|---|
| A | 0.10 (0.24) | −0.04 (0.23) | 0.01 (0.27) |
| B | −0.05 (0.14) | 0.02 (0.14) | −0.04 (0.12) |
| C | 0.09 (0.13) | 0.09 (0.23) | −0.05 (0.13) |
| D | 0.03 (0.15) | −0.07 (0.13) | −0.02 (0.11) |
| E | −0.04 (0.12) | 0.01 (0.18) | 0.17 (0.55) |
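The t statistic quoted for Table III can be checked from the fifteen SDs. The text does not name the test. The sketch below assumes a one-sample t-test of the SDs against Domnitz's 1.5 dB, and under that assumption it reproduces the reported t(14) = 45.3. No p-value is computed here, because that would require a t-distribution routine outside the standard library.

```python
import math
import statistics

# SDs (dB) from Table III, listeners A-E, each at 250, 500, 750 Hz.
TABLE_III_SDS = [0.24, 0.23, 0.27,
                 0.14, 0.14, 0.12,
                 0.13, 0.23, 0.13,
                 0.15, 0.13, 0.11,
                 0.12, 0.18, 0.55]
HEADPHONE_SD_DB = 1.5  # Domnitz (1975), repeated headphone placement


def one_sample_t(values, reference):
    """t statistic (reference - mean) / SE and its degrees of freedom."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (reference - statistics.mean(values)) / se, n - 1


t_value, dof = one_sample_t(TABLE_III_SDS, HEADPHONE_SD_DB)
print(f"t({dof}) = {t_value:.1f}")   # prints: t(14) = 45.3
```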
B. Localization for zero ILD
For each listener and each frequency, there were 96 virtual, zero-ILD trials, four for each of the 24 sources. The localization responses are plotted against IPD in Fig. 5. If the responses were perfect, with response azimuths exactly equal to the source azimuths, then this figure would be the inverse of Fig. 3. That ideal result is actually a reasonable description of the responses for 250 Hz. Whereas the diamonds in Fig. 3 show that the IPD for 250 Hz resembles a flat, horizontal function of source azimuth for source azimuths near ±60° (where IPD magnitudes lie between 60° and 80°), the diamonds in Fig. 5 resemble a vertical function at those same limits for listeners A, B, C, and E. Listener D rarely used the full range of responses. This comparison shows the strong dominance of ITD at 250 Hz.
A similar ideal pattern holds for 500 Hz (triangles) for listeners C and E, and also for B to a lesser degree. Response azimuths for 500 Hz increased as the IPD surpassed 90°. Remarkably, even though IPD magnitudes became as large as 150°, there were no responses with the wrong sign (sign opposite to the azimuth) at 500 Hz.⁴ This result contrasts with the opposite-side responses sometimes seen in headphone experiments for IPDs in this range (e.g., Sayers, 1964). A possible explanation is that although IPDs as large as 150° normally indicate the correct side, opposite-side responses occur in headphone experiments when small, inadvertent ILD variations overwhelm them or cause them to be reinterpreted.
For 750 Hz, eight of our 25 sources led to mean IPD magnitudes between 180° and 210°, as shown in Fig. 3. Figure 5 shows that these sources frequently evoked opposite-side responses for all the listeners, but the response patterns were individualistic. Responses for listeners A, C, and E were biased to the left (negative response azimuths). Listeners B and D made extreme responses on both sides, as though IPD magnitudes near 180° sometimes cued a location on the far left and sometimes on the far right. Bimodal responses like these have been seen before. They were seen by Sayers (1964), who also saw responses on the midline. Two out of three listeners in the pure-tone experiments (noise-band pointer) by Bernstein and Trahiotis (1985) also exhibited bimodal responses. By contrast, the pure-tone adjustment techniques employed by Young (1976) and by Domnitz and Colburn (1977) led to responses near the midline for an IPD of 180°. We did not see responses near the midline. The difference may be caused by averaging, either over trials or within a trial during the course of adjustment, as opposed to the single response to each brief tone in our experiments.
Figure 5 shows that opposite-side responses also occurred for 750-Hz tones when the IPD magnitude was near 120° (listeners C, D, and E), and sometimes when the IPD magnitude was as small as 100° (listeners A and B). The observation of opposite-side responses for these relatively small IPD magnitudes at 750 Hz, in contrast with no such responses at 500 Hz, might be understood in terms of slipped-cycle localization, as will be suggested in Sec. VI.
C. Comparison with baseline
In our computational procedure, the localization responses for fixed ILD (0, 6, or 12 dB) were compared with responses to baseline stimuli, where the ILDs were normal. The particular baseline trials chosen for comparison were those occurring in the same runs as the fixed ILD, making baseline trials contemporaneous with the fixed-ILD trials. Also, the number of trials in the baseline set was equal to the number in a fixed-ILD set.⁵
Compared to baseline trials, setting the ILD equal to zero significantly reduced the average slope of a straight line, constrained to pass through the origin, that was fitted to the data over the IPD range −60° to +60° for all listeners at 250 Hz [t(4) = 3.76, p < 0.02]. For listeners A, B, C, D, and E the ratio of the slope for zero ILD to the slope for baseline stimuli was 0.86, 0.86, 0.89, 0.73, and 0.96, respectively. Setting the ILD equal to zero also decreased the average slope of the best-fit line for most of the listeners at 500 Hz. Corresponding ratios were 0.69, 0.80, 0.70, 1.06, and 1.04. Although the average ratio, 0.85 (0.18), was considerably less than 1.0, the difference was not significant. Overall, the zero-ILD experiment indicated that in ordinary free-field listening, the ILD contributes to localization, even at 250 and 500 Hz.
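The slope comparison can be sketched as a least-squares line through the origin, b = Σxy / Σx², restricted to |IPD| ≤ 60°, followed by a test of the slope ratios against 1. The text does not name the test. Assuming a one-sample t-test of the five ratios against 1 reproduces t(4) ≈ 3.76 at 250 Hz. At 500 Hz the same test gives t ≈ 1.76, below the two-tailed 5% critical value of 2.776 for 4 degrees of freedom, which is consistent with the nonsignificant result. The function names are hypothetical.

```python
import math
import statistics


def slope_through_origin(ipd_deg, response_deg, ipd_limit=60.0):
    """Least-squares slope of response = b * IPD, using |IPD| <= ipd_limit only."""
    pairs = [(x, y) for x, y in zip(ipd_deg, response_deg) if abs(x) <= ipd_limit]
    sxx = sum(x * x for x, _ in pairs)
    if sxx == 0:
        raise ValueError("no usable points inside the IPD range")
    return sum(x * y for x, y in pairs) / sxx


def t_against_one(ratios):
    """One-sample t statistic for a mean ratio below 1 (df = n - 1)."""
    n = len(ratios)
    se = statistics.stdev(ratios) / math.sqrt(n)
    return (1.0 - statistics.mean(ratios)) / se


RATIOS_250 = [0.86, 0.86, 0.89, 0.73, 0.96]   # listeners A-E, from the text
RATIOS_500 = [0.69, 0.80, 0.70, 1.06, 1.04]
T_CRIT_DF4 = 2.776                            # two-tailed 5% critical value, df = 4

t_250 = t_against_one(RATIOS_250)   # about 3.76
t_500 = t_against_one(RATIOS_500)   # about 1.76, below T_CRIT_DF4
```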
For 750 Hz, opposite-side responses, frequently seen for zero ILD, were never seen for baseline conditions. Therefore, it is evident that the ILD plays an important role in preventing this kind of confusion in normal free-field listening. At 750 Hz, there were too few points in the IPD range −60° to +60° to compute a slope ratio, and points outside that IPD range were often on the wrong side for zero ILD.
D. IPD vs ITD
The measurements with zero ILD gave us the opportunity to readdress an old question about tone localization and lateralization, namely, whether the ITD or the IPD is the more proximally relevant measure. Given that the two measures are simply related by a factor of the frequency, the answer to the question turns on measurements made at different frequencies. Schiano et al. (1986) used headphone presentation and a laterality-matching response to determine that the ITD was the more relevant measure. Their experiments were confined to ITDs of 150 μs or less in order to avoid IPDs greater than 90° that might lead to opposite-side lateralizations. Zhang and Hartmann (2006) used headphone presentation and a laterality-rating response. Their measurements used a wider range of parameters—ITDs out to 1000 μs and IPDs out to 150°—and they dealt with the opposite-side problem by discarding responses with the opposite sign (7% of the data). They came to the same conclusion as Schiano et al., namely, that ITD, not IPD, was the relevant interaural parameter.
The comparison of IPD and ITD models in our virtual, zero-ILD trials begins with panel (a) of Fig. 6, which combines data from the five listeners in Fig. 5. The data are plotted as a function of IPD. If the IPD is the proximally relevant measure, then all the points should exhibit the same function. Clearly panel (a) indicates otherwise. By contrast, the same data plotted against ITD in panel (b) of Fig. 6 come close to exhibiting a single function, except for the opposite-side responses for large IPDs.
To quantify the comparison of ITD and IPD models, we fitted the response data with the best (least squares) straight-line function of IPD and then with the best straight-line function of ITD. The RMS deviation from a straight line was larger for the IPD fit than for the ITD fit for all listeners. Across listeners, the average ratio of RMS deviations (IPD/ITD) was 1.8, indicating that the ITD fit was almost twice as good as the IPD fit, but because the responses are not a straight-line function of ITD, this numerical comparison understates the true advantage of the ITD model over the IPD model. The evidence is unequivocal: for localization as well as lateralization, the ITD itself is the relevant measure of differences in interaural timing.
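The model comparison described above fits the same responses with an ordinary least-squares line in each predictor and compares the RMS residuals. The sketch below uses toy data, not the experimental responses. The responses depend only on ITD at three frequencies, so the ITD fit is exact, while the IPD fit leaves large residuals because a given ITD maps to a different IPD at each frequency. The text does not say whether any responses were excluded before fitting, and none are excluded here.

```python
import math


def line_fit_rms(x, y):
    """RMS residual of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.sqrt(sum((v - intercept - slope * u) ** 2 for u, v in zip(x, y)) / n)


# Toy data (not experimental): responses depend only on ITD, at three frequencies.
itd_us, ipd_deg, response = [], [], []
for f in (250.0, 500.0, 750.0):
    for itd in (-400.0, -200.0, 0.0, 200.0, 400.0):
        itd_us.append(itd)
        ipd_deg.append(360.0 * f * itd * 1e-6)
        response.append(0.1 * itd)          # degrees of azimuth

rms_itd = line_fit_rms(itd_us, response)
rms_ipd = line_fit_rms(ipd_deg, response)
```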
VI. LOCALIZATION FOR FIXED ILD, 6 AND 12 dB
The main goal of our experiments was to test the duplex theory for low-frequency pure tones (250 to 750 Hz) where the ITD is expected to dominate. Our approach examined the effects of fixed ILDs, 6 and 12 dB, in comparison with naturally occurring ILDs (baseline trials) and in comparison with zero-ILD trials. We examined the fixed ILDs when their signs were consistent with the source azimuth and when they were opposing.
The localization judgements averaged over listeners are shown in Figs. 7, 8, and 9 for frequencies 250, 500, and 750 Hz, respectively. The top panels of these figures are for consistent ILDs, and the bottom panels are for opposing ILDs. These figures are discussed below with the aid of Fig. 10, a cartoon that abstracts the salient features of the results in Figs. 7–9. Figure 10 has two purposes. It summarizes the experimental data, and it serves as a guide to our interpretation of those data.
A. Displacements
Figures 7–9 show that adding a constant ILD displaces the localization judgements in the direction favored by the ILD, and this is particularly important at small azimuths where the ITD is small. Figure 10 introduces points “D” and “D′,” the displacement, near the origin.
The D points were quantified by two procedures: (1) The response azimuth for source −1 (−7.5° source azimuth) was subtracted from the response azimuth for source +1 (+7.5°), and this difference was divided by 2. (2) A straight line was drawn through the linear region of the response plot (0° to ±60° of source azimuth) and the intercept with the vertical axis was taken as a D point. Both procedures led to similar results: The displacements were always in the direction favored by the ILD. The displacements did not depend systematically on frequency, nor did they depend on whether the ILD conditions were consistent (D) or opposing (D′), apart from the sign. The displacements depended systematically on the magnitude of the ILD. By procedure (1), D was 9.5° (SD = 4.6°) for 6 dB and 20.9° (3.2°) for 12 dB. By procedure (2), D was 9.0° (8.4°) for 6 dB and 17.6° (4.5°) for 12 dB. Thus, the two calculation methods were in reasonable agreement; the displacements in localization response were about 1.5° of azimuth per dB of ILD. Headphone experiments by Sayers (1964) found similar displacements near zero ITD caused by 9-dB ILDs. The frequency independence of the effect of ILD is consistent with lateralization judgements obtained by Yost (1981). The displacement is contrary to the experience with noise bursts reported by Wightman and Kistler (1992), who found that naturally occurring ILDs combined with ITDs near zero always led to an image on the midline, as cued exclusively by the ITD.
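The two procedures for the D points can be written out directly. In this minimal sketch the consistent-trial responses are invented (`D_TRUE` and `SLOPE` are hypothetical values), and procedure (2) is shown for the right side only; the left side mirrors it with −D. The last lines reproduce the "about 1.5° per dB" estimate from the values reported above.

```python
import math

# Hypothetical toy responses for consistent trials: a linear azimuth term plus a
# displacement of +D for right-side sources and -D for left-side sources.
D_TRUE, SLOPE = 9.0, 0.5
sources = [7.5 * k for k in range(-8, 9) if k != 0]  # -60 .. +60 deg, no 0
resp = {az: SLOPE * az + math.copysign(D_TRUE, az) for az in sources}


def displacement_proc1(resp):
    """Procedure (1): half the difference between responses to sources at +7.5 and -7.5 deg."""
    return (resp[7.5] - resp[-7.5]) / 2.0


def displacement_proc2(resp, max_az=60.0):
    """Procedure (2): vertical-axis intercept of a least-squares line through the
    responses for sources in (0, max_az]."""
    pts = [(az, r) for az, r in resp.items() if 0.0 < az <= max_az]
    n = len(pts)
    mx = sum(az for az, _ in pts) / n
    my = sum(r for _, r in pts) / n
    slope = (sum((az - mx) * (r - my) for az, r in pts)
             / sum((az - mx) ** 2 for az, _ in pts))
    return my - slope * mx


print(displacement_proc1(resp), displacement_proc2(resp))

# Reported D values (deg) for procedures (1) and (2), keyed by ILD (dB).
reported = {6.0: (9.5, 9.0), 12.0: (20.9, 17.6)}
rates = [d / ild for ild, pair in reported.items() for d in pair]
mean_rate = sum(rates) / len(rates)  # degrees of response azimuth per dB of ILD
```

On these toy data procedure (1) reads higher than procedure (2), because half the difference at ±7.5° also contains the azimuth-driven part of the response (here 0.5 × 7.5°).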
B. Consistent trials
Mean responses to consistent trials, in which the ILD had the same sign as the source azimuth, are shown in the top halves of Figs. 7, 8, and 9. These plots reflect the fact that consistent ILDs eliminated opposite-side responses for all frequencies and for all listeners. By contrast, Fig. 5 for zero-ILD trials shows frequent opposite-side responses at 750 Hz.
1. Frequency dependence
In these experiments, the ILDs were the same, independent of frequency. The ITDs were established by head diffraction, and these were similar, though not identical, for different frequencies, as was found by dividing the IPDs in Fig. 3 by the frequencies. The ITD was largest for 250 Hz. It decreased by 6% when the frequency increased to 500 Hz and decreased by another 10% when the frequency increased from 500 to 750 Hz.6 Because the physical localization cues were similar, one might expect that localization judgements would also be similar. Observation of the top halves of Figs. 7–9 shows that they were indeed similar except that responses were somewhat more lateralized for 250 Hz compared to the other two frequencies. That difference might be attributed to the larger ITD for 250 Hz, which was especially prominent for negative azimuths for our listeners.
2. Curvature and slope
Curvature and slope of the consistent responses are indicated by symbols C and S in Fig. 10. At large source azimuths, the mean responses in the top halves of Figs. 7–9 have negative curvature C. Further investigation revealed that this negative curvature was entirely caused by the fact that the IPD (ITD) is a saturating function of source azimuth, as shown by Fig. 3. When the response azimuths were plotted against IPD or ITD, the functions became straight lines with negligible curvature. The straight lines on the ITD plot made it easy to calculate the slopes away from the midline, at source azimuths from 52.5° to 90°. Averaged over 6 and 12 dB, the slopes in the upper panels of Figs. 7–9 are 66°, 75°, and 65° of response azimuth per ms of ITD for 250, 500, and 750 Hz, respectively.
The positive slopes at large source azimuth are particularly interesting for 750 Hz because, as shown in Fig. 9, the IPD passes through 180°. To be sure about the slopes in this region it was helpful to do a more exact calculation than was possible based on the average data in Fig. 9. Instead, individual responses were plotted against individual ITDs, and the average slope was calculated from those data. The result was 79° of azimuth per ms. Taking variability into account, that number was significantly positive [t(19) = 3.54; p < 0.001]. Further, the individual data indicated that positive slopes persisted for source azimuths greater than 70° where the IPD increased past 180°. (The individual data were more persuasive on this point than are the average data in Fig. 9.)
The growing influence of the ITD as the IPD increases, even through 180°, can be contrasted with experiments suggesting that the maximum influence of the ITD occurs when the IPD is about 90° (Garner and Wertheimer, 1951; Sayers, 1964; Yost, 1981). A tentative explanation for the discrepancy is that the previous studies used zero ILD. As shown in Fig. 5, for zero ILD at 750 Hz, IPDs of 120° (and sometimes smaller) led to frequent responses on the opposite side. A substantial number of opposite-side responses pulls the average response toward zero, and we suggest that this effect is the main reason for the apparent maximum near an IPD of 90° in previous studies. By contrast, the constant ILDs in our experiment prevented opposite-side responses. At the same time, our test remained a fair test of the influence of the ITD/IPD because the ILD did not depend on source azimuth; only the ITD was a variable.
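The averaging argument can be checked with a toy calculation; it is not a model of our data. In this sketch `p_opposite` is an invented logistic rule for how often a zero-ILD listener answers on the wrong side, and opposite-side answers are assumed to have the same magnitude as correct-side answers. Averaging then produces an interior maximum even though the underlying laterality grows steadily with IPD.

```python
import math


def p_opposite(ipd_deg, center_deg=135.0, width_deg=15.0):
    """Hypothetical probability of an opposite-side response, rising with IPD."""
    return 1.0 / (1.0 + math.exp(-(ipd_deg - center_deg) / width_deg))


def mean_response(ipd_deg, gain=0.5, p_fn=p_opposite):
    """Expected signed response: laterality gain*IPD on the correct side with
    probability 1 - p, and the same magnitude on the opposite side with probability p."""
    lateral = gain * ipd_deg
    p = p_fn(ipd_deg)
    return (1.0 - p) * lateral - p * lateral


ipds = range(0, 181, 5)
# Zero ILD: opposite-side answers are allowed.
peak_zero_ild = max(ipds, key=mean_response)
# Consistent ILD: opposite-side answers are suppressed (p = 0).
peak_no_flips = max(ipds, key=lambda x: mean_response(x, p_fn=lambda _: 0.0))
print(peak_zero_ild, peak_no_flips)
```

With these invented parameters the averaged zero-ILD curve peaks near 95°, while the curve without opposite-side answers keeps rising to 180°.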
3. Reversed-level effect
For a frequency of 500 Hz (Fig. 8) and azimuths greater than 45°, fixed ILDs of 6 and 12 dB led to an unexpected effect. Although one would naturally expect that increasing the ILD from 6 to 12 dB would lead to increased lateralization, Fig. 8 shows that usually this did not happen. Instead, all listeners, to some extent, exhibited a reversed-level effect, in which increasing the ILD from 6 to 12 dB led to responses with smaller azimuth. A straightforward way to study the effect was to compare responses for 6 and 12 dB for the outer six sources on both left and right sides, a total of 12 sources where the IPDs were sizeable. At 500 Hz, across the five listeners, the responses to 12 dB were significantly less lateral than for 6 dB [t(59) = 3.91; p < 0.0001]. The effect was not caused by increased variance in responses at 12 dB because the size of the reversed-level effect exceeded the change in SD on 6 of 8 tests. A reversed-level effect was also seen for listeners A and C at 250 Hz; it can be seen in Fig. 7 (250 Hz) for negative source azimuths.
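The reported t(59) fits a paired comparison over 60 listener-source pairs (5 listeners × 12 sources), but the text does not spell out the test, so the pairing below is an assumption. The sketch computes a paired t statistic on hypothetical response magnitudes (absolute azimuths, so that left and right sources pool); a p-value would additionally need the t distribution, for example from SciPy.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical |response azimuth| values (deg) for the same listener-source pairs.
lat_6db = [70.0, 65.0, 80.0, 75.0]
lat_12db = [60.0, 62.0, 70.0, 69.0]
t, df = paired_t(lat_6db, lat_12db)
print(f"t({df}) = {t:.2f}")
```

A positive t means the 12-dB responses were less lateral than the 6-dB responses, the direction of the reversed-level effect.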
The reversed-level effect is in marked contrast to the 500-Hz results in headphone experiments by Domnitz and Colburn (1977), which show a smooth and monotonic displacement of the matching ILD for parametric ILDs varying from 3 to 25 dB. The response method used by Domnitz and Colburn employed a pointer tone with an adjustable ILD. Listeners were apparently able to use that pointer to track the image of a tone as a consistent ILD increased well beyond 12 dB—from 15 to 25 dB.
The contrast between the results from Domnitz and Colburn at 500 Hz and our results can be interpreted as a difference between lateralization and localization, in two respects. First, an experiment in which tones with large ILDs are matched by tones with large ILDs has no necessary upper limit. By contrast, when the listener's responses are azimuths, there is an upper limit: once an image has moved completely to one side, it cannot move any further. Second, a localization experiment enforces a mapping onto spacelike responses with a visual referent, whereas at some point in an ILD-matching experiment the responses may shift from image lateralization to matching that is governed mainly by the domain of ILD itself, or by the domain of loudness.
In our localization experiment, listeners often reported sound images behind them (front-back reversals). The sensation of an image in back was compelling in a way that is not generally experienced in headphone listening. As will be seen in Sec. VII, those listeners exhibiting the most prominent reversed-level effect (A, C, and E) were the listeners with the greatest percentage of front-back reversals. Further, for those listeners, the percentage of front-back reversals greatly increased as the level increased from 6 to 12 dB. We believe that the most likely explanation for the reversed-level effect is that when images were perceived to be in back, they were also systematically shifted towards the midline, leading to smaller response azimuths.
C. Opposing trials—250 Hz
Mean responses to opposing trials, in which the ILD had a sign opposite to the source azimuth, are shown in the lower halves of Figs. 7–9.
The responses for 250 Hz in the lower half of Fig. 7 exhibit the displacement D′, as previously noted. Regarding slopes, Fig. 7 shows that for an opposing ILD of 6 dB, the slopes away from the midline are 14% less than for a consistent ILD of 6 dB. Apart from the displacement and reduced slope, the responses appear to be driven by the ITD, as would be expected from standard duplex theory. In these respects, the 6-dB results resemble the data from Domnitz and Colburn (1977) at 500 Hz, but the basis for comparison is limited because the IPDs in our experiment never exceeded ±80°, the anatomical limit in free field at 250 Hz (compare ±180° for Domnitz and Colburn).
For an opposing ILD of 12 dB, the responses for all listeners, except for listener D, were almost independent of source azimuth—flattened responses compared to opposing 6 dB. For listeners C and E, the flattening might be attributed to an increased number of front-back reversals. For listeners A and B, there was no notable change in the (large) number of front-back reversals, and yet the responses did not show an observable effect of the ITD. Flattened responses would be consistent with a complete dominance of the azimuth-independent ILD, but the responses were never very lateral. The average laterality across all source azimuths for all the listeners was only 12° of response azimuth (SD = 6.5°). We tentatively conclude that the zero slopes for an ILD of 12 dB, compared to positive slopes for 6 dB, represent an incomplete transition towards slipped-cycle localization, as will be discussed in Sec. VI D. For clarity, the discussion of response data for 500 Hz shown in Fig. 8 will be deferred until after discussion of 750 Hz.
D. Opposing trials—750 Hz
1. Region C′, from M to E
It is easiest to begin with the “M” point (maximum) and “E” point (end), connected by curved line “C′.” The M point appears as a discontinuity, in slope if not in actual response value. It occurs at an azimuth defined by “B” (break point). The discontinuity at B can be understood in terms of a slipped-cycle ITD by the following argument: The cross-correlation function for a 750-Hz tone has maxima separated by 360° of IPD or 1333 μs of lag. In normal circumstances, the location of the image is cued by the principal lobe of the cross-correlation function, centered on the internal representation of the ITD itself near the midline, namely, point α in Fig. 11. As the ITD increases, the peak moves to point β. Assuming that the internal lag of the peak encodes the source location, the image is perceived to move in the direction of increasing ITD. However, if there is a large opposing ILD, the binaural system may elect to follow a peak that is consistent with the sign of the ILD, namely, the peak at , leading to a response on the opposite side—a response at M. A similar argument, with signs reversed, applies to source azimuths to the left of midline. There is good evidence in favor of this interpretation.
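The periodicity behind the slipped-cycle argument can be made concrete. This sketch assumes an idealized long-term cross-correlation for a steady tone, cos(2πf(τ − ITD)), evaluates it on a 1-μs lag grid, and lists its maxima; positive values are taken to mean the right side (an assumed convention). The function `ild_favored_peak` is a hypothetical rule written only to mirror the verbal argument: choose the peak nearest the midline on the side favored by the ILD.

```python
import math


def tone_xcorr(lag_us, itd_us, freq_hz):
    """Idealized normalized cross-correlation of a pure tone at internal lag lag_us."""
    return math.cos(2 * math.pi * freq_hz * (lag_us - itd_us) * 1e-6)


def peak_lags(itd_us, freq_hz, max_lag_us=2000, step_us=1):
    """Local maxima of the cross-correlation on a lag grid in [-max_lag_us, max_lag_us]."""
    lags = list(range(-max_lag_us, max_lag_us + 1, step_us))
    vals = [tone_xcorr(lag, itd_us, freq_hz) for lag in lags]
    return [lags[i] for i in range(1, len(vals) - 1)
            if vals[i] >= vals[i - 1] and vals[i] > vals[i + 1]]


def ild_favored_peak(peaks, ild_db):
    """Hypothetical selection rule: the peak nearest the midline on the ILD's side."""
    same_side = [p for p in peaks if p * ild_db > 0]
    return min(same_side, key=abs) if same_side else None


peaks = peak_lags(470.0, 750.0)          # principal peak near +470 us
print(peaks)
print(ild_favored_peak(peaks, -6.0))     # opposing ILD -> slipped-cycle peak near -863 us
```

For a 750-Hz tone the peaks repeat every 1333 μs, so an opposing ILD can point the rule at the peak one cycle away, on the other side of the midline.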
Our experiments found that the M points were the most lateral of all responses for most listeners. They were more lateral than the most lateral responses for consistent ILDs for both 6 and 12 dB for all listeners except for listener D at 6 dB. This maximum laterality effect follows naturally from the slipped-cycle interpretation. Figure 9 shows that the B points correspond to IPDs of +127° or −129°. Therefore, the points correspond to 127° − 360° = −233° or to −129° + 360° = +231° of IPD. These slipped-cycle IPD values are large. By contrast, the largest IPDs encountered during consistent trials was 195° (or −210°), as shown by the top axis of Fig. 9. Therefore it can be expected that the M points are more lateral than responses for consistent trials, as observed experimentally.
As the azimuth increases further, the slipped-cycle peak moves from to an IPD representation that is closer to the midline. Therefore, the slipped-cycle interpretation accounts for the slope of the anomalous region C′, where increasing the azimuth and the ITD brings the image closer to the center. The curvature of C′ is again attributable to the saturating behavior of the IPD at source azimuths approaching 90°, causing an equivalent curvature of the slipped-cycle IPD.
The end point E occurs where the IPD is +195° or −201°. The slipped-cycle peaks are then at −165° or +159°. To the extent that the ITD determines the localization, the image location at the E points ought to be the same as the image location for consistent trials having the same ITDs (same IPD for fixed frequency). The top axis in Fig. 9 shows that such IPD values occur for a source azimuth of 60°. Therefore, response azimuths at the E points ought to agree with response azimuths for consistent trials at source azimuths of 60°. Our data show that the average response azimuth for the four E points (for ±6 and ±12 dB) is 53.4° (SD = 8.6°). The corresponding average for the consistent responses at source azimuths of ±60° is a response azimuth of 57.4° (SD = 6.8°). The two values are in reasonable agreement. The small difference between the two values could be attributed to the opposite effects of the ILD.
It is reasonable to ask why slipped-cycle localization does not occur for a lower frequency such as 250 Hz. A good answer is that the binaural system does not have enough coincidence neurons with the necessary long internal lags. For example, at 250 Hz the largest ITD experienced by any of our listeners was about 900 μs. Because the period of the tone is 4000 μs, the ITD corresponding to a slipped cycle would be 4000 − 900 = 3100 μs. This value is larger than is normally thought to be effective for human listeners. For instance, Colburn and Latimer (1978) observed that 75% of the cell density proposed by Colburn (1977) has a lag less than 1 ms. Limits like this are depicted by shaded regions on Fig. 11. By contrast, for 750 Hz, the largest slipped-cycle IPD calculated above was 233°, corresponding to only 860 μs, easily accessible to the binaural system.
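The slipped-cycle arithmetic used in this section reduces to shifting by one cycle toward the other side. In the sketch below, `LAG_LIMIT_US` is a hypothetical 1-ms cutoff loosely inspired by the Colburn and Latimer figure, not a value measured here, and the 833-μs ITD for 500 Hz is inferred from the 1167–2000 μs slipped-cycle range quoted in the frequency summary.

```python
import math


def slipped_cycle_ipd(ipd_deg):
    """IPD of the neighboring peak one cycle away, on the other side of the midline."""
    return ipd_deg - math.copysign(360.0, ipd_deg)


def slipped_cycle_itd(itd_us, freq_hz):
    """The same shift expressed as a lag: one period toward the other side (microseconds)."""
    return itd_us - math.copysign(1e6 / freq_hz, itd_us)


LAG_LIMIT_US = 1000.0  # hypothetical cutoff for "enough" long-lag coincidence cells

for freq, itd in ((250.0, 900.0), (500.0, 833.0), (750.0, 470.0)):
    slip = slipped_cycle_itd(itd, freq)
    print(f"{freq:4.0f} Hz: ITD {itd:5.0f} us -> slipped {slip:7.0f} us, "
          f"within limit: {abs(slip) <= LAG_LIMIT_US}")
```

The IPD form reproduces the B and E values discussed above (127° → −233°, −129° → +231°, 195° → −165°, −201° → +159°), and only the 750-Hz slipped-cycle lag falls inside the assumed 1-ms limit.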
2. Transition region, T
The most perplexing feature of the 750-Hz data in Fig. 9(b) is the region identified as the “transition region, T” in Fig. 10. Here, the responses were opposite in sign to the azimuth, as for region C′ above, and the slopes were often negative. To quantify that statement, we examined the responses for the five listeners for both positive and negative source azimuths (ten observations) for each value of ILD. For an ILD of 6 dB, slopes were equally often negative and positive. For an ILD of 12 dB, eight slopes were negative, two were zero, and none were positive. A negative slope means that if the source azimuth and the ITD moved in one direction, the response azimuth moved in the opposite direction. Given that the ILD is constant, this behavior is hard to understand.
Region T can be understood as a transition between localization determined by the principal peak of the cross-correlation function and localization determined by the slipped-cycle peak, as promoted by the ILD. When the azimuth is not far from the midline, as it is for region T, the slipped-cycle peak occurs at large values of the lag. For instance, for a source azimuth of +22.5°, the lag for the slipped-cycle peak for 750 Hz is about −1150 μs. At this value of lag, the peak is on the edge of the grey region in Fig. 11, where there are only a few coincidence cells. As the source azimuth increases beyond +22.5°, there are two opposing influences. On the one hand, the principal peak moves in the positive azimuthal direction and tends to make the response azimuth more positive. On the other hand, the slipped-cycle peak moves into a region with an increasing population of coincidence cells. Whether the slope in region T is positive or negative depends on the relative strengths of these two effects. The slipped-cycle peak would be expected to be emphasized more by a 12-dB ILD than by a 6-dB ILD, and, therefore, negative slopes would be expected to occur more often for 12 dB. That expectation agrees with the incidence of negative slopes observed for individual listeners, and it is evident for 12 dB in Fig. 9. The flat mean responses for 6 dB near the midline were also present for all individual listeners except A. Like the flat responses that occurred for 12 dB at 250 Hz, these appear to reflect the expected competition between the principal peak and the opposing ILD, probably complicated further by some slipped-cycle tendency. The competition between principal cycle and slipped cycle can be expected to lead to individual differences, which agrees with the fact that the error bars in Fig. 9 are largest in the T region for both ILD values.
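The competition described above can be caricatured numerically. This sketch is for illustration only: the readout is a density-weighted average of the two peak lags, which is our simplification rather than a mechanism proposed in the text, and both the exponential cell density (`scale_us`) and the multiplier `slip_weight`, standing in for the push of the opposing ILD, are invented parameters.

```python
import math

PERIOD_US = 1e6 / 750.0  # period of a 750-Hz tone, about 1333 us


def cell_density(lag_us, scale_us=400.0):
    """Hypothetical coincidence-cell density, falling off exponentially with |lag|."""
    return math.exp(-abs(lag_us) / scale_us)


def readout_lag(itd_us, slip_weight):
    """Toy readout: density-weighted mean of the principal peak (at the ITD) and the
    slipped-cycle peak (one period toward the other side)."""
    principal = itd_us
    slipped = itd_us - PERIOD_US
    w_p = cell_density(principal)
    w_s = slip_weight * cell_density(slipped)
    return (w_p * principal + w_s * slipped) / (w_p + w_s)


for weight in (1.0, 3.0):  # weaker vs stronger pull toward the slipped cycle
    print(weight, [round(readout_lag(itd, weight)) for itd in (100.0, 300.0, 470.0)])
```

With the weaker weight the readout first rises with ITD and then levels off; with the stronger weight it sits on the opposite side and moves further away as the ITD grows, a negative slope like the 12-dB data in region T.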
E. Opposing trials—500 Hz
Opposing ILDs led to greater disagreement among our listeners at 500 Hz than at the other two frequencies. As a result, the error bars in Fig. 8 are unusually large. For an opposing ILD of 6 dB, localization responses for listeners C, D, and E had positive slope and were almost indistinguishable from responses at 0 dB except where the IPD magnitude exceeded 120°; there, the responses appeared to follow the slipped-cycle peak, at lags of 1333 μs or less in magnitude. At 500 Hz, IPDs never became large enough to generate a C′ region. Response functions for listeners A and B were similar except that the entire function had negative or zero slope. As the ILD increased from 6 to 12 dB, the slopes for listeners A and B became more negative and the slopes for listeners C, D, and E became negative or zero. The tendency toward more negative slopes, observed for all listeners with greater opposing ILD, is consistent with the slipped-cycle interpretation because a greater ILD gives greater weight to the slipped-cycle peak. Therefore, our interpretation of the 500-Hz data is that most of the stimulus-response space, as shown in Fig. 8, is within the transition region, T.
F. Frequency dependence—Summary
The importance of the transition region for opposing ILDs at 500 Hz appears to be part of a trend that depends mostly on the availability of coincidence cells with lags equal to the slipped-cycle ITDs and partly on the strength of the ILD encouraging slipped-cycle localization. At 250 Hz, slipped-cycle ITDs are so large (>3000 μs) that the transition region is only approached, and then only for a 12-dB ILD, changing the response slopes from positive to flat. At 500 Hz, slipped-cycle ITDs are large enough (1167–2000 μs) that slipped-cycle localization never dominates entirely for any source azimuth, and almost all the response azimuths reflect the transition region, with flat or negative slopes. At 750 Hz, the transition region extends only out to the break-point azimuth (B) of about ±45°, where the IPD is ±127°, the ITD is ±470 μs, and the slipped-cycle ITD is ∓863 μs (470 − 1333 μs). For larger azimuths, and smaller slipped-cycle ITDs, the responses are essentially dominated by the slipped cycle.
The logical progression described above caused us to wonder what would happen at 1000 Hz. The entire protocol was repeated for 1000 Hz with two listeners, listener B and a new listener, F. Listener F was female, age 28, with modest experience in sound localization experiments. The responses for these two listeners (not shown) were remarkably similar, and they continued the progression. Again, the M point was the largest response azimuth. It occurred at a break-point azimuth of 15° for 6 dB ILD and only 7.5° for 12 dB. Thus, at 1000 Hz, the transition region had disappeared from the data. The slipped cycle dominated as soon as the source azimuth moved a little beyond the midline. If the limit of useful coincidence-cell lags is the major cause for the progression with frequency, our data give estimates of that limit: 900 μs for 1000 Hz, 860 μs for 750 Hz, less than 1333 μs for 500 Hz, and not measurable for 250 Hz. In summary, our interpretation of these data posits a form of duplex theory wherein the ILD not only trades with the ITD but also determines what the effect of the ITD shall be. It is a highly complicated interaction whereby modest changes in frequency or ILD can produce dramatic changes in the effective ITD and subsequent localization.
VII. BEYOND AZIMUTHAL LOCATION
A. Variance
One expects that unnatural combinations of ITD and ILD will lead to larger variance in localization responses. Gaik (1993) performed headphone experiments with noise bands using fixed ILDs of 0, 6, and 12 dB, the same ILDs as ours. He found greater SD across listeners when ITD and ILD had opposing signs. We also computed SDs, but unlike Gaik, we computed our SDs within listeners and within sources. The averages of those individual SDs are given in Fig. 12. The largest values occurred for 750 Hz and ILD of 0 dB, but these should be discounted because they reflect the opposite-side responses that occur for zero ILD when the IPD approaches 180°. There are several other peculiarities. No difference is expected between consistent (C) and opposing (O) for an ILD of zero; the difference that appears for 500 Hz is accidental. Figure 12 shows that SDs for opposing trials tend to be larger than for consistent trials. Also, a consistent ILD of 6 dB applied to a 500-Hz tone produced an SD as small as observed for baseline conditions, presumably because a 6-dB ILD is natural at 500 Hz.
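The distinction between within-listener and across-listener SDs matters because a listener with a consistent personal bias inflates the second but not the first. The sketch below uses hypothetical trials of the form (listener, source azimuth, response azimuth); `across_listener_sd` is one simple across-listener measure chosen for contrast, not a reconstruction of Gaik's computation.

```python
import statistics
from collections import defaultdict


def _cells(trials):
    """Group response azimuths by (listener, source) cell."""
    cells = defaultdict(list)
    for listener, source, response in trials:
        cells[(listener, source)].append(response)
    return cells


def within_listener_sd(trials):
    """Mean of the SDs computed separately for every (listener, source) cell."""
    return statistics.mean(statistics.stdev(r) for r in _cells(trials).values() if len(r) > 1)


def across_listener_sd(trials):
    """For each source, the SD across listeners of their mean responses, averaged over sources."""
    by_source = defaultdict(list)
    for (listener, source), r in _cells(trials).items():
        by_source[source].append(statistics.mean(r))
    return statistics.mean(statistics.stdev(m) for m in by_source.values() if len(m) > 1)


# Hypothetical trials: listener B is consistently 20 deg more lateral than listener A.
trials = [("A", 30, 28.0), ("A", 30, 32.0), ("B", 30, 48.0), ("B", 30, 52.0),
          ("A", -30, -32.0), ("A", -30, -28.0), ("B", -30, -52.0), ("B", -30, -48.0)]
print(within_listener_sd(trials), across_listener_sd(trials))
```

Here each listener is very repeatable (within-listener SD about 2.8°), yet the across-listener SD is about 14.1° because of the fixed offset between listeners.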
B. Front-back reversals
Listeners knew that the sources were in front of them, with azimuths from −90° to +90°. They could see the sources in front at all times. Nevertheless, listeners sometimes perceived the sounds to be behind them and gave response azimuths between 90° and 180° or between −90° and −180°.
1. Natural and zero ILDs
Summing over real and baseline presentations and over all frequencies and all listeners, there were 3240 trials with nonzero source azimuth. In response to these trials, there were 501 front-back reversals, but 73% of these occurred for one listener, listener A. By contrast, listener D never made any reversals. For the three remaining listeners, B, C, and E, 7.0% of real and baseline trials were front-back reversed. These individual tendencies for real and baseline trials persisted for trials with unnatural ILD conditions.
Reversals for different frequencies and fixed ILD conditions are shown in Fig. 13. Listener D continued to show no reversals. For zero ILD, there were 77 reversals when summed over the three frequencies and three listeners B, C, and E. This number is equivalent to 8.9% of responses, only slightly higher than the percentage with baseline (natural) ILD.
2. Larger ILDs
For ILDs of 6 and 12 dB, the percentage of reversals was usually much larger than for zero ILD, as evident in Fig. 13. There, the percentage of reversals is shown by different symbols for the four relevant listeners (excluding D). Open symbols are for consistent trials; filled symbols are for opposing trials. For 250 Hz, combining 6-dB and 12-dB conditions, the filled symbols lie higher than the open symbols in eight out of eight cases. The same holds true for 500 Hz, indicating a strong tendency for opposing interaural cues to promote reversals. For 750 Hz, the filled symbols are higher in five out of eight cases, and the overall percentage of reversals is less. Summing over all frequencies and both 6- and 12-dB conditions, for listeners B, C, and E, the percentage of reversals on opposing trials was 78% greater than the percentage on consistent trials.
The slopes in Fig. 13 show that the percentage of reversals was usually greater for 12 dB than for 6 dB. This conclusion held for 6 out of 7 slopes for 250 Hz, excluding one instance of 100% reversals for both ILDs. It held for 7 out of 8 slopes for 500 Hz and for 6 out of 8 for 750 Hz. Slopes for consistent trials (open symbols) are always positive and often large for listeners A, C, and E for 500 Hz and listeners A and C for 250 Hz. Section VI B 3 conjectured that these reversals were responsible for the reversed-level effect.
3. Discussion
Most of the work on front-back reversals has been confined to the mid-sagittal plane (median plane), where interaural differences are minimal. Roffler and Butler (1968), Blauert (1969), Hebrank and Wright (1974), and Zhang and Hartmann (2010), among others, focused on the importance of spectral cues and usually concluded that the ability to distinguish between front and back depends on information above 4 kHz.
If cues above 2 or 4 kHz are necessary to distinguish front from back, one wonders why sine tones below 1 kHz would ever elicit responses in back, contrary to the listeners' visual observations and their experience with the experiments. Blauert (1996) reported evidence of front-back cues for frequencies as low as 250 Hz in sine tones and one-third-octave noises, so-called direction bands. However, our lower frequencies fall into direction bands that actually point to the forward direction. Therefore, responses in back remain unexplained.
An alternative interpretation is that listeners found ILDs of 6 and 12 dB to be, consciously or unconsciously, implausible for 250- and 500-Hz tones, especially when the ITDs were large, and especially when the sign of the ILD was opposite to the sign of the source azimuth. The next step in the interpretation assumes that there is a default perception whereby sources that are not plausibly localized are often perceived to be within the head or in back. Macpherson and Sabin (2007) also found an increased front-back reversal rate for conflicting ITD and ILD cues for noise bands.
Figure 13 shows that the front-back reversal effect for large ILDs was somewhat smaller as the frequency increased. The obvious interpretation of that result is that a large ILD is less implausible for a higher frequency. The ILD grows rapidly as a function of frequency in free field. In a room, where much ordinary listening is done, the width of the distribution of the ILD also increases with increasing frequency, though much more slowly. Unpublished experiments from our lab suggest that the SD grows approximately as the square root of the frequency.
VIII. DISCUSSION AND CONCLUSIONS
This article has reported experiments on the localization of pure tones in free field. Complete experimental control of the interaural parameters was obtained through transaural techniques used to create virtual tones. The virtual tones were always presented in a context including tones from real sources. The experiments particularly tested the duplex theory of sound localization for tones. According to that theory, tones are localized through a combination of ITDs (alternatively IPDs) and ILDs, with ITDs carrying greater weight at low frequencies. The experiments led to the following conclusions.
ILDs can be substantial for tones below 1000 Hz. Even for infinitely distant sound sources, the spherical head model predicts an ILD as large as 4 dB for 750 Hz. For a human listener, the ILD grows to 8 dB when the source is 1 meter distant, per Fig. 2. Suggestions that ILDs are physically too small to be important in sound localization if the frequency is less than a few kHz (e.g., Stevens and Newman, 1936; Moushegian and Jeffress, 1959) are contrary to fact. ILDs are, however, too small to matter for plane waves at 128 Hz (Strutt, 1907).
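The spherical-head figure can be approximated with the classical rigid-sphere series for a distant (plane-wave) source: p/p₀ = (ka)⁻² Σₙ (2n+1) iⁿ⁺¹ Pₙ(cos θ) / hₙ′(ka), with θ measured from the direction of propagation, an e^(−iωt) time convention, and spherical Hankel functions of the first kind. This is a minimal sketch; the head radius (8.75 cm), sound speed (343 m/s), ear positions at ±90° azimuth, and the 40-term truncation are our assumptions, not values from this article.

```python
import cmath
import math


def sphere_surface_pressure(ka, cos_angle, n_terms=40):
    """Pressure on a rigid sphere, relative to free field, for a plane wave.
    cos_angle: cosine of the angle between the source direction and the surface
    point (1 = facing the source)."""
    x = ka
    mu = -cos_angle  # series variable: angle measured from the propagation direction
    # Spherical Hankel functions h_n^(1)(x); upward recurrence is stable for h.
    h = [-1j * cmath.exp(1j * x) / x, -cmath.exp(1j * x) / x * (1 + 1j / x)]
    for n in range(1, n_terms):
        h.append((2 * n + 1) / x * h[n] - h[n - 1])
    # Legendre polynomials P_n(mu) by the three-term recurrence.
    leg = [1.0, mu]
    for n in range(1, n_terms):
        leg.append(((2 * n + 1) * mu * leg[n] - n * leg[n - 1]) / (n + 1))
    total = 0j
    for n in range(n_terms):
        dh = -h[1] if n == 0 else h[n - 1] - (n + 1) / x * h[n]  # h_n'(x)
        total += (2 * n + 1) * (1j ** (n + 1)) * leg[n] / (x * x * dh)
    return total


def ild_db(freq_hz, azimuth_deg, radius_m=0.0875, c=343.0):
    """ILD (right re left, dB) for ears at +/-90 deg; positive azimuth = right."""
    ka = 2 * math.pi * freq_hz * radius_m / c
    s = math.sin(math.radians(azimuth_deg))  # cosine of source-to-right-ear angle
    right = abs(sphere_surface_pressure(ka, s))
    left = abs(sphere_surface_pressure(ka, -s))
    return 20 * math.log10(right / left)


print(f"{ild_db(750.0, 60.0):.1f} dB at 750 Hz, {ild_db(128.0, 90.0):.2f} dB at 128 Hz")
```

For a 750-Hz tone at 60° azimuth the sketch gives roughly 3.4 dB, the same order as the 4-dB maximum quoted above, while at 128 Hz the ILD stays well under 1 dB.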
Naturally occurring ILDs disambiguate tones with ambiguous IPDs. Natural ILDs eliminate the confusions caused by IPDs that approach or exceed 180°. Figure 5 for zero ILD shows many responses on the opposite side of the true azimuth, but Fig. 4, incorporating natural ILDs, shows almost none.
Naturally occurring ILDs enhance response azimuths. By comparing the responses for zero ILD (Fig. 5) to baseline trials, where the ILDs have natural free-field values (Fig. 4), it becomes evident that the natural ILDs increase the slopes of the responses as a function of azimuth (and IPD). This result appeared for all listeners at 250 Hz and for three of the five listeners at 500 Hz.
The proximal ITD localization cue is the ITD itself, not the IPD. In a localization experiment where the ILD is set equal to zero, the responses for different tone frequencies appear to be different functions of IPD for different frequencies, as shown in Fig. 6(a). They appear to be a single function of ITD, as shown in Fig. 6(b). This conclusion was previously reached by Schiano et al. (1986) and by Zhang and Hartmann (2006) in lateralization experiments using headphones. It is now seen to apply to localization as well.
The influence of the ITD is not maximum when the IPD equals 90°. This conclusion is a logical corollary of the previous bullet point, but some headphone experiments have indicated that the ITD does have its maximum effect when the IPD equals 90°. The first kind of evidence came from lateralization experiments with zero ILD (Garner and Wertheimer, 1951; Sayers, 1964; Yost, 1981). We believe that such evidence arose from listener responses having opposite signs when the IPD exceeded 90°. Opposite-sign responses would then cancel other responses in subsequent averaging, leading to an apparent maximum near 90°. In contrast, our zero-ILD experiment at 500 Hz (Fig. 5) showed that the ITD had an increasing effect as the IPD surpassed 90°. The second kind of evidence came from time-intensity trading experiments (Elpern and Naughton, 1964; Young, 1976). However, an experiment that cancels an IPD by an opposing ILD risks a reinterpretation of the IPD itself through a slipped-cycle image.
ILDs added to low-frequency tones shift tone localization. Fixed ILDs introduce displacements in localization responses near zero IPD. The displacements were evident as “gaps,” similar for consistent and opposing ILDs but dependent on the size of the ILD (Figs. 7, 8, and 9).
Tones are different from noise. Contrary to the above bullet point, Wightman and Kistler (1992) reported that a broadband noise with an IPD of zero was steadfastly localized at the midline, resisting attempts to move it with naturally occurring ILDs. The only way to square our results, and the results of others such as Domnitz and Colburn (1977), with the observations of Wightman and Kistler is to conclude that a small ITD for a broadband noise is a particularly strong cue, tending to pin the image at the midline.
Further evidence for a relatively strong role for ITD compared to ILD in mid-frequency noise comes from experiments by Macpherson and Middlebrooks (2002) showing that the ILD cue in a two-octave noise band around 1000 Hz had only 20% of the predicted weight. The increasing weight given to ITD relative to ILD as the bandwidth grows from zero (sine tone) to several octaves may occur because the ITD is the more reliable cue in everyday listening. For a given source azimuth, the ILD changes dramatically with frequency. By contrast, the ITD changes little with frequency, and listeners are not confused by the changes that do occur within noise bands (Constan and Hartmann, 2003). A constant ITD over a broad bandwidth has been called “straightness” (Trahiotis and Stern, 1989). Straightness may be responsible for the relatively large weight given to ITD in the localization of noise bands. The relative weight of ITD for stimuli that are intermediate between tones and noise needs to be investigated.
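Straightness can be illustrated with an idealized cross-correlation. The sketch makes several assumptions: equal-amplitude components, no peripheral filtering, and a band from 500 to 2000 Hz taken as "two octaves around 1000 Hz." It does not reproduce the stimuli of Macpherson and Middlebrooks (2002). Each component with the common ITD contributes cos[2πf(lag − ITD)]. All components peak together at the ITD, while their other peaks (at ITD ± n/f) fall at different lags and largely cancel. A single tone keeps equally tall side peaks.

```python
import numpy as np

def band_xcorr(lags_s, itd_s, freqs_hz):
    """Idealized normalized cross-correlation of equal-amplitude components sharing one ITD."""
    phases = 2 * np.pi * np.outer(lags_s - itd_s, freqs_hz)
    return np.cos(phases).mean(axis=1)

def largest_side_peak(xcorr, lags_s, itd_s, exclude_s=0.25e-3):
    """Largest correlation at lags more than exclude_s away from the true ITD."""
    far = np.abs(lags_s - itd_s) > exclude_s
    return float(xcorr[far].max())

lags = np.linspace(-1e-3, 1e-3, 2001)   # internal lags within +/-1 ms (assumed range)
itd = 0.3e-3                            # hypothetical ITD of 300 us
tone = band_xcorr(lags, itd, np.array([1000.0]))
noise = band_xcorr(lags, itd, np.linspace(500.0, 2000.0, 301))

if __name__ == "__main__":
    print("tone side peak: ", largest_side_peak(tone, lags, itd))
    print("noise side peak:", largest_side_peak(noise, lags, itd))
```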
Reversed-level effect. When a consistent ILD of a 500-Hz tone was increased from 6 to 12 dB, four out of five listeners gave responses that were less lateral (smaller in magnitude). The effect can be seen by comparing the large-azimuth responses in the top panel of Fig. 8. The reversed-level effect stands in contrast to the monotonic dependence on ILD seen in the 500-Hz experiments by Domnitz and Colburn (1977). The reversed-level effect appears to be related to the increased tendency for listeners to hear the image behind them, an effect that applies to sound localization much more than to lateralization.
With increasing frequency, response azimuths in ITD-ILD opposition trials increasingly follow the direction cued by the ILD. The frequency dependence of the apparent weighting of ITD and ILD directional cues became dramatically evident in our 6-dB opposing trials, where localization responses were very different for different frequencies, even though the ITDs and ILDs were essentially identical. At 250 Hz, localization responses were in the direction cued by the ITD for all listeners. At 500 Hz, only three of the five listeners responded as cued by the ITD, and at 750 Hz, none of the listeners followed the ITD direction. There are two possible explanations for this effect. One explanation views the effect as part of a trend towards increased relative influence of the ILD in binaural tradeoffs with increasing frequency. The other explanation posits that an opposing ILD promotes a slipped-cycle ITD in a direction opposite to the stimulus ITD direction. This article has chosen the latter explanation because it accounts for so many details of the opposing-ILD experiments. The frequency-dependent role of a given opposing ILD arises because slipped-cycle peaks increasingly enter the best-ITD range of coincidence cells with increasing frequency.
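The frequency dependence follows from simple arithmetic. For a tone of period T = 1/f, the nearest slipped-cycle peak lies one period from the stimulus ITD, on the opposite side. The sketch below assumes a best-ITD range of ±1 ms, based on the statement that few coincidence cells have internal lags greater than 1 ms. It also assumes a hypothetical ITD of 400 µs.

```python
BEST_ITD_RANGE_US = 1000.0   # assumption: coincidence cells mostly within +/-1 ms

def slipped_cycle_lag_us(itd_us, f_hz):
    """Lag of the nearest cross-correlation peak on the side opposite the ITD."""
    period_us = 1e6 / f_hz
    return itd_us - period_us if itd_us >= 0 else itd_us + period_us

if __name__ == "__main__":
    itd = 400.0   # hypothetical ITD, right ear leading
    for f in (250.0, 500.0, 750.0):
        lag = slipped_cycle_lag_us(itd, f)
        inside = abs(lag) <= BEST_ITD_RANGE_US
        print(f"{f:.0f} Hz: slipped-cycle peak at {lag:.0f} us, within range: {inside}")
```

Under these assumptions, only the 750-Hz slipped-cycle peak (about −933 µs) falls inside the range. At 250 Hz the peak lies near −3600 µs, and at 500 Hz near −1600 µs.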
ILDs promote slipped-cycle localization. Slipped-cycle localization is not a normal response to ITD (IPD). However, an opposing ILD appears to promote the influence of a slipped-cycle peak having an ITD consistent with the ILD. The idea of weighting the ITD representation by the ILD is not new. It was part of the modeling by Sayers and Cherry (1957) and by Stern and Colburn (1978). The extension to the slipped cycle was made because our experiments with opposing ILDs at 750 Hz revealed a complicated behavior [Figs. 9(b) and 10(d)] that could be explained in detail by this reasoning.
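The idea of weighting the ITD representation by the ILD can be sketched as a choice among cross-correlation peaks. Each candidate peak (the ITD plus a whole number of periods) gets a weight equal to a cell density times an ILD-dependent factor, and the heaviest candidate wins. Both weighting functions and all parameter values are hypothetical, chosen only to show the qualitative behavior. This is not the model of Sayers and Cherry (1957), of Stern and Colburn (1978), or of this article.

```python
import math

def candidate_peaks_us(itd_us, f_hz, max_lag_us=2000.0):
    """Cross-correlation peaks of a tone: the ITD plus any whole number of periods."""
    period_us = 1e6 / f_hz
    n_max = int(max_lag_us // period_us) + 1
    peaks = (itd_us + n * period_us for n in range(-n_max, n_max + 1))
    return [p for p in peaks if abs(p) <= max_lag_us]

def cell_density(lag_us, width_us=600.0):
    # Hypothetical: coincidence cells concentrated near zero lag.
    return math.exp(-0.5 * (lag_us / width_us) ** 2)

def ild_weight(lag_us, ild_db, gain_per_db=0.15, scale_us=500.0):
    # Hypothetical: a positive ILD (right louder) favours positive (right-leading) lags.
    return math.exp(gain_per_db * ild_db * math.tanh(lag_us / scale_us))

def perceived_lag_us(itd_us, f_hz, ild_db):
    """Pick the candidate peak with the largest density x ILD weight."""
    return max(candidate_peaks_us(itd_us, f_hz),
               key=lambda p: cell_density(p) * ild_weight(p, ild_db))

if __name__ == "__main__":
    for f in (250.0, 500.0, 750.0):
        print(f, perceived_lag_us(400.0, f, +6.0), perceived_lag_us(400.0, f, -6.0))
```

With a 400-µs ITD, a consistent 6-dB ILD selects the principal peak at every frequency. An opposing 6-dB ILD selects the slipped-cycle peak only at 750 Hz, where that peak lies close enough to the midline to compete.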
Very low frequency prevents slipped-cycle localization. When the tone frequency is very low, the slipped-cycle peak occurs at internal lags greater than 1 ms, where there are few coincidence cells, and the slipped-cycle peak cannot dominate perceived localization. Nevertheless, a sufficient opposing ILD allows a slipped-cycle peak to compete with the principal peak, leading to a "transition" region of IPD. The lower halves of Figs. 7–9 show the joint influence of increasing frequency and increasing ILD in causing the slope of the response azimuth functions to become systematically more negative.
Localization cues are determined by cross-correlation peaks. Our proposed model, by which localization is determined through a combination of principal and slipped-cycle influences, depends on the peaks of a rate-ITD distribution resembling cross-correlation. This assumption is a common one, e.g., Buell et al. (1991). It differs from the position-variable model proposed by Stern and Colburn (1978) and modified by Stern and Shear (1996), wherein localization is determined by the centroid of a rate-ITD function. Our recent computational experience with centroid models is that they predict smooth localization functions, similar to those observed by Domnitz and Colburn (1977). We have not been able to make centroid models reproduce abrupt changes such as those seen at the M point in Figs. 9(b) and 10(d).
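The difference between peak-based and centroid-based readouts can be seen in a deliberately simplified example. A hypothetical rate-ITD function has a principal bump at +400 µs and a slipped-cycle bump at −933.3 µs, and their relative heights trade off as a weight varies from 0 to 1. The "centroid" here is a plain weighted mean, much simpler than the Stern and Colburn position variable. The peak readout jumps abruptly when the slipped-cycle bump becomes the taller one. The centroid moves smoothly.

```python
import numpy as np

LAGS_US = np.linspace(-2000.0, 2000.0, 4001)   # internal delays on a 1-us grid

def bump(centre_us, width_us=150.0):
    """Gaussian bump standing in for one cross-correlation peak (hypothetical shape)."""
    return np.exp(-0.5 * ((LAGS_US - centre_us) / width_us) ** 2)

def rate_itd(w_slipped, principal_us=400.0, slipped_us=-933.3):
    """Hypothetical rate-ITD function with a principal and a slipped-cycle peak."""
    return (1.0 - w_slipped) * bump(principal_us) + w_slipped * bump(slipped_us)

def peak_estimate(rate):
    return float(LAGS_US[np.argmax(rate)])

def centroid_estimate(rate):
    return float(np.sum(LAGS_US * rate) / np.sum(rate))

weights = np.linspace(0.0, 1.0, 21)
peaks = np.array([peak_estimate(rate_itd(w)) for w in weights])
centroids = np.array([centroid_estimate(rate_itd(w)) for w in weights])

if __name__ == "__main__":
    print("largest step, peak readout:    ", np.abs(np.diff(peaks)).max())
    print("largest step, centroid readout:", np.abs(np.diff(centroids)).max())
```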
Unnatural combinations of ITD and ILD lead to greater variance in azimuthal response. Baseline trials had natural ITD and ILD combinations, and the responses in Fig. 4 showed small scatter. When natural ITDs were paired with unnatural ILDs, localization responses acquired increased scatter. As shown in Fig. 12, increasing the ILD to 12 dB for a 250-Hz tone considerably increased the variance, and opposing trials led to larger variance than consistent trials.
Unnatural combinations of ITD and ILD lead to front/back reversals. Although the ability to distinguish sources in front from sources in back is usually associated with high-frequency spectral structure, Fig. 13 shows that listeners often reported the image of our low-frequency tones to be in back, especially for the least natural of our combinations of ITD and ILD.
The duplex theory requires reinterpretation. According to the standard duplex theory, tone localization is determined jointly by the ITD and ILD with the ILD playing an increasing role with increasing frequency. An improved theory of binaural interaction would go beyond mere trading. It would acknowledge the role of the ILD in determining the effective ITD itself, especially its sign. The dramatic dependence of slipped-cycle influence on the tone frequency and on the ILD means that time-intensity trading experiments, with ITD and ILD in opposition, are likely to exhibit large variability as observed, e.g., by Whitworth and Jeffress (1961).
ACKNOWLEDGMENTS
Aimee Shore provided assistance for experiments. This work was supported by the AFOSR grant FA9550-11-1-0101. Z.D.C. was supported by the Michigan State Professorial Assistants Program.
For a plane wave, the source distance is infinite. Calculations assumed that the ear angle is 90°, the head radius is 8.75 cm, and the speed of sound is 34 400 cm/s.
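Plane-wave ITDs and ILDs for these parameters can be computed with the classical rigid-sphere (Rayleigh) series. The sketch below assumes ears at ±90°, the e^{−iωt} time convention, and truncation of the series after int(ka) + 15 terms. These choices and the helper names are ours, and the sketch is not necessarily the computation behind the article's figures. At very low frequency its ITD approaches the familiar 3(a/c) sin θ, about 763 µs at 90° for a = 8.75 cm and c = 34 400 cm/s. The phase is taken as a principal value, so the ITD is valid only while the IPD stays below 180°.

```python
import cmath
import math

HEAD_RADIUS_CM = 8.75          # from this footnote
SPEED_OF_SOUND_CM_S = 34400.0  # from this footnote

def hankel1(n, x):
    """Spherical Hankel function h_n^(1)(x) = j_n(x) + i y_n(x), closed-form sum."""
    s = sum(1j ** k * math.factorial(n + k)
            / (math.factorial(k) * math.factorial(n - k) * (2.0 * x) ** k)
            for k in range(n + 1))
    return (-1j) ** (n + 1) * cmath.exp(1j * x) / x * s

def hankel1_deriv(n, x):
    """Derivative of h_n^(1) from the standard recurrences."""
    if n == 0:
        return -hankel1(1, x)
    return hankel1(n - 1, x) - (n + 1) / x * hankel1(n, x)

def legendre(n, u):
    """Legendre polynomial P_n(u) by the three-term recurrence."""
    p_prev, p = 1.0, u
    if n == 0:
        return p_prev
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * u * p - m * p_prev) / (m + 1)
    return p

def surface_pressure(f_hz, gamma_rad):
    """Pressure on a rigid sphere for a unit plane wave (e^{-i w t} convention);
    gamma_rad is the angle between the ear and the source direction."""
    x = 2 * math.pi * f_hz * HEAD_RADIUS_CM / SPEED_OF_SOUND_CM_S   # ka
    u = math.cos(gamma_rad)
    total = sum((2 * n + 1) * (-1j) ** n * legendre(n, u) / hankel1_deriv(n, x)
                for n in range(int(x) + 15))
    return 1j * total / x ** 2

def sphere_itd_ild(f_hz, azimuth_deg):
    """ITD (us, positive = left ear lags) and ILD (dB, positive = right ear louder)
    for ears at +/-90 deg and a source at azimuth_deg (positive = right)."""
    az = math.radians(azimuth_deg)
    p_right = surface_pressure(f_hz, math.pi / 2 - az)
    p_left = surface_pressure(f_hz, math.pi / 2 + az)
    ipd = cmath.phase(p_left / p_right)   # principal value: valid while |IPD| < 180 deg
    itd_us = 1e6 * ipd / (2 * math.pi * f_hz)
    ild_db = 20 * math.log10(abs(p_right) / abs(p_left))
    return itd_us, ild_db

if __name__ == "__main__":
    for f in (250.0, 500.0):
        print(f, sphere_itd_ild(f, 90.0))
```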
Computational procedure: For each listener, each frequency, and each of the 24 source locations (not including zero azimuth), the mean ILD was computed over three real trials and over six baseline trials. Table I shows the RMS difference between the real-trial mean ILD and the baseline-trial mean ILD, computed over the source locations.
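The procedure for one listener and one frequency can be written compactly. The array layout (one row per source location, one column per trial) and the function name are assumptions made for illustration.

```python
import numpy as np

def ild_rms_difference(real_ild_db, baseline_ild_db):
    """RMS over source locations of (mean real-trial ILD - mean baseline-trial ILD).

    real_ild_db:     shape (n_locations, n_real_trials), e.g. (24, 3)
    baseline_ild_db: shape (n_locations, n_baseline_trials), e.g. (24, 6)
    """
    real = np.asarray(real_ild_db, dtype=float)
    base = np.asarray(baseline_ild_db, dtype=float)
    if real.shape[0] != base.shape[0]:
        raise ValueError("both arrays need one row per source location")
    diff = real.mean(axis=1) - base.mean(axis=1)   # per-location difference of means
    return float(np.sqrt(np.mean(diff ** 2)))      # RMS across locations
```

Averaging within each location before differencing means that the three real trials and the six baseline trials contribute equally per location, despite their different counts.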
The baseline responses plotted in Fig. 4 came from the runs that were otherwise devoted to zero-ILD trials. The responses to the baseline trials on runs devoted to 6-dB trials or devoted to 12-dB trials were not systematically different.
In fact, one opposite-side response was recorded. It occurred for listener A and for a 500-Hz tone, but the IPD for that trial was only about 50°. Therefore, it is more likely to be an experimental error than a true result.
There were twice as many trials for a fixed ILD of zero because the stimulus was the same for both consistent and opposing trials. To keep the number of trials the same for all fixed-ILD conditions, this section includes only those zero-ILD trials, and corresponding baseline trials, arbitrarily called “consistent.”
Maximum ITDs were larger for 250 Hz than for the other two frequencies. Dispersion like this is seen in the spherical head model. It is even more prominent in real heads, and this can be captured by an elliptical head model with a model torso (Cai et al., 2015).