Lateralization of complex high-frequency sounds is conveyed by interaural level differences (ILDs) and interaural time differences (ITDs) in the envelope. In this work, the authors construct an auditory model and simulate data from three previous behavioral studies obtained with, in total, over 1000 different amplitude-modulated stimuli. The authors combine a well-established auditory periphery model with a functional count-comparison model for binaural excitatory–inhibitory (EI) interaction. After parameter optimization of the EI-model stage, the hemispheric rate difference between pairs of EI-model neurons is linearly related to the extent of laterality in human listeners. If a certain ILD and a certain envelope ITD each cause a similar extent of laterality, they also produce a similar rate difference in the same model neurons. After parameter optimization, the model accounts for 95.7% of the variance in the largest dataset, in which amplitude modulation depth, rate of modulation, modulation exponent, ILD, and envelope ITD were varied. Using the same EI-model parameters, the model also accounts for 83% of the variance in each of the other two datasets.
I. INTRODUCTION
Accurate sound localization requires precise neural mechanisms for processing relevant binaural cues, such as interaural time difference (ITD, here denoted by Δt) and interaural level difference (ILD). The first stage of neural integration of binaural information is located in the superior olivary complex (SOC) of the auditory brainstem, where projections from the left and the right side converge (for review, see Grothe et al., 2010). More specifically, the medial superior olive (MSO) plays a dominant role in encoding fine-structure ITDs of low-frequency sounds, while neurons in the lateral superior olive (LSO) are often sensitive to both ILDs and ITDs, including envelope ITDs (for review, see Tollin, 2003). A fundamental question in the study of binaural-information processing is how to relate the neuronal representations of these cues to the evoked percepts.
Historically, ITD encoding was envisaged to rely on an array of binaural coincidence-detecting neurons receiving differently delayed inputs from the left and right ears. These internal delays were thought to compensate for the respective external ITD (Jeffress, 1948). Later anatomical and physiological studies found such circuitry in the auditory brainstem of birds (Carr and Konishi, 1990; Köppl and Carr, 2008). Based on this delay-line approach, binaural perception has most often been modeled and explained by interaural cross-correlation. Such a coincidence-detecting model unit responds maximally when the relative internal delay between its bilateral inputs compensates for the external ITD (e.g., Jeffress, 1948; Lindemann, 1986; Bernstein and Trahiotis, 2003; Stern and Shear, 1996; Colburn, 1977). However, a delay-line mechanism may not be operational in the mammalian binaural pathway (e.g., Grothe et al., 2010), although the issue is still under debate (Leibold and Grothe, 2015; Yin et al., 2019; Joris and van der Heijden, 2019).
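To make the delay-line idea concrete, the following minimal Python sketch treats each candidate internal delay as one coincidence detector and uses the cross-correlation of the two ear signals at that delay as the detector's activity; the ITD estimate is the delay of the most active detector. It illustrates the class of models discussed above, not the model developed in this study, and the function name and sign convention (positive when the right-ear signal lags) are our own.

```python
import math

def cross_correlation_itd(left, right, fs, max_lag_s):
    """Jeffress-style ITD estimate: internal delay of the most active coincidence detector.

    Each candidate lag acts as one coincidence detector; its activity is the
    (unnormalized) cross-correlation of the two ear signals at that lag.
    A positive result means that the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_lag_s * fs))
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, x in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                value += x * right[m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / fs
```

The multiplicative form corresponds to the cross-correlation interpretation; replacing the product by a count of coincident spikes would be a closer analog of Jeffress's coincidence detectors.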
For the processing of ITDs in the envelope of high-frequency stimuli, there is a more fundamental discrepancy between animal physiological studies and the most comprehensive models of human perception: Most neurons sensitive to envelope ITDs, especially in the LSO, receive excitatory input originating from the ipsilateral side and inhibitory input from the contralateral side (Tollin, 2003), but models that account for envelope ITD-based lateralization or discrimination commonly use an excitatory–excitatory or multiplicative interaction (e.g., Bernstein and Trahiotis, 2003, 2012). Despite this discrepancy, these models can account for most perceptual data with high accuracy. Notably, Bernstein and Trahiotis (2012) were able to account for 94% of the variance of their psychoacoustic data obtained using 960 different stimuli that were formed by varying five different stimulus parameters: amplitude modulation depth, rate of modulation, modulation exponent, ILD, and envelope ITD.
The primary goal of the current study is to investigate whether a model framework that employs neither a delay-line scheme nor a multiplication-based cross-correlation can still account for the psychoacoustic data of Bernstein and Trahiotis (2003, 2012) and of Dietz et al. (2015). We demonstrate here that the rate difference between a left-hemispheric and a right-hemispheric binaural excitatory–inhibitory (EI) model neuron is largely sufficient to explain both envelope-ITD-based and ILD-based lateralization. The EI model reproduces the well-known sigmoidal ILD-rate functions characteristic of LSO neurons. For envelope ITDs up to at least 1 ms, the rate difference between left and right EI-model neurons increases monotonically with ITD. The decoding stage exploits these monotonic relations to map the rate difference onto a perceptual quantity, the extent of laterality.
II. METHODS
A. Model topology
The physiologically motivated binaural lateralization model (Fig. 1) starts with the auditory periphery model of Bruce et al. (2018). The input to the model is a stimulus in the form of a pressure waveform [Fig. 2(A)]. The stimulus is first processed by a band-pass filter representing the middle ear (Fig. 1). The output of the middle-ear filter is then processed by three parallel feed-forward paths: the component-1 path, the component-2 path, and the control path (Zilany and Bruce, 2006). Together, these paths represent the collective response properties of the basilar membrane and the inner hair cells (IHCs), accounting both for the passive mechano-electrical transduction at the IHCs and for the mechano-electrical and electro-mechanical transduction mediated by the outer hair cells. The filtered signal is converted into IHC receptor potentials [Fig. 2(B)], each IHC having its own characteristic frequency (CF). The model also includes a physiologically realistic representation of the synapses between the IHCs and the auditory nerve (AN). The output of the model is generated by a spike generator that produces a series of AN spikes [Fig. 2(C); for a more detailed description of the model, see Bruce et al., 2018]. Each model AN fiber is characterized by a spontaneous rate $r_s$, a relative refractory time $t_{\mathrm{rel}}$, and an absolute refractory time $t_{\mathrm{abs}}$. In Fig. 2(D), the output spikes are summarized in a peristimulus time histogram (PSTH). In short, the block “periphery model” in Fig. 1 transforms the acoustic stimulus into the spiking pattern of AN fibers arranged along the tonotopic axis.
(Color online) The model structure is subdivided into two parts: (1) The primary processing stage, which comprises the periphery, receiving the binaural sound stimulus as input, and the excitatory–inhibitory (EI) integration stage, which bilaterally receives the excitatory (arrow) and inhibitory (bullet) outputs of the periphery. (2) The decision stage, a simple two-channel rate-difference model that maps onto the acoustic pointer (see Sec. II D) and predicts the extent of laterality.
Steady state response of the periphery model (CF = 4 kHz) for a fully modulated, sinusoidally amplitude-modulated (SAM) tone with a modulation frequency fm = 128 Hz and carrier frequency fc = 4 kHz at a stimulus level of 20 dB SPL. (A) Stimulus sound waveform, (B) the IHC receptor potential, (C) the spike raster plot for AN fibers, and (D) the PSTH.
For the binaural interaction stage, we used the coincidence-counting model of Ashida et al. (2016). A model neuron of this stage receives excitatory synaptic inputs from ipsilateral AN fibers and inhibitory inputs from contralateral AN fibers (Fig. 1), all with the same CF. In the EI model, two rectangular time windows slide along the time axis: one of duration $W_{\mathrm{ex}}$ for the excitatory fibers from the ipsilateral side and one of duration $W_{\mathrm{inh}}$ for the inhibitory fibers from the contralateral side. The sum of the excitatory spikes, each counting +1, and the inhibitory spikes, each counting −δ, forms an activation variable that can be understood as a surrogate for the membrane potential of a real neuron (Ashida et al., 2017). Once the activation variable reaches the response threshold θ, the EI-model neuron generates an output spike. After an output spike, no further output is possible within the refractory period, even though synaptic integration continues during this period (i.e., the activation variable is not reset after spiking).
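The counting rule just described can be sketched in a few lines of Python. This is a simplified discrete-time illustration under our own assumptions, not the authors' MATLAB code: both windows are assumed to trail the current time, the spike times of all inputs of one type are pooled into a single sorted list, and all function and argument names are hypothetical. Defaults are the best-performance values of Table I.

```python
from bisect import bisect_right

def count_in_window(spike_times, t, width):
    """Number of spikes in the trailing window (t - width, t]; spike_times must be sorted."""
    return bisect_right(spike_times, t) - bisect_right(spike_times, t - width)

def ei_coincidence_counter(exc_spikes, inh_spikes, t_end, w_ex=1.1e-3, w_inh=3.1e-3,
                           theta=3, delta=2, t_ref=1.6e-3, dt=1e-5):
    """Return output spike times (s) of one EI model neuron.

    exc_spikes: sorted spike times pooled over all ipsilateral (excitatory) AN inputs.
    inh_spikes: sorted spike times pooled over all contralateral (inhibitory) AN inputs.
    dt = 10 us corresponds to the 100 kHz sampling rate of the periphery.
    """
    out = []
    last_out = float("-inf")
    for step in range(int(round(t_end / dt)) + 1):
        t = step * dt
        # activation: +1 per excitatory spike in W_ex, -delta per inhibitory spike in W_inh
        activation = (count_in_window(exc_spikes, t, w_ex)
                      - delta * count_in_window(inh_spikes, t, w_inh))
        # fire on reaching threshold outside the refractory period; no reset of the counts
        if activation >= theta and t - last_out >= t_ref:
            out.append(t)
            last_out = t
    return out
```

Dividing the number of output spikes by the analysis duration gives the EI-model rate used in the following sections.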
B. Relating EI-model output to experimental LSO data
Characteristic outputs of the EI stage in response to a 20 dB sound pressure level (SPL) sinusoidally amplitude-modulated tone with a carrier frequency ($f_c$) of 4 kHz and a modulation frequency ($f_m$) of 128 Hz are shown in Fig. 3. Instead of using spike trains generated by a Poisson process as input (as in Ashida et al., 2016), we use a model of the auditory periphery (Bruce et al., 2018) as the front end. The EI model of Ashida et al. (2016) had not previously been tested with such an input and generated only very sparse output, if any. Therefore, the EI parameters had to be adjusted (see Sec. III). With the adjusted parameters, the model produces ITD- and ILD-rate functions [Figs. 3(A) and 3(B)] similar to those observed experimentally in the LSO (e.g., Joris and Yin, 1995, 1998; Joris, 1996; Tollin and Yin, 2002). As in physiological measurements, the simulated output spike rate of the model varies periodically with ITD [Fig. 3(A)]. The overall response rate increases with ILD, while the shape of the periodic ITD-rate functions remains mostly unaffected by ILD (comparable physiological results can be found in Fig. 8 of Joris and Yin, 1995). For negative ILDs, which indicate a higher stimulus level on the left side, the response rate of the left simulated EI neuron is higher than that of the right model neuron [Fig. 3(B)]; for positive ILDs, this relation is reversed, so that the right model neuron generates more spikes than the left. This effect is comparable to physiological data from Joris and Yin [1995, Fig. 9(A)].
Tuning functions of the EI stage for ITD and ILD. The model parameters were those derived in Sec. III (see Table I, best performance). AN input fibers had a CF of 4 kHz. The same stimulus as for Fig. 2 was used. (A) ITD rate functions of the EI model in the left hemisphere for five different ILDs. (B) ILD rate functions for EI-model neurons in the left and right hemisphere (ITD = 0 ms). Filled symbols correspond to data points also shown in panel A. (C) ITD rate functions for the left and right hemisphere (ILD = 0 dB). (D) Rate difference between the left and right EI-model neuron outputs. The shaded area between ±0.63 ms corresponds to the approximate physiological ITD range of humans.
The ITD rate functions of the left and the right hemisphere are shown in Fig. 3(C). The ILD was set to 0 dB, and the response rates of the simulated neurons are shown as a function of ITD [for a physiological comparison, see Fig. 16(B) of Joris and Yin, 1998]. The trough is not at zero ITD because the inhibition lasts longer than the excitation: the minimum response is reached when the excitation is centered in the longer inhibition window (Ashida et al., 2016), i.e., at $\Delta t_{\mathrm{worst}} = (W_{\mathrm{inh}} - W_{\mathrm{ex}})/2$. The ITD rate functions of the left and right model neurons are mirror images of each other about Δt = 0 ms. The rate difference between the two hemispheres is shown in Fig. 3(D); this function is nearly point-symmetric about the origin and approximately linear around it. Its shape depends on the shapes of the left and right ITD rate functions, which, in turn, depend on stimulus parameters such as ILD, modulation frequency, and modulation depth (see Sec. III A).
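As a worked example, inserting the best-performance window durations from Table I ($W_{\mathrm{ex}}$ = 1.1 ms, $W_{\mathrm{inh}}$ = 3.1 ms) into the expression for the worst ITD gives:

```latex
\Delta t_{\mathrm{worst}} = \frac{W_{\mathrm{inh}} - W_{\mathrm{ex}}}{2}
  = \frac{3.1\,\mathrm{ms} - 1.1\,\mathrm{ms}}{2} = 1.0\,\mathrm{ms}
```

Each single-hemisphere ITD rate function therefore has its trough about 1 ms away from zero ITD, at the ITD for which the contralateral (inhibitory) input leads, i.e., outside the ±0.63 ms range shaded in Fig. 3(D).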
C. Quantifying ITD-information transmission
Figures 4(A)–4(D) display the model output of the intermediate processing stages along the tonotopic array for a 4 kHz sinusoidally amplitude-modulated (SAM) tone. For faithful coding of ITD information, the peripheral stage needs to produce sufficient activity [quantified by spike rate, Fig. 4(A)] and phase locking [Fig. 4(B)]. The degree of phase locking can be quantified by the vector strength (Goldberg and Brown, 1969). Each individual spike is represented as a unit vector whose angle corresponds to the spike time within the stimulus cycle, and the vector strength is the length of the mean of these vectors:

$$v = \frac{1}{K}\left|\sum_{k=1}^{K} e^{i\varphi_k}\right| \tag{1}$$

with $K$ being the total number of spikes and $\varphi_k$ the phase of the $k$th spike. If all spikes occur at a single phase of the stimulus waveform, $v$ becomes 1; if the spikes are spread uniformly over the cycle, $v$ approaches 0. Phase locking can be observed in Figs. 2(C) and 2(D) as the synchronized responses of the simulated AN fibers to the envelope of the SAM tone.
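The vector-strength definition translates directly into code. In this sketch (function name hypothetical), the phase of each spike is taken relative to a cycle of given length; for envelope phase locking, as in Fig. 4(B), that cycle would be the modulation period $1/f_m$.

```python
import math

def vector_strength(spike_times, period):
    """Goldberg-Brown vector strength of spike times (s) relative to a cycle of length `period` (s)."""
    if not spike_times:
        return 0.0
    # each spike becomes a unit vector at its phase within the cycle
    phases = [2 * math.pi * (t % period) / period for t in spike_times]
    c = sum(math.cos(ph) for ph in phases)
    s = sum(math.sin(ph) for ph in phases)
    # length of the mean vector: 1 = perfect phase locking, ~0 = no phase locking
    return math.hypot(c, s) / len(phases)
```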
(Color online) Model responses across different CFs for a 68 dB SPL SAM tone with fm = 128 Hz. (A) AN response rate. Different line styles indicate fiber types with different spontaneous rates. (B) Corresponding vector strength. Triangles indicate the CFs used in panels (C) and (D). (C) Rate-ITD functions for the left (L) and right (R) hemisphere. (D) Corresponding hemispheric rate difference.
Compared to physiological experiments, psychoacoustic measurements of envelope-ITD perception are usually performed at much higher sound levels, e.g., 65 or 75 dB SPL. Since AN fibers with CFs matched to the carrier frequency phase lock very poorly to the envelope at such high levels (Joris and Yin, 1992; Dreyer and Delgutte, 2006), envelope ITDs have to be extracted by other means, presumably by neurons tuned to frequencies different from the signal carrier. Figure 4 shows model responses to a higher-level stimulus of 68 dB SPL (compared to 20 dB SPL in Figs. 2 and 3). It is apparent that off-frequency neurons can encode ITD information, while the response of on-frequency EI-model neurons is generally lower and barely changes with ITD [Fig. 4(C)].
D. Decoding the EI response
The output of the AN model along the tonotopic axis [Figs. 4(A) and 4(B)] serves as the input for the central processing stage [Figs. 4(C) and 4(D)], whose output is then used in the decision stage to simulate the extent of laterality. Each EI-model neuron receives a number of excitatory and inhibitory AN model inputs (see Table I) with matching CF. Two EI-model neurons with the same CF, one in the left and one in the right hemisphere, form a pair, and the hemispheric rate difference is computed between them:

$$\Delta R_j = R_{\mathrm{R},j} - R_{\mathrm{L},j} \tag{2}$$

with $R_{\mathrm{R},j}$ ($R_{\mathrm{L},j}$) being the rate of the EI-model neuron in the right (left) hemisphere at the $j$th CF. The mean spike-rate difference between the right and left hemisphere is

$$\overline{\Delta R} = \frac{1}{N}\sum_{j=1}^{N} \Delta R_j \tag{3}$$

with $N$ being the number of frequency channels. The mean rate difference is later converted into the perceived extent of laterality measured in the psychoacoustic experiments [Eq. (4)].
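A minimal sketch of the per-CF rate difference and its across-channel mean. It assumes the right-minus-left sign convention, so that positive values correspond to rightward laterality (matching positive pointer ILDs); the function name is hypothetical.

```python
def hemispheric_rate_difference(rates_left, rates_right):
    """Per-CF rate differences (right minus left, spikes/s) and their mean across CFs."""
    if len(rates_left) != len(rates_right) or not rates_left:
        raise ValueError("need one left and one right rate per CF")
    diffs = [r - l for l, r in zip(rates_left, rates_right)]
    return diffs, sum(diffs) / len(diffs)
```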
Parameters of the EI-model (first column), respective symbols used (second column), the simulated range (third column), and parameters leading to the best performance obtained by the grid search (fourth column).
Parameter | Symbol | Simulated range | Best performance |
---|---|---|---|
Number of excitatory inputs | $M_{\mathrm{ex}}$ | 18–22 | 20 |
Number of inhibitory inputs | $M_{\mathrm{inh}}$ | 6–9 | 8 |
Excitatory window | $W_{\mathrm{ex}}$ | 0.5–1.6 ms | 1.1 ms |
Inhibitory window | $W_{\mathrm{inh}}$ | 2.5–4.0 ms | 3.1 ms |
Response threshold | θ | 2–5 | 3 |
Inhibitory gain | δ | 1–3 | 2 |
Refractory period | | fixedᵃ | 1.6 ms |
The refractory period is set to 1.6 ms (or equal to if 1.6 ms to avoid multiple output spikes resulting from the same input spike).
In previous studies, various central read-out mechanisms were used (for a review, see Dietz et al., 2018). Kelvasa and Dietz (2015) showed that the hemispheric response difference, averaged across the tonotopic array of LSO model neurons, is proportional to azimuthal sound-source localization in cochlear implant users. With the focus of the current study on binaural interaction, we adopt this simple linear mapping of the mean hemispheric response rate difference to the extent of laterality, instead of employing a more complex mapping stage.
In previous psychoacoustic experiments, extents of laterality for high-frequency stimuli were commonly measured with an acoustic-pointing task (Bernstein and Trahiotis, 2003, 2012; Dietz et al., 2015). In these experiments, listeners were first presented with the high-frequency target stimulus and then with a pointer stimulus, a band-limited Gaussian noise centered at 500 Hz with a bandwidth of 200 Hz. The listeners were asked to adjust the ILD of the pointer until its perceived intracranial position matched that of the target. Pointer and target stimuli were alternated repeatedly, without a fixed number of presentations (open loop), until the subjects indicated that the positions of pointer and target matched. The pointer ILD was then used as a measure of the extent of laterality. Positive or negative pointer ILDs indicate a right or left intracranial position, respectively.
Having thus obtained one mean rate difference $\overline{\Delta R}_i$ for each condition $i$, the last model stage relates this simulated quantity to the experimentally obtained pointer ILD. Assuming the simplest case of a linear relationship, a single scaling factor $p$ connects the two quantities:

$$\mathrm{ILD}^{\mathrm{pred}}_i = p\,\overline{\Delta R}_i \tag{4}$$

with $\mathrm{ILD}^{\mathrm{pred}}_i$ being the predicted pointer ILD for the $i$th condition. The factor $p$ (in dB/sps) is considered to be subject specific.
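The fitting criterion for $p$ is not specified here. One simple option, shown purely as an assumption, is a least-squares regression through the origin, $p = \sum_i \overline{\Delta R}_i \mathrm{ILD}^{\mathrm{obs}}_i / \sum_i \overline{\Delta R}_i^2$; both function names are hypothetical.

```python
def fit_scaling_factor(mean_rate_diffs, observed_ilds):
    """ASSUMED criterion: least-squares p for ILD_pred = p * dR (regression through the origin)."""
    num = sum(r * y for r, y in zip(mean_rate_diffs, observed_ilds))
    den = sum(r * r for r in mean_rate_diffs)
    if den == 0:
        raise ValueError("all rate differences are zero")
    return num / den  # dB per spike/s

def predict_pointer_ild(mean_rate_diffs, p):
    """Linear mapping of mean rate differences (spikes/s) to predicted pointer ILDs (dB)."""
    return [p * r for r in mean_rate_diffs]
```

Because there is no intercept, a zero rate difference always maps onto a centered (0 dB) pointer, consistent with the single-factor mapping.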
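To quantify the goodness of the prediction, the variance accounted for (VAF) by the model was calculated as

$$\mathrm{VAF} = 1 - \frac{\sum_i \left(\mathrm{ILD}^{\mathrm{obs}}_i - \mathrm{ILD}^{\mathrm{pred}}_i\right)^2}{\sum_i \left(\mathrm{ILD}^{\mathrm{obs}}_i - \overline{\mathrm{ILD}^{\mathrm{obs}}}\right)^2} \tag{5}$$

with $\mathrm{ILD}^{\mathrm{obs}}_i$ and $\mathrm{ILD}^{\mathrm{pred}}_i$ being the observed and predicted acoustic pointer ILDs for the $i$th condition, respectively, and $\overline{\mathrm{ILD}^{\mathrm{obs}}}$ the mean of the observed values across conditions. Additionally, we state the root-mean-square error ϵ between observed and predicted pointer ILDs.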
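Both goodness-of-fit measures can be computed as sketched below (function names hypothetical). Note that the VAF becomes negative when the predictions are worse than simply predicting the mean of the observed values.

```python
import math

def variance_accounted_for(observed, predicted):
    """VAF = 1 - SS_residual / SS_total, computed over all conditions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted pointer ILDs (dB)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```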
E. Simulation design
The same stimuli as in the respective studies were generated. They differ from the original stimuli only in sampling rate (100 kHz, as required by the peripheral stage), duration, and the way the ITDs were imposed: the ITDs were inserted after the peripheral processing to reduce computational demand. Of the total stimulus duration of 2.2 s, the first 200 ms were discarded to exclude onset effects and the associated adaptation of auditory nerve fibers. In the psychoacoustic experiments, the pointer ILD was matched in an open loop, so it is unclear how much listening time the subjects used to adjust the pointer. Nevertheless, for such a pointer paradigm, the simulated duration is not expected to be critical: it affects only the standard error of the spike rates, not the mean rates (the same holds for the psychoacoustic responses).
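Imposing the ITD after the periphery can be done by delaying the AN spike trains of one ear and then discarding the onset portion. The following minimal sketch rests on several assumptions of ours: the ITD is applied to the AN spike times (one list per fiber), a positive ITD means that the right ear leads and is implemented by delaying the left-ear trains, and the function name is hypothetical.

```python
def apply_itd_and_trim(left_trains, right_trains, itd_s, onset_s=0.2, t_end=2.2):
    """Delay one ear's spike trains to impose an ITD, then keep spikes in [onset_s, t_end)."""
    if itd_s >= 0:
        # right ear leads: delay every left-ear spike by the ITD
        left_trains = [[t + itd_s for t in train] for train in left_trains]
    else:
        # left ear leads: delay every right-ear spike by |ITD|
        right_trains = [[t - itd_s for t in train] for train in right_trains]

    def trim(trains):
        # discard the onset period (adaptation) and anything shifted past the stimulus end
        return [[t for t in train if onset_s <= t < t_end] for train in trains]

    return trim(left_trains), trim(right_trains)
```

Because the periphery is simulated without ITD, each AN simulation can be reused for all ITD values of a given condition, which is the source of the computational saving.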
In total, $M_{\mathrm{ex}} + M_{\mathrm{inh}}$ AN fibers were simulated for each of the $N$ = 30 CFs in the range of 2–10 kHz, distributed equidistantly along the tonotopic axis (according to Greenwood, 1961). For the default parameters ($M_{\mathrm{ex}}$ = 20, $M_{\mathrm{inh}}$ = 8; Table I), this results in (20 + 8) × 30 = 840 AN fibers per hemisphere feeding into the EI stage. When AN fibers were selected randomly, each with its own response rate, a left or right bias in overall activity was occasionally observed in the EI stage, leading to a lateralization bias that is not observed in normal-hearing (NH) listeners. In the real auditory system, larger neuron populations or further central stages are expected to average out or compensate for any such bias. To avoid this bias in our model, a deterministic distribution of spontaneous rates representative of the fiber type used was generated: $M_{\mathrm{ex}}$ or $M_{\mathrm{inh}}$ values of $r_s$ were picked equidistantly on the cumulative Gaussian distribution of the respective fiber type. We assume μ = 4 sp/s and σ = 4 sp/s within a range of 0.5–18 sp/s for medium-spontaneous-rate fibers, and μ = 70 sp/s and σ = 30 sp/s within a range of 18–180 sp/s for high-spontaneous-rate fibers (Liberman, 1978; Bruce et al., 2018). The refractory times $t_{\mathrm{rel}}$ (131–894 μs) and $t_{\mathrm{abs}}$ (209–692 μs) were drawn randomly for each AN fiber but set identically in both hemispheres (for values and ranges, see Miller et al., 2001; Bruce et al., 2018).
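The channel layout and the deterministic choice of spontaneous rates can be sketched as follows. Assumptions: the human Greenwood constants A = 165.4 Hz, a = 2.1, k = 0.88 are taken from Greenwood's later (1990) human fit, which may differ from the 1961 constants cited above; "equidistant on the cumulative Gaussian" is interpreted as midpoint quantiles of a Gaussian truncated to the stated range; all function names are hypothetical.

```python
import math
from statistics import NormalDist

# ASSUMED human Greenwood constants: f = A * (10**(a*x) - k), x = relative place (0 apex, 1 base)
A_HZ, A_SLOPE, K_CONST = 165.4, 2.1, 0.88

def greenwood_place(f_hz):
    """Relative cochlear place x of frequency f_hz (inverse Greenwood function)."""
    return math.log10(f_hz / A_HZ + K_CONST) / A_SLOPE

def greenwood_freq(x):
    """Characteristic frequency (Hz) at relative place x."""
    return A_HZ * (10 ** (A_SLOPE * x) - K_CONST)

def greenwood_cfs(f_lo=2000.0, f_hi=10000.0, n=30):
    """n CFs spaced equidistantly in cochlear place between f_lo and f_hi."""
    x_lo, x_hi = greenwood_place(f_lo), greenwood_place(f_hi)
    return [greenwood_freq(x_lo + (x_hi - x_lo) * i / (n - 1)) for i in range(n)]

def deterministic_spont_rates(m, mu, sigma, lo, hi):
    """m spontaneous rates at equally spaced (midpoint) quantiles of a Gaussian truncated to [lo, hi]."""
    dist = NormalDist(mu, sigma)
    c_lo, c_hi = dist.cdf(lo), dist.cdf(hi)
    return [dist.inv_cdf(c_lo + (c_hi - c_lo) * (i + 0.5) / m) for i in range(m)]

M_EX, M_INH, N_CF = 20, 8, 30                  # Table I defaults and number of CFs
fibers_per_hemisphere = (M_EX + M_INH) * N_CF  # 840
high_sr_exc = deterministic_spont_rates(M_EX, 70.0, 30.0, 18.0, 180.0)
```

Since the quantiles are fixed rather than drawn, the left and right hemispheres receive identical rate distributions, which is what removes the random lateralization bias.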
The model was implemented in MATLAB (MathWorks, Natick, MA). The code is published as supplementary material to this paper.1
III. PREDICTING LATERALIZATION OF HIGH-FREQUENCY SOUNDS
To fit the model output to the three experimental datasets, we focused on varying the parameters of the EI-model stage. This stage has seven parameters: the numbers of excitatory and inhibitory inputs ($M_{\mathrm{ex}}$ and $M_{\mathrm{inh}}$), the duration of the excitatory window ($W_{\mathrm{ex}}$), the duration of the inhibitory window ($W_{\mathrm{inh}}$), the response threshold (θ), an inhibitory gain factor (δ) that increases the weight of inhibitory inputs, and the duration of the refractory period (Table I). For a detailed description of each parameter and its effects, see Ashida et al. (2016). The refractory period was kept constant at 1.6 ms. In addition to varying these EI-stage parameters, the model was tested with either high- or mid-spontaneous-rate AN fibers.
In addition, the subject-specific factor p relates the rate difference to the pointer ILD. Although this study models only average data, p may differ even between the averages of small cohorts. Therefore, p was fitted separately to the mean data of each of the three datasets.
The aim of the parameter variation was to account for most of the variance in the three published datasets. Figure 5 shows how the model performance depends on various parameters for each of the three studies presented below. There is no common optimal parameter set across studies, but many maxima occur within the physiologically motivated test range, and a range of parameter sets accounts for much of the variance in all studies. The parameter set listed in Table I, with $W_{\mathrm{inh}}$ = 3.1 ms (cf. filled symbols in Fig. 5), explains much of the variance of all three datasets and was therefore used for all simulations shown in the figures below.
(Color online) Variance explained for Bernstein and Trahiotis (2012) with p = 0.29 dB/sps (top row), Bernstein and Trahiotis (2003) with p = 0.48 dB/sps (middle row), and Dietz et al. (2015) with p = 1.02 dB/sps (bottom row). Color and line style indicate the threshold: θ = 2 (blue, dotted), θ = 3 (red, solid), θ = 4 (yellow, dash-dotted), θ = 5 (purple, dashed). Note the different ordinate ranges. The first column shows model predictions with mid-spontaneous-rate fiber input and an inhibitory gain δ = 1, the second column with δ = 2, and the third column with high-spontaneous-rate fibers projecting to the EI stage and δ = 1. The filled symbols mark the chosen parameter combination (Table I) for each dataset.
A. Raised-sine stimuli
To investigate the influence of envelope shape on the extent of laterality, Bernstein and Trahiotis (2012) used raised-sine stimuli. These stimuli are based on SAM tones and allow for an arbitrary exponent that determines the peakedness of the stimulus envelope independently of modulation frequency and modulation depth [John et al., 2002; see Fig. 1 of Bernstein and Trahiotis (2009) for examples of raised-sine stimuli]. An exponent of 1 generates a conventional SAM tone; for higher exponents, the envelope of the raised sine is more peaked and has steeper slopes. In a compact form (Bernstein and Trahiotis, 2012), the stimulus is defined by
In the study of Bernstein and Trahiotis (2012), the carrier frequency was fixed at 4 kHz, while modulation frequencies of 32, 64, 128, and 256 Hz and modulation depths of 0.25, 0.5, 0.75, and 1.0 were applied. The exponent was either 1 or 8. ITDs of 0, 200, 400, 600, 800, and 1000 μs were employed, and ILDs of −8, −4, 0, 4, and 8 dB were applied by symmetrically varying the sound pressure level (in dB) on the left and the right side. An overall sound level of 68 dB SPL was used. All possible stimulus parameter combinations were tested, resulting in a total of 960 stimulus conditions (Fig. 6). This extensive dataset captures many factors influencing sound lateralization and is used here as the central dataset for fitting the model parameters of the EI stage. Arguably because of its richness, it restricts the acceptable range of parameters more than the other datasets do (Fig. 5).
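The raised-sine family and the factorial design can be sketched as follows. The envelope formula is an assumption made for illustration and is not necessarily the exact parameterization of Bernstein and Trahiotis (2012): it uses the raised-sine shape $2[(1 + \sin 2\pi f_m t)/2]^k$ for full modulation and scales its deviation from 1 by the modulation depth, which reproduces a conventional SAM tone for an exponent of 1. The symmetric split of the ILD across the ears (positive ILD = right ear louder) is likewise an assumption, and the function names are hypothetical.

```python
import math
from itertools import product

def raised_sine_envelope(t, fm, depth, exponent):
    # ASSUMED form: 1 + depth * (2 * ((1 + sin(2*pi*fm*t)) / 2) ** exponent - 1)
    # exponent = 1 reduces to the SAM envelope 1 + depth * sin(2*pi*fm*t)
    core = 2.0 * ((1.0 + math.sin(2 * math.pi * fm * t)) / 2.0) ** exponent
    return 1.0 + depth * (core - 1.0)

def raised_sine(fc, fm, depth, exponent, fs, dur):
    """Sampled raised-sine stimulus (unscaled; level calibration omitted)."""
    n = int(round(dur * fs))
    return [raised_sine_envelope(i / fs, fm, depth, exponent) * math.sin(2 * math.pi * fc * i / fs)
            for i in range(n)]

def ear_levels(ild_db, overall_db=68.0):
    # ASSUMED symmetric split of the ILD around the overall level: (left dB SPL, right dB SPL)
    return overall_db - ild_db / 2.0, overall_db + ild_db / 2.0

# Factorial design as listed in the text
conditions = list(product([32, 64, 128, 256],             # modulation frequency (Hz)
                          [0.25, 0.5, 0.75, 1.0],         # modulation depth
                          [1, 8],                         # exponent
                          [0, 200, 400, 600, 800, 1000],  # ITD (us)
                          [-8, -4, 0, 4, 8]))             # ILD (dB)
```

Enumerating the design confirms the condition count: 4 × 4 × 2 × 6 × 5 = 960.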
(Color online) Extents of laterality experienced by listeners (Bernstein and Trahiotis, 2012) are shown as black symbols. They are grouped based on the parameters of the stimulus. The predictions of the model (scaling factor p = 0.29 dB/sps) are shown as solid lines with the ITD as abscissa. The four blocks (each consisting of eight panels) represent the four modulation frequencies, the panels in columns organize the different depths of modulation, while the two different exponents are separated into rows. Each panel shows the acoustic pointer in dB on the ordinate, which quantifies the extent of laterality as a function of ITD. Five different ILDs are used in each panel. The variance accounted for is stated in each panel separately. Adapted and redrawn from Fig. 1 of Bernstein and Trahiotis (2012) with permission.
Model predictions for all stimulus conditions are displayed in Fig. 6 as solid lines (after converting rate difference to pointer ILD). The stimulus conditions are organized into 32 panels, one for each combination of modulation frequency, modulation depth, and exponent of the raised sine. Each panel shows the acoustic pointer as a function of ITD and ILD. The mean measured values of the acoustic pointer (Bernstein and Trahiotis, 2012) are shown as symbols. The steepness of the relation between ITD and pointer ILD in each panel indicates the strength of ITD-based lateralization.
Taken together, three main effects are apparent in both the observed data and the model predictions (Fig. 6): (1) ILD-based lateralization is almost constant; i.e., the five lines in each panel primarily differ by an offset that is almost constant across panels, with one exception described below. (2) The extent of ITD-based lateralization increases with modulation depth and with modulation frequency up to 128 Hz but then decreases slightly for a modulation frequency of 256 Hz. (3) The relation between ITD and pointer ILD is linear for low modulation frequencies combined with small modulation depths ($f_m \le 64$ Hz), but becomes curved for high modulation frequencies combined with full modulation ($f_m \ge 128$ Hz and a modulation depth of 1.0).
The model was able to quantitatively reproduce all three main effects and accounted for 95.7% of the variance in the data (ϵ = 2 dB) with the parameter set chosen here (see Table I). A model test using only 1 s signals, rather than 2 s, accounted for an almost identical 95.5% of the variance. Because this overall metric captures only the general trends in the data, we also state the explained variance separately for each panel. Two deviations are apparent. First, for an exponent of 8, $f_m$ = 32 Hz, and full modulation (Fig. 6, upper left block, panel in the lower right corner), the model underestimates lateralization when ITD and ILD both favor the same direction. Second, for $f_m$ = 256 Hz, an exponent of 1, and full modulation, the model underestimates the impact of a positive ITD when a negative ILD is present at the same time (Fig. 6, lower right block, panel in the upper right corner).
The model output for all 960 conditions (prior to across-frequency integration and conversion into acoustic pointer) is shown in Fig. 7. The rate differences at each CF are related to the lateralization data recorded by Bernstein and Trahiotis (2012). The rate difference responses of off-frequency neurons show a higher correlation with the observed data. For neurons with a CF matching the carrier frequency of 4 kHz, the model accounted for only 30% of the variance in the data, primarily ILD-based lateralization. In contrast, the off-frequency pair of model neurons accounts for 92% of the variance.
Scatter plots of the rate difference between left and right EI-model neuron pairs versus observed pointer ILDs (Bernstein and Trahiotis, 2012) for all conditions and seven example CFs. Each plot shows 960 data points, corresponding to the parameter combinations of the high-frequency raised sines. The correlation coefficient (r) and the variance accounted for are given for each of the seven positions in the tonotopic array. The model was parameterized with the best overall parameter set (see Table I).
B. Transposed stimuli
Transposed stimuli were designed to better mimic low-frequency tonal responses in high-frequency AN fibers. Low-frequency tones depolarize the hair cells only during the condensation phase, i.e., for about 50% of each period. In contrast, sinusoidally amplitude-modulated high-frequency stimuli cause a continuous depolarization of hair cells, except for the brief moments of zero amplitude, when the receptor potential draws near the resting potential.
The purpose of transposed stimuli is to mimic the activation pattern of a low-frequency stimulus in a high-frequency region. To do so, a low-frequency base stimulus (e.g., a low-frequency sinusoid) is half-wave rectified and subsequently low-pass filtered to simulate the functional role of hair cells. The output serves as the modulator and is multiplied by a high-frequency carrier (van de Par and Kohlrausch, 1997). After processing by the hair cells, the resulting stimulus provides high-frequency AN fibers with a temporal excitation pattern similar to the one the base stimulus provides to low-frequency AN fibers. A difference remains in the rarefaction phase, where the base stimulus causes hyperpolarization but the transposed stimulus results in a resting potential.
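The transposition procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the stimulus code of the cited studies: the low-pass stage is an idealized FFT brick-wall filter, and the 2 kHz cutoff, 48 kHz sampling rate, and 0.2 s duration are illustrative choices; the function names are hypothetical.

```python
import numpy as np


def lowpass_fft(x, fs, cutoff_hz):
    """Brick-wall low-pass filter in the frequency domain (a simplification)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def transposed_tone(f_mod, f_carrier, fs=48000, duration=0.2, lp_cutoff_hz=2000.0):
    """Transposed tone: half-wave rectified, low-passed base tone times a carrier.

    Assumption: 2 kHz modulator cutoff and ideal (brick-wall) filtering.
    """
    t = np.arange(int(round(fs * duration))) / fs
    base = np.sin(2 * np.pi * f_mod * t)                  # low-frequency base stimulus
    rectified = np.maximum(base, 0.0)                     # half-wave rectification
    modulator = lowpass_fft(rectified, fs, lp_cutoff_hz)  # smooth the modulator
    carrier = np.sin(2 * np.pi * f_carrier * t)           # high-frequency carrier
    return modulator * carrier


x = transposed_tone(f_mod=125.0, f_carrier=4000.0)
```

Because the rectified modulator contains a DC component, the spectrum of the result is centered on the 4 kHz carrier, with sidebands at multiples of the 125 Hz base frequency confined to within the modulator cutoff of the carrier.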
Colburn and Esquissaud (1976) hypothesized that similar AN activation patterns, as evoked by low-frequency and transposed tones, should cause similar binaural interaction. In contrast, conventional high-frequency stimuli are expected to convey interaural differences less effectively. Bernstein and Trahiotis (2003) tested this hypothesis by comparing three types of stimuli: low-frequency noise, transposed noise, and high-frequency narrow-band Gaussian noise. To test our high-frequency model, we used the latter two stimulus types [Fig. 8(B)]. The stimuli were generated as in Bernstein and Trahiotis (2003); the carrier frequency for all stimuli was fixed at 4 kHz, and bandwidths of 25, 50, 100, 200, and 400 Hz were used. The transposed stimuli were modulated with a half-wave rectified low-frequency noise centered at either 125 or 250 Hz. For the 125 Hz center frequency, the largest bandwidth was 200 Hz.
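The two stimulus families can be generated along the lines of the sketch below, which band-limits Gaussian noise in the frequency domain and then transposes the low-frequency version onto the 4 kHz carrier. The FFT-based band limiting, the 2 kHz modulator cutoff, the unit-RMS normalization, the 0.3 s duration, and all function names are assumptions for illustration; calibration to 72 dB SPL is omitted. The guard in `narrowband_noise` reflects why the 125 Hz transposed noise stops at a 200 Hz bandwidth: a 400 Hz band centered at 125 Hz would extend below 0 Hz.

```python
import numpy as np


def bandlimit_fft(x, fs, low_hz, high_hz):
    """Zero all spectral components outside [low_hz, high_hz] (ideal filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def narrowband_noise(center_hz, bandwidth_hz, fs, duration, rng):
    """Gaussian noise restricted to center +/- bandwidth/2, scaled to unit RMS."""
    low = center_hz - bandwidth_hz / 2
    if low < 0:
        raise ValueError("band would extend below 0 Hz")
    n = int(round(fs * duration))
    x = bandlimit_fft(rng.standard_normal(n), fs, low, center_hz + bandwidth_hz / 2)
    return x / np.sqrt(np.mean(x ** 2))


def transposed_noise(center_hz, bandwidth_hz, f_carrier=4000.0, fs=48000,
                     duration=0.3, lp_cutoff_hz=2000.0, rng=None):
    """Low-frequency noise -> half-wave rectify -> low-pass -> times carrier."""
    rng = np.random.default_rng() if rng is None else rng
    base = narrowband_noise(center_hz, bandwidth_hz, fs, duration, rng)
    modulator = bandlimit_fft(np.maximum(base, 0.0), fs, 0.0, lp_cutoff_hz)
    t = np.arange(len(base)) / fs
    return modulator * np.sin(2 * np.pi * f_carrier * t)


rng = np.random.default_rng(0)
bandwidths_hz = [25, 50, 100, 200, 400]
gaussian = {bw: narrowband_noise(4000.0, bw, 48000, 0.3, rng) for bw in bandwidths_hz}
transposed_125 = {bw: transposed_noise(125.0, bw, rng=rng) for bw in bandwidths_hz if bw <= 200}
transposed_250 = {bw: transposed_noise(250.0, bw, rng=rng) for bw in bandwidths_hz}
```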
(Color online) Stimulus conditions, psychoacoustic data, and model predictions. (A) Psychoacoustic data measured by Bernstein and Trahiotis (2003) represented as symbols, and our model predictions shown as curves (scaling factor p = 0.48 dB/sps). Data adapted and redrawn from Fig. 2 of Bernstein and Trahiotis (2003) with permission. The input was transposed noise centered at 125 Hz (densely dashed-dotted-dotted, diamond), transposed noise centered at 250 Hz (solid, bullet), and high-frequency Gaussian noise (dashed-dotted, down-facing triangle). (B) High-frequency transposed stimuli as counterparts of low-frequency Gaussian noise with center frequencies of either 125 Hz (top) or 250 Hz (middle) and different bandwidths, and high-frequency Gaussian noise with a center frequency of 4 kHz (bottom). (C) High-frequency stimuli with different envelope shapes (Dietz et al., 2015): a short 1.25 ms rise time, 0 ms plateau, and 18.75 ms decay (top, right-facing triangle); a short 1.25 ms rise time, 8.75 ms plateau, and 1.25 ms decay (middle, square); and a long 18.75 ms rise time, 0 ms plateau, and 1.25 ms decay (bottom, left-facing triangle). (D) The corresponding psychoacoustic data (Dietz et al., 2015) represented as symbols. Model predictions, computed on a continuous ITD axis, are shown as dotted lines for a short rise time and shallow decay, dashed lines for rectangular modulation, and densely dashed-dotted lines for a long rise time and fast decay. The grey area indicates the stimulus statistics used for fitting the model parameters. Adapted and redrawn from Fig. 2 of Dietz et al. (2015) with permission. For comparability, the pointers are shown without normalization to the subject's average pointer ILD (scaling factor p = 1.02 dB/sps).
Figure 8(B) shows the stimuli used for our simulations: (1) transposed low-frequency narrow-band Gaussian noise centered at 125 Hz [Fig. 8(B), upper panel], (2) transposed low-frequency narrow-band Gaussian noise centered at 250 Hz [Fig. 8(B), middle panel], and (3) high-frequency narrow-band Gaussian noise [Fig. 8(B), bottom panel]. As in Bernstein and Trahiotis (2003), the stimulus level was set to 72 dB SPL.
The model predictions were compared to the psychoacoustic data of the corresponding study (Bernstein and Trahiotis, 2003). The EI-model parameters were kept unchanged (Table I) for the data shown in Fig. 8; only the scaling factor was refitted [Eq. (4)].
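Refitting only the scaling factor can be illustrated as a one-parameter least-squares problem. The sketch assumes that Eq. (4) is a purely linear mapping without offset, pointer ILD = p × rate difference, which is consistent with the unit dB/sps quoted for p in Fig. 8 but is not confirmed by this section; the function names and toy numbers are invented and are not data from either study.

```python
import numpy as np


def fit_scaling_factor(rate_diff_sps, pointer_ild_db):
    """Least-squares p (dB/sps) for pointer_ild ~= p * rate_diff, no intercept.

    Assumption: the mapping of Eq. (4) is linear and passes through the origin.
    """
    x = np.asarray(rate_diff_sps, dtype=float)
    y = np.asarray(pointer_ild_db, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


def predict_pointer_ild(rate_diff_sps, p):
    """Map model rate differences (sps) to pointer ILDs (dB)."""
    return p * np.asarray(rate_diff_sps, dtype=float)


# Hypothetical rate differences (sps) and pointer ILDs (dB) for a new dataset.
rate_diff = np.array([-20.0, -5.0, 0.0, 10.0, 25.0])
pointer = np.array([-9.5, -2.6, 0.3, 4.7, 12.1])
p = fit_scaling_factor(rate_diff, pointer)
predicted = predict_pointer_ild(rate_diff, p)
```

Keeping the EI parameters fixed and fitting only p means that any dataset-specific freedom is a single gain, so the shape of the predicted ITD and bandwidth dependence comes entirely from the unchanged model.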
In general, the extent of laterality increased with ITD [Fig. 8(A)]. Bandwidth was the dominant factor for the lateralization of high-frequency Gaussian noise. Three main effects can be observed in the psychophysical data [Fig. 8(A), symbols]: (1) For high-frequency Gaussian noise, there was virtually no lateralization at bandwidths of 25 and 50 Hz [Fig. 8(A), top two panels], and lateralization increased with bandwidth [Fig. 8(A), down-facing triangles]. (2) The transposed stimuli produced larger extents of laterality than the non-transposed stimuli [Fig. 8(A), transposed 125 and 250 Hz stimuli compared to high-frequency Gaussian noise], i.e., listeners adjusted the acoustic pointer to larger ILDs. (3) The pointer ILDs were similar for the two center frequencies of the transposed noises [Fig. 8(A), transposed 125 and 250 Hz stimuli]. Our model reproduced most of these trends [Fig. 8(A)] and accounted for 82.7% of the variance (ϵ = 2.4 dB). Other parameter sets, better suited to this particular study, account for just over 90% of the variance (see Fig. 5).
C. Envelope rise- and decay elements
Another class of high-frequency AM stimuli was constructed by independently varying the durations of the rising envelope segment (rise time), the falling envelope segment (decay), the pause between lobes, and the peak plateau of the temporal envelope (e.g., Klein-Hennig et al., 2011; Dietz et al., 2015). In particular, this configuration allows the generation of temporally asymmetric envelopes. The extent of ITD-based lateralization perceived by subjects was particularly high when a short rise time was combined with a non-zero pause (Dietz et al., 2015). The steepness of the rising part of the envelope had more influence on lateralization than the steepness of the decay. So far, such differences have not been accounted for by cross-correlation-based models (e.g., Klein-Hennig et al., 2011).
The current model was tested with three envelope shapes: (1) a short rise time, zero plateau, and a long decay [Fig. 8(C), top]; (2) a short rise time, a plateau as long as the pause, and a fast decay [Fig. 8(C), middle]; and (3) a long rise time, zero plateau, and a fast decay [Fig. 8(C), bottom]. The carrier was a fully modulated 4 kHz tone whose peak level matched that of a 65 dB SAM tone, as in Dietz et al. (2015). Again, the model with the best-performing parameters (Table I) was used, and a new scaling factor was calculated [Eq. (4)] to account for the different subject group and the differences in experimental setting. Primarily because of the lower stimulus level in this study, the scaling factor had to be larger than in the studies of Bernstein and Trahiotis (2003, 2012).
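The three envelope shapes and the envelope-only ITD can be generated with a sketch like the one below. Several details are assumptions not stated in this section: squared-sine flanks, a 20 ms period in which the pause fills the remainder (0, 8.75, and 0 ms, so that the plateau of the middle shape equals its pause), a 48 kHz sampling rate, and the convention that a positive ITD delays the right-ear envelope. Level calibration is omitted, and the function names are hypothetical.

```python
import numpy as np


def envelope(t, rise, plateau, decay, pause):
    """Periodic envelope built from rise, plateau, decay, and pause (seconds).

    Assumption: squared-sine flanks. Requires rise > 0 and decay > 0.
    """
    period = rise + plateau + decay + pause
    tau = np.mod(t, period)                      # time within the current period
    env = np.zeros_like(tau)                     # pause segment stays at 0
    on_rise = tau < rise
    env[on_rise] = np.sin(0.5 * np.pi * tau[on_rise] / rise) ** 2
    on_plateau = (tau >= rise) & (tau < rise + plateau)
    env[on_plateau] = 1.0
    decay_start = rise + plateau
    on_decay = (tau >= decay_start) & (tau < decay_start + decay)
    env[on_decay] = np.cos(0.5 * np.pi * (tau[on_decay] - decay_start) / decay) ** 2
    return env


def envelope_itd_stimulus(shape, itd, f_carrier=4000.0, fs=48000, duration=0.5):
    """Left/right signals with only the envelope (not the carrier) delayed by itd.

    Assumption: positive itd delays the right-ear envelope.
    """
    t = np.arange(int(round(fs * duration))) / fs
    carrier = np.sin(2 * np.pi * f_carrier * t)  # identical carrier at both ears
    left = envelope(t, **shape) * carrier
    right = envelope(t - itd, **shape) * carrier
    return left, right


ms = 1e-3
# Assumption: pauses chosen so that every period lasts 20 ms.
shapes = {
    "short rise, long decay": dict(rise=1.25 * ms, plateau=0.0, decay=18.75 * ms, pause=0.0),
    "rectangular": dict(rise=1.25 * ms, plateau=8.75 * ms, decay=1.25 * ms, pause=8.75 * ms),
    "long rise, short decay": dict(rise=18.75 * ms, plateau=0.0, decay=1.25 * ms, pause=0.0),
}
left, right = envelope_itd_stimulus(shapes["rectangular"], itd=1.0 * ms)
```

Only the envelope is shifted; the carrier remains interaurally in phase, so the stimulus carries an envelope ITD without a fine-structure ITD.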
The model predictions were compared with the data of the normal-hearing listeners in Dietz et al. (2015). In both the data and the model predictions, a long rise time and short decay led to the least pronounced lateralization [Fig. 8(D), left-facing triangles and densely dashed-dotted line] and to a linear increase of lateralization with ITD up to 2 ms. The model also correctly predicted the 50%–80% higher lateralization for the condition with short rise and short decay. However, the model clearly underestimated the lateralization for a short rise time and long decay [Fig. 8(D), right-facing triangles and dotted line], particularly at large ITDs. The model accounted for 83.4% of the variance (ϵ = 2.8 dB).
IV. DISCUSSION
The present study aimed to develop a rate-difference model that links mammalian physiology with binaural phenomena observed in human psychoacoustics. We proposed a computational model that captures ILD- and envelope-ITD-based lateralization of narrowband high-frequency stimuli. At a given sound level, the predicted lateralization is proportional to the summed hemispheric rate difference of EI-model neurons with identical parameters but different CFs. The EI-model neurons represent the core binaural processing stage, corresponding to the LSO (Ashida et al., 2016), which is the primary nucleus in the mammalian brain for encoding both ILD and envelope ITD (Tollin, 2003).
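To make the count-comparison idea concrete, the following Python sketch implements a strongly simplified discrete-time EI neuron in the spirit of Ashida et al. (2016), together with the hemispheric rate difference between a left and a right neuron. It is not the authors' implementation: inputs are Poisson spike counts rather than outputs of the Bruce et al. (2018) periphery, inhibition is driven directly by contralateral AN-like inputs (the intermediate inhibitory relay is ignored), and all parameter values except the 3.1 ms inhibitory window discussed in Sec. IV A are hypothetical.

```python
import numpy as np


def windowed_count(counts, window_bins):
    """Input spikes in the current bin and the preceding window_bins - 1 bins."""
    return np.convolve(counts, np.ones(window_bins))[: len(counts)]


def ei_output_rate(exc_counts, inh_counts, dt, w_exc=1.0e-3, w_inh=3.1e-3,
                   g_inh=0.5, threshold=3.0, refractory=1.0e-3):
    """Output rate (spikes/s) of a simplified count-comparison EI neuron.

    A spike is emitted when (excitatory count in w_exc) - g_inh * (inhibitory
    count in w_inh) reaches `threshold`, unless the neuron is refractory.
    Hypothetical defaults, except w_inh = 3.1 ms.
    """
    exc = windowed_count(exc_counts, max(1, int(round(w_exc / dt))))
    inh = windowed_count(inh_counts, max(1, int(round(w_inh / dt))))
    drive = exc - g_inh * inh                     # excitation minus weighted inhibition
    refractory_bins = int(round(refractory / dt))
    n_spikes, last_spike = 0, None
    for k in np.flatnonzero(drive >= threshold):  # candidate spike times
        if last_spike is None or k - last_spike >= refractory_bins:
            n_spikes += 1
            last_spike = k
    return n_spikes / (len(exc_counts) * dt)


def hemispheric_rate_difference(rate_left_sps, rate_right_sps, n_fibers=20,
                                dt=1.0e-4, duration=1.0, rng=None):
    """Left-minus-right EI output rate for Poisson AN-like inputs.

    The left neuron is excited by left inputs and inhibited by right inputs;
    the right neuron is the mirror image. Every input population is drawn
    independently, which is a simplification.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(duration / dt))

    def an_counts(rate_sps):
        # Summed spike count of n_fibers independent Poisson fibers per bin.
        return rng.poisson(rate_sps * n_fibers * dt, size=n_steps)

    left = ei_output_rate(an_counts(rate_left_sps), an_counts(rate_right_sps), dt)
    right = ei_output_rate(an_counts(rate_right_sps), an_counts(rate_left_sps), dt)
    return left - right


rng = np.random.default_rng(1)
print(hemispheric_rate_difference(200.0, 100.0, rng=rng))
```

With a higher input rate at the left ear, as for a left-favoring ILD, the left neuron receives more excitation and less inhibition than its right counterpart, so the difference is positive; mirroring the inputs flips the sign.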
A. Influence of model parameters
The EI model (Ashida et al., 2016) has seven parameters that had never been fitted to account for human perception. Such a number of free parameters may raise the concern that the model could fit any dataset with up to seven independent stimulus dimensions. However, large covariances of parameter influences can be expected to limit the effective degrees of freedom, and the dataset of Bernstein and Trahiotis (2012), which spans a five-dimensional stimulus space, is well suited to study the model's parameter dependences.
The data in Fig. 5 (top row) reveal that for most ad hoc parameter sets, only the duration of the excitatory window had to be optimized to account for at least 94% of the variance, comparable to the original model of Bernstein and Trahiotis (2012). Similarly, for any fixed excitatory window duration in the tested range (see Table I), parameters such as gain, threshold, or fiber number could be adjusted to obtain an excellent fit. This observation suggests strong interdependences among the parameters. For example, decreasing the number of input fibers has a similar effect to increasing the threshold. Similarly, halving the inhibitory gain is comparable to doubling the number of inhibitory input fibers.
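A back-of-the-envelope calculation illustrates why these parameters trade off. Assume, for illustration only, that the inhibitory input count n_I within the inhibitory window is Poisson distributed, with N_I fibers each firing at rate r_I over a window of duration T_I, and that the excitatory side is described analogously by N_E, r_E, and T_E; g_I is the inhibitory gain and θ the threshold. These symbols are introduced here and are not the notation of Table I.

```latex
\begin{aligned}
n_I &\sim \mathrm{Poisson}(N_I r_I T_I), \\
\mathbb{E}[g_I n_I] &= g_I N_I r_I T_I, \qquad
\mathrm{Var}[g_I n_I] = g_I^2 N_I r_I T_I, \\
\underbrace{N_E r_E T_E}_{\text{mean excitation}}
 - \underbrace{g_I N_I r_I T_I}_{\text{mean inhibition}} &\ge \theta .
\end{aligned}
```

Replacing g_I by g_I/2 and N_I by 2N_I leaves the mean inhibitory drive unchanged but halves its variance, so the two manipulations are comparable rather than identical. In the mean-drive criterion, removing ΔN_E excitatory fibers lowers the left-hand side by ΔN_E r_E T_E, which at a given input rate r_E acts like raising θ by the same amount.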
Only the duration of the inhibitory input window had to lie within a narrow range between 2.9 and 3.3 ms, independent of the values of several other parameters. With the inhibitory window fixed at 3.1 ms, at least 94.8% of the variance can be explained by optimizing only the duration of the excitatory input window, irrespective of the other parameters (within the tested range). The inhibitory window duration plays a critical role in determining the upper modulation-frequency limit for ITD sensitivity and the onset of the decline in ITD-based lateralization around 256 Hz. For subjects with a higher frequency limit (e.g., Monaghan et al., 2015), a shorter integration window would be required.
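The role of the inhibitory window can be put in perspective with simple arithmetic comparing the modulation period T_mod = 1/fm with the 3.1 ms window. The 512 Hz line is added purely for illustration and is not a condition discussed here.

```latex
\begin{aligned}
f_m = 32\ \mathrm{Hz}:&\quad T_{\mathrm{mod}} = 31.25\ \mathrm{ms}, & 3.1/31.25 &\approx 0.10 \\
f_m = 256\ \mathrm{Hz}:&\quad T_{\mathrm{mod}} \approx 3.91\ \mathrm{ms}, & 3.1/3.91 &\approx 0.79 \\
f_m = 512\ \mathrm{Hz}:&\quad T_{\mathrm{mod}} \approx 1.95\ \mathrm{ms}, & 3.1/1.95 &\approx 1.59
\end{aligned}
```

One plausible reading, offered as interpretation rather than as a result of the study: once the inhibitory window spans most of a modulation period, inhibition evoked by one cycle still overlaps the excitatory onset of the next, which would reduce the interaural rate contrast; a shorter window would shift this point to higher modulation frequencies.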
If the focus is on sparse transient stimuli, i.e., low fm, m = 1, and a long modulation trough, as in Dietz et al. (2015), the influences of the parameters are somewhat different (Fig. 5). The brief modulation onsets sometimes cannot evoke response rates as large as those evoked by other stimuli producing the same lateralization. Overall responses are weaker, and the simple rate-difference metric may underestimate the influence of the few reliable responses. The same effect was observed for the fm = 32 Hz, m = 1, exponent 8 conditions of Bernstein and Trahiotis (2012), for which our model underestimates ITD-based lateralization. In summary, the model appears to rely on two central degrees of freedom: (1) the inhibitory integration time and (2) the right combination of excitation, inhibition, and threshold.
B. Relation to other models
The main difference from other models is arguably structural: while other models have fully or partially separate mechanisms to encode ILDs and envelope ITDs (e.g., Bernstein and Trahiotis, 2012; Takanen et al., 2014), the same EI-model neurons simultaneously account for both ILD- and envelope-ITD-based extents of laterality. This is a more constrained situation for a model and, at the same time, closer to neural responses in the LSO (Tollin, 2003).
With only two free EI-model parameters (durations of the excitatory and inhibitory windows), the model can account for about 95% of the variance in the largest dataset across a wide range of values for the other parameters. For reference, the delay-line-based model of Bernstein and Trahiotis (2012) accounted for a practically identical 94% of the variance in the same dataset. The high variance accounted for by all models is partly because a purely ILD-based lateralization model already accounts for 63.1% of the variance in this particular dataset. The overall performance of the two models is similar, but some differences can be observed. For instance, at fm = 256 Hz, an exponent of 8, and m = 0.5, the delay-line-based model predicted a linear dependence of lateralization on envelope ITD and accounted for 75.0% of the variance in the data, whereas our model accounted for 96.4% in the respective panel (Fig. 6). In contrast, the proposed model underestimated ITD-based extents of laterality at fm = 32 Hz, an exponent of 8, and full modulation (89% of the variance accounted for), whereas the model of Bernstein and Trahiotis (2012) accounted for 97.8% of the variance in these conditions. Finally, the proposed model even captured the curved relation at high modulation frequencies with a high exponent (Fig. 6, bottom right block, lower panels).
Applied with identical EI-model parameters and a different scaling factor to high-frequency transposed noise and high-frequency Gaussian noise of different bandwidths, our model accounted for 82.7% of the variance in the psychoacoustic data of Bernstein and Trahiotis (2003). The model correctly captured the dominant effects in this dataset, although it underestimated the lateralization of non-transposed noise at large bandwidths. The original cross-correlation model used to simulate these data (Bernstein and Trahiotis, 2003) overestimated the lateralization of high-frequency Gaussian noise by up to 10 dB pointer ILD at all bandwidths except 25 Hz. Our model correctly predicts the much smaller lateralization for Gaussian noise because only the transposed noise produces highly synchronous excitation at steep modulation onsets after inhibition has ceased during short intervals of zero amplitude. The same feature causes the more pronounced lateralization in the EI model for envelopes with steep onsets compared to envelopes with shallow onsets and steep offsets (Sec. III C).
C. Off-frequency integration
In experimental data (e.g., Joris and Yin, 1992), as well as in simulated spike trains of the peripheral model (Bruce et al., 2018), the spike rate and the phase locking of AN fibers depend strongly on stimulus level. Physiologically, AN fibers show the best phase locking to envelopes at a sound pressure level of about 20 dB, and the degree of phase locking decreases at higher levels (Joris and Yin, 1992). This response characteristic is also captured by the peripheral stage of the model (see Fig. 13 in Zilany et al., 2009). Although intermediate degrees of phase locking are still sufficient to facilitate ITD-based lateralization, the current model does not quantitatively account for both ITD- and ILD-based lateralization using only neurons with a CF equal to the carrier frequency. Psychophysically, in contrast, envelope-ITD-based lateralization is level independent in the range from 45 to 65 dB SPL (Dietz et al., 2015), and detection sensitivity even improves with increasing level (e.g., Bernstein and Trahiotis, 2008). We accounted for this discrepancy by employing a population of spiking auditory model neurons with a broad range of CFs. This is an implementation of off-frequency hearing, as suggested by Dreyer and Delgutte (2006). Incorporating these off-frequency components was crucial for the model to explain the data with high accuracy (Figs. 4 and 7). However, on-frequency neurons were also necessary to account for most of the variance, because they were instrumental for stimulus conditions with low modulation depths.
To investigate the role of off-frequency hearing, notched-noise stimuli are commonly used to mask information in off-frequency channels. Bernstein and Trahiotis (2008) reported that notched noise has only a modest influence on threshold envelope ITDs. In the periphery model (Bruce et al., 2018), notched noise appears to improve on-frequency phase locking, suggesting off-frequency hearing in the absence of notched noise and on-frequency hearing in its presence. Without published extent-of-laterality data obtained with notched noise, a quantitative analysis is beyond the scope of the present study. It is expected, however, that the model would require a more sophisticated back-end to operate in the presence of any interfering noise.
Peripheral band-pass filtering may further contribute to the importance of off-frequency channels on envelope ITD perception. The amplitude modulation after filtering can be more pronounced for off- compared to on-frequency channels, especially for high modulation frequencies (Monaghan et al., 2015).
D. Decoding EI output and onset dominance
A variety of decoding stages have been suggested in previous studies (for review, see Dietz et al., 2018). One common, simple approach is to decode the extent of laterality through a weighted average across frequency channels (Bernstein and Trahiotis, 2012; Takanen et al., 2014; Kelvasa and Dietz, 2015). A linear mapping, as proposed for simple stimuli by Kelvasa and Dietz (2015), was employed in our model as a particularly simple decoder option. In its current implementation, the model simply averages over the 30 CFs. However, the breadth of neural excitation depends critically on overall level. Therefore, the current decoding stage is not expected to operate in the presence of any interferer or to produce the level-independent lateralization reported at moderate signal levels (Dietz et al., 2015). The scaling factor p can be varied to compensate for the large changes of response-rate differences with overall level. A rate ratio appears to be less level dependent than a rate difference and could serve as an alternative metric, but it underestimates the influence of differences at large EI rates. Some transition between the two metrics, or a decoding stage that maps the two rates to the extent of laterality in a more complex way, appears to be a promising next step. Another possibility is to normalize the hemispheric rate difference based on AN response rates (Encke and Hemmert, 2018).
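The linear decoder and the rate-ratio alternative discussed above can be contrasted with a short sketch. It assumes that the decoder multiplies the across-CF mean of left-minus-right rates by p, uses (L − R)/(L + R) as one possible ratio-based metric (it depends on each CF only through L/R), and models a level change crudely as a common gain on all rates; the function names and rates are hypothetical.

```python
import numpy as np


def pointer_ild_from_rates(left_rates, right_rates, p):
    """Linear decoder: p (dB/sps) times the across-CF mean of left - right rates."""
    diff = np.asarray(left_rates, dtype=float) - np.asarray(right_rates, dtype=float)
    return float(p * diff.mean())


def normalized_rate_ratio(left_rates, right_rates):
    """Across-CF mean of (L - R) / (L + R); undefined where L + R == 0.

    Assumption: one possible ratio-based metric, a monotonic function of L/R.
    """
    left = np.asarray(left_rates, dtype=float)
    right = np.asarray(right_rates, dtype=float)
    return float(np.mean((left - right) / (left + right)))


n_cf = 30
left = np.full(n_cf, 120.0)   # hypothetical EI rates (sps) per CF
right = np.full(n_cf, 80.0)
for gain in (1.0, 2.0):       # crude stand-in for a level change scaling all rates
    print(gain,
          pointer_ild_from_rates(gain * left, gain * right, p=0.48),
          normalized_rate_ratio(gain * left, gain * right))
# Same 10 sps difference at large versus small rates:
print(normalized_rate_ratio([110.0], [100.0]), normalized_rate_ratio([15.0], [5.0]))
```

Doubling all rates doubles the decoded pointer ILD but leaves the ratio metric unchanged, illustrating its lower level dependence; the last line shows the drawback, since the same 10 sps difference yields a much smaller ratio metric at 110 versus 100 sps than at 15 versus 5 sps.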
The complexity of the decoding stage should be linked to the complexity of subcortical ITD encoding. Previous studies have reported a diversity of neural responses to envelope ITDs along the auditory pathway. For example, recordings from the inferior colliculus (IC) of guinea pigs revealed step-type ITD sensitivity alongside steep trough-type ITD-rate functions and more gradual functions, all in response to the same envelope shape (Dietz et al., 2016; Greenberg et al., 2017). A more diverse population of binaural interaction neurons may not only improve model performance but also provide a more realistic representation of the biological system. Specifically, Dietz et al. (2016) required onset-type input to their binaural model neurons. This onset-type input was provided by a simplistic CN stage and was necessary to model the contrast, observed in some IC neurons, between very pronounced ITD tuning for short, sharp onsets and a lack of ITD tuning for long, gradual onsets. Onset-type input was not necessary in the present study to account for the data of Bernstein and Trahiotis (2003, 2012). In simulating the different extents of laterality generated by short and long onsets, the present simple model performs better than cross-correlation-based models (Klein-Hennig et al., 2011) but still clearly underestimates the difference [Fig. 8(D)]. A more diverse mix of model neurons, including some dominated by modulation onsets, is expected to help account quantitatively for the data. Thus, a diverse pool of differently behaving model neurons (e.g., Dietz et al., 2016) appears to be supported by both physiology and perception.
Introducing various types of neurons, however, would make the parameter optimization very difficult and require a more complex decoding stage. A complex decoding stage could further estimate the relative importance of the different channels across the tonotopic array. It could also use both left and right rates rather than just the rate difference to extract more information and to be more robust against other stimulus variations, such as overall level. Nevertheless, even the linear decoder can directly relate the output of binaural model neurons to extent of laterality at a fixed sound level. This is in contrast to the decoding stages of several other models that estimate the input ITDs or similar physical quantities, rather than perceptual measures (e.g., Goodman et al., 2013; Encke and Hemmert, 2018). Such a calculation of input ITD will not be useful for predicting the lateralization of complex sounds from the present study, because the same envelope ITD leads to substantially different extents of laterality (Figs. 6 and 8).
E. Physiological, pathophysiological, and anatomical considerations
Although our model is arguably no less complex than other models of envelope-ITD perception (e.g., Cai et al., 1998), from a physiological standpoint the structure is still highly simplified. The simulated model neurons of the AN stage provide direct input to the stage of binaural interaction, bypassing all other brainstem structures such as the cochlear nucleus. Nonetheless, the model produces ITD or ILD rate functions that match functions obtained experimentally in the LSO, with model parameters that are within a physiologically realistic range (compare to, e.g., Sanes, 1990 or Tsuchitani, 1988).
The model is further simplified in that it considers only AN fibers with either medium or high spontaneous rates. Combining different fiber types or including a cochlear nucleus stage would increase the number of parameters and add complexity. Figure 5 revealed that the model is generally robust against the choice of fiber type. With high-spontaneous-rate fibers, the model accounts for an identical 95.7% of the variance in the largest dataset.
Overall, the model balances the competing demands of physiological accuracy and of being manageable, interpretable, and therefore rather simplistic (Wilson and Collins, 2019). The final design was inspired by Colburn (1973), who listed "attractive characteristics" for models of retrocochlear processing that form the basis of many successful auditory models: (1) the inputs are auditory-nerve responses, (2) the processing is not unreasonable for neural structures, and (3) quantitative predictions can be derived. Thus, we connected an established front-end AN model with a relatively simplistic but physiologically plausible EI stage and the most simplistic rate-difference decoding back-end.
The above-mentioned co-dependence of model parameters, and the ability to construct well-performing models with only two independent fitting parameters at the binaural interaction stage, are also interesting from a pathophysiological or audiological perspective. The abundance of highly co-varying parameters allows a system to compensate for a suboptimal spatial mapping caused by mild peripheral impairments. For example, a loss of AN fibers might be compensated for by a reduction of inhibitory gain (e.g., Schaette and McAlpine, 2011) to maintain very similar lateralization performance. However, this hypothesis must be interpreted with caution and requires further testing.
V. SUMMARY AND CONCLUSIONS
The present study demonstrated that the lateralization of complex, high-frequency stimuli can be simulated with a relatively simplistic model deduced from mammalian auditory brainstem physiology and a linear hemispheric rate-difference decoder. To summarize:
Only one pair of model EI-neurons (composed of one neuron from the left and one from the right) was employed for each center frequency.
One neuron pair simultaneously encodes both ILD and envelope ITD, so that the rate difference is proportional to the extent of laterality at a given sound level.
Off-frequency model units are essential for envelope ITD-based lateralization at the sound levels commonly used in psychoacoustic experiments in the absence of notched noise.
The model accounts for 95.7%, 82.7%, and 83.4% of the variance in three datasets using the same set of EI-model parameters. Lateralization of stimuli with 2 ms ITD can be simulated without delay lines.
MATLAB code to reproduce the model data and figures is provided as supplementary material.¹
ACKNOWLEDGMENTS
First and foremost, we thank Dr. Leslie R. Bernstein and Dr. Constantin Trahiotis for generously sharing their data, model, and knowledge. Their support was vital throughout the entire study. We further thank Dr. Jörg Encke for helpful discussions. This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme grant agreement No. 716800 (ERC Starting Grant to M.D.) and the Cluster of Excellence “Hearing4all” (DFG EXC2177/1) at the University of Oldenburg.
¹See supplementary material at https://doi.org/10.1121/10.0001602 for MATLAB code.