The present study investigated the combined effect of binaural cues and comodulation for a narrowband target noise masked by a narrowband noise. The threshold difference between a diotic condition (identical stimuli in both ears) and a dichotic condition (target with an interaural phase difference of π and diotic masker) decreased with increasing spectral distance between masker and target, irrespective of across-frequency envelope correlation. The threshold difference between a condition with comodulated target and masker and a corresponding uncorrelated condition, i.e., the comodulation detection difference, depended neither on target frequency nor on interaural correlation, indicating that comodulation and interaural disparities are processed independently.

The auditory system uses several cues to separate sounds originating from different sources. One such cue, referred to as a binaural cue, results from a comparison of the signals at the two ears and can provide information about the spatial location of the source. Another cue is comodulation, i.e., coherent envelope fluctuations in different frequency regions, which are often observed in natural sounds (Nelken et al., 1999). Several psychoacoustic effects are associated with the ability of the auditory system to use these cues. For binaural cues, one such effect is binaural unmasking, i.e., a reduction in masking when binaural cues can be used to detect a target sound. This is often quantified as the binaural masking-level difference (BMLD), i.e., the threshold difference between a condition without binaural cues (e.g., identical signals at the two ears) and a dichotic condition in which the masker and target have different interaural properties (Jeffress et al., 1956). The ability of the auditory system to use comodulation as a cue is associated with two effects: comodulation masking release (CMR) and comodulation detection difference (CDD). CMR refers to the finding that masker comodulation can reduce the masking of a signal (Hall et al., 1984; see Verhey et al., 2003, for a review). In contrast, CDD refers to the finding that a target with the same envelope as the masker is less audible than a target whose envelope is uncorrelated with that of the masker (Cohen and Schubert, 1987).

A few studies have investigated how comodulation and binaural cues combine (e.g., Hall et al., 1988; Schooneveldt and Moore, 1989). Epp and Verhey (2009) showed that the masking releases due to across-frequency comodulation and binaural cues add in decibels (dB), indicating that the monaural comodulation cue and the binaural cue are processed independently. These previous studies focused on how the cues are processed when they both improve target detection, as in the case of CMR and BMLD. The goal of the present study was to investigate whether two stimulus properties, coherent envelope fluctuations across spectral components and interaural disparities, combine in a similar way when they have opposite effects on target detection, i.e., a beneficial effect in the case of the binaural cue and a detrimental effect in the case of comodulation. To this end, a masking paradigm similar to that used by Cohen and Schubert (1987) to measure CDD was employed. The narrowband masker was centered at 500 Hz, a common frequency in BMLD studies, and the narrowband target was centered at a slightly higher frequency. The spectral distances between target and masker were chosen to obtain a reasonable magnitude of the BMLD (Nitschmann and Verhey, 2012).

Thresholds were measured using an adaptive three-interval, three-alternative forced-choice procedure in which one randomly chosen interval contained the target. After each trial, the subject had to indicate the interval that contained the target. An experimental run started with the target at an overall level of 70 dB sound pressure level (SPL), which was clearly above threshold. The target level was adjusted using a 1-up 2-down procedure, which converges on the 70.7%-correct point of the psychometric function (Levitt, 1971). The initial step size of the target level was 8 dB. It was halved after every upper reversal of the adaptive track until the final step size of 1 dB was reached. At this final step size, the run continued for another five reversals. The mean of the levels at the last six reversals was taken as the threshold estimate. Every threshold was measured at least three times. The experiment was organized in sessions. In each session, threshold estimates were obtained for one target center frequency, with all combinations of envelope correlation and binaural condition presented in random order. About half of the subjects started with the target center frequency of 530 Hz, the other half with 560 Hz. The valid threshold estimates of all sessions were averaged to give the final individual thresholds. The criteria for the validity of the threshold estimates were the same as in Nitschmann et al. (2010).
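The adaptive rule described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the simulated-listener callback `respond`, the logistic psychometric function, the `max_trials` guard, and the reading of an "upper reversal" as a turn of the track from increasing to decreasing level are all assumptions made here. The reversal at which the step size reaches 1 dB is counted as the first of the last six reversals, which is one possible interpretation of the procedure.

```python
import math
import random
from typing import Callable, List


def run_staircase(respond: Callable[[float], bool],
                  start_level: float = 70.0,   # dB SPL, clearly above threshold
                  start_step: float = 8.0,     # initial step size (dB)
                  final_step: float = 1.0,     # final step size (dB)
                  extra_reversals: int = 5,    # reversals run at the final step
                  n_average: int = 6,          # reversals averaged for the estimate
                  max_trials: int = 10_000) -> float:
    """1-up 2-down track; returns the mean level of the last n_average reversals.

    respond(level) is a hypothetical callback that returns True when the
    listener picks the correct interval at that target level.
    """
    level, step = start_level, start_step
    n_correct = 0          # consecutive correct responses at the current level
    last_dir = 0           # +1 = last change was up, -1 = down, 0 = none yet
    reversals: List[float] = []
    final_count = 0        # reversals counted after reaching final_step
    trials = 0
    while final_count < extra_reversals:
        trials += 1
        if trials > max_trials:
            raise RuntimeError("track did not converge within max_trials trials")
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue   # two correct in a row are needed to go down
            n_correct, direction = 0, -1
        else:
            n_correct, direction = 0, +1
        if last_dir != 0 and direction != last_dir:
            reversals.append(level)
            if step == final_step:
                final_count += 1
            elif last_dir == +1:   # upper reversal: track turned from up to down
                step = max(step / 2, final_step)
        last_dir = direction
        level += direction * step
    return sum(reversals[-n_average:]) / n_average


def simulated_listener(threshold: float, slope: float = 1.0, seed: int = 0):
    """Hypothetical 3AFC listener: logistic psychometric function, 1/3 guessing."""
    rng = random.Random(seed)

    def respond(level: float) -> bool:
        p = 1 / 3 + (2 / 3) / (1 + math.exp(-(level - threshold) / slope))
        return rng.random() < p

    return respond


# Noiseless listener: always correct at or above 40 dB, always wrong below.
print(run_staircase(lambda level: level >= 40.0))   # prints 39.5
print(run_staircase(simulated_listener(35.0)))
```

With the noiseless listener, the estimate lies half a step below the 40-dB boundary because the track oscillates between 39 and 40 dB at the final step size.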

The target and masker were 10-Hz-wide narrowband noises. The narrowband-noise masker was presented diotically (i.e., identically in the two ears), was centered at 500 Hz, had a constant overall level of 60 dB SPL, and had a duration of 2.9 s including 50-ms raised-cosine ramps at onset and offset. During the presentation of the masker, three 300-ms intervals were indicated on a screen in front of the subject, starting 0.7, 1.3, and 1.9 s after masker onset. The narrowband-noise target had a duration of 300 ms including 50-ms raised-cosine ramps at onset and offset. The center frequency of the target was either 530 or 560 Hz. The target was presented in one of the three intervals, chosen at random.
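The temporal structure of one trial can be sketched with NumPy. Only the durations, the 50-ms ramp length, the interval onsets, and the 44.1-kHz sampling rate are taken from the text; the function names and the ramp formula 0.5(1 - cos(πk/N)), one common raised-cosine definition, are assumptions of this sketch.

```python
import numpy as np

FS = 44100  # sampling rate (Hz) used in the study


def raised_cosine_gate(duration_s: float, ramp_s: float = 0.05, fs: int = FS) -> np.ndarray:
    """Unit-height window with raised-cosine onset and offset ramps."""
    n = round(duration_s * fs)
    n_ramp = round(ramp_s * fs)
    # Rises from 0 toward 1 over n_ramp samples.
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    gate = np.ones(n)
    gate[:n_ramp] = ramp
    gate[-n_ramp:] = ramp[::-1]
    return gate


def target_window(interval: int, trial_s: float = 2.9,
                  onsets_s=(0.7, 1.3, 1.9), target_s: float = 0.3,
                  fs: int = FS) -> np.ndarray:
    """Trial-length window that is non-zero only in the chosen interval (0, 1 or 2)."""
    window = np.zeros(round(trial_s * fs))
    start = round(onsets_s[interval] * fs)
    gate = raised_cosine_gate(target_s, fs=fs)
    window[start:start + gate.size] = gate
    return window


rng = np.random.default_rng()
masker_gate = raised_cosine_gate(2.9)             # 2.9-s masker window
target_gate = target_window(rng.integers(0, 3))   # target in a random interval
```

Building the target gate at full trial length lets it be multiplied sample by sample with a 2.9-s noise buffer.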

The noise samples were multiplied noises, i.e., they were generated by multiplying a 5-Hz-wide low-pass noise with a sinusoid at the center frequency of the resulting narrowband noise. Because multiplication by a sinusoid shifts the low-pass spectrum to both sides of the sinusoid frequency, this yields a 10-Hz-wide noise band. The low-pass noises were generated in the frequency domain by setting all frequency components outside the desired frequency range to zero. Inside this range, all frequency components had the same magnitude and a random phase. An inverse Fourier transform generated the time signals, which were then multiplied with sinusoids to generate the narrowband noises. The buffer size was 131 072 samples, i.e., the next power of 2 above the 127 890 samples of a 2.9-s signal at the sampling rate of 44.1 kHz used in the present study. Each noise buffer was truncated to a duration of 2.9 s (the duration of a trial) and then multiplied with the desired gating window. For the target, this was a 300-ms gating window (the duration of an interval) shifted to the temporal position of the interval that contained the target. In the comodulated (CM) condition, the same low-pass noise was used to generate the target and masker noise bands. In the uncorrelated (UC) condition, different low-pass noise samples were used for target and masker, so that their envelopes were uncorrelated. A new noise sample was used for each presentation. The target was either in phase (S0) or had an interaural phase difference of π (Sπ). The interaural phase difference of π was realized by using the polarity-inverted left-ear target as the right-ear target.
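A minimal NumPy sketch of the multiplied-noise generation for one trial is given here. The function and key names are hypothetical, excluding the DC bin from the low-pass band and using a zero-phase sine carrier are assumptions, and gating and scaling to 60 dB SPL (which requires headphone calibration) are omitted. The sketch shows how reusing the masker's low-pass noise yields the CM condition and how polarity inversion yields Sπ.

```python
import numpy as np

FS = 44100                  # sampling rate used in the study (Hz)
N_BUF = 2 ** 17             # 131 072-sample buffer (next power of 2)
N_TRIAL = round(2.9 * FS)   # 127 890 samples = one 2.9-s trial
MASKER_FC = 500.0           # masker center frequency (Hz)


def lowpass_noise(rng: np.random.Generator, cutoff_hz: float = 5.0) -> np.ndarray:
    """Low-pass noise built in the frequency domain: equal magnitude and random
    phase for 0 < f <= cutoff_hz, zero elsewhere (excluding DC is an assumption)."""
    freqs = np.fft.rfftfreq(N_BUF, d=1.0 / FS)
    band = (freqs > 0.0) & (freqs <= cutoff_hz)
    spec = np.zeros(freqs.size, dtype=complex)
    spec[band] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, band.sum()))
    return np.fft.irfft(spec, n=N_BUF)[:N_TRIAL]   # truncate buffer to one trial


def make_bands(target_fc: float, correlation: str = "CM",
               target_phase: str = "S0", seed=None) -> dict:
    """Ungated, unscaled masker and left/right target bands for one trial."""
    rng = np.random.default_rng(seed)
    t = np.arange(N_TRIAL) / FS
    masker_lp = lowpass_noise(rng)
    # CM: reuse the masker's low-pass noise; UC: draw an independent one.
    target_lp = masker_lp if correlation == "CM" else lowpass_noise(rng)
    masker = masker_lp * np.sin(2 * np.pi * MASKER_FC * t)   # diotic: same in both ears
    target_left = target_lp * np.sin(2 * np.pi * target_fc * t)
    # Spi: right-ear target is the polarity-inverted left-ear target.
    target_right = -target_left if target_phase == "Spi" else target_left.copy()
    return {"masker": masker, "target_left": target_left,
            "target_right": target_right,
            "masker_lp": masker_lp, "target_lp": target_lp}


bands = make_bands(530.0, correlation="UC", target_phase="Spi")
```

Because the envelope of each band is the magnitude of its low-pass noise, sharing that noise is enough to make target and masker envelopes identical in the CM condition.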

Seven subjects (5 female, 2 male, aged between 22 and 40 years, mean 30 years) took part in the experiments. Four of them were paid volunteers and the remaining three were members of the research group. All subjects had normal audiograms, with hearing thresholds at or below 10 dB hearing level at the standard audiometric frequencies from 125 to 4000 Hz. Subjects were seated in a sound-insulated booth. The stimuli were converted from digital to analog signals via an external sound card (RME Fireface 400, Haimhausen, Germany) and presented via Sennheiser HD650 headphones (Sennheiser, Wedemark, Germany). A standard personal computer controlled stimulus generation and presentation and recorded the subjects' responses.

Figure 1 shows data averaged across the seven subjects. The top panel shows thresholds as a function of the spectral distance between the target and masker. The error bars indicate plus/minus one standard error across subjects. The middle panel shows the CDD, i.e., the threshold difference between the CM and the corresponding UC condition. The bottom panel shows the BMLD for the two across-frequency envelope correlations. The mean thresholds in the top panel decrease with increasing spectral distance of the target from the narrowband masker centered at 500 Hz.

Fig. 1.

The top panel shows detection thresholds for a narrowband signal centered Δf Hz above 500 Hz that is either UC (open symbols) or CM (filled symbols) with respect to the diotic narrowband masker centered at 500 Hz and either in phase (S0, circles) or in antiphase (Sπ, triangles) between the two ears. The thresholds are averaged over seven subjects, and the error bars show the standard error across subjects. The two other panels show differences derived from the data in the top panel: the middle panel shows the CDD, i.e., the difference between the CM thresholds and the corresponding UC thresholds, and the bottom panel shows the BMLD, i.e., the difference between the S0 and Sπ thresholds. Error bars in the two lower panels denote the square root of the sum of the squared standard errors of the respective thresholds in the top panel.

As expected, thresholds for the CM condition are higher than the corresponding thresholds for the UC condition, and diotic thresholds are higher than the corresponding dichotic thresholds. For the target centered at 530 Hz, thresholds are highest for the diotic CM condition (39 dB SPL) and lowest for the dichotic UC condition (30 dB SPL). For the target centered at 560 Hz, the highest threshold is 30 dB SPL (diotic CM condition) and the lowest is 24 dB SPL (dichotic UC condition). The interindividual standard errors range from 0.6 to 1.1 dB.

The CDD values shown in the middle panel are roughly the same for the two target center frequencies. The CDD is about 2 dB for the diotic condition and 3 dB for the dichotic condition. The BMLD shown in the bottom panel decreases with increasing spectral distance between target and masker from about 6 dB at 30 Hz to about 3 dB at 60 Hz.
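The derived quantities in the two lower panels of Fig. 1 follow directly from the four thresholds per target frequency. The sketch below computes them, with error bars formed as the square root of the sum of the squared standard errors, as stated in the figure caption (a formula that neglects any covariance between the two thresholds). The function names and all numbers are hypothetical, chosen for illustration; they are not the measured data.

```python
import math


def derived_difference(mean_a, se_a, mean_b, se_b):
    """Return (mean_a - mean_b, error bar) in dB.

    The error bar is the square root of the sum of the squared standard
    errors of the two thresholds (any covariance is ignored)."""
    return mean_a - mean_b, math.sqrt(se_a ** 2 + se_b ** 2)


def cdd_and_bmld(t):
    """t maps (binaural condition, envelope correlation) -> (mean dB, SE dB)."""
    cdd = {b: derived_difference(*t[(b, "CM")], *t[(b, "UC")])    # CM minus UC
           for b in ("S0", "Spi")}
    bmld = {c: derived_difference(*t[("S0", c)], *t[("Spi", c)])  # S0 minus Spi
            for c in ("UC", "CM")}
    return cdd, bmld


# Hypothetical thresholds (dB SPL) and standard errors for one target frequency.
example = {("S0", "CM"): (40.0, 0.8), ("S0", "UC"): (38.0, 0.6),
           ("Spi", "CM"): (35.0, 1.0), ("Spi", "UC"): (32.0, 0.9)}
cdd, bmld = cdd_and_bmld(example)
for b, (value, err) in cdd.items():
    print(f"CDD ({b}): {value:.1f} +/- {err:.1f} dB")
for c, (value, err) in bmld.items():
    print(f"BMLD ({c}): {value:.1f} +/- {err:.1f} dB")
```

In this made-up example, the CDD differs by 1 dB between S0 and Sπ, and the BMLD differs by the same 1 dB between CM and UC, because both differences are the same combination of the four thresholds.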

These trends were confirmed by a within-subject analysis of variance with the factors interaural target phase difference (0 and π), spectral distance between target and masker (30 and 60 Hz), and across-frequency envelope correlation (UC and CM). All three factors had a significant main effect: spectral distance between target and masker [F(1,6) = 246.7, p < 0.0001], interaural target phase difference [F(1,6) = 50.2, p < 0.001], and across-frequency envelope correlation [F(1,6) = 14.9, p < 0.01]. There was a significant interaction between interaural target phase difference and spectral distance [F(1,6) = 9.5, p < 0.05]. None of the other interactions was significant.
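In a design in which every factor has two levels, as here, each main effect and interaction corresponds to a single per-subject contrast, and its repeated-measures F(1, n-1) equals the squared one-sample t statistic of that contrast. The NumPy sketch below illustrates this; it is not the authors' analysis code, the function name and the (subjects, 2, 2, 2) data layout are assumptions, and p-values (obtainable, e.g., with `scipy.stats.f.sf(F, 1, n - 1)`) are omitted.

```python
import numpy as np

FACTORS = ("phase", "distance", "correlation")   # two levels each


def rm_anova_2x2x2(y: np.ndarray) -> dict:
    """F(1, n-1) for every effect of a 2x2x2 within-subject design.

    y has shape (n_subjects, 2, 2, 2): one mean threshold per subject and cell,
    with the cell axes ordered as in FACTORS. Each effect is a single +/-1
    contrast per subject; its F equals the squared one-sample t of that contrast.
    """
    n = y.shape[0]
    results = {}
    for mask in range(1, 8):                      # every non-empty set of factors
        weights = np.ones((2, 2, 2))
        names = []
        for i, name in enumerate(FACTORS):
            if mask >> i & 1:
                shape = [1, 1, 1]
                shape[i] = 2
                weights = weights * np.array([-1.0, 1.0]).reshape(shape)
                names.append(name)
        c = (y * weights).sum(axis=(1, 2, 3))     # per-subject contrast scores
        with np.errstate(divide="ignore", invalid="ignore"):
            f = n * c.mean() ** 2 / c.var(ddof=1)
        results[" x ".join(names)] = float(f)
    return results


rng = np.random.default_rng(1)
demo = 30.0 + rng.normal(size=(7, 2, 2, 2))       # synthetic thresholds, not real data
for effect, f in rm_anova_2x2x2(demo).items():
    print(f"{effect}: F(1,{demo.shape[0] - 1}) = {f:.2f}")
```

With seven subjects, every effect has 1 and 6 degrees of freedom, matching the F(1,6) values reported above.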

The decrease in BMLD as the spectral distance increased is in agreement with previous binaural masking experiments using a sinusoidal target (Zwicker and Henning, 1984; Nitschmann and Verhey, 2012). Nitschmann and Verhey (2012) reported a BMLD of 7 dB for a target center frequency of 530 Hz and a masker centered at 500 Hz. The BMLD was only 3 dB for a target center frequency of 560 Hz. The masker level was the same as in the present study. The similarity between the data with sinusoidal targets and with the noise bands used in the present study suggests that the same binaural cues are used for these two target types.

In a CDD experiment, Borrill and Moore (2002) investigated how thresholds changed when the noise target was replaced by a sinusoidal target. For each of their three subjects, the thresholds for the sinusoidal target were about the same as those for the noise target in the UC condition. In contrast, the diotic thresholds for the CM target of the present study are almost identical (difference <1 dB) to the corresponding thresholds for the sinusoidal target of Nitschmann and Verhey (2012). Note, however, that the diotic UC thresholds also differ only slightly (1.5 to 2 dB) from the corresponding thresholds of Nitschmann and Verhey (2012). Thus, it is not possible to decide whether the data of the present study agree or disagree with the findings of Borrill and Moore (2002).

In general, the CDD in the present study is rather small compared to that reported by Borrill and Moore (2002). This is presumably due to the choice of stimulus parameters. The present study used only one masker band, whereas Borrill and Moore (2002) used two masker bands, one spectrally above and one spectrally below the center frequency of the target. McFadden (1987) showed that CDD is larger for two bands than for one band. In addition, CDD seems to be larger for medium spectral distances than for the small spectral distances used in the present study (target-to-masker center-frequency ratios of 1.06 and 1.12). Only one of the three subjects of Cohen and Schubert (1987) showed a CDD for a target center frequency that differed by 10% from that of the masker. The CDD amounted to more than 10 dB when the ratio of the center frequencies of target and masker was in the range 1.5 to 2. McFadden (1987) also observed the largest CDD for a target spectrally above the masker when the ratio was 1.5. Interestingly, the CDD was already considerably smaller at a ratio of 1.75, indicating that the center frequency of the masker and the bandwidth of the stimuli affect the magnitude of the CDD (both differed between the two studies). Note that the stimulus parameters of the present study were chosen to obtain a reasonable magnitude of the BMLD rather than to maximize the CDD. In Nitschmann and Verhey (2012), the largest ratio of the center frequencies was 1.18, and at this spectral distance the BMLD was less than 3 dB.

The statistical analysis of the present data shows no significant interaction between envelope correlation and binaural condition. Hence, the combined effect of a binaural cue and stimulus comodulation is, within measurement error, equal to the sum of the CDD and the BMLD. It is therefore reasonable to interpret the data as indicating that the binaural cue and stimulus comodulation are processed independently. This seems to agree with the results of Epp and Verhey (2009), where the effects of CMR and BMLD added. However, Epp and Verhey (2009) argued that this additivity was only observed when the CMR was due to an across-frequency-channel process, whereas a less-than-additive combination was observed in conditions where within-channel cues were used (see, e.g., Hall et al., 1988).
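The link between the absence of an interaction and additivity can be made explicit. Writing $T$ for the mean threshold at a given spectral distance, with the subscript denoting the binaural condition and the superscript the envelope correlation (notation introduced here for illustration), the definitions used in Fig. 1 give

$$
\begin{aligned}
\mathrm{CDD}_{b} &= T_{b}^{\mathrm{CM}} - T_{b}^{\mathrm{UC}}, \qquad b \in \{\mathrm{S}0, \mathrm{S}\pi\},\\
\mathrm{BMLD}^{c} &= T_{\mathrm{S}0}^{c} - T_{\mathrm{S}\pi}^{c}, \qquad c \in \{\mathrm{UC}, \mathrm{CM}\},\\
T_{\mathrm{S}0}^{\mathrm{CM}} - T_{\mathrm{S}\pi}^{\mathrm{UC}} &= \mathrm{CDD}_{\mathrm{S}0} + \mathrm{BMLD}^{\mathrm{UC}} = \mathrm{CDD}_{\mathrm{S}\pi} + \mathrm{BMLD}^{\mathrm{CM}},\\
\mathrm{CDD}_{\mathrm{S}0} - \mathrm{CDD}_{\mathrm{S}\pi} &= \mathrm{BMLD}^{\mathrm{CM}} - \mathrm{BMLD}^{\mathrm{UC}}.
\end{aligned}
$$

The last line is the envelope correlation by binaural condition interaction contrast. When it is close to zero, the CDD does not depend on the binaural condition and the BMLD does not depend on envelope correlation, so the threshold difference between the diotic CM and the dichotic UC conditions is the sum of a single CDD and a single BMLD.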

The traditional explanation of CDD was that the auditory system groups CM noise bands into one auditory object, which involves the same across-frequency-channel process as that used to explain CMR (Cohen and Schubert, 1987). However, several more recent studies modeled CDD without assuming an across-channel process. Borrill and Moore (2002) argued that CDD is based on the spread of excitation and dip listening rather than on perceptual grouping. Buschermöhle et al. (2007) showed that CDD can be simulated on the basis of the mean compressed envelope at the output of one auditory filter. Based on model predictions, Ernst and Verhey (2008) argued that peripheral two-tone suppression plays an important role in CDD experiments. Whatever the exact mechanism, all these studies indicate that CDD may result from an analysis within one auditory filter, i.e., from a within-channel process. Moreover, given the small spectral distances used in the present study, it is reasonable to assume that the CDD shown in Fig. 1 is due to a within-channel process. Thus, the present data indicate that the effects of comodulation and interaural disparities can add when within-channel cues are used to process comodulation and when comodulation reduces detectability rather than producing a masking release.

We thank the Deutsche Forschungsgemeinschaft for supporting this project.

1. Borrill, S. J., and Moore, B. C. J. (2002). “Evidence that comodulation detection differences depend on within-channel mechanisms,” J. Acoust. Soc. Am. 111, 309–319.
2. Buschermöhle, M., Verhey, J. L., Feudel, U., and Freund, J. A. (2007). “The role of the auditory periphery in comodulation detection difference and comodulation masking release,” Biol. Cybern. 97, 397–411.
3. Cohen, M. F., and Schubert, E. D. (1987). “The effect of cross-spectrum correlation on the detectability of a noise band,” J. Acoust. Soc. Am. 81, 721–723.
4. Epp, B., and Verhey, J. L. (2009). “Combination of masking releases for different center frequencies and masker amplitude statistics,” J. Acoust. Soc. Am. 126, 2479–2489.
5. Ernst, S. M. A., and Verhey, J. L. (2008). “Peripheral and central aspects of auditory across-frequency processing,” Brain Res. 1220, 246–255.
6. Hall, J. W., Cokely, J. A., and Grose, J. H. (1988). “Combined monaural and binaural masking release,” J. Acoust. Soc. Am. 83, 1839–1845.
7. Hall, J. W., Haggard, M. P., and Fernandes, M. A. (1984). “Detection in noise by spectro-temporal pattern analysis,” J. Acoust. Soc. Am. 76, 50–56.
8. Jeffress, L. A., Blodgett, H. C., Sandel, T. T., and Wood, C. L. I. (1956). “Masking of tonal signals,” J. Acoust. Soc. Am. 28, 416–426.
9. Levitt, H. (1971). “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467–477.
10. McFadden, D. (1987). “Comodulation detection differences using noiseband signals,” J. Acoust. Soc. Am. 81, 1519–1527.
11. Nelken, I., Rotman, Y., and Bar Yosef, O. (1999). “Responses of auditory-cortex neurons to structural features of natural sounds,” Nature 397(6715), 154–157.
12. Nitschmann, M., and Verhey, J. L. (2012). “Modulation cues influence binaural masking-level difference in masking-pattern experiments,” J. Acoust. Soc. Am. 131, EL223–EL228.
13. Nitschmann, M., Verhey, J. L., and Kollmeier, B. (2010). “Monaural and binaural frequency selectivity in hearing-impaired subjects,” Int. J. Audiol. 49, 357–367.
14. Schooneveldt, G. P., and Moore, B. C. J. (1989). “Comodulation masking release (CMR) for various monaural and binaural combinations of the signal, on-frequency and flanking bands,” J. Acoust. Soc. Am. 85, 262–272.
15. Verhey, J. L., Pressnitzer, D., and Winter, I. M. (2003). “The psychophysics and physiology of comodulation masking release,” Exp. Brain Res. 153, 405–417.
16. Zwicker, E., and Henning, G. B. (1984). “Binaural masking-level differences with tones masked by noises of various bandwidths and levels,” Hear. Res. 14, 179–183.