Beamforming techniques are widely used in hearing aids to enhance the intelligibility of speech from a target direction, but they tend to isolate the listener from the acoustic environment and distort spatial cues. This is mainly because a typical beamformer alters the head-related transfer functions at the individual user's ears and operates under monaural assumptions rather than a binaural model. In this letter, a binaural auditory steering strategy (BASS) is proposed for the design of asymmetrically presented spatial filters that improve awareness of the surrounding acoustic environment while preserving intelligibility of speech from a target direction. Additionally, an objective metric and the results of a subjective study evaluating the effectiveness of the BASS are presented.

The primary purpose of a hearing aid is to restore audibility of a target signal for a hearing-impaired listener. The classical approach is to apply frequency-dependent gain to the microphone signals based on a pure-tone audiogram. However, this often does not alleviate the problems that listeners with hearing impairment have in noisy environments.1 To address this problem, beamforming techniques can be used to improve the intelligibility of speech from a target direction by suppressing sounds from other directions. As long as the target signal remains in front, this approach is advantageous. However, it creates the undesirable side effect of “tunnel hearing,” which isolates listeners from the other sounds in their acoustic environment. In essence, the beamformer decides what the user wants to hear. If, instead, the user is supplied with the acoustic information that the auditory system needs to resolve the cocktail party problem in a natural way,2 then the user can selectively attend to the target sound stream while monitoring other sound streams at the same time.3

We propose a beamformer design built upon a model of natural binaural auditory and cognitive processing, which we call the Binaural Auditory Steering Strategy (BASS). The model accounts not only for acoustic effects, such as the spatial filtering of the hearing aids and head-shadow effects, but also for perceptual effects, such as binaural unmasking.4 Moreover, it incorporates a model of the binaural auditory process that combines the signals from the left and right ears in a way that allows selective attention to either a target source or the background sound. The BASS is used to develop one metric that is indicative of speech intelligibility and another that is indicative of the ability to monitor surrounding sound. When combined into a single metric, these can be used to optimize the beamformer parameters in each hearing aid of a bilateral fitting.

Binaural listening uses the signals from both ears, each of which has been modified individually by the spatial filtering of the head, pinnae, and hearing aids. The auditory system can selectively listen to a sound stream of interest.3 For example, when there is speech both from the front and from the left side, the head shadow attenuates the interfering left talker's voice at the right ear, while the sound from the front is equally loud at both ears, resulting in a better signal-to-noise ratio (SNR) at the right ear than at the left ear. Speech understanding based on the ear with the higher SNR is typically called better-ear (BE) listening.4 When the talker on the left side says something alerting, such as your name, you will switch your attention to that voice by focusing on the louder sound at the left ear, which has the best SNR for this source location. In both listening situations, the auditory system uses selective attention3 to listen to the ear signal with the best SNR; this is what we call the BASS. The fact that both talkers remain audible to the user is what we call situational awareness (SA).

The algorithm that implements the BASS aims to enhance these capabilities of the auditory system.5 It does so by using the two directional patterns from Fig. 1 as targets at each ear and minimizing a cost function in a least-squares sense.6 At the right ear, a highly focused beam is used to create the best possible SNR for speech from the front in diffuse noise. At the left ear, an omnidirectional pattern is created. Formally, SA and BE are defined as

$$\mathrm{SA}(f,\theta)=\max\big(|L(f,\theta)|,\,|R(f,\theta)|\big),\qquad \mathrm{BE}(f,\theta)=\min\big(|L(f,\theta)|,\,|R(f,\theta)|\big),\tag{1}$$

where SA(f, θ) is effectively the union of the left and right directional responses and models acoustical awareness, and BE(f, θ) is the intersection of the left [L(f, θ)] and right [R(f, θ)] directional responses, modeling the BE effect [Fig. 1(a)]. These binaural directional responses can be plotted as conventional polar or density plots, as shown in Figs. 1(a) and 1(b), or reduced to single-number metrics, similar to the Directivity Index used to describe the performance of classic beamforming.7 These metrics are called the Situational Awareness Index (SAI) and the Better Ear Index (BEI), and they are combined into the Better Ear/Situational Awareness (BESA) index based on the parallel-processing nature of auditory attention:

$$\mathrm{SAI}=10\log_{10}\frac{\mathrm{Var}\big(\mathrm{SA}(f,\theta)^2\big)}{\overline{\mathrm{SA}(f,\theta)^2}},\qquad \mathrm{BEI}=10\log_{10}\frac{\overline{\mathrm{BE}(f,0^\circ)^2}}{\overline{\mathrm{BE}(f,\theta)^2}},\qquad \mathrm{BESA}=\mathrm{BEI}-\mathrm{SAI},\tag{2}$$

where $\overline{x}$ and $\mathrm{Var}(x)$ denote the mean and variance of $x$. Note that the mean and variance are computed across angle as well as frequency, resulting in a single number for the SAI, BEI, and BESA. The SAI describes the proximity of the SA response to an omnidirectional pattern with equal sensitivity at all angles. Figure 1(b) shows the density plots of the BASS for open (unaided) ears based on manikin in situ directional responses and Eq. (1).
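
To make the indices concrete, the following is a minimal numerical sketch of Eqs. (1) and (2) in Python, assuming the left and right directional responses have been sampled on a frequency-by-azimuth grid; the function name, the grids, and the averaging of BE(f, 0°) across frequency are illustrative assumptions rather than details taken from the letter.

    import numpy as np

    def besa_indices(L, R, theta_deg):
        """Sketch of Eqs. (1) and (2). L, R: complex directional responses,
        shape (n_freq, n_theta); theta_deg: azimuth grid in degrees."""
        SA = np.maximum(np.abs(L), np.abs(R))  # Eq. (1): union of the two ears
        BE = np.minimum(np.abs(L), np.abs(R))  # Eq. (1): intersection

        # Eq. (2): mean and variance taken across frequency and angle,
        # so each index reduces to a single number.
        SAI = 10.0 * np.log10(np.var(SA**2) / np.mean(SA**2))

        look = np.argmin(np.abs(theta_deg))    # column nearest 0 deg azimuth
        BEI = 10.0 * np.log10(np.mean(BE[:, look]**2) / np.mean(BE**2))

        return SAI, BEI, BEI - SAI             # BESA = BEI - SAI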

The algorithm that implements the BASS is depicted in Fig. 2. The $w_{Li}(f)$ and $w_{Ri}(f)$ are the frequency responses of the fixed finite impulse response filters associated with the $i$th microphone of the microphone array on the left and right hearing aids, respectively. The resulting spatial sensitivity patterns of the left and right hearing aids are given by

$$\text{Left: } L(f,\theta)=w_{L1}(f)\,h_{L1}(f,\theta)+w_{L2}(f)\,h_{L2}(f,\theta),\qquad \text{Right: } R(f,\theta)=w_{R1}(f)\,h_{R1}(f,\theta)+w_{R2}(f)\,h_{R2}(f,\theta),\tag{3}$$

where $h_{Li}(f,\theta)$ and $h_{Ri}(f,\theta)$ denote the hearing-aid-related impulse responses in the frequency domain for a source at azimuth angle $\theta$, evaluated at the $i$th microphone and frequency $f$. The spatial filters $w_{Ri}(f)$ on the right ear are optimized to maximize the in situ Directivity Index. The spatial filters $w_{Li}(f)$ on the left ear are chosen by the minimization

$$\underset{w_{L1},\,w_{L2}}{\arg\min}\int_{0^\circ}^{360^\circ}\!\!\int_{f_i}^{f_e}\Big(w_{BE}\big(\widetilde{\mathrm{BE}}(f,\theta)^2-\mathrm{BE}(f,\theta)^2\big)^2+w_{SA}\big(\widetilde{\mathrm{SA}}(f,\theta)^2-\mathrm{SA}(f,\theta)^2\big)^2+w_{LR}\big(|L(f,0^\circ)|-|R(f,0^\circ)|\big)^2\Big)\,df\,d\theta,\tag{4}$$

where $\widetilde{\mathrm{BE}}(f,\theta)$ and $\widetilde{\mathrm{SA}}(f,\theta)$ are reference spatial-spectral functions that provide a high positive BEI and a high negative SAI as in Fig. 1(a), and BE(f, θ) and SA(f, θ) are evaluated for the trial values of $w_{Li}(f)$ during the minimization. The third term expresses our desire to equalize the two frequency responses at zero-degree azimuth. The symbols $w_{BE}$, $w_{SA}$, and $w_{LR}$ are weight functions for each target, and $f_i$ and $f_e$ denote the lower frequency limit and the Nyquist frequency, respectively. Here, $w_{BE}$ was set to 1 below 2500 Hz, to 0.5 from 2500 to 5000 Hz, and to 0.2 above 5000 Hz; $w_{SA}$ was 0.2 below 2500 Hz, 0.5 from 2500 to 5000 Hz, and 1 above 5000 Hz; and $w_{LR}$ had a value of 10 across frequencies.
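
A schematic of how this minimization could be carried out numerically is sketched below, parameterizing the left-ear filters by their (real-valued) sampled frequency responses and using a general-purpose optimizer; the discretization, the parameterization, and the optimizer choice are assumptions of this sketch, not the authors' implementation.

    import numpy as np
    from scipy.optimize import minimize

    def eq4_cost(x, f, hL1, hL2, R, BE_ref, SA_ref, look):
        """Discretized cost of Eq. (4). x stacks the trial responses wL1 and
        wL2; f is the frequency grid from f_i to f_e; hL1, hL2, R, BE_ref,
        and SA_ref all have shape (n_freq, n_theta)."""
        n = f.size
        wL1, wL2 = x[:n], x[n:]
        L = wL1[:, None] * hL1 + wL2[:, None] * hL2     # Eq. (3), left ear

        SA = np.maximum(np.abs(L), np.abs(R))           # Eq. (1)
        BE = np.minimum(np.abs(L), np.abs(R))

        # Frequency-dependent target weights as given in the text.
        wBE = np.where(f < 2500, 1.0, np.where(f <= 5000, 0.5, 0.2))
        wSA = np.where(f < 2500, 0.2, np.where(f <= 5000, 0.5, 1.0))
        wLR = 10.0

        fit = (wBE[:, None] * (BE_ref**2 - BE**2)**2
               + wSA[:, None] * (SA_ref**2 - SA**2)**2).sum()
        match = wLR * ((np.abs(L[:, look]) - np.abs(R[:, look]))**2).sum()
        return fit + match

    # Hypothetical usage; result.x then holds the optimized left-ear responses:
    # result = minimize(eq4_cost, x0=np.ones(2 * f.size),
    #                   args=(f, hL1, hL2, R, BE_ref, SA_ref, look),
    #                   method="L-BFGS-B")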

In order to preserve interaural time differences at low frequencies, the patterns on both hearing instruments are often constrained to equal the front-microphone response in the low frequencies, in which case $f_i$ would be set to, e.g., 1500 Hz. One example design based on this optimization is shown in Fig. 1(c). The intensity plots for the left and right sides are shown on the left, and the resulting patterns after binaural integration based on the BASS are shown on the right. Comparing Fig. 1(c) with Fig. 1(b) shows that the SNR for a source from the look direction is improved relative to the open-ear situation, while the SA is similar to that achieved by the open ear.
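
One simple way to realize this low-frequency constraint in the sketch above, assuming microphone 1 is the front microphone, is to pin the filter responses below $f_i$; this is an illustrative assumption, not the authors' implementation.

    import numpy as np

    # Hypothetical constraint: below f_i, both devices pass the front
    # microphone unchanged so natural interaural time differences survive.
    def constrain_low_band(wL1, wL2, f, f_i=1500.0):
        low = f < f_i                   # low-frequency bins (f in Hz)
        wL1 = np.where(low, 1.0, wL1)   # front mic only in the low band
        wL2 = np.where(low, 0.0, wL2)
        return wL1, wL2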

The correlation of the BEI, SAI, and BESA with perception was explored by measuring SA and directional benefit for different BESA values. This was accomplished by behavioral testing of speech-reception thresholds (SRTs) in a virtual sound environment. To generate realistic sound environments and to control the values of the SAI, BEI, and BESA indices, a virtual test environment built on room-simulation software was used in this study.

The room simulation software MCRoomSim was used in this study.8 Figures 3(a) and 3(b) illustrate the setup for the measurement of SA, and Fig. 3(c) that for BE listening. SRTs are measured for targets from different directions. To test SA, two distracting speech streams (red symbols) are presented from the frontal hemisphere while the target speech (green symbol) is presented off-axis from either the left or the right side. In general, SA is a more complex concept than can be characterized with the simple setup shown here. However, we consider low SRTs in a spatial setup such as that in Fig. 3 to be at least a necessary acoustical criterion for obtaining high situational awareness. To test BE listening, one distracting speech stream is presented from the side while the target speech is presented from the look direction. Speech recognition was measured for each situation independently, and for SA the obtained SRTs for situations (a) and (b) from Fig. 3 were averaged. The SRTs were obtained with a Danish HINT (Ref. 9) implemented in Matlab, using a two-up, one-down adaptive procedure. Twelve normal-hearing subjects participated in this study. Regarding the simulation parameters, the polar patterns yielding different BESA indices, shown in the lower row of Figs. 3(d)–3(g), were applied in MCRoomSim as a direction-dependent hearing aid receiver gain. The simulated room had low reverberation, with an average broadband reverberation time of around 0.2 s. The corresponding BESA, SAI, and BEI values are summarized in Table 1, together with representative open-ear values and the corresponding SRTs.
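
As an illustration of the adaptive tracking used to obtain the SRTs, a minimal two-up, one-down staircase is sketched below; the step size, trial count, scoring callback, stepping direction, and reversal-averaging rule are all assumptions of this sketch, and the actual Danish HINT implementation (Ref. 9) may differ in these details.

    def adaptive_srt(score_sentence, snr_start=0.0, step=2.0, n_trials=20):
        """score_sentence(snr) -> True if the sentence presented at that SNR
        (in dB) was repeated correctly. Returns an SRT estimate in dB."""
        snr, streak, prev_dir, reversals = snr_start, 0, 0, []
        for _ in range(n_trials):
            if score_sentence(snr):
                streak += 1
                if streak < 2:              # wait for two correct in a row
                    continue
                direction, streak = -1, 0   # two correct: lower SNR (harder)
            else:
                direction, streak = +1, 0   # one error: raise SNR (easier)
            if prev_dir and direction != prev_dir:
                reversals.append(snr)       # record reversal points
            prev_dir = direction
            snr += direction * step
        tail = reversals[-6:] or [snr]      # average the final reversals
        return sum(tail) / len(tail)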

Mean results for SA and BE listening are shown in the left and right panels of Fig. 4. A lower SRT indicates better performance, and the dashed gray lines indicate trend lines. Threshold values for BE listening are lower than those for SA. This is due to two effects: first, SRTs for spatially separated as well as co-located sounds are lower for a single masker than for multiple maskers;10 second, the target from the look direction receives a 3 dB acoustical boost due to the addition of correlated sounds from the frontal directions for all the investigated BESA values.

This also explains why the increase in BE listening performance with increasing BESA is not as large as that for situational awareness. One therefore has to conclude that a one-dimensional index (i.e., the BESA) might not be suitable for explaining the observed data under all test conditions. This is to be expected, as the SAI, BEI, and BESA indices assume diffuse sound-field conditions, whereas the experiments were not conducted in diffuse noise. The individual indices correlate better with the SRTs that characterize BE and SA, as can be seen by comparing their respective values in Table 1.

Although applying beamforming in hearing aids helps to improve speech intelligibility for a source from one location, it tends to isolate the user from the acoustic environment. To circumvent this disadvantage, the BASS was introduced and used to design the spatial filtering algorithms on both ears of bilaterally fitted hearing aids so that they work synergistically with the capabilities of the auditory system, with the aim of preserving speech intelligibility while improving environmental awareness. To quantify this approach and to assist in optimizing the design, the BE and SA indices were introduced and combined into a single metric called the BESA index. All of these indices assume diffuse-noise conditions. The perceptual data in Fig. 4 show that while the BESA correlates well with SA, it does not correlate well with BE. In such cases, the individual SAI and BEI indices provide a better description of the perceptual findings (see Table 1).

1. S. Kochkin, "Why my hearing aid is in the drawer: The consumer's perspective," Hear. J. 53(2), 34–41 (2000).
2. E. C. Cherry, "Some experiments on the recognition of speech, with one and with two ears," J. Acoust. Soc. Am. 25(5), 975–979 (1953).
3. D. E. Broadbent, "A mechanical model for human attention and immediate memory," Psychol. Rev. 64(3), 205–215 (1957).
4. P. M. Zurek, "Binaural advantages and directional effects in speech intelligibility," in Acoustical Factors Affecting Hearing Aid Performance (Allyn and Bacon, Boston, MA, 1993), pp. 255–276.
5. A. W. Bronkhorst and R. Plomp, "The effect of head-induced interaural time and level differences on speech intelligibility in noise," J. Acoust. Soc. Am. 83(4), 1508–1516 (1988).
6. C. Ma, A. Dittberner, and R. de Vries, "Binaural auditory steering strategy: A cupped ear study for hearing aid design," in Proceedings of the AES 141st Convention, Los Angeles, CA (September 29–October 2, 2016).
7. A. B. Dittberner and R. A. Bentler, "Predictive measures of directional benefit, Part 1: Estimating the directivity index on a manikin," Ear Hear. 28, 26–45 (2007).
8. A. Wabnitz, N. Epain, C. Jin, and A. van Schaik, "Room acoustics simulation for multichannel microphone arrays," in Proceedings of the International Symposium on Room Acoustics (2010).
9. J. B. Nielsen and T. Dau, "A Danish open-set speech corpus for competing-speech studies," J. Acoust. Soc. Am. 135(1), 407–420 (2014).
10. M. L. Hawley, R. Y. Litovsky, and J. F. Culling, "The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer," J. Acoust. Soc. Am. 115(2), 833–843 (2004).