Modern listening devices come equipped with bone-conduction microphones (e.g., accelerometers) as well as conventional air-conduction microphones. When several sound sources are active, the air-conduction microphones pick up the target source together with unwanted noise sources, whereas the contact microphones are robust to external noise. A drawback of contact microphones is that they are bandlimited. When a talker of interest wears such a listening device, its accelerometer signal can be used to derive a spatio-temporal filter (beamformer) that estimates the desired source from a microphone array. However, different beamformers perform differently across the frequency range of interest. If the outputs of multiple beamformers are available, we would like to combine them constructively. In this article, we present a frame-by-frame, energy-based method for combining multiple beamformers. The proposed method is evaluated in a real-world environment with human talkers.
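
The abstract does not state the combination rule, so the following is only a minimal sketch of one plausible frame-by-frame, energy-based scheme, not the authors' method. It assumes that every beamformer passes the target undistorted, so that within a frame the output with the lowest energy carries the least residual noise. The function name `combine_min_energy`, the non-overlapping framing, and the minimum-energy selection criterion are all illustrative assumptions.

```python
import numpy as np

def combine_min_energy(outputs, frame_len):
    """Per frame, keep the beamformer output with the lowest energy.

    outputs   : array-like, shape (n_beamformers, n_samples), time-aligned signals
    frame_len : frame length in samples (non-overlapping frames, an assumption)

    Returns the combined signal and the index of the chosen beamformer per frame.
    """
    outputs = np.asarray(outputs, dtype=float)
    n_beamformers, n_samples = outputs.shape
    combined = np.empty(n_samples)
    choices = []
    for start in range(0, n_samples, frame_len):
        stop = min(start + frame_len, n_samples)   # last frame may be shorter
        frame = outputs[:, start:stop]
        energy = np.sum(frame ** 2, axis=1)        # one energy value per beamformer
        best = int(np.argmin(energy))              # assumed criterion: minimum energy
        combined[start:stop] = frame[best]
        choices.append(best)
    return combined, choices
```

Hard switching between outputs at frame boundaries can cause audible discontinuities; a practical variant might use overlapping windows or apply the selection per frequency band, but those choices go beyond what the abstract describes.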
