Combined use of a cochlear implant (CI) and a contralateral hearing aid has been shown to improve CI users’ speech perception in multi-talker environments, presumably because acoustic temporal fine structure cues help separate one talker from several others. However, there is large variability in this bimodal benefit. In this study, we show that differences in the width of binaural pitch fusion, the fusion of dichotically presented tones that evoke different pitches across ears, explain a large part of this variability. Specifically, broad binaural pitch fusion could lead to fusion of multiple voices into one voice, reducing the ability to use voice pitch differences to separate voices and understand speech in a multi-talker environment. Speech reception thresholds measured using male and female target talkers were compared with binaural pitch fusion results. Overall performance improved when the target and maskers differed in gender. A strong negative correlation was observed between voice gender benefit and breadth of binaural pitch fusion. These results suggest that sharp binaural pitch fusion is necessary for maximal speech perception in noise when acoustic hearing is available to transmit voice pitch cues. [Work supported by NIH-NIDCD Grant Nos. R01 DC01337 and F32 DC016193.]