Exposure to spatially incongruent auditory and visual inputs produces both immediate crossmodal biases and aftereffects. For event identification, however, as opposed to localization, only biases have been demonstrated so far. Taking the case of incongruent audiovisual speech, which produces the well-known McGurk bias effect, we show that, contrary to earlier reports (e.g., Roberts and Summerfield, 1981), aftereffects can be obtained. Exposure to an ambiguous auditory token from an /aba/–/ada/ continuum combined with the visual presentation of a face articulating /aba/ (or /ada/) increased the tendency to interpret test auditory tokens as /aba/ (or /ada/). The earlier results that were taken as disproving the possibility of visual recalibration of auditory speech identification were obtained with exposure to nonambiguous auditory tokens, which (as we confirm in another experiment) create an auditory contrast effect in a direction opposite to that of recalibration and presumably masked the recalibration effect.