Research on the integration of auditory and visual sensory information generally supports the optimal integration hypothesis, according to which each modality's information is weighted by its relative quality. Because the auditory system has greater temporal resolution than the visual system, this hypothesis predicts that visual information will not affect auditory judgments of duration. In conflict with this prediction, Schutz & Lipscomb (2007) report that percussionists use visual information to alter audience perception of note duration. To test whether this discrepancy arises from the acoustic characteristics of percussive sounds, we paired visual information derived from a striking motion with pure tones exhibiting one of two envelopes: percussive (sharp onset followed by exponential decay) or flat (sharp onset to a fixed level, followed by sharp offset). Visual information affected auditory duration judgments only for tones with percussive envelopes (i.e., the kind of sound produced by a striking motion). Because ratings of percussive tones were no more variable than ratings of flat tones, this difference cannot be explained by "ambiguity" of percussive‐tone duration. Contrary to a strict account of optimal integration, we conclude that envelope is an important acoustic cue for cross‐modal integration and provides information relevant to event identification.
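To make the two envelope types concrete, the following is a minimal sketch of how such pure tones might be synthesized. It is an illustration under stated assumptions, not the stimulus-generation procedure used in the study: the sample rate, tone frequency, duration, and decay time constant are all hypothetical values, and the function names are invented for this example.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed value, not taken from the study


def flat_envelope(n_samples):
    """Flat envelope: sharp onset to a fixed level, held until a sharp offset."""
    return [1.0] * n_samples


def percussive_envelope(n_samples, decay_time_s, sample_rate=SAMPLE_RATE):
    """Percussive envelope: sharp onset followed by exponential decay.

    decay_time_s is a hypothetical time constant: the amplitude falls to
    1/e of its initial value after this many seconds.
    """
    tau_samples = decay_time_s * sample_rate
    return [math.exp(-i / tau_samples) for i in range(n_samples)]


def pure_tone(freq_hz, duration_s, envelope, sample_rate=SAMPLE_RATE):
    """Return a sine tone of the given frequency shaped by an envelope function."""
    n_samples = int(round(duration_s * sample_rate))
    amplitudes = envelope(n_samples)
    return [
        amplitudes[i] * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


# Example: two 0.5 s tones at 440 Hz (both values are assumptions).
flat_tone = pure_tone(440.0, 0.5, flat_envelope)
percussive_tone = pure_tone(440.0, 0.5, lambda n: percussive_envelope(n, 0.1))
```

In this sketch the two tones have the same length and carrier frequency and differ only in amplitude envelope, which is the contrast the experiment manipulates.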