Communicating about sounds is difficult without a technical language, and naïve speakers often rely on various non-linguistic vocalizations and body gestures (Lemaitre et al., 2014). Previous work has independently studied how effectively people describe sounds with gestures or with vocalizations (Caramiaux, 2014; Lemaitre and Rocchesso, 2014). However, speech communication studies suggest a more intimate link between the two processes (Kendon, 2004). Our study therefore focused on the combination of manual gestures and non-speech vocalizations in the communication of sounds. We first collected a large database of vocal and gestural imitations of a variety of sounds (audio, video, and motion-sensor data). Qualitative analysis of the gestural strategies yielded three hypotheses: (1) the voice is more effective than gesture for communicating rhythmic information, (2) textural aspects are communicated with shaky gestures, and (3) concurrent streams of sound events can be split between gesture and voice. These hypotheses were tested in a second experiment in which 20 participants imitated 25 specifically synthesized sounds: rhythmic noise bursts, granular textures, and layered streams. Statistical analyses compared the acoustic features of the synthesized sounds, vocal features, and a set of novel gestural features based on a wavelet representation of the acceleration data.
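The wavelet-based gestural features are only named at this level of summary, not specified. As a purely illustrative sketch (not the feature set actually used in the study), a Haar wavelet energy decomposition of an acceleration trace shows the underlying idea: "shaky" gestures concentrate energy in the fine-scale detail coefficients, while slow sweeping gestures concentrate it at coarse scales. The signals and parameters below are hypothetical.

```python
import math

def haar_step(signal):
    # One level of the Haar wavelet transform: pairwise scaled
    # averages (approximation) and differences (detail coefficients).
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_scale_energies(accel, levels=4):
    # Energy of the detail coefficients at each scale. Low level
    # indices correspond to fine scales (rapid, shaky movement);
    # higher indices correspond to coarse scales (slow sweeps).
    energies = []
    current = list(accel)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    return energies

# Toy acceleration traces: one slow sweep vs. one rapid "shaky" gesture.
n = 64
slow = [math.sin(2 * math.pi * i / n) for i in range(n)]
shaky = [math.sin(2 * math.pi * 12 * i / n) for i in range(n)]

print(wavelet_scale_energies(slow))
print(wavelet_scale_energies(shaky))
```

Comparing the two outputs, the shaky trace carries far more energy at the finest scale than the slow sweep, which is the kind of contrast a scale-energy feature vector can expose to the statistical analysis.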