During the past decade, solutions to the problem of musical sound source separation have become more apparent. Potential applications include sound editing, enhanced spatialization, music-minus-one, karaoke, music classification/identification, music transcription, and computational musicology. Our current approach is to restrict the input signal to a mix of a limited number of instruments, each composed of harmonic partials, with known fundamental-frequency (F0) contours. (These can be obtained either by audio-to-MIDI alignment or by multiple-F0 estimation.) Since the harmonic frequencies of known F0s are easily predicted, binary mask separation is robust except in frequency regions where harmonics of different instruments collide. Four methods of separation are compared: (1) binary mask separation; (2) common amplitude modulation (CAM) [Li et al., IEEE Trans. Audio, Speech, Lang. Process. 17(7), 2009]; (3) least-squares estimation of frequencies based on a multiple sinusoidal model; and (4) F0-informed non-negative matrix inversion using instrument spectral libraries. Separation examples using these methods will be demonstrated.
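The idea behind F0-informed binary masking, and why it fails where harmonics collide, can be illustrated with a minimal sketch for a single analysis frame. This is not the authors' implementation: the function name `harmonic_binary_masks`, the parameters `n_harmonics` and `tol_hz`, and the fixed-tolerance assignment rule are all assumptions made for illustration. Each frequency bin is given to the one instrument that has a predicted harmonic nearby; bins claimed by more than one instrument are flagged as collisions and left unassigned, since a binary mask cannot split them.

```python
def harmonic_binary_masks(f0s, bin_freqs, n_harmonics=10, tol_hz=20.0):
    """Build per-instrument binary masks for one frame (illustrative sketch).

    f0s        : known F0 (Hz) of each instrument in this frame
    bin_freqs  : centre frequency (Hz) of each spectral bin
    n_harmonics: number of harmonics predicted per instrument (assumed value)
    tol_hz     : max distance from a predicted harmonic (assumed value)

    Returns (masks, collisions), where masks[i][k] is 1 if bin k is assigned
    to instrument i, and collisions[k] is True if two or more instruments
    have a predicted harmonic within tol_hz of bin k.
    """
    masks = [[0] * len(bin_freqs) for _ in f0s]
    collisions = [False] * len(bin_freqs)
    for k, f in enumerate(bin_freqs):
        # Distance from this bin to each instrument's nearest predicted harmonic.
        dists = [
            min(abs(f - h * f0) for h in range(1, n_harmonics + 1))
            for f0 in f0s
        ]
        near = [i for i, d in enumerate(dists) if d <= tol_hz]
        if len(near) == 1:
            masks[near[0]][k] = 1      # unambiguous: one instrument owns the bin
        elif len(near) > 1:
            collisions[k] = True       # harmonics of different instruments collide
    return masks, collisions


# Two instruments at 200 Hz and 300 Hz: their harmonics coincide at 600 Hz.
masks, collisions = harmonic_binary_masks(
    [200.0, 300.0], [200.0, 300.0, 400.0, 600.0, 250.0],
    n_harmonics=3, tol_hz=10.0,
)
```

In this example the 600 Hz bin is flagged as a collision because it is both the third harmonic of 200 Hz and the second harmonic of 300 Hz. Regions of this kind are where the other three methods listed above would be expected to differ from plain binary masking.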
