A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is stochastic gradient descent, with the aid of differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable black-box modules. Two reverb module architectures are proposed and discussed: a “stereo reverb” model and an “individual reverb” model. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix, and these are compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and that there is no statistically significant difference between the participants' perception of the stereo reverb model and reference mixes.
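As a rough illustration of the idea of recovering mix parameters from raw tracks and a stereo mixdown, the sketch below fits only a left and right gain per track, which corresponds to the stereo gain mix used as the baseline above. It is not the authors' implementation: it assumes plain full-batch gradient descent with hand-derived gradients rather than stochastic gradient descent over differentiable signal processing modules, and the function name `estimate_stereo_gains`, the synthetic sinusoidal tracks, and the learning rate are all hypothetical choices made for the example.

```python
import math


def estimate_stereo_gains(tracks, target_left, target_right, lr=0.1, steps=500):
    """Fit one left and one right gain per track by minimising mean squared error.

    Hypothetical helper: full-batch gradient descent, not the paper's optimiser.
    """
    n_tracks = len(tracks)
    n_samples = len(target_left)
    gains = [[0.0, 0.0] for _ in range(n_tracks)]  # [left, right] gain per track
    targets = (target_left, target_right)
    for _ in range(steps):
        for ch in (0, 1):
            # Forward pass: each output channel is a gain-weighted sum of the tracks.
            est = [
                sum(gains[k][ch] * tracks[k][n] for k in range(n_tracks))
                for n in range(n_samples)
            ]
            err = [est[n] - targets[ch][n] for n in range(n_samples)]
            # Gradient of the mean squared error with respect to each gain.
            grads = [
                2.0 / n_samples * sum(err[n] * tracks[k][n] for n in range(n_samples))
                for k in range(n_tracks)
            ]
            for k in range(n_tracks):
                gains[k][ch] -= lr * grads[k]
    return gains


# Synthetic example (assumed data): two sinusoidal "raw tracks" and a known mix.
N = 256
tracks = [
    [math.sin(2 * math.pi * 3 * n / N) for n in range(N)],
    [math.sin(2 * math.pi * 7 * n / N) for n in range(N)],
]
true_gains = [[0.8, 0.2], [0.3, 0.9]]
mix_left = [sum(true_gains[k][0] * tracks[k][n] for k in range(2)) for n in range(N)]
mix_right = [sum(true_gains[k][1] * tracks[k][n] for k in range(2)) for n in range(N)]

estimated = estimate_stereo_gains(tracks, mix_left, mix_right)
```

Because the recovered parameters are the gains themselves, the result can be read directly as a mixing decision for each track, which is the sense in which an explicitly modelled signal chain is interpretable. Extending the sketch to equalisation, delay, or reverb would require differentiable versions of those effects, which this example does not attempt.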
July 27 2021
Reverse engineering of a recording mix with differentiable digital signal processing
Special Collection: Machine Learning in Acoustics
Joseph T. Colonel;
Joseph T. Colonel, Joshua Reiss; Reverse engineering of a recording mix with differentiable digital signal processing. J. Acoust. Soc. Am. 1 July 2021; 150 (1): 608–619. https://doi.org/10.1121/10.0005622