In digital media and games, sound effects are typically either recorded or synthesized. Although many digital synthesis tools exist, the quality of synthesized audio is generally not on par with that of sound recordings. Nonetheless, sound synthesis techniques remain a popular means of generating new sound variations. In this research, we study sound effects synthesis using generative models inspired by those used for high-quality speech and music synthesis. In particular, we explore the trade-off between synthesis quality and variation. With regard to quality, we integrate a reconstruction loss into the original training objective to penalize imperfect audio reconstruction, and we compare this approach with neural vocoders and traditional spectrogram inversion methods. We use a Wasserstein GAN (WGAN) as an example model to explore the synthesis quality of generated sound effects such as footsteps, birds, guns, rain, and engine sounds. In addition to synthesis quality, we consider the range of sound variation that is possible with our generative model. We report on the trade-off that we obtain with our model between the quality and the diversity of synthesized sound effects.
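As a rough illustration of what integrating a reconstruction loss into a WGAN training objective can look like, the sketch below adds a weighted reconstruction penalty to the standard Wasserstein generator loss. The abstract does not state the form of the reconstruction loss, its weight, or how the reconstructed audio is obtained, so the L1 distance, the `recon_weight` parameter, and the function name `generator_loss` are all assumptions made for illustration and not the authors' implementation.

```python
def generator_loss(critic_scores_fake, reconstructed, target, recon_weight=10.0):
    """Hypothetical generator objective: Wasserstein term plus reconstruction term.

    critic_scores_fake: critic outputs for generated samples (list of floats).
    reconstructed, target: equal-length sequences of samples or spectrogram bins.
    recon_weight: assumed trade-off weight between the two terms.
    """
    # Standard WGAN generator term: maximize the critic score on generated audio.
    adversarial = -sum(critic_scores_fake) / len(critic_scores_fake)

    # Assumed L1 reconstruction penalty (the abstract does not specify the distance).
    reconstruction = sum(abs(r - t) for r, t in zip(reconstructed, target)) / len(target)

    total = adversarial + recon_weight * reconstruction
    return total, adversarial, reconstruction


# Toy usage with made-up numbers.
total, adv, rec = generator_loss([1.0, 3.0], [0.0, 1.0], [0.0, 0.5], recon_weight=2.0)
print(total, adv, rec)
```

In a formulation like this, a larger `recon_weight` pulls the output toward the reference audio, while a smaller one leaves more of the objective to the adversarial term; this is one way the quality versus diversity trade-off described above could be exposed as a single tunable parameter.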
4 December 2023
185th Meeting of the Acoustical Society of America
4–8 December 2023
Sydney, Australia
Computational Acoustics: Paper 2aCA8
April 15 2024
Impact on quality and diversity from integrating a reconstruction loss into neural audio synthesis
Yunyi Liu
Department of Electrical and Information Engineering, The University of Sydney, Sydney, NSW, 2008, AUSTRALIA; yunyi.liu@sydney.edu.au
Craig Jin
Proc. Mtgs. Acoust. 52, 022003 (2023)
Article history
Received: February 19 2024
Accepted: March 14 2024
Citation
Yunyi Liu, Craig Jin; Impact on quality and diversity from integrating a reconstruction loss into neural audio synthesis. Proc. Mtgs. Acoust. 4 December 2023; 52 (1): 022003. https://doi.org/10.1121/2.0001871