In acoustic scene classification (ASC), acoustic features play a crucial role in extracting scene information, which can be distributed over different time scales. Moreover, the limited size of the dataset may lead to a biased model that performs poorly on recordings from unseen cities and on easily confused scene classes. This paper proposes a long-term wavelet feature that captures discriminative long-term scene information. Compared with classic Mel filter bank coefficients (FBank), the extracted scalogram requires less storage and can be classified faster and more accurately. Furthermore, a data augmentation scheme is adopted to improve the generalization of ASC systems; it extends the database iteratively using auxiliary classifier generative adversarial neural networks (ACGANs) and a deep learning-based sample filter. Experiments were conducted on datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. On the DCASE17 and DCASE19 datasets, the proposed techniques improved performance over the FBank classifier. Moreover, the ACGAN-based data augmentation scheme achieved an absolute accuracy improvement of 6.10% on recordings from unseen cities, far exceeding classic augmentation methods.
June 2021
June 15 2021
Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification a)
Special Collection: Machine Learning in Acoustics
Hangting Chen
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, Beijing 100190, People's Republic of China
Zuozhen Liu c)
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, Beijing 100190, People's Republic of China
Zongming Liu c)
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, Beijing 100190, People's Republic of China
Pengyuan Zhang d)
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Haidian District, Beijing 100190, People's Republic of China
b) ORCID: 0000-0002-4085-4364.
c) Also at: University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing 100049, People's Republic of China.
d) Also at: University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing 100049, People's Republic of China. Electronic mail: zhangpengyuan@hccl.ioa.ac.cn
a) This paper is part of a special issue on Machine Learning in Acoustics.
J. Acoust. Soc. Am. 149, 4198–4213 (2021)
Article history
Received: November 02 2020
Accepted: May 19 2021
Citation
Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang; Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification. J. Acoust. Soc. Am. 1 June 2021; 149 (6): 4198–4213. https://doi.org/10.1121/10.0005202