In this paper, we introduce the structure image method (StIM), an extension of the image method for generating room impulse responses in a structure with more than a single confined space. StIM can efficiently generate the large number of structure impulse response examples that current deep-learning methods require for many tasks, while maintaining low computational complexity. We address the integration of the environment representation produced by StIM into the training process and present a framework for training deep models. We demonstrate the use of StIM by training an audio classification model and testing it on real recordings acquired by accessible day-to-day devices. StIM shows promising results for indoor audio classification where the target sound source is not located in the same room as the microphones. StIM thus enables large-scale simulation of multi-room acoustics at low computational cost, which is mainly beneficial for training deep learning networks.
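To make concrete how a simulated impulse response can feed a training pipeline, the following minimal sketch convolves a dry source signal with an impulse response to obtain the signal a microphone would receive. This is standard linear-acoustics practice and only an assumption about the general approach; the abstract does not specify the paper's actual procedure. The helper names `toy_impulse_response` and `apply_impulse_response`, the sampling rate, and the delay and gain values are hypothetical placeholders, not output of StIM.

```python
import numpy as np


def toy_impulse_response(fs, delays_s, gains):
    """Build a sparse impulse response from hypothetical (delay, gain) pairs.

    Each pair stands in for one propagation path (direct or reflected).
    A real simulator would compute these from the geometry instead.
    """
    delays = np.round(np.asarray(delays_s) * fs).astype(int)
    h = np.zeros(delays.max() + 1)
    # Accumulate gains, so paths landing on the same sample add up.
    np.add.at(h, delays, gains)
    return h


def apply_impulse_response(dry, h):
    """Simulate the received signal as the linear convolution of dry with h."""
    return np.convolve(dry, h, mode="full")


fs = 16000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
dry = rng.standard_normal(fs)  # 1 s of noise standing in for a sound event

# Placeholder paths: delays in seconds and their attenuation factors.
h = toy_impulse_response(fs, delays_s=[0.010, 0.017, 0.031],
                         gains=[1.0, 0.5, 0.25])
wet = apply_impulse_response(dry, h)
```

Under this assumption, each training example could be paired with a different simulated impulse response, so that one dry recording yields many acoustically distinct variants.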
