The development of efficient and accurate numerical methods for simulating realistic sound in virtual environments, such as computer games and VR/AR, has been an active research area for decades. However, handling dynamic scenes with many moving sources remains challenging because of prohibitive storage requirements and long computation times. A recently proposed physics-informed neural network (PINN) approach learns a compact and efficient surrogate model with parameterized moving sources and impedance boundaries on a grid-less 1-D domain. In contrast to traditional “black-box” deep learning, PINNs incorporate the underlying physics by minimizing the residuals of the governing equations through the loss function. We will extend this work using flow maps implemented as Residual Networks (ResNets). From a dynamical systems perspective, ResNets can be interpreted as discretized ordinary differential equations, which makes them suitable building blocks for approximating the time evolution of the governing equations. We will examine the pros and cons of ResNets in acoustics and compare them with state-of-the-art numerical methods and vanilla feed-forward neural networks in terms of accuracy and efficiency.
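
The dynamical-systems reading of ResNets mentioned above can be illustrated with a minimal sketch. A residual block computes x + h f(x), which is one forward Euler step of dx/dt = f(x), so a stack of such blocks acts as a flow map that advances the state in time. The names `residual_block`, `flow_map`, and `decay`, the step size `h`, and the toy dynamics dx/dt = -x are illustrative assumptions, not part of the proposed method; in practice f would be a trained network rather than a known function.

```python
import numpy as np

def residual_block(x, h, f):
    """One residual update x + h * f(x): identity skip connection plus a scaled branch."""
    return x + h * f(x)

def flow_map(x0, h, n_steps, f):
    """Compose n_steps residual blocks, advancing the state from t = 0 to t = n_steps * h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = residual_block(x, h, f)
    return x

def decay(x):
    """Hypothetical toy dynamics dx/dt = -x, standing in for a learned branch f(x; theta)."""
    return -x

# Advance x(0) = 1 to t = 1 with 100 blocks and compare with the exact solution exp(-1).
x_final = flow_map(x0=[1.0], h=0.01, n_steps=100, f=decay)
exact = np.exp(-1.0)
error = abs(x_final[0] - exact)
```

With these assumed settings the stacked blocks reproduce the exact decay to within the first-order error expected of forward Euler; halving `h` while doubling `n_steps` should roughly halve `error`.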