Metamaterials are attracting increasing interest in the field of acoustics due to their sound insulation effects. Through periodically arranged structures, acoustic metamaterials can influence the way sound propagates in acoustic media. To date, the design of acoustic metamaterials relies primarily on the expertise of specialists, since most effects are based on localized solutions and interference. This paper outlines a deep learning-based approach to extend current knowledge of metamaterial design in acoustics. We develop a design method using conditional generative adversarial networks. The generative network proposes a cell candidate for a desired transmission behavior of the metamaterial. To validate our method, numerical simulations with the finite element method are performed. Our study provides considerable insight into design strategies for sound insulation tasks. By providing design directives for acoustic metamaterials, cell candidates can be inspected and tailored to achieve desirable transmission characteristics.

Acoustic metamaterials can possess unnatural features such as negative values for the effective mass density and the stiffness in certain frequency ranges. These features result in new opportunities that can be beneficially used in the field of acoustics for sound insulation purposes. One way to quantify the insulation property is to analyze the transmission characteristics of the metamaterial. Amongst other configurations, unit cells in acoustic metamaterials occur as conventional sonic crystals with a cylindrical cross section1–3 and as C-shaped locally resonant sonic crystals.4,5 Few researchers have addressed the question of how to design unit cells to achieve a desired transmission property of the metamaterial.6,7

In recent years, there has been growing interest in neural networks and deep learning methods in general.8–11 These techniques provide models with high-level abstraction by exploring the underlying structure in datasets. Their use covers a wide range of scientific applications ranging from natural language processing12,13 and object detection14,15 to image classification tasks.16–18 In the field of acoustics, the recently published review article by Bianco et al.19 provides a large and comprehensive overview on machine learning applications.

In a major advance in 2014, Goodfellow et al.20 introduced generative adversarial networks (GANs). The idea of a GAN is to set up two neural networks, a generative network and a discriminative network, competing with each other. The generator tries to deceive the discriminator by creating fake data samples, whereas the discriminator conducts investigations to expose generated fake samples. As a result, a competition arises where both parties train to outsmart their rival. Since then, GANs have been optimized in many ways including robust implementations,21 efficient training strategies,22 and modifications as an alternative unsupervised learning approach.23 For more introductory literature, the interested reader is referred to the comprehensive works.24,25

In their article, Liu et al.26 reported on metamaterial applied in acoustics and demonstrated sound insulation effects of locally resonant sonic crystals with cylindrical cross sections. Elford et al.27 concluded that matryoshka sonic crystals, a set of unit cells of decreasing size stacked one inside another, are a highly effective configuration for unit cells in the lower frequencies. Their approach is based on finite element simulations to analyze the transmission characteristics of the sonic crystal. Lagarrigue et al.28 studied periodically arranged inclusions embedded in porous layers. They observed higher absorption coefficients in the low frequency range when periodically arranged inclusions are considered in the porous layer. Several studies7,29–32 have been conducted on perfect and broadband acoustic absorption for Helmholtz resonators. Romero et al.30 report perfect sound absorption regarding reflection problems, while Jimenez et al.31 achieve perfect sound absorption in a frequency range from 300 to 1000 Hz for transmission problems.

Meng et al.33 proposed an optimization scheme for locally resonant acoustic metamaterials based on genetic algorithms. They devised a method to optimize the structural parameters of the metamaterial for underwater sound absorption purposes. In their article, Lu et al.34 present a topology optimization technique for the structural design of locally resonant metamaterials. By using the adjoint variable method, they developed a strategy to optimize the negative bulk modulus at specific frequencies. The study by Yang et al.35 provides a topology optimization of acoustic metamaterials by using the concept of effective mass density to maximize selected bandgaps of the metamaterial. Fahey et al.36 developed an optimization method based on analytically determined gradients for broadband metamaterial design. In their article, Robeck et al.37 outline a design scheme based on deep learning. They constructed a convolutional neural network (CNN) trained on data from scattered wave fields.

A concept for generative design was devised by Liu et al.38 applied on metamaterial in the field of optics. The authors developed a design scheme by using GANs. They used the desired optical spectra as inputs to obtain geometries with the specific behavior. Malkiel et al.39 propose a deep learning method for the design of nanophotonic structures in optics. Based on the far-field response of the nanostructure, they predict the geometry of the related structure. Another deep learning approach in optics was conducted by Tahersima et al.40 The authors developed deep neural networks for forward and inverse evaluations of nanophotonic devices. By assigning the desired response behavior to the input layer of the network, topology designs are obtained for the nanophotonic devices. An alternative deep learning approach was investigated by Zhang and Ye.41 They developed a strategy by using a variational autoencoder (VAE) for mask design in optical microlithography.

An improvement of the GAN was proposed by Mirza and Osindero.42 They extended the GAN architecture in such a way that conditions can be involved in the image generation process. In their work, the authors conditioned the image generation on class labels. Recent evidence in the field of nanophotonics43 suggests an efficient design method based on a conditional GAN. The authors assisted the GAN by conditioning the device generation process on labelled topology optimized designs.

To the best of our knowledge, there are no results in the literature regarding generative methods for the design of acoustic metamaterials. The aim of this study is to extend current knowledge on metamaterial designs to the field of acoustics. For this purpose, we develop a conditional GAN to obtain geometric properties based on target transmission spectra. In this study, we focus on the frequency range from 2000 to 10 000 Hz. We initiate this research with a relatively coarse step size of 100 Hz to concentrate on the mechanism of the presented methodology. A finer frequency resolution can be easily considered within our framework. The results of the GAN are encouraging and show that new principles for the design of acoustic metamaterial can be derived with the present technique. This new network will be able to make design proposals apart from well-known sonic crystals and C-shaped resonators. By composing datasets with various transmission spectra from arbitrarily shaped geometries, this network can provide design candidates for applications beyond transmission tubes.

This paper is organized as follows. Section II outlines the simulation model and our deep learning-based approach for analyzing the design task. In Sec. III, the results of the proposed method are presented. Section IV concludes this study and discusses further applications and related future work aspects.

The methods described here consist of a data collection strategy, the simulation model with the finite element method (FEM) according to Elford et al.,27 and the GAN for design tasks. Regarding the GAN, its training procedure and network architecture are detailed. Finally, the composition of the network architecture for the design task is presented.

To train the GAN, a dataset with different cell designs is required. For this purpose, we create synthetic training data by using a random generator algorithm. Prior to the cell generation, the unit cell is discretized into a binary image (see Fig. 1). The pixels in the binary image are Boolean variables taking the value zero or one. We assign zeros in the image where we have fluid elements in the unit cell and ones for solid elements. The solid part is assumed to be rigid, and thus, the interface between fluid and solid is assumed to be sound hard. This discretization technique is very advantageous, since it enables an easy implementation of generative design algorithms. In the presented work, we set up an algorithm based on a random generator. In a preliminary attempt, we restrict the geometries to be symmetric. Thus, only half of the unit cell is modeled in the data generation process. The generation process proceeds as follows. In the initial stage, we set a random pixel to one. Subsequently, the algorithm randomly assigns a new pixel next to the current pixel to one. The algorithm therefore has at most four options for the next pixel. As soon as the new pixel's value is set to one, it becomes the starting point for the next step. This process is repeated until a fixed number of iterations is reached. When these steps have been completed, the generated shape is mirrored on the vertical axis for reasons of symmetry. In a final step, the new geometry is placed in the center of the unit cell. To provide manufacturable realizations and robust simulations, we constrain the algorithm to design coherent geometries. Thus, only pixels with at least one adjacent edge are proposed. The generation strategy is illustrated in Fig. 2(a) with the possible options depicted in blue. Nonetheless, non-coherent geometries might still occur, e.g., if two pixels assigned with ones share the same corner.
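The random-walk generation described above can be sketched as follows. The grid size, the step count, and the omitted centering step are illustrative assumptions, not the exact settings used in this work.

```python
import numpy as np

def generate_half_cell(size=88, n_steps=120, rng=None):
    """Random-walk generator for one (symmetric) half of a unit cell.

    Sketch of the data-generation idea: start from a random seed pixel,
    then repeatedly set one of the (up to four) edge-adjacent neighbors
    of the current pixel to one.
    """
    rng = np.random.default_rng(rng)
    half = np.zeros((size, size // 2), dtype=np.int8)
    # Initial stage: a random seed pixel is set to one.
    r, c = int(rng.integers(0, size)), int(rng.integers(0, size // 2))
    half[r, c] = 1
    for _ in range(n_steps):
        # Up to four edge-adjacent candidates (fewer at the border).
        candidates = [(r + dr, c + dc)
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < size and 0 <= c + dc < size // 2]
        r, c = candidates[rng.integers(len(candidates))]
        half[r, c] = 1  # the new pixel becomes the next starting point
    return half

def mirror(half):
    """Mirror the half cell on the vertical symmetry axis."""
    return np.concatenate([half, half[:, ::-1]], axis=1)
```

Note that a walk like this may still produce pixels that only touch at a corner, which is why the subsequent filtering step is needed.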
For this case, we implement a filter operator as motivated by CNNs.9 The entire image is successively filtered with a 2 × 2 kernel matrix. By choosing the elements of this matrix as depicted in Fig. 2(b), non-coherent cell shapes are identified. For instance, an activation of four exposes two solid elements (pixel value 1) that are only connected at one point. Moreover, we introduce a second filter operator. This second filter checks whether there is enough space between the crafted geometry and the cell boundary. By applying this operator, we avoid geometries that cover the entire horizontal span. Such samples do not comply with the periodic nature of metamaterials and are thus discarded. Note that the image data contain many zeros. This would lead to a low activation of the first layer of the network, which can result in extremely long training periods. Thus, we replace the zeros in the pixels with “–1” in the training process.
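The effect of the coherence filter can be illustrated with an explicit check for corner-only contact. The exact 2 × 2 kernel weights of Fig. 2(b) are not reproduced here, so the pattern test below is an equivalent sketch rather than the original implementation.

```python
import numpy as np

def has_corner_contact(img):
    """Detect two solid pixels (value 1) that touch only at a corner.

    Equivalent to sliding a 2x2 window over the image and flagging the
    two diagonal patterns [[1,0],[0,1]] and [[0,1],[1,0]].
    """
    a = img[:-1, :-1]  # top-left of each 2x2 window
    b = img[:-1, 1:]   # top-right
    c = img[1:, :-1]   # bottom-left
    d = img[1:, 1:]    # bottom-right
    diag = (a == 1) & (d == 1) & (b == 0) & (c == 0)
    anti = (b == 1) & (c == 1) & (a == 0) & (d == 0)
    return bool((diag | anti).any())
```

A generated sample for which this check fires would be discarded or regenerated.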

FIG. 1.

The unit cell is discretized to a pixel-based binary image. Fluid elements are depicted in gray and solid elements in white.

FIG. 2.

(Color online) The growth of the geometry is provided by pixels with adjacent edges (blue). Only half of the geometry is modeled by imposing symmetry along the vertical axis (orange, dashed-dotted) (a). (b) The filter operator used to detect forms that share only one corner.


To analyze how sound propagates through the metamaterial, FEM simulations according to Elford et al.27 are performed. In particular, the transmission spectra are evaluated to quantify the absorption behavior of the metamaterial. For this purpose, we use the transmission loss that is expressed by

$$\mathrm{TL} = 10\,\log_{10}\!\left(\frac{P_\mathrm{out}}{P_\mathrm{in}}\right),$$
(1)

where Pin and Pout are the sound power of an incoming plane wave and the sound power of an outgoing plane wave, respectively. The sound power P is computed by

$$P = \int_{\Gamma} \mathbf{I}\cdot\mathbf{n}\,\mathrm{d}\Gamma = \frac{1}{2}\,\Re\left\{\int_{\Gamma} p\,v_{n}^{*}\,\mathrm{d}\Gamma\right\},$$
(2)

with $\mathbf{I} = \frac{1}{2}\,\Re\{p\,\mathbf{v}^{*}\}$ denoting the sound intensity, $p$ the sound pressure, and $\mathbf{v}^{*}$ the complex conjugate of the fluid particle velocity. The vector $\mathbf{n}$ denotes the normal vector on the surface of interest $\Gamma$.44 To evaluate $P_\mathrm{in}$, the relevant surface $\Gamma_\mathrm{in}$ is defined in front of the metamaterial, whereas $\Gamma_\mathrm{out}$ denotes the surface behind the metamaterial to evaluate $P_\mathrm{out}$.
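For a surface sampled at discrete points, Eqs. (1) and (2) can be evaluated numerically as in the following sketch. The quadrature is a plain weighted sum and the variable names are illustrative.

```python
import numpy as np

def sound_power(p, vn, dGamma):
    """Discrete approximation of Eq. (2):
    P = 1/2 Re{ sum_i p_i * conj(vn_i) * dGamma_i },
    with complex pressure p, normal particle velocity vn, and
    surface-element areas dGamma sampled on the surface.
    """
    return 0.5 * np.real(np.sum(p * np.conj(vn) * dGamma))

def transmission_loss(P_in, P_out):
    """Transmission loss according to Eq. (1)."""
    return 10.0 * np.log10(P_out / P_in)
```

Evaluating `sound_power` on the surfaces in front of and behind the metamaterial and feeding the results to `transmission_loss` yields one point of the transmission spectrum.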

To develop a FEM model for transmission analysis, we use the software program COMSOL Multiphysics®. The FEM model is made up of a series of ten unit cells that are arranged periodically in an air-filled tube (speed of sound c = 343 m/s, density ρ = 1.2 kg/m³). Regarding the mesh size, we have chosen 20 quadratic elements per wavelength with respect to the maximum frequency of 10 kHz in all simulations.45 We assume the thickness of the unit cells to be infinitely large. At the upper and lower boundary of the model, we impose symmetric boundary conditions that can be interpreted as sound hard boundary conditions. Moreover, we apply perfectly matched layers (PMLs) at the left end and at the right end of the simulation model to truncate the region of interest. At the interface between the sonic crystal and the tube, we assume sound hard boundary conditions according to Ref. 27. The model of the metamaterial is excited at the left end of the tube by a prescribed sound pressure of 1 Pa. A schematic representation of the FEM model is depicted in Fig. 3. For demonstration purposes, the sonic crystals are assumed to be cylindrical. In the scope of the present work, viscothermal losses due to boundary layer effects are neglected.

FIG. 3.

The metamaterial is modeled by a series of ten unit cells along the axis of an air-filled transmission tube (light gray). At the lower and upper boundary of the model, symmetric boundary conditions, which can be interpreted as sound hard boundary conditions, are applied (black, dashed). The metamaterial is excited at the left end by a prescribed sound pressure of 1 Pa (black, hatched). At the interface between the cylinder and the surrounding media, we assume sound hard boundary conditions (black, solid). The FEM model is truncated by applying PMLs at both ends of the tube (dark gray).


GANs belong to the group of unsupervised machine learning techniques. They are composed of two neural networks, namely a generator network and a discriminator network. The generator creates an image from a latent space and proposes this image to the discriminator. The discriminator as a classification network learns to decide whether the image sample stems from the dataset or from the generator. If the discriminator recognizes the sample as fake, the proposed sample will be rejected. Consequently, the generator network tries to create images that have a higher resemblance to the training data. The underlying training procedure for the two networks is detailed in the following.20,38

1. Training procedure for conditional GANs

In the training procedure of the conditional GAN, the generator and the discriminator are trained alternately by contesting each other. As a result, both networks improve their predictions until the related counterpart can no longer recognize the proposed data as such. The training procedure for one epoch is explained in the following.

First, a small subset from the dataset, the batch, is generated for the training process. This batch contains random vectors in the latent space and transmission spectra from the training data. Based on the information in the batch, the generator creates the corresponding images. The early image generations are expected to be extremely noisy, as the inputs stem from the randomized latent space. Additionally, the parameters of the generator network, also called weights and biases, are randomly initialized. Second, the same number of images and the associated transmission spectra are chosen from the training data. This selection is then mingled with the predictions of the generator while conserving the labels to designate the origin of the data. Based on this mixed subset, the discriminator trains to distinguish between images predicted from the generator and images from the training dataset. The discriminator's output is considered to be true when it either predicts “fake” for an input from the generator (encoded as 0) or “real” for an input from the dataset (encoded as 1). After this step, the discriminator's parameters are updated and frozen. This is crucial, since the discriminator is already trained for the current epoch and its parameters should not be influenced by the training phase of the generator. In an effort to deceive the discriminator, i.e., the discriminator outputs real for a generated image, the generator's parameters are updated by solving an optimization problem. As proposed by Foster,25 we choose the binary cross-entropy loss function,

$$E_\mathrm{BCE} = -\frac{1}{N}\sum_{k=1}^{N}\left[t_{k}\log(y_{k}) + (1-t_{k})\log(1-y_{k})\right],$$
(3)

where N, tk, and yk denote the size of the output layer, the target probability value, and the predicted probability value, respectively. Figure 4 shows a schematic illustration of the training procedure.
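A minimal NumPy implementation of Eq. (3) is given below; the clipping constant added for numerical stability is our assumption.

```python
import numpy as np

def binary_cross_entropy(y, t, eps=1e-12):
    """Binary cross-entropy, Eq. (3), averaged over the N outputs.

    y: predicted probabilities, t: target probabilities (0 or 1).
    Predictions are clipped away from 0 and 1 to avoid log(0).
    """
    y = np.clip(y, eps, 1.0 - eps)
    return -np.mean(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```

For a correct, confident prediction the loss approaches zero, while a prediction of 0.5 on a real sample yields log 2.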

FIG. 4.

The generator creates an image based on a randomized vector Z and a transmission loss (TL). The created images are mixed with the images and associated transmission spectra in the dataset before they are passed to the discriminator, which tries to detect the generated samples. It has a logical output, “real” for images from the dataset and “fake” for the generated samples.


Once the training of the GAN is completed, the trained generator network is extracted from it. In the sense of an inverse method, the generator expects a target transmission spectrum as its input. Based on the desired transmission characteristics, appropriate cell candidates will be tailored by the generator network.

2. Network architecture of the conditional GAN

In a classical GAN scheme, the generator creates an image sample based on a randomized vector from the latent space. In the training procedure, the parameters of the generator are optimized such that the predicted images are similar to those in the training data. In this form, the GAN is extremely inefficient and not suitable for the upcoming design task. This is because new designs can only be generated on the basis of randomized vectors stemming from the latent space. In this way, the generated cell has no relation to the target transmission loss. Thus, we use a variation of the GAN, the conditional GAN. By this means, we condition the image generation process on transmission spectra in the dataset.

The protagonists of the conditional GAN, the generator and the discriminator, are both realized as CNNs. The generator network consists of four convolutional layers. Prior to the convolutional layers, we implement two input channels. The first one is dedicated to the latent space, and the second one is for the transmission spectra input. The inputs are each processed through separate fully connected dense layers before they are reshaped to the required image format. Then the processed inputs are merged into one layer and processed through the CNN. We use LeakyReLU activation functions in all layers. The generator network is compiled with the Adam optimizer using a learning rate of 0.0004.46 Regarding regularization, we apply batch normalization in each convolutional layer.
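A possible Keras realization of the described generator is sketched below. The latent-space dimension, filter counts, kernel sizes, and intermediate image size are assumptions, since they are not specified in the text; only the two input channels, the separate dense preprocessing, the merge, the four convolutional layers with LeakyReLU and batch normalization, and the Adam optimizer with learning rate 0.0004 follow the description.

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # assumption: latent-space size is not stated in the text
N_FREQ = 81       # transmission loss sampled at 81 frequencies (2-10 kHz)

def build_generator():
    z = layers.Input(shape=(LATENT_DIM,), name="latent")
    tl = layers.Input(shape=(N_FREQ,), name="transmission_loss")
    # Each input passes through its own fully connected layer and is
    # reshaped to a small image before both branches are merged.
    hz = layers.LeakyReLU()(layers.Dense(11 * 11 * 8)(z))
    hz = layers.Reshape((11, 11, 8))(hz)
    ht = layers.LeakyReLU()(layers.Dense(11 * 11 * 8)(tl))
    ht = layers.Reshape((11, 11, 8))(ht)
    h = layers.Concatenate()([hz, ht])
    # Four convolutional layers with batch normalization and LeakyReLU,
    # upsampling 11 -> 22 -> 44 -> 88 pixels.
    for filters in (128, 64, 32):
        h = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.LeakyReLU()(h)
    img = layers.Conv2DTranspose(1, 4, padding="same", activation="tanh")(h)
    model = tf.keras.Model([z, tl], img, name="generator")
    model.compile(optimizer=tf.keras.optimizers.Adam(4e-4))
    return model
```

The tanh output fits the pixel coding of the training images, where zeros are replaced with −1.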

The discriminator network contains four convolutional layers as well. Its image input is accompanied by transmission loss data from the training set that is processed through a fully connected layer and then reshaped to an image format. Both inputs, the image and transmission spectrum, are then concatenated and passed to the first convolutional layer. Further, we apply LeakyReLU activation functions in the layers of the discriminator network. One exception is the final neuron, which is activated by a sigmoid function. Since we demand the discriminator's output to be 1 or 0, a sigmoid activation is appropriate here. The model of the discriminator is compiled with the Adam optimizer and a learning rate of 0.0008. We use dropout regularization in each convolutional layer, randomly dropping 40% of the neurons so that only 60% are updated in each training step. A schematic representation of the generator and discriminator with details on the number of neurons is depicted in Figs. 5(a) and 5(b), respectively. To develop the GAN model, we used TensorFlow and Keras. The code and raw data accompanying this paper are available online.47
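Analogously, the discriminator can be sketched in Keras as follows. The filter counts and kernel sizes are assumptions; the concatenated image/spectrum input, the four convolutional layers with LeakyReLU and dropout, the final sigmoid neuron, and the Adam optimizer with learning rate 0.0008 follow the description.

```python
import tensorflow as tf
from tensorflow.keras import layers

N_FREQ = 81  # transmission loss sampled at 81 frequencies (2-10 kHz)

def build_discriminator():
    img = layers.Input(shape=(88, 88, 1), name="cell_image")
    tl = layers.Input(shape=(N_FREQ,), name="transmission_loss")
    # The spectrum passes through a fully connected layer and is
    # reshaped to an image channel before concatenation with the cell.
    ht = layers.Dense(88 * 88)(tl)
    ht = layers.Reshape((88, 88, 1))(ht)
    h = layers.Concatenate()([img, ht])
    # Four convolutional layers with LeakyReLU and dropout
    # (rate 0.4, i.e., 60% of the neurons are kept per step).
    for filters in (32, 64, 128, 128):
        h = layers.Conv2D(filters, 4, strides=2, padding="same")(h)
        h = layers.LeakyReLU()(h)
        h = layers.Dropout(0.4)(h)
    h = layers.Flatten()(h)
    out = layers.Dense(1, activation="sigmoid")(h)  # real (1) vs fake (0)
    model = tf.keras.Model([img, tl], out, name="discriminator")
    model.compile(optimizer=tf.keras.optimizers.Adam(8e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

During the generator's training phase, the weights of this model would be frozen, as described above.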

FIG. 5.

Random vectors from the latent space z and transmission loss data denote the input of the generator. The inputs are each processed through separate fully connected layers before they are passed to the CNN. The output of the generator is an image candidate (a). The input of the discriminator is made up of an image and a transmission spectrum. The merged input is passed through four convolutional layers. A single neuron is implemented in the output layer for the Boolean outcome (b). The parameters of the convolutional layers are displayed in the format “depth @ height × width.”


To train neural networks, a dataset is required. According to our simulation model, the dataset contains the cell geometries as inputs and the related transmission losses as outputs. For design purposes, Liu et al.38 propose to analyze 6500 different geometry configurations to train a GAN. Since we want to investigate cell shapes that are beyond circular or rectangular cross sections, the generation technique as presented in Sec. II A is applied to create a synthetic dataset with the least possible degree of restriction.

The unit cell is assumed to be a square of 22 mm edge length for the dataset. In total, our synthetic dataset contains 2800 unit cells generated by the algorithm in Sec. II A. It can be subdivided into two categories: scatterers and locally resonant metamaterials, the latter being either rectangular shaped or C-shaped Helmholtz resonators. Figure 6 shows arbitrarily chosen samples from each category. As mentioned before, the training data for the neural networks consist of the cell shapes as the inputs and the related transmission losses as the outputs. Therefore, a FEM simulation as presented in Sec. II B is performed for each cell sample. In the FEM simulation, the transmission loss is evaluated for a series of ten identical unit cells. However, the generated unit cells are present in the form of pixel-based images. Prior to the FEM simulation, the data in the images need to be transferred to a geometry in COMSOL. For this purpose, we developed an interpreter that translates the binary images into COMSOL commands. These commands are then executed in COMSOL to create the corresponding FEM domain. Once the cells are created in the FEM model, sound hard boundary conditions are applied on their boundaries. The resulting simulation model can then be used to analyze the transmission loss for the inspected metamaterial.
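The interpreter step can be sketched as follows: each solid pixel is translated into the coordinates of a square, from which the corresponding COMSOL geometry commands would then be generated. The command emission itself is omitted here, and the pixel size and coordinate convention are assumptions.

```python
import numpy as np

def solid_squares(img, cell_size=0.022):
    """Translate a binary cell image into solid-square coordinates.

    Each pixel with value 1 becomes a square of side `px` (in metres);
    returns (x, y, px) tuples with (x, y) the lower-left corner,
    assuming image row 0 corresponds to the top of the cell.
    """
    n = img.shape[0]
    px = cell_size / n  # edge length of one pixel in metres
    rows, cols = np.nonzero(img == 1)
    return [(c * px, (n - 1 - r) * px, px) for r, c in zip(rows, cols)]
```

In the actual pipeline, each tuple would be emitted as a geometry command and the resulting solid boundaries marked as sound hard.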

FIG. 6.

Cell samples from the synthetic dataset: scatterers (left), C-shaped resonators (center), rectangular shaped resonators (right).


As presented in Sec. I, the transmission losses are evaluated from 2000 to 10 000 Hz with a step size of 100 Hz. As a result, we obtain the evaluation of the transmission loss at 81 frequencies. In the training process, the entire dataset is split into a training set (2380 samples) and a test set (420 samples). In the presented work, the unit cells are generated randomly. Hence, a randomized assignment of the unit cells into subsets is not required. The structure of the dataset for the training procedure is schematically illustrated in Fig. 7.

FIG. 7.

The synthetic dataset with 2800 samples. Binary images with a resolution of 88 × 88 pixels are used as input data (left). The outputs are the related transmission losses from 2 to 10 kHz (right).


The conditional GAN architecture is trained for 2000 epochs by using a batch size of 32 data samples. The training history is depicted in Fig. 8. It displays the loss of the model in Fig. 8(a) and its accuracy in Fig. 8(b). While the loss of the model is evaluated according to Eq. (3), the accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. In the training analysis, the loss and accuracy of the discriminator are separately evaluated on real samples stemming from the dataset and on fake ones created by the generator.

FIG. 8.

Loss plot (a) and accuracy plot (b) for the GAN training. The loss is displayed for the generator (dashed) and for the discriminator on identifying real images (solid) and fake images (dotted). The classification accuracy is depicted for the discriminator on exposing fake images (dotted) and on finding real images (solid).


In the initial training phase, we can see that the losses of the models are somewhat erratic up to epoch 600. After that, they remain stable. The discriminator's losses on real and on fake data decrease toward zero, while the loss of the generator lies between 3.0 and 5.0. Regarding the accuracy of the discriminator, we identify a similar erratic behavior in the early training phase. After epoch 600, the discriminator's accuracy on real and on fake samples stabilizes at around 90% and remains at that level. Note that the generator's accuracy is not displayed. In the case of the generator, metrics such as accuracy are obsolete, since its output is classified by the discriminator and not by itself.

The initial erratic behavior is expected, since the model is prone to the randomized inputs from the latent space that are mingled with randomly chosen transmission spectra from the dataset. After epoch 600, the performance of the proposed model amounts to an accuracy of 90%, while the loss of the generator finds an equilibrium around 4.0. The conditional GAN stabilizes, and plausible image generations are thus to be expected after epoch 600.

As outlined by Goodfellow,24 mode collapse refers to a generator model that yields the same output for different randomized input vectors. Thus, mode collapse is a highly detrimental effect that must be avoided. To diagnose mode collapse, the discriminator's performance on identifying fake image samples provides valuable information. For instance, a clear sign that mode collapse is present would be an accuracy of 100% on exposing fake images throughout the training. This would suggest a generator model creating image samples that are easily declared as fake by the discriminator. By contrast, the accuracy on identifying real samples would be expected to be significantly lower, since the probability for misclassifications on real images is assumed to remain constant. However, considering that our discriminator's performance on identifying fake and real images is balanced around 90% accuracy, we assume that mode collapse is circumvented.

To assess the performance of our method, we choose three transmission spectra whose characteristics are associated with pure scattering cell units. Moreover, we select three transmission spectra related to Helmholtz resonators with C-shaped and another three with rectangular shaped cavities. All desired transmission spectra are taken from the test dataset. To validate the generated samples, we perform a FEM simulation of the generated cell candidate in accordance with the FEM model in Sec. II B. The transmission spectra obtained from the FEM simulations of the generated candidates (solid line) as well as the desired input transmission losses (dashed line) are shown in Fig. 9. The corresponding test input cells are displayed in Fig. 10.

FIG. 9.

The simulated transmission loss of the designed cell candidate (solid) compared with the input transmission loss from the dataset (dashed).

FIG. 10.

Specific cell samples from the test dataset containing scatterers (first row) and C-shaped (second row) and rectangular shaped (third row) Helmholtz resonators.


Note that the generator predictions contain single pixel elements that are not coherent with the generated structure. Thus, prior to the validation step, we remove the loosely hanging pixel elements. A schematic representation of this post-processing step is depicted in Fig. 11.
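One plausible implementation of this post-processing step keeps only the largest 4-connected solid region and thereby discards loosely hanging pixels; whether the original image cleaning works exactly this way is an assumption.

```python
import numpy as np
from collections import deque

def keep_largest_component(img):
    """Keep only the largest 4-connected solid region of a binary image."""
    solid = (img == 1)
    seen = np.zeros_like(solid, dtype=bool)
    best = np.zeros_like(solid, dtype=bool)
    for r0, c0 in zip(*np.nonzero(solid)):
        if seen[r0, c0]:
            continue
        # Breadth-first search over edge-adjacent solid pixels.
        comp, queue = [], deque([(r0, c0)])
        seen[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            comp.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < solid.shape[0] and 0 <= cc < solid.shape[1]
                        and solid[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(comp) > best.sum():
            mask = np.zeros_like(solid, dtype=bool)
            mask[tuple(np.array(comp).T)] = True
            best = mask
    return best.astype(np.int8)
```

The cleaned image can then be passed to the FEM validation without isolated single-pixel solids.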

FIG. 11.

Image cleaning by removing the loosely hanging pixel elements.


Regarding the designs #1, #2, and #3, the simulated transmission loss agrees very well with the desired transmission loss as shown in Fig. 9. It is evident that the scatterer-like generations exhibit sound absorption characteristics in the region of the Bragg frequency, although small deviations become apparent in the transmission behavior of sample #3. In the case of the generated Helmholtz resonators, samples #4–#9, the FEM results highlight the locally resonant behavior below the first Bragg frequency. By inspecting the results of samples #4, #6, and #9, one can observe that the simulated resonance frequency differs slightly from the input transmission data.

The generated cell designs are presented in Fig. 12. In the present study, all cell design types are obtained by one execution of the GAN. The samples #1, #2, and #3 are generated by demanding transmission properties associated with purely scattering metamaterials. The samples #4–#6 and #7–#9 are designed by inputting transmission spectra with pronounced local resonances stemming from C-shaped or rectangular cavities, respectively.

FIG. 12.

Nine selected cell candidates generated by one execution of the GAN.


Distinct design features can be observed in all nine cell candidates. While the proposed designs in the first row show a high resemblance to scattering unit cells, the generated images #4–#9 exhibit distinct cavities that are associated with Helmholtz resonators. Remarkably, the generated samples #4–#6 (second row in Fig. 12) form C-shaped inclusions, like their counterparts from the test dataset (second row in Fig. 10). However, the input transmission spectra associated with the rectangular-shaped Helmholtz resonators #7, #8, and #9 (Fig. 10) yield cell designs with distinct circular-shaped cavities (third row in Fig. 12). This is particularly pronounced in samples #8 and #9; only sample #7 stands out with a rectangular-shaped cavity.

Regarding the scatterer samples, we noticed only small variations in the proposed cell designs. The generated scattering cells are of rather compact and simple form, in contrast to the complex, fine-grained samples in the dataset (see Fig. 6). The Helmholtz resonator samples, however, are distinguished by inclusions of either circular or rectangular shape. Even though all generated cell shapes differ visually from their counterparts in the input test data, their transmission characteristics are in good agreement. This complies with our research idea, since we intend to develop a framework for tailored cell design. By solely feeding in transmission characteristics, the trained generator is able to decide whether the new cells require cavities to represent the desired behavior. In terms of sound absorbing applications, we prioritize geometrical properties, such as the size of the cavity and the aperture, over the form of the cavity. As the transmission spectra in the dataset resemble each other, it is natural that we obtain cell structures that look similar.

In a further analysis, we introduce the relative error to quantify the performance of the proposed network. The relative error metric is expressed by

\varepsilon = \left| \frac{\mathrm{TL}_{\mathrm{gen}} - \mathrm{TL}_{\mathrm{inp}}}{\mathrm{TL}_{\mathrm{inp},\mu}} \right|,
(4)

where TLgen and TLinp denote the simulated transmission loss of the generated cell and the desired input transmission loss, respectively. Since multiple transmission loss values are close to zero, normalizing the error directly by TLinp would lead to artificially large errors. Thus, we choose the mean value of the desired input transmission loss, TLinp,μ, as the reference value. The relative error is visualized in Fig. 13. In the first row, the relative error corresponding to the scatterer samples #1–#3 is depicted across the relevant frequency range from 2 to 10 kHz. The second and third rows show the relative error of the C-shaped (#4–#6) and rectangular (#7–#9) Helmholtz resonators, respectively.
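Evaluated pointwise over the frequency grid, Eq. (4) reduces to a few lines of numpy; the array values below are purely illustrative, not data from the study:

```python
import numpy as np

def relative_error(tl_gen, tl_inp):
    """Pointwise relative error of Eq. (4).

    The reference is the mean of the desired input transmission loss,
    which avoids dividing by near-zero values of TL_inp itself.
    """
    tl_gen = np.asarray(tl_gen, dtype=float)
    tl_inp = np.asarray(tl_inp, dtype=float)
    return np.abs((tl_gen - tl_inp) / tl_inp.mean())

# Toy example on a coarse frequency grid (illustrative values in dB).
tl_inp = np.array([0.1, 5.0, 20.0, 5.0, 0.1])   # desired transmission loss
tl_gen = np.array([0.2, 4.5, 18.0, 5.5, 0.1])   # simulated TL of the candidate
eps = relative_error(tl_gen, tl_inp)            # one error value per frequency
```

Plotting `eps` over frequency on a logarithmic axis reproduces the kind of per-sample error curves shown in Fig. 13.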

FIG. 13.

The relative error comparing the simulated transmission loss of the designed cell candidate with the input transmission loss from the dataset.


In the case of the scatterer samples #1 and #2, the relative error amounts to 10⁻³ below 7 kHz. In these frequencies, the relative error of sample #3 is around 10⁻². In the frequency range from 8 to 10 kHz, the three scatterer samples show similar error values around 10⁻¹. In the frequencies near 7 kHz, the scatterer samples exhibit errors that are on the order of magnitude 10⁰, or even 10¹ in the case of sample #3. Regarding the C-shaped Helmholtz resonator samples #4–#6, the relative error amounts to 10⁻³ in the frequency range from 2 to 4 kHz. These samples show error values around 10⁰–10¹ in the frequency bands ranging from 4 to 5.5 kHz and from 6.5 to 9 kHz. Error values of 10⁻¹ are reported for frequencies near 6 and 9.5 kHz. The rectangular Helmholtz resonator samples #7 and #8 show error values of about 10⁻³ in the frequency range from 2 to 3.5 kHz. In the frequencies ranging from 3.5 to 5 kHz, the relative error is on the order of magnitude 10⁰. Similar errors are observed in the frequencies ranging from 6 to 8 kHz. However, in the frequency ranges 5–6 kHz and 8–10 kHz, the relative error decreases to 10⁻². Regarding sample #9, the relative error is around 10⁻³ in the lower frequencies from 2 to 4 kHz. In the subsequent frequency band from 4 to 7 kHz, relatively large error values around 10⁰ are reported, while the error decreases to 10⁻¹ in the upper frequencies, 7–10 kHz.

The relative error metric emphasizes the validity of the presented network architecture, although the error becomes relatively large in the region of the resonance frequencies, e.g., the Bragg frequency around 7 kHz and the local resonance frequencies ranging from 3 to 6 kHz. This is due to small deviations either in the amplitudes or the frequencies of the associated transmission losses. Apart from this slight discordance, the proposed method reproduces the response of the generated cell with satisfactory quality. However, since the proposed error metric is sensitive to discrepancies in both amplitudes and frequencies, careful attention must be paid when analyzing such findings.

In addition to the relative error, we introduce the cosine similarity metric to express the error in a single value. It is defined by

k(\mathrm{TL}_{\mathrm{gen}}, \mathrm{TL}_{\mathrm{inp}}) = \frac{\langle \mathrm{TL}_{\mathrm{gen}}, \mathrm{TL}_{\mathrm{inp}} \rangle}{\lVert \mathrm{TL}_{\mathrm{gen}} \rVert \, \lVert \mathrm{TL}_{\mathrm{inp}} \rVert},
(5)

where ⟨·,·⟩ denotes the inner product of two one-dimensional arrays and ‖·‖ the L2 norm of a one-dimensional array. Cosine similarity values close to unity signify a positive correlation, whereas values near zero indicate incoherence between the associated transmission losses. The results of the cosine similarity metric for each sample are displayed in Table I. Regarding the scatterer samples, we observe high cosine similarities for all three samples. In particular, samples #1 and #2 exhibit values close to unity. In the case of the C-shaped Helmholtz resonators, #4 and #6 show similarity values around 0.5, whereas a high cosine similarity of about 0.8 is reported for sample #5. The cosine similarity metric applied to the rectangular Helmholtz resonators yields values above 0.8 for samples #7 and #8 and approximately 0.6 for sample #9.
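Equation (5) is the standard cosine similarity between two spectra viewed as vectors; a minimal numpy sketch (function name illustrative) is:

```python
import numpy as np

def cosine_similarity(tl_gen, tl_inp):
    """Cosine similarity of Eq. (5) between two transmission loss spectra."""
    tl_gen = np.asarray(tl_gen, dtype=float)
    tl_inp = np.asarray(tl_inp, dtype=float)
    # Inner product divided by the product of the L2 norms.
    return float(np.dot(tl_gen, tl_inp)
                 / (np.linalg.norm(tl_gen) * np.linalg.norm(tl_inp)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))                     # close to 1 for identical spectra
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0 for orthogonal spectra
```

Note that, applied to non-negative transmission loss arrays, the metric rewards matching spectral shape but is insensitive to a common scale factor, which is why it is paired with the relative error above.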

TABLE I.

The cosine similarity metric evaluated for three samples from each cell design type.

                       Scatterer                C-shaped                 Rectangular
                   #1      #2      #3       #4      #5      #6       #7      #8      #9
Cosine similarity  0.9916  0.9994  0.8071   0.4847  0.8028  0.5324   0.8492  0.8271  0.5648

By relating the cosine similarities to the transmission losses (Fig. 9), it becomes apparent that similarity values close to unity correspond to samples that are in complete agreement with the desired transmission behavior; see #1 and #2. Similarity scores around 0.8 indicate that the reproduced transmission spectra are consistent with the desired characteristics, although small deviations in amplitude or frequency might be present; see samples #3, #5, #7, and #8. As the deviations increase, the similarity score drops to values around 0.5. This applies to samples #4, #6, and #9, where significant frequency shifts occur in the region of the local resonance frequencies.

The results of the cosine similarity metric provide further evidence for the practicability of our proposed method. As reported above, this metric points to the concurrence between the simulated transmission loss of the generated cell and the desired transmission behavior. Moreover, it lends additional support to the previous findings in the error analysis (Fig. 13). While similarity values above 0.95 indicate nearly concurrent transmission characteristics, lower similarity scores suggest further investigations. Therefore, the cosine similarity in conjunction with the relative error metric provides an effective diagnosis tool to detect and inspect discrepancies in the transmission spectra.

As far as we know, this is the first time that conditional GANs have been applied in the design process of acoustic metamaterials. The results have further strengthened our confidence in conditional GANs for designing unit cells for tailored sound insulation purposes. Moreover, our study provides considerable insight into conditional GANs: we elaborate on their principles and show how they can be integrated into design tasks for acoustic metamaterials. We are aware that our study has two limitations. The first is the random generator that deploys cell shapes as image data; it is an artificial, synthetic approach and not subject to physical laws. The second is the synthetic dataset, which may contain many samples that are very similar to each other. As a result, other design types can be under-represented, leading to an imbalanced dataset that causes the network to constrain itself to specific cell designs. These limitations underline the difficulty of collecting data on cell designs applied to acoustic metamaterials.

Our study provides the framework for a new way to generate unit cell designs for sound insulation purposes by using GANs. The proposed conditional GAN was enhanced by a multi-input channel scheme in which the underlying relation between cell geometries and their associated transmission loss is incorporated. To collect data for training purposes, a pixel-based generator algorithm was implemented whose outputs were analyzed with FEM simulations. Despite the limitations due to the synthetic dataset, the results have shown that design principles can be derived from the present study. This project is the first step toward enhancing our understanding of GANs in the field of acoustics. The approach has the potential to be applied to further design tasks in engineering. One promising application of our technique would be designing absorption layers, e.g., in vehicle cabins and fuselages. Another possible application could be the design of acoustic scatterers such as sound barriers. Future work will concentrate on the composition of the training data to enrich the GAN with enhanced geometries and broaden the spectrum of possible designs. In this context, we plan to consider finer frequency step sizes to represent possible resonance behavior. Moreover, we intend to account for viscothermal effects and to conduct experimental investigations.

1. T. Miyashita, "Sonic crystals and sonic wave-guides," Meas. Sci. Technol. 16(5), R47–R63 (2005).
2. Y. Pennec, J. O. Vasseur, B. Djafari-Rouhani, L. Dobrzyński, and P. A. Deymier, "Two-dimensional phononic crystals: Examples and applications," Surf. Sci. Rep. 65(8), 229–291 (2010).
3. M. D. P. Peiró-Torres, J. Redondo, J. M. Bravo Plana-Sala, and J. V. Sánchez Pérez, "Open noise barriers based on sonic crystals. Advances in noise control in transport infrastructures," Transp. Res. Proc. 18, 392–398 (2016).
4. D. P. Elford, L. Chalmers, F. Kusmartsev, and G. Swallowe, "Acoustic band gap formation in metamaterials," Int. J. Mod. Phys. B 24(25n26), 4935–4945 (2010).
5. A. Melnikov, Y. K. Chiang, L. Quan, S. Oberst, A. Alù, S. Marburg, and D. Powell, "Acoustic meta-atom with experimentally verified maximum Willis coupling," Nat. Commun. 10(1), 3148 (2019).
6. C. Claeys, E. Deckers, B. Pluymers, and W. Desmet, "A lightweight vibro-acoustic metamaterial demonstrator: Numerical and experimental investigation," Mech. Syst. Signal Process. 70-71, 853–880 (2016).
7. L. Moheit, S. Anthis, J. Heinz, F. Kronowetter, and S. Marburg, "Analysis of scattering by finite sonic crystals in free field with infinite elements and normal modes," J. Sound Vib. 476, 115291 (2020).
8. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature 521(7553), 436–444 (2015).
9. M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, San Francisco, CA, 2015).
10. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, MA, 2016).
11. J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach (O'Reilly Media, Inc., Sebastopol, CA, 2017).
12. R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (2008), pp. 160–167.
13. C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2014), pp. 55–60.
14. J. Han, D. Zhang, G. Cheng, N. Liu, and D. Xu, "Advanced deep-learning techniques for salient and category-specific object detection: A survey," IEEE Signal Process. Mag. 35(1), 84–100 (2018).
15. Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3212–3232 (2019).
16. G. Giacinto and F. Roli, "Design of effective neural network ensembles for image classification purposes," Image Vision Comput. 19(9–10), 699–707 (2001).
17. Q. Li, W. Cai, X. Wang, Y. Zhou, D. D. Feng, and M. Chen, "Medical image classification with convolutional neural network," in 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore (December 10–12, 2014), pp. 844–848.
18. J. Wang and L. Perez, "The effectiveness of data augmentation in image classification using deep learning," arXiv:1712.04621 (2017).
19. M. J. Bianco, P. Gerstoft, J. Traer, E. Ozanich, M. A. Roch, S. Gannot, and C.-A. Deledalle, "Machine learning in acoustics: Theory and applications," J. Acoust. Soc. Am. 146(5), 3590–3628 (2019).
20. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems (2014), Vol. 2, pp. 2672–2680.
21. L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, "Unrolled generative adversarial networks," arXiv:1611.02163 (2016).
22. X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks," in Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy (October 22–29, 2017), pp. 2794–2802.
23. A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv:1511.06434 (2015).
24. I. Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks," arXiv:1701.00160 (2016).
25. D. Foster, Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play (O'Reilly Media, Sebastopol, CA, 2019).
26. Z. Liu, X. Zhang, Y. Mao, Y. Y. Zhu, Z. Yang, C. T. Chan, and P. Sheng, "Locally resonant sonic materials," Science 289(5485), 1734–1736 (2000).
27. D. P. Elford, L. Chalmers, F. V. Kusmartsev, and G. M. Swallowe, "Matryoshka locally resonant sonic crystal," J. Acoust. Soc. Am. 130(5), 2746–2755 (2011).
28. C. Lagarrigue, J.-P. Groby, V. Tournat, O. Dazel, and O. Umnova, "Absorption of sound by porous layers with embedded periodic arrays of resonant inclusions," J. Acoust. Soc. Am. 134(6), 4670–4680 (2013).
29. V. Romero-García, G. Theocharis, O. Richoux, A. Merkel, V. Tournat, and V. Pagneux, "Perfect and broadband acoustic absorption by critically coupled sub-wavelength resonators," Sci. Rep. 6, 19519 (2016).
30. V. Romero-García, N. Jiménez, V. Pagneux, and J.-P. Groby, "Perfect and broadband acoustic absorption in deep sub-wavelength structures for the reflection and transmission problems," J. Acoust. Soc. Am. 141(5), 3641 (2017).
31. N. Jiménez, V. Romero-García, V. Pagneux, and J.-P. Groby, "Rainbow-trapping absorbers: Broadband, perfect and asymmetric sound absorption by subwavelength panels for transmission problems," Sci. Rep. 7(1), 13595 (2017).
32. A. Melnikov, M. Maeder, N. Friedrich, Y. Pozhanka, A. Wollmann, M. Scheffler, S. Oberst, D. Powell, and S. Marburg, "Acoustic metamaterial capsule for reduction of stage machinery noise," J. Acoust. Soc. Am. 147(3), 1491–1503 (2020).
33. H. Meng, J. Wen, H. Zhao, and X. Wen, "Optimization of locally resonant acoustic metamaterials on underwater sound absorption characteristics," J. Sound Vib. 331(20), 4406–4416 (2012).
34. L. Lu, T. Yamamoto, M. Otomori, T. Yamada, K. Izui, and S. Nishiwaki, "Topology optimization of an acoustic metamaterial with negative bulk modulus using local resonance," Finite Elements Anal. Des. 72, 1–12 (2013).
35. X. W. Yang, J. S. Lee, and Y. Y. Kim, "Effective mass density based topology optimization of locally resonant acoustic metamaterials for bandgap maximization," J. Sound Vib. 383, 89–107 (2016).
36. L. Fahey, F. Amirkulova, and A. Norris, "Broadband acoustic metamaterial design using gradient-based optimization," J. Acoust. Soc. Am. 146(4), 2830 (2019).
37. C. Robeck, J. Cipolla, and A. Kelly, "Convolutional neural network driven design optimization of acoustic metamaterial microstructures," J. Acoust. Soc. Am. 146(4), 2830 (2019).
38. Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai, "Generative model for the inverse design of metasurfaces," Nano Lett. 18(10), 6570–6576 (2018).
39. I. Malkiel, A. Nagler, M. Mrejen, U. Arieli, L. Wolf, and H. Suchowski, "Deep learning for design and retrieval of nano-photonic structures," arXiv:1702.07949 (2017).
40. M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. Wang, C. Lin, and K. Parsons, "Deep neural network inverse design of integrated photonic power splitters," Sci. Rep. 9(1), 1368 (2019).
41. Y. Zhang and W. Ye, "Deep learning-based inverse method for layout design," Struct. Multidiscip. Optim. 60(2), 527–536 (2019).
42. M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv:1411.1784 (2014).
43. W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, "Deep learning for the design of photonic structures," Nat. Photonics 15, 77–90 (2021).
44. S. Marburg, E. Lösche, H. Peters, and N. Kessissoglou, "Surface contributions to radiated sound power," J. Acoust. Soc. Am. 133(6), 3700–3705 (2013).
45. P. Langer, M. Maeder, C. Guist, M. Krause, and S. Marburg, "More than six elements per wavelength: The practical use of structural finite element models and their accuracy in comparison with experimental results," J. Comput. Acoust. 25(4), 1750025 (2017).
46. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 (2014).
47. The code and raw data accompanying this paper are available at https://github.com/cauez/metaGAN (Last viewed: 1/22/2021).