We are interested in exploring the limits of using deep learning (DL) to study the electromagnetic (EM) response of complex and random metasurfaces, without any specific application in mind. For simplicity, we focus on a pure reflection problem of a broadband EM plane wave incident normally on such complex metasurfaces in the frequency regime of 2–12 GHz. In doing so, we create a DL-based framework called the metasurface design deep convolutional neural network (MSDCNN) for both forward and inverse design of three classes of complex metasurfaces: (a) arbitrary connecting polygons, (b) basic pattern combinations, and (c) fully random binary patterns. The performance on each metasurface class is evaluated and cross-benchmarked. Depending on the type of complex metasurface, sample size, and DL algorithm used, the MSDCNN can provide good agreement with full-wave simulation and can be a faster design tool for complex metasurfaces than traditional full-wave EM simulation methods. However, detailed statistical analysis (such as mean, variance, kurtosis, and mean-squared error) shows that no single universal deep convolutional neural network model works well for all metasurface classes. Our findings report important information on the advantages and limitations of current DL models in designing these ultimately complex metasurfaces.
I. INTRODUCTION
Metasurfaces are two-dimensional (2D) artificial structures that are designed and fabricated to manipulate the propagation of electromagnetic (EM) waves in order to achieve responses unavailable in conventional materials.1–6 They have attracted enormous research attention due to their extraordinary ability to control many electromagnetic properties, such as amplitude,7,8 phase,9–11 and polarization,12 and many types of metasurfaces have been proposed, offering a large variety of specialized metasurfaces for different applications, such as programmable metasurfaces,13–15 transforming heat,16 cloaking,17,18 holograms,19 conversion,20 absorption,21,22 scattering reduction,23 polarization,24 transmission,25 and others.
There are two general approaches to designing metasurfaces. The first approach is the forward design, which is an iterative process involving parametric studies to explore a given set of input parameters in order to produce the desired EM response or output. A simulation tool (or forward numerical solver) solves the underlying governing equations to reliably map input parameters to calculated outputs. The cost is determined largely by the simulation time of each trial. If the number of input parameters is huge, or to limit the computational cost, a designer often has to give up exhaustive exploration of the design space and settle for trade-offs on the desired output.
The second approach is the inverse design, which is to find an optimal set of input parameters for a given output. This is more difficult than the forward design, as there is no definite or unique solution to such a problem. Thus, inverse design is typically formulated as a search for the best-matching input conditions within a prescribed domain via an optimization algorithm. Almost all inverse design problems are challenging and require advanced algorithms, such as ant colony optimization,26 the genetic algorithm,27 the particle swarm algorithm,28 and topological optimization.29–32
Machine learning (ML) techniques such as deep learning (DL)33 have been successful in various fields involving complexity, such as computer vision, natural language processing, and speech signal processing. Their applications in some traditional scientific disciplines have also grown significantly in recent years, including condensed matter physics,34 particle physics,35 chemistry,36 text mining for material discovery,37 discovering physical concepts,38 and many other physics-based problems.39–42 DL-based approaches for the design of metasurfaces are also gaining a lot of attention,5,6 where various types of metasurfaces (with some prescribed regular patterns) have been successfully designed.43–60 Most of them use techniques such as the fully connected network (FCN), the convolutional neural network (CNN), and the transposed convolutional neural network (t-CNN). The FCN is composed of a series of linear dense layers and is the most basic neural network, although its input and output are limited to one-dimensional (1D) vectors. By choosing a proper activation function, such as the sigmoid or tanh function, the FCN can achieve outstanding performance on classification problems. In contrast, the CNN accepts higher-dimensional inputs such as 2D images or 3D vectors. With the advantages of the convolution operation, it extracts the spatial relationships of the input signal and is expected to learn long-range interactions by stacking layers sequentially. The residual deep convolutional neural network (Resnet DCNN)61 is well known to be a more robust alternative to the fully convolutional neural network (FCNN), as the residual function provides a smooth and stable gradient. The t-CNN performs the inverse of the normal convolution operation and is typically introduced in DL-based inverse design approaches such as the generative adversarial network (GAN).62
In the following sections, we first provide a short overview of the DL-based framework used for designing metasurfaces. The complexity of various types of metasurfaces with different degrees of freedom is shown in Fig. 1. In this paper, we consider three complex metasurfaces: (a) arbitrary connecting polygons (PLGs), (b) basic pattern combinations (PTN), and (c) fully random binary patterns (RDNs). The goal is to explore DL-based design of any given complex metasurface for a broadband EM response. In our experiments, the EM response is the reflection of a broadband EM wave (from 2 to 12 GHz) on the given dataset of complex metasurfaces (PLG, PTN, and RDN), where the reflection as a function of frequency is predicted by different DL models based on our training procedures for both forward and inverse design. Successful results and limitations are evaluated and discussed. By considering the subordinate relation among these three metasurfaces, we also use cross-benchmarking to evaluate the ability of the FCNN model when different metasurfaces are used in training and testing. Finally, we conclude that the metasurface design deep convolutional neural network (MSDCNN) provides good performance for each of the three complex metasurfaces studied in this paper; however, no single universal DCNN model performs well for all of them simultaneously. Other DL models, such as the graph neural network or the complex-valued neural network, are likely promising candidates for further improvements.
II. METASURFACE DESIGN USING DEEP LEARNING
Many successful applications of DL models have been reported in the literature for the design of different metasurfaces.43–59,63,64 In metasurface design, the design parameters and the desired EM responses are of key concern. The design parameters are properties associated with the metasurfaces, such as the working frequency and the geometrical parameters that describe the physical structure. These parameters are normally limited to a continuous range of real values, so the problem can be abstracted as a mapping between R^n and R^k, where n is the number of controllable/designable variables and k is the dimension of the desired response/design target. In this sense, the mapping from R^n to R^k is the forward prediction of metasurface design, while the mapping from R^k to R^n is the inverse design.
Depending on the complexity of the controllable variables, we can categorize them into two types. The type-1 design task is of relatively low complexity featuring a template with well prescribed shapes that can be described by a handful of geometrical parameters such as thickness, spacing, and width.
For the k = 1 case, it is the simplest single-regression task. In some cases, the design target is a frequency-dependent response (k > 1), and the controllable parameter dimensions are limited to small n. As an example, a prior work43 focused on the electromagnetic scattering of alternating dielectric thin films with a combination of different thicknesses and materials (also known as the layered model). For such type-1 tasks, the fully connected network (FCN) model is sufficient to predict the corresponding EM response, which agrees with physics-based simulated results (obtained from a numerical solver). However, the FCN is not suitable for the inverse design task: its training becomes unstable and slow due to the inconsistency of the dataset, where one response may map to many different structures. An encoder–decoder network, known as tandem,43 has been introduced to solve this problem and has quickly become a popular framework in metasurface design. Other improvements include enhancing the FCN for more robust performance by using deeper networks.44,49–51
The type-2 design task allows a higher degree of complexity, typically featuring a 2D metasurface, and it can be viewed as a mapping R^n × R^m → R^k. This task can be converted to the type-1 task (R^{n·m} → R^k) if n and m are small.52 However, when the size (n and m) is very large, a deep FCN becomes too expensive and unstable to compute. Instead of the FCN, the convolutional neural network (CNN) provides a distinct advantage for such multi-dimensional inputs. The convolutional operation in the CNN is ideal for capturing the spatial relationship of the inputs. By using deeply stacked convolutional layers, both local and long-range effects are expected to be captured by the CNN. For example, for k = 1 (prediction of the quality factor of a cavity),53 a CNN with only four layers is capable of producing outstanding predictions. For k = 2, the generative adversarial network (GAN) has been successfully applied to design a meta-grating component.54 Furthermore, the GAN has also been incorporated into a physics-driven and data-free neural network.55 As mentioned above, for the type-1 problem, the FCN is effective if the input (R^m) and the target output (R^k) have comparably small dimensions. However, it is not suitable for the type-2 problem, where the inverse design becomes troublesome due to the high dimensionality. It is reported that the GAN can efficiently discover correct metasurfaces using user-defined and on-demand spectra as input parameters.58 A bidirectional neural network has been successful in designing three-dimensional chiral metamaterials.59
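As a minimal illustration of the type-2 to type-1 conversion, a 2D pattern can simply be flattened into a vector that an FCN can consume (the 16 × 16 size below is illustrative):

```python
import numpy as np

# Illustrative type-2 input: an n x m binary metasurface pattern (n = m = 16).
rng = np.random.default_rng(7)
pattern_2d = rng.integers(0, 2, size=(16, 16))

# Type-2 -> type-1 conversion: flatten the n x m grid into a length n*m vector,
# which an FCN can consume directly. This is feasible only while n*m stays small.
pattern_1d = pattern_2d.reshape(-1)

print("FCN input dimension:", pattern_1d.shape[0])
```

For large n and m, the flattened vector forces very wide dense layers, which is exactly the regime where the CNN becomes preferable.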
For a broadband response of the metasurface, we may have a large k over a large frequency range. Depending on the resolution requirement and frequency range, the value of k can be 1000 or much more. For this problem, mathematical transformations like the Fourier transform, wavelet transform, or simple down-sampling can help extract the most significant information from the EM response.65 Other methods, such as the contrast-vector56 designed to emphasize important features like the peak locations of the EM response, can also improve the efficiency of the inverse design. With these methods, the dimension can be compressed from k = 1000 to k < 100 with good accuracy.
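To make the down-sampling idea concrete, here is a minimal numpy sketch; the dense spectrum is synthetic, standing in for a solver-computed coPR curve:

```python
import numpy as np

def downsample_uniform(spectrum, k_out):
    """Keep k_out evenly spaced samples of a dense spectrum."""
    idx = np.linspace(0, len(spectrum) - 1, k_out).round().astype(int)
    return spectrum[idx], idx

# Hypothetical dense coPR spectrum with k = 1000 points over 2-12 GHz.
freq = np.linspace(2.0, 12.0, 1000)
copr = 0.5 + 0.5 * np.cos(2 * np.pi * freq / 3.0)  # stand-in for a simulated result

coarse, idx = downsample_uniform(copr, 32)

# Reconstruct by linear interpolation and quantify the information loss.
recon = np.interp(freq, freq[idx], coarse)
mse = np.mean((recon - copr) ** 2)
print(f"compressed k: 1000 -> 32, reconstruction MSE = {mse:.2e}")
```

For smooth spectra, 32 uniform samples already reconstruct the dense curve with small error, which is consistent with the choice of 32 sampling points made later in the paper.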
The above discussion suggests that different DL models are required depending on the specific type of metasurface and its required EM response. For complex metasurfaces with irregular patterns (see Fig. 1), it is unclear whether DL models will work, especially for a broadband EM response, where the studied domain space is big with large values of n, m, and k. Thus, the focus of this paper is to evaluate the performance of DL models for the broadband response of such complex metasurfaces.
The remainder of the paper is organized as follows: In Sec. III, we introduce the training procedures to obtain well-performing forward prediction and inverse design models based on the convolutional neural network (CNN). We report the good prediction of our model for both fast forward and inverse design of PLG-based complex metasurfaces. In Sec. IV, we introduce the other two complex metasurfaces: PTN and RDNs. We extend the original DL model developed for PLGs to PTN and RDNs. Improvements from deeper DL models and larger datasets are reported. The weak generalization of one unique DL model to all of PLGs, PTN, and RDNs is discussed with detailed statistical measures and cross-benchmarking. Finally, Sec. V concludes with a summary and proposes some future prospects for using DL models to deal with complex metasurfaces. We argue that the complex metasurfaces studied here may serve as a good platform to test the capability or limits of ML in analyzing a complex design problem, even though the EM response of a complex metasurface is well governed by the Maxwell equations.
III. METASURFACE DESIGN DEEP CONVOLUTIONAL NEURAL NETWORK (MSDCNN)
We propose a metasurface design deep convolutional neural network (MSDCNN) framework for both forward and inverse design of complex metasurfaces. In particular, the co-polarized reflectance (coPR) of a purely reflective metasurface over a frequency range of 2–12 GHz is chosen for the purpose of demonstration. A high-quality dataset is important for training the MSDCNN; thus, we rely on automated full-wave simulations with the frequency-domain solver (F-solver) in CST Studio Suite to accurately map various complex metasurfaces to their calculated coPR values.
Each metasurface created in our experiment is represented by a unique pattern encoded by a 16 × 16 matrix made up of 0s and 1s. The binary setting of 1 or 0 corresponds, respectively, to the presence or absence of a square copper patch (0.5 × 0.5 × 0.018 mm³) overlaid on top of a dielectric substrate [ϵr = 2.65 × (1 + 0.003i) and μr = 1], which is backed by a 0.18-mm-thick copper plate. This, together with a padding of 1 mm on the sides, forms the unit cell used in the CST simulation. Simulations were performed with the unit cell boundary condition in the x and y directions and the open boundary condition in the z direction. An x-polarized plane wave is incident normally from the top of the metasurface, as illustrated in Fig. 2, and the reflection is measured as a function of frequency. We generate 30 000 samples of arbitrary connecting polygon patterns (PLGs), with six examples shown in Fig. 3(a). The data are randomly split into a training set of 27 000 samples and a test set of 3000 samples. The calculated coPR for each PLG has 32 points evenly distributed over the frequency range from 9.5 to 12 GHz, as shown in Fig. 3(b). In our experiments, we have found that this uniform sampling method is better than the fast Fourier transform (FFT) and wavelet transform. Having tested several different numbers of sampling points, we determined that 32 sampling points produce a satisfactory outcome with minimal information loss while keeping the network compact and efficient: increasing the number of sampling points slightly increased the network parameters and convergence time without significant improvement.
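A minimal numpy sketch of this unit-cell encoding follows; the fill-fraction bookkeeping is illustrative and not part of the CST setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# A unit cell is a 16 x 16 binary matrix: 1 = 0.5 mm x 0.5 mm copper patch present,
# 0 = bare substrate at that grid position.
pattern = rng.integers(0, 2, size=(16, 16))

PATCH = 0.5      # patch edge length in mm
PADDING = 1.0    # padding on each side in mm (our reading of the 1 mm padding)

cell_edge = 16 * PATCH + 2 * PADDING       # lateral size of the simulated unit cell
copper_area = pattern.sum() * PATCH ** 2   # metallized area in mm^2

print(f"unit cell: {cell_edge} x {cell_edge} mm, "
      f"copper fill = {copper_area / (16 * PATCH) ** 2:.1%} of the patterned region")
```

Each such matrix is exactly the 2D image the network branches below consume.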
Our MSDCNN framework in Fig. 4 is composed of three branches: the (a) forward prediction branch (evaluating an image to predict coPR), (b) inverse generator branch (generating an image from a given coPR), and (c) judge branch (measuring the agreement of the generated pattern). Resnet18S, adapted from the popular Resnet18 model,61 is adopted in the evaluation branch; it consists of eight Resnet blocks with the LeakyReLU activation function. This evaluation branch predicts the corresponding coPR associated with an input 2D image (such as a PLG pattern). The generation branch has five transposed convolution blocks followed by a tanh activation function; it suggests a 2D image associated with the input EM response, such as coPR. The judge branch is made up of a series of regular convolution blocks and compares the generated image from the generation branch to the input image in terms of the confidence level of the matching. A successfully trained forward prediction branch is regarded as a replacement for the numerical solver in the forward design, and the inverse generator branch is used to perform the inverse design.
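The residual-block idea behind Resnet18S can be sketched in a single-channel numpy toy; the actual branch uses eight multi-channel Resnet blocks, and the weights here are random, so this only illustrates the data flow:

```python
import numpy as np

def conv2d(x, w):
    """'Same'-padded 3x3 convolution of a single-channel image (stride 1)."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def resnet_block(x, w1, w2):
    """Two 3x3 convolutions with a skip connection, as in a Resnet block."""
    y = leaky_relu(conv2d(x, w1))
    y = conv2d(y, w2)
    return leaky_relu(y + x)   # the residual shortcut stabilizes the gradient

rng = np.random.default_rng(1)
pattern = rng.integers(0, 2, size=(16, 16)).astype(float)  # a binary metasurface image
w1, w2 = rng.normal(0, 0.1, (3, 3)), rng.normal(0, 0.1, (3, 3))

features = resnet_block(pattern, w1, w2)
print("feature map shape:", features.shape)
```

Stacking several such blocks and ending with a dense head of 32 outputs gives the forward prediction branch its image-to-spectrum shape.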
In our MSDCNN, we first train a highly accurate evaluation model to predict coPR, replacing the time-consuming CST simulation. Second, we build a high-quality image generator under the GAN framework to convert any random sequence into a 2D polygon-like pattern. Combining the evaluation and generation branches establishes a differentiable mapping between 2D images and the input coPR, which serves as a useful metric for quantifying the matching between the two. By minimizing (or optimizing) this metric in the training procedure, the model becomes a pattern–coPR converter. This training procedure encourages the model to search for the best candidate pattern for the evaluation branch. Thus, the accuracy of the evaluation branch is critical to the performance of the MSDCNN, and it largely depends on the quality and quantity of the dataset used in training. More details on the architecture of the MSDCNN and the training procedure can be found in the supplementary material.
To measure the accuracy of the MSDCNN on the PLG patterns, we use the mean-square error (MSE). Our results suggest that the MSDCNN can obtain high accuracy for both the forward prediction and the inverse design tasks from the evaluation and generation branches, respectively. For the forward prediction task, the MSE is defined as s = (1/N) Σ_i [Cr(f_i) − Cc(f_i)]², where N = 32 is the number of sampling points, Cr is the true coPR (obtained from CST), and Cc is the predicted coPR from the evaluation branch of the MSDCNN. Figure 5 shows the histogram of the accuracy of the 3000 test samples as a function of the MSE s. It is clear that most of the MSE values are less than 10^-4, which confirms the superior performance of the evaluation branch (forward design) of the MSDCNN for PLG patterns.
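This error metric is straightforward to sketch on toy 32-point spectra (the spectra below are synthetic, not CST data):

```python
import numpy as np

def forward_mse(c_true, c_pred):
    """Mean-square error s between the true coPR and the evaluation-branch prediction."""
    c_true, c_pred = np.asarray(c_true), np.asarray(c_pred)
    return np.mean((c_true - c_pred) ** 2)

# Toy 32-point spectra standing in for one test sample.
rng = np.random.default_rng(2)
c_r = rng.uniform(0.2, 1.0, 32)          # "true" coPR, as CST would provide
c_c = c_r + rng.normal(0, 0.01, 32)      # prediction with ~1% noise

s = forward_mse(c_r, c_c)
print(f"s = {s:.2e}")
```

Averaging s over all test samples gives the histogram statistic reported in Fig. 5.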
In Fig. 6, eight randomly selected test cases are plotted to demonstrate the excellent agreement between the predicted and the CST-calculated coPR. More importantly, the model correctly captures both the locations and magnitudes of the peaks and variations in the spectrum from 9.5 to 12 GHz. Here, coPR = 1 at lower frequencies from 2 to 9.5 GHz (not shown). The average error of the evaluation branch is s = 3.7 × 10^-4, confirming that the evaluation branch has been successfully trained with high accuracy and can be used as a fast computational tool to replace the traditional EM simulator for the design of complex PLG metasurfaces.
For the inverse design, the generation branch suggests a corresponding 2D image of a PLG-like metasurface for a given coPR spectrum. Three metrics are used to determine the accuracy: Cr is the input (or real) coPR, Cg is the predicted coPR from the evaluation branch, and Cp is the actual coPR computed by CST based on the 2D image created by the generation branch. An ideal inverse design demands a low error between Cr and Cp, determined by e = (1/N) Σ_i [Cr(f_i) − Cp(f_i)]². However, Cp is calculated from the time-consuming full-wave simulation, and avoiding such lengthy simulations largely reduces the training time of our model. Thus, we choose to optimize the alternative error between Cr and Cg, defined as d = (1/N) Σ_i [Cr(f_i) − Cg(f_i)]². Finally, the error between Cp and Cg is b = (1/N) Σ_i [Cp(f_i) − Cg(f_i)]². In Fig. 7, we show the comparison of six cases with their respective values of e, d, and b. The results show that Cr, Cp, and Cg agree well, with errors only in the range of 10^-3 to 10^-4. This indicates that the generation branch has been successfully trained and can provide promising inverse designs satisfying the input coPR spectrum. It is observed that the designs produced by the generator are not the same as the reference patterns (upper images). This is because the patterns and the EM responses do not have a one-to-one mapping; in this case, the network provides a design with the closest possible EM response in the context of the training dataset. Furthermore, we have used the well-trained evaluation branch as a part of the GAN rather than directly employing a full-wave simulator to compute the mapping from the EM response to the pattern. These techniques help expand the expressive capability of the model and overcome data inconsistency.50
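The three metrics can be sketched as follows; the toy spectra and noise levels are illustrative assumptions, and in the real pipeline, Cp requires a CST run:

```python
import numpy as np

def mse(a, b):
    return np.mean((np.asarray(a) - np.asarray(b)) ** 2)

rng = np.random.default_rng(3)
c_r = rng.uniform(0.2, 1.0, 32)         # input (target) coPR
c_g = c_r + rng.normal(0, 0.01, 32)     # evaluation-branch estimate for the generated pattern
c_p = c_r + rng.normal(0, 0.02, 32)     # CST result for the generated pattern (simulated here)

e = mse(c_r, c_p)   # true inverse-design error (needs a full-wave simulation)
d = mse(c_r, c_g)   # surrogate error actually optimized during training
b = mse(c_p, c_g)   # discrepancy between the evaluator and the simulator

print(f"e = {e:.1e}, d = {d:.1e}, b = {b:.1e}")
```

A small d with a large b would signal that the surrogate evaluator disagrees with the simulator, which is exactly the failure mode analyzed for RDN in Sec. IV.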
The findings shown in Figs. 6 and 7 demonstrate the feasibility of using the MSDCNN for complex metasurface designs such as PLG patterns. First, the evaluation branch (forward design) provides an accurate prediction of a broadband EM response from 2 to 12 GHz. Second, the generation branch suggests corresponding PLG-based metasurfaces to satisfy the input broadband EM response. In Sec. IV, we extend this capability to other types of complex metasurfaces in order to assess the broader performance of the MSDCNN and to understand its limitations.
IV. EXTENSION TO OTHER METASURFACES
For well-known image datasets in computer science, such as MNIST (a handwritten digit database), CIFAR, and ImageNet, DL algorithms can achieve state-of-the-art performance on many tasks such as recognition, segmentation, and tracking. Those datasets generally contain prior knowledge or characteristics based on human cognition, and the learning target is patterns attributed with color, shape, and position. Such a pattern is typically polygon-like or, more precisely, a connected manifold. Unlike these images for human recognition, complex metasurface learning is governed by physics, namely, the Maxwell equations. Few studies have been devoted to the effectiveness of using DL to predict the underlying physics-based outputs concealed in such complex metasurfaces. Our initial success on the aforementioned PLG dataset (in Sec. III) motivates us to expand the capability to other complex patterns with a similarly high degree of freedom, in order to study the effectiveness of the proposed MSDCNN framework.
A. Three datasets: PLGs, PTN, and RDNs
The two additional types of datasets are the pattern combination (PTN) and fully random (RDN) datasets. Including PLGs, we have three types of complex metasurfaces (see Fig. 8): (a) arbitrary connecting polygons (PLGs), (b) basic pattern combinations (PTN), and (c) fully random binary patterns (RDNs). Note that the patterns in all three metasurfaces are encoded into a binary matrix of size 16 × 16. The PLG pattern requires connectivity, which is common in manufacturing design.44,56–58 The PTN pattern is a combination of basic shapes, such as squares (9 pixels), crosses (5 pixels), triangles (4 pixels) in four orientations, a U-shape (5 pixels), and an H-shape (7 pixels). The RDN pattern is fully random with no constraint. The coPR of each created pattern is calculated by CST as a function of frequency from 2 to 12 GHz. The statistics of the calculated coPR (mean and variance) for each dataset are shown in Fig. 8(d). At low frequencies, coPR is 1 (perfect reflection). For the PLG shapes, we have perfect reflection at frequencies lower than 9.5 GHz; thus, only the limited range from 9.5 to 12 GHz is shown in Figs. 6 and 7.
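A sketch of how RDN and PTN-style patterns might be generated; the stamping procedure and the reduced shape set below are illustrative assumptions, not the authors' exact generator:

```python
import numpy as np

rng = np.random.default_rng(4)

def random_pattern():
    """RDN: fully random 16 x 16 binary pattern, no constraint."""
    return rng.integers(0, 2, size=(16, 16))

# Two of the basic shapes named in the text (pixel counts match the description).
CROSS = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])           # 5 pixels
SQUARE = np.ones((3, 3), dtype=int)     # 9 pixels

def pattern_combination(shapes, n_stamps=8):
    """PTN: stamp randomly chosen basic shapes onto an empty 16 x 16 grid."""
    grid = np.zeros((16, 16), dtype=int)
    for _ in range(n_stamps):
        s = shapes[rng.integers(len(shapes))]
        i, j = rng.integers(0, 16 - s.shape[0], 2)  # top-left corner of the stamp
        grid[i:i + s.shape[0], j:j + s.shape[1]] |= s
    return grid

ptn = pattern_combination([CROSS, SQUARE])
rdn = random_pattern()
print("PTN filled pixels:", ptn.sum(), "| RDN filled pixels:", rdn.sum())
```

By construction, RDN spans the full {0, 1}^(16×16) space, while PTN and PLG occupy constrained subsets of it, which is the subordinate relation exploited in the cross-benchmarking below.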
Among the three datasets, RDN has the largest domain, or the highest degree of freedom in complexity. It is therefore interesting to know whether a well-trained DL model based on the RDN dataset could function well on different patterns, such as the PTN and PLG datasets, and desirable to extend the MSDCNN framework (previously trained on the PLG dataset) to the PTN and RDN datasets. Good performance can be obtained for all three datasets by modifying the DL models used or by enlarging the sample size (see below). However, generalizing one unique DL model to all datasets is challenging.
Similar to the PLG dataset, the sizes of the training and test sets for the RDN and PTN datasets are 27 000 and 3000, respectively, unless mentioned otherwise. In some experiments, the number of RDN samples is increased to 108 000 for better performance. In addition to the Resnet18S model, we also apply other DL models such as Resnet34S and ResNa. In our comparison, the performance of the generation branch heavily depends on the accuracy of the evaluation branch. Therefore, the discussion below centers on the performance of the evaluation branch (forward design) in predicting the coPR for a given arbitrary metasurface. The performance is evaluated not only by the MSE but also by other statistical measurements, such as the mean, variance, and kurtosis.
B. Improvement and limitation
The DCNN is well known for its strong generalization in computer vision tasks; for example, Resnet18 performs well on most of them. Since our MSDCNN model is a variant of the DCNN, and given the high similarity between our complex metasurfaces and digital images (used in computer vision), we speculate that the success of the MSDCNN on the PLG dataset may carry over to the PTN dataset and even to the RDN dataset.
While finding one unique and universal DL model for all arbitrary complex metasurfaces is hard, we will show that very good improvements can be obtained if we only focus separately on one type of complex metasurface. This implies that the DL model is a useful approach for the design of complex metasurfaces if a particular type can be specified separately to create a suitable DL model. Figure 9 shows the predicted values (blue) by using Resnet18S trained on PLG, PTN, and RDN datasets (27 000 samples each), which are labeled PLG27000, PTN27000, and RDN27000, respectively. The real calculated values from CST are also plotted (red) for comparison. Notice that the PTN results show poorer agreement than PLG even when the same number of samples is used, which suggests that Resnet18S (good for PLG) is not necessarily good for the PTN dataset.
Deeper or better DL models, or a larger dataset, can probably enhance the performance. Figure 10 demonstrates these efforts with significant improvements. The details of the adapted deeper DL models can be found in the supplementary material. The Resnet34S model is a deeper version of Resnet18S, with more parameters and greater capacity. As expected, Resnet34S performs better on the PTN27000 dataset than Resnet18S, where the agreement in Fig. 10(a) for PTNRes34 is better than that in Fig. 9(b). The ResNa model combines the CNN with the long short-term memory (LSTM) network, and it has fewer layers and fewer parameters than Resnet18S. ResNa is expected to handle more complex patterns such as the RDN dataset. Thus, the results of RDN27000 in Fig. 9(c) are improved by using the ResNa model, as shown in Fig. 10(b). By expanding the sample size of the RDN dataset (based on Resnet18S) from 27 000 to 108 000 (RDN27000 to RDN108000), we also improve the performance, as shown in Fig. 10(c) in comparison to Fig. 9(c). The improvements reported in Fig. 10 confirm that good performance can be achieved by using better DL models and/or larger training sets for all three complex metasurfaces over the frequency range from 2 to 12 GHz. For completeness, we show eight random patterns selected from the RDN108000 dataset in Fig. 11 to illustrate their small mean-square error s. Note that this good performance is due to the same type of complex metasurface being used in both training and testing.
Upon further analysis using a higher-order statistical measure, the kurtosis, we notice that the previous analysis based on the mean and variance does not fully reflect the quality of the matching. Note that kurtosis is a statistical measure of how heavily the tails of a distribution differ from the tails of a normal distribution, which helps determine whether the distribution contains extreme values. The calculated mean and kurtosis are shown in Fig. 12 for three cases: (a) the well-trained PLG27000, (b) the under-trained RDN27000, and (c) the large training set RDN108000. The well-trained PLG27000 model not only matches the variance well in Fig. 9(a) but also has a matching kurtosis, as shown in Fig. 12(a). Although the RDN108000 case shows good improvement in variance in Fig. 10(c) from the larger sample size, the disagreement in kurtosis remains significant, as shown in Fig. 12(c). Such disagreement in kurtosis leads to high bias error in the inverse design. For example, Fig. 13 shows the performance of the generation branch (inverse design) when the model is trained on the RDN dataset of 108 000 samples. The estimated coPR Cg (green) from the evaluation branch, using the image predicted by the generation branch, agrees well with the desired input coPR Cr (red) but not necessarily with the CST-calculated coPR Cp (blue) of the predicted image. This finding reveals that the MSDCNN-Resnet18S model has a high bias error in the generation branch (inverse design) despite its superior performance in the evaluation branch (forward prediction).
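Excess kurtosis can be computed directly from its definition; a quick numpy sketch on synthetic distributions (the distributions are illustrative, not coPR data):

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis: heaviness of the tails relative to a normal distribution."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return np.mean(((x - m) / s) ** 4) - 3.0   # 0 for a perfect normal distribution

rng = np.random.default_rng(5)
normal_like = rng.normal(0.6, 0.1, 100_000)          # thin, Gaussian-like tails
heavy_tailed = rng.standard_t(df=5, size=100_000)    # occasional extreme values

print(f"normal-like: {kurtosis(normal_like):+.2f}, "
      f"heavy-tailed: {kurtosis(heavy_tailed):+.2f}")
```

Two spectra ensembles can thus share mean and variance yet differ sharply in kurtosis, which is why matching only the first two moments can hide the extreme-value mismatch discussed above.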
C. Cross-benchmarking between the models
Consider two datasets A and B, where the domain of B is a subset of the domain of A. A central question in this section is whether a model well trained on A will show considerably good performance on B. For example, we are interested in knowing whether a model trained on the RDN dataset is good for testing the PLG or PTN datasets. Another question is whether a CNN-based model performs better than other models such as the random forest regressor (RFR), an ensemble learning method that constructs many decision trees during training and outputs the average prediction of all trees. Note that the RFR is a general-purpose yet powerful model that requires few resources to train and few parameters to tune in comparison to CNN-based models. This section answers these questions by cross-benchmarking different training datasets and models.
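A minimal RFR baseline in scikit-learn, matching the pattern-to-spectrum setup; the dataset here is a synthetic stand-in for the CST data, so the resulting MSE is not comparable to Table I:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)

# Toy stand-in dataset: flattened 16 x 16 binary patterns -> 32-point "spectra".
X = rng.integers(0, 2, size=(500, 256)).astype(float)
w = rng.normal(0, 1, (256, 32))
y = 1.0 / (1.0 + np.exp(-X @ w / 16.0))   # smooth synthetic targets in (0, 1)

# One regressor jointly predicts all 32 output points (multi-output regression).
rfr = RandomForestRegressor(n_estimators=50, random_state=0)
rfr.fit(X[:400], y[:400])

mse = np.mean((rfr.predict(X[400:]) - y[400:]) ** 2)
print(f"RFR test MSE on toy data: {mse:.3f}")
```

The same fit/predict interface, applied to the real (pattern, coPR) pairs, gives the RFR rows in Table I.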
Table I shows the MSE performance of different models, which are trained independently on the PLG, PTN, and RDN datasets and evaluated on all three datasets. For example, the row PLG27000 shows the results of using the PLG dataset (27 000 samples) to train MSDCNN-Resnet18S, with the model tested on all three datasets (PLGs, PTN, and RDNs). The results marked in red show the best performance (smallest MSE), and the best performance always occurs when the same type of dataset is used for both training and testing. This finding suggests that models trained on a dataset with a larger domain do not necessarily perform better on other datasets, even ones with a smaller domain of less complexity. For example, models trained on a higher-level dataset (RDNs) do not perform well on lower-level datasets (like PTN and PLGs). In Table I, among the models tested on the PLG dataset (column PLG), RDNResNa and RDN108000 perform significantly more poorly than PLG27000. Despite the better model capacity (RDNResNa) and the significantly larger training dataset (RDN108000), training on a higher-level dataset does not provide performance comparable to PLG27000. Similarly, consistent behavior is observed for models evaluated on the PTN dataset (column PTN). This situation can be regarded as another type of over-fitting at the domain level, suggesting that the DL-based model acts more like a curve-fitting process than learning the underlying physics, which would otherwise allow equivalent performance when training on large datasets (see more discussion below). The weak agreement of the model trained on RDN108000 when tested on PLGs and PTN can be found in Figs. 14(a) and 14(b).
| Model | PLG | PTN | RDN |
| --- | --- | --- | --- |
| PLG_RFR | 0.023 588 | 0.024 425 | |
| PTN_RFR | 0.053 711 | 0.006 895 | |
| RDN_RFR | 0.068 843 | 0.012 041 | |
| PTN27000 | 0.016 809 | 0.004 382 | 0.012 409 |
| RDN27000 | 0.009 029 | 0.005 257 | 0.010 971 |
| PLG27000 | 0.007 517 | 0.017 436 | |
| PTNRes34 | 0.018 426 | 0.017 377 | |
| RDNResNa | 0.009 477 | 0.011 651 | 0.008 712 |
| RDN108000 | 0.014 721 | 0.013 631 | |
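The cross-benchmarking procedure behind Table I can be sketched as a simple train/evaluate loop. The pattern generators and the linear "response" map below are hypothetical toy stand-ins for the PLG/PTN/RDN datasets and the simulated spectra, used only to illustrate the train-on-one, test-on-all bookkeeping.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 8))              # fixed toy "EM response" map

def response(X):
    """Toy surrogate for the simulated reflection spectrum."""
    return X @ W / 8.0

def make_patterns(kind, n=150):
    """Hypothetical stand-ins for the three pattern classes."""
    if kind == "RDN":                     # fully random binary pixels
        return rng.integers(0, 2, size=(n, 64)).astype(float)
    if kind == "PTN":                     # coarse tiles repeated 4x
        return np.repeat(rng.integers(0, 2, size=(n, 16)), 4, axis=1).astype(float)
    return (np.cumsum(rng.normal(size=(n, 64)), axis=1) > 0).astype(float)  # "PLG"-like

train = {k: make_patterns(k) for k in ("PLG", "PTN", "RDN")}
test = {k: make_patterns(k) for k in ("PLG", "PTN", "RDN")}

# Train one model per dataset, then evaluate it on all three test sets,
# producing the 3x3 MSE matrix analogous to Table I.
mse = {}
for tr, Xtr in train.items():
    model = RandomForestRegressor(n_estimators=30, random_state=0)
    model.fit(Xtr, response(Xtr))
    for te, Xte in test.items():
        mse[(tr, te)] = float(np.mean((model.predict(Xte) - response(Xte)) ** 2))
```

Replacing the RFR with any of the MSDCNN variants in the same loop yields the remaining rows of the table.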
To fully investigate compatibility with the lower-level datasets, we examine the dependence on the number of RDN samples used in training, which is plotted in Fig. 14(c) for 80 000 to 108 000 samples. The figure shows that, although the performance on the RDN dataset improves (smaller MSE) with more samples, the performance on the PLG and PTN test sets is essentially unaffected by the training sample size. This confirms that a model well trained on the RDN dataset will not work for other datasets like PLG and PTN even though they have smaller domains. Finally, from the table, we can conclude that the best well-trained CNN models (red) outperform the well-trained RFR (green), which demonstrates the advantage of CNN-based algorithms.
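The sample-size study of Fig. 14(c) amounts to retraining on growing subsets and tracking held-out MSE. The sketch below illustrates this with the same kind of toy random-pattern data as above (all names and sizes are illustrative, not the actual experiment).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Toy stand-in for the RDN class with a fixed linear "response" map.
X = rng.integers(0, 2, size=(2000, 64)).astype(float)
W = rng.normal(size=(64, 8))
y = X @ W / 8.0

X_test, y_test = X[1600:], y[1600:]       # held-out evaluation set
curve = []
for n in (200, 400, 800, 1600):           # growing training subsets
    model = RandomForestRegressor(n_estimators=30, random_state=0)
    model.fit(X[:n], y[:n])
    curve.append(float(np.mean((model.predict(X_test) - y_test) ** 2)))
# The in-domain MSE generally decreases as the training size grows,
# mirroring the RDN trend in Fig. 14(c); the cross-domain MSE would
# be tracked the same way on PLG/PTN test sets.
```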
D. Discussion
Our experiments conducted above have shown that the MSDCNN framework can work well separately for three complex metasurface patterns: PLG, PTN, and RDN. It can be a useful and fast computational tool for designing complex metasurfaces if different DL models are used specifically for different patterns. Having tested a variety of popular DCNN architectures, we find that there is no unique and universal DCNN model suitable for all three types of metasurfaces. The comparison between the PTN and PLG datasets shows that different complex metasurfaces require different DL models even when the same number of samples is used in training. The comparison between different sample sizes (from RDN27000 to RDN108000) indicates that a larger sample size is, as expected, an essential but not the sole factor for performance. The poor performance when using a more general dataset with a large domain (such as RDN) to test more constrained patterns with a smaller domain (such as PTN and PLG) suggests that the patterns that emerge within a dataset and are recognized by DL models are specific to that individual dataset. However, such patterns are not generalized enough to constitute an understanding of the governing physical laws, namely, Maxwell's equations. This phenomenon is similar to the mode-collapse issue encountered in GAN models. One explanation is the limitation of using mean-squared error (MSE) over the entire training dataset, so that local-minimum solutions are not captured properly. The local-minimum problem can be avoided in most cases by introducing a regularization term. Having tried this approach multiple times, we observed that the issue occurs frequently when the dataset is small, despite careful tuning of hyperparameters, which can be attributed to the unsuitable architecture of the DCNN models used in the evaluation branch.
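The regularization remedy mentioned above can be illustrated with a minimal numpy sketch: gradient descent on an MSE loss with an added L2 penalty. The linear model and data are hypothetical; the point is only how the penalty term enters the gradient and shrinks the learned weights.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=(20, 1)) + 0.1 * rng.normal(size=(100, 1))

def fit(lam, steps=500, lr=0.01):
    """Gradient descent on MSE + lam * ||W||^2 (L2 regularization)."""
    W = np.zeros((20, 1))
    for _ in range(steps):
        # d/dW [ mean((XW - y)^2) + lam * ||W||^2 ]
        grad = 2 * X.T @ (X @ W - y) / len(X) + 2 * lam * W
        W -= lr * grad
    return W

W_plain = fit(lam=0.0)   # unregularized fit
W_reg = fit(lam=0.1)     # L2-regularized fit: smaller weight norm
```

In a DCNN, the same idea appears as a weight-decay term added to the MSE training loss; it smooths the loss landscape but, as noted above, does not by itself cure the domain-level over-fitting observed here.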
The notion of inductive bias has been proposed66 to emphasize the relation between task symmetry and the capability of an operation. From this perspective, datasets such as PLG and PTN have locality and spatial symmetry, which are well suited to the convolution operation. The PLG pattern is also denser, with more continuity, than the PTN pattern, so it is expected to perform better. Thus, our MSDCNN-Resnet18S works well on PLG despite a significantly smaller dataset being used, as shown in Figs. 6, 7, and 9(a). Increasing the capacity from Resnet18S to Resnet34S offers better performance on the PTN dataset with the same number of samples [see Fig. 10(a)]. The most general dataset (RDN), with the highest degree of freedom, turns out to be the most challenging to train well. We have to increase the number of samples from 27k to 108k (four times more) to improve the accuracy to a comparable level [see Fig. 10(c)]. From the cross-benchmarking results, a model trained on a larger domain of datasets (such as RDN) is not applicable to a smaller or more specific domain such as PLG and PTN. This suggests that models trained on two datasets of distinct properties are not compatible even if the domains of the two datasets are in a subset relationship. There may thus exist a compatible state of a specific ML model that we have not yet achieved, which will require future studies. The ultimate goal of a unique and universal DL model for the forward and inverse design of arbitrary complex metasurfaces will require new DL models in future investigations.
V. CONCLUSION
In summary, we propose a metasurface design deep convolutional neural network (MSDCNN) to study the performance of CNN-based models in the design (both forward and inverse) of complex metasurfaces for broadband electromagnetic (EM) wave reflection from 2 to 12 GHz. Having experimented on three different complex metasurfaces with a high degree of freedom—arbitrary connecting shapes (PLG), basic pattern combination (PTN), and fully random binary shapes (RDN)—we confirm that the MSDCNN can provide a promising tool (faster than a traditional numerical EM solver, such as CST Studio Suite EM analysis software) for such complex metasurfaces. Among them, the best performance is on PLG-like metasurfaces, which require the least effort to achieve good performance. In contrast, a more advanced ML algorithm is required for PTN, while a substantially larger sample size is required for RDN to achieve the same performance. There is no unique and universal DCNN model that works well for all of them; thus, transfer learning of the DCNN between them performs poorly. Such behavior is likely caused by the fact that sequential stacking of convolution operations is suitable for parsing information that contains spatial locality and strong nearby correlations, such as natural images. However, for an arbitrary complex metasurface, the information extracted by the pure CNN is not sufficient and can be further enhanced. This is revealed by the performance improvement after combining the CNN with a recurrent neural network (RNN), demonstrated by the ResNa model. The finding suggests that new DL models are required to achieve a universal DL model for arbitrary complex metasurfaces.
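The CNN-plus-RNN idea behind the ResNa model can be sketched at a toy scale: convolutional filters extract local features, and a recurrent scan over the resulting feature sequence lets more distant (non-local) structure influence the prediction. All weights, sizes, and the readout below are hypothetical illustrations, not the actual ResNa architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv2d_valid(img, kernel):
    """Naive valid 2D cross-correlation for a single channel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# CNN stage: extract local features from a toy 16x16 binary pattern.
pattern = rng.integers(0, 2, size=(16, 16)).astype(float)
kernels = rng.normal(size=(4, 3, 3))
feat = np.stack([np.maximum(conv2d_valid(pattern, k), 0) for k in kernels])

# RNN stage: scan the feature map as a sequence of spatial positions so
# that long-range structure can accumulate in the hidden state.
seq = feat.reshape(4, -1).T                       # (positions, channels)
Wh = rng.normal(size=(8, 8)) * 0.1
Wx = rng.normal(size=(8, 4)) * 0.1
h = np.zeros(8)
for x in seq:
    h = np.tanh(Wh @ h + Wx @ x)                  # simple Elman recurrence

spectrum_pred = rng.normal(size=(64, 8)) @ h      # toy readout: 64-point spectrum
```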
The inductive bias of arbitrary metasurface design contains not only simple spatial invariance but also the complex physics governed by Maxwell's equations, such as long-range, time–space, and non-local interactions that, according to our studies, current DL models fail to capture. For future work, other models such as the graph neural network, complex-valued neural network, and physics-inspired CNN42 could be explored as candidates for further improvement.
It is important to note that the objective of this paper is driven by curiosity about the limits of the MSDCNN in predicting the EM response of complex and random metasurfaces, without specific applications in mind. We also ignore the experimental constraints and feasibility of such complex metasurfaces in any application, which will require future exploration. However, some complex metasurfaces of this kind28,67–75 have been realized and reported in the literature with feature sizes down to nanometers. These metasurfaces are designed using binary coding, with some of them28,73–75 appearing atypical, unstructured, and random. The simple EM reflection problem is for demonstration purposes, and the results can certainly be extended to include other effects, such as incident angles and polarization, which can be incorporated using CST for data collection. For realistic applications, one may need to collect sufficient experimental data for benchmarking and, if possible, for the training dataset as well.
SUPPLEMENTARY MATERIAL
See the supplementary material for the model structure of the deep learning used in the paper.
ACKNOWLEDGMENTS
This work was supported by the USA Office of Naval Research Global (Grant No. N62909-19-1-2047). T.Z. acknowledges the support of the Singapore Ministry of Education Ph.D. Research Scholarship. Y.S.A. acknowledges the support of the SUTD Start-Up Research Grant (No. SRT3CI21163).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding authors upon reasonable request.