Metagratings are flat and thin surfaces that rely on unique, periodically repeating (non-gradient), arbitrarily shaped light-scattering units for wave manipulation. However, the absence of an empirical relationship between the structural and diffraction properties of the units forces the use of brute-force numerical optimization techniques to determine the unit shape for a desired application. Here, we present an artificial neural network based methodology to develop a fast-paced numerical relationship between the two. We demonstrate the training and the performance of a numerical function, trained on simulated diffraction efficiencies of a large set of units, that can instantaneously mimic the optical response of any other arbitrarily shaped unit of the same class. We validate the performance of the trained neural network on a previously unseen set of test samples and discuss the statistical significance. We then utilize the virtually instantaneous network operations to inverse design the metagrating unit shapes for a desired diffraction efficiency distribution. The proposed interdisciplinary combination of advanced information processing techniques with Maxwell's equation solvers opens a pathway for the fast-paced *prediction* of metagrating designs rather than full wave computation.

Metasurfaces^{1–8} are new-age optical elements that have demonstrated several unconventional forms of light-manipulating devices using flat and ultra-thin patterned layers. However, the typical metasurface design procedure of arranging subwavelength-sized resonant light scatterers guided by the required phase gradient is constrained by the achievable phase resolution and by fabrication limitations.^{9–12} To surpass these limitations, a new class of ultrathin, flat optical wave manipulation layers known as metagratings^{12–18} has recently evolved, demonstrating that a similar degree of wave engineering can be achieved by properly designing the geometrical and optical properties of unique, uniformly repeating, wavelength-order-sized units. For example, a bianisotropic unit has been demonstrated with unitary reflection efficiency and minimal fabrication challenges,^{12} a free-form shaped unit has been demonstrated for large-angle beam scanning,^{13,17} and coupled dielectric waveguides have been demonstrated for broadband beam scanning and splitting applications.^{18}

Despite their great potential, the pace of development of metagratings in comparison to conventional metasurfaces is not up to the demand because of the absence of a clear design strategy and of an analytical, numerical, or empirical relationship between the shape of the unit and the far-field intensity pattern. Determination of the unit shape for a particular application often relies on computationally expensive and time-consuming numerical evolutionary optimization techniques.^{13,17}

On the other hand, in the virtually unrelated field of information technology, new computational strategies have undergone enormous development in the form of machine learning^{19} and deep learning^{20} techniques that stem from artificial neural network analysis.^{21} Neural network algorithms are developed with the heuristic vision of imitating the functionality of a biological brain: by processing information about many examples through several "artificial neurons," inferences are made about the relation between the input and output sets of parameters. The basic idea of neural network algorithms is to *predict* the solution to a problem by learning from a predefined set of examples rather than solving the problem directly. Neural network algorithms in the context of optics have previously been utilized for microscopy,^{22} antenna design,^{23} lens-less imaging,^{24} multi-layered thin films,^{25} multi-layered nano-particles,^{26} and H-shaped light scatterers.^{27} Here, we utilize neural network prediction algorithms to develop a numerical relationship between the shape of a metagrating unit and its diffraction efficiency (DE).

The basic unit of a neural network is a neuron (also known as a node or unit). A neural network typically consists of anything from a few tens to millions of interconnected nodes distributed in a sequence of layers, as shown in Fig. 1. The number of nodes in the first and last layers is equal to the number of input and output parameters of the problem, respectively. Each node receives input from other nodes or from an external source (called the bias *b*) and computes an output. Each input has an associated weight (*w*), which is assigned on the basis of its relative importance to the other inputs. The node applies a function *f* (known as the transfer function) to the weighted sum of its inputs, as shown in Fig. 1. The objective of the training process is to compute the weights and biases of each node by utilizing a training dataset. The weights and biases are computed using iterative optimization techniques, such as gradient descent, to minimize the prediction error on the training set (details in supplementary material).
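As a minimal sketch of the node operation described above (in Python rather than the MATLAB toolboxes used in this work), the output of a single node with a sigmoid transfer function could look like this; the weights and bias values are hypothetical and chosen only for illustration:

```python
import math

def neuron(x, w, b):
    """Output of one node: f(w . x + b), with a sigmoid transfer function f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum of inputs plus bias
    return 1.0 / (1.0 + math.exp(-z))             # sigmoid transfer function

# Example: two inputs with illustrative (hypothetical) weights and bias
out = neuron([0.5, -0.2], [0.8, 0.3], 0.1)
```

In a full network, the outputs of all nodes in one layer become the inputs of the next layer, so the whole forward pass reduces to a few matrix operations.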

To demonstrate the concept of utilizing artificial neural networks in the context of metagrating design, we consider light scattering from an arbitrarily shaped periodic dielectric structure placed on a reflecting surface with a spacer layer, as shown in Figs. 2(a) and 2(b). The metagrating, under plane wave illumination in the normal direction, scatters light into diffraction orders along symmetric angles defined by the periodicity of the structure; however, the diffraction efficiency (DE) in each direction can be asymmetric, depending on the shape of the building block. Computing the DE into each angle for an arbitrarily shaped unit usually involves solving Maxwell's equations with full-wave solvers such as the finite-difference time-domain (FDTD) technique,^{28} the finite element method (FEM),^{29} or rigorous coupled-wave analysis (RCWA).^{30} The inverse problem of predicting the shape of the building block for a desired set of DEs in each angle is an extremely tedious task involving the computation of thousands of shapes using numerical optimization techniques such as genetic algorithms.^{13,17,31–33} For each desired target set of DEs, the optimization methods re-search the whole parameter space, consuming an immense amount of time and computational resources.

In the case of neural networks, we compute the DEs of thousands of arbitrarily shaped units "once" for the purpose of training the network function. Once trained, the network can quickly mimic the DE patterns of a gigantic set of arbitrarily shaped building blocks using a few matrix operations. In the training set, for computational convenience, the shape of each unit is parameterized as a sixteen-sided polygon with sixteen vertices whose positions can be varied arbitrarily within the periodic bounds. Assuming each vertex in polar coordinates as [(*r _{i}*, *θ _{i}*), *i* = 1–16], the polar angles (*θ _{i}*) of the vertices are uniformly distributed between 0 and 2*π*. Therefore, the shape of the unit is completely defined by the radius coordinates (*r _{i}*) of the vertices alone. Note that the 16-dimensional space of the radii vector with at least 10 steps along each dimension requires a dataset of 10^{16} arbitrary shapes for an accurate training process, which is far beyond the practical limits of computational resources and time. In order to ensure that a smaller set of samples captures the qualities of all possible shapes of the 16-dimensional space inside the work plane (unit cell), we consider a training dataset comprising 8 categories, as shown in Fig. 2(c). In the first six categories, the radii are defined such that each set of 5000 shapes is restricted between the limits (0 < *r _{i}* < 0.25), (0.25 < *r _{i}* < 0.5), (0.50 < *r _{i}* < 0.75), (0.75 < *r _{i}* < 1.0), (0.0 < *r _{i}* < 0.50), and (0.50 < *r _{i}* < 1.0), respectively. We then forced an additional 10 000 shapes to have four-fold symmetry with radii between (0.0 < *r _{i}* < 1.0), and another 50 000 shapes are taken with purely random radii between (0.0 < *r _{i}* < 1.0). The radii of all the shapes can now be represented as a matrix "*X*" with 16 rows and (a total of) 90 000 columns.

The chosen periodicity of the metagrating and the wavelength of incident light will result in a total of 13 diffraction orders inside the propagation regime.^{34} The aim of the artificial neural network function (*N _{net}*) is to relate the 16 radii values (*r _{i}*, *i* = 1–16) of the shape to the DEs of the resulting 13 diffraction orders (*D _{j}*, *j* = 1–13). Mathematically, the relationship can be represented as

*D* = *N _{net}*(*r*),

where "*r*" and "*D*" represent the column vectors of radii and DEs, respectively.

The computation of DEs is carried out using an in-house built RCWA solver.^{30,35} The RCWA algorithm is computationally inexpensive and is customized to run in parallel utilizing the parallel computing toolbox of MATLAB. On a workstation with 24 cores, the computation of DEs of 90 000 shapes consumed around 50 h.

The computed dataset of DEs of the 13 propagating diffraction orders of the 90 000 samples can be represented as a matrix "*Y*" with 13 rows and 90 000 columns. The histogram distribution of the computed DEs of each diffraction order over all shapes is shown in Fig. 2(d). The responsibility of the neural network is now to compute the set of weight and bias matrices that represent a generalized numerical relationship between the matrices "*X*" and "*Y*". Here, we trained the neural network using training packages based on feedforward network layers^{36} available in MATLAB. After many test runs, a feedforward network architecture was chosen with 3 layers (1 input, 1 hidden, and 1 output) with 16, 250, and 13 nodes, respectively (details in supplementary material).

During the training process, the 90 000 sample shapes are further randomly divided into three subsets of 63 000 (70%), 13 500 (15%), and 13 500 (15%) for training, validation, and testing purposes, respectively. The training dataset is utilized to iteratively compute the optimized values of the weights and biases, while the validation dataset aids in fine-tuning the network parameters and determining further gradient directions. In other words, the training set directly participates in computing the weights and biases, while the validation set occasionally participates to avoid over-fitting of the data. Finally, the test dataset is used to perform an unbiased analysis of trained network characteristics such as the mean square error (MSE) and regression. Each iteration of the training process to minimize the prediction error is known as an epoch (details in supplementary material). The MSEs of the training, validation, and test datasets at each epoch are presented in Fig. 3(a). The training is considered to have reached an optimal position when the validation and test errors are relatively low and nearly equal to each other. Such an optimum is reached after 100 epochs for our current dataset. Figure 3(b) presents the histogram of the error distribution (between the actual DEs and those predicted using the optimized weights and biases) of the datasets. The fact that the training, validation, and test datasets follow the same error distribution demonstrates that the neural network fits the data well without over-fitting or under-fitting.
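The training pipeline can be sketched in NumPy. The authors use MATLAB's feedforward-network toolbox; here full-batch gradient descent stands in for the toolbox's optimizer, and small random placeholder arrays stand in for the RCWA-computed matrices *X* and *Y*:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the RCWA-computed matrices (columns = samples)
n_samples = 1000
X = rng.uniform(0.0, 1.0, (16, n_samples))   # radii of each shape
Y = rng.uniform(0.0, 0.5, (13, n_samples))   # DEs of the 13 diffraction orders

# Random 70/15/15 split into training, validation, and test sets
idx = rng.permutation(n_samples)
tr, va, te = idx[:700], idx[700:850], idx[850:]

# 16-250-13 feedforward network: one hidden layer of 250 sigmoid nodes
W1 = rng.normal(0.0, 0.1, (250, 16)); b1 = np.zeros((250, 1))
W2 = rng.normal(0.0, 0.1, (13, 250)); b2 = np.zeros((13, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    return W2 @ h + b2, h      # linear output layer

lr = 0.05
for epoch in range(100):                      # one epoch = one full-batch step here
    y_hat, h = forward(X[:, tr])
    err = y_hat - Y[:, tr]                    # prediction error on the training set
    # Backpropagated gradients of the mean square error
    gW2 = err @ h.T / len(tr); gb2 = err.mean(axis=1, keepdims=True)
    dh = (W2.T @ err) * h * (1.0 - h)
    gW1 = dh @ X[:, tr].T / len(tr); gb1 = dh.mean(axis=1, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse_test = np.mean((forward(X[:, te])[0] - Y[:, te]) ** 2)  # unbiased test MSE
```

In practice, the validation subset would additionally be monitored at each epoch to stop training before over-fitting sets in.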

After successful training of the function (*N _{net}*), we further tested its performance on canonical geometries (such as squares and circles) that were deliberately excluded from the training set. Figures 3(c) and 3(d) present the comparison between the DEs of light computed directly using RCWA (solid lines) and those predicted by the trained network (dotted lines). The network successfully reproduced the trends of the DEs for all the diffraction angles. We opine that accurate prediction of each and every oscillation of the DE will become possible with the development of advanced neural network architectures. Figures 3(e)–3(g) present the comparison of DEs of a few representative arbitrarily shaped samples with average error distributions around 0.007, 0.02, and 0.04, respectively.

The fast-paced prediction of the DEs of arbitrary shapes makes the *N _{net}* function a very appropriate tool for performing the inverse design of the unit for a specific target set of DEs. Here, we aim at retrieving the shape parameters (the radii) of the unit for a desired set of DEs (*D _{d}*) using an interior-point optimization algorithm.^{37} The optimization process aims at determining the optimal set of radii (*r*) that satisfies the condition

*N _{net}*(*r*) = *D _{d}*.
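A sketch of this inverse step in Python: the paper uses an interior-point algorithm with the trained network, whereas here, for a self-contained illustration, a hypothetical linear surrogate stands in for *N _{net}* and simple projected gradient descent (a swapped-in technique) enforces the bound 0 ≤ *r _{i}* ≤ 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear surrogate for the trained network (the real N_net is the
# trained 16-250-13 feedforward function)
A = rng.uniform(0.0, 0.1, (13, 16))
def n_net(r):
    return A @ r

D_d = n_net(rng.uniform(0.0, 1.0, 16))  # a target DE vector known to be reachable

# Projected gradient descent on ||N_net(r) - D_d||^2 subject to 0 <= r_i <= 1
r = np.full(16, 0.5)                       # initial guess for the radii
for _ in range(2000):
    grad = 2.0 * A.T @ (n_net(r) - D_d)    # gradient of the squared residual
    r = np.clip(r - 0.5 * grad, 0.0, 1.0)  # gradient step, projected onto bounds

residual = np.linalg.norm(n_net(r) - D_d)  # mismatch between achieved and target DEs
```

With the real network, the gradient would be obtained by backpropagation through the trained layers, and the bound constraints would be handled by the interior-point solver directly.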

First, to test the performance of the *N _{net}* function in the inverse design process, we required a full specular reflection of light in the normal direction along [(*θ*, *ϕ*) = (0°, 0°)]. The results are shown in Fig. 4(a). As expected, the *N _{net}* function predicted the radii (*r _{i}*) of all 16 vertices to be negligible, which minimizes the light coupling to diffraction orders. (The back mirror reflects the light completely in the normal direction.) Figures 4(b) and 4(c) present the predicted metagrating shapes for a typical beam scanning application of conventional metasurfaces,^{1–7} where the normally incident light is required to predominantly couple to one of the diffraction orders (at oblique angles). The resultant shapes in these cases are too complex for a quantitative discussion; however, gradients and inclinations can be qualitatively observed in the shapes. Figures 4(d)–4(f) present the predicted shapes for multifunctional metasurface applications,^{33,38} where the incident light is distributed into two desired on-axis and off-axis symmetric angles (diffraction orders), respectively. The predicted shape for the uniform distribution of DE along on-axis symmetric angles [Fig. 4(d)] is (as expected) a one-dimensional diffraction grating. Similarly, the predicted shape for the uniform distribution of DE along off-axis symmetric angles [Fig. 4(e)] is nearly a one-dimensional diffraction grating rotated by 45°. The predicted shape for the uniform distribution of DE along asymmetric angles [Fig. 4(f)] is again found to be too complex for a qualitative discussion.

Note that one can perform the optimization process using full-wave Maxwell's equation solvers by searching the dataset for the most suitable candidate for a target set of DEs; however, the search can be dramatically accelerated if the Maxwell's equation solver is replaced with the neural network function. The total computational time to predict all six designs shown in Fig. 4 using the trained *N _{net}* function is less than a minute, whereas full Maxwell's equation solvers might require several hours.

Note that the *N _{net}* function is trained for only one configuration of the spacer and grating heights. Hence, the possible range of DE of a given diffraction order is limited to the range of values in the dataset. For example, as shown in Fig. 2(d), the maximum DE of the diffraction order along (*θ*, *ϕ*) = (27.7°, 180°) is of the order of 0.5. Hence, the *N _{net}* function cannot predict any design with a DE greater than 0.5 for that particular angle. This limitation is reflected in the lower efficiency of the predicted designs in diffracting light to higher angles in comparison with a conventional graded-pattern metasurface. Adding the layer heights as training parameters could enhance the performance of the current metagrating.

Another limitation in utilizing the *N _{net}* function is the uncertainty in the prediction of material loss. The presence of scatterers on the plasmonic back mirror inevitably results in some absorption loss. Hence, the sum of the DEs of all diffraction orders does not always equal 1, as expected in an ideal zero-loss case. The sums of the DEs of all diffraction orders of the 90 000 samples utilized during the training process fall in the range between 0.6 and 1 (0%–40% absorption loss), with nearly equal probability of all values in between. Hence, an average loss of 20% is assumed in all prediction cases in Fig. 4 [except panel (a)]. The surplus DE (1 − assumed absorption loss − sum of the required DEs of the diffraction orders) is randomly distributed into the remaining angles during the inverse design optimization. The above discussed limitations of the neural network based design of metagratings can be attributed both to the prediction based design methodology and to the phase-gradient-free design.
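The surplus-DE bookkeeping amounts to simple arithmetic; with the assumed 20% loss and hypothetical target DEs for two chosen orders, for instance:

```python
# Surplus DE left for the remaining diffraction orders (illustrative numbers)
assumed_loss = 0.20          # average absorption loss assumed in Fig. 4
required_des = [0.30, 0.20]  # hypothetical target DEs for two chosen orders
surplus = 1.0 - assumed_loss - sum(required_des)
# surplus is then randomly distributed among the remaining orders
```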

To conclude, in this work, we design, train, and test a neural network architecture to instantaneously mimic the optical response of metagratings with arbitrary unit shapes. We presented the capability of neural networks to independently predict the diffraction efficiencies of a previously unseen large set of arbitrarily shaped building blocks. We demonstrated that the network architecture also enables a fast-paced inverse design process of metagrating devices for a desired set of diffraction efficiencies. The proposed scheme allows the extension of the prediction mechanism to other physical and optical parameters of the host materials and to a broad range of frequency spectra. We believe that the interdisciplinary combination of prediction algorithms with full-wave solvers of Maxwell's equations opens a new pathway to on-the-fly device design processes for many nanophotonic applications, which is particularly beneficial in the case of evolving new forms of materials with dynamic control of optical^{39,40} and structural properties.^{41} The simplicity of the network function allows effortless integration of electromagnetic computations into larger multiphysics modeling domains such as multicomponent optical networks, thermal radiation transport, quantum electrodynamics, digital holography, density functional theory, etc. We emphasize that the demonstrated inverse design process is just one immediate application example in which the neural network is integrated within numerical optimization techniques.

See supplementary material for a detailed discussion on gradient descent technique for computing weights and biases of the network and building the neural network architecture.

This work was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) (No. FA9550-14-1-0349).