Metagratings are flat, thin surfaces that rely on unique, periodically repeating (non-gradient), arbitrarily shaped light-scattering units for wave manipulation. However, the absence of an empirical relationship between the structural and diffraction properties of the units forces the use of brute-force numerical optimization to determine the unit shape for a desired application. Here, we present an artificial-neural-network-based methodology to develop a fast numerical relationship between the two. We demonstrate the training and performance of a numerical function, built from the simulated diffraction efficiencies of a large set of units, that can instantaneously mimic the optical response of any other arbitrarily shaped unit of the same class. We validate the performance of the trained neural network on a previously unseen set of test samples and discuss the statistical significance. We then exploit the virtually instantaneous network operations to inverse design metagrating unit shapes for a desired diffraction efficiency distribution. The proposed interdisciplinary combination of advanced information-processing techniques with Maxwell's equation solvers opens a pathway for fast prediction of metagrating designs in place of full-wave computation.

Metasurfaces1–8 are new-age optical elements that have demonstrated several unconventional forms of light-manipulating devices using flat, ultra-thin patterned layers. However, the typical metasurface design procedure of arranging subwavelength-sized resonant light scatterers according to the required phase gradient is constrained by the achievable phase resolution and by fabrication limitations.9–12 To surpass these limitations, a new class of ultrathin, flat wave-manipulation layers known as metagratings12–18 has recently emerged, demonstrating that a similar degree of wave engineering can be achieved by properly designing the geometrical and optical properties of unique, uniformly repeating, wavelength-scale units. For example, a bianisotropic unit has been demonstrated with unitary reflection efficiency and minimal fabrication challenge,12 a free-form-shaped unit has been demonstrated for large-angle beam scanning,13,17 and coupled dielectric waveguides have been demonstrated for broadband beam scanning and splitting applications.18

Despite their great potential, metagratings have developed more slowly than conventional metasurfaces because of the absence of a clear design strategy and of any analytical, numerical, or empirical relationship between the shape of the unit and the far-field intensity pattern. Determining the unit shape for a particular application often relies on computationally expensive and time-consuming evolutionary optimization techniques.13,17

Meanwhile, in the virtually unrelated field of information technology, new computational strategies have been developing rapidly in the form of machine learning19 and deep learning20 techniques that stem from artificial neural network analysis.21 Neural network algorithms are built on a heuristic vision of imitating the functionality of a biological brain: by processing information about many examples through several "artificial neurons," inferences are made about the relation between the input and output sets of parameters. The basic idea is to predict the solution to a problem by learning from a predefined set of examples rather than solving the problem directly. In the context of optics, neural network algorithms have previously been utilized for microscopy,22 antenna design,23 lens-less imaging,24 multi-layered thin films,25 multi-layered nano-particles,26 and H-shaped light scatterers.27 Here, we utilize neural network prediction algorithms to develop a numerical relationship between the shape of a metagrating unit and its diffraction efficiency (DE).

The basic unit of a neural network is a neuron (also known as a node). A neural network typically consists of anywhere from a few tens to millions of interconnected nodes distributed in a sequence of layers, as shown in Fig. 1. The numbers of nodes in the first and last layers equal the numbers of input and output parameters of the problem, respectively. Each node receives input from other nodes or from an external source (called the bias, b) and computes an output. Each input has an associated weight (w), assigned on the basis of its relative importance to the other inputs. The node applies a function f (known as the transfer function) to the weighted sum of its inputs, as shown in Fig. 1. The objective of the training process is to compute the weights and biases of each node from a training dataset, using iterative optimization techniques such as gradient descent to minimize the prediction error on the training set (details in the supplementary material).
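The per-node computation described above can be sketched in a few lines. This is a minimal illustration only; the tanh transfer function is one common choice and an assumption here (the function actually used is discussed in the supplementary material):

```python
import numpy as np

def node_output(x, w, b, f=np.tanh):
    """One artificial neuron: transfer function f applied to the
    weighted sum of the inputs plus the bias, as in Fig. 1."""
    return f(np.dot(w, x) + b)

def layer_output(x, W, b, f=np.tanh):
    """A fully connected layer is the same operation vectorized:
    W holds one row of weights per node in the layer."""
    return f(W @ x + b)

# Example: a single node with three inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, 0.3])
y = node_output(x, w, b=0.05)   # tanh(0.05 - 0.2 + 0.6 + 0.05) = tanh(0.5)
```

Training then amounts to adjusting every w and b so that the chained layer outputs reproduce the examples in the training dataset.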

FIG. 1.

A schematic representation of the architecture of a neural network.


To demonstrate the concept of utilizing artificial neural networks for metagrating design, we consider light scattering from an arbitrarily shaped periodic dielectric structure placed on a reflecting surface with a spacer layer, as shown in Figs. 2(a) and 2(b). The metagrating, under plane-wave illumination at normal incidence, scatters light into diffraction orders along symmetric angles defined by the periodicity of the structure; the diffraction efficiency (DE) in each direction, however, can be asymmetric depending on the shape of the building block. Computing the DE into each angle for an arbitrarily shaped unit usually involves solving Maxwell's equations with full-wave solvers such as the finite-difference time-domain (FDTD) technique,28 the finite element method (FEM),29 or rigorous coupled-wave analysis (RCWA).30 The inverse problem of predicting the shape of the building block for a desired set of DEs in each angle is an extremely tedious task involving the computation of thousands of shapes with numerical optimization techniques such as genetic algorithms.13,17,31–33 For each target set of DEs, the optimization methods re-search the whole parameter space, consuming immense amounts of time and computational resources.

FIG. 2.

(a) A schematic of a metagrating with an arbitrarily shaped unit. (b) A representative shape of the unit using a 16-sided polygon parametrized by the 16 radii values of its vertices. The unit dimensions are Λx = Λy = 2.15λ0, where λ0 = 1.55 μm. The spacer layer and the grating are made of silicon (ϵ = 12.67) with heights of 400 nm each. The back mirror is assumed to be gold. (c) The distribution of the range of radii values into 8 categories. (d) Histogram distribution of the computed DEs of the 13 resulting diffraction orders of 90 000 units. The angle pairs on the x-axis represent the angle of the diffraction order in spherical coordinates (θ, ϕ).


For the neural network approach, we compute the DEs of thousands of arbitrarily shaped units once, for the purpose of training the network function. Once trained, the network can quickly mimic the DE patterns of an enormous set of arbitrarily shaped building blocks using a few matrix operations. In the training set, for computational convenience, the shape of each unit is parameterized as a sixteen-sided polygon with sixteen vertices whose positions can be varied arbitrarily within the periodic bounds. Writing each vertex in polar coordinates as [(ri, θi), i = 1–16], the polar angles (θi) of the vertices are uniformly distributed between 0 and 2π, so the shape of the unit is completely defined by the radial coordinates (ri) of the vertices alone. Note that sampling the 16-dimensional space of radii vectors with at least 10 steps along each dimension would require a dataset of 10^16 arbitrary shapes for an accurate training process, which is far beyond the practical limits of computational resources and time. To ensure that a smaller set of samples captures the qualities of all possible shapes of the 16-dimensional space inside the work plane (unit cell), we consider a training dataset comprising 8 categories, as shown in Fig. 2(c). In the first six categories, the radii of each set of 5000 shapes are restricted to the limits (0 < ri < 0.25), (0.25 < ri < 0.5), (0.50 < ri < 0.75), (0.75 < ri < 1.0), (0.0 < ri < 0.50), and (0.50 < ri < 1.0), respectively. A further 10 000 shapes are forced to have four-fold symmetry with radii between (0.0 < ri < 1.0), and another 50 000 shapes take purely random values between (0.0 < ri < 1.0). The radii of all the shapes can then be represented as a matrix "X" with 16 rows and (a total of) 90 000 columns.
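The eight-category dataset described above can be generated with a few lines of array code. The sketch below is our reading of the text; in particular, the tiling used to impose four-fold symmetry is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shapes(n, r_min, r_max):
    """n columns of 16 vertex radii drawn uniformly from [r_min, r_max]."""
    return rng.uniform(r_min, r_max, size=(16, n))

# Six range-restricted categories of 5000 shapes each
bounds = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75),
          (0.75, 1.0), (0.0, 0.5), (0.5, 1.0)]
blocks = [random_shapes(5000, lo, hi) for lo, hi in bounds]

# 10 000 four-fold-symmetric shapes: 4 free radii tiled over 16 vertices
sym = np.tile(rng.uniform(0.0, 1.0, size=(4, 10000)), (4, 1))

# 50 000 fully random shapes
blocks += [sym, random_shapes(50000, 0.0, 1.0)]

X = np.hstack(blocks)   # the 16 x 90000 design matrix "X"
```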

The chosen periodicity of the metagrating and the wavelength of the incident light result in a total of 13 diffraction orders in the propagation regime.34 The aim of the artificial neural network function (Nnet) is to relate the 16 radii values (ri, i = 1–16) of the shape to the DEs of the resulting 13 diffraction orders (Dj, j = 1–13). Mathematically, the relationship can be represented as

D = Nnet(r),  (1)

where “r” and “D” represent the column vectors of radii and DEs, respectively.
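The count of 13 propagating diffraction orders follows directly from the grating equation for the period Λ = 2.15λ0 given in Fig. 2; a quick numerical check:

```python
import numpy as np

wavelength = 1.55e-6             # lambda_0
period = 2.15 * wavelength       # Lambda_x = Lambda_y

orders = []
for m in range(-3, 4):
    for n in range(-3, 4):
        # Normalized transverse wavevector of order (m, n)
        s = np.hypot(m, n) * wavelength / period
        if s < 1.0:              # propagating: real polar angle theta
            orders.append((m, n, np.degrees(np.arcsin(s))))

print(len(orders))               # 13 orders in the propagation regime
# e.g. order (1, 0) leaves at theta ≈ 27.7°, matching Figs. 3 and 4
```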

The computation of the DEs is carried out using an in-house RCWA solver.30,35 The RCWA algorithm is computationally inexpensive and is customized to run in parallel using the parallel computing toolbox of MATLAB. On a workstation with 24 cores, the computation of the DEs of 90 000 shapes consumed around 50 h.

The computed dataset of DEs of the 13 propagating diffraction orders of the 90 000 samples can be represented as a matrix "Y" with 13 rows and 90 000 columns. The histogram distribution of the computed DEs of each diffraction order over all shapes is shown in Fig. 2(d). The task of the neural network is then to compute the set of weight and bias matrices that represents a generalized numerical relationship between the matrices "X" and "Y". Here, we trained the neural network using training packages based on feedforward network layers36 available in MATLAB. After many test runs, a feedforward architecture was chosen with 3 layers (1 input, 1 hidden, and 1 output) of 16, 250, and 13 nodes, respectively (details in the supplementary material).

During the training process, the 90 000 sample shapes are randomly divided into three subsets of 63 000 (70%), 13 500 (15%), and 13 500 (15%) for training, validation, and testing, respectively. The training dataset is used to iteratively compute the optimized values of the weights and biases, while the validation dataset aids in fine-tuning the network parameters and determining further gradient directions. In other words, the training set directly participates in computing the weights and biases, while the validation set occasionally intervenes to avoid over-fitting of the data. Finally, the test dataset is used for an unbiased analysis of trained-network characteristics such as the mean square error (MSE) and regression. Each iteration of the training process to minimize the prediction error is known as an epoch (details in the supplementary material). The MSEs of the training, validation, and test datasets at each epoch are presented in Fig. 3(a). The training is considered to have reached an optimum when the validation and test errors are relatively low and nearly equal to each other; such a position is reached after 100 epochs for the current dataset. Figure 3(b) presents the histogram of the error distribution (between the actual DEs and those predicted using the optimized weights and biases) for the three datasets. The fact that the training, validation, and test datasets follow the same error distribution demonstrates that the neural network fits the data well, without over-fitting or under-fitting.
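The paper's training pipeline uses MATLAB's feedforward-network toolbox; an equivalent 70/15/15 split with the 16-250-13 architecture can be sketched with scikit-learn. This is a stand-in: the small random arrays below only fix the matrix shapes (the real X and Y come from the RCWA solver), and here the validation set is only scored rather than steering the optimizer as in the MATLAB workflow:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Stand-in data; in the paper X holds 90 000 columns of 16 radii and
# Y the matching 13 DEs (scikit-learn expects samples as rows).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(3000, 16))
Y = rng.uniform(0, 1, size=(3000, 13))

# 70% train / 15% validation / 15% test
X_tr, X_tmp, Y_tr, Y_tmp = train_test_split(X, Y, test_size=0.30, random_state=0)
X_val, X_te, Y_val, Y_te = train_test_split(X_tmp, Y_tmp, test_size=0.50, random_state=0)

# 16-250-13 feedforward architecture (16 inputs, 250 hidden, 13 outputs)
net = MLPRegressor(hidden_layer_sizes=(250,), max_iter=50)
net.fit(X_tr, Y_tr)

# Held-out MSEs, analogous to the curves in Fig. 3(a)
mse_val = np.mean((net.predict(X_val) - Y_val) ** 2)
mse_te = np.mean((net.predict(X_te) - Y_te) ** 2)
```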

FIG. 3.

Validation and error analysis of the neural network. (a) Mean square error of components of the training process at each epoch. (b) Error histogram distribution of the optimum trained network around 100th epoch. Comparison of DEs of three diffraction orders along the marked (θ, ϕ) directions computed using RCWA (solid lines) and predicted using the Nnet function (dotted lines) for (c) square and (d) disc shaped unit cells as a function of their shape parameters. The computed and predicted DE values of diffraction orders for a few representative cases with an average error of 0.007 (e), 0.02 (f) and 0.04 (g) are presented for comparison. Double square markers are used to represent the diffraction angle (position in the θ, ϕ-plane), the computed DE values (left box color), and the predicted DE value (right box color) simultaneously.


After successful training of the function (Nnet), we further tested its performance on canonical geometries (squares and circles, which were deliberately excluded from the training set). Figures 3(c) and 3(d) compare the DEs computed directly with RCWA (solid lines) and predicted by the trained network (dotted lines). The network successfully reproduced the trends of the DEs for all diffraction angles. We expect that accurate prediction of each individual oscillation of the DE will become possible with the development of more advanced neural network architectures. Figures 3(e)–3(g) present the comparison of the DEs of a few representative arbitrarily shaped samples with average errors around 0.007, 0.02, and 0.04, respectively.

The fast-paced prediction of the DEs of arbitrary shapes makes the Nnet function a very appropriate tool for inverse design of the unit for a specific target set of DEs. Here, we aim to retrieve the shape parameters (the radii) of the unit for a desired set of DEs (Dd) using an interior-point optimization algorithm.37 The optimization process determines the optimal set of radii (r) that satisfies the condition

Nnet(r) = Dd.  (2)

First, to test the performance of the Nnet function in the inverse design process, we require full specular reflection of light in the normal direction [(θ, ϕ) = (0°, 0°)]. The results are shown in Fig. 4(a). As expected, the Nnet function predicted negligible radii (r) for all 16 vertices, which minimizes light coupling into the diffraction orders (the back mirror then reflects the light completely in the normal direction). Figures 4(b) and 4(c) present the predicted metagrating shapes for a typical beam-scanning application of conventional metasurfaces, where the normally incident light is required to couple predominantly into one of the diffraction orders (at oblique angles).1–7 The resultant shapes are too complex for a quantitative discussion; however, gradients and inclinations can be observed qualitatively. Figures 4(d)–4(f) present the predicted shapes for multifunctional metasurface applications,33,38 where the incident light is distributed into two desired angles (diffraction orders): on-axis symmetric, off-axis symmetric, and asymmetric pairs, respectively. The predicted shape for a uniform distribution of DE along on-axis symmetric angles [Fig. 4(d)] is, as expected, a one-dimensional diffraction grating. Similarly, the predicted shape for a uniform distribution along off-axis symmetric angles [Fig. 4(e)] is nearly a one-dimensional diffraction grating rotated by 45°. The predicted shape for a uniform distribution along asymmetric angles [Fig. 4(f)] is again too complex for a qualitative discussion.
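The inverse-design step can be sketched as a bound-constrained least-squares problem built around the trained network. The snippet below uses a random stand-in for Nnet (the real mapping comes from the trained weights and biases) and SciPy's L-BFGS-B bound-constrained solver in place of the interior-point algorithm used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Random stand-in surrogate for the trained Nnet (16 radii -> 13 DEs)
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(250, 16)), rng.normal(size=250)
W2, b2 = rng.normal(size=(13, 250)) / 250, rng.normal(size=13)

def nnet(r):
    return W2 @ np.tanh(W1 @ r + b1) + b2

# Desired DE distribution D_d, e.g. half the power into one order
D_d = np.zeros(13)
D_d[3] = 0.5

# Solve Eq. (2) in the least-squares sense, radii bounded to [0, 1]
objective = lambda r: np.sum((nnet(r) - D_d) ** 2)
res = minimize(objective, x0=np.full(16, 0.5),
               bounds=[(0.0, 1.0)] * 16, method='L-BFGS-B')
r_opt = res.x   # predicted 16 vertex radii of the unit
```

Because each evaluation of nnet is a few matrix products, thousands of such evaluations cost far less than a single full-wave simulation, which is what makes the search fast.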

FIG. 4.

Inverse prediction of unit shapes. (a)–(f) The DEs and units predicted for six choices of light distribution. Double square markers are used in each panel to represent the diffraction angle (position), the aimed DE (left box color) and the actual DE (right box color). The background of each panel represents the predicted unit shape. Panel (a) represents reflection in the normal direction (θ, ϕ) ≈ (0°, 0°), (b) represents the dominant reflection into a single on-axis angle (θ, ϕ) = (27.7°, 180°), (c) represents dominant reflection into a single off-axis diffraction angle (θ, ϕ) ≈ (41.2°, 225°), (d) represents uniform reflection into two on-axis symmetric angles [(θ, ϕ) = (27.7°, 0°) and (θ, ϕ) = (27.7°, 180°)], (e) represents uniform reflection into two off-axis symmetric angles [(θ, ϕ) ≈ (41.2°, 225°) and (θ, ϕ) ≈ (41.2°, 45°)] and (f) represents uniform scattering into two asymmetric angles [(θ, ϕ) ≈ (27.7°, 180°) and (θ, ϕ) ≈ (41.2°, 45°)].


Note that one could perform the optimization with full-wave Maxwell's equation solvers by searching for the best candidate for a target set of DEs; however, the search is dramatically accelerated when the Maxwell solver is replaced by the neural network function. The total computational time to predict all six designs shown in Fig. 4 using the trained Nnet function is less than a minute, whereas full-wave Maxwell solvers might require several hours.

Note that the Nnet function is trained for only one configuration of the spacer and grating heights. Hence, the achievable DE of a given diffraction order is limited to the range of values present in the dataset. For example, as shown in Fig. 2(d), the maximum DE of the diffraction order along (θ, ϕ) = (27.7°, 180°) is of the order of 0.5; the Nnet function therefore cannot predict any design with a DE greater than 0.5 for that angle. This limitation is reflected in the lower efficiency of the predicted designs in diffracting light to higher angles compared with a conventional graded-pattern metasurface. Adding the layer heights as training parameters could enhance the performance of the current metagrating.

Another limitation of the Nnet function is the uncertainty in predicting material loss. The presence of scatterers on the plasmonic back mirror inevitably results in some absorption, so the sum of the DEs of all diffraction orders does not always equal 1 as it would in an ideal lossless case. The summed DEs of the 90 000 samples used during training fall between 0.6 and 1 (0%–40% absorption loss), with nearly equal probability across this range. Hence, an average loss of 20% is assumed in all prediction cases in Fig. 4 [except panel (a)]. The surplus DE (1 − assumed absorption loss − sum of required DEs of the diffraction orders) is randomly distributed into the remaining angles during the inverse-design optimization. These disadvantages of the neural-network-based metagrating design can be attributed to limitations of both the prediction-based neural network methodology and the gradient-free design itself.
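The surplus-DE bookkeeping described above can be sketched as follows. The Dirichlet draw used to split the surplus over the unconstrained orders is our assumption of the "randomly distributed" rule:

```python
import numpy as np

def target_with_surplus(targets, assumed_loss=0.20, n_orders=13, seed=0):
    """Build a 13-entry DE target: fixed DEs for the required orders,
    with the surplus (1 - loss - sum of required DEs) spread randomly
    over the remaining angles."""
    rng = np.random.default_rng(seed)
    surplus = 1.0 - assumed_loss - sum(targets.values())
    free = [j for j in range(n_orders) if j not in targets]
    split = rng.dirichlet(np.ones(len(free)))  # random fractions summing to 1
    D_d = np.zeros(n_orders)
    for j, de in targets.items():
        D_d[j] = de
    D_d[free] = surplus * split
    return D_d

# e.g. 0.3 into each of two orders with the assumed 20% absorption;
# the 13 entries then sum to 0.80 = 1 - loss
D_d = target_with_surplus({3: 0.30, 9: 0.30})
```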

To conclude, we have designed, trained, and tested a neural network architecture that instantaneously mimics the optical response of metagratings with arbitrary unit shapes. We presented the capability of the network for independently predicting the diffraction efficiencies of a previously unseen large set of arbitrarily shaped building blocks, and demonstrated that the architecture also enables a fast-paced inverse design of metagrating devices for a desired set of diffraction efficiencies. The proposed scheme can be extended to other physical and optical parameters of the host materials and to a broad range of frequencies. We believe that the interdisciplinary combination of prediction algorithms with full-wave Maxwell solvers opens a new pathway of on-the-fly device design for many nanophotonic applications, particularly for the evolving new forms of materials with dynamic control of optical39,40 and structural properties.41 The simplicity of the network function allows effortless integration of electromagnetic computations into larger multiphysics modeling domains such as multicomponent optical networks, thermal radiation transport, quantum electrodynamics, digital holography, and density functional theory. We emphasize that the demonstrated inverse design process is just one immediate example in which a neural network is integrated within numerical optimization techniques.

See supplementary material for a detailed discussion on gradient descent technique for computing weights and biases of the network and building the neural network architecture.

This work was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) (No. FA9550-14-1-0349).

1. N. Yu, P. Genevet, M. A. Kats, F. Aieta, J.-P. Tetienne, F. Capasso, and Z. Gaburro, Science 334, 333 (2011).
2. X. Ni, N. K. Emani, A. V. Kildishev, A. Boltasseva, and V. M. Shalaev, Science 335, 427 (2012).
3. M. Farmahini-Farahani, J. Cheng, and H. Mosallaei, J. Opt. Soc. Am. B 30, 2365 (2013).
4. H.-T. Chen, A. J. Taylor, and N. Yu, Rep. Prog. Phys. 79, 076401 (2016).
5. S. B. Glybovski, S. A. Tretyakov, P. A. Belov, Y. S. Kivshar, and C. R. Simovski, Phys. Rep. 634, 1 (2016).
6. P. Genevet and F. Capasso, Rep. Prog. Phys. 78, 024401 (2015).
7. P. Genevet, F. Capasso, F. Aieta, M. Khorasaninejad, and R. Devlin, Optica 4, 139 (2017).
8. A. Forouzmand and H. Mosallaei, Adv. Opt. Mater. 5, 1700147 (2017).
9. N. M. Estakhri and A. Alù, Phys. Rev. X 6, 041008 (2016).
10. V. Asadchy, M. Albooyeh, S. Tcvetkova, A. Díaz-Rubio, Y. Ra'di, and S. Tretyakov, Phys. Rev. B 94, 075142 (2016).
11. A. Epstein and G. V. Eleftheriades, J. Opt. Soc. Am. B 33, A31 (2016).
12. Y. Ra'di, D. L. Sounas, and A. Alu, Phys. Rev. Lett. 119, 067404 (2017).
13. D. Sell, J. Yang, S. Doshay, R. Yang, and J. A. Fan, Nano Lett. 17, 3752 (2017).
14. A. Pors, M. G. Nielsen, and S. I. Bozhevolnyi, Optica 2, 716 (2015).
15. A. Epstein and O. Rabinovich, Phys. Rev. Appl. 8, 054037 (2017).
16. Y. Ra'di and A. Alu, ACS Photonics 5, 1779 (2018).
17. J. Yang, D. Sell, and J. A. Fan, Ann. Phys. 530, 1700302 (2018).
18. M. Khorasaninejad and F. Capasso, Nano Lett. 15, 6709 (2015).
19. I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2016).
20. Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
21. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. (Prentice Hall PTR, Upper Saddle River, NJ, USA, 1998).
22. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, Optica 4, 1437 (2017).
23. B. Liu, H. Aliakbarian, Z. Ma, G. A. Vandenbosch, G. Gielen, and P. Excell, IEEE Trans. Antennas Propag. 62, 7 (2014).
24. A. Sinha, J. Lee, S. Li, and G. Barbastathis, Optica 4, 1117 (2017).
25. D. Liu, Y. Tan, E. Khoram, and Z. Yu, ACS Photonics 5, 1365 (2018).
26. J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. Delacy, M. Tegmark, J. D. Joannopoulos, and M. Soljačić, Proc. SPIE 10526, 1052607 (2018).
27. I. Malkiel, A. Nagler, M. Mrejen, U. Arieli, L. Wolf, and H. Suchowski, preprint arXiv:1702.07949 (2017).
28. A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method (Artech House, 2005).
29. B. A. Szabo and I. Babuška, Finite Element Analysis (John Wiley & Sons, 1991).
30. M. Moharam, E. B. Grann, D. A. Pommet, and T. Gaylord, J. Opt. Soc. Am. A 12, 1068 (1995).
31. S. Inampudi, J. Cheng, and H. Mosallaei, Appl. Opt. 56, 3132 (2017).
32. S. Inampudi, J. Cheng, M. M. Salary, and H. Mosallaei, J. Opt. Soc. Am. B 35, 39 (2018).
33. J. Cheng, S. Inampudi, and H. Mosallaei, Sci. Rep. 7, 12228 (2017).
34. M. Born and E. Wolf, Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light (Elsevier, 2013).
35. C. M. Roberts, S. Inampudi, and V. A. Podolskiy, Opt. Express 23, 2764 (2015).
36. K. Hornik, M. Stinchcombe, and H. White, Neural Networks 2, 359 (1989).
37. R. A. Waltz, J. L. Morales, J. Nocedal, and D. Orban, Math. Program. 107, 391 (2006).
38. S. M. Kamali, E. Arbabi, A. Arbabi, Y. Horie, M. Faraji-Dana, and A. Faraon, Phys. Rev. X 7, 041056 (2017).
39. A. Forouzmand and H. Mosallaei, J. Opt. 18, 125003 (2016).
40. A. Forouzmand, M. M. Salary, S. Inampudi, and H. Mosallaei, Adv. Opt. Mater. 6, 1701275 (2018).
41. W. K. Liu, E. G. Karpov, and H. S. Park, Nano Mechanics and Materials: Theory, Multiscale Methods and Applications (John Wiley & Sons, 2006).
