Recent advances in artificial intelligence (AI) and computing technologies are disrupting the modeling and design paradigms in photonics. In this work, we present our perspective on the utilization of current AI models for photonic device modeling and design. First, within the physics-informed neural network (PINN) framework, we address the task of modal analysis, offering a unique neural-network-based solver and utilizing it to predict propagating modes and their corresponding effective indices for slab waveguides. We compare our model’s predictions against theoretical benchmarks and a finite differences solver. Using 349 analysis points, the PINN approach achieved a relative percentage error of 0.69272% with respect to the analytical solution, compared to 1.28142% for the finite differences method, indicating that the PINN approach is more accurate in conducting modal analysis. Our model’s continuity over the entire solution domain enhances its performance and flexibility while requiring no training data due to its guidance by Maxwell’s equations, setting it apart from most AI approaches. Our model design also flexibly enables the simultaneous prediction of multiple modes over any specified interval of effective indices. In addition, we present a novel reinforcement learning (RL)-based paradigm, employing an actor–critic model for inverse design. We utilize this paradigm to optimize the transmittance of a grating coupler by manipulating the device geometry. Compared to the design obtained using the Particle Swarm Optimization (PSO) algorithm, our RL-based approach produced a significant enhancement of 34% over the initial design in only 14 iterations, whereas the PSO converged prematurely to a 27% enhancement after 30 iterations, showing that our model navigates the design space more efficiently and arrives at a superior design. Based on these approaches, we discuss the future of AI in photonics in forward modeling and inverse design and the untapped potential in bringing these two worlds together.

From ultra-high-speed communications1–4 to medical sensing,5–7 photonic devices, structures, and systems have been a key enabler of phenomenal progress in many industries. Rooted in Maxwell’s equations, the interaction between light and matter has been extensively studied, leading to breakthroughs in information processing,8 optical communications,9,10 energy harvesting,11 and sensing.12 

Consequently, the design process of these photonic devices has been revolutionized. Initially, it relied on experience-based, physically inspired structures that sprang from studying Maxwell’s equations and their solutions.13 However, with the advent of computer-aided design powered by numerical simulators and optimization procedures,14 there has been a transformation in how these devices are conceptualized and engineered.15 Computer-aided design (CAD) plays a crucial role in shaping the landscape of photonics engineering, particularly in dealing with complex structures and multiple design parameters.16 While our understanding of physics and experimentation is valuable in its own right, translating this knowledge into specific device designs poses significant challenges, especially in intricate systems.17,18 This paradigm shift directed attention toward specifying objectives and selecting suitable parameter spaces, paving the way for two primary approaches: the forward-centered19 and the inverse-centered20 methodologies. The forward-centered approach to photonics design typically involves two primary steps: first, selecting a candidate device geometry and parameters that align with the design requirements and, second, modeling the device’s behavior numerically by mapping its physical response to its design parameters through the solution of Maxwell’s equations.19 This numerical modeling is often carried out using methods such as finite differences (FD),21 finite element (FE),22,23 or the method of moments (MoM).24 

On the other hand, the inverse-centered approach directly addresses the challenge of mapping desired responses to device parameters.20 It enables researchers to specify the desired device functionalities and utilize computational solvers to automatically generate optimized device structures by formulating the problem in reverse, starting with a desired functionality and determining the optimal structure to achieve it.25 This approach typically involves employing optimization algorithms, which can be gradient-based methods26 such as topology optimization,27 evolutionary-based techniques28 such as genetic algorithms,29 or Particle Swarm Optimization (PSO).30 However, these methods have many limitations that need to be addressed, including the complexity of exploring large design spaces and optimizing numerous parameters. They also tend to suffer from premature convergence.31 Therefore, integrating artificial intelligence (AI) techniques into the design and modeling of photonic devices has become necessary to overcome these limitations.32 

AI is a computational approach aiming to develop computer systems that can perform tasks that typically require human intelligence. One of the most impactful approaches to realizing such systems is machine learning (ML). ML offers data-driven computational models that can automatically discover patterns and model relationships within a given set of data samples. It relies on combining statistical models and optimization algorithms in a computational model that iteratively analyzes data while refining its performance on the task at hand.33 Currently, deep learning (DL), a subset of machine learning that leverages neural networks (NNs) as a computational model, is the most popular approach to AI. NNs can approximate any (continuous) relation between inputs and outputs via a network of interconnected simple units called neurons, which distribute the processing of a given input (a picture, text, or features of an object). NNs are compositional, such that a standard network is organized into layers that are applied consecutively until a final output is obtained. DL has found its way into various industries and applications, ranging from stock market prediction to medical image analysis and object recognition.34,35 NN architectures are also employed within the reinforcement learning (RL)36 framework, where an agent (the model) interacts with an environment to learn optimal actions through a trial-and-error process. This spares the RL model the hurdle of gathering a large dataset and preprocessing the gathered data, allowing for data-generative and real-time adaptive training.37–39 

This has drawn attention to exploiting AI and DL capabilities for photonics modeling and design in pursuit of innovative devices with superior performance. Conversely, photonics can serve AI by providing ultra-fast, low-energy hardware realizations for DL models. Integrated photonics is particularly promising for building hardware for artificial neural networks (ANNs) due to its alignment with analog computing principles and its advantages in energy consumption and bandwidth. Photonic devices offer multiple degrees of freedom, enabling parallel data processing. Advances in silicon photonics have led to photonic integrated circuits (PICs) with thousands of components, although still far fewer than in electronic systems. To meet the growing computational demands of deep learning, novel photonic neural network (PNN)40 platforms have been developed. These platforms use light for both data transfer and computational functions, achieving higher efficiency than electronic systems. PNNs are especially effective for inference applications and have uses in natural language processing, cybersecurity, matrix multiplication, edge computing, and optimization problems in autonomous driving and robotics.41 

In this paper, we present our perspective on novel paradigms for integrating AI into the field of photonics. As shown in Fig. 1, we demonstrate the conceptual framework of AI-based forward modeling and inverse design. Using physics-informed neural networks (PINNs),42 we approach forward modeling by tailoring them for modal analysis of photonic waveguides. We develop a novel model that accurately predicts the supported modes and their associated effective indices, without requiring training data. This model is meshless and continuous across the entire problem domain. For the inverse design problem, we employ RL with the actor–critic paradigm to design a grating coupler. The resulting device outperforms those designed using PSO, achieving better performance with fewer steps in the search space.

FIG. 1.

AI for photonics paradigms. The forward modeling paradigm maps the photonic structures to their respective optical response, while the inverse design paradigm generates the photonic structures based on a desired optical response.


The upcoming sections of the paper will cover the following: in Sec. II, we provide a brief overview of our modeling strategies for the forward modeling and inverse design paradigms, discussing their main elements. We begin our discussion with a fundamental structure of deep learning, namely, fully connected neural networks, which will serve as the backbone for the models we build later on. We also discuss the problem of incorporating physical constraints, described by differential equations, into neural networks based on the PINNs framework. In addition, we discuss the basic RL framework, focusing on a key RL algorithm known as the actor–critic (A2C) algorithm. Leveraging these building blocks, Sec. III tackles the problem of forward modeling: current trends in forward modeling are reviewed, and the importance of utilizing PINNs for the modal analysis of slab waveguides is demonstrated. Shifting to the domain of inverse design, Sec. IV begins by discussing recent trends in inverse design and demonstrating the need for our A2C-RL model, which is benchmarked by inverse designing a grating coupler. Finally, Sec. V covers the current limitations in both forward and inverse modeling tasks, proposes potential improvements, and outlines our vision for the future of integrating AI techniques in photonics.

The adopted modeling strategies will tackle the frameworks of using PINNs for forward modeling and the RL paradigm for inverse design. Both strategies will require utilizing the basic feed-forward neural networks to realize the PINN for the forward modeling of a waveguide and the policy-deciding agent in the RL model. In the forward modeling strategy, the PINN will combine the power of the feed-forward neural network with Maxwell wave equations to enable accurate modal analysis of photonic devices by capturing the underlying physical behavior of the waveguide. In contrast, the inverse design paradigm through RL will allow for iterative design improvement by leveraging a trial-and-error method. Through interactions with the environment and feedback in the form of rewards or penalties, the RL agent explores the design space in an attempt to learn optimal decisions and explore unconventional efficient design solutions. Hence, these modeling strategies are expected to accelerate the simulation process, enhance design quality, and foster the discovery of innovative solutions within the photonics realm. The next subsections discuss the basics of neural networks and how to extend them to the PINNs and RL.

The architecture of a fully connected feed-forward neural network, or simply NN, typically consists of multiple layers of interconnected nodes, or artificial neurons, organized into an input layer, one or more hidden layers, and an output layer. NNs aim to learn complex patterns and representations from input data in a hierarchical manner, allowing them to perform tasks such as classification, regression, and pattern recognition. We formalize these notions mathematically as follows.43 

Let $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$ denote the function realized by a fully connected neural network. This function is built from a series of layers, where each layer consists of two transformations: a global (where every element in the output depends on every element in the input) linear transformation $g_l$ and a local nonlinear transformation $\sigma_l$. In particular, for the $l$th layer, $g_l:\mathbb{R}^{n_{l-1}} \rightarrow \mathbb{R}^{n_l}$ is defined as $g_l(x) = W_l x + b_l$, where $W_l \in \mathbb{R}^{n_l \times n_{l-1}}$ is the weight matrix and $b_l \in \mathbb{R}^{n_l}$ is the bias. The local transformation $\sigma_l:\mathbb{R}^{n_l} \rightarrow \mathbb{R}^{n_l}$ is applied element-wise. The $l$th layer of the network is thus defined as $L_l(x)=\sigma_l(g_l(x))$, and the entire network with $L$ layers is expressed as
$f(x;\Theta) = (L_L \circ L_{L-1} \circ \cdots \circ L_1)(x). \quad (1)$
The set of all network parameters $\{W_1, b_1, W_2, b_2, \ldots, W_L, b_L\}$ is collectively denoted as $\Theta$. In the deep learning community, the dimension of the output vector of a layer is referred to as its number of neurons, and a network’s architecture is defined in terms of its layers, neurons, and activation functions. A two-layer neural network of the functional form of Eq. (1) is proven to be a universal approximator capable of representing any Borel measurable function on a compact domain to an arbitrary degree of accuracy.44 This property allows neural networks to approximate a problem’s eigenfunctions (modes). The process of adjusting the parameters $\Theta$ of the network to satisfy the problem requirements, thereby yielding a solution, is known as training. Generally, training involves iteratively adjusting the network parameters to minimize a predefined loss or cost function that quantifies the disparity between predicted outputs and known ground truths. Let $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ denote the set of $N$ training examples, where each tuple $(x_i \in \mathbb{R}^n, y_i \in \mathbb{R}^m)$ represents an input feature vector and its corresponding target output. The objective is to find the optimal set of parameters $\Theta$ that minimizes the empirical loss given by
$\Theta^{*} = \arg\min_{\Theta} \frac{1}{N}\sum_{i=1}^{N} l\big(y_i, f(x_i;\Theta)\big), \quad (2)$
where $l$ is the chosen loss function, such as the mean squared error or cross-entropy loss. This optimization problem is typically solved using gradient-based methods, where the gradient $\nabla_{\Theta}\, l\big(y, f(x;\Theta)\big)$ of the loss function with respect to the parameters is computed via backpropagation and utilized to minimize the loss function.
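As a concrete instance of Eqs. (1) and (2), the following PyTorch sketch builds a small fully connected network and minimizes a mean-squared-error loss on synthetic data by gradient descent; the layer sizes, activation, optimizer, and data are illustrative choices and are unrelated to the models used later in this work.

```python
import torch
import torch.nn as nn

# A fully connected network f(x; Theta): each layer applies a global linear
# map g_l(x) = W_l x + b_l followed by an element-wise nonlinearity sigma_l.
class FullyConnected(nn.Module):
    def __init__(self, in_dim=1, hidden=32, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),   # layer 1
            nn.Linear(hidden, hidden), nn.Tanh(),   # layer 2
            nn.Linear(hidden, out_dim),             # linear output layer
        )

    def forward(self, x):
        return self.net(x)

# Synthetic training pairs (x_i, y_i); in practice these would come from
# measurements or a numerical solver.
x = torch.linspace(-1.0, 1.0, 128).unsqueeze(1)
y = torch.sin(3.0 * x)

model = FullyConnected()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                      # the per-example loss l in Eq. (2)

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)             # empirical loss averaged over the N examples
    loss.backward()                         # gradients via backpropagation
    optimizer.step()                        # gradient-based parameter update
```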

In photonics, this structure provides versatile applications for both forward and inverse design. For example, it can be used to predict the optical response of a device based on its design parameters, or conversely, to determine the design parameters needed to achieve a desired optical response. Our goal is to use neural networks (NNs) to directly represent the modes of an optical device. We achieve this by embedding our understanding of the governing physics, characterized by differential equations whose solutions are the modes and boundary conditions, into the network’s loss function.

Several frameworks exist for solving differential equations (DEs) using DL, among which physics-informed neural networks (PINNs) have found great success. PINNs use NNs to represent the solution of a given DE without access to labeled examples. This is achieved by decomposing $l$ in Eq. (2) into several constituent terms incorporating the residual of the DE at hand and minimizing that residual in a mean-squared-error sense, along with the equation’s initial and boundary conditions.42 Let $\mathcal{H}$ be a Hilbert space in which the inner product between two members $u, v \in \mathcal{H}$ is denoted by $\langle u, v\rangle$ and $\|u\|$ denotes the norm. Consider a DE of the form $\mathcal{A}\phi = b$, where $\mathcal{A}$ is a differential operator and $b$ is a non-homogeneous term, defined on some domain $\Omega$ with boundary $\partial\Omega$ and subject to a Dirichlet boundary condition $\phi|_{\partial\Omega} = \phi_0$. The loss function takes the form
$l = l_{\mathrm{phy}} + l_{\mathrm{bd}}, \quad (3a)$
$l_{\mathrm{bd}} = \big\|\, \phi|_{\partial\Omega} - \phi_0 \,\big\|^2, \quad (3b)$
$l_{\mathrm{phy}} = \big\|\, \mathcal{A}\phi - b \,\big\|^2. \quad (3c)$
Practically, each term in the loss function can be calculated exactly, as shown in Fig. 2, at every point in the problem domain using readily available auto-differentiation packages. For the class of eigenvalue problems (EVPs), where $b = 0$ and the differential operator is factored as $\mathcal{A}\phi = \mathcal{B}\phi - k\phi$, such that $\mathcal{B}$ is another differential operator and $(\phi, k)$ is the problem’s eigenfunction–eigenvalue pair, a study proposed that an NN solution could be found by augmenting the loss in Eq. (3) with additional terms:45 first, an energy term to prevent the network from learning trivial solutions and, second, for linear self-adjoint operators, a term based on the Rayleigh quotient to learn the eigenvalue, giving the loss
(4a)
(4b)
(4c)
with $l_{\mathrm{bd}}$ and $l_{\mathrm{phy}}$ defined as in Eqs. (3b) and (3c). This formulation predicts the eigenfunction with the smallest eigenvalue together with that eigenvalue. We modify the last term of Eq. (4a) to provide more control over the types of modes to be learned by maximizing the eigenvalue within a specified range $(n_{\mathrm{low}}, n_{\mathrm{up}})$ via the term
(5)
To get the final loss,
(6)
with its constituent terms defined as before. Diagrammatically, this can be represented as shown in Fig. 3, where the NN output is used to calculate the energy term, impose the boundary condition, and learn the first part of the decomposed differential operator; the latter is used to calculate the Rayleigh quotient and, in turn, the EVP residual, and all these parts are finally combined into a single loss function.
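To make the workflow of Fig. 3 concrete, the following PyTorch sketch assembles the main ingredients (a physics residual, a boundary term, a normalization/energy term, and a sampled Rayleigh quotient) for a toy one-dimensional eigenvalue problem, $-d^2\phi/dx^2 = k\phi$ with homogeneous Dirichlet boundaries. The specific form and unit weight of the energy term, and the omission of the eigenvalue-range term of Eq. (5), are our own simplifications for illustration; they are not the exact terms used in our solver.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Trial eigenfunction phi(x), represented by a small fully connected network.
phi_net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))

def second_derivative(phi, x):
    # d^2 phi / dx^2 via automatic differentiation (cf. Fig. 2).
    dphi = torch.autograd.grad(phi, x, torch.ones_like(phi), create_graph=True)[0]
    d2phi = torch.autograd.grad(dphi, x, torch.ones_like(dphi), create_graph=True)[0]
    return d2phi

x = torch.linspace(0.0, 1.0, 256).unsqueeze(1).requires_grad_(True)  # collocation points
x_b = torch.tensor([[0.0], [1.0]])                                   # Dirichlet boundary points

optimizer = torch.optim.Adam(phi_net.parameters(), lr=1e-3)
for step in range(5000):
    optimizer.zero_grad()
    phi = phi_net(x)
    B_phi = -second_derivative(phi, x)        # B phi for the toy operator B = -d^2/dx^2

    # Sampled Rayleigh quotient <phi, B phi> / <phi, phi> as the eigenvalue estimate.
    k = (phi * B_phi).mean() / (phi * phi).mean()

    l_phy = ((B_phi - k * phi) ** 2).mean()   # residual of the EVP  B phi = k phi
    l_bd = (phi_net(x_b) ** 2).mean()         # homogeneous Dirichlet boundary term
    l_energy = (phi.pow(2).mean() - 1.0) ** 2 # keeps phi away from the trivial solution
    loss = l_phy + l_bd + l_energy            # unit weights, for illustration only
    loss.backward()
    optimizer.step()
```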
FIG. 2.

Leveraging differentiability of NNs to estimate and impose required physical constraints.

FIG. 3.

Modifying PINNs to solve EVPs.


Utilizing NNs to learn and represent functions offers numerous advantages. First, the learned function maintains continuity, enabling predictions across the entirety of the solution domain. In addition, the learning process is highly flexible, as it operates on a user-defined point cloud, eliminating the necessity for intricate meshing techniques. We demonstrate the advantages of this flexibility via numerical experiments.

RL is a subfield of machine learning that focuses on how an untrained agent learns to make decisions through interactions with an environment to maximize rewards. In RL, an agent, which in its simplest form is a neural network, takes actions in an environment and receives feedback in the form of rewards, which is used to evaluate the correctness of its corresponding actions. Conventionally, to effectively model RL problems, the framework of Markov decision processes (MDPs) serves as the basic building block.46 MDPs provide a mathematical formulation that captures the dynamics of sequential decision-making under uncertainty. Within an MDP, the agent interacts with an environment that transitions between different states (configurations) based on its chosen actions. MDPs enable RL algorithms to learn optimal policies that guide the agent’s decision-making process by incorporating the concept of transition probabilities and rewards associated with state-action pairs. Figure 4 shows the basic MDP configuration for the RL model, where an RL agent, functioning as an autonomous entity or computational system, engages in interactions with an environment to acquire knowledge and make decisions aimed at achieving specific objectives or maximizing cumulative rewards. The environment, a well-defined and structured system, serves as the context where the agent’s actions occur. The environment also encompasses a range of possible states that reflect diverse situations or configurations. It also defines a set of permissible actions available for the RL agent to employ. These actions represent the decisions made by the agent, which influence the state of the environment. Following each action, the environment provides the agent with a numerical signal termed a reward, which serves as an indicator of the immediate desirability or effectiveness of the action taken. Typically, the overarching aim of the agent is to maximize the cumulative reward obtained over time.

FIG. 4.

Simple MDP diagram.46 


In an MDP framework, we have a finite set of states $\{S_1, S_2, S_3, \ldots, S_N\}$ representing the various environmental situations and configurations. The agent can take actions from a finite set $\{A_1, A_2, A_3, \ldots, A_N\}$ to influence the environment. Rewards, represented by the function $R(S_t, A_t, S_{t+1})$, are received by the agent after taking actions in specific states. Finally, policies $\pi: S \rightarrow A$, defined as the strategies that map states to actions, guide the agent’s behavior. The objective is to find an optimal policy $\pi^{*}$ that maximizes the expected cumulative reward over time, considering a discount factor $\gamma$ that weighs the trade-off between the significance of immediate and future rewards on the optimal policy.
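As a small numerical illustration of how the discount factor $\gamma$ weighs immediate against future rewards, the snippet below accumulates a discounted return for a short sequence of rewards; the reward values are arbitrary placeholders.

```python
# Discounted return G = R_1 + gamma*R_2 + gamma^2*R_3 + ... for a finite episode.
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):   # accumulate from the last step backward
        g = r + gamma * g
    return g

episode_rewards = [-66.0, -55.0, -47.5]     # arbitrary illustrative reward values
print(discounted_return(episode_rewards))   # -66 + 0.9*(-55) + 0.81*(-47.5)
```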

There have been numerous attempts to employ DL for photonics modeling. We divide candidate approaches into data-driven ones, which follow standard data-driven training of neural networks and include the canonical fully connected neural networks (FCNNs) and convolutional neural networks (CNNs), neural operators (NEs),47 and deep operator networks (DeepONets),48 and data-free ones, such as physics-informed neural networks (PINNs) and physics-informed operator networks. The data-free approach has the advantage of not relying on extensive datasets gathered via the use of numerical simulators or laborious experiments but has the downside of being more unstable and typically slower during training. This comes from the lack of a fixed target that the NN is trying to approximate for a given input. We propose a data-free methodology based on PINNs to carry out forward modeling and perform modal analysis of waveguides. The proposed method is illustrated on slab waveguides and compared against the known semi-analytic solution and an FD-based solver.

Alagappan and Png49 pursued a methodology to forecast the Hy component of the transverse electric (TE) mode in a buried channel waveguide. Employing FD as the primary numerical solver, they generated multiple datasets, the largest containing 315 instances covering various device parameters, including the refractive index of the core region (nc ∈ [1.5, 3.8]) and the dimensions of the core (w ∈ [0.1, 1.2] μm and h ∈ [0.1, 0.5] μm), with silica (n = 1.45) serving as the cladding material. Utilizing symmetric devices, only a quarter of the simulated output image was predicted during training. Two architectures, an FCNN and a recurrent neural network (RNN), were deployed to address this task. Comparing their results revealed an advantage for the RNN in capturing the relationship between the predicted samples. In addition, the authors investigated the impact of variations in the training set size, as well as the number of layers and neurons within the architecture.

In a subsequent study by the same authors, a different approach was taken, employing convolutional (or more precisely, deconvolutional) neural networks.50 This resulted in a flexible architecture allowing for scalable output resolution adjustment through the addition or removal of scaling blocks, each comprising an upsampling layer, a two-dimensional convolution, and a rectified linear unit (ReLU) activation. Transfer learning facilitated rapid architecture training for finer discretizations.

Inspired by the success of CNNs in image processing, Sajedian et al. proposed an architecture combining CNNs and RNNs to predict the absorption spectra of plasmonic nanostructures.51 Devices were assumed to be uniform in one dimension, hence the architecture received its input as a two-dimensional image of the modeled nanostructure. The authors acquired training data via simulation using a finite-difference time-domain (FDTD) solver and artificially created a dataset of 100 000 examples. The examples were of devices with certain parameters fixed, such as constituent materials and source polarization, and other variables including the number of shapes in each structure and their dimensions. Upon training, the model showed good agreement with numerical simulations while being significantly faster to run.

In all the approaches considered, example data consisting of device parameters and their corresponding target responses were necessary to train each model. These data were generated using numerical solvers, indicating each method’s reliance on traditional solvers. Such reliance becomes a significant limitation when aiming to model complex devices with intricate geometries across all spatial dimensions. Traditional solvers are known to scale poorly with increasing problem dimensions and accuracy requirements, necessitating vast computational resources and prohibitively long durations for each simulation—thousands of which are needed for AI models. Our work addresses these challenges via PINNs as a modeling strategy, which requires no example data for its operation.

We leverage PINNs to perform transverse electric (TE) mode analysis for slab waveguides, as shown in Fig. 5. Finding the modes and their corresponding effective indices (EIs) for slab waveguides translates to solving the Helmholtz equation, normalized with respect to the wavenumber, given by
$\frac{d^2\phi}{d\tilde{x}^2} + n^2(\tilde{x})\,\phi = n_{\mathrm{eff}}^2\,\phi, \quad (7)$
where $\tilde{x}$ is the normalized spatial dimension, $\phi = E_y$ is the $y$-directed component of the electric field, $n(\tilde{x})$ denotes the refractive index profile, and $n_{\mathrm{eff}}$ is the EI of the structure. The solution $E_y$ and its first derivative $dE_y/d\tilde{x}$ are required to be continuous across interfaces as per the continuity conditions imposed by the remainder of Maxwell’s equations. We define the differential operator for the problem as $\mathcal{B}\phi = d^2\phi/d\tilde{x}^2 + n^2(\tilde{x})\,\phi$, so that the Rayleigh quotient $R$ corresponds directly to the learned EI. Norms and inner products are approximated via sampling, such that
$\langle u, v\rangle \approx \frac{1}{N}\sum_{i=1}^{N} u(\tilde{x}_i)\,v(\tilde{x}_i), \quad (8a)$
$\|u\|^2 \approx \frac{1}{N}\sum_{i=1}^{N} u(\tilde{x}_i)^2. \quad (8b)$
FIG. 5.

A diagram for a slab waveguide.


Approximating the Rayleigh quotient could be achieved with uniform or nonuniform sample points; we adopt uniform sampling for our initial experiments and then adopt nonuniform schemes later on.
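As an illustration of the sampled approximations in Eq. (8), the sketch below evaluates the Rayleigh quotient of the operator $\mathcal{B}$ on a point cloud for a three-layer slab profile. The index values and the placeholder field are illustrative only, a numerical second derivative stands in for automatic differentiation, and the quotient is read here as $n_{\mathrm{eff}}^2$ for the operator as written, with its square root reported as the effective index.

```python
import numpy as np

def inner(u, v):
    # Sampled approximation of the inner product <u, v> over the point cloud [Eq. (8)].
    return np.mean(u * v)

def rayleigh_quotient(phi, B_phi):
    return inner(phi, B_phi) / inner(phi, phi)

x = np.linspace(-4.0, 4.0, 2001)                                   # normalized coordinate x~
n = np.where(np.abs(x) < 1.0, 1.3, np.where(x < 0.0, 1.25, 1.2))   # core / substrate / cladding
phi = np.exp(-x ** 2)                                              # placeholder field, not a true mode
d2phi = np.gradient(np.gradient(phi, x), x)                        # numerical second derivative
B_phi = d2phi + n ** 2 * phi                                       # B phi = phi'' + n^2 phi
n_eff = np.sqrt(rayleigh_quotient(phi, B_phi))                     # eigenvalue of Eq. (7) is n_eff^2 here
print(n_eff)
```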

For all experiments, we utilized an NN with three layers and 128 neurons per hidden layer, using a radial basis function52 as the activation for the first two layers, for its rapid convergence properties, and an identity mapping for the last layer. Optimization is carried out via the infinity-norm variant of the Adam optimizer,53 referred to as Adamax. The training is implemented using ten thousand optimization steps, with a learning rate of 0.01, decreased by a factor of 0.1 every one thousand iterations for the first three thousand iterations. The PINN training and the finite difference method (FDM) modeling were implemented through the free-tier Google Colab environment, which had an Intel Xeon central processing unit (CPU) @ 2.20 GHz, 13 GB of RAM, a Tesla K80 accelerator, and 12 GB of GDDR5 VRAM.
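For reference, this configuration can be set up in PyTorch as follows; the Gaussian form of the radial-basis activation is an assumption (the exact radial function is not reproduced here), and the loss assembly is left as a placeholder pointing to the earlier EVP sketch.

```python
import torch
import torch.nn as nn

class RBFActivation(nn.Module):
    # Radial-basis activation; a unit-width Gaussian is assumed for this sketch.
    def forward(self, x):
        return torch.exp(-x ** 2)

model = nn.Sequential(
    nn.Linear(1, 128), RBFActivation(),     # hidden layer 1
    nn.Linear(128, 128), RBFActivation(),   # hidden layer 2
    nn.Linear(128, 1),                      # identity-mapped output layer
).double()                                  # 64-bit precision, as in the comparisons below

optimizer = torch.optim.Adamax(model.parameters(), lr=0.01)
# Decay the learning rate by a factor of 0.1 every 1000 steps for the first 3000 iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[1000, 2000, 3000], gamma=0.1)

for step in range(10_000):
    optimizer.zero_grad()
    # loss = evp_loss(model, sample_points)  # Eq. (6), assembled as in the earlier EVP sketch
    # loss.backward()
    optimizer.step()
    scheduler.step()
```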

The NNs are trained to produce multiple propagating modes for several devices with varying core widths and refractive index (RI) profiles. The results are compared against the semi-analytical solution for the problem, consisting of the eigenmodes and their corresponding effective indices, in addition to a finite differences (FD) solver. We explore the performance of our model in three aspects. First, we show qualitative results of the learned modes against the analytical ones and graph the convergence to the exact EIs to verify the effectiveness of the learning process. We then quantitatively tabulate the model's performance, comparing its relative error to that of FD in a setting where the network trains on the same sample points used for the FD domain discretization. Afterward, we explore the flexibility of the network, showing how its performance can be greatly improved simply by adopting better choices of training points. Both the FD and NN solvers operated using 64-bit precision to ensure identical computation precision.

Figure 6 plots the learned principal and second-order modes of a test guide, showing that the predicted modes are in excellent agreement with the theoretical ones over the entire problem domain. The learned functions are smooth, and training produces no visual artifacts. For the device shown, we graph the convergence to the exact EIs as optimization proceeds. As shown in Fig. 7, predictions are stable past two thousand iterations. Even though the EI estimates rapidly converge, the mode errors continue to marginally improve as training advances, as shown in Table I.

FIG. 6.

Comparison between the predicted and exact (a) fundamental modes and (b) second-order modes for a slab waveguide with nco = 1.3, nsub = 1.25, and ncl = 1.2.

FIG. 7.

Estimated and exact effective indices for a slab waveguide with nco = 1.3, nsub = 1.25, and ncl = 1.2.


In addition, to demonstrate the effectiveness of the proposed method, we tabulate its percentage error against the exact results, for both the eigenmodes and effective indices, compared to the FD solver operating with a step size of λ/50. As Table II presents, comparable eigenmode errors were obtained for both methods, with the NN giving slightly better performance in most cases. The EI estimates, however, were better with FD; we found that the EI estimates from NNs improved when the NNs were used to predict a single mode in each training sweep. The first two rows of Table III confirm that both approaches give very similar errors for a test device when the NN is trained to predict only the principal mode. Unlike FD, NNs have various parameters whose choice could affect their performance for every case. In our experiments, however, we chose to fix the optimization parameters to maintain uniformity and equity across all the results obtained. The results for the NN were obtained by training on the same sample points as those for FD. This comparison, however, overlooks two important advantages of utilizing NNs, namely, the flexibility in choosing the training points and the continuity of the network. We show next how these can immensely influence the quality of the learned solutions by considering different sampling mechanisms for learning and EI estimation.

TABLE I.

Relative percentage errors vs training iterations for the fundamental and the second modes of a slab waveguide with nco = 1.3, nsub = 1.25, and ncl = 1.2. Boldface denotes that the PINN solver reached stability at the corresponding number of iterations.

No. of iterations | Fundamental mode error (%) | Second mode error (%)
1000 | 3.98 | 2.18
2000 | 0.43 | 2.08
3000 | 0.41 | 2.00
4000 | 0.41 | 1.99
5000 | 0.41 | 1.99
6000 | 0.41 | 1.99
7000 | 0.41 | 1.99
8000 | 0.41 | 1.99
9000 | 0.41 | 1.99
TABLE II.

Relative percentage error for the eigenmodes and effective indices of the NN-based approach and FD for various devices against the exact result.

Device | Mode | Eigenmode %err (NN) | Eigenmode %err (FD) | Exact EI | NN EI | NN %err | FD EI | FD %err
ncl = 1.3, nco = 1.6, nsub = 1.4 | Fundamental | 0.677 28 | 0.682 64 | 1.586 15 | 1.585 679 | −0.029 7 | 1.585 921 | −0.014 44
ncl = 1.3, nco = 1.6, nsub = 1.4 | Second | 1.077 72 | 1.121 16 | 1.544 509 | 1.542 63 | −0.121 69 | 1.543 603 | −0.058 69
ncl = 1.2, nco = 1.3, nsub = 1.25 | Fundamental | 0.411 83 | 0.441 68 | 1.287 61 | 1.287 257 | −0.027 43 | 1.287 436 | −0.013 48
ncl = 1.2, nco = 1.3, nsub = 1.25 | Second | 1.863 05 | 1.894 12 | 1.254 723 | 1.253 778 | −0.075 28 | 1.254 255 | −0.037 26
ncl = 2.4, nco = 2.55, nsub = 2.45 | Fundamental | 0.648 69 | 0.659 14 | 2.541 581 | 2.541 299 | −0.011 09 | 2.541 444 | −0.005 39
ncl = 2.4, nco = 2.55, nsub = 2.45 | Second | 1.025 57 | 1.058 52 | 2.515 557 | 2.516 65 | −0.043 46 | 2.516 122 | −0.021 01
ncl = 1, nco = 3.2, nsub = 3.1 | Fundamental | 1.142 41 | 0.748 7 | 3.192 313 | 3.192 042 | −0.008 5 | 3.192 184 | −0.004 05
ncl = 1, nco = 3.2, nsub = 3.1 | Second | 2.010 12 | 1.275 54 | 3.169 402 | 3.168 33 | −0.033 8 | 3.168 897 | −0.015 92

The sampling strategy for the training points is considered first. Aside from uniform sampling, we consider a non-uniform strategy that prioritizes the core region and the parts immediately above and below it, in what we refer to as stepped uniform sampling. Moreover, we consider stochastic sampling techniques drawing samples from a uniform distribution of points in the domain, as well as from a normal distribution, where only in-domain points are considered. Table III shows that, for a single device, better sampling choices lead to better performance. In this experiment, the model achieves lower errors in both the mode and EI estimates with normally distributed sample points. In both the stepped and the normally distributed sampling schemes, more points are drawn inside and near the core than in the cladding or substrate, prioritizing prediction in the core region, where we know the field fluctuates the most. Testing this on another device with several numbers of samples yields the same conclusion, as Table IV presents. The results show an advantage for our method over all discretizations while using the same number of points. This advantage is most apparent for coarse discretizations, where our model achieves almost half the error percentage without using additional point samples and with no complex sampling strategies. More complex strategies employing adaptive sampling based on the physics residual,54 as well as others that utilize self-adaptive weights of the per-sample loss,55 could be adopted and might improve the results even further, but they are not attempted in this work.

TABLE III.

Relative percentage error for the principal mode and its corresponding EI for a device with nco = 1.6, ncl = 1.3, and nsub = 1.5.

Methodology | Mode error | EI relative error
FD | 1.238 16 | −0.025 6
Uniform | 1.262 73 | −0.026 26
Random normal | 0.756 66 | −0.011 98
Stepped | 0.797 28 | −0.005 27
Random uniform | 0.847 57 | −0.010 42
TABLE IV.

Relative percentage error for FD and an NN with normally distributed learning samples for the principal mode of a device with nco = 2.1, ncl = 1.9, and nsub = 2.

No. of points | FD | PINN + R.N. points
349 | 1.281 42 | 0.692 72
699 | 0.646 95 | 0.557 62
1049 | 0.432 77 | 0.408 16
1399 | 0.325 14 | 0.324 32
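A minimal sketch of how the four sampling schemes compared in Table III can be generated is given below; the domain bounds, the core extent, the core-point fraction, and the width of the normal distribution are illustrative assumptions rather than the exact values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 349                          # number of sample points (as in Table IV's coarsest case)
x_min, x_max = -4.0, 4.0         # normalized solution domain (illustrative bounds)
core_lo, core_hi = -1.0, 1.0     # core region and its immediate neighborhood (illustrative)

def uniform():
    return np.linspace(x_min, x_max, N)

def stepped_uniform(frac_core=0.7):
    # Denser uniform grid inside/near the core, coarser grid elsewhere.
    n_core = int(frac_core * N)
    n_side = (N - n_core) // 2
    left = np.linspace(x_min, core_lo, n_side, endpoint=False)
    inner = np.linspace(core_lo, core_hi, n_core)
    right = np.linspace(core_hi, x_max, N - n_core - n_side)
    return np.concatenate([left, inner, right])

def random_uniform():
    return np.sort(rng.uniform(x_min, x_max, N))

def random_normal(sigma=1.5):
    # Normally distributed samples centered on the core; out-of-domain draws are discarded.
    pts = rng.normal(0.0, sigma, 4 * N)
    pts = pts[(pts >= x_min) & (pts <= x_max)]
    return np.sort(pts[:N])
```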

While the PINNs approach for forward modeling is very promising owing to its flexibility and meshless nature, which make it well-suited for problems with complex geometries while operating without access to training examples, there are some challenges to its use. Unlike traditional deep learning techniques, which require a single training loop and can then be used quickly in inference mode for predictions in configurations different from those they were trained on, PINNs require retraining for every new device configuration, adding time complexity to the method. While it is typical for known simulation methods such as FD or the finite element method (FEM) to reconstruct the problem setting for each new configuration, this is uncommon for deep learning methods. Modifying PINNs to accelerate retraining or avoid it altogether would pave the way for them to become the standard for modeling and simulations. In addition, while the convergence theory of numerical simulators is well established, little of the sort exists for deep learning-based solvers, highlighting the need to analyze and quantify their stability and convergence properties.

Traditionally, inverse design56 allows for the automatic exploration of design spaces, leading to optimized device configurations and tailored optical properties that may not be intuitively derived through traditional trial-and-error methods. For instance, researchers usually employ algorithms such as genetic algorithms, PSO,57 and topological optimization methods58,59 to search for optimal device parameters that maximize or minimize specified performance metrics, such as efficiency, bandwidth,59 dispersion,57 polarization response,60 or any other desired characteristics, enabling researchers to achieve enhanced performance and uncover new functionalities.

Lin et al.58 proposed a novel approach based on topology optimization that enables the automatic discovery of wavelength-scale photonic structures, such as micro-posts and grating-slab cavities, for achieving high-efficiency second-harmonic generation. They produced structures with orders-of-magnitude enhancements in second-harmonic generation (SHG) efficiency compared to state-of-the-art photonic designs. Furthermore, Piggott et al.59 employed topological inverse design to develop a silicon wavelength demultiplexer capable of splitting 1300 and 1550 nm light from an input waveguide into two separate output waveguides and successfully fabricated the device. The fabricated device exhibited an excellent performance, including a low insertion loss of ∼2 dB, low crosstalk of less than −11 dB, and wide bandwidths exceeding 100 nm within a 2.8 × 2.8 µm2 area. In addition, Hameed et al.57 suggested meta-heuristic algorithms, such as PSO and central force optimization (CFO), for optimizing the dispersion of a standard photonic crystal fiber (PCF). The dispersion of the liquid crystal (LC) PCFs is within [0.10, −0.609] ps/km nm for the CFO and [0.1974, −0.7404] ps/km nm for the PSO between 1.25 and 1.6 μm. However, all these methods suffer some limitations that need to be resolved. These limitations include the computational complexity arising from exploring large design spaces and optimizing numerous parameters, which forces the researcher to intervene and constrain the design process, ultimately leading to sub-optimal designs. Furthermore, these algorithms usually have relatively slow convergence rates, making them time-consuming and resource-intensive. Consequently, integrating AI techniques into inverse design became a necessity to overcome these limitations. For instance, AI algorithms can efficiently navigate vast design spaces, enabling faster convergence and significantly reducing the computational burden associated with traditional optimization techniques. In addition, AI algorithms can learn from large datasets and existing knowledge, allowing for the discovery of novel design solutions that may not be apparent through conventional approaches. Moreover, AI algorithms can incorporate multiple objectives and constraints, allowing the optimization of performance metrics while considering fabrication, cost, and other practical considerations.32 

Consequently, researchers leveraged AI-based techniques in inverse design. For example, researchers employed the training process of supervised DL methods to obtain the non-linear relationship between the device parameters and the desired output, so as to reach the optimal design. As an illustration, Peurifoy et al.61 employed a forward DL model and applied it in reverse to design a multi-layered core–shell nanoparticle. Initially, they trained their forward modeling NN to predict the optical response of various nanoparticle designs. Subsequently, they leveraged the trained NN by freezing its weights and utilizing it in reverse, inferring the optimal design parameters for the core–shell nanoparticle by feeding an optimum spectrum into the NN. In another intriguing study, Liu et al.62 introduced a tandem NN architecture that combined an inverse network with a pre-trained forward model. Their objective was to design a multilayer structure composed of SiO2 and Si3N4, where the thickness of each layer served as the design parameters. Initially, they trained the forward model, establishing the relationship between the design parameters and the corresponding transmission spectra. Once the forward model’s weights were fixed, they integrated the inverse design network into the architecture. To train their model, they assembled a dataset consisting of 500 000 labeled pairs of data, with an additional 50 000 pairs reserved for testing. The desired spectrum was used as input, and the network minimized the loss between the desired spectrum and the recovered spectrum, optimizing the design parameters through the inverse network. Even though DL has shown great potential in inverse design, as demonstrated, it also has certain limitations. One major challenge is the need for a large amount of labeled training data, which can be expensive and time-consuming to acquire. DL models can also deal only with simple structures, where a relationship between input and output can be mapped easily. In addition, DL models may struggle to explore unseen design spaces or novel configurations.

RL offers distinct advantages over traditional DL approaches in the context of inverse design. Unlike DL methods, which rely on labeled training data, RL operates through interactions with an environment, making it more suitable for navigating complex and unexplored design spaces. RL agents learn from trial and error, receiving feedback in the form of rewards or penalties based on their actions. At its core, RL revolves around an agent interacting with an environment, learning a policy that maximizes cumulative rewards over time.63 The actor–critic (A2C) RL approach combines value estimation and policy improvement, allowing for more refined learning and efficient decision-making, which is more convenient for inverse design.64 Consequently, we utilize the A2C RL inverse design approach to optimize a grating coupler. Figure 8 shows the optimization paradigm of the grating coupler.

FIG. 8.

Inverse design approach utilizing A2C RL for optimizing a grating coupler simulated through Lumerical FDTD.


The diagram showcases the key components involved in the process, including the environment where the FDTD simulation of the grating coupler takes place and the RL agent that handles the optimization process by learning to iteratively improve the design parameters of the grating coupler while interacting with the environment. Through this interaction, the RL agent, written in MATLAB, explores the design space to discover optimal configurations that enhance the performance of the grating coupler. In the subsequent paragraphs, we will provide a comprehensive overview of each component depicted in the diagram, elucidating their functionalities and roles.

Initially, the simulation environment is established through the utilization of Lumerical FDTD software, employing a 2D model configuration with perfectly matched layer (PML) boundary conditions. Figure 9 shows the overall 3D silicon-on-insulator (SOI) grating coupler structure alongside the simulated 2D structure. The geometry of the coupler incorporates the pitch of the grating (Λ), the width of the grating teeth (w), the etching depth inside the grating (ed), the tilt angle between the optical fiber and the grating (θ), the horizontal distance between the fiber and the waveguide connected to the end of the grating (X0), and the vertical distance between the fiber and the grating (X0 tan θ). For convenience, we define a new parameter, the duty cycle, as Dc = w/Λ. Next, we set ed = 100 nm and initialize Λ, Dc, θ, and X0 to 0.65 µm, 0.65, 13.45°, and 3 µm, respectively. With this initial design, we obtain a normalized transmittance of 0.39 at λ = 1550 nm from the optical fiber to the waveguide through the coupler.
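For clarity, the design parameterization used to drive the simulation environment can be captured in a small container such as the following; the class, its field names, and the chosen defaults simply restate the initial values above and are not part of the actual Lumerical/MATLAB implementation.

```python
from dataclasses import dataclass
import math

@dataclass
class GratingCouplerGeometry:
    pitch_um: float = 0.65        # grating pitch, Lambda
    duty_cycle: float = 0.65      # Dc = w / Lambda
    tilt_deg: float = 13.45       # fiber tilt angle theta
    x0_um: float = 3.0            # horizontal fiber-to-waveguide offset X0
    etch_depth_nm: float = 100.0  # fixed etch depth e_d

    @property
    def tooth_width_um(self) -> float:
        return self.duty_cycle * self.pitch_um                     # w = Dc * Lambda

    @property
    def fiber_height_um(self) -> float:
        return self.x0_um * math.tan(math.radians(self.tilt_deg))  # X0 * tan(theta)

g = GratingCouplerGeometry()
print(g.tooth_width_um, g.fiber_height_um)
```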

FIG. 9.

(a) 3D model of the grating coupler. (b) The corresponding 2D model provides a detailed representation of the geometrical parameters, materials employed, and boundary conditions utilized within the simulation environment.

Next, the reward function is calculated as follows:
(9)
where R reaches a maximum value of 0 when the transmittance reaches 1 and a minimum value of −100 when the transmittance reaches 0. Thus, the closer the reward is to 0, the better the design. Then, the structure geometry (system state) and the reward generated through the reward function are mapped into a discretized next-state action, producing a new design. The mapping algorithm is executed through an A2C agent. The A2C agent has two main components: the actor and the critic. The actor is responsible for selecting actions based on the current state of the environment, determining the agent’s behavior. This is achieved by mapping system states into actions through a policy function [π(A|St, θ)], which also depends on the actor NN parameters (θ). Thus, by updating θ and obtaining a new system state, a new action will occur. In contrast, the critic, with parameters ϕ, estimates the long-term discounted reward by mapping the system observations and the reward function to a value function [V(St; ϕ)].64 The training steps are as follows.
Initially, the actor and critic networks are initialized with random θ and ϕ values. Next, N experiences are generated by following the current policy, with a state, an action, and a future reward defined for each experience: St, At, Rt+1, St+1, …, St+N−1, At+N−1, Rt+N, St+N. Here, St is a state observation, At is an action taken according to this state, St+1 is the next state, and Rt+1 is the reward received for moving from St to St+1. Based on St, the agent computes the probability of taking each action through the policy function and then selects an action based on the probability distribution. Furthermore, the return Gt is defined as the sum of the collected rewards and the discounted value estimated by the critic. Using Eq. (9), we can obtain Gt as
(10)
where b is 0 if St+N is a terminal state and 1 otherwise, and γ is the discount factor set by the designer. Furthermore, the gradients of both θ and ϕ are calculated and used to update the initial θ and ϕ values, where the α and β coefficients are the learning rates of the actor and the critic, respectively. Using Eq. (10), we can define the advantage function Dt,
(11)
The gradients of the actor are accumulated following the policy gradient to maximize the advantage function. We can define the gradient of the actor network (dθ) using the advantage function (Dt) defined in Eq. (11),
(12)
The gradients of the critic are accumulated by minimizing the mean square error between the value function and the total return Gt. We can define the gradient of the critic network (dϕ) using Gt defined in Eq. (10),
(13)
Finally, the actor and critic parameters are updated through Eqs. (12) and (13),
(14)
(15)
This mathematical model was demonstrated previously in the literature.64 
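As a compact prototype of the update rules described above, the following PyTorch sketch performs one n-step advantage actor–critic update for a discrete-action agent. The network sizes, learning rates, and rollout interface are illustrative; our actual agent was implemented with MATLAB, so this is a sketch of the algorithm rather than of our code.

```python
import torch
import torch.nn as nn

n_params, n_actions = 4, 3 ** 4          # four design parameters -> 81 discrete actions
actor = nn.Sequential(nn.Linear(n_params, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(n_params, 64), nn.ReLU(), nn.Linear(64, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)    # learning rate alpha
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)  # learning rate beta

def a2c_update(states, actions, rewards, next_state, done, gamma=0.9):
    # states: (N, n_params) float, actions: (N,) long, rewards: list of N floats,
    # next_state: (1, n_params) float, done: bool -- one N-step rollout.
    with torch.no_grad():
        bootstrap = 0.0 if done else critic(next_state).item()   # b * V(S_{t+N}; phi)
    returns, g = [], bootstrap
    for r in reversed(rewards):                   # G_t = R_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns))).unsqueeze(1)

    values = critic(states)                       # value function V(S_t; phi)
    advantage = returns - values                  # advantage D_t

    log_prob = torch.log_softmax(actor(states), dim=1).gather(1, actions.unsqueeze(1))
    actor_loss = -(log_prob * advantage.detach()).mean()   # policy-gradient term for the actor
    critic_loss = advantage.pow(2).mean()                   # value regression for the critic

    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()      # update theta
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()   # update phi
```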

Following the update, the actor NN generates a discrete action, with the final layer ensuring that the probabilities of all output neurons sum up to one. The selected action is represented by the neuron with the highest probability in the output layer of the actor NN. This approach guarantees a clear and deterministic decision-making process within the A2C framework. Next, each neuron is mapped to an action vector that takes a discrete step in the n-dimensional solution space. For instance, our problem has four independent parameters; thus, the solution space is four-dimensional, and the action vector will have four values, one for each parameter. Each value can be 0, 1, or 2, to reduce, hold, or increase the corresponding parameter value, respectively, by a certain predefined step; hence, the model obtains a new geometry for the structure. This predefined step is selected to force the model to obtain designs that can be fabricated. Therefore, the output layer of the actor NN must have 3^n neurons, where n is the number of parameters (see the sketch after Table V). Table V presents the hyperparameters used for this training. The model was trained for a total of 40 steps. The initial parameters were random. The model achieved the optimal design after only 14 steps, which took 5 min and 20 s. The A2C-RL training and the PSO algorithm were implemented through a Dell Inspiron 5584 laptop with an Intel Core i7-8565U, 16 GB DDR4 RAM, a 2 TB HDD, and an NVidia MX130 GPU with 4 GB of VRAM.

TABLE V.

Hyperparameters of the A2C RL model.

Hyperparameters | Symbol | Value
Learning rate | α = β | 0.001
Gradient threshold | g |
Discount factor | γ | 0.9
Actor input layer no. | − |
Actor output layer no. | − | 81
Critic input layer no. | − |
Critic output layer no. | − |
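The action decoding and reward bookkeeping described above can be prototyped as follows; the per-parameter step sizes are illustrative placeholders, and the linear form of the reward is our assumption, chosen only to match the stated endpoints of Eq. (9) (R = 0 at T = 1 and R = −100 at T = 0).

```python
import numpy as np

# Parameters ordered as [Lambda (um), Dc, theta (deg), X0 (um)]; step sizes are placeholders.
STEP = np.array([0.005, 0.01, 0.05, 0.05])

def decode_action(index: int, n_params: int = 4) -> np.ndarray:
    # Map one output-neuron index (0 .. 3**n - 1) to a base-3 digit vector whose
    # entries 0/1/2 mean decrease/hold/increase the corresponding parameter.
    digits = []
    for _ in range(n_params):
        digits.append(index % 3)
        index //= 3
    return np.array(digits)

def apply_action(params: np.ndarray, action_index: int) -> np.ndarray:
    direction = decode_action(action_index) - 1          # -1, 0, or +1 per parameter
    return params + direction * STEP

def reward(transmittance: float) -> float:
    # Assumed linear mapping consistent with the stated endpoints of Eq. (9).
    return 100.0 * (transmittance - 1.0)

params = np.array([0.65, 0.65, 13.45, 3.0])   # initial design [Lambda, Dc, theta, X0]
params = apply_action(params, action_index=53)
print(params, reward(0.39))
```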

Figure 10 shows the training progress, where the reward varies with each iteration (episode). The exploration–exploitation trade-off dominated the optimization performance within the environment. Initially, as the reward dropped from −61 to −66, the model began exploring the environment more extensively. This exploration allowed the model to gather information about different actions and their corresponding rewards. Once the model gained some knowledge, it started exploiting this knowledge to make more informed decisions, resulting in an improvement in the reward from −66 to −47.5. However, to avoid getting stuck in a local maximum and potentially missing a global maximum, the model resumed exploration. Consequently, the reward temporarily dropped to −80 as the model sought alternative solutions. Eventually, the model discovered a new local maximum of −55, demonstrating that the initial maximum was indeed the global one and that the exploration phase was crucial in confirming the true global maximum.

FIG. 10.

Training diagram of the A2C RL model mapping the reward value to the number of iterations in comparison with the performance of the built-in PSO Lumerical optimizer.


The final design is presented in Table VI. The design is compared to the initial design and a design generated by the PSO-embedded optimizer in Lumerical FDTD. The A2C RL model outshines the PSO optimizer in several aspects. The A2C RL model surpasses the PSO’s limitation of handling only two parameters by effectively optimizing four parameters. Furthermore, the A2C RL model exhibits faster convergence, requiring less than half the optimization time of the PSO optimizer. In addition, the A2C RL model continuously explores the design space and discovers more promising solutions, while the PSO optimizer often suffers from premature convergence, limiting its ability to explore alternative regions in the solution space. These advantages highlight the power of the A2C RL model over the PSO optimizer and other similar optimizers in inverse design, as expected. However, A2C-RL encounters limitations in inverse design, primarily due to the high-dimensional action space and the stochastic nature of the model, necessitating significant computational resources and time for training and evaluation. Furthermore, the model’s restricted integration of domain knowledge makes it highly reliant on the defined reward function as its sole source of learning, rendering it sensitive to the quality and effectiveness of the reward function.

TABLE VI.

Comparison between the A2C RL model and the PSO optimizer.

Design variables | Initial design | PSO design | A2C RL design
θ (deg) | 13.45 | 13.45 (fixed) | 13.25
X0 (μm) | 3 | 3 (fixed) | 3.15
Λ (μm) | 0.65 | 0.6512 | 0.655
Dc | 0.65 | 0.756 | 0.69
T at 1550 nm | 0.39 | 0.498 | 0.5247
No. of iterations | ⋯ | 30 | 14
Enhancement percentage (%) | ⋯ | 27 | 34

The presented studies, for forward modeling and inverse design, reveal a highly synergistic relationship between the needs and premises of these two fields. Whether it is estimating the non-trivial optical response of a device based on its characteristics or navigating the complex search space of parameters to design a device with desirable properties, AI has proven its efficacy in both areas. For forward modeling, DL models still require large datasets for training, despite their rapid prediction capability once trained, imposing a burden on their development due to the reliance on other numerical simulators or laborious experiments for data acquisition. We demonstrated that this burden can be alleviated via PINNs, which approach the problem in a data-free manner and offer advantages such as continuity and a meshless nature compared to traditional solvers such as FD methods. However, the training time for these models remains a challenge. A model that combines these advantages with faster training times, or one that does not require retraining for each new simulation, would be highly beneficial for photonics applications. A promising candidate for such a model could be physics-informed neural operators (PINOs),65 which solve entire families of differential equations. Thus, with the advent of large reliable AI models, developing an accurate, instant solver for cross-domain analysis (Maxwell’s and Schrödinger’s equations) seems possible. This breakthrough would herald a new era in quantum photonics modeling, especially for those required for space applications. This paradigm can also deepen our understanding of the physics of quantum devices, paving the way for new applications. In addition, in the realm of inverse design, RL is poised to transform photonic optimization as a versatile and powerful optimization tool. It offers crucial advantages, such as flexible adaptive exploration and efficient optimization in high-dimensional spaces. It can also rapidly discover unconventional innovative solutions surpassing traditional optimization methods. Therefore, RL has the potential to revolutionize the field by navigating and exploring the vast design space of photonic components and systems, leading to unprecedented performance enhancements in integrated photonics, quantum optics, and optical sensing. This can lead to a new RL-based General Photonic Optimizer (GPO) that can optimize the performance of any photonic device. In addition, this can be extended by using PINOs as the GPO environment, thus creating an all-AI Photonics Design and Exploration Expert (PhoDeX-AI). Figure 11 shows the envisioned structure of the PhoDeX-AI tool. The tool could have an additional large language model (LLM), such as a Generative Pre-trained Transformer (GPT),66 module to facilitate communication with the user.

FIG. 11.

Envisioned AI Photonics Design and Exploration Expert (PhoDeX-AI) software tool block diagram. The tool consists of physics-informed neural operators (PINOs), which mimic the behavior of photonic devices, an RL optimizer that globally optimizes the performance of the photonic device, and a large language model (LLM), such as a Generative Pre-trained Transformer (GPT), to communicate with the users.


In this paper, we explored the use of AI in photonics, focusing on forward modeling and inverse design. For forward modeling, we emphasized the potential of data-free methods, particularly PINNs, which enable accurate modal analysis of waveguides without relying on extensive datasets. Our PINN-based approach demonstrated superior accuracy in predicting the effective indices and eigenmodes of slab waveguides compared to a traditional finite-difference solver; owing to its continuous nature, it can both train on user-defined point clouds and estimate effective indices anywhere in the solution domain. In this modal analysis, based on 349 analysis points, the PINN approach achieved a relative percentage error of 0.69272%, lower than the 1.28142% of the FD method.
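As a point of reference, the error metric quoted above can be computed as in the short sketch below, assuming it is the mean absolute relative error of a solver’s predictions with respect to the analytical reference over the analysis points; the arrays are illustrative placeholders, not the actual study data.

import numpy as np

def relative_percentage_error(predicted, analytical):
    """Mean |predicted - analytical| / |analytical|, expressed in percent."""
    predicted, analytical = np.asarray(predicted), np.asarray(analytical)
    return float(np.mean(np.abs(predicted - analytical) / np.abs(analytical)) * 100.0)

# Hypothetical solver outputs at a few analysis points.
analytical = np.array([1.4502, 1.4617, 1.4731])
pinn_prediction = np.array([1.4498, 1.4620, 1.4725])
print(relative_percentage_error(pinn_prediction, analytical))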

In addition, in the realm of inverse design, we highlighted the shift from traditional optimization techniques, such as genetic algorithms and topology optimization, toward AI-driven approaches. AI techniques, especially RL, offer significant advantages, as they efficiently navigate vast design spaces and reduce computational burdens. In line with this conclusion, we configured a novel A2C-RL inverse design approach, which excels in scenarios with scarce labeled data and complex design spaces, and employed it to optimize a grating coupler, demonstrating the power of RL in photonics design. The RL-based approach produced a 34% enhancement over the initial design in just 14 iterations, significantly outperforming the PSO method, which achieved only a 27% enhancement and required 30 iterations to do so. Our studies culminated in a discussion of the current limitations of both tasks, possible improvements, and our vision for what could be possible within this interplay between AI and photonics.
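For readers unfamiliar with the mechanics, the following is a minimal, single-state Python sketch of the advantage actor–critic (A2C) update applied to a toy design variable: the actor samples a small geometry tweak, a placeholder transmittance function supplies the reward, and the critic’s value estimate converts that reward into an advantage. The environment, action set, and hyperparameters are illustrative assumptions, not the configuration used in this work.

import numpy as np

rng = np.random.default_rng(1)
actions = np.array([-0.02, 0.0, 0.02])   # candidate geometry tweaks (arbitrary units)
logits = np.zeros(len(actions))          # actor parameters (softmax policy)
value = 0.0                              # critic's value estimate for the single state
geometry = 0.10                          # toy design parameter; optimum near 0.30
lr_actor, lr_critic = 0.5, 0.2

def transmittance(g):
    """Toy figure of merit peaking at g = 0.30 (placeholder for a simulator)."""
    return float(np.exp(-((g - 0.30) ** 2) / 0.01))

for step in range(200):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(len(actions), p=probs)
    reward = transmittance(geometry + actions[a]) - transmittance(geometry)
    advantage = reward - value                   # A = r - V(s)
    grad_log_pi = -probs                         # d log pi(a|s) / d logits
    grad_log_pi[a] += 1.0
    logits += lr_actor * advantage * grad_log_pi # actor update
    value += lr_critic * (reward - value)        # critic update
    geometry += actions[a]                       # apply the chosen tweak

print("final geometry:", round(geometry, 3), "transmittance:", round(transmittance(geometry), 3))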

The authors gratefully acknowledge the funding received from the ASRT under the project titled “National Nanotechnology Lab” and the Scientists of Next Generation (FRM-SGO-CYCL#8) Grant. The authors also acknowledge the financial support from the STDF, Project No. 45702.

The authors have no conflicts to disclose.

M.G.M. and A.S.H. contributed equally to this work.

Mohamed G. Mahmoud: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Amr S. Hares: Conceptualization (supporting); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Mohamed Farhat O. Hameed: Conceptualization (supporting); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). M. S. El-Azab: Project administration (equal); Supervision (equal); Writing – review & editing (equal). Salah S. A. Obayya: Conceptualization (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available in order to allow for controlled and regulated distribution, ensuring that future studies comply with the ethical considerations associated with this work, given the ethically sensitive nature of AI.

1. W. Shi, Y. Tian, and A. Gervais, “Scaling capacity of fiber-optic transmission systems via silicon photonics,” Nanophotonics 9, 4629–4663 (2020).
2. J. Witzens, “High-speed silicon photonics modulators,” Proc. IEEE 106, 2158–2182 (2018).
3. S. Chen, Y. Shi, S. He, and D. Dai, “Compact monolithically-integrated hybrid (de)multiplexer based on silicon-on-insulator nanowires for PDM-WDM systems,” Opt. Express 23, 12840 (2015).
4. B. M. Younis, A. M. Heikal, M. F. O. Hameed, and S. S. A. Obayya, “Highly wavelength-selective asymmetric dual-core liquid photonic crystal fiber polarization splitter,” J. Opt. Soc. Am. B 35, 1020–1029 (2018).
5. N. L. Kazanskiy, S. N. Khonina, and M. A. Butt, “Recent development in metasurfaces: A focus on sensing applications,” Nanomaterials 13, 118 (2023).
6. A. A. Zhirnov, G. Y. Chesnokov, K. V. Stepanov, T. V. Gritsenko, R. I. Khan, K. I. Koshelev, A. O. Chernutsky, C. Svelto, A. B. Pnev, and O. V. Valba, “Fiber-optic telecommunication network wells monitoring by phase-sensitive optical time-domain reflectometer with disturbance recognition,” Sensors 23, 4978 (2023).
7. M. Y. Azab, M. F. O. Hameed, and S. S. A. Obayya, “Overview of optical biosensors for early cancer detection: Fundamentals, applications and future perspectives,” Biology 12, 232 (2023).
8. P. Zheng, X. Xu, G. Hu, R. Zhang, B. Yun, and Y. Cui, “Integrated multi-functional optical filter based on a self-coupled microring resonator assisted MZI structure,” J. Lightwave Technol. 39, 1429–1437 (2021).
9. L. Zhu, J. Sun, and Y. Zhou, “Silicon-based wavelength division multiplexer using asymmetric grating-assisted couplers,” Opt. Express 27, 23234–23249 (2019).
10. Y. K. A. Alrayk, B. M. Younis, W. S. El Deeb, M. F. O. Hameed, and S. S. A. Obayya, “MIR optical modulator based on silicon-on-calcium fluoride platform with VO2 material,” Opt. Quantum Electron. 53, 559 (2021).
11. M. Hussein, M. F. O. Hameed, S. S. A. Obayya, and M. A. Swillam, “Effective modeling of silicon nanowire solar cells,” in 2017 International Applied Computational Electromagnetics Society Symposium–Italy (ACES) (IEEE, 2017), pp. 1–2.
12. M. F. O. Hameed, Y. K. A. Alrayk, and S. S. A. Obayya, “Self-calibration highly sensitive photonic crystal fiber biosensor,” IEEE Photonics J. 8, 1–12 (2016).
13. F. Liu, B. Dong, and X. Liu, “Bio-inspired photonic structures: Prototypes, fabrications and devices,” in Optical Devices in Communication and Computation, edited by P. Xi (IntechOpen, Rijeka, 2012), Chap. 6.
14. M. E. Belkin, V. Golovin, Y. Tyschuk, M. G. Vasil’ev, and A. S. Sigov, “Computer-aided design of microwave-photonics-based RF circuits and systems,” in RF Systems, Circuits and Components, edited by M. B. I. Reaz and M. A. S. Bhuiyan (IntechOpen, Rijeka, 2018), Chap. 4.
15. D. Gostimirovic and W. N. Ye, “Automating photonic design with machine learning,” in 2018 IEEE 15th International Conference on Group IV Photonics (GFP) (IEEE, 2018), pp. 1–2.
16. A. Lowery, “Computer-aided photonics design,” IEEE Spectrum 34, 26–31 (1997).
17. A. J. Lowery and P. C. Gurney, “Two simulators for photonic computer-aided design,” Appl. Opt. 37, 6066–6077 (1998).
18. M. F. O. Hameed and S. S. A. Obayya, “Modal analysis of a novel soft glass photonic crystal fiber with liquid crystal core,” J. Lightwave Technol. 30, 96–102 (2012).
19. F. Ferranti, “Forward modeling for metamaterial design using feature-based machine learning,” Proc. SPIE 12130, 1213009 (2022).
20. S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vucković, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics 12, 659–670 (2018).
21. D. Pinto and S. S. A. Obayya, “Improved complex-envelope alternating-direction-implicit finite-difference-time-domain method for photonic-bandgap cavities,” J. Lightwave Technol. 25, 440–447 (2007).
22. S. Obayya, “Novel finite element analysis of optical waveguide discontinuity problems,” J. Lightwave Technol. 22, 1420–1425 (2004).
23. B. M. A. Rahman, D. M. H. Leung, S. S. A. Obayya, and K. T. V. Grattan, “Numerical analysis of bent waveguides: Bending loss, transmission loss, mode coupling, and polarization coupling,” Appl. Opt. 47, 2961–2970 (2008).
24. G. A. Vandenbosch and Z. Ma, “Upper bounds for the solar energy harvesting efficiency of nano-antennas,” Nano Energy 1, 494–502 (2012).
25. R. C. Loonen, S. de Vries, and F. Goia, “Inverse design for advanced building envelope materials, systems and operation,” in Rethinking Building Skins, edited by E. Gasparri, A. Brambilla, G. Lobaccaro, F. Goia, A. Andaloro, and A. Sangiorgio (Woodhead Publishing, 2022), pp. 377–402.
26. J. Leng, “Optimization techniques for structural design of cold-formed steel structures,” in Recent Trends in Cold-Formed Steel Construction, edited by C. Yu (Woodhead Publishing, 2016), pp. 129–151.
27. A. M. Hammond, A. Oskooi, S. G. Johnson, and S. E. Ralph, “Photonic topology optimization with semiconductor-foundry design-rule constraints,” Opt. Express 29, 23916–23938 (2021).
28. C. Ryan, “Evolutionary algorithms and metaheuristics,” in Encyclopedia of Physical Science and Technology, 3rd ed., edited by R. A. Meyers (Academic Press, New York, 2003), pp. 673–685.
29. H. Mahrous, M. Fedawy, M. Abboud, A. Shaker, W. Fikry, and M. Gad, “A multi-objective genetic algorithm approach for silicon photonics design,” Photonics 11, 80 (2024).
30. R. Shiratori, M. Nakata, K. Hayashi, and T. Baba, “Particle swarm optimization of silicon photonic crystal waveguide transition,” Opt. Lett. 46, 1904–1907 (2021).
31. M. F. O. Hameed, A.-K. Hassan, A. Elqenawy, and S. Obayya, “Modified trust region algorithm for dispersion optimization of photonic crystal fibers,” J. Lightwave Technol. 35, 3810–3818 (2017).
32. S. So, T. Badloe, J. Noh, J. Bravo-Abad, and J. Rho, “Deep learning enabled inverse design in nanophotonics,” Nanophotonics 9, 1041–1057 (2020).
33. I. H. Sarker, “Machine learning: Algorithms, real-world applications and research directions,” SN Comput. Sci. 2, 160 (2021).
34. J. W. Goodell, S. Kumar, W. M. Lim, and D. Pattnaik, “Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis,” J. Behav. Exp. Finance 32, 100577 (2021).
35. F. Pesapane, M. Codari, and F. Sardanelli, “Artificial intelligence in medical imaging: Threat or opportunity? Radiologists again at the forefront of innovation in medicine,” Eur. Radiol. Exp. 2, 35 (2018).
36. F. Fuchs, Y. Song, E. Kaufmann, D. Scaramuzza, and P. Dürr, “Super-human performance in Gran Turismo sport using deep reinforcement learning,” IEEE Rob. Autom. Lett. 6, 4257–4264 (2021).
37. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2015), pp. 1026–1034.
38. C. Lu and X. Tang, “Surpassing human-level face verification performance on LFW with GaussianFace,” in Proceedings of the AAAI Conference on Artificial Intelligence (PKP Publishing Services, 2015), Vol. 29.
39. A. K. Shakya, G. Pillai, and S. Chakrabarty, “Reinforcement learning algorithms: A brief survey,” Expert Syst. Appl. 231, 120495 (2023).
40. D. Woods and T. J. Naughton, “Photonic neural networks,” Nat. Phys. 8, 257–259 (2012).
41. C. Huang, V. J. Sorger, M. Miscuglio, M. Al-Qadasi, A. Mukherjee, L. Lampe, M. Nichols, A. N. Tait, T. Ferreira de Lima, B. A. Marquez et al., “Prospects and applications of photonic neural networks,” Adv. Phys.: X 7, 1981155 (2022).
42. M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” J. Comput. Phys. 378, 686–707 (2019).
43. D. Rumelhart, G. Hinton, and R. Williams, “Learning representations by back-propagating errors,” Nature 323, 533–536 (1986).
44. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks 2, 359–366 (1989).
45. I. Ben-Shaul, L. Bar, D. Fishelov, and N. Sochen, “Deep learning solution of the eigenvalue problem for differential operators,” Neural Comput. 35, 1100–1134 (2023).
46. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, 2018).
47. Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Neural operator: Graph kernel network for partial differential equations,” in Proceedings of the ICLR Workshop on Integration of Deep Neural Models and Differential Equations (OpenReview, 2020), Vol. 8.
48. L. Lu, P. Jin, and G. Karniadakis, “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators,” Nat. Mach. Intell. 3, 218–229 (2021).
49. G. Alagappan and C. E. Png, “Prediction of electromagnetic field patterns of optical waveguide using neural network,” Neural Comput. Appl. 33, 2195–2206 (2021).
50. G. Alagappan and C. Png, “Meshless optical mode solving using scalable deep deconvolutional neural network,” Sci. Rep. 13, 1078 (2023).
51. I. Sajedian, J. Kim, and J. Rho, “Finding the optical properties of plasmonic structures by image processing using a combination of convolutional neural networks and recurrent neural networks,” Microsyst. Nanoeng. 5, 27 (2019).
52. M. Hameed, S. Obayya, K. Al-Begain, A. Nasr, and M. Abo el Maaty, “Accurate radial basis function based neural network approach for analysis of photonic crystal fibers,” Opt. Quantum Electron. 40, 891–905 (2008).
53. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proceedings of the 3rd International Conference on Learning Representations, San Diego (DBLP, 2015), Vol. 3.
54. C. Wu, M. Zhu, Q. Tan, Y. Kartha, and L. Lu, “A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks,” Comput. Methods Appl. Mech. Eng. 403, 115671 (2023).
55. L. McClenny and U. Braga-Neto, “Self-adaptive physics-informed neural networks,” J. Comput. Phys. 474, 111722 (2023).
56. B. MacLellan, P. Roztocki, J. Belleville, L. Romero Cortés, K. Ruscitti, B. Fischer, J. Azaña, and R. Morandotti, “Inverse design of photonic systems,” Laser Photonics Rev. 18, 2300500 (2024).
57. M. Hameed, K. Mahmoud, and S. Obayya, “Metaheuristic algorithms for dispersion optimization of photonic crystal fibers,” Opt. Quantum Electron. 48, 127 (2016).
58. Z. Lin, X. Liang, M. Lončar, S. G. Johnson, and A. W. Rodriguez, “Cavity-enhanced second-harmonic generation via nonlinear-overlap optimization,” Optica 3, 233 (2016).
59. A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vučković, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics 9, 374 (2015).
60. W. Chen, R. Li, Z. Huang, H. Wu, J. Wei, S. Wang, L. Wang, and Y. Li, “Inverse design of polarization conversion metasurfaces by deep neural networks,” Appl. Opt. 62, 2048–2054 (2023).
61. J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljačić, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4, eaar4206 (2018).
62. D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics 5, 1365–1369 (2018).
63. S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (Pearson Education Limited, Malaysia, 2016).
64. V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of the 33rd International Conference on Machine Learning (PMLR, 2016), Vol. 48, pp. 1928–1937.
65. Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar, “Physics-informed neural operator for learning partial differential equations,” ACM/IMS J. Data Sci. 1(3), 1–27 (2024).
66. J. Gallifant, A. Fiske, Y. A. Levites Strekalova, J. S. Osorio-Valencia, R. Parke, R. Mwavu et al., “Peer review of GPT-4 technical report and systems card,” PLOS Digit. Health 3, e0000417 (2024).