Recent advances in artificial intelligence (AI) and computing technologies are disrupting the modeling and design paradigms in photonics. In this work, we present our perspective on the utilization of current AI models for photonic device modeling and design. First, through the physics-informed neural networks (PINNs) framework, we address modal analysis, offering a neural network-based solver and using it to predict propagating modes and their corresponding effective indices for slab waveguides. We compare our model’s predictions against theoretical benchmarks and a finite differences solver. Using 349 analysis points, the PINN approach had a relative percentage error of 0.69272% with respect to the analytical solution, compared to 1.28142% for the finite differences method, indicating that the PINN approach was more accurate in conducting modal analysis. Our model’s continuity over the entire solution domain enhances its performance and flexibility, and because it is guided by Maxwell’s equations, it requires no training data, setting it apart from most AI approaches. Our model design also enables simultaneous prediction of multiple modes over any specified intervals of effective indices. In addition, we present a novel reinforcement learning (RL)-based paradigm, employing an actor–critic model for inverse design. We use this paradigm to optimize the transmittance of a grating coupler by manipulating the device geometry. Our RL-based approach improved on the initial design by 34% in only 14 iterations, whereas the Particle Swarm Optimization (PSO) algorithm converged prematurely at a 27% enhancement after 30 iterations, indicating that our model navigates the design space more efficiently and reaches a better design than PSO.
Based on these approaches, we discuss the future of AI in photonics in forward modeling and inverse design and the untapped potential in bringing these worlds together.

## I. INTRODUCTION

From ultra-high-speed communications^{1–4} to medical sensing,^{5–7} photonic devices, structures, and systems have been a key enabler of phenomenal progress in many industries. Rooted in Maxwell’s equations, the interaction between light and matter has been extensively studied, leading to breakthroughs in information processing,^{8} optical communications,^{9,10} energy harvesting,^{11} and sensing.^{12}

Consequently, the design process of these photonic devices has been revolutionized. Initially, it relied on experience-based, physically inspired structures that sprang from studying Maxwell’s equations and their solutions.^{13} However, with the advent of computer-aided design (CAD) powered by numerical simulators and optimization procedures,^{14} there has been a transformation in how these devices are conceptualized and engineered.^{15} CAD plays a crucial role in shaping the landscape of photonics engineering, particularly in dealing with complex structures and multiple design parameters.^{16} While our understanding of physics and experimentation is valuable in its own right, translating this knowledge into specific device designs poses significant challenges, especially in intricate systems.^{17,18} This paradigm shift directed attention toward specifying objectives and selecting suitable parameter spaces, paving the way for two primary approaches: the forward-centered^{19} and the inverse-centered^{20} methodologies. The forward-centered approach to photonics design typically involves two steps: first, selecting a candidate device geometry and parameters that align with the design requirements, and second, modeling the device numerically by solving Maxwell’s equations to map its design parameters to its physical response.^{19} This numerical modeling is often carried out using methods such as finite differences (FD),^{21} finite element (FE),^{22,23} or the method of moments (MoM).^{24}

On the other hand, the inverse-centered approach directly addresses the challenge of mapping desired responses to device parameters.^{20} It enables researchers to specify the desired device functionalities and utilize computational solvers to automatically generate optimized device structures by formulating the problem in reverse, starting with a desired functionality and determining the optimal structure to achieve it.^{25} This approach typically involves employing optimization algorithms, which can be gradient-based methods^{26} such as topology optimization,^{27} evolutionary-based techniques^{28} such as genetic algorithms,^{29} or Particle Swarm Optimization (PSO).^{30} However, these methods have many limitations that need to be addressed, including the complexity of exploring large design spaces and optimizing numerous parameters. They also tend to suffer from premature convergence.^{31} Therefore, integrating artificial intelligence (AI) techniques into the design and modeling of photonic devices has become necessary to overcome these limitations.^{32}

AI is a computational approach aiming to develop computer systems that can perform tasks that typically require human intelligence. One of the most impactful approaches to realizing such systems is machine learning (ML). ML offers data-driven computational models that can automatically discover patterns and model relationships within a given set of data samples. It relies on combining statistical models and optimization algorithms in a computational model that iteratively analyzes data while refining its performance within the performed task.^{33} Currently, deep learning (DL), a subset of machine learning that leverages neural networks (NNs) as a computational model, is the most popular approach to AI. NNs can approximate any (continuous) relation between inputs and outputs via a network of interconnected simple units called neurons, which distribute the processing of a given input (a picture, text, or features of an object). NNs are compositional, such that a standard network is organized into layers that are applied consecutively until a final output is obtained. DL found its way to various industries and applications, ranging from stock market prediction to medical image analysis and object recognition.^{34,35} NN architectures are also employed within the reinforcement learning (RL)^{36} framework where an agent (the model) interacts with an environment to learn optimal actions through a trial-and-error process. This allows the RL model to overcome the hurdle of gathering a large dataset and preprocessing the gathered data, allowing for data-generative and real-time adaptive training.^{37–39}

This drove attention to exploit AI and DL capabilities for photonics modeling and design in an attempt to come up with innovative devices with superior performance. Conversely, photonics can serve AI by providing ultra-fast, low-energy hardware realizations for DL models. Integrated photonics is particularly promising for building hardware for artificial neural networks (ANNs) due to its alignment with analog computing principles and its advantages in energy consumption and bandwidth. Photonic devices offer multiple degrees of freedom, enabling parallel data processing. Advances in silicon photonics have led to photonic integrated circuits (PICs) with thousands of components, although still fewer than electronic systems. To meet the growing computational demands of deep learning, novel photonic neural network (PNN)^{40} platforms have been developed. These platforms use light for both data transfer and computational functions, achieving higher efficiency than electronic systems. PNNs are especially effective for inference applications and have uses in natural language processing, cybersecurity, matrix multiplication, edge computing, and optimization problems in autonomous driving and robotics.^{41}

In this paper, we present our perspective on novel paradigms for integrating AI into the field of photonics. As shown in Fig. 1, we demonstrate the conceptual framework of AI-based forward modeling and inverse design. Using physics-informed neural networks (PINNs),^{42} we approach forward modeling by tailoring them for modal analysis of photonic waveguides. We develop a novel model that accurately predicts the supported modes and their associated effective indices, without requiring training data. This model is meshless and continuous across the entire problem domain. For the inverse design problem, we employ RL with the actor–critic paradigm to design a grating coupler. The resulting device outperforms those designed using PSO, achieving better performance with fewer steps in the search space.

The remainder of the paper is organized as follows. In Sec. II, we provide a brief overview of our modeling strategies for the forward modeling and inverse design paradigms, discussing the various elements of these strategies. We begin with a fundamental structure of deep learning, namely, fully connected neural networks, which serve as the backbone for the models we build later on. We also discuss the problem of incorporating physical constraints, described by differential equations, into neural networks based on the PINNs framework. In addition, we discuss the basic RL framework, focusing on a key RL algorithm known as the actor–critic (A2C) algorithm. Leveraging the first two parts, we tackle the problem of forward modeling in Sec. III, where current trends in forward modeling are reviewed and the importance of utilizing PINNs for the modal analysis of slab waveguides is demonstrated. Section IV shifts to inverse design: it discusses recent trends and demonstrates the need for our A2C-RL model, which is benchmarked by inverse designing a grating coupler. Finally, Sec. V covers the current limitations in both forward and inverse modeling tasks, proposes potential improvements, and outlines our vision for the future of integrating AI techniques in photonics.

## II. MODELING STRATEGIES

The adopted modeling strategies will tackle the frameworks of using PINNs for forward modeling and the RL paradigm for inverse design. Both strategies will require utilizing the basic feed-forward neural networks to realize the PINN for the forward modeling of a waveguide and the policy-deciding agent in the RL model. In the forward modeling strategy, the PINN will combine the power of the feed-forward neural network with Maxwell wave equations to enable accurate modal analysis of photonic devices by capturing the underlying physical behavior of the waveguide. In contrast, the inverse design paradigm through RL will allow for iterative design improvement by leveraging a trial-and-error method. Through interactions with the environment and feedback in the form of rewards or penalties, the RL agent explores the design space in an attempt to learn optimal decisions and explore unconventional efficient design solutions. Hence, these modeling strategies are expected to accelerate the simulation process, enhance design quality, and foster the discovery of innovative solutions within the photonics realm. The next subsections discuss the basics of neural networks and how to extend them to the PINNs and RL.

### A. Fully connected forward neural networks

The architecture of a fully connected forward neural network, or simply NN, typically consists of multiple layers of interconnected nodes, or artificial neurons, organized into an input layer, one or more hidden layers, and an output layer. NNs aim to learn complex patterns and representations from input data in a hierarchical manner, allowing them to perform tasks such as classification, regression, and pattern recognition. We formalize those notions mathematically as follows.^{43}

Each layer of an NN combines an affine transformation $g_l$ and a local nonlinear transformation $\sigma_l$. In particular, for the $l$th layer, $g_l:\mathbb{R}^{n_{l-1}}\rightarrow\mathbb{R}^{n_l}$ is defined as $g_l(\mathbf{x})=\mathbf{W}_l\mathbf{x}+\mathbf{b}_l$, where $\mathbf{W}_l\in\mathbb{R}^{n_l\times n_{l-1}}$ is the weight matrix and $\mathbf{b}_l\in\mathbb{R}^{n_l}$ is the bias. The local transformation $\sigma_l:\mathbb{R}^{n_l}\rightarrow\mathbb{R}^{n_l}$ is applied element-wise. A layer $\mathcal{L}_l$ of the network is defined as $\mathcal{L}_l(\mathbf{x})=\sigma_l(g_l(\mathbf{x}))$, and the entire network with $L$ layers is expressed as

$$f(\mathbf{x};\Theta)=(\mathcal{L}_L\circ\mathcal{L}_{L-1}\circ\cdots\circ\mathcal{L}_1)(\mathbf{x}),\tag{1}$$

where the set of parameters $\{\mathbf{W}_1,\mathbf{b}_1,\mathbf{W}_2,\mathbf{b}_2,\ldots,\mathbf{b}_L\}$ is collectively denoted as $\Theta$. In the deep learning community, the dimension of the output vector after a layer is referred to as the number of neurons, defining a network’s architecture in terms of its layers, neurons, and activation function. A two-layer neural network taking the functional form of Eq. (1) is proven to be a universal approximator capable of representing any Borel measurable function on a compact domain to an arbitrary degree of accuracy.^{44} This property allows neural networks to approximate a problem’s eigenfunctions, i.e., its modes. The process of adjusting the parameters $\Theta$ of the network to satisfy problem requirements, thereby yielding a solution, is known as training. Generally, training involves iteratively adjusting the network parameters to minimize a predefined loss or cost function, quantifying the disparity between predicted outputs and known ground truths. Let $\mathbf{X}=\{(\mathbf{x}_1,\mathbf{y}_1),(\mathbf{x}_2,\mathbf{y}_2),\ldots,(\mathbf{x}_N,\mathbf{y}_N)\}$ denote the set of $N$ training examples, where each tuple $(\mathbf{x}_i\in\mathbb{R}^n,\mathbf{y}_i\in\mathbb{R}^m)$ represents an input feature vector and its corresponding target output. The objective is to find the optimal set of parameters $\Theta$ that minimizes the empirical loss given by

$$\Theta^{*}=\arg\min_{\Theta}\frac{1}{N}\sum_{i=1}^{N}l(\mathbf{y}_i,f(\mathbf{x}_i;\Theta)),\tag{2}$$

where $l$ is the chosen loss function, such as mean squared error or cross-entropy loss. This optimization problem is typically solved using gradient-based methods, where the gradient $\nabla_{\Theta}\,l(\mathbf{y},f(\mathbf{x};\Theta))$ of the loss function with respect to the parameters is computed via backpropagation and utilized to minimize the loss function.
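The layer composition described above can be illustrated with a minimal, dependency-free sketch. The layer sizes, the `tanh` activation, the random initialization, and all function names here are illustrative assumptions rather than the configuration used in this work; the final layer uses an identity mapping.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (W, b) for one layer; W has shape n_out x n_in (hypothetical init)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(W, b, x):
    """Affine map g_l(x) = W_l x + b_l."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def forward(params, x, activation=math.tanh):
    """Compose the layers; the activation is applied element-wise,
    and the last layer is left as an identity mapping."""
    for W, b in params[:-1]:
        x = [activation(z) for z in affine(W, b, x)]
    W, b = params[-1]
    return affine(W, b, x)

def mse(y_true, y_pred):
    """Mean squared error, one possible choice of the loss l."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
sizes = [2, 8, 8, 1]  # input, two hidden layers, output (assumed sizes)
params = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = forward(params, [0.3, -1.2])
```

In practice, an automatic differentiation library would supply the gradients with respect to the parameters; this sketch covers only the forward evaluation and the loss.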

In photonics, this structure provides versatile applications for both forward and inverse design. For example, it can be used to predict the optical response of a device based on its design parameters, or conversely, to determine the design parameters needed to achieve a desired optical response. Our goal is to use neural networks (NNs) to directly represent the modes of an optical device. We achieve this by embedding our understanding of the governing physics, characterized by differential equations whose solutions are the modes and boundary conditions, into the network’s loss function.

### B. Employing physics constraints into neural networks

The PINNs framework incorporates a differential equation (DE) into the training of an NN by decomposing the loss $l$ in Eq. (2) into several constituent terms incorporating the residual of the DE at hand and minimizing that residual in a mean squared error sense, along with the equation’s initial and boundary conditions.^{42} Let $H$ be a Hilbert space for which the inner product between two members $u,v\in H$ is denoted by $\langle u,v\rangle$, and $\|u\|$ denotes the norm. Consider a DE of the form $A\varphi=b$, where $A$ is a differential operator and $b$ a non-homogeneous term, defined on some domain $\Omega$ with a boundary $\partial\Omega$, subject to a Dirichlet boundary condition $\varphi_{\partial\Omega}=\varphi_0$. The loss function takes the form

$$l=l_{bd}+l_{phy},\tag{3a}$$

$$l_{bd}=\|\varphi_{\partial\Omega}-\varphi_0\|^2,\tag{3b}$$

$$l_{phy}=\|A\varphi-b\|^2.\tag{3c}$$

For an eigenvalue problem, with $b=0$ and the differential operator factored into $A\varphi=B\varphi-k\varphi$, such that $B$ is another differential operator and $(\varphi,k)$ are the problem’s eigenfunction–eigenvalue pair, a study proposed that an NN solution could be found by augmenting the loss in Eq. (3) with additional terms:^{45} first, an energy term to prevent the network from learning trivial solutions and, second, for linear self-adjoint operators, a term based on the Rayleigh quotient $R=\langle B\varphi,\varphi\rangle/\langle\varphi,\varphi\rangle$ to learn the eigenvalue. The resulting loss, Eq. (4a), retains $l_{bd}$ and $l_{phy}$ as defined in Eqs. (3b) and (3c). The formula attempts to predict both the eigenfunction with the smallest eigenvalue and this corresponding value. We modify the last term of formula (4a) to provide more control over the types of modes to be learned by maximizing the eigenvalue within a specific range $(n_{low},n_{up})$.

Utilizing NNs to learn and represent functions offers numerous advantages. First, the learned function maintains continuity, enabling predictions across the entirety of the solution domain. In addition, the learning process is highly flexible, as it operates on a user-defined point cloud, eliminating the necessity for intricate meshing techniques. We demonstrate the advantages of this flexibility via numerical experiments.
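To make the idea of a physics-based loss on a point cloud concrete, the sketch below evaluates a boundary term plus a mean squared residual for a toy problem. Everything in it is an assumption for illustration: the DE $\varphi''+\varphi=0$ on $[0,\pi/2]$ with $\varphi(0)=0$ and $\varphi(\pi/2)=1$, the candidate functions standing in for a network, and the central finite difference standing in for the automatic differentiation that a PINN would normally use.

```python
import math

def second_derivative(phi, x, h=1e-3):
    """Central finite-difference estimate of phi''(x) (stand-in for autodiff)."""
    return (phi(x + h) - 2.0 * phi(x) + phi(x - h)) / (h * h)

def pinn_loss(phi, points, boundary):
    """Boundary term plus mean squared residual of phi'' + phi = 0."""
    l_phy = sum((second_derivative(phi, x) + phi(x)) ** 2
                for x in points) / len(points)
    l_bd = sum((phi(x) - value) ** 2 for x, value in boundary) / len(boundary)
    return l_bd + l_phy

# User-defined point cloud and Dirichlet data for the toy problem.
points = [0.5 * math.pi * i / 49 for i in range(50)]
boundary = [(0.0, 0.0), (0.5 * math.pi, 1.0)]

loss_exact = pinn_loss(math.sin, points, boundary)                    # true solution
loss_linear = pinn_loss(lambda x: 2.0 * x / math.pi, points, boundary)  # wrong guess
```

The exact solution drives the loss to nearly zero, whereas the linear guess satisfies the boundary data but leaves a large residual, which is the signal a training loop would minimize.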

### C. Reinforcement learning

RL is a subfield of machine learning that focuses on how an untrained agent learns to make decisions through interactions with an environment to maximize rewards. In RL, an agent, which in its simplest form is a neural network, takes actions in an environment and receives feedback in the form of rewards, which are used to evaluate the correctness of its actions. Conventionally, the framework of Markov decision processes (MDPs) serves as the basic building block for modeling RL problems.^{46} MDPs provide a mathematical formulation that captures the dynamics of sequential decision-making under uncertainty. Within an MDP, the agent interacts with an environment that transitions between different states (configurations) based on the agent’s chosen actions. By incorporating transition probabilities and rewards associated with state-action pairs, MDPs enable RL algorithms to learn optimal policies that guide the agent’s decision-making process. Figure 4 shows the basic MDP configuration for the RL model, where an RL agent, functioning as an autonomous entity or computational system, interacts with an environment to acquire knowledge and make decisions aimed at achieving specific objectives or maximizing cumulative rewards. The environment, a well-defined and structured system, serves as the context in which the agent’s actions occur. It encompasses a range of possible states that reflect diverse situations or configurations, together with a set of permissible actions available to the RL agent. These actions represent the decisions made by the agent, which influence the state of the environment. Following each action, the environment provides the agent with a numerical signal termed a reward, which indicates the immediate desirability or effectiveness of the action taken. Typically, the overarching aim of the agent is to maximize the cumulative reward obtained over time.

In an MDP framework, we have a finite set of states *S*_{1}, *S*_{2}, *S*_{3}, …, *S*_{N} representing various environmental situations and configurations. The agent can take actions from a finite set *A*_{1}, *A*_{2}, *A*_{3}, …, *A*_{N} to influence the environment. Rewards, represented by the function *R*(*S*_{t}, *A*_{t}, *S*_{t+1}), are received by the agent after taking actions in specific states. Finally, policies *π*: *S*_{t} → *A*_{t}, defined as the strategies that map states to actions, guide the agent’s behavior. The objective is to find an optimal policy *π** that maximizes the expected cumulative reward over time, with a discount factor *γ* that sets the trade-off between the significance of immediate and future rewards.
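The role of the discount factor can be seen in a short sketch of the discounted cumulative reward for a single episode. The function name and the reward values are hypothetical and serve only to show how *γ* weighs future rewards against immediate ones.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode, accumulated backward in time."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical reward sequence collected by an agent over three steps.
episode_rewards = [1.0, 1.0, 1.0]
far_sighted = discounted_return(episode_rewards, 0.5)  # 1 + 0.5 + 0.25
myopic = discounted_return(episode_rewards, 0.0)       # only the first reward counts
```

With *γ* close to zero the agent values only the immediate reward, while values closer to one make later rewards count almost as much as early ones.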

## III. FORWARD MODELING

There have been numerous attempts to employ DL for photonics modeling. We divide candidate approaches into two groups. Data-driven approaches follow the standard training of neural networks on example data and include the canonical fully connected neural networks (FCNNs) and convolutional neural networks (CNNs), neural operators (NEs),^{47} and deep operator networks (DeepONets).^{48} Data-free approaches include physics-informed neural networks (PINNs) and physics-informed operator networks. The data-free approach has the advantage of not relying on extensive datasets gathered via numerical simulators or laborious experiments, with the downside of being less stable and typically slower during training. This comes from the lack of a fixed target that the NN is trying to approximate for a given input. We propose a data-free methodology based on PINNs to carry out forward modeling and perform modal analysis of waveguides. The proposed method is illustrated on slab waveguides and compared against the known semi-analytical solution and an FD-based solver.

### A. Recent trends in forward modeling

Alagappan and Png^{49} pursued a methodology to forecast the *H*_{y} element of the transverse electric (TE) mode in a buried channel waveguide. Employing FD as the primary numerical solver, they generated multiple datasets, the largest containing 315 instances covering various device parameters, including the refractive index of the core region (*n*_{c} ∈ [1.5, 3.8]), and the dimensions of the core (*w* ∈ [0.1, 1.2] *μ*m and *h* ∈ [0.1, 0.5]), with silica (*n* = 1.45) serving as the cladding material. Utilizing symmetric devices, only a quarter of the simulated output image was predicted during training. Two architectures, FCNN and recurrent neural networks (RNN), were deployed to address this issue. Comparing their results revealed an advantage for the RNN in capturing the relationship between the predicted samples. In addition, the authors investigated the impact of variations in the training set size, as well as the number of layers and neurons within the architecture.

In a subsequent study by the same authors, a different approach was taken, employing convolutional (or more precisely, deconvolutional) neural networks.^{50} This resulted in a flexible architecture allowing for scalable output resolution adjustment through the addition or removal of scaling blocks, each comprising an upsampling layer, a two-dimensional convolution, and a rectified linear unit (ReLU) activation. Transfer learning facilitated rapid architecture training for finer discretizations.

Inspired by the success of CNNs in image processing, Sajedian *et al.* proposed an architecture combining CNNs and RNNs to predict the absorption spectra of plasmonic nanostructures.^{51} Devices were assumed to be uniform in one dimension, hence the architecture received its input as a two-dimensional image of the modeled nanostructure. The authors acquired training data via simulation using a finite-difference time-domain (FDTD) solver and artificially created a dataset of 100 000 examples. The examples were of devices with certain parameters fixed, such as constituent materials and source polarization, and other variables including the number of shapes in each structure and their dimensions. Upon training, the model showed good agreement with numerical simulations while being significantly faster to run.

In all the approaches considered, example data consisting of device parameters and their corresponding target responses were necessary to train each model. These data were generated using numerical solvers, indicating each method’s reliance on traditional solvers. Such reliance becomes a significant limitation when aiming to model complex devices with intricate geometries across all spatial dimensions. Traditional solvers are known to scale poorly with increasing problem dimensions and accuracy requirements, necessitating vast computational resources and prohibitively long durations for each simulation—thousands of which are needed for AI models. Our work addresses these challenges via PINNs as a modeling strategy, which requires no example data for its operation.

### B. Numerical experiments and results

For the slab waveguide, the modes satisfy the wave equation

$$\frac{d^2\varphi}{d\tilde{x}^2}+n^2(\tilde{x})\varphi=n_{\mathrm{eff}}^2\varphi,$$

where $\varphi=E_y$, the $y$-directed component of the electric field, $n(\tilde{x})$ denotes the refractive index profile, and $n_{\mathrm{eff}}$ is the effective index (EI) of the structure. The solution $\tilde{E}_y$ and its first derivative $dE_y/dx$ are required to be continuous across interfaces as per the continuity conditions imposed by the remainder of Maxwell’s equations. We defined the differential operator for the problem as $B\varphi=\frac{d^2\varphi}{d\tilde{x}^2}+n^2(\tilde{x})\varphi$, so that the Rayleigh quotient $R$ will correspond directly to the learned EI. Norms and inner products are approximated via sampling over the training points.

Approximating the Rayleigh quotient could be achieved with uniform or nonuniform sample points; we adopt uniform sampling for our initial experiments and then adopt nonuniform schemes later on.
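As a hedged illustration of estimating a Rayleigh quotient from sample points, the sketch below uses a toy problem rather than the waveguide of this work: a constant index of 1.5 on $[0,\pi]$ with the trial function $\sin x$, for which $B\varphi=\varphi''+n^2\varphi$ gives $R=n^2-1=1.25$. The sample-mean inner product, the finite-difference derivative, and all names are assumptions made for illustration.

```python
import math
import random

N_INDEX = 1.5  # hypothetical constant refractive index for the toy problem

def apply_B(f, x, h=1e-3):
    """B f = f'' + n^2 f, with f'' from a central finite difference."""
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return d2 + N_INDEX ** 2 * f(x)

def inner(u_vals, v_vals):
    """Sampled inner product: mean of pointwise products."""
    return sum(u * v for u, v in zip(u_vals, v_vals)) / len(u_vals)

def rayleigh_quotient(f, points):
    """R = <B f, f> / <f, f>, estimated on the given sample points."""
    f_vals = [f(x) for x in points]
    Bf_vals = [apply_B(f, x) for x in points]
    return inner(Bf_vals, f_vals) / inner(f_vals, f_vals)

# Uniform sample points and randomly scattered sample points on [0, pi].
uniform = [math.pi * (i + 0.5) / 200 for i in range(200)]
rng = random.Random(0)
scattered = [rng.uniform(0.0, math.pi) for _ in range(200)]

r_uniform = rayleigh_quotient(math.sin, uniform)
r_scattered = rayleigh_quotient(math.sin, scattered)
```

Because the trial function here is an exact eigenfunction, both point sets return essentially the same value; for an imperfect trial function the estimate would depend on where the samples are placed.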

For all experiments, we utilized an NN with three layers and 128 neurons per hidden layer, using a radial basis function^{52} as the activation for the first two layers, for its rapid convergence properties, and an identity mapping for the last layer. Optimization is carried out via the infinity-norm variant of the Adam optimizer,^{53} referred to as Adamax. Training runs for ten thousand optimization steps, with a learning rate of 0.01 that is multiplied by 0.1 every one thousand iterations for the first three thousand. The PINN training and the finite difference method (FDM) modeling were implemented in the free-tier Google Colab environment, which had an Intel Xeon central processing unit (CPU) @ 2.20 GHz, 13 GB of RAM, and a Tesla K80 accelerator with 12 GB of GDDR5 VRAM.
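The step-wise learning-rate schedule described above could be written as follows. This is one reading of the schedule and an assumption on our part: the rate is multiplied by 0.1 at iterations 1000, 2000, and 3000 and then held constant. The function and parameter names are hypothetical.

```python
def learning_rate(step, base_lr=0.01, decay=0.1, every=1000, max_decays=3):
    """Learning rate at a given optimization step under a stepped decay.

    Assumes decays at steps 1000, 2000, and 3000, constant afterward.
    """
    n_decays = min(step // every, max_decays)
    return base_lr * decay ** n_decays

# Learning rate sampled at every thousandth step of a 10 000-step run.
schedule = [learning_rate(step) for step in range(0, 10000, 1000)]
```

Deep learning libraries typically provide an equivalent milestone-based scheduler, which would replace this helper in a real training loop.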

The NNs are trained to produce multiple propagating modes for several devices with varying core widths and refractive index (RI) profiles. The results are compared against the semi-analytical solution for the problem, consisting of the eigenmodes and their corresponding effective indices, in addition to a finite differences (FD) solver. We explore the performance of our model in three respects. First, we show qualitative results of the learned modes against analytical ones and graph the convergence to the exact EIs to confirm the effectiveness of the learning process. Second, we tabulate its performance quantitatively, comparing its relative error to that of FD in a setting where the network trains on the same sample points used for domain discretization by FD. Third, we explore the flexibility of the network, showing how its performance can be greatly improved simply by adopting better choices of training points. Both the FD and NN solvers operated in 64-bit precision to ensure identical computation precision.
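For reference, a semi-analytical effective index of a three-layer slab can be obtained from the textbook TE dispersion relation $\kappa d=m\pi+\arctan(\gamma_{sub}/\kappa)+\arctan(\gamma_{cl}/\kappa)$, solved here by bisection. This is a sketch of that standard relation, not the authors’ benchmark code; the core width and wavelength below are hypothetical values, since they are not stated in the text.

```python
import math

def te_dispersion(n_eff, m, n_cl, n_co, n_sub, width, wavelength):
    """Residual of the TE dispersion relation for mode order m."""
    k0 = 2.0 * math.pi / wavelength
    kappa = k0 * math.sqrt(max(n_co ** 2 - n_eff ** 2, 0.0))
    gamma_sub = k0 * math.sqrt(max(n_eff ** 2 - n_sub ** 2, 0.0))
    gamma_cl = k0 * math.sqrt(max(n_eff ** 2 - n_cl ** 2, 0.0))
    return (kappa * width - m * math.pi
            - math.atan2(gamma_sub, kappa) - math.atan2(gamma_cl, kappa))

def te_effective_index(m, n_cl, n_co, n_sub, width, wavelength, tol=1e-12):
    """Bisection on n_eff between the larger outer index and the core index."""
    args = (m, n_cl, n_co, n_sub, width, wavelength)
    lo, hi = max(n_cl, n_sub), n_co
    if te_dispersion(lo, *args) <= 0.0:
        return None  # mode m is below cutoff
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if te_dispersion(mid, *args) > 0.0:
            lo = mid  # residual decreases with n_eff, so the root lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Indices of the first tabulated device; width and wavelength are assumed.
device = dict(n_cl=1.3, n_co=1.6, n_sub=1.4, width=2.0, wavelength=1.55)
n0 = te_effective_index(0, **device)
n1 = te_effective_index(1, **device)
```

Because the width and wavelength are assumed, the resulting values are not expected to reproduce the tabulated effective indices.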

Plotting the estimated modes in Fig. 6, for the learned principal and second-order predicted modes of a test guide, shows that the learned modes are in excellent agreement with the theoretical ones over the entire problem domain. The learned functions are smooth and training visually produces no artifacts. For the device shown, we graph its convergence to the exact EIs as optimization proceeds. As shown in Fig. 7, predictions are stable past two thousand iterations. Even though the EI estimates rapidly converge, mode errors continue to marginally improve as training advances as shown in Table II.

In addition, to demonstrate the effectiveness of the proposed method, we tabulate its percentage error against the exact results, for both the eigenmodes and effective indices, compared to the FD solver operating with a step size of $\lambda/50$. As Table I presents, the two methods give comparable eigenmode errors, with the NN performing slightly better in most cases. Estimates of the EI were better with FD; however, we found that EI estimates from NNs improved when we used NNs to predict a single mode in each training sweep. The first two rows of Table III confirm that both approaches give very similar errors for a test device when the NN is trained to predict only the principal mode. Unlike FD, NNs have various parameters that can be changed and that could affect their performance in every case. In our experiments, however, we chose to fix the optimization parameters to maintain uniformity and equity across all the results obtained. The results for the NN were obtained by training on the same sample points as those for FD. This comparison, however, overlooks two important advantages of NNs: flexibility in the choice of training points and continuity of the network. We next show how these can immensely influence the quality of the learned solutions by considering different sampling mechanisms for learning and EI estimation.
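The signed percentage errors reported for the effective indices appear consistent with the convention (predicted − exact)/exact × 100. The sketch below checks that assumed convention against the first row of Table I; the function name is hypothetical.

```python
def percent_error(predicted, exact):
    """Signed relative error in percent (assumed convention)."""
    return 100.0 * (predicted - exact) / exact

# First device, mode 1: exact, NN, and FD effective indices from Table I.
exact_ei = 1.58615
err_nn = percent_error(1.585679, exact_ei)
err_fd = percent_error(1.585921, exact_ei)
```

Under this convention, a negative value indicates that the solver underestimates the effective index.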

TABLE II. Mode errors versus the number of training iterations.

| No. of iterations | Fundamental mode error (%) | Second mode error (%) |
|---|---|---|
| 1000 | 3.98 | 2.18 |
| 2000 | 0.43 | 2.08 |
| 3000 | 0.41 | 2.00 |
| 4000 | 0.41 | 1.99 |
| 5000 | 0.41 | 1.99 |
| 6000 | 0.41 | 1.99 |
| 7000 | 0.41 | 1.99 |
| 8000 | 0.41 | 1.99 |
| 9000 | 0.41 | 1.99 |


TABLE I. Eigenmode and effective index (EI) errors of the NN and FD solvers against the exact results.

| Device | Mode | Eigenmode %err_{NN} | Eigenmode %err_{FD} | EI exact | EI NN | EI %err_{NN} | EI FD | EI %err_{FD} |
|---|---|---|---|---|---|---|---|---|
| n_{cl} = 1.3, n_{co} = 1.6, n_{sub} = 1.4 | 1 | 0.677 28 | 0.682 64 | 1.586 15 | 1.585 679 | −0.029 7 | 1.585 921 | −0.014 44 |
| | 2 | 1.077 72 | 1.121 16 | 1.544 509 | 1.542 63 | −0.121 69 | 1.543 603 | −0.058 69 |
| n_{cl} = 1.2, n_{co} = 1.3, n_{sub} = 1.25 | 1 | 0.411 83 | 0.441 68 | 1.287 61 | 1.287 257 | −0.027 43 | 1.287 436 | −0.013 48 |
| | 2 | 1.863 05 | 1.894 12 | 1.254 723 | 1.253 778 | −0.075 28 | 1.254 255 | −0.037 26 |
| n_{cl} = 2.4, n_{co} = 2.55, n_{sub} = 2.45 | 1 | 0.648 69 | 0.659 14 | 2.541 581 | 2.541 299 | −0.011 09 | 2.541 444 | −0.005 39 |
| | 2 | 1.025 57 | 1.058 52 | 2.515 557 | 2.516 65 | −0.043 46 | 2.516 122 | −0.021 01 |
| n_{cl} = 1, n_{co} = 3.2, n_{sub} = 3.1 | 1 | 1.142 41 | 0.748 7 | 3.192 313 | 3.192 042 | −0.008 5 | 3.192 184 | −0.004 05 |
| | 2 | 2.010 12 | 1.275 54 | 3.169 402 | 3.168 33 | −0.033 8 | 3.168 897 | −0.015 92 |


The sampling strategy for training points is considered first. Aside from uniform sampling, we consider a non-uniform sampling strategy that prioritizes the core region and the parts immediately above and below it, which we refer to as stepped uniform sampling. Moreover, we consider stochastic sampling techniques that draw samples from a uniform distribution of points in the domain, as well as from a normal distribution, where only in-domain points are kept. Table III shows that, for a single device, better choices of training points lead to better performance. In this experiment, normally distributed sample points yield the lowest mode error, and the stepped and stochastic schemes all improve on FD in both mode and EI estimates. In both the stepped and normally distributed sampling schemes, more points are drawn inside and near the core than in the cladding or substrate, prioritizing prediction in the core region, where we know the function to fluctuate the most. Testing on another device with varying numbers of sample points gives the same trend, as Table IV presents. The results show an advantage for our method over all discretizations while using the same number of points. This advantage is most apparent with coarse discretizations, where our model achieves almost half the error percentage without using additional sample points and with no complex sampling strategies. More complex strategies employing adaptive sampling based on the physics residual,^{54} as well as others that utilize self-adaptive weights of the per-sample loss,^{55} could be adopted and could improve the results even more, but are not attempted in this work.
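The four sampling schemes can be sketched as point generators on a one-dimensional domain. The domain limits, core limits, core fraction, and normal-distribution parameters below are hypothetical, and the stepped scheme is simplified to a denser uniform grid inside the core only.

```python
import random

def uniform_points(lo, hi, n):
    """n equally spaced points including both ends."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def stepped_uniform_points(lo, hi, core_lo, core_hi, n, core_fraction=0.6):
    """Uniform grids per region, with a larger share of points in the core."""
    n_core = int(n * core_fraction)
    n_left = (n - n_core) // 2
    n_right = n - n_core - n_left
    left = uniform_points(lo, core_lo, n_left + 1)[:-1]    # drop shared endpoint
    core = uniform_points(core_lo, core_hi, n_core)
    right = uniform_points(core_hi, hi, n_right + 1)[1:]   # drop shared endpoint
    return left + core + right

def random_uniform_points(lo, hi, n, rng):
    """n points drawn from a uniform distribution over the domain."""
    return sorted(rng.uniform(lo, hi) for _ in range(n))

def random_normal_points(lo, hi, n, rng, mean, std):
    """Draw from a normal distribution, keeping only in-domain points."""
    points = []
    while len(points) < n:
        x = rng.gauss(mean, std)
        if lo <= x <= hi:
            points.append(x)
    return sorted(points)

def count_in_core(points, core_lo, core_hi):
    return sum(1 for x in points if core_lo <= x <= core_hi)

rng = random.Random(0)
LO, HI, CORE_LO, CORE_HI, N = -3.0, 3.0, -0.5, 0.5, 349  # assumed geometry
grids = {
    "uniform": uniform_points(LO, HI, N),
    "stepped": stepped_uniform_points(LO, HI, CORE_LO, CORE_HI, N),
    "random_uniform": random_uniform_points(LO, HI, N, rng),
    "random_normal": random_normal_points(LO, HI, N, rng, mean=0.0, std=1.0),
}
core_counts = {name: count_in_core(pts, CORE_LO, CORE_HI)
               for name, pts in grids.items()}
```

With these assumed settings, the stepped and normally distributed schemes place noticeably more points in the core than the uniform grid does, which is the property the text attributes to them.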

Methodology | Mode error | EI relative error |
---|---|---|
FD | 1.238 16 | −0.025 6 |
Uniform | 1.262 73 | −0.026 26 |
Random normal | 0.756 66 | −0.011 98 |
Stepped | 0.797 28 | −0.005 27 |
Random uniform | 0.847 57 | −0.010 42 |

No. of points | FD error (%) | PINN + random normal (R.N.) points error (%) |
---|---|---|
349 | 1.281 42 | 0.692 72 |
699 | 0.646 95 | 0.557 62 |
1049 | 0.432 77 | 0.408 16 |
1399 | 0.325 14 | 0.324 32 |
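As a quick check of the trend in Table IV, the ratio of the PINN error to the FD error can be computed for each point count. The snippet below only re-uses the tabulated values; the variable names are illustrative.

```python
# Percentage errors from Table IV: number of points -> (FD, PINN with random-normal points)
table_iv = {
    349: (1.28142, 0.69272),
    699: (0.64695, 0.55762),
    1049: (0.43277, 0.40816),
    1399: (0.32514, 0.32432),
}

# Ratio below 1 means the PINN error is smaller than the FD error
ratios = {n: pinn / fd for n, (fd, pinn) in table_iv.items()}

for n, ratio in ratios.items():
    print(f"{n:5d} points: PINN/FD error ratio = {ratio:.3f}")
```

The ratio is roughly 0.54 at 349 points and approaches 1 at 1399 points, consistent with the statement that the advantage is most apparent for coarse discretizations.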

While the PINNs approach to forward modeling is very promising owing to its flexibility and meshless nature, which make it well suited to problems with complex geometries and allow it to operate without training examples, some challenges to its use remain. Unlike traditional deep learning techniques, which are trained once and can then be used quickly in inference mode on configurations different from those they were trained on, PINNs require retraining for every new device configuration, which adds to the computational time of the method. While it is typical for established simulation methods such as FD or the finite element method (FEM) to reconstruct the problem setting for each new configuration, this is uncommon for deep learning methods. Modifying PINNs to accelerate retraining or avoid it altogether would pave the way for them to become a standard tool for modeling and simulation. In addition, while the theory and convergence of numerical simulators are well established, comparable results for deep learning-based solvers are scarce, highlighting the need to analyze and quantify their stability and convergence properties.

## IV. INVERSE DESIGN

### A. Recent trends in inverse design

Inverse design^{56} allows for the automatic exploration of design spaces, leading to optimized device configurations and tailored optical properties that may not be intuitively derived through trial-and-error methods. Traditionally, researchers employ algorithms such as genetic algorithms, PSO,^{57} and topology optimization methods^{58,59} to search for device parameters that maximize or minimize specified performance metrics, such as efficiency, bandwidth,^{59} dispersion,^{57} polarization response,^{60} or any other desired characteristic, enabling them to achieve enhanced performance and uncover new functionalities.

Lin *et al.*^{58} proposed a novel approach based on topology optimization that enables the automatic discovery of wavelength-scale photonic structures, such as micro-posts and grating-slab cavities, for achieving high-efficiency second-harmonic generation (SHG). They produced structures with orders-of-magnitude enhancements in SHG efficiency compared to state-of-the-art photonic designs. Furthermore, Piggott *et al.*^{59} employed topological inverse design to develop a silicon wavelength demultiplexer that splits 1300 and 1550 nm light from an input waveguide into two separate output waveguides, and they successfully fabricated the device. The fabricated device exhibited excellent performance, including a low insertion loss of ∼2 dB, low crosstalk of less than −11 dB, and wide bandwidths exceeding 100 nm within a 2.8 × 2.8 *µ*m^{2} area. In addition, Hameed *et al.*^{57} suggested meta-heuristic algorithms, such as PSO and central force optimization (CFO), for optimizing the dispersion of a standard photonic crystal fiber (PCF). The dispersion of the liquid crystal (LC) PCFs is within [0.10, −0.609] ps/km nm for the CFO and [0.1974, −0.7404] ps/km nm for the PSO between 1.25 and 1.6 *μ*m.

However, all these methods suffer from limitations that need to be resolved. These include the computational complexity of exploring large design spaces and optimizing numerous parameters, which forces the researcher to intervene and constrain the design process, ultimately leading to sub-optimal designs. Furthermore, these algorithms usually have relatively slow convergence rates, making them time-consuming and resource-intensive. Consequently, integrating AI techniques into inverse design became a necessity to overcome these limitations. For instance, AI algorithms can efficiently navigate vast design spaces, enabling faster convergence and significantly reducing the computational burden associated with traditional optimization techniques. In addition, AI algorithms can learn from large datasets and existing knowledge, allowing for the discovery of novel design solutions that may not be apparent through conventional approaches. Moreover, AI algorithms can incorporate multiple objectives and constraints, allowing the optimization of performance metrics while considering fabrication, cost, and other practical considerations.^{32}

Consequently, researchers have leveraged AI-based techniques in inverse design. For example, supervised DL methods have been trained to capture the non-linear relationship between the device parameters and the desired output in order to reach an optimal design. As an illustration, Peurifoy *et al.*^{61} employed a forward DL model and applied it in reverse to design a multi-layered core–shell nanoparticle. Initially, they trained their forward modeling NN to predict the optical response of various nanoparticle designs. Subsequently, they froze the weights of the trained NN and used it in reverse, inferring the optimal design parameters for the core–shell nanoparticle from a desired spectrum fed into the NN. In another intriguing study, Liu *et al.*^{62} introduced a tandem NN architecture that combined an inverse network with a pre-trained forward model. Their objective was to design a multilayer structure composed of SiO_{2} and Si_{3}N_{4}, where the thicknesses of the layers served as the design parameters. Initially, they trained the forward model, establishing the relationship between the design parameters and the corresponding transmission spectra. Once the forward model’s weights were fixed, they integrated the inverse design network into the architecture. To train their model, they assembled a dataset consisting of 500 000 labeled pairs of data, with an additional 50 000 pairs reserved for testing. The desired spectrum was used as input, and the network minimized the loss between the desired spectrum and the recovered spectrum, optimizing the design parameters through the inverse network.

Although DL has shown great potential in inverse design, it also has certain limitations. One major challenge is the need for a large amount of labeled training data, which can be expensive and time-consuming to generate. DL models can also deal only with simple structures, where the relationship between input and output can be mapped easily. In addition, they may struggle to explore unseen design spaces or novel configurations.

### B. Novel A2C-RL inverse design paradigm

RL offers distinct advantages over traditional DL approaches in the context of inverse design. For instance, unlike DL methods, which rely on labeled training data, RL operates through interactions with an environment, making it more suitable for navigating complex and unexplored design spaces. RL agents learn from trial and error, receiving feedback in the form of rewards or penalties based on their actions. At its core, RL revolves around an agent interacting with an environment and learning a policy that maximizes cumulative rewards over time.^{63} The advantage actor–critic (A2C) RL approach combines value estimation and policy improvement, allowing for more refined learning and efficient decision-making, which makes it well suited to inverse design.^{64} Consequently, we utilized the A2C RL inverse design approach to optimize a grating coupler. Figure 8 shows the optimization paradigm of the grating coupler.

The diagram showcases the key components involved in the process, including the environment where the FDTD simulation of the grating coupler takes place and the RL agent that handles the optimization process by learning to iteratively improve the design parameters of the grating coupler while interacting with the environment. Through this interaction, the RL agent, written in MATLAB, explores the design space to discover optimal configurations that enhance the performance of the grating coupler. In the subsequent paragraphs, we will provide a comprehensive overview of each component depicted in the diagram, elucidating their functionalities and roles.

First, the simulation environment is set up in Lumerical FDTD, using a 2D model with perfectly matched layer (PML) boundary conditions. Figure 9 shows the overall 3D silicon-on-insulator (SOI) grating coupler structure alongside the simulated 2D structure. The geometry of the coupler incorporates the pitch of the grating (Λ), the width of the grating teeth (*w*), the etching depth inside the grating (*e*_{d}), the tilt angle between the optical fiber and the grating (*θ*), the horizontal distance between the fiber and the waveguide connected to the end of the grating (*X*_{0}), and the vertical distance between the fiber and the grating (*X*_{0} tan *θ*). For convenience, we define a new parameter, the duty cycle (*D*_{c}), to be $w/\Lambda$. Next, we set *e*_{d} = 100 nm and initialize Λ, *D*_{c}, *θ*, and *X*_{0} to 0.65 *µ*m, 0.65, 13.45*°*, and 3 *µ*m, respectively. With this initial design, we obtain a normalized transmittance of 0.39 at *λ* = 1550 nm from the optical fiber to the waveguide through the coupler.
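The derived quantities of the initial geometry follow directly from the values quoted above. The short sketch below assumes that the duty cycle is the ratio of tooth width to pitch, so that the tooth width is the duty cycle times the pitch; the variable names and the ordering of the state vector are illustrative only.

```python
import math

# Initial design values quoted in the text (lengths in micrometres, angle in degrees)
pitch = 0.65          # grating pitch
duty_cycle = 0.65     # D_c, assumed to be tooth width divided by pitch
theta_deg = 13.45     # tilt angle between fiber and grating
x0 = 3.0              # horizontal fiber-to-waveguide distance X_0
etch_depth = 0.100    # e_d, held fixed at 100 nm

# Derived quantities
tooth_width = duty_cycle * pitch                         # w = D_c * pitch
fiber_height = x0 * math.tan(math.radians(theta_deg))    # vertical distance X_0 * tan(theta)

# Four-parameter state that an optimizer would act on (ordering is hypothetical)
state = [pitch, duty_cycle, theta_deg, x0]
```

Under these assumptions the initial tooth width is about 0.42 µm and the fiber sits roughly 0.72 µm above the grating.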

The reward *R* can reach a maximum value of 0 when the transmittance reaches 1 and a minimum value of −100 when the transmittance reaches 0. Thus, the more the reward approaches 0, the better the design. Then, the structure geometry (system state) and the reward generated through the reward function are mapped into a discretized next-state action, producing a new design. The mapping algorithm is executed through an A2C agent. The A2C agent has two main components: the actor and the critic. The actor is responsible for selecting actions based on the current state of the environment, determining the agent’s behavior. This can be achieved by mapping system states into actions through a policy function [*π*(*A*|*S*_{t}, *θ*)], which also depends on the actor NN parameters (*θ*). Thus, by updating *θ* and obtaining a new system state, a new action will occur. On the contrary, the critic, with *ϕ* parameters, estimates the long-term reduced reward by mapping the system observations and the reward function to a value function [*V*(*S*_{t}; *ϕ*)].^{64}

The training steps are as follows. First, the actor and the critic are initialized with *θ* and *ϕ* values. Next, *N* experiences were generated by conforming to the current policy with a state, an action, and a future reward defined for each experience: *S*_{t}, *A*_{t}, *R*_{t+1}, *S*_{t+1}, …, *S*_{t+N−1}, *A*_{t+N−1}, *R*_{t+N}, *S*_{t+N}. Here, *S*_{t} is a state observation, *A*_{t} is an action taken according to this state, *S*_{t+1} is the next state, and *R*_{t+1} is the reward received for moving from *S*_{t} to *S*_{t+1}. Based on *S*_{t}, the agent computes the probability of taking each action through the policy function and then selects an action based on the probability distribution. Furthermore, the return *G*_{t} is defined as the sum of the reward for the current step and the discounted factor calculated by the critic. Using Eq. (9), we can obtain *G*_{t}, where *b* is 0 if *S*_{t+N} is a terminal state and 1 otherwise, and *γ* is the discount factor estimated by the designer. Furthermore, the gradients of both *θ* and *ϕ* are calculated and used to update the initial *θ* and *ϕ* values. The *α* and *β* coefficients are the learning rates of the actor and the critic. Using Eq. (10), we can define the advantage function *D*_{t} from the return *G*_{t}. We can define the gradient of the actor network (*dθ*) using the advantage function (*D*_{t}) defined in Eq. (11). We can define the gradient of the critic network (*dϕ*) using *G*_{t} defined in Eq. (10).
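The quantities in these training steps can be illustrated with a minimal sketch. Two assumptions are made here and should be read as such. First, the reward is taken to be linear in the transmittance, which matches the stated endpoints (0 at a transmittance of 1, −100 at a transmittance of 0) and the reward of about −61 reported for the initial transmittance of 0.39. Second, the return and advantage follow the standard *N*-step advantage actor–critic form, in which the return bootstraps from the critic value of the last state unless that state is terminal. The function names are hypothetical.

```python
import numpy as np


def reward(transmittance):
    """Assumed linear reward: 0 at T = 1 and -100 at T = 0."""
    return -100.0 * (1.0 - transmittance)


def n_step_returns(rewards, v_last, gamma=0.9, terminal=False):
    """Return G for each of the N steps of one batch of experiences.

    rewards  : [R_{t+1}, ..., R_{t+N}]
    v_last   : critic estimate V(S_{t+N}; phi)
    terminal : True if S_{t+N} is a terminal state (then b = 0)
    """
    b = 0.0 if terminal else 1.0
    returns = np.empty(len(rewards))
    running = b * v_last                      # bootstrap from the critic unless terminal
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def advantages(returns, values):
    """Advantage D = G - V(S; phi) for each step."""
    return np.asarray(returns) - np.asarray(values)
```

In this form, the actor gradient weights the log-probability of each chosen action by its advantage, while the critic is regressed toward the returns; both updates are scaled by their respective learning rates.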

Following the update, the actor NN generates a discrete action, with the probabilities of all neurons in the final layer summing to one. The selected action is the one represented by the neuron with the highest probability in the output layer of the actor NN. This approach guarantees a clear and deterministic decision-making process within the A2C framework. Next, each neuron is mapped to an action vector that takes a discrete step in the *n*-dimensional solution space. For instance, our problem has four independent parameters; thus, the solution space is four-dimensional, and the action vector has four values, one for each parameter. Each value can be 0, 1, or 2, to reduce, hold, or increase the corresponding parameter, respectively, by a certain predefined step; hence, the model obtains a new geometry for the structure. This predefined step is selected to force the model to obtain designs that can be fabricated. Therefore, the output layer of the actor NN must have 3^{n} neurons, where *n* is the number of parameters; here, 3^{4} = 81. Table V presents the hyperparameters used for this training. The model was trained for a total of 40 steps. The initial parameters were random. The model reached its best design after only 14 steps, which took 5 min and 20 s. The A2C-RL training and the PSO algorithm were run on a Dell Inspiron 5584 laptop with an Intel Core i7-8565U, 16 GB DDR4 RAM, a 2 TB HDD, and an NVIDIA MX130 4 GB graphics card.
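The mapping from an output neuron to a discrete design step can be sketched as follows. The step sizes below are hypothetical placeholders, since the predefined steps are not listed in the text, and the parameter names and neuron ordering are illustrative.

```python
import itertools

PARAMS = ("pitch", "duty_cycle", "theta", "x0")

# Hypothetical step sizes (um, dimensionless, deg, um); the actual values are not given in the text.
STEPS = {"pitch": 0.005, "duty_cycle": 0.01, "theta": 0.05, "x0": 0.05}

# One action per output neuron: every combination of 0 (reduce), 1 (hold), 2 (increase).
ACTIONS = list(itertools.product((0, 1, 2), repeat=len(PARAMS)))


def apply_action(design, neuron_index):
    """Return the new design obtained by applying the action of one output neuron."""
    action = ACTIONS[neuron_index]
    return {
        name: design[name] + (code - 1) * STEPS[name]   # code - 1 is -1, 0, or +1
        for name, code in zip(PARAMS, action)
    }


initial = {"pitch": 0.65, "duty_cycle": 0.65, "theta": 13.45, "x0": 3.0}
next_design = apply_action(initial, 80)   # neuron 80 encodes (2, 2, 2, 2): increase every parameter
```

Enumerating all combinations for four parameters yields 81 actions, matching the size of the actor output layer, and the middle entry (index 40 in this ordering) leaves the design unchanged.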

Hyperparameter | Symbol | Value |
---|---|---|
Learning rate | α = β | 0.001 |
Gradient threshold | g | 1 |
Discount factor | γ | 0.9 |
Actor input layer no. | − | 4 |
Actor output layer no. | − | 81 |
Critic input layer no. | − | 4 |
Critic output layer no. | − | 1 |

Figure 10 shows the training steps, where the reward varies with each iteration (episode). The exploration–exploitation trade-off dominated the optimization behavior within the environment. Initially, as the reward dropped from −61 to −66, the model began exploring the environment more extensively. This exploration allowed the model to gather information about different actions and their corresponding rewards. Once the model gained some knowledge, it started exploiting this knowledge to make more informed decisions, improving the reward from −66 to −47.5. However, to avoid getting stuck in a local maximum and potentially missing a global maximum, the model resumed exploration. Consequently, the reward temporarily dropped to −80 as the model sought alternative solutions. Eventually, the model found a new local maximum of −55, which is lower than the earlier maximum of −47.5. This indicates that the earlier maximum was the global one within the explored region and that the exploration phase was crucial in confirming it.

The final design is presented in Table VI, where it is compared to the initial design and to a design generated by the PSO optimizer embedded in Lumerical FDTD. The A2C RL model outperforms the PSO optimizer in several aspects. It overcomes the PSO’s limitation of handling only two parameters by effectively optimizing four. Furthermore, it converges faster, reaching its best design in fewer than half the iterations needed by the PSO optimizer (14 versus 30). In addition, the A2C RL model continuously explores the design space and discovers more promising solutions, while the PSO optimizer often suffers from premature convergence, limiting its ability to explore alternative regions of the solution space. These advantages highlight the power of the A2C RL model over the PSO optimizer and similar optimizers in inverse design. However, A2C-RL encounters limitations in inverse design, primarily due to the high-dimensional action space and the stochastic nature of the model, which necessitate significant computational resources and time for training and evaluation. Furthermore, the model’s restricted integration of domain knowledge makes it reliant on the defined reward function as its sole source of learning, rendering it sensitive to the quality and effectiveness of that function.

Design variable | Initial design | PSO design | A2C RL design |
---|---|---|---|
θ (deg) | 13.45 | 13.45 (fixed) | 13.25 |
X_{0} (μm) | 3 | 3 (fixed) | 3.15 |
Λ (μm) | 0.65 | 0.6512 | 0.655 |
D_{c} | 0.65 | 0.756 | 0.69 |
T_{1550 nm} | 0.39 | 0.498 | 0.5247 |
No. of iterations | ⋯ | 30 | 14 |
Enhancement percentage (%) | ⋯ | 27 | 34 |
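The enhancement percentages in Table VI can be reproduced from the tabulated transmittances, assuming that the enhancement is the relative increase in transmittance over the initial design; the function name is illustrative.

```python
# Transmittance at 1550 nm from Table VI
t_initial, t_pso, t_rl = 0.39, 0.498, 0.5247


def enhancement(t_new, t_ref=t_initial):
    """Relative improvement over the reference design, in percent."""
    return 100.0 * (t_new - t_ref) / t_ref


pso_gain = enhancement(t_pso)     # about 27.7 %
rl_gain = enhancement(t_rl)       # about 34.5 %
iteration_ratio = 30 / 14         # PSO iterations divided by A2C RL iterations
```

Under this assumption, the tabulated values of 27% and 34% correspond to the whole-number parts of the computed gains, and the PSO run uses slightly more than twice as many iterations as the A2C RL run.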

## V. A GLIMPSE OF THE FUTURE

The studies presented here, in both forward modeling and inverse design, reveal a highly synergistic relationship between the needs and premises of these two fields. Whether the task is estimating the non-trivial optical response of a device from its characteristics or navigating the complex parameter search space to design a device with desirable properties, AI has proven its efficacy. For forward modeling, DL models still require large datasets for training, despite their rapid prediction capability once trained, which burdens their development owing to the reliance on other numerical simulators or laborious experiments for data acquisition. However, we demonstrated that this burden can be alleviated via PINNs, which approach the problem in a data-free manner and offer advantages such as continuity and a meshless nature compared to traditional solvers such as FD methods. Nevertheless, the training time for these models remains a challenge. A model that combines these advantages with faster training times, or one that does not require retraining for each new simulation, would be highly beneficial for photonics applications. A promising candidate for such a model could be physics-informed neural operators (PINOs),^{65} which solve entire families of differential equations. Thus, with the advent of large, reliable AI models, developing an accurate, instant solver for cross-domain analysis (Maxwell’s and Schrödinger’s equations) seems possible. This breakthrough would herald a new era in quantum photonics modeling, especially for devices required for space applications. This paradigm can also deepen our understanding of the physics of quantum devices, paving the way for new applications.

In the realm of inverse design, RL is poised to transform photonic optimization as a versatile and powerful tool. It offers crucial advantages, such as flexible adaptive exploration and efficient optimization in high-dimensional spaces. It can also rapidly discover unconventional, innovative solutions that surpass those of traditional optimization methods. Therefore, RL has the potential to revolutionize the field by navigating and exploring the vast design space of photonic components and systems, leading to unprecedented performance enhancements in integrated photonics, quantum optics, and optical sensing. This can lead to a new RL-based General Photonic Optimizer (GPO) that can optimize the performance of any photonic device. In addition, this can be extended by using PINOs as the GPO environment, thus creating an all-AI Photonics Design and Exploration Expert (PhoDeX-AI). Figure 11 shows the envisioned structure of the PhoDeX-AI tool. The tool could have an additional large language model (LLM) module, such as a Generative Pre-trained Transformer (GPT),^{66} to facilitate communication with the user.

## VI. CONCLUSION

In this paper, we explored the use of AI in photonics, focusing on forward modeling and inverse design. For forward modeling, we emphasized the potential of data-free methods, particularly PINNs, which enable accurate modal analysis of waveguides without relying on extensive datasets. Our proposed PINN-based approach predicted the effective indices and eigenmodes of slab waveguides more accurately than a traditional finite-difference solver, highlighting its efficacy and its flexibility, owing to its continuous nature, in both training on user-defined point clouds and estimating effective indices. Using 349 analysis points, the PINN approach achieved a relative percentage error of 0.69272%, compared to 1.28142% for the FD method.

In the realm of inverse design, we highlighted the shift from traditional optimization techniques, such as genetic algorithms and topology optimization, to AI-driven approaches. AI techniques, especially RL, offer significant advantages as they efficiently navigate vast design spaces and reduce computational burdens. Accordingly, we developed a novel A2C-RL inverse design approach, which excels in scenarios with scarce labeled data and complex design spaces, and employed it to optimize a grating coupler to demonstrate the power of RL in photonics design. The RL-based approach produced a 34% enhancement over the initial design in just 14 iterations, outperforming the PSO method, which achieved only a 27% enhancement and required 30 iterations to do so. Our studies then culminated in a discussion of the current limitations in both tasks, possible improvements, and our vision for what could be possible in the future within this interplay between AI and photonics.

## ACKNOWLEDGMENTS

The authors gratefully acknowledge the funding received from the ASRT under the project titled “National Nanotechnology Lab” and the Scientists of Next Generation (FRM-SGO-CYCL#8) Grant. The authors also acknowledge the financial support from the STDF, Project No. 45702.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

M.G.M. and A.S.H. contributed equally to this work.

**Mohamed G. Mahmoud**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). **Amr S. Hares**: Conceptualization (supporting); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). **Mohamed Farhat O. Hameed**: Conceptualization (supporting); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **M. S. El-Azab**: Project administration (equal); Supervision (equal); Writing – review & editing (equal). **Salah S. A. Obayya**: Conceptualization (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available to allow for a controlled and regulated distribution of the data, ensuring that future studies will comply with the ethical considerations associated with the study, given the morally ambiguous nature of AI.

## REFERENCES

_{2}material

*2017 International Applied Computational Electromagnetics Society Symposium–Italy*

*Optical Devices in Communication and Computation*

*RF Systems, Circuits and Components*

*Rethinking Building Skins*

*Recent Trends in Cold-Formed Steel Construction*

*Encyclopedia of Physical Science and Technology*

*Reinforcement Learning: An Introduction*

*Artificial Intelligence: A Modern Approach*