Optical networks generate a vast amount of diagnostic, control, and performance monitoring data. When information is extracted from these data, reconfigurable network elements and reconfigurable transceivers allow the network to adapt not only to changes in the physical infrastructure but also to changing traffic conditions. Machine learning is emerging as a disruptive technology for extracting useful information from these raw data to enable enhanced planning, monitoring, and dynamic control. We provide a survey of the recent literature and highlight numerous promising avenues for machine learning applied to optical networks, including explainable machine learning, digital twins, and approaches in which we embed our knowledge into machine learning such as physics-informed machine learning for the physical layer and graph-based machine learning for the networking layer.

## I. INTRODUCTION

Machine learning (ML) is the study of computer algorithms that can learn to achieve a given task via experience and data without being explicitly programmed.^{1} ML has been a topic of research within statistics and computer science since at least the 1950s, with early iterations of many algorithms used today invented in the last 30 years.^{2} However, as a result of the increase in the availability of data and computing power over time, the use of ML has recently become ubiquitous across all disciplines of science and engineering. Optical fiber communications is no exception: there are now a great many works applying a range of ML techniques to a variety of problems within the domain. This is reflected in the large number of review and tutorial papers that have been published on the subject of ML applied to optical networks.^{3–8} However, given the rapid acceleration of the usage of ML within optical networks, many works leveraging ML have been published in the domain since these reviews were conducted. Moreover, certain ML applications have recently grown in popularity for optical network problems, and we address these here. In this Tutorial, we therefore introduce the reader to ML, highlight the key ML techniques presently deployed within optical fiber communication systems, and outline recent impactful works within each application sub-domain.

Optical fiber communication systems form the backbone of communications, having been deployed across the globe since the early 1980s.^{9} At a basic level, the edges of optical fiber networks are composed of optical fibers carrying modulated laser light, with optical amplifiers to combat the loss of signal power incurred during propagation. The nodes of optical networks consist of transmitters, receivers, and switches. Loosely, the job of network operators is to carry messages between these nodes such that the quality of service agreed with customers is met. Different modulated laser signals, known as channels, are assigned different wavelengths and can then be transmitted through the same fiber link simultaneously; this is known as wavelength division multiplexing (WDM). Telecommunication systems are split into conceptual layers defined by the open systems interconnection model,^{10} and in this Tutorial, we reference applications of ML in layers one and two, which we refer to as the physical layer and the network layer, respectively. In short, the physical layer concerns how raw bits are transmitted across a link between two nodes, also known as a light path. In contrast, network layer applications concern how to transfer data across the physical layer between given nodes. As an example, one can control aspects such as the route taken through the network, meaning the sequence of edges and nodes traversed, and the wavelength channel used to carry the information between two nodes. Additionally, optical networks research is commonly carried out on specific network types, which are primarily defined by their scale. In ascending order of transmission length, these network types are access networks, which connect individual users to other users and data centers; metro networks, at city scale; backbone networks, at the scale of large countries and continents; and submarine systems, for connecting continents.
Each of these network types has different constraints; for example, access networks have stringent monetary cost and complexity limits, whereas submarine systems have very strong power constraints. There are also data center networks, which differ significantly from all these network types due to their highly configurable topologies and extremely short reach links. In this Tutorial, we discuss works considering backbone, metro, and access networks.

Optical fiber communication systems facilitate the transfer of information at high data rates, currently tens to hundreds (and in some cases, more than a thousand) of Mb/s,^{11} enabling many data-hungry applications. In fact, Cisco predicts that there will be 5.3 × 10^{9} internet users by 2023, an increase from 3.9 × 10^{9} in 2018.^{11} Moreover, the average connection speed is expected to rise from 45.9 Mb/s in 2018 to 110.4 Mb/s by 2023.^{11} The optical fiber communication domain faces a number of key challenges that must be overcome to bring about this growth. First, optical fibers exhibit nonlinear behavior, governed by the optical Kerr effect.^{12,13} This means that the refractive index seen by light propagating through the fiber depends on the intensity of the optical field in the fiber. As a result, channels interact nonlinearly both with other channels on the same fiber and with themselves. These nonlinear, noise-like distortions are power-dependent, meaning that there exists a trade-off between the optical power of the signal and the strength of these nonlinear interactions.^{14} This introduces a level of complexity that makes physics-based modeling challenging in practical systems, making ML approaches look promising. Estimating the strength of nonlinear interactions and mitigating their effects form the basis of much of the research in optical fiber communication systems, including a large number of works in which ML is applied. Furthermore, attempts to extend the range of wavelengths used to carry information beyond the traditional *C*-band, known as wide-band systems, require several additional physical effects to be taken into account.
Among them are the wavelength dependencies of fiber parameters, such as fiber loss (mainly, the elastic Rayleigh scattering^{15}), higher-order fiber dispersion effects, and the influence of the frequency-dependent fiber effective mode area.^{16} In addition, higher-order Kerr-type nonlinearities manifesting themselves as stimulated inelastic light scattering effects, i.e., stimulated Raman scattering (for very short optical pulses)^{17–19} and stimulated Brillouin scattering (for very large launch powers),^{20,21} should also be taken into consideration. ML approaches have shown potential in helping to deal with such effects, which may facilitate the use of wide-band systems in future networks.

Another critical problem in optical fiber communications is the high complexity of optical networks, which poses a significant operational challenge.^{22} As networks have evolved over time to carry a higher information throughput, modeling the optical communication channel has become more difficult due to the increased number of adjustable design and operational parameters.^{3} Perhaps the biggest driving force behind this has been the introduction of coherent technologies,^{23} which significantly increased the complexity of transmitters and receivers. Moreover, the configurability of the network layer has increased due to advances such as software defined networking (SDN).^{24} In addition, future optical networks will be more dynamic, requiring automation as requests must be satisfied on shorter time scales.^{25} As a result, investigating the extent to which ML can help with modeling and network control has been the subject of a large volume of research. In this Tutorial, we focus on introducing the ML techniques that appear in the works we outline. Furthermore, we introduce a classification of algorithms in order to clarify the relationship between these techniques, and we outline trends within optical communications, such as which algorithm classes are used within each optical communication sub-domain.

The rest of this Tutorial is organized as follows. In Sec. II A, we introduce the general concept and nomenclature of ML, followed by a description of the specific techniques utilized by the works discussed in this Tutorial in Sec. II B. We then outline key research problems and selected interesting work within the physical layer in Sec. III, followed by an equivalent survey for network layer problems in Sec. IV. Selected opportunities for future research across both physical and network layer problems are highlighted in Sec. V, and concluding remarks are included in Sec. VI.

## II. INTRODUCTION TO SELECTED MACHINE LEARNING TECHNIQUES

### A. Categorization of machine learning

First, algorithms can be categorized based on the type of problem being solved, i.e., whether it is a regression or a classification problem.^{26} Regression algorithms make continuous predictions, such as the signal-to-noise ratio (SNR) of a light path in an optical network, and may have continuous or discrete inputs, also known as features. Classifier algorithms, instead, predict the class associated with a given set of inputs, for example, whether a request to connect two nodes in a network can be satisfied or must be rejected. A second distinction can be made based on whether the data are labeled or unlabeled.^{26} Algorithms requiring labeled data are known as supervised; an example of labeled data is a dataset of SNR as a function of the signal power for an optical channel. Each datum in this set has a label, the measured SNR, which the algorithm can use as a target when learning. In contrast, unsupervised algorithms learn from unlabeled data. This can be done by grouping the data based on similarity, known as clustering, or by compressing the data, for instance by finding a small number of linear combinations of the features that capture most of the variation between examples and discarding the remaining directions, known as principal component analysis (PCA).^{26} An example of unlabeled data might be traffic flows in a network, which can be grouped into classes that are not predetermined but rather determined by the algorithm based on similarities in various features. There also exists another formulation of ML that is distinct from supervised and unsupervised learning, known as reinforcement learning (RL).^{27} In RL, the goal is to learn a policy for achieving a given task by interacting with the environment. Every action taken affects the environment and returns a reward, the value of which quantifies how successful the given action was in the context of the overall goal. Formulations of RL and various algorithms are discussed in Sec. II B 4.
An example of an application of RL in optical fiber networking might be an agent that learns an optimal routing policy, which maximizes the total throughput of the network, given a series of requests. Here, the environment may consist of the current network state and outstanding requests, and the action space (the set of allowed agent actions) may consist of a set of candidate routes and channel wavelengths to choose from.
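As a concrete illustration of the PCA-style compression mentioned above, the following sketch projects data onto the directions of largest variance using the singular value decomposition. It assumes NumPy, uses a synthetic and purely hypothetical dataset, and the function name `pca_compress` is ours rather than part of any library.

```python
import numpy as np

def pca_compress(X, k):
    """Project the rows of X onto their top-k principal components."""
    X_centered = X - X.mean(axis=0)            # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                        # keep the k most important directions
    Z = X_centered @ components.T              # compressed representation, shape (n, k)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()  # fraction of variance retained
    return Z, components, explained

# Hypothetical example: 3 correlated features that mostly vary along one direction
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
Z, components, explained = pca_compress(X, k=1)
print(Z.shape, round(explained, 3))
```

Here a single component retains almost all of the variance, so each three-feature example can be summarized by one number with little loss of information.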

A categorization of the ML techniques discussed in this Tutorial is outlined in Fig. 1. This diagram reflects the fact that supervised learning is more commonly used within optical communications than RL and unsupervised learning, and that unsupervised learning is the least-used class of algorithms. Moreover, for physical layer applications, regression is more popular than classification, as we are often interested in predicting continuous target signals. Classifier algorithms are predominantly used in network layer applications, where we are often interested in distinguishing candidate light paths of suitably high quality from those that are not and in predicting the source and destination nodes of network traffic. Similarly, the majority of works applying RL in optical communications address network layer applications, which are often formulated as dynamic control problems. However, these are general trends seen in the literature by the authors rather than absolute rules, and there are exceptions. For example, generative adversarial networks (GANs) and graph neural networks (GNNs) are commonly used in a regression formulation to tackle problems in network traffic prediction and generation.

In the broader applied ML community, the types of data used can be categorized as structured tabular data, text data for natural language processing, image data consisting of sets of pixels, and time series data. The structure within tabular data may include spatial information, such as a graph, which can be represented as a matrix of edges and weights. Within optical fiber communications, the most common data types are tabular and time series data. A further distinction can be drawn between batch and online learning. The more traditional batch learning approach involves learning from the whole training dataset before deploying the model on new examples. Alternatively, online learning involves learning as data become available, updating the current model with information obtained from new examples.^{28} In the case of a neural network (NN) model, for instance, online learning would involve adapting the weights of a trained model based on a small volume of new data. One could therefore train the NN initially on a large historical dataset before fine-tuning the weights via online learning, using new data from monitors. In the supervised case, the new data must be labeled; for example, each label might be the measured SNR for a given set of operating parameters. Unsupervised online learning is also possible, and online algorithms for principal component analysis and clustering using neural networks are available.^{29} Here, the basic idea is to begin with a dataset that has already been compressed (for principal component analysis) or grouped (for clustering) and to modify the compression or grouping as each new datum becomes available, rather than processing all the data at once. The new datum is thus also compressed or grouped, which may, in turn, change how the other data are compressed or grouped.
A related approach to online learning is transfer learning, where we utilize information obtained from training a model for one task in order to reduce the computational effort required in training a model to perform another similar task.^{30} In other words, transfer learning involves starting with a trained model for an old task and adapting it for the new target task, rather than starting from scratch. For example, one can modify the weights of a NN that has been trained for another task, rather than starting with untrained weights, reducing the computational requirements of training.
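The online clustering idea described above can be sketched with sequential (online) k-means, in which each new datum moves only its nearest centroid. This is an illustrative stand-in, not the neural-network-based algorithms of Ref. 29; the class name `OnlineKMeans`, the running-mean update, and the two-feature traffic data are assumptions made for this example.

```python
import numpy as np

class OnlineKMeans:
    """Sequential k-means: centroids are refined one datum at a time."""

    def __init__(self, initial_centroids):
        self.centroids = np.asarray(initial_centroids, dtype=float).copy()
        self.counts = np.zeros(len(self.centroids))  # data assigned so far per cluster

    def partial_fit(self, x):
        """Assign a new datum to its nearest centroid and update that centroid."""
        x = np.asarray(x, dtype=float)
        j = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[j] += 1
        # Running-mean update: the centroid becomes the mean of all data assigned to it
        self.centroids[j] += (x - self.centroids[j]) / self.counts[j]
        return j

# Hypothetical stream of 2-D traffic-flow features arriving one at a time
rng = np.random.default_rng(1)
stream = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(50, 2)),
                    rng.normal([3.0, 3.0], 0.1, size=(50, 2))])
rng.shuffle(stream)                            # shuffle arrival order (rows)
model = OnlineKMeans(initial_centroids=[[0.5, 0.5], [2.5, 2.5]])
labels = [model.partial_fit(x) for x in stream]
print(np.round(model.centroids, 1))
```

Each update costs a single distance computation per centroid, so the model never needs to revisit the full history of data.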

Finally, explainable ML is a growing field that is crucial for ML applications, as explainability increases confidence in ML systems.^{31} In this work, we follow the definition of explainability given by Roscher *et al.*^{32} Specifically, explainable ML is transparent and interpretable and leverages domain knowledge. In this context, transparency means that the design of the ML model can be justified beyond empirical performance on the testing dataset; interpretability means that the ML model output is human understandable, i.e., we can reason as to why the model makes a given prediction for a given input; and domain knowledge broadly encompasses all the knowledge of the problem we possess before we have seen the data. A black box is a model for which the decision processes are not interpretable by humans and the design cannot be easily justified.^{33} There are two main approaches to explainability. The first accepts that the underlying model is a black box and analyzes the model's input–output relationship in order to explain how it makes decisions and infer its internal structure;^{34–36} such approaches are commonly known as post-hoc techniques. The second tries to replace the black box with a simpler or more mathematically principled model that is inherently more understandable. Thus, a black box method can be made more explainable using add-on techniques, or one can design the method from the ground up to be explainable.
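As one example of a post-hoc technique, permutation feature importance treats a trained model as a black box and measures how much its error grows when a single input feature is randomly shuffled. The sketch below assumes NumPy; the stand-in "black box", the data, and the function name `permutation_importance` are hypothetical and chosen for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in mean squared error when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
            importances[j] += np.mean((predict(X_perm) - y) ** 2) - baseline
    return importances / n_repeats

# Hypothetical black box whose output depends on feature 0 only
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
black_box = lambda X: 3.0 * X[:, 0] + 0.0 * X[:, 1]
imp = permutation_importance(black_box, X, y)
print(imp)
```

Only the model's inputs and outputs are used, which is what makes the method post hoc: it reveals that feature 1 is ignored without inspecting the model's internals.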

### B. Machine learning techniques

#### 1. Neural networks

The structure of neural networks (NNs)^{37} is loosely analogous to that of animal brains, consisting of a network of units, called neurons, connected via edges with associated weights. The neurons can send signals to one another along these weighted edges and process these signals. The most commonly used type of NN in ML applications is the feedforward NN (FFNN). The mathematical structure of such networks is given by an input layer followed by a series of layers of neurons, each representing a function that is applied to the output of the previous layer, so that the network as a whole is a composition of functions.

The final layer yields the model output, and the layers in between the input and output layers are known as hidden layers.^{38} As an example, consider a supervised NN model with a single hidden layer $f^{(1)}$ and an output layer $f^{(2)}$,

$$
\mathbf{y}(\mathbf{x}) = f^{(2)}\!\left(f^{(1)}\!\left(\mathbf{x}^{(1)}\right)\right), \qquad f^{(i)}\!\left(\mathbf{x}^{(i)}\right) = g^{(i)}\!\left(\mathbf{W}^{(i)\mathsf{T}}\mathbf{x}^{(i)} + \mathbf{b}_i\right),
$$

where $\mathbf{x}^{(i)}$ is the input vector for layer $i$ such that $\mathbf{x}^{(1)} = \mathbf{x}$ and $\mathbf{x}^{(2)} = f^{(1)}(\mathbf{x}^{(1)})$, $\mathbf{W}^{(i)}$ is the matrix of weights in layer $i$, $\mathbf{b}_i$ is a vector of additive constants known as biases in layer $i$, $g^{(i)}$ is the activation function applied element-wise to yield a vector output for layer $i$, and $(\cdot)^{\mathsf{T}}$ denotes the transpose of a given matrix. A pictorial representation of this NN, adapted from the work of Bishop,^{26} is given in Fig. 2.
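The two-layer model $\mathbf{y}(\mathbf{x}) = f^{(2)}(f^{(1)}(\mathbf{x}))$ with $f^{(i)}(\mathbf{x}^{(i)}) = g^{(i)}(\mathbf{W}^{(i)\mathsf{T}}\mathbf{x}^{(i)} + \mathbf{b}_i)$ can be written directly in NumPy. This is a minimal sketch that assumes a tanh hidden activation and a linear output activation; the layer sizes and random weights are arbitrary choices for illustration.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Evaluate y(x) = f2(f1(x)) with f_i(x) = g_i(W_i^T x + b_i)."""
    x2 = np.tanh(W1.T @ x + b1)   # hidden layer f1 with nonlinear activation g1 = tanh
    y = W2.T @ x2 + b2            # output layer f2 with linear activation g2
    return y

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 8, 1
W1 = rng.normal(size=(n_in, n_hidden))   # W^(1): shape (inputs, hidden units)
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_out))  # W^(2): shape (hidden units, outputs)
b2 = np.zeros(n_out)
x = np.array([0.5, -1.0, 2.0])
y_out = forward(x, W1, b1, W2, b2)
print(y_out)
```

Storing $\mathbf{W}^{(i)}$ with shape (inputs, outputs) and applying its transpose mirrors the $\mathbf{W}^{(i)\mathsf{T}}\mathbf{x}^{(i)}$ notation term by term.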

For this example network, nonlinear and linear activation functions may be applied to the hidden layer and output layer, respectively. If both $g^{(1)}$ and $g^{(2)}$ are linear, the entire NN model is itself simply a linear (strictly, affine) function of $\mathbf{x}$. Therefore, nonlinear activation functions are crucial for approximating nonlinear functions.

The term deep learning (DL) refers to NNs with at least one hidden layer; in practice, networks with multiple hidden layers are often used. Choosing the structure of the NN, including the activation functions, is often done in an *ad hoc*, trial-and-error fashion. As a result, NNs are often viewed as opaque black box models that are difficult for humans to interpret. Yet the same highly nonlinear layered structure is what makes NNs so flexible and powerful. Training an NN, i.e., obtaining a set of weights that solves a given problem and generalizes well to data not seen during training, can be achieved in multiple ways, the most common being backpropagation combined with gradient descent.^{39} To train NNs, we first define a loss function that, for supervised learning, measures how far the predictions of the network are from the measured data; a commonly used loss function is the mean squared error (MSE). Backpropagation efficiently computes the gradient of the loss function with respect to the weights for a given training input–output pair, which allows NNs to be trained using gradient descent: the weights are updated in the direction opposite to the gradient in order to move toward a local minimum of the loss.^{40}
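To make the training procedure concrete, the sketch below fits a one-hidden-layer network to a toy regression target using backpropagation (written out by hand via the chain rule) and full-batch gradient descent on the MSE loss. The target function, learning rate, network size, and number of steps are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))          # toy inputs
y = np.sin(X)                                  # toy regression target

n_hidden, lr = 16, 0.05
W1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

def mse(pred, target):
    return np.mean((pred - target) ** 2)

losses = []
for step in range(2000):
    # Forward pass (rows of X are examples, so X @ W1 applies W1^T to each example)
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    losses.append(mse(pred, y))
    # Backward pass: gradients of the MSE loss via the chain rule
    d_pred = 2 * (pred - y) / len(X)
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_H = (d_pred @ W2.T) * (1 - H ** 2)       # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ d_H
    db1 = d_H.sum(axis=0)
    # Gradient descent: step against the gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"initial MSE {losses[0]:.3f}, final MSE {losses[-1]:.4f}")
```

In practice, automatic differentiation frameworks compute these gradients for arbitrary architectures; writing them out shows that backpropagation is simply the chain rule applied layer by layer from the output backward.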

There are many extensions to the simple NNs described above, designed to solve a range of specific problems. However, the basic structure and methodology for learning remain the same. One such example is the autoencoder, which can be either supervised or unsupervised. An unsupervised autoencoder learns an efficient encoding of unlabeled data, whereas a supervised autoencoder can be used to obtain the set of inputs that yields a desired output. An autoencoder consists of an FFNN with two parts: the encoder, which learns to map the input data to an optimal representation, and the decoder, which learns to decode this representation and recover the initial data.^{38} This structure is outlined in Fig. 3.

In optical fiber communication systems, a number of monitors provide network operators with time series data, and hence, time series ML techniques are of particular interest. Recurrent NNs (RNNs) are a class of NNs that exhibit temporal dynamic behavior, meaning that they can be used to approximate functional relationships found in time series data.^{41} This is achieved by considering both the previous state of the network and the current input when determining the current state of the network. A schematic outlining the basic structure of an RNN is shown in Fig. 4. RNN models can maintain state information, allowing them to perform tasks such as traffic sequence prediction that are beyond the ability of a standard FFNN. However, RNNs suffer from exploding and vanishing gradient problems^{42} that prevent them from fully learning long-range dependencies in the time series. To address this issue, variants of RNNs such as gated recurrent units (GRUs)^{43} and long short-term memory (LSTM) networks^{44} have been proposed, which are capable of adaptively capturing dependencies on different time scales.
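The recurrence described above, in which the new state depends on both the previous state and the current input, can be sketched as a vanilla (Elman-style) RNN cell. The weight shapes, tanh activation, and random initialization are illustrative assumptions; GRU and LSTM cells add gating mechanisms on top of this basic update.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, h0):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence; return all states."""
    h = h0
    states = []
    for x_t in inputs:                          # iterate over time steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)    # new state from current input and previous state
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_state, T = 2, 4, 5
W_x = rng.normal(scale=0.5, size=(n_state, n_in))
W_h = rng.normal(scale=0.5, size=(n_state, n_state))
b = np.zeros(n_state)
sequence = rng.normal(size=(T, n_in))           # e.g. a short window of monitored values
states = rnn_forward(sequence, W_x, W_h, b, h0=np.zeros(n_state))
print(states.shape)
```

Because the same `W_h` is applied at every step, gradients flowing back through many steps are repeatedly multiplied by it, which is the root of the exploding and vanishing gradient problems mentioned above.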

Furthermore, as optical networks have a topological structure that can be represented by a graph, it is natural to utilize graph-based ML techniques, such as graph NNs (GNNs), that leverage the network structure.^{45} GNNs combine graph theory with NNs in a way that draws parallels with RNNs. Updating the state of a given node in a GNN involves two sequential steps: aggregation of the states of neighboring nodes, including the target node itself, followed by an update to the state of the node, which depends on the specific analysis goal of the GNN.^{46} Figure 5 describes an example GNN model for node-based prediction tasks. Based on variations of the aggregation and update functions, several GNN models have been proposed in the recent literature, such as message-passing NNs,^{47} graph convolutional networks (GCNs),^{48} graph attention networks,^{49} and gated graph NNs.^{50} Examples of applications include classification and regression on nodes or edges, i.e., predicting classes or continuous values for these elements of a given graph. GNNs can also be supervised or unsupervised, providing some flexibility with regard to the application domain.
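The two steps, aggregation and update, can be sketched for a single GNN layer in NumPy. This sketch assumes mean aggregation over each node's neighborhood (including the node itself) and a ReLU update with a shared weight matrix, roughly in the spirit of a GCN layer; it is not the exact propagation rule of any of the cited models, and the four-node ring topology and node features are hypothetical.

```python
import numpy as np

def gnn_layer(A, H, W):
    """One message-passing step: mean-aggregate neighborhood states, then update."""
    A_hat = A + np.eye(len(A))                 # include each node's own state
    deg = A_hat.sum(axis=1, keepdims=True)
    aggregated = (A_hat @ H) / deg             # aggregation: mean over the neighborhood
    return np.maximum(0.0, aggregated @ W)     # update: shared linear map followed by ReLU

# Hypothetical 4-node ring topology with undirected links
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])  # per-node features
W = np.eye(2)                                  # identity weights expose the aggregation
H_next = gnn_layer(A, H, W)
print(H_next)
```

Stacking several such layers lets information propagate over multiple hops, so a node's final state can reflect the wider network topology.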

Another NN that has been used in network layer applications is the generative adversarial network (GAN).^{51} GANs achieve their unique capabilities owing to their design based on zero-sum game theory. At a high level, they are composed of two NNs, the discriminator and the generator, which compete against each other. A schematic showing the structure of a GAN is shown in Fig. 6. GANs are designed for realistic data generation and have been successfully used for both image and video data generation in the recent literature. Thus, GANs show potential for traffic data generation in optical networks.

#### 2. Gaussian processes

Gaussian processes (GPs) are probabilistic models that provide not only a prediction but also a measure of its uncertainty, in the form of a predictive variance.^{52} This makes them attractive for optical fiber communication systems, in which the accepted failure rate is low and thus knowledge of the limitations of ML models is desirable. GPs can be used for regression or classification and are non-parametric methods,^{53} meaning that no specific parametric form is assumed for the model; instead, Bayes' theorem is used to search the space of functions directly. In the context of GPs, Bayes' theorem can be stated as^{52}

$$
\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}.
$$

Figure 7^{52} shows a function drawn from an uninformative GP prior, which is then conditioned on data to produce an accurate model.

In general Bayesian inference, computing the posterior requires numerical integration. In GP regression, however, the likelihood function is assumed to be Gaussian, so these integrals can be evaluated analytically and are thus much less computationally expensive. This assumption does not hold for GP classification, which makes it more computationally demanding than GP regression.

GPs are kernel methods: the covariance between the values of the target function at any two inputs is specified by a kernel function $k(\mathbf{x}_i, \mathbf{x}_j)$.^{26} As a result, the user must specify the kernel function at the design stage, which means making an assumption about the features we expect to see in the data. For instance, a commonly used kernel function is a squared exponential kernel plus a white Gaussian noise (GN) kernel, giving

$$
k(\mathbf{x}_i, \mathbf{x}_j) = \nu^2 \exp\!\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\mu^2}\right) + \Xi(\sigma^2),
$$

where $\nu$ and $\mu$ are scalar hyperparameters controlling the absolute scale and the length scale of the target function, $\mathbf{x}_i$ and $\mathbf{x}_j$ are data points, $\lVert \cdot \rVert$ denotes the Euclidean norm, and $\Xi(\sigma^2) = \sigma^2$ if $i = j$ and 0 otherwise, which models additive white noise drawn from $\mathcal{N}(0, \sigma^2)$, a zero-mean Gaussian distribution with variance $\sigma^2$. Choosing this kernel means assuming *a priori*, meaning before we have seen the data, that the function we are trying to learn has a single length scale and is corrupted by white Gaussian noise. More complex kernels exist to describe features such as periodicity and decay, and one can design a kernel by noting that the sum of any two valid kernel functions is itself a valid kernel function.
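The squared exponential plus white-noise kernel can be evaluated as follows. This is a minimal sketch assuming NumPy, with the noise term implemented as $\sigma^2$ added on the diagonal ($i = j$) of the training covariance matrix; the function name and hyperparameter values are our own choices. It also checks numerically that the sum of the two kernels yields a valid, positive definite covariance matrix.

```python
import numpy as np

def se_plus_noise_kernel(X1, X2, nu, mu, sigma, same_points=False):
    """Squared exponential kernel, plus sigma^2 on the diagonal when X1 and X2 coincide."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    K = nu ** 2 * np.exp(-sq_dists / (2 * mu ** 2))
    if same_points:                            # white noise only couples a point with itself
        K = K + sigma ** 2 * np.eye(len(X1))
    return K

X = np.linspace(0, 5, 6).reshape(-1, 1)        # six 1-D inputs
K = se_plus_noise_kernel(X, X, nu=1.0, mu=1.5, sigma=0.1, same_points=True)
eigvals = np.linalg.eigvalsh(K)                # symmetric matrix, so eigenvalues are real
print(K.shape, eigvals.min() > 0)
```

Increasing `mu` makes distant inputs more strongly correlated (smoother functions), while `sigma` controls how much of the observed variation is attributed to noise.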

GPs are trained by finding the optimal kernel hyperparameters via maximizing the log marginal likelihood in order to find the most likely interpretation of the data.^{52} Once optimal hyperparameters are found, the predictive distribution of the GP can be calculated using Algorithm 2.1 of Rasmussen and Williams.^{52} The predictive mean function and predictive variance of the GP can then be used to make probabilistic inferences about the data.
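Algorithm 2.1 of Rasmussen and Williams computes the predictive mean, the predictive variance, and the log marginal likelihood via a Cholesky factorization. The sketch below follows that algorithm for the squared exponential kernel with fixed, hand-picked hyperparameters; in practice, these would be tuned by maximizing the returned log marginal likelihood. The function names and toy data are ours.

```python
import numpy as np

def se_kernel(A, B, nu, mu):
    """Squared exponential kernel without the noise term."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return nu ** 2 * np.exp(-sq / (2 * mu ** 2))

def gp_predict(X, y, X_star, nu=1.0, mu=1.0, sigma=0.1):
    """GP regression following Algorithm 2.1 of Rasmussen and Williams."""
    n = len(X)
    K = se_kernel(X, X, nu, mu)
    L = np.linalg.cholesky(K + sigma ** 2 * np.eye(n))        # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # (K + sigma^2 I)^{-1} y
    K_star = se_kernel(X, X_star, nu, mu)                     # n x m cross-covariances
    mean = K_star.T @ alpha                                   # predictive mean
    v = np.linalg.solve(L, K_star)
    var = nu ** 2 - np.sum(v ** 2, axis=0)                    # predictive variance of f*
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2 * np.pi))                  # log marginal likelihood
    return mean, var, log_ml

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.sin(X).ravel()
X_star = np.array([[1.0], [10.0]])   # one test point near the data, one far away
mean, var, log_ml = gp_predict(X, y, X_star)
print(np.round(mean, 3), np.round(var, 3))
```

Far from the training data the predictive variance returns to the prior value $\nu^2$, which is exactly the "knowledge of the limitations" that makes GPs attractive when the accepted failure rate is low.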

One of the major issues associated with using GPs is their computational complexity, which is $\mathcal{O}(n^3)$, where $n$ is the number of training examples, because training and prediction require factorizing the $n \times n$ kernel matrix. It is possible to use sparse approximations that reduce this computational burden,^{54} at the cost of some accuracy.

#### 3. Support vector machines

Another kernel-based ML method is the support vector machine (SVM), a method which can be used for supervised regression and classification^{26} and for unsupervised learning.^{55} However, the vast majority of SVM use within optical networking is for classification, and therefore, we focus on SVM classifiers here. Unlike standard GPs, SVMs are sparse kernel methods, meaning that the model predictions do not require evaluation of the kernel function for all training examples, but rather we only need to evaluate the kernel for a subset of the training data.

SVM classifiers work by constructing a decision boundary that separates the labeled data into distinct classes such that the margin, defined as the perpendicular distance between the closest data points in each of the classes and the decision boundary, is maximized. These points that are closest to the boundary are known as the support vectors, so-called because they directly specify the position of the boundary. Being the closest to the optimal boundary, these points are also the most difficult to classify. Figure 8 shows the example of a binary SVM classifier, with the decision boundary and support vectors highlighted.

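Following Bishop,^{26} we consider the simple case of a binary classifier, with data labeled as one of two classes, $t_n \in \{-1, 1\}$, modeled by a linear decision boundary model of the form

$$
y(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\phi(\mathbf{x}) + b,
$$

where $\mathbf{w}$ is a vector of weights, $\mathbf{x}$ is the vector of inputs, $\phi$ represents a fixed transformation in the input space, and $b$ is a constant. A data point is classified depending on the sign of $y(\mathbf{x})$. It can be shown that, as the distance of the points $\mathbf{x}_n$ to the decision boundary is invariant under a rescaling of $\mathbf{w}$ and $b$, the scale can be chosen such that all $N$ training points satisfy the constraints

$$
t_n\left(\mathbf{w}^{\mathsf{T}}\phi(\mathbf{x}_n) + b\right) \geq 1, \qquad n = 1, \ldots, N,
$$

with equality holding for the support vectors. The perpendicular distance of a correctly classified point $\mathbf{x}_n$ to the decision boundary is given by

$$
\frac{t_n\, y(\mathbf{x}_n)}{\lVert \mathbf{w} \rVert},
$$

so maximizing the margin amounts to minimizing $\lVert \mathbf{w} \rVert^2$ subject to these constraints.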
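A margin-based linear classifier of this kind can be trained in a few lines. The sketch below is an illustration under stated assumptions rather than the standard quadratic-programming SVM solver: it uses the identity feature map $\phi(\mathbf{x}) = \mathbf{x}$ and minimizes the soft-margin (hinge-loss) form of the problem by subgradient descent, which relaxes the hard constraints $t_n y(\mathbf{x}_n) \geq 1$. The data, regularization weight, and learning rate are hypothetical.

```python
import numpy as np

def train_linear_svm(X, t, lam=0.01, lr=0.1, epochs=1000):
    """Minimize (lam/2)||w||^2 + mean(max(0, 1 - t_n (w.x_n + b))) by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = t * (X @ w + b)
        active = margins < 1                   # points inside the margin or misclassified
        grad_w = lam * w - (t[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -t[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable 2-D data with labels t_n in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
t = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, t)
distances = t * (X @ w + b) / np.linalg.norm(w)   # perpendicular distances to the boundary
print(np.sign(X @ w + b), np.round(distances, 2))
```

Only the points with `margins < 1` contribute to each update, which echoes the sparsity of SVMs: the final boundary is determined by the points closest to it.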

#### 4. Reinforcement learning

RL is a discipline of ML that involves a learner known as the agent that learns interactively by taking actions in its environment, where the environment consists of everything outside of the agent.^{56} The environment can be simulated or experimental; a simple example for the case of optical fiber communication networks could be the currently established light paths, current requests, and the SNR of these light paths.

Here, we outline the key concepts of RL, following Chap. 3 of Sutton and Barto.^{27} The agent interacts with its environment at a series of time steps $t, t + 1, t + 2, \ldots$. At each time step $t$, the agent takes as input some representation of the state of the environment, $S_t \in \mathcal{S}$, and chooses an action $A_t \in \mathcal{A}(S_t)$, where $\mathcal{S}$ is the set of all possible states and $\mathcal{A}(S_t)$ is the set of all actions available in state $S_t$. In the following time step, the agent reaches a new state $S_{t+1}$ and receives a reward $R_{t+1} \in \mathcal{R} \subset \mathbb{R}$. The method by which the agent selects the action $A_t$ given a state $S_t$ is called the policy, denoted as $\Pi_t$, a mapping from states to probabilities of selecting each possible action. Informally, the goal of the agent is to maximize the cumulative reward received over time. A schematic showing how the agent interacts with its environment, adapted from the work of Sutton and Barto,^{27} is shown in Fig. 9.

Tasks in which the agent–environment interaction breaks naturally into episodes, each ending at a final time step $T$, are called episodic, whereas tasks that go on without limit are called continuous. As RL has been applied to both continuous and episodic tasks in optical networks, we introduce a general notation that is valid for both types, in which a continuous task is represented by $T = \infty$. Thus, the agent aims to maximize the expected value of the discounted return,

$$
G_t = \sum_{k=0}^{T-t-1} \kappa^k R_{t+k+1},
$$

where $\kappa \in [0, 1)$ is a parameter called the discount factor, which controls the value of future rewards at the present time step. If $\kappa = 0$, the agent will learn to maximize the immediate reward, whereas as $\kappa$ approaches 1, the agent will strongly weight future rewards when choosing a policy. An important element of the RL framework is that we desire a state representation that conveys to the agent all relevant information about the environment, such that the probability of entering a specific new state at $t + 1$ can be defined only in terms of the state and action representations at $t$. In other words, we do not need the entire history of previous states and actions to find an optimal policy, but only the state and action at the previous time step. State representations that satisfy this are said to have the Markov property, and tasks that involve learning with a Markov state are called Markov decision processes (MDPs). For a finite MDP, meaning an MDP for which the state and action spaces are finite, the dynamics are completely determined by the probability distribution

$$
p(s', r \mid s, a) = \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}, \tag{11}
$$

where $s$ and $a$ are a given state and action, $s'$ is the new state, and $r$ is the reward received. Here, it is assumed that $s \in \mathcal{S}$, $a \in \mathcal{A}(s)$, and $r \in \mathcal{R}$. Using Eq. (11), we can compute all other quantities needed by the RL agent.
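The discounted return $G_t = \sum_{k} \kappa^k R_{t+k+1}$ of an episodic task can be computed for every time step at once with the backward recursion $G_t = R_{t+1} + \kappa G_{t+1}$. A minimal sketch, assuming the episode ends after the last listed reward; the reward sequence (e.g., +1 for an accepted light path request, 0 for a blocked one) is hypothetical.

```python
def discounted_returns(rewards, kappa):
    """Return [G_0, G_1, ...], where rewards[k] is R_{k+1} and G_t = R_{t+1} + kappa * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0                                   # return after the final step of the episode
    for t in reversed(range(len(rewards))):
        g = rewards[t] + kappa * g
        returns[t] = g
    return returns

# Hypothetical episode of four steps
rewards = [1.0, 0.0, 1.0, 1.0]
r9 = discounted_returns(rewards, kappa=0.9)
r0 = discounted_returns(rewards, kappa=0.0)   # kappa = 0: only the immediate reward counts
print(r9, r0)
```

Setting `kappa=0.0` recovers the rewards themselves, matching the statement that such an agent maximizes only the immediate reward.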

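The value of a state $s$ under a policy $\Pi$ is the expected return when starting in $s$ and following policy $\Pi$ thereafter,^{27,57}

$$
\nu_{\Pi}(s) = \mathbb{E}_{\Pi}\left[G_t \mid S_t = s\right],
$$

where $\mathbb{E}_{\Pi}[\cdot]$ denotes the expected value given that the agent follows $\Pi$; the function $\nu_{\Pi}$ is called the state-value function for policy $\Pi$. The agent's goal of maximizing the long-term cumulative reward can be stated as finding the policy that has an optimal value function, and we can write the Bellman optimality equation

$$
\nu^{*}(s) = \max_{a \in \mathcal{A}(s)} \sum_{s', r} p(s', r \mid s, a)\left[r + \kappa\, \nu^{*}(s')\right],
$$

where $\nu^{*}$ denotes the optimal value function, which may be achieved by more than one policy but will always exist for a finite MDP. In practice, computing $\nu^{*}$ exactly is usually too computationally expensive, and thus, we learn a suitably good approximation.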
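When the dynamics $p(s', r \mid s, a)$ of a small finite MDP are known, the Bellman optimality equation can be solved by value iteration, i.e., by repeatedly applying its right-hand side as an update until the values stop changing. The two-state MDP below is a made-up toy example, and the dictionary layout used to store $p(s', r \mid s, a)$ is our own choice.

```python
# p[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a)
p = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}

def value_iteration(p, kappa=0.9, tol=1e-10):
    """Iterate v(s) <- max_a sum_{s', r} p(s', r | s, a) [r + kappa v(s')] to convergence."""
    v = {s: 0.0 for s in p}
    while True:
        delta = 0.0
        for s in p:
            best = max(sum(prob * (r + kappa * v[s2]) for prob, s2, r in outcomes)
                       for outcomes in p[s].values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {s: max(p[s], key=lambda a: sum(prob * (r + kappa * v[s2])
                                             for prob, s2, r in p[s][a]))
              for s in p}
    return v, policy

v, policy = value_iteration(p)
print(v, policy)
```

Each sweep costs a sum over all states, actions, and outcomes, which is why exact solution quickly becomes impractical for the large state spaces of real networks and approximate, learned value functions are used instead.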

There are a number of different algorithms for finding Π*, and these algorithms can be either model-based or model-free.^{58} Model-based RL algorithms are concerned with computing an optimal policy for an MDP, assuming that a perfect model of the environment is available. In contrast, model-free algorithms do not rely on the assumption that such a model exists, but rather sample the MDP to obtain statistical knowledge about the unknown model, without attempting to construct a model of the environment. RL algorithms can be further categorized as on-policy or off-policy: in on-policy approaches, the agent updates its action-value function using the action determined by the current policy, whereas in off-policy approaches, a different policy is used to select the action.^{27} Off-policy algorithms commonly utilize the *ɛ*-greedy policy, in which a threshold $\epsilon \in [0, 1] \subset \mathbb{R}$ is selected, and at each time step, a random real number between 0 and 1 is generated. If this number is greater than *ɛ*, the agent performs the action that maximizes the expected cumulative reward; otherwise, it performs a random action. This illustrates the trade-off between exploration and exploitation that is crucial within RL: only exploiting current knowledge leads to short-sighted policies, but we also need to exploit and refine successful policies to achieve high performance. Therefore, it is important to allow some degree of continuous exploration of the environment to achieve a policy that is optimal in the long term.^{59} One further distinction that will be encountered in the RL literature is between value-based algorithms, in which the value function is parameterized in order to find an approximation to the optimal policy,^{60} and policy-based algorithms, in which the policy is parameterized instead.^{60} Finally, it is possible to combine these approaches by utilizing two learners, known as the actor and the critic.
The actor learns the optimal action to take for a given state, and the critic learns to compute the value function of a given action.^{27} Below, we summarize the specific RL algorithms used by works referenced in this Tutorial, highlighting useful references for the reader. The algorithms included in this section are within the scope of deep reinforcement learning (DRL), a sub-field of RL that has become of great interest in the recent literature owing to its successful adaptations in several application domains.^{61} DRL relies on the intersection of reinforcement learning (RL) and deep learning (DL). In general, DRL algorithms incorporate DL to solve MDPs, often representing the policy or other learned functions as a NN.
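
To make the *ɛ*-greedy rule and the idea of an off-policy update concrete, the sketch below runs tabular Q learning (the non-deep precursor of deep Q learning) on a hypothetical five-state chain in which moving right eventually earns a reward of 1. The environment, hyperparameters, and helper names are assumptions for illustration and are not taken from any work cited here. Ties between equal action values are broken at random so that the untrained agent explores instead of repeatedly picking the same action.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action from one row of a Q table using the epsilon-greedy rule."""
    if rng.random() > epsilon:
        best = max(q_row)
        # Exploit: greedy action, ties broken at random.
        return rng.choice([a for a, q in enumerate(q_row) if q == best])
    return rng.randrange(len(q_row))  # explore: uniformly random action

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q learning on a hypothetical chain MDP.

    States 0..n_states-1; action 0 moves left (state 0 stays put), action 1
    moves right. Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # Off-policy target: bootstrap from the greedy value of s_next,
            # regardless of which action the epsilon-greedy policy takes next.
            target = r if s_next == terminal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == terminal:
                break
    return q

q = q_learning_chain()
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(greedy)  # [1, 1, 1, 1]: the learned greedy policy always moves right
```

The learned action values converge toward $\gamma^{d}$ for moving right, where $d$ is the remaining distance to the reward, which is why the greedy policy extracted at the end prefers action 1 in every non-terminal state.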

Deep Q learning is one commonly used DRL algorithm.^{62} Its key idea is to use a deep NN (DNN) to estimate the optimal action-value function $q^{*}(s, a)$.

Another commonly used RL algorithm is deep deterministic policy gradient (DDPG),^{63} an extension of the deterministic policy gradient (DPG) algorithm^{64} inspired by deep Q learning. The key idea behind DPG is to assume a deterministic policy, whose gradient can be shown to follow the gradient of the action-value function *q*(*s*, *a*) with respect to the action. DDPG extends this by using DNNs to parameterize the actor and the critic and by borrowing techniques from deep Q learning, such as experience replay and target networks.^{64} The resulting algorithm can handle continuous action spaces, addressing a shortcoming of deep Q learning.

## III. PHYSICAL LAYER APPLICATIONS

In this section, we outline several key research problems within the physical layer and highlight selected applications of ML to these problems from the literature. Specifically, we discuss quality of transmission (QoT) estimation, digital twins, equalization in short reach applications, and fiber nonlinear noise mitigation in long-haul transmission systems. Table I summarizes the works discussed, listing the physical layer application tackled, the ML techniques proposed, and their advantages.

| Application | ML technique(s) | Advantages | References |
|---|---|---|---|
| QoT estimation | Simple learning process, LMA | Interpretable | 72, 73, 75, and 87 |
| | GP | Well-quantified uncertainty | 76 |
| | CBR, NN | Experimental demonstration | 78–79 |
| | GP, NN | Physics-informed ML, less data required, and explainable | 86 and 85 |
| | NN, SVM | Self-adaptive and reduced computational complexity | 81 and 83 |
| Digital twins for optical networks | RNN, DRL, XGBoost | Experimental data and general framework | 41, 92, 94, 91, and 95 |
| Short reach equalization | DNN | Outperform conventional equalizers | 103 and 104 |
| | CNN | Outperforms DNN | 106 and 107 |
| | RNN, LSTM | Improved performance compared to FFNN via feedback | 109–112 |
| | SVM | Unsupervised and enable decoding of PAM-N signals | 113 and 114 |
| | DNN, RNN | Low complexity FPGA implementation | 105 and 111 |
| Fiber nonlinear noise mitigation | NN, ELM | Reduced computational complexity | 119 and 128 |
| | LSTM | Better performance than six-step DBP | 120 |
| | SVM, KNN, PW | Increased optimal launch power | 121, 122, and 124 |
| | K-means clustering | Found required overhead for transmission | 123 |
| | NN | Physics-informed ML and explainable | 129, 130, and 131 |
| | NN, transfer learning | Increased flexibility and reduced computational load | 137 |
| | K-means clustering | Low complexity FPGA implementation | 127 |


### A. Quality of transmission estimation

One of the most widely researched applications of ML in optical fiber communications is QoT estimation, as evidenced by a recent survey focusing on this application alone.^{6} QoT is an umbrella term for a number of metrics of the quality of a transmitted optical communication signal, including SNR, bit error rate (BER), and Q-factor.^{65} ML techniques are a logical approach to QoT estimation because numerous sources of uncertainty make the estimation and prediction of QoT challenging,^{6} and because QoT estimation is necessary for network-level control, such as the routing and spectrum assignment of new light paths. A number of QoT models based on the physics of transmission in the fiber exist, with varying degrees of accuracy. Two commonly used examples are the Gaussian noise (GN) model^{66} and the split-step Fourier transform method (SSFTM).^{67} However, their applicability is restricted by limited accuracy and high computational requirements, respectively. Moreover, both are limited by uncertainty in the physical layer inputs, and the magnitude of these uncertainties varies between deployed networks. For example, installed fibers can be accidentally damaged and then spliced back together, resulting in variations in the fiber attenuation. Other components, such as amplifiers and filters, can degrade in performance as they age,^{68} which can change physical layer parameters such as the EDFA noise figure. Additionally, parameters such as the fiber type and fiber chromatic dispersion (CD) may not be known to the operator in deployed networks.^{69} ML can be used either as a replacement for physics-based models or alongside them in order to combat the input uncertainty and to reduce the computational burden.
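
As a rough illustration of how a GN-type model turns physical layer inputs into a QoT estimate, the sketch below uses the common simplification in which the nonlinear interference (NLI) power grows as $\eta P^3$ with the per-channel launch power $P$, so that $\mathrm{SNR} = P/(P_\mathrm{ASE} + \eta P^3)$. Setting the derivative with respect to $P$ to zero gives $P_\mathrm{opt} = (P_\mathrm{ASE}/2\eta)^{1/3}$, at which the NLI power equals half the ASE power. The values of $P_\mathrm{ASE}$ and $\eta$ are hypothetical; in a real link they depend on uncertain inputs such as fiber attenuation and amplifier noise figure.

```python
import math

def dbm_to_w(p_dbm):
    return 1e-3 * 10 ** (p_dbm / 10)

def gn_snr(p_ch, p_ase, eta):
    """Per-channel SNR with NLI power modelled as eta * p_ch**3 (GN-model scaling).

    p_ch: launch power per channel [W]; p_ase: accumulated ASE noise [W];
    eta: NLI coefficient [1/W^2].
    """
    return p_ch / (p_ase + eta * p_ch ** 3)

def optimal_launch_power(p_ase, eta):
    """Setting d(SNR)/dP = 0 gives P_opt = (p_ase / (2 * eta))**(1/3)."""
    return (p_ase / (2 * eta)) ** (1 / 3)

# Hypothetical link parameters, for illustration only.
P_ASE = dbm_to_w(-30.0)  # accumulated ASE noise in the channel bandwidth [W]
ETA = 1e3                # NLI coefficient [1/W^2]

p_opt = optimal_launch_power(P_ASE, ETA)
snr_db = 10 * math.log10(gn_snr(p_opt, P_ASE, ETA))
print(f"P_opt = {10 * math.log10(p_opt / 1e-3):.2f} dBm, SNR = {snr_db:.2f} dB")
# P_opt = -1.00 dBm, SNR = 27.24 dB
```

Because the optimum SNR equals $P_\mathrm{opt}/(1.5\,P_\mathrm{ASE})$ in this model, any error in the assumed $P_\mathrm{ASE}$ or $\eta$ propagates directly into both the predicted SNR and the recommended launch power.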

The QoT estimation sub-domain can be further divided into three main problems. First, ML can be used to predict the QoT from physical layer inputs, such as the number of channels, operating wavelength, modulation format, and number of spans. This can be formulated as a regression problem, where the QoT itself is the target, or as a classification problem, where the goal is to predict whether a given light path will have sufficient QoT. Second, ML can be deployed to aid with QoT monitoring, commonly by learning the mapping between the variables measured by monitors and the QoT, often for the purpose of failure prediction.^{3,6} Finally, modeling the optical amplifiers used in optical fiber communications is challenging because the amplifier gain depends nonlinearly on wavelength, channel launch power, and the number of channels. As amplifiers can have a significant effect on the QoT, a number of works have applied ML to amplifier modeling.^{3,5,6}

Here, we outline selected examples of ML applied to QoT estimation from the literature that are representative of the field. Notable regression-based works include the use of a simple learning process based on gradient descent^{40} to reduce the uncertainty in the inputs to a physics-based QoT model.^{70} This is a hybrid approach, in which ML is used in concert with physical models of the QoT rather than relying solely on data. A similar approach was also demonstrated experimentally: a learning process was used to update the parameters of a physical model based on Q-factor measurements from an experimental system.^{71} Additionally, ML based on the Levenberg–Marquardt algorithm^{72} (LMA) was recently used for online optimization of the inputs to the GN model, specifically the launch powers, in a simulated network.^{73} Notably, the number of iterations used for the optimization is adaptive, which reduces the time and measurement resources required. Again, the role of ML here is to configure the inputs to the physical model rather than to replace it. In other approaches, the goal is to replace the physical model. For example, a GP regression model has been used to learn the functional relationship between the BER and the system transmission parameters, specifically the launch power, the length of fiber over which the signal is transmitted, the symbol rate, and the channel spacing.^{74} This model was trained on both simulated and experimental data, and it was shown to make accurate predictions for a system with a different configuration from the one on which it was trained. As many QoT estimation works use NNs, we highlight that more principled approaches, such as GPs with well-quantified predictive uncertainty, can also be used successfully for QoT estimation.
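
The hybrid idea of learning a physical model's inputs, rather than replacing the model, can be sketched as follows. Under assumed conditions, the uncertain input here is the NLI coefficient $\eta$ of the simplified SNR expression $P/(P_\mathrm{ASE} + \eta P^3)$, and plain gradient descent on $\ln\eta$ minimizes the squared error between modeled and "measured" SNR values in dB. The measurements are synthetic and noise-free, generated from a hypothetical true $\eta$, so the fit should recover it; this is an illustration of the principle, not the specific procedure of the cited works.

```python
import math

def snr_db(p_ch, p_ase, eta):
    """Simplified GN-model SNR in dB, with NLI power eta * p_ch**3."""
    return 10 * math.log10(p_ch / (p_ase + eta * p_ch ** 3))

# Hypothetical "measurements": SNR at launch powers from -4 to +4 dBm, generated
# from a true NLI coefficient that the physical model does not know.
P_ASE = 1e-6
ETA_TRUE = 1e3
powers = [1e-3 * 10 ** (p_dbm / 10) for p_dbm in range(-4, 5)]
measured = [snr_db(p, P_ASE, ETA_TRUE) for p in powers]

def fit_eta(log_eta, lr=0.02, iters=2000):
    """Gradient descent on the mean squared SNR error with respect to ln(eta)."""
    k = 10 / math.log(10)  # 10*log10(x) = k * ln(x)
    for _ in range(iters):
        eta = math.exp(log_eta)
        grad = 0.0
        for p, y in zip(powers, measured):
            nli = eta * p ** 3
            err = snr_db(p, P_ASE, eta) - y
            # d snr_db / d ln(eta) = -k * nli / (P_ASE + nli)
            grad += 2 * err * (-k * nli / (P_ASE + nli))
        log_eta -= lr * grad / len(powers)
    return math.exp(log_eta)

eta_hat = fit_eta(math.log(3 * ETA_TRUE))  # start from a guess that is 3x too high
print(f"estimated eta = {eta_hat:.1f} 1/W^2")  # estimated eta = 1000.0 1/W^2
```

Working in $\ln\eta$ keeps the parameter positive and makes the step size less sensitive to the scale of $\eta$; with real monitoring data, the remaining error would reflect measurement noise and model mismatch rather than converging to zero.
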
Moreover, an experimental network has been operated at a reduced margin using a case-based reasoning (CBR) approach,^{75} where the margin is the difference between the current signal QoT and the minimum acceptable QoT. In CBR, the QoT of established light paths is stored and used as a lookup table to estimate the QoT of new light paths that take a similar route through the network. This work is particularly interesting as it demonstrates that ML, albeit a simple form of it, can be useful for controlling an experimental optical fiber network; in this case, it allows the required margin to be reduced. Another recent experimental demonstration of ML-based QoT estimation used NNs trained on synthetic QoT data to estimate the SNR on a live network operated by Tele2 Estonia.^{76} Crucially, these models achieved a maximum SNR error of 0.5 dB and computed the SNR estimate on a microsecond time scale, indicating that such models could feasibly be deployed in real networks. DNNs have also been used recently to estimate the SNR from historical telemetry of the optical amplifiers in an experimental system; this work focuses on the effect of the amplifiers rather than on the nonlinear noise generated by transmission in the fiber, which the authors assume can be estimated using a physical model.^{77} Moreover, a NN-driven nonlinear SNR estimator was presented, for which the optimal combination of input features was found.^{78} In this work, knowledge of the physics of fiber transmission is used to guide feature engineering, in order to obtain the most effective set of input features.
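
A minimal sketch of the CBR lookup described above is shown below. It assumes that routes are compared by the Jaccard similarity of their link sets and that the estimate is a similarity-weighted average of the k most similar stored cases; both are illustrative choices, not necessarily those of the cited work. The link IDs, Q-factor values, and threshold are hypothetical.

```python
def jaccard(a, b):
    """Similarity between two routes, each given as a collection of link IDs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cbr_qot_estimate(case_base, new_route, k=2):
    """Estimate the QoT of a new light path from the k most similar stored cases.

    case_base: list of (route, measured_qot_db) pairs for established light paths.
    Returns a similarity-weighted average of the retrieved QoT values, or None
    if no stored case shares a link with the new route.
    """
    scored = sorted(((jaccard(r, new_route), q) for r, q in case_base), reverse=True)[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        return None
    return sum(s * q for s, q in scored) / total

# Hypothetical case base: link IDs and measured Q-factors (dB) of established paths.
cases = [
    (("A-B", "B-C"), 15.2),
    (("A-B", "B-C", "C-D"), 13.9),
    (("E-F",), 18.0),
]
estimate = cbr_qot_estimate(cases, ("A-B", "B-C", "C-E"))
threshold_db = 12.0  # hypothetical minimum acceptable QoT
print(f"estimated QoT {estimate:.2f} dB, margin {estimate - threshold_db:.2f} dB")
# estimated QoT 14.64 dB, margin 2.64 dB
```

Returning `None` when no case overlaps the new route reflects a basic limitation of CBR: it can only interpolate from routes similar to those already established.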

Classifiers have also been used for QoT estimation, such as a binary NN classifier trained on historical network data to determine whether a given request will have sufficient QoT to be established.^{79} The performance of this classifier was compared to that of an analytical QoT model,^{80} and it was found to be an effective replacement for that model, with the key benefit of self-adaptivity to changes in network conditions. Another work^{81} used an SVM, again as a binary classifier labeling light paths as having sufficiently high QoT to be established or not. Simulated data were used for training, as is common in network-scale research owing to the lack of detailed datasets from deployed networks.

Furthermore, an interesting research avenue within QoT estimation is the use of physics-based models in concert with ML. This can be done in a number of ways. For instance, physical models can be embedded directly into the ML: a methodology for training NNs that obey physical laws defined by partial differential equations was recently presented.^{82} The first steps toward using this in optical fiber communications have been taken, with a physics-informed NN used to solve the nonlinear Schrödinger equation (NLSE) in an optical fiber and model pulse evolution.^{83} An alternative is physics-informed GP regression, in which a physical model, in this case the SSFTM, is embedded within the GP.^{84} This allows GPs to be trained with fewer measurements of the system and is an explainable ML approach with well-quantified prediction uncertainty. Additionally, there are works, such as those described above,^{70,71} that focus on learning more accurate inputs to a physics-based QoT model. A similar approach has been applied to nonlinearity estimation,^{85} where ML is used to reduce physical model errors and to combine modeling and monitoring schemes. Moreover, knowledge of system physics can improve ML in other ways, such as by engineering higher-performing input features.^{78}

Finally, a recent paper^{86} highlights the remaining roadblocks to the effective deployment of ML for QoT estimation. Specifically, due to competition-related concerns, telecommunication companies are unwilling to give external researchers access to real network datasets, resulting in a reliance on simulated data or data produced using a lab setup. Given the limitations of physics-based models outlined above, and since a lab-based network will always be more idealized than a deployed network, such data may not be fully representative of deployed network data. As a result, the true efficacy of ML approaches in deployed networks is unknown. Moreover, many applications of ML in optical networks use error metrics that are standard in ML but may not be suited to optical networks. For example, it has been found that for optical network applications, using only the mean squared error may give an inflated measure of model efficacy, and novel error metrics have recently been proposed to address this.^{87} Thus, although the first problem is difficult to address and largely depends on network operators, the second provides an interesting avenue for further research.

### B. Digital twins

Digital twins are models that act as a virtual copy, or “twin,” of a real system. They are inherently data-driven,^{88} taking measurements from the real system as input to build up a model of its governing physical laws, states, and behavior. Information drawn from the digital twin can then be passed back to the real system in the form of changes to its operational configuration. This framework is outlined in Fig. 10. As optical communication network design and operation move toward higher levels of automation, digital twins are gaining popularity within the research community.^{89} It is hoped that digital twins can help bridge the gap between the ideal physical layer commonly assumed in optical communications and the far-from-ideal physical layer behavior of deployed networks. Although ML is not a required component of digital twins, their data-driven nature makes ML a natural tool for creating digital twin models. ML can serve as the basis of the digital twin itself: we can take measurements from the real network and train a sufficiently expressive ML model to emulate the behavior of the network. Alternatively, we can build the digital twin from physics-based models and use ML to reduce the gap between these models and reality, for example, by reducing the uncertainty in the model inputs, as discussed in Sec. III A. ML can also be used to extract more information from network monitors, which may allow more detailed digital twins to be developed.

A framework for applying digital twins in optical networks has recently been proposed,^{89} focusing on three crucial applications: fault prediction, hardware configuration, and simulation of transmission. Different ML approaches from the literature are proposed for each of these applications. For fault prediction and diagnosis, two models are proposed: a RNN to extract the operating state from time series data taken from monitors^{90} and an XGBoost^{91} model to map information from network monitors to new features that aid fault diagnosis.^{92} Moreover, DRL is proposed to learn an optimal strategy for hardware optimization.^{93} Specifically, the agent learns to control the configuration of a programmable optical transceiver in order to maximize the QoT under varying operating conditions. Finally, a RNN-based approach is proposed to learn a model of physical layer transmission in the network as a function of time series monitoring data.^{94} The digital twin is then created by combining these models, continually updating them with new data, and using them to control the network.^{89} Another recent work demonstrated a digital twin model based on an autoencoder, trained on an open-source dataset of power spectral density (PSD) profiles measured before and after transmission through an experimental optical network.^{95,96} Specifically, this model is used to find the input PSD that produces a desired output PSD, so it can be used as part of a digital twin to achieve optimal control of the network. It should also be noted that the PSD may be a less widely understood QoT metric, and methods to obtain the optical signal-to-noise ratio (OSNR) from PSD data have been proposed that could be used to convert these data, either before or after training the autoencoder.
Other works have successfully utilized autoencoders for end-to-end learning of an intensity modulation direct-detection (IM/DD) optical communication system, outperforming conventional signal processing techniques.^{97} This has been recently extended to include optimization of the symbol distribution for coherent systems.^{98} Such techniques may be of use in the development of digital twins as they constitute an end-to-end virtual model of the system with inherent mapping and feedback between the virtual model and the physical system.

### C. Equalization in short reach applications

Optical short reach systems, defined as having a length less than 100 km, are applied in server-to-server, intra-data center, inter-data center, access, and metro links. Due to stringent requirements of low complexity and cost, minimal power consumption, and small carbon footprint, IM combined with DD with simple on–off keying (OOK) or pulse amplitude modulation (PAM)-4 modulation format is still a preferable transceiver technology compared to coherent systems.^{99}

The increasing demand for high data rates in short reach applications, such as IM/DD-based systems, exposes several performance-limiting factors that need to be addressed. A schematic of a short reach link with possible sources of linear and nonlinear impairments is shown in Fig. 11. First, CD severely limits the link power budget margin. At high symbol rates and over several kilometers of transmission, the interaction of CD and DD causes a power-fading effect, and the detected signal may contain frequency notches. Because DD is based on square-law detection, CD equalization is complicated: we cannot simply multiply the received signal spectrum by the inverse of the CD transfer function as in coherent systems. Another common impairment in short reach systems is strong low-pass filtering due to the insufficient bandwidth of various components, which can cause severe inter-symbol interference. Furthermore, as short reach systems often have constrained financial budgets, low-cost components introduce non-idealities that degrade performance. Such low-cost devices, including lasers, modulators, photodiodes, and trans-impedance amplifiers, also produce nonlinear distortions, such as level-dependent skew and level-dependent noise.^{100}
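
The frequency notches caused by the interaction of CD and DD can be located with a short calculation. Assuming a chirp-free transmitter, double-sideband intensity modulation, and small-signal operation, the detected RF response is proportional to $\cos(\pi\lambda^2 D L f^2/c)$, where $D$ is the dispersion parameter and $L$ is the link length, so the notches lie at $f_k = \sqrt{(2k+1)c/(2\lambda^2 D L)}$. The sketch below evaluates this for an illustrative 10 km standard single-mode fiber link at 1550 nm with $D = 17$ ps/(nm km); these link values are assumptions, not taken from a cited experiment.

```python
import math

C = 299_792_458.0  # speed of light in vacuum [m/s]

def fading_notches_hz(d_ps_per_nm_km, length_km, wavelength_nm=1550.0, count=3):
    """Notch frequencies of CD-induced power fading in a chirp-free IM/DD link.

    The detected RF response is proportional to cos(pi * lambda^2 * D * L * f^2 / c),
    which vanishes when the argument equals (2k + 1) * pi / 2.
    """
    d = d_ps_per_nm_km * 1e-6   # 1 ps/(nm km) = 1e-6 s/m^2
    lam = wavelength_nm * 1e-9  # [m]
    length = length_km * 1e3    # [m]
    return [math.sqrt((2 * k + 1) * C / (2 * lam ** 2 * d * length)) for k in range(count)]

for f in fading_notches_hz(17.0, 10.0):
    print(f"{f / 1e9:.1f} GHz")  # 19.2 GHz, then 33.2 GHz, then 42.8 GHz
```

The first notch near 19 GHz shows why CD becomes a hard limit for high symbol rates even over about 10 km; transmitter chirp shifts these notch positions, so a real link would need the chirp parameter as an additional input.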

For equalization of linear impairments, a feed-forward equalizer (FFE), usually based on a finite impulse response filter, is commonly used. The effect of frequency notches cannot be mitigated by a FFE alone, although a decision feedback equalizer (DFE) can be added after the FFE to combat this effect. However, DFEs may suffer from error propagation and instability due to the decision feedback. Moreover, FFEs and DFEs cannot mitigate nonlinear effects. Volterra nonlinear equalizers are an effective way to mitigate both fiber nonlinearity and component nonlinearities;^{101} however, their major drawback is their large implementation and computational complexity.
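
The sketch below shows an FFE implemented as a finite impulse response filter whose taps are adapted with the least-mean-squares (LMS) algorithm, which is one common adaptation rule; the text does not prescribe one. The PAM-4 channel (main cursor 1, post-cursors 0.4 and 0.2), noise level, tap count, and step size are all hypothetical. The taps are trained on known symbols and then frozen, and the symbol error rate (SER) is compared against slicing the received samples directly.

```python
import random

LEVELS = (-3, -1, 1, 3)  # PAM-4 amplitudes

def slicer(v):
    """Hard decision: nearest PAM-4 level."""
    return min(LEVELS, key=lambda s: abs(v - s))

def simulate(n, h, noise_std, rng):
    """Hypothetical linear ISI channel: y[i] = sum_k h[k] * x[i - k] + noise."""
    x = [rng.choice(LEVELS) for _ in range(n)]
    y = [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0) + rng.gauss(0, noise_std)
         for i in range(n)]
    return x, y

def lms_ffe(x, y, n_taps=7, mu=0.002, n_train=5000):
    """FIR feed-forward equalizer; taps adapted by LMS on known symbols, then frozen."""
    w = [0.0] * n_taps
    decisions = []
    for i in range(len(y)):
        window = [y[i - k] if i - k >= 0 else 0.0 for k in range(n_taps)]
        out = sum(wk * yk for wk, yk in zip(w, window))
        decisions.append(slicer(out))
        if i < n_train:  # training phase: the transmitted symbol is known
            err = x[i] - out
            w = [wk + mu * err * yk for wk, yk in zip(w, window)]
    return w, decisions

rng = random.Random(1)
x, y = simulate(6000, h=[1.0, 0.4, 0.2], noise_std=0.05, rng=rng)
w, decisions = lms_ffe(x, y)
test = range(5000, 6000)
ser_raw = sum(slicer(y[i]) != x[i] for i in test) / len(test)
ser_ffe = sum(decisions[i] != x[i] for i in test) / len(test)
print(f"SER without equalization: {ser_raw:.3f}, with LMS-trained FFE: {ser_ffe:.3f}")
```

Because this toy channel is linear and minimum phase, a short FFE is sufficient; it would not remove the nonlinear distortions or the deep spectral notches discussed above.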

Recently, ML techniques have attracted significant attention for equalization in short reach systems, with NN-based equalization at the center of this interest. A sufficiently large NN with at least one hidden layer can approximate any continuous function on a bounded input domain to arbitrary accuracy and thus can be used to equalize both linear and nonlinear impairments. Usually, the input vector of the equalizer is a set of consecutive sampled symbols, and it should be long enough to cover the channel memory. The NN can be structured with a single hidden layer and a large number of nodes, or with multiple hidden layers (i.e., a DNN) with relatively fewer nodes. The choice of nonlinear activation function in each hidden layer is important, as it enables the approximation of the nonlinear functions needed to deal with the distortions of short reach systems. Commonly used hidden layer activation functions are the sigmoid, the rectified linear unit (ReLU), and the hyperbolic tangent (tanh). On the other hand, the Softmax activation function is usually chosen for the output layer, as it allows the NN to make symbol decisions for any PAM-N signal in addition to performing equalization.^{102} Several experimental demonstrations have shown that NN-based equalizers outperform conventional equalizers, such as FFE and Volterra nonlinear equalizers.^{102,103} In addition, a field programmable gate array (FPGA) implementation of a fixed-point DNN-based equalizer was demonstrated for high-speed passive optical networks.^{104}
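
The structure described above can be sketched in NumPy: a sliding window of consecutive received samples feeds a single tanh hidden layer, and a Softmax output layer gives class probabilities for the four PAM-4 levels, trained by full-batch gradient descent on the cross-entropy loss. The channel (pre- and post-cursor ISI, a mild cubic nonlinearity, and Gaussian noise), window length, layer width, learning rate, and number of epochs are illustrative assumptions. The printed accuracy is measured on the training data; a real evaluation would use a separate test set.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # PAM-4 amplitudes
N_TAPS, N_HIDDEN = 7, 16

def make_windows(rx, n_taps):
    """One row per symbol: n_taps consecutive received samples centred on it."""
    half = n_taps // 2
    padded = np.pad(rx, (half, half))
    return np.stack([padded[i:i + n_taps] for i in range(len(rx))])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=4000)
# Hypothetical channel: pre/post-cursor ISI, a mild cubic nonlinearity, and noise.
lin = np.convolve(LEVELS[labels], [0.2, 1.0, 0.3], mode="same")
rx = lin - 0.01 * lin ** 3 + 0.1 * rng.standard_normal(lin.size)

X = make_windows(rx / 3.0, N_TAPS)  # scale inputs to roughly unit range
Y = np.eye(4)[labels]               # one-hot targets

W1 = 0.5 * rng.standard_normal((N_TAPS, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = 0.5 * rng.standard_normal((N_HIDDEN, 4))
b2 = np.zeros(4)

def forward(X):
    H = np.tanh(X @ W1 + b1)        # hidden layer
    return H, softmax(H @ W2 + b2)  # class probabilities per symbol

lr, losses = 0.5, []
for _ in range(300):
    H, P = forward(X)
    losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))  # cross-entropy
    dZ2 = (P - Y) / len(X)              # gradient at the Softmax input
    dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)   # back-propagate through tanh
    W2 -= lr * (H.T @ dZ2)
    b2 -= lr * dZ2.sum(axis=0)
    W1 -= lr * (X.T @ dZ1)
    b1 -= lr * dZ1.sum(axis=0)

_, P = forward(X)
accuracy = np.mean(P.argmax(axis=1) == labels)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {accuracy:.3f}")
```

Taking the argmax of the Softmax output performs the symbol decision directly, which is why no separate slicer is needed after the network.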

A CNN-based equalizer was also investigated by Li *et al.*^{105} As the convolutional layer acts as a multi-channel nonlinear learned local pattern detector, it allows the equalizer to overcome inter-symbol interference and device nonlinearity. In CNN-based nonlinear compensation, the time series input signal is converted to a 1D input array with *N* elements, comprising the current symbol together with $(N-1)/2$ past and $(N-1)/2$ post symbols; this array is then processed by multiple convolutional layers and fully connected layers with nonlinear activation functions. Experimental demonstrations showed that the CNN-based approach yields a considerable performance improvement compared with a DNN-based approach.^{105,106}

RNNs exhibit more powerful equalization capabilities than feed-forward multilayer perceptrons (MLPs) or CNNs, as they can use feedback of past output values as an additional input when calculating the present output value.^{107,108} With this additional feedback information, RNNs perform better than FFNNs, analogous to the performance improvement given by combining FFE and DFE compared with FFE alone. Auto-regressive RNNs and layer RNNs are two commonly used types of RNN, and the former offers better equalization performance.^{109} An RNN-based equalizer with parallel outputs was investigated using an FPGA implementation for a 100 Gb/s passive optical network application.^{110} As a variant of RNNs, LSTMs were also demonstrated for the equalization of a 50 Gb/s PAM-4 transmission system.^{111}

In addition to various NN-based equalizers, SVM-based approaches have been demonstrated as an effective tool for mitigation of nonlinear impairments in a short reach application scenario.^{112,113}

The computational complexity of the nonlinear equalizer is a critical issue for short reach optical communications, because the equalizer must be implemented in real time at extremely high symbol rates. It has been shown that a NN-based equalizer with a single hidden layer can provide better performance with lower computational complexity than a Volterra equalizer.^{114,115} However, a comprehensive analysis of the computational complexity and performance of various advanced ML-based equalization approaches is still required, and complexity reduction techniques need to be explored. Given the significant potential for implementing practical NN-based equalizers on digital signal processing (DSP) ASICs, ML-based equalization may become the mainstream technology for next-generation short reach IM/DD-based systems.
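
A first-order way to compare the complexity of the two equalizer families is to count coefficients. The sketch below does this for a third-order Volterra equalizer with symmetric kernels and for a NN with one hidden layer, using hypothetical memory lengths and layer sizes. Coefficient counts are only a proxy: the real multiplications per symbol also depend on implementation details, such as how sample products in the Volterra terms are shared, and on the cost of evaluating the activation functions.

```python
def volterra_coefficients(n1, n2, n3):
    """Number of kernel coefficients of a third-order Volterra equalizer.

    n1, n2, n3 are the memory lengths of the first-, second-, and third-order
    kernels; symmetric (redundant) kernel terms are counted once, giving
    n2*(n2+1)/2 second-order and n3*(n3+1)*(n3+2)/6 third-order terms.
    """
    return n1 + n2 * (n2 + 1) // 2 + n3 * (n3 + 1) * (n3 + 2) // 6

def mlp_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

print(volterra_coefficients(15, 15, 15))  # 815
print(mlp_parameters(15, 10, 1))          # 171
```

The cubic growth of the third-order Volterra term with memory length is the main reason such equalizers become expensive as the channel memory increases, whereas the NN count grows linearly with the input window for a fixed hidden-layer width.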

### D. Fiber nonlinear noise mitigation in long-haul transmission systems

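The propagation of the optical field *E*(*z*, *t*) in a fiber is governed by the NLSE,^{116}

$$\frac{\partial E}{\partial z} = -\frac{\alpha}{2}E - \frac{i\beta_2}{2}\frac{\partial^2 E}{\partial t^2} + i\gamma|E|^2E,$$

where *α* is the fiber loss coefficient, *β*_{2} is the fiber group velocity dispersion coefficient, and *γ* denotes the fiber nonlinear coefficient.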

Although the SSFTM can be used to solve the NLSE numerically, its accuracy is limited when the interplay among signal, noise, nonlinearity, and dispersion is taken into account. Therefore, the performance improvement of conventional NLSE-based digital back-propagation (DBP) is limited.^{117} Since the achievable improvement depends on the modeling accuracy, ML techniques can be applied to describe the evolution of the optical signal over long-haul transmission. Specifically, ML techniques are used to find a nonlinear function **f** that maps the received symbol to the transmitted symbol under certain criteria.

Unlike in short reach transmission scenarios, the nonlinear function **f** has to be obtained by treating the in-phase (I) and quadrature (Q) branches of the complex-valued signal separately. In early work,^{118} an artificial neural network (ANN) trained with extreme learning machine (ELM)-based techniques was used in a coherent receiver after CD compensation. Simulation results for 27.59 GBd return-to-zero (RZ) quadrature phase shift keying (QPSK) show that the ELM-based technique can provide performance similar to conventional DBP with much lower computational complexity after 2000 km of standard single-mode fiber (SSMF) transmission. More recently, LSTMs have been proposed to mitigate fiber nonlinear impairments in dual-polarization WDM transmission systems, and it was shown in simulation that LSTMs can outperform conventional DBP with six steps per span.^{119}

It is known that nonlinear noise is non-Gaussian distributed, so conventional linear decision boundaries are suboptimal in nonlinear fiber channels. One general idea in ML-based coherent receivers is therefore to design nonlinear decision boundaries, which are expected to be better suited to the nonlinear fiber channel because the nonlinear noise generated there need not follow a Gaussian distribution. Several techniques have been applied to design such nonlinear classifiers. An M-ary SVM has been introduced to mitigate the nonlinear phase noise in single-channel single-polarization (SCSP) 16-QAM coherent optical systems. Compared with linear channel equalization, simulation results show that M-ary SVMs can increase the optimal launch power by around 4 dB and extend the transmission distance by around 1200 km.^{120} The K-nearest neighbor (KNN) algorithm has also been used to mitigate channel impairments, including laser phase noise and nonlinear fiber noise; simulation results show that the optimal launch power can be increased by ∼0.4 dB in an SCSP 16-QAM coherent transmission system.^{121} Another work using K-means clustering^{122} experimentally investigated the required length of the training sequence for fiber nonlinearity mitigation in an SCSP 64-QAM 80 km transmission scheme and found that a 10% training overhead is sufficient to obtain optimal performance. Another recent nonlinear classification approach is based on the Parzen window (PW) classifier, which is inherently multi-class and can be implemented in an online learning mode.^{123} With the DBP technique as a benchmark, simulation results show that a PW classifier can further improve performance by ∼0.35 dB for 16-QAM after 1600 km and by ∼0.2 dB for 64-QAM after 480 km of fiber transmission.
A density-based spatial clustering of applications with noise (DBSCAN) algorithm was employed for blind fiber nonlinearity compensation.^{124} The experimental results showed that this algorithm can provide up to 0.83 and 8.84 dB enhancement in the Q-factor compared with conventional k-means clustering and linear equalization, respectively, in a 40 Gb/s 16-QAM system after 50 km of SSMF transmission. A histogram-based clustering algorithm was also demonstrated in a coherent optical long reach passive optical network, achieving a Q-factor 0.57 dB higher than that obtained using maximum likelihood detection and 0.21 dB higher than that obtained using k-means clustering.^{125} In another recent work, an FPGA-based real-time fiber nonlinearity compensator using the sparse K-means++ clustering algorithm was experimentally demonstrated in a 40 Gb/s 16-QAM self-coherent optical system, yielding a 3 dB Q-factor improvement over linear equalization after 50 km of transmission.^{126} More recently, a DNN-based nonlinear classifier with a cross-entropy cost function was used as a soft-demapper for soft-decision forward error correction (FEC).^{127} In 92 GBd dual-polarization 64-QAM (950 Gb/s) coherent back-to-back measurements, the DNN-based nonlinear classifier outperformed pruned Volterra nonlinear equalizers by 0.35 dB in OSNR at equal complexity, or achieved similar performance with 65% less computational complexity.
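
To illustrate why nonlinear decision boundaries help, the sketch below applies a hypothetical amplitude-dependent phase rotation (a crude stand-in for nonlinear phase noise, with 0.03 rad of rotation per unit of $|x|^2$) plus small Gaussian noise to 16-QAM symbols. A minimum-distance detector with straight decision boundaries is compared with a KNN classifier trained on labeled received symbols. The channel model and all parameters are illustrative assumptions and do not reproduce any cited result.

```python
import cmath
import random

# 16-QAM constellation on the odd-integer grid {-3, -1, 1, 3}^2.
CONST = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]

def channel(sym, rng, k_nl=0.03, noise_std=0.05):
    """Toy channel: phase rotation proportional to |sym|^2, plus Gaussian noise."""
    rotated = sym * cmath.exp(1j * k_nl * abs(sym) ** 2)
    return rotated + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))

def min_distance(r):
    """Conventional decision with straight (linear) boundaries: nearest grid point."""
    return min(range(len(CONST)), key=lambda i: abs(r - CONST[i]))

def knn(r, train, k=5):
    """Majority vote among the k nearest labelled training points."""
    nearest = sorted(train, key=lambda t: abs(r - t[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

rng = random.Random(0)
train = [(channel(CONST[i], rng), i) for i in range(16) for _ in range(20)]
labels = [rng.randrange(16) for _ in range(2000)]
test = [(channel(CONST[i], rng), i) for i in labels]

ser_linear = sum(min_distance(r) != i for r, i in test) / len(test)
ser_knn = sum(knn(r, train) != i for r, i in test) / len(test)
print(f"SER minimum-distance: {ser_linear:.3f}, SER KNN: {ser_knn:.3f}")
```

In this toy channel the four corner points are rotated across a straight boundary, so the minimum-distance detector misclassifies them every time, whereas KNN simply learns where each rotated cluster lies.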

The ML techniques above operate as black boxes, yielding data-driven models that can perform very well but give little insight into how they work. Therefore, some works have sought to provide more insight into how nonlinear fiber noise is mitigated by ML techniques. Recently, a NN has been designed with a structure mirroring that of DBP, an approach known as learned DBP.^{128} The conventional DBP algorithm is a cascade of linear filters *D*^{−1} for CD compensation and nonlinear operations *N*^{−1} for nonlinear phase derotation, as shown in Fig. 12(a). Each linear filter *D*^{−1} is given by the frequency-domain transfer function $H_k(\omega) = \exp\left[(\alpha - i\omega^2\beta_2)L_k/2\right]$, where *L*_{k} is the length of the *k*th span. The nonlinear operation *N*^{−1} for the *k*th span is given by $\delta_k(x) = x\exp\left(-i\gamma\xi_k|x|^2\right)$, where *ξ*_{k} is a scaling factor. In practice, the linear filter *D*^{−1} is implemented as a time-domain finite impulse response filter whose coefficients are adjusted during training of the NN. The interleaved linear and nonlinear processing in DBP can therefore be regarded as the linear and nonlinear operations of a multi-layer NN, as shown in Fig. 12(b), where the input is the received samples and the output is the estimated symbol sequence. In this case, the parameters *ξ*_{k} and the filter coefficients of *D*^{−1} can all be optimized via ML techniques.
An experimental demonstration was also conducted to evaluate the learned DBP algorithm in a dual-polarization (DP) five-channel WDM transmission system, accounting for other impairments of coherent transmission, including frequency offset and laser phase noise.^{129} The experimental results show that learned DBP with 1 and 2 steps per span provides an additional gain of 0.25 and 0.45 dB, respectively, over conventional DBP with 50 steps per span, and a total gain of 0.85 and 1 dB over linear equalization. It is also shown that learned DBP can give insight into how and what the NN learns, which may help researchers analyze the interplay between CD, nonlinearity, and noise more closely. Regarding complexity, learned DBP with 1 step per span is shown to outperform conventional DBP with 50 steps per span.^{130} Note that the performance improvement of learned DBP comes from optimizing the DBP parameters, so it incurs no additional computational complexity compared with conventional DBP using the same number of steps.
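
The DBP structure described above can be sketched with NumPy FFTs. This is a one-step-per-span sketch in normalized, hypothetical units: the forward model is a toy unamplified link in which each span applies a lumped Kerr phase rotation followed by loss and dispersion, and `dbp` applies the filter $D^{-1}$ (written here as the exact inverse of the toy forward linear step $\exp[(-\alpha + i\beta_2\omega^2)L/2]$) followed by the derotation $x\exp(-i\gamma\xi_k|x|^2)$. With $\xi_k$ set to the effective length used in the toy forward model, DBP inverts the channel up to rounding error, whereas linear-only compensation ($\xi_k = 0$) does not. In learned DBP, the filter taps and $\xi_k$ would instead be trainable parameters.

```python
import numpy as np

def linear_step(field, omega, alpha, beta2, length):
    """Multiply the spectrum by exp[(-alpha + 1j*beta2*omega**2) * length / 2].

    With a negative length this applies the inverse filter D^{-1} used in DBP."""
    spectrum = np.fft.fft(field)
    return np.fft.ifft(spectrum * np.exp((-alpha + 1j * beta2 * omega ** 2) * length / 2))

def forward_channel(field, omega, alpha, beta2, gamma, length, n_spans):
    """Toy unamplified link: per span, a lumped Kerr phase, then loss and dispersion."""
    l_eff = (1 - np.exp(-alpha * length)) / alpha
    for _ in range(n_spans):
        field = field * np.exp(1j * gamma * l_eff * np.abs(field) ** 2)
        field = linear_step(field, omega, alpha, beta2, length)
    return field

def dbp(field, omega, alpha, beta2, gamma, xi, length, n_spans):
    """One-step-per-span DBP: D^{-1}, then x * exp(-1j * gamma * xi * |x|^2)."""
    for _ in range(n_spans):
        field = linear_step(field, omega, alpha, beta2, -length)
        field = field * np.exp(-1j * gamma * xi * np.abs(field) ** 2)
    return field

# Normalized, hypothetical parameters (not a real link).
rng = np.random.default_rng(0)
n, alpha, beta2, gamma, length, n_spans = 256, 0.1, -1.0, 1.0, 1.0, 5
omega = 2 * np.pi * np.fft.fftfreq(n)
tx = (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)
rx = forward_channel(tx, omega, alpha, beta2, gamma, length, n_spans)

xi = (1 - np.exp(-alpha * length)) / alpha  # matches the toy forward model
err_dbp = np.max(np.abs(dbp(rx, omega, alpha, beta2, gamma, xi, length, n_spans) - tx))
err_lin = np.max(np.abs(dbp(rx, omega, alpha, beta2, gamma, 0.0, length, n_spans) - tx))
print(f"max error: linear-only {err_lin:.2e}, DBP {err_dbp:.2e}")
```

Exact inversion is possible here only because the toy forward model uses the same lumped operators in the same order and contains no noise; in a real fiber, distributed nonlinearity and signal-noise interactions are what leave room for learned DBP parameters to improve on the conventional choice.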

In the perturbation-based approach,^{130} the first-order nonlinear distortion of the *x*-polarization symbol at time index 0 is modeled as

$$\Delta H_0 = P_0^{3/2}\sum_{m,n}\left(H_n H_{m+n}^{*} H_m + V_n V_{m+n}^{*} H_m\right)C_{m,n},$$

where *P*_{0}, *H*_{m}, *V*_{m}, and *C*_{m,n} are the optical power, the sample sequences for the *x* and *y* polarizations, and the perturbation coefficients, respectively. In the conventional method,^{131} the perturbation coefficients *C*_{m,n} can be computed analytically given the link parameters and the signal pulse duration and shaping factors. Alternatively, *C*_{m,n} can be obtained via a two-layer NN, which can describe the model more accurately by taking higher-order nonlinearities into account. In a single-channel 32 GBd DP-16QAM transmission system, a Q-factor improvement of ∼0.6 dB is observed after 2800 km of SSMF transmission when the transmitted symbols are pre-distorted based on the perturbation coefficients estimated via the NN.
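
The first-order perturbation model discussed here is commonly written as $\Delta H_0 = P_0^{3/2}\sum_{m,n}\left(H_n H_{m+n}^{*} H_m + V_n V_{m+n}^{*} H_m\right)C_{m,n}$. The sketch below evaluates this sum for each symbol of the *x*-polarization sequence and subtracts it, which is one common way to pre-distort at the transmitter. It assumes the coefficients are supplied as a Python dictionary keyed by $(m, n)$, whether computed analytically or predicted by a NN; the helper names `perturbation_term` and `predistort_x` are hypothetical.

```python
import numpy as np


def perturbation_term(H, V, C, P0, k):
    """First-order estimate of the x-polarization distortion on symbol k.

    H, V : complex symbol sequences (equal length) for the x and y polarizations
    C    : dict mapping (m, n) -> perturbation coefficient C_{m,n}
    P0   : optical power in linear units
    """
    total = 0j
    for (m, n), c in C.items():
        idx = (k + m, k + n, k + m + n)
        if min(idx) < 0 or max(idx) >= len(H):
            raise ValueError("symbol k lacks the neighbors required by C")
        h_m, h_n, h_mn = H[k + m], H[k + n], H[k + m + n]
        v_n, v_mn = V[k + n], V[k + m + n]
        # Intra-polarization (H) and cross-polarization (V) triplets
        total += (h_n * np.conj(h_mn) * h_m + v_n * np.conj(v_mn) * h_m) * c
    return P0 ** 1.5 * total


def predistort_x(H, V, C, P0):
    """Subtract the estimated distortion from every symbol with full neighbor support."""
    reach = max((max(abs(m), abs(n), abs(m + n)) for m, n in C), default=0)
    out = np.array(H, dtype=complex)
    for k in range(reach, len(H) - reach):
        out[k] = H[k] - perturbation_term(H, V, C, P0, k)
    return out


# Usage: random QPSK on both polarizations with two hypothetical coefficients
rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], (2, 64)) + 1j * rng.choice([-1, 1], (2, 64))) / np.sqrt(2)
tx_x = predistort_x(qpsk[0], qpsk[1], {(1, 1): 0.02 - 0.01j, (-1, 2): 0.01j}, 1e-3)
```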

ML-based compensation for multicarrier modulation formats has also been investigated. For orthogonal frequency-division multiplexing (OFDM), an ANN was proposed that provides a 2 dB Q-factor improvement for a 40 Gb/s 16-QAM signal after a 2000 km fiber link.^{132} This improvement increased to 4 dB at a data rate of 70 Gb/s. A multiple-input multiple-output DNN-based nonlinear dispersion compensator was also demonstrated for a 40 Gb/s coherent OFDM system, achieving a significant power margin improvement over both a conventional linear equalizer and a single-input single-output DNN.^{133} Using the same experimental setup, support vector regression showed a 1 dB Q-factor improvement over the full-field DBP method for 40 Gb/s 16-QAM OFDM over 2000 km of SSMF.^{134} In further work, a Newton-based SVM method requiring significantly less computation than a conventional SVM was proposed; it extended the optimum launch power by 2 dB compared with a Volterra-based nonlinear equalizer.^{135}

Finally, we consider the flexibility of NN-based nonlinear channel equalizers. A central question is whether the training process must be repeated whenever the channel conditions (modulation format, launch power, transmission distance, etc.) change. To address this, transfer learning has recently been proposed to reuse some parameters of a NN model trained for a previous system when building a new NN model for the modified system, so that fewer training resources are needed.^{136} Simulation results indicate that the number of training epochs or the size of the training dataset can be reduced by up to 99% when transfer learning is used. This suggests that fast reconfigurable nonlinear equalizers are feasible for practical optical networks.
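
The idea of reusing trained parameters can be sketched with a tiny NumPy equalizer. In this hypothetical example, a one-hidden-layer network is first trained by gradient descent to invert a toy memoryless channel with a cubic distortion. For a second channel with a stronger distortion and only 50 training samples, the hidden layer is frozen and only the output layer is refitted by least squares. This is one simple form of transfer learning and not necessarily how parameters are reused in the cited work; the channel model, layer sizes, and function names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_channel(x, nl):
    """Hypothetical memoryless channel: linear term plus a cubic distortion."""
    return x + nl * x**3


def hidden(r, W1, b1):
    """Hidden-layer features, shape (num_samples, num_hidden)."""
    return np.tanh(r[:, None] * W1 + b1)


def train(r, target, rng, n_hidden=16, lr=0.02, epochs=1500):
    """Full-batch gradient descent on the mean squared error."""
    W1, b1 = rng.normal(size=n_hidden), np.zeros(n_hidden)
    W2, b2 = 0.1 * rng.normal(size=n_hidden), 0.0
    for _ in range(epochs):
        h = hidden(r, W1, b1)
        err = h @ W2 + b2 - target
        dh = np.outer(err, W2) * (1 - h**2)  # backpropagate through tanh (uses old W2)
        W2 = W2 - lr * (h.T @ err) / len(r)
        b2 = b2 - lr * err.mean()
        W1 = W1 - lr * (dh * r[:, None]).mean(axis=0)
        b1 = b1 - lr * dh.mean(axis=0)
    return W1, b1, W2, b2


def refit_output_layer(r, target, W1, b1):
    """Transfer step: keep (W1, b1) frozen and solve for the output layer."""
    A = np.hstack([hidden(r, W1, b1), np.ones((len(r), 1))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[:-1], coef[-1]


# Source system: plenty of training data
x_a = rng.uniform(-1, 1, 2000)
W1, b1, W2, b2 = train(toy_channel(x_a, 0.10), x_a, rng)

# Target system: stronger distortion, only 50 labeled samples
x_b = rng.uniform(-1, 1, 50)
r_b = toy_channel(x_b, 0.15)
W2_b, b2_b = refit_output_layer(r_b, x_b, W1, b1)
```

Because the least-squares refit searches over all output layers for the frozen features, including the one inherited from the source system, its training error on the new data can never be worse than simply reusing the old output layer.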

## IV. NETWORK LAYER APPLICATIONS

In this section, we describe crucial research domains within the network layer and highlight selected ML approaches from the literature for tackling the problems in these domains: network traffic prediction and generation, and core network parameter optimization. As detailed below, supervised learning approaches have been successfully deployed in the former domain, whereas RL approaches have shown great potential in the latter. Table II summarizes these applications and highlights the advantages of the particular ML methods employed.

| Application | ML technique(s) | Advantages | Reference(s) |
|---|---|---|---|
| Traffic prediction and generation | FFNN | Adaptive method and improved resource utilization | 147 and 142 |
| | RNN (GRU, LSTM) | Captures temporal aspects and more capacity available | 143, 145, and 146 |
| | GP | Improved efficiency and reduced traffic disruption | 140 and 148 |
| | SVM, DT, RF, LDA | Classification | 138 and 139 |
| | GNN | Captures graph structure | 144 |
| | GAN | Ability to generate realistic data | 149 |
| Core network parameter optimization | RL | Handles dynamic traffic requests | 161 and 162 |
| | GNN | Leverages network structure and topology invariance | 163 and 164 |

### A. Network traffic prediction and generation

In state-of-the-art optical networks, traffic is typically represented by demands.^{139} Network operation is modeled on a discrete time scale divided into time steps, or iterations. In each time step, a number of demands arrive at the network, some of which are established. Each demand can be described by the time step in which it arrives, its source node (where the demand originates), its destination node (where it terminates), its volume, and its holding time.^{137} In a real-time flexible networking scenario such as elastic optical networks (EONs), where the network can adapt to accommodate incoming traffic,^{140} ML techniques coupled with dynamic routing algorithms can significantly improve overall network performance.^{141} A key challenge in increasing the efficiency of network operation is to predict the bandwidth requirement in the next time step from real-time measurements of traffic characteristics. When using ML methods, the goal is to forecast future traffic rate variations as precisely as possible from the measured history.
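
The demand description above maps naturally onto a small data structure. The sketch below is a hypothetical illustration rather than the representation used in the cited works: it stores each demand's arrival step, source, destination, volume, and holding time, aggregates the active volume per time step, and frames the result as a supervised forecasting problem in which a window of past values is paired with the value at the next step. The `Demand` class, the volume unit, and the helper names are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Demand:
    """One traffic demand, with the attributes listed in the text."""
    arrival_step: int   # time step in which the demand appears
    source: int         # initial node
    destination: int    # final node
    volume: float       # demand volume (e.g., in Gb/s; the unit is an assumption)
    holding_time: int   # number of time steps the demand remains active


def offered_load(demands, num_steps):
    """Total volume of active demands in each time step (hypothetical aggregate)."""
    load = np.zeros(num_steps)
    for d in demands:
        end = min(d.arrival_step + d.holding_time, num_steps)
        load[d.arrival_step:end] += d.volume
    return load


def make_windows(series, history):
    """Pair each run of `history` past values with the value at the next step."""
    X = np.array([series[i:i + history] for i in range(len(series) - history)])
    y = np.asarray(series[history:])
    return X, y


demands = [Demand(0, 1, 2, 10.0, 3), Demand(2, 2, 3, 5.0, 2)]
load = offered_load(demands, num_steps=5)
X, y = make_windows(load, history=2)   # inputs and targets for a next-step forecaster
```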

NN-based approaches are the most commonly used ML technique in the traffic prediction literature,^{143–145} with early research using standard ANNs^{141} and later research using different variations of NNs.^{144–145} Others have employed NNs with improved optimizers; for example, Zhan *et al.*^{146} used a NN optimized by the adaptive artificial fish swarm algorithm to predict tidal traffic.

Variants of NNs used for state-of-the-art traffic prediction include RNNs such as gated recurrent units (GRUs) and LSTMs, owing to their ability to adaptively capture dependencies on different time scales (see Sec. II). GRUs have been used to predict traffic matrices for a fixed-grid WDM network^{142} and for a backbone EON,^{143} and LSTMs have been applied to traffic prediction in passive optical networks^{144} and core networks.^{137} Figure 13 shows an example of a GRU-based traffic prediction model.
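
For readers unfamiliar with GRUs, the following NumPy sketch shows the forward computation of one common GRU formulation (update gate, reset gate, and candidate state) run over a history of traffic measurements, followed by a linear readout that produces a next-step forecast. The weights are random and untrained, the input could be, for example, a flattened traffic matrix, and all names are hypothetical; in practice the parameters would be learned by backpropagation through time in a deep learning framework. Some libraries swap the roles of $z$ and $1 - z$ in the state update.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h, p):
    """One GRU update; p holds input (W), recurrent (U), and bias (b) parameters."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand                             # blend old and new state


def init_params(n_in, n_hidden, rng, scale=0.3):
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(scale=scale, size=(n_hidden, n_in))
        p["U" + g] = rng.normal(scale=scale, size=(n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p


def predict_next(history, p, w_out, b_out):
    """Run the GRU over a (T, n_in) history and map the final state to a forecast."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in history:
        h = gru_step(x_t, h, p)
    return w_out @ h + b_out, h


rng = np.random.default_rng(1)
p = init_params(n_in=3, n_hidden=8, rng=rng)   # e.g., 3 monitored traffic flows
history = rng.normal(size=(20, 3))             # 20 past (normalized) measurements
w_out, b_out = rng.normal(size=(3, 8)), np.zeros(3)
forecast, h_final = predict_next(history, p, w_out, b_out)
```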

GNNs are another recent type of NN studied in the traffic prediction literature. For traffic data defined over a network topology, the ability of GNNs to learn inter-node dependencies from a graph representation of the network makes them strong candidates for this domain. Gui *et al.*^{143} studied the pairwise spatial correlations between optical network nodes using a directed graph, in which the nodes represent switch traffic and the edge weights denote connections among optical network nodes. A GCN was then employed to exploit these spatial correlations. Vinchoff *et al.*^{145} employed GCNs and GANs to predict traffic bursts in optical networks. Three types of burst events were modeled, namely, plateau, single-burst, and double-burst, representing steady traffic, a rapid traffic spike followed by a steady decrease, and a rapid traffic spike followed by an unexpected greater traffic spike, respectively.
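
As a rough illustration of how a GCN mixes information between neighboring nodes, the sketch below implements the widely used symmetric-normalized GCN propagation rule, $\mathrm{ReLU}(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H W)$ with $\tilde{A} = A + I$, on a hypothetical four-node ring whose node features are recent traffic samples. It assumes an undirected, unweighted graph; a directed, weighted graph such as the one described above would need a different normalization, and the weights here are random rather than trained.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degree normalization
    A_norm = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


# Hypothetical 4-node ring topology; each node carries its last 5 traffic samples
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
traffic = rng.uniform(0, 100, size=(4, 5))
W = rng.normal(size=(5, 8))
embeddings = gcn_layer(A, traffic, W)   # shape (4, 8): one embedding per node
```

Relabeling the nodes simply relabels the output rows, which is the property that lets graph-based models handle topologies independently of node ordering.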

GPs have also been successfully applied to traffic prediction. Their ability to capture the temporal aspects of traffic flows allows both short-term and long-term prediction of input traffic. Studies have demonstrated agile resource management in a core optical network using GP-based traffic prediction.^{139,147}
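
A minimal GP regression sketch, written directly in NumPy, shows where the predictive mean and uncertainty come from. It assumes a zero prior mean, a squared-exponential (RBF) kernel, and fixed hyperparameters; real traffic with daily periodicity would usually call for periodic kernel components and hyperparameters fitted by maximizing the marginal likelihood. The function names and the toy traffic series are hypothetical.

```python
import numpy as np


def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    return signal_var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


def gp_predict(t_train, y_train, t_test, noise_var=1e-2, length=1.0, signal_var=1.0):
    """Posterior mean and variance of the latent function at t_test (zero prior mean)."""
    K = rbf(t_train, t_train, length, signal_var) + noise_var * np.eye(len(t_train))
    K_s = rbf(t_train, t_test, length, signal_var)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)                    # k(t*, t*) - k*^T K^-1 k*
    return mean, var


t_train = np.arange(10.0)               # e.g., ten past measurement intervals
y_train = np.sin(t_train)               # toy, normalized traffic volume
t_test = np.array([5.0, 10.5, 100.0])   # inside the data, just ahead, far ahead
mean, var = gp_predict(t_train, y_train, t_test)
```

The predictive variance is small near observed data and grows back to the prior variance far from it, which is the quantified uncertainty that makes GPs attractive for resource planning.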

Recent comparative studies^{137,138,143} highlight the relative strengths of the ML methods used for traffic prediction. Szostak and Walkowiak^{137} compared the efficacy of several ML methods, including FFNN, SVM, DT, random forest (RF), and linear discriminant analysis (LDA), for predicting the source and destination of demands in a dynamic optical network. This work was subsequently extended to the prediction of traffic volume and holding time.^{138} They observed that LDA was the best classifier for these tasks.^{137} Additionally, Gui *et al.*^{143} benchmarked their GCN-GRU-based traffic prediction against several approaches, including LSTM, CNN, and GRU, and their results suggested that GCN-GRU achieves higher prediction quality than these approaches.

As introduced in Sec. II, GANs are designed to generate realistic data and thus show potential for generating simulated traffic for optical networks. In a GAN-based traffic generation scenario,^{148} the generator aims to transform random noise into generated traffic data whose characteristics are close to those of real-world traffic data. In contrast, the discriminator attempts to correctly determine whether data come from the actual traffic dataset or from the generated one. Through this adversarial competition, the generator and discriminator improve each other, and the generated traffic data become increasingly similar to real-world traffic data.
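
The competition described above is usually formalized as a pair of cross-entropy losses computed from the discriminator's output probabilities. The sketch below shows these standard losses in NumPy, using the commonly adopted non-saturating form for the generator; the network architectures, the training loop, and any traffic-specific details of the cited work are omitted, and the function names are hypothetical.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: D should output 1 for real traffic and 0 for generated traffic."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: G is rewarded when D labels generated traffic as real."""
    return -np.mean(np.log(np.asarray(d_fake) + eps))


# At the theoretical equilibrium, D outputs 0.5 for every sample
d_loss = discriminator_loss([0.5, 0.5], [0.5, 0.5])   # approximately 2 ln 2
g_loss = generator_loss([0.5, 0.5])                   # approximately ln 2
```

In training, the two losses are minimized alternately with respect to the discriminator and generator parameters; a discriminator that separates real from generated traffic well drives its own loss down and the generator's loss up.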

### B. Core network parameter optimization

In this section, we discuss core optical network parameter optimization within the RL framework. Core optical networks play the most substantial role in national and international communication infrastructure. They typically consist of flexible devices such as re-configurable optical add/drop multiplexers (ROADMs) and bandwidth variable transponders (BVTs). ROADMs are commonly used to route optical signals between different nodes, whereas BVTs are used to adapt a large set of core optical network parameters, such as the signal modulation format, coding scheme, forward error correction overhead, and symbol rate, to the current optical link requirements. Adapting these parameters is especially vital when attempting to maximize the overall network information throughput. However, this requires optimization over a large parameter space. In addition, much more efficient use of core optical network spectral resources is essential to cope with ever-growing bandwidth demand.

Conventionally, in fixed-grid WDM optical networks under a static traffic assumption, the network parameters can be adjusted by adapting the per-channel launch power and the signal modulation format to the stochastic impairments of the physical layer.^{149} The dominant physical layer impairments between the nodes of a core optical network are amplified spontaneous emission noise arising from the optical amplifiers and noise-like nonlinear interference induced by four-wave mixing in the Kerr-nonlinear medium, i.e., the optical fiber. In principle, the behavior of optical data signals between two nodes can be obtained numerically by solving the NLSE/Manakov equation via the SSFTM, with the solution becoming exact as the step size tends toward zero. However, this numerical solution is comparatively time-consuming, especially for wideband transmission systems. Currently, the most widely used physical layer impairment models are the family of so-called Gaussian noise (GN) models, which commonly rely on first-order perturbation theory.^{66} Moreover, under fairly reasonable assumptions, these models admit closed-form analytical approximations that significantly speed up the evaluation of physical layer impairments. The resource allocation problem for a single flexible-grid fiber link using the GN model closed-form approximation was considered in Ref. 150. Importantly, fast estimation of physical layer impairments is essential regardless of the type of optimization framework used.
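
A common consequence of the closed-form GN approximation is that, for a given channel, the nonlinear interference power grows with the cube of the launch power, so the SNR takes the form $\mathrm{SNR}(P) = P/(P_{\mathrm{ASE}} + \eta P^{3})$, where $P_{\mathrm{ASE}}$ is the amplified spontaneous emission noise power and $\eta$ lumps together the link parameters. Setting $d\,\mathrm{SNR}/dP = 0$ gives the optimum launch power $P_{\mathrm{opt}} = (P_{\mathrm{ASE}}/2\eta)^{1/3}$, at which the nonlinear interference power equals half the ASE power. The sketch below evaluates these expressions; the values of $P_{\mathrm{ASE}}$ and $\eta$ are hypothetical and chosen only for illustration.

```python
import numpy as np


def snr_gn(p_ch, p_ase, eta):
    """SNR with ASE noise plus GN-model nonlinear interference eta * P^3 (linear units)."""
    return p_ch / (p_ase + eta * p_ch**3)


def optimal_launch_power(p_ase, eta):
    """Solution of dSNR/dP = 0, i.e., P_ASE = 2 * eta * P^3."""
    return (p_ase / (2.0 * eta)) ** (1.0 / 3.0)


def to_db(x):
    return 10.0 * np.log10(x)


# Hypothetical link: P_ASE = 10 uW, eta = 1000 W^-2
p_ase, eta = 1e-5, 1e3
p_opt = optimal_launch_power(p_ase, eta)
snr_opt = snr_gn(p_opt, p_ase, eta)
print(f"P_opt = {to_db(p_opt / 1e-3):.2f} dBm, SNR_opt = {to_db(snr_opt):.2f} dB")
```

Because such expressions are cheap to evaluate, they can be called many times inside an optimization loop, which is exactly why fast impairment estimation matters for any optimization framework.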

RL has recently emerged as an alternative to conventional approaches, such as integer linear programming (ILP),^{151,152} and to heuristics, such as simulated annealing, *k*-shortest path routing and first-fit,^{153} and the genetic algorithm (GA).^{154} Generally speaking, RL is capable of efficiently solving a wide class of complex optimization problems.^{155} However, in the context of core optical networks, RL cannot be applied straight away, as it must generalize to arbitrary network topologies under dynamically changing conditions, such as changes in network topology, traffic, routing, and link failures. Over the last few years, initial works have proposed deep RL for solving various resource allocation and dynamic routing problems in core optical networks,^{158–159} emphasizing the advantages of RL-enabled methods over traditional heuristic optimization algorithms.
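
To show the basic mechanics of RL in this setting, the sketch below runs tabular Q-learning with an epsilon-greedy policy on a deliberately tiny, hypothetical problem: the state is a coarse SNR class of an incoming light path request, the action is the modulation format, and the reward is the number of bits per symbol if the format is feasible, or a penalty otherwise. Next states are drawn independently here, so the problem is far simpler than the dynamic, topology-dependent allocation problems in the cited works, which use deep RL with function approximation rather than a lookup table. The state and action definitions, reward values, and hyperparameters are all assumptions.

```python
import numpy as np

BITS = np.array([2, 4, 6])           # QPSK, 16QAM, 64QAM (bits per symbol)
MAX_FEASIBLE = np.array([0, 1, 2])   # highest feasible format index per SNR class


def reward(state, action):
    """Bits per symbol if the format is feasible in this SNR class, else a penalty."""
    return float(BITS[action]) if action <= MAX_FEASIBLE[state] else -1.0


def q_learning(steps=20000, alpha=0.1, gamma=0.5, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((3, 3))                   # Q[state, action]
    state = int(rng.integers(3))
    for _ in range(steps):
        # Epsilon-greedy: explore with probability eps, otherwise exploit
        if rng.random() < eps:
            action = int(rng.integers(3))
        else:
            action = int(np.argmax(Q[state]))
        r = reward(state, action)
        next_state = int(rng.integers(3))  # SNR class of the next request (i.i.d. here)
        # Temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (r + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    return Q


Q = q_learning()
policy = np.argmax(Q, axis=1)   # greedy format choice for each SNR class
```

After training, the greedy policy selects the highest-order feasible format in each SNR class, without ever being told the feasibility rules explicitly.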

Further examples of using an RL framework to maximize point-to-point link capacity by adjusting controllable parameters in core optical networks have recently been reported in Refs. 160 and 161, where heuristic GA-based results were used as a performance benchmark. The performance predicted by these two approaches is very similar. However, after an initial training phase, the RL approach optimizes the BVT parameters to maximize the overall network throughput in up to 1 s on average, whereas traditional heuristic algorithms may take on the order of minutes to hours. Additionally, preliminary investigations into network routing and parameter optimization show promising potential in leveraging the ability of GNNs to learn and model graph-structured information.^{162,163} Such models are able to generalize over arbitrary network topologies, routing schemes, and traffic intensities.

## V. FUTURE DIRECTIONS

A number of ML-driven future research directions are emerging within optical networks across both the physical layer and the network layer. In this section, we outline selected future directions within the physical layer, the network layer, and those spanning both layers.

### A. Physical layer

An emerging theme within applied ML that is relevant to optical networks is explainable ML,^{32} a subset of explainable artificial intelligence^{164} that aims to make the processes by which ML algorithms make decisions more understandable to humans. Optical networks are operated with high availability, meaning that light paths must often remain within the accepted QoT range at least 99.999% of the time, which translates to just over 5 minutes of downtime per year.^{165} This is enforced by service level agreements, under which operators must deliver the quality of service that customers have paid for. As a result, ML approaches deployed in optical networks must meet the stringent reliability requirements already satisfied by conventional techniques, so understanding how ML algorithms work is crucial for their adoption. Both post-hoc explainability methods and inherently explainable ML approaches have the potential to yield substantial benefits for ML applied within optical communication networks. Open-source libraries now provide implementations of post-hoc techniques,^{166} making them convenient to apply. In some situations, a more easily interpretable model with slightly worse performance may be preferable if operators can understand how it makes decisions and can therefore be more confident in its reliability. Additionally, probabilistic ML methods such as GPs provide well-quantified predictive uncertainties that can aid the interpretation of ML model predictions, which would be highly beneficial for many applications of ML within optical networks.
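
As an example of a simple post-hoc method, permutation feature importance measures how much a trained model's error grows when one input feature is randomly shuffled, breaking its relationship with the target. The model-agnostic NumPy sketch below applies it to a linear model fitted to synthetic, hypothetical QoT-style data in which the third feature has no influence; the data-generating model and the feature interpretations are assumptions for illustration only.

```python
import numpy as np


def permutation_importance(predict, X, y, rng, repeats=10):
    """Mean increase in MSE when each feature column is shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((predict(X_perm) - y) ** 2) - base
    return scores / repeats


# Synthetic data: columns stand in for, e.g., path length, launch power, and an irrelevant input
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

# Fit a linear model (with intercept) by least squares
A = np.hstack([X, np.ones((500, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(Z):
    return Z @ w[:3] + w[3]


importance = permutation_importance(predict, X, y, rng)
```

Because the method only needs a `predict` function, the same procedure applies unchanged to NNs or GPs, which is what makes post-hoc techniques attractive for auditing already-deployed models.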

Another interesting avenue for future research is the combination of physical models with ML, embedding our knowledge of the system physics into models such as NNs and GPs, as discussed in Sec. III A. For example, physics-informed ML approaches to QoT estimation can allow models to be trained with fewer measurements of the system and can enhance model explainability. Our knowledge of the physics can also be used to design more effective model architectures; for example, NNs can be designed using the DBP structure for nonlinear noise mitigation. More generally, combining the knowledge available before any data are seen with the data themselves, rather than discarding this knowledge and relying solely on the data, presents an interesting research direction.

A further promising research direction is digital twins. Digital twins have been shown to be effective in other research areas, such as healthcare technology, manufacturing, and smart cities,^{88} and there are many open research questions regarding their development for optical networks. The realization of true digital twins for optical networks, meaning high-fidelity virtual copies of deployed networks, will require the amalgamation of models of all aspects of optical networks discussed in this Tutorial. It will also require access to high-quality datasets that are representative of deployed networks, as described above. Additionally, there is an important question regarding how fast digital twins will operate and whether a truly real-time digital twin is realizable. This depends on two factors: how dynamic installed networks become in the future and how operator confidence in ML approaches evolves over time. As networks become increasingly dynamic, meaning that light paths are established and torn down more frequently, the time required to accurately measure the network may become a bottleneck for how quickly a digital twin can respond to a change in the network. Moreover, the time taken to retrain models may also limit this responsiveness, meaning that online and transfer learning will likely be needed to keep ML models accurate as the network changes and to support rapid modeling of new light paths. Operator confidence in ML is also crucial, as a true digital twin framework requires automatic, data-driven control of the network. As a result, explainability techniques are important for the development of digital twins, as they will increase confidence in the ML models upon which digital twins are built.

Furthermore, work is required to reduce the complexity of ML algorithms so that they can be deployed with a reasonable use of computational resources. For example, in short-reach equalizer applications, lower-complexity ML is desirable because of the requirement for real-time equalization at high symbol rates. In general, ML techniques will need sufficiently low complexity to adapt to increasingly dynamic networks. One solution may be online learning, in which ML models are trained offline before deployment and then adapt to monitoring data once deployed, without completely retraining the model. A related challenge is the flexibility of ML algorithms: to what extent can deployed models generalize to different network scenarios? One potential solution is transfer learning, which has been proposed to increase the flexibility of NNs for fiber nonlinear noise mitigation by reusing some of the initially trained network weights to adapt to a new situation.
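
One simple way to picture online adaptation is a model fitted offline and then updated sample by sample as monitoring data arrive. The sketch below uses the normalized least-mean-squares (NLMS) rule on a linear model, a hypothetical stand-in for a deployed QoT estimator, after the underlying relationship has drifted. The weights, the drift, and the step size are assumptions; online learning for NNs would use analogous incremental gradient updates.

```python
import numpy as np


def nlms_update(w, x, y, mu=0.5, eps=1e-12):
    """One NLMS step: move w to reduce the error on the newest sample (0 < mu < 2)."""
    err = y - w @ x
    return w + mu * err * x / (x @ x + eps)


rng = np.random.default_rng(0)
w_offline = np.array([1.0, -0.5, 0.2])   # weights fitted before deployment (hypothetical)
w_drifted = np.array([1.2, -0.4, 0.0])   # relationship after the network has changed

w = w_offline.copy()
for _ in range(200):                     # stream of monitoring samples
    x = rng.normal(size=3)
    w = nlms_update(w, x, w_drifted @ x) # noise-free labels for clarity
```

Each update costs only a few multiply-adds and never revisits old data, which illustrates why online updates are attractive when full retraining would be too slow for a dynamic network.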

An additional future direction is provided by hardware-driven ML approaches to equalization and nonlinearity compensation problems in optical networks. Due to the challenging requirements to operate at real-time data rates, the use of specialist hardware such as FPGAs is crucial for these applications. Low complexity implementations of ML architectures, such as DNN and RNN equalizers^{104,110} and real-time nonlinearity compensation^{126} discussed in Sec. III, present an interesting future direction for performing such signal processing tasks in next generation optical networks.

### B. Network layer

As in the physical layer, explainable ML is a promising field of research within network layer applications. Similarly, reduction in ML algorithm complexity is also an interesting future direction for network layer applications, particularly for any methods which are required to work in an online scenario.

Obtaining sufficiently detailed datasets from deployed networks remains a significant challenge for ML research in optical networks. Such data are often difficult to obtain, as network operators may be unable to grant researchers access to detailed network data without a non-disclosure agreement, owing to competition-related concerns. Even where such data are provided,^{167,168} they may be insufficiently detailed to be useful. As discussed in Sec. IV, GANs show potential to address this issue to some extent through their ability to generate larger datasets from a small amount of input data. GANs have been successful in generating data that are indistinguishable from real-world input data in optical network traffic generation applications^{148} and in numerous other applications in the computer vision domain.^{169}

An additional promising research direction in network routing and parameter optimization is leveraging the ability of GNNs to learn and model graph-structured information to create models that are able to generalize over arbitrary network topologies, routing schemes, and traffic intensity.^{162,163} Furthermore, the preliminary works applying RL techniques in dynamic parameter optimization have shown strong potential, with faster response time and similar quality of solutions compared to conventional optimization approaches.^{156} To this end, it would be interesting to investigate the means of bringing the strengths of RL and GNNs together in a data-driven network routing and parameter optimization scenario.

Moreover, within traffic prediction and generation, future work includes extending the proposed methodologies to networks of different scales, such as core and access networks. Another potential direction is to introduce methods that have been successful for time series forecasting in other domains, such as echo state networks,^{170} and to combine existing ML approaches into more effective hybrid methods. For example, hybrid GNN-LSTM models could be investigated, as these harness knowledge of the network structure and the temporal aspects of the traffic, respectively. Finally, integrating traffic prediction and simulation modules with other modules in an SDN setting will aid in achieving high performance in increasingly dynamic and flexible networks.

## VI. CONCLUSIONS

In this Tutorial, we have outlined the key research challenges in optical networks that exist today, the ML techniques that have been proposed to solve these problems, and interesting works from the literature that have applied ML. We have introduced the crucial concepts required to navigate the ML literature and highlighted techniques commonly used in optical networks: various forms of NNs, Bayesian approaches such as GPs, classifiers such as SVMs, and RL techniques such as deep Q-learning. In the physical layer, we have surveyed the literature applying ML to QoT estimation, digital twins, equalization for short-reach networks, and nonlinear noise mitigation for long-haul systems. In the network layer, we have presented exemplary work tackling network traffic prediction and generation and the optimization of core network parameters. There has been significant progress in ML applied to optical networks, with a vast range of methods used, each yielding benefits over previous approaches. A number of interesting avenues for future research remain, as discussed above, which will be crucial in delivering the next generation of optical networks and meeting the service requirements of the future.

## ACKNOWLEDGMENTS

The authors acknowledge BT, Huawei, and the EPSRC [IPES CDT (Grant No. EP/L015455/1) and TRANSNET (Grant No. EP/R035342/1)] for funding.

## AUTHOR DECLARATIONS

### Conflict of Interest

We confirm that we do not have any conflicts of interest.

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

## NOMENCLATURE

- ADC
analog-to-digital converter

- ANN
artificial neural network

- APD
avalanche photodiode

- BER
bit error rate

- BVT
bandwidth variable transponder

- CBR
case-based reasoning

- CD
chromatic dispersion

- CNN
convolutional neural network

- DBP
digital back-propagation

- DD
direct detection

- DDPG
deep deterministic policy gradient

- DFE
decision feedback equalizer

- DL
deep learning

- DML
directly modulated laser

- DNN
deep neural network

- DP
dual polarization

- DQN
deep Q-network

- DRL
deep reinforcement learning

- DSP
digital signal processing

- DT
decision tree

- ELM
extreme learning machine

- EML
electro-absorption modulated laser

- EON
elastic optical network

- FFE
feed-forward equalizer

- FFNN
feed-forward neural network

- FPGA
field programmable gate array

- GA
genetic algorithm

- GAN
generative adversarial network

- GCN
graph convolutional network

- GN
Gaussian noise

- GNN
graph neural network

- GRU
gated recurrent unit

- ILP
integer linear programming

- IM
intensity modulation

- KNN
K-nearest neighbor

- LDA
linear discriminant analysis

- LMA
Levenberg–Marquardt algorithm

- LSTM
long short-term memory

- MDP
Markov decision process

- ML
machine learning

- MMF
multi-mode fiber

- NLSE
nonlinear Schrödinger equation

- NN
neural network

- OFDM
orthogonal frequency-division multiplexing

- OOK
on–off keying

- OSNR
optical signal-to-noise ratio

- PAM
pulse amplitude modulation

- PSD
power spectral density

- PW
Parzen window

- QAM
quadrature amplitude modulation

- QoT
quality of transmission

- QPSK
quadrature phase shift keying

- ReLU
rectified linear unit

- RF
random forest

- RL
reinforcement learning

- RNN
recurrent neural network

- ROADM
re-configurable optical add/drop multiplexer

- RZ
return-to-zero

- SCSP
single-channel single polarization

- SDN
software defined network

- SMF
single-mode fiber

- SNR
signal-to-noise ratio

- SOA
semiconductor optical amplifier

- SSFTM
split-step Fourier transform method

- VCSEL
vertical cavity surface emitting laser

- WDM
wavelength division multiplexing

## REFERENCES

*Artificial Intelligence: A Modern Approach*

*Reinforcement Learning: An Introduction*

*Online Machine Learning*

*Deep Learning*

*Neural Networks: A Systematic Introduction*

*Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability*

*Gaussian Processes for Machine Learning*

*Nonparametric Statistical Methods*

*Deep Reinforcement Learning Hands-On: Apply Modern RL Methods, with Deep Q-Networks, Value Iteration, Policy Gradients, TRPO, AlphaGo Zero and More*

*Reinforcement Learning*

*Deep Reinforcement Learning: Fundamentals, Research and Applications*

*Numerical Analysis*

*Q*-factor physical-layer constraints in metro networks

*Photonic Networks and Devices*

*λ* passive optical network

*λ*PON

*λ*PAM4 IM-DD PON

*k*-nearest neighbors